Publications
2021
1.
Jáñez-Martino, Francisco; Alaiz-Rodríguez, Rocío; González-Castro, Víctor; Fidalgo, Eduardo
Trustworthiness of spam email addresses using machine learning (Journal Article)
In: Proceedings of the 21st ACM Symposium on Document Engineering, pp. 1–4, 2021.
Tags: Cybersecurity, machine learning, Phishing, Spam Email Detection, Trustworthiness Analysis
@article{janez-martino_trustworthiness_2021,
title = {Trustworthiness of spam email addresses using machine learning},
author = {Francisco Jáñez-Martino and Rocío Alaiz-Rodríguez and Víctor González-Castro and Eduardo Fidalgo},
url = {https://dl.acm.org/doi/abs/10.1145/3469096.3475060},
year = {2021},
date = {2021-01-01},
journal = {Proceedings of the 21st ACM Symposium on Document Engineering},
pages = {1--4},
abstract = {This paper addresses the growing problem of spam emails that cybercriminals use for scams, phishing, and malware attacks. It presents a proof-of-concept methodology to help users assess the trustworthiness of email addresses. The authors introduce a manually labeled dataset of email addresses, categorized as low or high quality, and extract 18 handcrafted features based on social engineering techniques and natural language properties. Of the four machine learning classifiers tested, Naive Bayes performs best (88.17% accuracy and an F1-score of 0.808). Finally, the study uses the InterpretML framework to identify the features most relevant to building an automatic system that assesses email address trustworthiness.},
keywords = {Cybersecurity, machine learning, Phishing, Spam Email Detection, Trustworthiness Analysis},
pubstate = {published},
tppubtype = {article}
}
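The abstract outlines a pipeline in which handcrafted features are computed from each email address and then fed to a Naive Bayes classifier. The Python sketch that follows illustrates that general idea under several assumptions. The four features in the hypothetical `address_features` function are illustrative stand-ins, not the paper's 18 features. The Gaussian variant of Naive Bayes is assumed because the variant is not stated here. The labeled addresses are invented toy examples on reserved `example.*` domains, not the authors' dataset.

```python
import math
import re
from collections import defaultdict


def address_features(address: str) -> list[float]:
    """Hypothetical handcrafted features for a single email address.

    Illustrative stand-ins only; these are NOT the 18 features used in the paper.
    """
    local, _, domain = address.lower().partition("@")
    digits = sum(ch.isdigit() for ch in local)
    return [
        float(len(local)),                         # length of the local part
        digits / max(len(local), 1),               # share of digits in the local part
        float(len(re.findall(r"[._+-]", local))),  # separator characters in the local part
        float(domain.count(".")),                  # dots in the domain (subdomain depth)
    ]


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: each feature is modelled as an
    independent normal distribution within each class."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.priors, self.stats = {}, {}
        for label, rows in groups.items():
            self.priors[label] = len(rows) / len(X)
            columns = list(zip(*rows))
            means = [sum(col) / len(col) for col in columns]
            # Small variance floor avoids division by zero for constant features.
            variances = [
                sum((v - m) ** 2 for v in col) / len(col) + 1e-6
                for col, m in zip(columns, means)
            ]
            self.stats[label] = list(zip(means, variances))
        return self

    def _log_posterior(self, row, label):
        # log P(class) + sum of log N(x | mean, var) over the features
        score = math.log(self.priors[label])
        for x, (mean, var) in zip(row, self.stats[label]):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return score

    def predict(self, X):
        return [max(self.priors, key=lambda c: self._log_posterior(row, c)) for row in X]


# Invented toy addresses labeled "high" or "low" quality; not data from the paper.
TRAIN = [
    ("john.smith@example.com", "high"),
    ("alice@example.org", "high"),
    ("support@example.net", "high"),
    ("win4829301@promo.mail.example.com", "low"),
    ("a8f7d6s5k4j3@x.mail.example.net", "low"),
    ("free_prize_9981@deals.mail.example.org", "low"),
]

if __name__ == "__main__":
    X = [address_features(addr) for addr, _ in TRAIN]
    y = [label for _, label in TRAIN]
    model = GaussianNaiveBayes().fit(X, y)
    queries = ["bob.jones@example.com", "xq7729104@promo.mail.example.com"]
    print(model.predict([address_features(q) for q in queries]))  # → ['high', 'low']
```

Gaussian Naive Bayes reduces training to per-class means and variances, which keeps the sketch short. The variance floor prevents constant features, such as a digit share of zero, from causing division by zero. For actual experiments, a library implementation such as scikit-learn's `GaussianNB` would be the more usual choice.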