Publications
2022
1.
Redondo-Gutierrez, Luis Ángel; Jáñez-Martino, Francisco; Fidalgo, Eduardo; Alegre, Enrique; González-Castro, Víctor; Alaiz-Rodríguez, Rocío
Detecting malware using text documents extracted from spam email through machine learning (journal article)
In: Proceedings of the 22nd ACM Symposium on Document Engineering, pp. 1–4, 2022.
Tags: Malware Detection, NLP, Spam Email, Text classification
@article{redondo-gutierrez_detecting_2022,
title = {Detecting malware using text documents extracted from spam email through machine learning},
author = {Luis Ángel Redondo-Gutierrez and Francisco Jáñez-Martino and Eduardo Fidalgo and Enrique Alegre and Víctor González-Castro and Rocío Alaiz-Rodríguez},
url = {https://dl.acm.org/doi/abs/10.1145/3558100.3563854},
year = {2022},
date = {2022-01-01},
journal = {Proceedings of the 22nd ACM Symposium on Document Engineering},
pages = {1--4},
abstract = {This work introduces the "Spam Email Malware Detection - 600" (SEMD-600) dataset for detecting malware in spam emails using text analysis. It compares two text representation techniques (Bag of Words and TF-IDF) combined with three classifiers (SVM, Naive Bayes, and Logistic Regression). The combination of TF-IDF and Logistic Regression achieved the best performance, with a macro F1 score of 0.763.},
keywords = {Malware Detection, NLP, Spam Email, Text classification},
pubstate = {published},
tppubtype = {article}
}