Publications
2020
1.
Jánez-Martino, Francisco; Fidalgo, Eduardo; González, Santiago; Velasco-Mata, Javier
Classification of spam emails through hierarchical clustering and supervised learning (Journal Article)
In: arXiv preprint arXiv:2005.08773, 2020.
Tags: Cybersecurity, machine learning, Spam Classification, Text Processing, TF-IDF & BOW
@article{janez-martino_classification_2020,
title = {Classification of spam emails through hierarchical clustering and supervised learning},
author = {Francisco Jánez-Martino and Eduardo Fidalgo and Santiago González and Javier Velasco-Mata},
url = {https://arxiv.org/abs/2005.08773},
year = {2020},
date = {2020-01-01},
urldate = {2020-01-01},
journal = {arXiv preprint arXiv:2005.08773},
abstract = {This work introduces SPEMC-11K, the first multi-class spam email dataset, which categorizes spam into Health and Technology, Personal Scams, and Sexual Content. Comparing TF-IDF and BOW features with Naïve Bayes, Decision Trees, and SVM classifiers, the best result (a 95.39% F1-score) is achieved with TF-IDF and SVM, while TF-IDF and NB offer the fastest classification (2.13 ms per email).},
keywords = {Cybersecurity, machine learning, Spam Classification, Text Processing, TF-IDF & BOW},
pubstate = {published},
tppubtype = {article}
}
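The abstract pairs TF-IDF features with classifiers such as Naïve Bayes. Purely as an illustration (this is not the authors' code, and the toy emails are invented, not taken from SPEMC-11K), the sketch below shows TF-IDF weighting feeding a multinomial Naïve Bayes classifier. It assumes simple regex tokenization, the smoothed IDF formula that scikit-learn uses by default, no vector normalization, and Laplace smoothing; `TfidfNaiveBayes`, `fit_idf`, and `TRAIN` are hypothetical names.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed IDF: idf(t) = ln((1 + n) / (1 + df(t))) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Raw term count times IDF; terms not seen during fitting are dropped."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}


class TfidfNaiveBayes:
    """Hypothetical multinomial Naive Bayes using TF-IDF weights as fractional counts."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, texts: list[str], labels: list[str]) -> "TfidfNaiveBayes":
        docs = [tokenize(t) for t in texts]
        self.idf = fit_idf(docs)
        self.classes = sorted(set(labels))
        counts = Counter(labels)
        self.log_prior = {c: math.log(counts[c] / len(labels)) for c in self.classes}
        # Accumulate per-class TF-IDF mass for every vocabulary term.
        weights: dict[str, Counter] = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            weights[label].update(tfidf(doc, self.idf))
        self.log_likelihood = {}
        for c in self.classes:
            total = sum(weights[c].values()) + self.alpha * len(self.idf)
            self.log_likelihood[c] = {
                t: math.log((weights[c].get(t, 0.0) + self.alpha) / total) for t in self.idf
            }
        return self

    def predict(self, text: str) -> str:
        features = tfidf(tokenize(text), self.idf)
        scores = {
            c: self.log_prior[c]
            + sum(w * self.log_likelihood[c][t] for t, w in features.items())
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Hypothetical toy emails labelled with the paper's three spam categories.
TRAIN = [
    ("Cheap pills and miracle diet supplements, lose weight fast", "Health and Technology"),
    ("Buy the new smartphone gadget at a discount price", "Health and Technology"),
    ("I am a prince and need your bank account to transfer money", "Personal Scams"),
    ("You won the lottery, send your bank details to claim the money", "Personal Scams"),
    ("Hot singles want to meet you tonight, adult dating", "Sexual Content"),
    ("Private adult chat with singles near you", "Sexual Content"),
]

model = TfidfNaiveBayes().fit([text for text, _ in TRAIN], [label for _, label in TRAIN])
print(model.predict("send money to my bank account"))  # Personal Scams
```

Replacing the classifier with a linear SVM (for example scikit-learn's `LinearSVC` trained on `TfidfVectorizer` output) would correspond to the TF-IDF and SVM pairing that the abstract reports as the most accurate.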