Publications
2024
1.
Al-Nabki, Wesam; Fidalgo, Eduardo; Alegre, Enrique; Delany, Sarah Jane; Jáñez-Martino, Francisco
Classifying the content of online notepad services using active learning Journal article
In: Journal of Intelligent Information Systems, pp. 1–27, 2024, (Publisher: Springer US).
Tags: Cybersecurity, Illegal Activities, machine learning, Pastebin, Text classification
@article{al-nabki_classifying_2024,
title = {Classifying the content of online notepad services using active learning},
author = {Wesam Al-Nabki and Eduardo Fidalgo and Enrique Alegre and Sarah Jane Delany and Francisco Jáñez-Martino},
url = {https://link.springer.com/article/10.1007/s10844-024-00902-8},
year = {2024},
date = {2024-01-01},
urldate = {2024-01-01},
journal = {Journal of Intelligent Information Systems},
pages = {1--27},
abstract = {This paper proposes a cascading classification system with Active Learning to identify suspicious activities on Pastebin. The model classifies texts into code snippets, readability, and suspicious or illegal activities. It introduces the INSPECT-3.8M dataset, containing 3.8 million labeled samples. This approach helps law enforcement agencies detect and block illegal content on Pastebin before it spreads.},
note = {Publisher: Springer US},
keywords = {Cybersecurity, Illegal Activities, machine learning, Pastebin, Text classification},
pubstate = {published},
tppubtype = {article}
}
2019
2.
Alegre, Enrique
SUPERVISED MACHINE LEARNING FOR CLASSIFICATION, MINING, AND RANKING OF ILLEGAL WEB CONTENTS PhD thesis
UNIVERSITY OF LEÓN, 2019.
Tags: Darknet, Illegal Activities, Pastebin, Text classification, TOR Network
@phdthesis{alegre_supervised_2019,
title = {SUPERVISED MACHINE LEARNING FOR CLASSIFICATION, MINING, AND RANKING OF ILLEGAL WEB CONTENTS},
author = {Enrique Alegre},
url = {https://scholar.google.es/citations?view_op=view_citation&hl=es&user=yATJZvcAAAAJ&cstart=100&pagesize=100&sortby=title&citation_for_view=yATJZvcAAAAJ:ldfaerwXgEUC},
year = {2019},
date = {2019-01-01},
school = {UNIVERSITY OF LEÓN},
abstract = {This thesis introduces algorithms, methods, and datasets aimed at classifying, mining information, and ranking web domains or similar resources containing text. The focus is on detecting web content that may indicate illegal activities, particularly in the Tor Darknet and Online Notepad Services (ONS), like Pastebin. Motivated by a collaboration with INCIBE, the research addresses the identification of criminal content in these areas, based on the assumption that the Tor network harbors a significant amount of illicit activity.},
keywords = {Darknet, Illegal Activities, Pastebin, Text classification, TOR Network},
pubstate = {published},
tppubtype = {phdthesis}
}