Publications
2019
1.
Al-Nabki, Wesam; Fidalgo, Eduardo; Alegre, Enrique; Chaves, Deisy
Content-Based Features to Rank Influential Hidden Services of the Tor Darknet (Journal article)
In: arXiv e-prints, pp. arXiv–1910, 2019.
Tags: Darknet, Feature extraction, Hidden Services, Influence Detection, Learning-to-Rank, TOR
@article{al-nabki_content-based_2019,
title = {Content-Based Features to Rank Influential Hidden Services of the Tor Darknet},
author = {Wesam Al-Nabki and Eduardo Fidalgo and Enrique Alegre and Deisy Chaves},
url = {https://arxiv.org/abs/1910.02332},
year = {2019},
date = {2019-01-01},
journal = {arXiv e-prints},
pages = {arXiv–1910},
abstract = {This paper introduces a content-based ranking framework to identify the most influential onion domains on the Tor Darknet. It models domains using 40 features from five sources (text, HTML, named entities, network topology, and visual content) and applies a Learning-to-Rank (LtR) approach for ranking. A case study on drug-related domains shows that (1) the listwise LtR method achieves an NDCG of 0.95 for the top-10, (2) the framework outperforms link-based ranking techniques, and (3) textual features (text, NER, HTML) offer the best balance of efficiency and accuracy. This system could aid law enforcement in detecting suspicious domains.},
keywords = {Darknet, Feature extraction, Hidden Services, Influence Detection, Learning-to-Rank, TOR},
pubstate = {published},
tppubtype = {article}
}