Improving Text Recognition in Tor with Rectification and Super-Resolution
The Onion Router (Tor) network hosts various types of suspicious content and services. Images found on Tor can depict anything from drugs and weapons to fake documents and seller names.
While some Computer Vision approaches to the Tor darknet work with images, they omit the analysis of the text found within them, which could be an important source of information. Text recognition can be used to retrieve this information.
However, image conditions such as low resolution and arbitrary text orientation, both common in Tor darknet images, can hinder this task.
In our work [1], we study the performance of three different super-resolution algorithms to enhance and retrieve text from Tor images. We combine these approaches with a rectification network, which helps correct the text’s orientation.
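As a rough illustration of this two-stage idea, the sketch below chains a toy rectification step (undoing a known rotation) with naive nearest-neighbour upscaling. This is purely didactic: the paper uses a learned rectification network that handles arbitrary orientations and deep super-resolution models (e.g. Deep CNN) that recover fine detail, neither of which is reproduced here.

```python
import numpy as np

def upscale_nearest(img, factor=2):
    """Naive nearest-neighbour upscaling. A real super-resolution
    model (such as the Deep CNN used in the paper) would instead
    reconstruct plausible fine detail, not just enlarge pixels."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def rectify_rotation(img, quarter_turns):
    """Toy 'rectification': undo a rotation that is a known multiple
    of 90 degrees. A rectification network would predict and correct
    arbitrary text orientation from the image itself."""
    return np.rot90(img, -quarter_turns)

# A small binary image standing in for low-resolution, rotated text.
low_res = np.rot90(np.eye(8, dtype=np.uint8), 1)  # rotated by 90 degrees

# Pipeline: rectify first, then enhance resolution, then run OCR
# (the OCR stage is omitted here).
restored = upscale_nearest(rectify_rotation(low_res, 1), factor=2)
```

After both steps, `restored` is an upright 16x16 version of the original 8x8 pattern, which is the kind of input a downstream text recognizer prefers.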
We evaluate our approach on our own TOICO-1K [2], a Tor-based image dataset created specifically for applying Text Spotting to Tor images. We achieved a 3.41% improvement when combining Deep CNN super-resolution with the rectification network, and we noted that rectification performs slightly better than super-resolution when each is used separately.
This work is joint research carried out by the Vision and Intelligent Systems group (GVIS) of the University of León, Spain, in collaboration with INCIBE (Spanish National Cybersecurity Institute). Researchers and developers from these two organizations are jointly developing new tools that use machine learning to obtain information from the content of Tor hidden services.
If you are interested in learning more about our work, you can access the referenced paper, as well as other papers on applying Machine Learning to Cybersecurity problems, on our website [3].
[1] Blanco-Medina, P., Fidalgo, E., Alegre, E., & Jánez-Martino, F. (2019). Improving Text Recognition in Tor darknet with Rectification and Super-Resolution techniques. 9th International Conference on Imaging for Crime Detection and Prevention.
[2] https://gvis.unileon.es/dataset/tor_images_in_context-1k/