
How we recognize text in images with Text Spotting
The Deep Web is the portion of the Internet that is not indexed by standard search engines, such as Google or Bing, and therefore does not appear in their search results. Within the Deep Web lies a deeper part, called the Dark Web, which users can access only with special software. The Dark Web usually comprises networks that ensure the privacy and anonymity of their users, such as The Onion Router (Tor), whose domains are also called hidden services. Due to this anonymity, these networks are a common source of illegal content and activities. It is estimated that about 25% of the content found in the Tor network may involve potentially illegal activities, such as the counterfeiting of ID documents and credit cards, the sale of weapons and drugs, and other types of illegal content.
Due to the large number of hidden services and the volume of information available within them, automated techniques are used to analyze the content and detect potential threats or illegal activities in less time. Several works have been proposed to fight illegal activities in the Tor network, including methods that automatically detect the topic of a text (Text Classification) or produce a summary of it (Text Summarization). These methods are applied after the text of the domains inside these networks has been extracted automatically. However, they cannot process text written inside an image. As a result, they lose a large amount of potentially valuable information, such as a product name, a brand, or even the seller’s name. A technique called Text Spotting fills this gap, as it is capable of detecting and recognizing text in natural scene images.
Text Spotting forms a pipeline of two phases: first, it localizes the text within an image, creating a bounding box with the text’s coordinates, and second, it recognizes the word(s) of this text.
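The two phases can be sketched as a small Python skeleton. This is only an illustration of the data flow, not the implementation used in our pipeline: the names `SpottedText`, `crop`, `spot_text`, `toy_detect` and `toy_recognize` are hypothetical, the image is modelled as a plain list of pixel rows, and the detector and recognizer are stand-ins for real neural networks.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Assumption: a box is (x_min, y_min, x_max, y_max) with exclusive maxima.
Box = Tuple[int, int, int, int]
Image = Sequence[Sequence[int]]


@dataclass
class SpottedText:
    box: Box   # where the text was found (phase 1)
    text: str  # what the text says (phase 2)


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the region described by `box` out of the image."""
    x0, y0, x1, y1 = box
    return [list(row[x0:x1]) for row in image[y0:y1]]


def spot_text(
    image: Image,
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[List[List[int]]], str],
) -> List[SpottedText]:
    """Phase 1: localize text regions. Phase 2: recognize each region."""
    results = []
    for box in detect(image):
        patch = crop(image, box)
        results.append(SpottedText(box=box, text=recognize(patch)))
    return results


# Hypothetical stand-ins so the sketch runs without any trained model.
def toy_detect(image: Image) -> List[Box]:
    # Pretends that the top-left 2x1 region contains text.
    return [(0, 0, 2, 1)]


def toy_recognize(patch: List[List[int]]) -> str:
    # Pretends each pixel value is a character code.
    return "".join(chr(pixel) for row in patch for pixel in row)


image = [[72, 105, 0], [0, 0, 0]]
results = spot_text(image, toy_detect, toy_recognize)
```

Keeping the detector and the recognizer as separate callables mirrors the two-phase structure: either phase can be replaced by a different model without touching the other.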
There are several challenges to overcome when performing this task, such as partial occlusion of the text in the image, text orientation, or even the presence of different languages in the same image. Aside from detecting illegal content in the dark networks mentioned above, Text Spotting has many potential applications in real-world environments, such as road navigation, acquiring important geographic information, generic scene understanding, and video or image indexing.
We have designed a Text Spotting pipeline using two types of neural networks: first, a connectionist text proposal network that detects the location of text within the images and, second, an end-to-end trainable neural network for text recognition. To test this pipeline, we used a subset of 100 images containing text from the TOIC dataset, which comprises five categories of images related to different illegal activities from the Tor network. The results of this test suggest that the proposed pipeline could support tools that help the authorities detect these types of illegal activities.
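This post does not state which metrics were used in the test. As an assumption, the sketch below shows two measures commonly used for this kind of evaluation: Intersection over Union (IoU) to compare a detected bounding box with an annotated one, and case-insensitive exact-match accuracy for the recognized words. The function names and the sample values are hypothetical.

```python
from typing import Sequence, Tuple

# Assumption: a box is (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def word_accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of expected words recognized exactly, ignoring case."""
    if not expected:
        return 0.0
    matches = sum(
        p.strip().lower() == e.strip().lower()
        for p, e in zip(predicted, expected)
    )
    return matches / len(expected)


# Made-up values, for illustration only.
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
accuracy = word_accuracy(["Visa", "pasport"], ["visa", "passport"])
```

A detection is often counted as correct when its IoU with the annotated box exceeds a chosen threshold; the threshold itself is a design choice and is not specified here.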
In this post, we have explained the basic operation of Text Spotting, the areas in which it can help, and the issues it may face in real-world scenarios. Going forward, this technique can save a lot of time, effort and resources when crawling domains where valuable textual information is deliberately shown in images in order to evade automatic text classification techniques.