Standard search engines, such as Google and Bing, are the principal way we find information on the Internet. However, despite their powerful performance, they can access and store the addresses of only part of the content on the Web (a process known as indexing). The part of the Web that can be indexed is called the Surface Web, whereas the part that cannot be indexed is known as the Deep Web. Within the Deep Web, some content not only cannot be indexed by search engines but also requires a special software application or a proxy to access: this portion is called the Dark Web.
There are many networks in the Dark Web, and The Onion Router (Tor) is one of the most famous. Tor was designed to provide a high level of anonymity: it was developed to help journalists in countries under dictatorship express their opinions freely, far from the authorities' monitoring tools. However, this feature has also attracted abusers who use it to promote their illegal businesses. The Tor metrics website reported that the number of unique addresses increased from 30,000 to almost 90,000 between April 2015 and August 2018. This statistic reflects a rapid proliferation of Tor domains, which are called Hidden Services (HSs) in the Tor community; therefore, there is an urgent need to monitor them for illegal activity.
Hidden Services vary in their importance: for example, some of them might dominate the market for a specific activity, while others might be only spam or have little real effect on the illegal marketplace. This variation in importance raises several questions: Which Hidden Service dominates the market in each class of activity? Which are the influential domains in the Tor network? What is the backend structure of the onion domains? Answering these questions would be very useful for Law Enforcement Agencies (LEAs), as it would give them insights into the illegal activities in the Tor network.
In this post, we propose a fully automatic algorithm, which we call ToRank, to rank the Hidden Services in the Tor network and detect the most influential ones according to their importance in the market. ToRank is a link-based ranking algorithm, which means that it defines the importance of a given domain using information about how its hyperlinks connect it to other domains in the network. ToRank represents the Tor network as a directed graph in which the nodes represent the Hidden Services, and a directed edge connecting two nodes represents a hyperlink from one of the corresponding Hidden Services to the other.
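To make the graph representation more concrete, here is a minimal sketch in Python. It builds a directed graph from hyperlink pairs and scores the nodes with plain PageRank, which is used here only as a stand-in for a link-based ranking: the actual ToRank formula is not given in this post, so this should not be read as ToRank itself. The function names, the damping value, and the example onion addresses are all hypothetical.

```python
def build_graph(hyperlinks):
    """Build a directed graph {domain: set of linked domains} from (source, target) pairs."""
    graph = {}
    for src, dst in hyperlinks:
        graph.setdefault(src, set())
        graph.setdefault(dst, set())
        if src != dst:  # ignore links from a domain to itself
            graph[src].add(dst)
    return graph


def pagerank(graph, damping=0.85, iterations=100):
    """Plain PageRank by power iteration (a stand-in, NOT the ToRank formula)."""
    n = len(graph)
    scores = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        # Score held by nodes without outgoing links is spread over all nodes.
        dangling = sum(scores[node] for node, outs in graph.items() if not outs)
        base = (1.0 - damping) / n + damping * dangling / n
        new_scores = {node: base for node in graph}
        for node, outs in graph.items():
            if outs:
                share = damping * scores[node] / len(outs)
                for target in outs:
                    new_scores[target] += share
        scores = new_scores
    return scores


# Hypothetical example: three domains link to one marketplace.
links = [
    ("a.onion", "market.onion"),
    ("b.onion", "market.onion"),
    ("c.onion", "market.onion"),
    ("market.onion", "a.onion"),
]
graph = build_graph(links)
scores = pagerank(graph)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy network, the domain that receives the most incoming hyperlinks ends up first in the ranking, which is the intuition behind any link-based approach.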
For each domain, ToRank assigns at least two values: the first represents its global rank with respect to all the nodes in the network, while the second refers to its rank within the category of the studied domains to which it belongs. Fig. 1 illustrates the graph representation of the Tor network, where each dot refers to a Hidden Service and each gray edge denotes a hyperlink between two web pages.
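The two rank values can be illustrated with a short sketch. It assumes that each domain already has an importance score and a category label, and that the rank within a category is simply the domain's position among the domains of the same category when ordered by that same score. This reading is an assumption; the scores, category names, and the assign_ranks helper are hypothetical.

```python
def assign_ranks(scores, categories):
    """Return {domain: (global_rank, category_rank)}; rank 1 is the most important."""
    # Order by descending score; ties are broken alphabetically for determinism.
    ordered = sorted(scores, key=lambda domain: (-scores[domain], domain))
    seen_per_category = {}
    ranks = {}
    for global_rank, domain in enumerate(ordered, start=1):
        category = categories[domain]
        seen_per_category[category] = seen_per_category.get(category, 0) + 1
        ranks[domain] = (global_rank, seen_per_category[category])
    return ranks


# Hypothetical scores and category labels.
scores = {
    "shop1.onion": 0.40,
    "shop2.onion": 0.10,
    "forum1.onion": 0.30,
    "forum2.onion": 0.20,
}
categories = {
    "shop1.onion": "marketplace",
    "shop2.onion": "marketplace",
    "forum1.onion": "forum",
    "forum2.onion": "forum",
}
ranks = assign_ranks(scores, categories)
for domain, (global_rank, category_rank) in ranks.items():
    print(domain, global_rank, category_rank)
```

With this toy data, forum1.onion is second overall but first among the forums, which shows why a per-category rank can point to a dominant service that the global rank alone would place lower.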
In conclusion, ToRank is an automatic algorithm to rank the nodes in the Tor network and detect the most important ones. The assigned rank values can ease the monitoring process for LEAs, as they can lead them to the HSs that dominate the market and give insights into the onion domains.