In this post, I am going to briefly talk about our work with regards to object retrieval, which is one of the most important areas in the domain of computer vision. To begin with, let me illustrate with an example. Suppose, you have an image containing a particular object, let’s say your favorite dog. Now, you want to find out all those images in which your dog appears, and you wish to collect all of them. In a naive way, we remember the picture of the dog and we search among the images we have.

Let’s make this situation broader. Imagine now that you have millions of images and you are given a limited time to find out the images in which the object you desire appears. This is a complicated task, and it is not feasible unless you are ready to spend months after months. To solve such problem, we need an Artificially Intelligent (AI) system to automatically find out the images in which the object appears, and also, we want to sort it out in less amount of time.

To design such an AI system, many people came out with different ideas and strategies and were able to succeed. To do so, people need to train the AI system so it can learn how the features of a similar object look like. After training such a system, we can show an object in the form of a query, and the system will extract features which will be used for similarity search.

We now know what an object retrieval system does and how it works. With this discussion, lets us go into a bit depth about the object retrieval method we proposed.

In our work [1], we have proposed a query based similar object retrieval method, in which similar objects are retrieved from a dataset based on a given query image. Such applications are highly dependent on how the features of the objects are represented.

Even though lots of techniques are being proposed, object retrieval remains a challenging task in the computer vision domain. This is due to the fact that there is an enormous amount of complex image data, and descriptor matching is computationally expensive. Recently, we have witnessed the power of deep learning in the domain of computer vision. Unlike conventional handcrafted features, the deep learning algorithms automatically extract complex features and train itself in a hierarchical fashion.  We have also witnessed that, in the task of image classification, the deep learning algorithms proved to be much more efficient than the traditional machine learning algorithms.

So, in our work, we have employed deep learning.  We use the features generated from a pre-trained network, and we call them neural codes. Basically, the neural codes are the activations which are generated at different layers of a network, which you can imagine as vectors at different layers of a neural network.

To detect objects of interest we have used an algorithm known as Faster-RCNN, which generates regions of interest.

An image may contain various objects, and this algorithm proposes some regions where it thinks that some objects might be present. To search for a particular object amidst a huge collection of images, our method first processes all these images and extract features with respect to all generated object proposals to compare against a query image. If there are proposal matches, then images which contained them are retrieved and are sorted based on a similarity metric.

To summarize, our method compares the query descriptor with the proposals of images, and we retrieve the images which contain the same or similar object as that of the query.  

[1] Saikia, S., Fidalgo, E., Alegre, E., & Fernández-Robles, L. (2017, September). Object detection for crime scene evidence analysis using deep learning. In International Conference on Image Analysis and Processing (pp. 14-24). Springer, Cham.

Leave a Comment

Your email address will not be published.