Abstract
Human pose as a query modality is an alternative and rich experience for image and video retrieval. It has interesting retrieval applications in domains such as sports and dance databases. In this work we propose two novel ways of representing the image of a person striking a pose, one looking at body parts and the other at the whole image. Both representations are obtained using deep learning methods and are then used for retrieval. In the first method, we make the following contributions: (a) We introduce ‘deep poselets’ for pose-sensitive detection of various body parts, built on convolutional neural network (CNN) features. These deep poselets significantly outperform previous instantiations of Berkeley poselets [6], and (b) Using these detector responses, we construct a pose representation that is suitable for pose search, and show that pose retrieval performance is on par with previous methods. In the second method, we make the following contributions: (a) We design an optimized neural network that maps the input image to a very low-dimensional space in which similar poses are close together and dissimilar poses are far apart, and (b) We show that a pose retrieval system using this low-dimensional representation is on par with both the deep poselet representation and previous methods. The previous works with which the above two methods are compared include bag of visual words [44], Berkeley poselets [6], and human pose estimation algorithms [52]. All the methods are quantitatively evaluated on a large dataset of images built from a number of standard benchmarks together with frames from