Abstract
Automatic annotation is an elegant alternative to explicit recognition in images. In annotation, the image is matched against keyword models, and the most relevant keywords are assigned to the image. With existing techniques, the annotation time for large collections is very high, and annotation performance degrades as the number of keywords increases. Towards the goal of large-scale annotation, we present an approach called “Reverse Annotation”. Unlike traditional annotation, where keywords are identified for a given image, in Reverse Annotation the relevant images are identified for each keyword. With this seemingly simple shift in perspective, the annotation time is reduced significantly. To enable ranking of relevant images, the approach is extended to Probabilistic Reverse Annotation. Our framework is applicable to a wide variety of multimedia documents and scalable to large collections. Here, we demonstrate the framework on a large collection of 75,000 document images, containing 21 million word segments, annotated with 35,000 keywords. Our image retrieval system matches text-based search engines in response time.