Abstract
Recognizing text in images taken in the wild is a challenging problem that has received great attention in recent years. Previous methods addressed this problem by first detecting individual characters and then forming them into words. Such approaches often suffer from weak character detections, because scene characters exhibit large intra-class variations, far more so than characters from scanned documents. We take a different view of the problem and present a holistic word recognition framework. In this framework, we first represent the scene text image and synthetic images generated from lexicon words using gradient-based features. We then recognize the text in the image by matching the scene and synthetic image features with our novel weighted Dynamic Time Warping (wDTW) approach. We perform experimental analysis on challenging public datasets, such as Street View Text and ICDAR 2003. Our proposed method significantly outperforms our earlier work in Mishra et al. (CVPR 2012), as well as many other recent works, such as Novikova et al. (ECCV 2012), Wang et al. (ICPR 2012), and Wang et al. (ICCV 2011).
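As a rough illustration of the matching step, the sketch below aligns two sequences of column-wise feature vectors with a weighted dynamic time warping recurrence and selects the lexicon word whose synthetic features yield the lowest alignment cost. The abstract does not specify the features or the weighting scheme, so the per-dimension weight vector `w`, the weighted squared-distance local cost, the normalization by sequence lengths, and the names `wdtw_cost` and `recognize` are all assumptions made for illustration and should not be read as the authors' formulation.

```python
import numpy as np


def wdtw_cost(x, y, w=None):
    """Weighted DTW cost between feature sequences x (m x d) and y (n x d).

    w is a hypothetical per-dimension weight vector of length d;
    uniform weights are used when it is omitted.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m, n = len(x), len(y)
    w = np.ones(x.shape[1]) if w is None else np.asarray(w, dtype=float)

    # Local cost: weighted squared Euclidean distance between every
    # pair of columns (assumed; other distances are possible).
    diff = x[:, None, :] - y[None, :, :]
    local = (w * diff ** 2).sum(axis=2)

    # Standard DTW recurrence over the cumulative cost table.
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i, j] = local[i - 1, j - 1] + min(
                D[i - 1, j],      # advance in x only
                D[i, j - 1],      # advance in y only
                D[i - 1, j - 1],  # advance in both
            )

    # Normalize so that words of different widths are comparable.
    return D[m, n] / (m + n)


def recognize(scene_feats, lexicon_feats, w=None):
    """Return the lexicon word whose synthetic features match best.

    lexicon_feats maps each word to its (n x d) feature sequence.
    """
    return min(
        lexicon_feats,
        key=lambda word: wdtw_cost(scene_feats, lexicon_feats[word], w),
    )
```

In use, `recognize(scene, {"cat": cat_feats, "dog": dog_feats})` would return the key with the smallest cost. Because the warping path may repeat columns, a stretched copy of the same word still aligns at zero cost, which is the property that makes this kind of matching tolerant to differences in character width.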