Abstract
Extensive research and the development of benchmark datasets have primarily focused on Scene Text Recognition (STR) in Latin languages. The situation differs for Indian languages, where complexities in syntax and semantics pose many challenges, resulting in limited datasets and comparatively little research in this domain. Overcoming these challenges is crucial for advancing scene text recognition in Indian languages. Although a few works have touched upon this issue, to the best of our knowledge they are constrained in the size and scale of their data. To bridge this gap, this paper introduces IIIT-IndicSTR-Word, a large-scale, diverse dataset for Indic scene text. It comprises a total of 250K word-level images in ten languages: Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, and Telugu. The images are extracted from roadside scenes captured by a GoPro camera. The dataset encompasses a wide array of realistic adversarial conditions, including blur, changes in illumination, occlusion, non-iconic text, low resolution, and perspective text. We establish a baseline for the proposed dataset, facilitating evaluation and benchmarking with a specific focus on STR tasks. Our findings indicate that our dataset is a practical training source for enhancing performance on the respective datasets.