Abstract
Headline generation is an important task in Natural Language Processing (NLP): it aims to distill the essence of a text into a concise, attention-grabbing summary. Notable progress has been made in headline generation for widely spoken languages such as English, but many challenges remain for low-resource languages, including the rich and diverse Indian languages. A major obstacle for headline generation in Indian languages is the scarcity of high-quality annotated data. To address this gap, we present Mukhyansh, a large multilingual dataset for Indian language headline generation. Mukhyansh contains over 3.39 million article-headline pairs spanning eight prominent Indian languages: Telugu, Tamil, Kannada, Malayalam, Hindi, Bengali, Marathi, and Gujarati. We present a comprehensive evaluation of several state-of-the-art baseline models. Additionally, through an empirical comparison with existing works, we show that models trained on Mukhyansh outperform all other models, achieving an average ROUGE-L score of 31.43 across all 8 languages.