Abstract
Named Entity Recognition (NER) is a well-researched problem in English, where models perform well owing to the availability of resources. Transformer models, specifically masked-language models (MLM), have recently shown remarkable performance in NER. With growing data on different online platforms, there is a need for NER in other languages too. NER remains underexplored in Indian languages due to the lack of resources and tools. Our contributions in this paper include (i) two annotated NER datasets for the Telugu language in multiple domains, a Newswire Dataset (ND) and a Medical Dataset (MD), which we merge to form a Combined Dataset (CD); (ii) a comparison of the finetuned Telugu pretrained transformer models (BERT-Te, RoBERTa-Te, and ELECTRA-Te) with other baseline models (CRF, LSTM-CRF, and BiLSTM-CRF); and (iii) a further investigation of the performance of Telugu pretrained transformer models against the multilingual models mBERT (Devlin et al., 2018), XLM-R (Conneau et al., 2020), and IndicBERT (Kakwani et al., 2020). We find that pretrained Telugu language models (BERT-Te and RoBERTa-Te) outperform the existing pretrained multilingual and baseline models in NER. On the Combined Dataset (CD) of 38,363 sentences, BERT-Te achieves a high F1-score of 0.80 (entity-level) and 0.75 (token-level). Further, these pretrained Telugu models have shown state-of-the-art performance on various