IIITH

Assessing Translation capabilities of Large Language Models involving English and Indian Languages

Vandan Mujadia, Urlana Ashok, Yash Bhaskar, Penumalla Aditya Pavani, Kukkapalli Shravya, Parameswari Krishnamurthy, Dipti Mishra Sharma

Technical Report, arXiv, 2023

Core Rank : - Google Rank :-

@inproceedings{bib_Asse_2023, AUTHOR = {Vandan Mujadia and Urlana Ashok and Yash Bhaskar and Penumalla Aditya Pavani and Kukkapalli Shravya and Parameswari Krishnamurthy and Dipti Mishra Sharma}, TITLE = {Assessing Translation capabilities of Large Language Models involving English and Indian Languages}, BOOKTITLE = {Technical Report}, YEAR = {2023}}

Abstract

Generative Large Language Models (LLMs) have achieved remarkable advancements in various NLP tasks. In this work, our aim is to explore the multilingual capabilities of large language models by using machine translation as a task involving English and 22 Indian languages. We first investigate the translation capabilities of raw large language models, followed by exploring the in-context learning capabilities of the same raw models. We fine-tune these large language models using parameter-efficient fine-tuning methods such as LoRA and additionally with full fine-tuning. Through our study, we have identified the best performing large language model for the translation task involving LLMs, which is based on LLaMA.
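For readers unfamiliar with LoRA, the following is a minimal sketch of the general idea, not the report's actual training code. It shows a frozen weight matrix W augmented by a low-rank update (alpha / r) * B A, where only A and B would be trained. The matrix sizes, the rank, the value of alpha, and all function names are illustrative assumptions.

```python
# Toy LoRA-style update in plain Python (no ML framework).
# Sizes, rank, alpha, and names are illustrative assumptions only.
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(a)                    # A has shape (r, d_in)
    delta = matmul(b, a)          # B has shape (d_out, r) -> (d_out, d_in)
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


d_out, d_in, rank = 8, 8, 2
rng = random.Random(0)

# Frozen base weight (would come from the pretrained model).
w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Trainable adapter factors: A starts small and random, B starts at zero,
# so the adapted layer initially behaves exactly like the base layer.
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
b = [[0.0] * rank for _ in range(d_out)]

w_eff = lora_effective_weight(w, a, b, alpha=16)

# Parameter counts: full fine-tuning updates every entry of W,
# while the adapter only trains the entries of A and B.
full_params = d_out * d_in
lora_params = rank * (d_in + d_out)
```

In this toy setting the adapter trains half as many values as full fine-tuning; at the dimensions of real LLM layers, with a rank much smaller than the layer width, the gap is far larger.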

AbhiPaw@ DravidianLangTech: Abusive Comment Detection in Tamil and Telugu using Logistic Regression

Abhinaba Bala, Parameswari Krishnamurthy

Workshop on Speech and Language Technologies for Dravidian Languages, DravidianLangTech, 2023

Core Rank : - Google Rank :20

@inproceedings{bib_Abhi_2023, AUTHOR = {Abhinaba Bala and Parameswari Krishnamurthy}, TITLE = {AbhiPaw@ DravidianLangTech: Abusive Comment Detection in Tamil and Telugu using Logistic Regression}, BOOKTITLE = {Workshop on Speech and Language Technologies for Dravidian Languages}, YEAR = {2023}}

Abstract

Abusive comments in online platforms have become a significant concern, necessitating the development of effective detection systems. However, limited work has been done in low-resource languages, including Dravidian languages. This paper addresses this gap by focusing on abusive comment detection in a dataset containing Tamil, Tamil-English and Telugu-English code-mixed comments. Our methodology involves logistic regression and explores suitable embeddings to enhance the performance of the detection model. Through rigorous experimentation, we identify the most effective combination of logistic regression and embeddings. The results demonstrate the performance of our proposed model, which contributes to the development of robust abusive comment detection systems in low-resource language settings. Keywords: Abusive comment detection, Dravidian languages, logistic regression, embeddings, low-resource languages, code-mixed dataset.
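As a rough illustration of a logistic-regression text classifier of this kind, here is a minimal sketch in plain Python. It is not the authors' system: character trigram counts stand in for the embeddings the paper explores, the English toy sentences stand in for the Tamil and Telugu code-mixed data, and every function name and hyperparameter is an assumption made for this example.

```python
# Toy logistic-regression text classifier (stdlib only).
# Character trigram counts replace the paper's embeddings; the data,
# names, and hyperparameters are illustrative assumptions.
import math


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of the padded, lowercased text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_vocab(texts):
    """Map every n-gram seen in the training texts to a feature index."""
    vocab = {}
    for text in texts:
        for gram in char_ngrams(text):
            vocab.setdefault(gram, len(vocab))
    return vocab


def featurize(text, vocab):
    """Sparse count vector {feature index: count}; unseen n-grams are ignored."""
    counts = {}
    for gram in char_ngrams(text):
        idx = vocab.get(gram)
        if idx is not None:
            counts[idx] = counts.get(idx, 0.0) + 1.0
    return counts


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(text, vocab, weights, bias):
    """Probability that the text belongs to the positive (abusive) class."""
    feats = featurize(text, vocab)
    return sigmoid(bias + sum(weights[i] * v for i, v in feats.items()))


def train(samples, epochs=200, lr=0.1):
    """Fit weights with per-sample gradient descent on the log loss."""
    vocab = build_vocab(text for text, _ in samples)
    weights, bias = [0.0] * len(vocab), 0.0
    for _ in range(epochs):
        for text, label in samples:
            feats = featurize(text, vocab)
            p = sigmoid(bias + sum(weights[i] * v for i, v in feats.items()))
            err = p - label           # gradient of the log loss w.r.t. the score
            bias -= lr * err
            for i, v in feats.items():
                weights[i] -= lr * err * v
    return vocab, weights, bias


# Hypothetical toy data: label 1 = abusive, label 0 = not abusive.
toy_samples = [
    ("you are a stupid idiot", 1),
    ("what an idiot you are", 1),
    ("thank you for the lovely video", 0),
    ("this song is lovely", 0),
]
vocab, weights, bias = train(toy_samples)
```

Calling `predict_proba("such an idiot", vocab, weights, bias)` then scores an unseen comment. Character n-grams are used here only because they need no external resources; they are a common baseline for code-mixed text, where spelling varies widely.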

AbhiPaw@ DravidianLangTech: Fake News Detection in Dravidian Languages using Multilingual BERT

Abhinaba Bala, Parameswari Krishnamurthy

Workshop on Speech and Language Technologies for Dravidian Languages, DravidianLangTech, 2023

Core Rank : - Google Rank :20

@inproceedings{bib_Abhi_2023, AUTHOR = {Abhinaba Bala and Parameswari Krishnamurthy}, TITLE = {AbhiPaw@ DravidianLangTech: Fake News Detection in Dravidian Languages using Multilingual BERT}, BOOKTITLE = {Workshop on Speech and Language Technologies for Dravidian Languages}, YEAR = {2023}}

Abstract

This study addresses the challenge of detecting fake news in Dravidian languages by leveraging Google’s MuRIL (Multilingual Representations for Indian Languages) model. Drawing upon previous research, we investigate the intricacies involved in identifying fake news and explore the potential of transformer-based models for linguistic analysis and contextual understanding. Through supervised learning, we fine-tune the "muril-base-cased" variant of MuRIL using a carefully curated dataset of labeled comments and posts in Dravidian languages, enabling the model to discern between original and fake news. During the inference phase, the fine-tuned MuRIL model analyzes new textual content, extracting contextual and semantic features to predict the content’s classification. We evaluate the model’s performance using standard metrics, highlighting the effectiveness of MuRIL in detecting fake news in Dravidian languages and contributing to the establishment of a safer digital ecosystem. Keywords: fake news detection, Dravidian languages, MuRIL, transformer-based models, linguistic analysis, contextual understanding.

AbhiPaw@ DravidianLangTech: Multimodal Abusive Language Detection and Sentiment Analysis using Transformer based architecture

Abhinaba Bala, Parameswari Krishnamurthy

Workshop on Speech and Language Technologies for Dravidian Languages, DravidianLangTech, 2023

Core Rank : - Google Rank :20

@inproceedings{bib_Abhi_2023, AUTHOR = {Abhinaba Bala and Parameswari Krishnamurthy}, TITLE = {AbhiPaw@ DravidianLangTech: Multimodal Abusive Language Detection and Sentiment Analysis using Transformer based architecture}, BOOKTITLE = {Workshop on Speech and Language Technologies for Dravidian Languages}, YEAR = {2023}}

Abstract

Detecting abusive language in multimodal videos has become a pressing need in ensuring a safe and inclusive online environment. This paper focuses on addressing this challenge through the development of a novel approach for multimodal abusive language detection in Tamil videos and sentiment analysis for Tamil/Malayalam videos. By leveraging state-of-the-art models such as Multiscale Vision Transformers (MViT) for video analysis, OpenL3 for audio analysis, and the bert-base-multilingual-cased model for textual analysis, our proposed framework integrates visual, auditory, and textual features. Through extensive experiments and evaluations, we demonstrate the effectiveness of our model in accurately detecting abusive content and predicting sentiment categories. The limited availability of effective tools for performing these tasks in Dravidian Languages has prompted a new avenue of research in these domains. Keywords: abusive language detection, sentiment analysis, multimodal analysis, video analysis, Dravidian languages.