IIITH

Towards Detection of Subjective Bias using Contextualized Word Embeddings

International Conference on World Wide Web, WWW, 2020

Core Rank : A* Google Rank :112

@inproceedings{bib_Towa_2020, AUTHOR = {Pant, Kartikey and Dadu, Tanvi and Mamidi, Radhika}, TITLE = {Towards Detection of Subjective Bias using Contextualized Word Embeddings}, BOOKTITLE = {International Conference on World Wide Web}, YEAR = {2020}}

Abstract

Subjective bias detection is critical for applications like propaganda detection, content recommendation, sentiment analysis, and bias neutralization. This bias is introduced in natural language via inflammatory words and phrases, casting doubt over facts, and presupposing the truth. In this work, we perform comprehensive experiments for detecting subjective bias using BERT-based models on the Wiki Neutrality Corpus (WNC). The dataset consists of 360k labeled instances from Wikipedia edits that remove various instances of the bias. We further propose BERT-based ensembles that outperform state-of-the-art methods like BERT-large by a margin of 5.6 in F1 score.

Dataset Creation and Evaluation of Aspect Based Sentiment Analysis in Telugu, a Low Resource Language

International Conference on Language Resources and Evaluation, LREC, 2020

Core Rank : B Google Rank :59

@inproceedings{bib_Data_2020, AUTHOR = {Reddy, Regatte Yashwanth and Reddy, Gangula Rama Rohit and Mamidi, Radhika}, TITLE = {Dataset Creation and Evaluation of Aspect Based Sentiment Analysis in Telugu, a Low Resource Language}, BOOKTITLE = {International Conference on Language Resources and Evaluation}, YEAR = {2020}}

Abstract

In recent years, sentiment analysis has gained popularity as it is essential to moderate and analyse the information across the internet. It has various applications like opinion mining, social media monitoring, and market research. Aspect Based Sentiment Analysis (ABSA) is an area of sentiment analysis which deals with sentiment at a finer level. ABSA classifies sentiment with respect to each aspect to gain greater insights into the sentiment expressed. Significant contributions have been made in ABSA, but this progress is limited only to a few languages with adequate resources. Telugu lags behind in this area of research despite being one of the most spoken languages in India and an enormous amount of data being created each day. In this paper, we create a reliable resource for aspect based sentiment analysis in Telugu. The data is annotated for three tasks namely Aspect Term Extraction, Aspect Polarity Classification and Aspect Categorisation. Further, we develop baselines for the tasks using deep learning methods demonstrating the reliability and usefulness of the resource.

Annotated Corpus for Sentiment Analysis in Odia Language

International Conference on Language Resources and Evaluation, LREC, 2020

Core Rank : B Google Rank :59

@inproceedings{bib_Anno_2020, AUTHOR = {MOHANTY, GAURAV and MISHRA, PRUTHWIK and Mamidi, Radhika}, TITLE = {Annotated Corpus for Sentiment Analysis in Odia Language}, BOOKTITLE = {International Conference on Language Resources and Evaluation}, YEAR = {2020}}

Abstract

Given the lack of an annotated corpus of non-traditional Odia literature which serves as the standard when it comes to sentiment analysis, we have created an annotated corpus of Odia sentences and made it publicly available to promote research in the field. Secondly, in order to test the usability of the currently available Odia sentiment lexicon, we experimented with various classifiers by training and testing on the sentiment-annotated corpus while using identified affective words from the same as features. Annotation and classification are done at sentence level, as the usage of a sentiment lexicon is best suited to sentiment analysis at this level. The created corpus contains 2045 Odia sentences from the news domain, annotated with sentiment labels using a well-defined annotation scheme. An inter-annotator agreement score of 0.79 is reported for the corpus.
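
The abstract reports an inter-annotator agreement score but does not say which measure was used. As an illustration only, the sketch below computes Cohen's kappa for two annotators, which is one common choice for this kind of sentence-level labeling. The function name and the toy labels are hypothetical and are not taken from the paper or its corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's own label distribution.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy labels (hypothetical, not from the Odia corpus).
annotator_1 = ["pos", "pos", "neg", "neu", "neg", "pos"]
annotator_2 = ["pos", "neg", "neg", "neu", "neg", "pos"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 4))
```

Kappa corrects raw agreement for the agreement expected by chance, so it is lower than the plain fraction of matching labels whenever the label distribution is skewed.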

Manovaad: A Novel Approach to Event Oriented Corpus Creation Capturing Subjectivity and Focus

International Conference on Language Resources and Evaluation, LREC, 2020

Core Rank : B Google Rank :59

@inproceedings{bib_Mano_2020, AUTHOR = {Kameswari, V A Lalitha and Mamidi, Radhika}, TITLE = {Manovaad: A Novel Approach to Event Oriented Corpus Creation Capturing Subjectivity and Focus}, BOOKTITLE = {International Conference on Language Resources and Evaluation}, YEAR = {2020}}

Abstract

In today’s era of globalisation, the increased outreach for every event across the world has been leading to conflicting opinions, arguments and disagreements, often reflected in print media and online social platforms. It is necessary to distinguish factual observations from personal judgements in news, as subjectivity in reporting can influence the audience’s perception of reality. Several studies conducted on the different styles of reporting in journalism are essential in understanding phenomena such as media bias and multiple interpretations of the same event. This domain finds applications in fields such as Media Studies, Discourse Analysis, Information Extraction, Sentiment Analysis, and Opinion Mining. We present an event corpus “Manovaad-v1.0” consisting of 1035 news articles corresponding to 65 events from 3 levels of newspapers viz., Local, National, and International levels. Using this novel format, we correlate the trends in the degree of subjectivity with the geographical closeness of reporting using a Bi-RNN model. We also analyse the role of background and focus in event reporting and capture the focus shift patterns within a global discourse structure for an event. We do this across different levels of reporting and compare the results with the existing work on discourse processing.

A SentiWordNet Strategy for Curriculum Learning in Sentiment Analysis

International Conference on Applications of Natural Language to Information Systems, NLDB, 2020

Core Rank : - Google Rank :18

@inproceedings{bib_A_Se_2020, AUTHOR = {RAO, V ANVESH and ANURANJANA, KAVERI and Mamidi, Radhika}, TITLE = {A SentiWordNet Strategy for Curriculum Learning in Sentiment Analysis}, BOOKTITLE = {International Conference on Applications of Natural Language to Information Systems}, YEAR = {2020}}

Abstract

Curriculum Learning (CL) is the idea that training on samples ordered from easy to difficult improves performance over an otherwise random ordering. The idea parallels cognitive science's theory of how human brains learn, and that learning a difficult task can be made easier by phrasing it as a sequence of easy to difficult tasks. This idea has had traction in machine learning and image processing for a while and, more recently, in Natural Language Processing (NLP). In this paper, we apply the ideas of curriculum learning, driven by SentiWordNet, in a sentiment analysis setting. In this setting, given a text segment, our aim is to extract its sentiment or polarity. SentiWordNet is a lexical resource with sentiment polarity annotations. By comparing performance with other curriculum strategies and with no curriculum, the effectiveness of the proposed strategy is presented. Convolutional, recurrent, and attention-based architectures are employed to assess this improvement. The models are evaluated on a standard sentiment dataset, the Stanford Sentiment Treebank.
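
The easy-to-difficult ordering described above can be sketched as follows. This is a minimal illustration, not the paper's method: the abstract does not say how SentiWordNet scores are turned into a difficulty measure, so the sketch assumes that sentences with stronger average lexicon polarity are easier. The lexicon, the scoring rule, and all names are hypothetical.

```python
# Hypothetical toy lexicon: word -> polarity strength in [0, 1].
# A real setup would derive these values from SentiWordNet.
TOY_LEXICON = {"great": 0.9, "terrible": 0.9, "good": 0.6, "bad": 0.6, "okay": 0.2}


def difficulty(sentence, lexicon=TOY_LEXICON):
    """Assumed difficulty: 1 minus the mean polarity strength of the words."""
    words = sentence.lower().split()
    if not words:
        return 1.0
    strength = sum(lexicon.get(word, 0.0) for word in words) / len(words)
    return 1.0 - strength


def curriculum_order(sentences, lexicon=TOY_LEXICON):
    """Return the sentences sorted from easiest to most difficult."""
    return sorted(sentences, key=lambda s: difficulty(s, lexicon))


samples = ["the plot was okay", "great", "a good film overall", "nothing to say"]
ordered = curriculum_order(samples)
print(ordered)
```

A training loop would then feed batches in this order, or in stages of growing difficulty, instead of shuffling the whole training set.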

BERT-based Ensembles for Modeling Disclosure and Support in Conversational Social Media Text

CEUR Workshop Proceedings, CEUR, 2020

Core Rank : - Google Rank :28

@inproceedings{bib_BERT_2020, AUTHOR = {Dadu, Tanvi and Pant, Kartikey and Mamidi, Radhika}, TITLE = {BERT-based Ensembles for Modeling Disclosure and Support in Conversational Social Media Text}, BOOKTITLE = {CEUR Workshop Proceedings}, YEAR = {2020}}

Abstract

There is a growing interest in understanding how humans initiate and hold conversations. The affective understanding of conversations focuses on the problem of how speakers use emotions to react to a situation and to each other. In the CL-Aff Shared Task, the organizers released the Get it #OffMyChest dataset, which contains Reddit comments from casual and confessional conversations, labeled for their disclosure and supportiveness characteristics. In this paper, we introduce a predictive ensemble model exploiting the fine-tuned contextualized word embeddings RoBERTa and ALBERT. We show that our model outperforms the base models in all considered metrics, achieving an improvement of 3% in the F1 score. We further conduct statistical analysis and outline deeper insights into the given dataset while providing a new characterization of impact for the dataset.
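
The abstract does not specify how the RoBERTa and ALBERT predictions are combined. As one plausible illustration, the sketch below uses weighted soft voting, averaging the class probabilities produced by each base model. The function name, the optional weights, and the example probabilities are assumptions made for illustration, not details from the paper.

```python
def soft_vote(prob_lists, weights=None):
    """Average per-class probabilities across models and pick the top class."""
    if not prob_lists:
        raise ValueError("need at least one model's probabilities")
    n_classes = len(prob_lists[0])
    if any(len(probs) != n_classes for probs in prob_lists):
        raise ValueError("all models must score the same number of classes")
    if weights is None:
        weights = [1.0] * len(prob_lists)  # equal weighting by default
    total = sum(weights)
    averaged = [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=averaged.__getitem__)
    return predicted, averaged


# Hypothetical probabilities for one comment from two fine-tuned models,
# over the classes [no disclosure, disclosure].
model_a = [0.40, 0.60]
model_b = [0.70, 0.30]
label, probs = soft_vote([model_a, model_b])
print(label, probs)
```

Averaging probabilities rather than hard labels lets a confident model outweigh an uncertain one, which is one reason soft voting is a common baseline for combining fine-tuned classifiers.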

Enhancing Bias Detection in Political News Using Pragmatic Presupposition

International Joint Conference on Natural Language Processing Workshop, IJCNLP-W, 2020

Core Rank : - Google Rank :-

@inproceedings{bib_Enha_2020, AUTHOR = {Kameswari, V A Lalitha and Sravani, Dama and Mamidi, Radhika}, TITLE = {Enhancing Bias Detection in Political News Using Pragmatic Presupposition}, BOOKTITLE = {International Joint Conference on Natural Language Processing Workshop}, YEAR = {2020}}

Abstract

Usage of presuppositions in social media and news discourse can be a powerful way to influence the readers as they usually tend to not examine the truth value of the hidden or indirectly expressed information. Fairclough and Wodak (1997) discuss presupposition at a discourse level where some implicit claims are taken for granted in the explicit meaning of a text or utterance. From the Gricean perspective, the presuppositions of a sentence determine the class of contexts in which the sentence could be felicitously uttered. This paper aims to correlate the type of knowledge presupposed in a news article to the bias present in it. We propose a set of guidelines to identify various kinds of presuppositions in news articles and present a dataset consisting of 1050 articles which are annotated for bias (positive, negative or neutral) and the magnitude of presupposition. We introduce a supervised classification approach for detecting bias in political news which significantly outperforms the existing systems.

Detecting Sarcasm in Conversation Context Using Transformer-Based Models

Conference of the Association for Computational Linguistics Workshops, ACL-W, 2020

Core Rank : - Google Rank :-

@inproceedings{bib_Dete_2020, AUTHOR = {AVVARU, ADITHYA and Vobilisetty, Sanath and Mamidi, Radhika}, TITLE = {Detecting Sarcasm in Conversation Context Using Transformer-Based Models}, BOOKTITLE = {Conference of the Association for Computational Linguistics Workshops}, YEAR = {2020}}

Abstract

Sarcasm detection, regarded as one of the subproblems of sentiment analysis, is a very typical task because the introduction of sarcastic words can flip the sentiment of the sentence itself. To date, many research works revolve around detecting sarcasm in a single sentence, and there is very limited research on detecting sarcasm resulting from multiple sentences. Current models use Long Short Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) variants, with or without attention, to detect sarcasm in conversations. We show that models using state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018) to capture syntactic and semantic information across conversation sentences perform better than the current models. Based on the data analysis, we estimated the number of sentences in the conversation that can contribute to the sarcasm, and the results agree with this estimate. We also perform a comparative study of different versions of our BERT-based model against other variants of the LSTM model and XLNet (Yang et al., 2019), both using the estimated number of conversation sentences, and find that the BERT-based models outperform them.

Samajh-Boojh: A Reading Comprehension system in Hindi

International Conference on Natural Language Processing, ICON, 2019

Core Rank : - Google Rank :5

@inproceedings{bib_Sama_2019, AUTHOR = {Vaidya, Shalaka and Adibhatla, Hiranmai Sri and Mamidi, Radhika}, TITLE = {Samajh-Boojh: A Reading Comprehension system in Hindi}, BOOKTITLE = {International Conference on Natural Language Processing}, YEAR = {2019}}

Abstract

This paper presents a novel approach designed to answer questions on a reading comprehension passage. It is an end-to-end system which first focuses on comprehending the given passage, converting the unstructured passage into structured data, and then proceeds to answer the questions related to the passage using solely that structured data. To the best of our knowledge, the proposed design is the first of its kind to account for the entire process of comprehending the passage and then answering the questions associated with it. The comprehension stage converts the passage into a Discourse Collection that comprises the relation shared amongst logical sentences in the given passage along with the key characteristics of each sentence. This design has applications in the academic domain and in query comprehension for speech systems, among others.

Samvaadhana : A Telugu Dialogue System in Hospital Domain

Workshop on Deep Learning Approaches for Low-Resource NLP, DeepLo, 2019

Core Rank : - Google Rank :-

@inproceedings{bib_Samv_2019, AUTHOR = {Duggenpudi, Suma Reddy and Varma, Kusampudi Siva Subrahamanyam and Mamidi, Radhika}, TITLE = {Samvaadhana : A Telugu Dialogue System in Hospital Domain}, BOOKTITLE = {Workshop on Deep Learning Approaches for Low-Resource NLP}, YEAR = {2019}}

Abstract

In this paper, a dialogue system for the hospital domain in Telugu, a resource-poor Dravidian language, has been built. It handles various hospital- and doctor-related queries. The main aim of this paper is to present an approach for modelling a dialogue system in a resource-poor language by combining linguistic and domain knowledge. Focusing on the question answering aspect of the dialogue system, we identified Question Classification and Query Processing as the two most important parts of the dialogue system. Our method combines deep learning techniques for question classification and computational rule-based analysis for query processing. Human evaluation of the system has been performed, as there is no automated evaluation tool for dialogue systems in Telugu. Our system achieves a high overall rating along with a significantly accurate context-capturing method, as shown in the results.