IIITH

New data is indeed helping lexical simplification

International Conference on Intelligent Text Processing and Computational Linguistics, CICLing, 2017

Core Rank : C Google Rank :-

@inproceedings{bib_New__2017, AUTHOR = {PALAKURTHI, ASHISH and Mamidi, Radhika}, TITLE = {New data is indeed helping lexical simplification}, BOOKTITLE = {International Conference on Intelligent Text Processing and Computational Linguistics}, YEAR = {2017}}

Abstract

We propose the use of the Newsela corpus for Complex Word Identification, a sub-problem of Lexical Simplification, and conduct an empirical evaluation by comparing it with benchmark corpora previously employed for this task. Our experiments suggest that the proposed corpus is effective for Complex Word Identification, thus helping Lexical Simplification.

Automatic Generation of Jokes in Hindi

Conference of the Association of Computational Linguistics, ACL, 2017

Core Rank : A* Google Rank :106

@inproceedings{bib_Auto_2017, AUTHOR = {AGGARWAL, SRISHTI and Mamidi, Radhika}, TITLE = {Automatic Generation of Jokes in Hindi}, BOOKTITLE = {Conference of the Association of Computational Linguistics}, YEAR = {2017}}

Abstract

When it comes to computational language generation systems, humour is a relatively unexplored domain, even more so for Hindi (or rather, for most languages other than English). Most researchers agree that a joke consists of two main parts, the setup and the punchline, with humour being encoded in the incongruity between the two. In this paper, we look at Dur se Dekha jokes, a restricted domain of humorous three-liner poetry in Hindi. We analyze their structure to understand how humour is encoded in them and formalize it. We then develop a system that successfully generates a basic form of these jokes.

When does a compliment become sexist? analysis and classification of ambivalent sexism using twitter data

workshop on NLP and computational social science, NLPCSS-W, 2017

Core Rank : - Google Rank :-

@inproceedings{bib_When_2017, AUTHOR = {JHA, AKSHITA and Mamidi, Radhika}, TITLE = {When does a compliment become sexist? analysis and classification of ambivalent sexism using twitter data}, BOOKTITLE = {workshop on NLP and computational social science}, YEAR = {2017}}

Abstract

Sexism is prevalent in today’s society, both offline and online, and poses a credible threat to social equality with respect to gender. According to ambivalent sexism theory (Glick and Fiske, 1996), it comes in two forms: Hostile and Benevolent. While hostile sexism is characterized by an explicitly negative attitude, benevolent sexism is more subtle. Previous work on computationally detecting sexism online is restricted to identifying the hostile form. Our objective is to investigate the less pronounced form of sexism demonstrated online. We achieve this by creating and analyzing a dataset of tweets that exhibit benevolent sexism. Using Support Vector Machines (SVM), sequence-to-sequence models and the FastText classifier, we classify tweets into the ‘Hostile’, ‘Benevolent’ or ‘Others’ class depending on the kind of sexism they exhibit. We achieve an F1-score of 87.22% using the FastText classifier. Our work helps analyze and understand the highly prevalent ambivalent sexism in social media.

Bolly: Annotation of sentiment polarity in bollywood lyrics dataset

Conference of the Pacific Association for Computational Linguistics, PACLING, 2017

Core Rank : C Google Rank :-

@inproceedings{bib_Boll_2017, AUTHOR = {APOORVA, G DRUSHTI and Mamidi, Radhika}, TITLE = {Bolly: Annotation of sentiment polarity in bollywood lyrics dataset}, BOOKTITLE = {Conference of the Pacific Association for Computational Linguistics}, YEAR = {2017}}

Abstract

This work presents a corpus of Bollywood song lyrics and its metadata, annotated with sentiment polarity. We call this BolLy. It contains lyrics of 1055 songs, ranging from those composed in the year 1970 to the most recent ones. All annotation was done manually by three annotators, which makes it a rich dataset for training purposes. In this work, we describe the creation and annotation process, the content, and the possible uses of the dataset. As an experiment, we have built a basic classification system that identifies the emotion polarity of a song based solely on its lyrics; it can serve as a baseline for this task. BolLy can also be used for studying code-mixing with respect to lyrics.

Tag me a label with multi-arm: Active learning for telugu sentiment analysis

International Conference on Big Data Analytics and Knowledge Discovery, ICBDAKD, 2017

Core Rank : - Google Rank :-

@inproceedings{bib_Tag__2017, AUTHOR = {MUKKU, SANDEEP SRICHARAN and REDDY, OOTA SUBBA and Mamidi, Radhika}, TITLE = {Tag me a label with multi-arm: Active learning for telugu sentiment analysis}, BOOKTITLE = {International Conference on Big Data Analytics and Knowledge Discovery}, YEAR = {2017}}

Abstract

Sentiment Analysis is one of the most active research areas in natural language processing and an extensively studied problem in data mining, web mining and text mining for the English language. With the proliferation of social media, the volume of data in regional languages is growing alongside English. Telugu is one such regional language with abundant data available in social media, but it is hard to find a labeled training set, as human annotation is time-consuming and costly. To address this issue, this paper investigates the practicality of active learning for Telugu sentiment analysis. We built a hybrid approach that combines different query selection strategy frameworks to obtain more accurate training instances from limited labeled data. Using a set of classifiers such as SVM, XGBoost, and Gradient Boosted Trees (GBT), we achieved promising results with a minimal error rate.

Actsa: Annotated corpus for telugu sentiment analysis

Workshop on Building Linguistically Generalizable NLP Systems, BLGNLP-W, 2017

Core Rank : - Google Rank :-

@inproceedings{bib_Acts_2017, AUTHOR = {MUKKU, SANDEEP SRICHARAN and Mamidi, Radhika}, TITLE = {Actsa: Annotated corpus for telugu sentiment analysis}, BOOKTITLE = {Workshop on Building Linguistically Generalizable NLP Systems}, YEAR = {2017}}

Abstract

Sentiment analysis deals with the task of determining the polarity of a document or sentence and has received a lot of attention in recent years for the English language. With the rapid growth of social media, a lot of data is available in regional languages besides English. Telugu is one such regional language with abundant data available in social media, but it is hard to find labelled sentence data for Telugu Sentiment Analysis. In this paper, we describe an effort to build a gold-standard annotated corpus of Telugu sentences to support Telugu Sentiment Analysis. The corpus, named ACTSA (Annotated Corpus for Telugu Sentiment Analysis), is a collection of Telugu sentences taken from different sources, which were then pre-processed and manually annotated by native Telugu speakers using our annotation guidelines. In total, we have annotated 5457 sentences, which makes our corpus the largest resource currently available. The corpus and the annotation guidelines are made publicly available.

Building a SentiWordNet For Odia

Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, WASSA, 2017

Core Rank : - Google Rank :-

@inproceedings{bib_Buil_2017, AUTHOR = {MOHANTY, GAURAV and KANNAN, ABISHEK and Mamidi, Radhika}, TITLE = {Building a SentiWordNet For Odia}, BOOKTITLE = {Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis}, YEAR = {2017}}

Abstract

As a discipline of Natural Language Processing, Sentiment Analysis is used to extract and analyze subjective information present in natural language data. The task of Sentiment Analysis has acquired wide commercial uses, including social media monitoring, survey responses, review systems, etc. Languages like English have several resources which aid in the task of Sentiment Analysis. SentiWordNet and Subjectivity WordList are examples of such tools and resources. With more data being available in native vernacular, language-specific SentiWordNets have become essential. For resource-poor languages, creating such SentiWordNets is a difficult task. One solution is to use available resources in English and translate the final source lexicon to the target lexicon via machine translation. Machine translation systems for the English-Odia language pair have not yet been developed. In this paper, we discuss a method to create a SentiWordNet for Odia, which is resource-poor, using only resources that are currently available for Indian languages. The resulting lexicon would serve as a tool for Sentiment Analysis tasks specific to Odia data.

Multi-Arm Active Transfer Learning for Telugu Sentiment Analysis.

European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databa, PKDD/ECML, 2017

Core Rank : A

@inproceedings{bib_Mult_2017, AUTHOR = {REDDY, OOTA SUBBA and Vijayasaradhi, I and Marreddy, Mounika and MUKKU, SANDEEP SRICHARAN and Mamidi, Radhika}, TITLE = {Multi-Arm Active Transfer Learning for Telugu Sentiment Analysis.}, BOOKTITLE = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databa}, YEAR = {2017}}

Abstract

Transfer learning algorithms can be used when a sufficient amount of training data is available in the source domain and limited training data is available in the target domain. The transfer of knowledge from one domain to another requires similarity between the two domains. In many resource-poor languages, it is rare to find labeled training data in both the source and target domains. Active learning algorithms, which query more labels from an oracle, can be used effectively in training the source domain when an oracle is available in the source domain but not in the target domain. Active learning strategies are subjective, as they are designed by humans. Designing a strategy can be time-consuming, and the result can vary from one person to another. To tackle all these problems, we design a learning algorithm that connects transfer learning and active learning with the well-known multi-armed bandit problem by querying …

Domain independent keyword identification for question answering

International Conference on Asian Language Processing, IALP, 2017

Core Rank : - Google Rank :12

@inproceedings{bib_Doma_2017, AUTHOR = {JWALAPURAM, PRATHYUSHA and Mamidi, Radhika}, TITLE = {Domain independent keyword identification for question answering}, BOOKTITLE = {International Conference on Asian Language Processing}, YEAR = {2017}}

Abstract

In this paper, we look at domain-independent keyword identification for natural language queries using statistical methods. We took queries supplemented only by their dependency tags (Stanford Parser) and part-of-speech tags (Stanford POS tagger) and labeled the keywords. We then delexicalised the training data and used the Conditional Random Fields algorithm to learn these labels. We used the queries created by [1] in the course management domain for training, and tested our model on queries from three domains: course management, library and the GeoQueries250 dataset. We report fairly high accuracies of 90.65%, 83.19% and 97.13% respectively, making our model a domain-independent and highly accurate keyword identifier.

“nee intention enti?” towards dialog act recognition in code-mixed conversations

International Conference on Asian Language Processing, IALP, 2017

Core Rank : - Google Rank :12

@inproceedings{bib_“n_2017, AUTHOR = {SAI, J DIVYA and RAGHAVI, CHANDU KHYATHI and HARSHA, PAMIDIPALLI GNANA SRI and Mamidi, Radhika}, TITLE = {“nee intention enti?” towards dialog act recognition in code-mixed conversations}, BOOKTITLE = {International Conference on Asian Language Processing}, YEAR = {2017}}

Abstract

Code-Mixing (CM) is a very commonly observed mode of communication in a multilingual configuration. The trend of using this newly emerging language is especially visible on platforms like social media. This becomes particularly important in the context of technology and health, where expressing upcoming advancements is difficult in the native language. Despite this change in language dynamics, current dialog systems cannot handle a switch between languages across sentences or mixing within a sentence. Everyday conversations take place in this mixed language, and analyzing dialog acts in it is essential to making interaction with personal assistants more natural. The problem is further compounded by the crossing of script barriers in code-mixing. In this paper we take the first step towards understanding code-mixing in dialog …