IIITH

Francis Wilde at SemEval-2023 Task 5: Clickbait spoiler type identification with transformers

International Workshop on Semantic Evaluation, SemEval, 2023

Core Rank : - Google Rank :41

@inproceedings{bib_Fran_2023, AUTHOR = {Vijayasaradhi, I and Kalidindi, Vasudeva Varma}, TITLE = {Francis Wilde at SemEval-2023 Task 5: Clickbait spoiler type identification with transformers}, BOOKTITLE = {International Workshop on Semantic Evaluation}, YEAR = {2023}}

Abstract

Clickbait is the text or a thumbnail image that entices the user to click the accompanying link. Clickbaits employ strategies while deliberately hiding the critical elements of the article and revealing partial information in the title, which arouses sufficient curiosity and motivates the user to click the link. In this work, we identify the kind of spoiler given a clickbait title. We formulate this as a text classification problem. We finetune pretrained transformer models on the title of the post and build models for the clickbait-spoiler classification. We achieve a balanced accuracy of 0.70 which is close to the baseline.
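The balanced accuracy reported above is the mean of per-class recall, which keeps a frequent class from dominating the score. The following is a minimal sketch of that metric, not the authors' evaluation code; the label names and toy predictions are illustrative assumptions.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the unweighted mean of per-class recall."""
    correct = defaultdict(int)  # correctly predicted items per true class
    total = defaultdict(int)    # number of items per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy data: label names are assumed for illustration only.
y_true = ["phrase", "phrase", "phrase", "phrase", "passage", "passage", "multi"]
y_pred = ["phrase", "phrase", "phrase", "passage", "passage", "phrase", "multi"]

# Per-class recall: phrase 3/4, passage 1/2, multi 1/1.
score = balanced_accuracy(y_true, y_pred)
print(score)
```

On this toy data plain accuracy is 5/7, while balanced accuracy averages the three recalls, so errors on the rarer classes weigh more heavily.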

Billy-Batson at SemEval-2023 Task 5: An Information Condensation based System for Clickbait Spoiling

International Workshop on Semantic Evaluation, SemEval, 2023

Core Rank : - Google Rank :41

@inproceedings{bib_Bill_2023, AUTHOR = {Sharma, Anubhav and Joshi, Sagar Sandeep and Abhishek, Tushar and Mamidi, Radhika and Kalidindi, Vasudeva Varma}, TITLE = {Billy-Batson at SemEval-2023 Task 5: An Information Condensation based System for Clickbait Spoiling}, BOOKTITLE = {International Workshop on Semantic Evaluation}, YEAR = {2023}}

Abstract

The Clickbait Challenge targets spoiling the clickbaits using short pieces of information known as spoilers to satisfy the curiosity induced by a clickbait post. The large context of the article associated with the clickbait and differences in the spoiler forms make the task challenging. Hence, to tackle the large context, we propose an Information Condensation-based approach, which prunes down the unnecessary context. Given an article, our filtering module optimised with a contrastive learning objective first selects the paragraphs that are the most relevant to the corresponding clickbait. The resulting condensed article is then fed to the two downstream tasks of spoiler type classification and spoiler generation. We demonstrate and analyze the gains from this approach on both the tasks. Overall, we win the task of spoiler type classification and achieve competitive results on spoiler generation.

IREL at SemEval-2023 Task 11: User Conditioned Modelling for Toxicity Detection in Subjective Tasks

International Workshop on Semantic Evaluation, SemEval, 2023

Core Rank : - Google Rank :41

@inproceedings{bib_IREL_2023, AUTHOR = {Maity, Ankita and Kumar, Kandru Siri Venkata Pavan and Singh, Bhavyajeet and Hari, Kancharla Aditya and Kalidindi, Vasudeva Varma}, TITLE = {IREL at SemEval-2023 Task 11: User Conditioned Modelling for Toxicity Detection in Subjective Tasks}, BOOKTITLE = {International Workshop on Semantic Evaluation}, YEAR = {2023}}

Abstract

This paper describes our system used in the SemEval-2023 Task 11 Learning With Disagreements (Le-Wi-Di). This is a subjective task since it deals with detecting hate speech, misogyny and offensive language. Thus, disagreement among annotators is expected. We experiment with different settings like loss functions specific for subjective tasks and include anonymized annotator-specific information to help us understand the level of disagreement. We perform an in-depth analysis of the performance discrepancy of these different modelling choices. Our system achieves a cross-entropy of 0.58, 4.01 and 3.70 on the test sets of HS-Brexit, ArMIS and MD-Agreement, respectively. Our code implementation is publicly available.
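When annotators disagree, a common way to score a model is cross-entropy against a soft label built from the annotator votes. The sketch below assumes that soft-label form for a binary task; the function names, the vote list and the predicted distribution are hypothetical and are not taken from the paper's code.

```python
import math


def soft_label(votes):
    """Turn binary annotator votes (0 or 1) into a two-class distribution."""
    p_positive = sum(votes) / len(votes)
    return [1.0 - p_positive, p_positive]


def soft_cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a soft target and a predicted distribution."""
    # eps guards against log(0) when a predicted probability is exactly zero.
    return -sum(t * math.log(max(q, eps)) for t, q in zip(target, predicted))


# Hypothetical item: four of six annotators marked the text as offensive.
votes = [1, 0, 0, 1, 1, 1]
target = soft_label(votes)
loss = soft_cross_entropy(target, [0.4, 0.6])
print(target, loss)
```

Lower values mean the predicted distribution is closer to the distribution of annotator judgments, so a model is rewarded for matching the level of disagreement and not only the majority label.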

Tenzin-Gyatso at SemEval-2023 Task 4: Identifying Human Values behind Arguments using DeBERTa

International Workshop on Semantic Evaluation, SemEval, 2023

Core Rank : - Google Rank :41

@inproceedings{bib_Tenz_2023, AUTHOR = {Kumar, Kandru Siri Venkata Pavan and Singh, Bhavyajeet and Maity, Ankita and Hari, Kancharla Aditya and Kalidindi, Vasudeva Varma}, TITLE = {Tenzin-Gyatso at SemEval-2023 Task 4: Identifying Human Values behind Arguments using DeBERTa}, BOOKTITLE = {International Workshop on Semantic Evaluation}, YEAR = {2023}}

Abstract

Identifying human values behind arguments is a complex task which requires understanding of premise, stance and conclusion together. We propose a method that uses a pre-trained language model, DeBERTa, to tokenize and concatenate the text before feeding it into a fully connected neural network. We also show that leveraging the hierarchy in values improves the performance by 0.14 F1 score compared to only using level 2 values. Our code is made publicly available.

iREL at SemEval-2023 Task 10: Multi-level Training for Explainable Detection of Online Sexism

International Workshop on Semantic Evaluation, SemEval, 2023

Core Rank : - Google Rank :41

@inproceedings{bib_iREL_2023, AUTHOR = {C, Nirmal Manoj and Joshi, Sagar Sandeep and Maity, Ankita and Kalidindi, Vasudeva Varma}, TITLE = {iREL at SemEval-2023 Task 10: Multi-level Training for Explainable Detection of Online Sexism}, BOOKTITLE = {International Workshop on Semantic Evaluation}, YEAR = {2023}}

Abstract

This paper describes our approach for SemEval-2023 Task 10: Explainable Detection of Online Sexism (EDOS). The task deals with identification and categorization of sexist content into fine-grained categories for explainability in sexism classification. The explainable categorization is proposed through a set of three hierarchical tasks that constitute a taxonomy of sexist content, each task being more granular than the former for categorization of the content. Our team (iREL) participated in all three hierarchical subtasks. Considering the inter-connected task structure, we study multilevel training to study the transfer learning from coarser to finer tasks. Our experiments based on pretrained transformer architectures also make use of additional strategies such as domain-adaptive pretraining to adapt our models to the nature of the content dealt with, and use of the focal loss objective for handling class imbalances. Our best-performing systems on the three tasks achieve macro-F1 scores of 85.93, 69.96 and 54.62 on their respective validation sets.
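The focal loss mentioned above handles class imbalance by down-weighting examples the model already classifies confidently. The sketch below shows the standard single-example form, FL = -(1 - p_t)^gamma * log(p_t); the default gamma of 2.0 and the toy probabilities are assumptions for illustration, not the settings used in the paper.

```python
import math


def cross_entropy(probs, target):
    """Standard cross-entropy for one example, given class probabilities."""
    return -math.log(probs[target])


def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one example: cross-entropy scaled by (1 - p_t) ** gamma."""
    p_t = probs[target]  # probability assigned to the true class
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Hypothetical predictions: an easy example and a hard example, true class 0.
easy = [0.9, 0.1]
hard = [0.3, 0.7]

for name, probs in (("easy", easy), ("hard", hard)):
    print(name, cross_entropy(probs, 0), focal_loss(probs, 0))
```

With gamma set to 0 the expression reduces to ordinary cross-entropy; larger values shrink the contribution of well-classified examples so that rare or difficult classes dominate the gradient.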

Summarizing Indian Languages using Multilingual Transformers based Models

Technical Report, arXiv, 2023

Core Rank : - Google Rank :-

@inproceedings{bib_Summ_2023, AUTHOR = {Taunk, Dhaval and Kalidindi, Vasudeva Varma}, TITLE = {Summarizing Indian Languages using Multilingual Transformers based Models}, BOOKTITLE = {Technical Report}, YEAR = {2023}}

Abstract

With the advent of multilingual models like mBART, mT5 and IndicBART, summarization in low-resource Indian languages is receiving a lot of attention nowadays. However, the number of available datasets is still low. In this work, we (Team HakunaMatata) study how these multilingual models perform on datasets that have Indian languages as source and target text for summarization. We experimented with the IndicBART and mT5 models and report ROUGE-1, ROUGE-2, ROUGE-3 and ROUGE-4 scores as performance metrics.

XWikiGen: Cross-lingual Summarization for Encyclopedic Text Generation in Low Resource Languages

International Conference on World wide web, WWW, 2023

Core Rank : A* Google Rank :112

@inproceedings{bib_XWik_2023, AUTHOR = {Taunk, Dhaval and Rajendra, Sagare Shivprasad and Patil, Anupam and Shivansh, S and Gupta, Manish and Kalidindi, Vasudeva Varma}, TITLE = {XWikiGen: Cross-lingual Summarization for Encyclopedic Text Generation in Low Resource Languages}, BOOKTITLE = {International Conference on World wide web}, YEAR = {2023}}

Abstract

Lack of encyclopedic text contributors, especially on Wikipedia, makes automated text generation for low resource (LR) languages a critical problem. Existing work on Wikipedia text generation has focused on English only where English reference articles are summarized to generate English Wikipedia pages. But, for low-resource languages, the scarcity of reference articles makes monolingual summarization ineffective in solving this problem. Hence, in this work, we propose XWikiGen, which is the task of cross-lingual multi-document summarization of text from multiple reference articles, written in various languages, to generate Wikipedia-style text. Accordingly, we contribute a benchmark dataset, XWikiRef, spanning ∼69K Wikipedia articles covering five domains and eight languages. We harness this dataset to train a two-stage system where the input is a set of citations and a section title and the output is a section-specific LR summary. The proposed system is based on a novel idea of neural unsupervised extractive summarization to coarsely identify salient information followed by a neural abstractive model to generate the section-specific text. Extensive experiments show that multi-domain training is better than the multi-lingual setup on average. We make our code and dataset publicly available.

LLM-RM at SemEval-2023 Task 2: Multilingual Complex NER using XLM-RoBERTa

Technical Report, arXiv, 2023

Core Rank : - Google Rank :-

@inproceedings{bib_LLM-_2023, AUTHOR = {Mehta, Rahul and Kalidindi, Vasudeva Varma}, TITLE = {LLM-RM at SemEval-2023 Task 2: Multilingual Complex NER using XLM-RoBERTa}, BOOKTITLE = {Technical Report}, YEAR = {2023}}

Abstract

Named Entity Recognition (NER) is a task of recognizing entities at a token level in a sentence. This paper focuses on solving NER tasks in a multilingual setting for complex named entities. Our team, LLM-RM, participated in the recently organized SemEval 2023 task, Task 2: MultiCoNER II, Multilingual Complex Named Entity Recognition. We approach the problem by leveraging cross-lingual representation provided by fine-tuning XLM-Roberta base model on datasets of all of the 12 languages provided: Bangla, Chinese, English, Farsi, French, German, Hindi, Italian, Portuguese, Spanish, Swedish and Ukrainian.

Neural models for Factual Inconsistency Classification with Explanations

Technical Report, arXiv, 2023

Core Rank : - Google Rank :-

@inproceedings{bib_Neur_2023, AUTHOR = {Raha, Tathagata and Choudhary, Mukund and Menon, Abhinav S and Gupta, Harshit and Srivatsa, K V Aditya and Gupta, Manish and Kalidindi, Vasudeva Varma}, TITLE = {Neural models for Factual Inconsistency Classification with Explanations}, BOOKTITLE = {Technical Report}, YEAR = {2023}}

Abstract

Factual consistency is one of the most important requirements when editing high quality documents. It is extremely important for automatic text generation systems like summarization, question answering, dialog modeling, and language modeling. Still, automated factual inconsistency detection is rather under-studied. Existing work has focused on (a) finding fake news keeping a knowledge base in context, or (b) detecting broad contradiction (as part of natural language inference literature). However, there has been no work on detecting and explaining types of factual inconsistencies in text, without any knowledge base in context. In this paper, we leverage existing work in linguistics to formally define five types of factual inconsistencies. Based on this categorization, we contribute a novel dataset, FICLE (Factual Inconsistency CLassification with Explanation), with ~8K samples where each sample consists of two sentences (claim and context) annotated with type and span of inconsistency. When the inconsistency relates to an entity type, it is labeled as well at two levels (coarse and fine-grained). Further, we leverage this dataset to train a pipeline of four neural models to predict inconsistency type with explanations, given a (claim, context) sentence pair. Explanations include inconsistent claim fact triple, inconsistent context span, inconsistent claim component, coarse and fine-grained inconsistent entity types. The proposed system first predicts inconsistent spans from claim and context; and then uses them to predict inconsistency types and inconsistent entity types (when inconsistency is due to entities). We experiment with multiple Transformer …

iREL at SemEval-2023 Task 9: Improving understanding of multilingual Tweets using Translation-Based Augmentation and Domain Adapted Pre-Trained Models

International Workshop on Semantic Evaluation, SemEval, 2023

Core Rank : - Google Rank :41

@inproceedings{bib_iREL_2023, AUTHOR = {Singh, Bhavyajeet and Maity, Ankita and Kumar, Kandru Siri Venkata Pavan and Hari, Kancharla Aditya and Kalidindi, Vasudeva Varma}, TITLE = {iREL at SemEval-2023 Task 9: Improving understanding of multilingual Tweets using Translation-Based Augmentation and Domain Adapted Pre-Trained Models}, BOOKTITLE = {International Workshop on Semantic Evaluation}, YEAR = {2023}}

Abstract

This paper describes our system (iREL) for Tweet intimacy analysis shared task of the SemEval 2023 workshop at ACL 2023. Our system achieved an overall Pearson’s r score of 0.5924 and ranked 10th on the overall leaderboard. For the unseen languages, we ranked third on the leaderboard and achieved a Pearson’s r score of 0.485. We used a single multilingual model for all languages, as discussed in this paper. We provide a detailed description of our pipeline along with multiple ablation experiments to further analyse each component of the pipeline. We demonstrate how translation-based augmentation, domain-specific features, and domain-adapted pretrained models improve the understanding of intimacy in tweets. The code can be found at https://github.com/bhavyajeet/Multilingualtweet-intimacy
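Pearson's r, the metric quoted above, measures the linear correlation between predicted and gold intimacy scores. The sketch below computes it from first principles; the score lists are made-up values for illustration and do not come from the task data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, left unnormalised because the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical gold and predicted intimacy scores for five tweets.
gold = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 1.8, 3.2, 3.9, 4.6]
print(pearson_r(gold, predicted))
```

Because the coefficient is invariant to shifting and rescaling the predictions, it rewards a model for ranking and spacing tweets correctly even if its raw scores are on a different scale from the annotations.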