IIITH

“The Times They Are-a-Changin”: The Effect of the Covid-19 Pandemic on Online Music Sharing in India

International Conference on Social Informatics, SocInfo, 2022

Core Rank: - | Google Rank: -

@inproceedings{bib_“T_2022, AUTHOR = {Kamble, Tanvi and Desur, Pooja Govind and Krause, Amanda and Kumaraguru, Ponnurangam and R, Vinoo A}, TITLE = {“The Times They Are-a-Changin”: The Effect of the Covid-19 Pandemic on Online Music Sharing in India}, BOOKTITLE = {International Conference on Social Informatics}, YEAR = {2022}}

Abstract

Music sharing trends have been shown to change during times of socio-economic crises. Studies have also shown that music can act as a social surrogate, significantly reducing loneliness by acting as an empathetic friend. We explored these phenomena through a novel study of online music sharing in India during the Covid-19 pandemic. We collected tweets (n = 1,364) from the popular social media platform Twitter during India’s first and second waves of the pandemic. We examined the different ways in which music fulfilled the role of a social surrogate by analyzing tweet text with Natural Language Processing techniques. Additionally, we analyzed the emotional connotations of the shared music through its acoustic features and lyrical content, and compared the results between pandemic and pre-pandemic times. We observed that the role of music shifted toward a more community-focused function rather than a self-serving utility. Compared with songs shared during pre-Covid times, the music people shared during the Covid-19 pandemic had lower valence and dealt with topics reflecting turbulent times, such as Hardship and Exclusion. The results are further discussed in the context of individualistic versus collectivistic cultures.

Leveraging Intra and Inter Modality Relationship for Multimodal Fake News Detection

International Conference on World Wide Web, WWW, 2022

Core Rank: A* | Google Rank: 112

@inproceedings{bib_Leve_2022, AUTHOR = {Singhal, Shivangi and Pandey, Tanisha and Mrig, Saksham and Shah, Rajiv Ratn and Kumaraguru, Ponnurangam}, TITLE = {Leveraging Intra and Inter Modality Relationship for Multimodal Fake News Detection}, BOOKTITLE = {International Conference on World Wide Web}, YEAR = {2022}}

Abstract

Recent years have witnessed massive growth in the proliferation of fake news online. User-generated content blends text and visual information, giving rise to different variants of fake news. As a result, researchers have started developing multimodal methods for fake news detection. Existing methods capture high-level information from different modalities and model them jointly to reach a decision. Given multiple input modalities, we hypothesize that not all modalities are equally responsible for decision-making. Hence, this paper presents a novel architecture that identifies and suppresses information from weaker modalities and extracts relevant information from the strong modality on a per-sample basis. We also establish intra-modality relationships by extracting fine-grained image and text features. We conduct extensive experiments on real-world datasets to show that our approach outperforms the state of the art by an average of 3.05% in accuracy and 4.525% in F1-score. We also release the code, implementation details, and model checkpoints for the community’s interest.
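The abstract does not give the architecture's equations, so the following is only a generic illustration of per-sample modality weighting, not the authors' model. Each modality receives a scalar relevance score (supplied by hand here; in a trained model a learned layer would produce it), the scores become softmax weights, and the weaker modality's features are down-weighted in the fused vector. The function names, feature values, and scores are hypothetical.

```python
import math

def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gated_fusion(features: list[list[float]], scores: list[float]) -> tuple[list[float], list[float]]:
    """Weight each modality's feature vector by its per-sample softmax weight and sum them.

    Hypothetical sketch: `scores` stands in for the output of a learned scoring layer.
    """
    weights = softmax(scores)
    fused = [0.0] * len(features[0])
    for w, vec in zip(weights, features):
        for i, x in enumerate(vec):
            fused[i] += w * x
    return fused, weights

# One sample: toy text and image features, with the text judged more informative.
text_feat = [1.0, 0.0]
image_feat = [0.0, 1.0]
fused, weights = gated_fusion([text_feat, image_feat], scores=[2.0, 0.0])
print(weights)  # roughly 0.88 for text and 0.12 for image: the image is suppressed
print(fused)
```

Because the weights are recomputed for every sample, a post whose image is uninformative can lean on its text, and vice versa.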

TweetBoost: Influence of Social Media on NFT Valuation

International Conference on World Wide Web, WWW, 2022

Core Rank: A* | Google Rank: 112

@inproceedings{bib_Twee_2022, AUTHOR = {Kapoor, Arnav and Guhathakurta, Dipanwita and Mathur, Mehul and Yadav, Rupanshu and Gupta, Manish and Kumaraguru, Ponnurangam}, TITLE = {TweetBoost: Influence of Social Media on NFT Valuation}, BOOKTITLE = {International Conference on World Wide Web}, YEAR = {2022}}

Abstract

An NFT (Non-Fungible Token) is a token that certifies a digital asset as unique. A wide range of assets, including digital art, music, tweets, and memes, are being sold as NFTs, and NFT-related content has been widely shared on social media sites such as Twitter. We aim to understand the dominant factors that influence NFT asset valuation. Toward this objective, we create a first-of-its-kind dataset linking Twitter and OpenSea (the largest NFT marketplace) to capture social media profiles and the NFT assets linked to them. Our dataset contains 245,159 tweets posted by 17,155 unique users, directly linking 62,997 NFT assets on OpenSea worth 19 million USD. We have made the dataset public. We analyze the growth of NFTs, characterize the Twitter users promoting NFT assets, and gauge the impact of Twitter features on the virality of an NFT. Further, we investigate the effectiveness of different social media and NFT platform features by experimenting with multiple machine learning and deep learning models to predict an asset's value. Our results show that social media features improve accuracy by 6% over baseline models that use only NFT platform features. Among social media features, the count of user membership lists and the numbers of likes and retweets are important features.

Contrastive Personalization Approach to Suspect Identification (Student Abstract)

AAAI Conference on Artificial Intelligence, AAAI, 2022

Core Rank: A* | Google Rank: 220

@inproceedings{bib_Cont_2022, AUTHOR = {Gupta, Devansh and Bhasin, Drishti and Bhagat, Sarthak and Uppal, Shagun and Kumaraguru, Ponnurangam and Shah, Rajiv Ratn}, TITLE = {Contrastive Personalization Approach to Suspect Identification (Student Abstract)}, BOOKTITLE = {AAAI Conference on Artificial Intelligence}, YEAR = {2022}}

Abstract

Targeted image retrieval has long been a challenging problem: each person perceives image features differently, which leads to inconsistency among users in describing the details of a particular image. As a result, each user needs a system personalized to the way they have structured the image in their mind. One important application of this task is suspect identification in forensic investigations, where a witness needs to identify the suspect from an existing criminal database. Existing methods either require attributes for each image or suffer from poor latency during training and inference. We propose a new approach that tackles this problem through explicit relevance feedback by introducing a novel loss function and a corresponding scoring function. To this end, we leverage contrastive learning on the user feedback to generate the next set of suggested images, improving the level of personalization with each feedback iteration.

HashSet - A Dataset For Hashtag Segmentation

International Conference on Language Resources and Evaluation, LREC, 2022

Core Rank: B | Google Rank: 59

@inproceedings{bib_Hash_2022, AUTHOR = {Kodali, Prashant and Bhatnagar, Akshala and Ahuja, Naman and Shrivastava, Manish and Kumaraguru, Ponnurangam}, TITLE = {HashSet - A Dataset For Hashtag Segmentation}, BOOKTITLE = {International Conference on Language Resources and Evaluation}, YEAR = {2022}}

Abstract

Hashtag segmentation is the task of breaking a hashtag into its constituent tokens. Hashtags often encode the essence of user-generated posts, along with information such as topic and sentiment, which is useful in downstream tasks. Hashtags prioritize brevity and are written in unique ways: transliterating and mixing languages, varying spellings, and coining creative named entities. The benchmark datasets used for hashtag segmentation (STAN, BOUN) are small and extracted from a single set of tweets. However, datasets should reflect the variations in how hashtags are written and account for domain and language specificity; otherwise, the results will misrepresent model performance. We argue that model performance should be assessed on a wider variety of hashtags and that datasets should be carefully curated. To this end, we propose HashSet, a dataset comprising a) a 1.9k manually annotated dataset and b) a 3.3M loosely supervised dataset. HashSet is sampled from a different set of tweets than existing datasets and provides an alternate distribution of hashtags for building and validating hashtag segmentation models. We show that the performance of SOTA models for hashtag segmentation drops substantially on the proposed dataset, indicating that it provides an alternate set of hashtags for training and assessing models. Datasets and results are publicly released at https://github.com/prashantkodali/HashSet
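For reference, the following sketches a classic segmentation baseline (not one of the SOTA models evaluated on HashSet): a dynamic program that picks the split maximizing the sum of unigram log-probabilities. The vocabulary counts and the out-of-vocabulary penalty are hypothetical choices for illustration.

```python
import math

def segment(hashtag: str, unigram_counts: dict[str, int]) -> list[str]:
    """Split a hashtag into words by maximizing the sum of unigram log-probabilities."""
    text = hashtag.lstrip("#").lower()
    total = sum(unigram_counts.values())
    max_len = max(len(w) for w in unigram_counts)

    def word_logprob(word: str) -> float:
        count = unigram_counts.get(word, 0)
        if count > 0:
            return math.log(count / total)
        # Out-of-vocabulary span (hypothetical penalty): grows with length,
        # so unknown material is split off only when nothing better exists.
        return math.log(1.0 / total) - 10.0 * len(word)

    # best[i] = (score, words) for the best segmentation of text[:i].
    best: list[tuple[float, list[str]]] = [(0.0, [])]
    for end in range(1, len(text) + 1):
        candidates = []
        for start in range(max(0, end - max_len), end):
            prev_score, prev_words = best[start]
            word = text[start:end]
            candidates.append((prev_score + word_logprob(word), prev_words + [word]))
        best.append(max(candidates, key=lambda c: c[0]))
    return best[-1][1]

# Toy counts for illustration only; a real system would estimate them from a large corpus.
toy_counts = {"stay": 50, "home": 60, "save": 40, "lives": 30, "stays": 5, "ave": 3}
print(segment("#StayHomeSaveLives", toy_counts))  # ['stay', 'home', 'save', 'lives']
```

A baseline like this depends entirely on its vocabulary, which is one reason transliterated, code-mixed, or creatively spelled hashtags are hard for it.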

SyMCoM - Syntactic Measure of Code Mixing A Study Of English-Hindi Code-Mixing

Findings of the Association for Computational Linguistics, FACL, 2022

Core Rank: A | Google Rank: -

@inproceedings{bib_SyMC_2022, AUTHOR = {Kodali, Prashant and Goel, Anmol and Choudhury, Monojit and Shrivastava, Manish and Kumaraguru, Ponnurangam}, TITLE = {SyMCoM - Syntactic Measure of Code Mixing A Study Of English-Hindi Code-Mixing}, BOOKTITLE = {Findings of the Association for Computational Linguistics}, YEAR = {2022}}

Abstract

Code mixing is the linguistic phenomenon in which bilingual speakers switch between two or more languages in conversation. Recent work on code mixing in computational settings has leveraged code-mixed social media text to train NLP models. To capture the variety of code mixing within and across corpora, measures based on Language ID (LID) tags, such as the Code-Mixing Index (CMI), have been proposed. The syntactic variety and patterns of code mixing, and their relationship to the performance of computational models, remain underexplored. In this work, we investigate a collection of English (en)-Hindi (hi) code-mixed datasets through a syntactic lens to propose SyMCoM, an indicator of syntactic variety in code-mixed text with intuitive theoretical bounds. We train a state-of-the-art en-hi PoS tagger, with an accuracy of 93.4%, to reliably compute PoS tags on a corpus, and we demonstrate the utility of SyMCoM by applying it to various syntactic categories on a collection of datasets and comparing datasets using the measure.
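To make the LID-based measures mentioned above concrete, here is a sketch of the utterance-level CMI as it is commonly defined in the code-mixing literature. It is not SyMCoM, whose definition the abstract does not spell out, and the tag names ("en", "hi", "univ", "ne") are assumptions about the LID tag set.

```python
from collections import Counter

def code_mixing_index(lid_tags: list[str], neutral: frozenset[str] = frozenset({"univ", "ne"})) -> float:
    """Utterance-level Code-Mixing Index (CMI) computed from per-token LID tags.

    CMI = 100 * (1 - max(w_i) / (n - u)) if n > u, else 0, where n is the number of
    tokens, u the number of language-independent tokens (assumed tags in `neutral`),
    and max(w_i) the token count of the most frequent language.
    """
    n = len(lid_tags)
    lang_tags = [t for t in lid_tags if t not in neutral]
    u = n - len(lang_tags)
    if n == u:  # no language-tagged tokens (also covers the empty utterance)
        return 0.0
    max_w = max(Counter(lang_tags).values())
    return 100.0 * (1.0 - max_w / (n - u))

# Hypothetical tags for a six-token utterance: three Hindi, two English, one neutral.
tags = ["hi", "hi", "en", "univ", "hi", "en"]
print(code_mixing_index(tags))
```

Here n = 6, u = 1, and Hindi is the dominant language with three tokens, so CMI = 100 × (1 − 3/5) = 40; a monolingual utterance scores 0. Because CMI looks only at language labels, two utterances with the same tag counts score identically regardless of which syntactic categories are switched, which is the kind of gap a syntactic measure can address.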

Diagnosing Data from ICTs to Provide Focused Assistance in Agricultural Adoptions

International Conference on Information and Communication Technologies and Development, ICTD, 2022

Core Rank: C | Google Rank: 20

@inproceedings{bib_Diag_2022, AUTHOR = {Singh, Ashwin and Garg, Lokesh and Arya, Erica and Subramanian, Mallika and Agarwal, Anmol and Priyadarshi, Pratyush Pratap and Gupta, Shrey and Garimella, Kiran and Kumaraguru, Ponnurangam and Kumar, Sanjeev and Kumar, Ritesh}, TITLE = {Diagnosing Data from ICTs to Provide Focused Assistance in Agricultural Adoptions}, BOOKTITLE = {International Conference on Information and Communication Technologies and Development}, YEAR = {2022}}

Abstract

In the last two decades, Information and Communication Technologies (ICTs) have played a pivotal role in empowering rural populations in India by making knowledge more accessible. Digital Green is one such ICT: it employs a participatory approach with smallholder farmers to produce instructional agricultural videos with content specific to them. With the help of human mediators, it disseminates these videos to farmers using projectors to improve the adoption of agricultural practices. Digital Green’s web-based data tracker (CoCo) stores the attendance and adoption logs of millions of farmers, the videos screened to them, and their demographic information. In our work, we leverage ten years of this data (2010-2020) from the five states in India where Digital Green is most active and use it to conduct a holistic evaluation of the ICT. First, we find disparities in the adoption rates of farmers, and we then use statistical tests to identify the factors behind these disparities as well as gender-based inequalities. We find that farmers with higher adoption rates adopt videos of shorter duration and belong to smaller villages. Second, to provide assistance to farmers facing challenges, we model the adoption of practices from a video as a prediction problem and experiment with different model architectures. Our classifier achieves accuracies ranging from 79% to 90% across the five states, demonstrating its potential for assisting future ethnographic investigations. Third, we use SHAP values in conjunction with our model to explain the impact of various network, content, and demographic features on adoption. We find that farmers benefit greatly from past adopters of a video from their group and village. We also discover that videos with low content specificity benefit some farmers more than others.
Next, we highlight the implications of our findings by translating them into recommendations for providing focused assistance, community building, video screening, revisiting the participatory approach, and mitigating inequalities. Lastly, we conclude with a discussion of how our work can assist future investigations into the lived experiences of farmers.
CCS Concepts: • Human-centered computing → Empirical studies in collaborative and social computing. Additional Key Words and Phrases: Diagnosis, ICT4D, Agriculture, Social Networks

Learning to Automate Follow-up Question Generation using Process Knowledge for Depression Triage on Reddit Posts

Workshop on Computational Linguistics and Clinical Psychology, CLPsych-w, 2022

Core Rank: - | Google Rank: -

@inproceedings{bib_Lear_2022, AUTHOR = {Gupta, Shrey and Agarwal, Anmol and Gaur, Manas and Roy, Kaushik and Narayanan, Vignesh and Kumaraguru, Ponnurangam and Sheth, Amit}, TITLE = {Learning to Automate Follow-up Question Generation using Process Knowledge for Depression Triage on Reddit Posts}, BOOKTITLE = {Workshop on Computational Linguistics and Clinical Psychology}, YEAR = {2022}}

Abstract

Conversational Agents (CAs) powered by deep language models (DLMs) have shown tremendous promise in the domain of mental health. Prominently, CAs have been used to provide informational or therapeutic services (e.g., cognitive behavioral therapy) to patients. However, the utility of CAs in assisting with mental health triage has not been explored in existing work, as it requires the controlled generation of follow-up questions (FQs), which in clinical settings are often initiated and guided by mental health professionals (MHPs). In the context of ‘depression’, our experiments show that DLMs coupled with process knowledge encoded in a mental health questionnaire generate FQs that are 12.54% and 9.37% better, as measured by similarity and longest common subsequence matches to questions in the PHQ-9 dataset respectively, than DLMs without process knowledge support. Despite coupling with process knowledge, we find that DLMs are still prone to hallucination, i.e., generating redundant, irrelevant, and unsafe FQs. We demonstrate the challenge of using existing datasets to train a DLM to generate FQs that adhere to clinical process knowledge. To address this limitation, we prepared an extended PHQ-9-based dataset, PRIMATE, in collaboration with MHPs. PRIMATE contains annotations indicating whether a particular question in the PHQ-9 dataset has already been answered in the user’s initial description of the mental health condition. We used PRIMATE to train a DLM in a supervised setting to identify which of the PHQ-9 questions can be answered directly from the user’s
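The abstract reports gains measured by similarity and longest common subsequence (LCS) matches against PHQ-9 questions, but not the exact formula. A ROUGE-L-style LCS F1 is one common way to score such matches; the sketch below assumes that formulation and simple whitespace tokenization, either of which may differ from the paper's setup. The generated question is invented for illustration.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (O(len(a) * len(b)) DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match; otherwise carry the better of up/left.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def lcs_f1(candidate: str, reference: str) -> float:
    """ROUGE-L-style F1: harmonic mean of LCS precision and recall over whitespace tokens."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

generated = "have you been feeling down or hopeless"  # hypothetical model output
reference = "feeling down depressed or hopeless"      # PHQ-9 item 2, punctuation removed
print(lcs_f1(generated, reference))
```

The shared subsequence is "feeling down or hopeless" (4 tokens), giving precision 4/7, recall 4/5, and an F1 of 2/3.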

Understanding the Impact of Awards on Award Winners and the Community on Reddit

IEEE International Conference on Advances in Social Networks Analysis and Mining, ASONAM, 2022

Core Rank: B | Google Rank: -

@inproceedings{bib_Unde_2022, AUTHOR = {Tulasi, Avinash and Mondal, Mainack and Buduru, Arun Balaji and Kumaraguru, Ponnurangam}, TITLE = {Understanding the Impact of Awards on Award Winners and the Community on Reddit}, BOOKTITLE = {IEEE International Conference on Advances in Social Networks Analysis and Mining}, YEAR = {2022}}

Abstract

Non-financial incentives in the form of awards often act as a driver of positive reinforcement and an elevation of social status in the offline world. The elevated social status results in people becoming more active, in line with a change in the community's expectations. However, in the online world, the impact of such awards, in terms of the longevity of the awardees' social influence and the community's acceptance of them as leaders, is not well understood. Our work aims to shed light on the impact of these awards on the awardee and the community. We focus on three large subreddits, with a snapshot of 219K posts and 5.8 million comments contributed by 88K Reddit users who received 14,146 awards. Our work establishes that the behaviour of awardees changes statistically significantly for a short time after receiving an award; however, the change is ephemeral, since awardees return to their pre-award behaviour within days. Additionally, via a user survey, we identified a long-lasting impact of awards: the community's stance toward awardees softened.

The Pursuit of Being Heard: An Unsupervised Approach to Narrative Detection in Online Protest

IEEE International Conference on Advances in Social Networks Analysis and Mining, ASONAM, 2022

Core Rank: B | Google Rank: -

@inproceedings{bib_The__2022, AUTHOR = {Neha, Kumari and Agrawal, Vibhu and Buduru, Arun Balaji and Kumaraguru, Ponnurangam}, TITLE = {The Pursuit of Being Heard: An Unsupervised Approach to Narrative Detection in Online Protest}, BOOKTITLE = {IEEE International Conference on Advances in Social Networks Analysis and Mining}, YEAR = {2022}}

Abstract

Protests and mass mobilizations are scarce; however, they may lead to dramatic outcomes when they occur. Social media platforms such as Twitter have become a focal point for the organization and development of online protests worldwide. It is therefore crucial to decipher the various narratives shared during an online protest to understand people’s perceptions. In this work, we propose an unsupervised clustering-based framework to understand the narratives present in a given online protest. Through a comparative analysis of tweet clusters in three protests around government policy bills, we contribute novel insights about narratives shared during an online protest. Across case studies of government-policy-induced online protests in India and the United Kingdom, we found familiar mass mobilization narratives across protests. In all three protests under study, we found narrative clusters reporting on-ground activities and calling for people’s participation. We also found protest-centric narratives in different protests, such as skepticism around the topic. The results of our analysis can be used to understand and compare people’s perceptions of future mass mobilizations. Index Terms: Social Media Protest, Unsupervised clustering, Protests, Narratives, Twitter
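The abstract does not specify the text representation or clustering algorithm, so the following is a generic sketch of unsupervised narrative clustering under stated assumptions: tweets become L2-normalized TF-IDF vectors (with a smoothed IDF), and a k-means variant that uses cosine similarity (often called spherical k-means) groups them. This is not the authors' framework, and the example tweets are invented.

```python
import math
from collections import Counter

def normalize(vec: dict[str, float]) -> dict[str, float]:
    """Scale a sparse vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm > 0 else vec

def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Unit-length TF-IDF vectors as sparse dicts, using idf = log((1 + n) / (1 + df)) + 1."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    return [
        normalize({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in Counter(tokens).items()})
        for tokens in tokenized
    ]

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both inputs are unit-length, so the dot product equals the cosine similarity.
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def cluster(docs: list[str], k: int, iterations: int = 10) -> list[int]:
    """Spherical k-means over TF-IDF vectors; returns one cluster label per document."""
    vectors = tfidf_vectors(docs)
    # Farthest-point initialisation: start from the first document, then repeatedly
    # add the document least similar to every centroid chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if not members:
                continue  # keep the previous centroid for an empty cluster
            summed: Counter = Counter()
            for v in members:
                summed.update(v)  # element-wise sum of the sparse vectors
            centroids[j] = normalize(dict(summed))
    return labels

tweets = [
    "join the march today",
    "join the march tomorrow",
    "this bill is unfair",
    "the bill is unfair to farmers",
]
print(cluster(tweets, k=2))  # [0, 0, 1, 1]
```

On this toy input, the two call-to-action tweets and the two policy-criticism tweets land in separate clusters; with real protest data, choosing k and reading each cluster's top terms to name its narrative would remain the analyst's job.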