Abstract
The use of the Internet and online social networks has increased tremendously around the world over the past decade, giving people an opportunity to exchange thoughts, ideas, opinions, and feelings with one another. This exponential growth of social media networks has enabled the production, distribution, and consumption of data at a phenomenal rate. However, it has also given rise to various online problems, such as offensive content, hate speech, fake news, racism, and trolling. These issues also arise because users can post any content they like on social media, often anonymously. Although every social media platform has community standards and guidelines against online abuse, these guidelines are often vague and subjective. It is therefore vital to identify and mitigate such issues, both to prevent psychological harm to the affected communities and to avert hate crimes. At the same time, we need to increase the amount of encouraging, positive, and supportive content on social media.
This thesis predominantly focuses on developing automated models to detect and classify the above-mentioned online issues in two distinct modalities: text, and the combination of text and vision (multimodality). Within the text modality, we work with two types of text: monolingual English text and code-mixed text. Code-mixing is a prevalent phenomenon in social media text and speech, so we study social media problems in both monolingual and code-mixed text. Internet memes, in turn, appear on social media networks in multimodal form (text + vision). Their primary purpose is to communicate ideas and opinions through combinations of text and images, which evoke a particular response in the viewer depending on the message the meme is meant to send. Most memes are purely humorous, while others, behind an amusing presentation, convey subtler messages, including hatred, fake information, sarcasm, propaganda about an idea, or motivation. It is therefore necessary to identify such content, and remove what is harmful, to make social media a safer place for everyone. To this end, this thesis presents models ranging from traditional machine learning approaches to the latest transformer-based architectures for detecting burgeoning social media problems in the text and multimodal settings.

Social media language includes both monolingual and code-mixed data. Starting with monolingual text, we investigate the detection of fake news in COVID-19-related social media posts by adapting transformer-based contextual word representations. We propose an ensemble model that concatenates the BERT, ALBERT, and XLNet representations; this ensemble of contextual word representations outperformed every individual transformer model on this problem. Some groups of people exploit freedom of speech and expression on social media platforms to increase conflict and hatred among users. It is therefore essential to take a positive reinforcement approach and study positive, helpful, and supportive social media content. Accordingly, we build a BERT-based transformer model combined with a threshold-based language detection system to detect hope speech in YouTube comments. The primary intent of this work is to reduce negativity and to strengthen encouraging, supportive, and efficacious social media content.

Next, we examine two principal issues in code-mixed data. Language identification is a primary preprocessing step in numerous code-mixed applications; with this in mind, we develop a word-level language identification system for English-Telugu code-mixed content. Subsequently, we explore sentiment analysis of English-Hindi bilingual code-mixed data, for which we propose character-level and subword-level word representations combined with a long short-term memory (LSTM) architecture.

Turning to multimodality, we explore a salient new issue on social media: the emotion analysis of Internet memes. This work addresses three tasks that revolve around memes. The first predicts the sentiment polarity of a meme; the second is a multi-label classification task that assesses whether a meme is offensive, humorous, sarcastic, or motivational; and the third is a multi-output ordinal classification task that predicts the degree of offense, humor, and sarcasm in a meme. To handle these tasks, we introduce a multimodal architecture with a late fusion technique that combines an LSTM for textual features with VGG-16 for image feature representations.

Propaganda is a communication tool that influences the opinions and actions of other people to achieve a predetermined goal. It was initially seen in newspapers, advertisements, and similar media, but it is now widely used on social media. We therefore develop a multimodal fusion system to detect propaganda in memes. We have used a robust fusion