Abstract
In recent years, computer vision has made remarkable progress in understanding visual scenes, including tasks such as object detection, human pose estimation, semantic segmentation, and instance
segmentation. These advancements are largely driven by high-capacity models, such as deep neural
networks, trained in fully supervised settings with large-scale labeled data sets. However, reliance on
extensive annotations poses scalability challenges due to the significant human effort required to create
these data sets. Fine-grained annotations, such as pixel-level segmentation masks, keypoint coordinates
for pose estimation, or detailed object instance boundaries, provide the high precision needed for many
tasks but are extremely time-consuming and costly to produce. Coarse annotations, on the other hand,
such as image-level labels or approximate scribbles, are much easier and faster to create but lack the
granularity required for detailed model supervision.
To address these challenges, researchers have increasingly explored alternatives to traditional supervised learning, with weakly supervised learning emerging as a promising approach. It mitigates annotation costs by using coarse annotations (cheaper and less detailed) during training, while the model must still produce fine-grained predictions at test time. Despite its potential, weakly supervised learning faces the challenge of transferring information from coarse annotations to fine-grained predictions, a process that often involves ambiguity and uncertainty. Existing
methods rely on various priors and heuristics to refine annotations, which are then used to train models
for specific tasks. This involves managing uncertainty in latent variables during training and ensuring
accurate predictions for both latent and output variables at test time.
This thesis introduces a unified approach to weakly supervised learning in computer vision, address-
ing tasks such as human pose estimation, object detection, and instance segmentation. Central to this
work is a framework based on the dissimilarity coefficient loss, which models uncertainty in the loca-
tion of objects and human poses using coarse annotations. The approach employs two key probability
distributions:
• Conditional Distribution: Captures output probabilities using coarse annotations (e.g., action la-
bels, image-level labels, object counts), modeled with deep generative models for efficient sam-
pling.
• Prediction Distribution: Provides test-time predictions independent of coarse annotations.
The framework minimizes the difference between these distributions using the dissimilarity coefficient loss, facilitating the transfer of information from coarse annotations to accurate predictions. This
methodology is consistently applied across diverse computer vision tasks, showcasing its versatility.
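For concreteness, the dissimilarity coefficient (in the sense of Rao) between the conditional distribution and the prediction distribution can be sketched as follows. The notation here is illustrative rather than taken from this abstract: $\Pr_c$ and $\Pr_p$ denote the conditional and prediction distributions over outputs $y$, $\Delta$ is a task-specific loss between two outputs, and $\gamma \in [0, 1]$ is a mixing weight.

```latex
\mathrm{DISC}_{\gamma}(\mathrm{Pr}_c, \mathrm{Pr}_p)
  = H(\mathrm{Pr}_c, \mathrm{Pr}_p)
  - \gamma\, H(\mathrm{Pr}_c, \mathrm{Pr}_c)
  - (1 - \gamma)\, H(\mathrm{Pr}_p, \mathrm{Pr}_p),
\qquad
H(P, Q) = \sum_{y_1} \sum_{y_2} P(y_1)\, Q(y_2)\, \Delta(y_1, y_2).
```

Minimizing this quantity pulls the prediction distribution toward the annotation-aware conditional distribution under the task loss $\Delta$, while the self-similarity terms discourage degenerate solutions.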
The efficacy of the proposed framework is demonstrated across three progressively complex visual
scene recognition tasks:
• Human Pose Estimation: A probabilistic framework is introduced for learning human poses from
still images using data sets with costly ground-truth pose annotations and inexpensive action la-
bels. By aligning the conditional and prediction distributions through the dissimilarity coefficient
loss, the method achieves significant improvements over baselines on the MPII and JHMDB data
sets, effectively leveraging action information.
• Object Detection: The framework addresses weakly supervised object detection (WSOD) by mod-
eling uncertainty in object locations using a dissimilarity coefficient-based objective. Leveraging
discrete generative models, it efficiently samples from annotation-aware conditional distributions
and integrates coarse annotations, such as image-level labels, object counts, points, and scribbles.
Spatial cluster regularization and curriculum learning further enhance performance, achieving
state-of-the-art results on benchmarks like PASCAL VOC and MS COCO.
• Instance Segmentation: The framework models uncertainty in pseudo-label generation using se-
mantic class-aware, boundary-aware, and annotation-consistent higher-order terms. By aligning
conditional and prediction distributions, it generates accurate pseudo-labels and trains Mask R-
CNN-like architectures effectively. Experiments on the PASCAL VOC 2012 data set demonstrate
state-of-the-art performance, with improved object boundary alignment and significant gains over
baselines.