IIIT

Fault Tolerant Control of Quadrotors

Author(s): Vidya C S
Advisor(s): Harikumar Kandath

Masters

July '25
Report no: IIIT/TH//
Center of RRC

Abstract

Quadrotors are extensively utilized in applications such as surveillance, mapping, and delivery, owing to their high maneuverability and vertical takeoff and landing (VTOL) capabilities. The deployment of quadrotors in these critical tasks necessitates a high degree of safety and reliability. However, their inherent lack of rotor redundancy poses significant challenges in addressing motor faults, making fault tolerance a crucial concern in their design and operation. This thesis presents a simple method for stabilizing the quadrotor dynamics under complete loss of one actuator using two-stage optimal control. Detailed equilibrium analysis and subsequent selection of the operating point under actuator loss are provided, incorporating constraints on the maximum available thrust. Two distinct cases were considered for equilibrium selection: one in which two actuators have equal inputs, and another in which all actuator inputs are allowed to vary independently. It is shown that when all actuator inputs are allowed to vary independently, the system can track an additional state, thereby enabling attitude tracking. A detailed simulation study using a high-fidelity nonlinear model of the quadrotor is presented, showing the stability and performance of the closed-loop system under complete actuator loss, in the presence of external disturbances. To ensure better maneuverability, the quadrotor under fault may have to shift to multiple equilibrium states along the course of a single flight. This poses a challenge as designing a controller for every equilibrium state across the state space is a tedious task. To address this challenge, a controller capable of stabilizing the quadrotor under fault across multiple equilibrium states was designed by applying Linear Matrix Inequality (LMI) conditions. The controller was tested in simulation using a high-fidelity nonlinear model of the quadrotor. 
The proposed method not only ensures continued stable flight in fault scenarios but also lays the groundwork for resilient flight control design in safety-critical aerial robotics applications.

Dynamical Aspects of Ergodic Quantum Channels

Author(s): Ritam Basu
Advisor(s): Samyadeb Bhattacharya

Masters

July '25
Report no: IIIT/TH//
Center of CSTAR

Abstract

In this work, we introduce and rigorously characterize a broad class of quantum operations termed quantum ergodic channels. These are completely positive and trace-preserving (CPTP) maps that possess a unique fixed point toward which any initial quantum state asymptotically evolves. The existence of such a fixed point enables a systematic exploration of ergodicity in quantum dynamics, extending classical notions into the quantum regime. We construct and analyze Lindblad-type master equations governing the evolution under these ergodic channels in arbitrary finite-dimensional Hilbert spaces. These equations describe both Markovian and non-Markovian dynamics, depending on the time dependence and structure of the generator. In particular, the class of ergodic channels considered here encompasses both memoryless and memory-affected evolution, allowing us to explore their differences through established quantitative measures of non-Markovianity. A special focus is placed on the case where the fixed point of the channel is a passive state, i.e., a state from which no work can be extracted via unitary operations. In such settings, evolution under ergodic channels gives rise to non-trivial ergotropy dynamics. Ergotropy is a fundamental thermodynamic quantity that quantifies the maximum extractable work from a quantum state via unitary transformations, assuming no access to additional resources. We show that in the Markovian regime, where the dynamics lack memory, the ergotropy of the system decreases monotonically over time, consistent with the second law of thermodynamics in open systems. However, under non-Markovian dynamics, the situation changes significantly. We demonstrate that ergotropy can temporarily increase during evolution, indicating a backflow of useful energy from the environment into the system.
This phenomenon, referred to as ergotropy backflow, is a direct manifestation of memory effects and points to the possibility of temporary reversals in the degradation of thermodynamic resources. Our analysis shows that this backflow is not only physically meaningful but also operationally relevant: it can serve as a dynamical witness or indicator of non-Markovianity. These findings offer a novel perspective on the interplay between non-Markovianity and thermodynamic behavior in open quantum systems. By grounding the discussion in well-defined resource-theoretic and thermodynamic terms, this study deepens our understanding of how memory effects influence the evolution of quantum resources such as ergotropy. In particular, it highlights the role of ergodic channels as a versatile theoretical framework for studying energy dynamics, dissipation, and resource recovery in realistic quantum settings, including potential applications in the design and analysis of quantum batteries. Overall, the results presented here enrich the theoretical framework of open quantum systems and pave the way for future investigations into the thermodynamic implications of quantum memory effects. They suggest practical strategies for identifying, characterizing, and potentially exploiting non-Markovian dynamics to enhance the performance of emerging quantum technologies.
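
The definition of ergotropy used in the abstract above can be illustrated numerically. The sketch below is not taken from the thesis: it uses the standard formula (energy of the state minus the energy of its passive rearrangement), and the qubit Hamiltonian, the decay rate `gamma`, and the simple exponential population decay toward the ground state are all assumptions chosen for illustration of a memoryless relaxation to a passive fixed point.

```python
import numpy as np

def ergotropy(rho, H):
    """Maximum work extractable from state rho by unitaries, for Hamiltonian H."""
    # Energy levels in increasing order.
    energies = np.linalg.eigvalsh(H)
    # Populations (eigenvalues of rho) in decreasing order.
    populations = np.linalg.eigvalsh(rho)[::-1]
    # The passive state pairs the largest population with the lowest energy.
    passive_energy = float(np.dot(populations, energies))
    energy = float(np.real(np.trace(rho @ H)))
    return energy - passive_energy

# Illustrative qubit with energy gap 1 (an assumption, not from the thesis).
H = np.diag([0.0, 1.0])
excited = np.diag([0.0, 1.0])   # fully inverted population
passive = np.diag([0.7, 0.3])   # more population at lower energy

# Toy memoryless decay of the excited-state population toward the ground state.
gamma = 1.0
times = np.linspace(0.0, 3.0, 7)
trace = [
    ergotropy(np.diag([1 - np.exp(-gamma * t), np.exp(-gamma * t)]), H)
    for t in times
]
```

In this toy decay the values in `trace` never increase, which matches the monotonic behaviour described for the Markovian regime; a non-Markovian model would be needed to see any temporary increase.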

Do Large Language Models Reason by following Rules?

Author(s): Karthik Prasanna N
Advisor(s): Ashwin Jayanti

Masters

July '25
Report no: IIIT/TH//
Center of HSRC

Abstract

Training and evaluating large language models (LLMs) on deductive reasoning tasks has attracted much attention in recent times. Some of these attempts have shown interesting results which suggest that LLMs behave like humans by displaying a content effect, that is, they reason better on tasks containing rules that align with our everyday beliefs and reason poorly when the rules are belief-violating. Others, using chain-of-thought prompting, a technique that makes LLMs generate intermediate steps before arriving at the final conclusion, suggest that LLMs may emulate human-like reasoning processes by generating intermediate reasoning steps. On the other hand, there are studies which conclude that language models do not yet demonstrate reliable deductive reasoning, since their performance is shown to decrease upon introduction of perturbations, such as synonym substitution, and which attribute their reasoning to artefacts from the training data. In this study, we look at these three distinct attempts at evaluating LLMs on deductive competence tasks, each employing different criteria: content-based reasoning (Dasgupta et al., 2023 [1]), chain-of-thought prompting (Wei et al., 2023 [2]), and introduction of perturbations (Yuan et al., 2023 [3]). In order to make sense of these claims concerning genuine reasoning, we introduce a framework developed by Diane Proudfoot using externalist criteria for machine cognition (Proudfoot 2004 [4]), which is based on Wittgenstein’s argument that deductive reasoning involves rule-following which is normative in nature. Based on the notion that deductive competence involves the following of normative rules, the framework proposes the use of the Wittgensteinian distinction between rule-following and quasi rule-following as a method to distinguish genuine deductive competence from quasi-competence.
The criteria distinguishing between rule-following and quasi rule-following shall be adapted to analyse whether these LLMs can be said to reason genuinely or are merely imitating reasoning-like behaviour. We propose this use of Proudfoot’s criteria for rule-following as a framework to distinguish genuine deductive competence from quasi deductive competence. In doing so, we also draw attention to the limitations and implications of Proudfoot’s claims regarding machine cognition through the introduction of a thought experiment. This thought experiment enables us to think through Proudfoot’s argument, according to which it is due to pragmatic considerations, and not in principle, that LLMs are unlikely to possess genuine reasoning.

Planning and Control Strategies for Contact-Rich, Non-Prehensile Mobile Manipulation

Author(s): Priyansh Sinha
Advisor(s): Nagamanikandan Govindan

Masters

July '25
Report no: IIIT/TH//
Center of RRC

Abstract

With the advent of Industry 4.0, the demands on industrial robots have expanded beyond simple pick-and-place tasks. Future smart factories require robots capable of a wide range of manipulation skills, including the ability to manipulate objects with a variety of actions such as pushing. This thesis investigates the capabilities of a manipulator and the system to perform fine or controlled non-prehensile manipulation (without grasping the objects), potentially exceeding the robot arm’s reachable workspace. Key contributions of this research include:
• A hybrid planner combining striking, pushing, and pick-and-place motions.
• Development of optimal control strategies for moving objects to specific target locations with high precision without necessarily grasping them. These algorithms calculate the optimal contact point, force, and velocity required for each action.
• Experiments conducted primarily in a simulation environment to validate the effectiveness of the end-effector design and its control algorithms, complemented by preliminary tests on a real robot. These evaluations demonstrate the practical viability and robustness of the proposed system under controlled conditions.
The thesis further explores the practical implications of non-prehensile manipulation. In warehouse logistics, this technology can significantly optimize sorting, material transfer, and distribution processes by enabling faster and more precise handling of items. By combining optimization and planning strategies, this research provides a comprehensive framework for hybrid manipulation. This interdisciplinary methodology enhances the versatility, adaptability, and performance of robots, ultimately improving efficiency, safety, and productivity in various industrial and operational settings.
Keywords: Mobile Manipulation, Non-Prehensile Manipulation, Redundant robot, Trajectory optimization, Robot Arm Control, Action-based Planning, Hybrid Action Planner, Whole body control, Robot Simulation, Rigid body dynamics, Sim-to-real, residual learning, Planning framework.

Adaptive Control of Autonomous Aerial Manipulator under Uncertainties and Unknown State-dependent Dynamics

Author(s): Amitabh Sharma
Advisor(s): Spandan Roy

Masters

July '25
Report no: IIIT/TH//
Center of RRC

Abstract

Enabling effective grasping capabilities in unmanned aerial manipulators (UAMs) presents formidable challenges stemming from the intricate coupling forces that emerge between the flying platform and manipulator arm, compounded by parametric uncertainties and environmental disturbances. Current methodologies predominantly fall into two categories: those demanding precise dynamic models of the entire system, and those addressing the aerial and manipulator subsystems as separate entities; both approaches face substantial limitations when deployed in practical settings. Despite substantial advancement in research on aerial manipulation, there is a lack of methods that properly address the effect of uncertain interaction forces between the manipulator interacting with the environment and the floating aerial base. These forces tend to be notoriously hard, if not impossible, to model due to uncertainties in payload characteristics and environmental interaction. This thesis proposes a novel adaptive control architecture for the integrated UAM system, combined with a bistable passive gripper, to facilitate dependable aerial grasping operations without necessitating prior knowledge of system dynamics or disturbance characteristics. The bistable gripper employs a pre-stressed spring steel band that transitions between two stable states, autonomously initiating object capture upon contact and thereby minimizing alignment precision requirements. Its innovative cable-driven actuation mechanism, powered by a single compact DC motor, enables efficient gripper release without requiring bulky pneumatic components. The developed adaptive control strategy effectively addresses the challenge of unmodeled coupling dynamics and state-dependent uncertainties inherent in aerial manipulation systems.
The controller incorporates adaptation mechanisms that dynamically estimate composite uncertainties, encompassing variations in inertial parameters, Coriolis and centrifugal effects, gravitational influences, and external forces, maintaining reliable tracking performance despite the absence of precise system identification. A rigorous Lyapunov stability analysis demonstrates that the closed-loop system achieves uniform ultimate boundedness under practical operating conditions.

Abstaining to Predict Right: Reliable Graph Neural Networks through Strategic Rejection

Author(s): Jayadratha Gayen
Advisor(s): Charu Sharma

Masters

July '25
Report no: IIIT/TH//
Center of MLL

Abstract

Many real-world systems can be modeled as dynamic graphs, where nodes and edges evolve with time. Graph Neural Networks (GNNs) excel at modeling relational data. Temporal GNNs capture the dynamics of time-changing data very well. Still, their reliability in risk-sensitive domains where errors in fraud detection, legal judgments, or medical diagnoses carry severe consequences remains limited. Traditional GNNs lack mechanisms to quantify uncertainty or abstain from low-confidence predictions, particularly in dynamic systems where temporal evolution and class imbalance amplify ambiguity. This thesis addresses these gaps by introducing strategic abstention mechanisms into graph learning, helping models to prioritize high-confidence decisions while rejecting uncertain predictions. We unify this approach across dynamic and static graphs, advancing reliability in high-stakes applications through uncertainty-aware frameworks. For the first time, our approach integrates a reject option strategy within the framework of GNNs for continuous-time dynamic graphs. This allows the model to strategically abstain from making predictions when uncertainty is high and confidence is low, minimizing the risk of critical misclassification and enhancing reliability. We propose a coverage-based abstention prediction model to implement the reject option that maximizes predictions within specified coverage. It improves prediction scores for link prediction and node classification tasks. Temporal GNNs deal with skewed datasets for the next state prediction or node classification. In cases of class imbalance, our method can be tuned to provide a higher weight to the minority class. Exhaustive experiments are presented on four datasets for dynamic link prediction and two for dynamic node classification tasks. This demonstrates our approach’s effectiveness in improving reliability and area under the curve (AUC)/average precision (AP) scores for predictions in dynamic graph scenarios. 
The results highlight our model’s ability to efficiently handle trade-offs between prediction confidence and coverage, making it a dependable solution for applications requiring high precision in dynamic and uncertain environments. Beyond temporal graphs, we extend the concept of classification with the reject option to static graph settings. We reformulate legal judgment prediction (LJP) as node classification on citation networks (ILDC dataset), integrating cost- and coverage-based abstention. Our models (NCwR-Cost/NCwR-Cov) improve accuracy by rejecting uncertain cases, ensuring reliability in legal decision-making. SHAP-based explanations reveal case-specific abstention rationale, enhancing transparency. Further validation on medical datasets (thyroid diagnosis, diabetes prediction) confirms cross-domain applicability, with abstention mechanisms reducing misclassification risks in ambiguous cases. This work provides deployable solutions for applications where reliability is non-negotiable.
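
The idea of predicting only within a specified coverage can be sketched with a simple post-hoc rule. This is not the thesis's learned abstention model: it is a plain confidence-ranking stand-in, and the function name `predict_with_reject`, the use of the maximum class probability as the confidence score, and the `-1` reject label are all assumptions made for illustration.

```python
import numpy as np

def predict_with_reject(probs, coverage):
    """Predict on the most confident `coverage` fraction of rows; reject the rest."""
    probs = np.asarray(probs, dtype=float)
    confidence = probs.max(axis=1)      # confidence = top class probability
    labels = probs.argmax(axis=1)       # predicted class per row
    n_keep = int(np.ceil(coverage * len(probs)))
    # Indices of the n_keep most confident rows.
    keep_idx = np.argsort(-confidence, kind="stable")[:n_keep]
    accepted = np.zeros(len(probs), dtype=bool)
    accepted[keep_idx] = True
    # Rejected rows get the label -1 (a convention assumed here).
    return np.where(accepted, labels, -1)

# Four hypothetical predictions over two classes.
probs = [[0.95, 0.05], [0.55, 0.45], [0.10, 0.90], [0.51, 0.49]]
out = predict_with_reject(probs, coverage=0.5)
```

With a coverage of 0.5 the two least confident rows are rejected; lowering the coverage trades fewer predictions for higher confidence in the ones that remain.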

The Anatomy of Synthesis: Simulating Changes in the Human Brain over Time through Diffeomorphic Deformations

Author(s): Anirudh Kaushik
Advisor(s): Jayanthi Sivaswamy

Masters

July '25
Report no: IIIT/TH//
Center of CVIT

Abstract

The human brain undergoes continuous structural changes throughout the lifespan, driven by a complex interplay of aging processes, environmental influences, and disease-related mechanisms. Patterns of structural change—particularly atrophy associated with tissue loss and shrinkage—emerge gradually over time and are observable using medical imaging techniques. While these changes are shaped by common biological mechanisms, they are also highly individualized, influenced by factors such as lifestyle, and neurological conditions like Alzheimer’s Disease (AD), Parkinson’s disease, tumors, and stroke. Understanding the progression of these changes—both at the individual level and across populations—is critical for advancing our knowledge of healthy aging and the dynamics of neurodegenerative disease. To study how brain structure evolves over time, researchers rely on longitudinal neuroimaging: repeated imaging of the same individuals at multiple timepoints. Unlike cross-sectional imaging, which captures a single snapshot per subject, longitudinal scans provide a temporal sequence that enables direct observation of anatomical trajectories. These sequences allow for the measurement of rates of change, identification of early biomarkers, and modeling of disease progression in a subject-specific manner. However, acquiring complete longitudinal datasets in practice remains challenging. Subject dropout, missed clinical visits, and protocol variability often result in missing scans, interrupting the temporal continuity required for accurate modeling. These gaps limit the effectiveness of methods that rely on temporally complete inputs and can bias downstream analyses. Imputing the missing scan to complete the subject’s imaging timeline is therefore a critical step toward enabling robust longitudinal modeling and improving our understanding of neurodegenerative processes. 
This thesis addresses the challenges of modeling and analyzing longitudinal brain changes by developing anatomically grounded methods for data imputation, latent space disentanglement, and downstream trajectory analysis. We first introduce SynBADD, a deformation-based framework that synthesizes missing brain scans by predicting physiologically plausible stationary velocity fields (SVFs), parametric fields that encode smooth, invertible deformations over space and time, rather than directly generating full image intensities. By operating in the deformation space, SynBADD preserves anatomical coherence and spatial fidelity while mitigating the artifacts typically associated with intensity-based synthesis approaches. Building on this foundation, we propose DIVA, a metadata-informed variational autoencoder designed to learn a temporally disentangled latent space for modeling brain morphological changes. DIVA is trained on synthetically augmented SVFs, enriching intra-subject variation and improving the model’s capacity to generalize across limited real-world samples. Age, disease label, and temporal information are explicitly disentangled through conditional bottleneck supervision, allowing the learned latent representations to reflect meaningful clinical and chronological factors. This disentanglement enhances temporal predictability, supports more accurate subject-specific trajectory modeling, and enables the use of powerful transformer-based architectures for latent space interpolation and prediction. We further extend our analysis in an “Analysis by Synthesis” framework, investigating how metadata-informed conditioning affects generation quality and how different temporal reasoning strategies (past-only, future-only, and bidirectional) impact anatomical plausibility.
We perform trajectory-based analyses of generated scans, evaluating how well imputed data aligns with true subject-specific anatomical trends across key regions of interest (ROIs) such as the hippocampus and parahippocampus. Additionally, we assess subgroup differences by age, sex, and disease status, and evaluate downstream task performance with and without synthetic imputation. Through comprehensive evaluations on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) longitudinal dataset, our approaches achieve substantial improvements over state-of-the-art baselines in both image fidelity and clinically relevant anatomical accuracy. Beyond technical advancements, this work provides new insights into the modeling of individualized brain aging patterns and opens pathways for data augmentation in clinical studies where longitudinal completeness is often unattainable.
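
How a stationary velocity field yields a smooth, invertible deformation can be shown in one dimension. The sketch below uses scaling and squaring, a common way to integrate SVFs; whether the thesis uses this particular integrator is an assumption, and the grid, the sinusoidal velocity field, and the function name `integrate_svf_1d` are purely illustrative.

```python
import numpy as np

def integrate_svf_1d(velocity, grid, n_steps=6):
    """Displacement of the time-1 flow of a stationary 1-D velocity field."""
    # Scaling: start from a very small step of the flow.
    disp = np.asarray(velocity, dtype=float) / (2 ** n_steps)
    for _ in range(n_steps):
        # Squaring: compose the map with itself,
        # phi(phi(x)) = x + u(x) + u(x + u(x)), using linear interpolation.
        disp = disp + np.interp(grid + disp, grid, disp)
    return disp

grid = np.linspace(0.0, 1.0, 101)
velocity = 0.1 * np.sin(np.pi * grid)   # vanishes at both ends of the domain
disp = integrate_svf_1d(velocity, grid)
warped = grid + disp                    # deformed positions of the grid points
```

The warped grid stays strictly increasing, so the deformation does not fold and can be inverted; in three dimensions the same scheme applies with vector-valued fields and trilinear interpolation.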

Prediction and Generation Models for Traffic Flow Forecasting

Author(s): Rahul Biju
Advisor(s): Deepak Gangadharan

Masters

June '25
Report no: IIIT/TH//
Center of CSG

Abstract

Traffic flow prediction is a critical component of intelligent transportation systems, with direct implications for congestion mitigation, safety enhancement, and efficient travel planning. Given the complex spatio-temporal nature of traffic data, a wide range of hybrid deep learning models, such as CNN-LSTM, ConvLSTM, and Temporal Convolutional Networks (TCNs), have been explored to capture spatial, temporal, and periodic dependencies effectively. In this work, we perform a comprehensive comparison of deep learning models with and without periodicity to evaluate their prediction accuracies. To address the challenge of hyperparameter tuning in these models, we propose a Genetic Algorithm (GA)-based optimization framework for CNN-LSTM, ConvLSTM, and a novel GA-TCN architecture, all of which demonstrate improved performance on benchmark datasets. Furthermore, we introduce a purely temporal deep learning model named Grid LSTM-based Attention Modelling for Traffic Flow Prediction (GLSTM-A), which leverages a combination of Grid LSTM for long-term dependencies, a standard LSTM for recent trends, and a custom attention mechanism to automatically prioritize significant temporal features. GLSTM-A exhibits superior prediction accuracy and memory efficiency compared to existing temporal models such as TCN, LSTM, and Bi-LSTM. Finally, to tackle the challenge posed by limited long-term traffic datasets, we develop TrafficFlowGAN, a GAN-based time-series data generation framework that effectively captures temporal patterns through a joint supervised and adversarial learning process. The synthetic data generated by TrafficFlowGAN closely resembles real-world traffic patterns, thereby enhancing the robustness and accuracy of downstream prediction models. Extensive experimental evaluations and ablation studies validate the efficiency of the proposed models across various traffic forecasting scenarios.
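
The GA-based hyperparameter tuning mentioned above can be sketched generically. Nothing below comes from the thesis: `toy_validation_error` is a hypothetical stand-in for training a model and measuring its validation error, the two hyperparameters (learning rate and number of units, both treated as continuous) and their bounds are assumptions, and the operators (truncation selection, uniform crossover, Gaussian mutation, elitism) are one common choice among many.

```python
import random

def toy_validation_error(params):
    """Hypothetical stand-in for training a model and returning validation error."""
    lr, units = params
    return (lr - 0.01) ** 2 * 1e4 + ((units - 64) / 64) ** 2

def genetic_search(fitness, bounds, pop_size=20, generations=30, seed=0):
    """Minimise `fitness` over box-bounded parameters with a simple GA."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))   # best error this generation
        parents = population[: pop_size // 2]    # truncation selection
        children = [population[0]]               # elitism: keep the best as-is
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Gaussian mutation of one randomly chosen gene, clipped to bounds.
            i = rng.randrange(len(child))
            lo, hi = bounds[i]
            step = rng.gauss(0, 0.1 * (hi - lo))
            child[i] = min(hi, max(lo, child[i] + step))
            children.append(child)
        population = children
    return min(population, key=fitness), history

bounds = [(0.0001, 0.1), (8, 256)]   # assumed ranges: learning rate, units
best, history = genetic_search(toy_validation_error, bounds)
```

Because the best individual is always carried over unchanged, the recorded best error in `history` can never get worse from one generation to the next. In a real setting each fitness call would train and validate a network, so population size and generation count would be chosen with that cost in mind.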

You Can (Not) Trust: Reliability and Robustness of LLMs as Human-Like Annotators and Judges

Author(s): Manav Chaudhary
Advisor(s): Vasudeva Varma Kalidindi

Masters

June '25
Report no: IIIT/TH//
Center of LTRC

Abstract

The rapid adoption of Large Language Models (LLMs) as tools for annotation and evaluation tasks in Natural Language Processing (NLP) has led to important questions about their reliability, robustness, and alignment with human expectations. This thesis investigates these concerns across three connected lines of inquiry:
• the indistinguishability of LLM-generated annotations from human annotations,
• human alignment and reliability of LLMs as subjective judges of language quality, and
• the susceptibility of LLM-judges to misinformation attacks in evaluation settings.
In the first part of this thesis, we study whether annotations made by LLMs can be reliably distinguished from those created by humans. Previous research has claimed that LLMs behave as ‘human-like annotators’, motivating our work to rigorously test this hypothesis. We frame this as a classification task: given a dataset of annotations, can a model detect their origin? Surprisingly, our findings indicate that even state-of-the-art classifiers achieve near-chance performance, with accuracy not exceeding 51%. These results offer strong empirical evidence that LLMs produce annotations that are indistinguishable from those of human annotators in most settings, validating the ‘human-like’ claim from a discriminative modeling point of view. Building on this, we explore whether LLMs can function as reliable evaluators: a more nuanced role that goes beyond labeling and involves subjective scoring of text quality (e.g., summaries, open-ended responses). In this phase, we test how well LLM-generated judgments align with human evaluations and how robust they are to subtle changes in prompt phrasing. We introduce controlled prompt perturbations and evaluate the consistency of LLM scores and textual justifications. The results show a significant lack of robustness: small, often misleading perturbations lead to large changes in judgment outputs.
These inconsistencies expose limitations in using LLMs as trustworthy evaluators, especially in high-stakes or subjective settings. The final part of this thesis presents a preliminary exploration into the robustness of LLM-based evaluators when exposed to misinformation, marking a conceptual shift from controlled perturbations to real-world-inspired adversarial scenarios. We introduce a novel framework that systematically injects misinformation into prompts, grounded in a taxonomy of ten misinformation types. This study probes how these manipulations affect LLM judgments, analyzing both score alignment and justification consistency. Contrary to our initial expectations, the results suggest that LLM-judges demonstrate a surprising degree of resilience. Scores and justifications remain largely consistent across many misinformation types, even when factual content is altered. However, this robustness appears to be superficial, with the LLM-judges often failing to explicitly recognize or reflect awareness of being misled in their justifications. Rather than detecting and correcting misinformation, they tend to preserve their prior judgments, raising concerns about unconscious robustness without epistemic awareness. These early findings open new avenues for future research. Comparative assessment, misinformation detection, and deeper semantic analysis may reveal subtler vulnerabilities. More broadly, this work lays the groundwork for building misinformation-aware evaluation systems and motivates the development of LLMs that not only evaluate well but do so with fact-sensitive reasoning. Collectively, this thesis explores the evolving role of LLMs as annotation and evaluation agents. While they show promise as ‘human-like’ annotators, their behavior as evaluators under adversarial and misinformation-rich contexts highlights both strengths and blind spots.
By offering early insights into their robustness and limitations, we provide a foundation for future work aimed at building trustworthy, interpretable, and context-aware LLM judges.

From Cultural Nuance to Lateral Logic: Assessing and Improving LLM Capabilities for Cultural Understanding and Advanced Problem Solving

Author(s): Harshit Gupta
Advisor(s): Vasudeva Varma Kalidindi

Masters

June '25
Report no: IIIT/TH//
Center of LTRC

Abstract

Large Language Models (LLMs) are powerful AI tools with broad abilities, but they still struggle with tasks that require deep understanding and complex, non-standard reasoning. This thesis examines two major challenges: how well LLMs understand and interpret cross-cultural communication, and how to improve their advanced problem-solving skills like lateral thinking and context-aware support. It evaluates how well current LLMs handle these tasks and explores two main strategies to improve their performance: advanced prompting and reinforcement learning-based fine-tuning. The first part of the research focuses on cultural understanding. A user study of book reviews shows that most (83%) contain Culture-Specific Items (CSIs), often making cross-cultural understanding difficult. We test different LLMs—including GPT-4o and smaller open models like Aya and Gemma—on how well they can detect and categorize these CSIs. The results show mixed strengths across models (e.g., Gemma-2 does well with social references, Aya with customs), but all models show a noticeable Western bias compared to human evaluations. To support future work, we also provide standardized datasets. The second part explores LLM performance in complex reasoning tasks such as puzzles that require creative and unconventional thinking. We find that advanced prompting—using both static and dynamic examples, along with model-generated reasoning steps—greatly improves results with Gemini 1.0 Pro over basic methods. However, models still do not reach human-level performance, suggesting that prompting is a promising way to improve complex reasoning, even if it is not yet perfect. Finally, the thesis focuses on generating context-aware assistance for solving challenging math problems, particularly those at the Olympiad level. We propose a framework for producing diverse, multistage synthetic hints tailored to different student needs. 
These hints cover a range of instructional types, including: beginning steps to help students start; relevant theorems and definitions to build conceptual grounding; equivalent examples for analogical learning; next steps and summaries of remaining steps based on student progress; identification of unused information from the problem; and correction-oriented hints to help students recognize and fix mistakes. Using this framework, we fine-tuned Small Language Models (SLMs) through several reinforcement learning approaches (Supervised Fine-Tuning (SFT), Kahneman-Tversky Optimization (KTO), and Odds Ratio Preference Optimization (ORPO)) to train models to generate helpful, targeted mathematical hints. This not only addresses a key gap in automated pedagogical support for difficult problems but also tackles an important research question: how effectively can these reinforcement learning methods align a smaller model with significantly fewer parameters to synthetic data generated by a much larger LLM? Overall, this thesis presents a multi-dimensional evaluation of LLMs in cultural understanding and advanced problem-solving. It highlights current limitations, reveals existing biases, and demonstrates that prompting and fine-tuning together can build more capable, adaptive, and culturally aware AI systems.