Makarand Tapaswi
Assistant Professor
makarand.tapaswi[at]iiit.ac.in
About

Hi! I am an Assistant Professor in the Computer Vision group at IIIT Hyderabad and a Principal Machine Learning Scientist at Wadhwani AI, a non-profit that uses AI for social good. I enjoy working on multimodal learning, primarily vision and language understanding, especially as it relates to analyzing stories. See our group's work here: https://katha-ai.github.io/

Selected Publications

STRinGS: Selective Text Refinement in Gaussian Splatting

Abhinav Digambar Raundhal, Gaurav Behera, Narayanan P J, Ravi Kiran Sarvadevabhatla, Makarand Tapaswi

Winter Conference on Applications of Computer Vision, WACV, 2026

Auditory CNN Analysis: What Do Layers Encode?

Pratyaksh Gautam, Makarand Tapaswi, Vinoo Alluri R

International Conference on Music Perception and Cognition, ICMPC, 2025

MALeR: Improving Compositional Fidelity in Layout-Guided Generation

Shivank Saxena, Dhruv Srivastava, Makarand Tapaswi

ACM Transactions on Graphics, ACM-TG, 2025

What You See is What You Ask: Evaluating Audio Descriptions

Divy Kala, Eshika Khandelwal, Makarand Tapaswi

Conference on Empirical Methods in Natural Language Processing, EMNLP, 2025

Investigating Mechanisms for In-Context Vision Language Binding

Darshana S, Makarand Tapaswi, Vineet Gandhi

Computer Vision and Pattern Recognition Conference Workshops, CVPR-W, 2025

IdentifyMe: A Challenging Mention Resolution Benchmark for LLMs

S Kawshik Manikantan, Makarand Tapaswi, Vineet Gandhi, Shubham Toshniwal

North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, 2025

VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment

Darshana S, Varun Gupta, Darshan Singh S, Zeeshan Khan, Vineet Gandhi, Makarand Tapaswi

Computer Vision and Pattern Recognition, CVPR, 2025

The Sound of Water: Inferring Physical Properties from Pouring Liquids

Piyush Bagad, Makarand Tapaswi, Cees G. M. Snoek, Andrew Zisserman

International Conference on Acoustics, Speech, and Signal Processing, ICASSP, 2025

No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning

Manu Gaur, Darshan Singh S, Makarand Tapaswi

Transactions on Machine Learning Research, TMLR, 2025

Seeing Eye to AI: Comparing Human Gaze and Model Attention in Video Memorability

Prajneya Kumar, Eshika Khandelwal, Makarand Tapaswi, Vishnu Sreekumar

Winter Conference on Applications of Computer Vision, WACV, 2025

Detect, Describe, Discriminate: Moving Beyond VQA for MLLM Evaluation

Manu Gaur, Darshan Singh S, Makarand Tapaswi

Workshop on Emergent Visual Abilities and Limits of Foundation Models, EVAL-FoMo W, 2024

Localizing Auditory Concepts in CNNs

Pratyaksh Gautam, Makarand Tapaswi, Vinoo Alluri R

ICML Mechanistic Interpretability Workshop, ICMLMI-W, 2024

System and method for identifying soundtrack for a digital book using a movie adaptation technique

Vinoo Alluri R, Makarand Tapaswi, Jaidev Shriram

United States Patent, US Patent, 2024

Major Entity Identification: A Generalizable Alternative to Coreference Resolution

S Kawshik Manikantan, Shubham Toshniwal, Makarand Tapaswi, Vineet Gandhi

Conference on Empirical Methods in Natural Language Processing, EMNLP, 2024

MICap: A Unified Model for Identity-aware Movie Descriptions

Haran S K Raajesh, Naveen Reddy Desanur, Zeeshan Khan, Makarand Tapaswi

Computer Vision and Pattern Recognition, CVPR, 2024

Previously On ... From Recaps to Story Summarization

Aditya Kumar Singh, Dhruv Srivastava, Makarand Tapaswi

Computer Vision and Pattern Recognition, CVPR, 2024

How you feelin'? Learning Emotions and Mental States in Movie Scenes

Dhruv Srivastava, Aditya Kumar Singh, Makarand Tapaswi

Computer Vision and Pattern Recognition, CVPR, 2023

GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering

Dhaval Taunk, Lakshya Khanna, Kandru Siri Venkata Pavan Kumar, Vasudeva Varma Kalidindi, Charu Sharma, Makarand Tapaswi

WWW Workshop on Natural Language Processing for Knowledge Graph Construction, NLP4KGc, 2023

Do Video-Language Foundation Models Have a Sense of Time?

Piyush Bagad, Makarand Tapaswi, Cees G. M. Snoek

International Conference on Learning Representations Workshops, ICLR-W, 2023

Test of Time: Instilling Video-Language Models with a Sense of Time

Piyush Bagad, Makarand Tapaswi, Cees G. M. Snoek

Computer Vision and Pattern Recognition, CVPR, 2023

Unsupervised Audio-Visual Lecture Segmentation

Darshan Singh S, Anchit Gupta, Jawahar C V, Makarand Tapaswi

Winter Conference on Applications of Computer Vision, WACV, 2023

Learning from Unlabeled 3D Environments for Vision-and-Language Navigation

Shizhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev

European Conference on Computer Vision, ECCV, 2022

Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation

Shizhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev

Computer Vision and Pattern Recognition, CVPR, 2022

Learning Object Manipulation Skills from Video via Approximate Differentiable Physics

Vladimír Petrík, Mohammad Nomaan Qureshi, Josef Sivic, Makarand Tapaswi

International Conference on Intelligent Robots and Systems, IROS, 2022

Instruction-driven history-aware policies for robotic manipulations

Pierre-Louis Guhur, Shizhe Chen, Ricardo Garcia, Makarand Tapaswi, Ivan Laptev, Cordelia Schmid

Conference on Robot Learning, CoRL, 2022

Can we Adopt Self-supervised Pretraining for Chest X-Rays?

Arsh Verma, Makarand Tapaswi

Machine Learning for Health Workshop, ML4H, 2022

Language Conditioned Spatial Relation Reasoning for 3D Object Grounding

Shizhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev

Neural Information Processing Systems, NeurIPS, 2022

Sonus Texere! Automated Dense Soundtrack Construction for Books using Movie Adaptations

Jaidev Shriram, Makarand Tapaswi, Vinoo A R

International Society for Music Information Retrieval, ISMIR, 2022

Grounded Video Situation Recognition

Zeeshan Khan, Jawahar C V, Makarand Tapaswi

Neural Information Processing Systems, NeurIPS, 2022

Long term spatio-temporal modeling for action detection

Makarand Tapaswi, Vijay Kumar, Ivan Laptev

Computer Vision and Image Understanding, CVIU, 2021

Airbert: In-domain Pretraining for Vision-and-Language Navigation

Pierre-Louis Guhur, Makarand Tapaswi, Shizhe Chen, Ivan Laptev, Cordelia Schmid

International Conference on Computer Vision, ICCV, 2021

Feature Generation for Long-tail Classification

Rahul Vigneswaran, Marc T. Law, Vineeth N. Balasubramanian, Makarand Tapaswi

Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2021