Seeing Eye to AI: Comparing Human Gaze and Model Attention in Video Memorability
Prajneya Kumar,Eshika Khandelwal,Makarand Tapaswi,Vishnu Sreekumar
Winter Conference on Applications of Computer Vision, WACV, 2025
@inproceedings{bib_Seei_2025, AUTHOR = {Prajneya Kumar and Eshika Khandelwal and Makarand Tapaswi and Vishnu Sreekumar}, TITLE = {Seeing Eye to AI: Comparing Human Gaze and Model Attention in Video Memorability}, BOOKTITLE = {Winter Conference on Applications of Computer Vision}, YEAR = {2025}}
Understanding what makes a video memorable has important applications in advertising and education technology. Towards this goal, we investigate spatio-temporal attention mechanisms underlying video memorability. Unlike previous works that fuse multiple features, we adopt a simple CNN+Transformer architecture that enables analysis of spatio-temporal attention while matching state-of-the-art (SoTA) performance on video memorability prediction. We compare model attention against human gaze fixations collected through a small-scale eye-tracking study where humans perform the video memory task. We uncover the following insights: (i) Quantitative saliency metrics show that our model, trained only to predict a memorability score, exhibits similar spatial attention patterns to human gaze, especially for more memorable videos. (ii) The model assigns greater importance to initial frames in a video, mimicking human attention patterns. (iii) Panoptic segmentation reveals that both the model and humans assign a greater share of attention to things and a smaller share to stuff, relative to their occurrence probability.
From Sound To Meaning In The Auditory Cortex: A Neuronal Representation And Classification Analysis
Kumar Neelabh,Vishnu Sreekumar
Annual Conference of the International Speech Communication Association, INTERSPEECH, 2024
@inproceedings{bib_From_2024, AUTHOR = {Kumar Neelabh and Vishnu Sreekumar}, TITLE = {From Sound To Meaning In The Auditory Cortex: A Neuronal Representation And Classification Analysis}, BOOKTITLE = {Annual Conference of the International Speech Communication Association}, YEAR = {2024}}
The neural mechanisms underlying the comprehension of meaningful sounds are yet to be fully understood. While previous research has shown that the auditory cortex can classify auditory stimuli into distinct semantic categories, the specific contributions of the primary (A1) and the secondary auditory cortex (A2) to this process are not well understood. We used songbirds as a model species, and analyzed their neural responses as they listened to their entire vocal repertoire (~10 types of vocalizations). We first demonstrate that the distances between the call types in the neural representation spaces of A1 and A2 are correlated with their respective distances in the acoustic feature space. Then, we show that while the neural activity in both A1 and A2 is equally informative of the acoustic category of the vocalizations, A2 is significantly more informative of the semantic category of those vocalizations. Additionally, we show that the semantic categories are more separated in A2. These findings suggest that as the incoming signal moves downstream within the auditory cortex, its acoustic information is preserved, whereas its semantic information is enhanced.
Towards an ecologically valid naturalistic cognitive neuroscience of memory and event cognition
R Pooja,Pritha Ghosh,Vishnu Sreekumar
Neuropsychologia, NPS, 2024
@article{bib_Towa_2024, AUTHOR = {R Pooja and Pritha Ghosh and Vishnu Sreekumar}, TITLE = {Towards an ecologically valid naturalistic cognitive neuroscience of memory and event cognition}, JOURNAL = {Neuropsychologia}, YEAR = {2024}}
The landscape of human memory and event cognition research has witnessed a transformative journey toward the use of naturalistic contexts and tasks. In this review, we track this progression from abrupt, artificial stimuli used in extensively controlled laboratory experiments to more naturalistic tasks and stimuli that present a more faithful representation of the real world. We argue that in order to improve ecological validity, naturalistic study designs must consider the complexity of the cognitive phenomenon being studied. Then, we review the current state of “naturalistic” event segmentation studies and critically assess frequently employed movie stimuli. We evaluate recently developed tools like lifelogging and other extended reality technologies to help address the challenges we identified with existing naturalistic approaches. We conclude by offering some guidelines that can be used to design ecologically valid cognitive neuroscience studies of memory and event cognition.