IIITH

System and method for retrieving a three-dimensional (3D) object using a self-supervised model

United States Patent, Us patent, 2025

Core Rank : - Google Rank :-

Abs PDF DOI bibTex

@inproceedings{bib_Syst_2025, AUTHOR = {J, Narayanan P and Sanklecha, Kajal Mohan and Mathur, Prayushi }, TITLE = {System and method for retrieving a three-dimensional (3D) object using a self-supervised model}, BOOKTITLE = {United States Patent}. YEAR = {2025}}

System and method for retrieving a three-dimensional (3D) object using a self-supervised model

Abstract

A system and processor-implemented method for three-dimensional (3D) object retrieval using a self-supervised model is provided. The present system learns an embedding space of the 3D mesh objects in a self-supervised manner without the need for objects annotated with their class or other properties. Effective embeddings of 3D mesh objects are learned using the self-supervised method for ranked retrieval from a large collection of 3D objects. A simple representation of mesh objects and a standard neural network model is used to learn the embedding. The results are retrieved on the basis of the shape of the object which may not belong to the same category but look similar in shape using the embeddings generated by self-supervised model. The system is independent of class labels and uses the entire 3D model for better information extraction.

Fast self-supervised 3D mesh object retrieval for geometric similarity

Computer Vision and Image Understanding, CVIU, 2025

Core Rank : - Google Rank :48

Abs PDF DOI bibTex

@inproceedings{bib_Fast_2025, AUTHOR = {Sanklecha, Kajal Mohan and Mathur, Prayushi and J, Narayanan P }, TITLE = {Fast self-supervised 3D mesh object retrieval for geometric similarity}, BOOKTITLE = {Computer Vision and Image Understanding}. YEAR = {2025}}

Fast self-supervised 3D mesh object retrieval for geometric similarity

Abstract

Digital 3D models play a pivotal role in engineering, entertainment, education, and various domains. However, the search and retrieval of these models have not received adequate attention compared to other digital assets like documents and images. Traditional supervised methods face challenges in scalability due to the impracticality of creating large, labeled collections of 3D objects. In response, this paper introduces a self-supervised approach to generate efficient embeddings for 3D mesh objects, facilitating ranked retrieval of similar objects. The proposed method employs a straightforward representation of mesh objects and utilizes an encoder–decoder architecture to learn the embedding. Extensive experiments demonstrate the competitiveness of our approach compared to supervised methods, showcasing its scalability across diverse object collections. Notably, the method exhibits transferability across datasets, implying its potential for broader applicability beyond the training dataset. The robustness and generalization capabilities of the proposed method are substantiated through experiments conducted on varied datasets. These findings underscore the efficacy of the approach in capturing underlying patterns and features, independent of dataset-specific nuances. This self-supervised framework offers a promising solution for enhancing the search and retrieval of 3D models, addressing key challenges in scalability and dataset transferability.

Prototype Guided Backdoor Defense

International Conference on Computer Vision, ICCV, 2025

Core Rank : A* Google Rank :291

Abs PDF DOI bibTex

@inproceedings{bib_Prot_2025, AUTHOR = {J, Narayanan P and Adithya, Amula Venkat and Samavedam, Sunayana and SAINI, SAURABH and Gupta, Avani }, TITLE = {Prototype Guided Backdoor Defense}, BOOKTITLE = {International Conference on Computer Vision}. YEAR = {2025}}

Prototype Guided Backdoor Defense

Abstract

Deep learning models are susceptible to backdoor attacks involving malicious attackers perturbing a small subset of training data with a trigger to causes misclassifications. Various triggers have been used including semantic triggers that are easily realizable without requiring attacker to manipulate the image. The emergence of generative AI has eased generation of varied poisoned samples. Robustness across types of triggers is crucial to effective defense. We propose Prototype Guided Backdoor Defense (PGBD), a robust post-hoc defense that scales across different trigger types, including previously unsolved semantic triggers. PGBD exploits displacements in the geometric spaces of activations to penalize movements towards the trigger. This is done using a novel sanitization loss of a post-hoc finetuning step. The geometric approach scales easily to all types of attacks. PGBD achieves better performance across all settings. We also present the first defense against a new semantic attack on celebrity face images.

LightHeadEd: Relightable & Editable Head Avatars from a Smartphone

Technical Report, arXiv, 2025

Core Rank : - Google Rank :-

Abs PDF bibTex

@inproceedings{bib_Ligh_2025, AUTHOR = {Manu, Pranav and Srivastava, Astitva and Raj, Amit and JAMPANI, VARUN and Sharma, Avinash and J, Narayanan P }, TITLE = {LightHeadEd: Relightable & Editable Head Avatars from a Smartphone}, BOOKTITLE = {Technical Report}. YEAR = {2025}}

LightHeadEd: Relightable & Editable Head Avatars from a Smartphone

Abstract

Creating photorealistic, animatable, and relightable 3D head avatars traditionally requires expensive Lightstage with multiple calibrated cameras, making it inaccessible for widespread adoption. To bridge this gap, we present a novel, cost-effective approach for creating high-quality relightable head avatars using only a smartphone equipped with polaroid filters. Our approach involves simultaneously capturing cross-polarized and parallel-polarized video streams in a dark room with a single point-light source, separating the skin's diffuse and specular components during dynamic facial performances. We introduce a hybrid representation that embeds 2D Gaussians in the UV space of a parametric head model, facilitating efficient real-time rendering while preserving high-fidelity geometric details. Our learning-based neural analysis-by-synthesis pipeline decouples pose and expression-dependent geometrical offsets from appearance, decomposing the surface into albedo, normal, and specular UV texture maps, along with the environment maps. We collect a unique dataset of various subjects performing diverse facial expressions and head movements.

Linearly Transformed Spherical Distributions for Interactive Single Scattering with Area Lights

European Association for Computer Graphics, Eurographics, 2025

Core Rank : - Google Rank :-

Abs PDF bibTex

@inproceedings{bib_Line_2025, AUTHOR = {Ajit, K T Aakash and Nikhil, Shah Ishaan and J, Narayanan P }, TITLE = {Linearly Transformed Spherical Distributions for Interactive Single Scattering with Area Lights}, BOOKTITLE = {European Association for Computer Graphics}. YEAR = {2025}}

Linearly Transformed Spherical Distributions for Interactive Single Scattering with Area Lights

Abstract

Linearly Transformed Spherical Distributions (LTSDs), a superset of the commonly known Linearly Transformed Cosines (LTCs) for analytic area light rendering, applied in the context of analytic area light rendering with participating media.

Real-Time Decompression and Rasterization of Massive Point Clouds

Proceedings of the ACM on Computer Graphics and Interactive Techniques, PACMCGIT, 2024

Core Rank : - Google Rank :19

Abs bibTex

@inproceedings{bib_Real_2024, AUTHOR = {Goel, Rahul and SCHÜTZ, MARKUS and J, Narayanan P and KERBL, BERNHARD }, TITLE = {Real-Time Decompression and Rasterization of Massive Point Clouds}, BOOKTITLE = {Proceedings of the ACM on Computer Graphics and Interactive Techniques}. YEAR = {2024}}

Real-Time Decompression and Rasterization of Massive Point Clouds

Abstract

Large-scale capturing of real-world scenes as 3D point clouds (e.g., using LIDAR scanning) generates billions of points that are challenging to visualize. High storage requirements prevent the quick and easy inspection of captured datasets on user-grade hardware. The fastest real-time rendering methods are limited by the available GPU memory and render only around 1 billion points interactively. We show that we can achieve state-of-the-art in both while simultaneously supporting datasets that surpass the capabilities of other methods. We present an on-the-fly point cloud decompression scheme that tightly integrates with software rasterization to reduce on-chip memory requirements by more than 4×. Our method compresses geometry losslessly and provides high visual quality at real-time framerates. We use a GPU-friendly, clipped Huffman encoding for compression. Point clouds are divided into equal-sized batches, which are Huffman-encoded independently. Batches are further subdivided to form easy-to-consume streams of data for massively parallel execution. The compressed point clouds are stored in an access-aware manner to achieve coherent GPU memory access

Neural Histogram‐Based Glint Rendering of Surfaces With Spatially Varying Roughness

Computer Graphics Forum, CGF, 2024

Core Rank : - Google Rank :59

Abs bibTex

@inproceedings{bib_Neur_2024, AUTHOR = {Nikhil, Shah Ishaan and Gamboa, and Gruson, and J, Narayanan P }, TITLE = {Neural Histogram‐Based Glint Rendering of Surfaces With Spatially Varying Roughness}, BOOKTITLE = {Computer Graphics Forum}. YEAR = {2024}}

Neural Histogram‐Based Glint Rendering of Surfaces With Spatially Varying Roughness

Abstract

The complex, glinty appearance of detailed normal-mapped surfaces at different scales requires expensive per-pixel Normal Distribution Function computations. Moreover, large light sources further compound this integration and increase the noise in the Monte Carlo renderer. Specialized rendering techniques that explicitly express the underlying normal distribution have been developed to improve performance for glinty surfaces controlled by a fixed material roughness. We present a new method that supports spatially varying roughness based on a neural histogram that computes per-pixel NDFs with arbitrary positions and sizes. Our representation is both memory and compute efficient. Additionally, we fully integrate direct illumination for all light directions in constant time. Our approach decouples roughness and normal distribution, allowing the live editing of the spatially varying roughness of complex normal-mapped objects. We demonstrate that our approach improves on previous work by achieving smaller footprints while offering GPU-friendly computation and compact representation.

A survey on Concept-based Approaches For Model Improvement

Technical Report, arXiv, 2024

Core Rank : - Google Rank :-

Abs PDF bibTex

@inproceedings{bib_A_su_2024, AUTHOR = {Gupta, Avani and J, Narayanan P }, TITLE = {A survey on Concept-based Approaches For Model Improvement}, BOOKTITLE = {Technical Report}. YEAR = {2024}}

A survey on Concept-based Approaches For Model Improvement

Abstract

The focus of recent research has shifted from merely improving the metrics based performance of Deep Neural Networks (DNNs) to DNNs which are more interpretable to humans. The field of eXplainable Artificial Intelligence (XAI) has observed various techniques, including saliency-based and concept-based approaches. These approaches explain the model's decisions in simple human understandable terms called Concepts. Concepts are known to be the thinking ground of humans}. Explanations in terms of concepts enable detecting spurious correlations, inherent biases, or clever-hans. With the advent of concept-based explanations, a range of concept representation methods and automatic concept discovery algorithms have been introduced. Some recent works also use concepts for model improvement in terms of interpretability and generalization. We provide a systematic review and taxonomy of various concept representations and their discovery algorithms in DNNs, specifically in vision. We also provide details on concept-based model improvement literature marking the first comprehensive survey of these methods.

Specularity Factorization for Low-Light Enhancement

Computer Vision and Pattern Recognition, CVPR, 2024

Core Rank : A* Google Rank :440

Abs PDF bibTex

@inproceedings{bib_Spec_2024, AUTHOR = {SAINI, SAURABH and J, Narayanan P }, TITLE = {Specularity Factorization for Low-Light Enhancement}, BOOKTITLE = {Computer Vision and Pattern Recognition}. YEAR = {2024}}

Specularity Factorization for Low-Light Enhancement

Abstract

We present a new additive image factorization technique that treats images to be composed of multiple latent specular components which can be simply estimated recursively by modulating the sparsity during decomposition. Our model-driven {\em RSFNet} estimates these factors by unrolling the optimization into network layers requiring only a few scalars to be learned. The resultant factors are interpretable by design and can be fused for different image enhancement tasks via a network or combined directly by the user in a controllable fashion. Based on RSFNet, we detail a zero-reference Low Light Enhancement (LLE) application trained without paired or unpaired supervision. Our system improves the state-of-the-art performance on standard benchmarks and achieves better generalization on multiple other datasets. We also integrate our factors with other task specific fusion networks for applications like deraining, deblurring and dehazing with negligible overhead thereby highlighting the multi-domain and multi-task generalizability of our proposed RSFNet. The code and data is released for reproducibility on the project homepage.

GSN: Generalisable Segmentation in Neural Radiance Fields

Association for the Advancement of Artificial Intelligence, AAAI, 2024

Core Rank : A* Google Rank :220

Abs PDF DOI bibTex

@inproceedings{bib_GSN:_2024, AUTHOR = {Gupta, Vinayak and Goel, Rahul and Sirikonda, Dhawal and J, Narayanan P }, TITLE = {GSN: Generalisable Segmentation in Neural Radiance Fields}, BOOKTITLE = {Association for the Advancement of Artificial Intelligence}. YEAR = {2024}}

GSN: Generalisable Segmentation in Neural Radiance Fields

Abstract

Radiance Fields are being widely explored for 3D scene reconstruction and several downstream tasks, such as segmentation. Prior radiance field segmentation methods require scene-specific training to enable segmentation. We propose distilling semantic features into Radiance Fields in a generalisable fashion using GNT, a transformer-based architecture, enabling 3D reconstruction and multi-view segmentation on arbitrarily new scenes. By fine-tuning our method, any set of 2D features can be distilled into a radiance field, providing better multi-view consistency than the original features. We show multi-view segmentation results on standard datasets and compare our method against existing NeRF-based segmentation methods. We perform on par with the state-of-the-art scene-specific segmentation methods. Our approach and experiments bring generalisable NeRF methods one step closer to the contemporary NeRF literature.