Split Screens for Optimal Viewing – the latest in computer vision

One usually associates split screens with news channel panel discussions, where multiple guests appear in the same frame. But we have yet to see recordings of stage performances, such as a dance recital or a play, in split mode. Such recordings are typically captured either with a single high-resolution camera or with multiple cameras and camera operators covering various views, which are then edited together into a single video.

These videos can’t focus on the actors’ or dancers’ faces in close-up because the performers move around the stage. Wide-angle shots of the stage make it harder still to read emotions and facial expressions, and sometimes even to recognize and locate the performers, much as if you were seated at a distance.

Creating good split-screen compositions requires a set of views that each work on their own and also work together, along with layouts that correctly convey the scene and its details.

Prof Vineet Gandhi from the Centre for Visual Information Technology (CVIT) at IIIT-Hyderabad has found a solution to this problem. Along with his student Moneish Kumar, and in collaboration with the French Institute for Research in Computer Science and Automation (INRIA) and the University of Wisconsin-Madison, he has been able to automatically create a split-screen video from non-static recordings that displays both the context of the scene and close-up details of the performers.

This approach is especially attractive for digital heritage projects in the performing arts, as well as for social and sports events. Because the split screens are essentially zoomed-in views, lip movements are also easier to follow, and subtitles can be added to each partitioned screen to improve the viewing experience for hearing-impaired viewers.
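
To make the idea of "context plus zoomed-in views" concrete, here is a minimal sketch in Python. It is not the method from the research; it only illustrates one simple way such a layout could be planned. It assumes each performer's bounding box in the wide frame is already known for a single frame, and every name in it (Box, closeup_crop, split_screen_layout, plan_split_screen, the margin and context_frac parameters) is hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixels (hypothetical helper type)."""
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def closeup_crop(actor, frame_w, frame_h, aspect, margin=0.25):
    """Crop window around one performer with the given width/height ratio."""
    # Pad the performer's box so the crop is not too tight (assumed margin).
    w = actor.w * (1 + 2 * margin)
    h = actor.h * (1 + 2 * margin)
    # Grow the shorter side until the crop matches the target aspect ratio.
    if w / h < aspect:
        w = h * aspect
    else:
        h = w / aspect
    # Shrink uniformly if the crop would be larger than the frame.
    scale = min(1.0, frame_w / w, frame_h / h)
    w, h = w * scale, h * scale
    # Centre on the performer, then clamp the crop inside the frame.
    cx = actor.x + actor.w / 2
    cy = actor.y + actor.h / 2
    x = min(max(cx - w / 2, 0), frame_w - w)
    y = min(max(cy - h / 2, 0), frame_h - h)
    return Box(round(x), round(y), round(w), round(h))


def split_screen_layout(n_actors, out_w, out_h, context_frac=0.5):
    """Wide context strip on top, one equal-width close-up panel per performer below."""
    top_h = round(out_h * context_frac)
    context = Box(0, 0, out_w, top_h)
    panel_w = out_w // n_actors
    panels = [Box(i * panel_w, top_h, panel_w, out_h - top_h)
              for i in range(n_actors)]
    return context, panels


def plan_split_screen(actors, frame_w, frame_h, out_w, out_h):
    """Pair each performer's source crop with the output panel it fills."""
    # Assumption of this sketch: panels follow the performers' left-to-right
    # order on stage, so the close-ups line up with the context view above.
    ordered = sorted(actors, key=lambda b: b.x + b.w / 2)
    context, panels = split_screen_layout(len(ordered), out_w, out_h)
    pairs = [(closeup_crop(a, frame_w, frame_h, p.w / p.h), p)
             for a, p in zip(ordered, panels)]
    return context, pairs
```

Calling plan_split_screen with two performer boxes in a 1920x1080 frame and a 1280x720 output returns the context strip plus one (source crop, output panel) pair per performer. A real system would also need to detect and track the performers, keep the crops stable over time, and decide when nearby performers should share a panel, none of which this sketch attempts.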

Prof Gandhi presented his research at Eurographics 2017, the annual conference of the European Association for Computer Graphics, held in Lyon, France, in April. Incidentally, it has been over a decade since any research work from India was featured at this prestigious conference. Commending his work, Prof C.V. Jawahar, Head of CVIT & Machine Learning (ML), says, “Vineet investigates difficult problems in imaging and image understanding. At the same time, his solutions are simple and elegant. I am impressed with his depth and dedication.”

Prof Gandhi’s research interests span computer vision and multimedia, human detection and tracking, computational photography and cinematography, and depth reconstruction and its applications. He received the Erasmus Mundus scholarship for his Master’s programme, spending a semester each in Spain, Norway and France. He later joined INRIA for his Master’s thesis and went on to pursue his PhD there.

He joined IIIT-Hyderabad in April 2015 as a senior research scientist and was promoted to Assistant Professor last month, making him one of the youngest professors at IIIT-Hyderabad. No surprise, then, that the 30-year-old is often mistaken for a student on campus!

Reflecting on his decision to move to India despite various offers in Europe, and to choose academia over the more lucrative field of industry research, Prof Gandhi says, “I wasn’t sure if I wanted to get into academics since I was apprehensive of the culture around academics in India. But what I did know was that if I were to move back to academics in India, it would have to be to IIIT-Hyderabad as it’s the top ranking institute for Computer Vision in the country and third in Asia (according to csrankings).”

After two years on the job, when asked if he has finally made the transition, Prof Gandhi says, “Yes, I’m here to stay. Couldn’t have asked for a better work culture. IIIT-Hyderabad gives me the freedom to do my own research, with the constant support and encouragement from professors and peers. Everyone’s very approachable and the students and facilities on campus are of a high caliber. Also the scope for computer vision in India is huge. Just as we use our eyes and our brain takes decisions, in computer vision cameras are the eyes, and computers the brain. The possibilities and applications in the future are endless. I believe it’s up to our capabilities to make the best of it.”

More details on the research:

https://faculty.iiit.ac.in/~vgandhi/Splitscreen/

https://faculty.iiit.ac.in/~vgandhi/vgandhi_files/Kumar_Eurographics_2017.pdf