IIITH-incubated AI start-up to be a springboard for affordable lip-sync technology

A deep-tech AI web app built by two Ph.D. scholars at IIITH’s CVIT is all set to disrupt the art of video editing and dubbing. The patented software from the newly launched start-up can generate authentic-looking speech from a single video clip of a human or digital avatar, in a matter of minutes, in the language of your choice. Read on.

Wav2Lip is a compact video editor built around an AI algorithm that is trained to generate accurate lip movements based on the target’s speech.

NeuralSync AI, the start-up company, was finally launched on March 30, 2022, in response to the flattering number of global inquiries that its software developers Rudrabha Mukhopadhyay and Prajwal K R received within days of publishing their research. “In 17 short days, we were able to close about twenty-five thousand dollars’ worth of leads”, states CEO Pavan Reddy. The US-patented technology, owned by IIIT Hyderabad, was invented by Rudrabha, Prajwal, Prof. C.V. Jawahar and Prof. Vinay Namboodiri.

The cost of a burger for one minute of precision video lip-syncing! The web app will do to the dubbing industry what quartz technology did to the 1980s watch industry. Typically, producing a 10-minute video clip with expensive camera equipment in a studio, along with editing and post-production paraphernalia, would take several weeks and cost a pretty penny.

“We are slashing it down to minutes. To create one minute of video today, it takes 45 seconds, and we hope to pare that down even further to 15-20 seconds”, says Rudrabha, the NeuralSync CTO. Apart from production time, it also cuts down operating costs substantially.

The technology is so forgiving that you can even shoot the video segment with your phone. Rudrabha adds that you can dub into almost all European and Asian languages, except perhaps Mandarin. With AI entering video production, mismatched lip movements will be a thing of the past. The algorithm is trained on short speech segments, so every short sound has a corresponding lip movement.
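To make that windowing idea concrete, here is a minimal Python sketch, assuming 25 fps video and an 80-band mel spectrogram for the audio; the article does not specify the model’s actual window sizes or features, so the numbers below are purely illustrative:

```python
import numpy as np

FPS = 25            # assumed video frame rate
MELS_PER_SEC = 80   # assumed mel-spectrogram frames per second of audio
WINDOW = 16         # assumed width of the audio slice paired with each video frame

def audio_windows_for_frames(mel: np.ndarray, num_frames: int):
    """Pair each video frame with the short slice of audio centred on it.

    mel: (T, n_mels) mel spectrogram of the speech track.
    Yields (frame_index, mel_window) pairs; a lip-sync model of this kind
    predicts the mouth region of each frame from its own small window of sound.
    """
    for i in range(num_frames):
        centre = int(i * MELS_PER_SEC / FPS)
        start = max(0, centre - WINDOW // 2)
        window = mel[start:start + WINDOW]
        if window.shape[0] < WINDOW:  # pad at the edges of the clip
            pad = np.zeros((WINDOW - window.shape[0], mel.shape[1]))
            window = np.vstack([window, pad])
        yield i, window
```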

Slashing video production budgets across industries
Right off the bat, the companies targeted are ed-tech firms that create e-learning lectures, like Byju’s, and even very small startups that would benefit tremendously from the web app. Marketing videos make up another big chunk of the target demographic. The team is currently in talks with a few metaverse companies in China.

“The movie industry is an end goal, where typically we will create our product as a small Google Chrome extension”, says Rudrabha. If it is integrated into, say, Netflix, a subscriber sitting in Telangana can watch a Spanish movie with perfectly dubbed dialogue in Telugu. For the movie industry, they plan to build a full-fledged version that addresses privacy concerns; it is scheduled to get rolling in the next 3-4 months.

“Our video creation platform monetizes at between 3 and 10 dollars per minute, depending on the complexity of the problem,” reveals CEO Pavan Reddy. “A simple talking head would cost 3-4 dollars per minute. For a dynamic scene with multiple actors with speaking lines in the same frame, we charge 10 dollars per minute”. Expensive advertising campaigns shot in Hindi or English can now be released with perfect dubbing in all the south Indian languages, in a fraction of the time!

The startup is in talks with a Bollywood movie producer who has signed a contract for executing a few scenes. “We will move to Bombay to start the initial phase. Right now, we are a two-man team with part-time interns. The intellectual capital is led by Prof. C.V. Jawahar, Ramesh Loganathan and Prof. Vinay Namboodiri”, he adds. The IIITH professors have given them carte blanche in running the day-to-day operations.

Genesis of the start-up idea
When Rudrabha joined CVIT in 2018 as a Ph.D. scholar, a few seniors were working on a similar lip-sync technology, achieving very basic results. “Prof. Jawahar asked Prajwal and me to play around with their code base, standard operating practice in the CVIT Lab”, remarks the scholar. Given a speech transcript and a video clip, the objective was to morph the speaker’s lip movements to match the target’s speech perfectly.

The researchers began working on it in January 2019 and in four short months, “we got some nice results”, says Rudrabha modestly. What transpired was, in fact, a major breakthrough that received wide appreciation from IIITH and the academic community when they presented it at a conference in France. After that, there was no looking back for the youngsters. In 2020, they built upon their first work, adding a few key components, and suddenly they had a patent-worthy commercial product.
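The research version of that work, Wav2Lip, was released publicly along with the paper, and a rough way to run it from Python looks like the sketch below. The checkpoint and media paths are placeholders, and the flag names follow the public GitHub release, so they should be checked against whichever version is downloaded.

```python
import subprocess

# Run the publicly released Wav2Lip inference script on a face video and a
# speech track. All paths are placeholders; flag names are as documented in
# the open-source repository and may differ across versions.
subprocess.run(
    [
        "python", "inference.py",
        "--checkpoint_path", "checkpoints/wav2lip_gan.pth",  # pretrained weights
        "--face", "inputs/speaker.mp4",                      # video of the target face
        "--audio", "inputs/dubbed_speech.wav",               # speech to sync the lips to
    ],
    check=True,
)
```

The commercial NeuralSync platform builds on top of this kind of model rather than exposing the raw script, but the inputs and outputs are the same: a clip of a face and a speech track go in, and a lip-synced video comes out.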

NeuralSync AI comes into being
One LinkedIn post, one YouTube video and a few blogs on open forums were all it took to get the ball rolling. “Pretty soon, we were swamped with requests from commercial interests and industry leaders like Facebook”, says Rudrabha. “We tried to commercialize it ourselves but couldn’t navigate the hurdles, like acquiring the IP rights for software that was still an experimental model”. Clueless about juggling the invoicing and the legal requirements, the students struggled for six months with a steady inflow of leads that they couldn’t convert.

“This technology was a perfect candidate for the Product Labs model set up at the institute to ensure that mature technologies developed at the research labs seed new deep-tech startups. Several use cases were discussed and deep market studies were conducted. It needed an entrepreneurial person like Pavan teaming up with a top-notch researcher like Rudrabha. We are happy that it all fell into place”, says Prakash Yalla, Head of Product Labs.

“One fine day, I was asked to attend a meeting with Product Labs where I met Pavan and from there, we really took off”, observes Rudrabha. Prajwal, who is presently pursuing his Ph.D. at Oxford University, was a critical part of the start-up backstory. “It was a tech that both of us built in the lab”, says Rudrabha. Anchit and Faizan, two undergraduate students, built a video editor around Wav2Lip. Anchit is now pursuing his master’s at IIITH and continues to work on improving the algorithm. Faizan has since graduated but has also continued to contribute.

NeuralSync CEO Pavan Reddy joined IIITH in April 2021 with the mandate of mentoring start-ups and productising technologies at Product Labs. On day one, the Wav2Lip research paper was emailed to him, his first rodeo at the lab. The IIT Madras alumnus had prior start-up expertise and spent a couple of months exploring use cases and competitors in the world market. Prof. Jawahar’s initial agenda for converting the CVIT leads was not monetary but to ensure that the tech was being used in a practical setup.

From concept to commercial product
“When I joined, the technology was working but it didn’t have any commercial validation”, says Pavan. Talking to leads and creating videos for them gave him valuable customer inputs that helped them improve the model. But soon they realized that IIITH, being an academic institute, was not geared for the financial mechanisms the situation demanded. “At that point, Ramesh Loganathan, Head of CIE, decided that it was high time we changed to a start-up model, amped up our style of functioning and aggressively converted each lead to generate revenue”, reports Pavan.

When Pavan shared this tool with an ed-tech company, they were swept off their feet. You don’t need a studio, infrastructure, or even a microphone or avatar to do the job. “All you need is the transcript of the speech and, using our tools, you can create video content in a matter of minutes. Right now, the goal is ed-tech and regional news channels. We are also looking at markets like Microsoft or Amazon that churn out customer communication videos. The vast entertainment and gaming industries are big markets that we want to ease into, systematically”, observes Pavan.
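As a purely hypothetical sketch of that transcript-to-video workflow, the snippet below chains a stand-in text-to-speech engine (gTTS, chosen only for illustration) with a Wav2Lip-style lip-sync step like the one shown earlier; none of these names are NeuralSync APIs, and a production pipeline would use its own models.

```python
import subprocess
from gtts import gTTS  # stand-in TTS engine, purely illustrative

def transcript_to_video(transcript: str, face_clip: str, lang: str = "te") -> str:
    """Turn a text transcript into a lip-synced video of the given face clip."""
    # 1. Synthesize speech from the transcript in the chosen language (e.g. Telugu).
    audio_path = "generated_speech.mp3"
    gTTS(text=transcript, lang=lang).save(audio_path)

    # 2. Drive the face clip with the synthesized speech using a Wav2Lip-style
    #    inference script (paths and flags as in the earlier sketch).
    subprocess.run(
        ["python", "inference.py",
         "--checkpoint_path", "checkpoints/wav2lip_gan.pth",
         "--face", face_clip,
         "--audio", audio_path],
        check=True,
    )
    return "results/result_voice.mp4"  # default output path in the public Wav2Lip repo
```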

The company is right now in a sweet spot, with over 90% of revenue coming from the US market. Going forward, they plan to join an accelerator with a track record of growing startups in the US market, such as Y Combinator, and also hope to expand into the vast Indian market and monetize the product there.

What’s next?
Any deep learning technology that revolutionizes the current system has the potential to create mischief in the wrong hands. “What we are trying to do is to build a robust watermarking system to ensure that our technology is not misused”, says Rudrabha, who plans to move into a more research-driven role to incubate future tech.

The wow factor was that even before incorporation, the startup had begun generating commercial interest from various customer prospects. “The way we are going, we should be able to bootstrap it”, says Pavan confidently. The plan is to move to AWS servers and shift the whole portal to a cloud-based business to minimize any hiccups.

“This is a startup that is revolutionizing video production”, Pavan summarizes. “When people think about cabs, it’s Ola or Uber. That’s the kind of recall we want. We want everyone to automatically think about NeuralSync as the video platform.”