Abstract
We introduce an SSMT (Speech-to-Speech Machine Translation, also known as Speech-to-Speech Video Translation) pipeline, a web application for translating videos from one language to another by cascading multiple language modules. Our speech translation system combines highly accurate speech-to-text (ASR) for Indian English, pre-processing modules that bridge ASR-MT gaps such as spoken disfluency and punctuation, robust machine translation (MT) systems for multiple language pairs, an SRT module for translated text, a text-to-speech (TTS) module, and a module that renders the translated synthesized audio on the original video. It is a user-friendly, flexible, and easily accessible system. We aim to provide a complete, configurable speech translation experience to users and researchers with this system. It also supports human intervention: users can edit the outputs of different modules, and the edited output is then used for subsequent processing to improve overall output quality. By adopting a human-in-the-loop approach, we aim to configure the technology so that it assists humans and reduces the human effort involved in speech translation involving English and Indian languages. To the best of our understanding, this is the first fully integrated system for English to Indian languages (Hindi, Telugu, Gujarati, Marathi, and Punjabi) video translation. Our evaluation shows that the developed pipeline, with human intervention, achieves a mean opinion score (MOS) of 3.5+ for English to Hindi. A short video demonstrating our system is available at https://youtu.be/MVftzoeRg48.
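The cascaded, human-in-the-loop design described above can be illustrated with a minimal sketch. It is not the system's implementation: the names `run_cascade`, `toy_asr`, `toy_preprocess`, `toy_mt`, and `toy_tts` are hypothetical stand-ins, every stage is reduced to a string-to-string function, and the SRT and video-rendering modules are omitted for brevity. The only point it shows is that a user's edit to one module's output replaces that output before the next module runs.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A stage is a (name, function) pair; each function maps text to text.
Stage = Tuple[str, Callable[[str], str]]


def run_cascade(
    source: str,
    stages: List[Stage],
    edits: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Run stages in order; a human edit for a stage overrides its output."""
    edits = edits or {}
    outputs: Dict[str, str] = {}
    current = source
    for name, fn in stages:
        current = fn(current)
        if name in edits:
            # Human-in-the-loop: downstream stages consume the edited text.
            current = edits[name]
        outputs[name] = current
    return outputs


# Toy stand-ins (assumptions for illustration, not real models).
def toy_asr(audio_ref: str) -> str:
    return "um so this is a a test"


def toy_preprocess(text: str) -> str:
    # Drop filler words, collapse immediate repetitions, add punctuation.
    words = [w for w in text.split() if w not in {"um", "uh"}]
    deduped: List[str] = []
    for w in words:
        if not deduped or deduped[-1] != w:
            deduped.append(w)
    sentence = " ".join(deduped)
    return sentence[:1].upper() + sentence[1:] + "."


def toy_mt(text: str) -> str:
    return f"[hi] {text}"


def toy_tts(text: str) -> str:
    return f"audio({text})"


STAGES: List[Stage] = [
    ("asr", toy_asr),
    ("preprocess", toy_preprocess),
    ("mt", toy_mt),
    ("tts", toy_tts),
]

if __name__ == "__main__":
    print(run_cascade("clip.wav", STAGES))
    # Correcting the ASR transcript changes everything downstream of it.
    print(run_cascade("clip.wav", STAGES, edits={"asr": "this is the test"}))
```

Keeping every intermediate output in the returned dictionary mirrors the idea that each module's result is exposed for inspection and editing, rather than only the final audio.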