Abstract
Spoken language identification (LID) is the task of identifying the language of a speech utterance, and it is essential in multilingual speech systems. The performance of LID systems has been studied under various adverse conditions, such as background noise, telephone channels, and short utterances. In contrast to these studies, and for the first time in the literature, the present work investigates the impact of emotional speech on language identification. Several emotional speech databases are pooled to create the experimental setup. State-of-the-art i-vector, time-delay neural network (TDNN), long short-term memory (LSTM), and deep neural network (DNN) x-vector systems are used to build the LID systems. The performance of each LID system is evaluated on speech utterances of different emotions in terms of equal error rate (EER) and average cost (Cavg). The results indicate that utterances expressing anger and happiness degrade the performance of LID systems more than neutral and sad utterances do.

Index Terms: Language Identification, i-vector, TDNN, LSTM, DNN x-vector