Abstract
This paper proposes Fourier-Bessel cepstral coefficients (FBCC) as features for robust text-independent speaker identification. The Fourier-Bessel (FB) expansion is used instead of the Fourier transform to represent the signal in the frequency domain. The FB expansion can be viewed as a two-dimensional Fourier transform. Changing the kernel of the transform from exponentials to decaying exponentials allows the speech signal to be viewed as a linear sum of decaying exponentials. For signals arising from acoustic tubes, where the signal is subjected to many damping effects, delays in the different components of the signal are inevitable. Representing such signals with FB coefficients helps identify the different components present in the signal. Because Bessel functions behave as damped sinusoids, they form a more natural basis for the random, non-stationary speech signal, and for voiced speech in particular. The vocal tract, modeled as a set of linear acoustic tubes that are cylindrical in shape, can be represented efficiently with the FB expansion because Bessel functions are solutions to the cylindrical wave equation. The proposed approach to speaker identification is based on FBCC features and employs Gaussian mixture models to capture the speaker characteristics. The speaker models are built from Fourier-Bessel features derived from the speech samples, as an alternative to Mel-frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC). The Gaussian mixture models are evaluated on the TIMIT database, which consists of 630 speakers with 10 speech utterances per speaker, and on TIMIT signals corrupted by white noise at SNRs of 50, 40, 30, and 20 dB.
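As a rough illustration of the FB expansion, the sketch below computes zero-order Fourier-Bessel series coefficients of a single frame using only the Python standard library. It assumes the commonly used zero-order form, x(t) = sum_m C_m J0(r_m t / a) on 0 <= t < a, where r_m are the positive roots of J0; this section does not state the exact formulation or the later cepstral steps, so the function names (`bessel_j`, `j0_roots`, `fb_coefficients`), the frame length, and the number of coefficients are illustrative assumptions rather than the authors' implementation.

```python
import math

def bessel_j(order, x, steps=200):
    """Integer-order Bessel function J_order(x) from Bessel's integral,
    (1/pi) * integral_0^pi cos(order*tau - x*sin(tau)) dtau, midpoint rule."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        tau = (k + 0.5) * h
        total += math.cos(order * tau - x * math.sin(tau))
    return total * h / math.pi

def j0_roots(count):
    """First `count` positive roots of J0, refined by Newton's method."""
    roots = []
    for m in range(1, count + 1):
        x = (m - 0.25) * math.pi          # asymptotic starting guess
        for _ in range(20):
            # Newton step; the derivative of J0 is -J1.
            x += bessel_j(0, x) / bessel_j(1, x)
        roots.append(x)
    return roots

def fb_coefficients(frame, n_coeffs):
    """Zero-order FB coefficients of one frame (assumed formulation).

    Approximates
        C_m = 2 / (a^2 * J1(r_m)^2) * integral_0^a t x(t) J0(r_m t / a) dt
    with a unit-step Riemann sum, where a = len(frame).
    """
    a = len(frame)
    coeffs = []
    for root in j0_roots(n_coeffs):
        acc = sum(t * x * bessel_j(0, root * t / a) for t, x in enumerate(frame))
        coeffs.append(2.0 * acc / (a * a * bessel_j(1, root) ** 2))
    return coeffs

# Illustrative check: a frame made of the third basis function alone
# should put nearly all of its weight on the third coefficient.
a = 64
roots = j0_roots(8)
frame = [bessel_j(0, roots[2] * t / a) for t in range(a)]
coeffs = fb_coefficients(frame, 8)
```

Because the basis functions are orthogonal with weight t on the interval, each coefficient isolates one damped-sinusoid component of the frame, which is the property the abstract relies on when it speaks of identifying the different components of the signal.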
A statistical model such as the Gaussian mixture model (GMM), together with features extracted from the speech signals, builds a unique identity for each person enrolled for speaker identification [1]. The expectation-maximization (EM) algorithm is used to find the maximum-likelihood solution for each speaker model from the features, and later test utterances are scored against the database of all enrolled speakers. Experimental results show that FBCC can be used as an alternative feature to LPCC and MFCC, since it can improve the performance of the speaker identification task.
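The enrollment and identification steps described above can be sketched as follows. This is a minimal standard-library illustration, assuming diagonal-covariance GMMs, random initialization from the training frames, and synthetic two-dimensional feature vectors standing in for real FBCC frames; the speaker names, mixture size, and iteration count are hypothetical and are not taken from the paper.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def component_logs(x, gmm):
    """log(weight * density) of every mixture component for one frame."""
    return [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]

def log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def train_gmm(frames, n_components=2, n_iter=20, seed=0, var_floor=1e-3):
    """Fit a diagonal GMM with EM; returns a list of (weight, mean, var)."""
    rng = random.Random(seed)
    dim = len(frames[0])
    gmm = [(1.0 / n_components, list(m), [1.0] * dim)
           for m in rng.sample(frames, n_components)]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each frame.
        resp = []
        for x in frames:
            logs = component_logs(x, gmm)
            norm = log_sum_exp(logs)
            resp.append([math.exp(l - norm) for l in logs])
        # M-step: re-estimate weights, means, and floored variances.
        new_gmm = []
        for k in range(n_components):
            nk = sum(r[k] for r in resp) + 1e-12
            mean = [sum(r[k] * x[d] for r, x in zip(resp, frames)) / nk
                    for d in range(dim)]
            var = [max(var_floor,
                       sum(r[k] * (x[d] - mean[d]) ** 2
                           for r, x in zip(resp, frames)) / nk)
                   for d in range(dim)]
            new_gmm.append((max(nk / len(frames), 1e-12), mean, var))
        gmm = new_gmm
    return gmm

def average_log_likelihood(frames, gmm):
    return sum(log_sum_exp(component_logs(x, gmm)) for x in frames) / len(frames)

def identify(frames, models):
    """Return the enrolled speaker whose model scores the frames highest."""
    return max(models, key=lambda name: average_log_likelihood(frames, models[name]))

def synthetic_frames(center, n, rng):
    """Stand-in for real feature frames: Gaussian noise around a center."""
    return [[rng.gauss(c, 0.5) for c in center] for _ in range(n)]

rng = random.Random(1)
models = {
    "speaker_a": train_gmm(synthetic_frames([0.0, 0.0], 200, rng)),
    "speaker_b": train_gmm(synthetic_frames([3.0, 3.0], 200, rng)),
}
predicted = identify(synthetic_frames([3.0, 3.0], 50, rng), models)
```

Each enrolled speaker gets one model, and a test utterance is assigned to the speaker whose model gives the highest average log-likelihood over its frames. In practice the feature vectors would be FBCC, MFCC, or LPCC frames rather than the synthetic data used here.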