Abstract
Voice disorders are caused by abnormalities in the laryngeal system. Their signs and symptoms may include abnormal pitch (too high, too low, or pitch breaks), reduced loudness, degraded voice quality (breathy, rough, or strained), and loss of voice. Instrumental assessment, auditory-perceptual assessment, and objective assessment are the most widely used methods for diagnosing voice disorders. Instrumental assessment often involves the use of laryngoscopes and stroboscopes, but these procedures can be expensive and painful. Auditory-perceptual assessment by Speech-Language Pathologists (SLPs) is considered the gold standard for detecting voice disorders. However, the decisions taken in these subjective tests vary with the examiner's experience and the type of scale used. To address these limitations, objective (automatic) assessment methods have been extensively explored in the literature. These approaches extract acoustic features from speech signals, offering reliable, cost-effective, and repeatable assessments, and they have the potential to serve SLPs as a pre-diagnostic measure. This thesis focuses primarily on objective assessment of voice disorders. The objective methods reported in the literature aim to detect the presence or absence of a voice disorder and to rate its severity. However, clinical assessment of voice disorders relies on the underlying etiological diagnosis. Therefore, this study proposes a clinically oriented approach to assessing voice disorders. 
In addition to detection, which has been explored in the literature, this thesis explores an objective assessment method that automatically identifies the cause of a voice disorder from acoustic features extracted from the speech signal. Speech samples are categorized into four distinct categories: structural, neurogenic, functional, and psychogenic. To conduct a comprehensive clinical analysis, a multi-level classification approach is employed, in which four binary classifiers are trained on acoustic features. Voice disorders are characterized by irregular vocal fold vibration, incomplete glottal closure and opening, and variation in the amplitude of consecutive openings and closings of the vocal folds. Hence, parameters that capture these disturbances well should be able to discriminate disordered voices from healthy ones. Under the source-filter model of speech production, these disturbances are best captured from the excitation source signal. The glottal flow waveform, the zero frequency filtered (ZFF) signal, and the linear prediction (LP) residual are three forms of excitation source evidence, and features derived from them were used to capture the characteristics of voice disorders. The first study explores perturbation features (jitter, shimmer, noise-to-harmonics ratio, etc.) and cepstral features derived from the excitation source evidence for the detection and identification of voice disorders. To capture the excitation source information, state-of-the-art speech signal processing techniques were explored, namely quasi-closed-phase (QCP) analysis, LP analysis, and ZFF. This study concluded that perturbation parameters capture voice disorder information more effectively. 
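As an illustration of the perturbation features mentioned above, the following is a minimal sketch of the commonly used local jitter and local shimmer measures, computed from a sequence of epoch (glottal closure) instants and per-cycle peak amplitudes. The function names, the input format, and the choice of the "local" variants are assumptions made for illustration; they are not taken from the thesis, which may use different definitions or additional perturbation measures.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive values, divided by their mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def jitter_shimmer(epochs, peak_amplitudes):
    """Return (local jitter, local shimmer) as ratios, not percentages.

    epochs: epoch instants in seconds, in increasing order (hypothetical input).
    peak_amplitudes: one peak amplitude per glottal cycle (hypothetical input).
    """
    # Pitch periods are the intervals between consecutive epochs.
    periods = [t1 - t0 for t0, t1 in zip(epochs, epochs[1:])]
    return local_perturbation(periods), local_perturbation(peak_amplitudes)


if __name__ == "__main__":
    epochs = [0.0, 0.010, 0.021, 0.031, 0.042]
    amplitudes = [1.0, 0.9, 1.0, 0.9]
    jitter, shimmer = jitter_shimmer(epochs, amplitudes)
    print(f"jitter={jitter:.4f} shimmer={shimmer:.4f}")
```

Both measures depend directly on the epoch locations, so errors in epoch estimation propagate into the perturbation features.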
In addition, it was found that excitation source based features can discriminate organic from non-organic voice disorders, as well as structural from neurogenic voice disorders. However, distinguishing functional voice disorders from psychogenic voice disorders proved challenging. The first study thus showed that excitation source based features can differentiate the various categories of voice disorders. Computing these features involves detecting epoch locations in the speech signal, so accurate epoch estimation is important for the automatic detection and identification of voice disorders. The second study compares algorithms for detecting epoch locations in speech associated with voice disorders. Nine state-of-the-art epoch extraction algorithms were considered, and their performance was evaluated for the different categories of voice disorders. The results show that most epoch extraction methods performed better on healthy speech and that their performance degraded on speech associated with voice disorders. Furthermore, the performance of epo