Abstract
Traditionally, dysarthric speech intelligibility assessment systems have treated speech as the primary input, relying on methods such as extracting relevant speech features, training classification models, aligning Automatic Speech Recognition (ASR) outputs, and comparing speech representations of dysarthric and healthy speakers. However, an automated intelligibility assessment that closely mirrors the auditory-perceptual evaluations conducted by clinicians requires a model that captures both the acoustic characteristics of dysarthric speech and the linguistic structure related to word pronunciation. Inspired by clinical practice, this study introduces a novel text-guided dysarthric speech intelligibility assessment framework that leverages custom keyword spotting (DySIA-CKWS). The model evaluates intelligibility by detecting specific keywords and is extensively tested on the UA-Speech database, both speaker by speaker and across word groups of varying complexity. To assess robustness, the system's performance is further validated on the TORGO database, demonstrating its adaptability in cross-database settings. Statistical analysis on the UA-Speech database shows strong alignment between predicted and subjective intelligibility scores, with a Pearson Correlation Coefficient (PCC) of 0.9588 and a Spearman's Rank Correlation Coefficient (SCC) of 0.9141. The findings emphasize the importance of word selection and demonstrate the model's effectiveness in diagnosing dysarthric speech, offering a significant advancement in intelligibility assessment methodologies.
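The two agreement metrics reported above (PCC and SCC) can be computed as in the following minimal sketch. It assumes that predicted and subjective intelligibility scores are available as equal-length per-speaker lists; the function names and the sample scores are hypothetical illustrations, not values or code from this study. Spearman's coefficient is computed here as the Pearson correlation of average ranks, which handles tied scores.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (PCC) between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation (SCC): the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical per-speaker scores (fraction of words judged intelligible).
    predicted = [0.92, 0.55, 0.30, 0.78]
    subjective = [0.95, 0.60, 0.25, 0.70]
    print(f"PCC = {pearson(predicted, subjective):.4f}")
    print(f"SCC = {spearman(predicted, subjective):.4f}")
```

PCC rewards a linear relationship between predicted and subjective scores, while SCC only requires that both orderings of speakers agree, which is why the two values reported in the abstract can differ.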