Abstract
Automatic detection and classification of stuttering events remain a challenging problem in speech processing, particularly due to the variability of disfluency
types and the scarcity of large annotated datasets. This paper presents a signal processing-based approach to stuttering classification that avoids the data-intensive requirements of machine learning models. The method focuses on syllable-level analysis: speech is automatically segmented into syllables, and each segment is classified by rules with predefined thresholds on key acoustic parameters such as energy patterns, pitch contours, spectral stability, and temporal features. The dataset comprises annotated read and spontaneous speech from 106 Kannada-speaking adults, including 26 new participants evaluated during clinical trials. Three certified Speech-Language Pathologists (SLPs) conducted perceptual evaluations, and system outputs were validated using the intraclass correlation coefficient (ICC), yielding strong agreement (ICC > 0.82) across stuttering types. The system achieved classification accuracies of 89% for blocks, 83% for repetitions, and 81% for prolongations. The system nonetheless has several limitations: it relies on static threshold values, is sensitive to noise, and lacks automated severity grading; its generalizability across languages and speaking conditions also requires further exploration. For comparison, machine learning baselines such as SVM, TDNN, and LSTM models achieved only around 30-60% accuracy. Our approach offers high interpretability, low computational cost, and real-time feasibility compared with these learning-based methods. Future work will focus on adaptive thresholding, noise-robust processing, and hybrid approaches that integrate signal processing with lightweight learning-based refinement.
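To make the rule-based scheme concrete, the following is a minimal sketch of how syllable-level acoustic features could be mapped to disfluency classes with fixed thresholds. The feature set, function name, and threshold values are illustrative assumptions only, not the exact parameters used in the paper.

```python
import numpy as np

def classify_syllable(energy, pitch, duration, prev_mfcc=None, mfcc=None,
                      energy_floor=0.01, min_block_gap=0.25,
                      prolongation_dur=0.40, pitch_flatness=5.0,
                      repetition_sim=0.85):
    """Assign one syllable-level segment to a disfluency class.

    energy    : mean frame energy of the segment
    pitch     : array of per-frame F0 values (Hz) for the segment
    duration  : segment length in seconds
    prev_mfcc, mfcc : mean MFCC vectors of the previous and current syllable,
                      used to detect near-identical (repeated) syllables
    All threshold keyword arguments are hypothetical placeholder values.
    """
    # Block: a near-silent gap longer than the allowed inter-syllable pause.
    if energy < energy_floor and duration > min_block_gap:
        return "block"

    # Prolongation: an overly long voiced segment with a flat pitch contour
    # (low F0 variation), indicating spectral/temporal stability.
    if duration > prolongation_dur and np.std(pitch) < pitch_flatness:
        return "prolongation"

    # Repetition: current syllable spectrally very similar to the previous one.
    if prev_mfcc is not None and mfcc is not None:
        sim = np.dot(prev_mfcc, mfcc) / (
            np.linalg.norm(prev_mfcc) * np.linalg.norm(mfcc) + 1e-9)
        if sim > repetition_sim:
            return "repetition"

    return "fluent"
```

Because every decision reduces to a threshold comparison on a handful of acoustic features, such a classifier is cheap to run in real time and each output can be traced back to an interpretable rule, which is the trade-off the abstract contrasts with the SVM, TDNN, and LSTM baselines.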