The analysis of speech in individuals with amyotrophic lateral sclerosis is a powerful tool to support clinicians in the assessment of bulbar dysfunction. However, current methods used in clinical practice rely on subjective evaluations or expensive instrumentation. This study investigates different approaches combining audio-visual analysis and machine learning to predict the speech impairment evaluation performed by clinicians. Using a small dataset of acoustic and kinematic features extracted from audio and video recordings of speech tasks, we trained and tested several regression models. The best performance was achieved using the extreme boosting machine regressor with multimodal features, which resulted in a root mean squared error of 0.93 on a scale ranging from 5 to 25. Results suggest that integrating audio-visual analysis enhances speech impairment assessment, providing an objective tool for early detection and monitoring of bulbar dysfunction, including in home settings.
Multimodal Assessment of Speech Impairment in ALS Using Audio-Visual and Machine Learning Approaches / Pierotti, F.; Bandini, A. - (2025), pp. 3743-3747. (26th Interspeech Conference 2025, Rotterdam Ahoy Convention Centre, Ahoyweg 10, Netherlands, 2025) [10.21437/Interspeech.2025-1931].
Multimodal Assessment of Speech Impairment in ALS Using Audio-Visual and Machine Learning Approaches
Bandini A.
2025
| File | Access | Type | Size | Format |
|---|---|---|---|---|
| 2025_Pierotti_INTERSPEECH.pdf | Restricted access | VOR - Version published by the publisher | 386.55 kB | Adobe PDF |

The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International license (CC BY 4.0), unless otherwise indicated.




