The analysis of speech in individuals with amyotrophic lateral sclerosis (ALS) is a powerful tool to support clinicians in the assessment of bulbar dysfunction. However, the methods currently used in clinical practice rely on subjective evaluations or expensive instrumentation. This study investigates different approaches that combine audio-visual analysis and machine learning to predict the speech impairment evaluation performed by clinicians. Using a small dataset of acoustic and kinematic features extracted from audio and video recordings of speech tasks, we trained and tested several regression models. The best performance was achieved using the extreme boosting machine regressor with multimodal features, which resulted in a root mean squared error of 0.93 on a scale ranging from 5 to 25. The results suggest that integrating audio and video analysis enhances speech impairment assessment, providing an objective tool for early detection and monitoring of bulbar dysfunction, including in home settings.
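To put the reported error in context, the following minimal sketch computes a root mean squared error and expresses it as a fraction of the 5-to-25 scale range. The function names and the example scores are hypothetical illustrations; they are not data or code from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length score sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rmse_as_fraction_of_range(rmse_value, scale_min=5.0, scale_max=25.0):
    """Express an RMSE relative to the width of the rating scale."""
    return rmse_value / (scale_max - scale_min)


# Hypothetical clinician scores and model predictions (NOT data from the study).
clinician_scores = [5, 12, 18, 25]
predicted_scores = [6, 11, 19, 24]
example_rmse = rmse(clinician_scores, predicted_scores)

# The RMSE reported in the abstract, relative to the 20-point scale width.
reported_fraction = rmse_as_fraction_of_range(0.93)
```

Under this normalization, an RMSE of 0.93 on a scale spanning 20 points corresponds to roughly 4.65% of the scale range.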

Multimodal Assessment of Speech Impairment in ALS Using Audio-Visual and Machine Learning Approaches / Pierotti, F.; Bandini, A. - (2025), pp. 3743-3747. (26th Interspeech Conference 2025, Rotterdam Ahoy Convention Centre, Ahoyweg 10, Netherlands, 2025) [10.21437/Interspeech.2025-1931].

Multimodal Assessment of Speech Impairment in ALS Using Audio-Visual and Machine Learning Approaches

Bandini A.
2025

Year: 2025
Conference: 26th Interspeech Conference 2025
Location: Rotterdam Ahoy Convention Centre, Ahoyweg 10, Netherlands
Pages: 3743-3747
Authors: Pierotti, F.; Bandini, A.
Files in this item:
File: 2025_Pierotti_INTERSPEECH.pdf
Access: Restricted
Type: VOR - Version published by the publisher
Size: 386.55 kB
Format: Adobe PDF
Creative Commons License
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1401671