Multimodal Assessment of Speech Impairment in ALS Using Audio-Visual and Machine Learning Approaches

Pierotti, F.; Bandini, A.

doi:10.21437/Interspeech.2025-1931

The analysis of speech in individuals with amyotrophic lateral sclerosis (ALS) is a powerful tool to support clinicians in the assessment of bulbar dysfunction. However, the methods currently used in clinical practice rely on either subjective evaluations or expensive instrumentation. This study investigates different approaches that combine audio-visual analysis and machine learning to predict the speech impairment evaluation performed by clinicians. Using a small dataset of acoustic and kinematic features extracted from audio and video recordings of speech tasks, we trained and tested several regression models. The best performance was achieved using the extreme boosting machine regressor with multimodal features, which resulted in a root mean squared error of 0.93 on a scale ranging from 5 to 25. The results suggest that integrating audio and video analysis enhances speech impairment assessment, providing an objective tool for early detection and monitoring of bulbar dysfunction, including in home settings.
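To put the reported error in context, the short sketch below computes a root mean squared error and expresses it as a fraction of the 5 to 25 rating range. It is only an illustration: the function names `rmse` and `fraction_of_scale` and the example scores are hypothetical and are not taken from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between reference scores and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def fraction_of_scale(error, scale_min=5.0, scale_max=25.0):
    """Express an error as a fraction of the rating scale's range."""
    return error / (scale_max - scale_min)


# Hypothetical scores, invented for illustration only (not data from the study).
clinician = [5, 9, 14, 20, 25]
predicted = [6, 9, 13, 21, 24]

example_rmse = rmse(clinician, predicted)

# 0.93 is the RMSE reported in the abstract; 5 and 25 are the scale bounds.
reported_fraction = fraction_of_scale(0.93)

print(f"example RMSE: {example_rmse:.3f}")
print(f"reported RMSE as a share of the scale range: {reported_fraction:.2%}")
```

Under this reading, an error of 0.93 corresponds to 0.93 / 20, or about 4.65% of the scale's range.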

Multimodal Assessment of Speech Impairment in ALS Using Audio-Visual and Machine Learning Approaches / Pierotti, F., Bandini, A.. - (2025), pp. 3743-3747. (26th Interspeech Conference 2025 Rotterdam Ahoy Convention Centre, Ahoyweg 10, nld 2025) [10.21437/Interspeech.2025-1931].

Publication year: 2025
Conference title: 26th Interspeech Conference 2025
Conference venue: Rotterdam Ahoy Convention Centre, Ahoyweg 10, nld
Conference date: 2025
DOI: https://dx.doi.org/10.21437/Interspeech.2025-1931
WoS code: WOS:001613931400173
Scopus code: 2-s2.0-105020047259
Series: INTERSPEECH
First page: 3743
Last page: 3747
All authors: Pierotti, F.; Bandini, A.
Type: Conference proceedings paper

Files in this record:

File	Access	Type	Size	Format
2025_Pierotti_INTERSPEECH.pdf	Restricted access	VOR (publisher's published version)	386.55 kB	Adobe PDF

Metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.