An automatic caption alignment mechanism for off-the-shelf speech recognition technologies

Federico, Maria; Furini, Marco

doi:10.1007/s11042-012-1318-3

With a growing number of online videos, many producers want to add captions to make their content more accessible, but they face two main issues: producing the textual transcript and aligning it with the video. Both activities are expensive, either because of the human labor involved or because they require dedicated software. In this paper, we focus on caption alignment and propose a novel, automatic, simple and low-cost mechanism that aligns captions without requiring human transcriptions or dedicated software. Our mechanism uses a unique audio markup and intelligently inserts copies of it into the audio stream before passing the stream to an off-the-shelf automatic speech recognition (ASR) application; it then transforms the plain transcript produced by the ASR application into a timecoded transcript, which tells video players when to display each caption during playback. The experimental evaluation shows that our proposal is effective in producing timecoded transcripts and can therefore help expand video content accessibility.
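The transformation from plain to timecoded transcript described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the ASR application transcribes each copy of the audio markup as a known token (here the hypothetical word `beepmark`), that the insertion time of every copy on the original timeline is known, and that no marker is missed or misrecognized. All names (`MARKER`, `Caption`, `timecode_transcript`, `to_srt_time`) are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical token the ASR is assumed to emit for each audio markup copy.
MARKER = "beepmark"


@dataclass
class Caption:
    start: float  # seconds, on the original (marker-free) timeline
    end: float
    text: str


def timecode_transcript(transcript, marker_times, total_duration, marker=MARKER):
    """Split a plain ASR transcript at marker tokens and attach times.

    marker_times[i] is the position (in seconds) at which the i-th marker
    copy was inserted, so it is the boundary between segment i and i + 1.
    """
    segments = [[]]
    for word in transcript.split():
        if word.lower() == marker:
            segments.append([])  # a marker closes the current segment
        else:
            segments[-1].append(word)

    if len(segments) != len(marker_times) + 1:
        raise ValueError("transcript does not contain one token per inserted marker")

    bounds = [0.0, *marker_times, total_duration]
    captions = []
    for i, words in enumerate(segments):
        if words:  # skip stretches with no recognized speech
            captions.append(Caption(bounds[i], bounds[i + 1], " ".join(words)))
    return captions


def to_srt_time(seconds):
    """Format seconds as an SRT-style timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


if __name__ == "__main__":
    plain = "welcome to the course beepmark today we talk about captions beepmark thank you"
    for cap in timecode_transcript(plain, [2.5, 6.0], 8.0):
        print(to_srt_time(cap.start), "-->", to_srt_time(cap.end), cap.text)
```

A real pipeline would also have to handle markers that the ASR drops or transcribes differently; this sketch simply raises an error in that case.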

An automatic caption alignment mechanism for off-the-shelf speech recognition technologies / Federico, M., Furini, M.. - In: MULTIMEDIA TOOLS AND APPLICATIONS. - ISSN 1380-7501. - PRINT. - 72:1(2014), pp. 21-40. [10.1007/s11042-012-1318-3]


Publication year: 2014
Journal: MULTIMEDIA TOOLS AND APPLICATIONS
Volume: 72
Issue: 1
First page: 21
Last page: 40
DOI: https://dx.doi.org/10.1007/s11042-012-1318-3
WoS ID: WOS:000339889800002
Scopus ID: 2-s2.0-84905976807
All authors: Federico, Maria; Furini, Marco
Type: Journal article

Files in this record:

File	Size	Format
MTAP-2012.pdf (Restricted access. Description: Main article. Type: AAM - Author's version, revised and accepted for publication)	565.71 kB	Adobe PDF

