With a growing number of online videos, many producers feel the need to use video captions in order to expand content accessibility and face two main issues: production and alignment of the textual transcript. Both activities are expensive either for the high labor of human resources or for the employment of dedicated software. In this paper, we focus on caption alignment and we propose a novel, automatic, simple and low-cost mechanism that does not require human transcriptions or special dedicated software to align captions. Our mechanism uses a unique audio markup and intelligently introduces copies of it into the audio stream before giving it to an off-the-shelf automatic speech recognition (ASR) application; then it transforms the plain transcript produced by the ASR application into a timecoded transcript, which allows video players to know when to display every single caption while playing out the video. The experimental study evaluation shows that our proposal is effective in producing timecoded transcripts and therefore it can be helpful to expand video content accessibility.

An automatic caption alignment mechanism for off-the-shelf speech recognition technologies / Federico, Maria; Furini, Marco. - In: MULTIMEDIA TOOLS AND APPLICATIONS. - ISSN 1380-7501. - STAMPA. - 72:1(2014), pp. 21-40. [10.1007/s11042-012-1318-3]

An automatic caption alignment mechanism for off-the-shelf speech recognition technologies

FEDERICO, Maria;FURINI, Marco
2014

Abstract

With a growing number of online videos, many producers feel the need to use video captions in order to expand content accessibility and face two main issues: production and alignment of the textual transcript. Both activities are expensive either for the high labor of human resources or for the employment of dedicated software. In this paper, we focus on caption alignment and we propose a novel, automatic, simple and low-cost mechanism that does not require human transcriptions or special dedicated software to align captions. Our mechanism uses a unique audio markup and intelligently introduces copies of it into the audio stream before giving it to an off-the-shelf automatic speech recognition (ASR) application; then it transforms the plain transcript produced by the ASR application into a timecoded transcript, which allows video players to know when to display every single caption while playing out the video. The experimental study evaluation shows that our proposal is effective in producing timecoded transcripts and therefore it can be helpful to expand video content accessibility.
2014
72
1
21
40
An automatic caption alignment mechanism for off-the-shelf speech recognition technologies / Federico, Maria; Furini, Marco. - In: MULTIMEDIA TOOLS AND APPLICATIONS. - ISSN 1380-7501. - STAMPA. - 72:1(2014), pp. 21-40. [10.1007/s11042-012-1318-3]
Federico, Maria; Furini, Marco
File in questo prodotto:
File Dimensione Formato  
MTAP-2012.pdf

Accesso riservato

Descrizione: Articolo principale
Tipologia: Versione dell'autore revisionata e accettata per la pubblicazione
Dimensione 565.71 kB
Formato Adobe PDF
565.71 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

Licenza Creative Commons
I metadati presenti in IRIS UNIMORE sono rilasciati con licenza Creative Commons CC0 1.0 Universal, mentre i file delle pubblicazioni sono rilasciati con licenza Attribuzione 4.0 Internazionale (CC BY 4.0), salvo diversa indicazione.
In caso di violazione di copyright, contattare Supporto Iris

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11380/909690
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 23
  • ???jsp.display-item.citation.isi??? 18
social impact