
Towards Video Captioning with Naming: a Novel Dataset and a Multi-Modal Approach / Pini, Stefano; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita. - 10485:(2017), pp. 384-395. (Paper presented at the 19th International Conference on Image Analysis and Processing, ICIAP 2017, held in Catania, Italy, 11-15 September 2017) [10.1007/978-3-319-68548-9_36].

Towards Video Captioning with Naming: a Novel Dataset and a Multi-Modal Approach

Pini, Stefano; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita
2017

Abstract

Current approaches for movie description lack the ability to name characters with their proper names, and can only indicate people with a generic "someone" tag. In this paper we present two contributions towards the development of video description architectures with naming capabilities: firstly, we collect and release an extension of the popular Montreal Video Annotation Dataset in which the visual appearance of each character is linked both through time and to textual mentions in captions. We annotate, in a semi-automatic manner, a total of 53k face tracks and 29k textual mentions on 92 movies. Moreover, to underline and quantify the challenges of the task of generating captions with names, we present different multi-modal approaches to solve the problem on already generated captions.
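The abstract does not specify how the multi-modal approaches work. Purely to make the task concrete, the following Python sketch (not the authors' method) assumes that some model has already scored how well each character name fits each generic "someone" mention in a generated caption, and fills the mentions with the one-to-one assignment of names that maximizes the total score. The function `fill_names`, the score matrix and the example names are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence

PUNCT = ".,;:!?"


def fill_names(caption: str, names: Sequence[str], scores: List[List[float]],
               placeholder: str = "someone") -> str:
    """Replace every generic `placeholder` mention with a distinct name.

    scores[i][j] is an assumed compatibility score between the i-th
    placeholder (in reading order) and names[j]; higher means more likely.
    """
    tokens = caption.split()
    # Positions of the generic mentions, ignoring case and trailing punctuation.
    slots = [k for k, tok in enumerate(tokens)
             if tok.lower().rstrip(PUNCT) == placeholder]
    if len(slots) > len(names):
        raise ValueError("more placeholders than candidate names")

    # Exhaustive search over one-to-one assignments (no name used twice).
    # Fine for the few mentions in one caption; the Hungarian algorithm
    # solves the same assignment problem efficiently at larger scale.
    best, best_score = (), float("-inf")
    for perm in permutations(range(len(names)), len(slots)):
        total = sum(scores[i][j] for i, j in enumerate(perm))
        if total > best_score:
            best, best_score = perm, total

    for i, j in enumerate(best):
        tok = tokens[slots[i]]
        core = tok.rstrip(PUNCT)
        # Keep any trailing punctuation from the replaced word.
        tokens[slots[i]] = names[j] + tok[len(core):]
    return " ".join(tokens)
```

For example, `fill_names("Someone greets someone.", ["Ann", "Bob"], [[0.9, 0.8], [0.85, 0.1]])` returns `"Bob greets Ann."`: a greedy left-to-right choice would give Ann the first slot (0.9) but leave only 0.1 for the second, whereas the joint assignment scores 1.65.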
Year: 2017
Conference: 19th International Conference on Image Analysis and Processing, ICIAP 2017
Location: Catania, Italy
Dates: 11-15 September 2017
Volume: 10485
Pages: 384-395
Authors: Pini, Stefano; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita
Files in this item:
File: paper.pdf (Open access)
Type: Author's revised version accepted for publication
Size: 847.59 kB
Format: Adobe PDF

Creative Commons license
Metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise stated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1137457
Citations
  • Scopus: 2
  • ISI Web of Science: 1