Towards Video Captioning with Naming: a Novel Dataset and a Multi-Modal Approach

Pini, Stefano; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

doi:10.1007/978-3-319-68548-9_36

Current approaches to movie description cannot refer to characters by their proper names and can only indicate people with a generic "someone" tag. In this paper, we present two contributions towards the development of video description architectures with naming capabilities. First, we collect and release an extension of the popular Montreal Video Annotation Dataset in which the visual appearance of each character is linked both through time and to textual mentions in captions. We annotate, in a semi-automatic manner, a total of 53k face tracks and 29k textual mentions across 92 movies. Second, to underline and quantify the challenges of generating captions with names, we present different multi-modal approaches to solve the problem on already generated captions.

Towards Video Captioning with Naming: a Novel Dataset and a Multi-Modal Approach / Pini, S., Cornia, M., Baraldi, L., Cucchiara, R. - 10485:(2017), pp. 384-395. (19th International Conference on Image Analysis and Processing, ICIAP 2017, Catania, Italy, 11-15 September 2017) [10.1007/978-3-319-68548-9_36].


Year of publication	2017
Conference title	19th International Conference on Image Analysis and Processing, ICIAP 2017
Conference location	Catania, Italy
Conference dates	11-15 September 2017
DOI	https://dx.doi.org/10.1007/978-3-319-68548-9_36
WoS ID	WOS:000445230400036
Scopus ID	2-s2.0-85032488641
Series	Lecture Notes in Computer Science
Volume	10485
First page	384
Last page	395
Type	Paper in conference proceedings

Files in this item:

File	Size	Format
paper.pdf (Open access; type: AAM, author's version revised and accepted for publication)	847.59 kB	Adobe PDF


The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.
In case of copyright infringement, contact IRIS Support.