A Hierarchical Quasi-Recurrent approach to Video Captioning / Bolelli, Federico; Baraldi, Lorenzo; Grana, Costantino. - (2018), pp. 162-167. (Paper presented at the 2018 IEEE International Conference on Image Processing, Applications and Systems (IPAS), held at Inria Sophia Antipolis, France, Dec 12-14) [10.1109/IPAS.2018.8708893].
A Hierarchical Quasi-Recurrent approach to Video Captioning
Bolelli, Federico; Baraldi, Lorenzo; Grana, Costantino
2018
Abstract
Video captioning has attracted considerable attention thanks to the use of Recurrent Neural Networks, which can be employed both to encode the input video and to generate the corresponding description. In this paper, we present a recurrent video encoding scheme that can discover and exploit the layered structure of the video. In contrast to the established encoder-decoder approach, in which a video is encoded continuously by a recurrent layer, we propose to employ Quasi-Recurrent Neural Networks, extending their basic cell with a boundary detector that can recognize discontinuity points between frames or segments and modify the temporal connections of the encoding layer accordingly. We assess our approach on a large-scale dataset, the Montreal Video Annotation dataset. Experiments demonstrate that our approach can find suitable levels of representation of the input information while reducing the computational requirements.
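To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch of a QRNN fo-pooling layer extended with a boundary detector. The class name `BoundaryAwareQRNN`, the extra convolutional channel used to predict boundaries, and the specific reset rule (damping the carried cell state by `1 - s_t`) are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class BoundaryAwareQRNN(nn.Module):
    """Sketch of a QRNN layer whose recurrent pooling is reset when a
    learned boundary detector signals a discontinuity between steps.
    Illustrative only: names and the exact reset rule are assumptions."""

    def __init__(self, input_size, hidden_size, kernel_size=2):
        super().__init__()
        self.hidden_size = hidden_size
        # One causal convolution produces the candidate (Z), forget (F)
        # and output (O) activations of a standard QRNN cell, plus a
        # single extra channel for the boundary detector (S).
        self.conv = nn.Conv1d(input_size, 3 * hidden_size + 1,
                              kernel_size, padding=kernel_size - 1)

    def forward(self, x):
        # x: (batch, time, input_size)
        batch, time, _ = x.shape
        g = self.conv(x.transpose(1, 2))[:, :, :time]  # drop lookahead pad
        z, f, o, s = torch.split(g, [self.hidden_size] * 3 + [1], dim=1)
        z, f, o = torch.tanh(z), torch.sigmoid(f), torch.sigmoid(o)
        s = torch.sigmoid(s)  # boundary probability, ~1 at discontinuities

        c = x.new_zeros(batch, self.hidden_size)
        outputs = []
        for t in range(time):
            # fo-pooling with the carried memory damped by (1 - s_t):
            # when a boundary fires, the temporal connection is cut and
            # the cell state restarts from the new segment.
            keep = 1.0 - s[:, :, t]                   # (batch, 1)
            c = f[:, :, t] * keep * c + (1.0 - f[:, :, t]) * z[:, :, t]
            outputs.append(o[:, :, t] * c)
        return torch.stack(outputs, dim=1)            # (batch, time, hidden)
```

For instance, `BoundaryAwareQRNN(512, 256)(torch.randn(4, 16, 512))` yields a `(4, 16, 256)` tensor. Because the gates come from convolutions rather than recurrent matrix multiplications, only the lightweight pooling loop is sequential, which is where QRNNs save computation over a standard LSTM encoder.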
| File | Type | Size | Format |
|---|---|---|---|
| 2018_IPAS_A_Hierarchical_Quasi_Recurrent_Approach_to_Video_Captioning.pdf (Open access) | Author's original version proposed for publication | 971.05 kB | Adobe PDF |
Metadata in IRIS UNIMORE is released under a Creative Commons CC0 1.0 Universal license, while publication files are released under an Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.
In case of copyright infringement, contact Supporto Iris.