A Hierarchical Quasi-Recurrent approach to Video Captioning

Bolelli, Federico; Baraldi, Lorenzo; Grana, Costantino
2018

Abstract

Video captioning has attracted considerable attention thanks to the use of Recurrent Neural Networks, which can be used both to encode the input video and to generate the corresponding description. In this paper, we present a recurrent video encoding scheme that can discover and exploit the hierarchical structure of the video. Departing from the established encoder-decoder approach, in which a video is encoded continuously by a recurrent layer, we propose to employ Quasi-Recurrent Neural Networks, extending their basic cell with a boundary detector that recognizes discontinuity points between frames or segments and modifies the temporal connections of the encoding layer accordingly. We evaluate our approach on a large-scale dataset, the Montreal Video Annotation dataset. Experiments demonstrate that our approach can find suitable levels of representation of the input information while reducing the computational requirements.
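
To make the idea in the abstract concrete, the sketch below shows one possible way to couple QRNN-style convolutional gating with a boundary detector that cuts the temporal connection of the recurrent pooling when a discontinuity between frames or segments is predicted. It is an illustrative PyTorch sketch under our own assumptions (the module name, gate layout, and soft reset are hypothetical), not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class BoundaryAwareQRNN(nn.Module):
    """Illustrative QRNN-style encoding layer with a boundary detector (sketch)."""

    def __init__(self, input_size, hidden_size, kernel_size=2):
        super().__init__()
        pad = kernel_size - 1
        # Convolutional gates of a standard QRNN: candidate z and forget gate f.
        self.gates = nn.Conv1d(input_size, 2 * hidden_size, kernel_size, padding=pad)
        # Boundary detector: one logit per time step.
        self.boundary = nn.Conv1d(input_size, 1, kernel_size, padding=pad)
        self.hidden_size = hidden_size

    def forward(self, x):
        # x: (batch, time, features), e.g. per-frame CNN descriptors
        T = x.size(1)
        xt = x.transpose(1, 2)                            # (batch, features, time)
        z, f = self.gates(xt)[:, :, :T].chunk(2, dim=1)   # trim for causality
        z, f = torch.tanh(z), torch.sigmoid(f)
        s = torch.sigmoid(self.boundary(xt)[:, :, :T])    # boundary prob in [0, 1]

        outputs = []
        c = x.new_zeros(x.size(0), self.hidden_size)
        for t in range(T):
            # f-pooling, with the temporal connection cut at detected boundaries:
            # when s_t -> 1 the previous state is ignored and the cell restarts.
            keep = (1.0 - s[:, 0, t]).unsqueeze(1)         # (batch, 1)
            c = f[:, :, t] * keep * c + (1.0 - f[:, :, t]) * z[:, :, t]
            outputs.append(c)
        h = torch.stack(outputs, dim=1)                    # (batch, time, hidden)
        return h, s.squeeze(1)                             # hidden states, boundaries
```

For instance, a tensor of shape (batch, num_frames, 2048) holding per-frame CNN features could be passed through such a layer before a recurrent decoder generates the caption; the returned boundary probabilities expose the layered structure the encoder has discovered.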
Year: 2018
Available: 9 May 2019
Conference: 2018 IEEE International Conference on Image Processing, Applications and Systems (IPAS)
Venue: Inria Sophia Antipolis, France
Dates: Dec 12-14
Pages: 162-167
Authors: Bolelli, Federico; Baraldi, Lorenzo; Grana, Costantino
A Hierarchical Quasi-Recurrent approach to Video Captioning / Bolelli, Federico; Baraldi, Lorenzo; Grana, Costantino. - (2018), pp. 162-167. (Paper presented at the 2018 IEEE International Conference on Image Processing, Applications and Systems (IPAS), held at Inria Sophia Antipolis, France, Dec 12-14) [10.1109/IPAS.2018.8708893].
Files in this record:
2018_IPAS_A_Hierarchical_Quasi_Recurrent_Approach_to_Video_Captioning.pdf

Open access

Type: Author's original version submitted for publication
Size: 971.05 kB
Format: Adobe PDF

Creative Commons License
Metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International license (CC BY 4.0), unless otherwise indicated.
In case of copyright infringement, please contact Iris Support.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1166860
Citations
  • PMC: n/a
  • Scopus: 15
  • Web of Science: 14