A Hierarchical Quasi-Recurrent approach to Video Captioning / Bolelli, Federico; Baraldi, Lorenzo; Grana, Costantino. - (2018), pp. 162-167. (Paper presented at the 2018 IEEE International Conference on Image Processing, Applications and Systems (IPAS), held at Inria Sophia Antipolis, France, Dec 12-14) [10.1109/IPAS.2018.8708893].
A Hierarchical Quasi-Recurrent approach to Video Captioning
Bolelli, Federico; Baraldi, Lorenzo; Grana, Costantino
2018
Abstract
Video captioning has attracted considerable attention thanks to the use of Recurrent Neural Networks, which can be employed both to encode the input video and to generate the corresponding description. In this paper, we present a recurrent video encoding scheme that can discover and exploit the layered structure of the video. In contrast to the established encoder-decoder approach, in which a video is encoded continuously by a recurrent layer, we propose to employ Quasi-Recurrent Neural Networks, extending their basic cell with a boundary detector that can recognize discontinuity points between frames or segments and modify the temporal connections of the encoding layer accordingly. We assess our approach on a large-scale dataset, the Montreal Video Annotation dataset. Experiments demonstrate that our approach can find suitable levels of representation of the input information while reducing the computational requirements.
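To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch of a QRNN fo-pooling layer extended with a boundary detector. The class name `BoundaryAwareQRNN`, the extra convolutional channel used to predict boundaries, and the specific reset rule (damping the carried cell state by `1 - s_t`) are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class BoundaryAwareQRNN(nn.Module):
    """Sketch of a QRNN layer whose recurrent pooling is reset when a
    learned boundary detector signals a discontinuity between steps.
    Illustrative only: names and the exact reset rule are assumptions."""

    def __init__(self, input_size, hidden_size, kernel_size=2):
        super().__init__()
        self.hidden_size = hidden_size
        # One causal convolution produces the candidate (Z), forget (F)
        # and output (O) activations of a standard QRNN cell, plus a
        # single extra channel for the boundary detector (S).
        self.conv = nn.Conv1d(input_size, 3 * hidden_size + 1,
                              kernel_size, padding=kernel_size - 1)

    def forward(self, x):
        # x: (batch, time, input_size)
        batch, time, _ = x.shape
        g = self.conv(x.transpose(1, 2))[:, :, :time]  # drop lookahead pad
        z, f, o, s = torch.split(g, [self.hidden_size] * 3 + [1], dim=1)
        z, f, o = torch.tanh(z), torch.sigmoid(f), torch.sigmoid(o)
        s = torch.sigmoid(s)  # boundary probability, ~1 at discontinuities

        c = x.new_zeros(batch, self.hidden_size)
        outputs = []
        for t in range(time):
            # fo-pooling with the carried memory damped by (1 - s_t):
            # when a boundary fires, the temporal connection is cut and
            # the cell state restarts from the new segment.
            keep = 1.0 - s[:, :, t]                   # (batch, 1)
            c = f[:, :, t] * keep * c + (1.0 - f[:, :, t]) * z[:, :, t]
            outputs.append(o[:, :, t] * c)
        return torch.stack(outputs, dim=1)            # (batch, time, hidden)
```

For instance, `BoundaryAwareQRNN(512, 256)(torch.randn(4, 16, 512))` yields a `(4, 16, 256)` tensor. Because the gates come from convolutions rather than recurrent matrix multiplications, only the lightweight pooling loop is sequential, which is where QRNNs save computation over a standard LSTM encoder.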
| File | Type | Size | Format |
|---|---|---|---|
| 2018_IPAS_A_Hierarchical_Quasi_Recurrent_Approach_to_Video_Captioning.pdf (Open access) | Author's original version proposed for publication | 971.05 kB | Adobe PDF |
Metadata in IRIS UNIMORE is released under a Creative Commons CC0 1.0 Universal license, while publication files are released under an Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.
In case of copyright infringement, contact Supporto Iris.