Learning to Select: A Fully Attentive Approach for Novel Object Captioning

Image captioning models have lately shown impressive results when applied to standard datasets. Switching to real-life scenarios, however, constitutes a challenge due to the larger variety of visual concepts that are not covered in existing training sets. For this reason, novel object captioning (NOC) has recently emerged as a paradigm to test captioning models on objects that are unseen during the training phase. In this paper, we present a novel approach for NOC that learns to select the most relevant objects of an image, regardless of their adherence to the training set, and to constrain the generative process of a language model accordingly. Our architecture is fully attentive and end-to-end trainable, even when incorporating constraints. We perform experiments on the held-out COCO dataset, where we demonstrate improvements over the state of the art, both in terms of adaptability to novel objects and caption quality.

Learning to Select: A Fully Attentive Approach for Novel Object Captioning / Cagrandi, Marco; Cornia, Marcella; Stefanini, Matteo; Baraldi, Lorenzo; Cucchiara, Rita. - (2021), pp. 437-441. (Paper presented at the 11th ACM International Conference on Multimedia Retrieval, ICMR 2021, held in Taipei, Taiwan, August 21-24, 2021) [10.1145/3460426.3463587].


Marco Cagrandi; Marcella Cornia; Matteo Stefanini; Lorenzo Baraldi; Rita Cucchiara

2021



Year of publication: 2021
Conference title: 11th ACM International Conference on Multimedia Retrieval, ICMR 2021
Conference location: Taipei, Taiwan
Conference dates: August 21-24, 2021
DOI: https://dx.doi.org/10.1145/3460426.3463587
WoS ID: WOS:000723651900049
Scopus ID: 2-s2.0-85114860567
First page: 437
Last page: 441
All authors: Cagrandi, Marco; Cornia, Marcella; Stefanini, Matteo; Baraldi, Lorenzo; Cucchiara, Rita
Type: Paper in conference proceedings

Files in this record:

File	Access	Type	Size	Format
2021_ICMR_NOC.pdf	Open access	AAM - Author's version, revised and accepted for publication	708.29 kB	Adobe PDF


The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.
In case of copyright infringement, contact Iris Support.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1243376
