Visual Saliency for Image Captioning in New Multimedia Services

Image and video captioning are important tasks in visual data analytics because they concern the ability to describe visual content in natural language. They underpin query answering systems, improve indexing and search, and allow a natural form of human-machine interaction. Although promising deep learning strategies are becoming popular, the heterogeneity of large image archives means the task is still far from solved. In this paper we explore how visual saliency prediction can support image captioning. Some forms of unsupervised machine attention mechanisms have recently become widespread, but the role of human attention prediction has never been examined extensively for captioning. We propose a machine attention model driven by saliency prediction to generate captions for images, which can be exploited by many cloud and multimedia data services. Experimental evaluations are conducted on the SALICON dataset, which provides ground truth for both saliency and captioning, and on the large Microsoft COCO dataset, the most widely used dataset for image captioning.

Visual Saliency for Image Captioning in New Multimedia Services / Cornia, Marcella; Baraldi, Lorenzo; Serra, Giuseppe; Cucchiara, Rita. - (2017), pp. 309-314. (Paper presented at the 2017 IEEE International Conference on Multimedia and Expo Workshops, held in Hong Kong, July 10-14, 2017) [10.1109/ICMEW.2017.8026277].

Cornia, Marcella; Baraldi, Lorenzo; Serra, Giuseppe; Cucchiara, Rita

2017

Publication year	2017
Conference title	2017 IEEE International Conference on Multimedia and Expo Workshops
Conference location	Hong Kong
Conference dates	July 10-14, 2017
DOI	https://dx.doi.org/10.1109/ICMEW.2017.8026277
WoS ID	WOS:000427041800049
Scopus ID	2-s2.0-85031674363
First page	309
Last page	314
All authors	Cornia, Marcella; Baraldi, Lorenzo; Serra, Giuseppe; Cucchiara, Rita
Type	Conference proceedings paper

Files in this record:

File	Size	Format
main.pdf (open access; type: AAM, author's version revised and accepted for publication)	2.22 MB	Adobe PDF

The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.
In case of copyright infringement, contact IRIS Support.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1130904
