Vision and language integration: Moving beyond objects / Shekhar, R.; Pezzelle, S.; Herbelot, A.; Nabi, M.; Sangineto, E.; Bernardi, R. - (2017). (Paper presented at the 12th International Conference on Computational Semantics, IWCS 2017, held in Montpellier, France, 19-22 September 2017).

Vision and language integration: Moving beyond objects

Sangineto E.;
2017

Abstract

Recent years have seen an explosion of work on the integration of vision and language data. New tasks like Image Captioning and Visual Question Answering have been proposed and impressive results have been achieved. There is now a shared desire to gain an in-depth understanding of the strengths and weaknesses of those models. To this end, several datasets have been proposed to try to challenge the state of the art. Those datasets, however, mostly focus on the interpretation of objects (as denoted by nouns in the corresponding captions). In this paper, we reuse a previously proposed methodology to evaluate the ability of current systems to move beyond objects and deal with attributes (as denoted by adjectives), actions (verbs), manner (adverbs) and spatial relations (prepositions). We show that the coarse representations given by current approaches are not informative enough to interpret attributes or actions, whilst spatial relations fare somewhat better, but only in attention models.
Year: 2017
Conference: 12th International Conference on Computational Semantics, IWCS 2017
Venue: Montpellier, France
Dates: 19-22 September 2017
Authors: Shekhar, R.; Pezzelle, S.; Herbelot, A.; Nabi, M.; Sangineto, E.; Bernardi, R.
Files in this record:
File: Vision.pdf
Access: Restricted (copy available on request)
Type: Author's version, revised and accepted for publication
Size: 591.98 kB
Format: Adobe PDF

Use this identifier to cite or link to this record: https://hdl.handle.net/11380/1281620
Citations
  • PMC: not available
  • Scopus: 10
  • Web of Science (ISI): not available