Visual-Semantic Alignment Across Domains Using a Semi-Supervised Approach

Visual-semantic embeddings have been extensively used as a powerful model for cross-modal retrieval of images and sentences. In this setting, data coming from different modalities are projected into a common embedding space, in which distances can be used to infer the similarity between pairs of images and sentences. While this approach has shown impressive performance in fully supervised settings, its application to semi-supervised scenarios has rarely been investigated. In this paper we propose a domain adaptation model for cross-modal retrieval, in which the knowledge learned from a supervised dataset can be transferred to a target dataset in which the pairing between images and sentences is not known, or not useful for training due to the limited size of the set. Experiments are performed on two unsupervised target scenarios, related to the fashion and cultural heritage domains respectively. Results show that our model effectively transfers the knowledge learned on ordinary visual-semantic datasets to these target domains. As an additional contribution, we collect and release the dataset used for the cultural heritage domain.
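As a rough illustration of the retrieval step described in the abstract, the sketch below ranks sentences for a query image by their similarity in a shared embedding space. It is not the authors' model: the embedding vectors are made-up toy values, cosine similarity is assumed as the similarity measure (the abstract only speaks of distances), and the function names `l2_normalize`, `cosine_similarity` and `rank_sentences` are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def rank_sentences(image_emb, sentence_embs):
    """Return sentence indices sorted from most to least similar to the image."""
    scores = [cosine_similarity(image_emb, s) for s in sentence_embs]
    return sorted(range(len(sentence_embs)), key=lambda i: scores[i], reverse=True)


# Toy embeddings (assumed values); in practice these would come from
# trained image and sentence encoders that map into the same space.
image_emb = [0.9, 0.1, 0.0]
sentence_embs = [
    [0.0, 1.0, 0.0],  # sentence 0
    [1.0, 0.2, 0.0],  # sentence 1: closest to the image
    [0.0, 0.0, 1.0],  # sentence 2: orthogonal to the image
]

ranking = rank_sentences(image_emb, sentence_embs)
print(ranking)  # [1, 0, 2]
```

The same ranking function can be used in the other direction (sentence query, image candidates), since both modalities live in one space.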

Visual-Semantic Alignment Across Domains Using a Semi-Supervised Approach / Carraggi, Angelo; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita. - 11134:(2019), pp. 625-640. (Paper presented at the 15th European Conference on Computer Vision, ECCV 2018, held in Munich, Germany, 8-14 September 2018) [10.1007/978-3-030-11024-6_47].


Carraggi, Angelo; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

2019



Year of publication: 2019
Date of first publication: 2019
Conference title: 15th European Conference on Computer Vision, ECCV 2018
Conference location: Munich, Germany
Conference dates: 8-14 September 2018
DOI: https://dx.doi.org/10.1007/978-3-030-11024-6_47
WoS ID: WOS:000594200000047
Scopus ID: 2-s2.0-85061735270
Series: Lecture Notes in Computer Science
Volume number: 11134
First page: 625
Last page: 640
All authors: Carraggi, Angelo; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita
Type: Conference proceedings paper


Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1164578
