Visual-Semantic Alignment Across Domains Using a Semi-Supervised Approach / Carraggi, Angelo; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita. - 11134:(2019), pp. 625-640. (Paper presented at the 15th European Conference on Computer Vision, ECCV 2018, held in Munich, Germany, 8-14 September 2018) [10.1007/978-3-030-11024-6_47].

Visual-Semantic Alignment Across Domains Using a Semi-Supervised Approach

Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita
2019

Abstract

Visual-semantic embeddings have been extensively used as a powerful model for cross-modal retrieval of images and sentences. In this setting, data from different modalities are projected into a common embedding space, in which distances can be used to infer the similarity between pairs of images and sentences. While this approach has shown impressive performance in fully supervised settings, its application to semi-supervised scenarios has rarely been investigated. In this paper, we propose a domain adaptation model for cross-modal retrieval in which the knowledge learned from a supervised dataset can be transferred to a target dataset where the pairing between images and sentences is either not known or not useful for training due to the limited size of the set. Experiments are performed on two unsupervised target scenarios, related to the fashion and cultural heritage domains respectively. Results show that our model effectively transfers the knowledge learned on ordinary visual-semantic datasets, achieving promising results. As an additional contribution, we collect and release the dataset used for the cultural heritage domain.
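
To make the retrieval setting described in the abstract concrete, the sketch below shows a standard visual-semantic embedding of the kind this line of work builds on: two linear projections map image and sentence features into a shared space, and a hinge-based triplet ranking loss pulls matched pairs together while pushing mismatched ones apart. This is a minimal illustration in PyTorch, not the authors' implementation; all layer sizes, names, and the margin value are assumptions.

```python
# Minimal sketch of a visual-semantic embedding with a hinge-based
# triplet ranking loss. Dimensions and the margin are illustrative
# assumptions, not the paper's actual configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualSemanticEmbedding(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=300, emb_dim=1024):
        super().__init__()
        # Linear projections map each modality into the shared space.
        self.img_proj = nn.Linear(img_dim, emb_dim)
        self.txt_proj = nn.Linear(txt_dim, emb_dim)

    def forward(self, img_feats, txt_feats):
        # L2-normalize so that dot products equal cosine similarities.
        img_emb = F.normalize(self.img_proj(img_feats), dim=-1)
        txt_emb = F.normalize(self.txt_proj(txt_feats), dim=-1)
        return img_emb, txt_emb

def ranking_loss(img_emb, txt_emb, margin=0.2):
    # Pairwise cosine similarities; the diagonal holds matched pairs.
    scores = img_emb @ txt_emb.t()
    pos = scores.diag().view(-1, 1)
    # Hinge costs for mismatched sentences (rows) and images (columns).
    cost_txt = (margin + scores - pos).clamp(min=0)
    cost_img = (margin + scores - pos.t()).clamp(min=0)
    # Matched pairs on the diagonal incur no cost.
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_txt = cost_txt.masked_fill(mask, 0)
    cost_img = cost_img.masked_fill(mask, 0)
    return cost_txt.sum() + cost_img.sum()
```

At retrieval time, images and sentences are ranked by the cosine similarities in `scores`. The domain adaptation question the paper addresses is how to keep this space aligned on a target dataset where the diagonal of that matrix, i.e. the ground-truth pairing, is unavailable or too scarce to train on.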
Year: 2019
Conference: 15th European Conference on Computer Vision, ECCV 2018
Location: Munich, Germany
Dates: 8-14 September 2018
Volume: 11134
Pages: 625-640
Authors: Carraggi, Angelo; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita


Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1164578
Citations
  • PMC: ND
  • Scopus: 4
  • Web of Science: 2