Text-image retrieval has been recently becoming a hot-spot research field, thanks to the development of deeply-learnable architectures which can retrieve visual items given textual queries and vice-versa. The key idea of many state-of-the-art approaches has been that of learning a joint multi-modal embedding space in which text and images could be projected and compared. Here we take a different approach and reformulate the problem of text-image retrieval as that of learning a translation between the textual and visual domain. Our proposal leverages an end-to-end trainable architecture that can translate text into image features and vice versa and regularizes this mapping with a cycle-consistency criterion. Experimental evaluations for text-to-image and image-to-text retrieval, conducted on small, medium and large-scale datasets show consistent improvements over the baselines, thus confirming the appropriateness of using a cycle-consistent constrain for the text-image matching task.

A Unified Cycle-Consistent Neural Model for Text and Image Retrieval / Cornia, Marcella; Baraldi, Lorenzo; Tavakoli, Hamed R.; Cucchiara, Rita. - In: MULTIMEDIA TOOLS AND APPLICATIONS. - ISSN 1573-7721. - 79:35-36(2020), pp. 25697-25721. [10.1007/s11042-020-09251-4]

A Unified Cycle-Consistent Neural Model for Text and Image Retrieval

Marcella Cornia;Lorenzo Baraldi;Rita Cucchiara
2020

Abstract

Text-image retrieval has been recently becoming a hot-spot research field, thanks to the development of deeply-learnable architectures which can retrieve visual items given textual queries and vice-versa. The key idea of many state-of-the-art approaches has been that of learning a joint multi-modal embedding space in which text and images could be projected and compared. Here we take a different approach and reformulate the problem of text-image retrieval as that of learning a translation between the textual and visual domain. Our proposal leverages an end-to-end trainable architecture that can translate text into image features and vice versa and regularizes this mapping with a cycle-consistency criterion. Experimental evaluations for text-to-image and image-to-text retrieval, conducted on small, medium and large-scale datasets show consistent improvements over the baselines, thus confirming the appropriateness of using a cycle-consistent constrain for the text-image matching task.
2020
79
35-36
25697
25721
A Unified Cycle-Consistent Neural Model for Text and Image Retrieval / Cornia, Marcella; Baraldi, Lorenzo; Tavakoli, Hamed R.; Cucchiara, Rita. - In: MULTIMEDIA TOOLS AND APPLICATIONS. - ISSN 1573-7721. - 79:35-36(2020), pp. 25697-25721. [10.1007/s11042-020-09251-4]
Cornia, Marcella; Baraldi, Lorenzo; Tavakoli, Hamed R.; Cucchiara, Rita
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

Licenza Creative Commons
I metadati presenti in IRIS UNIMORE sono rilasciati con licenza Creative Commons CC0 1.0 Universal, mentre i file delle pubblicazioni sono rilasciati con licenza Attribuzione 4.0 Internazionale (CC BY 4.0), salvo diversa indicazione.
In caso di violazione di copyright, contattare Supporto Iris

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11380/1204135
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 14
  • ???jsp.display-item.citation.isi??? 8
social impact