
OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data

Cartella, Giuseppe; Morelli, Davide; Cornia, Marcella; Cucchiara, Rita
2023

Abstract

The inexorable growth of online shopping and e-commerce demands scalable and robust machine learning-based solutions to accommodate customer requirements. In the context of automatic tagging classification and multimodal retrieval, prior works have either defined supervised learning approaches with limited generalization capabilities or more reusable CLIP-based techniques that, however, were trained on closed-source data. In this work, we propose OpenFashionCLIP, a vision-and-language contrastive learning method that adopts only open-source fashion data stemming from diverse domains and characterized by varying degrees of specificity. Our approach is extensively validated across several tasks and benchmarks, and experimental results highlight significant out-of-domain generalization capabilities and consistent improvements over state-of-the-art methods in terms of both accuracy and recall. Source code and trained models are publicly available at: https://github.com/aimagelab/open-fashion-clip.
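As a quick illustration of the kind of vision-and-language usage the abstract describes, the following minimal sketch performs zero-shot garment tagging with a CLIP-style model through the open_clip library. It is not the authors' implementation: the backbone name, the pretrained checkpoint tag, the image path, and the candidate tag prompts are illustrative placeholders; the actual OpenFashionCLIP weights and loading instructions are distributed through the repository linked above.

import torch
import open_clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load a generic ViT-B/32 CLIP backbone and its preprocessing transforms
# (placeholder checkpoint; OpenFashionCLIP weights would be loaded instead).
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model = model.to(device).eval()

# Candidate garment tags phrased as natural-language prompts (illustrative).
tags = ["a photo of a dress", "a photo of a t-shirt", "a photo of a pair of jeans"]
text = tokenizer(tags).to(device)

# Hypothetical input image of a garment.
image = preprocess(Image.open("garment.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # L2-normalize and compare by cosine similarity, as in standard CLIP inference.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

# The tag with the highest probability is the predicted label for the image.
print({tag: round(prob.item(), 3) for tag, prob in zip(tags, probs[0])})

The same image and text embeddings can be reused for multimodal retrieval by ranking a gallery of garment images against a textual query, rather than ranking tags against a single image.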
Year of publication: 2023
Language: English
Conference: 22nd International Conference on Image Analysis and Processing, ICIAP 2023
Venue: Udine, Italy
Dates: September 11-15, 2023
Volume title: Proceedings of the 22nd International Conference on Image Analysis and Processing
Editors: Foresti G.L., Fusiello A., Hancock E.
Volume: 14233
Pages: 245-256
ISBN: 9783031431470
Publisher: Springer Science and Business Media Deutschland GmbH
Keywords: Fashion Domain; Open-Source Datasets; Vision-and-Language Pre-Training
Authors: Cartella, Giuseppe; Baldrati, Alberto; Morelli, Davide; Cornia, Marcella; Bertini, Marco; Cucchiara, Rita
Type: Conference proceedings (Paper in conference proceedings)
OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data / Cartella, Giuseppe; Baldrati, Alberto; Morelli, Davide; Cornia, Marcella; Bertini, Marco; Cucchiara, Rita. - 14233:(2023), pp. 245-256. (22nd International Conference on Image Analysis and Processing, ICIAP 2023, Udine, Italy, September 11-15, 2023) [10.1007/978-3-031-43148-7_21].
open
info:eu-repo/semantics/conferenceObject
Files in this product:
2023-iciap-fashion.pdf
Open access
Type: AAM - Author's accepted manuscript (revised and accepted for publication)
Size: 1.87 MB
Format: Adobe PDF
Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1309486
Citations
  • PMC: not available
  • Scopus: 8
  • Web of Science (ISI): 3