Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals

Lobba, Davide; Sanguigni, Fulvio; Ren, Bin; Cornia, Marcella; Cucchiara, Rita; Sebe, Nicu

Virtual try-on (VTON) has been widely explored for rendering garments onto person images, while its inverse task, virtual try-off (VTOFF), remains largely overlooked. VTOFF aims to recover standardized product images of garments directly from photos of clothed individuals. This capability is of great practical importance for e-commerce platforms, large-scale dataset curation, and the training of foundation models. Unlike VTON, which must handle diverse poses and styles, VTOFF naturally benefits from a consistent output format in the form of flat garment images. However, existing methods face two major limitations: (i) exclusive reliance on visual cues from a single photo often leads to ambiguity, and (ii) generated images usually suffer from loss of fine details, limiting their real-world applicability. To address these challenges, we introduce TEMU-VTOFF, a Text-Enhanced MUlti-category framework for VTOFF. Our architecture is built on a dual DiT-based backbone equipped with a multimodal attention mechanism that jointly exploits image, text, and mask information to resolve visual ambiguities and enable robust feature learning across garment categories. To explicitly mitigate detail degradation, we further design an alignment module that refines garment structures and textures, ensuring high-quality outputs. Extensive experiments on VITON-HD and Dress Code show that TEMU-VTOFF achieves new state-of-the-art performance, substantially improving both visual realism and consistency with target garments. Code and models are available at: https://temu-vtoff-page.github.io/.

Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals / Lobba, D., Sanguigni, F., Ren, B., Cornia, M., Cucchiara, R., Sebe, N. - (2026). (International Conference on Learning Representations, Rio de Janeiro, Brazil, April 23-27, 2026).

Year of publication: 2026
Conference title: International Conference on Learning Representations
Conference location: Rio de Janeiro, Brazil
Conference dates: April 23-27, 2026
Type: Conference proceedings paper

Files in this item:

File	Size	Format
2026_ICLR_TEMU_VTOFF.pdf (open access; type: AAM, the author's version revised and accepted for publication)	25.66 MB	Adobe PDF

Metadata in IRIS UNIMORE is released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.