Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas

Quattrini, F.; Pippi, V.; Cascianelli, S.; Cucchiara, R.

doi:10.1007/978-3-031-72986-7_14

Diffusion models have become the state of the art for text-to-image generation, and increasing research effort has been dedicated to adapting the inference process of pretrained diffusion models to achieve zero-shot capabilities. One example is the generation of panorama images. Recent works have tackled it by combining independent diffusion paths over overlapping latent features, a setting referred to as joint diffusion, which yields perceptually aligned panoramas. However, these methods often produce semantically incoherent outputs and trade off diversity for uniformity. To overcome this limitation, we propose the Merge-Attend-Diffuse operator. It can be plugged into different types of pretrained diffusion models used in a joint diffusion setting to improve the perceptual and semantic coherence of the generated panorama images. Specifically, we merge the diffusion paths, reprogramming self- and cross-attention to operate on the aggregated latent space. Extensive quantitative and qualitative experiments, together with a user study, demonstrate that our method maintains compatibility with the input prompt and the visual quality of the generated images while increasing their semantic coherence. We release the code at https://github.com/aimagelab/MAD.
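The abstract contrasts two ways of handling overlapping views of a panorama latent. What follows is a toy NumPy illustration of that contrast, built on our own assumptions.

- In joint diffusion, each overlapping window is processed independently, and the overlaps are averaged back (split, then merge).
- In the merged setting, attention runs once over the whole aggregated latent, so every region can attend to every other.

Some caveats on scope:

- All function names (`window_starts`, `split`, `merge`, `attention`, `per_view_attention`, `merged_attention`) are hypothetical.
- The attention layer uses identity projections in place of pretrained weights.
- Cross-attention and the denoising network are omitted.
- None of this is the Merge-Attend-Diffuse implementation released at https://github.com/aimagelab/MAD.

```python
import numpy as np

# Toy sketch with hypothetical helper names; NOT the API of the released MAD code.


def window_starts(width, window, stride):
    """Left offsets of overlapping windows that together cover columns [0, width)."""
    if window > width:
        raise ValueError("window must not exceed the panorama width")
    starts = list(range(0, width - window + 1, stride))
    if starts[-1] + window < width:  # add a last window flush with the right edge
        starts.append(width - window)
    return starts


def split(latent, starts, window):
    """Split a (C, H, W) panorama latent into overlapping (C, H, window) views."""
    return [latent[:, :, s:s + window] for s in starts]


def merge(views, starts, width):
    """Average overlapping (C, H, window) views back into one (C, H, width) latent."""
    c, h, window = views[0].shape
    total = np.zeros((c, h, width))
    count = np.zeros((1, 1, width))
    for view, s in zip(views, starts):
        total[:, :, s:s + window] += view
        count[:, :, s:s + window] += 1
    return total / count


def attention(latent):
    """Single-head softmax self-attention over all spatial tokens of a (C, H, W) latent.

    Identity query/key/value projections stand in for a pretrained layer (assumption).
    """
    c, h, w = latent.shape
    tokens = latent.reshape(c, h * w).T  # (N, C) with N = H * W
    scores = tokens @ tokens.T / np.sqrt(c)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over all N tokens
    return (weights @ tokens).T.reshape(c, h, w)


def per_view_attention(latent, starts, window):
    """Joint-diffusion style: each view attends only within itself, then views are averaged."""
    outputs = [attention(view) for view in split(latent, starts, window)]
    return merge(outputs, starts, latent.shape[-1])


def merged_attention(latent):
    """Merged style: a single attention pass over the whole aggregated panorama latent."""
    return attention(latent)


if __name__ == "__main__":
    C, H, W, WINDOW, STRIDE = 4, 2, 16, 8, 4
    starts = window_starts(W, WINDOW, STRIDE)  # windows at columns 0, 4 and 8
    rng = np.random.default_rng(0)
    x = rng.standard_normal((C, H, W))
    # Splitting and re-merging without processing recovers the latent exactly.
    assert np.allclose(merge(split(x, starts, WINDOW), starts, W), x)

    # Perturb the leftmost column, then inspect the rightmost column.
    y = x.copy()
    y[:, :, 0] += 5.0
    same_per_view = np.allclose(per_view_attention(x, starts, WINDOW)[:, :, -1],
                                per_view_attention(y, starts, WINDOW)[:, :, -1])
    same_merged = np.allclose(merged_attention(x)[:, :, -1], merged_attention(y)[:, :, -1])
    print("right edge unchanged (per-view):", same_per_view)
    print("right edge unchanged (merged):", same_merged)
```

In this toy setting, perturbing the leftmost column leaves the rightmost column unchanged under per-view attention, because no single window spans both edges. The merged pass, by contrast, propagates the change across the whole panorama. The example only suggests why operating on the aggregated latent space may help global semantic coherence. How the actual operator merges and splits the diffusion paths should be checked against the paper and the released code.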

Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas / Quattrini, F.; Pippi, V.; Cascianelli, S.; Cucchiara, R. - 15137:(2025), pp. 234-251. (Presented at the 18th European Conference on Computer Vision, ECCV 2024, held in Italy, SEP 29-OCT 04, 2024) [10.1007/978-3-031-72986-7_14].



Year of publication: 2025
Conference title: 18th European Conference on Computer Vision, ECCV 2024
Conference location: Italy
Conference dates: SEP 29-OCT 04, 2024
DOI: https://dx.doi.org/10.1007/978-3-031-72986-7_14
WoS ID: WOS:001353712400014
Scopus ID: 2-s2.0-85208576300
Series: LECTURE NOTES IN COMPUTER SCIENCE
Volume: 15137
First page: 234
Last page: 251
All authors: Quattrini, F.; Pippi, V.; Cascianelli, S.; Cucchiara, R.
Type: Paper in conference proceedings

Files in this item:

There are no files associated with this item.


The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise stated.
In case of copyright infringement, contact Iris Support.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1363933


