GramSR: Visual Feature Conditioning for Diffusion-Based Super-Resolution

D'Oronzio, Fabio; Putamorsi, Federico; Zini, Leonardo; Cornia, Marcella; Baraldi, Lorenzo

Despite recent advances, single-image super-resolution (SR) remains challenging, especially in real-world scenarios with complex degradations. Diffusion-based SR methods, particularly those built on Stable Diffusion, leverage strong generative priors but commonly rely on text conditioning derived from semantic captioning. Such textual descriptions provide only high-level semantics and lack the spatially aligned visual information required for faithful restoration, which leaves a representation gap between the abstract conditioning signal and the visual details to be recovered. To address this limitation, we propose GramSR, a one-step diffusion-based SR framework that replaces text conditioning with dense visual features extracted from the low-resolution input by a pre-trained DINOv3 encoder. GramSR adopts a three-stage LoRA architecture in which pixel-level, semantic-level, and texture-level LoRA modules are trained sequentially. The pixel-level module focuses on degradation removal using an L2 loss, the semantic-level module enhances perceptual details via LPIPS and CSD losses, and the texture-level module enforces feature correlation consistency through a Gram matrix loss computed from DINOv3 features. At inference, independent guidance scales enable flexible control over degradation removal, semantic enhancement, and texture preservation. Extensive experiments on standard SR benchmarks demonstrate that GramSR consistently outperforms existing one-step diffusion-based methods, achieving superior structural fidelity and texture realism.
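The abstract does not spell out the Gram matrix loss, so the following is only a minimal sketch of one common formulation, not the authors' implementation. It assumes the Gram matrix is the table of pairwise dot products between L2-normalized patch features, and that the loss is the mean squared difference between the Gram matrices of the super-resolved and ground-truth images. The function names (`l2_normalize`, `gram_matrix`, `gram_loss`) and the toy feature vectors are hypothetical; in practice the features would come from a DINOv3 encoder and the computation would run on tensors.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def gram_matrix(patch_features):
    """Return the N x N matrix of dot products between normalized patch features.

    Assumption: one feature vector per image patch, as a dense visual
    encoder would produce. Entry (i, j) measures how similar patch i is to patch j.
    """
    normed = [l2_normalize(f) for f in patch_features]
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in normed] for fi in normed]


def gram_loss(sr_features, hr_features):
    """Mean squared difference between the Gram matrices of two feature sets."""
    if len(sr_features) != len(hr_features):
        raise ValueError("both images must yield the same number of patches")
    g_sr = gram_matrix(sr_features)
    g_hr = gram_matrix(hr_features)
    n = len(g_sr)
    total = sum((g_sr[i][j] - g_hr[i][j]) ** 2 for i in range(n) for j in range(n))
    return total / (n * n)


# Toy stand-ins for patch features of a ground-truth and a restored image
# (three patches, two feature channels each; values are made up).
hr_patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sr_patches = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]

loss_value = gram_loss(sr_patches, hr_patches)
```

Under this formulation the loss compares how patches relate to one another rather than the raw feature values, which is one way to read "feature correlation consistency". Because the features are normalized first, rescaling every patch feature leaves the loss unchanged.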

GramSR: Visual Feature Conditioning for Diffusion-Based Super-Resolution / D'Oronzio, F., Putamorsi, F., Zini, L., Cornia, M., Baraldi, L. (2026). International Conference on Pattern Recognition, Lyon, France, August 17-22, 2026.



Year of publication: 2026
Conference title: International Conference on Pattern Recognition
Conference location: Lyon, France
Conference dates: August 17-22, 2026
All authors: D'Oronzio, Fabio; Putamorsi, Federico; Zini, Leonardo; Cornia, Marcella; Baraldi, Lorenzo
Citation: GramSR: Visual Feature Conditioning for Diffusion-Based Super-Resolution / D'Oronzio, F., Putamorsi, F., Zini, L., Cornia, M., Baraldi, L. (2026). International Conference on Pattern Recognition, Lyon, France, August 17-22, 2026.
Type: Paper in conference proceedings

