Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction

Cartella, Giuseppe; Cuculo, Vittorio; D'Amelio, Alessandro; Cornia, Marcella; Boccignone, Giuseppe; Cucchiara, Rita

Predicting human gaze scanpaths is crucial for understanding visual attention, with applications in human-computer interaction, autonomous systems, and cognitive robotics. While deep learning models have advanced scanpath prediction, most existing approaches generate averaged behaviors, failing to capture the variability of human visual exploration. In this work, we present ScanDiff, a novel architecture that combines diffusion models with Vision Transformers to generate diverse and realistic scanpaths. Our method explicitly models scanpath variability by leveraging the stochastic nature of diffusion models, producing a wide range of plausible gaze trajectories. Additionally, we introduce textual conditioning to enable task-driven scanpath generation, allowing the model to adapt to different visual search objectives. Experiments on benchmark datasets show that ScanDiff surpasses state-of-the-art methods in both free-viewing and task-driven scenarios, producing more diverse and accurate scanpaths. These results highlight its ability to better capture the complexity of human visual behavior, pushing forward gaze prediction research.

Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction / Cartella, Giuseppe; Cuculo, Vittorio; D'Amelio, Alessandro; Cornia, Marcella; Boccignone, Giuseppe; Cucchiara, Rita. - (2025). ( IEEE/CVF International Conference on Computer Vision Honolulu, Hawaii Oct 19 – 23th, 2025).

Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction

Giuseppe Cartella;Vittorio Cuculo;Alessandro D'Amelio;Marcella Cornia;Giuseppe Boccignone;Rita Cucchiara

2025

Abstract

Predicting human gaze scanpaths is crucial for understanding visual attention, with applications in human-computer interaction, autonomous systems, and cognitive robotics. While deep learning models have advanced scanpath prediction, most existing approaches generate averaged behaviors, failing to capture the variability of human visual exploration. In this work, we present ScanDiff, a novel architecture that combines diffusion models with Vision Transformers to generate diverse and realistic scanpaths. Our method explicitly models scanpath variability by leveraging the stochastic nature of diffusion models, producing a wide range of plausible gaze trajectories. Additionally, we introduce textual conditioning to enable task-driven scanpath generation, allowing the model to adapt to different visual search objectives. Experiments on benchmark datasets show that ScanDiff surpasses state-of-the-art methods in both free-viewing and task-driven scenarios, producing more diverse and accurate scanpaths. These results highlight its ability to better capture the complexity of human visual behavior, pushing forward gaze prediction research.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno di pubblicazione
	
				2025
			
	Titolo del Convegno
	
				IEEE/CVF International Conference on Computer Vision
			
	Luogo del Convegno
	
				Honolulu, Hawaii
			
	Data del Convegno
	
				Oct 19 – 23th, 2025
			
	Tutti gli autori
	
						Cartella, Giuseppe; Cuculo, Vittorio; D'Amelio, Alessandro; Cornia, Marcella; Boccignone, Giuseppe; Cucchiara, Rita
					
	Citazione
	
				Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction / Cartella, Giuseppe; Cuculo, Vittorio; D'Amelio, Alessandro; Cornia, Marcella; Boccignone, Giuseppe; Cucchiara, Rita. - (2025). ( IEEE/CVF International Conference on Computer Vision Honolulu, Hawaii Oct 19 – 23th, 2025).
			
	Tipologia
	
				Relazione in Atti di Convegno

File in questo prodotto:

Non ci sono file associati a questo prodotto.

Pubblicazioni consigliate

I metadati presenti in IRIS UNIMORE sono rilasciati con licenza Creative Commons CC0 1.0 Universal, mentre i file delle pubblicazioni sono rilasciati con licenza Attribuzione 4.0 Internazionale (CC BY 4.0), salvo diversa indicazione.
In caso di violazione di copyright, contattare Supporto Iris