Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World

Multi-people tracking in an open-world setting requires particular care in precise detection. Moreover, temporal continuity in the detection phase becomes more important when scene clutter introduces the challenging problem of occluded targets. To this end, we propose a deep network architecture that jointly extracts people's body parts and associates them across short temporal spans. Our model explicitly deals with occluded body parts by hallucinating plausible solutions for joints that are not visible. We propose a new end-to-end architecture composed of four branches (visible heatmaps, occluded heatmaps, part affinity fields and temporal affinity fields) fed by a time linker feature extractor. To overcome the lack of surveillance data with tracking, body part and occlusion annotations, we created a Computer Graphics dataset for people tracking in urban scenarios by exploiting a photorealistic videogame. It is, to date, the largest dataset of human body parts for people tracking in urban scenarios (about 500,000 frames and almost 10 million body poses). Our architecture, trained on virtual data, also generalizes well to public real tracking benchmarks when image resolution and sharpness are high enough, producing reliable tracklets useful for further batch data association or re-id modules.

Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World / Fabbri, Matteo; Lanzi, Fabio; Calderara, Simone; Palazzi, Andrea; Vezzani, Roberto; Cucchiara, Rita. - 11208:(2018), pp. 450-466. (Paper presented at the European Conference on Computer Vision (ECCV) 2018, held in Munich (Germany), September 8-14, 2018) [10.1007/978-3-030-01225-0_27].



Year of publication: 2018
Conference title: European Conference on Computer Vision (ECCV) 2018
Conference location: Munich (Germany)
Conference dates: September 8-14, 2018
DOI: https://dx.doi.org/10.1007/978-3-030-01225-0_27
WoS code: WOS:000594212900027
Scopus code: 2-s2.0-85055457975
Series: LECTURE NOTES IN COMPUTER SCIENCE
Volume number: 11208
First page: 450
Last page: 466
All authors: Fabbri, Matteo; Lanzi, Fabio; Calderara, Simone; Palazzi, Andrea; Vezzani, Roberto; Cucchiara, Rita
Type: Conference proceedings paper

Files in this item:

File	Description	Type	Access	Size	Format
eccv2018camerareadykit.pdf	Main article	AO - Author's original version submitted for publication	Open access	6.63 MB	Adobe PDF


The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International license (CC BY 4.0), unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1164663
