In this paper we present a novel approach for bottom-up multi-person 3D human pose estimation from monocular RGB images. We propose to use high-resolution volumetric heatmaps to model joint locations, devising a simple and effective compression method to drastically reduce the size of this representation. At the core of the proposed method lies our Volumetric Heatmap Autoencoder, a fully convolutional network tasked with compressing ground-truth heatmaps into a dense intermediate representation. A second model, the Code Predictor, is then trained to predict these codes, which can be decompressed at test time to recover the original representation. Our experimental evaluation shows that our method performs favorably compared to the state of the art on both multi-person and single-person 3D human pose estimation datasets and, thanks to our novel compression strategy, can process full-HD images at a constant 8 fps regardless of the number of subjects in the scene.
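
To make the volumetric heatmap representation concrete, the following is a minimal sketch in plain Python and not the authors' implementation. It places one Gaussian peak per person for a single joint type in a small voxel grid and reads the locations back as local maxima. The grid size, `sigma`, `threshold`, and the function names `make_volumetric_heatmap` and `local_maxima` are all assumptions chosen for illustration; the compression step (Volumetric Heatmap Autoencoder and Code Predictor) is not shown.

```python
import math


def make_volumetric_heatmap(joints, shape, sigma=1.0):
    """Return a D x H x W nested list with one Gaussian peak per joint.

    joints: list of (d, h, w) voxel coordinates, one entry per person.
    sigma is a hypothetical value chosen for illustration only.
    """
    D, H, W = shape
    vol = [[[0.0] * W for _ in range(H)] for _ in range(D)]
    for d in range(D):
        for h in range(H):
            for w in range(W):
                for (jd, jh, jw) in joints:
                    sq = (d - jd) ** 2 + (h - jh) ** 2 + (w - jw) ** 2
                    val = math.exp(-sq / (2 * sigma ** 2))
                    # Overlapping Gaussians are merged by taking the maximum.
                    vol[d][h][w] = max(vol[d][h][w], val)
    return vol


def local_maxima(vol, threshold=0.5):
    """Return voxels above threshold that are strictly greater than
    every neighbour in their 3 x 3 x 3 neighbourhood."""
    D, H, W = len(vol), len(vol[0]), len(vol[0][0])
    peaks = []
    for d in range(D):
        for h in range(H):
            for w in range(W):
                v = vol[d][h][w]
                if v < threshold:
                    continue
                neighbours = [
                    vol[a][b][c]
                    for a in range(max(0, d - 1), min(D, d + 2))
                    for b in range(max(0, h - 1), min(H, h + 2))
                    for c in range(max(0, w - 1), min(W, w + 2))
                    if (a, b, c) != (d, h, w)
                ]
                if all(v > n for n in neighbours):
                    peaks.append((d, h, w))
    return peaks


# Two people, one joint type, in a toy 8 x 8 x 8 grid (assumed sizes).
joints = [(2, 3, 4), (6, 6, 1)]
vol = make_volumetric_heatmap(joints, shape=(8, 8, 8))
peaks = local_maxima(vol)
num_voxels = len(vol) * len(vol[0]) * len(vol[0][0])
```

In this toy setting the volume has the same number of voxels however many people are encoded in it, which is the property a bottom-up method relies on for runtime that does not depend on the number of subjects. At realistic resolutions the voxel count grows quickly, which is the motivation the abstract gives for compressing the representation.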

Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation / Fabbri, Matteo; Lanzi, Fabio; Calderara, Simone; Alletto, Stefano; Cucchiara, Rita. - (2020), pp. 7202-7211. (Paper presented at the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, held in Seattle, June 16-18, 2020) [10.1109/CVPR42600.2020.00723].

Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation

Matteo Fabbri; Fabio Lanzi; Simone Calderara; Stefano Alletto; Rita Cucchiara
2020-01-01

Conference: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020
Location: Seattle
Dates: June 16-18, 2020
Pages: 7202-7211
Files in this item:
main.pdf

Open access

Description: Main article and Supplementary Material
Type: Author's revised version accepted for publication
Size: 7.88 MB
Format: Adobe PDF

Creative Commons License
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.
In case of copyright violation, contact Supporto Iris.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1206226
Citations
  • PMC: ND
  • Scopus: 30
  • ISI: 20