Since decades, the problem of multiple people tracking has been tackled leveraging 2D data only. However, people moves and interact in a three-dimensional space. For this reason, using only 2D data might be limiting and overly challenging, especially due to occlusions and multiple overlapping people. In this paper, we take advantage of 3D synthetic data from the novel MOTSynth dataset, to train our proposed 3D people detector, whose observations are fed to a tracker that works in the corresponding 3D space. Compared to conventional 2D trackers, we show an overall improvement in performance with a reduction of identity switches on both real and synthetic data. Additionally, we propose a tracker that jointly exploits 3D and 2D data, showing an improvement over the proposed baselines. Our experiments demonstrate that 3D data can be beneficial, and we believe this paper will pave the road for future efforts in leveraging 3D data for tackling multiple people tracking. The code is available at (https://github.com/GianlucaMancusi/LoCO-Det ).
First Steps Towards 3D Pedestrian Detection and Tracking from Single Image / Mancusi, G.; Fabbri, M.; Egidi, S.; Verasani, M.; Scarabelli, P.; Calderara, S.; Cucchiara, R.. - 13232:(2022), pp. 335-346. (Intervento presentato al convegno 21st International Conference on Image Analysis and Processing, ICIAP 2022 tenutosi a ita nel 2022) [10.1007/978-3-031-06430-2_28].