TrackFlow: Multi-Object Tracking with Normalizing Flows

Gianluca Mancusi; Aniello Panariello; Angelo Porrello; Matteo Fabbri; Simone Calderara; Rita Cucchiara
2023

Abstract

The field of multi-object tracking has recently seen renewed interest in the classic tracking-by-detection paradigm, whose simplicity and strong priors spare it from the complex design and painstaking tuning of tracking-by-attention approaches. In view of this, we aim to extend tracking-by-detection to multi-modal settings, where a comprehensive cost has to be computed from heterogeneous information, e.g., 2D motion cues, visual appearance, and pose estimates. More precisely, we follow a case study where a rough estimate of 3D information is also available and must be merged with other traditional metrics (e.g., the IoU). To achieve that, recent approaches resort to either simple rules or complex heuristics to balance the contribution of each cost. However, i) they require careful tuning of tailored hyperparameters on a hold-out set, and ii) they implicitly assume that these costs are independent, which does not hold in reality. We address these issues by building upon an elegant probabilistic formulation, which treats the cost of a candidate association as the negative log-likelihood yielded by a deep density estimator, trained to model the conditional joint probability distribution of correct associations. Our experiments, conducted on both simulated and real benchmarks, show that our approach consistently enhances the performance of several tracking-by-detection algorithms.
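
To make the idea of a likelihood-based association cost concrete, the following is a minimal sketch and not the authors' implementation. It assumes two hypothetical features per candidate association (1 - IoU and a 3D distance in metres), and it swaps the deep density estimator described above for a plain bivariate Gaussian with full covariance, which is enough to show how a joint density captures the dependence between costs. All names and numbers are illustrative assumptions.

```python
import math
from itertools import permutations


def fit_gaussian_2d(samples):
    """Fit the mean and full 2x2 covariance of (x, y) feature pairs."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def nll(feature, mean, cov):
    """Negative log-likelihood of one feature pair under the fitted Gaussian."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = feature[0] - mean[0], feature[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form 2x2 inverse.
    maha = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return 0.5 * maha + 0.5 * math.log(det) + math.log(2.0 * math.pi)


def best_assignment(cost):
    """Brute-force minimum-cost matching; only suitable for tiny square matrices."""
    n = len(cost)
    return list(min(permutations(range(n)),
                    key=lambda p: sum(cost[i][p[i]] for i in range(n))))


# Hypothetical features (1 - IoU, 3D distance) observed for correct associations.
correct = [(0.10, 0.20), (0.15, 0.35), (0.20, 0.40),
           (0.05, 0.10), (0.25, 0.55), (0.12, 0.30)]
mean, cov = fit_gaussian_2d(correct)

# Hypothetical features for every (track, detection) pair at one frame.
features = [
    [(0.12, 0.25), (0.80, 3.00)],
    [(0.75, 2.50), (0.18, 0.40)],
]
cost = [[nll(f, mean, cov) for f in row] for row in features]
assignment = best_assignment(cost)  # assignment[i] = detection index for track i
print(assignment)

# Each value below is plausible on its own, yet the pair is jointly unlikely,
# which an independence-based sum of costs would not detect.
inconsistent = nll((0.05, 0.55), mean, cov)
consistent = nll((0.25, 0.55), mean, cov)
print(inconsistent > consistent)
```

In a realistic pipeline the brute-force matcher would typically be replaced by a Hungarian solver such as `scipy.optimize.linear_sum_assignment`, and the Gaussian by a learned conditional density model.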
Conference: 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
Location: Paris, France
Dates: October 2-6, 2023
Pages: 9497-9509
TrackFlow: Multi-Object Tracking with Normalizing Flows / Mancusi, Gianluca; Panariello, Aniello; Porrello, Angelo; Fabbri, Matteo; Calderara, Simone; Cucchiara, Rita. - (2023), pp. 9497-9509. (Paper presented at the 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023, held in Paris, France, October 2-6, 2023) [10.1109/ICCV51070.2023.00874].
Files in this item:
File: Mancusi_TrackFlow_Multi-Object_tracking_with_Normalizing_Flows_ICCV_2023_paper.pdf
Access: Open access
Type: Publisher's published version
Size: 778.32 kB
Format: Adobe PDF

Creative Commons License
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1321706