Towards the evaluation of reproducible robustness in tracking-by-detection