LAMV: Learning to align and match videos with kernelized temporal layers

Baraldi, Lorenzo; Douze, Matthijs; Cucchiara, Rita; Jégou, Hervé

doi:10.1109/CVPR.2018.00814

This paper considers a learnable approach for comparing and aligning videos. Our architecture builds upon and revisits temporal match kernels within neural networks: we propose a new temporal layer that finds temporal alignments by maximizing the scores between two sequences of vectors, according to a time-sensitive similarity metric parametrized in the Fourier domain. We learn this layer with a temporal proposal strategy, in which we minimize a triplet loss that takes into account both the localization accuracy and the recognition rate. We evaluate our approach on video alignment, copy detection and event retrieval. Our approach outperforms the state of the art on temporal video alignment and video copy detection datasets in comparable setups. It also attains the best reported results for particular event search, while precisely aligning videos.
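
The alignment idea in the abstract (choose the temporal offset that maximizes a score between two sequences of vectors, computed in the Fourier domain) can be illustrated with a minimal sketch. This is not the paper's learned temporal layer: it assumes a plain, unweighted dot-product similarity between frame descriptors and uses the standard cross-correlation theorem. The function name `best_offset` and the array shapes are hypothetical choices made for illustration only.

```python
import numpy as np


def best_offset(x, y):
    """Find the offset delta maximizing sum_t <x[t], y[t + delta]>.

    x: array of shape (Tx, D), y: array of shape (Ty, D); rows are
    per-frame descriptors. Illustrative assumption: plain dot-product
    similarity with no learned weighting.
    """
    tx, ty = len(x), len(y)
    # Zero-pad to Tx + Ty - 1 so circular correlation equals linear correlation.
    n = tx + ty - 1
    fx = np.fft.rfft(x, n=n, axis=0)
    fy = np.fft.rfft(y, n=n, axis=0)
    # Cross-correlation theorem, summed over the D descriptor dimensions.
    corr = np.fft.irfft((np.conj(fx) * fy).sum(axis=1), n=n)
    k = int(np.argmax(corr))
    # Indices >= Ty encode negative offsets (wrap-around of the padded signal).
    delta = k if k < ty else k - n
    return delta, float(corr[k])


# Usage: x is a clip cut from y starting at frame 12.
rng = np.random.default_rng(0)
y = rng.standard_normal((50, 8))
x = y[12:32]
delta, score = best_offset(x, y)
```

All offsets are scored at once in O(n log n) per descriptor dimension, rather than by looping over every candidate shift. A learned variant would replace the unweighted product with a parametrized, time-sensitive similarity, as the abstract describes.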

LAMV: Learning to align and match videos with kernelized temporal layers / Baraldi, L., Douze, M., Cucchiara, R., Jégou, H. - (2018), pp. 7804-7813. (31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22) [10.1109/CVPR.2018.00814].



Publication year: 2018
Conference title: 31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018
Conference location: Salt Lake City, UT, USA
Conference dates: June 18-22
DOI: https://dx.doi.org/10.1109/CVPR.2018.00814
WoS code: WOS:000457843607099
Scopus code: 2-s2.0-85061019248
Series: PROCEEDINGS - IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION
First page: 7804
Last page: 7813
All authors: Baraldi, Lorenzo; Douze, Matthijs; Cucchiara, Rita; Jégou, Hervé
Citation: LAMV: Learning to align and match videos with kernelized temporal layers / Baraldi, L., Douze, M., Cucchiara, R., Jégou, H. - (2018), pp. 7804-7813. (31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22) [10.1109/CVPR.2018.00814].
Type: Paper in conference proceedings

Files in this item:

File	Size	Format
1517.pdf (open access; type: AAM, author's version revised and accepted for publication)	2.54 MB	Adobe PDF


The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.
In case of copyright infringement, contact Supporto Iris.