Semantic video labeling by developmental visual agents

Gori, Marco; Lippi, Marco
2016

Abstract

In recent years, computer vision has undergone a period of rapid development, as witnessed by the many successful applications currently available in a variety of industrial products. Yet, when it comes to the most challenging and foundational problem of building autonomous agents capable of performing scene understanding in unrestricted videos, much remains to be done. In this paper we focus on semantic labeling of video streams, in which a set of semantic classes must be predicted for each pixel of the video. We propose to attack the problem from the bottom up, introducing Developmental Visual Agents (DVAs) as general-purpose visual systems that progressively acquire visual skills from video data and experience, by continuously interacting with the environment and following lifelong learning principles. DVAs gradually develop a hierarchy of architectural stages, from unsupervised feature extraction to the symbolic level, where supervisions are provided pixel-wise by external users. Unlike classic machine learning algorithms applied to computer vision, which typically rely on huge datasets of fully labeled images to perform recognition tasks, DVAs can exploit even a few supervisions per semantic category, by enforcing coherence constraints based on motion estimation. Experiments on different vision tasks, performed on a variety of heterogeneous visual worlds, confirm the great potential of the proposed approach.
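
To make the motion-based coherence idea mentioned in the abstract more concrete, here is a minimal illustrative sketch (not the authors' implementation: the function name, the quadratic penalty, and the nearest-neighbour warping are assumptions for illustration only). It penalizes disagreement between per-pixel class scores of two consecutive frames along an estimated optical-flow field, which is one simple way a pixel-wise labeler can be encouraged to keep its predictions consistent over motion trajectories even when only a few pixels are supervised.

```python
# Illustrative sketch of a motion-coherence penalty on per-pixel class scores.
# Assumed, hypothetical interface; not the DVA implementation from the paper.
import numpy as np

def motion_coherence_penalty(scores_t, scores_t1, flow):
    """scores_t, scores_t1: (H, W, C) per-pixel class scores at frames t and t+1.
    flow: (H, W, 2) estimated displacement (dy, dx) from frame t to frame t+1.
    Returns the mean squared discrepancy between each pixel's scores at t and
    the scores of the pixel it moves to at t+1 (nearest-neighbour warping)."""
    H, W, _ = scores_t.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Destination coordinates at t+1, clipped to the image boundary.
    yd = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, H - 1)
    xd = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, W - 1)
    warped = scores_t1[yd, xd]  # scores at the flow-displaced pixels
    return float(np.mean((warped - scores_t) ** 2))

# Toy usage: frame t+1 is frame t shifted one pixel to the right,
# and the flow field reports exactly that motion.
rng = np.random.default_rng(0)
s_t = rng.random((4, 5, 3))
s_t1 = np.roll(s_t, shift=1, axis=1)
flow = np.zeros((4, 5, 2))
flow[..., 1] = 1.0
# Small value: only the clipped right border contributes to the penalty.
print(motion_coherence_penalty(s_t, s_t1, flow))
```

In a few-supervision setting, a term of this kind lets sparse pixel-wise labels influence unlabeled pixels along motion trajectories; the actual constraints used by DVAs are defined in the paper itself.
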
Year: 2016
Volume: 146
Pages: 9-26
Citation: Semantic video labeling by developmental visual agents / Gori, Marco; Lippi, Marco; Maggini, Marco; Melacci, Stefano. - In: COMPUTER VISION AND IMAGE UNDERSTANDING. - ISSN 1077-3142. - 146:(2016), pp. 9-26. [10.1016/j.cviu.2016.02.011]
Authors: Gori, Marco; Lippi, Marco; Maggini, Marco; Melacci, Stefano
Files in this record:
File: CVIU2016.pdf
Access: Restricted
Type: Author's revised version, accepted for publication
Size: 8.38 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1122704
Citations
  • PMC: ND
  • Scopus: 11
  • Web of Science (ISI): 7