Sfruttare e Trasferire conoscenza a priori nelle Architetture di Deep Learning

Porrello, Angelo

In the last decade, Deep Learning has emerged as a hot topic and a disruptive tool in the fields of Machine Learning and Computer Vision. It builds upon a learning paradigm in which data (e.g., videos acquired by surveillance cameras placed on a public road) play a crucial role. By leveraging a large number of data points, it is possible to learn complex, human-like tasks (e.g., recognizing abnormal actions in a video stream) with impressive results. However, while data availability is the source of the greatest strength of Deep Learning techniques, it also exposes their greatest weakness: the development of applications and services is often held back by this requirement, as acquiring and maintaining a huge amount of data are expensive activities that require expert staff and equipment. Nevertheless, the design of modern Deep Learning architectures offers several degrees of freedom that can be exploited to mitigate the lack of training data, whether partial or complete. The underlying idea is to compensate for it by incorporating the prior knowledge that humans (specifically, those who control and guide the learning process) hold about the domain at hand. Indeed, intrinsic rules and properties extend far beyond the training data and can often be identified and imposed on the learner. In image classification, for instance, the success of Convolutional Neural Networks (CNNs) over earlier solutions (such as Multi-Layered Neural Networks) can be mainly ascribed to this practice. The design principle of their fundamental building block (i.e., the convolution between two 2D signals) naturally reflects what was already known about natural images: the correlations that exist between neighbouring regions of an image provided a powerful insight for the development of models as efficient and effective as CNNs still prove to be.
The ultimate aim of this thesis is to investigate and propose novel ways of modeling and injecting prior knowledge into Deep Learning architectures. Importantly, the discussion is conducted across the board: it covers several data domains (e.g., images, videos, graph-structured data) and involves different levels of the overall training pipeline. On this latter point, we guide the reader through this research by means of the following threefold categorization: i) parameter-based approaches, which limit the space of feasible solutions to those regions reflecting geometrical properties of the data; ii) goal-driven approaches, which guide the learning process towards solutions that embody some advantageous properties; iii) data-driven approaches, which exploit data to extract the knowledge that is then used to condition the training algorithm. Along with a comprehensive description of both the settings and the tools involved, we present extensive experimental results and ablation studies that demonstrate the value of the techniques proposed in this research.
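
The CNN example mentioned in the abstract can be made concrete with a small sketch. It is not taken from the thesis: the function names, the toy image and the layer sizes are all illustrative assumptions. It shows two consequences of building the locality prior into a layer: a convolution responds to a shifted input with an equally shifted output, and its number of weights does not depend on the image size.

```python
# Illustrative sketch only: every name and size here is an assumption,
# not something defined in the thesis.

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists) without padding.

    This is cross-correlation, which deep learning libraries
    usually call "convolution".
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def shift_right(image, fill=0):
    """Shift every row one pixel to the right, padding with `fill`."""
    return [[fill] + row[:-1] for row in image]


def dense_params(height, width, units):
    """Weights of a fully connected layer on a flattened image (no bias)."""
    return height * width * units


def conv_params(kernel_h, kernel_w, channels_out, channels_in=1):
    """Weights of a convolutional layer (no bias); independent of image size."""
    return kernel_h * kernel_w * channels_in * channels_out


image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 2, 0, 0, 0],
    [0, 3, 4, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
kernel = [[1, 0], [0, -1]]  # responds to diagonal intensity changes

out = conv2d_valid(image, kernel)
out_shifted = conv2d_valid(shift_right(image), kernel)

# Translation equivariance: away from the left border, the response to
# the shifted image equals the original response shifted by one pixel.
equivariant = all(
    row_s[1:] == row[:-1] for row, row_s in zip(out, out_shifted)
)

print(equivariant)
print(dense_params(224, 224, 64), conv_params(3, 3, 64))
```

In this toy setting, a dense layer with 64 units on a 224x224 single-channel image needs over three million weights, while a 3x3 convolution with 64 output channels needs a few hundred. This is one way to read the claim that architectural priors can compensate for limited training data.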

Sfruttare e Trasferire conoscenza a priori nelle Architetture di Deep Learning / Angelo Porrello, 25 Mar 2022. 34th cycle, Academic Year 2020/2021.

Title in English: Prior Knowledge Exploitation and Transfer in Deep Learning Architectures
Date of defence: 25 Mar 2022
Supervisor (university-affiliated): CALDERARA, Simone
Type: Doctoral thesis

Files in this record:

File	Description	Type	Access	Size	Format
thesis_angelo_porrello_revised.pdf	Final thesis, Porrello Angelo	Doctoral thesis	Open access	20.51 MB	Adobe PDF

The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal licence, while the publication files are released under the Attribution 4.0 International licence (CC BY 4.0), unless otherwise indicated.
In case of copyright infringement, contact Supporto Iris.