Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita. "SMArT: Training Shallow Memory-aware Transformers for Robotic Explainability." In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Paris, France, May 31 - June 4, 2020, pp. 1128-1134. DOI: 10.1109/ICRA40945.2020.9196653.
SMArT: Training Shallow Memory-aware Transformers for Robotic Explainability
Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita
2020
Abstract
The ability to generate natural-language explanations conditioned on visual perception is a crucial step towards autonomous agents that can explain themselves and communicate with humans. While research efforts in image and video captioning are yielding promising results, this often comes at the expense of increased computational requirements, limiting the applicability of such approaches to real-world settings. In this paper, we propose a fully-attentive captioning algorithm that provides state-of-the-art language-generation performance while keeping its computational demands low. Our model is inspired by the Transformer architecture and employs only two Transformer layers in both the encoding and decoding stages. Further, it incorporates a novel memory-aware encoding of image regions. Experiments demonstrate that our approach achieves competitive caption quality at a reduced computational cost. Finally, to evaluate its applicability to autonomous agents, we conduct experiments on simulated scenes taken from the perspective of domestic robots.
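As a concrete illustration of the memory-aware encoding mentioned in the abstract, the sketch below shows one plausible PyTorch implementation in which learned memory slots are concatenated to the keys and values of the encoder's self-attention over image regions, letting the model attend to priors learned during training as well as to the regions themselves. This is not the authors' released code: the class name, the number of memory slots, and the point at which memories enter the attention (here before the internal key/value projections, for brevity) are illustrative assumptions.

```python
# Minimal sketch of memory-augmented self-attention over image regions
# (assumed formulation, not the paper's official implementation).
# Requires PyTorch >= 1.9 for MultiheadAttention's batch_first option.
import torch
import torch.nn as nn


class MemoryAwareSelfAttention(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_memories=40):
        super().__init__()
        # Learned memory slots, shared across all inputs (sizes assumed).
        self.mem_k = nn.Parameter(torch.randn(1, n_memories, d_model) * 0.02)
        self.mem_v = nn.Parameter(torch.randn(1, n_memories, d_model) * 0.02)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, regions):
        # regions: (batch, n_regions, d_model) detected image-region features.
        b = regions.size(0)
        # Extend keys and values with the learned memory slots, so each
        # region can attend both to other regions and to the memories.
        k = torch.cat([regions, self.mem_k.expand(b, -1, -1)], dim=1)
        v = torch.cat([regions, self.mem_v.expand(b, -1, -1)], dim=1)
        out, _ = self.attn(query=regions, key=k, value=v)
        return out


# Usage with hypothetical sizes: 36 regions per image, 512-d features.
x = torch.randn(2, 36, 512)
enc = MemoryAwareSelfAttention()
print(enc(x).shape)  # torch.Size([2, 36, 512])
```

In a shallow configuration like the one the paper describes, two such encoder layers would be stacked and paired with a two-layer Transformer decoder for caption generation; the stacking itself follows the standard Transformer recipe.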
File | Type | Size | Format
---|---|---|---
2020_ICRA_SMArT.pdf (Open access) | AAM - Author's accepted manuscript (revised and accepted for publication) | 747.82 kB | Adobe PDF
Metadata in IRIS UNIMORE are released under a Creative Commons CC0 1.0 Universal license, while publication files are released under an Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.