Explore and Explain: Self-supervised Navigation and Recounting

Roberto Bigazzi; Federico Landi; Marcella Cornia; Silvia Cascianelli; Lorenzo Baraldi; Rita Cucchiara
2021

Abstract

Embodied AI has recently been gaining attention, as it aims to foster the development of autonomous and intelligent agents. In this paper, we devise a novel embodied setting in which an agent needs to explore a previously unknown environment while recounting what it sees along the way. In this context, the agent needs to navigate the environment driven by an exploration goal, select proper moments for description, and output natural language descriptions of relevant objects and scenes. Our model integrates a novel self-supervised exploration module with penalty, and a fully-attentive captioning model for explanation. We also investigate different policies for selecting the proper moments for explanation, driven by information coming from both the environment and the navigation. Experiments are conducted on photorealistic environments from the Matterport3D dataset and investigate the navigation and explanation capabilities of the agent, as well as the role of their interactions.
Year: 2021
Conference: 25th International Conference on Pattern Recognition, ICPR 2020
Location: Milan, Italy
Dates: 10-15 January 2021
Pages: 1152-1159
Authors: Bigazzi, Roberto; Landi, Federico; Cornia, Marcella; Cascianelli, Silvia; Baraldi, Lorenzo; Cucchiara, Rita
Explore and Explain: Self-supervised Navigation and Recounting / Bigazzi, Roberto; Landi, Federico; Cornia, Marcella; Cascianelli, Silvia; Baraldi, Lorenzo; Cucchiara, Rita. - (2021), pp. 1152-1159. (Paper presented at the 25th International Conference on Pattern Recognition, ICPR 2020, held in Milan, Italy, 10-15 January 2021) [10.1109/ICPR48806.2021.9412628].
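Note (illustrative only): the abstract outlines three interacting components, a self-supervised exploration module, a policy that selects the proper moments for description, and a captioning model. The sketch below is a minimal, hypothetical rendering of that explore-decide-describe loop in Python; the names Observation, SpeakerPolicy, explore_and_explain, novelty_threshold, agent.reset(), agent.step(), and captioner.describe() are assumptions made for the example and do not refer to the authors' released code.

    # Illustrative sketch only: a generic "explore, decide, describe" loop.
    # The agent, the captioner, and the novelty-threshold speaker policy are
    # hypothetical stand-ins, not the components released with the paper.

    from dataclasses import dataclass
    from typing import Any, List


    @dataclass
    class Observation:
        rgb: Any            # current first-person frame
        novelty: float      # e.g. an intrinsic-reward / surprisal score in [0, 1]


    class SpeakerPolicy:
        """Decides at which moments the agent should describe what it sees."""

        def __init__(self, novelty_threshold: float = 0.7):
            self.novelty_threshold = novelty_threshold

        def should_describe(self, obs: Observation) -> bool:
            # One possible criterion: describe only sufficiently novel views.
            return obs.novelty >= self.novelty_threshold


    def explore_and_explain(agent: Any, captioner: Any,
                            policy: SpeakerPolicy, max_steps: int = 500) -> List[str]:
        """Run one episode: navigate, and emit captions at selected moments."""
        captions = []
        obs = agent.reset()                  # agent returns an Observation
        for _ in range(max_steps):
            if policy.should_describe(obs):
                captions.append(captioner.describe(obs.rgb))
            obs, done = agent.step()         # agent picks its own exploration action
            if done:
                break
        return captions

In the paper, several speaker policies driven by environment and navigation cues are compared; the fixed novelty threshold above is just one plausible stand-in used to keep the sketch self-contained.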
Files in this record:
File: 2020_ICPR_Navigation.pdf
Access: Open access
Type: Author's version, revised and accepted for publication
Size: 1.11 MB
Format: Adobe PDF

Creative Commons License
Metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.
In case of copyright infringement, please contact Supporto Iris.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1204117
Citations
  • PMC: ND
  • Scopus: 11
  • ISI: 10