Embodied Agents for Efficient Exploration and Smart Scene Description

Bigazzi, Roberto; Cornia, Marcella; Cascianelli, Silvia; Baraldi, Lorenzo; Cucchiara, Rita

doi:10.1109/ICRA48891.2023.10160668

The development of embodied agents that can communicate with humans in natural language has gained increasing interest over the last years, as it facilitates the diffusion of robotic platforms in human-populated environments. As a step towards this objective, in this work, we tackle a setting for visual navigation in which an autonomous agent needs to explore and map an unseen indoor environment while portraying interesting scenes with natural language descriptions. To this end, we propose and evaluate an approach that combines recent advances in visual robotic exploration and image captioning on images generated through agent-environment interaction. Our approach can generate smart scene descriptions that maximize semantic knowledge of the environment and avoid repetitions. Further, such descriptions offer user-understandable insights into the robot's representation of the environment by highlighting the prominent objects and the correlation between them as encountered during the exploration. To quantitatively assess the performance of the proposed approach, we also devise a specific score that takes into account both exploration and description skills. The experiments carried out on both photorealistic simulated environments and real-world ones demonstrate that our approach can effectively describe the robot's point of view during exploration, improving the human-friendly interpretability of its observations.

Embodied Agents for Efficient Exploration and Smart Scene Description / Bigazzi, R., Cornia, M., Cascianelli, S., Baraldi, L., Cucchiara, R. - 2023-May:(2023), pp. 6057-6064. (2023 IEEE International Conference on Robotics and Automation, ICRA 2023 London 29 May - 2 June 2023) [10.1109/ICRA48891.2023.10160668].

Publication year: 2023
Publication language(s): English
Conference title: 2023 IEEE International Conference on Robotics and Automation, ICRA 2023
Conference location: London
Conference dates: 29 May - 2 June 2023
DOI: https://dx.doi.org/10.1109/ICRA48891.2023.10160668
WoS code: WOS:001036713005006
Scopus code: 2-s2.0-85168712745
Series: IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION
Volume title: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)
Volume number: 2023-May
Article code: 190430
First page: 6057
Last page: 6064
Volume ISBN: 9798350323658
Publisher name: Institute of Electrical and Electronics Engineers Inc.
Publisher city: 345 E 47TH ST, NEW YORK, NY 10017 USA
All authors: Bigazzi, Roberto; Cornia, Marcella; Cascianelli, Silvia; Baraldi, Lorenzo; Cucchiara, Rita
Type: Conference proceedings::Paper in conference proceedings
Faculty site type: 273
Number of authors: 5
Citation: Embodied Agents for Efficient Exploration and Smart Scene Description / Bigazzi, R., Cornia, M., Cascianelli, S., Baraldi, L., Cucchiara, R. - 2023-May:(2023), pp. 6057-6064. (2023 IEEE International Conference on Robotics and Automation, ICRA 2023 London 29 May - 2 June 2023) [10.1109/ICRA48891.2023.10160668].
Fulltext: none
Type: info:eu-repo/semantics/conferenceObject
Type: Paper in conference proceedings

Files in this item:

There are no files associated with this item.

The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.