SynthCap: Augmenting Transformers with Synthetic Data for Image Captioning / Caffagni, Davide; Barraco, Manuele; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita. - 14233:(2023), pp. 112-123. (Paper presented at the 22nd International Conference on Image Analysis and Processing, ICIAP 2023, held in Udine, Italy, September 11-15, 2023) [10.1007/978-3-031-43148-7_10].
SynthCap: Augmenting Transformers with Synthetic Data for Image Captioning
Caffagni, Davide; Barraco, Manuele; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita
2023
Abstract
Image captioning is a challenging task that combines Computer Vision and Natural Language Processing to generate accurate and descriptive captions for input images. Research efforts in this field mainly focus on developing novel architectural components to extend image captioning models and on using large-scale image-text datasets crawled from the web to boost final performance. In this work, we explore an alternative to web-crawled data and augment the training dataset with synthetic images generated by a latent diffusion model. In particular, we propose a simple yet effective synthetic data augmentation framework that is capable of significantly improving the quality of captions generated by a standard Transformer-based model, leading to competitive results on the COCO dataset.
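To make the idea concrete, the following is a minimal sketch of caption-conditioned synthetic data augmentation. It assumes the Hugging Face diffusers library with a publicly available Stable Diffusion checkpoint as the latent diffusion model, COCO-style (image, caption) pairs, and an arbitrary real/synthetic mixing ratio; the actual model, prompts, and training setup used in the paper may differ.

```python
# Minimal sketch of synthetic data augmentation for image captioning.
# Assumptions (not taken from the paper): the Stable Diffusion v1.5 checkpoint
# as the latent diffusion model, COCO-style (image, caption) pairs, and a
# 50/50 real/synthetic mixing ratio.
import random

import torch
from diffusers import StableDiffusionPipeline

# Load a publicly available latent diffusion model (assumed checkpoint).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")


def generate_synthetic_pairs(captions, images_per_caption=1):
    """Use ground-truth captions as prompts to synthesize extra training images."""
    synthetic = []
    for caption in captions:
        for _ in range(images_per_caption):
            image = pipe(caption, num_inference_steps=30).images[0]
            synthetic.append((image, caption))
    return synthetic


def build_augmented_batch(real_pairs, synthetic_pairs, batch_size, synth_ratio=0.5):
    """Mix real and synthetic (image, caption) pairs in a single training batch."""
    n_synth = int(batch_size * synth_ratio)
    batch = random.sample(real_pairs, batch_size - n_synth)
    batch += random.sample(synthetic_pairs, n_synth)
    random.shuffle(batch)
    return batch  # fed to a standard Transformer-based captioning model
```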
File | Type | Size | Format | Access
---|---|---|---|---
2023-iciap-captioning.pdf | AAM - Author's version, revised and accepted for publication | 576.36 kB | Adobe PDF | Open access
The metadata in IRIS UNIMORE is released under a Creative Commons CC0 1.0 Universal license, while publication files are released under an Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.
In case of copyright infringement, please contact Supporto Iris.