Explaining Digital Humanities by Aligning Images and Textual Descriptions

Cornia, Marcella; Stefanini, Matteo; Baraldi, Lorenzo; Corsini, Massimiliano; Cucchiara, Rita

doi:10.1016/j.patrec.2019.11.018

Replicating the human ability to connect Vision and Language has recently attracted considerable attention in the Computer Vision and Natural Language Processing communities. This research effort has produced algorithms that can retrieve images from textual descriptions and vice versa, provided that realistic images and sentences with simple semantics are employed and that paired training data is available. In this paper, we go beyond these limitations and tackle the design of visual-semantic algorithms in the domain of the Digital Humanities. This setting not only presents more complex visual and semantic structures but also suffers from a significant lack of training data, which makes fully supervised approaches infeasible. To this end, we propose a joint visual-semantic embedding that can automatically align illustrations and textual elements without paired supervision. This is achieved by transferring the knowledge learned on ordinary visual-semantic datasets to the artistic domain. Experiments performed on two datasets specifically designed for this domain validate the proposed strategies and quantify the domain shift between natural images and artworks.
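The abstract does not describe how the embedding is built. As a generic illustration only, and not the authors' method, the following Python sketch shows how cross-modal retrieval works once images and sentences live in a shared embedding space: candidates are ranked by cosine similarity to the query, in either direction (text-to-image or image-to-text). The toy vectors, the function names `rank_candidates` and `recall_at_k`, and the use of Recall@K as an evaluation measure are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings of equal dimension."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def rank_candidates(query: Sequence[float],
                    candidates: Sequence[Sequence[float]]) -> List[int]:
    """Indices of `candidates` sorted by decreasing similarity to `query`.

    Works in both directions: a sentence embedding can query image
    embeddings, or an image embedding can query sentence embeddings.
    """
    scores = [cosine_similarity(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def recall_at_k(queries: Sequence[Sequence[float]],
                candidates: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of queries whose paired candidate (same index) is in the top k.

    Assumption: query i is paired with candidate i, as in a toy aligned test set.
    """
    hits = sum(1 for i, q in enumerate(queries)
               if i in rank_candidates(q, candidates)[:k])
    return hits / len(queries)


# Hypothetical 2-D embeddings; a learned joint embedding would be far larger.
images = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
texts = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]

print(rank_candidates(texts[0], images))  # text-to-image: [0, 2, 1]
print(rank_candidates(images[0], texts))  # image-to-text: [0, 2, 1]
print(recall_at_k(texts, images, k=1))    # 1.0
```

Ranking by cosine similarity over L2-normalized vectors is a common choice for joint embeddings. The abstract states only that the embedding is obtained by transferring knowledge learned on ordinary visual-semantic datasets; this sketch does not model that transfer step.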

Explaining Digital Humanities by Aligning Images and Textual Descriptions / Cornia, M., Stefanini, M., Baraldi, L., Corsini, M., Cucchiara, R. - In: PATTERN RECOGNITION LETTERS. - ISSN 0167-8655. - 129:(2020), pp. 166-172. [10.1016/j.patrec.2019.11.018]

Year of publication	2020
Date of first publication	18-Nov-2019
Journal	PATTERN RECOGNITION LETTERS
Volume	129
First page	166
Last page	172
DOI	https://dx.doi.org/10.1016/j.patrec.2019.11.018
WoS code	WOS:000504641500024
Scopus code	2-s2.0-85075538596
All authors	Cornia, Marcella; Stefanini, Matteo; Baraldi, Lorenzo; Corsini, Massimiliano; Cucchiara, Rita
Type	Journal article

Files in this record:

File	Size	Format
1-s2.0-S0167865519303381-main.pdf (restricted access; type: VOR, version of record published by the publisher)	2.64 MB	Adobe PDF
post_print_j.patrec.2019.11.018.pdf (open access; type: AAM, author's accepted manuscript, revised and accepted for publication)	1.71 MB	Adobe PDF


The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise stated.