Preserving and conserving culture: First steps towards a knowledge extractor and cataloguer for multilingual and multi-alphabetic heritages

Bergamaschi, S.; Martoglia, R.; Ruozzi, F.; Vigliermo, R. A.; De Nardis, S.; Sala, L.; Vanzini, M.

doi:10.1145/3462203.3475927

Managing and sharing cultural heritage in supranational and multi-literate contexts is an active research topic. In this paper we discuss the research we are conducting in the DigitalMaktaba project, presenting the first steps towards designing an innovative workflow and tool for the automatic extraction of knowledge from documents written in multiple non-Latin languages (Arabic, Persian and Azerbaijani). The tool leverages different OCR and text processing techniques, together with linguistic corpora, to provide both highly accurate extracted text and rich metadata, overcoming typical limitations of current state-of-the-art systems. In the near future this will enable the development of an automatic cataloguer, which we hope will ultimately help preserve and conserve culture in such a demanding scenario.

Preserving and conserving culture: First steps towards a knowledge extractor and cataloguer for multilingual and multi-alphabetic heritages / Bergamaschi, S., Martoglia, R., Ruozzi, F., Vigliermo, R.A., De Nardis, S., Sala, L., Vanzini, M. - (2021), pp. 301-304. (1st Conference on Information Technology for Social Good, GoodIT 2021, Rome, Italy, 9-11 September 2021) [10.1145/3462203.3475927].



Publication year: 2021
Conference title: 1st Conference on Information Technology for Social Good, GoodIT 2021
Conference location: Rome, Italy
Conference dates: 9-11 September 2021
DOI: https://dx.doi.org/10.1145/3462203.3475927
WoS code: WOS:001344995100053
Scopus code: 2-s2.0-85115348298
First page: 301
Last page: 304
Type: Conference proceedings paper


Metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.