Preserving and conserving culture: First steps towards a knowledge extractor and cataloguer for multilingual and multi-alphabetic heritages / Bergamaschi, S.; Martoglia, R.; Ruozzi, F.; Vigliermo, R. A.; De Nardis, S.; Sala, L.; Vanzini, M. - (2021), pp. 301-304. (Paper presented at the 1st Conference on Information Technology for Social Good, GoodIT 2021, held in Italy in 2021) [10.1145/3462203.3475927].
Preserving and conserving culture: First steps towards a knowledge extractor and cataloguer for multilingual and multi-alphabetic heritages
Bergamaschi S.; Martoglia R.; Ruozzi F.; Vigliermo R. A.; Sala L.; Vanzini M.
2021
Abstract
Managing and sharing cultural heritage, including in supranational and multi-literate contexts, is a highly active research topic. In this paper we discuss the research we are conducting in the DigitalMaktaba project, presenting the first steps towards designing an innovative workflow and tool for the automatic extraction of knowledge from documents written in multiple languages in non-Latin scripts (Arabic, Persian and Azerbaijani). The tool leverages different OCR and text-processing techniques, together with linguistic corpora, to provide both highly accurate extracted text and rich metadata, overcoming typical limitations of current state-of-the-art systems. In the near future, this will enable the development of an automatic cataloguer which, we hope, will ultimately help to better preserve and conserve culture in such a demanding scenario.
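The abstract does not specify how the outputs of the different OCR engines are combined with the linguistic corpora. Purely as an illustration, the sketch below assumes that each engine returns a plain-text transcription and that a word list (lexicon) derived from a corpus is available. Under those assumptions it keeps the transcription whose tokens are most often found in the lexicon. The functions `tokenize`, `lexicon_coverage` and `pick_best_ocr`, the engine labels and the toy lexicon are all hypothetical. This is not the DigitalMaktaba tool's actual method.

```python
from __future__ import annotations

import re

# Maximal runs of characters from the Unicode Arabic block (U+0600-U+06FF).
# This block holds the base letters used by Arabic, Persian and Arabic-script
# Azerbaijani, but it also contains Arabic punctuation and digits, and Persian
# words joined by a zero-width non-joiner (U+200C) will be split: a rough
# tokenizer only (hypothetical choice for this sketch).
ARABIC_RUN = re.compile(r"[\u0600-\u06FF]+")


def tokenize(text: str) -> list[str]:
    """Return the Arabic-block character runs found in `text`."""
    return ARABIC_RUN.findall(text)


def lexicon_coverage(text: str, lexicon: set[str]) -> float:
    """Fraction of tokens in `text` that appear in `lexicon` (0.0 if no tokens)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(token in lexicon for token in tokens) / len(tokens)


def pick_best_ocr(candidates: dict[str, str], lexicon: set[str]) -> tuple[str, float]:
    """Return (engine name, coverage) for the candidate transcription with the
    highest lexicon coverage; ties go to the first engine in insertion order."""
    if not candidates:
        raise ValueError("no OCR candidates given")
    scores = {name: lexicon_coverage(text, lexicon) for name, text in candidates.items()}
    best = max(scores, key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical toy lexicon and OCR outputs; engine_a misreads the final
    # letter of "التاريخ" (reading خ as ح).
    lexicon = {"كتاب", "في", "التاريخ"}
    candidates = {
        "engine_a": "كتاب في التاريح",
        "engine_b": "كتاب في التاريخ",
    }
    print(pick_best_ocr(candidates, lexicon))  # ('engine_b', 1.0)
```

Lexicon coverage is only one possible selection criterion. A real pipeline might instead merge engine outputs word by word or weight them with engine confidence scores, and it would need proper script-aware tokenization for Persian and Azerbaijani.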
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International license (CC BY 4.0), unless otherwise indicated.