Digital Maktaba Project: Proposing a Metadata-Driven Framework for Arabic Library Digitization / EL GANADI, Amina; Gagliardelli, Luca; Aftar, Sania; Ruozzi, Federico. - Vol-3937:(2025). (Paper presented at IRCDL 2025: 21st Conference on Information and Research Sciences Connecting to Digital and Library Science, held in Udine, Italy, February 20-21, 2025).
Digital Maktaba Project: Proposing a Metadata-Driven Framework for Arabic Library Digitization
Amina El Ganadi (Writing – Original Draft Preparation);
Luca Gagliardelli (Writing – Review & Editing);
Sania Aftar (Member of the Collaboration Group);
Federico Ruozzi (Supervision)
2025
Abstract
The rapid digitization of cultural heritage has underscored the critical need for robust digital libraries, particularly for underrepresented languages such as Arabic and Persian. This paper describes the methodologies and challenges involved in developing a metadata-driven Arabic digital library that uses bibliographic metadata extracted from the Diamond catalogue. It explores metadata schemas such as Dublin Core and integrates text recognition technologies and preservation strategies to address key concerns of accessibility, scholarly use, and the long-term preservation of Arabic-script texts. The paper examines specific challenges of processing Arabic script, including the handling of calligraphy, diacritics, and ligatures, and introduces solutions such as the use of frontispiece images to train OCR systems. It also discusses how integrated metadata could both enhance text recognition and improve user engagement by enabling refined search and better resource discovery. Finally, the paper outlines future directions for expanding metadata frameworks to ensure interoperability and the long-term preservation of cultural heritage.

File | Size | Format |
---|---|---|
short13.pdf (Open access; Type: VOR - Version published by the publisher) | 2.83 MB | Adobe PDF |
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.