Automatic layout analysis has proven to be extremely important in the digitization of large collections of documents. In this paper we present a mixed approach to layout analysis, introducing an SVM-aided layout segmentation process and a classification process based on local and geometrical features. The final output of the automatic analysis algorithm is a complete, structured annotation in JSON format that contains the digitized text as well as references to all the illustrations of the input page, and that can be used by both visualization and annotation interfaces. We evaluate our algorithm on a large dataset built upon the first volume of the “Enciclopedia Treccani”.

Layout analysis and content classification in digitized books / Corbelli, Andrea; Baraldi, Lorenzo; Balducci, Fabrizio; Grana, Costantino; Cucchiara, Rita. - Electronic. - 701 (2017), pp. 153-165. (12th Italian Research Conference on Digital Libraries, IRCDL 2016, Firenze, Feb. 4-5) [10.1007/978-3-319-56300-8_14].

Layout analysis and content classification in digitized books

Corbelli, Andrea; Baraldi, Lorenzo; Balducci, Fabrizio; Grana, Costantino; Cucchiara, Rita
2017

2017
no
English
12th Italian Research Conference on Digital Libraries, IRCDL 2016
Firenze
Feb. 4-5
Digital Libraries and Multimedia Archives
701
153
165
13
978-3-319-56300-8
Springer International Publishing
GEWERBESTRASSE 11, CHAM, CH-6330, SWITZERLAND
National
digitization, digital libraries, layout analysis, content classification
Corbelli, Andrea; Baraldi, Lorenzo; Balducci, Fabrizio; Grana, Costantino; Cucchiara, Rita
Conference proceedings::Paper in conference proceedings
273
5
open
info:eu-repo/semantics/conferenceObject
Files in this item:
File Size Format
2016_IRCDL.pdf

Open access

Type: AAM - Author's version, revised and accepted for publication
Size 4.81 MB
Format Adobe PDF
Creative Commons License
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.
In case of copyright infringement, contact Supporto Iris

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1084436
Citations
  • PMC N/A
  • Scopus 14
  • ISI 5