A first approach to the automatic recognition of structural patterns in XML documents / Di Iorio, A.; Peroni, S.; Poggi, F.; Vitali, F. - (2012), pp. 85-94. (Paper presented at the 2012 ACM Symposium on Document Engineering, DocEng 2012, held in Paris, France, September 4-7, 2012) [10.1145/2361354.2361374].

A first approach to the automatic recognition of structural patterns in XML documents

Poggi, F.
2012

Abstract

XML is among the preferred formats for storing the structure of documents such as scientific articles, manuals, documentation, literary works, etc. Sometimes publishers adopt established and well-known vocabularies such as DocBook and TEI; at other times they create partially or entirely new ones that better deal with the particular requirements of their documents. The (explicit and implicit) requirements of use in these vocabularies often follow well-established patterns, creating meta-structures (the block, the container, the inline element, etc.) that persist across vocabularies and authors and that describe a truer and more general conceptualization of the documents' building blocks. Addressing such meta-structures not only gives better insight into what documents are really composed of, but also provides abstract and more general mechanisms to work on documents regardless of the availability of specific schemas, tools and presentation stylesheets. In this paper we introduce a schema-independent theory based on eleven structural patterns. We provide a definition of such patterns and show how they synthesize characteristics emerging from real markup documents. Additionally, we propose an algorithm that allows us to identify the pattern of each element in a set of homogeneous markup documents.
Year: 2012
Conference: 2012 ACM Symposium on Document Engineering, DocEng 2012
Location: Paris, France
Dates: September 4-7, 2012
First page: 85
Last page: 94
Authors: Di Iorio, A.; Peroni, S.; Poggi, F.; Vitali, F.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1199166