
Principles of Holism for Sequential Twig Pattern Matching / Mandreoli, Federica; Martoglia, Riccardo; Zezula, P. - In: VLDB JOURNAL. - ISSN 1066-8888. - Print. - 18:(2009), pp. 1369-1392. [10.1007/s00778-009-0143-4]

Principles of Holism for Sequential Twig Pattern Matching

Mandreoli, Federica; Martoglia, Riccardo; Zezula, P.
2009

Abstract

Modern applications face the challenge of handling structured and semi-structured data. They must manage complex objects, most of which exhibit some kind of internal structure that often forms a hierarchy. Although XML documents are the best-known example, chemical compounds, CAD drawings, websites, and many other applications raise similar problems. In such environments, ordered and unordered tree pattern matching are the fundamental search operations. One of the main thrusts of research on tree pattern matching is the class of holistic approaches, whose ultimate goal is to evaluate a query twig as a whole by relying on sequential access patterns and non-trivial auxiliary storage structures, typically kept in main memory. Based on the pre/post-order ranks of individual tree nodes, we establish a strong theoretical foundation for correct and efficient holistic pattern matching algorithms. In particular, we define and prove necessary and sufficient conditions for minimizing the amount of data retained in memory, thus introducing a correct and complete framework within which different holistic solutions can be compared. We also show how these rules can be applied to build algorithms for ordered and unordered tree pattern matching. Thanks to these theoretical results, each holistic algorithm gains in efficiency: it is implemented directly on the adopted numbering scheme, avoids expensive matching refinements, and keeps memory requirements stable. An experimental analysis and comparison with previous approaches on synthetic as well as real-life data sets confirm the superiority of our approach.
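The abstract builds on the pre/post-order ranks of tree nodes. As a minimal illustration, assuming the common convention that a node u is an ancestor of a node v exactly when pre(u) < pre(v) and post(u) > post(v), the Python sketch below numbers a toy tree and runs a classic stack-based ancestor-descendant join over a single pre-order scan, discarding stack entries as soon as the ranks show they can no longer match. This is a generic textbook technique, not the paper's algorithm or its memory-minimization conditions; the names `Node`, `number`, `is_ancestor`, and `ancestor_descendant_pairs` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A labeled tree node; pre/post ranks are filled in by number() (hypothetical helper)."""
    label: str
    children: list[Node] = field(default_factory=list)
    pre: int = -1
    post: int = -1


def number(root: Node) -> list[Node]:
    """Assign pre-order and post-order ranks; return the nodes in pre-order."""
    order: list[Node] = []
    next_pre = 0
    next_post = 0

    def visit(n: Node) -> None:
        nonlocal next_pre, next_post
        n.pre = next_pre          # rank on first visit (pre-order)
        next_pre += 1
        order.append(n)
        for child in n.children:
            visit(child)
        n.post = next_post        # rank after all children are done (post-order)
        next_post += 1

    visit(root)
    return order


def is_ancestor(u: Node, v: Node) -> bool:
    """u is a proper ancestor of v iff it starts before v and ends after v."""
    return u.pre < v.pre and u.post > v.post


def ancestor_descendant_pairs(nodes_in_pre_order: list[Node],
                              anc_label: str,
                              desc_label: str) -> list[tuple[str, int, str, int]]:
    """One sequential scan; the stack holds only candidate ancestors."""
    stack: list[Node] = []        # chain of nested anc_label nodes
    result: list[tuple[str, int, str, int]] = []
    for v in nodes_in_pre_order:
        # The top entry precedes v entirely (smaller pre and smaller post rank):
        # it is not an ancestor of v, and no later node can be its descendant.
        while stack and stack[-1].post < v.post:
            stack.pop()
        if v.label == desc_label:
            # Every entry still on the stack is an ancestor of v.
            result.extend((u.label, u.pre, v.label, v.pre) for u in stack)
        if v.label == anc_label:
            stack.append(v)
    return result


if __name__ == "__main__":
    # Toy tree: a(b(c), d(b(c)))
    root = Node("a", [Node("b", [Node("c")]), Node("d", [Node("b", [Node("c")])])])
    order = number(root)
    print(ancestor_descendant_pairs(order, "b", "c"))
    # [('b', 1, 'c', 2), ('b', 4, 'c', 5)]
```

Popping as soon as post(top) < post(v) is safe because, in pre-order, the descendants of a node form a contiguous run immediately after it: once a non-descendant appears, no later node can be a descendant, so the entry need not stay in memory.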
Files in this record:
VLDBJ-mandreoli.pdf (restricted access)
Type: author's revised version accepted for publication
Size: 924.02 kB
Format: Adobe PDF

Creative Commons License
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/609107
Citations
  • PubMed Central: N/A
  • Scopus: 2
  • Web of Science (ISI): 2