Schema Label Normalization for Improving Schema Matching

Sorrentino, Serena; Bergamaschi, Sonia; Gawinecki, Maciej; Po, Laura
2010

Abstract

Schema matching is the problem of finding relationships among concepts across data sources that are heterogeneous in format and structure. Starting from the “hidden meaning” associated with schema labels (i.e., class/attribute names), it is possible to discover relationships among the elements of different schemata. Lexical annotation (i.e., annotation with respect to a thesaurus or lexical resource) helps associate a “meaning” with schema labels. However, the performance of semi-automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns, abbreviations, and acronyms. We address this problem by proposing a method for schema label normalization, which increases the number of comparable labels. The method semi-automatically expands abbreviations and acronyms and annotates compound nouns, with minimal manual effort. We show empirically that our normalization method helps identify similarities among schema elements of different data sources, thus improving schema matching results.
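To make the idea of label normalization concrete, the following is a minimal sketch and not the authors' method. It assumes a small hand-written abbreviation dictionary (`ABBREVIATIONS`) and two hypothetical helpers (`tokenize`, `normalize`). The method described in the abstract is semi-automatic and also annotates compound nouns against a lexical resource, which this sketch does not attempt.

```python
import re

# Hypothetical abbreviation dictionary, invented for illustration only.
# A real system would have to obtain expansions from other sources.
ABBREVIATIONS = {
    "qty": "quantity",
    "amt": "amount",
    "cust": "customer",
    "po": "purchase order",
}


def tokenize(label):
    """Split a schema label on underscores, hyphens, whitespace, and camelCase boundaries."""
    # Insert a space wherever a lowercase letter or digit is followed by an uppercase letter.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", label)
    return [tok.lower() for tok in re.split(r"[\s_\-]+", spaced) if tok]


def normalize(label, abbreviations=ABBREVIATIONS):
    """Return the label as space-separated lowercase words with known abbreviations expanded."""
    return " ".join(abbreviations.get(tok, tok) for tok in tokenize(label))


if __name__ == "__main__":
    for raw in ["custQty", "CUST_QTY", "PO_AMT", "order-date"]:
        print(raw, "->", normalize(raw))
```

Under these assumptions, labels such as `custQty` and `CUST_QTY` from two different sources normalize to the same string, which is what makes them comparable in a later matching step.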
Schema Label Normalization for Improving Schema Matching / Sorrentino, Serena; Bergamaschi, Sonia; Gawinecki, Maciej; Po, Laura. - In: DATA & KNOWLEDGE ENGINEERING. - ISSN 0169-023X. - Print. - 69:12(2010), pp. 1254-1273. [10.1016/j.datak.2010.10.004]
Use this identifier to cite or link to this document: https://hdl.handle.net/11380/646390