Schema Normalization for Improving Schema Matching

Schema matching is the problem of finding relationships among concepts across heterogeneous data sources (heterogeneous in format and in structure). Starting from the "hidden meaning" associated with schema labels (i.e., class/attribute names), it is possible to discover relationships among the elements of different schemata. Lexical annotation (i.e., annotation w.r.t. a thesaurus/lexical resource) helps in associating a "meaning" with schema labels. However, the accuracy of semi-automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns and word abbreviations. In this work, we address this problem by proposing a method for schema label normalization, which increases the number of comparable labels. Unlike other solutions, the method semi-automatically expands abbreviations and annotates compound terms with minimal manual effort. We empirically show that our normalization method helps in the identification of similarities among schema elements of different data sources, thus improving schema matching accuracy.
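As a rough illustration of why normalization increases the number of comparable labels, the sketch below splits labels into tokens and expands abbreviations from a small lookup table. It is not the authors' method: the `ABBREVIATIONS` table and the `tokenize` and `normalize` functions are hypothetical names introduced here, and the record does not describe how the paper selects expansions or annotates compound terms.

```python
import re

# Hypothetical abbreviation table, assumed for illustration only.
ABBREVIATIONS = {
    "cust": "customer",
    "addr": "address",
    "qty": "quantity",
    "amt": "amount",
}


def tokenize(label):
    """Split a schema label on camelCase boundaries and non-letter characters."""
    # Insert a space at each lowercase-to-uppercase transition ("custAddr" -> "cust Addr").
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", label)
    # Split on anything that is not a letter and lowercase the remaining tokens.
    return [token.lower() for token in re.split(r"[^A-Za-z]+", spaced) if token]


def normalize(label):
    """Expand known abbreviations token by token and rejoin the label."""
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokenize(label))


if __name__ == "__main__":
    # Two labels from different (made-up) schemata become directly comparable.
    print(normalize("custAddr"))
    print(normalize("CUSTOMER_ADDRESS"))
```

Under these assumptions, `custAddr` and `CUSTOMER_ADDRESS` both normalize to the same string, so a matcher that compares normalized labels could relate them even though the raw labels share no dictionary word.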

Schema Normalization for Improving Schema Matching / Sorrentino, Serena; Bergamaschi, Sonia; Gawinecki, Maciej; Po, Laura. - PRINT. - 5829:(2009), pp. 280-293. (Paper presented at the International Conference on Conceptual Modeling (ER 2009), held in Gramado, Brazil, 9-12 November 2009) [10.1007/978-3-642-04840-1_22].

Sorrentino, Serena; Bergamaschi, Sonia; Gawinecki, Maciej; Po, Laura

2009

Year of publication: 2009
Conference title: International Conference on Conceptual Modeling (ER 2009)
Conference location: Gramado, Brazil
Conference dates: 9-12 November 2009
DOI: https://dx.doi.org/10.1007/978-3-642-04840-1_22
WoS ID: WOS:000278558400022
Scopus ID: 2-s2.0-78649477578
Series: LECTURE NOTES IN COMPUTER SCIENCE
Volume number: 5829
First page: 280
Last page: 293
All authors: Sorrentino, Serena; Bergamaschi, Sonia; Gawinecki, Maciej; Po, Laura
Type: Conference proceedings paper

Files in this item:

File	Access	Type	Size	Format
ER_2009_POSTPRINT.pdf	Open access	Author's revised version accepted for publication	200.99 kB	Adobe PDF

The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/615696
