Extraction, Transformation and Loading (ETL) processes are crucial for data warehouse consistency and are typically based on constraints and requirements expressed in natural language, in the form of comments and documentation. This task is poorly supported by automatic software applications, which makes these activities a major effort in data warehouse projects. In a traditional business scenario this is not a serious issue, since the sources populating a data warehouse are fixed and directly known by the data administrator. Nowadays, however, business needs require enterprise information systems to be highly flexible with respect to both the business analyses they allow and the data they handle. Temporary alliances of enterprises, market analysis processes and the availability of data on the Internet push enterprises to quickly integrate unexpected data sources into their activities. The reference scenario for data warehouse systems therefore changes radically, since the data sources populating the data warehouse may not be directly known and managed by the designers. This creates new requirements for ETL tools: greater automation of the extraction and transformation process, the management of heterogeneous attribute values, and the ability to handle different kinds of data sources, ranging from DBMSs to flat files, XML documents and spreadsheets. In this paper we propose a semantic-driven tool that couples and extends the functionalities of two systems: the MOMIS integration system and the RELEVANT data analysis system. The tool aims at supporting the semi-automatic definition of ETL inter-attribute mappings and transformations in a data warehouse project. By means of a semantic analysis, two tasks are performed: 1) identification of the parts of the data source schemata that are related to the data warehouse; 2) support for the definition of transformation rules for populating the data warehouse.
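The abstract does not describe how the semantic analysis decides which source attributes relate to the data warehouse. As a rough illustration of task 1 only, the Python sketch below matches source attribute names to data warehouse attribute names using normalized string similarity. It is a simple stand-in and not the MOMIS/RELEVANT technique. The `SYNONYMS` table, the `normalize`, `similarity` and `match_attributes` helpers, and the 0.7 threshold are all hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical synonym table: a stand-in for a real lexical resource or thesaurus.
SYNONYMS = {"customer": "client", "cust": "client", "qty": "quantity", "amt": "amount"}


def normalize(name: str) -> str:
    """Split camelCase / snake_case names into lowercase tokens and map synonyms."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    tokens = re.split(r"[\s_\-]+", spaced.lower())
    return " ".join(SYNONYMS.get(t, t) for t in tokens if t)


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] between two normalized attribute names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_attributes(source_attrs, dw_attrs, threshold=0.7):
    """For each data warehouse attribute, keep the best source attribute if it scores above threshold.

    Returns {dw_attr: (source_attr, rounded_score)}; unmatched dw attributes are omitted.
    """
    mapping = {}
    for dw in dw_attrs:
        best = max(source_attrs, key=lambda s: similarity(s, dw))
        score = similarity(best, dw)
        if score >= threshold:
            mapping[dw] = (best, round(score, 2))
    return mapping


if __name__ == "__main__":
    source = ["CustName", "order_qty", "shipDate", "notes"]
    warehouse = ["client_name", "quantity", "ship_date"]
    print(match_attributes(source, warehouse))
```

In this example, `CustName` and `shipDate` normalize to exactly the same strings as `client_name` and `ship_date`. `order_qty` becomes "order quantity", which scores about 0.73 against `quantity`, and `notes` matches nothing. A real semantic approach would rely on richer evidence than names alone, such as lexical relationships and attribute values, and a designer would still validate the proposed mappings.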
We tested the approach in a real scenario. Preliminary qualitative results show that our tool can substantially support the data warehouse administrator's work and considerably reduce data warehouse design time.
| Field | Value |
|---|---|
| Publication date | 2009 |
| Title | Improving Extraction and Transformation in ETL by Semantic Analysis |
| Authors | Francesco Guerra; Sonia Bergamaschi; Mirko Orsini; Claudio Sartori; Maurizio Vincini |
| Conference date | 3-4 September 2009 |
| Conference name | European Conference on Knowledge Management |
| Conference location | Vicenza, Italy |
| Book title | Proceedings of the 10th European Conference on Knowledge Management |
| Item type | Paper in conference proceedings |
Documents in Iris Unimore are released under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Italy license, unless otherwise stated.