Data integration is a technique for combining different data sources to provide a unified view of them. MOMIS [1] is an open-source data integration framework developed by the DBGroup. The goal of our work is to enable MOMIS to scale out as the number of input data sources increases, without introducing a noticeable performance penalty. In particular, we present a full outer join method capable of efficiently integrating multiple sources at the same time by using data streams and provenance information. To evaluate the scalability of this innovative approach, we developed a join engine that employs a distributed data processing framework. Our solution can process input data sources in the form of continuous streams, execute the join operation on the fly, and produce outputs as soon as they are generated. In this way, the join can return partial results before the input streams have been completely received or processed, optimizing the entire execution.
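As an illustration of the general idea in the abstract (a full outer join over several streamed sources that emits partial results and records provenance), here is a minimal single-process sketch in Python. It is not the SOPJ algorithm and does not use MOMIS or any distributed framework. The event shape `(source, key, record)`, the function name `online_full_outer_join`, and the `source.attribute` naming of merged fields are all assumptions made for this example.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (source name, join key, record from that source).
Event = Tuple[str, Any, Dict[str, Any]]
# Hypothetical output shape: (join key, merged record, provenance).
Joined = Tuple[Any, Dict[str, Any], Tuple[str, ...]]


def online_full_outer_join(events: Iterable[Event]) -> Iterator[Joined]:
    """Consume interleaved events and yield an updated joined row per event.

    Provenance is the sorted tuple of sources that have contributed to the
    row so far. A key seen in only one source is still emitted, which is
    what makes this an outer join rather than an inner join.
    """
    # state[key][source] holds the latest record seen for that key and source.
    state: Dict[Any, Dict[str, Dict[str, Any]]] = {}
    for source, key, record in events:
        per_source = state.setdefault(key, {})
        per_source[source] = record

        # Rebuild the merged row, prefixing attributes with their source
        # so that equally named attributes from different sources do not clash.
        merged: Dict[str, Any] = {}
        for src in sorted(per_source):
            for attr, value in per_source[src].items():
                merged[f"{src}.{attr}"] = value

        # Emit immediately: a partial result, refined by later events.
        yield key, merged, tuple(sorted(per_source))


if __name__ == "__main__":
    demo = [
        ("A", 1, {"name": "x"}),
        ("B", 1, {"city": "y"}),
        ("B", 2, {"city": "z"}),
    ]
    for row in online_full_outer_join(demo):
        print(row)
```

Because the function is a generator, each joined row is available as soon as its triggering event has been read, before the rest of the input has arrived. A real engine would also need to bound the in-memory state and partition keys across workers, which this sketch leaves out.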

Sopj: A scalable online provenance join for data integration / Zhu, Song; Fiameni, Giuseppe; Simonini, Giovanni; Bergamaschi, S. - (2017), pp. 79-85. (Paper presented at the 15th International Conference on High Performance Computing and Simulation, HPCS 2017, held in Genova, Italy, 17 July) [10.1109/HPCS.2017.23].

Sopj: A scalable online provenance join for data integration

Zhu; Fiameni; Simonini; Bergamaschi, S.
2017

Year: 2017
Conference: 15th International Conference on High Performance Computing and Simulation, HPCS 2017
Location: Genova, Italy
Date: 17 July
Pages: 79-85
Authors: Zhu, Song; Fiameni, Giuseppe; Simonini, Giovanni; Bergamaschi, S.
Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1149628