We present SparkER, an ER tool that can scale practitioners’ favorite ER algorithms. SparkER has been devised to take full ad- vantage of parallel and distributed computation as well (running on top of Apache Spark). The first SparkER version was focused on the blocking step and implements both schema-agnostic and Blast meta-blocking approaches (i.e. the state-of-the-art ones); a GUI for SparkER, to let non-expert users to use it in an unsupervised mode, was developed. The new version of SparkER to be shown in this demo, extends significantly the tool. Entity matching and Entity Clustering modules have been added. Moreover, in addition to the completely unsupervised mode of the first version, a supervised mode has been added. The user can be assisted in supervising the entire process and in injecting his knowledge in order to achieve the best result. During the demonstration, attendees will be shown how SparkER can significantly help in devising and debugging ER algorithms.
SparkER: Scaling Entity Resolution in Spark / Gagliardelli, Luca; Simonini, Giovanni; Beneventano, Domenico; Bergamaschi, Sonia. - 2019-:(2019), pp. 602-605. (Intervento presentato al convegno 22nd International Conference on Extending Database Technology, EDBT 2019 tenutosi a Lisbon, Portugal nel March 26-29, 2019) [10.5441/002/edbt.2019.66].
SparkER: Scaling Entity Resolution in Spark
Luca Gagliardelli;Giovanni Simonini;Domenico Beneventano;Sonia Bergamaschi
2019
Abstract
We present SparkER, an ER tool that can scale practitioners’ favorite ER algorithms. SparkER has been devised to take full ad- vantage of parallel and distributed computation as well (running on top of Apache Spark). The first SparkER version was focused on the blocking step and implements both schema-agnostic and Blast meta-blocking approaches (i.e. the state-of-the-art ones); a GUI for SparkER, to let non-expert users to use it in an unsupervised mode, was developed. The new version of SparkER to be shown in this demo, extends significantly the tool. Entity matching and Entity Clustering modules have been added. Moreover, in addition to the completely unsupervised mode of the first version, a supervised mode has been added. The user can be assisted in supervising the entire process and in injecting his knowledge in order to achieve the best result. During the demonstration, attendees will be shown how SparkER can significantly help in devising and debugging ER algorithms.File | Dimensione | Formato | |
---|---|---|---|
EDBT19_paper_347.pdf
Open access
Tipologia:
Versione pubblicata dall'editore
Dimensione
1.78 MB
Formato
Adobe PDF
|
1.78 MB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I metadati presenti in IRIS UNIMORE sono rilasciati con licenza Creative Commons CC0 1.0 Universal, mentre i file delle pubblicazioni sono rilasciati con licenza Attribuzione 4.0 Internazionale (CC BY 4.0), salvo diversa indicazione.
In caso di violazione di copyright, contattare Supporto Iris