Entity Resolution (ER) is a fundamental task of data integration: it identifies different representations (i.e., profiles) of the same real-world entity in databases. Comparing all possible profile pairs with an ER algorithm has quadratic complexity. Blocking is commonly employed to avoid this: profiles are grouped into blocks according to some features, and ER is performed only among profiles of the same block. Yet devising blocking criteria and ER algorithms for data with high schema heterogeneity is a difficult and error-prone task that calls for automatic methods and debugging tools. In our previous work, we presented Blast, an ER system that can scale practitioners’ favorite Entity Resolution algorithms. In its current version, Blast has also been designed to take full advantage of parallel and distributed computation (running on top of Apache Spark). It implements the state-of-the-art unsupervised blocking method based on automatically extracted loose schema information. It comes with a GUI that allows users: (i) to visualize, understand, and (optionally) manually modify the automatically extracted loose schema information (i.e., injecting the user’s knowledge into the system); (ii) to retrieve resolved entities through a free-text search box and to visualize the process that led to that result (i.e., the provenance). Experimental results on real-world datasets show that these two functionalities can significantly enhance Entity Resolution results.
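
To make the blocking idea concrete, the following is a minimal sketch of plain schema-agnostic token blocking. It is not Blast's loosely schema-aware method, which also uses automatically extracted loose schema information. The toy profiles and the helper names `token_blocking` and `candidate_pairs` are hypothetical and chosen only for illustration. Profiles that share a token in any attribute value fall into the same block, and only profiles that co-occur in a block are compared.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy profiles; attribute names differ across sources on purpose.
profiles = {
    1: {"name": "John Smith", "city": "Modena"},
    2: {"full_name": "Smith John", "location": "Modena"},
    3: {"name": "Maria Rossi", "city": "Bologna"},
    4: {"title": "Rossi Maria"},
}


def token_blocking(profiles):
    """Group profile ids by every lowercase token found in any attribute value."""
    blocks = defaultdict(set)
    for pid, attributes in profiles.items():
        for value in attributes.values():
            for token in value.lower().split():
                blocks[token].add(pid)
    # A block holding a single profile yields no comparison, so drop it.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Collect the distinct profile pairs that share at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


blocks = token_blocking(profiles)
pairs = candidate_pairs(blocks)

# Number of comparisons an exhaustive (quadratic) approach would perform.
all_pairs = len(profiles) * (len(profiles) - 1) // 2

print(f"candidate pairs after blocking: {len(pairs)} of {all_pairs}")
```

On this toy input only the pairs (1, 2) and (3, 4) survive, instead of the six comparisons an exhaustive approach would need. Real datasets produce many overlapping and oversized blocks, which is where more refined blocking methods such as the one described above come in.
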
|Publication date:||2018|
|Title:||Enhancing Loosely Schema-aware Entity Resolution with User Interaction|
|Author(s):||Simonini, Giovanni; Gagliardelli, Luca; Zhu, Song; Bergamaschi, Sonia|
|Conference name:||2018 International Conference on High Performance Computing & Simulation, HPCS 2018, Orleans, France, July 16-20, 2018.|
|Conference location:||Orléans, France|
|Conference date:||July 16-20, 2018|
|Citation:||Enhancing Loosely Schema-aware Entity Resolution with User Interaction / Simonini, Giovanni; Gagliardelli, Luca; Zhu, Song; Bergamaschi, Sonia. - (2018), pp. 844-851. (Paper presented at the 2018 International Conference on High Performance Computing & Simulation, HPCS 2018, held in Orléans, France, July 16-20, 2018.)|
|Type:||Paper in conference proceedings|
Documents in Iris Unimore are released under a Creative Commons Attribution - NonCommercial - NoDerivatives 3.0 Italy license, unless otherwise indicated.