Entity Resolution, the task of identifying records that refer to the same real-world entity, is a fundamental step in data integration. Blocking is a widely employed technique to avoid the comparison of all possible record pairs in a dataset (an inefficient approach). Renouncing to exploit schema information for blocking has been proved to limit the chance of missing matches (i.e., it guarantees high recall), at the cost of a low precision. Meta-blocking alleviates this issue by restructuring a block collection, removing redundant and superfluous comparisons. Yet, existing meta-blocking techniques exclusively rely on schema-agnostic features. In this paper, we investigate how loose schema information, induced directly from the data, can be exploited in an holistic loosely schema-aware (meta-)blocking approach that outperforms the state-of-the-art meta-blocking in terms of precision, without renouncing high level of recall. We implemented our idea in a system called Blast, and experimentally evaluated it on real-world datasets.

Enhancing entity resolution efficiency with loosely schema-aware techniques - Discussion paper / Simonini, G.; Bergamaschi, S.. - (2016), pp. 270-277. ((Intervento presentato al convegno 24th Italian Symposium on Advanced Database Systems, SEBD 2016 tenutosi a ita nel 2016.

Enhancing entity resolution efficiency with loosely schema-aware techniques - Discussion paper

Simonini G.;Bergamaschi S.
2016

Abstract

Entity Resolution, the task of identifying records that refer to the same real-world entity, is a fundamental step in data integration. Blocking is a widely employed technique to avoid the comparison of all possible record pairs in a dataset (an inefficient approach). Renouncing to exploit schema information for blocking has been proved to limit the chance of missing matches (i.e., it guarantees high recall), at the cost of a low precision. Meta-blocking alleviates this issue by restructuring a block collection, removing redundant and superfluous comparisons. Yet, existing meta-blocking techniques exclusively rely on schema-agnostic features. In this paper, we investigate how loose schema information, induced directly from the data, can be exploited in an holistic loosely schema-aware (meta-)blocking approach that outperforms the state-of-the-art meta-blocking in terms of precision, without renouncing high level of recall. We implemented our idea in a system called Blast, and experimentally evaluated it on real-world datasets.
24th Italian Symposium on Advanced Database Systems, SEBD 2016
ita
2016
270
277
Simonini, G.; Bergamaschi, S.
Enhancing entity resolution efficiency with loosely schema-aware techniques - Discussion paper / Simonini, G.; Bergamaschi, S.. - (2016), pp. 270-277. ((Intervento presentato al convegno 24th Italian Symposium on Advanced Database Systems, SEBD 2016 tenutosi a ita nel 2016.
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

Caricamento pubblicazioni consigliate

Licenza Creative Commons
I metadati presenti in IRIS UNIMORE sono rilasciati con licenza Creative Commons CC0 1.0 Universal, mentre i file delle pubblicazioni sono rilasciati con licenza Attribuzione 4.0 Internazionale (CC BY 4.0), salvo diversa indicazione.
In caso di violazione di copyright, contattare Supporto Iris

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11380/1200649
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 2
  • ???jsp.display-item.citation.isi??? ND
social impact