This paper reports the runner-up solution to the ACM SIGMOD 2020 programming contest, whose goal was to identify the specifications (i.e., records) collected across 24 e-commerce data sources that refer to the same real-world entities. First, we investigate the machine learning (ML) approach, but surprisingly find that existing state-of-the-art ML-based methods fall short in this context, not reaching a 0.49 F-score. Then, we propose an efficient solution that exploits human-generated annotated lists and regular expressions and reaches a 0.99 F-score. In our experience, our approach was no more expensive in terms of human effort than the labeling of match/non-match pairs required by ML-based methods.

Entity resolution on camera records without machine learning / Zecchini, L., Simonini, G., Bergamaschi, S. - 2726:(2020). (2nd International Workshop on Challenges and Experiences from Data Integration to Knowledge Graphs, DI2KG 2020, Japan, 2020).

Entity resolution on camera records without machine learning

Zecchini L.; Simonini G. (Writing – Original Draft Preparation); Bergamaschi S. (Writing – Original Draft Preparation)
2020

2020
no
English
2nd International Workshop on Challenges and Experiences from Data Integration to Knowledge Graphs, DI2KG 2020
Japan
2020
CEUR Workshop Proceedings
2726
CEUR-WS
Data integration; Data wrangling; Entity matching; Entity resolution
Zecchini, L.; Simonini, G.; Bergamaschi, S.
Conference proceedings::Paper in conference proceedings
273
3
none
info:eu-repo/semantics/conferenceObject
Files in this record:
There are no files associated with this record.
Creative Commons License
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1222884
Citations
  • PMC: ND
  • Scopus: 6
  • ISI: ND