Evaluation is a bottleneck in data integration processes: it is performed by domain experts through onerous manual data inspections. This task is particularly heavy in real business scenarios, where the large amount of data makes checking all integrated tuples infeasible. Our idea is to address this issue by providing the experts with an unsupervised measure, based on word frequencies, which quantifies how representative one dataset is of another, giving an indication of how good the integration process is. The paper motivates and introduces the measure and provides extensive experimental evaluations that show the effectiveness and efficiency of the approach.

Evaluating the integration of datasets / Paganelli, Matteo; Buono, Francesco Del; Guerra, Francesco; Ferro, Nicola. - (2022), pp. 347-356. (Paper presented at the 37th ACM/SIGAPP Symposium on Applied Computing, held virtually in April 2022) [10.1145/3477314.3507688].

Evaluating the integration of datasets

Paganelli, Matteo; Buono, Francesco Del; Guerra, Francesco; Ferro, Nicola
2022

Year: 2022
Conference: 37th ACM/SIGAPP Symposium on Applied Computing
Location: virtual
Date: April 2022
First page: 347
Last page: 356

Use this identifier to cite or link to this document: http://hdl.handle.net/11380/1276439