Evaluation of the quality of data integration processes is usually performed via onerous manual data inspections. This task is particularly heavy in real business scenarios, where the large amount of data makes checking all the tuples infeasible, and frequent updates (i.e., changes in the sources and/or new sources) require the evaluation to be repeated over and over. Our idea is to address this issue by providing the experts with an unsupervised measure, based on word frequencies, which quantifies how representative a dataset is of another dataset, giving an indication of how good the integration process is and of whether deviations are occurring that call for a manual inspection. We also conducted some preliminary experiments on shared datasets, which show the effectiveness of the proposed measures in typical data integration scenarios.
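The abstract does not define the measure itself, so the following is only a rough illustration of the general idea of comparing two datasets through their word frequencies. It is a minimal sketch under stated assumptions and is not the authors' measure: records are assumed to be attribute-to-string mappings, words are lower-cased `\w+` tokens, the score is assumed to be 1 minus the Jensen-Shannon divergence (base 2) between the two word-frequency distributions, and `needs_manual_inspection` with its threshold is a hypothetical helper.

```python
import math
import re
from collections import Counter
from typing import Dict, Iterable, Mapping


def word_frequencies(records: Iterable[Mapping[str, str]]) -> Dict[str, float]:
    """Relative frequency of each lower-cased word over all attribute values."""
    counts: Counter = Counter()
    for record in records:
        for value in record.values():
            counts.update(re.findall(r"\w+", str(value).lower()))
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()} if total else {}


def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence with log base 2, so the result lies in [0, 1]."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl_to_m(a: Dict[str, float]) -> float:
        # KL(a || m); every word in `a` has a positive frequency, so m[w] > 0.
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def representativeness(integrated, source) -> float:
    """Hypothetical score in [0, 1]: 1.0 means identical word distributions."""
    p, q = word_frequencies(integrated), word_frequencies(source)
    if not p or not q:
        return 0.0  # assumption: an empty dataset represents nothing
    return 1.0 - js_divergence(p, q)


def needs_manual_inspection(score: float, threshold: float = 0.8) -> bool:
    """Hypothetical rule: flag the integration for review below a threshold."""
    return score < threshold
```

For example, comparing an integrated dataset that kept only some of the source records yields a score strictly between 0 and 1, and the score drops toward 0 as the vocabularies diverge. A symmetric, bounded divergence was chosen here only because it makes scores easy to compare across runs; the paper may use a different formulation.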

Unsupervised Evaluation of Data Integration Processes / Paganelli, M.; Buono, F. D.; Guerra, F.; Ferro, N. - (2020), pp. 77-81. (Paper presented at the 22nd International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2020, held in tha in 2020) [10.1145/3428757.3429129].

Unsupervised Evaluation of Data Integration Processes

Paganelli M.; Guerra F.
2020

Year: 2020
Conference: 22nd International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2020
Conference location: tha
Conference year: 2020
Pages: 77-81
Authors: Paganelli, M.; Buono, F. D.; Guerra, F.; Ferro, N.
Files in this record:
There are no files associated with this record.

Creative Commons License
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise stated.
In case of copyright infringement, contact IRIS Support.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1237815
Citations
  • PubMed Central: ND
  • Scopus: 2
  • Web of Science (ISI): 2