Evaluation of the quality of data integration processes is usually performed via manual onerous data inspections. This task is particularly heavy in real business scenarios, where the large amount of data makes checking all the tuples infeasible and the frequent updates, i.e. changes in the sources and/or new sources, impose to repeat the evaluation over and over. Our idea is to address this issue by providing the experts with an unsupervised measure, based on word frequencies, which quantifies how much a dataset is representative of another dataset, giving an indication of how good is the integration process and whether deviations are happening and a manual inspection is needed. We also conducted some preliminary experiments, using shared datasets, that show the effectiveness of the proposed measures in typical data integration scenarios.
Unsupervised Evaluation of Data Integration Processes / Paganelli, M.; Buono, F. D.; Guerra, F.; Ferro, N.. - (2020), pp. 77-81. (Intervento presentato al convegno 22nd International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2020 tenutosi a tha nel 2020) [10.1145/3428757.3429129].
Unsupervised Evaluation of Data Integration Processes
Paganelli M.
;Guerra F.
;
2020
Abstract
Evaluation of the quality of data integration processes is usually performed via manual onerous data inspections. This task is particularly heavy in real business scenarios, where the large amount of data makes checking all the tuples infeasible and the frequent updates, i.e. changes in the sources and/or new sources, impose to repeat the evaluation over and over. Our idea is to address this issue by providing the experts with an unsupervised measure, based on word frequencies, which quantifies how much a dataset is representative of another dataset, giving an indication of how good is the integration process and whether deviations are happening and a manual inspection is needed. We also conducted some preliminary experiments, using shared datasets, that show the effectiveness of the proposed measures in typical data integration scenarios.Pubblicazioni consigliate
I metadati presenti in IRIS UNIMORE sono rilasciati con licenza Creative Commons CC0 1.0 Universal, mentre i file delle pubblicazioni sono rilasciati con licenza Attribuzione 4.0 Internazionale (CC BY 4.0), salvo diversa indicazione.
In caso di violazione di copyright, contattare Supporto Iris