Automated Machine Learning for Entity Matching Tasks / Paganelli, Matteo; Del Buono, Francesco; Pevarello, Marco; Guerra, Francesco; Vincini, Maurizio. - 2021-:(2021), pp. 325-330. (Paper presented at the EDBT conference, held in Nicosia, 23-26 March 2021) [10.5441/002/edbt.2021.29].
Automated Machine Learning for Entity Matching Tasks
Matteo Paganelli; Francesco Del Buono; Francesco Guerra; Maurizio Vincini
2021
Abstract
The paper studies the application of automated machine learning (AutoML) approaches to the problem of Entity Matching (EM). AutoML would make the existing, highly effective Machine Learning (ML) and Deep Learning based approaches for EM usable by non-expert users, who lack the expertise to train and tune such complex systems. Our experiments show that applying AutoML systems directly to this scenario does not yield high-quality results. To address this issue, we introduce a new component, the EM adapter, which is pipelined with standard AutoML systems and preprocesses the EM datasets to make them usable by automated approaches. The experimental evaluation shows that our proposal matches the effectiveness of state-of-the-art EM systems while requiring no ML expertise to tune.

File | Access | Type | Size | Format
---|---|---|---|---
p260 (5).pdf | Open access | Publisher's published version | 471.97 kB | Adobe PDF
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.