Analyzing How BERT Performs Entity Matching

State-of-the-art Entity Matching (EM) approaches rely on transformer architectures, such as BERT, to generate highly contextualized embeddings of terms. The embeddings are then used to predict whether pairs of entity descriptions refer to the same real-world entity. BERT-based EM models have proven effective, but they act as black boxes for users, who have limited insight into the reasons behind their decisions. In this paper, we perform a multi-faceted analysis of the components of pre-trained and fine-tuned BERT architectures applied to an EM task. The main findings of our extensive experimental evaluation are: (1) fine-tuning on the EM task mainly modifies the last layers of the BERT components, but does so differently for tokens belonging to descriptions of matching and non-matching entities; (2) BERT recognizes the special structure of EM datasets, where records are pairs of entity descriptions; (3) the pairwise semantic similarity of tokens is not key knowledge exploited by BERT-based EM models.
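To make the record-pair structure in finding (2) concrete, here is one common way to feed an EM record pair to BERT as a single sequence-pair input. The sketch also computes the token-level cosine similarity that finding (3) refers to. It is a minimal illustration under stated assumptions, not the authors' code. The whitespace `tokenize` function stands in for BERT's WordPiece tokenizer. The attribute-value concatenation in `serialize_record` is only one of several possible serializations. All function names are hypothetical. In a real model, the vectors passed to `pairwise_similarity` would be contextual embeddings taken from a BERT layer.

```python
from math import sqrt


def tokenize(text: str) -> list[str]:
    # Hypothetical toy tokenizer (lowercase + whitespace split);
    # real BERT uses WordPiece sub-word tokenization instead.
    return text.lower().split()


def serialize_record(record: dict[str, str]) -> str:
    # Assumed serialization: join non-empty attribute values in dict order.
    return " ".join(value for value in record.values() if value)


def build_pair_input(left: dict[str, str], right: dict[str, str]):
    """Build a BERT-style sequence-pair input: [CLS] left [SEP] right [SEP].

    Segment (token type) ids are 0 for [CLS], the left description and the
    first [SEP], and 1 for the right description and the final [SEP].
    """
    left_tokens = tokenize(serialize_record(left))
    right_tokens = tokenize(serialize_record(right))
    tokens = ["[CLS]"] + left_tokens + ["[SEP]"] + right_tokens + ["[SEP]"]
    segment_ids = [0] * (len(left_tokens) + 2) + [1] * (len(right_tokens) + 1)
    return tokens, segment_ids


def cosine(u: list[float], v: list[float]) -> float:
    # Cosine similarity; returns 0.0 when either vector has zero norm.
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pairwise_similarity(left_vecs, right_vecs):
    # Matrix of cosine similarities: rows are left tokens, columns right tokens.
    return [[cosine(u, v) for v in right_vecs] for u in left_vecs]


if __name__ == "__main__":
    left = {"title": "iPhone 13 128GB", "brand": "Apple"}
    right = {"title": "Apple iPhone 13 (128 GB)", "brand": ""}
    tokens, segment_ids = build_pair_input(left, right)
    print(list(zip(tokens, segment_ids)))
```

The segment ids are what let BERT tell the two descriptions apart within a single input. They are one way the pair structure is made visible to the model.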

Analyzing How BERT Performs Entity Matching / Paganelli, M.; Del Buono, F.; Baraldi, A.; Guerra, F. - In: PROCEEDINGS OF THE VLDB ENDOWMENT. - ISSN 2150-8097. - 15:8(2022), pp. 1726-1738. (Paper presented at the 48th International Conference on Very Large Data Bases, VLDB 2022, held in Australia (aus) in 2022) [10.14778/3529337.3529356].


Paganelli M.; Del Buono F.; Baraldi A.; Guerra F.

2022



Year of publication: 2022
Date of first publication: 2022
Journal: PROCEEDINGS OF THE VLDB ENDOWMENT
Volume: 15
Issue: 8
First page: 1726
Last page: 1738
DOI: https://dx.doi.org/10.14778/3529337.3529356
WoS ID: WOS:000992381800018
Scopus ID: 2-s2.0-85140953114
All authors: Paganelli, M.; Del Buono, F.; Baraldi, A.; Guerra, F.
Type: Journal article

Files in this record:

File	Size	Format
p1726-paganelli.pdf (open access; type: VOR, version published by the publisher)	3.08 MB	Adobe PDF


The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International license (CC BY 4.0), unless otherwise stated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1291984
