Finding Synonymous Attributes in Evolving Wikipedia Infoboxes

Sottovia, Paolo; Paganelli, Matteo; Guerra, Francesco; Velegrakis, Yannis

doi:10.1007/978-3-030-28730-6_11

Wikipedia Infoboxes are semi-structured data structures organized in an attribute-value fashion. For each type of entity represented in Wikipedia, policies establish, in the form of a template, the attribute names that the Infobox should contain. However, these requirements change over time, and users often choose not to follow them strictly. As a result, it is hard to treat the history of Wikipedia pages in an integrated way: analyzing the temporal evolution of Wikipedia entities through their Infoboxes becomes difficult, and directly comparing entities of the same type becomes impossible. To address this challenge, we propose an approach that deals with the misalignment of attribute names by identifying clusters of synonymous Infobox attributes. Elements in the same cluster are considered the temporal evolution of a single attribute. To identify the clusters, we use two measures. The first is the degree to which two attributes co-occur in the same Infobox, which is treated as negative evidence of synonymy; the second is the co-occurrence of similar values in the attributes, which is treated as positive evidence of synonymy. We formalize the problem as a correlation clustering problem over a weighted graph whose nodes are the attributes and whose edges carry the positive and negative evidence. We solve it with a linear programming model that provides a good approximation. Our experiments over a collection of Infoboxes from the last 13 years show the potential of our approach.
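The abstract frames synonym detection as correlation clustering over a signed, weighted attribute graph. The following is a minimal sketch of that formulation under assumptions of ours, not the authors' exact method: negative evidence is taken to be the Jaccard overlap of the Infobox revisions in which two attributes both appear, positive evidence is the Jaccard overlap of their normalized values, and each edge weight is the difference of the two. Instead of the paper's linear programming model, the sketch finds the exact minimum-disagreement clustering by enumerating every partition, which is feasible only for a handful of attributes. The toy revisions and all function names (`edge_weights`, `partitions`, `disagreement_cost`, `synonym_clusters`) are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical toy history: each dict is one revision of the same Infobox,
# with attribute names drifting over time (born -> birth_date, spouse -> partner).
VERSIONS: List[Dict[str, str]] = [
    {"name": "Alice Example", "born": "1900-01-01", "spouse": "Bob Example"},
    {"name": "Alice Example", "birth_date": "1900-01-01", "spouse": "Bob Example"},
    {"name": "Alice Example", "birth_date": "1900-01-01", "partner": "Bob Example"},
]

Weights = Dict[Tuple[str, str], float]


def jaccard(a: Set, b: Set) -> float:
    """Overlap of two sets in [0, 1]; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def edge_weights(versions: List[Dict[str, str]]) -> Weights:
    """Signed edge weight per attribute pair: positive minus negative evidence.

    Assumption (not the paper's exact measures): negative evidence is the
    Jaccard overlap of the revisions in which both attributes appear, positive
    evidence is the Jaccard overlap of their normalized values.
    """
    occurs: Dict[str, Set[int]] = {}
    values: Dict[str, Set[str]] = {}
    for idx, box in enumerate(versions):
        for attr, val in box.items():
            occurs.setdefault(attr, set()).add(idx)
            values.setdefault(attr, set()).add(val.strip().lower())
    weights: Weights = {}
    for a, b in combinations(sorted(occurs), 2):
        negative = jaccard(occurs[a], occurs[b])  # used together: unlikely synonyms
        positive = jaccard(values[a], values[b])  # same values: likely synonyms
        weights[(a, b)] = positive - negative
    return weights


def partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """Yield every set partition of items (Bell-number many: tiny inputs only)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing block in turn...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ...or into a new block of its own.
        yield [[first]] + smaller


def disagreement_cost(clustering: List[List[str]], weights: Weights) -> float:
    """Correlation clustering objective: total weight of violated edges."""
    label = {attr: k for k, block in enumerate(clustering) for attr in block}
    cost = 0.0
    for (a, b), w in weights.items():
        if label[a] == label[b]:
            cost += max(0.0, -w)  # negative edge kept inside a cluster
        else:
            cost += max(0.0, w)  # positive edge cut between clusters
    return cost


def synonym_clusters(versions: List[Dict[str, str]]) -> List[List[str]]:
    """Exact minimum-disagreement clustering by exhaustive search.

    Stand-in for the paper's LP model; only feasible for a few attributes.
    """
    weights = edge_weights(versions)
    attrs = sorted({attr for box in versions for attr in box})
    best = min(partitions(attrs), key=lambda c: disagreement_cost(c, weights))
    return sorted(sorted(block) for block in best)


if __name__ == "__main__":
    print(synonym_clusters(VERSIONS))
    # [['birth_date', 'born'], ['name'], ['partner', 'spouse']]
```

On this toy history, `name` stays in a cluster of its own because it co-occurs with every other attribute, while each renamed pair is grouped because its two names never share a revision but do share values. The number of candidate partitions grows with the Bell numbers (52 for these five attributes), which is why real Infobox histories call for an approximate solver such as the linear programming model the abstract mentions.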

Finding Synonymous Attributes in Evolving Wikipedia Infoboxes / Sottovia, P., Paganelli, M., Guerra, F., Velegrakis, Y. - 11695 (2019), pp. 169-185. (23rd European Conference on Advances in Databases and Information Systems, ADBIS 2019, Bled, 8-11 September 2019) [10.1007/978-3-030-28730-6_11].


Year of publication: 2019
Conference title: 23rd European Conference on Advances in Databases and Information Systems, ADBIS 2019
Conference location: Bled
Conference dates: 8-11 September 2019
DOI: https://dx.doi.org/10.1007/978-3-030-28730-6_11
WoS ID: WOS:000558104700014
Scopus ID: 2-s2.0-85072840003
Series: LECTURE NOTES IN ARTIFICIAL INTELLIGENCE
Volume: 11695
First page: 169
Last page: 185
All authors: Sottovia, Paolo; Paganelli, Matteo; Guerra, Francesco; Velegrakis, Yannis
Type: Paper in conference proceedings


The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.