Parallelizing computations of full disjunctions

Paganelli, Matteo; Beneventano, Domenico; Guerra, Francesco; Sottovia, Paolo

doi:10.1016/j.bdr.2019.07.002

In relational databases, the full disjunction operator is an associative extension of the full outerjoin to an arbitrary number of relations. Its goal is to maximize the information that can be extracted from a database by connecting all tables through all join paths. The use of full disjunctions has been envisaged in several scenarios, such as data integration and knowledge extraction. One of the main obstacles to its adoption in real business scenarios is the long time its computation requires. This paper overcomes this limitation by introducing a novel approach, parafd, based on parallel computing techniques, which implements the full disjunction operator in both an exact and an approximate version. Our proposal has been compared with state-of-the-art algorithms, which have also been re-implemented to run in parallel. The experiments show that our approach outperforms the existing approaches in execution time. Finally, we have experimented with treating the full disjunction as a collection of documents indexed by a textual search engine. In this way, we provide a simple technique for performing keyword search over relational databases. The results obtained against a benchmark show high precision and recall levels, also when compared with existing proposals.
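To make the operator concrete, the following is a minimal brute-force sketch of a full disjunction over three toy relations. It is not the parafd algorithm described in the paper and it makes no attempt at parallelism or efficiency. It only illustrates the usual definition: combine every set of tuples (at most one per relation) that agree on their shared attributes and are connected through shared attributes, pad the result with nulls, and keep only the tuples that are not subsumed by a more informative one. The relation contents and all function names (`consistent`, `connected`, `full_disjunction`) are assumptions made up for this illustration, and input tuples are assumed to contain no nulls.

```python
from itertools import combinations, product


def consistent(t1, t2):
    """Two tuples are join-consistent if they agree on every shared attribute."""
    return all(t1[a] == t2[a] for a in t1.keys() & t2.keys())


def connected(tuples):
    """True if the tuples form a single group linked by shared attributes."""
    seen, frontier = {0}, [0]
    while frontier:
        i = frontier.pop()
        for j in range(len(tuples)):
            if j not in seen and tuples[i].keys() & tuples[j].keys():
                seen.add(j)
                frontier.append(j)
    return len(seen) == len(tuples)


def full_disjunction(relations):
    """Brute-force full disjunction of a list of relations (lists of dicts).

    Exponential in the number of relations: for illustration only.
    """
    attrs = sorted({a for rel in relations for t in rel for a in t})
    candidates = set()
    # Choose any non-empty subset of relations, then one tuple from each.
    for k in range(1, len(relations) + 1):
        for rels in combinations(relations, k):
            for combo in product(*rels):
                pairwise_ok = all(
                    consistent(a, b) for a, b in combinations(combo, 2)
                )
                if pairwise_ok and connected(combo):
                    merged = {}
                    for t in combo:
                        merged.update(t)
                    # Missing attributes are padded with None (null).
                    candidates.add(tuple(merged.get(a) for a in attrs))

    def subsumed(r, s):
        """r is strictly subsumed by s if s repeats every non-null value of r."""
        return r != s and all(x is None or x == y for x, y in zip(r, s))

    result = {
        r for r in candidates if not any(subsumed(r, s) for s in candidates)
    }
    return attrs, result


# Hypothetical toy data, invented for this example.
employees = [{"emp": "Ann", "dept": "D1"}, {"emp": "Bob", "dept": "D3"}]
departments = [{"dept": "D1", "city": "Modena"}, {"dept": "D2", "city": "Rome"}]
offices = [{"city": "Modena", "phone": "059"}, {"city": "Milan", "phone": "02"}]

attrs, rows = full_disjunction([employees, departments, offices])
for row in rows:
    print(dict(zip(attrs, row)))
```

In this toy case the result contains one fully joined tuple (Ann, D1, Modena, 059) and three partial tuples for Bob, department D2, and the Milan office, none of which can be joined with anything else. A plain inner join would return only the first of these, which is the information loss the full disjunction is meant to avoid.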

Parallelizing computations of full disjunctions / Paganelli, M., Beneventano, D., Guerra, F., Sottovia, P.. - In: BIG DATA RESEARCH. - ISSN 2214-5796. - 17:(2019), pp. 18-31. [10.1016/j.bdr.2019.07.002]

Year of publication	2019
Journal	BIG DATA RESEARCH
Volume	17
First page	18
Last page	31
DOI	https://dx.doi.org/10.1016/j.bdr.2019.07.002
WoS ID	WOS:000483397300002
Scopus ID	2-s2.0-85069975530
Citation	Parallelizing computations of full disjunctions / Paganelli, M., Beneventano, D., Guerra, F., Sottovia, P.. - In: BIG DATA RESEARCH. - ISSN 2214-5796. - 17:(2019), pp. 18-31. [10.1016/j.bdr.2019.07.002]
All authors	Paganelli, Matteo; Beneventano, Domenico; Guerra, Francesco; Sottovia, Paolo
Type	Journal article

Files in this record:

File	Access	Type	Size	Format
Parallelizing computations of full disjunction.pdf	Restricted access	VOR - Publisher's published version	1.29 MB	Adobe PDF
Parallelizing_Computations_of_Full_Disjunctions__Camera_Ready_ (1).pdf	Open access	AAM - Author's version, revised and accepted for publication	760.35 kB	Adobe PDF

The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International license (CC BY 4.0), unless otherwise indicated.