In relational databases, the full disjunction operator is an associative extension of the full outerjoin to an arbitrary number of relations. Its goal is to maximize the information we can extract from a database by connecting all tables through all join paths. The use of full disjunctions has been envisaged in several scenarios, such as data integration, and knowledge extraction. One of the main limitations in its adoption in real business scenarios is the large time its computation requires. This paper overcomes this limitation by introducing a novel approach parafd, based on parallel computing techniques, for implementing the full disjunction operator in an exact and approximate version. Our proposal has been compared with state of the art algorithms, which have also been re-implemented for performing in parallel. The experiments show that the time performance outperforms existing approaches. Finally, we have experimented the full disjunction as a collection of documents indexed by a textual search engine. In this way, we provide a simple technique for performing keyword search over relational databases. The results obtained against a benchmark show high precision and recall levels even compared with the existing proposals.
Parallelizing computations of full disjunctions / Paganelli, Matteo; Beneventano, Domenico; Guerra, Francesco; Sottovia, Paolo. - In: BIG DATA RESEARCH. - ISSN 2214-5796. - 17:(2019), pp. 18-31. [10.1016/j.bdr.2019.07.002]
Parallelizing computations of full disjunctions
Paganelli, Matteo;Beneventano, Domenico;Guerra, Francesco
;Sottovia, Paolo
2019
Abstract
In relational databases, the full disjunction operator is an associative extension of the full outerjoin to an arbitrary number of relations. Its goal is to maximize the information we can extract from a database by connecting all tables through all join paths. The use of full disjunctions has been envisaged in several scenarios, such as data integration, and knowledge extraction. One of the main limitations in its adoption in real business scenarios is the large time its computation requires. This paper overcomes this limitation by introducing a novel approach parafd, based on parallel computing techniques, for implementing the full disjunction operator in an exact and approximate version. Our proposal has been compared with state of the art algorithms, which have also been re-implemented for performing in parallel. The experiments show that the time performance outperforms existing approaches. Finally, we have experimented the full disjunction as a collection of documents indexed by a textual search engine. In this way, we provide a simple technique for performing keyword search over relational databases. The results obtained against a benchmark show high precision and recall levels even compared with the existing proposals.File | Dimensione | Formato | |
---|---|---|---|
Parallelizing computations of full disjunction.pdf
Accesso riservato
Tipologia:
Versione pubblicata dall'editore
Dimensione
1.29 MB
Formato
Adobe PDF
|
1.29 MB | Adobe PDF | Visualizza/Apri Richiedi una copia |
Parallelizing_Computations_of_Full_Disjunctions__Camera_Ready_ (1).pdf
Open access
Tipologia:
Versione dell'autore revisionata e accettata per la pubblicazione
Dimensione
760.35 kB
Formato
Adobe PDF
|
760.35 kB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I metadati presenti in IRIS UNIMORE sono rilasciati con licenza Creative Commons CC0 1.0 Universal, mentre i file delle pubblicazioni sono rilasciati con licenza Attribuzione 4.0 Internazionale (CC BY 4.0), salvo diversa indicazione.
In caso di violazione di copyright, contattare Supporto Iris