Data integration combines data acquired from different autonomous sources to give the user a unified view of that data. One of its main challenges is entity resolution, whose goal is to detect the different representations of the same real-world entity across the sources and produce a single, consistent representation of it. The advent of big data has challenged traditional data integration paradigms, making the offline batch approach to entity resolution unsuitable for several scenarios (e.g., data exploration or datasets that change frequently). New solutions capable of operating effectively in such situations are therefore of primary importance. In this paper, I present some contributions made during the first half of my PhD program, focusing mainly on the design of a framework that performs entity resolution on demand, building on the results achieved by progressive and query-driven approaches to this task. I also briefly describe two projects in which I took part as a member of my research group, touching on some real-world applications of big data integration techniques, and conclude with some ideas on the future directions of my research.

Task-Driven Big Data Integration / Zecchini, Luca. - 3194:(2022), pp. 627-632. (Paper presented at the 30th Italian Symposium on Advanced Database Systems (SEBD 2022), held in Tirrenia (Pisa), June 19-22, 2022).

Task-Driven Big Data Integration

Zecchini, Luca
2022

Date: June 19, 2022
Conference: 30th Italian Symposium on Advanced Database Systems (SEBD 2022)
Location: Tirrenia (Pisa)
Conference dates: June 19-22, 2022
Volume: 3194
Pages: 627-632
Files in this item:
Task-Driven Big Data Integration.pdf
Access: Open access
Type: Author's version, revised and accepted for publication
Size: 1.51 MB
Format: Adobe PDF
Creative Commons License
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1285761