Efficient Stream Join Processing: Novel Approaches and Challenges / Aslam, A.; Simonini, G. - (2024), pp. 409-412. (Paper presented at the 33rd International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2024, held in Pisa, Italy, June 3-7, 2024) [10.1145/3625549.3658833].

Efficient Stream Join Processing: Novel Approaches and Challenges

Aslam A.; Simonini G.
2024

Abstract

Stream join is a fundamental data operator for processing real-time data, but stream inequality joins (theta joins) are computationally challenging because the indexing data structures they rely on must be updated frequently. To tackle this problem, we build on three key insights: 1) detecting skewed data distributions in real time and maintaining dedicated indexing structures for skewed keys to reduce index update costs; 2) combining an insert-efficient mutable structure with a search-efficient immutable structure to speed up the search phase of the stream join; and 3) adopting learned indexes instead of conventional ones, which can provide up to 4x better performance. In this Ph.D. work, we propose novel solutions for distributed and multi-core stream join processing, including an indexing solution that uses a space-efficient dedicated filter and a two-stage data structure that effectively holds and processes sliding-window items (bounded streaming contents). We are also exploring the adoption and benefits of learned indexes for real-time stream join processing. Despite non-trivial challenges such as state management in distributed processing, processing guarantees, and efficient concurrency mechanisms, experiments on distributed stream processing systems show that the proposed solutions outperform state-of-the-art approaches.
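The two-stage idea (an insert-efficient mutable part plus a search-efficient immutable part over a sliding window) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `TwoStageWindowIndex`, `buffer_capacity` and `probe_less_than`, the time-based window, in-order arrival of tuples, and the single `R.key < S.value` predicate are all assumptions made for illustration. New tuples are appended to a small unsorted buffer; when the buffer fills, it is sorted once and frozen into an immutable run that is probed with binary search, and whole runs are evicted once their newest tuple leaves the window.

```python
from bisect import bisect_left
from collections import deque


class TwoStageWindowIndex:
    """Hypothetical two-stage index for one input of a sliding-window inequality join.

    Stage 1 (mutable): an unsorted append-only buffer, so each insert is O(1).
    Stage 2 (immutable): a FIFO of frozen runs sorted by key, probed with binary search.
    Assumes a time-based window and tuples arriving in timestamp order.
    """

    def __init__(self, window, buffer_capacity=1024):
        self.window = window                  # window length in time units
        self.buffer_capacity = buffer_capacity
        self.buffer = []                      # mutable stage: list of (key, ts)
        self.runs = deque()                   # immutable stage: (keys, stamps, min_ts, max_ts)

    def insert(self, key, ts):
        self.buffer.append((key, ts))
        if len(self.buffer) >= self.buffer_capacity:
            self._freeze()

    def _freeze(self):
        # Sort the full buffer once by key and turn it into a read-only run.
        pairs = sorted(self.buffer)
        keys = [k for k, _ in pairs]
        stamps = [t for _, t in pairs]
        self.runs.append((keys, stamps, min(stamps), max(stamps)))
        self.buffer = []

    def _expire(self, lower):
        # Drop whole runs whose newest tuple is no longer inside the window.
        while self.runs and self.runs[0][3] <= lower:
            self.runs.popleft()

    def probe_less_than(self, value, now):
        """Return live keys k with k < value, i.e. matches for R.key < S.value."""
        lower = now - self.window             # a tuple is live iff ts > lower
        self._expire(lower)
        matches = []
        for keys, stamps, min_ts, _ in self.runs:
            i = bisect_left(keys, value)      # only keys[:i] can satisfy k < value
            if min_ts > lower:
                # Entire run is live: the binary search alone gives the matches.
                matches.extend(keys[:i])
            else:
                # Run straddles the window boundary: filter out expired tuples.
                matches.extend(k for k, t in zip(keys[:i], stamps[:i]) if t > lower)
        # The small mutable buffer is scanned linearly.
        matches.extend(k for k, t in self.buffer if k < value and t > lower)
        return sorted(matches)


if __name__ == "__main__":
    idx = TwoStageWindowIndex(window=10, buffer_capacity=3)
    for key, ts in [(5, 1), (2, 2), (8, 3), (1, 4), (7, 5), (3, 6), (4, 12)]:
        idx.insert(key, ts)
    print(idx.probe_less_than(6, now=12))   # [1, 3, 4]
    print(idx.probe_less_than(10, now=14))  # [3, 4, 7]
```

Sorting once per frozen run amortizes the cost of keeping the search side ordered, while inserts never touch the sorted runs. A realistic design would also have to merge small runs, tolerate out-of-order tuples, and support concurrent access, none of which this sketch attempts.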
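The learned-index insight can likewise be sketched. In its simplest form, a learned index replaces tree traversal with a model that predicts a key's position in a sorted array, then corrects the prediction with a binary search bounded by the model's maximum error. The sketch below uses a single least-squares line; the name `LinearLearnedIndex` is hypothetical, it is not the specific learned index studied in this work, and it makes no attempt to reproduce the reported speedup.

```python
import math
from bisect import bisect_left


class LinearLearnedIndex:
    """Hypothetical single-model learned index over a static, sorted key array."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("keys must be non-empty")
        self.keys = sorted(keys)
        n = len(self.keys)
        # Fit position ~ slope * key + intercept by ordinary least squares.
        mean_x = sum(self.keys) / n
        mean_y = (n - 1) / 2
        var_x = sum((x - mean_x) ** 2 for x in self.keys)
        cov = sum((x - mean_x) * (i - mean_y) for i, x in enumerate(self.keys))
        self.slope = cov / var_x if var_x else 0.0   # non-negative for sorted keys
        self.intercept = mean_y - self.slope * mean_x
        # Worst-case distance between predicted and true position over the data.
        self.max_err = max(abs(self._predict(x) - i) for i, x in enumerate(self.keys))

    def _predict(self, key):
        return self.slope * key + self.intercept

    def lower_bound(self, key):
        """Position of the first stored key >= key (same contract as bisect_left)."""
        n = len(self.keys)
        pos = self._predict(key)
        # The model is monotone and off by at most max_err, so the answer lies in
        # [pos - max_err, pos + max_err + 1]; one extra slot on each side absorbs
        # floating-point rounding, and both bounds are clamped to [0, n].
        lo = min(n, max(0, math.floor(pos - self.max_err) - 1))
        hi = min(n, max(0, math.ceil(pos + self.max_err) + 2))
        return bisect_left(self.keys, key, lo, hi)


if __name__ == "__main__":
    idx = LinearLearnedIndex([41, 3, 90, 15, 8, 57, 21, 40])
    print(idx.lower_bound(21))  # 3
    print(idx.lower_bound(50))  # 6
```

A lookup costs one model evaluation plus a binary search over roughly `2 * max_err` positions, so the benefit depends on how well a line fits the key distribution. Because this version is static, it would pair most naturally with immutable sorted data such as the frozen part of a two-stage structure; that pairing is an assumption for illustration, not a claim from the abstract.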
Year: 2024
Conference: 33rd International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2024
Location: Pisa, Italy
Dates: June 3-7, 2024
Pages: 409-412

Creative Commons License
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise stated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1373189