Coded Computation Across Shared Heterogeneous Workers With Communication Delay / Sun, Y.; Zhang, F.; Zhao, J.; Zhou, S.; Niu, Z.; Gunduz, D. - In: IEEE TRANSACTIONS ON SIGNAL PROCESSING. - ISSN 1053-587X. - 70:(2022), pp. 3371-3385. [10.1109/TSP.2022.3185905]

Coded Computation Across Shared Heterogeneous Workers With Communication Delay

Zhou S.; Gunduz D.
2022

Abstract

Distributed computing enables large-scale computation tasks to be processed by multiple workers in parallel. However, the randomness of communication and computation delays across the workers causes the straggler effect, which may degrade the delay performance. Coded computation helps to mitigate the straggler effect, but the amount of redundant load and the task assignment to the workers must be carefully optimized. In this work, we consider a multi-master heterogeneous-worker distributed computing scenario, where multiple matrix multiplication tasks are encoded and allocated to workers with different computing capabilities. The goal is to minimize the communication plus computation delay of all the tasks. We propose joint worker assignment, resource allocation, and load allocation algorithms under both dedicated and fractional worker assignment policies, where each worker can process the encoded tasks from either a single master or multiple masters, respectively. The non-convex delay minimization problem is then solved by employing a Markov's inequality-based approximation, Karush-Kuhn-Tucker conditions, and successive convex approximation methods. Through extensive simulations, we show that the proposed algorithms reduce the task completion delay compared to the benchmarks.
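
To make the idea of coded computation concrete, the following is a minimal toy sketch, not the paper's algorithm. It assumes a single master, three workers, and a simple (3, 2) sum-parity code over a matrix-vector product; the function names `matvec`, `encode`, and `decode` are hypothetical and chosen only for illustration. The matrix is split into two row blocks plus one parity block, so the result can be recovered from any two workers even if the third is a straggler.

```python
def matvec(rows, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]


def encode(A):
    """Split A into two row blocks and add one parity block (their sum).

    Assumes A has an even number of rows (toy setting).
    """
    half = len(A) // 2
    A1, A2 = A[:half], A[half:]
    parity = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return {0: A1, 1: A2, 2: parity}


def decode(results):
    """Recover A @ x from the results of any two of the three workers."""
    if len(results) < 2:
        raise ValueError("need results from at least two workers")
    if 0 in results and 1 in results:
        y1, y2 = results[0], results[1]
    elif 0 in results:
        # Worker 1 straggled: (A1 + A2) x - A1 x = A2 x
        y1 = results[0]
        y2 = [p - a for p, a in zip(results[2], y1)]
    else:
        # Worker 0 straggled: (A1 + A2) x - A2 x = A1 x
        y2 = results[1]
        y1 = [p - b for p, b in zip(results[2], y2)]
    return y1 + y2


A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, -1]
blocks = encode(A)

# Pretend worker 1 is a straggler and never replies.
finished = {w: matvec(blocks[w], x) for w in (0, 2)}
recovered = decode(finished)
print(recovered == matvec(A, x))
```

The redundancy here is fixed at one extra block for illustration. The abstract's point is that, with heterogeneous workers and several masters, how much redundant load to add and where to place it has to be optimized rather than fixed in advance.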
Year: 2022
Volume: 70
First page: 3371
Last page: 3385
Sun, Y.; Zhang, F.; Zhao, J.; Zhou, S.; Niu, Z.; Gunduz, D.
Files in this item:

Coded_Computation_Across_Shared_Heterogeneous_Workers_With_Communication_Delay.pdf
Access: Restricted
Type: Publisher's published version
Size: 1.52 MB
Format: Adobe PDF

2109.11246.pdf
Access: Open Access from 01/07/2024
Type: Author's revised version accepted for publication
Size: 1.43 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1286015
Citations
  • PMC: N/A
  • Scopus: 9
  • ISI: 8