Coded Distributed Computing with Partial Recovery

Coded computation techniques provide robustness against straggling workers in distributed computing. However, most of the existing schemes require exact provisioning of the straggling behavior and ignore the computations carried out by straggling workers. Moreover, these schemes are typically designed to recover the desired computation results accurately, while in many machine learning and iterative optimization algorithms, faster approximate solutions are known to result in an improvement in the overall convergence time. In this paper, we first introduce a novel coded matrix-vector multiplication scheme, called coded computation with partial recovery (CCPR), which benefits from the advantages of both coded and uncoded computation schemes, and reduces both the computation time and the decoding complexity by allowing a trade-off between the accuracy and the speed of computation. We then extend this approach to distributed implementation of more general computation tasks by proposing a coded communication scheme with partial recovery, where the results of subtasks computed by the workers are coded before being communicated. Numerical simulations on a large linear regression task confirm the benefits of the proposed scheme in terms of the trade-off between the computation accuracy and latency.

Coded Distributed Computing with Partial Recovery / Ozfatura, E.; Ulukus, S.; Gunduz, D.. - In: IEEE TRANSACTIONS ON INFORMATION THEORY. - ISSN 0018-9448. - 68:3(2022), pp. 1945-1959. [10.1109/TIT.2021.3133791]

Coded Distributed Computing with Partial Recovery

Ozfatura E.;Ulukus S.;Gunduz D.

2022

Abstract

Coded computation techniques provide robustness against straggling workers in distributed computing. However, most of the existing schemes require exact provisioning of the straggling behavior and ignore the computations carried out by straggling workers. Moreover, these schemes are typically designed to recover the desired computation results accurately, while in many machine learning and iterative optimization algorithms, faster approximate solutions are known to result in an improvement in the overall convergence time. In this paper, we first introduce a novel coded matrix-vector multiplication scheme, called coded computation with partial recovery (CCPR), which benefits from the advantages of both coded and uncoded computation schemes, and reduces both the computation time and the decoding complexity by allowing a trade-off between the accuracy and the speed of computation. We then extend this approach to distributed implementation of more general computation tasks by proposing a coded communication scheme with partial recovery, where the results of subtasks computed by the workers are coded before being communicated. Numerical simulations on a large linear regression task confirm the benefits of the proposed scheme in terms of the trade-off between the computation accuracy and latency.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno di pubblicazione
	
			2022
		
	Rivista
	
			IEEE TRANSACTIONS ON INFORMATION THEORY
		
	N° del Volume
	
			68
		
	Fascicolo
	
			3
		
	Pagina iniziale
	
			1945
		
	Pagina finale
	
			1959
		
	Codice DOI
	
			https://dx.doi.org/10.1109/TIT.2021.3133791
		
	Codice WoS
	
			WOS:000757850700033
		
	Codice Scopus
	
			2-s2.0-85121371661
		
	Citazione
	
			Coded Distributed Computing with Partial Recovery / Ozfatura, E.; Ulukus, S.; Gunduz, D.. - In: IEEE TRANSACTIONS ON INFORMATION THEORY. - ISSN 0018-9448. - 68:3(2022), pp. 1945-1959. [10.1109/TIT.2021.3133791]
		
	Tutti gli autori
	
			Ozfatura, E.; Ulukus, S.; Gunduz, D.
		
	Tipologia
	
			Articolo su rivista

File in questo prodotto:

File	Dimensione	Formato
Coded_Distributed_Computing_With_Partial_Recovery.pdf Accesso riservato Tipologia: Versione pubblicata dall'editore Dimensione 1.39 MB Formato Adobe PDF Visualizza/Apri Richiedi una copia	1.39 MB	Adobe PDF	Visualizza/Apri Richiedi una copia
2007.02191.pdf Open Access dal 01/04/2024 Tipologia: Versione dell'autore revisionata e accettata per la pubblicazione Dimensione 477.32 kB Formato Adobe PDF Visualizza/Apri	477.32 kB	Adobe PDF	Visualizza/Apri

Pubblicazioni consigliate

I metadati presenti in IRIS UNIMORE sono rilasciati con licenza Creative Commons CC0 1.0 Universal, mentre i file delle pubblicazioni sono rilasciati con licenza Attribuzione 4.0 Internazionale (CC BY 4.0), salvo diversa indicazione.
In caso di violazione di copyright, contattare Supporto Iris

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11380/1280049

Citazioni

ND

12

7

social impact