Speeding Up Distributed Gradient Descent by Utilizing Non-persistent Stragglers / Ozfatura, E.; Gunduz, D.; Ulukus, S. - (2019), pp. 2729-2733. (Paper presented at the 2019 IEEE International Symposium on Information Theory, ISIT 2019, held at La Maison de La Mutualite, France, in 2019) [10.1109/ISIT.2019.8849684].

Speeding Up Distributed Gradient Descent by Utilizing Non-persistent Stragglers

D. Gunduz;
2019

Abstract

When gradient descent (GD) is scaled to many parallel computing servers (workers) for large-scale machine learning problems, its per-iteration computation time is limited by the straggling workers. Coded distributed GD (DGD) can tolerate straggling workers by assigning redundant computations to the workers, but in most existing schemes each non-straggling worker transmits a single message per iteration to the parameter server (master) after completing all of its computations. We instead allow multiple computations to be conveyed from each worker per iteration, in order to also exploit the partial computations carried out by the straggling workers. We show that the average completion time per iteration can be reduced significantly at a reasonable increase in the communication load. We also propose a general coded DGD technique that can trade off the average computation time against the communication load.
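
To make the multi-message idea concrete, the following is a minimal Python sketch, not the paper's actual coding scheme: data partitions are redundantly assigned to workers by plain cyclic replication, each worker sends every partial gradient it finishes rather than a single message, and the master completes the iteration as soon as every partition has been reported at least once. The parameters K, r, d and the random straggling model are illustrative assumptions.

import numpy as np

# Illustrative parameters (not from the paper): K workers, K data partitions,
# each partition replicated at r workers, model dimension d.
K, r, d = 4, 2, 8
rng = np.random.default_rng(0)
partial_grads = rng.normal(size=(K, d))  # stand-in for per-partition gradients

# Cyclic redundant assignment: worker i processes partitions
# i, i+1, ..., i+r-1 (mod K), in that order.
assignment = [[(i + j) % K for j in range(r)] for i in range(K)]

# Simulated straggling: each worker finishes only a random prefix of its list,
# but, unlike single-message schemes, reports every partial result it completes.
completed = {i: assignment[i][: rng.integers(0, r + 1)] for i in range(K)}

# The master can finish the iteration once every partition has been reported
# by at least one worker; duplicates created by the redundancy are ignored.
received = {}
for worker, parts in completed.items():
    for p in parts:
        received.setdefault(p, partial_grads[p])

if len(received) == K:
    full_gradient = sum(received.values())
    print("iteration complete, gradient prefix:", full_gradient[:3])
else:
    print("waiting:", K - len(received), "partitions still missing")

In this toy model the communication load grows because a worker may send up to r messages per iteration instead of one, which is the computation-time versus communication-load trade-off the abstract refers to.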
Publication year: 2019
Conference: 2019 IEEE International Symposium on Information Theory, ISIT 2019
Venue: La Maison de La Mutualite, France
Conference year: 2019
Volume: 2019-
Pages: 2729-2733
Authors: Ozfatura, E.; Gunduz, D.; Ulukus, S.
Files for this record:
No files are associated with this record.

Use this identifier to cite or link to this item: https://hdl.handle.net/11380/1202691
Citations
  • PMC: ND
  • Scopus: 55
  • Web of Science: 47