When gradient descent (GD) is scaled to many parallel computing servers (workers) for large-scale machine learning problems, its per-iteration computation time is limited by the straggling workers. Coded distributed GD (DGD) can tolerate straggling workers by assigning redundant computations to the workers, but in most existing schemes each non-straggling worker transmits only one message per iteration to the parameter server (master), after completing all of its computations. We instead allow each worker to convey multiple computations per iteration, so that the computations executed by straggling workers can also be exploited. We show that the average completion time per iteration can be reduced significantly at a reasonable increase in the communication load. We also propose a general coded DGD technique that can trade off the average computation time against the communication load.
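The effect described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under assumptions that are not taken from the paper: task delays are i.i.d. exponential, each of the n workers is assigned r data partitions cyclically, the single-message baseline is modeled as waiting for the fastest n - r + 1 workers to finish all r tasks, and the multi-message variant is a simple uncoded scheme in which every partial result is sent as soon as it is ready. The function names `simulate_iteration` and `average_over_trials` are hypothetical.

```python
import random


def simulate_iteration(n, r, rng):
    """Return (t_single, t_multi, msgs_single, msgs_multi) for one iteration."""
    # finish[i][j]: time at which worker i completes its j-th task.
    finish = []
    for _ in range(n):
        t, row = 0.0, []
        for _ in range(r):
            t += rng.expovariate(1.0)  # assumed i.i.d. exponential task delay
            row.append(t)
        finish.append(row)

    # Single-message baseline (assumption): each worker reports once, after
    # all r tasks, and the master waits for the fastest n - r + 1 workers.
    totals = sorted(row[-1] for row in finish)
    t_single = totals[n - r]
    msgs_single = n - r + 1

    # Multi-message sketch (assumption): worker i handles partitions
    # i, i+1, ..., i+r-1 (mod n) in that order and reports each result
    # immediately; the master stops once every partition has arrived.
    earliest = [float("inf")] * n
    for i in range(n):
        for j in range(r):
            p = (i + j) % n
            earliest[p] = min(earliest[p], finish[i][j])
    t_multi = max(earliest)
    # Messages received by the master up to the multi-message completion time.
    msgs_multi = sum(1 for row in finish for t in row if t <= t_multi)
    return t_single, t_multi, msgs_single, msgs_multi


def average_over_trials(n=10, r=3, trials=2000, seed=0):
    """Average the four quantities above over independent iterations."""
    rng = random.Random(seed)
    sums = [0.0, 0.0, 0.0, 0.0]
    for _ in range(trials):
        for k, v in enumerate(simulate_iteration(n, r, rng)):
            sums[k] += v
    return [s / trials for s in sums]


if __name__ == "__main__":
    t_s, t_m, m_s, m_m = average_over_trials()
    print(f"single-message: time {t_s:.3f}, messages {m_s:.1f}")
    print(f"multi-message:  time {t_m:.3f}, messages {m_m:.1f}")
```

In this toy model the multi-message completion time is never larger than the single-message one, because any n - r + 1 finished workers already cover every partition under the cyclic assignment, while the number of messages per iteration grows. This mirrors, in a simplified way, the trade-off between completion time and communication load that the abstract describes.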
Speeding Up Distributed Gradient Descent by Utilizing Non-persistent Stragglers / Ozfatura, E.; Gunduz, D.; Ulukus, S. - 2019-:(2019), pp. 2729-2733. (Paper presented at the 2019 IEEE International Symposium on Information Theory, ISIT 2019, held at La Maison de La Mutualite, France, in 2019) [10.1109/ISIT.2019.8849684].
Speeding Up Distributed Gradient Descent by Utilizing Non-persistent Stragglers
D. Gunduz;
2019