We consider the distributed stochastic gradient descent problem, where a main node distributes gradient calculations among n workers from which at most b ≤ n can be utilized in parallel. By assigning tasks to all the workers and waiting only for the k fastest ones, the main node can trade-off the error of the algorithm with its runtime by gradually increasing k as the algorithm evolves. However, this strategy, referred to as adaptive k-sync, can incur additional costs since it ignores the computational efforts of slow workers. We propose a cost-efficient scheme that assigns tasks only to k workers and gradually increases k. As the response times of the available workers are unknown to the main node a priori, we utilize a combinatorial multi-armed bandit model to learn which workers are the fastest while assigning gradient calculations, and to minimize the effect of slow workers. Assuming that the mean response times of the workers are independent and exponentially distributed with different means, we give empirical and theoretical guarantees on the regret of our strategy, i.e., the extra time spent to learn the mean response times of the workers. Compared to adaptive k-sync, our scheme achieves significantly lower errors with the same computational efforts while being inferior in terms of speed.

Efficient Distributed Machine Learning via Combinatorial Multi-Armed Bandits / Egger, M.; Bitar, R.; Wachter-Zeh, A.; Gunduz, D.. - 2022-:(2022), pp. 1653-1658. (Intervento presentato al convegno 2022 IEEE International Symposium on Information Theory, ISIT 2022 tenutosi a fin nel 2022) [10.1109/ISIT50566.2022.9834499].

Efficient Distributed Machine Learning via Combinatorial Multi-Armed Bandits

Gunduz D.
2022-01-01

Abstract

We consider the distributed stochastic gradient descent problem, where a main node distributes gradient calculations among n workers from which at most b ≤ n can be utilized in parallel. By assigning tasks to all the workers and waiting only for the k fastest ones, the main node can trade-off the error of the algorithm with its runtime by gradually increasing k as the algorithm evolves. However, this strategy, referred to as adaptive k-sync, can incur additional costs since it ignores the computational efforts of slow workers. We propose a cost-efficient scheme that assigns tasks only to k workers and gradually increases k. As the response times of the available workers are unknown to the main node a priori, we utilize a combinatorial multi-armed bandit model to learn which workers are the fastest while assigning gradient calculations, and to minimize the effect of slow workers. Assuming that the mean response times of the workers are independent and exponentially distributed with different means, we give empirical and theoretical guarantees on the regret of our strategy, i.e., the extra time spent to learn the mean response times of the workers. Compared to adaptive k-sync, our scheme achieves significantly lower errors with the same computational efforts while being inferior in terms of speed.
2022
2022 IEEE International Symposium on Information Theory, ISIT 2022
fin
2022
2022-
1653
1658
Egger, M.; Bitar, R.; Wachter-Zeh, A.; Gunduz, D.
Efficient Distributed Machine Learning via Combinatorial Multi-Armed Bandits / Egger, M.; Bitar, R.; Wachter-Zeh, A.; Gunduz, D.. - 2022-:(2022), pp. 1653-1658. (Intervento presentato al convegno 2022 IEEE International Symposium on Information Theory, ISIT 2022 tenutosi a fin nel 2022) [10.1109/ISIT50566.2022.9834499].
File in questo prodotto:
File Dimensione Formato  
Efficient_Distributed_Machine_Learning_via_Combinatorial_Multi-Armed_Bandits.pdf

Accesso riservato

Tipologia: Versione pubblicata dall'editore
Dimensione 2.04 MB
Formato Adobe PDF
2.04 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

Licenza Creative Commons
I metadati presenti in IRIS UNIMORE sono rilasciati con licenza Creative Commons CC0 1.0 Universal, mentre i file delle pubblicazioni sono rilasciati con licenza Attribuzione 4.0 Internazionale (CC BY 4.0), salvo diversa indicazione.
In caso di violazione di copyright, contattare Supporto Iris

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11380/1286022
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
social impact