Machine Learning at the Wireless Edge: Distributed Stochastic Gradient Descent Over-the-Air / Mohammadi Amiri, M.; Gunduz, D. - (2019), pp. 1432-1436. (Paper presented at the 2019 IEEE International Symposium on Information Theory, ISIT 2019, held at La Maison de La Mutualite, Paris, France, in 2019) [10.1109/ISIT.2019.8849334].
Machine Learning at the Wireless Edge: Distributed Stochastic Gradient Descent Over-the-Air
M. Mohammadi Amiri; D. Gunduz
2019
Abstract
We study collaborative machine learning at the wireless edge, where power- and bandwidth-limited devices (workers), each with a limited local dataset, implement distributed stochastic gradient descent (DSGD) over-the-air with the help of a remote parameter server (PS). We consider a wireless multiple access channel (MAC) from the workers to the PS for communicating the local gradient estimates. We first introduce a digital DSGD (D-DSGD) scheme, in which the workers operate on the boundary of the MAC capacity region at each iteration of the DSGD algorithm and digitize their estimates within the bit budget allowed by the employed power allocation. We then introduce an analog scheme, called A-DSGD, motivated by the additive nature of the wireless MAC, in which the workers send their gradient estimates over the MAC through the available channel bandwidth without employing any digital code. Numerical results show that A-DSGD converges much faster than D-DSGD. The improvement is particularly compelling in low-power and low-bandwidth regimes. We also observe that the performance of A-DSGD improves with the number of workers, while that of D-DSGD deteriorates, limiting the ability of the latter to harness the computation power of many edge devices.
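The analog aggregation idea behind A-DSGD — letting the MAC itself sum the workers' transmitted signals — can be illustrated with a toy simulation. This is a minimal sketch, not the paper's algorithm: the unit channel gains, the common power-scaling factor, and all parameter values below are illustrative assumptions.

```python
import numpy as np

# Toy sketch of over-the-air (analog) gradient aggregation on a Gaussian MAC.
# Assumptions (illustrative, not from the paper): unit channel gains, a common
# scaling factor alpha chosen to meet a per-worker power budget P, and additive
# white Gaussian noise at the parameter server (PS).

rng = np.random.default_rng(0)

num_workers = 20
dim = 100          # gradient dimension (taken equal to the channel bandwidth here)
P = 1.0            # per-worker average transmit power budget
noise_std = 0.1    # receiver noise standard deviation

# Each worker holds a local gradient estimate (random placeholders here).
grads = rng.normal(size=(num_workers, dim))

# Workers scale their gradients so every transmitted signal satisfies the
# power budget: the largest-norm worker uses exactly power P per channel use.
alpha = np.sqrt(P * dim) / np.max(np.linalg.norm(grads, axis=1))
tx = alpha * grads

# The wireless MAC superimposes all transmitted signals; noise is added.
rx = tx.sum(axis=0) + rng.normal(scale=noise_std, size=dim)

# The PS rescales the received signal to recover a noisy estimate of the
# average gradient, which it would then use for the model update.
avg_estimate = rx / (alpha * num_workers)
ideal_avg = grads.mean(axis=0)

err = np.linalg.norm(avg_estimate - ideal_avg) / np.linalg.norm(ideal_avg)
print(f"relative aggregation error: {err:.4f}")
```

Note how the effective noise at the PS is divided by `alpha * num_workers`: with more workers the superimposed signal grows while the receiver noise does not, which is consistent with the abstract's observation that A-DSGD improves as the number of workers increases.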
Metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International license (CC BY 4.0), unless otherwise indicated.