Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning

Version 2 2023-12-14, 12:33
Version 1 2023-11-29, 18:22
conference contribution
posted on 2023-12-14, 12:33, authored by Anastasia Koloskova, Martin Jaggi, Sebastian Stich
We study the asynchronous stochastic gradient descent algorithm for distributed training over n workers whose computation and communication frequencies vary over time. In this algorithm, workers compute stochastic gradients in parallel at their own pace and return them to the server without any synchronization. Existing convergence rates of this algorithm for non-convex smooth objectives depend on the maximum gradient delay τ_{max} and show that an ϵ-stationary point is reached after O(σ^2ϵ^{−2}+τ_{max}ϵ^{−1}) iterations, where σ^2 denotes the variance of stochastic gradients. In this work (i) we obtain a tighter convergence rate of O(σ^2ϵ^{−2}+√(τ_{max}τ_{avg})ϵ^{−1}) without any change to the algorithm, where τ_{avg} is the average delay, which can be significantly smaller than τ_{max}. We also provide (ii) a simple delay-adaptive learning rate scheme, under which asynchronous SGD achieves a convergence rate of O(σ^2ϵ^{−2}+τ_{avg}ϵ^{−1}) and requires neither extra hyperparameter tuning nor extra communication. Our result allows us to show for the first time that asynchronous SGD is always faster than mini-batch SGD. In addition, (iii) we consider the case of heterogeneous functions, motivated by federated learning applications, and improve the convergence rate by proving a weaker dependence on the maximum delay than in prior work. In particular, we show that the heterogeneity term in the convergence rate is affected only by the average delay within each worker.
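
The setting described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes a one-dimensional toy objective f(x) = 0.5·x², hypothetical fixed worker speeds, and a simple illustrative delay-adaptive rule (shrink the step in proportion to the delay once the delay exceeds a threshold set to the number of workers). The exact rule and constants used in the paper may differ; the names `async_sgd`, `threshold`, and `compute_time` are made up for this example.

```python
import heapq
import random


def async_sgd(n_workers=4, steps=2000, base_lr=0.1, noise=0.1,
              adaptive=True, seed=0):
    """Simulate asynchronous SGD on the toy objective f(x) = 0.5 * x**2.

    Returns (final_x, max_delay, average_delay).
    """
    rng = random.Random(seed)
    x = 5.0
    # Assumption: worker w needs (w + 1) time units per gradient.
    compute_time = [float(w + 1) for w in range(n_workers)]
    # Assumption: delays above this threshold trigger a smaller step.
    threshold = n_workers

    # Each event: (finish_time, worker_id, model_copy, updates_seen_at_read).
    events = [(compute_time[w], w, x, 0) for w in range(n_workers)]
    heapq.heapify(events)

    max_delay, delay_sum = 0, 0
    for t in range(steps):
        # The server applies whichever gradient arrives next, without waiting.
        finish, w, x_stale, t_read = heapq.heappop(events)
        # Delay = number of server updates applied since this worker read x.
        delay = t - t_read
        max_delay = max(max_delay, delay)
        delay_sum += delay

        # Stochastic gradient of 0.5 * x**2, evaluated at the stale model.
        grad = x_stale + rng.gauss(0.0, noise)
        lr = base_lr
        if adaptive and delay > threshold:
            # Illustrative delay-adaptive rule (an assumption, see lead-in).
            lr = base_lr * threshold / delay
        x -= lr * grad

        # The worker immediately reads the new model and starts again.
        heapq.heappush(events, (finish + compute_time[w], w, x, t + 1))

    return x, max_delay, delay_sum / steps


if __name__ == "__main__":
    for adaptive in (False, True):
        x_final, tau_max, tau_avg = async_sgd(adaptive=adaptive)
        print(f"adaptive={adaptive}: x={x_final:.4f}, "
              f"tau_max={tau_max}, tau_avg={tau_avg:.2f}")
```

In this toy run the slowest worker produces the largest delays while the average delay stays noticeably smaller, which is the gap between τ_{max} and τ_{avg} that the stated rates exploit.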

History

Preferred Citation

Anastasia Koloskova, Sebastian Stich and Martin Jaggi. Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning. In: Conference on Neural Information Processing Systems (NeurIPS). 2022.

Primary Research Area

  • Trustworthy Information Processing

Name of Conference

Conference on Neural Information Processing Systems (NeurIPS)

Legacy Posted Date

2022-10-12

Open Access Type

  • Green

BibTeX

@inproceedings{cispa_all_3800,
  title     = "Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning",
  author    = "Koloskova, Anastasia and Stich, Sebastian U. and Jaggi, Martin",
  booktitle = "{Conference on Neural Information Processing Systems (NeurIPS)}",
  year      = "2022",
}
