The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication

conference contribution
posted on 2024-10-10, 13:00, authored by Kumar Kshitij Patel, Margalit Glasgow, Ali Zindari, Lingxiao Wang, Ziheng Cheng, Sebastian Stich, Nirmit Joshi, Nathan Srebro
Local SGD is a popular optimization method in distributed learning, often outperforming mini-batch SGD. Despite this practical success, proving the efficiency of local SGD has been difficult, creating a significant gap between theory and practice. We provide new lower bounds for local SGD under existing first-order data heterogeneity assumptions, showing that these assumptions cannot capture local SGD's effectiveness. We also demonstrate the min-max optimality of accelerated mini-batch SGD under these assumptions. Our findings emphasize the need for improved modeling of data heterogeneity. Under higher-order assumptions, we provide new upper bounds that verify the dominance of local SGD over mini-batch SGD when data heterogeneity is low.
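
For context, the following is a minimal sketch of local SGD with intermittent communication: M machines each take K local SGD steps between communication rounds and then average their iterates. The quadratic per-machine objectives, step size, and noise model are illustrative assumptions chosen to simulate data heterogeneity, not the setting analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M, R, K, d = 8, 50, 10, 5       # machines, communication rounds, local steps, dimension
lr = 0.1                        # step size (assumed for illustration)
b = rng.normal(size=(M, d))     # heterogeneous per-machine optima; their mean is the global optimum

def local_grad(x, m):
    """Stochastic gradient of the hypothetical objective f_m(x) = 0.5 * ||x - b_m||^2,
    with Gaussian noise standing in for data sampling."""
    return (x - b[m]) + 0.1 * rng.normal(size=d)

x = np.zeros(d)                 # shared iterate, synchronized once per round
for _ in range(R):
    locals_ = np.tile(x, (M, 1))        # every machine starts the round from the shared point
    for _ in range(K):                  # K local SGD steps with no communication
        for m in range(M):
            locals_[m] -= lr * local_grad(locals_[m], m)
    x = locals_.mean(axis=0)            # communicate: average the local iterates

print("distance to global optimum:", np.linalg.norm(x - b.mean(axis=0)))
```

In the same intermittent-communication model, the mini-batch SGD baseline instead spends each round's K stochastic gradients per machine on a single averaged step at the shared point, which is the comparison the abstract refers to.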

Primary Research Area

  • Trustworthy Information Processing

Name of Conference

Conference on Learning Theory (COLT)

BibTeX

@conference{Patel:Glasgow:Zindari:Wang:Cheng:Stich:Joshi:Srebro:2024,
  title     = "The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication",
  author    = "Patel, Kumar Kshitij and Glasgow, Margalit and Zindari, Ali and Wang, Lingxiao and Cheng, Ziheng and Stich, Sebastian and Joshi, Nirmit and Srebro, Nathan",
  booktitle = "Conference on Learning Theory (COLT)",
  year      = 2024,
  month     = 6
}
