posted on 2024-10-10, 13:00, authored by Kumar Kshitij Patel, Margalit Glasgow, Ali Zindari, Lingxiao Wang, Ziheng Cheng, Sebastian Stich, Nirmit Joshi, Nathan Srebro
Local SGD is a popular optimization method in distributed learning, often outperforming mini-batch SGD. Despite this practical success, proving the efficiency of local SGD has been difficult, creating a significant gap between theory and practice. We provide new lower bounds for local SGD under existing first-order data heterogeneity assumptions, showing that these assumptions cannot capture local SGD's effectiveness. We also demonstrate the min-max optimality of accelerated mini-batch SGD under these assumptions. Our findings emphasize the need for improved modeling of data heterogeneity. Under higher-order assumptions, we provide new upper bounds that verify the dominance of local SGD over mini-batch SGD when data heterogeneity is low.
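To make the two algorithms compared in the abstract concrete, here is a toy sketch (not the paper's construction or analysis) of local SGD and mini-batch SGD under intermittent communication: M machines, R communication rounds, and K stochastic-gradient computations per machine per round. It assumes one-dimensional quadratic objectives f_m(x) = a_m/2 (x - b_m)^2 with additive Gaussian gradient noise; all function and parameter names are hypothetical. Both methods use the same gradient and communication budget and differ only in where the gradients are evaluated.

```python
import random
from typing import Sequence


def stoch_grad(a: float, b: float, x: float, noise_std: float,
               rng: random.Random) -> float:
    """Gradient of the toy objective f(x) = a/2 * (x - b)^2 plus zero-mean Gaussian noise."""
    return a * (x - b) + rng.gauss(0.0, noise_std)


def local_sgd(a: Sequence[float], b: Sequence[float], x0: float, lr: float,
              rounds: int, local_steps: int, noise_std: float = 0.0,
              seed: int = 0) -> float:
    """Local SGD: every machine runs `local_steps` SGD steps from the shared
    iterate, then all local iterates are averaged (one communication per round)."""
    rng = random.Random(seed)
    x = x0
    for _ in range(rounds):
        local_iterates = []
        for a_m, b_m in zip(a, b):
            x_m = x  # start from the last synchronized iterate
            for _ in range(local_steps):
                x_m -= lr * stoch_grad(a_m, b_m, x_m, noise_std, rng)
            local_iterates.append(x_m)
        x = sum(local_iterates) / len(local_iterates)  # communicate and average
    return x


def minibatch_sgd(a: Sequence[float], b: Sequence[float], x0: float, lr: float,
                  rounds: int, local_steps: int, noise_std: float = 0.0,
                  seed: int = 0) -> float:
    """Mini-batch SGD: every machine evaluates `local_steps` stochastic gradients
    at the shared iterate; their average drives a single step per round."""
    rng = random.Random(seed)
    x = x0
    for _ in range(rounds):
        grads = [stoch_grad(a_m, b_m, x, noise_std, rng)
                 for a_m, b_m in zip(a, b)
                 for _ in range(local_steps)]
        x -= lr * sum(grads) / len(grads)
    return x


def global_minimizer(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact minimizer of F(x) = (1/M) * sum_m a_m/2 * (x - b_m)^2."""
    return sum(a_m * b_m for a_m, b_m in zip(a, b)) / sum(a)


# Same curvature on every machine (only the optima b_m differ), no noise.
a_same, b3 = [1.0, 1.0, 1.0], [-2.0, 0.0, 5.0]
print(global_minimizer(a_same, b3))  # 1.0
print(local_sgd(a_same, b3, x0=0.0, lr=0.1, rounds=50, local_steps=10))      # essentially 1.0
print(minibatch_sgd(a_same, b3, x0=0.0, lr=0.1, rounds=50, local_steps=10))  # about 0.9948

# Different curvatures across machines, no noise.
a_het, b2 = [1.0, 3.0], [0.0, 1.0]
print(global_minimizer(a_het, b2))  # 0.75
print(local_sgd(a_het, b2, x0=0.0, lr=0.1, rounds=50, local_steps=10))      # about 0.5987
print(minibatch_sgd(a_het, b2, x0=0.0, lr=0.1, rounds=50, local_steps=10))  # about 0.75
```

In this noiseless toy with the same step size, local SGD reaches the global minimizer much faster than mini-batch SGD when all machines share the same curvature, but settles at a biased point when the curvatures differ, while mini-batch SGD does not. It only hints at why the form of heterogeneity matters; it is no substitute for the bounds in the paper.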
Primary Research Area
Trustworthy Information Processing
Name of Conference
Conference on Learning Theory (COLT)
BibTeX
@conference{Patel:Glasgow:Zindari:Wang:Cheng:Stich:Joshi:Srebro:2024,
title = "The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication",
author = "Patel, Kumar Kshitij and Glasgow, Margalit and Zindari, Ali and Wang, Lingxiao and Cheng, Ziheng and Stich, Sebastian and Joshi, Nirmit and Srebro, Nathan",
booktitle = "Conference on Learning Theory (COLT)",
year = 2024,
month = 6
}