
Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence and Variance Reduction

Conference contribution
Posted on 2024-03-05, 12:18, authored by Xiaowen Jiang and Sebastian U. Stich
The recently proposed stochastic Polyak stepsize (SPS) and stochastic line-search (SLS) for SGD have shown remarkable effectiveness when training overparameterized models. However, two issues remain unsolved in this line of work. First, in non-interpolation settings, both algorithms only guarantee convergence to a neighborhood of a solution, which may yield an output worse than the initial guess. While artificially decreasing the adaptive stepsize has been proposed to address this issue (Orvieto et al., 2022), this approach results in slower convergence rates under interpolation. Second, intuitive line-search methods equipped with variance reduction (VR) fail to converge (Dubois-Taine et al., 2022). So far, no VR method has successfully accelerated these two stepsizes with a convergence guarantee. In this work, we make two contributions. First, we propose two new robust variants of SPS and SLS, called AdaSPS and AdaSLS, which achieve optimal asymptotic rates in strongly convex and convex settings, with and without interpolation, except in the case of both strong convexity and non-interpolation. AdaSLS requires no knowledge of problem-dependent parameters, and AdaSPS requires only a lower bound on the optimal function value as input. Second, we propose a novel VR method that can use Polyak stepsizes or line-search to achieve acceleration. When equipped with AdaSPS or AdaSLS, the resulting algorithms obtain the optimal rate for optimizing convex smooth functions. Finally, numerical experiments on synthetic and real datasets validate our theory and demonstrate the effectiveness and robustness of our algorithms.
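The abstract's central object, the classical stochastic Polyak stepsize (Loizou et al., 2021), sets gamma_t = (f_i(x_t) - f_i^*) / (c * ||grad f_i(x_t)||^2) for the sampled loss f_i. As a rough illustration only, and not the AdaSPS/AdaSLS methods of this paper, the following minimal Python sketch runs SGD with this stepsize on an interpolating least-squares problem, where every per-sample optimum f_i^* is 0; all variable names and the constant c = 0.5 are choices made here for the example.

import numpy as np

# SGD with the classical stochastic Polyak stepsize (SPS):
#   gamma_t = (f_i(x_t) - f_i^*) / (c * ||grad f_i(x_t)||^2).
# Least-squares losses f_i(x) = 0.5 * (a_i @ x - b_i)^2; since the linear
# system below is consistent (interpolation holds), each f_i^* equals 0.

rng = np.random.default_rng(0)
n, d = 200, 20
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true                       # consistent system: interpolation holds

x = np.zeros(d)
c = 0.5                              # SPS damping constant
for t in range(2000):
    i = rng.integers(n)              # sample one data point
    residual = A[i] @ x - b[i]
    f_i = 0.5 * residual ** 2        # sampled loss value, with f_i^* = 0 here
    grad = residual * A[i]           # gradient of the sampled loss
    gnorm2 = grad @ grad
    if gnorm2 > 0:                   # skip samples that are already fit exactly
        x -= (f_i / (c * gnorm2)) * grad

print("distance to solution:", np.linalg.norm(x - x_true))

In a non-interpolation setting (b not in the range of A), the per-sample optima are no longer 0 and this plain SPS only reaches a neighborhood of the solution, which is the first issue the abstract describes and that AdaSPS and AdaSLS are designed to address.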


Preferred Citation

Xiaowen Jiang, Sebastian U. Stich. Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence and Variance Reduction. In: Conference on Neural Information Processing Systems. 2024.

Primary Research Area

  • Trustworthy Information Processing

Name of Conference

Conference on Neural Information Processing Systems (NeurIPS)

Legacy Posted Date

2024-01-09

Open Access Type

  • Gold

BibTeX

@inproceedings{cispa_all_4060,
  author    = {Xiaowen Jiang and Sebastian U. Stich},
  title     = {Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence and Variance Reduction},
  booktitle = {Conference on Neural Information Processing Systems},
  year      = {2024}
}
