CISPA
Browse
cispa_all_4060.pdf (1.39 MB)

Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence and Variance Reduction

Download (1.39 MB)
conference contribution
posted on 2024-03-05, 12:18 authored by Xiaowen JiangXiaowen Jiang, Stich, Sebastian U.
The recently proposed stochastic Polyak stepsize (SPS) and stochastic linesearch (SLS) for SGD have shown remarkable effectiveness when training overparameterized models. However, two issues remain unsolved in this line of work. First, in non-interpolation settings, both algorithms only guarantee convergence to a neighborhood of a solution which may result in a worse output than the initial guess. While artificially decreasing the adaptive stepsize has been proposed to address this issue (Orvieto et al. 2022), this approach results in slower convergence rates under interpolation. Second, intuitive line-search methods equipped with variance-reduction (VR) fail to converge (Dubois-Taine et al. 2022). So far, no VR methods successfully accelerate these two stepsizes with a convergence guarantee. In this work, we make two contributions: Firstly, we propose two new robust variants of SPS and SLS, called AdaSPS and AdaSLS, which achieve optimal asymptotic rates in both strongly-convex or convex and interpolation or noninterpolation settings, except for the case when we have both strong convexity and non-interpolation. AdaSLS requires no knowledge of problem-dependent parameters, and AdaSPS requires only a lower bound of the optimal function value as input. Secondly, we propose a novel VR method that can use Polyak stepsizes or line-search to achieve acceleration. When it is equipped with AdaSPS or AdaSLS, the resulting algorithms obtain the optimal rate for optimizing convex smooth functions. Finally, numerical experiments on synthetic and real datasets validate our theory and demonstrate the effectiveness and robustness of our algorithms.

History

Preferred Citation

Xiaowen Jiang, Sebastian U. Stich. Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence and Variance Reduction. In: Conference on Neural Information Processing Systems. 2024.

Primary Research Area

  • Trustworthy Information Processing

Name of Conference

Conference on Neural Information Processing Systems

Legacy Posted Date

2024-01-09

Open Access Type

  • Gold

BibTeX

@inproceedings{cispa_all_4060, author = {Xiaowen Jiang AND Sebastian U. Stich}, title = {Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence and Variance Reduction}, booktitle = {Conference on Neural Information Processing Systems}, year = {2024} }

Usage metrics

    Categories

    No categories selected

    Licence

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC