CISPA

Diversity-adjusted adaptive step size

online resource
posted on 2024-10-10, 12:47 authored by Parham Yazdkhasti, Xiaowen Jiang, Sebastian Stich
Optimizing machine learning models often requires careful tuning of hyperparameters, especially the learning rate. Traditional approaches either search exhaustively over candidate rates or adopt a pre-established rate, and both have drawbacks: the former is computationally intensive, a concern amplified by the trend toward larger models such as large language models (LLMs), while the latter risks suboptimal training. Consequently, there is growing research on adaptive and parameter-free approaches that reduce reliance on manual step size tuning. While adaptive gradient methods such as AdaGrad, RMSProp, and Adam adjust learning rates dynamically, they still depend on learning rate parameters tied to problem-specific characteristics. Our work explores the interplay between step size and gradient dissimilarity, introducing a “diversity-adjusted adaptive step size” that adapts to different levels of dissimilarity in the sampled gradients within the SGD algorithm. We also investigate approximate algorithms that compute this step size efficiently while maintaining performance.
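The record does not spell out the update rule. As a rough, hypothetical illustration only, the Python sketch below scales a base SGD learning rate by a standard gradient-diversity ratio (sum of squared per-example gradient norms divided by the squared norm of their sum) computed on the sampled mini-batch. The names gradient_diversity and diversity_adjusted_sgd_step and the exact scaling are assumptions made for illustration, not the method described in this resource.

# Minimal illustrative sketch (not the authors' exact rule): SGD whose step
# size is rescaled each iteration by a gradient-diversity ratio of the
# sampled per-example gradients.
import numpy as np

def gradient_diversity(per_sample_grads, eps=1e-12):
    """sum_i ||g_i||^2 / ||sum_i g_i||^2 over a batch of per-example gradients.
    Equals 1/B for B identical gradients and 1 for mutually orthogonal ones."""
    g_sum = per_sample_grads.sum(axis=0)
    num = np.sum(np.linalg.norm(per_sample_grads, axis=1) ** 2)
    return num / (np.linalg.norm(g_sum) ** 2 + eps)

def diversity_adjusted_sgd_step(w, per_sample_grads, base_lr):
    """One SGD step with the base learning rate scaled by gradient diversity
    (an illustrative scaling; the paper's precise rule may differ)."""
    mean_grad = per_sample_grads.mean(axis=0)
    lr = base_lr * gradient_diversity(per_sample_grads)
    return w - lr * mean_grad

# Toy usage on least squares: f(w) = 0.5/n * ||X w - y||^2.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 5)), rng.normal(size=32)
w = np.zeros(5)
for _ in range(200):
    per_sample_grads = (X @ w - y)[:, None] * X   # gradient of each example
    w = diversity_adjusted_sgd_step(w, per_sample_grads, base_lr=0.1)
print("final loss:", 0.5 * np.mean((X @ w - y) ** 2))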


Primary Research Area

  • Trustworthy Information Processing

BibTeX

@misc{Yazdkhasti:Jiang:Stich:2023,
  title  = "Diversity-adjusted adaptive step size",
  author = "Yazdkhasti, Parham and Jiang, Xiaowen and Stich, Sebastian",
  year   = 2023,
  month  = 12
}
