Optimizing machine learning models often requires careful tuning of parameters, especially the
learning rate. Traditional methods involve exhaustive searches or the adoption of pre-established rates,
both of which have drawbacks. The former is computationally intensive, a concern amplified by the trend toward larger models such as large language models (LLMs). The latter risks suboptimal model training.
Consequently, there is growing research on adaptive and parameter-free approaches to reduce reliance on manual step size tuning. While adaptive gradient methods like AdaGrad, RMSProp, and
Adam aim to adjust learning rates dynamically, they still rely on learning rate parameters dependent
on problem-specific characteristics. Our work explores the interplay between step size and gradient dissimilarity, introducing a “diversity-adjusted adaptive step size” that adapts to different levels
of dissimilarity in sampled gradients within the SGD algorithm. We also investigate approximate
algorithms to compute this step size efficiently while maintaining performance.
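The abstract does not spell out the step-size formula, so the sketch below is only a rough illustration: it scales a base SGD learning rate by a gradient-diversity ratio (sum of squared per-sample gradient norms over the squared norm of their sum), one common proxy for dissimilarity among sampled gradients. The toy least-squares problem, the normalization, and the scaling direction are assumptions made for illustration, not the method of the paper.

# Illustrative sketch only: the paper's exact "diversity-adjusted adaptive step size"
# is not reproduced here. A base SGD learning rate is rescaled by an assumed
# gradient-diversity ratio on a synthetic least-squares problem.
import numpy as np

def per_sample_grads(w, X, y):
    # Per-sample gradients of 0.5 * (x.w - y)^2 for a linear model.
    residuals = X @ w - y              # shape (batch,)
    return residuals[:, None] * X      # shape (batch, dim)

def diversity_adjusted_lr(grads, base_lr=0.5, eps=1e-12):
    # Gradient-diversity ratio: sum of squared per-sample norms over the squared
    # norm of the summed gradient, normalized so identical gradients give 1.
    g_sum = grads.sum(axis=0)
    ratio = len(grads) * np.sum(grads ** 2) / (np.sum(g_sum ** 2) + eps)
    # Assumed scaling direction: shrink the step when sampled gradients disagree.
    return base_lr / ratio

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.01 * rng.normal(size=256)

w = np.zeros(5)
for _ in range(300):
    idx = rng.choice(len(X), size=32, replace=False)         # sample a mini-batch
    grads = per_sample_grads(w, X[idx], y[idx])
    w -= diversity_adjusted_lr(grads) * grads.mean(axis=0)   # SGD step with adjusted rate

print("distance to w_true:", np.linalg.norm(w - w_true))

In this sketch the ratio grows once the iterates approach a solution and the sampled gradients begin to disagree, so the effective step size shrinks automatically; that qualitative behavior, rather than the specific formula, is what the example is meant to convey.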
Primary Research Area
Trustworthy Information Processing
BibTeX
@misc{Yazdkhasti:Jiang:Stich:2023,
title = "Diversity-adjusted adaptive step size",
author = "Yazdkhasti, Parham" AND "Jiang, Xiaowen" AND "Stich, sebastian",
year = 2023,
month = 12
}