In machine learning and neural network optimization, algorithms such as incremental gradient, single shuffle SGD, and random reshuffle SGD are popular because they minimize cache misses and exhibit good practical convergence behavior. However, their theoretical optimization properties, especially for non-convex smooth functions, remain incompletely explored. This paper studies the convergence properties of SGD algorithms with arbitrary data ordering, within a broad framework for non-convex smooth functions. Our findings show enhanced convergence guarantees for incremental gradient and single shuffle SGD. In particular, if n is the training set size, we improve the optimization term of the convergence guarantee to reach accuracy ϵ by a factor of n, from O(n/ϵ) to O(1/ϵ).
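To make the distinction between the three data-ordering schemes concrete, here is a minimal sketch in Python. The function and parameter names (run, sgd_epoch, grads) are illustrative assumptions, not the paper's notation or implementation; the only point is how each scheme chooses the per-epoch ordering of the n samples.

```python
import numpy as np

def sgd_epoch(params, grads, data, lr, ordering):
    """One pass over the data, taking a gradient step per sample in `ordering`."""
    for i in ordering:
        params = params - lr * grads(params, data[i])
    return params

def run(params, grads, data, lr, epochs, scheme="random_reshuffle", seed=0):
    """Sketch of the three data-ordering schemes (hypothetical helper, for illustration).

    scheme:
      - "incremental":      fixed order 0, 1, ..., n-1 in every epoch
      - "single_shuffle":    one random permutation, reused in every epoch
      - "random_reshuffle":  a fresh random permutation in each epoch
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    fixed_perm = rng.permutation(n)  # drawn once; used only by single shuffle
    for _ in range(epochs):
        if scheme == "incremental":
            order = np.arange(n)
        elif scheme == "single_shuffle":
            order = fixed_perm
        else:  # random reshuffle
            order = rng.permutation(n)
        params = sgd_epoch(params, grads, data, lr, order)
    return params
```

All three variants access the data sequentially within an epoch, which is what makes them cache-friendly compared to sampling with replacement.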
Tertiary Research Area
Trustworthy Information Processing
Name of Conference
International Conference on Machine Learning (ICML)
BibTeX
@conference{Koloskova:Doikov:Stich:Jaggi:2024,
title = "On Convergence of Incremental Gradient for Non-convex Smooth Functions",
author = "Koloskova, Anastasia" AND "Doikov, Nikita" AND "Stich, Sebastian" AND "Jaggi, Martin",
year = 2024,
month = 7
}