
Noise Injection Irons Out Local Minima and Saddle Points

online resource
posted on 2024-10-10, 12:48, authored by Konstantin Mishchenko, Sebastian Stich
Non-convex optimization problems are ubiquitous in machine learning, especially in deep learning. It has been observed in practice that injecting artificial noise into stochastic gradient descent (SGD) can sometimes improve training and generalization performance. In this work, we formalize noise injection as a smoothing operator and review and derive convergence guarantees for SGD under smoothing. Empirically, we find that Gaussian smoothing works well for training two-layer neural networks, but these findings do not carry over to deeper networks. We hope this contribution stimulates a discussion in the community on the impact of noise in training machine learning models.
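As a rough illustration of the smoothing viewpoint described in the abstract (our own sketch, not the authors' code; function names and hyperparameters are illustrative), Gaussian smoothing replaces the objective f(x) with f_sigma(x) = E_u[f(x + sigma*u)] for u ~ N(0, I), and SGD on the smoothed objective can be run by evaluating the (stochastic) gradient at a freshly perturbed point each step:

```python
import numpy as np

def sgd_with_gaussian_smoothing(grad_fn, x0, lr=0.1, sigma=0.1, steps=1000, seed=0):
    """SGD on the Gaussian-smoothed objective f_sigma(x) = E_u[f(x + sigma*u)].

    Evaluating grad_fn at the perturbed point x + sigma*u gives a one-sample
    unbiased estimator of the gradient of the smoothed objective.
    (Illustrative sketch; not the algorithm as stated in the paper.)
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        u = rng.standard_normal(x.shape)   # fresh Gaussian perturbation per step
        g = grad_fn(x + sigma * u)         # gradient at the perturbed point
        x -= lr * g                        # standard SGD update
    return x

# Toy example: f(x) = x^4 - x^2 has a saddle-like flat spot at x = 0;
# the injected noise kicks the iterate off it toward a minimizer near ±1/sqrt(2).
grad = lambda x: 4 * x**3 - 2 * x
print(sgd_with_gaussian_smoothing(grad, x0=[1e-8]))
```

In practice, sigma is often decayed over training so that the smoothed objective approaches the original one as the iterates converge.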


Primary Research Area

  • Algorithmic Foundations and Cryptography

BibTeX

@misc{Mishchenko:Stich:2023,
  title  = "Noise Injection Irons Out Local Minima and Saddle Points",
  author = "Mishchenko, Konstantin and Stich, Sebastian",
  year   = 2023,
  month  = 12
}
