
Noise Injection Irons Out Local Minima and Saddle Points

online resource
posted on 2024-10-10, 12:48, authored by Konstantin Mishchenko, Sebastian Stich
Non-convex optimization problems are ubiquitous in machine learning, especially in deep learning. It has been observed in practice that injecting artificial noise into stochastic gradient descent (SGD) can sometimes improve training and generalization performance. In this work, we formalize noise injection as a smoothing operator and review and derive convergence guarantees for SGD under smoothing. Empirically, we find that Gaussian smoothing works well for training two-layer neural networks, but these findings do not carry over to deeper networks. We hope this contribution stimulates a discussion in the community on the impact of noise in training machine learning models.
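As a rough illustration of the smoothing viewpoint described in the abstract (our own sketch, not the authors' code; function names and hyperparameters are illustrative), Gaussian smoothing replaces the objective f(x) with f_sigma(x) = E_u[f(x + sigma*u)] for u ~ N(0, I), and SGD on the smoothed objective can be run by evaluating the (stochastic) gradient at a freshly perturbed point each step:

```python
import numpy as np

def sgd_with_gaussian_smoothing(grad_fn, x0, lr=0.1, sigma=0.1, steps=1000, seed=0):
    """SGD on the Gaussian-smoothed objective f_sigma(x) = E_u[f(x + sigma*u)].

    Evaluating grad_fn at the perturbed point x + sigma*u gives a one-sample
    unbiased estimator of the gradient of the smoothed objective.
    (Illustrative sketch; not the algorithm as stated in the paper.)
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        u = rng.standard_normal(x.shape)   # fresh Gaussian perturbation per step
        g = grad_fn(x + sigma * u)         # gradient at the perturbed point
        x -= lr * g                        # standard SGD update
    return x

# Toy example: f(x) = x^4 - x^2 has a saddle-like flat spot at x = 0;
# the injected noise kicks the iterate off it toward a minimizer near ±1/sqrt(2).
grad = lambda x: 4 * x**3 - 2 * x
print(sgd_with_gaussian_smoothing(grad, x0=[1e-8]))
```

In practice, sigma is often decayed over training so that the smoothed objective approaches the original one as the iterates converge.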


Primary Research Area

  • Algorithmic Foundations and Cryptography

BibTeX

@misc{Mishchenko:Stich:2023,
  title  = "Noise Injection Irons Out Local Minima and Saddle Points",
  author = "Mishchenko, Konstantin and Stich, Sebastian",
  year   = 2023,
  month  = 12
}
