Seminar room 05.002, Spiegelgasse 5, 4051 Basel
Organizer:
Lovelace-Turing Club
Enhancing Optimizer Stability: Momentum Adaptation of NGN Step-size
TL;DR: NGN-M combines momentum with the NGN adaptive step-size to deliver state-of-the-art deep-learning performance that is far less sensitive to learning-rate tuning.
Abstract: Modern optimization algorithms that incorporate momentum and adaptive step-sizes offer improved performance in various challenging Deep Learning tasks. However, their effectiveness is often highly sensitive to the choice of hyper-parameters, especially the learning rate, and tuning these parameters is often difficult, resource-intensive, and time-consuming. Therefore, recent efforts have been directed toward enhancing the stability of optimizers across a wide range of hyper-parameter choices (Schaipp et al., 2024). In this paper, we introduce an algorithm that matches the performance of state-of-the-art optimizers while improving stability through a novel adaptation of the NGN step-size method (Orvieto & Xiao, 2024). Specifically, we propose a momentum-based version (NGN-M) that attains the standard convergence rate of $\mathcal{O}(1/\sqrt{K})$ under common assumptions, without requiring the interpolation condition or assumptions of bounded stochastic gradients or iterates, in contrast to previous approaches. Additionally, we empirically demonstrate that combining the NGN step-size with momentum yields high robustness to the learning-rate choice while delivering performance that is comparable to or surpasses other state-of-the-art optimizers.
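As a rough illustration only (not the authors' implementation), the sketch below assumes the NGN step-size form from Orvieto & Xiao (2024), $\gamma = c / \bigl(1 + \tfrac{c}{2 f(x)} \lVert \nabla f(x) \rVert^2\bigr)$, combined with plain heavy-ball momentum on a toy non-negative loss; the exact NGN-M update and hyper-parameter names (here `c` and `beta`) are assumptions and may differ from the paper.

```python
# Illustrative sketch: NGN step-size (Orvieto & Xiao, 2024) combined with
# heavy-ball momentum. Not the paper's NGN-M algorithm; details may differ.
import numpy as np

rng = np.random.default_rng(0)

# Toy non-negative loss f(x) = 0.5 * ||A x - b||^2
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)

def loss_and_grad(x):
    r = A @ x - b
    return 0.5 * r @ r, A.T @ r

c = 1.0      # NGN step-size scale (assumed hyper-parameter name)
beta = 0.9   # heavy-ball momentum coefficient
x = np.zeros(10)
x_prev = x.copy()

for k in range(200):
    f, g = loss_and_grad(x)
    # NGN step-size: gamma = c / (1 + c * ||g||^2 / (2 f))
    gamma = c / (1.0 + c * (g @ g) / (2.0 * f + 1e-12))
    # Heavy-ball momentum applied on top of the NGN step (illustrative combination)
    x_next = x - gamma * g + beta * (x - x_prev)
    x_prev, x = x, x_next

print("final loss:", loss_and_grad(x)[0])
```

Note the step-size shrinks automatically when the gradient is large relative to the loss and approaches `c` near interpolation, which is what makes the method less sensitive to the choice of `c` than a fixed learning rate.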