Activities of the week

Convergence rates for the Adam optimizer
Speaker: Prof. Arnulf Jentzen, The Chinese University of Hong Kong

Time: 10:30-11:30, Friday, March 28, 2025

Venue: Hoàng Tụy Hall, Building A6, Institute of Mathematics

Abstract: Stochastic gradient descent (SGD) optimization methods are nowadays the method of choice for the training of deep neural networks (DNNs) in artificial intelligence systems.

In practically relevant training problems, the plain vanilla SGD method is usually not the employed optimization scheme; instead, suitably accelerated and adaptive SGD optimization methods such as the famous Adam optimizer are applied. In this work we establish optimal convergence rates for the Adam optimizer for a large class of stochastic optimization problems, covering strongly convex stochastic optimization problems. The key ingredient of our convergence analysis is a new vector field function which we propose to refer to as the Adam vector field. This Adam vector field accurately describes the macroscopic behaviour of the Adam optimization process but differs from the negative gradient of the objective function (the function we intend to minimize) of the considered stochastic optimization problem. In particular, our convergence analysis suggests that Adam typically does not converge to critical points of the objective function (zeros of the gradient of the objective function) of the considered optimization problem but instead converges with rates to zeros of this Adam vector field. Finally, we present acceleration techniques for Adam in the context of deep learning approximations for partial differential equation and optimal control problems. The talk is based on joint works with Steffen Dereich, Thang Do, Robin Graeber, and Adrian Riekert.
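For readers less familiar with the algorithm discussed in the abstract, the following is a minimal Python sketch of the standard Adam update rule (Kingma & Ba, 2014), applied to a toy strongly convex quadratic with noisy gradient estimates. The step size, noise level, and objective are illustrative choices only; they are not taken from the works cited below, which concern the convergence analysis of Adam rather than its implementation.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One step of the standard Adam update (Kingma & Ba, 2014)."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad**2     # second-moment estimate
    m_hat = m / (1 - beta1**t)                # bias corrections
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy strongly convex problem: minimize f(theta) = 0.5 * ||theta||^2,
# using noisy gradient estimates g = theta + noise (illustrative setup).
rng = np.random.default_rng(0)
theta = np.array([5.0, -3.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 20001):
    grad = theta + 0.1 * rng.standard_normal(theta.shape)  # stochastic gradient
    theta, m, v = adam_step(theta, grad, m, v, t)
print(theta)  # should be close to the minimizer at the origin
```

Note that the effective step direction here is the bias-corrected first moment rescaled componentwise by the square root of the second moment, which is why the macroscopic dynamics of Adam are not governed by the negative gradient itself; this is the phenomenon the Adam vector field in the abstract is introduced to capture.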

References:

  1. S. Dereich & A. Jentzen, Convergence rates for the Adam optimizer, arXiv:2407.21078 (2024), 43 pages.
  2. S. Dereich, R. Graeber, & A. Jentzen, Non-convergence of Adam and other adaptive stochastic gradient descent optimization methods for non-vanishing learning rates, arXiv:2407.08100 (2024), 54 pages.
  3. T. Do, A. Jentzen, & A. Riekert, Non-convergence to the optimal risk for Adam and stochastic gradient descent optimization in the training of deep neural networks, arXiv:2503.01660 (2025), 42 pages.
  4. A. Jentzen & A. Riekert, Non-convergence to global minimizers for Adam and stochastic gradient descent optimization and constructions of local minimizers in the training of artificial neural networks, arXiv:2402.05155 (2024), 36 pages, to appear in SIAM/ASA J. Uncertain. Quantif.

 
