03 Apr 2019

Seminar in probability theory: Arthur Jacot (EPFL)

Theory of Deep Learning 3: Neural Tangent Kernel: Convergence and Generalization of Deep Neural Networks

We show that the behaviour of a Deep Neural Network (DNN) during gradient descent is described by a new kernel: the Neural Tangent Kernel (NTK). More precisely, as the parameters are trained using gradient descent, the network function (which maps the network inputs to the network outputs) follows a so-called kernel gradient descent with respect to the NTK. We prove that as the widths of the network layers grow to infinity, the NTK at initialization converges to a deterministic limit, and this limit stays constant during training. In particular, if the limiting NTK is positive definite, the network function converges to a global minimum. The NTK also describes how DNNs generalise outside the training set: for a least-squares cost, the network function converges in expectation to the kernel ridgeless regression estimator with respect to the NTK. This explains how DNNs generalise in the so-called overparametrized regime, which is at the heart of most recent developments in deep learning.
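To make the last claim concrete, here is a minimal numerical sketch under stated assumptions. It uses a fixed kernel, since the limiting NTK stays constant during training. An RBF kernel stands in for the NTK itself, which is an assumption and not part of the talk. The cost is least squares, and the initial function is f_0 = 0, playing the role of the mean of the random initialization. It runs a time-discretized kernel gradient descent and checks that the result matches the kernel ridgeless regression predictor f(x) = k(x, X) K^{-1} y. The names `rbf_kernel`, `kernel_gradient_descent` and `ridgeless_regression` are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=0.5):
    # Stand-in positive definite kernel (assumption: this is NOT the NTK itself).
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * lengthscale**2))


def kernel_gradient_descent(X, y, kernel, lr=0.5, steps=2000):
    """Euler-discretized kernel gradient descent on C(f) = 1/(2n) sum_i (f(x_i) - y_i)^2.

    Starting from f_0 = 0, f_t stays in the span of k(., x_i), so we track
    f_t(x) = sum_i alpha_i k(x, x_i) through the coefficient vector alpha.
    For convergence, lr / n times the largest eigenvalue of K must stay below 2.
    """
    n = len(X)
    K = kernel(X, X)
    alpha = np.zeros(n)
    for _ in range(steps):
        residual = K @ alpha - y      # f_t(x_i) - y_i on the training set
        alpha -= lr / n * residual    # f_{t+1} = f_t - lr/n * sum_i k(., x_i) residual_i
    return alpha


def ridgeless_regression(X, y, kernel, X_test):
    # Kernel ridgeless regression: f(x) = k(x, X) K^{-1} y (no ridge term).
    return kernel(X_test, X) @ np.linalg.solve(kernel(X, X), y)
```

Because f_0 = 0 here, the limit is exactly the ridgeless predictor. With a random f_0, the statement in the abstract concerns the expectation over the initialization.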
