BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Sabre//Sabre VObject 4.5.7//EN
CALSCALE:GREGORIAN
BEGIN:VTIMEZONE
TZID:Europe/Zurich
X-LIC-LOCATION:Europe/Zurich
TZURL:http://tzurl.org/zoneinfo/Europe/Zurich
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19810329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19961027T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
UID:news841@dmi.unibas.ch
DTSTAMP:20190319T134730Z
DTSTART;TZID=Europe/Zurich:20190403T110000
SUMMARY:Seminar in probability theory: Arthur Jacot (EPFL)
DESCRIPTION:We show that the behaviour of a Deep Neural Network (DNN) durin
 g gradient descent is described by a new kernel: the Neural Tangent Kernel
  (NTK). More precisely\, as the parameters are trained using gradient desc
 ent\, the network function (which maps the network inputs to the network o
 utputs) follows a so-called kernel gradient descent w.r.t. the NTK. We pro
 ve that as the network layers get wider and wider\, the NTK converges to a
  deterministic limit at initialization\, which stays constant during train
 ing. This implies in particular that if the NTK is positive definite\, the
  network function converges to a global minimum. The NTK also describes ho
 w DNNs generalise outside the training set: for a least squares cost\, the
  network function converges in expectation to the NTK kernel ridgeless reg
 ression\, explaining how DNNs generalise in the so-called overparametrized
  regime\, which is at the heart of most recent developments in deep learni
 ng.
END:VEVENT
END:VCALENDAR
