BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Sabre//Sabre VObject 4.5.8//EN
CALSCALE:GREGORIAN
BEGIN:VTIMEZONE
TZID:Europe/Zurich
X-LIC-LOCATION:Europe/Zurich
TZURL:http://tzurl.org/zoneinfo/Europe/Zurich
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19810329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19961027T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
UID:news1605@dmi.unibas.ch
DTSTAMP:20231030T100327Z
DTSTART;TZID=Europe/Zurich:20231109T110000
SUMMARY:Learning linear models in-context with transformers
DESCRIPTION:Attention-based neural network sequence models such as transfor
 mers have the capacity to act as supervised learning algorithms: They can 
 take as input a sequence of labeled examples and output predictions for un
 labeled test examples.  Indeed\, recent work by Garg et al. has shown tha
 t when training GPT2 architectures over random instances of linear regress
 ion problems\, these models' predictions mimic those of ordinary least squ
 ares.  Towards understanding the mechanisms underlying this phenomenon\, 
 we investigate the dynamics of in-context learning of linear predictors fo
 r a transformer with a single linear self-attention layer trained by gradi
 ent flow.  We show that despite the non-convexity of the underlying optim
 ization problem\, gradient flow with a random initialization finds a globa
 l minimum of the objective function.  Moreover\, when given a prompt of l
 abeled examples from a new linear prediction task\, the trained transforme
 r achieves small prediction error on unlabeled test examples.  We further
  characterize the behavior of the trained transformer under distribution s
 hifts.\nSpencer Frei is an Assistant Professor of Statistics at UC Davis
 . He is broadly interested in machine learning\, statistics\, and opt
 imization\, with a particular interest in understanding and improving deep
  learning. Prior to joining UC Davis\, he was a postdoctoral fellow at th
 e Simons Institute for the Theory of Computing at UC Berkeley\, mentored b
 y Peter Bartlett and Bin Yu as a part of the NSF/Simons Collaboration on t
 he Theoretical Foundations of Deep Learning. He completed his PhD in Stati
 stics at UCLA under the supervision of Quanquan Gu and Ying Nian Wu.\n
 https://spencerfrei.github.io/
X-ALT-DESC:<p>Attention-based neural network sequence models such as transf
 ormers have the capacity to act as supervised learning algorithms: They ca
 n take as input a sequence of labeled examples and output predictions for 
 unlabeled test examples.&nbsp\; Indeed\, recent work by Garg et al. has sh
 own that when training GPT2 architectures over random instances of linear 
 regression problems\, these models' predictions mimic those of ordinary le
 ast squares.&nbsp\; Towards understanding the mechanisms underlying this p
 henomenon\, we investigate the dynamics of in-context learning of linear p
 redictors for a transformer with a single linear self-attention layer trai
 ned by gradient flow.&nbsp\; We show that despite the non-convexity of the
  underlying optimization problem\, gradient flow with a random initializat
 ion finds a global minimum of the objective function.&nbsp\; Moreover\, wh
 en given a prompt of labeled examples from a new linear prediction task\, 
 the trained transformer achieves small prediction error on unlabeled test 
 examples.&nbsp\; We further characterize the behavior of the trained trans
 former under distribution shifts.&nbsp\;</p>\n<p>Spencer Frei is&nbsp\;an 
 Assistant Professor of Statistics at UC Davis. He is broadly interested in
  machine learning\, statistics\, and optimization\, with a particular inte
 rest in understanding and improving deep learning.&nbsp\;Prior to joining 
 UC Davis\, he was a postdoctoral fellow at the Simons Institute for the Th
 eory of Computing at UC Berkeley\, mentored by Peter Bartlett and Bin Yu a
 s a part of the NSF/Simons Collaboration on the Theoretical Foundations of
  Deep Learning. He completed his PhD in Statistics at UCLA under the super
 vision of Quanquan Gu and Ying Nian Wu.&nbsp\;</p>\n<p><a href="https://sp
 encerfrei.github.io/">https://spencerfrei.github.io/</a></p>
END:VEVENT
END:VCALENDAR
