Seminar room 212 in Kollegienhaus
Learning linear models in-context with transformers
Attention-based neural network sequence models such as transformers have the capacity to act as supervised learning algorithms: they can take as input a sequence of labeled examples and output predictions for unlabeled test examples. Indeed, recent work by Garg et al. has shown that when GPT-2 architectures are trained over random instances of linear regression problems, these models' predictions mimic those of ordinary least squares. Towards understanding the mechanisms underlying this phenomenon, we investigate the dynamics of in-context learning of linear predictors for a transformer with a single linear self-attention layer trained by gradient flow. We show that despite the non-convexity of the underlying optimization problem, gradient flow with a random initialization finds a global minimum of the objective function. Moreover, when given a prompt of labeled examples from a new linear prediction task, the trained transformer achieves small prediction error on unlabeled test examples. We further characterize the behavior of the trained transformer under distribution shifts.
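For readers unfamiliar with the setting, the following is a minimal sketch of what a linear regression "prompt" looks like and how an ordinary least squares prediction on the query compares with a simple predictor of the kind a single linear self-attention layer can express. It is an illustration only, not the speaker's code or construction: the function names, the Gaussian inputs, the noiseless labels, and the identity choice for the matrix `gamma` are all assumptions made for this example.

```python
import numpy as np


def make_prompt(n_examples, dim, rng):
    """Sample one noiseless linear regression task and a prompt drawn from it."""
    w = rng.standard_normal(dim)                # task-specific weight vector
    X = rng.standard_normal((n_examples, dim))  # in-context inputs, x_i ~ N(0, I)
    y = X @ w                                   # in-context labels y_i = <w, x_i>
    x_query = rng.standard_normal(dim)          # unlabeled test input
    return X, y, x_query, w


def ols_prediction(X, y, x_query):
    """Fit ordinary least squares on the prompt and predict the query label."""
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return x_query @ w_hat


def linear_attention_style_prediction(X, y, x_query, gamma=None):
    """Predict x_query^T Gamma (1/N) sum_i y_i x_i.

    Gamma = I is an assumption for this sketch; it is a sensible choice only
    because the inputs above have identity covariance.
    """
    if gamma is None:
        gamma = np.eye(X.shape[1])
    return x_query @ gamma @ (X.T @ y) / len(y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y, x_query, w = make_prompt(n_examples=5000, dim=5, rng=rng)
    print("true label:          ", x_query @ w)
    print("OLS on the prompt:   ", ols_prediction(X, y, x_query))
    print("attention-style pred:", linear_attention_style_prediction(X, y, x_query))
```

With noiseless labels and more examples than dimensions, the OLS prediction recovers the true label up to numerical error, while the attention-style predictor only approaches it as the prompt grows longer.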
Spencer Frei is an Assistant Professor of Statistics at UC Davis. He is broadly interested in machine learning, statistics, and optimization, with a particular interest in understanding and improving deep learning. Prior to joining UC Davis, he was a postdoctoral fellow at the Simons Institute for the Theory of Computing at UC Berkeley, mentored by Peter Bartlett and Bin Yu as a part of the NSF/Simons Collaboration on the Theoretical Foundations of Deep Learning. He completed his PhD in Statistics at UCLA under the supervision of Quanquan Gu and Ying Nian Wu.
https://spencerfrei.github.io/