In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization
2 weeks ago · NeurIPS