b

DiscoverModelsSearch
About
How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression
1 week ago
·
NeurIPS