Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
1 week ago · NeurIPS