b
Discover
Models
Search
About
Non-asymptotic Convergence of Training Transformers for Next-token Prediction
2 weeks ago
·
NeurIPS