Power Lines: Scaling laws for weight decay and batch size in LLM pre-training