Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL