Preference-Based Self-Distillation: Beyond KL Matching via Reward Regularization