bytez
Search
Feed
Models
Agent
Devs
Plan
docs
GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification | Read Paper on Bytez