bytez
Search
Feed
Models
Agent
Devs
Model API
docs
MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking | Read Paper on Bytez