Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment