Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment
2 weeks ago · NeurIPS