UFT: Unifying Supervised and Reinforcement Fine-Tuning