SDPGO: Efficient Self-Distillation Training Meets Proximal Gradient Optimization