Why Knowledge Distillation Works in Generative Models: A Minimal Working Explanation