Gradient-Weight Alignment as a Train-Time Proxy for Generalization in Classification Tasks