Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models