FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training