VITA-Audio: Fast Interleaved Audio-Text Token Generation for Efficient Large Speech-Language Model