MATS: An Audio Language Model under Text-only Supervision