UAVM: Towards Unifying Audio and Visual Models