An eye for an ear: zero-shot audio description leveraging an image captioner with audio-visual token distribution matching