An eye for an ear: zero-shot audio description leveraging an image captioner with audio-visual token distribution matching
1 week ago · NeurIPS