Align Before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition