MILA: Multi-Task Learning from Videos via Efficient Inter-Frame Attention