InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption