OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition