Referring to Objects in Videos using Spatio-Temporal Identifying Descriptions