Unifying Event Detection and Captioning as Sequence Generation via Pre-Training