HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning