SWAT: Scalable and Efficient Window Attention-based Transformers Acceleration on FPGAs