VisIT-Bench: A Dynamic Benchmark for Evaluating Instruction-Following Vision-and-Language Models