Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design