Perception Encoder: The best visual embeddings are not at the output of the network