Region-Aware Pretraining for Open-Vocabulary Object Detection With Vision Transformers