Stratified Prediction-Powered Inference for Effective Hybrid Evaluation of Language Models