Position: AI Evaluation Should Learn from How We Test Humans