Evaluating language models as risk scores