When Safety Fails Before the Answer: Benchmarking Harmful Behavior Detection in Reasoning Chains
