SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents