Measuring AI Ability to Complete Long Software Tasks