SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios