Learning to Generalize from Sparse and Underspecified Rewards