Linear combination of one-step predictive information with an external reward in an episodic policy gradient setting: a critical analysis