Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient | Read Paper on Bytez