When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback