Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning
2020·arXiv