Approximate exploitability: Learning a best response in large games