Genetic multi-armed bandits: a reinforcement learning approach for discrete optimization via simulation