Non-Asymptotic Guarantees for Average-Reward Q-Learning with Adaptive Stepsizes