36:[["$","audio",null,{"id":"tts"}],["$","$L3b",null,{"paperID":"1605.07139","publisher":"arxiv","paperJSON":{"title":"Fairness in Learning: Classic and Contextual Bandits","paperID":"1605.07139","avgLineHeight":13.55,"imgScale":4,"sections":[{"heading":"Abstract","paragraphs":[[{"text":"We introduce the study of fairness in multi-armed bandit problems. Our fairness definition can be interpreted as demanding that given a pool of applicants (say, for college admission or mortgages), a worse applicant is never favored over a better one, despite a learning algorithm’s uncertainty over the true payoffs. We prove results of two types:","element":"span"}],[{"text":"First, in the important special case of the classic stochastic bandits problem (i.e. in which there are no contexts), we provide a provably fair algorithm based on chained confidence intervals, and prove a cumulative regret bound with a cubic dependence on the number of arms. We further show that any fair algorithm must have such a dependence. When combined with regret bounds for standard non-fair algorithms such as UCB, this proves a strong separation between fair and unfair learning, which extends to the general contextual case.","element":"span"}],[{"text":"In the general contextual case, we prove a tight connection between fairness and the KWIK (Knows What It Knows) learning model: a KWIK algorithm for a class of functions can be transformed into a provably fair contextual bandit algorithm, and conversely any fair contextual bandit algorithm can be transformed into a KWIK learning algorithm. 
This tight connection allows us to provide a provably fair algorithm for the linear contextual bandit problem with a polynomial dependence on the dimension, and to show (for a different class of functions) a worst-case exponential gap in regret between fair and non-fair learning algorithms.","element":"span"}]]},{"heading":"1 Introduction","paragraphs":[[{"text":"Automated techniques from statistics and machine learning are increasingly being used to make decisions that have important consequences for people’s lives, including hiring ","element":"span"},{"href":"#id-0","referenceIndex":24,"text":"[Miller, ","element":"a"},{"href":"#id-0","referenceIndex":24,"text":"2015]","element":"a"},{"text":", lending ","element":"span"},{"href":"#id-1","referenceIndex":10,"text":"[Byrnes, ","element":"a"},{"href":"#id-1","referenceIndex":10,"text":"2016]","element":"a"},{"text":", policing ","element":"span"},{"href":"#id-2","referenceIndex":27,"text":"[Rudin, ","element":"a"},{"href":"#id-2","referenceIndex":27,"text":"2013]","element":"a"},{"text":", and even criminal sentencing ","element":"span"},{"href":"#id-3","referenceIndex":7,"text":"[Barry-Jester et al., ","element":"a"},{"href":"#id-3","referenceIndex":7,"text":"2015]","element":"a"},{"text":". These high-stakes uses of machine learning have led to increasing concern in law and policy circles about the potential for (often opaque) machine learning techniques to be ","element":"span"},{"style":{"fontStyle":"italic"},"text":"discriminatory ","element":"span"},{"text":"or ","element":"span"},{"style":{"fontStyle":"italic"},"text":"unfair ","element":"span"},{"href":"#id-4","referenceIndex":13,"text":"[Coglianese and Lehr, ","element":"a"},{"href":"#id-4","referenceIndex":13,"text":"2016, ","element":"a"},{"href":"#id-5","referenceIndex":6,"text":"Barocas and Selbst, ","element":"a"},{"href":"#id-5","referenceIndex":6,"text":"2016]","element":"a"},{"text":". 
Moreover, these concerns are not merely hypothetical: ","element":"span"},{"href":"#id-6","referenceIndex":29,"text":"Sweeney ","element":"a"},{"href":"#id-6","referenceIndex":29,"text":"[2013] ","element":"a"},{"text":"observed that contextual ads for public record services shown in response to Google searches for stereotypically African American names were more likely to contain text referring to arrest records than comparable ads shown in response to searches for stereotypically Caucasian names, which displayed more neutral text. She confirmed that this was not the result of the advertisers’ stated preferences but rather the automated outcome of Google’s targeting algorithms. Despite the recognized importance of this problem, very little is known about technical solutions to “unfairness”, or about the extent to which “fairness” conflicts with the goals of learning.","element":"span"},{"style":{"height":8.4},"width":17,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/2-0.png","element":"img","alt":"1","inline":true}],[{"text":"In this paper, we consider the extent to which a natural fairness notion is compatible with learning in a general setting (the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"contextual bandit setting","element":"span"},{"text":"), which can be used to model many of the applications mentioned above in which machine learning is currently employed. 
In this model, the learner is a sequential decision maker, which must choose at each time step ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"which decision to make, out of a finite set of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"choices (for example, which of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"loan applicants – potentially from different populations or racial groups – to give a loan to). 
Before the learner makes its decision at round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":", it observes some ","element":"span"},{"style":{"height":21.62},"width":193.64,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/2-1.png","element":"img","alt":" context xtj ","inline":true,"padRight":true},{"text":"for each choice of arm ","element":"span"},{"style":{"height":21.62},"width":92.96,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/2-2.png","element":"img","alt":" j (xtj ","inline":true,"padRight":true},{"text":"could, for example, represent ","element":"span"},{"text":"the contents of the loan application of an individual from population ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"at round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":"). When the learner chooses arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"at time ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":", it obtains a stochastic ","element":"span"},{"style":{"height":21.62},"width":179.62,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/2-3.png","element":"img","alt":" reward rtj ","inline":true,"padRight":true},{"text":"whose expectation is determined ","element":"span"},{"text":"by some unknown function of the context: ","element":"span"},{"style":{"height":32.4},"width":272.94,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/2-4.png","element":"img","alt":" E[rtj] = fj(xtj","inline":true},{"text":"). The goal of the learning algorithm is ","element":"span"},{"text":"to maximize its expected reward – i.e. 
to approximate the optimal policy, which, at each round, chooses arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"to maximize ","element":"span"},{"style":{"height":32.4},"width":115.18,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/2-5.png","element":"img","alt":" E[rtj]","inline":true},{"text":". The difficulty in this task stems from the unknown functions ","element":"span"},{"style":{"height":17.42},"width":36.36,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/2-6.png","element":"img","alt":"fj","inline":true,"padRight":true},{"text":"which map contexts to rewards; these functions must be learned. Despite this difficulty, many algorithms are known for learning the optimal policy (in the absence of any fairness constraint).","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"1.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Fairness and Learning","element":"span"}],[{"text":"Our notion of individual fairness is very simple: it states that it is ","element":"span"},{"style":{"fontStyle":"italic"},"text":"unfair ","element":"span"},{"text":"to preferentially choose one individual (e.g. ","element":"span"},{"text":"for a loan, a job, admission to college, etc.) ","element":"span"},{"text":"over another if he or she is not as qualified as the other individual. This definition of fairness is apt for our setting, since in contextual learning, the quality of an arm is clear: its expected reward. We view different arms ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"as representing different populations (e.g. 
different ethnic groups, cultures, or other divisions within society), and view the context ","element":"span"},{"style":{"height":21.62},"width":251.58,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/3-0.png","element":"img","alt":" xtj at round t","inline":true,"padRight":true},{"text":"as representing information about a particular ","element":"span"},{"text":"individual from that population. Each population has its own underlying function ","element":"span"},{"style":{"height":17.42},"width":36.36,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/3-1.png","element":"img","alt":" fj","inline":true,"padRight":true},{"text":"which maps contexts to expected payoff","element":"span"},{"style":{"height":8.4},"width":17,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/3-2.png","element":"img","alt":"2","inline":true},{"text":". At each time step ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":", the algorithm is asked to choose among specific members of each population, represented by the contexts ","element":"span"},{"style":{"height":21.62},"width":39.94,"height":54.05,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/3-3.png","element":"img","alt":" xtj","inline":true},{"text":". The quality of an individual is thus ","element":"span"},{"text":"exactly ","element":"span"},{"style":{"height":32.4},"width":274.29,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/3-4.png","element":"img","alt":" E[rtj] = fj(xtj","inline":true},{"text":"). 
Our fairness condition translates thus: for any pair of arms ","element":"span"},{"style":{"height":16},"width":225.72,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/3-5.png","element":"img","alt":" j, j′ at time","inline":true}],[{"style":{"height":22.14},"width":373.92,"height":55.34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/3-6.png","element":"img","alt":"t, if fj(xtj) ≥ fj′(xtj′","inline":true},{"text":"), then an algorithm is said to be discriminatory if it preferentially chooses the ","element":"span"},{"text":"lower-quality arm ","element":"span"},{"style":{"height":16},"width":36.47,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/3-7.png","element":"img","alt":" j′","inline":true},{"text":". Put another way, an algorithm is ","element":"span"},{"style":{"fontStyle":"italic"},"text":"fair ","element":"span"},{"text":"if it guarantees the following: with high probability, over all rounds ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":", and for all pairs of arms ","element":"span"},{"style":{"height":22.14},"width":700.6,"height":55.34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/3-8.png","element":"img","alt":" j, j′, whenever fj(xtj) ≥ fj′(xtj′), the","inline":true,"padRight":true},{"text":"algorithm chooses arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"with probability at least that with which it chooses arm ","element":"span"},{"style":{"height":18.73},"width":62.58,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/3-9.png","element":"img","alt":" j′3.","inline":true}],[{"text":"It is worth noting that this definition of fairness (formalized in the preliminaries) is entirely consistent with the optimal policy, which can simply choose at each round to play uniformly at random from the arms arg 
max","element":"span"},{"style":{"height":32.4},"width":191.64,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/3-10.png","element":"img","alt":"j E[rtj]","inline":true},{"text":"which maximize the expected reward. This is because – it seems – the goal of fairness as stated above is entirely consistent with the goal of maximizing expected reward. Indeed, the fairness constraint exactly states that the algorithm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"cannot ","element":"span"},{"text":"favor low-reward arms!","element":"span"}],[{"text":"Our main conceptual result is that this intuition is incorrect in the face of unknown reward functions. ","element":"span"},{"text":"Even though the constraint of fairness is consistent with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"implementing ","element":"span"},{"text":"the optimal policy, it is not necessarily consistent with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"learning ","element":"span"},{"text":"the optimal policy. We show that fairness always has a cost, in terms of the achievable learning rate of the algorithm. For some problems, the cost is mild, but for others, the cost is large.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"1.2 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Our Results","element":"span"}],[{"text":"We divide our results into two parts. First, we study the classic stochastic multi-armed bandit problem ","element":"span"},{"href":"#id-7","referenceIndex":20,"text":"[Lai and Robbins, ","element":"a"},{"href":"#id-7","referenceIndex":20,"text":"1985, ","element":"a"},{"href":"#id-8","referenceIndex":19,"text":"Katehakis and Robbins, ","element":"a"},{"href":"#id-8","referenceIndex":19,"text":"1995]","element":"a"},{"text":". 
In this case, there are no contexts, and each arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"has a fixed but unknown average reward ","element":"span"},{"style":{"height":12},"width":38.29,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/3-11.png","element":"img","alt":" µi","inline":true},{"text":". Note that this is a special case of the contextual bandit problem in which the contexts are the same in every round. In this setting, our fairness constraint specializes to require that with probability 1 ","element":"span"},{"style":{"height":12.8},"width":65.33,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/3-12.png","element":"img","alt":" − δ","inline":true},{"text":", for any pair of arms ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i, j ","element":"span"},{"text":"for which ","element":"span"},{"style":{"height":16.22},"width":139.29,"height":40.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/3-13.png","element":"img","alt":" µi ≥ µj","inline":true},{"text":", at no round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"does the algorithm play arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"with probability higher than that with which it plays arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"text":". Note that even this special case models interesting scenarios from the point of view of fairness in learning. 
It models, for example, the case in which choices are made by a loan officer after applicants have been categorized into ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"internally indistinguishable equivalence classes based on their applications.","element":"span"}],[{"text":"Without a fairness constraint, it is known to be possible to guarantee non-trivial regret to the optimal policy after only ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"= ","element":"span"},{"style":{"fontStyle":"italic"},"text":"O","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":") many rounds ","element":"span"},{"href":"#id-9","referenceIndex":5,"text":"[Auer et al., ","element":"a"},{"href":"#id-9","referenceIndex":5,"text":"2002]","element":"a"},{"text":". ","element":"span"},{"text":"In Section ","element":"span"},{"text":"3, ","element":"span"},{"text":"we give an algorithm that satisfies our fairness constraint and is able to guarantee non-trivial regret after ","element":"span"},{"style":{"height":19.13},"width":196.32,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/3-14.png","element":"img","alt":" T = O(k3","inline":true},{"text":") rounds. 
We then show in Section ","element":"span"},{"text":"4 ","element":"span"},{"text":"that it is not possible to do better – ","element":"span"},{"style":{"fontStyle":"italic"},"text":"any ","element":"span"},{"text":"fair learning algorithm can be forced to endure constant per-round regret for ","element":"span"},{"style":{"height":19.13},"width":363.1,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/4-0.png","element":"img","alt":" T = Ω(k3) rounds.","inline":true,"padRight":true},{"text":"Thus, we tightly characterize the optimal regret attainable by fair algorithms in this setting, and formally separate it from the regret attainable by algorithms absent a fairness constraint. Note that this already shows a separation between the best possible learning rates for contextual bandit learning with and without the fairness constraint – the stochastic multi-armed bandit problem is a special case of every contextual bandit problem, and for general contextual bandit problems, it is also known how to get non-trivial regret after only ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"= ","element":"span"},{"style":{"fontStyle":"italic"},"text":"O","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":") many rounds ","element":"span"},{"href":"#id-10","referenceIndex":2,"text":"[Agarwal et al., ","element":"a"},{"href":"#id-10","referenceIndex":2,"text":"2014, ","element":"a"},{"href":"#id-11","referenceIndex":8,"text":"Beygelzimer et al., ","element":"a"},{"href":"#id-11","referenceIndex":8,"text":"2011, ","element":"a"},{"href":"#id-12","referenceIndex":12,"text":"Chu et al., ","element":"a"},{"href":"#id-12","referenceIndex":12,"text":"2011]","element":"a"},{"text":".","element":"span"}],[{"text":"We then move on to the general contextual bandit setting and prove a broad characterization result, relating fair contextual bandit 
learning to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"KWIK ","element":"span"},{"text":"learning ","element":"span"},{"href":"#id-13","referenceIndex":22,"text":"[Li et al., ","element":"a"},{"href":"#id-13","referenceIndex":22,"text":"2011]","element":"a"},{"text":". The KWIK model, which stands for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Knows What It Knows ","element":"span"},{"text":"and has a close relationship with reinforcement learning, is a model of sequential supervised classification in which the learning algorithm must be confident in its predictions. Informally, a KWIK learning algorithm receives a sequence of unlabeled examples, whose true labels are defined by some unknown function in a class ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C","element":"span"},{"text":". For each example, the algorithm may either predict a label or announce “I Don’t Know”. The KWIK requirement is that with high probability, for each example, if the algorithm predicts a label, then its prediction must be very close to the true label. The quality of a KWIK learning algorithm is characterized by its “KWIK bound”, which provides an upper bound on the maximum number of times the algorithm can be forced to announce “I Don’t Know”. For any contextual bandit problem (defined by the set of functions ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"from which the payoff functions ","element":"span"},{"style":{"height":17.42},"width":36.36,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/4-1.png","element":"img","alt":" fj","inline":true,"padRight":true},{"text":"may be selected), we show that the optimal learning rate of any fair algorithm is determined by the best KWIK bound for the class ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C","element":"span"},{"text":". 
We prove this constructively – we give a reduction showing how to convert a KWIK learning algorithm into a fair contextual bandit algorithm in Section ","element":"span"},{"text":"5, ","element":"span"},{"text":"and vice versa in Section ","element":"span"},{"text":"6. ","element":"span"},{"text":"Both reductions show that the KWIK bound of the KWIK algorithm is polynomially related to the regret of the fair algorithm.","element":"span"}],[{"text":"This general connection has immediate implications, because it allows us to import known results for KWIK learning ","element":"span"},{"href":"#id-13","referenceIndex":22,"text":"[Li et al., ","element":"a"},{"href":"#id-13","referenceIndex":22,"text":"2011]","element":"a"},{"text":". For example, it implies that some fair contextual bandit problems are ","element":"span"},{"style":{"fontStyle":"italic"},"text":"easy","element":"span"},{"text":", in that there are fair algorithms which can obtain non-trivial regret guarantees after polynomially many rounds. This is the case, for example, for the important linear special case in which the contexts ","element":"span"},{"style":{"height":22.42},"width":145.32,"height":56.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/4-2.png","element":"img","alt":" xtj ∈ Rd","inline":true,"padRight":true},{"text":"are real-valued vectors and the unknown functions ","element":"span"},{"style":{"height":17.42},"width":36.36,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/4-3.png","element":"img","alt":" fj","inline":true,"padRight":true},{"text":"are linear: ","element":"span"},{"style":{"height":22.02},"width":329.76,"height":55.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/4-4.png","element":"img","alt":"fj(xtj) = ⟨θj, xtj⟩4","inline":true},{"text":". 
In this case, the KWIK-learnability of noisy linear regression problems ","element":"span"},{"href":"#id-14","referenceIndex":28,"text":"[Strehl ","element":"a"},{"href":"#id-14","referenceIndex":28,"text":"and Littman, ","element":"a"},{"href":"#id-14","referenceIndex":28,"text":"2008, ","element":"a"},{"href":"#id-13","referenceIndex":22,"text":"Li et al., ","element":"a"},{"href":"#id-13","referenceIndex":22,"text":"2011] ","element":"a"},{"text":"implies that we can construct a fair contextual bandit algorithm whose per-round regret is polynomial in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":". Conversely, it also implies that some contextual bandit problems which are easy without the fairness constraint become ","element":"span"},{"style":{"fontStyle":"italic"},"text":"hard ","element":"span"},{"text":"once we impose the fairness constraint, in that any fair algorithm must suffer constant per-round regret for exponentially many rounds. This is the case, for example, when the context consists of boolean vectors ","element":"span"},{"style":{"height":22.42},"width":240.5,"height":56.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/4-5.png","element":"img","alt":" xtj ∈ {0, 1}d,","inline":true,"padRight":true},{"text":"and the unknown functions ","element":"span"},{"style":{"height":20.15},"width":716.45,"height":50.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/4-6.png","element":"img","alt":" fj : {0, 1}d → {0, 1} are conjunctions","inline":true,"padRight":true},{"text":"– the “and”s of some unknown set of features","element":"span"},{"style":{"height":8.4},"width":17,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/4-7.png","element":"img","alt":"5","inline":true},{"text":". 
The impossibility of non-trivial KWIK-learning of conjunctions ","element":"span"},{"href":"#id-15","referenceIndex":21,"text":"[Li, ","element":"a"},{"href":"#id-15","referenceIndex":21,"text":"2009, ","element":"a"},{"href":"#id-13","referenceIndex":22,"text":"Li et al., ","element":"a"},{"href":"#id-13","referenceIndex":22,"text":"2011] ","element":"a"},{"text":"implies that no fair learner in the contextual bandit setting can achieve non-trivial regret before exponentially many (in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":") rounds.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"1.3 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Other Related Work","element":"span"}],[{"text":"Several papers study the problem of fairness in machine learning. One line of work aims to give algorithms for batch classification which achieve ","element":"span"},{"style":{"fontStyle":"italic"},"text":"group fairness ","element":"span"},{"text":"– otherwise known as ","element":"span"},{"style":{"fontStyle":"italic"},"text":"equality of outcomes","element":"span"},{"text":" or ","element":"span"},{"style":{"fontStyle":"italic"},"text":"statistical parity ","element":"span"},{"text":"– or algorithms that avoid ","element":"span"},{"style":{"fontStyle":"italic"},"text":"disparate impact ","element":"span"},{"text":"(see, e.g., ","element":"span"},{"href":"#id-16","referenceIndex":11,"text":"Calders and Verwer ","element":"a"},{"href":"#id-16","referenceIndex":11,"text":"[2010]","element":"a"},{"text":", ","element":"span"},{"href":"#id-17","referenceIndex":23,"text":"Luong et al. ","element":"a"},{"href":"#id-17","referenceIndex":23,"text":"[2011]","element":"a"},{"text":", ","element":"span"},{"href":"#id-18","referenceIndex":18,"text":"Kamishima et al. 
","element":"a"},{"href":"#id-18","referenceIndex":18,"text":"[2011]","element":"a"},{"text":", ","element":"span"},{"href":"#id-19","referenceIndex":15,"text":"Feldman et al. ","element":"a"},{"href":"#id-19","referenceIndex":15,"text":"[2015]","element":"a"},{"text":", ","element":"span"},{"href":"#id-20","referenceIndex":16,"text":"Fish et al. ","element":"a"},{"href":"#id-20","referenceIndex":16,"text":"[2016] ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-21","referenceIndex":1,"text":"Adler ","element":"a"},{"href":"#id-21","referenceIndex":1,"text":"et al. ","element":"a"},{"href":"#id-21","referenceIndex":1,"text":"[2016] ","element":"a"},{"text":"for a study of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"auditing ","element":"span"},{"text":"existing algorithms for disparate impact). While statistical parity is sometimes a desirable goal – indeed, it is sometimes required by law – as observed by ","element":"span"},{"href":"#id-22","referenceIndex":14,"text":"Dwork et al. ","element":"a"},{"href":"#id-22","referenceIndex":14,"text":"[2012] ","element":"a"},{"text":"and others, it suffers from two problems. First, if different populations indeed have different statistical properties, then it can be at odds with accurate classification. Second, even in cases when statistical parity is attainable with an optimal classifier, it does not prevent discrimination at an individual level – see ","element":"span"},{"href":"#id-22","referenceIndex":14,"text":"Dwork et al. ","element":"a"},{"href":"#id-22","referenceIndex":14,"text":"[2012] ","element":"a"},{"text":"for a catalog of ways in which statistical parity can be insufficient from the perspective of fairness. In contrast, we study a notion aimed at guaranteeing fairness at the individual level.","element":"span"}],[{"text":"Our definition of fairness is most closely related to that of ","element":"span"},{"href":"#id-22","referenceIndex":14,"text":"Dwork et al. 
","element":"a"},{"href":"#id-22","referenceIndex":14,"text":"[2012]","element":"a"},{"text":", who proposed and explored the basic properties of a technical definition of individual fairness formalizing the idea that “similar individuals should be treated similarly”. Specifically, their work presupposes the existence of a task-specific metric on individuals, and proposes that fair algorithms should satisfy a Lipschitz condition with respect to this metric. ","element":"span"},{"text":"Our definition of fairness is similar, in that the expected reward of each arm is a natural metric through which we define fairness. The main conceptual distinction between our work and ","element":"span"},{"href":"#id-22","referenceIndex":14,"text":"Dwork et al. ","element":"a"},{"href":"#id-22","referenceIndex":14,"text":"[2012] ","element":"a"},{"text":"is that their work operates under the assumption that the metric is known to the algorithm designer, and hence in their setting, the fairness constraint binds only insofar as it is in conflict with the desired outcome of the algorithm designer. The most challenging aspect of this approach (as they acknowledge) is that it requires that some third party design a “fair” metric on individuals, which in a sense encodes much of the relevant challenge. The question of how to design such a metric was considered by ","element":"span"},{"href":"#id-23","referenceIndex":30,"text":"Zemel et al. ","element":"a"},{"href":"#id-23","referenceIndex":30,"text":"[2013]","element":"a"},{"text":", who study methods to learn representations that encode the data, while obscuring protected attributes. 
Our fairness constraint, conversely, is entirely aligned with the goal of the algorithm designer in that it is satisfied by the optimal policy; nevertheless, it affects the space of feasible learning algorithms, because it interferes with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"learning ","element":"span"},{"text":"an optimal policy, which depends on the unknown reward functions.","element":"span"}],[{"text":"At a technical level, our work is related to ","element":"span"},{"href":"#id-24","referenceIndex":3,"text":"Amin et al. ","element":"a"},{"href":"#id-24","referenceIndex":3,"text":"[2012] ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-25","referenceIndex":4,"text":"Amin et al. ","element":"a"},{"href":"#id-25","referenceIndex":4,"text":"[2013]","element":"a"},{"text":", which also relate KWIK learning to bandit learning in a different context, unrelated to fairness (when the arm space is very large).","element":"span"}]]},{"heading":"2 Preliminaries","paragraphs":[[{"text":"We study the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"contextual bandit ","element":"span"},{"text":"setting, which is defined by a domain ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X","element":"span"},{"text":", a set of “arms” [","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":"] := ","element":"span"},{"style":{"fontStyle":"italic"},"text":"{","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":", . . . 
, k","element":"span"},{"style":{"fontStyle":"italic"},"text":"} ","element":"span"},{"text":"and a class ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"of functions of the form ","element":"span"},{"style":{"height":17.6},"width":213.76,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/5-0.png","element":"img","alt":" f : X → [0,","inline":true,"padRight":true},{"text":"1]. For each arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"there is some function ","element":"span"},{"style":{"height":17.42},"width":132.1,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/5-1.png","element":"img","alt":"f_j ∈ C","inline":true},{"text":", unknown to the learner. In rounds ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"= 1","element":"span"},{"style":{"fontStyle":"italic"},"text":", . . . , T","element":"span"},{"text":", an adversary reveals to the algorithm a ","element":"span"},{"style":{"height":21.62},"width":192.68,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/5-2.png","element":"img","alt":"context x_j^t ","inline":true,"padRight":true},{"text":"for each arm","element":"span"},{"style":{"height":8.4},"width":17,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/5-3.png","element":"img","alt":"6","inline":true},{"text":". 
An algorithm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"then chooses an arm ","element":"span"},{"style":{"height":14.62},"width":27.03,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/5-4.png","element":"img","alt":" i_t","inline":true},{"text":", and observes stochastic reward ","element":"span"},{"style":{"height":20.82},"width":42.22,"height":52.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/5-5.png","element":"img","alt":"r_{i_t}^t ","inline":true,"padRight":true},{"text":"for the arm it chose. We assume ","element":"span"},{"style":{"height":32.4},"width":442.02,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/5-6.png","element":"img","alt":" r_j^t ∼ D_j^t, E[r_j^t] = f_j(x_j^t","inline":true},{"text":"), for some distribution ","element":"span"},{"style":{"height":21.62},"width":258.47,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/5-7.png","element":"img","alt":" D_j^t over [0, 1].","inline":true}],[{"text":"Let Π be the set of policies mapping contexts to distributions over arms ","element":"span"},{"style":{"height":18.33},"width":416.51,"height":45.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/5-8.png","element":"img","alt":" X^k → ∆^k, and π∗ the","inline":true,"padRight":true},{"text":"optimal policy, which selects a distribution over arms as a function of contexts to maximize the expected reward of those arms. The ","element":"span"},{"style":{"fontWeight":"bold"},"text":"pseudo-regret ","element":"span"},{"text":"of an algorithm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"on contexts ","element":"span"},{"style":{"height":18.33},"width":237.48,"height":45.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/5-9.png","element":"img","alt":" x^1, . . . 
, x^T is","inline":true}],[{"text":"defined as follows, where ","element":"span"},{"style":{"height":17.93},"width":296.78,"height":44.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-0.png","element":"img","alt":" π^t represents A","inline":true},{"text":"’s distribution on arms at round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":":","element":"span"}],[{"style":{"width":"69%"},"width":1294,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-1.png","element":"img"}],[{"text":"We hereafter refer to this as the ","element":"span"},{"style":{"fontWeight":"bold"},"text":"regret ","element":"span"},{"text":"of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":". The optimal policy ","element":"span"},{"style":{"height":12.73},"width":43.44,"height":31.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-2.png","element":"img","alt":" π∗ ","inline":true,"padRight":true},{"text":"pulls the arms with the highest expected reward at each round, so:","element":"span"}],[{"style":{"width":"62%"},"width":1172,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-3.png","element":"img"}],[{"text":"We say that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"satisfies regret bound ","element":"span"},{"style":{"height":21.29},"width":879.01,"height":53.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-4.png","element":"img","alt":" R(T) if max_{x^1,...,x^T} Regret(x^1, . . . 
, x^T) ≤ R(T).","inline":true}],[{"text":"Let the history ","element":"span"},{"style":{"height":24.4},"width":492.05,"height":61.01,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-5.png","element":"img","alt":" h^t ∈ (X^k × [k] × [0, 1])^{t−1}","inline":true,"padRight":true},{"text":"be a record of ","element":"span"},{"style":{"height":11.6},"width":60.16,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-6.png","element":"img","alt":" t −","inline":true,"padRight":true},{"text":"1 rounds experienced by ","element":"span"},{"style":{"height":16},"width":155.32,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-7.png","element":"img","alt":" A, t − 1","inline":true,"padRight":true},{"text":"3-tuples that encode, for each ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":", the realization of the contexts, arm chosen, and reward observed. We write ","element":"span"},{"style":{"height":23.56},"width":80.34,"height":58.89,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-8.png","element":"img","alt":" π_j^t|h^t","inline":true,"padRight":true},{"text":"to denote the probability that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"chooses arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"after observing contexts ","element":"span"},{"style":{"height":18.33},"width":167.54,"height":45.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-9.png","element":"img","alt":" x^t, given","inline":true},{"style":{"height":14.73},"width":37.14,"height":36.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-10.png","element":"img","alt":"h^t","inline":true},{"text":". 
For notational simplicity, we will often drop the superscript ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"on the history when referring to the distribution over arms: ","element":"span"},{"style":{"height":23.56},"width":235.45,"height":58.89,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-11.png","element":"img","alt":" π_j^t|h := π_j^t|h^t.","inline":true}],[{"text":"We now define what it means for a contextual bandit algorithm to be ","element":"span"},{"style":{"height":12.8},"width":20,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-12.png","element":"img","alt":" δ","inline":true},{"text":"-fair with respect to its arms. Informally, this will mean that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"will play arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"with higher probability than arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"in round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"only if ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"has a higher mean than ","element":"span"},{"style":{"height":17.6},"width":525.27,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-13.png","element":"img","alt":" j in round t, for all i, j ∈ [k","inline":true},{"text":"], and in all rounds ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":".","element":"span"}],[{"id":"id-26","style":{"height":17.6},"width":640.78,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-14.png","element":"img","alt":"Definition 1 (δ-fair). 
A is δ-fair","inline":true,"padRight":true},{"text":"if, for all sequences of contexts ","element":"span"},{"style":{"height":17.93},"width":177.78,"height":44.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-15.png","element":"img","alt":" x^1, . . . , x^t ","inline":true,"padRight":true},{"text":"and all payoff distributions ","element":"span"},{"style":{"height":19.65},"width":201.22,"height":49.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-16.png","element":"img","alt":" D_1^t, . . . , D_k^t","inline":true},{"text":", with probability at least 1 ","element":"span"},{"style":{"height":12.8},"width":62.01,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-17.png","element":"img","alt":" − δ","inline":true,"padRight":true},{"text":"over the realization of the history ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h","element":"span"},{"text":", for all rounds ","element":"span"},{"style":{"height":17.6},"width":112.21,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-18.png","element":"img","alt":"t ∈ [T","inline":true},{"text":"] and all pairs of arms ","element":"span"},{"style":{"height":17.6},"width":182.76,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-19.png","element":"img","alt":" j, j′ ∈ [k],","inline":true}],[{"style":{"width":"36%"},"width":680,"height":60,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-20.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Remark ","element":"span"},{"text":"1","element":"span"},{"style":{"fontStyle":"italic"},"text":". ","element":"span"},{"text":"Definition ","element":"span"},{"href":"#id-26","text":"1 ","element":"a"},{"text":"prohibits favoring lower-payoff arms over higher-payoff arms. 
One relaxed definition only requires that ","element":"span"},{"style":{"height":23.56},"width":624.42,"height":58.89,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-21.png","element":"img","alt":" π_j^t|h = π_{j′}^t|h when f_j(x_j^t) = f_{j′}(x_{j′}^t","inline":true},{"text":"), requiring only that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"identical ","element":"span"},{"text":"individuals ","element":"span"},{"text":"(in terms of expected payoff) be treated identically. This relaxation is a special case of ","element":"span"},{"href":"#id-22","referenceIndex":14,"text":"Dwork et al. ","element":"a"},{"href":"#id-22","referenceIndex":14,"text":"[2012]","element":"a"},{"text":"’s proposed family of definitions, which require that “similar individuals be treated similarly”. We use Definition ","element":"span"},{"href":"#id-26","text":"1 ","element":"a"},{"text":"because it is better motivated in its implications for the fair treatment of individuals, but all of our results, including our lower bounds, also apply to this relaxation.","element":"span"}],[{"style":{"fontWeight":"bold"},"text":"KWIK learning ","element":"span"},{"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"text":"be an algorithm that takes as input a sequence of examples ","element":"span"},{"style":{"height":18.33},"width":204.19,"height":45.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-22.png","element":"img","alt":" x^1, . . . 
, x^T,","inline":true,"padRight":true},{"text":"and when given some ","element":"span"},{"style":{"height":15.13},"width":148.09,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-23.png","element":"img","alt":" x^t ∈ X","inline":true},{"text":", outputs either a prediction ","element":"span"},{"style":{"height":18.73},"width":156.04,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-24.png","element":"img","alt":"ŷ^t ∈ [0,","inline":true,"padRight":true},{"text":"1] or else outputs ","element":"span"},{"style":{"height":18.33},"width":160.89,"height":45.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-25.png","element":"img","alt":"ŷ^t = ⊥,","inline":true,"padRight":true},{"text":"representing “I don’t know”. When ","element":"span"},{"style":{"height":18.33},"width":190.2,"height":45.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-26.png","element":"img","alt":"ŷ^t = ⊥, B","inline":true,"padRight":true},{"text":"receives feedback ","element":"span"},{"style":{"height":20.8},"width":626.68,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-27.png","element":"img","alt":" y^t such that E[y^t] = f(x^t). 
B is","inline":true,"padRight":true},{"text":"an (","element":"span"},{"style":{"height":15.6},"width":57.14,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-28.png","element":"img","alt":"ϵ, δ","inline":true},{"text":")-KWIK learning algorithm for ","element":"span"},{"style":{"height":17.6},"width":222,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-29.png","element":"img","alt":" C : X → [0,","inline":true,"padRight":true},{"text":"1], with KWIK bound ","element":"span"},{"style":{"height":17.6},"width":112.42,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-30.png","element":"img","alt":" m(ϵ, δ","inline":true},{"text":") if, for any sequence of examples ","element":"span"},{"style":{"height":17.93},"width":177.34,"height":44.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-31.png","element":"img","alt":" x^1, x^2, . . .","inline":true,"padRight":true},{"text":"and any target ","element":"span"},{"style":{"height":16.4},"width":113.39,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-32.png","element":"img","alt":" f ∈ C","inline":true},{"text":", with probability at least 1 ","element":"span"},{"style":{"height":15.6},"width":191.88,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-33.png","element":"img","alt":" − δ, both:","inline":true}],[{"text":"1. Its numerical predictions are accurate: for all ","element":"span"},{"style":{"height":18.73},"width":739.81,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-34.png","element":"img","alt":" t, ŷ^t ∈ {⊥} ∪ [f(x^t) − ϵ, f(x^t) + ϵ], and","inline":true}],[{"text":"2. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"text":"rarely outputs “I don’t know”: ","element":"span"},{"style":{"height":20.8},"width":501.34,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-35.png","element":"img","alt":"∑_{t=1}^∞ I[ŷ^t = ⊥] ≤ m(ϵ, δ).","inline":true}],[{"style":{"fontWeight":"bold"},"text":"2.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Specializing to Classic Stochastic Bandits","element":"span"}],[{"text":"In Sections ","element":"span"},{"text":"3 ","element":"span"},{"text":"and ","element":"span"},{"text":"4, ","element":"span"},{"text":"we study the classic stochastic bandit problem, an important special case of the contextual bandit setting described above. Here we specialize our notation to this setting, in which there are no contexts. For each arm ","element":"span"},{"style":{"height":17.6},"width":110.53,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-36.png","element":"img","alt":" j ∈ [k","inline":true},{"text":"], there is an unknown distribution ","element":"span"},{"style":{"height":20.49},"width":247.31,"height":51.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/6-37.png","element":"img","alt":" D_j over [0, 1]","inline":true,"padRight":true},{"text":"with unknown mean ","element":"span"},{"style":{"height":13.02},"width":41.3,"height":32.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-0.png","element":"img","alt":" µ_j","inline":true},{"text":". 
A learning algorithm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"chooses an arm ","element":"span"},{"style":{"height":15.02},"width":243.71,"height":37.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-1.png","element":"img","alt":" i_t in round t","inline":true},{"text":", and observes the reward ","element":"span"},{"style":{"height":20.82},"width":163.52,"height":52.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-2.png","element":"img","alt":" r_{i_t}^t ∼ D_{i_t} ","inline":true,"padRight":true},{"text":"for the arm that it chose. Let ","element":"span"},{"style":{"height":17.6},"width":124.95,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-3.png","element":"img","alt":" i∗ ∈ [k","inline":true},{"text":"] be the arm with the highest expected reward: ","element":"span"},{"style":{"height":19.73},"width":356.09,"height":49.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-4.png","element":"img","alt":"i∗ ∈ arg max_{i∈[k]} µ_i","inline":true},{"text":". The pseudo-regret of an algorithm ","element":"span"},{"style":{"height":18.12},"width":311.22,"height":45.29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-5.png","element":"img","alt":" A on D_1, . . . 
, D_k","inline":true,"padRight":true},{"text":"is now just:","element":"span"}],[{"style":{"width":"52%"},"width":989,"height":158,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-6.png","element":"img"}],[{"text":"Let ","element":"span"},{"style":{"height":20.84},"width":376.81,"height":52.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-7.png","element":"img","alt":" h^t ∈ ([k] × [0, 1])^{t−1} ","inline":true,"padRight":true},{"text":"denote a record of the ","element":"span"},{"style":{"height":11.6},"width":60.59,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-8.png","element":"img","alt":" t −","inline":true,"padRight":true},{"text":"1 rounds experienced by the algorithm so far, represented by ","element":"span"},{"style":{"height":11.6},"width":60.06,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-9.png","element":"img","alt":" t −","inline":true,"padRight":true},{"text":"1 2-tuples encoding the previous arms chosen and rewards observed. We write ","element":"span"},{"style":{"height":23.56},"width":80.34,"height":58.89,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-10.png","element":"img","alt":"π_j^t|h^t","inline":true,"padRight":true},{"text":"to denote the probability that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"chooses arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"given history ","element":"span"},{"style":{"height":14.73},"width":37.14,"height":36.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-11.png","element":"img","alt":" h^t","inline":true},{"text":". 
Again, we will often drop ","element":"span"},{"text":"the superscript ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"on the history when referring to the distribution over arms: ","element":"span"},{"style":{"height":23.56},"width":235.45,"height":58.89,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-12.png","element":"img","alt":" π_j^t|h := π_j^t|h^t.","inline":true}],[{"style":{"height":12.8},"width":20,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-13.png","element":"img","alt":"δ","inline":true},{"text":"-fairness in the classic bandit setting specializes as follows:","element":"span"}],[{"style":{"height":17.6},"width":301.97,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-14.png","element":"img","alt":"Definition 2 (δ","inline":true},{"text":"-fairness in the classic bandits setting)","element":"span"},{"style":{"height":14},"width":223.67,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-15.png","element":"img","alt":". A is δ-fair","inline":true,"padRight":true},{"text":"if, for all distributions ","element":"span"},{"style":{"height":17.31},"width":215.7,"height":43.28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-16.png","element":"img","alt":" D1, . . . 
, Dk,","inline":true,"padRight":true},{"text":"with probability at least 1 ","element":"span"},{"style":{"height":12.8},"width":63.64,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-17.png","element":"img","alt":" − δ","inline":true,"padRight":true},{"text":"over the history ","element":"span"},{"style":{"height":17.6},"width":646.85,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-18.png","element":"img","alt":" h, for all t ∈ [T] and all j, j′ ∈ [k]:","inline":true}],[{"style":{"width":"28%"},"width":527,"height":60,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-19.png","element":"img"}]]},{"heading":"3 Fair Classic Stochastic Bandits: An Algorithm","paragraphs":[[{"text":"In this section, we describe a simple and intuitive modification of the standard UCB algorithm ","element":"span"},{"href":"#id-9","referenceIndex":5,"text":"[Auer ","element":"a"},{"href":"#id-9","referenceIndex":5,"text":"et al., ","element":"a"},{"href":"#id-9","referenceIndex":5,"text":"2002]","element":"a"},{"text":", called ","element":"span"},{"text":"FairBandits","element":"span"},{"text":", prove that it is fair, and analyze its regret bound. The algorithm and its analysis highlight a key idea that is important to the design of fair algorithms in this setting: that of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"chaining ","element":"span"},{"text":"confidence intervals. 
Intuitively, as a ","element":"span"},{"style":{"height":12.8},"width":20,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-20.png","element":"img","alt":" δ","inline":true},{"text":"-fair algorithm explores different arms, it must play two arms ","element":"span"},{"style":{"height":16.4},"width":177.24,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-21.png","element":"img","alt":" j_1 and j_2","inline":true,"padRight":true},{"text":"with equal probability until it has sufficient data to deduce, with confidence 1 ","element":"span"},{"style":{"height":12.8},"width":67.46,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-22.png","element":"img","alt":" − δ","inline":true},{"text":", either that ","element":"span"},{"style":{"height":14.62},"width":190.84,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-23.png","element":"img","alt":" µ_{j_1} > µ_{j_2}","inline":true,"padRight":true},{"text":"or vice versa. ","element":"span"},{"text":"FairBandits ","element":"span"},{"text":"does this by maintaining empirical estimates of the means of both arms, together with confidence intervals around those means. To be safe, the algorithm must play the arms with equal probability while their confidence intervals overlap. The same reasoning applies simultaneously to every pair of arms. 
Thus, if the arms can be ordered so that the confidence intervals of each consecutive pair of arms ","element":"span"},{"style":{"height":16.4},"width":214.77,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-24.png","element":"img","alt":" j_i and j_{i+1}","inline":true,"padRight":true},{"text":"overlap for each ","element":"span"},{"style":{"height":17.6},"width":121.9,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-25.png","element":"img","alt":" i ∈ [k − 1","inline":true},{"text":"], the algorithm is forced to play ","element":"span"},{"style":{"fontStyle":"italic"},"text":"all ","element":"span"},{"text":"arms ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"with equal probability. This is the case even if the confidence intervals around arms ","element":"span"},{"style":{"height":16.4},"width":257.68,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-26.png","element":"img","alt":" j_k and j_1","inline":true,"padRight":true},{"text":"are far from overlapping, i.e., when the algorithm can be confident that ","element":"span"},{"style":{"height":14.62},"width":189.05,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-27.png","element":"img","alt":"µ_{j_1} > µ_{j_k}.","inline":true}],[{"text":"This approach initially seems naive: in an attempt to achieve fairness, it seems overly conservative when ruling out arms, and can be forced to play arms uniformly at random for long periods of time. 
This is reflected in its regret bound, which becomes non-trivial only once ","element":"span"},{"style":{"height":15.93},"width":144.46,"height":39.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-28.png","element":"img","alt":" T ≫ k^3","inline":true},{"text":", whereas the UCB algorithm ","element":"span"},{"href":"#id-9","referenceIndex":5,"text":"[Auer et al., ","element":"a"},{"href":"#id-9","referenceIndex":5,"text":"2002] ","element":"a"},{"text":"achieves non-trivial regret after ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"= ","element":"span"},{"style":{"fontStyle":"italic"},"text":"O","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":") rounds. However, our lower bound in Section ","element":"span"},{"text":"4 ","element":"span"},{"text":"shows that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"any ","element":"span"},{"text":"fair algorithm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"must ","element":"span"},{"text":"suffer constant per-round regret for ","element":"span"},{"style":{"height":15.93},"width":140.53,"height":39.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-29.png","element":"img","alt":"T ≪ k^3 ","inline":true,"padRight":true},{"text":"rounds on some instances.","element":"span"}],[{"text":"We now give an overview of the behavior of ","element":"span"},{"text":"FairBandits","element":"span"},{"text":". 
At every round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":", ","element":"span"},{"text":"FairBandits ","element":"span"},{"text":"identifies the arm ","element":"span"},{"style":{"height":19.22},"width":299.74,"height":48.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-30.png","element":"img","alt":" i_∗^t = arg max_i u_i^t ","inline":true,"padRight":true},{"text":"that has the largest ","element":"span"},{"style":{"fontStyle":"italic"},"text":"upper ","element":"span"},{"text":"confidence bound among the active ","element":"span"},{"text":"arms. At each round ","element":"span"},{"style":{"height":21.62},"width":1458.77,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-31.png","element":"img","alt":" t, we say i is linked to j if [ℓ_i^t, u_i^t] ∩ [ℓ_j^t, u_j^t] ≠ ∅, and i is chained to j if i and","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"are in the same component of the transitive closure of the linked relation. ","element":"span"},{"text":"FairBandits ","element":"span"},{"text":"plays uniformly at random among all active arms chained to arm ","element":"span"},{"style":{"height":18.65},"width":45.96,"height":46.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/7-32.png","element":"img","alt":" i_∗^t.","inline":true}],[{"text":"Initially, the active set contains all arms. The active set of arms at each subsequent round is defined to be the set of arms that are chained to the arm with the highest upper confidence bound in the previous round. The algorithm can be confident that arms that have become unchained from the arm with the highest upper confidence bound at any round have means that are lower than the means of any chained arms, and hence such arms can be safely removed from the active set, never to be played again. 
This has the useful property that the active set of arms can only shrink: at any round ","element":"span"},{"style":{"height":15.6},"width":223.92,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/8-0.png","element":"img","alt":" t, S_t ⊆ S_{t−1}","inline":true},{"text":"; see Figure ","element":"span"},{"href":"#id-27","text":"1 ","element":"a"},{"text":"for an example of how the active set evolves over time.","element":"span"}],[{"style":{"width":"100%"},"width":1872,"height":910,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/8-1.png","element":"img"}],[{"text":"We first observe that with probability 1 ","element":"span"},{"style":{"height":12.8},"width":68.19,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/8-2.png","element":"img","alt":" − δ","inline":true},{"text":", all of the confidence intervals maintained by ","element":"span"},{"style":{"height":17.6},"width":318.63,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/8-3.png","element":"img","alt":"FairBandits (δ","inline":true},{"text":") contain the true means of their respective arms over all rounds. We prove this claim, along with all other claims in this section whose proofs are omitted here, in Appendix ","element":"span"},{"text":"A.","element":"span"}],[{"id":"id-28","style":{"fontWeight":"bold"},"text":"Lemma 1. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"With probability at least ","element":"span"},{"text":"1 ","element":"span"},{"style":{"height":12.8},"width":63.64,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/8-4.png","element":"img","alt":" − δ","inline":true},{"style":{"fontStyle":"italic"},"text":", for every arm ","element":"span"},{"style":{"height":19.23},"width":511.9,"height":48.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/8-5.png","element":"img","alt":" i and round t, ℓ_i^t ≤ µ_i ≤ u_i^t.","inline":true}],[{"text":"The fairness of ","element":"span"},{"text":"FairBandits ","element":"span"},{"text":"follows almost immediately from this guarantee.","element":"span"}],[{"style":{"width":"41%"},"width":784,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/8-6.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"By Lemma ","element":"span"},{"href":"#id-28","text":"1, ","element":"a"},{"text":"with probability at least 1","element":"span"},{"style":{"height":12.8},"width":58.9,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/8-7.png","element":"img","alt":"−δ","inline":true,"padRight":true},{"text":"all confidence intervals contain their true means across all rounds. 
Thus, with probability 1","element":"span"},{"style":{"height":12.8},"width":59.62,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/8-8.png","element":"img","alt":"−δ","inline":true},{"text":", at every round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":", for every ","element":"span"},{"style":{"height":18.73},"width":246.33,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/8-9.png","element":"img","alt":" i ∈ S_t, j ∉ S_t","inline":true},{"text":", it must be that ","element":"span"},{"style":{"height":14.62},"width":142.93,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/8-10.png","element":"img","alt":" µ_j < µ_i","inline":true,"padRight":true},{"text":"– the arms not in the active set have strictly smaller means than those in the active set; if not, ","element":"span"},{"style":{"height":21.62},"width":537.9,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/8-11.png","element":"img","alt":" u_j^t ≥ µ_j ≥ µ_i ≥ ℓ_i^t implies j","inline":true,"padRight":true},{"text":"would be chained to ","element":"span"},{"style":{"height":18.65},"width":107.36,"height":46.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/8-12.png","element":"img","alt":" i_∗^t if i","inline":true,"padRight":true},{"text":"is. 
Finally, all arms in ","element":"span"},{"style":{"height":14.73},"width":118.89,"height":36.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/8-13.png","element":"img","alt":" St are","inline":true,"padRight":true},{"text":"played uniformly at random – but since all such arms are played with the same probability, this does not cause the fairness constraint to bind for any pair ","element":"span"},{"style":{"height":17.54},"width":164.57,"height":43.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/8-14.png","element":"img","alt":" i, i′ ∈ St","inline":true},{"text":", for any realization of ","element":"span"},{"style":{"height":12.89},"width":101.5,"height":32.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/8-15.png","element":"img","alt":" µi, µ′i","inline":true,"padRight":true},{"text":"which lie within their confidence intervals.","element":"span"}],[{"text":"Next, we upper bound the regret of ","element":"span"},{"text":"FairBandits","element":"span"},{"text":".","element":"span"}],[{"id":"id-29","style":{"height":19.98},"width":903.6,"height":49.94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/8-16.png","element":"img","alt":"Theorem 2. 
If δ < 1/√T, then FairBandits","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"has regret","element":"span"}],[{"style":{"width":"27%"},"width":518,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/8-17.png","element":"img"}],[{"id":"id-27","style":{"width":"44%"},"width":835,"height":109,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/9-0.png","element":"img"}],[{"text":"Figure 1: Confidence intervals over time for the lower-bound instance outlined in Section ","element":"figcaption","subtype":"caption"},{"text":"4 ","element":"span","subtype":"caption"},{"text":"for ","element":"figcaption","subtype":"caption"},{"style":{"fontStyle":"italic"},"text":"k ","element":"figcaption","subtype":"caption"},{"text":"= 10. Lines show the upper and lower confidence bounds for each arm and are cut off at the round in which the arm leaves the active set.","element":"figcaption","subtype":"caption"}],[{"style":{"width":"45%"},"width":859,"height":214,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/9-1.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Remark ","element":"span"},{"text":"2","element":"span"},{"style":{"fontStyle":"italic"},"text":". ","element":"span"},{"text":"Before proving Theorem ","element":"span"},{"href":"#id-29","text":"2, ","element":"a"},{"text":"we highlight two points. First, this bound becomes non-trivial (i.e. the average per-round regret is ","element":"span"},{"style":{"height":19.13},"width":348.89,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/9-2.png","element":"img","alt":" ≪ 1) for T = Ω(k3","inline":true},{"text":"). As we show in the next section, this threshold cannot be improved. 
Second, the bound may appear to have suboptimal dependence on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"compared to unconstrained regret bounds (where the dependence on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"is often described as logarithmic). However, it is known that Ω","element":"span"},{"style":{"height":31.6},"width":513.74,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/9-3.png","element":"img","alt":"(√kT) regret is necessary","inline":true,"padRight":true},{"text":"even in the unrestricted setting (without fairness) if one makes no data-specific assumptions about the instance ","element":"span"},{"href":"#id-30","referenceIndex":9,"text":"[Bubeck ","element":"a"},{"href":"#id-30","referenceIndex":9,"text":"and Cesa-Bianchi, ","element":"a"},{"href":"#id-30","referenceIndex":9,"text":"2012] ","element":"a"},{"text":"(e.g. that there is a lower bound on the gap between the best and second-best arm). One could also state a logarithmic dependence on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"in our setting under assumptions on the gaps between arms, but since our fairness constraint manifests itself as a cost that depends on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":", for clarity we avoid such assumptions. Without them, our dependence on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"is also optimal.","element":"span"}],[{"text":"We now prove Theorem ","element":"span"},{"href":"#id-29","text":"2. 
","element":"a"},{"text":"Lemma ","element":"span"},{"href":"#id-31","text":"2 ","element":"a"},{"text":"upper bounds the probability that any arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"active in round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"has been pulled substantially fewer times than its expectation, i.e. ","element":"span"},{"style":{"height":20.89},"width":144.41,"height":52.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/9-4.png","element":"img","alt":" nti ≪ tk","inline":true},{"text":". Lemma ","element":"span"},{"href":"#id-32","text":"3 ","element":"a"},{"text":"upper ","element":"span"},{"text":"bounds the width of any confidence interval used by ","element":"span"},{"style":{"height":17.6},"width":601.49,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/9-5.png","element":"img","alt":" FairBandits in round t by η(t","inline":true},{"text":"), conditioned on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"being pulled the number of times guaranteed by Lemma ","element":"span"},{"href":"#id-31","text":"2. 
","element":"a"},{"text":"Finally, we combine these results to prove Theorem ","element":"span"},{"href":"#id-29","text":"2 ","element":"a"},{"text":"by upper bounding the total regret incurred over ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"rounds, noting that the regret of any arm active in round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"is at most ","element":"span"},{"style":{"height":17.6},"width":109.02,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/9-6.png","element":"img","alt":" kη(t).","inline":true}],[{"text":"We begin by upper bounding the probability that any arm active in round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"has been pulled substantially fewer times than its expectation.","element":"span"}],[{"id":"id-31","style":{"fontWeight":"bold"},"text":"Lemma 2. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"With probability at least ","element":"span"},{"text":"1 ","element":"span"},{"style":{"height":22.95},"width":111.98,"height":57.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/9-7.png","element":"img","alt":" − δ2t2 ,","inline":true}],[{"style":{"width":"26%"},"width":497,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/9-8.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"for all ","element":"span"},{"style":{"height":15.13},"width":109.64,"height":37.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/9-9.png","element":"img","alt":" i ∈ St ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"(for all active arms in round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"style":{"fontStyle":"italic"},"text":").","element":"span"}],[{"text":"We now use this lower bound on the number of pulls of active arm 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"in round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"to upper-bound ","element":"span"},{"style":{"height":17.6},"width":56.2,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/9-10.png","element":"img","alt":"η(t","inline":true},{"text":"), an upper bound on the confidence interval width ","element":"span"},{"text":"FairBandits ","element":"span"},{"text":"uses for any active arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"in round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":".","element":"span"}],[{"style":{"width":"8%"},"width":159,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-0.png","element":"img"}],[{"id":"id-32","style":{"fontWeight":"bold"},"text":"Lemma 3. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Consider any round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and any arm ","element":"span"},{"style":{"height":15.13},"width":109.64,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-1.png","element":"img","alt":" i ∈ St","inline":true},{"style":{"fontStyle":"italic"},"text":". 
Condition on ","element":"span"},{"style":{"height":20.89},"width":170.34,"height":52.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-2.png","element":"img","alt":" nti ≥ tk −","inline":true}],[{"style":{"width":"38%"},"width":729,"height":210,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-3.png","element":"img"}],[{"text":"Finally, we bound the total regret of the algorithm, using the bound on the width of any active arm’s confidence interval in round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"provided by Lemma ","element":"span"},{"href":"#id-32","text":"3.","element":"a"}],[{"style":{"fontStyle":"italic"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-29","style":{"fontStyle":"italic"},"text":"2. ","element":"a"},{"text":"We condition on ","element":"span"},{"style":{"height":19.22},"width":403.59,"height":48.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-4.png","element":"img","alt":" µi ∈ [ℓti, uti] for all i, t","inline":true},{"text":". This occurs with probability at least ","element":"span"},{"text":"1 ","element":"span"},{"style":{"height":12.8},"width":61.86,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-5.png","element":"img","alt":" − δ","inline":true},{"text":", by Lemma ","element":"span"},{"href":"#id-28","text":"1. ","element":"a"},{"text":"We claim that this implies that the arm ","element":"span"},{"style":{"height":14.62},"width":32.03,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-6.png","element":"img","alt":" i∗","inline":true,"padRight":true},{"text":"with the highest expected reward is always in the active set. 
This follows from the fact that ","element":"span"},{"style":{"height":21.62},"width":935.52,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-7.png","element":"img","alt":" µi∗ ∈ [ℓti∗, uti∗] and µj ∈ [ℓtj, utj] for all j, t; thus, if","inline":true},{"style":{"height":14.62},"width":158.37,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-8.png","element":"img","alt":"µi∗ > µj","inline":true},{"text":", it must be that ","element":"span"},{"style":{"height":21.62},"width":148.94,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-9.png","element":"img","alt":" uti∗ ≥ ℓtj","inline":true},{"text":". In particular, this holds for ","element":"span"},{"style":{"height":18.65},"width":32.03,"height":46.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-10.png","element":"img","alt":" it∗","inline":true},{"text":", the arm with the highest upper confidence ","element":"span"},{"text":"bound in round ","element":"span"},{"style":{"height":15.2},"width":128.03,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-11.png","element":"img","alt":" t, so i∗","inline":true,"padRight":true},{"text":"must be chained to ","element":"span"},{"style":{"height":18.65},"width":411.06,"height":46.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-12.png","element":"img","alt":" it∗ in round t for all t.","inline":true}],[{"text":"We further condition on the event that for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j, t","element":"span"},{"text":",","element":"span"}],[{"style":{"width":"25%"},"width":478,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-13.png","element":"img"}],[{"text":"which holds with probability at least 1 
","element":"span"},{"style":{"height":22.49},"width":83.72,"height":56.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-14.png","element":"img","alt":" − πδ2 ","inline":true,"padRight":true},{"text":"by Lemma ","element":"span"},{"href":"#id-31","text":"2 ","element":"a"},{"text":"and a union bound over all times ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":". This ","element":"span"},{"text":"implies that, for all rounds ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":", for every active arm ","element":"span"},{"style":{"height":18.33},"width":115.07,"height":45.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-15.png","element":"img","alt":" j ∈ St","inline":true},{"text":", Lemma ","element":"span"},{"href":"#id-32","text":"3 ","element":"a"},{"text":"applies, and therefore","element":"span"}],[{"style":{"width":"14%"},"width":271,"height":56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-16.png","element":"img"}],[{"text":"Finally, we upper-bound the per-round regret of pulling any active arm ","element":"span"},{"style":{"height":16.95},"width":497.05,"height":42.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-17.png","element":"img","alt":" i ∈ St at round t. Since i∗","inline":true,"padRight":true},{"text":"is active, any ","element":"span"},{"style":{"height":15.13},"width":114.14,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-18.png","element":"img","alt":" i ∈ St ","inline":true,"padRight":true},{"text":"is chained to arm ","element":"span"},{"style":{"height":14.62},"width":32.03,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-19.png","element":"img","alt":" i∗","inline":true},{"text":". 
Since all active arms have confidence interval width at most ","element":"span"},{"style":{"height":17.6},"width":187.32,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-20.png","element":"img","alt":" η(t) and i","inline":true,"padRight":true},{"text":"is chained to i∗ through at most ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"arms’ confidence intervals, we have that","element":"span"}],[{"style":{"width":"18%"},"width":337,"height":53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-21.png","element":"img"}],[{"text":"Since ","element":"span"},{"style":{"height":20.82},"width":395.42,"height":52.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-22.png","element":"img","alt":" µi ≥ ℓti and uti∗ ≥ µi∗","inline":true},{"text":", it follows that ","element":"span"},{"style":{"height":18.73},"width":628.2,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-23.png","element":"img","alt":" |µi − µi∗| ≤ k · η(t) for any i ∈ St","inline":true},{"text":". 
Finally, summing ","element":"span"},{"text":"over all rounds ","element":"span"},{"style":{"height":12.8},"width":100.09,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-24.png","element":"img","alt":" t ∈ T","inline":true},{"text":", we obtain","element":"span"}],[{"style":{"width":"89%"},"width":1670,"height":359,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-25.png","element":"img"}],[{"text":"where this bound is derived in Appendix ","element":"span"},{"href":"#id-33","text":"A.1.","element":"a"}]]},{"heading":"4 Fair Classic Stochastic Bandits: A Lower Bound","paragraphs":[[{"text":"We now show that the regret bound for ","element":"span"},{"text":"FairBandits ","element":"span"},{"text":"has an optimal dependence on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":": ","element":"span"},{"style":{"fontStyle":"italic"},"text":"no ","element":"span"},{"text":"fair algorithm has diminishing regret before ","element":"span"},{"style":{"height":19.13},"width":180.08,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/10-26.png","element":"img","alt":" T = Ω(k3","inline":true},{"text":") rounds. All missing proofs are in Appendix ","element":"span"},{"text":"B. ","element":"span"},{"text":"The main result of this section is the following.","element":"span"}],[{"id":"id-34","style":{"fontWeight":"bold"},"text":"Theorem 3. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"There is a distribution ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"style":{"fontStyle":"italic"},"text":"over ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"style":{"fontStyle":"italic"},"text":"-arm instances of the stochastic multi-armed bandit problem such that any fair algorithm run on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"style":{"fontStyle":"italic"},"text":"experiences constant per-round regret for at least","element":"span"}],[{"style":{"width":"16%"},"width":308,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/11-0.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"rounds.","element":"span"}],[{"text":"Although ","element":"span"},{"style":{"fontStyle":"italic"},"text":"regret ","element":"span"},{"text":"is defined in a prior-free way, the proof of Theorem ","element":"span"},{"href":"#id-34","text":"3 ","element":"a"},{"text":"proceeds via Bayesian reasoning. We construct a family of lower-bound instances in which arms have payoffs drawn from Bernoulli distributions, denoted ","element":"span"},{"style":{"height":17.6},"width":314.19,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/11-1.png","element":"img","alt":" B(µ) for mean µ","inline":true},{"text":". Thus, to specify a problem instance, it suffices to give a mean for each of the ","element":"span"},{"style":{"height":16.4},"width":345.47,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/11-2.png","element":"img","alt":" k arms: µ1, . . . , µk","inline":true},{"text":". The proof formalizes the following outline.","element":"span"}],[{"text":"1. 
We define an instance distribution ","element":"span"},{"style":{"height":14.84},"width":321,"height":37.11,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/11-3.png","element":"img","alt":" P = P1×. . .×Pk","inline":true,"padRight":true},{"text":"over means ","element":"span"},{"style":{"height":12},"width":38.29,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/11-4.png","element":"img","alt":" µi","inline":true,"padRight":true},{"text":"(Definition ","element":"span"},{"href":"#id-35","text":"3)","element":"a"},{"text":". ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"text":"will have two important properties. First, we will draw means from ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"text":"such that for any ","element":"span"},{"style":{"height":17.6},"width":214.41,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/11-5.png","element":"img","alt":" i ∈ [k − 1],","inline":true},{"style":{"height":12},"width":179.98,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/11-6.png","element":"img","alt":"µi = µi+1","inline":true,"padRight":true},{"text":"with probability at least 1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"4. Second, for any realization of means drawn from ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":", if an algorithm plays uniformly at random over [","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":"], it will suffer constant per-round regret.","element":"span"}],[{"text":"2. 
We treat ","element":"span"},{"style":{"height":15.6},"width":261.36,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/11-7.png","element":"img","alt":" Pi as a prior","inline":true,"padRight":true},{"text":"distribution over the mean ","element":"span"},{"style":{"height":12},"width":38.29,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/11-8.png","element":"img","alt":" µi","inline":true},{"text":", and analyze the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"posterior ","element":"span"},{"text":"distribution ","element":"span"},{"style":{"height":19.62},"width":247.6,"height":49.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/11-9.png","element":"img","alt":"Pi(r1i , . . . , rti","inline":true},{"text":") over means that results from applying Bayes’ rule to the payoff observations ","element":"span"},{"style":{"height":19.62},"width":169.7,"height":49.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/11-10.png","element":"img","alt":"r1i , . . . , rti ","inline":true,"padRight":true},{"text":"made by the algorithm. Bayes’ rule implies (Lemma ","element":"span"},{"href":"#id-36","text":"4) ","element":"a"},{"text":"that the joint distribution over ","element":"span"},{"text":"rewards and means drawn from ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"text":"is identical to the distribution that first draws means according to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P","element":"span"},{"text":", then draws rewards conditioned on those means, and finally ","element":"span"},{"style":{"fontStyle":"italic"},"text":"resamples ","element":"span"},{"text":"the means from the posterior distribution on means. 
","element":"span"},{"text":"Thus, we can reason about fairness (a frequentist quantity) by analyzing the Bayesian posterior distribution on means conditioned on the observed rewards.","element":"span"}],[{"text":"3. A ","element":"span"},{"style":{"height":12.8},"width":20,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/11-11.png","element":"img","alt":" δ","inline":true},{"text":"-fair algorithm, for any set of means realized from the instance (prior) distribution, must not play arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"+ 1 with lower probability than arm ","element":"span"},{"style":{"height":16.4},"width":261.62,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/11-12.png","element":"img","alt":" i if µi = µi+1","inline":true},{"text":", except with probability ","element":"span"},{"style":{"height":12.8},"width":20,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/11-13.png","element":"img","alt":"δ","inline":true},{"text":". 
By the above change of perspective, any ","element":"span"},{"style":{"height":12.8},"width":20,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/11-14.png","element":"img","alt":" δ","inline":true},{"text":"-fair algorithm must play arms ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"text":"+1 with equal probability until the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"posterior ","element":"span"},{"text":"distribution on means given the observed rewards satisfies ","element":"span"},{"style":{"height":17.6},"width":354.74,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/11-15.png","element":"img","alt":" P [µi = µi+1|h] < δ","inline":true,"padRight":true},{"text":"(Lemmas ","element":"span"},{"href":"#id-37","text":"5 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-38","text":"6)","element":"a"},{"text":".","element":"span"}],[{"text":"4. Finally, we lower-bound the number of reward observations needed before the posterior distribution on means given the payoffs satisfies ","element":"span"},{"style":{"height":17.6},"width":357.83,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/11-16.png","element":"img","alt":" P [µi = µi+1|h] < δ","inline":true,"padRight":true},{"text":"for any pair of adjacent arms ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i, i ","element":"span"},{"text":"+ 1. ","element":"span"},{"text":"We show that this is Ω(","element":"span"},{"style":{"height":15.13},"width":41.1,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/11-17.png","element":"img","alt":"k2","inline":true},{"text":") (Lemma ","element":"span"},{"href":"#id-39","text":"7)","element":"a"},{"text":". 
","element":"span"},{"text":"Since fair algorithms must play from among the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"arms uniformly at random until this point, with high probability no arm accumulates enough reward observations before ","element":"span"},{"style":{"height":19.13},"width":179.32,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/11-18.png","element":"img","alt":" T = Ω(k3","inline":true},{"text":") rounds of play.","element":"span"}],[{"text":"We begin by describing our distribution over instances. Each arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"text":"’s payoff distribution will be Bernoulli with mean ","element":"span"},{"style":{"height":16},"width":138.02,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/11-19.png","element":"img","alt":" µi ∼ Pi","inline":true,"padRight":true},{"text":"drawn independently of the other arms.","element":"span"}],[{"id":"id-35","style":{"fontWeight":"bold"},"text":"Definition 3 ","element":"span"},{"text":"(Prior Distribution over ","element":"span"},{"style":{"height":17.6},"width":70.79,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/11-20.png","element":"img","alt":" µi).","inline":true,"padRight":true},{"text":"For each arm ","element":"span"},{"style":{"height":16},"width":87.88,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/11-21.png","element":"img","alt":" i, µi","inline":true,"padRight":true},{"text":"is distributed according to the following probability mass function:","element":"span"}],[{"style":{"width":"71%"},"width":1345,"height":206,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/11-22.png","element":"img"}],[{"text":"We treat ","element":"span"},{"style":{"fontStyle":"italic"},"text":"P ","element":"span"},{"text":"as a prior 
distribution over instances, and analyze the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"posterior ","element":"span"},{"text":"distribution on instances given the realized rewards. Lemma ","element":"span"},{"href":"#id-36","text":"4 ","element":"a"},{"text":"justifies this reasoning.","element":"span"}],[{"id":"id-36","style":{"fontWeight":"bold"},"text":"Lemma 4. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Consider the following two experiments: In the first, let ","element":"span"},{"style":{"height":19.62},"width":495.7,"height":49.05,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-0.png","element":"img","alt":" µi ∼ Pi and r1i , . . . , rti ∼","inline":true},{"style":{"height":17.6},"width":280.39,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-1.png","element":"img","alt":"B(µi), and W","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"denote the joint distribution on ","element":"span"},{"text":"(","element":"span"},{"style":{"height":19.62},"width":248.14,"height":49.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-2.png","element":"img","alt":"µi, r1i , . . . , rti)","inline":true},{"style":{"fontStyle":"italic"},"text":". In the second, let ","element":"span"},{"style":{"height":16.4},"width":263.34,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-3.png","element":"img","alt":" µi ∼ Pi, and","inline":true},{"style":{"height":19.62},"width":349.53,"height":49.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-4.png","element":"img","alt":"r1i , . . . , rti ∼ B(µi)","inline":true},{"style":{"fontStyle":"italic"},"text":", and then re-draw the mean ","element":"span"},{"style":{"height":19.62},"width":355.78,"height":49.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-5.png","element":"img","alt":" µ′i ∼ Pi(r1i , . . . 
, rti)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"from its posterior distribution ","element":"span"},{"style":{"fontStyle":"italic"},"text":"given the rewards. Let ","element":"span"},{"text":"(","element":"span"},{"style":{"height":19.62},"width":736.22,"height":49.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-6.png","element":"img","alt":"µ′i, r1i , . . . , rti) ∼ W ′. Then, W and W ′","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"are identical distributions.","element":"span"}],[{"text":"Next, we lower-bound the number of reward observations needed so that, for some ","element":"span"},{"style":{"height":17.6},"width":129.52,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-7.png","element":"img","alt":" i ∈ [k]:","inline":true},{"style":{"height":20.8},"width":398.57,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-8.png","element":"img","alt":"P[µi = µi+1|ht] < δ","inline":true,"padRight":true},{"text":"with respect to the posterior. ","element":"span"},{"text":"It will be useful to say that an algorithm’s history ","element":"span"},{"style":{"fontStyle":"italic"},"text":"distinguishes ","element":"span"},{"text":"the mean of an arm if, given that history, the mean is determined with high probability.","element":"span"}],[{"style":{"height":17.6},"width":307.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-9.png","element":"img","alt":"Definition 4 (δ","inline":true},{"text":"-distinguishing)","element":"span"},{"style":{"fontWeight":"bold"},"text":". 
","element":"span"},{"text":"We will say ","element":"span"},{"style":{"height":18.33},"width":595.08,"height":45.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-10.png","element":"img","alt":" ht δ-distinguishes arm i for A","inline":true,"padRight":true},{"text":"if, for some ","element":"span"},{"style":{"height":17.6},"width":180.7,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-11.png","element":"img","alt":" α ∈ [0, 1],","inline":true}],[{"style":{"width":"27%"},"width":511,"height":58,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-12.png","element":"img"}],[{"text":"The next lemma shows that if no arm is ","element":"span"},{"style":{"height":17.6},"width":56.37,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-13.png","element":"img","alt":"√δ","inline":true},{"text":"-distinguished by a history, then every pair of adjacent arms ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i, i ","element":"span"},{"text":"+ 1 has posterior probability strictly greater than ","element":"span"},{"style":{"height":12.8},"width":20,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-14.png","element":"img","alt":" δ","inline":true,"padRight":true},{"text":"of having equal means.","element":"span"}],[{"id":"id-37","style":{"fontWeight":"bold"},"text":"Lemma 5. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Suppose ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"style":{"fontStyle":"italic"},"text":"has history ","element":"span"},{"style":{"height":18.88},"width":535.51,"height":47.19,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-15.png","element":"img","alt":" ht, and that ht does not√δ","inline":true},{"style":{"fontStyle":"italic"},"text":"-distinguish any arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"style":{"fontStyle":"italic"},"text":". Then, for all arms ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i, i ","element":"span"},{"text":"+ 1","element":"span"},{"style":{"fontStyle":"italic"},"text":",","element":"span"}],[{"style":{"width":"20%"},"width":389,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-16.png","element":"img"}],[{"text":"Now, we prove that for any fair algorithm, with probability ","element":"span"},{"style":{"height":21.29},"width":67.84,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-17.png","element":"img","alt":" ≥ 12 ","inline":true,"padRight":true},{"text":"over the draw of histories ","element":"span"},{"style":{"height":17.53},"width":101.37,"height":43.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-18.png","element":"img","alt":" ht, ht","inline":true,"padRight":true},{"text":"must","element":"span"},{"style":{"height":17.6},"width":78.18,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-19.png","element":"img","alt":"√2δ","inline":true},{"text":"-distinguish some arm, or the algorithm must play uniformly across all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"arms conditioned on 
","element":"span"},{"style":{"height":14.74},"width":51.36,"height":36.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-20.png","element":"img","alt":" ht.","inline":true}],[{"id":"id-38","style":{"fontWeight":"bold"},"text":"Lemma 6. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Suppose an algorithm ","element":"span"},{"style":{"height":14},"width":117.29,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-21.png","element":"img","alt":" A is δ","inline":true},{"style":{"fontStyle":"italic"},"text":"-fair. Then:","element":"span"}],[{"style":{"width":"77%"},"width":1447,"height":105,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-22.png","element":"img"}],[{"text":"We now lower-bound the number of observations from arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"which are required to ","element":"span"},{"style":{"height":12.8},"width":20,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-23.png","element":"img","alt":" δ","inline":true},{"text":"-distinguish it.","element":"span"}],[{"id":"id-39","style":{"height":21.29},"width":755.39,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-24.png","element":"img","alt":"Lemma 7. Fix any δ < 18. Let µi ∼ Pi","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"as in Definition ","element":"span"},{"href":"#id-35","style":{"fontStyle":"italic"},"text":"3. 
","element":"a"},{"style":{"fontStyle":"italic"},"text":"Then, arm ","element":"span"},{"style":{"height":17.6},"width":156.27,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-25.png","element":"img","alt":" i is√2δ","inline":true},{"style":{"fontStyle":"italic"},"text":"-distinguishable by ","element":"span"},{"style":{"height":21.29},"width":1099.3,"height":53.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-26.png","element":"img","alt":"ht only if Ti = Ω(k2 ln 1δ), where Ti = |{t′ : ht′2 = i, t′ ≤ t}|","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"is the number of times arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"style":{"fontStyle":"italic"},"text":"is played.","element":"span"}],[{"style":{"height":21.29},"width":419.48,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-27.png","element":"img","alt":"Proof. Write p, p + 13k ","inline":true,"padRight":true},{"text":"to represent the two possible realizations that ","element":"span"},{"style":{"height":12},"width":38.3,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-28.png","element":"img","alt":" µi","inline":true,"padRight":true},{"text":"might take, when drawn ","element":"span"},{"text":"from the distribution over instances given in Definition ","element":"span"},{"href":"#id-35","text":"3. 
","element":"a"},{"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"represent the event that ","element":"span"},{"style":{"height":12},"width":127.78,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-29.png","element":"img","alt":" µi = p","inline":true,"padRight":true},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"text":"the event that ","element":"span"},{"style":{"height":21.7},"width":496.56,"height":54.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-30.png","element":"img","alt":" µi = p + 13k. Let δ′ =√2δ","inline":true,"padRight":true},{"text":"throughout.","element":"span"}],[{"text":"Fix a history ","element":"span"},{"style":{"height":17.53},"width":359.2,"height":43.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-31.png","element":"img","alt":" ht, and let m = Ti","inline":true,"padRight":true},{"text":"represent the number of observations of arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"text":"’s reward. We will abuse notation and use ","element":"span"},{"style":{"height":19.22},"width":37.14,"height":48.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-32.png","element":"img","alt":" hti ","inline":true,"padRight":true},{"text":"to refer to the payoff sequence of arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"observed in history ","element":"span"},{"style":{"height":19.22},"width":152.99,"height":48.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-33.png","element":"img","alt":" ht. 
hti is","inline":true,"padRight":true},{"text":"therefore a binary sequence of length ","element":"span"},{"style":{"height":19.22},"width":309.91,"height":48.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-34.png","element":"img","alt":" m; let ‖hti‖₀ = s","inline":true,"padRight":true},{"text":"denote the number of 1s in the sequence. We ","element":"span"},{"text":"will calculate conditions under which ","element":"span"},{"style":{"height":19.22},"width":137.47,"height":48.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-35.png","element":"img","alt":" hti, m, s","inline":true,"padRight":true},{"text":"will imply that either ","element":"span"},{"style":{"height":37.15},"width":578.8,"height":92.87,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-36.png","element":"img","alt":" (1 − δ′)/δ′ ≤ P[B|hti]/P[A|hti] or P[B|hti]/P[A|hti] ≤ δ′/(1 − δ′)","inline":true}],[{"text":"holds, implying that one of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"or ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"text":"has posterior probability at least 1 ","element":"span"},{"style":{"height":12.8},"width":82.93,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-37.png","element":"img","alt":" − δ′","inline":true},{"text":", conditioned on the observed rewards. 
If neither of these is the case, ","element":"span"},{"style":{"height":12.8},"width":188.08,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-38.png","element":"img","alt":" i is not δ′","inline":true},{"text":"-distinguished by ","element":"span"},{"style":{"height":14.73},"width":51.37,"height":36.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-39.png","element":"img","alt":" ht.","inline":true}],[{"text":"We begin by rewriting this ratio as ","element":"span"},{"style":{"height":37.15},"width":410.38,"height":92.87,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-40.png","element":"img","alt":" X = P[B|hti]/P[A|hti] = P[hti|B]/P[hti|A","inline":true},{"text":"], which follows from ","element":"span"},{"text":"Bayes’ rule and the fact that ","element":"span"},{"text":"P ","element":"span"},{"text":"[","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":"] = ","element":"span"},{"text":"P ","element":"span"},{"text":"[","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"]. We wish to upper- and lower-bound ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"text":"in terms of ","element":"span"},{"style":{"height":19.22},"width":68.5,"height":48.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-41.png","element":"img","alt":" hti’s","inline":true,"padRight":true},{"text":"value. 
By definition of the Bernoulli distribution, we have that","element":"span"}],[{"style":{"width":"82%"},"width":1542,"height":118,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/12-42.png","element":"img"}],[{"text":"We now determine under what conditions either (a) ","element":"span"},{"style":{"height":22.49},"width":505.17,"height":56.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/13-0.png","element":"img","alt":" X ≤ δ′/(1 − δ′), or (b) X ≥ (1 − δ′)/δ′","inline":true,"padRight":true},{"text":". One of these must ","element":"span"},{"text":"hold if ","element":"span"},{"style":{"height":12.8},"width":110.72,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/13-1.png","element":"img","alt":" i is δ′","inline":true},{"text":"-distinguished. Before doing so, we note that a Chernoff bound implies that, with probability 1 ","element":"span"},{"style":{"height":12.8},"width":80.68,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/13-2.png","element":"img","alt":" − δ′","inline":true},{"text":", Equations ","element":"span"},{"href":"#id-40","text":"1 ","element":"a"},{"text":"and ","element":"span"},{"href":"#id-40","text":"2 ","element":"a"},{"text":"hold under events ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":", respectively:","element":"span"}],[{"id":"id-40","style":{"width":"85%"},"width":1602,"height":84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/13-3.png","element":"img"}],[{"text":"since the expected sum of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m ","element":"span"},{"text":"Bernoulli trials with mean ","element":"span"},{"style":{"height":21.29},"width":649.94,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/13-4.png","element":"img","alt":" p (or p + 
1/(3k)) is mp (or mp + m/(3k)).","inline":true,"padRight":true},{"text":"We begin by analyzing case (a), where ","element":"span"},{"style":{"height":20.08},"width":441.18,"height":50.19,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/13-5.png","element":"img","alt":" δ′ = √(2δ) < 1/2 implies","inline":true}],[{"style":{"width":"55%"},"width":1041,"height":107,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/13-6.png","element":"img"}],[{"text":"Taking logarithms of both sides, we have that","element":"span"}],[{"style":{"width":"92%"},"width":1734,"height":105,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/13-7.png","element":"img"}],[{"text":"where the inequality follows from ln(1 + ","element":"span"},{"style":{"height":20.42},"width":462.88,"height":51.05,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/13-8.png","element":"img","alt":" x) ≥ x/(x + 1) for x ∈ (−1, ∞","inline":true},{"text":"). This, in turn, implies that","element":"span"}],[{"text":"(3","element":"span"},{"style":{"height":17.6},"width":1725.98,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/13-9.png","element":"img","alt":"kp + 1)(3k(1 − p) − 1) ln(2δ′) > s(3k(1 − p) − 1) − (m − s)(3kp + 1) = 3ks − 3kpm − m.","inline":true}],[{"text":"Multiplying both sides by ","element":"span"},{"style":{"height":4.8},"width":34,"height":12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/13-10.png","element":"img","alt":" −","inline":true},{"text":"1 gives","element":"span"}],[{"style":{"width":"97%"},"width":1816,"height":372,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/13-11.png","element":"img"}],[{"text":"Since ","element":"span"},{"style":{"height":20.08},"width":634.32,"height":50.19,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/13-12.png","element":"img","alt":" p, 1 − p ∈ [1/3, 2/3] and δ′ = √(2δ)","inline":true},{"text":", 
solving for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m ","element":"span"},{"text":"implies that ","element":"span"},{"style":{"height":21.29},"width":304.37,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/13-13.png","element":"img","alt":" m = Ω(k² ln(1/δ′)).","inline":true,"padRight":true},{"text":"In case (b), we have","element":"span"}],[{"style":{"width":"69%"},"width":1304,"height":107,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/13-14.png","element":"img"}],[{"text":"where we used the fact that 1 + ","element":"span"},{"style":{"height":14.8},"width":291.44,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/13-15.png","element":"img","alt":" x ≤ e^x for all x","inline":true},{"text":". Taking logarithms then implies that","element":"span"}],[{"style":{"width":"96%"},"width":1813,"height":101,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/13-16.png","element":"img"}],[{"text":"where the last inequality follows from the range of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"p","element":"span"},{"text":". Combining this inequality with Equation ","element":"span"},{"href":"#id-40","text":"2 ","element":"a"},{"text":"yields","element":"span"}],[{"style":{"width":"99%"},"width":1869,"height":267,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/13-17.png","element":"img"}],[{"text":"We now have the tools in hand to prove Theorem ","element":"span"},{"href":"#id-34","text":"3.","element":"a"}],[{"href":"#id-34","style":{"height":16.8},"width":787.56,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-0.png","element":"img","alt":"Proof of Theorem 3. 
Assume A is some δ","inline":true},{"text":"-fair algorithm with ","element":"span"},{"style":{"height":17.6},"width":284.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-1.png","element":"img","alt":" δ < 1/8. Fix T","inline":true},{"text":"; we claim that with probability at least ","element":"span"},{"style":{"height":24.49},"width":912.26,"height":61.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-2.png","element":"img","alt":"1/2, for any t = o(k³ ln(1/δ)), t ≤ T, πtj|ht = 1/k for all j","inline":true},{"text":". Since the payoff for uniformly ","element":"span"},{"text":"random play is ","element":"span"},{"style":{"height":21.29},"width":148.87,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-3.png","element":"img","alt":" ≤ 1/2 + 1/k","inline":true},{"text":", while the best arm has payoff ","element":"span"},{"style":{"height":21.29},"width":67.95,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-4.png","element":"img","alt":" ≥ 2/3","inline":true},{"text":", in any round ","element":"span"},{"style":{"height":23.56},"width":453.98,"height":58.89,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-5.png","element":"img","alt":" t where πti|ht = πti′|ht for","inline":true,"padRight":true},{"text":"all ","element":"span"},{"style":{"height":17.6},"width":149.09,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-6.png","element":"img","alt":" i, i′ ∈ [k","inline":true},{"text":"], the algorithm suffers Ω(1) regret in that round.","element":"span"}],[{"text":"Lemma ","element":"span"},{"href":"#id-38","text":"6 ","element":"a"},{"text":"implies that, with probability at least ","element":"span"},{"style":{"height":21.29},"width":17,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-7.png","element":"img","alt":"1/2 
","inline":true,"padRight":true},{"text":"over the distribution of histories ","element":"span"},{"style":{"height":17.53},"width":174.62,"height":43.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-8.png","element":"img","alt":" ht, either","inline":true,"padRight":true},{"text":"(a) ","element":"span"},{"href":"#id-39","style":{"height":27.02},"width":1030.73,"height":67.56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-9.png","element":"img","alt":" πt′i|ht′ = πt′i′|ht′ for all i, i′ ∈ [k], t′ ≤ t, or (b) ht must √(2δ)","inline":true},{"text":"-distinguish some arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"text":". Case (","element":"span"},{"style":{"fontStyle":"italic"},"text":"a","element":"span"},{"text":") implies ","element":"span"},{"text":"our claim. In case (b), Lemma ","element":"span"},{"href":"#id-39","text":"7 ","element":"a"},{"text":"states that an arm ","element":"span"},{"style":{"height":17.6},"width":146.87,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-10.png","element":"img","alt":" i is √(2δ)","inline":true},{"text":"-distinguishable only if ","element":"span"},{"style":{"height":21.29},"width":295.1,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-11.png","element":"img","alt":" Ti = Ω(k² ln(1/δ)).","inline":true,"padRight":true},{"text":"We now argue that unless ","element":"span"},{"style":{"height":21.29},"width":537.53,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-12.png","element":"img","alt":" t = Ω(k³ ln(1/δ)), Ti = o(k² ln(1/δ","inline":true},{"text":")), which will imply our claim for case (","element":"span"},{"style":{"fontStyle":"italic"},"text":"b","element":"span"},{"text":").","element":"span"}],[{"text":"Fix some ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i, t","element":"span"},{"text":". 
We lower-bound ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"for which, with probability at least 1 ","element":"span"},{"style":{"height":22.49},"width":77,"height":56.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-13.png","element":"img","alt":" − δ′/k ","inline":true,"padRight":true},{"text":"over histories ","element":"span"},{"style":{"height":17.53},"width":95.9,"height":43.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-14.png","element":"img","alt":" ht, it","inline":true,"padRight":true},{"text":"will be the case that ","element":"span"},{"style":{"height":26.62},"width":1464.99,"height":66.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-15.png","element":"img","alt":" nti ≥ c · k² ln(1/δ) when πt′i|ht′ = πt′i′|ht′ for all i, i′ ∈ [k], t′ ≤ t. Let X1, . . . , Xt be","inline":true,"padRight":true},{"text":"indicator variables for arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"being played in rounds ","element":"span"},{"style":{"height":13.6},"width":101.12,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-16.png","element":"img","alt":" t′ ≤ t","inline":true},{"text":". Note that for all ","element":"span"},{"style":{"height":21.29},"width":452.77,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-17.png","element":"img","alt":" t′ ≤ t, E[Xt′] = 1/k, since","inline":true,"padRight":true},{"text":"in all rounds prior to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":", all arms are played with equal probability. 
For any ","element":"span"},{"style":{"height":17.6},"width":230.81,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-18.png","element":"img","alt":" ϵ ∈ [0, 1], as","inline":true},{"style":{"height":19.22},"width":50.42,"height":48.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-19.png","element":"img","alt":"nt′i ","inline":true,"padRight":true},{"text":"are nondecreasing in ","element":"span"},{"style":{"height":11.6},"width":31.76,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-20.png","element":"img","alt":" t′","inline":true},{"text":", an additive Chernoff bound implies","element":"span"}],[{"style":{"width":"100%"},"width":1872,"height":715,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-21.png","element":"img"}],[{"text":"probability 1 ","element":"span"},{"style":{"height":42.4},"width":752.86,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-22.png","element":"img","alt":" − δ′ for all i unless t ≥ min(c2 · k³ ln(1/δ),","inline":true}]]},{"heading":"5 KWIK Learnability Implies Fair Bandit Learnability","paragraphs":[[{"text":"In this section, we show that if a class of functions is KWIK learnable, then there is a fair algorithm for learning the same class of functions in the contextual bandit setting, with a regret bound polynomially related to the function class’s KWIK bound. Intuitively, KWIK learnability of a class of functions guarantees that we can learn each function’s behavior to high accuracy with high confidence. Since fairness constrains an algorithm most before it has determined the payoff functions’ behavior accurately, this guarantee enables us to learn fairly without incurring much additional regret. 
Formally, we prove the following polynomial relationship.","element":"span"}],[{"id":"id-43","style":{"fontWeight":"bold"},"text":"Theorem 4. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"For an instance of the contextual multi-armed bandit problem where ","element":"span"},{"style":{"height":17.42},"width":226.79,"height":43.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-23.png","element":"img","alt":" fj ∈ C for","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"all ","element":"span"},{"style":{"height":17.6},"width":400.02,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-24.png","element":"img","alt":" j ∈ [k], if C is (ϵ, δ)","inline":true},{"style":{"fontStyle":"italic"},"text":"-KWIK learnable with bound ","element":"span"},{"style":{"height":17.6},"width":838.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-25.png","element":"img","alt":" m(ϵ, δ), KWIKToFair (δ, T) is δ-fair and","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"achieves regret bound:","element":"span"}],[{"style":{"width":"77%"},"width":1453,"height":210,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/14-26.png","element":"img"}],[{"text":"First, we construct an algorithm ","element":"span"},{"style":{"height":17.6},"width":379.76,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-0.png","element":"img","alt":" KWIKToFair(δ, T","inline":true},{"text":") that uses the KWIK learning algorithm as a subroutine, and prove that it is ","element":"span"},{"style":{"height":12.8},"width":20,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-1.png","element":"img","alt":" δ","inline":true},{"text":"-fair. 
A call to ","element":"span"},{"style":{"height":17.6},"width":379.76,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-2.png","element":"img","alt":" KWIKToFair(δ, T","inline":true},{"text":") will initialize a KWIK learner for each arm, and in each of the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"rounds will implicitly construct a confidence interval around the prediction of each learner. If a learner makes a numeric-valued prediction, we interpret this as a confidence interval centered at the prediction with half-width ","element":"span"},{"style":{"height":12.73},"width":34.71,"height":31.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-3.png","element":"img","alt":" ϵ∗","inline":true},{"text":". If a learner outputs ","element":"span"},{"style":{"height":12},"width":34,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-4.png","element":"img","alt":"⊥","inline":true},{"text":", we interpret this as a trivial confidence interval (covering all of [0","element":"span"},{"style":{"fontStyle":"italic"},"text":", ","element":"span"},{"text":"1]). We use the same chaining technique as in the classic stochastic setting. In every round ","element":"span"},{"style":{"height":17.6},"width":463.9,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-5.png","element":"img","alt":" t, KWIKToFair(δ, T)","inline":true,"padRight":true},{"text":"identifies the arm ","element":"span"},{"style":{"height":19.22},"width":310.65,"height":48.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-6.png","element":"img","alt":" it∗ = arg maxi uti ","inline":true,"padRight":true},{"text":"that has the largest ","element":"span"},{"style":{"fontStyle":"italic"},"text":"upper ","element":"span"},{"text":"confidence bound. 
At each round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":", we will say ","element":"span"},{"style":{"height":21.62},"width":1142.72,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-7.png","element":"img","alt":" i is linked to j if [ℓti, uti] ∩ [ℓtj, utj] ≠ ∅, and i is chained to j","inline":true,"padRight":true},{"text":"if they are in the same ","element":"span"},{"text":"component of the transitive closure of the linked relation. The algorithm then plays uniformly at random among all arms chained to arm ","element":"span"},{"style":{"height":18.65},"width":45.96,"height":46.62,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-8.png","element":"img","alt":" it∗.","inline":true,"padRight":true},{"text":"Whenever all learners output predictions, they need no feedback. When a learner for ","element":"span"},{"style":{"height":16.4},"width":327.98,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-9.png","element":"img","alt":" j outputs ⊥, if j","inline":true,"padRight":true},{"text":"is selected, we have feedback ","element":"span"},{"style":{"height":21.62},"width":34.69,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-10.png","element":"img","alt":" rtj ","inline":true,"padRight":true},{"text":"to give it; on ","element":"span"},{"text":"the other hand, if ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"isn’t selected, we “roll back” the learning algorithm for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"to before this round by not updating the algorithm’s 
state.","element":"span"}],[{"style":{"width":"98%"},"width":1847,"height":497,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-11.png","element":"img"}],[{"style":{"height":17.6},"width":623.35,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-12.png","element":"img","alt":"10: Pull j∗ ← (x ∈R [k","inline":true},{"text":"]), receive reward ","element":"span"},{"style":{"height":21.62},"width":227.44,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-13.png","element":"img","alt":" rtj∗ ▷","inline":true,"padRight":true},{"text":"Pick arm at random from all arms","element":"span"}],[{"style":{"width":"99%"},"width":1858,"height":213,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-14.png","element":"img"}],[{"style":{"height":21.62},"width":1343.76,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-15.png","element":"img","alt":"15: hj∗ ← hj∗ :: (xtj∗, rtj∗) ▷","inline":true,"padRight":true},{"text":"Update the history for ","element":"span"},{"style":{"height":17.02},"width":60.24,"height":42.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-16.png","element":"img","alt":" Lj∗","inline":true}],[{"text":"We begin by bounding the probability of certain failures of ","element":"span"},{"text":"KWIKToFair ","element":"span"},{"text":"in Lemma ","element":"span"},{"href":"#id-41","text":"8, ","element":"a"},{"text":"proven in Appendix ","element":"span"},{"text":"C. ","element":"span"},{"text":"This in turn lets us prove the fairness of ","element":"span"},{"text":"KWIKToFair ","element":"span"},{"text":"in Theorem ","element":"span"},{"href":"#id-42","text":"5.","element":"a"}],[{"id":"id-41","style":{"fontWeight":"bold"},"text":"Lemma 8. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"With probability at least ","element":"span"},{"text":"1 ","element":"span"},{"style":{"height":21.29},"width":223.6,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-17.png","element":"img","alt":" − min(δ, 1/T)","inline":true},{"style":{"fontStyle":"italic"},"text":", for all rounds ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and all arms ","element":"span"},{"style":{"height":19.22},"width":291.9,"height":48.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-18.png","element":"img","alt":" i, (a) if sti ∈ R","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"then ","element":"span"},{"style":{"height":20.8},"width":1269.46,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-19.png","element":"img","alt":" |sti − fi(xti)| ≤ ϵ∗ and (b) Σt I[sti = ⊥ and i is pulled] ≤ m(ϵ∗, δ∗).","inline":true}],[{"id":"id-42","style":{"width":"45%"},"width":845,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-20.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-42","style":{"fontStyle":"italic"},"text":"5. 
","element":"a"},{"text":"We condition on both (a) and (b) of Lemma ","element":"span"},{"href":"#id-41","text":"8 ","element":"a"},{"text":"holding for all arms ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"and rounds ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":"; by that lemma, these hold simultaneously with probability at least 1 ","element":"span"},{"style":{"height":12.8},"width":64.55,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-21.png","element":"img","alt":" − δ","inline":true,"padRight":true},{"text":"over all arms and all times ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":". Therefore, we proceed by conditioning on the event that for all arms ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"and all rounds ","element":"span"},{"style":{"height":19.22},"width":591.49,"height":48.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-22.png","element":"img","alt":" t, if Li = sti for sti ≠ ⊥ then","inline":true},{"style":{"height":19.22},"width":313.14,"height":48.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-23.png","element":"img","alt":"|sti − fi(xti)| ≤ ϵ∗","inline":true},{"text":". Having done so, there are two possibilities for each round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":". ","element":"span"},{"text":"In case 1, for each ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"text":", we have that ","element":"span"},{"style":{"height":19.22},"width":301.37,"height":48.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-24.png","element":"img","alt":" Li(xti) = sti ≠ ⊥","inline":true},{"text":". 
By the condition above, for any arms ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"and ","element":"span"},{"style":{"height":21.62},"width":320.49,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-25.png","element":"img","alt":"j, fi(xti) ≥ fj(xtj","inline":true},{"text":") implies that ","element":"span"},{"style":{"height":21.62},"width":323.76,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-26.png","element":"img","alt":" sti + ϵ∗ ≥ stj − ϵ∗","inline":true},{"text":". Since in this case no learner outputs ","element":"span"},{"style":{"height":16},"width":175.01,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-27.png","element":"img","alt":" ⊥, arm j","inline":true,"padRight":true},{"text":"chains to the top arm only if arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"does. Therefore ","element":"span"},{"style":{"height":23.56},"width":195.33,"height":58.89,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-28.png","element":"img","alt":" πti|h ≥ πtj|h","inline":true},{"text":". In case 2, there exists some ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"such ","element":"span"},{"text":"that ","element":"span"},{"style":{"height":19.22},"width":209.5,"height":48.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-29.png","element":"img","alt":" Li(xti) = ⊥","inline":true},{"text":". 
Then we choose uniformly at random across all arms, so ","element":"span"},{"style":{"height":23.56},"width":440.63,"height":58.89,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-30.png","element":"img","alt":" πti|h = πtj|h for all i and","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"j","element":"span"},{"text":".","element":"span"}],[{"text":"Thus, with probability at least 1","element":"span"},{"style":{"height":12.8},"width":54.12,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-31.png","element":"img","alt":"−δ","inline":true},{"text":", for each round ","element":"span"},{"style":{"height":21.62},"width":300.59,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-32.png","element":"img","alt":" t, fi(xti) ≥ fj(xtj","inline":true},{"text":") implies that ","element":"span"},{"style":{"height":23.56},"width":208.84,"height":58.89,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/15-33.png","element":"img","alt":" πti|h ≥ πtj|h.","inline":true}],[{"text":"We now use the KWIK bounds of the KWIK learners to upper-bound the regret of ","element":"span"},{"text":"KWIKTo","element":"span"},{"style":{"height":17.6},"width":207.75,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-0.png","element":"img","alt":"Fair(δ, T).","inline":true}],[{"id":"id-49","style":{"height":17.6},"width":634.22,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-1.png","element":"img","alt":"Lemma 9. KWIKToFair(δ, T)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"achieves regret ","element":"span"},{"style":{"height":21.69},"width":605.18,"height":54.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-2.png","element":"img","alt":" O(max(k2 · m(ϵ∗, δ∗), k3 ln Tkδ )).","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Proof. 
","element":"span"},{"text":"We first condition on the event that both (a) and (b) from Lemma ","element":"span"},{"href":"#id-41","text":"8 ","element":"a"},{"text":"hold for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t, i","element":"span"},{"text":", which holds with probability 1 ","element":"span"},{"style":{"height":21.29},"width":204.44,"height":53.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-3.png","element":"img","alt":" − min(δ, 1T ","inline":true,"padRight":true},{"text":"), and bound the regret when they both hold. ","element":"span"},{"text":"Choose an arbitrary round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"in the execution of ","element":"span"},{"style":{"height":17.6},"width":379.75,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-4.png","element":"img","alt":" KWIKToFair(δ, T","inline":true},{"text":"). As above, there are two cases. In the first case, ","element":"span"},{"style":{"height":19.22},"width":464.39,"height":48.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-5.png","element":"img","alt":" Li(xti) = sti ̸= ⊥ for all i","inline":true,"padRight":true},{"text":"and we choose uniformly at random from the arms chained by ","element":"span"},{"style":{"height":12.73},"width":34.71,"height":31.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-6.png","element":"img","alt":"ϵ∗","inline":true},{"text":"-intervals to the arm with the highest prediction. Since we have conditioned on the event that all KWIK learners are correct, ","element":"span"},{"style":{"height":15.13},"width":130.84,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-7.png","element":"img","alt":" i∗ ∈ St","inline":true},{"text":". 
Furthermore, for any ","element":"span"},{"style":{"height":18.33},"width":151.77,"height":45.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-8.png","element":"img","alt":" i, j ∈ St","inline":true},{"text":", we have that ","element":"span"},{"style":{"height":21.63},"width":306.15,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-9.png","element":"img","alt":" |sti − stj| ≤ 2kϵ∗,","inline":true,"padRight":true},{"text":"and in particular that ","element":"span"},{"style":{"height":19.22},"width":308.64,"height":48.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-10.png","element":"img","alt":" |sti − sti∗| ≤ 2kϵ∗","inline":true},{"text":". Thus, the regret is at most 2","element":"span"},{"style":{"height":12.8},"width":58.8,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-11.png","element":"img","alt":"kϵ∗","inline":true,"padRight":true},{"text":"in such a round. In the ","element":"span"},{"text":"second case some arm outputs ","element":"span"},{"style":{"height":12},"width":34,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-12.png","element":"img","alt":" ⊥","inline":true},{"text":", so we choose randomly from all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"arms, and the worst-case regret is 1. 
Thus, the total regret will be at most 2","element":"span"},{"style":{"height":14},"width":455.87,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-13.png","element":"img","alt":"kϵ∗T + n + δT where n","inline":true,"padRight":true},{"text":"is the number of rounds in which some ","element":"span"},{"style":{"height":15.6},"width":263.95,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-14.png","element":"img","alt":" Li outputs ⊥.","inline":true}],[{"text":"We now upper bound ","element":"span"},{"style":{"height":10.62},"width":38.19,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-15.png","element":"img","alt":" ni","inline":true},{"text":", the number of rounds in which arm ","element":"span"},{"style":{"height":15.6},"width":217.98,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-16.png","element":"img","alt":" i outputs ⊥","inline":true},{"text":". Fix some arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"which outputs ","element":"span"},{"style":{"height":14.62},"width":140.47,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-17.png","element":"img","alt":" ⊥ in ni","inline":true,"padRight":true},{"text":"rounds. Whenever arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"outputs ","element":"span"},{"style":{"height":12},"width":34,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-18.png","element":"img","alt":" ⊥","inline":true,"padRight":true},{"text":"it is played (and therefore receives feedback) with probability at least 1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/k","element":"span"},{"text":". 
Thus, using a Chernoff bound, with probability 1 ","element":"span"},{"style":{"height":15.6},"width":360.67,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-19.png","element":"img","alt":" − δ′, arm i receives","inline":true,"padRight":true},{"text":"feedback for ","element":"span"},{"style":{"height":10.62},"width":38.19,"height":26.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-20.png","element":"img","alt":" ni","inline":true,"padRight":true},{"text":"outputs of ","element":"span"},{"style":{"height":12},"width":34,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-21.png","element":"img","alt":" ⊥","inline":true,"padRight":true},{"text":"in at least ","element":"span"},{"style":{"height":31.6},"width":512.07,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-22.png","element":"img","alt":"nik −�2ni ln 2δ′ rounds. Li","inline":true,"padRight":true},{"text":"guarantees that there ","element":"span"},{"text":"can be at most ","element":"span"},{"style":{"height":17.6},"width":149.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-23.png","element":"img","alt":" m(ϵ∗, δ∗","inline":true},{"text":") such rounds (in which it outputs ","element":"span"},{"style":{"height":12.8},"width":120.31,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-24.png","element":"img","alt":" ⊥ and","inline":true,"padRight":true},{"text":"receives feedback). 
Thus,","element":"span"}],[{"style":{"width":"28%"},"width":535,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-25.png","element":"img"}],[{"text":"If ","element":"span"},{"style":{"height":17.6},"width":321.75,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-26.png","element":"img","alt":" ni ≥ ck · m(ϵ∗, δ∗","inline":true},{"text":"), this implies","element":"span"}],[{"style":{"width":"48%"},"width":907,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-27.png","element":"img"}],[{"text":"We now analyze two cases: (1) ","element":"span"},{"style":{"height":21.29},"width":857.56,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-28.png","element":"img","alt":" k ln 2δ′ ≤ m(ϵ∗, δ∗) and (2) k ln 2δ′ > m(ϵ∗, δ∗).","inline":true,"padRight":true},{"text":"In case (1), this implies","element":"span"}],[{"style":{"width":"41%"},"width":783,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-29.png","element":"img"}],[{"text":"For ","element":"span"},{"style":{"fontStyle":"italic"},"text":"c > ","element":"span"},{"text":"4, this leads to a contradiction. 
Thus, in this case, if we set ","element":"span"},{"style":{"height":22.49},"width":125.64,"height":56.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-30.png","element":"img","alt":" δ′ = δk","inline":true},{"text":", we know that with ","element":"span"},{"text":"probability 1 ","element":"span"},{"style":{"height":17.6},"width":422.27,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-31.png","element":"img","alt":" − δ, ni ≤ 4k · m(ϵ∗, δ∗","inline":true},{"text":"), and summing over all ","element":"span"},{"style":{"height":19.9},"width":625.39,"height":49.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-32.png","element":"img","alt":" i implies �i ni ≤ 4k2 · m(ϵ∗, δ∗),","inline":true,"padRight":true},{"text":"as desired.","element":"span"}],[{"text":"In case (2), we have that","element":"span"}],[{"style":{"width":"23%"},"width":444,"height":103,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-33.png","element":"img"}],[{"text":"Solving for ","element":"span"},{"style":{"height":10.62},"width":38.19,"height":26.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-34.png","element":"img","alt":" ni","inline":true,"padRight":true},{"text":"gives ","element":"span"},{"style":{"height":21.69},"width":702.62,"height":54.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-35.png","element":"img","alt":" ni = O(k2 ln 1δ′ ), so �i ni = O(k3 ln kδ ","inline":true,"padRight":true},{"text":") by setting ","element":"span"},{"style":{"height":22.49},"width":329.99,"height":56.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-36.png","element":"img","alt":" δ′ = δk and taking","inline":true,"padRight":true},{"text":"a union bound. 
Thus, there are at most ","element":"span"},{"style":{"height":21.69},"width":602.56,"height":54.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-37.png","element":"img","alt":" n = O(max(k2 · m(ϵ, δ∗), k3 ln kδ ","inline":true,"padRight":true},{"text":")) rounds in expectation ","element":"span"},{"text":"during the execution of ","element":"span"},{"style":{"height":17.6},"width":379.76,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-38.png","element":"img","alt":" KWIKToFair(δ, T","inline":true},{"text":") in which some arm outputs ","element":"span"},{"style":{"height":12},"width":45.94,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-39.png","element":"img","alt":" ⊥.","inline":true}],[{"text":"Combining both cases, the total regret incurred by ","element":"span"},{"style":{"height":17.6},"width":379.76,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-40.png","element":"img","alt":" KWIKToFair(δ, T","inline":true},{"text":") across all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"rounds is","element":"span"}],[{"style":{"width":"98%"},"width":1839,"height":183,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/16-41.png","element":"img"}],[{"text":"Our presentation of ","element":"span"},{"style":{"height":17.6},"width":379.75,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/17-0.png","element":"img","alt":" KWIKToFair(δ, T","inline":true},{"text":") assumes a known time horizon ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T","element":"span"},{"text":". 
Its guarantees extend to the case in which ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"is unknown via the standard “doubling trick”, as shown in the proof of Theorem ","element":"span"},{"href":"#id-43","text":"4 ","element":"a"},{"text":"in Appendix ","element":"span"},{"text":"C.","element":"span"}],[{"text":"An important instance of the contextual bandit problem is the linear case, where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"is the set of all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"linear ","element":"span"},{"text":"functions of bounded norm in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"dimensions. ","element":"span"},{"text":"This captures the natural setting in which the rewards of each arm are governed by an underlying linear regression model on a ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":"-dimensional real-valued feature space. 
The linear case is well studied, and there are known KWIK algorithms ","element":"span"},{"href":"#id-14","referenceIndex":28,"text":"[Strehl and Littman, ","element":"a"},{"href":"#id-14","referenceIndex":28,"text":"2008] ","element":"a"},{"text":"for the set of linear functions ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C","element":"span"},{"text":", which, via our reduction, allow us to give a fair contextual bandit algorithm for this setting with a polynomial regret bound.","element":"span"}],[{"id":"id-44","style":{"fontWeight":"bold"},"text":"Lemma 10 ","element":"span"},{"text":"(","element":"span"},{"href":"#id-14","referenceIndex":28,"text":"[Strehl and Littman, ","element":"a"},{"href":"#id-14","referenceIndex":28,"text":"2008]","element":"a"},{"text":")","element":"span"},{"style":{"height":19.53},"width":1055.35,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/17-1.png","element":"img","alt":". Let C = {fθ|fθ(x) = ⟨θ, x⟩, θ ∈ Rd, ||θ|| ≤ 1} and","inline":true},{"style":{"height":20.41},"width":1649.16,"height":51.03,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/17-2.png","element":"img","alt":"X = {x ∈ Rd : ||x|| ≤ 1}. C is KWIK learnable with KWIK bound m(ϵ, δ) = ˜O(d3/ϵ4).","inline":true}],[{"text":"Applying Theorem ","element":"span"},{"href":"#id-43","text":"4 ","element":"a"},{"text":"then shows that ","element":"span"},{"text":"KWIKToFair ","element":"span"},{"text":"has a polynomial regret guarantee for the class of linear functions. The proof can be found in Appendix ","element":"span"},{"text":"C.","element":"span"}],[{"id":"id-50","style":{"fontWeight":"bold"},"text":"Corollary 1. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Let ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"X ","element":"span"},{"style":{"fontStyle":"italic"},"text":"be as in Lemma ","element":"span"},{"href":"#id-44","style":{"fontStyle":"italic"},"text":"10, ","element":"a"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"style":{"height":18.22},"width":832.88,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/17-3.png","element":"img","alt":" fj ∈ C for each j ∈ [k]. Then, KWIKTo-","inline":true},{"style":{"height":17.6},"width":195.78,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/17-4.png","element":"img","alt":"Fair(T, δ)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"using the learner from ","element":"span"},{"href":"#id-14","referenceIndex":28,"style":{"fontStyle":"italic"},"text":"[Strehl and Littman, ","element":"a"},{"href":"#id-14","referenceIndex":28,"style":{"fontStyle":"italic"},"text":"2008] ","element":"a"},{"style":{"fontStyle":"italic"},"text":"has regret:","element":"span"}],[{"style":{"width":"43%"},"width":814,"height":105,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/17-5.png","element":"img"}]]},{"heading":"6 Fair Bandit Learnability Implies KWIK Learnability","paragraphs":[[{"text":"In this section, we show how to use a fair, no-regret contextual bandit algorithm to construct a KWIK learning algorithm whose KWIK bound has logarithmic dependence on the number of rounds ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T","element":"span"},{"text":". 
Intuitively, any fair algorithm that achieves low regret must be able to find and exploit an optimal arm (since the algorithm is no-regret) ","element":"span"},{"style":{"fontStyle":"italic"},"text":"and ","element":"span"},{"text":"can exploit that arm only once it has a tight understanding of the qualities of all arms (since the algorithm is fair). Thus, any fair no-regret algorithm will ultimately have tight (1","element":"span"},{"style":{"height":12.8},"width":60.6,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/17-6.png","element":"img","alt":"− δ","inline":true},{"text":")-confidence about each arm’s reward function.","element":"span"}],[{"id":"id-45","style":{"height":16.4},"width":604.92,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/17-7.png","element":"img","alt":"Theorem 6. Suppose A is a δ","inline":true},{"style":{"fontStyle":"italic"},"text":"-fair algorithm for the contextual bandit problem over the class of functions ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C","element":"span"},{"style":{"fontStyle":"italic"},"text":", with regret bound ","element":"span"},{"style":{"height":17.6},"width":137.02,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/17-8.png","element":"img","alt":" R(T, δ)","inline":true},{"style":{"fontStyle":"italic"},"text":". Suppose also there exists ","element":"span"},{"style":{"height":17.6},"width":318.72,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/17-9.png","element":"img","alt":" f ∈ C, x(ℓ) ∈ X","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"such that for every ","element":"span"},{"style":{"height":21.29},"width":1185.51,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/17-10.png","element":"img","alt":" ℓ ∈ [⌈ 1ϵ⌉], f(x(ℓ)) = ℓ · ϵ. 
Then, FairToKWIK is an (ϵ, δ)","inline":true},{"style":{"fontStyle":"italic"},"text":"-KWIK algorithm for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"style":{"fontStyle":"italic"},"text":"with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"KWIK bound ","element":"span"},{"style":{"height":17.6},"width":384.62,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/17-11.png","element":"img","alt":" m(ϵ, δ), with m(ϵ, δ)","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"the solution to ","element":"span"},{"style":{"height":24.22},"width":453.39,"height":60.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/17-12.png","element":"img","alt":"m(ϵ,δ)ϵ4 = R(m(ϵ, δ), ϵδ2T ).","inline":true}],[{"style":{"fontStyle":"italic"},"text":"Remark ","element":"span"},{"text":"3","element":"span"},{"style":{"fontStyle":"italic"},"text":". ","element":"span"},{"text":"The condition that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"should contain a function that can take on values that are multiples of ","element":"span"},{"style":{"height":8},"width":18,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/17-13.png","element":"img","alt":" ϵ","inline":true,"padRight":true},{"text":"is for technical convenience; ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"can always be augmented by adding a single such function.","element":"span"}],[{"text":"Our aim is to construct a KWIK algorithm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"text":"to predict labels for a sequence of examples labeled with some unknown function ","element":"span"},{"style":{"height":16.4},"width":140.42,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/17-14.png","element":"img","alt":" f∗ ∈ 
C","inline":true},{"text":". To do this, we will run our fair contextual bandit algorithm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"on an instance that we construct online as examples ","element":"span"},{"style":{"height":14.73},"width":280.58,"height":36.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/17-15.png","element":"img","alt":" xt arrive for B","inline":true},{"text":". The idea is to simulate a two-arm instance in which one arm’s rewards are governed by ","element":"span"},{"style":{"height":16.4},"width":43.06,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/17-16.png","element":"img","alt":" f∗ ","inline":true,"padRight":true},{"text":"(the function to be KWIK-learned), and the other arm’s rewards are governed by a function ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f ","element":"span"},{"text":"that we can set to take any value in ","element":"span"},{"style":{"height":17.6},"width":280.45,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/17-17.png","element":"img","alt":" {0, ϵ, 2ϵ, . . . , 1}","inline":true},{"text":". For each input ","element":"span"},{"style":{"height":14.73},"width":36.94,"height":36.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/17-18.png","element":"img","alt":" xt","inline":true},{"text":", we perform a thought experiment and consider ","element":"span"},{"style":{"height":14},"width":63.97,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/17-19.png","element":"img","alt":" A’s","inline":true,"padRight":true},{"text":"probability distribution over arms when facing a context that forces arm 2’s payoff to take each of the values 0","element":"span"},{"style":{"height":16},"width":611.49,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/17-20.png","element":"img","alt":", ϵ∗, 2ϵ∗, . . . 
, 1. Since A is fair, A","inline":true,"padRight":true},{"text":"will play arm 1 with weakly higher probability than arm 2 for those ","element":"span"},{"style":{"height":18.73},"width":249.22,"height":46.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/17-21.png","element":"img","alt":" ℓ : ℓϵ∗ ≤ f(xt","inline":true},{"text":"); analogously, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"will play arm 1 with weakly lower probability than arm 2 for those ","element":"span"},{"style":{"height":18.73},"width":258.13,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/17-22.png","element":"img","alt":" ℓ : ℓϵ∗ ≥ f(xt","inline":true},{"text":"). If there are at least two values of ","element":"span"},{"style":{"height":12.8},"width":18,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/17-23.png","element":"img","alt":" ℓ","inline":true,"padRight":true},{"text":"for which arm 1 and arm 2 are played with equal probability, one of those contexts will force ","element":"span"},{"style":{"height":14},"width":254.74,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-0.png","element":"img","alt":" A to suffer ϵ∗ ","inline":true,"padRight":true},{"text":"regret, so we continue the simulation of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"on one of those instances, chosen at random, which forces at least ","element":"span"},{"style":{"height":17.6},"width":58.64,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-1.png","element":"img","alt":" ϵ∗/","inline":true},{"text":"2 regret in expectation; at the same time, we have ","element":"span"},{"style":{"height":18.73},"width":558.66,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-2.png","element":"img","alt":" B return ⊥. 
B receives f∗(xt","inline":true},{"text":") on such a round, which we use to construct feedback for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":". Otherwise, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"must transition from playing arm 1 with strictly higher probability to playing arm 2 with strictly higher probability as ","element":"span"},{"style":{"height":12.8},"width":18,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-3.png","element":"img","alt":" ℓ","inline":true,"padRight":true},{"text":"increases: the point at which that occurs will “sandwich” the value of ","element":"span"},{"style":{"height":18.73},"width":266.15,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-4.png","element":"img","alt":" f(xt), since A","inline":true},{"text":"’s fairness implies that this transition must occur when the expected payoff of arm 2 exceeds that of arm 1. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"text":"uses this value to output a numeric prediction.","element":"span"}],[{"text":"An important fact we exploit is that we can ","element":"span"},{"style":{"fontStyle":"italic"},"text":"query ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":"’s behavior on (","element":"span"},{"style":{"height":18.73},"width":118.46,"height":46.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-5.png","element":"img","alt":"xt, x(ℓ","inline":true},{"text":")), for any ","element":"span"},{"style":{"height":14.73},"width":126.14,"height":36.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-6.png","element":"img","alt":" xt and","inline":true},{"style":{"height":21.29},"width":187.01,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-7.png","element":"img","alt":"ℓ ∈�⌈ 1ϵ∗ ⌉�","inline":true},{"text":", without providing it feedback (we instead “roll back” its history to ","element":"span"},{"style":{"height":14.73},"width":37.14,"height":36.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-8.png","element":"img","alt":" ht ","inline":true,"padRight":true},{"text":"so that it excludes the query (","element":"span"},{"style":{"height":18.73},"width":118.46,"height":46.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-9.png","element":"img","alt":"xt, x(ℓ","inline":true},{"text":"))). 
We update ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":"’s history by providing it feedback only in rounds where ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"text":"outputs ","element":"span"},{"style":{"height":12},"width":45.94,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-10.png","element":"img","alt":"⊥.","inline":true}],[{"style":{"width":"98%"},"width":1846,"height":500,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-11.png","element":"img"}],[{"style":{"height":21.41},"width":1424.02,"height":53.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-12.png","element":"img","alt":"10: Select at ∼ A(ht, (xt, x(ˆℓ))) ▷ Run A","inline":true,"padRight":true},{"text":"to get a predicted arm","element":"span"}],[{"style":{"width":"29%"},"width":553,"height":75,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-13.png","element":"img"}],[{"style":{"height":21.68},"width":1461.83,"height":54.21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-14.png","element":"img","alt":"13: rt1 ← yt and h ← ht :: ((xt, x(ˆℓ)), 1, rt1) ▷","inline":true,"padRight":true},{"text":"Use KWIK feedback","element":"span"}],[{"style":{"width":"25%"},"width":486,"height":63,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-15.png","element":"img"}],[{"style":{"height":21.68},"width":1291.76,"height":54.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-16.png","element":"img","alt":"16: h ← ht :: ((xt, x(ˆℓ)), 2, rt2) ▷","inline":true,"padRight":true},{"text":"Construct feedback for arm 2","element":"span"}],[{"style":{"height":14},"width":1399.16,"height":35,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-17.png","element":"img","alt":"17: else ▷ A","inline":true},{"text":"’s history is not 
updated","element":"span"}],[{"style":{"width":"66%"},"width":1245,"height":214,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-18.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof. ","element":"span"},{"text":"For a fixed run of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":", we calculate the probability that for all times ","element":"span"},{"style":{"height":21.29},"width":384.31,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-19.png","element":"img","alt":" t and ℓ ∈ [⌈1/ϵ∗⌉], it","inline":true,"padRight":true},{"text":"is the case that ","element":"span"},{"style":{"height":21.49},"width":1549.86,"height":53.73,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-20.png","element":"img","alt":" pt,ℓ1 > pt,ℓ2 only if f∗(xt) > ℓ · ϵ∗ and also pt,ℓ1 < pt,ℓ2 only if f∗(xt) < ℓ · ϵ∗. In","inline":true,"padRight":true},{"text":"this run, ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"is queried on ","element":"span"},{"style":{"height":21.69},"width":28.69,"height":54.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-21.png","element":"img","alt":"T/ϵ∗","inline":true,"padRight":true},{"text":"histories and contexts: prefixes of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"h ","element":"span"},{"text":"along with (","element":"span"},{"style":{"height":18.73},"width":118.47,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-22.png","element":"img","alt":"xt, x(ℓ","inline":true},{"text":")) for each ","element":"span"},{"style":{"height":21.29},"width":334.8,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-23.png","element":"img","alt":"t ∈ [T], ℓ ∈ [⌈1/ϵ∗⌉]","inline":true},{"text":". 
The fairness of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"implies that, for any fixed ","element":"span"},{"style":{"height":18.73},"width":381.23,"height":46.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-24.png","element":"img","alt":" ht and fixed (xt, x(ℓ","inline":true},{"text":")), with probability at least 1 ","element":"span"},{"style":{"height":21.49},"width":1338.18,"height":53.73,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-25.png","element":"img","alt":" − δ∗, pt,ℓ1 > pt,ℓ2 only if f∗(xt) > ℓϵ∗ and pt,ℓ1 < pt,ℓ2 only if f∗(xt) < ℓϵ∗","inline":true},{"text":". Then, by a union bound over ","element":"span"},{"style":{"height":21.29},"width":342.47,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-26.png","element":"img","alt":" t ∈ [T], ℓ ∈ [⌈1/ϵ∗⌉]","inline":true},{"text":", with probability at least 1 ","element":"span"},{"style":{"height":21.69},"width":545.26,"height":54.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-27.png","element":"img","alt":" − δ∗T/ϵ∗ = 1 − δ, A(ht, xt, x(ℓ","inline":true},{"text":")) will satisfy this ","element":"span"},{"text":"property for all ","element":"span"},{"style":{"height":21.29},"width":331.3,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-28.png","element":"img","alt":" t ∈ [T], ℓ ∈ [⌈1/ϵ∗⌉]","inline":true},{"text":". We condition on this holding in the remainder of the proof.","element":"span"}],[{"text":"We now argue that the numeric predictions of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"text":"are correct within an additive ","element":"span"},{"style":{"height":12.4},"width":124.84,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-29.png","element":"img","alt":" ϵ. 
Let:","inline":true}],[{"style":{"width":"20%"},"width":390,"height":57,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-30.png","element":"img"}],[{"text":"When ","element":"span"},{"style":{"height":18.73},"width":297.73,"height":46.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-31.png","element":"img","alt":" B(xt) = yt ∈ [0,","inline":true,"padRight":true},{"text":"1], note that ","element":"span"},{"style":{"height":18.73},"width":292.53,"height":46.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-32.png","element":"img","alt":" |Et| ≤ 1, else B","inline":true,"padRight":true},{"text":"would have output ","element":"span"},{"style":{"height":12},"width":45.94,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/18-33.png","element":"img","alt":" ⊥.","inline":true}],[{"text":"If ","element":"span"},{"style":{"height":21.89},"width":1218.95,"height":54.73,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-0.png","element":"img","alt":" pt,ℓ1 ≤ pt,ℓ2 for all ℓ, since |Et| ≤ 1, either pt,01 < pt,02 or pt,11 < pt,12 ","inline":true,"padRight":true},{"text":", which, by the event we have conditioned ","element":"span"},{"text":"on, implies that either ","element":"span"},{"style":{"height":18.73},"width":1398.72,"height":46.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-1.png","element":"img","alt":" f∗(xt) < f(x(0)) = 0 or f∗(xt) < f(x(1)) = ϵ∗. 
Since f∗(xt) ≥ 0, this","inline":true,"padRight":true},{"text":"implies ","element":"span"},{"style":{"height":21.41},"width":759.52,"height":53.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-2.png","element":"img","alt":" f∗(xt) ∈ [0, ϵ∗) = [ˆℓϵ∗, ϵ∗) = [ˆyt, ˆyt + ϵ∗).","inline":true}],[{"text":"Otherwise, we have that ","element":"span"},{"style":{"height":26.68},"width":1330.49,"height":66.69,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-3.png","element":"img","alt":" pt,ˆℓ1 > pt,ˆℓ2, and pt,ℓ1 ≤ pt,ℓ2 for all ℓ > ˆℓ. If (a) ˆℓ = ⌈1/ϵ∗⌉, then f∗(xt) > 1,","inline":true,"padRight":true},{"text":"a contradiction, so ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"height":23.03},"width":1491.5,"height":57.56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-4.png","element":"img","alt":"ℓ < ⌈1/ϵ∗⌉. If (b) ˆℓ = ⌈1/ϵ∗⌉ − 1, then f∗(xt) > f(x(ˆℓ)) = (⌈1/ϵ∗⌉ − 1)ϵ∗ and so","inline":true},{"style":{"height":23.79},"width":1005.04,"height":59.47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-5.png","element":"img","alt":"f∗(xt) ∈ ((⌈1/ϵ∗⌉ − 1)ϵ∗, 1] ⊆ (ˆyt, ˆyt + ϵ∗], so ˆyt is ϵ∗","inline":true},{"text":"-accurate. If neither (","element":"span"},{"style":{"fontStyle":"italic"},"text":"a","element":"span"},{"text":") nor (","element":"span"},{"style":{"fontStyle":"italic"},"text":"b","element":"span"},{"text":"), then (","element":"span"},{"style":{"fontStyle":"italic"},"text":"c","element":"span"},{"text":") it ","element":"span"},{"text":"must be that ","element":"span"},{"text":"ˆ","element":"span"},{"style":{"height":21.29},"width":512.97,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-6.png","element":"img","alt":"ℓ < ⌈1/ϵ∗⌉ − 1. 
Since |Et| ≤","inline":true,"padRight":true},{"text":"1, for some ","element":"span"},{"style":{"height":21.41},"width":331.48,"height":53.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-7.png","element":"img","alt":" ℓ ∈ {ˆℓ + 1, ˆℓ + 2}","inline":true},{"text":", we know that ","element":"span"},{"style":{"height":21.49},"width":305.29,"height":53.73,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-8.png","element":"img","alt":" pt,ℓ1 < pt,ℓ2 ; thus,","inline":true},{"style":{"height":21.41},"width":533.5,"height":53.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-9.png","element":"img","alt":"f∗(xt) < f(x(ℓ)) ≤ (ˆℓ + 2)ϵ∗ ","inline":true,"padRight":true},{"text":"and therefore ","element":"span"},{"style":{"height":21.41},"width":753.17,"height":53.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-10.png","element":"img","alt":" f∗(xt) ∈ (ˆℓϵ∗, (ˆℓ + 2)ϵ∗) = (ˆyt, ˆyt + 2ϵ∗).","inline":true}],[{"text":"Finally, we upper-bound ","element":"span"},{"style":{"height":17.6},"width":112.42,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-11.png","element":"img","alt":" m(ϵ, δ","inline":true},{"text":"), the number of rounds ","element":"span"},{"style":{"height":18.73},"width":247.39,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-12.png","element":"img","alt":" t : B(xt) = ⊥","inline":true},{"text":". For each such ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":", ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"is run on one of two contexts chosen uniformly at random, and in one of these contexts the two arms’ payoffs differ by at least ","element":"span"},{"style":{"height":12.73},"width":34.71,"height":31.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-13.png","element":"img","alt":" ϵ∗","inline":true},{"text":". 
Thus, for one of those contexts, either ","element":"span"},{"style":{"height":18.73},"width":738.66,"height":46.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-14.png","element":"img","alt":" f∗(xt) ≥ f(x) + ϵ∗ or f∗(xt) ≤ f(x) − ϵ∗","inline":true},{"text":". In either case, since ","element":"span"},{"style":{"height":22.51},"width":257,"height":56.28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-15.png","element":"img","alt":" pt,ℓ1 = pt,ℓ2 = 1/2","inline":true,"padRight":true},{"text":"for ","element":"span"},{"style":{"height":17.6},"width":221.93,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-16.png","element":"img","alt":" x(ℓ) = x, A","inline":true,"padRight":true},{"text":"suffers expected regret at least ","element":"span"},{"style":{"height":21.79},"width":28.7,"height":54.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-17.png","element":"img","alt":"ϵ∗/2 ","inline":true,"padRight":true},{"text":"for that context, and at least ","element":"span"},{"style":{"height":21.79},"width":28.69,"height":54.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-18.png","element":"img","alt":" ϵ∗/4 ","inline":true,"padRight":true},{"text":"when faced with ","element":"span"},{"text":"one chosen at random. 
Thus, ","element":"span"},{"style":{"height":22.49},"width":1269.24,"height":56.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-19.png","element":"img","alt":" m(ϵ, δ) · ϵ∗/4 = m(ϵ, δ) · ϵ/8 < R(m(ϵ, δ), δ∗) = R(m(ϵ, δ), ϵδ/(2T)), since","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":"’s regret is upper bounded by this quantity over ","element":"span"},{"style":{"height":17.6},"width":112.42,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-20.png","element":"img","alt":" m(ϵ, δ","inline":true},{"text":") rounds (which is an upper bound on the number of rounds for which ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"is actually run and updated).","element":"span"}],[{"id":"id-51","style":{"fontWeight":"bold"},"text":"6.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"An Exponential Separation Between Fair and Unfair Learning","element":"span"}],[{"text":"In this section, we exploit the other direction of the equivalence we have proven between fair contextual bandit algorithms and KWIK learning algorithms to give a simple contextual bandit problem for which fairness imposes an ","element":"span"},{"style":{"fontStyle":"italic"},"text":"exponential ","element":"span"},{"text":"cost in its regret bound. This is in contrast to the case in which the underlying class of functions is linear, for which we gave fair contextual bandit algorithms with regret bounds within a polynomial factor of their unconstrained counterparts. In this problem, the context domain is the ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":"-dimensional boolean hypercube: ","element":"span"},{"style":{"height":19.53},"width":419.37,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-21.png","element":"img","alt":" X = {0, 1}^d, i.e. 
the","inline":true,"padRight":true},{"text":"context of each individual in each round consists of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"boolean attributes. Our class of functions ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"text":"is the class of boolean ","element":"span"},{"style":{"fontStyle":"italic"},"text":"conjunctions","element":"span"},{"text":":","element":"span"}],[{"style":{"width":"75%"},"width":1404,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-22.png","element":"img"}],[{"text":"We first give a simple but unfair algorithm, ","element":"span"},{"text":"ConjunctionBandit","element":"span"},{"text":", for this problem that obtains a regret bound linear in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":". It maintains a set of candidate variables ","element":"span"},{"style":{"height":19.69},"width":220.64,"height":49.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-23.png","element":"img","alt":" C∗j for each","inline":true,"padRight":true},{"text":"conjunction ","element":"span"},{"style":{"height":17.42},"width":36.36,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-24.png","element":"img","alt":" fj","inline":true},{"text":"; this set shrinks across rounds, while always containing the true set of variables over which ","element":"span"},{"style":{"height":17.42},"width":36.36,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-25.png","element":"img","alt":" fj","inline":true,"padRight":true},{"text":"is defined. 
We denote the boolean value of variable ","element":"span"},{"style":{"fontStyle":"italic"},"text":"m ","element":"span"},{"text":"in the context for arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"in round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"by ","element":"span"},{"style":{"height":21.62},"width":91.96,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-26.png","element":"img","alt":" xtj,m.","inline":true}],[{"text":"The formal claim and proof that ","element":"span"},{"text":"ConjunctionBandit ","element":"span"},{"text":"achieves regret ","element":"span"},{"style":{"height":19.13},"width":441.87,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-27.png","element":"img","alt":" R(T) = O(k^2 d), as well","inline":true,"padRight":true},{"text":"as ","element":"span"},{"text":"ConjunctionBandit","element":"span"},{"text":"’s formal description, can be found in Appendix ","element":"span"},{"text":"C. ","element":"span"},{"text":"ConjunctionBandit ","element":"span"},{"text":"violates fairness in every round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"in which it predicts 0 for arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"but 1 for arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j ","element":"span"},{"text":"even though ","element":"span"},{"style":{"height":22.56},"width":757.89,"height":56.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-28.png","element":"img","alt":"fi(xt) = fj(xt) = 1, as πti = 0 < 1/k < πtj.","inline":true}],[{"text":"We now show that fair algorithms cannot guarantee subexponential regret in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":". 
This relies upon a known lower bound for KWIK learning conjunctions ","element":"span"},{"href":"#id-15","referenceIndex":21,"text":"[Li, ","element":"a"},{"href":"#id-15","referenceIndex":21,"text":"2009]","element":"a"},{"text":":","element":"span"}],[{"id":"id-46","style":{"fontWeight":"bold"},"text":"Lemma 11. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"There exists a sequence of examples ","element":"span"},{"text":"(","element":"span"},{"style":{"height":22.03},"width":262.34,"height":55.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-29.png","element":"img","alt":"x1, . . . , x_{2^d − 1})","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"such that for ","element":"span"},{"style":{"height":17.6},"width":326.9,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-30.png","element":"img","alt":" ϵ, δ ≤ 1/2, every","inline":true,"padRight":true},{"text":"(","element":"span"},{"style":{"height":17.6},"width":75.15,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-31.png","element":"img","alt":"ϵ, δ)","inline":true},{"style":{"fontStyle":"italic"},"text":"-KWIK learning algorithm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"style":{"fontStyle":"italic"},"text":"for the class ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"style":{"fontStyle":"italic"},"text":"of conjunctions on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"style":{"fontStyle":"italic"},"text":"variables must output ","element":"span"},{"style":{"height":16.4},"width":106.05,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-32.png","element":"img","alt":" ⊥ 
for","inline":true},{"style":{"height":19.53},"width":619.56,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-33.png","element":"img","alt":"xt for each t ∈ [2^d − 1]. Thus, B","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":"has a KWIK bound of at least ","element":"span"},{"style":{"height":19.53},"width":308.28,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-34.png","element":"img","alt":" m(ϵ, δ) = Ω(2^d).","inline":true}],[{"text":"We then use the equivalence between fair algorithms and KWIK learning to translate this lower bound on ","element":"span"},{"style":{"height":17.6},"width":112.42,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/19-35.png","element":"img","alt":" m(ϵ, δ","inline":true},{"text":") into a lower bound on the worst-case regret of fair algorithms on conjunctions. We modify Theorem ","element":"span"},{"href":"#id-45","text":"6 ","element":"a"},{"text":"to yield the following lemma, proven in Appendix ","element":"span"},{"text":"C.","element":"span"}],[{"id":"id-47","style":{"height":16.4},"width":580.24,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/20-0.png","element":"img","alt":"Lemma 12. Suppose A is a δ","inline":true},{"style":{"fontStyle":"italic"},"text":"-fair algorithm for the contextual bandit problem over the class ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"style":{"fontStyle":"italic"},"text":"of conjunctions on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"style":{"fontStyle":"italic"},"text":"variables. 
If ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"style":{"fontStyle":"italic"},"text":"has regret bound ","element":"span"},{"style":{"height":17.6},"width":909.86,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/20-1.png","element":"img","alt":" R(T, δ) then for δ′ = 2Tδ, FairToKWIK is an","inline":true,"padRight":true},{"text":"(0","element":"span"},{"style":{"height":17.6},"width":68.62,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/20-2.png","element":"img","alt":", δ′)","inline":true},{"style":{"fontStyle":"italic"},"text":"-KWIK algorithm for ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"style":{"fontStyle":"italic"},"text":"with KWIK bound ","element":"span"},{"style":{"height":17.6},"width":492.23,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/20-3.png","element":"img","alt":" m(0, δ′) = 4R(m(0, δ′), δ).","inline":true}],[{"text":"Lemma ","element":"span"},{"href":"#id-46","text":"11 ","element":"a"},{"text":"then lets us lower-bound the worst-case regret of fair learning algorithms on conjunctions.","element":"span"}],[{"style":{"height":21.29},"width":626.75,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/20-4.png","element":"img","alt":"Corollary 2. 
For δ < 1/(2T), any δ","inline":true},{"style":{"fontStyle":"italic"},"text":"-fair algorithm for the contextual bandit problem over the class ","element":"span"},{"style":{"fontStyle":"italic"},"text":"C ","element":"span"},{"style":{"fontStyle":"italic"},"text":"of conjunctions on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"style":{"fontStyle":"italic"},"text":"boolean variables has a worst-case regret bound of ","element":"span"},{"style":{"height":19.53},"width":276.85,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/20-5.png","element":"img","alt":" R(T) = Ω(2^d).","inline":true}],[{"style":{"height":19.13},"width":383.34,"height":47.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/20-6.png","element":"img","alt":"Proof. Let T ≤ 2^d − 1","inline":true},{"text":". Then, if ","element":"span"},{"style":{"height":13.2},"width":78.35,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/20-7.png","element":"img","alt":" δ′ <","inline":true,"padRight":true},{"text":"1, Lemma ","element":"span"},{"href":"#id-46","text":"11 ","element":"a"},{"text":"guarantees the existence of a sequence of contexts ","element":"span"},{"style":{"height":18.33},"width":189.78,"height":45.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/20-8.png","element":"img","alt":" x1, . . . 
, xT ","inline":true,"padRight":true},{"text":"for which any (0","element":"span"},{"style":{"height":15.6},"width":56.44,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/20-9.png","element":"img","alt":", δ′","inline":true},{"text":")-KWIK algorithm has KWIK bound ","element":"span"},{"style":{"height":17.6},"width":295.96,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/20-10.png","element":"img","alt":" m(T, 0, δ′) = T.","inline":true}],[{"text":"Lemma ","element":"span"},{"href":"#id-47","text":"12 ","element":"a"},{"text":"implies that 4","element":"span"},{"style":{"height":17.6},"width":284.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/20-11.png","element":"img","alt":"R(m(T, 0, δ′), δ","inline":true},{"text":") equals the KWIK bound ","element":"span"},{"style":{"height":17.6},"width":629.99,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/20-12.png","element":"img","alt":" m(T, 0, δ′) when δ′ = 2Tδ. 
Thus,","inline":true,"padRight":true},{"text":"if ","element":"span"},{"style":{"height":24.22},"width":1151.47,"height":60.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/20-13.png","element":"img","alt":" δ < 1/(2T), then δ′ < 1 and so R(m(T, 0, δ′), δ) = m(T, 0, δ′)/4 = T/4.","inline":true},{"text":" Taking T = 2^d − 1 gives R(T) = (2^d − 1)/4 = Ω(2^d).","element":"span"}],[{"text":"Together with the analysis of ","element":"span"},{"text":"ConjunctionBandit","element":"span"},{"text":", this demonstrates a strong separation between fair and unfair contextual bandit algorithms: when the underlying functions mapping contexts to payoffs are conjunctions on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"variables, there exists a sequence of contexts on which fair algorithms must incur regret exponential in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"while unfair algorithms can achieve regret linear in ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d","element":"span"},{"text":".","element":"span"}]]},{"heading":"References","paragraphs":[[{"id":"id-21","text":"Philip Adler, Casey Falk, Sorelle A. Friedler, Gabriel Rybeck, Carlos Scheidegger, Brandon Smith, and Suresh ","element":"span"},{"text":"Venkatasubramanian. ","element":"span"},{"text":"Auditing black-box models by obscuring features. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"CoRR","element":"span"},{"text":", abs/1602.07043, 2016. ","element":"span"},{"text":"URL ","element":"span"},{"href":"http://arxiv.org/abs/1602.07043","style":{"fontFamily":"monospace"},"text":"http://arxiv.org/abs/1602.07043","element":"a"},{"text":".","element":"span"}],[{"id":"id-10","text":"Alekh Agarwal, Daniel J. Hsu, Satyen Kale, John Langford, Lihong Li, and Robert E. Schapire. Taming the monster: ","element":"span"},{"text":"A fast and simple algorithm for contextual bandits. 
In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014","element":"span"},{"text":", pages 1638–1646, 2014.","element":"span"}],[{"id":"id-24","text":"Kareem Amin, Michael Kearns, and Umar Syed. ","element":"span"},{"text":"Graphical models for bandit problems. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"arXiv preprint arXiv:1202.3782","element":"span"},{"text":", 2012.","element":"span"}],[{"id":"id-25","text":"Kareem Amin, Michael Kearns, Moez Draief, and Jacob D Abernethy. Large-scale bandit problems and KWIK learning. ","element":"span"},{"text":"In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the 30th International Conference on Machine Learning (ICML-13)","element":"span"},{"text":", pages 588–596, 2013.","element":"span"}],[{"id":"id-9","text":"Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Machine Learning","element":"span"},{"text":", 47(2-3):235–256, 2002.","element":"span"}],[{"id":"id-5","text":"Solon Barocas and Andrew D. Selbst. Big data’s disparate impact. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"California Law Review","element":"span"},{"text":", 104, 2016. Available at SSRN: http://ssrn.com/abstract=2477899.","element":"span"}],[{"id":"id-3","text":"Anna Maria Barry-Jester, Ben Casselman, and Dana Goldstein. The new science of sentencing. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The Marshall Project","element":"span"},{"text":", August 8 2015. URL ","element":"span"},{"href":"https://www.themarshallproject.org/2015/08/04/the-new-science-of-sentencing","style":{"fontFamily":"monospace"},"text":"https://www.themarshallproject.org/2015/08/04/the-new-science-of-sentencing","element":"a"},{"text":". 
Retrieved 4/28/2016.","element":"span"}],[{"id":"id-11","text":"Alina Beygelzimer, John Langford, Lihong Li, Lev Reyzin, and Robert E. Schapire. Contextual bandit algorithms with ","element":"span"},{"text":"supervised learning guarantees. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2011, Fort Lauderdale, USA, April 11-13, 2011","element":"span"},{"text":", pages 19–26, 2011.","element":"span"}],[{"id":"id-30","text":"Sébastien Bubeck and Nicolo Cesa-Bianchi. ","element":"span"},{"text":"Regret analysis of stochastic and nonstochastic multi-armed bandit problems. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Foundations and Trends in Machine Learning","element":"span"},{"text":", 5(1):1–122, 2012.","element":"span"}],[{"id":"id-1","text":"Nanette Byrnes. ","element":"span"},{"text":"Artificial intolerance. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"MIT Technology Review","element":"span"},{"text":", ","element":"span"},{"text":"March 28 2016. ","element":"span"},{"text":"URL ","element":"span"},{"href":"https://www.technologyreview.com/s/600996/artificial-intolerance/","style":{"fontFamily":"monospace"},"text":"https://www.","element":"a"},{"href":"https://www.technologyreview.com/s/600996/artificial-intolerance/","style":{"fontFamily":"monospace"},"text":"technologyreview.com/s/600996/artificial-intolerance/","element":"a"},{"text":". Retrieved 4/28/2016.","element":"span"}],[{"id":"id-16","text":"Toon Calders and Sicco Verwer. Three naive Bayes approaches for discrimination-free classification. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Data Mining and Knowledge Discovery","element":"span"},{"text":", 21(2):277–292, 2010.","element":"span"}],[{"id":"id-12","text":"Wei Chu, Lihong Li, Lev Reyzin, and Robert E. Schapire. ","element":"span"},{"text":"Contextual bandits with linear payoff functions. 
","element":"span"},{"text":"In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2011, Fort Lauderdale, USA, April 11-13, 2011","element":"span"},{"text":", pages 208–214, 2011.","element":"span"}],[{"id":"id-4","text":"Cary Coglianese and David Lehr. Regulating by robot: Administrative decision-making in the machine-learning era. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Georgetown Law Journal","element":"span"},{"text":", 2016. Forthcoming.","element":"span"}],[{"id":"id-22","text":"Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the 3rd Innovations in Theoretical Computer Science Conference","element":"span"},{"text":", pages 214–226. ACM, 2012.","element":"span"}],[{"id":"id-19","text":"Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. Certifying ","element":"span"},{"text":"and removing disparate impact. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, August 10-13, 2015","element":"span"},{"text":", pages 259–268, 2015.","element":"span"}],[{"id":"id-20","text":"Benjamin Fish, Jeremy Kun, and ","element":"span"},{"text":"Á","element":"span"},{"text":"dám D. Lelkes. A confidence-based approach for balancing fairness and accuracy. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"SIAM International Conference on Data Mining","element":"span"},{"text":", 2016.","element":"span"}],[{"text":"FTC Commissioner Julie Brill. Navigating the “trackless ocean”: Fairness in big data research and decision making. 
","element":"span"},{"text":"Keynote Address at the Columbia University Data Science Institute, April 2015.","element":"span"}],[{"id":"id-18","text":"Toshihiro Kamishima, Shotaro Akaho, and Jun Sakuma. Fairness-aware learning through regularization approach. ","element":"span"},{"text":"In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on","element":"span"},{"text":", pages 643–650. IEEE, 2011.","element":"span"}],[{"id":"id-8","text":"Michael N Katehakis and Herbert Robbins. Sequential choice from several populations. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the National Academy of Sciences USA","element":"span"},{"text":", 92:8584–8584, 1995.","element":"span"}],[{"id":"id-7","text":"Tze Leung Lai and Herbert Robbins. Asymptotically efficient adaptive allocation rules. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Advances in Applied Mathematics","element":"span"},{"text":", 6(1):4–22, 1985.","element":"span"}],[{"id":"id-15","text":"Lihong Li. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A unifying framework for computational reinforcement learning theory","element":"span"},{"text":". PhD thesis, Rutgers, The State University of New Jersey, 2009.","element":"span"}],[{"id":"id-13","text":"Lihong Li, Michael L Littman, Thomas J Walsh, and Alexander L Strehl. Knows what it knows: a framework for ","element":"span"},{"text":"self-aware learning. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Machine Learning","element":"span"},{"text":", 82(3):399–443, 2011.","element":"span"}],[{"id":"id-17","text":"Binh Thanh Luong, Salvatore Ruggieri, and Franco Turini. ","element":"span"},{"text":"k-NN as an implementation of situation testing for discrimination discovery and prevention. 
In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining","element":"span"},{"text":", pages 502–510. ACM, 2011.","element":"span"}],[{"id":"id-0","text":"Clair C Miller. Can an algorithm hire better than a human? ","element":"span"},{"style":{"fontStyle":"italic"},"text":"The New York Times","element":"span"},{"text":", June 25 2015. URL ","element":"span"},{"href":"http://www.nytimes.com/2015/06/26/upshot/can-an-algorithm-hire-better-than-a-human.html","style":{"fontFamily":"monospace"},"text":"http://www. ","element":"a"},{"href":"http://www.nytimes.com/2015/06/26/upshot/can-an-algorithm-hire-better-than-a-human.html","style":{"fontFamily":"monospace"},"text":"nytimes.com/2015/06/26/upshot/can-an-algorithm-hire-better-than-a-human.html","element":"a"},{"text":". Retrieved 4/28/2016.","element":"span"}],[{"text":"Cecilia Munoz, Megan Smith, and DJ Patil. Big data: A report on algorithmic systems, opportunity, and civil rights. ","element":"span"},{"text":"Technical report, Executive Office of the President, The White House, 2016.","element":"span"}],[{"text":"John Podesta, Penny Pritzker, Ernest J. Moniz, John Holdern, and Jeffrey Zients. Big data: Seizing opportunities, ","element":"span"},{"text":"protecting values. Technical report, Executive Office of the President, The White House, 2014.","element":"span"}],[{"id":"id-2","text":"Cynthia ","element":"span"},{"text":"Rudin. ","element":"span"},{"text":"Predictive ","element":"span"},{"text":"policing ","element":"span"},{"text":"using ","element":"span"},{"text":"machine ","element":"span"},{"text":"learning ","element":"span"},{"text":"to ","element":"span"},{"text":"detect ","element":"span"},{"text":"patterns ","element":"span"},{"text":"of crime. 
","element":"span"},{"style":{"fontStyle":"italic"},"text":"Wired ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Magazine","element":"span"},{"text":", ","element":"span"},{"text":"August ","element":"span"},{"text":"2013. ","element":"span"},{"text":"URL ","element":"span"},{"href":"http://www.wired.com/insights/2013/08/predictive-policing-using-machine-learning-to-detect- \\ patterns-of-crime/","style":{"fontFamily":"monospace"},"text":"http://www.wired.com/insights/2013/08/ ","element":"a"},{"href":"http://www.wired.com/insights/2013/08/predictive-policing-using-machine-learning-to-detect- \\ patterns-of-crime/","style":{"fontFamily":"monospace"},"text":"predictive-policing-using-machine-learning-to-detect-\\patterns-of-crime/","element":"a"},{"text":". Retrieved 4/28/2016.","element":"span"}],[{"id":"id-14","text":"Alexander L Strehl and Michael L Littman. Online linear regression and its application to model-based reinforcement ","element":"span"},{"text":"learning. In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Advances in Neural Information Processing Systems","element":"span"},{"text":", pages 1417–1424, 2008.","element":"span"}],[{"id":"id-6","text":"Latanya Sweeney. Discrimination in online ad delivery. ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Commununications of the ACM","element":"span"},{"text":", 56(5):44–54, 2013.","element":"span"}],[{"id":"id-23","text":"Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. 
In ","element":"span"},{"style":{"fontStyle":"italic"},"text":"Proceedings of the 30th International Conference on Machine Learning (ICML-13)","element":"span"},{"text":", pages 325–333, 2013.","element":"span"}]]},{"heading":"A Missing Proofs for the Classic Stochastic Bandits Upper Bound","paragraphs":[[{"text":"We begin by proving Lemma ","element":"span"},{"href":"#id-28","text":"1, ","element":"a"},{"text":"used in Section ","element":"span"},{"text":"3 ","element":"span"},{"text":"to prove the fairness of the ","element":"span"},{"text":"FairBandits ","element":"span"},{"text":"algorithm.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof of Lemma ","element":"span"},{"href":"#id-28","style":{"fontStyle":"italic"},"text":"1. ","element":"a"},{"text":"Choose an arbitrary arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"and round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t ","element":"span"},{"text":"and define indicator variables ","element":"span"},{"style":{"height":18.75},"width":259.8,"height":46.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/22-0.png","element":"img","alt":" X1, . . . , Xni(t)","inline":true,"padRight":true},{"text":"where ","element":"span"},{"style":{"height":14.62},"width":57.15,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/22-1.png","element":"img","alt":" Xn","inline":true,"padRight":true},{"text":"takes on the reward of pull ","element":"span"},{"style":{"fontStyle":"italic"},"text":"n ","element":"span"},{"text":"of arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"text":". 
By a Chernoff bound, for any ","element":"span"},{"style":{"height":14.8},"width":115.06,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/22-2.png","element":"img","alt":" a ≥ 0,","inline":true}],[{"style":{"width":"82%"},"width":1546,"height":458,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/22-3.png","element":"img"}],[{"text":"By a union bound over all rounds ","element":"span"},{"style":{"fontStyle":"italic"},"text":"t","element":"span"},{"text":", the probability of any true mean ever falling outside its confidence interval is at most ","element":"span"},{"style":{"height":21.75},"width":349.48,"height":54.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/22-4.png","element":"img","alt":" δ · (6/π²) · Σ_{t=1}^∞ 1/t² = δ.","inline":true}],[{"text":"Next, we prove Lemma ","element":"span"},{"href":"#id-31","text":"2, ","element":"a"},{"text":"which we used in Section ","element":"span"},{"text":"3 ","element":"span"},{"text":"to bound the regret of ","element":"span"},{"text":"FairBandits ","element":"span"},{"text":"in Theorem ","element":"span"},{"href":"#id-29","text":"2.","element":"a"}],[{"href":"#id-31","style":{"height":16.4},"width":633.05,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/22-5.png","element":"img","alt":"Proof of Lemma 2. 
Let X_1, ..., X_t","inline":true,"padRight":true},{"text":"be indicator variables of whether ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"was pulled at each time ","element":"span"},{"style":{"height":17.6},"width":132.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/22-6.png","element":"img","alt":" t′ ∈ [t].","inline":true,"padRight":true},{"text":"Let ","element":"span"},{"style":{"height":20.76},"width":931.23,"height":51.9,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/22-7.png","element":"img","alt":" M_t = Σ_{t′≤t} X_{t′}, with E[M_t] = p_t. For any ϵ ∈ [0,","inline":true,"padRight":true},{"text":"1], a standard additive Chernoff bound states that","element":"span"}],[{"style":{"width":"25%"},"width":471,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/22-8.png","element":"img"}],[{"text":"Since ","element":"span"},{"style":{"height":15.13},"width":109.64,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/22-9.png","element":"img","alt":" i ∈ S_t","inline":true},{"text":", it must be that ","element":"span"},{"style":{"height":16.73},"width":631.21,"height":41.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/22-10.png","element":"img","alt":" i ∈ S_{t′} for all t′ ≤ t and all i ∈ S_t","inline":true},{"text":", by the definition of ","element":"span"},{"text":"FairBandits","element":"span"},{"text":". 
Thus, since FairBandits pulls an arm uniformly at random from its active set, which contains at most k arms, ","element":"span"},{"style":{"height":21.29},"width":547.48,"height":53.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/22-11.png","element":"img","alt":" P[X_i = 1] ≥ 1/k for any i ∈ S_t","inline":true},{"text":", and therefore ","element":"span"},{"style":{"height":20.89},"width":117.14,"height":52.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/22-12.png","element":"img","alt":" p_t ≥ t/k","inline":true},{"text":". This also implies that","element":"span"}],[{"style":{"width":"48%"},"width":903,"height":192,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/22-13.png","element":"img"}],[{"text":"Setting ","element":"span"},{"style":{"height":11.6},"width":79.59,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/22-14.png","element":"img","alt":" ϵ_t =","inline":true}],[{"style":{"width":"53%"},"width":1005,"height":218,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/22-15.png","element":"img"}],[{"text":"as desired. Then, taking a union bound over all active arms, of which there are at most ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k","element":"span"},{"text":", the claim follows.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof of Lemma ","element":"span"},{"href":"#id-32","style":{"fontStyle":"italic"},"text":"3. 
","element":"a"},{"text":"This follows from the definition of ","element":"span"},{"style":{"height":19.22},"width":88.78,"height":48.05,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/22-16.png","element":"img","alt":" ℓti, uti ","inline":true,"padRight":true},{"text":"and the lower bound on ","element":"span"},{"style":{"height":19.22},"width":222.2,"height":48.05,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/22-17.png","element":"img","alt":" nti provided","inline":true,"padRight":true},{"text":"by the assumption of the lemma.","element":"span"}],[{"id":"id-33","style":{"fontWeight":"bold"},"text":"A.1 ","element":"span"},{"style":{"fontWeight":"bold"},"text":"Missing Derivation of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"R","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"T","element":"span"},{"text":") ","element":"span"},{"style":{"fontWeight":"bold"},"text":"for Theorem ","element":"span"},{"href":"#id-29","style":{"fontWeight":"bold"},"text":"2","element":"a"}],[{"style":{"width":"66%"},"width":1243,"height":1407,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/23-0.png","element":"img"}],[{"text":"where the final step follows from ","element":"span"},{"style":{"height":25.51},"width":153.45,"height":63.76,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/23-1.png","element":"img","alt":" δ ≤ 1√T .","inline":true}]]},{"heading":"B Missing Proofs for the Classic Stochastic Bandits Lower Bound","paragraphs":[[{"text":"All lemmas in this section are used in Section ","element":"span"},{"text":"4 ","element":"span"},{"text":"to prove the fair lower bound in Theorem ","element":"span"},{"href":"#id-34","text":"3. 
","element":"a"},{"text":"The first, Lemma ","element":"span"},{"href":"#id-36","text":"4, ","element":"a"},{"text":"lets us analyze distributions over payoffs.","element":"span"}],[{"href":"#id-36","style":{"height":16.4},"width":508.41,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/23-2.png","element":"img","alt":"Proof of Lemma 4. Let Ri","inline":true,"padRight":true},{"text":"represent the joint distribution on rewards for either experiment: in both cases, the joint distribution on rewards is identical, since the process which generates them is the same.","element":"span"}],[{"text":"We will use the notation ","element":"span"},{"style":{"height":17.93},"width":231.02,"height":44.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/23-3.png","element":"img","alt":" m, d1, . . . , dt ","inline":true,"padRight":true},{"text":"to represent some fixed realization of the random variables ","element":"span"},{"style":{"height":19.62},"width":559.44,"height":49.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/23-4.png","element":"img","alt":"µi, r1i , . . . , rti and µ′i, r1i , . . . , rti","inline":true},{"text":". In particular, it suffices to show that","element":"span"}],[{"style":{"height":24.39},"width":409.48,"height":60.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/23-5.png","element":"img","alt":"P(µi,r1i ,...,rti)∼W�(µi, r1i ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":", . . . , r","element":"span"},{"style":{"height":20},"width":204.98,"height":50.01,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/23-6.png","element":"img","alt":"ti) = (m, d1","inline":true},{"style":{"fontStyle":"italic"},"text":", . . . 
, d","element":"span"},{"style":{"height":24.39},"width":527.78,"height":60.97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/23-7.png","element":"img","alt":"t)�= P(µ′i,r1i ,...,rti)∼W ′�(µ′i, r1i ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":", . . . , r","element":"span"},{"style":{"height":20},"width":204.98,"height":50.01,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/23-8.png","element":"img","alt":"ti) = (m, d1","inline":true},{"style":{"fontStyle":"italic"},"text":", . . . , d","element":"span"},{"style":{"height":21.07},"width":68.65,"height":52.67,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/23-9.png","element":"img","alt":"t)�.","inline":true}],[{"text":"The first experiment which generates (","element":"span"},{"style":{"height":19.62},"width":228.91,"height":49.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/23-10.png","element":"img","alt":"µi, r1i , . . . , rti","inline":true},{"text":") according to ","element":"span"},{"style":{"fontStyle":"italic"},"text":"W ","element":"span"},{"text":"has probability mass on this ","element":"span"},{"text":"particular value of its random variables:","element":"span"}],[{"style":{"height":24.39},"width":409.48,"height":60.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/23-11.png","element":"img","alt":"P(µi,r1i ,...,rti)∼W�(µi, r1i ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":", . . . , r","element":"span"},{"style":{"height":20.01},"width":204.98,"height":50.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/23-12.png","element":"img","alt":"ti) = (m, d1","inline":true},{"style":{"fontStyle":"italic"},"text":", . . . 
, d","element":"span"},{"style":{"height":24.39},"width":760.37,"height":60.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/23-13.png","element":"img","alt":"t)�= Pµi∼Pi [µi = m] · Pr1i ,...,rti∼B(µi)�(r1i ","inline":true,"padRight":true},{"style":{"fontStyle":"italic"},"text":", . . . , r","element":"span"},{"style":{"height":20.01},"width":147.27,"height":50.02,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/23-14.png","element":"img","alt":"ti) = (d1","inline":true},{"style":{"fontStyle":"italic"},"text":", . . . , d","element":"span"},{"style":{"height":21.07},"width":49.2,"height":52.67,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/23-15.png","element":"img","alt":"t)�","inline":true}],[{"text":"The second experiment has joint probability:","element":"span"}],[{"style":{"width":"68%"},"width":1273,"height":215,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-0.png","element":"img"}],[{"text":"where equality follows from Bayes’ Rule.","element":"span"}],[{"text":"Next, we prove Lemma ","element":"span"},{"href":"#id-37","text":"5, ","element":"a"},{"text":"used to reason about distinguishing between arms.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof of Lemma ","element":"span"},{"href":"#id-37","style":{"fontStyle":"italic"},"text":"5. 
","element":"a"},{"text":"Since neither ","element":"span"},{"style":{"height":17.6},"width":321.26,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-1.png","element":"img","alt":" i nor i + 1 is√δ","inline":true},{"text":"-distinguished by ","element":"span"},{"style":{"height":21.29},"width":560.75,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-2.png","element":"img","alt":" ht, for any αi ∈ {13 + i3k, 13 +","inline":true}],[{"style":{"width":"99%"},"width":1871,"height":380,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-3.png","element":"img"}],[{"text":"which completes the proof.","element":"span"}],[{"text":"Finally, we prove Lemma ","element":"span"},{"href":"#id-38","text":"6, ","element":"a"},{"text":"which lets us reason about how fair algorithm choices depend on histories.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof of Lemma ","element":"span"},{"href":"#id-38","style":{"fontStyle":"italic"},"text":"6. 
","element":"a"},{"text":"We will define a set of histories which cause ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"to play some pair of arms ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"and ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"+ 1 with different probabilities when ","element":"span"},{"style":{"height":12},"width":211.04,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-4.png","element":"img","alt":" µi = µi+1.","inline":true,"padRight":true},{"text":"Define the set unfair(","element":"span"},{"style":{"height":16.8},"width":80.26,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-5.png","element":"img","alt":"A, µ","inline":true},{"text":") such that ","element":"span"},{"style":{"height":18.74},"width":302.78,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-6.png","element":"img","alt":"ht ∈ unfair(A, µ","inline":true},{"text":") if there exist ","element":"span"},{"style":{"height":17.6},"width":319.64,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-7.png","element":"img","alt":" i ∈ [k − 1], t′ ∈ [t","inline":true},{"text":"] such that ","element":"span"},{"style":{"height":25.68},"width":572.03,"height":64.21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-8.png","element":"img","alt":" µi = µi+1 but πt′i|ht′ ̸= πt′i+1|ht′.","inline":true}],[{"text":"Consider some ","element":"span"},{"style":{"height":14.73},"width":37.14,"height":36.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-9.png","element":"img","alt":" ht ","inline":true,"padRight":true},{"text":"which has 
not","element":"span"},{"style":{"height":17.6},"width":78.18,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-10.png","element":"img","alt":"√2δ","inline":true},{"text":"-distinguished any arm, such that there exists some ","element":"span"},{"style":{"height":15.6},"width":126.27,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-11.png","element":"img","alt":" i, t′ for","inline":true,"padRight":true},{"text":"which ","element":"span"},{"style":{"height":26.62},"width":176.83,"height":66.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-12.png","element":"img","alt":" πt′i|ht′ ̸= 1k","inline":true},{"text":". Then, in particular, there exists some ","element":"span"},{"style":{"height":17.6},"width":153.55,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-13.png","element":"img","alt":" i ∈ [k −","inline":true,"padRight":true},{"text":"1] such that ","element":"span"},{"style":{"height":25.68},"width":378.15,"height":64.21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-14.png","element":"img","alt":" πt′i|ht′ ̸= πt′i+1|ht′. 
By","inline":true,"padRight":true},{"text":"Lemma ","element":"span"},{"href":"#id-37","text":"5, ","element":"a"},{"text":"for all ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i ","element":"span"},{"text":"and in particular this ","element":"span"},{"style":{"fontStyle":"italic"},"text":"i","element":"span"},{"text":", it is the case that 2","element":"span"},{"style":{"height":21.69},"width":692.65,"height":54.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-15.png","element":"img","alt":"δ < Pµ′∼P|ht�µ′i = µ′i+1�= X and so","inline":true}],[{"style":{"width":"89%"},"width":1671,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-16.png","element":"img"}],[{"text":"where the first equality comes from the fact that ","element":"span"},{"style":{"height":14.73},"width":37.14,"height":36.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-17.png","element":"img","alt":" ht ","inline":true,"padRight":true},{"text":"is a history for which ","element":"span"},{"style":{"height":23.56},"width":431.73,"height":58.89,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-18.png","element":"img","alt":" πt′i|ht ̸= πt′i+1|ht and the","inline":true,"padRight":true},{"text":"second equality from the definition of the set unfair.","element":"span"}],[{"text":"We will show that Equation ","element":"span"},{"href":"#id-48","text":"4 ","element":"a"},{"text":"cannot hold with probability more than ","element":"span"},{"style":{"height":21.29},"width":17,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-19.png","element":"img","alt":"12 ","inline":true,"padRight":true},{"text":"over the draw of ","element":"span"},{"style":{"height":18.33},"width":82.83,"height":45.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-20.png","element":"img","alt":" µ, ht","inline":true,"padRight":true},{"text":"from the 
underlying distribution, or else ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"would not satisfy ","element":"span"},{"style":{"height":12.8},"width":20,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-21.png","element":"img","alt":" δ","inline":true},{"text":"-fairness. Since ","element":"span"},{"style":{"height":13.2},"width":113.98,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-22.png","element":"img","alt":" A is δ","inline":true},{"text":"-fair, for any fixed ","element":"span"},{"style":{"height":12},"width":26,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-23.png","element":"img","alt":" µ","inline":true}],[{"id":"id-48","style":{"width":"30%"},"width":577,"height":53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-24.png","element":"img"}],[{"text":"and therefore for any distribution ","element":"span"},{"style":{"height":16.4},"width":262.97,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-25.png","element":"img","alt":" P over µ that","inline":true}],[{"style":{"width":"36%"},"width":680,"height":55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-26.png","element":"img"}],[{"text":"Lemma ","element":"span"},{"href":"#id-36","text":"4 ","element":"a"},{"text":"also implies ","element":"span"},{"style":{"height":21.69},"width":810.88,"height":54.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-27.png","element":"img","alt":" δ ≥ P_{µ∼P, h_t∼A|µ, µ′∼P|h_t}[h_t ∈ unfair(A, µ′)]","inline":true},{"text":", so by Markov’s 
inequality","element":"span"}],[{"style":{"width":"53%"},"width":997,"height":90,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/24-28.png","element":"img"}],[{"style":{"width":"81%"},"width":1527,"height":154,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-0.png","element":"img"}],[{"text":"However, Equation ","element":"span"},{"href":"#id-48","text":"4 ","element":"a"},{"text":"shows this does not hold for any ","element":"span"},{"style":{"height":14.73},"width":37.14,"height":36.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-1.png","element":"img","alt":" ht ","inline":true,"padRight":true},{"text":"which does not","element":"span"},{"style":{"height":17.6},"width":78.18,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-2.png","element":"img","alt":"√2δ","inline":true},{"text":"-distinguish any arm but for which ","element":"span"},{"style":{"height":26.62},"width":632.32,"height":66.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-3.png","element":"img","alt":" πt′i|ht′ ̸= 1k for some i ∈ [k], t′ ≤ t","inline":true},{"text":". Thus, for at least ","element":"span"},{"style":{"height":21.29},"width":17,"height":53.23,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-4.png","element":"img","alt":" 12 ","inline":true,"padRight":true},{"text":"of all probability mass over ","element":"span"},{"text":"histories, either ","element":"span"},{"style":{"height":27.02},"width":775.27,"height":67.56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-5.png","element":"img","alt":" πt′i|ht′ = 1k for all i, t′ ≤ t, or ht must√2δ","inline":true},{"text":"-distinguish some arm.","element":"span"}]]},{"heading":"C Missing Proofs for the Contextual Bandit Setting","paragraphs":[[{"text":"We begin by proving two results related to ","element":"span"},{"text":"KWIKToFair","element":"span"},{"text":". 
The first, Lemma ","element":"span"},{"href":"#id-41","text":"8, ","element":"a"},{"text":"was used in Section ","element":"span"},{"text":"5 ","element":"span"},{"text":"to prove that ","element":"span"},{"style":{"height":12.8},"width":372.2,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-6.png","element":"img","alt":" KWIKToFair is δ","inline":true},{"text":"-fair in Theorem ","element":"span"},{"href":"#id-42","text":"5.","element":"a"}],[{"style":{"fontStyle":"italic"},"text":"Proof of Lemma ","element":"span"},{"href":"#id-41","style":{"fontStyle":"italic"},"text":"8. ","element":"a"},{"text":"We will refer to a violation of either (a) or (b) as a failure of learner ","element":"span"},{"style":{"height":14.62},"width":55.22,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-7.png","element":"img","alt":" L_i.","inline":true,"padRight":true},{"text":"For each ","element":"span"},{"style":{"height":14.62},"width":41.7,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-8.png","element":"img","alt":" L_i","inline":true},{"text":", the queries asked of it are pairs (","element":"span"},{"style":{"height":19.22},"width":95,"height":48.05,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-9.png","element":"img","alt":"h_i, x_i^t","inline":true},{"text":"), each consisting of a history and a new context. 
","element":"span"},{"text":"There are at most ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"contexts queried, and at most ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"histories on which ","element":"span"},{"style":{"height":14.62},"width":41.7,"height":36.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-10.png","element":"img","alt":" Li","inline":true,"padRight":true},{"text":"is queried for a fixed run of our algorithm (namely, prefixes of ","element":"span"},{"style":{"height":14.62},"width":41.7,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-11.png","element":"img","alt":" Li","inline":true},{"text":"’s final history). Thus, there are at most ","element":"span"},{"style":{"height":14.73},"width":48.56,"height":36.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-12.png","element":"img","alt":" T 2","inline":true,"padRight":true},{"text":"queries for ","element":"span"},{"style":{"height":14.62},"width":55.22,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-13.png","element":"img","alt":" Li.","inline":true,"padRight":true},{"text":"Thus, by a union bound over these ","element":"span"},{"style":{"height":14.73},"width":48.56,"height":36.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-14.png","element":"img","alt":" T 2 ","inline":true,"padRight":true},{"text":"queries for learner ","element":"span"},{"style":{"height":14.62},"width":41.7,"height":36.55,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-15.png","element":"img","alt":" Li","inline":true},{"text":", by the KWIK guarantee, ","element":"span"},{"style":{"height":17.6},"width":89.75,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-16.png","element":"img","alt":" P 
[Li","inline":true,"padRight":true},{"text":"fails in some round] ","element":"span"},{"style":{"height":21.29},"width":432.39,"height":53.24,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-17.png","element":"img","alt":" ≤ T 2δ∗ = min(δ, 1T )/k","inline":true},{"text":", and by a union bound over ","element":"span"},{"style":{"fontStyle":"italic"},"text":"k ","element":"span"},{"text":"arms, ","element":"span"},{"text":"P ","element":"span"},{"text":"[A learner fails in a round] ","element":"span"},{"style":{"height":21.29},"width":236.74,"height":53.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-18.png","element":"img","alt":" ≤ min(δ, 1T ).","inline":true}],[{"text":"We proceed to Theorem ","element":"span"},{"href":"#id-43","text":"4, ","element":"a"},{"text":"used in Section ","element":"span"},{"text":"5 ","element":"span"},{"text":"to construct a ","element":"span"},{"style":{"height":12.8},"width":20,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-19.png","element":"img","alt":" δ","inline":true},{"text":"-fair algorithm with quantified regret from KWIK learners.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof of Theorem ","element":"span"},{"href":"#id-43","style":{"fontStyle":"italic"},"text":"4. ","element":"a"},{"text":"We use repeated calls to ","element":"span"},{"style":{"height":17.6},"width":391.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-20.png","element":"img","alt":" KWIKToFair (δ, T","inline":true},{"text":") to run for an indefinite number of rounds. Specifically, we will make calls ","element":"span"},{"style":{"height":19.53},"width":1058.15,"height":48.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-21.png","element":"img","alt":" E = 1, 2, . . . to KWIKToFair (6δ/π(log(T)2, 2E). 
We","inline":true,"padRight":true},{"text":"will refer to each such call to ","element":"span"},{"text":"KWIKToFair ","element":"span"},{"text":"by its epoch ","element":"span"},{"style":{"fontStyle":"italic"},"text":"E","element":"span"},{"text":". ","element":"span"},{"text":"By Lemma ","element":"span"},{"href":"#id-41","text":"8, ","element":"a"},{"text":"each epoch ","element":"span"},{"style":{"fontStyle":"italic"},"text":"E ","element":"span"},{"text":"is 6","element":"span"},{"style":{"height":19.13},"width":145.96,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-22.png","element":"img","alt":"δ/(πE)^2","inline":true},{"text":"-fair, i.e., it has probability at most 6","element":"span"},{"style":{"height":19.13},"width":145.96,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-23.png","element":"img","alt":"δ/(πE)^2","inline":true,"padRight":true},{"text":"of violating fairness. Therefore, by a union bound across epochs, the probability of ever violating fairness through repeated calls to ","element":"span"},{"text":"KWIKToFair ","element":"span"},{"text":"is bounded above by ","element":"span"},{"href":"#id-49","style":{"height":26.15},"width":593.34,"height":65.38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-24.png","element":"img","alt":"Σ_{E=1}^∞ 6δ/(πE)^2 = (6δ/π^2) Σ_{E=1}^∞ 1/E^2 = δ","inline":true},{"text":", so the overall algorithm is ","element":"span"},{"style":{"height":12.8},"width":111.98,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-25.png","element":"img","alt":" δ-fair.","inline":true}],[{"text":"Next, by Lemma ","element":"span"},{"href":"#id-49","text":"9, ","element":"a"},{"text":"each epoch ","element":"span"},{"style":{"fontStyle":"italic"},"text":"E ","element":"span"},{"text":"contributes regret at most 3 
","element":"span"},{"style":{"height":20.31},"width":335.56,"height":50.77,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-26.png","element":"img","alt":" · 2^E k ϵ∗_E, where ϵ∗_E ","inline":true,"padRight":true},{"text":"denotes the ","element":"span"},{"text":"value of ","element":"span"},{"style":{"height":12.73},"width":34.71,"height":31.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-27.png","element":"img","alt":" ϵ∗ ","inline":true,"padRight":true},{"text":"used in epoch ","element":"span"},{"style":{"height":20.31},"width":872.01,"height":50.77,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-28.png","element":"img","alt":" E, i.e., the ϵ∗_E satisfying ϵ∗_E = k · m(ϵ∗_E, 6δ/(πE)^2, 2^E","inline":true},{"text":"). Then, since each epoch ","element":"span"},{"style":{"fontStyle":"italic"},"text":"E ","element":"span"},{"text":"covers 2","element":"span"},{"style":{"height":8.8},"width":26,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-29.png","element":"img","alt":"E ","inline":true,"padRight":true},{"text":"rounds, through round ","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"the algorithm has used fewer than log(","element":"span"},{"style":{"fontStyle":"italic"},"text":"T","element":"span"},{"text":") epochs, and by the doubling trick it achieves regret ","element":"span"},{"style":{"height":24.29},"width":1081.5,"height":60.74,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-30.png","element":"img","alt":" R(T) < Σ_{E=1}^{log(T)} 3 · 2^E k ϵ∗_E = O(Tkϵ∗) = O(k^2 · m(ϵ∗, δ∗)).","inline":true}],[{"text":"Next, we address the special case of ","element":"span"},{"text":"KWIKToFair ","element":"span"},{"text":"for linear functions stated in Corollary ","element":"span"},{"href":"#id-50","text":"1.","element":"a"}],[{"style":{"fontStyle":"italic"},"text":"Proof of Corollary 
","element":"span"},{"href":"#id-50","style":{"fontStyle":"italic"},"text":"1. ","element":"a"},{"text":"By Lemma ","element":"span"},{"href":"#id-44","text":"10, ","element":"a"},{"text":"for each arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j","element":"span"},{"text":", the associated learner ","element":"span"},{"style":{"height":17.02},"width":44.7,"height":42.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-31.png","element":"img","alt":" Lj","inline":true,"padRight":true},{"text":"has mistake bound ","element":"span"},{"style":{"height":36.05},"width":1530.02,"height":90.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-32.png","element":"img","alt":"m(ϵ, δ) = Õ(d^3/ϵ^4). Since ϵ∗ satisfies ϵ∗ = k · m(ϵ∗, δ)/T, we get ϵ∗ = (kd^3/T)^{1/5}","inline":true},{"text":". Substituting this into Theorem ","element":"span"},{"href":"#id-43","text":"4, ","element":"a"},{"text":"the overall regret satisfies ","element":"span"},{"style":{"height":19.13},"width":730.05,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-33.png","element":"img","alt":" R(T) = O(k^2 · m(ϵ∗, δ∗)) = O(Tkϵ∗) =","inline":true},{"style":{"height":20.33},"width":317.21,"height":50.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-34.png","element":"img","alt":"O(T^{4/5} k^{6/5} d^{3/5}).","inline":true}],[{"text":"We now give the formal description of ","element":"span"},{"text":"ConjunctionBandit ","element":"span"},{"text":"and its regret bound; the algorithm is used in Section ","element":"span"},{"href":"#id-51","text":"6.1 ","element":"a"},{"text":"as an example of an unfair learning algorithm for 
conjunctions.","element":"span"}],[{"style":{"width":"98%"},"width":1843,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/25-35.png","element":"img"}],[{"style":{"width":"24%"},"width":464,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/26-0.png","element":"img"}],[{"style":{"height":15.93},"width":1316.98,"height":39.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/26-1.png","element":"img","alt":"4: St ← ∅ ▷","inline":true,"padRight":true},{"text":"Initialize active set of arms ","element":"span"},{"style":{"height":16.4},"width":587.2,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/26-2.png","element":"img","alt":"5: for j = 1, 2, . . . , k do","inline":true}],[{"style":{"width":"99%"},"width":1864,"height":398,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/26-3.png","element":"img"}],[{"style":{"height":18.73},"width":1173.58,"height":46.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/26-4.png","element":"img","alt":"13: Pull arm j∗ ← (x ∈R St) ▷","inline":true,"padRight":true},{"text":"Pull arm from active set at random","element":"span"}],[{"text":"We can now upper bound the regret achieved by ","element":"span"},{"text":"ConjunctionBandit","element":"span"},{"text":".","element":"span"}],[{"id":"id-52","style":{"width":"69%"},"width":1304,"height":49,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/26-5.png","element":"img"}],[{"style":{"fontStyle":"italic"},"text":"Proof of Lemma ","element":"span"},{"href":"#id-52","style":{"fontStyle":"italic"},"text":"13. 
","element":"a"},{"text":"First, we claim that for every ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j","element":"span"},{"text":", throughout the run of the algorithm, ","element":"span"},{"style":{"height":17.42},"width":95.06,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/26-6.png","element":"img","alt":" Cj ⊆","inline":true},{"style":{"height":19.69},"width":246.1,"height":49.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/26-7.png","element":"img","alt":"C∗j, where Cj","inline":true,"padRight":true},{"text":"is the true set of variables corresponding to ","element":"span"},{"style":{"height":17.42},"width":36.36,"height":43.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/26-8.png","element":"img","alt":" fj","inline":true},{"text":", and C∗j is the candidate set maintained by the algorithm. This holds at initialization: ","element":"span"},{"style":{"height":18.22},"width":200,"height":45.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/26-9.png","element":"img","alt":" Cj ⊆ [d] =","inline":true},{"style":{"height":19.69},"width":51.31,"height":49.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/26-10.png","element":"img","alt":"C∗j ","inline":true,"padRight":true},{"text":". Suppose the claim holds prior to round ","element":"span"},{"style":{"height":19.69},"width":142.07,"height":49.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/26-11.png","element":"img","alt":" t: if C∗j ","inline":true,"padRight":true},{"text":"is updated in this round, then ","element":"span"},{"style":{"height":21.63},"width":257.24,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/26-12.png","element":"img","alt":" fj(x^t_j) = 1 ⇒","inline":true},{"style":{"height":21.62},"width":1871.88,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/26-13.png","element":"img","alt":"∀m ∈ [d]: x^t_{j,m} = 0 ⇒ m ∉ Cj. 
Thus, C∗j = C∗j \\ {m : x^t_{j,m} = 0} = C∗j \\ {m : x^t_{j,m} = 0 ∧ m ∉ Cj} ⊇ Cj.","inline":true}],[{"text":"Therefore, the algorithm never makes false positive mistakes: in any round ","element":"span"},{"style":{"height":21.62},"width":386.87,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/26-14.png","element":"img","alt":" t, j ∈ St ⇒ fj(x^t_j) =","inline":true,"padRight":true},{"text":"1. Therefore, ","element":"span"},{"text":"ConjunctionBandit ","element":"span"},{"text":"accumulates regret only in rounds where it makes a false negative mistake, i.e., where it predicts that all arms have reward 0 while some arm has reward 1.","element":"span"}],[{"text":"The regret is Regret(","element":"span"},{"style":{"height":31.6},"width":943.47,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/26-15.png","element":"img","alt":"x^1, . . . , x^T) = Σ_t max_j [fj(x^t_j)] − E[Σ_t f_{i_t}(x^t_{i_t})]","inline":true},{"text":". We then rewrite the first term as ","element":"span"},{"style":{"height":31.6},"width":1035.62,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/26-16.png","element":"img","alt":"Σ_t max_j [fj(x^t_j)] = Σ_t I{fj(x^t_j) = 1 for some j ∈ [k]}","inline":true,"padRight":true},{"text":"and the second term as","element":"span"}],[{"style":{"width":"85%"},"width":1603,"height":565,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/26-17.png","element":"img"}],[{"text":"where the last inequality follows from ","element":"span"},{"style":{"height":32.4},"width":679.61,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/26-18.png","element":"img","alt":" P[j∗ = j | St = ∅ ∧ fj(x^t_j) = 1] = 1/k ","inline":true,"padRight":true},{"text":"and the fact that if ","element":"span"},{"style":{"height":21.63},"width":625.48,"height":54.06,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/26-19.png","element":"img","alt":"St = ∅ and fj∗(x^t_{j∗}) = 1, then 
Cj∗","inline":true,"padRight":true},{"text":"loses at least one variable. Since it starts with ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"variables, this can occur ","element":"span"},{"text":"at most ","element":"span"},{"style":{"fontStyle":"italic"},"text":"d ","element":"span"},{"text":"times for each arm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"j","element":"span"},{"text":". Substituting these facts into the original regret expression yields","element":"span"}],[{"style":{"width":"84%"},"width":1580,"height":243,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/26-20.png","element":"img"}],[{"text":"Finally, we prove Lemma ","element":"span"},{"href":"#id-47","text":"12, ","element":"a"},{"text":"which we used in Section ","element":"span"},{"href":"#id-51","text":"6.1 ","element":"a"},{"text":"to translate between fair and KWIK learning on conjunctions.","element":"span"}],[{"style":{"fontStyle":"italic"},"text":"Proof of Lemma ","element":"span"},{"href":"#id-47","style":{"fontStyle":"italic"},"text":"12. 
","element":"a"},{"text":"We follow the structure of the proof of Theorem ","element":"span"},{"href":"#id-45","text":"6, ","element":"a"},{"text":"once again using ","element":"span"},{"text":"FairToKWIK ","element":"span"},{"text":"to construct a KWIK learner ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"text":"by running the given fair algorithm ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"on a constructed bandit instance for each context ","element":"span"},{"style":{"height":14.73},"width":51.16,"height":36.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-0.png","element":"img","alt":" xt.","inline":true}],[{"text":"There are two main modifications for the case of conjunctions: since conjunctions output either 0 or 1, we set ","element":"span"},{"style":{"height":22.49},"width":608.57,"height":56.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-1.png","element":"img","alt":" ϵ = 0, ϵ∗ = 1, and δ∗ = δ/(2T). A","inline":true,"padRight":true},{"text":"therefore runs on 2","element":"span"},{"style":{"fontStyle":"italic"},"text":"T ","element":"span"},{"text":"histories and contexts, ","element":"span"},{"text":"each of the form (","element":"span"},{"style":{"height":18.73},"width":1130.93,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-2.png","element":"img","alt":"xt, x(0) = 0) or (xt, x(1) = 1). 
Since we initialize A to be δ∗","inline":true},{"text":"-fair, if we fix the history ","element":"span"},{"style":{"height":14.73},"width":37.14,"height":36.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-3.png","element":"img","alt":" ht","inline":true,"padRight":true},{"text":"along with the context and arm assignment (","element":"span"},{"style":{"height":18.73},"width":118.47,"height":46.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-4.png","element":"img","alt":"xt, x(ℓ","inline":true},{"text":")), then with probability at least 1 ","element":"span"},{"style":{"height":21.49},"width":290.18,"height":53.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-5.png","element":"img","alt":" − δ∗, p_1^{t,ℓ} > p_2^{t,ℓ}","inline":true,"padRight":true},{"text":"implies ","element":"span"},{"style":{"height":18.73},"width":222.01,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-6.png","element":"img","alt":" f∗(xt) > ℓ/","inline":true},{"text":"2, and similarly ","element":"span"},{"style":{"height":21.49},"width":570.56,"height":53.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-7.png","element":"img","alt":" p_2^{t,ℓ} > p_1^{t,ℓ} implies f∗(xt) < ℓ/","inline":true},{"text":"2. 
A union bound over all such ","element":"span"},{"style":{"height":12.8},"width":134.77,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-8.png","element":"img","alt":"t and ℓ","inline":true,"padRight":true},{"text":"shows that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A ","element":"span"},{"text":"satisfies these implications for all ","element":"span"},{"style":{"height":12.8},"width":134.77,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-9.png","element":"img","alt":" t and ℓ","inline":true,"padRight":true},{"text":"with probability at least 1 ","element":"span"},{"style":{"height":15.6},"width":228.87,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-10.png","element":"img","alt":" − δ, and we","inline":true,"padRight":true},{"text":"condition on this event for the rest of the proof.","element":"span"}],[{"text":"We now prove that the resulting KWIK learner ","element":"span"},{"style":{"height":12.8},"width":114.33,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-11.png","element":"img","alt":" B is ϵ","inline":true},{"text":"-accurate. Here, since ","element":"span"},{"style":{"height":15.6},"width":212.92,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-12.png","element":"img","alt":" ϵ = 0, this","inline":true,"padRight":true},{"text":"requires showing that all of ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B","element":"span"},{"text":"’s numerical predictions are correct. Assume instead that ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"text":"outputs an incorrect prediction on (","element":"span"},{"style":{"height":18.74},"width":118.47,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-13.png","element":"img","alt":"xt, x(ℓ","inline":true},{"text":")). 
By the construction of ","element":"span"},{"text":"FairToKWIK","element":"span"},{"text":", a prediction from ","element":"span"},{"style":{"fontStyle":"italic"},"text":"B ","element":"span"},{"text":"implies that at least one of ","element":"span"},{"style":{"height":21.89},"width":387.97,"height":54.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-14.png","element":"img","alt":" p_1^{t,0}, p_1^{t,1}, p_2^{t,0}, and p_2^{t,1}","inline":true,"padRight":true},{"text":"is distinct from the others. This gives two cases. In the first case, ","element":"span"},{"style":{"height":45.19},"width":1870.07,"height":112.98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-15.png","element":"img","alt":" p_1^{t,ℓ} ≤ p_2^{t,ℓ} for both ℓ = 0 and ℓ = 1. By distinctness, this means that either p_1^{t,0} < p_2^{t,0} or p_1^{t,1} < p_2^{t,1}","inline":true,"padRight":true},{"text":". By the fairness assumption, this respectively implies that ","element":"span"},{"style":{"height":18.73},"width":464.9,"height":46.83,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-16.png","element":"img","alt":" f∗(xt) < f(x(0)) = 0 or","inline":true},{"style":{"height":18.73},"width":1048.16,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-17.png","element":"img","alt":"f∗(xt) < f(x(1)) = 1. In either event, f∗(xt) = 0 = ŷt","inline":true},{"text":". In the second case, ","element":"span"},{"style":{"height":21.49},"width":181.1,"height":53.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-18.png","element":"img","alt":" p_1^{t,ℓ} > p_2^{t,ℓ}","inline":true,"padRight":true},{"text":"for at least one of ","element":"span"},{"style":{"height":21.89},"width":417.55,"height":54.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-19.png","element":"img","alt":" ℓ = 0 or 1. 
p_1^{t,1} > p_2^{t,1}","inline":true,"padRight":true},{"text":"violates the fairness assumption on ","element":"span"},{"style":{"fontStyle":"italic"},"text":"A","element":"span"},{"text":", since ","element":"span"},{"style":{"fontStyle":"italic"},"text":"f","element":"span"},{"text":"(","element":"span"},{"style":{"fontStyle":"italic"},"text":"x","element":"span"},{"text":"(1)) = 1, so it must be that ","element":"span"},{"style":{"height":21.89},"width":181.3,"height":54.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-20.png","element":"img","alt":" p_1^{t,0} > p_2^{t,0}","inline":true,"padRight":true},{"text":". Fairness then implies that ","element":"span"},{"style":{"height":18.73},"width":291.24,"height":46.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-21.png","element":"img","alt":" f∗(xt) = 1 = ŷt","inline":true},{"text":". Therefore, ","element":"span"},{"style":{"height":12.8},"width":292.76,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-22.png","element":"img","alt":" B is ϵ-accurate.","inline":true}],[{"text":"It remains to upper bound ","element":"span"},{"style":{"height":17.6},"width":112.42,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-23.png","element":"img","alt":" m(ϵ, δ","inline":true},{"text":"). Any round in which ","element":"span"},{"style":{"height":16},"width":246.65,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-24.png","element":"img","alt":" B outputs ⊥","inline":true,"padRight":true},{"text":"corresponds to a choice between two contexts, one of which has a reward difference of 1 between its arms. It follows that choosing uniformly at random among both arms and both contexts incurs expected regret 1","element":"span"},{"style":{"fontStyle":"italic"},"text":"/","element":"span"},{"text":"4. 
Therefore ","element":"span"},{"style":{"height":24.22},"width":503.98,"height":60.54,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-25.png","element":"img","alt":"m(ϵ, δ)/4 < R(m(ϵ, δ), δ∗, d) =","inline":true},{"style":{"height":22.49},"width":322.24,"height":56.22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1605.07139/images/27-26.png","element":"img","alt":"R(m(ϵ, δ), δ/(2T), d).","inline":true}]]}],"_version":"3.3.4"},"paperNode":"$28:props:children:props:children:0:props:product"}]]