b

DiscoverSearch
About
Near Neighbor: Who is the Fairest of Them All?
2019·arXiv
Abstract
Abstract

In this work we study a fair variant of the near neighbor problem. Namely, given a set of n points P and a parameter r, the goal is to preprocess the points, such that given a query point q, any point in the r-neighborhood of the query, i.e., Bpq, rq, have the same probability of being reported as the near neighbor.

We show that LSH based algorithms can be made fair, without a significant loss in efficiency. Specifically, we show an algorithm that reports a point in the r-neighborhood of a query q with almost uniform probability. The query time is proportional to O`dnspq.rqQpn, cq˘,and its space is OpSpn, cqq, where Qpn, cq and Spn, cq are the query time and space of an LSH algorithm for c-approximate near neighbor, and dnspq, rq is a function of the local density around q.

Our approach works more generally for sampling uniformly from a sub-collection of sets of a given collection and can be used in a few other applications. Finally, we run experiments to show performance of our approach on real data.

Nowadays, many important decisions, such as college admissions, offering home loans, or estimating the likelihood of recidivism, rely on machine learning algorithms. There is a growing concern about the fairness of the algorithms and creating bias toward a specific population or feature [HPS16, Cho17, MSP16, KLL`17]. While algorithms are not inherently biased, nevertheless, they may amplify the already existing biases in the data. Hence, this concern has led to the design of fair algorithms for many different applications, e.g., [DOBD`18,ABD`18, PRW`17, CKLV19, EJJ`19, OA18, CKLV17, BIO`19, BCN19, KSAM19].

Bias in the data used for training machine learning algorithms is a monumental challenge in creating fair algorithms [HGB`07, TE11, ZVGRG17, Cho17]. Here, we are interested in a somewhat different problem, of handling the bias introduced by the data-structures used by such algorithms. Specifically, data-structures may introduce bias in the data stored in them, and the way they answer queries, because of the way the data is stored and how it is being accessed. Such a defect leads to selection bias by the algorithms using such data-structures. It is natural to want data-structures that do not introduce a selection bias into the data when handling queries.

The target as such is to derive data-structures that are bias-neutral. To this end, imagine a data-structure that can return, as an answer to a query, an item out of a set of acceptable answers. The purpose is then to return uniformly a random item out of the set of acceptable outcomes, without explicitly computing the whole set of acceptable answers (which might be prohibitively expensive).

Several notions of fairness have been studied, including  group fairness1 (where demographics of the population is preserved in the outcome) and individual fairness (where the goal is to treat individuals with similar conditions similarly) [DHP`12]. In this work, we study the near neighbor problem from the perspective of individual fairness.

Near Neighbor is a fundamental problem that has applications in many areas such as machine learning, databases, computer vision, information retrieval, and many others, see [SDI06, AI08] for an overview. The problem is formally defined as follows. Let pM, dq be a metric space. Given a set  P Ď M of npoints and a parameter r, the goal of the near neighbor problem is to preprocess P, such that for a query point q P M, one can report a point p P P, such that dpp, qq ď rif such a point exists. As all the existing algorithms for the exact variant of the problem have either space or query time that depends exponentially on the ambient dimension of M, people have considered the approximate variant of the problem. In the c-approximate near neighbor (ANN) problem, the algorithm is allowed to report a point p whose distance to the query is at most cr if a point within distance r of the query exists, for some prespecified constant  c ą 1.

Perhaps the most prominent approach to get an ANN data structure is via Locality Sensitive Hashing (LSH) [IM98, HIM12], which leads to sub-linear query time and subquadratic space. In particular, for  M “ Rd, by using LSHone can get a query time of nρ`op1q and space n1`ρ`op1q where for the  L1distance metric  ρ “ 1{c [IM98, HIM12], and forthe  L2distance metric  ρ “ 1{c2 `ocp1q [AI08]. The idea of the LSH method is to hash all the points using several hash functions that are chosen randomly, with the property that closer points have a higher probability of collision than the far points. Therefore, the closer points to a query have a higher probability of falling into a bucket being probed than far points. Thus, reporting a random point from a random bucket computed for the query, produces a distribution that is biased by the distance to the query: closer points to the query have a higher probability of being chosen.

When random nearby is better than nearest. The bias mentioned above towards nearer points is usually a good property, but is not always desirable. Indeed, consider the following scenarios:

(I) The nearest neighbor might not be the best if the input is noisy, and the closest point might be viewed as an unrepresentative outlier. Any point in the neighborhood might be then considered to be equivalently beneficial. This is to some extent why k-NN classification [ELL09] is so effective in reducing the effect of noise.

(II) However, k-NN works better in many cases if k is large, but computing the k nearestneighbors is quite expensive if k is large [HAAA14]. Computing quickly a random nearby neighbor can significantly speed-up such classification.

(III) We are interested in annonymizing the query [Ada07], thus returning a random near-neighbor might serve as first line of defense in trying to make it harder to recover the query. Similarly, one might want to anonymize the nearest-neighbor [QA08], for applications were we are interested in a “typical” data item close to the query, without identifying the nearest item.

(IV) If one wants to estimate the number of items with a desired property within the neighborhood, then the easiest way to do it is via uniform random sampling from the neighborhood. In particular, this is useful for density estimation [KLK12].

(V) Another natural application is simulating a random walk in the graph where two items are connected if they are in distance at most r from each other. Such random walks are used by some graph clustering algorithms [HK01].

1.1. Results

Our goal is to solve the near-neighbor problem, and yet be fair among “all the points” in the neighborhood. We introduce and study the fair near neighbor problem – where the goal is to report any point of Npq, rq with uniform distribution. That is, report a point within distance r of the query point with probability of  Ppq, rq “ 1{npq, rq, where npq, rq “ |Npq, rq|.Naturally, we study the approximate fair near neighbor problem, where one can hope to get efficient data-structures. We have the following results:

image

(II) Approximate neighborhood. This formulation reports an almost uniform distribution from an approximate neighborhood S of the query. We can provide such a data structure that uses space Spn, cq and whose query time is rOpQpn, cqq, albeit in expectation. See Lemma 4.3 for the exact statement.

Moreover, the algorithm produces the samples independently of past queries. In particular, one can assume that an adversary is producing the set of queries and has full knowledge of the data structure. Even then the generated samples have the same (almost) uniform guarantees. Furthermore, we remark that the new sampling strategy can be embedded in the existing LSH method to achieve unbiased query results. Finally, we remark that to get a distribution that is  p1 ` εq-uniform (See preliminaries for the definition), the dependence of our algorithms on  ε is only Oplogp1{εqq.

Very recently, independent of our work, [APS19] also provides a similar definition for the fair near neighbor problem.

Experiments. Finally, we compare the performance of our algorithm with the algorithm that uniformly picks a bucket and reports a random point, on the MNIST, SIFT10K, and GloVe data sets. Our empirical results show that while the standard LSH algorithm fails to fairly sample a point in the neighborhood of the query, our algorithm produces an empirical distribution which is much closer to the uniform distribution: it improves the statistical distance to the uniform distribution by a significant factor.

1.2. Data-structure: Sampling from a sub-collection of sets

We first study the more generic problem – given a collection F of sets from a universe of n elements, a query is a sub-collection  G Ď Fof these sets and the goal is to sample (almost) uniformly from the union of the sets in this sub-collection. We do this by first sampling a set X in the sub-collection G proportional to the size of the set X, and then sampling an element x P X uniformly at random. This produces a distribution on elements in the sub-collection such that any element x is chosen proportional to its degree dpxq (i.e., the number of sets X P G that x P X). Therefore, we can use rejection sampling and only report x with probability 1{dpxq.

We can compute the degree by checking if x P X for all sets X P G in the collection, which takes time proportional to  g “ |G|. Also because of the rejection sampling, we might need to repeat this process  Opdegavgqtimes which can be as large as g. This leads to expected runtime  Opg2qto generate a single sample, see Lemma 3.1.

As a first improvement, we approximate the degree using standard sampling techniques, which can be done in time rOpg{dpxqq. Although this can still be large for small degrees, however, those small values will also be rejected with a smaller probability. Using this, we can bound the runtime of a query by  Opε´2g log nq, see Lemma 3.3(the sampling is p1 ˘ εq-uniform), where n is (roughly) the input size.

Our second improvement, which the authors believe to be quite interesting, follows by simulating this rejection sampling directly. This follows by first introducing a heuristic to approximate the degree, and then shortcutting it to get the desired simulation. Section 3.4 describes this modified algorithm. In particular, one can get uniform sampling with high probability. Specifically, one can sample uniformly in expected  Opg log γ´1qtime, where the sampling succeeds with probability  ě 1´γ. Alternatively, one can sample  p1˘εq-uniformly, with the expected running time being  O`g logpn{εq˘. This is a significant improvement, in both simplicity and dependency on  ε, over the previous scheme.

We also show how to modify that data-structure to handle outliers, as it is the case for LSH, as the sampling algorithm needs to ignore such points once they are reported as a sample.

Applications. Here are a few examples of applications of such a data-structure (for sampling

from a union of sets):

(A) Given a subset X of vertices in the graph, randomly pick (with uniform distribution) a neighbor to one of the vertices of X. This can be used in simulating disease spread [KE05].

(B) Here, we use a variant of this data-structure to implement the fair ANN. (C) Uniform sampling for range searching [HQT14, AW17, AP19]. Indeed, consider a set of points, stored in a data-structure for range queries. Using the above, we can support sampling from the points reported by several queries, even if the reported answers are not disjoint. Being unaware of any previous work on this problem, we believe this data-structure is of independent interest.

1.3. Paper organization

We describe some basic sampling and approximation tools in Section 2. We describe the sampling from union of set data-structure in Section 3. The application of the data-structure to LSH is described in Section 4. The experiments are described in Section 5.

Neighborhood, fair nearest-neighbor, and approximate neighborhood. Let pM, dq be a metric space and let  P Ď Mbe a set of n points. Let  Bpc, rq “ tx P M | dpc, xq ď rube the (close) ball of radius r around a point  c P M, and let Npc, rq “ Bpc, rq X P be ther-neighborhood of c in P. The size of the r-neighborhood is  npc, rq “ |Npc, rq|.

Definition 2.1 (FANN). Given a data set  P Ď M of npoints and a parameter r, the goal is to preprocess P such that for a given query q, one reports each point p P Npq, rq with probability  µp where µis an approximately uniform probability distribution:  Ppq, rq{p1`εq ďµp ď p1 ` εqPpq, rq, where Ppq, rq “ 1{npq, rq.

Definition 2.2 (FANN with approximate neighborhood). Given a data set  P Ď M of n pointsand a parameter r, the goal is to preprocess them such that for a given query q, one reports each point p P S with probability  µp where ϕ{p1 ` εq ď µp ď p1 ` εqϕ, where Sis a point set such that  Npq, rq Ď S Ď Npq, crq, and ϕ “ 1{|S|.

Set representation. Let U be an underlying ground set of n objects (i.e., elements). In this paper, we deal with sets of objects. Assume that such a set  X Ď Uis stored in some reasonable data-structure, where one can insert delete, or query an object in constant time. Querying for an object o P U, requires deciding if o P X. Such a representation of a set is straightforward to implement using an array to store the objects, and a hash table. This representation allows random access to the elements in the set, or uniform sampling from the set.

If hashing is not feasible, one can just use a standard dictionary data-structure – this would slow down the operations by a logarithmic factor.

Subset size estimation. We need the following standard estimation tool, [BHR`17,Lemma 2.8].

Lemma 2.3. Consider two sets  B Ď U, where n “ |U|. Let ξ, γ P p0, 1qbe parameters, such that  γ ă 1{ log n. Assume that one is given an access to a membership oracle that, given an element x P U, returns whether or not x P B. Then, one can compute an estimate s, such that  p1´ξq |B| ď s ď p1`ξq |B|, and computing this estimates requires  Oppn{ |B|qξ´2 log γ´1qoracle queries. The returned estimate is correct with probability  ě 1 ´ γ.

Weighted sampling. We need the following standard data-structure for weighted sampling.

Lemma 2.4. Given a set of objects  H “ to1, . . . , otu, with associated weights  w1, . . . , wt, onecan preprocess them in Optq time, such that one can sample an object out of H. The probability of an object  oito be sampled is  wi{ řtj“1 wj. In addition the data-structure supports updates to the weights. An update or sample operation takes Oplog tq time.

Proof: Build a balanced binary tree T, where the objects of G are stored in the leaves. Every internal node u of T, also maintains the total weight wpuq of the objects in its subtree. The tree T has height Oplog tq, and weight updates can be carried out in Oplog tq time, by updating the path from the root to the leaf storing the relevant object.

Sampling is now done as follows – we start the traversal from the root. At each stage, when being at node u, the algorithm considers the two children  u1, u2. It continues to  u1with probability  wpu1q{wpuq, and otherwise it continues into  u2. The object sampled is the one in the leaf that this traversal ends up at.

The problem. Assume you are given a data-structure that contains a large collection F of sets of objects. The sets in F are not necessarily disjoint. The task is to preprocess the data-structure, such that given a sub-collection  G Ď Fof the sets, one can quickly pick uniformly at random an object from the set ŤG :“ ŤXPG X.

Naive solution. The naive solution is to take the sets under consideration (in G), compute their union, and sample directly from the union set ŤG. Our purpose is to do (much) better – in particular, the goal is to get a query time that depends logarithmically on the total size of all sets in G.

3.1. Preprocessing

For each set X P F, we build the set representation mentioned in the preliminaries section. In addition, we assume that each set is stored in a data-structure that enables easy random access or uniform sampling on this set (for example, store each set in its own array). Thus, for each set X, and an element, we can decide if the element is in X in constant time.

3.2. Uniform sampling via exact degree computation

The query is a family  G Ď F, and define  m “���G���:“ řXPG |X|(which should be distinguished from  g “ |G| and from n “ |Ť G|). The degreeof an element  x P Ť G, is the number of sets of G that contains it – that is,  dGpxq “ |DGpxq|, where DGpxq “ tX P G | x P Xu . Thealgorithm repeatedly does the following:

(I) Picks one set from G with probabilities proportional to their sizes. That is, a set X P G is picked with probability |X| {m. (II) It picks an element x P X uniformly at random. (III) Computes the degree  d “ dGpxq.(IV) Outputs x and stop with probability 1{d. Otherwise, continues to the next iteration.

Lemma 3.1. Let n “ |Ť G| and g “ |G|. The above algorithm samples an element  x P Ť Gaccording to the uniform distribution. The algorithm takes in expectation  Opgm{nq “ Opg2qtime. The query time is takes  Opg2 log nqwith high probability.

Proof: Let m “���G���. Observe that an element  x P Ť Gis picked by step (II) with probability α “ dpxq{m. The element x is output with probability  β “ 1{dpxq. As such, the probability of x to be output by the algorithm in this round is  αβ “ 1{���G���. This implies that the output distribution is uniform on all the elements of Ť G.

The probability of success in a round is n{m, which implies that in expectation m{n rounds are used, and with high probability Oppm{nq log nq rounds. Computing the degree dGpxq takes Op|G|qtime, which implies the first bound on the running time. As for the second bound, observe that an element can appear only once in each set of G, which readily implies that  dpyq ď |G|, for all y P Ť G.

3.3. Almost uniform sampling via degree approximation

The bottleneck in the above algorithm is computing the degree of an element. We replace this by an approximation.

Definition 3.2. Given two positive real numbers x and y, and a parameter  ε P p0, 1q, thenumbers  x and y are ε-approximationof each other, denoted by  x «ε y, if x{p1 ` εq ďy ď xp1 ` εq and y{p1 ` εq ď x ď yp1 ` εq.

In the approximate version, given an item  x P Ť G, we can approximate its degree and get an improved runtime for the algorithm.

Lemma 3.3. The input is a family of sets F that one can preprocess in linear time. Let G Ď Fbe a sub-family and let  n “ |Ť G|, g “ |G|, and ε P p0, 1qbe a parameter. One can sample an element  x P Ť Gwith almost uniform probability distribution. Specifically, the probability of an element to be output is  «ε 1{n. After linear time preprocessing, the query time is  Opgε´2 log nq, in expectation, and the query succeeds with high probability.

Proof: Let m “���G���. Since dpxq “ |DGpxq|, it follows that we need to approximate the size of DGpxq in G. Given a set X P G, we can in constant time check if x P X, and as such decide if  X P DGpxq. It follows that we can apply the algorithm of Lemma 2.3, which requires Wpxq “ O` gdpxqε´2 log n˘time, where the algorithm succeeds with high probability. The query algorithm is the same as before, except that it uses the estimated degree.

For  x P Ť G, let Exbe the event that the element x is picked for estimation in a round, and let  E1x be the event that it was actually output in that round. Clearly, we have  PrE1x | Exs “1{d, where d is the degree estimate of  x. Since d «ε dpxq(with high probability), it follows that  PrE1x | Exs «ε 1{dpxq. Since there are dpxq copies of x in G, and the element for estimation is picked uniformly from the sets of G, it follows that the probability of any element  x P Ť Gto be output in a round is

image

as  E1x Ď Ex.As such, the probability of the algorithm terminating in a round is  α “řxPŤ G PrE1xs «ε n{m ě n{2m.As for the expected amount of work in each round, observe that it is proportional to

image

Intuitively, since the expected amount of work in each iteration is W, and the expected number of rounds is 1{α, the expected running time is  OpW{αq.This argument is not quite right, as the amount of work in each round effects the probability of the algorithm to terminate in the round (i.e., the two variables are not independent). We continue with a bit more care – let  Libe the running time in the ith round of the algorithm if it was to do an ith iteration (i.e., think about a version of the algorithm that skips the experiment in the end of the iteration to decide whether it is going to stop), and let  Yibe a random variable that is 1 if the (original) algorithm had not stopped at the end of the first i iterations of the algorithm.

By the above, we have that  yi “ PrYi “ 1s “ PrYi “ 1 | Yi´1 “ 1s PrYi´1 “ 1s ď p1 ´αqyi´1 ď p1 ´ αqi, and ErLis “ OpWq. Importantly,  Li and Yi´1are independent (while Li and Yiare dependent). We clearly have that the running time of the algorithm is O`ř8i“1 Yi´1Li˘(here, we define  Y0 “1). Thus, the expected running time of the algorithm is proportional to

image

because of linearity of expectations, and since  Li and Yi´1are independent.

Remark 3.4. The query time of Lemma 3.3 deteriorates to  O`gε´2 log2 n˘if one wants the bound to hold with high probability. This follows by restarting the query algorithm if the query time exceeds (say by a factor of two) the expected running time. A standard application of Markov’s inequality implies that this process would have to be restarted at most Oplog nq times, with high probability.

Remark 3.5. The sampling algorithm is independent of whether or not we fully know the underlying family F and the sub-family G. This means the past queries do not affect the sampled object reported for the query G. Therefore, the almost uniform distribution property holds in the presence of several queries and independently for each of them.

3.4. Almost uniform sampling via simulation

It turns out that one can avoid the degree approximation stage in the above algorithm, and achieve only a polylogarithmic dependence on  ε´1. To this end, let x be the element picked. We need to simulate a process that accepts x with probability 1{dpxq.

We start with the following natural idea for estimating dpxq – probe the sets randomly (with replacement), and stop in the ith iteration if it is the first iteration where the probe found a set that contains x. If there are g sets, then the distribution of i is geometric, with probability  p “ dpxq{g. In particular, in expectation,  Eris “ g{dpxq, which implies that dpxq “ g{ Eris. As such, it is natural to take g{i as an estimation for the degree of x. Thus, to simulate a process that succeeds with probability 1{dpxq, it would be natural to return 1 with probability i{g and 0 otherwise. Surprisingly, while this seems like a heuristic, it does work, under the right interpretation, as testified by the following.

Lemma 3.6. Assume we have g urns, and exactly  d ą 0of them, are non-empty. Furthermore, assume that we can check if a specific urn is empty in constant time. Then, there is a randomized algorithm, that outputs a number  Y ě 0, such that  ErY s “ 1{d. The expected running time of the algorithm is Opg{dq.

Proof: The algorithm repeatedly probes urns (uniformly at random), until it finds a non-empty urn. Assume it found a non-empty urn in the ith probe. The algorithm outputs the value i{g and stops.

Setting  p “ d{g, and let Ybe the output of the algorithm. we have that

image

using the formula ř8i“1 ixi “ x{p1 ´ xq2.The expected number of probes performed by the algorithm until it finds a non-empty urn is 1{p “ g{d, which implies that the expected running time of the algorithm is Opg{dq.

The natural way to deploy Lemma 3.6, is to run its algorithm to get a number y, and then return 1 with probability y. The problem is that y can be strictly larger than 1, which is meaningless for probabilities. Instead, we backoff by using the value y{∆, for some parameter ∆. If the returned value is larger than 1, we just treat it at zero. If the zeroing never happened, the algorithm would return one with probability 1{pdpxq∆q– which we can use to our purposes via, essentially, amplification. Instead, the probability of success is going to be slightly smaller, but fortunately, the loss can be made arbitrarily small by taking ∆ to be sufficiently large.

Lemma 3.7. There are g urns, and exactly  d ą 0of them are not empty. Furthermore, assume one can check if a specific urn is empty in constant time. Let  γ P p0, 1q be aparameter. Then one can output a number  Z ě 0, such that  Z P r0, 1s, and ErZs P I ““ 1d∆ ´ γ, 1d∆‰, where ∆ “ rln γ´1s ` 4 “ Θplog γ´1q.The expected running time of the algorithm is Opg{dq.

image

Proof: We modify the algorithm of Lemma 3.6, so that it outputs  i{pg∆qinstead of i{g. If the algorithm does not stop in the first  g∆ `1 iterations, then the algorithm stops and outputs 0. Observe that the probability that the algorithm fails to stop in the first  g∆iterations, for  p “ d{g, is p1 ´ pqg∆ ď exp´´ dgg∆¯ď expp´d∆q ď expp´∆q ! γ.

Let Z be the random variable that is the number output by the algorithm. Arguing as in Lemma 3.6, we have that  ErZs ď 1{pd∆q.More precisely, we have  ErZs “ 1d∆ ´

image

Let  gpjq “ j`1∆ expp´djq. We have that  ErZs ě 1d∆ ´ β, where β “ ř8j“∆ gpjq. Further-more, for  j ě∆, we have

image

As such, we have that

image

Lemma 3.8. The input is a family of sets F that one preprocesses in linear time. Let  G Ď Fbe a sub-family and let  n “ |Ť G|, g “ |G|, and let ε P p0, 1qbe a parameter. One can sample an element  x P Ť Gwith almost uniform probability distribution. Specifically, the probability of an element to be output is  «ε 1{n. After linear time preprocessing, the query time is Opg logpg{εqq, in expectation, and the query succeeds, with high probability (in g).

Proof: The algorithm repeatedly samples an element x using steps (I) and (II) of the algorithm of Section 3.2. The algorithm returns x if the algorithm of Lemma 3.7, invoked with γ “ pε{gqOp1q returns 1. We have that ∆  “ Θplogpg{εqq. Let α “ 1{pdpxq∆q. The algorithm returns x in this iteration with probability  p, where p P rα´γ, αs. Observe that  α ě 1{pg∆q,which implies that  γ ! pε{4qα, it follows that  p «ε 1{pdpxq∆q, as desired. The expected running time of each round is Opg{dpxqq.

Arguing as in Lemma 3.3, this implies that each round, in expectation takes Opng{mq time, where  m “���G���. Similarly, the expected number of rounds, in expectation, is  Op∆m{nq.Again, arguing as in Lemma 3.3, implies that the expected running time is  Opg∆q “Opg logpg{εqq.

Remark 3.9. Similar to Remark 3.4, the query time of Lemma 3.8 can be made to work with high probability with an additional logarithmic factor. Thus with high probability, the query time is  Opg logpg{εq log nq.

3.5. Handling outliers

Imagine a situation where we have a marked set of outliers O. We are interested in sampling from Ť GzO. We assume that the total degree of the outliers in the query is at most  mO forsome prespecified parameter  mO. More precisely, we have  dGpOq “ řxPO dGpxq ď mO.

Lemma 3.10. The input is a family of sets F that one can preprocess in linear time. A query is a sub-family  G Ď F, a set of outliers O, a parameter  mO, and a parameter  ε P p0, 1q.

One can either

(A) Sample an element  x P Ť GzO with ε-approximate uniform distribution. Specifically, the probabilities of two elements to be output is the same up to a factor of 1  ˘ ε.

image

Proof: The main modification of the algorithm of Lemma 3.8, is that whenever we encounter an outlier (the assumption is that one can check if an element is an outlier in constant time), then we delete it from the set X where it was discovered. If we implement sets as arrays, this can be done by moving an outlier object to the end of the active prefix of the array, and decreasing the count of the active array. We also need to decrease the (active) size of the set. If the algorithm encounters more than  mOoutliers then it stops and reports that the number of outliers is too large.

Otherwise, the algorithm continues as before. The only difference is that once the query process is done, the active count (i.e., size) of each set needs to be restored to its original size, as is the size of the set. This clearly can be done in time proportional to the query time.

In this section, we employ our data structure of Section 3 to show the two results on uniformly reporting a neighbor of a query point mentioned in Section 1.1. First, let us briefly give some preliminaries on LSH. We refer the reader to [HIM12] for further details. Throughout the section, we assume that our metric space, admits the LSH data structure.

4.1. Background on LSH

Locality Sensitive Hashing (LSH). Let D denote the data structure constructed by LSH, and let c denote the approximation parameter of LSH. The data-structure D consists of L hash functions  g1, . . . , gL (e.g., L « n1{c for a c-approximate LSH), which are chosen via a random process and each function hashes the points to a set of buckets. For a point p P M, let Hippqbe the bucket that the point p is hashed to using the hash function  gi. Thefollowing are standard guarantees provided by the LSH data structure [HIM12].

Lemma 4.1. For a given query point  q, let S “ Ťi Hipqq. Then for any point p P Npq, rq, we have that with a probability of least 1´1{e´1{3, we have (i)  p P S and (ii) |SzBpq, crq| ď 3L,i.e., the number of outliers is at most 3L. Moreover, the expected number of outliers in any single bucket  Hipqqis at most 1.

Therefore, if we take  t “ Oplog nqdifferent data structures  D1, . . . , Dtwith corresponding hash functions  gji to denote the ith hash function in the jth data structure, we have the following lemma.

Lemma 4.2. Let the query point be q, and let p be any point in Npq, rq. Then, with high probability, there exists a data structure  Dj, such that  p P S “ Ťi Hji pqq and |SzBpq, crq| ď3L.

By the above, the space used by  LSH is Spn, cq “ rOpn ¨ Lqand the query time is  Qpn, cq “rOpLq.

4.2. Approximate Neighborhood

For  t “ Oplog nq, let D1, . . . , Dtbe data structures constructed by LSH. Let F be the set of all buckets in all data structures, i.e.,  F “␣Hji ppqˇˇ i ď L, j ď t, p P P(.For a query point q, consider the family G of all buckets containing the query, i.e.,  G “ tHji pqq | i ď L, j ď tu,and thus  |G| “ OpL log nq. Moreover, we let O to be the set of outliers, i.e., the points that are farther than cr from q. Note that as mentioned in Lemma 4.1, the expected number of outliers in each bucket of LSH is at most 1. Therefore, by Lemma 3.10, we immediately get the following result.

Lemma 4.3. Given a set P of n points and a parameter r, we can preprocess it such that given query q, one can report a point p P S with probability  µp where ϕ{p1`εq ď µp ď p1`εqϕ,where S is a point set such that  Npq, rq Ď S Ď Npq, crq, and ϕ “ 1{|S|. The algorithm uses space Spn, cq and its expected query time is  rOpQpn, cq ¨ logp1{εqq.

Proof: Let S “ Ť GzO; by Lemma 4.2, we know that  Npq, rq Ď S Ď Npq, crq, and moreover in expectation  mO ď L “ |G|. We apply the algorithm of Lemma 3.10. The runtime of the algorithm is in expectation  rOp|G| logp1{εqq “ rOpL¨logp1{εqq “ rOpQpn, cq¨logp1{εqq, and thealgorithm produces an almost uniform distribution over the points in S.

Remark 4.4. For the L1distance, the runtime of our algorithm is  rOpnp1{cq`op1qqand for the L2distance, the runtime of our algorithm is  rOpnp1{c2q`op1qq. These matches the runtime of the standard LSH-based near neighbor algorithms up to polylog factors.

4.3. Exact Neighborhood

As noted earlier, the result of the previous section only guarantees a query time which holds in expectation. Here, we provide an algorithm whose query time holds with high probability. Note that, here we cannot apply Lemma 3.10 directly, as the total number of outliers in our data structure might be large with non-negligible probability (and thus we cannot bound mO). However, as noted in Lemma 4.2, with high probability, there exists a subset of these data structures  J Ď rtssuch that for each j P J, the number of outliers in  Sj “ Ťi Hji pqq isat most 3L, and moreover, we have that  Npq, rq Ď ŤjPJ Sj. Therefore, on a high level, we make a guess  J1 of J, which we initialize it to  J1 “ rts, and start by drawing samples from G; once we encounter more than 3L outliers from a certain data structure  Dj, we infer that j R J, update the value of  J1 “ J1ztju, and set the weights of the buckets corresponding to Djequal to 0, so that they will never participate in the sampling process. As such, at any iteration of the algorithm we are effectively sampling from  G “ tHji pqq | i ď L, j P J1u.

Preprocessing. We keep t “ Oplog nq LSHdata structures which we refer to as  D1, . . . , Dt,and we keep the hashed points by the ith hash function of the jth data structure in the array denoted by  Hji . Moreover, for each bucket in  Hji , we store its size  |Hji |.

Query Processing. We maintain the variables  zji showing the weights of the bucket  Hji pqq,which is initialized to  |Hji pqq|that is stored in the preprocessing stage. Moreover, we keep

the set of outliers detected from  Hji pqq in Oji which is initially set to be empty. While running the algorithm, as we detect an outlier in  Hji pqq, we add it to  Oji , and we further decrease  zjiby one. Moreover, in order to keep track of  J1, for any data structure  Dj, whenever ři |Oji |exceeds 3L, we will ignore all buckets in  Dj, by setting all corresponding  zji to zero.

At each iteration, the algorithm proceeds by sampling a bucket  Hji pqqproportional toits weight  zji , but only among the set of buckets from those data structures  Dj for whichless than 3L outliers are detected so far, i.e.,  j P J1. We then sample a point uniformly at random from the points in the chosen bucket that have not been detected as an outlier, i.e.,  Hji pqqzOji . If the sampled point is an outlier, we update our data structure accordingly. Otherwise, we proceed as in Lemma 3.8.

Definition 4.5 (Active data structures and active buckets). Consider an iteration k of the algorithm. Let us define the set of active data structures to be the data structures from whom we have seen less than 3L outliers so far, and let us denote their indices by  J1k Ď rts, i.e.,J1k “␣jˇˇ ři |Oij| ă 3L(.Moreover, let us define the active buckets to be all buckets containing the query in these active data structures, i.e.,  Gk “ tHji pqq | i ď L, j P J1ku.

Observation 4.6. Lemma 4.2 implies that with high probability at any iteration k of the algorithm  Npq, rq Ď Ť Gk.

Definition 4.7 (active size). For an active bucket  Hji pqq, we define its active size to be  zji whichshows the total number of points in the bucket that have not yet been detected as an outlier, i.e.,  |Hji pqqzOji |.

Lemma 4.8. Given a set P of n points and a parameter r, we can preprocess it such that given a query q, one can report a point p P P with probability  µp, so that there exists a value

image

The space used is rOpSpn, cqq and the query time is  rO`Qpn, cq¨logp1{εqqwith high probability.

Proof: First note that the algorithm never outputs an outlier, and thus the third item is always satisfied. Next, let K be a random variable showing the number of iterations of the algorithm, and for an iteration k, define the random variable  Mk “ Npq, crqXŤ Gkas the set of non-outlier points in the set of active buckets. Conditioned on  K “ k, by Lemma 3.8, weknow that the distribution of the output is almost uniform on  Mk. Moreover, we know that for all  k we have Mk Ď Mk´1, and that by Observation 4.6,  Npq, rq Ď Mk. Therefore, for all points in Npq, rq their probability of being reported as the final output of the algorithm is equal, and moreover, for all points in Npq, crqzNpq, rq, their probability of being reported is lower (as at some iteration, some of these points might go out of the set of active buckets). This proves the probability condition.

To bound the query time, let us consider the iterations where the sampled point p is an outlier, and not an outlier, separately. The total number of iterations where an outlier point is sampled is at most 3L ¨ t “ rOpLq “ rOpQpn, cqqfor which we only pay rOp1q cost. For non-outlier points, their total cost can be bounded using Lemma 3.8 and Remark 3.9 by rOp|G1| logp1{εqq “ rOpL ¨ logp1{εqq “ rOpQpn, cq ¨ logp1{εqq.

Lemma 4.9. Given a set P of n points and a parameter r, we can preprocess it such that given a query q, one can report a point p P S with probability  µp where µis an approximately uniform probability distribution:  ϕ{p1 ` εq ď µp ď ϕp1 ` εq, where ϕ “ 1{|Npq, rq|. Thealgorithm uses space Spn, cq and has query time of  rO`Qpn, cq ¨ |Npq,crq||Npq,rq| ¨ logp1{εq˘with highprobability.

Proof: We run Algorithm of Lemma 4.8, and while its output is outside of Npq, rq, we ignore it and run the algorithm again. By Lemma 4.8, the output is guaranteed to be almost uniform on Npq, rq. Moreover, by Lemma 4.8, and because with high probability, we only need to run the algorithm  rOp|Npq,crq||Npq,rq| qtimes, we get the desired bound on the query time.

In this section, we consider the task of retrieving a random point from the neighborhood of a given query point, and evaluate the effectiveness of our proposed algorithm empirically on real data sets.

Data set and Queries. We run our experiments on three datasets that are standard benchmarks in the context of Nearest Neighbor algorithms (see [ABF17])

(I) Our first data set contains a random subset of 10K points in the MNIST training data set [LBBH98]3. The full data set contains 60K images of hand-written digits, where each image is of size 28 by 28. For the query, we use a random subset of 100 (out of 10K) images of the MNIST test data set. Therefore, each of our points lie in a 784 dimensional Euclidean space and each coordinate is in r0, 255s. (II) Second, we take SIFT10K image descriptors that contains 10K 128-dimensional points as data set and 100 points as queries 4.

(III) Finally, we take a random subset of 10K words from the GloVe data set [PSM14] and a random subset of 100 words as our query. GloVe is a data set of 1.2M word embeddings in 100-dimensional space and we further normalize them to unit norm.

We use the  L2Euclidean distance to measure the distance between the points.

LSH data structure and parameters. We use the locality sensitive hashing data structure for the  L2Euclidean distance [AI08]. That is, each of the L hash functions  gi, is aconcatenation of k unit hash functions  h1i ‘ ¨ ¨ ¨ ‘ hki . Each of the unit hash functions  hjiis chosen by selecting a point in a random direction (by choosing every coordinate from a Gaussian distribution with parameters p0, 1q). Then all the points are projected onto this one dimensional direction. Then we put a randomly shifted one dimensional grid of length w along this direction. The cells of this grid are considered as buckets of the unit hash function. For tuning the parameters of LSH, we follow the method described in [DIIM04], and the manual of E2LSH library [And05], as follows.

For MNIST, the average distance of a query to its nearest neighbor in the our data set is around 4.5. Thus we choose the near neighbor radius  r “5. Consequently, as we observe, the r-neighborhood of at least half of the queries are non-empty. As suggested in [DIIM04] to set the value of  w “4, we tune it between 3 and 5 and set its value to  w “ 3.1. We tunek and L so that the false negative rate (the near points that are not retrieved by LSH) is less than 10%, and moreover the cost of hashing (proportional to L) balances out the cost of scanning. We thus get  k “ 15 and L “100. This also agrees with the fact that L should be roughly square root of the total number of points. Note that we use a single LSH data structure as opposed to taking  t “ Oplog nqinstances. We use the same method for the other two data sets. For SIFT, we use  R “ 255, w “ 4, k “ 15, L “100, and for GloVe we use  R “ 0.9, w “ 3.3, k “ 15, and L “ 100.

Algorithms. Given a query point q, we retrieve all L buckets corresponding to the query. We then implement the following algorithms and compare their performance in returning a neighbor of the query point.

‚ Uniform/Uniform: Picks bucket uniformly at random and picks a random point in bucket.

‚ Weighted/Uniform: Picks bucket according to its size, and picks uniformly random point inside bucket.

‚ Optimal: Picks bucket according to size, and then picks uniformly random point p inside bucket. Then it computes  p’s degree exactlyand rejects p with probability 1  ´ 1{degppq.

‚ Degree approximation: Picks bucket according to size, and picks uniformly random point p inside bucket. It approximates p’s degree and rejects p with probability 1  ´1{deg1ppq.

Degree approximation method. We use the algorithm of Section 3.4 for the degree approximation: we implement a variant of the sampling algorithm which repeatedly samples a bucket uniformly at random and checks whether p belongs to the bucket. If the first time this happens is at iteration i, then it outputs the estimate as  deg1ppq “ L{i.

Experiment Setup. In order to compare the performance of different algorithms, for each query q, we compute Mpqq: the set of neighbors of q which fall to the same bucket as q by at least one of the L hash functions. Then for 100|Mpqq| times, we draw a sample from the neighborhood of the query, using all four algorithms. We compare the empirical distribution of the reported points on |Mpqq| with the uniform distribution on it. More specifically, we compute the total variation distance (statistical distance)5 to the uniform distribution. We repeat each experiment 10 times and report the average result of all 10 experiments over all 100 query points.

Results. Figure 5.1 shows the comparison between all four algorithms. To compare their performance, we compute the total variation distance of the empirical distribution of the algorithms to the uniform distribution. For the tuned parameters (k “ 15 , L “ 100), ourresults are as follows. For MNIST, we see that our proposed degree approximation based algorithm performs only 2.4 times worse than the optimal algorithm, while we see that other standard sampling methods perform 6.6 times and 10 times worse than the optimal algorithm. For SIFT, our algorithm performs only 1.4 times worse than the optimal while the other two perform 6.1 and 9.7 times worse. For GloVe, our algorithm performs only 2.7 times worse while the other two perform 6.5 and 13.1 times worse than the optimal algorithm.

Moreover, in order get a different range of degrees and show that our algorithm works well for those cases, we further vary the parameters k and L of LSH. More precisely, to get higher ranges of the degrees, first we decrease k (the number of unit hash functions used in each of the L hash function); this will result in more collisions. Second, we increase L (the total number of hash functions). These are two ways to increase the degree of points. For example for the MNIST data set, the above procedure increases the degree range from r1, 33s to r1, 99s.

Query time discussion. As stated in the experiment setup, in order to have a meaningful comparison between distributions, in our code, we retrieve a random neighbor of each query 100m times, where m is the size of its neighborhood (which itself can be as large as 1000). We further repeat each experiment 10 times. Thus, every query might be asked upto 106times. This is going to be costly for the optimal algorithm that computes the degree exactly. Thus, we use the fact that we are asking the same query many times and preprocess the exact degrees for the optimal solution. Therefore, it is not meaningful to compare runtimes directly. Thus we run the experiments on a smaller size dataset to compare the runtimes of all the four approaches: For  k “ 15 and L “100, our sampling approach is twice faster than the optimal algorithm, and almost five times slower than the other two approaches. However, when the number of buckets (L) increases from 100 to 300, our algorithm is 4.3 times faster than the optimal algorithm, and almost 15 times slower than the other two approaches.

Trade-off of time and accuracy. We can show a trade-off between our proposed sampling approach and the optimal. For the MNIST data set with tuned parameters (k “ 15 andL “100), by asking twice more queries (for degree approximation), the solution of our approach improves from 2.4 to 1.6, and with three times more, it improves to 1.2, and with four times more, it improves to 1.05. For the SIFT data set (using the same parameters), using twice more queries, the solution improves from 1.4 to 1.16, and with three times more, it improves to 1.04, and with four times more, it improves to 1.05. For GloVe, using twice more queries, the solution improves from 2.7 to 1.47, and with three times more, it improves to 1.14, and with four times more, it improves to 1.01.

image

Figure 5.1: Comparison of the performance of the four algorithms is measured by computing the statistical distance of their empirical distribution to the uniform distribution.

The authors would like to thank Piotr Indyk for the helpful discussions about the modeling and experimental sections of the paper.

[ABD`18]Alekh Agarwal, Alina Beygelzimer, Miroslav Dud´ık, John Langford, and Hanna M. Wallach. A reductions approach to fair classification. In Jennifer G. Dy and Andreas Krause, editors, Proc. 35th Int. Conf. Mach. Learning (ICML), volume 80 of Proc. of Mach. Learn. Research, pages 60–69. PMLR, 2018.

[ABF17] M. Aum¨uller, E. Bernhardsson, and A. Faithfull. Ann-benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. In International Conference on Similarity Search and Applications, 2017.

[Ada07] Eytan Adar. User 4xxxxx9: Anonymizing query logs. Appeared in the workshop Query Log Analysis: Social and Technological Challenges, in association with WWW 2007, 01 2007.

[AI08] A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM, 51(1):117–122, 2008.

[And05] Alexandr Andoni. E2lsh 0.1 user manual. https://www.mit.edu/ andoni/LSH/manual.pdf, 2005.

[AP19] Peyman Afshani and Jeff M. Phillips. Independent range sampling, revisited again. CoRR, abs/1903.08014, 2019. to appear in SoCG 2019.

[APS19] Martin Aum¨uller, Rasmus Pagh, and Francesco Silvestri. Fair near neighbor search: Independent range sampling in high dimensions. arXiv preprint arXiv:1906.01859, 2019.

[AW17] Peyman Afshani and Zhewei Wei. Independent range sampling, revisited. In Kirk Pruhs and Christian Sohler, editors, 25th Annual European Symposium on Algorithms, ESA 2017, September 4-6, 2017, Vienna, Austria, volume 87 of LIPIcs, pages 3:1–3:14. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2017.

[BCN19] Suman K Bera, Deeparnab Chakrabarty, and Maryam Negahbani. Fair algorithms for clustering. arXiv preprint arXiv:1901.02393, 2019.

[BHR`17]Paul Beame, Sariel Har-Peled, Sivaramakrishnan Natarajan Ramamoorthy, Cyrus Rashtchian, and Makrand Sinha. Edge estimation with independent set oracles. CoRR, abs/1711.07567, 2017.

[BIO`19]Arturs Backurs, Piotr Indyk, Krzysztof Onak, Baruch Schieber, Ali Vakilian, and Tal Wagner. Scalable fair clustering. arXiv preprint arXiv:1902.03519, 2019.

[Cho17] Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big data, 5(2):153–163, 2017.

[CKLV17] Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, and Sergei Vassilvitskii. Fair clustering through fairlets. In Advances in Neural Information Processing Systems, pages 5029–5037, 2017.

[CKLV19] Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, and Sergei Vassilvtiskii. Matroids, matchings, and fairness. In Proceedings of Machine Learning Research, volume 89 of Proceedings of Machine Learning Research, pages 2212–2220. PMLR, 2019.

[DHP`12]Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the 3rd innovations in theoretical computer science conference, pages 214–226. ACM, 2012.

[DIIM04] M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proc. 20th Annu. Sympos. Comput. Geom. (SoCG), pages 253–262, 2004.

[DOBD`18] Michele Donini, Luca Oneto, Shai Ben-David, John S Shawe-Taylor, and Massimiliano Pontil. Empirical risk minimization under fairness constraints. In Advances in Neural Information Processing Systems, pages 2791–2801, 2018.

[EJJ`19]Hadi Elzayn, Shahin Jabbari, Christopher Jung, Michael Kearns, Seth Neel, Aaron Roth, and Zachary Schutzman. Fair algorithms for learning in allocation problems. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 170–179. ACM, 2019.

[ELL09] Brian S. Everitt, Sabine Landau, and Morven Leese. Cluster Analysis. Wiley Publishing, 4th edition, 2009.

[HAAA14] Ahmad Basheer Hassanat, Mohammad Ali Abbadi, Ghada Awad Altarawneh, and Ahmad Ali Alhasanat. Solving the problem of the K parameter in the KNN classifier using an ensemble learning approach. CoRR, abs/1409.0919, 2014.

[HGB`07]Jiayuan Huang, Arthur Gretton, Karsten Borgwardt, Bernhard Sch¨olkopf, and Alex J Smola. Correcting sample selection bias by unlabeled data. In Advances in neural information processing systems, pages 601–608, 2007.

[HIM12] S. Har-Peled, P. Indyk, and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. Theory Comput., 8:321–350, 2012. Special issue in honor of Rajeev Motwani.

[HK01] David Harel and Yehuda Koren. On clustering using random walks. In Ramesh Hariharan, Madhavan Mukund, and V. Vinay, editors, FST TCS 2001: Foundations of Software Technology and Theoretical Computer Science, 21st Conference, Bangalore, India, December 13-15, 2001, Proceedings, volume 2245 of Lecture Notes in Computer Science, pages 18–41. Springer, 2001.

[HPS16] Moritz Hardt, Eric Price, and Nati Srebro. Equality of opportunity in supervised learning. In Daniel D. Lee, Masashi Sugiyama, Ulrike von Luxburg, Isabelle Guyon, and Roman Garnett, editors, Neural Info. Proc. Sys. (NIPS), pages 3315–3323, 2016.

[HQT14] Xiaocheng Hu, Miao Qiao, and Yufei Tao. Independent range sampling. In Richard Hull and Martin Grohe, editors, Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS’14, Snowbird, UT, USA, June 22-27, 2014, pages 246–255. ACM, 2014.

[IM98] P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proc. 30th Annu. ACM Sympos. Theory Comput. (STOC), pages 604–613, 1998.

[KE05] Matt J Keeling and Ken T.D Eames. Networks and epidemic models. Journal of The Royal Society Interface, 2(4):295–307, September 2005.

[KLK12] Yi-Hung Kung, Pei-Sheng Lin, and Cheng-Hsiung Kao. An optimal k-nearest neighbor for density estimation. Statistics & Probability Letters, 82(10):1786 – 1791, 2012.

[KLL`17]Jon Kleinberg, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan. Human decisions and machine predictions. The quarterly journal of economics, 133(1):237–293, 2017.

[KSAM19] Matth¨aus Kleindessner, Samira Samadi, Pranjal Awasthi, and Jamie Morgenstern. Guarantees for spectral clustering with fairness constraints. arXiv preprint arXiv:1901.08668, 2019.

[LBBH98] Yann LeCun, L´eon Bottou, Yoshua Bengio, and Patrick Haffner. Gradientbased learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[MSP16] Cecilia Munoz, Megan Smith, and DJ Patil. Big Data: A Report on Algorithmic Systems, Opportunity, and Civil Rights. Executive Office of the President and Penny Hill Press, 2016.

[OA18] Matt Olfat and Anil Aswani. Convex formulations for fair principal component analysis. arXiv preprint arXiv:1802.03765, 2018.

[PRW`17]Geoff Pleiss, Manish Raghavan, Felix Wu, Jon Kleinberg, and Kilian Q Weinberger. On fairness and calibration. In Advances in Neural Information Processing Systems, pages 5680–5689, 2017.

[PSM14] Jeffrey Pennington, Richard Socher, and Christopher Manning. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 1532–1543, 2014.

[QA08] Yinian Qi and Mikhail J. Atallah. Efficient privacy-preserving k-nearest neighbor search. In 28th IEEE International Conference on Distributed Computing Systems (ICDCS 2008), 17-20 June 2008, Beijing, China, pages 311–319. IEEE Computer Society, 2008.

[SDI06] Gregory Shakhnarovich, Trevor Darrell, and Piotr Indyk. Nearest-neighbor methods in learning and vision: theory and practice (neural information processing). The MIT Press, 2006.

[TE11] A. Torralba and A. A. Efros. Unbiased look at dataset bias. In CVPR 2011, pages 1521–1528, 2011.

[ZVGRG17] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Kr- ishna P Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web, pages 1171–1180, 2017.


Designed for Accessibility and to further Open Science