b

DiscoverSearch
About
My stuff
Ranking a set of objects: a graph based least-square approach
2020·arXiv
Abstract
Abstract

We consider the problem of ranking N objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers. We assume that objects are endowed with intrinsic qualities and that the probability with which an object is preferred to another depends only on the difference between the qualities of the two competitors. We propose a class of non-adaptive ranking algorithms that rely on a least-squares optimization criterion for the estimation of qualities. Such algorithms are shown to be asymptotically optimal (i.e., they require  O( Nǫ2 log Nδ )comparisons to be  (ǫ, δ)-PAC). Numerical results show that our schemes are very efficient also in many non-asymptotic scenarios exhibiting a performance similar to the maximum-likelihood algorithm. Moreover, we show how they can be extended to adaptive schemes and test them on real-world datasets.

image

Ranking algorithms have many applications. For example they are used for ranking pages, users preferences against advertisements on the web, hotels, restaurants, or online games [1], [2]. In general a ranking algorithm infers an estimated order relation among objects starting from a set of evaluations or comparisons. Sometimes, such evaluations are performed by human “workers” in the framework of crowdsourcing applications. However, since the behavior of humans cannot be deterministically predicted, it is usually described through the adoption of a probabilistic model. Then, the challenge in designing algorithms, is the ability to infer reliable estimates of the ranking starting from “noisy” evaluations of the objects. Often the ranking algorithm resorts to pairwise comparisons of objects. In this work, we focus on such a class of ranking algorithms. Several stochastic models have been proposed in the literature [3], [4], [5], [6] to represent the outcome of comparisons. Most of them are based on the idea that objects to be compared have an intrinsic quality and that the probability,  pi,j, that object i is preferred to object j depends on their qualities  qiand qj. In this context, we devise a class of efficient algorithms, which reconstruct object qualities from pairwise difference through a least-square (LS) approach. To do so, we establish a parallelism between the estimation process and the average cumulative reward of random walks on a weighted graph.

1.1 System model

Let  Q ⊂ Rbe a compact set. We assume that N objects are available for ranking: object i is provided with an intrinsic quality,  qi ∈ Q, which is unknown to the system. Qualities induce a true ranking r among objects, in which  r(i) ≺ r(j)iff  qi > qj1. A ranking algorithm resorts to a set of observations (or answers) provided by workers, which compare pairs of objects and return the identity of the object they prefer. The comparison procedure implicitly contains some randomness reflect-ing the workers’ behavior. Thus, in general, workers’ answers can be modeled as a collection of binary random variables, whose distribution depends on the qualities of the objects to be evaluated.

Due to this randomness in the evaluation process, the inferred ranking for object  i, �r(i), does not always coincide with the true ranking r(i). The reliability of  �r(i)depends on how the evaluation process is organized. In particular, it depends on (i) the workers’ behavior, (ii) the choice of the set of object pairs to be compared, (iii) the number of workers assigned to each pair of objects, and (iv) the processing algorithm used to infer the ranking from workers’ answers.

We assume that all workers behave similarly and that they provide independent answers. In particular, a worker comparing objects i and j, will express a preference for object i against j with probability:

image

where the function  F(·)is differentiable and strictly increasing in its argument (and therefore invertible) and such that  F(0) = 12. Moreover, we assume that F ′(q)is bounded away from zero for  q ∈ ¯Qwhere ¯Q = {qi − qj|qi, qj ∈ Q}. When the pair of objects (i, j) is compared, the worker’s output is modeled as a binary random variable,  wi,j ∈ [0, 1], whose outcomes have probability

image

The model in (1) is pretty general. For example, it encompasses

the Thurstone model [5], where the preferred object (in a pair) is chosen in accordance with the qualities as perceived by the worker and defined as

image

respectively, where  niand  njare zero-mean random variables that represent noise terms. In this case  F(·)is the cumulative distribution function of the zero-mean random variable  ηi,j = ni − nj, i.e.,

image

the Bradley-Terry-Luce (BTL) model [3], [4], where

image

Let V = {1, . . . , N} be the set of objects. We observe that an arbitrary choice of a set of object pairs to be compared, denoted by  E ⊆ V × V, automatically induces an undirected graph G, whose vertex and edge sets are, respectively, V and E. Clearly, it is possible to infer a ranking among the N objects only if the graph G is connected.

Each object pair  (i, j) ∈ Eis assigned to a number of workers W. In general, an increase of W leads to a more reliable estimate of the ranking. On the other hand, the overall complexity, C, of the ranking algorithm is proportional to the total number of workers employed in the process, i.e.,

image

Then, an efficient ranking algorithm must find a good trade-off between the complexity C and the reliability of the inferred ranking, i.e., by returning an almost correct ranking of objects with a minimal number of pair comparisons.

About the reliability of the inferred ranking we say that an estimated ranking is  ǫ-quality approximately correct (or, is an  ǫ-quality ranking) if  �r(i) ≺ �r(j)whenever qi ≥ qj − ǫ. Moreover a ranking algorithm is  (ǫ, δ)-PAC [7], [8], [9] if it returns an  ǫ-quality ranking with a probability larger than  1 − δ.2

1.2 Paper contribution and related work

This paper contributes to a better understanding of the fundamental limits of ranking algorithms based on noisy pairwise comparisons. Our main results complement and extend previous findings about minimal complexity of ranking algorithms under different non-parametric preference models recently derived in [?], [8], [9]. As shown in former studies the efficiency of ranking algorithm is crucially determined by the structure of the underlying preference model.

On the one hand, under a non-parametric preference model satisfying both Strong Stochastic Transitivity (SST) and Stochastic Triangle Inequality (STI) properties,3 a provably asymptotically-optimal4 adaptive algorithm has been proposed, under the restriction that  δ > 1N. In particular, the algorithm proposed in [9] is  (ǫ, δ)-PAC provided that  O( Nǫ2 log Nδ )comparisons are dynamically allocated on the basis of previous outcomes. On the other hand, in [8], [9], it is shown that  Ω(N 2)comparisons are strictly needed to obtain a reliable ranking as soon as either STI or SST are relaxed.

When considering parametric models, estimating a ranking is essentially related to estimating the under- lying qualities. [10], [11] provide a characterization of the expected norm-two distance between estimated and true qualities (later on referred to as mean square error (MSE)), in connection with the properties of a fixed graph G. In particular [10], under the assumption that F(·)is log-concave, provides universal (i.e., applicable to optimal algorithms, such as the maximum-likelihood (ML) algorithm) order-optimal upper and lower bounds for the MSE, relating it to the spectral gap of a certain scaled version of the Laplacian of G. The very recent paper [11], for the BTL model only, introduces a LS algorithm and provides upper and lower bounds for a variant of the MSE and the relative tail-probabilities achievable by such algorithm, characterizing it in terms of the graph resistance.

Interesting works are also [12], [13], [14], [15], [16], [17], [18]. In [16], [17] a LS approach for ranking is first introduced, but no theoretical guarantees are given. In particular, [17] proposes Sync-Rank, a semi-definite programming algorithm based on the angular synchronization framework. In [12], [13], instead, an iterative algorithm that emulates a weighted random walk of graph G is proposed and its performance analyzed under the BTL model. In particular, it provides bounds on the MSE and the corresponding tail-probabilities. A direct comparison between the performance of algorithms proposed in [11], [14], [15] is reported in [11] where the LS approach is shown to be, in general, asymptotically more efficient. Under the BTL model, [12], [13] propose and analyze algorithms able to identify the top-k quality objects. At last, [18] describes a ranking algorithm based on the singular value decomposition approach by assuming that workers return unquantized noisy estimates

of objects quality differences.

Regarding online ranking algorithms, in [7], for the BTL model, an online algorithm inspired to a finite-budget version of quick sort is described, able to obtain an  (ǫ, δ)-PAC ranking with  O( Nǫ2 log N log Nδ )compar- isons. In [19], it is shown that, for online ranking algorithms, parametric models help to reduce the complexity only by logarithmic factors, in order sense.

In this work, unlike [11], we introduce a rather general parametric preference model according to which preference probabilities are determined by an arbitrary smooth monotonic function of object-quality differences. In this scenario, we show that order-optimal non-adaptive algorithms can be defined without the necessity of introducing any restriction to parameter  δ. In particular, differently from [10], [11], we work with the PAC framework and show that our algorithms are  (ǫ, δ)-PAC, provided that  O( Nǫ2 log( Nδ ))comparisons are blindly allocated in a single round. Observe that our preference model does not necessarily satisfy STI, while it satisfies SST. Our ranking procedure is based on the reconstruction of object qualities from pairwise quality differences, by adopting a LS approach akin to the one in [11]. Notice however that the analysis in [11] only applies to the case where  Ω(N log2( Nδ ))total comparisons are performed. Our analysis establishes a parallelism between the quality estimation process and the cumulative reward of random walks on graphs. As an original contribution, we also introduce a weighted LS algorithm with performance very close to the more complex ML algorithm. Finally, by simulation, we show that the performance of our algorithms is extremely good also in non-asymptotic scenarios.

The paper is organized as follows: in Section 2 we introduce a ranking algorithm based on the Maximum Likelihood (ML) approach, which is used as a performance reference. In Section 3 we describe our proposed LS estimation algorithm, whose asymptotic analysis is investigated in Section 4. The LS estimation algorithm is then tested in Sections 5 and 6 against synthetic and real-world datasets, respectively. Finally, in Section 7 we draw our conclusions.

1.3 Notation

Boldface uppercase and lowercase letters denote matrices and vectors, respectively. I is the identity matrix. The transpose of the matrix A is denoted by  AT, while [A]i,jindicates its (i, j)-th entry. For the sake of notation compactness we use the notation  A = {ai,j}to define a matrix A whose elements are  ai,j. Finally, the symbol ⊙represents the Hadamard product, while calligraphic letters denote sets or graphs.

Consider a graph G(V, E) with |V| = N vertices where each pair of objects  (i, j) ∈ Eis evaluated W times by independent workers 5. Without loss of generality we assume that the indices of the objects connected by the generic edge  (i, j) ∈ Eare such that i > j. Moreover, we assume that the m-th worker evaluating the pair of objects (i, j) outputs the binary random variable  wi,j,mwhose distribution is given by (2).

In our proposed ML approach, the estimate of the ranking can be obtained by sorting the quality estimates �q = [�q1, . . . , �qN]Twhich are obtained as follows:

image

(5) When workers are independent on each other and behave similarly, the random variables  wi,j,mcan be modeled as independent and identically distributed. Therefore, the conditional probability in (5) factorizes as

image

By using (2) we write

image

where we recall that  pi,j = F(qi − qj). By substituting the above result in (5), the ML estimate of the qualities q can be rewritten as

image

where

image

and  si,j = 1W�Wm=1 wi,j,m. The function  Ψ(q)has a finite global maximum. Indeed, since  pi,j ∈ [0, 1], and si,j ∈ [0, 1], it is straightforward to show that  Ψ(q) ≤ 0. However, in general,  Ψ(q)is a non-linear function of q and its maximization non trivial. Nevertheless, a local maximum can be found by using standard techniques such as, for example, the Newton-Raphson method which works iteratively and requires the function  F(·)to be twice differentiable.

Let  �qtbe the estimate of q at iteration t = 1, 2, . . ..Then the estimate of q at iteration t + 1 can be updated as follows:

image

where  ∇Ψ(q)and S(q) are, respectively, the gradient and the Hessian matrix of  Ψ(q). Specifically,  [∇Ψ(q)]h ≜

∂qhand  [S(q)]h,k ≜ ∂2Ψ(q)∂qh∂qk. In order to compute  ∇Ψ(q)and S(q) consider a generic node  h ∈ Vand the set Eh ⊆ Eof edges connecting node h to its neighbors. Then, the function  Ψ(q)can be rewritten as

image

where the term c does not depend on  qh. Since  pi,j =F(qi − qj), we can write the partial derivatives of  pi,jas follows:

image

and, similarly

image

It immediately follows that

image

and

image

Moreover, for  h ̸= k

image

The above equations can be specialized for both the Thurstone model as well as for the BTL model, by using the expressions for  pi,jprovided, respectively, in (3) and (4).

We propose a simpler linear estimation algorithm, based on a least-square criterion, that can be applied on the graph G(V, E). Let the distance between objects i and j be

image

and let  Wi,jbe the set of binary answers, of cardinality W, provided by the workers comparing the pair (i, j). Also, let  Ki,jbe the number of times object i is preferred to object j. Then, by construction,  Ki,jfollows the binomial distribution  Ki,j ∼ Bin(W, pi,j), where pi,j = F(di,j). Out of the evaluation results, an estimate �di,jof  di,jis formed as

image

where  �pi,j = Ki,j/Wis the estimate of  pi,j, and  yi,j =�pi,j − pi,jrepresents the estimation error on the probability  pi,j. Note that  yi,jhas zero mean and variance E[y2i,j] = pi,j(1−pi,j)W. As a consequence, �di,j = di,j + zi,j, where  zi,jrepresents the error on the estimate of  di,jinduced by the presence of  yi,j. From the set of noisy estimates  { �di,j, (i, j) ∈ E}, the estimate  �q = [�q1, . . . , �qN]Tof  q = [q1, . . . , qN]Tcan be obtained by solving the following LS optimization problem

image

where  ωi,jare arbitrary positive weights, whose setting is discussed in Section 3.1. The solution of (9) satisfies the following linear equations:

image

where  Nirepresents the neighborhood of node i (i.e., the set of nodes connected to i in G), and  ρiis its generalized degree, i.e.,  ρi = �j∈Ni ωi,j. We can compactly express the previous linear system in terms of the  N × Nmatrix �Hassociated to the graph G, whose elements are defined as

image

Let I be the identity matrix, �M = I − �H, and  Z = {zi,j}. Moreover let  D = {di,j}and �D = { �di,j}be, respectively, the antisymmetric matrices of the true and estimated quality differences6. Thus, from (10) we can write:

image

where  ⊙represents the Hadamard product and 1 = [1, . . . , 1]Tis a column vector of size N. We observe that, by construction,  rank(�M) = N − 1, i.e., �Mis singular. Indeed �M1 = 0, as it can be easily checked. This implies that the associated linear operator on  RNis not injective and that, given a solution  �q′of (10), also �q′′ = �q′ + α1is a solution of (10) for any  α ∈ R. Note, however, that, for the purposes of object ranking, the actual value of  αis irrelevant, since every solution of the form  �q′′ = �q′ + α1induces the same object ranking. Therefore, we can arbitrarily fix the quality of, say, object N to 0 as a reference, i.e.,  qN = 0. To keep into account this constraint, we define the new matrices H and M as follows:

image

and  M = I − H, respectively. We then replace �Mand �Hin (11) with, respectively, M and H. Since M is full rank, solving for q we obtain

image

where we have used the fact that  q = M−1(H ⊙ D)1.

3.1 Weight optimization

In the following, we will consider two possible choices for the weights  ωi,j. The first, which will be studied in the next section for its simplicity, corresponds to  ωi,j = 1for all i, j, and will be called unweighted LS or simply LS. The second, which will be called weighted LS (WLS), is dictated by the fact that the estimates �di,jdo not have the same reliability. Indeed, by developing (8) at the first order for  W → ∞, we obtain

image

so that, if we neglect the higher-order term,  zi,jis a zero-mean random variable with variance

image

Given the values of  qj, j ∈ Ni, the optimal weights for W → ∞in (10) are then proportional to  σ−2i,j. For our WLS algorithm, we will then set  ωi,j = �σ−2i,j, with

image

where  �pi,j = max(min(�pi,j, 1−χ), χ), for a small positive parameter  χsuch that dF 1(p)dp ���p=χexists finite. Note that, under this setting  0 < ωi,j < ∞.

All the theoretical results in this section are obtained by considering the unweighted LS estimator, for simplicity. However, they can be extended to the general weighted case as long as  minij ωi,j/ maxij ωi,jis bounded away from 0, as for the case described in Section 3.1.

The following propositions derive the conditions for the asymptotic convergence of the estimated qualities to their true values. We start by presenting a preliminary asymptotic result on the mean square error.

Proposition 4.1: Consider the unweighted LS estimator in (10). Assume that the degree of nodes of the graph are upper-bounded and define  ρinf ≜ infi ρi. Then the mean

image

where c is a constant,  λmaxCis the largest eigenvalue of C =�M−1� TM−1and  W ≥ β log Nfor a sufficiently large  β.

The proof is provided in Appendix A.

Even if an expression similar to (14) is reported in [10], we recall that the latter was derived for perfect ML-estimators under the assumption that  F(·)is log-concave; our results, instead, apply to LS algorithm for a generic strictly-increasing  F(·). Furthermore, (14) complements and extends results in [11] under more general settings (we recall that results in [11] apply to the BTL model only). It is also to be noted that the theoretical results in [11] only apply to the regime where W is large, i.e.,  W = Ω�log2 Nδ�. Under such constraint, for any connected graph, the total complexity of the algorithm in terms of number of comparisons is at least  Ω�N log2 Nδ�. From (14), we can deduce that, whenever  λmaxCis bounded (as for example in the case of Ramanujan graphs), by symmetry,  E[(�qi − qi)2] = O� 1Wρinf�, i =1, . . . , N. Thus, if  W → ∞for  N → ∞, then  �qconverges in probability to q. To find out the minimum number of comparisons under which the LS approach satisfies the  (ǫ, δ)-PAC conditions, we need to evaluate  P(supi |�qi − qi| > ǫ)for  ǫ > 0. The following proposition gives sufficient conditions in order for the absolute error to converge to zero in the properly defined limiting regime. Proposition 4.2: Consider the unweighted LS estimator in (10). For any  ǫ > 0, as N grows,  P(supi |�qi−qi| > ǫ) < δ, provided that i)  lim supN→∞ ∥M−1∥∞ < ∞(i.e., the  ∞-norm of M−1is bounded), ii) the total number of edges of G is O(N), and W > β(ǫ, δ) log Nfor some  β(ǫ, δ) = O�1ǫ2log Nδlog N�.

Assumption i) can be weakened by the following condi- tion i’):

i’)  lim supN→∞ supA:∥A∥∞≤1 ∥M−1(H ⊙ A)1∥∞ < ∞. The proof is provided in Appendix B.

Remark 4.1: Note that Proposition 4.2 provides suffi-cient conditions for the existence of a  (ǫ, δ)-PAC ranking algorithm with complexity  O( Nǫ2 log Nδ ). In the following subsection we characterize classes of graphs meeting condition (i) or (i’) of Proposition 4.2.

4.1 Considerations on graphs structure

image

computes the quality of object i as the average value of the sum of estimated quality differences along all paths joining node i to the reference node N. In other words, �qican be regarded as the average total reward earned by a standard random walk that starts from node i and stops as soon it hits node N, when estimates �di,jare the elementary rewards associated to graph edges [20]. Then, in Section 4.1.1 we restrict our asymptotic analysis to directed and acyclic graphs. Finally, in Section 4.1.2 we extend it to the more general class of undirected graphs.

4.1.1 G is directed and acyclic

An explicit solution of (10) can be given when graph G is turned into a directed graph, i.e., by imposing to all edges one of the two possible directions. While this assumption is suboptimal, since it constrains the random walk to a subset of possible trajectories, it greatly simpli-fies the analysis. Indeed, first observe that, for directed graphs, (10) can be rewritten as:

image

where  N −irepresents the set of in-neighborhoods of i and  ρ−i = |N −i |. Then, when the graph is directed and acyclic, and has the reference node, N, as a common ancestor, an explicit solution for  �qi, i = 1, . . . , N − 1, is

image

where

image

and  ℓiis the length of the longest (simple) path from node 1 to the reference node. Proposition 4.3 gives suffi-cient conditions for a directed acyclic graph to meet the requirements of Proposition 4.2. The proposition exploits the notion of proximality between nodes according to the following definition:

Definition 4.1: Given a family of graphs  {GN}N, we say that a node i is proximal to the reference node N, with parameters  (τ, h), if a random walk starting from i reaches the reference node N within h hops with a probability that is asymptotically (with N) bounded below by  τ.

Proposition 4.3: Given a family of directed and acyclic graphs  { �GN}Nwith bounded diameter, condition (i’) of Proposition 4.2 is satisfied if one of the following three conditions is met: (i) all paths from any node to the reference have bounded length, (ii)  supi ρ−i < ∞, or (iii) a fraction bounded away from 0 of the in-neighbors of any node is  (τ, h)proximal for some  τ > 0and  h < ∞. The proof is provided in Appendix C.

4.1.2 G is undirected

Now, let us go back to the original formulation (10) on the undirected graph. In the following, we will show that, considered from the point of view of a given node, the solution of (10) for an undirected graph can be obtained by defining an equivalent problem for a properly defined directed acyclic graph. Consider the graph G = (V, E) on N nodes and let T be the  (N −1)×(N −1)matrix obtained from matrix H by removing the last row and column (i.e., those corresponding to the reference node N). Consider a given node  i, i = 1, . . . , N − 1, and notice that�(I − T)−1�i,jgives the average number of times that node j is visited in the random walk starting from i, before ending in the reference node N [21]. Let

image

be the average number of times any edge incident to node j is traversed in the direction from j to its neighbors, in the standard random walk defined on G. Now, define a directed graph

if and only if  (j, ℓ) ∈ Eand one of the two following conditions are satisfied: (i) j < N and  ℓ = N, or (ii) j < N, ℓ < N, and  θj,i > θℓ,i.

image

of j in

only in-neighbors and node N (the reference node) has only out-neighbors. It is also easy to prove that

acyclic. Indeed, suppose that the cycle  (j1, j2, . . . , jr, j1)belongs to

θj2,i > · · · > θjr,i > θj1,i, which is impossible.

image

G i, for which, given that the current node is j, the probability of taking outgoing edge  (j, ℓ), ℓ ∈ N −jiis given by

image

The following proposition relates the standard random walk on G to the biased random walk on

image

can be obtained by solving

image

The proof is provided in Appendix D.

image

seen as the average total reward of the standard random walk on graph G or as the average total reward of the biased random walk on graph

proposition gives sufficient conditions for a family of graphs to meet the conditions of Proposition 4.2.

image

with bounded diameter, condition i’) of Proposition 4.2 is satisfied if, for each node  i, i = 1, . . . , N − 1one of the following conditions are satisfied: (i) all paths in

from i to the reference have bounded length, (ii) in

a fraction bounded away from 0 of the in-neighbors of any node is proximal.

The proof is provided in Appendix E.

Example 4.1: Consider the family of complete graphs on N nodes, i.e.,  GN = KN.7 Because of symmetry, we can easily see that, after a proper permutation of the nodes, (

the same reason, in (

2, . . . , N −1. Thus, the only surviving edges in (

the edges connected either to node 1 or to the reference node N. Then, the maximum path length from node 1 to the reference is 2. Thus this family of graphs meets condition i) of Prop. 4.5. In particular, the estimate of  qiis given by

image

bers. Let us build the family of graphs  {GN}N≥N ′as follows. Nodes  N − N ′ + 1, . . . , N(a set that includes the reference) are “hubs” with potentially unbounded degree. The subgraph induced by the hub nodes is a connected arbitrary graph. The remaining nodes are divided into  N ′subsets  S1, . . . , SN ′. Subset  Sj, j = 1, . . . , N ′, is composed of nodes with maximum degree  ∆, which are neighbors of hub node  N − j + 1and whose other neighbors all belong to  Sj. It is easy to see that, for this family of graphs, the diameter is bounded by  N ′ + 1.

image

reference must pass through the hub nodes, it is easy to see that, in (

belonging to  Sj ∪ {N − N ′ + 1, . . . , N}. Whenever the biased random walk on (

hub node  N − j + 1) does not enter it any more. Thus, we can divide into two parts the biased random walk: the first on the subgraph of (

hub node  N − j + 1serves as reference, and the second on the hub nodes. Then, we can deduce the following facts.

image

image

Fig. 1. Error probability achieved by several ranking algorithms plotted versus the complexity per object C/N, for N = 50. Object qualities are equally spaced in [0, 1) and the workers behave according to the Thurstone model.

Remark 4.2: Although our unweighted LS estimator is akin to the one in [11], our analysis of its performance differs substantially, also because we consider the PAC approach. Consequently, our characterization of “good” graphs does not coincide with that of [11]. For instance, a particular case of Example 4.2 is the wheel graph, which corresponds to choosing  N ′ = 1(the reference node as the only hub) and  ∆ = 3. From [11, Theorem 1], the wheel graph would require W = O(N) comparisons per edge in order for the upper bound on the estimation error to hold, since every edge belongs to a simple path from any node to the reference. Instead, Prop. 4.5 allows to conclude that  W > β(ǫ, δ) log Nis enough to achieve the  (ǫ, δ)-PAC.

Example 4.3: Star graphs represent a particular sequence  {GN}Nof acyclic graphs with bounded-length paths. Therefore, they satisfy condition i) of Prop. 4.5. In such a case, an object (let us say object 1) is taken as pivot (i.e. center of the star) and qualities of all the other objects are estimated only through direct comparisons with the pivot. Observe, that, in such particular case, ranking among objects can be directly inferred from  �pi,1without the necessity of inverting function  F(·). In practice it is enough to rank objects according to the following rules: �r(i) ≺ �r(j)iff  �pi,1 > �pj,1and  �r(i) ≺ �r(1)iff  �pi,1 > 1/2. Therefore, star graphs are appealing when function  F(·)(i.e. the precise worker model) is not known.

We present numerical results showing the performance of our proposed algorithms for moderate values of N. In Figures 1 and 2 we compare the error probability achieved by several ranking algorithms versus the complexity per object C/N. Objects qualities are equally

image

Fig. 2. Error probability achieved by several ranking algorithms plotted versus the complexity per object C/N, for N = 500. Object qualities are equally spaced in [0, 1) and the workers behave according to the Thurstone model.

spaced in the range [0, 1), i.e., object i has quality i/N where N = 50 (Fig. 1) and N = 500 (Fig. 2). Workers’ behavior is described by the Thurstone model detailed in Section 1.1 where  pi,j = F(qi − qj)and  F(·)is the cdf of a Gaussian random variable with zero mean and standard deviation  σ = 0.4. On the y-axis we display the empirical probability of generating an output which is not an  ǫ-quality ranking, for  ǫ = 0.04. Note that an error is counted whenever at least two objects, whose quality difference exceeds  ǫ, appear swapped in the estimated ranking. The curve labeled “MergeRank” refers to the Merge-Rank algorithm proposed in [22], which we consider as a performance reference. The LS, WLS 8, and ML algorithms have been applied to randomly generated regular graphs [23] whose nodes have degree  ρ = 6(lines with square marker) and  ρ = 12(lines without markers). The figure shows the superior performance of our ranking algorithm. It is interesting to observe that the WLS algorithm provides significant enhancements with respect to the LS algorithm and almost perfectly matches the performance of the more (computationally) complex ML approach. As the number of nodes increases our proposed solutions substantially outperform the “MergeRank” algorithm.

Figure 3 compares the performance of the LS and of the WLS algorithms for N = 200 objects. Object qualities are randomly generated according to a uniform distribution in [0, 1). Other system parameters are set as in Figure 2. The figure reports the empirical error probability plotted versus the number of tests per node, C/N, for different values of the degree of the nodes in the graph. We first observe that, given C/N, the number

image

Fig. 3. Performance of the LS and WLS ranking algorithms plotted against the complexity per object C/N, for N = 200 objects. Object qualities are drawn from a uniform distribution in [0, 1) and the workers behave according to the Thurstone model.

of tests per edge of the graph decreases as the degree, ρ, increases. Hence, as  ρincreases, distances between pairs of nodes (corresponding to edges of the graph) are estimated with a decreasing accuracy. In spite of that, a larger number of neighbors for each node (i.e., a larger ρ) leads a more reliable estimation of object qualities. This effect is more evident when the WLS algorithm is employed. Indeed, because of the weights  ωi,j, as ρincreases, WLS is able to well exploit the increasing number of highly-reliable edges in the graph connecting objects with similar qualities; at the same time WLS is able to limit the impact of the greater number of scarcelyreliable edges that connect objects with largely different qualities.

5.1 Adaptive multistage approach

The performance of the proposed ranking algorithms can be improved by adopting a multistage approach where, at each stage, new edges are added to the graph, depending on the quality estimates obtained at previous stage. The rationale of this approach stems from the fact that such algorithms provide approximate rankings, in which the probability of swapping the order of two objects increases as their distance (in terms of their qualities) decreases. Therefore, in order to mitigate this phenomenon and, thus, improve the reliability of the estimate, it is convenient to (i) add to the graph extra edges connecting neighboring objects (in terms of their estimated qualities); (ii) assign additional workers to the already existing edges connecting the aforementioned neighboring objects. This procedure can be iterated until a desired performance level is achieved.

image

Fig. 4. Error probability provided by ML and WLS algorithms when a 2-stage adaptive approach is employed, for  ρ = ρ(1) = ρ(2) = 6, 12, and N = 50.

In our simulation setup, we have considered a 2-stage approach where we first apply the estimation algorithm to a random regular graph,  G(1)(V, E(1)), of degree  ρ(1), obtaining the vector of estimates  �q(1). In the second stage, we create a new regular graph,  G(2)(V, E(2))of degree  ρ(2), where each node is connected to its  ρ(2)closest neighbors, according to the estimates  �q(1). Finally, the estimation algorithm is applied to the graph G(1)∪G(2)obtaining the output  �q(2)which is used to infer the ranking. In Figure 4 we show the performance of the ML and WLS algorithms when the proposed multistage approach is employed. For both algorithms we show the error probability versus the number of tests per object, C/N, for  ρ = ρ(1) = ρ(2) = 6, 12, and N = 50. We observe that the second stage allows for a significant improvement of the performance and a reduction of about 60% of the required tests per object for  ρ = 6and of about 30% for  ρ = 12. In both cases the performance of the WLS algorithm is very close to that provided by the ML algorithm.

In this section, we show that our algorithm works well even when considering a real scenario, where the “evaluations” are the outcome of experiments, and not synthetically generated by simulations. In particular, we consider five recent seasons of the English Premier League and build up a N = 20 complete graph, where nodes are the football teams and edges are the matches between each pair of them. The match between team i and team j is considered as lasting for 180 minutes, since it includes both the round when i is at home and the round where i is away. If team i has scored  xijgoals in the match against team j, we count  Kij = αxij + βevaluations in

image

Fig. 5. Distance between true and estimated ranking for Premier League scores. The x-axis is the season. The y-axis is the Kendall tau distance (number of inversions) between the final season ranking and the output of the WLS algorithm, for different choices of the model and of the parameters.

favor of i when compared to j, where  α > 0and  β ≥ 0are constant. The total number of comparisons between i and j is then simply  Wij = Kij + Kji9.

The WLS algorithm has been run with  χ = 10−4and both the Thurstone and BTL models, to see the influence of the underlying worker model. The true ranking is assumed to be the final season ranking. The results have been plotted in terms of the Kendall tau distance, which counts the number of inversions in the estimated ranking with respect to the true ranking, i.e. the number of pairs (i, j) for which i is ranked better than j in the true ranking and worse than j in the estimated one.

Results are shown in Fig. 5. First, we can observe that the performance is better with  β > 0than with  β = 0, since in the latter case there might be some edges for which the estimated preference probability is very close to either 0 or 1. Such edges are automatically dropped by the WLS algorithm, while in the former case each object in each comparison receives at least  βpreferences, so that all edges are used for ranking computation. Second, the Thurstone model seems to be slightly better suited than the BTL model. Third, while in most cases, the influence of parameters is limited, there are cases (like season 16/17) that are more sensible to the choice of  αand  β. It is worth mentioning that, in [17], the Sync-Rank algorithm is applied to older Seasons of the Premier League. Comparatively, for  β > 0, Kendall tau distance for our algorithm never goes beyond 20, giving rise to a Kendall correlation larger than 0.90, which is a better result than those shown in [17].

In this paper, we focused on the problem of ranking N objects starting from a set of noisy pairwise comparisons. Objects are assumed to be endowed with intrinsic qualities. The probability  pi,jthat object i is preferred to j is given by an arbitrary smooth monotonic function of the difference between qualities of the two competitors. For such a scenario we developed a class of order-optimal ranking algorithms, i.e. algorithms, which are provably (ǫ, δ)-PAC when  O( Nǫ2 log( Nδ ))comparisons are blindly allocated in a single round. Our ranking procedure is based on the reconstruction of object qualities, from pairwise quality differences, by adopting a simple LS approach. The analysis establishes a parallelism between the quality estimations process and the cumulative reward accumulated by random walks on graphs. Finally, by simulation, we show that the performance of our algorithms, and further variants, is extremely good also in non-asymptotic scenarios and approaches that obtained by the ML algorithm. Our results complement and extend previous recent studies [7], [8], [9] on the minimal complexity of ranking algorithms under different non-parametric preference models.

[1] M. Richardson, E. Dominowska, and R. Ragno, “Predicting clicks: Estimating the click-through rate for new ads,” in Proceedings of the 16th International Conference on World Wide Web, ser. WWW ’07, 2007, pp. 521–530.

[2] R. Herbrich, T. Minka, and T. Graepel, “TrueskillTM: A bayesian skill rating system,” in Advances in Neural Information Processing Systems 19, B. Sch¨olkopf, J. C. Platt, and T. Hoffman, Eds. MIT Press, 2007, pp. 569–576.

[3] R. L. Plackett, “The analysis of permutations,” Applied Statistics, pp. 193–202, 1975.

[4] R. D. Luce, Individual choice behavior: A theoretical analysis. Courier Corporation, 2012.

[5] L. L. Thurstone, “The method of paired comparisons for social values.” The Journal of Abnormal and Social Psychology, vol. 21, no. 4, p. 384, 1927.

[6] R. A. Bradley and M. E. Terry, “Rank analysis of incomplete block designs: I. the method of paired comparisons,” Biometrika, vol. 39, no. 3/4, pp. 324–345, 1952.

[7] B. Sz¨or´enyi, R. Busa-Fekete, A. Paul, and E. H¨ullermeier, “Online rank elicitation for plackett-luce: A dueling bandits approach,” in Advances in Neural Information Processing Systems, 2015, pp. 604– 612.

[8] M. Falahatgar, A. Orlitsky, V. Pichapati, and A. T. Suresh, “Maximum selection and ranking under noisy comparisons,” arXiv preprint arXiv:1705.05366, 2017.

[9] M. Falahatgar, A. Jain, A. Orlitsky, V. Pichapati, and V. Ravindrakumar, “The limits of maxing, ranking, and preference learning,” in International Conference on Machine Learning, 2018, pp. 1426–1435.

[10] N. B. Shah, S. Balakrishnan, J. Bradley, A. Parekh, K. Ramchan- dran, and M. J. Wainwright, “Estimation from pairwise comparisons: Sharp minimax bounds with topology dependence,” The Journal of Machine Learning Research, vol. 17, no. 1, pp. 2049–2095, 2016.

[11] J. Hendrickx, A. Olshevsky, and V. Saligrama, “Graph resistance and learning from pairwise comparisons,” in Proceedings of the 36th International Conference on Machine Learning. International Machine Learning Society, 2019.

[12] M. Jang, S. Kim, C. Suh, and S. Oh, “Top-k ranking from pairwise comparisons: When spectral ranking is optimal,” arXiv preprint arXiv:1603.04153, 2016.

[13] Y. Chen and C. Suh, “Spectral mle: Top-k rank aggregation from pairwise comparisons,” in International Conference on Machine Learning, 2015, pp. 371–380.

[14] S. Negahban, S. Oh, and D. Shah, “Iterative ranking from pair- wise comparisons,” in Advances in neural information processing systems, 2012, pp. 2474–2482.

[15] ——, “Rank centrality: Ranking from pairwise comparisons,” Operations Research, vol. 65, no. 1, pp. 266–287, 2016.

[16] A. N. Hirani, K. Kalyanaraman, and S. Watts, “Least squares ranking on graphs, hodge laplacians, time optimality, and iterative methods,” CoRR, vol. abs/1011.1716, 2010. [Online]. Available: http://arxiv.org/abs/1011.1716

[17] M. Cucuringu, “Sync-rank: Robust ranking, constrained ranking and rank aggregation via eigenvector and semidefinite programming synchronization,” CoRR, vol. abs/1504.01070, 2015. [Online]. Available: http://arxiv.org/abs/1504.01070

[18] A. d’Aspremont, M. Cucuringu, and H. Tyagi, “Ranking and synchronization from pairwise measurements via svd,” arXiv preprint arXiv:1906.02746, 2019.

[19] K. R. M. J. W. Reinhard Heckel, Nihar B. Shah, “Active ranking from pairwise comparisons and when parametric assumptions don’t help,” arXiv preprint arXiv:1606.08842, 2016.

[20] B. Ewald, J. Humpherys, and J. West, “Computing expected transition events in reducible markov chains,” SIAM Journal on Matrix Analysis and Applications, vol. 31, no. 3, pp. 1040–1054, 2010.

[21] J. G. Kemeny and J. L. Snell, Markov Chains. Springer-Verlag, New York, 1976.

[22] M. Falahatgar, Y. Hao, A. Orlitsky, V. Pichapati, and V. Ravin- drakumar, “Maxing and ranking with few assumptions,” in Advances in Neural Information Processing Systems, 2017, pp. 7063– 7073.

[23] B. Bollob´as, Random graphs. Cambridge university press, 2001.

1 Ranking a set of objects: a graph based least-square approach

Supplemental material

image

Given that:

image

we can write the MSE, on the estimate of q as

where  C = (M−1)TM−1is a deterministic matrix and  λmaxCis the largest eigenvalue of C. By using the definition of the matrix H the term  E[1T(H ⊙ Z)T(H ⊙ Z)1]in (20) can be expanded as

image

In order to evaluate the averages in (21) we consider the following linear approximation of  zi,j

image

where  γi,j = dF −1(p)dp |p=pi,jare positive constants. By applying this result to (21) we obtain

image

thanks to the fact that  E[yi,j] = 0and that  E�y2i,j�= pi,j(1−pi,j)W. As for the term  E�z2k,i�we can write

image

Since,  yi,jis defined as a difference of two probabilities we have  |yi,j| ≤ 1. Then  y4i,j ≤ |y3i,j| ≤ y2i,j. This result allows to upper-bound the term  E[z2i,j]as

image

for some constant  c1 > 0. In conclusion the MSE can be bounded as

image

where, c is a constant, we assumed that  ρiis uniformly upper-bounded for any i, we defined  ρinf ≜ infi ρi, and used the bound  pi,j(1 − pi,j) ≤ 14.

image

From (13) we can write the error on the quality estimates as  �q − q = M−1(H ⊙ Z)1. We can then upper-bound the term  supi |�qi − qi|by using the infinity norm as follows

image

thanks to the fact that H is substochastic. Let  lim supN→∞ ∥M−1∥∞ = K < ∞. Then we can write

image

which converges to 0 as  N → ∞under the conditions ii) and iii), as stated in Appendix F, Proposition F.1.

The last statement of the proposition can be proved by considering a sequence of operators  A{·} : RN×N →RNmapping the sequence of matrices  A = {ai,j}with bounded norm  ∥A∥∞ = supi,j |ai,j| ≤ 1into a set of vectors  M−1(H ⊙ A)1with uniformly bounded  ∞-norm. Let  K ≥ ∥M−1(H ⊙ A)1∥∞. We know that, as  N → ∞, sup(i,j)∈E |zi,j| < ǫKwith a probability larger than  1−δ. Then, we can write  Z = κAwhere  κ = ∥Z∥∞and  A = Z∥Z∥∞is such that  ∥A∥∞ = 1. It follows that  ∥M−1(H ⊙ Z)1∥∞ = κ∥M−1(H ⊙ A)1∥∞ < ǫwith a probability larger than 1 − δ.

image

Let us consider the directed graph �Gon N nodes. The proof of the proposition descends from the fact that (18) provides an explicit expression for operator  b = M−1(H ⊙ A)1, i.e.,

image

where

and  liis the length of the longest (simple) path from i to the reference node n. Now, we denote with  Pithe set of paths from i to n, and for any path  p ∈ Piwe denote with L(p) and p(h) respectively the graph-theoretical length (expressed in number of hops) of p and h-th node along p. By inspection, it can be easily seen that the previous expression can be rewritten as follows:

image

From the above expression, it is rather immediate to check that  ∥bi∥∞ <

D �G supi,j |ai,j|whereD �Gis the length of the longest path to node n on �G. Thus, ifD �Gis finite and  supi,j |ai,j| = 1, also  ∥bi∥∞is finite, and condition i) of Proposition 4.3 is demonstrated. Observe that, if we interpret  ap(h)p(h+1)as the elementary reward associated to edge (p(h), p(h + 1)), �L(p)h∈1 ap(h)p(h+1)can be regarded as the total reward associated to path p. Furthermore,  �L(p)h=1 1ρ−p(h)is equal to

the probability, for a Random Walker (henceforth, RWer) on �Gthat starts in i and ends in n, to take path p. As a result,  bican be interpreted as the expected total reward accumulated by the RWer starting in i and stopping as soon as it reaches node n.

Now, assuming  supi,j |ai,j| = 1, we have that, for i = 1, . . . , n,

image

∞�t=1P{RWer is still active after t hops}

image

where  E[Ti]represents the average stopping (i.e., absorbing) time for the RWer. Therefore, from the above considerations we can write  lim supN→∞ supi |bi| < ∞if  lim supN→∞ supi E[Ti] < ∞. Now, we show that under assumptions ii)  lim supN→∞ supi E[Ti] < ∞.

Let D denote the diameter of �G. By construction, from any node i < n, there exists one path of length at most D that leads to node n. We call the shortest path from i to n as critical. Define the following events:

A[k] = {RWer is still active after t = kD hops}

Wi[k] = {RWer in i (active) at time t = kD } Ki[k] = {RWer takes the critical path from i

image

We have

image

Now, uniformly over i,

image

Therefore, since by construction �n−1i=1 P {Wi[k − 1] | A[k − 1]} = 1we have that:

image

from which we get:

The assertion concerning condition ii) follows immediately,

Now, the previous argument can be easily extended under iii). Let D be now any finite integer sufficiently large and keep the same definitions of the events  A[k], Wi[k]and  Ki[k]. We denote with B the set of proximal nodes, i.e., of the nodes satisfying the  (τ, D)property. Let alsoB = V\{B ∪ {N}}be the set of non-proximal nodes. We qualify as critical any path that, from a given node, reaches the reference node within  D′hops. In particular let  α > 0be a uniform lower bound to the fraction of in-edges connecting an arbitrary node v to B.

Note that, by construction, at every instant the RWer, if active, is visiting a node in B with a probability at least α. Therefore:

image

Then, proceeding as before we get the result.

The proof descends from the analysis of (13). Let us define

image

so that

image

Thus, from (13), we get

where  θj,iis defined in (16) for  j < n, θN,i = 0, and we have used the fact that �dℓ,j = − �dj,ℓ.

Now consider the directed graph

G i. Without loss of generality, let us consider the case i = 1 and, whenever

possible, let us drop the subscript and write simply

G, θj, etc. Let  N −jand  N +jbe the out-neighbors and in-neighbors of node j, respectively. We first prove the following lemma.

Lemma D.1: With the previous definitions,

image

Proof: For 1 < j < N, we have on the indirect graph G

image

Now, from the definition of

E , it turns out that  Nj = N −j ∪ N +j ∪ N 0j, where  N 0jis the subset of neighbors  ℓof

j such that  θj = θℓ. Reordering the terms in (39), we obtain (38). For j = 1, we can proceed in the same way, with the additional remark that  N +1is empty.

Consider a perturbed Random Walker (RWer), which starts from node 1 and, given that it has reached node

j, proceeds through edge  (j, ℓ) ∈

E with probability  ηj→ℓdefined in (17). The proof of Proposition 4.4 consists in

showing that the probability for the RWer to pass through edge  (j, ℓ) ∈

image

We know by definition that  ξ1 = 1. Moreover, for  ℓ > 1, ξℓsatisfies the linear equation

image

H be the random-walk Laplacian matrix for the perturbed random walk defined above, i.e., (

Let also

H1:N−1,1:N−1. Finally, let  ξ = (ξ2, . . . , ξN−1)be the row vector of node probabilities. Now, (41) can be written in matrix form as

image

which can be solved univocally as

Now notice also that  ξ∗j = �ℓ∈N +j (θℓ − θj)satisfies (41), thanks to the definition of  ηj→ℓin (17) and Lemma D.1.

Then, since (41) has a unique solution given by (43), we conclude that  ξ = ξ∗ = (ξ∗2, . . . , ξ∗N−1).

As a consequence, the probability for the RWer to pass through edge  (j, ℓ) ∈

image

Thus, the average total reward accumulated by the RWer before being absorbed will be given by

image

which coincides with (37). This concludes the proof.

image

The proof descends from the fact that (18) provides an explicit expression for the operator  b = M−1(H ⊙ A)1:

image

where  Piis the set of paths from i to n, and, for any path  π ∈ Pi, L(π)and  π(h)are the graph-theoretical length (expressed in number of hops) of  πand h-th node along  π, respectively. Notice that (45) is similar to (28), except for the expression of the probabilities associated to the graph edges.

Notice that conditions i) and ii) of Prop. 3.5 coincide with conditions i) and iii) of Prop. 3.3. The proof of Prop. 3.5 then follows immediately from the fact that the proofs of conditions i) and iii) of Prop. 3.3 do not explicitly depend on the expression of the edge probabilities.

image

Proposition F.1: For any  ǫ > 0and  δ > 0, there exists  β(ǫ, δ)such that, as  N → ∞,

image

< δ and P

image

provided that  W > β(ǫ, δ) log Nwith  β(ǫ, δ) = O�1ǫ2log Nδlog N�and the total number of edges of graph G is |E| = O(N). Proof: We first use the union bound and write P�sup(i,j)∈E |yi,j| > ǫ�≤ �i P(yi,j > ǫ) + �i P(−yi,j > ǫ). We then observe that the MGF  φyi,j(t)of  yi,jis given by:

image

Then we bound  P(yi,j > ǫ)by applying the Chernoff bound:

image

By setting  t = ζ log N, and  W = β log N, for a sufficiently large  β = β(ǫ.δ), we have

image

ζβ − 1) = β(log(1 + pi,jζβ + O( ζ2β2 )) = ζpi,j + O( ζ2β ). Now, for  βsufficiently large, we can always

image

by imposing that  N 1− ǫζ2 > δ, we get that  β(ǫ, δ) = O�1ǫ2log Nδlog N�for the more general case.

Similarly, the term �i P(−yi,j > ǫ)also tends to 0 as N grows. As for the second claim of the proposition we can write again  P�sup(i,j)∈E |zi,j| > ǫ′�≤ �i P(zi,j > ǫ) + �i P(−zi,j > ǫ). We then recall that  zi,j = �di,j − di,j, �di,j = F −1(yi,j + pi,j), and  di,j = qi − qj. It follows that

image

By defining  ǫ ≜ F(ǫ′ + F(qi − qj)) − F(qi − qj) > 0the convergence of  P(zi,j > ǫ′)to 0 as N grows immediately follows. Similarly, it is straightforward to prove the convergence to 0 of the term �i P(−zi,j > ǫ).


Designed for Accessibility and to further Open Science