Ranking algorithms have many applications. For example they are used for ranking pages, users preferences against advertisements on the web, hotels, restaurants, or online games [1], [2]. In general a ranking algorithm infers an estimated order relation among objects starting from a set of evaluations or comparisons. Sometimes, such evaluations are performed by human “workers” in the framework of crowdsourcing applications. However, since the behavior of humans cannot be deterministically predicted, it is usually described through the adoption of a probabilistic model. Then, the challenge in designing algorithms, is the ability to infer reliable estimates of the ranking starting from “noisy” evaluations of the objects. Often the ranking algorithm resorts to pairwise comparisons of objects. In this work, we focus on such a class of ranking algorithms. Several stochastic models have been proposed in the literature [3], [4], [5], [6] to represent the outcome of comparisons. Most of them are based on the idea that objects to be compared have an intrinsic quality and that the probability, , that object i is preferred to object j depends on their qualities
and
. In this context, we devise a class of efficient algorithms, which reconstruct object qualities from pairwise difference through a least-square (LS) approach. To do so, we establish a parallelism between the estimation process and the average cumulative reward of random walks on a weighted graph.
1.1 System model
Let be a compact set. We assume that N objects are available for ranking: object i is provided with an intrinsic quality,
, which is unknown to the system. Qualities induce a true ranking r among objects, in which
iff
. A ranking algorithm resorts to a set of observations (or answers) provided by workers, which compare pairs of objects and return the identity of the object they prefer. The comparison procedure implicitly contains some randomness reflect-ing the workers’ behavior. Thus, in general, workers’ answers can be modeled as a collection of binary random variables, whose distribution depends on the qualities of the objects to be evaluated.
Due to this randomness in the evaluation process, the inferred ranking for object , does not always coincide with the true ranking r(i). The reliability of
depends on how the evaluation process is organized. In particular, it depends on (i) the workers’ behavior, (ii) the choice of the set of object pairs to be compared, (iii) the number of workers assigned to each pair of objects, and (iv) the processing algorithm used to infer the ranking from workers’ answers.
We assume that all workers behave similarly and that they provide independent answers. In particular, a worker comparing objects i and j, will express a preference for object i against j with probability:
where the function is differentiable and strictly increasing in its argument (and therefore invertible) and such that
. Moreover, we assume that
is bounded away from zero for
where
. When the pair of objects (i, j) is compared, the worker’s output is modeled as a binary random variable,
, whose outcomes have probability
The model in (1) is pretty general. For example, it encompasses
• the Thurstone model [5], where the preferred object (in a pair) is chosen in accordance with the qualities as perceived by the worker and defined as
respectively, where and
are zero-mean random variables that represent noise terms. In this case
is the cumulative distribution function of the zero-mean random variable
, i.e.,
• the Bradley-Terry-Luce (BTL) model [3], [4], where
Let V = {1, . . . , N} be the set of objects. We observe that an arbitrary choice of a set of object pairs to be compared, denoted by , automatically induces an undirected graph G, whose vertex and edge sets are, respectively, V and E. Clearly, it is possible to infer a ranking among the N objects only if the graph G is connected.
Each object pair is assigned to a number of workers W. In general, an increase of W leads to a more reliable estimate of the ranking. On the other hand, the overall complexity, C, of the ranking algorithm is proportional to the total number of workers employed in the process, i.e.,
Then, an efficient ranking algorithm must find a good trade-off between the complexity C and the reliability of the inferred ranking, i.e., by returning an almost correct ranking of objects with a minimal number of pair comparisons.
About the reliability of the inferred ranking we say that an estimated ranking is -quality approximately correct (or, is an
-quality ranking) if
whenever
. Moreover a ranking algorithm is
-PAC [7], [8], [9] if it returns an
-quality ranking with a probability larger than
.2
1.2 Paper contribution and related work
This paper contributes to a better understanding of the fundamental limits of ranking algorithms based on noisy pairwise comparisons. Our main results complement and extend previous findings about minimal complexity of ranking algorithms under different non-parametric preference models recently derived in [?], [8], [9]. As shown in former studies the efficiency of ranking algorithm is crucially determined by the structure of the underlying preference model.
On the one hand, under a non-parametric preference model satisfying both Strong Stochastic Transitivity (SST) and Stochastic Triangle Inequality (STI) properties,3 a provably asymptotically-optimal4 adaptive algorithm has been proposed, under the restriction that . In particular, the algorithm proposed in [9] is
-PAC provided that
comparisons are dynamically allocated on the basis of previous outcomes. On the other hand, in [8], [9], it is shown that
comparisons are strictly needed to obtain a reliable ranking as soon as either STI or SST are relaxed.
When considering parametric models, estimating a ranking is essentially related to estimating the under- lying qualities. [10], [11] provide a characterization of the expected norm-two distance between estimated and true qualities (later on referred to as mean square error (MSE)), in connection with the properties of a fixed graph G. In particular [10], under the assumption that is log-concave, provides universal (i.e., applicable to optimal algorithms, such as the maximum-likelihood (ML) algorithm) order-optimal upper and lower bounds for the MSE, relating it to the spectral gap of a certain scaled version of the Laplacian of G. The very recent paper [11], for the BTL model only, introduces a LS algorithm and provides upper and lower bounds for a variant of the MSE and the relative tail-probabilities achievable by such algorithm, characterizing it in terms of the graph resistance.
Interesting works are also [12], [13], [14], [15], [16], [17], [18]. In [16], [17] a LS approach for ranking is first introduced, but no theoretical guarantees are given. In particular, [17] proposes Sync-Rank, a semi-definite programming algorithm based on the angular synchronization framework. In [12], [13], instead, an iterative algorithm that emulates a weighted random walk of graph G is proposed and its performance analyzed under the BTL model. In particular, it provides bounds on the MSE and the corresponding tail-probabilities. A direct comparison between the performance of algorithms proposed in [11], [14], [15] is reported in [11] where the LS approach is shown to be, in general, asymptotically more efficient. Under the BTL model, [12], [13] propose and analyze algorithms able to identify the top-k quality objects. At last, [18] describes a ranking algorithm based on the singular value decomposition approach by assuming that workers return unquantized noisy estimates
of objects quality differences.
Regarding online ranking algorithms, in [7], for the BTL model, an online algorithm inspired to a finite-budget version of quick sort is described, able to obtain an -PAC ranking with
compar- isons. In [19], it is shown that, for online ranking algorithms, parametric models help to reduce the complexity only by logarithmic factors, in order sense.
In this work, unlike [11], we introduce a rather general parametric preference model according to which preference probabilities are determined by an arbitrary smooth monotonic function of object-quality differences. In this scenario, we show that order-optimal non-adaptive algorithms can be defined without the necessity of introducing any restriction to parameter . In particular, differently from [10], [11], we work with the PAC framework and show that our algorithms are
-PAC, provided that
comparisons are blindly allocated in a single round. Observe that our preference model does not necessarily satisfy STI, while it satisfies SST. Our ranking procedure is based on the reconstruction of object qualities from pairwise quality differences, by adopting a LS approach akin to the one in [11]. Notice however that the analysis in [11] only applies to the case where
total comparisons are performed. Our analysis establishes a parallelism between the quality estimation process and the cumulative reward of random walks on graphs. As an original contribution, we also introduce a weighted LS algorithm with performance very close to the more complex ML algorithm. Finally, by simulation, we show that the performance of our algorithms is extremely good also in non-asymptotic scenarios.
The paper is organized as follows: in Section 2 we introduce a ranking algorithm based on the Maximum Likelihood (ML) approach, which is used as a performance reference. In Section 3 we describe our proposed LS estimation algorithm, whose asymptotic analysis is investigated in Section 4. The LS estimation algorithm is then tested in Sections 5 and 6 against synthetic and real-world datasets, respectively. Finally, in Section 7 we draw our conclusions.
1.3 Notation
Boldface uppercase and lowercase letters denote matrices and vectors, respectively. I is the identity matrix. The transpose of the matrix A is denoted by , while
indicates its (i, j)-th entry. For the sake of notation compactness we use the notation
to define a matrix A whose elements are
. Finally, the symbol
represents the Hadamard product, while calligraphic letters denote sets or graphs.
Consider a graph G(V, E) with |V| = N vertices where each pair of objects is evaluated W times by independent workers 5. Without loss of generality we assume that the indices of the objects connected by the generic edge
are such that i > j. Moreover, we assume that the m-th worker evaluating the pair of objects (i, j) outputs the binary random variable
whose distribution is given by (2).
In our proposed ML approach, the estimate of the ranking can be obtained by sorting the quality estimates which are obtained as follows:
(5) When workers are independent on each other and behave similarly, the random variables can be modeled as independent and identically distributed. Therefore, the conditional probability in (5) factorizes as
By using (2) we write
where we recall that . By substituting the above result in (5), the ML estimate of the qualities q can be rewritten as
where
and . The function
has a finite global maximum. Indeed, since
, and
, it is straightforward to show that
. However, in general,
is a non-linear function of q and its maximization non trivial. Nevertheless, a local maximum can be found by using standard techniques such as, for example, the Newton-Raphson method which works iteratively and requires the function
to be twice differentiable.
Let be the estimate of q at iteration t = 1, 2, . . ..Then the estimate of q at iteration t + 1 can be updated as follows:
where and S(q) are, respectively, the gradient and the Hessian matrix of
. Specifically,
and
. In order to compute
and S(q) consider a generic node
and the set
of edges connecting node h to its neighbors. Then, the function
can be rewritten as
where the term c does not depend on . Since
, we can write the partial derivatives of
as follows:
and, similarly
It immediately follows that
and
Moreover, for
The above equations can be specialized for both the Thurstone model as well as for the BTL model, by using the expressions for provided, respectively, in (3) and (4).
We propose a simpler linear estimation algorithm, based on a least-square criterion, that can be applied on the graph G(V, E). Let the distance between objects i and j be
and let be the set of binary answers, of cardinality W, provided by the workers comparing the pair (i, j). Also, let
be the number of times object i is preferred to object j. Then, by construction,
follows the binomial distribution
, where
. Out of the evaluation results, an estimate
of
is formed as
where is the estimate of
, and
represents the estimation error on the probability
. Note that
has zero mean and variance
. As a consequence,
, where
represents the error on the estimate of
induced by the presence of
. From the set of noisy estimates
, the estimate
of
can be obtained by solving the following LS optimization problem
where are arbitrary positive weights, whose setting is discussed in Section 3.1. The solution of (9) satisfies the following linear equations:
where represents the neighborhood of node i (i.e., the set of nodes connected to i in G), and
is its generalized degree, i.e.,
. We can compactly express the previous linear system in terms of the
matrix
associated to the graph G, whose elements are defined as
Let I be the identity matrix, , and
. Moreover let
and
be, respectively, the antisymmetric matrices of the true and estimated quality differences6. Thus, from (10) we can write:
where represents the Hadamard product and 1 =
is a column vector of size N. We observe that, by construction,
, i.e.,
is singular. Indeed
, as it can be easily checked. This implies that the associated linear operator on
is not injective and that, given a solution
of (10), also
is a solution of (10) for any
. Note, however, that, for the purposes of object ranking, the actual value of
is irrelevant, since every solution of the form
induces the same object ranking. Therefore, we can arbitrarily fix the quality of, say, object N to 0 as a reference, i.e.,
. To keep into account this constraint, we define the new matrices H and M as follows:
and , respectively. We then replace
and
in (11) with, respectively, M and H. Since M is full rank, solving for q we obtain
where we have used the fact that .
3.1 Weight optimization
In the following, we will consider two possible choices for the weights . The first, which will be studied in the next section for its simplicity, corresponds to
for all i, j, and will be called unweighted LS or simply LS. The second, which will be called weighted LS (WLS), is dictated by the fact that the estimates
do not have the same reliability. Indeed, by developing (8) at the first order for
, we obtain
so that, if we neglect the higher-order term, is a zero-mean random variable with variance
Given the values of , the optimal weights for
in (10) are then proportional to
. For our WLS algorithm, we will then set
, with
where , for a small positive parameter
such that
exists finite. Note that, under this setting
All the theoretical results in this section are obtained by considering the unweighted LS estimator, for simplicity. However, they can be extended to the general weighted case as long as is bounded away from 0, as for the case described in Section 3.1.
The following propositions derive the conditions for the asymptotic convergence of the estimated qualities to their true values. We start by presenting a preliminary asymptotic result on the mean square error.
Proposition 4.1: Consider the unweighted LS estimator in (10). Assume that the degree of nodes of the graph are upper-bounded and define . Then the mean
where c is a constant, is the largest eigenvalue of
and
for a sufficiently large
.
The proof is provided in Appendix A.
Even if an expression similar to (14) is reported in [10], we recall that the latter was derived for perfect ML-estimators under the assumption that is log-concave; our results, instead, apply to LS algorithm for a generic strictly-increasing
. Furthermore, (14) complements and extends results in [11] under more general settings (we recall that results in [11] apply to the BTL model only). It is also to be noted that the theoretical results in [11] only apply to the regime where W is large, i.e.,
. Under such constraint, for any connected graph, the total complexity of the algorithm in terms of number of comparisons is at least
. From (14), we can deduce that, whenever
is bounded (as for example in the case of Ramanujan graphs), by symmetry,
1, . . . , N. Thus, if
for
, then
converges in probability to q. To find out the minimum number of comparisons under which the LS approach satisfies the
-PAC conditions, we need to evaluate
for
. The following proposition gives sufficient conditions in order for the absolute error to converge to zero in the properly defined limiting regime. Proposition 4.2: Consider the unweighted LS estimator in (10). For any
, as N grows,
, provided that i)
(i.e., the
-norm of
is bounded), ii) the total number of edges of G is O(N), and W >
for some
.
Assumption i) can be weakened by the following condi- tion i’):
i’) . The proof is provided in Appendix B.
Remark 4.1: Note that Proposition 4.2 provides suffi-cient conditions for the existence of a -PAC ranking algorithm with complexity
. In the following subsection we characterize classes of graphs meeting condition (i) or (i’) of Proposition 4.2.
4.1 Considerations on graphs structure
computes the quality of object i as the average value of the sum of estimated quality differences along all paths joining node i to the reference node N. In other words, can be regarded as the average total reward earned by a standard random walk that starts from node i and stops as soon it hits node N, when estimates
are the elementary rewards associated to graph edges [20]. Then, in Section 4.1.1 we restrict our asymptotic analysis to directed and acyclic graphs. Finally, in Section 4.1.2 we extend it to the more general class of undirected graphs.
4.1.1 G is directed and acyclic
An explicit solution of (10) can be given when graph G is turned into a directed graph, i.e., by imposing to all edges one of the two possible directions. While this assumption is suboptimal, since it constrains the random walk to a subset of possible trajectories, it greatly simpli-fies the analysis. Indeed, first observe that, for directed graphs, (10) can be rewritten as:
where represents the set of in-neighborhoods of i and
. Then, when the graph is directed and acyclic, and has the reference node, N, as a common ancestor, an explicit solution for
, is
where
and is the length of the longest (simple) path from node 1 to the reference node. Proposition 4.3 gives suffi-cient conditions for a directed acyclic graph to meet the requirements of Proposition 4.2. The proposition exploits the notion of proximality between nodes according to the following definition:
Definition 4.1: Given a family of graphs , we say that a node i is proximal to the reference node N, with parameters
, if a random walk starting from i reaches the reference node N within h hops with a probability that is asymptotically (with N) bounded below by
.
Proposition 4.3: Given a family of directed and acyclic graphs with bounded diameter, condition (i’) of Proposition 4.2 is satisfied if one of the following three conditions is met: (i) all paths from any node to the reference have bounded length, (ii)
, or (iii) a fraction bounded away from 0 of the in-neighbors of any node is
proximal for some
and
. The proof is provided in Appendix C.
4.1.2 G is undirected
Now, let us go back to the original formulation (10) on the undirected graph. In the following, we will show that, considered from the point of view of a given node, the solution of (10) for an undirected graph can be obtained by defining an equivalent problem for a properly defined directed acyclic graph. Consider the graph G = (V, E) on N nodes and let T be the matrix obtained from matrix H by removing the last row and column (i.e., those corresponding to the reference node N). Consider a given node
, and notice that
gives the average number of times that node j is visited in the random walk starting from i, before ending in the reference node N [21]. Let
be the average number of times any edge incident to node j is traversed in the direction from j to its neighbors, in the standard random walk defined on G. Now, define a directed graph
if and only if and one of the two following conditions are satisfied: (i) j < N and
, or (ii)
, and
.
of j in
only in-neighbors and node N (the reference node) has only out-neighbors. It is also easy to prove that
acyclic. Indeed, suppose that the cycle belongs to
, which is impossible.
, for which, given that the current node is j, the probability of taking outgoing edge
is given by
The following proposition relates the standard random walk on G to the biased random walk on
can be obtained by solving
The proof is provided in Appendix D.
seen as the average total reward of the standard random walk on graph G or as the average total reward of the biased random walk on graph
proposition gives sufficient conditions for a family of graphs to meet the conditions of Proposition 4.2.
with bounded diameter, condition i’) of Proposition 4.2 is satisfied if, for each node one of the following conditions are satisfied: (i) all paths in
from i to the reference have bounded length, (ii) in
a fraction bounded away from 0 of the in-neighbors of any node is proximal.
The proof is provided in Appendix E.
Example 4.1: Consider the family of complete graphs on N nodes, i.e., .7 Because of symmetry, we can easily see that, after a proper permutation of the nodes, (
the same reason, in (
. Thus, the only surviving edges in (
the edges connected either to node 1 or to the reference node N. Then, the maximum path length from node 1 to the reference is 2. Thus this family of graphs meets condition i) of Prop. 4.5. In particular, the estimate of is given by
bers. Let us build the family of graphs as follows. Nodes
(a set that includes the reference) are “hubs” with potentially unbounded degree. The subgraph induced by the hub nodes is a connected arbitrary graph. The remaining nodes are divided into
subsets
. Subset
, is composed of nodes with maximum degree
, which are neighbors of hub node
and whose other neighbors all belong to
. It is easy to see that, for this family of graphs, the diameter is bounded by
.
reference must pass through the hub nodes, it is easy to see that, in (
belonging to . Whenever the biased random walk on (
hub node ) does not enter it any more. Thus, we can divide into two parts the biased random walk: the first on the subgraph of (
hub node serves as reference, and the second on the hub nodes. Then, we can deduce the following facts.
Fig. 1. Error probability achieved by several ranking algorithms plotted versus the complexity per object C/N, for N = 50. Object qualities are equally spaced in [0, 1) and the workers behave according to the Thurstone model.
Remark 4.2: Although our unweighted LS estimator is akin to the one in [11], our analysis of its performance differs substantially, also because we consider the PAC approach. Consequently, our characterization of “good” graphs does not coincide with that of [11]. For instance, a particular case of Example 4.2 is the wheel graph, which corresponds to choosing (the reference node as the only hub) and
. From [11, Theorem 1], the wheel graph would require W = O(N) comparisons per edge in order for the upper bound on the estimation error to hold, since every edge belongs to a simple path from any node to the reference. Instead, Prop. 4.5 allows to conclude that
is enough to achieve the
-PAC.
Example 4.3: Star graphs represent a particular sequence of acyclic graphs with bounded-length paths. Therefore, they satisfy condition i) of Prop. 4.5. In such a case, an object (let us say object 1) is taken as pivot (i.e. center of the star) and qualities of all the other objects are estimated only through direct comparisons with the pivot. Observe, that, in such particular case, ranking among objects can be directly inferred from
without the necessity of inverting function
. In practice it is enough to rank objects according to the following rules:
iff
and
iff
. Therefore, star graphs are appealing when function
(i.e. the precise worker model) is not known.
We present numerical results showing the performance of our proposed algorithms for moderate values of N. In Figures 1 and 2 we compare the error probability achieved by several ranking algorithms versus the complexity per object C/N. Objects qualities are equally
Fig. 2. Error probability achieved by several ranking algorithms plotted versus the complexity per object C/N, for N = 500. Object qualities are equally spaced in [0, 1) and the workers behave according to the Thurstone model.
spaced in the range [0, 1), i.e., object i has quality i/N where N = 50 (Fig. 1) and N = 500 (Fig. 2). Workers’ behavior is described by the Thurstone model detailed in Section 1.1 where and
is the cdf of a Gaussian random variable with zero mean and standard deviation
. On the y-axis we display the empirical probability of generating an output which is not an
-quality ranking, for
. Note that an error is counted whenever at least two objects, whose quality difference exceeds
, appear swapped in the estimated ranking. The curve labeled “MergeRank” refers to the Merge-Rank algorithm proposed in [22], which we consider as a performance reference. The LS, WLS 8, and ML algorithms have been applied to randomly generated regular graphs [23] whose nodes have degree
(lines with square marker) and
(lines without markers). The figure shows the superior performance of our ranking algorithm. It is interesting to observe that the WLS algorithm provides significant enhancements with respect to the LS algorithm and almost perfectly matches the performance of the more (computationally) complex ML approach. As the number of nodes increases our proposed solutions substantially outperform the “MergeRank” algorithm.
Figure 3 compares the performance of the LS and of the WLS algorithms for N = 200 objects. Object qualities are randomly generated according to a uniform distribution in [0, 1). Other system parameters are set as in Figure 2. The figure reports the empirical error probability plotted versus the number of tests per node, C/N, for different values of the degree of the nodes in the graph. We first observe that, given C/N, the number
Fig. 3. Performance of the LS and WLS ranking algorithms plotted against the complexity per object C/N, for N = 200 objects. Object qualities are drawn from a uniform distribution in [0, 1) and the workers behave according to the Thurstone model.
of tests per edge of the graph decreases as the degree, , increases. Hence, as
increases, distances between pairs of nodes (corresponding to edges of the graph) are estimated with a decreasing accuracy. In spite of that, a larger number of neighbors for each node (i.e., a larger
) leads a more reliable estimation of object qualities. This effect is more evident when the WLS algorithm is employed. Indeed, because of the weights
, as
increases, WLS is able to well exploit the increasing number of highly-reliable edges in the graph connecting objects with similar qualities; at the same time WLS is able to limit the impact of the greater number of scarcelyreliable edges that connect objects with largely different qualities.
5.1 Adaptive multistage approach
The performance of the proposed ranking algorithms can be improved by adopting a multistage approach where, at each stage, new edges are added to the graph, depending on the quality estimates obtained at previous stage. The rationale of this approach stems from the fact that such algorithms provide approximate rankings, in which the probability of swapping the order of two objects increases as their distance (in terms of their qualities) decreases. Therefore, in order to mitigate this phenomenon and, thus, improve the reliability of the estimate, it is convenient to (i) add to the graph extra edges connecting neighboring objects (in terms of their estimated qualities); (ii) assign additional workers to the already existing edges connecting the aforementioned neighboring objects. This procedure can be iterated until a desired performance level is achieved.
Fig. 4. Error probability provided by ML and WLS algorithms when a 2-stage adaptive approach is employed, for , and N = 50.
In our simulation setup, we have considered a 2-stage approach where we first apply the estimation algorithm to a random regular graph, , of degree
, obtaining the vector of estimates
. In the second stage, we create a new regular graph,
of degree
, where each node is connected to its
closest neighbors, according to the estimates
. Finally, the estimation algorithm is applied to the graph
obtaining the output
which is used to infer the ranking. In Figure 4 we show the performance of the ML and WLS algorithms when the proposed multistage approach is employed. For both algorithms we show the error probability versus the number of tests per object, C/N, for
, and N = 50. We observe that the second stage allows for a significant improvement of the performance and a reduction of about 60% of the required tests per object for
and of about 30% for
. In both cases the performance of the WLS algorithm is very close to that provided by the ML algorithm.
In this section, we show that our algorithm works well even when considering a real scenario, where the “evaluations” are the outcome of experiments, and not synthetically generated by simulations. In particular, we consider five recent seasons of the English Premier League and build up a N = 20 complete graph, where nodes are the football teams and edges are the matches between each pair of them. The match between team i and team j is considered as lasting for 180 minutes, since it includes both the round when i is at home and the round where i is away. If team i has scored goals in the match against team j, we count
evaluations in
Fig. 5. Distance between true and estimated ranking for Premier League scores. The x-axis is the season. The y-axis is the Kendall tau distance (number of inversions) between the final season ranking and the output of the WLS algorithm, for different choices of the model and of the parameters.
favor of i when compared to j, where and
are constant. The total number of comparisons between i and j is then simply
.
The WLS algorithm has been run with and both the Thurstone and BTL models, to see the influence of the underlying worker model. The true ranking is assumed to be the final season ranking. The results have been plotted in terms of the Kendall tau distance, which counts the number of inversions in the estimated ranking with respect to the true ranking, i.e. the number of pairs (i, j) for which i is ranked better than j in the true ranking and worse than j in the estimated one.
Results are shown in Fig. 5. First, we can observe that the performance is better with than with
, since in the latter case there might be some edges for which the estimated preference probability is very close to either 0 or 1. Such edges are automatically dropped by the WLS algorithm, while in the former case each object in each comparison receives at least
preferences, so that all edges are used for ranking computation. Second, the Thurstone model seems to be slightly better suited than the BTL model. Third, while in most cases, the influence of parameters is limited, there are cases (like season 16/17) that are more sensible to the choice of
and
. It is worth mentioning that, in [17], the Sync-Rank algorithm is applied to older Seasons of the Premier League. Comparatively, for
, Kendall tau distance for our algorithm never goes beyond 20, giving rise to a Kendall correlation larger than 0.90, which is a better result than those shown in [17].
In this paper, we focused on the problem of ranking N objects starting from a set of noisy pairwise comparisons. Objects are assumed to be endowed with intrinsic qualities. The probability that object i is preferred to j is given by an arbitrary smooth monotonic function of the difference between qualities of the two competitors. For such a scenario we developed a class of order-optimal ranking algorithms, i.e. algorithms, which are provably
-PAC when
comparisons are blindly allocated in a single round. Our ranking procedure is based on the reconstruction of object qualities, from pairwise quality differences, by adopting a simple LS approach. The analysis establishes a parallelism between the quality estimations process and the cumulative reward accumulated by random walks on graphs. Finally, by simulation, we show that the performance of our algorithms, and further variants, is extremely good also in non-asymptotic scenarios and approaches that obtained by the ML algorithm. Our results complement and extend previous recent studies [7], [8], [9] on the minimal complexity of ranking algorithms under different non-parametric preference models.
[1] M. Richardson, E. Dominowska, and R. Ragno, “Predicting clicks: Estimating the click-through rate for new ads,” in Proceedings of the 16th International Conference on World Wide Web, ser. WWW ’07, 2007, pp. 521–530.
[2] R. Herbrich, T. Minka, and T. Graepel, “TrueskillTM: A bayesian skill rating system,” in Advances in Neural Information Processing Systems 19, B. Sch¨olkopf, J. C. Platt, and T. Hoffman, Eds. MIT Press, 2007, pp. 569–576.
[3] R. L. Plackett, “The analysis of permutations,” Applied Statistics, pp. 193–202, 1975.
[4] R. D. Luce, Individual choice behavior: A theoretical analysis. Courier Corporation, 2012.
[5] L. L. Thurstone, “The method of paired comparisons for social values.” The Journal of Abnormal and Social Psychology, vol. 21, no. 4, p. 384, 1927.
[6] R. A. Bradley and M. E. Terry, “Rank analysis of incomplete block designs: I. the method of paired comparisons,” Biometrika, vol. 39, no. 3/4, pp. 324–345, 1952.
[7] B. Sz¨or´enyi, R. Busa-Fekete, A. Paul, and E. H¨ullermeier, “Online rank elicitation for plackett-luce: A dueling bandits approach,” in Advances in Neural Information Processing Systems, 2015, pp. 604– 612.
[8] M. Falahatgar, A. Orlitsky, V. Pichapati, and A. T. Suresh, “Maximum selection and ranking under noisy comparisons,” arXiv preprint arXiv:1705.05366, 2017.
[9] M. Falahatgar, A. Jain, A. Orlitsky, V. Pichapati, and V. Ravindrakumar, “The limits of maxing, ranking, and preference learning,” in International Conference on Machine Learning, 2018, pp. 1426–1435.
[10] N. B. Shah, S. Balakrishnan, J. Bradley, A. Parekh, K. Ramchan- dran, and M. J. Wainwright, “Estimation from pairwise comparisons: Sharp minimax bounds with topology dependence,” The Journal of Machine Learning Research, vol. 17, no. 1, pp. 2049–2095, 2016.
[11] J. Hendrickx, A. Olshevsky, and V. Saligrama, “Graph resistance and learning from pairwise comparisons,” in Proceedings of the 36th International Conference on Machine Learning. International Machine Learning Society, 2019.
[12] M. Jang, S. Kim, C. Suh, and S. Oh, “Top-k ranking from pairwise comparisons: When spectral ranking is optimal,” arXiv preprint arXiv:1603.04153, 2016.
[13] Y. Chen and C. Suh, “Spectral mle: Top-k rank aggregation from pairwise comparisons,” in International Conference on Machine Learning, 2015, pp. 371–380.
[14] S. Negahban, S. Oh, and D. Shah, “Iterative ranking from pair- wise comparisons,” in Advances in neural information processing systems, 2012, pp. 2474–2482.
[15] ——, “Rank centrality: Ranking from pairwise comparisons,” Operations Research, vol. 65, no. 1, pp. 266–287, 2016.
[16] A. N. Hirani, K. Kalyanaraman, and S. Watts, “Least squares ranking on graphs, hodge laplacians, time optimality, and iterative methods,” CoRR, vol. abs/1011.1716, 2010. [Online]. Available: http://arxiv.org/abs/1011.1716
[17] M. Cucuringu, “Sync-rank: Robust ranking, constrained ranking and rank aggregation via eigenvector and semidefinite programming synchronization,” CoRR, vol. abs/1504.01070, 2015. [Online]. Available: http://arxiv.org/abs/1504.01070
[18] A. d’Aspremont, M. Cucuringu, and H. Tyagi, “Ranking and synchronization from pairwise measurements via svd,” arXiv preprint arXiv:1906.02746, 2019.
[19] K. R. M. J. W. Reinhard Heckel, Nihar B. Shah, “Active ranking from pairwise comparisons and when parametric assumptions don’t help,” arXiv preprint arXiv:1606.08842, 2016.
[20] B. Ewald, J. Humpherys, and J. West, “Computing expected transition events in reducible markov chains,” SIAM Journal on Matrix Analysis and Applications, vol. 31, no. 3, pp. 1040–1054, 2010.
[21] J. G. Kemeny and J. L. Snell, Markov Chains. Springer-Verlag, New York, 1976.
[22] M. Falahatgar, Y. Hao, A. Orlitsky, V. Pichapati, and V. Ravin- drakumar, “Maxing and ranking with few assumptions,” in Advances in Neural Information Processing Systems, 2017, pp. 7063– 7073.
[23] B. Bollob´as, Random graphs. Cambridge university press, 2001.
1 Ranking a set of objects: a graph based least-square approach
Supplemental material
Given that:
we can write the MSE, on the estimate of q as
where is a deterministic matrix and
is the largest eigenvalue of C. By using the definition of the matrix H the term
in (20) can be expanded as
In order to evaluate the averages in (21) we consider the following linear approximation of
where are positive constants. By applying this result to (21) we obtain
thanks to the fact that and that
. As for the term
we can write
Since, is defined as a difference of two probabilities we have
. Then
. This result allows to upper-bound the term
as
for some constant . In conclusion the MSE can be bounded as
where, c is a constant, we assumed that is uniformly upper-bounded for any i, we defined
, and used the bound
.
From (13) we can write the error on the quality estimates as . We can then upper-bound the term
by using the infinity norm as follows
thanks to the fact that H is substochastic. Let . Then we can write
which converges to 0 as under the conditions ii) and iii), as stated in Appendix F, Proposition F.1.
The last statement of the proposition can be proved by considering a sequence of operators mapping the sequence of matrices
with bounded norm
into a set of vectors
with uniformly bounded
-norm. Let
. We know that, as
,
with a probability larger than
. Then, we can write
where
and
is such that
. It follows that
with a probability larger than
.
Let us consider the directed graph on N nodes. The proof of the proposition descends from the fact that (18) provides an explicit expression for operator
, i.e.,
where
and is the length of the longest (simple) path from i to the reference node n. Now, we denote with
the set of paths from i to n, and for any path
we denote with L(p) and p(h) respectively the graph-theoretical length (expressed in number of hops) of p and h-th node along p. By inspection, it can be easily seen that the previous expression can be rewritten as follows:
From the above expression, it is rather immediate to check that
where
is the length of the longest path to node n on
. Thus, if
is finite and
, also
is finite, and condition i) of Proposition 4.3 is demonstrated. Observe that, if we interpret
as the elementary reward associated to edge (p(h), p(h + 1)),
can be regarded as the total reward associated to path p. Furthermore,
is equal to
the probability, for a Random Walker (henceforth, RWer) on that starts in i and ends in n, to take path p. As a result,
can be interpreted as the expected total reward accumulated by the RWer starting in i and stopping as soon as it reaches node n.
Now, assuming , we have that, for i = 1, . . . , n,
P{RWer is still active after t hops}
where represents the average stopping (i.e., absorbing) time for the RWer. Therefore, from the above considerations we can write
if
. Now, we show that under assumptions ii)
.
Let D denote the diameter of . By construction, from any node i < n, there exists one path of length at most D that leads to node n. We call the shortest path from i to n as critical. Define the following events:
A[k] = {RWer is still active after t = kD hops}
RWer in i (active) at time t = kD }
RWer takes the critical path from i
We have
Now, uniformly over i,
Therefore, since by construction we have that:
from which we get:
The assertion concerning condition ii) follows immediately,
Now, the previous argument can be easily extended under iii). Let D be now any finite integer sufficiently large and keep the same definitions of the events and
. We denote with B the set of proximal nodes, i.e., of the nodes satisfying the
property. Let also
be the set of non-proximal nodes. We qualify as critical any path that, from a given node, reaches the reference node within
hops. In particular let
be a uniform lower bound to the fraction of in-edges connecting an arbitrary node v to B.
Note that, by construction, at every instant the RWer, if active, is visiting a node in B with a probability at least . Therefore:
Then, proceeding as before we get the result.
The proof descends from the analysis of (13). Let us define
so that
Thus, from (13), we get
where is defined in (16) for
, and we have used the fact that
.
Now consider the directed graph
. Without loss of generality, let us consider the case i = 1 and, whenever
possible, let us drop the subscript and write simply
, etc. Let
and
be the out-neighbors and in-neighbors of node j, respectively. We first prove the following lemma.
Lemma D.1: With the previous definitions,
Proof: For 1 < j < N, we have on the indirect graph G
Now, from the definition of
E , it turns out that , where
is the subset of neighbors
of
j such that . Reordering the terms in (39), we obtain (38). For j = 1, we can proceed in the same way, with the additional remark that
is empty.
Consider a perturbed Random Walker (RWer), which starts from node 1 and, given that it has reached node
j, proceeds through edge
E with probability defined in (17). The proof of Proposition 4.4 consists in
showing that the probability for the RWer to pass through edge
We know by definition that . Moreover, for
satisfies the linear equation
H be the random-walk Laplacian matrix for the perturbed random walk defined above, i.e., (
Let also
. Finally, let
be the row vector of node probabilities. Now, (41) can be written in matrix form as
which can be solved univocally as
Now notice also that satisfies (41), thanks to the definition of
in (17) and Lemma D.1.
Then, since (41) has a unique solution given by (43), we conclude that .
As a consequence, the probability for the RWer to pass through edge
Thus, the average total reward accumulated by the RWer before being absorbed will be given by
which coincides with (37). This concludes the proof.
The proof descends from the fact that (18) provides an explicit expression for the operator :
where is the set of paths from i to n, and, for any path
and
are the graph-theoretical length (expressed in number of hops) of
and h-th node along
, respectively. Notice that (45) is similar to (28), except for the expression of the probabilities associated to the graph edges.
Notice that conditions i) and ii) of Prop. 3.5 coincide with conditions i) and iii) of Prop. 3.3. The proof of Prop. 3.5 then follows immediately from the fact that the proofs of conditions i) and iii) of Prop. 3.3 do not explicitly depend on the expression of the edge probabilities.
Proposition F.1: For any and
, there exists
such that, as
,
< δ and P
provided that with
and the total number of edges of graph G is |E| = O(N). Proof: We first use the union bound and write
. We then observe that the MGF
of
is given by:
Then we bound by applying the Chernoff bound:
By setting , and
, for a sufficiently large
, we have
. Now, for
sufficiently large, we can always
by imposing that , we get that
for the more general case.
Similarly, the term also tends to 0 as N grows. As for the second claim of the proposition we can write again
. We then recall that
,
, and
. It follows that
By defining the convergence of
to 0 as N grows immediately follows. Similarly, it is straightforward to prove the convergence to 0 of the term
.