b

DiscoverSearch
About
My stuff
The Complexity of Interactively Learning a Stable Matching by Trial and Error
2020·arXiv
Abstract
Abstract

In a stable matching setting, we consider a query model that allows for an interactive learning algorithm to make precisely one type of query: proposing a matching, the response to which is either that the proposed matching is stable, or a blocking pair (chosen adversarially) indicating that this matching is unstable. For one-to-one matching markets, our main result is an essentially tight upper bound of  O(n2 log n) on the deterministic query complexity of interactively learning a stable matching in this coarse query model, along with an efficient randomized algorithm that achieves this query complexity with high probability. For many-to-many matching markets in which participants have responsive preferences, we first give an interactive learning algorithm whose query complexity and running time are polynomial in the size of the market if the maximum quota of each agent is bounded; our main result for many-to-many markets is that the deterministic query complexity can be made polynomial (more specifically,  O(n3 log n))in the size of the market even for arbitrary (e.g., linear in the market size) quotas.

Problem and Background In the classic Stable Matching Problem (Gale and Shapley, 1962), there are n women and n men; each woman has a subset of men she deems acceptable and a strict preference order over them; similarly, each man has a subset of women he finds acceptable and a strict preference order over them. The goal is to find a stable matching: a one-to-one mapping between mutually acceptable women and men that contains no blocking pair, i.e., no pair of a woman and a man who mutually prefer each other over their current partner in the matching (or over being unmatched, if one or both are unmatched). In their seminal paper, Gale and Shapley (1962) constructively proved that such a stable matching always exists.

The study of stable matchings within computer science was originated in Knuth’s famous series-of-lectures-turned-book (Knuth, 1976), in which Knuth focused on the stable matching problem as — according to his foreword to the book — an ideal introductory problem for the analysis of algorithms. In this book, Knuth’s first order of business was that the most natural “first attempt” at an algorithm for finding a stable matching fails: if one repeatedly tries to resolve blocking pairs by matching the two participants of the pair with each other and matching their previous partners with each other, this may lead to a cycle.

Some decade and a half following Knuth’s book, Roth and Vande Vate (1990) showed a positive result for a close variant of the algorithm, in which a blocking pair is resolved by matching the blocking pair with each other and leaving their previous partners unmatched; we refer to this algorithm as the myopic algorithm.1 Roth and Vande Vate showed that if the blocking pair to be resolved is chosen uniformly at random among all blocking pairs (rather than adversarially), then with probability one, the algorithm terminates in finite time, and thus yields a stable matching. Roth and Vande Vate’s result was later generalized by Kojima and ¨Unver (2006) to the canonical “many to many” extension of the stable matching problem, in which each participant on each side of the market (the two sides of the market in this setting are customarily called firms and workers) can be matched to more than one participant on the other side, (e.g., up to a given  quota).2

Another decade and a half later, Ackermann et al. (2011) focused on the number of iterations it takes the myopic algorithm to reach a stable matching, proving the following negative result: for some preference structure, while convergence with probability one is assured, it may take exponentially many iterations to reach a stable matching. Their result applies already in the one-to-one setting, and thus of course extends to the many-to-many case as well.

Through the lens of learning theory, we can view this type of interaction as corresponding naturally to Angluin’s Equivalence Query model (Angluin, 1988) for learning a binary classifier, and its recent generalization due to Emamjomeh-Zadeh and Kempe (2017). In this interaction model, the learner proposes a candidate structure (here: a matching), and — unless the proposal was correct (here: stable) — learns of one local mistake (here: a blocking pair). Because the queries are very coarse-grained compared to standard preference queries — after all, a query comprises an entire matching and cannot target a specific comparison or even a specific participant — we refer to the query model as the coarse query model. Viewed through this lens, the preceding results state that in the coarse query model, the myopic algorithm could take exponentially long to converge given randomly drawn (legal) responses, and might never converge given adversarial (legal) responses. Taking this learning-theoretic perspective as our point of departure, in this paper, we ask whether more sophisticated interactive learning algorithms in the coarse query model can be made to perform much better than the myopic algorithm, and if so, precisely to what extent.

Beyond a learning-theoretic interest, our main question also has implications from a “purely EconCS” perspective. A famous quote by Kamal Jain (see, e.g., Papadimitriou, 2007) questions the validity of market equilibria as a practical solution concept on the grounds that “if your laptop cannot find it, neither can the market.” This quote refers to a line of work that eventually culminated in showing that even finding approximate equilibria is PPAD-complete (and thus likely to require an exorbitant number of queries), not only in a decentralized manner, but even by a capable central planner. Most famously, this was shown for Nash equilibria (Daskalakis et al., 2009; Chen and Deng, 2006; Rubinstein, 2016), and for Arrow-Debreu market equilibria (Chen and Deng, 2006; Chen et al., 2009, 2017; Rubinstein, 2018), even with severely restricted utility functions. A negative answer to our main question would therefore cast doubt on the possibility that a market designer with access only to coarse queries such as those of “trial and error”3 could find a stable matching without the ability to ask each participant detailed questions about their preferences (like those that the Gale-Shapley algorithm requires). On the other hand, thinking of a blocking pair as “minimal informative evidence” of the instability of a matching, which is arguably what is naturally observed as a first sign of a matching being about to unravel4, a positive answer to our question would mitigate to some extent the negative message of Ackermann et al. (2011), providing evidence that stability can in fact be achieved even by a market designer looking at the market through a far coarser and less informative lens than usually considered.

One-to-one markets We start our analysis in the one-to-one setting of men and women, and consider adversarial responses to the queries of our algorithm. (The model is defined precisely in Section 2.) A trivial observation is that an exponential-query brute-force search over all matchings can find a stable matching. Some further consideration gives rise to the following algorithm: at every step, fix speculative preferences for all participants; the only requirement is that these speculative preferences must be consistent with all of the information gathered so far about the participants’ preferences. (In the first step, any preferences are consistent, as no information has been gathered yet.) The algorithm then proposes a matching that is stable with respect to these speculative preferences. If this matching is unstable, then the response to the query tells the algorithm that some pair (w, m) is blocking; that is, that w prefers m to her assigned partner and that m prefers w to his assigned partner. At least one of these two pairwise comparisons must contradict the speculative preferences and so constitutes new information upon which to base the new speculative preferences in the next step. Noticing that throughout the run of the algorithm, the algorithm can collect at most�n2�= O(n2) pairwise comparisons for each participant’s preferences, we conclude that this algorithm makes at most  O(n3) unsuccessful queries, and therefore finds a stable matching in  O(n3) queries (and in polynomial running time).

The above cubic algorithm essentially attempts to “sort” the preferences of each participant (unless it is “interrupted” by finding a stable matching). In the worst case, it uses quadratically many of its queries to “sort” each participant’s preferences. While finding a stable matching in polynomially many queries is already appealing, encapsulating a quadratic-query sort within this algorithm is dissatisfying. The challenge with replacing this “inner sort” with an O(n log n)-query sort is that unlike standard sorting algorithms, in the coarse query model, the next comparison is chosen not by the algorithm, but adversarially. It is therefore a priori unclear whether the query complexity of each such “inner sort” can be improved. Our first main result, proved in Section 3, is a positive answer to this question: it is possible, at the beginning of each step of the algorithm, to choose the speculative preferences in a way such that no more than O(n log n) comparisons are ever made for any one participant, implying the following theorem:

Theorem 1.1 (Informal, see Theorem 3.45).In the coarse query model, the deterministic query

complexity of finding a stable matching is  O(n2 log n).

The main combinatorial tool that we leverage in our proof of this theorem is Yu’s remarkable Proportional Transitivity Theorem (Yu, 1998). Implementing the resulting strategy for finding the “right” speculative preferences to fix is computationally non-trivial; indeed, note that Theorem 1.1 only expresses guarantees about the number of rounds of interaction, but not about the computational complexity. Fortunately, a Markov chain-based algorithm due to Huber (2006) for sampling linear extensions of a partial order can be leveraged to obtain “good” speculative preferences effi-ciently (i.e., in running time polynomial in n), up to negligible probability of failure:

Theorem 1.2 (Informal, see Theorem 3.96).In the coarse query model, a stable matching can be efficiently found with high probability using  O(n2 log n) queries.

Can this quadratic (up to the log factor) query complexity be improved? The existing literature on the query and communication complexity of stable matchings (Ng and Hirschberg, 1990; Chou and Lu, 2010; Segal, 2007; Gonczarowski et al., 2019) provides some very robust lower bounds on the complexity of finding a stable matching. However, these are all obtained in a model in which each query is to the preference list of one participant, or more generally involves only preference lists of participants on one of the sides of the market.7 The most general of these lower bounds (Gonczarowski et al., 2019) states that Ω(n2) queries are required to find a stable matching. While queries in that model are very flexible, the theorem of Gonczarowski et al. (2019) that proves this bound also proves that in that model, even verifying the stability of a proposed matching requires quadratically many queries. This is in contrast with our coarse query model, in which verifying the stability of a proposed matching takes one query. In general, though, queries in any of the models of those papers are more powerful and flexible than in our coarse query model, in the sense that in those models, a query can specify, for example, a precise pairwise comparison to be revealed. By contrast, queries in our coarse query model are answered adversarially.8 Given these differences between the models, it is quite curious that in both models, the best known upper bound (up to log factors) is quadratic. Completing the picture, we present a quadratic lower bound in our coarse query model as well:

Proposition 1.3 (Informal, see Proposition 3.109).In the coarse query model, the query complexity of finding a stable matching is Ω(n2). This holds even for the number of expected queries by a randomized algorithm for surely finding a stable matching when the preferences of participants on one side of the market are identical and known, and the preferences of the participants on the other side are i.i.d. uniform.

The proof of this lower bound is less elaborate than those of the quadratic lower bounds of the above-mentioned papers. This provides further evidence regarding the weakness/coarseness of the queries that are allowed in our model, in some sense making the  O(n2 log n) upper bound more surprising. The remaining gap between our upper and lower bounds, while in a very different model than that in which Gonczarowski et al. (2019) presented their gap, is between the exact same bounds of Ω(n2) and O(n2 log n). Both gaps, in very different technical senses, may be thought of as leaving open the interesting question of whether the number of “atomic” queries required to find a stable matching can be significantly less than the Θ(n2 log n) bits of information that are needed to encode the preference lists of all participants.

Many-to-many markets Finally, in Section 4, we generalize our results not only to many-to-one markets, but in fact “all the way” to many-to-many matching markets (Roth, 1984). We consider a market with firms on one side and workers on the other, where each participant has responsive preferences (Roth, 1985). That is, each participant has a strict preference order over the participants on the other side of the market, as well as a quota: the maximum number of participants from the other side of the market to which this participant can be matched. The coarse query model that we consider in this setting is similar to the coarse query model from the one-to-one setting: an interactive learning algorithm may propose a matching, and the response is either that this matching is (pairwise) stable, or a blocking pair. However, the revelation of a blocking pair only reveals that the firm prefers the worker to some worker it is currently matched with (but does not identify this worker), and that the worker prefers the firm to some firm she is currently matched with (but does not identify this firm). Thus, the information given to the algorithm is even coarser than in the one-to-one matching case. It would of course also be natural — and the appropriate choice for some applications — to consider a model that reveals the discarded firm/worker; as we discus in Section 3.6, our analysis from the one-to-one case applies to such a model without any need for adjustment. We focus on the even coarser model that does not reveal the discarded partners in order to keep in line with our motivation above: focusing on the weakest/coarsest model in which the algorithm still naturally obtains information about the underlying preferences.10 In this model, a single response does not definitively imply even a single pairwise comparison in the preferences of any participant. It therefore becomes much less clear whether or how exponentially many queries (and exponential running time) can be avoided. Indeed, this problem bears a considerable resemblance to a problem that we show to be NP-hard (see Proposition 4.2). Some further thought, though, shows that the above  O(n3) algorithm for the one-to-one setting can be conceptually generalized to obtain an  nO(q) algorithm, where  q ≤ n is themaximum quota of any participant:

Theorem 1.4 (Informal, see Theorem 4.3). In the coarse query model, the deterministic query complexity of finding a stable matching is  nO(q). Furthermore, a stable matching can be found in time  nO(q) using such a number of queries.

For a market with small (constant) quotas, this theorem is satisfactory. For markets in which at least one quota is large (e.g., superpolylogarithmic in the size of the market; for instance, where at least one employer employs a sizable chunk of the market), the query complexity of this theorem is too high, a result of the significantly coarser information the algorithm receives. Our main result for the many-to-many setting is nonetheless a positive one: it is possible to once again carefully set the speculative preferences in each step of the algorithm to obtain a polynomial number of queries:

Theorem 1.5 (Informal, see Theorem 4.5). In the coarse query model, the deterministic query complexity of finding a stable matching is  O(n3 log n)(regardless of the maximum quota q).

This theorem makes no guarantees regarding the running time of an algorithm that makes this number of queries. Indeed, the main problem that we leave open is designing an interactive algorithm for many-to-many markets that runs in time polynomial in n, regardless of q. As discussed in Section 5, we identify intimate connections between this open problem and open problems regarding rapid mixing of certain stochastic processes under constraints that cease to be convex when q is greater than one.

Open Problem 1.6. For the coarse query model, construct an interactive learning algorithm that finds a stable matching in many-to-many, or even many-to-one, markets with an expected polynomial number of queries and an expected polynomial run time.

As mentioned above, the coarse learning model we study can be viewed as a case of the generalized equivalence query model of Emamjomeh-Zadeh and Kempe (2017). Note that the structures to be learned in our setting are the agents’ preferences, not the matching itself; the proposed matchings serve as conduits for receiving additional information about the preferences. Emamjomeh-Zadeh and Kempe (2017) provided a general approach, based on a technique of Emamjomeh-Zadeh et al. (2016), for learning such structures in a number of rounds that is logarithmic in the number of candidate structures. Had this technique been applicable, their technique would have implied our O(n2 log n) bound from Theorem 3.4. However, their framework is provably not applicable to our setting, necessitating the self-contained algorithm and analysis we provide here. We prove this fact in Appendix A.

image

After this paper appeared at EC’20, the authors of (Bei et al., 2013) contacted us (personal communication, August 20, 2020) and made us aware of their paper, of which we had been unaware until that point. As they pointed out, in their paper, they asked a similar question to ours, for a number of domains including one-to-one matching markets, and proved theorems equivalent to our Theorems 3.4 and 3.9 and a theorem similar to our Proposition 3.10. For Theorems 3.4 and 3.9, while of the same order, the number of queries that our algorithms use is smaller by a constant factor, and our proofs are somewhat shorter, since we used Yu’s Proportional Transitivity Theorem, whereas they proved the result from first principles; however, neither this difference nor the differ-ence in proofs is substantial. For Proposition 3.10, our result is somewhat stronger, by allowing preferences on one side to be known and identical, and preferences on the other side to be i.i.d. random, whereas theirs is a worst-case result; however, the qualitative message is similar. Bei et al. (2013) did not study many-to-one or many-to-many matching markets in their paper, and so our results for these markets, including Theorems 4.3 and 4.5, have no counterparts in their paper.

2.1 The Stable Matching Problem

In a general many-to-many matching market, there is a set  W of nWworkers and a set  F of nFfirms. We will use w and its variants exclusively for workers and f and its variants exclusively for firms. We write  A = W ∪ Ffor the set of all agents, and a and its variants when we define or reason about notation that applies equally to both sides of agents. Each worker  w ∈ W hasquota qw ∈ {1, . . . , nF }and each firm  f ∈ F has quota qf ∈ {1, . . . , nW }. Each worker  w ∈ Whas a preference order (linear order)  ≻∗w over a subset of F, which we interpret as the subset of firms to which w prefers being matched over being unmatched. We write  f ≻∗w f′to denote that w (strictly) prefers  f over f′, and write  f ≻∗w ∅to denote that w prefers f over being unmatched. Similarly, each firm  f ∈ Fhas a preference order  ≻∗f over a subset of W.

A many-to-many matching  µ ⊆ F × Wis a set of worker-firm pairs such that the set of firms to which worker w is matched,  µ(w) =�f ∈ F �� (w, f) ∈ µ�, is of size at most  qw, and the set of workers to whom firm f is matched,  µ(f) =�w ∈ W �� (w, f) ∈ µ�, is of size at most  qf.

In the special case in which  qw= 1 for all  w and qf= 1 for all f, this is exactly the definition of a standard (not necessarily perfect) one-to-one matching. In this case, keeping with standard nomenclature, we refer to the two sides as women W and men M, using the notation w and m and their variants for individuals. In this case, we also write  µ(a) for the (unique) partner of  a ∈ A. Ifa is unmatched, then we define  µ(a) = ∅.

Given a matching  µ, an agent a is individually blockingif she is matched to an agent that she does not prefer over being unmatched. A matching is individually rational if no agent is individually blocking. Given a matching  µ, a pair (w, f) /∈ µis called a  blocking pair (w.r.t. µ) if all of the following hold:

 f ≻∗w ∅ and w ≻∗f ∅.

Either��µ(f)�� < qfor there exists  w′ ∈ µ(f) such that  w ≻∗f w′.

Either��µ(w)�� < qwor there exists  f′ ∈ µ(w) such that  f ≻∗w f′.

A blocking pair would prefer to add each other to their respective sets of matches (while removing f′ and w′ from these respective sets if these sets are full). A matching  µis (pairwise) stable (with respect to the preferences  ≻∗w and ≻∗f and quotas qw and qf) if it is individually rational and no pair is blocking it. That is, if  µis stable, then it is individually rational, and furthermore, for each pair (w, f) /∈ µ with f ≻∗w ∅ and w ≻∗f ∅, we have either��µ(w)�� = qw and f′ ≻∗w f for all f′ ∈ µ(w),or��µ(f)�� = qf and w′ ≻∗f w for all w′ ∈ µ(f); in words, a matching is stable iff each agent is only matched to agents that it prefers over being unmatched, and furthermore for each unmatched pair (w, f), either w is matched to her quota of partners and prefers all her current partners to f, or f it matched to its quota of partners and prefers all its current partners to w.

2.2 The Interactive Learning Model

Our goal is to design an algorithm to produce/learn a stable matching. The challenge is that  ≻∗w and≻∗f are unknown to the algorithm; nonetheless, the algorithm seeks to find a stable matching with respect to them, by gradually learning about the individuals’ preferences. The learning proceeds in rounds. In each round t, the algorithm proposes a matching  µ(t). If µ(t) is stable with respect to the (initially unknown)  ≻∗w and ≻∗f, the process terminates. Otherwise, the algorithm learns about one blocking pair  B(t) = (w(t), f(t)) or one individually blocking agent. If there are multiple possible blocking pairs (and/or individually blocking agents), the one that is revealed to the algorithm is chosen adversarially. Based on this new information, the algorithm can then compute a new proposed matching  µ(t+1), and so on.

2.3 Reduction to Full Preference Lists

We will say that a matching market has full preference lists if a) All preference lists are full: each agent prefers every agent on the other side of the market to being unmatched, and b) Quotas are balanced: �w∈W qw = �f∈F qf. (In a one-to-one market this translates to  nW = nM.) Wesay that a matching in such a market is perfect if every participant is matched to her full quota of partners. In such markets, a matching is individually rational if and only if it is perfect. Restricting attention to perfect matchings, a matching is stable if and only if it is not blocked by any pair. When stating our learning algorithms and analyzing them, it will be convenient and reduce clutter to restrict attention to markets with full preference lists. Since we will design our algorithms to only output perfect matchings, we will only have to deal with blocking pairs (rather than individually blocking agents) as responses. Luckily, by a well-known reduction in stable matching theory, this restriction is without loss of generality. See Appendix B for the details of the embedding of the general problem in the special case of full preference lists, as well as for a discussion and a word of caution on applying other well-known reductions within the context of our coarse query model.

2.4 Additional Notation

When  qw = qf= 1 for the blocking pair (w, f) = (w(t), f(t)), the algorithm can infer that  f ≻∗w µ(w)and  w ≻∗f µ(f). In the more general case, the algorithm only learns that there exist  f′ ∈ µ(w) andw′ ∈ µ(f) such that  f ≻∗w f′ and w ≻∗f w′. In Section 3, we will briefly discuss a model in which the algorithm also learns the exact identity of  w′ and f′; unsurprisingly, in this case, the problem directly reduces to the case in which  qw = qm= 1 for all  w and m.11

In either case, the algorithm obtains information about the agents’ preferences in the form of “a must precede at least one element of S in a particular preference order.” We write such constraints as  ⟨a, S⟩(the agent to whose preference order the constraint applies will be clear from context), and also as  ⟨a, a′⟩ when S = {a′}is a singleton. We will consider sets C of such constraints. We say that a ranking  ≻ is consistent with Cif it satisfies all constraints in C. Throughout, all of our sets C will be such that at least one ranking (namely, the unknown true ranking) is consistent with C. When C is a set consisting only of constraints of the form  ⟨a, a′⟩, i.e., involving only the relative order of two elements, then C precisely defines a strict partial order, and the rankings consistent with C are exactly the linear extensions of C.

In this section, we focus on the case in which all agents have a quota of 1, i.e., the case of women and men. We will write  n = nW = nM. Recall that in this special case, the revelation of a blocking pair B(t) = (w(t), m(t)) = (w, m) implies that  m ≻∗w µ(w) and w ≻∗m µ(m); this information reduces the number of (remaining) possible preferences, by reducing either the number of possible preference orders  ≻∗w or the number of possible preference orders  ≻∗m.

3.1 High-Level Overview

The high-level idea of our approach is to choose  µ(t) in such a way that whichever blocking pair B(t) the adversary reveals, the number of possible preference orders for at least one of w, m not only decreases, but in fact decreases by a constant factor.

For each agent  a ∈ A, let C(t)abe the set of all preference constraints that can be inferred from the blocking pairs revealed before (but not including) time t. That is, for a woman w, we have C(t)w =�⟨m, µ(t′)(w)⟩�� t′ < t and B(t′) = {w, m}�; similarly for men. Let  P(t)adenote the set of all linear extensions of  C(t)a .

We call a scenario (at time t) a vector giving for each woman and each man a preference order consistent with all observations so far; that is, a vector (≻a)a∈A ∈×w∈W P(t)w ××m∈M P(t)m . Letρ(t)a = |P(t)a |denote the number of remaining possible preference orders for  a, and ρ(t) = �a∈A ρ(t)athe total number of scenarios. Initially, for every agent a, there are  ρ(1)a = n! possible preference orders, so  ρ(1) = (n!)2n.

Our algorithm will ensure that the number of possible preference orders for at least one agent, and thus the number of possible scenarios, decreases by a constant factor in each round t. Thus, after at most  t∗ = log�(n!)2n�= O(n2 log n) rounds, there is at most one scenario remaining, at which point the algorithm will have identified all of the true preference orders  ≻∗a. By running the Gale-Shapley Algorithm using those preference orders, the algorithm then ensures that it has found a stable matching.12

3.2 Representative Rankings and Proportional Transitivity

The basic idea of our algorithm is the same as in the  O(n3) algorithm of Section 1. In each round, the algorithm chooses a speculative preference order for each agent, and then computes a stable matching with respect to those speculative preference orders. The important improvement is to choose the speculative preference orders for all agents carefully. Specifically, we want all preference orders  ≻(t)ato be representative of the corresponding  P(t)a , in the following sense.

Definition 3.1 (Representative preference order). Let C be a set of constraints (forming a partial order), and P the set of all linear extensions of  C. Let pa,a′ = pa,a′(C) be the fraction of linear orders  ≻ in P that have a ≻ a′. We say that  ≻ is α-representative of Piff for all pairs (a, a′),whenever  pa,a′ ≥ α, we have a ≻ a′.

It is not a priori clear that an  α-representative preference order even exists for a constant  α < 1.This is established by the following lemma.

Lemma 3.2. Let C and α ≥ 0.8be arbitrary, and let  Sα(C) =�(a, a′)�� pa,a′ ≥ α�be the set of all ordered pairs (a, a′)such that at least an  αfraction of the linear extensions of C rank a ahead of a′. Then, Sα(C)is acyclic, and thus defines a partial order. In particular, every linear extension of  Sα(C) is an α-representative preference order.

The proof of Lemma 3.2 relies heavily on Yu’s Proportional Transitivity Theorem (Yu, 1998).

Theorem 3.3 (Theorem 3 of Yu, 1998). Let φmin = 1+(√2−1)·√2√2−12 ≈ 0.78, and φ ≥ φmin anyconstant. Let P be any partial order. For every  a, a′, a′′, if pa,a′(P) ≥ φ and pa′,a′′(P) ≥ φ, thenpa,a′′(P) ≥ φ.

Theorem 3.3 is remarkable: it states that if a large enough fraction of linear extensions of a partial order have  a ahead of a′, and (at least) the same fraction have  a′ ahead of a′′, then (atleast) the same fraction have  a ahead of a′′. For p= 1, this statement is of course trivial (because it exactly captures transitivity of partial orders), but that it should hold for a constant p < 1 is not at all obvious.

Proof of Lemma 3.2. Assume (for contradiction) that  Sα(C) contains a cycle  a1 ≻ a2 ≻ a3 ≻ · · · ≻aℓ ≻ a1 of length ℓ. Then, pai,ai+1(C) ≥ α for all i = 1, . . . , ℓ −1. Applying Theorem 3.3 with φ = α ≥ 0.8 > φminrepeatedly, we obtain that  pa1,aℓ(C) ≥ α. But this gives a contradiction with the fact that  paℓ,a1(C) ≥ α, because pa1,aℓ(C) + paℓ,a1(C) = 1.

image

The total number of iterations required by Algorithm 1 is characterized by the following theorem.

Theorem 3.4. 13 Algorithm 1 terminates after at most  O(n2 log n)iterations.

Proof. Because α ≥ 0.8, by Lemma 3.2, the algorithm can always find representative preference orders to use as speculative preference orders.

Consider any iteration t in which a blocking pair  B(t) = {w, m}(with respect to the true preferences  ≻∗a) was revealed. Because the Gale-Shapley Algorithm finds a stable matching, the pair {w, m} was not blocking for the preferences  ≻(t)aused in running the Gale-Shapley Algorithm. In particular, this means that at least one of w, m preferred their assigned partner over the other, i.e.,  µ(w) ≻(t)w m or µ(m) ≻(t)m w. Without loss of generality, assume that  µ(w) ≻(t)w m.

image

Remark 1. The number of iterations is  O( α1−α · n2 log n), but because  αis an absolute constant (e.g.,  α = 0.8, or α = 0.9in the efficient implementation to follow), the number of iterations is simply  O(n2 log n).

The implementation of Algorithm 1, as given above, is not efficient, because we did not specify how to find representative preference orders. One way to do so is to enumerate all linear extensions of  C(t)a , and use this enumeration to compute the set  Sα(C(t)a ) =�(a, a′)�� pa,a′ ≥ α�fromLemma 3.2 explicitly. Given  Sα(C(t)a ), finding a linear extension is simply a topological sort. Of course, the enumeration may take time Ω(n!). In the next section, we give an efficient randomized implementation, which terminates in  O(n2 log n) iterations with high probability.

3.4 An Efficient Randomized Implementation

In this section, we describe and analyze a randomized algorithm for sampling representative preference orders for a set of constraints (partial order) C. The algorithm is given as Algorithm 2, and relies heavily on the following theorem of Huber (2006), which is based on the Karzanov-Khachiyan Chain (Karzanov and Khachiyan, 1991) and a bounding chain:

Theorem 3.5 (Lemma 10 of Huber, 2006). There exists a randomized algorithm for sampling linear extensions of a partial order C, with the following guarantees:

1. The algorithm samples linear extensions exactly uniformly from all linear extensions of C.

2. The expected running time of the algorithm is at most 4.3 · n3 ln n.

3. For every integer k, with probability at least 1−1/nk−1, the running time of the algorithm is at most  k · 16π2 · n3 ln n.

image

The two key observations about Algorithm 2 are the following: (1) the set S of sampled constraints is acyclic with high probability, ensuring (fast) termination of the algorithm; and (2) with high probability, ˆ≻is 0.9-representative.

To prove both statements, we begin with a simple tail bound, capturing that the ˆpa,a′ areaccurate estimates of  pa,a′. Let Edenote the event that all estimates ˆpa,a′simultaneously satisfy that  |ˆpa,a′ − pa,a′(C)| ≤ 0.05.

Lemma 3.6. Prob[E] ≥ 1 − 1n.

Proof. First focus on one pair (a, a′). By Theorem 3.5, the linear extensions are sampled exactly from the uniform distribution.14 Therefore, we have  E�ˆpa,a′�= pa,a′(C). The Hoeffding Bound15(Hoeffding, 1963) guarantees that Prob�|ˆpa,a′ − pa,a′(C)| ≥ 0.05�≤ 2 exp(−K/200) = 2n−3. A unionbound over all�n2�pairs a, a′ now implies that all estimates are accurate to within an additive 0.05 with probability at least 1− 1n.

Lemma 3.7. Under E, the set S of preference constraints is acyclic.

Proof. Because we are conditioning on the event E, for all constraints  ⟨a, a′⟩ ∈ S, we must have that  pa,a′(C) ≥ 0.8. Acyclicity now follows from Lemma 3.2, because  S ⊆ S0.8(C) for the set  S0.8(C)defined in that lemma.

Lemma 3.8. Under E, the preference order ˆ≻returned by Algorithm 2 is 0.9-representative.

Proof. Let a, a′ be such that  pa,a′(C) ≥ 0.9. Because we are conditioning on E, this implies that ˆpa,a′ ≥ 0.85, so by definition,  ⟨a, a′⟩ ∈ S. Because ˆ≻is a linear extension of S, we get that  aˆ≻a′.

We now return to the analysis of Algorithm 1, proving the following theorem: Theorem 3.9. 16 If Sample-Representative-Preference(C(t)a )is used to compute the specu- lative preference order  ≻(t)ain Line 6 of Algorithm 1, then Algorithm 1 terminates after at most

O(n2 log n)iterations with high probability; furthermore, with probability at least 1−O(1/n2), eachiteration runs in time  O(n3 log2 n).

Proof. Consider running Algorithm 1 for Θ(n2 log n) iterations, with a suitably large constant. Let N be the number of those iterations in which the event E held. By Lemma 3.6, N is lowerbounded by a Binomial�Θ(n2 log n), 1− 1n�random variable. By Chernoff Bounds, with high probability,  N = Θ(n2 log n), again for a suitably large constant.

During any iteration t in which E did not hold, we simply (pessimistically) bound  ρ(t+1) ≤ ρ(t).For the remaining iterations, Lemma 3.8 guarantees that the  ≻(t)a are 0.9 representative. Therefore, using  α = 0.9, Theorem 3.4 guarantees that when  N = Ω(n2 log n), the algorithm terminates with a stable matching.

Finally, we analyze the running time. The running time of Algorithm 2 is dominated by drawing O(log n) independent uniformly random linear extensions of C. By Theorem 3.5 with k = 6, each draw takes time  O(n3 log n) with probability at least 1−1/n5. The failure probability of event E is at most 1n, so with probability at least 1− 1/n5, the number of times Algorithm 2 is repeated before succeeding is at most 5, and in particular constant. Next, we take a union bound over the following events: (1) each of the O(log n) calls to Huber’s algorithm does not take too long, and (2) Algorithm 2 is not repeated more than 5 times. By this union bound, Algorithm 2 runs in time O(n3 log2 n) with probability at least 1−O(log n/n5).

In each iteration t, Algorithm 1 in Line 6 needs to “construct”  α-representative preference orders for  C(t)afor all agents a. In the first iteration, this is trivial, as any order is  α-representative. In subsequent iterations, only the orders for the two agents who were part of blocking pairs may have to be updated. Thus, each iteration requires two calls to Algorithm 2, for a total time of  O(n3 log2 n).By a union bound over these two invocations of Algorithm 2, each iteration of Algorithm 1 takes time O(n3 log2 n) in calls to Algorithm 2, with probability at least 1−O(log n/n5). This (high-probability) running time of  O(n3 log2 n) dominates the running time of the Gale-Shapley Algorithm and of updating the constraint sets in each iteration.

Finally, by a union bound over all  O(n2 log n) iterations of Algorithm 1, the running time guarantee holds with probability at least 1−O(log2 n/n3) = 1−O(1/n2).

image

Proof. The high-level idea of the proof is quite straightforward, but a formal proof needs some care in the details. We begin with the very straightforward case of adversarially chosen preferences, and then discuss the intuition of the proof for random preferences. Let  m1 ≻ m2 ≻ · · · ≻ mn bethe men ranked by the common (and known) preference of all women. For each man  i, let ≻∗ibe his preference order over women. As is well known, the unique stable matching when women have common preferences is the result of a serial dictatorship of the men in the order of these preferences. That is,  m1is matched with his first choice,  m2is matched with his first choice among the remaining women, and so on.

An adversary can use the following strategy. Whatever matching the algorithm proposes in the first iteration, the woman w matched to  m1will be adversarially placed as his last-ranked choice, and the algorithm will reveal a blocking pair of  m1with some other woman, and will continue revealing that same blocking pair whenever  m1is matched with w. Whatever different woman m1is next matched with (whether it is the one that was revealed or not) will be adversarially placed as  m1’s second-to-last choice, and so on. Once the algorithm has matched  m1 with alln women, the the last woman to be matched with him is known to be his first choice. At this point, the adversary will continue with the same strategy for  m2with the remaining  n−1 women.Because only blocking pairs involving  m1have been revealed up to that point, the ranking of  m2 iscompletely undetermined. The adversary continues in this way for all men in order. In total, this adversary strategy will force any algorithm to use at least �n−1i=0 i = Ω(n2) rounds of queries.

When the men have i.i.d. uniformly random preference rankings, the high-level idea remains the same. For any i, until the algorithm has identified the correct matches of  m1, . . . , mi−1, theadversary will not reveal any blocking pair involving  mi, . . . , mn. However, in contrast with the case of adversarial rankings, once all of these  i−1 matches have been identified, the adversary cannot ensure any more that the woman  w with whom miis matched is the lowest remaining in his ranking. Nevertheless, the adversary can still reveal a block with the woman immediately preceding  w in mi’s ranking. Intuitively (we will make this argument precise, of course), this “does not reveal much information” besides the fact that  w is not mi’s correct match, so the algorithm in expectation still has to try a constant fraction of all remaining women as  mi’s match.

To precisely define the adversary’s behavior in the random-preferences case, fix realized preferences for all men. We inductively define the following women and sets of women.  S1is the set of all women. For each  i, let Xibe the top choice of  miamong the women in  Si, according to the uniformly random (but fixed for now) preference ranking  ≻∗i . Then, let  Si+1 = Si \ {Xi}. Theunique stable matching has each  mimatched with  Xi. The adversary behaves as follows: if the proposed matching is not stable, then let  i = min�j�� µ(t)(mj) ̸= Xj�. Reveal the blocking pair consisting of  mi and w′, where the latter is the  immediate predecessor of w = µ(t)(mi) among the set  Siaccording to  ≻∗mi.

Making the preceding informal argument — about this adversary not revealing “much information” — precise requires some care. In particular, we must reason about what the algorithm has “learned” about the uniformly random preferences of the men. Our analysis will therefore use the Principle of Deferred Decisions, and consider the random preferences of the men as if they are drawn in stages over the course of the algorithm’s execution. However, these deferred random choices will of course be sound, as we will show that the final preferences are i.i.d. uniformly random, regardless of the matchings proposed by the learning algorithm A. Due to the probabilistic nature of our analysis,  Xi and Siwill henceforth be treated as random variables.

In each round t, the algorithm A proposes a matching  µ(t). The adversary reveals a blocking pair involving  miif and only if  i = min�j �� µ(t)(mj) ̸= Xj�. Naturally, an adversary following this strategy will reveal to the algorithm that  µ(t)(mj) = Xj for all j < i.Because blocking pairs involving  miare only revealed in this case, we obtain the following first key property: (1) each blocking pair for  mi involves µ(t)(mi) ∈ Siand some other  w ∈ Si. Since the adversary will always reveal a w who is the immediate predecessor of  µ(i)(mi) among Si, this implies a second key property: (2)  C(t)mi, which to avoid clutter we denote by  C(t)i , defines a union of disjoint chains on  Si.

In our analysis, we consider as already drawn the following information about all of the  ≻∗iat all times t, which contains all of the information known to the algorithm: (1) The preference constraints  C(t)i(the blocking pairs involving  mialready revealed to the algorithm), and (2) possibly the realization  wiof the random  Xi, if it has been revealed, in which case we consider the entire preference relation  ≻∗i as having been drawn already. Specifically, this information is tracked using the variables  Ci and wi, where initially we have  Ci = ∅ and wi =⊥ for all i. Whenever the adversary commits to a blocking pair, we update the variable  Ci, and whenever the adversary commits to the value of  Xi in round t, we update the variable  wito reflect the realized value of  Xiand update the variable  Cito reflect the realization of all of  ≻∗i . We write b(t)i = |Si| −��C(t)i �� = n + 1 − i −��C(t)i �� forthe number of chains in the partial order defined by  C(t)i . The adversary’s strategy alongside the gradual draw of preferences, as viewed from the Deferred Decision point of view of our analysis, is given in Algorithm 3.

image

We first verify that Algorithm 3, using deferred decisions, generates uniformly random rankings regardless of the proposed matchings  µ(t). To see that this, recall the key properties (1) and (2) above, which state that the revealed constraints  C(t)i partition Siinto disjoint chains of immediate predecessors. In a uniformly random permutation, the order of these chains is uniformly random. In particular, when there are  b(t)ichains, each woman without a known predecessor is first in the ranking with probability 1/b(t)i . The bias of the random coin flip in Line 8 guarantees that when any woman w without a known predecessor is considered, she is revealed to be first in the ranking with exactly this probability 1/b(t)i . Conditioned on not being first overall in the ranking, w has a predecessor who is uniformly random among the last elements of all other  b(t)i −1 chains. The draw of  w′ in Line 10 exactly matches this distribution.

Finally, we analyze the number of iterations that A needs. In Step 8 of Algorithm 3, the probability that  wiis revealed (and thus the algorithm can move on to finding  mi+1) is 1b(t)i =

n+1−i−|C(t)i |. For i ≤ n/3and the first  n/3iterations for  mi (where C(t)i ≤ n/3), we obtain that the probability of finding  wiis at most 3/n. Thus, in expectation, it takes at least n/3iterations each to identify  w1, . . . , wn/3, implying that A needs at least n2/9iterations in expectation.

3.6 Many-to-Many Matchings with Extra Information

Our algorithms and analysis would have remained essentially unchanged for the case of many-to-many matchings had the algorithm, along with a blocking pair  B = {w, f}, learned f′ ∈ µ(w), w′ ∈µ(f) such that  f ≻∗w f′ and w ≻∗f w′. In this case, the extra information would have still allowed the algorithm to add at least one new constraint  ⟨f, f′⟩ or ⟨w, w′⟩ to Cw or Cfin each iteration. So Algorithm Sample-Representative-Preference (Algorithm 2) could have been used entirely unchanged, and the standard many-to-many variant of the Gale-Shapley Algorithm could have been used to compute a stable matching with respect to the sampled preference orders.

Computing a stable many-to-many matching with respect to unknown preferences becomes significantly more challenging when the algorithm only sees the blocking pair (w, f), with no additional information about which agent in  µ(w) or µ(f) should be “swapped out.” The modified algorithms and analysis for this case are the subject of Section 4.

In this section, we describe and analyze algorithms for many-to-many matchings when the response to a query reveals only a blocking pair (w, f), but not the current matches  f′ ∈ µ(w) and w′ ∈ µ(f)such that  f ≻∗w f′ and w ≻∗f w′.

We obtain two results. First, in Section 4.1, we assume that all quotas are bounded, and write q = maxa qa. We give a simple learning algorithm that takes at most  O(nq+2) rounds of interaction, with computation time at most  O(nq+3) per round — this algorithm is a conceptual generalization of the simple  O(n3) algorithm for one-to-one matchings discussed in Section 1. Then, in Section 4.2, we present a more involved algorithm that only requires  O(n3 log n) rounds of interaction, regardless of the quotas. While the number of interactions is always polynomial, the computational requirements are not: we are currently not aware of any implementation that is not exponential in n. We show that being able to efficiently sample preference orders subject to constraints more involved than pairwise comparisons would be sufficient to obtain an efficient implementation.

The algorithms for both cases are based on modifying Algorithm 1. The main difference is the type of information that the algorithm obtains from a blocking pair. Whereas previously, a blocking pair {w, f} revealed that  w prefers f over µ(w), it now only reveals that w prefers f over (at least) one participant in  µ(w).

For the modified algorithm,  C(t)awill be a set of constraints of the form  ⟨a′, S⟩; in these constraints, all sets S will have the same size, namely,  qa. The algorithm uses speculative preference orders  ≻(t)athat are “representative” of  C(t)ain a sense that we will describe below. The modified learning algorithm is given by Algorithm 4.

image

The difference between the analyses in Sections 4.1 and 4.2 is solely in the choice of the speculative preferences, and the corresponding guarantees that can be derived on the progress of Algorithm 4 and the running time of the implementation. Specifically, in Section 4.1 we will allow the speculative preferences to be arbitrary preferences that are consistent with  C(t)a , while in Sec- tion 4.2 we will use speculative preferences that are representative of  C(t)awith respect to a notion of representativeness that will be defined iteratively in terms of fractions of being ranked last.

4.1 Small Quotas

The simplest way to implement Line 6 in Algorithm 4 is to let  ≻(t)a be anypreference order consistent with all the constraints in  C(t)a . We show that such a preference order can be found efficiently.

Proposition 4.1. Assume that there exists at least one linear order ˆ≻consistent with C. Then, such an order can be found in time  O(n · |C|).

Proof. The algorithm is a generalization of the standard Topological Sort algorithm for directed acyclic graphs. It scans through all the constraints and counts, for each relevant  a (either a ∈ Wor  a ∈ F, depending on whether a preference order is being constructed for some  w ∈ W orsome  f ∈ F), how many constraints there are of the form  ⟨a, S⟩. Because each constraint  ⟨a, S⟩implies that a cannot be last in the preference order, and we assumed that there is a linear order ˆ≻consistent with C, there must be at least one a for which there are no constraints  ⟨a, S⟩.

The algorithm picks any such a and puts it in the last remaining place of the preference order. It then removes all constraints of the form  ⟨a′, S⟩ with S ∋ a from C, because they have been satisfied by placing a last. It then continues with the remaining constraints for the remaining positions.

The algorithm takes n iterations, each of which take time O(C) for the scan and removal.

While the generalization of Topological Sort to the more general types of constraints is fairly straightforward, we remark on a subtlety that does not arise for standard Topological Sort. A slight generalization of the types of constraints we consider is to allow constraints both of the form “a must precede at least one element of  S” and “amust follow at least one element of  S.” When S isa singleton, these types of constraints are the same. When the sets S can be larger, the types of constraints are different, and we showed that when only one type of constraint is allowed, a feasible assignment can be found efficiently if one exists. (And otherwise, the algorithm can determine that no feasible assignment exists.) This ceases to be true when both types of constraint are combined:

Proposition 4.2. The following problem is NP-hard: given a set C of constraints, each of which is of the form (1)  zimust precede at least one element of  Si, or (2) zimust follow at least one element of  Si, decide whether there exists a linear order on the elements satisfying all the constraints. NPhardness holds even when  |Si| ≤ 3 for all i.

Proof. We give a straightforward reduction from 3SAT. Given an instance of  n variables x1, . . . , xnand  m clauses Cj, each of which is a disjunction of three literals, the reduction constructs an instance with 2n+1 elements and n+m constraints. There are two elements  zi, ¯zifor each variable xi, and one more element ˆy. For each variable  xi, there is a constraint that ˆy must follow at least one of  zi, ¯zi. For each clause  Cj = ℓj,1 ∨ ℓj,2 ∨ ℓj,3, there is a constraint that ˆy must precede at least one of the elements corresponding to the  ℓj,i; for example, if  Cj = x2 ∨ ¯x3 ∨ x5, then the constraint says that ˆy must precede at least one of the elements  {z2, ¯z3, z5}.

The correctness of the reduction can be easily observed by interpreting all the literals after ˆy as true. The first set of constraints captures that no assignment may make both  xi and ¯xi true, andthe second set of constraints captures that each clause must have at least one true literal.

We can now prove our main theorem for the simple algorithm:

Theorem 4.3. If the algorithm of Proposition 4.1 is used to compute the speculative preference order  ≻(t)ain Line 6 of Algorithm 4, then Algorithm 4 terminates after at most  O(nq+2)iterations; furthermore, each iteration runs in time at most  O(nq+3).

Proof. The proof follows in the footsteps of that for Theorem 3.9. Assume that in iteration t, a blocking pair  B(t) = {w, f}is revealed, implying that  f ≻∗w f′ for some f′ ∈ µ(t)(w) and w ≻∗f w′for some  w′ ∈ µ(t)(f). Again, by the non-blocking property of a stable matching with respect to the  ≻(t)a , we can infer that at least one of  µ(t)(w) ≻(t)w f (i.e., wprefers each of  µ(t)(w) to f) orµ(t)(f) ≻(t)f wholds. Without loss of generality,  µ(t)(w) ≻(t)w fholds. In particular, because  ≻(t)wsatisfies all constraints in  C(t)w , we obtain that  ⟨f, µ(t)(w)⟩ /∈ C(t)w , implying that  |C(t+1)w | = |C(t)w | + 1.

Therefore, after each round, the quantity Φt = �a |C(t)a |increases by at least 1. The size of each constraint set  C(t)ais upper-bounded by  |C(t)a | ≤� nqa�· (n − qa) = O(nq+1). This implies that Φt ≤ �a� nqa�· (n − qa) = O(nq+2), so the process terminates after at most  O(nq+2) iterations. Similarly, because the algorithm of Proposition 4.1 runs in time  n · |C(t)a | = O(nq+2) for each agent (and since the running time of each iteration is dominated by this), each iteration runs in time at most  O(nq+3).

For the one-to-one matching case, as discussed in the introduction, this proves a bound of  O(n3)on the deterministic query complexity, which is obviously worse by a factor of n/log ncompared to the bound we obtained in Section 3 with a more careful choice of speculative preference orders.

4.2 General Quotas

In this section, our goal is to improve the guarantee of Theorem 4.3 to eliminate the (exponential) dependence on q. As in Section 3, we do so by a more careful choice of the speculative preference orders  ≻(t)aused in Algorithm 4. Unlike in Section 3, our implementation will not be computationally efficient.

Our new sampling algorithm is a modification of Algorithm 2. It repeatedly estimates the fractions of valid preference orders (consistent with the given constraints) that have an element a in last position, among a set of remaining elements. For any set C of constraints, set U, and agent a ∈ U, let ℓU,a(C) denote the fraction of preference orders consistent with C that have a ranked last among the agents in U. The algorithm, in each iteration t, will have a set  U (t) of elements still in need of ranking, and chooses the last element based on the  ℓU,a(C) (or estimates thereof in Section 4.3). It is detailed in Algorithm 5.

image

Akin to the analysis of representative preference orders in Section 3, we will show that the preference order ˆ≻produced by Algorithm 5 guarantees a reasonable reduction in the number of remaining consistent preference orders.

Lemma 4.4. Let ˆ≻be the preference order returned by Algorithm 5. Let  ⟨a′, S⟩be any constraint that holds in more than a 1− 1n fraction of preference orders consistent with  C. Then, a′ precedesat least one element of  S in ˆ≻.

image

Because the algorithm chooses ¯z(t) ∈ argmaxa ℓU(t),a(C), it ensured in particular that ℓU(t),¯z(t)(C) ≥ 1n. In particular, because no element of S had been previously placed, and since ⟨a′, S⟩was assumed to be a constraint holding in (strictly) more than a 1− 1n fraction of the pref- erence orders consistent with C, we get that ¯z(t) ̸= a′, so by definition of t, an element of S must be placed in iteration t. This implies that  a′ precedes at least one element of S, namely, ¯z(t).

Theorem 4.5. If Representative-Preference-Large-Constraints(C(t)a )is used to compute the speculative preference order  ≻(t)ain Line 6 of Algorithm 4, then Algorithm 4 terminates after at most  O(n3 log n)iterations.

Proof. As in the proof of Theorem 3.4, consider any iteration t in which a blocking pair  B(t) = {w, f}(with respect to the true preferences  ≻∗a) was revealed. Because the Gale-Shapley Algorithm finds a stable matching, the pair {w, f} was not blocking for the preferences  ≻(t)aused in running GaleShapley. In particular, this means that at least one of w, f preferred all of their assigned matches over the other. Without loss of generality, assume that  µ(t)(w) ≻(t)w f, i.e., f′ ≻(t)w f for allf′ ∈ µ(t)(w)).

By Lemma 4.4, applied to  ≻(t)w , we infer that at least a 1n fraction of the preference orders consistent with  C(t)w have µ(t)(w) ≻(t)w f. The revelation of the blocking pair {w, f} rules out all of the linear orders  ≻∗w that have µ(t)(w) ≻∗w f. Therefore,  ρ(t+1)w ≤ (1 − 1n) · ρ(t)w . In turn, this implies that  ρ(t+1) ≤ (1 − 1n) · ρ(t).

We obtain that Algorithm 4 terminates after at most  T = log nn−1 (n!)n = O(n3 log n) iterations.

image

As in the case of Theorem 3.4, Theorem 4.5 makes no statement about computational efficiency. Unlike in the case of one-to-one matchings, we are not aware of any efficient implementation. However, in the remainder of this section, we will show that as with one-to-one matchings, an ability to sample (nearly) uniformly preference orders from among all satisfying a given constraint set would be sufficient to achieve an efficient implementation. The obstacles to sampling are discussed in more depth in Section 5.

4.3 Estimations via Sampling

Instead of the actual values  ℓU(t),a(C), Algorithm 5 can be run with estimates ˆℓU(t),a(C). Specifi-cally, in each iteration t, the algorithm can sample  K = 6n2 ln(n) linear orders  ≻(t)iindependently and uniformly from the set of all linear orders satisfying all constraints  C. Then, ˆℓU(t),a(C) isthe fraction of sampled orders that have a ranked last among  U (t). The algorithm uses the estimates ˆℓU(t),a(C) in place of the actual  ℓU(t),a(C). We will refer to this variant of Algorithm 5 as Sample-Representative-Preference-Large-Constraints.

We reiterate that we are not aware of any computationally efficient algorithm to sample such  ≻(t)i , even approximately uniformly. Nonetheless, we will analyze the impact of such sam- pling, and show that the remainder of the algorithm works as before.

We define (high-probability, as we will show) events  Et, which jointly will guarantee that the samples are sufficiently accurate. Specifically,  Etis defined as the event that all estimates ˆℓU(t),ain round t are accurate estimates to within at most  ± 12n. More formally, the event  Et is definedas��ℓU(t),a(C) − ˆℓU(t),a(C)�� ≤ 12n for all a ∈ U (t). We begin by showing a suitable high-probability guarantee on these events.

Lemma 4.6. For all t and all sets U, we have that Prob[Et | U (t) = U] ≥ 1 − 2n2 .

Proof. Consider any iteration t and condition on  U (t) = U. Let a ∈ Ube arbitrary. Because the ≻iare sampled uniformly randomly, by definition,  E�ˆℓU,a(C)�= ℓU,a(C). The Hoeffding Bound

guarantees that Prob�|ˆℓU,a(C) − ℓU,a(C)| ≥ 12n�= 2 exp(−2K/(2n)2) ≤ 2n−3. A union bound over all of the at most  n elements a ∈ Unow implies that all estimates are accurate to within an additive

image

Sample-Representative-Preference-Large-Constraints(C) are accurate up to an additive

2n. By the union bound, Prob[E] ≥ 1 − 2n. The key implication of the event E is captured by the following lemma, which is a slightly weaker version of Lemma 4.4:

Lemma 4.7. Assume that the event E happened, and let ˆ≻be the preference order returned by Sample-Representative-Preference-Large-Constraints(C). Let ⟨a′, S⟩be any constraint that holds in more than a 1− 12n fraction of preference orders consistent with  C. Then, a′ precedesat least one element of S.

Proof. The proof is nearly identical to that of Lemma 4.4. In the iteration when an element of S ∪{a′}is placed for the first time, the algorithm uses the estimates ˆℓU(t),ainstead of  ℓU(t),a. Underthe event E, the fact that  ℓU(t),a′ < 12n by assumption and that the estimate is accurate to within an additive  ± 12n means that ˆℓU(t),a′ < 1n, so again,  a′cannot be placed in this iteration.

We obtain the following theorem.

Theorem 4.8. If Sample-Representative-Preference-Large-Constraints(C(t)a )(i.e., Al- gorithm 5 when run using random sampling-based estimates) is used to compute the preference order ≻(t)ain Line 6 of Algorithm 4, then Algorithm 4 terminates after at most  O(n3 log n)iterations with high probability; furthermore, each iteration requires drawing  O(n2 log n)samples for n iterations of the for loop of Algorithm 5, for each of the n agents. Thus, each iteration requires drawing O(n4 log n) samples.

Proof. The proof is practically identical to that of Theorem 3.9. Because E has high probability, if the algorithm is run for Θ(n3 log n) iterations, the number of iterations in which its speculative preference orders are such that they sufficiently reduce the number of candidate scenarios is also Θ(n3 log n). The other iterations do not guarantee progress, but cannot increase the number of remaining scenarios. Thus, up to constant, we achieve the same guarantees as the deterministic Theorem 4.5.

We presented interactive algorithms for learning stable matchings, both in the case of one-to-one matchings and in the case of many-to-many matchings. The algorithm repeatedly proposes a matching; if the matching is not stable, the algorithm is informed of one blocking pair. We showed that in the case of one-to-one matchings, there is a computationally efficient interactive algorithm that finds a stable matching after at most  O(n2 log n) rounds of interaction with high probability. We also showed that any algorithm for finding a stable matching in this model requires Ω(n2)rounds of interaction, even in expectation.

Our algorithm extends to the case of many-to-many matchings, where the guarantee on the number of rounds deteriorates to  O(n3 log n).However, this algorithm is not computationally efficient; an “efficient” version can be constructed which requires  O(nq+2) iterations when each agent can be matched with at most q other agents.

The most immediate open question is whether the algorithm with  O(n3 log n) rounds of interactions can be implemented in polynomial time. A sufficient condition for doing so would be to be able to sample (approximately) uniformly from the set of all linear orders consistent with a set of constraints of the form “element z must precede at least one element from the set  S.” This isan interesting sampling question in its own right, and it also has applications in other scenarios in which a principal is trying to learn preference rankings of agents from observed behavior, e.g., in the case of learning valuations of single-minded buyers from their product choices.

This sampling problem appears non-trivial. As we discussed (and heavily used), the special case when all sets S have size 1 is exactly the problem of sampling a linear extension of a given partial order. This is accomplished by the well-known Karzanov-Khachiyan chain (Karzanov and Khachiyan, 1991), which starts from any feasible linear extension (e.g., computed by Topological Sort), and repeatedly picks a uniformly random index i, swapping elements i and i + 1 with probability 1/2if doing so is consistent with the partial order constraints, and doing nothing otherwise. Karzanov and Khachiyan show rapid mixing of this chain by studying conductance properties of

the order polytope associated with the partial order (Stanley, 1986). Bubley and Dyer (1999) show that giving a slight bias towards sampling indices i towards the middle of the ranking leads to a much simpler analysis based on a beautiful and elementary path coupling. One could still aim to prove rapid mixing for the Karzanov-Khachiyan chain, applied to constraints in which the sets S are larger than singletons. Unfortunately, the natural representation of constraints in n-dimensional space ceases to be convex, so it is not clear whether the conductance arguments can be applied. Similarly, the specific path coupling of Bubley and Dyer also seems to break down for more complex constraints. An open question with fewer implications, but possible technical interest, is whether the number of iterations can be reduced from  O(n3 log n) to O(n2q log n) when all agents’ capacities are bounded by q. One would not expect a better bound, because even when the rankings are not constrained at all (e.g., in the first iteration), a constraint of the form  ⟨a, S⟩only rules out a 1|S|+1 fraction of rankings. A sufficient result would be a generalization of Yu’s Proportional Transitivity Theorem (Yu, 1998). A possible formulation of such a result would be the following. For a given set C of constraints of the form  ⟨ai, Si⟩, where each  |Si| ≤ q, let Cφbe the set of all constraints  ⟨a, S⟩with  |S| ≤ qsuch that at least a  φfraction of all rankings  ≻consistent with  C satisfies ⟨a, S⟩.Theorem 3.3 can then be restated as saying that when q = 1, for every  φ ≥ φmin, we have that

image

φqmin = 1 − Ω(1/q) such that when each constraint  ⟨a, S⟩ ∈ C has |S| ≤ q, for every  φ ≥ φqmin theconstraint set C satisfies C

image

from that of Yu (1998). Yu’s proof is based on an application of Shepp’s XYZ inequality (Shepp, 1982), combined with a dimension reduction technique of Ball (1988) (see also a similar proof in Kahn and Yu (1998)). Ball’s result very heavily relies on the set under consideration (here, the order polytope) being convex, since it relies on the Brunn-Minkowski Inequality. As discussed in the preceding paragraphs, the natural n-dimensional representation of larger constraints ceases to be convex, so one of the key techniques will fail to be available.

Heiner Ackermann, Paul W. Goldberg, Vahab S. Mirrokni, Heiko R¨oglin, and Berthold V¨ocking. 2011. Uncoordinated Two-Sided Matching Markets. SIAM J. Comput. 40, 1 (2011), 92–106. A preliminary version appeared at EC’08.

Dana Angluin. 1988. Queries and Concept Learning. Machine Learning 2 (1988), 319–342.

Keith Ball. 1988. Logarithmically concave functions and sections of convex sets in  Rn. StudiaMathematica 88, 1 (1988), 69–84.

Xiaohui Bei, Ning Chen, and Shengyu Zhang. 2013. On the Complexity of Trial and Error. In Proc. 45th ACM Symp. on Theory of Computing. 31–40.

Russ Bubley and Martin Dyer. 1999. Faster Random Generation of Linear Extensions. Discrete Mathematics 201, 1 (1999), 81–88.

Xi Chen, Decheng Dai, Ye Du, and Shang-Hua Teng. 2009. Settling the Complexity of Arrow- Debreu Equilibria in Markets with Additively Separable Utilities. In Proc. 50th IEEE Symp. on Foundations of Computer Science. 273–282.

Xi Chen and Xiaotie Deng. 2006. Settling the Complexity of Two-Player Nash-Equilibrium. In Proc. 47th IEEE Symp. on Foundations of Computer Science. 261–270.

Xi Chen, Dimitris Paparas, and Mihalis Yannakakis. 2017. The Complexity of Non-Monotone Markets. J. ACM 64, 3 (2017), 20:1–20:56.

Jen-Hou Chou and Chi-Jen Lu. 2010. Communication Requirements for Stable Marriages. In Proc. 7th Intl. Conf. on Algorithms and Complexity (CIAC). 371–382.

Kim-Sau Chung. 2000. On the Existence of Stable Roommate Matchings. Games and Economic Behavior 33, 2 (2000), 206–230.

Constantinos Daskalakis, Paul W. Goldberg, and Christos H. Papadimitriou. 2009. The Complexity of Computing a Nash Equilibrium. SIAM J. Comput. 39, 1 (2009), 195–259. A preliminary version appeared at STOC’06.

Effrosyni Diamantoudi, Eiichi Miyagawa, and Licun Xue. 2004. Random paths to stability in the roommate problem. Games and Economic Behavior 48, 1 (2004), 18–28.

Ehsan Emamjomeh-Zadeh and David Kempe. 2017. A General Framework for Robust Interactive Learning. In Proc. 31st Advances in Neural Information Processing Systems. 7082–7091.

Ehsan Emamjomeh-Zadeh, David Kempe, and Vikrant Singhal. 2016. Deterministic and Proba- bilistic Binary Search in Graphs. In Proc. 48th ACM Symp. on Theory of Computing. 519–532.

David Gale and Lloyd S. Shapley. 1962. College Admissions and the Stability of Marriage. Amer. Math. Monthly 69, 1 (1962), 9–15.

Yannai A. Gonczarowski, Noam Nisan, Rafail Ostrovsky, and Will Rosenbaum. 2019. A Stable Marriage Requires Communication. Games and Economic Behavior 118 (2019), 626–647. A preliminary version appeared at SODA’15.

Julien Grenet, Yinghua He, and Dorothea K¨ubler. 2019. Decentralizing Centralized Matching Markets: Implications from Early Offers in University Admissions. (2019). Mimeo.

Avinatan Hassidim, D´eborah Marciano, Assaf Romm, and Ran I. Shorrer. 2017. The Mechanism is Truthful, Why aren’t You? American Economic Review Papers and Proceedings 107, 5 (2017), 220–24.

Wassily Hoeffding. 1963. Probability Inequalities for Sums of Bounded Random Variables. J. Amer. Statist. Assoc. 58 (1963), 13–30.

Mark Huber. 2006. Fast Perfect Sampling from Linear Extensions. Discrete Mathematics 306, 4 (2006), 420–428.

Jeff Kahn and Yang Yu. 1998. Log-Concave Functions and Poset Probabilities. Combinatorica 18, 1 (1998), 85–99.

Alexander Karzanov and Leonid Khachiyan. 1991. On the Conductance of Order Markov Chains. Order 8, 1 (1991), 7–15.

Bettina Klaus and Flip Klijn. 2007. Paths to stability for matching markets with couples. Games and Economic Behavior 58, 1 (2007), 154–171.

Donald E. Knuth. 1976. Marriage stables et leurs relations avec d’autres probl`emes combinatoires. Les Presses de l’Universit´e de Montr´eal.

Fuhito Kojima and M. Utku ¨Unver. 2006. Random paths to pairwise stability in many-to-many matching problems: a study on market equilibration. International Journal of Game Theory 36 (2006), 473–488.

Jinpeng Ma. 1996. On Randomized Matching Mechanisms. Economic Theory 8, 2 (1996), 377–381.

Yusuke Narita. 2018. Match or Mismatch? Learning and Inertia in School Choice. (2018). Mimeo.

Cheng Ng and Daniel Hirschberg. 1990. Lower Bounds for the Stable Marriage Problem and Its Variants. SIAM J. Comput. 19, 1 (1990), 71–77.

Christos H. Papadimitriou. 2007. The Complexity of Finding Nash Equilibria. In Algorithmic Game Theory, Noam Nisan, Tim Roughgarden, Eva Tardos, and Vijay V. Vazirani (Eds.). Cambridge University Press, 29–52.

Baharak Rastegari, Anne Condon, Nicole Immorlica, Robert Irving, and Kevin Leyton-Brown. 2014. Reasoning about optimal stable matchings under partial information. In Proc. 15th ACM Conf. on Economics and Computation. 431–448.

Alvin E. Roth. 1984. Stability and Polarization of Interests in Job Matching. Econometrica 52, 1 (1984), 47–57.

Alvin E Roth. 1985. The college admissions problem is not equivalent to the marriage problem. Journal of Economic Theory 36, 2 (1985), 277–288.

Alvin E. Roth and Marilda A. Oliveira Sotomayor. 1990. Two-Sided Matching: A Study in GameTheoretic Modeling and Analysis. Cambridge University Press.

Alvin E. Roth and John H. Vande Vate. 1990. Random Paths to Stability in Two-Sided Matching. Econometrica 58, 6 (1990), 1475–1480.

Aviad Rubinstein. 2016. Settling the complexity of computing approximate two-player Nash equi- libria. In Proc. 57th IEEE Symp. on Foundations of Computer Science. 258–265.

Aviad Rubinstein. 2018. Inapproximability of Nash Equilibrium. SIAM J. Comput. 47, 3 (2018), 917–959.

Ilya Segal. 2007. The Communication Requirements of Social Choice Rules and Supporting Budget Sets. Journal of Economic Theory 136 (2007), 341–378.

Larry A. Shepp. 1982. The XYZ Conjecture and the FKG Inequality. The Annals of Probability 10, 3 (1982), 824–827.

Richard P. Stanley. 1986. Two Poset Polytopes. Discrete Computational Geometry 1 (1986), 9–23.

Akihisa Tamura. 1993. Transformation from arbitrary matchings to stable matchings. Journal of Combinatorial Theory, Series A 62, 2 (1993), 310–323.

Yang Yu. 1998. On Proportional Transitivity of Ordered Sets. Order 15 (1998), 87–95.

In Emamjomeh-Zadeh and Kempe (2017), a framework was proposed for analyzing the number of rounds necessary to learn combinatorial structures in interactive learning settings such as ours. In the framework of Emamjomeh-Zadeh and Kempe (2017), which generalizes Angluin’s (Angluin, 1988) Equivalence Query Model, an algorithm repeatedly gets to propose a “structure,” and either learns that it has proposed the correct structure, or is given a small “local” correction to the structure. The key notion in Emamjomeh-Zadeh and Kempe (2017) is their Definition 2.1, restated here as follows:

Definition A.1. Let G be a (directed or undirected) graph whose nodes are the N candidate structures and whose edges have positive weights. For each node/structure  v, let Rvbe the set of possible responses that the learner can receive, and let Γvdenote the neighborhood of v in G. It is required that for each v, there be a mapping  φ : Rv → Γvwith the following property: if t is the correct structure (the “target”) and r is a correct response to the structure v when t is the target, then  φ(r) must lie on a shortest path from v to t in G.

Using machinery from Emamjomeh-Zadeh et al. (2016), it is shown in Emamjomeh-Zadeh and Kempe (2017) that under the feedback model of Definition A.1, the correct structure can be learned in few rounds of interactions. Specifically, assume that the structure is known to lie in a set S of N0 ≤ Nof the candidate structures. If G is undirected, then the correct structure can always be learned in at most log2 N0rounds of interaction. This result furthermore extends to some directed graphs: if each edge e (of weight  we) is part of directed cycle of total weight at most  cwe, then thecorrect structure can be learned in at most log cc−1 N0rounds of interaction. (Notice that undirected graphs are subsumed by the special case c = 2.)

The proof of this result relies heavily on Proposition 2.1 of Emamjomeh-Zadeh and Kempe (2017), the relevant special case of which we restate here as follows:

Proposition A.2 (Proposition 2.1 of Emamjomeh-Zadeh and Kempe, 2017). Let G be a (weighted, directed) graph satisfying Definition A.1, and assume that each edge  e of weight weis part of a cycle of total weight at most  cwe. Then, for every set  S ⊆ V (G)of nodes, there exists a node v = v(S) such that for every valid response  r ∈ Rv, at most a c−1cfraction of the nodes in S have a shortest path from v to them starting with the edge  φ(r).

While our interaction model at a high level is exactly that of Emamjomeh-Zadeh and Kempe (2017), we show that for finding stable matchings, there is no useful graph G (undirected or directed) satisfying Definition A.1. Thus, the techniques of Emamjomeh-Zadeh and Kempe (2017) cannot be leveraged to obtain our  O(n2 log n) result from Theorem 3.9. We will show this result even for the simplest case of one-to-one matchings.

First, in applying the framework of Emamjomeh-Zadeh and Kempe (2017), it is important to clarify which structures would comprise the nodes of G. While the algorithm proposes a matching in each round, the revelation of a blocking pair does not constitute a local correction or improvement of the matching itself, but provides information about the underlying preferences. Therefore, the set of all candidate structures is the set of all 2n-tuples of n-element preference orders, i.e., there are (n!)2n structures. Because there are only n! possible matchings, by symmetry, each matching is stable for (n!)2n−1 structures, and the algorithm succeeds when it guesses any one of them. The feedback is always consistent with one particular (adversarially chosen) structure.

In each round, a blocking pair reveals to the algorithm exactly one (not necessarily adjacent) transposition in one or two of the preference orders. We show that even the following, apparently simpler, interactive search problem cannot be encoded in a graph satisfying Definition A.1.

Proposition A.3. Consider the following interactive learning problem: The set of structures is the set of all preference orders over n elements. There is an (unknown to the algorithm) correct preference order  ≻∗. In each round t, the algorithm proposes a preference order  ≻(t). Unless ≻(t)equals  ≻∗, the algorithm is shown one (not necessarily adjacent) pair of elements that  ≻(t) and ≻∗have in opposite orders.

There is no (weighted) undirected graph G for this feedback model satisfying Definition A.1. There is also no (weighted) directed graph G for this feedback model in which each edge of weight weis part of a directed cycle of total weight at most  cwe, for any c < n.

Notice the contrast between Proposition A.3 and Lemma 3.1 of Emamjomeh-Zadeh and Kempe (2017), which states that when the pair of elements that is revealed is always adjacent, there exists an undirected (and unweighted) graph satisfying Definition A.1, leading to an efficient learning algorithm.

Proof. We prove the proposition by showing that it violates the conclusion of Proposition A.2 for any c < n. Specifically, let S be the set of all n cyclical shifts of (1, 2, . . . , n), i.e., the orders

image

[n ≻ 1 ≻ 2 ≻ · · · ≻ n − 1]. Let vdenote the node corresponding to any linear order ˆ≻ of {1, . . . , n}(not necessarily in S). We will focus on the response  r ∈ Rvdefined as follows. If there exists any index  i such that i+1ˆ≻i, then rreveals any such pair (i+1, i), informing the algorithm that i should precede i+1. Notice that the only order in S not consistent with this information is  ≻i. If no such i exists, then ˆ≻must equal  ≻1. In this case, r reveals that n should precede 1; this is consistent with all of  S except ≻1.

We have shown that there is a set S of size n such that for each proposed node v corresponding to a linear order ˆ≻, there exists a valid response r such that an n−1nfraction of S is consistent with r. Hence, any (weighted, directed) graph G satisfying Definition A.1 must have some edge e of weight  wethat is not part of any cycle of total weight less than  nwe.

In particular, applying the techniques of Emamjomeh-Zadeh and Kempe (2017) cannot guarantee a better bound than log nn−1 (n!)2n = Ω(n3 log n) for the number of rounds. Notice that a bound of  O(n3 log n) is in fact worse than even the bound of  O(n3) obtained by the simple  O(n3)algorithm for one-to-one matchings discussed in Section 1. This confirms that the techniques of Emamjomeh-Zadeh and Kempe (2017) cannot be applied to yield a more direct proof of our result, and the machinery of proportional transitivity appears necessary to obtain our results.

In this appendix, we explain why by a well-known reduction in stable matching theory, restricting attention to markets with full preference lists is without loss of generality. Given any market with partial preference lists, for each agent a with quota  qawe will add to the other side of the market qanew “phantom” agents, each with a quota of 1. These agents will rank a first, and then rank all other agents on a’s side of the market (real and phantom) in arbitrary order. We complete the preference order of a by appending to it, after all agents acceptable to  a, the qaphantom agents of a, followed by all other agents on the other side of the market in arbitrary order; these are the other phantom agents as well as real agents that are unacceptable to a.

Under this construction, it is well known that there is a bijection between the set of matchings  µin the original (partial preference lists) market and the set of perfect matchings  µ′ in the completed (full preference lists) market in which no pair of phantom agents block. This bijection is as follows: µ′ ⊇ µ, and in addition, in  µ′ each agent ais matched with the  qa −��µ(a)��of its phantom agents that it prefers the most. (The remaining, phantom, agents are matched among themselves to create no blocking pairs between them.) Under this embedding,  µis stable if and only if  µ′ is stable, a pair (w, f) ∈ W × Fis blocking in  µif and only if it is blocking in  µ′, and an agent a individually blocks  µif and only if it blocks  µ′ with one of its phantom agents (as a blocking pair).

Given the above, it is now clear why an algorithm designed to work only with markets with full preference lists can be applied to interactively learn a matching also in markets with (possibly) partial preference lists. Consider the following intermediary protocol between this algorithm A and the environment with partial preference lists: the intermediary will execute A, aiming to learn a stable matching in the “completion” of the market of the environment to full preference lists. When A outputs a perfect matching  µ′ in the completed market, the intermediary will present to the environment the corresponding matching  µin the original market.18 Any blocking pair or individually blocking agent revealed by the environment will then be translated by the intermediary to a blocking pair with respect to  µ, and revealed to A. Once A learns a stable matching  µ′ in thecompleted market, the intermediary will have learned a stable matching in the original market, as desired.19

For the above reason, in our analysis, we assume without loss of generality that all preference lists are full, and we construct learning algorithms that only output perfect matchings. Therefore, if the proposed matching is unstable, then a blocking pair is revealed. We conclude this appendix with an observation that shows that one should be careful with applying even well-known reductions within the context of our coarse query model.

We have shown that any algorithm for full preference lists can be converted into an algorithm for partial preference lists. While any algorithm for partial preference lists can also be converted to work with full preference lists in the adversarial model, we observe that this is in fact not true in other models, due to the requirement to always propose perfect matchings. To see this, consider the result of Roth and Vande Vate (1990) which we mentioned in Section 1, which states that in a model where the response blocking pair is chosen randomly among all blocking pairs based on a myopic (i.e., which depends only on the proposed matching) distribution that gives positive probability for any of the blocking pairs, with probability one, the myopic algorithm converges. As already mentioned in passing, this result crucially depends, even for markets with full preference lists, on the ability to resolve the blocking pair by matching its agents and leaving their previous partners unmatched. Indeed, Tamura (1993) showed that if their partners were to be rematched with each other immediately (as in Knuth’s original version of the myopic algorithm), then the algorithm would never converge from some initial (perfect) matchings — the existence of a “path” from these matchings to a stable matching depends on the ability to leave the former partners of a blocking pair unmatched. Tamura (1993) also shows that to fix this problem, an additional type of query — identifying certain cycles — is required.

In our adversarial setting, though, this issue could not have arisen: since in a matching with full preference lists, the two former partners of a resolved blocking pair together form a new blocking pair, the environment can simply report this pair (and force it to match if the algorithm is myopic) before continuing. More generally, in an adversarial setting, the requirement to always propose perfect matchings does not restrict the power of any learning algorithm since even without this requirement, the adversary can always reveal individually blocking agents if they exist, and thus never reveal any new information for matchings that are not perfect. (On the other hand, a randomly, rather than adversarially, chosen blocking pair may indeed contain new information, which is what saves the myopic algorithm of Roth and Vande Vate (1990) from failing.)


Designed for Accessibility and to further Open Science