Correlated Equilibria for Approximate Variational Inference in MRFs

2016·Arxiv

Abstract

Abstract

Almost all of the work in graphical models for game theory has mirrored previous work in probabilistic graphical models. Our work considers the opposite direction: Taking advantage of recent advances in equilibrium computation for probabilistic inference. In particular, we present formulations of inference problems in Markov random fields (MRFs) as computation of equilibria in a certain class of game-theoretic graphical models. While some previous work explores this direction, none of that work concretely establishes the precise connection between variational probabilistic inference in MRFs and correlated equilibria. There is no work that exploits recent theoretical and empirical results from the literature on algorithmic and computational game theory on the tractable, polynomial-time computation of exact or approximate correlated equilibria in graphical games with arbitrary, loopy graph structure. Our work discusses how to design new algorithms with equally tractable guarantees for the computation of approximate variational inference in MRFs. In addition, inspired by a previously stated game-theoretic view of state-of-the-art tree-reweighed (TRW) message-passing techniques for belief inference as zero-sum game, we propose a different, general-sum potential game to design approximate fictitious-play techniques. We perform synthetic experiments evaluating our proposed approximation algorithms with standard methods and TRW on several classes of classical Ising models (i.e., with binary random variables). We also evaluate the algorithms using Ising models learned from the MNIST dataset. Our experiments show that our global approach is competitive, particularly shinning in a class of Ising models with constant, “highly attractive” edge-weights, in which

it is often better than all other alternatives we evaluated. With a notable exception, our more local approach was not as effective as our global approach or TRW. Yet, in fairness, almost all of the alternatives are often no better than a simple baseline: estimate the marginal probability to be 0.5.

1 Introduction

Almost all of the work in graphical games has borrowed heavily from analogies to probabilistic graphical models. Yet, over-reliance on those analogies and previous standard approaches to exact inference might have led that approach to face the same computational roadblocks that plagued most exact-inference techniques.

As an example of work that heavily exploits previous work in probablistic graphical models (PGMs), Kakade et al. [2003] designed polynomial-time algorithms based on linear programming for computing correlated equilibria (CE) in standard graphical games with tree graphs. The approach and polynomial-time results extend to graphical games with bounded-tree-width graphs and graphical polymatrix games with tree graphs. Exact inference is tractable in PGMs whose graphs have bounded treewidth, but intractable in general [Cooper, 1990, Shimony, 1994, Istrail, 2000]. In 2005, Papadimitriou and Roughgarden showed the intractability of computing the “social-welfare” optimum CE in arbitrary graphical games (see also Papadimitriou and Roughgarden [2008]). Everything seemed to point toward an eventual resignation that the approach of Kakade et al. [2003], along with any other approach to the problem for that matter, had hit the “bounded-treewidth-threshold wall.”

Yet, soon after, Papadimitriou [2005] took a radically different approach to the problem, and surprised the community with an efficient algorithm for computing CE not only in graphical games, but also in almost all known compactly representable games. Jiang and Leyton-Brown [2015a] built upon Pa- padimitriou’s idea to provide what most people would consider an improved polynomial-time algorithm, because of the simplification of the CE that their algorithm outputs (see also Jiang and Leyton-Brown, 2011, for a summary).

An immediate question that arises from the algorithmic results just described is, what is so fundamentally different between the problem of exact inference in graphical models and equilibrium computation that made this result possible in the context of graphical games? Of course, CE, probabilistic inference, and their variants are different problems, even within the same framework of graphical models. The question is, how different are they?

It is well-known that pure strategy Nash equilibrium (PSNE) is inherently a classical/standard discrete constraints satisfaction problem (CSP). It is also

well-known that any CSP can be cast as a most-likely, or equivalently, a maximum a posteriori (MAP) assignment estimation problem in Markov random fields (MRFs). Through this connection, it is clear that there exists a MAP formulation of PSNE. But what about other, more general forms of equilibria?

We present here a formulation of the problem of equilibrium computation as a kind of local conditions for different approximations to belief inference. Similarly, we show how one can view some special games, called graphical potential games [Ortiz, 2015], as defining an equivalent MRF whose “locally optimal” solutions correspond to arbitrary equilibria of the game. Hence, Papadimitriou’s result, and later that of Jiang and Leyton-Brown, open up the possibility that at least new classes of problems in probabilistic graphical models could be solved exactly and efficiently. The question is, which classes?

While we provide specific connections between the two fields that yield immediate theoretical and computational implications, we also provide practical alternatives that result from those connections. That is, the foundation of both Papadimitriou’s and Jiang and Leyton-Brown’s algorithms is the ellipsoid method, which is one approach that leads to the polynomial-time algorithm for linear programming. This approach, while provably efficient in theory, is often seen as less practical as other alternatives such as so-called interior-point methods. This is in contrast to the simple linear programs that are possible for certain classes of graphical games [Kakade et al., 2003]. Are there simpler and practically effective variants of Papadimitriou’s or Jiang and Leyton-Brown’s algorithms? While the last question is an important open question, we do not address it directly in this paper. Instead, We employ ideas from the literature of learning in games [Fudenberg and Levine, 1999], particularly no-regret algorithms and fictitious play, to propose two specific instances of game-theoretic inspired, practical, and effective heuristics for belief inference in MRFs. One heuristic takes a local approach, and the other takes a global approach. We evaluate our proposed algorithms within the context of the most popular, standard, and state-of-art techniques from the literature in probabilistic graphical models.

This manuscript describes our work, which starts to address some of the questions above, and reports on our progress.

1.1 Overview of the Paper

Section 2 provides preliminary material, introducing basic notation, terminology, and concepts from graphical models and game theory.

Section 3 is the main technical section of the paper. It shows reductions of different problems in belief inference in MRFs as computing equilibria in graphical potential games compactly represented as Gibbs potential games [Ortiz, 2015]. The reductions presented here vary in generality from MAP assignment, marginals, and full-joint estimation to pure-strategy Nash equilibria (PSNE), mixed-strategy Nash equilibria (MSNE), and correlated equilibria (CE), respectively. We briefly discuss a connection between Papadimitriou’s algorithm, as well as Jiang and Leyton-Brown’s, and the work of Jaakkola and Jordan [1997] on variational approximations to the problem of probabilistic inference in MRFs via mean-field mixtures. The paper also includes a discussion on the connections to previous work in computer vision on the problem of relaxation labeling, and work on game-theoretic approaches to (Bayesian) statistical estimation. We then present an alternative approach based on a more global view of the problem, in contrast to the more local approach of the formulations mentioned above. More specifically, we formulate the inference problem using a two-player potential game, inspired by the work on tree reweighed (TRW) message-passing [Wain- wright et al., 2005]. We propose a special type of sequential, “hybrid” standard and stochastic fictitious play algorithm for belief inference.

Section 4 reports on our experimental evaluation. We compare our proposed algorithms to the popular, most commonly used, standard, and easily implementable approximation techniques in use today.

Section 5 discusses future work and suggests new opportunities for other potential research directions, beyond those already discussed in the main technical sections of the paper.

Section 6 concludes the paper with a summary of our contributions.

2 Preliminaries

This section introduces basic notation and concepts in graphical models and game theory used throughout the paper. It also includes brief statements on current state-of-the-art mathematical and computational results in the area.

Basic Notation. Denote by ) an n-dimensional vector and by ) the same vector without component i. Similarly, for every set , denote by ) the (sub-)vector formed from x using only components in S, such that, letting [denote the complement of S, we can denote ) for every i. If are sets, denote by and .

Graph Terminology and Notation. Let G = (V, E) be an undirected graph, with finite set of n vertices or nodes V = {1, . . . , n} and a set of (undirected) edges E. For each node i, let be the set of neighbors of i in G, not including i, and the set including i. A clique C of G is a set of nodes with the property that they are all mutually connected: for all , (; in addition, C is maximal if there is no other node k outside C that is also connected to each node in C, i.e., for all , (for some .

Another useful concept in the context of this paper is that of hypergraphs, which are generalizations of regular graphs. A hypergraph graph G = (V, E) is defined by a set of nodes V and a set of . We can think of the hyperedges as cliques in a regular graph. Indeed, the primal graph of the hypergraph is the graph induced by the node set V and where there is an edge between two nodes if they both belong to the same hyperedge; in other words, the primal graph is the graph induced by taking each hyperedge and forming cliques of nodes in a regular graph.

2.1 Probabilistic Graphical Models

Probabilistic graphical models are an elegant marriage of probability and graph theory that has had tremendous impact in the theory and practice of modern artificial intelligence, machine learning, and statistics. It has permitted effective modeling of large, structured high-dimensional complex systems found in the real world. The language of probabilistic graphical models allows us to capture the structure of complex interactions between individual entities in the system within a single model. The core component of the model is a graph in which each node i corresponds to a random variable and the edges express conditional independence assumptions about those random variables in the probabilistic system.

2.1.1 Markov Random Fields, Gibbs Distributions, and the Hammersley-Clifford Theorem

By definition, a joint probability distribution P is a Markov random field (MRF) with respect to (wrt) an undirected graph G if for all x, for every node i, ) = In that case, the neighbors/variables form the Markov blanket of node/variable .

Also by definition, a joint distribution P is a Gibbs distribution wrt an undirected graph G if it can be expressed as P(X = x) = Φ) for some functions Φindexed by a clique , the set of all (maximal) cliques in G, and mapping every possible value that the random variables associated with the nodes in C can take to a non-negative number.

We say that a joint probability distribution P is positive if it has full support (i.e., P(x) > 0 for all x).

Let P be a positive joint probability distribution. Then, P is an MRF with respect to G if and only if P is a Gibbs distribution with respect to G.

In the context of the theorem, the functions Φare positive, which allows us to define MRFs in terms of over each clique C in the graph. Define the function Ψ(). Let us refer to any function of this form as a Gibbs potential with respect to G. A more familiar expression of an MRF is exp()) = exp(Ψ(x)).

2.1.2 Some Inference-Related Problems in MRFs

One problem of interest in an MRF is to compute a most likely assignment arg max) = arg max)); that is, the most likely outcome with respect to the MRF P. Another problem is to compute the in- ) = exp())) for each variable . A related problem is to com- pute the normalizing constant Z = exp())) (also known as the partition function of the MRF).

Another set of problems concern so called “belief updating.” That is, computing information related to the having observed the outcome of some of the variables, also known as the evidence. For MRFs, this problem is computationally equivalent to that of computing prior marginal probabilities.

2.1.3 Brief Overview of Computational Results in Probabilistic Graphical Models

Both the exact and approximate versions of most inference-related problems in MRFs are in general intractable (e.g., NP-hard), although polynomial-time algorithms do exists for some special cases (see, e.g., Dagum and Luby, 1993, Roth, 1996, Istrail, 2000, Wang et al., 2013, and the references therein). The complexity of exact algorithms is usually characterized by structural properties of the graph, and the typical statement is that running times are polynomial only for graphs with bounded treewidth (see, e.g., Russell and Norvig, 2003 for more information). Several deterministic and randomized approximation approaches exist (see, e.g., Jordan et al., 1999, Jaakkola, 2000, Geman and Ge- man, 1984). An approximation approach of particular interest in this paper is variational inference [Jordan et al., 1999, Jaakkola, 2000]. Roughly speaking, the general idea is to approximate an intractable MRF P by a “closest” probability distribution within a “computationally tractable” class Q: formally, arg maxKL(), where KL() ln is the Kullback-Leibler (KL) divergence between probability distributions P and Q wrt Q. The simplest example in the so called mean-field (MF) approximation, in which Q = {Q | Q(x) = ) for all consists of all possible product distributions. Even if P is an IM, no closed-form solution exists for its mean-field approximation, and the most common computational scheme is based on simple axis parallel optimizations, leading to individual local conditions of optimality and potential local minima: that is, the problem is essentially reduced to finding ) = () such that for all i, we have

) ln ) is the (Shannon) entropy of ran- dom variable .

2.2 Game Theory

Game theory [von Neumann and Morgenstern, 1947] provides a mathematical model of the stable behavior (or outcome) that may result from the interaction of rational individuals. This paper concentrates on noncooperative settings: individuals maximize their own utility, act independently, and do not have (direct) control over the behavior of others.

The concept of equilibrium is central to game theory. Roughly, an equilibrium in a noncooperative game is a point of strategic stance, where no individual player can gain by unilaterally deviating from the equilibrium behavior.

2.2.1 Games and their Representation

Let V = [n] denote a finite set of n players in a game. For each player V , let denote the set of actions or pure strategies that i can play. Let denote the set of denote a joint action, and the individual action of player i in x. Denote by () the joint action of all the players except i, such that ). Let denote the payoff/utility function of player i. If the ’s are finite, then is called the payoff matrix of player i. Games represented this way are called normal- or strategic-form games.

There are a variety of compact representations for large games inspired by probabilistic graphical models in AI and machine learning [La Mura, 2000, Kearns et al., 2001, Koller and Milch, 2003, Leyton-Brown and Tennenholtz, 2003, Jiang and Leyton-Brown, 2008]. The results of this paper are presented in the context of the following generalization of graphical games [Kearns et al., 2001], a simple but powerful model inspired by probabilistic graphical models such as MRFs previously defined by Ortiz [2014].

Definition 1. A graphical multi-hypermatrix game (GMhG) is defined by

• a directed graph G = (V, E) in which there is a node in G for each of the n players in the game (i.e., |V | = n), and the set of directed edges, or arcs, E defines a set of whose action affect the payoff function of i (i.e., j is a neighbor of i if and only if there is an arc from j to i); and

• for each player ,

– a hypergraph where the vertex set is its (inclusive) neighborhood and the hyperedge set is a set of cliques of players , and

– a set : of local-clique payoff (hyper)matrices.

The interpretation of a GMhG is that, for each player i, the local and global : and of i are (implicitly) defined as ) and ), respectively.

Graphical potential games. Graphical potential games are special instances of GMhGs. They play a key role in establishing a stronger connection between probabilistic inference in MRFs and equilibria in games than previously noted. Ortiz [2015] provides a characterization of graphical potential games, and discusses the implication of convergence of certain kinds of “playing” processes in games based on connections to the Gibbs sampler [Geman and Geman, 1984], via the Hammersley-Clifford Theorem [Hammersley and Clifford, 1971, Besag, 1974]. Yu and Berthod [1995] (implicitly) used graphical potential games to establish an equivalence between local maximum-a-posteriori (MAP) inference in Markov random fields and Nash equilibria of the game, a topic revisited in Section 3.1.

2.2.2 Equilibria as Solution Concepts

Equilibria are generally considered the solutions of games. Various notions of equilibria exist. A pure strategy (Nash) equilibrium (PSNE) of a game is a joint action such that for all players i, and for all actions That is, no player can improve its payoff by unilaterally deviating from its prescribed equilibrium