36:[["$","audio",null,{"id":"tts"}],["$","$L3b",null,{"paperID":"1301.2724","publisher":"arxiv","paperJSON":{"title":"Perturbative Corrections for Approximate Inference in Gaussian Latent Variable Models","paperID":"1301.2724","avgLineHeight":13.56,"imgScale":4,"sections":[{"heading":"Abstract","paragraphs":[[{"text":"$3c","element":"span"}],[{"text":"Keywords: ","element":"span"},{"text":"expectation consistent inference, expectation propagation, perturbation correction, Wick expansions, Ising model, Gaussian process","element":"span"}]]},{"heading":"1. Introduction","paragraphs":[[{"text":"Expectation Propagation (EP) (Opper and Winther, 2000, Minka, 2001a,b) is part of a rich family of variational methods, which approximate the sums and integrals required for exact probabilistic inference by an optimization problem. Variational methods are perfectly amenable to probabilistic graphical models, as the nature of the optimization problem often allows it to be distributed across a graph. By relying on local computations on a graph, inference in very large probabilistic models becomes feasible.","element":"span"}],[{"text":"$3d","element":"span"}],[{"text":"The approach outlined here is by no means unique in correcting the approximation, as is evinced by cluster-based expansions (Paquet et al., 2009), marginal corrections for EP (Cseke and Heskes, 2011) and the Laplace approximation (Rue et al., 2009), and corrections to Loopy Belief Propagation (Chertkov and Chernyak, 2006, Sudderth et al., 2008, Welling et al., 2012).","element":"span"}],[{"text":"1.1 Overview","element":"span"}],[{"text":"EP is introduced in a general way in Section 3, making it clear how various degrees of complexity can be included in its approximating structure. The partition function will be used throughout the paper to explain the necessary machinery for correcting any moments of interest. 
In the experiments, corrections to the marginal and predictive means and variances are also shown, although the technical details for correcting moments beyond the partition function are relegated to Appendix D. The Ising model, which is cast as a Gaussian latent variable model in Section 2, will furthermore be used as a running example throughout the paper.","element":"span"}],[{"text":"The key to obtaining a correction lies in isolating the “intractable quantity” from the “tractable part” (or EP solution) in the true problem. This is done by considering the cumulants of both: as EP locally matches lower-order cumulants like means and variances, the “intractable part” exists as an expression over the higher-order cumulants which are neglected by EP. This process is outlined in Section 4, which concludes with two useful results: a shift of the “intractable part” to be an average over complex Gaussian variables with ","element":"span"},{"text":"zero ","element":"span"},{"text":"diagonal relation matrix, and Wick’s theorem, which allows us to evaluate the expectations of polynomials under centered Gaussian measures. As a last stage, the “intractable part” is expanded in Sections 5 and 7 to obtain corrections to various orders. In Section 6, we provide a theoretical analysis of the radius of convergence of these expansions.","element":"span"}],[{"text":"Experimental evidence is presented in Section 8 on Gaussian process (GP) classification and (non-Gaussian) GP regression models. An insightful counterexample, in which EP diverges under increasing data, is also presented. Ising models are examined in Section 9.","element":"span"}],[{"text":"Numerous additional examples, derivations, and material are provided in the appendices. Details on different EP approximations can be found in Appendix A, while corrections to tree-structured approximations are provided in Appendix B. In Appendix C we analytically show that the correction to a tractable example is zero. 
The main body of the paper deals with corrections to the partition function, while corrections to marginal moments are left to Appendix D. Finally, useful calculations of certain cumulants appear in Appendix E.","element":"span"}]]},{"heading":"2. Gaussian Latent Variable Models","paragraphs":[[{"text":"Let ","element":"span"},{"style":{"height":17.6},"width":307.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/2-0.png","element":"img","alt":" x = (x1, . . . , xN","inline":true},{"text":") be an unobserved random variable with an intractable distribution ","element":"span"},{"text":"p","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":"). In the Gaussian latent variable model (GLVM) considered in this paper, terms ","element":"span"},{"style":{"height":17.6},"width":119.72,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/2-1.png","element":"img","alt":" tn(xn)","inline":true,"padRight":true},{"text":"are combined over a quadratic exponential ","element":"span"},{"style":{"height":17.6},"width":243.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/2-2.png","element":"img","alt":" f0(x) to give","inline":true}],[{"style":{"width":"64%"},"width":1107,"height":130,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/2-3.png","element":"img"}],[{"text":"with partition function (normalizer)","element":"span"}],[{"style":{"width":"30%"},"width":519,"height":130,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/2-4.png","element":"img"}],[{"text":"This model encapsulates many important methods used in statistical inference. 
","element":"span"},{"text":"As an example, ","element":"span"},{"style":{"height":16.4},"width":38.6,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/2-5.png","element":"img","alt":" f0","inline":true,"padRight":true},{"text":"can encode the covariance matrix of a Gaussian process (GP) prior on latent function observations ","element":"span"},{"style":{"height":10.69},"width":45.96,"height":26.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/2-6.png","element":"img","alt":" xn","inline":true},{"text":". In the case of GP classification with a class label ","element":"span"},{"style":{"height":17.6},"width":272.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/2-7.png","element":"img","alt":" yn ∈ {−1, +1}","inline":true,"padRight":true},{"text":"on a latent function evaluation ","element":"span"},{"style":{"height":10.69},"width":45.96,"height":26.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/2-8.png","element":"img","alt":" xn","inline":true},{"text":", the terms are typically probit link functions, for example","element":"span"}],[{"style":{"width":"69%"},"width":1202,"height":130,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/2-9.png","element":"img"}],[{"text":"The probit function is the standard cumulative Gaussian density Φ(","element":"span"},{"style":{"height":20.77},"width":435.36,"height":51.92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/2-10.png","element":"img","alt":"x) =� x−∞ N(z; 0, 1) dz.","inline":true,"padRight":true},{"text":"In this example, the partition function is not analytically tractable but for the one-dimensional case ","element":"span"},{"text":"N ","element":"span"},{"text":"= 1.","element":"span"}],[{"text":"An Ising model can be constructed by letting the terms 
","element":"span"},{"style":{"height":17.6},"width":564.96,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/2-11.png","element":"img","alt":" tn restrict xn to ±1 (through","inline":true,"padRight":true},{"text":"Dirac delta functions). By introducing the symmetric coupling matrix ","element":"span"},{"style":{"height":12.8},"width":344.56,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/2-12.png","element":"img","alt":" J and field θ into","inline":true},{"style":{"height":16.4},"width":38.6,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/2-13.png","element":"img","alt":"f0","inline":true},{"text":", an Ising model can be written as","element":"span"}],[{"style":{"width":"85%"},"width":1480,"height":130,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/2-14.png","element":"img"}],[{"text":"In the Ising model, the partition function ","element":"span"},{"text":"Z ","element":"span"},{"text":"is intractable, as it sums ","element":"span"},{"style":{"height":19.53},"width":398.84,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/2-15.png","element":"img","alt":" f0(x) over 2N binary","inline":true,"padRight":true},{"text":"values of ","element":"span"},{"text":"x","element":"span"},{"text":". In the variational approaches, the intractability is addressed by allowing approximations to ","element":"span"},{"text":"Z ","element":"span"},{"text":"and other marginal distributions, decreasing the computational complexity from being exponential to polynomial in ","element":"span"},{"text":"N","element":"span"},{"text":", which is typically cubic for EP.","element":"span"}]]},{"heading":"3. 
Expectation Propagation","paragraphs":[[{"text":"An approximation to ","element":"span"},{"text":"Z ","element":"span"},{"text":"can be made by allowing ","element":"span"},{"text":"p","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") in Equation (1) to factorize into a product of ","element":"span"},{"style":{"height":16.4},"width":187.44,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/2-16.png","element":"img","alt":" factors fa","inline":true},{"text":". This factorization is not unique, and the structure of the factorization of ","element":"span"},{"text":"p","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") determines both the complexity of the approximation and the structure of the approximating distribution. Where GLVMs are concerned, a natural and computationally convenient choice is to use Gaussian factors ","element":"span"},{"style":{"height":12},"width":38.64,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/2-17.png","element":"img","alt":" ga","inline":true},{"text":", and as such, the approximating distribution ","element":"span"},{"text":"q","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") in this paper will be Gaussian. Appendix A summarizes a number of factorizations for Gaussian approximations.","element":"span"}],[{"text":"The tractability of the resulting inference method imposes a pragmatic constraint on the choice of factorization; in the extreme case ","element":"span"},{"text":"p","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") could be chosen as a single factor and inference would be exact. 
For the model in Equation (1), a three-term product may be factorized as (","element":"span"},{"style":{"height":17.6},"width":170.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-0.png","element":"img","alt":"t1)(t2)(t3","inline":true},{"text":"), which gives the typical GP setup. When a division is introduced and the term product factorizes as (","element":"span"},{"style":{"height":17.6},"width":261.32,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-1.png","element":"img","alt":"t1t2)(t2t3)/(t2","inline":true},{"text":"), the resulting free energy will be that of the tree-structured EC approximation (Opper and Winther, 2005). To allow for regrouping, combining, splitting, and dividing terms, a power ","element":"span"},{"style":{"height":14.69},"width":54,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-2.png","element":"img","alt":" Da","inline":true,"padRight":true},{"text":"is associated with each ","element":"span"},{"style":{"height":16.4},"width":39.6,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-3.png","element":"img","alt":"fa","inline":true},{"text":", such that","element":"span"}],[{"style":{"width":"61%"},"width":1069,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-4.png","element":"img"}],[{"text":"with intractable normalization (or partition function) ","element":"span"},{"style":{"height":20.72},"width":434.12,"height":51.8,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-5.png","element":"img","alt":" Z = ∫ ∏_a f_a(x)^{D_a} dx.","inline":true,"padRight":true},{"text":"Appendix A ","element":"span"},{"text":"shows how the introduction of ","element":"span"},{"style":{"height":14.69},"width":54,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-6.png","element":"img","alt":" 
Da","inline":true,"padRight":true},{"text":"lends itself to a clear definition of tree-structured and more complex approximations.","element":"span"}],[{"text":"To define an approximation to ","element":"span"},{"style":{"height":14.8},"width":211.44,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-7.png","element":"img","alt":" p, terms ga","inline":true},{"text":", which typically take an exponential family form, are chosen such that","element":"span"}],[{"style":{"width":"62%"},"width":1073,"height":102,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-8.png","element":"img"}],[{"text":"has the same structure as ","element":"span"},{"text":"p","element":"span"},{"text":"’s factorization. Although not shown explicitly, ","element":"span"},{"style":{"height":16.4},"width":283.52,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-9.png","element":"img","alt":" fa and ga have","inline":true,"padRight":true},{"text":"a dependence on the ","element":"span"},{"text":"same ","element":"span"},{"text":"subset of variables ","element":"span"},{"style":{"height":10.69},"width":44.4,"height":26.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-10.png","element":"img","alt":" xa","inline":true},{"text":". 
The optimal parameters of the ","element":"span"},{"style":{"height":14.8},"width":145.48,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-11.png","element":"img","alt":" ga-term","inline":true,"padRight":true},{"text":"approximations are found through a set of auxiliary ","element":"span"},{"text":"tilted ","element":"span"},{"text":"distributions, defined by","element":"span"}],[{"style":{"width":"64%"},"width":1123,"height":105,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-12.png","element":"img"}],[{"text":"Here a ","element":"span"},{"text":"single ","element":"span"},{"text":"approximating term ","element":"span"},{"style":{"height":12},"width":38.64,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-13.png","element":"img","alt":" ga","inline":true,"padRight":true},{"text":"is replaced by an original term ","element":"span"},{"style":{"height":16.4},"width":39.6,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-14.png","element":"img","alt":" fa","inline":true},{"text":". Assuming that this replacement leaves ","element":"span"},{"style":{"height":11.6},"width":37.68,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-15.png","element":"img","alt":" qa","inline":true,"padRight":true},{"text":"still tractable, the parameters in ","element":"span"},{"style":{"height":12},"width":38.64,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-16.png","element":"img","alt":" ga","inline":true,"padRight":true},{"text":"are determined by the condition that ","element":"span"},{"style":{"height":17.6},"width":327.48,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-17.png","element":"img","alt":" q(x) and all qa(x","inline":true},{"text":") should be made as similar as possible. 
This is usually achieved by requiring that these distributions share a set of generalised moments which usually coincide with the sufficient statistics of the exponential family. For example with sufficient statistics ","element":"span"},{"style":{"height":17.6},"width":69.72,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-18.png","element":"img","alt":"φ(x","inline":true},{"text":") we require that","element":"span"}],[{"style":{"width":"66%"},"width":1146,"height":53,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-19.png","element":"img"}],[{"text":"Note that those factors ","element":"span"},{"style":{"height":17.6},"width":172.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-20.png","element":"img","alt":" fa in p(x","inline":true},{"text":") which are already in the exponential family, such as the Gaussian terms in examples above, can trivially be solved for by setting ","element":"span"},{"style":{"height":16.4},"width":278.72,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-21.png","element":"img","alt":" ga = fa. The","inline":true,"padRight":true},{"text":"partition function associated with this approximation is","element":"span"}],[{"style":{"width":"60%"},"width":1045,"height":92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-22.png","element":"img"}],[{"text":"Appendix A.2 shows that the moment-matching conditions must hold at a stationary point of log ","element":"span"},{"style":{"height":14.69},"width":75.8,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-23.png","element":"img","alt":" ZEP","inline":true},{"text":". 
The EP algorithm iteratively updates the ","element":"span"},{"style":{"height":12},"width":38.64,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-24.png","element":"img","alt":" ga","inline":true},{"text":"-terms by enforcing ","element":"span"},{"text":"q ","element":"span"},{"text":"to share moments with each of the tilted distributions ","element":"span"},{"style":{"height":11.6},"width":37.68,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-25.png","element":"img","alt":" qa","inline":true},{"text":"; on reaching a fixed point all moments match according to Equation (7) (Minka, 2001a,b). Although ","element":"span"},{"style":{"height":14.69},"width":75.8,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-26.png","element":"img","alt":" ZEP","inline":true,"padRight":true},{"text":"is defined in the terminology of EP, other algorithms may be required to solve for the fixed point, and ","element":"span"},{"style":{"height":14.69},"width":75.8,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/3-27.png","element":"img","alt":" ZEP","inline":true},{"text":", as a free energy, can be derived from the saddle point of a set of self-consistent (moment-matching) equations (Opper and Winther, 2005, van Gerven et al., 2010, Seeger and Nickisch, 2010). We next make EP concrete by applying it to the Ising model, which will serve as a running example in the paper. The section is finally concluded with a discussion of the interpretation of EP.","element":"span"}],[{"text":"3.1 EP for Ising Models","element":"span"}],[{"text":"The Ising model in Equation (3) will be used as a running example throughout this paper. To make the technical developments more concrete, we will consider both the ","element":"span"},{"text":"N","element":"span"},{"text":"-variate and bivariate cases. 
The bivariate case can be solved analytically, and thus allows for a direct comparison to be made between the exact and approximate solutions.","element":"span"}],[{"text":"We use the factorized approximation as a running example, dividing ","element":"span"},{"text":"p","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") in Equation (3) into ","element":"span"},{"text":"N ","element":"span"},{"text":"+ 1 factors with ","element":"span"},{"style":{"height":21.27},"width":1183.12,"height":53.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/4-0.png","element":"img","alt":" f0(x) = exp{12xT Jx + θT x} and fn(xn) = tn(xn) = 12δ(xn +","inline":true,"padRight":true},{"text":"1) + ","element":"span"},{"style":{"height":21.27},"width":551.64,"height":53.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/4-1.png","element":"img","alt":"12δ(xn − 1), for n = 1, . . . , N","inline":true,"padRight":true},{"text":"(see Appendix A for generalizations). 
We consider the ","element":"span"},{"text":"Gaussian exponential family such that ","element":"span"},{"style":{"height":21.27},"width":975.84,"height":53.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/4-2.png","element":"img","alt":" g_n(x_n) = exp{λ_{n1} x_n − (1/2) λ_{n2} x_n^2} and g_0(x) = f_0(x).","inline":true,"padRight":true},{"text":"The approximating distribution from Equation (5), ","element":"span"},{"style":{"height":22.05},"width":468.36,"height":55.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/4-3.png","element":"img","alt":" q(x) ∝ f_0(x) ∏_{n=1}^{N} g_n(x_n","inline":true},{"text":"), is thus a ","element":"span"},{"text":"full ","element":"span"},{"text":"multivariate Gaussian density, which we write as ","element":"span"},{"style":{"height":18.4},"width":359.52,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/4-4.png","element":"img","alt":" q(x) = N(x; µ, Σ).","inline":true}],[{"text":"3.1.1 Moment Matching","element":"span"}],[{"text":"The moment matching condition in Equation (7) involves only the mean and variance if ","element":"span"},{"text":"q","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") fully factorizes according to ","element":"span"},{"style":{"height":17.6},"width":250.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/4-5.png","element":"img","alt":" p(x)’s terms.","inline":true,"padRight":true},{"text":"We therefore only need to match the means and variances of the marginals of ","element":"span"},{"text":"q","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") and the tilted distribution ","element":"span"},{"style":{"height":17.6},"width":85.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/4-6.png","element":"img","alt":" qn(x","inline":true},{"text":") in Equation (6). 
The tilted distribution may be decomposed into a Gaussian and a discrete part as ","element":"span"},{"style":{"height":20.05},"width":466.92,"height":50.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/4-7.png","element":"img","alt":"qn(x) = qn(x\\n|xn)qn(xn","inline":true},{"text":"), where the vector ","element":"span"},{"style":{"height":14.85},"width":64.2,"height":37.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/4-8.png","element":"img","alt":" x\\n","inline":true,"padRight":true},{"text":"consists of all variables apart from ","element":"span"},{"style":{"height":14.69},"width":139.52,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/4-9.png","element":"img","alt":" xn. We","inline":true,"padRight":true},{"text":"may marginalize out ","element":"span"},{"style":{"height":20.05},"width":381.48,"height":50.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/4-10.png","element":"img","alt":" x\\n and write qn(xn","inline":true},{"text":") in terms of two factors:","element":"span"}],[{"style":{"width":"80%"},"width":1388,"height":163,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/4-11.png","element":"img"}],[{"text":"where we dropped the dependency of ","element":"span"},{"style":{"height":16.4},"width":248.24,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/4-12.png","element":"img","alt":" γ and Λ on n","inline":true,"padRight":true},{"text":"for notational simplicity. 
Through some manipulation, the tilted distribution is equivalent to","element":"span"}],[{"style":{"width":"95%"},"width":1644,"height":94,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/4-13.png","element":"img"}],[{"text":"This discrete distribution has mean ","element":"span"},{"style":{"height":10.69},"width":59.4,"height":26.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/4-14.png","element":"img","alt":" mn","inline":true,"padRight":true},{"text":"and variance 1 ","element":"span"},{"style":{"height":19.15},"width":103.08,"height":47.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/4-15.png","element":"img","alt":" − m2n","inline":true},{"text":". By adapting the parameters ","element":"span"},{"text":"of ","element":"span"},{"style":{"height":17.6},"width":105.96,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/4-16.png","element":"img","alt":" gn(xn","inline":true},{"text":") using for example the EP algorithm, we aim to match the mean and variance of the marginal ","element":"span"},{"style":{"height":17.6},"width":248.76,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/4-17.png","element":"img","alt":" q(xn) (of q(x","inline":true},{"text":")) to the mean and variance of ","element":"span"},{"style":{"height":17.6},"width":105,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/4-18.png","element":"img","alt":" qn(xn","inline":true},{"text":"). 
The reader is referred to Section 9 for benchmarked results for the Ising model.","element":"span"}],[{"text":"3.1.2 Analytic Bivariate Case","element":"span"}],[{"text":"Here we shall compare the exact result with EP and the correction for the simplest non-trivial model, the ","element":"span"},{"text":"N ","element":"span"},{"text":"= 2 Ising model with no external field","element":"span"}],[{"style":{"width":"70%"},"width":1225,"height":90,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/4-19.png","element":"img"}],[{"text":"In order to solve the moment matching conditions we observe that the mean values must be zero because the distribution is symmetric around zero. Likewise the linear term in the approximating factors disappears and we can write ","element":"span"},{"style":{"height":19.15},"width":704.56,"height":47.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/5-0.png","element":"img","alt":" gn(xn) = exp{−λx2n/2} and q(x) =","inline":true}],[{"style":{"width":"99%"},"width":1727,"height":90,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/5-1.png","element":"img"}],[{"style":{"height":14.69},"width":156.84,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/5-2.png","element":"img","alt":"1 = Σnn","inline":true},{"text":", turns into a second order equation with solution ","element":"span"},{"style":{"height":32.4},"width":597.12,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/5-3.png","element":"img","alt":" λ = 12�J2 +√J4 + 4�. We can","inline":true,"padRight":true},{"text":"now insert this solution into the expression for the EP partition function in Equation (8). 
By expanding the result to the second order in ","element":"span"},{"style":{"height":15.14},"width":45.32,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/5-4.png","element":"img","alt":" J2","inline":true},{"text":", we find that","element":"span"}],[{"style":{"width":"83%"},"width":1446,"height":108,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/5-5.png","element":"img"}],[{"text":"Comparing with the exact expression","element":"span"}],[{"style":{"width":"39%"},"width":689,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/5-6.png","element":"img"}],[{"text":"we see that EP gives the correct ","element":"span"},{"style":{"height":15.14},"width":45.32,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/5-7.png","element":"img","alt":" J2 ","inline":true,"padRight":true},{"text":"coefficient, but the ","element":"span"},{"style":{"height":15.14},"width":45.32,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/5-8.png","element":"img","alt":" J4 ","inline":true,"padRight":true},{"text":"coefficient comes out wrong. In Section 4 we investigate how cumulant corrections can correct for this discrepancy.","element":"span"}],[{"text":"3.2 Two Explanations Why Gaussian EP is Often Very Accurate","element":"span"}],[{"text":"EP, as introduced above, is an algorithm. The justification for the algorithm put forward by Minka and adopted by others (see for example recent textbooks by Bishop 2006, Barber 2012 and Murphy 2012) is useful for explaining the steps in the algorithm but may be misleading as an explanation of why EP often provides excellent accuracy in estimating marginal moments and ","element":"span"},{"text":"Z","element":"span"},{"text":".","element":"span"}],[{"text":"The general justification for EP (Minka, 2001a,b) is based upon a minimization of Kullback-Leibler (KL) divergences. 
Ideally, one would determine the approximating distribution ","element":"span"},{"text":"q","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") as the minimizer of KL(","element":"span"},{"style":{"height":17.6},"width":63.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/5-9.png","element":"img","alt":"p∥q","inline":true},{"text":") in an exponential family of (in our case, Gaussian) densities. Since this is not possible (it would require the computation of exact moments), we instead iteratively minimize “local” KL-divergences KL(","element":"span"},{"style":{"height":17.6},"width":81.44,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/5-10.png","element":"img","alt":"qa∥q","inline":true},{"text":"), between the tilted distribution ","element":"span"},{"style":{"height":16},"width":161.12,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/5-11.png","element":"img","alt":" qa and q","inline":true},{"text":", with respect to ","element":"span"},{"style":{"height":12},"width":38.64,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/5-12.png","element":"img","alt":" ga","inline":true,"padRight":true},{"text":"(appearing in ","element":"span"},{"text":"q","element":"span"},{"text":"). This leads to the moment matching conditions in Equation (7). The argument for this procedure is essentially that it ensures that the approximation ","element":"span"},{"text":"q ","element":"span"},{"text":"captures high-density regions of the intractable posterior ","element":"span"},{"text":"p","element":"span"},{"text":". 
Obviously, this argument cannot be applied to Ising models because the exact and approximate distributions are very different, with the former being discrete due to the Dirac ","element":"span"},{"style":{"height":12.8},"width":20,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/5-13.png","element":"img","alt":" δ","inline":true},{"text":"-functions that constrain ","element":"span"},{"style":{"height":14.29},"width":150.64,"height":35.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/5-14.png","element":"img","alt":" xn = ±","inline":true},{"text":"1 to be binary variables. Even though the optimization still implies moment matching, this discrete-continuous discrepancy makes the local KL-divergences KL(","element":"span"},{"style":{"height":17.6},"width":260.64,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/5-15.png","element":"img","alt":"qa∥q) infinite!","inline":true}],[{"text":"In order to justify the usefulness of EP for Ising models we therefore need an alternative argument. Our argument is entirely restricted to ","element":"span"},{"text":"Gaussian ","element":"span"},{"text":"EP for our extended definition of GLVMs and does not extend to approximations with other exponential families. In the following, we will discuss these assumptions in inference approximations that preceded the formulation of EP, in order to provide a possibly more relevant justification of the method. Although this justification is not strictly necessary for using EP or its corrections in practice, it nevertheless provides a good starting point for understanding both.","element":"span"}],[{"text":"The argument goes back to the mathematical analysis of the Sherrington-Kirkpatrick (SK) model for a disordered magnet (a so-called spin glass) (Sherrington and Kirkpatrick, 1975). 
For this Ising model, the couplings ","element":"span"},{"text":"J ","element":"span"},{"text":"are drawn at random from a Gaussian distribution. An important contribution in the context of inference for this model (the computations of partition functions and average magnetizations) was the work of Thouless et al. (1977) who derived ","element":"span"},{"text":"self-consistency equations ","element":"span"},{"text":"which are assumed to be valid with a probability (with respect to the drawing of random couplings) approaching one as the number of variables ","element":"span"},{"style":{"height":10.69},"width":45.96,"height":26.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/6-0.png","element":"img","alt":" xn","inline":true,"padRight":true},{"text":"grows to infinity. These so-called Thouless-Anderson-Palmer (TAP) equations are closely related to the EP moment matching conditions of Equation (7), but they differ by partly relying on the specific assumption of the randomness of the couplings. ","element":"span"},{"text":"Self-consistency equations equivalent to the EP moment matching conditions, which avoid such assumptions on the statistics of the random couplings, were first derived by Opper and Winther (2000) by using a so-called cavity argument (Mézard et al., 1987). An important new contribution of Minka (2001a) was to provide an efficient algorithmic recipe for solving these equations.","element":"span"}],[{"text":"We will now sketch the main idea of the cavity argument for the GLVM. 
Let ","element":"span"},{"style":{"height":19.86},"width":147.96,"height":49.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/6-1.png","element":"img","alt":" x\\n (“x","inline":true,"padRight":true},{"text":"without ","element":"span"},{"text":"n","element":"span"},{"text":"”) denote the complement to ","element":"span"},{"style":{"height":19.25},"width":452.04,"height":48.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/6-2.png","element":"img","alt":" xn, that is x = x\\n ∪ xn","inline":true},{"text":". Without loss of generality we will take the quadratic exponential term to be written as ","element":"span"},{"style":{"height":19.54},"width":570.72,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/6-3.png","element":"img","alt":" f0(x) ∝ exp(−xT Jx/2). With","inline":true,"padRight":true},{"text":"similar definitions of ","element":"span"},{"style":{"height":18.85},"width":63.72,"height":47.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/6-4.png","element":"img","alt":" J\\n","inline":true},{"text":", the exact marginal distribution of ","element":"span"},{"style":{"height":10.69},"width":45.96,"height":26.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/6-5.png","element":"img","alt":" xn","inline":true,"padRight":true},{"text":"may be written as","element":"span"}],[{"style":{"width":"96%"},"width":1663,"height":301,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/6-6.png","element":"img"}],[{"text":"It is clear that ","element":"span"},{"style":{"height":17.6},"width":107.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/6-7.png","element":"img","alt":" pn(xn","inline":true},{"text":") depends entirely on the statistics of the random variable 
","element":"span"},{"style":{"height":15.09},"width":105.52,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/6-8.png","element":"img","alt":" hn ≡","inline":true},{"style":{"height":21.04},"width":281.76,"height":52.6,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/6-9.png","element":"img","alt":"�n′̸=n Jnn′xn′.","inline":true,"padRight":true},{"text":"This is the total ","element":"span"},{"style":{"height":16.4},"width":108.2,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/6-10.png","element":"img","alt":" ‘field’","inline":true,"padRight":true},{"text":"created by all other ‘magnetic moments’ ","element":"span"},{"style":{"height":14.67},"width":114.24,"height":36.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/6-11.png","element":"img","alt":" xn′ in","inline":true,"padRight":true},{"text":"the ‘cavity’ opened once ","element":"span"},{"style":{"height":10.69},"width":45.96,"height":26.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/6-12.png","element":"img","alt":" xn","inline":true,"padRight":true},{"text":"has been removed from the system. 
In the context of densely connected models with weak couplings, we can appeal to the central limit theorem","element":"span"},{"style":{"height":15.13},"width":78.16,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/6-13.png","element":"img","alt":"2 to","inline":true,"padRight":true},{"text":"approximate ","element":"span"},{"style":{"height":15.09},"width":45.96,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/6-14.png","element":"img","alt":" hn","inline":true,"padRight":true},{"text":"by a Gaussian random variable with mean ","element":"span"},{"style":{"height":11.6},"width":43.56,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/6-15.png","element":"img","alt":" γn","inline":true,"padRight":true},{"text":"and variance ","element":"span"},{"style":{"height":15.09},"width":208.32,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/6-16.png","element":"img","alt":" Vn. When","inline":true,"padRight":true},{"text":"looking at the influence of the remaining variables ","element":"span"},{"style":{"height":14.66},"width":197.16,"height":36.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/6-17.png","element":"img","alt":" x\\n on xn","inline":true},{"text":", the non-Gaussian details of their distribution have been washed out in the marginalization. 
","element":"span"},{"text":"Integrating out the Gaussian random variable ","element":"span"},{"style":{"height":15.09},"width":45.96,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/6-18.png","element":"img","alt":" hn","inline":true,"padRight":true},{"text":"gives the Gaussian cavity field approximation to the marginal distribution:","element":"span"}],[{"style":{"width":"63%"},"width":1103,"height":218,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/6-19.png","element":"img"}],[{"text":"This is precisely of the form of the marginal tilted distribution ","element":"span"},{"style":{"height":17.6},"width":105,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/6-20.png","element":"img","alt":" qn(xn","inline":true},{"text":") of Equation (9) as given by Gaussian EP. In the cavity formulation, ","element":"span"},{"text":"q","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") is simply a ","element":"span"},{"text":"placeholder ","element":"span"},{"text":"for the sufficient statistics of the individual Gaussian cavity fields. So we may observe cases, with the Ising model or bounded support factors being the prime examples, where EP gives essentially correct results for the marginal distributions of the ","element":"span"},{"style":{"height":10.69},"width":45.96,"height":26.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/7-0.png","element":"img","alt":" xn","inline":true,"padRight":true},{"text":"and of the partition function ","element":"span"},{"text":"Z","element":"span"},{"text":", while ","element":"span"},{"text":"q","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") gives a poor or even meaningless (in the sense of KL divergences) approximation to the multivariate posterior. 
","element":"span"},{"text":"Note however, that the entire ","element":"span"},{"text":"covariance ","element":"span"},{"style":{"height":15.09},"width":312.36,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/7-1.png","element":"img","alt":"matrix of the xn","inline":true,"padRight":true},{"text":"can be computed simply from a derivative of the free energy (Opper and Winther, 2005) resulting in an approximation of this covariance by that of ","element":"span"},{"text":"q","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":"). ","element":"span"},{"text":"This may indicate that a good EP approximation of the free energy may also result in a good approximation to the full covariance. The near exactness of EP (as compared to exhaustive summation) in Section 9 therefore shows the central limit theorem at work. Conversely, mediocre accuracy or even failure of Gaussian EP, as also observed in our simulations in Sections 8.3 and 9, may be attributed to breakdown of the Gaussian cavity field assumption. Exact inference on the strongest couplings as considered for the Ising model in Section 9 is one way to alleviate the shortcoming of the Gaussian cavity field assumption.","element":"span"}]]},{"heading":"4. Corrections to EP","paragraphs":[[{"text":"The ","element":"span"},{"style":{"height":14.69},"width":75.8,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/7-2.png","element":"img","alt":" ZEP","inline":true,"padRight":true},{"text":"approximation can be corrected in a principled approach, which traces the following outline:","element":"span"}],[{"text":"1. 
The exact partition function ","element":"span"},{"text":"Z ","element":"span"},{"text":"is re-written in terms of ","element":"span"},{"style":{"height":14.69},"width":75.8,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/7-3.png","element":"img","alt":" ZEP","inline":true},{"text":", scaled by a correction factor ","element":"span"},{"style":{"height":17.6},"width":230.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/7-4.png","element":"img","alt":" R = Z/ZEP","inline":true},{"text":". This correction factor ","element":"span"},{"text":"R ","element":"span"},{"text":"encapsulates the intractability in the model, and contains a “local marginal” contribution by each ","element":"span"},{"style":{"height":16.4},"width":39.6,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/7-5.png","element":"img","alt":" fa","inline":true,"padRight":true},{"text":"(see Section 4.1).","element":"span"}],[{"text":"2. A “handle” on ","element":"span"},{"text":"R ","element":"span"},{"text":"is obtained by writing it in terms of the cumulants (to be defined in Section 4.2) of ","element":"span"},{"style":{"height":17.6},"width":274.2,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/7-6.png","element":"img","alt":" q(x) and qa(x","inline":true},{"text":") from Equations (5) and (6). 
","element":"span"},{"text":"As ","element":"span"},{"style":{"height":17.6},"width":290.6,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/7-7.png","element":"img","alt":" qa(x) and q(x)","inline":true,"padRight":true},{"text":"share their two first cumulants, the mean and covariance from the moment matching condition in Equation (7), a cumulant expansion of ","element":"span"},{"text":"R ","element":"span"},{"text":"will be in terms of ","element":"span"},{"text":"higher-order ","element":"span"},{"text":"cumulants (see Section 4.2).","element":"span"}],[{"text":"3. ","element":"span"},{"text":"R","element":"span"},{"text":", defined in terms of cumulant differences, is written as a complex Gaussian average. Each factor ","element":"span"},{"style":{"height":16.4},"width":39.6,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/7-8.png","element":"img","alt":" fa","inline":true,"padRight":true},{"text":"contributes a complex random variable ","element":"span"},{"style":{"height":15.09},"width":44.4,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/7-9.png","element":"img","alt":" ka","inline":true,"padRight":true},{"text":"in this average (see Section 4.3).","element":"span"}],[{"text":"4. Finally, the cumulant differences are used as “small quantities” in a Taylor series expansion of ","element":"span"},{"text":"R","element":"span"},{"text":", and the leading terms are kept (see Sections 5 and 7).","element":"span"}],[{"text":"The series expansion is in terms of a complex expectation with a ","element":"span"},{"text":"zero ","element":"span"},{"text":"“self-relation” matrix, and this has two important consequences. 
Firstly, it causes all first order terms in the Taylor expansion to disappear, showing that ","element":"span"},{"style":{"height":14.69},"width":75.8,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/7-10.png","element":"img","alt":" ZEP","inline":true,"padRight":true},{"text":"is correct to first order. Secondly, due to Wick’s theorem (introduced in Section 4.4), these zeros will contract the expansion by making many other terms vanish.","element":"span"}],[{"text":"The strategy that is presented here can be re-used to correct other quantities of interest, like marginal distributions or the predictive density of new data when ","element":"span"},{"text":"p","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") is a Bayesian probabilistic model. These corrections are outlined in Appendix D.","element":"span"}],[{"text":"4.1 Exact Expression for Correction","element":"span"}],[{"text":"We define the (intractable) correction ","element":"span"},{"style":{"height":14.69},"width":307.16,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/8-0.png","element":"img","alt":" R as Z = RZEP","inline":true},{"text":". 
We can derive a useful expression for ","element":"span"},{"text":"R ","element":"span"},{"text":"in a few steps as follows: First we solve for ","element":"span"},{"style":{"height":16.4},"width":39.6,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/8-1.png","element":"img","alt":" fa","inline":true,"padRight":true},{"text":"in Equation (6), and substitute this into Equation (4) to obtain","element":"span"}],[{"style":{"width":"86%"},"width":1488,"height":124,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/8-2.png","element":"img"}],[{"text":"We introduce ","element":"span"},{"text":"F","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":")","element":"span"}],[{"style":{"width":"25%"},"width":432,"height":124,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/8-3.png","element":"img"}],[{"text":"to derive the expression for the correction ","element":"span"},{"style":{"height":17.6},"width":216.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/8-4.png","element":"img","alt":" R = Z/ZEP","inline":true,"padRight":true},{"text":"by integrating Equation (11):","element":"span"}],[{"style":{"width":"99%"},"width":1723,"height":363,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/8-5.png","element":"img"}],[{"text":"Corrections to the marginal and predictive densities of ","element":"span"},{"text":"p","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") can be computed from this formulation. 
This expression will become especially useful because the terms in ","element":"span"},{"text":"F","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") turn out to be “local”, that is, they only depend on the marginals of the variables associated with factor ","element":"span"},{"style":{"height":17.6},"width":218.04,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/8-6.png","element":"img","alt":" a. Let fa(x","inline":true},{"text":") depend on the subset ","element":"span"},{"style":{"height":20.05},"width":834.08,"height":50.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/8-7.png","element":"img","alt":" xa of x, and let x\\a (“x without a”) denote","inline":true,"padRight":true},{"text":"the remaining variables. The distributions in Equations (5) and (6) differ only with respect to their marginals on ","element":"span"},{"style":{"height":17.6},"width":374.64,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/8-8.png","element":"img","alt":" xa, qa(xa) and q(xa","inline":true},{"text":"), and therefore","element":"span"}],[{"style":{"width":"39%"},"width":681,"height":114,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/8-9.png","element":"img"}],[{"text":"Now we can rewrite ","element":"span"},{"text":"F","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") in terms of marginals:","element":"span"}],[{"style":{"width":"63%"},"width":1104,"height":124,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/8-10.png","element":"img"}],[{"text":"The key quantity, then, is ","element":"span"},{"text":"F","element":"span"},{"text":", after which the key operation is to compute its expected value. 
The rest of this section is devoted to the task of obtaining a “handle” on ","element":"span"},{"text":"F","element":"span"},{"text":".","element":"span"}],[{"text":"4.2 Characteristic Functions and Cumulants","element":"span"}],[{"text":"The distributions present in each of the ratios in ","element":"span"},{"text":"F","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") in Equation (14) share their first two cumulants, mean and covariance. Cumulants and cumulant differences are formally defined in the next paragraph. This simple observation has a crucial consequence: As the ","element":"span"},{"style":{"height":17.6},"width":82.32,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/8-11.png","element":"img","alt":"q(xa","inline":true},{"text":")’s are Gaussian and do not contain any higher order cumulants (three and above), ","element":"span"},{"text":"F ","element":"span"},{"text":"can be expressed in terms of the higher cumulants of the ","element":"span"},{"style":{"height":17.6},"width":299.76,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-0.png","element":"img","alt":" marginals qa(xa","inline":true},{"text":"). 
When the term-product approximation is fully factorized, these are simply cumulants of ","element":"span"},{"text":"one","element":"span"},{"text":"-dimensional distributions.","element":"span"}],[{"text":"Let ","element":"span"},{"style":{"height":14.69},"width":53.04,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-1.png","element":"img","alt":" Na","inline":true,"padRight":true},{"text":"be the number of variables in subvector ","element":"span"},{"style":{"height":10.69},"width":58.56,"height":26.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-2.png","element":"img","alt":" xa.","inline":true,"padRight":true},{"text":"In the examples presented in this work, ","element":"span"},{"style":{"height":14.69},"width":53.04,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-3.png","element":"img","alt":" Na","inline":true,"padRight":true},{"text":"is one or two. ","element":"span"},{"text":"Furthermore, let ","element":"span"},{"style":{"height":15.09},"width":256.56,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-4.png","element":"img","alt":" ka be an Na","inline":true},{"text":"-dimensional vector ","element":"span"},{"style":{"height":15.09},"width":105.04,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-5.png","element":"img","alt":" ka =","inline":true,"padRight":true},{"text":"(","element":"span"},{"style":{"height":17.6},"width":243.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-6.png","element":"img","alt":"k1, . . . , kNa)a","inline":true},{"text":". 
The characteristic function of ","element":"span"},{"style":{"height":15.2},"width":82.76,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-7.png","element":"img","alt":" qa is","inline":true}],[{"style":{"width":"73%"},"width":1275,"height":97,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-8.png","element":"img"}],[{"text":"and is obtained through the Fourier transform of the density. Inversely,","element":"span"}],[{"style":{"width":"71%"},"width":1233,"height":101,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-9.png","element":"img"}],[{"text":"The cumulants ","element":"span"},{"style":{"height":16},"width":157.68,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-10.png","element":"img","alt":" cαa of qa","inline":true,"padRight":true},{"text":"are the coefficients that appear in the Taylor expansion of log ","element":"span"},{"style":{"height":17.6},"width":127.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-11.png","element":"img","alt":" χa(ka)","inline":true,"padRight":true},{"text":"around the zero vector,","element":"span"}],[{"style":{"width":"43%"},"width":760,"height":114,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-12.png","element":"img"}],[{"text":"By this definition of ","element":"span"},{"style":{"height":11.09},"width":62.64,"height":27.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-13.png","element":"img","alt":" cαa","inline":true},{"text":", the Taylor expansion of log ","element":"span"},{"style":{"height":17.6},"width":171.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-14.png","element":"img","alt":" χa(ka) is","inline":true}],[{"style":{"width":"34%"},"width":602,"height":133,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-15.png","element":"img"}],[{"text":"Some notation was 
introduced in the above two equations to facilitate manipulating a multivariate series. The vector ","element":"span"},{"style":{"height":18.29},"width":622.28,"height":45.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-16.png","element":"img","alt":" α = (α1, . . . , αNa), with αj ∈ N0","inline":true},{"text":", denotes a multi-index on the elements of ","element":"span"},{"style":{"height":15.09},"width":44.4,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-17.png","element":"img","alt":" ka","inline":true},{"text":". Other notational conventions that employ ","element":"span"},{"style":{"height":18.29},"width":250.68,"height":45.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-18.png","element":"img","alt":" α (writing kj","inline":true,"padRight":true},{"text":"instead of ","element":"span"},{"style":{"height":18.29},"width":75.56,"height":45.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-19.png","element":"img","alt":" kaj)","inline":true,"padRight":true},{"text":"are:","element":"span"}],[{"style":{"width":"85%"},"width":1470,"height":125,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-20.png","element":"img"}],[{"text":"For example, when ","element":"span"},{"style":{"height":16.4},"width":1343.64,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-21.png","element":"img","alt":" Na = 2, say for the edge-factors in a spanning tree, the set of multi-","inline":true,"padRight":true},{"text":"indices ","element":"span"},{"style":{"height":17.6},"width":696.48,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-22.png","element":"img","alt":" α where |α| = 3 are (3, 0), (2, 1), (1,","inline":true,"padRight":true},{"text":"2), and (0","element":"span"},{"text":", ","element":"span"},{"text":"3).","element":"span"}],[{"text":"There are two characteristic functions that come 
into play in ","element":"span"},{"text":"F","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") and ","element":"span"},{"text":"R ","element":"span"},{"text":"in Equation (13). The first is that of the tilted distribution, log ","element":"span"},{"style":{"height":17.6},"width":108.24,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-23.png","element":"img","alt":" χa(ka","inline":true},{"text":"), and the other is the characteristic function of the EP marginal ","element":"span"},{"style":{"height":17.6},"width":82.32,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-24.png","element":"img","alt":" q(xa","inline":true},{"text":"), defined as ","element":"span"},{"style":{"height":25.14},"width":331.36,"height":62.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-25.png","element":"img","alt":" χ(ka) = ⟨eikTa xa⟩q","inline":true},{"text":". By virtue of matching the first two moments, and ","element":"span"},{"style":{"height":17.6},"width":82.32,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-26.png","element":"img","alt":" q(xa","inline":true},{"text":") being Gaussian with cumulants ","element":"span"},{"style":{"height":12.82},"width":76.8,"height":32.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-27.png","element":"img","alt":" c′αa,","inline":true}],[{"style":{"width":"81%"},"width":1408,"height":258,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/9-28.png","element":"img"}],[{"text":"contains the remaining higher-order cumulants where the tilted and approximate distributions ","element":"span"},{"text":"differ","element":"span"},{"text":". All our subsequent derivations rest upon moment matching being attained. 
In particular, the derived corrections cannot be used if EP has not converged.","element":"span"}],[{"text":"4.2.1 Ising Model Example","element":"span"}],[{"text":"The cumulant expansion for the discrete distribution in Equation (10) becomes","element":"span"}],[{"style":{"width":"95%"},"width":1657,"height":210,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/10-0.png","element":"img"}],[{"text":"(compactly writing ","element":"span"},{"style":{"height":15.09},"width":178.92,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/10-1.png","element":"img","alt":" m for mn","inline":true},{"text":"), from which the cumulants are obtained as","element":"span"}],[{"style":{"width":"74%"},"width":1294,"height":191,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/10-2.png","element":"img"}],[{"text":"4.3 The Correction as a Complex Expectation","element":"span"}],[{"text":"The expected value of ","element":"span"},{"text":"F","element":"span"},{"text":", which is required for the correction, depends on a product of ratios of distributions ","element":"span"},{"style":{"height":17.6},"width":254.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/10-3.png","element":"img","alt":" qa(xa)/q(xa).","inline":true,"padRight":true},{"text":"In the preceding section it was shown that the contributing distributions share lower-order statistics, allowing a twofold simplification. 
Firstly, the ratio ","element":"span"},{"style":{"height":17.6},"width":80.96,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/10-4.png","element":"img","alt":" qa/q","inline":true,"padRight":true},{"text":"will be written as a single quantity that depends on ","element":"span"},{"style":{"height":10.69},"width":37.68,"height":26.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/10-5.png","element":"img","alt":" ra","inline":true},{"text":", which was introduced above in Equation (17). Secondly, we will show that it is natural to shift integration variables into the complex plane, and rely on complex Gaussian random variables (meaning that both real and imaginary parts are jointly Gaussian). These complex random variables that define the ","element":"span"},{"style":{"height":10.69},"width":37.68,"height":26.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/10-6.png","element":"img","alt":" ra","inline":true},{"text":"’s have a peculiar property: they have a zero self-relation matrix! 
This property has important consequences in the resulting expansion.","element":"span"}],[{"text":"4.3.1 Complex Expectations","element":"span"}],[{"text":"Assume that ","element":"span"},{"style":{"height":18.4},"width":643.44,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/10-7.png","element":"img","alt":" q(xa) = N(xa ; µa, Σa) and qa(xa","inline":true},{"text":") share the same mean and covariance, and substitute log ","element":"span"},{"style":{"height":17.6},"width":511.44,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/10-8.png","element":"img","alt":" χa(ka) = ra(ka) + log χ(ka","inline":true},{"text":") in the definition of ","element":"span"},{"style":{"height":11.6},"width":37.68,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/10-9.png","element":"img","alt":" qa","inline":true,"padRight":true},{"text":"in Equation (16) to give","element":"span"}],[{"style":{"width":"70%"},"width":1215,"height":121,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/10-10.png","element":"img"}],[{"text":"Although the ","element":"span"},{"style":{"height":15.09},"width":44.4,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/10-11.png","element":"img","alt":" ka","inline":true,"padRight":true},{"text":"variables have not been introduced as random variables, we find it natural to ","element":"span"},{"text":"interpret them as such","element":"span"},{"text":", because the rules of expectations over Gaussian random variables will be extremely helpful in developing the subsequent expansions. 
We will therefore write ","element":"span"},{"style":{"height":17.6},"width":223.44,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/10-12.png","element":"img","alt":"qa(xa)/q(xa","inline":true},{"text":") as an expectation of exp ","element":"span"},{"style":{"height":17.6},"width":101.04,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/10-13.png","element":"img","alt":" ra(ka","inline":true},{"text":") over a density ","element":"span"},{"style":{"height":21.95},"width":480.48,"height":54.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/10-14.png","element":"img","alt":" p(ka|xa) ∝ e−ikTa xaχ(ka):","inline":true}],[{"style":{"width":"65%"},"width":1130,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/10-15.png","element":"img"}],[{"text":"By substituting log ","element":"span"},{"style":{"height":19.54},"width":525.52,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/10-16.png","element":"img","alt":" χ(ka) = iµTa ka − kTa Σaka/","inline":true},{"text":"2 into Equation (18), we see that ","element":"span"},{"style":{"height":17.6},"width":161,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/10-17.png","element":"img","alt":" p(ka|xa)","inline":true,"padRight":true},{"text":"can be viewed as Gaussian, but not for real random variables! 
We have to consider ","element":"span"},{"style":{"height":15.09},"width":101,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/10-18.png","element":"img","alt":" ka as","inline":true,"padRight":true},{"text":"Gaussian random variables with a real and an imaginary part with","element":"span"}],[{"style":{"width":"66%"},"width":1157,"height":80,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/10-19.png","element":"img"}],[{"style":{"width":"56%"},"width":976,"height":531,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/11-0.png","element":"img"}],[{"text":"Figure 1: Equation (20) shifts ","element":"figcaption","subtype":"caption"},{"style":{"height":15.09},"width":44.4,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/11-1.png","element":"img","alt":" ka","inline":true,"padRight":true},{"text":"to the complex plane. In the simplest case the joint density","element":"figcaption","subtype":"caption"}],[{"style":{"width":"88%"},"width":1536,"height":370,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/11-2.png","element":"img"}],[{"text":"For the purpose of computing the expectation in Equation (19), ","element":"span"},{"style":{"height":17.6},"width":102.96,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/11-3.png","element":"img","alt":" ka|xa","inline":true,"padRight":true},{"text":"is a degenerate complex Gaussian that shifts the coefficients ","element":"span"},{"style":{"height":15.09},"width":44.4,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/11-4.png","element":"img","alt":" ka","inline":true,"padRight":true},{"text":"into the complex plane. 
The expectation of exp ","element":"span"},{"style":{"height":17.6},"width":101.04,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/11-5.png","element":"img","alt":" ra(ka","inline":true},{"text":") is therefore taken over Gaussian random variables that have ","element":"span"},{"style":{"height":17.6},"width":282.6,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/11-6.png","element":"img","alt":" q(xa)’s inverse","inline":true,"padRight":true},{"text":"covariance matrix as their (real) covariance! As shorthand, we write","element":"span"}],[{"style":{"width":"74%"},"width":1283,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/11-7.png","element":"img"}],[{"text":"Figure 1 illustrates a simple density ","element":"span"},{"style":{"height":17.6},"width":141.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/11-8.png","element":"img","alt":" p(ka|xa","inline":true},{"text":"), showing that the imaginary component is a deterministic function of ","element":"span"},{"style":{"height":15.09},"width":233.52,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/11-9.png","element":"img","alt":" xa. 
Once xa","inline":true,"padRight":true},{"text":"is averaged out of the joint density ","element":"span"},{"style":{"height":17.6},"width":281.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/11-10.png","element":"img","alt":" p(ka|xa) q(xa),","inline":true,"padRight":true},{"text":"a ","element":"span"},{"text":"circularly symmetric complex Gaussian ","element":"span"},{"text":"distribution over ","element":"span"},{"style":{"height":15.09},"width":228.48,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/11-11.png","element":"img","alt":" ka remains.","inline":true,"padRight":true},{"text":"It is circularly symmetric as ","element":"span"},{"style":{"height":17.6},"width":163.72,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/11-12.png","element":"img","alt":" ⟨ka⟩ = 0","inline":true},{"text":", relation matrix","element":"span"},{"style":{"height":21.12},"width":223.72,"height":52.8,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/11-13.png","element":"img","alt":" ⟨k_a k_a^T⟩ = 0","inline":true},{"text":", and covariance matrix","element":"span"},{"style":{"height":25.34},"width":320.36,"height":63.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/11-14.png","element":"img","alt":" ⟨k_a k̄_a^T⟩ = 2Σ_a^−1","inline":true,"padRight":true},{"text":"(notation ","element":"span"},{"text":"k̄ ","element":"span"},{"text":"indicates the complex conjugate of ","element":"span"},{"text":"k","element":"span"},{"text":"). 
","element":"span"},{"text":"For the purpose of computing the expected values with Wick’s theorem (following in Section 4.4 below), we ","element":"span"},{"text":"only ","element":"span"},{"text":"need the relations","element":"span"},{"style":{"height":21.12},"width":141,"height":52.8,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/11-15.png","element":"img","alt":"�kakTb�","inline":true},{"text":"for pairs of factors ","element":"span"},{"text":"a ","element":"span"},{"text":"and ","element":"span"},{"text":"b","element":"span"},{"text":". All of these will be derived next:","element":"span"}],[{"style":{"width":"99%"},"width":1727,"height":148,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/11-16.png","element":"img"}],[{"text":"into Equation (12), ","element":"span"},{"text":"R ","element":"span"},{"text":"is equal to","element":"span"}],[{"style":{"width":"76%"},"width":1320,"height":134,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-0.png","element":"img"}],[{"text":"When ","element":"span"},{"text":"x ","element":"span"},{"text":"is given, the ","element":"span"},{"style":{"height":15.09},"width":44.4,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-1.png","element":"img","alt":" ka","inline":true},{"text":"-variables are independent. However, when they are averaged over ","element":"span"},{"style":{"height":17.6},"width":227.76,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-2.png","element":"img","alt":"q(x), the ka","inline":true},{"text":"-variables become coupled. They are zero-mean complex Gaussians","element":"span"}],[{"style":{"width":"52%"},"width":911,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-3.png","element":"img"}],[{"text":"and are coupled with a zero self-relation matrix! 
In other words, if ","element":"span"},{"style":{"height":17.6},"width":428,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-4.png","element":"img","alt":" Σab = cov(xa, xb), the","inline":true,"padRight":true},{"text":"expected values","element":"span"},{"style":{"height":21.12},"width":141,"height":52.8,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-5.png","element":"img","alt":" ⟨k_a k_b^T⟩","inline":true},{"text":"between the variables in the set ","element":"span"},{"style":{"height":17.6},"width":163.52,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-6.png","element":"img","alt":" {ka} are","inline":true}],[{"style":{"width":"84%"},"width":1467,"height":206,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-7.png","element":"img"}],[{"text":"Complex Gaussian random variables are additionally characterized by","element":"span"},{"style":{"height":25.34},"width":364.8,"height":63.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-8.png","element":"img","alt":" ⟨k_a k̄_b^T⟩. However,","inline":true,"padRight":true},{"text":"these expectations are not required for computing and simplifying the expansion of log ","element":"span"},{"text":"R ","element":"span"},{"text":"in Section 5, and are not needed for the remainder of this paper. Figure 2 illustrates the structure of the resulting relation matrix ","element":"span"},{"style":{"height":21.12},"width":141,"height":52.8,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-9.png","element":"img","alt":" ⟨k_a k_b^T⟩","inline":true,"padRight":true},{"text":"for two different factorizations of the same distribution. 
Each factor ","element":"span"},{"style":{"height":16.4},"width":39.6,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-10.png","element":"img","alt":" fa","inline":true,"padRight":true},{"text":"contributes a ","element":"span"},{"style":{"height":15.09},"width":44.4,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-11.png","element":"img","alt":" ka","inline":true,"padRight":true},{"text":"variable, such that the tree-structured approximation’s relation matrix will be larger than that of the fully factorized one.","element":"span"}],[{"text":"Section 5 shows that when ","element":"span"},{"style":{"height":16.4},"width":1133.48,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-12.png","element":"img","alt":" Da = 1, the above expectation can be written directly over","inline":true},{"style":{"height":17.6},"width":90.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-13.png","element":"img","alt":"{ka}","inline":true,"padRight":true},{"text":"and expanded. In the general case, discussed in Section 7, the inner expectation is first expanded (to treat the ","element":"span"},{"style":{"height":14.69},"width":54,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-14.png","element":"img","alt":" Da","inline":true,"padRight":true},{"text":"powers) before computing an expectation over ","element":"span"},{"style":{"height":17.6},"width":173.76,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-15.png","element":"img","alt":" {ka}. In","inline":true,"padRight":true},{"text":"both cases the expectation will involve polynomials in ","element":"span"},{"text":"k","element":"span"},{"text":"-variables. 
The expected values of Gaussian polynomials can be evaluated with Wick’s theorem.","element":"span"}],[{"text":"4.4 Wick’s Theorem","element":"span"}],[{"text":"Wick’s theorem provides a useful formula for mixed central moments of Gaussian variables. Let ","element":"span"},{"style":{"height":17.39},"width":214.08,"height":43.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-16.png","element":"img","alt":" kn1, . . . , knℓ","inline":true,"padRight":true},{"text":"be real or complex centered jointly Gaussian variables, noting that they do not have to be different. Then","element":"span"}],[{"style":{"width":"66%"},"width":1154,"height":99,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-17.png","element":"img"}],[{"text":"where the sum is over all partitions of ","element":"span"},{"style":{"height":17.6},"width":227.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-18.png","element":"img","alt":" {n1, . . . , nℓ}","inline":true,"padRight":true},{"text":"into disjoint pairs ","element":"span"},{"style":{"height":18.69},"width":392.36,"height":46.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-19.png","element":"img","alt":" {iη, jη}. If ℓ = 2m is","inline":true,"padRight":true},{"text":"even, then there are (2","element":"span"},{"style":{"height":17.6},"width":404.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-20.png","element":"img","alt":"m)!/(2mm!) = (2m −","inline":true,"padRight":true},{"text":"1)!! 
such partitions.","element":"span"},{"style":{"height":15.14},"width":98.64,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-21.png","element":"img","alt":"3 If ℓ","inline":true,"padRight":true},{"text":"is odd, then there are none, and the expectation in Equation (23) is zero.","element":"span"}],[{"text":"Consider the one-dimensional variable ","element":"span"},{"style":{"height":19.14},"width":321.12,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-22.png","element":"img","alt":" k ∼ N(k; 0, σ2).","inline":true,"padRight":true},{"text":"Wick’s theorem states that ","element":"span"},{"style":{"height":17.6},"width":379.44,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-23.png","element":"img","alt":"⟨kℓ⟩ = (ℓ−1)!! σℓ if ℓ","inline":true,"padRight":true},{"text":"is even, and ","element":"span"},{"style":{"height":17.6},"width":217.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-24.png","element":"img","alt":" ⟨kℓ⟩ = 0 if ℓ","inline":true,"padRight":true},{"text":"is odd. 
In other words, ","element":"span"},{"style":{"height":19.14},"width":447.36,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-25.png","element":"img","alt":" ⟨k^3⟩ = 0, ⟨k^4⟩ = 3(σ^2)^2,","inline":true},{"style":{"height":19.14},"width":274.76,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/12-26.png","element":"img","alt":"⟨k^6⟩ = 15(σ^2)^3","inline":true},{"text":", and so forth.","element":"span"}],[{"style":{"width":"49%"},"width":860,"height":789,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/13-0.png","element":"img"}],[{"text":"Figure 2: The relation matrices between ","element":"figcaption","subtype":"caption"},{"style":{"height":15.09},"width":44.4,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/13-1.png","element":"img","alt":" ka","inline":true,"padRight":true},{"text":"for two factorizations of ","element":"figcaption","subtype":"caption"},{"style":{"height":21.65},"width":405.12,"height":54.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/13-2.png","element":"img","alt":" ∏_{n=1}^4 t_n(x_n): the top","inline":true},{"text":"illustration is for ","element":"figcaption","subtype":"caption"},{"style":{"height":13.89},"width":137,"height":34.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/13-3.png","element":"img","alt":" t1t2t3t4","inline":true},{"text":", while the bottom illustration is of a tree structure (","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":403.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/13-4.png","element":"img","alt":"t1t2)(t2t3)(t3t4)/t2/t3","inline":true},{"text":". 
The white squares indicate a zero relation matrix","element":"figcaption","subtype":"caption"},{"style":{"height":20.93},"width":152.64,"height":52.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/13-5.png","element":"img","alt":" ⟨k_a k_b^T⟩,","inline":true,"padRight":true},{"text":"with the ","element":"figcaption","subtype":"caption"},{"text":"diagonal ","element":"figcaption","subtype":"caption"},{"text":"being zero. From the properties of Equation (22) there are additional zeros in the tree structure’s relation matrix, where edge and node factors share variables. The factor ","element":"figcaption","subtype":"caption"},{"style":{"height":16.4},"width":141.32,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/13-6.png","element":"img","alt":" f0 = g0","inline":true,"padRight":true},{"text":"is shaded in grey in the left-hand figures, and can make ","element":"figcaption","subtype":"caption"},{"text":"q","element":"figcaption","subtype":"caption"},{"text":"(","element":"figcaption","subtype":"caption"},{"text":"x","element":"figcaption","subtype":"caption"},{"text":") densely connected.","element":"figcaption","subtype":"caption"}]]},{"heading":"5. Factorized Approximations","paragraphs":[[{"text":"In the fully factorized approximation, with ","element":"span"},{"style":{"height":17.6},"width":284.52,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/13-7.png","element":"img","alt":" fn(xn) = tn(xn","inline":true},{"text":"), the exact distribution in Equation (13) depends on the ","element":"span"},{"style":{"height":18.38},"width":880.68,"height":45.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/13-8.png","element":"img","alt":" single node marginals F(x) = ∏_n q_n(x_n)/q(x_n","inline":true},{"text":"). 
Following Equation (21), the correction to the free energy","element":"span"}],[{"style":{"width":"80%"},"width":1384,"height":133,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/13-9.png","element":"img"}],[{"text":"is taken directly over the centered complex-valued Gaussian random variables ","element":"span"},{"style":{"height":17.6},"width":324,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/13-10.png","element":"img","alt":" k = (k1, . . . , kN),","inline":true,"padRight":true},{"text":"which have the relations","element":"span"}],[{"style":{"width":"74%"},"width":1279,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/13-11.png","element":"img"}],[{"text":"In the section to follow, all expectations shall be with respect to ","element":"span"},{"text":"k","element":"span"},{"text":", which will be dropped where it is clear from the context.","element":"span"}],[{"text":"Thus far, ","element":"span"},{"text":"R ","element":"span"},{"text":"has been re-expressed in terms of site contributions. The expression in Equation (24) is exact, albeit still intractable, and will be treated through a power series expansion. 
Other quantities of interest, like marginal distributions or moments, can similarly be expressed exactly, and then expanded (see Appendix D).","element":"span"}],[{"text":"5.1 Second Order Correction to ","element":"span"},{"text":"log ","element":"span"},{"text":"R","element":"span"}],[{"text":"Assuming that the ","element":"span"},{"style":{"height":10.69},"width":40.68,"height":26.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/14-0.png","element":"img","alt":" rn","inline":true},{"text":"’s are small on average with respect to ","element":"span"},{"text":"k","element":"span"},{"text":", Equation (24) is expanded and the lower order terms kept:","element":"span"}],[{"style":{"width":"73%"},"width":1262,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/14-1.png","element":"img"}],[{"text":"log ","element":"span"},{"text":"R ","element":"span"},{"text":"= log","element":"span"}],[{"style":{"width":"82%"},"width":1427,"height":171,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/14-2.png","element":"img"}],[{"text":"The simplification in the second line is a result of the variance terms being zero from Equation (25). 
","element":"span"},{"text":"The single marginal terms also vanish (and hence EP is correct to first order) because both ","element":"span"},{"style":{"height":20.8},"width":436.8,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/14-3.png","element":"img","alt":" ⟨kn⟩ = 0 and�k2n�= 0.","inline":true}],[{"text":"This result can give us a hint in which situations the corrections are expected to be small:","element":"span"}],[{"text":"• ","element":"span"},{"text":"Firstly, the ","element":"span"},{"style":{"height":10.69},"width":40.68,"height":26.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/14-4.png","element":"img","alt":" rn","inline":true,"padRight":true},{"text":"could be small for values of ","element":"span"},{"style":{"height":15.09},"width":43.56,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/14-5.png","element":"img","alt":" kn","inline":true,"padRight":true},{"text":"where the density of ","element":"span"},{"text":"k ","element":"span"},{"text":"is not small. For example, under a zero noise Gaussian process classification model, ","element":"span"},{"style":{"height":17.6},"width":256.04,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/14-6.png","element":"img","alt":" qn(xn) equals","inline":true,"padRight":true},{"text":"a step function ","element":"span"},{"style":{"height":17.6},"width":101.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/14-7.png","element":"img","alt":" tn(xn","inline":true},{"text":") times a Gaussian, where the latter often has small variance compared to the mean. 
Hence, ","element":"span"},{"style":{"height":17.6},"width":104.52,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/14-8.png","element":"img","alt":" qn(xn","inline":true},{"text":") should be very close to a Gaussian.","element":"span"}],[{"text":"• ","element":"span"},{"text":"Secondly, for systems with weakly (posterior) dependent variables ","element":"span"},{"style":{"height":10.69},"width":45.96,"height":26.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/14-9.png","element":"img","alt":" xn","inline":true,"padRight":true},{"text":"we might expect that the log partition function log ","element":"span"},{"text":"Z ","element":"span"},{"text":"would scale approximately linearly with ","element":"span"},{"text":"N","element":"span"},{"text":", the number of variables. Since terms with ","element":"span"},{"text":"m ","element":"span"},{"text":"= ","element":"span"},{"text":"n ","element":"span"},{"text":"vanish in the computation of ln ","element":"span"},{"text":"R","element":"span"},{"text":", there are no corrections that are ","element":"span"},{"style":{"height":16},"width":549.96,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/14-10.png","element":"img","alt":" proportional to N when Σmn","inline":true,"padRight":true},{"text":"is sufficiently small as ","element":"span"},{"style":{"height":12.4},"width":158.24,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/14-11.png","element":"img","alt":"N → ∞","inline":true},{"text":". Hence, the dominant contributions to log ","element":"span"},{"text":"Z ","element":"span"},{"text":"should already be included in the EP approximation. 
However, Section 8.3 illustrates an example where this need not be the case.","element":"span"}],[{"text":"The expectation ","element":"span"},{"style":{"height":17.6},"width":127.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/14-12.png","element":"img","alt":" ⟨rmrn⟩","inline":true},{"text":", as it appears in Equation (26), is treated by substituting ","element":"span"},{"style":{"height":15.09},"width":141.6,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/14-13.png","element":"img","alt":" rn with","inline":true,"padRight":true},{"text":"its cumulant expansion ","element":"span"},{"style":{"height":22.53},"width":449.8,"height":56.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/14-14.png","element":"img","alt":" r_n(k_n) = ∑_{l≥3} i^l c_{ln} k_n^l / l","inline":true},{"text":"! from Equation (17). Wick’s theorem now ","element":"span"},{"text":"plays a pivotal role in evaluating the expectations that appear in the expansion:","element":"span"}],[{"style":{"width":"74%"},"width":1280,"height":388,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/14-15.png","element":"img"}],[{"text":"The second line above follows from contractions in Wick’s theorem. 
All the ","element":"span"},{"text":"self-pairing terms","element":"span"},{"text":", when for example one of the ","element":"span"},{"style":{"height":15.09},"width":73.32,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/14-16.png","element":"img","alt":" l kn","inline":true},{"text":"’s is paired with another ","element":"span"},{"style":{"height":15.09},"width":43.56,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/14-17.png","element":"img","alt":" kn","inline":true,"padRight":true},{"text":"in Equation (23), are zero because","element":"span"},{"style":{"height":21.12},"width":1057.8,"height":52.8,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/14-18.png","element":"img","alt":" ⟨k_n^2⟩ = 0. To therefore get a non-zero result for ⟨k_m^s k_n^l⟩","inline":true},{"text":", using Equation (23), ","element":"span"},{"style":{"height":15.09},"width":270.12,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/14-19.png","element":"img","alt":"each factor kn","inline":true,"padRight":true},{"text":"has to be paired with some factor ","element":"span"},{"style":{"height":15.09},"width":52.56,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/14-20.png","element":"img","alt":" km","inline":true},{"text":", and this is possible only when ","element":"span"},{"text":"l ","element":"span"},{"text":"= ","element":"span"},{"text":"s","element":"span"},{"text":". Wick’s theorem sums over all pairings, and there are ","element":"span"},{"text":"l","element":"span"},{"text":"! ways of pairing a ","element":"span"},{"style":{"height":15.6},"width":273.12,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/14-21.png","element":"img","alt":" kn with a km,","inline":true,"padRight":true},{"text":"giving the result in Equation (27). 
Finally, plugging Equation (27) into Equation (26) gives the second order correction","element":"span"}],[{"style":{"width":"75%"},"width":1297,"height":134,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/15-0.png","element":"img"}],[{"text":"5.1.1 Ising Example Continued","element":"span"}],[{"text":"We can now compute the second order log ","element":"span"},{"text":"R ","element":"span"},{"text":"correction for the ","element":"span"},{"text":"N ","element":"span"},{"text":"= 2 Ising model example of Section 3.1. The covariance matrix has Σ","element":"span"},{"style":{"height":16.4},"width":844.24,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/15-1.png","element":"img","alt":"nn = 1 from moment matching and Σ_12 =","inline":true},{"style":{"height":32.4},"width":790.44,"height":81,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/15-2.png","element":"img","alt":"J/(λ^2 − J^2) with λ = (1/2)[J^2 + √(J^4 + 4)]","inline":true},{"text":". The odd-order terms in the cumulant expansion derived in Section 4.2.1 disappear because ","element":"span"},{"text":"m ","element":"span"},{"text":"= 0. The first nontrivial term is therefore ","element":"span"},{"style":{"height":26.18},"width":1343.72,"height":65.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/15-3.png","element":"img","alt":"l = 4 which gives a contribution of (1/2) × 2 × (c_4^2/4!) Σ_12^4 = ((−2)^2/4!) Σ_12^4 = (1/6) Σ_12^4","inline":true},{"text":". 
In Section 3.1, we ","element":"span"},{"text":"saw that log ","element":"span"},{"style":{"height":23.68},"width":329.4,"height":59.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/15-4.png","element":"img","alt":" Z − log Z_EP = J^4/6 ","inline":true,"padRight":true},{"text":"plus terms of order ","element":"span"},{"style":{"height":15.14},"width":45.32,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/15-5.png","element":"img","alt":" J^6","inline":true,"padRight":true},{"text":"and higher. To lowest order in ","element":"span"},{"text":"J ","element":"span"},{"text":"we ","element":"span"},{"text":"have Σ","element":"span"},{"style":{"height":14.69},"width":121.6,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/15-6.png","element":"img","alt":"12 = J","inline":true,"padRight":true},{"text":"and thus log ","element":"span"},{"style":{"height":23.68},"width":133.08,"height":59.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/15-7.png","element":"img","alt":" R = J^4/6 ","inline":true,"padRight":true},{"text":"which exactly cancels the lowest order error of EP.","element":"span"}],[{"text":"5.2 Corrections to Other Quantities","element":"span"}],[{"text":"The schema given here is applicable to any other quantity of interest, be it marginal or predictive distributions, or the marginal moments of ","element":"span"},{"text":"p","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":"). 
The cumulant corrections for the marginal moments are derived in Appendix D; for example, the correction to the marginal mean ","element":"span"},{"style":{"height":12},"width":38.4,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/15-8.png","element":"img","alt":" µi","inline":true,"padRight":true},{"text":"of an approximation ","element":"span"},{"style":{"height":18.4},"width":391.88,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/15-9.png","element":"img","alt":" q(x) = N(x; µ, Σ) is","inline":true}],[{"style":{"width":"79%"},"width":1373,"height":134,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/15-10.png","element":"img"}],[{"text":"while the correction to the marginal covariance is","element":"span"}],[{"style":{"width":"96%"},"width":1662,"height":287,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/15-11.png","element":"img"}],[{"text":"5.3 Edgeworth-Type Expansions","element":"span"}],[{"text":"To simplify the expansion of Equation (24), we integrated (combined) degenerate complex Gaussians ","element":"span"},{"style":{"height":17.6},"width":276.6,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/15-12.png","element":"img","alt":" kn|xn over q(x","inline":true},{"text":") to obtain fully complex Gaussian random variables ","element":"span"},{"style":{"height":17.6},"width":234.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/15-13.png","element":"img","alt":" {kn}. 
We’ve","inline":true,"padRight":true},{"text":"then relied on","element":"span"},{"style":{"height":20.8},"width":836.16,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/15-14.png","element":"img","alt":" ⟨k_n^2⟩ = 0 to simplify the expansion of log R.","inline":true}],[{"text":"The expectations","element":"span"},{"style":{"height":20.8},"width":1323.96,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/15-15.png","element":"img","alt":" ⟨k_n^2⟩ = 0 are closely related to the orthogonality of Hermite polynomi-","inline":true,"padRight":true},{"text":"als, and this can be employed in an alternative derivation. In particular, one can ","element":"span"},{"text":"first ","element":"span"},{"text":"make a Taylor expansion of exp ","element":"span"},{"style":{"height":17.6},"width":102.6,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/15-16.png","element":"img","alt":" rn(kn","inline":true},{"text":") around zero, giving complex-valued polynomials in ","element":"span"},{"style":{"height":17.6},"width":100.32,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/15-17.png","element":"img","alt":" {kn}.","inline":true,"padRight":true},{"text":"When the inner average in Equation (24) is then taken over ","element":"span"},{"style":{"height":17.6},"width":103.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/15-18.png","element":"img","alt":" kn|xn","inline":true},{"text":", a real-valued series of Hermite polynomials in ","element":"span"},{"style":{"height":17.6},"width":91.6,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/15-19.png","element":"img","alt":" {xn}","inline":true,"padRight":true},{"text":"arises. These polynomials are orthogonal under ","element":"span"},{"text":"q","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":"). 
The series that describes the tilted distribution ","element":"span"},{"style":{"height":17.6},"width":104.52,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/15-20.png","element":"img","alt":" qn(xn","inline":true},{"text":") is equal to the product of ","element":"span"},{"style":{"height":17.6},"width":249.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/15-21.png","element":"img","alt":" q(xn) and an","inline":true,"padRight":true},{"text":"expansion of polynomials for the higher-cumulant ","element":"span"},{"text":"deviation ","element":"span"},{"text":"from a Gaussian density. This line of derivation gives an Edgeworth expansion for ","element":"span"},{"text":"each ","element":"span"},{"text":"factor’s tilted distribution.","element":"span"}],[{"text":"As a second step, Equation (24) couples the product of separate Edgeworth expansions (one for each factor) together by requiring an outer average over ","element":"span"},{"text":"q","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":"). The orthogonality of Hermite polynomials under ","element":"span"},{"text":"q","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") now comes into play: it allows products of orthogonal polynomials under ","element":"span"},{"text":"q","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") to integrate to zero. ","element":"span"},{"text":"This is similar to contractions in Wick’s theorem, where","element":"span"},{"style":{"height":20.8},"width":1418.6,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/16-0.png","element":"img","alt":" ⟨k_n^2⟩ = 0 allows us to simplify Equation (27). Although it is not the focus","inline":true,"padRight":true},{"text":"of this work, an example of such a derivation appears in Appendix C.1.","element":"span"}]]},{"heading":"6. 
Radius of Convergence","paragraphs":[[{"text":"We may hope that in practice the low order terms in the cumulant expansions will account already for the dominant contributions. But will such an expansion actually converge when extended to arbitrary orders? While we will leave a more general answer to future research, we can at least give a partial result for the example of the Ising model. Let ","element":"span"},{"style":{"height":17.6},"width":261.6,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/16-1.png","element":"img","alt":" D = diag(Σ),","inline":true,"padRight":true},{"text":"the diagonal of the covariance matrix of the EP approximation ","element":"span"},{"text":"q","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":"). We prove here that a cumulant expansion for ","element":"span"},{"text":"R ","element":"span"},{"text":"will converge when the eigenvalues of ","element":"span"},{"style":{"height":16.34},"width":502.28,"height":40.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/16-2.png","element":"img","alt":" D−1/2ΣD−1/2—which has","inline":true,"padRight":true},{"text":"diagonal values of one—are bounded between zero and two.","element":"span"}],[{"text":"In practice we’ve found that even if the largest of these eigenvalues grows with ","element":"span"},{"text":"N","element":"span"},{"text":", the second-order correction gives a remarkable improvement. This, with the results in Figure 6, leads us to believe that the power series expansion is often divergent. It may well be that our expansions are only of an asymptotic type (Boyd, 1999) for which the summation of only a certain number of terms might give an improvement whereas further terms would lead to worse results. 
It leads to a paradoxical situation, which seems common when interesting functions are computed: On the one hand we may have a series which does not converge, but in many ways is more practical; on the other hand one might obtain an expansion that converges, but only impractically. Quoting George F. Carrier’s rule from Boyd (1999):","element":"span"}],[{"text":"Divergent series converge faster than convergent series because they don’t have to converge.","element":"span"}],[{"text":"For this, we do not yet have a clear-cut answer.","element":"span"}],[{"text":"6.1 A Formal Expression for the Cumulant Expansion to All Orders","element":"span"}],[{"text":"To discuss the question when our expansion will converge when extended to arbitrary orders, we introduce a single extra parameter ","element":"span"},{"style":{"height":12.8},"width":168.88,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/16-3.png","element":"img","alt":" λ into R","inline":true},{"text":", which controls the strength of the contribution of cumulants. Expanded into a series in powers of ","element":"span"},{"style":{"height":12.8},"width":26,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/16-4.png","element":"img","alt":" λ","inline":true},{"text":", contributions of cumulants of ","element":"span"},{"text":"total ","element":"span"},{"text":"order ","element":"span"},{"text":"l ","element":"span"},{"text":"are multiplied by a factor ","element":"span"},{"style":{"height":15.53},"width":35.44,"height":38.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/16-5.png","element":"img","alt":" λl","inline":true},{"text":", for example ","element":"span"},{"style":{"height":18.01},"width":460.48,"height":45.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/16-6.png","element":"img","alt":" λlcnl or λk+lcnkcnl. 
Of","inline":true,"padRight":true},{"text":"course, at the end of the calculation, we set ","element":"span"},{"style":{"height":16.4},"width":882.68,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/16-7.png","element":"img","alt":" λ = 1. This approach is obviously achieved by","inline":true,"padRight":true},{"text":"replacing","element":"span"}],[{"style":{"width":"19%"},"width":331,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/16-8.png","element":"img"}],[{"text":"in Equation (24). Hence, we define","element":"span"}],[{"style":{"width":"80%"},"width":1391,"height":304,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/16-9.png","element":"img"}],[{"text":"By working backwards, and expressing everything by the original densities over ","element":"span"},{"style":{"height":15.6},"width":141.44,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-0.png","element":"img","alt":" xn, the","inline":true,"padRight":true},{"text":"correction can be written as","element":"span"}],[{"style":{"width":"65%"},"width":1125,"height":142,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-1.png","element":"img"}],[{"text":"where the density ","element":"span"},{"style":{"height":17.6},"width":85.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-2.png","element":"img","alt":" qλ(x","inline":true},{"text":") is a multivariate Gaussian with mean ","element":"span"},{"style":{"height":11.6},"width":31,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-3.png","element":"img","alt":" µ","inline":true,"padRight":true},{"text":"and covariance given by","element":"span"}],[{"style":{"width":"23%"},"width":414,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-4.png","element":"img"}],[{"text":"where 
","element":"span"},{"style":{"height":19.14},"width":464.84,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-5.png","element":"img","alt":" D = diag(Σ) and z = λ2","inline":true},{"text":". Hence, we see that the expansion in powers of ","element":"span"},{"style":{"height":12.8},"width":26,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-6.png","element":"img","alt":" λ","inline":true,"padRight":true},{"text":"is actually equivalent to an expansion in products of nondiagonal elements of ","element":"span"},{"style":{"height":12},"width":48.48,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-7.png","element":"img","alt":" Σ.","inline":true}],[{"text":"Noticing that as ","element":"span"},{"style":{"height":17.6},"width":76.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-8.png","element":"img","alt":" R(λ","inline":true},{"text":") depends on ","element":"span"},{"style":{"height":12.8},"width":26,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-9.png","element":"img","alt":" λ","inline":true,"padRight":true},{"text":"through the density ","element":"span"},{"style":{"height":23.47},"width":546.73,"height":58.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-10.png","element":"img","alt":" qλ(x) ∝ |Σλ|−1/2e− 12x⊤Σ−1λ x,","inline":true,"padRight":true},{"text":"we can see by expressing ","element":"span"},{"style":{"height":21.46},"width":79.88,"height":53.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-11.png","element":"img","alt":" Σ−1λ","inline":true,"padRight":true},{"text":"in terms of eigenvalues and eigenvectors that for any ","element":"span"},{"text":"fixed ","element":"span"},{"style":{"height":17.6},"width":137.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-12.png","element":"img","alt":"x, 
qλ(x","inline":true},{"text":") is an analytic function of the ","element":"span"},{"text":"complex variable ","element":"span"},{"text":"z ","element":"span"},{"text":"as long as ","element":"span"},{"style":{"height":14.88},"width":56.48,"height":37.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-13.png","element":"img","alt":" Σλ","inline":true,"padRight":true},{"text":"is positive definite. Since","element":"span"}],[{"style":{"width":"50%"},"width":864,"height":65,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-14.png","element":"img"}],[{"text":"this is equivalent to the condition that the matrix ","element":"span"},{"style":{"height":20.34},"width":471.64,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-15.png","element":"img","alt":" I + z(D−1/2ΣD−1/2 − I","inline":true},{"text":") is positive definite. Introducing ","element":"span"},{"style":{"height":11.6},"width":34.56,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-16.png","element":"img","alt":" γi","inline":true},{"text":", the eigenvalues of ","element":"span"},{"style":{"height":15.94},"width":269.48,"height":39.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-17.png","element":"img","alt":" D−1/2ΣD−1/2","inline":true},{"text":", positive definiteness fails when for the first time 1 + ","element":"span"},{"style":{"height":17.6},"width":733.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-18.png","element":"img","alt":" z(γi − 1) = 0. 
Thus the series for qλ(x","inline":true},{"text":") is convergent for","element":"span"}],[{"style":{"width":"20%"},"width":348,"height":101,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-19.png","element":"img"}],[{"text":"Setting ","element":"span"},{"text":"z ","element":"span"},{"text":"= 1, this is equivalent to the condition","element":"span"}],[{"style":{"width":"18%"},"width":325,"height":100,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-20.png","element":"img"}],[{"text":"This means that the eigenvalues have to fulfil 0 ","element":"span"},{"style":{"height":15.2},"width":202.56,"height":38,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-21.png","element":"img","alt":" < γi < 2.","inline":true,"padRight":true},{"text":"Unfortunately, we cannot conclude from this condition that pointwise convergence of ","element":"span"},{"style":{"height":17.6},"width":314.04,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-22.png","element":"img","alt":" qλ(x) for each x","inline":true,"padRight":true},{"text":"leads to convergence of ","element":"span"},{"style":{"height":17.6},"width":76.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-23.png","element":"img","alt":" R(λ","inline":true},{"text":") (which is an integral of ","element":"span"},{"style":{"height":17.6},"width":304.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-24.png","element":"img","alt":" qλ(x) over all x","inline":true},{"text":"!). 
However, in cases where the integral eventually becomes a finite sum, such as the Ising model, pointwise convergence in ","element":"span"},{"text":"x ","element":"span"},{"text":"leads to convergence of ","element":"span"},{"style":{"height":17.6},"width":104.64,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-25.png","element":"img","alt":" R(λ).","inline":true}],[{"text":"6.1.1 Ising Model Example","element":"span"}],[{"text":"From Section 4.2.1 the tilted distribution for the running example Ising model is ","element":"span"},{"style":{"height":17.6},"width":169.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-26.png","element":"img","alt":" qn(xn) =","inline":true}],[{"style":{"height":19.15},"width":391.12,"height":47.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-27.png","element":"img","alt":"2[δ(xn + 1) + δ(xn −","inline":true,"padRight":true},{"text":"1)], and hence ","element":"span"},{"style":{"height":28.03},"width":679.08,"height":70.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-28.png","element":"img","alt":" q(xn) = 1(2π)1/2 e−x2n/2. As each q(xn","inline":true},{"text":") is a unit-variance Gaussian, ","element":"span"},{"style":{"height":20.34},"width":935.52,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-29.png","element":"img","alt":" D = diag(Σ) = I. Hence D−1/2ΣD−1/2 = Σ and","inline":true}],[{"style":{"width":"85%"},"width":1475,"height":134,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-30.png","element":"img"}],[{"text":"follows from Equation (31). 
The arguments of the previous section show that the ","element":"span"},{"text":"radius of ","element":"span"},{"style":{"height":17.6},"width":360.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-31.png","element":"img","alt":"convergence of R(λ","inline":true},{"text":") is determined by the condition that the matrix ","element":"span"},{"style":{"height":19.14},"width":226.84,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-32.png","element":"img","alt":" I+λ2(Σ−I","inline":true},{"text":") is positive definite or the eigenvalues ","element":"span"},{"style":{"height":19.14},"width":526.56,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/17-33.png","element":"img","alt":" li of Σ fulfil |li − 1| ≤ 1/λ2.","inline":true}],[{"style":{"width":"96%"},"width":1659,"height":103,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/18-0.png","element":"img"}],[{"text":"1 + ","element":"span"},{"text":"c","element":"span"},{"text":", meaning that the cumulant expansion for ","element":"span"},{"style":{"height":17.6},"width":76.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/18-1.png","element":"img","alt":" R(λ","inline":true},{"text":") is convergent for the ","element":"span"},{"text":"N ","element":"span"},{"text":"= 2 Ising model. For ","element":"span"},{"text":"N > ","element":"span"},{"text":"2, it is easy to show that this is not necessarily true. 
Consider the ‘isotropic’ Ising model with ","element":"span"},{"style":{"height":17.09},"width":144.16,"height":42.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/18-2.png","element":"img","alt":" Jij = J","inline":true,"padRight":true},{"text":"and zero external field, then Σ","element":"span"},{"style":{"height":17.89},"width":636.96,"height":44.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/18-3.png","element":"img","alt":"ii = 1 and Σij = c for i ̸= j with","inline":true},{"style":{"height":17.6},"width":489.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/18-4.png","element":"img","alt":"c = c(J) ∈] − 1/(N − 1),","inline":true,"padRight":true},{"text":"1[. The eigenvalues are now 1 + (","element":"span"},{"style":{"height":17.6},"width":356.44,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/18-5.png","element":"img","alt":"N − 1)c and 1 − c","inline":true,"padRight":true},{"text":"(the latter with degeneracy ","element":"span"},{"style":{"height":12},"width":83.92,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/18-6.png","element":"img","alt":" N −","inline":true,"padRight":true},{"text":"1). For finite ","element":"span"},{"text":"c","element":"span"},{"text":", the largest eigenvalue will scale with ","element":"span"},{"text":"N ","element":"span"},{"text":"and thus be larger than the upper value of two that would be required for convergence. 
Scaling with ","element":"span"},{"text":"N ","element":"span"},{"text":"for the largest eigenvalue of ","element":"span"},{"style":{"height":15.94},"width":269.48,"height":39.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/18-7.png","element":"img","alt":" D−1/2ΣD−1/2 ","inline":true,"padRight":true},{"text":"is also observed in the Ising model simulations in Section 9.","element":"span"}],[{"text":"We conjecture that convergence of the cumulant series for ","element":"span"},{"style":{"height":17.6},"width":76.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/18-8.png","element":"img","alt":" R(λ","inline":true},{"text":") also implies convergence of the series for log ","element":"span"},{"style":{"height":17.6},"width":76.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/18-9.png","element":"img","alt":" R(λ","inline":true},{"text":") but leave an investigation of this point to future research. We only illustrate this point for the ","element":"span"},{"text":"N ","element":"span"},{"text":"= 2 Ising model case, where we have the explicit formula","element":"span"}],[{"style":{"width":"73%"},"width":1278,"height":109,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/18-10.png","element":"img"}],[{"text":"As can be easily seen, an expansion in ","element":"span"},{"style":{"height":12.8},"width":26,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/18-11.png","element":"img","alt":" λ","inline":true,"padRight":true},{"text":"converges for ","element":"span"},{"style":{"height":15.54},"width":128.08,"height":38.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/18-12.png","element":"img","alt":" c2λ4 <","inline":true,"padRight":true},{"text":"1 which gives the same radius of convergence ","element":"span"},{"text":"|","element":"span"},{"text":"c","element":"span"},{"text":"| ","element":"span"},{"text":"< 
","element":"span"},{"text":"1 as for the expansion of ","element":"span"},{"text":"R","element":"span"},{"text":".","element":"span"}]]},{"heading":"7. General Approximations","paragraphs":[[{"text":"The general approximations differ from the factorized approximation in that an expansion in terms of expectations under ","element":"span"},{"style":{"height":17.6},"width":90.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/18-13.png","element":"img","alt":" {ka}","inline":true,"padRight":true},{"text":"doesn’t immediately arise. Consider ","element":"span"},{"text":"R ","element":"span"},{"text":"in Equation (21): Its inner expectations are over ","element":"span"},{"style":{"height":17.6},"width":85.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/18-14.png","element":"img","alt":" ka|x","inline":true},{"text":", and outer expectations are over ","element":"span"},{"text":"x","element":"span"},{"text":". First take the binomial expansion of the ","element":"span"},{"text":"inner ","element":"span"},{"text":"expectation, and keep it to second order in ","element":"span"},{"style":{"height":10.69},"width":51.36,"height":26.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/18-15.png","element":"img","alt":" ra:","inline":true}],[{"style":{"width":"95%"},"width":1656,"height":351,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/18-16.png","element":"img"}],[{"text":"Notice that ","element":"span"},{"style":{"height":17.6},"width":101.04,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/18-17.png","element":"img","alt":" ra(ka","inline":true},{"text":") can be complex, but ","element":"span"},{"style":{"height":21.59},"width":223.56,"height":53.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/18-18.png","element":"img","alt":" ⟨ra(ka)⟩ka|x","inline":true},{"text":", as it appears in the above expansion, is 
real-valued. Using this result, again expand ","element":"span"},{"style":{"height":26.85},"width":283.08,"height":67.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/18-19.png","element":"img","alt":" ⟨∏a ⟨era⟩Daka|x⟩x","inline":true},{"text":". The correction to log ","element":"span"},{"text":"R","element":"span"},{"text":", up ","element":"span"},{"text":"to second order, is","element":"span"}],[{"style":{"width":"81%"},"width":1409,"height":248,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/18-20.png","element":"img"}],[{"text":"In the above relation the first-order terms all disappeared as ","element":"span"},{"style":{"height":17.6},"width":598,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/18-21.png","element":"img","alt":" ⟨⟨ra(ka)⟩⟩ = 0. Terms involving","inline":true},{"style":{"height":19.14},"width":1726.28,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/18-22.png","element":"img","alt":"⟨⟨ra(ka)2⟩⟩ = 0 similarly disappear, as every polynomial in the expansion ra(ka)2 averages","inline":true,"padRight":true},{"text":"to zero. This is a general case of Equation (26), in which ","element":"span"},{"style":{"height":16.4},"width":665.24,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/18-23.png","element":"img","alt":" Dn = 1 for all factors. In Appendix","inline":true,"padRight":true},{"text":"B we show how to use the general result for the case where the factorization is a tree and our factors are edges (pairs) and nodes (single variables).","element":"span"}]]},{"heading":"8. 
Gaussian Process Results","paragraphs":[[{"text":"One of the most important applications of EP is to statistical models with Gaussian process (GP) priors, where ","element":"span"},{"text":"x ","element":"span"},{"text":"is a latent variable with Gaussian prior distribution with a kernel matrix ","element":"span"},{"text":"K ","element":"span"},{"text":"as covariance ","element":"span"},{"style":{"height":19.53},"width":242.4,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/19-0.png","element":"img","alt":" E[xxT ] = K.","inline":true}],[{"text":"It is well known that for many models, like GP classification, inference with EP is on par with MCMC ground truth (Kuss and Rasmussen, 2005). Section 8.1 underlines this case, and shows corrections to the partition function on the USPS data set over a range of kernel hyperparameter settings.","element":"span"}],[{"text":"A common inference task is to predict the output for previously unseen data. Under a GP regression model, a key quantity is the predictive mean function. The predictive mean is analytically tractable when the latent function is corrupted with Gaussian noise to produce observations ","element":"span"},{"style":{"height":12},"width":56.16,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/19-1.png","element":"img","alt":" yn.","inline":true,"padRight":true},{"text":"This need not be the case; in Section 8.2 we examine the problem of quantized regression, where the noise model is non-Gaussian with sharp discontinuities. We show practically how the corrections transfer to other moments, like the predictive mean. 
Through it, we arrive at a hypothetical rule of thumb: if the data isn’t “sensible” under the (probabilistic) model of interest, there is no guarantee for EP giving satisfactory inference.","element":"span"}],[{"text":"Armed with the rule of thumb, Section 8.3 constructs an insightful counterexample where the EP estimate diverges or is far from ground truth with more data. Divergence in the partition function is manifested in the initial correction terms, giving a test for the approximation accuracy that doesn’t rely on any Monte Carlo ground truth.","element":"span"}],[{"text":"8.1 Gaussian Process Classification","element":"span"}],[{"text":"The GP classification model arises when we observe ","element":"span"},{"text":"N ","element":"span"},{"text":"data points ","element":"span"},{"style":{"height":11.09},"width":40.68,"height":27.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/19-2.png","element":"img","alt":" sn","inline":true,"padRight":true},{"text":"with class labels ","element":"span"},{"style":{"height":17.6},"width":248.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/19-3.png","element":"img","alt":"yn ∈ {−1, 1}","inline":true},{"text":", and model ","element":"span"},{"text":"y ","element":"span"},{"text":"through a latent function ","element":"span"},{"text":"x ","element":"span"},{"text":"with a GP prior. 
The likelihood terms for ","element":"span"},{"style":{"height":12},"width":42.6,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/19-4.png","element":"img","alt":" yn","inline":true,"padRight":true},{"text":"are assumed to be ","element":"span"},{"style":{"height":17.6},"width":337.8,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/19-5.png","element":"img","alt":" tn(xn) = Φ(ynxn","inline":true},{"text":"), where Φ(","element":"span"},{"style":{"height":5.6},"width":12,"height":14,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/19-6.png","element":"img","alt":"·","inline":true},{"text":") denotes the cumulative Normal density.","element":"span"}],[{"text":"An extensive MCMC evaluation of EP for GP classification on various data sets was given by Kuss and Rasmussen (2005), showing that the log marginal likelihood of the data can be approximated remarkably well. As shown by Opper et al. (2009), an even more accurate estimation of the approximation error is given by considering the second order correction in Equation (28). For GPC we generally found that the ","element":"span"},{"text":"l ","element":"span"},{"text":"= 3 term dominates ","element":"span"},{"text":"l ","element":"span"},{"text":"= 4, and we do not include any higher cumulants here.","element":"span"}],[{"text":"Figure 3 illustrates the correction to log ","element":"span"},{"text":"R","element":"span"},{"text":", with ","element":"span"},{"text":"l ","element":"span"},{"text":"= 3","element":"span"},{"text":", ","element":"span"},{"text":"4, on the binary subproblem of the USPS 3’s vs. 5’s digits data set, with ","element":"span"},{"text":"N ","element":"span"},{"text":"= 767. This is the same set-up as that of Kuss and Rasmussen (2005) and Opper et al. 
(2009), using the kernel ","element":"span"},{"style":{"height":21.26},"width":618.24,"height":53.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/19-7.png","element":"img","alt":" k(s, s′) = σ2 exp(− 12∥s−s′∥2/ℓ2),","inline":true,"padRight":true},{"text":"and we refer the reader to both papers for additional and complementary figures and results. We evaluated Equation (28) on a similar grid of log ","element":"span"},{"style":{"height":16.4},"width":370.08,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/19-8.png","element":"img","alt":" ℓ and log σ values.","inline":true,"padRight":true},{"text":"For the same grid values we obtained Monte Carlo estimates of log ","element":"span"},{"text":"Z","element":"span"},{"text":", and hence log ","element":"span"},{"text":"R","element":"span"},{"text":". The correction, compared to the magnitude of the log ","element":"span"},{"text":"Z ","element":"span"},{"text":"grids by Kuss and Rasmussen (2005), is remarkably small, and underlines their findings on the accuracy of EP for GPC.","element":"span"}],[{"text":"The correction from Equation (28), as computed here, is ","element":"span"},{"style":{"height":19.14},"width":109.64,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/19-9.png","element":"img","alt":" O(N 2","inline":true},{"text":"), and compares favorably to the ","element":"span"},{"style":{"height":19.13},"width":109.64,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/19-10.png","element":"img","alt":" O(N 3","inline":true},{"text":") complexity of EP for GPC.","element":"span"}],[{"style":{"width":"89%"},"width":1546,"height":768,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/20-0.png","element":"img"}],[{"text":"Figure 3: A comparison of log ","element":"figcaption","subtype":"caption"},{"text":"R ","element":"figcaption","subtype":"caption"},{"text":"using a perturbation 
expansion of Equation (28) against Monte Carlo estimates of log ","element":"figcaption","subtype":"caption"},{"text":"R","element":"figcaption","subtype":"caption"},{"text":", using the USPS data set from Kuss and Rasmussen (2005). The second order correction to log ","element":"figcaption","subtype":"caption"},{"text":"R","element":"figcaption","subtype":"caption"},{"text":", with ","element":"figcaption","subtype":"caption"},{"text":"l ","element":"figcaption","subtype":"caption"},{"text":"= 3","element":"figcaption","subtype":"caption"},{"text":", ","element":"figcaption","subtype":"caption"},{"text":"4, is used on the ","element":"figcaption","subtype":"caption"},{"text":"left","element":"figcaption","subtype":"caption"},{"text":"; the ","element":"figcaption","subtype":"caption"},{"text":"right ","element":"figcaption","subtype":"caption"},{"text":"plot uses a Monte Carlo estimate of log ","element":"figcaption","subtype":"caption"},{"text":"R","element":"figcaption","subtype":"caption"},{"text":".","element":"figcaption","subtype":"caption"}],[{"text":"8.2 Uniform Noise Regression","element":"span"}],[{"text":"We turn our attention to a regression problem, that of learning a latent function ","element":"span"},{"text":"x","element":"span"},{"text":"(","element":"span"},{"text":"s","element":"span"},{"text":") from inputs ","element":"span"},{"style":{"height":17.6},"width":85.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/20-1.png","element":"img","alt":" {sn}","inline":true,"padRight":true},{"text":"and matching real-valued observations ","element":"span"},{"style":{"height":17.6},"width":87.76,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/20-2.png","element":"img","alt":" {yn}","inline":true},{"text":". 
A frequent nonparametric treatment assumes that ","element":"span"},{"text":"x","element":"span"},{"text":"(","element":"span"},{"text":"s","element":"span"},{"text":") is ","element":"span"},{"text":"a priori ","element":"span"},{"text":"drawn from a GP prior with covariance function ","element":"span"},{"style":{"height":17.6},"width":115.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/20-3.png","element":"img","alt":"k(s, s′","inline":true},{"text":"), from which a corrupted version ","element":"span"},{"text":"y ","element":"span"},{"text":"is observed. Analytically tractable inference is no longer possible in this model when the observation noise is non-Gaussian. Some scenarios include that of quantized regression, where ","element":"span"},{"style":{"height":12},"width":42.6,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/20-4.png","element":"img","alt":" yn","inline":true,"padRight":true},{"text":"is formed by rounding ","element":"span"},{"style":{"height":17.6},"width":82.44,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/20-5.png","element":"img","alt":" x(sn","inline":true},{"text":") to, say, the nearest integer, or where ","element":"span"},{"text":"x","element":"span"},{"text":"(","element":"span"},{"text":"s","element":"span"},{"text":") indicates a robot’s path in a control problem, with conditions to stay within certain “wall” bounds. 
In these scenarios the latent function ","element":"span"},{"style":{"height":17.6},"width":246.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/20-6.png","element":"img","alt":" x(sn) can be","inline":true,"padRight":true},{"text":"reconstructed from ","element":"span"},{"style":{"height":12},"width":42.6,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/20-7.png","element":"img","alt":" yn","inline":true,"padRight":true},{"text":"by adding sharply discontinuous uniformly random ","element":"span"},{"style":{"height":17.6},"width":276.48,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/20-8.png","element":"img","alt":" U[−a, a] noise,","inline":true}],[{"style":{"width":"47%"},"width":822,"height":110,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/20-9.png","element":"img"}],[{"text":"We now assume an EP approximation ","element":"span"},{"style":{"height":18.4},"width":355.24,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/20-10.png","element":"img","alt":" q(x) = N(x ; µ, Σ","inline":true},{"text":"), which can be obtained by using the moment calculations in Appendix E.2. 
To simplify the exposition of the predictive marginal, we follow the notation of Rasmussen and Williams (2005, Chapter 3) and let ","element":"span"},{"style":{"height":14.69},"width":97.84,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/20-11.png","element":"img","alt":" λn =","inline":true,"padRight":true},{"text":"(","element":"span"},{"style":{"height":11.2},"width":103.56,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/20-12.png","element":"img","alt":"τn, νn","inline":true},{"text":"), so that the final EP approximation multiplies ","element":"span"},{"style":{"height":21.26},"width":661.84,"height":53.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/20-13.png","element":"img","alt":" gn terms �n exp{− 12τnx2n + νnxn}","inline":true,"padRight":true},{"text":"into a joint Gaussian ","element":"span"},{"text":"N","element":"span"},{"text":"(","element":"span"},{"text":"x ","element":"span"},{"text":"; ","element":"span"},{"text":"0","element":"span"},{"text":", ","element":"span"},{"text":"K","element":"span"},{"text":").","element":"span"}],[{"text":"8.2.1 Making Predictions for New Data","element":"span"}],[{"text":"The latent function ","element":"span"},{"style":{"height":17.6},"width":78.44,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/20-14.png","element":"img","alt":" x(s∗","inline":true},{"text":") at any new input ","element":"span"},{"style":{"height":11.09},"width":36.68,"height":27.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/20-15.png","element":"img","alt":" s∗","inline":true,"padRight":true},{"text":"is obtained by the predictive marginal ","element":"span"},{"style":{"height":17.6},"width":99.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/20-16.png","element":"img","alt":" q(x∗)","inline":true,"padRight":true},{"text":"of 
","element":"span"},{"style":{"height":17.6},"width":125.48,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/20-17.png","element":"img","alt":" q(x, x∗","inline":true},{"text":"). The marginal ","element":"span"},{"style":{"height":17.6},"width":79.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/20-18.png","element":"img","alt":" q(x∗","inline":true},{"text":")—given below in Equation (34)—is directly obtained from the EP approximation ","element":"span"},{"style":{"height":18.4},"width":358.6,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-0.png","element":"img","alt":" q(x) = N(x ; µ, Σ","inline":true},{"text":"). However, the correction to its mean, as was given in Equation (29), requires covariances Σ","element":"span"},{"style":{"height":6},"width":37.8,"height":15,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-1.png","element":"img","alt":"∗n","inline":true},{"text":", which are derived here.","element":"span"}],[{"text":"Let ","element":"span"},{"style":{"height":17.6},"width":429.32,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-2.png","element":"img","alt":" κ∗ = k(s∗, s∗), and k∗","inline":true,"padRight":true},{"text":"be a vector containing the covariance function evaluations ","element":"span"},{"style":{"height":17.6},"width":139.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-3.png","element":"img","alt":"k(s∗, sn","inline":true},{"text":"). 
Again following Rasmussen and Williams (2005)’s notation, let ","element":"span"},{"text":"˜","element":"span"},{"style":{"height":12},"width":37,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-4.png","element":"img","alt":"Σ","inline":true,"padRight":true},{"text":"be the diagonal matrix containing 1","element":"span"},{"style":{"height":17.6},"width":61.8,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-5.png","element":"img","alt":"/τn","inline":true,"padRight":true},{"text":"along its diagonal. The EP covariance, on the inclusion of ","element":"span"},{"style":{"height":14.8},"width":99.56,"height":37,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-6.png","element":"img","alt":" x∗, is","inline":true}],[{"style":{"width":"80%"},"width":1399,"height":266,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-7.png","element":"img"}],[{"text":"with ","element":"span"},{"style":{"height":20.41},"width":487.84,"height":51.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-8.png","element":"img","alt":" Σ = K − K(K + ˜Σ)−1K","inline":true},{"text":". There is no observation associated with ","element":"span"},{"style":{"height":15.6},"width":317.68,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-9.png","element":"img","alt":" s∗, hence τ∗ = 0","inline":true,"padRight":true},{"text":"in the first line above, and its inclusion has ","element":"span"},{"style":{"height":15.28},"width":284.08,"height":38.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-10.png","element":"img","alt":" cl∗ = 0 for l ≥","inline":true,"padRight":true},{"text":"3. 
The second line follows by computing matrix partitioned inverses twice on ","element":"span"},{"style":{"height":14.69},"width":53.48,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-11.png","element":"img","alt":" Σ∗","inline":true},{"text":". The joint EP approximation for any new input point ","element":"span"},{"style":{"height":11.09},"width":36.68,"height":27.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-12.png","element":"img","alt":" s∗","inline":true,"padRight":true},{"text":"is directly obtained as","element":"span"}],[{"style":{"width":"49%"},"width":861,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-13.png","element":"img"}],[{"text":"with the marginal ","element":"span"},{"style":{"height":17.6},"width":216.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-14.png","element":"img","alt":" q(x∗) being","inline":true}],[{"style":{"width":"86%"},"width":1491,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-15.png","element":"img"}],[{"text":"According to Equation (29), one needs the covariances Σ","element":"span"},{"style":{"height":10.8},"width":31.8,"height":27,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-16.png","element":"img","alt":"∗j","inline":true,"padRight":true},{"text":"to correct the marginal’s mean; they appear in the last column of ","element":"span"},{"style":{"height":14.69},"width":53.48,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-17.png","element":"img","alt":" Σ∗","inline":true,"padRight":true},{"text":"in Equation (33). 
The correction is","element":"span"}],[{"style":{"width":"62%"},"width":1087,"height":134,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-18.png","element":"img"}],[{"text":"The sum over pairs ","element":"span"},{"style":{"height":16.8},"width":109.04,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-19.png","element":"img","alt":" j ̸= n","inline":true,"padRight":true},{"text":"includes the added dimension ","element":"span"},{"style":{"height":8.4},"width":22,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-20.png","element":"img","alt":" ∗","inline":true},{"text":", and thus pairs (","element":"span"},{"style":{"height":17.6},"width":291.36,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-21.png","element":"img","alt":"j, ∗) and (∗, n).","inline":true,"padRight":true},{"text":"The cumulants for this problem, used both for EP and correcting it, are derived in Appendix E.2.","element":"span"}],[{"text":"8.2.2 Predictive Corrections","element":"span"}],[{"text":"In Figure 4 we investigate the predictive mean correction for two cases, one where the data cannot realistically be expected to appear under the prior, and the other where the prior is reasonable. 
For ","element":"span"},{"style":{"height":12.8},"width":110.72,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-22.png","element":"img","alt":" s ∈ R","inline":true},{"text":", the values of ","element":"span"},{"style":{"height":17.6},"width":79.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-23.png","element":"img","alt":" x(s∗","inline":true},{"text":") are predicted using a GP with squared exponential covariance function ","element":"span"},{"style":{"height":21.27},"width":587.52,"height":53.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-24.png","element":"img","alt":" k(s, s′) = θ exp(− 12(s − s′)2/ℓ).","inline":true}],[{"text":"In the first instance, the prior amplitude ","element":"span"},{"style":{"height":12.8},"width":21,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-25.png","element":"img","alt":" θ","inline":true,"padRight":true},{"text":"and lengthscale ","element":"span"},{"style":{"height":12.8},"width":18,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-26.png","element":"img","alt":" ℓ","inline":true,"padRight":true},{"text":"are deliberately set to values that are too big; in other words, a typical sample from the prior would not match the observed data. 
We illustrate the posterior marginal ","element":"span"},{"style":{"height":17.6},"width":79.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-27.png","element":"img","alt":" q(x∗","inline":true},{"text":"), and using Equations (29) and (30), show visible corrections to its mean and variance.","element":"span"},{"style":{"height":8.4},"width":17,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/21-28.png","element":"img","alt":"4","inline":true,"padRight":true},{"text":"For comparison, Figure 4","element":"span"}],[{"style":{"width":"62%"},"width":1080,"height":1484,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/22-0.png","element":"img"}],[{"text":"Figure 4: Predicting ","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":79.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/22-1.png","element":"img","alt":" x(s∗","inline":true},{"text":") with a GP. 
The “boxed” bars indicate the permissible ","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":101.96,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/22-2.png","element":"img","alt":" x(sn)","inline":true,"padRight":true},{"text":"values; they are linked to observations ","element":"figcaption","subtype":"caption"},{"style":{"height":12},"width":42.6,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/22-3.png","element":"img","alt":" yn","inline":true,"padRight":true},{"text":"through the uniform likelihood ","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":131.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/22-4.png","element":"img","alt":" I[|xn −","inline":true},{"style":{"height":17.6},"width":144.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/22-5.png","element":"img","alt":"yn| < a","inline":true},{"text":"]. Due to the ","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":143,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/22-6.png","element":"img","alt":" U[−a, a","inline":true},{"text":"] noise model, ","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":79.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/22-7.png","element":"img","alt":" q(x∗","inline":true},{"text":") is ambivalent to where in the “box” ","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":79.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/22-8.png","element":"img","alt":" x(s∗","inline":true},{"text":") is placed. 
A second order correction to the mean of ","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":277.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/22-9.png","element":"img","alt":" q(x∗) is shown","inline":true,"padRight":true},{"text":"in a dotted line. The lightly shaded function plots ","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":165.04,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/22-10.png","element":"img","alt":" p(x∗), if","inline":true,"padRight":true},{"text":"the likelihood was also Gaussian with variance matching that of the “box”. In the ","element":"figcaption","subtype":"caption"},{"text":"top ","element":"figcaption","subtype":"caption"},{"text":"figure both the prior amplitude ","element":"figcaption","subtype":"caption"},{"style":{"height":12.8},"width":21,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/22-11.png","element":"img","alt":" θ","inline":true,"padRight":true},{"text":"and lengthscale ","element":"figcaption","subtype":"caption"},{"style":{"height":12.8},"width":18,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/22-12.png","element":"img","alt":" ℓ","inline":true,"padRight":true},{"text":"are overestimated. 
In the ","element":"figcaption","subtype":"caption"},{"text":"bottom ","element":"figcaption","subtype":"caption"},{"text":"figure, ","element":"figcaption","subtype":"caption"},{"style":{"height":12.8},"width":140.88,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/22-13.png","element":"img","alt":"θ and ℓ","inline":true,"padRight":true},{"text":"were chosen by maximizing log ","element":"figcaption","subtype":"caption"},{"style":{"height":14.69},"width":75.8,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/22-14.png","element":"img","alt":" ZEP","inline":true,"padRight":true},{"text":"with respect to their values. Notice the smaller EP approximation error.","element":"figcaption","subtype":"caption"}],[{"text":"additionally shows what the predictive mean would have been were ","element":"span"},{"style":{"height":17.6},"width":87.76,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/22-15.png","element":"img","alt":" {yn}","inline":true,"padRight":true},{"text":"observed under Gaussian noise with the same mean and variance as ","element":"span"},{"style":{"height":17.6},"width":143,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/22-16.png","element":"img","alt":" U[−a, a","inline":true},{"text":"]: it is substantially different.","element":"span"}],[{"style":{"width":"92%"},"width":1605,"height":575,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/23-0.png","element":"img"}],[{"text":"Figure 5: Predicting ","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":79.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/23-1.png","element":"img","alt":" x(s∗","inline":true},{"text":") with a GP with 
","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":888.8,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/23-2.png","element":"img","alt":" k(s, s′) = exp{−|s − s′|/2ℓ} and ℓ = 1. In the","inline":true},{"style":{"height":16.4},"width":516.48,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/23-3.png","element":"img","alt":"left figure log RMCMC = 0.","inline":true},{"text":"41, while the second order correction estimates it as log ","element":"figcaption","subtype":"caption"},{"style":{"height":16.4},"width":457.16,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/23-4.png","element":"img","alt":" R ≈ 0.64. On the right","inline":true},{"text":", the correction to the variance is not as accurate as that on the ","element":"figcaption","subtype":"caption"},{"text":"left","element":"figcaption","subtype":"caption"},{"text":". The ","element":"figcaption","subtype":"caption"},{"text":"right ","element":"figcaption","subtype":"caption"},{"text":"correction is log ","element":"figcaption","subtype":"caption"},{"style":{"height":14.69},"width":243.84,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/23-5.png","element":"img","alt":" RMCMC = 0.","inline":true},{"text":"28, and its discrepancy with log ","element":"figcaption","subtype":"caption"},{"style":{"height":12.4},"width":125.28,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/23-6.png","element":"img","alt":" R ≈ 0.","inline":true},{"text":"45 (EP+corr) is much bigger.","element":"figcaption","subtype":"caption"}],[{"text":"In the second instance, log ","element":"span"},{"style":{"height":14.69},"width":75.8,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/23-7.png","element":"img","alt":" ZEP","inline":true,"padRight":true},{"text":"is maximized with respect to the covariance function 
hyperparameters ","element":"span"},{"style":{"height":12.8},"width":138.96,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/23-8.png","element":"img","alt":" θ and ℓ","inline":true,"padRight":true},{"text":"to get a kernel function that more reasonably describes the data. The correction to the mean of ","element":"span"},{"style":{"height":17.6},"width":75.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/23-9.png","element":"img","alt":" q(s∗","inline":true},{"text":") is much smaller, and furthermore, generally follows the “Gaussian noise” posterior mean. When the observed data is not typical under the prior, the correction to ","element":"span"},{"style":{"height":17.6},"width":77.48,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/23-10.png","element":"img","alt":" ⟨x∗⟩","inline":true,"padRight":true},{"text":"is substantially bigger than when the prior is representative of the data.","element":"span"}],[{"text":"8.2.3 Underestimating the Truth","element":"span"}],[{"text":"Under closer inspection, the variance in Figure 4 is slightly underestimated in regions where there are many close box constraints ","element":"span"},{"style":{"height":17.6},"width":267.8,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/23-11.png","element":"img","alt":" |xn − yn| < a","inline":true},{"text":". However, under sparser constraints relative to the kernel width, EP accurately estimates the predictive mean and variance. 
In Figure 5 this is taken further: for ","element":"span"},{"style":{"height":17.6},"width":799.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/23-12.png","element":"img","alt":" N = 100 uniformly spaced inputs s ∈ [0,","inline":true,"padRight":true},{"text":"1], it is clear that ","element":"span"},{"text":"q","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") becomes too narrow. The second order correction, on the other hand, provides a much closer estimate to the ground truth.","element":"span"}],[{"text":"One might inquire about the behavior of the EP estimate as ","element":"span"},{"style":{"height":12.4},"width":158.72,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/23-13.png","element":"img","alt":" N → ∞","inline":true,"padRight":true},{"text":"in Figure 5. In the next section, this will be used as a basis for illustrating a special case where log ","element":"span"},{"style":{"height":14.69},"width":75.8,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/23-14.png","element":"img","alt":" ZEP","inline":true,"padRight":true},{"text":"diverges","element":"span"},{"text":".","element":"span"}],[{"text":"8.3 Gaussian Process in a Box","element":"span"}],[{"text":"In the following insightful example—a special case of uniform noise regression—log ","element":"span"},{"style":{"height":14.69},"width":75.8,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/23-15.png","element":"img","alt":" ZEP","inline":true,"padRight":true},{"text":"diverges from the ground truth with more data. 
Consider the ratio of functions ","element":"span"},{"text":"x","element":"span"},{"text":"(","element":"span"},{"text":"s","element":"span"},{"text":") over [0","element":"span"},{"text":", ","element":"span"},{"text":"1], drawn from a GP prior with kernel ","element":"span"},{"style":{"height":17.6},"width":117.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/23-16.png","element":"img","alt":" k(s, s′","inline":true},{"text":"), such that ","element":"span"},{"text":"x","element":"span"},{"text":"(","element":"span"},{"text":"s","element":"span"},{"text":") lies within the [","element":"span"},{"style":{"height":17.6},"width":206.4,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/23-17.png","element":"img","alt":"−a, a] box.","inline":true,"padRight":true},{"text":"Figure 6 illustrates three random draws from a GP prior, two of which are not contained in the [","element":"span"},{"style":{"height":11.2},"width":99.32,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/23-18.png","element":"img","alt":"−a, a","inline":true},{"text":"] interval. 
","element":"span"},{"text":"The ratio of functions contained in the interval is equal to the","element":"span"}],[{"style":{"width":"93%"},"width":1612,"height":1132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/24-0.png","element":"img"}],[{"text":"Figure 6: Samples from a GP prior with kernel ","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":784.8,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/24-1.png","element":"img","alt":" k(s, s′) = exp{−|s − s′|/2ℓ} with ℓ = 1,","inline":true,"padRight":true},{"text":"two of which are not contained in the [","element":"figcaption","subtype":"caption"},{"style":{"height":11.2},"width":99.32,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/24-2.png","element":"img","alt":"−a, a","inline":true},{"text":"] interval, are shown ","element":"figcaption","subtype":"caption"},{"text":"top left","element":"figcaption","subtype":"caption"},{"text":". As ","element":"figcaption","subtype":"caption"},{"text":"N ","element":"figcaption","subtype":"caption"},{"text":"increases in Equation (35), with ","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":381.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/24-3.png","element":"img","alt":" sn ∈ [0, 1], log ZEP","inline":true,"padRight":true},{"text":"diverges, while log ","element":"figcaption","subtype":"caption"},{"text":"Z ","element":"figcaption","subtype":"caption"},{"text":"converges to a constant. This is shown ","element":"figcaption","subtype":"caption"},{"text":"top right","element":"figcaption","subtype":"caption"},{"text":". 
The +’s and ","element":"figcaption","subtype":"caption"},{"style":{"height":8.8},"width":34,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/24-4.png","element":"img","alt":" ×","inline":true},{"text":"’s indicate the inclusion of the fourth (+) and fourth and sixth (","element":"figcaption","subtype":"caption"},{"style":{"height":8.8},"width":34,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/24-5.png","element":"img","alt":"×","inline":true},{"text":") cumulants from the 2","element":"figcaption","subtype":"caption"},{"style":{"height":15.54},"width":150.92,"height":38.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/24-6.png","element":"img","alt":"nd order","inline":true,"padRight":true},{"text":"in Equation (28) (an arrangement by total order would include 3","element":"figcaption","subtype":"caption"},{"style":{"height":17.82},"width":313.64,"height":44.56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/24-7.png","element":"img","alt":"rd order c4–c4–c4","inline":true,"padRight":true},{"text":"in ","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":489.32,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/24-8.png","element":"img","alt":" ×). 
Bottom left and right","inline":true,"padRight":true},{"text":"show the growth for 2","element":"figcaption","subtype":"caption"},{"style":{"height":17.63},"width":203.24,"height":44.08,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/24-9.png","element":"img","alt":"nd order c4","inline":true,"padRight":true},{"text":"correction relative to the exact correction.","element":"figcaption","subtype":"caption"}],[{"text":"normalizing constant of","element":"span"}],[{"style":{"width":"70%"},"width":1224,"height":110,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/24-10.png","element":"img"}],[{"text":"The fraction of samples from the GP prior that lie inside [","element":"span"},{"style":{"height":11.2},"width":99.32,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/24-11.png","element":"img","alt":"−a, a","inline":true},{"text":"] shouldn’t change as the GP is sampled at increasing granularity of inputs ","element":"span"},{"text":"s","element":"span"},{"text":". As Figure 6 illustrates, the MCMC estimate of log ","element":"span"},{"text":"Z ","element":"span"},{"text":"converges to a constant as ","element":"span"},{"style":{"height":12.4},"width":151.52,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/24-12.png","element":"img","alt":" N → ∞","inline":true},{"text":". The EP estimate log ","element":"span"},{"style":{"height":14.69},"width":75.8,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/24-13.png","element":"img","alt":" ZEP","inline":true},{"text":", on the other hand, diverges to ","element":"span"},{"style":{"height":8},"width":78.08,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/24-14.png","element":"img","alt":" −∞","inline":true},{"text":". (The cumulants that are required for the correction in Equation (28), and recipes for deriving them, are given in Appendix E.1.) 
Of course the correction also depends on the value ","element":"span"},{"text":"a ","element":"span"},{"text":"chosen. Figure 7 shows that for both ","element":"span"},{"style":{"height":12.8},"width":439.52,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/24-15.png","element":"img","alt":" a → 0 and a → ∞ the","inline":true,"padRight":true},{"text":"correction is zero for large ","element":"span"},{"text":"N","element":"span"},{"text":".","element":"span"}],[{"style":{"width":"55%"},"width":966,"height":598,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/25-0.png","element":"img"}],[{"text":"Figure 7: The accuracy of log ","element":"figcaption","subtype":"caption"},{"style":{"height":14.69},"width":75.8,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/25-1.png","element":"img","alt":" ZEP","inline":true,"padRight":true},{"text":"depends on the size of the [","element":"figcaption","subtype":"caption"},{"style":{"height":11.2},"width":99.32,"height":28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/25-2.png","element":"img","alt":"−a, a","inline":true},{"text":"] box relative to ","element":"figcaption","subtype":"caption"},{"style":{"height":15.6},"width":30.24,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/25-3.png","element":"img","alt":" ℓ,","inline":true,"padRight":true},{"text":"with the estimate being exact as ","element":"figcaption","subtype":"caption"},{"style":{"height":12.8},"width":337.76,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/25-4.png","element":"img","alt":" a → 0 and a → ∞","inline":true},{"text":". The second order correction for Figure 6’s kernel is illustrated here over varying ","element":"figcaption","subtype":"caption"},{"text":"a","element":"figcaption","subtype":"caption"},{"text":"’s. 
The +’s and ","element":"figcaption","subtype":"caption"},{"style":{"height":8.8},"width":34,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/25-5.png","element":"img","alt":" ×","inline":true},{"text":"’s indicate the inclusion of the 4","element":"figcaption","subtype":"caption"},{"style":{"height":19.54},"width":493.36,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/25-6.png","element":"img","alt":"th (+) and 4th and 6th (×","inline":true},{"text":") cumulants in Equation (28). Of these, the top pair of lines are for ","element":"figcaption","subtype":"caption"},{"text":"N ","element":"figcaption","subtype":"caption"},{"text":"= 100, and the bottom pair for ","element":"figcaption","subtype":"caption"},{"text":"N ","element":"figcaption","subtype":"caption"},{"text":"= 50.","element":"figcaption","subtype":"caption"}],[{"text":"An intuitive explanation, due to Philipp Hennig, takes a one-dimensional model ","element":"span"},{"text":"p","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") = ","element":"span"},{"style":{"height":19.54},"width":378.24,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/25-7.png","element":"img","alt":"I[|x| < a]N N(x ; 0,","inline":true,"padRight":true},{"text":"1). A fully-factorized approximation therefore has ","element":"span"},{"style":{"height":12.8},"width":329.96,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/25-8.png","element":"img","alt":" N − 1 redundant","inline":true,"padRight":true},{"text":"factors, as removing them doesn’t change ","element":"span"},{"text":"p","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":"). 
However, each additional ","element":"span"},{"text":"I","element":"span"},{"text":"[","element":"span"},{"text":"|","element":"span"},{"text":"x","element":"span"},{"text":"| ","element":"span"},{"text":"< a","element":"span"},{"text":"] truncates the estimate, forcing EP to further reduce the variance of ","element":"span"},{"text":"q","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":"). The EP estimate using ","element":"span"},{"text":"N ","element":"span"},{"text":"factors ","element":"span"},{"style":{"height":20.34},"width":239.76,"height":50.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/25-9.png","element":"img","alt":" I[|x| < a]1/N ","inline":true,"padRight":true},{"text":"is correct (see Appendix C for a similar example and analysis), even though the original problem remains unchanged. ","element":"span"},{"text":"Even though this immediate solution cannot be applied to Equation (35), the ","element":"span"},{"text":"redundancy ","element":"span"},{"text":"across factors could be addressed by a principled junction tree-like factorization, where tuples of “neighboring” factors can be co-treated. Although beyond the scope of this paper, Appendix A gives a guideline on how to structure such an approximation.","element":"span"}]]},{"heading":"9. Ising Model Results","paragraphs":[[{"text":"This section discusses various aspects of corrections to EP as applied to the Ising model—a Bayesian network with binary variables and pairwise potentials—in Equation (3).","element":"span"}],[{"text":"We consider the set-up proposed by Wainwright and Jordan (2006) in which ","element":"span"},{"text":"N ","element":"span"},{"text":"= 16 nodes are either fully connected or connected to their nearest neighbors in a 4-by-4 grid. 
The external field (observation) strengths ","element":"span"},{"style":{"height":15.09},"width":32.64,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/25-10.png","element":"img","alt":" θi","inline":true,"padRight":true},{"text":"are drawn from a ","element":"span"},{"text":"uniform ","element":"span"},{"text":"distribution ","element":"span"},{"style":{"height":15.09},"width":85.36,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/25-11.png","element":"img","alt":" θi ∼","inline":true},{"style":{"height":17.6},"width":540.48,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/25-12.png","element":"img","alt":"U[−dobs, dobs] with dobs = 0.","inline":true},{"text":"25. Three types of coupling strength statistics are considered: repulsive (anti-ferromagnetic) ","element":"span"},{"style":{"height":18.29},"width":1144.44,"height":45.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/25-13.png","element":"img","alt":" Jij ∼ U[−2dcoup, 0], mixed Jij ∼ U[−dcoup, +dcoup], and at-","inline":true,"padRight":true},{"text":"tractive (ferromagnetic) ","element":"span"},{"style":{"height":18.29},"width":370.56,"height":45.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/25-14.png","element":"img","alt":" Jij ∼ U[0, +2dcoup].","inline":true}],[{"text":"Previously we have shown (Opper and Winther, 2005) that EP/EC gives very competitive results compared to several standard methods. In Section 9.1 we are interested in investigating whether a further improvement is obtained with the cumulant expansion. 
In","element":"span"}],[{"style":{"width":"68%"},"width":1192,"height":695,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/26-0.png","element":"img"}],[{"text":"Table 1: Average absolute deviation (AAD) of marginals in a Wainwright-Jordan set-up, comparing loopy belief propagation (LBP), log-determinant relaxation (LD), EC, EC with ","element":"figcaption","subtype":"caption"},{"text":"l ","element":"figcaption","subtype":"caption"},{"text":"= 4 second order correction (EC c), and an EC tree (EC t). ","element":"figcaption","subtype":"caption"},{"text":"Results in bold face highlight best results, while italics indicate where the cumulant expression is less accurate than the original approximation.","element":"figcaption","subtype":"caption"}],[{"text":"Section 9.2, we revisit the correction approach proposed in Paquet et al. (2009) and make an empirical comparison with the cumulant approach.","element":"span"}],[{"text":"9.1 Cumulant Expansion","element":"span"}],[{"text":"For the factorized approximation we use Equations (26) and (29) for the log ","element":"span"},{"text":"Z ","element":"span"},{"text":"and marginal corrections, respectively. ","element":"span"},{"text":"The expression for the cumulants of the Ising model is given in Section 4.2.1. ","element":"span"},{"text":"The derivation of the corresponding tree expressions may be found in Appendices B and E.4.","element":"span"}],[{"text":"Table 1 gives the average absolute deviation (AAD) of marginals","element":"span"}],[{"style":{"width":"76%"},"width":1318,"height":112,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/26-1.png","element":"img"}],[{"text":"while Table 2 gives the absolute deviation of log ","element":"span"},{"text":"Z ","element":"span"},{"text":"averaged over 100 repetitions. 
In two cases (Grid, ","element":"span"},{"style":{"height":18.29},"width":1596.68,"height":45.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/26-2.png","element":"img","alt":" dcoup = 2 Repulsive and Attractive coupling) we observed some numerical problems","inline":true,"padRight":true},{"text":"with the EC tree solver. In some cases a solution might not exist, but we regard numerical instabilities in our implementation as the main cause of these problems. Developing a better solver is currently outside the scope of this work. We choose to report the average performance for those runs that could attain a high degree of expectation consistency: ","element":"span"},{"style":{"height":24.26},"width":563.72,"height":60.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/26-3.png","element":"img","alt":"∑Ni=1(⟨xi⟩qi − ⟨xi⟩q)2 ≤ 10−20","inline":true},{"text":". This was 69 out of 100 runs in the two cases mentioned ","element":"span"},{"text":"and 100 out of 100 in the remaining cases.","element":"span"}],[{"text":"We observe that for the Grid simulations, the corrected marginals in the factorized approximation are less accurate than the original approximation. 
In Figure 8 we vary the coupling strength for a specific set-up (Grid Mixed) and observe a cross-over between the","element":"span"}],[{"style":{"width":"74%"},"width":1289,"height":695,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/27-0.png","element":"img"}],[{"text":"Table 2: Absolute deviation of the log partition function in a Wainwright-Jordan set-up, comparing EC, EC with ","element":"figcaption","subtype":"caption"},{"text":"l ","element":"figcaption","subtype":"caption"},{"text":"= 4 second order correction (EC c), EC with a full second order ","element":"figcaption","subtype":"caption"},{"style":{"height":8.4},"width":21,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/27-1.png","element":"img","alt":"ε","inline":true,"padRight":true},{"text":"expansion (EC ","element":"figcaption","subtype":"caption"},{"style":{"height":8.4},"width":21,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/27-2.png","element":"img","alt":" ε","inline":true},{"text":"c), EC tree (EC t) and EC tree with ","element":"figcaption","subtype":"caption"},{"text":"l ","element":"figcaption","subtype":"caption"},{"text":"= 4 second order correction (EC tc). Results in bold face highlight best results. 
The cumulant expression is consistently more accurate than the original approximation.","element":"figcaption","subtype":"caption"}],[{"style":{"width":"91%"},"width":1587,"height":583,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/27-3.png","element":"img"}],[{"text":"Figure 8: Error on marginal ","element":"figcaption","subtype":"caption"},{"text":"(left) ","element":"figcaption","subtype":"caption"},{"text":"and log ","element":"figcaption","subtype":"caption"},{"text":"Z ","element":"figcaption","subtype":"caption"},{"text":"(right) ","element":"figcaption","subtype":"caption"},{"text":"for grid and mixed couplings as a function of coupling strength.","element":"figcaption","subtype":"caption"}],[{"text":"correction and original for the error on marginals as the coupling strength increases. We conjecture that when the error of the original solution is high then the number of terms needed in the cumulant correction increases. The estimation of the marginal seems more sensitive to this than the log ","element":"span"},{"text":"Z ","element":"span"},{"text":"estimate. The tree approximation is very precise for the whole coupling strength interval considered and the fourth order cumulant in the second order expansion is therefore sufficient to get often quite large improvements over the original tree approximation.","element":"span"}],[{"text":"9.2 The ","element":"span"},{"text":"ε","element":"span"},{"text":"-Expansion","element":"span"}],[{"text":"In Paquet et al. (2009) we introduced an alternative expansion for ","element":"span"},{"text":"R ","element":"span"},{"text":"and applied it to Gaussian processes and mixture models. 
It is obtained from Equation (12) using a finite series expansion, where the normalized deviation","element":"span"}],[{"style":{"width":"22%"},"width":387,"height":104,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/28-0.png","element":"img"}],[{"text":"is treated as the small quantity instead of higher order cumulants. ","element":"span"},{"text":"R ","element":"span"},{"text":"has an exact representation with 2","element":"span"},{"style":{"height":8.8},"width":30,"height":22,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/28-1.png","element":"img","alt":"N ","inline":true,"padRight":true},{"text":"terms that we may truncate at lowest non-trivial order:","element":"span"}],[{"style":{"width":"71%"},"width":1228,"height":141,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/28-2.png","element":"img"}],[{"text":"The linear terms are all equal to one because","element":"span"},{"style":{"height":34.69},"width":847.04,"height":86.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/28-3.png","element":"img","alt":"�qn(xn)q(xn)�q =�q(xn)qn(xn)q(xn) dxn = 1 and since","inline":true}],[{"style":{"height":17.6},"width":105,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/28-4.png","element":"img","alt":"qn(xn","inline":true},{"text":") is a binary distribution the quadratic term becomes a weighted sum of ratios of Normal distributions:","element":"span"}],[{"style":{"width":"67%"},"width":1161,"height":122,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/28-5.png","element":"img"}],[{"text":"The final expression for the lowest order approximation to ","element":"span"},{"text":"R ","element":"span"},{"text":"is then","element":"span"}],[{"style":{"width":"76%"},"width":1327,"height":121,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/28-6.png","element":"img"}],[{"text":"From Table 2 we observe an improvement over the original 
factorized approximation and results similar to the cumulant correction to the factorized approximation for all settings. The ","element":"span"},{"style":{"height":8.4},"width":21,"height":21,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/28-7.png","element":"img","alt":"ε","inline":true},{"text":"-expansion may also be used to calculate marginals and can be applied to generalized factorizations. These topics will be studied elsewhere.","element":"span"}]]},{"heading":"10. Future Directions","paragraphs":[[{"text":"Corrections to Gaussian EP approximations were examined in this paper. The Gaussian measure allowed for a convenient set of mathematical tools to be employed, mostly because it admits orthogonality of a set of polynomials, the Hermite polynomials, which allowed a clean simplification of many expressions. So far we have restricted ourselves to expansions to low orders in cumulants. ","element":"span"},{"text":"Our results indicate that these first corrections to EP can already provide useful information about the quality of the EP solution. Small corrections typically show that EP is fairly accurate and the corrections improve on that. On the other hand, large corrections indicate that the EP approximation performs poorly. The low order corrections can yield a step in the right direction but in general their result may not be trusted and alternatives to the Gaussian EP approximation should be considered. It will be interesting to develop similar expansions to EP approximations with other exponential families besides the Gaussian one.","element":"span"}],[{"text":"$3e","element":"span"}],[{"text":"We expect that such questions could at least be answered for toy models such as the ","element":"span"},{"text":"Gaussian process in a box ","element":"span"},{"text":"model of Section 8.3. 
Our results for the latter example (together with the related ","element":"span"},{"text":"uniform noise regression ","element":"span"},{"text":"case) indicate that EP should not be understood as an off-the-shelf method for approximately calculating arbitrary high-dimensional sums or integrals. One may conjecture that its quality strongly depends on whether such sums or integrals have an interpretation in terms of a proper statistical inference model which contains data that are highly probable with respect to the model. It would be interesting to see if one can develop a theory for the average case performance of EP under such statistical assumptions on the data.","element":"span"}]]},{"heading":"Appendix A. Factorizations: Gaussian Examples","paragraphs":[[{"text":"As ","element":"span"},{"text":"p","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") is a latent Gaussian model, the ","element":"span"},{"text":"g","element":"span"},{"text":"-terms in Equation (5) are chosen in this paper to give a Gaussian approximation","element":"span"}],[{"style":{"width":"43%"},"width":760,"height":102,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/29-0.png","element":"img"}],[{"text":"The sufficient statistics ","element":"span"},{"style":{"height":17.6},"width":69.72,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/29-1.png","element":"img","alt":" φ(x","inline":true},{"text":") and natural parameters ","element":"span"},{"style":{"height":12},"width":29,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/29-2.png","element":"img","alt":" λ","inline":true,"padRight":true},{"text":"of the Gaussian are defined as","element":"span"}],[{"style":{"width":"43%"},"width":749,"height":56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/29-3.png","element":"img"}],[{"text":"where 
","element":"span"},{"style":{"height":21.46},"width":878.52,"height":53.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/29-4.png","element":"img","alt":" λT φ(x) = γT x − 12 tr[ΛxxT ] = γT x − 12xT Λx","inline":true},{"text":". There exists a bijection between the ","element":"span"},{"text":"canonical parameters ","element":"span"},{"style":{"height":16},"width":165.16,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/29-5.png","element":"img","alt":" µ and Σ","inline":true,"padRight":true},{"text":"and natural parameters, such that the mean and covariance can be determined with ","element":"span"},{"style":{"height":18.34},"width":440.64,"height":45.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/29-6.png","element":"img","alt":" Σ = Λ−1 and µ = Σγ.","inline":true}],[{"text":"In Equation (1) we can define ","element":"span"},{"style":{"height":20.73},"width":856.84,"height":51.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/29-7.png","element":"img","alt":" g0(x) = exp{λT0 φ(x)}, where λ0 = (γ(0), Λ(0)","inline":true},{"text":"), such that ","element":"span"},{"text":"it is essentially a rescaling of factor ","element":"span"},{"style":{"height":16.4},"width":38.6,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/29-8.png","element":"img","alt":" f0","inline":true},{"text":". In the Ising model in Equation (3), this means that ","element":"span"},{"style":{"height":19.54},"width":453.64,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/29-9.png","element":"img","alt":"Λ(0) = −J and γ(0) = θ","inline":true},{"text":". 
In the Gaussian process classification model in Equation (2), this implies that ","element":"span"},{"style":{"height":19.54},"width":491.52,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/29-10.png","element":"img","alt":" Λ(0) = K−1 and γ(0) = 0.","inline":true}],[{"style":{"width":"38%"},"width":664,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/29-11.png","element":"img"}],[{"text":"It remains to define a suitable factorization for the term-product ","element":"span"},{"style":{"height":18.38},"width":172.2,"height":45.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/29-12.png","element":"img","alt":"∏n tn(xn","inline":true},{"text":"). This factorization ","element":"span"},{"text":"can be fully factorized, factorized over disjoint sets of variables, factorized as a tree, or follow more arbitrary factorizations (see the simple example in Appendix C). A few such factorizations are given below in increasing order of complexity. 
","element":"span"},{"text":"In each case we do not ","element":"span"},{"style":{"height":16.4},"width":613,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-0.png","element":"img","alt":"include the f0 factor for clarity.","inline":true,"padRight":true},{"text":"Furthermore, even though the term factorization may be chosen to fully factorize, ","element":"span"},{"text":"q","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") may be fully connected through the inclusion of ","element":"span"},{"style":{"height":16.4},"width":52.32,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-1.png","element":"img","alt":" f0.","inline":true}],[{"style":{"width":"30%"},"width":520,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-2.png","element":"img"}],[{"text":"A common factorization of ","element":"span"},{"style":{"height":18.38},"width":171.72,"height":45.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-3.png","element":"img","alt":"�n tn(xn","inline":true},{"text":") is to set ","element":"span"},{"style":{"height":17.6},"width":276.84,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-4.png","element":"img","alt":" fn(x) = tn(xn","inline":true},{"text":"). 
The natural parameters ","element":"span"},{"text":"of ","element":"span"},{"style":{"height":19.55},"width":427.6,"height":48.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-5.png","element":"img","alt":" gn(x) = exp{λTnφ(x)}","inline":true,"padRight":true},{"text":"are chosen to be ","element":"span"},{"style":{"height":23.22},"width":310.6,"height":58.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-6.png","element":"img","alt":" λn = (γ(n)n , Λ(n)nn ","inline":true,"padRight":true},{"text":"), corresponding to ","element":"span"},{"style":{"height":17.6},"width":182.8,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-7.png","element":"img","alt":" φn(xn) =","inline":true,"padRight":true},{"text":"(","element":"span"},{"style":{"height":21.45},"width":173.64,"height":53.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-8.png","element":"img","alt":"xn, − 12x2n","inline":true},{"text":"). For clarity the other ","element":"span"},{"style":{"height":11.6},"width":24,"height":29,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-9.png","element":"img","alt":" γ","inline":true,"padRight":true},{"text":"and Λ parameters in ","element":"span"},{"style":{"height":14.69},"width":50.28,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-10.png","element":"img","alt":" λn ","inline":true,"padRight":true},{"text":"are not shown, as they are ","element":"span"},{"text":"clamped at zero. 
This gives an approximation ","element":"span"},{"text":"q","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") that is defined by ","element":"span"},{"style":{"height":18.24},"width":328.8,"height":45.6,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-11.png","element":"img","alt":" λ = λ0 + ∑n λn.","inline":true}],[{"style":{"width":"50%"},"width":881,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-12.png","element":"img"}],[{"text":"As a second step the ","element":"span"},{"text":"N ","element":"span"},{"text":"variables can be subdivided into ","element":"span"},{"style":{"height":17.6},"width":642.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-13.png","element":"img","alt":" disjoint pairs xπ = (xm, xn). The","inline":true,"padRight":true},{"text":"factorization over terms couples pairs of variables through","element":"span"}],[{"style":{"width":"53%"},"width":926,"height":105,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-14.png","element":"img"}],[{"text":"In this case each factor will have a contribution ","element":"span"},{"style":{"height":19.54},"width":426.64,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-15.png","element":"img","alt":" gπ(x) = exp{λTπφ(x)}","inline":true,"padRight":true},{"text":"to the overall ","element":"span"},{"text":"approximation, and, as ","element":"span"},{"style":{"height":12},"width":43.64,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-16.png","element":"img","alt":" gπ","inline":true,"padRight":true},{"text":"is a function of two variables, it is parameterized by the “correlated Gaussian form” ","element":"span"},{"style":{"height":23.22},"width":621,"height":58.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-17.png","element":"img","alt":" λπ = (γ(π)m , γ(π)n , 
Λ(π)mm, Λ(π)nn , Λ(π)mn","inline":true},{"text":"). By symmetry Λ","element":"span"},{"style":{"height":21.12},"width":193.32,"height":52.8,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-18.png","element":"img","alt":"(π)nm = Λ(π)mn","inline":true},{"text":". The ","element":"span"},{"text":"resulting ","element":"span"},{"text":"q","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") is defined in terms of these disjoint sets with ","element":"span"},{"style":{"height":18.24},"width":336.48,"height":45.6,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-19.png","element":"img","alt":" λ = λ0 + ∑π λπ.","inline":true}],[{"style":{"width":"48%"},"width":839,"height":33,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-20.png","element":"img"}],[{"text":"A tree-structured factorization can be defined by extending the above “disjoint pairs” case to allow for overlaps between terms. Let ","element":"span"},{"text":"G ","element":"span"},{"text":"define a spanning tree structure over all ","element":"span"},{"text":"x","element":"span"},{"text":", and let ","element":"span"},{"style":{"height":17.6},"width":290.52,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-21.png","element":"img","alt":" τ = (m, n) ∈ G","inline":true,"padRight":true},{"text":"denote the edges in the tree. Let ","element":"span"},{"style":{"height":15.09},"width":43.56,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-22.png","element":"img","alt":" dn","inline":true,"padRight":true},{"text":"be the number of edges emanating from node ","element":"span"},{"style":{"height":10.69},"width":45.96,"height":26.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-23.png","element":"img","alt":" xn","inline":true,"padRight":true},{"text":"in the graph. 
Through a clever regrouping of terms into a “junction tree” form with","element":"span"}],[{"style":{"width":"62%"},"width":1073,"height":121,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-24.png","element":"img"}],[{"text":"the term-approximation will be tree-structured. In this example the ","element":"span"},{"style":{"height":14.69},"width":54,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-25.png","element":"img","alt":" Da","inline":true,"padRight":true},{"text":"powers are 1 for edge factors ","element":"span"},{"style":{"height":17.6},"width":278.28,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-26.png","element":"img","alt":" fτ and (1 − dn","inline":true},{"text":") for node factors ","element":"span"},{"style":{"height":17.6},"width":444.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-27.png","element":"img","alt":" fn. Let gτ (x) and gn(x","inline":true},{"text":") be parameterized by ","element":"span"},{"style":{"height":15.49},"width":203.4,"height":38.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-28.png","element":"img","alt":"λτ and λn","inline":true},{"text":", as was done in the two examples above. 
Using","element":"span"}],[{"style":{"width":"44%"},"width":761,"height":111,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-29.png","element":"img"}],[{"text":"the resulting ","element":"span"},{"text":"q","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") has parameter vector ","element":"span"},{"style":{"height":18.78},"width":660.48,"height":46.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/30-30.png","element":"img","alt":" λ = λ0 + ∑τ λτ − ∑n(dn − 1)λn.","inline":true}],[{"text":"It is useful to note that the form of the tree-structured approximation given here is that used by Opper and Winther (2005); it approximates the “junction tree” form using a Power EP factorization (Minka, 2004). The factorization and stationary condition are ","element":"span"},{"text":"different ","element":"span"},{"text":"from those of Tree EP (Minka and Qi, 2004).","element":"span"}],[{"style":{"width":"26%"},"width":463,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/31-0.png","element":"img"}],[{"text":"The EP moment matching conditions from Equation (7) are uniquely met at the stationary point of log ","element":"span"},{"style":{"height":14.69},"width":75.8,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/31-1.png","element":"img","alt":" ZEP","inline":true,"padRight":true},{"text":"in Equation (8), as is shown here. 
","element":"span"},{"text":"Consider the logarithm of the normalizer,","element":"span"}],[{"style":{"width":"68%"},"width":1177,"height":93,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/31-2.png","element":"img"}],[{"text":"Using the sufficient statistics and natural parameters defined above, the two normalizers that constitute Equation (36) are","element":"span"}],[{"style":{"width":"45%"},"width":785,"height":216,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/31-3.png","element":"img"}],[{"text":"Using these definitions, the derivatives of the terms in Equation (36) with respect to some EP factor ","element":"span"},{"text":"c","element":"span"},{"text":"’s parameters ","element":"span"},{"style":{"height":14.69},"width":119.36,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/31-4.png","element":"img","alt":" λc are","inline":true}],[{"style":{"width":"63%"},"width":1098,"height":229,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/31-5.png","element":"img"}],[{"text":"When ","element":"span"},{"style":{"height":17.6},"width":534.04,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/31-6.png","element":"img","alt":" ∂ log ZEP/∂λc = 0 for any c","inline":true},{"text":", the following therefore holds:","element":"span"}],[{"style":{"width":"69%"},"width":1208,"height":102,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/31-7.png","element":"img"}],[{"text":"Let ","element":"span"},{"text":"D ","element":"span"},{"text":"be a square matrix where the values in column ","element":"span"},{"style":{"height":14.69},"width":162.48,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/31-8.png","element":"img","alt":" a are Da","inline":true},{"text":"; all the rows in ","element":"span"},{"text":"D ","element":"span"},{"text":"are equal and it is singular. 
Furthermore, let ","element":"span"},{"style":{"height":20.79},"width":458.56,"height":51.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/31-9.png","element":"img","alt":" ψa = ⟨φ(x)⟩qa − ⟨φ(x)⟩q","inline":true},{"text":". By stacking all the ","element":"span"},{"style":{"height":16},"width":171.28,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/31-10.png","element":"img","alt":" ψa’s into","inline":true,"padRight":true},{"text":"a column vector ","element":"span"},{"style":{"height":15.6},"width":33,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/31-11.png","element":"img","alt":" ψ","inline":true},{"text":", the above set of equalities leads to the system of equations","element":"span"}],[{"style":{"width":"26%"},"width":457,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/31-12.png","element":"img"}],[{"text":"(The Kronecker product is only required as the sufficient statistics’ differences ","element":"span"},{"style":{"height":16},"width":159.68,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/31-13.png","element":"img","alt":" ψa have","inline":true,"padRight":true},{"text":"dimensionality “dim”, usually larger than one.) 
As ","element":"span"},{"style":{"height":12},"width":115.48,"height":30,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/31-14.png","element":"img","alt":" D − I","inline":true,"padRight":true},{"text":"is nonsingular, it is solved by ","element":"span"},{"style":{"height":15.6},"width":118.12,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/31-15.png","element":"img","alt":"ψ = 0","inline":true},{"text":", and hence ","element":"span"},{"style":{"height":20.78},"width":529.44,"height":51.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/31-16.png","element":"img","alt":" ⟨φ(x)⟩qa = ⟨φ(x)⟩q for all a.","inline":true}],[{"text":"The choice of parameterization of ","element":"span"},{"style":{"height":14.69},"width":47.28,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/31-17.png","element":"img","alt":" λa","inline":true,"padRight":true},{"text":"might give an overcomplete representation, and the exact moment-matching conditions ","element":"span"},{"style":{"height":20.59},"width":359.2,"height":51.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/31-18.png","element":"img","alt":" ⟨φ(x)⟩qa = ⟨φ(x)⟩q","inline":true,"padRight":true},{"text":"might have more than one solution. However, this does not invalidate the fact that, at the stationary point of Equation (36), all moment-matching conditions must hold.","element":"span"}]]},{"heading":"Appendix B. 
Tree-Structured Approximation","paragraphs":[[{"text":"Let the factorization of the term-product ","element":"span"},{"style":{"height":18.38},"width":172.2,"height":45.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/31-19.png","element":"img","alt":"∏n tn(xn","inline":true},{"text":") take the form of a tree ","element":"span"},{"text":"G ","element":"span"},{"text":"with edges ","element":"span"},{"style":{"height":17.6},"width":291.96,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/31-20.png","element":"img","alt":"τ = (m, n) ∈ G","inline":true},{"text":", as described in Appendix A.1.3. The number of connections to a node or","element":"span"}],[{"text":"vertex ","element":"span"},{"text":"n ","element":"span"},{"text":"shall be denoted by ","element":"span"},{"style":{"height":15.09},"width":43.56,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-0.png","element":"img","alt":" dn","inline":true},{"text":". 
From Equation (32) the second order expansion is","element":"span"}],[{"style":{"width":"88%"},"width":1524,"height":254,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-1.png","element":"img"}],[{"text":"where the inner expectations are over ","element":"span"},{"style":{"height":17.6},"width":271.32,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-2.png","element":"img","alt":" kτ |x and kn|x","inline":true},{"text":", while the outer expectations are over ","element":"span"},{"style":{"height":14.74},"width":55.4,"height":36.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-3.png","element":"img","alt":"x.5 ","inline":true,"padRight":true},{"text":"The edge-edge, edge-node, and node-node expectations that are needed in Equation (37) are given in the following three sections.","element":"span"}],[{"style":{"width":"36%"},"width":634,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-4.png","element":"img"}],[{"text":"The edge-edge expectation provides a beautiful illustration of the combinatorics that may be involved in Wick’s theorem. 
For ","element":"span"},{"style":{"height":16.8},"width":131.2,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-5.png","element":"img","alt":" τ ̸= τ ′","inline":true},{"text":", the following expectation needs to be evaluated:","element":"span"}],[{"style":{"width":"92%"},"width":1603,"height":225,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-6.png","element":"img"}],[{"text":"The vectors ","element":"span"},{"style":{"height":8},"width":33,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-7.png","element":"img","alt":" α","inline":true,"padRight":true},{"text":"that are summed over to get ","element":"span"},{"style":{"height":17.6},"width":864.68,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-8.png","element":"img","alt":" |α| = l are α = (0, l), (1, l − 1), . . . , (l, 0); let","inline":true},{"style":{"height":17.6},"width":562.6,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-9.png","element":"img","alt":"α = (α1, l − α1) when |α| = l","inline":true},{"text":". 
From the independence of ","element":"span"},{"style":{"height":17.6},"width":300.48,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-10.png","element":"img","alt":" kτ |x and kτ ′|x,","inline":true}],[{"style":{"width":"94%"},"width":1629,"height":91,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-11.png","element":"img"}],[{"text":"and therefore ","element":"span"},{"style":{"height":17.6},"width":777.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-12.png","element":"img","alt":" ⟨⟨rτ ⟩⟨rτ ′⟩⟩ = ⟨⟨rτ rτ ′⟩⟩ whenever τ ̸= τ ′.","inline":true}],[{"text":"Wick’s theorem is again instrumental in computing ","element":"span"},{"style":{"height":18.97},"width":153.32,"height":47.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-13.png","element":"img","alt":" ⟨kατ kα′τ ′ ⟩","inline":true},{"text":", as all possible pairings of ","element":"span"},{"text":"the random variables ","element":"span"},{"style":{"height":21.07},"width":636.48,"height":52.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-14.png","element":"img","alt":" kτ = (kτ1, kτ2) and kτ ′ = (kτ ′1, kτ ′2","inline":true},{"text":") need to be included. 
As ","element":"span"},{"style":{"height":20.88},"width":181.92,"height":52.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-15.png","element":"img","alt":" ⟨k2τ1⟩ = 0,","inline":true},{"style":{"height":24.91},"width":1727,"height":62.28,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-16.png","element":"img","alt":"⟨kτ1kτ2⟩ = 0, ⟨k2τ ′1⟩ = 0, and ⟨kτ ′1kτ ′2⟩ = 0, the only non-zero expectations in the Wick","inline":true,"padRight":true},{"text":"expansion of Equation (39) occur when ","element":"span"},{"text":"all ","element":"span"},{"text":"the variables in ","element":"span"},{"style":{"height":15.87},"width":217.44,"height":39.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-17.png","element":"img","alt":" kτ and kτ ′","inline":true,"padRight":true},{"text":"are paired. This immediately means that ","element":"span"},{"style":{"height":26.14},"width":798.12,"height":65.36,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-18.png","element":"img","alt":" ⟨kα1τ1 kl−α1τ2 kα′1τ ′1 ks−α′1τ ′2 ⟩ = 0 whenever l ̸= s","inline":true},{"text":", as there will be some remaining variables in ","element":"span"},{"style":{"height":17.6},"width":193.92,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-19.png","element":"img","alt":" kτ (or kτ ′","inline":true},{"text":") that can’t be paired and have to be self-paired with zero expectation.","element":"span"}],[{"text":"Given ","element":"span"},{"text":"l ","element":"span"},{"text":"= ","element":"span"},{"text":"s","element":"span"},{"text":", evaluate the expectation in Equation (39). 
","element":"span"},{"text":"We introduce the “pairing count” vector ","element":"span"},{"style":{"height":15.6},"width":29,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-20.png","element":"img","alt":" β","inline":true,"padRight":true},{"text":"with elements ","element":"span"},{"style":{"height":17.89},"width":161,"height":44.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-21.png","element":"img","alt":" βj ∈ N0","inline":true,"padRight":true},{"text":"and constraint ","element":"span"},{"style":{"height":23.85},"width":625.28,"height":59.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-22.png","element":"img","alt":"�4j=1 βj = l. Let β1 count the","inline":true,"padRight":true},{"text":"number of pairings of ","element":"span"},{"style":{"height":20.67},"width":375.56,"height":51.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-23.png","element":"img","alt":" kτ1 with kτ ′1, and β2","inline":true,"padRight":true},{"text":"count the number of pairings of ","element":"span"},{"style":{"height":20.67},"width":305.48,"height":51.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-24.png","element":"img","alt":" kτ1 with kτ ′2. 
As","inline":true,"padRight":true},{"text":"there are ","element":"span"},{"style":{"height":16.82},"width":113.88,"height":42.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-25.png","element":"img","alt":" α1 kτ1","inline":true,"padRight":true},{"text":"terms, the sum of their outgoing pairings should equal ","element":"span"},{"style":{"height":15.09},"width":145.92,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-26.png","element":"img","alt":" α1 with","inline":true}],[{"style":{"width":"15%"},"width":269,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/32-27.png","element":"img"}],[{"text":"A further requirement is that","element":"span"}],[{"style":{"width":"53%"},"width":916,"height":47,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-0.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.41},"width":793.64,"height":43.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-1.png","element":"img","alt":" α2 = l−α1 and α′2 = l−α′1, and β3 and β4","inline":true,"padRight":true},{"text":"are as in the Wick expansion below. Define ","element":"span"},{"text":"B ","element":"span"},{"text":"to be the set of all such ","element":"span"},{"style":{"height":15.6},"width":29,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-2.png","element":"img","alt":" β","inline":true},{"text":"’s, and let ","element":"span"},{"style":{"height":17.6},"width":71.24,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-3.png","element":"img","alt":" C(β","inline":true},{"text":") count the number of permuted configurations for a given pairing ","element":"span"},{"style":{"height":15.6},"width":29,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-4.png","element":"img","alt":" β","inline":true},{"text":". 
From Wick’s theorem the expected value is equal to the sum over all possible pairings ","element":"span"},{"style":{"height":15.6},"width":42.24,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-5.png","element":"img","alt":" β:","inline":true}],[{"style":{"width":"84%"},"width":1455,"height":110,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-6.png","element":"img"}],[{"text":"A simple scheme to enumerate all ","element":"span"},{"style":{"height":16},"width":273.8,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-7.png","element":"img","alt":" β ∈ B is to let","inline":true}],[{"style":{"width":"55%"},"width":965,"height":80,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-8.png","element":"img"}],[{"text":"so that ","element":"span"},{"style":{"height":17.81},"width":1243.2,"height":44.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-9.png","element":"img","alt":" β ∈ B for each β1 ∈ {max(0, (α1 + α′1) − l), . . . 
, min(α1, α′1)}.","inline":true,"padRight":true},{"text":"The remaining components of ","element":"span"},{"style":{"height":15.6},"width":29,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-10.png","element":"img","alt":" β","inline":true,"padRight":true},{"text":"are uniquely determined from ","element":"span"},{"style":{"height":16.4},"width":55.68,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-11.png","element":"img","alt":" β1.","inline":true}],[{"style":{"width":"31%"},"width":538,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-12.png","element":"img"}],[{"text":"How many permuted pairings ","element":"span"},{"style":{"height":17.6},"width":71.24,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-13.png","element":"img","alt":" C(β","inline":true},{"text":") are there?","element":"span"}],[{"text":"1. There are","element":"span"},{"style":{"height":23.36},"width":78.56,"height":58.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-14.png","element":"img","alt":"�α1β1�","inline":true},{"text":"ways of choosing ","element":"span"},{"style":{"height":17.22},"width":112.92,"height":43.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-15.png","element":"img","alt":" β1 kτ1","inline":true},{"text":"’s, and then ","element":"span"},{"style":{"height":28.97},"width":135.72,"height":72.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-16.png","element":"img","alt":"α′1!(α′1−β1)!","inline":true,"padRight":true},{"text":"ways of choosing ","element":"span"},{"style":{"height":20.08},"width":112.24,"height":50.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-17.png","element":"img","alt":" kτ ′1 to","inline":true,"padRight":true},{"text":"pair with.","element":"span"}],[{"text":"2. 
This leaves a remaining (","element":"span"},{"style":{"height":17.62},"width":234.84,"height":44.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-18.png","element":"img","alt":"α1 − β1) kτ1","inline":true},{"text":"’s, that need to be paired with (","element":"span"},{"style":{"height":20.88},"width":250.56,"height":52.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-19.png","element":"img","alt":"l − α′1) kτ ′2’s.","inline":true,"padRight":true},{"text":"There are ","element":"span"},{"style":{"height":29.37},"width":289.32,"height":73.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-20.png","element":"img","alt":"(l−α′1)!((l−α′1)−(α1−β1))!","inline":true,"padRight":true},{"text":"such pairings.","element":"span"}],[{"text":"3. There are also ","element":"span"},{"style":{"height":20.67},"width":430.56,"height":51.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-21.png","element":"img","alt":" α′1 − β1 remaining kτ ′1","inline":true},{"text":"’s, that need to be paired with ","element":"span"},{"style":{"height":16.81},"width":52.44,"height":42.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-22.png","element":"img","alt":" kτ2","inline":true,"padRight":true},{"text":"variables. 
","element":"span"},{"text":"There are","element":"span"},{"style":{"height":26.65},"width":140.48,"height":66.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-23.png","element":"img","alt":"� l−α1α′1−β1�","inline":true},{"text":"ways of picking a ","element":"span"},{"style":{"height":16.81},"width":52.44,"height":42.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-24.png","element":"img","alt":" kτ2","inline":true},{"text":", and a further (","element":"span"},{"style":{"height":17.41},"width":144.2,"height":43.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-25.png","element":"img","alt":"α′1 − β1","inline":true},{"text":")! ways of arranging ","element":"span"},{"text":"the remaining ","element":"span"},{"style":{"height":20.27},"width":68.64,"height":50.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-26.png","element":"img","alt":" kτ ′1.","inline":true}],[{"text":"4. Finally, the (","element":"span"},{"style":{"height":17.81},"width":322.28,"height":44.52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-27.png","element":"img","alt":"l −α′1)−(α1 −β1","inline":true},{"text":") remaining ","element":"span"},{"style":{"height":18.35},"width":77.64,"height":45.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-28.png","element":"img","alt":" k′τ2s","inline":true,"padRight":true},{"text":"need to be coupled with the remaining ","element":"span"},{"style":{"height":20.27},"width":53.28,"height":50.68,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-29.png","element":"img","alt":"kτ ′2","inline":true},{"text":"’s, and there are ((","element":"span"},{"style":{"height":18},"width":343.4,"height":45,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-30.png","element":"img","alt":"l − α′1) − (α1 − β1","inline":true},{"text":"))! 
such arrangements.","element":"span"}],[{"text":"Multiplying the possible pairings from the four steps above gives","element":"span"}],[{"style":{"width":"65%"},"width":1137,"height":349,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-31.png","element":"img"}],[{"text":"which adds up to the total number of possible pairings ","element":"span"},{"style":{"height":20.78},"width":291.4,"height":51.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-32.png","element":"img","alt":"�β∈B C(β) = l","inline":true},{"text":"!. A further useful ","element":"span"},{"text":"simplification is ","element":"span"},{"style":{"height":17.6},"width":741.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/33-33.png","element":"img","alt":" C(β)/α!α′! = 1/β! when |α| = |α′| = l","inline":true},{"text":", and is used below.","element":"span"}],[{"style":{"width":"37%"},"width":653,"height":32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/34-0.png","element":"img"}],[{"text":"The absence of any self-interacting loops from Wick’s theorem lets the ","element":"span"},{"style":{"height":20.64},"width":105.32,"height":51.6,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/34-1.png","element":"img","alt":"�s≥3 ","inline":true,"padRight":true},{"text":"drop away in ","element":"span"},{"text":"Equation (38), as all terms are zero except for when ","element":"span"},{"text":"l ","element":"span"},{"text":"= ","element":"span"},{"text":"s","element":"span"},{"text":". 
Substituting ","element":"span"},{"style":{"height":18.98},"width":342.44,"height":47.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/34-2.png","element":"img","alt":" ⟨kατ kα′τ ′ ⟩ and C(β)","inline":true,"padRight":true},{"text":"into Equation (38) gives the final result,","element":"span"}],[{"style":{"width":"98%"},"width":1703,"height":324,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/34-3.png","element":"img"}],[{"text":"The derivation for the edge-node expectations is similar to that of the edge-edge case,","element":"span"}],[{"style":{"width":"74%"},"width":1294,"height":274,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/34-4.png","element":"img"}],[{"text":"where the expectations in the last line are again over ","element":"span"},{"style":{"height":17.82},"width":459.56,"height":44.56,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/34-5.png","element":"img","alt":" {kτ , kn}. When ⟨kατ ksn⟩","inline":true,"padRight":true},{"text":"is evaluated ","element":"span"},{"text":"with Wick’s theorem, there are ","element":"span"},{"style":{"height":16.82},"width":1130.72,"height":42.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/34-6.png","element":"img","alt":" α1 copies of kτ1, l−α1 copies of kτ2, and s copies of kn. 
The","inline":true,"padRight":true},{"text":"zero relation of ","element":"span"},{"style":{"height":15.49},"width":193.32,"height":38.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/34-7.png","element":"img","alt":" kτ and kn","inline":true,"padRight":true},{"text":"ensures that the only non-zero terms in the Wick sum are those where all the ","element":"span"},{"style":{"height":15.09},"width":40.56,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/34-8.png","element":"img","alt":" kτ","inline":true},{"text":"’s are paired with ","element":"span"},{"style":{"height":15.09},"width":43.56,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/34-9.png","element":"img","alt":" kn","inline":true},{"text":"’s; in other words, when ","element":"span"},{"text":"l ","element":"span"},{"text":"= ","element":"span"},{"text":"s","element":"span"},{"text":". There are ","element":"span"},{"text":"l","element":"span"},{"text":"! possible pairings, which cancels ","element":"span"},{"text":"l","element":"span"},{"text":"! in the denominator.","element":"span"}],[{"text":"The above edge-node expectation is for any edge and node in the tree, but notice that it simplifies greatly when the edge ","element":"span"},{"style":{"height":8},"width":27,"height":20,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/34-10.png","element":"img","alt":" τ","inline":true,"padRight":true},{"text":"is a connection to node ","element":"span"},{"style":{"height":16.4},"width":176.36,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/34-11.png","element":"img","alt":" n. Say τ1","inline":true,"padRight":true},{"text":"is the edge variable corresponding to ","element":"span"},{"style":{"height":10.69},"width":45.96,"height":26.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/34-12.png","element":"img","alt":" xn","inline":true},{"text":". 
In this case the covariance with respect to the ","element":"span"},{"text":"opposite ","element":"span"},{"text":"pair is zero, with ","element":"span"},{"style":{"height":17.62},"width":953.16,"height":44.04,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/34-13.png","element":"img","alt":" ⟨kτ2, kn⟩ = 0 (see Figure 2) and only one of the α","inline":true},{"text":"’s will have a non-zero contribution to the sum, namely when ","element":"span"},{"style":{"height":17.6},"width":192,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/34-14.png","element":"img","alt":" α = (l, 0).","inline":true}],[{"style":{"width":"37%"},"width":649,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/34-15.png","element":"img"}],[{"text":"The node-node expectation is given in Equation (27), and is also used for ","element":"span"},{"style":{"height":20.86},"width":158.12,"height":52.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/34-16.png","element":"img","alt":" ⟨⟨rn⟩2⟩.6","inline":true}]]},{"heading":"Appendix C. A Tractable, One-Dimensional Example","paragraphs":[[{"text":"The following example illustrates a tractable one-dimensional model with two factors. 
It is shown analytically that the correction to log ","element":"span"},{"style":{"height":14.69},"width":75.8,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/34-17.png","element":"img","alt":" ZEP","inline":true,"padRight":true},{"text":"must be zero, and that the result is reflected in the higher-order terms in Equation (32), which are also zero.","element":"span"}],[{"text":"Consider the factorization of a probit term with a Gaussian prior into","element":"span"}],[{"style":{"width":"63%"},"width":1090,"height":90,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/34-18.png","element":"img"}],[{"text":"where Φ(","element":"span"},{"text":"x","element":"span"},{"text":") is the cumulative Gaussian density function, and ","element":"span"},{"style":{"height":17.6},"width":548.64,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-0.png","element":"img","alt":" fa(x) = fb(x) = Φ(x). Z can","inline":true,"padRight":true},{"text":"be computed exactly, but for the sake of example ","element":"span"},{"text":"p","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") will be approximated with","element":"span"}],[{"style":{"width":"56%"},"width":980,"height":103,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-1.png","element":"img"}],[{"text":"Choose ","element":"span"},{"style":{"height":19.54},"width":1186.88,"height":48.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-2.png","element":"img","alt":" ga(x) = exp{φ(x)T λa}, and gb(x) = exp{φ(x)T λb}. The q","inline":true,"padRight":true},{"text":"approximation has parameter vector ","element":"span"},{"style":{"height":21.27},"width":412.92,"height":53.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-3.png","element":"img","alt":" λ = λ0 + 12λa + 12λb","inline":true},{"text":". 
The EP fixed point is defined by ","element":"span"},{"style":{"height":15.28},"width":255.84,"height":38.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-4.png","element":"img","alt":" λa = λb and","inline":true},{"style":{"height":14.88},"width":152.76,"height":37.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-5.png","element":"img","alt":"Za = Zb","inline":true},{"text":". (For example, subtracting ","element":"span"},{"style":{"height":14.69},"width":47.28,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-6.png","element":"img","alt":" λa","inline":true,"padRight":true},{"text":"at the fixed point will leave ","element":"span"},{"style":{"height":19.06},"width":431.24,"height":47.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-7.png","element":"img","alt":" λ\\a = λ0 + 0, which is","inline":true,"padRight":true},{"text":"equal to a scaled version of the prior ","element":"span"},{"style":{"height":17.6},"width":82.12,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-8.png","element":"img","alt":" f0(x","inline":true},{"text":"). The factor ","element":"span"},{"style":{"height":17.6},"width":231.88,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-9.png","element":"img","alt":" fa(x) = Φ(x","inline":true},{"text":") is hence incorporated into the prior, giving ","element":"span"},{"style":{"height":14.69},"width":47.76,"height":36.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-10.png","element":"img","alt":" Za","inline":true},{"text":". By a symmetric argument, ","element":"span"},{"style":{"height":14.88},"width":161.88,"height":37.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-11.png","element":"img","alt":" Za = Zb","inline":true},{"text":".) 
Although it is trivial to show that ","element":"span"},{"style":{"height":24.58},"width":353.48,"height":61.44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-12.png","element":"img","alt":" ZEP = ZqZ1/2a Z1/2b","inline":true,"padRight":true},{"text":"will be equal to the true partition function ","element":"span"},{"text":"Z","element":"span"},{"text":", we shall prove it by showing that the correction term is log ","element":"span"},{"text":"R ","element":"span"},{"text":"= 0.","element":"span"}],[{"style":{"width":"30%"},"width":535,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-13.png","element":"img"}],[{"text":"In this section a transformation of variables from ","element":"span"},{"style":{"height":18.4},"width":769.44,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-14.png","element":"img","alt":" x to y ∼ N(y; 0, 1), with y = (x − µ)/σ,","inline":true,"padRight":true},{"text":"will be used to make the derivation slightly simpler, and therefore","element":"span"}],[{"style":{"width":"67%"},"width":1162,"height":106,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-15.png","element":"img"}],[{"text":"Below we analytically show that the correction log ","element":"span"},{"text":"R ","element":"span"},{"text":"is zero, and hence that","element":"span"}],[{"style":{"width":"83%"},"width":1450,"height":114,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-16.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":17.6},"width":90.16,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-17.png","element":"img","alt":" Fa(y","inline":true},{"text":") is a shorthand for ","element":"span"},{"style":{"height":24.13},"width":300.48,"height":60.32,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-18.png","element":"img","alt":" ⟨era(ka)⟩ka|y 
and","inline":true}],[{"style":{"width":"50%"},"width":877,"height":109,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-19.png","element":"img"}],[{"text":"Because ","element":"span"},{"style":{"height":16.4},"width":145.08,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-20.png","element":"img","alt":" fa = fb","inline":true},{"text":", the cumulants will be the same for all ","element":"span"},{"style":{"height":15.6},"width":328.24,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-21.png","element":"img","alt":" l, hence cal = cbl","inline":true},{"text":". Furthermore, ","element":"span"},{"style":{"height":17.6},"width":251.44,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-22.png","element":"img","alt":"ka|y and kb|y","inline":true,"padRight":true},{"text":"are both distributed according to the ","element":"span"},{"text":"same ","element":"span"},{"text":"density. 
Now define, using e","element":"span"},{"style":{"height":11.93},"width":82.48,"height":29.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-23.png","element":"img","alt":"ra =","inline":true,"padRight":true},{"text":"1 + ","element":"span"},{"style":{"height":21.26},"width":282.72,"height":53.16,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-24.png","element":"img","alt":" ra + 12r2a + · · · ,","inline":true}],[{"style":{"width":"93%"},"width":1620,"height":458,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-25.png","element":"img"}],[{"text":"In the second line above a transformation of variables was made in the integral, with ","element":"span"},{"text":"u ","element":"span"},{"text":"= ","element":"span"},{"style":{"height":16.4},"width":163.12,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-26.png","element":"img","alt":"σka + iy","inline":true},{"text":", such that ","element":"span"},{"style":{"height":17.6},"width":310.6,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-27.png","element":"img","alt":" ka = (u − iy)/σ","inline":true},{"text":". The Jacobian 1","element":"span"},{"style":{"height":17.6},"width":46.6,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/35-28.png","element":"img","alt":"/σ","inline":true,"padRight":true},{"text":"ensures proper normalization so that the average is over ","element":"span"},{"style":{"height":18.4},"width":220.32,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/36-0.png","element":"img","alt":" u ∼ N(u; 0,","inline":true,"padRight":true},{"text":"1). 
In the last line ","element":"span"},{"style":{"height":17.6},"width":671.2,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/36-1.png","element":"img","alt":" Hl(y) is the Hermite polynomial of","inline":true,"padRight":true},{"text":"degree ","element":"span"},{"text":"l","element":"span"},{"text":",","element":"span"}],[{"style":{"width":"88%"},"width":1530,"height":121,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/36-2.png","element":"img"}],[{"text":"which can be obtained for any real ","element":"span"},{"text":"y ","element":"span"},{"text":"and integer ","element":"span"},{"text":"l ","element":"span"},{"text":"= 0","element":"span"},{"text":", ","element":"span"},{"text":"1","element":"span"},{"text":", ","element":"span"},{"text":"2","element":"span"},{"text":", . . . ","element":"span"},{"text":"from the average ","element":"span"},{"style":{"height":17.6},"width":154.96,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/36-3.png","element":"img","alt":" Hl(y) =","inline":true}],[{"style":{"width":"35%"},"width":621,"height":50,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/36-4.png","element":"img"}],[{"text":"The remarkable property ","element":"span"},{"style":{"height":20.78},"width":400.36,"height":51.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/36-5.png","element":"img","alt":" ⟨Hl(y)⟩y = 0 for all l","inline":true},{"text":", ensures that ","element":"span"},{"style":{"height":20.78},"width":485.76,"height":51.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/36-6.png","element":"img","alt":" ⟨Fa(y)⟩y = 1 in Equation","inline":true,"padRight":true},{"text":"(41). 
Furthermore, ","element":"span"},{"style":{"height":17.6},"width":255.76,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/36-7.png","element":"img","alt":" Fa(y) = Fb(y","inline":true},{"text":") follows from the equivalence in cumulants ","element":"span"},{"style":{"height":15.6},"width":245.6,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/36-8.png","element":"img","alt":" cal = cbl; the","inline":true,"padRight":true},{"text":"roots in Equation (40) disappear to give ","element":"span"},{"style":{"height":20.78},"width":159.08,"height":51.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/36-9.png","element":"img","alt":" ⟨Fa(y)⟩y","inline":true},{"text":", proving that ","element":"span"},{"text":"R ","element":"span"},{"text":"= 1 in Equation (40).","element":"span"}],[{"style":{"width":"37%"},"width":649,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/36-10.png","element":"img"}],[{"text":"The second order expansion in Equation (32) in Section 7 evaluates to zero, as the matching cumulants ","element":"span"},{"style":{"height":10.88},"width":167.92,"height":27.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/36-11.png","element":"img","alt":" cal = cbl","inline":true,"padRight":true},{"text":"and equal distributions of ","element":"span"},{"style":{"height":17.6},"width":265.48,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/36-12.png","element":"img","alt":" ka|x and kb|x","inline":true,"padRight":true},{"text":"ensure that ","element":"span"},{"style":{"height":21.59},"width":271.12,"height":53.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/36-13.png","element":"img","alt":" ⟨ra(ka)⟩ka|x 
=","inline":true},{"style":{"height":21.58},"width":218.4,"height":53.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/36-14.png","element":"img","alt":"⟨rb(kb)⟩kb|x:","inline":true}],[{"style":{"width":"90%"},"width":1567,"height":252,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/36-15.png","element":"img"}]]},{"heading":"Appendix D. Corrections to Marginal Distributions","paragraphs":[[{"text":"Corrections to the marginal distributions follow from a similar derivation to that of the normalizing constant. As a simplification, let the Gaussian approximation be centred with ","element":"span"},{"style":{"height":18.4},"width":778.12,"height":46,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/36-16.png","element":"img","alt":"y = x − µ, so that q(y) = N(y ; 0, Σ","inline":true},{"text":"), and assume that ","element":"span"},{"text":"q","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":") arises from the fully factorized approximation in Section 5. In this appendix corrections will be computed for the mean ","element":"span"},{"style":{"height":21.59},"width":422.92,"height":53.96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/36-17.png","element":"img","alt":" ⟨xi − µi⟩p(x) = ⟨yi⟩p(y)","inline":true},{"text":", and variance ","element":"span"},{"style":{"height":22.35},"width":907.68,"height":55.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/36-18.png","element":"img","alt":" ⟨(xi − µi)(xj − µj) − Σij⟩p(x) = ⟨yiyj⟩p(y) − Σij.","inline":true}],[{"text":"A further simplification that will be employed in the following section is a change of variables ","element":"span"},{"style":{"height":19.13},"width":988.52,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/36-19.png","element":"img","alt":" ηn = kn + iΣ−1nnyn, so that ηn ∼ N(ηn ; 0, Σ−1nn). 
Let","inline":true}],[{"style":{"width":"20%"},"width":357,"height":51,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/36-20.png","element":"img"}],[{"text":"which is a zero-mean complex Gaussian random variable with a relation ","element":"span"},{"style":{"height":20.8},"width":286.08,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/36-21.png","element":"img","alt":"⟨z2n⟩ = 0 and","inline":true},{"style":{"height":17.6},"width":780.08,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/36-22.png","element":"img","alt":"⟨zmzn⟩ = −Σmn/(ΣmmΣnn) when m ≠ n","inline":true},{"text":". Following Equation (24), the correction reads","element":"span"}],[{"style":{"width":"97%"},"width":1682,"height":186,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/36-23.png","element":"img"}],[{"style":{"width":"31%"},"width":541,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/37-0.png","element":"img"}],[{"text":"The lowest order correction to the EP marginal’s mean follows from the result in Equation (13):","element":"span"}],[{"style":{"width":"99%"},"width":1725,"height":668,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/37-1.png","element":"img"}],[{"text":"The first order term disappears as","element":"span"},{"style":{"height":31.6},"width":139.72,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/37-2.png","element":"img","alt":"⟨∂rj(zj)/∂zj","inline":true}],[{"style":{"height":31.6},"width":329.84,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/37-3.png","element":"img","alt":"⟩ = 0. 
The j = n","inline":true,"padRight":true},{"text":"second order term also disappears as","element":"span"},{"style":{"height":31.6},"width":372,"height":79,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/37-4.png","element":"img","alt":"⟨rj(zj) ∂rj(zj)/∂zj⟩ = 0.","inline":true,"padRight":true},{"text":"These equivalences can be seen by taking ","element":"span"},{"style":{"height":18.29},"width":88.92,"height":45.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/37-5.png","element":"img","alt":" rj(zj","inline":true},{"text":") (and also its derivative) as an expansion over powers of ","element":"span"},{"style":{"height":21.93},"width":1437.76,"height":54.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/37-6.png","element":"img","alt":" zj; as ⟨z2j ⟩ = 0, Wick’s theorem states that every expectation of powers of","inline":true},{"style":{"height":13.09},"width":35.16,"height":32.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/37-7.png","element":"img","alt":"zj","inline":true,"padRight":true},{"text":"is zero. 
Hence","element":"span"}],[{"style":{"width":"75%"},"width":1313,"height":124,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/37-8.png","element":"img"}],[{"text":"The derivative of the characteristic function, as required in Equation (42), is","element":"span"}],[{"style":{"width":"76%"},"width":1321,"height":158,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/37-9.png","element":"img"}],[{"text":"The expectations for ","element":"span"},{"style":{"height":16.8},"width":104.72,"height":42,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/37-10.png","element":"img","alt":" j ≠ n","inline":true,"padRight":true},{"text":"in Equation (42) evaluate to","element":"span"}],[{"style":{"width":"90%"},"width":1572,"height":254,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/37-11.png","element":"img"}],[{"text":"with the second term disappearing as ","element":"span"},{"style":{"height":15.09},"width":595.08,"height":37.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/37-12.png","element":"img","alt":" s > l = 2 ensures that some zn","inline":true,"padRight":true},{"text":"is always self-paired in Wick’s theorem. Finally, by substituting Equation (43) into Equation (42), the correction to the mean is","element":"span"}],[{"style":{"width":"53%"},"width":933,"height":132,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/37-13.png","element":"img"}],[{"style":{"width":"38%"},"width":658,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/38-0.png","element":"img"}],[{"text":"The correction to the second moments follows the same recipe as that for the marginal mean in Appendix D.1. 
We proceed by first treating ","element":"span"},{"style":{"height":16.4},"width":133.92,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/38-1.png","element":"img","alt":" yi with","inline":true}],[{"style":{"width":"62%"},"width":1079,"height":497,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/38-2.png","element":"img"}],[{"text":"Reapplying the recipe gives the correction to the covariance:","element":"span"}],[{"style":{"width":"83%"},"width":1443,"height":694,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/38-3.png","element":"img"}]]},{"heading":"Appendix E. Higher Order Cumulants","paragraphs":[[{"text":"Much of this paper hinges on cumulants beyond the second order. These are frequently more cumbersome to obtain than the initial moments that are required by EP. This appendix provides details of the cumulants used in this paper.","element":"span"}],[{"text":"The cumulants of a distribution ","element":"span"},{"style":{"height":17.6},"width":83.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/38-4.png","element":"img","alt":" qn(x","inline":true},{"text":") can be obtained from its moments through","element":"span"}],[{"style":{"width":"61%"},"width":1066,"height":137,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/38-5.png","element":"img"}],[{"style":{"height":24.56},"width":1728,"height":61.4,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/38-6.png","element":"img","alt":"c5 = ⟨x5⟩ − 5⟨x4⟩⟨x⟩ − 10⟨x3⟩⟨x2⟩ + 20⟨x3⟩⟨x⟩2 + 30⟨x2⟩2⟨x⟩ − 60⟨x2⟩⟨x⟩3 + 24⟨x⟩5 ;","inline":true}],[{"text":"they are derived for doubly-truncated Gaussian distributions in Appendices E.1 and E.2. 
One might also take derivatives of the cumulant generating function directly; the cumulants of a Probit-times-Gaussian distribution, common to GP classification models, are derived this way in Appendix E.3.","element":"span"}],[{"text":"The tree-structured approximations in Sections 7 and 9.1, and Appendices A.1.3 and B, require cumulants over two variables. They are presented in Appendix E.4 for the Ising model.","element":"span"}],[{"style":{"width":"53%"},"width":927,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/39-0.png","element":"img"}],[{"text":"Consider the centered distribution ","element":"span"},{"style":{"height":19.15},"width":650.12,"height":47.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/39-1.png","element":"img","alt":" qn(xn) ∝ I[|xn| < a] N(xn ; 0, λ−1n ","inline":true,"padRight":true},{"text":"). The odd moments ","element":"span"},{"text":"of this tilted distribution are, by symmetry, ","element":"span"},{"style":{"height":20.8},"width":550.28,"height":52,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/39-2.png","element":"img","alt":" ⟨xn⟩ = ⟨x3n⟩ = ⟨x5n⟩ = 0. Let","inline":true}],[{"style":{"width":"60%"},"width":1047,"height":112,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/39-3.png","element":"img"}],[{"text":"with the Probit function being Φ(","element":"span"},{"style":{"height":20.77},"width":425.64,"height":51.92,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/39-4.png","element":"img","alt":"x) = ∫x−∞ N(z; 0, 1) dz","inline":true},{"text":". Subscripts ","element":"span"},{"text":"n ","element":"span"},{"text":"are dropped where ","element":"span"},{"text":"they are clearly implied by context. 
To get the even moments, consider","element":"span"}],[{"style":{"width":"67%"},"width":1167,"height":211,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/39-5.png","element":"img"}],[{"text":"Using the partition function, we get","element":"span"}],[{"style":{"width":"99%"},"width":1721,"height":1116,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/39-6.png","element":"img"}],[{"text":"The same calculation from Appendix E.1 can be repeated to get the moments of the non-centered truncated Gaussian ","element":"span"},{"style":{"height":19.15},"width":678.92,"height":47.88,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/39-7.png","element":"img","alt":" qn(xn) ∝ I[|xn| < a] N(xn ; µ, λ−1n ","inline":true,"padRight":true},{"text":"). The subscripts ","element":"span"},{"text":"n ","element":"span"},{"text":"are ","element":"span"},{"text":"dropped where evident. The partition function is","element":"span"}],[{"style":{"width":"61%"},"width":1067,"height":113,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/39-8.png","element":"img"}],[{"style":{"width":"49%"},"width":854,"height":616,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/40-0.png","element":"img"}],[{"text":"Figure 9: The moments of ","element":"figcaption","subtype":"caption"},{"style":{"height":19.14},"width":587.24,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/40-1.png","element":"img","alt":" qn(x) ∝ I[|x| < a] N(x ; µ, σ2","inline":true},{"text":"), as a function of ","element":"figcaption","subtype":"caption"},{"style":{"height":15.14},"width":223.52,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/40-2.png","element":"img","alt":" σ2. 
As the","inline":true,"padRight":true},{"text":"Gaussian variance ","element":"figcaption","subtype":"caption"},{"style":{"height":15.13},"width":157.28,"height":37.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/40-3.png","element":"img","alt":" σ2 → ∞","inline":true},{"text":", the moments converge to those of a uniform ","element":"figcaption","subtype":"caption"},{"style":{"height":17.6},"width":155.04,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/40-4.png","element":"img","alt":" U[−a, a]","inline":true,"padRight":true},{"text":"distribution.","element":"figcaption","subtype":"caption"}],[{"style":{"width":"72%"},"width":1251,"height":96,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/40-5.png","element":"img"}],[{"text":"By again taking derivatives of increasing order of ","element":"span"},{"style":{"height":17.6},"width":120.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/40-6.png","element":"img","alt":" Z(λ, µ","inline":true},{"text":") with respect to ","element":"span"},{"style":{"height":16.4},"width":159.92,"height":41,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/40-7.png","element":"img","alt":" µ and λ","inline":true},{"text":", the resulting moments are","element":"span"}],[{"style":{"width":"77%"},"width":1334,"height":711,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/40-8.png","element":"img"}],[{"text":"Finally,","element":"span"}],[{"style":{"width":"97%"},"width":1693,"height":229,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/40-9.png","element":"img"}],[{"text":"As Figure 9 illustrates, these moments converge to those of a uniform distribution as the Gaussian’s variance grows large.","element":"span"}],[{"style":{"width":"91%"},"width":1580,"height":609,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/41-0.png","element":"img"}],[{"text":"Figure 10: The 
third and fourth cumulants of the density ","element":"figcaption","subtype":"caption"},{"style":{"height":19.14},"width":629.96,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/41-1.png","element":"img","alt":" qn(x) ∝ Φ((x−m)/v) N(x; µ, σ2)","inline":true,"padRight":true},{"text":"in Appendix E.3. ","element":"figcaption","subtype":"caption"},{"text":"The step function Θ(","element":"figcaption","subtype":"caption"},{"text":"x","element":"figcaption","subtype":"caption"},{"text":"), with ","element":"figcaption","subtype":"caption"},{"text":"m ","element":"figcaption","subtype":"caption"},{"text":"= ","element":"figcaption","subtype":"caption"},{"text":"v ","element":"figcaption","subtype":"caption"},{"text":"= 0, is taken as an example here. The third cumulant is always positive, while the fourth cumulant is positive only when ","element":"figcaption","subtype":"caption"},{"style":{"height":13.6},"width":123.36,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/41-2.png","element":"img","alt":" σ > µ.","inline":true}],[{"style":{"width":"35%"},"width":605,"height":31,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/41-3.png","element":"img"}],[{"text":"EP approximations to Probit regression models, and Gaussian process classification models in general (see Section 8.1), depend on the moments of ","element":"span"},{"style":{"height":19.14},"width":654.24,"height":47.84,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/41-4.png","element":"img","alt":" qn(x) ∝ Φ((x − m)/v) N(x; µ, σ2).","inline":true,"padRight":true},{"text":"We introduce ","element":"span"},{"style":{"height":13.6},"width":69.04,"height":34,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/41-5.png","element":"img","alt":" v ≥","inline":true,"padRight":true},{"text":"0 so that the likelihood can become a step function at ","element":"span"},{"text":"v 
","element":"span"},{"text":"= 0, for example. We shall obtain the cumulants by taking derivatives of the characteristic function. The characteristic function of ","element":"span"},{"style":{"height":17.6},"width":83.56,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/41-6.png","element":"img","alt":" qn(x","inline":true},{"text":"), as described by Equation (15), is","element":"span"}],[{"style":{"width":"54%"},"width":944,"height":105,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/41-7.png","element":"img"}],[{"text":"with","element":"span"}],[{"style":{"width":"42%"},"width":730,"height":98,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/41-8.png","element":"img"}],[{"text":"The cumulants ","element":"span"},{"style":{"height":10.88},"width":50.28,"height":27.2,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/41-9.png","element":"img","alt":" cln","inline":true,"padRight":true},{"text":"are determined from the derivatives of log ","element":"span"},{"style":{"height":17.6},"width":89.72,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/41-10.png","element":"img","alt":" χn(k","inline":true},{"text":") at zero; a lengthy calculation shows that they are","element":"span"}],[{"style":{"width":"55%"},"width":953,"height":127,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/41-11.png","element":"img"}],[{"text":"where ","element":"span"},{"style":{"height":19.79},"width":815.04,"height":49.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/41-12.png","element":"img","alt":" α = σ2/√v2 + σ2 and β = N(z; 0, 1)/Φ(z).","inline":true}],[{"style":{"width":"53%"},"width":920,"height":40,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/41-13.png","element":"img"}],[{"text":"We need some third and fourth order two-variable cumulants and thus generalize the results of Section 4.2 to the bivariate 
case. To do this we can exploit the cumulant generating property of log ","element":"span"},{"style":{"height":20.05},"width":332.68,"height":50.12,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/41-14.png","element":"img","alt":" χa(ka). Let c(l,l′)","inline":true,"padRight":true},{"text":"denote the joint ","element":"span"},{"style":{"height":15.6},"width":63.04,"height":39,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/41-15.png","element":"img","alt":" l, l′ ","inline":true,"padRight":true},{"text":"order cumulant of variable one and","element":"span"}],[{"text":"two, respectively. We can generate this cumulant from derivatives of log ","element":"span"},{"style":{"height":17.6},"width":139.2,"height":44,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/42-0.png","element":"img","alt":" χa(ka):","inline":true}],[{"style":{"width":"47%"},"width":818,"height":135,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/42-1.png","element":"img"}],[{"text":"We can also express this as a recursion in terms of cumulants:","element":"span"}],[{"style":{"width":"52%"},"width":900,"height":134,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/42-2.png","element":"img"}],[{"text":"By explicit calculation for a bivariate binary distribution we get the first two orders’ cumulants: ","element":"span"},{"style":{"height":21.39},"width":1252.84,"height":53.48,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/42-3.png","element":"img","alt":" c(1,0) = m1, c(0,1) = m2, c(2,0) = 1 − m21, c(0,2) = 1 − m22 and c(1,1) ","inline":true,"padRight":true},{"text":"is equal to the ","element":"span"},{"text":"covariance between the two variables (to be matched with ","element":"span"},{"text":"q","element":"span"},{"text":"(","element":"span"},{"text":"x","element":"span"},{"text":")). 
The fact that we can write ","element":"span"},{"style":{"height":14.66},"width":87.88,"height":36.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/42-4.png","element":"img","alt":"c(2,0)","inline":true,"padRight":true},{"text":"in terms of the first order cumulant shows that we can express cumulants of all orders in terms of the first and second order cumulants, for example:","element":"span"}],[{"style":{"width":"74%"},"width":1294,"height":108,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/42-5.png","element":"img"}],[{"text":"Using the same recursion it is easy to show: ","element":"span"},{"style":{"height":23.89},"width":806.32,"height":59.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/42-6.png","element":"img","alt":" c(3,0) = −2c(1,0)c(2,0), c(4,0) = −2c2(2,0) −","inline":true,"padRight":true},{"text":"2","element":"span"},{"style":{"height":23.89},"width":1706.32,"height":59.72,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/42-7.png","element":"img","alt":"c(1,0)c(3,0), c(3,1) = −2c(2,0)c(1,1)−2c(1,0)c(2,1) and c(2,2) = −2c2(1,1)−2c(1,0)c(1,2) = −2c2(1,1)+","inline":true,"padRight":true},{"text":"4","element":"span"},{"style":{"height":14.66},"width":283.68,"height":36.64,"src":"https://cdn.bytez.com/mobilePapers/v2/arxiv/1301.2724/images/42-8.png","element":"img","alt":"c(1,0)c(0,1)c(1,1).","inline":true}]]},{"heading":"References","paragraphs":[[{"text":"D. Barber. ","element":"span"},{"text":"Bayesian Reasoning and Machine Learning","element":"span"},{"text":". Cambridge University Press, 2012.","element":"span"}],[{"text":"C. M. Bishop. ","element":"span"},{"text":"Pattern Recognition and Machine Learning","element":"span"},{"text":". Springer, 2006.","element":"span"}],[{"text":"S. Blinnikov and R. Moessner. Expansions for nearly Gaussian distributions. 
","element":"span"},{"text":"Astronomy and Astrophysics Supplement Series","element":"span"},{"text":", 130:193–205, 1998.","element":"span"}],[{"text":"J. P. Boyd. The devil’s invention: Asymptotic, superasymptotic and hyperasymptotic series. ","element":"span"},{"text":"Acta Applicandae Mathematicae","element":"span"},{"text":", 56:1–98, 1999.","element":"span"}],[{"text":"M. Chertkov and V. Y. Chernyak. Loop series for discrete statistical models on graphs. ","element":"span"},{"text":"Journal of Statistical Mechanics: Theory and Experiment","element":"span"},{"text":", 2006:P06009, 2006.","element":"span"}],[{"text":"B. Cseke and T. Heskes. Approximate marginals in latent Gaussian models. ","element":"span"},{"text":"Journal of Machine Learning Research","element":"span"},{"text":", 12:417–457, 2011.","element":"span"}],[{"text":"M. Kuss and C. E. Rasmussen. Assessing approximate inference for binary Gaussian process classification. ","element":"span"},{"text":"Journal of Machine Learning Research","element":"span"},{"text":", 6:1679–1704, 2005.","element":"span"}],[{"text":"M. Mézard, G. Parisi, and M. A. Virasoro. ","element":"span"},{"text":"Spin Glass Theory and Beyond","element":"span"},{"text":", volume 9 of ","element":"span"},{"text":"Lecture Notes in Physics","element":"span"},{"text":". World Scientific, 1987.","element":"span"}],[{"text":"T. P. Minka. Expectation propagation for approximate Bayesian inference. In ","element":"span"},{"text":"UAI 2001","element":"span"},{"text":", pages 362–369, 2001a.","element":"span"}],[{"text":"T. P. Minka. ","element":"span"},{"text":"A family of algorithms for approximate Bayesian inference","element":"span"},{"text":". PhD thesis, MIT Media Lab, 2001b.","element":"span"}],[{"text":"T. P. Minka. Power EP. Technical Report MSR-TR-2004-149, Microsoft Research Ltd, 2004.","element":"span"}],[{"text":"T. P. Minka and Y. Qi. Tree-structured approximations by expectation propagation. 
In ","element":"span"},{"text":"Advances in Neural Information Processing Systems 16","element":"span"},{"text":". 2004.","element":"span"}],[{"text":"K. P. Murphy. ","element":"span"},{"text":"Machine Learning: A Probabilistic Perspective","element":"span"},{"text":". The MIT Press, 2012.","element":"span"}],[{"text":"M. Opper, U. Paquet, and O. Winther. Improving on expectation propagation. In ","element":"span"},{"text":"Advances in Neural Information Processing Systems 21","element":"span"},{"text":", pages 1241–1248. 2009.","element":"span"}],[{"text":"M. Opper and O. Winther. Gaussian processes for classification: Mean field algorithms. ","element":"span"},{"text":"Neural Computation","element":"span"},{"text":", 12:2655–2684, 2000.","element":"span"}],[{"text":"M. Opper and O. Winther. ","element":"span"},{"text":"Expectation consistent approximate inference. ","element":"span"},{"text":"Journal of Machine Learning Research","element":"span"},{"text":", 6:2177–2204, 2005.","element":"span"}],[{"text":"U. Paquet, M. Opper, and O. Winther. Perturbation corrections in approximate inference: Mixture modelling applications. ","element":"span"},{"text":"Journal of Machine Learning Research","element":"span"},{"text":", 10:935–976, 2009.","element":"span"}],[{"text":"C. E. Rasmussen and C. K. I. Williams. ","element":"span"},{"text":"Gaussian Processes for Machine Learning","element":"span"},{"text":". The MIT Press, 2005.","element":"span"}],[{"text":"H. Rue, S. Martino, and N. Chopin. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. ","element":"span"},{"text":"Journal of the Royal Statistical Society: Series B (Statistical Methodology)","element":"span"},{"text":", 71(2):319–392, 2009.","element":"span"}],[{"text":"M. W. Seeger and H. Nickisch. Fast convergent algorithms for expectation propagation approximate Bayesian inference. 
","element":"span"},{"text":"arXiv preprint arXiv:1012.3584","element":"span"},{"text":", 2010.","element":"span"}],[{"text":"D. Sherrington and S. Kirkpatrick. Solvable model of a spin-glass. ","element":"span"},{"text":"Phys. Rev. Lett.","element":"span"},{"text":", 35(26):1792–1796, December 1975.","element":"span"}],[{"text":"E. Sudderth, M. Wainwright, and A. Willsky. Loop series and Bethe variational bounds in attractive graphical models. In ","element":"span"},{"text":"Advances in Neural Information Processing Systems 20","element":"span"},{"text":", pages 1425–1432. 2008.","element":"span"}],[{"text":"D. J. Thouless, P. W. Anderson, and R. G. Palmer. Solution of a ‘solvable model of a spin glass’. ","element":"span"},{"text":"Phil. Mag.","element":"span"},{"text":", 35:593, 1977.","element":"span"}],[{"text":"M. A. J. van Gerven, B. Cseke, F. P. de Lange, and T. Heskes. Efficient Bayesian multivariate fMRI analysis using a sparsifying spatio-temporal prior. ","element":"span"},{"text":"NeuroImage","element":"span"},{"text":", 50(1):150–161, 2010.","element":"span"}],[{"text":"M. J. Wainwright and M. I. Jordan. Log-determinant relaxation for approximate inference in discrete Markov random fields. ","element":"span"},{"text":"IEEE Transactions on Signal Processing","element":"span"},{"text":", 54(6):2099–2109, 2006.","element":"span"}],[{"text":"M. Welling, A. Gelfand, and A. Ihler. A cluster-cumulant expansion at the fixed points of belief propagation. In ","element":"span"},{"text":"Uncertainty in Artificial Intelligence (UAI)","element":"span"},{"text":". 2012.","element":"span"}]]}],"_version":"3.3.4"},"paperNode":"$28:props:children:props:children:0:props:product"}]]