Domain-Liftability of Relational Marginal Polytopes

2020·Arxiv

Abstract

Abstract

We study computational aspects of relational marginal polytopes which are statistical relational learning counterparts of marginal polytopes, well-known from probabilistic graphical models. Here, given some first-order logic formula, we can define its relational marginal statistic to be the fraction of groundings that make this formula true in a given possible world. For a list of first-order logic formulas, the relational marginal polytope is the set of all points that correspond to the expected values of the relational marginal statistics that are realizable. In this paper, we study the following two problems: (i) Do domain-liftability results for the partition functions of Markov logic networks (MLNs) carry over to the problem of relational marginal polytope construction? (ii) Is the relational marginal polytope containment problem hard under some plausible complexity-theoretic assumptions? Our positive results have consequences for lifted weight learning of MLNs. In particular, we show that weight learning of MLNs is domain-liftable whenever the computation of the partition function of the respective MLNs is domain-liftable (this result has not been rigorously proven before).

Introduction

In this paper, we study two problems about objects, called relational marginal polytopes (Kuželka et al. 2018), which arise naturally in the study of statistical relational learning models such as Markov logic networks (Richardson and Domingos 2006). A Markov logic network is given by a set of first-order logic formulas and their weights. Roughly speaking, the corresponding relational marginal polytope then represents the set of all possible values of vectors of sufficient statistics that the Markov logic network with given first-order logic formulas may represent. Here, the sufficient statistics of the Markov logic network count the fraction of groundings of the first-order logic formulas defining it that are satisfied in a given possible world. For instance, for an MLN defined by formulas sm(x) and smfriends, the sufficient statistics would be the proportion of smokers in the population (the formula ) and the fraction of pairs of people who are friends and, at the same time, both smoke (the formula ).

The main advantages of specifying MLNs using sufficient statistics over specifying them using weights of formulas are (i) that it is clear what the parameters mean and (ii) that there are no issues with varying domain size caused by non-projectivity (Shalizi and Rinaldo 2013) of MLNs; we refer to (Kuželka et al. 2018; Jaeger and Schulte 2018; Mittal et al. 2019) for further justification. An obvious disadvantage of specifying MLNs using sufficient statistics directly (instead of weights) is that they may be inconsistent, i.e. there may be no distribution satisfying them. Deciding if a given point lies inside the relational marginal polytope is equivalent to asking if there exists a distribution that has given expected values of the sufficient statistics.

Marginal polytopes are important objects which have been studied for the propositional case in the literature (Sontag and Jaakkola 2008; Roughgarden and Kearns 2013; Bresler, Gamarnik, and Shah 2014). We believe it is important to generalize the most important results from nonrelational PGM literature to the relational setting and we believe the study of relational marginal polytopes is one of them. It is well known that even just deciding whether a point is contained in a marginal polytope is NP-hard. A marginal polytope may have exponentially many vertices in the dimension of the modeled data. However, in the relational setting, in particular for Markov logic networks, the situation is more complex because there is no single notion of dimensionality of the data; instead there is the number of first-order logic formulas defining the Markov logic network and the size of the domain, which is the set of objects modeled by the Markov logic network (e.g. users of a social network). It turns out that the number of vertices of a relational marginal polytope grows only polynomially with the size of the domain. It is quite frequent in applications of SRL that the number of first-order logic formulas is quite small and the domain is large, which is also one of the motivations for so-called lifted inference algorithms (De Salvo Braz, Amir, and Roth 2005; Gogate and Domingos 2011; Van den Broeck 2011) which exploit symmetries stemming from the first-order nature of statistical relational models. It is therefore important to understand the properties of relational marginal polytopes, in particular the complexity of the relevant problems as a function of the domain size, which is what we study in this paper. In particular the main contributions of this paper are as follows: (i) We show that if a class of Markov logic networks admits a domain-lifted inference algorithm (in particular an algorithm for computing the partition function of Markov logic networks, with runtime polynomial in the size of the domain) then there is also a domain-lifted algorithm for constructing the respective relational marginal polytopes. This allows us to extend recent domain-liftability results for weight learning of Markov logic networks from (Kuželka and Kungurtsev 2019). (ii) We show that the problem of deciding whether a point is contained in a relational marginal polytope given by a set of first-order logic formulas and a domain of size n is, in general, not domain-liftable. This negative result for containment then also directly translates into a negative result for relational marginal polytope construction (because the number of vertices of the polytope is polynomial in the domain size).

The results in this paper are mostly meant as showing what is and what is not possible theoretically. Making the results practical is a different question.

Preliminaries

First Order Logic We assume a function-free first-order language defined by a set of constants , a set of variables V and for each a set of k-ary predicates. Variables start with lowercase letters and constants start with uppercase letters. An atom is with and . A literal is an atom or its negation. A clause is a universally quantified disjunction of a finite set of literals. A clause in which none of the literals contains any variables is called ground. The set of grounding substitutions of a clause w.r.t. a set of constants is the set that contains substitutions to all variables occurring in using constants from . A possible world is represented as a set of ground atoms that are true in . The satisfaction relation |= is defined in the usual way: means that the formula is true in . When x is a list of first-order logic variables then |x| is used to denote the length of this list. An important concept that we will need in this paper is the spectrum of a first-order logic sentence.

Definition 1 (Spectrum of sentence). Let be a first-order logic sentence. Its spectrum is the set of integers n for which has a model on domain of size n.

Complexity Classes ETIME and NETIME

In this section we briefly describe the necessary results from complexity theory that we will use for proving negative results in this paper.

Definition 2 (Complexity classes). Given a function f, the class DTIME[f(n)] (or NTIME[f(n)]) of decision problems are solvable by f(n) time-bounded deterministic (or non-deterministic) Turing machines. The class ETIME is defined as and the class NETIME equals to .

The class NETIME is tightly related to the spectra of first-order logic sentences via the following result of Jones and Selman.

Theorem 1 (Jones and Selman 1974). A set is in NETIME if and only if A is the spectrum of a function-free and constant-free sentence .

Markov Logic Networks

A Markov logic network (Richardson and Domingos 2006) (MLN) is a set of weighted first-order logic formulas , where and is a function-free and quantifier-free first-order formula. The semantics are defined w.r.t. the groundings of the first-order formulas, relative to some finite set of constants , called the domain. An MLN induces the probability distribution over possible worlds : where is the number of groundings of satisfied in , and Z, called partition function, is a normalization constant to ensure that is a probability distribution. It turns out to be more convenient to replace in the definition of MLNs by where is the set of all grounding substitutions of ’s variables using constants from and is the indicator function, which is equal to 1 when is true in the possible world . Thus, is the fraction of the groundings of satisfied in . Equivalently, we can say that is the probability that if we pick from uniformly at random. With this notation we will write the probability of a possible world as:

Relational Marginal Problems

An alternative way to view a Markov logic network is to think of it as a maximum entropy distribution satisfying given marginal constraints . Assuming we have the expected values of the formula statistics , we can define the following maximum entropy problem (Kuželka et al. 2018).

Relational Marginal Problem (Formulation):

Here, the ’s are the problem’s decision variables, each of which represents the probability of one possible world . Line (1) is the maximum entropy criterion, which is shown here as the minimization of the negative entropy; Line (2) shows the constraints given by the statistics; and Line (3) provides the normalization constraints for the probability distribution.

Assuming there exists a feasible solution satisfying (we call such a solution positive), the optimal solution of the above maximum entropy problem is an MLN where the parameters are obtained by maximizing the dual criterion, which also happens to be equivalent to the log-likelihood of the MLN w.r.t. a (possibly fictitious) training example that is over the same domain and satis-fies for all the formula statistics (we refer to (Kuželka et al. 2018) for details).

Inference Using Weighted Model Counting

For exact inference in MLNs, one generally needs to be able to compute the partition function Z = . Computation of the partition function Z can be converted to a weighted first-order model counting problem (WFOMC).

Definition 3 (WFOMC (Van den Broeck 2011)). Let w(P) and w(P) be functions from predicates to real numbers (we call w and w weight functions) and let be a first-order theory. Then

where and denote the positive literals that are true and false in , respectively, and Pred(a) denotes the predicate of a (e.g. Pred(friends(Alice, Bob)) = friends).

To compute the partition function Z using weighted model counting, we may proceed as in (Van den Broeck 2011). Let a Markov logic network be given. Here, for simplicity of exposition, we will assume that the formulas in do not contain constants (we refer to (Van den Broeck 2011) for the general case). For every weighted formula , where the free variables in are exactly , we create a new formula where is a new fresh predicate; we denote the resulting set of formulas . Then we set

and and for all other predicates we set both w andw equal to 1. It is easy to check that then , which is what we needed to compute.

Liftability Importantly, there are classes of first-order logic theories for which weighted model counting is polynomial-time. In particular, as shown in (Van den Broeck, Meert, and Darwiche 2014), when the theory consists only of first-order logic sentences, each of which contains at most two logic variables, the weighted model count can be computed in time polynomial in the number of elements in the domain over which the set of possible worlds is defined. It follows from the translation described in the previous section that this also means that computing the partition function of 2-variable MLNs can be done in time polynomial in the size of the domain and in the absolute value of the MLN’s weights. This is not the case in general when the number of variables in the formulas is greater than two unless P = #P(Beame et al. 2015). Within statistical relational learning, the term used for

problems that have such polynomial-time algorithms is domain liftability. Next, we define what we mean by domain liftability (Van den Broeck 2011).

Definition 4 (Domain liftability). An algorithm for computing the partition function Z of an MLN , where each is represented by two numbers2 as , is said to be domain-lifted if it runs in time polynomial in the size of the domain and in the number of bits needed to encode the numbers and . A class of MLNs is said to be domain-liftable if there is a domain lifted algorithm for computing the partition function Z for MLNs from this class.

The definition that we use here differs slightly from the original definition by Van den Broeck (Van den Broeck 2011) in that it also requires lifted algorithms to depend polynomially on the size of the representation of the formulas’ weights. A justification for this alternative definition follows from the work of Jaeger (Jaeger 2015) (Section 4.2). In particular, all existing domain-lifted inference algorithms are also domain-lifted according to our definition. Another small technical difference is that we define domain-liftability directly in terms of complexity of computing the partition function Z.

Relational Marginal Polytopes

Now we can finally define relational marginal polytopes (Kuželka et al. 2018) which are the main focus of this paper. These represent the expected values for the vectors of statistics of the given formulas that are possible. Definition 5 (Relational marginal polytope (Kuželka et al. 2018)). Let be the set of possible worlds on domain and be a list of formulas. We define the relational marginal polytope w.r.t. as dist. on The relational marginal polytope w.r.t. a given list of formulas can be equivalently defined as the convex hull of the set . Next, we illustrate the fact that relational marginal polytopes can be quite complex using the following example from (Kuželka and Kungurtsev 2019). In fact, we show later in this paper that, in general, the problem of deciding whether a point belongs to a relational marginal polytope is not domain-liftable. Example 2. Consider the formulas friendsand friendsfriendsfriends. Let be the set of domain elements and be the respective set of possible worlds over the first-order language given by the predicate friends/2 and the constants from . The possible worlds may be thought of as representing social networks. Then corresponds to the “frequency” of friendships in the network

and to the “frequency” of friendship-triangles. We can then see easily why there is, for instance, no distribution with and (as graphs without edges cannot have a positive number of triangles). Hence, the point (0, 1) will not be contained in the relational marginal polytope. In general relational marginal polytopes may be quite complicated objects as, for instance, in this case to construct the respective polytope we would also need to also find the “extremal” directed graphs with maximum number of triangles with a constrained number of edges etc.

Next, we define what it means for a point to be in the -interior of a polytope.

Definition 6 (Interiority). Let be a polytope and be the maximal linearly independent system of linear equations that hold for the vertices of P. A point is said to be in the -interior of P if

We need to consider the system of linear equations in the definition of interiority because the polytope may live in a lower-dimensional subset of the given space. Our defi-nition of interiority is also often called relative interiority in the literature.

A Lifted Reduction to Partition Functions of MLNs

In this section, we show that computing relational marginal polytopes can be reduced to computing the partition functions of some MLNs. Instead of , we deal with in this section, since there is a simple one-to-one mapping and only uses integers. We call the convex hull of the set the integer relational marginal polytope . Let . We can see that is a subset of . The algorithm described in the following theorem is in polynomial time. Even though it would not be efficient in practice, our goal here is to show a reduction, instead of devising a fast algorithm.

Theorem 3. Let be a list of first-order logic formulas. If the class of MLNs is domain-liftable then computing the facets of the relational marginal polytope can be done in time polynomial in .

Proof. We first observe that, for any world can only take value in where Hence, can only take value in, which are integer points (vectors) in an m- dimensional space, whose i-th entries represent .

If is full-dimensional, each of its facets corresponds to a halfspace, which is of the form b, where is the i-th coordinate, and can be written as . If is not full-dimensional, each of its facets corresponds to an intersection of such half-spaces. To determine these halfspaces, we first list all possible normal vectors , and then find b for each normal vector.

Enumerate all possible ’s Note that every vertex of the IRMP is an integer vector in. We list all possi- ble linearly-independent m-tuples of , where every is a point in. For every tuple, we can efficiently compute the perpendicular vector of the hyperplane that passes every point in the tuple. For example, we subtract the coordinates of one of the points from all of the others and then compute their generalized cross product to obtain the perpendicular vector . To enumerate every possible , both and need to be taken into account. In this way, every coordinate of every is an integer (which is not necessary but makes the next step easier). It is not difficult to see that the number of ’s that we collect is polynomial in and every entry is polynomial in as well.

Find b for eachFor every vector, to determine the integer b for the halfspace , we construct a Markov logic network and then compute the partition function of this MLN. If for every possible world , , then the partition function is since there are possible worlds. If there exists a world such that , then the partition function , since every is an integer. Therefore, one can easily find the smallest integer b such that there does not exist any world satisfying .After the steps above, we obtain a set of linear inequality constraints (halfspaces). We can apply Lemma 2 (in the appendix) to get a minimal set of constraints that correspond to the facets of the IRMP, which can be transformed into the facets of RMP efficiently.

Here we would like to note that the techniques for lifted linear programming developed in the work (Mladenov, Ahmadi, and Kersting 2012; Kersting, Mladenov, and Tokmakov 2017) are not directly relevant to our work. In our case, all the involved linear programs are already small (are polynomial in the size of the domain).

A Corollary for Lifted Weight Learning of Markov Logic Networks

Recently, it has been shown in (Kuželka and Kungurtsev 2019) that weight learning of Markov logic networks based on maximization of log-likelihood is domain-liftable for the 2-variable fragment of Markov logic networks. Previously, it had been shown in (Van Haaren et al. 2016) that computing the gradients of log-likelihood is domain-liftable for liftable fragments of Markov logic networks. What the work (Kuželka and Kungurtsev 2019) added to this was to show that the whole weight learning problem is also domain-liftable, not just the procedure that computes the gradients. This is formally stated in the next theorem from (Kuželka and Kungurtsev 2019).

Theorem 4 (Theorem 11 in (Kuželka and Kungurtsev 2019)). Let be a set of quantifier-free first-order logic formulas, each with at most 2 variables. Let be a set of universally quantified first-order logic sentences, each also with at most 2 variables. Let be the set of models of over a given domain . Let be a training example. Then there is an algorithm which finds weights of the MLN M given by formulas such that the log-likelihood of M given the training example is within of the optimum. The algorithm runs in time polynomial in and where is the interiority of the vector in the relational marginal polytope RMP.

The above theorem is restricted only to the 2-variable fragment of Markov logic networks but a bit larger tractable fragments were also studied in the literature, e.g. (Kazemi et al. 2016). The only reason why the proof of Theorem 4 needs the assumption that the Markov logic network is from the 2-variable fragment is that the weight learning algorithm needs the respective relational marginal polytope and the work (Kuželka and Kungurtsev 2019) only gives a domain-lifted algorithm for computing relational marginal polytopes for the 2-variable fragment. However, the result about domain-liftability that we presented in this section guarantees that the construction of relational marginal polytopes is always domain-liftable for Markov logic networks with domain-liftable inference. Hence we get a strengthening of the results from (Kuželka and Kungurtsev 2019).

Corollary 5. The positive result stated in Theorem 4 holds for all Markov logic networks for which computing the partition function Z is domain-liftable, i.e. not just for the 2-variable fragment.

Complexity of the Containment Test

In this section we study computational complexity of deciding if a point is contained in the relational marginal polytope given by a fixed list of first-order logic formulas and a domain . In particular we are interested in the complexity w.r.t. the domain size as a parameter, i.e. in domain-liftability of the problem. We prove a negative result showing that, unless ETIME = NETIME, there is no algorithm for deciding if a point is contained inside running in time polynomial in , and .

The proof of the negative result borrows ideas from the work of Jaeger (Jaeger 2015). In particular, we will need the next two results.

Proposition 6 (Corollary 3.4 in (Jaeger 2015)). If then there exists a function-free and constant-free first-order sentence such that {unaryspecis not recognized in deterministic polynomial time.

Lemma 1 (Proposition 4.2 in (Jaeger 2015)). Let be a quantifier-free first-order logic formula (possibly containing constants and function symbols). Let S be the set of relation symbols and the set of function symbols and constants contained in . Let be a set of new relation symbols that for every k-ary contains a k + 1-ary (constant symbols are treated as 0-ary function symbols). Let Func be a set of sentences such that for every it contains the following first-order logic sentences:

Then there exists a formula without quantifiers, function symbols and constants such that the following are equivalent for all n:

1. there exists an on domain such that ,

2. there exists an on domain such that Func .

Next, we state and prove our main negative result.

Theorem 7. There exists a list of first-order logic formulas such that the following holds. If then there is no algorithm for deciding if a point , where is contained in the relational marginal polytope that runs in time polynomial the size of the domain and in and . The hardness result holds even if the point is guaranteed not to be equal to a vertex of the polytope.

Proof idea. We use the techniques which were devised by Jaeger to prove lower bounds on the complexity of weighted model counting. Specifically, Proposition 6 tells us that there exists a function-free first-order logic sentence whose spectrum is not recognizable in time polynomial in the size of the domain (assuming ). Lemma 1 allows us to convert that sentence to a relatively manageable form. The conversion is done by first Skolemizing the sentence (which introduces function symbols) and then using Lemma 1 to get rid of the function symbols. Although Lemma 1 introduces new existential quantifiers, these may then appear only in (5), which we will be able to deal with. Then what we need to show is that if we could decide membership of points in relational marginal polytopes efficiently then we could also efficiently decide satisfiability of on domains of a given size. This is not completely straightforward because, first, the formulas in cannot have quantifiers and, second, the points should not correspond to statistics of individual possible worlds so that we could still guarantee that these points do not correspond to vertices of the polytopes. Hence, we need to select the point carefully; we will place it quite (polynomially) close to a potential vertex of the polytope. Finally, we have to show that the point is indeed inside the polytope for domain of size n if and only if the sentence has a model on a domain of size n.

Proof. Let be a first-order logic sentence with non-polynomial-time spectrum whose existence follows from Proposition 6. Let be a Skolemization of and let Func be constructed from as in Lemma 1. Let us write

where and . Finally we define and for every we define

where ’s and ’s are auxiliary relations which will be useful in turn. Then we construct the list of formulas Next all we need to do is to select a point such that the respective polytope-membership query would allow us to decide if has a model on a domain of size n. We define and

is polynomial in the size of the domain and so are the representations of and . It also follows from the way is selected that cannot be a vertex of the polytope.

has a model on a domain of size n.

has a model on a domain of size n. Since, by Lemma 1, has such a model if and only if has a model on a domain of size n, it will also follow that if then has a model on a domain of size n.

be written as a convex combination:

Now we want to show that at least one of the possible worlds in (6), which we denote as , must satisfy the following three conditions:

Let us denote . Then the above conditions can be also written as . We start by showing that the first two conditions hold. Suppose, for contradiction, that one of these two conditions does not hold. Then which fol-

lows from the fact that must hold for at least one of the ’s in (6). However, we also have and , hence we have a contradiction. There- fore there must be at least one that satisfies the first two conditions and there may, in fact, be multiple such worlds. Hence, is a possible world on a domain of size n which is a model of the first-order logic sentences , .

To show that at least one possible world from (6) must satisfy all three conditions, we will need to strengthen the above argument. We define where is the set of indices of those possible worlds from (6) that satisfy the first two conditions, and . Using similar reasoning as above, we must have as otherwise we would have , which can be seen as follows. If one of the sentences , is violated then . We then obtain the inequality when we realize that each of the possible worlds , where , must violate at least one of the l + 1 formulas.

Hence it follows from the above that the next inequality must hold

Next, we can notice that if then it must also hold . Suppose, for contradiction, that none of the possible worlds satis-fies the third condition. Then it must be the case for each of these possible worlds that for at least one . We need the following to hold for every from :

and we know that and for at least one . It follows that we need (at the very least) the inequal- ity to be true. After simple algebraic manipulations this yields the inequality

Plugging into (7) yields

whereas plugging into (8) yields

So we have arrived at a contradiction. It follows that at least one of the must simultaneously satisfy conditions 1, 2 and 3. We have thus shown that if then has a model on a domain of size n which is what we needed to show.

() If has a model on a domain of size n then so does . Hence, let be such a model of on a domain of size n. Let be a possible world on the set of relations of extended by the and relations. Let agree with on all the relations from and let satisfy , and , for all . Then it is not difficult to check that Next let be a possible world constructed from in almost the same way as but this time satisfying and , for all . Then we have Since, by definition, and and since is a convex combination of and , it follows that .

Conclusion

In this paper, we studied the complexity of problems related to relational marginal polytopes. As our first main contribution, we proved that domain-liftability of computing the partition function of a Markov logic network carries over to the problem of constructing relational marginal polytopes, which allowed us to extend positive results on domain-liftability for weight learning of Markov logic networks from (Kuželka and Kungurtsev 2019). As our second main contribution, we showed the hardness of deciding whether a point is contained in a relational marginal polytope, assuming a widely believed complexity-theoretic conjecture.

In this paper, we were interested only in answering the theoretical questions of domain-liftability: what is and is not possible. The next step is to design algorithms that will also be efficient in practice.

Acknowledgements. OK’s work has been supported by the OP VVV project CZ.02.1.01/0.0/0.0/16_019/0000765 “Research Center for Informatics” and a donation from X-Order Lab. Part of this work was done while OK was already supported by the Czech Science Foundation project “Generative Relational Models” (20-19104Y).

A Lemma Used in Theorem 3

Lemma 2. Given a set S of linear inequality constraints, there is an algorithm to find a minimal subset such that specifies the same polytope as S, in polynomial time in the size of S.

Proof. Without loss of generality, we assume that every constraint in S is of the form We construct |S| linear programs: The i-th linear program uses all constraints in S except as the constraints, and its objective function is . If the optimal solution of this lin- ear program is strictly larger than , then we add into . It is not difficult to see that every constraint in cannot be implied by other constraints, or else that constraint cannot be added into , so is minimal. Besides, we only have |S| linear programs each of which can be solved in polynomial time (e.g., using some interior-point methods (Boyd and Vandenberghe 2004)), hence the whole procedure is in polynomial time.

References

[Beame et al. 2015] Beame, P.; Van den Broeck, G.; Gribkoff, E.; and Suciu, D. 2015. Symmetric weighted first-order model counting. In Proceedings of the 34th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 313–328. ACM.

[Boyd and Vandenberghe 2004] Boyd, S., and Vandenberghe, L. 2004. Convex optimization. Cambridge university press.

[Bresler, Gamarnik, and Shah 2014] Bresler, G.; Gamarnik, D.; and Shah, D. 2014. Hardness of parameter estimation in graphical models. In Advances in Neural Information Processing Systems, 1062–1070.

[De Salvo Braz, Amir, and Roth 2005] De Salvo Braz, R.; Amir, E.; and Roth, D. 2005. Lifted first-order probabilistic inference. In Proceedings of the 19th international joint conference on Artificial intelligence, 1319–1325. Citeseer.

[Gogate and Domingos 2011] Gogate, V., and Domingos, P. 2011. Probabilistic theorem proving. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, 256–265. AUAI Press.

[Jaeger and Schulte 2018] Jaeger, M., and Schulte, O. 2018. Inference, learning, and population size: Projectivity for SRL models. CoRR abs/1807.00564.

[Jaeger 2015] Jaeger, M. 2015. Lower complexity bounds for lifted inference. TPLP 15(2):246–263.

[Jones and Selman 1974] Jones, N. D., and Selman, A. L. 1974. Turing machines and the spectra of first-order formulas. The Journal of Symbolic Logic 39(1):139–150.

[Kazemi et al. 2016] Kazemi, S. M.; Kimmig, A.; den Broeck, G. V.; and Poole, D. 2016. New liftable classes for first-order probabilistic inference. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 3117–3125.

[Kersting, Mladenov, and Tokmakov 2017] Kersting, K.; Mladenov, M.; and Tokmakov, P. 2017. Relational linear programming. Artif. Intell. 244:188–216.

[Kuželka and Kungurtsev 2019] Kuželka, O., and Kungurt- sev, V. 2019. Lifted weight learning of markov logic networks revisited. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS-19).

[Kuželka et al. 2018] Kuželka, O.; Wang, Y.; Davis, J.; and Schockaert, S. 2018. Relational marginal problems: Theory and estimation. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18).

[Mittal et al. 2019] Mittal, H.; Bhardwaj, A.; Gogate, V.; and Singla, P. 2019. Domain-size aware markov logic networks. In The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019, 3216–3224.

[Mladenov, Ahmadi, and Kersting 2012] Mladenov, M.; Ah- madi, B.; and Kersting, K. 2012. Lifted linear programming. In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2012, 788–797.

[Richardson and Domingos 2006] Richardson, M., and Domingos, P. 2006. Markov logic networks. Machine Learning 62(1-2):107–136.

[Roughgarden and Kearns 2013] Roughgarden, T., and Kearns, M. 2013. Marginals-to-models reducibility. In Advances in Neural Information Processing Systems, 1043–1051.

[Shalizi and Rinaldo 2013] Shalizi, C. R., and Rinaldo, A. 2013. Consistency under sampling of exponential random graph models. Annals of statistics 41(2):508.

[Sontag and Jaakkola 2008] Sontag, D., and Jaakkola, T. S. 2008. New outer bounds on the marginal polytope. In Advances in Neural Information Processing Systems, 1393– 1400.

[Van den Broeck, Meert, and Darwiche 2014] Van den Broeck, G.; Meert, W.; and Darwiche, A. 2014. Skolemization for weighted first-order model counting. In Fourteenth International Conference on the Principles of Knowledge Representation and Reasoning.

[Van den Broeck 2011] Van den Broeck, G. 2011. On the completeness of first-order knowledge compilation for lifted probabilistic inference. In Advances in Neural Information Processing Systems, 1386–1394.

[Van Haaren et al. 2016] Van Haaren, J.; Van den Broeck, G.; Meert, W.; and Davis, J. 2016. Lifted generative learning of markov logic networks. Machine Learning 103(1):27–55.