Exploiting Database Management Systems and Treewidth for Counting

2020·Arxiv

Abstract

Abstract

Bounded treewidth is one of the most cited combinatorial invariants, which was applied in the literature for solving several counting problems efficiently. A canonical counting problem is #Sat, which asks to count the satisfying assignments of a Boolean formula. Recent work shows that benchmarking instances for #Sat often have reasonably small treewidth. This paper deals with counting problems for instances of small treewidth. We introduce a general framework to solve counting questions based on state-of-the-art database management systems (DBMS). Our framework takes explicitly advantage of small treewidth by solving instances using dynamic programming (DP) on tree decompositions (TD). Therefore, we implement the concept of DP into a DBMS (PostgreSQL), since DP algorithms are already often given in terms of table manipulations in theory. This allows for elegant specifications of DP algorithms and the use of SQL to manipulate records and tables, which gives us a natural approach to bring DP algorithms into practice. To the best of our knowledge, we present the first approach to employ a DBMS for algorithms on TDs. A key advantage of our approach is that DBMS naturally allow to deal with huge tables with a limited amount of main memory (RAM), parallelization, as well as suspending computation.

Keywords: Dynamic Programming Parameterized Algorithmics Treewidth Database Systems Relational Algebra

1 Introduction

Counting solutions is a well-known task in mathematics, computer science, and other areas [8,17,24,38]. In combinatorics, for instance, one characterizes the number of solutions to problems by means of mathematical expressions, e.g., generating functions [18]. One particular counting problem, namely model counting (#Sat) asks to output the number of solutions of a given Boolean formula. Model counting and variants thereof have already been applied for solving a variety of real-world applications [8,10,19,44]. Such problems are typically considered rather hard, since #Sat is complete for the class #P [3,35], i.e., one can simulate any problem of the polynomial hierarchy with polynomially many calls [41] to a #Sat solver. Taming this high complexity is possible with techniques from parameterized complexity [12]. In fact, many of the publicly available #Sat instances show good structural properties after using regular preprocessors like pmc [29], see [23]. By good structural properties, we mean that graph representations of these instance have reasonably small treewidth. The measure treewidth is a structural parameter of graphs which models the closeness of the graph to being a tree and is one of the most cited combinatorial invariants studied in parameterized complexity [12], and subject of recent competitions [15].

This observation gives rise to a general framework for counting problems that leverages treewidth. The general idea to develop such frameworks is indeed not new, since there are both, specialized solvers [9,23,25], as well as general systems like D-FLAT [5], Jatatosk [4], and sequoia [31], that exploit treewidth. Some of these systems explicitly use dynamic programming (DP) to directly exploit treewidth by means of so-called tree decompositions (TDs), whereas others provide some kind of declarative layer to model the problem (and perform decomposition and DP internally). In this work, we solve (counting) problems by means of explicitly specified DP algorithms, where essential parts of the DP algorithm are specified in form of SQL SELECT queries. The actual run of the DP algorithm is then delegated to our system dpdb, which employs database management systems (DBMS) [43]. This has not only the advantage of naturally describing and manipulating the tables that are obtained during DP, but also allows dpdb to benefit from decades of database technology in form of the capability to deal with huge tables using limited amount of main memory (RAM), dedicated database joins, as well as query optimization and data-dependent execution plans.

Contribution. We implement a system dpdb for solving counting problems based on dynamic programming on tree decompositions, and present the following contributions. (i) Our system dpdb uses database management systems to handle table operations needed for performing dynamic programming efficiently. The system dpdb is written in Python and employs PostgreSQL as DBMS, but can work with other DBMSs easily. (ii) The architecture of dpdb allows to solve general problems of bounded treewidth that can be solved by means of table operations (in form of relational algebra and SQL) on tree decompositions. As a result, dpdb is a generalized framework for dynamic programming on tree decompositions, where one only needs to specify the essential and problem-specific parts of dynamic programming in order to solve (counting) problems. (iii) Finally, we show how to solve the canonical problem #Sat with the help of dpdb, where it seems that the architecture of dpdb is particularly well-suited. Concretely, we compare the runtime of our system with state-of-the-art model counters, where we observe competitive behavior and promising indications for future work.

Figure 1: Graph G (left) with a TD T of graph G (right).

2 Preliminaries

We assume familiarity with terminology of graphs and trees. For details, we refer to the literature and standard textbooks [16].

Boolean Satisfiability. We define Boolean formulas and their evaluation in the usual way, cf., [26]. A literal is a Boolean variable x or its negation is a set of clauses interpreted as conjunction. A clause is a set of literals interpreted as disjunction. For a formula or clause X, we abbreviate by var(X) the variables that occur in is a mapping The formula is obtained by removing every clause c from that contains a literal set to 1 by I, and removing from every remaining clause of all literals set to 0 by I. An assignment I is satisfying if ) = . Problem #Sat asks to output the number of satisfying assignments of a formula.

Tree Decomposition and Treewidth. A tree decomposition (TD) [27,12] of a given graph G is a pair T = () where T is a rooted tree and is a mapping which assigns to each node ) a set ), called bag, such that (i) V (G) = ) and ; and (ii) for each ), such that s lies on the path from r to t, we have

of G is the minimum width(T ) over all TDs T of G. For a node ), we say that type(t) is leaf if t has no children and has children and with and ) = ) = (“introduce”) if t has a single child ) and + 1; rem (“removal”) if t has a single child ) and + 1. If for every node ), type(, then the TD is called nice.

Example 1. Figure 1 depicts a graph G and a TD T of G of width 2. The treewidth of G is also 2 since G contains a complete graph with 3 vertices [27].

Relational Algebra. We use relational algebra [11] for manipulation of relations, which forms the theoretical basis of the database standard Structured Query Language (SQL) [43] on tables. An attribute a is of a certain finite domain dom(a). Then, a tuple r over set att(r) of attributes is a set of pairs of the form (a, v) with ) s.t. for each ), there is exactly one ) with (. A relation R is a finite set of tuples r over set att(R) := att(r) of attributes. Given a relation R over att(R). Then, we let dom(R) := ), and let relation ) be given by ) := , where . This concept can be lifted to , where we assume in addition to ), a set S of expressions of the form , such that , and f is an arithmetic function that takes a tuple , such that there is at most one expression in S for each . Formally, we define ˙) := with . Later, we use , where we assume and a so-called aggregate function g, which takes a relation and returns a value of domain dom(a). Therefore, we let ) := ])), where R[r] := . We define renaming of R given set A of attributes, and a bijective mapping m : att(s.t. dom(a) = dom(m(a)) for ), by ) := of rows in R according to a given Boolean formula with equalityis defined by ) := is a truth assignment over var() such that for each (3) ) = ), and (4) if ) = 1, and ) = 1, then ) = 1. Given a relation . Then, we refer to the . Further, a (according to ) corresponds to

3 Towards Relational Algebra for Dynamic Programming

A solver based on dynamic programming (DP) evaluates the input I in parts along a given TD of a graph representation G of the input. Thereby, for each node t of the TD, intermediate results are stored in a . This is achieved by running a so-called table algorithm A, which is designed for a certain graph representation, and stores in results of problem parts of I, thereby considering tables for child nodes . DP works for many problems P as follows. 1. Construct a graph representation G of the given input instance I. 2. Heuristically compute a tree decomposition 3. Traverse the nodes in V (T) in post-order, i.e., perform a bottom-up traversal of T. At every node t during post-order traversal, execute a table algorithm A that takes as input depending on P, as well as previously computed child tables of t and stores the result in

For solving problem P = #Sat, we need the following graph representation. The ] of a formula has as vertices its variables, where two variables are joined by an edge if they occur together in a clause of . Given formula and a node t of T. Sometimes, we refer to the treewidth of the primal graph of a given formula by the treewidth of the formula. Then, we let local problem which are the clauses entirely covered by

Table algorithm S as presented in Algorithm 1 shows all the cases that are needed to solve #Sat by means of DP over nice TDs. Each table consist of rows of the form , where I is an assignment of and c is a counter.

Figure 2: Selected tables obtained by DP on of Example 2 using algorithm S.

Nodes t with type(t) = leaf consist of the empty assignment and counter 1, cf., Line 1. For a node t with introduced variable ), we guess in Line 3 for each assignment of the child table, whether a is set to true or to false, and ensure that is satisfied. When an atom a is removed in node t, we project assignments of child tables to ), cf., Line 5, and sum up counters of the same assignments. For join nodes, counters of common assignments are multiplied, cf., Line 7.

isfying assignments of formula are, e.g., , . In total, there are 6 satisfying assignments of . Observe that graph G of Figure 1 actually depicts the primal graph . Intuitively, T of Figure 1 allows to evaluate formula in parts. Figure 2 illustrates a nice TD = () of the primal graph and tables that are obtained during the execution of We assume that each row in a table is identified by a number, i.e., row i corresponds to

table Then, introduces c and introduces , but since ) we have for . In consequence, for each of

table , we have ) = since S enforces satisfiability of in node t. Since type() = rem, we remove variable c from all elements in and sum up counters accordingly to construct . Note that we have already seen all rules where c occurs and hence c can no longer affect interpretations during the remaining traversal. We similarly create and . Since type() = join, we build table by taking the intersection of and . Intuitively, this combines assignments agreeing on a, where counters are multiplied accordingly. By definition (primal graph and TDs), for every , variables var(c) occur together in at least one common bag. Hence, since , we can reconstruct for example model 1of using highlighted (yellow) rows in Figure 2. On the other hand, if was unsatisfiable, would be empty (

Alternative: Relational Algebra. Instead of using set theory to describe how tables are obtained during dynamic programming, one could alternatively use relational algebra. There, tables for each TD node t are pictured as relations, where distinguishes a unique column (attribute) [[x]] for each ). Further, there might be additional attributes required depending on the problem at hand, e.g., we need an attribute cnt for counting in #Sat, or an attribute for modeling costs or weights in case of optimization problems. Algorithm 2 presents a table algorithm for problem #Sat that is equivalent to Algorithm 1, but relies on relational algebra only for computing tables. This step from set notation to relational algebra is driven by the observation that in these table algorithms one can identify recurring patterns, and one mainly has to adjust problem-specific parts of it (highlighted by coloring in Algorithm 1). In particular, one typically derives for nodes t with type(t) = leaf, a fresh initial table , cf., Line 1 of Algorithm 2. Then, whenever an atom a is introduced, such algorithms often use -joins with a fresh initial table for the introduced variable a representing potential values for a. In Line 3 the selection of the -join is performed according to , i.e. corresponding to the local problem of #Sat. Further, for nodes t with type(t) = rem, these table algorithms typically need projection. In case of Algorithm 2, Line 5 also needs grouping in order to maintain the counter, as several rows of might collapse in . Finally, for a node t with type(t) = join, in Line 7 we use extended projection and -joins, where we join on the same truth assignments, which allows us later to leverage advanced database technology.

Figure 3: Architecture of Dynamic Programming with Databases. Steps highlighted in red are provided by the system depending on specification of yellow and blue parts, which is given by the user for specific problems P. The yellow “E”s represent events that can be intercepted and handled by the user. The blue part concentrates on table algorithm , where the user specifies how SQL code is generated in a modular way.

Extended projection is needed for multiplying the counters of the two rows containing the same assignment.

4 Dynamic Programming on TDs using Databases & SQL

In this section, we present a general architecture to model table algorithms by means of database management systems. The architecture is influenced by the DP approach of the previous section and works as depicted in Figure 3, where the steps highlighted in yellow and blue need to be specified depending on the problem P. Steps outside Step 3 are mainly setup tasks, the yellow “E”s indicate events that might be needed to solve more complex problems on the polynomial hierarchy. For example, one could create and drop auxiliary sub-tables for each node during Step 3 within such events. Observe that after the generation of a TD T = (), Step 2b automatically creates tables for each node t of T, where the corresponding table schema of is specified in the blue part, i.e., within of such a table that is assumed in this section foresees one column for each element of the bag ), where additional columns such as counters or costs can be added.

Actually, the core of this architecture is focused on the table algorithm executed for each node t of T of TD T = (). Besides the definition of table schemes, the blue part concerns specification of the table algorithm by means of a procedural generator template that describes how to dynamically obtain SQL codefor each node t thereby oftentimes depending on ). This generated SQL code is then used internally for manipulation of tables during the tree decomposition traversal in Step 3 of dynamic programming. Listing 3 presents a general template, where parts of table algorithms for problems that are typically problem-specific are replaced by colored placeholders of the form #placeHolder#, cf., Algorithm 2. Observe that Line 3 of Listing 3 uses extended projection as in Line 7. This is needed for some problems requiring changes on vertex introduction.

Note, however, that the whole architecture does not depend on certain normalization or forms of TDs, e.g., whether it is nice or not. Instead, a table algorithm of any TD is simply specified by handling problem-specific implementations of the placeholders of Listing 3, where the system following this architecture is responsible for interleaving and overlapping these cases within a node t. In fact,

we discuss an implementation of a system according to this architecture next, where it is crucial to implement non-nice TDs to obtain higher efficiency.

4.1 System dpdb: Dynamic Programming with Databases

We implemented the proposed architecture of the previous section in the prototypical dpdb system. The system is open-source, written in Python 3 and uses PostgreSQL as DBMS. We are convinced though that one can easily replace PostgreSQL by any other state-of-the-art relational database that uses SQL. In the following, we discuss implementation specifics that are crucial for a performant system that is still extendable and flexible.

Computing TDs. TDs are computed mainly with the library htd version 1.2 with default settings [2], which finds TDs extremely quick also for interesting instances [23] due to heuristics. Note that dpdb directly supports the TD format of recent competitions [15], i.e., one could easily replace the TD library. It is important though to not enforce htd to compute nice TDs, as this would cause a lot of overhead later in dpdb for copying tables. However, in order to benefit from the implementation of -joins, query optimization and state-of-the-art database technology in general, we observed that it is crucial to limit the number of child nodes of every TD node. Then, especially when there are huge tables involved, -joins among child node tables cover at most a limited number of child node tables. In consequence, the query optimizer of the database system still has a chance to come up with meaningful execution plans depending on the contents of the table. Note that though one should consider -joins with more than just two tables, since such binary -joins already fix in which order these tables shall be combined, thereby again limiting the query optimizer. Apart from this trade-off, we tried to outsource the task of joining tables to the DBMS, since the performance of database systems highly depends on query optimization. The actual limit, which is a restriction from experience and practice only, highly depends on the DBMS that is used. For PostgreSQL, we set a limit of at most 5 child nodes for each node of the TD, i.e., each -join covers at most 5 child tables.

Towards non-nice TDs. Although this paper presents the algorithms for nice TDs (mainly due to simplicity), the system dpdb interleaves these cases as presented in Listing 3. Concretely, the system executes one query per table for each node t during the traversal of TD T . This query consists of serveral parts and we briefly explain its parts from outside to inside. First of all, the inner-most part concerns the row candiates for consisting of the -join as in Line 7 of Listing 3, including parts of Line 3, namely cross-joins for each introduced variable, involving #intrTab# without the filtering on #localProbFilter#. Then, there are different configurations of dpdb concerning these row candidates. For debugging (see below) one could (1) actually materialize the result in a table, whereas for performance runs, one should use (2) common table expressions (CTEs or WITH-queries) or (3) sub-queries (nested queries), which both result in one nested SQL query per table . On top of these row candidates, projectionand grouping involving #aggrExp# as in Line 5 of Listing 3, as well as selection acording to #localProbFilter#, cf., Line 3, is specified. It turns out that PostgreSQL can do better with sub-queries, where the query optimizer oftentimes pushes selection and projection into the sub-query if needed, which is not the case for CTEs, as discussed in the PostgreSQL manual [1, Sec. 7.8.1]. On different DBMS or other vendors, e.g., Oracle, it might be better to use CTEs instead.

Example 3. Consider again Example 2 and Figure 1. If we use table algorithm on formula and Option (3): sub-queries, where the row candidates are expressed via a sub-queries. Then, for each node dpdb generates a view vi as well as a table containing in the end the content of vi. Observe that each view only has one column [[a]] for each variable a of since the truth assignment of the other variables are not needed later. This keeps the tables compact, only has two rows, , and have only one row. We obtain the following views.

CREATE VIEW v1 AS SELECT a, sum(cnt) AS cnt FROM (WITH intrTab AS (SELECT 1 AS val UNION ALL SELECT 0) SELECT i1.val AS a, i2.val AS b, i3.val AS c, 1 AS cnt FROM intrTab i1, intrTab i2, intrTab i3) WHERE (NOT a OR b OR c) AND (a OR NOT b OR NOT c) GROUP BY a

CREATE VIEW v2 AS SELECT a, sum(cnt) AS cnt FROM (WITH intrTab AS (SELECT 1 AS val UNION ALL SELECT 0) SELECT i1.val AS a, i2.val AS d, 1 AS cnt FROM intrTab i1, intrTab i2) WHERE (a OR d) AND (a OR NOT d) GROUP BY a

Parallelization. A further reason to not over-restrict the number of child nodes within the TD, lies in parallelization. In dpdb, we compute tables in parallel along the TD, where multiple tables can be computed at the same time, as long as the child tables are computed. Therefore, we tried to keep the number of child nodes in the TD as high as possible. In our system dpdb, we currently allow for at most 24 worker threads for table computations and 24 database connections at the same time (both pooled and configurable). On top of that we have 2 additional threads and database connections for job assignments to workers, as well as one dedicated watcher thread for clean-up and connection termination, respectively.

Logging, Debugging and Extensions. Currently, we have two versions of the dpdb system implemented. One version aims for performance and the other one tries to achieve comprehensive logging and easy debugging of problem (instances), thereby increasing explainability. The former for instance does neither keep intermediate results nor create database tables in advance (Step 2b), as depicted in Figure 3, but creates tables according to an SQL SELECT statement. In the latter we keep all the intermediate results, we record database timestamps before and after certain nodes, provide statistics as, e.g., width, number of rows, etc. Further, since for each table , exactly one SQL statement is executed for filling this table, we also have a dedicated view of the SQL SELECT statement, whose result is then inserted in . Together with the power and flexibility of SQL queries, we observed that this helps in finding errors in the table algorithm specifications.

Besides convient debugging, system dpdb immediately contains an extension for approximation. There, we restrict the table contents to a maximum number of rows. This allows for certain approximations on counting problems or optimization problems, where it is infeasible to compute the full tables. Further, dpdb foresees a dedicated randomization on these restricted number of rows such that we obtain different approximate results on different random seeds.

Note that dpdb can be easily extended. Each problem can overwrite existing default behavior and dpdb also supports problem-specific argument parser for each problem individually. Out-of-the-box, we support the formats DIMACS sat and DIMACS graph [32] as well as the common format for TDs [15].

4.2 Table algorithms with dpdb for selected problems

The system dpdb allows for easy protyping of DP algorithms on TDs. This covers decision problems, counting problems as well as optimization problems. As a proof of concept, we present the relevant parts of table algorithm specification according to the template in Listing 3 for a selection of problems below. To this end, we assume in this section a not necessarily nice TD T = () of the corresponding graph representation of our given instance I. Further, for the following specifications of the table algorithm using the template in Algorithm 2, we assume any node t of T and its child nodes

Problem #Sat. Given instance formula . Then, specific parts for #Sat for node

– #SELECT 1 AS cnt

– #intrTab#: SELECT 1 AS val UNION ALL 0

– #– #aggrExp#: SUM(cnt) AS cnt

– #.cnt AS cnt

Observe that for the corresponding decision problem Sat, where the goal is to decide only the existence of a satisfying assignment for given formula returns the empty table and parts #aggrExp#, #extProj# are just empty since there is no counter needed.

– #SELECT 1 AS cnt

– #intrTab#: SELECT 1 AS val UNION ALL . . . UNION ALL o

– #– #aggrExp#: SUM(cnt) AS cnt

– #.cnt AS cnt

Problem MinVC. Given input graph I = G = (V, E), a vertex cover is a set of vertices of G such that for each edge , we have = . Then, MinVC asks to find the minimum cardinality |C| among all vertex covers C, i.e., C is such that there is no vertex cover with . Local problem MinVC(t, G) := is defined as above. We use an additional column card for storing cardinalities.

Problem can be specified as follows.

– #SELECT 0 AS card

– #intrTab#: SELECT 1 AS val UNION ALL 0

– #– #aggrExp#: MIN(card) AS card

– #

Observe that #ExtProj# is a bit more involved on non-nice TDs, as, whenever the column for a vertex a is set to 1, i.e., vertex a is in the vertex cover, we have to consider a only with cost 1, also if a appears in several child node bags.

Note that concrete implementations could generate and apply parts of this specification, as for example in #localProbFilter# only edges involving newly introduced vertices need to be checked.

Similar to MinVC and #o-Col one can model several other (graph) problems. One could also think of counting the number of solutions of problem MinVC, where both a column for cardinalities and one for counting is used. There, in addition to grouping with GROUP BY in dpdb, we additionally could use the HAVING construct of SQL, where only rows are kept, whose column card is minimal.

5 Experiments

We conducted a series of experiments using publicly available benchmark sets for #Sat. Our tested benchmarks [22] are publicly available, and our results are also on github at github.com/hmarkus/dp on dbs/padl2020.

5.1 Setup

Measure & Resources. We mainly compare wall clock time and number of timeouts. In the time we include preprocessing time as well as decomposition time for computing a TD with a fixed random seed. For parallel solvers we allowed access to 24 physical cores on machines. We set a timeout of 900 seconds and limited available RAM to 14 GB per instance and solver.

Benchmark Instances. We considered a selection of overall 1494 instances from various publicly available benchmark sets #Sat consisting of fre/meel benchmarks(1480 instances), and c2d benchmarks(14 instances). However, we considered instances preprocessed by regular #Sat preprocessor pmc [29], similar to results of recent work on #Sat [23], where it was also shown that more than 80% of the #Sat instances have primal treewidth below 19 after preprocessing.

Benchmarked system dpdb. We used PostgreSQL 9.5 for our system dpdb, which was available on our benchmark described hardware below. However, we expect major performance increases if higher versions are used, which was not available on our benchmark machines. In particular, parallel queries, where a query is evaluated in parallel, were added and improved in every version greater than 9.6.

Other benchmarked systems. In our experimental work, we present results for the most recent versions of publicly available #Sat solvers, namely, c2d 2.20 [13],

all 1.0.2 [42], and sdd 2.0 [14], which are all based on knowledge compilation techniques. We also considered rather recent approximate solvers ApproxMC2, ApproxMC3 [7] and sts 1.0 [20], as well as CDCL-based solvers Cachet 1.21 [37], , and sharpSAT 13.02 [40]. Finally, we also included multi-core solvers gpusat 1.0 and gpusat 2.0 [23], which both are based on dynamic programming, as well as countAntom 1.0 [6] on 12 physical CPU cores, which performed better than on 24 cores. Experiments were conducted with default solver options.

Benchmark Hardware. Almost all solvers were executed on a cluster of 12 nodes. Each node is equipped with two Intel Xeon E5-2650 CPUs consisting of 12 physical cores each at 2.2 GHz clock speed, 256 GB RAM and 1 TB hard disc drives (not an SSD) Seagate ST1000NM0033. The results were gathered on Ubuntu 16.04.1 LTS machines with disabled hyperthreading on kernel 4.4.0-139. As we also took into account solvers using a GPU, for gpusat1 and gpusat2

Figure 4: Runtime for the top 15 solvers over all #Sat instances. The x-axis refers to the number of instances and the y-axis depicts the runtime sorted in ascending order for each solver individually.

we used a machine equipped with a consumer GPU: Intel Core i3-3245 CPU operating at 3.4 GHz, 16 GB RAM, and one Sapphire Pulse ITX Radeon RX 570 GPU running at 1.24 GHz with 32 compute units, 2048 shader units, and 4GB VRAM using driver amdgpu-pro-18.30-641594 and OpenCL 1.2. The system operated on Ubuntu 18.04.1 LTS with kernel 4.15.0-34.

5.2 Results

Figure 4 illustrates the top 15 solvers, where instances are preprocessed by pmc, in a cactus-like plot, which provides an overview over all the benchmarked #Sat instances. The x-axis of these plots refers to the number of instances and the y-axis depicts the runtime sorted in ascending order for each solver individually. Overall, dpdb seems to be quite competitive and beats most of the solvers, as for example gpusat1, sharpSAT, dsharp, approxmc as well as cachet. Surprisingly, our system shows a different runtime behavior than the other solvers. We believe that the reason lies in an initial overhead caused by the creation of the tables that seems to depend on the number of nodes of the used TD. There, I/O operations of writing from main memory to hard disk seem to kick in. Table 1 presents more detailed runtime results, showing a solid fifth place for dpdb as our system solves the vast majority of the instances. Assume we only have instances up to an upper boundof treewidth 35. Then, if instances with TDs up to width 35 are considered, dpdb solves even slightly more instances than countAntom.

6 Final Discussion & Conclusions

We presented a generic system dpdb for explicitly exploiting treewidth by means of dynamic programming on databases. The idea of dpdb is to use database

Table 1: Number of solved #Sat instances, preprocessed by pmc and grouped by intervals of upper bounds of the treewidth. time[h] is the cumulated wall clock time in hours, where unsolved instances are counted as 900 seconds.

management systems (DBMS) for table manipulation, which makes it (1) easy and elegant to perform rapid prototyping for problems, and (2) allows to leverage from decades of database theory and database system tuning. It turned out that all the cases that occur in dynamic programming can be handled quite elegantly with plain SQL queries. Our system dpdb can be used for both decision and counting problems, thereby also considering optimization. We see our system particularly well-suited for counting problems, especially, since it was shown that for model counting (#Sat) instances of practical relevance typically have small treewidth [23]. In consequence, we carried out preliminary experiments on publicly available instances for #Sat, where we see competitive behavior compared to most recent solvers.

Future Work. Our results give rise to several research questions. First of all, we want to push towards PostgreSQL 12, but at the same time also consider other vendors and systems, e.g., Oracle. In particular, the behavior of different systems might change, when we use different strategies on how to write and evaluate our SQL queries, e.g., sub-queries vs. common table expressions. Currently, we do not create or use any indices, as preliminary tests showed that meaningful B*tree indices are hard to create and oftentimes cost too much time to create. Further, the exploration of bitmap indices, as available in Oracle enterprise DBMS would be worth trying in our case (and for #Sat), since one can efficiently combine database columns by using extremely efficient bit operations.

It might be worth to rigorously test and explore our extensions on limiting the number of rows per table for approximating #Sat or other counting problems, cf., [8,19,39]. Another interesting research direction is to study whether efficient data representation techniques on DBMS can be combined with dynamic programming in order to lift our solver to quantified Boolean formulas. Finally, we are also interested in extending this work to projected model counting [21].

References

1. Postgresql documentation 12 (2019), available at: https://www.postgresql.org/docs/ 12/queries-with.html

2. Abseher, M., Musliu, N., Woltran, S.: htd – a free, open-source framework for (customized) tree decompositions and beyond. In: CPAIOR’17. LNCS, vol. 10335, pp. 376–386. Springer Verlag (2017)

3. Bacchus, F., Dalmao, S., Pitassi, T.: Algorithms and complexity results for #SAT and bayesian inference. In: FOCS’03. pp. 340–351. IEEE Computer Soc. (2003)

4. Bannach, M., Berndt, S.: Practical access to dynamic programming on tree decompositions. Algorithms 12(8), 172 (2019)

5. Bliem, B., Charwat, G., Hecher, M., Woltran, S.: D-flat2: Subset minimization in dynamic programming on tree decompositions made easy. Fundam. Inform. 147(1), 27–61 (2016)

6. Burchard, J., Schubert, T., Becker, B.: Laissez-faire caching for parallel #SAT solving. In: SAT’15. LNCS, vol. 9340, pp. 46–61. Springer Verlag (2015)

7. Chakraborty, S., Fremont, D.J., Meel, K.S., Seshia, S.A., Vardi, M.Y.: Distributionaware sampling and weighted model counting for SAT. In: AAAI’14. pp. 1722–1730. The AAAI Press (2014)

8. Chakraborty, S., Meel, K.S., Vardi, M.Y.: Improving approximate counting for probabilistic inference: From linear to logarithmic sat solver calls. In: IJCAI’16. pp. 3569–3576. The AAAI Press (2016)

9. Charwat, G., Woltran, S.: Expansion-based QBF solving on tree decompositions. Fundam. Inform. 167(1-2), 59–92 (2019)

10. Choi, A., Van den Broeck, G., Darwiche, A.: Tractable learning for structured probability spaces: A case study in learning preference distributions. In: IJCAI’15. The AAAI Press (2015)

11. Codd, E.F.: A relational model of data for large shared data banks. Commun. ACM 13(6), 377–387 (1970)

12. Cygan, M., Fomin, F.V., Kowalik, �L., Lokshtanov, D., D´aniel Marx, M.P., Pilipczuk, M., Saurabh, S.: Parameterized Algorithms. Springer Verlag (2015)

13. Darwiche, A.: New advances in compiling CNF to decomposable negation normal form. In: ECAI’04. pp. 318–322. IOS Press (2004)

14. Darwiche, A.: SDD: A new canonical representation of propositional knowledge bases. In: IJCAI’11. pp. 819–826. AAAI Press/IJCAI (2011)

15. Dell, H., Komusiewicz, C., Talmon, N., Weller, M.: The PACE 2017 Parameterized Algorithms and Computational Experiments Challenge: The Second Iteration. In: IPEC 2017. Leibniz International Proceedings in Informatics (LIPIcs), vol. 89, pp. 30:1–30:12. Dagstuhl Publishing (2018)

16. Diestel, R.: Graph Theory, 4th Edition, Graduate Texts in Mathematics, vol. 173. Springer Verlag (2012)

17. Domshlak, C., Hoffmann, J.: Probabilistic planning via heuristic forward search and weighted model counting. Journal of Artificial Intelligence Research 30 (2007)

18. Doubilet, P., Rota, G.C., Stanley, R.: On the foundations of combinatorial theory. vi. the idea of generating function. In: Berkeley Symposium on Mathematical Statistics and Probability. pp. 2: 267–318 (1972)

19. Due˜nas-Osorio, L., Meel, K.S., Paredes, R., Vardi, M.Y.: Counting-based reliability estimation for power-transmission grids. In: AAAI’17. pp. 4488–4494. The AAAI Press (2017)

20. Ermon, S., Gomes, C.P., Selman, B.: Uniform solution sampling using a constraint solver as an oracle. In: UAI’12. pp. 255–264. AUAI Press (2012)

21. Fichte, J.K., Hecher, M., Morak, M., Woltran, S.: Exploiting treewidth for projected model counting and its limits. In: SAT’18. LNCS, vol. 10929, pp. 165–184. Springer Verlag (2018)

22. Fichte, J.K., Hecher, M., Woltran, S., Zisser, M.: A Benchmark Collection of #SAT Instances and Tree Decompositions (Benchmark Set) (Jun 2018), https: //doi.org/10.5281/zenodo.1299752

23. Fichte, J.K., Hecher, M., Zisser, M.: An improved gpu-based SAT model counter. In: CP. Lecture Notes in Computer Science, vol. 11802, pp. 491–509. Springer (2019)

24. Gomes, C.P., Sabharwal, A., Selman, B.: Chapter 20: Model counting. In: Handbook of Satisfiability, Frontiers in Artificial Intelligence and Applications, vol. 185, pp. 633–654. IOS Press (2009)

25. Kiljan, K., Pilipczuk, M.: Experimental evaluation of parameterized algorithms for feedback vertex set. In: SEA. LIPIcs, vol. 103, pp. 12:1–12:12. Schloss Dagstuhl (2018)

26. Kleine B¨uning, H., Lettman, T.: Propositional logic: deduction and algorithms. Cambridge University Press, Cambridge (1999)

27. Kloks, T.: Treewidth. Computations and Approximations, LNCS, vol. 842. Springer Verlag (1994)

28. Koriche, F., Lagniez, J.M., Marquis, P., Thomas, S.: Knowledge compilation for model counting: Affine decision trees. In: IJCAI’13. The AAAI Press (2013)

29. Lagniez, J., Marquis, P.: Preprocessing for propositional model counting. In: AAAI. pp. 2688–2694. AAAI Press (2014)

30. Lagniez, J.M., Marquis, P.: An improved decision-DDNF compiler. In: IJCAI’17. pp. 667–673. The AAAI Press (2017)

31. Langer, A., Reidl, F., Rossmanith, P., Sikdar, S.: Evaluation of an MSO-solver. In: Proc. ALENEX. pp. 55–63. SIAM / Omnipress (2012)

32. Liu, J., Zhong, W., Jiao, L.: Comments on ”the 1993 DIMACS graph coloring challenge” and ”energy function-based approaches to graph coloring”. IEEE Trans. Neural Networks 17(2), 533 (2006)

33. Muise, Christian J .and McIlraith, S.A., Beck, J.C., Hsu, E.I.: Dsharp: Fast d-DNNF compilation with sharpSAT. In: AI’17. LNCS, vol. 7310, pp. 356–361. Springer Verlag (2012)

34. Oztok, U., Darwiche, A.: A top-down compiler for sentential decision diagrams. In: IJCAI’15. pp. 3141–3148. The AAAI Press (2015)

35. Roth, D.: On the hardness of approximate reasoning. Artif. Intell. 82(1-2), 273–302 (1996)

36. Samer, M., Szeider, S.: Algorithms for propositional model counting. Journal of Discrete Algorithms 8(1), 50—64 (2010)

37. Sang, T., Bacchus, F., Beame, P., Kautz, H., Pitassi, T.: Combining component caching and clause learning for effective model counting. In: SAT’04 (2004)

38. Sang, T., Beame, P., Kautz, H.: Performing bayesian inference by weighted model counting. In: AAAI’05. The AAAI Press (2005)

39. Sharma, S., Roy, S., Soos, M., Meel, K.S.: GANAK: A scalable probabilistic exact model counter. In: IJCAI. pp. 1169–1176. ijcai.org (2019)

40. Thurley, M.: sharpSAT – counting models with advanced component caching and implicit BCP. In: SAT’06. pp. 424–429. Springer Verlag (2006)

41. Toda, S.: PP is as hard as the polynomial-time hierarchy. SIAM J. Comput. 20(5), 865–877 (1991)

42. Toda, T., Soh, T.: Implementing efficient all solutions SAT solvers. ACM Journal of Experimental Algorithmics 21, 1.12 (2015), special Issue SEA 2014

43. Ullman, J.D.: Principles of Database and Knowledge-Base Systems, Volume II. Computer Science Press (1989)

44. Xue, Y., Choi, A., Darwiche, A.: Basing decisions on sentences in decision diagrams. In: AAAI’12. The AAAI Press (2012)