Interactive Debugging of Knowledge Bases
2016·arXiv

I Prolog

1 Introduction

2 Preliminaries

2.1 Assumptions

2.2 Considered Logics

2.3 Notational Remarks

3 Knowledge Base Debugging

3.1 Parsimonious Knowledge Base Debugging

3.2 Background Knowledge

4 Diagnosis Computation

4.1 Conflict Sets

4.2 Conflict Sets versus Justifications

4.3 The Relation between Conflict Sets and Diagnoses

4.4 Methods for Diagnosis Computation

4.4.1 Computation of a Minimal Conflict Set

4.4.2 Correctness of Conflict Set Computation

4.5 Hitting Set Tree Based Diagnosis Computation

4.5.1 Breadth-First Diagnosis Computation

4.5.2 Correctness of Breadth-First Diagnosis Computation

4.6 Diagnosis Probability Space

4.6.1 Construction of a Probability Space

4.6.2 Using Probabilities for Diagnosis Computation

4.6.3 Correctness of Weighted Diagnosis Computation

4.6.4 Using Probabilities to Compute Minimum Cardinality Diagnoses

4.7 Non-Interactive Knowledge Base Debugging Algorithm

5 Summary

II Interactive Knowledge Base Debugging

6 Motivation and Problem Definitions

7 User Interaction

7.1 Queries

7.2 Leading Diagnoses

7.3 Q-Partitions

7.4 Interpretation of Q-Partitions

7.5 The Relation between a Query and Its Q-Partition

7.6 Existence of Queries

8 Query Generation

8.1 Generation of a Pool of Queries

8.2 Discussion of Query Pool Generation

8.3 Minimization of Queries

8.4 Soundness of Query Minimization

8.5 Complexity of Query Pool Generation

8.6 Shortcomings of Query Pool Generation

8.7 Correctness of Query Pool Generation

9 An Algorithm for Interactive Knowledge Base Debugging

9.1 Overview

9.2 Detailed Description

9.2.1 Input Arguments

9.2.2 Output

9.2.3 Variables

9.2.4 Algorithm Walkthrough

9.3 Query Selection Measures

9.4 Correctness and Complexity

10 Summary

III Iterative Diagnosis Computation

11 STATICHS: A Static Iterative Diagnosis Computation Algorithm

11.1 Overview and Intuition

11.2 Algorithm Walkthrough

11.3 Examples

11.4 Correctness

12 DYNAMICHS: A Dynamic Iterative Diagnosis Computation Algorithm

12.1 Overview and Intuition

12.2 Algorithm Walkthrough

12.3 Examples

12.4 Details and Correctness

12.4.1 Definitions and Notation

12.4.2 The Labeling Function

12.4.3 Impact of Answered Queries on Conflict Sets

12.4.4 Impact of Answered Queries on Diagnoses

12.4.5 Redundant Nodes

12.4.6 Hitting Set Tree Pruning

12.4.7 De-Facto Non-Redundant Nodes

12.4.8 Completeness

12.4.9 Soundness

12.4.10 Correctness

13 Discussion of Iterative Diagnosis Computation

IV Two Query Strategies for Efficient Fault Localization in Interactive Ontology Debugging

14 Introduction to the Problem

15 Motivating Examples and Basic Concepts

16 Entropy-Based Query Selection

17 Implementation Details

18 Evaluation

19 Related Work

20 Summary and Conclusions

V Minimizing User Interaction in Ontology Debugging

21 Introduction to the Problem

22 Motivation and Basic Concepts

23 RIO: Risk Optimization for Query Selection

24 Evaluation

25 Related Work

26 Summary and Conclusions

VI A Direct Approach to Sequential Diagnosis of High Cardinality Faults in Knowledge Bases

27 Introduction to the Problem

28 Basic Concepts

29 Interactive Direct Diagnosis of Knowledge Bases

30 Evaluation

31 Summary and Conclusions

VII Epilog

32 Related Work

33 Summary

34 Future Work Topics

Bibliography

List of Figures

1.1 The Principle of Non-Interactive KB Debugging

1.2 The Principle of Interactive KB Debugging

1.3 Precedence Constraints among the Parts

4.1 Recursion Tree for the Computation of a Minimal Conflict Set

4.2 Non-Interactive KB Debugging Process without Fault Information

4.3 Non-Interactive KB Debugging Process with Fault Information

11.1 (Example 11.1) Solving the Problem of Interactive Static KB Debugging

11.2 (Example 11.2) Solving the Problem of Interactive Static KB Debugging

11.3 (Example 11.2 continued) Solving the Problem of Interactive Static KB Debugging

12.1 (Example 12.1) Solving the Problem of Interactive Dynamic KB Debugging

12.2 (Example 12.1 continued) Solving the Problem of Interactive Dynamic KB Debugging

12.3 (Example 12.2) Solving the Problem of Interactive Dynamic KB Debugging

12.4 (Example 12.2 continued) Solving the Problem of Interactive Dynamic KB Debugging

15.1 The Search Tree of the Greedy Algorithm

18.1 Average Number of Queries Required to Select the Target Diagnosis

18.2 Example of Prior Fault Probabilities of Syntax Elements Sampled from Extreme, Moderate and Uniform Distributions

18.3 Average Time/Query Gain Resulting from the Application of the Extended CKK Partitioning Algorithm

18.4 Average Time Required to Identify the Target Diagnosis Using CKK and Brute Force Query Selection

24.1 Average Number of Queries Required by RIO Compared to Other Query Selection Approaches

24.2 Box-Whisker Plots Illustrating the Performance Discrepancy between Better and Worse Query Selection Strategy

29.1 INV-QX Recursion Tree

29.2 Identification of the Target Diagnosis Using INV-HS-TREE and INV-QX

29.3 Identification of the Target Diagnosis Using HS-TREE and QX

List of Tables

22.1 A Set of Queries and Associated Partitions w.r.t. the Initial DPI of the Example Ontology

24.1 Average Time for Debugging Session, between Two Successive Queries and Average Number of Queries Required by Each Strategy

24.2 Percentage Rates Indicating How Often Which Query Selection Strategy Performed Best

30.1 HS-TREE and INV-HS-TREE Applied to Anatomy Benchmark

30.2 Performance of Sequential Diagnosis Using Direct Computation of Diagnoses

Abstract

Most artificial intelligence applications rely on knowledge about a relevant real-world domain that is encoded in a knowledge base (KB) by means of some logical knowledge representation language. The most essential benefit of such logical KBs is the opportunity to perform automatic reasoning to derive implicit knowledge or to answer complex queries about the modeled domain. The feasibility of meaningful reasoning requires a KB to meet some minimal quality criteria such as consistency; that is, there must not be any contradictions in the KB. Without adequate tool assistance, the task of resolving such violated quality criteria in a KB can be extremely hard even for domain experts, especially when the problematic KB includes a large number of logical formulas, comprises complicated formalisms, was developed by multiple people or in a distributed fashion or was (partially) generated by means of some automatic systems.

Non-interactive debugging systems published in research literature often cannot localize all possible faults (incompleteness), suggest the deletion or modification of unnecessarily large parts of the KB (non-minimality), return incorrect solutions which lead to a repaired KB not satisfying the imposed quality requirements (unsoundness) or suffer from poor scalability due to the inherent complexity of the KB debugging problem. Even if a system is complete and sound and considers only minimal solutions, there are generally exponentially many solution candidates to select one from. However, any two repaired KBs obtained from these candidates differ in their semantics in terms of entailments and non-entailments. Selection of just any of these repaired KBs might result in unexpected entailments, the loss of desired entailments or unwanted changes to the KB which in turn might cause unexpected new faults during the further development or application of the repaired KB. Also, manual inspection of a large set of solution candidates can be time-consuming (if not practically infeasible), tedious and error-prone since human beings are normally not capable of fully realizing the semantic consequences of deleting a set of formulas from a KB. Hence there is a need for adequate tools that support a user when facing a faulty KB.

In this work, we account for these issues and propose methods for the interactive debugging of KBs which are complete and sound and compute only minimally invasive solutions, i.e. suggest the deletion or modification of just a set-minimal subset of the formulas in the problematic KB. User interaction takes place in the form of queries asked to a person, e.g. a domain expert, about intended and non-intended entailments of the correct KB. To construct a query, a minimum of only two solution candidates must be available. After the answer to a query is known, the search space for solutions is pruned. Iteration of this process until there is only a single solution candidate left yields a repaired KB which features exactly the semantics desired and expected by the user.

The novel contributions of this work are:

Thorough Theoretical Workup of the Topic of Interactive Debugging of Monotonic KBs: We evolve the theory of the topic by first elaborating on the theory of non-interactive KB debugging, revealing crucial shortcomings in the application of non-interactive methods and thereby motivating the development and deployment of interactive approaches in KB debugging. Then, we give some important results that guarantee the feasibility of interactive KB debugging, give some precise definitions of the problems interactive KB debugging aims to solve and present algorithms that provably solve these problems.

A Complete Picture of an Interactive Debugging System is Drawn: This is the first work that deals with an entire system of algorithms that are required for the interactive debugging of monotonic KBs, considers and details all algorithms separately, proves their correctness and demonstrates how all these algorithms are orchestrated to make up a full-fledged and provably correct interactive KB debugging system.

Two New Algorithms for the Iterative Computation of Candidate Solutions in the scope of interactive KB debugging are proposed. The first one guarantees constant convergence towards the exact solution of the interactive KB problem by the ascertained reduction of the number of remaining solutions after any query is answered. The second one features powerful search tree pruning techniques and might thus be expected to exhibit a more time- and space-saving behavior than existing algorithms, in particular for growing problem instances.

Suggestion and Extensive Analysis of Different Methods for Selection of the “Best” Query to ask the user next. We compare a greedy “split-in-half” strategy that proposes queries which eliminate half of the known candidate solutions with a strategy relying on information entropy that chooses the query with the highest information gain based on (a user’s) beliefs about faults in the KB. Comprehensive experiments show that an average guess of the fault information suffices for the latter strategy to reduce the query answering effort for the interacting user, often to a significant extent, compared to the former. Moreover, we demonstrate that both methods clearly outperform a random way of selecting queries.

Presentation of a Reinforcement Learning Query Selection Strategy. Minimal effort for the interacting user can be achieved if both the query selection method is chosen carefully and the provided fault information satisfies some minimum quality requirements. In particular, for deficient fault information and an unfavorable strategy for query selection, we observe cases where the overhead in terms of user effort exceeds 2000% (!) in comparison to employing a more favorable query selection strategy. Since, unfortunately, assessment of the fault information is only possible a posteriori (after the debugging session is finished and the correct solution is known), we devise a learning strategy (RIO) that continuously adapts its behavior depending on the performance achieved and in this vein minimizes the risk of using low-quality fault information. This approach makes interactive debugging practical even in scenarios where reliable fault estimates are difficult to obtain. Evaluations provide evidence that for 100% of the cases in the hardest (from the debugging point of view) class of faulty test KBs, RIO performed at least as well as the best other strategy and in more than 70% of these cases it even showed superior behavior to the best other strategy. Choosing RIO over other approaches can involve an improvement by a factor of up to 23, meaning that more than 95% of user time and effort might be saved per debugging session.

Provisioning of Mechanisms for Efficiently Dealing with KB Debugging Problems Involving High Cardinality Faults. In the standard interactive debugging approach described in this work, the computation of queries is based on the generation of the set of most probable solution candidates. By this postulation, certain quality guarantees about the output solution can be given. However, we learn that dropping this requirement can bring about substantial savings in terms of time and especially space complexity of interactive debugging, in particular in debugging scenarios where faulty KBs are (partly) generated as a result of the application of automatic systems. In such situations, we propose to base query computation on any set of solution candidates using a “direct” method for candidate generation. We study the application of this direct method to high cardinality faults in KBs and find out that the number of required queries per debugging session is scarcely affected for cases when the standard approach is also applicable. However, the direct method proves applicable in situations when the standard approach is not (due to time or memory issues) and is still able to locate the correct solution.

In this part, we first give an introduction in Chapter 1. This includes a motivation for why knowledge base debugging is a “hot topic” (and even getting hotter as intelligent applications and devices become more and more ubiquitous), an introduction to the non-interactive debugging of knowledge bases and the exposure of decisive shortcomings of this paradigm, e.g. poor scalability and the risk of obtaining solutions of inferior quality. As a solution to the identified issues we then explain how a (group of) user(s) might collaborate with an interactive debugging system to determine high-quality solutions even in scenarios where non-interactive systems fail. Further, we discuss the design and the components of a generic interactive debugger, provide an illustrating example and outline our system's ability to incorporate background knowledge into the debugging process, which can drastically reduce the search space for solutions and disclose faults in the knowledge base that could be missed otherwise. Finally, we provide an enumeration of the contributions of this work and discuss the further organization of this part and of the rest of this work.

Motivation. Most artificial intelligence applications rely on knowledge that is encoded in a knowledge base (KB) by means of some logical knowledge representation language such as propositional logic (PL) [CL73], Datalog [CGT89], first-order logic (FOL) [CL73], the Web Ontology Language (OWL [PSHH+04], OWL 2 [GHM+08, MPSP09]) or Description Logic (DL) [BCM+07]. Experts in a variety of application domains keep developing KBs of constantly growing size. A concrete example of a repository containing biomedical KBs is the Bioportal, which comprises vast ontologies with tens or even hundreds of thousands of terms each (e.g. the SNOMED-CT ontology with currently over 395,000 terms). Such KBs, however, pose a significant challenge for people as well as tools involved in their evolution, maintenance and application.

All these activities are based on the most essential benefit of logical KBs, namely the opportunity to perform automatic reasoning to derive implicit knowledge or to answer complex queries about the modeled domain. The feasibility of meaningful reasoning requires a KB to meet the minimum quality criterion of consistency, i.e. there must not be any contradictions in the KB, because any logical formula can be derived from an inconsistent KB. Beyond that, one might postulate further requirements to be met by a KB. For instance, one might consider faulty a FOL KB entailing ∀X ¬p(X) for some predicate symbol p occurring in the KB. Such a KB would be incoherent, i.e. it would violate the requirement of coherency (which was originally defined for DL KBs [SHCH07, PSK05]). Additionally, test cases can be specified giving information about desired (positive test cases) and non-desired (negative test cases) entailments a correct KB should feature. This characterization of a KB’s intended semantics is a direct analogue of the practice in software debugging, where test cases are exploited as a means to verify the correct semantics of the program code.

As KBs are growing in size and complexity, their likelihood of violating one of these criteria increases. Faults in KBs may, for instance, arise because human reasoning is simply overstrained [HBP11, HPS09]. That is, generally a person will not be capable of completely grasping or mentally processing the entire knowledge contained in a (large or complex) KB at once. In fact, a person might fully comprehend some isolated part of the KB, but might not be able to determine or understand all implications or non-implications of this isolated part combined with other parts of the KB, e.g. when new logical formulas are added.

Another reason for the non-compliance with the mentioned quality criteria imposed on KBs might be that multiple (independently working) editors contribute to the development of the KB [NCLM06], which may lead to contradictory formulas. The OBO Project and the NCI Thesaurus are examples of collaborative KB development projects. Employing automatic tools, e.g. [JRG11, NB12, JMSK09], to generate (parts of) KBs can further complicate the task of KB quality assurance [Mei11, EFvH+11].

Moreover, as studies in cognitive psychology [CP71, JL99] attest, humans make systematic errors while formulating or interpreting logical formulas. These observations are confirmed by [RDH+04, RCVB09] which present common faults people make when developing a KB (ontology). Hence, it is essential to devise methods that can efficiently identify and correct faults in a KB.

Non-Interactive KB Debugging. Given a set of requirements to the KB and sets of test cases, KB debugging methods [SHCH07, KPHS07, FS05, HPS08] can localize a (potential) fault by computing a subset D of the formulas in the KB K called a diagnosis. At least all formulas in a diagnosis must be (adequately) modified or deleted in order to obtain a KB K∗ that satisfies all postulated requirements and test cases. Such a KB K∗ constitutes the solution to the KB debugging problem. Figure 1.1 outlines such a KB debugging system. The input to the system is a diagnosis problem instance (DPI) defined by

some KB K formulated using some (monotonic) logical language L (every formula in K might be correct or faulty),

(optionally) some KB B (over L) formalizing some background knowledge relevant for the domain modeled by K (such that B and K do not share any formulas; all formulas in B are considered correct),

a set of requirements R to the correct KB,

sets of positive (P) and negative (N) test cases (over L) asserting desired semantic properties of the correct KB and

(optionally) some fault information FP, e.g. in terms of fault probabilities of logical formulas in K.
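For illustration, the inputs listed above can be grouped into a single record. The following is a minimal Python sketch and not the representation used in this work: formulas are assumed to be opaque strings, and the class name `DPI` as well as all field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List

Formula = str  # assumption: a formula is just an opaque string here


@dataclass
class DPI:
    """Hypothetical record for a diagnosis problem instance (DPI)."""
    kb: FrozenSet[Formula]                                     # K: each formula may be correct or faulty
    background: FrozenSet[Formula] = frozenset()               # B: formulas considered correct
    requirements: FrozenSet[str] = frozenset({"consistency"})  # R
    positive: List[FrozenSet[Formula]] = field(default_factory=list)  # P
    negative: List[FrozenSet[Formula]] = field(default_factory=list)  # N
    fault_prob: Dict[Formula, float] = field(default_factory=dict)    # FP (optional)

    def __post_init__(self) -> None:
        # B and K must not share any formulas.
        if self.kb & self.background:
            raise ValueError("K and B must be disjoint")
        # Fault probabilities refer to formulas in K and must lie in [0, 1].
        for formula, prob in self.fault_prob.items():
            if formula not in self.kb or not 0.0 <= prob <= 1.0:
                raise ValueError(f"invalid fault information for {formula!r}")


# Made-up example instance.
dpi = DPI(
    kb=frozenset({"ax1", "ax2", "ax3"}),
    background=frozenset({"bk1"}),
    negative=[frozenset({"n1"})],
    fault_prob={"ax1": 0.3, "ax2": 0.05},
)
```

The two optional inputs (B and FP) default to empty collections, mirroring the "(optionally)" markers in the list above.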

Moreover, the system requires a sound and complete logical reasoner for deciding consistency (coherency) and calculating logical entailments of a KB formulated over the language L. Some approaches (including the ones presented in this work) use the reasoner as a black box (e.g. [SFFR12, Hor11]) within the debugging system. That is, the reasoner is called as is and serves as an oracle independent of other computations during the debugging process; the internals of the reasoner are irrelevant for the debugging task. On the other hand, glass-box approaches (e.g. [SHCH07, Hor11, KPSH05]) attempt to exploit internal modifications of the reasoner for debugging purposes; in other words, the sources of problems (e.g. contradictory formulas) in the KB are computed as a direct consequence of reasoning [Hor11]. The advantages of a black-box approach over a glass-box approach are the lower memory consumption and better performance [KPSH05] of the reasoner and the reasoner independence of the debugging method. The latter benefit is essential for the generality of our approaches and their applicability to various knowledge representation formalisms.

Given these inputs, the debugging system focuses on (a subset of) all possible fault candidates (usually the set of minimal, i.e. irreducible, diagnoses) and usually outputs the most probable one amongst these if some fault information is provided or the minimum cardinality one, otherwise. Alternatively, a debugging system might also be employed to calculate a predefined number of (most probable or minimum cardinality) minimal diagnoses or to determine all minimal diagnoses computable within a predefined time limit.

Figure 1.1: The principle of non-interactive KB debugging.

Issues with Non-Interactive KB Debugging Systems. In real-world scenarios, debugging tools often have to cope with large numbers of minimal diagnoses where the trivial application, i.e. deletion, of any minimal diagnosis leads to a (repaired) KB with different semantics in terms of entailed and non-entailed formulas. For example, in [SF10] a sample study of real-world KBs revealed that the number of different minimal diagnoses can far exceed one thousand (1782 minimal diagnoses for a KB with only 1300 formulas). In such situations, simply visualizing all these alternative modifications of the ontology is clearly ineffective. Selecting a wrong diagnosis (in terms of its semantics, not in terms of fulfillment of test cases and requirements) can lead to unexpected entailments or non-entailments, lost desired entailments and surprising future faults when the KB is further developed. Manual inspection of a large set of (minimal) diagnoses is time-consuming (if not practically infeasible) and error-prone, and computing the full set in the first place is often infeasible due to the complexity of diagnosis computation.

Moreover, [Stu08] has put several (non-interactive) debugging systems to the test using a test set of faulty (incoherent OWL) real-world KBs which were partly designed by humans and partly by the application of automatic systems. The result was that most of the investigated systems had serious performance problems, ran out of memory, were not able to locate all the existing faults in the KB (incompleteness), reported parts of a KB as faulty which actually were not faulty (unsoundness), produced only trivial solutions or suggested non-minimal faults (non-minimality). Often, performance problems and incompleteness of non-interactive debugging methods can be traced back to an explosion of the search tree for minimal diagnoses.

The Solution: Interactive KB Debugging. In this work we present algorithms for interactive KB debugging. These aim at the gradual reduction of compliant minimal diagnoses by means of user interaction, thereby seeking to prevent the search tree for minimal diagnoses from exploding in size by performing regular pruning operations. “User” in this case might refer to a single person or multiple persons, usually experts in the particular domain the faulty KB is dealing with, such as biology, medicine or chemistry. Throughout an interactive debugging session, the user is asked a set of automatically chosen queries about the domain that should be modeled by a given faulty KB. A query can be created by the system after a set D of a minimum of two minimal diagnoses has been precomputed (we call D the leading diagnoses). Each query is a conjunction (i.e. a set) of logical formulas that are entailed by some correct subset of the formulas in the KB. With regard to one particular query Q, any set of minimal diagnoses for the KB, in particular the set D which has been utilized to generate Q, can be partitioned into three sets: the first one (D+) including all diagnoses in D compliant only with a positive answer to Q, the second (D−) including all diagnoses in D compliant only with a negative answer to Q, and the third (D0) including all diagnoses in D compliant with both answers. A positive answer to Q signals that the conjunction of formulas in Q must be entailed by the correct KB, which is why Q is added to the set of positive test cases. Likewise, if the user negates Q, this is an indication that at least one formula in Q must not be entailed by the correct KB. As a consequence, Q is added to the set of negative test cases.

Assignment of a query Q to either set of test cases results in a new debugging scenario. In this new scenario, all elements of D− are no longer minimal diagnoses given that Q has been classified as a positive test case. Otherwise, all diagnoses in D+ are invalidated. In this vein, successively answering the queries generated by the system will lead the user to the single minimal solution diagnosis that perfectly reflects their intended semantics. In other words, after deletion of all formulas in the solution diagnosis from the KB and the addition of the conjunction of all formulas in the specified positive test cases to the KB, the resulting KB meets all requirements and positive as well as negative test cases. Here, the added formulas contained in the positive test cases serve to replace the desired entailments that are broken due to the deletion of the solution diagnosis from the KB.
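The partitioning and pruning steps described above can be sketched as follows. This is a minimal illustration in Python under a strong simplifying assumption: whether a diagnosis is compliant with a positive or a negative answer to a fixed query is supplied by two caller-provided predicates (`fits_yes` and `fits_no`, both hypothetical names), which in an actual debugger would be backed by reasoner calls. The diagnoses and compliance tables in the example are made up.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Diagnosis = FrozenSet[str]
Partition = Tuple[List[Diagnosis], List[Diagnosis], List[Diagnosis]]


def q_partition(diagnoses: Sequence[Diagnosis],
                fits_yes: Callable[[Diagnosis], bool],
                fits_no: Callable[[Diagnosis], bool]) -> Partition:
    """Split the leading diagnoses into (D+, D-, D0) for one fixed query."""
    d_plus: List[Diagnosis] = []
    d_minus: List[Diagnosis] = []
    d_zero: List[Diagnosis] = []
    for d in diagnoses:
        yes, no = fits_yes(d), fits_no(d)
        if yes and no:
            d_zero.append(d)    # compliant with both answers
        elif yes:
            d_plus.append(d)    # compliant only with a positive answer
        elif no:
            d_minus.append(d)   # compliant only with a negative answer
        else:
            raise ValueError("diagnosis is compliant with neither answer")
    return d_plus, d_minus, d_zero


def prune(partition: Partition, answer: bool) -> List[Diagnosis]:
    """Keep the diagnoses that survive the user's answer to the query."""
    d_plus, d_minus, d_zero = partition
    # A positive answer invalidates D-, a negative answer invalidates D+.
    return (d_plus if answer else d_minus) + d_zero


# Made-up leading diagnoses and stubbed compliance tables.
d1, d2, d3 = frozenset({"ax1"}), frozenset({"ax2"}), frozenset({"ax3"})
yes_ok = {d1, d3}
no_ok = {d2, d3}
partition = q_partition([d1, d2, d3], yes_ok.__contains__, no_ok.__contains__)
remaining_after_yes = prune(partition, answer=True)
remaining_after_no = prune(partition, answer=False)
```

Diagnoses in D0 survive either answer, so a query whose D0 contains most of the leading diagnoses prunes little regardless of how the user replies.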

Hence, in the interactive KB debugging scenario the user is not required to understand which faults (e.g. sources of inconsistency or implications of negative test cases) occur in the faulty initial KB, why they are faults (i.e. why particular entailments are given and others not) and how to repair them. All these tasks are undertaken by the interactive debugging system.

The proposed approaches to interactive KB debugging in this work follow the standard model-based diagnosis (MBD) technique [Rei87, dKW87]. MBD has been successfully applied to a great variety of problems in various fields such as robotics [SW05], planning [SW09], debugging of software programs [WSM02], configuration problems [FFJS04], hardware designs [FSW99], constraint satisfaction problems and spreadsheets [ARW12]. Given a description (model) of a system, together with an observation of the system’s behavior which conflicts with the intended behavior of the system, the task of MBD is to find those components of the system (a diagnosis) which, when assumed to be functioning abnormally, provide an explanation of the discrepancy between the intended and the observed system behavior. Translated to the setting of KB debugging, the set of “system components” comprises the formulas ax_i in the given faulty KB K. The “system description” refers to the statement that the KB K along with the background KB B and the positive test cases p ∈ P must meet all predefined requirements (e.g. consistency, coherency) and must not logically entail any of the negative test cases n ∈ N, i.e.

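(i) K ∪ B ∪ ⋃_{p ∈ P} p satisfies all predefined requirements, and

(ii) K ∪ B ∪ ⋃_{p ∈ P} p ⊭ n for all n ∈ N.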

The “observation which conflicts with the intended behavior of the system” corresponds to the finding that (i) or (ii) or both are violated. That is, the “system description” along with the “observation” and the assumption that all components are sound yields an inconsistency. An “explanation for the discrepancy between observed and intended system behavior” (i.e. a diagnosis) is the assumption D that all formulas in a subset D of K are faulty (“behave abnormally”) and all formulas in K\D are correct (“do not behave abnormally”) such that the “system description” along with the “observation” and the assumption D is consistent. Computation of (minimal) diagnoses is accomplished with the aid of minimal conflict sets, i.e. irreducible sets of formulas in the KB K that preserve the violation of (i) or (ii) or both.
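To make this characterization of a diagnosis more tangible, the following minimal Python sketch checks by brute force whether a set D of formulas is a diagnosis for a tiny propositional example. It assumes that consistency is the only requirement. The encoding of formulas as Python predicates, the helper names (`models`, `satisfiable`, `entails`, `is_diagnosis`) and the toy KB are illustrative assumptions and are not taken from this work.

```python
from itertools import product

# Formulas are modelled as Python predicates over a truth assignment (a dict).
# This encoding is an illustrative assumption, not the representation used in this work.

def models(formulas, atoms):
    """Yield every truth assignment over `atoms` that satisfies all formulas."""
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(f(world) for f in formulas):
            yield world

def satisfiable(formulas, atoms):
    return next(models(formulas, atoms), None) is not None

def entails(formulas, goal, atoms):
    return all(goal(w) for w in models(formulas, atoms))

def is_diagnosis(D, K, B, P, N, atoms):
    """D is a diagnosis iff K minus D, together with B and P, is consistent
    and entails none of the negative test cases in N."""
    rest = [K[name] for name in K if name not in D] + list(B) + list(P)
    return satisfiable(rest, atoms) and not any(entails(rest, n, atoms) for n in N)

atoms = ["a", "b", "c"]
K = {
    "ax1": lambda w: (not w["a"]) or w["b"],   # a -> b
    "ax2": lambda w: (not w["b"]) or w["c"],   # b -> c
    "ax3": lambda w: w["a"],                   # a
}
B, P = [], []                                  # no background KB, no positive test cases
N = [lambda w: w["c"]]                         # negative test case: c must not be entailed

print(is_diagnosis(set(), K, B, P, N, atoms))     # K itself entails c
print(is_diagnosis({"ax2"}, K, B, P, N, atoms))   # deleting ax2 breaks the entailment
```

In this toy scenario the whole KB is the only minimal conflict set, so deleting any single formula yields a diagnosis. A practical system would replace the truth-table enumeration by calls to a logical reasoner.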

An MBD problem can be modeled as an abduction problem [BATJ91], i.e. finding an explanation for a set of data. It was proven in [BATJ91] that the computation of the first explanation (minimal diagnosis) is in P. However, given a set of explanations (minimal diagnoses), it is NP-complete to decide whether there is an additional explanation (minimal diagnosis). Stated differently, the first explanation can be found efficiently whereas finding any further one is intractable (unless P = NP). When seeing the (interactive) KB debugging problem as an abduction problem, one must additionally take into account the costs for reasoning, because a call to a logical reasoner is required in order to decide whether or not a set of hypotheses (a subset of the KB) is an explanation (minimal diagnosis). Incorporating the necessary reasoning costs and assuming consistency as a minimal requirement on the correct KB, finding the first explanation (minimal diagnosis) is already NP-hard even for propositional KBs [SL89] (since propositional satisfiability checking is NP-complete). The worst case complexity for the debugging of KBs formulated over more expressive logics such as OWL 2 (reasoning is 2-NEXPTIME-complete [GHM+08, Kaz08]) is of course even worse. This seems quite discouraging. However, we have shown in our previous works [RSFF13, SFFR12, SFRF14c] that for many real-world KBs interactive KB debugging is feasible in reasonable time, despite high (or intractable) worst case reasoning costs and the intractable complexity of the abduction (i.e. minimal diagnosis finding) problem as such. Hence, one goal of this work is to present algorithms that work well in many practical scenarios.

Assumptions about the Interacting User. About a user u consulting an (interactive) debugging system, we make the following plausible assumptions:

U1 u is not able to explicitly enumerate a set of logical formulas that express the intended domain that should be modeled in a satisfactory way, i.e. without unwanted entailments or non-fulfilled requirements,

U2 u is able to answer concrete queries about the intended domain that should be modeled, i.e. u can classify a given logical formula (or a conjunction of logical formulas) as a wanted or unwanted proposition in the intended domain (i.e. an entailment or non-entailment of the correct domain model).

The first assumption is obviously justified since otherwise u could have never obtained a faulty KB, i.e. a KB that violates at least one requirement or test case, and there would be no need for u to employ a debugging system.

Regarding the second assumption, the first thing to be noted is that any KB (i.e. any model of the intended domain) either does entail a certain logical formula ax or it does not entail ax. Second, if u is assumed to bring along enough expertise in that domain, u should be able to gauge the truth of (at least) some formulas about that domain, especially if these formulas constitute logical entailments of parts of the knowledge specified in the KB so far. We want to emphasize that u is not required to be capable of answering all possible queries (or formulas) about the respective domain since u might always skip a particular query in our system without any noticeable disadvantages. In such a case, the system keeps generating further queries, one at a time (usually the next-best one according to some quality measure for queries), until u is ready to answer one. As the number of possible queries is usually exponential in the number of minimal diagnoses exploited to compute them, there will be plenty of different “surrogate queries” in most scenarios.

A Motivating Example. To get a more concrete idea of these assumptions, the reader is invited to think about whether the following first-order KB K is consistent (a similar example is discussed in [HPS09]):

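∀X (res(X) ↔ ∀Y (writes(X, Y) → paper(Y)))   (1.1)

∀X ((∃Y writes(X, Y)) → res(X))   (1.2)

∀X (secr(X) → gen(X))   (1.3)

∀X (gen(X) → ¬res(X))   (1.4)

secr(pam)   (1.5)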

If we assume that the predicate symbols res, secr and gen stand for ’researcher’, ’secretary’ and ’general employee’, respectively, and the constant pam stands for the person Pam, the KB says the following:

Formula 1.1: “Somebody is a researcher if and only if everything they write is a paper.”

Formula 1.2: “Everybody who writes something is a researcher.”

Formula 1.3: “Each secretary is a general employee.”

Formula 1.4: “No general employee is a researcher.”

Formula 1.5: “Pam is a secretary.”

This KB is indeed inconsistent. The reader might agree that it is not very easy to understand why this is the case. The observations made in [HPS09] concerning a slight modification K′ of the KB K extracted from a real-world KB confirm this assumption. Compared to K, the KB K′ included only formulas 1.1-1.3 of K, was formulated in DL (cf. Section 2.2), and used the terms A, C, ... instead of res, paper, ... Amongst others, this KB K′ was used as a sample KB in a study where participants had to find out whether a concrete given formula is or is not entailed by a concrete given KB. In the case of the KB K′, the assignment (translated to the terminology in our KB K) was to find out whether ∀X(secr(X) → res(X)) is an entailment of formulas 1.1-1.3. Although K′ contains only three formulas, the result was that even participants with many years of experience in DL, among them also DL reasoner developers, did not realize that this is in fact the case (the reason for this entailment to hold is that formulas 1.1-1.3 imply that ∀X res(X) holds).

Since ∀X res(X) is also necessary for the inconsistency of K, this suggests that people might also have severe difficulties in comprehending why K is inconsistent. Once the validity of this entailment is clear, it is relatively straightforward to see that K cannot have any models: anybody who writes nothing is a researcher by formula 1.1 (the right-hand side holds vacuously), and anybody who writes something is a researcher by formula 1.2. Hence, both res(pam) (due to ∀X res(X)) and ¬res(pam) (due to formulas 1.3-1.5) are implications of K.

Consequently, we might also assume that even experienced knowledge engineers (not to mention pure domain experts) could end up with a contradictory KB like K, which substantiates our first assumption (U1) about u. Probably, the intention of those people who specified formulas 1.1-1.3 was not that ∀X res(X) should be entailed. That is, it might already be too complex a task for many people to (mentally) reason even with a KB as small as this one and manually derive implicit knowledge from it.

On the other hand, we might well assume u to be able to answer a concrete query about the intended domain they tried to model by K. For instance, one such query could be whether Q1 := {∀X res(X)} is a desired entailment of their model (i.e. “should everybody be a researcher in your intended model of the domain?”). If we assume the (seemingly obvious) case that u negates this query, i.e. asserts that this is an unwanted entailment, then an interactive debugging system (employing a logical reasoner) can derive that at least one of the formulas 1.1 and 1.2 must be faulty. This holds because the only set-minimal explanation in terms of formulas in K for the entailment ∀X res(X) is given by these two formulas. In other words, the set of formulas {1.1, 1.2} is the only minimal conflict set in K given that Q1 is a negative test case. Hence, the deletion (or suitable modification) of any of these formulas will break this unwanted entailment.

Before it is known that Q1 must not be entailed by the correct KB, and given that consistency is the only requirement on the KB postulated by u, the complete KB K is a minimal conflict set. That is, the assignment of a single (strategically well-chosen) query to the set of positive or, in this case, negative test cases can already shift the focus of potential modifications or deletions to a subset of only two candidate formulas. We call these two formulas the remaining minimal diagnoses after an answer to the query Q1 has been submitted.

Initially, there are five minimal diagnoses: each formula in K is one. The meaning of a diagnosis is that its deletion from K leads to the fulfillment of all requirements and (so-far-)specified positive and negative test cases. As the reader should easily be able to see, the deletion of any formula from K yields a consistent KB; e.g. removing formula 1.5 prohibits the entailment ¬res(pam) whereas discarding formula 1.2 prohibits the entailment res(pam). The reader should notice that, as soon as the negative test case Q1 is known, removing (only) formula 1.5 does not yield a correct KB since {1.1, 1.2, 1.3, 1.4} still entails Q1, which must not be entailed.

A second query to u could be, for example, Q2 := {∃X((∃Y writes(X, Y)) ∧ ¬res(X))} (i.e. “is there somebody who writes something, but is no researcher?”). Again, it is reasonable to suppose that u might know whether or not this should hold in their intended domain model. The (seemingly obvious) answer in this case would be positive, e.g. because u intends to model students who write homework, exams, etc., but are no researchers. This positive answer leads to the new positive test case Q2. Adding this positive test case, like a set of new formulas, to the KB K would result in Knew := K ∪ Q2. The debugging system would then figure out that formula 1.2 alone constitutes the only minimal conflict set in the KB Knew. The reason for this is that the elimination of formula 1.2 breaks the entailment Q1 (negative test case) and enables the addition of the new desired entailment Q2 (positive test case) without violating any requirements (consistency). Therefore, formula 1.2 is the only minimal diagnosis that is still compliant with the newly obtained knowledge Q1 = false and Q2 = true.

It is important to notice that the solution KB Knew that is returned to the user as a result of the interactive debugging session includes a new logical formula Q2 that can be seen as a repair of the deleted formula 1.2. Since the knowledge after the debugging session is that ¬1.2 ≡ Q2 must be true, this new knowledge is incorporated into the KB Knew. This indicates that the fault in the KB was simply that the ¬ in front of formula 1.2 had been forgotten.

Notice, however, that the positive test case Q2 is not added to K as a usual KB formula, but rather as an extension of K that has already been approved by the user. Should the user at some later point in time commit the same fault again (and explicitly specify some formula x equivalent to formula 1.2), then the interactive debugging system, owing to the positive test case Q2, would immediately detect a singleton conflict comprising only formula x. As a consequence, each diagnosis considered during this later debugging session would suggest deleting or modifying (at least) x.

This scenario should illustrate that, in spite of not being able to specify their domain knowledge in a logically consistent way, the user u might still be able to answer questions about the intended domain, which supports our second assumption made about the user u (the reader might agree that answering Q1 and Q2 is much easier than recognizing the entailment ∀X res(X) of the KB). In other words, the availability of an (efficient) debugging system could help u debug their KB, without needing to analyze which entailments hold or do not hold, why certain entailments hold or do not hold or why exactly the KB does not meet certain imposed requirements or test cases, by simply answering queries whether a certain entailment should or should not hold. These queries are automatically generated by the system in a way that they focus on the problematic parts of the KB, i.e. the minimal conflict sets, and discriminate between the possible solution candidates, i.e. the minimal diagnoses.
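The three stages of this example can be replayed with a small Python sketch. It is a sanity check under explicit assumptions rather than a proof: formulas 1.1-1.5 are encoded as read off from their verbal paraphrases above, and consistency and entailment are decided by enumerating all interpretations over a two-element universe, which stands in for a real first-order reasoner. All names (`Interp`, `valid`, `minimal_diagnoses`, ...) are hypothetical.

```python
from collections import namedtuple
from itertools import combinations, product

DOMAIN = range(2)   # assumption: a two-element universe is enough for this illustration
PAM = 0             # the constant pam denotes element 0
Interp = namedtuple("Interp", "res secr gen paper writes")

def interpretations():
    """Enumerate all interpretations of the unary predicates and of writes."""
    n = len(DOMAIN)
    for bits in product([False, True], repeat=4 * n + n * n):
        unary = [bits[i * n:(i + 1) * n] for i in range(4)]
        writes = [bits[4 * n + x * n:4 * n + (x + 1) * n] for x in DOMAIN]
        yield Interp(*unary, writes)

ALL = list(interpretations())

K = {  # encoding of formulas 1.1-1.5 (assumed from the verbal descriptions)
    "1.1": lambda I: all(I.res[x] == all(I.paper[y] for y in DOMAIN if I.writes[x][y])
                         for x in DOMAIN),
    "1.2": lambda I: all(I.res[x] for x in DOMAIN if any(I.writes[x])),
    "1.3": lambda I: all(I.gen[x] for x in DOMAIN if I.secr[x]),
    "1.4": lambda I: all(not I.res[x] for x in DOMAIN if I.gen[x]),
    "1.5": lambda I: I.secr[PAM],
}
q1 = lambda I: all(I.res)                                              # everybody is a researcher
q2 = lambda I: any(any(I.writes[x]) and not I.res[x] for x in DOMAIN)  # some writer is no researcher

def valid(kept, positives, negatives):
    """Kept formulas plus positive test cases have a model and entail no negative test case."""
    formulas = [K[name] for name in kept] + list(positives)
    ms = [I for I in ALL if all(f(I) for f in formulas)]
    return bool(ms) and all(any(not n(I) for I in ms) for n in negatives)

def minimal_diagnoses(positives, negatives):
    """Brute force: subsets of K in order of size whose deletion yields a valid KB."""
    names, found = sorted(K), []
    for size in range(len(names) + 1):
        for D in combinations(names, size):
            if any(set(d) <= set(D) for d in found):
                continue  # a subset is already a diagnosis, so D is not minimal
            if valid([n for n in names if n not in D], positives, negatives):
                found.append(D)
    return found

print(minimal_diagnoses([], []))        # before any query is answered
print(minimal_diagnoses([], [q1]))      # after Q1 is classified as a negative test case
print(minimal_diagnoses([q2], [q1]))    # after Q2 is additionally classified as positive
```

Under these assumptions the sketch reports five singleton diagnoses at the start, the two diagnoses {1.1} and {1.2} after the answer to Q1, and only {1.2} after the answer to Q2, in line with the discussion above.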

Benefits of the Usage of Conflict Sets. We want to remark that the usage of minimal conflict sets “naturally” forces the system to take into consideration only the smallest relevant (faulty) parts of the problematic KB. This is owed to the property of minimal conflict sets to abstract from what all the reasons for a certain entailment or requirements violation are. Instead, only the “root” (subset-minimal) causes for such violations are examined and no computation time is wasted to extract “purely derived” causes (those which are resolved as a byproduct of fixing all root causes from which they are derived, cf. [Hor11, Kal06]). For example, assume a debugging scenario involving our example KB consisting only of formulas 1.1-1.4, which is incoherent, and a requirements set including coherency. Then, there are two entailments reflecting the incoherency of this KB, first ∀X ¬secr(X) and second ∀X ¬gen(X) (these entailments hold due to ∀X res(X) which follows from formulas 1.1 and 1.2). Of these two, only the second one is a “root” problem; the first one is a “purely derived” problem. That means, the entailment ∀X ¬secr(X) only holds due to the presence of the entailment ∀X ¬gen(X). So, the cause for ∀X ¬gen(X) is given by the set of formulas {1.1, 1.2, 1.4} whereas the proper superset {1.1, 1.2, 1.3, 1.4} of this set accounts for the entailment ∀X ¬secr(X). The exploitation of minimal conflict sets (the only minimal conflict set for this KB is {1.1, 1.2, 1.4}) ensures that such “purely derived” causes of requirements or test case violations will not be considered at all.

Figure 1.2: The principle of interactive KB debugging.

The Ability to Incorporate Background Knowledge. Another feature of the approaches described in this work is their ability to incorporate relevant additional information in terms of a background knowledge KB B (which is regarded to be correct). B is a (consistent) KB which is usually semantically related with the faulty KB, e.g. B represents knowledge about the domain modeled by K that has already been sufficiently endorsed by domain experts. For instance, a doctor who wants to express their knowledge of dermatology in terms of a KB might resort to an approved background KB that specifies the human anatomy. Taking this background information into account puts the problematic KB into some context with existing knowledge and can thereby help a great deal to restrict the search space for solutions of the (interactive) KB debugging problem. This has also been found in [Stu08]. This useful strategy of prior search space restriction is also exploited in the field of ontology matching where automatic systems are employed to generate an alignment, i.e. a set of correspondences between semantically related entities of two different ontologies (KBs). Here, both ontologies are considered correct and diagnoses are only allowed to include elements of the alignment [MST07].

Applying a strategy like that to our example KB given above, supposing that we know that Pam is not a researcher in the world the KB should model, we might specify the background KB B := {¬res(pam)} prior to starting the interactive debugging session. This would immediately reduce the initial set of possible minimal diagnoses from five (i.e. the entire KB) to two (i.e. the first two formulas 1.1 and 1.2). The reason for this is that the entailment ∀X res(X) of formulas 1.1 and 1.2 already conflicts with the background knowledge ¬res(pam).

Outline of an Interactive KB Debugging System. The schema of an interactive debugging system is depicted in Figure 1.2. As in the case of a non-interactive debugging system (see above), the system receives as input a diagnosis problem instance (DPI). Further on, a range of additional parameters might be provided to the system. These serve as a means to fine-tune the system’s behavior in various aspects. Hence, we call these inputs tuning parameters. These are (roughly) explained next.

First, some parameters might be specified that influence the number of leading diagnoses used for query generation and the computation time invested in computing the leading diagnoses. Moreover, a parameter can be chosen that trades off the quantity of (pre-)generated queries (of which one is selected to be asked to the user) against the reaction time of the system (the time it takes the system to compute the next query after the current one has been answered). A further input argument is a query selection measure constituting a notion of query “goodness” that is employed to filter out the “best” query among the set of generated queries. To give the system a criterion specifying when a solution of the interactive KB debugging problem is “good enough”, the user is allowed to define a fault tolerance parameter σ. The lower this parameter is chosen, the better the (possibly “approximate”) solution that is guaranteed to be found. If this parameter is set to zero, the system will (if feasible) return the “exact” solution of the interactive KB debugging problem. Roughly, the exact solution is given in terms of a solution KB obtained by means of a single solution candidate (minimal diagnosis) that is left after a sufficient number of queries have been answered (and added to the test cases). On the contrary, an approximate solution is represented by a solution KB obtained by means of a solution candidate with sufficiently high probability (where “sufficiently high” is determined by σ) at some point where there are still multiple solution candidates available.

Finally, the user may choose between two different modes (static or dynamic) of determining the leading diagnoses. The static diagnosis computation strategy guarantees a constant “convergence” towards the exact solution by “freezing” the set of solution candidates at the very beginning and exploiting answered queries only for the deletion of minimal diagnoses. A possible disadvantage of this approach is the lack of efficient pruning of the used search tree. On the other hand, the dynamic method of calculating leading diagnoses has a primary focus on the preservation of a search tree of small size, thereby aiming at being able to solve diagnosis problem instances which are not solvable by the static approach due to high time and (more critically) space complexity. To this end, more powerful pruning rules are applied in this case which do not permit the algorithm to consider only a fixed set of solution candidates. Rather, the set of minimal diagnoses and minimal conflict sets are generally variable in this case which means that they are subject to change after assignment of an answered query to the test cases.

Like in the case of a non-interactive debugger, an interactive debugging system requires a sound and complete logical reasoner for deciding consistency (coherency) and calculating logical entailments of a KB formulated over the language L.

The workflow in interactive KB debugging illustrated by Figure 1.2 is the following:

1. The diagnosis engine computes a set of leading diagnoses (by means of the fault information, if available) using the logical reasoner and passes it to the query generation module.

2. The query generation module computes a pool of queries exploiting the set of leading diagnoses and delivers it to the query selection module.

3. The query selection module filters out the “best query” (often by means of the fault information, if available) and shows it to the interacting user.

4. The user submits an answer to the query.

5. The query along with the given answer is used to formulate a new test case.

6. This new test case is transferred back to the diagnosis engine and taken into account in prospective iterations. If the stop criterion (as per σ, see above) is not met, another iteration starts at step 1. Otherwise, the solution KB K∗ constructed from the currently most probable minimal diagnosis is output.
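The six steps above can be mimicked by a deliberately simplified Python simulation. In this sketch diagnoses are opaque labels with assumed probabilities, a query is represented only by the partition (D+, D−) it induces on the leading diagnoses, every such partition is assumed to be available as a query, and the user is simulated by an oracle that knows the target diagnosis. The stop criterion "probability share of the best candidate is at least 1 − σ" is likewise an assumption made for illustration; all function names are hypothetical.

```python
from itertools import combinations

def query_pool(leading):
    """Toy pool: every split of the leading diagnoses into (D+, D-)."""
    pool = []
    for size in range(1, len(leading)):
        for d_plus in combinations(leading, size):
            d_minus = tuple(d for d in leading if d not in d_plus)
            pool.append((d_plus, d_minus))
    return pool

def debug_session(diagnoses, probs, target, sigma=0.0):
    """Run the query loop until the best candidate is probable enough."""
    remaining, asked = list(diagnoses), 0
    while True:
        # Step 1: leading diagnoses (here simply all remaining candidates).
        leading = remaining
        best = max(leading, key=lambda d: probs[d])
        share = probs[best] / sum(probs[d] for d in leading)
        if share >= 1 - sigma:             # assumed stop criterion based on sigma
            return best, asked
        # Step 2: generate a pool of queries from the leading diagnoses.
        pool = query_pool(leading)
        # Step 3: select the "best" query, here by a split-in-half measure.
        d_plus, d_minus = min(pool, key=lambda q: abs(len(q[0]) - len(q[1])))
        # Step 4: the (simulated) user answers the query.
        answer = target in d_plus
        asked += 1
        # Steps 5 and 6: the new test case invalidates D- (positive answer) or D+ (negative answer).
        remaining = list(d_plus if answer else d_minus)

probs = {"D1": 0.4, "D2": 0.3, "D3": 0.2, "D4": 0.1}
print(debug_session(list(probs), probs, target="D3"))             # exact solution, sigma = 0
print(debug_session(list(probs), probs, target="D3", sigma=0.4))  # approximate solution
```

With σ = 0 the loop only stops once a single candidate is left. A larger σ lets the session end earlier, at the price of returning a candidate that is merely probable.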

Contributions of this Work. The contributions of this work are the following:

This work provides a thorough account of the subject and develops the theory of interactive KB debugging (for monotonic KBs), presupposing only some basic knowledge of logic on the part of the reader. Hence, this work addresses newcomers as well as people already familiar with related topics. Whereas the comprehensive theoretical considerations might appeal to the more theoretically oriented readers such as researchers, the precise and exhaustive description of all discussed algorithms might be interesting from the implementation point of view and might serve more practically oriented people such as programmers or engineers as an algorithmic cookbook. Further on, the extensive illustration of the way algorithms work by examples might also help a merely superficially interested reader to get a rough impression of how KBs might be interactively debugged.

Except for basics in FOL and PL, this work is self-contained and provides all necessary definitions and proofs to make the topic of interactive KB debugging accessible to the reader.

To the best of our knowledge, this work provides the most comprehensive and detailed introduction to the field of interactive debugging of (monotonic) KBs. Our previous works on the topic [SFFR12, SF10, RSFF13, FS05, SFRF14c] are more application-oriented and thus abstract from some details and omit some of the proofs in favor of comprehensive evaluations of the presented strategies.

This is the first work that gives formal and precise definitions of problems dealt with in interactive KB debugging and introduces methods that provably solve these problems. We believe that precise problem statements are the very basis for all further scientific investigations in a field. Hence, we hope that this work can “open” the important subject of interactive KB debugging to a broader audience of interested researchers. This can lead to further progress and improvements in debugging techniques which we deem essential in the light of the growing number of intelligent applications incorporating KBs of growing size and complexity (keyword: The Semantic Web [BLHL+01]).

An in-depth discussion of query computation including computational complexity considerations together with an accentuation of potential ways of improving these methods is given. The investigated methods for query computation have been used also in [SFFR12, RSFF13, SF10, SFRF14c], but have not been addressed in depth in these works.

We are concerned with the discussion of different ways of exploiting diverse sources of meta information in the KB debugging process from which diagnosis probabilities can be extracted. Our previous works on this topic [SFFR12, RSFF13, SF10, SFRF14c] do not address this matter in a comparable depth.

We give a formal proof of the soundness of an algorithm QX (based on [Jun04]) for the detection of a minimal conflict set in a KB and we show the correctness (completeness, soundness, optimality) of a hitting set tree algorithm HS (based on [Rei87]) for finding minimal diagnoses in a KB in best-first order (i.e. most probable diagnoses first) which uses QX for conflict set computation only on-demand. We are not aware of any other work that comprises such proofs.

We establish the theoretical relationship between the widely-used notions of a conflict set and a justification. The former is used, among others, in [dKW87, Rei87, SFFR12, RSFF13] and the latter, among others, in [HPS08, HPS09, HPS10, Hor11, HBP11, HPS12b, SQJH08, Kal06, MS09, SSZ09, NRG12]. As a consequence, empirical results concerning the one might be translated to the other. For instance, since each minimal conflict set is a subset of a justification and there is an efficient (polynomial) method for computing a minimal conflict set given a superset of a minimal conflict set, a result demonstrating the efficiency of justification computation for a set of KBs (e.g. [HPS12a]) implies the efficiency of conflict set computation for the same set of KBs. Moreover, we argue that minimal conflict sets are the better choice for our system since these put the focus of the debugger only on the smallest faulty subsets of the KB whereas justifications are better suited in scenarios where exact explanations for the presence of certain entailments are sought.
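One simple way to obtain a minimal conflict set from a superset with a polynomial number of reasoner calls is deletion-based shrinking, sketched below in Python. This is a generic technique chosen for illustration and not necessarily the method used in this work. The function name `shrink_to_minimal_conflict` and the oracle `is_conflict` are hypothetical, and the oracle is assumed to be monotonic (every superset of a conflict is a conflict).

```python
def shrink_to_minimal_conflict(candidate, is_conflict):
    """Drop formulas one by one while the remainder is still a conflict.

    Needs len(candidate) + 1 oracle calls and returns a minimal conflict set,
    provided `is_conflict` is monotonic.
    """
    if not is_conflict(candidate):
        raise ValueError("candidate is not a conflict set")
    core = list(candidate)
    for ax in list(candidate):
        trial = [a for a in core if a != ax]
        if is_conflict(trial):   # ax is not needed for the violation
            core = trial
    return core

# Toy oracle mirroring the incoherent KB {1.1, ..., 1.4} discussed above:
# a set of formulas is a conflict iff it contains 1.1, 1.2 and 1.4.
def is_conflict(formulas):
    return {"1.1", "1.2", "1.4"} <= set(formulas)

justification = ["1.1", "1.2", "1.3", "1.4"]   # accounts for the derived entailment
print(shrink_to_minimal_conflict(justification, is_conflict))
```

In a real system each call to `is_conflict` would be a validity check carried out by the logical reasoner, so the number of calls dominates the running time.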

Two new algorithms for iterative (leading) diagnosis computation in interactive KB debugging are proposed. One is guaranteed to reduce the number of remaining solutions after a query is answered, and the other features more powerful pruning techniques than our previously published algorithms [SFFR12, RSFF13] (an evaluation that compares the overall efficiency of our previous algorithms with the ones proposed in this work must still be conducted and is part of our future research).

We suggest and extensively analyze different methods for the selection of an “optimal” query to ask the user out of a pool of possible queries. We compare a greedy “split-in-half” strategy that proposes queries which eliminate half of the leading diagnoses with a strategy relying on information entropy [Sha48] that chooses the query with highest information gain based on some statistic or (a user’s) beliefs about faults in the KB. Comprehensive experiments manifest that only an average guess of the fault information suffices to reduce the query answering effort for the interacting user, often to a significant extent, by means of the latter strategy compared to the former. Moreover, we demonstrate that both methods clearly outperform a random query selection strategy. The latter result witnesses that incorporation of meta (fault) information into the debugging process is in fact reasonable and might relieve the interacting user of a significant proportion of the effort required without taking into account any meta information.
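The difference between the two selection strategies can be illustrated with a short Python sketch. It makes simplifying assumptions: each query is represented only by the partition (D+, D−) it induces, every leading diagnosis predicts one of the two answers, and the information gain of a query is approximated by the entropy of its answer distribution. The function names and the example probabilities are hypothetical and do not reproduce the exact scoring functions analyzed in this work.

```python
from math import log2

def split_in_half(pool):
    """Pick the query whose two diagnosis sets are closest in size."""
    return min(pool, key=lambda q: abs(len(q[0]) - len(q[1])))

def answer_entropy(query, probs):
    """Entropy of the yes/no answer, given (assumed) diagnosis probabilities."""
    d_plus, d_minus = query
    p_yes = sum(probs[d] for d in d_plus)
    p_no = sum(probs[d] for d in d_minus)
    total = p_yes + p_no
    return -sum((p / total) * log2(p / total) for p in (p_yes, p_no) if p > 0)

def max_entropy(pool, probs):
    """Pick the query whose answer is least predictable, i.e. most informative."""
    return max(pool, key=lambda q: answer_entropy(q, probs))

probs = {"D1": 0.7, "D2": 0.1, "D3": 0.1, "D4": 0.1}
pool = [
    (("D1", "D2"), ("D3", "D4")),    # even split by count, lopsided by probability
    (("D1",), ("D2", "D3", "D4")),   # uneven split by count, more balanced by probability
]
print(split_in_half(pool))
print(max_entropy(pool, probs))
```

In this example the two strategies disagree: split-in-half prefers the first query, whereas the entropy-based measure prefers the second one because its answer is closer to a fair coin flip under the assumed probabilities.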

Addressing the issue of choosing the suitable query selection method for some given fault information, we present a reinforcement learning query selection strategy. For, reliance upon a strategy (e.g. information entropy) that fully exploits and gains from the given fault information can speed up the debugging procedure in the normal case, but can also have a negative impact on the performance in the bad case where the actual solution diagnosis is rated as highly improbable. As an alternative, one might prefer to rely on a tool (e.g. “split-in-half”) which does not consider any fault information at all. In this case, however, possibly well-chosen information cannot be exploited, resulting again in inefficient debugging actions.

Minimal effort for the interacting user can be achieved if both the query selection method is chosen carefully and the provided fault information satisfies some minimum quality requirements. In particular, for deficient fault information and an unfavorable strategy for query selection, we observe cases where the overhead in terms of user effort exceeds 2000% (!) in comparison to employing a more favorable query selection strategy. Since, unfortunately, assessment of the fault information is only possible a posteriori (after the debugging session is finished and the correct solution is known), we devise a learning strategy (RIO) that continuously adapts its behavior depending on the performance achieved and in this vein minimizes the risk of using low-quality fault information.

This approach makes interactive debugging practical even in scenarios where reliable fault estimates are difficult to obtain. Evaluations provide evidence that for 100% of the cases in the hardest (from the debugging point of view) class of faulty test KBs, RIO performed at least as well as the best other strategy and in more than 70% of these cases it even showed superior behavior to the best other strategy. Choosing RIO over other approaches can involve an improvement by a factor of up to 23, meaning that more than 95% of user time and effort might be saved per debugging session.

We come up with mechanisms for efficiently dealing with KB debugging problems involving high cardinality (minimal) diagnoses. In the standard interactive debugging approach described in the first parts of this work, the computation of queries is based on the generation of the set of most probable (or minimum cardinality) leading diagnoses. By this postulation, certain quality guarantees about the output solution can be given. However, we learn that dropping this requirement can bring about substantial savings in terms of time and especially space complexity of interactive debugging, in particular in debugging scenarios where faulty KBs are (partly) generated as a result of the application of automatic systems, e.g. KB (ontology) learning or matching systems [HSNM11, NB12, JMSK09, RP10, JRGZH12, Mei11].

Figure 1.3: Precedence constraints among the parts of this work.

To cope with such situations, we propose to base query computation on any set of leading diagnoses using a “direct” method for diagnosis generation. Contrary to the standard method that exploits minimal conflict sets, this approach takes advantage of the duality between minimal diagnoses and minimal conflict sets and employs “inverse” algorithms to those used in the standard approach in order to determine minimal diagnoses directly from the DPI without the indirection via conflict sets.
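The duality mentioned here can be illustrated by a brute-force computation of minimal hitting sets in Python: the minimal diagnoses are the minimal hitting sets of the minimal conflict sets, and vice versa. The sketch below only demonstrates this relationship on a toy example with abstract formula identifiers. It is not the "direct" method itself, and the function name `minimal_hitting_sets` is hypothetical.

```python
from itertools import combinations

def minimal_hitting_sets(sets):
    """Return all subset-minimal sets that intersect every set in `sets`."""
    universe = sorted(set().union(*sets))
    found = []
    for size in range(1, len(universe) + 1):       # smallest candidates first
        for cand in combinations(universe, size):
            c = set(cand)
            hits_all = all(c & s for s in sets)
            if hits_all and not any(h <= c for h in found):
                found.append(c)
    return found

conflicts = [{1, 2}, {2, 3}]                # assumed minimal conflict sets
diagnoses = minimal_hitting_sets(conflicts)
print(diagnoses)                            # minimal diagnoses
print(minimal_hitting_sets(diagnoses))      # hitting sets of the diagnoses: the conflicts again
```

Since the enumeration is exponential in the number of formulas, it is meant purely as an executable statement of the duality, not as a practical algorithm.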

We study the application of this direct method to high cardinality faults in KBs and find that the number of required queries per debugging session is hardly affected in cases where the standard approach is also applicable. However, the direct method proves applicable and able to locate the correct solution diagnosis in situations where the standard approach (albeit one that does not yet incorporate the powerful search tree pruning techniques introduced in this work) is not, due to time or memory issues.

Organization of this Work. This work is subdivided into seven parts. Figure 1.3 illustrates the precedence constraints among the parts. We want to point out that Parts IV-VI correspond to works that have already been published and are thus self-contained, both from the notation and the content point of view. Parts I-III, on the contrary, are constructive and should thence be read in order.

(Rest of) Part I. In Chapter 2, besides introducing the notation used in this work, we describe the requirements imposed on logical knowledge representation languages L that might be used with our approaches. It should be noted that the postulated properties do not restrict the applications of our approaches very much. For instance, these might be employed to resolve over-constrained constraint satisfaction problems (CSPs) or repair faulty KBs in PL, FOL, DL, Datalog or OWL. Since DL provides the logical underpinning of OWL which has recently received increasing attention due to the extensive research in the field of The Semantic Web [BLHL+01], we will also give a short introduction to DL. For, to underline the flexibility of the presented debugging systems in this work, we will illustrate how they work by means of examples involving PL, FOL as well as DL KBs.

In Chapter 3, we first give a formal definition of the KB debugging problem and define a diagnosis problem instance (DPI), the input of a KB debugger, and a solution KB, the output of a KB debugger. Further on, we formally characterize a diagnosis and give the notion of KB validity and what it means for a KB to be faulty. We discuss and prove relationships between these notions and specify properties a DPI must satisfy in order to be solvable by a KB debugger.

We motivate why it makes sense to focus on set-minimal diagnoses instead of all diagnoses, i.e. to stick to “The Principle of Parsimony” [Rei87, BATJ91]. This results in the definition of the problem of parsimonious KB debugging. Then, we prove that solving this problem is equivalent to the computation of a minimal diagnosis. Finally, we explain the benefits of using some background KB in (parsimonious) KB debugging.

In Chapter 4 we describe methods for diagnosis computation. To this end, we first introduce the notion of a (minimal) conflict set, discuss some properties of conflict sets related to the notion of KB validity and give sufficient and necessary criteria for the existence of non-trivial conflict sets w.r.t. a DPI. Subsequently, we derive the relationship between a conflict set and the notion of a justification (a minimal set of formulas necessary for a particular entailment to hold) which is well-known and frequently used, especially in the fields of DL, OWL and The Semantic Web [HPS08, HPS09, HPS10, Hor11, HBP11, HPS12a]. Concretely, we will demonstrate that a minimal conflict set is a subset of a justification for some negative test case or for some inconsistency (entailment false) or incoherency (entailment ∀X_1, . . . , X_k ¬p(X_1, . . . , X_k) for some predicate symbol p of arity k) of the given KB. Moreover, we will learn that, for the debugging tasks we consider, conflict sets are better suited than justifications.

Having deduced all relevant characteristics of (minimal) conflict sets, we proceed to give a description of a method (QX, Algorithm 1) due to [Jun04] which was originally presented as a method for finding preferred explanations (conflicts) in over-constrained CSPs, but can also be employed for an efficient computation of a minimal conflict set w.r.t. a DPI in KB debugging. We discuss and exemplify this algorithm in detail, prove its correctness as a routine for minimal conflict set computation and give complexity results.

Having at our disposal a proven sound method for generation of a minimal conflict set, we continue with the delineation of a hitting set tree algorithm similar to the one originally presented in [Rei87] which enables the computation of different minimal conflict sets by means of successive calls to QX, each time given an (adequately) modified DPI. In this manner, a hitting set tree can be constructed (breadth-first) which facilitates the computation of minimal diagnoses (minimum cardinality diagnoses first). We prove the correctness (termination, soundness, completeness, minimum-cardinality-first property) of this hitting set tree algorithm coupled with the QX method which serves to solve the problem of parsimonious KB debugging.

In order to be able to incorporate fault information into the diagnoses finding process, we deal with the induction of a probability space over diagnoses in Section 4.6. We discuss several ways of constructing a probability space including different sources of fault information. Hereinafter, we detail how diagnosis probabilities can be determined on the basis of some available fault information and how these can be appropriately updated after new observations (in terms of answered queries) have been made. Furthermore, we outline how fault probabilities can be appropriately incorporated into the hitting set search tree in order to guarantee the discovery of minimal diagnoses in best-first order, i.e. most probable ones first. Then, we prove the correctness (termination, soundness, completeness, best-first property) of this best-first diagnosis finding algorithm for parsimonious KB debugging.

Finally, we describe a non-interactive KB debugging procedure (Algorithm 3) that relies on this best-first diagnosis finding algorithm. Some illustrating examples are provided which at the same time reveal significant shortcomings present in non-interactive KB debugging. This motivates the development of interactive KB debugging algorithms.

Readers not theoretically inclined or not interested in the technical details might well skip Sections 4.2, 4.4.2, 4.5.2 and 4.6 in Part I.

Part II. In Chapter 6, we first discuss how disadvantages of non-interactive KB debugging procedures can be overcome by allowing a user to take part in the debugging process. Then, we define the problem of interactive static KB debugging as well as the problem of interactive dynamic KB debugging which “naturally” arise from the fact that the DPI in interactive KB debugging is always renewed after a new test case has been specified (a new query has been answered). The former problem searches for a solution KB w.r.t. the DPI given as input such that this solution KB satisfies all test cases added during the debugging session and there is no other such solution KB. The latter problem searches for a solution KB w.r.t. the current DPI (i.e. the input DPI including all new test cases added throughout the debugging session so far) such that there is no other solution KB w.r.t. the current DPI.

Next, in Chapter 7, the central term of a query is specified which constitutes the medium for user interaction. Queries are generated from a set of leading diagnoses which is characterized thereafter. The set of leading diagnoses is uniquely partitioned into three subsets by each query. The tuple including these subsets is called q-partition. Subsequently, the reader is given some explanations how the q-partition can be interpreted, and how it relates to a query. In fact, we will prove that the notion of a q-partition can serve as a criterion for checking whether a set of logical formulas is a query or not. After that, we will learn that a query exists for any set of (at least two) leading diagnoses which grants that the presented algorithms will definitely be able to come up with a query without the need to impose any restrictions on which (minimal) diagnoses are computed by the diagnosis engine in each iteration.

Chapter 8 shows a method for the generation of (a pool of) set-minimal queries (Algorithm 4) aiming at stressing the interacting user as sparsely as possible, features in-depth discussions of this method’s properties, proves its correctness, provides complexity results and gives some illustrating examples. Further on, drawbacks of this method are pointed out and possible solutions are discussed.

Subsequently, Chapter 9 deals with the presentation of the central algorithm of this work which implements an interactive KB debugging system (Algorithm 5). First, an overview of the workflow of interactive KB debugging is given, followed by a more comprehensive detailed specification of the algorithm. Some query selection measures are discussed [RSFF13, SFFR12] and optimization versions of the problems of interactive dynamic and static KB debugging are defined where the goal is to obtain the solution to these problems by asking the user a minimal number of queries. Finally, we prove the correctness of the interactive KB debugging algorithm and provide a discussion of its complexity.

Non-theoretically-oriented readers might well skip Sections 8.2, 8.4, 8.5, 8.7 and 9.4 in Part II. Moreover, for the superficially interested reader, it may suffice to concentrate only on Chapter 6 and Sections 7.1, 7.2 and 9.1 in Part II.

Part III. Here, we go into detail w.r.t. the two strategies for iterative diagnoses computation introduced in Part II that might be plugged into Algorithm 5 to solve either the interactive static or dynamic KB debugging problem.

Chapter 11 describes the static method and proves its soundness and completeness w.r.t. the computation of minimal diagnoses w.r.t. the DPI given as an input to the interactive KB debugging algorithm and its optimality w.r.t. the discovery of minimal diagnoses in best-first order (most-probable or minimum cardinality diagnoses first). Incorporation of the static method as a routine for leading diagnosis computation into Algorithm 5 provably solves the problem of interactive static KB debugging.

Chapter 12 details the dynamic method and proves its soundness and completeness w.r.t. the computation of minimal diagnoses w.r.t. the current DPI and its optimality w.r.t. the discovery of minimal diagnoses in best-first order (most-probable or minimum cardinality diagnoses first). Employing the dynamic method as a routine for leading diagnosis computation in Algorithm 5 provably solves the problem of interactive dynamic KB debugging.

The practically oriented reader or the one that is willing to believe that the presented iterative diagnosis computation techniques in fact work as claimed might skip Sections 11.4 as well as 12.4 in Part III.

Part IV. In this part, we suggest and extensively analyze different methods for the selection of an “optimal” query (see above). The material dealt with in Part IV is based on the publications [SFFR12, SF10] where the former was published in the journal Web Semantics: Science, Services and Agents on the World Wide Web and the latter in the Proceedings of the 9th International Semantic Web Conference (ISWC 2010).

Part V. The reinforcement learning query selection strategy (RIO) that makes the presented debugging system robust against the usage of low-quality fault information is presented and thoroughly analyzed in this part which is based on the works [RSFF13, RSFF12, RSFF11, SRF11] published in Web Reasoning and Rule Systems (RR-2013), in the Proceedings of the 7th International Workshop on Ontology Matching (OM-2012), in the Proceedings of the Joint Workshop on Knowledge Evolution and Ontology Dynamics 2011 (EvoDyn2011) and in DX 2011 - 22nd International Workshop on Principles of Diagnosis, respectively.

Part VI. This part covers the topic of efficiently dealing with KB debugging problems involving high cardinality faults (see above) and relies on material presented in [SFRF14c, SFRF14a, SFRF14b] and published in the Proceedings of the 21st European Conference on Artificial Intelligence (ECAI 2014), in DX 2014 - 25th International Workshop on Principles of Diagnosis and in the Proceedings of the Third International Workshop on Debugging Ontologies and Ontology Mappings (WoDOOM14), respectively.8

Part VII. To round this work off, we provide a discussion of related work in Chapter 32,9 summarize the contributions of this work in Chapter 33 and deal with our future work topics in Chapter 34.

2.1 Assumptions

The techniques described in this work are applicable for any logical knowledge representation formalism L for which the entailment relation is

1. monotonic: is given when adding a new logical formula to a KB K_L cannot invalidate any entailments of the KB, i.e. K_L |= α_L implies that K_L ∪ {β_L} |= α_L,

2. idempotent: is given when adding implicit knowledge explicitly to a KB K_L does not yield new entailments of the KB, i.e. K_L |= α_L and K_L ∪ {α_L} |= β_L implies K_L |= β_L and

3. extensive: is given when each logical formula entails itself, i.e. {α_L} |= α_L for all α_L,

and for which

4. reasoning procedures for deciding consistency and calculating logical entailments of a KB are available,

where α_L, β_L are logical formulas and K_L is a set {ax_L^(1), . . . , ax_L^(n)} of logical formulas formulated over the language L. K_L is to be understood as the conjunction ⋀_{i=1}^{n} ax_L^(i). Notice that the elements of a KB are called quite differently in the literature. Possible denotations are logical formula (e.g. [KK06]), well-formed formula (e.g. [CL73]), (logical) sentence or axiom (e.g. [RN10]) and axiom (in most of the description logic literature, e.g. [BCM+07]). We will mainly stick to the term formula (sometimes axiom) to refer to the elements of a KB. As the logic will be clear from the context in the sequel, we will omit the index L when referring to formulas or KBs over L throughout the rest of this work.
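The three properties of the entailment relation can be spot-checked on a toy instance. The following is a minimal sketch, assuming propositional clauses (each a frozenset of literals such as "A" or "-A") and a brute-force truth-table test; the helper names `satisfiable` and `entails` are illustrative assumptions and not part of this work. Passing checks on one instance only illustrate the properties, they do not prove them.

```python
from itertools import product

def satisfiable(clauses):
    """Brute force: is there a truth assignment satisfying every clause?"""
    names = sorted({lit.lstrip("-") for c in clauses for lit in c})
    for values in product([False, True], repeat=len(names)):
        model = dict(zip(names, values))
        # Literal "A" holds if A is true; literal "-A" holds if A is false.
        if all(any(model[l.lstrip("-")] != l.startswith("-") for l in c)
               for c in clauses):
            return True
    return False

def entails(kb, clause):
    """kb |= clause iff kb plus the negation of each literal is unsatisfiable."""
    negated = [frozenset({l[1:] if l.startswith("-") else "-" + l})
               for l in clause]
    return not satisfiable(list(kb) + negated)

kb = {frozenset({"A"}), frozenset({"-A", "B"})}   # A, A -> B
alpha = frozenset({"B"})                          # B
beta = frozenset({"B", "C"})                      # B or C
gamma = frozenset({"-C"})                         # not C, an unrelated formula

# Property 1: adding gamma does not invalidate the entailment of alpha.
monotonic = entails(kb, alpha) and entails(kb | {gamma}, alpha)
# Property 2: K |= alpha and K + {alpha} |= beta implies K |= beta.
idempotent = (not (entails(kb, alpha) and entails(kb | {alpha}, beta))
              or entails(kb, beta))
# Property 3: every formula entails itself.
extensive = entails({alpha}, alpha)
```

Property 4 corresponds to the mere availability of `satisfiable` and `entails`; for propositional logic a truth table suffices, whereas other logics need a dedicated reasoner.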

2.2 Considered Logics

To underline the general character of this work, we will illustrate our approaches using example diagnosis problem instances expressed in different logical languages. In this section we give notational remarks concerning these different logics used, namely propositional logic (PL), first-order logic (FOL) as well as description logic (DL). Whereas we assume the reader to be familiar with FOL and PL (a good introduction to PL and FOL can be found in [CL73]), we will give a short introduction to DL.

Remark 2.1 It is important to notice that the usage of DL as well as FOL examples throughout this work should not suggest that the Properties 1-4 stated above are satisfied for any DL or FOL language L. In fact, it is well-known by the theorems of Church and Turing (cf. [Men09]; the original works are [Chu36, Tur37]) that FOL is not decidable in general, i.e. Property 4 above is not met. Also in the case of DL, which subsumes a range of different logical languages featuring different expressivity and thus different computational complexity of reasoning procedures, there are languages which are undecidable. For instance, a DL language allowing the formalism of equality role-value-maps which facilitates the expression of concepts like “persons whose co-workers coincide with their relatives” can be proven undecidable [BCM+07, SS89].

Property 4 is satisfied, for example, for the DL language SROIQ which is the logical underpinning of OWL 2 [GHM+08]. However, logical reasoning for this language is 2-NEXPTIME-complete [Kaz08] and thus intractable in the worst case, which implies the intractability of our methods in the worst case. Nevertheless, systems similar to those described in this work have shown reasonable performance when applied to other DL languages [SFRF14c, RSFF13, SFFR12]. Also from the theoretical point of view, there are DL languages that allow for efficient reasoning. One example is the OWL 2 EL profile which enables polynomial time reasoning [BBL05]. For this language, the efficient reasoning service ELK has been presented by [KKS14]. For FOL, Datalog is an example of a decidable sublanguage where reasoning is efficient [RN10]. Further, restricted sublanguages of FOL can often be translated to some DL language, so that positive DL results concerning the decidability of reasoning as well as complexity results can be adopted for these restricted FOL languages [BCM+07, chapter 4] [Bor96].

Moreover, we want to point out that the practical efficiency of our systems depends strongly on the practical performance (which might be by far better than suggested by the worst case reasoning complexities) of the reasoning services called by our algorithms since the reasoning services are used as a black-box (as mentioned in Chapter 1). Possible strategies for improving the reasoning efficiency in the black-box setting are briefly discussed in Chapter 34.

Ontologies and The Semantic Web

Ontologies are KBs that formally and explicitly represent common knowledge about a domain in the form of individuals, concepts (set of individuals) and roles (binary relationships between individuals). As, in the last decade, extensive research has been done in the area of The Semantic Web [BLHL+01] making (automatic) ontology development tools and reasoning services more efficient, ontology engineering for the Semantic Web is on the upswing. The Semantic Web aims at the enrichment of unstructured information on the web by semantic meta data which should facilitate the usage of the web as structured database of knowledge of all kinds where computers are able to “understand” this structured data, establish relationships between different data sources, combine information from different data sources and (most essentially) derive new (implicit) knowledge from the structured data. At this, ontologies are the key to a common vocabulary used for the semantic meta data. Ontologies are employed to precisely define the meaning of different terms, state relationships between different terms and to introduce new terms by means of already specified ones.

The constantly increasing number of people creating ontologies of increasing size (examples were given in Chapter 1) results in more and more (faulty) ontologies which constitute useful application scenarios and test cases for our approaches. For that reason, we also want to use ontology engineering for The Semantic Web as a concrete use case for the presented work. The standard knowledge representation formalism for ontologies is OWL 2 [MPSP09, GHM+08] which relies on DL. A short introduction to DL is given next.

Description Logic

Description Logic (DL) [BCM+07] is a family of knowledge representation languages with a formal logic-based semantics that are designed to represent knowledge about a domain in the form of concept descriptions. The syntax of a description language L is defined by its signature and a set of constructors. The signature of L corresponds to the union of possibly disjoint sets N_C, N_R and N_I, where N_C contains all concept names (unary predicates), N_R comprises all role names (binary predicates) and N_I is the set of all individuals (constants) in L. Each concept and role description can be either atomic or complex. The latter ones are composed using constructors defined in the particular language L. A typical set of DL constructors for complex concepts includes conjunction A ⊓ B, disjunction A ⊔ B, negation ¬A, existential ∃r.A and value ∀r.A restrictions, where A, B are concept descriptions and r ∈ N_R.

Axioms are statements of knowledge that must be true in a domain. An ontology K is defined as a tuple (T , A), where T (TBox) is a set of terminological axioms and A (ABox) a set of assertional axioms. Each TBox axiom is expressed by a general concept inclusion A ⊑ B, a form of logical implication, or by a definition A ≡ B, a kind of logical equivalence, where A and B are concept descriptions or role descriptions. ABox axioms are used to assert properties of individuals in terms of the vocabulary defined in the TBox, e.g. concept A(x) or role r(x, y) assertions, where A is a concept description, r a role description, and x, y ∈ N_I.

The semantics of a description language is given in terms of interpretations I = (Δ^I, ·^I) consisting of a non-empty domain Δ^I and a function ·^I that assigns to every atomic concept A ∈ N_C a set A^I ⊆ Δ^I, to every atomic role r ∈ N_R a set r^I ⊆ Δ^I × Δ^I and to every individual x ∈ N_I some value x^I ∈ Δ^I. The interpretation function is extended to complex concept descriptions by the following inductive definitions:

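⊤^I = Δ^I
⊥^I = ∅
(¬A)^I = Δ^I \ A^I
(A ⊓ B)^I = A^I ∩ B^I
(A ⊔ B)^I = A^I ∪ B^I
(∃r.A)^I = {x ∈ Δ^I | there is some y with (x, y) ∈ r^I and y ∈ A^I}
(∀r.A)^I = {x ∈ Δ^I | for all y, (x, y) ∈ r^I implies y ∈ A^I}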

where  ⊤and  ⊥are predefined concepts; the former is the universal concept and the latter the bottom concept.

The semantics of axioms is defined as follows for (1) TBox and (2) ABox axioms: (1) Interpretation I satisfies A ⊑ B iff A^I ⊆ B^I and it satisfies A ≡ B iff A^I = B^I. (2) A(x) is satisfied by I iff x^I ∈ A^I and r(x, y) is satisfied iff (x^I, y^I) ∈ r^I. An interpretation I is a model of K = (T , A) iff it satisfies all TBox axioms in T and all ABox axioms in A. An ontology K is consistent iff it has a model. A concept A (role r) is satisfiable w.r.t. K iff there is a model I of K with A^I ≠ ∅ (r^I ≠ ∅). An ontology K is coherent iff all concepts and roles occurring in K are satisfiable. An axiom α is entailed by K iff α is true in all models I of K. For a set of axioms X we write K |= X as a shorthand for K |= α for all α ∈ X.

Usually description logic systems provide sound and complete reasoning services to their users. Besides verification of coherency and consistency of K and satisfiability checking of concepts, reasoner tasks include classification and realization. Classification determines, for each concept name A occurring in K, most specific (general) concepts that subsume (are subsumed by) A. A concept A subsumes (is subsumed by) a concept B iff K |= B ⊑ A (K |= A ⊑ B). Classification is employed to build a taxonomy of concepts in K. Realization, given an individual name x occurring in K and a given set of concepts in K (usually all concepts in K), computes the most specific concepts A_1, . . . , A_n from the set such that K |= A_i(x) for all i = 1, . . . , n. The most specific concepts are those that are minimal w.r.t. the subsumption ordering ⊑.

Example 2.1 The example KB given in the Introduction (Chapter 1) can be equivalently represented in DL (cf. Remark 2.1) as follows:

image

where Res is the concept symbol with equivalent meaning as the predicate symbol res, the role symbol writes corresponds to the equally named binary predicate, Paper to paper, and so on. Notice that axiom 2.2 states that the domain of writes is Res.

2.3 Notational Remarks10

General Notational Conventions. Throughout this work, the nomenclature given by Table 2.1 is used (many of the designators in the table will be explained later in this work). We will mainly refer to an ontology by the term KB.

In order to make a clear distinction between scalars and functions, we denote all scalars g by g and all functions g by g(). If an ordered list occurs in a set operation, then this list is interpreted as a (non-ordered) set. For example, let L := [1, 3, 4, 2] be an ordered list; then L ∩ {1, 2, 3} yields the set {1, 2, 3}.
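The list-as-set convention can be mirrored directly in code. A minimal sketch in Python, where the explicit `set(...)` conversion is an assumption standing in for the implicit interpretation used in this work:

```python
L = [1, 3, 4, 2]               # an ordered list
result = set(L) & {1, 2, 3}    # the list is read as an unordered set
```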

Notational Convention for PL (cf. [RN10]). We use uppercase letters A, B, . . . to denote atoms and the standard logical connectives to build PL formulas from atoms. The operator precedence we use is ¬, ∧, ∨, →, ↔, from highest to lowest. Given a PL KB K and a PL formula ax, we call $\widetilde{K}$ and $\widetilde{ax}$ the signature of K and the signature of ax, respectively. The former comprises all atoms occurring in K and the latter all atoms occurring in ax.

Notational Convention for FOL (cf. [CGT89]). Variables are denoted by uppercase letters; constants and predicate symbols are denoted by strings beginning with a lowercase letter11. Recalling the example KB given in Chapter 1, X, Y are variables, pam is a constant and res, writes, paper, secr and gen are predicate symbols. FOL formulas are built from the standard logical connectives described for PL above. The operator precedence we use for FOL formulas is the same as stated above12. The precedence of quantifiers ∀, ∃ is such that a quantifier outside of any parenthesized expression holds over everything to the right of it; if occurring in a parenthesized expression, a quantifier holds over everything to the right of it within this expression. For example, ∀X prof(X) → ∃Y secr(Y) is equivalent to (∀X (prof(X) → (∃Y (secr(Y))))) (i.e. “for each professor there is at least one secretary”) and not to (∀X prof(X)) → ∃Y secr(Y) (i.e. “if everybody is a professor, then there is at least one secretary”).
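That the two readings of the quantifier scope really differ can be seen by evaluating both over a small finite domain. The following sketch assumes a hypothetical two-element domain in which only one individual is a professor and nobody is a secretary; the names `domain`, `prof` and `secr` are chosen for illustration only.

```python
domain = {"a", "b"}
prof = {"a"}     # only a is a professor
secr = set()     # nobody is a secretary

some_secretary = any(y in secr for y in domain)

# Reading adopted in this work: forall X (prof(X) -> exists Y secr(Y))
wide_scope = all((x not in prof) or some_secretary for x in domain)

# Rejected reading: (forall X prof(X)) -> exists Y secr(Y)
narrow_scope = (not all(x in prof for x in domain)) or some_secretary
```

In this interpretation the adopted reading is false (a is a professor, yet no secretary exists), whereas the rejected reading is vacuously true because not everybody is a professor.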

Given a FOL KB K and a FOL formula ax, we call $\widetilde{K}$ and $\widetilde{ax}$ the signature of K and the signature of ax, respectively. The former comprises all predicate, function and constant symbols occurring in K and the latter all predicate, function and constant symbols occurring in ax. The signature of the example KB given in Chapter 1 is {res, writes, paper, secr, gen, pam} and the signature of formula 1.2 of this KB is {writes, res}.

Remark 2.2 By analogy with the definition of coherency in DL (see Section 2.2), we call a FOL KB K incoherent iff K |= ∀X_1, . . . , X_k ¬p(X_1, . . . , X_k) for some k-place predicate symbol p in the signature of K where k ≥ 1.

Remark 2.3 We want to point out that whenever we speak of entailment computation, we mean the invocation of a sound reasoning service that is guaranteed to terminate after finite execution time and returns a finite number of entailments for any KB given as input (cf. Remark 2.1). Similarly, when we say that all entailments of a KB are computed, we always refer to a finite set of entailments of certain types output by such a reasoning service. Examples of such entailment types regarding DL are the (a) classification and (b) realization entailments, by which we mean (a) all the subsumption relationships between concept names appearing in the KB, i.e. entailments of the form C_1 ⊑ C_2 for concept names C_1, C_2 ∈ $\widetilde{K}$ and (b) all the concept names instantiated by a given individual for all individuals appearing in the KB, i.e. entailments of the form C(a) for concept names C ∈ $\widetilde{K}$ and individual names a ∈ $\widetilde{K}$.

image

Table 2.1: Symbols and abbreviations used throughout this work (cf. footnote 10).

3 Knowledge Base Debugging

KB debugging can be seen as a test-driven procedure comparable to test-driven software development and debugging, where test cases are specified to restrict the possible faults until the user detects the actual fault manually or there is only one (highly probable) fault remaining which is in line with the specified test cases. In this chapter, we want to study the theory of (non-interactive) KB debugging, present and discuss mechanisms that can be employed for the debugging of KBs and reveal drawbacks of such systems. In (non-interactive) KB debugging we assume the test cases to be fixed during the debugging procedure. That is, a user might specify a set of test cases offline, run a debugging system and investigate the output solution(s). In case no satisfactory solution has been returned, some additional test cases might be defined offline before the debugger is invoked again.

The inputs to a KB debugging problem can be characterized as follows: Given is a KB K and a KB B (background knowledge), both formulated over some logic L complying with the conditions 1-4 given in Chapter 2. All formulas in B are considered to be correct and all formulas in K are considered potentially faulty. K ∪ B does not meet postulated requirements R where {consistency} ⊆ R ⊆ {coherency, consistency} or does not feature desired semantic properties, called test cases.13 Positive test cases (aggregated in the set P) correspond to desired entailments and negative test cases (N) represent undesired entailments of the correct (repaired) KB (along with the background KB B). Each test case p ∈ P and n ∈ N is a set of logical formulas over L. The meaning of a positive test case p ∈ P is that the correct KB integrated with B must entail each formula (or the conjunction of formulas) in p, whereas a negative test case n ∈ N signals that some formula (or the conjunction of formulas) in n must not be entailed by the correct KB integrated with B.

Remark 3.1 In the sequel, we will write K |= X for some set of formulas X to denote that K |= ax for all ax ∈ X and K ̸|= X to state that K ̸|= ax for some ax ∈ X.

The described inputs to the KB debugging problem are captured by the notion of a diagnosis problem instance:

Definition 3.1 (Diagnosis Problem Instance). Let

• K be a KB over L,

• P, N be sets including sets of formulas over L,

• {consistency} ⊆ R ⊆ {coherency, consistency},

• B be a KB over L such that K ∩ B = ∅ and B satisfies all requirements r ∈ R,

• the cardinality of all sets K, B, P, N be finite.

Then we call the tuple ⟨K, B, P, N⟩_R a diagnosis problem instance (DPI) over L.14
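As a data structure, a DPI is simply a record of four finite sets plus the requirement set R. The following is a minimal sketch, assuming formulas are represented as opaque hashable values (here plain strings); the class name `DPI` and its field layout are illustrative assumptions, not an interface defined in this work.

```python
from dataclasses import dataclass

CONSISTENCY, COHERENCY = "consistency", "coherency"

@dataclass(frozen=True)
class DPI:
    """Hypothetical container for a diagnosis problem instance <K, B, P, N>_R."""
    K: frozenset   # potentially faulty formulas
    B: frozenset   # background knowledge, assumed correct
    P: frozenset   # positive test cases, each a frozenset of formulas
    N: frozenset   # negative test cases, each a frozenset of formulas
    R: frozenset = frozenset({CONSISTENCY})

    def __post_init__(self):
        if self.K & self.B:
            raise ValueError("K and B must be disjoint")
        allowed = frozenset({CONSISTENCY, COHERENCY})
        if not (frozenset({CONSISTENCY}) <= self.R <= allowed):
            raise ValueError("R must contain consistency and at most coherency")
        # Finiteness holds trivially for Python sets. Whether B satisfies
        # every r in R requires a reasoner and is not checked in this sketch.

dpi = DPI(
    K=frozenset({"A -> B", "B -> C"}),
    B=frozenset({"A"}),
    P=frozenset({frozenset({"B"})}),
    N=frozenset({frozenset({"C"})}),
)
```

Only the purely structural conditions of Definition 3.1 are enforced here; everything that depends on entailment is left to whatever reasoning service is available for L.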

Note that, for now, we do not make any assumptions about the contents of the sets K, B, P and N that go beyond Definition 3.1. So, it is well possible, for example, to specify a DPI according to Definition 3.1 for which there are no solutions or for which only trivial solutions exist. Later on, we will discuss properties a DPI must fulfill to guarantee existence of solutions for it.

We define a solution KB for a DPI as follows:

Definition 3.2 (Solution KB). Let ⟨K, B, P, N⟩_R be a DPI. Then a KB K∗ is called solution KB w.r.t. ⟨K, B, P, N⟩_R, written as K∗ ∈ Sol_{⟨K,B,P,N⟩_R}, iff all the following conditions hold:

K∗ ∪ B does not violate any r ∈ R   (3.1)
K∗ ∪ B |= p for all p ∈ P   (3.2)
K∗ ∪ B ̸|= n for all n ∈ N   (3.3)

A solution KB K∗ w.r.t. a DPI is called maximal, written as K∗ ∈ Sol^max_{⟨K,B,P,N⟩_R}, iff there is no solution KB K′ such that K′ ∩ K ⊃ K∗ ∩ K.

Now, the problem of KB debugging can be formalized:

Problem Definition 3.1 (KB Debugging). Given a DPI ⟨K, B, P, N⟩_R, find a solution KB w.r.t. ⟨K, B, P, N⟩_R.

Note that basically any KB K∗ that meets conditions (3.1) - (3.3) is a solution KB in the sense of Definition 3.2. Hence, K∗ does not even need to have a non-empty intersection with K. Only the postulation of maximality of a solution KB (as detailed later in Section 3.1) establishes a relationship to the given KB K.

Remark 3.2 Let K′ := K ∪ B ∪ U_P. Then, conditions (3.1) - (3.3) can be reduced to conditions (3.2) and (3.3) if

• N := N ∪ {{false}} given R = {consistency} or

• N := N ∪ {{∀X_1, . . . , X_k p(X_1, . . . , X_k) → false} | p is k-place predicate symbol in $\widetilde{K′}$, k ≥ 1} ∪ {{false}} in case R = {consistency, coherency}.

This holds because a KB K is inconsistent iff K |= {false} and K is incoherent iff some predicate symbol in K′ must be false for any instantiation. Notice that the latter must hold for all predicate symbols in K′ and not only in K (see Example 3.1). For PL and DL, the definitions of N are analogous (cf. Chapter 2), but for PL coherency is not defined wherefore only the first bullet is relevant for PL. In what follows we will stick to the more explicit characterization of a solution KB given by Definition 3.2.

Example 3.1 Let a DL DPI be defined as

image

Then, $\widetilde{K}$ = {B, C}, and there is some concept A ∉ $\widetilde{K}$ with A ∈ $\widetilde{K′}$ which is unsatisfiable w.r.t. K ∪ B. Since we want a solution KB integrated with B to meet the conditions (3.1) - (3.3), K is not a solution KB w.r.t. ⟨K, B, P, N⟩_R despite the fact that it is perfectly consistent and coherent as an isolated KB.

Whereas the definition of a solution KB refers to the desired properties of the output of a KB debugging system, the following definition can be seen as a characterization of KBs provided as an input to a KB debugger. If a KB is valid w.r.t. the background knowledge, the requirements and the test cases, then finding a solution KB w.r.t. the DPI is trivial. Otherwise, obtaining a solution KB from it involves modification of the input KB and subsequent addition of suitable formulas. Usually, the KB K part of the DPI given as an input to a debugger is assumed to be invalid w.r.t. this DPI.

Definition 3.3 (Valid KB). Let ⟨K, B, P, N⟩_R be a DPI. Then, we say that a KB K′ is valid w.r.t. ⟨·, B, P, N⟩_R iff K′ ∪ B ∪ U_P does not violate any r ∈ R and does not entail any n ∈ N. A KB is said to be invalid (or faulty) w.r.t. ⟨·, B, P, N⟩_R iff it is not valid w.r.t. ⟨·, B, P, N⟩_R.15

Intuitively, if a KB K is faulty w.r.t. ⟨·, B, P, N⟩_R, then there is at least one incorrect formula in K that needs to be corrected or deleted; if a KB K is valid w.r.t. ⟨·, B, P, N⟩_R, a solution KB can be directly obtained by simply extending K by the set U_P of all sentences comprised in positive test cases. Note, however, that K being valid w.r.t. ⟨·, B, P, N⟩_R does not necessarily mean that K ∪ B entails any p ∈ P.

Proposition 3.1. Let ⟨K, B, P, N⟩_R be a DPI. Then, K′ ∪ U_P ∈ Sol_{⟨K,B,P,N⟩_R} iff K′ is valid w.r.t. ⟨·, B, P, N⟩_R.

Proof. “⇒”: If K′ ∪ U_P is a solution KB, then K′ ∪ U_P ∪ B meets all r ∈ R as per condition (3.1) and does not entail any n ∈ N as per condition (3.3). Hence, K′ is valid w.r.t. ⟨·, B, P, N⟩_R.

“⇐”: If K′ is valid w.r.t. ⟨·, B, P, N⟩_R, then (K′ ∪ U_P) ∪ B meets all r ∈ R, i.e. meets condition (3.1). Moreover, (K′ ∪ U_P) ∪ B ̸|= n for all n ∈ N, i.e. (K′ ∪ U_P) ∪ B meets condition (3.3). By extensiveness of the used language L, (K′ ∪ U_P) ∪ B |= p for all p ∈ P, i.e. condition (3.2) is fulfilled by (K′ ∪ U_P) ∪ B. Thus, K′ ∪ U_P is a solution KB.
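The equivalence stated by Proposition 3.1 can be observed on a small propositional instance. The sketch below assumes R = {consistency}, represents formulas as clauses (frozensets of literals such as "A" or "-A") and uses a brute-force truth-table reasoner; all function names are illustrative assumptions. The solution test encodes the three conditions as they are used in the proof above: requirements met, every positive test case entailed, no negative test case entailed.

```python
from itertools import product

def satisfiable(clauses):
    """Brute force: is there a truth assignment satisfying every clause?"""
    names = sorted({lit.lstrip("-") for c in clauses for lit in c})
    for values in product([False, True], repeat=len(names)):
        model = dict(zip(names, values))
        if all(any(model[l.lstrip("-")] != l.startswith("-") for l in c)
               for c in clauses):
            return True
    return False

def entails(kb, clause):
    """kb |= clause iff kb plus the negation of each literal is unsatisfiable."""
    negated = [frozenset({l[1:] if l.startswith("-") else "-" + l})
               for l in clause]
    return not satisfiable(list(kb) + negated)

def entails_all(kb, formulas):
    """kb |= X in the sense of Remark 3.1: every formula in X is entailed."""
    return all(entails(kb, f) for f in formulas)

def union_of(P):
    """U_P: all formulas occurring in some positive test case."""
    return set().union(*P)

def is_valid(K_prime, B, P, N):
    """Definition 3.3 for R = {consistency}."""
    theory = set(K_prime) | set(B) | union_of(P)
    return satisfiable(theory) and not any(entails_all(theory, n) for n in N)

def is_solution(K_star, B, P, N):
    """Conditions (3.1) to (3.3) for R = {consistency}."""
    theory = set(K_star) | set(B)
    return (satisfiable(theory)
            and all(entails_all(theory, p) for p in P)
            and not any(entails_all(theory, n) for n in N))

ax1 = frozenset({"-A", "B"})        # A -> B
ax2 = frozenset({"-B", "C"})        # B -> C
K = {ax1, ax2}
B = {frozenset({"A"})}              # background knowledge: A
P = [{frozenset({"B"})}]            # B must be entailed
N = [{frozenset({"C"})}]            # C must not be entailed

# Both sides of Proposition 3.1 for the faulty K and for the subset {ax1}.
checks = [(is_valid(Kp, B, P, N), is_solution(set(Kp) | union_of(P), B, P, N))
          for Kp in (K, {ax1})]
```

For the full K both sides are false, because C is entailed; for {ax1} both sides are true.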

Definition 3.4 (Extension). Let ⟨K, B, P, N⟩_R be a DPI over L and K′ ⊆ K. A set of formulas E over L is called an extension w.r.t. K′ and ⟨K, B, P, N⟩_R, written as E ∈ EX(K′)_{⟨K,B,P,N⟩_R}, iff (K \ K′) ∪ E is a solution KB w.r.t. ⟨K, B, P, N⟩_R.

Definition 3.5 (Diagnosis). Let ⟨K, B, P, N⟩_R be a DPI. A set of formulas D ⊆ K is called a diagnosis w.r.t. ⟨K, B, P, N⟩_R, written as D ∈ aD_{⟨K,B,P,N⟩_R}, iff there exists some E ∈ EX(D)_{⟨K,B,P,N⟩_R}, i.e. (K \ D) ∪ E is a solution KB w.r.t. ⟨K, B, P, N⟩_R.

A diagnosis D w.r.t. ⟨K, B, P, N⟩_R is minimal, written as D ∈ mD_{⟨K,B,P,N⟩_R}, iff there is no D′ ⊂ D such that D′ is a diagnosis w.r.t. ⟨K, B, P, N⟩_R. A diagnosis D w.r.t. ⟨K, B, P, N⟩_R is a minimum cardinality diagnosis w.r.t. ⟨K, B, P, N⟩_R iff there is no diagnosis D′ w.r.t. ⟨K, B, P, N⟩_R such that |D′| < |D|.

Proposition 3.2. Let ⟨K, B, P, N⟩_R be a DPI. Then, D ∈ aD_{⟨K,B,P,N⟩_R} iff K \ D is valid w.r.t. ⟨·, B, P, N⟩_R.

Proof. ⇒”: If D is a diagnosis w.r.t.  ⟨K, B, P, N ⟩R, there is some extension E w.r.t. D and  ⟨K, B, P, N ⟩Rwhich implies that  (K \ D) ∪ Eis a solution KB w.r.t.  ⟨K, B, P, N ⟩R. Now, assume that K \ D is not valid w.r.t.  ⟨·, B, P, N ⟩R. By Proposition 3.1, this means that  (K \ D) ∪ UPis not a solution KB. Hence,  (K \ D) ∪ UP ∪ Bviolates some  r ∈ Ror entails some  n ∈ N. As  (K \ D) ∪ Eis a solution KB, we have that  (K \ D) ∪ E ∪ B |= pfor all  p ∈ P. So, by idempotency of  L, (K \ D) ∪ E ∪ B ≡(K \ D) ∪ E ∪ B ∪ UP ⊇ (K \ D) ∪ UP ∪ Bwhich violates some  r ∈ Ror entails some  n ∈ N. By monotonicity of  L, (K\D)∪E ∪Balso violates some  r ∈ Ror entails some  n ∈ Nwhereby  (K\D)∪Eis not a solution KB which is a contradiction.

⇐”: If K\D is valid w.r.t.  ⟨·, B, P, N ⟩R, then  (K\D)∪B∪UPdoes not violate any  r ∈ Rand does not entail any  n ∈ N. Since  (K \D)∪B ∪UPalso entails each positive test case  p ∈ Pby extensiveness of L, we can conclude that  (K \ D) ∪ UPis a solution KB. By Definition 3.4,  UP ∈ EX(D)⟨K,B,P,N⟩Rand thus D is a diagnosis w.r.t.  ⟨K, B, P, N ⟩R.

In other words, D is a diagnosis w.r.t.  ⟨K, B, P, N ⟩Riff  (K \ D) ∪ Bmeets all requirements, i.e. consistency and/or coherency, as per condition (3.1), does not entail any negative test cases as per condition (3.3), and the positive test cases  p ∈ Pcan be added to  (K \ D) ∪ Bwithout violating any of the conditions (3.1) or (3.3).
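The characterization above can be tried out on a toy logic. The following is a minimal sketch, not the debugging system described in this work: it assumes a hypothetical logic in which every formula is an atomic subsumption `(sub, sup)`, the atom `BOT` stands for ⊥, and R consists of coherency only. All function names (`reachable`, `entails`, `coherent`, `is_valid`, `is_diagnosis`) are illustrative assumptions.

```python
BOT = "BOT"  # assumed name for the bottom concept

def reachable(axioms, start):
    """Atoms reachable from `start` along subsumption axioms (sub, sup)."""
    seen, stack = {start}, [start]
    while stack:
        cur = stack.pop()
        for sub, sup in axioms:
            if sub == cur and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen

def entails(axioms, axiom):
    """Toy entailment: sub is subsumed by sup, or sub is unsatisfiable."""
    sub, sup = axiom
    reach = reachable(axioms, sub)
    return sup in reach or BOT in reach

def coherent(axioms):
    """No named atom is subsumed by BOT."""
    atoms = {a for ax in axioms for a in ax} - {BOT}
    return all(BOT not in reachable(axioms, a) for a in atoms)

def union_of(test_cases):
    """U_P: all formulas occurring in some test case."""
    return frozenset().union(*test_cases)

def is_valid(K, B, P, N):
    """K is valid iff K ∪ B ∪ U_P meets R and entails no negative test case."""
    kb = frozenset(K) | frozenset(B) | union_of(P)
    return coherent(kb) and not any(all(entails(kb, ax) for ax in n) for n in N)

def is_diagnosis(D, K, B, P, N):
    """Proposition 3.2: D is a diagnosis iff K \\ D is valid."""
    D, K = frozenset(D), frozenset(K)
    return D <= K and is_valid(K - D, B, P, N)

# Hypothetical DPI: K entails A ⊑ D together with B, which N forbids.
K = {("A", "B"), ("B", "C")}
B = {("C", "D")}
N = [{("A", "D")}]
P_empty = []
P_ab = [{("A", "B")}]  # A ⊑ B is now a required entailment
```

With `P_empty`, deleting `A ⊑ B` repairs the KB. With `P_ab`, the same deletion no longer yields a diagnosis, because adding U_P restores the unwanted entailment; deleting `B ⊑ C` still works.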

From a given DPI  ⟨K, B, P, N ⟩R, a solution KB  K∗can be obtained by a deletion and an expansion step. The deletion step involves the elimination of a diagnosis  D ⊆ Kfrom K. Note that, due to monotonicity of L, only deletion (and not expansion) of the KB can effectuate a repair of inconsistencies, incoherencies and unwanted entailments. Note, if K is already valid w.r.t.  ⟨·, B, P, N ⟩R, then D can be set to  ∅and the deletion step can be omitted. The expansion step aims at the fulfillment of positive test cases P, i.e. condition (3.2), which is not necessarily the case after the deletion step. In fact, some new logical sentences  E ∈ EX(D)⟨K,B,P,N⟩Rmay need to be added to  (K \ D) ∪ Bto grant entailment of all positive test cases.

Corollary 3.1. Let D be a diagnosis w.r.t.  ⟨K, B, P, N ⟩R. Then there is a set of logical sentences  E ∈EX(D)⟨K,B,P,N⟩Rover L such that:

image

Proof. The proposition of the corollary is a direct consequence of Definition 3.2 and Definition 3.5.

From the point of view of a solution KB K∗ w.r.t. ⟨K, B, P, N ⟩R, D := K \ K∗ is a diagnosis w.r.t. ⟨K, B, P, N ⟩R and K∗ \ K is one possible extension w.r.t. D and ⟨K, B, P, N ⟩R.

Proposition 3.3. For each solution KB K∗ w.r.t. ⟨K, B, P, N ⟩R there is a diagnosis D w.r.t. ⟨K, B, P, N ⟩R and an extension E w.r.t. D and ⟨K, B, P, N ⟩R such that K∗ = (K \ D) ∪ E and E ∩ D = ∅.

Proof. Let  K∗be a solution KB w.r.t.  ⟨K, B, P, N ⟩R. Then  K∗can be written as  K∗ = (K ∩K∗)∪(K∗ \K) = (K \ (K \ K∗)) ∪ (K∗ \ K). Let  K \ K∗ =: Dand  K∗ \ K =: E, then  E ∩ D = ∅. Further on, D ⊆ Kholds and E is a set of logical sentences such that  K∗ = (K\D)∪E ∈ Sol⟨K,B,P,N⟩R. Therefore, D ∈ aD⟨K,B,P,N⟩Rand  E ∈ EX(D)⟨K,B,P,N⟩R.

Corollary 3.2. The (non-)existence of a diagnosis w.r.t.  ⟨K, B, P, N ⟩Ris equivalent to the (non-)existence of a solution KB w.r.t.  ⟨K, B, P, N ⟩R.

Proof. Proposition 3.3 shows that there is a diagnosis for each solution KB. By Definition 3.5, there is also a solution KB for each diagnosis.

The next Proposition gives sufficient and necessary criteria for the existence of a solution, i.e. a diagnosis or a solution KB, respectively, for a given DPI.

Proposition 3.4. Let  ⟨K, B, P, N ⟩Rbe a DPI. Then, a diagnosis D w.r.t.  ⟨K, B, P, N ⟩Rexists iff

• ∀ r ∈ R : B ∪ UPfulfills r and

• ∀  n ∈ N :B ∪  UP̸|= n.

Proof. “⇐”: Let us define D := K. Then X := (K \ D) ∪ B ∪ UP = B ∪ UP. Consequently, X satisfies each r ∈ R as per condition (3.1), X ̸|= n for each n ∈ N as per condition (3.3), and finally X |= p for each p ∈ P by extensiveness of L and thus meets condition (3.2). So, X is a solution KB w.r.t. ⟨K, B, P, N ⟩R, wherefore D must be a diagnosis.

“⇒”: Let D ⊆ K be some diagnosis w.r.t. ⟨K, B, P, N ⟩R. Then, by definition of a diagnosis, there is some solution KB K∗ w.r.t. ⟨K, B, P, N ⟩R. Then K∗ ∪ B |= p for all p ∈ P by condition (3.2), which implies that K∗ ∪ B ∪ UP does not feature any new entailments compared to K∗ ∪ B by idempotency of L. So, K∗ ∪ B ≡ K∗ ∪ B ∪ UP holds. Now, for arbitrary n ∈ N, since K∗ ∪ B ̸|= n we have that K∗ ∪ B ∪ UP ̸|= n, and, by monotonicity of L, that B ∪ UP ̸|= n. Analogously, for any r ∈ R, because K∗ ∪ B satisfies r, it must be true that K∗ ∪ B ∪ UP satisfies r and, by monotonicity of L, that B ∪ UP satisfies r.

Definition 3.6 (Admissible DPI). We call a DPI  ⟨K, B, P, N ⟩Radmissible iff there is at least one diagnosis  D ∈ aD⟨K,B,P,N⟩R.

A non-admissible DPI may arise in a situation where a user specifies test cases manually. For this procedure, a similar error-proneness as for the user’s formulation of KB formulas can be assumed, and there are several pitfalls to avoid, as Proposition 3.4 shows. First, the specified test cases in P and N must be “compatible” with each other, i.e. positive test cases must not contradict negative ones. For example, adding p1 := {A ⊑ C, E ≡ B} and p2 := {C ⊑ E} to P and n1 := {A ⊑ B} to N leads to a contradiction between P and N and consequently to the non-admissibility of a DPI comprising P and N. Second, the background KB B, which is considered as correct, must indeed be correct, at least in terms of R; and negative test cases must be specified in a way not to postulate non-entailment of knowledge specified in B. A counterexample is B := {∃r.⊤ ⊑ A, r(x, y), A ⊑ C} and N := {{C(x)}}. Third, the union of positive test cases together with B must be in compliance with R; in particular, the formulas in P must not be inconsistent or incoherent. The reason is that the union of positive test cases UP can be viewed as a KB in its own right, since all logical sentences occurring in some p ∈ P must be true in the solution KB. So, in a setting where test cases are specified manually, faults are as likely to occur in UP as they are in K.
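The two conditions of Proposition 3.4 can be checked mechanically. The sketch below does so for the example with p1, p2 and n1, under the assumption of a toy logic of atomic subsumptions `(sub, sup)` with coherency as the only requirement; the equivalence E ≡ B is encoded as two subsumptions. The names `is_admissible`, `entails` and `coherent` are hypothetical helpers, not part of any existing library.

```python
BOT = "BOT"  # assumed name for the bottom concept

def reachable(axioms, start):
    """Atoms reachable from `start` along subsumption axioms (sub, sup)."""
    seen, stack = {start}, [start]
    while stack:
        cur = stack.pop()
        for sub, sup in axioms:
            if sub == cur and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen

def entails(axioms, axiom):
    sub, sup = axiom
    reach = reachable(axioms, sub)
    return sup in reach or BOT in reach

def coherent(axioms):
    atoms = {a for ax in axioms for a in ax} - {BOT}
    return all(BOT not in reachable(axioms, a) for a in atoms)

def is_admissible(B, P, N):
    """Proposition 3.4: B ∪ U_P fulfills R and entails no n in N."""
    kb = frozenset(B) | frozenset().union(*P)
    return coherent(kb) and not any(all(entails(kb, ax) for ax in n) for n in N)

# The example from the text; E ≡ B is split into E ⊑ B and B ⊑ E.
p1 = {("A", "C"), ("E", "B"), ("B", "E")}
p2 = {("C", "E")}
n1 = {("A", "B")}

both_positive = is_admissible(set(), [p1, p2], [n1])  # A ⊑ C ⊑ E ⊑ B follows
only_p1 = is_admissible(set(), [p1], [n1])            # chain is broken
```

Note that K does not appear in the check at all: admissibility depends only on B, P, N and R.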

The debugging system presented in this work, however, guarantees by automatic test case generation that admissibility of a DPI is satisfied at any time, provided that an admissible DPI is given as an initial input to the debugging system.

Remark 3.3 In case of a present DPI  ⟨K, B, P, N ⟩Rwhich is non-admissible, the DPI must be properly modified before it can be used with our debugging system. More concretely, the sets B, P as well as N must be prepared in a way that the two conditions in Proposition 3.4 are satisfied. When supposing that B is an already approved and correct KB (which is a reasonable assumption for a KB used as background knowledge during a debugging session), then there are (at least) the following ways to obtain an admissible DPI from a given non-admissible DPI without modifying B.

(a) One straightforward way to achieve that is the deletion of all manually specified test cases from P and N . After that, both sets are either the empty set (if no automatic test cases, e.g. from former debugging sessions were included in these sets) or comprise only automatically generated test cases. The former case yields an admissible DPI independently of K by the property of B to not violate any requirements in R (see Definition 3.1). That the latter case implies the admissibility of the DPI is a property of the debugging system described in this work (as we will show later by Corollary 7.3).

(b) Another way to resolve the non-admissibility of a DPI  ⟨K, B, P, N ⟩Ris to first check whether ⟨UP, B, ∅, N ⟩Ris admissible (verification of Proposition 3.4 by means of a reasoning service). If so, it is clear that B does not conflict with N . Then, a debugger (like the one presented in this work) can be exploited to find an as small as possible subset of the set of all formulas occurring in the positive test cases, the removal of which causes the DPI to become admissible. This would be accomplished by the computation of a minimal diagnosis  DPw.r.t.  ⟨UP, B, ∅, N ⟩Rand the usage of the modified admissible DPI  ⟨K, B, {UP \ DP} , N ⟩Rinstead of the original one. In this case, only a set-minimal set  DPof formulas that were desired entailments of the user are lost. This modification is possible in polynomial time apart from the reasoning costs, i.e. by means of a polynomial number of calls to a reasoner (cf. Chapter 1).

(c) Otherwise, i.e. if B already conflicts with the negative test cases N , then an algorithm similar to Algorithm 1 (that will be presented in Section 4.4.1) can be employed to determine a maximal subset N ′of N w.r.t. set inclusion such that B will not be in conflict with  N ′. This approach also requires only a polynomial number of calls to a reasoner (cf. Proposition 4.8). If the resulting modified DPI ⟨K, B, P, N ′⟩Ris not yet admissible, i.e. after adding the positive test cases  UPto B there are again conflicts with  N ′, method (b) must be executed in order to finally obtain an admissible DPI.

That is, given a non-admissible DPI, there is a transformation achievable in polynomial time which enables the establishment of admissibility involving a set-minimal number of modifications to the given test cases. Thence, in the rest of this work, we will assume that a DPI given as an input to our algorithms is admissible.
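Method (b) of Remark 3.3 can be illustrated as follows. This is only a sketch under strong assumptions: a toy logic of atomic subsumptions with coherency as the sole requirement, and a brute-force search over subsets of UP by increasing size in place of the polynomial-call procedure referred to in the text. The function `repair_positive_test_cases` and its return shape are hypothetical.

```python
from itertools import combinations

BOT = "BOT"  # assumed name for the bottom concept

def reachable(axioms, start):
    seen, stack = {start}, [start]
    while stack:
        cur = stack.pop()
        for sub, sup in axioms:
            if sub == cur and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen

def entails(axioms, axiom):
    sub, sup = axiom
    reach = reachable(axioms, sub)
    return sup in reach or BOT in reach

def coherent(axioms):
    atoms = {a for ax in axioms for a in ax} - {BOT}
    return all(BOT not in reachable(axioms, a) for a in atoms)

def violates(kb, N):
    """True iff kb breaks coherency or entails some negative test case."""
    return not coherent(kb) or any(all(entails(kb, ax) for ax in n) for n in N)

def repair_positive_test_cases(B, P, N):
    """Find a minimum-cardinality (hence minimal) D_P for <U_P, B, {}, N>.

    Returns (D_P, new_P) with new_P = [U_P \\ D_P], or None if B alone
    already conflicts with N (case (c) in the text).
    """
    B, U_P = frozenset(B), frozenset().union(*P)
    if violates(B, N):
        return None
    for size in range(len(U_P) + 1):
        for combo in combinations(sorted(U_P), size):
            D_P = frozenset(combo)
            if not violates((U_P - D_P) | B, N):
                return D_P, [U_P - D_P]

# Non-admissible example from the text (E ≡ B split into two subsumptions).
P = [{("A", "C"), ("E", "B"), ("B", "E")}, {("C", "E")}]
N = [{("A", "B")}]
D_P, new_P = repair_positive_test_cases(set(), P, N)
```

In this example a single formula has to be given up; which of the candidates on the chain from A to B is returned depends only on the iteration order chosen here.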

In general, there are multiple (minimal) diagnoses for a DPI, i.e. |aD⟨K,B,P,N⟩R| ≥ |mD⟨K,B,P,N⟩R| > 1, and there are multiple, in fact infinitely many, extensions E ∈ EX(D)⟨K,B,P,N⟩R for a fixed diagnosis D ∈ aD⟨K,B,P,N⟩R. The task addressed in this work is finding an optimal diagnosis for a given DPI; what we understand by “optimality” of a diagnosis will be addressed in more detail in Part II. The identification of an optimal extension w.r.t. that diagnosis and the DPI, in contrast, is not the aim. Instead, we will content ourselves with finding any extension that makes it possible to formulate a solution KB given a DPI and a diagnosis for that DPI. In fact, the problem of finding a solution KB for a DPI can be reduced to finding a diagnosis for that DPI since a suitable extension can be easily formulated for any diagnosis, as the next proposition shows:

Proposition 3.5. Let  ⟨K, B, P, N ⟩Rbe a DPI and  D ∈ aD⟨K,B,P,N⟩R. Then  UPis an extension w.r.t. D and  ⟨K, B, P, N ⟩R.

Proof. Let us assume that there is some  D ∈ aD⟨K,B,P,N⟩Rand  UPis not an extension w.r.t. D and ⟨K, B, P, N ⟩R. By the definition of a diagnosis, this is equivalent to stating that  (K \ D) ∪ UPis not a solution KB which in turn means that at least one condition (3.1), (3.2) or (3.3) of Definition 3.2 is violated by  (K \ D) ∪ UP. However, the fact that D is a diagnosis implies the existence of some extension  E ∈ EX(D)⟨K,B,P,N⟩Rthat can be added to (K \ D) to obtain a solution KB. This means that conditions (3.1) and (3.3) must be already valid for (K \ D), since, by monotonicity of L, addition of logical sentences E can neither solve inconsistencies or incoherencies necessary for fulfillment of condition (3.1) nor invalidate non-desired entailments as per condition (3.3). As a consequence, condition (3.2) must be violated by  (K \ D) ∪ UP. By extensiveness of L it holds that  (K \ D) ∪ UP |= pfor all p ∈ Pwhereby we obtain that condition (3.2) is fulfilled which yields a contradiction.

Proposition 3.5 claims that the expansion operation, i.e. identifying a concrete extension for a diagnosis, is trivial, at least for our purposes, namely formulating an extension reflecting only evident entailments given by the set of positive test cases P. Consequently, in order to find a solution KB for some DPI, it is sufficient to concentrate on the deletion step, i.e. on the search for diagnoses.

Note that using  UPas a canonical extension when computing diagnoses does not affect the set of identified diagnoses. In other words, exchanging  E ∈ EX(D)⟨K,B,P,N⟩Rfor  UPin Definition 3.5 yields an equivalent definition. The following corollary proves this statement and summarizes the relationship between the notions diagnosis, solution KB and valid KB.

Corollary 3.3. The following statements are equivalent:

1. D is a diagnosis w.r.t.  ⟨K, B, P, N ⟩R

2.  (K \ D) ∪ UPis a solution KB w.r.t.  ⟨K, B, P, N ⟩R

3. (K \ D) is valid w.r.t.  ⟨·, B, P, N ⟩R.

Proof. That (1) is equivalent to (2) follows from Definition 3.5 which states that D is a diagnosis w.r.t. ⟨K, B, P, N ⟩Riff there is some set of sentences  E ∈ EX(D)⟨K,B,P,N⟩Rsuch that  (K \ D) ∪ Eis a solution KB, and from Proposition 3.5 which proves that  UPis an extension w.r.t. any diagnosis D and ⟨K, B, P, N ⟩R.

That (1) is equivalent to (3) follows directly from Proposition 3.2 and the equivalence of (2) and (3) has been shown in Proposition 3.1.

3.1 Parsimonious Knowledge Base Debugging

Why are minimal diagnoses interesting? First, the set of minimal diagnoses w.r.t. a DPI captures all the information that explains the unwanted properties, i.e. violation of requirements or test cases, of the DPI. In other words, the minimal diagnoses represent all subset-minimal possibilities to modify a KB in a way it becomes a valid KB w.r.t. the given DPI (e.g. by simply deleting a minimal diagnosis from the KB in the trivial case). By monotonicity of the logic L, each superset of a minimal diagnosis w.r.t. a DPI is a diagnosis w.r.t. this DPI. That is,  aD⟨K,B,P,N⟩Rcan be easily reconstructed given  mD⟨K,B,P,N⟩R. There is however no evidence (in terms of specified requirements and test cases) in a DPI that would justify the selection of a non-minimal diagnosis. That is, if K is a KB and  D ⊆ Ka minimal diagnosis w.r.t. a DPI including K, K \ D does not violate any of the postulated properties that must hold for a KB to be valid w.r.t. this DPI. For that reason, there is no evident need to delete or modify any other sentences in K except for the ones in some minimal diagnosis D.

Second, usually a setting can be assumed where the author of a KB specifies formulas to the best of their knowledge. Hence, the assumption that a formula is rather correct than faulty, or in other words, that the KB author wants to keep as many formulated sentences as possible in a solution KB obtained from a debugger, is practical.

This also motivates the importance of a certain subset of minimal diagnoses, namely minimum cardinality diagnoses, which are the solutions of choice in scenarios where no probabilistic information about the KB authors’ faults is available, e.g. in terms of statistics retrieved from log data of the used IDE (see Section 4.6 for details). In an application where such information is given, minimum cardinality diagnoses might not always be the appropriate choice (for details see Part II). In this case the aim is to find a minimal diagnosis with a maximal probability of including only sentences that are actually faulty (which might not necessarily be a minimum cardinality diagnosis).

Third, minimality of diagnoses will be a necessary condition to guarantee the possibility of discrimination between different (candidate) diagnoses to formulate a solution KB, as will be seen later in Chapter 7.

Fourth, focusing only on minimal diagnoses rather than all diagnoses can greatly reduce the search space for diagnoses and therefore greatly speed up the debugging procedure (cf. [dKW87]).

Projected to the task of KB debugging, namely finding a solution KB w.r.t. a given DPI, this means we are interested in minimal invasiveness, that is making as few formula-deletion-modifications to the input KB K as possible in the course of the performed debugging actions. That is, the actual goal is to find some maximal solution KB  K∗for a DPI. Compare with “The Principle of Parsimony” in [Rei87, p. 7] [BATJ91].

Problem Definition 3.2 (Parsimonious KB Debugging). Given a DPI  ⟨K, B, P, N ⟩R, the task is to find a maximal solution KB w.r.t.  ⟨K, B, P, N ⟩R.

The next proposition shows that this problem can be reduced to finding a minimal diagnosis.

Proposition 3.6. (i)  K \ K∗is a minimal diagnosis w.r.t.  ⟨K, B, P, N ⟩Rfor each maximal solution KB K∗w.r.t.  ⟨K, B, P, N ⟩R.

(ii) If D is a minimal diagnosis w.r.t.  ⟨K, B, P, N ⟩R, then  (K \ D) ∪ Eis a maximal solution KB w.r.t. ⟨K, B, P, N ⟩Rfor all extensions  E ∈ EX(D)⟨K,B,P,N⟩R.

Proof. Ad (i): Let  K∗be an arbitrary maximal solution KB w.r.t.  ⟨K, B, P, N ⟩R. The first observation is that  D := K \ K∗is a diagnosis w.r.t.  ⟨K, B, P, N ⟩Rsince  K∗ \ K ∈ EX(D)⟨K,B,P,N⟩Rby the fact that  K∗ = (K \ D) ∪ (K∗ \ K)is a solution KB by assumption. Let us assume that there is a diagnosis Dk ∈ aD⟨K,B,P,N⟩Rsuch that  D ⊃ Dk. Since  Dkis a diagnosis, it holds per Definition 3.5 that there is an extension  E ∈ EX(Dk)⟨K,B,P,N⟩Rsuch that  K∗k := (K \ Dk) ∪ Eis a solution KB. Further on, K∩K∗k = K∩((K\Dk)∪E) = (K\Dk)∪(K∩E). Since  K∩K∗can be written as  K\(K\K∗) = K\Dwhich is a strict subset of  K\Dkwhich in turn is a subset of  (K\Dk)∪(K∩E) = K∩K∗k. Consequently, K ∩ K∗ ⊂ K ∩ K∗kholds, which is by Definition 3.2 a contradiction to the maximality of the solution KB K∗. Thus,  D = K \ K∗is a minimal diagnosis w.r.t.  ⟨K, B, P, N ⟩R.

Ad (ii): Let D be a minimal diagnosis w.r.t.  ⟨K, B, P, N ⟩R. Then, by Definition 3.5, there is an extension  E ∈ EX(D)⟨K,B,P,N⟩Rsuch that  K∗ := (K \ D) ∪ Eis a solution KB. Let us assume that E ∩ D ̸= ∅. We can rewrite  K∗as  K∗ = (K \ D) ∪ (E ∩ D) ∪ (E \ D). Since  ∅ ⊂ E ∩ D ⊆ D, we have that  (K \ D) ∪ (E ∩ D) ⊃ K \ D. Thus, there is a  D′ := D \ (E ∩ D) ⊂ Dand an extension E′ ∈ EX(D′)⟨K,B,P,N⟩Rsuch that  E′ := E \ Dsuch that  K∗ = (K \ D′) ∪ E′. As  K∗is a solution KB, this is a contradiction to the minimality of D. Therefore, (*)  E ∩ D = ∅for all  E ∈ EX(D)⟨K,B,P,N⟩Rmust hold.

Let E be any extension w.r.t. D and  ⟨K, B, P, N ⟩R. Then we can write  K∩K∗ = K∩((K\D)∪E) =(K\D)∪(K∩E)and by (*) also  K∩E = ((K\D)∪D)∩E = ((K\D)∩E)∪(D∩E) = (K\D)∩E ⊆ K\D. Consequently, (**)  K ∩ K∗ = K \ D. Now, assume that there is a solution KB  K∗kwith the property K ∩ K∗k ⊃ K ∩ K∗. By (**), this implies that  K ∩ K∗k ⊃ K \ Dwhich means that there is a  Dk ⊂ D ⊆ Ksuch that  K ∩ K∗k = K \ Dk ⊆ K∗k. Now  K∗kis a solution KB w.r.t.  ⟨K, B, P, N ⟩Rand can be written as K∗k = (K∗k ∩ K) ∪ (K∗k \ K) = (K \ Dk) ∪ (K∗k \ K). By  Dk ⊆ Kand since there is a set of formulas E := K∗k \ Ksuch that  (K \ Dk) ∪ E ∈ Sol⟨K,B,P,N⟩Rwe have that  E ∈ EX(Dk)⟨K,B,P,N⟩Rmust hold wherefore  Dkis a diagnosis by Definition 3.5. This, however, is a contradiction to the minimality of D. Therefore,  K∗ = (K \ D) ∪ Emust be a maximal solution KB for any  E ∈ EX(D)⟨K,B,P,N⟩R.

By claim (i), Proposition 3.6 assures that each maximal solution KB can be found by investigating all minimal diagnoses w.r.t. a DPI. Claim (ii) shows that any solution KB built from a minimal diagnosis is indeed maximal. Thus, finding a suitable minimal diagnosis solves the problem of parsimonious KB debugging completely.
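To make the reduction concrete, the following sketch enumerates all minimal diagnoses of a small hypothetical DPI by brute force and builds a maximal solution KB as (K \ D) ∪ UP. It assumes a toy logic of atomic subsumptions with coherency as the only requirement; the exhaustive subset search is a stand-in chosen for brevity, and all names are illustrative.

```python
from itertools import combinations

BOT = "BOT"  # assumed name for the bottom concept

def reachable(axioms, start):
    seen, stack = {start}, [start]
    while stack:
        cur = stack.pop()
        for sub, sup in axioms:
            if sub == cur and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen

def entails(axioms, axiom):
    sub, sup = axiom
    reach = reachable(axioms, sub)
    return sup in reach or BOT in reach

def coherent(axioms):
    atoms = {a for ax in axioms for a in ax} - {BOT}
    return all(BOT not in reachable(axioms, a) for a in atoms)

def is_valid(K, B, P, N):
    kb = frozenset(K) | frozenset(B) | frozenset().union(*P)
    return coherent(kb) and not any(all(entails(kb, ax) for ax in n) for n in N)

def minimal_diagnoses(K, B, P, N):
    """All subset-minimal D ⊆ K such that K \\ D is valid (exhaustive search)."""
    K, found = frozenset(K), []
    for size in range(len(K) + 1):
        for combo in combinations(sorted(K), size):
            D = frozenset(combo)
            # Supersets of a known diagnosis are diagnoses, but not minimal.
            if not any(m <= D for m in found) and is_valid(K - D, B, P, N):
                found.append(D)
    return found

def solution_kb(K, D, P):
    """(K \\ D) ∪ U_P, using U_P as the canonical extension (Proposition 3.5)."""
    return (frozenset(K) - D) | frozenset().union(*P)

# Hypothetical DPI: A ⊑ C is stated directly and also follows via B.
K = {("A", "B"), ("B", "C"), ("A", "C")}
B = set()
P = [{("C", "D")}]
N = [{("A", "C")}]
diagnoses = minimal_diagnoses(K, B, P, N)
repaired = solution_kb(K, frozenset({("A", "B"), ("A", "C")}), P)
```

Both minimal diagnoses here contain `A ⊑ C`, since that axiom alone already yields the unwanted entailment; each then removes one further axiom to break the indirect derivation.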

3.2 Background Knowledge

The general debugging setting considered in this work envisions the opportunity for the user to specify some background knowledge B, i.e. a set of formulas that are known (or strongly assumed) to be correct in advance. Note that, in order for the debugging procedure to work soundly, before some background knowledge is incorporated into the DPI, it is necessary to verify its conformance with the postulated requirements R (cf. Definition 3.1). We can distinguish between two basic scenarios how background knowledge can be leveraged: (1) We have an initial KB Kinit and we know or want to assume that a subset of formulas in Kinit is correct, i.e. B ∩ Kinit ̸= ∅, and (2) we have an initial KB Kinit and some background knowledge disjoint from Kinit, i.e. B ∩ Kinit = ∅.

Example use cases for scenario (1) are situations where a user knows that a subset of formulas B in K is definitely sound or wants to restrict the scope of debugging to a particular part of the KB. Concretely, this may occur, for instance, when B is the result, i.e. the finally output solution KB K∗, of a former successful debugging session and K is a further development of K∗, or in a collaborative setting where many users are involved in the development of K and one of them may want to debug only formulas authored by herself and not touch foreign formulas, which are thus assumed as correct and assigned to B. In (1), Kinit ∩ B and Kinit \ B partition the original KB Kinit into a set of correct and a set of possibly incorrect formulas, respectively. The corresponding DPI would thus be ⟨Kinit \ B, B, P, N ⟩R for some sets of test cases P and N. Note that this DPI does meet the necessary condition (cf. Definition 3.1) K ∩ B = ∅ as (Kinit \ B) ∩ B = ∅. So, in the debugging session, only K := Kinit \ B is used to search for diagnoses, which can reduce the search space substantially. B is nevertheless incorporated in the calculations throughout the KB debugging procedure, although no formula in B may take part in a diagnosis. The advantage of this over simply not considering the formulas in B at all is that the semantics of formulas in B is not lost and can be exploited, e.g., to grant the desired semantic properties also in the context of existing approved knowledge or to facilitate a greater choice of queries to interact with a user, which can be exploited to ask queries with lower cardinality or involving less complex formulas (see Chapter 7 for details on queries).

In scenario (2), the corresponding DPI looks like  ⟨Kinit, B, P, N ⟩Rfor some sets of test cases P and N . An application of this scenario could be the reuse of an existing KB to support an increase of the fault detection rate and thus more sustainable debugging. For example, when formulating a KB  Kinitabout a domain, a reference KB B in that domain that is thoroughly curated by experts could be leveraged. The use of such a KB B is possible both if  Kinitis correct as a standalone KB, i.e.  Kinitis already a solution KB for  ⟨Kinit, ∅, P, N ⟩R, or not. In the first case,  Kinitmight still contain formulations conflicting with B. In this vein, in both cases, faults may be detected that would have been missed otherwise.

4 Diagnosis Computation

In this chapter we describe methods for computing minimal diagnoses w.r.t. a given admissible DPI, provide an in-depth theoretical analysis of these methods including correctness proofs and illustrate the presented algorithms by various examples.

4.1 Conflict Sets

The search space for minimal diagnoses w.r.t. ⟨K, B, P, N ⟩R, the size of which is in general O(2^|K|) (if all subsets of the KB K are investigated), can be reduced to a great extent by exploiting the notion of a conflict set [Rei87, dKW87, SFFR12].

Definition 4.1 (Conflict Set). Let  ⟨K, B, P, N ⟩Rbe a DPI. A set of formulas  C ⊆ Kis called a conflict set w.r.t.  ⟨K, B, P, N ⟩R, written as  C ∈ aC⟨K,B,P,N⟩R, iff  C ∪ UPis not a solution KB w.r.t.  ⟨K, B, P, N ⟩R. A conflict set C is minimal, written as  C ∈ mC⟨K,B,P,N⟩R, iff there is no  C′ ⊂ Csuch that  C′is a conflict set.

Simply put, a (minimal) conflict set is a (minimal) faulty KB that is a subset of K. That is, a conflict set is one source causing the faultiness of K in the context of  B ∪ UP. In other words, a valid KB may not include all the formulas of any conflict set.

Corollary 4.1.  C ⊆ Kis a conflict set w.r.t.  ⟨K, B, P, N ⟩Riff C is invalid w.r.t.  ⟨·, B, P, N ⟩R.

Proof. If C is a conflict set w.r.t.  ⟨K, B, P, N ⟩R, then  C ∪UPis not a solution KB, i.e.  C ∪B∪UPviolates some  r ∈ R, some  p ∈ Por some  n ∈ N. By extensiveness of  L, C ∪ B ∪ UP |= pfor all  p ∈ P, so C ∪ B ∪ UPmust violate some  r ∈ Ror entail some  n ∈ N. Thus, by Definition 3.3, C is invalid w.r.t. ⟨·, B, P, N ⟩R.

If  C ⊆ Kis not valid w.r.t.  ⟨·, B, P, N ⟩R, then  C∪B∪UPviolates some  r ∈ Ror entails some  n ∈ N, wherefore  C∪UP /∈ Sol⟨K,B,P,N⟩R. Hence, by Definition 4.1, C is a conflict set w.r.t.  ⟨K, B, P, N ⟩R.

Consequently, a conflict set C along with the background knowledge B either violates some r ∈ R, entails some n ∈ N, or leads to a violation of some r ∈ R or an entailment of some n ∈ N if all formulas UP comprised by the positive test cases are added to C. Any KB K that is not valid w.r.t. ⟨·, B, P, N ⟩R is itself a conflict set and includes at least one minimal conflict set.

Proposition 4.1. Let  ⟨K, B, P, N ⟩Rbe a DPI. Then, K is not valid w.r.t.  ⟨·, B, P, N ⟩Riff K includes at least one minimal conflict set w.r.t.  ⟨K, B, P, N ⟩R.

image

“⇐”: Let K include at least one minimal conflict set w.r.t. ⟨K, B, P, N ⟩R. Then, by Definition 4.1, there is some C ⊆ K such that C ∪ UP is not a solution KB. Hence, by the monotonicity of L, K ∪ UP cannot be a solution KB either. So, by Proposition 3.1, K is not valid w.r.t. ⟨·, B, P, N ⟩R.

As a consequence, a complete and sound method for computing minimal conflict sets w.r.t. a DPI ⟨K, B, P, N ⟩Rcan be used to decide validity of K w.r.t.  ⟨·, B, P, N ⟩R. Moreover, such a method can be used to decide whether a given DPI is admissible, i.e. has solutions. For, if a DPI is admissible and the given KB is invalid w.r.t. this DPI, then there cannot be an empty conflict set. In other words, if the empty KB is a conflict set – or, equivalently, an empty conflict set exists w.r.t. a DPI –, then the DPI is not admissible.

Proposition 4.2. Let  ⟨K, B, P, N ⟩Rbe a DPI and K be invalid w.r.t.  ⟨·, B, P, N ⟩R. Then, there exists a minimal conflict set  C ̸= ∅w.r.t.  ⟨K, B, P, N ⟩Riff  ⟨K, B, P, N ⟩Ris admissible.

Proof. Since K is not valid w.r.t.  ⟨·, B, P, N ⟩R, there must be at least one conflict set w.r.t.  ⟨K, B, P, N ⟩Rby Proposition 4.1. Assume that there exists a minimal conflict set  C ̸= ∅w.r.t.  ⟨K, B, P, N ⟩R. This can be true iff  ∅is not a (minimal) conflict set w.r.t.  ⟨K, B, P, N ⟩R. By Corollary 4.1 and Definition 3.3, this is equivalent to the fact that  ∅ ∪ B ∪ UP ≡ B ∪ UPdoes not violate any  r ∈ Rand does not entail any n ∈ N. By Proposition 3.4, this holds iff there exists a diagnosis w.r.t.  ⟨K, B, P, N ⟩R. By Definition 3.6, this is equivalent to  ⟨K, B, P, N ⟩Rbeing admissible.

The following proposition provides information about the relationship between (minimal) conflict sets and the background knowledge as well as the positive test cases.

Proposition 4.3. Let  ⟨K, B, P, N ⟩Rbe a DPI and C a conflict set w.r.t.  ⟨K, B, P, N ⟩R. Then the following holds:

1. C ∩ B = ∅.

2. If C is a minimal conflict set w.r.t.  ⟨K, B, P, N ⟩R, then  C ∩ UP = ∅.

Proof. 1):  C ∩ B = ∅holds since  C ⊆ K(Definition 4.1) and  K ∩ B = ∅(Definition 3.1).

2): Assume that C is a minimal conflict set w.r.t.  ⟨K, B, P, N ⟩Rand  C ∩ UP ̸= ∅. Since C is a conflict set, we have that  C ∪ B ∪ UPviolates some  r ∈ Ror entails some  n ∈ Nby Corollary 4.1 and Definition 3.3. Since  (C \ UP) ∪ B ∪ UP = C ∪ B ∪ UPand  (C \ UP) ⊂ C, this implies that (C \ UP)is a conflict set w.r.t.  ⟨K, B, P, N ⟩Rwhich in turn implies that  C /∈ mC⟨K,B,P,N⟩Rwhich is a contradiction.

4.2 Conflict Sets versus Justifications

The notion of a conflict set is closely related to the notion of a justification [HPS08, HPS09, HPS10, Hor11, HBP11, HPS12a] which is frequently adopted in the field of the Semantic Web (cf. Section 2.2) in order to find minimal explanations for particular entailments in DL ontologies. Thus, the paradigm of a justification can be a useful aid in the debugging of faulty ontologies [Kal06]. Note that sometimes justifications are referred to as MinAs (Minimal Axiom Sets) [BP08] or MUPS (Minimal Unsatisfiability Preserving Sub-TBoxes) [SHCH07] where the latter term is mostly used in the context of ontology debugging. The notion of a (minimal) conflict set, on the other hand, has been mainly adopted in the Diagnosis community [Rei87, dKW87, PW03, WSM02, FFJS04]. In this section we want to establish a relationship between these two widely used instruments for debugging. It will turn out that both terms are strongly related, but in debugging systems like the ones proposed in our work conflict sets are better suited as they automatically focus only on the minimal explanations for faults in a KB.

For example, the author of [Kal06] i.a. discusses the use of justifications to aid the debugging of incoherent ontologies, i.e. ontologies that include unsatisfiable concepts (cf. Section 2.2). If there are multiple unsatisfiable concepts, then some of these might be only unsatisfiable due to the unsatisfiability of another concept. Assume, for instance, an incoherent DL KB K := {A ⊑ B, B ⊑ E ⊓ ¬E}. In K there are two unsatisfiable concepts A and B where A’s unsatisfiability is dependent on B’s unsatisfiability. Using the terminology of [Kal06, Hor11], A would be called a purely derived unsatisfiable concept whereas B would be called a root unsatisfiable concept. This is because the (only) justification for the unsatisfiability of A is JA := K whereas the (only) justification for the unsatisfiability of B is JB = {B ⊑ E ⊓ ¬E} ⊂ JA. Therefore, [Kal06] proposes to resolve root unsatisfiable concepts first since this might resolve some (purely) derived concepts as well, as in this example. However, finding out whether a concept is root or derived involves the computation of justifications for all unsatisfiable concepts in a KB. On the other hand, reliance on minimal conflict sets would implicate a direct focus on the faultiness (in this example: the incoherency) of the KB and not necessarily on the exact explanations of all unsatisfiable concepts that cause the incoherency. In this vein, no justification for a purely derived concept can be a minimal conflict set. So, the computation of minimal conflict sets involves only the determination of those justifications for faults that must necessarily be resolved. Therefore, for the given example, the only minimal conflict set is JB.
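The difference between the two notions in this example can be reproduced with a small sketch. It assumes a toy logic of atomic subsumptions `(sub, sup)`, models B ⊑ E ⊓ ¬E by the equivalent axiom B ⊑ ⊥ (written `("B", "BOT")`), takes coherency as the only requirement, and uses empty B, P and N. The helper `minimal_subsets` is a hypothetical brute-force routine that is only suitable for very small KBs.

```python
from itertools import combinations

BOT = "BOT"  # assumed name for the bottom concept

def reachable(axioms, start):
    seen, stack = {start}, [start]
    while stack:
        cur = stack.pop()
        for sub, sup in axioms:
            if sub == cur and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen

def entails(axioms, axiom):
    sub, sup = axiom
    reach = reachable(axioms, sub)
    return sup in reach or BOT in reach

def coherent(axioms):
    atoms = {a for ax in axioms for a in ax} - {BOT}
    return all(BOT not in reachable(axioms, a) for a in atoms)

def minimal_subsets(universe, has_property):
    """All subset-minimal subsets satisfying a monotone property."""
    found, items = [], sorted(universe)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            s = frozenset(combo)
            if not any(m <= s for m in found) and has_property(s):
                found.append(s)
    return found

K = frozenset({("A", "B"), ("B", BOT)})

# Justifications for the unsatisfiability of A and of B, respectively.
just_A = minimal_subsets(K, lambda s: entails(s, ("A", BOT)))
just_B = minimal_subsets(K, lambda s: entails(s, ("B", BOT)))

# Minimal conflict sets: minimal incoherent subsets of K.
min_conflicts = minimal_subsets(K, lambda s: not coherent(s))
```

The justification for the derived concept A is all of K, whereas the only minimal conflict set coincides with the justification for the root concept B.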

A justification for a given formula (axiom) relative to a KB is a (subset-)minimal subset of the KB that entails the given formula.

Definition 4.2 (Justification for a Formula). [KPHS07] Let K be a KB and  αa formula, both over L. Then  J ⊆ Kis called a justification for  αw.r.t. K, written as  J ∈ Just(α, K), iff  J |= αand for all J′ ⊂ Jit holds that  J′ ̸|= α.

Since we consider test cases which are sets of formulas over L, we generalize the definition of a justification as follows:

Definition 4.3 (Justification for a Set of Formulas). Let K, K′ be KBs over L. Then J ⊆ K is called a justification for K′ w.r.t. K, written as J ∈ Just(K′, K), iff J |= K′ and for all J′ ⊂ J it holds that J′ ̸|= K′.

In order to express the connection between justifications and conflict sets, we require yet another generalization of this definition. To this end, the following definition characterizes a justification for a set X of KBs relative to a KB K as a (subset-)minimal subset of K such that this subset entails some KB in X.

Definition 4.4 (Justification for a Set of Sets of Formulas). Let K be a KB over L and X a set of KBs over L. Then  J ⊆ Kis called justification for X w.r.t. K, written as  J ∈ Just(X, K), iff  J |= K′for some  K′ ∈ Xand for all  J′ ⊂ Jit holds that  J′ ̸|= K′′for all  K′′ ∈ X.

Based on Definition 4.4, the relation between conflict sets and justifications is captured by Proposition 4.4 below. Intuitively, any conflict set w.r.t. ⟨K, B, P, N⟩R is the part of a justification for a fault that is relevant for the debugging task, where fault refers to an inconsistency (and/or incoherency) and/or a negative test case entailed by K ∪ B ∪ UP. Since debugging focuses on the deletion of KB formulas only, “relevant” in this context refers to the subset of the justification that contains no sentences from B or UP, but solely sentences from K. Importantly, there may in general be justifications whose relevant subset is not a minimal conflict set. This case can arise in spite of the set-minimality of justifications because the relevant part of a justification (for some set of sentences K1, e.g. a negative test case n1 ∈ N) may be a superset of the relevant part of another justification (for some other set of sentences K2, e.g. another negative test case n2 ∈ N) although the two justifications are not in a subset relationship (i.e. they contain different sentences from B and/or UP). This circumstance is illustrated by the following example:

Example 4.1 Let a DPI ⟨K, B, P, N⟩R be defined as

image

We have that K ∪ B ∪ UP is consistent and thus no requirement in R is violated. However, both negative test cases are entailed by K ∪ B ∪ UP, wherefore K is invalid w.r.t. ⟨·, B, P, N⟩R. The set of justifications for the violation of the first negative test case is Jn1 = {{A ⊑ B, B ⊑ E}}; for the second one it is Jn2 = {{B ⊑ E, E ⊑ ∃r.G}}. The relevant subset of the justification J1 in Jn1 is J1,rel = {B ⊑ E} (since A ⊑ B is in B), whereas the relevant subset of the justification J2 in Jn2 is J2,rel = {B ⊑ E, E ⊑ ∃r.G}, i.e. J1,rel ⊂ J2,rel even though there is no subset relationship between J1 and J2. Hence, there are two justifications that explain the invalidity of K w.r.t. ⟨·, B, P, N⟩R, but there is only one minimal conflict set C = J1,rel w.r.t. ⟨K, B, P, N⟩R.

So, generally, the set of minimal conflict sets w.r.t. a DPI is a subset of the set of relevant parts of justifications for faults in K ∪ B ∪ UP, and not every such relevant part is a minimal conflict set. This is due to the focus on just the parts of justifications that are relevant for the KB debugging task.
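The step from justifications to minimal conflict sets can be sketched with plain set operations. The snippet below is only an illustration: the DPI of Example 4.1 is given in a figure that is not reproduced here, so it assumes that K contains exactly the two axioms named in the text and that UP is empty. The helper names `relevant_part` and `minimal_sets` are hypothetical.

```python
# Axioms are represented as plain strings.
# Assumption: K holds only the axioms mentioned in Example 4.1, U_P is empty.
K = frozenset({"B ⊑ E", "E ⊑ ∃r.G"})
B = frozenset({"A ⊑ B"})
U_P = frozenset()

# The two justifications stated in Example 4.1.
justifications = [
    frozenset({"A ⊑ B", "B ⊑ E"}),      # J1, explains negative test case n1
    frozenset({"B ⊑ E", "E ⊑ ∃r.G"}),   # J2, explains negative test case n2
]


def relevant_part(justification, kb, u_p):
    """Return (J ∩ K) \\ U_P, the part of J that debugging may delete."""
    return (justification & kb) - u_p


def minimal_sets(sets):
    """Keep only those sets that have no proper subset among the given sets."""
    unique = set(sets)
    return {s for s in unique if not any(t < s for t in unique)}


relevant = [relevant_part(j, K, U_P) for j in justifications]
minimal_conflicts = minimal_sets(relevant)
```

Under these assumptions, both relevant parts are conflict sets, but only the relevant part of J1 survives the minimality filter. This is the post-processing needed when conflict sets are derived from justifications.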

Proposition 4.4. Let ⟨K, B, P, N⟩R be a DPI. Additionally, let

image

(b) X := {{⊤ ⊑ ⊥}} ∪ N if R = {consistency}.17

Then the following holds:

1. If C is a minimal conflict set w.r.t. ⟨K, B, P, N⟩R, then there is some J ∈ Just(X, K ∪ B ∪ UP) such that (J ∩ K) \ UP = C.

2. For all J ∈ Just(X, K ∪ B ∪ UP) it is true that C := (J ∩ K) \ UP is a conflict set w.r.t. ⟨K, B, P, N⟩R, but not necessarily a minimal one.

Proof. 1): Assume that C ∈ mC⟨K,B,P,N⟩R and for all J ∈ Just(X, K ∪ B ∪ UP) it holds that (J ∩ K) \ UP ≠ C. There are two cases to distinguish: (a) there is some sentence in (J ∩ K) \ UP that is not in C and (b) there is some sentence in C that is not in (J ∩ K) \ UP.

Let us first assume (a), i.e. for all J ∈ Just(X, K ∪ B ∪ UP) it holds that there is some sentence ax in (J ∩ K) \ UP that is not in C. Additionally, assume there is a J ∈ Just(X, K ∪ B ∪ UP) such that J ⊆ C ∪ B ∪ UP. We can write J as J = S1 ∪ S2 ∪ S3 for S1 := [(J ∩ K) \ UP], S2 := [J ∩ B] and S3 := [J ∩ UP]. Since J = S1 ∪ S2 ∪ S3 ⊆ C ∪ B ∪ UP it must hold in particular that S1 ⊆ C ∪ B ∪ UP and therefore ax ∈ C ∪ B ∪ UP. However, ax ∉ C by assumption, ax ∉ B since ax ∈ K and B ∩ K = ∅, and ax ∉ UP since ax ∈ S1 and S1 ∩ UP = ∅. This is a contradiction. Hence, for all J ∈ Just(X, K ∪ B ∪ UP) it holds that J ⊈ C ∪ B ∪ UP. Since X captures all r ∈ R and n ∈ N, we can conclude that C is not a conflict set w.r.t. ⟨K, B, P, N⟩R, which is a contradiction to C ∈ mC⟨K,B,P,N⟩R.

Let us now assume (b), i.e. for all J ∈ Just(X, K ∪ B ∪ UP) it holds that there is some sentence ax in C that is not in (J ∩ K) \ UP. Since C is a conflict set and since X captures all r ∈ R and n ∈ N, we have that C ∪ B ∪ UP |= K′ for some K′ ∈ X. So, there must be some J0 ∈ Just(X, K ∪ B ∪ UP) such that J0 ⊆ C ∪ B ∪ UP. As C ∈ mC⟨K,B,P,N⟩R, there cannot be any J ∈ Just(X, K ∪ B ∪ UP) with J ⊆ C′ ∪ B ∪ UP for arbitrary C′ ⊂ C. This must hold in particular for J0, which implies that J0 ∩ C = C, which is equivalent to C ⊆ J0. As (1) C ⊆ K (Definition 4.1) and, by Proposition 4.3 and by the fact that C ∈ mC⟨K,B,P,N⟩R, (2) C ∩ UP = ∅, we can conclude that C ⊆ (J0 ∩ K) \ UP. This contradicts the assumption that some ax in C is not in (J0 ∩ K) \ UP.

2): If J ∈ Just(X, K ∪ B ∪ UP), then, by Definition 4.4, J |= K′ for some K′ ∈ X and J ⊆ K ∪ B ∪ UP. So, [(J ∩ K) \ UP] ∪ B ∪ UP = (J ∩ K) ∪ B ∪ UP ⊇ J, wherefore [(J ∩ K) \ UP] ∪ B ∪ UP |= K′ by monotonicity of L. As K′ ∈ X and X captures all the reasons why some r ∈ R or some n ∈ N may not be fulfilled (cf. the discussion in Chapter 3), we have that [(J ∩ K) \ UP] ∪ B ∪ UP violates some r ∈ R or entails some n ∈ N. This implies that [(J ∩ K) \ UP] ∪ UP ∉ Sol⟨K,B,P,N⟩R. Since (J ∩ K) \ UP ⊆ K is also true, (J ∩ K) \ UP ∈ aC⟨K,B,P,N⟩R by Definition 4.1.

To see that (J ∩ K) \ UP ∉ mC⟨K,B,P,N⟩R can hold in general, reconsider Example 4.1 where (J2 ∩ K) \ UP = J2 ⊃ C holds for the justification J2 and the minimal conflict set C.

4.3 The Relation between Conflict Sets and Diagnoses

A minimal conflict set has the property that deletion of any formula in it yields a set of formulas which is correct in the context of B, P, N and R.

Proposition 4.5. If C is a minimal conflict set w.r.t. ⟨K, B, P, N⟩R, then C′ is valid w.r.t. ⟨·, B, P, N⟩R for each C′ ⊂ C.

Proof. Since C ∈ mC⟨K,B,P,N⟩R, it must hold that C′ ∉ aC⟨K,B,P,N⟩R. Then, by Corollary 4.1, C′ is valid w.r.t. ⟨·, B, P, N⟩R.

Hence, by deletion of at least one formula from each minimal conflict set w.r.t. ⟨K, B, P, N⟩R, a valid KB can be obtained from K. Thus, a solution KB (K \ D) ∪ UP can be obtained by calculation of a hitting set D of all minimal conflict sets in mC⟨K,B,P,N⟩R. A hitting set is defined as follows:

Definition 4.5 (Hitting Set). Let S = {S1, . . . , Sn} be a set of sets. Then, H is called a hitting set of S iff H ⊆ US and H ∩ Si ≠ ∅ for all i = 1, . . . , n. A hitting set H of S is minimal iff there is no hitting set H′ of S such that H′ ⊂ H.

Proposition 4.6. [FS05] A (minimal) diagnosis w.r.t. the DPI ⟨K, B, P, N⟩R is a (minimal) hitting set of all minimal conflict sets w.r.t. ⟨K, B, P, N⟩R.

Now, we consider two example DPIs and analyze them regarding their minimal conflict sets and minimal diagnoses:

Example 4.2 In this example, we analyze the PL DPI ⟨K, B, P, N⟩R given by Table 4.1. There are two minimal conflict sets w.r.t. ⟨K, B, P, N⟩R, i.e. mC⟨K,B,P,N⟩R = {C1, C2} = {⟨1, 2, 5⟩, ⟨1, 2, 7⟩}.18

Why is C1 a conflict set w.r.t. ⟨K, B, P, N⟩R? We recall Definition 4.1 and argue as follows to deduce the entailment C1 |= n1 where n1 ∈ N (left of the colon: the formulas used in the deduction are underlined; right of the colon: the relevant implications are underlined):

image

Minimality of C1 is obvious from this argumentation, i.e. we cannot deduce n1 if any one of the formulas 1, 2 or 5 is omitted, and there is no other fault except for the violation of n1.

Why is C2 a conflict set w.r.t. ⟨K, B, P, N⟩R? We recall Definition 4.1 and argue as follows to deduce the entailment C2 ∪ B |= n1 where n1 ∈ N (left of the colon: the formulas used in the deduction are underlined; right of the colon: the relevant implications are underlined):

image

Minimality of C2 is obvious from this argumentation, i.e. we cannot deduce n1 if any one of the formulas 1, 2 or 7 is omitted, and there is no other fault except for the violation of n1. There are no further minimal conflict sets w.r.t. ⟨K, B, P, N⟩R. This is fairly easy to see since

• K ∪ B ∪ UP = K ∪ B cannot be inconsistent due to the fact that the only negative literal occurring on the righthand side of an implication is ¬A and A does not occur at the righthand side of any implication in K ∪ B,

• there is no other way to deduce n1 than using a superset of the formulas in C1 or C2 and

• n1 is the only negative test case in N.

Hence, the set of all minimal diagnoses mD⟨K,B,P,N⟩R = {D1, D2, D3} = {[1], [2], [5, 7]} is obtained by computing all minimal hitting sets of mC⟨K,B,P,N⟩R = {C1, C2} (cf. Proposition 4.6).
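The hitting set computation for this small example can be reproduced by brute force. The following is a minimal sketch that enumerates candidates by increasing size. It is not the hitting set tree algorithm HS discussed later, and the function names are hypothetical.

```python
from itertools import combinations


def is_hitting_set(h, sets):
    """Definition 4.5: h lies within the union of the sets and intersects each one."""
    universe = set().union(*sets)
    return h <= universe and all(h & s for s in sets)


def minimal_hitting_sets(sets):
    """Enumerate all minimal hitting sets by increasing cardinality (exponential)."""
    sets = [frozenset(s) for s in sets]
    universe = sorted(set().union(*sets))
    found = []
    for size in range(len(universe) + 1):
        for candidate in combinations(universe, size):
            h = frozenset(candidate)
            # Skip supersets of hitting sets found earlier: they cannot be minimal.
            if any(m <= h for m in found):
                continue
            if is_hitting_set(h, sets):
                found.append(h)
    return found


# Minimal conflict sets of Example 4.2, with formulas referred to by their numbers.
C1, C2 = {1, 2, 5}, {1, 2, 7}
diagnoses = minimal_hitting_sets([C1, C2])
```

For the two conflict sets of Example 4.2 this yields the three minimal diagnoses [1], [2] and [5, 7] stated above. The exhaustive enumeration is only feasible for very small inputs.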

Example 4.3 In this example, we analyze the DL DPI ⟨K, B, P, N⟩R given by Table 4.2. There are four minimal conflict sets w.r.t. ⟨K, B, P, N⟩R, i.e.

image

Why is C1 a conflict set w.r.t. ⟨K, B, P, N⟩R? We recall Definition 4.1 and argue as follows to deduce the entailment C1 |= n1 where n1 ∈ N (left of the colon: the formulas used in the deduction are underlined; right of the colon: the relevant implications are underlined):

image

Minimality of C1 follows from this argumentation, i.e. we cannot deduce n1 if any one of the formulas 1, 2 or 5 is omitted, and from the fact that we cannot deduce an incoherency (r2), inconsistency (r1) or the entailment of any other negative test case n ∈ N for any KB C′1 ∪ B ∪ UP for any C′1 ⊂ C1.

Why is C2 a conflict set w.r.t. ⟨K, B, P, N⟩R? We recall Definition 4.1 and argue as follows to deduce that C2 ∪ B is incoherent and thus violates the requirement r2 ∈ R (left of the colon: the formulas used in the deduction are underlined; right of the colon: the relevant implications are underlined):

image

Since we cannot deduce an incoherency (r2), inconsistency (r1) or the entailment of any negative test case n ∈ N for any KB C′2 ∪ B ∪ UP for any C′2 ⊂ C2, the minimality of C2 follows.

Why is C3 a conflict set w.r.t. ⟨K, B, P, N⟩R? We recall Definition 4.1 and argue as follows to deduce that C3 ∪ B ∪ UP is inconsistent and thus violates the requirement r1 ∈ R (left of the colon: the formulas used in the deduction are underlined; right of the colon: the relevant implications are underlined):

image

No inconsistency (r1) or incoherency (r2) can be derived and no negative test case n ∈ N is entailed from any C′3 ∪ B ∪ UP for C′3 ⊂ C3. Hence, C3 is a minimal conflict set w.r.t. ⟨K, B, P, N⟩R.

Why is C4 a conflict set w.r.t. ⟨K, B, P, N⟩R? We recall Definition 4.1 and argue as follows to deduce the entailment C4 ∪ B |= n2 where n2 ∈ N (left of the colon: the formulas used in the deduction are underlined; right of the colon: the relevant implications are underlined):

image

No inconsistency (r1) or incoherency (r2) can be derived and no negative test case n ∈ N is entailed from any C′4 ∪ B ∪ UP for C′4 ⊂ C4. Thus, C4 is a minimal conflict set w.r.t. ⟨K, B, P, N⟩R.

Hence, the set of all minimal diagnoses mD⟨K,B,P,N⟩R, obtained by computing all minimal hitting sets of mC⟨K,B,P,N⟩R = {C1, C2, C3, C4} (cf. Proposition 4.6), comprises ten minimal diagnoses Di for i = 1, . . . , 10:

image

Although the DPI ⟨K, B, P, N⟩R is very small, i.e. the number of formulas occurring in it is very small, the reader might agree that it is not trivial (1) to realize which subsets of this KB K are (minimal) conflict sets, (2) to see that or why a subset of this KB K along with the background knowledge B and the union of the positive test cases UP is a (minimal) conflict set (cf. [HBP11]), and (3) to assess that there are no further minimal conflict sets w.r.t. ⟨K, B, P, N⟩R. This example gives an impression that tool assistance in the debugging of KBs is indispensable, especially for real-world KBs that are huge in size and/or complex in terms of the expressivity of the used logic or in terms of their “debugging properties”, i.e. a large number and/or size of minimal conflict sets and/or minimal diagnoses.

A means to handle problems (1) and (3) is provided by some method for the computation of a minimal conflict set (e.g. QX given by Algorithm 1 below, see Section 4.4.1) coupled with a hitting set tree algorithm (e.g. HS described by Algorithm 2 below, see Section 4.5) for the systematic computation of different minimal conflict sets, or other mechanisms such as the ALL_JUST_ALG presented in [KPHS07] which computes all justifications for some particular entailment (but, some post-processing of the justifications is necessary to obtain minimal conflict sets, cf. Section 4.2).

Problem (2) and its complexity for humans has been studied in [HBP11] with a focus on justifications in DL or OWL KBs. Since a minimal conflict set can be regarded as the relevant (i.e. potentially faulty) part of a justification for some undesired entailment (i.e. a violated requirement or test case) as we analyzed in Section 4.2, the cognitive complexity model proposed by [HBP11] applies also to minimal conflict sets. Ways to facilitate the understanding of justifications for humans (that might be successfully applied also to conflict sets) have been addressed in [HPS10, HPS09, HPS08]. Moreover, the ontology editing browser SWOOP [KPS+06] is equipped with a strikeout feature [Kal06] that highlights parts of justifications that are relevant for the entailment by striking out all irrelevant parts. This is more or less the automation of our analyses of the conflict sets by underlining the relevant parts of the formulas in this example and Example 4.2.

image

Table 4.1: Propositional Logic Example DPI

4.4 Methods for Diagnosis Computation

Two common methods employed for the computation of (minimal) diagnoses [SFFR12, RSFF13] are the QuickXPlain algorithm [Jun04] (in short QX) and a hitting set search tree [Rei87, GSW89] (in short HS). Thereby, QX serves as a deterministic method for computing one minimal conflict set w.r.t. a given DPI ⟨K, B, P, N⟩R per call. Since a diagnosis is a hitting set of all minimal conflict sets, more than one minimal conflict set is generally required to compute a diagnosis. Due to its determinism, however, QX always computes the same minimal conflict set for the same input DPI. Thus, in order to compute different (or all) minimal conflict sets, the input to QX needs to be varied accordingly. This can be done by means of HS which serves as a search tree to systematically and successively explore all minimal conflict sets w.r.t. an initially given DPI. Note that often not all minimal conflict sets w.r.t. a DPI are necessary to obtain a minimal diagnosis w.r.t. this DPI. This is the case when different minimal conflict sets overlap, i.e. have a non-empty intersection. In the extreme case, when all minimal conflict sets w.r.t. a DPI share some formula, the computation of any single minimal conflict set can suffice to obtain a minimal diagnosis, which is actually even a minimum cardinality diagnosis.
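The extreme case mentioned last can be illustrated with the two minimal conflict sets of Example 4.2, which share the formulas 1 and 2. The sketch below is only an illustration with hypothetical helper names. It shows that each shared formula of a single conflict set already forms a one-element hitting set of all conflict sets.

```python
# Minimal conflict sets of Example 4.2, formulas referred to by their numbers.
conflicts = [frozenset({1, 2, 5}), frozenset({1, 2, 7})]


def hits_all(candidate, sets):
    """True iff the candidate has a non-empty intersection with every set."""
    return all(candidate & s for s in sets)


# Formulas contained in every minimal conflict set.
shared = frozenset.intersection(*conflicts)

# Looking only at the first conflict set: each of its shared formulas
# yields a singleton that hits all conflict sets.
first = conflicts[0]
singleton_diagnoses = [frozenset({ax}) for ax in sorted(first) if ax in shared]
```

Since every conflict set here is non-empty, no hitting set can be smaller than one element, so these singletons are of minimum cardinality. Which formulas of a single conflict set are shared is, of course, not known before the other conflict sets have been computed.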

Another approach for computing a minimal conflict set (or justification) is the “expand-and-shrink” algorithm presented in [KPHS07]. However, empirical evaluations and a theoretical analysis of the best and worst case complexity of the “expand-and-shrink” method compared to QX performed in [SFJ08] revealed that the latter is preferable over the former.

Also, alternative strategies for the computation of minimal diagnoses have been suggested. One common method is to avoid the indirection of diagnosis computation via minimal conflict sets and use algorithms that determine diagnoses directly [SU06], i.e. without the necessity to compute conflict sets. This approach has been applied for the non-interactive debugging of ontologies [DQPS11] and constraints [FSZ11]. In our previous work, we adopted such a direct technique for the interactive debugging of KBs [SFRF14c]. The reason why we stick to the conflict-based approach in this work is that we want to present best-first algorithms that figure out minimal diagnoses in descending order of their probability. This is not (systematically) realizable with a direct approach.

image

Table 4.2: Description Logic Example DPI

4.4.1 Computation of a Minimal Conflict Set

The QX algorithm takes a DPI ⟨Korig, Borig, P, N⟩R over some monotonic logic L as input and returns a minimal conflict set C ⊆ Korig w.r.t. ⟨Korig, Borig, P, N⟩R as output, if some conflict set exists for the DPI, and ’no conflict’ otherwise.

Monotonic Properties. Basically, QX can be employed to find for an input set X a set-minimal subset Xmin ⊆ X that has a certain property prop for problems of completely different nature such as propositional unsatisfiability or over-constrainedness of constraint satisfaction problems. The only postulated prerequisite for QX to work correctly is that prop is a monotonic property. A property is monotonic if and only if the binary function that returns 1 if the property holds for the input set and 0 otherwise is a monotonic function.

Definition 4.6 (Binary Monotonic Function). Let X be a set and f : 2^X → {0, 1} be a binary function defined for all subsets of X. Then, f is monotonic iff

image

So, prop is monotonic iff, given that prop holds for some set X′, it follows that prop also holds for any superset X′′ of X′. Note that, by simple logical transformation, an equivalent statement can be derived from Definition 4.6; namely that, given that prop does not hold for some set X′′, it follows that prop does not hold for any subset X′ of X′′ either.
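The displayed condition of Definition 4.6 is not reproduced in this text. A reconstruction that is consistent with the verbal paraphrase above (an assumption, not the original typesetting) is given below, followed by the equivalent contrapositive form:

```latex
\[
  \forall X', X'' \subseteq X:\quad
  X' \subseteq X'' \;\Longrightarrow\; f(X') \le f(X'')
\]
\[
  \forall X', X'' \subseteq X:\quad
  \bigl( X' \subseteq X'' \,\wedge\, f(X'') = 0 \bigr) \;\Longrightarrow\; f(X') = 0
\]
```

Since f only takes the values 0 and 1, the inequality f(X′) ≤ f(X′′) states exactly that f(X′) = 1 implies f(X′′) = 1.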

As inconsistency and incoherency as well as the entailment of some  n ∈ Nover some monotonic language L are clearly monotonic properties, the following proposition holds.

Proposition 4.7. Let ⟨K, B, P, N⟩R be a DPI. Then, the invalidity of K′ ⊆ K w.r.t. ⟨·, B, P, N⟩R (as per Definition 3.3) is a monotonic property.

By Corollary 4.1, a (minimal) conflict set w.r.t. ⟨K, B, P, N⟩R is a (minimal) invalid sub-KB of K w.r.t. ⟨·, B, P, N⟩R. Therefore:

Corollary 4.2. Let ⟨K, B, P, N⟩R be a DPI. Then, being a conflict set w.r.t. ⟨K, B, P, N⟩R is a monotonic property.

Thus, QX is applicable for the problem of finding a minimal conflict set w.r.t. a DPI. As we shall see later in Chapter 8, another monotonic property will enable us to apply QX also for the minimization of queries asked to an interacting user in the interactive debugging of KBs.
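For very small sets, monotonicity of a property can be checked exhaustively. The following sketch does so for a toy stand-in of invalidity: a set of propositional implications is called invalid if, together with a fixed fact, it derives a forbidden atom. The rule encoding, the atom names and all function names are assumptions made for illustration only.

```python
from itertools import combinations


def subsets(items):
    """Yield all subsets of the given collection as frozensets."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def is_monotonic(prop, universe):
    """Check: whenever prop holds for a set, it holds for every superset."""
    all_subsets = list(subsets(universe))
    return all(
        prop(big)
        for small in all_subsets if prop(small)
        for big in all_subsets if small <= big
    )


def derives(rules, facts, goal):
    """Forward chaining over rules (body, head) with single-atom bodies."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if body in known and head not in known:
                known.add(head)
                changed = True
    return goal in known


# Toy KB: a -> b, b -> e, b -> c; the atom "c" plays the role of a negative test case.
RULES = {("a", "b"), ("b", "e"), ("b", "c")}


def invalid(kb):
    return derives(kb, {"a"}, "c")


def exactly_two(kb):
    return len(kb) == 2
```

Here `invalid` passes the check, in line with Proposition 4.7 for a monotonic logic, whereas a property such as `exactly_two` fails it and would therefore not be a suitable input property for QX.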

How QX (Algorithm 1) Works. After verifying that the trivial cases, i.e. Korig is already a valid KB w.r.t. ⟨·, Borig, P, N⟩R or Korig = ∅, are not met, a non-empty minimal conflict set w.r.t. ⟨Korig, Borig, P, N⟩R must exist. So, the algorithm enters the recursive procedure QX′(∅, ⟨Korig, Borig, P, N⟩R). Note that the parameters P, N, R of QX′ are used for validity tests (ISKBVALID, line 9) only and are maintained invariant during the entire recursive execution. In case Korig is not a singleton, i.e. it does not hold for sure that Korig is a subset of a minimal conflict set w.r.t. ⟨Korig, Borig, P, N⟩R, the idea is to apply a divide-and-conquer strategy to split Korig into two subproblems and solve one subproblem first, i.e. find a minimal conflict set for this subproblem, and then the second subproblem. The union of the minimal conflict sets found for the subproblems is then a minimal conflict set for the original problem. This division into smaller problems is recursively executed for each subproblem until the trivial case, i.e. the KB of the subproblem that is analyzed includes only one element, occurs. Then this element is an element of a minimal conflict set w.r.t. the original problem.

Simply put, one can imagine that QX takes Korig, partitions it into K1 and K2 and first considers the DPI with KB K2 and background knowledge B ∪ K1 (line 16). If the latter already includes a conflict set (second condition in line 9), then K2 can be safely discarded and does not need to be further considered. Instead, K1 is further investigated, i.e. the DPI with KB K1,2 and background knowledge B ∪ K1,1 where K1,1 and K1,2 partition K1. Notice that, in this way, |K2| sentences can be dismissed by a single call to ISKBVALID which is the only function in Algorithm 1 that calls a reasoner.

If, on the other hand, B ∪ K1 includes no conflict set, K2 is partitioned into K2,1 and K2,2 and the two DPIs, the first with KB K2,2 and background knowledge B ∪ K1 ∪ K2,1 and the second with KB K2,1 and background knowledge B ∪ K1 ∪ C2,2, are recursively analyzed where C2,2 is the result computed for the first DPI.

This recursion is executed until encountering a trivial case, i.e. a leaf node of the recursion tree, along each path. Then, the recursion unwinds by building the union of all leaf nodes, i.e. the union of all returned sets for subproblems where a trivial case occurred.
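The recursion described above can be sketched compactly. The snippet below is a hedged reading of the prose, not a transcription of Algorithm 1: the test cases P, N and the requirements R are folded into a caller-supplied predicate `is_valid` (standing in for ISKBVALID), the split point is assumed to be ⌊|K|/2⌋, and the names `qx`, `_qx_rec` and `toy_is_valid` as well as the toy formulas are hypothetical.

```python
def qx(kb, background, is_valid):
    """Return one minimal conflict set of kb, or None for 'no conflict'.

    kb is an ordered list of formulas, background a frozenset, and is_valid
    a predicate on a frozenset of formulas (assumed to encode P, N and R).
    """
    if is_valid(frozenset(kb) | background):
        return None                      # KB already valid: no conflict set
    if not kb:
        return frozenset()               # only the empty conflict set exists
    return _qx_rec(frozenset(), kb, background, is_valid)


def _qx_rec(c, kb, background, is_valid):
    # c holds the formulas that were most recently added to the background.
    if c and not is_valid(background):
        return frozenset()               # background alone is invalid: discard kb
    if len(kb) == 1:
        return frozenset(kb)             # singleton belongs to the conflict set
    k = len(kb) // 2                     # assumed split function
    k1, k2 = kb[:k], kb[k:]
    c2 = _qx_rec(frozenset(k1), k2, background | frozenset(k1), is_valid)
    c1 = _qx_rec(c2, k1, background | c2, is_valid)
    return c1 | c2


def toy_is_valid(formulas):
    """Toy validity test: the atom 'c' must not be derivable.

    A formula is a pair (body, head); body None marks a fact.
    """
    known = {head for body, head in formulas if body is None}
    changed = True
    while changed:
        changed = False
        for body, head in formulas:
            if body in known and head not in known:
                known.add(head)
                changed = True
    return "c" not in known


ax1, ax2, ax3, ax4, ax5 = ("a", "b"), ("b", "e"), ("b", "c"), ("e", "g"), ("g", "h")
conflict = qx([ax1, ax2, ax3, ax4, ax5], frozenset({(None, "a")}), toy_is_valid)
```

In this toy setting the forbidden atom can only be derived via `ax1` and `ax3`, so these two formulas form the single minimal conflict set. All reasoning is confined to `is_valid`, mirroring the remark that ISKBVALID is the only place where a reasoner is called.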

The next example illustrates one execution of QX which computes one minimal conflict set:

Example 4.4 Let us consider the DL example DPI depicted by Table 4.3. We will now demonstrate how a minimal conflict set is computed by Algorithm 1 (see Fig. 4.1). Since K is not a valid KB w.r.t. the DPI and not the empty set (conditions in lines 2 and 4 are false), QX′(∅, ⟨K, B, P, N⟩R) is called in line 7. This call is illustrated by the root node (node ①) of the recursion tree given in Fig. 4.1 (whereas the evaluations made by QX prior to this call are not depicted in the figure). Notice that each node in the tree shows only the values of C, K and B since all other parameters P, N and R are invariant throughout the entire execution of Algorithm 1.

Due to the fact that C = ∅ and K includes five formulas and is thus not a singleton, K = {ax1, . . . , ax5} is partitioned into K1 = {ax1, ax2, ax3} and K2 = {ax4, ax5} and QX′ is recursively called in line 16 with parameters C = K1, K = K2 and B = B ∪ {ax1, ax2, ax3} which is expressed in the figure by a left branch to node ②. This call, however, returns ∅ directly since B ∪ {ax1, ax2, ax3} is already invalid w.r.t. ⟨·, ∅, P, N⟩R because B ∪ {ax1, ax2, ax3} ∪ UP = {A(w), A(v), s(v, w)} ∪ {A ⊑ B, B ⊑ E, B ⊑ D ⊓ ¬∃s.C} ∪ {{B(w)}} |= {¬C(w)} which is a negative test case, i.e. must not be entailed by a solution KB w.r.t. the input DPI (the parts of the formulas relevant for the entailment to hold are underlined). Returning ∅ in this case means discarding K2 = {ax4, ax5}.

Algorithm 1 QX: Computation of a Minimal Conflict Set

image

So, the algorithm opens a right branch from the root to node ③ by calling QX′ (line 17) with parameters C = ∅ (result of left branch), K = K1 = {ax1, ax2, ax3} and B = B. During the execution of this call K1 is partitioned into {ax1, ax2} (left branch to node ④) and {ax3} (right branch to node ⑤). In node ④, it holds that B ∪ {ax1, ax2} can be extended to a solution KB by adding UP, i.e. B ∪ {ax1, ax2} is valid. As it is already an established fact since the execution of node ② that B ∪ {ax1, ax2, ax3} is invalid, it must be the case that ax3 is an element of a minimal conflict set w.r.t. the input DPI (as there is a conflict set w.r.t. the input DPI in {ax1, ax2, ax3}, but there is none in {ax1, ax2}). The algorithm accounts for that by checking whether K is a singleton (line 11) in which case it is guaranteed that K is a subset of a minimal conflict set w.r.t. the input DPI. So, node ④ returns {ax3}. This procedure is continued until each path from the root node reaches a node where a trivial case is met. Then the recursion unwinds and, when arrived at the root node, the minimal conflict set ⟨ax1, ax3⟩ is returned.

That C := ⟨ax1, ax3⟩ is indeed a conflict set can be recognized easily by the underlinings in the formulas given before. Minimality is given since B ∪ C ∪ UP is neither inconsistent nor incoherent and the deletion of any formula from C breaks the entailment of n1. Hence, QX has returned a sound output.

image

Table 4.3: Description Logic Example DPI 2

The complexity of Algorithm 1 in terms of the number of calls to the function ISKBVALID, which is the only place in the algorithm where a reasoning service is consulted, is captured by the following proposition.

Proposition 4.8 (Complexity of QX). [Jun04] Let ⟨K, B, P, N⟩R be a DPI and the function SPLIT (line 13 of Algorithm 1) be defined as SPLIT(n) = ⌊n/2⌋ where n is a natural number. Then, the worst case number of calls to ISKBVALID during one call to QX(⟨K, B, P, N⟩R) is in O(|C| log(|K|/|C|)) where C is the output of QX(⟨K, B, P, N⟩R).

For any other definition of the function SPLIT, the worst case number of ISKBVALID invocations gets larger.

4.4.2 Correctness of Conflict Set Computation

This section is dedicated to the proof of correctness of Algorithm 1. First, we show some essential properties of QX by various Lemmata which will finally be exploited to demonstrate the overall soundness of QX.

The QX algorithm accepts a DPI ⟨Korig, Borig, P, N⟩R over some monotonic language L as input and returns a minimal conflict set C ⊆ Korig w.r.t. ⟨Korig, Borig, P, N⟩R as output. First, the algorithm checks whether Korig is a valid KB w.r.t. the input DPI ⟨·, Borig, P, N⟩R (line 2). If so, there is no conflict set for the DPI by Proposition 4.1 and the algorithm returns ’no conflict’. Otherwise, the test Korig = ∅ is performed (line 4). If so, then the negative outcome of the validity test executed in line 2 actually means that one of the two criteria of Proposition 3.4 is violated which, by Definition 3.6, implies that the DPI is not admissible. Invalidity of Korig w.r.t. ⟨·, Borig, P, N⟩R and non-admissibility of ⟨Korig, Borig, P, N⟩R mean that there is only one minimal conflict set C = ∅ by Proposition 4.2. Thus, ∅ is returned in line 5.

Lemma 4.1. Let ⟨K, B, P, N⟩R be an admissible DPI and K be invalid w.r.t. ⟨·, B, P, N⟩R. Then, there is a minimal conflict set C ⊃ ∅ w.r.t. ⟨K, B, P, N⟩R.

Proof. The lemma is a direct consequence of Proposition 4.2.

image

Figure 4.1: Recursion tree produced during the computation of the minimal conflict set ⟨ax1, ax3⟩ w.r.t. the DPI shown by Table 4.3 using Algorithm 1. Nodes in the depicted tree represent calls QX′(C, ⟨K, B, P, N⟩R) and are written in the format “C, K, B” followed by a circled counter k, where k starts from 1 and indicates when the respective call is made. A recursive call to QX′ (left branch = call in line 16; right branch = call in line 17) is denoted by a normal arrow whereas the return of a set is visualized by a dashed arrow.

So, if both initial tests (lines 2 and 4) are negative, then, by Lemma 4.1, there is a non-trivial minimal conflict set w.r.t. ⟨Korig, Borig, P, N⟩R wherefore the algorithm enters the recursion by a call to the procedure QX′.

The argumentation so far proves the following lemma.

Lemma 4.2.

1. QX(⟨K, B, P, N⟩R) returns ’no conflict’ iff there is no (minimal) conflict w.r.t. ⟨K, B, P, N⟩R.

2. QX(⟨K, B, P, N⟩R) returns ∅ iff ∅ is the only (minimal) conflict w.r.t. ⟨K, B, P, N⟩R.

3. QX(⟨K, B, P, N⟩R) returns QX′(∅, ⟨K, B, P, N⟩R) iff there is some minimal conflict C ⊃ ∅ w.r.t. ⟨K, B, P, N⟩R.

Corollary 4.3. QX(⟨K, B, P, N⟩R) returns QX′(∅, ⟨K, B, P, N⟩R) iff ⟨K, B, P, N⟩R is an admissible DPI.

Proof. By the third proposition of Lemma 4.2 and Proposition 4.1 we have that QX(⟨K, B, P, N⟩R) returns QX′(∅, ⟨K, B, P, N⟩R) iff K is invalid w.r.t. ⟨·, B, P, N⟩R. By Proposition 4.2, we can then conclude that QX(⟨K, B, P, N⟩R) returns QX′(∅, ⟨K, B, P, N⟩R) iff ⟨K, B, P, N⟩R is an admissible DPI.

The input arguments (at any call) to QX′ are (a) some subset C of the original input KB Korig to QX and (b) a DPI ⟨K, B, P, N⟩R where K ⊆ Korig and B ⊇ Borig.

The principle of QX′ relies on the following fact.

Lemma 4.3. [Jun04] Let K1, K2 be a partition of K. If C2 is a minimal conflict set w.r.t. ⟨K2, B ∪ K1, P, N⟩R and C1 is a minimal conflict set w.r.t. ⟨K1, B ∪ C2, P, N⟩R, then C1 ∪ C2 is a minimal conflict set w.r.t. ⟨K1 ∪ K2, B, P, N⟩R = ⟨K, B, P, N⟩R.

Proof. Since C1 is a minimal conflict set w.r.t. ⟨K1, B ∪ C2, P, N⟩R, we have that C1 is invalid w.r.t. ⟨·, B ∪ C2, P, N⟩R. From that we obtain that C1 ∪ C2 must be invalid w.r.t. ⟨·, B, P, N⟩R. Further on, by the fact that K1, K2 partition K we have that C1 ⊆ K1 ⊆ K since C1 is a minimal conflict set w.r.t. ⟨K1, B ∪ C2, P, N⟩R and C2 ⊆ K2 ⊆ K since C2 is a minimal conflict set w.r.t. ⟨K2, B ∪ K1, P, N⟩R. Consequently, C1 ∪ C2 ⊆ K must be true. So, by Corollary 4.1, C1 ∪ C2 is a conflict set w.r.t. ⟨K, B, P, N⟩R.

To show the minimality of C1 ∪ C2, assume that C ⊂ C1 ∪ C2 is a minimal conflict set w.r.t. ⟨K, B, P, N⟩R. Due to K1 ∩ K2 = ∅ and C1 ⊆ K1 and C2 ⊆ K2, it must hold that C1 ∩ C2 = ∅. Thus, (1) C ∩ C1 ⊂ C1 or (2) C ∩ C2 ⊂ C2.

Let us assume (1) holds. Then, C is invalid w.r.t. ⟨·, B, P, N⟩R, i.e. C ∪ B ∪ UP = (C′1 ∪ C2) ∪ B ∪ UP = C′1 ∪ (B ∪ C2) ∪ UP violates some r ∈ R or some n ∈ N where C′1 ⊂ C1. This, however, is a contradiction to the minimality of the conflict set C1 w.r.t. ⟨K1, B ∪ C2, P, N⟩R.

Now, let us assume (2) holds. Then, C is invalid w.r.t. ⟨·, B, P, N⟩R, i.e. C ∪ B ∪ UP = (C1 ∪ C′2) ∪ B ∪ UP violates some r ∈ R or some n ∈ N where C′2 ⊂ C2. By monotonicity of L and C1 ⊆ K1, this implies C′2 ∪ (K1 ∪ B) ∪ UP violates some r ∈ R or some n ∈ N, i.e. C′2 ⊂ K2 is a conflict set w.r.t. ⟨K2, B ∪ K1, P, N⟩R which is a contradiction due to C′2 ⊂ C2 and the minimality of the conflict set C2 w.r.t. ⟨K2, B ∪ K1, P, N⟩R.

QX′(C, ⟨K, B, P, N⟩R) computes a minimal conflict set w.r.t. ⟨K, B, P, N⟩R in a divide-and-conquer fashion whereby the argument C is the set of sentences of Korig that has been added to B in the current iteration. That is, in this iteration QX′ will output either (1) ∅ if the current B (which includes C) already contains a minimal conflict set w.r.t. the original DPI ⟨Korig, Borig, P, N⟩R or (2) a minimal conflict set w.r.t. the current DPI ⟨K, B, P, N⟩R (i.e. a subset of a minimal conflict set w.r.t. the original DPI) which does not include any sentence from C.

Lemma 4.4.

1. For each call QX′(C, ⟨K, B, P, N⟩R) within Algorithm 1 it holds that C ⊆ B.

2. If QX′(C, ⟨K, B, P, N⟩R) is called in line 16 of Algorithm 1, C ≠ ∅ holds.

3. If QX′(C, ⟨K, B, P, N⟩R) returns ∅, then there is some non-empty minimal conflict set w.r.t. ⟨C, B \ C, P, N⟩R.

4. If QX′(C, ⟨K, B, P, N⟩R) returns ∅, then ∅ is the only minimal conflict set w.r.t. ⟨K, B, P, N⟩R.

5. QX′(C, ⟨K, B, P, N⟩R) terminates.

Proof.

1): There are three situations when QX′(C, ⟨K, B, P, N⟩R) is called within Algorithm 1, namely in lines 7, 16 and 17. In line 7, C := ∅ ⊆ B holds. In line 16, C := K1 ⊆ B ∪ K1 =: B holds. In line 17, C := C2 ⊆ B ∪ C2 =: B holds.

2): In line 16, QX′ is called with C := K1, which is never the empty set due to the definition of the SPLIT function in line 13 that is used to extract K1 from K.

3): The first observation is that QX′(C, ⟨K, B, P, N⟩R) cannot return ∅ if C = ∅ as in this case the first condition in line 9 is not met. Thus, in particular, QX′ cannot return ∅ if called in line 7.

So, ∅ can be returned by QX′(C, ⟨K, B, P, N⟩R) only if it is called (1) in line 16 or (2) in line 17.

If QX′(C, ⟨K, B, P, N⟩R) returns ∅, then C ≠ ∅ and B is invalid w.r.t. ⟨·, ∅, P, N⟩R (line 9), i.e. B contains a minimal conflict set w.r.t. ⟨B, ∅, P, N⟩R which is non-empty by Proposition 4.2 since ⟨B, ∅, P, N⟩R is an admissible DPI by admissibility of the input DPI and the invariance of P, N, R throughout QX′. Additionally, C ⊆ B holds by the first proposition of this lemma. Now, assume that there is no non-empty (minimal) conflict set w.r.t. ⟨C, B \ C, P, N⟩R. Then, for each minimal conflict set C′ (which we know is non-empty) w.r.t. ⟨B, ∅, P, N⟩R it must hold that C ∩ C′ = ∅, i.e. there is already a non-empty minimal conflict set w.r.t. ⟨B \ C, ∅, P, N⟩R.

Case (1): Let us assume first that the call to QX′ was made in line 16. Then, before this call to QX′, B was exactly B \ C. By the second proposition of this lemma, C ≠ ∅ as QX′ was called in line 16. Thus, before the current call to QX′, the algorithm must have already returned ∅ (both conditions in line 9 are met) in line 10 which is a contradiction to the assumption that QX′(C, ⟨K, B, P, N⟩R) was called in line 16.

Case (2): Now, assume that the call to QX′(C2, ⟨K1, B ∪ C2, P, N⟩R) was made in line 17. Then C2 is the result of the call to QX′(K1, ⟨K2, B ∪ K1, P, N⟩R) in line 16. By the argumentation above, we have that C2 ≠ ∅ and there is a non-empty minimal conflict set w.r.t. ⟨B ∪ C2, ∅, P, N⟩R. Moreover, we have that there is a non-empty minimal conflict set w.r.t. ⟨B, ∅, P, N⟩R. However, as QX′(K1, ⟨K2, B ∪ K1, P, N⟩R) in line 16 did not return ∅ and K1 ≠ ∅ by the second proposition of this lemma, it must hold that B ∪ K1 is valid w.r.t. ⟨·, ∅, P, N⟩R, i.e. there is no (minimal) conflict set w.r.t. ⟨B ∪ K1, ∅, P, N⟩R. By monotonicity of L, this is a contradiction to the fact that there is a non-empty minimal conflict set w.r.t. ⟨B, ∅, P, N⟩R.

4): Assume QX′(C, ⟨K, B, P, N⟩R) returns ∅ and there is some non-empty minimal conflict set w.r.t. ⟨K, B, P, N⟩R. Since ∅ is returned, both conditions in line 9 must be met, i.e. in particular B must be invalid w.r.t. ⟨·, ∅, P, N⟩R which means that ⟨K, B, P, N⟩R is not admissible. By Proposition 4.2, there cannot be a non-empty (minimal) conflict set w.r.t. ⟨K, B, P, N⟩R. This yields a contradiction.

5): QX′(C, ⟨K, B, P, N⟩R) either returns ∅ in line 10 iff the conditions in line 9 are met or otherwise returns K in line 12 iff |K| = 1 or otherwise calls itself recursively in lines 16 and 17. However, for each recursive call QX′(C′, ⟨K′, B′, P, N⟩R) within QX′(C, ⟨K, B, P, N⟩R) it holds that K′ ⊂ K as K′ ∈ {K1, K2} and K1, K2 ⊂ K due to the definition of the SPLIT function in line 13 that is used to compute K1 and K2 from K in lines 14 and 15. Hence, each recursive call must finally reach the stopping criterion |K| = 1 and return K if it does not reach the stopping criterion in line 9 before.

Lemma 4.5. Let  ⟨K, B, P, N ⟩Rbe an admissible DPI. If QX′(C, ⟨K, B, P, N ⟩R)is called, then at least one of the immediate recursive calls of QX′in line 16 or line 17 is given an admissible DPI as argument.

Proof. Let us assume that ⟨K, B, P, N⟩R is an admissible DPI. Within QX′(C, ⟨K, B, P, N⟩R), the immediate recursive calls are QX′(K1, ⟨K2, B ∪ K1, P, N⟩R) in line 16 and QX′(C2, ⟨K1, B ∪ C2, P, N⟩R) in line 17, where K1, K2 is a partition of K and C2 is the result of QX′(K1, ⟨K2, B ∪ K1, P, N⟩R). If ⟨K2, B ∪ K1, P, N⟩R is admissible, then the proposition of the lemma is fulfilled. So, assume that ⟨K2, B ∪ K1, P, N⟩R is not admissible. Due to this non-admissibility, it must hold that B ∪ K1 is invalid w.r.t. ⟨·, ∅, P, N⟩R, so the second condition in line 9 is met. As the call to QX′(K1, ⟨K2, B ∪ K1, P, N⟩R) was made in line 16, it must be true by Lemma 4.4, prop. 2 that K1 ≠ ∅, wherefore the first condition in line 9 is met as well. Thus, the result of the call of QX′ in line 16 must be ∅. So, the call of QX′ in line 17 looks like QX′(∅, ⟨K1, B, P, N⟩R). However, the DPIs ⟨K1, B, P, N⟩R and ⟨K, B, P, N⟩R are identical except for the first entries, i.e. K1 and K. We know that the latter DPI is admissible. Due to the fact that admissibility of a DPI is defined independently of the KB (the first entry of the DPI tuple), we have that ⟨K1, B, P, N⟩R must be admissible. This completes the proof.

As long as the algorithm goes downwards in the recursion tree (and has never gone upwards), (1) the invariant that a minimal conflict set exists for each recursive call to QX′ holds, (2) each call to QX′ that returns, returns a singleton or empty set and (3) the two calls to QX′ immediately before going upwards in the recursion tree for the first time must both return either a singleton or an empty set.

Lemma 4.6 (QX: Downwards Correctness). Let ⟨K, B, P, N⟩R be an admissible DPI and let there be a non-empty minimal conflict set w.r.t. ⟨K, B, P, N⟩R. Then, the following propositions hold:

1. Before line 18 has ever been reached during the execution of QX′(C, ⟨K, B, P, N⟩R), the following holds: If some call to QX′(C′, ⟨K′, B′, P, N⟩R) returns a set S, then S = ∅ or |S| = 1.

2. Before line 18 has ever been reached during the execution of QX′(C, ⟨K, B, P, N⟩R), the following holds: If QX′(C′, ⟨K′, B′, P, N⟩R) is recursively called, then there is some non-empty minimal conflict set w.r.t. ⟨K′ ∪ C′, B′ \ C′, P, N⟩R.

3. Before line 18 has ever been reached during the execution of QX′(C, ⟨K, B, P, N⟩R), the following holds: If some call to QX′(C′, ⟨K′, B′, P, N⟩R) returns a set S, then S is a minimal conflict set w.r.t. ⟨K, B, P, N⟩R.

4. When line 18 is reached for the first time, each of the calls to QX′ immediately before in lines 16 and 17 must have returned ∅ or some K with |K| = 1.

Proof.

1): Assume the opposite, i.e. some call to QX′(C′, ⟨K′, B′, P, N⟩R) returns a set S with |S| > 1 before line 18 has ever been reached. There are three places where QX′ can return, namely in line 10, in line 12 or in line 18. However, in line 10 only ∅ and in line 12 only a singleton set can be returned. That is, S must be returned in line 18, which is a contradiction to the assumption that line 18 has not yet been reached.

2): Induction Base: The first recursive call QX′(C′, ⟨K′, B′, P, N⟩R) can only occur at line 16 where C′ = K1, K′ = K2 and B′ = B ∪ K1 and K1, K2 is a partition of K as per the definition of the SPLIT and GET functions in lines 13-15. So, K′ ∪ C′ = K and B′ \ C′ = B. The latter holds since C′ ⊆ K and for each DPI K ∩ B = ∅ holds by Definition 3.1. As there is a non-empty minimal conflict set w.r.t. ⟨K, B, P, N⟩R, we have that there is a non-empty minimal conflict set w.r.t. ⟨K′ ∪ C′, B′ \ C′, P, N⟩R by the fact that ⟨K, B, P, N⟩R = ⟨K′ ∪ C′, B′ \ C′, P, N⟩R. Thus, the existence of a non-empty minimal conflict set w.r.t. ⟨K′ ∪ C′, B′ \ C′, P, N⟩R is given during the execution of the first recursive call to QX′.

Induction Assumption: Now, let us assume that the existence of a non-empty minimal conflict set w.r.t. ⟨K ∪ C, B \ C, P, N⟩R is given during some call QX′(C, ⟨K, B, P, N⟩R). The goal is now to show that the existence of a non-empty minimal conflict set w.r.t. ⟨K′ ∪ C′, B′ \ C′, P, N⟩R is given during any recursive call QX′(C′, ⟨K′, B′, P, N⟩R) that is invoked during execution of QX′(C, ⟨K, B, P, N⟩R).

Induction Step: Now, there are three cases where this recursive call to QX′ can take place, namely (1) in line 16, (2) in line 17 where the result of QX′ in line 16 is C2 = ∅ and (3) in line 17 where the result of QX′ in line 16 is some C2 with |C2| = 1. The case where some C2 with |C2| > 1 is returned by QX′ in line 16 is impossible due to the assumption that line 18 has not yet been reached and the first proposition of this lemma.

Case (1): Let us assume that the call QX′(C′, ⟨K′, B′, P, N⟩R) is made in line 16. Since that call is made within QX′(C, ⟨K, B, P, N⟩R), it must hold that some condition in line 9 during QX′(C, ⟨K, B, P, N⟩R) is violated, as otherwise a return would have taken place in line 10, which is a contradiction to the assumption that QX′(C′, ⟨K′, B′, P, N⟩R) is called in line 16.

Let us first assume that C = ∅ holds. In this case, the first condition in line 9 is violated and, by the Induction Assumption, it is true that there is a non-empty minimal conflict set w.r.t. the DPI ⟨K ∪ C, B \ C, P, N⟩R, which is equal to the DPI ⟨K, B, P, N⟩R by C = ∅. So, the same argumentation as in the Induction Base can be applied to derive that there is a non-empty minimal conflict set w.r.t. ⟨K′ ∪ C′, B′ \ C′, P, N⟩R.

If C ≠ ∅ holds, on the other hand, then the first condition in line 9 is satisfied, wherefore the second condition in line 9 must be violated. That is, there is no conflict set w.r.t. ⟨B, ∅, P, N⟩R. As there is a non-empty minimal conflict set w.r.t. ⟨K ∪ C, B \ C, P, N⟩R by the Induction Assumption, C ⊆ B by Lemma 4.4, prop. 1 and |K| ≥ 2 by the fact that there was no return in line 12, there must be a non-empty minimal conflict set w.r.t. ⟨K, B, P, N⟩R. Again, the same argumentation as in the Induction Base can be applied to derive that there is a non-empty minimal conflict set w.r.t. ⟨K′ ∪ C′, B′ \ C′, P, N⟩R.

Case (2): Here, we assume that the recursive call QX′(C′, ⟨K′, B′, P, N⟩R) is made in line 17 and the result of QX′ in line 16 is C2 = ∅. So, it holds that C′ = C2 = ∅, K′ = K1 and B′ = B, i.e. the recursive call can be written as QX′(∅, ⟨K1, B, P, N⟩R). By the fact that QX′(K1, ⟨K2, B ∪ K1, P, N⟩R) called in line 16 returned ∅, both conditions in line 9 during QX′(K1, ⟨K2, B ∪ K1, P, N⟩R) must have been met. Thus, in particular the existence of a non-empty minimal conflict set w.r.t. ⟨B ∪ K1, ∅, P, N⟩R must be given. Further on, by the Induction Assumption there is a non-empty minimal conflict set w.r.t. ⟨C ∪ K, B \ C, P, N⟩R.

Let us first assume C = ∅. In this case ⟨C ∪ K, B \ C, P, N⟩R can be written as ⟨K, B, P, N⟩R and it holds that there is a non-empty minimal conflict set w.r.t. ⟨K, B, P, N⟩R, i.e. K is invalid w.r.t. ⟨·, B, P, N⟩R. By Proposition 4.2, this implies that ⟨K, B, P, N⟩R is admissible. In other words, there is no conflict set w.r.t. ⟨B, ∅, P, N⟩R. Consequently, there must be a non-empty minimal conflict set w.r.t. ⟨K1, B, P, N⟩R.

If C ≠ ∅, on the other hand, then the second condition in line 9 during QX′(C, ⟨K, B, P, N⟩R) must be violated, i.e. there is no conflict set w.r.t. ⟨B, ∅, P, N⟩R. Consequently, there must be a non-empty minimal conflict set w.r.t. ⟨K1, B, P, N⟩R.

Case (3): Here, we assume that the recursive call QX′(C′, ⟨K′, B′, P, N⟩R) is made in line 17 and the result of QX′ in line 16 is C2 ≠ ∅. As C2 ≠ ∅ and line 18 has never been reached by assumption, C2 must have been returned in line 12 of QX′(K1, ⟨K2, B ∪ K1, P, N⟩R) (which was called in line 16), wherefore C2 = K2 must hold. So, it holds that C′ = K2, K′ = K1 and B′ = B ∪ K2, i.e. the recursive call can be written as QX′(K2, ⟨K1, B ∪ K2, P, N⟩R). By the Induction Assumption, there is a non-empty minimal conflict set w.r.t. ⟨C ∪ K, B \ C, P, N⟩R. Moreover, C ⊆ B by Lemma 4.4, prop. 1 and (*) there is a non-empty minimal conflict set w.r.t. the DPI ⟨K, B, P, N⟩R, which is equal to the DPI ⟨K1 ∪ K2, B, P, N⟩R by the fact that K1, K2 partition K as per the definition of the SPLIT and GET functions in lines 13-15.

What must still be proven is (*): Let us first assume that C = ∅ holds. In this case, ⟨C ∪ K, B \ C, P, N⟩R = ⟨K, B, P, N⟩R and thus there is a non-empty minimal conflict set w.r.t. ⟨K, B, P, N⟩R.

If C ≠ ∅, on the other hand, then the second condition in line 9 during QX′(C, ⟨K, B, P, N⟩R) must be violated, as otherwise ∅ would have been returned, which is a contradiction to the assumption that the recursive call QX′(C′, ⟨K′, B′, P, N⟩R) was invoked in line 17. So, there is no conflict set w.r.t. ⟨B, ∅, P, N⟩R. Consequently, there must be a non-empty minimal conflict set w.r.t. ⟨K, B, P, N⟩R due to C ⊆ B by Lemma 4.4, prop. 1.

3): Case S ≠ ∅: By S ≠ ∅ and the fact that line 18 has not yet been reached, we obtain by the first proposition of this lemma that |S| = 1 must hold.

There are two cases that can trigger QX′(C, ⟨K, B, P, N⟩R) to return K with |K| = 1, i.e. case 1 involving C ≠ ∅ and case 2 involving C = ∅.

In case 1, B must be valid w.r.t. ⟨·, ∅, P, N⟩R as otherwise ∅ would be returned in line 10. So, there is no (minimal) conflict set w.r.t. ⟨B, ∅, P, N⟩R.

As |K| = 1 by assumption and by the fact that C ⊆ B (holds by Lemma 4.4, prop. 1) and there is some non-empty minimal conflict set w.r.t. ⟨K ∪ C, B \ C, P, N⟩R (holds by the second proposition of this lemma), K must include a non-empty minimal conflict set w.r.t. ⟨K, B, P, N⟩R. Since the only proper subset of K is the empty set, K must be a minimal conflict set w.r.t. ⟨K, B, P, N⟩R.

Case 2 can arise only when QX′(C, ⟨K, B, P, N⟩R) is called in line 7 or line 17. In line 16 QX′ is called with C ≠ ∅ by Lemma 4.4, prop. 2.

In line 7 QX′ is called with C = ∅ and, by Corollary 4.3, with an admissible DPI ⟨K, B, P, N⟩R for which a non-empty minimal conflict set exists as arguments. By the second proposition of this lemma, there is some non-empty minimal conflict set w.r.t. ⟨K ∪ ∅, B \ ∅, P, N⟩R = ⟨K, B, P, N⟩R, and, by admissibility of ⟨K, B, P, N⟩R, there is no (minimal) conflict set w.r.t. ⟨B, ∅, P, N⟩R. By |K| = 1, K must be a minimal conflict set w.r.t. ⟨K, B, P, N⟩R.

A necessary condition for QX′ to be called with C = ∅ in line 17 is obviously that QX′(K1, ⟨K2, B ∪ K1, P, N⟩R) called in line 16 returns ∅. By Lemma 4.4, prop. 3, there is some non-empty minimal conflict set w.r.t. ⟨K1, B, P, N⟩R. In line 17, the call QX′(∅, ⟨K1, B, P, N⟩R) is made which, by assumption, returns K1 with |K1| = 1. That means K1 is a minimal conflict set w.r.t. ⟨K1, B, P, N⟩R.

Case S = ∅: Here, both conditions in line 9 must be met, i.e. in particular B is invalid w.r.t. ⟨·, ∅, P, N⟩R, which implies that K is invalid w.r.t. ⟨·, B, P, N⟩R and ⟨K, B, P, N⟩R is not admissible. Therefore, by Proposition 4.2, there is no non-empty minimal conflict set w.r.t. ⟨K, B, P, N⟩R. However, since K is invalid w.r.t. ⟨·, B, P, N⟩R, there must be a conflict set w.r.t. ⟨K, B, P, N⟩R. So, there is only the empty minimal conflict set w.r.t. ⟨K, B, P, N⟩R.

4): This proposition is an immediate consequence of the first proposition of this lemma.

Lemma 4.7. Let ⟨K, B, P, N⟩R be a non-admissible DPI. Then, ∅ is the only minimal conflict set w.r.t. ⟨K, B, P, N⟩R and QX′(C, ⟨K, B, P, N⟩R) with C ≠ ∅ returns ∅ immediately in line 10.

Proof. Since ⟨K, B, P, N⟩R is non-admissible, B ∪ UP violates some r ∈ R or B ∪ UP ⊨ n for some n ∈ N. Therefore, ∅ is invalid w.r.t. ⟨·, B, P, N⟩R, which, by Corollary 4.1, implies that ∅ is a (minimal) conflict set w.r.t. ⟨K, B, P, N⟩R.

QX′(C, ⟨K, B, P, N⟩R) returns ∅ in line 10 as both conditions in line 9 are satisfied due to C ≠ ∅ and the non-admissibility of ⟨K, B, P, N⟩R.

Lemma 4.8. Let ⟨K, B, P, N⟩R be an admissible DPI. Then QX′(C, ⟨K, B, P, N⟩R) does not return in line 10.

Proof. By Definition 3.6, B must be valid w.r.t. ⟨·, ∅, P, N⟩R. Hence, the second condition in line 9 is not satisfied, wherefore a return cannot take place in line 10.

Lemma 4.9. Let ⟨K, B, P, N⟩R be an admissible DPI and let there be a non-empty minimal conflict set w.r.t. ⟨K, B, P, N⟩R. Then the following holds: When QX′(C, ⟨K, B, P, N⟩R) reaches line 18 for the first time, C1 ∪ C2 is a non-empty minimal conflict set w.r.t. ⟨K, B, P, N⟩R.

Proof. The premises of this lemma are the same as those of Lemma 4.6. By Lemma 4.6, prop. 4 we know that for C2 and C1 that are returned by the calls to QX′ in lines 16 and 17, |C1| ≤ 1 and |C2| ≤ 1 holds. Moreover, we know by Lemma 4.3 that C1 ∪ C2 is a minimal conflict set w.r.t. ⟨K, B, P, N⟩R.

What remains open is to show that C1 ∪ C2 ≠ ∅. To this end, we first assume that C ≠ ∅. Then, by Lemma 4.7, ⟨K, B, P, N⟩R must be an admissible DPI since QX′(C, ⟨K, B, P, N⟩R) does not return in line 10, but only in line 18.

If, on the other hand, C = ∅ holds, we can apply Lemma 4.6, prop. 2 to obtain that there is a non-empty minimal conflict set w.r.t. ⟨K, B, P, N⟩R. This implies that K is invalid w.r.t. ⟨·, B, P, N⟩R. Therefore, we can conclude by means of Proposition 4.2 that ⟨K, B, P, N⟩R is an admissible DPI. Thus, in both cases we have that ⟨K, B, P, N⟩R is an admissible DPI. Applying Lemma 4.5 yields that at least one recursive call to QX′ in lines 16 and 17 is given an admissible DPI as argument. By Lemma 4.8, this call cannot return in line 10. So, it must return in line 12 by the assumption that line 18 has not yet been reached before, wherefore it must return a set of cardinality 1. This completes the proof.

As long as the algorithm goes upwards after going upwards for the first time, a non-empty minimal conflict set is propagated upwards.

Lemma 4.10 (QX: Upwards Correctness). Let ⟨K, B, P, N⟩R be an admissible DPI and let there be a non-empty minimal conflict set w.r.t. ⟨K, B, P, N⟩R. Then: After QX′(C, ⟨K, B, P, N⟩R) has reached line 18 for the first time, the following holds: As long as line 16 is not reached, each return in line 18 returns a minimal conflict set w.r.t. ⟨K, B, P, N⟩R.

Proof. The premises of this lemma are the same as those of Lemma 4.6. By Lemma 4.9 we know that a non-empty minimal conflict set C is returned at the first return that is made in line 18. As, by assumption, C is not the result C2 of a prior call to QX′ in line 16, it must be the result C1 of a prior call to QX′ in line 17. Since the premises of Lemma 4.6 are fulfilled, Lemma 4.6 can be applied. Since the call QX′(K1, ⟨K2, B ∪ K1, P, N⟩R) (that returned C2) in line 16 took place before line 18 was first reached, we have that C2 is a minimal conflict set w.r.t. ⟨K2, B ∪ K1, P, N⟩R by Lemma 4.6, prop. 3. By Lemma 4.3, we have that C2 ∪ C is a minimal conflict set w.r.t. ⟨K, B, P, N⟩R. As long as line 16 is not reached, the same argumentation can be used to show that a minimal conflict set is returned in line 18.

When the algorithm goes downwards again after going upwards for the first time, the invariant that a minimal conflict set exists for each recursive downwards call to QX′ holds.

Lemma 4.11 (QX: Downwards-after-upwards Correctness). Let ⟨K, B, P, N⟩R be an admissible DPI and let there be a non-empty minimal conflict set w.r.t. ⟨K, B, P, N⟩R. Then: After QX′(C, ⟨K, B, P, N⟩R) has reached line 18 for the first time, the following holds: If line 16 is reached for the first time, then, if the DPI ⟨K1, B ∪ C2, P, N⟩R which is the argument to the immediate call QX′(C2, ⟨K1, B ∪ C2, P, N⟩R) in line 17 is admissible, then there is a non-empty minimal conflict set w.r.t. ⟨K1, B ∪ C2, P, N⟩R.

Proof. The premises of this lemma are the same as those of Lemma 4.6. Since line 16 is first reached after line 18 has been reached for the first time, it must hold that QX′(K1, ⟨K2, B ∪ K1, P, N⟩R) in line 16 was called before line 18 has been reached. The reason for this to hold is the fact that only returns and no new calls to QX′ can have been made between the first occurrence of line 18 and the next occurrence of line 16.

Therefore, the result C2 of the call QX′(K1, ⟨K2, B ∪ K1, P, N⟩R) in line 16 is a minimal conflict set w.r.t. ⟨K2, B ∪ K1, P, N⟩R due to Lemma 4.6, prop. 3. As a consequence, C2 ∪ B ∪ K1 ∪ UP violates some r ∈ R or some n ∈ N. As the DPI ⟨K1, B ∪ C2, P, N⟩R is admissible by assumption, it holds that C2 ∪ B ∪ UP does not violate any r ∈ R or n ∈ N. Hence, K1 must be invalid w.r.t. ⟨·, B ∪ C2, P, N⟩R, which implies that there must be a non-empty minimal conflict set S w.r.t. ⟨K1, B ∪ C2, P, N⟩R.

By applying the argumentation of Lemmas 4.6, 4.10 and 4.11 recursively on the entire recursion tree, we can prove the correctness of QX′.

Lemma 4.12. If QX′(C, ⟨Korig, Borig, P, N⟩R) is called in line 7 by Algorithm 1, it returns a non-empty minimal conflict set w.r.t. ⟨Korig, Borig, P, N⟩R.

Proof. If QX′(C, ⟨Korig, Borig, P, N⟩R) is called in line 7 of Algorithm 1, it must be true, by Lemma 4.2, prop. 4.2 and Corollary 4.3, that ⟨Korig, Borig, P, N⟩R is an admissible DPI for which a non-empty minimal conflict set exists. As a consequence, the premises of Lemma 4.6 are met for ⟨Korig, Borig, P, N⟩R.

There are two cases to consider: Either (a) |Korig| ≤ 1 or (b) |Korig| > 1 for the initial call to QX′(C, ⟨Korig, Borig, P, N⟩R) in line 7. In case (a), 0 = |Korig| < 1 cannot hold as there must be a non-empty minimal conflict set C w.r.t. ⟨Korig, Borig, P, N⟩R due to Lemma 4.2, prop. 4.2. Since ∅ ⊂ C ⊆ Korig must hold for C, this would be a contradiction to |Korig| = 0.

So, |Korig| = 1 holds in case (a). In this case, QX′ returns Korig immediately in line 12, since C = ∅ and thus the conditions checked in line 9 cannot be met. In this case, Korig is indeed a non-empty minimal conflict set since for the DPI ⟨Korig, Borig, P, N⟩R given as argument there is a non-empty minimal conflict set by Lemma 4.2, prop. 4.2. Therefore ∅ cannot be a conflict set w.r.t. this DPI, whereby Korig is the only possible minimal conflict set due to |Korig| = 1.

Case (b): In this case, a direct return can neither take place in line 10 by C = ∅ nor in line 12 by |Korig| > 1. So, QX′ is called recursively in lines 16 and 17. Since QX′ terminates due to Lemma 4.2, prop. 5, QX′ must reach line 18. The first time some recursive call QX′(C, ⟨K, B, P, N⟩R) reaches line 18, it returns a non-empty minimal conflict set w.r.t. ⟨K, B, P, N⟩R due to Lemma 4.9. By Lemma 4.10, as long as line 16 is not reached, i.e. no “left branch” (call to QX′ in line 16) but only “right branches” (calls to QX′ in line 17) return, a minimal conflict set S is returned for each call to QX′ that “wraps” (is higher in the recursion tree than) the call that was the first to reach line 18. It holds that S ≠ ∅ since S is a union of sets including the non-empty set returned when line 18 was first reached.

When it comes to an execution of line 16, i.e. the left branch returns, then the algorithm will take the right branch by executing line 17, i.e. calling QX′(C2, ⟨K1, B ∪ C2, P, N ⟩R), and go downwards in the recursion tree.

Now, there are two cases. First, ⟨K1, B ∪ C2, P, N⟩R is non-admissible. Then, by Lemma 4.7, there is only one minimal conflict set w.r.t. ⟨K1, B ∪ C2, P, N⟩R, namely ∅, and QX′(C2, ⟨K1, B ∪ C2, P, N⟩R) directly returns ∅. As also the result C2 of the call to QX′(K1, ⟨K2, B ∪ K1, P, N⟩R) immediately before in line 16 is a minimal conflict set w.r.t. ⟨K2, B ∪ K1, P, N⟩R, as established above, we can apply Lemma 4.3 to derive that indeed a minimal conflict set w.r.t. ⟨K, B, P, N⟩R is returned in line 18. Thus, Lemma 4.10 can be further applied to move upwards in the recursion tree until line 16 occurs again.

Second,  ⟨K1, B ∪ C2, P, N ⟩Ris admissible. Then, by Lemma 4.11, there is a non-empty minimal conflict set w.r.t.  ⟨K1, B ∪ C2, P, N ⟩R. Hence, Lemma 4.6 can be used again for the subtree of the recursion tree rooted at the call QX′(C2, ⟨K1, B ∪ C2, P, N ⟩R). That is, it can be used to show that each call to QX′within this subtree returns a minimal conflict set w.r.t. the DPI given as argument as long as the algorithm moves downwards in the tree. Having reached line 18 for the first time, Lemma 4.9 lets us conclude again that a non-empty conflict set w.r.t. the respective argument DPI is actually returned at this place. Subsequently, Lemma 4.10 can be applied to show that each return gives back a minimal conflict set w.r.t. the argument DPI of the respective call, as long as the algorithm moves upwards in the recursion tree.

What is still open is to show that the call QX′(C2, ⟨K1, B ∪ C2, P, N⟩R) in line 17 that is made immediately after the algorithm first reached line 16 after moving upwards after reaching line 18 for the first time indeed returns a minimal conflict set w.r.t. ⟨K1, B ∪ C2, P, N⟩R. This holds by the fact that Lemmas 4.6 and 4.10 guarantee that a left branch always returns a minimal conflict set, and Lemma 4.11 guarantees that Lemmas 4.6 and 4.10 can be applied after making a single right branch. However, as QX′ terminates, the recursion tree is finite and thus the case must arise where the right branch directly returns. In case the DPI ⟨K, B, P, N⟩R given as argument for this right branch is non-admissible, the only minimal conflict set ∅ is returned, as established above. If the DPI ⟨K, B, P, N⟩R given as argument for this right branch is admissible, on the other hand, then we have already shown above that there is a non-empty minimal conflict set w.r.t. this DPI. Moreover, |K| = 1 must hold due to the fact that this right branch directly returns (without entering a further recursion). Therefore, K is returned, which is actually a minimal conflict set w.r.t. ⟨K, B, P, N⟩R as K is the only non-empty subset of K.

Proposition 4.9. Let ⟨K, B, P, N⟩R be a DPI. Then, QX(⟨K, B, P, N⟩R) terminates and returns

• ’no conflict’ iff there is no conflict w.r.t. ⟨K, B, P, N⟩R (K is valid w.r.t. ⟨·, B, P, N⟩R),

• ∅ iff ∅ is the only minimal conflict set w.r.t. ⟨K, B, P, N⟩R (DPI is non-admissible),

• a non-empty minimal conflict set w.r.t. ⟨K, B, P, N⟩R iff there is a non-empty minimal conflict set w.r.t. ⟨K, B, P, N⟩R (DPI is admissible and K is invalid w.r.t. ⟨·, B, P, N⟩R).

Proof. The proposition is a direct consequence of Lemma 4.2 and Lemma 4.12.
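To make the three outcomes of Proposition 4.9 tangible, the following minimal Python sketch mirrors the control flow that the proofs above refer to (the two checks before QX′ is called in line 7, the two-part condition in line 9, the singleton return in line 12 and the two recursive calls in lines 16 and 17). It is an illustration under assumptions, not the paper's Algorithm 1: the function names, the halving split and the oracle `toy_is_valid` are hypothetical, and P, N and R are assumed to be folded into the monotone validity oracle.

```python
from typing import Callable, FrozenSet, List, Union

Oracle = Callable[[FrozenSet[int]], bool]

# Hypothetical toy setting: a set of formulas is invalid iff it contains
# one of these sets. This stands in for the validity check w.r.t. P, N, R.
TOY_CONFLICTS = [frozenset({2, 3}), frozenset({1, 4, 5})]


def toy_is_valid(formulas: FrozenSet[int]) -> bool:
    return not any(c <= formulas for c in TOY_CONFLICTS)


def qx(kb: List[int], background: FrozenSet[int],
       is_valid: Oracle) -> Union[str, FrozenSet[int]]:
    """Return 'no conflict', the empty set, or a non-empty minimal conflict set."""
    if not is_valid(background):                 # non-admissible DPI
        return frozenset()
    if is_valid(background | frozenset(kb)):     # K is valid given B
        return "no conflict"
    return _qx_prime(frozenset(), kb, background, is_valid)


def _qx_prime(c: FrozenSet[int], kb: List[int], background: FrozenSet[int],
              is_valid: Oracle) -> FrozenSet[int]:
    if c and not is_valid(background):           # both conditions (line 9)
        return frozenset()
    if len(kb) == 1:                             # singleton return (line 12)
        return frozenset(kb)
    k = len(kb) // 2                             # assumed split: halve K
    k1, k2 = kb[:k], kb[k:]
    c2 = _qx_prime(frozenset(k1), k2, background | frozenset(k1), is_valid)
    c1 = _qx_prime(c2, k1, background | c2, is_valid)
    return c1 | c2


print(qx([1, 2, 3, 4, 5], frozenset(), toy_is_valid))
print(qx([1, 4], frozenset(), toy_is_valid))
print(qx([1], frozenset({2, 3}), toy_is_valid))
```

In this toy setting the first call yields the minimal conflict set {2, 3}, the second yields 'no conflict' because {1, 4} contains no conflict, and the third yields ∅ because the background itself is already invalid.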

4.5 Hitting Set Tree Based Diagnosis Computation

One way to compute minimal diagnoses from minimal conflict sets is to use a hitting set tree algorithm which was originally proposed by Reiter [Rei87]. In this work we describe methods for non-interactive and interactive diagnosis computation based on the ones used in [FS05, SF10, SFFR12] which are closely related to the original hitting set tree algorithm. Differences of the described non-interactive algorithm to the original one of Reiter are

1. the usage of different edge weights (probabilities) inducing an order of node generation (uniform-cost) different to breadth-first and

2. the opportunity to specify an execution time threshold t as well as a minimal (nmin) and maximal (nmax) desired number of minimal diagnoses to be computed by the algorithm.

In this vein, the algorithm computes at least the nmin most-probable minimal diagnoses w.r.t. the given probabilities and goes on computing further next most-probable minimal diagnoses until either overall computation time reaches the time limit t or nmax diagnoses have been computed.
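The interplay of nmin, nmax and t can be written as a small predicate. The sketch below is only one reading of the sentence above; the function name `keep_searching` and its parameters are hypothetical, and it deliberately ignores the further stopping condition that no open nodes remain.

```python
def keep_searching(num_found: int, elapsed: float,
                   n_min: int, n_max: int, t: float) -> bool:
    """Decide whether another minimal diagnosis should be computed.

    Assumed reading: n_min diagnoses are computed regardless of time;
    afterwards the search continues only while fewer than n_max diagnoses
    are known and the elapsed time is still below the limit t.
    """
    if num_found < n_min:
        return True
    return num_found < n_max and elapsed < t


# Below n_min the timeout is ignored; above it, time and n_max both apply.
print(keep_searching(1, 100.0, n_min=2, n_max=5, t=10.0))
print(keep_searching(3, 1.0, n_min=2, n_max=5, t=10.0))
print(keep_searching(5, 1.0, n_min=2, n_max=5, t=10.0))
```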

Such a time threshold and an interval of minimal and maximal number of diagnoses is particularly relevant in settings where not all potential minimal faulty sets need to be computed, such as iterative, interactive settings where reaction time is crucial (since a user is waiting to interact with the system). Instead, in such settings only a “representative” set of minimal diagnoses is exploited to decide which question to ask a user such that the answer to that question allows the constructed partial tree to be pruned. After pruning, the tree is expanded again to compute another “representative” set of minimal diagnoses. Such an interactive KB debugging algorithm will be presented in Part II. The non-interactive version of the KB debugging algorithm is delineated by Algorithm 2 and described next.

Inputs. The algorithm takes as input an admissible DPI ⟨K, B, P, N⟩R, some computation timeout t, a desired minimal (nmin) and maximal (nmax) number of minimal diagnoses to be returned, and a function p : K → (0, 0.5) that assigns to each formula ax ∈ K a weight that represents the (estimated) likelihood of ax being faulty and thereby determines the search strategy, e.g. breadth-first or uniform-cost. Within the algorithm, p() is used to impose an order on open nodes that tells the algorithm which node to expand next. Details concerning the function p() will be discussed in Section 4.6 after demonstrating various ways of obtaining information relevant to p() and detailing how p() can be defined by means of such information. Throughout the rest of the current Section 4.5 we assume that p() implies a first-in-first-out sorting of open nodes, i.e. a breadth-first search strategy as described in [Rei87].

4.5.1 Breadth-First Diagnosis Computation

Algorithm Overview and Implementation Remarks. To compute minimal diagnoses w.r.t. ⟨K, B, P, N⟩R from minimal conflict sets w.r.t. ⟨K, B, P, N⟩R, the algorithm produces a labeled tree where a non-closed node is labeled by a minimal conflict set and a closed node is labeled by either valid or closed. From a non-closed node labeled by a minimal conflict set C = {axp, . . . , axq} there are |C| outgoing edges, each labeled by one ax ∈ C and each leading to a new node that needs to be labeled. Closed nodes are leaf nodes of the produced tree, i.e. they have no successor nodes, and correspond to non-minimal or duplicate hitting sets (label closed) or to minimal hitting sets (label valid) of all minimal conflict sets w.r.t. the input DPI ⟨K, B, P, N⟩R. Conflict sets to label nodes are computed only on demand, for time efficiency, after the attempt to reuse an already computed one fails. In case an appropriate order of node labeling (e.g. breadth-first tree construction) is used, the complete tree given when all nodes in the tree are closed contains all minimal diagnoses w.r.t. the DPI ⟨K, B, P, N⟩R provided as input. In this complete tree, the set of edge labels on each path from the root node to a node labeled by valid is a minimal diagnosis.

What Algorithm 2 actually does is build up a pruned HS-tree for a given DPI. So, we next provide formal definitions of a (partial) HS-tree and a (partial) pruned HS-tree based on the definitions given in [Rei87].

Definition 4.7 (HS-Tree). Let ⟨K, B, P, N⟩R be an admissible DPI. An edge-labeled and node-labeled tree T is called an HS-tree w.r.t. ⟨K, B, P, N⟩R iff it is a smallest tree with the following properties:

1. The root of T is labeled by valid if K is valid w.r.t. ⟨·, B, P, N⟩R. Otherwise, the root is labeled by a conflict set w.r.t. ⟨K, B, P, N⟩R.

2. If n is a node of T, define H(n) to be the set of edge labels on the path in T from the root node to n. If n is labeled by valid, it has no successor nodes in T. If n is labeled by a conflict set C w.r.t. ⟨K, B, P, N⟩R, then for each ax ∈ C, n has a successor node nax joined to n by an edge labeled by ax. The label for nax is a conflict set C′ w.r.t. ⟨K, B, P, N⟩R such that C′ ∩ H(nax) = ∅ if such a set C′ exists. Otherwise, nax is labeled by valid.

T is called a partial HS-tree w.r.t. ⟨K, B, P, N⟩R iff T is an HS-tree w.r.t. ⟨K, B, P, N⟩R where not all nodes in T are labeled and non-labeled nodes have no successors.

Definition 4.8 (Pruned HS-Tree). Let ⟨K, B, P, N⟩R be an admissible DPI. An edge-labeled and node-labeled tree T is called a pruned HS-tree (pHS-tree) w.r.t. ⟨K, B, P, N⟩R iff T is the result of constructing an HS-tree w.r.t. ⟨K, B, P, N⟩R with due regard to the following rules:

1. Label nodes in the HS-tree in breadth-first order.

2. Use only minimal conflict sets w.r.t. ⟨K, B, P, N⟩R to label nodes in T.

3. Reusing node labels: If node n is labeled by C and n′ is a node such that H(n′) ∩ C = ∅, label n′ by C.

4. Non-minimality pruning rule: If node n is labeled by valid and node n′ is such that H(n) ⊆ H(n′), label n′ by closed.

5. If node n is labeled by closed, it has no successors.

6. Duplicate pruning rule: If node n is next to be labeled and there is some node n′ such that H(n′) = H(n), then label n by closed.

T is called a partial pruned HS-tree iff T is a pruned HS-tree where not all nodes in T have been labeled yet and non-labeled nodes have no successors.

Remark 4.1 Notice that we use a definition of a pruned HS-tree that slightly differs from the definition given in [Rei87] in that we inherently assume that only minimal conflict sets w.r.t. the given DPI are used to label nodes in the tree. Therefore we could omit the last rule in the definition of [Rei87]. Namely, such a situation where some node has been labeled by a subset of the label of another node cannot arise in our definition since no minimal conflict set can be a subset of another different minimal conflict set w.r.t. the same DPI.

In general, there are multiple different pHS-trees w.r.t. one and the same DPI [GSW89]. The reason for this is that Definition 4.8 determines neither

• the order of adding successor nodes (on the same tree level) to the queue Q nor

• which of generally multiple minimal conflict sets to (re)use to label a node.

By [Rei87, Theorem 4.8] and Proposition 4.6, the following holds:

Proposition 4.10. Let ⟨K, B, P, N⟩R be an admissible DPI and T a pHS-tree w.r.t. ⟨K, B, P, N⟩R. Then, {H(n) | n is a node of T labeled by valid} = mD⟨K,B,P,N⟩R, i.e. the set of all minimal diagnoses w.r.t. ⟨K, B, P, N⟩R.
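As a reference point for what a pHS-tree is supposed to deliver, the following Python sketch enumerates by brute force all minimal hitting sets of a given collection of minimal conflict sets. It relies on the correspondence stated in the algorithm overview (nodes labeled valid correspond to minimal hitting sets of all minimal conflict sets); the function name and the example conflict sets are hypothetical, and the exponential enumeration is meant only for tiny examples, not as a substitute for the tree construction.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List


def minimal_hitting_sets(conflicts: Iterable[FrozenSet[str]]) -> List[FrozenSet[str]]:
    """Enumerate all subset-minimal hitting sets by increasing cardinality."""
    conflicts = list(conflicts)
    universe = sorted(set().union(*conflicts))
    found: List[FrozenSet[str]] = []
    for size in range(len(universe) + 1):
        for candidate in combinations(universe, size):
            s = frozenset(candidate)
            hits_all = all(s & c for c in conflicts)
            # Smaller hitting sets were found earlier, so any superset
            # of a known hitting set is non-minimal and is skipped.
            if hits_all and not any(h <= s for h in found):
                found.append(s)
    return found


example_conflicts = [frozenset({"ax1", "ax2"}), frozenset({"ax2", "ax3"})]
for hs in minimal_hitting_sets(example_conflicts):
    print(sorted(hs))
```

For the two example conflict sets, the minimal hitting sets are {ax2} and {ax1, ax3}; a pHS-tree built from the same conflict sets would contain exactly these two sets as H(n) of its nodes labeled valid.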

Remark 4.2 A node nd in Algorithm 2 is defined as the set of formulas that label the edges on the path from the root node to nd. In other words, we associate a node n with H(n). In this vein, Algorithm 2 internally does not store a labeled tree, but only “relevant” sets of nodes and conflict sets. That is, it does not store any

• non-leaf nodes,

• labels of non-leaf nodes, i.e. it does not store which minimal conflict set labels which node,

• edges between nodes,

• labels of edges and

• leaf nodes labeled by closed.

Let T denote the (partial) pHS-tree produced by Algorithm 2 at some point during its execution (Corollary 4.4 will show that Algorithm 2 using breadth-first search in fact produces a (partial) pHS-tree). Then, Algorithm 2 only stores

• a set of nodes Dcalc where each node corresponds to the edge labels along a path in T leading to a leaf node that has been labeled by valid (minimal diagnoses w.r.t. ⟨K, B, P, N⟩R),

• a list of open (non-closed) nodes Q where each node in Q corresponds to the edge labels along a path in T leading from the root node to a leaf node that has been generated, but has not yet been labeled and

• the set Ccalc of already computed minimal conflict sets w.r.t. ⟨K, B, P, N⟩R that have been used to label non-leaf nodes in T.

We call ⟨Dcalc, Q, Ccalc⟩ the relevant data of T. If T is a pHS-tree, then Q is the empty list.

This internal representation of the constructed (partial) pHS-tree by its relevant data does not constrain the functionality of the algorithm. This holds as diagnoses are paths from the root, i.e. nodes in the internal representation, and the goal of a (partial) pHS-tree is to determine minimal diagnoses w.r.t. the given DPI. The node labels or edge labels along a certain path and their order along this path is completely irrelevant when it comes to finding a label for the leaf node of this path. Instead, only the set of edge labels is required for the computation of the label for a leaf node. Also, to rule out nodes corresponding to non-minimal diagnoses, it is sufficient to know the set of already found diagnoses  Dcalc. No already closed nodes are needed for the correct functionality of Algorithm 2.
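The following Python sketch illustrates how a breadth-first construction can get by with the relevant data ⟨Dcalc, Q, Ccalc⟩ alone. It is a simplified illustration, not Algorithm 2: there is no timeout and no nmin/nmax, the queue is plain first-in-first-out, and the callback `find_conflict`, which stands in for computing a minimal conflict set not hit by a node, is a hypothetical helper backed here by a fixed list of conflict sets.

```python
from collections import deque
from typing import Callable, FrozenSet, List, Optional, Tuple

Node = FrozenSet[str]

# Hypothetical toy data: all minimal conflict sets of some DPI.
ALL_MIN_CONFLICTS = [frozenset({"ax1", "ax2"}), frozenset({"ax2", "ax3"})]


def toy_find_conflict(node: Node) -> Optional[Node]:
    """Return a minimal conflict set not hit by node, or None if none exists."""
    for c in ALL_MIN_CONFLICTS:
        if not (c & node):
            return c
    return None


def hs_tree_bfs(find_conflict: Callable[[Node], Optional[Node]]
                ) -> Tuple[List[Node], List[Node]]:
    d_calc: List[Node] = []              # nodes labeled valid
    c_calc: List[Node] = []              # conflict sets computed so far
    queue = deque([frozenset()])         # open nodes; the root is the empty set
    while queue:
        node = queue.popleft()
        if any(d <= node for d in d_calc):   # non-minimality rule: closed
            continue
        if node in queue:                    # duplicate rule: closed
            continue
        # Reuse an already computed conflict set if one is not hit by node.
        label = next((c for c in c_calc if not (c & node)), None)
        if label is None:
            label = find_conflict(node)
            if label is None:                # no conflict left: valid
                d_calc.append(node)
                continue
            c_calc.append(label)
        for ax in sorted(label):             # one successor per element
            queue.append(node | {ax})
    return d_calc, c_calc


diagnoses, conflicts = hs_tree_bfs(toy_find_conflict)
print([sorted(d) for d in diagnoses])
```

Only sets of edge labels are kept: closed nodes are simply dropped, and neither inner nodes nor edges are stored. On the toy data the loop ends with an empty queue and the two nodes {ax2} and {ax1, ax3} labeled valid.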

Initialization. First, Algorithm 2 initializes the variable tstart with the current system time (GETTIME), the set of calculated minimal diagnoses Dcalc to the empty set and the ordered queue of open nodes Q to a list including the empty set only (i.e. only the unlabeled root node).

The Main Loop. Within the loop (line 5) the algorithm gets the node to be processed next, namely the first node node (GETFIRST, line 6) in the list of open nodes Q ordered by the function pnodes(), and removes node from Q (DELETEFIRST, line 7). Note that pnodes() can be directly obtained from p(). As mentioned before, for the moment the reader should simply suppose that pnodes() imposes an order on Q which effectuates a breadth-first labeling of open nodes in the tree. A formal definition of pnodes() is given by Definition 4.9, after its motivation and detailed explanation in Section 4.6.

Computation of Node Labels. Then, a label is computed for node in line 8. Nodes are labeled by valid, closed or a minimal conflict set w.r.t. ⟨K, B, P, N ⟩R by the procedure LABEL (line 18 ff.). This procedure gets as inputs the DPI ⟨K, B, P, N ⟩R, the current node node, the set of already computed minimal conflicts (Ccalc) and minimal diagnoses (Dcalc) and the queue Q of open nodes, and it returns an updated set of computed minimal conflicts Ccalc and a label for node. It works as follows:

A node node is labeled by closed iff (a) there is an already computed minimal diagnosis D in Dcalc that is a subset of this node, i.e. D ⊆ node, which means that node cannot be a minimal diagnosis (non-minimality criterion, lines 19-21) or (b) there is some node nd in the queue of open nodes Q such that node = nd, which means that one of the two tree branches with an equal set of edge labels can be closed, i.e. removed from Q (duplicate criterion, lines 22-24).

If none of these closed-criteria is met, the algorithm searches for some C in Ccalc, the set of already computed minimal conflict sets, such that C ∩ node = ∅ and returns the label C for node (reuse criterion, lines 25-27). This means that the path represented by node cannot be a diagnosis as there is (at least) one minimal conflict set, namely C, that is not hit by node.

If the reuse criterion does not apply, a call to QX(⟨K \ node, B, P, N ⟩R) is made (line 28) in order to check whether there is a not-yet-computed minimal conflict set that is not hit by node. Note that the KB K \ node that is given to QX as part of the argument DPI ensures that only minimal conflict sets C ⊆ K \ node can be computed, i.e. ones that do not share any single formula with node (cf. Section 4.4.1).

Remark 4.3 A minimal conflict set computed by QX(⟨K \ node, B, P, N ⟩R) is indeed a minimal conflict set w.r.t. ⟨K, B, P, N ⟩R since (i) QX(⟨K \ node, B, P, N ⟩R) returning a set C means that C is a minimal conflict set w.r.t. ⟨K \ node, B, P, N ⟩R by Proposition 4.9 and (ii) the “⇒” direction of Corollary 4.1 implies that C is not valid w.r.t. ⟨·, B, P, N ⟩R and (iii) the “⇐” direction of Corollary 4.1 lets us conclude that C is a minimal conflict w.r.t. ⟨X, B, P, N ⟩R where X is any superset of C, in particular X := K.

QX may then return (a) ’no conflict’, i.e. K \ node is already valid w.r.t. ⟨·, B, P, N ⟩R, or (b) a new conflict set L ≠ ∅ such that L ∉ Ccalc. Note that the case of the output L = ∅ of QX cannot arise since (i) the DPI provided as input to the algorithm is assumed to be admissible, (ii) no other DPI for which QX is called can be non-admissible since admissibility is defined only by the sets B, P, N, R which remain unmodified throughout the execution of Algorithm 2, and (iii) as per Proposition 4.9, QX returns ∅ only if the DPI given to it as an argument is non-admissible. Further on, we point out that the conflict set L in case (b) must be a new conflict set since the reuse criterion is always checked before the call to QX and thus must be negative. That is, each C ∈ Ccalc is hit by node and L is not hit by node, wherefore L ≠ C must hold for all C ∈ Ccalc.

In each of the described cases, the LABEL procedure returns a tuple including the respective label as explained and the set Ccalc, where Ccalc is equal to the input argument Ccalc in all cases except for the case where a new minimal conflict set is computed by QX. In this case, the newly computed conflict set is added to Ccalc (line 32) before the procedure returns.

Processing of a Node Label. Back in the main procedure, Ccalc is updated (line 9) and then the label L returned by procedure LABEL is processed as follows:

If L = valid, then there is no minimal conflict set w.r.t. ⟨K, B, P, N ⟩R that is not hit by (i.e. has an empty intersection with) the current node node. Thus, node is added to the set of calculated minimal diagnoses Dcalc. Minimality of diagnoses added to Dcalc is guaranteed by the pruning rule (lines 19-21), which eliminates non-minimal nodes (paths), and by the way the tree is built level by level by the used breadth-first strategy. In case a uniform-cost variant of tree construction is used, certain properties of the function p() need to be postulated to preserve this minimality guarantee. We discuss these properties in Section 4.6.

If, on the other hand, L = closed is the returned label of the procedure LABEL, then there is either a minimal diagnosis in Dcalc that is a subset of the current node node or a duplicate of node is already included in Q. Consequently, node must simply be removed from Q, which has already been executed in line 7.

In the third case, if a minimal conflict set L is returned in line 8, then L is a label for node, meaning that |L| successor nodes of node need to be added to Q in sorted order using the function pnodes() (INSERTSORTED, line 15), as will be explained in more detail in Section 4.6.

Recap. To summarize, in each iteration, the node node that is the first element of the queue Q is deleted from Q and,

1. if node is a diagnosis, it is added to the set Dcalc;

2. if there is some diagnosis in Dcalc that is a proper subset of node or node is equal to some other node in Q, no action is performed, i.e. the algorithm deletes node without substitution;

3. if there is some minimal conflict set that node does not hit, then such a conflict set C is computed and for each ax ∈ C a new node node ∪ {ax} is added to Q.

We call each node nd that is added to Q in the latter case a successor of the node node.
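The recap above can be illustrated by a minimal Python sketch. It is not Algorithm 2 itself: the reasoner-based QX call is replaced by a hypothetical stand-in `find_unhit_conflict` that searches an explicitly given list of minimal conflict sets, and Q is a plain first-in-first-out queue (breadth-first case only). All function and variable names are assumptions made for this illustration.

```python
from collections import deque

def find_unhit_conflict(all_conflicts, node):
    """Hypothetical stand-in for QX: return some minimal conflict set
    that shares no formula with node, or None ('no conflict')."""
    for conflict in all_conflicts:
        if not (conflict & node):
            return conflict
    return None

def label(node, all_conflicts, c_calc, d_calc, queue):
    """Return (label, updated c_calc) following the order of criteria in the text."""
    # Non-minimality criterion: a known minimal diagnosis is a subset of node.
    if any(d <= node for d in d_calc):
        return "closed", c_calc
    # Duplicate criterion: an equal node is still waiting in Q.
    if node in queue:
        return "closed", c_calc
    # Reuse criterion: an already computed conflict set is not hit by node.
    for conflict in c_calc:
        if not (conflict & node):
            return conflict, c_calc
    # Otherwise ask the (stand-in) conflict computation.
    new_conflict = find_unhit_conflict(all_conflicts, node)
    if new_conflict is None:
        return "valid", c_calc
    return new_conflict, c_calc + [new_conflict]

def breadth_first_diagnoses(all_conflicts):
    """Run until Q is empty; return the relevant data (d_calc, c_calc)."""
    d_calc, c_calc = [], []
    queue = deque([frozenset()])          # only the unlabeled root node
    while queue:
        node = queue.popleft()            # GETFIRST + DELETEFIRST
        lab, c_calc = label(node, all_conflicts, c_calc, d_calc, queue)
        if lab == "valid":
            d_calc.append(node)
        elif lab != "closed":
            for ax in sorted(lab):        # one successor per formula in the conflict
                queue.append(node | {ax})
    return d_calc, c_calc

conflicts = [frozenset({"ax1", "ax2"}), frozenset({"ax2", "ax3"})]
diagnoses, used_conflicts = breadth_first_diagnoses(conflicts)
```

For the two toy conflict sets, the sketch finds the diagnosis {ax2} before {ax1, ax3}, in line with the increasing-cardinality behavior discussed in Section 4.5.2.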

4.5.2 Correctness of Breadth-First Diagnosis Computation

For the discussion of the output of Algorithm 2 we will exploit the following result saying that Algorithm 2 computes all and only minimal diagnoses, if it executes until the queue of open nodes becomes the empty list.

Proposition 4.11 (Soundness and Completeness of Algorithm 2 using Breadth-First Search). Let ⟨K, B, P, N ⟩R be an admissible DPI given as input to Algorithm 2. If Algorithm 2 using a breadth-first tree construction strategy terminates due to Q = [], then the algorithm returns exactly the set of all minimal diagnoses w.r.t. ⟨K, B, P, N ⟩R.

Proof. This proposition is a consequence of Proposition 4.10 and the following Lemma 4.13 which witnesses that Algorithm 2 using a breadth-first tree construction strategy produces a pHS-tree as per Definition 4.8.

Lemma 4.13. Algorithm 2 with the admissible input DPI ⟨K, B, P, N ⟩R using a breadth-first tree construction strategy is a procedure for producing a pHS-tree T w.r.t. ⟨K, B, P, N ⟩R.

Proof. We verify whether all rules given by Definitions 4.7 and 4.8 are satisfied by Algorithm 2.

Definition 4.7, rule 1: The root node ∅, which is the only element of the initial list Q, is labeled by the first call to LABEL for node := ∅ in line 8. If valid is returned, then QX(⟨K, B, P, N ⟩R) must have returned ’no conflict’, which is the case if K is valid w.r.t. ⟨·, B, P, N ⟩R.

Otherwise, if valid is not returned by LABEL, then some minimal conflict set L w.r.t. ⟨K, B, P, N ⟩R must have been returned in line 33. L is a minimal conflict set w.r.t. ⟨K, B, P, N ⟩R by Proposition 4.9 and since QX(⟨K, B, P, N ⟩R) has not returned ’no conflict’, as otherwise valid would have been returned, contradicting our assumption, and since ⟨K, B, P, N ⟩R is an admissible DPI by assumption. LABEL cannot have returned earlier in line 21 or line 24, since Dcalc is the empty set and Q the empty list at this time. The former holds since Dcalc is only extended in line 11, which cannot ever have been reached before the first call to LABEL has returned. The latter holds as Q initially contained only ∅ and as ∅ was deleted from Q in line 7 before the call to LABEL was made in line 8.

Definition 4.7, rule 2: If a node node is labeled by valid, then it is added to Dcalc in line 11. Since node can only get a label different from closed if it is the only exemplar of this node in Q due to the duplicate criterion (lines 22-24), it must be the case that node ∉ Q (line 7) after node has been labeled by valid. Only nodes that get labeled by a conflict set can have successor nodes added to Q in line 15. Only nodes in Q can get a label (cf. lines 6 and 8). For node to be added to Q at some later point in time there must be a proper subset of node that is still in Q as each node newly added to Q is a proper superset of some node in Q (cf. line 15 which is the only position in the algorithm where nodes are added to Q). This is impossible due to the breadth-first tree construction strategy which implies that all nodes of cardinality |node| − 1 have already been labeled (and thus deleted from Q in line 7) when node is being labeled. Hence, if node is labeled by valid, then it has no successors.

If node is labeled by some conflict set L, then Algorithm 2 must come to line 15, where a successor node ∪ {e} is added to Q for all e ∈ L.

How node nodee := node ∪ {e} must be labeled is overridden by the rules 3, 4 and 6 of Definition 4.8 (see below).

Definition 4.8, rule 1: This is true by our assumption about p() and pnodes().

Definition 4.8, rule 2: This holds since QX(⟨K \ node, B, P, N ⟩R) computes only minimal conflict sets w.r.t. ⟨K, B, P, N ⟩R (cf. Remark 4.3).

Definition 4.8, rule 3: All minimal conflict sets that have been used to label nodes so far are stored in Ccalc. Before a minimal conflict to label node might be computed by a call to QX in line 28, the reuse criterion in lines 25-27 checks whether there is a set C in Ccalc with C ∩ node = ∅. If positive, C is returned as a label for node.

Definition 4.8, rule 4: This is accomplished by the non-minimality criterion in lines 19-21 which checks for existence of a node already labeled by valid which is a subset of the node to be labeled right now. All nodes labeled by valid are stored in Dcalc (cf. lines 10 and 11).

Definition 4.8, rule 5: If some node node is labeled by closed, then no action is performed (cf. line 12). Before each node is labeled in line 8, it is deleted from Q in line 7. That node cannot be inserted into Q at some later point in time follows from the argumentation used above to demonstrate that Definition 4.7, rule 2 is met.

Definition 4.8, rule 6: This is achieved by the duplicate criterion in lines 22-24 where Q is browsed for some node equal to the one that is to be labeled right now. When some node node is next to be labeled, then all duplicates of node must already be in Q as reasoned above in the argumentation to show that Definition 4.7, rule 2 is satisfied. Thus, the criterion must search for duplicates in no other collections than Q. Indeed, only one (i.e. the last non-deleted) exemplar of these duplicates of node in Q can get a label other than closed due to the duplicate criterion which closes duplicates as long as there are any.

We conclude that Algorithm 2 is a procedure for constructing a pHS-tree.

By Proposition 4.11 and the fact that there is no place in Algorithm 2 where nodes are removed from Dcalc (which implies that only minimal diagnoses can be added to Dcalc), the following corollary is obvious.

Corollary 4.4. Algorithm 2 with the admissible input DPI ⟨K, B, P, N ⟩R using a breadth-first tree construction strategy stores by ⟨Dcalc, Q, Ccalc⟩ the relevant data of

a pHS-tree w.r.t. ⟨K, B, P, N ⟩R if Algorithm 2 stops due to Q = [],

a partial pHS-tree w.r.t. ⟨K, B, P, N ⟩R otherwise.

If a pHS-tree is computed in breadth-first order, minimal diagnoses are generated with increasing cardinality, as the following Corollary 4.5 attests. Consequently, for the generation of all minimum cardinality diagnoses, the tree has to be generated only up to the first level at which a node is labeled by valid.

Corollary 4.5. The following holds for the set D returned by Algorithm 2 using breadth-first search: If D contains some diagnosis of cardinality k, then it includes all diagnoses w.r.t. ⟨K, B, P, N ⟩R of cardinality lower than k.

Proof. By Proposition 4.11, it is a fact that Algorithm 2 computes all and only minimal diagnoses w.r.t. ⟨K, B, P, N ⟩R. As these are computed in breadth-first order, the first computed diagnoses must be the minimum cardinality ones. To see this, assume that Algorithm 2 returns D which includes one non-minimum cardinality diagnosis D and does not comprise a minimum cardinality diagnosis D′, i.e. |D| > |D′|. By breadth-first search, nodes are labeled in ascending order of their cardinality. And, if the first node of cardinality k is labeled, no more nodes of cardinality k − 1 can be in Q (cf. proof of Lemma 4.13). So, we have that the pHS-tree obtained by further execution of the algorithm until Q = [] can never label D′ since |D| > |D′| and D has already been labeled. Hence, the algorithm would not return D′ in its final output D. Since each minimum cardinality diagnosis is a minimal diagnosis, D′ is a minimal diagnosis. Thus, we have a contradiction to the fact that the algorithm computes all minimal diagnoses.

Output. The repeat-loop is iterated until the stop criterion (line 16) applies. In case at least nmin minimal diagnoses w.r.t. ⟨K, B, P, N ⟩R exist, there are two cases:

If the finding of the nmin-th minimal diagnosis happens after t′ < t time has passed since the start of Algorithm 2, then the algorithm will continue iterating and terminate only if execution time amounts to at least t time or |D| = nmax at the time line 16 is processed.

Otherwise, if the detection of the nmin-th minimal diagnosis takes place after processing longer than t time, then the algorithm will terminate immediately after having determined the nmin-th minimal diagnosis.

In both cases, the output is a set D of minimal diagnoses w.r.t. ⟨K, B, P, N ⟩R such that nmin ≤ |D| ≤ nmax and D is the set of best minimal diagnoses as per p(), in this case the set of minimal diagnoses with minimum cardinality since p() is assumed to be specified as to cause a breadth-first tree construction.

If fewer than nmin minimal diagnoses exist w.r.t. ⟨K, B, P, N ⟩R, then Q = [] will be the cause for the algorithm to terminate. In this case, the pHS-tree w.r.t. ⟨K, B, P, N ⟩R has been built up and all minimal diagnoses w.r.t. ⟨K, B, P, N ⟩R are stored in Dcalc. Thus, the output is the set mD⟨K,B,P,N⟩R of all minimal diagnoses w.r.t. ⟨K, B, P, N ⟩R.
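The stop criterion described in this paragraph can be condensed into one predicate. The following Python sketch is reconstructed from the prose only, since line 16 of Algorithm 2 is not reproduced in this section; the function name and the parameter names `elapsed`, `t`, `n_min` and `n_max` are assumptions.

```python
def should_stop(queue, d_calc, elapsed, t, n_min, n_max):
    """Sketch of the stop criterion as described in the text.

    queue   -- open nodes Q
    d_calc  -- minimal diagnoses found so far
    elapsed -- time passed since the start of the algorithm
    """
    if not queue:                    # Q = []: the whole pHS-tree has been built
        return True
    if len(d_calc) < n_min:          # keep going until n_min diagnoses are known
        return False
    # At least n_min diagnoses: stop once the time budget or n_max is reached.
    return len(d_calc) >= n_max or elapsed >= t

# Two diagnoses found after 5 time units with t = 10: continue iterating.
keep_going = not should_stop(["open"], ["D1", "D2"], 5, 10, 2, 5)
```

If the n_min-th diagnosis is found only after more than t time units, the last line makes the predicate true immediately, which matches the second case above.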

Termination. The next proposition shows that Algorithm 2 must yield a set of minimal diagnoses after finite time.

Proposition 4.12. Algorithm 2 always terminates.

Proof. This is due to the fact that minimal conflict sets used to label non-leaf nodes are subsets of K and that nodes in Q are subsets of K, which is a finite set by Definition 3.1. Moreover, a node in Q is either deleted without substitution from Q if valid or closed (line 7) or deleted (line 7) and replaced by proper supersets of it (INSERTSORTED in line 15). This means that the cardinality of all nodes in Q is strictly monotonically increasing. Thus each node (path) node is guaranteed to be closed (valid or closed) when node = K as in this case node must hit all possible (minimal) conflict sets Ci w.r.t. ⟨K, B, P, N ⟩R since Ci ⊆ K holds by Definition 4.1. So, after finite time the queue Q definitely becomes the empty list, which is a stop criterion (line 16).

The argumentation so far proves the following

Proposition 4.13. Let ⟨K, B, P, N ⟩R be an admissible DPI, t, nmin, nmax ∈ N and p : K → (0, 0.5) defined in a way that Q is always ordered first-in-first-out. For these inputs, Algorithm 2 always terminates and returns a set D of minimal diagnoses w.r.t. ⟨K, B, P, N ⟩R which is

the set of the |D| minimal diagnoses of minimum cardinality w.r.t. ⟨K, B, P, N ⟩R (i.e. the first |D| elements in mD⟨K,B,P,N⟩R if mD⟨K,B,P,N⟩R is assumed to be sorted in ascending order by cardinality) such that nmin ≤ |D| ≤ nmax, if at least nmin minimal diagnoses exist w.r.t. ⟨K, B, P, N ⟩R, or

the set of all minimal diagnoses w.r.t. ⟨K, B, P, N ⟩R, otherwise.

4.6 Diagnosis Probability Space

The induction of a probability space [Dur10] over diagnoses facilitates incorporation of well-established probability theoretic methods into the process of KB debugging; for example, a Bayesian approach [SFFR12, RSFF13, dKW87] for identifying the true diagnosis, i.e. the one which leads to a solution KB with the desired semantics, by repeated measurements (see Part II). Let the true diagnosis be denoted as Dt in the sequel.

The Probability Space of All Diagnoses. From the point of view of probability theory, a diagnosis can be viewed as an atomic event in a probability space ⟨Ω, ℰ, p⟩ defined as follows:

Ω is the sample space consisting of all possible diagnoses w.r.t. a DPI ⟨K, B, P, N ⟩R, i.e. Ω = aD⟨K,B,P,N⟩R,

ℰ is a sigma-algebra on Ω, in our case the powerset 2^Ω of Ω, and

p is a probability measure assigning a probability to each event in ℰ, i.e. p : ℰ → [0, 1] such that $\sum_{\omega \in \Omega} p(\{\omega\}) = 1$, which means $\sum_{D \in aD_{\langle K,B,P,N \rangle_R}} p(\{D\}) = 1$.

So, p({D}) for D ∈ aD⟨K,B,P,N⟩R can be seen as the probability that D is the true diagnosis, i.e. the probability of the event Dt = D (or Dt ∈ {D}). Consequently, p({D}) for D ∈ aD⟨K,B,P,N⟩R is the probability distribution of the random variable Dt, i.e. the probability distribution of the true diagnosis. In this vein, the probability of a set {Di, . . . , Dj} ∈ ℰ is interpreted as the likeliness of this set to comprise the true diagnosis Dt. That is, p({Di, . . . , Dj}) = p(Dt ∈ {Di, . . . , Dj}) = p(Dt = Di ∨ · · · ∨ Dt = Dj) = 0.3 means that Dt is an element of {Di, . . . , Dj} with 30% probability. Note that singletons are often written without curly braces, i.e. p({Di}) is usually written as p(Di); we will also do so in the rest of this work.

The elements of the sample space Ω of a probability space are often called atomic events because they must be mutually exclusive (i.e. two atomic events cannot “happen” at the same time as an outcome of the fictive experiment a probability space describes) and exhaustive (i.e. for each “execution” of the experiment the probability space describes one atomic event must “happen”). Since the true diagnosis Dt must be a diagnosis w.r.t. ⟨K, B, P, N ⟩R and Ω by definition comprises all such diagnoses, exhaustiveness is clearly fulfilled. Mutual exclusiveness is a consequence of the fact that each diagnosis D gives complete information about the correctness of each formula ax_k ∈ K. In other words, Dt ∈ {D} is a shorthand for the statement that all ax_i ∈ D are faulty and all ax_j ∈ K \ D are correct. Thus, any two different diagnoses are mutually exclusive events, i.e. Dt = Di implies Dt ≠ Dj for all Dj ∈ aD such that Di ≠ Dj.

The probability measure p is completely defined if a probability p(D) for each diagnosis D ∈ Ω is given. Then, by the mutual exclusiveness of events Dt ∈ {Di} and Dt ∈ {Dj} for Di ≠ Dj, the probability

$$p(E) = \sum_{D \in E} p(D)$$

for each event E ∈ ℰ.

Restricted Probability Spaces of Diagnoses. In many cases, only a restricted set of diagnoses w.r.t. a DPI is considered relevant for the debugging task. That is, the focus is on locating the true diagnosis among a predefined subset of all diagnoses aD⟨K,B,P,N⟩R. This involves an adaptation of the probability space, in particular of the set Ω. For instance, if not the set of all, but only the set of minimal diagnoses mD⟨K,B,P,N⟩R w.r.t. ⟨K, B, P, N ⟩R should be considered by a debugging system, as motivated in Section 3.1, then Ω := mD⟨K,B,P,N⟩R. The other properties ℰ = 2^Ω and $\sum_{\omega \in \Omega} p(\{\omega\}) = 1$ remain the same for each restricted probability space, but depend on Ω. Thus, for example, a probability p(D) for D ∈ mD⟨K,B,P,N⟩R ⊆ aD⟨K,B,P,N⟩R must be generally defined differently, i.e. assigned a higher value, when Ω = mD⟨K,B,P,N⟩R instead of Ω = aD⟨K,B,P,N⟩R. This is due to the condition that all probabilities of atomic events in Ω must sum up to 1. In practice, because of the computational complexity of diagnosis computation, the used probability space will usually need to be restricted even further in that Ω comprises only a set of “leading diagnoses” which is a subset of all minimal diagnoses w.r.t. a DPI (see Chapter 7).
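To make the effect of restricting Ω concrete, the following Python sketch computes event probabilities as sums over atomic events and rescales a distribution to a restricted sample space. Rescaling by a common factor is only one possible way to satisfy the sum-to-one condition and is an assumption of this sketch; the diagnosis names and probability values are invented for illustration.

```python
from fractions import Fraction

def event_probability(p, event):
    """p(E) as the sum of the probabilities of the atomic events in E."""
    return sum((p[d] for d in event), Fraction(0))

def restrict(p, omega_restricted):
    """Rescale p to a restricted sample space so that it sums to 1 again
    (assumption of this sketch: all retained diagnoses are scaled equally)."""
    total = event_probability(p, omega_restricted)
    return {d: p[d] / total for d in omega_restricted}

# Hypothetical space of all diagnoses; D3 is a superset of D1, hence non-minimal.
D1 = frozenset({"ax1"})
D2 = frozenset({"ax2", "ax3"})
D3 = frozenset({"ax1", "ax2"})
p_all = {D1: Fraction(1, 2), D2: Fraction(1, 4), D3: Fraction(1, 4)}

p_min = restrict(p_all, [D1, D2])   # sample space of minimal diagnoses only
```

Each minimal diagnosis receives a higher probability in the restricted space (2/3 instead of 1/2 for D1, 1/3 instead of 1/4 for D2), as the text requires.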

4.6.1 Construction of a Probability Space

Since a diagnosis constitutes an assumption about the correctness of each formula in the KB, the probability of a diagnosis D (to be the true diagnosis Dt) can be computed by means of fault probabilities of formulas. In other words, computing the probability of the event D = Dt corresponds to computing the probability of the event that exactly all formulas in D are faulty and all other formulas in the KB are correct.
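As a rough illustration, suppose that formulas are faulty independently of one another; this independence is an assumption of the sketch and is not stated in the paragraph above. Then the probability that exactly the formulas in D are faulty is the product of p(ax) over ax ∈ D and of 1 − p(ax) over ax ∈ K \ D. The fault probabilities used below are invented.

```python
from fractions import Fraction

def diagnosis_probability(fault_prob, diagnosis):
    """Probability that exactly the formulas in `diagnosis` are faulty.

    fault_prob maps every formula of the KB to a (hypothetical) fault
    probability; independent formula faults are assumed.
    """
    prob = Fraction(1)
    for ax, p_ax in fault_prob.items():
        prob *= p_ax if ax in diagnosis else 1 - p_ax
    return prob

fault_prob = {
    "ax1": Fraction(1, 10),
    "ax2": Fraction(1, 5),
    "ax3": Fraction(1, 4),
}
p_d = diagnosis_probability(fault_prob, {"ax1"})   # 1/10 * 4/5 * 3/4
```

Values obtained this way would still have to be normalized over the chosen sample space Ω so that the probabilities of all considered diagnoses sum to 1.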

Estimating Fault Probabilities of Formulas in the KB

Next we discuss various possibilities of how the probability of an ax ∈ K might be assessed. To this end, we first make a distinction between situations where some useful empirical data is available or not and then we differentiate between different sorts of such available data and how to take advantage of it.

Empirical Data is Accessible. Let us first reflect on how to utilize different empirical data sources in order to compute formula probabilities. Data can be of the following kinds (enumeration may not be complete):

(a) Regarding formulas: Change logs of formulas in the KB

(b) Regarding the user: Data about common mistakes of the user who has formulated the KB

Ad (a): Prerequisite for the availability of change logs of formulas in the KB is the usage of some KB engineering software with integrated logging or change management. Examples of such KB (ontology) developing environments are Protégé [NSD+00], Web Protégé [TNNM13], SWOOP [KPS+06], OntoEdit [SEA+02] or KAON2. Given a formula ax ∈ K and its change log, the fault probability p(ax) of this formula can be estimated by counting the number of modifications accomplished for ax in the change log. The intuition is, the more often ax has been altered, the more uncertain the (set of) author(s) might be about its correctness. This method of probability computation however suffers from a cold-start problem. If a KB is completely newly created, then such information is not available at all. On the other hand, for KBs that are being developed over a long period of time, this method can be assumed to be a rather reliable way of assessing the likeliness of formulas to be faulty.

Ad (b): Clearly, data about common mistakes of a user has to be related to some type of entity that is recurrent and not dependent on a particular KB. Formulas are therefore not suitable and too coarse-grained since one and the same formula will rarely occur in many KBs. More adequate entities to relate a user fault to are predicates (terms) and logical connectives; these usually (re-)appear in many different KBs. In this way, the extrapolation and reusability of collected personal fault information of a user within one KB and between different KBs is granted.

One way of obtaining data about common mistakes of user u on this syntactical level is, for instance, the examination of diagnoses got as a result of past debugging sessions performed on KBs authored by u. Another way is, again, to use the change logs (if available) of formulas in KBs user u has created in the past.

Given such a past diagnosis D, we know that all formulas ax ∈ D that had been written by u have been confirmed to be faulty by a user. So, these formulas could be analyzed for contained predicates (terms) and logical connectives and the probability of being faulty of those syntactical constructs could be raised relative to those constructs that do not occur in formulas in D. At this, the following assumptions could be made:

If a formula has been confirmed to be faulty by the user, then the meaning of all predicates (terms) appearing in this formula is not correct (because in the domain that should be modeled the relationship between the predicates (terms) occurring in the formula stated by the formula must not hold). So, all predicates (terms) in ax get more suspicious of being faulty in general if ax ∈ D for some past solution diagnosis D.

If a formula including some logical connective is part of some past solution diagnosis, then this type of logical connective gets more suspicious of being faulty in general.

When exploiting change logs of formulas authored by u, the following assumptions could be made:

If a formula has been modified, then a user has changed the meaning of all predicates (terms) appearing in this formula. So, all predicates (terms) in ax get more suspicious of being faulty in general if ax has been edited at least once. The more often it has been altered, the more suspicious the predicates (terms) get.

If some logical connective in a formula is modified, i.e. deleted or added, then this type of logical connective gets more suspicious of being faulty in general.

The following example should give an intuition of these assumptions:

Example 4.5 Imagine the situation where the author of formula ax := ∀X pet(X) ↔ animal(X) ∧ (∃Y hasOwner(X, Y ) ∧ person(Y )) is known to have only vague knowledge about the predicate pet and to frequently interchange ∧ and ∨ when formulating logical formulas. This could be reflected by the assignment of higher fault probability to the predicate pet than to the predicates animal, hasOwner and person and by raising the fault probability of ∧ as well as ∨ compared to other logical connectives available in the used logic L. Then, formula ax should intuitively have a higher probability of being faulty than, e.g., formula ax′ = ∀X animal(X) → ¬person(X) since ax′ does not include any of the “suspicious” terms or connectives as ax does.

A probability of 0.25 of some predicate (term) a occurring in K could then account for the observation made in the logs that, in past debugging sessions (not necessarily related to the current KB K), every fourth formula formulated by user u which includes the term a was modified at least once. Similarly, another term b could be assigned fault probability 0.5 which could reflect that formulas formulated by u including b have been altered twice as often as formulas formulated by u comprising a. Given additionally that a occurred in two formulas formulated by u of past diagnoses whereas b did not occur in any, the probability of a could be increased by some addend or factor to take account of this.

Concerning some logical connective, say ∃, given the observation that all past diagnosis formulas contained ∃ and that in 80% of the formulas formulated by this user including ∃ the ∃ connective has been modified at least once, the fault probability of ∃ might be assigned rather high. In comparison, for some other connective, say ¬, occurring in no diagnosis and having been altered only in 10% of the formulas comprising ¬, the fault probability might be estimated rather low.
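The log-based estimate from this example can be sketched in a few lines of Python. The data layout (a list of pairs of term set and modification count per formula) and the function name are assumptions made for illustration; the sketch covers only the "modified at least once" ratio, not the additional adjustment for terms occurring in past diagnoses.

```python
def term_fault_probability(formulas, term):
    """Fraction of formulas containing `term` that were modified at least once.

    formulas -- list of (set_of_terms, number_of_modifications) pairs taken
                from hypothetical change logs of one user
    Returns None if the term never occurs (cold-start case).
    """
    mod_counts = [mods for terms, mods in formulas if term in terms]
    if not mod_counts:
        return None
    return sum(1 for mods in mod_counts if mods >= 1) / len(mod_counts)

# Invented log: four formulas mention a, two mention b.
log = [
    ({"a"}, 0),
    ({"a"}, 0),
    ({"a", "b"}, 3),
    ({"a"}, 0),
    ({"b"}, 0),
]
p_a = term_fault_probability(log, "a")   # one of four formulas with a was edited
p_b = term_fault_probability(log, "b")   # one of two formulas with b was edited
```

The `None` result for unseen terms makes the cold-start problem discussed next explicit: without logs, some other source of probabilities is needed.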

A shortcoming of this approach is again a cold-start problem. If a user is new to conceptualizing knowledge in a structured logical manner or at least in the given logical language L, then no such (personalized) past diagnoses or change logs will be available. So, this issue especially concerns beginners who are usually anyhow more prone to errors than expert users. On the positive side, utilization of such empirical data can yield fault information that is very well tailored for the user and that can imply a significant reduction of computation time and user effort necessary for debugging of the KB at hand [SFFR12].

No Empirical Data is Available. If no data of the kinds (a) and (b) discussed above is available to a debugging system, then we have the following possibilities:

(c) Common fault patterns

(d) Subjective self-assessment of a user

(e) Examination of structural complexity of logical formulas

(f) Using no probabilities

Ad (c): A common fault pattern [RDH+04, CRV+09, KPSCG06], also called anti-pattern, refers to a set of formulas that either leads to an inconsistency (logical anti-pattern) or corresponds to a potential modeling error that, alone, does not lead to an inconsistency or incoherency (non-logical anti-pattern), but still might become a source of inconsistency if merged with other formulas (cf. Section 3.2). Although most of these patterns incorporate more than one formula, which makes the individual consideration of a formula in terms of fault probability calculation difficult, an idea to incorporate knowledge about anti-patterns to probability estimation of formulas could be to count for each ax ∈ K in how many different (logical or non-logical) anti-patterns it occurs. The higher this count, the more likely a formula might be involved in a conflict set and thus in the true diagnosis.

A drawback of this method could be that most of the formulas involved in a KB might not correspond to any formula occurring in an anti-pattern. Thus, one might end up with no probability estimate for most of the formulas in a KB K. Besides that, the information provided by these anti-patterns is not personalized at all and therefore might significantly diverge from the true fault probabilities for a user and lead to a false bias in the used fault data. This justifies to basically rely on another approach to get a first estimate of a formula’s likeliness of being faulty and use this method only to make adaptations to already established probabilities.
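The counting idea from (c) can be sketched as follows. The sketch assumes that anti-pattern occurrences in the KB have already been detected by some external means and are given as sets of formulas; how such occurrences are found is not covered, and all names are hypothetical.

```python
def antipattern_counts(kb, pattern_instances):
    """Count for each formula in how many detected anti-pattern instances it occurs.

    kb                -- iterable of formula identifiers
    pattern_instances -- list of sets of formulas, one set per detected
                         anti-pattern occurrence (assumed to be given)
    """
    counts = {ax: 0 for ax in kb}
    for instance in pattern_instances:
        for ax in instance:
            if ax in counts:
                counts[ax] += 1
    return counts

kb = ["ax1", "ax2", "ax3", "ax4"]
instances = [{"ax1", "ax2"}, {"ax2", "ax3"}]
counts = antipattern_counts(kb, instances)
```

Formulas such as ax4 that occur in no anti-pattern keep the count zero, which mirrors the drawback noted above: for them this method yields no usable differentiation.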

Ad (d): The method of a user’s self-assessment of own fault probabilities supposes a user to be able to specify fault probabilities of predicates (terms), logical connectives or complete formulas by themselves. Since users not always have a clear picture of own strengths and weaknesses, this variant must be regarded with suspicion. Furthermore, in settings where several persons are involved in the engineering of the KB, a reasonable rating of fault probabilities of terms, connectives or formulas authored by other persons might be difficult or impossible for a user.

Ad (e): Here the idea is to examine “grammatical” (i.e. syntactical) aspects of formulas such as the “nesting depth” of subordinate clauses or the mere “length” of a formula. The underlying assumption can be that higher length and/or deeper nesting means higher complexity and cognitive difficulty in understanding of the formula’s semantics, as it does in natural language. For instance, it is reasonable to expect formulas like ax1 := ∀X a(X) → (∃Y r1(X, Y ) ∧ (∀Z r2(Y, Z) → b(Z))) to tend to be more error-prone and more likely to be faulty than ax2 := ∀X g(X) → b(X). This intuition is modeled by the maximum nesting depth as well as by the length of ax1 in comparison to ax2. Using the analogy to natural language, the maximum nesting depth of a formula could roughly be defined as the maximum number of encapsulated subordinate clauses that cannot be “flattened” occurring in the natural language translation of the formula. For formula ax1, this would imply a maximum nesting depth of two; for ax2 it would amount to zero. The reason is that ax1 stated in natural language would sound “if somebody X is a, then there is somebody Y, who satisfies property r1 with X and for whom anybody, who satisfies property r2 with Y is b”. In this natural language formulation, there are two subordinate clauses, i.e. the clauses beginning with the word “who”; the first is at nesting depth one and the second at depth two. These subordinate clauses cannot be flattened, i.e. be brought to some lower depth, because the Z is related to the Y which in turn is related to the X. The length of formulas could be defined similarly as in [HPS08] which provides such a definition for DL languages. In this case the length of ax1 and ax2 would be four (roughly: four predicates in ax1) and two (two predicates in ax2), respectively.
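A crude syntactic proxy for these two measures can be computed on a formula tree. In the Python sketch below, formulas are nested tuples (a representation chosen only for this illustration), the nesting depth is approximated by the maximum number of nested quantifiers minus one, and the length by the number of predicate atoms. The proxy does not check whether a clause could be flattened, so it is not the natural-language-based definition from the text; it merely reproduces the values stated there for ax1 and ax2.

```python
QUANTIFIERS = {"forall", "exists"}

def quantifier_depth(formula):
    """Maximum number of quantifiers nested inside one another."""
    kind = formula[0]
    if kind == "atom":
        return 0
    if kind in QUANTIFIERS:              # ("forall", variable, body)
        return 1 + quantifier_depth(formula[2])
    return max(quantifier_depth(sub) for sub in formula[1:])

def nesting_depth(formula):
    """Proxy for the nesting depth: nested quantifiers minus one, at least 0."""
    return max(quantifier_depth(formula) - 1, 0)

def length(formula):
    """Proxy for the length: number of predicate atoms in the formula."""
    kind = formula[0]
    if kind == "atom":
        return 1
    subs = formula[2:] if kind in QUANTIFIERS else formula[1:]
    return sum(length(sub) for sub in subs)

# ax1 := forall X a(X) -> (exists Y r1(X,Y) and (forall Z r2(Y,Z) -> b(Z)))
inner = ("forall", "Z", ("implies", ("atom", "r2", "Y", "Z"), ("atom", "b", "Z")))
ax1 = ("forall", "X",
       ("implies", ("atom", "a", "X"),
        ("exists", "Y", ("and", ("atom", "r1", "X", "Y"), inner))))
# ax2 := forall X g(X) -> b(X)
ax2 = ("forall", "X", ("implies", ("atom", "g", "X"), ("atom", "b", "X")))
```

With these definitions, ax1 has nesting depth two and length four, and ax2 has nesting depth zero and length two.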

A disadvantage of such a “grammatical” approach becomes evident when most of the formulas in a KB are rather “simple”, i.e. have a low nesting depth and a short length. In such a case this method differentiates little between formulas and should thus, in general, be combined with another method of probability estimation.

Ad (f): In a situation where none of the aforementioned ways of gauging probabilities applies, or where they are believed to carry too high a risk of introducing a false bias into the debugging system, the solution is to define all formulas to be equally likely to be faulty. The obvious advantage is that the system cannot be misled by unreasonable fault probabilities, whereas the disadvantage is that possibly well-suited probabilistic information cannot be exploited. Moreover, experiments in our previous work [SFFR12] have shown that fault information of only “average” quality most often leads to a better performance than no fault information. Apart from that, we have suggested a reinforcement learning “plug-in” to a debugger which could successfully mitigate the negative effect of low-quality fault information and which in many cases, in spite of the low-quality fault information, even led to lower resource consumption (user, time) than a debugger without this plug-in using good fault information [RSFF13].

Collaborative KB Development. In a collaborative development scenario involving several authors, provenance information could additionally be leveraged to refine probability estimates (cf. [KPSCG06]). At this point, user skills could come into play; that is, formulas authored by more experienced authors get a lower overall fault probability than formulas authored by beginners with regard to KB engineering, logic skills or expertise in the modeled domain. This probability adaptation can also affect syntactical elements in that one and the same predicate (term) or logical connective can get a different probability depending on the formula in which it occurs and on who authored that formula.

Remark 4.4 Of course, these assumptions and methods of obtaining fault probabilities of syntactical elements and formulas are only some possible ways of doing so. For example, one might argue that the “authorship” of a formula is not clearly defined. What if user u_1 has originally written formula ax and then user u_2 alters the formula to become ax′? Who is the author of ax′: u_1, u_2 or both? For whose fault probability computation should a further modification of ax′ to ax′′ count? Questions like these need to be discussed, and evaluations using real data may be needed in order to find a practical answer, or perhaps to find out that completely different approaches are more reasonable. This is a topic of our future work.

Remark 4.5 By the definition of a DPI (Definition 3.1), which states that the KB K must be disjoint from the background knowledge B, and by the role B has within a DPI, namely to comprise all formulas that are definitely correct, we postulate that no formula ax ∈ K may have a fault probability of zero. In a situation where this is not the case, a modified DPI must be used in which such formulas have been moved from K to B.

Computation of Diagnosis Probabilities. In the following, we denote by $\overline{ax}$ ($\overline{K}$) the set of logical connectives and quantifiers occurring in a formula ax (in the KB K) and by $\widetilde{ax}$ ($\widetilde{K}$) the signature of ax (of K).

Example 4.6 Considering the DL formula ax := Pet ≡ Animal ⊓ ∃hasOwner.Person, we have that $\overline{ax}$ = {≡, ⊓, ∃} and $\widetilde{ax}$ = {Pet, Animal, hasOwner, Person}.

We now suppose that either a fault probability p(e) := p(“e is faulty”) of each element e ∈ $\overline{K} \cup \widetilde{K}$ or the fault probability p(ax) := p(“ax is faulty”) of each formula ax ∈ K is given. For the estimation of these probabilities any (combination) of the methods mentioned above might be employed. In case formula probabilities are given, diagnosis probabilities can be directly computed by Formula 4.3. Otherwise, the following pre-computations must be performed.

The fault probability p(ax) of ax can be calculated as the probability that at least one (occurrence of a) syntactical element in ax is faulty. So, p(ax) is equal to 1 minus the probability that none of the syntactical elements occurring in ax is faulty. Hence, under the assumption of mutual independence of syntactical faults concerning elements e ∈ $\overline{ax} \cup \widetilde{ax}$,

\[ p(ax) = 1 - \prod_{e \in \overline{ax} \cup \widetilde{ax}} (1 - p(e))^{n(e)} \tag{4.2} \]

where n(e) is the number of occurrences of syntactical element e in ax.
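The computation just described can be sketched in a few lines of Python. This is only an illustration under the stated independence assumption; the function name `formula_fault_probability` and all numeric values are hypothetical and are not taken from the source.

```python
from math import prod

def formula_fault_probability(element_probs, occurrences):
    """Return 1 - prod_e (1 - p(e))^n(e) for one formula.

    element_probs: dict mapping a syntactical element e to p(e) in (0, 1]
    occurrences:   dict mapping each element occurring in the formula to n(e)
    """
    return 1.0 - prod(
        (1.0 - element_probs[e]) ** n for e, n in occurrences.items()
    )

# Hypothetical fault probabilities for the elements of the formula in Example 4.6.
element_probs = {
    "≡": 0.10, "⊓": 0.05, "∃": 0.20,
    "Pet": 0.01, "Animal": 0.01, "hasOwner": 0.05, "Person": 0.01,
}
# Each element occurs exactly once in Pet ≡ Animal ⊓ ∃hasOwner.Person.
occurrences = {e: 1 for e in element_probs}

p_ax = formula_fault_probability(element_probs, occurrences)
print(round(p_ax, 4))
```

Note that an element occurring several times contributes one factor per occurrence, so frequent use of an error-prone connective raises p(ax) quickly.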

If p(ax) is known for all ax ∈ K, the fault probability p(D) of any diagnosis D ∈ Ω ⊆ aD_⟨K,B,P,N⟩_R can be determined as the probability that each formula in D is faulty whereas each formula in K \ D is correct, i.e. not faulty. Hence,

\[ p(D) = \prod_{ax_r \in D} p(ax_r) \prod_{ax_s \in K \setminus D} (1 - p(ax_s)) \tag{4.3} \]

Recall that the probabilities of all atomic events in a well-defined probability space must sum up to 1. As not every subset of K is a diagnosis, the probabilities p(D) of the diagnoses in Ω do in general not sum up to 1. Therefore, diagnosis probabilities need to be normalized, i.e. each diagnosis probability p(D) must be divided by the sum of all diagnosis probabilities for diagnoses in Ω. That is, the following adjustment is necessary:

\[ p(D) \leftarrow \frac{p(D)}{\sum_{D' \in \Omega} p(D')} \tag{4.4} \]
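As a minimal sketch of the two steps above (product over faulty and correct formulas, followed by normalization over Ω), consider the following Python fragment. The names `diagnosis_probability` and `normalize` as well as the three-formula KB and its probabilities are hypothetical.

```python
from math import prod

def diagnosis_probability(diagnosis, p):
    """Unnormalized p(D): every formula in D is faulty, every other formula in K is correct."""
    return prod(p[ax] if ax in diagnosis else 1.0 - p[ax] for ax in p)

def normalize(diagnoses, p):
    """Divide each p(D) by the sum over all diagnoses in Omega."""
    raw = {D: diagnosis_probability(D, p) for D in diagnoses}
    total = sum(raw.values())
    return {D: value / total for D, value in raw.items()}

# Hypothetical formula fault probabilities for a KB with three formulas.
p = {"ax1": 0.2, "ax2": 0.1, "ax3": 0.3}
# Hypothetical set Omega of diagnoses (not every subset of K is a diagnosis).
omega = [frozenset({"ax1"}), frozenset({"ax2", "ax3"})]

posterior = normalize(omega, p)
for D, value in posterior.items():
    print(sorted(D), round(value, 2))
```

Before normalization the two values are 0.126 and 0.024, which sum to 0.15 rather than 1; after normalization they sum to 1.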

We want to emphasize that the probability measures p(e) of syntactical elements e and p(ax) of formulas ax are not required to satisfy any conditions except for p(e) ∈ (0, 1] and p(ax) ∈ (0, 1] for all e ∈ $\overline{ax} \cup \widetilde{ax}$ and all ax ∈ K (see Remark 4.5 for why zero is excluded from the intervals (0, 1]). In particular, no normalization is needed. The reason for this is that “e is faulty” and “ax is faulty” are assumptions about a single syntactical element and a single logical formula, respectively. “D is the true diagnosis”, on the contrary, is an assumption about each formula in the KB K. So, the probabilities of two different syntactical elements e_i ≠ e_j are computed on the basis of two different probability spaces, namely Ω_{e_i} = {“e_i is faulty”, “e_i is not faulty”} and Ω_{e_j} = {“e_j is faulty”, “e_j is not faulty”}, which clearly do not depend on each other at all. The same argumentation holds for probabilities of formulas.

More Reliable Probabilities through Observations. As we argued before, the basic fault information from which diagnosis probabilities are deduced might be rather vague. A usual way of dealing with scenarios of that kind is to regard the initial probabilities as a first (a-priori) estimation, to gather additional information, e.g. by making measurements or observations, and to exploit this information to adapt the a-priori estimation in order to obtain a more reliable a-posteriori estimation. The more additional information has been accumulated and incorporated, the more realistic is the resulting updated estimation of probabilities.

A well-known technique enabling the computation of a-posteriori probabilities from a-priori probabilities is Bayes’ Theorem. Let p(D) be the a-priori probability of some D ∈ Ω ⊆ aD_⟨K,B,P,N⟩_R and let Obs be a new observation. Then, the a-posteriori probability p(D | Obs) of D, i.e. the probability that the true diagnosis D_t = D taking into account the new information Obs, is computed according to Bayes’ Theorem as

\[ p(D \mid Obs) = \frac{p(Obs \mid D)\, p(D)}{p(Obs)} \tag{4.5} \]

where p(Obs) is the (a-priori) probability that observation Obs is made and p(Obs | D) is the (a-priori) probability that the observation Obs is made under the assumption that D is the true diagnosis, i.e. D_t = D. That is, the a-priori probability p(D), i.e. the probability that D_t = D without any additional knowledge, must be multiplied by p(Obs | D)/p(Obs), which is often referred to as the support Obs provides for D. If the support is greater than 1, then the a-posteriori probability of D is greater than its a-priori probability; if it is smaller than 1, the a-posteriori probability gets smaller after incorporating the new information Obs. Note that Bayes’ Theorem is only applicable to KB debugging if a suitable class of observations can be defined such that p(Obs) and p(Obs | D) can be computed for observations Obs of this class. As we shall see in Chapter 7, the assignment of test cases to either P or N is one such class of observations. For instance, t_i ∈ P and t_j ∈ N for sets of formulas t_i, t_j over L are two such observations.
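The update rule can be illustrated with a small Python sketch. It makes one assumption that is not stated in the text above: p(Obs) is obtained by the law of total probability over a normalized prior on Ω. The function name `bayes_update` and all numbers are hypothetical.

```python
def bayes_update(prior, likelihood):
    """Return the a-posteriori probabilities p(D | Obs) for all D.

    prior:      dict mapping each diagnosis D to its a-priori probability p(D)
    likelihood: dict mapping each diagnosis D to p(Obs | D)
    Assumption: p(Obs) = sum_D p(Obs | D) * p(D), i.e. the prior is normalized
    over the considered diagnoses.
    """
    p_obs = sum(likelihood[D] * prior[D] for D in prior)
    if p_obs == 0.0:
        raise ValueError("observation has probability zero under the prior")
    return {D: likelihood[D] * prior[D] / p_obs for D in prior}

# Hypothetical normalized prior over two diagnoses.
prior = {"D1": 0.84, "D2": 0.16}
# Hypothetical likelihoods: Obs is certain under D2, a coin flip under D1.
likelihood = {"D1": 0.5, "D2": 1.0}

posterior = bayes_update(prior, likelihood)
support = {D: likelihood[D] / 0.58 for D in prior}  # p(Obs) = 0.42 + 0.16 = 0.58
print(posterior, support)
```

Here the support for D2 is greater than 1, so its probability rises, whereas the support for D1 is below 1 and its probability drops.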

4.6.2 Using Probabilities for Diagnosis Computation

If available, formula fault probabilities can be exploited during construction of the pHS-tree (Algorithm 2, Chapter 4) in that most probable instead of minimum cardinality diagnoses are calculated first. To achieve that, breadth-first construction of the tree must be replaced by uniform-cost order of node expansion by means of the function p() that assigns a fault probability to each formula ax ∈ K. Thereby, the “probability” p(nd) of a node nd = {ax_s, . . . , ax_t} in Algorithm 2 is defined through p(ax), ax ∈ K, as

\[ p(nd) := \prod_{ax_i \in nd} p(ax_i) \prod_{ax_j \in K \setminus nd} (1 - p(ax_j)) \tag{4.6} \]

Notice that this formula extends the definition of Formula 4.3 to arbitrary subsets of K, not only diagnoses. Thus, Formula 4.3 is a special case of Formula 4.6.

First, note that we put “probability” of a node in quotation marks as, to be precise, each node (path) which is not yet a diagnosis, i.e. needs to be further expanded to become one, has probability zero (of being the true diagnosis D_t). The reason is that a probability space is defined on a set of diagnoses and not on a set of arbitrary subsets nd of the KB. However, we misuse the diagnosis probability space in this case to determine the probability of “pseudo-diagnoses” in order to impose an order on the queue of open nodes in the tree. This will guarantee that the most probable diagnoses are found first, as we shall see below (Proposition 4.17).

Second, note that no normalization, i.e. application of Formula (4.4), is necessary within the scope of the non-interactive Algorithm 2 since the aim here is only the expansion of nodes nd in the order of p(nd) and the return of the most probable identified diagnoses at a certain point in time. For this, the comparison of the probability of one node nd with the probability of another node nd′ suffices. Thus, no other calculations using the properties of a probability space are performed by Algorithm 2. We shall recognize in Chapter 9 that this will not hold for the interactive Algorithm 5 where Formula (4.4) is essential.

So, nodes nd are inserted into Q in such a way that the descending order of node probabilities in Q is always maintained. Consequently, nodes with the highest fault probability are processed first. This is practical since a user will usually be most interested in seeing those possible faults first that have the highest (estimated) probability of being the actual fault they seek.

However, one needs to be careful when using probabilities as weights in order not to lose the property of Algorithm 2 to compute minimal diagnoses only. To this end, the formula probabilities p(ax) for all ax ∈ K must be adapted as

\[ p_{adapt}(ax) := c \cdot p(ax) \tag{4.7} \]

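where the factor c is a positive real number chosen such that all adapted probabilities become smaller than 0.5. Any c smaller than 0.5 suffices since p(ax) ≤ 1; another admissible choice is c := 0.49 / max_{ax ∈ K} p(ax), which maps the largest probability to 0.49 (this c can itself exceed 0.5 if all p(ax) are small). This transformation ensures that all probabilities p(ax) become smaller than 50%. In other words, each formula must be more likely to be correct than faulty, which in turn means that a minimal diagnosis is more likely than any of its supersets.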

Definition 4.9. Let p : K → [0, 1] be some function that assigns to each ax ∈ K some p(ax) ∈ [0, 1]. Then, we denote by p_nodes : 2^K → [0, 1] the function that assigns to each node nd ⊆ K some p_nodes(nd) ∈ [0, 1] which is obtained by means of Formula 4.6 and p().

Lemma 4.14. Let nd, nd′ ⊆ K where nd ⊂ nd′ and let p : K → (0, 0.5) be a function which assigns to each ax ∈ K some probability p(ax) ∈ (0, 0.5). Then p_nodes(nd) > p_nodes(nd′) holds.

Proof. According to Formula 4.6 and Definition 4.9 we have that

\[ p_{nodes}(nd) = \prod_{ax_i \in nd} p(ax_i) \prod_{ax_j \in K \setminus nd} (1 - p(ax_j)). \]

Then the probability p_nodes(nd′) can be computed from p_nodes(nd) in that, for each formula ax in nd′ \ nd ⊆ K \ nd, we multiply p_nodes(nd) by a factor f_ax := p(ax)/(1 − p(ax)) because ax “moves” from K \ nd to nd. However, f_ax < 1 holds due to p(ax) < 0.5 and thus 1 − p(ax) > 0.5. Since nd ⊂ nd′, at least one such factor is applied, wherefore p_nodes(nd′) < p_nodes(nd).
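The role of the bound 0.5 can be checked numerically. The following Python sketch enumerates all pairs nd ⊂ nd′ for a tiny KB and compares node probabilities before and after scaling. It assumes that the adaptation multiplies each probability by a factor c (as the wording “factor c” suggests); the helper names `p_nodes`, `adapt` and `violations` and the raw probabilities are hypothetical.

```python
from itertools import combinations
from math import prod

def p_nodes(nd, p):
    """Node 'probability': formulas in nd faulty, all other formulas correct."""
    return prod(p[ax] if ax in nd else 1.0 - p[ax] for ax in p)

def adapt(p, c=0.49):
    """Scale all formula probabilities by c (assumed multiplicative adaptation)."""
    if not 0.0 < c < 0.5:
        raise ValueError("c must lie in (0, 0.5)")
    return {ax: c * value for ax, value in p.items()}

def violations(p):
    """All pairs (nd, nd') with nd a proper subset of nd' but p_nodes(nd) <= p_nodes(nd')."""
    subsets = [frozenset(s) for r in range(len(p) + 1) for s in combinations(p, r)]
    return [(a, b) for a in subsets for b in subsets
            if a < b and not p_nodes(a, p) > p_nodes(b, p)]

# Hypothetical raw probabilities; two of them exceed 0.5.
raw = {"ax1": 0.9, "ax2": 0.6, "ax3": 0.2}

print(len(violations(raw)))         # some supersets outrank their subsets
print(len(violations(adapt(raw))))  # none do once every p(ax) < 0.5
```

With the raw values, for example, the node {ax1} has a higher value than the empty node, so a best-first search could reach non-minimal candidates too early; after scaling, every proper subset outranks its supersets.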

This result will be a key to proving the completeness, soundness and correctness of Algorithm 2 in the next Section.

The next definition characterizes a (partial) weighted pHS-tree, the type of hitting set tree constructed by Algorithm 2 given any function p(ax) ∈ (0, 0.5) for all ax ∈ K as input, which is not necessarily specified in such a way that a breadth-first tree construction is forced.

Definition 4.10 (Weighted Pruned HS-Tree). Let ⟨K, B, P, N⟩_R be an admissible DPI and let w : 2^K → [0, 1] be a weight function which assigns a weight to each node n ⊆ K with the property that w(n_1) > w(n_2) if n_1 ⊂ n_2. An edge-labeled and node-labeled tree T is called a weighted pruned HS-tree (wpHS-tree) w.r.t. ⟨K, B, P, N⟩_R and w() iff T is the result of constructing an HS-tree w.r.t. ⟨K, B, P, N⟩_R with due regard to the following rule

image

and the rules 2 to 6 as per Definition 4.8.

T is called a partial weighted pruned HS-tree w.r.t. ⟨K, B, P, N⟩_R and w() iff T is a weighted pruned HS-tree w.r.t. ⟨K, B, P, N⟩_R and w() where not all nodes in T have been labeled yet and non-labeled nodes have no successors.

Then, we have the following relationship between a (partial) pHS-tree and a (partial) wpHS-tree. An explanation why this holds will be given in Section 4.6.4.

Proposition 4.14. A (partial) pHS-tree w.r.t. ⟨K, B, P, N⟩_R is a (partial) wpHS-tree w.r.t. ⟨K, B, P, N⟩_R and w() where w() is a weight function which, in addition to the property postulated in Definition 4.10, satisfies w(n_1) = w(n_2) if |n_1| = |n_2|.

In general, a (partial) wpHS-tree w.r.t. ⟨K, B, P, N⟩_R and w() is not a (partial) pHS-tree w.r.t. ⟨K, B, P, N⟩_R.

Lemma 4.15. Algorithm 2 is a procedure for producing a wpHS-tree T w.r.t. ⟨K, B, P, N⟩_R and p_nodes().

Proof. First, the property p_nodes(n_1) > p_nodes(n_2) if n_1 ⊂ n_2 postulated by Definition 4.10 holds by Lemma 4.14 and the fact that the function p given as input to Algorithm 2 satisfies p(ax) ∈ (0, 0.5) for all ax ∈ K. Moreover, the DPI ⟨K, B, P, N⟩_R provided as an input to Algorithm 2 is admissible, as postulated by Definition 4.10.

The compliance with rule 1 of Definition 4.7 as well as with rules 2 to 6 of Definition 4.8 is a simple consequence of Lemma 4.13. In the following we prove that rule 2 of Definition 4.7 and rule 1 of Definition 4.10 are satisfied.

Definition 4.7, rule 2: Suppose a node nd is labeled by valid. Then it is added to D_calc in line 11. Since nd can only get a label different from closed if it is the only exemplar of this node in Q due to the duplicate criterion (lines 22-24), it must be the case that nd ∉ Q (line 7) after nd has been labeled by valid. Only nodes that get labeled by a conflict set can have successor nodes added to Q in line 15. Only nodes in Q can get a label (cf. lines 6 and 8). For nd to be added to Q at some later point in time there must be a proper subset of nd that is still in Q as each node newly added to Q is a proper superset of some node in Q (cf. line 15 which is the only position in the algorithm where nodes are added to Q). This is impossible since Q is ordered descending by p_nodes(). Hence, each proper subset of nd must have been ranked before nd in Q and thus must have already been labeled because nd is already labeled by assumption. Hence, if nd is labeled by valid, then it has no successors.

Definition 4.10, rule 1: That nodes are processed and labeled in order of descending p_nodes() follows from the fact that new nodes are inserted into Q only in such a way that the order of Q by descending p_nodes() is maintained (INSERTSORTED in line 15) and from the fact that always the first element of Q is selected to be labeled next (GETFIRST in line 6).

This completes the proof.

Let the relevant data of a wpHS-tree be defined as for a pHS-tree (cf. Remark 4.2). By Lemma 4.15, we have:

Corollary 4.6. Algorithm 2 stores by ⟨D_calc, Q, C_calc⟩ the relevant data of

a wpHS-tree w.r.t. ⟨K, B, P, N⟩_R and p_nodes() if Algorithm 2 stops due to Q = [], and

a partial wpHS-tree w.r.t. ⟨K, B, P, N⟩_R and p_nodes() otherwise.
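The uniform-cost expansion order discussed in this section can be illustrated by a toy best-first hitting set search in Python. This is not Algorithm 2: as a simplifying assumption, all minimal conflict sets are given up front (so no reasoner or QX call is needed), and duplicates are suppressed when nodes are generated. The function name `weighted_hs`, the conflict sets and the probabilities are hypothetical.

```python
import heapq
from math import prod

def p_nodes(nd, p):
    """Node 'probability': formulas in nd faulty, all other formulas correct."""
    return prod(p[ax] if ax in nd else 1.0 - p[ax] for ax in p)

def weighted_hs(conflicts, p, n_max=None):
    """Return minimal hitting sets of `conflicts` in descending order of p_nodes.

    Assumes p(ax) < 0.5 for all formulas, so that subsets outrank supersets.
    """
    diagnoses = []
    seen = {()}                                  # nodes already generated
    queue = [(-p_nodes(frozenset(), p), ())]     # max-heap via negated weight
    while queue and (n_max is None or len(diagnoses) < n_max):
        _, key = heapq.heappop(queue)
        nd = frozenset(key)
        if any(d <= nd for d in diagnoses):
            continue                             # closed: superset of a found diagnosis
        unhit = next((c for c in conflicts if not (c & nd)), None)
        if unhit is None:
            diagnoses.append(nd)                 # valid: nd hits every conflict set
            continue
        for ax in unhit:                         # one successor per element of the label
            child = tuple(sorted(nd | {ax}))
            if child not in seen:
                seen.add(child)
                heapq.heappush(queue, (-p_nodes(frozenset(child), p), child))
    return diagnoses

# Hypothetical minimal conflict sets and formula fault probabilities (all < 0.5).
conflicts = [frozenset({"a", "b"}), frozenset({"b", "c"})]
p = {"a": 0.4, "b": 0.1, "c": 0.3}

result = weighted_hs(conflicts, p)
print([sorted(d) for d in result])
```

In this toy instance the two-element hitting set {a, c} has weight 0.108 and is returned before the singleton {b} with weight 0.042, which shows that the order is by probability and not by cardinality.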

4.6.3 Correctness of Weighted Diagnosis Computation

First, we show the completeness of Algorithm 2 regarding minimal diagnoses, i.e. that it computes all minimal diagnoses w.r.t. the DPI it is given as input.

Lemma 4.16. Only diagnoses w.r.t. ⟨K, B, P, N⟩_R can be added to D_calc by Algorithm 2.

Proof. A node nd can be added to D_calc only in line 11. To reach this line, LABEL must have returned valid for nd. For this to hold, QX(⟨K \ nd, B, P, N⟩_R) must have returned ’no conflict’, which implies that nd is a diagnosis w.r.t. ⟨K, B, P, N⟩_R by Propositions 4.9 and 3.2.

Lemma 4.17. Let T denote a (partial) wpHS-tree produced by Algorithm 2. Further, let Q be the queue of open nodes in T maintained by Algorithm 2 and let nd be some node which occurs only once in Q and which is a proper subset of some minimal diagnosis w.r.t. ⟨K, B, P, N⟩_R. Then:

(1) The nodes ∅ = nd_1, . . . , nd_k along any path from the root node ∅ to nd_k in T satisfy nd_i ⊂ nd_{i+1} and |nd_i| + 1 = |nd_{i+1}| for 1 ≤ i < k, and nd_i ⊆ K for 1 ≤ i ≤ k.

(2) If the LABEL function is called for nd, then it yields some minimal conflict set C w.r.t. ⟨K, B, P, N⟩_R with nd ∩ C = ∅.

Proof. (1): In the representation used by Algorithm 2, a node nd in the (partial) wpHS-tree T produced by Algorithm 2 is defined as the set of all edge labels on the path from the root node to nd (see Remark 4.2) and the successor of a node is defined as a node added to Q after nd has been labeled by a minimal conflict set. After the LABEL function for node nd has returned some minimal conflict set L as a label for nd, Algorithm 2 goes to line 15 since L ≠ closed and L ≠ valid and adds an element nd ∪ {e} to Q for each e ∈ L. Therefore, it holds that |nd ∪ {e}| = |nd| + 1 for each successor of nd. Hence, nd_i ⊂ nd_{i+1} and |nd_i| + 1 = |nd_{i+1}| holds for any path of nodes ∅ = nd_1, . . . , nd_k in T starting from the root node.

The argumentation why each node must be a subset of K is as follows: Suppose node ∪ {e} is added to Q in line 15, which is the only place in Algorithm 2 where nodes are added to Q. So, LABEL must have returned neither valid nor closed for node. Hence, node cannot be a diagnosis w.r.t. ⟨K, B, P, N⟩_R as otherwise LABEL with argument node must have returned valid in line 30. Due to the fact that node = K is definitely a diagnosis w.r.t. ⟨K, B, P, N⟩_R as it must hit all minimal conflict sets w.r.t. ⟨K, B, P, N⟩_R, which must all be subsets of K (Definition 4.1), node ⊂ K must hold. Since e is an element of a minimal conflict set and thus of K, node ∪ {e} ⊆ K follows.

(2): Suppose the LABEL function is called for a node nd ∈ Q where nd ⊂ D for some minimal diagnosis D.

First, there cannot be any nd′ ∈ D_calc with nd′ ⊆ nd since D_calc includes only diagnoses w.r.t. ⟨K, B, P, N⟩_R and nd ⊂ D, wherefore there would be a diagnosis nd′ ⊂ D, contradiction. Due to the fact that nd is present only once in Q, there cannot be some nd′ = nd in Q. Thus, closed cannot be returned for nd by LABEL.

By the facts that a diagnosis must hit all minimal conflict sets (Proposition 4.6) and that nd is a proper subset of a diagnosis, either the criterion checked in line 26 must be true or QX(⟨K \ nd, B, P, N⟩_R) must return a minimal conflict set L, i.e. L ≠ ’no conflict’. In both cases, a minimal conflict set is returned by LABEL. There are no other labels that can be returned by LABEL.

Lemma 4.18. Each minimal diagnosis w.r.t. ⟨K, B, P, N⟩_R occurs as a node in Q during the execution of Algorithm 2, if the execution stops due to Q = [].

Proof. For Algorithm 2 it holds that

(i) if nd is the last exemplar of some node in Q which is a proper subset of some minimal diagnosis w.r.t. ⟨K, B, P, N⟩_R and the LABEL function is called for nd, then it yields some minimal conflict set C w.r.t. ⟨K, B, P, N⟩_R with nd ∩ C = ∅ by Lemma 4.17 and

(ii) each node nd that has been labeled by some minimal conflict set C is deleted from Q (line 7) whereupon one successor node nd_ax = nd ∪ {ax} for each element ax ∈ C is added to Q (INSERTSORTED in line 15) and

(iii) each minimal diagnosis w.r.t. ⟨K, B, P, N⟩_R is a superset of ∅ and a subset of K (Definition 3.5) which includes one element of each minimal conflict set w.r.t. ⟨K, B, P, N⟩_R and includes only elements of minimal conflict sets (Proposition 4.6).

Let D be some minimal diagnosis w.r.t. ⟨K, B, P, N⟩_R. Then, there is a path of nodes from the root node ∅ to D in the pHS-tree produced by Algorithm 2, if the execution stops due to Q = [].

This holds by the following argumentation: If D = ∅, then the path is ⟨∅⟩. Now, suppose D ⊃ ∅. Since D is a minimal diagnosis, wherefore no other diagnosis can be equal to ∅, the root node n_0 := ∅ of the constructed tree must be labeled by some minimal conflict set C_1. Then, by (iii), there must be some ax_1 ∈ C_1 that is an element of D. So, we define n_1 := {ax_1}. If n_1 = D, then the path is ⟨∅, n_1⟩. Otherwise, due to D ⊃ n_1 and (i), node n_1 in the pHS-tree must be labeled by some minimal conflict set C_2. Then, by (iii), there must be some ax_2 ∈ C_2 that is an element of D. So, we define n_2 := n_1 ∪ {ax_2}. If n_2 = D, then the path is ⟨∅, n_1, n_2⟩. Otherwise, due to D ⊃ n_2 and (i), node n_2 in the pHS-tree must be labeled by some minimal conflict set C_3. This reasoning can be continued until n_k = D for some k. By (iii), D ⊆ K holds, wherefore such k must exist.

Algorithm 2 cannot stop executing before n_k has been in Q since each node n_i labeled by a minimal conflict set C_{i+1} involves the addition of |C_{i+1}| successor nodes to Q by (ii). In particular, the successor node n_i ∪ {ax_{i+1}} must be added to Q. As the execution stops due to Q = [], all nodes n_i for i ≤ k must be labeled before termination. Thus, D must be in Q at some point.

Proposition 4.15 (Completeness of Algorithm 2). If Algorithm 2 terminates due to Q = [], then the algorithm returns a set 𝐃 including all minimal diagnoses w.r.t. ⟨K, B, P, N⟩_R.

Proof. Assume there is some minimal diagnosis D w.r.t. ⟨K, B, P, N⟩_R where D ∉ 𝐃 after Algorithm 2 has returned due to Q = []. First, each minimal diagnosis will occur in Q at some point during the execution of Algorithm 2 because it executes until Q = [], wherefore Lemma 4.18 applies. Any node nd in Q can only be deleted from Q if LABEL is called with the argument node nd (lines 7 and 8). There is no other point in Algorithm 2 where elements are removed from Q. Since at the end Q = [], each minimal diagnosis, in particular D, must be labeled.

Suppose D is the last exemplar of possibly multiple duplicates of it in Q. Then, the LABEL function cannot return closed for D. This holds, on the one hand, because the duplicate criterion (lines 22-24) only removes possible duplicate nodes from Q, but never the last exemplar of a node in Q. On the other hand, D can never be closed due to the non-minimality criterion (lines 19-21) as D_calc can only include diagnoses w.r.t. ⟨K, B, P, N⟩_R by Proposition 4.16. Thus, due to the minimality of D, D_calc cannot comprise any diagnosis D′ with D′ ⊆ D, except for some D′ which is equal to D. This would however be a contradiction to the assumption that D ∉ 𝐃.

The reuse criterion (lines 25-27) cannot apply for D either since a minimal diagnosis is a hitting set of all minimal conflict sets (Proposition 4.6), wherefore there cannot be a minimal conflict set in C_calc which has an empty intersection with D. So, the algorithm will come to line 28 where QX(⟨K \ D, B, P, N⟩_R) will return ’no conflict’ (Propositions 4.9 and 3.2). Therefore, D will be labeled by valid and will be added to D_calc in line 11.

Next, we show the soundness of Algorithm 2 w.r.t. minimal diagnoses, i.e. that it computes only minimal diagnoses w.r.t. the DPI it is given as input.

Proposition 4.16 (Soundness of Algorithm 2). If an element D is added to the set D_calc during the execution of Algorithm 2, D is a minimal diagnosis w.r.t. ⟨K, B, P, N⟩_R.

Proof. Assume that some element nd is added to D_calc which is not a diagnosis w.r.t. ⟨K, B, P, N⟩_R. This immediately yields a contradiction due to Lemma 4.16.

Assume now that some element nd is added to D_calc which is a diagnosis w.r.t. ⟨K, B, P, N⟩_R, but not a minimal one. Now, since nd is a non-minimal diagnosis, there is some D ⊂ nd which is a minimal diagnosis w.r.t. ⟨K, B, P, N⟩_R.

Then, there are three cases to distinguish: (a) D is in Q, (b) D is in D_calc, or (c) D is neither in Q nor in D_calc, i.e. the node D has not yet been generated.

Note that these are all possible cases as D is a minimal diagnosis by assumption. So, D cannot have been ruled out, i.e. labeled by closed, by the non-minimality criterion (lines 19-21) before since only diagnoses can be added to D_calc as argued in the first paragraph of this proof and there cannot be a diagnosis D′ ∈ D_calc such that D′ ⊂ D. The case D′ = D is already considered by case (b). The duplicate criterion (lines 22-24) does not need to be taken into account since it deletes duplicate nodes only.

(a): To be added to D_calc, nd must have been the first element of the queue Q, selected by GETFIRST in line 6. Since D ∈ Q by assumption and since Q is sorted in descending order of node probability (INSERTSORTED in line 15), we conclude that p_nodes(D) ≤ p_nodes(nd). However, as p_nodes(X) for a node X ⊆ K is defined by means of p(ax) where p(ax) ∈ (0, 0.5) for all ax ∈ K as per Formula 4.6 (Definition 4.9), Lemma 4.14 applies and establishes the truth of p_nodes(S_1) > p_nodes(S_2) if S_1 ⊂ S_2 for S_1, S_2 ⊆ K. By D ⊂ nd, this implies p_nodes(D) > p_nodes(nd), contradiction.

(b): Assuming case (b), we can derive a contradiction as follows. By the fact that nd is added to D_calc, it must hold that the LABEL procedure called for nd in line 8 returned valid as part of its output in line 30. However, as D ⊂ nd is already an element of D_calc by assumption, the LABEL procedure must have already returned in line 21, wherefore it cannot have reached line 30, contradiction.

(c): Suppose that D has not yet been generated as a node in Q. By Lemma 4.17, the nodes ∅ = nd_1, . . . , nd_k along a path from the root node in the pHS-tree produced by Algorithm 2 satisfy nd_i ⊂ nd_{i+1} and |nd_i| + 1 = |nd_{i+1}|. So, by Lemma 4.14, the node probabilities along any path from the root node are strictly monotonically decreasing. Since p_nodes(D) > p_nodes(nd) holds by the same argumentation as in (a), we have that all nodes on the path from the root node to D have a higher probability than nd. As Q is sorted in descending order of node probability and in each iteration the first element in Q is processed as explained in (a), we infer that D must have already been generated at the time nd is processed, contradiction.

Next, we argue that Algorithm 2 computes minimal diagnoses in descending order of diagnosis probability according to the parameter p() given as input to the algorithm.

Corollary 4.7. Let the probability p(D) of a diagnosis D in Algorithm 2 be computed from the given function p(ax), ax ∈ K, as per Formula 4.3.

1. At any point in time during the execution of Algorithm 2, D_calc comprises the |D_calc| most probable minimal diagnoses w.r.t. ⟨K, B, P, N⟩_R.

2. If Algorithm 2 returns a set 𝐃 of cardinality n, then 𝐃 is the set of the n most probable minimal diagnoses w.r.t. ⟨K, B, P, N⟩_R.

Proof. (1): By Propositions 4.15 and 4.16, it is a fact that Algorithm 2 computes all and only minimal diagnoses w.r.t. ⟨K, B, P, N⟩_R. What must still be shown is that minimal diagnoses are added to D_calc in descending order of their probability p() as per Formula 4.3. The probability p(D) of some diagnosis D is equal to p_nodes(D) since each diagnosis is a node and Formula 4.3 is a special case of Formula 4.6 by which the probability p_nodes(nd) of a node nd is calculated.

Let us denote by D_pmax the minimal diagnosis with maximum probability that has not yet been added to D_calc and by D_¬pmax an arbitrary minimal diagnosis with non-maximal probability, that is p_nodes(D_¬pmax) < p_nodes(D_pmax). So, we need to demonstrate that each node nd ⊂ D_pmax on a path from the root node to node D_pmax is processed before D_¬pmax is treated. By Lemma 4.17, a path from the root node in the pHS-tree produced by Algorithm 2 is a sequence of nodes ∅ = nd_1, . . . , nd_k where nd_i ⊂ nd_{i+1} and |nd_i| + 1 = |nd_{i+1}|. Further recall that the probability p_nodes(X) of a node X ⊆ K in Algorithm 2 is defined as per Formula 4.6. So, by Lemma 4.14, the node probabilities along any path from the root node are strictly monotonically decreasing. Hence, each node nd ⊂ D_pmax on a path from the root node to D_pmax has a probability p_nodes(nd) > p_nodes(D_pmax) > p_nodes(D_¬pmax). By the insertion of new nodes into Q (INSERTSORTED in line 15) in such a way that the descending order of Q as per p_nodes() is always maintained, and by the selection of the first element of Q (GETFIRST in line 6) as the next node to be processed, each node nd on a path to D_pmax must be processed before D_¬pmax is processed. Consequently, minimal diagnoses are added to D_calc in descending order of their probability p() as per Formula 4.3.

(2): This statement follows directly from (1).

Proposition 4.17. Algorithm 2 always terminates and returns a set 𝐃 of minimal diagnoses w.r.t. ⟨K, B, P, N⟩_R which is

the set of the |𝐃| most probable (w.r.t. p() and Formula 4.3) minimal diagnoses w.r.t. ⟨K, B, P, N⟩_R such that n_min ≤ |𝐃| ≤ n_max, if at least n_min minimal diagnoses exist w.r.t. ⟨K, B, P, N⟩_R, or

the set of all minimal diagnoses w.r.t. ⟨K, B, P, N⟩_R, otherwise.

Proof. The proposition is a direct consequence of Propositions 4.12, 4.15 and 4.16 and Corollary 4.7.

4.6.4 Using Probabilities to Compute Minimum Cardinality Diagnoses

The function p : K → (0, 0.5) can be defined in such a way that minimum cardinality instead of maximum probability diagnoses are identified first. To this end, p() is specified as a constant function that maps each formula ax ∈ K to one and the same value p(ax) := c where c is an arbitrary real number such that 0 < c < 0.5, e.g. c := 0.3. That in this setting diagnoses are found in order of ascending cardinality is a simple consequence of Corollary 4.7.
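To see why a constant p() yields an ordering by cardinality, note that Formula 4.6 then reduces to c^|nd| · (1 − c)^(|K| − |nd|), which depends only on |nd| and decreases as |nd| grows because c/(1 − c) < 1. The following Python sketch checks this for a hypothetical four-formula KB; the helper name `p_nodes` and the formula names are illustrative only.

```python
from itertools import combinations
from math import prod

def p_nodes(nd, p):
    """Node 'probability': formulas in nd faulty, all other formulas correct."""
    return prod(p[ax] if ax in nd else 1.0 - p[ax] for ax in p)

# Hypothetical KB with four formulas and the constant probability c = 0.3.
c = 0.3
K = ["ax1", "ax2", "ax3", "ax4"]
p = {ax: c for ax in K}

subsets = [frozenset(s) for r in range(len(K) + 1) for s in combinations(K, r)]

# Rank all nodes by descending weight, as the sorted queue Q would.
ranked = sorted(subsets, key=lambda nd: -p_nodes(nd, p))
sizes = [len(nd) for nd in ranked]
print(sizes)
```

The printed list of node sizes is non-decreasing, i.e. the best-first order coincides with a breadth-first order.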

Example 4.7 Let us now study how such formula and diagnosis probabilities would be constructed for the example DPI depicted by Table 15.3. Let us suppose that the KB K in the DPI was formulated by a single user u for whom the personal fault probabilities of syntactical elements in $\widetilde{K} \cup \overline{K}$ given by the first row of Table 4.4 have been extracted from log data of the KB editing software applied by u. Then, the resulting probabilities of formulas ax ∈ K as per Formula 4.2 are as presented in the rightmost column of Table 4.4. The entries in the table from the second to the last but two column display the number of occurrences of the syntactical element given by the column label in the formula given by the row label. These values are required to compute the formula probabilities listed in the last but one column as per Formula 4.2. The final probabilities that can “safely” be incorporated into Algorithm 2 under a guarantee that only minimal diagnoses will be output are shown in the last column. These result from an application of Formula 4.7 to the probabilities given in the last but one column with an adaptation parameter c := 0.49.

image

Table 4.4: Computing fault probabilities of formulas in K given fault probabilities of syntactical elements e ∈ $\widetilde{K} \cup \overline{K}$ for the DPI given by Table 15.3.

Notice that, for example, p(ax_5) is rather high since the predicates A and Y as well as the connective ¬ occurring in ax_5 have a comparably high fault probability in relation to syntactical elements appearing in other formulas. Formula ax_3, on the other hand, comprises only two predicates which should be well-understood by u and no connectives except for →, which is not problematic for u either. Therefore, its fault probability is rather low.

4.7 Non-Interactive Knowledge Base Debugging Algorithm

Algorithm 3 describes the procedure for non-interactive debugging of KBs. The algorithm requires as input all the parameters that are required by Algorithm 2 and an additional parameter auto ∈ {true, false} indicating either automatic (true) or manual (false) mode. If auto = false, Algorithm 3 calls HS (Algorithm 2) with the parameters as provided. The set of minimal diagnoses 𝐃 returned by HS is then presented to the user who can select a diagnosis manually after inspecting the diagnoses in 𝐃. Alternatively, in case of auto = true, the system calls HS with the parameters as provided, but with n_min = n_max = 1. Hence, only the most probable minimal diagnosis is computed by HS and returned as an output of Algorithm 3 to the user.
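The two modes can be sketched as a thin wrapper in Python. This is a sketch of the control flow described in the paragraph above only, not a transcription of Algorithm 3: the parameter `hs` stands in for Algorithm 2, and `toy_hs` is a hypothetical stub that returns a hard-coded, already ranked list of diagnoses and ignores the DPI, p() and t.

```python
import math

def non_interactive_debug(hs, dpi, p, t, n_min, n_max, auto):
    """Call the diagnosis engine once, in automatic or manual mode."""
    if auto:
        # Automatic mode: compute only the single most probable minimal diagnosis.
        diagnoses = hs(dpi, p, t, 1, 1)
        return diagnoses[0] if diagnoses else None
    # Manual mode: hand the computed minimal diagnoses to the user for inspection.
    return hs(dpi, p, t, n_min, n_max)

def toy_hs(dpi, p, t, n_min, n_max):
    """Hypothetical stand-in for Algorithm 2 (ignores dpi, p, t and n_min)."""
    ranked = [frozenset({"ax1"}), frozenset({"ax2"}), frozenset({"ax5", "ax7"})]
    return ranked if math.isinf(n_max) else ranked[: int(n_max)]

best = non_interactive_debug(toy_hs, None, {}, 0, 3, 5, auto=True)
everything = non_interactive_debug(toy_hs, None, {}, 0, math.inf, math.inf, auto=False)
print(sorted(best), len(everything))
```

Passing the engine as a parameter keeps the wrapper independent of how diagnoses are actually computed.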

If a user wants the algorithm to output the set of all minimal diagnoses w.r.t. ⟨K, B, P, N⟩R, then the parameter setting auto = false and nmin = ∞ must be chosen. If, on the other hand, a fixed number n of leading diagnoses should be computed (as long as there are at least n minimal diagnoses for the DPI), then nmin := n =: nmax are the correct parameter settings. Note that in both cases the specification of t has no effect.

Of course, the user can also apply Algorithm 3 several times with varying parameters t, nmin, nmax and p(). Alternatively, they can specify a test case, i.e. add a set of formulas X either to P (if each ax ∈ X should be entailed by the correct KB) or to N (if the conjunction of all formulas in X must not be implied by the correct KB), and rerun the algorithm with this modified DPI.


Anyway, the user must either find the correct diagnosis (if it is an element of the output set D at all) by hand or be convinced that the returned minimum cardinality or respectively maximum probability diagnosis is indeed the one that yields a solution KB with the intended semantics. Moreover, when formulating test cases by hand, a user can be assumed to be as likely to specify something contradictory or faulty as during creation of the KB itself.

Unsurprisingly, application of Algorithm 3 will often lead to unsatisfying solution ontologies. Remedy for this is provided by Interactive KB Debugging which on the one hand requires higher effort of one (or several) user(s), but on the other hand ensures a high quality solution in terms of its semantics to the problem of Parsimonious KB Debugging (Problem Definition 3.2).

Example 4.8 Assume a user wants to find a maximal solution KB for the example DPI ⟨K, B, P, N⟩R provided by Table 15.3 and that no data giving information about fault probabilities of syntactical constructs or formulas in K is available. Therefore, let p(ax) := c for some fixed c ∈ (0, 0.5) (see Section 4.6.2 for an explanation of this choice of c). The non-interactive KB debugging algorithm presented by Algorithm 3, called with ⟨K, B, P, N⟩R, the function p(), nmin = ∞ and auto = false as inputs, results in the hitting set tree given by the upper picture in Figure 4.2. By nmin = ∞ and auto = false, the user signals that inspection of all minimal diagnoses w.r.t. the input DPI is desired. Hence, the (complete) breadth-first pHS-tree as per Algorithm 2 is constructed. So, the output is the set of all minimal diagnoses mD⟨K,B,P,N⟩R = {[1], [2], [5, 7]}.

In the shown hitting set tree, minimal diagnoses are indicated by nodes labeled by ✓(D) where D is a name given to this diagnosis. A node closed due to non-minimality is denoted by ×(⊃D) where D is some minimal diagnosis that is a subset of the set of edge labels along the path leading from the root node to this node. The label C^C means that the minimal conflict set C has been freshly computed by a call to QX. The label C^R, on the other hand, means that the minimal conflict set C has been reused from the set of already computed minimal conflict sets. In this example, both minimal conflict sets are computed by QX and no conflict sets are reused. The order of node labeling is indicated by the circled numbers ⓘ starting from 1. Open nodes, i.e. generated nodes that have not yet been labeled, are indicated by a question mark.
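The breadth-first construction in this example can be reproduced with a small Python sketch. It is a simplification under stated assumptions: the two minimal conflict sets ⟨1, 2, 5⟩ and ⟨1, 2, 7⟩ of the example DPI are supplied up front rather than computed on demand by QX, and the function name `minimal_hitting_sets` is hypothetical. As in the flat representation used by Algorithm 2, each node is stored as the set of edge labels on its path from the root.

```python
from collections import deque


def minimal_hitting_sets(conflicts):
    """Breadth-first hitting set search over a fixed list of conflict sets.

    Returns the minimal hitting sets (minimal diagnoses) in the order in
    which a breadth-first tree construction would label them.
    """
    conflicts = [frozenset(c) for c in conflicts]
    diagnoses = []
    queue = deque([frozenset()])   # the root node has an empty path
    generated = {frozenset()}      # used to discard duplicate paths
    while queue:
        node = queue.popleft()
        # Close the node if it is a superset of a known minimal diagnosis.
        if any(d <= node for d in diagnoses):
            continue
        # Look for a conflict set that the path does not hit yet.
        label = next((c for c in conflicts if not (c & node)), None)
        if label is None:
            diagnoses.append(node)  # the path hits every conflict set
            continue
        # Generate one successor per element of the labeling conflict set.
        for element in sorted(label):
            child = node | {element}
            if child not in generated:
                generated.add(child)
                queue.append(child)
    return diagnoses


result = minimal_hitting_sets([{1, 2, 5}, {1, 2, 7}])
```

On these two conflict sets the sketch labels the paths {1} and {2} as diagnoses, expands {5} with the second conflict set, closes {1, 5} and {2, 5} as non-minimal, and finally labels {5, 7} as a diagnosis.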


Figure 4.2: Non-interactive KB debugging process without any given fault information applied to the DPI given by Table 15.3 with settings auto = false and nmin = ∞ (above) and auto = true (below).

In case auto = true was given as an input to the algorithm instead, the partial pHS-tree depicted by the lower picture in Figure 4.2 would be constructed and the output would be D = {D1} = {[1]}, containing just the first found and thus most probable minimal diagnosis w.r.t. the input DPI. Note that D1 = [1] and D2 = [2] (which is not computed) have equal probability, and which of the two is computed first depends only on the ordering of equally probable (in this case: equal cardinality) nodes in Q. As already mentioned in Section 4.6.2, in this example the most probable diagnosis is equivalent to a minimum cardinality diagnosis since all formula probabilities are equal.

Please notice that the internal “flat” representation used by Algorithm 2, which does not store a tree but only the set of open and closed nodes, differs from the standard tree representation [Kal06, FS05, SQJH08, Rei87] we use to depict the hitting set tree graphically in Figure 4.2. Whereas within Algorithm 2 a node node stores the set of all the edge labels on the path leading from the root node to node, in the figure we label each node in the tree by the respective label that is computed for this node by the LABEL function, i.e. either by a minimal conflict set, by ✓ or by ×.

Example 4.9 Recall Example 4.7 which demonstrated how formula fault probabilities are constructed from fault probabilities of syntactical elements for the example DPI depicted by Table 15.3. Now we want to show how the non-interactive KB debugging algorithm given by Algorithm 3 works when these formula probabilities are incorporated.

Suppose the inputs to the algorithm are the DPI ⟨K, B, P, N⟩R, the function p(ax) for ax ∈ K displayed by the rightmost column of Table 4.4 and auto = false. Further on, let the user of the debugging algorithm be willing to wait a maximum of one second for an output and let them postulate a minimum of two most probable minimal diagnoses to be returned, e.g. to have at least a second choice if the employed formula probabilities are not perfectly suitable and the most probable diagnosis is not the desired solution. These postulations are expressed by specifying the parameters nmin = 2 and t = 1 (second). Additionally, assume the user expects the provided probabilities to be sufficiently reasonable such that the desired diagnosis will be among the best four diagnoses, which is why nmax = 4 is chosen. Moreover, let us imagine that the time for each fresh computation of a minimal conflict set plus generation of the (unlabeled) successor nodes of this node is 0.4 seconds and the cost of computing any other label of a node is 0.1 seconds.

Figure 4.3: Non-interactive KB debugging process with given fault information applied to the DPI given by Table 15.3 with settings auto = false, nmin = 2, nmax = 4 and t = 1.

Then the partial wpHS-tree produced by Algorithm 3 initialized in this way is illustrated by Figure 4.3. The used notation is as described in Example 4.8 with one additional attribute. Namely, each edge is not only labeled by one element of the conflict set from which it goes out, but also by a label p ∈ (0, 1) that is placed near the arrow head of the arrow that expresses the edge. This label p gives the probability as per pnodes() (cf. Definition 4.9) of the (partial) diagnosis that corresponds to the union of the edge labels along the path from the root to and including the edge that is labeled by p. For example, the label 0.06 of the edge directed at the node number ④ means that the probability of {2, 5} is 0.06. Further on, open nodes, i.e. nodes that have been generated but not yet labeled, are designated by a question mark.

As outlined by the circled numbers ⓘ, as a first action the root node is labeled by the newly computed minimal conflict set ⟨1, 2, 5⟩, the computation time of which amounts to 0.4. Then, the tree construction proceeds according to the (partial) diagnosis probabilities as per pnodes() computed from the formula probabilities p(ax), ax ∈ K, provided by the last column of Table 4.4. Therefore, the most probable edge leading away from the root node is labeled next. This already leads to the finding of the first minimal diagnosis D1 = [2] after an overall computation time of 0.5 seconds. Since nmin = 2 diagnoses have not yet been computed and there are still unlabeled open nodes, namely those corresponding to paths {1} and {5}, the algorithm continues the execution by labeling the next best node {5} with a probability of 0.07 (as opposed to 0.02 for the other open node {1}). Since {5} is neither a superset of an already computed minimal diagnosis nor a duplicate of another open node nor a diagnosis itself, it must be labeled by some minimal conflict set. Because the already established minimal conflict set ⟨1, 2, 5⟩ is not disjoint with {5}, no reuse is possible and QX is called to determine a new minimal conflict set ⟨1, 2, 7⟩ w.r.t. ⟨K, B, P, N⟩R. All successor nodes of the newly labeled node ③, i.e. the nodes corresponding to the paths {1, 5}, {2, 5} and {5, 7}, are added to the list Q of open nodes such that descending order of probabilities is maintained. The resulting queue is then Q = [{2, 5}, {5, 7}, {1}, {1, 5}]. As a next step, again the first and thus best open node {2, 5} is chosen from Q and labeled by ×(⊃D1), which means that the corresponding path is closed since it is a superset of an already found minimal diagnosis, namely D1 = [2]. At this point, the overall computation time amounts to 1 second, which corresponds to the time limit t. For that reason, the algorithm will go ahead searching for minimal diagnoses only until a minimal number nmin thereof is detected. The node processed next, corresponding to the path {5, 7}, is then determined to be a minimal diagnosis by the LABEL procedure.
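The interplay of nmin, nmax and t in this trace can be checked with a short Python replay. This is a sketch under assumptions: the function names `should_stop` and `replay` are hypothetical, the stopping rule is our reading of the description (stop once nmax diagnoses are found, or once at least nmin diagnoses are found and the time limit has been reached), and times are counted in tenths of a second to avoid floating-point rounding. The last two trace entries are steps that would follow if the search continued; they are inferred from the queue Q and the set of all minimal diagnoses of the example.

```python
def should_stop(n_found, elapsed, nmin, nmax, t):
    """Assumed stopping rule for the diagnosis search."""
    return n_found >= nmax or (n_found >= nmin and elapsed >= t)


# (labeling step, cost in tenths of a second, yields a minimal diagnosis?)
TRACE = [
    ("root: fresh conflict set <1,2,5>", 4, False),
    ("{2}: minimal diagnosis D1", 1, True),
    ("{5}: fresh conflict set <1,2,7>", 4, False),
    ("{2,5}: closed, superset of D1", 1, False),
    ("{5,7}: minimal diagnosis", 1, True),
    ("{1}: minimal diagnosis (only reached if the search continues)", 1, True),
    ("{1,5}: closed (only reached if the search continues)", 1, False),
]


def replay(trace, nmin, nmax, t):
    """Replay the labeling steps until the stopping rule fires."""
    elapsed, found = 0, 0
    for _step, cost, is_diagnosis in trace:
        if should_stop(found, elapsed, nmin, nmax, t):
            break
        elapsed += cost
        found += int(is_diagnosis)
    return elapsed, found


# Settings of the example: nmin = 2, nmax = 4, t = 1 second (10 tenths).
elapsed_tenths, n_diagnoses = replay(TRACE, nmin=2, nmax=4, t=10)
```

With the example settings the replay stops after 11 tenths of a second with two diagnoses. Under the same assumed rule, a time limit of two seconds would let the search also reach [1].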

Thus, the output of the algorithm after 1.1 seconds execution time is the set of minimal diagnoses D = {[2], [5, 7]}, which is a proper subset of all minimal diagnoses mD⟨K,B,P,N⟩R = {[1], [2], [5, 7]}. However, if we assume that the user’s intended KB should entail E → G, for instance, then none of the returned diagnoses can be used to compute a solution KB featuring this entailment when integrated with the background knowledge B. Hence, the true diagnosis Dt would be missed in this case.

Also, when computing all minimal diagnoses w.r.t. a DPI (if this is even possible in a concrete case, given the computational complexity) and showing them to the user, a user might review just the most probable ones and base the choice of diagnosis only on these. For instance, [SF10] reported on one DPI where computation of all minimal diagnoses, 1782 in number, is feasible. In such a case, a user can hardly be expected to be willing, or to have the time, to inspect more than a small fraction of these 1782 diagnoses. The consequence will be a wrong choice of diagnosis in many cases, also because merely looking at a diagnosis will often not make the user certain whether it is the desired one. The reason is that it is usually too complex for a human to perform the mental reasoning necessary to form a picture of the implications of choosing one diagnosis as opposed to another.

For our example DPI, a user getting the output D = mD⟨K,B,P,N⟩R = {[1], [2], [5, 7]} with the computed probabilities p([1]) = 12%, p([2]) = 60% and p([5, 7]) = 28% might decide to just inspect the diagnoses that make up the most probable 80% fraction of diagnoses. In this case, either [2] or [5, 7] would be selected, which corresponds to a wrong choice in case E → G should be entailed by the resulting solution KB after integration with the background KB B.
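The selection of the “most probable 80% fraction” can be made concrete with a small Python sketch. The function name `most_probable_fraction` and the greedy reading of the selection (take diagnoses in descending order of probability until the requested mass is covered) are assumptions; probabilities are given in whole percent to keep the arithmetic exact.

```python
def most_probable_fraction(probabilities, mass):
    """Pick diagnoses in descending probability until `mass` is covered."""
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    chosen, covered = [], 0
    for diagnosis, probability in ranked:
        if covered >= mass:
            break
        chosen.append(diagnosis)
        covered += probability
    return chosen


# Diagnosis probabilities of the example DPI, in percent.
example = {(1,): 12, (2,): 60, (5, 7): 28}
inspected = most_probable_fraction(example, mass=80)
```

For the example this yields [2] and [5, 7], which together cover 88% of the probability mass, so the diagnosis [1] is never inspected.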

5 Summary

In this part, we thoroughly introduced the topic of knowledge base debugging. We stated necessary properties of knowledge representation languages to be compatible with our approaches, namely that the entailment relation must be monotonic, idempotent and extensive. We gave precise definitions of the problems of KB debugging and parsimonious KB debugging. Both problems assume a given instance of a diagnosis problem (DPI). The former seeks any solution in line with the given requirements whereas the latter seeks a solution that preserves as many formulas as possible of the given faulty KB, i.e. aims at minimal changes. With the validity of a KB, a solution KB, a diagnosis and a conflict set, we have characterized central notions that will be extensively used throughout this work. We have studied the relationship between all these notions and proved that solving the problem of parsimonious KB debugging is equivalent to finding a minimal diagnosis w.r.t. a given DPI.

We established the relationship between conflict sets and justifications, a similar notion that is used alongside conflict sets in the (predominantly DL, OWL or Semantic Web) literature, and provided evidence that conflict sets are the better choice for the debugging problems addressed here. In particular, conflict sets serve the purpose of reducing the search space for minimal diagnoses (minimal hitting sets of all minimal conflict sets) and help a debugging software to focus on the relevant and problematic parts of the faulty KB. A method for the efficient, polynomial time computation of a conflict set was detailed and its correctness was formally proven. Based on this method, we were able to describe a way of computing minimal diagnoses which is based on using a hitting set tree. Such a tree constitutes a systematic way of generating all minimal conflict sets and, in the course of this, also all minimal diagnoses. Depending on the particular situation, the presented algorithm can be configured to compute diagnoses in a predefined order, e.g. most probable diagnoses first or those diagnoses first that are minimally invasive in terms of the changes made to the faulty KB.

Different ways of obtaining and incorporating meta (fault) information into the debugging process were elucidated. Such information, if reasonable, can facilitate and accelerate the debugging process significantly. However, even when high-quality fault information is available, we identified substantial drawbacks of the debugging system presented so far. That is, such a system either automatically chooses a solution (diagnosis) based on the given fault information in a solution space of (generally) exponential size or presents a subset of all solutions, e.g. the most probable solutions, to the user for manual inspection. In the former case, the probability of being presented a solution KB with undesired semantics is very high, implying unwanted changes to the faulty KB, unexpected entailments and non-entailments as well as future errors. Such unexpected semantics can be critical or even fatal; consider, for instance, intelligent medical applications relying on such KBs. In the latter case, the burden is placed on the user(s), who must mentally anticipate the implications of applying different repairs (using the different submitted diagnoses) to the KB, which is practically impossible for human beings both in terms of time and effort and in terms of mental capacity. Moreover, it is basically intractable to generate all possible solutions. Hence, it is not even certain that the manually investigated solutions include the correct one (with the postulated semantics).

This leads us to the next part which deals with exactly these issues and proposes a solution.


Interactive Knowledge Base Debugging


This part is organized as follows:

In Chapter 6, we first discuss how disadvantages of non-interactive KB debugging procedures can be overcome by allowing a user to take part in the debugging process. Next, we define the problem of interactive static KB debugging as well as the problem of interactive dynamic KB debugging which “naturally” arise from the fact that the DPI in interactive KB debugging is always renewed after a new test case has been specified (a new query has been answered). The former problem searches for a solution KB w.r.t. the DPI given as input such that this solution KB satisfies all test cases added during the debugging session and there is no other such solution KB. The latter problem searches for a solution KB w.r.t. the current DPI (i.e. the input DPI including all new test cases added throughout the debugging session so far) such that there is no other solution KB w.r.t. the current DPI.

Next, in Chapter 7, the central term of a query is specified which constitutes the medium for user interaction. Queries are generated from a set of leading diagnoses which is characterized thereafter. The set of leading diagnoses is uniquely partitioned into three subsets by each query. The tuple including these subsets is called q-partition. Subsequently, the reader is given some explanations of how the q-partition can be interpreted, and how it relates to a query. In fact, we will prove that the notion of a q-partition can serve as a criterion for checking whether a set of logical formulas is a query or not. After that, we will learn that a query exists for any set of (at least two) leading diagnoses, which guarantees that the presented algorithms will definitely be able to come up with a query without the need to impose any restrictions on which (minimal) diagnoses are computed by the diagnosis engine in each iteration.

Chapter 8 shows a method for the generation of (a pool of) set-minimal queries (Algorithm 4) that aims to burden the interacting user as little as possible, features in-depth discussions of this method’s properties, proves its correctness, provides complexity results and gives some illustrating examples. Further on, drawbacks of this method are pointed out and possible solutions are discussed.

Subsequently, Chapter 9 deals with the presentation of the central algorithm of this work which implements an interactive KB debugging system (Algorithm 5). First, an overview of the workflow of interactive KB debugging is given, followed by a more detailed specification of the algorithm. Some query selection measures are discussed [RSFF13, SFFR12] and optimization versions of the problems of interactive dynamic and static KB debugging are defined where the goal is to obtain the solution to these problems by asking the user a minimal number of queries. Finally, we prove the correctness of the interactive KB debugging algorithm and provide a discussion of its complexity.

Non-theoretically-oriented readers might well skip Sections 8.2, 8.4, 8.5, 8.7 and 9.4 in this part. Moreover, for the superficially interested reader, it may suffice to concentrate only on Chapter 6 and Sections 7.1, 7.2 and 9.1 in this part.20

So far, we have learned that the problem of (parsimonious) KB debugging as defined in Problem Definitions 3.1 and 3.2 in Chapter 3 can be solved by investigating minimal diagnoses w.r.t. a given DPI ⟨K, B, P, N⟩R. We have seen how minimal diagnoses can be computed, we have introduced a probability space over diagnoses and we have discussed how a-priori probability estimates for diagnoses can be established. Now, assume the situation where a DPI with say 100 minimal diagnoses is given, among which there is one diagnosis D with highest estimated probability p(D) = 10%. By the definitions of a diagnosis and a solution KB (Definitions 3.2 and 3.5), each of the 100 diagnoses can be used to formulate a solution KB w.r.t. the DPI ⟨K, B, P, N⟩R. So, should the system output the solution KB (K \ D) ∪ U_P obtained from D as the optimal solution? Will a user be satisfied with a likelihood of 90% of being offered a suboptimal solution? What if the diagnosis probabilities are bad estimates and another diagnosis D′ should actually have a probability of 20%?

Why not simply apply Algorithm 3 to show all 100 minimal diagnoses to the user and let them select the preferred one by hand? First, due to the complexity of diagnosis calculation algorithms (cf. Chapter 1), pre-computation of 100 (or, generally, all) minimal diagnoses is usually not tractable within reasonable time. This makes such an approach quite unattractive in an interactive setting. Second, going through large sets of diagnoses can be time-consuming, tedious and error-prone. Third, human beings are normally not capable of (fully) realizing the semantic consequences of deleting a diagnosis from a KB, especially if the KB is large, complex and/or has been created by multiple engineers or automatic systems. Thus, applying a suboptimal diagnosis can result in unexpected entailments or unwanted changes, and thus an incorrect solution KB (incorrect in the sense of the semantics, not in the sense of violating given requirements or test cases), which might cause unexpected new faults and contradictions when augmented by new formulas. Consequently, a solution diagnosis is only acceptable if the user has sufficiently scrutinized and approved its semantic effect on the KB.

This leads to the definition of two types of Interactive KB Debugging problems. First, there is the problem of Interactive Dynamic KB Debugging which, given an input DPI, aims at the extension of this DPI by new test cases confirmed by a user such that there is only one minimal diagnosis left w.r.t. the extended DPI. Second, we specify the problem of Interactive Static KB Debugging which, given an input DPI, aims at the formulation of new test cases confirmed by a user such that these new test cases rule out all but one minimal diagnosis w.r.t. the input DPI.

[Problem definition of Interactive Dynamic KB Debugging; rendered as an image in the source.]

Remark 6.1 The solution of an Interactive Dynamic KB Debugging problem given the DPI ⟨K, B, P, N⟩R solves the problem of KB Debugging (Problem Definition 3.1) as well as the problem of Parsimonious KB Debugging (Problem Definition 3.2) for the DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R, but in general not for the original DPI ⟨K, B, P, N⟩R. This is the reason why we term it “dynamic”, since a solution is found for a version of the initial DPI that has been extended by test cases.

[Problem definition of Interactive Static KB Debugging; rendered as an image in the source.]

Remark 6.2 The solution of an Interactive Static KB Debugging problem given the DPI ⟨K, B, P, N⟩R constitutes a solution to the problem of KB Debugging (Problem Definition 3.1) as well as to the problem of Parsimonious KB Debugging (Problem Definition 3.2) for the original DPI ⟨K, B, P, N⟩R, hence the term “static”.

Now, we give a more formal definition of a true diagnosis (an informal characterization of which was given in Section 4.6). If sufficiently many new test cases are specified and added to a given DPI such that there is only one remaining minimal diagnosis w.r.t. the input DPI (the input DPI extended by the new test cases) left, then this diagnosis is referred to as the true diagnosis w.r.t. Interactive Static (Dynamic) KB Debugging.

Definition 6.1 (True Diagnosis). Let Dt be equal to D in Problem Definition 9.2 (9.1). Then Dt is called the true diagnosis w.r.t. Interactive Static KB Debugging (Interactive Dynamic KB Debugging).

The idea in interactive KB debugging is to iteratively consult a user asking them to give additional information as regards desired and undesired entailments of the correct KB. Thus, the principle of interactive KB debugging is based on that of Sequential Diagnosis which has been suggested by [dKW87] as an iterative way to localize the faulty components (among an initially large set of possibilities) in malfunctioning digital circuits by performing repeated (most informative) measurements. We have shown in our previous works [SF10, SFFR12] how sequential diagnosis can be applied to KBs (ontologies).

In our approach, for the selection of which question (of a pool of possible ones) to ask a user next, an active learning [Set12] approach is applied.21 Active Learning is an iterative supervised machine learning technique in which a learning algorithm is able to interactively query the user to obtain a label for a desired unlabeled instance. In the case of a KB debugging system, an unlabeled instance is a set of logical formulas and the label is whether the conjunction of these formulas should or should not be entailed by the correct KB. Since the learner can choose the instances to be labeled, the number of consultations of an interacting user required to learn a concept (in this case the one solution KB with the desired semantics w.r.t. a given DPI) can often be much lower than the number required in a standard supervised learning setting since the risk that the algorithm must deal with lots of uninformative examples is reduced.

We suppose the user of an interactive KB debugger to be a single person or multiple persons, usually experts of the particular domain the faulty KB is dealing with or authors of the faulty KB. Moreover, we assume the interacting user to be able to answer concrete queries about the intended domain that should be modeled. Otherwise put, we suppose that a user can classify a given logical formula (or a conjunction of logical formulas) as a wanted or unwanted proposition in the intended domain, i.e. as an entailment or non-entailment of the correct domain model. We have already argued in Chapter 1 why this assumption is plausible.

7.1 Queries

In interactive KB debugging, a set of logical formulas Q is presented to the user who should decide whether to assign Q to the set of positive (P) or negative (N) test cases w.r.t. a given DPI ⟨K, B, P, N⟩R. In other words, the system asks the user “should the KB you intend to model entail all formulas in Q?”. Here, Q is generated by the debugging algorithm in a way that any decision of the user

1. invalidates at least one minimal diagnosis (search space restriction) and

2. preserves validity of at least one minimal diagnosis (solution preservation).

We call a set of logical formulas Q with these properties a query. Successive classification of queries as entailments (all formulas in Q must be entailed) or non-entailments (at least one formula in Q must not be entailed) of the correct KB enables gradual restriction of the search space for (minimal) diagnoses. Further on, classification of sufficiently many queries guarantees the detection of a single correct solution diagnosis which can be used to determine a solution KB with the correct semantics w.r.t. a given DPI.22

Definition 7.1 (Query). Let ⟨K, B, P, N⟩R be a DPI over L and D ⊆ mD⟨K,B,P,N⟩R. Then a set of logical formulas Q ≠ ∅ over L is called a query w.r.t. D iff there are diagnoses D, D′ ∈ D such that D ∉ mD⟨K,B,P∪{Q},N⟩R and D′ ∉ mD⟨K,B,P,N∪{Q}⟩R. The set of all queries w.r.t. D and ⟨K, B, P, N⟩R is denoted by QD,⟨K,B,P,N⟩R.

Remark 7.1 Although Definition 7.1 only postulates that at least one diagnosis in D is invalidated for whatever answer is given to the query, this implies that, for each answer to the query, there is also a diagnosis that remains valid after adding the corresponding test case to the DPI, as will be shown by Proposition 7.4.

So, w.r.t. a set of minimal diagnoses D ⊆ mD⟨K,B,P,N⟩R, a query Q is a set of logical formulas that rules out at least one diagnosis in D (and therefore in mD⟨K,B,P,N⟩R) as a candidate to formulate a solution KB, regardless of whether Q is classified as a positive or negative test case.

7.2 Leading Diagnoses

Query generation requires a precalculated set of minimal diagnoses D ⊆ mD⟨K,B,P,N⟩R that serves as a representative for all minimal diagnoses mD⟨K,B,P,N⟩R. As already mentioned, computation of the entire set mD⟨K,B,P,N⟩R is generally not tractable within reasonable time. Usually, D is defined as a set of most probable or minimum cardinality diagnoses (cf. Chapter 4). Therefore, D is called the set of leading diagnoses w.r.t. ⟨K, B, P, N⟩R [SFFR12].

The leading diagnoses D are then exploited to determine a query Q the answering of which enables a discrimination between the diagnoses in mD⟨K,B,P,N⟩R. That is, a subset of mD⟨K,B,P,N⟩R which is not “compatible” with the new information obtained by adding the test case Q to P or N is ruled out (see Proposition 7.3 below). For the computation of the subsequent query, only a leading diagnoses set Dnew w.r.t. the minimal diagnoses still compliant with the new sets of test cases P′ and N′ is taken into consideration, i.e. Dnew ⊆ mD⟨K,B,P′,N′⟩R.

The number of precomputed leading diagnoses D affects the quality of the obtained query. The higher |D|, the more representative D is w.r.t. mD⟨K,B,P,N⟩R, the more options there are to specify a query in a way that a user can easily comprehend and answer it, and the higher the chance that a query that eliminates a high rate of diagnoses w.r.t. D will also eliminate a high rate of all minimal diagnoses mD⟨K,B,P,N⟩R. The selection of a lower |D|, on the other hand, means better timeliness regarding the interaction with a user, first because fewer leading diagnoses might be computed much faster and second because the search space for an “optimal” query is smaller.23 So, the optimal number of leading diagnoses depends on the complexity of the particular DPI considered. One way to determine a suitable |D| can be to first define an interval [nmin, nmax] that must comprise |D|, where the upper bound defines the desired number of leading diagnoses and the lower bound the minimally postulated number. Second, the search for minimal diagnoses is run at least as long as it takes to compute nmin diagnoses and at the longest until nmax diagnoses have been found or a timeout t expires, where t is specified such that it enables frequent user interaction. Note that such parameters have already been taken into account in the non-interactive KB debugging Algorithm 2 (see Section 4.7).

7.3 Q-Partitions

Now we introduce the notion of a q-partition, a partition of the leading diagnoses set D induced by a query w.r.t. D. A q-partition will be a helpful instrument in deciding whether a set of logical formulas is a query or not. It will facilitate an estimation of the impact a query answer has in terms of invalidation of minimal diagnoses. And, given fault probabilities, it will enable us to gauge the probability of getting a positive or negative answer to a query.

From now on, given a DPI ⟨K, B, P, N⟩R and some minimal diagnosis Di w.r.t. ⟨K, B, P, N⟩R, we will use the following abbreviation for the solution KB obtained by deletion of Di along with the given background knowledge B:

K∗i := (K \ Di) ∪ B ∪ U_P

Definition 7.2 (q-Partition24). Let ⟨K, B, P, N⟩R be a DPI over L, D ⊆ mD⟨K,B,P,N⟩R. Further, let Q be a set of logical formulas over L and

D+(Q) := {Di ∈ D | K∗i ⊨ Q},

D−(Q) := {Di ∈ D | ∃x ∈ R ∪ N : K∗i ∪ Q violates x},

D0(Q) := D \ (D+(Q) ∪ D−(Q)).

Then ⟨D+(Q), D−(Q), D0(Q)⟩ is called a q-partition iff Q is a query w.r.t. D and ⟨K, B, P, N⟩R.
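To make the three sets tangible, the following Python sketch computes them for a toy propositional DPI. Everything about the encoding is an assumption made for illustration: formulas are clauses (frozensets of signed integers), entailment is decided by brute-force enumeration of truth assignments, the requirements R are reduced to consistency, and each negative test case is a set of clauses that must not be entailed. The toy KB (A, A → B, B → C with negative test case C) is not the example DPI of this work.

```python
from itertools import product


def satisfies(valuation, clause):
    """True if at least one literal of the clause holds in the valuation."""
    return any(valuation[abs(lit)] == (lit > 0) for lit in clause)


def models(clauses, atoms):
    """Enumerate all truth assignments over `atoms` satisfying `clauses`."""
    for bits in product([False, True], repeat=len(atoms)):
        valuation = dict(zip(atoms, bits))
        if all(satisfies(valuation, c) for c in clauses):
            yield valuation


def entails(kb, formulas, atoms):
    """kb entails every clause in `formulas` (checked over all models)."""
    return all(satisfies(v, c) for v in models(kb, atoms) for c in formulas)


def violates(kb, negatives, atoms):
    """kb is inconsistent or entails some negative test case."""
    inconsistent = next(models(kb, atoms), None) is None
    return inconsistent or any(entails(kb, n, atoms) for n in negatives)


def q_partition(K, B, U_P, N, D, Q, atoms):
    """Compute (D+, D-, D0) for the candidate query Q."""
    d_plus, d_minus = [], []
    for Di in D:
        k_star = (K - Di) | B | U_P          # K*_i = (K \ Di) u B u U_P
        if entails(k_star, Q, atoms):
            d_plus.append(Di)
        if violates(k_star | Q, N, atoms):
            d_minus.append(Di)
    d_zero = [Di for Di in D if Di not in d_plus and Di not in d_minus]
    return d_plus, d_minus, d_zero


# Toy DPI: atoms 1 = A, 2 = B, 3 = C; atom 4 is unrelated to the KB.
atoms = [1, 2, 3, 4]
ax1, ax2, ax3 = frozenset({1}), frozenset({-1, 2}), frozenset({-2, 3})
K = {ax1, ax2, ax3}
N = [{frozenset({3})}]                       # C must not be entailed
D1, D2, D3 = frozenset({ax1}), frozenset({ax2}), frozenset({ax3})
Q = {frozenset({2})}                         # "should B be entailed?"

partition = q_partition(K, set(), set(), N, [D1, D2, D3], Q, atoms)
```

Here only the KB obtained by deleting ax3 entails B, whereas adding B to either of the other two candidate KBs makes C derivable. A candidate such as Q = {atom 4}, which none of the three KBs entails and which causes no violation, puts all diagnoses into D0(Q).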

Remark 7.2 The set D−(Q) contains exactly those diagnoses Di ∈ D where K \ Di is invalid w.r.t. ⟨·, B, P ∪ {Q}, N⟩R (cf. Definition 3.3).

Proposition 7.1. For each query Q w.r.t. some D ⊆ mD⟨K,B,P,N⟩R it holds that ⟨D+(Q), D−(Q), D0(Q)⟩ is a partition of D.

Proof. First, by definition of D0(Q), we have that D+(Q) ∪ D−(Q) ∪ D0(Q) = D, D+(Q) ∩ D0(Q) = ∅ and D−(Q) ∩ D0(Q) = ∅. Second, D+(Q) ∩ D−(Q) = ∅ since K∗i ⊨ Q and ∃x ∈ R ∪ N : (K∗i ∪ Q violates x) imply by idempotency of L that K∗i violates some x ∈ R ∪ N, which is a contradiction to Di being a diagnosis w.r.t. ⟨K, B, P, N⟩R. Thus, each diagnosis in D is an element of exactly one of the sets D+(Q), D−(Q), D0(Q), which is equivalent to the statement of the proposition.

Remark 7.3 In fact, Proposition 7.1 holds for any set D ⊆ aD⟨K,B,P,N⟩R, i.e. for any subset of all diagnoses w.r.t. ⟨K, B, P, N⟩R. This can be easily seen from the proof of Proposition 7.1, which does not require minimality of diagnoses. That is, any set of diagnoses w.r.t. a DPI is partitioned into the three sets D+(Q), D−(Q) and D0(Q) as per Definition 7.2 by a query Q w.r.t. this DPI.

Proposition 7.2. For each query Q w.r.t. some D ⊆ mD⟨K,B,P,N⟩R there is one and only one partition ⟨D+(Q), D−(Q), D0(Q)⟩.

Proof. The existence of a partition ⟨D+(Q), D−(Q), D0(Q)⟩ follows directly from Proposition 7.1. Assume there are two different partitions ⟨D+1(Q), D−1(Q), D01(Q)⟩ and ⟨D+2(Q), D−2(Q), D02(Q)⟩. Then, (a) D+1(Q) ≠ D+2(Q) or (b) D−1(Q) ≠ D−2(Q) or (c) D01(Q) ≠ D02(Q) must hold. If (a) is true, then there is one diagnosis Di ∈ D such that K∗i ⊨ Q and K∗i ⊭ Q, a contradiction. If (b) is true, then there is one diagnosis Di ∈ D such that K∗i ∪ Q violates some x ∈ R ∪ N and K∗i ∪ Q does not violate any y ∈ R ∪ N, a contradiction. If (c) is true, then (D+1(Q) ∪ D−1(Q)) ≠ (D+2(Q) ∪ D−2(Q)), which implies that either (a) or (b) must be true.

Due to the uniqueness of a q-partition  ⟨D+(Q), D−(Q), D0(Q)⟩for a query Q, we denote this q-partition by P(Q). As a consequence of Definition 7.2 and Proposition 7.2, a query Q is a set of common entailments of KBs  K∗i, each resulting from the deletion of a single minimal diagnosis  Di ∈ D+(Q)from K.

Corollary 7.1. For each query  Q ∈ QD,⟨K,B,P,N⟩Rthere is a set of minimal diagnoses  D+(Q) ⊆mD⟨K,B,P,N⟩Ras defined by Definition 7.2 such that  Q ⊆ {e | ∀Di ∈ D+(Q) : K∗i |= e}.

7.4 Interpretation of Q-Partitions

Since K∗i corresponds to the solution KB (along with B) obtained under the assumption that Dt = Di, i.e. the true diagnosis (cf. Definition 6.1) corresponds to Di, the sets D+(Q) and D−(Q) can be interpreted as those leading diagnoses that predict the classification of Q as a positive and negative test case, respectively. In other words, if the true diagnosis Dt is in D+(Q), then the true solution KB K∗t entails Q by Definition 7.2. Therefore the user will answer Q positively (cf. Definition 6.1). If, conversely, Dt is in D−(Q), then the true solution KB K∗t would be invalidated if Q was answered positively, since K∗t ∪ Q = (K \ Dt) ∪ B ∪ U_{P∪{Q}} violates some x ∈ R ∪ N and thus K \ Dt is invalid w.r.t. ⟨·, B, P ∪ {Q}, N⟩R, which implies that Dt is not a diagnosis w.r.t. ⟨K, B, P ∪ {Q}, N⟩R according to Proposition 3.2. Hence, the user will answer Q negatively (cf. Definition 6.1). Diagnoses in D0(Q), on the other hand, predict neither Q ∈ P nor Q ∈ N. This means that we do not know how the user will answer a query Q for which the true diagnosis Dt is in D0(Q). In this case, for any answer to Q, the true diagnosis Dt is in the set of minimal diagnoses w.r.t. the new DPI including Q as a test case. To summarize: If the true diagnosis Dt is an element of D+(Q) (D−(Q)), then Q will be answered positively (negatively).

Conversely, this means that a q-partition P(Q) indicates a priori which leading diagnoses would be invalidated by a user’s answer. Diagnoses in D+(Q) are invalidated by the classification Q ∈ N, and diagnoses in D−(Q) by the classification Q ∈ P. Diagnoses in D0(Q) can never be invalidated by an answer to Q. Thus, intuitively, queries with D0(Q) = ∅ are preferable over other queries (as per the information provided by the set of leading diagnoses D) as the number of (definitely) eliminated diagnoses in mD⟨K,B,P,N⟩R should be maximized.
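The reading of a q-partition given above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names QPartition, eliminated and remaining are hypothetical and do not appear in the original, diagnoses are represented as frozensets of formula numbers, and the sample q-partition ⟨{D3, D4}, {D1, D2}, ∅⟩ is the one derived in Example 8.1.

```python
from typing import FrozenSet, NamedTuple

Diagnosis = FrozenSet[int]


class QPartition(NamedTuple):
    """Hypothetical container for P(Q) = <D+(Q), D-(Q), D0(Q)>."""
    d_plus: FrozenSet[Diagnosis]   # diagnoses predicting a positive answer
    d_minus: FrozenSet[Diagnosis]  # diagnoses predicting a negative answer
    d_zero: FrozenSet[Diagnosis]   # diagnoses predicting neither answer


def eliminated(partition: QPartition, answer: bool) -> FrozenSet[Diagnosis]:
    """Leading diagnoses invalidated by the answer u(Q) = answer."""
    # A positive answer rules out D-(Q); a negative answer rules out D+(Q).
    return partition.d_minus if answer else partition.d_plus


def remaining(partition: QPartition, answer: bool) -> FrozenSet[Diagnosis]:
    """Leading diagnoses not invalidated by the answer; D0(Q) always survives."""
    kept = partition.d_plus if answer else partition.d_minus
    return kept | partition.d_zero


# Leading diagnoses of Example 8.1, written as sets of formula numbers.
D1, D2, D3, D4 = (frozenset(d) for d in ([1], [3], [4, 5], [2, 4]))
p_q = QPartition(frozenset({D3, D4}), frozenset({D1, D2}), frozenset())

print(sorted(map(sorted, eliminated(p_q, True))))
print(sorted(map(sorted, eliminated(p_q, False))))
```

Because D0(Q) is returned by remaining for both answers, a q-partition with a large D0(Q) leaves many leading diagnoses untouched whatever the user says, which is the intuition behind preferring D0(Q) = ∅.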

The following proposition is a direct consequence of Corollary 3.3 and explicates the impact of the addition of a test case to a DPI regarding the set of minimal diagnoses for this DPI.

Proposition 7.3. Let Q be a query w.r.t. D ⊆ mD⟨K,B,P,N⟩R and let the answer of a user to Q be u(Q) ∈ {true, false}.

If u(Q) = true, then Di ∈ mD⟨K,B,P,N⟩R is a diagnosis w.r.t. ⟨K, B, P ∪ {Q}, N⟩R iff K \ Di is valid w.r.t. ⟨·, B, P ∪ {Q}, N⟩R.

image

Remark 7.4 From Proposition 7.3 and Definition 7.2 it is easy to see that at least the diagnoses Di ∈ D−(Q) ⊂ mD⟨K,B,P,N⟩R are eliminated by a positive answer to Q. Namely, D−(Q) comprises exactly those diagnoses Di that imply the violation of some r ∈ R or the entailment of some n ∈ N if Q is added to K∗i. On the other hand, at least the diagnoses Di ∈ D+(Q) ⊂ mD⟨K,B,P,N⟩R are discarded if u(Q) = false, as the solution KBs of all diagnoses in D+(Q) entail Q, which must not be entailed.

Note that, in general, the addition of a query to the test cases of a DPI causes not only an invalidation of some leading minimal diagnoses in D, but also the elimination of minimal diagnoses that have not even been computed yet. On the other hand, an added test case might also introduce new minimal diagnoses, i.e. ones that were not minimal diagnoses before this test case was added. However, the newly obtained DPI after the addition of any new test case can only exhibit a reduced set of all (i.e. minimal and non-minimal) diagnoses compared with the DPI before the test case was added (we will prove this result by Proposition 12.3).

7.5 The Relation between a Query and Its Q-Partition

The following proposition shows the relationship between a query and its q-partition and provides a criterion for checking whether a set of logical formulas is a query w.r.t. some set of leading diagnoses or not.

Proposition 7.4. Let ⟨K, B, P, N⟩R be a DPI over L and D ⊆ mD⟨K,B,P,N⟩R. Then a set of logical formulas Q ≠ ∅ over L is a query w.r.t. D iff D+(Q) ≠ ∅ and D−(Q) ≠ ∅.

Proof. “⇐”: If D+(Q) ≠ ∅ and D−(Q) ≠ ∅ hold, then a non-empty set of diagnoses D−(Q) (D+(Q)) becomes invalid for a positive (negative) answer to Q. So, Q is a query.

“⇒”: If Q is a query, then there are diagnoses D, D′ ∈ D such that D ∉ mD⟨K,B,P∪{Q},N⟩R and D′ ∉ mD⟨K,B,P,N∪{Q}⟩R. Consequently, D ∈ D \ mD⟨K,B,P∪{Q},N⟩R and D′ ∈ D \ mD⟨K,B,P,N∪{Q}⟩R holds. But, as the diagnoses in D \ mD⟨K,B,P∪{Q},N⟩R are exactly the diagnoses in D that become invalid by the positive answer to Q, we obtain D ∈ D−(Q). The argumentation for D′ ∈ D+(Q) is analogous. Hence, D+(Q) ≠ ∅ and D−(Q) ≠ ∅.
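The criterion of Proposition 7.4 lends itself to a small executable sketch. The code below is an illustration only, under loudly stated assumptions: the functions q_partition and is_query, the oracles toy_entails and toy_violates, and the whole toy logic are hypothetical. In the toy logic a KB is a set of formula numbers, a KB "entails" Q iff it contains Q, and a KB is "violated" iff it contains one of the two minimal conflict sets of Example 8.1. A real system would call a reasoner instead.

```python
from typing import Callable, Dict, FrozenSet, Tuple

KB = FrozenSet[int]
Partition = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]


def q_partition(query: KB, solution_kbs: Dict[str, KB],
                entails: Callable[[KB, KB], bool],
                violates: Callable[[KB], bool]) -> Partition:
    """Assign each leading diagnosis to D+, D- or D0 (cf. Definition 7.2)."""
    d_plus, d_minus, d_zero = set(), set(), set()
    for name, kb in solution_kbs.items():
        if entails(kb, query):        # K*_i |= Q
            d_plus.add(name)
        elif violates(kb | query):    # K*_i u Q violates some x in R u N
            d_minus.add(name)
        else:                         # neither prediction
            d_zero.add(name)
    return frozenset(d_plus), frozenset(d_minus), frozenset(d_zero)


def is_query(query: KB, partition: Partition) -> bool:
    """Proposition 7.4: Q non-empty, D+(Q) non-empty, D-(Q) non-empty."""
    d_plus, d_minus, _ = partition
    return bool(query) and bool(d_plus) and bool(d_minus)


# Toy instance (assumption): formulas are numbers, B = {6, 7, 8}, P is empty.
K = frozenset({1, 2, 3, 4, 5})
B = frozenset({6, 7, 8})
CONFLICTS = [frozenset({1, 3, 4}), frozenset({1, 2, 3, 5})]
DIAGNOSES = {"D1": frozenset({1}), "D2": frozenset({3}),
             "D3": frozenset({4, 5}), "D4": frozenset({2, 4})}
KBS = {name: (K - d) | B for name, d in DIAGNOSES.items()}


def toy_entails(kb: KB, query: KB) -> bool:
    return query <= kb


def toy_violates(kb: KB) -> bool:
    return any(conflict <= kb for conflict in CONFLICTS)


p1 = q_partition(frozenset({1, 3}), KBS, toy_entails, toy_violates)
p2 = q_partition(frozenset({6}), KBS, toy_entails, toy_violates)
print(is_query(frozenset({1, 3}), p1), is_query(frozenset({6}), p2))
```

In this toy setting the set {6}, which consists of a background formula only, lands every diagnosis in D+ and is therefore not a query, whereas {1, 3} separates {D3, D4} from {D1, D2}.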

Corollary 7.2. Let D ⊆ mD⟨K,B,P,N⟩R. Then, for each q-partition P(Q) = ⟨D+(Q), D−(Q), D0(Q)⟩ w.r.t. D it holds that D+(Q) ≠ ∅ and D−(Q) ≠ ∅.

Proof. Follows from Definition 7.2, which grants the existence of a query for any q-partition, and Proposition 7.4, which states that neither D+(Q) nor D−(Q) may be empty for any query.

So, by Proposition 7.4, a query not only eliminates at least one leading diagnosis, but also leaves at least one leading diagnosis valid. Therefore, an admissible DPI can never become non-admissible by adding a query to the positive or negative test cases.

Corollary 7.3. Let ⟨K, B, P, N⟩R be an admissible DPI, D ⊆ mD⟨K,B,P,N⟩R and Q ∈ QD,⟨K,B,P,N⟩R. Then ⟨K, B, P ∪ {Q}, N⟩R as well as ⟨K, B, P, N ∪ {Q}⟩R are admissible DPIs.

Proof. Assume that ⟨K, B, P ∪ {Q}, N⟩R is non-admissible. Then there is no valid diagnosis for this DPI. Since ⟨K, B, P, N⟩R is an admissible DPI, this means that Q invalidates each diagnosis D ∈ aD⟨K,B,P,N⟩R ⊇ mD⟨K,B,P,N⟩R ⊃ D. By Proposition 7.4, this is a contradiction to the fact that Q is a query. The argumentation for ⟨K, B, P, N ∪ {Q}⟩R is analogous.

This means in particular that a query can never contain a conflict set or result in a violation of some requirement r ∈ R when added to B ∪ UP (cf. Proposition 3.4).

7.6 Existence of Queries

For any set of at least two leading minimal diagnoses the existence of a query is guaranteed, as the next proposition and corollary show. In particular, this implies that for any two minimal diagnoses D, D′ w.r.t. a DPI there is a query Q that makes it possible to differentiate between D and D′, i.e. exactly one of these diagnoses is invalidated by each answer to Q.

Proposition 7.5. Let D ⊆ mD⟨K,B,P,N⟩R with |D| ≥ 2 and UD be the union of all diagnoses in D. Then

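(I) Q := UD \ Di is a query w.r.t. D and ⟨K, B, P, N⟩R for each Di ∈ D, and

(II) the q-partition of this query is ⟨{Di}, D \ {Di}, ∅⟩.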

Proof. Ad (I): Assume that Q is not a query. Then either (1) Q = ∅ or (2) D+(Q) = ∅ or (3) D−(Q) = ∅. In the following we prove that neither (1) nor (2) nor (3) can hold.

(1): Q = ∅ means that Di ⊇ UD. Since any diagnosis D in D is a subset of UD, this implies that for each D ∈ D, D ⊆ Di holds. As |D| ≥ 2 is assumed, there is a Dk ∈ D with Dk ≠ Di for which this property holds. This, however, is a contradiction to the minimality of diagnosis Di.

(2): D+(Q) = ∅ cannot hold, since (K \ Di) ⊇ (UD \ Di) and UD \ Di |= Q by monotonicity of description logics imply that K∗i = (K \ Di) ∪ B ∪ UP |= Q. Hence, there is at least one diagnosis, namely Di, in D+(Q).

(3): To prove that D−(Q) ≠ ∅, we must show that there is a diagnosis D ∈ D such that Y := (K \ D) ∪ B ∪ UP ∪ Q = (K \ D) ∪ B ∪ UP ∪ (UD \ Di) is incoherent. However, (K \ D) ∪ (UD \ Di) = K \ (D ∩ Di) by distributive and De Morgan laws, which yields Y = K \ (D ∩ Di) ∪ B ∪ UP. But D ∩ Di ⊂ D must hold as D ⊈ Di by the subset-minimality of Di, whereby D must comprise a formula ax ∉ Di. Hence, Y ⊃ (K \ D) ∪ B ∪ UP is incoherent by subset-minimality of D.

Ad (II): We already know that Di ∈ D+(Q) by (2). Since D ∈ D in (3) can be chosen arbitrarily, we obtain that D ∈ D−(Q) for all diagnoses D ∈ D \ {Di}.

We immediately obtain a lower bound for the number of queries by Proposition 7.5:

Corollary 7.4. Let  D ⊆ mD⟨K,B,P,N⟩Rwith |D| > 1. Then a lower bound for the number of queries w.r.t. D is |D|.
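The construction Q = UD \ Di used in the proof of Proposition 7.5 needs no reasoner at all, as the following hedged sketch shows. The function name trivial_queries and the representation of diagnoses as frozensets of formula numbers are assumptions made for illustration; the sample diagnoses are the leading diagnoses of Example 8.1.

```python
from typing import Dict, FrozenSet, List

Diagnosis = FrozenSet[int]


def trivial_queries(diagnoses: List[Diagnosis]) -> Dict[Diagnosis, FrozenSet[int]]:
    """Map each leading diagnosis D_i to the candidate query U_D \\ D_i."""
    if len(diagnoses) < 2:
        raise ValueError("Proposition 7.5 assumes at least two leading diagnoses")
    union = frozenset().union(*diagnoses)   # U_D, the union of all diagnoses
    return {d: union - d for d in diagnoses}


# Leading diagnoses of Example 8.1 (formula numbers).
LEADING = [frozenset(d) for d in ([1], [3], [4, 5], [2, 4])]
queries = trivial_queries(LEADING)

for diagnosis, query in queries.items():
    print(sorted(diagnosis), "->", sorted(query))
```

Since the leading diagnoses are pairwise distinct and subset-minimal, the |D| sets obtained this way are pairwise different, which is where the lower bound of Corollary 7.4 comes from.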

Remark 7.5 Notice that the preceding proposition and corollary require a set of minimal diagnoses. This means that subset-minimality of diagnoses is a necessary prerequisite for guaranteeing the possibility of discrimination between diagnoses. In other words, interactive debugging by means of (some or only) non-minimal diagnoses cannot be proven to work correctly (without making any further assumptions).

image

In this chapter we want to describe, discuss and prove the correctness of methods for the generation of queries which takes place in each iteration of an interactive KB debugging algorithm after a set of leading diagnoses has been determined. With Algorithm 4, similar versions of which can be found in [SFFR12, RSFF13], we present a way to compute a pool QP of queries and associated q-partitions w.r.t. a set of leading diagnoses D and a DPI  ⟨K, B, P, N ⟩R. The generation of this pool QP is the first stage of the query computation function used in the interactive debugging algorithm (Algorithm 5) presented below. In a second stage, one particular query that meets certain criteria such as maximum expected information gain is selected from QP (see Section 9.3).

Before we describe Algorithm 4, let us look at an example that demonstrates how a query w.r.t. some set of leading diagnoses for a DPI can be constructed. This should give the reader a first intuition of how the presented algorithm works.

Example 8.1 Consider the example FOL DPI given by Table 15.2. The set of minimal conflict sets is mC⟨K,B,P,N⟩R = {C1, C2} = {⟨1, 3, 4⟩, ⟨1, 2, 3, 5⟩} (as in previous examples, formulas ax i in Table 15.2 are sometimes referred to just by their number i if the meaning is clear from the context). Let the set of leading diagnoses be the set of all minimal diagnoses, i.e. D = mD⟨K,B,P,N⟩R = {D1, D2, D3, D4} = {[1], [3], [4, 5], [2, 4]}. To enable a better understanding of this example, we first analyze why C1 and C2 are minimal conflict sets w.r.t. ⟨K, B, P, N⟩R.

Why is C1 a conflict set w.r.t. ⟨K, B, P, N⟩R? In the following we underline the formulas ax i and relevant parts of these formulas used in the derivation of the conflict set. First, there is the background KB B including a1(w) and a1(u). Due to ax 1, by substitution of X by w (written as X/w), we obtain a2(w), m1(w) and m2(w) from a1(w). Likewise, we can derive a2(u), m1(u) and m2(u) from a1(u) by X/u. Substituting X by w in ax 3 yields m1(w) → ¬a(w) ∧ b(w). Thus, we obtain ¬a(w). A substitution of X by u in ax 4 results in m2(u) → (∀Y s(u, Y) → a(Y)) ∧ d(u). By Y/w, we have m2(u) → (s(u, w) → a(w)) ∧ d(u). Since m2(u) has already been deduced from the background formula a1(u) and s(u, w) is a background formula as well, we can conclude a(w) from ax 4. All in all, we have derived ¬a(w) and a(w), i.e. an inconsistency, by means of B and C1 (and UP, which is the empty set), wherefore C1 is a conflict set w.r.t. ⟨K, B, P, N⟩R by Definition 4.1. The minimality of C1 can be easily verified by the way we derived that it is a conflict set; namely, leaving out any of the formulas ax 1, ax 3 or ax 4 does not allow an inconsistency or incoherency to be derived (note that the set of negative test cases N is empty).

image

Table 8.1: First-Order Logic Example DPI

Why is C2 a conflict set w.r.t. ⟨K, B, P, N⟩R? We argue as follows to deduce the inconsistency responsible for C2 being a conflict set (the relevant implications and used formulas are again underlined):

image

Minimality of C2 can again be verified by observing that, if any formula of C2 is left out, no inconsistency or incoherency can be derived.

Now we show how to construct a query manually. As suggested by Definition 7.2 and Proposition 7.4 and discussed in Section 7.5, an obvious way of generating a query w.r.t. D and ⟨K, B, P, N⟩R is via the notion of a q-partition. Definition 7.2 states that Q is a set of common entailments of KBs K∗i (Formula 7.1) where Di ∈ D+(Q), a subset of D. Hence, a first step towards query computation is to choose some non-empty subset S of the leading diagnoses D which we will call the seed for query generation. For our manual construction, let S = {D3, D4} = {[4, 5], [2, 4]}. For each of the diagnoses Di in S, we assemble the KB K∗i and use a reasoning engine to obtain a set of entailments EDi of K∗i. For D3 we obtain K∗3 := {1, 2, 3, 4, 5} \ {4, 5} ∪ {6, 7, 8} ∪ {} = {1, 2, 3, 6, 7, 8}. Similarly, we compute K∗4 = {1, 3, 5, 6, 7, 8}.
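The assembly of K∗i = (K \ Di) ∪ B ∪ UP for the seed diagnoses can be replayed with plain set operations. The sketch below is illustrative only: the helper name solution_kb is hypothetical, formulas are represented by their numbers as in the example, and the set P of positive test cases is modelled as a (here empty) list of formula sets.

```python
from typing import FrozenSet, Iterable

Formulas = FrozenSet[int]


def solution_kb(K: Formulas, D: Formulas, B: Formulas,
                P: Iterable[Formulas]) -> Formulas:
    """Return K*_i = (K \\ D_i) u B u U_P for one diagnosis D_i."""
    u_p = frozenset().union(*P)   # U_P: union of all positive test cases
    return (K - D) | B | u_p


K = frozenset({1, 2, 3, 4, 5})    # formulas ax 1, ..., ax 5
B = frozenset({6, 7, 8})          # background formulas ax 6, ax 7, ax 8
P = []                            # no positive test cases in the example DPI

k3 = solution_kb(K, frozenset({4, 5}), B, P)   # seed diagnosis D3 = [4, 5]
k4 = solution_kb(K, frozenset({2, 4}), B, P)   # seed diagnosis D4 = [2, 4]
print(sorted(k3), sorted(k4))
```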

Suppose that the reasoner invoked by the used GETENTAILMENTS function produces only entailments of the type ∀X p1(X) → p2(X) for predicate names p1, p2 and of the type p(a) where p is a predicate name and a is a constant (cf. Remark 2.3). For this purpose, DL and OWL reasoners, respectively, such as Pellet [SPG+07], HermiT [SMH08], FaCT++ [TH06] or KAON2 could be used with their classification and realization reasoning services. This is possible because the DPI ⟨K, B, P, N⟩R given by Table 15.2 can be translated to DL similarly as demonstrated in Example 2.1, and all the mentioned reasoners can deal with the expressivity of the resulting DL language.

Then, we obtain the sets ED3 and ED4, i.e. the sets of entailments of K∗3 and K∗4, respectively, as depicted by Table 8.2. The set of common entailments Q, i.e. Q = ED3 ∩ ED4, is then the set containing all elements in the rows of Table 8.2 that are above the dashed line.

Notice at this point that the set {a1(w), a1(u), s(u, w)} = B does not need to be computed or, respectively, included in Q since none of these formulas can serve to discriminate between diagnoses (which is the only aim of a query). The simple reason for this is that K∗i for each Di ∈ D comprises these formulas and thus each K∗i entails these formulas by the extensiveness of FOL (cf. Chapter 2). Being entailed by each potential solution KB K∗i, these formulas cannot yield a violation of any requirements or test cases since none of the KBs K∗i violates any requirements or test cases (follows from Definitions 3.5 and 3.2).

Continuing with our query construction, we know by Proposition 7.4 that Q is a query w.r.t. D and ⟨K, B, P, N⟩R iff D+(Q) ≠ ∅ and D−(Q) ≠ ∅. Whereas it is trivial that the former condition is met, since D+(Q) contains (at least) the two diagnoses D3 and D4 that we used to compute Q (cf. Definition 7.2), we still need to verify whether the latter condition is actually satisfied for Q. To this end, as per Definition 7.2, we must simply find some diagnosis Dj in D \ S = {D1, D2, D3, D4} \ {D3, D4} = {D1, D2} such that K∗j ∪ Q violates some x ∈ N ∪ R, i.e. such that some negative test case is entailed or this KB is incoherent or inconsistent. So, we start with D1, i.e. we examine (K \ D1) ∪ B ∪ UP ∪ Q = {1, 2, 3, 4, 5} \ {1} ∪ {6, 7, 8} ∪ {} ∪ Q = {2, 3, 4, 5, 6, 7, 8} ∪ Q.

And, indeed, we are able to prove an inconsistency for this KB. To see this, verify that by X/w in e2 ∈ Q (see Table 8.2) and a1(w) = ax 6 ∈ K∗1 we can derive m1(w), which lets us conclude ¬a(w) by the substitution of X by w in ax 3 ∈ K∗1. On the other hand, we obtain a(w) by X/u in e3 ∈ Q, {X/u, Y/w} in ax 4 ∈ K∗1 and s(u, w) = ax 8 ∈ K∗1 as shown in the explanation for conflict set C1 above. Thus, D1 ∈ D−(Q).

That is, we have just proven that Q is de facto a query w.r.t. D and  ⟨K, B, P, N ⟩R. And this, although we have not yet assigned each leading diagnosis to the respective set of the q-partition of Q. In a situation where just any query shall be asked to the user, this would suffice, and the query could be presented to the interacting user.

However, in case a “best” query according to some criterion shall be determined from a set of different competing queries, usually the computation of the full q-partition of each competing query is required. This is due to the fact that the q-partition provides information about several properties of queries that are considered by common query selection techniques (for details see Section 9.3). So, let us complete the q-partition for our query Q by investigating K∗2 ∪ Q = {1, 2, 4, 5, 6, 7, 8} ∪ Q. Also in this case we can derive an inconsistency, which can be easily seen by reconsidering the argumentation why C2 is a conflict set above and by using e4 ∈ Q instead of ax 3 ∉ K∗2 ∪ Q. That means, the final q-partition P(Q) for Q is given by ⟨{D3, D4}, {D1, D2}, ∅⟩.

The next question, which arises directly from the proofs that D1, D2 ∈ D−(Q), is whether there is a (set-minimal) subset Qmin of Q such that Qmin preserves the discrimination properties of Q, i.e. the q-partition P(Qmin) = P(Q). In fact, the answer is yes for the query Q we computed, but also for the majority of other cases. This is a simple consequence of using the reasoning engine as a black box, which suggests the strategy we pursued in our query construction: a precomputation of entailments followed by a final minimization step. Sticking to this black-box concept, however, does not allow the use of a customized reasoning procedure that returns precisely a set of common entailments Q for a set of diagnoses S ⊂ D where all formulas in Q are necessary for a requirement or test case violation, respectively, of KBs K∗j for diagnoses in D \ S.

What militates for such a black-box approach is the generality and independence of a particular logic (for which an adequate glass-box reasoner exists), the easier implementation of the debugging system and potential performance issues with a glass-box approach [KPSH05]. For a black-box algorithm to work, only a reasoner implementing a sound and complete inference procedure for the used logic L must be available.

In general, there is more than one minimized version of a query that preserves the q-partition. Theoretically, the number of such minimal queries w.r.t. one q-partition can be exponential in the size of the initially computed query that is provided as an input to the minimization procedure. For our query Q, for instance,

image

are set-minimal, q-partition preserving subqueries. Namely, each of the sets Qmin,1, Qmin,2 and Qmin,3 together with {2, 5, 6, 7, 8} implies an inconsistency since m3(w) and ¬m3(w) can be derived and {2, 5, 6, 7, 8} ⊆ K∗1 and {2, 5, 6, 7, 8} ⊆ K∗2. {e2, e3} ⊂ Qmin,4 yields an inconsistency when added to K∗1, i.e. a(w) and ¬a(w) are entailed, and {e4} ⊂ Qmin,4 merged with K∗2 yields an inconsistency, i.e. the derivation of m3(w) and ¬m3(w). In order not to overwhelm the user, we would of course ask them such a minimized version of a query rather than the full query that contains plenty of irrelevant formulas.
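To make the idea of a q-partition preserving minimization concrete, here is a naive deletion-based sketch. It is not the MINQ procedure of Algorithm 4, whose details are not reproduced here; the names minimize_query and toy_partition are hypothetical, and the toy logic (formulas as numbers, entailment as set containment, invalidity as containment of a minimal conflict set of Example 8.1) is an assumption chosen so that the code runs without a reasoner.

```python
from typing import Callable, FrozenSet, Tuple

Formulas = FrozenSet[int]
Partition = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]

# Toy instance (assumption): K = {1..5}, B = {6, 7, 8}, P and N empty.
K = frozenset({1, 2, 3, 4, 5})
B = frozenset({6, 7, 8})
CONFLICTS = [frozenset({1, 3, 4}), frozenset({1, 2, 3, 5})]
DIAGNOSES = {"D1": frozenset({1}), "D2": frozenset({3}),
             "D3": frozenset({4, 5}), "D4": frozenset({2, 4})}
KBS = {name: (K - d) | B for name, d in DIAGNOSES.items()}


def toy_partition(query: Formulas) -> Partition:
    """Q-partition of `query` in the toy logic described above."""
    d_plus = {n for n, kb in KBS.items() if query <= kb}
    d_minus = {n for n, kb in KBS.items()
               if n not in d_plus and any(c <= kb | query for c in CONFLICTS)}
    d_zero = set(KBS) - d_plus - d_minus
    return frozenset(d_plus), frozenset(d_minus), frozenset(d_zero)


def minimize_query(query: Formulas,
                   partition_of: Callable[[Formulas], Partition]) -> Formulas:
    """Drop formulas one by one while the q-partition stays unchanged."""
    target = partition_of(query)
    current = set(query)
    for formula in sorted(query):
        candidate = frozenset(current - {formula})
        # A query must stay non-empty and keep exactly the same q-partition.
        if candidate and partition_of(candidate) == target:
            current = set(candidate)
    return frozenset(current)


q_full = frozenset({1, 3, 6, 7, 8})   # common formulas of K*_3 and K*_4
q_min = minimize_query(q_full, toy_partition)
print(sorted(q_min))
```

In this toy setting the background formulas 6, 7 and 8 are dropped because they do not help to discriminate between diagnoses, while removing 1 or 3 would move a diagnosis from D− to D+. A different deletion order can yield a different set-minimal result when several exist.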

An example of a seed S that does not lead to the discovery of a query is S = {D1, D2, D3} since the set of common entailments ED1 ∩ ED2 ∩ ED3 = ∅. Note that this holds when all EDi contain only entailments of the types we specified above. For other types of entailments, i.e. a different specification of the GETENTAILMENTS function, this might no longer hold.

8.1 Generation of a Pool of Queries

The main function GETPOOLOFQUERIES of Algorithm 4 gets as inputs an admissible DPI ⟨K, B, P, N⟩R over L, a set of leading (minimal) diagnoses D ⊆ mD⟨K,B,P,N⟩R such that |D| ≥ 2 and a parameter q ∈ ℕ ∪ {∞}, q ≥ 1 that indicates the number of queries in QD,⟨K,B,P,N⟩R the algorithm is supposed to return (where q := ∞ signalizes that a maximum number of queries should be output). The way of generating a pool of queries is guided by Proposition 7.4 which says that a non-empty set Q of formulas over L is a query w.r.t. D and ⟨K, B, P, N⟩R if and only if D+(Q) as well as D−(Q) are non-empty sets of diagnoses. That is, the necessary and sufficient criteria for Q to be a query are

(CQ1) Q ≠ ∅ and

(CQ2) D+(Q) ≠ ∅ and

(CQ3) D−(Q) ≠ ∅.

Note, since the disjoint sets of diagnoses D+(Q) ⊆ D and D−(Q) ⊆ D must not be empty, |D| ≥ 2 must be postulated in order for any queries to exist w.r.t. D and ⟨K, B, P, N⟩R (cf. Corollary 7.4).

image

2. QP includes a tuple ⟨Q, ⟨D+(Q), D−(Q), D0(Q)⟩⟩ only if Q ∈ QD,⟨K,B,P,N⟩R, and

3. QP includes at most one tuple where D+(Q) = Y for each Y ⊂ D, and

4. for each Y ⊂ D for which a query Q w.r.t. D and ⟨K, B, P, N⟩R exists such that (a) Q includes only entailments computed by the used GETENTAILMENTS function and (b) P(Q) is such that D+(Q) = Y, QP includes a tuple ⟨Q′, P(Q′)⟩ such that D+(Q′) = Y, and

5. QP ≠ ∅.

If q < |QPmax|, then QP includes q tuples satisfying (1), (2) and (3). (|QPmax| ≥ 0 is the maximum number of tuples ⟨Q, P(Q)⟩ that can be computed by GETPOOLOFQUERIES with the used GETENTAILMENTS function.)

image

10: for Dr ∈ D \ S do

11: if Q ⊆ EDr then ▷ Does K∗r |= Q?

12: D+ ← D+ ∪ {Dr}

image

25: return QP

image

37: procedure ISQPARTCONST(Q, ⟨D+, D−, D0⟩, ⟨K, B, P, N⟩R)

38: for Dr ∈ D− do

image

44: return true

image

Table 8.2: (Example 8.1) Entailments computed for KBs K∗3 and K∗4.

As a first action (lines 3-5), the algorithm computes a set of entailments EDi for each K∗i (cf. Formula 7.1) where Di ∈ D and stores these entailments along with the respective diagnosis as a tuple ⟨Di, EDi⟩ in a set ED. This is accomplished by the function GETENTAILMENTS which gets a tuple ⟨X, Y, Z, W⟩ of arguments where X, Y, Z are sets of formulas over some logic L and W is a set including sets of formulas over L. Then, GETENTAILMENTS computes a finite (cf. Remark 2.3) set of entailments of certain types (cf. Examples 8.1 and 8.6) of the KB (Y \ X) ∪ Z ∪ UW.

Then, the algorithm runs through all proper non-empty subsets S of the leading diagnoses D and, for each S, it computes the set of common entailments Q of all KBs K∗i where Di ∈ S (function GETCOMMONENTAILMENTS) by means of the precomputed set ED. That is, Q := ⋂_{Di∈S} EDi. If Q is non-empty, then CQ1 and CQ2 are fulfilled for Q. CQ2 is met since S ≠ ∅ and thus there is a diagnosis Di ∈ D such that K∗i |= Q, which implies that D+(Q) ≠ ∅. So, the algorithm proceeds to verify CQ3 (lines 10-17) by assigning the remaining diagnoses in D that are not in S to the corresponding sets D+(Q), D−(Q) or D0(Q) as per Definition 7.2. Note that the function ISKBVALID has been specified in Algorithm 1 on page 48. With the parameters given when called in line 13, ISKBVALID checks whether K∗r ∪ Q = (K \ Dr) ∪ B ∪ U_{P∪{Q}} does not violate any requirement in R and does not entail any test case in N. Once the call to this function returns false for one diagnosis Dr ∈ D \ S, it holds that Dr ∈ D−(Q) and thus CQ3 is definitely met. Therefore, isQuery is set to true in line 15. If, on the other hand, isQuery is not set to true for any diagnosis in D \ S, then the set D−(Q) = ∅ and thus Q is not in QD,⟨K,B,P,N⟩R.

So far, we have proven the following proposition.

Proposition 8.1. Let a DPI ⟨K, B, P, N⟩R, a set of diagnoses D ⊆ mD⟨K,B,P,N⟩R and a natural number q ≥ 1 be the input to the function GETPOOLOFQUERIES. Then, a value stored in variable Q at the time GETPOOLOFQUERIES executes line 18 is a query w.r.t. D and ⟨K, B, P, N⟩R iff the variable isQuery stores the value true.

If the purpose was only to find queries (and not q-partitions), the algorithm could stop processing for the current Q and go to the next set S, given that isQuery is set to true for some diagnosis. However, as the q-partition provides meaningful information to assess a query, e.g. it gives the number of diagnoses invalidated for each answer or the estimated probability of each answer (cf. Chapter 7), the q-partition is a necessary input to the subsequently called function SELECTBESTQUERY (line 48 in Algorithm 6, see later in Sections 9.2.4 and 9.3) that selects a query from the pool of queries QP. For this reason, the algorithm continues until the computation of the q-partition for Q is complete.

In a last step (lines 18-20), given that isQuery is true and there is not yet a query with the same q-partition in QP, the algorithm computes a set-minimal subset Qmin of Q such that the q-partition of Qmin is the same as the one of Q (function MINQ). Finally, the tuple ⟨Qmin, ⟨D+, D−, D0⟩⟩ including the minimized query Qmin along with its q-partition ⟨D+, D−, D0⟩ is added to QP. If |QP| = q, then QP is returned; otherwise, a further iteration for another S is executed. If |QP| = q is not met until all seeds S have been processed, the set QP is checked for emptiness in line 23. If QP = ∅, then the function ADDTRIVIALQUERIES (line 24) adds |D| ≥ 2 queries as defined by Q in Proposition 7.5 to QP (cf. Corollary 7.4) and then returns QP; otherwise, QP is directly returned.
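The steps described for GETPOOLOFQUERIES can be summarized in the following hedged Python sketch. It is not a transcription of Algorithm 4: the function signature, the callbacks get_entailments and is_kb_valid, and the toy instantiation are assumptions, and the MINQ step is deliberately omitted. The toy instance treats formulas as numbers, returns the non-background formulas of K∗i as (explicit) entailments, and calls a KB valid iff it contains neither of the two minimal conflict sets of Example 8.1.

```python
from itertools import combinations

INF = float("inf")


def get_pool_of_queries(diagnoses, kbs, get_entailments, is_kb_valid, q=INF):
    """Sketch of the seed loop; returns a list of (Q, (D+, D-, D0)) tuples."""
    names = sorted(diagnoses)
    # Lines 3-5: precompute one entailment set per leading diagnosis.
    ents = {n: frozenset(get_entailments(kbs[n])) for n in names}
    pool, seen = [], set()
    for size in range(1, len(names)):            # proper, non-empty seeds S
        for seed in combinations(names, size):
            query = frozenset.intersection(*(ents[n] for n in seed))
            if not query:                        # CQ1 fails
                continue
            d_plus, d_minus, d_zero = set(seed), set(), set()
            for r in names:                      # lines 10-17
                if r in seed:
                    continue
                if query <= ents[r]:             # does K*_r entail Q?
                    d_plus.add(r)
                elif not is_kb_valid(kbs[r] | query):
                    d_minus.add(r)               # CQ3 holds once this is hit
                else:
                    d_zero.add(r)
            part = (frozenset(d_plus), frozenset(d_minus), frozenset(d_zero))
            if d_minus and part not in seen:     # isQuery and no duplicate
                seen.add(part)
                pool.append((query, part))       # MINQ omitted in this sketch
                if len(pool) == q:
                    return pool
    if not pool:                                 # lines 23-24: trivial queries
        union = frozenset().union(*diagnoses.values())
        for n in names:
            part = (frozenset({n}), frozenset(names) - {n}, frozenset())
            pool.append((union - diagnoses[n], part))
    return pool


# Toy instantiation (assumption): K = {1..5}, B = {6, 7, 8}, P and N empty.
K = frozenset({1, 2, 3, 4, 5})
B = frozenset({6, 7, 8})
CONFLICTS = [frozenset({1, 3, 4}), frozenset({1, 2, 3, 5})]
DIAGNOSES = {"D1": frozenset({1}), "D2": frozenset({3}),
             "D3": frozenset({4, 5}), "D4": frozenset({2, 4})}
KBS = {n: (K - d) | B for n, d in DIAGNOSES.items()}

pool = get_pool_of_queries(
    DIAGNOSES, KBS,
    get_entailments=lambda kb: kb - B,           # explicit entailments only
    is_kb_valid=lambda kb: not any(c <= kb for c in CONFLICTS),
)
print(len(pool))
```

With this particular choice of entailments every proper non-empty seed yields a query, and every resulting q-partition has an empty D0; with the entailment types used in Example 8.1 the outcome differs, e.g. the seed {D1, D2, D3} yields no query there.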

Remark 8.1 Notice that lines 23 and 24 in Algorithm 4 aim at ensuring the non-emptiness of the pool of queries QP returned by GETPOOLOFQUERIES for any GETENTAILMENTS function (see Example 8.6 for different specifications of the GETENTAILMENTS function). This is a necessary criterion for the interactive KB debugging system (Algorithm 5) to work in a sound way since it guarantees that the CALCQUERY function (line 16 in Algorithm 5) always returns a query w.r.t. the current set of leading diagnoses D and the given DPI. Note that the |D| queries generated and added to QP by ADDTRIVIALQUERIES can be trivially obtained without the consultation of a reasoning service by extraction of the respective formulas from the KB K, as prescribed by Proposition 7.5.

8.2 Discussion of Query Pool Generation

Multiple Equal Q-Partitions. In the general case there is more than one query w.r.t. one and the same q-partition. This holds if only because a minimized query is a set-minimal subset of an initially computed one, and multiple such subsets may exist.

Example 8.2 An example for such a query resulting in multiple minimized subqueries with identical q-partition can be found in Example 8.1.

However, note that GETPOOLOFQUERIES is designed to compute a pool QP that includes at most one query with one and the same q-partition. The idea behind this is (1) to minimize the calls to the expensive function MINQ and (2) that two queries with the same q-partition have exactly the same properties w.r.t. common query selection criteria such as maximum expected information gain or maximum worst case invalidation rate of diagnoses after the query answer is known. Such criteria have been shown to often lead to a reduction of debugging effort for the interacting user (cf. [SFFR12, RSFF13]). As the purpose of the computation of the pool of queries QP is to constitute an input to the query selection function that uses exactly such selection measures, the inclusion of only one query with a particular q-partition is reasonable, also (3) to minimize computation time of the query selection function which needs to go through all elements of QP in order to pick the “best” one in the worst case.

On the other hand, regarding the comprehensibility of the query, i.e. the cognitive load on the user when it comes to understanding the meaning of the query, two queries with the same q-partition may well be significantly different. This however is beyond the scope of this work and considered a topic for future research.

The following proposition gives evidence that the set QP returned by GETPOOLOFQUERIES is indeed duplicate-free w.r.t. the q-partitions in QP.

Proposition 8.2. Let a DPI ⟨K, B, P, N⟩R, a set of diagnoses D ⊆ mD⟨K,B,P,N⟩R and q ∈ ℕ ∪ {∞}, q ≥ 1 be the input to the function GETPOOLOFQUERIES. Then, the function GETPOOLOFQUERIES returns a set QP including tuples of the form ⟨Q, P(Q)⟩ where Q ∈ QD,⟨K,B,P,N⟩R is a query and P(Q) = ⟨D+(Q), D−(Q), D0(Q)⟩ is the q-partition of Q such that QP does not include any two equal queries and does not include any two equal q-partitions.

Proof. The test of the criterion ¬INCLQPART performed before the call to MINQ will always return false for the q-partition ⟨D+, D−, D0⟩ if ⟨D+, D−, D0⟩ is already included in a tuple in QP. Since MINQ is q-partition-preserving, no q-partition that does not occur in a tuple in QP can become equal to some q-partition in QP by a call to MINQ. Therefore, QP cannot include any two equal q-partitions. Since two equal queries have equal q-partitions, any two different q-partitions cannot be q-partitions of equal queries. Thus, QP cannot include any two equal queries either.

Note that, on account of the q-partition preserving property of MINQ, only such q-partitions are ruled out by the criterion in line 18 that would lead to duplicates at the time they should be added to QP in line 20.

Computation of Entailments. Generally, the (theoretical) number of entailments of a set of formulas is not finite. However, the entailments (of a certain type) returned by a reasoner are finite. For instance, asked for entailments of {A ⊑ B ⊓ C}, a reasoner performing the classification reasoning service would give back A ⊑ B and A ⊑ C, but not entailments like A ⊑ B ⊔ C or A ⊑ C ⊓ C ⊓ C. That is, when we speak of entailments, then we mean entailments in the practical sense (cf. Remark 2.3), i.e. w.r.t. a reasoning service such as classification for DL KBs which computes all and only subsumptions X ⊑ Y such that Y is the most specific concept that subsumes X, or forward-chaining for Datalog KBs which computes all and only atoms that are entailed by the KB.

Example 8.3 If we recall Example 8.1, we see that the number of computed entailments of K∗4 and K∗3 was 19 and 13 respectively, which are rather high numbers in the light of the small KBs, but importantly these numbers are necessarily finite. This is because there cannot be more than |Pred|^2 entailments of the ∀X p1(X) → p2(X) type and not more than |Pred| · |Const| entailments of the p(a) type for a KB whose signature includes the unary predicate symbols Pred and constant symbols Const and does not include any function symbols. In case of KB K∗3, for example, the set Pred = {a1, a2, m1, m2, m3, a, b} and Const = {u, w}, which means that upper bounds for the number of entailments of the first and second type are 49 and 14, respectively.
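The two upper bounds can be checked by simply enumerating the candidate entailments of each type. The snippet is a small illustrative sketch; the variable names are assumptions, and the strings merely stand in for formulas of the two types.

```python
from itertools import product

# Signature of K*_3 as listed in Example 8.3.
preds = ["a1", "a2", "m1", "m2", "m3", "a", "b"]
consts = ["u", "w"]

# Candidate entailments of the form "forall X p1(X) -> p2(X)".
implications = [f"forall X {p1}(X) -> {p2}(X)" for p1, p2 in product(preds, repeat=2)]

# Candidate entailments of the form "p(a)".
atoms = [f"{p}({c})" for p, c in product(preds, consts)]

print(len(implications), len(atoms))
```

The enumeration counts syntactically possible formulas only, including trivial ones such as p(X) → p(X), so the numbers are upper bounds and not the sizes of the sets a reasoner would actually return.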

Further, note that the number of existing different q-partitions and which q-partitions there are at all w.r.t. some set of leading diagnoses D and a DPI depends on the function GETENTAILMENTS, i.e. on the set of entailments calculated by it.

Example 8.4 Recall Example 8.1 where we constructed a query Q w.r.t. the set of all minimal diagnoses for the DPI given by Table 15.2. Assume now that only entailments of the first type, i.e. those of the form ∀X p1(X) → p2(X), and none of the second type p(a) are computed by GETENTAILMENTS and denote the set of entailments of this form of K∗i by E′Di. Then, Q′ = E′D3 ∩ E′D4 = {e1, . . . , e5} (cf. Table 8.2), i.e. a subset of the query Q computed for a GETENTAILMENTS function producing entailments of both types. The q-partition of Q′ is the same as the q-partition of Q, namely ⟨{D3, D4}, {D1, D2}, ∅⟩. However, the queries Qmin,1 and Qmin,2 are no longer obtained as minimized versions of Q′, unlike Qmin,3 and Qmin,4 which are subqueries of Q′, too.

Minimizing the Set D0 in Q-Partitions. Recall that D0 = ∅ is a desirable property of a q-partition since a query with such a q-partition may invalidate any leading diagnosis, depending on the answer to the query (cf. Chapter 7). In other words, no leading diagnosis is guaranteed to be still valid for any answer after the query is added as a test case to the DPI.

In general, GETPOOLOFQUERIES computes q-partitions where D0 may be a non-empty set. However, if the GETENTAILMENTS function is specified to compute certain explicit entailments of K, then D0 = ∅ can be guaranteed.

Definition 8.1 (Explicit Entailment). Let K be a KB. Then, α is an explicit entailment of K iff α ∈ K.

Now, if each set of entailments  EDcomputed by GETENTAILMENTS includes all the formulas that occur in some diagnosis in D, but do not occur in D, then GETPOOLOFQUERIES definitely returns a set QP of queries and associated q-partitions where  D0(Q) = ∅holds for each tuple in QP.

Proposition 8.3. Let ⟨K, B, P, N⟩R be a DPI and D ⊆ mD⟨K,B,P,N⟩R. If the set ED computed by GETENTAILMENTS meets ED ⊇ UD \ D for all D ∈ D, then GETPOOLOFQUERIES computes only queries Q with D0(Q) = ∅.

Proof. Assume that Q is some query computed by GETPOOLOFQUERIES. As MINQ is a q-partition preserving transformation of Q, we can assume w.l.o.g. that Q is a query computed by GETPOOLOFQUERIES before MINQ is called for Q. We have to show that an arbitrary diagnosis Di ∈ D is assigned either to D+(Q) or to D−(Q).

So, let us assume that there is a diagnosis Dk which is assigned to D0(Q) = D \ (D+(Q) ∪ D−(Q)) in line 17. Then, Q ⊈ EDk must hold and K∗k ∪ Q must not violate any x ∈ R ∪ N, as otherwise Dk would have already been assigned to D+(Q) in line 12 or to D−(Q) in line 14. But Q ⊈ EDk implies Q ⊈ UD \ Dk since EDk ⊇ UD \ Dk by precondition. This in turn means that there is some formula ax in Q which is not in UD \ Dk. Then ax ∈ Dk must hold, as otherwise for all formulas ax′ ∈ Q it would hold that ax′ is an entailment of K∗k = (K \ Dk) ∪ B ∪ UP, i.e. an entailment of all formulas in K ∪ B ∪ UP except for those in Dk. However, all entailments of K∗k are stored in EDk by the implementation of the function GETENTAILMENTS. Thus Q ⊆ EDk would hold, which cannot be the case as shown before. Consequently, we have derived that Q ∩ Dk ≠ ∅, which means by set-minimality of the diagnoses in D, in particular of Dk, that K∗k ∪ Q must violate some x ∈ R ∪ N. This is a contradiction to the assumption that Dk ∈ D0(Q).

Example 8.5 Let us come back to the example DPI given by Table 15.2. The possibility of a query Q constructed by Algorithm 4 with D0(Q) ≠ ∅ is witnessed by the selection of seed S = {D1} and the assumption that entailments of the two types given in Example 8.1 are produced by GETENTAILMENTS. The resulting set of entailments is Q = ED1 = {e4, e14, e15, ∀X m2(X) → d(X)} (for ei cf. Table 8.2). Then, D2 as well as D3 are assigned to D−(Q) as both KBs K∗2 ∪ Q and K∗3 ∪ Q entail m3(w) and ¬m3(w), wherefore they are both inconsistent and thus violate r1 ∈ R. However, D4 ∈ D0(Q) since K∗4 ⊭ ∀X m2(X) → d(X), i.e. K∗4 does not entail Q, and since K∗4 ∪ Q does not violate consistency or coherency (recall that the set of negative test cases is empty in the DPI and thus need not be considered), i.e. does not contain a conflict set.

Applying Proposition 8.3, we could use a modified GETENTAILMENTS function that returns a minimal set of entailments such that the precondition of the proposition is just met, i.e. E′D = UD \ D for all D ∈ D. With this function, for the seed S = {D1} we would get Q′ = E′D1 = {2, 3, 4, 5} (again, formulas in Table 15.2 are referred to just by their number). Let us now check whether D0(Q′) is indeed empty. As explicit entailments are stronger than non-explicit ones, we must still have that D2, D3 ∈ D−(Q′). For D4, we have K∗4 ∪ Q′ = {1, 3, 5, 6, 7, 8} ∪ {2, 3, 4, 5} = {1, 2, 3, 4, 5, 6, 7, 8}, which corresponds to the entire KB plus background knowledge of the given DPI and includes the conflict sets C1 = {1, 3, 4} and C2 = {1, 2, 3, 5}, wherefore it is inconsistent. Therefore, diagnosis D4 must also be an element of D−(Q′).
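For queries consisting of explicit entailments only, the check carried out in this example can be sketched with plain set operations. The sketch below rests on two simplifying assumptions that are not part of Algorithm 4: (i) K∗i entails Q whenever Q ⊆ K \ Di, and (ii) K∗i ∪ Q violates the requirements exactly if it contains one of the two minimal conflict sets. The background formulas 6, 7, 8 are left out because they occur in no conflict set. The names `q_partition` and `contains_conflict` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Example DPI (formulas referred to by number): KB, minimal conflict sets,
# and minimal diagnoses as stated in the text.
KB: FrozenSet[int] = frozenset({1, 2, 3, 4, 5})
CONFLICTS: List[FrozenSet[int]] = [frozenset({1, 3, 4}), frozenset({1, 2, 3, 5})]
DIAGNOSES: Dict[str, FrozenSet[int]] = {
    "D1": frozenset({1}),
    "D2": frozenset({3}),
    "D3": frozenset({4, 5}),
    "D4": frozenset({2, 4}),
}


def contains_conflict(formulas: FrozenSet[int]) -> bool:
    """Assumption (ii): invalid iff some minimal conflict set is included."""
    return any(conflict <= formulas for conflict in CONFLICTS)


def q_partition(query: FrozenSet[int]) -> Tuple[List[str], List[str], List[str]]:
    """Assign each diagnosis to D+, D- or D0 for an explicit-entailment query."""
    d_plus, d_minus, d_zero = [], [], []
    for name, diagnosis in DIAGNOSES.items():
        reduced_kb = KB - diagnosis          # K \ D_i (background omitted)
        if query <= reduced_kb:              # assumption (i): K*_i entails Q
            d_plus.append(name)
        elif contains_conflict(reduced_kb | query):
            d_minus.append(name)
        else:
            d_zero.append(name)
    return d_plus, d_minus, d_zero


# Q' = U_D \ D1 = {2, 3, 4, 5}, the query obtained for the seed S = {D1}.
print(q_partition(KB - DIAGNOSES["D1"]))
```

Under these assumptions, every diagnosis that shares a formula with the query ends up in D−, which is the mechanism used in the proof of Proposition 8.3.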

Please note that making the entailments Q = ED1 computed by the unmodified GETENTAILMENTS function only slightly stronger would already suffice to force D4 out of D0(Q). In fact, including ax4 := ∀X m2(X) → (∀Y s(X, Y) → a(Y)) ∧ d(X) in Q instead of ∀X m2(X) → d(X) would make Q non-disjoint with D4 as both comprise ax4. Consequently, in line with the proof of Proposition 8.3, K∗4 ∪ Q must include a conflict set ({1, 3, 4}), wherefore D4 ∈ D−(Q). Another point we want to mention is that an empty D0 could also be achieved by making the query slightly weaker. For our concrete query Q = ED1, this means that leaving out ∀X m2(X) → d(X) would lead to an empty D0(Q). However, the difference to the scenario above where we made Q slightly stronger is that D4 would be an element of D+(Q) instead of D−(Q) in this case, i.e. the q-partition would be ⟨{D1, D4}, {D2, D3}, ∅⟩.

A shortcoming of the strategy of making the query weaker is that it can be computationally expensive, as a potentially large number of subsets of Q might need to be considered and tested for fulfillment of D0(Q) = ∅. Each such test would involve calls to the reasoner, which are usually expensive. A second drawback is that there is no guarantee of finally ending up with an empty set D0(Q), since weakening of Q might also involve the “shift” of some diagnosis from D−(Q) to D0(Q). On the other hand, the strategy of computing stronger entailments is computationally more resource-saving as (trivially obtained) explicit entailments can be added to make the query stronger. Furthermore, making the query stronger – in a controlled way, by adding formulas from UD \ UD+(Q) to Q as suggested by Proposition 8.3 – can never lead to a non-empty D0(Q), as Proposition 8.3 substantiates.

(Non-)Completeness of Query Pool QP. Note that specifying q := ∞ causes GETPOOLOFQUERIES to run through all S ⊂ D and to compute a maximum number of queries. However, in general, not all theoretically possible queries are computed by GETPOOLOFQUERIES. One trivial reason for this is that only minimized, i.e. set-minimal, queries are contained in the returned set QP.

But also queries Q′ with D+(Q′) = Y ⊂ D will not be included in QP if there is some query Q with D+(Q) = Y such that |D−(Q)| > |D−(Q′)| (and, equivalently, |D0(Q)| < |D0(Q′)|). As we will learn in a moment, both mentioned reasons for the incompleteness of the output of GETPOOLOFQUERIES are in fact desirable for reasons of efficiency. That is, the mentioned types of queries that are not taken into account in QP are “non-preferred”: non-set-minimal queries demand an unnecessary amount of user interaction, and answering a query Q with an unnecessarily large set D0(Q) discriminates worse between leading minimal diagnoses (and, if these are “good” representatives of all minimal diagnoses, between all minimal diagnoses) than answering another query Q′ with |D0(Q′)| < |D0(Q)| and D+(Q) = D+(Q′).

Still, GETPOOLOFQUERIES meets a completeness criterion for a subset of all queries QD,⟨K,B,P,N⟩R, elements of which cannot be trivially detected to be “non-preferred”. That is, GETPOOLOFQUERIES is complete w.r.t. the set D+, as the following proposition states. In other words, for each subset X ⊂ D it detects a q-partition with D+ = X, if one exists.

Proposition 8.4. Let a DPI ⟨K, B, P, N⟩R, a set D ⊆ mD⟨K,B,P,N⟩R such that |D| ≥ 2 and some q ∈ ℕ ∪ {∞}, q ≥ 1, be the inputs to GETPOOLOFQUERIES and let |QPmax| ≥ 0 be the maximum number of tuples ⟨Q, P(Q)⟩ that can be computed by GETPOOLOFQUERIES by means of the used GETENTAILMENTS function. Further, let Y be an arbitrary subset of D. If there is some query Q ∈ QD,⟨K,B,P,N⟩R that (1) includes only entailments that are computed by GETENTAILMENTS and (2) has a q-partition such that D+(Q) = Y, then GETPOOLOFQUERIES with parameter q ≥ |QPmax| returns a set QP including a query Q′ with D+(Q′) = Y. Moreover, this query Q′ is found in the iteration where the seed S = Y.

Proof. Since q ≥ |QPmax|, GETPOOLOFQUERIES will arrive at a step where it selects the seed S = Y in line 6. Now, let us assume that in this iteration no query Q with D+(Q) = Y is found. Then, either (a) no query is found at all, i.e. CQ1, CQ2 or CQ3 is violated, or (b) a query Q with D+(Q) ≠ Y is found.

(a): Assume first that CQ1 is violated, i.e. GETCOMMONENTAILMENTS called with argument S returns ∅. This implies that the KBs K∗r for Dr ∈ Y have no common entailments, if entailments are computed by GETENTAILMENTS. This however means that there cannot be a q-partition with D+ ⊇ Y, which is a contradiction to the precondition that there is some query Q ∈ QD,⟨K,B,P,N⟩R that includes only entailments computed by GETENTAILMENTS and has a q-partition such that D+(Q) = Y.

Second, assume that CQ2 is violated, i.e. D+(Q) = ∅. If GETCOMMONENTAILMENTS with argument S returned Q ≠ ∅, then D+(Q) ⊇ S ⊃ ∅ would hold. Thus, Q = ∅, i.e. CQ1 is violated. So, as shown before, this leads to a contradiction.

In case either CQ1 or CQ2 is violated, we have already derived a contradiction. So, we assume that CQ1 and CQ2 are met. Finally, let us assume that CQ3 is violated, i.e. that D−(Q) = ∅. That is, if Q (which must be a non-empty set by CQ1) denotes all common entailments (computable with GETENTAILMENTS) of K∗r for Dr ∈ Y, then K∗i ∪ Q does not violate any x ∈ R ∪ N for any Di ∈ D \ S. Consequently, for all diagnoses Di in D we have that K∗i ∪ Q does not violate any x ∈ R ∪ N. But, as there is, by precondition, a query with D+ = Y, this query must be a subset of all possible common entailments (computable with GETENTAILMENTS) of the KBs K∗i for diagnoses in Y, i.e. this query must be a subset of Q. But, by monotonicity of L, no K∗i ∪ Q′ for a subset Q′ of Q can violate x ∈ R ∪ N if Q does not. Again, we have a contradiction to the precondition as above.

(b): Here, a query Q is found with D+(Q) ≠ Y and D−(Q) ≠ ∅. Since Q is a query, Q ≠ ∅ must hold. Since the seed S = Y, this means that Q is the set of all common entailments (computable with GETENTAILMENTS) of K∗i for Di ∈ Y, i.e. D+(Q) ⊇ Y. By D+(Q) ≠ Y, we conclude that D+(Q) ⊃ Y must be true. The only way of achieving a smaller set D+(Q), namely D+(Q) = Y, is to add some formulas to Q, as making Q smaller can only increase D+(Q). This holds because postulating that, instead of Q, only a subset Q′ of Q must be entailed by K∗i can cause a new KB K∗j for a diagnosis Dj ∉ D+(Q) to entail Q′. However, as Q is the set of all entailments computable with GETENTAILMENTS of the KBs K∗i for Di ∈ Y, a superset Q′′ of Q computed by GETENTAILMENTS with D+(Q′′) = Y can never be obtained. Therefore, we have a contradiction to the precondition.

We have now proven the following: If there exists a q-partition as described in the proposition, then this q-partition is found in the iteration where the seed S = Y .

Remark 8.2 Regarding Proposition 8.4, note the following:

(a) In fact, as one and the same q-partition must occur at most once in QP, GETPOOLOFQUERIES only needs to keep assigning diagnoses in D \ S to the respective sets of the q-partition as long as D+ = S, because for D+ = Z ⊃ S we know that a query (if one exists) is found for the seed S = Z.

(b) A statement equivalent to the proposition is: If there is no query (including only entailments computed by the GETENTAILMENTS function) with D+ = Y found for seed S = Y, then such a query and q-partition, respectively, does not exist.

The following proposition states that if a q-partition with one and the same set D+ is found twice during the execution of GETPOOLOFQUERIES, then the queries for both q-partitions and thus both q-partitions must be equal. That is, for one set D+, there is at most one tuple in QP.

Proposition 8.5. Let Qi be a query with D+(Qi) = Y in the set QP returned by GETPOOLOFQUERIES and found for seed Si = Y, and let Qj be a query with D+(Qj) = Y in the set QP returned by GETPOOLOFQUERIES and found for some seed Sj ⊂ Y. Then Qi = Qj.

Proof. Let Q′i, Q′j be the queries stored in the variable Q in line 18 for seeds Si and Sj, respectively, i.e. the supersets of the queries Qi, Qj before the minimization function MINQ is called for each of them. Q′j ⊆ Q′i holds by the fact that Q′i is the set of all common entailments computable with GETENTAILMENTS of K∗r for Dr ∈ Y and by the fact that Q′j must be a set of common entailments computed by GETENTAILMENTS of exactly these KBs, because of D+(Q′j) = Y and Definition 7.2. Q′j ⊇ Q′i holds by the fact that Q′j is computed as the intersection of EDr where Dr ∈ Sj and Q′i is computed as the intersection of EDs where Ds ∈ Si ⊃ Sj. Thus, we can conclude that Q′i = Q′j.

As Q′i = Q′j, also P(Q′i) = P(Q′j) must hold for the q-partitions by Proposition 7.2. That the minimized versions Qi, Qj of Q′i, Q′j output by MINQ are equal follows from the determinism of the MINQ function, wherefore equal inputs, i.e. (∅, Q′i, ∅, P(Q′i), ⟨K, B, P, N⟩R) = (∅, Q′j, ∅, P(Q′j), ⟨K, B, P, N⟩R), must yield equal outputs.

Remark 8.3 Proposition 8.5 hints at a possible improvement of Algorithm 4, namely to check in line 6 whether the seed S already occurs as a set D+ in some tuple in QP and only continue the execution for S if this does not hold (not shown in Algorithm 4). In this vein, time and reasoning costs (line 14) can be saved.

Another improvement regarding line 6 is to delete all remaining seeds S′ with the property S′ ⊃ S if Q in line 8 is the empty set (not shown in Algorithm 4). Namely, all such seeds S′ must also lead to Q = ∅ since the intersection of ED for D ∈ S already returned ∅, wherefore the intersection of ED for D ∈ S′ must also return ∅.
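The seed enumeration together with the two improvements of this remark can be sketched as follows. This is a simplified illustration rather than Algorithm 4 itself: query minimization (MINQ) and the parameter q are omitted, entailment sets are passed in precomputed, and the reasoner is replaced by a callback `violates` (a hypothetical name). The usage data assume explicit entailments ED = K \ D and a validity test based on the two minimal conflict sets of the example DPI.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Tuple

Formulas = FrozenSet[int]
Names = FrozenSet[str]


def get_pool_of_queries(
    entailments: Dict[str, Formulas],
    violates: Callable[[str, Formulas], bool],
) -> Dict[Names, Tuple[Formulas, Names, Names]]:
    """Map each found set D+ to its (query, D-, D0); MINQ is not applied."""
    names = sorted(entailments)
    pool: Dict[Names, Tuple[Formulas, Names, Names]] = {}
    dead_seeds: List[Names] = []  # seeds with empty common entailments
    for size in range(1, len(names)):            # all S with {} != S, S != D
        for seed in map(frozenset, combinations(names, size)):
            # Remark 8.3: skip seeds already present as some D+ in the pool
            # and supersets of seeds that yielded no common entailments.
            if seed in pool or any(dead <= seed for dead in dead_seeds):
                continue
            query = frozenset.intersection(*(entailments[d] for d in seed))
            if not query:
                dead_seeds.append(seed)
                continue
            d_plus = frozenset(d for d in names if query <= entailments[d])
            rest = [d for d in names if d not in d_plus]
            d_minus = frozenset(d for d in rest if violates(d, query))
            d_zero = frozenset(rest) - d_minus
            if d_minus and d_plus not in pool:   # D+ is non-empty by design
                pool[d_plus] = (query, d_minus, d_zero)
    return pool


# Usage on the example DPI, assuming explicit entailments E_D = K \ D.
KB = frozenset({1, 2, 3, 4, 5})
CONFLICTS = [frozenset({1, 3, 4}), frozenset({1, 2, 3, 5})]
DIAGNOSES = {"D1": frozenset({1}), "D2": frozenset({3}),
             "D3": frozenset({4, 5}), "D4": frozenset({2, 4})}
ENTAILMENTS = {name: KB - diag for name, diag in DIAGNOSES.items()}


def violates(name: str, query: Formulas) -> bool:
    """Stand-in for the reasoner: invalid iff a conflict set is included."""
    return any(c <= (KB - DIAGNOSES[name]) | query for c in CONFLICTS)


pool = get_pool_of_queries(ENTAILMENTS, violates)
print(len(pool))
```

Processing seeds in order of increasing size is a design choice of this sketch; it ensures that a seed with empty common entailments is seen before any of its supersets.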

By now, we know from Proposition 8.5 that, given that a query with D+ exists, one and only one q-partition with D+ will be added to QP, but which one?

W.r.t. one and the same set D+, queries with a set D− of higher cardinality are preferable over others as the cardinality of D0 should be minimized (cf. Chapter 7). So, preferable queries among those with equal set D+ are those for which D− is a set-maximal set. Exactly such a query is added to QP for each D+ for which a query exists, as the following proposition shows.

Proposition 8.6. If the set QP returned by GETPOOLOFQUERIES comprises a query Q with D+(Q) = Y, then Q is a query with minimal |D0(Q)| among all queries Q′ with D+(Q′) = Y computable with the function GETENTAILMENTS.

Proof. Assume that GETPOOLOFQUERIES finds a query Q with D+(Q) = Y and |D0(Q)| = k and assume there is a query Q′ (consisting only of entailments computed by the function GETENTAILMENTS) with D+(Q′) = Y and with |D0(Q′)| < k. This means that |D−(Q)| < |D−(Q′)|. However, as Q is computed for seed S = Y, Q is a maximal set of entailments computable with GETENTAILMENTS of K∗i for Di ∈ Y. Because Q′ is also a set of common entailments of K∗i for Di ∈ Y, we have that Q′ ⊆ Q must be true. Since the fact that K∗i ∪ Q does not violate any x ∈ R ∪ N, i.e. the fact that Di ∉ D−(Q), implies by monotonicity of L that K∗i ∪ Q′ for the subset Q′ of Q cannot violate any x ∈ R ∪ N either, i.e. Di ∉ D−(Q′), we conclude that |D−(Q′)| ≤ |D−(Q)| must hold. This is a contradiction.

8.3 Minimization of Queries

MINQ. The minimization of the query Q by MINQ (see Algorithm 4) while preserving the q-partition aims at simplifying the job of the answering user, who only needs to go through a smaller set of logical formulas Qmin in order to come up with an answer to the query. Since the q-partition reflects the properties of a query w.r.t. the invalidation of (leading) diagnoses, if two queries have equal such properties, then of course the one that is a subset of the other should be asked.

The concept of the function MINQ is similar to the one of QX (Algorithm 1). Like QX, MINQ carries out a divide-and-conquer strategy to find a set-minimal set with a monotonic property. In this case, the monotonic property is not the invalidity of a subset of the KB w.r.t. a DPI (as per Definition 3.3) as it is for the computation of minimal conflict sets using QX, but the property of some Qmin ⊂ Q having the same q-partition as Q. So, the crucial difference between QX and MINQ is the function that checks this monotonic property. For MINQ, this function – that checks a subset of a query for constant q-partition – is ISQPARTCONST.

MINQ – Input Parameters. MINQ gets five parameters as input. The first three, namely X, Q and QB, are relevant for the divide-and-conquer execution, whereas the last two, namely the original q-partition ⟨D+, D−, D0⟩ of the query (i.e. the parameter Q) that should be minimized, and the DPI ⟨K, B, P, N⟩R, are both needed as an input to the function ISQPARTCONST. Besides the latter two, another argument QB is passed to this function, where QB is a subset of the original query Q. ISQPARTCONST then checks whether the q-partition for the (potential) query QB is equal to the q-partition ⟨D+, D−, D0⟩ of the original query given as argument. The DPI is required as the parameters K, B, P, N and R are necessary for these checks.

MINQ – Testing Sub-Queries for Constant Q-Partition. In particular, ISQPARTCONST tests for each Dr ∈ D− whether K∗r ∪ QB is valid (w.r.t. ⟨·, ∅, ∅, N⟩R). If so, this means that Dr ∉ D−(QB) and thus that the q-partition of QB is different from the one of Q, wherefore false is immediately returned. If K∗r ∪ QB is invalid for all Dr ∈ D−, it is tested for each Dr ∈ D0 whether K∗r ⊨ QB. If so, this means that Dr ∉ D0(QB) and thus that the q-partition of QB is different from the one of Q, wherefore false is immediately returned. If false is not returned for any Dr ∈ D− or Dr ∈ D0, then the conclusion is that QB is a query w.r.t. D and ⟨K, B, P, N⟩R and has the same q-partition as Q, wherefore the function returns true.

Note that, instead of calling a reasoner to answer whether K∗r ⊨ QB, the set of precalculated entailments EDr of K∗r for each Dr ∈ D can be given as an argument to MINQ as well as to ISQPARTCONST (not shown in Algorithm 4). In this case an equivalent test is QB ⊆ EDr. Such a strategy is particularly appropriate if reasoning is expensive for the DPI at hand.
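The two-stage test just described, in the variant that uses precalculated entailment sets, may be sketched as follows. The validity check is abstracted into a callback `is_valid_with` (a hypothetical name standing in for the reasoner call on K∗r ∪ QB), and the toy data below, including the entailment labels and the `CLASHES` table, are invented purely for illustration and do not stem from the example DPI.

```python
from typing import Callable, Dict, FrozenSet, Iterable

Query = FrozenSet[str]


def is_qpart_const(
    qb: Query,
    d_minus: Iterable[str],
    d_zero: Iterable[str],
    entailments: Dict[str, Query],
    is_valid_with: Callable[[str, Query], bool],
) -> bool:
    """Return True iff QB leaves D- and D0 of the original query unchanged."""
    for d in d_minus:
        # K*_r together with QB is valid: D_r would drop out of D-.
        if is_valid_with(d, qb):
            return False
    for d in d_zero:
        # QB is a subset of E_Dr, i.e. K*_r entails QB: D_r would move to D+.
        if qb <= entailments[d]:
            return False
    return True


# Hypothetical toy data: precalculated entailments per diagnosis and, per
# diagnosis, the entailments whose addition makes K*_r invalid.
ENTAILMENTS = {
    "D1": frozenset({"e1", "e2", "e3"}),
    "D2": frozenset({"e1"}),
    "D3": frozenset({"e1"}),
    "D4": frozenset({"e1", "e2"}),
}
CLASHES = {"D2": frozenset({"e2"}), "D3": frozenset({"e2", "e3"})}


def toy_is_valid_with(name: str, qb: Query) -> bool:
    """Toy validity oracle: valid iff QB contains no clashing entailment."""
    return not (CLASHES.get(name, frozenset()) & qb)


# Original query Q = {e1, e2, e3} with D+ = {D1}, D- = {D2, D3}, D0 = {D4}.
for candidate in ({"e2", "e3"}, {"e2"}, {"e3"}):
    print(sorted(candidate),
          is_qpart_const(frozenset(candidate), ["D2", "D3"], ["D4"],
                         ENTAILMENTS, toy_is_valid_with))
```

In the toy data, the candidate {e2} fails the second stage (D4 would move to D+) and {e3} fails the first stage (D2 would leave D−).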

Soundness of ISQPARTCONST is proven by the following lemma.

Lemma 8.1. Let ⟨K, B, P, N⟩R be a DPI, D ⊆ mD⟨K,B,P,N⟩R, Q ∈ QD,⟨K,B,P,N⟩R with q-partition P(Q) = ⟨D+(Q), D−(Q), D0(Q)⟩. Then a non-empty set QB ⊂ Q is a query in QD,⟨K,B,P,N⟩R with P(QB) = P(Q) if

1. ∀Dr ∈ D−(Q): K∗r ∪ QB violates some r ∈ R or entails some n ∈ N, and

2. ∀Dr ∈ D0(Q): K∗r ⊭ QB.

Proof. Let Q ∈ QD,⟨K,B,P,N⟩R and QB be an arbitrary proper subset of Q. If criterion 1) of this lemma is met, then we know that each diagnosis in D−(Q) is in D−(QB) as well, i.e. (I): D−(QB) ⊇ D−(Q) holds.

Assume a minimal diagnosis Dr ∈ D0(Q). Then, K∗r ∪ Q does not violate any r ∈ R and does not entail any n ∈ N, and K∗r does not entail Q. This however implies, by monotonicity of L, that K∗r ∪ QB cannot violate any r ∈ R and cannot entail any n ∈ N either. But it is possible that K∗r ⊨ QB. So, validity of criterion 2) of this lemma is sufficient to guarantee that each diagnosis in D0(Q) is in D0(QB) as well, i.e. (II): D0(QB) ⊇ D0(Q) holds.

As all diagnoses in D+(Q) entail all formulas in Q by Definition 7.2, all diagnoses in D+(Q) must entail QB as well. Consequently, due to deletion of some formulas from Q, no Dr ∈ D+(Q) can “move” to any set D−(QB) or D0(QB). That is, (III): D+(QB) ⊇ D+(Q) must hold.

So, the overall conclusion is that, if criteria 1) and 2) are met, then (I), (II) and (III) hold. Assume that some ⊇-relation in i ∈ {(I), (II), (III)} is a ⊃-relation. This leads to a violation of some j ∈ {(I), (II), (III)} with j ≠ i since ⟨D+(Q), D−(Q), D0(Q)⟩ and ⟨D+(QB), D−(QB), D0(QB)⟩ are partitions of D. Therefore, all ⊇-relations must be =-relations and we can derive that P(Q) = P(QB).

Moreover, we have that QB must be a query. This is due to the facts that QB is non-empty, Q is a query and the q-partitions of Q and QB are equal. Therefore, |D+(QB)| = |D+(Q)| ≥ 1 and |D−(QB)| = |D−(Q)| ≥ 1, which lets us conclude by Proposition 7.4 that QB is a query.

MINQ – The Divide-and-Conquer Strategy. Intuitively, MINQ partitions the given query Q into two parts Q1 and Q2 and first analyzes Q2 while Q1 is part of QB (line 34). Note that in each iteration QB is the subset of Q that is currently assumed to be part of the sought minimized query (i.e. the one query that will finally be output by MINQ). In other words, analysis of Q2 while Q1 is part of QB means that all irrelevant formulas in Q2 should be located and removed from Q2, resulting in Qmin2 ⊆ Q2. That is, Qmin2 must include only relevant formulas, which means that Qmin2 along with QB is a query with the same q-partition as Q, but the deletion of any further formula from Qmin2 changes the q-partition.

After the relevant subset Qmin2 of Q2, i.e. the subset that is part of the minimized query, has been returned, Q1 is removed from QB, Qmin2 is added to QB and Q1 is analyzed for a relevant subset that is part of the minimized query (line 35). This relevant subset, Qmin1, together with Qmin2, then builds a set-minimal subset of the input Q that is a query and has a q-partition equal to that of Q. Note that the argument X of MINQ is the subset of Q that has most recently been added to QB.

For each call in line 34 or line 35, the input Q to MINQ is recursively analyzed until a trivial case arises, i.e. (a) until Q is identified to be irrelevant for the computed minimized query, wherefore ∅ is returned (lines 27 and 28), or (b) until |Q| = 1 and Q is not irrelevant for the computed minimized query, wherefore Q is returned (lines 29 and 30).
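The recursion described above can be sketched in a few lines. This is an illustrative reconstruction from the textual description, not a transcription of Algorithm 4: the q-partition and the DPI are hidden inside a predicate `is_qpart_const` passed as a callable, queries are lists to keep the split deterministic, and SPLIT is fixed to halving. The usage predicate assumes explicit entailments and a conflict-set based validity test for the example DPI, where the original query {2, 3, 4, 5} has D− = {D2, D3, D4} and an empty D0.

```python
from typing import Callable, List

Predicate = Callable[[List[int]], bool]


def min_q(x: List[int], q: List[int], qb: List[int],
          is_qpart_const: Predicate) -> List[int]:
    """Return a subset of q that, together with qb, keeps the q-partition."""
    # Trivial case (a): qb alone already suffices, so q is irrelevant.
    if x and is_qpart_const(qb):
        return []
    # Trivial case (b): a single formula that cannot be dropped.
    if len(q) == 1:
        return list(q)
    k = len(q) // 2                       # SPLIT(n) = floor(n / 2)
    q1, q2 = q[:k], q[k:]
    # First analyze q2 while q1 is part of qb ...
    q2_min = min_q(q1, q2, qb + q1, is_qpart_const)
    # ... then analyze q1 while the relevant part of q2 is part of qb.
    q1_min = min_q(q2_min, q1, qb + q2_min, is_qpart_const)
    return q1_min + q2_min


# Usage on the example DPI (assumptions: explicit entailments only; a set of
# formulas is invalid iff it includes a minimal conflict set).
KB = frozenset({1, 2, 3, 4, 5})
CONFLICTS = [frozenset({1, 3, 4}), frozenset({1, 2, 3, 5})]
D_MINUS = [frozenset({3}), frozenset({4, 5}), frozenset({2, 4})]  # D2, D3, D4


def keeps_partition(qb: List[int]) -> bool:
    """Every diagnosis in D- must still be invalidated by qb (D0 is empty)."""
    return all(
        any(c <= (KB - diag) | frozenset(qb) for c in CONFLICTS)
        for diag in D_MINUS
    )


q_min = min_q([], [2, 3, 4, 5], [], keeps_partition)
print(sorted(q_min))
```

The result depends on the order of the formulas in the input list; a different order may yield a different set-minimal query with the same q-partition.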

Example 8.6 Let us reconsider the FOL DPI depicted by Table 15.2 on page 270. We recall that the sets of minimal conflict sets and minimal diagnoses w.r.t. this DPI were given by mC⟨K,B,P,N⟩R = {C1, C2} = {⟨1, 3, 4⟩, ⟨1, 2, 3, 5⟩} as well as mD⟨K,B,P,N⟩R = {D1, D2, D3, D4} = {[1], [3], [4, 5], [2, 4]}. For this DPI, a set of minimized queries computed by GETPOOLOFQUERIES is presented by Table 8.3. Note that these queries have been produced by different GETENTAILMENTS functions (as indicated by the dashed lines in Table 8.3). That is, Qi for i ∈ {1, …, 5} have been produced by the same GETENTAILMENTS function that is described in Example 8.1. For i ∈ {6, …, 9}, Qi has been computed from a GETENTAILMENTS function that outputs only explicit entailments (cf. Definition 8.1) and Q10 from a GETENTAILMENTS function that returns a finite set of entailments where each entailment is some FOL formula. This could be accomplished, for example, by some resolution-based reasoning procedure [CL73].

It is important to realize that the results regarding Algorithm 4 established so far, most of which depend on the particular GETENTAILMENTS function used, need only hold within one part of Table 8.3 (where different parts are separated by the dashed lines). For example, for Q2 and Q9 it holds that D+(Q2) = D+(Q9), but D−(Q2) ≠ D−(Q9) and D0(Q2) ≠ D0(Q9). By application of one and the same GETENTAILMENTS function, this case would be prohibited by Proposition 8.5. Furthermore, by Proposition 8.6, only Q9 would be an element of the query pool QP in this case since D0(Q9) ⊂ D0(Q2).

Moreover, we want to remark that Q7, Q8 and Q9 can be seen as a proof that Q6 is indeed set-minimal. Each Qi, i ∈ {7, 8, 9}, is a result of the removal of a single formula from Q6. And each such Qi features a q-partition different from the one of Q6. This illustrates quite well the principle of MINQ, which performs tests of exactly this kind to verify minimality of a query or to detect formulas that might be deleted from it under preservation of the q-partition, respectively.

Another essential note is that it is guaranteed that D0(Q6) = ∅. This holds due to the construction of Q6 as UD \ D4 = {1, 2, 3, 4, 5} \ [2, 4] = {1, 3, 5} (recall that we use square brackets to denote diagnoses in spite of the fact that these are sets, cf. Table 2.1). So, Q6 comprises all formulas occurring in minimal diagnoses except for the ones contained in D4. We have that for any two different minimal diagnoses Di, Dj w.r.t. one and the same DPI it must be true that Di \ Dj ≠ ∅ as well as Dj \ Di ≠ ∅, as otherwise one would necessarily be a subset of the other. From this, we can easily derive that K∗i ∪ Q6 for i ∈ {1, …, 3}, i.e. for all minimal diagnoses Di w.r.t. this DPI other than D4 which was used to build the query Q6, must comprise a conflict set. This must be valid by the minimality of Di and since by Q6 at least one formula of Di is readded to the KB. Note that a similar argumentation was used in the proof of Proposition 8.3.

Table 8.3: Some queries and associated q-partitions for the DPI given by Table 15.2.

8.4 Soundness of Query Minimization

The following lemma shows that the function ISQPARTCONST used by MINQ is indeed a monotonic function (cf. Definition 4.6), which is a necessary prerequisite for versions of the QX algorithm to work in a sound way.

Lemma 8.2. Let ⟨K, B, P, N⟩R be a DPI, D ⊆ mD⟨K,B,P,N⟩R, Q ∈ QD,⟨K,B,P,N⟩R with q-partition P(Q). Further, let f : 2^Q → {0, 1} be a function that maps a subset QB of Q to 1 if QB has q-partition P(QB) = P(Q), and to 0 otherwise. Then, f is a monotonic function (as per Definition 4.6).

Proof. Assume a subset Q′ of Q with f(Q′) = 1, i.e. Q′ has q-partition P(Q′) = P(Q). Let Q′ ⊂ Q′′ ⊆ Q and assume that f(Q′′) = 0, i.e. Q′′ has a q-partition P(Q′′) ≠ P(Q).

As shown in the proof of Lemma 8.1, D+(X1) ⊇ D+(X2) holds for any X1 ⊆ X2. Therefore, we have D+(Q′) ⊇ D+(Q′′) ⊇ D+(Q) and by P(Q′) = P(Q) that D+(Q′) = D+(Q) and thus that all ⊇-relations are =-relations. So, either D−(Q′′) ≠ D−(Q) or D0(Q′′) ≠ D0(Q) must hold.

First, assume that D−(Q′′) ≠ D−(Q). Then, as K∗r ∪ Q′′ ⊆ K∗r ∪ Q and by monotonicity of L, it can only be the case that for some Dr ∈ D some x ∈ R ∪ N that is violated for K∗r ∪ Q is not violated for K∗r ∪ Q′′. Hence, D−(Q′′) ⊂ D−(Q) must hold. By a similar argumentation – without the assumption that D−(Q′) ≠ D−(Q′′) holds – we have that D−(Q′) ⊆ D−(Q′′) and thus, altogether, that D−(Q′) ⊂ D−(Q) must be true. Due to P(Q′) = P(Q) we know that D−(Q′) = D−(Q), which is a contradiction.

Finally, assume that D0(Q′′) ≠ D0(Q). Since K∗r ∪ Q does not violate any x ∈ R ∪ N for Dr ∈ D0(Q), K∗r ∪ Q′′ cannot violate any x ∈ R ∪ N by monotonicity of L. As a conclusion, the only possibility for D0(Q′′) ≠ D0(Q) is that K∗r ⊨ Q′′ for some Dr ∈ D0(Q), i.e. that Dr ∈ D+(Q′′), which implies that D0(Q′′) ⊂ D0(Q). By a similar argumentation – without the assumption that D0(Q′) ≠ D0(Q′′) holds – we have that D0(Q′) ⊆ D0(Q′′) and thus, altogether, that D0(Q′) ⊂ D0(Q) must be true. Due to P(Q′) = P(Q) we know that D0(Q′) = D0(Q), which is a contradiction.

This completes the proof for monotonicity of the given function f.

Proposition 8.7 (Correctness of MINQ). Given a query Q ∈ QD,⟨K,B,P,N⟩R as input, MINQ computes a subset Qmin ⊆ Q such that P(Qmin) = P(Q) and there is no Q′ ⊂ Qmin such that P(Q′) = P(Q).

Proof. This proposition is a consequence of the correctness of QX shown by Proposition 4.9, of the correctness of function ISQPARTCONST established by Lemma 8.1 and of the monotonicity of the property tested by the function ISQPARTCONST guaranteed by Lemma 8.2.

8.5 Complexity of Query Pool Generation

The complexity of query minimization, i.e. one call to MINQ, in terms of calls to the ISQPARTCONST function is directly obtained from the complexity results for the standard QX algorithm given by Proposition 4.8.

Proposition 8.8 (Complexity of MINQ). Let ⟨K, B, P, N⟩R be a DPI, D ⊆ mD⟨K,B,P,N⟩R, Q ∈ QD,⟨K,B,P,N⟩R with P(Q) = ⟨D+(Q), D−(Q), D0(Q)⟩ and the function SPLIT (line 31 of Algorithm 4) be defined as SPLIT(n) = ⌊n/2⌋ where n is a natural number. Then, the worst case number of calls to ISQPARTCONST during one call to MINQ(∅, Q, ∅, P(Q), ⟨K, B, P, N⟩R) is in

image

where Qmin is the output of MINQ(∅, Q, ∅, P(Q), ⟨K, B, P, N⟩R). For any other definition of the function SPLIT, the worst case number of calls to ISQPARTCONST gets larger.

The overall complexity of GETPOOLOFQUERIES in terms of calls to functions that call the reasoner, i.e. functions GETENTAILMENTS, ISKBVALID and ISQPARTCONST, is established by the following proposition.

Proposition 8.9 (Complexity of GETPOOLOFQUERIES). Let ⟨K, B, P, N⟩R be a DPI, q a natural number and D ⊆ mD⟨K,B,P,N⟩R. Then, the worst case number of calls to functions that call a reasoner during one call to GETPOOLOFQUERIES(⟨K, B, P, N⟩R, D, q) is in

image

where |Q^(max)| is the maximum size of a query before minimization, i.e. the size of the set of maximum cardinality that is stored in variable Q in line 19 throughout all iterations, and |Q^(max)_min| is the maximum size of a minimized query, i.e. the size of the set of maximum cardinality that is stored in variable Q′ in line 19 throughout all iterations.

Proof. During the execution of the for-loop over lines 3-5 the function GETENTAILMENTS is called |D| times. During each execution of the for-loop over lines 6-22, which may be executed at most 2^|D| − 2 times, ISKBVALID is called at most |D| − 1 times since |S| ≥ 1 and S ⊂ D and thus |D \ S| ≤ |D| − 1 holds; furthermore, MINQ may be called once, namely if the condition tested by the if-statement in line 18 is true. During one execution of MINQ, by Proposition 8.8, at most

image

calls to ISQPARTCONST are made where Qmin is the output of the call to MINQ. So, an upper bound of the number of calls to ISQPARTCONST performed by one call to MINQ among all calls to MINQ throughout the execution of GETPOOLOFQUERIES is

image

where |Q^(max)_min| is the cardinality of the largest set that is stored in variable Q′ in line 19 throughout all iterations and |Q^(max)| is the cardinality of the largest set that is stored in variable Q in line 19 throughout all iterations.

So, all in all we know that functions that call a reasoner are invoked at most

image

is an upper bound of this number, the proposition holds.
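The counting argument of this proof can be replayed numerically. The sketch below only combines the quantities named in the proof, namely |D| calls to GETENTAILMENTS and at most 2^|D| − 2 seed iterations with at most |D| − 1 calls to ISKBVALID and one call to MINQ each. The per-call bound for MINQ from Proposition 8.8 is passed in as a plain number; `minq_calls` and the function name are hypothetical.

```python
def worst_case_reasoner_calls(num_diagnoses: int, minq_calls: int) -> int:
    """Upper bound on reasoner-calling function invocations, per the proof.

    num_diagnoses -- |D|, the number of leading diagnoses (at least 2)
    minq_calls    -- assumed upper bound on ISQPARTCONST calls per MINQ call
    """
    if num_diagnoses < 2:
        raise ValueError("at least two leading diagnoses are required")
    get_entailments_calls = num_diagnoses            # for-loop over lines 3-5
    max_seed_iterations = 2 ** num_diagnoses - 2     # for-loop over lines 6-22
    calls_per_seed = (num_diagnoses - 1) + minq_calls
    return get_entailments_calls + max_seed_iterations * calls_per_seed


for n in (2, 4, 8):
    print(n, worst_case_reasoner_calls(n, minq_calls=10))
```

The factor 2^|D| − 2 dominates the count as |D| grows, whereas the per-call MINQ bound enters only as a multiplier of this factor.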

Note that none of the parameters that affect the complexity of the function GETPOOLOFQUERIES grows with the size of the DPI provided as an input to the interactive KB debugging problem. Merely the costs for reasoning, on which a black-box debugging approach has no influence, are affected by a higher complexity or larger size of the input DPI. Moreover, the size of the most relevant parameter influencing the worst case complexity, namely the exponent |D|, can be specified by the user to be any value greater than or equal to 2. In other words, disregarding reasoning time, the generation of a pool of queries is a fixed-parameter tractable problem [DF95] in the context of interactive KB debugging.

8.6 Shortcomings of Query Pool Generation

First, the exponential time complexity regarding the parameter |D| is a problem arising from the paradigm of computing an optimal query w.r.t. a certain quantitative measure qsm() such as information gain [SFFR12, RSFF13] by calculating a (generally exponentially large) pool QP of queries in a first stage, whereupon qsm(Q) ∈ ℝ is evaluated for Q ∈ QP until the one Q∗ with optimal qsm(Q∗) is found and selected as the query to be asked to the user.

A key to solving this issue is the use of a different paradigm that does not rely on the computation of the pool QP. Instead, qualitative measures can be derived from quantitative measures that have been used in interactive debugging scenarios [SFFR12, RSFF13, SF10]. These qualitative measures provide a way to estimate the qsm() value of partial q-partitions, i.e. ones where not all leading diagnoses have been assigned to the respective set in the q-partition yet. That way a direct search for a query with (nearly) optimal properties is possible. A similar strategy called CKK has been employed in [SFFR12] for the information gain measure (see Section 9.3). From such a technique we can expect to save a high number of reasoner calls, because only a usually small subset of the q-partitions included in the pool computed by GETPOOLOFQUERIES is required to find a query with desirable properties if the search is implemented by means of a heuristic that explores seemingly favorable (potential) queries and (partial) q-partitions, respectively, first. This is a topic of future work.

Another shortcoming of GETPOOLOFQUERIES is the extensive use of reasoning services, which may be computationally expensive (depending on the given DPI). Instead of computing a set of common entailments Q of a set of KBs K∗i first and consulting a reasoner to fill up the (q-)partition for Q in order to test whether Q is a query at all, the idea enabling a significant reduction of reasoner dependence is to compute some kind of canonical query without a reasoner and use simple set comparisons to decide whether the associated partition is a q-partition. Guided by the qualitative properties mentioned before, a search for such a q-partition with desirable properties can be accomplished without reasoning at all. Also, a set-minimal version of the optimal canonical query can be computed without reasoning aid. Only for the optional enrichment of the identified optimal canonical query by additional entailments and for the subsequent minimization of the enriched query may the reasoner be employed. This is also a topic of future work.

Another aspect that can be improved is that only one minimized version of each query is computed by Algorithm 4. That is, per q-partition P, there might be some set-minimal queries which do not occur in the output set QP. From the point of view of how well a query might be understood by an interacting user, of course not all minimized queries can be assumed equally good in general. Hence, in order to avoid a situation where a potentially best-understood query w.r.t. P is not included in QP, the query minimization process (see Section 8.3) might be adapted to take into account some information about faults the interacting user is prone to. This could be exploited to estimate how well this user might be able to understand and answer a query. For instance, given that the user frequently has problems applying ∃ correctly to express what they intend to express, but has never made any mistakes in formulating implications →, the query Q1 = {∀X p(X) → q(X), r(a)} might be better comprehended than Q2 = {∀X∃Y s(X, Y)}. One way to find a well-understood query for some q-partition P is to run the query minimization MINQ more than once, each time with a modified input (using a hitting set tree to accomplish this in a systematic manner; cf. Chapter 4, where an analogous idea is used to compute different minimal conflict sets w.r.t. a DPI). In this way, different set-minimal queries for P can be identified and the process can be stopped when a suitable query is found.

8.7 Correctness of Query Pool Generation

The following proposition confirms the correctness of Algorithm 4, i.e. of the function GETPOOLOFQUERIES. Roughly, it states that the output QP of the function is duplicate-free, i.e. no query or q-partition occurs twice in QP, that QP includes only queries and q-partitions, that tuples in QP are unique w.r.t. the set D+ of a q-partition and that, given q > |QP|, there is no subset Y of D for which a q-partition with D+ = Y exists and for which no q-partition with D+ = Y is an element of QP.

Proposition 8.10. Let a DPI ⟨K, B, P, N⟩R, D ⊆ mD⟨K,B,P,N⟩R such that |D| ≥ 2 and some q ∈ N ∪ {∞}, q ≥ 1 be the inputs to GETPOOLOFQUERIES and let |QPmax| ≥ 0 be the maximum number of tuples ⟨Q, P(Q)⟩ that can be computed by GETPOOLOFQUERIES by means of the used GETENTAILMENTS function. If q ≥ |QPmax| (in particular q = ∞), then

1. there are no two tuples ⟨Q, P(Q)⟩, ⟨Q′, P(Q′)⟩ in QP such that Q = Q′ or P(Q) = P(Q′), and

2. QP includes a tuple ⟨Q, ⟨D+(Q), D−(Q), D0(Q)⟩⟩ only if Q ∈ QD,⟨K,B,P,N⟩R, and

3. QP includes at most one tuple where D+(Q) = Y for each Y ⊂ D, and

4. for each Y ⊂ D for which a query Q w.r.t. D and ⟨K, B, P, N⟩R exists such that

image

If  q < |QPmax|, then QP includes q tuples satisfying (1), (2) and (3).

Proof. Statement (1) is a consequence of Proposition 8.2. Statement (2) is an implication of Proposition 8.1 and Proposition 8.7. The former says that only sets Q that are actually queries w.r.t. D and ⟨K, B, P, N⟩R can pass line 18. Thus, only queries are passed to MINQ as parameter Q. By the latter, which states that MINQ is correct, i.e. outputs a query if the input is a query, statement (2) follows. Statement (3) follows from Proposition 8.5. If q ≥ |QPmax|, the truth of statement (4) is witnessed by Proposition 8.4. Statement (5) is true by lines 23 and 24 and by Proposition 7.5 as well as Corollary 7.4 and the premise that |D| ≥ 2, which guarantee that the function ADDTRIVIALQUERIES always adds at least |D| ≥ 2 > 0 queries to QP. In case q < |QPmax|, only statements (1), (2) and (3) are satisfied in general (for the same reasons as given above for the case q ≥ |QPmax|) and QP is returned in line 22 by the definition of |QPmax|. Thence, the condition |QP| = q ≥ 1 tested in line 21 must be valid for QP.
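
The uniqueness properties (1) and (3) of Proposition 8.10 can be checked mechanically on any computed pool. The following is only a minimal sketch under assumptions of ours: queries are modeled as frozen sets of formula strings, a q-partition as a triple (D+, D−, D0) of frozen sets of diagnosis names, and the function name pool_is_duplicate_free is hypothetical. It does not test statement (2), which would require a reasoner.

```python
from typing import FrozenSet, List, Tuple

Query = FrozenSet[str]
# A q-partition is modeled as the triple (D+, D-, D0) of sets of diagnosis names.
QPartition = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]


def pool_is_duplicate_free(pool: List[Tuple[Query, QPartition]]) -> bool:
    """Check statements (1) and (3): no repeated Q, no repeated P(Q), no repeated D+."""
    queries = [q for q, _ in pool]
    partitions = [p for _, p in pool]
    d_plus_sets = [p[0] for _, p in pool]
    return (
        len(set(queries)) == len(queries)
        and len(set(partitions)) == len(partitions)
        and len(set(d_plus_sets)) == len(d_plus_sets)
    )
```

A pool in which two different queries share the same set D+ is rejected by this check even if their q-partitions differ in D− and D0.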

Algorithm 5 Interactive KB Debugging

Input: a tuple ⟨⟨K, B, P, N⟩R, nmin, nmax, t, p �K∪K, q, qsm(), σ, mode⟩ consisting of

an admissible DPI  ⟨K, B, P, N ⟩R,

leading diagnoses computation parameters, natural numbers  nmin ≥ 2, nmax, t,

a function  p �K∪K : �K ∪ K → (0, 1],

a parameter q ∈ N ∪ {∞}, q ≥ 1 that determines the size of the computed query pool,

a function qsm(Q) ∈ R used for query selection that assigns a real number to a query Q to express the “goodness” of Q,

a maximum fault tolerance σ ∈ [0, 1] and

a mode mode ∈ {static, dynamic} that determines the used method for diagnosis computation.

image

an approximation of the solution to Interactive Static KB Debugging (Problem Def. 6.2) if σ > 0.

the (exact) solution to Interactive Static KB Debugging if σ = 0.

image

an approximation of the solution to Interactive Dynamic KB Debugging (Problem Def. 6.1) if σ > 0.

the (exact) solution to Interactive Dynamic KB Debugging if σ = 0.

image

10: ⟨D✓, Q, Ccalc, D×, D⊃, Qdup⟩ ← DYNAMICHS(⟨K, B, P, N⟩R, Q, Qdup, t, nmin, nmax, Ccalc, D✓, D×, pK(), P′, N′, D⊃)

11: pD() ← GETPROBDIST(D✓, pK(), ⟨K, B, P, N⟩R, QA) ▷ see Algorithm 6

12: Dmax ← GETMODE(D✓, pD())

13: if pD(Dmax) ≥ 1 − σ then ▷ stop criterion

14: return GETSOLKB(Dmax, ⟨K, B, P ∪ P′, N ∪ N′⟩R, P′, mode) ▷ return solution KB

15: else

16: ⟨Q, P(Q)⟩ ← CALCQUERY(D✓, qData, pD(), p �K∪K(), qsm(),

image

18: QA  ← APPEND(⟨Q, answer⟩ , QA)

image

26: N ′ ← N ′ ∪ {Q}

image

Knowledge Base Debugging

In this chapter we will give a description of an algorithm for interactive KB debugging (Algorithm 5) which implements the entire functionality required by an interactive debugging system. All other algorithms presented so far will be subroutines of Algorithm 5 which are either directly or indirectly called by it. Before we explain and discuss Algorithm 5 in detail, we give the reader a rough and informal overview of the algorithm’s input, output and actions in the following section in order to make the details of the algorithm easier to digest.

Remark 9.1 In the following, when we speak of the input DPI we refer to the DPI ⟨K, B, P, N⟩R that is provided as an input to Algorithm 5. By the current DPI we mean the DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R where P′ and N′, respectively, are all positive and negative test cases added to the input DPI from the start of the algorithm’s execution until the current point in time. Further, an intermediate (or previous) DPI denotes a DPI ⟨K, B, P ∪ P′′, N ∪ N′′⟩R which is not the current DPI and where ∅ ⊆ P′′ ⊆ P′ and ∅ ⊆ N′′ ⊆ N′. Finally, the last-but-one DPI corresponds to an intermediate DPI ⟨K, B, P ∪ P′′, N ∪ N′′⟩R where either |P′| = |P′′| + 1 or |N′| = |N′′| + 1 is true, but not both.

9.1 Interactive Debugging Algorithm: Overview

image

fault probabilities of syntactical elements occurring in the KB,

a minimal and desired number of leading diagnoses,

a desired maximum reaction time (time between two successive queries presented to the user),

a maximum fault tolerance (roughly, the probability of being presented a non-desired solution KB as output),

a measure for query selection (determines which query is the best query within a given set of queries),

a parameter that determines the size of the computed pool of queries in each iteration and

a parameter specifying the way the hitting set tree for computation of leading diagnoses is constructed and updated.

Output:

A solution KB such that the diagnosis used to formulate the solution KB has a probability (w.r.t. the current leading diagnoses) greater than or equal to 1 minus the given maximum fault tolerance.

Procedure:

1. Initialization: Compute the fault probability of each formula in the KB by means of the given fault probabilities.

2. Leading Diagnoses Computation: Use a hitting set tree, constructed and updated in the manner specified in the input and coupled with QX, to calculate a set of leading diagnoses. The cardinality and computation time of this set are determined by the corresponding input parameters specifying the minimal and desired number of leading diagnoses and the desired reaction time.

3. Probability Update and Stop Criterion: Use the formula fault probabilities and the new information obtained by already specified test cases (answered queries) to compute updated (posterior) probabilities of the current leading diagnoses. If one diagnosis probability is greater than or equal to 1 minus the maximum fault tolerance, return the solution KB obtained by deletion of this diagnosis from the KB and subsequent addition of the union of all positive test cases.

4. Query Generation and Selection: Use the set of leading diagnoses (and possibly their fault probabilities) to generate a pool of queries, the size of which depends on the respective parameter provided as input. Given the pool of queries, select the best query according to the given query selection measure.

5. User Interaction and Incorporation of New Information: Ask the user the selected query and add it to the positive test cases in case of a positive answer and to the negative test cases otherwise.

6. Hitting Set Tree Update: Update the hitting set tree based on the new information given by the classification of the test case resulting from the query answer. In particular, this involves the deletion of all those minimal diagnoses that conflict with the new test case.

7. Repeat from Step 2.
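
The control flow of Steps 2 to 7 can be illustrated by a deliberately simplified toy loop. The sketch below is not Algorithm 5: it uses no reasoner, no hitting set tree and no query pool. It assumes (our simplifications) that all diagnoses are known up front with prior probabilities, that a query is represented only by the set D+ of diagnoses predicting a positive answer (so D0 = ∅), and that the user answers consistently with one of the diagnoses. The names debug_session and ask_user are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple


def debug_session(
    priors: Dict[str, float],
    sigma: float,
    ask_user: Callable[[FrozenSet[str]], bool],
) -> Tuple[str, List[Tuple[FrozenSet[str], bool]]]:
    """Toy loop: returns the accepted diagnosis and the list of answered queries."""
    remaining = dict(priors)  # stands in for the leading diagnoses (Step 2)
    answered: List[Tuple[FrozenSet[str], bool]] = []
    while True:
        # Step 3: renormalize over the diagnoses not yet invalidated.
        total = sum(remaining.values())
        posterior = {d: p / total for d, p in remaining.items()}
        d_max = max(posterior, key=posterior.get)
        if posterior[d_max] >= 1 - sigma:  # stop criterion
            return d_max, answered
        # Step 4: candidate queries are prefix splits of the ranked diagnoses;
        # pick the split whose D+ mass is closest to 0.5 (an assumed measure).
        ranked = sorted(posterior, key=posterior.get, reverse=True)
        candidates = [frozenset(ranked[:k]) for k in range(1, len(ranked))]
        d_plus = min(candidates, key=lambda s: abs(sum(posterior[d] for d in s) - 0.5))
        # Step 5: ask the user; True means the query becomes a positive test case.
        answer = ask_user(d_plus)
        answered.append((d_plus, answer))
        # Step 6: keep only the diagnoses that agree with the answer.
        remaining = {d: p for d, p in remaining.items() if (d in d_plus) == answer}
```

With σ = 0 the loop stops only when a single diagnosis is left, whereas a larger σ may stop it before any query is asked.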

9.2 Interactive Debugging Algorithm: Detailed Description

To describe the detailed process of Algorithm 5, we first characterize the input arguments, the output and the meaning of the variables used and then provide a step-by-step textual description of the actions taken by the algorithm.

9.2.1 Input Arguments

The input parameters of Algorithm 5 are the following:

An admissible DPI ⟨K, B, P, N⟩R (cf. Definition 3.6).

Natural numbers nmin ≥ 2, nmax, t for leading diagnoses calculation (see description in Chapter 7 on page 95).

Remark: The postulation nmin ≥ 2 is necessary in order for the existence of queries w.r.t. any computed set of leading minimal diagnoses D and ⟨K, B, P, N⟩R to be guaranteed (see Proposition 7.5).

A function p �K∪K : �K ∪ K → (0, 1] that assigns a fault probability p �K∪K(e) to each e ∈ �K ∪ K, reflecting the degree of belief that (one occurrence of) a syntactical element e appearing in K is faulty (see Section 4.6).

Remarks: Forbidding a probability of zero for syntactical elements assures that no formula in K can have a probability of zero (cf. Remark 4.5).

Recall from Section 4.6.1 that �K refers to the signature of K (cf. Chapter 2) and K denotes the set of all logical connectives occurring in K. From the probabilities of logical connectives and elements of the signature, probabilities of formulas in K can be derived, and from those in turn probabilities of diagnoses w.r.t. the DPI, as shown by Formulas 4.2 and 4.3.

Further note that in the description of the algorithms in this section, unlike in Section 4.6, we use different denotations for probabilities of syntactical elements (p �K∪K), formulas (pK()) and diagnoses (pD()) in order to make a clear distinction between these different functions.

A natural number q ≥ 1 that denotes the number of queries that should be precomputed, i.e. the preferred size of the query pool QP (see Chapter 8), before the “best” tuple ⟨Q∗, P(Q∗)⟩ is selected from QP.

Remark: In general, a higher q implies better quality of the selected query in terms of the query selection measure qsm() (see next bullet point), since the chance of locating a good query in a larger set of queries is higher. On the other hand, a higher q involves a worse reaction time, i.e. time between two successive queries, because the more queries are computed, the more time the function GETPOOLOFQUERIES consumes.

A query selection measure qsm() where qsm : QP → R is a function that assigns a real-valued number qsm(⟨Q, P(Q)⟩) to each tuple in QP, often called the score of ⟨Q, P(Q)⟩.

Remark: qsm() defines what is considered the “best” query in the set QP, namely the query Q∗ in the tuple ⟨Q∗, P(Q∗)⟩ with the best score among all tuples in the pool QP. Diverse measures that can be used as a qsm() function in this algorithm have been discussed and evaluated within the scope of interactive KB debugging in the literature [SFFR12, RSFF13] (for details see Section 9.3).
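
Selecting the “best” tuple from a pool by means of a score function can be sketched as follows. The scoring function shown is only an illustrative stand-in for a qsm() (it prefers balanced splits and penalizes diagnoses in D0, with lower scores being better); the actual measures are the subject of Section 9.3. The names split_in_half_score and select_best and the tuple layout are assumptions of this sketch.

```python
from typing import Callable, FrozenSet, List, Tuple

QPartition = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]  # (D+, D-, D0)
PoolEntry = Tuple[FrozenSet[str], QPartition]                        # (Q, P(Q))


def split_in_half_score(entry: PoolEntry) -> float:
    """Illustrative qsm(): lower is better; penalizes unbalanced splits and D0."""
    _, (d_plus, d_minus, d_zero) = entry
    return abs(len(d_plus) - len(d_minus)) + len(d_zero)


def select_best(pool: List[PoolEntry], qsm: Callable[[PoolEntry], float]) -> PoolEntry:
    """Return the tuple with the best (here: smallest) score in the pool."""
    return min(pool, key=qsm)
```

Whether “best” means the smallest or the largest score depends on the chosen measure; a measure for which larger is better would use max instead of min.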

A maximum fault tolerance σ that defines the stop criterion of the algorithm. That is, for a current set of leading diagnoses, the stop criterion is satisfied iff the most probable leading diagnosis has an (updated) probability of at least 1 − σ (see below for a precise definition of what “updated” means).

Remark: The smaller σ is chosen, the higher is the chance that a desired diagnosis is found. Selecting σ := 0, i.e. admitting zero fault tolerance, is the safest (but also most time-consuming) way to run a debugging session with Algorithm 5, as in this case the session will stop only after all but one diagnosis have been invalidated by test cases.

A mode mode ∈ {static, dynamic} that determines

(i) which type of leading diagnoses are computed, i.e. only minimal diagnoses w.r.t. the input DPI (static) or minimal diagnoses w.r.t. the current DPI (dynamic),

(ii) the hitting set tree pruning strategy after a query has been answered, i.e. conservative pruning (static) or invasive pruning (dynamic),

(iii) the space and time complexity of diagnosis computation, i.e. not much affected by the asked queries (static), where the tree is almost monotonically growing but cannot get larger in size than the complete non-interactive hitting set tree (the tree produced by Algorithm 2 with input nmin = ∞), or significantly influenced by the asked queries (dynamic), where the tree may shrink significantly if new test cases do not introduce “completely new” minimal conflict sets (that are in no subset-relation with an existing one), or may grow significantly larger than the complete non-interactive hitting set tree if many “completely new” minimal conflict sets result from the addition of new test cases.

For an in-depth discussion and comparison of both strategies the reader may consult Part III.

9.2.2 Output

The output of Algorithm 5 can be explained as follows by making a distinction between the two modes of the algorithm specified by input parameter mode:

Proposition 9.1. If mode = static, then Algorithm 5 returns the (exact) solution of the Interactive Static KB Debugging problem (Problem Definition 6.2) if σ = 0 and an approximate solution of the problem if σ > 0, where the likeliness of finding the (exact) solution increases with decreasing σ.

More concretely, a maximal solution KB K∗ = (K \ Dmax) ∪ UP w.r.t. the input DPI ⟨K, B, P, N⟩R is returned such that

1. Dmax ∈ D (Dmax is an element of the current set of leading diagnoses)

2. Dmax = arg maxD∈D pD(D) (Dmax is the a-posteriori most probable leading diagnosis)

3. pD(Dmax) ≥ 1 − σ (the a-posteriori probability of Dmax reaches or exceeds the predefined threshold)

4. D ⊆ mD⟨K,B,P,N⟩R ∩ mD⟨K,B,P∪P′,N∪N′⟩R comprises the |D| most probable minimal diagnoses w.r.t. ⟨K, B, P, N⟩R as per the diagnosis probability measure pD,prio() (the set of leading diagnoses corresponds to the a-priori most probable minimal diagnoses w.r.t. the input DPI that satisfy all specified test cases),

image

6. the a-posteriori probability measure pD() is computed from pD,prio() as per Bayes’ Theorem (Formula 4.5, for details see below), taking into account the new information given by the set of all answered queries so far, i.e. the collected sets of positive (P′) and negative (N′) test cases.

If mode = dynamic, then Algorithm 5 returns the (exact) solution of the Interactive Dynamic KB Debugging problem (Problem Definition 6.1) if σ = 0 and an approximate solution of the problem if σ > 0, where the likeliness of finding the (exact) solution increases with decreasing σ.

More concretely, a maximal solution KB K∗ = (K \ Dmax) ∪ UP∪P′ w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R is returned such that

1. Dmax ∈ D (Dmax is an element of the current set of leading diagnoses)

2. Dmax = arg maxD∈D pD(D) (Dmax is the a-posteriori most probable leading diagnosis)

3. pD(Dmax) ≥ 1 − σ (the a-posteriori probability of Dmax reaches or exceeds the predefined threshold)

4. D ⊆ mD⟨K,B,P∪P′,N∪N′⟩R comprises the |D| most probable minimal diagnoses w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R as per the diagnosis probability measure pD,prio() (the set of leading diagnoses corresponds to the a-priori most probable minimal diagnoses w.r.t. the current DPI),

image

6. the a-posteriori probability measure pD() is computed from pD,prio() as per Bayes’ Theorem (Formula 4.5, for details see below), taking into account the new information given by the set of all answered queries so far, i.e. the collected sets of positive (P′) and negative (N′) test cases.
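
The difference between the two returned solution KBs, (K \ Dmax) ∪ UP in static mode and (K \ Dmax) ∪ UP∪P′ in dynamic mode, can be made concrete with a small sketch. We assume here that formulas are plain strings, that each test case is a set of formulas and that U denotes the union of all formulas in the given test cases. The function solution_kb and its parameters are hypothetical and do not mirror the signature of GETSOLKB.

```python
from typing import FrozenSet, List, Set


def solution_kb(
    kb: Set[str],
    d_max: Set[str],
    positives: List[FrozenSet[str]],
    new_positives: List[FrozenSet[str]],
    mode: str,
) -> Set[str]:
    """Delete the diagnosis from the KB and add the union of positive test cases."""
    if mode not in ("static", "dynamic"):
        raise ValueError("mode must be 'static' or 'dynamic'")
    # Static: only the input test cases P; dynamic: P together with P'.
    relevant = positives if mode == "static" else positives + new_positives
    union: Set[str] = set()
    for test_case in relevant:
        union |= test_case
    return (kb - d_max) | union
```

For the same diagnosis, the dynamic result is always a superset of the static one, since it additionally contains the formulas of all positively answered queries.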

Remark 9.2 We still need to explain what we mean by “approximate solution” of the Interactive Static (Dynamic) KB Debugging problem. Roughly, an approximate solution is one constructed from a diagnosis which is not the only remaining minimal diagnosis. More precisely, an approximate solution of

the Interactive Static KB Debugging problem is a maximal solution KB (K \ D) ∪ UP such that

D is a minimal diagnosis w.r.t. the input DPI and w.r.t. the current DPI and

there is some D′ ≠ D which is a minimal diagnosis w.r.t. the input DPI and w.r.t. the current DPI

the Interactive Dynamic KB Debugging problem is a maximal solution KB (K \ D) ∪ UP∪P′ such that

D is a minimal diagnosis w.r.t. the current DPI and

there is some D′ ≠ D which is a minimal diagnosis w.r.t. the current DPI

where the input DPI is given by ⟨K, B, P, N⟩R and the current DPI by ⟨K, B, P ∪ P′, N ∪ N′⟩R.

So, as long as not all but one diagnosis candidate that enables the formulation of a solution KB has been ruled out by the classification of test cases, we speak of an approximate solution. Now, the lower the predefined value for σ, the longer Algorithm 5 will usually need to iterate and the more test cases will usually need to be specified until one diagnosis has a probability greater than or equal to 1 − σ. Hence, at the time a diagnosis reaches the probability 1 − σ, there will usually be fewer minimal diagnoses left than if a higher value for σ had been selected. Therefore, the lower σ is, the higher the likeliness of picking the (exact) solution will usually be.

Remark 9.3 Note that granting a maximum absolute fault tolerance σ that is independent of a set of leading diagnoses is generally computationally infeasible due to the high complexity of diagnosis computation (see Chapter 1). This is because, for an absolute fault tolerance to hold, all minimal diagnoses w.r.t. the current DPI have to be computed in order to determine their probability and to decide whether the most probable diagnosis has a probability greater than or equal to 1 − σ.

In fact, the fault tolerance used by Algorithm 5, which is relative to the set of leading diagnoses, i.e. the (a-priori) most probable minimal diagnoses D w.r.t. a DPI, can be interpreted as follows. Under the assumption that the true diagnosis Dt is included in D, the chance that the most probable minimal diagnosis Dmax ∈ D which satisfies the stop criterion is not equal to Dt is smaller than the predefined threshold σ (cf. Section 4.6). Thus, under this assumption, the (a-posteriori) probability of being presented a non-desired solution KB as output of Algorithm 5 is smaller than σ.

The a-priori diagnoses probability measure pD,prio() refers to the one that is computed directly from the fault information provided as an input to Algorithm 5, whereas the a-posteriori diagnoses probability measure pD() is the one obtained from pD,prio() after incorporating the information given by the new test cases specified so far during the debugging session. So, pD,prio() and pD() might differ in terms of the probability order of diagnoses. Incorporation of updated probabilities directly into the hitting set tree algorithms, so that leading diagnoses are determined in the order prescribed by an updated probability measure, is only possible if there is an additional update operator (besides Bayes’ Theorem for adapting diagnoses probabilities) that can be applied to formula probabilities. The reason is that formula probabilities are exploited in the hitting set tree to assign probability weights to paths that are not yet diagnoses (cf. pnodes() specified by Definition 4.9 and the discussion of Formula 4.6) in order to guide the search for minimal diagnoses in best-first order. Updated diagnosis probabilities are not helpful at all for this purpose. Devising a reasonable mechanism for updating formula probabilities seems to be hard, mostly due to the lack of suitable data that might be collected during the debugging session to accomplish that. What would be imaginable during the debugging session is to try to learn something about the fault probability of syntactical elements by examining the positive test cases (all formulas are definitely correct) and the singleton negative test cases (the single formula is definitely incorrect). However, a drawback of such a strategy comes into effect when only syntactically very simple queries are used, which is, for instance, the case in Example 8.1 (see the definition of the GETENTAILMENTS function there). From such queries not many useful insights concerning faulty syntactical elements might be gained. On the other hand, such queries are absolutely desirable from the point of view of how well a user might comprehend the formulas asked by the system. Hence, these two aspects seem to contradict each other. Still, it is a topic for future research to attempt to elaborate a solution for that issue.

A way to achieve that pD() coincides with pD,prio(), at least in case mode = static, is to exclude queries Q with D0(Q) ≠ ∅ (see Remark 9.8). How this might be accomplished is stated by Proposition 8.3. Please notice that ignoring queries with non-empty D0 does not imply any disadvantages for interactive debugging. On the contrary, it is even a desirable feature of a debugger and brings along higher computational efficiency of query generation and stronger test cases from the logical point of view (cf. Section 8.2). For the scenario mode = dynamic, it is not possible in general to bypass the probability update by means of such queries (see Remark 9.8).
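
The effect of D0 on the probability update can be illustrated with a small sketch. It rests on an assumption that is not stated in this section: a diagnosis in D+ or D− predicts the answer with likelihood 1 or 0, whereas a diagnosis in D0 predicts neither answer and is given likelihood 1/2. Under this assumption, a query with D0 = ∅ leaves the surviving diagnoses with their prior probabilities up to renormalization, while a query with D0 ≠ ∅ can change their relative order. The function posterior and its data layout are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

QPartition = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]  # (D+, D-, D0)


def posterior(
    prior: Dict[str, float],
    answered: List[Tuple[QPartition, bool]],
) -> Dict[str, float]:
    """Bayes-style update; assumes at least one diagnosis survives all answers."""
    weights: Dict[str, float] = {}
    for d, p in prior.items():
        w = p
        for (d_plus, _d_minus, d_zero), answer in answered:
            if d in d_zero:
                w *= 0.5  # assumed likelihood for diagnoses predicting no answer
            elif (d in d_plus) != answer:
                w = 0.0   # diagnosis is contradicted by the user's answer
        weights[d] = w
    total = sum(weights.values())  # plays the role of the normalization factor
    return {d: w / total for d, w in weights.items() if w > 0}
```

In the first case below the two surviving diagnoses keep equal probability, in the second the one located in D0 loses probability mass relative to the other.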

9.2.3 Variables

The variables used by Algorithm 5 that are not input arguments to the algorithm are the following:

P′, N′ are the sets of positive and negative test cases, respectively, collected during the execution of Algorithm 5 so far. That is, P′ stores all positively answered queries, whereas N′ stores all negatively answered ones.

Ccalc is the set of all conflict sets computed by QX during the execution of Algorithm 5 so far.

Remark: In case of static debugging (mode = static), Ccalc includes exclusively minimal conflict sets w.r.t. the input DPI, whereas, in case of dynamic debugging (mode = dynamic), Ccalc may comprise minimal conflict sets w.r.t. the current or any intermediate DPI.

D✓ is the set of leading diagnoses returned by a call of STATICHS in case of static debugging (mode = static) and by a call of DYNAMICHS in case of dynamic debugging (mode = dynamic).

Remarks: In case of dynamic debugging, D✓ ⊆ mD⟨K,B,P∪P′,N∪N′⟩R is the set of most probable minimal diagnoses w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R as per the diagnosis probability measure pD,prio() computed from p �K∪K() by Formulas 4.2, 4.7, 4.3 and 4.4 (cf. Sections 4.6 and 9.2.2).

In case of static debugging, D✓ ⊆ mD⟨K,B,P,N⟩R ∩ mD⟨K,B,P∪P′,N∪N′⟩R, i.e. D✓ includes only diagnoses that are minimal diagnoses w.r.t. the input DPI ⟨K, B, P, N⟩R as well as w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R. Moreover, D✓ comprises the most probable minimal diagnoses w.r.t. the input DPI according to the diagnosis probability measure pD,prio() computed from p �K∪K() by Formulas 4.2, 4.7, 4.3 and 4.4 (cf. Sections 4.6 and 9.2.2).

D× stores all minimal diagnoses w.r.t. the input DPI that have been invalidated by one of the collected positive and negative test cases P′ and N′, respectively (mode = static). D× stores the minimal diagnoses w.r.t. the last-but-one DPI that have been invalidated by the most recently added test case (mode = dynamic).

Dout is the subset of the set of current leading diagnoses D✓ that has been invalidated by the most recently added test case.

D⊃ stores all diagnoses that are non-minimal w.r.t. the current DPI, i.e. for each diagnosis nd ∈ D⊃ there is some nd′ ∈ D✓ such that nd ⊃ nd′ (mode = dynamic).

Remark: D⊃ is solely needed for dynamic and not for static debugging, as the latter does not need to store non-minimal diagnoses (cf. rule 4 of Definition 4.8 on page 59). The reason for this is the fact that only minimal diagnoses w.r.t. the input DPI are searched for. On the other hand, in case of dynamic debugging, non-minimal diagnoses might become minimal ones after some new test cases are specified, since minimal diagnoses w.r.t. the (changing) current DPI are considered.
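
The subset relation that separates minimal from non-minimal diagnoses can be sketched as follows. This is only an illustration of the relation nd ⊃ nd′ on a given collection of diagnoses, each modeled as a frozen set of formula names; it does not reproduce how DYNAMICHS maintains D⊃, and the function name split_minimal is hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple


def split_minimal(
    diagnoses: Iterable[FrozenSet[str]],
) -> Tuple[Set[FrozenSet[str]], Set[FrozenSet[str]]]:
    """Split a collection of diagnoses into (minimal, non-minimal) w.r.t. set inclusion."""
    all_diagnoses = set(diagnoses)
    # A diagnosis is non-minimal iff some other diagnosis is a proper subset of it.
    minimal = {d for d in all_diagnoses if not any(o < d for o in all_diagnoses)}
    return minimal, all_diagnoses - minimal
```

If a minimal diagnosis is later invalidated by a new test case, rerunning the split on the remaining diagnoses may promote one of its supersets to a minimal diagnosis, which is the situation described for dynamic debugging.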

qData is an informal variable that comprises any kind of data that might be taken into account by the query selection measure qsm() and that might need to be adapted after a query has been answered (and diagnoses have been invalidated) in order to take the obtained new information into account. One can imagine qData as a log specific to the particular function qsm() that is used, which records data of prior (query answering) iterations executed by the algorithm such as certain performance measures. An example of a qsm() strategy using one such metric, namely the ratio of leading diagnoses invalidated by a test case, can be found in [RSFF13].

QA := [⟨Q, u(Q)⟩]Q∈P′∪N′ where u(Q) ∈ {true, false} is the chronologically ordered list of queries and user answers collected so far during the execution of Algorithm 5.

Q is the current queue of open nodes in the hitting set tree maintained by Algorithm 5.

The list Qdup roughly stores all duplicate nodes (that is, nodes for each of which there is a node in the hitting set tree that corresponds to an equal set of edge labels) computed so far during the execution of Algorithm 5.

Remark: The list Qdup is only relevant in case mode = dynamic and not needed if mode = static. The purpose of this list is to enable the “replacement” of pruned nodes, which is necessary to guarantee the completeness of DYNAMICHS in terms of not missing any minimal diagnoses (for a detailed explanation, see Chapter 12).

9.2.4 Algorithm Walkthrough

Initialization. In the first 4 lines, variable declarations take place. First, all variables that store sets of conflict sets, diagnoses or test cases, and qData are initialized to the empty set. Further on, Qdup and QA are initialized to an empty list. Finally, the queue Q of open nodes used for the hitting set tree construction by STATICHS (mode = static) or DYNAMICHS (mode = dynamic), respectively, is set to [∅] since it initially includes only a non-labeled root node.

Remark 9.4 The non-labeled root node is denoted by ∅ since nodes in STATICHS are associated with the set of edge labels along the path in the hitting set tree from the root node to this node (cf. Chapters 4 and 11). Hence, the root node itself corresponds to the empty path which includes no edges.

Notice that in case of DYNAMICHS, nodes will be (ordered) lists instead of (non-ordered) sets like in STATICHS (cf. Chapter 12). That is, to be precise, the unlabeled root node in this case corresponds to the empty list []. For the ease of representation of Algorithm 5, only one set Q is initialized to be used with either STATICHS or DYNAMICHS. Thence, by abuse of notation, we associate ∅ in this case with the empty list [].

Computing Fault Probabilities of Formulas. Then, GETFORMULAPROBS is called in line 5 with the KB K and the function p �K∪K : �K ∪ K → (0, 1] as inputs. The function first applies Formula 4.2 to compute probabilities for each formula in K, then applies Formula 4.7 to these probabilities, leading to the output pK : K → (0, 0.5), a function that assigns a value pK(ax) ∈ (0, 0.5) to each ax ∈ K.

Computing Leading Diagnoses. At this point, all input arguments required for the hitting set tree construction are instantiated. So, the algorithm enters the while loop in line 6. As a first step within the loop, either STATICHS, if mode = static, or DYNAMICHS, otherwise, is called in order to obtain a tuple including a set of leading diagnoses along with variables that store the “state” of the (partial) hitting set tree constructed so far and facilitate the reuse of this tree in the next iteration.

In concrete terms, STATICHS accepts the arguments ⟨K, B, P, N⟩R, Q, t, nmin, nmax, Ccalc, D✓, D×, pK(), P′ and N′ and returns a tuple ⟨D, Q, Ccalc, D×⟩ the elements of which are defined as follows:

D is the current set of leading diagnoses such that

image

where “most-probable” refers to the diagnosis probability measure pD,prio() obtained from pK() by application of Formulas 4.3 and 4.4.

Q is the current queue of open nodes of the hitting set tree.

Ccalc ⊆ mC⟨K,B,P,N⟩R is the set of all computed minimal conflict sets w.r.t. the input DPI throughout all calls of STATICHS during the execution of Algorithm 5 so far.

D× comprises all computed minimal diagnoses throughout all calls of STATICHS during the execution of Algorithm 5 so far where each D ∈ D× has been invalidated by some test case in P′ or N′.

Similarly, DYNAMICHS accepts the arguments ⟨K, B, P, N⟩R, Q, Qdup, t, nmin, nmax, Ccalc, D✓, D×, pK(), P′, N′ and D⊃ and returns a tuple ⟨D, Q, Ccalc, D×, D⊃, Qdup⟩ the elements of which are defined as follows:

D is the current set of leading diagnoses such that

(a) D ⊆ mD⟨K,B,P∪P′,N∪N′⟩R is the set of most probable minimal diagnoses w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R such that

image

image

where “most-probable” refers to the diagnosis probability measure pD,prio() obtained from pK() by application of Formulas 4.3 and 4.4.

Q is the current queue of open (non-labeled) nodes of the hitting set tree,

Ccalc is a set of conflict sets w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R,

 D× =,

D⊃ is the set of all processed nodes so far throughout the execution of Algorithm 5 that are non-minimal diagnoses w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R and

Qdup includes all duplicate nodes found so far throughout the execution of Algorithm 5 (for a detailed explanation see Chapter 12 and Algorithm 8).

Remark 9.5 It is very important to notice that the function pnodes() for p() := pK() as specified by Definition 4.9 on page 73 imposes the same order on a set of minimal diagnoses as the a-priori probability measure pD,prio(). That is, pnodes(D) = c · pD,prio(D) for all minimal diagnoses D w.r.t. a DPI, where c is a constant (which is the same for all diagnoses D). The difference between both functions is that pnodes() is defined for all X ⊆ K whereas pD,prio() is only defined for (leading) minimal diagnoses D ⊆ K. Further, pD,prio() is normalized whereas pnodes() is not, which accounts for the (normalization) constant c. The function pnodes() is essential for the best-first construction of the hitting set tree in STATICHS and DYNAMICHS since it allows for the assignment of a “probability” to non-diagnoses (cf. the discussion of Formula 4.6 on page 73). Since the input argument p() (which is the same for all calls) to STATICHS as well as DYNAMICHS is equal to pK() by lines 8 and 10 in Algorithm 5, the set D returned by STATICHS (DYNAMICHS) is also the set of most probable minimal diagnoses w.r.t. ⟨K, B, P, N⟩R (⟨K, B, P ∪ P′, N ∪ N′⟩R) as per the function pnodes() (cf. Proposition 11.1 and Corollary 12.8).
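
That an unnormalized weight and its normalized counterpart rank diagnoses identically can be checked numerically. The sketch below assumes a product form for the weight of a node X ⊆ K, namely the product of pK(ax) over ax ∈ X times the product of 1 − pK(ax) over ax ∈ K \ X. This form is our assumption for illustration, since Definition 4.9 and Formula 4.3 are not reproduced in this section. The names p_nodes and normalize are hypothetical.

```python
import math
from typing import Dict, FrozenSet


def p_nodes(node: FrozenSet[str], p_k: Dict[str, float]) -> float:
    """Unnormalized weight of a node (assumed product form, see lead-in)."""
    return math.prod(p if ax in node else 1.0 - p for ax, p in p_k.items())


def normalize(weights: Dict[FrozenSet[str], float]) -> Dict[FrozenSet[str], float]:
    """Divide by the total so that the values sum to 1; the order is unchanged."""
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}
```

Because normalization divides every weight by the same positive constant, sorting by either function yields the same sequence of diagnoses, which is the property the remark relies on.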

Remark 9.6 Notice that the return parameter that is relevant for the main purpose of Algorithm 5, namely to compute a query and thereby obtain a new test case classified by the user, is solely the set of leading diagnoses D. The other return parameters serve as a means to store the state of the hitting set tree that is gradually built up by successive calls of STATICHS (if mode = static) and DYNAMICHS (if mode = dynamic), respectively. Whereas Q and  Ccalc(and  D⊃and  Qdupin case of DYNAMICHS) are never modified until the next call to STATICHS or DYNAMICHS, the sets  D✓and  D×are only changed once, after the subset of invalidated leading diagnoses  Doutis known, in lines 21 and 22.

At this moment, we do not go into detail regarding how leading diagnoses are computed by STATICHS and DYNAMICHS. We simply suppose that both functions act in such a manner that the outputs just specified are returned for the given inputs. An in-depth delineation of both functions will be given in Chapters 11 and 12 in Part III. Further note that the return parameter D is stored in variable D✓ from line 10 on.

Computing a Probability Distribution of Leading Diagnoses. After the set of leading diagnoses D✓ has been computed, the variables D✓, pK(), ⟨K, B, P, N⟩R and QA are used as arguments to the function GETPROBDIST (see Algorithm 6) which computes a probability distribution of the leading diagnoses, i.e. a probability measure pD() for the probability space with sample space Ω = D✓ (cf. Section 4.6). As a first action to achieve this, the (a-priori) probabilities pD,prio(D) for D ∈ D✓ are computed from the (a-priori) probabilities pK(ax) for formulas ax ∈ K as per Formula 4.3 (GETPRIODIAGPROBS in line 29). Application of Formula 4.4 is not necessary at this point as probabilities are normalized anyhow at the end of GETPROBDIST (line 44). Notice that the function pK() remains constant, i.e. unmodified, throughout the entire execution of Algorithm 5.

Now, since a-priori diagnosis probabilities assigned by pD,prio() directly rely upon pK() which in turn is computed directly from the initially given fault probabilities p �K∪K(), the probability measure pD,prio() is adapted to yield a-posteriori diagnosis probabilities pD() in order to reflect the new evidence provided by the collected test cases P′ and N′.

The a-posteriori probability of a current leading diagnosis D in D✓ is pD(D | QA) and can be computed by means of Bayes’ Theorem (Formula 4.5) from pD,prio() as follows.

$$p_D(D \mid QA) = \frac{p_{D,prio}(QA \mid D)\; p_{D,prio}(D)}{p_{D,prio}(QA)}$$

where QA is the chronologically ordered list of queries and user answers collected so far during the execution of Algorithm 5 (see page 127). We point out that pD,prio(QA) is only a normalization factor that is equal for each diagnosis and thus does not need to be explicitly computed. The crucial factor is

$$p_{D,prio}(QA \mid D) = \prod_{\langle Q_i, u(Q_i) \rangle \in QA} p_{D,prio}(Q_i = u(Q_i) \mid D)$$

which describes the probability of getting exactly the answer u(Q) for each query Q ∈ P′ ∪ N′ under the assumption that D corresponds to the true diagnosis Dt, i.e. Dt = D. In other words, pD,prio(QA | D) is the probability of QA under the assumption that the user answers in a way that u(Q) = true if D ∈ D+(Q) and u(Q) = false if D ∈ D−(Q).

For a single query Qi, the probability pD,prio(Qi = u(Qi) | D) is defined as (cf. [dKW87])

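$$p_{D,prio}(Q_i = true \mid D) := \begin{cases} 1, & \text{if } D \in D^+(Q_i) \\ 0, & \text{if } D \in D^-(Q_i) \\ \tfrac{1}{2}, & \text{if } D \in D^0(Q_i) \end{cases}$$

for u(Qi) = true and

$$p_{D,prio}(Q_i = false \mid D) := \begin{cases} 1, & \text{if } D \in D^-(Q_i) \\ 0, & \text{if } D \in D^+(Q_i) \\ \tfrac{1}{2}, & \text{if } D \in D^0(Q_i) \end{cases}$$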

for u(Qi) = false, where D+(Qi), D−(Qi) and D0(Qi) are computed w.r.t. the DPI ⟨K, B, P ∪ P′′, N ∪ N′′⟩ where P′′ and N′′, respectively, include all test cases collected prior to Qi, i.e. P′′ ∪ N′′ = {Q1, . . . , Qi−1} if queries are numbered chronologically. That is, the probability is 1 if D predicted the answer u(Qi) to Qi given by the user, 0 if D predicted the converse answer ¬u(Qi), and 1/2 if D did not predict any answer to Qi.

So, aside from the normalization factor (see above), pD,prio(Qi = u(Qi) | D) is the factor by which the a-priori probability pD,prio(D) must be multiplied to obtain the a-posteriori probability pD(D) of a diagnosis D after a single query Qi has been answered and added as a test case to the DPI.

The intuitive explanation for the update by this factor is that if D predicted (at least) one answer u(Q) contrary to the one given by the user, then D is a-posteriori impossible since it has already been invalidated by the addition of test case Q. If a diagnosis has never predicted a wrong answer, but did not predict any answer for many queries so far, then it is a-posteriori less likely than a diagnosis that predicted a correct answer more often. That is, our a-posteriori degree of belief that D is the correct diagnosis is higher the more often D predicted answers to queries that were later actually given by the user (cf. Section 7.4 for an explanation of what we mean by “predict”).

The value of pD,prio(Qi = u(Qi) | D) can be computed by use of QA and the q-partitions P(Q1), . . . , P(Qi−1) of the current set of leading diagnoses D✓ (for which a-posteriori probabilities are to be computed) for all queries Q1, . . . , Qi−1 answered before query Qi. Thereby, each P(Qj) where j ∈ {1, . . . , i − 1} must be computed for a DPI where only Q1, . . . , Qj−1 are incorporated as test cases.

Taking these thoughts into account, GETPROBDIST (Algorithm 6) updates pD,prio(D) for each diagnosis D ∈ D✓ in that it runs through all query-answer pairs ⟨Q, u(Q)⟩ in QA chronologically and for each D ∈ D✓ it multiplies pD,prio(D) by 1/2 if D ∈ D0(Q) as per Formulas 9.1 and 9.2. For each check whether a diagnosis is in D0(Q) in lines 34 and 39, a DPI is used that already incorporates all test cases P′′ and N′′ that have been added chronologically before Q was asked. This is achieved by updating P′′ and N′′ successively (lines 36 and 41). After all elements of QA have been processed, the updated diagnosis probabilities are finally normalized (line 44, cf. Formula 4.4 on page 72) and the resulting function pD,prio() is returned.
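The update just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a transcription of Algorithm 6: the names `get_prob_dist` and `classify` are hypothetical, and `classify` stands in for the reasoner-based check of whether a diagnosis lies in D+(Q), D−(Q) or D0(Q) w.r.t. the DPI extended by the earlier test cases. The explicit error for an invalidated diagnosis is an added safety check; by Remark 9.7 it should never trigger for leading diagnoses.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Diagnosis = Hashable
Query = Hashable
Answered = Tuple[Query, bool]
# Hypothetical oracle: returns "+", "-" or "0", i.e. the block of the
# q-partition of `query` that `diagnosis` falls into w.r.t. the DPI extended
# by the test cases in `earlier` (the role of P'' and N'' in the text).
Classifier = Callable[[Diagnosis, Query, Sequence[Answered]], str]


def get_prob_dist(
    prior: Dict[Diagnosis, float],
    qa: Sequence[Answered],
    classify: Classifier,
) -> Dict[Diagnosis, float]:
    """Halve the weight of a diagnosis for every answered query it did not
    predict, then normalize over the leading diagnoses."""
    weights = dict(prior)
    earlier: List[Answered] = []
    for query, answer in qa:  # chronological order
        contradicted = "-" if answer else "+"
        for diagnosis in weights:
            block = classify(diagnosis, query, earlier)
            if block == contradicted:
                # Added safety check; leading diagnoses should never get here.
                raise ValueError(f"{diagnosis!r} is invalidated by {query!r}")
            if block == "0":
                weights[diagnosis] *= 0.5
        earlier.append((query, answer))
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}


# Toy data (made up): D1 predicts the positive answer to Q1, D2 predicts nothing.
toy_blocks = {("D1", "Q1"): "+", ("D2", "Q1"): "0"}
posterior = get_prob_dist(
    {"D1": 0.5, "D2": 0.5},
    [("Q1", True)],
    lambda d, q, earlier: toy_blocks[(d, q)],
)
```

In the toy run, the unnormalized weights are 0.5 for D1 and 0.25 for D2, so normalization yields 2/3 and 1/3, respectively.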

Remark 9.7 Note that the function GETPROBDIST exploits the fact that all diagnoses in D✓ are leading diagnoses w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R which guarantees that none of these diagnoses has been invalidated by any of the test cases in P′ or in N′ added throughout the execution of Algorithm 5 (cf. Proposition 12.3 given later). Hence, it is clear that each D ∈ D✓ must be in D+(Q) ∪ D0(Q) if u(Q) = true and in D−(Q) ∪ D0(Q) if u(Q) = false, and it is only tested whether D ∉ D+(Q) in the former case (line 34) and whether D ∉ D−(Q) in the latter (line 39). It must be further noted that, in case of mode = dynamic, diagnoses in D✓ are not necessarily minimal diagnoses w.r.t. the intermediate DPIs ⟨K, B, P ∪ P′′, N ∪ N′′⟩ that are used for the probability update. However, this is not problematic since any set of (minimal and/or non-minimal) diagnoses is partitioned into the three sets D+(Q), D−(Q) and D0(Q) by a query Q (cf. Remark 7.3), wherefore P(Q) exists for any set D✓. Thence, the correctness of GETPROBDIST remains unaffected by the usage of the setting mode = dynamic.

Remark 9.8 We want to emphasize that an adaptation of pD,prio(D) is only necessary in case D ∈ D0(Qj) for some query Qj answered so far during the execution of Algorithm 5, as otherwise a multiplication by 1 is required which does not change pD,prio(D).

For the case of static debugging (mode = static), an immediate implication of this is the following: The restriction of asking the user only queries Qj w.r.t. a DPI with the property that no minimal diagnosis w.r.t. this DPI can be an element of D0(Qj) makes the probability update for each diagnosis in D✓ equivalent to a multiplication by 1 and hence obsolete. This must be the case since each diagnosis in D✓, which is a subset of mD⟨K,B,P,N⟩R ∩ mD⟨K,B,P∪P′,N∪N′⟩R (see Section 9.2.2), must be a minimal diagnosis w.r.t. each intermediate DPI (which includes a superset of the test cases in the input DPI ⟨K, B, P, N⟩R and a subset of the test cases in the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R) as will be substantiated by Proposition 12.5 given later. Consequently, such a scenario implies that the order of diagnoses computed by STATICHS corresponds to the best-first order also w.r.t. the a-posteriori diagnosis probabilities (cf. Remark 9.3).

The approach of only using queries with this property is feasible, e.g. by using a GETENTAILMENTS function in conformity with Proposition 8.3 for the generation of the query pool (GETPOOLOFQUERIES). Such a type of queries is also favorable from the discrimination point of view, as we pointed out in Section 8.2. An improvement of static debugging with this type of queries is to deactivate the probability update, i.e. replace line 11 in Algorithm 5 by line 29 of Algorithm 6. This improvement is not shown in Algorithm 5.

In a dynamic debugging session (mode = dynamic), on the contrary, the usage of such queries does not guarantee the triviality of the probability update. For, even if no minimal diagnosis w.r.t. the DPI (for which a query Qj is computed) can be an element of D0(Qj), there may be some non-minimal one which is. For example, for any admissible DPI ⟨K, B, P, N⟩R it holds that D := K is a diagnosis (cf. Proposition 3.4 and Definition 3.6), albeit in most cases a non-minimal one. In such a case, (K \ D) ∪ B ∪ UP, which is equal to B ∪ UP, cannot entail Qj. For, were this the case, then all minimal diagnoses Di ∈ mD⟨K,B,P,N⟩R would be elements of D+(Qj) as each K∗i ⊇ B ∪ UP and thus each K∗i |= Qj by the monotonicity of L. Hence, this would be a contradiction to the fact that Qj is a query w.r.t. ⟨K, B, P, N⟩R by Corollary 7.2. On the other hand, (K \ D) ∪ B ∪ UP ∪ Qj = B ∪ UP ∪ Qj cannot violate any x ∈ N ∪ R. For, if this were the case, then adding Qj to the positive test cases would lead to a non-admissible DPI ⟨K, B, P ∪ {Qj}, N⟩R. By Corollary 7.3, this would be a contradiction to the fact that Qj is a query w.r.t. ⟨K, B, P, N⟩R. Thence, D ∈ D0(Qj) must hold for the assumed non-minimal diagnosis D. From that we conclude that the probability update in dynamic debugging cannot be made obsolete in general by the usage of such a type of queries.

Stop Criterion and Output. The (a-posteriori) probability distribution pD() of leading diagnoses D✓ is then used in line 12 of Algorithm 5 to compute the mode of this distribution, i.e. the one diagnosis Dmax ∈ D✓ with maximum probability according to pD().

In the sequel, Dmax is used to check the stop criterion (line 13), namely whether Dmax has a probability greater than or equal to 1 − σ. If this is the case and mode = static, the function GETSOLKB computes a maximal solution KB w.r.t. the input DPI as (K \ Dmax) ∪ UP by means of the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R, P′ and Dmax. Given that mode = dynamic, GETSOLKB returns a maximal solution KB w.r.t. the current DPI as (K \ Dmax) ∪ UP∪P′ by means of the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R and Dmax. This solution KB is then returned as an output of Algorithm 5. If, on the other hand, the stop criterion is not met, the algorithm continues the execution with the computation of another query.
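A minimal Python sketch of the mode computation, the stop criterion and the construction of the solution KB may help to fix ideas. It assumes that formulas are plain strings, that a KB, a diagnosis and a test case are each a frozenset of formulas, and that UX denotes the union of all test cases in X; all function names are hypothetical stand-ins for GETMODE, the check in line 13 and GETSOLKB.

```python
from typing import Dict, FrozenSet, Iterable, Sequence

Formula = str
FormulaSet = FrozenSet[Formula]  # used for KBs, diagnoses and test cases


def get_mode(p_d: Dict[FormulaSet, float]) -> FormulaSet:
    """Return a leading diagnosis with maximum probability (D_max)."""
    return max(p_d, key=p_d.__getitem__)


def stop_criterion_met(p_d: Dict[FormulaSet, float], sigma: float) -> bool:
    """True iff the most probable diagnosis has probability >= 1 - sigma."""
    return p_d[get_mode(p_d)] >= 1.0 - sigma


def union_of(test_cases: Iterable[FormulaSet]) -> FormulaSet:
    """U_X: the union of all test cases in X."""
    return frozenset().union(*test_cases)


def get_sol_kb(
    kb: FormulaSet,
    d_max: FormulaSet,
    pos: Sequence[FormulaSet],      # P
    new_pos: Sequence[FormulaSet],  # P'
    mode: str,
) -> FormulaSet:
    """(K \\ D_max) ∪ U_P for static, (K \\ D_max) ∪ U_{P ∪ P'} for dynamic."""
    if mode == "static":
        return (kb - d_max) | union_of(pos)
    if mode == "dynamic":
        return (kb - d_max) | union_of(list(pos) + list(new_pos))
    raise ValueError(f"unknown mode: {mode!r}")


# Toy data (made up for illustration only).
kb = frozenset({"ax1", "ax2", "ax3"})
d1, d2 = frozenset({"ax2"}), frozenset({"ax1", "ax3"})
p_d = {d1: 0.96, d2: 0.04}
pos, new_pos = [frozenset({"p1"})], [frozenset({"q1"})]
```

With these toy values and σ = 0.05, the stop criterion holds for d1, and the static and dynamic solution KBs differ exactly by the formulas of the newly collected positive test case.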

Remark 9.9 Notice that the returned maximal solution KB (K \ Dmax) ∪ UP w.r.t. the input DPI in case mode = static can be easily extended to constitute a maximal solution KB w.r.t. the current DPI, namely by extending it by UP′. If mode = dynamic, then the KB output in line 14 is a maximal solution KB w.r.t. the current DPI, but possibly a non-maximal solution KB w.r.t. the input DPI.

Query Computation and User Interaction. In line 16, the function CALCQUERY is applied to compute a query and the associated q-partition by means of the leading diagnoses D✓, (possibly) the collected data qData, the probability distribution pD() of the leading diagnoses, a query selection function qsm() (which might exploit the function p �K∪K()), a parameter q determining the size of the computed query pool and the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R.

As a first step within CALCQUERY, the function GETPOOLOFQUERIES computes a query pool QP as detailed in Chapter 8 from D✓, q and ⟨K, B, P ∪ P′, N ∪ N′⟩R. Then, the best tuple ⟨Q, P(Q)⟩ ∈ QP according to the function qsm() is searched for and finally returned as the output of CALCQUERY. During the query selection process, the evaluation of the query selection measure qsm(Q) ∈ ℝ for queries Q where ⟨Q, P(Q)⟩ ∈ QP may require qData, the fault probabilities pD() of leading diagnoses as well as the fault probabilities p �K∪K() of syntactical elements in K. This depends on which concrete measure qsm() is employed (see Section 9.3 which presents some possible measures).

As a next step, the query Q of the best tuple ⟨Q, P(Q)⟩ ∈ QP is presented to the interacting user in line 17, which is the only place in Algorithm 5 where user interaction takes place. The user is modeled as a deterministic function u : QD,⟨K,B,P∪P′,N∪N′⟩ → {true, false} that allocates a positive (true) or negative (false) answer to each query w.r.t. any set of leading diagnoses D for some current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩. The answer u(Q) given by the user is stored in the variable answer.

Remark 9.10 We want to point out that the algorithm can be easily adapted to allow a user to reject queries, e.g. if they are not sure how to answer. That is, the user function might be modeled as u : QD,⟨K,B,P∪P′,N∪N′⟩ → {true, false, unknown} where u(Q) = unknown signifies the rejection of query Q. In this case, an accordingly modified version of Algorithm 5 would calculate an alternative query w.r.t. D and ⟨K, B, P ∪ P′, N ∪ N′⟩, e.g. the second best one according to the query selection measure qsm() among all tuples in QP (this potential feature is not shown in Algorithm 5). In this vein, a total of |QP| − 1 queries can be dismissed per set of leading diagnoses D.

We want to accentuate that the presented interactive algorithm might be easily adapted to cope with queries whose answer is unknown to the user, but a definite assumption for the algorithm to return a correct solution is a user that does not give wrong answers. In other words, the algorithm does not provide inherent mechanisms that allow for the detection of wrong answers or for the debugging of the KB debugging procedure (keyword “garbage in, garbage out”). So, we suppose the function u() to be deterministic which prohibits the situation that a user might change their mind at a later point in time. Of course, this is still a possible scenario in practice, but in case it arises, a user has to revise, i.e. delete or edit, specified test cases they disagree with by hand before a new debugging session using the modified DPI might be started.

Another remark at this place concerns the way a user might choose to answer the query. A “minimal” feedback of a user that we regard as an answer to a query Q is to merely say true, i.e. each formula in Q (or the conjunction of formulas in Q) must be entailed by the correct KB, or false, i.e. at least one formula in Q (or the conjunction of formulas in Q) must not be entailed by the correct KB. The presented algorithm (Algorithm 5) is designed to deal with exactly this kind of an answer. However, imagine a user being presented Q and think of how they might proceed in order to come up with an answer to Q. The first observation is that, in order to respond by true, a user must definitely scrutinize each single formula in Q because otherwise they could never decide for sure whether the conjunction of all formulas in Q is correct. Another observation is that a user might cease to go through the rest of the formulas in case they have already identified one that must not be an entailment of the desired KB. For, in this situation, the overall query Q is already false. This however indicates that at least one formula must be known to be correct or false whatever answer is given to Q. Therefore, we can usually expect a user to be able to give exactly this information, namely one formula in Q that must be incorrect, additionally to answering by false. This extra piece of information can be exploited to achieve better space and time efficiency in the context of diagnosis computation. Proposing more efficient algorithms that exploit this information is a topic for future work.

Incorporating the New Information. The new information represented by the answer answer to Q is incorporated (lines 18-26) by updating values of all relevant parameters. First, by means of the function APPEND, the tuple consisting of the answered query Q and the corresponding answer answer given by the user is added as a last element to the chronological list of queries and answers QA that is used for the next probability update (line 11).

Then, the subset Dout of the leading diagnoses D✓ that gets invalidated after adding Q to the positive or negative test cases of the DPI, respectively, is computed by the function GETINVALIDDIAGS that gets the q-partition P(Q) = ⟨D+(Q), D−(Q), D0(Q)⟩ of Q and answer as input arguments. Dout then corresponds to the set D−(Q) given that answer is true and to D+(Q) otherwise (cf. Section 7.4). Note that ∅ ⊂ Dout ⊂ D✓ holds by Proposition 7.4 and since Q is a query w.r.t. D✓ (since D✓ is given as an input to CALCQUERY).

As a next step, the data qData is updated. As already pointed out in Section 9.2.3, the form of the variable qData depends on the employed query selection measure qsm() and so do the actions that are performed by UPDATEQDATA.

In order to communicate the impact of the answered query to the hitting set tree algorithm (either STATICHS or DYNAMICHS), the set of invalidated leading diagnoses Dout is deleted from the leading diagnoses D✓ and added to D×. After this update, D✓ includes all diagnoses that have been computed by the hitting set tree algorithm so far that are minimal diagnoses w.r.t. the current DPI.

Finally, the new test case Q is added to the new positive test cases P′ if answer is true and to the new negative test cases N′ in case of answer = false.
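The bookkeeping performed in lines 18-26 can be illustrated by the following Python sketch. It is a simplification under stated assumptions: diagnoses and queries are represented by plain strings, the class `DebugState` and both function names are hypothetical, and the call to UPDATEQDATA is omitted because its behavior depends on the employed measure qsm().

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

QPartition = Tuple[Set[str], Set[str], Set[str]]  # (D+, D-, D0)


@dataclass
class DebugState:
    leading: Set[str]                                      # D✓
    invalid: Set[str] = field(default_factory=set)         # D×
    new_pos: List[str] = field(default_factory=list)       # P'
    new_neg: List[str] = field(default_factory=list)       # N'
    qa: List[Tuple[str, bool]] = field(default_factory=list)


def get_invalid_diags(q_partition: QPartition, answer: bool) -> Set[str]:
    """D_out is D-(Q) for a positive answer and D+(Q) for a negative one."""
    d_plus, d_minus, _d_zero = q_partition
    return set(d_minus) if answer else set(d_plus)


def incorporate_answer(
    state: DebugState, query: str, q_partition: QPartition, answer: bool
) -> Set[str]:
    state.qa.append((query, answer))        # APPEND
    d_out = get_invalid_diags(q_partition, answer)
    state.leading -= d_out                  # remove D_out from D✓
    state.invalid |= d_out                  # add D_out to D×
    (state.new_pos if answer else state.new_neg).append(query)
    return d_out


# Toy data (made up): Q splits three leading diagnoses into D+, D- and D0.
state = DebugState(leading={"D1", "D2", "D3"})
d_out = incorporate_answer(state, "Q", ({"D1"}, {"D2"}, {"D3"}), True)
```

In the toy run, the positive answer invalidates D2, so D1 and D3 remain leading diagnoses and Q becomes a new positive test case.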

9.3 Query Selection Measures

In this section, we give a brief introduction to some query selection measures qsm() that have been suggested and evaluated in literature within the scope of KB or ontology debugging [SFFR12, RSFF13]. Such query selection measures, when used as a parameter in an interactive KB debugging algorithm such as the one described by Algorithm 5, aim at solving the following optimization problems. In Interactive Dynamic KB Debugging, the problem is defined as follows:

Problem Definition 9.1. The task is to solve the problem specified by Problem Definition 6.1 in a way that |P′| + |N′| is minimal.

In Interactive Static KB Debugging, the problem is defined as follows:

Problem Definition 9.2. The task is to solve the problem specified by Problem Definition 6.2 in a way that |P′| + |N′| is minimal.

That is, these optimization problems aim at the minimization of user effort during interactive KB debugging. In other words, the goal is the minimization of the number of queries required to be asked to a user in order to solve the Interactive Static KB Debugging or the Interactive Dynamic KB Debugging Problem, respectively.

In our previous work [SFFR12], we have discussed entropy-based (ENT()) and split-in-half (SPL()) query selection measures.

Entropy-Based Query Selection. A best query QENT according to ENT() has a maximal information gain among all queries Q where ⟨Q, P(Q)⟩ ∈ QP. In other words, QENT minimizes the expected entropy of the probability distribution of the leading diagnoses D✓ after QENT has been added as a test case to the DPI based on the user’s answer u(QENT). As shown in [dKW87], this leads to the definition

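$$\mathrm{ENT}(Q) := \sum_{a \in \{true, false\}} p(Q = a) \log_2 p(Q = a) + p(D^0(Q)) + 1$$

where p() in the case of our algorithm corresponds to the leading diagnoses probability measure pD() computed in line 11 in Algorithm 5 and

$$p(Q = true) := p(D^+(Q)) + \tfrac{1}{2}\, p(D^0(Q)), \qquad p(Q = false) := p(D^-(Q)) + \tfrac{1}{2}\, p(D^0(Q))$$

(cf. Section 7.4) where

$$p(X) := \sum_{D \in X} p(D) \quad \text{for } X \in \{D^+(Q), D^-(Q), D^0(Q)\}$$

Then, the best query in a pool QP according to qsm() := ENT() is

$$Q_{ENT} := \operatorname*{arg\,min}_{Q \,:\, \langle Q, P(Q) \rangle \in QP} \mathrm{ENT}(Q)$$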

So, theoretically optimal w.r.t. ENT() is a query Q whose positive and negative answers are equally likely and for which D0(Q) is the empty set. In other words, the best query has the property that the sum of probabilities of leading diagnoses predicting the positive answer as well as the sum of probabilities of leading diagnoses predicting the negative answer is 50%.
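As an illustration, the entropy-based score can be sketched in Python as follows. The sketch assumes the scoring function known from [dKW87, SFFR12], namely ENT(Q) = Σ p(Q = a) log2 p(Q = a) + p(D0(Q)) + 1 summed over both answers a, with p(Q = true) = p(D+(Q)) + p(D0(Q))/2 and p(Q = false) = p(D−(Q)) + p(D0(Q))/2; smaller scores are better. The function names and the toy pool are hypothetical.

```python
import math
from typing import Dict, Iterable, Set, Tuple

QPartition = Tuple[Set[str], Set[str], Set[str]]  # (D+, D-, D0)


def prob(diagnoses: Iterable[str], p_d: Dict[str, float]) -> float:
    """p(X): sum of the probabilities of the diagnoses in X."""
    return sum(p_d[d] for d in diagnoses)


def ent(q_partition: QPartition, p_d: Dict[str, float]) -> float:
    """Entropy-based score; 0 is the theoretical optimum."""
    d_plus, d_minus, d_zero = q_partition
    p_zero = prob(d_zero, p_d)
    p_true = prob(d_plus, p_d) + p_zero / 2
    p_false = prob(d_minus, p_d) + p_zero / 2
    score = p_zero + 1.0
    for p_answer in (p_true, p_false):
        if p_answer > 0:  # the limit of x * log2(x) for x -> 0 is 0
            score += p_answer * math.log2(p_answer)
    return score


def best_query_ent(pool: Dict[str, QPartition], p_d: Dict[str, float]) -> str:
    """Pick a query with minimal ENT score from the pool."""
    return min(pool, key=lambda q: ent(pool[q], p_d))


# Toy data (made up, unrelated to Table 8.3 and Table 9.1).
p_d = {"D1": 0.5, "D2": 0.3, "D3": 0.2}
pool = {
    "Qa": ({"D1"}, {"D2", "D3"}, set()),   # both answers have probability 0.5
    "Qb": ({"D1", "D2"}, {"D3"}, set()),   # answers with 0.8 vs. 0.2
    "Qc": ({"D1"}, {"D2"}, {"D3"}),        # non-empty D0
}
best = best_query_ent(pool, p_d)
```

Here Qa attains the optimal score of 0 because both answers are equally likely and D0 is empty, whereas Qb and Qc receive strictly positive scores.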

Split-In-Half Query Selection. For the selection criterion qsm() := SPL(), on the other hand, the query

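$$Q_{SPL} := \operatorname*{arg\,min}_{Q \,:\, \langle Q, P(Q) \rangle \in QP} \mathrm{SPL}(Q)$$

is preferred where

$$\mathrm{SPL}(Q) := \big|\, |D^+(Q)| - |D^-(Q)| \,\big| + |D^0(Q)|$$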

Hence, this measure is optimized by queries Q for which the number of leading diagnoses predicting the positive answer is equal to the number of leading diagnoses predicting the negative answer and for which D0(Q) is the empty set.
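The split-in-half criterion admits an equally short Python sketch. It assumes the score SPL(Q) = | |D+(Q)| − |D−(Q)| | + |D0(Q)| as used in [SFFR12], with smaller values being better; the function names and the toy pool are hypothetical.

```python
from typing import Dict, Set, Tuple

QPartition = Tuple[Set[str], Set[str], Set[str]]  # (D+, D-, D0)


def spl(q_partition: QPartition) -> int:
    """Split-in-half score; 0 means an even split with empty D0."""
    d_plus, d_minus, d_zero = q_partition
    return abs(len(d_plus) - len(d_minus)) + len(d_zero)


def best_query_spl(pool: Dict[str, QPartition]) -> str:
    """Pick a query with minimal SPL score from the pool."""
    return min(pool, key=lambda q: spl(pool[q]))


# Toy data (made up, unrelated to Table 8.3).
pool = {
    "Qa": ({"D1", "D2"}, {"D3", "D4"}, set()),   # even split, D0 empty
    "Qb": ({"D1"}, {"D2", "D3", "D4"}, set()),   # uneven split
    "Qc": ({"D1"}, {"D2"}, {"D3", "D4"}),        # even split, D0 non-empty
}
best = best_query_spl(pool)
```

Unlike the entropy-based score, this one ignores the diagnosis probabilities entirely, which is why it needs no probability measure as an argument.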

Risk-Optimized Query Selection. For scenarios where a-priori probabilities are vague, we have presented another more complex query selection measure RIO() in [RSFF13] which uses a reinforcement learning strategy to constantly adapt some “risk” parameter that indicates the current amount of trust in the probabilities. Whereas ENT() and SPL() do not rely on qData, this learning strategy does so and requires the invalidation rate or “performance”, i.e. |Dout| / |D✓|, of the previous iteration for the adaptation of the learning parameter. As long as the invalidation rate is “good”, the trust in the current (a-posteriori) probabilities (which strongly depend on the vague a-priori probabilities) is high, but it is gradually decreased after observing “worse” performance, and so on. High trust in the probabilities means usage of ENT() which can exploit high quality fault information well as demonstrated in the experiments conducted in [SFFR12], whereas low trust involves selection of queries that guarantee a higher worst case invalidation rate, i.e. have similar properties to queries SPL() would select.

Example 9.1 Let us reconsider the queries and associated q-partitions for the example DPI of Table 15.2 that are depicted by Table 8.3 on page 113. Let us denote by Qi ≺M Qj that Qi is preferred over Qj and by Qi ≺≻M Qj that Qi is equally preferable as Qj if the query selection measure qsm() := M is used. Furthermore, we make the assumption that the probability distribution pD of the (leading) diagnoses D✓ = {D1, . . . , D4} is as shown in Table 9.1.

Then, we make the following observations:

Q6 is the theoretically optimal query w.r.t. ENT() since pD(D+(Q6)) = 0.5, pD(D−(Q6)) = 0.5 and D0(Q6) = ∅, i.e. the positive and the negative answer have equal probabilities of 50% and thus Q6 has the highest theoretically possible information gain of 1 (bit). This can be compared with one toss of a coin, where the information gain of tossing the coin and checking whether it shows heads or tails is highest when the coin is fair. For a coin that shows heads with a probability of 0.95, conversely, the information gain of tossing the coin is rather small since we are already quite sure about the result in advance.

image

Table 9.1: (Example 9.1) Diagnosis probabilities for the example DPI given by Table 15.2.

Q9 ≺M Q5 as well as Q9 ≺M Q2 for M ∈ {SPL(), ENT()} because both Q5 and Q2 share one set in {D+, D−} with Q9, but exhibit a non-empty set D0 whereas D0(Q9) = ∅. This shows that both split-in-half and entropy-based query selection penalize a query Q if there are leading diagnoses that are definitely not discriminated by it, i.e. D0(Q) ≠ ∅. This is perfectly desirable as we discussed.

Q4 ≺≻M Q10 for M ∈ {SPL(), ENT()} since their q-partitions differ just by commutation of the sets D+ and D−. This is what one would expect of such a measure, i.e. that it does not matter whether the positive or negative answer is more probable if the probability values are the same (in case of ENT()) and whether the number of diagnoses predicting the positive or negative answer is higher if the numbers are the same (in case of SPL()). However, notice that Q4 might be much easier to comprehend and answer for the interacting user. Therefore, Q4 might be preferred in a scenario where some second measure qsm2() comes into play to identify a best query among equally preferable queries w.r.t. some qsm1() that is used as a primary measure. For example, some “query-easiness” measure qsm2() might be employed after qsm1() ∈ {SPL(), ENT()} has filtered out an equally preferable set of queries; in this case let this set be {Q4, Q10}. The measure qsm2() could be defined to simply count the logical connectives and quantifiers occurring in a query Q and pick one for which this number is minimal. In this case, this number would be 0 for Q4 and 7 for Q10, wherefore Q4 would be decisively better than Q10 w.r.t. qsm2().

It holds that Q3 ≺ENT() Q10 ≺ENT() Q1, but Q3 ≺≻SPL() Q10 ≺≻SPL() Q1. The former holds since all three queries feature an empty set D0, but the difference between p(D+) and p(D−) is largest for Q1 (p(D+(Q1)) = 0.95), second largest for Q10 (p(D−(Q10)) = 0.85) and smallest for Q3 (p(D+(Q3)) = 0.7).

Q9 is the second best query w.r.t. ENT() among those given in Table 8.3 because both answers of it are almost equally probable (positive answer has a probability of 0.55 and negative answer a probability of 0.45).

Queries Q7, Q8 and Q9 are theoretically optimal w.r.t. the SPL() measure, since D0 = ∅ and |D+| = |D−| for all of them.

Regarding the RIO() measure, queries Q7, Q8 and Q9 are “no risk” queries since they feature the maximum possible worst case elimination rate of 50%. Q2 and Q6, for instance, have a “higher risk” as their minimal invalidation rate amounts to only 25%. That is, if Q2 (Q6) is answered positively (negatively), then only one of four leading diagnoses is invalidated.
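The worst case invalidation rate referred to in the last observation can be computed directly from a q-partition. The following Python sketch assumes that this rate is the smaller of |D+(Q)| and |D−(Q)| divided by the number of leading diagnoses, which matches the 50% and 25% figures above; the function name and the two toy q-partitions are hypothetical and not taken from Table 8.3.

```python
from fractions import Fraction
from typing import Set, Tuple

QPartition = Tuple[Set[str], Set[str], Set[str]]  # (D+, D-, D0)


def worst_case_invalidation_rate(q_partition: QPartition) -> Fraction:
    """Fraction of leading diagnoses invalidated by the less favorable answer.

    A positive answer invalidates D-, a negative answer invalidates D+.
    """
    d_plus, d_minus, d_zero = q_partition
    total = len(d_plus) + len(d_minus) + len(d_zero)
    return Fraction(min(len(d_plus), len(d_minus)), total)


# Toy q-partitions over four leading diagnoses (made up for illustration).
no_risk = ({"D1", "D2"}, {"D3", "D4"}, set())
higher_risk = ({"D1", "D2", "D3"}, {"D4"}, set())
```

For the first toy q-partition either answer invalidates two of four diagnoses (rate 1/2), whereas for the second one a positive answer invalidates only one of four (rate 1/4).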

9.4 Interactive Debugging Algorithm: Correctness and Complexity

First, we prove the correctness of Proposition 9.1 on page 124 by using the results of Sections 11.4 and 12.4.10 which provide evidence for the correctness (soundness, completeness and optimality) of methods STATICHS and DYNAMICHS:

Proof of Proposition 9.1. First, we argue why Algorithm 5 must terminate. The function GETFORMULAPROBS in line 5 terminates since it applies Formulas 4.2 and 4.7 |K| times and |K| is finite by Definition 3.1. If mode = static, then STATICHS terminates due to Proposition 11.1. If mode = dynamic, then DYNAMICHS terminates due to Corollary 12.8. GETPROBDIST terminates since (1) the number of already answered queries |QA| is finite, (2) |D✓| is finite since diagnoses are subsets of K and thus there is only a finite number of (minimal) diagnoses w.r.t. any DPI according to Definition 3.1 (since all sets included in the DPI are finite) and (3) reasoning (GETENTAILMENTS and ISKBVALID) is assumed to be decidable for the logic L over which the DPI is formulated as per Chapter 2. Further, GETMODE clearly terminates due to the fact that |D✓| is finite and returns the mode Dmax of the diagnoses probability distribution pD() over the diagnoses in D✓. Now, if the stop criterion pD(Dmax) ≥ 1 − σ is met, then GETSOLKB is called. GETSOLKB simply deletes the given diagnosis Dmax from the given KB K and adds a finite set of formulas to it, and thence terminates.

If the stop criterion is not met, then |D✓| ≥ 2 must hold as otherwise the single diagnosis D ∈ D✓ would necessarily have fulfilled the stop criterion as its probability as per any probability measure over the sample space Ω := D✓ must be equal to 1 and thus greater than or equal to 1 − σ where σ ≥ 0.

Due to |D✓| ≥ 2, Proposition 8.10 implies that GETPOOLOFQUERIES (called within CALCQUERY) terminates and yields a non-empty query pool as output. SELECTBESTQUERY (also called within CALCQUERY) terminates as well since it simply selects one query from the pool according to the measure qsm() (cf. Section 9.3). Since we assume the interacting user to answer to a query or to reject it within finite time, u(Q) also terminates. It is clear that APPEND terminates. GETINVALIDDIAGS simply extracts one entry of the given q-partition and thus terminates. Finally, UPDATEQDATA also terminates by assumption (no qsm() must be used for which UPDATEQDATA might not terminate). As a consequence, all functions called in Algorithm 5 terminate. What remains to be proven is that the stop criterion must be met after a finite number of iterations, i.e. after a finite number of test cases have been added to the input DPI.

In mode = static the stop criterion must be satisfied after a finite number of iterations due to the following argumentation:

There is a finite set of minimal diagnoses w.r.t. the input DPI ⟨K, B, P, N⟩R since each (minimal) diagnosis w.r.t. this DPI is a subset of K according to Definition 3.5 and since |K| is finite by Definition 3.1.

In each iteration, one test case is added either to P′ or N′.

Each test case added to either set P′ or N′ invalidates at least one minimal diagnosis w.r.t. the input DPI in the set D✓ by the definition of a query (Definition 7.1) and since each query is computed w.r.t. the leading diagnoses D✓ by the correctness of GETPOOLOFQUERIES (cf. Proposition 8.10).

D✓ contains only minimal diagnoses w.r.t. the input DPI by Proposition 11.1.

Also by Proposition 11.1, no invalidated minimal diagnosis w.r.t. the input DPI can be an element of some subsequent set of leading diagnoses D✓.

Therefore, unless the stop criterion is met before due to a sufficiently high probability of one of multiple leading diagnoses as per pD(), Algorithm 5 in mode = static must arrive at a point where |D✓| = 1 after a finite number of iterations. Note that |D✓| = 0 is impossible due to the definition of a query (Definition 7.1) which ensures that each added test case leaves valid at least one minimal diagnosis in D✓.

Algorithm 5 terminates in mode = dynamic since for any sequence QA of queries that are added to the positive or negative test cases P′ or N′, respectively, there is a finite number kQA such that there is no more than one minimal diagnosis w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R for |P′| + |N′| = kQA, wherefore the stop criterion must be met. Now, let us assume that the opposite holds. That is, there is a sequence QA∗ of queries that are added to the positive or negative test cases P′ or N′, respectively, and for all natural numbers k there is more than one minimal diagnosis w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R for |P′| + |N′| = k. Then we argue as follows to derive a contradiction:

There is a finite set of (minimal) diagnoses w.r.t. any DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R obtained from the input DPI by the addition of test cases. This is true since |K| is finite by Definition 3.1 and since each (minimal) diagnosis w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R is a subset of K according to Definition 3.5.

In each iteration, one test case is added either to P′ or N′.

Each test case added to either set P′ or N′ invalidates at least one minimal diagnosis w.r.t. the current DPI in the set D✓ by the definition of a query (Definition 7.1) and since each query is computed w.r.t. the leading diagnoses D✓ by the correctness of GETPOOLOFQUERIES (cf. Proposition 8.10).

If DPI denotes the current DPI at the time DYNAMICHS is called, then the set D✓ returned by DYNAMICHS is a subset of or equal to mDDPI, i.e. D✓ contains only minimal diagnoses w.r.t. DPI by Corollary 12.8.

Let ⟨DPI0, DPI1, . . . ⟩ denote the sequence of DPIs encountered in the case of adding answered queries as test cases to the input DPI DPI0 as per QA∗. Further, let ⟨aD0, aD1, . . . ⟩ be the sequence such that aDi := aDDPIi, i = 0, 1, . . ., i.e. aDi is the set of all diagnoses w.r.t. DPIi. Then aDi ⊃ aDi+1 for all i ≥ 0 due to Corollary 12.4.

As each query added as a test case to DPIi leaves valid at least one (minimal) diagnosis w.r.t. DPIi due to Definition 7.1, we have that aDk ⊃ ∅ for k = 0, 1, . . . .

Since aDi is finite, there must be some finite number k∗ such that |aDk∗| = 1, wherefore |mDk∗| = 1 must also be valid. This is a contradiction.

Thence, Algorithm 5 terminates in any mode mode. Now, we show that propositions (1)-(6) of Proposition 9.1 hold for (i) mode = static and (ii) mode = dynamic.

(i): First, by the proof so far, we have that Algorithm 5 in mode = static given the input DPI ⟨K, B, P, N⟩R terminates. Since the only point where the algorithm can terminate is line 14, GETSOLKB is called with arguments ⟨Dmax, ⟨K, B, P ∪ P′, N ∪ N′⟩R, P′, static⟩. By the definition of GETSOLKB (see Section 9.2.4), we have that (K \ Dmax) ∪ UP is returned by the algorithm.

Propositions (1) and (2) follow from the specification of the GETMODE function which is called with arguments ⟨D✓, pD()⟩. Proposition (3) is true since GETSOLKB can never be reached without pD(Dmax) ≥ 1 − σ being fulfilled. D✓ ⊆ mD⟨K,B,P,N⟩R ∩ mD⟨K,B,P∪P′,N∪N′⟩R is true due to Proposition 11.1, Remark 9.5 and the fact that D✓ is obtained as an output of STATICHS. Hence, Proposition (4) holds. Proposition (5) is implied by Remark 9.5 and by the specification of the GETFORMULAPROBS function which computes pK() from p �K∪K() as per Formulas 4.2 and 4.7 in line 5. Finally, Proposition (6) is a consequence of the definition of the GETPROBDIST function which accounts for the computation of pD() from pK(), the input DPI, D✓ and the chronological sequence of all queries and associated answers QA so far. Therefore, Proposition 9.1 is true for mode = static.

(ii): First, by the proof so far, we have that Algorithm 5 in mode = dynamic given the input DPI $\langle K, B, P, N \rangle_R$ terminates. Since the only point where the algorithm can terminate is line 14, GETSOLKB is called with arguments $\langle D_{max}, \langle K, B, P \cup P', N \cup N' \rangle_R, P', dynamic \rangle$. By the definition of GETSOLKB (see Section 9.2.4), we have that $(K \setminus D_{max}) \cup U_{P \cup P'}$ is returned by the algorithm.

Propositions (1) and (2) follow from the specification of the GETMODE function, which is called with arguments $\langle \mathbf{D}_{\checkmark}, p_{\mathbf{D}}() \rangle$. Proposition (3) is true since GETSOLKB can never be reached without $p_{\mathbf{D}}(D_{max}) \geq 1 - \sigma$ being fulfilled. $\mathbf{D}_{\checkmark} \subseteq \mathbf{mD}_{\langle K,B,P \cup P',N \cup N' \rangle_R}$ is true due to Corollary 12.8, Remark 9.5 and the fact that $\mathbf{D}_{\checkmark}$ is obtained as an output of DYNAMICHS. Hence, Proposition (4) holds. Proposition (5) is implied by Remark 9.5 and by the specification of the GETFORMULAPROBS function, which computes $p_K()$ from p �K∪K() as per Formulas 4.2 and 4.7 in line 5. Finally, Proposition (6) is a consequence of the definition of the GETPROBDIST function, which accounts for the computation of $p_{\mathbf{D}}()$ from $p_K()$, the input DPI, $\mathbf{D}_{\checkmark}$ and the chronological sequence of all queries and associated answers QA so far. Therefore, Proposition 9.1 is true for mode = dynamic.

Next, we show that the solution to Interactive Static KB Debugging is found for $\sigma = 0$ in case mode = static:

(s1) $\mathbf{D}_{\checkmark} \subseteq \mathbf{mD}_{\langle K,B,P,N \rangle_R} \cap \mathbf{mD}_{\langle K,B,P \cup P',N \cup N' \rangle_R}$ holds for the output of STATICHS in each iteration by Proposition 11.1. Therefore, $\mathbf{D}_{\checkmark}$ comprises only minimal diagnoses w.r.t. the input DPI that comply with all specified test cases in $P'$ and $N'$.


(s2) From p �K∪K() : �K ∪ K → (0, 1] we derive by Formula 4.2 that each formula in K must have a probability greater than zero. Further, by Formula 4.7, no formula in K can have a probability greater than or equal to 0.5 (in particular, a probability of 1 is not possible for a formula). Hence, we have $p_K : K \to (0, 0.5)$ for the measure $p_K()$ computed by GETFORMULAPROBS in line 5 of Algorithm 5. Since $p_K()$ is given as an input argument to STATICHS in line 8, and since $p_{nodes}()$ in STATICHS is defined on the basis of $p() := p_K()$ (cf. Definition 4.9 on page 73), no diagnosis can have an (a-priori) probability of zero. Since the function GETPROBDIST might only perform some multiplications of a diagnosis probability by 1/2, the a-posteriori probability of each diagnosis must also be greater than zero.

(s3) Hence, due to $\sigma = 0$, it must necessarily be true that $|\mathbf{D}_{\checkmark}| = 1$ before the algorithm terminates.

(s4) By Problem Definition 6.2 and the specification of the GETSOLKB function, the output solution KB must be the solution to Interactive Static KB Debugging.

That a solution found for $\sigma > 0$ in case mode = static might be an approximate solution to Interactive Static KB Debugging is a direct consequence of the definition of approximate solution given in Remark 9.2.

Finally, the proof that the solution to Interactive Dynamic KB Debugging is found for $\sigma = 0$ in case mode = dynamic is analogous to the one for mode = static, except for the following two steps:

(d1) $\mathbf{D}_{\checkmark} \subseteq \mathbf{mD}_{\langle K,B,P \cup P',N \cup N' \rangle_R}$ holds for the output of DYNAMICHS in each iteration by Corollary 12.8. Therefore, $\mathbf{D}_{\checkmark}$ comprises only minimal diagnoses w.r.t. the current DPI.

(d2) By (s2), (s3), Problem Definition 6.1 and the specification of the GETSOLKB function, the output solution KB must be the solution to Interactive Dynamic KB Debugging.

That a solution found for $\sigma > 0$ in case mode = dynamic might be an approximate solution to Interactive Dynamic KB Debugging is a direct consequence of the definition of approximate solution given in Remark 9.2.

This completes the proof of Proposition 9.1.

Next, we examine the complexity of Algorithm 5. To this end, we use the term expensive operation in the following to denote either a call of a (usually) expensive function, such as one that internally consults a logical reasoner, or another operation, such as addition or multiplication, that is the most time-consuming algorithmic action within a certain part of an algorithm. We analyze Algorithm 5 in terms of the number num of expensive operations that are required during its execution in the worst case. The worst-case time required by Algorithm 5 is then the product of num and the maximal worst-case time consumption of any expensive operation throughout the algorithm.

The next propositions assume |K| as an upper bound of $|P'| + |N'|$. This is plausible in the light of evaluations performed in e.g. [SFFR12, RSFF13], which substantiate that usually the size of the faulty KB exceeds the number of queries that are necessary to solve the interactive debugging problem by several orders of magnitude.

We first investigate the complexity of the function GETPROBDIST which is called once in each iteration of Algorithm 5:

Proposition 9.2. Let |K| be an upper bound of $|P'| + |N'|$. Then, the function GETPROBDIST in Algorithm 5 requires a number of expensive operations that is linear in |K|.

Proof. The time complexity of GETPROBDIST can be assessed by adding the complexities of (i) GETPRIODIAGPROBS, (ii) the for-loop between lines 30 and 41, (iii) the summation in line 42 and (iv) the for-loop in lines 43 and 44. The time complexity of (i) is in $O(n_{max}\,|K|)$ since $|\mathbf{D}_{\checkmark}| \leq n_{max}$, where $n_{max}$ is a predefined constant, and $|K| - 1$ multiplications must be conducted per diagnosis in $\mathbf{D}_{\checkmark}$. (ii) requires $|QA|\,|\mathbf{D}_{\checkmark}| \leq (|P'| + |N'|)\,n_{max} \leq |K|\,n_{max}$ calls to the functions GETENTAILMENTS and ISKBVALID, respectively, which internally call a logic reasoner. The time requirements of (iii) amount to $O(|\mathbf{D}_{\checkmark}|) = O(n_{max})$ summations. Finally, (iv) involves $O(n_{max})$ multiplications.

Thus, we obtain an overall time complexity of $O(n_{max}\,|K| + n_{max}\,|K| + n_{max} + n_{max}) = O(|K|)$ for GETPROBDIST.
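The counting argument of this proof can be replayed numerically. The following is a minimal sketch, not part of Algorithm 5: the function name `getprobdist_op_count` and its parameters are hypothetical and merely mirror the four parts (i)-(iv) of the proof, with the number of answered queries set to its assumed upper bound |K|.

```python
def getprobdist_op_count(k_size: int, n_leading: int, n_answered: int) -> int:
    """Count expensive operations along the four parts (i)-(iv) of the proof."""
    prior = n_leading * (k_size - 1)     # (i)   |K| - 1 multiplications per leading diagnosis
    reasoner = n_answered * n_leading    # (ii)  one reasoner-backed check per (query, diagnosis)
    summation = n_leading                # (iii) one summand per leading diagnosis
    normalisation = n_leading            # (iv)  one multiplication per leading diagnosis
    return prior + reasoner + summation + normalisation


N_MAX = 10  # assumed constant bound on the number of leading diagnoses

for k in (100, 1_000, 10_000):
    worst = getprobdist_op_count(k, N_MAX, n_answered=k)  # premise: |QA| <= |K|
    print(k, worst, worst / k)
```

Under these assumptions the ratio in the last column stays bounded by a constant that depends only on the chosen `N_MAX`, which is what linearity in |K| amounts to.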

The next proposition is based on this result and witnesses that Algorithm 5 requires only a quadratic number of expensive operations in the size of the KB K.

Proposition 9.3. Let |K| be an upper bound of $|P'| + |N'|$ and let the function qsm() given as input to Algorithm 5 be such that the time complexity of UPDATEQDATA is in O(|K|). Minus the time consumed by diagnosis computation (by STATICHS in case of mode = static or by DYNAMICHS otherwise), the time complexity of Algorithm 5 in terms of the number of required expensive operations is quadratic in |K|.

Proof. Variable instantiation (lines 1-4) and variable update (lines 18-26) are in O(1), where some query selection measure qsm() is supposed to be used for which the time complexity of UPDATEQDATA is in O(|K|) (this holds for all query selection measures described in Section 9.3). GETFORMULAPROBS, called in line 5, runs in $O(|K|\,|ax_{max}|)$ as Formula 4.2 is applied once to each formula in K, for each of which at most $|ax_{max}|$ multiplications are performed, where $|ax_{max}|$ is the maximum size of a formula in K in terms of included syntactical elements (multiple occurrences of one and the same symbol are counted multiply). As shown by Proposition 9.2, the complexity of GETPROBDIST, called in line 11, is in O(|K|). Execution of GETMODE needs one iteration over all diagnoses in $\mathbf{D}_{\checkmark}$ in order to determine the one with maximum probability, i.e. it runs in $O(n_{max}) = O(1)$ time since $n_{max}$ is a constant. Next, GETSOLKB, which computes a solution KB from a given diagnosis D, works in $O(|D| + |P| + |P'|) \subseteq O(|K|)$ since |D| elements need to be deleted from a set of cardinality |K|, which can be accomplished in constant time per element (e.g., using a hashtable), and additionally at most $|P| + |P'|$ set union operations are required, namely the union of $(K \setminus D)$ with $U_{P \cup P'}$, where the latter needs $|P| + |P'| - 1$ set union operations. As |P| is a constant c, $O(|D| + |P| + |P'|) \subseteq O(2c\,|K|) \subseteq O(|K|)$. In Section 8.5, we have already underlined that GETPOOLOFQUERIES is a fixed parameter tractable problem, i.e. it requires

[formula not preserved in the source; see Proposition 8.9]

calls to a reasoner in the worst case (cf. Proposition 8.9). Similarly, SELECTQUERY involves $O(2^{n_{max}})$ comparisons $qsm(Q_i) < qsm(Q_j)$ for $Q_i, Q_j \in \mathbf{QP}$ since the cardinality of the computed query pool is in $O(2^{n_{max}})$. The latter holds due to Proposition 8.10, which substantiates that the calculated query pool includes at most one query Q for which $\mathbf{D}^+(Q) = \mathbf{Y}$ for each $\mathbf{Y} \subset \mathbf{D}_{\checkmark}$. Moreover, an upper bound for the cardinality of $\mathbf{D}_{\checkmark}$ is the constant $n_{max}$. Therefore, the runtime of SELECTQUERY is in O(1), too.

Since we add up a number of time complexities, each of which is at most in O(|K|), we can conclude that the runtime of one iteration of Algorithm 5 minus the time needed for diagnosis computation is also in O(|K|), i.e. linear in |K| in terms of the number of expensive operations needed. As there might be a maximum of |K| iterations by the premise that $|P'| + |N'| \leq |K|$, we obtain an overall time complexity, minus the complexity of diagnosis computation, of $O(|K|^2)$ for Algorithm 5.
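The final step of this argument can be written out compactly. The symbols $c_{\mathrm{iter}}$ and $\#\mathrm{iter}$ are introduced here only for illustration and do not occur elsewhere in this work; they stand for the cost of one iteration (excluding diagnosis computation) and the number of iterations, respectively.

```latex
\begin{align*}
c_{\mathrm{iter}} &\in O(|K|)
  && \text{(one iteration, excluding diagnosis computation)} \\
\#\mathrm{iter} &\le |P'| + |N'| \le |K|
  && \text{(premise of Proposition 9.3)} \\
\#\mathrm{iter} \cdot c_{\mathrm{iter}} &\in O(|K| \cdot |K|) = O(|K|^2)
\end{align*}
```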

That is, Algorithm 5 requires only a quadratic number of expensive operations “outside” of the methods STATICHS or DYNAMICHS, respectively, that account for diagnosis computation. That the substantial complexity of Algorithm 5 lies in the computation of diagnoses, is confirmed by the following results.

The first result is based on the fact that determining minimal diagnoses w.r.t. a DPI is an MBD problem (cf. page 8) which in turn can be regarded as an abduction problem as defined in [BATJ91]. More precisely, the problem of detecting minimal diagnoses w.r.t. a DPI is a monotonic abduction problem [BATJ91]. Hence, the following proposition holds [BATJ91, Theorem 4.3]:

Proposition 9.4. Let $\langle K, B, P, N \rangle_R$ be a DPI over L and let ISKBVALID (see Algorithm 1) be a function computable for L in polynomial time w.r.t. the size of $\langle K, B, P, N \rangle_R$ (cf. the description of the function e in [BATJ91, Section 3.3]). Then, given a set $\mathbf{D}$ of minimal diagnoses w.r.t. $\langle K, B, P, N \rangle_R$ such that $\emptyset \subset \mathbf{D}$, it is NP-complete to determine whether there is a minimal diagnosis $D$ w.r.t. $\langle K, B, P, N \rangle_R$ such that $D \notin \mathbf{D}$.

Remark 9.11 The function ISKBVALID in the case of KB debugging is analogous to the function e used in [BATJ91]. Given the overall data $D_{all}$ that must be explained by a solution to an abduction problem, the function e computes for a subset H of $H_{all}$, the set of all individual hypotheses, the set e(H) = D where $D \subseteq D_{all}$ is the data explained by H. H is an explanation of the abduction problem iff it is set-minimal and $e(H) = D_{all}$ [BATJ91].

In the case of our KB debugging system, given a DPI $\langle K, B, P, N \rangle_R$, $D_{all}$ corresponds to the set of all requirements in R and all test cases in N violated by $K \cup B \cup U_P$. $H_{all}$ corresponds to K. So, e corresponds to ISKBVALID since ISKBVALID is given some $K \setminus D$ and $\langle \cdot, B, P, N \rangle_R$ (where D corresponds to some $H \subseteq H_{all}$) and checks whether $(K \setminus D) \cup B \cup U_P$ does not violate any requirement or test case, i.e. whether $e(H) = D_{all}$. Notice that ISKBVALID can easily be modified to return the subset of $D_{all}$ that is explained by H, i.e. the subset of the initially violated requirements and test cases that are resolved by deletion of D from $K \cup B \cup U_P$. To this end, the early termination in case of detected invalidity must simply be omitted.

Remark 9.12 An abduction problem is monotonic [BATJ91] iff for all $H, H' \subseteq H_{all}$ it holds that $H \subseteq H' \rightarrow e(H) \subseteq e(H')$. That parsimonious KB debugging (or the problems given by Problem Definitions 3.2, 6.2, 6.1, 9.2 and 9.1) seen as an abduction problem is indeed monotonic is a simple consequence of the monotonicity of the logic L over which a DPI must be defined (as per the postulations of Chapter 2). For, if $(K \setminus D') \cup B \cup U_P \models x$, then also $(K \setminus D) \cup B \cup U_P \models x$ for $D \subseteq D'$. Modeling requirements $r \in R$ as unwanted entailments of the correct KB (see Remark 3.2), we immediately see that D cannot resolve more unwanted entailments $x \in R \cup N$ than $D'$. Thence, parsimonious KB debugging is a monotonic abduction problem.
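The monotonicity property can be checked exhaustively on a toy instance. The sketch below is only an illustration under strong simplifying assumptions: the logic is restricted to propositional definite clauses, entailment is decided by forward chaining, and all names (`closure`, `resolved`, `is_monotonic`, the example KB) are hypothetical. Here `resolved` plays the role of e(H), i.e. it returns the unwanted entailments that vanish once the formulas in `deleted` are removed.

```python
from itertools import combinations


def closure(rules):
    """Forward chaining: all atoms entailed by a set of definite clauses (body, head)."""
    derived = set()
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and body <= derived:
                derived.add(head)
                changed = True
    return derived


def resolved(kb, background, unwanted, deleted):
    """Unwanted entailments no longer entailed after removing `deleted` from `kb`."""
    remaining = closure((kb - deleted) | background)
    return {x for x in unwanted if x not in remaining}


def subsets(s):
    items = list(s)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_monotonic(kb, background, unwanted):
    """Check D subset of D' implies resolved(D) subset of resolved(D') for all D, D'."""
    for big in subsets(kb):
        for small in subsets(big):
            small_res = resolved(kb, background, unwanted, small)
            if not small_res <= resolved(kb, background, unwanted, big):
                return False
    return True


FACT_A = (frozenset(), "a")
KB = frozenset({
    FACT_A,
    (frozenset({"a"}), "b"),
    (frozenset({"b"}), "c"),
    (frozenset({"a"}), "d"),
})
BACKGROUND = frozenset({(frozenset({"d"}), "e")})
UNWANTED = {"c", "e"}

print(is_monotonic(KB, BACKGROUND, UNWANTED))
```

For a monotonic logic the check succeeds by construction; the example only makes the argument of the remark tangible.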

Unfortunately, ISKBVALID is not tractable (i.e. computable in polynomial time) for many logics L. In particular, it is already in $\Delta_2^P = P^{NP}$ for PL (cf. the polynomial hierarchy defined by [MS72]). This holds since propositional satisfiability checking is NP-complete [Coo71, Kar72] and since ISKBVALID, in order to check the validity (see Definition 3.3) of a set of PL formulas X w.r.t. some PL DPI $\langle \cdot, B, P \cup P', N \cup N' \rangle_R$, requires a polynomial number of calls to a propositional satisfiability checker AlgSAT. For, by the definition of ISKBVALID (see Algorithm 1), one call of AlgSAT is required for testing whether $X \cup B \cup U_{P \cup P'}$ is consistent, and a maximum of $|N| + |N'|$ further calls are needed to verify whether $X \cup B \cup U_{P \cup P'} \cup \{\neg n\}$ is consistent for all $n \in N \cup N'$, i.e. whether $X \cup B \cup U_{P \cup P'} \not\models n$ for all $n \in N \cup N'$ (note that $\neg n$ refers to the formula $\neg ax_1 \vee \dots \vee \neg ax_k$ if $n := \{ax_1, \dots, ax_k\}$, cf. page 27). Since we assume $|P'| + |N'| \leq |K|$ and since |N| is a constant throughout the execution of Algorithm 5, the number $|N| + |N'| + 1 \leq |N| + |K| + 1$ of calls to AlgSAT performed by ISKBVALID is bounded by a polynomial in |K|.
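The call pattern described above can be sketched for propositional logic. This is not Algorithm 1 itself but a simplified illustration: the only requirement checked is consistency, formulas are modeled as Python predicates over a truth assignment, and the brute-force checker `CountingSat` (a hypothetical stand-in for AlgSAT) merely counts how often it is consulted.

```python
from itertools import product


class CountingSat:
    """Brute-force propositional satisfiability checker that counts its calls."""

    def __init__(self, atoms):
        self.atoms = list(atoms)
        self.calls = 0

    def is_satisfiable(self, formulas):
        self.calls += 1
        for values in product([False, True], repeat=len(self.atoms)):
            model = dict(zip(self.atoms, values))
            if all(f(model) for f in formulas):
                return True
        return False


def negate_test_case(n):
    """Negation of n = {ax_1, ..., ax_k}: at least one ax_i is false."""
    return lambda model: any(not ax(model) for ax in n)


def is_kb_valid(x, background, positives, negatives, sat):
    """One consistency call plus at most one call per negative test case."""
    base = list(x) + list(background) + [ax for p in positives for ax in p]
    if not sat.is_satisfiable(base):
        return False                      # inconsistent
    for n in negatives:
        if not sat.is_satisfiable(base + [negate_test_case(n)]):
            return False                  # base entails the negative test case n
    return True


def atom_a(m):
    return m["a"]


def atom_b(m):
    return m["b"]


def a_implies_b(m):
    return (not m["a"]) or m["b"]


sat = CountingSat(["a", "b"])
print(is_kb_valid([a_implies_b], [atom_a], [], [[atom_b]], sat), sat.calls)
```

In this example the first candidate KB entails the negative test case {b} and is rejected after two checker calls, which matches the bound of one call plus one call per negative test case.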

As a conclusion of this discussion and Proposition 9.4, we have:

Corollary 9.1. Let $\langle K, B, P, N \rangle_R$ be a PL DPI given as an input to Algorithm 5. Then, each call of STATICHS or DYNAMICHS within Algorithm 5 must solve (at least) an NP-complete problem by means of an oracle that requires a polynomial number of calls to another NP-complete oracle.

Proof. Both STATICHS and DYNAMICHS must return a set of at least $n_{min} \geq 2$ minimal diagnoses each time they are called (given that $n_{min}$ minimal diagnoses exist w.r.t. the given DPI) due to the specification of input parameter $n_{min}$ in Algorithm 5 and the calls of STATICHS and DYNAMICHS in lines 8 and 10, respectively. For the first call, this implies that at least two minimal diagnoses must be found. Hence, Proposition 9.4 applies to the complexity of finding the second minimal diagnosis during the execution of the first call of both STATICHS and DYNAMICHS, just that ISKBVALID does not terminate in polynomial time, but uses a polynomial number of calls to an NP-complete oracle (the propositional satisfiability checker).

In each subsequent call of any of the two methods STATICHS and DYNAMICHS, the existing set of leading diagnoses will contain at least one minimal diagnosis w.r.t. the current DPI (since each query leaves valid at least one leading diagnosis, cf. Definition 7.1), and at least one further minimal diagnosis w.r.t. this DPI must be extracted (cf. bullet (aii) in the characterization of the outputs of STATICHS and DYNAMICHS on page 128 ff.). Thus, Proposition 9.4 holds for the computation of the first diagnosis in any subsequent call of any of the two functions, just that ISKBVALID does not terminate in polynomial time, but uses a polynomial number of calls to an NP-complete oracle (the propositional satisfiability checker).

The general complexity of ISKBVALID is even worse if DPIs over more expressive logics such as OWL 2 are considered for which one single call of a reasoner invoked by ISKBVALID is already 2-NEXPTIME-complete [GHM+08, Kaz08].

However, in spite of these discouraging theoretical complexity results, debugging techniques similar to the ones discussed in this work have proven to perform reasonably in practice for many real-world KB debugging problems over DL and OWL languages, respectively [SFFR12, RSFF13, SFRF14c] which are more expressive than PL. For instance, we have shown in [SFFR12] that faulty real-world OWL KBs with sizes of up to over 33000 formulas are efficiently interactively debuggable with similar methods as those presented in this work (reaction time of the system, i.e. time between two successive queries: only 1 minute; average query length: not more than 4 formulas; overall number of queries: at most 14). Moreover, we have demonstrated in [RSFF13] that a pair of real-world OWL KBs (the first including over 11000 formulas, the second almost 5000) that has been automatically integrated by diverse ontology matching systems resulting in a faulty aligned KB (see Chapter 32 for details; we also list some matching systems there) can be debugged with absolutely reasonable time and query answering effort for the interacting user. In concrete terms, the RIO debugging strategy proposed in [RSFF13] (which can also be plugged in as a query selection measure into the system described in this work, see Section 9.3) involved an average reaction time of no more than 13 seconds and required an average number of queries to be answered by the user of no more than nine.

In this part we dealt with how the process of KB debugging can be designed so as to enable a (group of) user(s) to interact with the debugging software in order to achieve high quality solutions. We defined the problem of interactive static KB debugging as well as the problem of interactive dynamic KB debugging which “naturally” arise from the fact that the DPI in interactive KB debugging is always renewed after a new test case has been specified (a new query has been answered). The former problem searches for a solution KB w.r.t. the original DPI given as input such that this solution KB satisfies all test cases added during the debugging session and there is no other such solution KB. The latter problem searches for a solution KB w.r.t. the current DPI (i.e. the original DPI including all new test cases added throughout the debugging session so far) such that there is no other solution KB w.r.t. the current DPI.

We specified the pivotal notion of a query which constitutes the “interface” between the debugging system and the interacting user. Queries are sets of logical formulas satisfying the search space restriction as well as the solution preservation property. That is, incorporation of any answer to a particular query into the debugging process leads to a reduction of the search space for solutions on the one hand, but guarantees the existence of at least one remaining solution on the other hand. Queries are generated from a set of leading diagnoses that act as a representative of all (minimal) diagnoses. We established that, for any set of at least two leading diagnoses, a query exists. The unique q-partition of a query constitutes the relationship between a query and the set of leading diagnoses and can be used to decide for a set of logical formulas whether this set is or is not a query. Furthermore, the q-partition can be used to estimate the impact of a query answer on the (distribution of the) set of solutions and thence can be exploited to assess the (expected) quality of different queries which in turn can help to filter out a suitable query among a pool of possible queries.

We also presented how a pool of queries can be generated for a given set of leading diagnoses and a DPI. We showed how to minimize these queries in terms of the number of logical formulas they include, with the aim of straining the user(s) as little as possible when it comes to answering them. Moreover, we pointed out that query generation is a fixed parameter tractable problem due to the fact that the (maximum) number of leading diagnoses can be predefined and therefore constitutes a constant value (which does not grow as the diagnosis problem instance grows). We featured an in-depth discussion of the properties of the query generation algorithm, in the course of which we detected several drawbacks. These gave a hint to potential solutions that we will address in our future work. Additionally, we formally proved the correctness of the query generation method and derived complexity results. All of this was illustrated by means of several examples.

Finally, we explicated the central algorithm of this work which implements an interactive KB debugging system. First, an overview of the workflow of interactive KB debugging was given, followed by a more comprehensive detailed specification of the algorithm. Some query selection measures (all of which are later covered in more depth in Parts IV and V) were discussed and optimization versions of the problems of interactive dynamic and static KB debugging were defined where the goal is to obtain the solution to these problems by asking the user a minimal number of queries. Finally, we formally proved the correctness of the interactive KB debugging algorithm and gave a discussion of its complexity.


In this part we introduce and discuss two methods, STATICHS and DYNAMICHS, which are called in lines 8 and 10 of Algorithm 5, respectively. The former provides a method for solving the Interactive Static KB Debugging Problem (Problem Definition 6.2) whereas the latter aims at solving the Interactive Dynamic KB Debugging Problem (Problem Definition 6.1). Both are methods for iterative diagnosis computation that are employed to compute a set of leading diagnoses in each iteration of the presented interactive KB debugging algorithm (Algorithm 5). Each time a query has been answered by the interacting user and added to the respective set of test cases of the DPI, a subset of the leading diagnoses (and usually also a set of not-yet-computed minimal diagnoses) is invalidated. An iterative diagnosis computation method is then invoked to update the leading diagnoses set, taking into account the new information given by the recently added test case. That is, the $k \leq n_{max}$ most probable ways of solving the Interactive Static (Dynamic) KB Debugging Problem in the light of the new evidence are extracted by STATICHS (DYNAMICHS) after the search space has been suitably pruned. In this vein, if there is only one solution left, the (exact) solution of Interactive Static (Dynamic) KB Debugging has been found.

Chapter 11 provides an in-depth description of the static method and proves its correctness. Chapter 12 details the dynamic method and demonstrates its correctness. The practically oriented reader, or one who is willing to believe that the presented iterative diagnosis computation techniques in fact work as claimed, might skip Sections 11.4 and 12.4 in this part.

Computation Algorithm

As the name already suggests, STATICHS (Algorithm 7) is a procedure that solves the problem of Interactive Static KB Debugging defined by Problem Definition 6.2 if used for leading diagnosis computation in Algorithm 5. STATICHS is sound, complete and optimal w.r.t. the set of solutions of the Interactive Static KB Debugging problem (this will be proven in Section 11.4). Optimality refers to the best-first computation of minimal diagnoses regarding a given probability measure.

11.1 Overview and Intuition

The STATICHS algorithm is strongly related to the non-interactive hitting set algorithm HS (see Algorithm 2) in that, at any stage during the execution of Algorithm 5, the hitting set tree produced by STATICHS corresponds to some part of the complete (non-interactive) wpHS-tree built-up by Algorithm 2. This is achieved by the strategy to use new test cases only for the invalidation of diagnoses, and not for the computation of conflict sets (and thus diagnoses). That is, all minimal conflict sets are computed w.r.t. the input DPI. Thereby, the introduction of new diagnoses, i.e. ones that are not minimal diagnoses w.r.t. the input DPI, through addition of new test cases to the DPI is prohibited (cf. Proposition 4.6).

So, what STATICHS as a subroutine of Algorithm 5 does is gradually building up the standard (non-interactive) wpHS-tree in multiple phases. During each phase some new (not-yet-computed) minimal diagnoses w.r.t. the input DPI are computed, in the order of their probability, most probable ones first. Before such a newly detected minimal diagnosis is added to the set of leading diagnoses (Dcalc ∪ D✓), a test is performed that verifies that this new diagnosis is consistent with all test cases added to the input DPI so far. In this vein, all answered queries so far not only serve to eliminate a subset of the set of leading diagnoses at the time when the respective query is answered, but also to eliminate incompatible minimal diagnoses w.r.t. the input DPI that are found at some later point in time. However, in order to be eliminated due to a specified test case, a minimal diagnosis must first be computed. That is, no partial diagnoses can be eliminated due to newly specified test cases.

Between each two phases of tree construction, a query computed on the basis of the current set of leading diagnoses is posed to the user (this is accomplished directly in Algorithm 5). After incorporating the user’s answer, some leading diagnoses are eliminated (this is guaranteed by the definition of a query, see Definition 7.1). Moreover, the “state” of the tree is maintained during the execution of Algorithm 5 until STATICHS is again called in order to calculate further leading diagnoses. The state of the current partial wpHS-tree is stored by the variables

$\mathbf{D}_{calc} \cup \mathbf{D}_{\checkmark}$: computed minimal diagnoses w.r.t. the input DPI consistent with all test cases specified so far,

Q: the list of open, non-labeled nodes,

$\mathbf{C}_{calc}$: minimal conflict sets w.r.t. the input DPI computed so far and

$\mathbf{D}_{\times}$: computed minimal diagnoses w.r.t. the input DPI not consistent with all test cases specified so far.
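The stored state can be pictured as a small record. The following sketch is an assumption-laden illustration rather than the data structure of Algorithm 7: the class name `StaticHSState`, its field names and the `invalidate` helper are hypothetical, and diagnoses, conflict sets and nodes are modeled as frozensets of formula names. The initial values follow the first call of STATICHS, where all collections are empty and Q holds only the root node.

```python
from dataclasses import dataclass, field


@dataclass
class StaticHSState:
    """State of the partial wpHS-tree kept between two calls (names are illustrative)."""

    d_calc: set = field(default_factory=set)    # D_calc: diagnoses found in the current call
    d_check: set = field(default_factory=set)   # D_check: leading diagnoses still valid
    queue: list = field(default_factory=lambda: [frozenset()])  # Q: open nodes, root = empty set
    c_calc: set = field(default_factory=set)    # C_calc: minimal conflict sets found so far
    d_cross: set = field(default_factory=set)   # D_x: minimal diagnoses violating a test case

    def leading_diagnoses(self):
        """D_calc united with D_check, i.e. the current set of leading diagnoses."""
        return self.d_calc | self.d_check

    def invalidate(self, diagnosis):
        """Move a diagnosis that violates a new test case into D_x; it is kept, not dropped."""
        self.d_calc.discard(diagnosis)
        self.d_check.discard(diagnosis)
        self.d_cross.add(diagnosis)


state = StaticHSState()
d1 = frozenset({"ax1"})
d2 = frozenset({"ax2", "ax3"})
state.d_calc.update({d1, d2})
state.invalidate(d2)
print(state.leading_diagnoses(), state.d_cross)
```

Keeping invalidated diagnoses in a separate collection instead of deleting them reflects that they are still needed to recognize non-minimal diagnoses w.r.t. the input DPI later on.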

Each time a tree construction phase, i.e. the computation of new leading diagnoses, is finished, a new diagnosis probability distribution is obtained by the diagnosis probability update as per Bayes’ Theorem described in Section 9.2. Once this distribution involves one highly probable diagnosis (the probability of which exceeds a predefined threshold $1 - \sigma$) and otherwise just highly improbable ones, the algorithm terminates. The output is a solution KB w.r.t. the input DPI built from this highly probable minimal diagnosis.
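The stop criterion can be stated in a few lines. This is a minimal sketch, assuming that the diagnosis probabilities are already normalized and given as a dictionary; the function names `most_probable` and `should_stop` are hypothetical and the comparison uses the form $p_{\mathbf{D}}(D_{max}) \geq 1 - \sigma$.

```python
def most_probable(diagnosis_probs):
    """Return (diagnosis, probability) for the most probable leading diagnosis."""
    return max(diagnosis_probs.items(), key=lambda item: item[1])


def should_stop(diagnosis_probs, sigma):
    """True iff the most probable diagnosis reaches the threshold 1 - sigma."""
    if not diagnosis_probs:
        return False
    _, p_max = most_probable(diagnosis_probs)
    return p_max >= 1.0 - sigma


probs = {"D1": 0.97, "D2": 0.02, "D3": 0.01}
print(should_stop(probs, sigma=0.05))
print(should_stop(probs, sigma=0.0))
print(should_stop({"D1": 1.0}, sigma=0.0))
```

For the example distribution the criterion is met with a fault tolerance of 0.05 but not with 0; with a fault tolerance of 0 it is only met once a single diagnosis carries the entire probability mass.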

Remark 11.1 In case $\sigma$ has a predefined value of zero, the output is the (exact) solution to the problem of Interactive Static KB Debugging for the input DPI. In a scenario where some fault tolerance $\sigma > 0$ is given, the solution KB returned by Algorithm 5 is an approximation of the (exact) solution to Interactive Static KB Debugging for the input DPI, where a better approximation can be expected for smaller values of $\sigma$ (cf. Remark 9.2). “Better” in this context refers to the satisfaction of desired semantic properties of the KB returned by Algorithm 5, i.e. desired entailments and desired non-entailments of the KB. The intuition is that the specification of additional test cases T guarantees the output of a KB complying with these test cases, whereas accepting one of multiple solution KBs, albeit a highly probable one, without having incorporated T leaves open the possibility for this KB to not fulfill T.

However, answering queries requires effort from the interacting user. Therefore, the approach that involves the “early” termination of the algorithm once a solution KB has a sufficiently high probability (lower than 1) constitutes a trade-off between the exactness of the output on the one hand and the effort of the user and the overall execution time of the interactive KB debugging algorithm on the other hand.

Constant “Convergence” towards the Solution. As said, each added test case is an answered query and thus eliminates at least one minimal diagnosis w.r.t. the input DPI. And, only minimal diagnoses w.r.t. the input DPI are computed by STATICHS. Hence, by the fact that a solution to Interactive Static KB Debugging can only be constructed from a minimal diagnosis w.r.t. the input DPI, it is guaranteed that the number of solutions to Interactive Static KB Debugging is strictly monotonically decreasing throughout the execution of Algorithm 5. That is, the initial number of (all) minimal diagnoses (w.r.t. the input DPI) is “static” which means that no “new” minimal diagnoses can be introduced when the input DPI is extended by new test cases.

As a consequence of this, it is reasonable to employ STATICHS in a situation where the (complete) wpHS-tree produced by the standard (non-interactive) algorithm HS is believed to be compact enough to fit into the available system memory. In this case, STATICHS is also guaranteed not to exceed the available memory, even if an exact solution ($\sigma = 0$) is intended.

Unfortunately, however, it will be generally the case that a complete enumeration of all minimal diagnoses is intractable, especially due to an overwhelming space complexity. In such a case, Algorithm 5 using STATICHS will definitely run out of memory (given that STATICHS is called sufficiently often). The reason is that the space consumption of STATICHS will sooner or later definitely reach the huge extent of the wpHS-tree produced by HS. Nevertheless, STATICHS might be used to (possibly) find some (approximate) solution. This might work in a scenario where the given probabilistic information in terms of  p �K∪K()provided as an input to Algorithm 5 is “reasonable” in that the desired diagnosis is assigned a rather high probability and is thus figured out early, before the available memory is exhausted.

A possible modification of the stop criterion in STATICHS in a way that new leading diagnoses are not computed until a desired number of such is detected or a timeout is reached, but rather until a predefined maximum space is consumed, would not mitigate space complexity issues very much. An explanation for this is that stopping STATICHS on account of no more available memory implies that no further call of STATICHS will be able to execute. That is because, as mentioned before, an added test case can only invalidate already computed diagnoses, no other branches in the wpHS-tree, and each invalidated minimal diagnosis cannot be discarded, but must be stored (in  D×) to avoid the usage of leading diagnoses that are non-minimal w.r.t. the input DPI (cf. lines 21-23 in Algorithm 7).

Poor Search Tree Pruning. As we explained before, the preservation of a constantly shrinking set of minimal diagnoses comes at the cost of being able to exploit new test cases only partially, i.e. only for the invalidation of already computed minimal diagnoses w.r.t. the input DPI and not for the computation of minimal conflict sets and thus minimal diagnoses. The incorporation of test cases into the DPI that is used to determine minimal conflict sets (line 30 in Algorithm 7) could, on the one hand, lead to new minimal conflict sets that are no minimal conflict sets w.r.t. the input DPI. As a consequence of this, minimal diagnoses might be determined by the algorithm which are no minimal diagnoses w.r.t. the input DPI, but w.r.t. the current DPI. Hence, the soundness of STATICHS w.r.t. the set of solutions of the Interactive Static KB Debugging problem would be violated. Furthermore, such conflict sets could lead to the missing of some minimal diagnoses w.r.t. the input DPI, a violation of the completeness of STATICHS w.r.t. the set of solutions of the Interactive Static KB Debugging problem.

On the other hand, the exploitation of new test cases for conflict set generation might give rise to the possibility of pre-pruning any tree branches, not just branches that already correspond to diagnoses w.r.t. the input DPI. Such a “dynamic” strategy, which exploits the new information given by a test case not just partially, but for the invalidation and computation of diagnoses and conflict sets, will be implemented by DYNAMICHS, which we will detail in Chapter 12.

Put another way, in STATICHS only the standard pruning rules for the construction of a wpHS-tree are applicable, namely the deletion of duplicate nodes and the elimination of non-minimal diagnoses (cf. Definition 4.10). Newly defined test cases only facilitate the deletion of tree branches from the leading diagnoses set  Dcalc ∪ D✓, but not from memory (as invalidated minimal diagnoses must be stored in D×, as pointed out before).

To summarize, STATICHS on the one hand makes sure to only consider relevant solutions of the problem of Interactive Static KB Debugging, but on the other hand suffers from this conservative strategy in that tree pruning cannot be designed very effectively. So, on the positive side, uncontrolled growth of the produced wpHS-tree can be avoided, but, on the negative side, consultation of an interacting user cannot be taken advantage of in terms of reduction of the space complexity of STATICHS compared to the construction of a wpHS-tree by a non-interactive procedure like Algorithm 2.

11.2 Algorithm Walkthrough

Input Parameters. When STATICHS (Algorithm 7) is called for the first time in Algorithm 5, the inputs Ccalc, D✓, D×, P′ and N′ correspond to the empty set and Q = [∅] (cf. lines 1-4 and 8 in Algorithm 5). Furthermore, Dcalc is defined to be the empty set at the beginning of each execution of STATICHS. That is, STATICHS starts the construction of the wpHS-tree from an initial tree consisting of a single unlabeled root node ∅ (∈ Q), and all collections that are later returned by STATICHS, except for Q, are initially empty. Further input arguments are the DPI ⟨K, B, P, N⟩R provided as an input to Algorithm 5, the sets of positively (P′) and negatively (N′) answered queries since the start of Algorithm 5, the leading diagnosis computation parameters nmin, nmax, t (see the description in Chapter 7 on page 95) and the probability measure p() := pK() that assigns a probability in the interval (0, 0.5) to each formula in K (cf. line 5 in Algorithm 5).

The Main Loop. During the repeat-loop, in each iteration the first node node in Q is processed (GETFIRST, line 5). That is, node is deleted from Q (DELETEFIRST, line 6) and the SLABEL function is called given node (i.a.) as a parameter. Notice that elements are added to Q (line 17) in a way that a sorting of Q in descending order according to  pnodes()(cf. Definition 4.9) is maintained throughout the execution of STATICHS.

Computation of a Node Label. The SLABEL function processes node as follows. First, the non-minimality criterion (lines 21-23) is checked. That is, among all nodes in D(×,✓,calc) = D× ∪ D✓ ∪ Dcalc, one is searched which is a subset of node. If such a node nd is found, then node must be a non-minimal diagnosis (nd ⊂ node) or a duplicate diagnosis (nd = node) w.r.t. ⟨K, B, P, N⟩R since all sets D×, D✓ and Dcalc contain only minimal diagnoses w.r.t. ⟨K, B, P, N⟩R. In this case, the branch in the wpHS-tree corresponding to node can be dismissed, which is accounted for by returning the label closed for node.

In case the non-minimality criterion is not satisfied, the duplicate criterion (lines 24-26) is checked next. Here, Q is browsed for a node that is equal to node. If such a one is found, node can be discarded because it suffices to consider only one tree branch among multiple tree branches in the wpHS-tree featuring one and the same set of edge labels. Hence, closed is returned as a label for node. Altogether, this means that only the last processed exemplar of a node corresponding to one and the same set of edge labels is labeled, all others are discarded.

If the duplicate criterion is not met, the reuse criterion (lines 27-29) is checked next. That is, Ccalc is browsed for a set C (Ccalc comprises only minimal conflict sets w.r.t. ⟨K, B, P, N⟩R) such that C and node are disjoint sets. If such a C is detected, then C can be used to label node since the set of edge labels along the path in the wpHS-tree leading from the root node to node does not hit C. In this case, the label C is returned for node by SLABEL.

Given that the reuse criterion fails, QX is called given the DPI ⟨K \ node, B, P, N⟩R as an argument (line 30). If the output L is equal to ’no conflict’, then we know by Proposition 4.9 that node is a diagnosis w.r.t. ⟨K, B, P, N⟩R, wherefore the label valid is returned for node. Otherwise, the output L must be a minimal conflict set w.r.t. ⟨K, B, P, N⟩R that has an empty set-intersection with node. Since the reuse criterion failed, i.e. there is no set in Ccalc that does not intersect with node, L must be a fresh minimal conflict set w.r.t. ⟨K, B, P, N⟩R in the sense that L ∉ Ccalc must hold. Therefore the label L is first added to Ccalc and then returned by SLABEL as a label for node.
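The four checks performed by SLABEL can be summarized in a short Python sketch. It is only an illustration under stated assumptions, not a transcription of Algorithm 7: nodes and conflict sets are modeled as frozensets of formula indices, the names `slabel` and `toy_qx` are hypothetical, and `toy_qx` replaces the reasoner-backed QX call by a lookup in a fixed list of minimal conflict sets. The first set, {1, 2, 5}, is the root label that appears in Example 11.1; the second, {1, 2, 7}, is an assumption chosen so that the minimal hitting sets are [1], [2] and [5, 7].

```python
from typing import Callable, FrozenSet, List, Union

Node = FrozenSet[int]                # set of edge labels on the path to a node
Label = Union[str, FrozenSet[int]]   # "closed", "valid" or a minimal conflict set

# Hypothetical stand-in for the minimal conflict sets a reasoner would find.
TOY_CONFLICTS: List[FrozenSet[int]] = [frozenset({1, 2, 5}), frozenset({1, 2, 7})]


def toy_qx(node: Node) -> Label:
    """Stand-in for QX on the DPI with K minus node (assumption, no reasoner)."""
    for conflict in TOY_CONFLICTS:
        if conflict.isdisjoint(node):
            return conflict
    return "no conflict"


def slabel(node: Node, queue: List[Node], c_calc: List[FrozenSet[int]],
           known_diagnoses: List[Node], qx: Callable[[Node], Label]) -> Label:
    # Non-minimality criterion: node contains an already known minimal diagnosis.
    if any(nd <= node for nd in known_diagnoses):
        return "closed"
    # Duplicate criterion: an equal node is still waiting in the queue.
    if any(other == node for other in queue):
        return "closed"
    # Reuse criterion: a stored conflict set that node does not hit.
    for conflict in c_calc:
        if conflict.isdisjoint(node):
            return conflict
    # Otherwise ask QX; a returned conflict set is fresh and gets stored.
    result = qx(node)
    if result == "no conflict":
        return "valid"
    c_calc.append(result)
    return result
```

In this sketch the known diagnoses are passed as one collection, which corresponds to the union D× ∪ D✓ ∪ Dcalc used by the non-minimality criterion.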

Processing of a Node Label. Back in the main procedure, Ccalc is updated (line 8) and then the label L returned by the SLABEL function is processed as follows. If L = valid, then node is certainly a minimal diagnosis w.r.t. ⟨K, B, P, N⟩R, but it is not certain that node also meets all positive test cases P′ and all negative test cases N′ that have been specified and added to ⟨K, B, P, N⟩R so far. Thus, according to Proposition 7.3, the validity of the KB K \ node w.r.t. ⟨·, B, P ∪ P′, N ∪ N′⟩R must still be checked (line 10). If successful, node is added to the set Dcalc of calculated minimal diagnoses w.r.t. the input DPI that comply with all answered queries so far. Otherwise, node is added to the set D× of minimal diagnoses w.r.t. the input DPI that have been invalidated by some answered query.

Roughly, the minimality of diagnoses added to Dcalc is assured by the pruning rule (lines 21-23) which eliminates non-minimal nodes and the fact that pnodes() sorts a node nd′ corresponding to a superset of some node nd behind nd in Q.
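The ordering property can be made explicit with a short calculation. As an assumption (Definition 4.9 is not reproduced in this section), suppose that pnodes() has the usual product form in which every formula in the node contributes p(ax) and every other formula of K contributes 1 − p(ax). For a proper superset nd′ ⊃ nd, all common factors cancel:

```latex
\[
p_{\mathrm{nodes}}(nd) \;=\; \prod_{ax \in nd} p(ax) \prod_{ax \in K \setminus nd} \bigl(1 - p(ax)\bigr),
\qquad
\frac{p_{\mathrm{nodes}}(nd')}{p_{\mathrm{nodes}}(nd)}
\;=\; \prod_{ax \in nd' \setminus nd} \frac{p(ax)}{1 - p(ax)} \;<\; 1 .
\]
```

The inequality holds because p(ax) < 0.5 implies p(ax) / (1 − p(ax)) < 1 for each of the (at least one) formulas in nd′ \ nd. Under this assumption a superset always has a strictly smaller weight than its subset, so it is placed behind it in a list sorted in descending order.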

If, on the other hand, L = closed is the label returned by SLABEL, then node must simply be removed from Q which has already been executed in line 6. Thence, no actions are necessary (cf. line 14).

In the third case, if a minimal conflict set L is returned by SLABEL, then L is a label for node, meaning that |L| successor nodes of node, namely a node node ∪ {e} for all elements e ∈ L, need to be added to Q in sorted order using the function pnodes() (INSERTSORTED, line 17).
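Successor generation and sorted insertion might look as follows in Python. This is a sketch under assumptions rather than the paper's INSERTSORTED: `p_nodes` uses an assumed product form (p(ax) for formulas in the node, 1 − p(ax) for all others), the function names are hypothetical, ties are resolved first-in-first-out, and exact `Fraction` arithmetic is used so that nodes of equal weight really compare as equal. The probabilities in the usage lines are made up.

```python
from fractions import Fraction
from math import prod
from typing import Dict, FrozenSet, Iterable, List

Node = FrozenSet[int]


def p_nodes(node: Node, p: Dict[int, Fraction]) -> Fraction:
    """Assumed product form: p(ax) if ax is in node, else 1 - p(ax)."""
    return prod(p[ax] if ax in node else 1 - p[ax] for ax in p)


def insert_sorted(queue: List[Node], node: Node, p: Dict[int, Fraction]) -> None:
    """Insert node so that queue stays sorted by descending weight (ties: FIFO)."""
    weight = p_nodes(node, p)
    position = len(queue)
    for i, other in enumerate(queue):
        if p_nodes(other, p) < weight:
            position = i
            break
    queue.insert(position, node)


def add_successors(queue: List[Node], node: Node, conflict: Iterable[int],
                   p: Dict[int, Fraction]) -> None:
    """Add one successor node | {e} for every element e of the conflict label."""
    for e in sorted(conflict):
        insert_sorted(queue, node | {e}, p)


# Hypothetical equal fault probabilities: the queue then behaves like a FIFO queue.
p = {ax: Fraction(1, 10) for ax in (1, 2, 5, 7)}
queue: List[Node] = []
add_successors(queue, frozenset(), {1, 2, 5}, p)
print(queue)
```

With equal probabilities every node of the same size gets the same weight, so the sorted list degenerates to breadth-first processing.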

Stop Criterion. The first criterion causing STATICHS to terminate is Q = [], which means that the complete wpHS-tree has been constructed and no further nodes can be labeled. In this case, Dcalc ∪ D✓ comprises all minimal diagnoses w.r.t. ⟨K, B, P, N⟩R that are compliant with all the specified positive and negative test cases P′ and N′.

If the first criterion is not met, then the second criterion is checked. That is, a test is performed which checks whether the number of leading minimal diagnoses w.r.t. ⟨K, B, P, N⟩R in Dcalc ∪ D✓ amounts to at least nmin and either |Dcalc ∪ D✓| = nmax or more than t time has passed since the start of the execution of STATICHS. In the latter case, nmin ≤ |Dcalc ∪ D✓| < nmax holds. In the former case, |Dcalc ∪ D✓| = nmax is satisfied.
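The two stop criteria translate into a small predicate. The sketch below is illustrative only; the function name `should_stop` and its parameters are hypothetical, and `elapsed` and `t` are assumed to be given in the same time unit.

```python
from typing import Sequence


def should_stop(queue: Sequence, n_leading: int, n_min: int, n_max: int,
                elapsed: float, t: float) -> bool:
    """Return True if the tree construction should terminate.

    queue     -- list of open nodes Q
    n_leading -- number of leading diagnoses, |Dcalc ∪ D✓|
    """
    # First criterion: no open nodes left, the tree is complete.
    if len(queue) == 0:
        return True
    # Second criterion: at least n_min leading diagnoses AND
    # (exactly n_max leading diagnoses OR the time limit t is exceeded).
    return n_leading >= n_min and (n_leading == n_max or elapsed > t)
```

Note that exceeding the time limit alone is not sufficient: as long as fewer than nmin leading diagnoses are known and open nodes remain, the construction continues.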

Processing of the Leading Diagnoses Returned by STATICHS. When a call of STATICHS in Algorithm 5 returns ⟨Dcalc ∪ D✓, Q, Ccalc, D×⟩, the set Dcalc ∪ D✓ is stored in the variable D✓ in Algorithm 5. Between two successive calls of STATICHS in Algorithm 5, only this set D✓ as well as D× are modified. The list Q and the set Ccalc remain unchanged until they are used as input parameters to the next call of STATICHS in Algorithm 5.

In case one diagnosis Dmax of the current leading diagnoses in D✓ has a probability greater than or equal to 1 − σ as per the probability measure pD() (see Section 9.2), the stop criterion of interactive KB debugging is met and a solution KB w.r.t. ⟨K, B, P, N⟩R constructed from the input DPI ⟨K, B, P, N⟩R as well as from Dmax is returned to the user. Thereafter, Algorithm 5 terminates and no more calls of STATICHS take place.

Otherwise, if no leading diagnosis satisfies the stop criterion, a query Q together with its q-partition P(Q) is computed, as was detailed in Chapter 8 and Section 9.2. An answer u(Q) to this query is submitted by the interacting user (line 17 in Algorithm 5). Then u(Q) along with P(Q) is exploited to figure out the subset Dout of D✓ that does not comply with u(Q). This set Dout is then deleted from D✓ and added to D×. Additionally, Q is added to the positive test cases P′ if u(Q) = true and to the negative test cases N′ otherwise. Subsequently, STATICHS is called again given

• the updated parameters D✓, D×, P′ and N′ (which are modified within and outside of STATICHS during the execution of Algorithm 5),

• the unchanged parameters Q, Ccalc (which are modified only within STATICHS during the execution of Algorithm 5) and

• the constant parameters ⟨K, B, P, N⟩R, t, nmin, nmax and pK() (which are not modified within or outside of STATICHS during the execution of Algorithm 5).

The execution of this next and any subsequent call to STATICHS proceeds analogously to the description above.
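The bookkeeping performed between two calls of STATICHS can be sketched in Python as follows. The function name `apply_answer` and the data representation (diagnoses as frozensets of formula indices, queries as frozensets of strings, the q-partition as a triple ⟨D+, D−, D0⟩) are assumptions made for illustration. Following the examples in Section 11.3, a positive answer invalidates the diagnoses in D−(Q) and a negative answer invalidates those in D+(Q).

```python
from typing import FrozenSet, List, Set, Tuple

Diagnosis = FrozenSet[int]
Query = FrozenSet[str]
QPartition = Tuple[Set[Diagnosis], Set[Diagnosis], Set[Diagnosis]]  # (D+, D-, D0)


def apply_answer(d_ok: Set[Diagnosis], d_bad: Set[Diagnosis],
                 pos_tests: List[Query], neg_tests: List[Query],
                 query: Query, q_partition: QPartition, answer: bool):
    """Return the updated (D✓, D×, P′, N′) after the user answered the query."""
    d_plus, d_minus, _d_zero = q_partition
    # Diagnoses that do not comply with the answer.
    d_out = (d_minus if answer else d_plus) & d_ok
    new_pos = pos_tests + [query] if answer else list(pos_tests)
    new_neg = list(neg_tests) if answer else neg_tests + [query]
    return d_ok - d_out, d_bad | d_out, new_pos, new_neg


# Values taken from Example 11.1: P(Q1) = <{[1]}, {[2]}, {}> and u(Q1) = false.
D1, D2 = frozenset({1}), frozenset({2})
Q1 = frozenset({"E -> not A"})
d_ok, d_bad, p_new, n_new = apply_answer({D1, D2}, set(), [], [], Q1,
                                         ({D1}, {D2}, set()), False)
print(d_ok, d_bad)
```

The list Q of open nodes and the set Ccalc do not appear here because they are left untouched outside of STATICHS.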

Remark 11.2 We want to emphasize that queries are computed w.r.t. the current DPI  ⟨K, B, P ∪P′, N ∪N ′⟩Ralthough STATICHS focuses on solutions to the problem of Interactive Static KB Debugging which involves exclusively minimal diagnoses w.r.t. the input DPI  ⟨K, B, P, N ⟩R. However, a minimal diagnosis w.r.t.  ⟨K, B, P, N ⟩Rthat satisfies all positive test cases  P′as well as all negative test cases  N ′is also a minimal diagnosis w.r.t.  ⟨K, B, P ∪ P′, N ∪ N ′⟩R. And, a minimal diagnosis w.r.t.  ⟨K, B, P, N ⟩Rthat does not satisfy all positive test cases  P′as well as all negative test cases  N ′is not a minimal diagnosis w.r.t.  ⟨K, B, P ∪ P′, N ∪ N ′⟩R. These two facts are guaranteed by Proposition 12.5 that will be given on page 201.

Hence, it holds that

• D is a minimal diagnosis w.r.t. ⟨K, B, P, N⟩R that satisfies P′ ∪ {Q} as well as N′ if and only if D is a minimal diagnosis w.r.t. ⟨K, B, P ∪ P′ ∪ {Q}, N ∪ N′⟩R and

• D is a minimal diagnosis w.r.t. ⟨K, B, P, N⟩R that satisfies P′ as well as N′ ∪ {Q} if and only if D is a minimal diagnosis w.r.t. ⟨K, B, P ∪ P′, N ∪ N′ ∪ {Q}⟩R.

Therefore, each query constructed during Algorithm 5 with mode = static must be a query w.r.t. the current set of leading diagnoses  D✓and the current DPI  ⟨K, B, P ∪ P′, N ∪ N ′⟩R(cf. Equation 7.1, Definition 7.2 and Proposition 7.3 on pages 95-96).

As a consequence of this, no additional test is required in order to ascertain that each diagnosis in the set  D✓that is given as a parameter to the next call of STATICHS does in fact satisfy all answered queries so far.

11.3 Illustrating Examples

In this section we will give two examples of how interactive KB debugging using STATICHS (Algorithm 5 with parameter mode = static) works. The first one will show the similarities and differences between the usage of STATICHS (within Algorithm 5) and HS (within Algorithm 3) since it will depict the application of STATICHS on the same example DPI (see Table 15.3) that was used to show the functionality of HS in examples 4.8 and 4.9. At the same time, the first example will provide evidence that solving the problem of Interactive Static KB Debugging can be more efficient than solving the problem of Interactive Dynamic KB Debugging in terms of the number of query answers required from an interacting user. This will be discussed in more detail in Chapter 13.

The second example is supposed to deepen the reader’s understanding of the way STATICHS works. To this end, the example DPI provided by Table 4.2 will be used which constitutes a significantly harder (interactive) debugging task than the DPI investigated in the first example. This example will involve the construction of a relatively large hitting set tree and thereby give a presentiment of the space and time complexity problems caused by the poor tree pruning inherent in the STATICHS algorithm. In addition, this example will draw a reverse image of the first example in that it will stress the advantage of the decision to search for a solution of Interactive Dynamic KB Debugging rather than for a solution of Interactive Static KB Debugging (more on that in Chapter 13).

Example 11.1 In this example we assume that the author (called user throughout this example) of the (admissible) DPI  ⟨K, B, P, N ⟩Rgiven by Table 15.3 applies Algorithm 5 with mode = static to interactively debug  ⟨K, B, P, N ⟩R. Further, suppose the following user requirements:

In order to guarantee a fast reaction time of the system (the time between two successive queries to the user), the user wants each query to be computed from the minimally necessary number of leading diagnoses. Thus, in each iteration exactly two leading diagnoses should be computed by STATICHS (cf. Proposition 7.5). This postulation is reflected by setting  nmin = nmax = 2. Notice that the time limit t is irrelevant in this case.

Moreover, the user desires to get just any query, i.e. they do not demand any particular properties – such as optimal information gain among a pool of queries – to be satisfied by a query. This can be ensured by choosing q := 1 (cf. Chapter 8) and qsm() equal to any query selection measure described in Section 9.3.

The user is new to KB debugging and has neither an idea of faults they frequently make nor access to any kind of data that would indicate their tendency to certain types of faults. Thence,  pK(ax) := c < 0.5for all  ax ∈ K, i.e. all formula fault probabilities are specified to be equal (to some constant c). In such a case, if a formula fault probability measure  pK()is given as an input to Algorithm 5, then line 5 in Algorithm 5 is omitted. Please notice that this aspect is not shown in Algorithm 5.

Finally, the user’s intention is to get the (exact) solution to the problem of Interactive Static KB Debugging. This can be taken into account by specifying  σ := 0.

The tree constructed and parameters computed and used by Algorithm 5 using STATICHS are visualized by Figure 11.1. We use the same notation as in Figures 4.2 and 4.3 which is described in Examples 4.8 and 4.9. The only new notational element here is the  =⇒labeled by some designator of a query. That is,  ✓(Di) Qj=⇒ ✓means that  Diis still a minimal diagnosis after  Qjhas been answered and added to the respective set of test cases of the DPI. On the other hand,  ✓(Di) Qj=⇒ ×signifies that the minimal diagnosis  Diis invalidated through the addition of the answered query  Qjto the respective set of test cases of the DPI. Please notice that  =⇒does not point at a node of the wpHS-tree. Instead, the label at which  =⇒points is to be understood as the new label of the node originally labeled by  ✓(Di)from which the (first of possibly multiple)  =⇒goes out. This notation should help to keep track of the evolution of node labels in the wpHS-tree without needing to overload a single node by multiple different successive labels.

In the first iteration, i.e. during the execution of the first call of STATICHS during Algorithm 5, the root node (initially the empty set) is labeled by the minimal conflict set  ⟨1, 2, 5⟩w.r.t.  ⟨K, B, P, N ⟩Rand three successor nodes, namely {1}, {2} as well as {5}, are added to the queue of open nodes Q. Since all formulas have been assigned an equal fault probability, STATICHS conducts a breadth-first tree construction (as displayed by the numbers i⃝that give the order of node labeling). That is, Q in this case is a first-in-first-out queue. In this vein, first [1] and then [2] are identified as minimal diagnoses w.r.t. the given DPI. Since  D✓ ∪Dcalc = ∅∪{[1], [2]}has a cardinality of  nmin = nmax = 2, the stop criterion of STATICHS causes it to terminate and return  ⟨Dcalc ∪ D✓, Ccalc, Q, D×⟩ = ⟨{[1], [2]} , {⟨1, 2, 5⟩} , [{5}], ∅⟩(because  D✓and  D×are initially empty sets), as shown in the upper right column in Figure 11.1.

Then, in Algorithm 5, outside of the STATICHS procedure, the first query  Q1 = {E → ¬A}is computed from the leading diagnoses set {[1], [2]}. The q-partition  P(Q1)associated with  Q1is  ⟨{[1]} , {[2]}, ∅⟩. The user’s answer  u(Q1)to  Q1is then false. Thence, the set  Doutis calculated from  P(Q1)as D+(Q1) = {[1]}(due to negative answer, cf. Remark 7.4), deleted from  D✓ := D✓ ∪ Dcalcto yield D✓ = {[2]}and added to  D×to yield  D× = {[1]}. The set  D✓corresponds to the set of all already computed minimal diagnoses w.r.t. the input DPI that satisfy all queries answered so far. The set  D×comprises all already computed minimal diagnoses w.r.t. the input DPI that do not satisfy all queries answered so far. These sets  D✓and  D×along with the collections Q and  Ccalcwhich are unmodified outside of STATICHS are used as input arguments for the second call of STATICHS. Notice that, in the figure, the resulting values of operations performed within STATICHS are given in the righthand column above the dashed line whereas values computed outside of STATICHS are given below the dashed line.

After the modifications caused by the addition of the query  Q1to the negative test cases of  ⟨K, B, P, N ⟩Rhave been taken into account in step 4⃝, the partial wpHS-tree built in iteration 1 is further constructed in iteration 2 resulting in the tree depicted by the middle picture in the lefthand column of Figure 11.1. Whereas the branches with edge labels {5, 1} and {5, 2} correspond to proper supersets of the minimal diagnoses [1] and [2], respectively, w.r.t. the input DPI  ⟨K, B, P, N ⟩Rand are thus closed by the non-minimality criterion tested in the SLABEL function, the branch with edge labels {5, 7} is identified as a minimal diagnosis  D3 := [5, 7]w.r.t.  ⟨K, B, P, N ⟩R. However,  D3is not directly added to the set Dcalc. In fact, the validity of the KB  K \ D3w.r.t. the current DPI  ⟨K, B, P, N ∪ {Q1}⟩Ris tested beforehand. As this test is successful, meaning that  D3 ∈ mD⟨K,B,P,N⟩R ∩ mD⟨K,B,P,N∪{Q1}⟩R, D3can be safely added to  Dcalcimplying the set of leading diagnoses  D✓ ∪ Dcalc = {D2, D3}with cardinality two. Due to  nmin = nmax = 2, STATICHS terminates.

After the second query  Q2has been answered negatively involving the dismissal of the leading diagnosis  D2, STATICHS ends up with an empty queue Q of open nodes in iteration 3 (see the tree in the lower left column of Figure 11.1). Hence, STATICHS returns a singleton set including the leading diagnosis  D3. Now, independently of the specified formula probabilities,  pD(D3) = 1 ≥ 1 − σ = 1is satisfied since the probability space considered by the probability measure  pD()focuses on the sample space  Ω = {D3}(cf. Sections 4.6 and 9.2). Thus, the stop condition of Algorithm 5 is met wherefore the solution KB Ksol := (K \ D3) ∪ UP = (K \ D3) ∪ ∅ = K \ D3is returned to the user. This solution KB  Ksolis the (exact) solution to Interactive Static KB Debugging given the DPI  ⟨K, B, P, N ⟩Rof Table 15.3 as an input because  D3is the only minimal diagnosis w.r.t.  ⟨K, B, P, N ⟩Rthat conforms with all answered queries  Q1 = falseand  Q2 = false.

All in all, the execution of Algorithm 5 in this example performs

2 full QX calls, i.e. calls of QX that actually return a minimal conflict set (there are two minimal conflict sets labeled by C in the picture at the bottom of the lefthand column in Figure 11.1) and

6 validity checks, i.e. calls of QX that return ’no conflict’ (one check for each of the three found minimal diagnoses; notice that QX does only perform a single KB validity check by ISKBVALID in case it returns ’no conflict’, see Algorithm 1) or calls of ISKBVALID in line 10 in STATICHS (one call for each of the three found minimal diagnoses),

computes

3 minimal diagnoses w.r.t. the input DPI,

2 minimal conflict sets w.r.t. the input DPI and

2 queries and asks the user 2 logical formulas (1 per query)

and stores

a maximum of 5 nodes (where node refers to the internal representation of a node in STATICHS as a set of edge labels along a path from the root node to a leaf node; there are even more nodes in the sense of tree nodes in the picture at the bottom of the lefthand column in Figure 11.1).

Example 11.2 Let us now consider the (admissible) DPI  ⟨K, B, P, N ⟩Rgiven by Table 4.2. We assume an expert (called user throughout this example) in the domain Dom modeled by K who wants to find a solution to Interactive Static KB Debugging for the given DPI  ⟨K, B, P, N ⟩Rby means of Algorithm 5 with mode = static. Further, we suppose the following requirements:

The user wants each query to be computed from three leading diagnoses. Thus, after each iteration of STATICHS, the set  D✓ ∪ Dcalcshould comprise exactly three elements. This postulation is reflected by setting  nmin = nmax = 3. Notice that the time limit t is irrelevant in this case.

Moreover, as in example 11.1, we assume no demand for queries satisfying special properties which is reflected by choosing q := 1 (cf. Chapter 8) and qsm() equal to any query selection measure described in Section 9.3.

Let there be several documentations of past debugging sessions (e.g. in terms of formula change logs) involving KBs in the domain Dom of the author auth of K accessible to the user. Further, let the user have extracted term and logical construct probabilities p �K∪K(ax) ∈ [0, 1] for ax ∈ K for auth from this data. This function p �K∪K : �K ∪ K → [0, 1] is then provided as an input to Algorithm 5.

Finally, the user’s intention is to get the (exact) solution to the problem of Interactive Static KB Debugging. This can be taken into account by specifying σ := 0.

The tree constructed and parameters computed and used by Algorithm 5 using STATICHS are visualized by Figures 11.2 as well as 11.3. We use the same notation as in Figures 4.2, 4.3 and 11.1 which is described in Examples 4.8, 4.9 and 11.1.

After the initialization of variables, Algorithm 5 calls the function GETFORMULAPROBS in line 5 which exploits  p �K∪K()to calculate the function  pK()giving the fault probabilities of formulas in K (cf. Sections 4.6.1, 9.2 and Example 4.7). Let the resulting probabilities be as depicted by Table 11.1.

image

Table 11.1: (Example 11.2) Computed formula fault probabilities for the example DPI given by Table 4.2.

Then, STATICHS is called for the first time, resulting in the wpHS-tree given in the first picture in Figure 11.2. Contrary to Example 11.1, where the tree was built up in breadth-first order, in this example the formula probabilities  p() := pK()given by Table 11.1 are used to assign a probability  pnodes(n)to each path n in the wpHS-tree starting from the root node (cf. Formula 4.6 and Definition 4.9). In this vein, as outlined by the numbers i⃝indicating when a node is labeled, after the root node has been labeled by C1 := ⟨1, 2, 5⟩, the node corresponding to the outgoing edge of  C1labeled by the formula with the largest fault probability among all formulas in  C1is labeled first. That is, the node {1} with  pnodes({1}) = 0.41(as opposed to the nodes {2} and {5} with 0.25 each) is labeled first. The SLABEL procedure, after checking whether {1} is a non-minimal diagnosis w.r.t.  ⟨K, B, P, N ⟩Ror a duplicate of some other node in Q (both checks negative), computes another minimal conflict set  C2 := ⟨2, 4, 6⟩such that  {1}∩C2 = ∅(C2is not hit by the node {1}) to constitute a label for node {1}. The successor nodes {1, 2}, {1, 4} and {1, 6} of {1} are generated and added to the list Q in a way that the sorting of Q in descending order of pnodes()is maintained.

Since {1, 4} (0.28) as well as {1, 6} (0.27) have a larger probability (as per  pnodes()) than the nodes {2} (0.25) and {5} (0.25), Q is given by [{1, 4} , {1, 6} , {2} , {5} , {1, 2}] when it comes to the processing of the next node. Since STATICHS always treats the first node of Q next, it identifies the first minimal diagnosis  D1 := [1, 4]w.r.t.  ⟨K, B, P, N ⟩Rin step 3⃝. In steps 4⃝and 8⃝, two further minimal diagnoses D2 := [1, 6]and  D3 := [5, 4]are detected. Altogether, the union of  D✓(initially the empty set) and Dcalc(comprising the three computed diagnoses) now contains  3 = nmin = nmaxelements wherefore STATICHS terminates and outputs the tuple  ⟨Dcalc ∪ D✓, Ccalc, Q, D×⟩where the sets in this tuple are given under the wpHS-tree of iteration 1 in Figure 11.2.

From this set of leading diagnoses  D✓ := D✓ ∪ Dcalc, the probability measure  pD : D✓ →[0, 1] is computed by the function GETPROBDIST (cf. Algorithm 6 and Section 9.2). The result is ⟨pD(D1), pD(D2), pD(D3)⟩ = ⟨0.38, 0.37, 0.25⟩. The mode  Dmax := D1of this probability distribution is then computed by GETMODE. As  σ = 0, pD(Dmax) = 0.38 ̸≥ 1wherefore the stop criterion of Algorithm 5 is not satisfied.
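How GETPROBDIST derives pD from the formula probabilities is specified in Algorithm 6 and Section 9.2, which are not reproduced in this section. As a minimal sketch, assuming that pD simply renormalizes the weights of the leading diagnoses so that they sum to one (which agrees with the reported distributions summing to one), the computation could look as follows. The function names are hypothetical, the weights 0.28 and 0.27 are the pnodes() values quoted above for {1, 4} and {1, 6}, and the weight used for D3 is a made-up placeholder.

```python
from fractions import Fraction
from typing import Dict


def get_prob_dist(weights: Dict[str, Fraction]) -> Dict[str, Fraction]:
    """Renormalize diagnosis weights over the current leading diagnoses (assumption)."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def get_mode(dist: Dict[str, Fraction]) -> str:
    """Return the name of a diagnosis with maximal probability."""
    return max(dist, key=dist.get)


weights = {
    "D1": Fraction("0.28"),   # pnodes({1, 4}) as quoted in the text
    "D2": Fraction("0.27"),   # pnodes({1, 6}) as quoted in the text
    "D3": Fraction("0.19"),   # hypothetical placeholder value
}
dist = get_prob_dist(weights)
print(get_mode(dist), {name: float(value) for name, value in dist.items()})
```

Under this assumption a singleton set of leading diagnoses always receives probability 1, regardless of the formula probabilities.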

Consequently, Algorithm 5 proceeds to generate the first query  Q1 = {B ⊑ K}(based on the current set of leading diagnoses  D✓) along with its associated q-partition  P(Q1) = ⟨{D1, D2} , {D3} , ∅⟩. The diagnosis  D1is in  D+(Q1)because  K∗1 = (K \ D1) ∪ B ∪ UP(recall Formula 7.1 for a definition of  K∗i) comprises formulas 2, 3, 5, 6, 7, 8 and 9 as well as  p1(cf. Table 4.2) wherefore  K∗1 |= {B ⊑ K} = Q1(due to the set of formulas  {2, 3} = {B ⊑ G, G ⊑ K}). That  D2belongs to  D+(Q1)as well follows analogously. On the other hand,  D3 ∈ D−(Q1)must be true since  K∗3 ∪Q1includes i.a.  A ⊑ B(formula 1) and  B ⊑ K (∈ Q1) wherefore  {A ⊑ K} = n1is an entailment of  K∗3. Thus, the negative test case  n1is violated.

The positive user answer  u(Q1) = trueis incorporated in that  Q1is appended to the set of positive test cases P yielding  P ∪ {Q1} = {{r(x, y)} , {B ⊑ K}}. Step 9⃝shows the impact of this test case addition on the set of leading diagnoses, i.e. all diagnoses in the set  Dout = D−(Q1) = {D3}(due to positive answer, cf. Remark 7.4) are re-labeled by  ×whereas all other leading diagnoses (D1, D2) are still labeled by  ✓.

In the same fashion, further node labelings are conducted in iteration 2 until  |D✓ ∪ Dcalc| =| {D1, D2} ∪ {[2, 1]} | = 3 = nmin = nmaxholds again. These actions are displayed by the tree at the bottom of Figure 11.2.

Notice that, after step 12⃝, two nodes corresponding to the same set are elements of the list Q. At step 13⃝, the duplicate criterion checked by SLABEL comes into play. Since the node {1, 2} (the leftmost branch in the tree) is ranked first in Q (we assume a first-in-first-out ordering of nodes corresponding to equal sets of edge labels in Q), the SLABEL procedure is called given node := {1, 2} as an argument and detects the node {2, 1} (the fourth leftmost branch in the tree) in Q. Hence, node = {1, 2} is closed as a duplicate node which finds expression in the label ×(dup). When {2, 1} (which must have the same probability as {1, 2} due to set-equality) is processed at step 14⃝, it is discovered to be a minimal diagnosis (D5) w.r.t. ⟨K, B, P, N⟩R.

image

before  D5is detected. However,  D4is immediately ruled out and added to  D×(cf. line 13 in STATICHS) due to the fact that  K \ D4is invalid w.r.t. the current DPI  ⟨·, B, P ∪ {Q1} , N ⟩R(cf. Definition 3.3). The explanation why this holds is as follows:

image

Formula 7.1 for a definition of  K∗i) does not violate any  r ∈ R = {consistency, coherency} and does not entail any  n ∈ N = {n1, n2} = {{A ⊑ K} , {L ⊑ ∃r.F, B(x), G ⊑ K}}. Applying the diagnosis  D4to K yields  K \ D4 = {1, 3, 5, 8}which includes in particular formula 1 which is equal to  A ⊑ B(see Table 4.2). However, there is also the negative test case  n1indicating that  A ⊑ Kmust not be entailed by  K∗4. That is,  B ⊑ K ∈ K∗4(due to  Q1) and  A ⊑ B ∈ K∗4which implies that  K∗4 |= {A ⊑ K} = n1wherefore  K∗4is invalid w.r.t.  ⟨·, B, P ∪ {Q1} , N ⟩R.

image

of  =⇒. In case of the invalidation of a leading diagnosis (i.e. one that was utilized in the computation of Qj), on the contrary, the step number at the shaft is lower than the step number at the arrow head.

image

D✓∪Dcalc = {D1, D2, D5}is then answered by  u(Q2) = trueas well, wherefore the leading diagnoses D2, D5are ruled out and added to  D×. So, the input argument  D✓given to the next call of STATICHS in Algorithm 5 consists of the single diagnosis  D1. In the third iteration (see the picture given in Figure 11.3), STATICHS again executes in order to complete the leading diagnosis set to contain three elements. However, as we can say in advance,  D1is the only minimal diagnosis w.r.t. the input DPI  ⟨K, B, P, N ⟩Rwhich is also a diagnosis w.r.t. the current DPI  ⟨K, B, P ∪ {Q1, Q2} , N ⟩R. Nevertheless, STATICHS continues expanding the wpHS-tree until it has verified that this is the case (Q = []). This is equivalent to finishing the construction of the non-interactive wpHS-tree that is generated by HS with parameters  nmin = nmax = ∞. We want to stress that the construction of the entire wpHS-tree w.r.t.  ⟨K, B, P, N ⟩Rand  p() := pK()is inevitable in a debugging scenario where the (exact) solution to the Interactive Static KB Debugging problem is sought (the probability w.r.t.  pD()of a diagnosis can only be equal to 1 if there is only a single leading diagnosis returned by STATICHS).

image

3 and directly dismissed (added to  D×) after the validity check in line 10 of STATICHS. All other tree branches are closed due to the non-minimality (label  ×(⊃Di)) or duplicate criterion (label  ×(dup)). Due to  σ = 0and the associated necessity to grow the wpHS-tree until all leaf nodes are labeled, the final tree (19 labeled leaf nodes) depicted in Figure 11.3 is relatively large in comparison to the small size |K| = 7. This example might already give an idea of the potential explosion of the wpHS-tree produced by STATICHS in case the (exact) solution to the Interactive Static KB Debugging problem is desired. This is why it will usually make sense in practice to specify a fault tolerance  σ > 0which enables Algorithm 5 with mode = static to escape from the generally intractable complexity of the complete investigation of all minimal diagnoses w.r.t. the input DPI (full construction of the wpHS-tree). However, in this concrete example, allowing a small fault tolerance  σhas no effect either. Actually,  σ ≥ 0.56is necessary to achieve a premature termination of the tree construction. This holds due to the fact that the probability distributions of leading diagnoses are  ⟨pD(D1), pD(D2), pD(D3)⟩ = ⟨0.38, 0.37, 0.25⟩(after iteration 1)

and ⟨pD(D1), pD(D2), pD(D5)⟩ = ⟨0.44, 0.42, 0.14⟩ (after iteration 2). Now, given say σ := 0.6, the stop criterion of Algorithm 5 would be met after iteration 2 because pD(Dmax) = pD(D1) = 0.44 ≥ 0.4 = 1 − 0.6 = 1 − σ. Note that, in this case, the same (exact) solution would be returned as for the setting σ := 0. The (significant) difference is just that the final tree in this case has only 14 leaf nodes, of which only 7 are labeled (the labeling of a node is in general significantly more costly than the mere generation of a node). As opposed to this, the full tree comprises 19 labeled nodes. On the other side of the coin, choosing a value of σ > 0.5, for example, means that – from the point of view of the knowledge at the time Algorithm 5 terminates – a solution to Interactive Static KB Debugging is returned by Algorithm 5 which has a higher probability of not being the (exact) solution than of being the (exact) solution.
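The threshold arithmetic of this example can be checked with a few lines of Python. The helper names are hypothetical; the two distributions are the ones reported for iterations 1 and 2, and exact fractions are used to avoid rounding effects at the boundary value.

```python
from fractions import Fraction
from typing import Dict

after_iteration_1 = {"D1": Fraction("0.38"), "D2": Fraction("0.37"), "D3": Fraction("0.25")}
after_iteration_2 = {"D1": Fraction("0.44"), "D2": Fraction("0.42"), "D5": Fraction("0.14")}


def stop_criterion_met(p_d: Dict[str, Fraction], sigma: Fraction) -> bool:
    """Stop if the most probable leading diagnosis reaches probability 1 - sigma."""
    return max(p_d.values()) >= 1 - sigma


def smallest_sufficient_sigma(p_d: Dict[str, Fraction]) -> Fraction:
    """Smallest fault tolerance for which the stop criterion is met."""
    return 1 - max(p_d.values())


sigma = Fraction("0.6")
print(stop_criterion_met(after_iteration_1, sigma))   # 0.38 compared with 0.4
print(stop_criterion_met(after_iteration_2, sigma))   # 0.44 compared with 0.4
print(float(smallest_sufficient_sigma(after_iteration_2)))
```

For σ := 0.6 the criterion fails after iteration 1 and holds after iteration 2, and the smallest tolerance that stops the search after iteration 2 is 1 − 0.44 = 0.56.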

All in all, the execution of Algorithm 5 in this example performs

4 full QX calls, i.e. calls of QX that actually return a minimal conflict set (there are four minimal conflict sets labeled by C in the tree in Figure 11.3) and

20 validity checks, i.e. calls of QX that return ’no conflict’ (one check for each of the 10 found minimal diagnoses; notice that QX does only perform a single KB validity check by ISKBVALID in case it returns ’no conflict’, see Algorithm 1) or calls of ISKBVALID in line 10 in STATICHS (one call for each of the 10 found minimal diagnoses),

computes

10 minimal diagnoses w.r.t. the input DPI,

4 minimal conflict sets w.r.t. the input DPI and

2 queries and asks the user 2 logical formulas (1 per query)

and stores

a maximum of 19 nodes (where node refers to the internal representation of a node in STATICHS as a set of edge labels along a path from the root node to a leaf node; there are even more nodes in the sense of tree nodes in the picture in Figure 11.3).

image

Figure 11.1: (Example 11.1) Solving the problem of Interactive Static KB Debugging (Problem Definition 6.2) for the example DPI given by Table 15.3 by means of Algorithm 5 and STATICHS.

image

image

11.4 Correctness of the Algorithm

In this section we will demonstrate the correctness of STATICHS. That is, we will prove that STATICHS, given the inputs described in Algorithm 7, yields the outputs enumerated in Algorithm 7. Used in Algorithm 5 to iteratively compute a set of leading diagnoses for query generation, STATICHS in this way serves to solve the problem of Interactive Static KB Debugging approximately (parameter  σ > 0in Algorithm 5) or exactly (σ = 0).

After each call to STATICHS during Algorithm 5, the hitting set tree produced by STATICHS is a (partial) wpHS-tree w.r.t. the DPI  ⟨K, B, P, N ⟩Rgiven as an input to Algorithm 5 and  pnodes()which can be directly obtained from the function p() given as input to STATICHS. This proposition is made by Lemma 11.3.

In order to be able to prove this proposition, we formulate and prove two lemmata, Lemma 11.1 and 11.2. The former, which is given next, shows that this proposition holds for the very first call of STATICHS during the execution of Algorithm 5. The latter assures that this proposition holds for any further call of STATICHS during Algorithm 5 for an adequate set of input parameters to STATICHS. Finally, Lemma 11.3 exploits these results to ascertain that this proposition is satisfied for all calls of STATICHS.

Lemma 11.1. Let the following be the input parameters to the STATICHS function:

• ⟨K, B, P, N⟩R is the DPI given as input to Algorithm 5,

• nmin, nmax, t ∈ N where nmin ≥ 2,

• a function p : K → (0, 0.5),

• Q = [∅],

• P′ = N′ = D× = D✓ = Ccalc = ∅.

Then, STATICHS creates a (partial) wpHS-tree T w.r.t. ⟨K, B, P, N⟩R and pnodes() (cf. Definition 4.9) equivalent to one produced by Algorithm 2 with input parameters ⟨K, B, P, N⟩R, nmin, nmax, t and p() and returns ⟨D, Q, Ccalc, D×⟩ where ⟨D ∪ D×, Q, Ccalc⟩ is the relevant data of T.

Proof. Since all input parameters P′, N′, D×, D✓ and Ccalc are equal to the empty set, Dcalc = ∅ and Q includes only the node ∅, we may regard ⟨Dcalc, Q, Ccalc⟩ as the initial relevant data of some (partial) wpHS-tree which includes only an unlabeled root node. The root node ∅ cannot be labeled, as otherwise either it would necessarily be an element of Dcalc (if ∅ is a diagnosis w.r.t. ⟨K, B, P, N⟩R) or the set Ccalc would include the conflict set that labels the root node.

D× can never be extended during the execution of STATICHS since line 13 can never be reached. This holds because the test made in line 10 can never be negative. Namely, as P′ = N′ = ∅, this test actually checks whether K \ node is valid w.r.t. ⟨·, B, P, N⟩R. Since L = valid has been output as a label for node (line 9) by the SLABEL function called in line 7, it must hold that QX(⟨K \ node, B, P, N⟩R) yielded ’no conflict’. By Proposition 4.9, this implies that K \ node is valid w.r.t. ⟨·, B, P, N⟩R. Hence, D× = ∅ definitely holds whenever STATICHS terminates.

Moreover, each node with the label valid is added to Dcalc since line 13 can never be reached. As a consequence, with the given input parameters, the execution of the code between line 2 and line 18 of Algorithm 7 has exactly the same effect as executing the code between line 2 and line 16 of Algorithm 2.

D✓ can never be extended as there is no such modification operation at all in STATICHS. Thus, D✓ = ∅ holds throughout the execution of STATICHS.

Now, the SLABEL procedure is equivalent to the LABEL procedure of Algorithm 2, except for the first line of the non-minimality criterion. That is, in STATICHS (line 21) some nd is searched for in D(×,✓,calc), whereas in Algorithm 2 (line 19) such an nd is searched for in Dcalc. However, we point out that D(×,✓,calc) in the SLABEL procedure corresponds to the set D× ∪ D✓ ∪ Dcalc in STATICHS (cf. the call to SLABEL in line 7), where D✓ = D× = ∅ is an invariant, as argued above. Taking these arguments into account, we have that D(×,✓,calc) in SLABEL in line 21 is equal to Dcalc, just as in Algorithm 2.

Hence, with the given input parameters, we have verified that STATICHS acts equivalently to Algorithm 2. As Algorithm 2 produces a (partial) wpHS-tree T w.r.t. the input DPI ⟨K, B, P, N⟩R and pnodes() by Lemma 4.15, we infer that STATICHS also does so.

As opposed to Algorithm 2, which returns only Dcalc, STATICHS returns ⟨D, Q, Ccalc, D×⟩ where D := Dcalc ∪ D✓ = Dcalc since D✓ = ∅, as argued above. Here, Dcalc, Q and Ccalc correspond exactly to the equally named collections in Algorithm 2 and D× = ∅, as argued above. Therefore, by Corollary 4.6, ⟨D ∪ D×, Q, Ccalc⟩ = ⟨Dcalc, Q, Ccalc⟩ is the relevant data of the (partial) wpHS-tree T w.r.t. ⟨K, B, P, N⟩R and pnodes() produced by Algorithm 2.

The next lemma shows that STATICHS, given parameters such that ⟨D× ∪ D✓, Q, Ccalc⟩ is the relevant data of a (partial) wpHS-tree w.r.t. ⟨K, B, P, N⟩R and pnodes(), again yields a (partial) wpHS-tree w.r.t. ⟨K, B, P, N⟩R and pnodes().

Lemma 11.2. Let the following be the input parameters to the STATICHS function:

• ⟨K, B, P, N⟩R is the DPI given as input to Algorithm 5,

• P′ is the set of positive and N′ is the set of negative test cases specified since the start of Algorithm 5 where P′ ∪ N′ ⊃ ∅,

• nmin, nmax, t ∈ N where nmin ≥ 2,

• a function p : K → (0, 0.5),

• D× ≠ ∅, D✓ ≠ ∅, Ccalc ≠ ∅ and Q such that ⟨D× ∪ D✓, Q, Ccalc⟩ is the relevant data of a (partial) wpHS-tree w.r.t. ⟨K, B, P, N⟩R and pnodes() produced by Algorithm 2 with input parameters ⟨K, B, P, N⟩R and p().

Then, STATICHS creates a (partial) wpHS-tree T w.r.t. ⟨K, B, P, N⟩R and pnodes() equivalent to one produced by Algorithm 2 with input parameters ⟨K, B, P, N⟩R and p() and returns ⟨D, Q, Ccalc, D×⟩ where ⟨D ∪ D×, Q, Ccalc⟩ is the relevant data of T.

Proof. Since ⟨D× ∪ D✓, Q, Ccalc⟩ is the relevant data of a (partial) wpHS-tree T w.r.t. ⟨K, B, P, N⟩R and pnodes() produced by Algorithm 2 with input parameters ⟨K, B, P, N⟩R and p(), it is clear that, if the construction of T is continued by an algorithm working equivalently to Algorithm 2 and using this relevant data, the relevant data of a (partial) wpHS-tree T′ w.r.t. ⟨K, B, P, N⟩R and pnodes() will be stored by this algorithm (Corollary 4.6). Therefore, we show that STATICHS is such an algorithm.

In Algorithm 2, the set of all already computed minimal diagnoses w.r.t. ⟨K, B, P, N⟩R is denoted by Dcalc. Nodes labeled by valid are added to Dcalc (line 11) and Dcalc is used in the non-minimality criterion in the LABEL function (line 19). If Algorithm 2 were used to continue the construction of T using the relevant data ⟨D× ∪ D✓, Q, Ccalc⟩, the required setting would simply be to use Dcalc := D× ∪ D✓ and to use Q and Ccalc for the equally named variables in Algorithm 2. If a new node nd labeled by valid were then added to Dcalc, we would have Dcalc = D× ∪ D✓ ∪ {nd}. By Corollary 4.7, this set Dcalc used by Algorithm 2 would at each point in time comprise exactly the |Dcalc| most probable minimal diagnoses w.r.t. ⟨K, B, P, N⟩R and pnodes().

In STATICHS, each node node labeled by valid is added either to Dcalc, which is initially the empty set in STATICHS, or to D× (lines 11 and 13), i.e. node is added to Dcalc ∪ D×. Thus, it is also true to say that node is added to Dcalc ∪ D× ∪ D✓. So, the first new node nd labeled by valid is added to this set, which is then equal to D× ∪ D✓ ∪ {nd}. This set is equal to the set Dcalc that would be used by Algorithm 2 to further construct the (partial) wpHS-tree T.

In the non-minimality criterion in function SLABEL, D(×,✓,calc) is used, which is equal to the set Dcalc ∪ D× ∪ D✓ in STATICHS (cf. the call to SLABEL in line 7). Hence, Dcalc ∪ D× ∪ D✓ is used and modified in STATICHS in exactly the same way as Dcalc is used and modified in Algorithm 2.

Apart from this, as can be easily verified, the labeling function SLABEL in STATICHS is identical to LABEL in Algorithm 2 and the way Q and Ccalc are used and modified in STATICHS is exactly equivalent to the way these are used and modified in Algorithm 2.

What remains to be shown is that Dcalc ∪ D× ∪ D✓, like Dcalc in Algorithm 2, always contains all already computed minimal diagnoses w.r.t. ⟨K, B, P, N⟩R, which are the |Dcalc ∪ D× ∪ D✓| most probable minimal diagnoses w.r.t. ⟨K, B, P, N⟩R.

Since D× ∪ D✓ is the first set in the relevant data of a (partial) wpHS-tree T w.r.t. ⟨K, B, P, N⟩R and pnodes() produced by Algorithm 2 with input parameters ⟨K, B, P, N⟩R and p(), by Corollaries 4.6 and 4.7, it must hold that D× ∪ D✓ comprises the |D× ∪ D✓| most probable minimal diagnoses w.r.t. ⟨K, B, P, N⟩R. Since Dcalc is initially defined to be the empty set in STATICHS, it is also true to say that D× ∪ D✓ ∪ Dcalc comprises the |D× ∪ D✓ ∪ Dcalc| most probable minimal diagnoses w.r.t. ⟨K, B, P, N⟩R when STATICHS starts executing. Since, by assumption, the same p() is used by STATICHS as was used for the construction of the (partial) wpHS-tree T so far, the same ordering of Q is used by STATICHS as would be used by Algorithm 2 to further construct the (partial) wpHS-tree T. Therefore, D× ∪ D✓ ∪ Dcalc must indeed comprise the |D× ∪ D✓ ∪ Dcalc| most probable minimal diagnoses w.r.t. ⟨K, B, P, N⟩R at each point in time.

The set D in the tuple ⟨D, Q, Ccalc, D×⟩ returned by STATICHS corresponds exactly to Dcalc ∪ D✓. So, D ∪ D× = D× ∪ D✓ ∪ Dcalc.

To summarize, STATICHS acts exactly equivalently to Algorithm 2. As a consequence, Corollary 4.6 regarding Algorithm 2 applies to STATICHS as well. This means that the tuple consisting of the set of nodes labeled by valid, i.e. D× ∪ D✓ ∪ Dcalc, the list of open nodes Q and the set of minimal conflict sets w.r.t. ⟨K, B, P, N⟩R in STATICHS stores the relevant data of a (partial) wpHS-tree T as it could have been generated by Algorithm 2. This completes the proof.

Lemma 11.3. Any call to STATICHS within Algorithm 5 yields an output ⟨D, Q, Ccalc, D×⟩ where

• ⟨D ∪ D×, Q, Ccalc⟩ is the relevant data of T and

• T is a (partial) wpHS-tree w.r.t. ⟨K, B, P, N⟩R and pnodes() equivalent to one produced by Algorithm 2 with input parameters ⟨K, B, P, N⟩R and p().

Proof. As can be easily verified, the arguments given to STATICHS the first time it is called during the execution of Algorithm 5 correspond exactly to the input parameters to STATICHS assumed in Lemma 11.1 (cf. the variable instantiations in lines 1-4 of Algorithm 5). Thus, by Lemma 11.1, we conclude that the first call to STATICHS during the runtime of Algorithm 5 yields the output ⟨D, Q, Ccalc, D×⟩ where ⟨D ∪ D×, Q, Ccalc⟩ is the relevant data of T and T is a (partial) wpHS-tree w.r.t. ⟨K, B, P, N⟩R and pnodes() equivalent to one produced by Algorithm 2 with input parameters ⟨K, B, P, N⟩R and p().

When this first call to STATICHS returns in Algorithm 5, D is renamed to D✓ in Algorithm 5 (line 8). Q, Ccalc and D× keep their names within Algorithm 5. We point out that Q and Ccalc are not modified anywhere in Algorithm 5. D✓ and D× are modified only in lines 21 and 22. In these lines, a subset Dout of D✓ is deleted from D✓ and added to D×.

Dout must be a subset of D✓. This holds, first, because ⟨Q, P(Q)⟩ is a query Q w.r.t. the leading diagnoses D✓ and the DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R together with its q-partition P(Q) (CALCQUERY in line 16, cf. Section 9.2). Second, Dout corresponds either to D+(Q) (if the answer u(Q) = false) or to D−(Q) (if the answer u(Q) = true), where both sets must be subsets of the set of leading diagnoses D✓ by Definition 7.2 (GETINVALIDDIAGS in line 19, cf. Section 9.2).

Hence, D✓ ∪ D× remains unchanged throughout Algorithm 5. By the renaming of D to D✓ in Algorithm 5 (see the argumentation above), D✓ ∪ D× is equal to the set D ∪ D× where ⟨D, Q, Ccalc, D×⟩ is the output of the first call to STATICHS in Algorithm 5. Therefore, the relevant data ⟨D ∪ D×, Q, Ccalc⟩ of T is unmodified until the second call to STATICHS within Algorithm 5 is made.
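The invariance of D✓ ∪ D× under the transfer of Dout can be illustrated by a small Python sketch. The function `move_invalidated`, the variable names and the axiom names `ax1`, ..., `ax5` are hypothetical and serve only as an illustration; this is not the code of Algorithm 5.

```python
def move_invalidated(d_ok, d_rejected, d_out):
    """Move the diagnoses in d_out from d_ok to d_rejected.

    d_out must be a subset of d_ok, mirroring the argument that
    Dout is a subset of the leading diagnoses.
    """
    if not d_out <= d_ok:
        raise ValueError("d_out must be a subset of d_ok")
    return d_ok - d_out, d_rejected | d_out


# Hypothetical diagnoses, each a set of axioms.
d_ok = {frozenset({"ax1"}), frozenset({"ax2", "ax3"}), frozenset({"ax4"})}
d_rejected = {frozenset({"ax5"})}
d_out = {frozenset({"ax2", "ax3"})}

new_ok, new_rejected = move_invalidated(d_ok, d_rejected, d_out)

# The union of both sets is the same before and after the transfer.
union_unchanged = (new_ok | new_rejected) == (d_ok | d_rejected)
```

Since elements are only moved between the two sets and never discarded, their union, and hence the first component of the relevant data, is preserved.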

So, the arguments given to STATICHS the second time it is called during the execution of Algorithm 5 correspond exactly to the input parameters to STATICHS assumed in Lemma 11.2. Notice that the probability measure pK(), which corresponds to the probability measure p() in STATICHS, is never changed throughout the while-loop in Algorithm 5 (cf. Section 9.2).

Thus, by Lemma 11.2, we conclude that the second call to STATICHS during the runtime of Algorithm 5 yields the output ⟨D, Q, Ccalc, D×⟩ where ⟨D ∪ D×, Q, Ccalc⟩ is the relevant data of T′ and T′ is a (partial) wpHS-tree w.r.t. ⟨K, B, P, N⟩R and pnodes() equivalent to one produced by Algorithm 2 with input parameters ⟨K, B, P, N⟩R and p().

By means of the same line of argument we used so far and further applications of Lemma 11.2 it can be derived that the proposition of this lemma holds for any call to STATICHS throughout Algorithm 5.

By means of Lemma 11.3, we can now show by the next lemma that STATICHS computes minimal diagnoses w.r.t. the DPI ⟨K, B, P, N⟩R given as an input to Algorithm 5 in most-probable-first order. Furthermore, the next lemma will reveal that only minimal diagnoses w.r.t. the DPI ⟨K, B, P, N⟩R are computed by STATICHS, which assures the soundness of STATICHS concerning the (input) DPI ⟨K, B, P, N⟩R. The soundness of STATICHS as regards the (current) DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R will be considered in Lemma 11.6 below.

Lemma 11.4. Any call to STATICHS within Algorithm 5 yields an output ⟨D, Q, Ccalc, D×⟩ where D ∪ D× is the set of the |D ∪ D×| most probable (w.r.t. pnodes()) minimal diagnoses w.r.t. ⟨K, B, P, N⟩R.

Proof. Let T be the (partial) wpHS-tree produced by any call to STATICHS within Algorithm 5. Then, by Lemma 11.3,

• T is equal to a (partial) wpHS-tree produced by Algorithm 2 with input parameters ⟨K, B, P, N⟩R and p() and

• the first set Dcalc in the relevant data ⟨Dcalc, Q, Ccalc⟩ of T produced by Algorithm 2 corresponds to D ∪ D×.

So, by Corollary 4.7, the proposition of this lemma follows.

Moreover, Lemma 11.3 provides the basis for showing the completeness of STATICHS. That is, Lemma 11.5 will show that all minimal diagnoses w.r.t. the DPI ⟨K, B, P, N⟩R given as an input to Algorithm 5 will be found by STATICHS, provided that it keeps executing for a sufficiently long period of time.

Lemma 11.5. Any call to STATICHS within Algorithm 5 where the execution of STATICHS terminates due to Q = [] yields an output ⟨D, Q, Ccalc, D×⟩ where D ∪ D× is the set of all minimal diagnoses w.r.t. ⟨K, B, P, N⟩R.

Proof. The proposition of this lemma follows from Lemma 11.3 and Proposition 4.15 by an argument analogous to the one in the proof of Lemma 11.4.

The following lemma proves that STATICHS is sound with regard to finding minimal diagnoses w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R, i.e. the DPI ⟨K, B, P, N⟩R given as an input to Algorithm 5 extended by all new positive and negative test cases P′ and N′, respectively, that have been collected so far.

Lemma 11.6. If any call to STATICHS adds an element D to the set Dcalc during the execution of Algorithm 5, D is a minimal diagnosis w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R.

Proof. By Lemma 11.4 we know that each node node that is added to Dcalc by STATICHS is a minimal diagnosis w.r.t. the input DPI ⟨K, B, P, N⟩R. Since the test for validity of K \ node w.r.t. ⟨·, B, P ∪ P′, N ∪ N′⟩R (cf. Definition 3.3) must be successful before node is added to Dcalc (ISKBVALID in line 10), node must also be a diagnosis w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R by Proposition 3.2. Since node is a minimal diagnosis w.r.t. ⟨K, B, P, N⟩R, as argued, and due to Proposition 12.4 (see page 200), there cannot be a minimal diagnosis w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R which is a proper subset of node. Hence, node must be a minimal diagnosis w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R.

We are now in a position to prove that the first set D in the tuple output by any call of STATICHS in Algorithm 5 contains only those minimal diagnoses w.r.t. the (input) DPI ⟨K, B, P, N⟩R that are also minimal diagnoses w.r.t. the (current) DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R. In other words, the set of leading diagnoses used for query generation in Algorithm 5 consists only of minimal diagnoses w.r.t. the input DPI that are in agreement with the additional information given by all query answers so far.

Lemma 11.7. Any call to STATICHS within Algorithm 5 yields an output ⟨D, Q, Ccalc, D×⟩ where D ⊆ mD⟨K,B,P,N⟩R ∩ mD⟨K,B,P∪P′,N∪N′⟩R.

Proof. The output set D of any call to STATICHS during the execution of Algorithm 5 corresponds to the set Dcalc ∪ D✓ in STATICHS. As per Lemma 11.6, Dcalc includes only minimal diagnoses w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R. By Lemma 11.4, Dcalc includes only minimal diagnoses w.r.t. ⟨K, B, P, N⟩R. Therefore, we can conclude that Dcalc ⊆ mD⟨K,B,P,N⟩R ∩ mD⟨K,B,P∪P′,N∪N′⟩R. So, we must show that D✓ ⊆ mD⟨K,B,P,N⟩R ∩ mD⟨K,B,P∪P′,N∪N′⟩R holds when any call to STATICHS during the execution of Algorithm 5 terminates. We proceed by induction.

Base Case: At the first call of STATICHS during the execution of Algorithm 5, the argument D✓ passed to STATICHS is the empty set. As argued in the proof of Lemma 11.1, D✓ is never modified throughout STATICHS. Thus, D✓ = ∅ ⊆ mD⟨K,B,P,N⟩R ∩ mD⟨K,B,P∪P′,N∪N′⟩R holds for the output of the first call to STATICHS. Therefore, the proposition of this lemma holds for the output of the first call of STATICHS.

Induction Step: Assume that the proposition of this lemma holds for the last-but-one call to STATICHS during the execution of Algorithm 5 (Induction Hypothesis). Consider the last, i.e. most recent, call to STATICHS during the execution of Algorithm 5.

First, the set D✓ given as an input argument to STATICHS at the last call of STATICHS is unmodified throughout the entire execution of STATICHS, as already mentioned. Second, D✓ = D′ \ Dout ⊆ D′ holds where D′ is the output of the last-but-one call of STATICHS by Algorithm 5, since the only modification to the set D′ (which is denoted by D✓ in Algorithm 5) during Algorithm 5 is the deletion (line 21) of exactly those diagnoses Dout in D′ that are invalidated by the addition of the most recent test case (GETINVALIDDIAGS in line 19). That is, the input D✓ to the most recent call to STATICHS includes only diagnoses that comply with the most recently added test case. Call the most recently added test case tc. By the Induction Hypothesis, D′ ⊆ mD⟨K,B,P,N⟩R ∩ mD⟨K,B,P∪(P′\{tc}),N∪(N′\{tc})⟩R. Notice that either tc ∈ P′ or tc ∈ N′ holds, but not both. As D✓ ⊆ D′, it must be true that D✓ ⊆ mD⟨K,B,P,N⟩R ∩ mD⟨K,B,P∪(P′\{tc}),N∪(N′\{tc})⟩R and D✓ complies with the test case tc. Hence, we infer that D✓ ⊆ mD⟨K,B,P,N⟩R ∩ mD⟨K,B,P∪P′,N∪N′⟩R. Consequently, the proposition of this lemma must hold for each call of STATICHS during the execution of Algorithm 5.

The results proven so far in this section facilitate the proof of correctness of STATICHS:

Proposition 11.1 (Correctness of STATICHS). Any call to STATICHS (given the inputs described in Algorithm 7) within Algorithm 5 terminates and yields an output ⟨D, Q, Ccalc, D×⟩ where

(1) it holds for D that

image

where “most-probable” refers to the probability measure pnodes() (cf. Definition 4.9) obtained from the given function p();

(2) Q is the current queue of open (non-labeled) nodes of the produced (partial) wpHS-tree,

(3) Ccalc is the set of all minimal conflict sets w.r.t. ⟨K, B, P, N⟩R computed so far and

(4) D× is the set of all minimal diagnoses w.r.t. ⟨K, B, P, N⟩R computed so far where each diagnosis in D× does not satisfy all test cases P′ and N′.

Proof. Termination of any call to STATICHS within Algorithm 5 is guaranteed by the fact that each node is a subset of K, wherefore 2^|K| is a finite upper bound on the overall number of nodes that might be elements of Q during the execution of any call of STATICHS. Moreover, in each iteration of the repeat-loop in STATICHS, one element is removed from Q (line 6) and no element, once removed, can ever be re-added to Q. The latter is satisfied due to the non-minimality criterion (lines 21-23), which deletes all but one of the nodes set-equal to some set X ⊆ K before the first node set-equal to X is processed, and due to the fact that nodes, once labeled, i.e. those nodes that are elements of Dcalc, D✓ or D×, are never added to Q again (because there is no line of code in STATICHS that does so).

Proposition (1): During the execution of Algorithm 5 (and STATICHS), diagnoses are added to D× only in line 22. In this line, only and all diagnoses not complying with the most recent test case are added to D× (GETINVALIDDIAGS in line 19, cf. Section 9.2). Hence, no diagnosis in D× can be in mD⟨K,B,P,N⟩R ∩ mD⟨K,B,P∪P′,N∪N′⟩R. Now, by Lemmata 11.4 and 11.7, we deduce that D ⊆ mD⟨K,B,P,N⟩R ∩ mD⟨K,B,P∪P′,N∪N′⟩R is the set of most probable minimal diagnoses w.r.t. ⟨K, B, P, N⟩R that satisfy all test cases P′ and N′. If STATICHS does not terminate due to Q = [], properties (a)-(i) and (a)-(ii) of D are direct consequences of the stop criterion in line 18 in STATICHS. Otherwise, we infer by Lemma 11.5 that (b) must be true.

Propositions (2) and (3) hold by Lemma 11.3 and the definition of relevant data of a (partial) wpHS-tree (cf. Remark 4.2).

Proposition (4): This proposition follows from the line of argument in the proof of proposition (1) above.

image

the DPI ⟨K, B, P, N⟩R given as input to Algorithm 5,

the overall sets of positively (P′) and negatively (N′) answered queries added as test cases to ⟨K, B, P, N⟩R so far,

the current queue Q of open (non-labeled) nodes of a (partial) wpHS-tree,

some desired computation timeout t,

a desired minimal (nmin ≥ 2) and maximal (nmax) number of minimal diagnoses to be returned,

the set Ccalc of all minimal conflict sets w.r.t. ⟨K, B, P, N⟩R computed so far,

the set D✓ of all minimal diagnoses w.r.t. ⟨K, B, P, N⟩R computed so far that satisfy all test cases P′ and N′,

the set D× of all minimal diagnoses w.r.t. ⟨K, B, P, N⟩R computed so far that do not satisfy all test cases P′ and N′,

a function p : K → (0, 0.5).

image

Q is the current queue of open (non-labeled) nodes of the produced (partial) wpHS-tree,

Ccalc is the set of all minimal conflict sets w.r.t. ⟨K, B, P, N⟩R computed so far and

D× comprises those minimal diagnoses w.r.t. ⟨K, B, P, N⟩R computed so far that do not satisfy all test cases P′ and N′.

image

Diagnosis Computation Algorithm

As the name already suggests, DYNAMICHS (Algorithm 8) is a procedure that solves the problem of Interactive Dynamic KB Debugging defined by Problem Definition 6.1 if used for leading diagnosis computation in Algorithm 5. DYNAMICHS is sound, complete and optimal w.r.t. the set of solutions of the Interactive Dynamic KB Debugging problem (this will be proven in Section 12.4.10). Optimality refers to the best-first computation of minimal diagnoses regarding a given probability measure.

12.1 Overview and Intuition

Synoptic View of the Algorithm. DYNAMICHS (Algorithm 8) is employed as a subroutine in Algorithm 5 with mode = dynamic to build up a hitting set tree iteratively. That is, each time DYNAMICHS is called in Algorithm 5, it expands the existing tree only to a sufficient extent in order to determine a desired number of new leading diagnoses used for the generation of the next query. Then, the leading diagnoses set is returned.

Outside of the DYNAMICHS method in Algorithm 5, a new diagnosis probability distribution is obtained by the diagnosis probability update (cf. Section 9.2). Once this distribution involves a diagnosis whose probability exceeds a predefined threshold 1 − σ, the algorithm terminates. The output is a solution KB w.r.t. the current DPI built from this highly probable minimal diagnosis.

Remark 12.1 In case σ has a predefined value of zero, the output is the (exact) solution to the problem of Interactive Dynamic KB Debugging for the input DPI. In a scenario where some fault tolerance σ > 0 is given, the solution KB returned by Algorithm 5 is an approximation of the (exact) solution to Interactive Dynamic KB Debugging for the input DPI, where a better approximation can be expected for smaller values of σ (cf. Remark 9.2). “Better” in this context refers to the satisfaction of desired semantic properties of the KB returned by Algorithm 5, i.e. desired entailments and desired non-entailments of the KB. The intuition is that the specification of additional test cases T guarantees the output of a KB complying with these test cases, whereas accepting one of multiple solution KBs (albeit a highly probable one) without having incorporated T leaves open the possibility that this KB does not fulfill T.

However, answering queries requires effort from the interacting user. Therefore, the approach that involves the “early” termination of the algorithm once a solution KB has a sufficiently high probability (lower than 1) constitutes a trade-off between the exactness of the output on the one hand and the effort of the user and the overall execution time of the interactive KB debugging algorithm on the other.

In case there is no highly probable leading diagnosis, a query constructed from the current set of leading diagnoses is posed to the user. The user’s answer is incorporated into the current DPI, resulting in a new DPI. Thereafter, DYNAMICHS is invoked again with this new DPI as an argument.

Storage of the Search Tree. Between each two calls of DYNAMICHS in Algorithm 5, the “state” of the current hitting set tree is stored by variables

• Dcalc – computed minimal diagnoses w.r.t. the current DPI,

• Q – the list of open, non-labeled nodes,

• Ccalc – (not necessarily minimal) conflict sets w.r.t. the current DPI computed so far,

• D⊃ – non-minimal diagnoses w.r.t. the current DPI computed so far,

• Qdup – non-labeled duplicate nodes (i.e. nodes corresponding to tree branches with the same set of edge labels as branches that are already present in the tree),

• D× – the empty set (it is filled during Algorithm 5, between two calls of DYNAMICHS, with diagnoses from Dcalc that have been invalidated by an answered query)

where nodes in the tree again store (among others) the edge labels on the path from the root node to themselves.
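The stored state can be pictured as a simple record, as in the following Python sketch. The class name `DynamicHSState`, the field names and the method `invalidate` are assumptions chosen for illustration and are not taken from Algorithm 8; only the roles of the six collections follow the enumeration above.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicHSState:
    d_calc: list = field(default_factory=list)      # minimal diagnoses w.r.t. the current DPI
    q: list = field(default_factory=list)           # open, non-labeled nodes
    c_calc: list = field(default_factory=list)      # conflict sets (not necessarily minimal)
    d_nonmin: list = field(default_factory=list)    # non-minimal diagnoses
    q_dup: list = field(default_factory=list)       # non-labeled duplicate nodes
    d_rejected: list = field(default_factory=list)  # empty when DYNAMICHS returns

    def invalidate(self, invalid):
        """Between two calls: move diagnoses invalidated by an answered
        query from d_calc to d_rejected."""
        self.d_rejected.extend(d for d in self.d_calc if d in invalid)
        self.d_calc = [d for d in self.d_calc if d not in invalid]


# Hypothetical state with two computed diagnoses (lists of axiom names).
state = DynamicHSState(d_calc=[["ax1"], ["ax2"]])
state.invalidate([["ax2"]])
```

After the call, `state.d_calc` holds only the diagnosis consistent with the answer, and `state.d_rejected` holds the invalidated one.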

Search Tree Update. It is immediately apparent from the enumeration given above that, in comparison to STATICHS, additional collections, i.e. D⊃ as well as Qdup, need to be maintained in order to “remember” the current tree while Algorithm 5 is processing outside of the method DYNAMICHS. The reason for these additional variables is the tree update necessary after each addition of a test case to a DPI: each iteration of DYNAMICHS considers a different DPI in terms of the test cases, and any two different DPIs in general lead to a different hitting set tree and to different sets of minimal diagnoses and conflict sets. Hence, the idea of the tree update is the following: Reuse the partial hitting set tree T (stored by the variables described above) constructed before the new test case was added to the current DPI DPIj and perform suitable modifications to T in order to obtain a tree T′ such that the further expansion of T′ allows all minimal diagnoses w.r.t. the new DPI DPIj+1, resulting from the addition of the new test case to DPIj, to be identified. In other words, the tree update seeks to establish a tree that is equivalent to one built by execution of DYNAMICHS using the new DPI DPIj+1 starting from an empty tree.

Node Storage. Notice that, unlike in STATICHS or HS, it is crucial in DYNAMICHS to store nodes not as sets, but as ordered lists of formulas. That is, each node nd stores a list of all the edge labels along the (directed) path in the hitting set tree from the root node to nd, where the order of formulas in the list is given by the order of traversing the edge labels along this path. Additionally, DYNAMICHS stores the attribute nd.cs for each node nd, which is an ordered list including the node labels, i.e. the conflict sets, along the path from the root node to nd in an analogous way. Associating a node with these two lists instead of one set is necessary from the point of view of the tree update because it facilitates the differentiation between two nodes corresponding to an equal (partial) diagnosis. For example, there could be some node nd1 that is “redundant” after some query Q has been answered, while there is a set-equal node nd2 which is still “relevant” (set-equality refers to equal sets, not lists, of edge labels stored by two nodes). In this case, the algorithm should get rid of nd1 (in order to save time and space) while preserving node nd2 (in order to maintain completeness). Identifying set-equal nodes with each other might thus lead either to unnecessary tree expansion steps (if none is deleted) or to incompleteness of the algorithm concerning the consideration of all minimal diagnoses (in case both are deleted).
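The distinction between set-equal nodes and identical nodes can be made concrete with a minimal Python sketch. The class `Node`, its field names and the axioms `ax1`, ..., `ax5` are hypothetical; the sketch assumes only what is stated above, namely that a node stores an ordered list of edge labels and an ordered list nd.cs of conflict sets along its path.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    path: list = field(default_factory=list)  # edge labels from the root to this node
    cs: list = field(default_factory=list)    # conflict sets labeling the nodes on that path


def set_equal(a, b):
    """Two nodes are set-equal iff their edge labels coincide as sets."""
    return set(a.path) == set(b.path)


# Hypothetical conflict sets; the root of the tree is labeled by c1.
c1 = frozenset({"ax1", "ax3"})
c2 = frozenset({"ax3", "ax4"})
c3 = frozenset({"ax1", "ax5"})

# nd1 is reached via ax1 and then ax3, nd2 via ax3 and then ax1.
nd1 = Node(path=["ax1", "ax3"], cs=[c1, c2])
nd2 = Node(path=["ax3", "ax1"], cs=[c1, c3])
```

Here `set_equal(nd1, nd2)` holds although `nd1 != nd2`, so a tree update can still treat the two branches differently.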

Addition of a Test Case Changes Set of Solutions. Unlike the STATICHS algorithm, which is strongly related to the non-interactive hitting set algorithm HS (Algorithm 2) as outlined in Section 11.1, the hitting set tree produced by DYNAMICHS will usually differ significantly from the non-interactive hitting set tree produced by HS. The reason for this is that in DYNAMICHS the initial DPI DPI0 is not fixed (in that conflict sets and diagnoses are calculated only w.r.t. DPI0), but new test cases are also used for the computation of minimal conflict sets (and thus minimal diagnoses) and not only for the invalidation of diagnoses. Hence, every time a query has been answered and a respective test case has been incorporated into the DPI, the minimal conflict sets computed for the old DPI DPIj might not be minimal conflict sets w.r.t. the current DPI DPIj+1 anymore (see Examples 12.1 and 12.2). On the one hand, a minimal conflict set C w.r.t. DPIj might be a non-minimal conflict set w.r.t. DPIj+1 (since there is a new minimal conflict set C′ ⊂ C w.r.t. DPIj+1). On the other hand, there might also be “completely new” minimal conflict sets Ck w.r.t. DPIj+1 which are in no set-relationship with any minimal conflict set w.r.t. DPIj.

Due to this changing set of minimal conflict sets, the set of minimal diagnoses is variable as well (cf. Proposition 4.6). To see this, let D be a minimal diagnosis w.r.t. DPIj. Then D hits all minimal conflict sets Ck in mCDPIj. Now, assume that D comprises (only) the element ax from Ck, but there is a minimal conflict set C′k in mCDPIj+1 such that C′k ⊆ Ck \ {ax}. In this case, D is not a (minimal) hitting set of all minimal conflict sets in mCDPIj+1 (since D does not hit C′k), i.e. D is not a (minimal) diagnosis w.r.t. DPIj+1. That means, D needs to be extended (by a hitting set of all minimal conflict sets in mCDPIj+1 it does not hit) in order to become a diagnosis w.r.t. DPIj+1. After extending D, either situation might arise: D is a minimal diagnosis w.r.t. DPIj+1 or D is a non-minimal diagnosis w.r.t. DPIj+1. When the latter case occurs, DYNAMICHS might often be able to figure out that (the tree branch corresponding to) D is simply redundant (w.r.t. the new DPI DPIj+1) and does not need to be considered during the further expansion of the hitting set tree (which searches for minimal diagnoses w.r.t. DPIj+1 and not w.r.t. DPIj). That is, such redundant tree branches are unnecessary in order to explore all minimal diagnoses w.r.t. DPIj+1 (cf. Sections 12.1 and 12.4.5 for an explanation and precise characterization of redundancy).
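The effect described in this paragraph can be replayed on a toy instance in Python. The axiom names and conflict sets below are invented for illustration, and diagnoses are treated purely as hitting sets of the given minimal conflict sets (cf. Proposition 4.6); no reasoner or DPI is modeled.

```python
def hits_all(candidate, conflict_sets):
    """True iff candidate shares at least one element with every conflict set."""
    return all(candidate & c for c in conflict_sets)


def is_minimal_hitting_set(candidate, conflict_sets):
    """True iff candidate hits all conflict sets and no proper subset does.

    Removing single elements suffices, since every subset of a
    non-hitting set is itself non-hitting.
    """
    return hits_all(candidate, conflict_sets) and not any(
        hits_all(candidate - {ax}, conflict_sets) for ax in candidate
    )


# Hypothetical minimal conflict sets before and after adding a test case.
old_conflicts = [frozenset({"ax1", "ax2", "ax3"}), frozenset({"ax3", "ax4"})]
new_conflicts = [frozenset({"ax2", "ax3"}), frozenset({"ax3", "ax4"})]

d = frozenset({"ax1", "ax4"})   # contains only ax1 from the first old conflict set
d_extended = d | {"ax2"}        # extended to hit the new, smaller conflict set

was_minimal = is_minimal_hitting_set(d, old_conflicts)
still_hits = hits_all(d, new_conflicts)
extended_hits = hits_all(d_extended, new_conflicts)
extended_minimal = is_minimal_hitting_set(d_extended, new_conflicts)
```

In this instance the extension of D hits all new conflict sets but is not minimal (ax1 has become superfluous), which corresponds to the case of a redundant branch.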

As a consequence, the nice property of STATICHS that the set of minimal diagnoses that needs to be taken into account given DPIj+1 is a proper subset of the minimal diagnoses set that needed to be considered given DPIj is no longer valid for DYNAMICHS. That is, the set of remaining solution candidates in DYNAMICHS is not guaranteed to “converge” constantly towards a singleton comprising only one solution. The DPI, the minimal conflict sets as well as the minimal diagnoses are “dynamic”. What holds for both DYNAMICHS and STATICHS is the guarantee that the set of all (i.e. minimal and non-minimal) diagnoses is constantly shrinking, i.e. aDDPIj ⊃ aDDPIj+1 (as we will later prove by Corollary 12.4).

Search Tree Pruning. Let T be the hitting set tree produced in the j-th iteration of DYNAMICHS (i.e. T is the tree that was used to search for minimal diagnoses w.r.t. DPIj). Then, after a new test case has been added to DPIj, there are often redundant subtrees in T that can be pruned. The resulting tree T′ can then be used in the (j + 1)-th iteration of DYNAMICHS to identify minimal diagnoses w.r.t. the new DPI DPIj+1. Using T instead of T′ might lead to a significant time and (more severely) space overhead, due to the unnecessary expansion of redundant branches that are known to give no new information at all. Another approach could be to simply discard the entire tree T and start to construct a new one w.r.t. DPIj+1 from scratch. This strategy, however, will usually also suffer from a non-negligible time overhead since most of the tree T can be safely reused in iteration j + 1 and only parts of it must be revised. In particular, this strategy would potentially involve many additional calls of QX (which internally calls an expensive reasoner) as, in the worst case (when no pruning is possible), the entire existing tree might be rebuilt.

As we shall see in Remark 12.5, Section 12.4 and Examples 12.1 as well as 12.2, the overhead in terms of (expensive) calls to a reasoner (i.e. calls of QX) due to tree pruning (compared to its impact on the tree) is absolutely reasonable. In fact, only one call of a “fast version” of QX (see Section 12.4.6) might already lead to the deletion of 75% of the tree branches as one can see in the first pruning step in Example 12.2.

The evolution of the hitting set tree produced by Algorithm 5 using DYNAMICHS is thus characterized by alternating expansion and pruning phases. Even for very complex problems, if expansion phases are “short enough” such that tree pruning can take place “often enough”, one might be able to keep the hitting set tree “small enough” to handle it efficiently. The extent of the expansion phase can be steered by the specification of the leading diagnosis parameters nmin, nmax and t (cf. Section 9.2). In the extreme case, these can be defined such that (nmin = nmax = 2) the algorithm allows only the computation of a single further minimal diagnosis (in the first expansion phase: two diagnoses) before DYNAMICHS (i.e. the tree expansion phase) terminates and a further pruning phase might take place.

However, it is not automatically warranted that tree pruning is possible after each expansion phase. Similarly, no certainty is given that the transition from DPIj to DPIj+1 just causes the deletion of parts of the tree and no additional expansion of the tree. In fact, this depends on certain properties of the test case that is added after an expansion phase (i.e. properties of the generated query).

Test Cases Affect Tree Pruning. An added test case might give rise to pruning steps, but it might also induce the construction of new subtrees (where “new” means that these would not be subtrees of a hitting set tree w.r.t. the previous DPI DPIj). The latter situation occurs when “completely new” minimal conflict sets (see above) are introduced by the addition of a test case. If this is the only impact of a test case, then this test case has only a negative influence on the time and space complexity. In other words, none of the invalidated minimal diagnoses (and no other nodes in the tree) are redundant; but all of them must additionally hit the set of “completely new” minimal conflict sets (in order to become diagnoses w.r.t. DPIj+1). Hence, in this case, the transition from DPIj to DPIj+1 results only in monotonic growth of the tree. If possible, such “negative-impact test cases” must be avoided. On the other hand, one must strive for the usage of “positive-impact test cases”, i.e. those that only trigger tree pruning, but no tree expansion. Defining and studying properties that constitute such “positive-impact test cases” and developing specialized algorithms for extracting exactly those types of queries that enable as substantial and effective pruning as possible is a topic of future research.

An idea pertinent to this issue could for example be to attempt to extract a query by means of the conflict set C that labels the root node of the tree. More concretely, if any answer to a query yields a new test case that leads to the introduction of a minimal conflict set that is a proper subset of C, then it is for sure that significant pruning can take place (since entire subtrees starting from the root of the tree can be deleted). For instance, the first query Q1 in Example 12.2 features this property. Roughly, the reasons for that are that Q1 is an entailment of a proper subset Csub of C (i.e. Csub is a justification of Q1, cf. Section 4.2) and Q1 is “relevant” for this conflict set C to be a conflict set. In other words, the latter means that Q1 can be used to “replace” the part Csub of C, i.e. (C \ Csub) ∪ Q1 is invalid w.r.t. the given DPI. That is, addition of Q1 to the positive test cases asserts the correctness of one part of C, namely Csub (cf. Example 12.2), wherefore the other part must be incorrect (because some part of a conflict set must be definitely incorrect). On the other hand, assignment of Q1 to the negative test cases asserts exactly the incorrectness of Csub, wherefore the formulas C \ Csub become obsolete in the minimal conflict set C, yielding the new minimal conflict set C′ := Csub. Another desirable property of Q1 is that addition of Q1 to either set of test cases does not imply the origination of any “completely new” conflict sets (see above) which result in additional growth of the tree.

That is, in its original form (without assuring only the usage of “positive-impact test cases”), the time and space complexity of DYNAMICHS is a function of the generated queries. There is a potential to perform significant pruning, but also the risk of significant tree growth. In case mostly “positive-impact queries” are generated and posed to the user, the performance might be very good and significantly superior to that of STATICHS. In the reverse case, the performance might also be worse than that of STATICHS. In the case of STATICHS, there is no chance for significant pruning, but also no chance for a tree growth that goes beyond the size of the non-interactive tree produced by HS.

In STATICHS, there are only expansion phases (in case the tree pruning described by Definition 4.8 is considered part of an expansion phase) which means that the tree constructed by STATICHS will constantly grow (apart from the deleted duplicate nodes and non-minimal diagnoses). All the user can do is hope that Algorithm 5 applying STATICHS will not run out of memory (cf. Section 11.1).

The idea is now to be able to use DYNAMICHS instead of STATICHS particularly if the latter runs out of memory soon. If the leading diagnosis parameters are specified small enough to prevent the hitting set tree produced during one expansion phase from becoming too large and test cases are not chosen unfavorably, the DYNAMICHS method should be able to outperform STATICHS significantly, as Examples 11.2 and 12.2 suggest.

12.2 Algorithm Walkthrough

Input Parameters. When DYNAMICHS (Algorithm 8) is called for the first time in Algorithm 5, the inputs Ccalc, D✓, D×, P′ and N′ correspond to the empty set and Q = [∅] (cf. lines 1-4 and 10 in Algorithm 5). Further on, Dcalc is defined to be the empty set at the beginning of each execution of DYNAMICHS. That is, DYNAMICHS starts the construction of the hitting set tree from an initial tree consisting of a single unlabeled root node ∅ (∈ Q). And, all collections that are later returned by DYNAMICHS in line 25, except for Q, are initially empty. Further input arguments are the DPI ⟨K, B, P, N⟩R provided as an input to Algorithm 5, the sets of positively (P′) and negatively (N′) answered queries since the start of Algorithm 5 (both sets initially empty), the leading diagnosis computation parameters nmin, nmax, t (see description in Chapter 7 on page 95) and the probability measure p() := pK() that assigns a probability in the interval (0, 0.5) to each formula in K (see line 5 in Algorithm 5).

Tree Update during First Iteration of DYNAMICHS. Before the repeat-loop in DYNAMICHS is entered, the UPDATETREE function is called (line 4), but has no effect. This holds since UPDATETREE first iterates over all elements in D×, then over all elements in D⊃ and finally over all elements in D✓, where D× = D⊃ = D✓ = ∅, as pointed out before.

The Main Loop. In each iteration of the repeat-loop, the first node, denoted by node, in the queue Q of open (non-labeled) nodes is processed (GETFIRST, line 6). Notice that, anywhere throughout DYNAMICHS, nodes are added to Q in a way that a sorting of Q in descending order according to pnodes() (cf. Definition 4.9) is maintained (cf. INSERTSORTED in lines 17, 68, 77, 80, 100 and 103). Hence, the most probable node (according to pnodes()) is always processed next.

So, when node is processed, it is first deleted from Q (DELETEFIRST, line 7). Then a test is performed whether  node ∈ D✓, i.e. whether node is already known to be a minimal diagnosis w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N ′⟩R. In case this test is positive, node is directly added to  Dcalc, the set of leading diagnoses that will be output by the current call of DYNAMICHS. Otherwise, the DLABEL function is called given node (i.a.) as a parameter (line 11).

Computation of a Node Label. The DLABEL function processes node as follows. First, the non-minimality criterion (lines 27-29) is checked. That is, among all nodes in Dcalc, one is searched which is a proper subset of node. If such a node nd is found, then node must be a non-minimal diagnosis w.r.t. the current DPI since, anytime throughout the execution of DYNAMICHS, Dcalc contains only minimal diagnoses w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R (this will be proven later by Proposition 12.9). In this case, unlike in STATICHS, the branch in the hitting set tree corresponding to node cannot be simply discarded, but still needs to be stored (in the set D⊃). It is necessary to store non-minimal diagnoses as these might become minimal diagnoses w.r.t. the new DPI obtained after the subsequent addition of a new test case to the current DPI (cf. Proposition 12.5).

In case the non-minimality criterion is not satisfied, the reuse criterion (lines 30-40) is checked next. That is, the set Ccalc containing (not necessarily minimal) conflict sets w.r.t. the current DPI is browsed for a set C such that C and node are disjoint sets. If such a set C is found, there must be some set X ⊆ C which is a minimal conflict set w.r.t. the current DPI. This minimal conflict set X can then be used to label node since the set of edge labels along the path in the tree leading from the root node to node does not hit X (because it does not hit C).

The minimality of C is verified by a call of QX(⟨C, B, P ∪ P′, N ∪ N′⟩R) that yields X, a minimal conflict set w.r.t. the current DPI (cf. Proposition 4.9; notice that X must be a non-empty set due to Proposition 12.2, for details see Section 12.4). In case X ⊂ C (line 33), before X is returned as a label for node, the following tree pruning steps are performed:

All the conflict sets Ci used as node labels in the hitting set tree or in duplicate tree branches so far (i.e. Ci ∈ nd.cs for a node nd ∈ Q ∪ D⊃ ∪ Qdup) such that X ⊂ Ci are replaced by X (PRUNEQDUP and PRUNE in lines 36-38),

any subtree is pruned if its root node is linked to a node now labeled by X (replacing some Ci ⊃ X) by an edge with label ax where ax is in Ci \ X (PRUNEQDUP and PRUNE in lines 36-38),

for each pruned node nd, if there is a non-pruned node in Qdup suited to construct a node nd′ that can replace nd, nd′ is added to the collection of nodes from which nd was deleted (PRUNEQDUP and PRUNE in lines 36-38) and

all the conflict sets Ci ∈ Ccalc that are proper supersets of X are deleted from Ccalc and X is added to Ccalc (ADDSETDELSUPSETS in line 39).

Otherwise, C (= X) is directly returned by DLABEL without performing any tree pruning because the reused conflict set C is (still) a minimal conflict set w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R (notice that each element of Ccalc was added to Ccalc as a minimal conflict set w.r.t. some DPI ⟨K, B, P ∪ P′′, N ∪ N′′⟩R where P′′ ⊆ P′ and N′′ ⊆ N′ during the execution of this or a previous call of DYNAMICHS). For an in-depth explanation of the pruning functions PRUNE and PRUNEQDUP the reader is kindly referred to Section 12.4.6.
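The first two pruning steps listed above can be sketched in a few lines of Python. This is a simplified illustration and not the PRUNE or PRUNEQDUP function of Section 12.4.6: the replacement of pruned nodes by duplicates from Qdup is omitted, and the function name prune as well as the representation of a node as a pair (nd, nd_cs) of equally long lists are assumptions of this sketch.

```python
def prune(nodes, x):
    """Apply a reduced conflict set `x` to a collection of tree nodes.

    Each node is a pair (nd, nd_cs): nd is the list of edge labels on the
    path from the root, nd_cs[i] is the conflict set left via edge nd[i].
    """
    kept = []
    for nd, nd_cs in nodes:
        # The node lies in a pruned subtree if its path leaves some proper
        # superset c of x through an edge labeled by an element of c \ x.
        redundant = any(x < c and e in c - x for e, c in zip(nd, nd_cs))
        if redundant:
            continue
        # Otherwise every label that is a proper superset of x becomes x.
        kept.append((nd, [x if x < c else c for c in nd_cs]))
    return kept


ab, cd, b = frozenset("ab"), frozenset("cd"), frozenset("b")
open_nodes = [(["a"], [ab]), (["b"], [ab]), (["b", "c"], [ab, cd])]
pruned = prune(open_nodes, b)
```

With the made-up conflict set {a, b} reduced to {b}, the node [a] is pruned, whereas [b] and [b, c] survive and have {a, b} replaced by {b} in their label lists.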

Remark 12.2 During the execution of the first call of DYNAMICHS in Algorithm 5, no tree pruning can take place (neither within the scope of DLABEL nor anywhere else) since all elements of Ccalc (initially the empty set) must be minimal conflict sets w.r.t. the input DPI which is at the same time the current DPI. Pruning of the hitting set tree is only possible in case some non-leaf nodes of the tree are labeled by conflict sets that are not minimal w.r.t. the current DPI.

Given that the reuse criterion fails, QX is called with ⟨K \ node, B, P ∪ P′, N ∪ N′⟩R, i.e. the current DPI with K reduced by node, as an argument (line 41). If the output L is equal to ’no conflict’, then we know by Proposition 4.9 that node is a diagnosis w.r.t. the current DPI, wherefore the label valid is returned for node. Otherwise, the output L must be a minimal conflict set w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R that has an empty set-intersection with node. Since the reuse criterion failed, i.e. there is no set in Ccalc that does not intersect with node, L must be a fresh minimal conflict set w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R in the sense that L ∉ Ccalc must hold. Therefore the label L is first added to Ccalc and then returned by DLABEL as a label for node.
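The three stages of node labeling described above (non-minimality criterion, reuse criterion, fresh call of QX) can be summarized by the following Python sketch. It is a simplified illustration under stated assumptions and not the DLABEL function of Algorithm 8: tree pruning in the reuse branch is omitted, and QX is replaced by a hypothetical toy oracle make_toy_qx that looks up a hard-coded list of minimal conflict sets instead of calling a reasoner. All names and data are made up for this example.

```python
VALID, NONMIN = "valid", "nonmin"


def make_toy_qx(min_conflicts):
    """Toy stand-in for QX: return the first known minimal conflict set
    contained in `kb`, or None in place of 'no conflict'."""
    def qx(kb):
        for c in min_conflicts:
            if c <= kb:
                return c
        return None
    return qx


def dlabel(node, kb, d_calc, c_calc, qx):
    node_set = frozenset(node)
    # Non-minimality criterion: a known minimal diagnosis is a proper subset.
    if any(d < node_set for d in d_calc):
        return NONMIN
    # Reuse criterion: some stored conflict set is disjoint from node.
    for c in list(c_calc):
        if c.isdisjoint(node_set):
            x = qx(c)
            assert x is not None  # the text guarantees a non-empty X
            if x < c:
                # c is no longer minimal (tree pruning would happen here);
                # drop all proper supersets of x and store x instead.
                for ci in [ci for ci in c_calc if x < ci]:
                    c_calc.remove(ci)
                c_calc.add(x)
            return x
    # No reusable conflict set: compute a fresh one on K \ node.
    label = qx(kb - node_set)
    if label is None:
        return VALID
    c_calc.add(label)
    return label


kb = frozenset("abcd")
qx1 = make_toy_qx([frozenset("ab"), frozenset("bc")])
d_calc, c_calc = set(), set()
root_label = dlabel([], kb, d_calc, c_calc, qx1)    # fresh conflict {a, b}
label_b = dlabel(["b"], kb, d_calc, c_calc, qx1)    # valid
d_calc.add(frozenset("b"))
label_a = dlabel(["a"], kb, d_calc, c_calc, qx1)    # fresh conflict {b, c}
label_ab = dlabel(["a", "b"], kb, d_calc, c_calc, qx1)  # nonmin

# After a (made-up) test case, {a, b} is no longer minimal: {a} is.
qx2 = make_toy_qx([frozenset("a"), frozenset("bc")])
label_c = dlabel(["c"], kb, set(), c_calc, qx2)     # reuse yields {a}
```

The last call illustrates the reuse branch: the stored conflict set {a, b} is disjoint from the node [c], the oracle reduces it to {a}, and the collection of stored conflict sets is updated accordingly.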

Remark 12.3 Please notice that this call of QX to label a node is one of the key differences between STATICHS and DYNAMICHS. Whereas the former uses QX exclusively for the computation of minimal conflict sets w.r.t. the (static) input DPI exploiting just the initial sets of positive and negative test cases P and N, respectively, the latter employs QX to compute minimal conflict sets w.r.t. the (dynamic) current DPI which includes all new test cases (P′ and N′) resulting from answered queries in the ongoing interactive debugging session so far.

Processing of a Node Label. Back in the main procedure, the label L returned by the DLABEL function is processed as follows. If L = valid, then it is a fact that node is a minimal diagnosis w.r.t. the current DPI (cf. Proposition 12.9 in Section 12.4.9) wherefore node is added to the set Dcalc. Otherwise, if nonmin is the returned label for node, node is added to the set D⊃ of non-minimal diagnoses w.r.t. the current DPI. Otherwise, i.e. if L ∉ {valid, nonmin}, then L must be a minimal conflict set w.r.t. the current DPI (see the description of node label computation above). In this case, |L| successor nodes of node are generated (lines 18 and 19). For each logical formula e ∈ L, a new node is computed from node (and node.cs) as node_e := ADD(node, e) and node_e.cs := ADD(node.cs, L) which means that e is appended to the end of the list node and L is appended to the end of the list node.cs.

If there is already a node nd ∈ Q such that nd = node_e (line 20), where ’=’ applied to these lists means that the list nd interpreted as a set is equal to the list node_e interpreted as a set (cf. Section 12.4.1 for an explication of this notation), then there is already a branch in the existing tree which includes the same set of edge labels as the new node node_e. Note that the tree branch corresponding to nd will differ from the one corresponding to node_e in terms of the order of edge labels or (the order of) the node labels visited when traversed starting from the root node. Expanding two branches with equal sets of edge labels in a hitting set tree makes no sense (cf. rule 6 in Definition 4.8), both for time and space complexity reasons and because the sought diagnoses are sets (and not lists) of edge labels in the tree. Therefore, such a duplicate node node_e is stored in the separate list Qdup. This list Qdup is always kept sorted by ascending node-cardinality (INSERTSORTED in line 21).

The purpose of storing and not deleting such nodes is the possibility that the now “active” branch nd might be pruned after the addition of some test case whereas node_e might be unaffected by that pruning step. In this case, node_e, given it meets certain properties (see Section 12.4 for details), can be reactivated and incorporated into the tree in order to replace nd. Had node_e just been discarded instead of being stored, the completeness of Algorithm 5 with mode = dynamic would be violated in general. That is, we would not have any guarantee that all minimal diagnoses w.r.t. the current DPI are actually explored by the algorithm.

Otherwise, if there is no node in Q that is set-equal to node_e, then node_e is added to the k-th position in Q (INSERTSORTED in line 23) if there are (exactly) k − 1 nodes in Q that have a probability as per pnodes() that is greater than or equal to pnodes(node_e).
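The generation of successor nodes, the handling of duplicates and the sorted insertion into Q can be sketched as follows. This is an illustrative Python sketch only: the function names insert_sorted and expand, the node representation as a pair (nd, nd_cs), the fault probabilities and the node labels are all made up, and the formula used for p_nodes (product of the fault probabilities of the formulas in the node times the product of the complements for all other formulas in K) is our assumption about Definition 4.9, which is not restated in this chapter.

```python
from math import prod


def insert_sorted(queue, item, key):
    """Insert `item` behind all entries whose key is >= key(item), so that
    a queue sorted in descending order of `key` stays sorted."""
    k = 0
    while k < len(queue) and key(queue[k]) >= key(item):
        k += 1
    queue.insert(k, item)


def expand(node, node_cs, label, queue, q_dup, p_nodes):
    """Generate one successor of `node` per formula e in the conflict set
    `label`; set-equal duplicates of open nodes go to q_dup instead of queue."""
    for e in sorted(label):
        child = (node + [e], node_cs + [label])
        if any(set(nd) == set(child[0]) for nd, _ in queue):
            # q_dup is kept sorted by ascending node cardinality.
            insert_sorted(q_dup, child, key=lambda n: -len(n[0]))
        else:
            insert_sorted(queue, child, key=lambda n: p_nodes(n[0]))


K = ["a", "b", "c"]
p = {"a": 0.1, "b": 0.3, "c": 0.2}  # made-up fault probabilities


def p_nodes(nd):
    # Assumed form of p_nodes(); see the hedge in the lead-in.
    return prod(p[ax] if ax in nd else 1 - p[ax] for ax in K)


queue, q_dup = [], []
expand([], [], frozenset("ab"), queue, q_dup, p_nodes)
node, node_cs = queue.pop(0)  # most probable open node: [b]
expand(node, node_cs, frozenset("ac"), queue, q_dup, p_nodes)
node, node_cs = queue.pop(0)  # next open node: [a]
expand(node, node_cs, frozenset("bc"), queue, q_dup, p_nodes)
```

In this run the successor [a, b] of [a] is set-equal to the open node [b, a] and is therefore stored in q_dup rather than in the queue.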

Stop Criterion. The repeat-loop of DYNAMICHS is executed until the stop criterion in line 24 is satisfied. The first criterion causing DYNAMICHS to terminate is Q = [] which means that the complete hitting set tree has been constructed and no further nodes can be labeled. In this case, Dcalc comprises all minimal diagnoses w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R (cf. Proposition 12.8).

If the first criterion is not met, then the second criterion is checked. That is, a test is performed which checks first whether there is at least one new diagnosis w.r.t. the current DPI in Dcalc which was not returned by the last-but-one call of DYNAMICHS (i.e. which is not an element of D✓). Notice that this criterion or Q = [] will definitely be met after finite execution time of DYNAMICHS since either new nodes in Q will be processed (and labeled) until some new diagnosis w.r.t. the current DPI is identified or Q will become empty.

Additionally, the second criterion involves a test that checks whether the cardinality of Dcalc amounts to at least nmin and either |Dcalc| = nmax or more than t time has passed since the start of the execution of DYNAMICHS. In the latter case, nmin ≤ |Dcalc| < nmax holds. In the former case, |Dcalc| = nmax is satisfied.
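Read as a Boolean condition, the stop criterion described above can be sketched in Python as follows. The function name stop_criterion and its parameter names are assumptions of this sketch (d_check stands for D✓, elapsed for the time passed since the start of the current execution); the actual condition is the one in line 24 of the algorithm.

```python
def stop_criterion(queue, d_calc, d_check, n_min, n_max, elapsed, t):
    # First criterion: the tree is complete, nothing is left to label.
    if not queue:
        return True
    # Second criterion, part one: at least one diagnosis not already in D✓.
    has_new = any(d not in d_check for d in d_calc)
    # Second criterion, part two: enough leading diagnoses were found.
    enough = len(d_calc) >= n_min and (len(d_calc) == n_max or elapsed > t)
    return has_new and enough


d1, d2 = frozenset("a"), frozenset("bc")
```

For instance, with n_min = n_max = 2, a call with two computed diagnoses of which one is new returns True regardless of the time limit, whereas a call with only already known diagnoses returns False as long as the queue is non-empty.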

Processing of the Leading Diagnoses Returned by DYNAMICHS. When a call of DYNAMICHS in Algorithm 5 returns ⟨Dcalc, Q, Ccalc, D×, D⊃, Qdup⟩, the set Dcalc is stored in the variable D✓ in Algorithm 5. Between two successive calls of DYNAMICHS in Algorithm 5, only this set D✓ as well as D× are modified. The collections Q, Ccalc, D⊃ as well as Qdup remain unchanged until they are used as input parameters when it comes to the next call of DYNAMICHS in Algorithm 5.

In case one diagnosis Dmax of the current leading diagnoses in D✓ has a probability greater than or equal to 1 − σ as per the probability measure pD() (see Section 9.2), the stop criterion of interactive KB debugging is met and the solution KB (K \ Dmax) ∪ U_{P∪P′} w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R is returned to the user (GETSOLKB in line 14, cf. Section 9.2). Thereafter, Algorithm 5 terminates and no more calls of DYNAMICHS take place.

Otherwise, if no leading diagnosis satisfies the stop criterion, a query Q together with its q-partition P(Q) is computed as has been detailed in Chapter 8 and Section 9.2. An answer u(Q) to this query is submitted by the interacting user (line 17 in Algorithm 5). Then u(Q) along with P(Q) is exploited to figure out the subset Dout of D✓ that does not comply with u(Q). This set Dout is then deleted from D✓ and added to D×. Additionally, Q is added to the positive test cases P′ if u(Q) = true and to the negative test cases N′ otherwise. Subsequently, DYNAMICHS is called again given

the updated parameters D✓, D×, P′ and N′ (which are modified within and outside of DYNAMICHS during the execution of Algorithm 5),

the unchanged parameters Q, Ccalc, D⊃ and Qdup (which are modified only within DYNAMICHS during the execution of Algorithm 5) and

the constant parameters ⟨K, B, P, N⟩R, t, nmin, nmax and pK() (which are not modified within or outside of DYNAMICHS during the execution of Algorithm 5).
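The bookkeeping between two calls of DYNAMICHS described above can be sketched as follows. This Python sketch rests on an assumption about the q-partition that is not spelled out in this section: P(Q) is taken to be a triple whose first component holds the leading diagnoses consistent only with a positive answer, whose second component holds those consistent only with a negative answer, and whose third component holds the remaining ones. The function name process_answer and all data are hypothetical.

```python
def process_answer(query, answer, q_partition, d_check, d_cross, p_new, n_new):
    """Update D✓ (d_check), D× (d_cross), P′ (p_new) and N′ (n_new) in place
    after the user has answered `query` with `answer` (True or False)."""
    d_plus, d_minus, _d_zero = q_partition
    # Diagnoses that do not comply with the answer.
    d_out = (d_minus if answer else d_plus) & d_check
    d_check -= d_out
    d_cross |= d_out
    (p_new if answer else n_new).append(query)
    return d_out


d1, d2, d3 = frozenset("a"), frozenset("b"), frozenset("cd")
d_check, d_cross = {d1, d2, d3}, set()
p_new, n_new = [], []
d_out = process_answer("Q1", True, ({d1}, {d2}, {d3}),
                       d_check, d_cross, p_new, n_new)
```

Only d_check, d_cross, p_new and n_new change here; the collections Q, Ccalc, D⊃ and Qdup are passed on to the next call untouched.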

The execution of this next and any subsequent call to DYNAMICHS proceeds analogously to what has been described so far, except for the effect of the UPDATETREE function called at the very beginning of each execution of DYNAMICHS (recall that the execution of UPDATETREE had no effect during the first execution of DYNAMICHS). We shall now explicate how this function works in all executions of DYNAMICHS other than the first one.

Tree Update. Between line 48 and line 69, UPDATETREE goes through all nodes nd ∈ D× (recall that D× includes exactly those diagnoses that have been ruled out by the most recently answered query) and first performs the Quick Redundancy Check (QRC, lines 50-54) for nd. If the QRC is not successful, it additionally performs the Complete Redundancy Check (CRC, lines 56-60) for nd.

The QRC (for details see Lemma 12.6) aims at identifying whether nd is redundant and can be pruned, i.e. it attempts to find a witness of redundancy of nd. Informally, a redundant node in (redundant subtree of) the tree is a node (subtree) such that the further expansion of the current tree without this node (subtree) still leads to the detection of all minimal diagnoses w.r.t. the current DPI. A witness of redundancy of nd is a minimal conflict set C′ w.r.t. the current DPI such that a superset C ⊃ C′ was used as a node label on the tree path nd represents (that is, there is some i ≤ |nd.cs| such that C is the i-th element of nd.cs, i.e. C = nd.cs[i]) and the label (nd[i]) of the outgoing edge of C on the path represented by nd is an element not in C′ (that is, an element in C \ C′). Formal and precise characterizations of redundancy of nodes and the witness of redundancy of a node are given by Definition 12.4 in Section 12.4.5.

To this end, the QRC involves the call of QX(⟨U_{nd.cs} \ nd, B, P ∪ P′, N ∪ N′⟩R) which returns X, where U_{nd.cs} denotes the union of all conflict sets in nd.cs. If X is a set (and not ’no conflict’), then X is a minimal conflict set w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R (as U_{nd.cs} \ nd ⊆ K, cf. Proposition 4.9). To check if X is in fact a witness of redundancy of nd, X ⊂ C (line 52) is tested for all C ∈ nd.cs. If such a C is located, X is a witness of redundancy of nd and the QRC is successful (expressed by quickRC ← true in line 53). In this case, the execution is resumed at line 61.

The QRC bears its name due to the fact that it requires at most one call of QX (which internally performs expensive calls to a reasoner). Moreover, it passes to QX a (DPI including a) KB of a size that is generally significantly smaller than |K| where |K| is roughly the size of the KB used in the (more expensive) calls of QX made in the DLABEL function. Hence, the QRC will usually be very fast (cf. Proposition 4.8).

Otherwise, since the negative outcome of the QRC (which is sound, but not complete w.r.t. the finding of a witness of redundancy of nd) does not imply the non-existence of a witness of redundancy of nd, the CRC (for details see Lemma 12.7) must be performed. As the name already suggests, the CRC is sound and complete and will therefore be positive and yield a witness of redundancy if and only if one exists. The CRC involves multiple calls of QX(⟨nd.cs[i] \ {nd[i]}, B, P ∪ P′, N ∪ N′⟩R), one for each conflict set nd.cs[i] in nd.cs. It is straightforward from the characterization of a witness of redundancy given before that, given the CRC returns a set X, X is a witness of redundancy of nd.

If nd is non-redundant, there cannot be any witness of redundancy of nd. Hence, the complete and sound method CRC will not find one. Therefore, quickRC = false and completeRC = false must hold in line 61. In this case, the for-loop in line 48 continues with the next node in D×.

On the other hand, if nd is redundant, due to the completeness of CRC, either quickRC = true or completeRC = true must hold when it comes to the execution of the if-statement in line 61. At this point, it is guaranteed that the variable X stores a witness of redundancy of nd.
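The two redundancy checks can be sketched in Python as follows. The sketch is an illustration under stated assumptions and not the code of UPDATETREE: QX is again replaced by a hypothetical toy oracle make_toy_qx over a hard-coded list of minimal conflict sets, and the function names quick_rc, complete_rc and find_witness as well as the example data are made up.

```python
def make_toy_qx(min_conflicts):
    """Toy stand-in for QX: return the first known minimal conflict set
    contained in `kb`, or None in place of 'no conflict'."""
    def qx(kb):
        for c in min_conflicts:
            if c <= kb:
                return c
        return None
    return qx


def quick_rc(nd, nd_cs, qx):
    """QRC: a single QX call on the union of nd.cs without the labels in nd."""
    x = qx(frozenset().union(*nd_cs) - frozenset(nd))
    if x is not None and any(x < c for c in nd_cs):
        return x
    return None


def complete_rc(nd, nd_cs, qx):
    """CRC: one QX call per conflict set nd.cs[i], each without nd[i]."""
    for e, c in zip(nd, nd_cs):
        x = qx(c - {e})
        if x is not None:
            return x
    return None


def find_witness(nd, nd_cs, qx):
    x = quick_rc(nd, nd_cs, qx)
    return x if x is not None else complete_rc(nd, nd_cs, qx)


nd, nd_cs = ["a", "c"], [frozenset("abe"), frozenset("cd")]
# Made-up minimal conflict sets w.r.t. the current DPI.
qx = make_toy_qx([frozenset("de"), frozenset("be")])
quick = quick_rc(nd, nd_cs, qx)
witness = find_witness(nd, nd_cs, qx)
```

In this toy run the QRC fails because the oracle happens to return {d, e}, which is not a proper subset of any conflict set in nd.cs. The CRC then finds the witness {b, e}, a proper subset of {a, b, e} that does not contain the edge label a.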

The CRC, contrary to the QRC, generally requires multiple (at most |nd|) calls of QX (which internally performs expensive calls to a reasoner). But, like the QRC, it passes to QX a (DPI including a) KB of a size that is generally significantly smaller than |K|. Furthermore, at most one call of QX will involve more than one call of ISKBVALID (see Algorithm 1), i.e. the function that calls the reasoner. This must be true since CRC only requires an additional call of QX if a witness of redundancy has not yet been found. And, each call of QX that does not find a witness of redundancy of nd returns ’no conflict’ which necessitates only a single invocation of ISKBVALID. Hence, each execution of the CRC will be very fast in general as well (cf. Proposition 4.8).

What comes next is the pruning of all redundant nodes in the tree for which X is a witness of redundancy. Essentially, the same pruning steps are performed here as in the reuse criterion described in ’Computation of a node label’ above. A detailed discussion of the pruning functions PRUNE as well as PRUNEQDUP can be found in Section 12.4.6.

Notice that a redundant node is guaranteed to be a redundant node in any further iteration of DYNAMICHS (using a new current DPI that incorporates new test cases). We will prove this by Lemma 12.4 in Section 12.4.5. So, nodes pruned by PRUNE or PRUNEQDUP can be deleted for good and do not need to be stored any longer. Moreover, it should be noted that only redundant nodes are pruned at any pruning step in DYNAMICHS. For, as long as a node in DYNAMICHS is not known to be redundant, some successor node of this node might be a minimal diagnosis w.r.t. the current DPI. Thus, the deletion of such a node could prevent the algorithm from finding a particular minimal diagnosis, which would imply the algorithm’s incompleteness.

Remark 12.4 Since the removal of a node from a collection S ∈ {D×, Q, Qdup, D⊃} within the scope of PRUNE or PRUNEQDUP can be followed by the re-addition to S of a suitable duplicate node constructed from a node stored in Qdup (see Section 12.4.6 for a precise explanation of node replacements), D× might be changed both in that nodes are deleted from it and added to it during the for-loop (line 48). Therefore, the ’for nd ∈ D×’-statement must be read as ’if nd is a node in the current set D× which has not yet been processed’. For better code readability, we abstained from using a programmatically precise representation of this issue in Algorithm 9.

Due to the soundness and completeness of QRC paired with CRC concerning the identification of a witness of redundancy for a given node and the accomplished pruning of (at least) all nodes in D× for which a witness of redundancy has been extracted, all nodes that are in D× when the algorithm reaches line 67 are non-redundant nodes. Consequently, there is no evidence to exclude the remaining nodes in D× from the further search for minimal diagnoses. For this reason, each of these nodes is reinserted into Q by INSERTSORTED in line 68 such that the sorting of Q in descending order of pnodes() is maintained. Then these nodes are deleted from D×. Thus, D× = ∅ holds after each execution of UPDATETREE.

So, in DYNAMICHS, unlike in STATICHS, diagnoses (and nodes in general) are not ruled out due to the fact that they contradict an answered query, but only if they are (found to be) redundant. Nevertheless, a diagnosis that contradicts an answered query is a “hot candidate” for finding some witness of redundancy. For that reason, UPDATETREE searches for witnesses of redundancy (only) by means of D× which includes the most “suspicious” nodes. Namely, it comprises those nodes that were minimal diagnoses w.r.t. the last-but-one DPI, but have been invalidated by the most recently answered query. The two possible reasons for a diagnosis nd to be invalidated are its redundancy as defined above or that it does not hit a new minimal conflict set (which is not a subset of one in nd.cs) that has been introduced by the addition of the test case resulting from the user’s query answer. Thus, it is likely to detect witnesses of redundancy by investigating nodes in D×, as the QRC and the CRC do. Throughout the pruning steps performed in lines 62-65, witnesses of redundancy extracted from nodes in D× are exploited to remove redundant nodes in the other collections Qdup, D⊃ and Q as well.

Remark 12.5 It should be noted that the collections Q as well as D⊃ are not necessarily cleaned from all redundant nodes after all pruning steps in UPDATETREE are finished. At this point, all those redundant nodes are still elements of these collections for which no witness of redundancy was found (there might exist one, though) throughout the redundancy checks (QRC and CRC) performed.

Assuring the non-existence of redundant nodes in Q and D⊃ might involve extensive usage of the (expensive) reasoner. In the worst case, one call of QX for each non-leaf node along each path from the root node to a leaf node labeled by nonmin or to a leaf node that has no label would be necessary. However, the number of these non-leaf nodes is generally exponential in the maximum length of such a path in the tree. In comparison, the number of calls of QX for investigating all nodes in D× by QRC and CRC is polynomial (linear) in the maximum length of a tree path labeled by ×. For, the number of QX-calls cannot get larger than (nmax − 1)(|ndmax| + 1) where the constant nmax is the maximum number of desired leading diagnoses predefined by the user and |ndmax| is the maximum cardinality of some nd ∈ D×. This holds since |D×| ≤ nmax − 1 (cf. Corollary 7.3) and QRC requires at most one and CRC at most |ndmax| QX-calls.

Other than that, the chance of locating new witnesses of redundancy by means of investigating nodes in Q and D⊃ can be assumed to be smaller than for nodes in D× since there is no indication or evidence that these nodes might be redundant. So, cleaning Q and D⊃ from all redundant nodes might be significant effort with negligible impact. Therefore, DYNAMICHS is designed to focus the search for witnesses of redundancy only on the “suspicious nodes” in D×.

As mentioned above, when the execution arrives at line 70, only nodes that are definitely redundant (because they were deleted due to some witness of redundancy) have been deleted from the sets Q, D×, D⊃ and Qdup.

In lines 70-78, each node nd ∈ D⊃ which has not been deleted throughout the pruning operations in line 65 is processed as follows: If there is no minimal diagnosis D ∈ D✓ such that nd ⊃ D, then nd is removed from D⊃ and reinserted into Q (lines 77 and 78) such that the sorting of Q in descending order according to pnodes() is maintained (INSERTSORTED). This re-insertion is plausible since there is no more evidence of nd (which is a non-minimal diagnosis w.r.t. the last-but-one DPI) being a non-minimal diagnosis w.r.t. the current DPI (non-minimal diagnoses might become minimal diagnoses by the addition of test cases, cf. Section 12.4.3 and Proposition 12.5).

Otherwise, nd remains an element of the set of non-minimal diagnoses D⊃ w.r.t. the current DPI as D✓ comprises exclusively minimal diagnoses w.r.t. the current DPI and one of these is a proper subset of nd.

In lines 79-80, all elements in D✓, each of which is a minimal diagnosis w.r.t. the current DPI, are added to Q such that the sorting of Q in descending order according to pnodes() is maintained.
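The final steps of UPDATETREE (lines 67-80), which run after all redundancy checks and pruning steps, can be sketched as follows. This Python sketch is a simplification under stated assumptions: nodes are plain tuples of edge labels (the lists nd.cs are left out), a stable sort replaces the repeated calls of INSERTSORTED, and the scoring function toy_p_nodes is a made-up stand-in for pnodes() that merely prefers shorter nodes. The function name requeue is hypothetical.

```python
def requeue(d_cross, d_sup, d_check, queue, p_nodes):
    """Move nodes back into the queue of open nodes (all arguments are lists
    of tuples and are modified in place, except for d_check)."""
    # Lines 67-69: surviving (non-redundant) nodes of D× are reopened.
    queue.extend(d_cross)
    d_cross.clear()
    # Lines 70-78: a node in D⊃ is reopened unless some diagnosis in D✓
    # is a proper subset of it.
    for nd in list(d_sup):
        if not any(frozenset(d) < frozenset(nd) for d in d_check):
            d_sup.remove(nd)
            queue.append(nd)
    # Lines 79-80: the known minimal diagnoses in D✓ are queued as well.
    queue.extend(d_check)
    # Restore the descending order according to p_nodes (stable sort).
    queue.sort(key=p_nodes, reverse=True)


def toy_p_nodes(nd):
    # Made-up score for illustration only: shorter nodes are more probable.
    return 0.25 ** len(nd)


d_cross = [("a",)]
d_sup = [("b", "c"), ("c", "d")]
d_check = [("b",)]
queue = [("d",)]
requeue(d_cross, d_sup, d_check, queue, toy_p_nodes)
```

After the call, D× is empty, (b, c) stays in D⊃ because the minimal diagnosis (b) is a proper subset of it, and D✓ itself is left unchanged so that its elements can later be recognized without a further call of DLABEL.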

Remark 12.6 Please notice that the elements of D✓, although they are known to be minimal diagnoses w.r.t. the current DPI, are not directly added to the set of found leading diagnoses Dcalc w.r.t. the current DPI, but to Q. The reason for this is that there might be (not-yet-found) minimal diagnoses w.r.t. the current DPI (nodes in Q or successor nodes thereof) which were not minimal diagnoses w.r.t. the last-but-one DPI (and thus are no elements of D✓) that have a higher probability as per pnodes() than elements of D✓. For instance, such diagnoses might have been added to Q from the set D⊃ in line 77.

In this way, since always the first (and most probable) node in Q is processed next, a guarantee is given that Dcalc always comprises the |Dcalc| most probable minimal diagnoses w.r.t. the current DPI as per pnodes(). The knowledge of the validity of minimal diagnoses in D✓ w.r.t. the current DPI is however not forgotten, but exploited in line 12 (i.e. no call of DLABEL and QX is necessary for a node in D✓ to be added to Dcalc), as elucidated in ’The main loop’ above.

12.3 Illustrating Examples

In this section we will give two examples of how interactive KB debugging using DYNAMICHS (Algorithm 5 with parameter mode = dynamic) works. The first one will show the similarities and differences between the usage of DYNAMICHS (within Algorithm 5) and HS (within Algorithm 3) since it will depict the application of STATICHS on the same example DPI (see Table 15.3) that was used to show the functionality of HS in Examples 4.8 and 4.9. At the same time, the first example will provide evidence that solving the problem of Interactive Dynamic KB Debugging can be less efficient than solving the problem of Interactive Static KB Debugging in terms of the number of query answers required from an interacting user. This will be discussed in more detail in Chapter 13.

The second example is supposed to deepen the reader’s understanding of the way DYNAMICHS works. To this end, the example DPI provided by Table 4.2 will be used which constitutes a significantly harder (interactive) debugging task than the DPI investigated in the first example. This example will involve the construction of a relatively large hitting set tree in the first iteration of DYNAMICHS (which behaves very similarly to STATICHS as well as HS and constructs the same wpHS-tree as these methods), but will then show the power of the tree pruning that can be exploited in Interactive Dynamic KB Debugging in that the tree will shrink rapidly after the addition of test cases. Hence, this example will emphasize the advantage of the decision to search for a solution of Interactive Dynamic KB Debugging rather than for a solution of Interactive Static KB Debugging (more on that in Chapter 13).

Notice that, in the following examples, whenever some tuple or list occurs in an expression using set operators, it is interpreted as a set.

Example 12.1 In this example we assume that the author (called user throughout this example) of the (admissible) DPI ⟨K, B, P, N⟩R given by Table 15.3 applies Algorithm 5 with mode = dynamic to interactively debug ⟨K, B, P, N⟩R. Further, the same scenario and parameter settings as in Example 11.1 are supposed. That is, nmin = nmax = 2 (notice that the time limit t is irrelevant in this case), q := 1 (cf. Chapter 8), qsm() is equal to any query selection measure described in Section 9.3, pK(ax) := c < 0.5 for all ax ∈ K, i.e. all formula fault probabilities are specified to be equal (to some constant c), and σ := 0.

The tree constructed and parameters computed and used by Algorithm 5 using DYNAMICHS are visualized by Figures 12.1 and 12.2. We use the same notation as in Figures 4.2, 4.3, 11.1, 11.2 and 11.3 which is described in Examples 4.8, 4.9, 11.1 and 11.2.

In the first iteration, i.e. during the execution of the first call of DYNAMICHS during Algorithm 5, the root node (initially the empty set) is labeled by the minimal conflict set ⟨1, 2, 5⟩ w.r.t. ⟨K, B, P, N⟩R and three successor nodes, namely nd1 := [1], nd2 := [2] as well as nd3 := [5] with nd1.cs = nd2.cs = nd3.cs = [⟨1, 2, 5⟩], are added to the queue of open nodes Q. Since all formulas have been assigned an equal fault probability, DYNAMICHS conducts a breadth-first tree construction (as displayed by the numbers i⃝ that give the order of node labeling). That is, Q in this case is a first-in-first-out queue. In this vein, first [1] and then [2] are identified as minimal diagnoses w.r.t. the given DPI.

Since Dcalc = {[1], [2]} has a cardinality of nmin = nmax = 2, the stop criterion of DYNAMICHS causes it to terminate and return ⟨Dcalc, Q, Ccalc, D×, D⊃, Qdup⟩ = ⟨{[1], [2]}, [[5]], {⟨1, 2, 5⟩}, ∅, ∅, []⟩, as shown in the upper right column in Figure 12.1.

Then, in Algorithm 5, outside of the DYNAMICHS procedure, the first query Q1 = {E → ¬A} is computed from the leading diagnoses set {[1], [2]}. The q-partition P(Q1) associated with Q1 is ⟨{[1]}, {[2]}, ∅⟩. The user’s answer u(Q1) to Q1 is then false. Thence, the set Dout is calculated from P(Q1) as D+(Q1) = {[1]} (due to the negative answer, cf. Remark 7.4), deleted from D✓ := D✓ ∪ Dcalc to yield D✓ = {[2]} and added to D× to yield D× = {[1]}. Now, the set D✓ corresponds to the set of all computed (i.e. added to Dcalc) minimal diagnoses w.r.t. the last-but-one DPI ⟨K, B, P, N⟩R that are minimal diagnoses w.r.t. the current DPI ⟨K, B, P, N ∪ {Q1}⟩R, i.e. that satisfy the most recently answered query Q1. The set D× comprises all computed (i.e. added to Dcalc) minimal diagnoses w.r.t. the last-but-one DPI ⟨K, B, P, N⟩R that are not minimal diagnoses w.r.t. the current DPI ⟨K, B, P, N ∪ {Q1}⟩R, i.e. that do not satisfy the most recently answered query Q1.
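The bookkeeping described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code of Algorithm 5: the function name `split_leading_diagnoses` and the representation of diagnoses as frozensets are hypothetical. The rule for a negative answer (D+(Q) is ruled out) is taken from the text; the symmetric rule for a positive answer (D−(Q) is ruled out) is the standard complement and is assumed here.

```python
def split_leading_diagnoses(d_calc, d_check, d_cross, q_partition, answer):
    """Return the updated pair (D_check, D_cross) after a query is answered.

    q_partition is the triple (D_plus, D_minus, D_zero); answer is a bool.
    All names are illustrative assumptions, not identifiers from the thesis.
    """
    d_plus, d_minus, _d_zero = q_partition
    # A negative answer rules out D+, a positive answer rules out D-.
    d_out = set(d_minus) if answer else set(d_plus)
    new_check = (set(d_check) | set(d_calc)) - d_out
    new_cross = set(d_cross) | d_out
    return new_check, new_cross


D1, D2 = frozenset({1}), frozenset({2})
d_check, d_cross = split_leading_diagnoses(
    d_calc={D1, D2},
    d_check=set(),
    d_cross=set(),
    q_partition=({D1}, {D2}, set()),
    answer=False,  # u(Q1) = false in Example 12.1
)
print(d_check, d_cross)
```

With the values of Example 12.1 this yields D✓ = {[2]} and D× = {[1]}, matching the sets stated above.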

These sets D✓ and D× along with the collections Q, Qdup, D⊃ and Ccalc, which are unmodified outside of DYNAMICHS, are used as input arguments for the second call of DYNAMICHS. Notice that, in Figures 12.1 and 12.2, the resulting values of operations performed within DYNAMICHS are given in the righthand column above the dashed line whereas values computed outside of DYNAMICHS are given below the dashed line.

The execution of the second call of DYNAMICHS starts with a call of the UPDATETREE function. The purpose of this function is to transform the hitting set tree T that was constructed by the first call of DYNAMICHS into an updated hitting set tree T′. Whereas the tree T was used to locate minimal diagnoses w.r.t. the last-but-one DPI ⟨K, B, P, N⟩R, the modified tree T′ should serve to generate minimal diagnoses w.r.t. the current DPI ⟨K, B, P, N ∪ {Q1}⟩R. The parameters D✓, D×, Q, Qdup, D⊃ and Ccalc that represent the tree T (given at the top of the lefthand column in Figure 12.1), where D✓ ∪ D× is equal to the set Dcalc produced by the first call of DYNAMICHS, are i.a. given as input arguments to the UPDATETREE function.

As a first step within UPDATETREE, a redundancy check is performed for each diagnosis in D×. In this case D× = {D1} since D1 is the only minimal diagnosis that has been ruled out by the most recently added negative test case Q1. The purpose of the redundancy check is to figure out whether D1 is redundant w.r.t. the current DPI and must be pruned or whether it might be extended to become a minimal diagnosis w.r.t. the current DPI.

First, the Quick Redundancy Check (QRC) QX(⟨{2, 5}, B, P, N ∪ {Q1}⟩R) = ⟨2, 5⟩ (line 50 in DYNAMICHS) is executed for D1, which detects (line 52 in DYNAMICHS) that D1 (and possibly some further nodes) is redundant and can be pruned. This holds since the minimal conflict set ⟨1, 2, 5⟩ w.r.t. the last-but-one DPI ⟨K, B, P, N⟩R is not a minimal conflict set w.r.t. the current DPI ⟨K, B, P, N ∪ {Q1}⟩R because ⟨2, 5⟩ returned by QX is already a minimal conflict set w.r.t. the current DPI (cf. Proposition 4.9). We call the minimal conflict set ⟨2, 5⟩ a witness of redundancy for D1. Hence, all branches in the hitting set tree starting from the outgoing edge of ⟨1, 2, 5⟩ labeled by 1 can be safely deleted from all collections representing the new tree T′ (warranted that all minimal diagnoses w.r.t. the current DPI can still be generated from the pruned tree T′).

Please notice that the QRC involves only a single call of QX using a KB of a size (here: 2) that is generally significantly smaller than |K| (here: 7) which is roughly the size of the KB used in calls of QX made in the DLABEL function. Hence, the QRC will be usually very fast.

An illustration why ⟨2, 5⟩ “replaces” ⟨1, 2, 5⟩ as a minimal conflict set w.r.t. the current DPI can be given as follows: First, ⟨1, 2, 5⟩ is a minimal conflict set w.r.t. ⟨K, B, P, N⟩R as it is a set-minimal subset of K that entails {¬A} = n1 ∈ N, there is no other negative test case in N except for n1 and there is no proper subset C′ of ⟨1, 2, 5⟩ where C′ ∪ B ∪ UP violates any r ∈ R (see Example 4.2 for a detailed explanation). Second, formula 2 implies in particular E → Y which, along with formula 5 (Y → ¬A), yields E → ¬A. As the negative answer to Q1 is equivalent to postulating that {E → ¬A} must not be entailed by the KB desired by the user, we have that ⟨2, 5⟩ is a conflict set w.r.t. ⟨K, B, P, N ∪ {Q1}⟩R. As neither {2} nor {5} is an invalid KB w.r.t. ⟨·, B, P, N ∪ {Q1}⟩R (cf. Corollary 4.1 and Definition 4.1), we have that ⟨2, 5⟩ is a minimal conflict set w.r.t. ⟨K, B, P, N ∪ {Q1}⟩R.

Because the QRC has been successful, yielding some witness of redundancy of D1, the Complete Redundancy Check (CRC) is no longer necessary and the collections Qdup, Q, D× as well as D⊃ are processed by the PRUNE and PRUNEQDUP functions, respectively, which involve the removal of all nodes in these collections that are redundant due to the witness ⟨2, 5⟩. In other words, all nodes are eliminated which correspond to a path in the tree that includes a node label Cold ⊃ ⟨2, 5⟩ such that the label e of the outgoing edge of Cold on this path is an element of Cold \ ⟨2, 5⟩. Moreover, all the supersets of ⟨2, 5⟩ in Ccalc (here, only ⟨1, 2, 5⟩) are replaced by ⟨2, 5⟩ since they are not minimal conflict sets anymore (ADDSETDELSUPSETS).
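The pruning criterion just stated can be written down compactly. The following is a minimal sketch, assuming a node is given as a list of edge labels `nd` together with the list `cs` of conflict sets along its path; the function names are illustrative and do not reproduce the PRUNE, PRUNEQDUP or ADDSETDELSUPSETS procedures of the thesis.

```python
def is_redundant(nd, cs, witness):
    """True if the path leaves some conflict set C_old that is a proper
    superset of `witness` via an edge label lying in C_old minus witness."""
    witness = frozenset(witness)
    return any(
        witness < frozenset(c_old) and edge in frozenset(c_old) - witness
        for edge, c_old in zip(nd, cs)
    )


def add_set_del_supersets(conflicts, witness):
    """Replace every superset of `witness` in `conflicts` by `witness`."""
    witness = frozenset(witness)
    kept = {frozenset(c) for c in conflicts if not witness <= frozenset(c)}
    return kept | {witness}


# Example 12.1: the witness <2, 5> makes node [1] redundant, but not node [5].
print(is_redundant([1], [{1, 2, 5}], {2, 5}))   # True
print(is_redundant([5], [{1, 2, 5}], {2, 5}))   # False
print(add_set_del_supersets([{1, 2, 5}], {2, 5}))
```

The check is purely set-based and needs no reasoner call, which is why pruning is cheap once a witness of redundancy is known.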

The pruning of nodes is expressed by dashed arrows in the pictures labeled by ‘Updated Tree’ in Figures 12.1 and 12.2 where the location of cutting a branch is marked by a crossline at the shaft of a dashed arrow. Furthermore, the elements of “old” minimal conflict sets that are no longer elements of known (i.e. already computed) current minimal conflict sets are crossed out. As shown by the picture ‘Updated Tree’ in the righthand column of Figure 12.1, D1 is the only removed node during the pruning steps using the witness of redundancy ⟨2, 5⟩.

Since D⊃ = ∅, UPDATETREE directly jumps to the last three lines where all elements of D✓ are re-added to Q in sorted order (but at the same time remain elements of D✓). In the figure, this is displayed by the Q1=⇒ pointing to a question mark (which stands for an open node) instead of a checkmark as in the case of the STATICHS algorithm. Notice that, although it is a fact that all elements of D✓ are minimal diagnoses w.r.t. the current DPI, this step is necessary in order to make sure the set Dcalc returned by any call of DYNAMICHS actually comprises the |Dcalc| most probable minimal diagnoses w.r.t. the current DPI. For, there might be, for instance, some node that is a non-minimal diagnosis w.r.t. the last-but-one DPI (and is thus not an element of D✓), but becomes a minimal diagnosis w.r.t. the current DPI and has a higher probability than some node in D✓. Additionally, we want to point out that no calls of the DLABEL procedure are needed for diagnoses in D✓ as we know their label must be valid. This is reflected by the test in line 8 in DYNAMICHS.
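The shortcut for members of D✓ can be pictured as a guard in front of the expensive labeling step. The sketch below is an assumption-laden illustration: `label` and `fake_dlabel` are hypothetical names, and `fake_dlabel` merely records that it was invoked instead of consulting a reasoner as DLABEL would.

```python
def label(node, d_check, dlabel):
    """Label `node`, skipping the expensive labeling for known diagnoses."""
    if frozenset(node) in d_check:
        return "valid"      # known minimal diagnosis: no reasoner call needed
    return dlabel(node)     # stand-in for the DLABEL procedure


calls = []


def fake_dlabel(node):
    # Hypothetical stand-in that only records its invocation.
    calls.append(list(node))
    return "needs-labeling"


d_check = {frozenset({2})}  # D_check after the first query in Example 12.1
labels = [label([2], d_check, fake_dlabel), label([5], d_check, fake_dlabel)]
print(labels, calls)
```

Only the node [5] reaches the labeling stand-in; the node [2] is accepted directly because it is a member of D✓.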

In the figure, all the updated collections D⊃, Ccalc, Q as well as Qdup, after being processed by UPDATETREE, are shown at the bottom of fields labeled by UPDATETREE. We want to remark that D× is always the empty set at the end of the execution of UPDATETREE since each node in D× either gets pruned or is reinserted into Q as an open node. These updated collections represent the new pruned hitting set tree that can be further constructed in order to detect all and only minimal diagnoses w.r.t. the current DPI ⟨K, B, P, N ∪ {Q1}⟩R. Note that the actions carried out by UPDATETREE take place between steps 4⃝ and 5⃝.

The expansion of this tree during the repeat-loop in DYNAMICHS is depicted by the picture named ‘Iteration 2’ in Figure 12.1. Namely, first (step 5⃝) the node [2] is directly labeled by valid (line 8) since it is a known minimal diagnosis w.r.t. the current DPI (as explained before). In the sixth step, [5] is labeled by the minimal conflict set ⟨1, 2, 7⟩ w.r.t. the current DPI and three further nodes ([5, 1], [5, 2] and [5, 7], all with nd.cs = [⟨2, 5⟩, ⟨1, 2, 7⟩]) are generated as successor nodes of [5] and are added to Q. Now, [5, 1] (first-in-first-out) is the foremost node in Q and is thus processed next and found to be a minimal diagnosis w.r.t. the current DPI. Therefore, DYNAMICHS terminates and returns i.a. the new set of leading diagnoses Dcalc = {[2], [5, 1]}.

Please notice the difference here to Example 11.1 where the node [5, 1] never became part of Q in STATICHS due to the existence of a minimal diagnosis [1] w.r.t. the input DPI ⟨K, B, P, N⟩R which is a proper subset of this node (and due to the fact that STATICHS must only consider minimal diagnoses w.r.t. the input DPI). In the current example, this node can only become relevant w.r.t. the current DPI if all (known) diagnoses (here, only [1]) that are proper subsets of it have already been pruned. It should now be clear to the reader why non-minimal nodes cannot be deleted for good as in STATICHS and why the set D⊃ is necessary in DYNAMICHS.

This leading diagnosis [5, 1] is also the reason why the second query Q2 = {E → G} is different from the second query (Y → ¬A) calculated in Example 11.1.

The execution of the algorithm continues in an analogous manner to that explained so far. In the following, we just want to explain some interesting aspects of the rest of its execution:

After the query Q3 = {Y → ¬A} (the same query as the second query in Example 11.1) is answered negatively and Q3 is added to N′ yielding the current DPI ⟨K, B, P, N ∪ {Q1, Q2, Q3}⟩R, the UPDATETREE function not only prunes [2] = D2 ∈ D× and adds [5, 7] = D4 ∈ D✓ to Q as we delineated above for the first query Q1, but adds [5, 2] ∈ D⊃ to Q as well. The reason for that is the deletion of the minimal diagnosis [2] w.r.t. the last-but-one DPI ⟨K, B, P, N ∪ {Q1, Q2}⟩R, whereby the last evidence for the non-minimality of node [5, 2] has been deleted. Hence, the status of [5, 2] as a non-minimal diagnosis is no longer justified, and it must be added to the queue to preserve the completeness of the algorithm w.r.t. the finding of all minimal diagnoses w.r.t. the current DPI. And, indeed, [5, 2] is identified as a minimal diagnosis (D5) in iteration 4.
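The return of [5, 2] from D⊃ to the queue follows a simple subset test, which may be sketched as below. The function `release_non_minimal` is a hypothetical helper for illustration and is not claimed to mirror the exact steps of UPDATETREE: a stored non-minimal node stays in D⊃ only as long as some known diagnosis is a proper subset of it.

```python
def release_non_minimal(d_sup, known_diagnoses):
    """Split d_sup into nodes that are still provably non-minimal and nodes
    whose non-minimality is no longer witnessed by any known diagnosis."""
    known = [frozenset(d) for d in known_diagnoses]
    still_non_minimal, reopened = [], []
    for nd in d_sup:
        if any(d < frozenset(nd) for d in known):
            still_non_minimal.append(nd)
        else:
            reopened.append(nd)
    return still_non_minimal, reopened


d_sup = [[5, 2]]
# Before Q3 is answered, diagnosis [2] still witnesses non-minimality of [5, 2].
before = release_non_minimal(d_sup, [[2], [5, 7]])
# After [2] has been pruned, nothing justifies keeping [5, 2] in D_sup.
after = release_non_minimal(d_sup, [[5, 7]])
print(before, after)
```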

For each element of D× during each execution of UPDATETREE throughout the execution of Algorithm 5, the Quick Redundancy Check (QRC) is successful. That is, each witness of redundancy used for pruning throughout the entire runtime of the algorithm could be determined very fast. Namely, as is easy to see from line 50 in DYNAMICHS, the KB used in the call of QX in the QRC for some node nd has a size in O(|nd|(|Cmax| − 1)) where Cmax is the minimal conflict set of maximum cardinality in Ccalc, since each of the |nd| conflict sets along the path contributes at most |Cmax| − 1 formulas once its edge label has been removed. In most cases, |nd| ≪ |K| as well as |Cmax| ≪ |K| will hold. The (usually more expensive) Complete Redundancy Check (CRC), which requires O(|nd|) calls to QX with a KB of size O(|Cmax| − 1), is thus never employed.

In this example, the same minimal diagnosis [5, 7] is used to compute the finally returned solution KB as in Example 11.1. The only difference between both outputs is that the KB (K \ [5, 7]) ∪ Q4 returned by DYNAMICHS in this example contains the new positive test case Q4 ∈ P′. The output by STATICHS in Example 11.1 does not contain any newly specified positive test case in P′ (cf. Remark 9.9), just the union of the “original” positive test cases in P (apart from that, there is not even a newly specified positive test case in Example 11.1).

In spite of finding the same solution diagnosis, STATICHS requires fewer queries than DYNAMICHS. Notably, DYNAMICHS even needs a proper superset of the queries asked by STATICHS (Q1, Q2 in Example 11.1 are equal to Q1, Q3 in our current example) in this case. Such a proposition, however, cannot be made in general since the queries formulated by STATICHS generally differ from those formulated by DYNAMICHS. In this vein, it might just as well be the case that it takes DYNAMICHS fewer queries to finish than it takes STATICHS, due to its advantages in tree pruning.

All in all, the execution of Algorithm 5 in this example performs

2 full QX calls, i.e. calls of QX using the KB K\node for a node node that actually return a minimal conflict set (there are two minimal conflict sets labeled by C in Figures 12.1 and 12.2 which do not result from QRC, CRC or the minimality test of a conflict set in line 32 of DYNAMICHS),

4 fast QX calls, i.e. executions of QX within the scope of the QRC (one call of QX each for the QRC of D1, D3, D2 and D5),

5 validity checks, i.e. calls of QX that return ‘no conflict’ (one check for each of the five found minimal diagnoses where the identification of diagnoses D2 at step 5⃝, D2 at step 9⃝, D4 at step 14⃝ and D4 at step 16⃝ does not require any call to a reasoning service by means of D✓, see line 8 in DYNAMICHS; notice that QX only performs a single KB validity check by ISKBVALID in case it returns ‘no conflict’, see Algorithm 1) and

4 tree update processes involving 4 pruned nodes (1 per tree update),

computes

5 minimal diagnoses (D1, D2, D4 w.r.t. the input DPI and D3 and D5 w.r.t. some DPI resulting from the input DPI by addition of new test cases),

6 minimal conflict sets (⟨1, 2, 5⟩ as well as ⟨1, 2, 7⟩ w.r.t. the input DPI and the subsets thereof ⟨2, 5⟩, ⟨2, 7⟩, ⟨5⟩ and ⟨7⟩ w.r.t. some DPI resulting from the input DPI by addition of new test cases) and

4 queries and asks the user 4 logical formulas (1 per query)

and stores

a maximum of 4 nodes (where node refers to the internal representation of a node nd in DYNAMICHS as a list of edge labels (nd) and a list of node labels (nd.cs) along a path from the root node to a leaf node).


Figure 12.1: (Example 12.1) Solving the problem of Interactive Dynamic KB Debugging (Problem Definition 6.1) for the example DPI given by Table 15.3 by means of Algorithm 5 and DYNAMICHS.


Figure 12.2: (Example 12.1 continued) Solving the problem of Interactive Dynamic KB Debugging (Problem Definition 6.1) for the example DPI given by Table 15.3 by means of Algorithm 5 and DYNAMICHS.

Example 12.2 Let us now consider the (admissible) DPI ⟨K, B, P, N⟩R given by Table 4.2. We assume an expert (called user throughout this example) in the domain Dom modeled by K who wants to find a solution to Interactive Dynamic KB Debugging for the given DPI ⟨K, B, P, N⟩R by means of Algorithm 5 with mode = dynamic. Further, the same scenario and parameter settings as in Example 11.2 are supposed. That is, nmin = nmax = 3 (notice that the time limit t is irrelevant in this case), q := 1 (cf. Chapter 8), qsm() is equal to any query selection measure described in Section 9.3, p �K∪K : �K∪K → [0, 1] is given such that pK(ax) for ax ∈ K resulting from the application of GETFORMULAPROBS is as given by Table 11.1, and σ := 0.

The tree constructed and parameters computed and used by Algorithm 5 using DYNAMICHS are visualized by Figures 12.3 and 12.4. We use the same notation as in Figures 4.2, 4.3, 11.1, 11.2, 11.3, 12.1 and 12.2 which is described in Examples 4.8, 4.9, 11.1, 11.2 and 12.1.

After the initialization of variables, Algorithm 5 calls the function GETFORMULAPROBS in line 5 which exploits p �K∪K() to calculate the function pK() giving the fault probabilities of formulas in K (cf. Sections 4.6.1, 9.2 and Example 4.7).

Then, DYNAMICHS is called for the first time, resulting in the hitting set tree given in the first picture in Figure 12.3. As outlined by the numbers i⃝ indicating at which point in time a node is labeled, the root node (initially the empty set) is labeled first by C1 := ⟨1, 2, 5⟩ and three successor nodes, namely nd1 := [1], nd2 := [2] as well as nd3 := [5] with nd1.cs = nd2.cs = nd3.cs = [⟨1, 2, 5⟩], are added to the queue of open nodes Q. Contrary to Example 12.1, where the tree was built up in breadth-first order, in this example the formula probabilities p() := pK() given by Table 11.1 are used to assign a probability pnodes(n) to each path n in the tree starting from the root node (cf. Formula 4.6 and Definition 4.9). In this vein, the node corresponding to the outgoing edge of C1 labeled by the formula with the largest fault probability among all formulas in C1 is processed next. That is, the node [1] with pnodes([1]) = 0.41 (as opposed to the nodes [2] and [5] with 0.25 each) is labeled next. The DLABEL procedure, after checking whether [1] is a non-minimal diagnosis w.r.t. ⟨K, B, P, N⟩R (the check is negative), computes another minimal conflict set C2 := ⟨2, 4, 6⟩ such that [1] ∩ C2 = ∅ (C2 is not hit by the node [1]) to constitute a label for node [1]. The successor nodes [1, 2], [1, 4] and [1, 6] of [1] are generated and added to the list Q in such a way that the sorting of Q in descending order of pnodes() is maintained.

Since [1, 4] (0.28) as well as [1, 6] (0.27) have a larger probability (as per pnodes()) than the nodes [2] (0.25) and [5] (0.25), Q is given by [[1, 4], [1, 6], [2], [5], [1, 2]] when it comes to the processing of the next node. Since DYNAMICHS always treats the first node of Q next, it identifies the first minimal diagnoses D1 := [1, 4] and D2 := [1, 6] w.r.t. ⟨K, B, P, N⟩R at steps 3⃝ and 4⃝, respectively. At step 5⃝, when node [2] is processed, a minimal conflict set C3 := ⟨1, 3, 4⟩ is computed and set as a label for [2], giving rise to the generation of three further nodes [2, 1], [2, 3] and [2, 4], all with ndi.cs = [⟨1, 2, 5⟩, ⟨1, 3, 4⟩].
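The ordering of Q described above can be reproduced with a small sorted-insertion routine. This is a sketch under assumptions: `insert_sorted` is a hypothetical helper, the probabilities are the values quoted in this example (0.09 for [1, 2] is the value reported for D4 later on), and placing a new node behind queued nodes of equal probability is an assumed tie-breaking rule, chosen because it yields the first-in-first-out behavior of Example 12.1 when all probabilities are equal.

```python
def insert_sorted(queue, node, p_nodes):
    """Insert `node` into `queue`, keeping it sorted by descending p_nodes.

    Ties are resolved first-in-first-out: a new node is placed behind all
    queued nodes of equal probability.
    """
    p = p_nodes[tuple(node)]
    pos = 0
    while pos < len(queue) and p_nodes[tuple(queue[pos])] >= p:
        pos += 1
    queue.insert(pos, node)


# Path probabilities as quoted in Example 12.2.
p_nodes = {(1, 4): 0.28, (1, 6): 0.27, (2,): 0.25, (5,): 0.25, (1, 2): 0.09}

queue = [[2], [5]]  # Q after node [1] has been taken out for labeling
for successor in ([1, 2], [1, 4], [1, 6]):
    insert_sorted(queue, successor, p_nodes)
print(queue)  # [[1, 4], [1, 6], [2], [5], [1, 2]]
```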

However, notice that not all of these new nodes are added to Q, contrary to STATICHS (cf. Example 11.2). For, there is already a node [1, 2] corresponding to the set {1, 2} in Q. Due to the test performed in line 20, this duplicate node [2, 1] is assigned to the list Qdup, which is expressed in the figure by dup. Since diagnoses are sets, not lists, [1, 2, ax1, . . . , axk] and [2, 1, ax1, . . . , axk] constitute one and the same diagnosis and it is irrelevant whether the one or the other is found. Hence, the nodes [1, 2] and [2, 1] are regarded as duplicates. Nevertheless, ndi := [2, 1] (with ndi.cs = [⟨1, 2, 5⟩, ⟨1, 3, 4⟩]) must not be completely deleted as it might be the case that (some successor node of) ndj := [1, 2] (with ndj.cs = [⟨1, 2, 5⟩, ⟨2, 4, 6⟩]) becomes redundant due to the eventual addition of some test case. For example, in case the reason for the redundancy of ndj is given (only) by a witness of redundancy that is a subset of ⟨2, 4, 6⟩, ndj is pruned and replaced by the node ndi which is still non-redundant.
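The treatment of duplicates can be illustrated as follows. This is a simplified sketch, assuming nodes are pairs of (edge labels, conflict sets along the path) and that only Q is consulted for duplicates; the helper `add_successors` is hypothetical, appends to Q without sorting, and does not reproduce the test in line 20 of DYNAMICHS in detail.

```python
def add_successors(parent, parent_cs, conflict, queue, q_dup):
    """Generate the successors of `parent` for the node label `conflict`.

    A successor whose edge labels form the same set as an already queued
    node is stored in q_dup rather than being discarded.
    """
    queued_sets = {frozenset(nd) for nd, _cs in queue}
    for ax in conflict:
        child = (parent + [ax], parent_cs + [conflict])
        if frozenset(child[0]) in queued_sets:
            q_dup.append(child)
        else:
            queue.append(child)
            queued_sets.add(frozenset(child[0]))


C1, C2, C3 = (1, 2, 5), (2, 4, 6), (1, 3, 4)
queue = [([1, 2], [C1, C2])]  # only the node relevant for the duplicate test
q_dup = []
add_successors([2], [C1], C3, queue, q_dup)
print([nd for nd, _ in queue], [nd for nd, _ in q_dup])
```

Keeping the conflict-set list with the duplicate is the point of the design: [2, 1] carries [⟨1, 2, 5⟩, ⟨1, 3, 4⟩] while [1, 2] carries [⟨1, 2, 5⟩, ⟨2, 4, 6⟩], so one of them may survive a pruning step that removes the other.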

Thence, only [2, 3] and [2, 4] are added to Q as successor nodes of the processed node [2]. Next, the minimal conflict set C2 = ⟨2, 4, 6⟩ is reused (lines 30-40 in DLABEL) as a label for node [5] with pnodes([5]) = 0.25 and the three new nodes [5, 2], [5, 4] as well as [5, 6] are generated and assigned to Q at step 7⃝. Then, the fourth minimal conflict set C4 := ⟨1, 5, 6, 8⟩ is computed to label the node [2, 4] with pnodes([2, 4]) = 0.18 and the four new nodes [2, 4, 1], [2, 4, 5], [2, 4, 6] as well as [2, 4, 8] are generated and assigned to Q at step 8⃝. At step 9⃝, the third minimal diagnosis D3 := [5, 4] w.r.t. ⟨K, B, P, N⟩R is eventually found and added to Dcalc, which now has reached a cardinality of 3 = nmin = nmax, wherefore DYNAMICHS stops and returns i.a. the set of leading diagnoses Dcalc = {[1, 4], [1, 6], [5, 4]}. The returned values are given in the lefthand column in Figure 12.3.

As in Example 11.2, where a debugging session for the same DPI using STATICHS is presented, the first query Q1 is computed as {B ⊑ K} and answered by true by the user. The assignment of Q1 to the positive test cases of the DPI ⟨K, B, P, N⟩R brings the opportunity to perform some significant pruning actions (within the function UPDATETREE called at the beginning of the second call of DYNAMICHS). These are shown in the tree with the caption ‘Updated Tree’ and in the righthand column in Figure 12.3.

As a first step within UPDATETREE, a redundancy check is performed for each diagnosis in D×. In this case D× = {D3} = {[5, 4]} since D3 is the only minimal diagnosis that has been ruled out by the most recently added positive test case Q1. The purpose of the redundancy check is to figure out whether D3 is redundant w.r.t. the current DPI and must be pruned or whether it might be extended to become a minimal diagnosis w.r.t. the current DPI.

First, the Quick Redundancy Check (QRC) QX(⟨{1, 2, 6}, B, P ∪ {Q1}, N⟩) = ⟨1⟩ (line 50 in DYNAMICHS) is executed for D3 where the KB {1, 2, 6} used in this call of QX is obtained by deletion of node := D3 from the union of all conflict sets (the elements of node.cs) along the path that corresponds to D3, i.e. {1, 2, 6} = (⟨1, 2, 5⟩ ∪ ⟨2, 4, 6⟩) \ [5, 4]. By means of the QRC it is figured out (line 52 in DYNAMICHS) that D3 (and possibly some further nodes) is redundant and can be pruned. This holds since the minimal conflict set ⟨1, 2, 5⟩ w.r.t. the last-but-one DPI ⟨K, B, P, N⟩R is not a minimal conflict set w.r.t. the current DPI ⟨K, B, P ∪ {Q1}, N⟩R because ⟨1⟩ returned by QX is already a minimal conflict set w.r.t. the current DPI (cf. Proposition 4.9). We call this minimal conflict set ⟨1⟩ a witness of redundancy for D3. Hence, all branches in the hitting set tree starting from an outgoing edge of ⟨1, 2, 5⟩ labeled by 2 or by 5 can be safely deleted from all collections storing nodes in DYNAMICHS.
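The KB handed to QX in the QRC is a plain set computation, which the following sketch makes explicit. The function name `qrc_kb` is an assumption for illustration; the computation itself (union of the conflict sets along the path minus the node's edge labels) is the one stated above.

```python
def qrc_kb(nd, cs):
    """Union of all conflict sets along the path, minus the edge labels."""
    union = set().union(*map(set, cs))
    return union - set(nd)


# Example 12.2: D3 = [5, 4] with nd.cs = [<1, 2, 5>, <2, 4, 6>].
print(qrc_kb([5, 4], [(1, 2, 5), (2, 4, 6)]))  # {1, 2, 6}
# Example 12.1: D1 = [1] with nd.cs = [<1, 2, 5>].
print(qrc_kb([1], [(1, 2, 5)]))                # {2, 5}
```

Because only formulas from conflict sets on one path are involved, the resulting KB is typically much smaller than K, which is what makes the QRC cheap.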

An illustration why ⟨1⟩ “replaces” ⟨1, 2, 5⟩ as a minimal conflict set w.r.t. the current DPI can be given as follows: First, ⟨1, 2, 5⟩ is a minimal conflict set w.r.t. ⟨K, B, P, N⟩R as it is a set-minimal subset of K that entails {A ⊑ K} = n1 ∈ N and there is no proper subset C′ of ⟨1, 2, 5⟩ where C′ ∪ B ∪ UP violates any r ∈ R or entails any n ∈ N (see Example 4.3 for a detailed explanation). Second, considering the current DPI ⟨K, B, P ∪ {Q1}, N⟩R, we have that ⟨1, 2, 5⟩ ∪ B ∪ UP∪{Q1} |= n1, too. However, {2, 5} = {B ⊑ G, G ⊑ K} |= {B ⊑ K} = Q1 implies that B ∪ UP∪{Q1} ⊇ Q1 can replace the subset {2, 5} of the conflict set ⟨1, 2, 5⟩. For, formula 1 (A ⊑ B) along with Q1 (B ⊑ K) already entails n1. Further, B ∪ UP∪{Q1} cannot violate any negative test case ni ∈ N or requirement rj ∈ R by the admissibility of the input DPI ⟨K, B, P, N⟩R, the fact that Q1 is a query, Corollary 7.3, Definition 3.6 and Proposition 3.4. Thus, by Definition 4.1, ⟨1⟩ is in fact a minimal conflict set w.r.t. the current DPI ⟨K, B, P ∪ {Q1}, N⟩R.
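The entailment argument can be replayed with a toy reasoner. The sketch below is a deliberately simplified illustration: it handles only atomic subsumptions by transitive closure, and it ignores the background knowledge B, the original positive test cases and the requirements R, all of which a real reasoner would take into account. The function `entails` and the tuple encoding of axioms are assumptions made for this example.

```python
def entails(axioms, goal):
    """Check whether atomic subsumptions `axioms` entail `goal = (sub, sup)`
    by following subsumption edges transitively."""
    sub, sup = goal
    reachable, frontier = {sub}, [sub]
    while frontier:
        current = frontier.pop()
        for lhs, rhs in axioms:
            if lhs == current and rhs not in reachable:
                reachable.add(rhs)
                frontier.append(rhs)
    return sup in reachable


ax1, ax2, ax5 = ("A", "B"), ("B", "G"), ("G", "K")  # formulas 1, 2 and 5
q1 = ("B", "K")                                     # positive test case Q1
n1 = ("A", "K")                                     # negative test case n1

print(entails({ax1, ax2, ax5}, n1))  # the old conflict set entails n1
print(entails({ax2, ax5}, q1))       # {2, 5} entails Q1
print(entails({ax1, q1}, n1))        # formula 1 together with Q1 entails n1
print(entails({q1}, n1))             # Q1 alone does not
```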

Now, the first nice thing at this point is that ⟨1⟩ is not only a witness of redundancy of nodes nd where ⟨1, 2, 5⟩ ∈ nd.cs, but of each nd (in the tree or in the set Qdup of duplicate nodes) where nd.cs contains a conflict set that is a proper superset of ⟨1⟩. That is, ⟨1⟩ also replaces ⟨1, 3, 4⟩ as well as ⟨1, 5, 6, 8⟩. This implies that two outgoing edges (those labeled by 2 or 5) of ⟨1, 2, 5⟩, two outgoing edges (those labeled by 3 or 4) of ⟨1, 3, 4⟩ and three outgoing edges (those labeled by 5, 6 or 8) of ⟨1, 5, 6, 8⟩ can be pruned.

The second nice thing, which has an even more significant bearing on tree pruning than the first, is that ⟨1⟩ is a witness of redundancy of the conflict set that labels the root node. That is, pruning can take place at the very top of the tree and two of three subtrees rooted at successor nodes of the root node can be pruned. That is, for instance, within the rightmost subtree of the root node in the picture with caption ‘Updated Tree’ in Figure 12.3 no pruning is possible at all since the conflict set ⟨2, 4, 6⟩ labels the root node of this subtree and ⟨1⟩ is not a subset of ⟨2, 4, 6⟩. However, this subtree is still redundant since it is connected with the root node by a “redundant” edge labeled by 5. As a consequence, we can observe the pruning of a total of 9 nodes (of altogether 12 nodes in the tree) in only one execution of UPDATETREE.

Now, to receive an impression of the power of tree pruning in DYNAMICHS, the reader is invited to compare the trees used in iterations 2 and 3 in the current example (the bottom left pictures in Figure 12.3 and Figure 12.4) with the trees used in iterations 2 and 3 in Example 11.2 (the bottom picture in Figure 11.2 and the picture in Figure 11.3) which deals with the debugging of the same DPI (just by means of STATICHS instead of DYNAMICHS), uses the same sets of leading diagnoses in each iteration, thus the same queries, and of course the same user (that gives the same answers in both examples).

After all diagnoses of D✓ are added to Q as a final action within UPDATETREE, the repeat-loop of the second iteration of DYNAMICHS is entered. Here, the minimal diagnoses D1 (pnodes(D1) = 0.28, step 11⃝), D2 (0.27, 12⃝) and D4 (0.09, 13⃝) are found and assigned to the empty set Dcalc before DYNAMICHS terminates again. Notice that only one call of the DLABEL procedure is required in the second iteration (for node [1, 2]) due to the test in line 8 of DYNAMICHS which is positive for D1 and D2 (since D1, D2 ∈ D✓).

Once the second query Q2 = {B ⊑ ∃r.F} is added to the positive test cases resulting in the DPI ⟨K, B, P ∪ {Q1, Q2}, N⟩R, the UPDATETREE function causes the pruning of two further nodes (D2 = [1, 6] and D4 = [1, 2]), leading to the continuance of only a single node (D1 = [1, 4]) in the memory of DYNAMICHS (see the picture with caption ‘Updated Tree’ in Figure 12.4). The reason for this is that Q2 can “replace” the part {2, 6} = {B ⊑ G, G ⊑ ∃r.F} (which entails Q2) of the minimal conflict set ⟨2, 4, 6⟩ w.r.t. the last-but-one DPI ⟨K, B, P ∪ {Q1}, N⟩R such that ⟨2, 4, 6⟩ \ {2, 6} = ⟨4⟩ is already a minimal conflict set w.r.t. the current DPI ⟨K, B, P ∪ {Q1, Q2}, N⟩R (cf. the analysis of the minimal conflict set C2 = ⟨2, 4, 6⟩ in Example 4.3).

Since, by now, all minimal conflict sets ⟨1, 2, 5⟩, ⟨2, 4, 6⟩, ⟨1, 5, 6, 8⟩ as well as ⟨1, 3, 4⟩ w.r.t. the input DPI ⟨K, B, P, N⟩R have “shrunk” so far as to constitute only two different set-minimal sets ⟨1⟩ and ⟨4⟩, it is clear by Proposition 4.6 that there can be only a single minimal diagnosis [1, 4] w.r.t. the current DPI ⟨K, B, P ∪ {Q1, Q2}, N⟩R. Therefore, the third iteration of DYNAMICHS terminates due to Q = [] and returns the singleton set Dcalc = {[1, 4]}. Consequently, the probability pD([1, 4]) = 1, wherefore Algorithm 5 also stops executing and returns (K \ [1, 4]) ∪ p1 ∪ Q1 ∪ Q2 as the (exact) solution to the Interactive Dynamic KB Debugging problem for the DPI ⟨K, B, P, N⟩R.
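That two singleton conflict sets admit exactly one minimal diagnosis can be checked by enumerating minimal hitting sets. The following is a brute-force sketch for illustration only; it is exponential in the number of formulas and is not the hitting set tree construction used by DYNAMICHS. The function name `minimal_hitting_sets` is an assumption.

```python
from itertools import combinations


def minimal_hitting_sets(conflicts):
    """Enumerate all set-minimal hitting sets of `conflicts` by brute force."""
    conflicts = [frozenset(c) for c in conflicts]
    if not conflicts:
        return [frozenset()]  # the empty set hits an empty collection
    universe = sorted(set().union(*conflicts))
    found = []
    # Candidates are visited by increasing size, so any candidate containing
    # an already found hitting set is non-minimal and can be skipped.
    for size in range(1, len(universe) + 1):
        for candidate in map(frozenset, combinations(universe, size)):
            hits_all = all(candidate & c for c in conflicts)
            if hits_all and not any(h < candidate for h in found):
                found.append(candidate)
    return found


# Current DPI at the end of Example 12.2: conflict sets <1> and <4>.
print(minimal_hitting_sets([{1}, {4}]))
# Input DPI of Example 12.1: conflict sets <1, 2, 5> and <1, 2, 7>.
print(minimal_hitting_sets([{1, 2, 5}, {1, 2, 7}]))
```

The first call yields the single set {1, 4}; the second yields {1}, {2} and {5, 7}, i.e. the diagnoses D1, D2 and D4 that Example 12.1 reports w.r.t. the input DPI.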

The advantage of DYNAMICHS in this example over STATICHS in Example 11.2 in iterations 2 and 3 is that the pruning of nodes lets the algorithm automatically focus on the still relevant (i.e. non-redundant) parts of the tree. STATICHS, on the other hand, is doomed to spend most of the execution time investigating nodes that turn out to be already invalidated by some specified test case(s). As already mentioned in Example 11.2, the inability of STATICHS to “early-prune” incomplete branches of the tree is especially unfavorable in the last iteration of STATICHS in case σ = 0 since all irrelevant minimal diagnoses w.r.t. the input DPI must first be computed before they can be ruled out.

This immense upside of DYNAMICHS over STATICHS (see the analysis in the end of Example 11.2) also finds expression in the quantitative analysis of this example given next. All in all, the execution of Algorithm 5 in this example performs

4 full QX calls, i.e. calls of QX using the KB K\node for a node node that actually return a minimal conflict set (there are four minimal conflict sets labeled by C in Figures 12.3 and 12.4 which do not result from QRC, CRC or the minimality test of a conflict set in line 32 of DYNAMICHS),

2 fast QX calls, i.e. executions of QX within the scope of the QRC (one call of QX each for the QRC of D3 and D2),

4 validity checks, i.e. calls of QX that return ‘no conflict’ (one check for each of the four found minimal diagnoses where the identification of diagnoses D1 at step 11⃝, D2 at step 12⃝ and D1 at step 15⃝ does not require any call to a reasoning service by means of D✓, see line 8 in DYNAMICHS; notice that QX only performs a single KB validity check by ISKBVALID in case it returns ‘no conflict’, see Algorithm 1) and

2 tree update processes involving 11 pruned nodes (9 nodes during the first update between steps 10⃝ and 11⃝ and 2 nodes during the second between steps 14⃝ and 15⃝),

computes

4 minimal diagnoses (D1, D2, D3 and D4, all w.r.t. the input DPI),

6 minimal conflict sets (⟨1, 2, 5⟩, ⟨2, 4, 6⟩, ⟨1, 3, 4⟩ and ⟨1, 5, 6, 8⟩ w.r.t. the input DPI and the subsets thereof ⟨1⟩ and ⟨4⟩ w.r.t. some DPI resulting from the input DPI by addition of new test cases) and

2 queries and asks the user 2 logical formulas (1 per query)

and stores

a maximum of 12 nodes (where node refers to the internal representation of a node nd in DYNAMICHS as a list of edge labels (nd) and a list of node labels (nd.cs) along a path from the root node to a leaf node).

Finally, we want to emphasize that, in all executions of UPDATETREE throughout this example, the usually very efficient QRC was successful right off and the usually more time-consuming CRC was never required.

image

Figure 12.3: (Example 12.2) Solving the problem of Interactive Dynamic KB Debugging (Problem Definition 6.1) for the example DPI given by Table 4.2 by means of Algorithm 5 and DYNAMICHS.

image

Figure 12.4: (Example 12.2 continued) Solving the problem of Interactive Dynamic KB Debugging (Problem Definition 6.1) for the example DPI given by Table 4.2 by means of Algorithm 5 and DYNAMICHS.

12.4 Algorithm Details and Correctness

In this section we discuss DYNAMICHS in detail and give proofs of its completeness and soundness. To this end, we first give some definitions and some remarks on the notation used in this section.

12.4.1 Definitions and Notation

The DYNAMICHS algorithm will require a different storage of nodes than STATICHS and Algorithm 2 since it will not interpret different branches with the same set of edge labels in the hitting set tree to be equivalent. So, DYNAMICHS, as opposed to STATICHS and Algorithm 2, will not discard any branch that is a duplicate branch in terms of its edge labels. Instead, a set storing these duplicate branches will be consulted each time a branch is found to be “redundant” and thus needs to be pruned. This strategy enables the substitution of a “redundant” branch by a “non-redundant” branch featuring an equal set of edge labels.

That is why a node nd in (the hitting set tree produced by) DYNAMICHS corresponds to the ordered list of edge labels visited when traversing a path from the root node to some leaf node. As an attribute of nd, nd.cs corresponds to the ordered list of node labels visited when traversing a path from the root node to some leaf node.

Definition 12.1. Let ⟨K, B, P, N⟩R be the DPI and P′ and N′ the sets of positively and negatively answered queries given as an input to DYNAMICHS. Let further P′′1, . . . , P′′k and N′′1, . . . , N′′k be such that P′′j ⊆ P′ and N′′j ⊆ N′ for j ∈ {1, . . . , k}. Then we define in DYNAMICHS

image

where each node nd stores as an attribute

the (ordered) list nd.cs = [C1, . . . , Ck] such that Cj is a minimal conflict set w.r.t. ⟨K, B, P ∪ P′′j, N ∪ N′′j⟩R and axj ∈ Cj for all j ∈ {1, . . . , k}, corresponding to the set of node labels on the path from the root node to nd.

Further, nd[i] refers to the i-th element in nd, i.e. to axi, and nd.cs[i] refers to the i-th element in nd.cs, i.e. to Ci. Notice that the conflict sets nd.cs[i] themselves are (non-ordered) sets. Moreover, we define

• |nd| and |nd.cs| to denote the number of elements in the lists nd and nd.cs,

• nd[i..k] := [nd[i], . . . , nd[k]] for i ≤ k and |nd| ≥ k,

• nd.cs[i..k] := [nd.cs[i], . . . , nd.cs[k]] for i ≤ k and |nd.cs| ≥ k,

• nodes nd and nd[i..k] appearing on the left or right side of expressions using the following set operators to be considered as (non-ordered) sets: ⊃, ⊇, ⊂, ⊆, =, \

We call

• nd[1..k] where (k < |nd|) k ≤ |nd| a (proper) subnode of nd,

• nd′′ a successor (node) of nd′ iff nd′ is a proper subnode of nd′′ and

• nd the same node as nd′ iff

image

• nd[i] = nd′[i] for i ∈ {1, . . . , |nd|} and

• nd.cs[i] = nd′.cs[i] for i ∈ {1, . . . , |nd|}.

Example 12.3 For instance, in line 20 of Algorithm 8, the test node_e ∈ Q checks whether there is some set nd in Q such that node_e and nd interpreted as sets are equal. That is, node_e := {1, 3, 2} is equal to nd := {2, 1, 3} although the order of formulas is different and the ordered sets of conflict sets node_e.cs and nd.cs might be different as well. Another example of this interpretation of nodes as sets can be found in line 50 where U_nd.cs \ nd refers to the set difference of the union of all sets in nd.cs and the set nd. If, e.g. U_nd.cs := {1, 2, 3, 4} and nd := {4, 2}, the result of this set difference is {1, 3} or, equivalently, {3, 1}.

On the other hand, if the operator is not one of those listed above, then node is interpreted as an ordered set. For example, consider line 19 where the ADD operator is used to append a logical formula e to the end of the ordered set of formulas node. Suppose, e.g. node := [3, 1, 2] and e := 4, then the result is [3, 1, 2, 4] which is not equal to [1, 2, 3, 4].
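The two readings of a node used above (ordered list versus set) can be illustrated by a small Python sketch. It is an illustration under stated assumptions only: the class `Node`, its fields and the helpers `sub` and `add` are hypothetical names that do not occur in DYNAMICHS, formulas are represented by integers as in the examples, and the conflict sets in the last check are made up such that their union is {1, 2, 3, 4}.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class Node:
    """Hypothetical container for a node nd and its attribute nd.cs."""
    labels: List[int]  # edge labels on the path from the root, in order
    cs: List[FrozenSet[int]] = field(default_factory=list)  # node labels, in order

    def as_set(self) -> FrozenSet[int]:
        # Reading used with the set operators: the order is ignored.
        return frozenset(self.labels)

    def sub(self, i: int, k: int) -> List[int]:
        # nd[i..k] with 1-based, inclusive bounds as in the text.
        return self.labels[i - 1:k]


def add(labels: List[int], e: int) -> List[int]:
    # ADD keeps the order and appends e at the end.
    return labels + [e]


# Set reading: [1, 3, 2] and [2, 1, 3] are equal as sets, but not as lists.
node_e = Node([1, 3, 2])
nd = Node([2, 1, 3])
equal_as_sets = node_e.as_set() == nd.as_set()
equal_as_lists = node_e.labels == nd.labels

# List reading: appending 4 to [3, 1, 2] keeps the existing order.
appended = add([3, 1, 2], 4)

# Set difference between the union of nd.cs and nd (conflict sets are assumed).
other = Node([4, 2], [frozenset({1, 4}), frozenset({2, 3})])
difference = frozenset().union(*other.cs) - other.as_set()
```

Keeping the ordered list as the stored representation and deriving the set view on demand mirrors the convention of the text, where only the listed set operators trigger the set reading.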

The following definition characterizes alternative paths in a hitting set tree produced by DYNAMICHS, i.e. different paths leading to the same (leaf) node in the tree.

Definition 12.2. Let nd and nd′ be nodes in DYNAMICHS such that

• |nd′| ≤ |nd|,

• nd′ = nd[1..|nd′|] and

• there is some j ∈ {1, . . . , |nd′|} with the property that nd′[j] ≠ nd[j] or nd′.cs[j] ≠ nd.cs[j].

Further, let ADD(L1, L2) be the function that outputs the list [a1, . . . , an, b1, . . . , bm] given two lists L1 := [a1, . . . , an] and L2 := [b1, . . . , bm]. Then we call

• nd′ an alternative subnode of nd,

• nd′ a proper alternative subnode of nd if |nd′| < |nd| and

• node where

image

In a context where nd′ is relevant, we call node the alternative equal node of nd constructed from nd′.

Regarded as a set, an alternative equal node node of some node nd is equal to nd. However, node and nd differ in at least one respect: either in the order of the elements of node as opposed to nd, or in the (order of the) elements of node.cs as opposed to nd.cs.

Example 12.4 Let nd := [1, 2, 3, 4] with nd.cs := [⟨1, 2, 3⟩, ⟨2, 6⟩, ⟨3, 6, 7⟩, ⟨4, 5⟩]. Then, nd1 := [2, 1] with nd1.cs := [⟨1, 2, 3⟩, ⟨1, 4⟩] as well as nd2 := [3, 2, 1] with nd2.cs := [⟨1, 2, 3⟩, ⟨2, 6⟩, ⟨1, 4⟩] are alternative subnodes of nd. To see that nd1 is an alternative subnode of nd, observe that the set-equality between nd1 = [2, 1] and nd[1..|nd1|] = [1, 2] holds and that 2 = nd1[j] ≠ nd[j] = 1 holds for j := 1. Similarly, for nd2, we have that the set-equality between [1, 2, 3] and [3, 2, 1] holds and the elements at the j-th position for, e.g. j := 1, are different, i.e. 1 ≠ 3.

These alternative subnodes of nd can be used to construct the following alternative equal nodes of nd: The one obtained from nd1 is node1 := [2, 1, 3, 4] with node1.cs := [⟨1, 2, 3⟩, ⟨1, 4⟩, ⟨3, 6, 7⟩, ⟨4, 5⟩] and the one obtained from nd2 is node2 := [3, 2, 1, 4] with node2.cs := [⟨1, 2, 3⟩, ⟨2, 6⟩, ⟨1, 4⟩, ⟨4, 5⟩].
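The construction in Example 12.4 can be replayed with a short Python sketch. It rests on an assumption suggested by the example: the alternative equal node is taken to be ADD(nd′, nd[|nd′|+1..|nd|]), with the conflict set lists combined in the same way. The function names and the pair representation (nd, nd.cs) are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Conflict = FrozenSet[int]
Node = Tuple[List[int], List[Conflict]]  # (nd, nd.cs)


def cs(*sets) -> List[Conflict]:
    # Small helper to write lists of conflict sets compactly.
    return [frozenset(s) for s in sets]


def is_alternative_subnode(alt: Node, nd: Node) -> bool:
    alt_labels, alt_cs = alt
    labels, node_cs = nd
    m = len(alt_labels)
    if m > len(labels):
        return False
    # Set-equality between nd' and nd[1..|nd'|].
    if set(alt_labels) != set(labels[:m]):
        return False
    # At least one position differs in the edge label or in the node label.
    return any(alt_labels[j] != labels[j] or alt_cs[j] != node_cs[j]
               for j in range(m))


def alternative_equal_node(alt: Node, nd: Node) -> Node:
    # Assumed construction: keep nd' and append the remaining suffix of nd.
    alt_labels, alt_cs = alt
    labels, node_cs = nd
    m = len(alt_labels)
    return alt_labels + labels[m:], alt_cs + node_cs[m:]


nd = ([1, 2, 3, 4], cs({1, 2, 3}, {2, 6}, {3, 6, 7}, {4, 5}))
nd1 = ([2, 1], cs({1, 2, 3}, {1, 4}))
nd2 = ([3, 2, 1], cs({1, 2, 3}, {2, 6}, {1, 4}))
node1 = alternative_equal_node(nd1, nd)
node2 = alternative_equal_node(nd2, nd)
```

Under this assumption, `node1` and `node2` coincide with the two alternative equal nodes listed in the example, and a node whose prefix is identical to that of nd in both labels and conflict sets is not reported as an alternative subnode.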

The following definition introduces the terminology that will be used throughout this section to refer to nodes in DYNAMICHS with certain properties.

Definition 12.3. In DYNAMICHS, a node nd with nd.cs is called

• generated iff it is built in lines 18 and 19,

• processed iff lines 6-15 have been executed for node := nd,

• pruned iff

image

• replaced iff it is found to be redundant in line 91 and some node ndrep = nd is added to S′ in line 100,

• combined-replaced iff it is found to be redundant in line 112 and some node ndcomb,rep = nd is added to Dupnew in line 121

at any point in time during the execution of any call to DYNAMICHS during the execution of Algorithm 5.

The node ndrep is referred to as replacement node (of nd) and the node ndcomb,rep is referred to as combined replacement node (of nd).

12.4.2 The Labeling Function in DYNAMICHS

The following two lemmata provide an analysis of the DLABEL function and characterize the output given by this function independently of when it is called during the execution of Algorithm 5.

The first one analyzes the case where DLABEL returns valid or nonmin which means that the node for which DLABEL was called is a diagnosis or a non-minimal diagnosis w.r.t. the current DPI, respectively. Further on, it states that only diagnoses w.r.t. the current DPI can be stored in the set Dcalc and only diagnoses for whose non-minimality there is evidence in terms of a diagnosis in Dcalc can be labeled by nonmin.

Lemma 12.1. Let the DLABEL procedure be called at any point in time during the execution of DYNAMICHS given i.a. some node node, some DPI ⟨K, B, P, N⟩R, some set of positive test cases P′ and some set of negative test cases N′ as argument. Then the following holds:

(1) If DLABEL returns valid, node is a diagnosis w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R.

(2) During this execution of DYNAMICHS, Dcalc comprises only diagnoses w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R.

(3) If DLABEL returns nonmin, node is a non-minimal diagnosis w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R.

(4) At the time the label nonmin is returned for node, there is some diagnosis D′ w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R such that D′ ∈ Dcalc and node ⊃ D′.

Proof. (1): Assume that DLABEL returns valid for node. Then, by Proposition 4.9, Remark 4.3, Corollary 3.3, Corollary 7.3 and the fact that the DPI ⟨K, B, P, N⟩R used in DYNAMICHS as an input to DLABEL is the same DPI as the admissible one given as an input to Algorithm 5, node must be a diagnosis w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R. This proves proposition (1).

(2): This is a direct conclusion from proposition (1) and the facts that nodes labeled by valid are added to the set Dcalc in line 13, that Dcalc = ∅ holds at the beginning of the execution of DYNAMICHS (line 3) and that Dcalc is modified only in line 13 throughout DYNAMICHS.

(3): At the beginning of the execution of DYNAMICHS, Dcalc = ∅ (line 3) and Dcalc is modified only in line 13 throughout DYNAMICHS. In line 13, exactly those nodes are added to Dcalc for which the DLABEL function returns valid. By the correctness of proposition (1), only diagnoses w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R can be added to Dcalc.

Now, assume DLABEL returns nonmin for node. Then, due to the fact that Dcalc can only comprise diagnoses w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R and node ⊃ D′ for some D′ ∈ Dcalc by line 27, node must be a non-minimal diagnosis w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R.

(4): This is a direct consequence of proposition (3).
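Statements (3) and (4) rest on a proper-superset test against Dcalc. The following Python sketch isolates only that test. It is not the DLABEL procedure, the function name `nonmin_evidence` is hypothetical, and the contents of `d_calc` are made-up values chosen for illustration.

```python
from typing import FrozenSet, Iterable, List, Optional


def nonmin_evidence(node: Iterable[int],
                    d_calc: List[FrozenSet[int]]) -> Optional[FrozenSet[int]]:
    """Return some D' in d_calc with node ⊃ D', or None if there is none."""
    node_set = frozenset(node)
    for diagnosis in d_calc:
        if diagnosis < node_set:  # proper subset, i.e. node ⊃ D'
            return diagnosis
    return None


# Hypothetical set of already found diagnoses.
d_calc = [frozenset({1, 4}), frozenset({2, 3, 5})]

evidence = nonmin_evidence([4, 1, 6], d_calc)   # proper superset of {1, 4}
no_evidence = nonmin_evidence([1, 4], d_calc)   # equal to {1, 4}, not a proper superset
```

Returning the diagnosis D′ itself rather than a Boolean makes the evidence demanded by statement (4) explicit.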

The following lemma states that the set Ccalc given as an input to DLABEL must include only minimal conflict sets, each w.r.t. the current DPI or some DPI including only a subset of the test cases the current DPI comprises. Moreover, it provides evidence that, in case DLABEL returns a set, this set is a minimal conflict set w.r.t. the current DPI which is not hit by the node given as input to DLABEL.

Lemma 12.2. Let the DLABEL procedure be called at any point in time during the execution of DYNAMICHS given i.a. some node node, a set of sets Ccalc, some DPI ⟨K, B, P, N⟩R, some set of positive test cases P′ and some set of negative test cases N′ as argument. Then,

(1) each element in Ccalc is a minimal conflict set w.r.t. some DPI ⟨K, B, P ∪ P′′, N ∪ N′′⟩R where P′′ ⊆ P′ and N′′ ⊆ N′ and

(2) if DLABEL returns a set L, then this set L is a minimal conflict set w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R and node ∩ L = ∅.

Proof. (1): At the first call to DYNAMICHS, Ccalc = ∅ is given as an input argument to DYNAMICHS (lines 1 and 10 in Algorithm 5). The only places throughout DYNAMICHS where Ccalc is modified are lines 39, 45 and 66. However, modifications to Ccalc in lines 39 and 66 can only take place in case there is already some element in Ccalc. That is, the first element must be added to Ccalc in line 45.

In line 45, only minimal conflict sets w.r.t. some DPI ⟨K, B, P ∪ P′′, N ∪ N′′⟩R are added to Ccalc where P′′ ⊆ P′ and N′′ ⊆ N′ since the call to DLABEL might have taken place during some prior execution of DYNAMICHS during the execution of Algorithm 5. In order to reach line 45, QX called with the DPI ⟨K \ node, B, P ∪ P′′, N ∪ N′′⟩R as argument must not return ’no conflict’ (line 41). That is, a minimal conflict set L ≠ ∅ w.r.t. ⟨K, B, P ∪ P′′, N ∪ N′′⟩R is computed in line 41 by Proposition 4.9, Remark 4.3, Corollary 7.3 and the fact that the DPI ⟨K, B, P, N⟩R used in DYNAMICHS as an input to DLABEL is the same DPI as the admissible one given as an input to Algorithm 5.

In lines 39 and 66, the following is true: (*) Only minimal conflict sets that are proper subsets of elements already in Ccalc can be added to Ccalc. In the case of line 39, (*) is true due to the following reasons: In order to reach line 39, QX(⟨C, B, P ∪ P′′, N ∪ N′′⟩R) = X ≠ C must hold for some element C ∈ Ccalc. Since Ccalc is never changed in Algorithm 5 between two calls to DYNAMICHS, Ccalc comprises only conflict sets w.r.t. the current DPI or previous DPIs (including fewer test cases than the current one). Moreover, a minimal conflict set C can only shrink after the addition of new test cases to the DPI for which it was computed by Proposition 12.1. Hence, the newly added element X must be a proper subset of the existing element C in Ccalc. That X is a minimal conflict set w.r.t. the DPI ⟨K, B, P ∪ P′′, N ∪ N′′⟩R follows from QX(⟨C, B, P ∪ P′′, N ∪ N′′⟩R) = X, Proposition 4.9, Remark 4.3, Corollary 7.3 and the fact that the DPI ⟨K, B, P, N⟩R used in DYNAMICHS as an input to DLABEL is the same DPI as the admissible one given as an input to Algorithm 5.

In the case of line 66, (*) is true due to the following reasons: Due to Lemmata 12.6 and 12.7, quickPC = true or completePC = true can only hold if X is a witness of redundancy of nd. By Definition 12.4, a witness of redundancy is a conflict set w.r.t. the current DPI which is a proper subset of some conflict set that has been used as a label in nd.cs. However, each label in nd.cs must be an element of Ccalc due to lines 30, 45 and 19.

(2): That, in case DLABEL returns a set L, it returns a minimal conflict set w.r.t. the current DPI is a consequence of the inference in the proof of proposition (1). We still need to show that L ∩ node = ∅.

If DLABEL returns in line 46, we can derive from the fact that L is the output of the call QX(⟨K \ node, B, P ∪ P′, N ∪ N′⟩R), Proposition 4.9 and Definition 4.1 that L ⊆ K \ node which implies that L ∩ node = ∅.

If DLABEL returns in line 34 or line 40, then the return can be executed only if the check C ∩ node = ∅ is true in line 31. By the argumentation in the proof of proposition (1), for the returned set L it must hold that L ⊆ C. Hence, L ∩ node = ∅ is satisfied.

As a simple conclusion from Lemma 12.2, we have that the argument X passed to the PRUNE function called within DLABEL is a minimal conflict set w.r.t. the current DPI:

Corollary 12.1. Assume the execution of some call to DYNAMICHS during the execution of Algorithm 5 using the current DPI DPI. Anytime PRUNE is called within DLABEL, the input X given to it is a minimal conflict set w.r.t. DPI.

Proof. Assume the execution of some call to DYNAMICHS during the execution of Algorithm 5 using the current DPI DPI. Then, Lemma 12.2 says that the set X returned in line 40 is a minimal conflict set w.r.t. DPI. Since X is not modified by any of the functions PRUNE and ADDSETDELSUPSETS, we obtain the proposition of this corollary.

From this we derive that the input X passed to PRUNEQDUP called within DLABEL must be a minimal conflict set w.r.t. the current DPI:

Corollary 12.2. Assume the execution of some call to DYNAMICHS during the execution of Algorithm 5 using the current DPI DPI. Anytime PRUNEQDUP is called within DLABEL, the input X given to it is a minimal conflict set w.r.t. DPI.

Proof. This corollary is a direct consequence of Corollary 12.1 and the fact that the argument X given to PRUNEQDUP is the same argument X that is given to PRUNE (none of these functions modifies X).

12.4.3 Impact of Answered Queries on Conflict Sets

After one call to DYNAMICHS in Algorithm 5 returns, the set Dcalc (called D✓ in Algorithm 5) returned by DYNAMICHS is used as a set of leading diagnoses w.r.t. the current DPI in order to compute a query. After the answered query is incorporated into the DPI, a new call to DYNAMICHS for this new current DPI is made.

As we have learned from Lemmata 12.1 and 12.2, the new call to DYNAMICHS considers only minimal diagnoses and minimal conflict sets w.r.t. the new current DPI. Therefore, the next proposition investigates the impact of the addition of the answered query as a new test case on the set of minimal conflict sets w.r.t. the new current DPI. Concretely, it claims that the transition from a DPI to a new DPI extended by a test case does change the set of minimal conflict sets, that each (minimal) conflict set remains a (not necessarily minimal) conflict set and that minimal conflict sets cannot grow in size.

It is however important to notice that some “new” minimal conflict set might emerge in the course of this DPI-transition which is not in a subset-relationship with any existing minimal conflict set.

Proposition 12.1. Let D be a set of minimal diagnoses w.r.t. ⟨K, B, P, N⟩R and Q ∈ Q_{D,⟨K,B,P,N⟩R}. Further, let either P′ = P ∪ {Q} or N′ = N ∪ {Q}. Then it holds that

(1) mC⟨K,B,P,N⟩R ≠ mC⟨K,B,P′,N′⟩R,

(2) each conflict set w.r.t. ⟨K, B, P, N⟩R is a conflict set w.r.t. ⟨K, B, P′, N′⟩R,

(3) each minimal conflict set w.r.t. ⟨K, B, P, N⟩R is a conflict set w.r.t. ⟨K, B, P′, N′⟩R,

(4) there are no C ∈ mC⟨K,B,P,N⟩R and C′ ∈ mC⟨K,B,P′,N′⟩R such that C ⊂ C′,

(5) if there is a subset-relationship between C ∈ mC⟨K,B,P,N⟩R and C′ ∈ mC⟨K,B,P′,N′⟩R, then C′ = C or C′ ⊂ C.

Proof. (1): Assume the opposite, namely that mC⟨K,B,P,N⟩R = mC⟨K,B,P′,N′⟩R. Then, by Proposition 4.6, mD⟨K,B,P,N⟩R = mD⟨K,B,P′,N′⟩R must be true. This however is a contradiction to Definition 7.1 and the fact that Q is a query.

(2): Let C be a conflict set w.r.t. ⟨K, B, P, N⟩R. Then C ∪ B ∪ U_P violates some x ∈ R ∪ N. If P′ = P ∪ {Q} holds, then, by monotonicity of L, C ∪ B ∪ U_{P∪{Q}} violates some x ∈ R ∪ N, i.e. C is a conflict set w.r.t. ⟨K, B, P′, N′⟩R. Otherwise, if N′ = N ∪ {Q} is given, then C ∪ B ∪ U_P violates some x ∈ R ∪ N ⊂ R ∪ N′, i.e. C is a conflict set w.r.t. ⟨K, B, P′, N′⟩R.

(3): This is a direct consequence of (2), since each minimal conflict set w.r.t. ⟨K, B, P, N⟩R is a conflict set w.r.t. ⟨K, B, P, N⟩R.

(4): Since, by (3), each minimal conflict set w.r.t. ⟨K, B, P, N⟩R is also a conflict set w.r.t. ⟨K, B, P′, N′⟩R, there cannot be a minimal conflict set C′ w.r.t. ⟨K, B, P′, N′⟩R which is a proper superset of a minimal conflict set w.r.t. ⟨K, B, P, N⟩R as this would imply non-minimality of C′ w.r.t. ⟨K, B, P′, N′⟩R.

(5): This proposition is a direct consequence of (4).
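Statements (4) and (5) can be checked mechanically for two given families of minimal conflict sets. The Python sketch below does so for made-up data: the families `old_mcs` and `new_mcs` are hypothetical and not taken from the running example, and the function names are illustrative only.

```python
from typing import FrozenSet, List, Tuple

Conflict = FrozenSet[int]
Pair = Tuple[Conflict, Conflict]


def growth_violations(old: List[Conflict], new: List[Conflict]) -> List[Pair]:
    """Pairs (C, C') with C ⊂ C', which statement (4) rules out."""
    return [(c, c2) for c in old for c2 in new if c < c2]


def shrunk(old: List[Conflict], new: List[Conflict]) -> List[Pair]:
    """Pairs (C, C') with C' ⊂ C, the only proper subset-relationship (5) allows."""
    return [(c, c2) for c in old for c2 in new if c2 < c]


# Hypothetical minimal conflict sets before and after adding a test case.
old_mcs = [frozenset({1, 2, 3}), frozenset({2, 4})]
new_mcs = [frozenset({1, 2}), frozenset({2, 4}), frozenset({3, 5})]

violations = growth_violations(old_mcs, new_mcs)
shrinkage = shrunk(old_mcs, new_mcs)
```

In this data, ⟨1, 2, 3⟩ shrinks to ⟨1, 2⟩, ⟨2, 4⟩ is unchanged, and ⟨3, 5⟩ plays the role of a "new" minimal conflict set that is in no subset-relationship with any old one.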

Given the existence of some non-empty minimal conflict set w.r.t. an admissible DPI DPI, the extension of the test cases of DPI by a query yields a new DPI  DPI′for which all minimal conflict sets are non-empty:

Proposition 12.2. Let ⟨K, B, P, N⟩R and ⟨K, B, P′, N′⟩R be two DPIs such that ⟨K, B, P, N⟩R is admissible and P′ ⊇ P and N′ ⊇ N and |P′ ∪ N′| = |P ∪ N| + 1. Let further C ≠ ∅ be a minimal conflict set w.r.t. ⟨K, B, P, N⟩R and Q ∈ (P′ ∪ N′) \ (P ∪ N) be a query w.r.t. some D ⊆ mD⟨K,B,P,N⟩R and ⟨K, B, P, N⟩R. Then, for each minimal conflict set C′ w.r.t. ⟨K, B, P′, N′⟩R it holds that C′ ≠ ∅.

Proof. Assume there is some minimal conflict set C′ w.r.t. ⟨K, B, P′, N′⟩R such that C′ = ∅. This implies that there cannot be a minimal conflict set C′′ w.r.t. ⟨K, B, P′, N′⟩R which is not the empty set because C′ would be a proper subset of C′′, which would be a contradiction to the minimality of C′′.

Due to Corollary 7.3 and the fact that a query Q w.r.t. some D ⊆ mD⟨K,B,P,N⟩R and ⟨K, B, P, N⟩R is added to ⟨K, B, P, N⟩R in order to obtain ⟨K, B, P′, N′⟩R, we have that ⟨K, B, P′, N′⟩R must be admissible.

By Corollary 3.3, K cannot be valid w.r.t. ⟨·, B, P, N⟩R since ∅ cannot be a diagnosis w.r.t. ⟨K, B, P, N⟩R by Proposition 4.6 and the fact that C is a non-empty minimal conflict set w.r.t. ⟨K, B, P, N⟩R. From this we can infer that K cannot be valid w.r.t. ⟨·, B, P′, N′⟩R as P′ ⊇ P and N′ ⊇ N.

Now, by Proposition 4.2, there must be some minimal conflict set w.r.t. ⟨K, B, P′, N′⟩R which is not the empty set, contradiction.

12.4.4 Impact of Answered Queries on Diagnoses

Next, we analyze what influence answered queries that are added as new test cases to the current DPI have on the (minimal) diagnoses w.r.t. this DPI. The first lemma assures that each DPI constructed during the execution of Algorithm 5 must be admissible as a consequence of the postulated admissibility of the DPI given as an initial input to Algorithm 5.

Lemma 12.3. Let ⟨K, B, P, N⟩R be the DPI and P′ and N′ the sets of positively and negatively answered queries given as an input to DYNAMICHS. Then, the DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R is admissible.

Proof. The admissibility of ⟨K, B, P ∪ P′, N ∪ N′⟩R follows from the fact that ⟨K, B, P, N⟩R is the (coercively) admissible input DPI of Algorithm 5, from Corollary 7.3, which reveals that admissibility of a DPI is preserved under the addition of a query to the test cases of the DPI, and from the fact that P′ as well as N′ are sets of queries. The latter holds because CALCQUERY (Algorithm 5, line 16) computes only queries and the only place where P′ and N′ are modified is lines 24-26 where only sets returned by CALCQUERY are added to P′ and N′.

The next proposition confirms the restrictive character of test cases. That is, any extension of a current DPI through the addition of a test case cannot lead to a set of (all) diagnoses w.r.t. the new DPI that is a superset of the set of (all) diagnoses w.r.t. the current DPI. We want to point out that this is not necessarily true for the set of minimal diagnoses.

Proposition 12.3. Let ⟨K, B, P, N⟩R and ⟨K, B, P′, N′⟩R be two DPIs such that P′ ⊇ P and N′ ⊇ N. Then, each diagnosis w.r.t. ⟨K, B, P′, N′⟩R is also a diagnosis w.r.t. ⟨K, B, P, N⟩R.

Proof. Let D′ ∈ aD⟨K,B,P′,N′⟩R. Then, by Corollary 3.3 and Definition 3.2, (K \ D′) ∪ B ∪ U_{P′} does not violate any x ∈ R ∪ N′. Since, however, formulas that are added to a KB, in particular those in U_{P′\P}, cannot invalidate any (unwanted) entailments, in particular those in N′, and cannot resolve any inconsistencies or incoherencies by the monotonicity of L, we can conclude that (K \ D′) ∪ B ∪ U_P does not violate any x ∈ R ∪ N′ either. Since N′ ⊇ N, non-violation of any test case in N′ implies non-violation of any test case in N as well. Consequently, (K \ D′) ∪ B ∪ U_P does not violate any x ∈ R ∪ N and entails all p ∈ P (due to U_P), wherefore D′ ∈ aD⟨K,B,P,N⟩R due to Corollary 3.3 and Definition 3.2.

As a consequence of this, each minimal diagnosis w.r.t. the new DPI is a diagnosis w.r.t. the current DPI, i.e. either a minimal or a non-minimal diagnosis w.r.t. the current DPI.

Corollary 12.3. Let ⟨K, B, P, N⟩R and ⟨K, B, P′, N′⟩R be two DPIs such that P′ ⊇ P and N′ ⊇ N. Then, each minimal diagnosis w.r.t. ⟨K, B, P′, N′⟩R is also a diagnosis w.r.t. ⟨K, B, P, N⟩R.

Proof. Since Proposition 12.3 holds for all diagnoses w.r.t. ⟨K, B, P′, N′⟩R, it also holds for all minimal diagnoses w.r.t. ⟨K, B, P′, N′⟩R since each minimal diagnosis is a diagnosis.

Adding a test case to a DPI cannot make minimal diagnoses shrink:

Proposition 12.4. Let ⟨K, B, P, N⟩R and ⟨K, B, P′, N′⟩R be two DPIs such that P′ ⊇ P and N′ ⊇ N and let D ∈ mD⟨K,B,P,N⟩R. Then, for all D′ ∈ mD⟨K,B,P′,N′⟩R, it holds that D′ ⊄ D.

Proof. Let D ∈ mD⟨K,B,P,N⟩R and let D′ ∈ mD⟨K,B,P′,N′⟩R such that P′ ⊇ P, N′ ⊇ N and suppose D′ ⊂ D. By Proposition 12.3, D′ must be a diagnosis w.r.t. ⟨K, B, P, N⟩R. By D′ ⊂ D, this is a contradiction to the premise that D ∈ mD⟨K,B,P,N⟩R, i.e. that D is minimal.

In fact, it even holds that each “new” minimal diagnosis (which is not a minimal diagnosis w.r.t. the current DPI) resulting from the addition of a test case to the current DPI must be a proper superset of some minimal diagnosis w.r.t. the current DPI. In other words, a minimal diagnosis w.r.t. the new DPI is either a minimal diagnosis w.r.t. the current DPI or a proper superset of some minimal diagnosis w.r.t. the current DPI.

Proposition 12.5. Let ⟨K, B, P, N⟩R and ⟨K, B, P′, N′⟩R be two DPIs such that P′ ⊇ P and N′ ⊇ N and let D′ ∈ mD⟨K,B,P′,N′⟩R and D′ ∉ mD⟨K,B,P,N⟩R. Then, there is some D ∈ mD⟨K,B,P,N⟩R such that D ⊂ D′.

Proof. By Corollary 12.3, we know that D′ ∈ mD⟨K,B,P′,N′⟩R is a diagnosis w.r.t. ⟨K, B, P, N⟩R. If D′ is already a minimal diagnosis w.r.t. ⟨K, B, P, N⟩R, then the proposition holds. Otherwise, there must be some D ⊂ D′ such that D is a minimal diagnosis w.r.t. ⟨K, B, P, N⟩R.
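Propositions 12.4 and 12.5 together say that each minimal diagnosis w.r.t. the new DPI is either kept or a proper superset of an old minimal diagnosis. The following Python sketch classifies new minimal diagnoses accordingly. The data are assumptions made for illustration: the two lists are the minimal hitting sets of the hypothetical conflict set families {⟨1, 2, 3⟩, ⟨2, 4⟩} and {⟨1, 2⟩, ⟨2, 4⟩, ⟨3, 5⟩}, and the function name `classify` is not from the source.

```python
from typing import Dict, FrozenSet, List

Diagnosis = FrozenSet[int]


def classify(old_min: List[Diagnosis],
             new_min: List[Diagnosis]) -> Dict[Diagnosis, str]:
    """Label each new minimal diagnosis as 'kept' or 'proper superset'."""
    result: Dict[Diagnosis, str] = {}
    for d_new in new_min:
        # Proposition 12.4: minimal diagnoses cannot shrink.
        if any(d_new < d_old for d_old in old_min):
            raise ValueError(f"{sorted(d_new)} contradicts Proposition 12.4")
        if d_new in old_min:
            result[d_new] = "kept"
        elif any(d_old < d_new for d_old in old_min):
            # Proposition 12.5: a new one extends some old minimal diagnosis.
            result[d_new] = "proper superset"
        else:
            raise ValueError(f"{sorted(d_new)} contradicts Proposition 12.5")
    return result


# Hypothetical minimal diagnoses before and after adding a test case.
old_min = [frozenset({2}), frozenset({1, 4}), frozenset({3, 4})]
new_min = [frozenset({2, 3}), frozenset({2, 5}),
           frozenset({1, 3, 4}), frozenset({1, 4, 5})]

labels = classify(old_min, new_min)
```

In this data every new minimal diagnosis extends an old one, and the old minimal diagnosis {3, 4} is eliminated altogether, which illustrates that the set of minimal diagnoses need not shrink even though the set of all diagnoses does.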

Adding a query to either of the test case sets of a DPI DPI implies that the set of all diagnoses w.r.t. the new DPI is a proper subset of the set of all diagnoses w.r.t. DPI:

Corollary 12.4. Let ⟨K, B, P, N⟩R and ⟨K, B, P′, N′⟩R be two DPIs such that

• P′ ⊇ P and N′ ⊇ N,

• |P′| = |P| + 1 or |N′| = |N| + 1, but not both, and

• (P′ ∪ N′) \ (P ∪ N) = {Q} where Q is a query w.r.t. some set D ⊆ mD⟨K,B,P,N⟩R and ⟨K, B, P, N⟩R.

Then, aD⟨K,B,P′,N′⟩R ⊂ aD⟨K,B,P,N⟩R holds.

Proof. By Proposition 12.3 we have that aD⟨K,B,P′,N′⟩R ⊆ aD⟨K,B,P,N⟩R. Since ⟨K, B, P′, N′⟩R results from ⟨K, B, P, N⟩R by the addition of the query Q w.r.t. some set D and ⟨K, B, P, N⟩R to either P or N, we conclude by Definition 7.1 that at least one minimal diagnosis D w.r.t. ⟨K, B, P, N⟩R in D is not a minimal diagnosis w.r.t. ⟨K, B, P′, N′⟩R. Assume D is a non-minimal diagnosis w.r.t. ⟨K, B, P′, N′⟩R. In this case, there must be some D′ ⊂ D such that D′ is a minimal diagnosis w.r.t. ⟨K, B, P′, N′⟩R. This is a contradiction to Proposition 12.4. Consequently, D ∉ aD⟨K,B,P′,N′⟩R. Hence, D ∈ aD⟨K,B,P,N⟩R \ aD⟨K,B,P′,N′⟩R. By aD⟨K,B,P′,N′⟩R ⊆ aD⟨K,B,P,N⟩R, the proposition of the corollary follows.

12.4.5 Redundant Nodes in DYNAMICHS

The following result constitutes the basis for the definition of a redundant node we give in the next section. It is already stated in [Rei87], but without a proof. It testifies that the set of all minimal hitting sets of a collection F of sets remains steady if elements that are not set-minimal sets in F are deleted from F. By Proposition 4.6, the same must hold for the set of all minimal diagnoses of the collection of all minimal conflict sets w.r.t. some DPI DPI. That is, considering only minimal hitting sets of minimal conflict sets w.r.t. DPI is sufficient for completeness of a hitting set tree algorithm concerning the finding of all minimal diagnoses w.r.t. DPI.

However, we proved by Proposition 12.1 that existing conflict sets will tend to shrink gradually through the specification of new test cases. This implies that more and more nodes ndi stored by DYNAMICHS will have the property that ndi.cs includes non-minimal conflict sets w.r.t. the current DPI, which constitutes the first of two criteria that are together sufficient for a safe pruning of ndi. By safe pruning we mean the deletion of a node without eliminating any minimal diagnoses w.r.t. the current DPI.

Proposition 12.6. If F is a collection of sets, and if S ∈ F and S′ ∈ F such that S ⊂ S′, then Fsub := F \ {S′} has the same minimal hitting sets as F.

Proof. Let D be a minimal hitting set of Fsub, then D is a hitting set of F since D ∩ S ≠ ∅ holds which implies by S ⊂ S′ that D ∩ S′ ≠ ∅. Assume that D is a non-minimal hitting set of F, i.e. that a subset D′ ⊂ D is a hitting set of F. Then, however, by minimality of D w.r.t. Fsub we have that not all sets in Fsub are hit by D′ and thus, by Fsub ⊂ F, that not all sets in F can be hit by D′, contradiction. Thus, each minimal hitting set of Fsub is also a minimal hitting set of F.

Let D be a minimal hitting set of F, then D is clearly a hitting set of Fsub ⊂ F. Suppose that D is a non-minimal hitting set of Fsub, i.e. that a proper subset of D is a hitting set of Fsub. Let D′ ⊂ D be a subset-minimal such subset of D. That is, D′ is a minimal hitting set of Fsub. Since D is a minimal hitting set of F, D′ is not a (minimal) hitting set of F, but a minimal hitting set of Fsub. This is a contradiction to the already proven fact that any minimal hitting set of Fsub is also a minimal hitting set of F.
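Proposition 12.6 can be checked on a small instance by brute force. The Python sketch below enumerates minimal hitting sets by increasing cardinality. It is a naive enumeration for illustration only, not the hitting set tree algorithm discussed in this work, and the collection `F` is a made-up example with S = {1, 2} and S′ = {1, 2, 3}.

```python
from itertools import combinations
from typing import FrozenSet, List, Set


def minimal_hitting_sets(family: List[FrozenSet[int]]) -> Set[FrozenSet[int]]:
    """Enumerate all minimal hitting sets of a small family by brute force."""
    universe = sorted(set().union(*family))
    found: List[FrozenSet[int]] = []
    # Candidates are visited by increasing size, so every hitting set that
    # contains no previously found one is subset-minimal.
    for size in range(len(universe) + 1):
        for candidate in combinations(universe, size):
            s = frozenset(candidate)
            if any(h <= s for h in found):
                continue  # a smaller hitting set is contained in s
            if all(s & member for member in family):
                found.append(s)
    return set(found)


# Hypothetical collection with S = {1, 2} being a proper subset of S' = {1, 2, 3}.
F = [frozenset({1, 2}), frozenset({1, 2, 3}), frozenset({3, 4})]
F_sub = [member for member in F if member != frozenset({1, 2, 3})]

same = minimal_hitting_sets(F) == minimal_hitting_sets(F_sub)
```

The enumeration is exponential in the number of elements and is meant only to make the statement tangible on toy inputs.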

Assume the first criterion for a safe pruning of a node ndi, namely the existence of some non-minimal conflict set w.r.t. the current DPI in ndi.cs, is met. Then, we do not yet have any evidence that ndi is obsolete since for each of the non-minimal conflict sets in ndi.cs there must be one (or multiple) proper subset(s) which is a minimal conflict set w.r.t. the current DPI. Let C¬min be one particular non-minimal conflict set in ndi.cs and let C be the particular proper subset of C¬min that is the first “witness” found by DYNAMICHS which documents the non-minimality of C¬min. Then C¬min can be split into two disjoint parts, namely C and the set of formulas C̄ := C¬min \ C that C¬min does not share with C.

Now, the second criterion for a safe pruning of ndi is about whether ndi hits C̄, i.e. whether the edge label that ndi takes from C¬min lies in C̄. If so, then ndi is not a (partial) hitting set of only minimal conflict sets w.r.t. the current DPI. Put another way, this means that, under the assumption that a wpHS-tree was constructed using only the “static” current DPI, the label C¬min would never have been produced and hence the node ndi could never have been generated. Eventually, by the considerations made in Sections 4.6.3 and 11.4, we know that such a static hitting set tree algorithm is complete although not taking into account nodes like ndi.

These thoughts motivate the following definition of a redundant node28.

Definition 12.4. Let ⟨K, B, P, N⟩R be the DPI and P′ and N′ the sets of positively and negatively answered queries given as an input to DYNAMICHS. Further, let nd be a node in DYNAMICHS. Then we call nd a redundant node w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R iff there is

• some r ∈ {1, . . . , |nd|} and

• some minimal conflict set C w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R

such that

• C ⊂ nd.cs[r] and

• nd[r] ∈ nd.cs[r] \ C.

Moreover, C is called a witness of redundancy of nd.
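The two conditions of Definition 12.4 translate directly into a check for a given candidate witness. The Python sketch below assumes that the witness passed in is already known to be a minimal conflict set w.r.t. the current DPI (this is not verified by the code), the function name `redundancy_index` is hypothetical, and the node is the one from Example 12.4 combined with made-up witnesses.

```python
from typing import FrozenSet, List, Optional


def redundancy_index(nd: List[int],
                     nd_cs: List[FrozenSet[int]],
                     witness: FrozenSet[int]) -> Optional[int]:
    """Return the smallest r (1-based) meeting both conditions, else None."""
    for r, (ax, conflict) in enumerate(zip(nd, nd_cs), start=1):
        # C ⊂ nd.cs[r] and nd[r] ∈ nd.cs[r] \ C
        if witness < conflict and ax in conflict - witness:
            return r
    return None


nd = [1, 2, 3, 4]
nd_cs = [frozenset({1, 2, 3}), frozenset({2, 6}),
         frozenset({3, 6, 7}), frozenset({4, 5})]

# Hypothetical witness {6, 7}: a proper subset of nd.cs[3] that misses nd[3] = 3.
r_redundant = redundancy_index(nd, nd_cs, frozenset({6, 7}))

# Hypothetical witness {3, 6}: a proper subset of nd.cs[3] that still contains nd[3].
r_not_redundant = redundancy_index(nd, nd_cs, frozenset({3, 6}))

# The subnode nd[1..3] is redundant for the same witness and the same r.
r_subnode = redundancy_index(nd[:3], nd_cs[:3], frozenset({6, 7}))
```

Since the conditions only refer to position r, any successor of a node that satisfies them at position r satisfies them at the same position, as the comparison of `r_subnode` and `r_redundant` shows.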

A node node in DYNAMICHS can only be redundant w.r.t. a DPI DPI if node.cs comprises some non-minimal conflict set w.r.t. DPI:

Corollary 12.5. Let ⟨K, B, P, N⟩R be the DPI and P′ and N′ the sets of positively and negatively answered queries given as an input to DYNAMICHS. Further, let nd be a node in DYNAMICHS such that nd.cs[i] ∈ mC⟨K,B,P∪P′,N∪N′⟩R for all i ∈ {1, . . . , |nd|}. Then nd is not a redundant node w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R.

Proof. Since nd.cs comprises only minimal conflict sets w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R, there cannot be any C ∈ mC⟨K,B,P∪P′,N∪N′⟩R such that C ⊂ nd.cs[i] for some i. Hence, by Definition 12.4, nd is not redundant.

A node that is redundant w.r.t. some DPI DPI remains redundant w.r.t. any DPI′ that includes a superset of the test cases DPI includes:

Lemma 12.4. Let ⟨K, B, P, N⟩R and ⟨K, B, P′, N′⟩R be two DPIs such that P′ ⊇ P and N′ ⊇ N. Further, let nd be a redundant node w.r.t. ⟨K, B, P, N⟩R. Then, nd is a redundant node w.r.t. ⟨K, B, P′, N′⟩R.

Proof. By Proposition 12.1, if ⟨K, B, P′, N′⟩R results from the addition of a single new positive or negative test case to ⟨K, B, P, N⟩R, there cannot be any minimal conflict set w.r.t. ⟨K, B, P′, N′⟩R that is a proper superset of a minimal conflict set w.r.t. ⟨K, B, P, N⟩R. By Definition 12.4, we can derive that any redundant node w.r.t. ⟨K, B, P, N⟩R must be a redundant node w.r.t. ⟨K, B, P′, N′⟩R. The proposition of this lemma is a consequence of further applications of Proposition 12.1.

This implies that a redundant node that is deleted during the execution of DYNAMICHS using the current DPI DPI cannot become non-redundant throughout the entire remaining execution of the interactive debugging session, i.e. the execution of Algorithm 5. The reason for this is that the sets of test cases in a DPI can only be extended and not reduced in the course of debugging.

Remark 12.7 Note that this has consequences for how “mind-changes” of a user might be handled by the interactive algorithm. It implies that the current state of DYNAMICHS (stored in the output variables of DYNAMICHS) cannot be exploited in case a user decides to discard some already answered query or to switch the already submitted answer of some query, resulting in some modified DPI DPI′. In such a situation a new construction of a hitting set tree by DYNAMICHS using the DPI DPI′ is indicated. Otherwise, some already pruned redundant node w.r.t. DPI might become a relevant node for DPI′ which would lead to a violation of the postulated completeness of DYNAMICHS w.r.t. each current DPI, in this case the DPI DPI′.

The following result is straightforward and claims that each successor node of a redundant node  ndiw.r.t. DPI is a redundant node w.r.t. DPI. So, if r is the minimal value such that both criteria of Definition 12.4 hold for  ndi, all successor nodes of the subnode  ndi[1..r]of  ndican be deleted. In other words, the entire subtree (of the hitting set tree produced by DYNAMICHS) rooted at an outgoing edge e of a non-minimal conflict set where e is labeled by an element ax which is not an element of a given witness of redundancy is obsolete.

Lemma 12.5. Let  ⟨K, B, P, N ⟩Rbe a DPI, nd be a redundant node w.r.t.  ⟨K, B, P, N ⟩Rand  nd′be a successor node of nd. Then,  nd′is a redundant node w.r.t.  ⟨K, B, P, N ⟩R.

Proof. The proposition of this lemma is a direct consequence of Definition 12.4.

12.4.6 Hitting Set Tree Pruning in DYNAMICHS

The main pruning operations performed by DYNAMICHS take place in the scope of the UPDATETREE function, which is called right at the beginning of each call to DYNAMICHS. Assume a call to DYNAMICHS during Algorithm 5 given, among other arguments, the DPI ⟨K, B, P, N⟩R and the test cases P′ and N′, and suppose the last-but-one call to DYNAMICHS was given P′′ and N′′ as arguments. The job of UPDATETREE is to restore the parameters that store the state of DYNAMICHS (for DPI ⟨K, B, P ∪ P′′, N ∪ N′′⟩R) in a way that they include at least all nodes that would be included by the respective parameters produced by a call to DYNAMICHS for the static DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R.

Roughly speaking, this involves the following actions:

Pruning: Only nodes that are definitely redundant w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R are deleted. A node is definitely redundant if a witness of redundancy of it is known.

Replacement: A deleted redundant node is replaced by an alternative equal node that is non-redundant w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R, if such a node exists. Alternative equal nodes are constructed from the list of duplicate nodes Qdup.

Rearrangement: Nodes that “survived” all pruning steps or were introduced in the course of a replacement step, and for which no evidence w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R is given that they should be assigned to any other set, are reassigned to Q.

More concretely, UPDATETREE has the following effect on the collections  Q, D×, D⊃, Qdupwhich are, together with  D✓, the only node-storing collections of DYNAMICHS at the beginning of the execution of each call to DYNAMICHS:

(a) If nd is in  Qdup, then nd is removed from  Qduponly if there is a known witness of redundancy of nd w.r.t.  ⟨K, B, P ∪ P′, N ∪ N ′⟩R. If there is an alternative equal replacement node  nd′of nd which is constructable from some node in  Qdup, then  nd′is added to  Qdup.

(b) If nd is in Q, then nd is removed from Q only if there is a known witness of redundancy of nd w.r.t.  ⟨K, B, P ∪ P′, N ∪ N ′⟩R. If there is an alternative equal replacement node  nd′of nd which is constructable from some node in  Qdup, then  nd′is added to Q.

(c) If nd is in  D×and there is no known witness of redundancy of nd w.r.t.  ⟨K, B, P ∪ P′, N ∪ N ′⟩R, then nd is added to Q.

(d) If nd is in D× and nd is redundant w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R, and there is some alternative equal replacement node nd′ of nd which is constructable from some node in Qdup, then nd′ is added to Q.

(e) If nd is in  D⊃, there is no known witness of redundancy of nd w.r.t.  ⟨K, B, P ∪ P′, N ∪ N ′⟩Rand there is no known minimal diagnosis w.r.t.  ⟨K, B, P ∪ P′, N ∪ N ′⟩Rwhich is a proper subset of nd, then nd is added to Q.

(f) All nodes nd in  D✓are added to Q.

Some comments: Step (a) is conducted by PRUNEQDUP before PRUNE is called, for each witness of redundancy X of some node detected during the execution of UPDATETREE. PRUNE is the function that prunes or replaces nodes that are elements of any other collection than  Qdup, i.e.  Q, D×or  D⊃, and for which X is a witness of redundancy. In this vein, the PRUNE function just needs to perform a test whether there is any node in  Qdupthat enables the construction of a replacement node of a deleted node. No check for redundancy of nodes in  Qdupis necessary at this stage since  Qduphas already been processed and cleaned from all redundant nodes w.r.t.  ⟨K, B, P ∪ P′, N ∪ N ′⟩R.

Under the assumption that the deletion of a node redundant w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R is safe in terms of completeness of DYNAMICHS as to finding all minimal diagnoses w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R (which we will prove throughout this section), UPDATETREE acts safely. That is, deletion actions are performed only on the basis of given evidence in the form of a witness of redundancy. However, it must be emphasized that this does not necessarily imply the pruning or replacement of all redundant nodes w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R. This is intended, as guaranteeing complete pruning might be very costly in terms of execution time: it would involve the precomputation of all not-yet-computed minimal conflict sets w.r.t. the current DPI at once. In the worst case, since these computations would take place online, i.e. between two successive queries shown to the user, this would be anything but beneficial for an interactive algorithm whose usability and usefulness depend greatly on its timeliness. Apart from that, a single newly added test case can be expected to lead to the introduction of only a small number of minimal conflict sets w.r.t. the current DPI that are not minimal conflict sets w.r.t. the last-but-one DPI.

Which nodes are pruned throughout UPDATETREE depends on which witnesses of redundancy are found, i.e. which minimal conflict sets are computed. The UPDATETREE function is implemented to search in a targeted manner for witnesses of redundancy of stored nodes. That is, instead of just computing any minimal conflict set w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R, it focuses on the set of nodes D×, which includes the subset of all minimal diagnoses Dcalc computed in the last-but-one iteration of DYNAMICHS w.r.t. the last-but-one DPI ⟨K, B, P ∪ P′′, N ∪ N′′⟩R which are no diagnoses w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R. Note that we will prove later in this section that Dcalc, and thus D× and D✓ which are subsets thereof, will indeed comprise only minimal diagnoses. So, UPDATETREE looks for witnesses of redundancy by means of exactly these minimal diagnoses that have been invalidated through the addition of the most recently answered query to the test cases of the DPI. Each diagnosis nd w.r.t. the last-but-one DPI can be invalidated only because it does not hit some minimal conflict set w.r.t. the current DPI and not because it is a non-minimal hitting set of all minimal conflict sets w.r.t. the current DPI. This can be directly inferred from Proposition 12.4, which states that minimal diagnoses cannot shrink by the addition of a new test case, i.e. there cannot be any minimal diagnosis w.r.t. the current DPI which is a proper subset of nd.

Now, two cases can be identified for a minimal conflict set C w.r.t. the current DPI that is not hit by nd:

C1: C is not in a subset-relationship with any minimal conflict set in nd.cs. That is, C is definitely not a witness of redundancy of nd.

C2: C is in a subset-relationship with some minimal conflict set in nd.cs. That is, C satisfies the first criterion of a witness of redundancy of nd (cf. Definition 12.4). Thence, C might be a witness of redundancy of nd.

Now, the idea is to quickly find some C for a node nd ∈ D× such that C is a witness of redundancy of nd. This idea is implemented in the so-called Quick Redundancy Check (QRC) which

• calls QX just once given the DPI ⟨Und.cs \ nd, B, P, N⟩R with the usually very small KB Und.cs \ nd ⊆ K in order to calculate just one minimal conflict set C w.r.t. the current DPI

• and then verifies whether C is a witness of redundancy of nd by conducting at most |nd| subset-relationship checks.
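The witness test performed in the second step of QRC can be sketched as follows. This is only a minimal illustration, assuming that a node is represented by a Python list `nd` of edge labels together with a parallel list `cs` of conflict sets, and that the two criteria of Definition 12.4 are the ones used in the proofs of this section (X ⊂ nd.cs[r] and nd[r] ∈ nd.cs[r] \ X for some r). The function name `is_witness` is hypothetical and does not occur in Algorithm 9; the values mirror the node of Example 12.5.

```python
from typing import FrozenSet, Sequence


def is_witness(nd: Sequence[int],
               cs: Sequence[FrozenSet[int]],
               x: FrozenSet[int]) -> bool:
    """Return True if x meets both redundancy criteria at some position r.

    Assumed reading of Definition 12.4 (not reproduced in this section):
      (i)  x is a proper subset of cs[r], and
      (ii) the edge label nd[r] lies in cs[r] but not in x.
    """
    for label, conflict in zip(nd, cs):
        # '<' on frozensets is the proper-subset test; label is in conflict
        # by construction of a node, so criterion (ii) reduces to label not in x.
        if x < conflict and label not in x:
            return True
    return False


nd = [1, 2]
cs = [frozenset({1, 2, 3}), frozenset({2, 4, 5})]

print(is_witness(nd, cs, frozenset({2, 3})))  # True
print(is_witness(nd, cs, frozenset({3, 5})))  # False
```

The loop performs at most |nd| proper-subset checks, which matches the bound stated above.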

The following lemma confirms that QRC (lines 50-54 in Algorithm 9), if successful, indeed computes a witness of redundancy of nd and thus gives evidence that nd is redundant w.r.t. the current DPI.

Lemma 12.6 (Quick Redundancy Check – QRC). Let  ⟨K, B, P, N ⟩Rbe the DPI and  P′and  N ′the sets of positively and negatively answered queries given as an input to DYNAMICHS. Further, let nd be some node in DYNAMICHS. Then the following holds:

If QX(⟨Und.cs \ nd, B, P ∪ P′, N ∪ N′⟩R) returns a set C such that C ⊂ nd.cs[i] for some i ∈ {1, ..., |nd.cs|}, then

• nd is a redundant node w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R and

• C is a witness of redundancy of nd.

Proof. First,  Und.cs \ ndincludes all elements in the union of all conflict sets in nd.cs except for the elements occurring in nd. So, if QX(⟨Und.cs \ nd, B, P ∪ P′, N ∪ N ′⟩R)returns a set C, then C is a minimal conflict set w.r.t.  ⟨Und.cs \ nd, B, P ∪ P′, N ∪ N ′⟩Rby Proposition 4.9.

By Definition 4.1,  C ⊆ Und.cs \ ndholds wherefore  C ∩ nd = ∅. By  K ⊇ Und.cs \ ndand Remark 4.3, C is a minimal conflict set w.r.t.  ⟨K, B, P ∪ P′, N ∪ N ′⟩R.

If  C ⊂ nd.cs[i]for some  i ∈ {1, . . . , |nd.cs|}, then we have that C is a minimal conflict set w.r.t. ⟨K, B, P ∪P′, N ∪N ′⟩Rwhich is a proper subset of nd.cs[i]. Since  C ∩nd = ∅implies that  nd[i] /∈ Cfor all  i ∈ {1, . . . , |nd|}, we conclude that  nd[i] ∈ nd.cs[i] \ C. Now, by Definition 12.4, nd is a redundant node w.r.t.  ⟨K, B, P ∪ P′, N ∪ N ′⟩Rand C is a witness of redundancy of nd.

Remark 12.8 Please notice that the opposite direction does not necessarily hold. That is, if the node nd is redundant w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R, QX(⟨Und.cs \ nd, B, P ∪ P′, N ∪ N′⟩R) might return

• some C which is not a subset of any conflict set in nd.cs or

• ’no conflict’.

As an illustration of that remark, we give the following example:

Example 12.5 For instance, assume a node nd = [1, 2] with nd.cs = [⟨1, 2, 3⟩, ⟨2, 4, 5⟩] and that ⟨2, 3⟩ is a minimal conflict set w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R, wherefore nd is redundant by Definition 12.4. Then Und.cs \ nd = {3, 4, 5}.

Suppose that ⟨3, 5⟩ is a minimal conflict set w.r.t. the current DPI as well. So, in this case, QX(⟨{3, 4, 5}, B, P ∪ P′, N ∪ N′⟩R) might return ⟨3, 5⟩. However, ⟨3, 5⟩ is neither a subset of ⟨1, 2, 3⟩ nor a subset of ⟨2, 4, 5⟩, wherefore ⟨3, 5⟩ is no witness of redundancy of nd.

On the other hand, if we suppose that ⟨2, 3⟩ and ⟨2, 4, 5⟩ are the only minimal conflict sets w.r.t. the current DPI that are subsets of Und.cs = {1, 2, 3, 4, 5}, then ’no conflict’ is the output of the call to QX. This holds since nd[2] = 2 is an element of both ⟨2, 3⟩ and ⟨2, 4, 5⟩ and hence not an element of Und.cs \ nd = {3, 4, 5}. Therefore, neither ⟨2, 3⟩ nor ⟨2, 4, 5⟩ is returned by QX since QX(⟨{3, 4, 5}, B, P ∪ P′, N ∪ N′⟩R) can only return a set that is a subset of {3, 4, 5} by Proposition 4.9 and Definition 4.1.
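Both cases of Example 12.5 can be replayed with a small sketch of QRC. Since QX needs a reasoner, we substitute a hypothetical stand-in `toy_qx` that is simply handed the list of minimal conflict sets w.r.t. the current DPI and returns the first one contained in the given KB, or `None` in place of ’no conflict’. This stand-in, the function name `quick_redundancy_check` and the list-based node representation are assumptions made for illustration only.

```python
from typing import FrozenSet, Iterable, Optional, Sequence

Conflict = FrozenSet[int]


def toy_qx(kb: Iterable[int],
           minimal_conflicts: Sequence[Conflict]) -> Optional[Conflict]:
    """Hypothetical stand-in for QX: look up a known minimal conflict set
    that is contained in kb; None plays the role of 'no conflict'."""
    kb_set = frozenset(kb)
    for conflict in minimal_conflicts:
        if conflict <= kb_set:
            return conflict
    return None


def quick_redundancy_check(nd: Sequence[int],
                           cs: Sequence[Conflict],
                           minimal_conflicts: Sequence[Conflict]) -> Optional[Conflict]:
    """One QX call on U_nd.cs \\ nd, then at most |nd| proper-subset checks."""
    reduced_kb = frozenset().union(*cs) - frozenset(nd)
    c = toy_qx(reduced_kb, minimal_conflicts)
    if c is not None and any(c < conflict for conflict in cs):
        return c          # c is a witness of redundancy (cf. Lemma 12.6)
    return None           # QRC negative: nothing can be concluded


nd = [1, 2]
cs = [frozenset({1, 2, 3}), frozenset({2, 4, 5})]

# First case: <2,3> and <3,5> are minimal conflict sets w.r.t. the current DPI.
first_case = [frozenset({2, 3}), frozenset({3, 5})]
# Second case: only <2,3> and <2,4,5> lie inside {1,...,5}.
second_case = [frozenset({2, 3}), frozenset({2, 4, 5})]

print(quick_redundancy_check(nd, cs, first_case))   # None
print(quick_redundancy_check(nd, cs, second_case))  # None
```

In both runs the node is redundant (witness ⟨2, 3⟩), yet QRC returns nothing, which is the situation Remark 12.8 describes.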

In both cases of the previous example, an existing witness of redundancy of nd is not detected by QRC. In this situation, i.e. when QRC is negative, a Complete Redundancy Check (CRC) is performed, in which QX investigates each of the DPIs ⟨nd.cs[i] \ {nd[i]}, B, P ∪ P′, N ∪ N′⟩R for i ∈ {1, ..., |nd|} separately. CRC, as substantiated by the following lemma, does find a witness of redundancy if the node nd is redundant w.r.t. the current DPI; and, if CRC does not find a witness of redundancy w.r.t. the current DPI, then nd is non-redundant w.r.t. the current DPI.

Lemma 12.7 (Complete Redundancy Check – CRC). Let  ⟨K, B, P, N ⟩Rbe the DPI and  P′and  N ′the sets of positively and negatively answered queries given as an input to DYNAMICHS. Further, let nd be some node in DYNAMICHS. Then, the following holds:

(1) nd is redundant w.r.t.  ⟨K, B, P ∪ P′, N ∪ N ′⟩Riff there is some  i ∈ {1, . . . , |nd|}such that QX(⟨nd.cs[i] \ {nd[i]} , B, P ∪ P′, N ∪ N ′⟩R) = Xwhere  X ̸=’no conflict’.

(2) If there is some  i ∈ {1, . . . , |nd|}such that QX(⟨nd.cs[i] \ {nd[i]} , B, P ∪ P′, N ∪ N ′⟩R) = Xwhere  X ̸=’no conflict’, then X is a witness of redundancy of nd.

Proof. (1): “⇐”: Assume there is some i ∈ {1, ..., |nd|} such that QX(⟨nd.cs[i] \ {nd[i]}, B, P ∪ P′, N ∪ N′⟩R) = X where X ≠ ’no conflict’. Then, by Proposition 4.9, we have that X is a minimal conflict set w.r.t. ⟨nd.cs[i] \ {nd[i]}, B, P ∪ P′, N ∪ N′⟩R such that X ⊆ nd.cs[i] \ {nd[i]}. Hence, we can conclude that nd[i] ∉ X. By Definition 4.1, X is also a minimal conflict set w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R. By Definition 12.1 and since nd is a node in DYNAMICHS, it holds that nd[i] ∈ nd.cs[i]. As a consequence, X ⊂ nd.cs[i] and nd[i] ∈ nd.cs[i] \ X hold. By Definition 12.4, nd is redundant w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R (and X is a witness of redundancy of nd).

“⇒”: Suppose nd is a redundant node w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R. Then, by Definition 12.4, there must be some r ∈ {1, ..., |nd|} and some minimal conflict set X w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R such that (i) X ⊂ nd.cs[r] and (ii) nd[r] ∈ nd.cs[r] \ X. By (ii), nd[r] ∉ X. By Definition 12.1 and the fact that nd is a node in DYNAMICHS, we obtain that nd[r] ∈ nd.cs[r] must be true. Hence, by (i), we derive that X ⊆ nd.cs[r] \ {nd[r]}. By Proposition 4.9, QX given some DPI outputs a minimal conflict set w.r.t. this DPI iff there is a minimal conflict set w.r.t. this DPI. Since QX(⟨nd.cs[i] \ {nd[i]}, B, P ∪ P′, N ∪ N′⟩R) is called for each i ∈ {1, ..., |nd|}, it is in particular called for i := r. So, some minimal conflict set X′, and not ’no conflict’, must be returned by QX(⟨nd.cs[r] \ {nd[r]}, B, P ∪ P′, N ∪ N′⟩R) since there is at least one minimal conflict set w.r.t. ⟨nd.cs[r] \ {nd[r]}, B, P ∪ P′, N ∪ N′⟩R, namely X.

(2): This proposition follows directly from (1) (“⇐”).
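The statement of Lemma 12.7 can be illustrated by a short sketch of CRC. As before, a reasoner-backed QX is not available here, so we assume a hypothetical stand-in `toy_qx` that is given the minimal conflict sets w.r.t. the current DPI explicitly; `complete_redundancy_check` is likewise a name chosen for illustration and is not taken from Algorithm 9.

```python
from typing import FrozenSet, Iterable, Optional, Sequence

Conflict = FrozenSet[int]


def toy_qx(kb: Iterable[int],
           minimal_conflicts: Sequence[Conflict]) -> Optional[Conflict]:
    """Hypothetical stand-in for QX; None plays the role of 'no conflict'."""
    kb_set = frozenset(kb)
    for conflict in minimal_conflicts:
        if conflict <= kb_set:
            return conflict
    return None


def complete_redundancy_check(nd: Sequence[int],
                              cs: Sequence[Conflict],
                              minimal_conflicts: Sequence[Conflict]) -> Optional[Conflict]:
    """Call QX once per position i on the KB nd.cs[i] \\ {nd[i]}.

    Any returned set is a witness of redundancy; None means that nd is
    non-redundant w.r.t. the given minimal conflict sets."""
    for label, conflict in zip(nd, cs):
        x = toy_qx(conflict - {label}, minimal_conflicts)
        if x is not None:
            return x
    return None


nd = [1, 2]
cs = [frozenset({1, 2, 3}), frozenset({2, 4, 5})]

# The witness <2,3> missed by QRC in Example 12.5 is found at position 1.
print(complete_redundancy_check(nd, cs, [frozenset({2, 3}), frozenset({3, 5})]))
# If both sets in nd.cs are still minimal, no witness exists.
print(complete_redundancy_check(nd, cs, list(cs)))  # None
```

The price for completeness is up to |nd| calls to QX instead of one, albeit each on a KB no larger than a single conflict set.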

At the point where some witness of redundancy X of some node  nd ∈ D×is found by QRC or CRC in UPDATETREE, the next steps (lines 62-65) involve the pruning of  Qdup, Q, D×and  D⊃. As already mentioned,  Qdupis the first collection to be cleaned from redundant nodes (w.r.t. the witness X) in PRUNEQDUP in order to constitute an input to the PRUNE function that does not include any redundant nodes (w.r.t. the witness X) and can be used “blindly” to construct replacement nodes of redundant nodes (w.r.t. the witness X) deleted from  Q, D×or  D⊃.

Before any pruning steps have ever been executed during the execution of Algorithm 5,  Qdupcomprises all generated nodes  nddupfor which, at generation time, there was one node  nd ∈ Qsuch that nddup = nd. That means,  nddupis stored in  Qdupin order to be available as an alternative equal node of nd or as an alternative subnode of some successor of nd in case nd is found to be redundant w.r.t. some current DPI.

If some node nd_dup in Qdup is found to be redundant w.r.t. the current DPI, there might be other nodes in Qdup from which a non-redundant alternative equal node nd′_dup of nd_dup w.r.t. the current DPI can be constructed. By Definition 12.3, we call such a node nd′_dup a combined replacement node of nd_dup. The name stems from the fact that nd′_dup is generated as a combination of existing nodes in Qdup. Combining two nodes nd1, nd2 ∈ Qdup such that nd1 is a proper alternative subnode of nd2 yields nd3 with nd2 = nd3. Here, nd3 is constructed by replacing the first (redundant) part of nd2 (and nd2.cs) with the (non-redundant) part nd1 (and nd1.cs).

Such a combination is “legitimate” since it gives a node nd3 that would have been constructed if all duplicate nodes had been added to Q and processed regularly instead of being added to Qdup. The strategy to store duplicate nodes (where “duplicate” refers to the set a node represents) in a separate collection Qdup as soon as they are found is part of the space-saving policy the DYNAMICHS algorithm pursues. For, in general, this prevents the algorithm from generating and storing exponentially many nodes corresponding to equal sets. Since diagnoses are sets and not lists like nodes, it suffices to find only one node corresponding to a diagnosis. Only if some active node (one that is not in Qdup) becomes redundant is some other set-equal node, if available, constructed from the stored duplicate nodes. This idea is very similar to the way pruning is handled in the directed acyclic graph described in [GSW89].
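A single combination step might be sketched as follows. The node values are invented for illustration and are not taken from the examples of this work; the pair-of-lists representation and the function name `combine` are assumptions as well. The sketch only checks set-equality of the replaced prefix and does not verify the further conditions of Definitions 12.2 and 12.3.

```python
from typing import FrozenSet, List, Tuple

# A node is modelled as (path, cs): edge labels plus the parallel conflict sets.
Node = Tuple[List[int], List[FrozenSet[int]]]


def combine(short: Node, long: Node) -> Node:
    """Replace the first |short| positions of long by short.

    short is expected to cover the same set of elements as the prefix of
    long that it replaces, so the result represents the same set as long."""
    s_path, s_cs = short
    l_path, l_cs = long
    n = len(s_path)
    if n > len(l_path) or set(s_path) != set(l_path[:n]):
        raise ValueError("short is not set-equal to a prefix of long")
    return s_path + l_path[n:], s_cs + l_cs[n:]


# Hypothetical data: X = <2,3> is a witness of redundancy of nd2 at position 1,
# whereas nd1 reaches the same set {1, 2} via conflict sets X does not affect.
nd2: Node = ([1, 2, 6], [frozenset({1, 2, 3}), frozenset({2, 6}), frozenset({3, 6, 7})])
nd1: Node = ([2, 1], [frozenset({2, 3}), frozenset({1, 7})])

nd3 = combine(nd1, nd2)
print(nd3[0])                      # [2, 1, 6]
print(set(nd3[0]) == set(nd2[0]))  # True
```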

The idea of node combination is formalized by the following definition.

Definition 12.5. Let S be a collection of nodes in DYNAMICHS and let S_i be the set of nodes of cardinality i in S. Further, let the set Comb_1(S) := S_1 and let Comb_i(S) comprise

• all nodes in S_i and

• all nodes nd such that nd is an alternative equal node of some node in S_i constructed from some node in $\bigcup_{j=1}^{i-1} \mathrm{Comb}_j(S)$.

Then, $\mathrm{Comb}(S) := \bigcup_{i=1}^{\infty} \mathrm{Comb}_i(S)$ is called the set of combined nodes of S and a node in Comb_i(S) is called a combined node of cardinality i in S.

Further, let node be a node in DYNAMICHS and X be a minimal conflict set w.r.t. the current DPI. Then,

• Comb_node(S) := {nd | nd ∈ Comb(S), nd = node} is the set of combined equal nodes of node in S and

• Comb_node,X(S) ⊆ Comb_node(S) is the set of combined equal nodes of node in S for which X is not a witness of redundancy.
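The layered construction of Comb(S) can be sketched as below. Definition 12.2 is not restated in this section, so the sketch assumes the following reading of “alternative subnode”: a node that covers the same set of elements as the equally long prefix of the other node but differs from that prefix in its labels or conflict sets. This reading, the tuple-based representation and the names `is_alt_subnode` and `comb` are assumptions for illustration; the data are invented.

```python
from typing import FrozenSet, Set, Tuple

# Hashable node model: (tuple of edge labels, tuple of conflict sets).
Node = Tuple[Tuple[int, ...], Tuple[FrozenSet[int], ...]]


def is_alt_subnode(sub: Node, node: Node) -> bool:
    """Assumed reading of 'alternative subnode' (see lead-in)."""
    n = len(sub[0])
    if n > len(node[0]):
        return False
    same_set = set(sub[0]) == set(node[0][:n])
    differs = sub[0] != node[0][:n] or sub[1] != node[1][:n]
    return same_set and differs


def comb(s: Set[Node]) -> Set[Node]:
    """Build Comb_1(S), Comb_2(S), ... in order of cardinality and return their union."""
    max_card = max((len(nd[0]) for nd in s), default=0)
    lower: Set[Node] = set()              # union of Comb_j(S) for j < i
    for i in range(1, max_card + 1):
        s_i = {nd for nd in s if len(nd[0]) == i}
        comb_i = set(s_i)
        for node in s_i:
            for sub in lower:
                if is_alt_subnode(sub, node):
                    j = len(sub[0])
                    # alternative equal node of `node` constructed from `sub`
                    comb_i.add((sub[0] + node[0][j:], sub[1] + node[1][j:]))
        lower |= comb_i
    return lower


short: Node = ((2, 1), (frozenset({2, 3}), frozenset({1, 7})))
long: Node = ((1, 2, 6), (frozenset({1, 2, 3}), frozenset({2, 6}), frozenset({3, 6, 7})))

result = comb({short, long})
print(len(result))  # 3
```

Because `lower` only ever holds nodes of cardinality below i when Comb_i(S) is built, every combination uses a proper subnode, in line with the index range j ≤ i − 1 of the definition.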

The following corollary summarizes some simple consequences of Definition 12.5.

Corollary 12.6. Let S be a set of nodes in DYNAMICHS and let S_i be the set of nodes of cardinality i in S. Then:

(1) Comb_i(S) = ∅ iff S_i = ∅.

(2) Comb_i(S) includes only nodes of cardinality i.

(3) Comb_node(S) = ∅ iff there is no node nd ∈ S such that nd = node.

(4) If nd ∈ Comb_i(S) and nd ∉ S, then there is some nd′ ∈ Comb_j(S) for some j ∈ {1, ..., i − 1} and some nd′′ ∈ S_i such that

• nd′ is an alternative subnode of nd′′ and

• nd = ADD(nd′, nd′′[j + 1..i]) and

• nd.cs = ADD(nd′.cs, nd′′.cs[j + 1..i]).

The example we give next illustrates Definition 12.5.

Example 12.6 Recall the nodes  nd, nd1, nd2, node1and  node2of Example 12.4 and let  nd3 :=[1, 2, 6, 4] with  nd3.cs := [⟨1, 2, 3⟩ , ⟨2, 6⟩ , ⟨3, 6, 7⟩ , ⟨4, 5⟩]and  S := {nd, nd1, nd2, nd3}. Then,

image

where

image

is the alternative equal node of  nd3constructed from  nd1.

The PRUNEQDUP function is always called given the current list Qdup, which is at all times sorted in ascending order by node cardinality. This holds by lines 21, 121 and 124, which are the only places where nodes are added to Qdup throughout DYNAMICHS and where nodes are inserted into Qdup such that the order by node cardinality is preserved. Now, the next lemma substantiates that PRUNEQDUP, given some minimal conflict set X w.r.t. the current DPI, updates Qdup in a way that all redundant nodes w.r.t. the witness X are deleted, each deleted node is replaced by one non-redundant combined replacement node w.r.t. the witness X if such a node is constructable (cf. Definition 12.5), and for each remaining node nd, i.e. nd is a non-deleted node or a combined replacement node of some deleted node, each superset of X in nd.cs is replaced by X.

This leads to a new list  Qdupreturned by PRUNEQDUP which includes only non-redundant nodes w.r.t. the witness X. Furthermore, the new list  Qdupcontains a node corresponding to each set (path) S for which there was a corresponding node in the old list  Qdupif there would be a non-redundant (w.r.t. X) node corresponding to S in a hitting set tree equal to the one produced by DYNAMICHS except that all duplicate nodes corresponding to equal sets (paths) would be regularly processed and expanded.

Lemma 12.8. Let  ⟨K, B, P, N ⟩Rbe a DPI and let the input parameters to the PRUNEQDUP function be:

X is a minimal conflict set w.r.t.  ⟨K, B, P, N ⟩R,

Dup is a set of nodes sorted ascending by node cardinality.

Then, PRUNEQDUP returns a set of nodes Dupnew that includes

(1) all nodes in Dup for which X is not a witness of redundancy,

(2) at least one node in  Combnd,X(Dup)for each node  nd ∈ Dupfor which X is a witness of redundancy, if  Combnd,X(Dup) ̸= ∅and

(3) only nodes nd such that there is no  r ∈ {1, . . . , |nd|}for which  nd.cs[r] ⊃ X.

Proof. The function PRUNEQDUP walks through all nodes ndi in the set Dup. If X is not a witness of redundancy of ndi, tested in lines 111 and 112 exactly as prescribed by Definition 12.4, then k = 0 must hold in line 116 by lines 109-115. Thus, line 124 is executed and ndi added to  Dupnew. Since no nodes are removed from  Dupnewthroughout PRUNEQDUP, proposition (1) is valid.

Otherwise, i.e. if X is a witness of redundancy of ndi, then line 113 must have been executed at least once before line 116 is reached. This implies that k > 0 must hold in line 116. At this point, k stores the maximum position in (the list) ndi at which the redundancy criterion of lines 111 and 112 is satisfied. So, in line 117, nodes in  Dupneware tested successively until some  ndj ∈ Dupnewmeets  |ndj| ≥ kand ndi[1..|ndj|] = ndj. This means that the subnode ndi[1..|ndj|] of ndi can be replaced by ndj (and ndi.cs[1..|ndj|] by ndj.cs) to yield an alternative equal node  ndinewof ndi (lines 119 and 120).

We still have to show that X cannot be a witness of redundancy of  ndinew. For this to hold it is sufficient that X is not a witness of redundancy of ndj by  |ndj| ≥ k. So, we must verify that  Dupnewcan comprise only nodes of which X is not a witness of redundancy. We prove this by induction.

Since  Dupnewis initialized to be the empty set when the function PRUNEQDUP starts executing, we just need to investigate which nodes are added to  Dupnewwithin PRUNEQDUP. Addition of nodes to Dupnewhappens at lines 121 and 124.

Base case: When line 121 is executed for the first time during the execution of PRUNEQDUP, Dupnew can only comprise nodes which have been added to it in line 124. By the argumentation used to prove proposition (1) of this lemma, it holds that X is not a witness of redundancy of any node added to Dupnew in line 124. Thus, X cannot be a witness of redundancy of the very first node added to Dupnew in line 121.

Induction step: Let us assume that  Dupnewcomprises only nodes such that X is not a witness of redundancy of any of them. Further, suppose that  ndinewis added to  Dupnewwhen line 121 is executed for the k-th time where k > 1. Then, by the same line of argument as in the base case, we can conclude that X is not a witness of redundancy of  ndinew.

Each node  ndinewadded to  Dupnewin line 121 is an element of  Combndi,X(Dup). Namely, ndj satisfies the criterion in line 118 and thus  ndinewis an element of  Combndi(Dup)by Definition 12.5. And, as shown before, X is not a witness of redundancy of  ndinew, wherefore  ndinew ∈ Combndi,X(Dup)by the definition of  Combndi,X(Dup)(Definition 12.5).

Thence, if  Combndi,X(Dup) ̸= ∅, there must be at least one node nd added to  Dupnewsuch that nd ∈ Combndi,X(Dup)and X is not a witness of redundancy of nd. Consequently, proposition (2) holds.

Proposition (3): First, observe that each node in Dup is definitely processed as ndi by the for-loop in line 107 and the fact that there is no criterion that can cause a preliminary break of this for-loop. Each time the first part of the redundancy check (line 111) is successful for ndi, we know that some conflict set ndi.cs[m] is non-minimal w.r.t.  ⟨K, B, P, N ⟩R. If the second part of the redundancy check (line 112) is negative, then  ndi[m] ∈ X, wherefore there is – at least so far – no evidence that ndi is redundant w.r.t.  ⟨K, B, P, N ⟩R. In this case, ndi might later be inserted to  Dupnew(in case X is not a witness of redundancy of ndi) and hence the set ndi.cs[m] is replaced by the minimal conflict set X w.r.t. ⟨K, B, P, N ⟩Rin line 115. If the second part of the redundancy check in line 112 is positive, then it is guaranteed that ndi is either combined-replaced or pruned. This holds due to lines 116-122 and since k > 0 must be true due to line 113. That a combined replacement node that might be found for some redundant ndi throughout lines 116-122 meets proposition (3) can be shown by induction in a very similar way as proposition (2) was shown.
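The control flow that the proof walks through can be condensed into the following sketch. It is reconstructed solely from the description in the proof of Lemma 12.8 and is not the pseudocode of Algorithm 9; the list-based node representation, the function name `prune_qdup` and the example data are assumptions for illustration.

```python
from typing import FrozenSet, List, Tuple

Node = Tuple[List[int], List[FrozenSet[int]]]


def prune_qdup(dup: List[Node], x: FrozenSet[int]) -> List[Node]:
    """dup must be sorted ascending by node cardinality; x is a minimal conflict set."""
    dup_new: List[Node] = []
    for path, orig_cs in dup:
        cs = list(orig_cs)
        k = 0                                    # highest position where x witnesses redundancy
        for m, (label, conflict) in enumerate(zip(path, cs), start=1):
            if x < conflict:                     # first criterion: proper superset of x
                if label not in x:               # second criterion
                    k = m
                else:
                    cs[m - 1] = x                # shrink the non-minimal conflict set to x
        if k == 0:                               # x is no witness: keep the node
            dup_new.append((list(path), cs))
            continue
        # x is a witness: look for an already accepted node covering positions 1..k
        replacement = next(
            ((r_path, r_cs) for r_path, r_cs in dup_new
             if k <= len(r_path) <= len(path)
             and set(r_path) == set(path[:len(r_path)])),
            None,
        )
        if replacement is not None:              # combined replacement node
            r_path, r_cs = replacement
            n = len(r_path)
            dup_new.append((r_path + list(path[n:]), r_cs + cs[n:]))
        # otherwise the node is pruned without replacement
    return dup_new


x = frozenset({2, 3})
dup: List[Node] = [
    ([2, 1], [frozenset({2, 3}), frozenset({1, 7})]),
    ([2, 4], [frozenset({1, 2, 3}), frozenset({4, 5})]),
    ([1, 2, 6], [frozenset({1, 2, 3}), frozenset({2, 6}), frozenset({3, 6, 7})]),
]
for path, cs in prune_qdup(dup, x):
    print(path, [sorted(c) for c in cs])
```

In this run the first node is kept unchanged, the second is kept with ⟨1, 2, 3⟩ replaced by X, and the third is replaced by the combination of the first node with its remaining suffix, so no returned node contains a proper superset of X.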

The following corollary is a direct consequence of Lemma 12.8 and states that the updated list  Qdup(if interpreted as a set) is a subset of the set of combined nodes of the old list  Qdup. In other words, no nodes corresponding to sets (paths) that are not represented by a node in the old list  Qdupcan be introduced throughout PRUNEQDUP. The introduction of such nodes corresponding to “new” sets (paths) can only take place in line 21 where newly generated nodes are added to  Qdup.

Corollary 12.7. Given the same preconditions as in Lemma 12.8, PRUNEQDUP returns  Dupnewwhere Dupnew ⊆ Comb(Dup).

The following result provides sufficient and necessary criteria for a node nd to be a combined node of Qdup. Roughly, these criteria involve the existence of a sequence of nodes  nd1, . . . , ndk ∈ Qdupwhere each node in this sequence is a proper alternative subnode of the next node and nd is constructed from this sequence of nodes in that nd is an alternative equal node of  ndkconstructed from  nd′k−1. nd′k−1in turn is an alternative equal node of  ndk−1constructed from  nd′k−2, and so on. Finally,  nd′2is an alternative equal node of  nd2constructed from  nd1and  nd1 ∈ Qdup.

Lemma 12.9. Let nd be a node in DYNAMICHS. Then,  nd ∈ Comb(Qdup)iff there are nodes  nd1, . . ., ndk ∈ Qdupfor  k ≥ 1such that

(1) |nd1| <· · ·  < |ndk| = |nd|,

(2) it holds that

image

(3)  ndiis an alternative subnode of  ndi+1for  i ∈ {1, . . . , k − 1}.

Proof. “⇒”: Suppose nd ∈ Comb(Qdup) and that |nd| = i. Then, there are two cases, either nd ∈ Qdup or nd ∉ Qdup. In the former case, we can define nd_1 as nd and the proposition of the lemma holds. In the latter case, by proposition 2 of Corollary 12.6, Definition 12.5 and |nd| = i, it holds that nd ∈ Comb_i(Qdup). By proposition 4 of Corollary 12.6 and the fact that nd ∈ Comb_i(Qdup), there is

image

an alternative subnode of nd_k and that nd[i_k] = nd_k[i_k] for i_k ∈ {|nd_{k−1}| + 1, ..., |nd_k|} must be true.

That is, propositions (1), (2) and (3) hold for nd_k and nd_{k−1}. Now, again, there are two cases for nd_{k−1}, i.e. either nd_{k−1} ∈ Qdup or nd_{k−1} ∉ Qdup. In the former case, we can define nd_1 as nd_{k−1} and the proposition of the lemma holds. In the latter case, the same argumentation as for nd can be applied to show the existence of some nd_{k−2} that meets propositions (1), (2) and (3). Due to the fact that the cardinality of nd_{k−i−1} is strictly smaller than the cardinality of nd_{k−i} for all i and the fact that Comb_1(Qdup) = Qdup, the case nd_{k−m} ∈ Qdup must finally arise for some m.

“⇐”: Suppose there are nodes nd_1, ..., nd_k ∈ Qdup such that propositions (1)-(3) are satisfied. Let k = 1. Then, by propositions (1) and (2) of this lemma, we have that nd is the same node as nd_1. Since nd_1 ∈ Qdup and by Definition 12.5, we have that nd ∈ Comb(Qdup). So, the lemma holds for k = 1.

Now, assume that the lemma holds for k = m for some natural number m. That is, assume that there is a node nd ∈ Comb(Qdup) if there are nodes nd_1, ..., nd_m ∈ Qdup such that |nd_1| < · · · < |nd_m| = |nd|,

image

and nd_i is an alternative subnode of nd_{i+1} for i ∈ {1, ..., m}. What we need to show is that nd′ ∈ Comb(Qdup).

If nd′ ∈ Qdup, then, by Definition 12.5, the lemma is true. So suppose nd′ ∉ Qdup. By the definition of an alternative subnode (Definition 12.2), node_sub ⊆ node in case node_sub is an alternative subnode of node. So, because nd_i is an alternative subnode of nd_{i+1} and |nd_i| < |nd_{i+1}| for i ∈ {1, ..., m}, we have that nd_i ⊂ nd_{i+1} for i ∈ {1, ..., m}. Consequently, nd_i ⊆ nd′ for i ∈ {1, ..., m + 1} and nd_i ⊆ nd for i ∈ {1, ..., m} must hold. Due to |nd_m| = |nd| we obtain the set-equality between nd_m and nd. This result along with nd_m ⊆ nd′ and |nd_m| < |nd_{m+1}| = |nd′| implies that nd ⊂ nd′. However, since

image

is met for ndx being the same node as nd as well as for ndx being the same node as  nd′, we can conclude that  nd[i] = nd′[i]for  i ∈ {1, . . . , |nd|}.

image

The PRUNE function (lines 63-65) is called given a collection S ∈ {Q, D×, D⊃}, a minimal conflict set X w.r.t. the current DPI and Qdup, which has already been updated and cleaned from redundant nodes (w.r.t. the witness X) by the PRUNEQDUP function. So, let nd_dup ∈ Qdup be a (not necessarily proper) alternative subnode of some node node that is stored in S. Assume X is a witness of redundancy of node. By Lemma 12.8 and since nd_dup ∈ Qdup, X cannot be a witness of redundancy of nd_dup. Further, let r ∈ {1, ..., |node|} be the highest number such that X ⊂ node.cs[r] and node[r] ∈ node.cs[r] \ X. Now, in case r ≤ |nd_dup| holds, nd_dup (and nd_dup.cs) can be used to replace the first |nd_dup| elements of node (and node.cs). The result is an alternative equal node of node which is non-redundant w.r.t. the current DPI and which can be added to S after deletion of node as a representative of the set (path) node has represented.

Now, the next lemma substantiates that PRUNE updates S in a way that all redundant nodes w.r.t. the witness X are deleted, each deleted node is replaced by one non-redundant replacement node w.r.t. the witness X if such a one is constructable from  Qdupand for each remaining node nd, i.e. nd is a non-deleted node or a replacement node of some deleted node, each superset of X in nd.cs is replaced by X.

This leads to a new set S returned by PRUNE which includes only non-redundant nodes w.r.t. the witness X. Furthermore, the new set S contains a node corresponding to each set (path) Y for which there was a corresponding node in the old set S if there would be a non-redundant (w.r.t. X) node corresponding to Y in a hitting set tree equal to the one produced by DYNAMICHS except that all duplicate nodes corresponding to equal sets (paths) would be regularly processed and expanded.

Lemma 12.10. Let  ⟨K, B, P, N ⟩Rbe a DPI and let the following be the input parameters to the PRUNE function:

X is a minimal conflict set w.r.t.  ⟨K, B, P, N ⟩R,

S is a set of nodes in DYNAMICHS,

Dup is a set of nodes where

image

for each  nd ∈ Sthere might be some  nd′ ∈ Dupsuch that  nd′is an alternative subnode of nd and

image

Then, PRUNE returns  S′where the following holds:

(1)  S′is a set such that  S \ S′includes exactly these nodes in S for which X is a witness of redundancy and  S ∩ S′includes exactly these nodes in S for which X is not a witness of redundancy.

(2) Each element  nd ∈ S′ \ Sis an alternative equal node of some node in  S \ S′constructed from some node in Dup such that X is not a witness of redundancy of nd.

(3) Let  nd ∈ S \ S′and  Altnddenote the set of all alternative equal nodes of nd, each of which can be constructed from some node in Dup and for each of which X is not a witness of redundancy. Then there is some  nd′ ∈ Altndsuch that  nd′ ∈ S′ \ S.

(4)  S′includes only nodes nd such that there is no  r ∈ {1, . . . , |nd|}for which  nd.cs[r] ⊃ X.

Proof. The PRUNE procedure runs through all nodes  nd ∈ Sand for each nd runs through all sets in nd.cs (lines 87 and 89). Lines 90 and 91 perform a check whether X is a witness of redundancy of nd, implementing exactly the criteria given by Definition 12.4. If the check is not successful for any i ∈ {1, . . . , |nd|}, i.e. X is not a witness of redundancy of nd, then k = 0 must hold when line 95 is reached. Hence, nd is added to  S′in line 103 in this case. As only nodes different from nd can be added to  S′in line 100 and as there are no other ways nodes might be added to  S′, we have that  S \ S′includes exactly these nodes in S for which X is a witness of redundancy and  S ∩ S′includes exactly these nodes in S for which X is not a witness of redundancy. So, proposition (1) is true.

The truth of proposition (2) can be derived as follows: By the proof of proposition (1), line 100 is the only place where nodes that are not elements of S are added to  S′. Hence, each node in  S′ \S must be added to  S′in line 100. Thus, only nodes  nodenew := ADD(node, nd[|node| + 1..|nd|])with  nodenew.cs := ADD(node.cs, nd.cs[|node| + 1..|nd|])constructed exactly as per Definition 12.2 in lines 98 and 99 where  nd ∈ Scan be added to  S′.

Now, we still have to show that node is an alternative subnode of nd. From the precondition that X is not a witness of redundancy of any node in Dup, X cannot be a witness of redundancy of node. Moreover, |node| ≥ k must hold as line 97 has been passed. So, we have that X must be a witness of redundancy for nd[1..|node|] since k > 0 (line 95) and by the way k is constructed (lines 88-92). Hence, there must be some j ∈ {1, . . . , |node|} with the property that node[j] ≠ nd[j] or node.cs[j] ≠ nd.cs[j], wherefore node is indeed an alternative subnode of nd. Thus, nodenew is an alternative equal node of nd by Definition 12.2.

That nd ∈ S \ S′ must be true can be explained as follows. By the argumentation used so far to prove propositions (1) and (2), we know that only nodes for which X is not a witness of redundancy can be added to S′ in line 100 and line 103. Moreover, we have shown that line 100 can only be reached for some node nd ∈ S for which X is a witness of redundancy. Consequently, nd ∉ S′ must hold.

That X is not a witness of redundancy of nodenew can be derived as follows: From the precondition that X is not a witness of redundancy of any node in Dup, X cannot be a witness of redundancy of nodenew[1..|node|] with nodenew.cs[1..|node|] since nodenew[j] = node[j] and nodenew.cs[j] = node.cs[j] for all j ∈ {1, . . . , |node|}. By lines 88-92, k is the maximum index such that X ⊂ nd.cs[k] and nd[k] ∈ nd.cs[k] \ X. Since |node| ≥ k, X cannot be a witness of redundancy of nodenew[|node| + 1..|nd|] with nodenew.cs[|node| + 1..|nd|] either since nodenew[j] = nd[j] and nodenew.cs[j] = nd.cs[j] for all j ∈ {|node| + 1, . . . , |nd|}. Therefore, X cannot be a witness of redundancy of nodenew.

Proposition (3): As already argued, for each node nd ∈ S \ S′, line 96 must be reached. Then, in line 96, all nodes in Dup are investigated in order to find an alternative subnode of nd. So, if such a node exists, it must be found.

Proposition (4): For a node nd that is added to S′ in line 103, the for-loop in line 89 must have been executed. Since, as already shown, line 92 cannot be executed for a node that is added to S′ in line 103, line 94 must have been executed for all i ∈ {1, . . . , |nd|}. Hence, proposition (4) holds for all nodes inserted into S′ in line 103.

For nodes nodenew inserted into S′ in line 100, proposition (4) follows from the precondition that Dup includes only nodes n such that there is no r ∈ {1, . . . , |n|} for which n.cs[r] ⊃ X, from the fact that node ∈ Dup and from the fact that line 94 must have been executed for all indices i > k.
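The witness check and the replacement-node construction used in this proof can be sketched in Python. This is only an illustration under stated assumptions, not the PRUNE procedure itself: the tuple representation of a node and the helper names `max_witness_index` and `build_replacement` are introduced here for the example, and the sample data are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Label = FrozenSet[int]
# Assumed representation: (edge labels nd[1..], conflict sets nd.cs[1..]).
Node = Tuple[List[int], List[Label]]


def max_witness_index(node: Node, x: Label) -> int:
    """Largest 1-based k with X a proper subset of nd.cs[k] and nd[k] in nd.cs[k] minus X.

    Returns 0 if no such index exists, i.e. X is not a witness of redundancy.
    """
    path, cs = node
    k = 0
    for i, (elem, label) in enumerate(zip(path, cs), start=1):
        if x < label and elem in label - x:  # `<` is the proper-subset test
            k = i
    return k


def build_replacement(node: Node, nd: Node) -> Node:
    """nodenew := ADD(node, nd[|node|+1..|nd|]); the labels are extended the same way."""
    (n_path, n_cs), (nd_path, nd_cs) = node, nd
    return n_path + nd_path[len(n_path):], n_cs + nd_cs[len(n_cs):]


# Hypothetical data: X witnesses redundancy of nd at position 2 only.
x = frozenset({4})
nd = ([1, 2, 3], [frozenset({1, 5}), frozenset({2, 4, 6}), frozenset({3, 6})])
node = ([2, 1], [frozenset({2, 7}), frozenset({1, 5})])  # stands in for a node from Dup

k = max_witness_index(nd, x)
node_new = build_replacement(node, nd)
```

In this toy instance k = 2 and |node| = 2 ≥ k, and X is not a witness of redundancy of the combined node, which mirrors the argument given for proposition (2).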

12.4.7 De-Facto Non-Redundant Nodes in DYNAMICHS

The following definition introduces a notion that is of rather theoretical use for the proof of completeness of DYNAMICHS we will give later. The definition assumes a fixed DPI and characterizes as active sublabel of a particular conflict set nd.cs[r] in nd.cs the subset of nd.cs[r] that “survives” all the pruning steps, i.e. PRUNEQDUP and PRUNE calls, during all executions of DYNAMICHS up to the one with a current DPI DPI. Notice that the shape of the active sublabel can never be known in advance as we do not know which witnesses of redundancy might be found. This makes up the theoretical nature of this definition. However, we will be able to show that no active sublabel of a node can be the empty set under certain preconditions that are met for DYNAMICHS.

Definition 12.6. Let

• nd be a node in DYNAMICHS,

• r ∈ {1, . . . , |nd|} be fixed,

• DPI1, . . . , DPIn be a sequence of DPIs where DPIj includes a proper subset of the test cases DPIj+1 includes for j ∈ {1, . . . , n − 1},

• DPIn be equal to DPI or include a proper subset of the test cases DPI includes,

• C1, . . . , Cn be the chronological sequence of all sets X given as an argument to PRUNE and PRUNEQDUP during all executions of DYNAMICHS up to and including the one with current DPI DPI where

  • each Ci is a minimal conflict set w.r.t. DPIi for i ∈ {1, . . . , n},

  • Ck ⊃ Ck+1 for k ∈ {1, . . . , n − 1},

  • nd.cs[r] ⊃ C1.

Then, we call Cn the active sublabel of nd.cs[r] w.r.t. DPI.

The next definition of a de-facto non-redundant node is based on Definition 12.6. A de-facto non-redundant node w.r.t. DPI includes at each position an element that hits the active sublabel w.r.t. DPI at this position. Again, this definition is of theoretical rather than practical use, but crucial for the proof of completeness of DYNAMICHS. In fact, we will be able to show that for each minimal diagnosis w.r.t. DPI there must be – anytime during any execution of DYNAMICHS with a current DPI including a subset of the test cases in DPI – a de-facto non-redundant node corresponding to a subset of this diagnosis. In further consequence, this will allow us to derive the algorithm’s completeness concerning the detection of all minimal diagnoses w.r.t. DPI.

Definition 12.7. We call a node nd in DYNAMICHS de-facto non-redundant w.r.t. DPI iff nd[r] is an element of an active sublabel w.r.t. DPI for all  r ∈ {1, . . . , |nd|}.

A de-facto non-redundant node w.r.t. a DPI DPI “survives” all pruning steps at least until the execution of DYNAMICHS with current DPI DPI:

Proposition 12.7. Let nd be a node which is de-facto non-redundant w.r.t. DPI. Then, nd cannot be pruned or replaced during any execution of DYNAMICHS up to and including the one with current DPI DPI.

Proof. By Definitions 12.6 and 12.7, PRUNE and PRUNEQDUP cannot be called given a witness of redundancy of nd during any execution of DYNAMICHS up to and including the one with current DPI DPI. By Lemmata 12.8 and 12.10, only nodes can be pruned or replaced for which the input set X given to PRUNE and PRUNEQDUP is a witness of redundancy.

Example 12.7 Let K = {1, . . . , 10} be the KB of the (admissible) input DPI DPI0 to Algorithm 5 and let nd := [1, 2, 3, 4] with nd.cs := [⟨1, 5, 7⟩, ⟨2, 4, 6⟩, ⟨3, 6, 7⟩, ⟨4, 5⟩] be a node stored by DYNAMICHS during the execution of some call to DYNAMICHS during Algorithm 5. Moreover, let DPI be a fixed DPI constructed during the execution of Algorithm 5 that includes a (not necessarily proper) superset of the test cases in DPI0. Assume that the chronological sequence of all inputs X to PRUNE and PRUNEQDUP throughout all executions of DYNAMICHS up to and including the one with current DPI DPI during Algorithm 5 and after nd has been generated is given by ⟨1, 6⟩, ⟨3, 7⟩, ⟨1, 3, 8⟩, ⟨2⟩, ⟨4⟩, ⟨1, 5⟩.

Then nd.cs undergoes the transition depicted by Table 12.1 induced by this sequence of X arguments to PRUNE/PRUNEQDUP. We can observe in Table 12.1 that each proper superset of some argument X of PRUNE/PRUNEQDUP that occurs in nd.cs is replaced by X (cf. Lemmata 12.8 and 12.10). This is the case, for instance, for X = ⟨3, 7⟩ in the second row of the table which replaces nd.cs[3] = ⟨3, 6, 7⟩. Similar situations can be found in rows 4-6. No changes to nd.cs are triggered for X = ⟨1, 6⟩ or X = ⟨1, 3, 8⟩ in rows 1 and 3, respectively, because at this stage nd.cs does not include any superset of X.

Table 12.1: Transition of nd.cs induced by multiple calls to PRUNE.

We learn from the last row of the table that nd is de-facto non-redundant w.r.t. DPI. This holds, first, since we considered the chronological sequence of all inputs X to PRUNE and PRUNEQDUP throughout all executions of DYNAMICHS up to and including the one with current DPI DPI during Algorithm 5. Second, we have that

image

where nd.cs′ is the value of nd.cs given by the last row of the table which is the “current” value of nd.cs during the execution of DYNAMICHS with current DPI DPI. By Definition 12.6, nd.cs′[i] is the active sublabel of nd.cs[i] w.r.t. DPI for i ∈ {1, . . . , 4}. That is, for example, ⟨3, 7⟩ is the active sublabel of nd.cs[3]. As we realized that each element of nd is an element of an active sublabel w.r.t. DPI, we obtain the de-facto non-redundancy of nd w.r.t. DPI as per Definition 12.7.
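The transition described in Example 12.7 can be replayed with a few lines of Python. The sketch below only models the label replacement stated in the text (every label that is a proper superset of X is replaced by X); the function name `apply_prune_inputs` is an assumption made for this illustration, and no other behaviour of PRUNE or PRUNEQDUP is modelled.

```python
from typing import FrozenSet, Iterable, List, Tuple

Label = FrozenSet[int]


def apply_prune_inputs(
    cs: Iterable[Iterable[int]], inputs: Iterable[Iterable[int]]
) -> Tuple[List[Label], List[List[Label]]]:
    """Replace each label that is a proper superset of X by X, in chronological order.

    Returns the final labels and the list of label states after each input X.
    """
    current: List[Label] = [frozenset(c) for c in cs]
    history: List[List[Label]] = []
    for raw_x in inputs:
        x = frozenset(raw_x)
        current = [x if x < label else label for label in current]
        history.append(list(current))
    return current, history


# Data of Example 12.7.
nd = [1, 2, 3, 4]
nd_cs = [{1, 5, 7}, {2, 4, 6}, {3, 6, 7}, {4, 5}]
inputs = [{1, 6}, {3, 7}, {1, 3, 8}, {2}, {4}, {1, 5}]

final, history = apply_prune_inputs(nd_cs, inputs)

# Definition 12.7: every nd[r] must be an element of the active sublabel at position r.
de_facto_non_redundant = all(e in label for e, label in zip(nd, final))
```

Under these assumptions the final labels are ⟨1, 5⟩, ⟨2⟩, ⟨3, 7⟩, ⟨4⟩, and `de_facto_non_redundant` evaluates to `True`, in line with the conclusion of the example.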

Notice that the sole definition of redundancy of a node w.r.t. DPI (Definition 12.4) does not perfectly serve our purposes as it does not take into account the order in which new conflict sets emerge and are used for pruning.

For instance, consider nd.cs[2] = ⟨2, 4, 6⟩ which includes 2 as well as 4. Both values ⟨2⟩ and ⟨4⟩ of X in rows 4 and 5 of Table 12.1 must be conflict sets w.r.t. DPI by Proposition 12.1, which says that conflict sets cannot grow after the addition of a test case to a DPI, and the fact that each X must be a minimal conflict set w.r.t. some DPI including a subset of the test cases in DPI. In fact, by Proposition 12.2 and the admissibility of DPI0, ⟨2⟩ and ⟨4⟩ are even minimal conflict sets w.r.t. DPI. Thus, application of Definition 12.4 yields that nd is redundant w.r.t. DPI because ⟨4⟩ ⊂ ⟨2, 4, 6⟩ and nd[2] = 2 ∈ ⟨2, 4, 6⟩ \ ⟨4⟩ (cf. Definition 12.4). However, bearing in mind that ⟨2⟩ was known to the algorithm before ⟨4⟩, i.e. ⟨2⟩ was used for pruning before ⟨4⟩, we have that the set nd.cs[2], after being modified by PRUNE or PRUNEQDUP, is not redundant w.r.t. DPI. This is true since the new set nd.cs[2] = ⟨2⟩ is not a superset of ⟨4⟩.

So, to summarize, a node is (theoretically) redundant w.r.t. DPI as per Definition 12.4 iff there is a minimal conflict set w.r.t. DPI which is a witness of redundancy of this node. However, as the example above has shown, whether a node is found to be redundant or not depends on the order of conflict sets used for pruning. This fact is also mentioned in [GSW89]. Moreover, a (theoretically) redundant node w.r.t. DPI does not necessarily need to be discovered by DYNAMICHS and might be modified by PRUNE or PRUNEQDUP in a way that it becomes non-redundant w.r.t. DPI.
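The order dependence can be made concrete with a small Python sketch. It combines the witness criterion quoted from Definition 12.4 with the label replacement observed in Table 12.1; the function name `found_redundant` is hypothetical and the sketch deliberately restricts the node to the single position 2 discussed in the example.

```python
from typing import FrozenSet, List, Sequence

Label = FrozenSet[int]


def found_redundant(path: Sequence[int], cs: Sequence[Label], inputs: Sequence[Label]) -> bool:
    """Feed the sets X in the given order; report whether some X is a witness of redundancy."""
    labels: List[Label] = list(cs)
    for x in inputs:
        # Witness criterion: X is a proper subset of a label and the path element lies outside X.
        if any(x < lab and e in lab - x for e, lab in zip(path, labels)):
            return True
        # Otherwise only the labels shrink: proper supersets of X are replaced by X.
        labels = [x if x < lab else lab for lab in labels]
    return False


path = [2]                      # nd[2] = 2
cs = [frozenset({2, 4, 6})]     # nd.cs[2] = <2, 4, 6>
two, four = frozenset({2}), frozenset({4})

order_of_example = found_redundant(path, cs, [two, four])
reversed_order = found_redundant(path, cs, [four, two])
```

With ⟨2⟩ applied first the node is never found to be redundant, whereas the reversed order makes ⟨4⟩ a witness of redundancy straight away.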

On the other hand, the definition of de-facto non-redundancy w.r.t. DPI (Definition 12.7) incorporates exactly these thoughts and declares only nodes as de-facto non-redundant w.r.t. DPI which are actually not found to be redundant w.r.t. DPI.

The criteria for a node nd to be a combined node of Qdup given by Lemma 12.9 will facilitate the proof of the next lemma. This lemma states that a combined node of Qdup which is non-redundant w.r.t. some DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R cannot be pruned during DYNAMICHS given, among other inputs, the DPI ⟨K, B, P, N⟩R and sets of positively and negatively answered queries P′′ and N′′ where P′′ ⊆ P′ and N′′ ⊆ N′. This result will constitute an essential prerequisite for the proof of completeness of DYNAMICHS.

Lemma 12.11. Let nd ∈ Comb(Qdup) be some node that is de-facto non-redundant w.r.t. the DPI DPI and let DPI′ be some DPI that is either equal to DPI or includes only a subset of the test cases of DPI. Then, throughout any execution of DYNAMICHS using the current DPI DPI′, nd ∈ Comb(Qdup) holds.

Proof. First, we show that there cannot be a minimal conflict set C w.r.t. DPI′ such that PRUNEQDUP is called with X := C and there is some q ∈ {1, . . . , |nd|} with the property that C ⊂ nd.cs[q] and nd[q] ∈ nd.cs[q] \ C.

So, assume that there is some minimal conflict set C w.r.t. DPI′ such that PRUNEQDUP is called with X := C and there is some q ∈ {1, . . . , |nd|} with the property that C ⊂ nd.cs[q] and nd[q] ∈ nd.cs[q] \ C. Let now C1, . . . , Cn be the (arbitrary actual) chronological sequence of all sets X given as an argument to PRUNE and PRUNEQDUP during all executions of DYNAMICHS up to and including the one with current DPI DPI where

• nd.cs[q] ⊃ C1,

• each Ci is a minimal conflict set w.r.t. DPIi for i ∈ {1, . . . , n},

• Ck ⊃ Ck+1 for k ∈ {1, . . . , n − 1},

• DPIj includes a proper subset of the test cases DPIj+1 includes for j ∈ {1, . . . , n − 1},

• DPIn is equal to DPI or includes a proper subset of the test cases DPI includes.

Then, Cn is the active sublabel of nd.cs[q] w.r.t. DPI. Since C ⊂ nd.cs[q] and X := C is an argument of PRUNEQDUP during the execution with current DPI DPI′, we have that C must be equal to some set Cj in the sequence C1, . . . , Cn. By Definition 12.7 and the de-facto non-redundancy of nd w.r.t. DPI, nd[q] ∈ Cn must hold. By Cn ⊆ Cj = C, we finally obtain nd[q] ∈ C, which is a contradiction to nd[q] ∈ nd.cs[q] \ C.

Lemma 12.9 and nd ∈ Comb(Qdup) guarantee the existence of nodes nd1, . . . , ndk ∈ Qdup for k ≥ 1 such that

(1) |nd1| < · · · < |ndk| = |nd|,

(2) it holds that

image

and

(3) ndi is an alternative subnode of ndi+1 for i ∈ {1, . . . , k − 1}.

So, let us assume that nd ∉ Comb(Qdup) at some point in time during the execution of DYNAMICHS using the current DPI DPI′. That is, some node ndj for some j ∈ {1, . . . , k} must have been deleted from Qdup. Nodes can only be deleted from Qdup in the scope of the function PRUNEQDUP. By Lemma 12.8 and Corollary 12.2, only nodes for which X is a witness of redundancy can be deleted from Qdup by the function PRUNEQDUP where X is the minimal conflict set given to PRUNEQDUP. Thus, assume that ndj for some j ∈ {1, . . . , k} is the first node among nd1, . . . , ndk ∈ Qdup deleted from Qdup by PRUNEQDUP given the minimal conflict set X w.r.t. DPI′ as an argument. Then, as X must be a witness of redundancy of ndj, we have that there is some m ∈ {1, . . . , |ndj|} such that X ⊂ ndj.cs[m] and ndj[m] ∈ ndj.cs[m] \ X.

Since Lemma 12.9 holds also for j ≤ k and ndj is the first node among nd1, . . . , ndk ∈ Qdup deleted from Qdup, we deduce that there is some node node ∈ Comb(Qdup) such that |node| = |ndj| and node[r] = nd[r] for r ∈ {1, . . . , |ndj|} where |ndj| ≤ |nd|. As pointed out before, there cannot be any q ∈ {1, . . . , |nd|} such that X ⊂ nd.cs[q] and nd[q] ∈ nd.cs[q] \ X. This, however, contradicts the existence of some m ∈ {1, . . . , |ndj|} such that X ⊂ ndj.cs[m] and ndj[m] ∈ ndj.cs[m] \ X.

Hence, none of the nodes nd1, . . . , ndk ∈ Qdup can be deleted throughout the execution of DYNAMICHS using the current DPI DPI′. Consequently, by Lemma 12.9, nd ∈ Comb(Qdup) must be preserved.

The finding of the next lemma is that a node nd in DYNAMICHS cannot be processed before all nodes that are set-equal to nd or proper subsets of nd have been generated.

Lemma 12.12. Let GenNodes be the set of all nodes generated throughout the execution of all calls to DYNAMICHS during the execution of Algorithm 5. Then, a node nd cannot be processed before each node nd′ ∈ GenNodes where nd′ ⊆ nd is generated.

Proof. Let nd′ ∈ GenNodes such that nd′ ⊆ nd. Assume that nd is processed, but nd′ has not yet been generated. In order to be processed, nd must be an element of Q. By the fact that nd′ ∈ GenNodes, nd′ must be generated at some point in time. In order for nd′ to be generated, some node nd′′ with nd′′ ⊂ nd′ must be an element of Q. This follows from

• the fact that each generated node is a superset of some node in Q (cf. lines 6, 18 and 23 and Definition 12.3),

• the fact that Q can only be modified by (a) deleting from Q some node and adding a set of successor nodes of it to Q (lines 6, 7 and 23) or by (b) deleting from Q some node and possibly adding to Q a replacement node of it in the function PRUNE and

• the fact that for any replacement node ndrep of nd it holds that ndrep = nd.

By Lemma 4.14, each node which is a proper subset of another node has a higher probability as per pnodes(). Since nd is processed before nd′ is generated and nodes in Q are processed in descending order of pnodes() (lines 23 and 6), pnodes(nd) > pnodes(nd′′) where nd′′ ⊂ nd′ ⊆ nd, contradiction.
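The ordering argument can be illustrated with a small priority queue in Python. The weight function below is purely an assumption for this sketch: it multiplies hypothetical per-element weights from (0, 1), which is enough to make every proper subset strictly heavier than its supersets. It is a stand-in for pnodes(), not its definition.

```python
import heapq
from math import prod
from typing import Dict, FrozenSet, List

# Hypothetical per-element weights in (0, 1); chosen only for illustration.
WEIGHT: Dict[int, float] = {1: 0.3, 2: 0.2, 3: 0.4}


def p_nodes(node: FrozenSet[int]) -> float:
    """Stand-in weight: the product over the node's elements (assumption of this sketch)."""
    return prod(WEIGHT[e] for e in node)


def processing_order(nodes: List[FrozenSet[int]]) -> List[FrozenSet[int]]:
    """Pop nodes in descending order of weight, as a queue sorted by probability would."""
    heap = [(-p_nodes(n), sorted(n), n) for n in nodes]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


nodes = [frozenset({1, 2, 3}), frozenset({1, 2}), frozenset({2}), frozenset({1})]
order = processing_order(nodes)

# Every proper subset is processed before each of its supersets.
subsets_first = all(
    order.index(a) < order.index(b) for a in nodes for b in nodes if a < b
)
```

Any weight function with the monotonicity property stated by Lemma 4.14 would yield the same subset-before-superset behaviour; the product form is used here only because it is easy to check by hand.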

The purpose of the following definition is to refer to a node that results from another node nd by several replacements conducted by PRUNE as a node in a transitive replaces-relation with nd. This will simplify the notation used in the following two lemmata.

Definition 12.8. Let ndi ∼Rep ndj iff ndi is a replacement node of ndj computed so far by PRUNE at any time during the execution of any call to DYNAMICHS during the execution of Algorithm 5. Further, let the set Rep := {⟨ndi, ndj⟩ | ndi ∼Rep ndj}. Then we say that nd1 is in a transitive replaces-relation with ndk iff there is a sequence of nodes nd1, nd2, . . . , ndk−1, ndk such that ⟨ndi, ndi+1⟩ ∈ Rep for all i ∈ {1, . . . , k − 1}.
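Checking the transitive replaces-relation amounts to a reachability test over the pairs in Rep. The following Python sketch assumes that nodes are hashable identifiers, that Rep is given explicitly as a set of pairs ⟨replacement, replaced⟩, and that the chain must contain at least one pair; the function name is hypothetical.

```python
from collections import deque
from typing import Deque, Hashable, Set, Tuple


def in_transitive_replaces_relation(
    nd1: Hashable, ndk: Hashable, rep: Set[Tuple[Hashable, Hashable]]
) -> bool:
    """True iff a chain nd1, ..., ndk exists whose consecutive pairs all lie in rep."""
    frontier: Deque[Hashable] = deque([nd1])
    seen: Set[Hashable] = set()
    while frontier:
        current = frontier.popleft()
        for replacement, replaced in rep:
            if replacement == current and replaced not in seen:
                if replaced == ndk:
                    return True
                seen.add(replaced)
                frontier.append(replaced)
    return False


# Hypothetical history: n3 replaced n2, which in turn had replaced n1.
rep = {("n3", "n2"), ("n2", "n1")}
```

For this history, `in_transitive_replaces_relation("n3", "n1", rep)` holds, while the reverse direction does not.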

12.4.8 Completeness of DYNAMICHS

Lemmata 12.13 and 12.14 constitute the key results towards proving the completeness of DYNAMICHS in terms of finding the complete set of minimal diagnoses w.r.t. any current DPI DPI in case the execution of DYNAMICHS with current DPI DPI terminates on account of Q = []. In other words, if there are no more open nodes in the hitting set tree constructed by DYNAMICHS with current DPI DPI, all minimal diagnoses w.r.t. DPI have been labeled by valid and are thus elements of the set  Dcalc.

The completeness proof (Lemma 12.8) will be a proof by induction where Lemma 12.13 will serve to derive the base case of the induction, whereas Lemma 12.14 will be exploited to establish the induction step.

Lemma 12.13 assumes an arbitrary fixed “current” DPI DPI such that DYNAMICHS with this “current” DPI DPI returns due to Q = []. Further on, it assumes an arbitrary minimal diagnosis D w.r.t. DPI and a de-facto non-redundant node nd w.r.t. DPI which is a proper subset of D generated anytime throughout all executions of DYNAMICHS during the execution of Algorithm 5 up to the one with the current DPI DPI.

Given these preconditions, the lemma establishes the existence of a node ndsuc that corresponds to a superset of nd and to a subset of D, includes one element more than the set nd and is generated anytime throughout all executions of DYNAMICHS during the execution of Algorithm 5 up to the one with the current DPI DPI. Moreover, it states that the node nd′suc set-equal to this generated node that is an element of Q cannot be pruned. However, it might be replaced. In case there is only one potential replacement node of nd′suc constructable from (the combined nodes of) Qdup, this replacement node is de-facto non-redundant w.r.t. DPI. Any node nd′suc,rep in a transitive replaces-relation with nd′suc cannot be pruned either. It might again be replaced. In case there is only one potential replacement node of nd′suc,rep constructable from (the combined nodes of) Qdup, this replacement node is de-facto non-redundant w.r.t. DPI.

Figuratively, with respect to the hitting set tree constructed by DYNAMICHS, this lemma states the following: Let the hitting set tree produced by DYNAMICHS be completely constructed for an arbitrary DPI DPI. In case there is any tree branch whose edge labels correspond to a part of the minimal diagnosis D w.r.t. DPI and which is known to be definitely not pruned during this tree construction, then this branch must be extended by one edge labeled by an element of D and this extended path is known to be definitely not pruned during this tree construction.

Notice that during this tree construction, in practice, we will generally never be able to say that a concrete branch corresponding to a partial minimal diagnosis will definitely not be pruned, because this depends on the answers to queries submitted by the interacting user. Nevertheless, for the proof of completeness of DYNAMICHS, it suffices to just know that there is any such branch in the tree.

Lemma 12.13. Assume the execution of DYNAMICHS with the current DPI DPI and assume that the execution stops due to Q = []. Let

• GenNodes be the set of all nodes generated throughout the execution of all calls to DYNAMICHS during the execution of Algorithm 5,

• D be some minimal diagnosis w.r.t. DPI, and

• nd ∈ GenNodes such that nd is de-facto non-redundant w.r.t. DPI and nd ⊂ D.

Then there are nodes ndsuc and nd′suc such that the following holds:

image

(2) |ndsuc| = |nd| + 1.

(3) ndsuc ∈ GenNodes.

(4) nd′suc = ndsuc is an element of Q immediately after ndsuc has been generated.

(5) If PRUNE is called given a witness of redundancy of nd′suc, then some replacement node of nd′suc is found. If only one replacement node of nd′suc is found, then this replacement node is de-facto non-redundant w.r.t. DPI.

(6) Let nd′suc,rep be in a transitive replaces-relation with nd′suc. If PRUNE is called given a witness of redundancy of nd′suc,rep, then some replacement node of nd′suc,rep is found. If only one replacement node of nd′suc,rep is found, then this replacement node is de-facto non-redundant w.r.t. DPI.

Proof. Since nd ∈ GenNodes, we know that nd must be generated at some point in time during the execution of some call to DYNAMICHS during the execution of Algorithm 5. As the execution Excurr of the call to DYNAMICHS using DPI is assumed to terminate due to Q = [] and no more nodes can be generated after Q = [] (each generated node is constructed by extending a node in Q), nd must be generated during Excurr at the latest.

So, let us consider exactly the point in time when nd is generated. Since this point in time might not arise during the execution Excurr of DYNAMICHS, but during some execution Exprev taking place before Excurr which uses some “current” DPI which includes fewer test cases than the current DPI DPI of Excurr, we call the “current” DPI in Exprev in the following DPIprev. That is, DPIprev might be equal to DPI or comprise a subset of the test cases DPI includes.

First, we observe that immediately after nd has been generated, there is some node nd′ ∈ Q such that nd′ = nd. If nd′ is not the same node as nd, then nd ∈ Qdup. This follows from lines 20-23.

Second, we have that nd′ ∈ Q cannot be pruned before it is processed. In case nd′ is the same node as nd, this follows from Proposition 12.7 and the precondition that nd is de-facto non-redundant w.r.t. DPI. Notice that in this case nd ∈ Q cannot even be replaced (also by Proposition 12.7). Otherwise, if nd′ is not the same node as nd, we argue as follows: Assume that nd′ is redundant w.r.t. DPIprev and that the PRUNE function is called with arguments Q, Qdup and some minimal conflict set X w.r.t. DPIprev which is a witness of redundancy of nd′. Then, since nd is de-facto non-redundant w.r.t. DPI, since DPIprev includes a subset of the test cases DPI comprises and by Proposition 12.7, nd cannot have been deleted from Qdup during any pruning step. Thence, by Lemma 12.10, nd (or some other node set-equal to nd′ for which X is not a witness of redundancy) must be constructed and added to Q in lines 96-101 during the execution of the PRUNE function.

That is, before any node set-equal to nd is processed, any number of calls to PRUNE with arguments Q, Qdup and some minimal conflict set X w.r.t. any DPI DPIprev imply that Q includes some node that is set-equal to nd. Let us denote by node the node set-equal to nd that is finally processed.

There must be some execution of DYNAMICHS with some DPI (which might be equal to DPI or include a subset of the test cases in DPI) during which node is processed. This holds as the execution of DYNAMICHS with DPI is assumed to stop because of Q = [], since not all nodes set-equal to node can be pruned, as just argued before, and because the only alternative way, except for pruning, to achieve the deletion of a node from Q (line 7) is to process it. Let DPIprev now be the “current” DPI of the execution of DYNAMICHS during which node is processed. Further, we denote the DPI considered by the immediate subsequent execution of DYNAMICHS by DPIprev+1, and so on.

When node is processed, it is either

(a) labeled by a set (DLABEL returns in line 40, 46 or 34) or

(b) not labeled by a set (DLABEL returns in line 29 or 43).

Case (b): In this case, DLABEL returns either

(i) nonmin or

(ii) valid.

Case (i): By Lemma 12.1, node must be a non-minimal diagnosis w.r.t. DPIprev. By line 15, node is then added to the set D⊃. D⊃ is never modified throughout Algorithm 5 and is given as an input argument to each subsequent call to DYNAMICHS by line 10 in Algorithm 5. During the execution of some subsequent call to DYNAMICHS using the DPI DPIprev+i for i ≥ 1, the set D⊃ might be modified by the UPDATETREE function (line 65 and lines 70-78) or in the DLABEL function (line 38) called for DPIprev+i. Because node = nd and nd is de-facto non-redundant w.r.t. DPI, we infer by the same argumentation as used above that node ∈ D⊃ cannot be pruned, i.e. node considered as a set cannot be deleted from D⊃ in line 65 or line 38. The truth of this is supported by Corollary 12.1 and Lemmata 12.6 and 12.7 which say that PRUNE can only be called given some minimal conflict set X w.r.t. DPIprev+i. So, after any number of calls to PRUNE, we have that either node ∈ D⊃ or, otherwise, there is some node in D⊃ which is set-equal to node and which is in a transitive replaces-relation with node. We keep calling this (possibly replacement) node node in the following.

By Lemma 12.1, at the time node was processed, there must be some diagnosis D′ w.r.t. DPIprev such that D′ ∈ Dcalc and node ⊃ D′. Additionally, by Lemma 12.1, the set Dcalc computed during DYNAMICHS for some “current” DPI DPIj comprises only diagnoses w.r.t. DPIj. Now, we have node ⊂ D since nd ⊂ D and node = nd, and D′ ⊂ node. That is, D′ ⊂ D. By the precondition that D is a minimal diagnosis w.r.t. DPI, D′ cannot be a diagnosis w.r.t. DPI. Thus, there cannot be any such D′ in Dcalc computed during DYNAMICHS for DPI.

All nodes in Dcalc returned by some call to DYNAMICHS using DPI DPI1 that are no diagnoses w.r.t. DPI2, the extension of DPI1 by a new query added as a positive or negative test case, are added to the set D× (and not to D✓) in line 22 of Algorithm 5 and are thus no elements of the set D✓ given as an argument to DYNAMICHS at the next call to DYNAMICHS. The elements of D✓ given as an argument to DYNAMICHS at the next call to DYNAMICHS using DPI2 are definitely added to Q again in lines 79-80 as D✓ is not modified elsewhere in DYNAMICHS before lines 79-80 are reached. Therefore, we need to differentiate between two cases: Either

(x1) D′ ∈ D× never holds for the input argument D× to any call to DYNAMICHS or

(x2) D′ ∈ D× holds at least once for the input argument D× to some call to DYNAMICHS.

Case (x1): Since D′ ∈ Dcalc holds after the execution of DYNAMICHS using DPIprev stops, we have that D′ ∈ D✓ must hold for the argument D✓ given to DYNAMICHS using DPIprev+1. After UPDATETREE returns during DYNAMICHS using DPIprev+1, D′ ∈ Q holds as argued. Subsequently, D′ might be added again to Dcalc and then to D✓ again in line 21 of Algorithm 5 and to Q again in line 80 during DYNAMICHS using DPIprev+2, and so forth. But, when a test case is added to some DPI DPIprev+i in Algorithm 5 that invalidates the diagnosis D′ (yielding the DPI DPIprev+i+1), D′ ∉ Dcalc is assumed to hold (otherwise it would be an element of D× against our assumption). Such a test case must be added sometime as argued above. By Proposition 12.3, D′ cannot be a (minimal) diagnosis w.r.t. any DPI including a superset of the test cases in DPIprev+i+1 either. Notice that the case D′ ∉ Dcalc can emerge in spite of the fact that D′ is a minimal diagnosis w.r.t. DPIprev+i because there may be minimal diagnoses w.r.t. DPIprev+i that have a higher probability as per pnodes() than D′. For DPIprev+i+1 and all DPIs including more test cases than DPIprev+i+1, D′ cannot be added to Dcalc anymore due to Lemma 12.1 since only diagnoses w.r.t. the currently used DPI can be added to Dcalc.

Case (x2): Here, D′ ∈ D× holds at least once for the input argument D× to some call to DYNAMICHS using the DPI DPIprev+i. Then, DYNAMICHS using the DPI DPIprev+i−1 must have returned a set Dcalc including D′ as otherwise D′ cannot be added to D×. Hence, D′ must be a diagnosis w.r.t. DPIprev+i−1 by Lemma 12.1. Since D′ is added to D×, it cannot be a diagnosis w.r.t. DPIprev+i. This must hold

• by Remark 7.4,

• since the set added to D× in Algorithm 5 is exactly the set Dout returned by GETINVALIDDIAGS in line 19 of Algorithm 5 and

• since Dout = D+(Q) in case the user answer u(Q) to the query Q w.r.t. Dcalc and DPIprev+i−1 is false and Dout = D−(Q) otherwise (notice that Dcalc is called D✓ in Algorithm 5).

So, by Proposition 12.3, D′ cannot be a (minimal) diagnosis w.r.t. any DPI including more test cases than DPIprev+i either.
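The selection of Dout described above is a simple case distinction on the user's answer. The Python sketch below only encodes that case distinction; the function name `invalid_diagnoses`, the representation of diagnoses as frozensets and the sample data are assumptions of this illustration, and the computation of D+(Q) and D−(Q) themselves is not modelled.

```python
from typing import FrozenSet, Set

Diagnosis = FrozenSet[int]


def invalid_diagnoses(
    d_plus: Set[Diagnosis], d_minus: Set[Diagnosis], answer: bool
) -> Set[Diagnosis]:
    """Return Dout: D+(Q) if the answer u(Q) is false, D-(Q) otherwise."""
    return set(d_minus) if answer else set(d_plus)


# Hypothetical partition of the leading diagnoses induced by some query Q.
d_plus = {frozenset({1}), frozenset({2, 3})}
d_minus = {frozenset({4})}

invalidated_if_false = invalid_diagnoses(d_plus, d_minus, answer=False)
invalidated_if_true = invalid_diagnoses(d_plus, d_minus, answer=True)
```

In the terms of the proof, a diagnosis D′ ends up in D× exactly when it belongs to the set returned for the actual answer.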

Each element in D× is processed by the UPDATETREE function (lines 48-69) called for the DPI DPIprev+i. In lines 48-69, each node ndx in D× can only be pruned or either ndx or a node in a transitive replaces-relation with ndx is added to Q in line 68. Dcalc is not modified by UPDATETREE and Dcalc = ∅ holds at the beginning of the execution of each call to DYNAMICHS. (A node set-equal to) D′ cannot ever be readded to Dcalc by Lemma 12.1 and since D′ is not a diagnosis w.r.t. any DPI including more test cases than DPIprev+i. Hence, D′ ∈ Dcalc can never hold for any DPI including more test cases than DPIprev+i.

Hence, there must be some DPI DPIprev+k such that D✓ given as input to the DYNAMICHS-call for DPIprev+k does not include any diagnosis D′ ⊂ node. So, during the execution of the call to DYNAMICHS using DPI DPIprev+k, node must be deleted from D⊃ and be reinserted into Q by lines 70-78 in UPDATETREE which is called at the beginning of the execution of DYNAMICHS at any call to DYNAMICHS. This must hold since all nodes ndx in D⊃ that have not yet been pruned and for which there is no diagnosis in D✓ which is a proper subset of ndx, are added to Q throughout lines 70-78. As shown, both criteria are met for node during the execution of the call to DYNAMICHS using DPI DPIprev+k.

Case (ii): By Lemma 12.1, we know that node is a diagnosis w.r.t. DPIprev and that node is added to Dcalc. Since node ⊂ D and D is a minimal diagnosis w.r.t. DPI, we obtain, by the same argumentation as in (i), that there must be some DPI DPIprev+k such that D✓ given as input to the DYNAMICHS-call for DPIprev+k does not include node.

If node ∉ D×, then it cannot ever be added to Dcalc again, as argued in case (i). Otherwise, during the execution of UPDATETREE which is called at the beginning of the execution of each call to DYNAMICHS, D× is modified in lines 48-69.

Now, we differentiate between two cases, namely node is either

(¬r) non-redundant w.r.t. DPI or

(r) redundant w.r.t. DPI.

Case (¬r): Due to the non-redundancy of node w.r.t. DPI, Lemma 12.4, Lemma 12.10 and Corollary 12.1, node cannot be replaced or pruned throughout lines 48-66. Thus, node is reinserted into Q in line 68.

Case (r): Since node is redundant w.r.t. DPI, it may or may not be redundant w.r.t.  DPIprev+k+1. So, during the UPDATETREE function called in DYNAMICHS for  DPIprev+k+1, there may or may not be some call to PRUNE given some X as argument which is a witness of redundancy of node. In the latter case, node will not be replaced or pruned during any PRUNE execution and will be reinserted into Q in line 68. In the former case, node might be replaced, but it cannot be pruned due to the same reasoning as given above in case (i). So, either node or some node in a transitive replaces-relation with node must be in  D×at the time line 67 is reached. This node is then added to Q in line 68.

Now, both cases (i) and (ii) identified for case (b) lead to the reinsertion of node or some node set-equal to node into Q. Notice that this node has the same properties as node before one of the cases (i) or (ii) emerged. That is, if PRUNE is called given a witness of redundancy of node, then a replacement node of node is found. And, if only one replacement node of node is found, this replacement node is de-facto non-redundant w.r.t. DPI.

If node is the same node as nd, this holds since there cannot be a witness of redundancy of nd due to the de-facto non-redundancy of nd w.r.t. DPI and Proposition 12.7. Otherwise, this holds by Lemma 12.10 and since node = nd and nd ∈ Qdup must hold due to the de-facto non-redundancy of nd w.r.t. DPI and Proposition 12.7. So, we call this reinserted node again node.

Furthermore, node can be neither labeled by valid nor by nonmin during the execution of DYNAMICHS for DPI. This holds by Lemma 12.1 and since node can be neither a diagnosis nor a non-minimal diagnosis w.r.t. DPI due to node ⊂ D and the fact that D is a minimal diagnosis w.r.t. DPI. As a consequence of this and the assumption that the DYNAMICHS-call for DPI terminates due to Q = [], case (a) must arise at some point in time for node during some execution of DYNAMICHS for some (previous) DPI not necessarily equal to DPI.

Case (a): In this case, by Lemma 12.2, DLABEL returns a minimal conflict set L w.r.t. DPIprev as a label for node where L has the property that L ∩ node = ∅. It must hold that L ≠ ∅. Otherwise, by Proposition 4.2, either

(v1) K is valid w.r.t. ⟨·, B, Pprev, Nprev⟩R where DPIprev = ⟨K, B, Pprev, Nprev⟩R or

(v2) DPIprev is non-admissible.

In the former case (v1), we know by Corollary 3.3 that the only (minimal) diagnosis w.r.t. DPIprev is ∅. If DPIprev is equal to DPI, this is a contradiction to the existence of some minimal diagnosis w.r.t. DPI, namely D, which is not the empty set. D ⊃ ∅ must hold since, by precondition, there is a node nd such that nd ⊂ D and since ∅ ⊆ nd.

Otherwise, if DPIprev includes a proper subset of the test cases DPI includes, DPI can never be a current DPI during any execution of DYNAMICHS during the same execution of Algorithm 5 during which there is an execution of DYNAMICHS using DPIprev as a current DPI. This holds as there must be at least two diagnoses in D✓ (which is the set Dcalc returned by DYNAMICHS for DPIprev) in line 13 of Algorithm 5 in order for DYNAMICHS to be called again with an extended DPI. For, in case there is only one diagnosis, i.e. ∅, the probability of this diagnosis is 1, which is greater than or equal to 1 − σ for any choice of σ due to σ ≥ 0. Consequently, Algorithm 5 would return in line 14. This is a contradiction to the assumption that there is an execution of DYNAMICHS using DPI as a current DPI.
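The stop argument used in this step can be illustrated with a minimal Python sketch. It assumes, as the paragraph suggests, that Algorithm 5 stops as soon as the most probable leading diagnosis reaches probability 1 − σ. The function name `should_stop` and the normalization over the leading diagnoses are illustrative assumptions and are not taken from the algorithm listing.

```python
def should_stop(weights, sigma):
    """Hypothetical stop test: True if the most probable leading diagnosis
    has normalized probability >= 1 - sigma."""
    if sigma < 0 or not weights:
        raise ValueError("sigma must be >= 0 and at least one diagnosis is needed")
    total = sum(weights)  # assumed to be positive
    # Assumption: probabilities are normalized over the leading diagnoses only.
    return max(weights) / total >= 1 - sigma


# A single leading diagnosis always has normalized probability 1,
# so the test succeeds for every sigma >= 0.
print(should_stop([0.3], 0.0))       # True
print(should_stop([0.5, 0.5], 0.1))  # False
```

Under these assumptions, a set of leading diagnoses with exactly one element can never trigger another call with an extended DPI, which is the point the contradiction relies on.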

In the latter case (v2), we can infer by Corollary 7.3, which states that adding queries as test cases to an admissible DPI can never yield a non-admissible DPI, that the DPI given as an input to Algorithm 5 must be non-admissible, contradiction.

Thence, L ≠ ∅ and DYNAMICHS will execute lines 17-23 and generate one node nodee := ADD(node, e) with nodee.cs := ADD(node.cs, L) for each e ∈ L (cf. Definition 12.2 for an explanation of the function ADD).

Now, we have that there must be some non-empty active sublabel of L = nodee.cs[r] w.r.t. DPI where r := |nodee| by Definition 12.6. This holds by the following argumentation:

The first observation is that nodee.cs[r] cannot be reduced twice during one and the same execution of DYNAMICHS using one and the same DPI DPIprev+j which results from DPIprev by addition of test cases. For, by Corollaries 12.1 and 12.2 and Lemmata 12.6 and 12.7, PRUNE as well as PRUNEQDUP can only be called given some minimal conflict set X w.r.t. DPIprev+j. By Lemmata 12.10 and 12.8, all nodes ndx that are in the set returned by PRUNE and PRUNEQDUP, respectively, have the property that there are no proper supersets of X in ndx.cs. Moreover, there are no proper subsets of X in ndx.cs. This is because each ndx.cs[m] for m ∈ {1, . . . , |ndx.cs|} must be a minimal conflict set w.r.t. some DPI equal to DPIprev+j or including a subset of the test cases in DPIprev+j; otherwise, ndx could not be a node during the execution of DYNAMICHS where DPIprev+j is the current DPI. By Proposition 12.1, there cannot be any m ∈ {1, . . . , |ndx.cs|} such that ndx.cs[m] ⊂ X as X is a minimal conflict set w.r.t. DPIprev+j. As two minimal conflict sets w.r.t. DPIprev+j can never be in a proper subset-relationship with one another, L = nodee.cs[r] can be modified at most once by PRUNE or PRUNEQDUP for the DPI DPIprev+j.

Second, by Proposition 12.1, each minimal conflict set w.r.t. DPIprev is a conflict set w.r.t. any DPI DPIprev+j that results from DPIprev by addition of test cases, that is, in particular, w.r.t. DPI. So, there must be some minimal conflict set Cj w.r.t. each DPIprev+j such that Cj ⊆ L and there cannot be any minimal conflict set w.r.t. DPIprev+j that is a proper superset of L.

Third, we have that L ≠ ∅, L is a minimal conflict set w.r.t. DPIprev, and DPIprev+j includes a superset of the test cases in DPIprev. Thus, by Proposition 12.2, each minimal conflict set w.r.t. DPIprev+j must be non-empty. In particular, this implies that all minimal conflict sets w.r.t. DPI that are subsets of L must be non-empty.

By these three observations, the criteria of Definition 12.6 can be applied to analyze the active sublabel of nodee.cs[r] w.r.t. DPI. That is, if C1, . . . , Cn is the (arbitrary actual) chronological sequence of all sets X given as an argument to PRUNE and PRUNEQDUP during all executions of DYNAMICHS from the one with current DPI DPIprev up to and including the one with current DPI DPI where

• nodee.cs[r] ⊃ C1,

• each Ci is a minimal conflict set w.r.t. DPIi for i ∈ {1, . . . , n},

• Ck ⊃ Ck+1 for k ∈ {1, . . . , n − 1},

• DPIj includes a proper subset of the test cases DPIj+1 includes for j ∈ {1, . . . , n − 1},

• DPIn is equal to DPI or includes a proper subset of the test cases DPI includes and

• DPIprev includes a proper subset of the test cases DPI1 includes,

then Cn is the active sublabel of nodee.cs[r] w.r.t. DPI. However, as argued before, the minimal conflict set Cn w.r.t. DPIn cannot be the empty set. As a consequence, we obtain that there must be a non-empty active sublabel of nodee.cs[r] w.r.t. DPI.
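The chain C1 ⊃ · · · ⊃ Cn described above can be traced with a small Python sketch. It is only an illustration of the description given in this paragraph, not a reproduction of Definition 12.6: the function name `active_sublabel` is hypothetical, labels are modeled as plain sets, and the sketch does not check that each argument is a minimal conflict set w.r.t. its DPI.

```python
def active_sublabel(label, prune_arguments):
    """Follow the chronological sequence of sets passed to PRUNE / PRUNEQDUP
    and return the last set in the chain of proper subsets of `label`."""
    current = frozenset(label)
    for x in prune_arguments:
        x = frozenset(x)
        # Only a proper subset of the current sublabel continues the chain;
        # all other arguments leave this label untouched.
        if x < current:
            current = x
    return current


# L = {1, 2, 3, 4} is first reduced to {1, 2, 3}, the unrelated set {5, 6}
# is skipped, and the chain ends at {1, 2}.
print(sorted(active_sublabel({1, 2, 3, 4}, [{1, 2, 3}, {5, 6}, {1, 2}])))  # [1, 2]
```

Since every set in the chain is a minimal conflict set and hence non-empty by the third observation, the returned set is non-empty whenever the original label is.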

By Propositions 12.1 and 12.2, there is a non-empty minimal conflict set C′ w.r.t. DPI such that C′ ⊆ Cn. Due to Cn ⊂ · · · ⊂ C1 ⊂ nodee.cs[r] = L we conclude that Cn ⊂ L. Therefore, ∅ ⊂ C′ ⊂ L holds.

By Proposition 4.6, each minimal diagnosis w.r.t. DPI is a minimal hitting set of all minimal conflict sets w.r.t. DPI. Thence, we have that C′ ∩ D ≠ ∅. So, by C′ ⊂ L, we have that ∅ ⊂ C′ ∩ D ⊆ L ∩ D ⊆ L. Consequently, we define ndsuc := nodex = ADD(node, x) with ndsuc.cs := nodex.cs = ADD(node.cs, L) for some x ∈ C′ ∩ D ⊆ L. Then, ndsuc ⊆ D because node ⊂ D and x ∈ D. It is clear from the inference so far that nd ⊂ ndsuc, |ndsuc| = |nd| + 1 and ndsuc ∈ GenNodes. This shows the truth of propositions (1)-(3).
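The hitting set argument of this step can be made concrete with a short Python sketch. It works on a fixed, hand-written family of conflict sets instead of a DPI with a reasoner, and all function names are hypothetical. For simplicity the sketch takes C′ = L, whereas in the proof C′ may be a proper subset of L.

```python
def is_hitting_set(candidate, conflict_sets):
    """True if candidate shares at least one element with every conflict set."""
    return all(candidate & c for c in conflict_sets)


def is_minimal_hitting_set(candidate, conflict_sets):
    """True if candidate is a hitting set and no single element can be dropped."""
    if not is_hitting_set(candidate, conflict_sets):
        return False
    return not any(is_hitting_set(candidate - {e}, conflict_sets) for e in candidate)


def successor_towards(node, label, diagnosis):
    """Extend node by one element x of label that also lies in diagnosis."""
    common = label & diagnosis
    if not common:
        raise ValueError("label is not hit by the diagnosis")
    return node | {min(common)}  # any x in the intersection would do


conflicts = [frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 3, 4})]
D = frozenset({1, 2})      # a minimal hitting set of `conflicts`
node = frozenset({1})      # node is a proper subset of D
L = frozenset({2, 3})      # label of node, disjoint from node
nd_suc = successor_towards(node, L, D)
print(sorted(nd_suc))      # [1, 2]
```

Because D hits every conflict set, the intersection of the label with D is never empty, so a successor that stays inside D and has exactly one more element than node always exists.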

Proposition (4) must hold by lines 20-23.

Now we argue why propositions (5) and (6) must hold. Assume that nd′suc ∈ Q is redundant w.r.t. some DPI DPI′′prev which is equal to DPI or includes a subset of the test cases in DPI. Then, there must be some minimal conflict set C′′ w.r.t. DPI′′prev which is a witness of redundancy of nd′suc. Suppose that PRUNE is called given X := C′′ as an argument.

Now, we have to distinguish two cases: Either

(q1) ndsuc was added to Q after it was generated or

(q2) ndsuc was added to Qdup after it was generated.

image

(c1) C′′ ⊂ nd′suc.cs[|nd′suc|] and nd′suc[|nd′suc|] ∈ nd′suc.cs[|nd′suc|] \ C′′ or

(c2) C′′ ⊂ nd′suc.cs[j] and nd′suc[j] ∈ nd′suc.cs[j] \ C′′ for some j ∈ {1, . . . , |nd′suc| − 1}.
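The condition shared by cases (c1) and (c2) can be written as a small Python check. The representation is an assumption made for illustration: a node is modeled as an ordered list `path` of elements together with a list `labels` of the conflict sets stored in its `cs` component, and the function name `redundancy_positions` is hypothetical.

```python
def redundancy_positions(path, labels, conflict):
    """Return all 1-based positions j with conflict a proper subset of labels[j]
    and path[j] an element of labels[j] minus conflict."""
    conflict = frozenset(conflict)
    positions = []
    for j, (element, label) in enumerate(zip(path, labels), start=1):
        label = frozenset(label)
        if conflict < label and element in label - conflict:
            positions.append(j)
    return positions


path = [1, 4]
labels = [{1, 2, 3}, {4, 5}]
print(redundancy_positions(path, labels, {2, 3}))  # [1]
print(redundancy_positions(path, labels, {1, 2}))  # []
```

In this model, case (c1) corresponds to the last position `len(path)` being in the returned list, and case (c2) to some earlier position being in it.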

Case (q1): Here, we have that nd′suc is the same node as ndsuc since ndsuc was added to Q after generation and no node replacement can have taken place because nd′suc is defined as the node set-equal to ndsuc that is an element of Q immediately after ndsuc has been generated. And, only one node corresponding to one and the same set can be in Q at the same time.

Case (c1): We have that C′′ must be equal to some minimal conflict set Cj in the sequence C1, . . . , Cn. This must be true since, first, DPI′′prev is equal to DPI or includes a subset of the test cases in DPI and DPIprev includes a proper subset of the test cases in DPI′′prev.

To understand why the latter must hold, recall that DPIprev is the DPI of the call to DYNAMICHS where ndsuc was generated and the minimal conflict set L was computed. By assumption, however, there is some minimal conflict set w.r.t. DPI′′prev, namely C′′, such that C′′ ⊂ nd′suc.cs[|nd′suc|] = L. Hence, it cannot be true that both L and C′′ are minimal conflict sets w.r.t. the same DPI. Otherwise, we would have a contradiction to the minimality of L. By Proposition 12.1, which states that minimal conflict sets cannot grow by the addition of new test cases to the DPI, we obtain the claimed fact that DPIprev includes a proper subset of the test cases in DPI′′prev.

Second, the sequence C1, . . . , Cn comprises all sets X given as an argument to PRUNE and PRUNEQDUP during all executions of DYNAMICHS from the one with current DPI DPIprev up to and including the one with current DPI DPI where nd′suc.cs[|nd′suc|] ⊃ C1 ⊃ · · · ⊃ Cn holds. This is valid because nd′suc is the same node as ndsuc in the currently considered case (q1).

Now, recall that C′ is a minimal conflict set w.r.t. DPI such that x ∈ C′ ∩ D ⊂ L. Further, by nd′suc = nodex, we have that nd′suc[|nd′suc|] = x. Due to C′ ⊆ Cn and Cn ⊆ Cj, we have that C′ ⊆ Cj. Therefore, we can infer by C′′ = Cj that C′ ⊆ C′′ is true. Now, x ∈ C′ implies that x ∈ C′′, wherefore x ∉ nd′suc.cs[|nd′suc|] \ C′′. By x = nd′suc[|nd′suc|], this is a contradiction to the assumption of case (c1). Hence, case (c2) must arise.

Case (c2): We have that nd′suc[1..|nd′suc| − 1] is the same node as node since nd′suc = nodex. Then, there are two cases: Either

(s1) node is the same node as nd or

(s2) node is not the same node as nd.

Case (s1): If node is the same node as nd, then node is de-facto non-redundant w.r.t. DPI since nd is de-facto non-redundant w.r.t. DPI by precondition. Moreover, x is an element of the active sublabel of nd′suc.cs[|nd′suc|] w.r.t. DPI, as specified before. Thus, by Definition 12.7, nd′suc is de-facto non-redundant w.r.t. DPI. Hence, PRUNE cannot be given an argument C′′ which is a witness of redundancy of nd′suc where C′′ is a minimal conflict set w.r.t. DPI′′prev. This holds due to

• the fact that DPI′′prev comprises a (not necessarily proper) subset of the test cases in DPI,

• Proposition 12.7 which states that a de-facto non-redundant node w.r.t. DPI cannot be pruned or replaced during any execution of DYNAMICHS with a current DPI that includes a (not necessarily proper) subset of the test cases in DPI and

• Lemma 12.10 which says that nd′suc would be replaced or pruned in case that PRUNE is called given a witness of redundancy of nd′suc.

So, we have derived a contradiction to the assumption that PRUNE is called given a minimal conflict set X := C′′ w.r.t. DPI′′prev which is a witness of redundancy of nd′suc. Hence, case (s2) must be true.

Case (s2): If node is not the same node as nd, then node may or may not be de-facto non-redundant w.r.t. DPI. In the former case, the same argumentation as in case (s1) applies and yields a contradiction. In the latter case, we know that C′′ ⊂ nd′suc.cs[j] as well as nd′suc[j] ∈ nd′suc.cs[j] \ C′′ must be true for some j ∈ {1, . . . , |nd′suc| − 1}. So, by Lemma 12.10, nd′suc is not an element of the returned list Q′ of the call to PRUNE given the arguments Q (which includes nd′suc), X := C′′ and Qdup.

However, at least one replacement node of nd′suc must be found by PRUNE. This must hold by the following reasoning:

First, nd ∈ Qdup must hold at the time this call to PRUNE is made. This is satisfied since

• the entire (current) list Qdup is browsed for an alternative subnode of nd′suc,

• nd ∈ Qdup holds at some point in time during the execution of DYNAMICHS with the current DPI DPIprev due to the fact that node is not the same node as nd and the argumentation at the beginning of this proof,

• DPIprev includes a subset of the test cases in DPI′′prev,

• DPI′′prev includes a subset of the test cases in DPI,

• Proposition 12.7 states that a de-facto non-redundant node w.r.t. DPI cannot be pruned or replaced during the execution of DYNAMICHS with a current DPI that includes a subset of the test cases in DPI,

• nodes can only be deleted from Qdup by being pruned and

• nd is de-facto non-redundant w.r.t. DPI.

Second, by line 21 and PRUNEQDUP, which are the only places in DYNAMICHS where Qdup is modified, Qdup is sorted in ascending order by node cardinality at any time during the execution of any call to DYNAMICHS.

Third, in order to construct a replacement node of nd′suc, PRUNE first determines the maximal k such that C′′ ⊂ nd′suc.cs[k] and nd′suc[k] ∈ nd′suc.cs[k] \ C′′. As case (c1) was proven to be false, we conclude that k ≤ |nd′suc| − 1 must hold. Then, in line 96, an alternative subnode of nd′suc

• which has cardinality k + z where z ≥ 0 is minimal and

• from which a replacement node of nd′suc can be constructed

is searched for in Qdup. To see this, observe that elements in Qdup – which is sorted in ascending order of node cardinality, as argued – are visited in order starting from the lowest cardinality node (line 96).

Fourth, nd ∈ Qdup is an alternative equal node of node. Since nd′suc = nodex, we have that nd is an alternative subnode of nd′suc such that k ≤ |nd′suc| − 1 = |nd|.

Thus, we have that one replacement node of nd′suc is definitely found by PRUNE. And, in case there is only one replacement node of nd′suc constructable during PRUNE, then this replacement node is given by nd′suc,new := ADD(nd, x) with nd′suc,new.cs := ADD(nd.cs, L). By the de-facto non-redundancy of nd and since x is specified as an element of the active sublabel of nd′suc.cs[|nd′suc|] w.r.t. DPI (see above), we obtain by Definition 12.7 that nd′suc,new is a de-facto non-redundant node w.r.t. DPI. Thence, proposition (5) is true.

Due to |nd| = |node| = |nd′suc| − 1, the alternative subnode of nd′suc actually found by PRUNE cannot have a cardinality greater than |nd′suc| − 1. So, let ndalt be the found alternative subnode of nd′suc. Since |ndalt| ≤ |nd′suc| − 1, we obtain that the replacement node nd′suc,new,1 of nd′suc constructed from ndalt must meet nd′suc,new,1[|nd′suc|] = nd′suc[|nd′suc|] = x as well as nd′suc,new,1.cs[|nd′suc|] = nd′suc.cs[|nd′suc|] = L. That is, the first |nd| = |node| = |nd′suc| − 1 positions of nd′suc,new,1 as a set correspond to a node in a transitive replaces-relation with nd.

Therefore, the same line of argument as used for nd′suc can be applied to any node nd′suc,rep in a transitive replaces-relation with nd′suc. That is, the following must be valid for any node nd′suc,rep in a transitive replaces-relation with nd′suc:

• nd′suc,rep[|nd′suc|] = x and nd′suc,rep.cs[|nd′suc|] = L.

• If PRUNE is called given a witness of redundancy of nd′suc,rep, then some replacement node of nd′suc,rep is found. And, if only one replacement node of nd′suc,rep is constructable, then this replacement node is de-facto non-redundant w.r.t. DPI.

Once a replacement node of nd′suc or of some node in a transitive replaces-relation with nd′suc is found which is de-facto non-redundant w.r.t. DPI, this replacement node cannot be replaced or pruned by Proposition 12.7. Therefore, by Lemma 12.10, no witness of redundancy of this replacement node can exist w.r.t. any DPI including a (not necessarily proper) subset of the test cases in DPI. Thence, proposition (6) is true.

Case (q2): Here, we have that nd′suc is not the same node as ndsuc. This must be valid as nd′suc is defined as the node set-equal to ndsuc that is an element of Q immediately after ndsuc was generated and ndsuc is assumed to be added to Qdup after being generated.

Now, independently of whether (c1) or (c2) occurs, the following holds: If PRUNE is called given a witness of redundancy C′′ of nd′suc w.r.t. DPI′′prev, then a replacement node of nd′suc is found. And, if only one replacement node of nd′suc is constructable, then this replacement node is de-facto non-redundant w.r.t. DPI.

To understand why this must hold, first recall that ndsuc is a successor of node, i.e. ndsuc[1..|ndsuc| − 1] is the same node as node. Furthermore, node is the node set-equal to nd that is processed. That is, node is either the same node as nd or it is in a transitive replaces-relation with nd. Then, the same two cases (s1) and (s2) can be distinguished as in case (q1)(c2) where (s1) leads to a contradiction. So, case (s2) must be true. That is, node is not the same node as nd. Hence, by the argumentation in case (q1)(c2)(s2), nd ∈ Qdup must hold during the execution of any call to DYNAMICHS with a current DPI that comprises a (not necessarily proper) superset of the test cases in DPIprev – which is the current DPI at the time nd is generated – and a (not necessarily proper) subset of the test cases in DPI. In particular, this implies that nd ∈ Qdup at the time PRUNE is called given the witness of redundancy C′′ of nd′suc w.r.t. DPI′′prev as an argument.

By assumption, ndsuc has been added to Qdup after being generated. Now, suppose PRUNEQDUP is called given a witness of redundancy C′ of ndsuc ∈ Qdup w.r.t. some DPI DPI′prev as an argument. Then DPI′prev must comprise a (not necessarily proper) superset of the test cases in DPIprev. This can be concluded from Lemma 12.12 which implies that ndsuc cannot have been generated during an execution of DYNAMICHS with a current DPI including a proper subset of the test cases in DPIprev. Hence, the preceding argumentation implies that nd ∈ Qdup at the time PRUNEQDUP is called given the witness of redundancy C′ of ndsuc w.r.t. DPI′prev as an argument.

Thus, ndsuc cannot be pruned on account of Lemma 12.8 which says that a node can only be pruned from Qdup if the set Combndsuc(Qdup) of combined equal nodes of ndsuc of Qdup (cf. Definition 12.5) is the empty set.

However, Combndsuc(Qdup) ≠ ∅ must be valid because we demonstrated that

• nd ∈ Qdup,

• ndsuc ∈ Qdup,

• ndsuc is the same node as nodex = ADD(node, x) with ndsuc.cs being equal to nodex.cs = ADD(node.cs, L),

• nd = node and

• x is specified as an element of the active sublabel of ndsuc.cs[|ndsuc|] w.r.t. DPI (see above) wherefore x ∉ ndsuc.cs[|ndsuc|] \ C′.

Therefore,

image

is a combined equal node of ndsuc of Qdup, i.e. ndcomb ∈ Combndsuc(Qdup). The node ndcomb is de-facto non-redundant w.r.t. DPI as nd is de-facto non-redundant w.r.t. DPI and since x is an element of the active sublabel of ndsuc.cs[|ndsuc|] w.r.t. DPI.

By Definition 12.5, any combined equal node of ndsuc must share the element at the |ndsuc|-th position with ndsuc and ndsuc.cs, respectively. Hence, the first |ndsuc| − 1 elements of a combined equal node of ndsuc are set-equal to the first |ndsuc| − 1 elements of ndsuc. So, there exists a combined equal node, namely ndcomb, of any (redundant) node that results from ndsuc by a set of combined replacements.

By Lemma 12.11, the fact that ndcomb ∈ Combndsuc(Qdup) ⊆ Comb(Qdup) at some point in time during the execution of DYNAMICHS with current DPI DPI′prev and the de-facto non-redundancy of ndcomb w.r.t. DPI, we conclude that, during any execution of DYNAMICHS with a current DPI that includes a (not necessarily proper) superset of the test cases in DPI′prev and includes a (not necessarily proper) subset of the test cases in DPI, ndcomb ∈ Comb(Qdup) must hold. Because DPI′prev is an arbitrary DPI that comprises a (not necessarily proper) superset of the test cases in DPIprev, we derive that ndcomb ∈ Comb(Qdup) must be true particularly during the execution of DYNAMICHS with the current DPI DPI′′prev.

If C′′ is a witness of redundancy of ndsuc ∈ Qdup, then the updated list Qdup returned by PRUNEQDUP must include a combined replacement node of ndsuc, either ndcomb or some other node. Otherwise, i.e. if C′′ is not a witness of redundancy of ndsuc ∈ Qdup, the updated list Qdup returned by PRUNEQDUP must include ndsuc.

PRUNE is always called immediately after PRUNEQDUP and thus uses the updated list Qdup which comprises a node set-equal to ndsuc and thus set-equal to nd′suc. Consequently, we have that one replacement node of nd′suc is definitely found by PRUNE. And, in case there is only one replacement node of nd′suc constructable during PRUNE, this replacement node is given by ndcomb. Thence, proposition (5) is true.

Independently of which replacement node of nd′suc is actually found by PRUNE, a set-equality between this replacement node and ndcomb will hold. This is true since ndcomb = nd′suc and since each replacement node, by definition, is set-equal to the node it replaces. Consequently, this set-equality holds for any node in a transitive replaces-relation with nd′suc. So, we have that one replacement node of any node nd′suc,rep in a transitive replaces-relation with nd′suc is definitely found by PRUNE. And, in case there is only one replacement node of nd′suc,rep constructable during PRUNE, this replacement node is given by ndcomb which is de-facto non-redundant w.r.t. DPI.

That ndcomb, after it has been used as a replacement node of nd′suc or of some node in a transitive replaces-relation with nd′suc, cannot be pruned or replaced, follows from Proposition 12.7 and the fact that ndcomb is de-facto non-redundant w.r.t. DPI. Therefore, by Lemma 12.10, no witness of redundancy of ndcomb can exist w.r.t. any DPI including a (not necessarily proper) subset of the test cases in DPI. Thence, proposition (6) is true.

The next result, Lemma 12.14, assumes an arbitrary fixed “current” DPI DPI such that DYNAMICHS with this “current” DPI DPI returns due to Q = []. Further on, it assumes an arbitrary minimal diagnosis D w.r.t. DPI and a node nd which is a proper subset of D such that nd is an element of Q anytime throughout all executions of DYNAMICHS during the execution of Algorithm 5 up to the one with the current DPI DPI. Additionally, nd cannot be pruned. It might be replaced; and in case there is only one potential replacement node of nd constructable from (the combined nodes of) Qdup, this replacement node is de-facto non-redundant w.r.t. DPI. Any node nd′ in a transitive replaces-relation with nd cannot be pruned either. It might again be replaced. In case there is only one potential replacement node of nd′ constructable from (the combined nodes of) Qdup, this replacement node is de-facto non-redundant w.r.t. DPI.

Given these preconditions, the lemma establishes the existence of a node ndsuc that corresponds to a superset of nd and to a subset of D, includes one element more than the set nd and is generated anytime throughout all executions of DYNAMICHS during the execution of Algorithm 5 up to the one with the current DPI DPI. Moreover, it states that the node nd′suc set-equal to this generated node that is an element of Q cannot be pruned. However, it might be replaced. In case there is only one potential replacement node of nd′suc constructable from (the combined nodes of) Qdup, this replacement node is de-facto non-redundant w.r.t. DPI. Any node nd′suc,rep in a transitive replaces-relation with nd′suc cannot be pruned either. It might again be replaced. In case there is only one potential replacement node of nd′suc,rep constructable from (the combined nodes of) Qdup, this replacement node is de-facto non-redundant w.r.t. DPI.

Pictured in terms of the hitting set tree constructed by DYNAMICHS, this lemma states the following: Let the hitting set tree produced by DYNAMICHS be completely constructed for an arbitrary DPI DPI. If there is any tree branch whose edge labels correspond to a part of the minimal diagnosis D w.r.t. DPI and which is known to be definitely not pruned during this tree construction, then this branch must be extended by one edge labeled by an element of D, and this extended path is known to be definitely not pruned during this tree construction either.
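The tree picture can be made tangible with a static breadth-first hitting set sketch in Python. This is deliberately not DYNAMICHS: there are no test cases, no PRUNE, no Qdup and no probabilities, and the conflict sets are given up front instead of being computed by a reasoner. The function name `minimal_hitting_sets` is hypothetical.

```python
from collections import deque


def minimal_hitting_sets(conflict_sets):
    """Breadth-first enumeration of all minimal hitting sets of a fixed
    family of conflict sets (static simplification of a hitting set tree)."""
    conflicts = [frozenset(c) for c in conflict_sets]
    queue = deque([frozenset()])   # the root node is the empty path
    seen = {frozenset()}           # nodes are treated as sets, duplicates are dropped
    found = []
    while queue:
        node = queue.popleft()
        if any(d <= node for d in found):
            continue               # a smaller hitting set exists: non-minimal branch
        # Label the node with some conflict set it does not hit yet.
        label = next((c for c in conflicts if not (c & node)), None)
        if label is None:
            found.append(node)     # every conflict set is hit
            continue
        for e in sorted(label):    # one successor per element of the label
            child = node | {e}
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return found


result = minimal_hitting_sets([{1, 2}, {2, 3}, {1, 3, 4}])
print(sorted(sorted(d) for d in result))  # [[1, 2], [1, 3], [2, 3], [2, 4]]
```

In this simplified setting, every branch that is a proper subset of a minimal hitting set D is labeled by a conflict set that D must hit, so it always has a successor inside D with one more element. The lemma establishes the analogous property for the dynamic tree, where labels may shrink and nodes may be replaced.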

Lemma 12.14. Assume the execution of DYNAMICHS with the current DPI DPI and assume that the execution stops due to Q = []. Let

• GenNodes be the set of all nodes generated throughout the execution of all calls to DYNAMICHS during the execution of Algorithm 5,

• D be some minimal diagnosis w.r.t. DPI,

• DPI′prev be a DPI which is either equal to DPI or includes fewer test cases than DPI and which is the current DPI during any particular call to DYNAMICHS,

• nd be some node such that the following holds:

nd ⊂ D.

There is some execution of DYNAMICHS with current DPI DPI′prev during which it holds at some point in time that nd ∈ Q.

If PRUNE is called given a witness of redundancy of nd, then some replacement node of nd is found. If only one replacement node of nd is found, then this replacement node is de-facto non-redundant w.r.t. DPI.

Let nd′ be in a transitive replaces-relation with nd. If PRUNE is called given a witness of redundancy of nd′, then some replacement node of nd′ is found. If only one replacement node of nd′ is found, then this replacement node is de-facto non-redundant w.r.t. DPI.

Then there are nodes ndsuc and nd′suc such that the following holds:

(1) nd ⊂ ndsuc ⊆ D.

(2) |ndsuc| = |nd| + 1.

(3) ndsuc ∈ GenNodes.

(4) nd′suc = ndsuc is an element of Q immediately after ndsuc has been generated.

(5) If PRUNE is called given a witness of redundancy of nd′suc, then some replacement node of nd′suc is found. If only one replacement node of nd′suc is found, then this replacement node is de-facto non-redundant w.r.t. DPI.

(6) Let nd′suc,rep be in a transitive replaces-relation with nd′suc. If PRUNE is called given a witness of redundancy of nd′suc,rep, then some replacement node of nd′suc,rep is found. If only one replacement node of nd′suc,rep is found, then this replacement node is de-facto non-redundant w.r.t. DPI.

Proof. Since nd ∈ Q holds at some point in time during the execution of some call to DYNAMICHS with current DPI DPI′prev and since the execution of DYNAMICHS with DPI terminates due to Q = [], we have that some node set-equal to nd must be processed. This must be satisfied because nodes can only be deleted from Q in that they are processed or pruned, and nd cannot be pruned from Q. For, by precondition, if PRUNE is called given a witness of redundancy of nd, then a replacement node of nd is found. And, if only one replacement node ndrep of nd is found, ndrep is de-facto non-redundant w.r.t. DPI.

Now, let nd1 be a replacement node of nd found by PRUNE called with some witness of redundancy of nd. Then, by precondition, what holds for nd also holds for nd1. That is, if PRUNE is called given a witness of redundancy of nd1, then a replacement node of nd1 is found. And, if only one replacement node nd1,rep of nd1 is found, nd1,rep is de-facto non-redundant w.r.t. DPI.

The same holds for any ndi which is in a transitive replaces-relation with nd. So, anytime PRUNE is called for a node set-equal to nd, at least one replacement node is found by PRUNE. And, in case ndi is de-facto non-redundant w.r.t. DPI – which must be the case sooner or later for some node in a transitive replaces-relation with nd, by the given preconditions – then, by Proposition 12.7, ndi cannot be pruned or replaced.

Hence, let us denote by node the node set-equal to nd that is finally processed. Let DPIprev now be the “current” DPI of the execution of DYNAMICHS during which node is processed. Further, we denote the DPI of the immediate subsequent execution of DYNAMICHS by DPIprev+1, and so on.

Since node is processed, it is either

(s) labeled by a set (DLABEL returns in line 40, 46 or 34) or

(¬s) not labeled by a set (DLABEL returns in line 29 or 43).

Case (¬s): In this case, DLABEL returns

(i) nonmin or

(ii) valid.

Case (i): By Lemma 12.1, node must be a non-minimal diagnosis w.r.t. DPIprev. By line 15, node is then added to the set D⊃. D⊃ is never modified throughout Algorithm 5 and is given as an input argument to each subsequent call to DYNAMICHS by line 10 in Algorithm 5. During the execution of some subsequent call to DYNAMICHS using the DPI DPIprev+i for i ≥ 1, the set D⊃ might be modified by the PRUNE function called during UPDATETREE (line 65 and lines 70-78) or during DLABEL (line 38).

Recall that node is either the same node as nd or in a transitive replaces-relation with nd. Hence, by the argumentation given before, we have that, if PRUNE is called given a witness of redundancy of node, then there is a replacement node of node found by PRUNE. And, if there is only one replacement node of node found by PRUNE, then this replacement node is de-facto non-redundant w.r.t. DPI.

Therefore, node ∈ D⊃ cannot be pruned, i.e. node considered as a set cannot be deleted from D⊃ in line 65 or line 38. So, after any number of calls to PRUNE, we have that either node ∈ D⊃ or, otherwise, there is some node in D⊃ which is set-equal to node and which is in a transitive replaces-relation with node. We keep calling this (possibly replacement) node node in the following.

By Lemma 12.1, at the time node was processed, there must be some diagnosis D′ w.r.t. DPIprev such that D′ ∈ Dcalc and node ⊃ D′. Additionally, by Lemma 12.1, the set Dcalc computed during DYNAMICHS for some “current” DPI DPIj comprises only diagnoses w.r.t. DPIj. Now, we have node ⊂ D since nd ⊂ D and node = nd, and D′ ⊂ node. That is, D′ ⊂ D. By the precondition that D is a minimal diagnosis w.r.t. DPI, D′ cannot be a diagnosis w.r.t. DPI. Thus, there cannot be any such D′ in Dcalc computed during DYNAMICHS for DPI.

All nodes in Dcalc returned by some call to DYNAMICHS using DPI DPI1 that are no diagnoses w.r.t. DPI2, the extension of DPI1 by a new query added as a positive or negative test case, are added to the set D× (and not to D✓) in line 22 of Algorithm 5 and are thus no elements of the set D✓ given as an argument to DYNAMICHS at the next call to DYNAMICHS. The elements of D✓ given as an argument to DYNAMICHS at the next call to DYNAMICHS using DPI2 are definitely added to Q again in lines 79-80 as D✓ is not modified elsewhere in DYNAMICHS before lines 79-80 are reached.

Therefore, we need to differentiate between two cases: Either

(x1) D′ ∈ D× never holds for the input argument D× to any call to DYNAMICHS or

(x2) D′ ∈ D× holds at least once for the input argument D× to some call to DYNAMICHS.

Case (x1): Since D′ ∈ Dcalc holds after the execution of DYNAMICHS using DPIprev stops, we have that D′ ∈ D✓ must hold for the argument D✓ given to DYNAMICHS using DPIprev+1. After UPDATETREE returns during DYNAMICHS using DPIprev+1, D′ ∈ Q holds as argued. Subsequently, D′ might be added again to Dcalc and then to D✓ again in line 21 of Algorithm 5 and to Q again in line 80 during DYNAMICHS using DPIprev+2, and so forth. But, when a test case is added to some DPI DPIprev+i in Algorithm 5 that invalidates the diagnosis D′ (yielding the DPI DPIprev+i+1), D′ ∉ Dcalc is assumed to hold (otherwise it would be an element of D× against our assumption). Such a test case must be added sometime as argued above. By Proposition 12.3, D′ cannot be a (minimal) diagnosis w.r.t. any DPI including more test cases than DPIprev+i+1 either. Notice that the case D′ ∉ Dcalc can emerge in spite of the fact that D′ is a minimal diagnosis w.r.t. DPIprev+i because there may be minimal diagnoses w.r.t. DPIprev+i that have a higher probability than D′. For DPIprev+i+1 and all DPIs including more test cases than DPIprev+i+1, D′ cannot be added to Dcalc anymore due to Lemma 12.1 which claims that only diagnoses w.r.t. the currently used DPI can be added to Dcalc.

Case (x2): Here, D′ ∈ D× holds at least once for the input argument D× to some call to DYNAMICHS using the DPI DPIprev+i. Then, DYNAMICHS using the DPI DPIprev+i−1 must have returned a set Dcalc including D′ as otherwise D′ cannot be added to D×. Hence, D′ must be a diagnosis w.r.t. DPIprev+i−1 by Lemma 12.1. Since D′ is added to D×, it cannot be a diagnosis w.r.t. DPIprev+i. This must hold

• by Remark 7.4,

• since the set added to D× in Algorithm 5 is exactly the set Dout returned by GETINVALIDDIAGS in line 19 of Algorithm 5 and

• Dout = D+(Q) in case the user answer u(Q) to the query Q w.r.t. Dcalc and DPIprev+i−1 is false and Dout = D−(Q) otherwise (notice that Dcalc is referred to as D✓ in Algorithm 5).

So, by Proposition 12.3, D′ cannot be a (minimal) diagnosis w.r.t. any DPI including more test cases than DPIprev+i either.

Each element in D× is processed by the UPDATETREE function (lines 48-69) called for the DPI DPIprev+i. In lines 48-69, each node ndx in D× is either pruned, or ndx or a node in a transitive replaces-relation with ndx is added to Q in line 68. Dcalc is not modified by UPDATETREE and Dcalc = ∅ holds at the beginning of the execution of each call to DYNAMICHS. (A node set-equal to) D′ cannot ever be readded to Dcalc by Lemma 12.1 and since D′ is not a diagnosis w.r.t. any DPI including more test cases than DPIprev+i. Hence, D′ ∈ Dcalc can never hold for any DPI including more test cases than DPIprev+i.

Hence, there must be some DPI DPIprev+k such that D✓ given as input to the DYNAMICHS-call for DPIprev+k does not include any diagnosis D′ ⊂ node. So, during the execution of the call to DYNAMICHS using DPI DPIprev+k, node must be deleted from D⊃ and be reinserted into Q by lines 70-78 in UPDATETREE which is called at the beginning of the execution of each call to DYNAMICHS. This must hold since all nodes ndx in D⊃ that have not yet been pruned and for which there is no diagnosis in D✓ which is a proper subset of ndx, are added to Q throughout lines 70-78. As shown, both criteria are met for node during the execution of the call to DYNAMICHS using DPI DPIprev+k.

Case (ii): By Lemma 12.1, we know that node is a diagnosis w.r.t.  DPIprevand that node is added to Dcalc. Since  node ⊂ Dand D is a minimal diagnosis w.r.t. DPI, we obtain, by the same argumentation as in (i), that there must be some DPI  DPIprev+ksuch that  D✓given as input to the DYNAMICHS-call for  DPIprev+kdoes not include node.

If  node /∈ D×, then it cannot ever be added to  Dcalcagain, as argued in case (i). Otherwise, during the execution of UPDATETREE which is called at the beginning of the execution of each call to DYNAMICHS, D×is modified in lines 48-69.

Now, we differentiate between two cases, namely node is either

(¬r) non-redundant w.r.t. DPI or

(r) redundant w.r.t. DPI.

Case (¬r): Due to the non-redundancy of node w.r.t. DPI, Lemma 12.4, Lemma 12.10 and Corollary 12.1, node cannot be replaced or pruned throughout lines 48-66. Thus, node is reinserted into Q in line 68.

Case (r): Since node is redundant w.r.t. DPI, it may or may not be redundant w.r.t.  DPIprev+k+1. So, during the UPDATETREE function called in DYNAMICHS for  DPIprev+k+1, there may or may not be some call to PRUNE given some X as argument which is a witness of redundancy of node. In the latter case, node will not be replaced or pruned during any PRUNE execution and will be reinserted into Q in line 68. In the former case, node might be replaced, but it cannot be pruned due to the same reasoning as given in the second paragraph of case (i). So, either node or some node in a transitive replaces-relation with node must be in  D×at the time line 67 is reached. This node is then added to Q in line 68.

Now, both cases (i) and (ii) identified for case (¬s) lead to the reinsertion of node or some node in a transitive replaces-relation with node (which is thus set-equal to nd) into Q. Notice that this node has the same properties as node before one of the cases (i) or (ii) emerged (by reasoning analogous to that conducted above). That is, if PRUNE is called given a witness of redundancy of node, then a replacement node of node is found. And, if only one replacement node of node is found, this replacement node is de-facto non-redundant w.r.t. DPI. So, we again call this reinserted node node.

Furthermore, node can be neither labeled by valid nor by nonmin during the execution of DYNAMICHS for DPI. This holds by Lemma 12.1 and since node can be neither a diagnosis nor a non-minimal diagnosis w.r.t. DPI due to node ⊂ D and the fact that D is a minimal diagnosis w.r.t. DPI. As a consequence of this and the assumption that the DYNAMICHS-call for DPI terminates due to Q = [], case (s) must arise at some point in time for node during some execution of DYNAMICHS for some (previous) DPI not necessarily equal to DPI.

Case (s): In this case, by Lemma 12.2, DLABEL returns a minimal conflict set L w.r.t.  DPIprevas a label for node where L has the property that  L ∩ node = ∅. It must hold that  L ̸= ∅. Otherwise, by Proposition 4.2, either

(v1) K is valid w.r.t.  ⟨·, B, Pprev, Nprev⟩Rwhere  DPIprev = ⟨K, B, Pprev, Nprev⟩Ror

(v2)  DPIprevis non-admissible.

In the former case (v1), we know by Corollary 3.3 that the only (minimal) diagnosis w.r.t.  DPIprevis  ∅. If  DPIprevis equal to DPI, this is a contradiction to the existence of some minimal diagnosis w.r.t. DPI, namely D, which is not the empty set.  D ⊃ ∅must hold since, by precondition, there is a node nd such that  nd ⊂ Dand since  ∅ ⊆ nd.

Otherwise, if DPIprev includes a proper subset of the test cases DPI includes, DPI can never be a current DPI during any execution of DYNAMICHS during the same execution of Algorithm 5 during which there is an execution of DYNAMICHS where DPIprev is the current DPI. This holds as there must be at least two diagnoses in D✓ in line 13 of Algorithm 5 in order for DYNAMICHS to be called again with a DPI including a proper superset of the test cases in DPIprev (notice that, in Algorithm 5, the name of the set Dcalc returned by DYNAMICHS for DPIprev is D✓). For, in case there is only one diagnosis, i.e. ∅, the probability of this diagnosis is 1, which is greater than or equal to 1 − σ for any choice of σ due to σ ≥ 0. Consequently, Algorithm 5 would return in line 14. This is a contradiction to the assumption that there is an execution of DYNAMICHS where DPI is the current DPI.

In the latter case (v2), we can infer by Corollary 7.3, which states that adding queries as test cases to an admissible DPI can never yield a non-admissible DPI, that the DPI given as an input to Algorithm 5 must be non-admissible, contradiction.

Thence,  L ̸= ∅and DYNAMICHS will execute lines 17-23 and generate one node  nodee := ADD(node, e) with  nodee.cs := ADD(node.cs, L)for each  e ∈ L(cf. Definition 12.2 for an explanation of the function ADD).
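The node generation step just described can be sketched in Python. This is only an illustration under stated assumptions: Definition 12.2 is not reproduced here, so `add` is assumed to append one item to a list (consistent with ADD(∅, e) = [e] used later in the text), and the names `Node`, `add` and `successors` are hypothetical, not part of the original algorithm listing.

```python
from typing import FrozenSet, List

Element = str
ConflictSet = FrozenSet[Element]


class Node:
    """A tree path: the chosen elements plus the conflict set each was drawn from."""

    def __init__(self, elements: List[Element], cs: List[ConflictSet]) -> None:
        assert len(elements) == len(cs)
        self.elements = list(elements)
        self.cs = list(cs)

    def as_set(self) -> FrozenSet[Element]:
        return frozenset(self.elements)


def add(seq: list, item) -> list:
    """Assumed reading of ADD: return a new list with item appended."""
    return seq + [item]


def successors(node: Node, label: ConflictSet) -> List[Node]:
    """Generate one successor node_e per element e of the non-empty label L."""
    # Sorting only makes the order deterministic for this illustration.
    return [Node(add(node.elements, e), add(node.cs, label)) for e in sorted(label)]
```

Every successor records the same label L at its last position, which is what the later arguments about nodee.cs[r] with r = |nodee| rely on.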

Now, we have that there must be some non-empty active sublabel of  L = nodee.cs[r]w.r.t. DPI where  r := |nodee|by Definition 12.6. Definition 12.6 is applicable by the following argumentation:

The first observation is that nodee.cs[r] cannot be reduced twice during one and the same execution of DYNAMICHS using one and the same DPI DPIprev+j which results from DPIprev by addition of test cases. For, by Corollaries 12.1 and 12.2 and Lemmata 12.6 and 12.7, PRUNE as well as PRUNEQDUP can only be called given some minimal conflict set X w.r.t. DPIprev+j. By Lemmata 12.10 and 12.8, all nodes ndx that are in the set returned by PRUNE and PRUNEQDUP, respectively, have the property that there are no proper supersets of X in ndx.cs. Moreover, there are no proper subsets of X in ndx.cs, because each ndx.cs[m] for m ∈ {1, . . . , |ndx.cs|} must be a minimal conflict set w.r.t. some DPI equal to DPIprev+j or including a subset of the test cases in DPIprev+j; otherwise, ndx could not be a node during the execution of DYNAMICHS where DPIprev+j is the current DPI. By Proposition 12.1, there cannot be any m ∈ {1, . . . , |ndx.cs|} such that ndx.cs[m] ⊂ X as X is a minimal conflict set w.r.t. DPIprev+j. As two minimal conflict sets w.r.t. DPIprev+j can never be in a proper subset-relationship with one another, L = nodee.cs[r] can be modified at most once by PRUNE or PRUNEQDUP for the DPI DPIprev+j.

Second, by Proposition 12.1, each minimal conflict set w.r.t.  DPIprevis a conflict set w.r.t. any DPI DPIprev+jthat results from  DPIprevby addition of test cases; that is, in particular, w.r.t. DPI. So, there must be some minimal conflict set  Cjw.r.t. each  DPIprev+jsuch that  Cj ⊆ Land there cannot be any minimal conflict set w.r.t.  DPIprev+jthat is a proper superset of L.

Third, we have that  L ̸= ∅, Lis a minimal conflict set w.r.t.  DPIprev, and  DPIprev+jincludes a superset of the test cases in  DPIprev. Thus, by Proposition 12.2, each minimal conflict set w.r.t. DPIprev+jmust be non-empty. In particular, Proposition 12.2 implies that all minimal conflict sets w.r.t. DPI that are subsets of L must be non-empty.

By these three observations, the criteria of Definition 12.6 can be applied to analyze the active subnode of  nodee.cs[r]w.r.t. DPI. That is, if  C1, . . . , Cnis the (arbitrary actual) chronological sequence of all sets X given as an argument to PRUNE and PRUNEQDUP during all executions of DYNAMICHS from the one with current DPI  DPIprevup to and including the one with current DPI DPI where

• each Ci is a minimal conflict set w.r.t. DPIi for i ∈ {1, . . . , n},

• Ck ⊃ Ck+1 for k ∈ {1, . . . , n − 1},

• DPIj includes a proper subset of the test cases DPIj+1 includes for j ∈ {1, . . . , n − 1},

• DPIn is equal to DPI or includes a proper subset of the test cases DPI includes and

• DPIprev includes a proper subset of the test cases DPI1 includes,

then  Cnis the active sublabel of  nodee.cs[r]w.r.t. DPI. However, as argued before, the minimal conflict set  Cnw.r.t.  DPIncannot be the empty set. As a consequence, we obtain that there must be a non-empty active sublabel of  nodee.cs[r]w.r.t. DPI.
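The chain argument can be illustrated with a small Python sketch. It rests on an assumed reading of Definition 12.6, which is not reproduced in this section: the label starts as L and is replaced by each PRUNE or PRUNEQDUP argument, in chronological order, that is a proper subset of the current label. The function name `active_sublabel` is hypothetical.

```python
from typing import FrozenSet, Iterable


def active_sublabel(label: FrozenSet[str],
                    prune_args: Iterable[FrozenSet[str]]) -> FrozenSet[str]:
    """Follow the chain L ⊃ C1 ⊃ ... ⊃ Cn and return its last member."""
    current = label
    for x in prune_args:
        # Only a proper subset of the current label can reduce it.
        if x < current:
            current = x
    return current
```

Under this reading, an argument unrelated to the label leaves it unchanged, and an empty sequence of reductions leaves L itself as the result.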

By Propositions 12.1 and 12.2, there is a non-empty minimal conflict set C′ w.r.t. DPI such that C′ ⊆ Cn. Due to Cn ⊂ · · · ⊂ C1 ⊂ nodee.cs[r] = L we conclude that Cn ⊂ L. Therefore, ∅ ⊂ C′ ⊂ L holds.

By Proposition 4.6, each minimal diagnosis w.r.t. DPI is a minimal hitting set of all minimal conflict sets w.r.t. DPI. Thence, we have that C′ ∩ D ≠ ∅. So, by C′ ⊂ L, we have that ∅ ⊂ C′ ∩ D ⊆ L ∩ D ⊆ L. Consequently, we define ndsuc := nodex = ADD(node, x) with ndsuc.cs := nodex.cs = ADD(node.cs, L) for some x ∈ C′ ∩ D ⊆ L. Then, ndsuc ⊆ D because node ⊂ D and x ∈ D. It is clear from the inference so far that nd ⊂ ndsuc, |ndsuc| = |nd| + 1 and ndsuc ∈ GenNodes. This shows the truth of propositions (1)-(3).
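The hitting set relation invoked via Proposition 4.6 can be made concrete with a brute-force Python sketch. It is meant only for tiny examples and is not the tree-based computation of DYNAMICHS; the function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List


def is_hitting_set(candidate: FrozenSet[str], conflicts: List[FrozenSet[str]]) -> bool:
    """A hitting set shares at least one element with every conflict set."""
    return all(candidate & c for c in conflicts)


def minimal_hitting_sets(conflicts: List[FrozenSet[str]]) -> List[FrozenSet[str]]:
    """Enumerate candidates by increasing size and keep the subset-minimal hitting sets."""
    universe = sorted(set().union(*conflicts)) if conflicts else []
    found: List[FrozenSet[str]] = []
    for size in range(len(universe) + 1):
        for combo in combinations(universe, size):
            cand = frozenset(combo)
            # Smaller hitting sets were found earlier, so a proper-subset check suffices.
            if is_hitting_set(cand, conflicts) and not any(m < cand for m in found):
                found.append(cand)
    return found
```

For the conflict sets {a, b} and {b, c}, the sketch yields {b} and {a, c}; each of them intersects every conflict set, which mirrors the step C′ ∩ D ≠ ∅ used above. With no conflict sets at all, the only result is ∅.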

Proposition (4) must hold by lines 20-23.

Now we argue why propositions (5) and (6) must hold. Assume that  nd′suc ∈ Qis redundant w.r.t.some DPI  DPI′′prevwhich is equal to DPI or includes fewer test cases than DPI. Then, there must be some minimal conflict set  C′′w.r.t.  DPI′′prevwhich is a witness of redundancy of  nd′suc. Suppose that PRUNE is called given  X := C′′as an argument.

Now, we have to distinguish two cases: Either

(q1)  ndsucwas added to Q after it was generated or

(q2)  ndsucwas added to  Qdupafter it was generated

and, independently of this, either

(c1) C′′ ⊂ nd′suc.cs[|nd′suc|] and nd′suc[|nd′suc|] ∈ nd′suc.cs[|nd′suc|] \ C′′ or

(c2) C′′ ⊂ nd′suc.cs[j] and nd′suc[j] ∈ nd′suc.cs[j] \ C′′ for some j ∈ {1, . . . , |nd′suc| − 1}.

Case (q1): Here, we have that nd′suc is the same node as ndsuc since ndsuc was added to Q after generation and no node replacement can have taken place because nd′suc is defined as the node set-equal to ndsuc that is an element of Q immediately after ndsuc has been generated. And, only one node corresponding to one and the same set can be in Q at the same time.

Case (c1): We have that C′′ must be equal to some minimal conflict set Cj in the sequence C1, . . . , Cn. This must be true since, first, DPI′′prev is equal to DPI or includes a subset of the test cases in DPI and DPIprev includes a proper subset of the test cases in DPI′′prev.

To understand why the latter must hold, recall that DPIprev is the DPI of the call to DYNAMICHS where ndsuc was generated and the minimal conflict set L was computed. By assumption, however, there is some minimal conflict set w.r.t. DPI′′prev, namely C′′, such that C′′ ⊂ nd′suc.cs[|nd′suc|] = L. Hence, it cannot be true that both L and C′′ are minimal conflict sets w.r.t. the same DPI. Otherwise, we would have a contradiction to the minimality of L. By Proposition 12.1, which states that minimal conflict sets cannot grow by the addition of new test cases to the DPI, we obtain the claimed fact that DPIprev includes a proper subset of the test cases in DPI′′prev.

Second, the sequence C1, . . . , Cn comprises all sets X given as an argument to PRUNE and PRUNEQDUP during all executions of DYNAMICHS from the one with current DPI DPIprev up to and including the one with current DPI DPI where L = nd′suc.cs[|nd′suc|] ⊃ C1 ⊃ · · · ⊃ Cn holds. This holds because nd′suc is the same node as ndsuc in the currently considered case (q1).

Now, recall that C′ is a minimal conflict set w.r.t. DPI such that x ∈ C′ ∩ D ⊂ L. Further, by nd′suc = nodex, we have that nd′suc[|nd′suc|] = x. Since C′ ⊆ Cn, we have that C′ ⊆ Cj must hold due to Cn ⊆ Cj. Therefore, we can infer by C′′ = Cj that C′ ⊆ C′′ is true. Now, x ∈ C′ implies that x ∈ C′′ wherefore x ∉ nd′suc.cs[|nd′suc|] \ C′′. By x = nd′suc[|nd′suc|], this is a contradiction to the assumption of case (c1). Hence, case (c2) must arise.

Case (c2): We have that nd′suc[1..|nd′suc| − 1] must be redundant w.r.t. DPI′′prev. The subnode nd′suc[1..|nd′suc| − 1] of nd′suc is the same node as node by nd′suc = nodex. So, suppose PRUNE is called with arguments Q (which includes nd′suc), X := C′′ and Qdup during the execution of DYNAMICHS with current DPI DPI′′prev.

Recall that node is the node set-equal to nd that is processed. That is, node is either the same node as nd or it is in a transitive replaces-relation with nd. Therefore, by the preconditions of this lemma, the following holds: If PRUNE is called given a witness of redundancy of node, then a replacement node of node is found. And, if only one replacement node  noderepof node is found, then  noderepis de-facto non-redundant w.r.t. DPI.

So, at the time PRUNE might be called given a witness of redundancy of  node, Comb(Qdup)must include a (non-necessarily proper) alternative subnode  noderep,subof node from which the de-facto non-redundant node  noderepw.r.t. DPI can be constructed as

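noderep := ADD(noderep,sub, node[|noderep,sub| + 1..|node|]) with noderep.cs := ADD(noderep,sub.cs, node.cs[|noderep,sub| + 1..|node|]).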

This holds due to

• Corollary 12.7, which says that each call to PRUNEQDUP returns the list Qdup, a subset of Comb(Qdup),

• the fact that PRUNEQDUP is always called immediately before PRUNE is called and

• the fact that PRUNE searches for alternative subnodes for the construction of a replacement node of a redundant node exactly in the output set of PRUNEQDUP.

By Definition 12.7, this implies that noderep,sub must be de-facto non-redundant w.r.t. DPI as otherwise the de-facto non-redundancy w.r.t. DPI could not hold for noderep.

Consequently, by Lemma 12.11,  noderep,sub ∈ Comb(Qdup)must always be satisfied during any execution of DYNAMICHS using a DPI that is equal to DPI or includes a subset of the test cases in DPI. Hence, in particular, this must hold for the DPI  DPI′′prev.

By line 21 and PRUNEQDUP, which are the only places in DYNAMICHS where Qdup is modified, Qdup is sorted in ascending order by node cardinality at any time during the execution of any call to DYNAMICHS.

In order to construct a replacement node of nd′suc, PRUNE first determines the maximal k such that C′′ ⊂ nd′suc.cs[k] and nd′suc[k] ∈ nd′suc.cs[k] \ C′′. As case (c1) was proven to be false, we conclude that k ≤ |nd′suc| − 1 must hold. Due to the fact that nd′suc[1..|nd′suc| − 1] is the same node as node, as reasoned above, and the fact that a de-facto non-redundant alternative equal node noderep (see above) of node can be constructed from noderep,sub ∈ Comb(Qdup), we obtain that k ≤ |noderep,sub|. This holds because the truth of both node.cs[m] ⊃ C′′ and node[m] ∈ node.cs[m] \ C′′ for some m ∈ {|noderep,sub| + 1, . . . , |node|} would be a contradiction to the de-facto non-redundancy of noderep w.r.t. DPI.
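The first step of PRUNE described above, finding the maximal position k at which a conflict set X witnesses redundancy, can be sketched as follows. The representation of a node as a list of elements with a parallel list of conflict sets, and the name `max_redundant_index`, are assumptions made for this illustration only.

```python
from typing import FrozenSet, List, Optional


def max_redundant_index(node: List[str],
                        cs: List[FrozenSet[str]],
                        x: FrozenSet[str]) -> Optional[int]:
    """Return the largest 1-based k with X a proper subset of cs[k] and
    node[k] in cs[k] minus X, or None if X is no witness of redundancy."""
    for k in range(len(node), 0, -1):
        label = cs[k - 1]
        if x < label and node[k - 1] in label - x:
            return k
    return None
```

For node = [a, d] with labels {a, b, c} and {d, e}, the set X = {b, c} gives k = 1, because a lies in {a, b, c} \ X. The set X = {a, b} gives no position at all, since the chosen element a is still contained in X.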

Then, in line 96, an alternative subnode of nd′suc

• which has cardinality k + z where z ≥ 0 is minimal and

• from which a replacement node of nd′suc can be constructed

is searched for in Qdup. To see this, observe that elements in Qdup (which is sorted in ascending order of node cardinality, as argued) are visited in order starting from the lowest cardinality node (line 96).

However, there is an alternative subnode  noderep,subof node such that  k ≤ |noderep,sub| ≤ |node| =|nd′suc| − 1and  noderep,subis an element of the argument  Qdupgiven to PRUNE, as shown above. As nd′sucis the same node as  nodex, nodeis a subnode of  nd′suc. Therefore,  noderep,subis an alternative subnode of  nd′suc.

Thus, we have that one replacement node of  nd′sucis definitely found by PRUNE. And, in case there is only one replacement node of  nd′succonstructable during PRUNE, then this replacement node is given by  nd′suc,new := ADD(noderep,sub, nodex[|noderep,sub| + 1..|nodex|]) = ADD(noderep, x)with nd′suc,new.cs := ADD(noderep,sub.cs, nodex.cs[|noderep,sub| + 1..|nodex|]) = ADD(noderep.cs, L). As it is straightforward from the deductions above,  nd′suc,newis de-facto non-redundant w.r.t. DPI. Thence, proposition (5) is true.

Due to  |noderep,sub| ≤ |node| = |nd′suc| − 1, the alternative subnode of  nd′sucactually found by PRUNE cannot have a cardinality greater than  |nd′suc|−1. So, let  ndaltbe the found alternative subnode of nd′suc. Since  |ndalt| ≤ |nd′suc|−1, we obtain that the replacement node  nd′suc,new,1of  nd′succonstructed from  ndaltmust meet  nd′suc,new,1[|nd′suc|] = nd′suc[|nd′suc|] = xas well as  nd′suc,new,1.cs[|nd′suc|] =nd′suc.cs[|nd′suc|] = L. That is, the first  |node| = |nd′suc| − 1positions as a set correspond to a node in a transitive replaces-relation with nd.

Now, we have the following precondition of this lemma: Let  nd′be in a transitive replaces-relation with nd. If PRUNE is called given a witness of redundancy of  nd′, then some replacement node of  nd′is found. If only one replacement node of  nd′is found, then this replacement node is de-facto non-redundant w.r.t. DPI.

Therefore, the same line of argument as used for  nd′succan be applied to any node  nd′suc,repin a transitive replaces-relation with  nd′suc. That is, the following must be valid for any node  nd′suc,repin a transitive replaces-relation with  nd′suc:

 nd′suc,rep[|nd′suc|] = xand  nd′suc,rep.cs[|nd′suc|] = L.

If PRUNE is called given a witness of redundancy of  nd′suc,rep, then some replacement node of nd′suc,repis found. And, if only one replacement node of  nd′suc,repis constructable, then this replacement node is de-facto non-redundant w.r.t. DPI.

Once a replacement node of nd′suc or of some node in a transitive replaces-relation with nd′suc is found which is de-facto non-redundant w.r.t. DPI, this replacement node cannot be replaced or pruned by Proposition 12.7. Therefore, by Lemma 12.10, no witness of redundancy of this replacement node can exist w.r.t. any DPI including a (not necessarily proper) subset of the test cases in DPI. Thence, proposition (6) is true.

Case (q2): Here, we have that  nd′sucis not the same node as  ndsuc. This must be valid as  nd′sucis defined as the node set-equal to  ndsucthat is an element of Q immediately after  ndsucwas generated and ndsucis assumed to be added to  Qdupafter being generated.

Now, independently of whether (c1) or (c2) occurs, the following holds: If PRUNE is called given a witness of redundancy of  nd′suc, then a replacement node of  nd′sucis found. And, if only one replacement node of  nd′sucis constructable, then this replacement node is de-facto non-redundant w.r.t. DPI.

To understand why this must hold, first recall that  ndsucis a successor of node, i.e.  ndsuc[1..|ndsuc|−1] is the same node as node. Furthermore, node is the node set-equal to nd that is processed. That is, node is either the same node as nd or it is in a transitive replaces-relation with nd.

Therefore, by the preconditions of this lemma, the following holds: If PRUNE is called given a witness of redundancy of node, then a replacement node of node is found. And, if only one replacement node noderepof node is constructable, then  noderepis de-facto non-redundant w.r.t. DPI.

As argued in case (q1)(c2),  Comb(Qdup)must include a subnode  noderep,subof  noderepthat is de-facto non-redundant w.r.t. DPI and from which  noderepis constructed. This must be satisfied during any execution of DYNAMICHS using a DPI that is equal to DPI or includes a subset of the test cases in DPI. Hence, in particular, this must hold for the DPI  DPI′′prev.

Since  ndsuchas been added to  Qdupby assumption, it might be found to be redundant w.r.t. some DPI (either equal to DPI or including a subset of the test cases in DPI) during some execution of PRUNEQDUP. If so,  ndsuccannot be pruned on account of Lemma 12.8 which says that a node can only be pruned from  Qdupif the set  Combndsuc(Qdup)of combined equal nodes of  ndsucof  Qdup(cf. Definition 12.5) is the empty set.

However, Combndsuc(Qdup) ≠ ∅ must be valid, because we demonstrated that

• noderep,sub ∈ Comb(Qdup),

• ndsuc ∈ Qdup,

• ndsuc is the same node as nodex = ADD(node, x) with ndsuc.cs being equal to nodex.cs = ADD(node.cs, L) and

• x ∉ ndsuc.cs[|ndsuc|] \ C′′ (see case (q1)(c1)) wherefore C′′ must be a witness of redundancy of node.

Therefore,  ndcomb := ADD(noderep,sub, nodex[|noderep,sub| + 1..|nodex|]) = ADD(noderep, x)with ndcomb.cs := ADD(noderep,sub.cs, nodex.cs[|noderep,sub| + 1..|nodex|]) = ADD(noderep.cs, L)is a combined equal node of  ndsucof  Qdup, i.e.  ndcomb ∈ Combndsuc(Qdup). As argued in case (q1)(c2), this node  ndcomb(denoted by  nd′suc,newin case (q1)(c2)) is de-facto non-redundant w.r.t. DPI.

Because PRUNE is called immediately after PRUNEQDUP and thus uses the updated list  Qdupwhich comprises  ndcomband because  ndcomb = ndsuc = nd′suc, we have that one replacement node of  nd′sucis definitely found by PRUNE. And, in case there is only one replacement node of  nd′succonstructable during PRUNE, this replacement node is given by  ndcomb. Thence, proposition (5) is true.

By Proposition 12.7, the fact that  ndcomb ∈ Combndsuc(Qdup) ⊆ Comb(Qdup)at some point in time during the execution of DYNAMICHS with current DPI  DPI′′prevand the de-facto non-redundancy of  ndcombw.r.t. DPI, we conclude that, during any execution of DYNAMICHS with a current DPI that includes a (not necessarily proper) superset of the test cases in  DPI′′prevand includes a (not necessarily proper) subset of the test cases in  DPI, ndcomb ∈ Comb(Qdup)must hold. Further on,  ndcomb = nd′sucis true.

Hence, independently of which replacement node of nd′suc is actually found by PRUNE, a set-equality between this replacement node and ndcomb will hold. This is true since each replacement node, by definition, is set-equal to the node it replaces. Consequently, this set-equality holds for any node in a transitive replaces-relation with nd′suc. So, we have that one replacement node of any node nd′suc,rep in a transitive replaces-relation with nd′suc is definitely found by PRUNE. And, in case there is only one replacement node of nd′suc,rep constructable during PRUNE, this replacement node is given by ndcomb which is de-facto non-redundant w.r.t. DPI.

That  ndcomb, after it has been used as a replacement node of  nd′sucor of some node in a transitive replaces-relation with  nd′suc, cannot be pruned or replaced, follows from Proposition 12.7 and the fact that  ndcombis de-facto non-redundant w.r.t. DPI. Therefore, by Lemma 12.10, no witness of redundancy of  ndcombcan exist w.r.t. any DPI including a (not necessarily proper) subset of the test cases in DPI. Thence, proposition (6) is true.

In the following we prove the completeness of DYNAMICHS. Given an arbitrary minimal diagnosis D w.r.t. an arbitrary fixed DPI DPI, Proposition 12.8 first testifies that there must be some node set-equal to D that is processed during the execution of DYNAMICHS with current DPI DPI in case this execution terminates by reason of Q = []. Second, the proposition demonstrates that the set Dcalc returned by this execution of DYNAMICHS comprises all minimal diagnoses w.r.t. DPI. Additionally, the proposition shows that, at any point in time during the execution of Algorithm 5, some node that corresponds to a subset of D must be stored by DYNAMICHS.

In terms of the hitting set tree produced by DYNAMICHS, the proposition states that, after all branches in the tree have been closed or pruned, there is a closed branch labeled by valid for each minimal diagnosis w.r.t. DPI. And, for any minimal diagnosis D w.r.t. DPI, at any time during the tree construction, there is some branch that corresponds to a part of D.

This proposition will be proven by deriving the existence of a de-facto non-redundant node  ndDw.r.t. DPI for any minimal diagnosis D w.r.t. DPI such that  ndD ⊆ D. In case  ndD = D, we will deduce directly that the proposition must be true. Otherwise, i.e. if  ndD ⊂ D, then Lemmata 12.13 and 12.14 will be exploited.

Proposition 12.8 (Completeness of DYNAMICHS). Let  ⟨K, B, P, N ⟩Rbe the DPI and  P′and  N ′the sets of positively and negatively answered queries given as an input to DYNAMICHS and assume that DYNAMICHS terminates due to Q = []. Let further  DPI := ⟨K, B, P ∪ P′, N ∪ N ′⟩Rand D be some minimal diagnosis w.r.t. DPI. Then the following holds:

(1) At some point in time during the execution of DYNAMICHS with current DPI DPI, there is a node nd such that nd = D and nd is processed.

(2) The execution of DYNAMICHS with current DPI DPI returns a set  Dcalcthat comprises all minimal diagnoses w.r.t. DPI.

(3) Let  DPI′be an arbitrary DPI that includes a (not necessarily proper) subset of the test cases in DPI. Then, at any point in time during the execution of DYNAMICHS with current DPI  DPI′, there is some node  nd′such that  nd′ ⊆ Dand  nd′is an element of one of the collections  Q, Dcalc, D✓, D×or  D⊃.

Proof. Let GenNodes be the set of all nodes generated throughout the execution of all calls to DYNAMICHS during the execution of Algorithm 5.

Assume first that D = ∅. This means that DPI must be the input DPI of Algorithm 5. To see this, assume the opposite.

A query is only generated and added as a new test case to the DPI in lines 16 and 24 or 26 of Algorithm 5 if there are at least two diagnoses in the set Dcalc (called D✓ in Algorithm 5) returned by DYNAMICHS. Otherwise, line 16 cannot be reached since there must be exactly one diagnosis in D✓ when it comes to the execution of line 13 wherefore the probability of this diagnosis must be equal to 1 which is greater than or equal to 1 − σ for any choice of σ (recall that σ is positive). Please notice that D✓ = ∅ cannot hold in line 13 since this would imply the non-admissibility of the input DPI given to Algorithm 5 by Corollary 7.3 and Definition 3.6. By precondition, however, the DPI provided as an input to Algorithm 5 must be admissible.

Now, since DPI is assumed to be not equal to the input DPI of Algorithm 5, we have, by the argumentation given, that there must have been at least two diagnoses w.r.t. the input DPI.

Let us first assume that K is valid w.r.t.  ⟨·, B, P, N ⟩Rwhere  ⟨K, B, P, N ⟩Ris the input DPI. Then, by Corollary 3.3,  ∅is a diagnosis w.r.t. the input DPI. Obviously, it must be a minimal diagnosis and the only minimal diagnosis w.r.t. the input DPI, contradiction.

Second, suppose that K is invalid w.r.t.  ⟨·, B, P, N ⟩R. By Proposition 4.6 which says that a diagnosis w.r.t. some DPI is a hitting set of all minimal conflict sets w.r.t. this DPI, we conclude that there must be at least one minimal conflict set C w.r.t. the input DPI. Now, by Proposition 12.1, there must be a minimal conflict set  C′w.r.t. DPI such that  C′ ⊆ C. By Proposition 4.2, the fact that K is invalid w.r.t. ⟨·, B, P, N ⟩R, the fact that the input DPI is admissible and Corollary 7.3 which states that the addition of queries as test cases cannot make an admissible DPI non-admissible, we obtain that  ∅ ⊂ C′. By Proposition 4.6, this is a contradiction to  D = ∅and the fact that D is a diagnosis w.r.t. DPI.

So, DPI is the input DPI. Hence, the first call to DYNAMICHS throughout the execution of Algorithm 5 considers this DPI. During the execution of the first call to DYNAMICHS,  Q = [∅]holds by lines 3 and 10 of Algorithm 5. The function UPDATETREE has no effect during the execution of the first call to DYNAMICHS in Algorithm 5. That is, in particular, it does not modify Q. For, UPDATETREE first iterates over all elements in  D×, then over all elements in  D⊃and finally over all elements in  D✓where D× = D⊃ = D✓ = ∅by lines 1 and 10 in Algorithm 5. Hence,  Q = [∅]holds when DYNAMICHS reaches line 6 wherefore  ∅is processed.

Now, assume  D ̸= ∅. In this case, the root node must be labeled by some minimal conflict set L w.r.t. the DPI given as input to Algorithm 5. To see this, suppose the opposite, i.e. that the root node is labeled by (i) nonmin or (ii) valid.

Case (i): This leads to a contradiction. For,  Dcalc = ∅holds at the beginning of each execution of DYNAMICHS (line 3). The root node  ∅must be the first node that is processed throughout all executions of DYNAMICHS during the execution of Algorithm 5 since it holds for each other node node that  node ⊃ ∅. Thus, the non-minimality criterion (lines 27-29) cannot be satisfied because  Dcalc = ∅must hold in line 27 when DLABEL is executed for the root node. Hence, the label nonmin is impossible for the node ∅.

Case (ii): By Lemma 12.1, we can deduce that  ∅is a diagnosis w.r.t. the input DPI. The fact that there cannot be any diagnosis w.r.t. the input DPI which is a proper subset of  ∅implies that  ∅is a minimal diagnosis w.r.t. the input DPI. By the reasoning applied before (in the case  D = ∅), we obtain that DPI is equal to the input DPI and that  ∅is the only minimal diagnosis w.r.t. DPI. This is a contradiction to the existence of a minimal diagnosis w.r.t. DPI, namely D, which is non-empty.

Consequently, the root node must be labeled by some minimal conflict set L w.r.t. the input DPI. Hence, DYNAMICHS will execute lines 17-23 and generate one node  nodee := ADD(∅, e) = [e]with nodee.cs := ADD(∅, L) = [L]for each  e ∈ L(cf. Definition 12.2 for an explanation of the function ADD). This means that  nodee ∈ GenNodesfor each  e ∈ L. As L is a set and thus comprises only one exemplar of each element, there cannot be a set-equal node  node′eof  nodeein Q at the time  nodeeis generated. So, each  nodeemust be added to Q in line 23.

By Proposition 12.1, there must be some minimal conflict set C w.r.t. DPI such that  C ⊆ L. Since D is a diagnosis w.r.t. DPI, we have that  C ∩ D ̸= ∅by Proposition 4.6. Thence,  L ∩ D ̸= ∅must be true. Therefore, in particular,  L ̸= ∅must hold.

Assume that |D| = 1, i.e. D = {x} for some x. By Proposition 4.6, each minimal conflict set w.r.t. DPI then includes x. Further, since L ∩ D ≠ ∅, we have x ∈ L and D = {x} = nodex. By Corollary 12.1 and Lemmata 12.6 and 12.7, PRUNE is only called given some minimal conflict set X w.r.t. the current DPI DPIprev as argument. As DYNAMICHS using DPI is assumed to terminate due to Q = [], DPIprev must be equal to DPI or include only a subset of the test cases DPI includes. By Proposition 12.1, it must hold for X that it is equal to or a superset of some minimal conflict set w.r.t. DPI. Hence x ∈ X must hold wherefore X cannot be a witness of redundancy of nodex. So, nodex can never be pruned and must be finally processed as DYNAMICHS using DPI terminates due to Q = [] and nodes can only be deleted from Q by being pruned or processed. So far, we have established the truth of the claim for |D| ≤ 1.

Now, suppose  |D| ≥ 2. In the following, we argue that there must be some node  nodey ⊂ Dfor some y ∈ Lwhich is de-facto non-redundant w.r.t. DPI.

As DYNAMICHS using DPI is assumed to terminate due to Q = [], each node  nodeefor  e ∈ Lmust have been generated (and L must have been computed) during DYNAMICHS with some current DPI DPIprevwhich is equal to DPI or includes only a subset of the test cases DPI includes. Let  DPIprev+ibe any DPI which includes a proper superset of the test cases  DPIprevincludes and is either equal to DPI or comprises a subset of the test cases DPI comprises. Then, Proposition 12.1 manifests that there must be some minimal conflict set  Ciw.r.t.  DPIprev+isuch that  Ci ⊆ L. Since we proved above that L ̸= ∅must hold, we deduce by Proposition 12.2 that  Ci ̸= ∅must be valid.

From Corollaries 12.1, 12.2 and Lemmata 12.6 and 12.7 we infer that PRUNE as well as PRUNEQDUP are always called with a minimal conflict set X w.r.t. the current DPI given as an argument. By Lemma 12.8 and the fact that PRUNE is always called immediately after PRUNEQDUP given the argument Qdup which is the output list of PRUNEQDUP, we have that the list Qdup includes only nodes nd such that there is no r ∈ {1, . . . , |nd|} for which nd.cs[r] ⊃ X. As a consequence of this, we have by Lemma 12.10 that for all nodes nd in the collection S′ returned by PRUNE there is no r ∈ {1, . . . , |nd|} for which nd.cs[r] ⊃ X.

Thence, the first time PRUNE is called with some X1 ⊂ L, X1 is a minimal conflict set w.r.t. some DPI DPIprev+i. Thus, as argued, X1 ⊃ ∅ must hold. So, after PRUNE has finished executing, for each node node in its output set there will be no r ∈ {1, . . . , |node|} such that node.cs[r] ⊃ X1. For any further minimal conflict set X2 w.r.t. some DPIprev+i+k for which PRUNE is called, we have that X2 ⊃ ∅ and for each node node in its output set there will be no r ∈ {1, . . . , |node|} such that node.cs[r] ⊃ X2, and so on.

For L, in particular, there is some (possibly empty) sequence of minimal conflict sets X1, . . . , Xn w.r.t. DPIs DPIprev+i1, . . . , DPIprev+in (ij < ij+1 for j ∈ {1, . . . , n − 1}) such that L ⊃ X1 and Xi ⊃ Xi+1 for i ∈ {1, . . . , n} where this sequence includes all such conflict sets which restrict a conflict set used to label nodes that was initially given by L. Since Xn is a minimal conflict set w.r.t. DPIprev+in which is equal to DPI or includes only a subset of the test cases DPI includes, we have that there must be some minimal conflict set C w.r.t. DPI such that C ⊆ Xn, as already argued. As D must hit C by Proposition 4.6, we obtain that D ∩ Xn ≠ ∅.

So, by the inference given, there must be some y ∈ L such that y ∈ X1 ∩ · · · ∩ Xn and y ∈ D. That is, nodey ⊂ D.

Since |nodee| = 1 and nodee.cs[1] = L for all e ∈ L, in particular for e = y, we obtain by Definitions 12.6 and 12.7 that nodey is de-facto non-redundant w.r.t. DPI.

So, the preconditions of Lemma 12.13 are met for nodey. As a consequence, there must be a node nd′suc such that |nd′suc| = |nodey| + 1, nd′suc ⊆ D, nd′suc is an element of Q immediately after nodey has been processed and nd′suc satisfies the requirements imposed on the node nd in the preconditions of Lemma 12.14. Hence, if nd′suc ⊂ D, there must be a node nd′′suc such that |nd′′suc| = |nd′suc| + 1, nd′′suc ⊆ D, nd′′suc is an element of Q immediately after a node set-equal to nd′suc has been processed and nd′′suc satisfies the requirements imposed on the node nd in the preconditions of Lemma 12.14.

This reasoning by means of Lemma 12.14 can be further applied to finally derive that some node nd = D must be generated and some node nd′ set-equal to nd must be an element of Q. By Lemma 12.14, either nd′ or a node set-equal to nd′ which is in a transitive replaces-relation with nd′ must finally be processed. The reason for this is that nd′ ∈ Q cannot be pruned, but can only be replaced, and each replacement node is set-equal to nd′ and thus to D. Moreover, the execution of DYNAMICHS with current DPI DPI terminates due to Q = [], wherefore each node in Q must be either pruned or processed, as these are the only two ways nodes might be eliminated from Q.

If some node nd = D is processed during an execution of DYNAMICHS with some current DPI DPI′ that includes a proper subset of the test cases in DPI, then DLABEL cannot return a set L. This holds by Lemma 12.2 and Proposition 12.1. The former says that nd ∩ L = ∅ and L is a minimal conflict set w.r.t. DPI′. The latter asserts that each conflict set w.r.t. DPI′ is a conflict set w.r.t. DPI. Moreover, we can deduce that L ≠ ∅ must hold if a set L is returned by DLABEL, by a similar argumentation as used in the proof of Lemma 12.14. That is, by Proposition 4.6, we have that D cannot be a diagnosis w.r.t. DPI, contradiction. Hence, DLABEL must return nonmin or valid for nd. In the former case, it would be added to D⊃, in the latter to Dcalc. Similarly as done in the proof of Lemma 12.14, we can show that nd must be reinserted into Q at the latest during the execution of DYNAMICHS with current DPI DPI and, in particular, nd must be an element of Q when the repeat-loop during the execution of DYNAMICHS with current DPI DPI is entered. Thus, nd must be (again) processed during the execution of DYNAMICHS with current DPI DPI. This proves proposition (1).

Proposition (2): At the beginning of each execution of DYNAMICHS, it holds that Dcalc = ∅. This is true in particular for the execution of DYNAMICHS with current DPI DPI. Now, proposition (1) reveals that, for each diagnosis D w.r.t. DPI, at some point in time during the execution of DYNAMICHS with current DPI DPI, there is a node nd such that nd = D and nd is processed. When nd is processed, the DLABEL function is called for nd. The DLABEL function might return (a) a set L, (b) nonmin or (c) valid. There are no other possible return values of DLABEL.

Case (a): By Lemma 12.2, L must be a minimal conflict set w.r.t. DPI such that nd ∩ L = ∅. According to Proposition 4.6, it must hold for D that D ∩ L ≠ ∅ since D is a minimal diagnosis w.r.t. DPI. Since D = nd, we obtain a contradiction.

Case (b): By Lemma 12.1, Dcalc can comprise only diagnoses w.r.t. DPI. By line 27, this yields that there is a diagnosis w.r.t. DPI that is a proper subset of nd. This however is a contradiction to the set-equality of nd with the minimal diagnosis D w.r.t. DPI.

Consequently, case (c) must arise. This implies that nd is added to Dcalc in line 13.

Proposition (3) is a direct consequence of the reasoning in this proof and in the proofs of Lemmata 12.13 and 12.14.

12.4.9 Soundness of DYNAMICHS

Having established the completeness of each call to DYNAMICHS concerning the minimal diagnoses w.r.t. the current DPI DPI at this call, we are now able to prove the soundness of each call to DYNAMICHS. That is, we will demonstrate that only minimal diagnoses w.r.t. DPI can be added to the set Dcalc during DYNAMICHS with the current DPI DPI. A necessary condition for the proof of the following proposition is the completeness of DYNAMICHS, i.e. Proposition 12.8.

Proposition 12.9 (Soundness of DYNAMICHS). Let ⟨K, B, P, N⟩R be the DPI and P′ and N′ the sets of positively and negatively answered queries given as an input to DYNAMICHS. Let further DPI := ⟨K, B, P ∪ P′, N ∪ N′⟩R. Then, the following holds:

(1) At any point in time during the execution of DYNAMICHS with current DPI DPI, each node in Dcalc is a minimal diagnosis w.r.t. DPI.

(2) At any point in time during the execution of DYNAMICHS with current DPI DPI, Dcalc comprises the |Dcalc| most-probable minimal diagnoses w.r.t. DPI.

Proof. Proposition (1): At the beginning of any execution of DYNAMICHS, the set Dcalc is the empty set (line 3). So, it suffices to show that only minimal diagnoses w.r.t. DPI can be added to Dcalc during the execution of DYNAMICHS with the current DPI DPI.

A node node can be added to Dcalc exclusively in line 13. In order for this line to be reached, by the criterion that is checked in line 12, node must be processed and labeled by valid. By Lemma 12.1, if node gets labeled by valid, then it is a diagnosis w.r.t. DPI.

So, assume that node is added to Dcalc where node is a non-minimal diagnosis w.r.t. DPI. Since node must have been processed and labeled by valid, the DLABEL function must have been executed given node as an argument and must have returned in line 43. Hence, there can be no node nd ∈ Dcalc such that nd ⊂ node holds, as otherwise DLABEL would have already returned in line 29.

However, since node is a non-minimal diagnosis w.r.t. DPI, there must be some minimal diagnosis D w.r.t. DPI such that D ⊂ node. Moreover, by Proposition 12.8, at any point in time before D is added to Dcalc, there must be some node nd such that nd ⊆ D and nd is an element of one of the collections (a) Dcalc, (b) D✓, (c) D×, (d) D⊃ or (e) Q. So, let us consider these cases in sequence.

Case (a): First, nd ⊆ D and D ⊂ node implies that nd ⊂ node must be valid. As mentioned above, there can be no node in Dcalc which is a proper subset of node, contradiction.

Case (b): In this case, nd must also be an element of Q since all nodes in D✓ are inserted into Q during UPDATETREE, which is executed before the repeat-loop is entered, i.e. before it can come to the assumed addition of node to Dcalc, which can only take place within the repeat-loop. So, in fact case (e) applies here.

Case (c): As can be easily seen from lines 67-69 in UPDATETREE, D× must be the empty set at the time node might be added to Dcalc, by analogous argumentation as in case (b), contradiction.

Case (d): By lines 70-78 in UPDATETREE and the fact that UPDATETREE must have been executed before the assumed addition of node to Dcalc can take place as argued in case (b), we have that there must be some node ndsub ∈ D✓ such that ndsub ⊂ nd. Otherwise, nd would have been deleted from D⊃ in line 78. By nd ⊂ node as per case (a), we deduce that ndsub ⊂ node. Due to nd ⊆ D, it must be true that ndsub ⊆ D. Thus, we have derived that case (b) holds for the node ndsub. By the deductions in case (b) above, we eventually know that case (e) must hold.

Thence, assuming case (a) or case (c) leads to a contradiction. Cases (b) and (d) imply the truth of case (e). Therefore, case (e) must occur.

Case (e): Due to the facts that all nodes are inserted into Q in a manner that descending order of nodes in Q by pnodes() is maintained (cf. lines 23, 100 and 103) and always the first node in Q is processed next (cf. line 6), we conclude that pnodes(nd) ≤ pnodes(node) must be valid. However, due to nd ⊆ D ⊂ node we have that nd ⊂ node. Now, by Lemma 4.14, pnodes(n) > pnodes(n′) holds for any two nodes n and n′ such that n ⊂ n′. Therefore, pnodes(nd) > pnodes(node), contradiction.

Proposition (2): By proposition (1), each node added to Dcalc must be a minimal diagnosis w.r.t. DPI.

Assume any point in time t during the execution of DYNAMICHS with the current DPI DPI. Then, |Dcalc| = m ≥ 0 must hold. We use induction on m to prove proposition (2).

Base Case: Suppose that m = 0 and some minimal diagnosis D w.r.t. DPI is added to Dcalc where D is not the most probable minimal diagnosis w.r.t. DPI. This implies that D is processed and that D has the highest probability as per pnodes() among all nodes that are elements of Q at time t, as argued in the proof of proposition (1).

Let us denote by D1 the most probable minimal diagnosis w.r.t. DPI. That is, pnodes(D1) > pnodes(D) holds.

Then, by Proposition 12.8, at any point in time during the execution of DYNAMICHS with the current DPI DPI, there must be some node nd1 such that nd1 ⊆ D1 and nd1 is an element of one of the collections (a) Dcalc, (b) D✓, (c) D×, (d) D⊃ or (e) Q.

Case (a) can be ruled out due to the assumption that Dcalc = ∅. Cases (b)-(d) can be treated analogously as above in the proof of proposition (1). Hence, case (e) must hold.

That is, nd1 ∈ Q at time t and nd1 is equal to or a subset of D1. As pnodes(nd1) ≥ pnodes(D1) > pnodes(D) holds by Lemma 4.14, we can infer that D does not have the highest probability as per pnodes() among all nodes that are elements of Q at time t, contradiction.

Inductive Step: Now, let m > 0 and assume that the m most probable minimal diagnoses w.r.t. DPI are already elements of Dcalc. Suppose further that some minimal diagnosis D w.r.t. DPI is added to Dcalc where D is not the (m + 1)-th most probable minimal diagnosis w.r.t. DPI. This implies that D is processed and that D has the highest probability as per pnodes() among all nodes that are elements of Q at time t.

Let us denote by Dm+1 the (m + 1)-th most probable minimal diagnosis w.r.t. DPI. That is, pnodes(Dm+1) > pnodes(D) holds since the m most probable minimal diagnoses w.r.t. DPI are already elements of Dcalc.

Then, by Proposition 12.8, at any point in time during the execution of DYNAMICHS with the current DPI DPI, there must be some node ndm+1 such that ndm+1 ⊆ Dm+1 and ndm+1 is an element of one of the collections (a) Dcalc, (b) D✓, (c) D×, (d) D⊃ or (e) Q.

Case (a) can be ruled out due to proposition (1), which affirms that only minimal diagnoses w.r.t. DPI can be elements of Dcalc. As Dm+1 is not an element of Dcalc per assumption, a node ndm+1 = Dm+1 cannot be an element of Dcalc. Furthermore, by the fact that Dm+1 is a minimal diagnosis w.r.t. DPI, any node ndm+1 ⊂ Dm+1 cannot be a (minimal) diagnosis w.r.t. DPI and thus cannot be an element of Dcalc. Cases (b)-(d) can be treated analogously as above in the proof of proposition (1). Hence, case (e) must hold.

That is, ndm+1 ∈ Q at time t and ndm+1 is equal to or a subset of Dm+1. As pnodes(ndm+1) ≥ pnodes(Dm+1) > pnodes(D) holds by Lemma 4.14, we can infer that D does not have the highest probability as per pnodes() among all nodes that are elements of Q at time t, contradiction.
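The ordering argument used in both parts of the proof can be illustrated with a small sketch. It is not the DYNAMICHS implementation. It assumes that pnodes() has the product form suggested by Definition 4.9 (fault probability p(ax) for every formula in the node, 1 − p(ax) for every other formula of K) and that all p(ax) are below 0.5, which is what makes a proper subset strictly more probable than its superset (Lemma 4.14). The names p_nodes, processing_order and the formula names ax1, ax2, ax3 are hypothetical.

```python
import heapq
from math import prod


def p_nodes(node, p):
    """Assumed reading of Definition 4.9: product of p(ax) over the
    formulas in the node and of (1 - p(ax)) over all other formulas."""
    return prod(p[ax] if ax in node else 1.0 - p[ax] for ax in p)


def processing_order(nodes, p):
    """Return the nodes in the order a queue sorted by descending
    p_nodes would hand them out (first element is processed first)."""
    # heapq is a min-heap, so the probability is negated.
    heap = [(-p_nodes(n, p), sorted(n)) for n in nodes]
    heapq.heapify(heap)
    order = []
    while heap:
        _, n = heapq.heappop(heap)
        order.append(frozenset(n))
    return order


# Hypothetical fault probabilities, all below 0.5 (assumption).
p = {"ax1": 0.1, "ax2": 0.3, "ax3": 0.4}
nodes = [
    frozenset({"ax1", "ax2"}),
    frozenset({"ax2"}),
    frozenset({"ax2", "ax3"}),
]
order = processing_order(nodes, p)
```

In this toy queue the node {ax2} is handed out before both of its supersets, which is the situation exploited in case (e): a subset of a minimal diagnosis that is still in Q always precedes any non-minimal superset.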

12.4.10 Correctness of DYNAMICHS

Now, we are able to prove that DYNAMICHS terminates and yields an output complying with the assertions given in Algorithm 8:

Corollary 12.8. Any call to DYNAMICHS (given the inputs described in Algorithm 8) within Algorithm 5 terminates and yields an output ⟨Dcalc, Q, Ccalc, D×, D⊃, Qdup⟩ where

(1) Dcalc is the current set of leading diagnoses such that

(a) Dcalc ⊆ mD⟨K,B,P∪P′,N∪N′⟩R is the set of most probable minimal diagnoses w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R such that

[Displayed condition missing in the source.]

where “most-probable” refers to the probability measure pnodes() given by Definition 4.9 and obtained from the function p() given as an input argument to DYNAMICHS.

(2) Q is the current queue of open (non-labeled) nodes of the produced hitting set tree,

(3) Ccalc is a set of conflict sets w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R,

(4) D× = ∅,

(5) D⊃ is the set of all processed nodes so far throughout the execution of Algorithm 5 that are non-minimal diagnoses w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R and

(6) Qdup includes a node set-equal to X for a set X ⊆ K iff

[Displayed condition missing in the source.]

Proof. First, we prove that any call to DYNAMICHS within Algorithm 5 terminates. To this end, assume that a call to DYNAMICHS executes infinitely. That is, Q = [] must not be satisfied at any time during the execution of DYNAMICHS due to the stop criterion of DYNAMICHS in line 24.

However, the overall number of nodes that might be elements of Q during the processing of the repeat-loop of any call to DYNAMICHS is finite. This holds since each node nd in DYNAMICHS is a list corresponding to a subset of K and each element of the list nd.cs is a subset of K as well. Indeed, a node can never correspond to a proper superset of K by Proposition 4.9 which says that QX(⟨K \ D, B, P ∪ P′, N ∪ N′⟩R) returns ’no conflict’ in case K \ D is valid w.r.t. ⟨·, B, P ∪ P′, N ∪ N′⟩R which is equivalent to D being a diagnosis w.r.t. ⟨K \ D, B, P ∪ P′, N ∪ N′⟩R by Corollary 3.3. Now, the DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R is admissible, which follows from the admissibility of the input DPI ⟨K, B, P, N⟩R and Corollary 7.3. That D := K must be a diagnosis w.r.t. ⟨K, B, P ∪ P′, N ∪ N′⟩R is a direct consequence of the admissibility of ⟨K, B, P ∪ P′, N ∪ N′⟩R and Definition 3.6. Therefore DLABEL must return valid for each node at the latest when the node becomes set-equal to K. A node that was assigned the label valid and added to Dcalc can never be processed again during this execution of DYNAMICHS, wherefore no successors of such a node can be added to Q. The same holds for some node that is labeled by nonmin and added to D⊃.

Thence, the assumption that Q ≠ [] forever implies that there is (at least) one node node that is never removed from Q.

By Lemma 12.12, each node that is a subset of or set-equal to a once processed node nd must have been generated before nd is processed. That is, after a node is processed, it is guaranteed that no proper subsets of it can ever be processed and no subsets of it can ever be added to Q. After a node nd is processed and is not labeled by valid or nonmin, nd is not an element of Q anymore (cf. line 7) and Q comprises a set of successor nodes of nd where each such node corresponds to a proper superset of nd (cf. line 23). Consequently, a node in Q that is processed can either be deleted whereupon no successor thereof is added to Q (in case of pruning or labeling a node by valid or nonmin) or be deleted whereupon proper supersets of it are added to Q (in case of labeling a node by a conflict set).

A (combined) replacement of a node involves the substitution of this node by another node set-equal to it. However, there can be only finitely many possibilities to construct a replacement or combined replacement node of some node since Comb(Qdup) ⊇ Qdup also includes only nodes, i.e. finitely many elements. Therefore, each node in Q can be replaced only finitely many times.

Since in each iteration of the repeat-loop in DYNAMICHS one node is processed, the cardinality of the nodes that are elements of Q is strictly monotonically increasing.

As node is supposed to be never processed, we have that in each iteration of the repeat-loop, one of the other nodes in Q must be processed. By the given argumentation, we know that after finitely many iterations, Q = [node] must be given (since all other nodes must be already pruned or labeled). Hence, node will be processed in the next iteration as GETFIRST in line 6 must catch node, contradiction.

Proposition (1): This proposition is a direct consequence of Proposition 12.9-(2) and the stop criterion of DYNAMICHS in line 24.

Proposition (2) is clear. Proposition (3) follows from Lemma 12.2 which asserts that each element of Ccalc is a minimal conflict set w.r.t. some DPI ⟨K, B, P ∪ P′′, N ∪ N′′⟩R where P′′ ⊆ P′ and N′′ ⊆ N′. By Proposition 12.1, we obtain that each element of Ccalc is a conflict set w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R.

Proposition (4): This proposition is true since UPDATETREE is called at the beginning of each execution of DYNAMICHS and all elements in D× that have not been deleted from D× before are deleted in lines 67-69. After UPDATETREE has finished processing, there is no other place in DYNAMICHS where nodes can be added to D×. Hence, D× = ∅ must hold when DYNAMICHS terminates.

Proposition (5): The elements of D⊃ after UPDATETREE at the beginning of the execution of DYNAMICHS has returned must be non-minimal diagnoses w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R by lines 70-78 and the fact that D✓ comprises only diagnoses w.r.t. the current DPI. The latter holds by lines 19 and 21 of Algorithm 5 where only diagnoses w.r.t. the current DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R are added to D✓. That only non-minimal diagnoses w.r.t. the current DPI can be added to D⊃ during the execution of the repeat-loop is a simple implication of Lemma 12.1-(4).

Proposition (6) is a consequence of lines 20-21, the definition of de-facto non-redundancy (Definition 12.7) and Lemma 12.8.

Computation

In this chapter we summarize properties of and differences between STATICHS and DYNAMICHS that we already pointed out in previous sections and, additionally, shed light on some further interesting aspects of these iterative diagnosis computation methods in the scope of interactive KB debugging (Algorithm 5). Table 13.1 provides an overview of what we have discussed or will discuss below.

First Segment of Table 13.1 – Addressed Problem and Properties w.r.t. Solutions. The first row of the table has been proven by Proposition 9.1 on page 124. Results given by the second up to the fourth row of the table are substantiated by Proposition 11.1 (STATICHS) and Corollary 12.8 (DYNAMICHS). We have discussed in Section 11.1 that Algorithm 5 with mode = static can artificially fix the search space for possible solutions initially. This is an inherent property of the Interactive Static KB Debugging Problem which the algorithm aims to solve in static mode. For, a minimal diagnosis w.r.t. the input DPI which satisfies all answered queries added as test cases throughout the debugging session must be detected (see left column of category “diagnoses” in Table 13.1). Hence, the solution space is given by |mDinputDPI|. “Initially fixed search space” in this case means that, given the fault tolerance σ = 0, Algorithm 5 in static mode must compute all minimal diagnoses w.r.t. the input DPI, i.e. the entire set mDinputDPI. In case of dynamic mode, on the other hand, the solution space (i.e. minimal diagnoses w.r.t. the current DPI, see right column of Table 13.1 in category “diagnoses”) that needs to be explored by Algorithm 5 for a given value of zero for σ is not known in advance. It rather depends on which test cases are specified or, respectively, which queries the user is asked. In case of the usage of mainly “positive-impact queries”, the search space might have significantly smaller cardinality than mDinputDPI whereas it might grow significantly beyond the cardinality of mDinputDPI in a scenario where many unfavorable “negative-impact queries” are generated (cf. Section 12.1). The maximum theoretically possible cardinality of the search space for DYNAMICHS is given by |aDinputDPI| due to Corollary 12.4.

Second Segment of Table 13.1 – Impact of New Test Cases and Computation Focus. The properties given in the category “computes” in Table 13.1 are confirmed by Proposition 11.1 (STATICHS) and Corollary 12.8 (DYNAMICHS). Hence, unlike DYNAMICHS, which analyzes the current DPI in terms of minimal conflict sets and diagnoses in each iteration, STATICHS must only consider minimal conflict sets w.r.t. the input DPI (see categories “diagnoses” and “conflict sets” in Table 13.1). This is sufficient for the exploration of all minimal diagnoses w.r.t. the input DPI by Proposition 4.6. In this vein, new test cases in static KB debugging are not taken into account in the computation of minimal conflict sets. Instead, new test cases are just exploited to invalidate already computed minimal diagnoses w.r.t. the input DPI. Thus, test cases specified during static KB debugging are treated as somewhat inferior to test cases already present in the input DPI, because the newly gained information given by these test cases is not utilized to reveal new faults in the KB or to lay the focus on just the now relevant parts of existing faults, but only for the purpose of constraining the search space for minimal diagnoses w.r.t. the input DPI ⟨K, B, P, N⟩R. We might thus call test cases added during the execution of Algorithm 5 with mode = static pure differentiation test cases (see category “purpose of test cases” in Table 13.1).

Of course, seen from the point of view of a current DPI, i.e. the input DPI extended by differentiation test cases, STATICHS does not guarantee completeness w.r.t. this current DPI, but only w.r.t. the initial one. This however does not mean that, after the (exact) solution K∗ := (K \ D) ∪ UP of the Interactive Static KB Debugging problem has been localized by means of STATICHS, the differentiation test cases (P′ and N′) cannot be simply added to the DPI. In this case, K∗ is still a maximal solution KB w.r.t. the extended input DPI ⟨K, B, P ∪ P′, N ∪ N′⟩R. In other words, there is no conflict set (and thus no diagnosis) w.r.t. ⟨K \ D, B, P ∪ P′, N ∪ N′⟩R and K \ D is valid w.r.t. ⟨·, B, P ∪ P′, N ∪ N′⟩R. However, in spite of using the (exact) solution KB of the Interactive Static KB Debugging problem, it is not ensured that this solution is the optimal one w.r.t. the extended DPI, i.e. of the Interactive Dynamic KB Debugging problem. This is because user interaction is just exploited to the extent that the best solution w.r.t. the input DPI is singled out. It is not used to have the solution verified by the user in the light of the extended DPI.

On the other hand, test cases assigned throughout dynamic KB debugging by means of Algorithm 5 with mode = dynamic are treated equally as test cases already given in the input DPI. They are used to prune the search space and to pinpoint new faults that arise from added test cases resulting from answered queries. The dynamic algorithm assists the user in filtering out a solution and verifying in a thorough manner that this solution is the desired one w.r.t. the extended DPI, among all existing solutions w.r.t. the extended DPI. Due to these aspects we might regard Algorithm 5 with mode = dynamic as the standard method for Interactive KB Debugging.

In Sections 11.1, 12.1, 12.4.3 and 12.4.4 we have thoroughly investigated the impact of new test cases (answered queries) added to the DPI on the set of minimal (all) diagnoses and the set of minimal conflict sets considered by the respective method STATICHS or DYNAMICHS. For the former, we have shown that (for arbitrary iteration i of Algorithm 5) mDi ⊃ mDi+1 and aDi ⊃ aDi+1 where mDi and aDi denote the set of all minimal diagnoses and the set of all diagnoses, respectively, that are relevant (for the DPI considered) during iteration i. That is, the set of minimal as well as the set of all diagnoses (w.r.t. the input DPI) is reduced to a proper subset after a new test case has been added. For the latter, (for arbitrary iteration i of Algorithm 5) we have argued that generally mDi ̸⊃ mDi+1, but still aDi ⊃ aDi+1, where mDi and aDi are defined as above. That is, not only might some minimal diagnoses (w.r.t. the last-but-one DPI) be invalidated, but also some new ones (w.r.t. the current DPI) might originate from the incorporation of the information given by a query answer.

Concerning minimal conflict sets, the set of all (or: relevant) minimal conflict sets does not change throughout a debugging session by means of STATICHS, i.e. mCi = mCi+1 (for arbitrary iteration i of Algorithm 5) where mCi is the set of minimal conflict sets relevant (for the DPI considered) during iteration i. This holds since the minimal conflict sets w.r.t. the input DPI are artificially fixed (see above). On the contrary, the assignment of a new test case using DYNAMICHS involves the reduction of some minimal conflict sets (w.r.t. the last-but-one DPI) to smaller subset conflict sets (w.r.t. the current DPI) and/or the introduction of some “completely new” minimal conflict sets (which are in no subset-relation with existing ones, cf. Section 12.1). These results are summarized by the categories “set of all X upon addition of a test case” in Table 13.1.
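The dynamic case can be made concrete with a toy computation. The sketch below relies only on the hitting set property referred to as Proposition 4.6 (minimal diagnoses are the minimal hitting sets of the minimal conflict sets) and enumerates them by brute force, which is not how STATICHS or DYNAMICHS proceed. The formula identifiers 1 to 4 and both collections of conflict sets are invented for illustration.

```python
from itertools import combinations


def minimal_hitting_sets(conflicts):
    """Brute-force enumeration of all minimal hitting sets.

    Candidates are tried by increasing size, so a candidate is minimal
    exactly if no previously found hitting set is a proper subset of it.
    """
    universe = sorted(set().union(*conflicts))
    found = []
    for size in range(0, len(universe) + 1):
        for cand in combinations(universe, size):
            s = frozenset(cand)
            hits_all = all(s & c for c in conflicts)
            if hits_all and not any(h < s for h in found):
                found.append(s)
    return set(found)


# Hypothetical minimal conflict sets before a test case is added ...
mC_before = [frozenset({1, 2, 3})]
# ... and afterwards: {1, 2, 3} has shrunk to {1, 2}, and {3, 4} is a
# "completely new" conflict set (no subset relation to the old one).
mC_after = [frozenset({1, 2}), frozenset({3, 4})]

mD_before = minimal_hitting_sets(mC_before)
mD_after = minimal_hitting_sets(mC_after)
```

Here mD_before consists of {1}, {2} and {3}, whereas mD_after consists of {1, 3}, {1, 4}, {2, 3} and {2, 4}. None of the old minimal diagnoses survives, and every new minimal diagnosis is a proper superset of an old one, so the set of all diagnoses shrinks while the set of minimal diagnoses is not a subset of the previous one.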

Third Segment of Table 13.1 – Hitting Set Tree Construction, Pruning and Complexity. Regarding the constructed hitting set tree, we have explained that STATICHS builds a wpHS-tree (see Definition 4.10 on page 74 and the argumentation in Section 11.4) just as the HS method which is employed for diagnosis computation in the presented non-interactive KB debugging scenario (Algorithm 3). There are two main differences between Algorithm 5 in static mode and Algorithm 3. First, the former constructs the wpHS-tree step-by-step in multiple phases. Between any two consecutive phases a query is generated and presented to the user. The latter, by contrast, finishes the tree construction (to the extent prescribed by the given parameters nmin, nmax and t, see Section 4.7) before a single most probable automatically selected solution or a set of solutions is displayed to the user. Second, the tree constructed by the interactive static algorithm exhibits a different labeling of leaf nodes than the one built up by the non-interactive algorithm. In the former, some leaf nodes might be labeled by ×, indicating that the path to this node is a minimal diagnosis w.r.t. the input DPI, but one which is not in accordance with all answered queries. Notice that such invalidated diagnoses cannot be simply deleted in favor of memory savings, but must be stored in order for the non-minimality criterion (lines 21-23) to function properly, which is necessary to preserve the property of STATICHS to compute only minimal diagnoses (cf. Lemma 11.7). In the non-interactive wpHS-tree, on the other hand, all minimal diagnoses w.r.t. the input DPI are labeled by ✓.

What the interactive static and the non-interactive tree have in common is the usage of only minimal conflict sets w.r.t. the input DPI as labels of internal (i.e. non-leaf) nodes and the adherence to the “standard” pruning rules [Rei87] as per Definition 4.8 on page 59, i.e. the immediate deletion of non-minimal and duplicate tree paths. Except for the standard pruning actions that take place during tree expansion, no separate pruning phases are performed by STATICHS. The reason for this is the fixation of the minimal conflict sets, i.e. the consideration of only minimal conflict sets w.r.t. the input DPI. Incorporation of new minimal conflict sets resulting from answered queries would generally negate completeness of STATICHS w.r.t. the exploration of all minimal diagnoses w.r.t. the input DPI. Integration of new conflict sets that are subsets of existing ones, however, is the key to more substantial pruning actions carried out by DYNAMICHS.

Due to the more or less equivalent construction of both the tree built up by STATICHS and the one constructed by the HS method in the non-interactive algorithm, it is straightforward to recognize that the worst case time and space complexity of both tree computations (without taking into account other actions performed by the interactive algorithm like probability updates and query generations) are equal. By worst case complexity we refer to the complexity of the search for the (exact) solution of the Interactive Static KB Debugging Problem on the one hand and the complexity of enumerating all minimal diagnoses w.r.t. the input DPI on the other hand. In particular, the complexity of tree construction in static KB debugging is independent of given parameters such as the ones for leading diagnoses computation (nmin, nmax and t) and of the test cases that are classified positively or negatively, respectively, during the debugging session.

To sum up, due to the artificial fixation of the solution set, there is no possibility of tree pruning in static KB debugging except for the standard pruning rules and hence no way to escape the generally immense worst case complexity for diagnosis search in case  σ = 0.

The hitting set tree constructed by DYNAMICHS, on the other hand, might differ significantly from the wpHS-tree produced by the non-interactive algorithm. First, it uses minimal conflict sets w.r.t. the current DPI to label internal nodes in the tree during each expansion stage. Since minimal conflict sets can only “shrink” and not “grow” due to the integration of test cases into a DPI as stated by Proposition 12.1, the finding that by now a subset of a former minimal conflict set (w.r.t. some previous DPI) is already a minimal conflict set (w.r.t. the current DPI) gives rise to very powerful ways of tree pruning, as we detailed in Section 12.4.6 and illustrated by Example 12.2. In this vein, the evolution of the tree produced by DYNAMICHS can be characterized by alternating expansion and pruning stages. A pruning stage takes place after a test case has been added to the last-but-one DPI in order to modify the tree Ti used to search for minimal diagnoses w.r.t. the last-but-one DPI to obtain a tree Ti+1 that enables the discovery of all minimal diagnoses w.r.t. the current DPI. Concretely, both pre-pruning as well as post-pruning is possible during a pruning phase. Pre-pruning refers to the deletion of tree paths ending in an open leaf node, i.e. paths corresponding to partial diagnoses, and post-pruning refers to the deletion of tree paths ending in a closed node, i.e. paths corresponding to (minimal or non-minimal) diagnoses. Neither pre- nor post-pruning is possible in STATICHS. The ability for significant tree pruning comes at the cost of not being able to exploit the standard pruning rules as STATICHS does. This is because non-minimal diagnoses and duplicate tree paths must be stored to guarantee the proper working of tree pruning and, in further consequence, the completeness of minimal diagnoses search for each current DPI (see Section 12.4).
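The effect of a shrunken conflict set on stored tree paths can be sketched as follows. This is a strongly simplified illustration and not the PRUNE function of DYNAMICHS: it assumes, based on the way witnesses of redundancy are used in the proofs of Section 12.4, that a node is redundant w.r.t. a new minimal conflict set X if some edge of its path was chosen from a label that properly contains X while the chosen formula itself is not in X. The replacement of pruned nodes by stored duplicates is omitted, and all names and example data are hypothetical.

```python
def is_redundant(path, labels, new_conflict):
    """True if some edge path[r] was taken from a label labels[r] that is
    a proper superset of new_conflict, but path[r] is not in new_conflict
    (simplified reading of a witness of redundancy; an assumption)."""
    return any(
        new_conflict < cs and ax not in new_conflict
        for ax, cs in zip(path, labels)
    )


def prune_and_relabel(nodes, new_conflict):
    """Drop redundant nodes and shrink the affected labels of the others.

    Each node is a pair (path, labels) where labels[r] is the conflict
    set from which the edge path[r] was chosen.
    """
    kept = []
    for path, labels in nodes:
        if is_redundant(path, labels, new_conflict):
            continue
        relabeled = [new_conflict if new_conflict < cs else cs for cs in labels]
        kept.append((path, relabeled))
    return kept


old = frozenset({"a", "b", "c"})   # minimal conflict set w.r.t. a previous DPI
new = frozenset({"a", "b"})        # minimal conflict set w.r.t. the current DPI
queue = [
    (["a"], [old]),
    (["c"], [old]),                               # uses c, which is not in new
    (["b", "d"], [old, frozenset({"d", "e"})]),
]
result = prune_and_relabel(queue, new)
```

In this example the path via c disappears, and the remaining paths no longer carry any label that is a proper superset of the new conflict set, which mirrors the property of the output of PRUNE used in the completeness proof.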

As we pointed out in Section 12.1, the test cases specified during the dynamic debugging session and the defined leading diagnoses computation parameters  nmin, nmaxand t might have a material influence on the extent of possible tree pruning on the one hand and the extent of undesired tree growth on the other. Thence, worst case time and space complexity of the tree generation by means of DYNAMICHS cannot be initially (at least theoretically) quantified as in the case of STATICHS. Consequently, significant savings as well as a substantial overhead compared to STATICHS are possible. Careful “control” of certain properties of asked queries (added test cases) might help to keep considerable unwanted tree growth within bounds, as we touched upon in Section 12.1 and will elaborate on in future work.

Nevertheless, we want to mention a shortcoming of STATICHS compared to DYNAMICHS. Namely, for $\sigma = 0$, STATICHS must enumerate all minimal diagnoses w.r.t. the input DPI (otherwise no diagnosis can have a probability of 1, see the proof of Proposition 9.1 in Section 9.4) whereas DYNAMICHS might soon be able to obtain some extended DPI (by the addition of test cases) for which only one minimal diagnosis exists. This might require the computation of only a small fraction of the $|\mathbf{mD}_{\mathit{inputDPI}}|$ minimal diagnoses that STATICHS must determine and might therefore save substantial time and space compared to figuring out all minimal diagnoses w.r.t. some DPI. This is quite well illustrated by Examples 11.2 and 12.2.

Fourth Segment of Table 13.1 – Query Generation and Bias. We explained in Remark 11.2 on page 153 that queries in STATICHS are computed w.r.t. the current DPI albeit only minimal diagnoses w.r.t. the input DPI (which are at the same time minimal diagnoses w.r.t. the current DPI, cf. bullet (a) on page 128) are considered and calculated by Algorithm 5 with mode = static. In the case of dynamic debugging it is clear that queries are computed w.r.t. the current DPI since only minimal diagnoses w.r.t. the current DPI are taken into account.

Another important property of an interactive KB debugging algorithm is whether it is biased or unbiased. Intuitively, we call an interactive KB debugging algorithm biased w.r.t. some current DPI $\mathit{DPI}$ encountered during its execution iff there might be a minimal diagnosis $\mathcal{D}$ w.r.t. $\mathit{DPI}$ that is definitely invalidated regardless of the answers the user gives. In other words, an interactive KB debugging algorithm is unbiased iff for each minimal diagnosis $\mathcal{D}$ w.r.t. $\mathit{DPI}$ there is a set $QA_{\mathcal{D}}$ of query-answer pairs such that adding the positive queries in $QA_{\mathcal{D}}$ to the positive test cases of $\mathit{DPI}$ and the negative queries in $QA_{\mathcal{D}}$ to the negative test cases of $\mathit{DPI}$ yields an extended DPI $\mathit{DPI}'$ such that $\mathcal{D}$ is the only minimal diagnosis w.r.t. $\mathit{DPI}'$. This means that unbiasedness implies that any solution w.r.t. any current DPI encountered during the debugging session might be found as the finally remaining (exact) solution diagnosis. So, all solutions are treated equitably by an unbiased algorithm and only the user decides, by the answers given, which solutions are ruled out and which are not.

More formally, we define unbiasedness of an interactive KB debugging algorithm as follows:

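Definition 13.1. Let $\langle K, B, P, N \rangle_R$ be the input DPI given to an algorithm $\mathrm{Alg}_X$ that solves the Interactive $X$ Debugging Problem for $X \in \{\text{static}, \text{dynamic}\}$. Let $P' \supseteq \emptyset$ and $N' \supseteq \emptyset$ be the sets of test cases specified so far during the execution of $\mathrm{Alg}_X$ and let $\mathbf{D} \subseteq \mathbf{mD}_{\langle K, B, P \cup P', N \cup N' \rangle_R}$ be the current set of leading diagnoses. Then, we call $\mathrm{Alg}_X$ biased w.r.t. $\langle K, B, P \cup P', N \cup N' \rangle_R$ iff there is a diagnosis $\mathcal{D} \in \mathbf{mD}_{\langle K, B, P \cup P', N \cup N' \rangle_R}$ and a query $Q \in \mathbf{Q}_{\mathbf{D}, \langle K, B, P \cup P', N \cup N' \rangle_R}$ such that $\mathcal{D} \notin \mathbf{mD}_{\langle K, B, P \cup P' \cup \{Q\}, N \cup N' \rangle_R}$ and $\mathcal{D} \notin \mathbf{mD}_{\langle K, B, P \cup P', N \cup N' \cup \{Q\} \rangle_R}$.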

image

any execution of $\mathrm{Alg}_X$ such that $\mathrm{Alg}_X$ is biased w.r.t. $\langle K, B, P \cup P', N \cup N' \rangle_R$.

Remark 13.1 It is important to notice the difference between completeness (which has already been established for Algorithm 5 using any of the methods STATICHS or DYNAMICHS, see Lemma 11.5 and Proposition 12.8) and unbiasedness of an algorithm. Completeness refers to the guarantee that the algorithm explores all minimal diagnoses w.r.t. any DPI $\mathit{DPI}$. However, it does not say anything about what might happen after a new test case $Q$ is added to $\mathit{DPI}$. Although it does state that all minimal diagnoses w.r.t. the new DPI $\mathit{DPI}'$ are explored, it leaves open what effect the addition of the query $Q$ to the test cases might have had on the minimal diagnoses. So, there might be a minimal diagnosis w.r.t. $\mathit{DPI}$ that would have been ruled out by both answers to $Q$, thereby violating unbiasedness but not completeness. To sum up, completeness gives us guarantees about what happens during the diagnosis computation phase whereas unbiasedness gives us guarantees about what happens during the transition from one DPI to a new DPI.

In the following, we show that Algorithm 5 in both static and dynamic mode is unbiased.

Proposition 13.1. Assume the execution of Algorithm 5 with $\mathit{mode} \in \{\text{static}, \text{dynamic}\}$ given the input DPI $\langle K, B, P, N \rangle_R$. Further, let $\mathbf{D} := \mathbf{D}_{calc}$ be the set of minimal diagnoses w.r.t. $\langle K, B, P \cup P', N \cup N' \rangle_R$ returned by a call of DYNAMICHS in case of mode = dynamic and $\mathbf{D} := \mathbf{D}_{calc} \cup \mathbf{D}_{\checkmark}$ be the set of minimal diagnoses w.r.t. $\langle K, B, P \cup P', N \cup N' \rangle_R$ returned by a call of STATICHS in case of mode = static. Moreover, let $\mathcal{D} \in \mathbf{mD}_{\langle K, B, P \cup P', N \cup N' \rangle_R}$.

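Then, no query $Q$ w.r.t. $\mathbf{D}$ and $\langle K, B, P \cup P', N \cup N' \rangle_R$ can be computed by Algorithm 5 such that $\mathcal{D} \notin \mathbf{mD}_{\langle K, B, P \cup P' \cup \{Q\}, N \cup N' \rangle_R}$ and $\mathcal{D} \notin \mathbf{mD}_{\langle K, B, P \cup P', N \cup N' \cup \{Q\} \rangle_R}$.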

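Proof. Let us consider the q-partition $\mathfrak{P}(Q) = \langle \mathbf{D}^+(Q), \mathbf{D}^-(Q), \mathbf{D}^0(Q) \rangle$ of the query $Q$ that is computed by Algorithm 5 for the set of leading diagnoses $\mathbf{D}$. By Proposition 7.1, we have that $\mathbf{D}^+(Q) \cup \mathbf{D}^-(Q) \cup \mathbf{D}^0(Q) = \mathbf{D}$ and that $\mathbf{D}^+(Q)$, $\mathbf{D}^-(Q)$ and $\mathbf{D}^0(Q)$ are pairwise disjoint sets, i.e. these three sets constitute a partition of the set $\mathbf{D}$. Let us now assume that each diagnosis in $\mathbf{mD}_{\langle K, B, P \cup P', N \cup N' \rangle_R}$ is assigned to its respective set in $\mathfrak{P}(Q)$ as per Definition 7.2, yielding the tuple $\langle \mathbf{D}^+_m(Q), \mathbf{D}^-_m(Q), \mathbf{D}^0_m(Q) \rangle$ where $\mathbf{D}^+_m(Q) \cup \mathbf{D}^-_m(Q) \cup \mathbf{D}^0_m(Q) = \mathbf{mD}_{\langle K, B, P \cup P', N \cup N' \rangle_R}$. Then, by an argument analogous to the proof of Proposition 7.1, we obtain that $\mathbf{D}^+_m(Q)$, $\mathbf{D}^-_m(Q)$ and $\mathbf{D}^0_m(Q)$ are pairwise disjoint sets. That is, $\langle \mathbf{D}^+_m(Q), \mathbf{D}^-_m(Q), \mathbf{D}^0_m(Q) \rangle$ is the (extended) q-partition of $Q$ w.r.t. the leading diagnoses set $\mathbf{mD}_{\langle K, B, P \cup P', N \cup N' \rangle_R}$.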

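By Remark 7.4, we have that $\mathbf{D}_{pos} := \mathbf{D}^+_m(Q) \cup \mathbf{D}^0_m(Q)$ are minimal diagnoses w.r.t. the DPI $\langle K, B, P \cup P' \cup \{Q\}, N \cup N' \rangle_R$ (positive answer $u(Q)$) and $\mathbf{D}_{neg} := \mathbf{D}^-_m(Q) \cup \mathbf{D}^0_m(Q)$ are minimal diagnoses w.r.t. the DPI $\langle K, B, P \cup P', N \cup N' \cup \{Q\} \rangle_R$ (negative answer $u(Q)$). Since $\mathbf{D}_{pos} \cup \mathbf{D}_{neg} \supseteq \mathbf{mD}_{\langle K, B, P \cup P', N \cup N' \rangle_R}$, each diagnosis in $\mathbf{mD}_{\langle K, B, P \cup P', N \cup N' \rangle_R}$ is either in $\mathbf{D}_{pos}$ or in $\mathbf{D}_{neg}$ (or in both). Hence, for each diagnosis $\mathcal{D} \in \mathbf{mD}_{\langle K, B, P \cup P', N \cup N' \rangle_R}$ there is some answer $u(Q) \in \{\mathit{true}, \mathit{false}\}$ to the query $Q$ such that $\mathcal{D}$ is a diagnosis w.r.t. the DPI resulting from $\langle K, B, P \cup P', N \cup N' \rangle_R$ by addition of the new test case $Q$ to the respective set ($P \cup P'$ for a positive and $N \cup N'$ for a negative answer). Consequently, the claimed proposition holds.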
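The covering argument of the proof can be replayed on a toy q-partition. The following Python sketch is only an illustration: the diagnosis names, the partition and the helper names `surviving` and `every_diagnosis_survives_some_answer` are assumptions made for this example and are not part of Algorithm 5.

```python
from typing import Dict, FrozenSet


def surviving(partition: Dict[str, FrozenSet[str]], answer: bool) -> FrozenSet[str]:
    """Diagnoses kept after an answer: D+ ∪ D0 for yes, D- ∪ D0 for no."""
    kept = partition["plus"] if answer else partition["minus"]
    return kept | partition["zero"]


def every_diagnosis_survives_some_answer(partition: Dict[str, FrozenSet[str]]) -> bool:
    """True iff no diagnosis of the partition is ruled out by both answers."""
    everything = frozenset().union(*partition.values())
    return all(
        any(d in surviving(partition, answer) for answer in (True, False))
        for d in everything
    )


# Hypothetical q-partition over four minimal diagnoses.
partition = {
    "plus": frozenset({"D1", "D2"}),
    "minus": frozenset({"D3"}),
    "zero": frozenset({"D4"}),
}
unbiased_step = every_diagnosis_survives_some_answer(partition)
```

Because the three sets cover all diagnoses and each set is kept by at least one answer, the check succeeds for any partition of this shape, which mirrors the reasoning of the proof.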

Corollary 13.1. Algorithm 5 with $\mathit{mode} \in \{\text{static}, \text{dynamic}\}$ is unbiased for any given input DPI $\langle K, B, P, N \rangle_R$.

image

Table 13.1: Comparison: STATICHS versus DYNAMICHS.

image

Two Query Strategies for Efficient Fault Localization in Interactive Ontology Debugging

image

In this part, we suggest and extensively analyze different methods for the selection of an “optimal” query. The material dealt with in Part IV is based on the publications [SFFR12, SF10] where the former was published in the journal Web Semantics: Science, Services and Agents on the World Wide Web and the latter in the Proceedings of the 9th International Semantic Web Conference (ISWC 2010).

Ontology acquisition and maintenance are important prerequisites for the successful application of semantic systems in areas such as the Semantic Web. However, as state-of-the-art ontology extraction methods cannot automatically acquire ontologies in a complete and error-free fashion, users of such systems must formulate and correct logical descriptions on their own. In most cases these users are domain experts who have little or no experience in expressing knowledge in representation languages like OWL 2 DL [GHM+08]. Studies in cognitive psychology, e.g. [CP71, JL99], indicate that humans make systematic errors while formulating or interpreting logical descriptions, with the results presented in [RDH+04, RCVB09] confirming that these observations also apply to ontology development. Moreover, the problem gets even worse if an ontology is developed by a group of users, such as OBO Foundry or NCI Thesaurus, or is based on a set of imported third-party ontologies. In this case inconsistencies might appear if some user does not understand or accept the context in which shared ontological descriptions are used. Therefore, identification of erroneous ontological definitions is a difficult and time-consuming task.

Several ontology debugging methods [SHCH07, KPHS07, FS05, HPS08] were proposed to simplify ontology development and maintenance. Usually the main aim of debugging is to obtain a consistent and, optionally, coherent ontology. These basic requirements can be extended with additional ones, such as test cases [FS05], which must be fulfilled by the target ontology $O_t$. Any ontology that does not fulfill the requirements is faulty regardless of how it was created. For instance, an ontology might be created by an expert specializing descriptions of the imported ontologies (top-down) or by an inductive learning algorithm from a set of examples (bottom-up).

Note that even if all requirements are completely specified, many logically equivalent target ontologies might exist. They may differ in aspects such as the complexity of consistency checks, size or readability. However, selecting between logically equivalent theories based on such measures is out of the scope of this work. Furthermore, although target ontologies may evolve as requirements change over time, we assume that the target ontology remains stable throughout a debugging session.

Given a set of requirements (e.g. formulated by a user) and a faulty ontology, the task of an ontology debugger is to identify the set of alternative diagnoses, where each diagnosis corresponds to a set of possibly faulty axioms. More concretely, a diagnosis $D$ is a subset of an ontology $O$ such that one should remove (change) all the axioms of the diagnosis from the ontology (i.e. $O \setminus D$) in order to formulate an ontology $O'$ that fulfills all the given requirements. Only if the set of requirements is complete does the only possible ontology $O'$ correspond to the target ontology $O_t$. In the following we refer to the removal of a diagnosis from the ontology as a trivial application of a diagnosis. Moreover, in practical applications it might be inefficient to consider all possible diagnoses. Therefore, modern ontology debugging approaches focus on the computation of minimal diagnoses. A set of axioms $D_i$ is a minimal diagnosis iff there is no proper subset $D'_i \subset D_i$ which is a diagnosis. Thus, minimal diagnoses constitute minimal required changes to the ontology.

Application of diagnosis methods can be problematic in cases where many alternative minimal diagnoses exist for a given set of test cases and requirements. A sample study of real-world incoherent ontologies, which were used in [KPHS07], showed that hundreds or even thousands of minimal diagnoses may exist. In the case of the Transportation ontology the diagnosis method was able to identify 1782 minimal diagnoses. In such situations a simple visualization of all alternative sets of modifications to the ontology is ineffective. Thus an efficient debugging method should be able to discriminate between the diagnoses in order to select the target diagnosis $D_t$. Trivial application of $D_t$ to the ontology $O$ allows a user to extend $(O \setminus D_t)$ with a set of additional axioms $EX$ and, thus, to formulate the target ontology $O_t$, i.e. $O_t = (O \setminus D_t) \cup EX$.

One possible solution to the diagnosis discrimination problem would be to order the set of diagnoses by various preference criteria. For instance, Kalyanpur et al. [KPSCG06] suggest a measure to rank the axioms of a diagnosis depending on their structure, usage in test cases, provenance, and impact in terms of entailments. Only the top ranking diagnoses are then presented to the user. Of course this set of diagnoses will contain the target diagnosis only in cases where the faulty ontology, the given requirements and test cases provide sufficient data to the appropriate heuristic. However, it is difficult to identify which information, e.g. test cases, is really required to identify the target diagnosis. That is, a user does not know a priori which and how many tests should be provided to the debugger to ensure that it will return the target diagnosis.

In this part we present an approach for the acquisition of additional information by generating a sequence of queries, the answers of which can be used to reduce the set of diagnoses and ultimately identify the target diagnosis. These queries should be answered by an oracle such as a user or an information extraction system. In order to construct queries we exploit the property that different ontologies resulting from trivial applications of different diagnoses entail unequal sets of axioms. Consequently, we can differentiate between diagnoses by asking the oracle if the target ontology should entail a set of logical sentences or not. These entailed logical sentences can be generated by the classification and realization services provided in description logic reasoning systems [SPG+07, HM01, MSH09]. In particular, the classification process computes a subsumption hierarchy (sometimes also called “inheritance hierarchy” of parents and children) for each concept description mentioned in a TBox. For each individual mentioned in an ABox, the realization computes all the concept names of which the individual is an instance [SPG+07].

We propose two methods for selecting the next query from the set of possible queries: The first method employs a greedy approach that selects queries which try to cut the number of diagnoses in half. The second method exploits the fact that some diagnoses are more likely than others because of typical user errors [RDH+04, RCVB09]. Beliefs for an error to occur in a given part of a knowledge base, represented as a probability, can be used to estimate the change in entropy of the set of diagnoses if a particular query is answered. In our evaluation the fault probabilities of axioms are estimated by the type and number of the logical operators employed. For example, roughly speaking, the greater the number of logical operators and the more complex these operators are, the greater the fault probability of an axiom. For assigning prior fault probabilities to diagnoses we employ the fault probabilities of axioms. Of course other methods for guessing prior fault probabilities, e.g. based on context of concept descriptions, measures suggested in the previous work [KPSCG06], etc., can be easily integrated in our framework. Given a set of diagnoses and their probabilities the method selects a query which minimizes the expected entropy of a set of diagnoses after an oracle answers a query, i.e. maximizes the information gain. An oracle should answer such queries until a diagnosis is identified whose probability is significantly higher than those of all other diagnoses. This diagnosis is most likely to be the target diagnosis.
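To make the difference between the two strategies tangible, the following Python sketch scores two hypothetical queries once by a split-in-half criterion and once by expected posterior entropy. It is a simplified illustration under stated assumptions, not the scoring function of this work: the function names, the queries and the prior are made up, and diagnoses that predict no answer are assumed to be consistent with either answer with probability 1/2.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def split_in_half_score(d_pos, d_neg, d_zero):
    """Greedy score: 0 means the query splits the diagnoses exactly in half."""
    return abs(len(d_pos) - len(d_neg)) + len(d_zero)


def expected_entropy(prior, d_pos, d_neg, d_zero):
    """Expected entropy of the diagnosis distribution after the oracle answers.

    Assumption: a diagnosis in d_zero agrees with either answer with probability 1/2.
    """
    p_yes = sum(prior[d] for d in d_pos) + 0.5 * sum(prior[d] for d in d_zero)
    result = 0.0
    for p_answer, agreeing in ((p_yes, d_pos), (1.0 - p_yes, d_neg)):
        if p_answer <= 0:
            continue  # this answer cannot occur under the prior
        posterior = [prior[d] / p_answer for d in agreeing]
        posterior += [0.5 * prior[d] / p_answer for d in d_zero]
        result += p_answer * entropy(posterior)
    return result


# Hypothetical queries, each described by its partition (d_pos, d_neg, d_zero).
queries = {
    "Q_half": (["D1", "D2"], ["D3", "D4"], []),
    "Q_single": (["D1"], ["D2", "D3", "D4"], []),
}
skewed_prior = {"D1": 0.7, "D2": 0.1, "D3": 0.1, "D4": 0.1}

greedy_choice = min(queries, key=lambda q: split_in_half_score(*queries[q]))
entropy_choice = min(queries, key=lambda q: expected_entropy(skewed_prior, *queries[q]))
```

With the skewed prior, the greedy criterion prefers the even split `Q_half`, whereas the entropy criterion prefers `Q_single`, which directly tests the most probable diagnosis. Under a uniform prior both criteria agree on `Q_half`.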

In the first evaluation scenario we compare the performance of both methods in terms of the number of queries needed to identify the target diagnosis. The evaluation is performed using generated examples as well as real-world ontologies presented in Tables 18.1 and 18.5. In the first case we alter a consistent and coherent ontology with additional axioms to generate conflicts that result in a predefined number of diagnoses of a required length. Each faulty ontology is then analyzed by the debugging algorithm using entropy, greedy and “random” strategies, where the latter selects queries at random. The evaluation results show that in some cases the entropy-based approach is almost 60% better than the greedy one whereas both approaches clearly outperformed the random strategy.

In the second evaluation scenario we investigate the robustness of the entropy-based strategy with respect to variations in the prior fault probabilities. We analyze the performance of entropy-based and greedy strategies on real-world ontologies by simulating different types of prior fault probability distributions as well as the “quality” of these probabilities that might occur in practice. In particular, we identify the cases where all prior fault probabilities are (1) equal, (2) “moderately” varied or (3) “extremely” varied. Regarding the “quality” of the probabilities we investigate cases where the guesses based on the prior diagnosis probabilities are good, average or bad. The results show that the entropy method outperforms “split-in-half” in almost all of the cases, namely when the target diagnosis is located in the more likely two thirds of the minimal diagnoses. In some situations the entropy-based approach achieves even twice the performance of the greedy one. Only in cases where the initial guess of the prior probabilities is very vague (the bad case), and the number of queries needed to identify the target diagnosis is low, “split-in-half” may save on average one query. However, if the number of queries increases, the performance of the entropy-based query selection increases compared to the “split-in-half” strategy. We observed that if the number of queries is greater than 10, the entropy-based method is preferable even if the initial guess of the prior probabilities is bad. This is due to the effect that the initial bad guesses are improved by the Bayes-update of the diagnoses probabilities as well as an ability of the entropy-based method to stop in the cases when a probability of some diagnosis is above an acceptance threshold predefined by the user. Consequently, entropy-based query selection is robust enough to handle different prior fault probability distributions.

Additional experiments performed on big real-world ontologies demonstrate the scalability of the suggested approach. In our experiments we were able to identify the target diagnosis in an ontology with over 33000 axioms using entropy-based query selection in only 190 seconds using an average of five queries.

The remainder of Part IV is organized as follows: Chapter 15 presents two introductory examples as well as the basic concepts. The details of the entropy-based query selection method are given in Chapter 16. Chapter 17 describes the implementation of the approach and is followed by evaluation results in Chapter 18. An overview of related work is given in Chapter 19 and conclusions are drawn in Chapter 20.

Concepts

We begin by presenting the fundamentals of ontology diagnosis and then show how queries and answers can be generated and employed to differentiate between sets of diagnoses.

Description Logics

Since the underlying knowledge representation method of ontologies in the Semantic Web is based on description logics, we start by briefly introducing the main concepts, employing the usual definitions as in [Bor96, Baa03]. A knowledge base is comprised of two components, namely a TBox (denoted by $\mathcal{T}$) and an ABox ($\mathcal{A}$). The TBox defines the terminology whereas the ABox contains assertions about named individuals in terms of the vocabulary defined in the TBox. The vocabulary consists of concepts, denoting sets of individuals, and roles, denoting binary relationships between individuals. These concepts and roles may be either atomic or complex, the latter being obtained by employing description operators. The language of descriptions is defined recursively by starting from a schema $S = (CN, RN, IN)$ of disjoint sets of names for concepts, roles, and individuals. Typical operators for the construction of complex descriptions are $C \sqcup D$ (disjunction), $C \sqcap D$ (conjunction), $\neg C$ (negation), $\forall R.C$ (concept value restriction), and $\exists R.C$ (concept exists restriction), where $C$ and $D$ are elements of $CN$ and $R \in RN$.

Knowledge bases are defined by a finite set of logical sentences. Sentences regarding the TBox are called terminological axioms whereas sentences regarding the ABox are called assertional axioms. Terminological axioms are expressed by $C \sqsubseteq D$ (Generalized Concept Inclusion) which corresponds to the logical implication. Let $a, b \in IN$ be individual names. $C(a)$ and $R(a, b)$ are thus assertional axioms.

Concepts (resp. roles) can be regarded as unary (resp. binary) predicates. Roughly speaking, description logics can be seen as fragments of first-order predicate logic (without considering transitive closure or special fixpoint semantics). These fragments are specifically designed to ensure decidability or favorable computational costs.

The semantics of description terms are usually given using an interpretation $\mathcal{I} = \langle \Delta^{\mathcal{I}}, (\cdot)^{\mathcal{I}} \rangle$, where $\Delta^{\mathcal{I}}$ is a domain (non-empty universe) of values, and $(\cdot)^{\mathcal{I}}$ is a function that maps every concept description to a subset of $\Delta^{\mathcal{I}}$, and every role name to a subset of $\Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$. The mapping also associates a value in $\Delta^{\mathcal{I}}$ with every individual name in $IN$. An interpretation $\mathcal{I}$ is a model of a knowledge base iff it satisfies all terminological axioms and assertional axioms. A knowledge base is satisfiable iff a model exists. A concept description $C$ is coherent (satisfiable) w.r.t. a TBox $\mathcal{T}$ if a model $\mathcal{I}$ of $\mathcal{T}$ exists such that $C^{\mathcal{I}} \neq \emptyset$.

A TBox is incoherent iff an incoherent concept description exists.
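The set-based semantics above can be checked directly on a small finite interpretation. The Python sketch below is a minimal illustration restricted to atomic concept names; the interpretation, the concept names and the helper functions are assumptions chosen for this example.

```python
def satisfies_inclusion(ext, c, d):
    """C ⊑ D holds in the interpretation iff the extension of C is a subset of that of D."""
    return ext[c] <= ext[d]


def satisfies_assertion(ext, ind, c, a):
    """C(a) holds in the interpretation iff the element denoted by a lies in the extension of C."""
    return ind[a] in ext[c]


# Hypothetical finite interpretation over the domain {1, 2}.
ext = {"A": {1, 2}, "B": {1, 2}, "R": set()}  # concept name -> extension
ind = {"w": 1, "v": 2}                        # individual name -> domain element

a_sub_b = satisfies_inclusion(ext, "A", "B")     # holds in this interpretation
b_sub_r = satisfies_inclusion(ext, "B", "R")     # violated in this interpretation
a_of_w = satisfies_assertion(ext, ind, "A", "w")
```

An interpretation of this kind is a model of a knowledge base exactly when every axiom passes its check. A concept name with an empty extension in one model, such as `R` here, is not thereby incoherent, since coherence only requires some model with a non-empty extension.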

Diagnosis of Ontologies

Example 15.1 Consider a simple ontology O with the terminology T :

image

and assertions  A : {A(w), ¬R(w), A(v)}. Assume that the user explicitly states that the three assertional axioms should be considered as correct, i.e. these axioms are added to a background theory B. The introduction of a background theory ensures that the diagnosis method focuses purely on the potentially faulty axioms.

image

The only irreducible set of non-background axioms (minimal conflict set) that preserves the inconsistency is $CS : \langle ax_1, ax_2, ax_3, ax_4 \rangle$. That is, one has to modify or remove the axioms of at least one of the following diagnoses

image

to restore the consistency of the ontology. However, it is unclear which of the ontologies $O_i = O \setminus D_i$ obtained by application of diagnoses from the set $\mathbf{D} : \{D_1, \ldots, D_4\}$ is the target one.

Definition 15.1. A target ontology $O_t$ is a set of logical sentences characterized by a set of background axioms $B$, a set of sets of logical sentences $P$ that must be entailed by $O_t$ and the set of sets of logical sentences $N$ that must not be entailed by $O_t$.

image

• $O_t$ must be satisfiable (optionally coherent)

image

• $B \subseteq O_t$

• $O_t \models p \quad \forall p \in P$

• $O_t \not\models n \quad \forall n \in N$

Given B, P, and N, an ontology O is faulty iff O does not fulfill all the necessary requirements of the target ontology.

Note that the approach presented in this work can be used with any knowledge representation language for which there exists a sound and complete procedure to decide whether O |= ax and the entailment operator |= is extensive, monotone and idempotent. For instance, these requirements are fulfilled by all subsets of OWL 2 which are interpreted under OWL Direct Semantics.

Definition 15.1 allows a user to identify the target diagnosis $D_t$ by providing sufficient information about the target ontology in the sets $B$, $P$ and $N$. For instance, if in Example 15.1 the user provides the information that $O_t \models \{B(w)\}$ and $O_t \not\models \{C(w)\}$, the debugger will return only one diagnosis, namely $D_2$. Application of this diagnosis results in a consistent ontology $O_2 = O \setminus D_2$ that, integrated with the background knowledge $B$, entails $\{B(w)\}$ because of $ax_1$ and the assertion $A(w)$. In addition, $O_2 \cup B$ does not entail $\{C(w)\}$ since $O_2 \cup B \cup \{\neg C(w)\}$ is consistent and, moreover, $\{\neg R(w), ax_4, ax_3\} \models \{\neg C(w)\}$. All other ontologies $O_i = (O \setminus D_i)$ obtained by the application of the diagnoses $D_1$, $D_3$ and $D_4$ do not fulfill the given requirements: $O_1 \cup B \cup \{B(w)\}$ is inconsistent and therefore no consistent extension of $O_1 \cup B$ can entail $\{B(w)\}$, while both $O_3 \cup B$ and $O_4 \cup B$ entail $\{C(w)\}$. Hence, $O_2 \cup B$ corresponds to the target ontology $O_t$.

Definition 15.2. Let $\langle O, B, P, N \rangle$ be a diagnosis problem instance, where $O$ is an ontology, $B$ a background theory, $P$ a set of sets of logical sentences which must be entailed by the target ontology $O_t$, and $N$ a set of sets of logical sentences which must not be entailed by $O_t$.

A set of axioms $D \subseteq O$ is a diagnosis iff the set of axioms $O \setminus D$ can be extended by a logical description $EX$ such that:

1. $(O \setminus D) \cup B \cup EX$ is consistent (and coherent if required)

2. $(O \setminus D) \cup B \cup EX \models p \quad \forall p \in P$

3. $(O \setminus D) \cup B \cup EX \not\models n \quad \forall n \in N$

A diagnosis $D_i$ defines a partition of the ontology $O$ where each axiom $ax_j \in D_i$ is a candidate for changes by the user and each axiom $ax_k \in O \setminus D_i$ is correct. If $D_t$ is the set of axioms of $O$ to be changed (i.e. $D_t$ is the target diagnosis) then the target ontology $O_t$ is $(O \setminus D_t) \cup B \cup EX$ for some $EX$ defined by the user.

In the following we assume the background theory B together with the sets of logical sentences in the sets P and N always allow formulation of the target ontology. Moreover, a diagnosis exists iff a target ontology exists.

Proposition 15.1. A diagnosis $D$ for a diagnosis problem instance $\langle O, B, P, N \rangle$ exists iff

image

The set of all diagnoses is complete in the sense that at least one diagnosis exists where the ontology resulting from the trivial application of a diagnosis is a subset of the target ontology:

Proposition 15.2. Let $\mathbf{D} \neq \emptyset$ be the set of all diagnoses for a diagnosis problem instance $\langle O, B, P, N \rangle$ and $O_t$ the target ontology. Then a diagnosis $D_t \in \mathbf{D}$ exists s.t. $(O \setminus D_t) \subseteq O_t$.

The set of all diagnoses can be characterized by the set of minimal diagnoses.

Definition 15.3. A diagnosis $D$ for a diagnosis problem instance $\langle O, B, P, N \rangle$ is a minimal diagnosis iff there is no $D' \subset D$ such that $D'$ is a diagnosis.

Proposition 15.3. Let $\langle O, B, P, N \rangle$ be a diagnosis problem instance. For every diagnosis $D$ there is a minimal diagnosis $D'$ s.t. $D' \subseteq D$.

Definition 15.4. A diagnosis $D$ for a diagnosis problem instance $\langle O, B, P, N \rangle$ is a minimum cardinality diagnosis iff there is no diagnosis $D'$ such that $|D'| < |D|$.
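The difference between subset-minimality (Definition 15.3) and minimum cardinality (Definition 15.4) can be seen on a small set of candidates. The Python sketch below assumes that all diagnoses are already known and simply filters them; the function names and the candidate sets are hypothetical.

```python
def minimal_diagnoses(diagnoses):
    """Keep the diagnoses that have no proper subset among the given diagnoses."""
    ds = [frozenset(d) for d in diagnoses]
    return [d for d in ds if not any(other < d for other in ds)]


def minimum_cardinality_diagnoses(diagnoses):
    """Keep the diagnoses with the smallest number of axioms."""
    ds = [frozenset(d) for d in diagnoses]
    if not ds:
        return []
    smallest = min(len(d) for d in ds)
    return [d for d in ds if len(d) == smallest]


# Hypothetical diagnoses over axioms ax1, ax2, ax3.
candidates = [{"ax1"}, {"ax2", "ax3"}, {"ax1", "ax2"}]
minimal = minimal_diagnoses(candidates)
minimum = minimum_cardinality_diagnoses(candidates)
```

Here `{ax1}` and `{ax2, ax3}` are both minimal, but only `{ax1}` has minimum cardinality. Every minimum cardinality diagnosis is minimal, while the converse does not hold in general.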

To summarize, a diagnosis describes which axioms are candidates for modification. Although multiple diagnoses may exist, some are preferable to others. For example, minimal diagnoses require minimal changes, i.e. axioms are not considered for modification unless there is a reason. Minimum cardinality diagnoses require changing a minimal number of axioms. The actual type of error contained in an axiom is irrelevant as the concept of diagnosis defined here does not make any assumptions about errors themselves. There can, however, be instances where an ontology is faulty and the empty diagnosis is the only minimal diagnosis, e.g. if some axioms are missing and nothing must be changed.

The extension $EX$ plays an important role in the ontology repair process, suggesting axioms that should be added to the ontology. For instance, assume that in Example 15.1 the user requires that the target ontology must not entail $\{B(w)\}$ but has to entail $\{B(v)\}$, that is $N = \{\{B(w)\}\}$ and $P = \{\{B(v)\}\}$. Because the example ontology $O$ is inconsistent, some sentences must be changed. The consistent ontology $O_1 = O \setminus D_1$ (along with the background axioms $B$) entails neither $\{B(v)\}$ nor $\{B(w)\}$ (in particular $O_1 \cup B \models \{\neg B(w)\}$). Consequently, $O_1$ has to be extended with a set $EX$ of logical sentences in order to entail $\{B(v)\}$. This set of logical sentences can be approximated with $EX = \{B(v)\}$. Then $O_1 \cup B \cup EX$ is satisfiable, entails $\{B(v)\}$ but does not entail $\{B(w)\}$. All other ontologies $O_i = O \setminus D_i$, $i = 2, 3, 4$ (integrated with $B$) are consistent but entail $\{B(w), B(v)\}$ and must be rejected because of the monotonic semantics of description logic. That is, there is no extension $EX$ such that $(O_i \cup B \cup EX) \not\models \{B(w)\}$. Therefore, the diagnosis $D_1$ is the minimum cardinality diagnosis which allows the formulation of the target ontology. Note that formulation of the complete extension is impossible, since our diagnosis approach deals with changes to existing axioms and does not learn new axioms.

The following corollary characterizes diagnoses without employing the true extension EX to formulate the target ontology. The idea is to use the sentences which must be entailed by the target ontology to approximate EX as shown above.

Corollary 15.1. Given a diagnosis problem instance $\langle O, B, P, N \rangle$, a set of axioms $D \subseteq O$ is a diagnosis iff

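1. $(O \setminus D) \cup B \cup \bigcup_{p \in P} p$ is satisfiable (coherent) and

2. $(O \setminus D) \cup B \cup \bigcup_{p \in P} p \not\models n$ for all $n \in N$.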

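Proof sketch: $(\Rightarrow)$ Let $D \subseteq O$ be a diagnosis for $\langle O, B, P, N \rangle$. Since there is an $EX$ s.t. $(O \setminus D) \cup B \cup EX$ is satisfiable (coherent) and $(O \setminus D) \cup B \cup EX \models p$ for all $p \in P$, it follows that $(O \setminus D) \cup B \cup EX \cup \bigcup_{p \in P} p$ is satisfiable (coherent) and therefore $(O \setminus D) \cup B \cup \bigcup_{p \in P} p$ is satisfiable (coherent). Consequently, the first condition of the corollary is fulfilled. Since $(O \setminus D) \cup B \cup EX \models p$ for all $p \in P$ and $(O \setminus D) \cup B \cup EX \not\models n$ for all $n \in N$, it follows that $(O \setminus D) \cup B \cup EX \cup \bigcup_{p \in P} p \not\models n$ for all $n \in N$. Consequently, by monotonicity of entailment, $(O \setminus D) \cup B \cup \bigcup_{p \in P} p \not\models n$ for all $n \in N$ and the second condition of the corollary is fulfilled.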

$(\Leftarrow)$ Let $D \subseteq O$ and $\langle O, B, P, N \rangle$ be a diagnosis problem instance. Without loss of generality let $EX = P$. By Condition 1 of the corollary $(O \setminus D) \cup B \cup \bigcup_{p \in P} p$ is satisfiable (coherent). Therefore, for $EX = P$ the sentences $(O \setminus D) \cup B \cup EX$ are satisfiable (coherent), i.e. the first condition for a diagnosis is fulfilled, and these sentences entail $p$ for all $p \in P$, which corresponds to the second condition a diagnosis must fulfill. Furthermore, by Condition 2 of the corollary $(O \setminus D) \cup B \cup EX \not\models n$ holds for all $n \in N$ and therefore the third condition for a diagnosis is fulfilled. Consequently, $D \subseteq O$ is a diagnosis for $\langle O, B, P, N \rangle$.
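The practical value of the corollary is that a candidate can be tested with one satisfiability check and one entailment check per element of N, without knowing the true extension. The Python sketch below illustrates this with a brute-force propositional stand-in for a description logic reasoner. It rests on explicit assumptions: the terminology of Example 15.1 is read as the chain ax1: A ⊑ B, ax2: B ⊑ C, ax3: C ⊑ D, ax4: D ⊑ R (a reconstruction from the surrounding discussion), only the individual w is modelled, and all function names are hypothetical.

```python
from itertools import product

ATOMS = ["A", "B", "C", "D", "R"]  # propositional stand-ins for A(w), ..., R(w)


def models(sentences):
    """Yield every truth assignment that satisfies all given sentences."""
    for values in product([False, True], repeat=len(ATOMS)):
        v = dict(zip(ATOMS, values))
        if all(s(v) for s in sentences):
            yield v


def consistent(sentences):
    return next(models(sentences), None) is not None


def entails(sentences, query):
    """True iff every model of the sentences satisfies all sentences of the query."""
    return all(all(q(v) for q in query) for v in models(sentences))


def is_diagnosis(ontology, d, background, positives, negatives):
    """Corollary-style test: (O \\ D) ∪ B ∪ ⋃P is consistent and entails no n in N."""
    kb = [ax for name, ax in ontology.items() if name not in d]
    kb += background + [s for p in positives for s in p]
    return consistent(kb) and not any(entails(kb, n) for n in negatives)


# Assumed reading of Example 15.1, restricted to the individual w.
ontology = {
    "ax1": lambda v: not v["A"] or v["B"],
    "ax2": lambda v: not v["B"] or v["C"],
    "ax3": lambda v: not v["C"] or v["D"],
    "ax4": lambda v: not v["D"] or v["R"],
}
background = [lambda v: v["A"], lambda v: not v["R"]]  # A(w), ¬R(w)
positives = [[lambda v: v["B"]]]                       # P = {{B(w)}}
negatives = [[lambda v: v["C"]]]                       # N = {{C(w)}}

diagnoses = [
    name for name in ontology
    if is_diagnosis(ontology, {name}, background, positives, negatives)
]
```

Under these assumptions only the candidate that removes `ax2` passes the test, which agrees with the discussion following Definition 15.1. A real debugger would replace the truth-table enumeration by calls to a description logic reasoner.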

Conflict sets, which are the parts of the ontology that preserve the inconsistency/incoherency, are usually employed to constrain the search space during computation of diagnoses.

Definition 15.5. Given a diagnosis problem instance $\langle O, B, P, N \rangle$, a set of axioms $CS \subseteq O$ is a conflict set iff $CS \cup B \cup \bigcup_{p \in P} p$ is inconsistent (incoherent) or $n \in N$ exists s.t. $CS \cup B \cup \bigcup_{p \in P} p \models n$.

Definition 15.6. A conflict set $CS$ for an instance $\langle O, B, P, N \rangle$ is minimal iff there is no $CS' \subset CS$ such that $CS'$ is a conflict set.

A set of minimal conflict sets can be used to compute the set of minimal diagnoses as shown in [Rei87]. The idea is that each diagnosis must include at least one element of each minimal conflict set.

Proposition 15.4. $D$ is a minimal diagnosis for the diagnosis problem instance $\langle O, B, P, N \rangle$ iff $D$ is a minimal hitting set for the set of all minimal conflict sets of $\langle O, B, P, N \rangle$.

image

Table 15.1: Entailments of ontologies $O_i = (O \setminus D_i)$, $i = 1, \ldots, 4$ (integrated with $B$) in Example 15.1 returned by realization.

Given a set of sets $S$, a set $H$ is a hitting set of $S$ iff $H \cap S_i \neq \emptyset$ for all $S_i \in S$ and $H \subseteq \bigcup_{S_i \in S} S_i$.

Most modern ontology diagnosis methods [SHCH07, KPHS07, FS05, HPS08] are implemented according to Proposition 28.2 and differ only in details, such as how and when (minimal) conflict sets are computed, the order in which hitting sets are generated, etc.
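The relation between minimal conflict sets and minimal diagnoses can be sketched by enumerating minimal hitting sets directly. The Python code below is a brute-force illustration that tries candidates by increasing size; it is not the hitting set tree algorithm of [Rei87], and the function name is an assumption made for this example.

```python
from itertools import combinations


def minimal_hitting_sets(conflict_sets):
    """Enumerate all minimal hitting sets by increasing candidate size."""
    conflicts = [frozenset(c) for c in conflict_sets]
    universe = sorted(set().union(*conflicts))
    found = []
    for size in range(len(universe) + 1):
        for candidate in combinations(universe, size):
            h = frozenset(candidate)
            if any(smaller <= h for smaller in found):
                continue  # a subset already hits every conflict, so h is not minimal
            if all(h & c for c in conflicts):
                found.append(h)
    return found


# The single minimal conflict set stated for Example 15.1.
diagnoses = minimal_hitting_sets([{"ax1", "ax2", "ax3", "ax4"}])
```

For the single conflict set of Example 15.1 this yields four singleton hitting sets, which matches the four minimal diagnoses reported for that example. The enumeration is exponential in the number of axioms and is meant for illustration only.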

Differentiating between Diagnoses

The diagnosis method usually generates a set of diagnoses for a given diagnosis problem instance. Thus, in Example 15.1 an ontology debugger returns a set of four minimal diagnoses $\{D_1, \ldots, D_4\}$. As explained in the previous section, additional information, i.e. sets of sets of logical sentences $P$ and $N$, can be used by the debugger to reduce the set of diagnoses. However, in the general case the user does not know which sets $P$ and $N$ to provide to the debugger such that the target diagnosis will be identified. Therefore, the debugger should be able to identify sets of logical sentences on its own and only ask the user or some other oracle whether these sentences must or must not be entailed by the target ontology. To generate these sentences the debugger can apply each of the diagnoses in $\mathbf{D} = \{D_1, \ldots, D_n\}$ and obtain a set of ontologies $O_i = O \setminus D_i$, $i = 1, \ldots, n$ that fulfill the user requirements. For each ontology $O_i$ a description logic reasoner can generate a set of entailments such as entailed subsumptions provided by the classification service and sets of class assertions provided by the realization service. These entailments can be used to discriminate between the diagnoses, as different ontologies entail different sets of sentences due to extensivity of the entailment relation. Note that in the examples provided in this section we consider only two types of entailments, namely subsumption and class assertion. In general, the approach presented in this work is not limited to these types and can use all of the entailment types supported by a reasoner.

For instance, in Example 15.1 for each ontology  Oi = (O \ Di) , i = 1 . . . 4(integrated with B) the realization service of a reasoner returns the set of class assertions presented in Table 15.1. Without any additional information the debugger cannot decide which of these sentences must be entailed by the target ontology. To obtain this information the diagnosis method must query an oracle that can specify whether the target ontology entails some set of sentences or not. E.g. the debugger could ask an oracle if {D(w)} is entailed by the target ontology (Ot |= {D(w)}). If the answer is yes, then {D(w)} is added to P and D4is considered as the target diagnosis. All other diagnoses are rejected because  (O \Di)∪B ∪{D(w)}for i = 1, 2, 3 is inconsistent. If the answer is no, then {D(w)} is added to N and  D4is rejected as (O \ D4) ∪ B |= {D(w)}and we have to ask the oracle another question. In the following we consider a query Q as a set of logical sentences such that  Ot |= Qholds iff  Ot |= qifor all  qi ∈ Q.

Property 1. Given a diagnosis problem instance ⟨O, B, P, N⟩, a set of diagnoses D, a set of logical sentences Q representing the query (Ot |= Q) and an oracle able to evaluate the query: If the oracle answers yes then every diagnosis Di ∈ D is a diagnosis for P ∪ {Q} iff both conditions hold:

image

If the oracle answers no then every diagnosis Di ∈ D is a diagnosis for N ∪ {Q} iff both conditions hold:

image

In particular, a query partitions the set of diagnoses D into three disjoint subsets.

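Definition 15.7. For a query Q, each diagnosis Di ∈ D of a diagnosis problem instance ⟨O, B, P, N⟩ can be assigned to one of the three sets DP, DN or D∅ where

$$
\begin{aligned}
D^P &= \{ D_i \in D \mid (O \setminus D_i) \cup B \cup \textstyle\bigcup_{p \in P} p \models Q \} \\
D^N &= \{ D_i \in D \mid (O \setminus D_i) \cup B \cup \textstyle\bigcup_{p \in P} p \cup Q \text{ is inconsistent} \} \\
D^\emptyset &= D \setminus (D^P \cup D^N)
\end{aligned}
$$

Given a diagnosis problem instance we say that the diagnoses in DP predict a positive answer (yes) as a result of the query Q, diagnoses in DN predict a negative answer (no), and diagnoses in D∅ do not make any predictions.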

Property 2. Given a diagnosis problem instance ⟨O, B, P, N⟩, a set of diagnoses D, a query Q and an oracle:

If the oracle answers yes then the set of rejected diagnoses is DN and the set of remaining diagnoses is DP ∪ D∅. If the oracle answers no then the set of rejected diagnoses is DP and the set of remaining diagnoses is DN ∪ D∅.

Consequently, given a query Q either DP or DN is eliminated but D∅ always remains after the query is answered. For generating queries we have to investigate for which subsets DP, DN ⊆ D a query exists that can differentiate between these sets. A straightforward approach is to investigate all possible subsets of D. In our evaluation we show that this is feasible if we limit the number n of minimal diagnoses to be considered during query generation and selection. E.g. for n = 9, the algorithm has to verify 512 possible partitions in the worst case.
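Property 2 amounts to simple set arithmetic once a partition is known. The following sketch assumes diagnoses are represented by their names; the function name `apply_answer` is hypothetical, and the partition is the one described above for the query {D(w)} in Example 15.1.

```python
def apply_answer(dp, dn, d0, answer_is_yes):
    """Return (remaining, rejected) sets of diagnoses as stated in Property 2."""
    if answer_is_yes:
        return dp | d0, dn
    return dn | d0, dp


# Partition induced by the query {D(w)}: only D4 entails it, and adding it
# to the ontologies of D1, D2 and D3 yields an inconsistency.
dp, dn, d0 = {"D4"}, {"D1", "D2", "D3"}, set()

remaining_yes, rejected_yes = apply_answer(dp, dn, d0, answer_is_yes=True)
remaining_no, rejected_no = apply_answer(dp, dn, d0, answer_is_yes=False)
print(remaining_yes, remaining_no)
```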

Given a set of diagnoses D for the ontology O, a set P of sets of sentences that must be entailed by the target ontology Ot and a set of background axioms B, the set of partitions PR for which a query exists can be computed as follows:

1. Generate the power set P(D) and set PR ← ∅.

2. Assign an element of P(D) to the set DPi and generate the set of common entailments Ei of all ontologies (O \ Dj) ∪ B ∪ ⋃p∈P p, where Dj ∈ DPi.

3. If Ei = ∅, then reject the current element DPi, i.e. set P(D) ← P(D) \ {DPi}, and go to Step 2. Otherwise set Qi ← Ei.

4. Use Definition 15.7 and the query Qi to classify the diagnoses Dk ∈ D \ DPi into the sets DPi, DNi and D∅i. The generated partition is added to the set of partitions, PR ← PR ∪ {⟨Qi, DPi, DNi, D∅i⟩}, and P(D) ← P(D) \ {DPi}. If P(D) ≠ ∅ then go to Step 2.
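The four steps above can be sketched as follows. This is an illustration under strong assumptions rather than the actual implementation: the reasoner is replaced by a lookup table `ENTAILED` whose contents are read off from the discussion of Example 15.1 in the text (Table 15.1 itself is not reproduced here), P and N are taken to be empty, and the inconsistency test is a stand-in that is valid only for this toy example.

```python
from itertools import combinations

# Assumed entailments of (O \ Di) ∪ B, reconstructed from the running text.
ENTAILED = {
    "D1": frozenset(),
    "D2": frozenset({"B(w)"}),
    "D3": frozenset({"B(w)", "C(w)"}),
    "D4": frozenset({"B(w)", "C(w)", "D(w)"}),
}


def entails(diag, query):
    return query <= ENTAILED[diag]


def inconsistent_with(diag, query):
    # Toy assumption: in this example every non-entailed query contradicts
    # the ontology. A real debugger would call a reasoner here.
    return not entails(diag, query)


def partitions(diagnoses):
    result = []
    # Steps 1 and 2: iterate over the non-empty elements of the power set.
    for size in range(1, len(diagnoses) + 1):
        for seed in combinations(diagnoses, size):
            common = frozenset.intersection(*(ENTAILED[d] for d in seed))
            if not common:  # Step 3: no common entailments, reject the seed
                continue
            dp, dn, d0 = set(seed), set(), set()
            for d in diagnoses:  # Step 4: classify the remaining diagnoses
                if d in dp:
                    continue
                if entails(d, common):
                    dp.add(d)
                elif inconsistent_with(d, common):
                    dn.add(d)
                else:
                    d0.add(d)
            result.append((common, frozenset(dp), frozenset(dn), frozenset(d0)))
    return result


PR = partitions(["D1", "D2", "D3", "D4"])
for q, dp, dn, d0 in sorted(set(PR), key=lambda t: len(t[0])):
    print(sorted(q), sorted(dp), sorted(dn), sorted(d0))
```

Different seeds can lead to the same partition, which is why the usage example removes duplicates before printing.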

In Example 15.1 the set of diagnoses D of the ontology O contains 4 elements. Therefore, the power set P(D) includes 15 elements {{D1}, {D2}, . . . , {D1, D2, D3, D4}}, assuming we omit the element corresponding to ∅ as it does not contain any diagnoses to be evaluated. Moreover, assume that P and N are empty. In each iteration an element of P(D) is assigned to the set DPi. For instance, the algorithm assigns DP1 = {D1, D2}. In this case the set of common entailments is empty as (O \ D1) ∪ B has no entailed sentences (see Table 15.1). Therefore, the set {D1, D2} is rejected and removed from P(D). Assume that in the next iteration the algorithm selects DP2 = {D2, D3}. In this case the set of common entailments E2 = {B(w)} is not empty and so Q2 = {B(w)}. The remaining diagnoses D1 and D4 are classified according to Definition 15.7. That is, the algorithm selects the first diagnosis D1 and verifies whether (O \ D1) ∪ B |= {B(w)}. Given the negative answer of the reasoner, the algorithm checks if (O \ D1) ∪ B ∪ {B(w)} is inconsistent. Since the condition is satisfied, the diagnosis D1 is added to the set DN2. The second diagnosis D4 is added to the set DP2 as it satisfies the first requirement (O \ D4) ∪ B |= {B(w)}. The resulting partition ⟨{B(w)}, {D2, D3, D4}, {D1}, ∅⟩ is added to the set PR.

However, a query need not include all of the entailed sentences. If a query Q partitions the set of diagnoses into DP, DN and D∅ and an (irreducible) subset Q′ ⊂ Q exists which preserves the partition, then it is sufficient to query Q′. In our example, Q2 : {B(w), C(w)} can be reduced to its subset Q′2 : {C(w)}. If there are multiple irreducible subsets that preserve the partition then we select one of them.

All of the queries and their corresponding partitions generated in Example 15.1 are presented in Table 15.2. Given these queries the debugger has to decide which one should be asked first in order to minimize the number of queries to be answered. A popular query selection heuristic (called “split-in-half”) prefers queries which allow half of the diagnoses to be removed from the set D regardless of the answer of an oracle.

Using the data presented in Table 15.2, the “split-in-half” heuristic determines that asking the oracle if (Ot |= {C(w)}) is the best query (i.e. the reduced query Q2), as two diagnoses from the set D are removed regardless of the answer. Assuming that D1 is the target diagnosis, an oracle will answer no to our question (i.e. Ot ⊭ {C(w)}). Based on this feedback, the diagnoses D3 and D4 are removed according to Property 2. Given the updated set of diagnoses D and P = {{C(w)}} the partitioning algorithm returns the only partition ⟨{B(w)}, {D2}, {D1}, ∅⟩. The heuristic then selects the query {B(w)}, which is also answered with no by the oracle. Consequently, D1 is identified as the only remaining minimal diagnosis.

In general, if n is the number of diagnoses and we can split the set of diagnoses in half with each query, then the minimum number of queries is log2 n. Note that this minimum number of queries can only be achieved when all minimal diagnoses are considered at once, which is intractable even for relatively small values of n.
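The selection made by the heuristic can be illustrated with a short sketch. The scoring function below, | |DP| − |DN| | + |D∅|, is one plausible formalization of “split-in-half” and is an assumption of this sketch; the three partitions are those described in the text for the reduced queries of Example 15.1.

```python
def split_in_half_score(dp, dn, d0):
    """Lower is better: balanced partitions without unpredicting diagnoses score 0."""
    return abs(len(dp) - len(dn)) + len(d0)


# Partitions of the reduced queries of Example 15.1 as described in the text.
queries = {
    "B(w)": ({"D2", "D3", "D4"}, {"D1"}, set()),
    "C(w)": ({"D3", "D4"}, {"D1", "D2"}, set()),
    "D(w)": ({"D4"}, {"D1", "D2", "D3"}, set()),
}

scores = {q: split_in_half_score(*parts) for q, parts in queries.items()}
best = min(scores, key=scores.get)
print(best, scores)
```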

However, in case probabilities of diagnoses are known we can reduce the number of queries by utilizing two effects:

image

Table 15.2: Possible queries in Example 15.1

1. We can exploit diagnoses probabilities to assess the likelihood of each answer and the expected value of the information contained in the set of diagnoses after an answer is given.

2. Even if multiple diagnoses remain, further query generation may not be required if one diagnosis is highly probable and all other remaining diagnoses are highly improbable.

Example 15.2 Consider an ontology O with the terminology T :

image

and the background theory containing the assertions  A : {A1(w), A1(u), s(u, w)}.

The ontology along with the background theory is inconsistent, and the set of minimal conflict sets is CS = {⟨ax 1, ax 3, ax 4⟩, ⟨ax 1, ax 2, ax 3, ax 5⟩}. To restore consistency, the user should modify all axioms of at least one minimal diagnosis:

image

Following the same approach as in Example 15.1, we compute a set of possible queries and corresponding partitions using the algorithm presented above. A set of possible irreducible queries for Example 15.2 and their partitions are presented in Table 15.3. These queries partition the set of diagnoses D in a way that makes the application of myopic strategies, such as “split-in-half”, inefficient. A greedy algorithm based on such a heuristic would first select the first query Q1, since there is no query that cuts the set of diagnoses in half. If D4 is the target diagnosis then Q1 will be answered with yes by an oracle (see Figure 15.1). In the next iteration the algorithm would also choose a suboptimal query, the first untried query Q2, since there is no partition that divides the diagnoses D1, D2, and D4 into two groups of equal size. Once again, the oracle answers yes, and the algorithm identifies query Q4 to differentiate between D1 and D4.

image

Table 15.3: Possible queries in Example 15.2

However, in real-world settings the assumption that all axioms fail with the same probability rarely holds. For example, Roussey et al. [RCVB09] present a list of “anti-patterns”, where an anti-pattern is a set of axioms, such as {C1 ⊑ ∀R.C2, C1 ⊑ ∀R.C3, C2 ≡ ¬C3}, that corresponds to a minimal conflict set. The study performed by [RCVB09] shows that such conflict sets often occur in practice due to frequent misuse of certain language constructs like quantification or disjointness. Such studies are ideal sources for estimating prior fault probabilities. However, this is beyond the scope of our work presented in this part.

Our approach for computing the prior fault probabilities of axioms is inspired by [RDH+04] and considers the syntax of a knowledge representation language, such as restrictions, conjunction, negation, etc. For instance, if a user frequently changes the universal to the existential quantifier and vice versa in order to restore coherency, then we can assume that axioms including such restrictions are more likely to fail than the other ones. In [RDH+04] the authors report that in most cases inconsistent ontologies are created because users (a) mix up ∀r.S and ∃r.S, (b) mix up ¬∃r.S and ∃r.¬S, (c) mix up ⊔ and ⊓, (d) wrongly assume that classes are disjoint by default or overuse disjointness, or (e) wrongly apply negation. Observing that misuses of quantifiers are more likely than other failure patterns, one might find that the axioms ax 2 and ax 4 are more likely to be faulty than ax 3 (because of the use of quantifiers), whereas ax 3 is more likely to be faulty than ax 5 and ax 1 (because of the use of negation).

Detailed justifications of diagnosis probabilities are given in the next section. However, let us assume some probability distribution of the faults according to the observations presented above such that: (a) the diagnosis D2 is the most probable one, i.e. a single fault diagnosis of an axiom containing a negation; (b) although D4 is a double fault diagnosis, it follows D2 closely as its axioms contain quantifiers; (c) D1 and D3 are significantly less probable than D4 because conjunction/disjunction in ax 1 and ax 5 have a significantly lower fault probability than negation in ax 3. Taking this information into account, asking query Q1 is essentially useless because it is highly probable that the target diagnosis is either D2 or D4 and, therefore, it is highly probable that the oracle will respond with yes. Instead, asking Q3 is more informative because regardless of the answer we can exclude one of the highly probable diagnoses, i.e. either D2 or D4. If the oracle responds to Q3 with no then D2 is the only remaining diagnosis. However, if the oracle responds with yes, diagnoses D4, D3, and D1 remain, where D4 is significantly more probable than diagnoses D3 and D1. If the difference between the probabilities of the diagnoses is high enough such that D4 can be accepted as the target diagnosis, no additional questions are required. This strategy can lead to a substantial reduction in the number of queries compared to myopic approaches, as we demonstrate in our evaluation.

Note that in real-world application scenarios failure patterns and their probabilities can be discovered by analyzing the debugging actions of a user in an ontology editor, like Protégé. Learning of fault probabilities can be used to “personalize” the query selection algorithm to prefer user-specific faults.

image

Figure 15.1: The search tree of the greedy algorithm

However, as our evaluation shows, even a rough estimate of the probabilities is capable of outperforming the “split-in-half” heuristic.

To select the best query we exploit a-priori failure probabilities of each axiom derived from the syntax of description logics or some other knowledge representation language, such as OWL. That is, the user is able to specify their own beliefs in terms of the probability of syntax elements such as ∀, ∃, ⊓, etc. being erroneous; alternatively, the debugger can compute these probabilities by analyzing the frequency of various syntax elements in the target diagnoses of different debugging sessions. If no failure information is available then the debugger can initialize all of the probabilities with some small value. Compared to statistically well-founded probabilities, the latter approach provides a suboptimal but useful diagnosis discrimination process, as discussed in the evaluation.

Given the failure probabilities of all syntax elements se ∈ S of a knowledge representation language used in O, we can compute the failure probability of an axiom ax i ∈ O as

$$ p(ax_i) = p(F_{se_1} \cup F_{se_2} \cup \dots \cup F_{se_n}) $$

where Fse1 . . . Fsen represent the events that the occurrence of a syntax element sej in ax i is faulty. E.g. for ax 2 of Example 15.2, p(ax 2) = p(F⊑ ∪ F¬ ∪ F∃ ∪ F⊓ ∪ F∃). Assuming that each occurrence of a syntax element fails independently, i.e. an erroneous usage of a syntax element sek makes it neither more nor less probable that an occurrence of syntax element sej is faulty, the failure probability of an axiom is computed as:

$$ p(ax_i) = 1 - \prod_{se_j \in S} \bigl(1 - p(F_{se_j})\bigr)^{c(se_j)} $$

where c(sej) returns the number of occurrences of the syntax element sej in an axiom ax i. If among other failure probabilities the user states that p(F⊑) = 0.001, p(F¬) = 0.01, p(F∃) = 0.05 and p(F⊓) = 0.001, then p(ax 2) = p(F⊑ ∪ F¬ ∪ F∃ ∪ F⊓ ∪ F∃) = 0.108.

Given the failure probabilities p(ax i) of axioms, the diagnosis algorithm first calculates the a-priori probability p(Dj) that Dj is the target diagnosis. Since all axioms fail independently, this probability can be computed as [dKW87]:
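The value 0.108 can be reproduced with a few lines of code. The sketch assumes the independence of syntax element occurrences stated above; the function name and the plain-text names of the syntax elements are hypothetical, and the occurrence counts for ax 2 follow the union given in the text (one ⊑, one ¬, one ⊓ and two ∃).

```python
from math import prod


def axiom_failure_probability(counts, p_fail):
    """1 minus the probability that no occurrence of any syntax element is faulty."""
    return 1.0 - prod((1.0 - p_fail[se]) ** c for se, c in counts.items())


# Failure probabilities of syntax elements as stated by the user in the text.
p_fail = {"subclass": 0.001, "not": 0.01, "exists": 0.05, "and": 0.001}

# Occurrences in ax 2: F⊑ ∪ F¬ ∪ F∃ ∪ F⊓ ∪ F∃.
ax2_counts = {"subclass": 1, "not": 1, "exists": 2, "and": 1}

p_ax2 = axiom_failure_probability(ax2_counts, p_fail)
print(round(p_ax2, 3))
```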

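$$ p(D_j) = \prod_{ax_n \in D_j} p(ax_n) \prod_{ax_m \in O \setminus D_j} \bigl(1 - p(ax_m)\bigr) \qquad (16.2) $$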

The prior probabilities for diagnoses are then used to initialize an iterative algorithm that includes two main steps: (a) the selection of the best query and (b) updating the diagnoses probabilities given query feedback.

According to information theory the best query is the one that, given the answer of an oracle, minimizes the expected entropy of the set of diagnoses [dKW87]. Let p(Qi = yes) be the probability that query Qi is answered with yes and p(Qi = no) be the probability for the answer no. Furthermore, let p(Dj | Qi = yes) be the probability of diagnosis Dj after the oracle answers yes and p(Dj | Qi = no) be the probability after the oracle answers no. The expected entropy after querying Qi is:

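$$ H_e(Q_i) = \sum_{v \in \{yes,\, no\}} p(Q_i = v) \Bigl( - \sum_{D_j \in D} p(D_j \mid Q_i = v) \log_2 p(D_j \mid Q_i = v) \Bigr) $$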

Based on a one-step-look-ahead information theoretic measure, the query which minimizes the expected entropy is considered best. This formula can be simplified to the following score function [dKW87] which we use to evaluate all available queries and select the one with the minimum score to maximize information gain:

$$ sc(Q_i) = \sum_{v} p(Q_i = v) \log_2 p(Q_i = v) + p(D^\emptyset_i) + 1 \qquad (16.3) $$

where v ∈ {yes, no} is the feedback of an oracle and D∅i is the set of diagnoses which do not make any predictions for the query Qi. The probability of the set of diagnoses p(D∅i), as well as of any other set of diagnoses Di like DPi and DNi, is computed as:

$$ p(\mathbf{D}_i) = \sum_{D_j \in \mathbf{D}_i} p(D_j) $$

because by Definition 28.2, each diagnosis uniquely partitions all of the axioms of an ontology O into two sets, correct and faulty, and thus all diagnoses are mutually exclusive events.

Since, for a query Qi, the set of diagnoses D can be partitioned into the sets DPi, DNi and D∅i, the probability that an oracle will answer a query Qi with either yes or no can be computed as:

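$$ p(Q_i = yes) = p(D^P_i) + p(D^\emptyset_i)/2, \qquad p(Q_i = no) = p(D^N_i) + p(D^\emptyset_i)/2 \qquad (16.4) $$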

Clearly this assumes that for each diagnosis of D∅i both outcomes are equally likely and thus the probability that the set of diagnoses D∅i predicts either Qi = yes or Qi = no is p(D∅i)/2.

Following feedback v for a query Qs, i.e. Qs = v, the probabilities of the diagnoses must be updated to take the new information into account. The update is made using Bayes’ rule for each Dj ∈ D:
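To make the expected entropy concrete, the sketch below computes it directly from its definition, using the answer probabilities described above and the convention that diagnoses in D∅ are split evenly between both answers. The function names are hypothetical, and the uniform prior of 0.25 per diagnosis is chosen here for illustration; the two partitions are those given in the text for the queries {B(w)} and {C(w)} of Example 15.1.

```python
from math import log2


def entropy(dist):
    return -sum(q * log2(q) for q in dist.values() if q > 0)


def answer_probabilities(p, dp, dn, d0):
    p_d0 = sum(p[d] for d in d0)
    return {"yes": sum(p[d] for d in dp) + p_d0 / 2,
            "no": sum(p[d] for d in dn) + p_d0 / 2}


def expected_entropy(p, dp, dn, d0):
    total = 0.0
    for answer, p_answer in answer_probabilities(p, dp, dn, d0).items():
        if p_answer == 0:
            continue
        agree = dp if answer == "yes" else dn
        # Posterior of each diagnosis given the answer (Bayes' rule).
        posterior = {
            d: (1.0 if d in agree else 0.5 if d in d0 else 0.0) * p[d] / p_answer
            for d in p
        }
        total += p_answer * entropy(posterior)
    return total


p = {"D1": 0.25, "D2": 0.25, "D3": 0.25, "D4": 0.25}
h_q1 = expected_entropy(p, {"D2", "D3", "D4"}, {"D1"}, set())  # query {B(w)}
h_q2 = expected_entropy(p, {"D3", "D4"}, {"D1", "D2"}, set())  # query {C(w)}
print(h_q1, h_q2)
```

Under these assumptions the balanced query {C(w)} leaves less expected uncertainty than {B(w)}. A short calculation also suggests that the expected entropy differs from the score function only by a term that does not depend on the query, namely the entropy of the current distribution minus 1, so both rank queries identically.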

$$ p(D_j \mid Q_s = v) = \frac{p(Q_s = v \mid D_j)\, p(D_j)}{p(Q_s = v)} \qquad (16.5) $$

where the denominator p(Qs = v) is known from the query selection step (Equation 16.4) and p(Dj) is either a prior probability (Equation 16.2) or a probability calculated using Equation 16.5 after a previous iteration of the debugging algorithm. We assign p(Qs = v | Dj) as follows:

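$$
p(Q_s = v \mid D_j) =
\begin{cases}
1, & \text{if } D_j \text{ predicted } Q_s = v; \\
0, & \text{if } D_j \text{ is rejected by } Q_s = v; \\
\tfrac{1}{2}, & \text{if } D_j \in D^\emptyset_s
\end{cases}
$$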

image

Table 16.1: Expected scores for minimized queries (p(ax i) = 0.01)

image

Table 16.2: Expected scores for minimized queries (p(ax 1) = 0.025, p(ax 2) = p(ax 3) = p(ax 4) = 0.01)

Example 16.1 (Example 15.1 continued) Suppose that the debugger is not provided with any information about possible failures and therefore assumes that all syntax elements fail with the same probability 0.01 and therefore p(ax i) = 0.01 for all ax i ∈ O. Using Equation 16.2 we can calculate probabilities for each diagnosis. For instance, D1 suggests that only one axiom ax 1 should be modified by the user. Hence, we can calculate the probability of diagnosis D1 as p(D1) = p(ax 1)(1 − p(ax 2))(1 − p(ax 3))(1 − p(ax 4)) = 0.0097. All other minimal diagnoses have the same probability, since every other minimal diagnosis suggests the modification of one axiom. To simplify the discussion we only consider minimal diagnoses for query selection. Therefore, the prior probabilities of the diagnoses can be normalized to p(Dj) = p(Dj) / ΣDj∈D p(Dj) and are equal to 0.25.
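The prior computation and the normalization in this example can be sketched as follows. The function names are hypothetical, and the assignment of one axiom to each of D2, D3 and D4 is an assumption made for illustration: the text only states that every minimal diagnosis here contains exactly one axiom.

```python
from math import prod


def diagnosis_prior(diagnosis, p_ax):
    """Axioms in the diagnosis are faulty, all other axioms are correct."""
    return prod(p if ax in diagnosis else 1.0 - p for ax, p in p_ax.items())


def normalize(priors):
    total = sum(priors.values())
    return {d: p / total for d, p in priors.items()}


p_ax = {"ax1": 0.01, "ax2": 0.01, "ax3": 0.01, "ax4": 0.01}

# Assumed single-axiom diagnoses (only D1 = {ax1} is stated in the text).
diagnoses = {"D1": {"ax1"}, "D2": {"ax2"}, "D3": {"ax3"}, "D4": {"ax4"}}

priors = {name: diagnosis_prior(d, p_ax) for name, d in diagnoses.items()}
normalized = normalize(priors)
print(round(priors["D1"], 4), normalized)
```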

Given the prior probabilities of the diagnoses and a set of queries (see Table 15.2) we evaluate the score function (Equation 16.3) for each query. E.g. for the first query Q1 : {B(w)} the probability p(D∅) = 0 and the probabilities of both the positive and negative outcomes are p(Q1 = 1) = p(D2) + p(D3) + p(D4) = 0.75 and p(Q1 = 0) = p(D1) = 0.25. Therefore the query score is sc(Q1) = 0.1887.
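The score sc(Q1) = 0.1887 can be checked numerically. The sketch below assumes that the score is the sum of p(Q = v) log2 p(Q = v) over both answers plus p(D∅) plus 1, which reproduces the value stated above; the function name is hypothetical and the partitions are those described in the text for Example 15.1.

```python
from math import log2


def query_score(p, dp, dn, d0):
    """Score of a query; lower values indicate a more informative query."""
    p_d0 = sum(p[d] for d in d0)
    p_yes = sum(p[d] for d in dp) + p_d0 / 2
    p_no = sum(p[d] for d in dn) + p_d0 / 2
    return sum(q * log2(q) for q in (p_yes, p_no) if q > 0) + p_d0 + 1


p = {"D1": 0.25, "D2": 0.25, "D3": 0.25, "D4": 0.25}
sc_q1 = query_score(p, {"D2", "D3", "D4"}, {"D1"}, set())  # {B(w)}
sc_q2 = query_score(p, {"D3", "D4"}, {"D1", "D2"}, set())  # {C(w)}
sc_q3 = query_score(p, {"D4"}, {"D1", "D2", "D3"}, set())  # {D(w)}
print(round(sc_q1, 4), sc_q2, round(sc_q3, 4))
```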

The scores computed during the initial stage (see Table 16.1) suggest that Q2 is the best query. Taking into account that D1 is the target diagnosis, the oracle answers no to the query. The additional information obtained from the answer is then used to update the probabilities of diagnoses using Equation 16.5. Since D1 and D2 predicted this answer, their probabilities are updated to p(D1) = p(D2) = 0.25/p(Q2 = 0) = 0.5. The probabilities of diagnoses D3 and D4, which are rejected by the oracle’s answer, are also updated: p(D3) = p(D4) = 0.

In the next iteration the algorithm recomputes the scores using the updated probabilities. The results show that Q1 is the best query. The other two queries Q2 and Q3 are irrelevant since no information will be gained if they are asked. Given the oracle’s negative feedback to Q1, we update the probabilities to p(D1) = 1 and p(D2) = 0. In this case the target diagnosis D1 was identified using the same number of steps as the “split-in-half” heuristic.

However, if the user specifies that the first axiom is more likely to fail, e.g. p(ax 1) = 0.025, then Q1 : {B(w)} will be selected first (see Table 16.2). The recalculation of the probabilities given the negative outcome Q1 = 0 sets p(D1) = 1 and p(D2) = p(D3) = p(D4) = 0. Therefore the debugger identifies the target diagnosis in only one step.
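The update step can be sketched with Bayes' rule, assuming a likelihood of 1 for diagnoses that predicted the answer, 0 for rejected diagnoses and 1/2 for diagnoses that make no prediction. The function name is hypothetical; the partitions and answers are those of the example.

```python
def update(p, dp, dn, d0, answer):
    """Posterior probabilities of all diagnoses after an answer ('yes' or 'no')."""
    agree = dp if answer == "yes" else dn
    likelihood = {d: 1.0 if d in agree else 0.5 if d in d0 else 0.0 for d in p}
    p_answer = sum(likelihood[d] * p[d] for d in p)
    return {d: likelihood[d] * p[d] / p_answer for d in p}


p0 = {"D1": 0.25, "D2": 0.25, "D3": 0.25, "D4": 0.25}

# The oracle answers no to {C(w)}: D3 and D4 are rejected.
p1 = update(p0, {"D3", "D4"}, {"D1", "D2"}, set(), "no")

# A subsequent negative answer to {B(w)} rejects D2 as well.
p2 = update(p1, {"D2"}, {"D1"}, set(), "no")
print(p1, p2)
```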

Example 16.2 (Example 15.2 continued) Suppose that in ax 4 the user specified ∀s.A instead of ∃s.A and ¬∃s.M3 instead of ∃s.¬M3 in ax 2. Therefore D4 is the target diagnosis. Moreover, assume that the debugger is provided with observations of three types of faults: (1) conjunction/disjunction occurs with probability p1 = 0.001, (2) negation p2 = 0.01, and (3) restrictions p3 = 0.05. Using Equation 16.1 we can calculate the probability of the axioms containing an error: p(ax 1) = 0.0019, p(ax 2) = 0.1074, p(ax 3) = 0.012, p(ax 4) = 0.051, and p(ax 5) = 0.001. These probabilities are exploited to calculate the prior probabilities of the diagnoses (see Table 16.3) and to initialize the query selection process. To simplify matters we focus on the set of minimal diagnoses.

image

Table 16.3: Probabilities of diagnoses after answers

image

Table 16.4: Expected scores for queries

In the first iteration the algorithm determines that Q3 is the best query and asks the oracle whether Ot |= {M1 ⊑ B} is true or not (see Table 16.4). The obtained information is then used to recalculate the probabilities of the diagnoses and to compute the next best query, i.e. Q4, and so on. The query process stops after the third query, since D4 is the only diagnosis that has the probability p(D4) > 0.

Given the feedback of the oracle Q4 = yes for the second query, the updated probabilities of the diagnoses show that the target diagnosis has a probability of p(D4) = 0.9918 whereas p(D3) is only 0.0082. In order to reduce the number of queries a user can specify a threshold, e.g. σ = 0.95. If the absolute difference in probabilities of the two most probable diagnoses is greater than this threshold, the query process stops and returns the most probable diagnosis. Therefore, in this example the debugger based on the entropy query selection requires fewer queries than the “split-in-half” heuristic. Note that already after the first answer Q3 = yes the most probable diagnosis D4 is three times more likely than the second most probable diagnosis D1. Given such a great difference we could suggest stopping the query process after the first answer if the user set σ = 0.65.
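The acceptance test based on σ can be written in a few lines. This is a sketch with a hypothetical function name; the probabilities are those reported in the text after the answer Q4 = yes, with the probabilities of D1 and D2 assumed to be 0 at that point.

```python
def should_stop(probabilities, sigma):
    """True if the gap between the two most probable diagnoses exceeds sigma."""
    ranked = sorted(probabilities.values(), reverse=True)
    if len(ranked) < 2:
        return True
    return ranked[0] - ranked[1] > sigma


after_q4 = {"D1": 0.0, "D2": 0.0, "D3": 0.0082, "D4": 0.9918}
print(should_stop(after_q4, sigma=0.95))
```

With these values the gap is 0.9836, which exceeds σ = 0.95, so the most probable diagnosis D4 would be returned without asking a third query.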

The iterative ontology debugger (Algorithm 11) takes a faulty ontology O as input. Optionally, a user can provide a set of axioms B that are known to be correct as well as a set P of axioms that must be entailed by the target ontology and a set N of axioms that must not. If these sets are not given, the corresponding input arguments are initialized with ∅. Moreover, the algorithm takes a set FP of fault probabilities for axioms ax i ∈ O, which can be computed as described in Chapter 16 by exploiting knowledge about typical user errors. Alternatively, if no estimates of such probabilities are available, all probability values can be initialized using a small constant. We show the results of such a strategy in our evaluation section. The two other arguments σ and n are used to improve the performance of the algorithm. σ specifies the diagnosis acceptance threshold, i.e. the minimum difference in probabilities between the most likely and second-most likely diagnoses. The parameter n defines the maximum number of most probable diagnoses that should be considered by the algorithm during each iteration. A further performance gain in Algorithm 11 can be achieved if we approximate the set of the n most probable diagnoses with the set of the n most probable minimal diagnoses, i.e. we neglect non-minimal diagnoses. We call this set of at most n most probable minimal diagnoses the leading diagnoses. Note that, under the reasonable assumption that the fault probability of each axiom p(ax i) is less than 0.5, for every non-minimal diagnosis ND a minimal diagnosis D ⊂ ND exists which, by Equation 16.2, is more probable than ND. Consequently the query selection algorithm presented here operates on the set of minimal diagnoses instead of all diagnoses (i.e. non-minimal diagnoses are excluded). However, the algorithm can be adapted with moderate effort to also consider non-minimal diagnoses.

We use the approach proposed by Friedrich et al. [FS05] to compute diagnoses and employ the combination of two algorithms, QUICKXPLAIN [Jun04] and HS-TREE [Rei87]. In a standard implementation the latter is a breadth-first search algorithm that takes an ontology O, sets P and N, and the maximum number of most probable minimal diagnoses n as an input. The algorithm generates minimal hitting sets using minimal conflict sets, which are computed on demand. This is motivated by the fact that in some circumstances a subset of all minimal conflict sets is sufficient for generating a subset of all required minimal diagnoses. For instance, suppose that in Example 15.2 the user wants to compute only n = 2 leading minimal diagnoses and a minimal conflict search algorithm returns CS1. In this case HS-TREE identifies the two required minimal diagnoses D1 and D2 and avoids the computation of the minimal conflict set CS2. Of course, in the worst case, when all minimal diagnoses have to be computed, the algorithm should compute all minimal conflict sets. In addition, the HS-TREE generation reuses minimal conflict sets in order to avoid unnecessary computations. In the real-world scenarios we evaluated (see Table 18.1), the faulty ontologies contained fewer than 10 minimal conflict sets with at most 13 elements each, while the maximal cardinality of minimal diagnoses was observed to be at most 9. Therefore, space limitations were not a problem for the breadth-first generation. However, for scenarios involving diagnoses of greater cardinalities iterative-deepening strategies could be applied.

image

In our implementation of HS-TREE we use the uniform-cost search strategy. Given additional information in terms of axiom fault probabilities FP, the algorithm expands a leaf node in a search-tree if it is an element of the path corresponding to the maximum probability hitting set of minimal conflict sets computed so far. The probability of each minimal hitting set can be computed using Equation 16.2. Consequently, the algorithm computes a set of diagnoses ordered by their probability starting from the most probable one. HS-TREE terminates if either the n most probable minimal diagnoses are identified or no further minimal diagnoses can be found. Thus the algorithm computes at most n minimal diagnoses regardless of the number of all minimal diagnoses.
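The uniform-cost idea can be illustrated with a simplified sketch. It is not the authors' HS-TREE: all minimal conflict sets are assumed to be given up front instead of being computed on demand, and the function names are hypothetical. The conflict sets and axiom probabilities are those stated for Example 15.2 and Example 16.2, and every fault probability is assumed to be below 0.5, so that adding an axiom to a set always lowers its probability.

```python
import heapq
from itertools import count
from math import prod


def set_probability(axioms, p_ax):
    """Probability that exactly the given axioms are faulty."""
    return prod(p if ax in axioms else 1.0 - p for ax, p in p_ax.items())


def most_probable_diagnoses(conflicts, p_ax, n):
    """Best-first enumeration of at most n minimal hitting sets by probability."""
    tie = count()  # tie-breaker so that the heap never compares frozensets
    start = frozenset()
    heap = [(-set_probability(start, p_ax), next(tie), start)]
    seen = {start}
    found = []
    while heap and len(found) < n:
        neg_p, _, node = heapq.heappop(heap)
        # Proper supersets of a diagnosis found earlier cannot be minimal.
        if any(d < node for d, _ in found):
            continue
        unhit = next((cs for cs in conflicts if not (node & cs)), None)
        if unhit is None:
            found.append((node, -neg_p))
            continue
        for ax in unhit:
            child = node | {ax}
            if child not in seen:
                seen.add(child)
                heapq.heappush(heap, (-set_probability(child, p_ax), next(tie), child))
    return found


conflicts = [frozenset({"ax1", "ax3", "ax4"}),
             frozenset({"ax1", "ax2", "ax3", "ax5"})]
p_ax = {"ax1": 0.0019, "ax2": 0.1074, "ax3": 0.012, "ax4": 0.051, "ax5": 0.001}

ranked = most_probable_diagnoses(conflicts, p_ax, n=4)
for diagnosis, prob in ranked:
    print(sorted(diagnosis), prob)
```

Under these assumptions the diagnoses are returned in decreasing order of probability, starting with the single-fault diagnosis on ax 3 and followed by the double-fault diagnosis on ax 2 and ax 4, in line with the ranking discussed for this example.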

HS-TREE uses QUICKXPLAIN to compute required minimal conflicts. This algorithm, given a set of axioms AX and a set of correct axioms B, returns a minimal conflict set CS ⊆ AX, or ∅ if the axioms AX ∪ B are consistent. In the worst case, to compute a minimal conflict QUICKXPLAIN performs 2k(log(s/k) + 1) consistency checks, where k is the size of the generated minimal conflict set and s is the number of axioms in the ontology. In the best case only log(s/k) + 2k checks are performed [Jun04]. Importantly, the size of the ontology is contained in the log function. Therefore, the time needed for consistency checks in our test ontologies remained below 0.2 seconds, even for real-world knowledge bases with thousands of axioms. The maximum time to compute a minimal conflict was observed in the Sweet-JPL ontology and took approx. 5 seconds (see Table 18.2).

In order to take past answers into account, HS-TREE updates the prior probabilities of the diagnoses by evaluating Equation 16.5. All required data is stored in the query history QH as well as in the sets P and N. When complete, HS-TREE returns a set of tuples of the form ⟨Di, p(Di)⟩ where Di is contained in the set of the n most probable minimal diagnoses (leading diagnoses) and p(Di) is its probability calculated using Equation 16.2 and Equation 16.5.
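The divide-and-conquer scheme of QUICKXPLAIN can be sketched as follows, following the published description of the algorithm [Jun04]. The consistency check is a stand-in for a reasoner call: the hypothetical function `toy_is_consistent` declares a set of axioms inconsistent iff it contains one of the minimal conflict sets stated for Example 15.2.

```python
def quickxplain(background, axioms, is_consistent):
    """Return a minimal conflict set within `axioms`, or [] if none exists."""
    if is_consistent(background + axioms) or not axioms:
        return []
    return _qx(background, [], axioms, is_consistent)


def _qx(background, delta, candidates, is_consistent):
    # If the last addition already made the background inconsistent,
    # no candidate is needed to explain the conflict.
    if delta and not is_consistent(background):
        return []
    if len(candidates) == 1:
        return list(candidates)
    half = len(candidates) // 2
    c1, c2 = candidates[:half], candidates[half:]
    d2 = _qx(background + c1, c1, c2, is_consistent)
    d1 = _qx(background + d2, d2, c1, is_consistent)
    return d1 + d2


# Stand-in for a reasoner, based on the conflict sets of Example 15.2.
KNOWN_CONFLICTS = [{"ax1", "ax3", "ax4"}, {"ax1", "ax2", "ax3", "ax5"}]


def toy_is_consistent(axioms):
    return not any(cs <= set(axioms) for cs in KNOWN_CONFLICTS)


conflict = quickxplain([], ["ax1", "ax2", "ax3", "ax4", "ax5"], toy_is_consistent)
print(sorted(conflict))
```

Which of several minimal conflict sets is returned depends on the order of the axioms in the input list.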

In the query-selection phase Algorithm 11 calls the SELECTQUERY function (Algorithm 12) to generate a tuple T = ⟨Q, DP, DN, D∅⟩, where Q is the minimum score query (Equation 16.3) and DP, DN and D∅ are the sets of diagnoses constituting the partition. The generation algorithm carries out a depth-first search, removing the top element of the set D and calling itself recursively to generate all possible subsets of the leading diagnoses. The set of leading diagnoses D is extracted from the set of tuples DP by the GETDIAGNOSES function. In each leaf node of the search tree the GENERATE function calls CREATEQUERY, which creates a query given a set of diagnoses DP by computing common entailments and partitioning the set of diagnoses D \ DP, as described in Section 15. If a query for the set DP does not exist (i.e. there are no common entailments) or DP = ∅ then CREATEQUERY returns an empty tuple T = ⟨∅, ∅, ∅, ∅⟩. In all inner nodes of the tree the algorithm selects a tuple that corresponds to a query with the minimum score as found using the GETSCORE function. This function may implement the entropy-based measure (Equation 16.3), “split-in-half” or any other preference criteria. Given an empty tuple T = ⟨∅, ∅, ∅, ∅⟩ the function returns the highest possible score of the measure used. In general, CREATEQUERY is called 2^n times, where we set n = 9 in our evaluation. Furthermore, for each leading diagnosis not in DP, CREATEQUERY has to check if the associated query is entailed. If a query is not entailed, a consistency check has to be performed. Entailments are determined by classification/realization and a subset check of the generated sentences. Common entailments are computed by exploiting the intersection of entailments for each diagnosis contained in DP. Note that the entailments for each leading diagnosis are computed just once and reused in subsequent calls of CREATEQUERY.

image

In the function MINIMIZEQUERY, the query Q of the resulting tuple ⟨Q, DP, DN, D∅⟩ is iteratively reduced by applying QUICKXPLAIN such that the sets DP, DN and D∅ are preserved. This is implemented by replacing the consistency checks performed by QUICKXPLAIN with checks that ensure that the reduction of the query preserves the partition. In order to check if a partition is preserved, a consistency/entailment check is performed for each element in DN and D∅. Elements of DP need not be checked because these elements entail the query and therefore any reduction. In the worst case n(2k log(s/k) + 2k) consistency checks have to be performed in MINIMIZEQUERY, where k is the length of the minimized query. Entailments of leading diagnoses are reused.

Algorithm 11 invokes the function GETQUERY to obtain the query from the tuple stored in T and calls GETANSWER to query the oracle. Depending on the answer, Algorithm 11 extends either the set P or the set N and thus excludes diagnoses not compliant with the query answer from the results of HS-TREE in further iterations. Note that the algorithm can be easily adapted to allow the oracle to reject a query if the answer is unknown. In this case the algorithm proceeds with the next best query (w.r.t. the GETSCORE function) until no further queries are available. Algorithm 11 stops if the difference in the probabilities of the top two diagnoses is greater than the acceptance threshold σ or if no query can be used to differentiate between the remaining diagnoses (i.e. the score of the minimum score query equals the maximum score of the measure used). The most probable diagnosis is then returned to the user. If it is impossible to differentiate between a number of highly probable minimal diagnoses, the algorithm returns a set that includes all of them. Moreover, in the first case (termination due to σ), the algorithm can continue if the user is not satisfied with the returned diagnosis and at least one further query exists.

Additional performance improvements can be achieved by using greedy strategies in Algorithm 12. The idea is to guide the search such that a leaf node of the left-most branch of a search tree contains a set of diagnoses DP that might result in a tuple ⟨Q, DP, DN, D∅⟩ with a low-score query. This method is based on the property of Equation 16.3 that sc(Q) = 0 if

$$ p(D^P) = p(D^N) = 0.5 \quad \text{and} \quad p(D^\emptyset) = 0. $$

Consequently, the query selection problem can be presented as a two-way number partitioning problem: given a set of numbers, divide them into two sets such that the difference between the sums of the numbers in each set is as small as possible. The Complete Karmarkar-Karp (CKK) algorithm [Kor98], which is one of the best algorithms developed for the two-way partitioning problem, corresponds to an extension of the Algorithm 12 with a set differencing heuristic [KKLO86]. The algorithm stops if the optimal solution to the two-way partitioning problem is found or if there are no further subsets to be investigated. In the latter case the best found solution is returned.
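The set differencing heuristic on which CKK builds can be sketched in a few lines. The code shows only the greedy heuristic, i.e. the first branch that a complete search would explore, and not the full CKK algorithm; the function name and the example numbers are hypothetical, with integers standing in for scaled diagnosis probabilities.

```python
import heapq


def karmarkar_karp_difference(numbers):
    """Difference of the two subset sums produced by the differencing heuristic."""
    heap = [-x for x in numbers]  # max-heap via negated values
    heapq.heapify(heap)
    while len(heap) > 1:
        largest = -heapq.heappop(heap)
        second = -heapq.heappop(heap)
        # Committing the two largest numbers to different subsets leaves
        # only their difference to be balanced.
        heapq.heappush(heap, -(largest - second))
    return -heap[0] if heap else 0


print(karmarkar_karp_difference([25, 25, 25, 25]))
print(karmarkar_karp_difference([8, 7, 6, 5, 4]))
```

For the second input the heuristic yields a difference of 2 although a perfect split exists (8 + 7 = 6 + 5 + 4), which illustrates why a complete search over further branches is needed to guarantee an optimal partition.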

The main drawback of applying CKK to the query selection process is that none of the pruning techniques can be used. Also, even if the algorithm finds an optimal solution to the two-way partitioning problem, there might be no query for the set of diagnoses DP that was found. Moreover, since the algorithm is complete it still has to investigate all subsets of the set of diagnoses in order to find the minimum score query. To avoid this exhaustive search we extended CKK with an additional termination criterion: the search stops if a query is found with a score below some predefined threshold γ. In our evaluation section we demonstrate substantial savings by applying the CKK partitioning algorithm.
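The effect of the additional termination criterion can be sketched as follows. This is a simplified illustration, not the extended CKK search: subsets are enumerated by brute force, the score is assumed to be the absolute difference between the probability masses of the two sides (a stand-in for the score of the measure actually used), and `has_query` is a hypothetical predicate standing for the reasoner-based check of whether a query exists for a candidate set DP.

```python
from itertools import combinations


def find_partition(probs, gamma, has_query=lambda subset: True):
    """Return (subset, score) for the first candidate set whose score is
    below gamma; if none is found, return the best candidate seen."""
    names = sorted(probs)
    total = sum(probs.values())
    best = (None, float("inf"))
    for size in range(1, len(names)):
        for subset in combinations(names, size):
            if not has_query(subset):
                continue  # no query exists for this set of diagnoses
            p_in = sum(probs[d] for d in subset)
            score = abs(p_in - (total - p_in))
            if score < best[1]:
                best = (subset, score)
            if score < gamma:
                return best  # early termination: good enough
    return best
```

With probabilities 0.4, 0.3, 0.2 and 0.1 and γ = 0.25, the search already stops at the first singleton; with γ = 0.1 it continues until it reaches an almost perfectly balanced pair.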

To sum up, the proposed method depends on the efficiency of the classification/realization system and consistency/coherency checks given a particular ontology. The number of calls to a reasoning system can be reduced by decreasing the number of leading diagnoses n. However, a larger number of leading diagnoses provides more data for generating the next best query. Consequently, by varying the number of leading diagnoses it is possible to balance runtime with the number of queries needed to isolate the target diagnosis.³²

We evaluated our approach using the real-world ontologies presented in Table 18.1 with the aim of demonstrating its applicability in real-world settings. In addition, we employed generated examples to perform controlled experiments where the number of minimal diagnoses and their cardinality could be varied to make the identification of the target diagnosis more difficult. Finally, we carried out a set of tests using randomly modified large real-world ontologies to provide some insight into the scalability of the suggested debugging method.

For the first test we created a generator which takes a consistent and coherent ontology, a set of fault patterns together with their probabilities, the minimum number of minimum cardinality diagnoses m, and the required cardinality |Dt| of these minimum cardinality diagnoses as inputs. We also assumed that the target diagnosis has cardinality |Dt|. The output of the generator is an alteration of the input ontology for which at least the given number of minimum cardinality diagnoses with the required cardinality exist. Furthermore, to introduce inconsistencies (incoherencies), the generator applies fault patterns randomly to the input ontology depending on their probabilities.

In this experiment we took five fault patterns from a case study reported by Rector et al. [RDH+04] and assigned fault probabilities according to their observations of typical user errors. Thus we assumed that in cases (a) and (b) (see Section 15), where an axiom includes some roles (i.e. property assertions), axiom descriptions are faulty with a probability of 0.025, in cases (c) and (d) with 0.01 and in case (e) with 0.001. In each iteration, the generator randomly selected an axiom to be altered and applied a fault pattern. Following this, another axiom was selected using the concept taxonomy and altered correspondingly to introduce an inconsistency (incoherency). The fault patterns were randomly selected in each step using the probabilities provided above.

image

Table 18.1: Diagnosis results for several of the real-world ontologies presented in [KPHS07]. #C/#P/#I are the number of concepts, properties and individuals in each ontology. #CS/min/max are the number of conflict sets, and their minimum and maximum cardinality. The same notation is used for diagnoses #D/min/max. The ontologies are available upon request.

image

Table 18.2: Min/avg/max time and calls required to compute the nine leading most probable diagnoses as well as all diagnoses for the real-world ontologies. Values are given for each stage, i.e. consistency checking, computation of minimal conflicts and minimal diagnoses, together with the total runtime needed to compute the diagnoses. All time values are 15 trial averages and are given in milliseconds.

For instance, given the description of a randomly selected concept A and the fault pattern “misuse of negation”, we added the construct ⊓ ¬X to the description of A, where X is a new concept name. Next, we randomly selected concepts B and S such that S ⊑ A and S ⊑ B and added ⊓ X to the description of B. During the generation process, we applied the HS-TREE algorithm after each introduction of an incoherency/inconsistency to control two parameters: the minimum number of minimal cardinality diagnoses in the ontology and their cardinality. The generator continues to introduce incoherencies/inconsistencies until the specified parameter values are reached. For instance, if the minimum number of minimum cardinality diagnoses is equal to m = 6 and their cardinality is |Dt| = 4, then the generated ontology will include at least 6 diagnoses of cardinality 4 and possibly some additional number of minimal diagnoses of higher cardinalities.

The resulting faulty ontology as well as the fault patterns and their probabilities were inputs for the ontology debugger. The acceptance threshold σ was set to 0.95 and the number of most probable minimal diagnoses n was set to 9. In addition, one of the minimal diagnoses with the required cardinality was randomly selected as the target diagnosis. Note that the target ontology is not equal to the original ontology, but rather a corrected version of the altered one in which the faulty axioms were repaired by replacing them with their original (correct) versions according to the target diagnosis. The tests were performed using the ontologies bike2 to bike9, bcs3, galen and galen2 from Racer’s benchmark suite³³.

image

Figure 18.1: Average number of queries required to select the target diagnosis Dt with threshold σ = 0.95. Random and “split-in-half” are shown for the cardinality of minimal diagnoses |Dt| = 2.

The average results of the evaluation performed on each test ontology (presented in Figure 18.1) show that the entropy-based approach outperforms the “split-in-half” heuristic as well as the random query selection strategy by more than 50% for the |Dt| = 2 case due to its ability to estimate the probabilities of diagnoses and to stop once the target diagnosis has crossed the acceptance threshold. On average the algorithm required 8 seconds to generate a query. In addition, Figure 18.1 shows that the number of queries required increases as the cardinality of the target diagnosis increases, regardless of the method. Despite this, the entropy-based approach remains better than the “split-in-half” method for diagnoses with increasing cardinality. The approach did, however, require more queries to discriminate between high cardinality diagnoses because in such cases more minimal conflicts were generated. Consequently, the debugger has to consider more minimal diagnoses in order to identify the target one.

For the next test we selected seven real-world ontologies described in Tables 18.1 and 18.2³⁴. Performance of both the entropy-based and “split-in-half” selection strategies was evaluated using a variety of different prior fault probabilities to investigate under which conditions the entropy-based method should be preferred.

In our experiments we distinguished between three different distributions of prior fault probabilities: extreme, moderate and uniform (see Figure 18.2 for an example). The extreme distribution simulates a situation in which very high failure probabilities are assigned to a small number of syntax elements. That is, the provider of the estimates is quite sure that exactly these elements are causing a fault. For instance, it may be well known that a user has problems formulating restrictions in OWL whereas all other elements, such as subsumption and conjunction, are well understood. In the case of a moderate distribution the estimates provide a slight bias towards some syntax elements. This distribution has the same motivation as the extreme one; in this case, however, the probability estimator is less sure about the sources of possible errors in axioms. The extreme and moderate distributions correspond to the exponential distribution with λ = 1.75 and λ = 0.5, respectively. The uniform distribution models the situation where no prior fault probabilities are provided and the system assigns equal probabilities to all syntax elements found in a faulty ontology. Of course the prior probabilities of diagnoses may not reflect the actual situation. Therefore, for each of the three distributions we differentiate between good, average and bad cases. In the good case the estimates of the prior fault probabilities are correct and the target diagnosis is assigned a high probability. The average case corresponds to the situation when the target diagnosis is neither favored nor penalized by the priors. In the bad case the prior distribution is unreasonable and disfavors the target diagnosis by assigning it a low probability.

image

Figure 18.2: Example of prior fault probabilities of syntax elements sampled from extreme, moderate and uniform distributions.

We executed 30 tests for each of the combinations of the distributions and cases with an acceptance threshold σ = 0.85 and a required number of most probable minimal diagnoses n = 9. Each iteration started with the generation of a set of prior fault probabilities of syntax elements by sampling from a selected distribution (extreme, moderate or uniform). Given the priors we computed the set of all minimal diagnoses D of a given ontology and selected the target one according to the chosen case (good, average or bad). In the good case the prior probabilities favor the target diagnosis and, therefore, it should be selected from the diagnoses with high probability. The set of diagnoses was ordered according to their probabilities and the algorithm iterated through the set starting from the most probable element. In the first iteration the most probable minimal diagnosis D1 was added to the set G. In each subsequent iteration j a diagnosis Dj was added to the set G if ∑_{i≤j} p(Di) ≤ 1/3 and to the set A if ∑_{i≤j} p(Di) ≤ 2/3. The obtained set G contained all most probable diagnoses which we considered as good. All diagnoses in the set A \ G were classified as average and the remaining diagnoses D \ A as bad. Depending on the selected case we randomly selected one of the diagnoses as the target from the appropriate set.
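The classification into good, average and bad diagnoses can be sketched as below, reading the two cumulative thresholds as 1/3 and 2/3. The function name `classify_diagnoses` and the dictionary representation of the diagnoses are assumptions made for illustration; the three returned lists correspond to G, A \ G and D \ A.

```python
def classify_diagnoses(probs):
    """Split diagnoses into (good, average, bad) by cumulative probability.

    probs maps a diagnosis name to its probability; the values are
    assumed to sum to 1."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    # The most probable diagnosis is always placed in G.
    good, average, bad = [ranked[0]], [], []
    cumulative = probs[ranked[0]]
    for name in ranked[1:]:
        cumulative += probs[name]
        if cumulative <= 1 / 3:
            good.append(name)      # belongs to G
        elif cumulative <= 2 / 3:
            average.append(name)   # belongs to A \ G
        else:
            bad.append(name)       # belongs to D \ A
    return good, average, bad
```

The target diagnosis for a test run would then be drawn at random from the list matching the selected case, e.g. with `random.choice`.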

The results of the evaluation presented in Table 18.3 show that the entropy-based query selection approach clearly outperforms “split-in-half” in good and average cases for the three probability distributions. The average time required by the debugger to perform such basic operations as consistency checking, computation of minimal conflicts and diagnoses is presented in Table 18.4. The results indicate that on average at most 17 seconds are required to compute up to 9 minimal diagnoses and a query. Moreover, in most cases the number of axioms in a query stays within reasonable bounds, i.e. between 1 and 4 axioms per query.

In the uniform case better results were observed since the diagnoses have different cardinality and structure, i.e. they include different syntax elements. Consequently, even if equal probabilities for all syntax elements (uniform distribution) are given, the probabilities of diagnoses are different. Axioms with a greater number of syntax elements receive a higher fault probability. Also, diagnoses with a smaller cardinality in many cases receive a higher probability. This information provides enough bias to favor the entropy-based method.

In the bad case, where the target diagnosis received a low probability and no information regarding the prior fault probabilities was given, we observed that the performance of the entropy-based method improved as more queries were posed. In particular, in the University ontology the performance is essentially similar (7.27 vs. 7.37) whereas in the Economy and Transportation ontologies the entropy-based method can save an average of two queries.

image

Table 18.3: Minimum, average and maximum number of queries required by the entropy-based and “split-in-half” query selection methods to identify the target diagnosis in real-world ontologies. Ontologies are ordered by the number of diagnoses.

image

Table 18.4: Average time required to compute at most nine minimal diagnoses (DT) and a query (QT) in each iteration, as well as the average number of axioms in a query after minimization (QL). The averages are shown for extreme, moderate and uniform distributions using the entropy-based query selection method. Time is measured in milliseconds.

image

Figure 18.3: Average time/query gain resulting from the application of the extended CKK partitioning algorithm. The whiskers indicate the maximum and minimum possible average gain of queries/time using extended CKK. Values embedded in the figure (three columns, column labels not recoverable): max win Q 0.15, 0.03, 0.19; max loss Q 0.37, 0.14, 0.38; max win T 32%, 34%, 37%; max loss T 33%, 38%, 35%.

“Split-in-half” appears to be particularly inefficient in all three cases (good, average and bad) when applied to ontologies with a large number of minimal diagnoses, such as Economy and Transportation. The main problem is that no stopping criterion can be used with the greedy method as it is unable to provide any ordering on the set of diagnoses. Instead, the method continues until no further queries can be generated, i.e. only one minimal diagnosis exists or there are no discriminating queries. Conversely, the entropy-based method is able to improve its probability estimates using Bayes-updates as more queries are answered and to exploit the differences in the probabilities in order to decide when to stop.
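A minimal sketch of such a Bayes update and of the probability-based stopping test is given below. The update rule is not spelled out in this section, so the likelihoods are an assumption in the style of sequential model-based diagnosis: a diagnosis in DP predicts the answer "yes", a diagnosis in DN predicts "no", and a diagnosis in neither set is taken to predict both answers with probability 1/2. The names `bayes_update` and `should_stop` are hypothetical.

```python
def bayes_update(probs, dp, dn, answer):
    """Posterior diagnosis probabilities after a query answer.

    probs: diagnosis name -> prior probability
    dp, dn: sets of diagnoses predicting a positive / negative answer
    answer: True for "yes", False for "no"
    """
    def likelihood(d):
        if d in dp:
            return 1.0 if answer else 0.0
        if d in dn:
            return 0.0 if answer else 1.0
        return 0.5  # assumption: uncommitted diagnoses predict either answer

    unnormalized = {d: p * likelihood(d) for d, p in probs.items()}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("answer contradicts every remaining diagnosis")
    return {d: v / total for d, v in unnormalized.items()}


def should_stop(probs, sigma):
    """Stop when the two most probable diagnoses differ by more than sigma."""
    ranked = sorted(probs.values(), reverse=True)
    if len(ranked) < 2:
        return True
    return ranked[0] - ranked[1] > sigma
```

With priors 0.5, 0.3 and 0.2, DP = {D1}, DN = {D2} and a positive answer, D2 is ruled out and the posteriors of D1 and D3 become 5/6 and 1/6, which would satisfy a threshold of 0.6 but not one of 0.95.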

The most significant gains are achieved for ontologies with many minimal diagnoses and for the average and good cases, i.e. when the target diagnosis is within the first or second third of the minimal diagnoses ranked by their prior probability. In these cases the entropy-based method can save up to 60% of the queries.

image

Table 18.5: Statistics for the real-world ontologies used in the stress-tests measured for a single random alteration. #CS/min/max are the number of minimal conflict sets, and their minimum and maximum cardinality. The same notation is used for diagnoses #D/min/max. The minimum/average/maximum time required to make a consistency check (Consistency), compute a minimal conflict set (QuickXplain) and a minimal diagnosis are measured in milliseconds. Overall runtime indicates the time required to compute all minimal diagnoses in milliseconds.

image

Table 18.6: Average values measured for extreme, moderate and uniform distributions in each of the good, average and bad cases. #Query is the number of queries required to find the target diagnosis. Overall runtime as well as the time required to compute a query (QT) and at least nine minimal diagnoses (DT) are given in milliseconds. Query length (QL) shows the average number of axioms in a query.

Therefore, we can conclude that even rough estimates of the prior fault probabilities are sufficient, provided that the target diagnosis is not significantly penalized. Even if no fault probabilities are available and there are many minimal diagnoses, the entropy-based method is advantageous. The differences between probabilities of individual syntax elements appear not to influence the results of the query selection process and affect only the number of outliers, i.e. cases in which the diagnosis approach required either few or many queries compared to the average.

Another interesting observation is that often both methods eliminated more than n diagnoses in one iteration. For instance, in the case of the Transportation ontology both methods were able to remove hundreds of minimal diagnoses with a small number of queries. This behavior appears to stem from relations between the diagnoses. That is, the addition of a query to either P or N allows the method to remove not only the diagnoses in sets DP or DN, but also some unobserved diagnoses that were not in any of the sets of n leading diagnoses computed by HS-TREE. Given the sets P and N, HS-TREE automatically invalidates all diagnoses which do not fulfill the requirements (see Definition 28.2).

The extended CKK method presented in Chapter 17 was evaluated in the same settings as the complete Algorithm 12 with acceptance threshold γ = 0.1. The obtained results presented in Figure 18.3 show that the extended CKK method decreases the length of a debugging session by at least 60% while requiring on average 0.1 queries more than Algorithm 12. In some cases (mostly for the uniform distribution) the debugger using CKK search required even fewer queries than Algorithm 12 because of the inherent uncertainty of the domain. The plot of the average time required by Algorithm 12 and CKK to identify the target diagnosis presented in Figure 18.4 shows that the application of the latter can reduce runtime significantly.

image

Figure 18.4: Average time required to identify the target diagnosis using CKK and brute force query selection algorithms.

In the last experiment we tried to simulate an expert developing large real-world ontologies³⁵ as described in Table 18.5. Often in such settings an expert makes small changes to the ontology and then runs the reasoner to verify that the changes are valid, i.e. the ontology is consistent and its entailments are correct. To simulate this scenario we used the generator described in the first experiment to introduce 1 to 3 random changes that would make the ontology incoherent. Then, for each modified ontology, we performed 15 tests using the fault distributions as in the second test. The results obtained by the entropy-based query selection method using CKK for query computation are presented in Table 18.6. These results show that the method can be used for analysis of large ontologies with over 33000 axioms while requiring a user to wait for only a minute to compute the next query.

Despite the range of ontology diagnosis methods available (see [SHCH07, KPHS07, FS05]), to the best of our knowledge no interactive ontology debugging methods, such as our “split-in-half” or entropy-based methods, have been proposed so far. The idea of ranking diagnoses and proposing a target diagnosis is presented in [KPSCG06]. This method uses a number of measures such as: (a) the frequency with which an axiom appears in conflict sets, (b) impact on an ontology in terms of its “lost” entailments when an axiom is modified or removed, (c) ranking of test cases, (d) provenance information about axioms, and (e) syntactic relevance. For each axiom in a conflict set, these measures are evaluated and combined to produce a rank value. These ranks are then used by a modified HS-TREE algorithm to identify diagnoses with a minimal rank. However, the method fails when a target diagnosis cannot be determined reliably with the given a-priori knowledge. In our work, the required information is acquired until the target diagnosis can be identified with confidence. In general, the work of [KPSCG06] can be combined with the ideas presented in our work as axiom ranks can be taken into account together with other observations for calculating the prior probabilities of the diagnoses.

The idea of selecting the next best query based on the expected entropy was exploited in the generation of decision trees in [Qui86] and further refined for selecting measurements in the model-based diagnosis of circuits in [dKW87]. We extend these methods to query selection in the domain of ontology debugging.

In the area of debugging logic programs, Shapiro [Sha83] developed debugging methods based on query answering. Roughly speaking, Shapiro’s method aims to detect one fault at a time by querying an oracle about the intended behavior of a Prolog program at hand. In our terminology, for each answer that must not be entailed this diagnosis approach generates one conflict at a time by exploiting the proof tree of a Prolog program. The method then identifies a query that splits the conflict in half. Our approach can deal with multiple diagnoses and conflicts simultaneously, which can be exploited by query generation strategies such as “split-in-half” and entropy-based methods. Whereas the “split-in-half” strategy splits the set of diagnoses in half, Shapiro’s method focuses on one conflict. Furthermore, the exploitation of failure probabilities is not considered in [Sha83]. However, Shapiro’s method includes the learning of new clauses in order to cover not entailed answers. Interleaving discrimination of diagnoses and learning of descriptions is currently not considered in our approach because of their additional computational costs.

From a general point of view Shapiro’s method can be seen as a prominent example of inductive logic programming (ILP) including systems such as [MB88, Mug95]. In particular, [Mug95] proposes inverse entailments combined with general to specific search through a refinement graph with the goal of generating a theory (hypothesis) which covers the examples and fulfills additional properties. Compared to ILP, the focus of our work lies on theory revision. However, our knowledge representation languages are variants of description logics and not logic programs. Moreover, our method aims to discover axioms which must be changed while minimizing user interaction. Preferences of theory changes are expressed by probabilities which are updated through Bayes’ rule. Other preferences based on plausible extensions of the theory were not considered, again because of their computational costs.

Although model-based diagnosis has also been applied to logic programs [CFD93], constraint knowledge bases [FFJS04] and hardware descriptions [FSW99], none of these approaches propose a query generation method to discriminate between diagnoses.

In this part we presented an approach to the interactive debugging of ontologies. This approach is applicable to any knowledge representation language with monotonic semantics. We showed that the axioms generated by classification and realization reasoning services can be exploited to generate queries which differentiate between diagnoses. For selecting the best next query we proposed two strategies: The “split-in-half” strategy prefers queries which allow eliminating half of the leading diagnoses. The entropy-based strategy employs information theoretic concepts to exploit knowledge about the likelihood of axioms being faulty. Based on the probability of an axiom containing an error we predict the (expected) information gain produced by a query result, enabling us to select the best subsequent query according to a one-step-lookahead entropy-based scoring function. We described the implementation of an interactive debugging algorithm and compared the entropy-based method with the “split-in-half” strategy. Our experiments showed a significant reduction in the number of queries required to identify the target diagnosis when the entropy-based method is applied. Depending on the quality of the given prior fault probabilities the required number of queries could be reduced by up to 60%.
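The two strategies can be contrasted with a small sketch of their scoring functions, where a lower score marks a better query. The exact definitions are not reproduced in this part of the text, so both formulas are assumptions: the entropy-based score is taken to be the sum of p(Q=a) log2 p(Q=a) over both answers plus p(D∅) + 1, with the mass of D∅ split evenly between the answers, and the “split-in-half” score is taken to be the imbalance between |DP| and |DN| plus |D∅|. The function names are hypothetical.

```python
from math import log2


def entropy_score(p_dp, p_dn, p_d0):
    """Assumed one-step-lookahead entropy score of a query.

    p_dp, p_dn, p_d0: total probability of the diagnoses predicting
    "yes", predicting "no", and predicting nothing, respectively."""
    p_yes = p_dp + p_d0 / 2
    p_no = p_dn + p_d0 / 2

    def plogp(p):
        return p * log2(p) if p > 0 else 0.0

    return plogp(p_yes) + plogp(p_no) + p_d0 + 1


def split_in_half_score(n_dp, n_dn, n_d0):
    """Assumed "split-in-half" score: counts only, no probabilities."""
    return abs(n_dp - n_dn) + n_d0
```

Under these assumptions a query that splits the probability mass evenly and leaves no diagnosis uncommitted scores 0 in the entropy-based measure, while a query that cannot discriminate at all scores 1.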

In order to evaluate the robustness of the entropy-based method we experimented with different prior fault probability distributions as well as different qualities of the prior probabilities. Furthermore, we investigated cases where knowledge about failure probabilities is missing or inaccurate. In case such knowledge is unavailable, the entropy-based method ranks the diagnoses based on the number of syntax elements contained in an axiom and the number of axioms in a diagnosis. Given that this is a reasonable guess (i.e. the target diagnosis is not at the lower end of the diagnoses ranked by their prior probabilities), the entropy-based method outperformed “split-in-half”. Moreover, even if the initial guess is not reasonable, the entropy-based method improves the accuracy of the probabilities as more questions are asked. Furthermore, the applicability of the approach to real-world ontologies containing thousands of axioms was demonstrated by an extensive set of evaluations which are publicly available.

This part presents and thoroughly analyzes a reinforcement learning query selection strategy (RIO) that makes the presented debugging system robust against the use of low-quality fault information. It is based on the publications [RSFF13, RSFF12, RSFF11, SRF11], published in Web Reasoning and Rule Systems (RR-2013), in the Proceedings of the 7th International Workshop on Ontology Matching (OM-2012), in the Proceedings of the Joint Workshop on Knowledge Evolution and Ontology Dynamics 2011 (EvoDyn2011) and in DX 2011 - 22nd International Workshop on Principles of Diagnosis, respectively.

The foundation for widespread adoption of Semantic Web technologies is a broad community of ontology developers which is not restricted to experienced knowledge engineers. Instead, domain experts from diverse fields should be able to create ontologies incorporating their knowledge as autonomously as possible. The resulting ontologies are required to fulfill some minimal quality criteria, usually consistency, coherency and the absence of undesired entailments, in order to ensure successful deployment. However, the correct formulation of logical descriptions in ontologies is an error-prone task, which creates a need for assistance in ontology development in the form of ontology debugging tools. Usually, such tools [SHCH07, KPHS07, FS05, HPS08] use model-based diagnosis [Rei87] to identify sets of faulty axioms, called diagnoses, that need to be modified or deleted in order to meet the imposed quality requirements. The major challenge inherent in the debugging task is often a substantial number of alternative diagnoses.

In [SFFR12] this issue is tackled by letting the user take action during the debugging session by answering queries about entailments and non-entailments of the desired ontology. These answers impose constraints on the validity of diagnoses and thus help to sort out non-compliant diagnoses step by step. In addition, a Bayesian approach is used to continuously readjust the fault probabilities by means of the additional information given by the user. The user effort in this interactive debugging procedure is strongly affected by the quality of the initially provided meta information, i.e. prior knowledge about fault probabilities of a user w.r.t. particular logical operators. To get this under control, the selection of queries shown to the user can be varied correspondingly. To this end, two essential paradigms for choosing the next “best” query have been proposed, split-in-half and entropy-based.

In order to opt for the optimal strategy, however, the quality of the meta information, i.e. good or bad (which means high or low probability of the correct solution), must be known in advance. This, however, would require prior knowledge of the initially unknown solution. Entropy-based methods can profit optimally from properly adjusted initial fault probabilities (high potential), whereas they can completely fail in the case of weak prior information (high risk). The split-in-half technique, on the other hand, exhibits constant behavior independently of the probabilities given (no risk), but lacks the ability to leverage appropriate fault information (no potential). This is confirmed by the evaluation we conducted, which shows that an unsuitable combination of meta information and query selection strategy can result in a substantial increase of more than 2000% w.r.t. the number of queries to a user. So, there is a need to either (1) guarantee a sufficiently suited choice of prior fault information, or (2) manage the “risk” of unsuitable method selection. Task (1) might not be a severe problem in a debugging scenario involving a faulty ontology developed by a single expert, since the meta information might be extracted from the logs of previous sessions, if available, or specified by the expert based on their experience w.r.t. their own faults. However, the realization of task (1) is a major issue in scenarios involving automated systems producing (parts of) ontologies, e.g. ontology alignment and ontology learning, or numerous users collaborating in modeling an ontology, where the choice of reasonable meta information is rather unclear. Therefore, we focus on accomplishing task (2).

The contribution of this part is a new RIsk Optimization reinforcement learning method (RIO), which on average minimizes user interaction throughout a debugging session compared to existing strategies, for any quality of meta information (high potential at low risk). By virtue of its learning capability, our approach is optimally suited for debugging ontologies where only vague or no meta information is available. A learning parameter is constantly adapted based on the information gathered so far. On the one hand, our method takes advantage of the given meta information as long as good performance is achieved. On the other hand, it gradually becomes more independent of meta information if suboptimal behavior is measured.

Experiments on two datasets of faulty real-world ontologies show the feasibility, efficiency and scalability of RIO. The evaluation indicates that, on average, RIO is the best choice of strategy for both good and bad meta information, with savings in user interaction of up to 80%.

The problem specification, basic concepts and a motivating example are provided in Chapter 22. Chapter 23 explains the suggested approach and gives implementation details. Evaluation results are described in Chapter 24. Related work is discussed in Chapter 25. Chapter 26 concludes.

First we provide an informal introduction to ontology debugging, particularly addressing readers unfamiliar with the topic. Later we introduce precise formalizations. We assume the reader to be familiar with description logics [BCM+07].

Ontology debugging deals with the following problem: Given is an ontology O which does not meet postulated requirements R, e.g. R = {coherency, consistency}. O is a set of axioms formulated in some monotonic knowledge representation language, e.g. OWL DL. The task is to find a subset of axioms in O, called diagnosis, that needs to be altered or eliminated from the ontology in order to meet the given requirements. The presented approach to ontology debugging does not rely upon a specific knowledge representation formalism; it solely presumes that the formalism is logic-based and monotonic. Additionally, the existence of sound and complete procedures for deciding logical consistency and for calculating logical entailments is assumed. These procedures are used as a black box. For OWL DL, for example, both functionalities are provided by a standard DL-reasoner.

A diagnosis is a hypothesis about the state of each axiom in O of being either correct or faulty. Generally, there are many diagnoses for one and the same faulty ontology O. The problem is then to figure out the single diagnosis, called target diagnosis D∗, that complies with the knowledge to be modeled by the intended ontology. In interactive ontology debugging we assume a user, e.g. the author of the faulty ontology or a domain expert, interacting with an ontology debugging system by answering queries about entailments of the desired ontology, called the target ontology O∗. The target ontology can be understood as O minus the axioms of D∗ plus a set of axioms needed to preserve the desired entailments, called positive test cases. Note that the user is not expected to know O∗ explicitly (in which case there would be no need to consult an ontology debugger), but implicitly in that they are able to answer queries about O∗.

A query is a set of axioms and the user is asked whether the conjunction of these axioms is entailed by O∗. Every positively (negatively) answered query constitutes a positive (negative) test case fulfilled by O∗. The set of positive (entailed) and negative (non-entailed) test cases is denoted by P and N, respectively. So, P and N are sets of sets of axioms, which can be, but do not need to be, initially empty. Test cases can be seen as constraints O∗ must satisfy and are therefore used to gradually reduce the search space for valid diagnoses. Roughly, the overall procedure consists of (1) computing a predefined number of diagnoses, (2) gathering additional information by querying the user, (3) incorporating this information to prune the search space for diagnoses, and so forth, until a stopping criterion is fulfilled, e.g. one diagnosis D∗ has overwhelming probability.
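The overall procedure can be sketched as a loop over toy data. This is an illustration under strong simplifications, not the debugging algorithm itself: diagnoses are given up front as names with prior probabilities instead of being computed by a reasoner, each query is given together with the diagnoses it would eliminate on a negative answer (`dp`) or on a positive answer (`dn`), the query splitting the remaining probability mass most evenly is chosen, and `oracle` stands for the user. All names are hypothetical.

```python
def debug_session(diagnoses, queries, oracle, threshold=0.95):
    """Toy interactive loop: choose a query, ask, record the test case, prune.

    diagnoses: name -> prior probability (assumed positive)
    queries:   query -> (dp, dn), the diagnoses predicting "yes" / "no"
    oracle:    function query -> bool, standing in for the user
    """
    P, N = [], []                      # positive and negative test cases
    candidates = dict(diagnoses)
    remaining = dict(queries)
    while len(candidates) > 1:
        total = sum(candidates.values())
        if max(candidates.values()) / total >= threshold:
            break                      # one diagnosis dominates

        def mass(names):
            return sum(candidates.get(d, 0.0) for d in names)

        # Only queries that still discriminate between candidates are useful.
        useful = [q for q, (dp, dn) in remaining.items()
                  if mass(dp) > 0 and mass(dn) > 0]
        if not useful:
            break
        query = min(useful,
                    key=lambda q: abs(mass(remaining[q][0]) - mass(remaining[q][1])))
        dp, dn = remaining.pop(query)
        if oracle(query):
            P.append(query)
            excluded = dn              # these diagnoses predicted "no"
        else:
            N.append(query)
            excluded = dp              # these diagnoses predicted "yes"
        candidates = {d: p for d, p in candidates.items() if d not in excluded}
    return candidates, P, N
```

With four diagnoses and two queries that each split them into two pairs, the loop isolates the target after two answers, recording one query in P and the other in N.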

The general debugging setting we consider also envisions the opportunity for the user to specify some background knowledge B, i.e. a set of axioms that are known to be correct. B is then incorporated in the calculations throughout the ontology debugging procedure, but no axiom in B may take part in a diagnosis. For example, in case the user knows that a subset of axioms in O is definitely sound, all axioms in this subset are added to B before initiating the debugging session. The advantage of this over simply not considering the axioms in B at all is that the semantics of the axioms in B is not lost and can be exploited, e.g., in query generation. B and O \ B partition the original ontology into a set of correct and a set of possibly incorrect axioms, respectively. In the debugging session, only O := O \ B is used to search for diagnoses. This can reduce the search space for diagnoses substantially. Another application of background knowledge could be the reuse of an existing ontology to support successful debugging. For example, when formulating an ontology about medical terms, a thoroughly curated reference ontology B could be leveraged to find the author's own formulations that contradict the correct ones in B, which would not be found without integration of B into the debugging procedure.

More formally, ontology debugging can be defined in terms of a diagnosis problem instance, for which we search for solutions, i.e. diagnoses, that make it possible to formulate the target ontology:

Definition 22.1 (Diagnosis Problem Instance, Target Ontology). Let O = T ∪ A be an ontology with terminological axioms T and assertional axioms A, B a set of axioms which are assumed to be correct (background knowledge), R a set of requirements to O, and P and N a set of positive and a set of negative test cases, respectively, where each test case p ∈ P and n ∈ N is a set of axioms. Then we call the tuple ⟨O, B, P, N⟩_R a diagnosis problem instance (DPI). An ontology O∗ is called target ontology w.r.t. ⟨O, B, P, N⟩_R iff all the following conditions hold:

image

Definition 22.2 (Diagnosis). We call D ⊆ O a diagnosis w.r.t. a DPI ⟨O, B, P, N⟩_R iff (O \ D) ∪ (⋃_{p∈P} p) is a target ontology w.r.t. ⟨O, B, P, N⟩_R. A diagnosis D w.r.t. a DPI is minimal iff there is no D′ ⊂ D such that D′ is a diagnosis w.r.t. this DPI. The set of minimal diagnoses w.r.t. a DPI is denoted by mD.

Note that a diagnosis D gives complete information about the correctness of each axiom ax_k ∈ O, i.e. all ax_i ∈ D are assumed to be faulty and all ax_j ∈ O \ D are assumed to be correct.

Example 22.1 Consider O := T ∪ A with terminological axioms T := O1 ∪ O2 ∪ M12:

image

and an assertional axiom A = {PhDStudent(s)}, where M12 is an automatically generated set of axioms serving as semantic links between O1 and O2. The given ontology O is inconsistent since it describes s as both a DeptMember and not a DeptMember.

Let us assume that the assertion PhDStudent(s) is considered correct and is thus added to the background theory, i.e. B := A, and that no test cases are initially specified, i.e. the sets P and N are empty. For the resulting DPI ⟨T, A, ∅, ∅⟩_{coherence}, the set of minimal diagnoses is mD = {D1 : [ax_1], D2 : [ax_2], D3 : [ax_3], D4 : [ax_4], D5 : [ax_5], D6 : [ax_6]}. mD can be computed by a diagnosis algorithm such as the one presented in [FS05].
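
As a rough illustration of Definition 22.2, the following sketch enumerates minimal diagnoses by brute force over a black-box requirements check. The function `meets_requirements` and the toy check are hypothetical stand-ins for a reasoner call, and test cases are left out (P = N = ∅); the toy is only chosen so that it mirrors the six single-axiom diagnoses stated in the example.

```python
from itertools import combinations


def minimal_diagnoses(axioms, meets_requirements):
    """Enumerate minimal diagnoses by increasing cardinality (brute force).

    `meets_requirements(kb)` is a hypothetical black-box check standing in
    for a reasoner call. It is assumed to be monotone: removing axioms
    never introduces a new violation.
    """
    axioms = list(axioms)
    found = []
    for size in range(len(axioms) + 1):
        for candidate in combinations(axioms, size):
            removed = frozenset(candidate)
            # Supersets of an already found diagnosis cannot be minimal.
            if any(d <= removed for d in found):
                continue
            remaining = frozenset(axioms) - removed
            if meets_requirements(remaining):
                found.append(removed)
    return found


# Toy stand-in for Example 22.1 (an assumption, the real axioms are not
# reproduced here): the KB is faulty only while all six axioms are present.
AXIOMS = ["ax1", "ax2", "ax3", "ax4", "ax5", "ax6"]


def toy_check(kb):
    return len(kb) < len(AXIOMS)


diagnoses = minimal_diagnoses(AXIOMS, toy_check)
```

The exhaustive enumeration is exponential in the number of axioms, which is one way to see why practical systems restrict themselves to a set of leading diagnoses.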

With six minimal diagnoses for only six ontology axioms, this example already gives an idea that in many cases |mD| can get very large. Note that generally the computation of all minimal diagnoses w.r.t. a given DPI is not feasible within reasonable time due to the complexity of the underlying algorithms. Therefore, in practice, especially in an interactive scenario where reaction time is essential, a set of leading diagnoses D ⊆ mD is considered as a representative for mD. Concerning the optimal number of leading diagnoses, a trade-off between representativeness and complexity of associated computations w.r.t. D needs to be found.

Without any prior knowledge in terms of diagnosis fault probabilities or specified test cases, each diagnosis in D is equally likely to be the target diagnosis D∗. In other words, for each D ∈ D w.r.t. the DPI ⟨T, A, ∅, ∅⟩_{coherence}, the ontology (O \ D) ∪ (⋃_{p∈P} p) meets all the conditions defining a target ontology. However, besides postulating coherence, the user might want the target ontology to entail that s is a student as well as a researcher, i.e. O∗ ⊨ t1 where t1 := {Researcher(s), Student(s)}. Formulating t1 as a positive test case yields the DPI ⟨T, A, {t1}, ∅⟩_{coherence}, for which only the diagnoses D2, D4, D6 ∈ D are valid and allow a corresponding O∗ to be formulated. All other diagnoses in D are ruled out by the fact that t1 ∈ P, which means they have a probability of zero of being the target diagnosis. If t1 ∈ N, in contrast, this would imply that D2, D4, D6 had to be rejected.

So, it depends on the test cases specified by a user which diagnosis will finally be identified as target diagnosis. Also, the order in which test cases are specified is crucial. For instance, consider the test cases t1 := {PhD(s)} and t2 := {Student(s)}. If t1 ∈ P is specified before t2 ∈ N, then t1 ∈ P is redundant, since the only diagnosis agreeing with t2 ∈ N is D3, which also preserves the entailment t1 in the resulting target ontology O∗ = (O \ D3) ∪ ∅ without explicating it as a positive test case.

Since it is by no means trivial to get the right (in the sense of most informative) test cases formulated in the proper order such that the number of test cases necessary to detect the target diagnosis is minimized, interactive debugging systems offer the functionality to automate the selection of test cases. The benefit is that the user can just concentrate on “answering” the provided test cases, which means assigning them to either P or N. We call such automatically generated test cases queries. The theoretical foundation for the application of queries is the fact that O \ D_i and O \ D_j for D_i ≠ D_j ∈ D entail different sets of axioms.

Definition 22.3 (Query, Partition). Let D be a set of minimal diagnoses w.r.t. a DPI ⟨O, B, P, N⟩_R and O∗_i := (O \ D_i) ∪ B ∪ (⋃_{p∈P} p) for D_i ∈ D. Then a set of axioms X_j ≠ ∅ is called a query w.r.t. D iff D_j⁺ := {D_i ∈ D | O∗_i ⊨ X_j} ≠ ∅ and D_j⁻ := {D_i ∈ D | ∃x ∈ N ∪ R : O∗_i ∪ X_j violates x} ≠ ∅. The (unique) partition of a query X_j is denoted by ⟨D_j⁺, D_j⁻, D_j⁰⟩ where D_j⁰ = D \ (D_j⁺ ∪ D_j⁻). X_D denotes a set of queries and associated partitions w.r.t. D in which one and the same partition of D occurs at most once and only if there is an associated query for this partition.

Note that, in general, there can be n_q queries for a particular partition of D, where n_q can be zero or some positive integer. We are interested in (1) only those partitions for each of which n_q ≥ 1 and (2) only one query for each such partition. The set X_D includes elements such that (1) and (2) hold. X_D for a given set of minimal diagnoses D w.r.t. a DPI can be generated as shown in Algorithm 13. In each iteration, given a set of diagnoses D_k⁺ ⊂ D, common entailments X_k := {e | ∀D_i ∈ D_k⁺ : O∗_i ⊨ e} are computed (GETENTAILMENTS) and used to classify the remaining diagnoses in D \ D_k⁺ to obtain the partition ⟨D_k⁺, D_k⁻, D_k⁰⟩ associated with X_k. Then, if the partition ⟨D_k⁺, D_k⁻, D_k⁰⟩ does not already occur in X_D (INCLUDESPARTITION), the query X_k is minimized [SFFR12] (MINIMIZEQUERY) such that its partition is preserved, yielding a query X′_k ⊆ X_k such that any X′′_k ⊂ X′_k is not a query or does not have the same partition. Finally, X′_k is added to X_D together with its partition ⟨D_k⁺, D_k⁻, D_k⁰⟩. Function REQVIOLATED(arg) returns true if arg violates some requirement in R or entails some negative test case in N.

image
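
The classification step of Definition 22.3 can be sketched as follows. This is a minimal illustration, not Algorithm 13 itself: `entails` and `violates` are hypothetical callbacks standing in for reasoner calls on O∗_i, and the toy data at the end is invented purely to exercise the code.

```python
def partition(query, diagnoses, entails, violates):
    """Split `diagnoses` into (D_plus, D_minus, D_zero) for `query`.

    entails(d, query):  True if the candidate ontology O*_d entails the query.
    violates(d, query): True if O*_d together with the query violates a
                        requirement or entails a negative test case.
    Both callbacks are assumptions standing in for a reasoner.
    """
    d_plus, d_minus, d_zero = [], [], []
    for d in diagnoses:
        if entails(d, query):
            d_plus.append(d)
        elif violates(d, query):
            d_minus.append(d)
        else:
            d_zero.append(d)
    return d_plus, d_minus, d_zero


def is_query(query, diagnoses, entails, violates):
    """A non-empty axiom set is a query iff D_plus and D_minus are non-empty."""
    d_plus, d_minus, _ = partition(query, diagnoses, entails, violates)
    return bool(query) and bool(d_plus) and bool(d_minus)


# Invented toy data: atoms each candidate ontology entails or contradicts.
ENTAILED = {"D1": {"a"}, "D2": {"a", "b"}, "D3": set()}
CONTRADICTED = {"D1": {"b"}, "D2": set(), "D3": {"a"}}


def toy_entails(d, query):
    return query <= ENTAILED[d]


def toy_violates(d, query):
    return bool(query & CONTRADICTED[d])


part_a = partition({"a"}, ["D1", "D2", "D3"], toy_entails, toy_violates)
part_b = partition({"b"}, ["D1", "D2", "D3"], toy_entails, toy_violates)
```

In the toy data, the axiom set {"c"} is entailed by no candidate ontology, so it is not a query: answering it could never eliminate a diagnosis in both directions.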

Asking the user a query X_j means asking them (O∗ ⊨ X_j?). Let the answering of queries by a user be modeled as a function u : X_D → {t, f}. If u_j := u(X_j) = t, then P ← P ∪ {X_j} and D ← D \ D_j⁻. Otherwise, N ← N ∪ {X_j} and D ← D \ D_j⁺. Prospectively, according to Definition 22.2, only those diagnoses are considered in the set D that comply with the new DPI obtained by the addition of a test case. This allows us to formalize the problem we address in this work:
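
The update rule for P, N and D after an answer can be written down directly. The sketch below uses plain lists and strings as stand-ins for test cases and diagnoses (the function name is illustrative); the sample partition is the one given for X1 later in Example 23.1.

```python
def apply_answer(query, answer, partition, positives, negatives, leading):
    """Return updated (P, N, D) after the user answers `query`.

    answer is True for t and False for f; partition is (D_plus, D_minus, D_zero).
    """
    d_plus, d_minus, _ = partition
    if answer:
        # Positive answer: the query becomes a positive test case and all
        # diagnoses in D_minus are eliminated.
        positives = positives + [query]
        leading = [d for d in leading if d not in d_minus]
    else:
        # Negative answer: the query becomes a negative test case and all
        # diagnoses in D_plus are eliminated.
        negatives = negatives + [query]
        leading = [d for d in leading if d not in d_plus]
    return positives, negatives, leading


ALL = ["D1", "D2", "D3", "D4", "D5", "D6"]
X1 = frozenset({"DeptEmployee(s)", "Student(s)"})
X1_PARTITION = (["D4", "D6"], ["D1", "D2", "D3", "D5"], [])

P, N, remaining = apply_answer(X1, False, X1_PARTITION, [], [], ALL)
```

With the negative answer, D4 and D6 are removed and X1 is recorded as a negative test case, which matches the first step of the ENT run in Example 22.2.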

Problem Definition 22.1 (Query Selection). Given D w.r.t. a DPI ⟨O, B, P, N⟩_R, a stopping criterion stop : D → {t, f} and a user u, find a next query X_j ∈ X_D such that (1) (X_j, ..., X_q) is a query sequence of minimal length and (2) there exists a D∗ ∈ D w.r.t. ⟨O, B, P′, N′⟩_R such that stop(D∗) = t, where P′ := P ∪ {X_i | X_i ∈ {X_j, ..., X_q}, u_i = t} and N′ := N ∪ {X_i | X_i ∈ {X_j, ..., X_q}, u_i = f}.

Two strategies for selecting the “best” next query have been proposed [SFFR12]:

Split-In-Half Strategy (SPL) selects the query X_j which minimizes the following scoring function:

image

So, SPL prefers queries which eliminate half of the diagnoses independently of the query outcome.
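
One scoring function consistent with this description is sketched below. Since the original formula is not reproduced in the text here, the concrete form (absolute size difference of D⁺ and D⁻ plus the size of D⁰) is an assumption; the partitions are the ones listed for X1, X2 and X3 in Example 23.1.

```python
def sc_split(d_plus, d_minus, d_zero):
    """Assumed split-in-half score: 0 for a perfect half/half split.

    Diagnoses in D_zero are never eliminated by the query, so they are
    counted as a penalty.
    """
    return abs(len(d_plus) - len(d_minus)) + len(d_zero)


# Partitions as given for X1, X2 and X3 in Example 23.1.
PARTITIONS = {
    "X1": (["D4", "D6"], ["D1", "D2", "D3", "D5"], []),
    "X2": (["D1", "D2", "D3", "D4", "D6"], ["D5"], []),
    "X3": (["D2", "D4", "D6"], ["D1", "D3", "D5"], []),
}

scores = {name: sc_split(*parts) for name, parts in PARTITIONS.items()}
best_query = min(scores, key=scores.get)
```

Among these three queries the half/half split of X3 gets the lowest score, independently of any fault probabilities.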

Entropy-Based Strategy (ENT) uses information about prior probabilities p_t for the user to make a mistake when using a syntactical construct of type t ∈ CT(L), where CT(L) is the set of constructors available in the used knowledge representation language L, e.g. {∀, ∃, ⊑, ¬, ⊔, ⊓} ⊂ CT(OWL DL). These fault probabilities p_t are assumed to be independent and used to calculate fault probabilities of axioms ax_k as

image

where n(t) is the number of occurrences of construct type t in ax_k. The probabilities of axioms can in turn be used to determine fault probabilities of diagnoses D_i ∈ D as

image
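
The axiom-level step can be illustrated under the stated independence assumption: an axiom is taken to be faulty if at least one of its construct occurrences is faulty. The product form below is an assumption derived from that reading, not a quotation of the original formula, and the construct counts and probabilities are invented sample values.

```python
from math import prod


def axiom_fault_probability(construct_counts, construct_fault_prob):
    """Probability that at least one construct occurrence in an axiom is faulty.

    construct_counts:     {construct type t: n(t) occurrences in the axiom}
    construct_fault_prob: {construct type t: fault probability p_t}
    Assumes independent faults per occurrence (see lead-in).
    """
    all_correct = prod(
        (1.0 - construct_fault_prob[t]) ** n for t, n in construct_counts.items()
    )
    return 1.0 - all_correct


# Invented sample: an axiom with one subsumption and two existential restrictions.
p_axiom = axiom_fault_probability({"⊑": 1, "∃": 2}, {"⊑": 0.01, "∃": 0.05})
```

Under this reading, axioms that use more constructs, or more error-prone ones, receive higher fault probabilities.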

ENT selects the query X_j ∈ X_D with the highest expected information gain, i.e. the query that minimizes the following scoring function [SFFR12]:

image

where

image

and

image

The answer u_j = a is used to update the probabilities p(D_k) for D_k ∈ D according to the Bayesian formula, yielding p(D_k | u_j = a).

The results of the evaluation in [SFFR12] show that ENT performs better than SPL in most cases. However, SPL proved to be the best strategy in situations where misleading prior information is provided, i.e. where the target diagnosis D∗ has low probability. So, one can regard ENT as a high-risk strategy with high potential to perform well, depending on the quality of the given fault information, which is unknown in advance. SPL, in contrast, can be seen as a no-risk strategy without any potential to leverage good meta information. Therefore, selection of the proper combination of prior probabilities {p_t | t ∈ CT(L)} and query selection strategy is crucial for successful diagnosis discrimination and minimization of user interaction.

Example 22.2 (Example 22.1 continued) To illustrate this, let a user who wants to debug our example ontology O set p(ax_i) := 0.001 for ax_i (i = 1, ..., 4) and p(ax_5) := 0.1, p(ax_6) := 0.15, e.g. because the user doubts the correctness of ax_5 and ax_6 while being quite sure that ax_i (i = 1, ..., 4) are correct. Assume that D2 corresponds to the target diagnosis D∗, i.e. the settings provided by the user are inept. Application of ENT starts with the computation of prior fault probabilities of diagnoses p(D1) = p(D2) = p(D3) = p(D4) = 0.003, p(D5) = 0.393, p(D6) = 0.591 (Formula 22.1). Then (O∗ ⊨ X1?) with X1 := {DeptEmployee(s), Student(s)} will be identified as the optimal query since it has the minimal score sc_ent(X1) = 0.02 (see Table 22.1 for queries and partitions w.r.t. the example ontology). However, since the unfavorable answer u1 = f is given, this query eliminates only two of six diagnoses, D4 and D6. The Bayesian probability update then yields p(D1) = p(D2) = p(D3) = 0.01 and p(D5) = 0.97. As next query, X2 with sc_ent(X2) = 0.811 is selected and answered unfavorably (u2 = t) as well, which results in the elimination of only one of four diagnoses, D5. By querying X3 (sc_ent(X3) = 0.082, u3 = t) and X4 (sc(X4) = 0, u4 = t), the further execution of this procedure finally leads to the target diagnosis D2. So, application of ENT requires four queries to find D∗. If SPL is used instead, only three queries are required. The algorithm can select one of the two queries X5 or X9 because each eliminates half of all diagnoses in any case. Let the strategy select X5, which is answered positively (u5 = t). As successive queries, X6 (u6 = f) and X1 (u1 = f) are selected, which leads to the revelation of D∗ = D2.
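
The first ENT step of this example can be retraced numerically. The sketch below rests on three assumptions, chosen because they reproduce the quoted values up to rounding: each diagnosis is weighted by the product of the fault probabilities of its axioms and the weights are normalized; the entropy score has the form Σ_a p(u = a) log2 p(u = a) + p(D⁰) + 1 with p(D⁰) split evenly between both answers; and the Bayesian update uses likelihood 1 for diagnoses predicting the answer, 0 for those contradicting it and 1/2 for D⁰.

```python
from math import log2

AXIOM_FAULT = {"ax1": 0.001, "ax2": 0.001, "ax3": 0.001,
               "ax4": 0.001, "ax5": 0.1, "ax6": 0.15}
# Each minimal diagnosis D_i of the example consists of the single axiom ax_i.
DIAGNOSES = {f"D{i}": [f"ax{i}"] for i in range(1, 7)}


def normalize(weights):
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def prior(diagnoses, axiom_fault):
    # ASSUMPTION: weight = product of the fault probabilities of the axioms
    # in the diagnosis, followed by normalization.
    weights = {}
    for name, axioms in diagnoses.items():
        w = 1.0
        for ax in axioms:
            w *= axiom_fault[ax]
        weights[name] = w
    return normalize(weights)


def sc_ent(p, d_plus, d_minus, d_zero):
    # ASSUMPTION: score = sum_a p(a) * log2 p(a) + p(D_zero) + 1 (lower is better).
    p_zero = sum(p[d] for d in d_zero)
    p_true = sum(p[d] for d in d_plus) + p_zero / 2
    p_false = sum(p[d] for d in d_minus) + p_zero / 2
    return sum(q * log2(q) for q in (p_true, p_false) if q > 0) + p_zero + 1


def bayes_update(p, d_plus, d_minus, d_zero, answer):
    # Likelihood of the answer given each diagnosis: 1, 0, or 1/2 for D_zero.
    agree, disagree = (d_plus, d_minus) if answer else (d_minus, d_plus)
    likelihood = {d: 1.0 for d in agree}
    likelihood.update({d: 0.0 for d in disagree})
    likelihood.update({d: 0.5 for d in d_zero})
    posterior = {d: p[d] * likelihood[d] for d in p if likelihood[d] > 0}
    return normalize(posterior)


X1 = (["D4", "D6"], ["D1", "D2", "D3", "D5"], [])
p0 = prior(DIAGNOSES, AXIOM_FAULT)
score_x1 = sc_ent(p0, *X1)
p1 = bayes_update(p0, *X1, answer=False)
```

Under these assumptions the priors come out at about 0.0039, 0.394 and 0.591, the score of X1 at about 0.026, and the posteriors after u1 = f at about 0.0097 for D1, D2, D3 and 0.971 for D5.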

image

Table 22.1: A set X_D of queries and associated partitions w.r.t. the initial DPI ⟨T, A, ∅, ∅⟩_{coherence} of the example ontology O.

This scenario demonstrates that the no-risk strategy SPL (three queries) is more suitable than ENT (four queries) for fault probabilities which disfavor the target diagnosis. Let us suppose, on the other hand, that probabilities are assigned more reasonably in our example, e.g. D∗ = D6. Then it will take ENT only two queries (X1, X6) to find D∗, while SPL will still require three queries, e.g. (X5, X1, X6).

This example indicates that, unless the target diagnosis is known in advance, one can never be sure to select the better strategy of SPL and ENT. In Chapter 23 we present a learning query selection algorithm that combines the benefits of both SPL and ENT. It adapts the way of selecting the next query depending on the elimination rate (like SPL) and on information gain (like ENT). Thereby its performance approaches the performance of the better of SPL and ENT.

Selection

The proposed Risk Optimization Algorithm (RIO) extends the ENT strategy with a dynamic learning procedure that learns by reinforcement how to select the next query. Its behavior is determined by the achieved performance in terms of diagnosis elimination rate w.r.t. the set of leading diagnoses D. Good performance causes behavior similar to ENT, whereas deteriorating performance leads to a gradual neglect of the given meta information, and thus to behavior akin to SPL. Like ENT, RIO continually improves the prior fault probabilities based on new knowledge obtained through queries to a user.

RIO learns a “cautiousness” parameter c whose admissible values are captured by the user-defined interval [c̲, c̄]. The relationship between c and queries is as follows:

Definition 23.1 (Cautiousness of a Query). We define the cautiousness c_q(X_i) of a query X_i ∈ X_D as follows:

image

A query X_i is called braver than query X_j iff c_q(X_i) < c_q(X_j). Otherwise X_i is called more cautious than X_j. A query with maximum cautiousness c̄_q is called a no-risk query.

Definition 23.2 (Elimination Rate). Given a query X_i and the corresponding answer u_i ∈ {t, f}, the elimination rate

image

and

image

The answer u_i to a query X_i is called favorable iff it maximizes the elimination rate e(X_i, u_i). Otherwise u_i is called unfavorable. The minimal or worst-case elimination rate min_{u_i ∈ {t,f}} e(X_i, u_i) of X_i is denoted by e_wc(X_i).

So, the cautiousness c_q(X_i) of a query X_i is exactly the worst-case elimination rate, i.e. c_q(X_i) = e_wc(X_i) = e(X_i, u_i) given that u_i is the unfavorable query result. Intuitively, parameter c characterizes the minimum proportion of diagnoses in D which should be eliminated by the successive query.

Definition 23.3 (High-Risk Query). Given a query X_i and cautiousness c, X_i is called a high-risk query iff c_q(X_i) < c, i.e. the cautiousness of the query is lower than the algorithm’s current cautiousness value c. Otherwise, X_i is called a non-high-risk query. By NHR_c(X_D) ⊆ X_D we denote the set of non-high-risk queries w.r.t. c. For given cautiousness c, the set of queries X_D can be partitioned into high-risk queries and non-high-risk queries.

Example 23.1 (Example 22.2 continued) Let the user specify c := 0.3 for the set D with |D| = 6. Given these settings, X1 := {DeptEmployee(s), Student(s)} is a non-high-risk query since its partition is ⟨D_1⁺, D_1⁻, D_1⁰⟩ = ⟨{D4, D6}, {D1, D2, D3, D5}, ∅⟩ and thus its cautiousness is c_q(X1) = 2/6 ≥ 0.3 = c. The query X2 := {PhD(s)} with the partition ⟨{D1, D2, D3, D4, D6}, {D5}, ∅⟩ is a high-risk query because c_q(X2) = 1/6 < 0.3 = c, and X3 := {Researcher(s), Student(s)} with ⟨{D2, D4, D6}, {D1, D3, D5}, ∅⟩ is a no-risk query due to c_q(X3) = 3/6 = c̄_q.
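
The classification in this example can be checked with a few lines of code. The sketch takes the cautiousness of a query to be its worst-case elimination rate, i.e. the size of the smaller of D⁺ and D⁻ divided by |D|, as stated in the text; the function names are illustrative, and exact fractions are used to avoid rounding issues in the comparisons.

```python
from fractions import Fraction


def cautiousness(d_plus, d_minus, n_leading):
    """Worst-case elimination rate: the smaller side of the partition over |D|."""
    return Fraction(min(len(d_plus), len(d_minus)), n_leading)


def max_cautiousness(n_leading):
    """Largest possible cautiousness, reached by a half/half split."""
    return Fraction(n_leading // 2, n_leading)


def risk_class(c_q, c, n_leading):
    if c_q < c:
        return "high-risk"
    if c_q == max_cautiousness(n_leading):
        return "no-risk"
    return "non-high-risk"


C = Fraction(3, 10)
N_LEADING = 6
PARTITIONS = {
    "X1": (["D4", "D6"], ["D1", "D2", "D3", "D5"]),
    "X2": (["D1", "D2", "D3", "D4", "D6"], ["D5"]),
    "X3": (["D2", "D4", "D6"], ["D1", "D3", "D5"]),
}

classes = {
    name: risk_class(cautiousness(dp, dm, N_LEADING), C, N_LEADING)
    for name, (dp, dm) in PARTITIONS.items()
}
```

Note that a no-risk query is also a non-high-risk query; the function simply reports the more specific label.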

Given a user’s answer u_s to a query X_s, the cautiousness c is updated depending on the elimination rate e(X_s, u_s) by c ← c + c_adj, where the cautiousness adjustment factor is c_adj := 2 (c̄ − c̲) adj. The scaling factor 2 (c̄ − c̲) regulates the extent of the cautiousness adjustment depending on the interval length c̄ − c̲. More crucial is the factor adj, which indicates the sign and magnitude of the cautiousness adjustment:

image

where ε ∈ (0, 1/2) is a constant which prevents the algorithm from getting stuck in a no-risk strategy for even |D|. E.g., given c = 0.5 and ε = 0, the elimination rate of a no-risk query is e(X_s, u_s) = 1/2, always resulting in adj = 0. The value of ε can be set to an arbitrary real number in this interval, e.g. ε := 1/4. If c + c_adj is outside the user-defined cautiousness interval [c̲, c̄], c is set to c̲ if c + c_adj < c̲ and to c̄ if c + c_adj > c̄. A positive c_adj is a penalty telling the algorithm to get more cautious, whereas a negative c_adj is a bonus resulting in braver behavior of the algorithm. Note that [c̲, c̄] ⊆ [c̲_q, c̄_q] must hold for the user-defined interval. c̲ − c̲_q and c̄_q − c̄ represent the minimal desired difference in performance to a high-risk (ENT) and a no-risk (SPL) query selection, respectively. By expressing trust (disbelief) in the prior fault probabilities through the specification of lower (higher) values for c̲ and/or c̄, the user can influence the behavior of RIO.

Example 23.2 (Example 23.1 continued) Assume p(ax_i) := 0.001 for ax_i (i = 1, ..., 4) and p(ax_5) := 0.1, p(ax_6) := 0.15, and that the user rather disbelieves these fault probabilities and thus sets c = 0.4, c̲ = 0 and c̄ = 0.5. In this case RIO selects a no-risk query X3 just as SPL would do. Given u3 = t and |D| = 6, the algorithm computes the elimination rate e(X3, t) = 0.5 and adjusts the cautiousness by c_adj = −0.17, which yields c = 0.23. This allows RIO to select a higher-risk query in the next iteration, whereupon the target diagnosis D∗ = D2 is found after asking three queries. In the same situation, ENT (starting with high-risk query X1) would require four queries.
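
The cautiousness update of this example can be retraced as follows. The concrete form of adj used here, ⌊|D|/2 − ε⌋ / |D| − e(X_s, u_s), is an assumption: it is not reproduced in the text above, but it yields adj = 0 for a no-risk query with even |D| and ε = 0, and it reproduces c_adj ≈ −0.17 and c ≈ 0.23 for the settings of the example with ε = 1/4.

```python
from math import floor


def update_cautiousness(c, c_low, c_high, n_leading, elimination_rate, eps=0.25):
    """Return the new cautiousness after observing `elimination_rate`.

    ASSUMPTION: adj = floor(|D|/2 - eps) / |D| - e(X_s, u_s).
    The result is clamped to the user-defined interval [c_low, c_high].
    """
    adj = floor(n_leading / 2 - eps) / n_leading - elimination_rate
    c_adj = 2 * (c_high - c_low) * adj
    return min(max(c + c_adj, c_low), c_high)


# Settings of Example 23.2: c = 0.4, interval [0, 0.5], |D| = 6, e(X3, t) = 0.5.
new_c = update_cautiousness(0.4, 0.0, 0.5, 6, 0.5)
```

A query that eliminates nothing (elimination rate 0) gives a positive adjustment, i.e. a penalty that pushes c towards the upper bound.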

RIO, described in Algorithm 14, starts with the computation of minimal diagnoses. The GETDIAGNOSES function implements a combination of the HS-Tree and QuickXPlain algorithms [SFFR12]. Using uniform-cost search, the algorithm extends the set of leading diagnoses D with a maximum number of most probable minimal diagnoses such that |D| ≤ n.

image

Then the GETPROBABILITIES function calculates the fault probabilities p(D_i) for each diagnosis D_i of the set of leading diagnoses D using Formula (22.1). Next it adjusts the probabilities as per the Bayesian theorem, taking into account all previous query answers, which are stored in P and N. Finally, the resulting probabilities p_adj(D_i) are normalized. Based on the set of leading diagnoses D, GENERATEQUERIES generates queries according to Algorithm 13. GETMINSCOREQUERY determines the best query X_sc ∈ X_D according to sc_ent:

image

If X_sc is a non-high-risk query, i.e. c ≤ c_q(X_sc) (determined by GETQUERYCAUTIOUSNESS), X_sc is selected. In this case, X_sc is the query with the best information gain in X_D and moreover guarantees the required elimination rate specified by c.

Otherwise, GETALTERNATIVEQUERY selects the query X_alt ∈ X_D (X_alt ≠ X_sc) which has minimal score sc_ent among all least cautious non-high-risk queries L_c. That is,

image

where

image

If there is no such query X_alt ∈ X_D, then X_sc is selected.
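
The selection step just described can be sketched as a small function. The dictionary layout and the function name are illustrative, and the scores attached to the sample queries are illustrative values rather than figures from the evaluation; only the control flow follows the text.

```python
def select_query(queries, c):
    """Pick the next query for cautiousness c.

    queries: list of dicts with keys 'name', 'score' (sc_ent, lower is
    better) and 'cautiousness' (c_q). Layout and names are illustrative.
    """
    best = min(queries, key=lambda q: q["score"])
    if best["cautiousness"] >= c:
        # The best-scoring query already guarantees the required elimination rate.
        return best
    non_high_risk = [q for q in queries if q["cautiousness"] >= c]
    if not non_high_risk:
        # No alternative exists, so fall back to the best-scoring query.
        return best
    # Among the least cautious non-high-risk queries, take the best score.
    least = min(q["cautiousness"] for q in non_high_risk)
    least_cautious = [q for q in non_high_risk if q["cautiousness"] == least]
    return min(least_cautious, key=lambda q: q["score"])


# Cautiousness values follow Example 23.1; the scores are illustrative.
QUERIES = [
    {"name": "X1", "score": 0.026, "cautiousness": 2 / 6},
    {"name": "X2", "score": 0.033, "cautiousness": 1 / 6},
    {"name": "X3", "score": 0.028, "cautiousness": 3 / 6},
]

chosen = select_query(QUERIES, 0.4)
```

With c = 0.4 the best-scoring query X1 is high-risk, so the no-risk query X3 is chosen instead; with c = 0.3 the function would keep X1.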

Given the user’s answer u_s, the selected query X_s ∈ {X_sc, X_alt} is added to P or N accordingly (see Chapter 22). In the last step of the main loop the algorithm updates the cautiousness value c (function UPDATECAUTIOUSNESS) as described above.

Before the next query selection iteration starts, a stop condition test is performed. The algorithm evaluates whether the most probable diagnosis is at least σ% more likely than the second most probable diagnosis (ABOVETHRESHOLD) or none of the leading diagnoses has been eliminated by the previous query, i.e. GETELIMINATIONRATE returns zero for X_s. If a stop condition is met, the presently most likely diagnosis is returned (MOSTPROBABLEDIAG).

Goals. This evaluation should demonstrate that (1) there is a significant discrepancy between existing strategies SPL and ENT concerning user effort where the winner depends on the quality of meta information, (2) RIO exhibits superior average behavior compared to ENT and SPL w.r.t. the amount of user interaction required, irrespective of the quality of specified fault information, (3) RIO scales well and (4) its reaction time is well suited for an interactive debugging approach.

Provenance of Test Data. As data source for the evaluation we used faulty real-world ontologies produced by automatic ontology matching systems (cf. Example 22.1). Matching of two ontologies O_i and O_j is understood as detection of correspondences between elements of these ontologies [SE13]:

Definition 24.1 (Ontology matching). Let Q(O) ⊆ S(O) denote the set of matchable elements in an ontology O, where S(O) denotes the signature of O. An ontology matching operation determines an alignment M_ij, which is a set of correspondences between matched ontologies O_i and O_j. Each correspondence is a 4-tuple ⟨x_i, x_j, r, v⟩ such that x_i ∈ Q(O_i), x_j ∈ Q(O_j), r is a semantic relation and v ∈ [0, 1] is a confidence value. We call O_iM_j := O_i ∪ φ(M_ij) ∪ O_j the aligned ontology for O_i and O_j, where φ maps each correspondence to an axiom.

In the following, let Q(O) be the restriction to atomic concepts and roles in S(O), r ∈ {⊑, ⊒, ≡}, and φ the natural alignment semantics [MS09] that maps correspondences one-to-one to axioms of the form x_i r x_j. We evaluate RIO using aligned ontologies for the following reasons: (1) Matching results often cause inconsistency/incoherence of ontologies. (2) The (fault) structure of different ontologies obtained through matching generally varies due to different authors and matching systems involved in the genesis of these ontologies. (3) For the same reasons, it is hard to estimate the quality of fault probabilities, i.e. it is unclear which of the existing query selection strategies to choose for best performance. (4) Available reference mappings can be used as correct solutions of the debugging procedure.

Test Datasets. We used two datasets D1 and D2: Each faulty aligned ontology O_iM_j in D1 is the result of applying one of four ontology matching systems to a set of six independently created ontologies in the domain of conference organization. For a given pair of ontologies O_i ≠ O_j, each system produced an alignment M_ij. The average size of O_iM_j per matching system was between 312 and 377 axioms. D1 is a superset of the dataset used in [Stu08] for which all debugging systems under evaluation manifested correctness or scalability problems. D2, used to assess the scalability of RIO, is the set of ontologies from the ANATOMY track in the Ontology Alignment Evaluation Initiative (OAEI) 2011.5 [SE13], which comprises two input ontologies O1 (11545 axioms) and O2 (4838 axioms). The size of the aligned ontologies generated by results of seven different matching systems was between 17530 and 17844 axioms.

Reference Solutions. For the dataset D1, based on a manually produced reference alignment R_ij ⊆ M_ij for ontologies O_i, O_j (cf. [MST08]), we were able to fix a target diagnosis D∗ := φ(M_ij \ R_ij) for each incoherent O_iM_j. In cases where D∗ represented a non-minimal diagnosis, it was randomly redefined as a minimal diagnosis D∗ ⊂ φ(M_ij \ R_ij). In case of D2, given the ontologies O1 and O2, the output M12 of a matching system, and the correct reference alignment R12, we fixed D∗ as follows: We carried out (prior to the actual experiment) a debugging session with DPI ⟨φ(M12 \ R12), O1 ∪ O2 ∪ φ(M12 ∩ R12), ∅, ∅⟩_{coherence} and randomly chose one of the identified diagnoses as D∗.

Test Settings. We conducted 4 experiments EXP-i (i = 1, ..., 4), the first two with dataset D1 and the other two with D2. In experiments 1 and 3 we simulated good fault probabilities by setting p(ax_k) := 0.001 for ax_k ∈ O_i ∪ O_j and p(ax_m) := 1 − v_m for ax_m ∈ M_ij, where v_m is the confidence of the correspondence underlying ax_m. Unreasonable fault information was used in experiments 2 and 4. In EXP-4 the following probabilities were defined: p(ax_k) := 0.01 for ax_k ∈ O_i ∪ O_j and p(ax_m) := 0.001 for ax_m ∈ M_ij. In EXP-2, in contrast, we used the probability settings of EXP-1, but altered the target diagnosis D∗ in that we precomputed (before the actual experiment started) the 30 most probable minimal diagnoses, and from these we selected the diagnosis with the highest number of axioms ax_k ∈ O_iM_j \ φ(M_ij) as D∗.
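
The “good” probability setting of EXP-1 and EXP-3 can be written down compactly. In the sketch below, correspondences are 4-tuples as in Definition 24.1 and are mapped to axioms of the form x_i r x_j; the function name, the string encoding of axioms and the sample data are illustrative assumptions.

```python
def good_fault_probabilities(ontology_axioms, correspondences, base=0.001):
    """Assign fault probabilities as in the 'good' setting described above.

    ontology_axioms: axioms of the two input ontologies (any hashable labels).
    correspondences: iterable of (x_i, x_j, relation, confidence) tuples.
    Alignment axioms are encoded as strings "x_i relation x_j" (illustrative).
    """
    probs = {ax: base for ax in ontology_axioms}
    for x_i, x_j, relation, confidence in correspondences:
        # Low matcher confidence translates into a high fault probability.
        probs[f"{x_i} {relation} {x_j}"] = 1.0 - confidence
    return probs


# Invented sample data for illustration only.
sample = good_fault_probabilities(
    ["Paper ⊑ Document", "Author ⊑ Person"],
    [("Author", "Writer", "≡", 0.9), ("Paper", "Article", "⊑", 0.6)],
)
```

The “unreasonable” setting of EXP-4 would instead use fixed values of 0.01 for ontology axioms and 0.001 for alignment axioms, deliberately pointing the prior away from the alignment.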

Throughout all four experiments, we set |D| := 9 (which proved to be a good trade-off between computation effort and representativeness of the leading diagnoses), σ := 85%, and as input parameters for RIO we set c := 0.25 and [c̲, c̄] := [c_min, c_max] = [0, 4/9]. To let the tests constitute the highest challenge for the evaluated methods, the initial DPI was specified as ⟨O_iM_j, ∅, ∅, ∅⟩_{coherence}, i.e. the entire search space was explored without adding parts of O_iM_j to B, although D∗ was always a subset of the alignment M_ij only. In practice, given such prior knowledge, the search space could be severely restricted and debugging greatly accelerated. All tests were executed on a Core i7 (3930K) 3.2 GHz machine with 32 GB RAM, Ubuntu Server 11.04 and Java 6 installed.

Metrics. Each experiment involved a debugging session of ENT, SPL as well as RIO for each ontology in the respective dataset. In each debugging run we measured the number of required queries (q) until D∗ was identified, the overall debugging time (debug) assuming that queries are answered instantaneously, and the reaction time (react), i.e. the average time between two successive queries. The queries generated in the tests were answered by an automatic oracle by means of the target ontology O_iM_j \ D∗.

Observations. The difference w.r.t. the number of queries per test run between the better and the worse strategy in {SPL,ENT} was absolutely significant, with a maximum of 2300% in EXP-4 and averages of 190% to 1145% throughout all four experiments (Figure 24.2). Moreover, results show that varying quality of fault probabilities in {EXP-1,EXP-3} compared to {EXP-2,EXP-4} clearly affected the performance of ENT and SPL (see first two rows in Figure 24.2). This perfectly motivates the application of RIO.

Results of both experimental sessions, ⟨EXP-1,EXP-2⟩ and ⟨EXP-3,EXP-4⟩, are summarized in Figures 24.1(a) and 24.1(b), respectively. The figures show the (average) number of queries asked by RIO and the (average) differences to the number of queries needed by the per-session better and worse strategy in {SPL,ENT}, respectively. The results illustrate clearly that the average performance achieved by RIO was always substantially closer to the better than to the worse strategy. In both EXP-1 and EXP-2, throughout 74% of 27 debugging sessions, RIO worked as efficiently as the best strategy (Figure 24.2). In 26% of the cases in EXP-2, RIO even outperformed both other strategies; in these cases, RIO could save more than 20% of user interaction on average compared to the best other strategy. In one scenario in EXP-1, it took ENT 31 and SPL 13 queries to finish, whereas RIO required only 6 queries, which amounts to an improvement of more than 80% and 53%, respectively. In ⟨EXP-3,EXP-4⟩, the savings achieved by RIO were even more substantial. RIO manifested superior behavior to both other strategies in 29% and 71% of cases, respectively. No less remarkably, in 100% of the tests in EXP-3 and EXP-4, RIO was at least as efficient as the best other strategy. Recalling Figure 24.2, this means that RIO can avoid query overheads of over 2000%. Table 24.1, which provides average values for q, react and debug per strategy, demonstrates that RIO is the best choice in all experiments w.r.t. q. Consequently, RIO is suitable for both good and poor meta information.

image

Table 24.1: Average time (ms) for the entire debugging session (debug), average time (ms) between two successive queries (react), and average number of queries (q) required by each strategy.

image

Figure 24.1: The bars show the avg. number of queries (q) needed by RIO, grouped by matching tools. The distance from the bar to the lower (upper) end of the whisker indicates the avg. difference of RIO to the queries needed by the per-session better (worse) strategy of SPL and ENT, respectively.

image

Table 24.2: Percentage rates indicating which strategy performed best/better w.r.t. the required user interaction, i.e. number of queries. EXP-1 and EXP-2 involved 27, EXP-3 and EXP-4 seven debugging sessions each. q_str denotes the number of queries needed by strategy str and min is an abbreviation for min(q_SPL, q_ENT).

image

Figure 24.2: Box-Whisker Plots presenting the distribution of overhead (q_w − q_b)/q_b ∗ 100 (in %) per debugging session of the worse strategy q_w := max(q_SPL, q_ENT) compared to the better strategy q_b := min(q_SPL, q_ENT). Mean values are depicted by a cross.
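
The overhead measure of Figure 24.2 and the savings quoted for the EXP-1 scenario (ENT 31, SPL 13, RIO 6 queries) are simple ratios; the following lines show the arithmetic. The function names are illustrative.

```python
def overhead_percent(q_spl, q_ent):
    """Overhead of the worse over the better of SPL and ENT, in percent."""
    q_better, q_worse = min(q_spl, q_ent), max(q_spl, q_ent)
    return (q_worse - q_better) / q_better * 100


def savings_percent(q_rio, q_other):
    """Share of queries RIO saves relative to another strategy, in percent."""
    return (1 - q_rio / q_other) * 100


# Query counts of the EXP-1 scenario mentioned in the text.
overhead = overhead_percent(q_spl=13, q_ent=31)
saving_vs_ent = savings_percent(6, 31)
saving_vs_spl = savings_percent(6, 13)
```

For this scenario the overhead of ENT over SPL is about 138%, and RIO saves about 80.6% and 53.8% of the queries relative to ENT and SPL, in line with the “more than 80% and 53%” stated above.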

As to time aspects, RIO manifested good performance, too. Since the times consumed in ⟨EXP-1,EXP-2⟩ are almost negligible, consider the more meaningful results obtained in ⟨EXP-3,EXP-4⟩. While the best reaction time in both experiments was achieved by SPL, we can clearly see that SPL was significantly inferior to both ENT and RIO concerning q and debug. RIO revealed the best debugging time in EXP-4, and needed only 2.2% more time than the best strategy (ENT) in EXP-3. However, if we assume the user being capable of reading and answering a query in, e.g., 30 sec on average, which is already quite fast, then the overall time savings of RIO compared to ENT in EXP-3 would already account for 5%. Doing the same thought experiment for EXP-4, RIO would save 25% (w.r.t. ENT) and 50% (w.r.t. SPL) of debugging time on average. All in all, the measured times confirm that RIO is well suited for interactive debugging.

A similar interactive technique was presented in [NRG12], where a user is successively asked single ontology axioms in order to obtain a partition of a given ontology into a set of desired and a set of undesired consequences. However, given an inconsistent/incoherent ontology, this technique starts from an empty set of desired consequences aiming at adding to this set only axioms which preserve coherence, whereas our approach starts from the complete ontology aiming at finding a minimal set of axioms responsible for the violation of pre-specified requirements.

An approach for alignment debugging was proposed in [Mei11]. This work describes approximate algorithms for computing a “local optimal diagnosis” and complete methods to discover a “global optimal diagnosis”. Optimality in this context refers to the maximum sum of confidences in the resulting coherent alignment. In contrast to our framework, diagnoses are determined automatically without support for user interaction. Instead, techniques for manual revision of the alignment as a procedure independent from debugging are demonstrated.

We have shown problems of state-of-the-art interactive ontology debugging strategies w.r.t. the usage of unreliable meta information. To tackle this issue, we proposed a learning strategy which combines the benefits of existing approaches, i.e. high potential and low risk. Depending on the performance of the diagnosis discrimination actions, the trust in the a-priori information is adapted. Tested under various conditions, our algorithm revealed good scalability and reaction time as well as superior average performance to two common approaches in the field in all tested cases w.r.t. required user interaction. Highest achieved savings amounted to more than 80% and user interaction overheads resulting from the wrong choice of strategy of up to 2300% could be saved. In the hardest test cases, the new strategy was not only on average, but in 100% of the test cases at least as good as the best other strategy.

A Direct Approach to Sequential Diagnosis of High Cardinality Faults in Knowledge Bases

In this part we cover the topic of efficiently dealing with KB debugging problems involving high cardinality faults. This part relies on material [SFRF14c, SFRF14a, SFRF14b] published in the Proceedings of the 21st European Conference on Artificial Intelligence (ECAI 2014), in DX 2014 - 25th International Workshop on Principles of Diagnosis and in the Proceedings of the Third International Workshop on Debugging Ontologies and Ontology Mappings (WoDOOM14), respectively.41

Model-based diagnosis (MBD) [Rei87] is a general method which can be used to find errors in hardware, software, knowledge-bases (KBs), orchestrated web-services, configurations, etc. In particular, ontology (KB) debugging tools [KPHS07, FS05, HPS08] can localize a (potential) fault by finding sets of axioms D ⊆ K called diagnoses for the KB K. Diagnoses are generated using minimal conflict sets, i.e. irreducible sets of axioms CS ⊆ K that violate some requirements, by using a consistency checker (black-box approach). At least all axioms of a minimal diagnosis must be modified or deleted in order to formulate a fault-free knowledge-base K∗. A knowledge-base K is faulty if some requirements, such as consistency of K or the presence or absence of specific entailments, are violated.

Sequential MBD methods [dKW87] applied to KB debugging acquire additional information in order to discriminate between diagnoses [SFFR12]. Generated queries are answered by some oracle providing additional observations about the entailments of a valid KB. As various applications show, the standard methods work very satisfactorily for cases where the number of faults (minimal conflict sets) is low (single digit number), consistency checking is fast (single digit number of seconds), and sufficient possibilities for observations are available.

However, there are situations when KBs comprise a large number of faults. For example, in ontology matching scenarios two KBs with several thousands of axioms are merged into a single one. High quality matchers (e.g. [JRG11]) require the diagnosis of such substantially extended KBs, but could not apply standard diagnosis methods because of the large number of minimal diagnoses and their high cardinality. E.g. there are cases when the minimum cardinality of diagnoses is greater than 20.

In order to deal with hard diagnosis instances, we propose to relax the requirement for sequential diagnosis to compute a set of preferred minimal diagnoses, such as a set of most probable diagnoses. Instead, we compute just some set of minimal diagnoses which can be used for query generation. This allows the use of direct computation of diagnoses [SU06] without computing conflict sets. The direct approach was applied for non-interactive diagnosis of ontologies [DQPS11, BKP12] and constraints [FSZ11]. A recent approach [SKFP12] does not generate the standard HS-TREE, but still depends on the minimization of conflict sets, i.e. |D| minimized conflicts have to be discovered. Consequently, if |D| ≫ m, substantially more consistency checks are required, where |D| is the cardinality of the minimal diagnosis and m is the number of minimal diagnoses required for query generation.

Since we are replacing the set of most probable diagnoses by just a set of minimal diagnoses, some important practical questions have to be addressed. (1) Is a substantial number of additional queries needed, (2) is this approach able to locate the faults, and (3) how efficient is this approach?

In order to answer these questions we have exploited the most difficult diagnosis problems of the ontology alignment competition [EFvH+11]. Our evaluation shows that sequential diagnosis by direct diagnosis generation needs approximately the same number of queries (±1) in order to identify the faults. This evaluation was carried out for cases where the standard sequential diagnosis method was applicable. Furthermore, the evaluation shows that our proposed method is able to locate faults in all cases correctly, particularly in those cases where debugging sessions by means of the standard method are not successful (due to overwhelming time or space consumption). Moreover, for the hardest cases (i.e., more than 4 minutes overall debugging time), the additional computation costs introduced by the direct method apart from the costs needed for theorem proving are less than 50%, i.e. reasoning costs amount to more than two thirds of overall computation time.

The rest of Part VI is organized as follows: Chapter 28 gives a brief introduction to the main notions of sequential KB diagnosis. The details of the suggested algorithms are presented in Chapter 29. In Chapter 30 we provide evaluation results whereupon Chapter 31 gives a conclusion.

In the following we present (1) the fundamental concepts regarding the diagnosis of KBs and (2) the interactive localization of axioms which must be changed.

Diagnosis of KBs. Given a knowledge-base K which is a set of logical sentences (axioms), the user can specify particular requirements during the knowledge-engineering process. The most basic requirement is satisfiability, i.e. a logical model exists. A further frequently employed requirement is coherence. Coherence requires that there exists a model s.t. the interpretation of every unary predicate is non-empty. In other words, if we add ∃Y a(Y) to K for every unary predicate a, then the resulting KB must be satisfiable. In addition, as is common practice in software engineering, the knowledge-engineer (user for short) may specify test cases. Test cases are axioms which must (not) be entailed by a valid KB.

Definition 28.1. Given a set of axioms P (called positive test cases) and a set of axioms N (called negative test cases), a knowledge-base K∗ is valid iff it fulfills the following requirements:

1. K∗ is satisfiable (and coherent if required)

2. K∗ |= p ∀p ∈ P

3. K∗ ⊭ n ∀n ∈ N

Let us assume that there is a non-valid KB K. Then a set of axioms D ⊆ K must be removed and possibly some axioms EX must be added by the user s.t. the updated KB K∗ becomes valid, i.e. K∗ := (K \ D) ∪ EX. The goal of diagnosis is to inform the user which sets of axioms D (each of which is called a diagnosis) must be changed. In order to prevent unnecessary changes, D is often required to be subset-minimal, i.e. the set should be as small as possible. Furthermore, we allow the user to define a set of axioms B (called the background theory) which must not be changed (i.e. the correct axioms). More formally:

Definition 28.2. Given a diagnosis problem instance (DPI) specified by ⟨K, B, P, N⟩ where

• K is a knowledge-base,

• B a background theory,

• P a set of axioms which must be implied by a valid knowledge-base K∗ and

• N a set of axioms, each of which must not be implied by K∗,

D ⊆ K is a diagnosis w.r.t. ⟨K, B, P, N⟩ iff K \ D can be extended by a set of logical sentences EX such that:

1. (K \ D) ∪ B ∪ EX is consistent

2. (K \ D) ∪ B ∪ EX |= p for all p ∈ P

3. (K \ D) ∪ B ∪ EX ⊭ n for all n ∈ N

D is a minimal diagnosis iff there is no D′ ⊂ D such that D′ is a diagnosis. D is a minimum cardinality diagnosis iff there is no diagnosis D′ such that |D′| < |D|.42

The following proposition of [SFFR12] characterizes diagnoses by replacing EX with the positive test cases.

Corollary 28.1. Given a DPI ⟨K, B, P, N⟩, a set of axioms D ⊆ K is a diagnosis w.r.t. ⟨K, B, P, N⟩ iff

(K \ D) ∪ B ∪ {⋀p∈P p}

is satisfiable (coherent) and

∀n ∈ N : (K \ D) ∪ B ∪ {⋀p∈P p} ⊭ n
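The diagnosis test of Definition 28.2, with EX replaced by the positive test cases, can be illustrated on a toy logic. The following is a minimal sketch, assuming ground Horn rules and forward chaining as a stand-in for a real reasoner; the rule encoding, the atom "FALSE" for a contradiction, the function names and the three example axioms are hypothetical and not taken from the text.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# A ground Horn rule "body -> head"; the head "FALSE" marks a contradiction.
Rule = Tuple[FrozenSet[str], str]


def closure(rules: Iterable[Rule]) -> Set[str]:
    """Forward-chain all rules to a fixpoint and return the derived atoms."""
    rules = list(rules)
    derived: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if body <= derived and head not in derived:
                derived.add(head)
                changed = True
    return derived


def is_diagnosis(K, B, P, N, D) -> bool:
    """True iff (K minus D) plus B plus P is consistent and entails no n in N."""
    facts = {(frozenset(), p) for p in P}  # positive test cases added as facts
    derived = closure((set(K) - set(D)) | set(B) | facts)
    return "FALSE" not in derived and not (set(N) & derived)


# Hypothetical toy DPI: three rules, one background fact, one test case of each kind.
ax1: Rule = (frozenset({"a"}), "b")
ax2: Rule = (frozenset({"b"}), "c")
ax3: Rule = (frozenset({"a"}), "d")
K = {ax1, ax2, ax3}
B = {(frozenset(), "a")}
P = {"d"}
N = {"c"}
```

In this toy DPI the empty set is not a diagnosis because c is derivable, whereas removing either ax1 or ax2 blocks the derivation of c. Both {ax1} and {ax2} are therefore minimal diagnoses.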

Hereafter we assume that a diagnosis always exists.

Proposition 28.1. A diagnosis D w.r.t. a DPI ⟨K, B, P, N⟩ exists iff B ∪ {⋀p∈P p} is consistent (coherent) and ∀n ∈ N : B ∪ {⋀p∈P p} ⊭ n.

For the computation of diagnoses conflict sets are usually employed to constrain the search space. A conflict set is the part of the KB that preserves the inconsistency/incoherency.

Definition 28.3. Given a DPI ⟨K, B, P, N⟩, a set of axioms CS ⊆ K is a conflict set w.r.t. ⟨K, B, P, N⟩ iff CS ∪ B ∪ {⋀p∈P p} is inconsistent (incoherent) or there is an n ∈ N such that CS ∪ B ∪ {⋀p∈P p} |= n. CS is minimal iff there is no CS′ ⊂ CS such that CS′ is a conflict set.43

Minimal conflict sets can be used to compute the set of minimal diagnoses as it is shown in [Rei87]. The idea is that each diagnosis must include at least one element of each minimal conflict set.44

Proposition 28.2. D is a (minimal) diagnosis w.r.t. the DPI ⟨K, B, P, N⟩ iff D is a (minimal) hitting set for the set of all minimal conflict sets w.r.t. ⟨K, B, P, N⟩.

For the generation of a minimal conflict set, diagnosis systems use a divide-and-conquer method (e.g. QUICKXPLAIN [Jun04], for short QX), which we discussed in Sections 4.4.1 and 4.4.2. In the worst case, QX requires O(|CS| log(|K|/|CS|)) calls to the reasoner, where CS is the returned minimal conflict set.

The computation of minimal diagnoses in KB debugging systems is implemented using Reiter’s Hitting Set HS-TREE algorithm [Rei87] (cf. Algorithm 2 in Chapter 4). The algorithm constructs a directed tree from the root to the leaves, where each non-leaf node is labeled with a minimal conflict set and leaf nodes are labeled by ✓ (no conflicts) or × (pruned).

Each (✓) node corresponds to a minimal diagnosis. The minimality of the diagnoses is guaranteed by the minimality of conflict sets used for labeling the nodes, the pruning rule and the breadth-first strategy of the tree generation. Moreover, because of the breadth-first strategy the minimal diagnoses are generated in increasing order of their cardinality. Under the assumption that diagnoses with lower cardinality are more probable than those with higher cardinality, HS-TREE generates most probable minimal diagnoses first.

Diagnoses Discrimination. For many real-world DPIs, a diagnosis system can return a large number of (minimal) diagnoses. Each minimal diagnosis corresponds to a different set of axioms in the given KB K. All the axioms of any minimal diagnosis might be deleted from K or changed accordingly in order to formulate a valid K∗. The user may extend the test cases P and N such that diagnoses are eliminated, thus identifying exactly the correct minimal diagnosis. For discriminating between minimal diagnoses we assume that the user knows some of the sentences a valid K∗ must (not) entail, that is, the user serves as an oracle.

Property 3. Given a DPI ⟨K, B, P, N⟩, a set of diagnoses D w.r.t. ⟨K, B, P, N⟩, and a logical sentence Q representing the oracle query K∗ |= Q. If the oracle gives the answer yes then Di ∈ D is a diagnosis w.r.t. ⟨K, B, P ∪ {Q}, N⟩ iff both conditions hold:

(K \ Di) ∪ B ∪ {⋀p∈P p} ∪ {Q} is consistent (coherent)
∀n ∈ N : (K \ Di) ∪ B ∪ {⋀p∈P p} ∪ {Q} ⊭ n

If the oracle gives the answer no then Di ∈ D is a diagnosis w.r.t. ⟨K, B, P, N ∪ {Q}⟩ iff both conditions hold:

(K \ Di) ∪ B ∪ {⋀p∈P p} is consistent (coherent)
∀n ∈ (N ∪ {Q}) : (K \ Di) ∪ B ∪ {⋀p∈P p} ⊭ n

However, many different queries might exist for some set of diagnoses D with |D| ≥ 2, in the extreme case exponentially many (in |D|). To select the best query, the authors in [SFFR12] suggest two query selection strategies: SPLIT-IN-HALF (SPL) and ENTROPY (ENT). The first strategy is a greedy approach preferring queries which allow half of the diagnoses in D to be removed, whichever answer is given. The second is an information-theoretic measure, which estimates the information gain for both outcomes of each query and returns the query that maximizes the expected information gain. The prior fault probabilities required for evaluating the ENT measure can be obtained from statistics of previous diagnosis sessions. For instance, if the user has problems applying “∃”, then the diagnosis logs are likely to contain more repairs of axioms including this quantifier. Consequently, the prior fault probabilities of axioms including “∃” should be higher. Given the fault probabilities of axioms, one can calculate prior fault probabilities of diagnoses as well as evaluate ENT (see [SFFR12] for more details).

The queries for both strategies are constructed by exploiting the so-called classification and realization services provided by description logic reasoners. Given a KB K and interpreting unary predicates as classes (resp. concepts), classification generates the inheritance (subsumption) tree, i.e. the entailments K |= ∀X p(X) → q(X) if p is a subclass of q. Realization computes, for each individual name t occurring in a KB K, a set of most specific classes p s.t. K |= p(t) (see [BCM+07] for details).
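The two query selection strategies can be made concrete with a small scoring sketch. It assumes that each query Q partitions the leading diagnoses into those predicting the answer yes (DP), those predicting no (DN) and those predicting neither (D0). The two scoring formulas below are one common formulation and are an assumption here, since this section does not state them; see [SFFR12] for the authoritative definitions. All function and variable names are hypothetical.

```python
import math


def split_in_half_score(dp, dn, d0):
    """Lower is better: balanced DP/DN and few undecided diagnoses."""
    return abs(len(dp) - len(dn)) + len(d0)


def entropy_score(dp, dn, d0, prob):
    """Lower is better: answer probabilities near 1/2, little mass in D0.

    Assumes undecided diagnoses contribute half of their mass to each answer.
    """
    p0 = sum(prob[d] for d in d0)
    p_yes = sum(prob[d] for d in dp) + p0 / 2
    p_no = sum(prob[d] for d in dn) + p0 / 2
    plogp = sum(p * math.log2(p) for p in (p_yes, p_no) if p > 0)
    return plogp + p0 + 1


# Hypothetical leading diagnoses with (already normalised) probabilities.
prob = {"D1": 0.5, "D2": 0.3, "D3": 0.2}
# Hypothetical queries given as partitions (DP, DN, D0).
queries = {
    "Q1": ({"D1"}, {"D2", "D3"}, set()),
    "Q2": ({"D1", "D2"}, set(), {"D3"}),
}

best_spl = min(queries, key=lambda q: split_in_half_score(*queries[q]))
best_ent = min(queries, key=lambda q: entropy_score(*queries[q], prob))
```

Under these assumptions both measures prefer Q1: it splits the three diagnoses as evenly as possible and both of its answers have probability 1/2.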

Due to the number of diagnoses and the complexity of diagnosis computation, not all diagnoses are exploited for generating queries but only a set of minimal diagnoses of size less than or equal to some (small) predefined number m [SFFR12]. We call this set the leading diagnoses and denote it by D from now on. This set comprises the (most probable) minimal diagnoses which represent the set of all diagnoses.

The sequential KB debugging process can be sketched as follows. As input a DPI and some meta information, such as prior fault estimates F, query selection strategy sQ (SPL or ENT) and stop criterion σ, are given. As output a minimal diagnosis is returned that has a posterior probability of at least 1 − σ. For sufficiently small σ this means that the returned diagnosis is highly probable whereas all other minimal diagnoses are highly improbable.

1. Using QX and HS-TREE, compute a set of leading diagnoses D of cardinality min(m, a), where a is the number of all minimal diagnoses w.r.t. the DPI and m is the number of leading diagnoses predefined by a user.

2. Use the prior fault probabilities F and the already specified test cases to compute (posterior) probabilities of diagnoses in D by the Bayesian Rule (cf. [SFFR12]).

3. If some diagnosis D ∈ D has a probability greater than or equal to 1 − σ or the user accepts D as the axioms to be changed then stop and return D.

4. Use D to generate a set of queries and select the best query Q according to sQ.

5. Ask the user K∗ |= Q and, depending on the answer, add Q either to P or to N.

6. Remove elements from D violating the newly acquired test case.

7. Repeat at Step 1.
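Steps 2, 3 and 6 of the process above can be sketched numerically. The sketch assumes that axioms fail independently, so that the prior of a diagnosis is the product of the fault probabilities of its axioms and the complements of all other axioms, and that a diagnosis predicting neither answer receives likelihood 1/2. Both are assumptions in the spirit of [SFFR12] and are not spelled out in this section. The fault probabilities of 0.1, the choice of σ and the function names are hypothetical.

```python
from math import prod


def prior(diagnosis, kb, fault_prob):
    """Prior of a diagnosis, assuming axioms fail independently."""
    return prod(
        fault_prob[ax] if ax in diagnosis else 1.0 - fault_prob[ax] for ax in kb
    )


def normalise(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}


def bayes_update(probs, agree, disagree):
    """Posterior after an answer: likelihood 1 (agree), 0 (disagree), else 1/2."""
    weights = {}
    for d, p in probs.items():
        likelihood = 1.0 if d in agree else 0.0 if d in disagree else 0.5
        if likelihood > 0:  # invalidated diagnoses are removed (Step 6)
            weights[d] = likelihood * p
    return normalise(weights)


kb = ["ax1", "ax2", "ax3", "ax4", "ax5"]
fault_prob = {ax: 0.1 for ax in kb}  # hypothetical uniform prior fault estimates F
D1 = frozenset({"ax2", "ax3"})
D2 = frozenset({"ax3", "ax4"})
D3 = frozenset({"ax1", "ax4", "ax5"})

probs = normalise({d: prior(d, kb, fault_prob) for d in (D1, D2, D3)})
after = bayes_update(probs, agree={D2, D3}, disagree={D1})

sigma = 0.15  # hypothetical stop criterion
accepted = [d for d, p in after.items() if p >= 1 - sigma]  # Step 3
```

With these numbers the answer that contradicts D1 raises the probability of D2 to 0.9, which exceeds 1 − σ, so the loop would stop and return D2.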

The novelty of our approach is the interactivity combined with the direct calculation of diagnoses. To this end we will utilize an “inverse” version of the QX algorithm [Jun04] called INV-QX and an associated “inverse” version of HS-TREE termed INV-HS-TREE.

This combination of algorithms was first used in [FSZ11]. However, we introduced two modifications: (i) a depth-first search strategy instead of breadth-first and (ii) a new pruning rule which moves axioms from K to B instead of just removing them from K, since not adding them to B might result in losing some of the minimal diagnoses.

INV-QX – Key Idea. INV-QX relies on the monotonic semantics of the used knowledge representation language. The algorithm takes a DPI ⟨K, B, P, N⟩ and a ranking heuristic ≺ as input and outputs either one minimal diagnosis or 'no diagnosis exists'. The ranking heuristic assigns a fault probability to each axiom in K, if this information is available; otherwise every axiom has the same rank.

The main idea behind Algorithm 15 is to start with the set D0 = ∅ and extend it until a subset of axioms D ⊆ K is found such that D is a minimal diagnosis with respect to Definition 28.2. In the first steps (lines 1-3), Algorithm 15 defines a (potentially) faulty set of axioms K′ and a set B′ of axioms assumed to be correct and sorts K′ w.r.t. the ranking heuristic (SORT). Next, INV-QX verifies whether a diagnosis exists for the input data (line 4), i.e. whether the conditions given by Proposition 28.1 are met. This is accomplished by a call to the VERIFY function (defined in line 18 ff.), which requires a reasoner that implements consistency checking (ISCONSISTENT) and can decide whether a set of axioms K′ entails some axiom n or not (ENTAILS). Concretely, VERIFY tests for given arguments B (set of correct axioms), D (potential minimal diagnosis), K (potentially faulty set of axioms) and N (negative test cases) whether the set D is a diagnosis or not according to Corollary 28.1. In case no diagnosis exists, the algorithm returns 'no diagnosis exists'; otherwise it calls the function FINDDIAG in line 6.

FINDDIAG (line 7) is the main function of the algorithm which takes six arguments as input. The values of the arguments B, K and N remain constant during the recursion and are required only for the verification of requirements, i.e. calls to the VERIFY function. The values of D (potential diagnosis), ∆ (axioms most recently added to D) and K∆ (part of the original knowledge base that is currently analyzed for the inclusion of axioms that are elements of the sought minimal diagnosis), on the other hand, change throughout the recursive calls of FINDDIAG. The two latter sets are obtained by recurrently partitioning the set K∆ (SPLIT and GETELEMENTS in lines 12-14). In most implementations SPLIT is specified so as to return k = ⌊|K∆|/2⌋, which causes the splitting of K∆ into partitions of equal cardinality (this results in the best worst case time complexity [Jun04]). The algorithm pursues this divide-and-conquer strategy (lines 15 and 16) until it identifies that the set D is a diagnosis (line 8). In further iterations the algorithm minimizes this diagnosis by splitting it into sub-diagnoses of the form D = D′ ∪ K∆, where K∆ contains only one axiom. In case D is a diagnosis and D′ is not, the algorithm decides that K∆ is a subset of the sought minimal diagnosis. Just like the original QX algorithm, INV-QX always terminates and returns a minimal diagnosis for a given DPI (provided one exists).

INV-QX requires O(|D| log(|K|/|D|)) calls to a reasoner to find a minimal diagnosis D. Moreover, in contrast to SAT or CSP methods, e.g. [NPQW13], INV-QX can be used to compute diagnoses in cases when satisfiability checking is beyond NP. For instance, reasoning for most of the KBs used in Chapter 30 is EXPTIME-complete.
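The divide-and-conquer recursion described above can be sketched as follows. This follows the prose description and is not a transcription of Algorithm 15. As a stand-in for a reasoner, the hypothetical helper make_verify decides whether D is a diagnosis by checking that D hits every minimal conflict set (cf. Proposition 28.2), using the four conflict sets of the sample DPI in Example 29.1. The names make_verify, inv_qx and find_diag are illustrative, and the test on ∆ is only there to avoid a redundant reasoner call.

```python
def make_verify(conflicts):
    """Stand-in reasoner: D is a diagnosis iff it hits every minimal conflict set."""
    return lambda d: all(cs & d for cs in conflicts)


def find_diag(d, delta, k_delta, verify):
    """Recursive core: return the part of k_delta needed to turn d into a diagnosis."""
    if delta and verify(d):      # d is already a diagnosis, nothing from k_delta needed
        return frozenset()
    if len(k_delta) == 1:        # the single remaining axiom must be in the diagnosis
        return frozenset(k_delta)
    k = len(k_delta) // 2        # SPLIT: k = floor(|K_delta| / 2)
    k1, k2 = k_delta[:k], k_delta[k:]
    d1 = find_diag(d | frozenset(k1), frozenset(k1), k2, verify)
    d2 = find_diag(d | d1, d1, k1, verify)
    return d1 | d2


def inv_qx(kb, verify):
    """Return one minimal diagnosis within the ranked list kb, or None if none exists."""
    if not verify(frozenset(kb)):    # removing every axiom of kb does not help
        return None
    if verify(frozenset()):          # the KB is already valid
        return frozenset()
    return find_diag(frozenset(), frozenset(), list(kb), verify)


KB = ["ax1", "ax2", "ax3", "ax4", "ax5"]
CONFLICTS = [
    frozenset(c)
    for c in (("ax1", "ax3"), ("ax2", "ax4"), ("ax3", "ax5"), ("ax3", "ax4"))
]
diagnosis = inv_qx(KB, make_verify(CONFLICTS))
```

With this stand-in oracle the sketch visits the same candidate sets as the trace in Example 29.1 and returns {ax2, ax3}. A real implementation would replace make_verify by calls to ISCONSISTENT and ENTAILS.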

INV-QX is a deterministic algorithm and returns one and the same minimal diagnosis if applied twice to one and the same DPI. In order to obtain a different next diagnosis, the DPI used as input for INV-QX must be modified accordingly. To this end, we employ the INV-HS-TREE algorithm.

INV-HS-TREE – Construction. The algorithm is inverse to the HS-TREE algorithm in the sense that nodes are now labeled by minimal diagnoses (instead of minimal conflict sets) and a path from the root to an open node is a partial conflict set (instead of a partial diagnosis). The algorithm constructs a directed tree from the root to the leaves, where each node nd is labeled either with a minimal diagnosis D or × (pruned), which indicates that the node is closed. For each s ∈ D there is an outgoing edge labeled by s. Let H(nd) be the set of edge labels on the path from the root to the node nd. Initially the algorithm generates an empty root node and adds it to a LIFO-queue, thereby implementing a depth-first search strategy. Until the required number m of minimal diagnoses is reached or the queue is empty, the algorithm removes the first node nd from the queue and labels nd by applying the following steps:

1. (reuse): some already computed diagnosis D ∈ D if D ∩ H(nd) = ∅; add for each s ∈ D a node to the LIFO-queue, or

2. (pruned): × if INV-QX(K \ H(nd), B ∪ H(nd), P, N) = 'no diagnosis exists' (according to Proposition 28.1), or

3. (compute): D if INV-QX(K \ H(nd), B ∪ H(nd), P, N) = D; add D to the set of leading diagnoses D and add for each s ∈ D a node to the LIFO-queue.

Reuse of known diagnoses in Step 1 and the addition of H(nd) to the background theory B in Steps 2 and 3 allow the algorithm to force INV-QX to search for a minimal diagnosis that is different from all already computed minimal diagnoses in D. So, if neither Step 1 nor Step 2 is applicable, INV-HS-TREE calls INV-QX, which is guaranteed to compute a new minimal diagnosis D which is then added to the set D.
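The three labeling steps can be sketched on top of a compact INV-QX. As before, this is an illustration under stated assumptions and not the author's implementation. The reasoner is replaced by a hypothetical conflict-set oracle, so moving H(nd) to the background theory simply means that the axioms in H(nd) are no longer candidates for removal. The optional argument known (an assumption of this sketch) carries diagnoses that are still valid from a previous iteration into the reuse check.

```python
def inv_qx(kb, verify):
    """One minimal diagnosis within the list kb, or None if no diagnosis exists."""
    def find_diag(d, delta, k_delta):
        if delta and verify(d):
            return frozenset()
        if len(k_delta) == 1:
            return frozenset(k_delta)
        k = len(k_delta) // 2
        k1, k2 = k_delta[:k], k_delta[k:]
        d1 = find_diag(d | frozenset(k1), frozenset(k1), k2)
        d2 = find_diag(d | d1, d1, k1)
        return d1 | d2

    if not verify(frozenset(kb)):
        return None
    if verify(frozenset()):
        return frozenset()
    return find_diag(frozenset(), frozenset(), list(kb))


def inv_hs_tree(kb, verify, m, known=()):
    """Depth-first search for up to m minimal diagnoses."""
    diagnoses = list(known)
    stack = [frozenset()]                 # LIFO-queue of paths H(nd); root has H = {}
    while stack and len(diagnoses) < m:
        path = stack.pop()
        # Step 1 (reuse): a known diagnosis disjoint from the path.
        label = next((d for d in diagnoses if not (d & path)), None)
        if label is None:
            # Axioms on the path stay in the theory but may not be removed.
            label = inv_qx([ax for ax in kb if ax not in path], verify)
            if label is None:             # Step 2 (pruned): path is a conflict set
                continue
            diagnoses.append(label)       # Step 3 (compute)
        for ax in sorted(label, reverse=True):
            stack.append(path | {ax})     # one successor per element of the label
    return diagnoses


KB = ["ax1", "ax2", "ax3", "ax4", "ax5"]
CONFLICTS = [
    frozenset(c)
    for c in (("ax1", "ax3"), ("ax2", "ax4"), ("ax3", "ax5"), ("ax3", "ax4"))
]
verify = lambda d: all(cs & d for cs in CONFLICTS)

leading = inv_hs_tree(KB, verify, m=2)
all_minimal = inv_hs_tree(KB, verify, m=10)
```

For the conflict sets of the sample DPI, the call with m = 2 yields {ax2, ax3} followed by {ax3, ax4}, and a sufficiently large m yields all three minimal diagnoses. The order in which successors are pushed is a free design choice. It is picked here so that the edge labeled ax2 is expanded first.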

INV-HS-TREE – Update Procedure for Interactivity. Since (1) paths in INV-HS-TREE are irrelevant and need not be maintained, and (2) only a small (linear) number of nodes/paths is in memory due to the application of a depth-first search, the update procedure after a query Q has been answered involves a reconstruction of the tree. In particular, by answering Q, m − k of (maximally) m leading diagnoses are invalidated and deleted from memory. The k still valid minimal diagnoses are used to build a new tree. To this end, the root is labeled by any of these k minimal diagnoses and a tree is constructed as described above where the k diagnoses are incorporated for the reuse check. Note that the recalculation of a diagnosis that has been invalidated by a query is impossible as in subsequent iterations a new DPI is considered which includes the answered query as a test case.

INV-HS-TREE – Comparison to HS-TREE. Since INV-QX(K, B ∪ H(nd), P, N) = 'no diagnosis exists' means H(nd) is a conflict set w.r.t. the current DPI ⟨K, B, P, N⟩, in INV-HS-TREE any path that is a conflict set is automatically closed. This makes a pruning rule similar to the one in HS-TREE, which closes a node nd given an alternative path H(nd′) to a closed node nd′ with H(nd′) ⊆ H(nd), obsolete. So, INV-HS-TREE benefits from the fact that minimality of diagnoses is independent of path-minimality, and thereby might save time for comparison of exponentially many paths over HS-TREE.

Another great advantage of INV-HS-TREE over HS-TREE is that it can be constructed using a space-saving depth-first strategy. The reason for this is again that minimality of paths (conflict sets) is irrelevant in INV-HS-TREE whereas in HS-TREE minimality of paths (diagnoses) is essential. In an implementation where successors of a node are generated one at a time in INV-HS-TREE, the space complexity of the entire tree construction is linear and amounts to O(2m) = O(m) where m is the predefined maximum number of leading diagnoses. This holds as k < m still valid diagnoses from the previous iteration are in memory, plus a path in the tree can comprise a maximum of m nodes corresponding to different (reused or new) diagnoses before the search is stopped (|D| = m). No conflict sets are stored.

For HS-TREE, by contrast, the worst-case space complexity is exponential, i.e. O(|CSmax|^d), where |CSmax| is the size of the minimal conflict set with maximum cardinality (among all minimal conflict sets w.r.t. the given DPI) and d is the tree depth where m minimal diagnoses have been generated.

The crucial disadvantage of INV-HS-TREE compared to HS-TREE is that the former cannot guarantee the computation of diagnoses in a special order, e.g. minimum cardinality or maximum fault probability first.

Figure 29.1: INV-QX recursion tree. Each node shows values of FINDDIAG input variables as well as the result of the VERIFY function called in line 8.

Figure 29.2: Identification of the target diagnosis [ax 3, ax 4] using INV-HS-TREE.

Example 29.1 Consider a DPI with the following knowledge base K:

[axioms ax 1, ..., ax 5 of K; listing not reproduced]

the background knowledge B = {a(v), b(w), c(s)}, one positive P = {d(v)} and one negative N = {e(w)} test case.

Let us first show how a minimal diagnosis is computed by INV-QX (see Figure 29.1). The algorithm starts with an empty diagnosis D = ∅ and K∆ containing all axioms of K (node 1). VERIFY called in line 8 returns false since (B ∪ P) ∪ (K \ ∅) is inconsistent. Since moreover |K∆| ≠ 1 (line 10), the algorithm splits K∆ into {ax 1, ax 2} and {ax 3, ax 4, ax 5} (lines 12-14) and passes the sub-problem (line 15) to the next level of recursion (node 2). Since the set D = {ax 1, ax 2} is not a diagnosis, i.e. the KB (B ∪ P) ∪ (K \ D) is inconsistent, and |K∆| = |{ax 3, ax 4, ax 5}| ≠ 1, the problem in K∆ is split one more time (lines 12-14). On the second level of recursion (node 3) the set D is a diagnosis, yet not a minimal one. The function VERIFY returns true and the algorithm starts to analyze the found diagnosis. To this end, it verifies whether the last extension of the set D is a subset of a minimal diagnosis (node 4). Since the extension includes only one axiom ax 3 and the extended set {ax 1, ax 2} is not a diagnosis, the algorithm concludes that ax 3 must be an element of a minimal diagnosis. The leftmost branch of the recursion tree terminates and returns {ax 3}. This axiom is added to the set D and the algorithm starts investigating whether the two axioms {ax 1, ax 2} also belong to a minimal diagnosis (node 5). First, it tests the set {ax 3, ax 1} (node 6), which is not a diagnosis, and in the next iteration it identifies {ax 3, ax 2} as a minimal diagnosis in node 7, which is the final output of INV-QX.

Figure 29.3: Identification of the target diagnosis [ax 4, ax 3] using HS-TREE and QX computing conflicts on-demand. All computed node labels are denoted with C and all reused with R.

In general, for the sample DPI there are three minimal diagnoses {D1 : [ax 2, ax 3], D2 : [ax 3, ax 4], D3 : [ax 1, ax 4, ax 5]} and four minimal conflict sets {CS1 : ⟨ax 1, ax 3⟩, CS2 : ⟨ax 2, ax 4⟩, CS3 : ⟨ax 3, ax 5⟩, CS4 : ⟨ax 3, ax 4⟩}.
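The relation between these two lists (Proposition 28.2) can be checked mechanically. The brute-force enumeration below is only a sanity check for small examples and is not how HS-TREE or INV-HS-TREE work; the function name is hypothetical.

```python
from itertools import combinations


def minimal_hitting_sets(conflicts):
    """Enumerate all subset-minimal hitting sets by increasing cardinality."""
    universe = sorted(set().union(*conflicts))
    found = []
    for size in range(1, len(universe) + 1):
        for cand in map(frozenset, combinations(universe, size)):
            hits_all = all(cand & cs for cs in conflicts)
            # Smaller hitting sets are found first, so a superset is never minimal.
            if hits_all and not any(h <= cand for h in found):
                found.append(cand)
    return found


CONFLICTS = [
    frozenset({"ax1", "ax3"}),  # CS1
    frozenset({"ax2", "ax4"}),  # CS2
    frozenset({"ax3", "ax5"}),  # CS3
    frozenset({"ax3", "ax4"}),  # CS4
]
diagnoses = minimal_hitting_sets(CONFLICTS)
```

The enumeration returns exactly D1, D2 and D3, in increasing order of cardinality.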

Now we show how INV-HS-TREE can be applied to find the (correct) diagnosis that allows the formulation of a valid KB (with the desired semantics in terms of entailments and non-entailments). Assume that the number of leading diagnoses required for query generation is set to m = 2. Applied to the sample DPI, INV-HS-TREE computes a minimal diagnosis D1 := [ax 2, ax 3] = INV-QX(K, B, P, N) to label the root node, see Figure 29.2. Next, it generates one successor node that is linked with the root by an edge labeled with ax 2. For this node INV-QX(K \ {ax 2}, B ∪ {ax 2}, P, N) yields a minimal diagnosis D2 := [ax 3, ax 4] disjoint with {ax 2}. Now |D| = 2 and a query is generated and answered as in Figure 29.2. Adding c(w) to the negative test cases invalidates D1 since (K \ D1) ∪ B ∪ P |= c(w). In the course of the update, D1 is deleted and D2 is used as the root of a new tree. An edge labeled with ax 3 is created and diagnosis D3 := [ax 1, ax 4, ax 5] is generated. After the answer to the second query is added to the positive test cases, D3 is invalidated and all outgoing edge labels ax 3, ax 4 of the root D2 of the new tree are conflict sets for the current DPI ⟨K, B, {d(v), ∀X a(X) → c(X)}, {e(w), c(w)}⟩, i.e. all leaf nodes are labeled by × and the tree construction is complete. So, D2 is returned as its probability is 1.

Finally, let us compare the performance of HS-TREE [Rei87] with that of INV-HS-TREE. Applied to our sample DPI, the standard interactive diagnosis process using HS-TREE first calls QX [Jun04], which returns a minimal conflict set ⟨ax 1, ax 3⟩ (Figure 29.3). This minimal conflict set is used to label the root node of the HS-TREE. By reuse (R) of already computed minimal conflict sets or further calls (C) to QX (if there is no conflict set to reuse) the algorithm extends the HS-TREE until m = 2 leading minimal diagnoses D := {D1, D2} for the DPI are computed. To discriminate between diagnoses in D, the query K∗ |= c(w) is computed. Given the answer no, D1 is invalidated, which is reflected by the closing of the corresponding node in the tree (label ×). The second iteration considers the new DPI ⟨K, B, {d(v)}, {e(w), c(w)}⟩ and involves further expansion of (open nodes in) the tree under consideration of the pruning rule until the size of leading diagnoses D is 2, i.e. {D2, D3}. After the positive answer to the second query and closing of the invalidated diagnosis D3, the recalculation of D (not shown in Figure 29.3) yields no further minimal diagnoses. So, the algorithm terminates and returns D2. As we can see, HS-TREE comprises many more intermediate nodes than INV-HS-TREE. This leads to a dramatic difference in memory consumption between the two approaches.

We evaluated our approach DIR (based on INV-QX and INV-HS-TREE) versus the standard technique STD [SFFR12] (based on QX and HS-TREE) using a set of KBs created by automatic matching systems. Given two knowledge bases Ki and Kj, a matching system outputs an alignment Mij which is a set of correspondences between semantically related entities of Ki and Kj. Let Q(K) denote the set of all elements of K for which correspondences can be produced, i.e. names of predicates. Each correspondence is a tuple ⟨xi, xj, r, v⟩, where xi ∈ Q(Ki), xj ∈ Q(Kj) and xi, xj have the same arity, r ∈ {←, ↔, →} is a logical operator and v ∈ [0, 1] is a confidence value. The latter expresses the probability of a correspondence to be correct. Let X be a vector of distinct logical variables with a length equal to the arity of xi, then each ⟨xi, xj, r, v⟩ ∈ Mij is translated to the axiom ∀X xi(X) r xj(X). Let K(Mij) denote the set of axioms resulting from such a translation for the alignment Mij. Then the result of the matching process is an aligned KB Kij = Ki ∪ K(Mij) ∪ Kj.
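The translation of correspondences into axioms can be illustrated with a few lines of code. This is a minimal sketch in which axioms are plain strings; the class name, the field names, the variable naming scheme X1, X2, ... and the predicate names in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correspondence:
    """A tuple <x_i, x_j, r, v> between predicates of equal arity."""
    x_i: str
    x_j: str
    relation: str       # one of "←", "↔", "→"
    confidence: float   # v in [0, 1]
    arity: int = 1


def to_axiom(c: Correspondence) -> str:
    """Translate a correspondence into the axiom 'forall X: x_i(X) r x_j(X)'."""
    variables = ", ".join(f"X{n}" for n in range(1, c.arity + 1))
    return f"∀{variables} {c.x_i}({variables}) {c.relation} {c.x_j}({variables})"


def aligned_kb(k_i, k_j, alignment):
    """K_ij = K_i ∪ K(M_ij) ∪ K_j, with KBs represented as sets of axiom strings."""
    return set(k_i) | {to_axiom(c) for c in alignment} | set(k_j)


# Hypothetical alignment between two tiny KBs.
m_ij = [
    Correspondence("h_heart", "m_heart", "↔", 0.9),
    Correspondence("h_part_of", "m_part_of", "→", 0.6, arity=2),
]
k_ij = aligned_kb({"∀X1 h_heart(X1) → h_organ(X1)"}, set(), m_ij)
```

The confidence value is not part of the generated axiom. It is kept on the correspondence so that it can later serve as meta information for the diagnosis process.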

The KBs considered in this section were created by ontology matching systems participating in the Ontology Alignment Evaluation Initiative (OAEI) 2011 [EFvH+11]. Each matching experiment in the framework of OAEI represents a scenario in which a user obtains an alignment Mij by means of some (semi)automatic tool for two real-world ontologies Ki and Kj. The latter are KBs expressed by the Web Ontology Language (OWL) [GHM+08] whose semantics is compatible with the SROIQ description logic (DL). This DL is a decidable fragment of first-order logic for which a number of effective reasoning methods exist [BCM+07]. Note that SROIQ is a member of a broad family of DL knowledge representation languages. All DL KBs considered in this evaluation are expressible in SROIQ.

The goal of the first experiment was to compare the performance of STD and DIR on a set of large, but diagnostically uncomplicated KBs, generated for the Anatomy experiment of OAEI.45 In this experiment the matching systems had to find correspondences between two KBs describing the human and the mouse anatomy. K1 (Human) and K2 (Mouse) include 11545 and 4838 axioms, respectively, whereas the size of the alignment M12 produced by different matchers varies between 1147 and 1461 correspondences. Seven matching systems produced a classifiable but incoherent output. One system generated a classifiable and coherent aligned KB. However, this system employs a built-in heuristic diagnosis engine which does not guarantee to produce minimal diagnoses. That is, some axioms are removed without reason. Four systems produced KBs which could not be processed by current reasoning systems (e.g. HermiT) since these KBs could not be classified within 2 hours.

For testing the performance of our system we have to define the correct output of sequential diagnosis which we call the target diagnosis Dt. We assume that the only available knowledge is Mij together with Ki and Kj. In order to measure the performance of the matching systems the organizers of OAEI provided a golden standard alignment Mt considered as correct. Nevertheless, we cannot assume that Mt is explicitly available since the matching system would have used this information. W.r.t. the knowledge available, any minimal diagnosis w.r.t. the DPI ⟨K(Mij), Ki ∪ Kj, ∅, ∅⟩ (i.e. K(Mij) is the KB and Ki ∪ Kj is used as background theory) can be selected as Dt. However, for every alignment we selected a minimal diagnosis as target diagnosis Dt which is outside the golden standard. By this procedure we mimic cases where additional information can be acquired such that no correspondence of the golden standard is removed in order to establish coherence. We stress that this setting is unfavorable for diagnosis since providing more information by exploiting the golden standard would reduce the number of queries to ask. Consequently, we limit the knowledge to Kij and use Kij \ Dt to answer the queries.

Table 30.1: HS-TREE and INV-HS-TREE applied to Anatomy benchmark. Time is given in sec, Scoring stands for query selection strategy, Reaction is the average system reaction time between queries.

In particular, the selection of a target diagnosis Dt for each Kij output by a matching system was done in two steps: (i) compute the set of all minimal diagnoses AD w.r.t. the correspondences which are not in the golden standard, i.e. K(Mij \ Mt), and use Ki ∪ Kj ∪ K(Mij ∩ Mt) as background theory. The sets of test cases are empty, i.e. the DPI is ⟨K(Mij \ Mt), Ki ∪ Kj ∪ K(Mij ∩ Mt), ∅, ∅⟩. (ii) select Dt randomly from AD. The prior fault probabilities of axioms ax ∈ K(Mij) expressing correspondences were set to 1 − vax where vax is the confidence value provided by the matcher.
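The setup just described can be illustrated with a minimal sketch. The confidence values, the axiom names and the two candidate diagnoses below are hypothetical. The product formula in `diagnosis_prior` is an assumption (independent axiom faults, in the spirit of Section 4.6.1) and is not a restatement of the exact construction used in the experiments.

```python
import random

def fault_probabilities(confidence):
    """Map each correspondence axiom to the prior fault probability 1 - v_ax."""
    return {ax: 1.0 - v for ax, v in confidence.items()}

def diagnosis_prior(diagnosis, fault_prob):
    """Assumed independence model: axioms in the diagnosis are faulty,
    all remaining axioms are correct."""
    p = 1.0
    for ax, p_ax in fault_prob.items():
        p *= p_ax if ax in diagnosis else 1.0 - p_ax
    return p

def pick_target(all_min_diagnoses, seed=0):
    """Step (ii): choose the target diagnosis uniformly at random from AD."""
    return random.Random(seed).choice(all_min_diagnoses)

# Hypothetical matcher confidences for three correspondences.
confidence = {"c1": 0.9, "c2": 0.6, "c3": 0.75}
fault_prob = fault_probabilities(confidence)

# Hypothetical set AD of all minimal diagnoses.
AD = [frozenset({"c2"}), frozenset({"c1", "c3"})]
priors = {d: diagnosis_prior(d, fault_prob) for d in AD}
target = pick_target(AD)
```

In this toy instance the diagnosis containing only the low-confidence correspondence `c2` receives a much larger prior than the one removing two high-confidence correspondences.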

The tests were performed for the mentioned seven incoherent alignments where the input DPI is ⟨K(Mij), Ki ∪ Kj, ∅, ∅⟩ and the output is a minimal diagnosis. We tested DIR and STD with both query selection strategies SPLIT-IN-HALF (SPL) and ENTROPY (ENT) in order to evaluate the quality of fault probabilities based on confidence values. Moreover, for generating a query, the number of leading diagnoses was limited to m = 9.

The results of the first experiment are presented in Table 30.1. DIR computed Dt within 36 sec. on average and slightly outperformed STD which required 36.7 sec. The number of asked queries was equal for both methods in all but two cases resulting from KBs produced by the MapSSS system. For these KBs, DIR required one query more using ENT and one query less using SPL. In general, the results obtained for the Anatomy case show that DIR and STD have similar performance in both runtime and number of queries. Both DIR and STD identified the target diagnosis. Moreover, the confidence values provided by the matching systems appeared to be a good estimate for fault probabilities. Thus, in many cases ENT was able to find Dt using one query only, whereas SPL used 4 queries on average.

Table 30.2: Sequential diagnosis using direct computation of diagnoses. 30 Diag is the time required to find 30 minimal diagnoses, min |D| is the cardinality of a minimum cardinality diagnosis, Scoring indicates the query selection strategy, Reaction is the average system reaction time between queries, #CC is the number of consistency checks, CC gives the average time needed for one consistency check. Time is given in sec.

In the first experiment, the identification of the target diagnosis by sequential STD required the computation of 19 minimal conflicts on average. Moreover, the average size of a minimum cardinality diagnosis over all KBs in this experiment was 7. In the second experiment (see below), where STD is not applicable, the cardinality of the target diagnosis is significantly higher.

The second experiment was performed on KBs of the OAEI Conference benchmark which turned out to be problematic for STD. For these KBs we observed that the minimum cardinality diagnoses comprise 18 elements on average. In 11 of the 13 KBs of the second experiment (see Table 30.2), STD was unable to find any diagnosis within 2 hours. In the other two cases STD succeeded in finding one minimal diagnosis for csa-conference-ekaw and nine for ldoa-conference-confof. However, DIR even succeeded in finding 30 minimal diagnoses for each KB within a time acceptable for interactive diagnosis settings. Moreover, on average DIR was able to find 1 minimal diagnosis in 8.9 sec., 9 minimal diagnoses in 40.83 sec. and 30 minimal diagnoses in 107.61 sec. (see Column 2 of Table 30.2). This result shows that DIR is a stable and practically applicable method even in cases where a knowledge base comprises high-cardinality faults.

In the Conference experiment, we first selected the target diagnosis Dt for each Kij just as it was done in the described Anatomy case. Next, we evaluated the performance of sequential DIR using both query selection methods. The results of the experiment presented in Table 30.2 show that DIR found Dt for each KB. On average DIR solved the problems more efficiently using ENT than SPL because also in the Conference case the confidence values provided a reasonable estimation of axiom fault probabilities. Only in three cases did ENT require more queries than SPL.

Moreover, the experiments show that the efficiency of debugging methods depends highly on the runtime of the underlying reasoner. For instance, in the hardest case consistency checking took 93.4% of the total time whereas all other operations – including construction of the search tree, generation and selection of queries – took only 6.6% of time. Consequently, sequential DIR requires only a small fraction of computation effort. Runtime improvements can be achieved by advances in reasoning algorithms or the reduction of the number of consistency checks. Currently, in order to generate a query, DIR requires O(m ∗ |D| ∗ log(|K|/|D|)) checks to find m leading diagnoses.

A further source for improvements can be observed for the ldoa-ekaw-iasted ontology where both methods asked the same number of queries. In this case, a sequential diagnosis session using ENT query selection method required only half of the consistency checks SPL did. However, an average consistency check made in the session using ENT took almost twice as long as an average consistency check using SPL. The analysis of this ontology showed that there is a small subset of axioms (called “hot spot” in [GPS12]) which made reasoning considerably harder. As practice shows, they can be resolved by suitable queries. This can be observed in the ldoa-ekaw-iasted case where SPL acquired appropriate test cases early and thereby found  Dtfaster. Therefore, research and application of methods allowing fast identification of such hot spots might result in a significant improvement of diagnosis runtime.

In this part, we presented a sequential diagnosis method for faulty KBs which is based on the direct computation of minimal diagnoses. We were able to reduce the number of consistency checks by avoiding the computation of minimized conflict sets and by computing just some set of minimal diagnoses instead of a set of most probable diagnoses or a set of minimum cardinality diagnoses. The presented evaluation results in Chapter 30 indicate that the performance of the suggested sequential diagnosis system is either comparable with or outperforms the existing approach in terms of runtime and required number of queries in case a KB includes a large number of faults. The scalability of the algorithms was demonstrated on a set of large KBs including thousands of axioms.


In this part we provide a discussion of related work in Chapter 32,⁴⁶ summarize the contributions of this work in Chapter 33 and deal with our future work topics in Chapter 34.

To the best of our knowledge no interactive KB debugging methods that ask a user automatically selected queries have been proposed to repair faulty (monotonic) KBs so far (except for our own previous works [SF10, SFFR12, RSFF13, SFRF14c]).

Non-interactive debugging methods for KBs (ontologies) are introduced in [SHCH07, KPHS07, FS05]. Ranking of diagnoses and proposing a “best” diagnosis is presented in [KPSCG06]. This method uses a number of measures such as (a) the frequency with which a formula appears in conflict sets, (b) the impact on the KB in terms of its “lost” entailments when some formula is modified or removed, (c) provenance information about the formula and (d) syntactic relevance of a formula. All these measures are evaluated for each formula in a conflict set. The scores are then combined in a rank value which is associated with the corresponding formula. These ranks are then used by a modified hitting set tree algorithm that identifies diagnoses with a minimal rank. In this work no query generation and selection strategy is proposed if the intended diagnosis cannot be determined reliably with the given a-priori knowledge. In our work additional information is acquired until the minimal diagnosis with the intended semantics can be identified with confidence. In general, the work of [KPSCG06] can be combined with the approaches presented in our work as ranks of logical formulas can be taken into account together with other observations for calculating the prior probabilities of minimal diagnoses (see Section 4.6.1).

The idea of selecting the next query based on certain query selection measures was exploited in the generation of decision trees [Qui86] and for selecting measurements in the model-based diagnosis of circuits [dKW87] (in both works, the minimal expected entropy measure was used). We extended these methods to query selection in the domain of KB debugging [SF10] and devised further query selection measures [SFFR12, RSFF13].

An approach for the debugging of faulty aligned KBs (ontologies) was proposed by [Mei11]. An aligned KB is the union of two KBs K1 and K2 and an alignment A1,2 (which is properly formatted as a set of logical formulas, cf. Definition 18 in [Mei11]). A1,2 is a set of correspondences (each with an associated automatically computed confidence value) produced by an automatic system (an ontology matcher) given K1 and K2 as inputs where each correspondence represents a (possible) semantic relationship between a term occurring in the first and a term occurring in the second input KB. The goal of a debugging system for faulty aligned KBs is usually the determination of a subset of the alignment A′1,2 ⊂ A1,2 such that the aligned KB using A′1,2 is not faulty. In terms of our approaches, this corresponds to the setting K := A1,2 and B := K1 ∪ K2. We have already shown in [RSFF12, SFRF12] that our systems can also be applied for fault localization in aligned KBs. The work of [Mei11] describes approximate algorithms for computing a “local optimal diagnosis” and complete methods to discover a “global optimal diagnosis”. Optimality in this context refers to the maximum sum of confidences in the resulting repaired alignment A′1,2. In contrast to our framework, diagnoses are determined automatically without support for user interaction. Instead, [Mei11] demonstrates techniques for the manual revision of the alignment as a procedure independent from debugging. Another difference to our approach is the way of detecting sources of faults. We rely on a divide-and-conquer algorithm [Jun04] for the identification of a minimal conflict set C ⊆ A1,2 (in [Mei11] C is called a MIPS, cf. [FS05, SHCH07]). In the worst case the method we use exhibits only O(|C| ∗ log(|A1,2|/|C|)) calls of some function that performs a check for faults in a KB and internally uses a reasoner (in our case ISKBVALID, see Algorithm 1).
The “shrink” strategy applied in [Mei11] (which is similar to the “expand-and-shrink” method used in [KPHS07]), on the other hand, requires a worst case number of O(|A1,2|) calls to such a function. Empirical evaluations and a theoretical analysis of the best and worst case complexity of the “expand-and-shrink” method compared to the divide-and-conquer method performed in [SFJ08] revealed that the latter is preferable over the former. It should be noted that a similar divide-and-conquer method as used in our work could most probably also be plugged into the system in [Mei11] instead of the “shrink” method.
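The difference between the two search styles can be made concrete with a small sketch. It is an illustration under stated assumptions, not the implementation used in this work or in [Mei11]: the validity check is a toy stand-in for a reasoner-backed function such as ISKBVALID (a set of axioms counts as faulty iff it contains one of two hypothetical conflicts), `quickxplain` follows the divide-and-conquer scheme in the style of [Jun04], and `shrink` is a plain linear deletion baseline.

```python
def make_oracle(conflicts):
    """Toy validity check (hypothetical): a set of axioms is faulty iff it
    contains one of the given conflicts. Also counts how often it is called."""
    calls = {"n": 0}
    def is_valid(axioms):
        calls["n"] += 1
        present = set(axioms)
        return not any(c <= present for c in conflicts)
    return is_valid, calls

def quickxplain(background, axioms, is_valid):
    """Divide-and-conquer search for one minimal conflict set in `axioms`."""
    if is_valid(background + axioms):
        return []                      # no fault, hence no conflict set
    if not axioms:
        raise ValueError("the background alone is already faulty")
    return _qx(background, False, axioms, is_valid)

def _qx(fixed, just_added, rest, is_valid):
    if just_added and not is_valid(fixed):
        return []                      # a conflict is already inside `fixed`
    if len(rest) == 1:
        return list(rest)
    half = len(rest) // 2
    left, right = rest[:half], rest[half:]
    c2 = _qx(fixed + left, True, right, is_valid)
    c1 = _qx(fixed + c2, bool(c2), left, is_valid)
    return c1 + c2

def shrink(background, axioms, is_valid):
    """Linear baseline: try to drop each axiom once while the set stays faulty."""
    conflict = list(axioms)
    for ax in axioms:
        trial = [a for a in conflict if a != ax]
        if not is_valid(background + trial):
            conflict = trial
    return conflict

alignment = list(range(64))            # 64 hypothetical correspondences
conflicts = [frozenset({2, 5}), frozenset({40, 41, 42})]
oracle_qx, calls_qx = make_oracle(conflicts)
oracle_sh, calls_sh = make_oracle(conflicts)
found_qx = quickxplain([], alignment, oracle_qx)
found_sh = shrink([], alignment, oracle_sh)
print(sorted(found_qx), calls_qx["n"], "checks (divide and conquer)")
print(sorted(found_sh), calls_sh["n"], "checks (linear shrink)")
```

Both routines return a minimal conflict set, though not necessarily the same one. The linear baseline always spends one check per axiom, whereas the divide-and-conquer search discards whole halves of the alignment with a single check.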

There are some ontology matchers which incorporate alignment repair features: CODI [HSNM11], YAM++ [NB12], ASMOV [JMSK09] and KOSIMap [RP10], for instance, employ logic-based techniques to search for a set of predefined “anti-patterns” which must not occur in the aligned ontology, either to avoid inconsistencies or incoherencies or to eliminate unwanted or redundant entailments. In case such a pattern is revealed, it is resolved by eliminating from the alignment some correspondences responsible for its occurrence. All the techniques incorporated in these matchers are distinct from the presented approaches in that they implement incomplete or approximate methods of alignment repair, i.e. not all alternative solutions to the alignment debugging problem are taken into account. As a consequence of this, on the one hand, the final alignment produced by these systems may still trigger faults in the aligned KB. On the other hand, a suboptimal solution may be found, e.g. in terms of the user-intended semantics w.r.t. the aligned ontology or other criteria such as alignment confidence or cardinality.

Another ontology matcher, LogMap 2 [JRGZH12], provides integrated debugging features and the opportunity for a user to interact during this process. However, the system is not really comparable with ours since it is very specialized and dedicated to the goal of producing a fault-free alignment. Concretely, there are at least two differences to our approach. First, LogMap 2 uses incomplete reasoning mechanisms in order to speed up the matching process. Hence, the output is not guaranteed to be fault-free. Second, the option for user interaction aims in fact at the revision of a set of correspondences, i.e. the sequential assessing of single correspondences as ’faulty’ or ’correct’. Our approach, on the contrary, asks the user queries (i.e. entailments of non-faulty parts of the KB).

An interactive technique similar to our approaches was presented in [NRG12], where a user is successively asked single KB formulas (ontology axioms) in order to obtain a partition of a given ontology into a set of desired or correct and a set of undesired or incorrect formulas. Whereas our strategies aim at finding a parsimonious solution involving minimal change to the given faulty KB in order to repair it, the method proposed in [NRG12] pursues a (potentially) more invasive approach to KB quality assurance, namely a (reasoner-supported) exhaustive manual inspection of (parts of) a KB. Given an inconsistent/incoherent KB, this technique starts from an empty set of desired formulas aiming at adding to this set only correct formulas of the KB which preserve consistency and coherency. Our approach, on the other hand, works its way forward the other way round in that it starts from the complete KB aiming at finding a minimal set of formulas to be deleted or modified which are responsible for the violation of the pre-specified requirements. Another difference of our approach compared to the one suggested in [NRG12] is the type of queries asked to the user and the way these are selected. Our method allows for the generation of queries which are not explicit formulas in the KB, but implicit consequences of non-faulty parts of the KB. Besides, the set of selectable queries in our approach differs from one iteration to the next due to the changing set of leading diagnoses whereas queries (i.e. KB formulas) in [NRG12] are known in advance and the challenge is to figure out the best ordering of formulas to be assessed by the user. Whereas we apply mostly information theoretic measures (e.g. 
the minimal expected entropy in the set of leading diagnoses after a query has been answered), the authors in [NRG12] employ “impact measures” which, roughly speaking, indicate the number of automatically classifiable formulas in case of positive and, respectively, negative classification of a query (i.e. a particular formula).

In this work we motivated why appropriate tool assistance is a must when it comes to repairing faulty KBs. For, KBs that do not satisfy some minimal quality criteria such as logical consistency can make artificial intelligence applications relying on the domain knowledge modeled by this KB completely useless. In such a case, no meaningful reasoning or answering of queries about the domain is possible.

Non-interactive debugging systems published in research literature often cannot localize all possible faults (incompleteness), suggest the deletion or modification of unnecessarily large parts of the KB (non-minimality), return incorrect solutions which lead to a repaired KB not satisfying the imposed quality requirements (unsoundness) or suffer from poor scalability due to the inherent complexity of the KB debugging problem [Stu08]. Even if a system is complete and sound and considers only minimal solutions, there are generally exponentially many solution candidates to select one from. However, any two repaired KBs obtained from these candidates differ in their semantics in terms of entailments and non-entailments. Selection of just any of these repaired KBs might result in unexpected entailments, the loss of desired entailments or unwanted changes to the KB which in turn might cause unexpected new faults during the further development or application of the repaired KB. Also, manual inspection of a large set of solution candidates can be time-consuming (if not practically infeasible), tedious and error-prone since human beings are normally not capable of fully realizing the semantic consequences of deleting a set of formulas from a KB.

To account for this issue, we developed a comprehensive theory on which provably complete, sound and optimal (in terms of given probability information) interactive KB debugging systems can be built which suggest only minimal changes to repair a present KB. Interaction with a user is realized by asking the user queries. That is, a conjunction of logical formulas must be classified either as an intended or a non-intended entailment of the correct KB. To construct a query, only a minimum of two solution candidates must be available. After the answer to a query is known, the search space for solutions is pruned. Iteration of this process until there is only a single solution candidate left yields a (repaired) solution KB which features exactly the semantics desired and expected by the user.
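The query-and-prune cycle can be sketched as follows. This is a deliberately abstract illustration: solution candidates are opaque labels, each query is represented only by the two sets of candidates it separates (those that predict a positive and those that predict a negative answer), and the user is simulated by a known target. None of these simplifications is part of the actual system, where queries are sets of logical formulas and the partition is determined with a reasoner.

```python
def prune(candidates, d_plus, d_minus, answer):
    """Drop the candidates contradicted by the answer to a query.

    answer=True  (intended entailment)     rules out the candidates in d_minus,
    answer=False (non-intended entailment) rules out the candidates in d_plus.
    Candidates in neither set are unaffected and kept."""
    ruled_out = d_minus if answer else d_plus
    return [d for d in candidates if d not in ruled_out]

def debug_session(candidates, target):
    """Ask queries until a single solution candidate is left."""
    remaining = list(candidates)
    asked = 0
    while len(remaining) > 1:
        # Two candidates suffice to build a query that separates them.
        half = len(remaining) // 2
        d_plus, d_minus = remaining[:half], remaining[half:]
        answer = target in d_plus      # simulated user (assumption)
        remaining = prune(remaining, d_plus, d_minus, answer)
        asked += 1
    return remaining[0], asked

candidates = [f"D{i}" for i in range(1, 9)]   # eight hypothetical candidates
solution, asked = debug_session(candidates, target="D6")
print(solution, asked)
```

Every answer removes at least one candidate, so the loop terminates. With evenly splitting queries the number of questions grows only logarithmically in the number of candidates.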

We presented algorithms for the computation of minimal conflict sets, i.e. irreducible faulty subsets of the KB, and for the computation of minimal diagnoses, i.e. irreducible sets of KB formulas that must be properly modified or deleted in order to repair the KB. We combined these algorithms with methods that derive probabilities of diagnoses from meta information about faults (e.g. the outcome of a statistical analysis) to constitute a non-interactive debugging system for monotonic KBs which computes minimal diagnoses in best-first order. Building on the idea of this non-interactive method, we devised a complete and sound best-first algorithm for the interactive debugging of monotonic KBs that allows a user to take part in the debugging process in order to figure out the best solution.
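The relation between the two notions, namely that a minimal diagnosis is a minimal hitting set of all minimal conflict sets, can be illustrated by a brute-force sketch. It enumerates candidates by increasing cardinality and is meant only to show the relation on a toy input. It is not the hitting set tree algorithm of Chapter 4, it ignores probabilities, and the conflict sets are hypothetical.

```python
from itertools import combinations

def minimal_hitting_sets(conflicts):
    """Return all subset-minimal sets that intersect every conflict set.

    Candidates are generated in order of increasing size, so the minimum
    cardinality diagnoses are found first. Assumes at least one conflict."""
    universe = sorted(set().union(*conflicts))
    found = []
    for size in range(1, len(universe) + 1):
        for candidate in combinations(universe, size):
            s = set(candidate)
            hits_all = all(s & c for c in conflicts)
            is_minimal = not any(h <= s for h in found)
            if hits_all and is_minimal:
                found.append(s)
    return found

# Three hypothetical minimal conflict sets over axioms numbered 1..7.
conflicts = [{1, 4}, {4, 6}, {2, 7}]
diagnoses = minimal_hitting_sets(conflicts)
print(diagnoses)
```

The enumeration is exponential in the number of axioms occurring in conflicts, which is one reason why a guided tree search with pruning is used instead.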

In order to integrate the new information collected by successive consultations of the user, the diagnoses computation in an interactive system must be regularly stopped. That is, there must be alternating phases, on the one hand for the further exploration of the solution space in order to gain new evidence for query generation and on the other hand for user interaction. To this end, we proposed two new strategies for the iterative computation of minimal diagnoses that exactly serve this purpose. The first strategy, STATICHS, takes advantage of an artificial fixation of the solution set which guarantees the monotonic reduction of the solution space independently of the asked queries, the given answers or other parameters of the algorithm. In this vein, the complexity of this algorithm is initially known and the maximum overhead compared to the non-interactive algorithm is polynomially bounded.⁴⁷ On the downside, STATICHS cannot optimally exploit the information given by the answered queries and thus cannot employ powerful methods that enable a more efficient pruning of the solution search space.

Such powerful methods can be incorporated by the second suggested strategy, DYNAMICHS, the performance of which can be orders of magnitude better than the (initially fixed) performance of STATICHS in the best case. That is, the ability to fully incorporate the information gained from user interaction might lead to a modified problem instance for which only a single (best) solution exists with only a small fraction of the time, space and user effort needed by STATICHS. Moreover, the (exact) solution located by means of an interactive debugging session applying DYNAMICHS is generally a better (verified) solution than the (exact) solution found by use of STATICHS. However, the complexity of DYNAMICHS depends to a great degree on which queries are generated and which input parameters are chosen and the worst case complexity is not initially bounded as in the case of STATICHS. In the design of DYNAMICHS we put a particular emphasis on memory saving behavior which is manifested, for instance, by the way duplicate search tree paths are handled.

For selecting the best subsequent query in interactive debugging we first proposed and exhaustively analyzed two strategies: The “split-in-half” strategy prefers queries which allow eliminating half of the leading diagnoses. The entropy-based strategy employs information theoretic concepts to exploit knowledge about the likelihood of formulas to be faulty. Based on the probability of a formula containing an error we can predict the (expected) information gain produced by a query result, enabling us to select the best subsequent query according to a one-step-lookahead entropy-based scoring function.
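The two strategies can be contrasted on a toy example. The sketch below is an assumption-laden illustration: a query is represented by the three sets of leading diagnoses that predict a positive answer (`d_plus`), a negative answer (`d_minus`) or no answer (`d_zero`), and the two scoring functions are one common way of writing down the split-in-half and the one-step-lookahead entropy criteria (lower scores are better). The diagnosis names, probabilities and queries are hypothetical.

```python
import math

def split_in_half_score(d_plus, d_minus, d_zero):
    """0 is best: the query halves the leading diagnoses whatever the answer."""
    return abs(len(d_plus) - len(d_minus)) + len(d_zero)

def entropy_score(d_plus, d_minus, d_zero, prob):
    """0 is best: both answers are equally likely and no diagnosis is undecided."""
    p_zero = sum(prob[d] for d in d_zero)
    p_yes = sum(prob[d] for d in d_plus) + p_zero / 2
    p_no = sum(prob[d] for d in d_minus) + p_zero / 2

    def plogp(p):
        return p * math.log2(p) if p > 0 else 0.0

    return plogp(p_yes) + plogp(p_no) + p_zero + 1

# Hypothetical leading diagnoses with their current probabilities.
prob = {"D1": 0.5, "D2": 0.3, "D3": 0.1, "D4": 0.1}

# Two hypothetical queries, given as (d_plus, d_minus, d_zero).
queries = {
    "QA": (["D1"], ["D2", "D3", "D4"], []),
    "QB": (["D1", "D2"], ["D3", "D4"], []),
}

best_spl = min(queries, key=lambda q: split_in_half_score(*queries[q]))
best_ent = min(queries, key=lambda q: entropy_score(*queries[q], prob))
print(best_spl, best_ent)
```

Here split-in-half prefers `QB` because it divides the four diagnoses evenly, while the entropy-based score prefers `QA` because it divides the probability mass evenly.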

In comprehensive experiments using real-world KBs we compared the entropy-based method with the “split-in-half” strategy and witnessed a significant reduction in the number of queries required to identify the correct diagnosis when the entropy-based method is applied. Depending on the quality of the given prior fault probabilities, the required number of queries could be reduced by up to 60%. In order to evaluate the robustness of the entropy-based method we experimented with different prior fault probability distributions as well as different qualities of the prior probabilities. Furthermore, we investigated cases where knowledge about fault probabilities is missing or inaccurate. In case such knowledge is unavailable, the entropy-based method ranks the diagnoses based on the number of syntax elements contained in a formula and the number of formulas in a diagnosis. Given that this is a reasonable guess (i.e. the sought diagnosis is not at the lower end of the diagnoses ranked by their prior probabilities), the entropy-based method outperformed “split-in-half”. Moreover, even if the initial guess is not reasonable, the entropy-based method improves the accuracy of the probabilities as more questions are asked. Furthermore, the applicability of the approach to real-world KBs containing thousands of formulas was demonstrated by an extensive set of evaluations.
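How diagnosis probabilities can be refined as answers arrive is sketched below by means of a Bayesian update. The likelihood model is an assumption made for this illustration: a diagnosis that predicts the given answer gets likelihood 1, one that predicts the opposite answer gets 0, and one that makes no prediction gets 1/2. Diagnosis names and prior values are hypothetical.

```python
def bayes_update(prob, d_plus, d_minus, answer):
    """Return posterior diagnosis probabilities after one answered query.

    d_plus / d_minus are the diagnoses predicting a positive / negative
    answer; all remaining diagnoses make no prediction."""
    def likelihood(d):
        if d in d_plus:
            return 1.0 if answer else 0.0
        if d in d_minus:
            return 0.0 if answer else 1.0
        return 0.5

    unnormalized = {d: likelihood(d) * p for d, p in prob.items()}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("the answer contradicts every diagnosis")
    return {d: p / total for d, p in unnormalized.items()}

prior = {"D1": 0.5, "D2": 0.3, "D3": 0.1, "D4": 0.1}
posterior = bayes_update(prior, d_plus={"D1"}, d_minus={"D2", "D3"}, answer=False)
print(posterior)
```

In this example the initially most probable diagnosis `D1` is refuted by a single negative answer and its probability mass is redistributed, so a poor initial guess is corrected as soon as an answer contradicts it.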

We showed that unconditional reliance upon the entropy-based method might still be problematic in the presence of fault information that is considerably uncertain. For, the entropy-based strategy fully exploits and gains from the given fault information. In this vein, it proved to speed up the debugging procedure in the normal case. However, we found out in experiments that it might also have a negative impact on the performance in the bad case where the actual solution diagnosis is rated as highly improbable. As an alternative, one might prefer to rely on a tool (e.g. “split-in-half”) which does not consider any fault information at all. In this case, however, possibly well-chosen information cannot be exploited, resulting again in inefficient debugging actions.

Minimal effort for the interacting user can be achieved if both the query selection method is chosen carefully and the provided fault information satisfies some minimum quality requirements. In particular, for deficient fault information and an unfavorable strategy for query selection, we reported on cases where the overhead in terms of user effort exceeds 2000% (!) in comparison to employing a more favorable query selection strategy. Unfortunately, assessment of the fault information is only possible a-posteriori (after the debugging session is finished and the correct solution is known). To tackle this issue, we proposed a reinforcement learning strategy (RIO) which combines the benefits of the entropy-based and the “split-in-half” approaches, i.e. high potential (to perform well) and low risk (to perform badly). RIO continuously adapts its behavior depending on the performance achieved and in this vein minimizes the risk of integrating low-quality fault information into the debugging process.

The RIO approach makes interactive debugging practical even in scenarios where reliable fault estimates are difficult to obtain. Tested under various conditions, the RIO algorithm revealed good scalability and reaction time as well as superior average performance to both the entropy-based and the “split-in-half” strategy in all tested cases w.r.t. the required amount of user interaction. The highest achieved savings of RIO as against the best other strategy amounted to more than 80%. Further on, the performed evaluations provided evidence that for 100% of the cases in the hardest (from the debugging point of view) class of faulty test KBs, RIO performed at least as well as the best other strategy and in more than 70% of these cases it even manifested superior behavior to the best other strategy. Choosing RIO over other approaches can involve an improvement by a factor of up to 23, meaning that more than 95% of user time and effort might be saved per debugging session.

Moreover, we came up with mechanisms for efficiently dealing with KB debugging problems involving high cardinality faults. In the standard interactive debugging approach described in the first parts of this work, the computation of queries is based on the generation of the set of most probable (or minimum cardinality) leading diagnoses. By this postulation, certain quality guarantees about the output solution can be given. However, we learned that dropping this requirement can bring about substantial savings in terms of time and especially space complexity of interactive debugging, in particular in debugging scenarios where faulty KBs are (partly) generated as a result of the application of automatic systems, e.g. KB (ontology) learning or matching systems.

To cope with such situations, we proposed to base query computation on any set of leading diagnoses using a “direct” method for diagnosis generation. Contrary to the standard method that exploits minimal conflict sets, this approach takes advantage of the duality between minimal diagnoses and minimal conflict sets and employs “inverse” algorithms to those used in the standard approach in order to determine minimal diagnoses directly from the DPI without the indirection via conflict sets.
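The idea of an “inverse” search can be sketched by taking a divide-and-conquer routine of the kind used for conflict sets and inverting its test: instead of asking whether a set of axioms is faulty, it asks whether the KB without a set of axioms is repaired. The sketch is an illustration only. The validity check is a toy stand-in for a reasoner (a set of axioms is faulty iff it contains one of three hypothetical conflicts, which are used solely to simulate the reasoner and are never handed to the search), and the function names are not those of the algorithms in this work.

```python
def direct_diagnosis(kb, repaired_ok):
    """Return one subset-minimal D such that repaired_ok(kb without D) holds.

    No conflict set is computed. The recursion is the divide-and-conquer
    scheme used for conflicts, applied to the inverted test."""
    def without(removed):
        gone = set(removed)
        return [ax for ax in kb if ax not in gone]

    def rec(removed, just_added, rest):
        if just_added and repaired_ok(without(removed)):
            return []                  # removing `removed` already repairs the KB
        if len(rest) == 1:
            return list(rest)
        half = len(rest) // 2
        left, right = rest[:half], rest[half:]
        d2 = rec(removed + left, True, right)
        d1 = rec(removed + d2, bool(d2), left)
        return d1 + d2

    if repaired_ok(list(kb)):
        return []                      # the KB is not faulty
    if not repaired_ok([]):
        raise ValueError("no diagnosis exists: even the empty KB is faulty")
    return rec([], False, list(kb))

# Toy reasoner (hypothetical): faulty iff some hidden conflict is fully present.
hidden_conflicts = [{1, 4}, {4, 6}, {2, 7}]

def is_valid(axioms):
    present = set(axioms)
    return not any(c <= present for c in hidden_conflicts)

kb = list(range(8))
diagnosis = direct_diagnosis(kb, is_valid)
print(sorted(diagnosis))
```

The returned set is a minimal diagnosis: deleting it repairs the toy KB, and putting any one of its axioms back makes the KB faulty again.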

We studied the application of this direct method to high cardinality faults in KBs and noticed that the number of required queries per debugging session is hardly affected for cases when the standard approach is also applicable. However, the direct method proved applicable and able to locate the correct solution diagnosis also in situations when the standard approach (albeit one that does not yet incorporate the powerful search tree pruning techniques introduced in this work) is not, due to time or memory issues.

We want to point out that this work is unique in that it provides an in-depth theoretical workup of the topic of interactive KB debugging which (to the best of our knowledge) cannot be found in such a detailed fashion in other works. Furthermore, this is the first work that gives precise definitions of the problems addressed in interactive KB debugging. Additionally, it is unique in that it features (new) algorithms that provably solve these interactive KB debugging problems. To account for a tradeoff between solution quality and execution time, these algorithms are equipped with a feature to compute approximate solutions where the goodness of the approximation can be steered by the user. Another unique characteristic of this work is that it deals with an entire system of algorithms that are required for the interactive debugging of monotonic KBs, considers and details all algorithms separately, analyzes their complexity, proves their correctness and demonstrates how all these algorithms are orchestrated to make up a full-fledged and provably correct interactive KB debugging system.

This work has given rise to several questions we will elaborate on in our future work:

Query Generation and Selection. Our discussions of the presented query generation methods have revealed some drawbacks (cf. Chapter 8). Albeit being a fixed-parameter tractable problem as argued, the exponential time complexity regarding the number of leading diagnoses |D| in case an optimal query is desired is clearly an aspect that should be improved. This high complexity arises from the paradigm of computing an optimal query w.r.t. some measure qsm() by calculating a (generally exponentially large) pool QP of queries in a first stage, whereupon the best query in QP according to qsm() is filtered out in a second stage.

A key to solving this issue is the use of a different paradigm that does not rely on the computation of the pool QP. Instead, qualitative measures can be derived from quantitative measures that have been used in interactive debugging scenarios [SFFR12, RSFF13, SF10]. These qualitative measures provide a way to estimate the qsm() value of partial q-partitions, i.e. ones where not all leading diagnoses have been assigned to the respective set in the q-partition yet. In this way a direct search for a query with (nearly) optimal properties is possible. A similar strategy called CKK has been employed in [SFFR12] for the information gain measure qsm() := ENT() (see Section 9.3). From such a technique we can expect to save a high number of reasoner calls, because usually only a small subset of the q-partitions included in a query pool (of exponential cardinality) is required to find a query with desirable properties if the search is implemented by means of a heuristic that first explores seemingly favorable (potential) queries and (partial) q-partitions, respectively.

Another shortcoming of the paradigm of query pool generation and subsequent selection of the best query is the extensive use of reasoning services which may be computationally expensive (depending on the given DPI). Instead of computing a set of common entailments Q of a set of KBs K∗i first and consulting a reasoner to fill up the (q-)partition for Q in order to test whether Q is a query at all (see Chapter 8), the idea enabling a significant reduction of reasoner dependence is to compute some kind of canonical query without a reasoner and use simple set comparisons to decide whether the associated partition is a q-partition. Guided by the qualitative properties mentioned before, a search for such a q-partition with desirable properties can be accomplished without reasoning at all. Also, a set-minimal version of the optimal canonical query can be computed without reasoning aid. Only for the optional enrichment of the identified optimal canonical query by additional entailments and for the subsequent minimization of the enriched query, the reasoner may be employed. We will present strategies accounting for these ideas in the near future.

Another aspect that can be improved is that only one minimized version of each query is computed by Algorithm 4. That is, per q-partition P, there might be some set-minimal queries which do not occur in the output set QP. From the point of view of how well a query might be understood by an interacting user, of course not all minimized queries can be assumed equally good in general. For instance, consider the minimized queries Q4 and Q10 in Table 8.3 on page 113. Both are equally good regarding their q-partitions (just the sets D+ and D− are commuted), but most people will probably agree that Q4 is much easier to comprehend from the logical point of view and thus much easier to answer.

Hence, in order to avoid a situation where a potentially best-understood query w.r.t. P is not included in QP, the query minimization process (see Section 8.3) might be adapted to take into account some information about faults the interacting user is prone to. This could be exploited to estimate how well this user might be able to understand and answer a query. For instance, given that the user frequently has problems applying ∃ in a correct manner to express what they intend to express, but has never made any mistakes in formulating implications →, then the query Q1 = {∀X p(X) → q(X), r(a)} might be better comprehended than Q2 = {∀X∃Y s(X, Y)}. One way to achieve the finding of a well-understood query for some q-partition P is to run the query minimization MINQ more than once, each time with a modified input (using a hitting set tree to accomplish this in a systematic manner – cf. Chapter 4, where an analogous idea is used to compute different minimal conflict sets w.r.t. a DPI). In this way, different set-minimal queries for P can be identified and the process can be stopped when a suitable query is found.

In order to come up with such a strategy, however, one must first gain insight into how well a user might understand certain logical formalisms and what properties make a query easy to comprehend from the logical perspective. It is planned to gather corresponding data about different users in the scope of a user study and to utilize the results to obtain a model of “query hardness” (by sticking to a similar overall methodology as used in [HBP11]) in order to come up with strategies for the determination of minimal queries that are easily understood. Note that such a model could also act as a guide for specifying the initial fault probabilities of syntactical elements that are used to obtain diagnoses probabilities (see Section 4.6).

Incorporating A-Posteriori Probabilities into Diagnosis Search. As we discussed in Remark 9.3 on page 125, the a-priori (pD,prio()) and the a-posteriori (pD()) diagnoses probabilities might not only differ in terms of the probability values assigned to different diagnoses, but also in terms of the probability order of diagnoses. Incorporation of updated probabilities directly into the hitting set tree algorithms to be used for the determination of leading diagnoses in the order prescribed by an updated probability measure is only possible if there is an additional update operator (besides Bayes’ Theorem for adapting diagnoses probabilities) that can be applied to formula probabilities. The reason is that the latter are exploited in the hitting set tree to assign probability weights to paths that are not yet diagnoses (cf. pnodes() specified by Definition 4.9 and the discussion of Formula 4.6) in order to guide the search for minimal diagnoses in best-first order. Updated diagnosis probabilities are not helpful at all for this purpose. Devising a reasonable mechanism for updating formula probabilities seems to be hard, mostly due to the lack of suitable data that might be collected during the debugging session to accomplish that. What would be imaginable during the debugging session is to try to learn something about the fault probability of syntactical elements by examining the positive (all formulas are definitely correct) and singleton negative (the single formula is definitely incorrect) test cases. However, a drawback of such a strategy comes into effect when only syntactically very simple queries are used, which is, for instance, the case in Example 8.1 (see the definition of the GETENTAILMENTS function there). From such queries not many useful insights concerning faulty syntactical elements might be gained. On the other hand, such queries are absolutely desirable from the point of view of how well a user might comprehend the formulas asked by the system. Hence, these two aspects seem to contradict each other. Still, it is a topic for future research to attempt to elaborate a solution for that issue.

Facilitation of More Informative User Answers. The debugging system described in this work is designed to get along with just a “minimal” feedback of a user regarding an asked query. That is, we assume the user’s answer to a query Q to be merely true, i.e. each formula in Q (or the conjunction of formulas in Q) must be entailed by the correct KB, or false, i.e. at least one formula in Q (or the conjunction of formulas in Q) must not be entailed by the correct KB. However, imagine a user being presented Q and think of how they might proceed in order to come up with an answer to Q. The first observation is that, in order to respond by true, a user must definitely scrutinize each single formula in Q because otherwise they could never decide for sure whether the conjunction of all formulas in Q is correct. Another observation is that a user might cease to go through the rest of the formulas in case they have already identified one that must not be an entailment of the desired KB. For, in this situation, the overall query Q is already false. This indicates, however, that whatever answer is given to Q, the user must know for at least one formula whether it is correct or incorrect. Therefore, we can usually expect a user to be able to give exactly this information, namely one formula in Q that must be incorrect, in addition to answering false. This extra piece of information can be exploited to achieve better space and time efficiency in the context of diagnosis computation since knowing which formula must definitely not be entailed gives more information than just a set of formulas of which we know that at least one among those is not entailed. Apart from that, there might be other pieces of additional information a user might easily be able to give in addition to the “minimal” feedback we assume in this work. Proposing more efficient algorithms that exploit such types of additional information is on our future work agenda.

Usage of “Positive-Impact” Queries in Combination with DYNAMICHS. As we discussed in Section 12.1 in the context of Algorithm 5 in dynamic mode, an added test case might give rise to some pruning steps, and it might also induce the construction of new subtrees (where “new” means that these would be no subtrees of a hitting set tree w.r.t. the DPI not including this test case). The latter situation occurs when “completely new” minimal conflict sets (those that are in no subset-relationship with existing ones) are introduced by the addition of a test case. If this is the only impact of a test case, then this test case has only a negative influence on the time and space complexity of Algorithm 5 using DYNAMICHS. In other words, none of the invalidated minimal diagnoses (and no other nodes in the tree) are redundant, but all of them must additionally hit the set of “completely new” minimal conflict sets (in order to become diagnoses w.r.t. the new DPI). Hence, in this case, the transition from one DPI to another including this test case results only in monotonic growth of the tree. If possible, such “negative-impact test cases” must be avoided. On the other hand, one must strive for the usage of “positive-impact test cases”, i.e. those that only trigger tree pruning, but no tree expansion. Defining and studying properties that constitute such “positive-impact test cases” and “negative-impact test cases”, respectively, and developing specialized algorithms for extracting exactly those types of queries that enable as substantial and effective pruning as possible in the context of DYNAMICHS is part of our already ongoing research. Note that a rough intuition of which properties characterize a “positive-impact test case” is illustrated on the basis of an example in Section 12.1.
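The distinction drawn above rests on a pure set comparison between the minimal conflict sets before and after a test case is added. The following sketch illustrates that comparison only. The function names and the three labels are hypothetical, conflict sets are modelled as frozensets of formula identifiers, and the final predicate is merely one reading of the rough intuition given in the text, not a definition taken from this work.

```python
from typing import Dict, FrozenSet, Iterable, List

Conflict = FrozenSet[str]


def classify_conflicts(
    old: Iterable[Conflict], new: Iterable[Conflict]
) -> Dict[str, List[Conflict]]:
    """Compare minimal conflict sets of the new DPI with those of the old DPI.

    unchanged      : already a minimal conflict set before the test case
    related        : in a subset-relationship with some old conflict set
    completely_new : in no subset-relationship with any old conflict set
    """
    old = list(old)
    result: Dict[str, List[Conflict]] = {
        "unchanged": [],
        "related": [],
        "completely_new": [],
    }
    for c in new:
        if c in old:
            result["unchanged"].append(c)
        elif any(c <= o or o <= c for o in old):
            result["related"].append(c)
        else:
            result["completely_new"].append(c)
    return result


def only_expands_tree(old: Iterable[Conflict], new: Iterable[Conflict]) -> bool:
    """True if the only visible effect is the introduction of completely new conflicts."""
    groups = classify_conflicts(old, new)
    return bool(groups["completely_new"]) and not groups["related"]
```

A test case for which `only_expands_tree` holds would correspond to the situation described above in which the tree can only grow, whereas conflict sets in the `related` group are the ones that open up opportunities for pruning.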

Finding the Right Expert to Answer a Query in a Collaborative KB Development Setting. As we mentioned in Chapter 1, there are collaborative KB development projects such as the OBO Project48 and the NCI Thesaurus49, where many different people contribute to the specification of their knowledge in large KBs. In such a setting, it may be hard to decide who is the person that has the highest chance of being able to answer a concrete query correctly. The idea in such a scenario could be to use a combination of different measures such as educational level (e.g. professor versus PhD student) or hierarchy of contributors (e.g. senior user versus regular user), statistical information about past faults of a contributor (e.g., how many of the formulas originally authored by a person have been corrected by other persons of higher educational level) or provenance information regarding terms occurring in the query (who has authored most of the formulas in which these terms occur?) in order to learn an “expert model” and use it to devise some kind of recommender system [JZFF10] that suggests which person to ask a particular query.

Once established, such an expert model together with provenance information of KB formulas and other types of information discussed in Section 4.6.1 could also be exploited when it comes to the definition of the fault information provided as input to our debugging system. An example of a system which enables the remote collaborative development of KBs (ontologies) and also provides logs of interesting usage data such as formula change logs and provenance information is Web Protégé [TNNM13].

Studying the Performance of the Newly Proposed Iterative Diagnosis Computation Mechanisms. We will conduct extensive experiments using faulty real-world KBs in order to assess the impact of the usage of the powerful search tree pruning techniques of the DYNAMICHS method or the guaranteed “convergence” towards the correct solution diagnosis of the STATICHS in comparison to interactive debugging algorithms used in our previous works [SFFR12, RSFF13, SF10, SFRF14c].

Methods for Query Selection without Computation of Diagnoses. We are also working on “conflict-based debugging” methods that do not rely on the computation of leading diagnoses for query generation. Instead, queries might be generated directly from (minimal) conflict sets. Such methods might be used together with a boolean hitting set search tree (which was originally proposed by [JL02] and optimized by [PQ12]) where the tree is regularly pruned using test cases such that tree branching is mostly or completely suppressed. In this manner, the tree remains small in size and all in all computes only a single diagnosis, i.e. the one consistent with all answered queries. Such an approach could be very space saving. Nevertheless, it is unclear whether the number of required queries and/or the computation time might increase. Implementing such an approach and answering these open questions is a topic on our future work agenda.

Employing Advanced Reasoning Techniques to Increase Debugging Efficiency. To cope with application contexts where reasoning is the main obstacle for efficient debugging, a plan for future work is to integrate advanced reasoning techniques into our system.

For example, a modular combination of reasoners [RGH12] might be adopted. In such a system, two sound reasoners are combined where one (R1, e.g. HermiT [SMH08]) is complete for the full logic L (e.g. OWL 2 [GHM+08]) and the other one (R2, e.g. ELK [KKS14]) is complete for only a fragment L′ ⊂ L (e.g. the OWL 2 EL profile [GHM+08]), but L′ can be handled much more efficiently by R2. The system in [RGH12] could be used to assign the bulk of the workload to R2 while relying on R1 only if necessary.
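The division of labour between the two reasoners can be illustrated by a simple cascade. This is only a sketch of the general principle and is not meant to reproduce the system of [RGH12]. The function `entails_cascade` and the two callables are hypothetical stand-ins for R2 and R1. The one property it relies on is stated in the text: both reasoners are sound, so a positive answer of the fast reasoner can be trusted, while a negative answer may be due to its incompleteness outside L′ and must be re-checked.

```python
from typing import Callable, FrozenSet, Tuple

KB = FrozenSet[str]
Reasoner = Callable[[KB, str], bool]


def entails_cascade(
    kb: KB, formula: str, fast_entails: Reasoner, full_entails: Reasoner
) -> Tuple[bool, str]:
    """Ask the cheap reasoner R2 first and fall back to R1 only if necessary.

    Returns the entailment answer together with the reasoner that settled it.
    """
    if fast_entails(kb, formula):
        # R2 is sound, hence a positive answer is definitive.
        return True, "R2"
    # R2 is possibly incomplete for this KB, so only R1 can confirm a negative.
    return full_entails(kb, formula), "R1"
```

Whether such a cascade pays off depends on how often the fast reasoner succeeds, since every miss costs one additional reasoner call.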

Another interesting approach might be to employ techniques introduced in [GPS12] for detecting so-called “hot spots” in KBs which, when deleted from the KB, lead to much more efficient reasoning. Since reasoning in our approaches is mostly applied to fractions of the faulty KB, we could possibly benefit from such an approach. For instance, queries are entailments of a set of different non-faulty fractions K∗i = (K \ Di) ∪ B ∪ UP of the original KB. Now, given that a hot spot H is included, say in B ∪ UP, then we might well delete H from this subset of K∗i and might still obtain meaningful queries. The reason is that H does not include any formulas in UD (where D is the set of leading diagnoses) which are essential for query computation from the diagnosis discrimination point of view. Formulas in B ∪ UP, on the other hand, are included in all non-faulty fractions K∗i and thus do not directly serve the discrimination between diagnoses. Since UD might be much smaller in size than B ∪ UP in many scenarios (due to a usually small number of leading diagnoses in D), there might be a high chance for hot spots to be located in B ∪ UP rather than in UD.
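In terms of plain set operations, the proposed treatment of a hot spot could look as follows. This is a sketch under stated assumptions: formulas are modelled as strings, the function `non_faulty_fraction` and its parameter names are hypothetical, and the hot spot H is assumed to be given (its detection, as in [GPS12], is not shown). Only the part of H that lies in B ∪ UP and outside UD is dropped, so that the formulas relevant for discriminating between the leading diagnoses are never touched.

```python
from typing import FrozenSet, Iterable

Formulas = FrozenSet[str]


def non_faulty_fraction(
    kb: Formulas,
    diagnosis: Formulas,
    background: Formulas,
    positive_union: Formulas,
    leading_diagnoses: Iterable[Formulas] = (),
    hot_spot: Formulas = frozenset(),
) -> Formulas:
    """Compute K*_i = (K \\ D_i) ∪ B ∪ U_P, optionally without a hot spot H."""
    union_d: Formulas = frozenset().union(*leading_diagnoses)  # U_D
    fixed_part = background | positive_union                   # B ∪ U_P
    # Drop only hot-spot formulas located in B ∪ U_P and not in U_D.
    removable = (hot_spot & fixed_part) - union_d
    return (kb - diagnosis) | (fixed_part - removable)
```

Since removing formulas can only remove entailments in a monotonic logic, the queries obtained from the reduced fractions may differ from those obtained from the full ones, which is the trade-off the paragraph above alludes to.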

[ARW12] Rui Abreu, André Riboira, and Franz Wotawa. Constraint-based Debugging of Spreadsheets. In CIbSE, pages 1–14, 2012.

[Baa03] Franz Baader. Appendix: Description Logic Terminology. In Franz Baader, Diego Calvanese, Deborah L. McGuinness, Daniele Nardi, and Peter F. Patel-Schneider, editors, Description Logic Handbook, pages 485–495. Cambridge University Press, 2003.

[BATJ91] Tom Bylander, Dean Allemang, Michael Tanner, and John Josephson. The computational complexity of abduction. Artificial Intelligence, 49:25–60, 1991.

[BBL05] Franz Baader, Sebastian Brandt, and Carsten Lutz. Pushing the EL envelope. In IJCAI, pages 364–369, 2005.

[BCM+07] Franz Baader, Diego Calvanese, Deborah L. McGuinness, Daniele Nardi, and Peter F. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, 2007.

[BKP12] Franz Baader, Martin Knechtel, and Rafael Penaloza. Context-dependent views to axioms and consequences of Semantic Web ontologies. Web Semantics: Science, Services and Agents on the World Wide Web, 12-13:22–40, April 2012.

[BLHL+01] Tim Berners-Lee, James Hendler, Ora Lassila, et al. The Semantic Web. 2001. http://bit.ly/18ZvAXo.

[Bor96] Alex Borgida. On the relative expressiveness of description logics and predicate logics. Artificial Intelligence, 82(1-2):353–367, 1996.

[BP08] Franz Baader and R. Penaloza. Axiom Pinpointing in General Tableaux. Journal of Logic and Computation, 20(1):5–34, November 2008.

[CFD93] Luca Console, Gerhard Friedrich, and Daniele Theseider Dupre. Model-Based Diagnosis Meets Error Diagnosis in Logic Programs. In IJCAI, pages 1494–1501, 1993.

[CGT89] Stefano Ceri, Georg Gottlob, and Letizia Tanca. What you always wanted to know about Datalog (and never dared to ask). IEEE Transactions on Knowledge and Data Engineering, I(1), 1989.

[Chu36] Alonzo Church. An unsolvable problem of elementary number theory. American Journal of Mathematics, pages 345–363, 1936.

[CL73] Chin-Liang Chang and Richard Char-Tung Lee. Symbolic Logic and Mechanical Theorem Proving. Academic Press Inc., 1973.

[Coo71] Stephen A. Cook. The complexity of theorem-proving procedures. In Proceedings of the third annual ACM symposium on Theory of computing, pages 151–158. ACM, 1971.

[CP71] John Ceraso and Angela Provitera. Sources of error in syllogistic reasoning. Cognitive Psychology, 2(4):400–410, 1971.

[CRV+09] Oscar Corcho, Catherine Roussey, Luis Manuel Vilches-Blázquez, and Ivan Pérez. Pattern-based OWL Ontology Debugging Guidelines. In Eva Blomqvist, Kurt Sandkuhl, Francois Scharffe, and Vojtech Svatek, editors, Workshop on Ontology Patterns (WOP 2009), collocated with the 8th International Semantic Web Conference (ISWC 2009), CEUR Workshop Proceedings, pages 68–82, 2009.

[DF95] Rod G. Downey and Michael R. Fellows. Fixed-parameter tractability and completeness I: Basic results. SIAM Journal on Computing, 24(4):873–921, 1995.

[dKW87] Johan de Kleer and Brian C. Williams. Diagnosing multiple faults. Artificial Intelligence, 32(1):97–130, April 1987.

[DQPS11] Jianfeng Du, Guilin Qi, Jeff Z. Pan, and Yi-Dong Shen. A Decomposition-Based Approach to OWL DL Ontology Diagnosis. In Proceedings of 23rd IEEE International Conference on Tools with Artificial Intelligence, pages 659–664. IEEE Press, November 2011.

[Dur10] Rick Durrett. Probability: Theory and Examples, Fourth Edition. Cambridge University Press, 2010.

[EFvH+11] Jérôme Euzenat, Alfio Ferrara, Willem Robert van Hage, Laura Hollink, Christian Meilicke, Andriy Nikolov, Dominique Ritze, François Scharffe, Pavel Shvaiko, Heiner Stuckenschmidt, Ondrej Sváb-Zamazal, and Cássia Trojahn dos Santos. Final results of the Ontology Alignment Evaluation Initiative 2011. In Proceedings of the 6th International Workshop on Ontology Matching, pages 1–29. CEUR-WS.org, 2011.

[FFJS04] Alexander Felfernig, Gerhard Friedrich, Dietmar Jannach, and Markus Stumptner. Consistency-based diagnosis of configuration knowledge bases. Artificial Intelligence, 152(2):213–234, 2004.

[FS05] Gerhard Friedrich and Kostyantyn Shchekotykhin. A General Diagnosis Method for Ontologies. In Yolanda Gil, Enrico Motta, Richard Benjamins, and Mark Musen, editors, Proceedings of the 4th International Semantic Web Conference (ISWC 2005), pages 232–246. Springer, 2005.

[FSW99] Gerhard Friedrich, Markus Stumptner, and Franz Wotawa. Model-based diagnosis of hardware designs. Artificial Intelligence, 111(1-2):3–39, 1999.

[FSZ11] Alexander Felfernig, Monika Schubert, and Christoph Zehentner. An efficient diagnosis algorithm for inconsistent constraint sets. Artificial Intelligence for Engineering Design, Analysis and Manufacturing, 26(1):53–62, June 2011.

[GHM+08] Bernardo Cuenca Grau, Ian Horrocks, Boris Motik, Bijan Parsia, Peter F. Patel-Schneider, and Ulrike Sattler. OWL 2: The next step for OWL. Web Semantics: Science, Services and Agents on the World Wide Web, 6(4):309–322, November 2008.

[GPS12] Rafael Goncalves, Bijan Parsia, and Ulrike Sattler. Performance Heterogeneity and Approximate Reasoning in Description Logic Ontologies. In Proceedings of 11th International Semantic Web Conference (ISWC 2012), pages 82–98, 2012.

[GSW89] Russell Greiner, Barbara A. Smith, and Ralph W. Wilkerson. A correction to the algorithm in Reiter’s theory of diagnosis. Artificial Intelligence, 41(1):79–88, 1989.

[HBP11] Matthew Horridge, Samantha Bail, and Bijan Parsia. The cognitive complexity of OWL justifications. In Proceedings of the 10th International Semantic Web Conference (ISWC 2011). Springer, 2011.

[HM01] Volker Haarslev and Ralf Müller. RACER System Description. In Rajeev Goré, Alexander Leitsch, and Tobias Nipkow, editors, 1st International Joint Conference on Automated Reasoning, volume 2083 of Lecture Notes in Computer Science, pages 701–705, Berlin, Heidelberg, 2001. Springer Berlin Heidelberg.

[Hor11] Matthew Horridge. Justification based Explanation in Ontologies. PhD thesis, University of Manchester, 2011.

[HPS08] Matthew Horridge, Bijan Parsia, and Ulrike Sattler. Laconic and Precise Justifications in OWL. In Amit Shet, Steffen Staab, Mike Dean, Massimo Paolucci, Diana Maynard, Timothy Finin, and Krishnaprasad Thirunarayan, editors, Proceedings of the 7th International Semantic Web Conference (ISWC 2008), volume 5318 of Lecture Notes in Computer Science, pages 323–338. Springer, 2008.

[HPS09] Matthew Horridge, Bijan Parsia, and Ulrike Sattler. Lemmas for Justifications in OWL. In Proceedings of the 22nd Workshop of Description Logics DL2009. CEUR Workshop Proceedings, 2009.

[HPS10] Matthew Horridge, Bijan Parsia, and Ulrike Sattler. Justification Oriented Proofs in OWL. In Proceedings of the 9th International Semantic Web Conference (ISWC 2010). Springer, 2010.

[HPS12a] Matthew Horridge, Bijan Parsia, and Ulrike Sattler. Extracting justifications from BioPortal ontologies. In Proceedings of the 11th International Semantic Web Conference (ISWC 2012), pages 287–299, 2012.

[HPS12b] Matthew Horridge, Bijan Parsia, and Ulrike Sattler. Justification Masking in Ontologies. In Thirteenth International Conference on the Principles of Knowledge Representation and Reasoning, 2012.

[HSNM11] Jakob Huber, Timo Sztyler, Jan Noessner, and Christian Meilicke. CODI: Combinatorial Optimization for Data Integration - Results for OAEI 2011. In Proceedings of the 6th International Workshop on Ontology Matching, 2011.

[JL99] Philip N. Johnson-Laird. Deductive reasoning. Annual review of psychology, 50:109–135, 1999.

[JL02] Yun-fei Jiang and Li Lin. Computing the minimal hitting sets with binary HS-tree. Journal of software, 13(12):2267–2274, 2002.

[JMSK09] Yves R. Jean-Mary, E. Patrick Shironoshita, and Mansur R. Kabuka. Ontology Matching with Semantic Verification. Web Semantics: Science, Services and Agents on the World Wide Web, 7(3):235–251, September 2009.

[JRG11] Ernesto Jiménez-Ruiz and Bernardo Cuenca Grau. Logmap: Logic-based and scalable ontology matching. In Proceedings of the 10th International Semantic Web Conference (ISWC 2011), pages 273–288. Springer, 2011.

[JRGZH12] Ernesto Jiménez-Ruiz, Bernardo Cuenca Grau, Yujiao Zhou, and Ian Horrocks. Large-scale interactive ontology matching: Algorithms and implementation. In Proceedings of 20th European Conference on Artificial Intelligence (ECAI2012), pages 444–449, 2012.

[Jun04] Ulrich Junker. QUICKXPLAIN: Preferred Explanations and Relaxations for Over-Constrained Problems. In Deborah L. McGuinness and George Ferguson, editors, Proceedings of the Nineteenth National Conference on Artificial Intelligence, Sixteenth Conference on Innovative Applications of Artificial Intelligence, volume 3, pages 167–172. AAAI Press / The MIT Press, 2004.

[JZFF10] Dietmar Jannach, Markus Zanker, Alexander Felfernig, and Gerhard Friedrich. Recommender Systems: An Introduction. Cambridge University Press, New York, NY, USA, 1st edition, 2010.

[Kal06] Aditya Kalyanpur. Debugging and Repair of OWL Ontologies. PhD thesis, University of Maryland, College Park, 2006.

[Kar72] Richard M. Karp. Reducibility among combinatorial problems. Complexity of Computer Computations, pages 85–103, 1972.

[Kaz08] Yevgeny Kazakov. SRIQ and SROIQ are harder than SHOIQ. In Proceedings of the 21st Workshop of Description Logics DL2008, 2008.

[KK06] Martin Kreuzer and Stefan Kühling. Logik für Informatiker. Pearson Studium, München, Germany, 2006.

[KKLO86] Narendra Karmarkar, Richard M. Karp, George S. Lueker, and Andrew M. Odlyzko. Probabilistic analysis of optimum partitioning. Journal of Applied Probability, 23(3):626–645, 1986.

[KKS14] Yevgeny Kazakov, Markus Krötzsch, and František Simančík. The incredible ELK. Journal of Automated Reasoning, 53(1):1–61, 2014.

[Kor98] Richard E. Korf. A complete anytime algorithm for number partitioning. Artificial Intelligence, 106(2):181–203, December 1998.

[KPHS07] Aditya Kalyanpur, Bijan Parsia, Matthew Horridge, and Evren Sirin. Finding all Justifications of OWL DL Entailments. In Karl Aberer, Key-Sun Choi, Natasha F. Noy, Dean Allemang, Kyung-Il Lee, Lyndon J. B. Nixon, Jennifer Golbeck, Peter Mika, Diana Maynard, Riichiro Mizoguchi, Guus Schreiber, and Philippe Cudré-Mauroux, editors, The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, volume 4825 of LNCS, pages 267–280, Berlin, Heidelberg, November 2007. Springer Verlag.

[KPS+06] Aditya Kalyanpur, Bijan Parsia, Evren Sirin, Bernardo Cuenca Grau, and James Hendler. Swoop: A Web Ontology Editing Browser. J. Web Sem., 4(2):144–153, 2006.

[KPSCG06] Aditya Kalyanpur, Bijan Parsia, Evren Sirin, and Bernardo Cuenca Grau. Repairing Unsatisfiable Concepts in OWL Ontologies. In York Sure and John Domingue, editors, The Semantic Web: Research and Applications, 3rd European Semantic Web Conference, ESWC 2006, volume 4011 of Lecture Notes in Computer Science, pages 170–184, Berlin, Heidelberg, 2006. Springer.

[KPSH05] Aditya Kalyanpur, Bijan Parsia, Evren Sirin, and James Hendler. Debugging Unsatisfiable Classes in OWL Ontologies. Web Semantics: Science, Services and Agents on the World Wide Web, 3(4):268–293, 2005.

[MB88] Stephen Muggleton and Wray L. Buntine. Machine Invention of First-order Predicates by Inverting Resolution. In J Laird, editor, Proceedings of the 5th International Conference on Machine Learning (ICML’88), pages 339–352. Morgan Kaufmann, 1988.

[Mei11] Christian Meilicke. Alignment Incoherence in Ontology Matching. PhD thesis, Universität Mannheim, 2011.

[Men09] Elliott Mendelson. Introduction to Mathematical Logic, Fifth Edition. CRC Press, 2009.

[MPSP09] Boris Motik, Peter F. Patel-Schneider, and Bijan Parsia. OWL 2 Web Ontology Language Structural Specification and Functional-Style Syntax. W3C recommendation, pages 1–133, 2009.

[MS72] Albert R. Meyer and Larry J. Stockmeyer. The equivalence problem for regular expressions with squaring requires exponential space. In 13th Annual Symposium on Switching and Automata Theory, pages 125–129. IEEE, 1972.

[MS09] Christian Meilicke and Heiner Stuckenschmidt. An Efficient Method for Computing Alignment Diagnoses. In Proceedings of the 3rd International Conference on Web Reasoning and Rule Systems, pages 182–196. Springer-Verlag, 2009.

[MSH09] Boris Motik, Rob Shearer, and Ian Horrocks. Hypertableau Reasoning for Description Logics. Journal of Artificial Intelligence Research, 36(1):165–228, 2009.

[MST07] Christian Meilicke, Heiner Stuckenschmidt, and Andrei Tamilin. Repairing Ontology Mappings. Proceedings of the 22nd National Conference on Artificial intelligence - AAAI’07, pages 1408–1413, 2007.

[MST08] Christian Meilicke, Heiner Stuckenschmidt, and Andrei Tamilin. Reasoning Support for Mapping Revision. Journal of Logic and Computation, 19(5):807–829, August 2008.

[Mug95] Stephen Muggleton. Inverse entailment and Progol. New Generation Computing, Special issue on Inductive Logic Programming, 13(3-4):245–286, 1995.

[NB12] Duyhoa Ngo and Zohra Bellahsene. YAM++ - A combination of graph matching and machine learning approach to ontology alignment task. Journal of Web Semantics - The Semantic Web Challenge 2011 Special Issue, 2012.

[NCLM06] Natalya F. Noy, A. Chugh, W. Liu, and Mark A. Musen. A framework for ontology evolution in collaborative environments. In Proceedings of the 5th International Semantic Web Conference (ISWC 2006), 2006.

[NPQW13] Iulia Nica, Ingo Pill, Thomas Quaritsch, and Franz Wotawa. The route to success: A performance comparison of diagnosis algorithms. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, pages 1039–1045, 2013.

[NRG12] Nadeschda Nikitina, Sebastian Rudolph, and Birte Glimm. Interactive Ontology Revision. Web Semantics: Science, Services and Agents on the World Wide Web, 12-13:118–130, 2012.

[NSD+00] Natalya F. Noy, Michael Sintek, Stefan Decker, Monica Crubézy, Ray W. Fergerson, and Mark A. Musen. Creating Semantic Web Contents with Protégé-2000. IEEE Intelligent Systems, 16(2):60–71, 2000.

[PQ12] Ingo Pill and Thomas Quaritsch. Optimizations for the Boolean Approach to Computing Minimal Hitting Sets. In Proceedings of the 20th European Conference on Artificial Intelligence, pages 648–653, 2012.

[PSHH+04] Peter F. Patel-Schneider, Patrick Hayes, Ian Horrocks, et al. OWL Web Ontology Language Semantics and Abstract Syntax. W3C recommendation, 10, 2004.

[PSK05] Bijan Parsia, Evren Sirin, and Aditya Kalyanpur. Debugging OWL ontologies. In Allan Ellis and Tatsuya Hagino, editors, Proceedings of the 14th international conference on World Wide Web, pages 633–640. ACM Press, May 2005.

[PW03] Bernhard Peischl and Franz Wotawa. Model-Based Diagnosis or Reasoning from First Principles. IEEE Intelligent Systems, 18:32–37, 2003.

[Qui86] John Ross Quinlan. Induction of Decision Trees. Machine Learning, 1(1):81–106, 1986.

[RCVB09] Catherine Roussey, Oscar Corcho, and Luis Manuel Vilches-Blázquez. A catalogue of OWL ontology antipatterns. In International Conference On Knowledge Capture, pages 205–206, Redondo Beach, California, USA, 2009. ACM.

[RDH+04] Alan Rector, Nick Drummond, Matthew Horridge, Jeremy Rogers, Holger Knublauch, Robert Stevens, Hai Wang, and Chris Wroe. OWL Pizzas: Practical Experience of Teaching OWL-DL: Common Errors & Common Patterns. In Enrico Motta, Nigel R. Shadbolt, Arthur Stutt, and Nick Gibbins, editors, Engineering Knowledge in the Age of the Semantic Web, 14th International Conference, EKAW 2004, pages 63–81, Whittenbury Hall, UK, 2004. Springer.

[Rei87] Raymond Reiter. A Theory of Diagnosis from First Principles. Artificial Intelligence, 32(1):57–95, 1987.

[RGH12] Ana Armas Romero, Bernardo Cuenca Grau, and Ian Horrocks. MORe: Modular combination of OWL reasoners for ontology classification. In Proceedings of the 11th International Semantic Web Conference (ISWC 2012), 2012.

[RN10] Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Pearson Education, 3rd edition, 2010.

[Rod15] Patrick Rodler. A Theory of Interactive Debugging of Knowledge Bases in Monotonic Logics. Master’s thesis, Alpen-Adria Universität Klagenfurt, 2015.

[RP10] Quentin Reul and Jeff Z. Pan. KOSIMap: Use of Description Logic Reasoning to Align Heterogeneous Ontologies. In Volker Haarslev, David Toman, and Grant Weddell, editors, Proceedings of the 23rd International Workshop on Description Logics DL2010, pages 489–500. CEUR Workshop Proceedings, 2010.

[RSFF11] Patrick Rodler, Kostyantyn Shchekotykhin, Philipp Fleiss, and Gerhard Friedrich. Balancing Brave and Cautious Query Strategies in Ontology Debugging. In Tudor Groza Vit Novacek, Zhisheng Huang, editor, Proceedings of the Joint Workshop on Knowledge Evolution and Ontology Dynamics 2011 (EvoDyn2011), Bonn, Germany, 2011. CEUR Workshop Proceedings.

[RSFF12] Patrick Rodler, Kostyantyn Shchekotykhin, Philipp Fleiss, and Gerhard Friedrich. RIO: Minimizing User Interaction in Debugging of Aligned Ontologies. In Proceedings of the 7th International Workshop on Ontology Matching (OM-2012), 2012.

[RSFF13] Patrick Rodler, Kostyantyn Shchekotykhin, Philipp Fleiss, and Gerhard Friedrich. RIO: Minimizing User Interaction in Ontology Debugging. In Wolfgang Faber and Domenico Lembo, editors, Web Reasoning and Rule Systems, volume 7994 of Lecture Notes in Computer Science, pages 153–167. Springer Berlin Heidelberg, 2013.

[SE13] Pavel Shvaiko and Jérôme Euzenat. Ontology matching: State of the art and future challenges. IEEE Transactions on Knowledge and Data Engineering, 25(1):158–176, 2013.

[SEA+02] York Sure, Michael Erdmann, Juergen Angele, Steffen Staab, Rudi Studer, and Dirk Wenke. OntoEdit: Collaborative Ontology Development for the Semantic Web. In Proceedings of the 1st International Semantic Web Conference (ISWC 2002), pages 221–235, 2002.

[Set12] Burr Settles. Active Learning. Morgan and Claypool Publishers, 2012.

[SF10] Kostyantyn Shchekotykhin and Gerhard Friedrich. Query strategy for sequential ontology debugging. In Peter F. Patel-Schneider, Pan Yue, Pascal Hitzler, Peter Mika, Zhang Lei, Jeff Pan, Ian Horrocks, and Birte Glimm, editors, Proceedings of the 9th International Semantic Web Conference (ISWC 2010), pages 696–712, Shanghai, China, 2010.

[SFFR12] Kostyantyn Shchekotykhin, Gerhard Friedrich, Philipp Fleiss, and Patrick Rodler. Interactive Ontology Debugging: Two Query Strategies for Efficient Fault Localization. Web Semantics: Science, Services and Agents on the World Wide Web, 12-13:88–103, 2012.

[SFJ08] Kostyantyn Shchekotykhin, Gerhard Friedrich, and Dietmar Jannach. On Computing Minimal Conflicts for Ontology Debugging. In MBS 2008 - Workshop on Model-Based Systems, 2008.

[SFRF12] Kostyantyn Shchekotykhin, Philipp Fleiss, Patrick Rodler, and Gerhard Friedrich. Direct computation of diagnoses for ontology alignment. In Pavel Shvaiko, Jérôme Euzenat, Anastasios Kementsietsidis, Ming Mao, Natasha Noy, and Heiner Stuckenschmidt, editors, Proceedings of the 7th International Workshop on Ontology Matching (OM-2012), pages 244–245, Boston, MA, USA, 2012. CEUR Workshop Proceedings.

[SFRF14a] Kostyantyn Shchekotykhin, Gerhard Friedrich, Patrick Rodler, and Philipp Fleiss. A direct approach to sequential diagnosis of high cardinality faults in knowledge bases. In DX 2014 - 25th International Workshop on Principles of Diagnosis (DX 2014), 2014.

[SFRF14b] Kostyantyn Shchekotykhin, Gerhard Friedrich, Patrick Rodler, and Philipp Fleiss. Interactive Ontology Debugging using Direct Diagnosis. In Patrick Lambrix, Guilin Qi, Matthew Horridge, and Bijan Parsia, editors, Proceedings of the Third International Workshop on Debugging Ontologies and Ontology Mappings (WoDOOM14). CEUR Workshop Proceedings, 2014.

[SFRF14c] Kostyantyn Shchekotykhin, Gerhard Friedrich, Patrick Rodler, and Philipp Fleiss. Sequential diagnosis of high cardinality faults in knowledge-bases by direct diagnosis generation. In Proceedings of the 21st European Conference on Artificial Intelligence (ECAI 2014). IOS Press, 2014.

[Sha48] Claude Elwood Shannon. A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423, 1948.

[Sha83] Ehud Shapiro. Algorithmic Program Debugging. MIT Press, 1983.

[SHCH07] Stefan Schlobach, Zhisheng Huang, Ronald Cornet, and Frank Harmelen. Debugging Incoherent Terminologies. Journal of Automated Reasoning, 39(3):317–349, 2007.

[SKFP12] Roni Stern, Meir Kalech, Alexander Feldman, and Gregory Provan. Exploring the Duality in Conflict-Directed Model-Based Diagnosis. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, pages 828–834, 2012.

[SL89] Bart Selman and Hector Levesque. Abductive and default reasoning: A computational core. In Proceedings of the 8th National Conference on Artificial Intelligence, pages 343–348, 1989.

[SMH08] Rob Shearer, Boris Motik, and Ian Horrocks. HermiT: A Highly-Efficient OWL Reasoner. In Proc. of the 5th Int. Workshop on OWL: Experiences and Directions (OWLED 2008 EU), 2008.

[SPG+07] Evren Sirin, Bijan Parsia, Bernardo Cuenca Grau, Aditya Kalyanpur, and Yarden Katz. Pellet: A practical OWL-DL reasoner. Web Semantics: Science, Services and Agents on the World Wide Web, 5(2):51–53, 2007.

[SQJH08] Boontawee Suntisrivaraporn, Guilin Qi, Qiu Ji, and Peter Haase. A Modularization-Based Approach to Finding All Justifications for OWL DL Entailments. In Proceedings of the 7th International Semantic Web Conference (ISWC 2008), pages 1–15. Springer, 2008.

[SRF11] Kostyantyn Shchekotykhin, Patrick Rodler, and Gerhard Friedrich. Balancing brave and cautious query strategies in ontology debugging. In 22nd International Workshop on Principles of Diagnosis (DX 2011), pages 122–129, 2011.

[SS89] Manfred Schmidt-Schauß. Subsumption in KL-ONE is undecidable. In Proceedings of the 1st International Conference on Principles of Knowledge Representation and Reasoning, pages 421–431. Morgan Kaufmann Publishers Inc., 1989.

[SSZ09] Ulrike Sattler, Thomas Schneider, and Michael Zakharyaschev. Which Kind of Module Should I Extract? In Bernardo Cuenca Grau, Ian Horrocks, Boris Motik, and Ulrike Sattler, editors, Proceedings of the 22nd International Workshop on Description Logics, volume 477 of CEUR Workshop Proceedings. CEUR-WS.org, 2009.

[Stu08] Heiner Stuckenschmidt. Debugging OWL Ontologies - A Reality Check. In Raul Garcia-Castro, Asunción Gómez-Pérez, Charles J. Petrie, Emanuele Della Valle, Ulrich Küster, Michal Zaremba, and Shafiq M. Omair, editors, Proceedings of the 6th International Workshop on Evaluation of Ontology-based Tools and the Semantic Web Service Challenge (EON), pages 1–12, Tenerife, Spain, 2008.

[SU06] Ken Satoh and Takeaki Uno. Enumerating Minimally Revised Specifications Using Dualization. In Takashi Washio, Akito Sakurai, Katsuto Nakajima, Hideaki Takeda, Satoshi Tojo, and Makoto Yokoo, editors, New Frontiers in Artificial Intelligence, volume 4012 of Lecture Notes in Computer Science, pages 182–189. Springer Berlin Heidelberg, 2006.

[SW05] Gerald Steinbauer and Franz Wotawa. Detecting and locating faults in the control software of autonomous mobile robots. In IJCAI International Joint Conference on Artificial Intelligence, pages 1742–1743, 2005.

[SW09] Gerald Steinbauer and Franz Wotawa. Robust Plan Execution Using Model-Based Reasoning. Advanced Robotics, 23(10):1315–1326, 2009.

[TH06] Dmitry Tsarkov and Ian Horrocks. FaCT++ description logic reasoner: System description. In Proc. of the Int. Joint Conf. on Automated Reasoning (IJCAR 2006), pages 292–297. Springer, 2006.

[TNNM13] Tania Tudorache, Csongor Nyulas, Natalya F. Noy, and Mark A. Musen. WebProtégé: A Collaborative Ontology Editor and Knowledge Acquisition Tool for the Web. Semantic Web, 4(1):89–99, 2013.

[Tur37] Alan Mathison Turing. On Computable Numbers, with an Application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, s2-42(1):230–265, 1937.

[WSM02] Franz Wotawa, Markus Stumptner, and Wolfgang Mayer. Model-Based Debugging or How to Diagnose Programs Automatically. In Tim Hendtlass and Moonis Ali, editors, Developments in Applied Artificial Intelligence, volume 2358 of Lecture Notes in Computer Science, pages 746–757. Springer Berlin Heidelberg, 2002.

