One-Shot Induction of Generalized Logical Concepts via Human Guidance

2019·Arxiv

Abstract

Abstract

We consider the problem of learning generalized first-order representations of concepts from a single example. To address this challenging problem, we augment an inductive logic programming learner with two novel algorithmic contributions. First, we define a distance measure between candidate concept representations that improves the efficiency of search for target concept and generalization. Second, we leverage richer human inputs in the form of advice to improve the sample-efficiency of learning. We prove that the proposed distance measure is semantically valid and use that to derive a PAC bound. Our experimental analysis on diverse concept learning tasks demonstrates both the effectiveness and effi-ciency of the proposed approach over a first-order concept learner using only examples.

1 Introduction

We consider the problem of learning generalized representations of concepts using a small number of examples. More specifically, we study the case of learning from one example, which is traditionally called one-shot learning. This problem has received much attention from ML community (Lake et al. 2011; Khan and Madden 2014). We consider a challenging setting inside one-shot learning, that of learning explainable and generalizable (first-order) concepts. These concepts can then be particularly reused for learning compositional concepts, for instance, plans. In our concept learning setting, plan induction becomes a special case where a generalizable plan is induced from a single (noise-free) demonstration. As an example, consider building a tower that requires learning L-shapes as a primitive. In our formulation, the goal is to learn a L-shape from a single demonstration. Subsequently, using this concept, the agent can learn to build a rectangular base (with 2 L-shapes) from another single demonstration and so on till the tower is fully built.

Concept learning has been addressed previously in several ways: problem solving by reflection (Stroulia and Goel 1994), mechanical compositional concepts (Wilson and Latombe 1994), learning probabilistic programs (Lake, Salakhutdinov, and Tenenbaum 2015), etc. However, all these methods require a significant number of examples. Concept learning as one-class classification has been considered previously where no negative examples are explicitly created. As one would expect, they are considered while learning an SVM or Nearest Neighbor (Tax 2001), or with a neural network (Kozerawski and Turk 2018).

While these methods are closely-related, our work has two key differences. First, we aim to learn an easily interpretable, explainable, and generalizable concept representation that can then be compounded to learn more complex concepts. To this effect, we focus on learning first-order Horn clause (Horn 1951)(If Then statements). Second, and perhaps most important, we do not assume the existence of a simulator (for plans) or employ a closed-world assumption to generate negative examples. Inspired by Mitchell’s observation of futility of bias-free learning (Mitchell 1997), we develop an approach that employs domain expertise as inductive bias. The principle of structural risk minimization (Vapnik 1999) shows how optimal generalization from extremely sparse observations is quite hard. The problem is even more critical in structured domains (most relations are false in the world). Thus, one-shot induction of generalized logical concepts is challenging. We aim to solve this problem via an iterative revision of first-order horn clause theories using a novel scoring metric and guidance from a human. In essence, we emulate a ‘student’ who learns a generalized concept from an example demonstration provided by the ‘teacher’, by both reflecting as well as, occasionally, asking relevant questions.

We propose a novel approach referred as Guided Oneshot Concept Induction (GOCI) for one-shot concept learning that learns generalized first-order rules. GOCI builds upon an inductive logic program (ILP) learner (Muggleton 1991) and introduces two key algorithmic steps. First, a modified scoring function that allows for explicitly computing distances between concept representations. We show that this distance corresponds to the well known Normalized Compression Distance (NCD) in the case of plan induction. Based on this observation, we demonstrate that the proposed scoring function is indeed a valid distance metric. Second, the use of domain knowledge from human expert as an inductive bias. Unlike many advice taking systems that employ domain knowledge before training, our GOCI algorithm identifies the relevant regions of the concept representation space and actively solicits guidance from the human expert to find the target concept in a sample-efficient manner. Overall, these two algorithmic modifications allow for more effective and efficient learning using GOCI that we demonstrate both theoretically and empirically.

Contributions. We make a few important contributions: (1) We derive a new distance-penalized scoring function inside an ILP learner that allows for computing distances between concepts during guided one-shot concept induction. (2) We treat the human-advice as an inductive bias to accelerate learning. Our ILP learner actively solicits richer information from the human experts than mere labels. (3) We analyze the theoretical properties of GOCI. This includes the validity of the distance metric, showing that NCD between plans is a special case of our metric; and deriving a PAC bound based on Kolmogorov complexity. (4) Finally, we demonstrate the exponential gains in both sample efficiency and effectiveness of the algorithm over standard concept learners on diverse concept induction tasks.

2 Background and Related Work

Concept Learning: Concept learning has previously been studied from several perspectives. Our approach is closely related to Stroulia & Goel (1994)’s work which proposes an approach for inducing logical problem-solving concepts by reflection. While our scoring metric in GOCI is similar to ‘reflection’, its scope is much broader (problem-solving is a special case) and can be used to learn from sparse observations. Wilson (1994) models concepts about continuous geometric shapes by reasoning with two-dimensional bounding boxes. While we use discrete spatial structures as motivating examples, GOCI is not limited to discrete spaces. GOCI is also related in spirit to probabilistic (bayesian) program induction for learning decomposable visual concepts (Lake, Salakhutdinov, and Tenenbaum 2015) which illustrates how exploiting decomposability is more effective than deep learning frameworks. Our approach leverages not only decomposability but implicit relational structure as well, via a logical representation, allowing for generalized concept classes, including plan induction. Tom Mitchell defines concept learning as inferring a booleanvalued function from training examples of its inputs and outputs (Mitchell 1997), which has inspired its treatment as a standard classification/regression problem as well.

One/few-Shot Learning: All approaches described above require a significant number of examples. Our problem setting requires learning from sparse examples (possibly one). Lake et al., (2011) propose a one-shot version of bayesian program induction of visual concepts. There is also substantial work on one/few-shot learning (both deep and shallow) in a traditional classification setting (Bart and Ullman 2005; Vinyals et al. 2016; Wang et al. 2018), most of which either pre-train with gold-standard support example set or sample synthetic observations under certain assumptions. We make no such assumptions about synthetic examples.

Theory Induction: ILP (Muggleton 1991; Muggleton and De Raedt 1994) inductively learns a logical program (first-order theory) that aims to cover most of the positive examples and none of the negative examples. With ILP, goal is to generalize over instances using background knowledge as search bias by building valid hypotheses about unseen examples. In concept learning, generalization is search through space of candidate inductive hypotheses where induction of single theory requires (1) structuring, (2) searching and (3) constraining space of theories (Lisi 2008). Research in this area has usually focused on one or more of these dimensions. FOIL (Quinlan 1990) is an early non-interactive learner with the disadvantage that it occasionally prunes some uncovered hypotheses. This is alleviated in systems like FOCL by introducing language-bias in form of user-defined constraints (Pazzani 1992). Top-down relational ILP systems such as PROGOL (Muggleton 1995) & ALEPH (Srinivasan 2007) employ inverse entailment to restrict search space. Later with Interactive ILP, learner could pose questions and elicit expert advice which allows pruning large parts of search space (Sammut and Banerji 1986; Rouveirol 1992; Rouveirol 1990). To incorporate new incoming information, ILP systems with theory revision, incrementally re-fine and correct the induced theory (MUGGLETON 1988; Sammut and Banerji 1986). While GOCI is conceptually similar to ALEPH (uses different search strategies and evaluation functions), it additionally acquires domain knowledge by interacting with human expert incrementally. We extend this to one-shot setting.

Knowledge-Guided Learning: Background knowledge in ILP is primarily used as search bias. Strong inductive biases in the form of domain knowledge are required to accelerate learning. Mitchell (1980) proved that biasing learners is necessary to achieve true generalization over new instances. Although earliest form of knowledge injection can be found in explanation-based approaches (Shavlik and Towell 1989; Towell and Shavlik 1994), our work relates to preferenceelicitation framework (Braziunas and Boutilier 2006) which guides learning via human preferences. Augmented learning with domain knowledge as an inductive bias has long been explored across various modeling formalisms, including SVMs, ANNs (Fung, Mangasarian, and Shavlik 2003; Towell and Shavlik 1994), probabilistic logic (Odom et al. 2015), and planning (Das et al. 2018). Our human-guided GOCI learner aims to extend these directions in the context of learning complex concepts (including plans).

3 Guided One-shot Concept Induction

We are inspired by a teacher (human) and student (machine) setting in which a small number of demonstrations are used to learn generalized concepts (Chick 2007). Intuitively, the description provided by a human teacher tends to be modular (can have distinct logical partitions), structured (entities and relations between them), and in terms of known concepts. Hence, a vectorized representation of examples is in-sufficient. We choose a logical representation, specifically a function-free restricted form of first-order logic (FOL) that models structured spaces faithfully. Our approach GOCI (Figure 1) consists of an ILP learner that induces horn clauses from data. Typical ILP learners use coverage to score candidate clauses. However, this cannot be used with one-shot learning. Hence, we introduce a penalty function. Specifically, this penalty function

Figure 1: Highlevel overview of our GOCI framework.

is the conceptual distance between the induced theory at the current iteration and the demonstration provided by the “teacher” (Conceptual distance calculator; top-right in Figure 1). Moreover, the search space of FOL theories is intractable and hence, ILP resorts to employing “mode” definitions as search bias. However, we need stronger inductive biases in our setting. We allow for a stronger inductive bias by enabling the teacher to provide constraints as advice that can be directly added to the clausal theory. To find the most relevant constraints, GOCI queries the expert to choose among a sampled set of constraints (as first-order predicates) and uses the selected constraint to revise the theory. This is analogous to a student asking relevant questions.

Given: A set of ground predicates (or trajectories) describing one (or few) instance(s) and expert guidance

To Do: Induction of first-order logic representation of a concept that generalizes the given example(s) effectively

3.1 Input

The input to GOCI is the description of the instances(s) of a concept that the human teacher provides. An input example of a concept is, thus, conjunction of a set of ground literals (assertions).

Example 1 An instance of a concept in the domain of spatial structures, can be a structure L with dimensions height = 5, base = 4 (as shown in Figure 2). L(S), Height(S, 5), Base(S, 4), s is the concept identifier and may be described as conjunction of ground literals,

which denotes L as composition of a ‘Row’ of w = 4 and a ‘Tower’ of h = 4 with appropriate literals describing the scenario (Figure 2 left). As a special case, under partial or total ordering assumptions among the ground literals, an input instance can represent a plan demonstration.

3.2 Output

GOCI induces a least general generalization (LGG) horn clause from the input example(s).

Figure 2: Concept L (base = 4, height = 5), described as composition of a Tower and a Row

Definition 1 Concept in our setting is represented as horn clause theory.

where the body is a conjunction of liter- als indicating known concepts, the head identifies a target concept, and the terms are logical variables that denote the parameters of the concept. Since a concept can be described in multiple ways (Figure 2), the final theory will be a disjunction over clause bodies with the same head. A (partial) instantiation of a theory T is .

Note that these definitions allow for the reuse of concepts induced earlier, potentially in a hierarchical fashion. We believe that this is crucial in achieving human-agent collaboration in complex domains.

Example 2 Figure 2 illustrates an instance of the concept L which can be described in multiple ways. A possible disjunction could be,

HeightBaseContains(s, a), Contains(s, b), Row(a), Tower(b), WidthHeightEqualSubSpRelNWTopHeightBaseContains(s, a),

The generalization must be clearly noted. The last argument of the SpRel() is a constant and not variable, since only this particular spatial alignment is appropriate for the concept of the L structure. Although the input is a single instance described in one way (Example 1), GOCI should learn a generalized representation such as Example 2. Another interesting aspect are the additional constraints: Equal(X,Y) and Sub(X,Y,N). While such predicates are a part of the language, they are not typically described directly in the input examples. However, they are essential in appropriate generalization, since they can express complex interactions between numerical (non-numerical) parameters.

Note that we describe concepts in a generalized manner as horn clause theories. A specific instantiation could be plan induction from sparse demonstrations. In our approach, we can specify the time index as the last argument of both the state and action predicates. Following this definition, we can allow plan induction as shown in our experiments. Our novel conceptual distance is clearer and more intuitive in the case of plans as can be seen later.

Definition 2 (Decomposable:) A concept C is Decomposable iff it is expressed as a conjunction of other concepts, and one or more additional literals to model the interactions. . Here are literals that represent other concepts and are literals that, either describe the attributes of or connect them.

Decomposable means that the concept (currently unknown) can be described as a composition of other known concepts. GOCI learns the class of decomposable concepts since it is intuitive for the “human teacher” to describe. Decomposable concepts faithfully capture the modular and structured aspect of how humans would understand and describe instances of concepts in the universe. It also connects to the decomposable nature of planning tasks which can be normally partitioned into sub-tasks that yield sub-plans.

3.3 Methodology

Search: Given the definition of the input and output spaces, we now describe the search process. ILP systems perform a greedy search through the space of possible theories. Space is typically defined, declaratively, by a set of mode definitions (Muggleton 1995) to guide the search process. Following most well-known ILP systems (Srinivasan 2007), we also start with the most specific clause (known as a bottom clause) from the ground assertions and successively add/modify literals that might improve a rule that best explains the domain. Typically, the best theory is the one which most accurately explains positive example(s) provided while minimizing negative example coverage. Thus, it optimizes the likelihood of a theory T based on the data (). We start with a bottom clause, we variablize the ground

statements via anti-substitution. Variabilization of theory T

is denoted by where vars(T). That is, inductive substitution , is a mapping from occurrences of ground terms in theory T to variables. Evaluation Score: To score a candidate theory, it is

necessary to modify the coverage-based ILP scoring (e.g.,

ALEPH’s compression heuristics) for the following reasons. • To use as much of the user-provided advice as possible, we learn long theory, hence the search space can be exponentially large. Thus, modes alone aren’t enough to guide our search strategy in this situation.

• There is only one (a few) positive training example(s) to learn from and many possible rules can accurately match the training example. Coverage based scores clearly fail here to identify the best theory.

Most inductive learners optimize some adaptation of likelihood. For a candidate theory T, likelihood given data D is, LL(T) = log P(D|T), which is essentially its coverage. In our GOCI framework, we have one (at most few) positive example(s). Clearly coverage will not suffice. Hence, we define a modified objective as follows.

where is the optimal theory, is the set of all candidate theories, and D is the conceptual distance between the instantiated candidate theory and the original example X. As explained earlier, a theory T is a disjunction of horn clauses with the target concept as the head. Distance metric: Conceptual distance, ), is a penalty in our objective. The key idea is that any learned first-order horn clause theory must recover the given instance by equivalent substitution. However, syntactic measures, such as edit distance, are not sufficient since changing even a single literal, especially, literals that indicate interconcept relations, could potentially result in a completely different concept. For instance, in blocks-world, the difference between a block being in the middle of a row and one at the end of the row can be encoded by changing one literal. Hence, a more sophisticated semantic distance such as conceptual distance is necessary (Friend et al. 2018). However, such distances are difficult to compute without a deeper understanding of the domain and its structure.

Our solution is to employ inter-plan distances. As discussed earlier, the class of concepts GOCI can induce, are decomposable and, hence, are equivalent to parameterized planning tasks. One of the key contributions of this work is to exploit this equivalence by using a domainindependent planner to find grounded plans for both the theory learned at a particular iteration and the instance given as input, X. We then compute the Normalized Compression Distance (NCD) between the plans. NCD for Plans: Goldman & Kuter (2015) proved that NCD is arguably the most robust inter-plan distance metric. NCD is a reasonable approximation of Normalized Information Distance, which by itself is not computable (Vit´anyi et al. 2009). Let the plans for and X be and . To obtain NCD, we execute string compression (lossy or lossless) on each of the plans as well as the concatenation of the two plans to recover the compressed strings , and respectively. NCD between the plans can be computed as,

The conceptual distance between a theory T and X is the NCD between the respective plans, . This entire computation is performed by the Conceptual distance calculator as shown in Figure 1.

Observations about NCD: (1) Conceptual distance as a penalty term for negative log-likelihood in the scoring function ensures that the learned theory will correctly recover the given example/demonstration. (2) generalizes to the Kolmogorov-Smirnov statistic between two target distributions if we induce probabilistic logic theories. We prove these insights theoretically in the next section.

Human Guidance: As outlined earlier, the search space in ILP is provably infinite. Typically language-bias (modes) and model assumptions (closed world) are used to prune the search space. However, it is still intractable with one (or few) examples. Hence, we employ human expert guidance as constraints that can be directly used to refine an induced theory, acting as a strong inductive bias. Also, we are learning decomposable concepts (see Definition 2). This allows us to exploit another interesting property. Constraints can now be applied over the attributes of the known concepts that compose the target concept, or over the relations between them. Thus, GOCI directly includes such constraints in the clauses as literals (see Example 2). Though such constraint literals come from the pre-declared language, they are not directly observed in the input example(s). So an ILP learner will fail to include such literals in the induced clauses.

If the human inputs (constraints) are provided upfront before learning commences, it turns out to be wasteful/irrelevant. More importantly, it places the burden on the human consider all possible scenarios where advice could be needed. To alleviate this, our framework explicitly queries for human advice on the relevant constraint literals, which are most useful. Let U be a predefined library of constraint predicates in the language, and let be a relevant constraint literal. Human advice A can essentially be viewed as a preference over the set of relevant constraints {U()}. If denotes the preferred set of constraints, then we denote the theory having a preferred constraint literal in the body of a clause as . (For instance, based on Example 2 GOCI could query which of the two sampled constraints is more useful. Human could prefer , since it clearly subsumes the other.) Scoring function now becomes:

Thus, we are optimizing the constrained form of the same objective as Equation 1 which aims to prune the search space. This is inspired by advice elicitation approaches (Odom et al. 2015). While our framework can incorporate different forms of advice, we focus on preference over constraints on the logical variables, in our one-shot setting. The formal algorithm, described next, illustrates how we achieve this via an iterative greedy refinement (Figure 1, query-advice loop shown in left part of the image).

3.4 The GOCI Algorithm

Algorithm 1 outlines the GOCI framework. It initializes a theory , by variablizing the ‘bottom clause’ obtained from X and background knowledge [lines 3 & 5]. Then it performs a standard ILP search (described earlier) to propose a candidate theory [line 6]. This is followed by the guided refinement steps, where constraint literals are sampled (parameter tying guides the sampling) and the human teacher is queried for preference over them, such that the candidate theory can be modified using preferred constraints [lines 7-9]. The function NCD() performs the computation of the conceptual distance, by first grounding the current modified candidate theory with the same parameter values as the input example X, then generating grounded plans and finally calculating the normalized compression

Algorithm 1 Guided One-shot Concept Induction

1: procedure GOCI(Instance X) 2: Initialize: Set Iteration 3: Initialize: Bootstrap theory 4: repeat

distance between the plan strings (as shown in Figure 1 and Equation 2) [line 10]. Distance-penalized negative log-likelihood score is estimated and minimized to find the best theory at the current iteration [lines 11-14], which is then used as the initial model in the next iteration. This process is repeated either until convergence (no change in induced theory) or maximum iteration bound (L).

3.5 Theoretical analysis

Validity of Distance Metric: Normalized information distance between 2 strings x and y is provably a valid distance metric (Vit´anyi et al. 2009)

where K(x) is the Kolmogorov complexity of a string x and K(x|y) is the conditional Kolmogorov complexity of x given another string y. NCD is a computable approximation of the same []. Thus, we just verify if is a correct conceptual distance measure.

Consider two theories and , with same parameterizations (i.e., same heads). Let and be their grounded theories with identical parameter values . Our learned theories are equivalent to planning tasks. Assuming access to a planner which returns and , the two plan strings w.r.t the instantiations of concepts represented by and respectively.

Proposition 1 (Valid Conceptual Distance) Normalized

information distance is a valid and sound conceptual distance measure between and , i.e., iff the concepts represented by and are equivalent.

Proposition 2 (Generalization to Kolmogorov-Smirnov) In generalized probabilistic logic, following Vit´anyi (2013), corresponds to two-sample Kolmogorov-Smirnov statistic between two random variables and with distributions and respectively.

sup, where is the cumulative distribution function for and supis the supremum operator. In deterministic setting, is a special case of .

PAC Learnability: To analyze the theoretical properties of GOCI, we build on the PAC analysis for recursive rlgg (relative least general generalization) in GOLEM for function-free horn clause induction due to Muggleton & Feng (1990). Let n denote the sample size and H denote the hypothesis space. As shown earlier, GOCI learns the final theory by iterative refinement after inducing an initial theory. Denote the initial hypothesis space as and hypothesis space of the final theory as (such that ).

Proposition 3 (Sample complexity) Following Valiant (1984) and Mooney (1994), with probability , the sample complexity of inducing the optimal theory is:

where is the regret, - sample complexity of -number of distinct predicates, d is the distance of the current revision from the last known consistent theory, and L is the upper bound on the number of refinement steps (iterations).

ILP (such as GOLEM (Muggleton and Feng 1990)) induces ij-Determinate clauses (Muggleton and Feng 1990), where i is the maximum depth of a variable in a clause and & j is the maximum arity. Thus, in our problem setting, , where t is the number of terms, and p is the place (Muggleton and Feng 1990). Hence, (if j & i is bounded : ) Equation 3 can be reformulated as:

Mooney (1994) defines distance d to be the number of single literal changes in a single refinement step. In Algorithm 1, we observe that at each iteration , updates are w.r.t. the preferred constraint predicates .

Proposition 4 (Refinement distance) d is upper bounded by the expected number of literals that can be constructed out of the library of constraint predicates with human advice and lower bounded by the conceptual distance between theory learned at two consecutive iterations since we adopt a greedy approach. If denotes the probability of a constraint predicate being preferred then, where is the maximum possible number of constraint literals and q is the maximum arity of the constraints. If we use only pairwise constraints, then q =2.

Our input is sparse (one or few instances). GOCI elicits advice over constraints to acquire additional information. Let |X| be the number of input examples.

Proposition 5 (Advice complexity) From Equations 3 and 4, at convergence , we get examples, on an average, for a concept C to be PAC learnable using GOCI.

Figure 3: Instances of spatial concepts in Minecraft. (Left) Upright Tee, (Right) Upright L

4 Evaluation

We next aim to answer the following questions explicitly:

(Q1) Is GOCI effective in one-shot concept induction?

(Q2) How sample efficient is GOCI compared to baselines?

(Q3) What is the relative contribution of the novel scoring function vs. human guidance towards performance?

We developed our framework by extending Wisconsin Inductive Logic Learner (Natarajan et al. 2009). We modi-fied the scoring function with NCD penalty and integrated a customized SHOP2 planner (Nau et al. 2003) for inter-plan NCD computation. We added constraint sampling and human guidance in iterative fashion as outlined in Algorithm 1. Experimental Setup: We compare GOCI with a standard ILP system with no enhancements. We focus on the spe-cific task of “one-shot concept induction”, with a single input example for each of the several types of concepts and report aggregated precision. We consider precision because preference queries are meant to eliminate false positives in our case. To demonstrate the robustness of GOCI w.r.t sample complexity, beyond one-shot case, we conducted experiments with varying sample sizes for each concept type across all domains and show learning curves for the same. We also evaluate the relative contribution of two important components of our proposed framework: (a) novel scoring metric and (b) human guidance; i.e., we compare against two more baselines (ILP+Score and ILP+Guidance). For every domain, we consider ten different types of concepts (ten targets). All the results are aggregated over five different runs. Domains We employ 3 domains with varying complexity:

1. Minecraft (Spatial Structures): The goal is to learn discrete spatial concepts in a customized (Narayan-Chen, Jayannavar, and Hockenmaier 2019) Project Malmo platform for Minecraft. Dialogue data in Malmo is available online, and we converted them into a logical representation. All structures are in terms of discrete atomic unit blocks (cubes). Figure 3 shows examples of some simple spatial structures that GOCI was able to learn. The representation language is similar to the Example 2, with some pre-existing concepts in knowledge base such as Row, Tower, Cube etc.

2. Assembly (a planning domain): Generalized concepts are equivalent to parameterized planning tasks. Assembly is a well-known planning domain, where different mechanical

Figure 4: Learning curves for varying sample size to compare the sample-efficiency of GOCI and ILP (best viewed in color).

Figure 5: Results of ablation study on Minecraft domain. Relative contribution of our distance-penalized score vs. human guidance (best viewed in color).

Table 1: Results for one-shot concept learning.

structures (machines) are constructed using different parts and resources. Input is a conjunction of ground literals indicating ground plan demonstration (assuming total ordering).

3. Chemical Entities of Biological Interest (ChEBI): ChEBI (Degtyarenko et al. 2007) is a compound database which contains some important structural features and activity-based information for classification of chemicals. Core information ChEBI provides are (1) Molecular structure, (2) Biological role, (3) Application, and (4) Subatomic particle, which may be suitable for the prediction of various attributes of compounds. We model the Benzene molecule prediction task following the description in ChEBI. The data has predicates such as Carbon, SingleBond, DoubleBond, HasAtom etc.

4.1 Experimental Results

[Effective One-shot (Q1)] Table 1 shows the performance of GOCI on one-shot concept learning tasks as compared to standard ILP. GOCI significantly outperforms ILP across all domains answering (Q1) affirmatively. Also, note that GOCI is very ‘query’ efficient as observed from the small average number of queries posed in the case of each domain. [Sample Efficiency (Q2)] In figure 4, we observe that GOCI converges within significantly smaller sample size across all domains, thus, supporting our theoretical claims in Section 3.5. In ChEBI, though, quality of planning domain encoding might explain mildly lower-precision yet GOCI does perform significantly better than vanilla ILP learner. [Relative contribution (Q3)] Figure 5 validates our intuition that both components (scoring function and humanguidance) together make GOCI a robust one-shot (sample-efficient) concept induction framework. Though human guidance, alone, is able to enhance the performance of a vanilla ILP learner in sparse samples, yet it is not suffi-cient for optimal performance. In contrast, although the advantage of our novel distance-penalized scoring metric is marginal in sparse samples, it is essential for optimal performance at convergence.

The most important conclusion from the experiments is that when available, the guidance along with the score leads to a jump-start, better slope and in some cases, asymptotically sample efficient with a fraction of the number of instances needed than merely learning from data. This clearly demonstrates the need for the injection of human guidance as knowledge when learning complex concepts.

5 Conclusions

We developed a human-in-the-loop one-shot concept learning framework in which the agent learns a generalized representation of a concept as FOL rules, from a single (or a few) positive example(s). Our 2 primary contributions are – a new distance measure between concepts and richer human inputs than mere labels to be solicited actively by the agent. Our theoretical and experimental analyses showed the promise of GOCI method. An exhaustive evaluation involving richer human inputs and integration with a plan induction system remain interesting directions of future research.

References

[Bart and Ullman 2005] Bart, E., and Ullman, S. 2005. Single-example learning of novel classes using representation by similarity. In BMVC.

[Braziunas and Boutilier 2006] Braziunas, D., and Boutilier, C. 2006. Preference elicitation and generalized additive utility. In AAAI.

[Chick 2007] Chick, H. L. 2007. Teaching and learning by example. Mathematics: Essential research, essential practice.

[Das et al. 2018] Das, M.; Odom, P.; Islam, M. R.; Doppa, J. R.; Roth, D.; and Natarajan, S. 2018. Preference-Guided Planning: An Active Elicitation Approach. In AAMAS.

[Degtyarenko et al. 2007] Degtyarenko, K.; De Matos, P.; Ennis, M.; Hastings, J.; Zbinden, M.; McNaught, A.; Alc´antara, R.; Darsow, M.; Guedj, M.; and Ashburner, M. 2007. Chebi: a database and ontology for chemical entities of biological interest. Nucleic acids research 36(suppl 1):D344–D350.

[Friend et al. 2018] Friend, M.; Khaled, M.; Lefever, K.; and Sz´ekely, G. 2018. Distances between formal theories.

[Fung, Mangasarian, and Shavlik 2003] Fung, G. M.; Man- gasarian, O. L.; and Shavlik, J. W. 2003. Knowledge-based support vector machine classifiers. In NIPS.

[Goldman and Kuter 2015] Goldman, R. P., and Kuter, U. 2015. Measuring plan diversity: Pathologies in existing approaches and a new plan distance metric. In AAAI.

[Horn 1951] Horn, A. 1951. On sentences which are true of direct unions of algebras. The Journal of Symbolic Logic.

[Khan and Madden 2014] Khan, S. S., and Madden, M. G. 2014. One-class classification: taxonomy of study and review of techniques. The Knowledge Engineering Review.

[Kozerawski and Turk 2018] Kozerawski, J., and Turk, M. 2018. Clear: Cumulative learning for one-shot one-class image recognition. In CVPR.

[Lake et al. 2011] Lake, B.; Salakhutdinov, R.; Gross, J.; and Tenenbaum, J. 2011. One shot learning of simple visual concepts. In CSS.

[Lake, Salakhutdinov, and Tenenbaum 2015] Lake, B. M.; Salakhutdinov, R.; and Tenenbaum, J. B. 2015. Humanlevel concept learning through probabilistic program induction. Science.

[Lisi 2008] Lisi, F. A. 2008. Building rules on top of ontolo- gies for the semantic web with inductive logic programming. Theory and Practice of Logic Programming.

[Mitchell 1980] Mitchell, T. M. 1980. The need for biases in learning generalizations. CS Dept, Rutgers Univ.

[Mitchell 1997] Mitchell, T. 1997. Machine Learning. McGraw-Hill Higher Education.

[Mooney 1994] Mooney, R. J. 1994. A Preliminary PAC Analysis of Theory Revision. Computational Learning Theory and Natural Learning Systems.

[Muggleton and De Raedt 1994] Muggleton, S., and De Raedt, L. 1994. Inductive logic programming: Theory and methods. JLP.

[Muggleton and Feng 1990] Muggleton, S., and Feng, C. 1990. Efficient induction of logic programs. In ALT.

[MUGGLETON 1988] MUGGLETON, S. 1988. Machine invention of first-order predicates by inverting resolution. In Proc. of 5th Machine Learning Conference.

[Muggleton 1991] Muggleton, S. 1991. Inductive logic pro- gramming. New generation computing.

[Muggleton 1995] Muggleton, S. 1995. Inverse entailment and progol. New generation computing.

[Narayan-Chen, Jayannavar, and Hockenmaier 2019] Narayan-Chen, A.; Jayannavar, P.; and Hockenmaier, J. 2019. Collaborative dialogue in Minecraft. In ACL.

[Natarajan et al. 2009] Natarajan, S.; Kunapuli, G.; OReilly, C.; Maclin, R.; Walker, T.; Page, D.; and Shavlik, J. 2009. ILP for bootstrapped learning: A layered approach to automating the ILP setup problem. In ILP.

[Nau et al. 2003] Nau, D. S.; Au, T.-C.; Ilghami, O.; Kuter, U.; Murdock, J. W.; Wu, D.; and Yaman, F. 2003. Shop2: An htn planning system. JAIR.

[Odom et al. 2015] Odom, P.; Khot, T.; Porter, R.; and Natarajan, S. 2015. Knowledge-based probabilistic logic learning. In AAAI.

[Pazzani 1992] Pazzani, M. J. 1992. An information-based approach to integrating empirical and explanation-based learning. ILP.

[Quinlan 1990] Quinlan, J. R. 1990. Learning logical defini- tions from relations. Machine learning.

[Rouveirol 1990] Rouveirol, C. 1990. Saturation: Postpon- ing choices when inverting resolution. In ECAI.

[Rouveirol 1992] Rouveirol, C. 1992. Extensions of inver- sion of resolution applied to theory completion. ILP.

[Sammut and Banerji 1986] Sammut, C., and Banerji, R. B. 1986. Learning concepts by asking questions. Machine learning: An artificial intelligence approach.

[Shavlik and Towell 1989] Shavlik, J. W., and Towell, G. G. 1989. Combining explanation-based learning and artificial neural networks. In ICML.

[Srinivasan 2007] Srinivasan, A. 2007. Aleph: A learning engine for proposing hypotheses. Software available at http://web2. comlab. ox. ac. uk/oucl/research/areas/machlearn/Aleph/aleph. pl.

[Stroulia and Goel 1994] Stroulia, E., and Goel, A. K. 1994. Learning problem-solving concepts by reflecting on problem solving. In ECML.

[Tax 2001] Tax, D. M. J. 2001. One-class classification: Concept learning in the absence of counter-examples. Ph.D. Dissertation, TU Delft.

[Towell and Shavlik 1994] Towell, G. G., and Shavlik, J. W. 1994. Knowledge-based artificial neural networks. Artificial intelligence.

[Valiant 1984] Valiant, L. G. 1984. A theory of the learnable. Communications of the ACM.

[Vapnik 1999] Vapnik, V. N. 1999. An overview of statistical learning theory. IEEE Transactions on Neural Networks.

[Vinyals et al. 2016] Vinyals, O.; Blundell, C.; Lillicrap, T.; Wierstra, D.; et al. 2016. Matching networks for one shot learning. In NIPS.

[Vit´anyi et al. 2009] Vit´anyi, P. M.; Balbach, F. J.; Cilibrasi, R. L.; and Li, M. 2009. Normalized information distance. In Information theory and statistical learning. Springer. 45– 82.

[Vit´anyi 2013] Vit´anyi, P. M. 2013. Conditional kolmogorov complexity and universal probability. Theoretical Computer Science.

[Wang et al. 2018] Wang, Y.-X.; Girshick, R.; Hebert, M.; and Hariharan, B. 2018. Low-shot learning from imaginary data. In CVPR.

[Wilson and Latombe 1994] Wilson, R. H., and Latombe, J.- C. 1994. Geometric reasoning about mechanical assembly. Artificial Intelligence.

designed for accessibility and to further open science