In AI planning (Ghallab, Nau, and Traverso 2004; Geffner and Bonet 2013), a plan is synthesized against a model of the environment—a planning domain—to achieve a given goal from an initial state of the environment. Such model describes how actions change the world, via the specification of their preconditions and effects. As any model, planning domains are never “complete” and they inevitable make assumptions on the dynamics of the environment. A limitation of standard planning formalism is that they do not account for deviations from such assumptions, and hence are not well prepared for integration within an execution framework. A common approach to handle discrepancies between what the planner expected and what happened at run-time is to simply perform re-planning or plan-repair (Fox et al. 2006). But, why would the system keep reasoning about the same model of the world that has been proven “wrong”? As an actor’s view on planning becomes more prominent in the field (Ghallab, Nau, and Traverso 2014), and inspired by work in Software Engineering (D’Ippolito et al. 2014), we propose a “generalized”1 planning framework that aims
to better account for the uncertainties at design/engineering time. Concretely, rather than fixing the level of risk and ob- jectives, we envision the specification of various assump- tion levels, as well as different goals for each assumption level. This is achieved by allowing the knowledge engineer to specify a family of planning domains, each carrying a set of assumptions. For example, in an fully idealized model of the blocks world, a robotic arm always successfully grabs blocks, whereas in less idealized models the griper may fail when picking, maybe missing or even breaking it. Depend- ing on the assumptions imposed on the gripper operation, one may aim for different types of block towers.
The aim of the framework is to synthesize not one but a collection of inter-related policies embedding, together, adaptive behavior. So, as the environment violates assump- tions, the agent should gracefully “degrade” to less refined planning models. Since such models carry less assumptions on the environment, less functionalities can generally be of- fered. If the gripper may break a block while picking it, building a complete block may just not be achievable. So, with model degradation, comes goal degradation, usually to a less ambitious (and often less demanding) one.
We call the above framework multi-tier adaptive planning and is the topic this paper. Let us start with a simple example to motivate the work and upcoming technical development.
Consider a robot moving in a grid similar to the dust- cleaning robot example in (Bonet and Geffner 2015). The robot can walk one cell at a time or it can run covering multi- ple cells in one shot. Unfortunately, the physical shape of the corridor is not fully known to the designer and the physical guarantees of the robot’s actuators are not be fully known to the designer. Because of that, in some scenarios, some cells may be impossible to navigate and the robot may get dam- aged or even broken. So, the knowledge engineer considers various possible assumption levels on the environment’s be- havior, together with corresponding adequate objectives. In the most idealized model
, the designer assumes that both walking and running actions succeed with no negative
Listing 1: Actions walk left and run left in the three models.
side-effects. The goal there is for the robot to reach a destination cell c0 and intact. In a less idealized , both running and walking actions still cause the robot to advance, but no assumption can be made on their side effects and movement may cause minor damages. The robot should then just aim to reach the target destination c0. Finally, in the least idealized
, a walking action may sometimes cause the robot to get minor damages without even advancing, and even worse, a running action may get the robot broken and render it unusable. Under those weaker assumptions, the robot should return to base location c2 for servicing.
Under the above multi-tier specification, the robot tries its best, but adapts its behavior as it discovers some assumptions may not hold. To do so, the robot initially assumes the most idealized world model and thus works for the most ambitious goal: reach destination undamaged. But, upon observing an inconsistency with the assumptions, it must adapt both the model of the environment as well as the objective being pursued. For example, if the robot succeeds in advancing when moving but gets a minor damage, it should degrade to . If it actually fails to move at all, it should degrade to
to operate under such level weaker assumptions.
A solution to this scenario must, on the one hand, strive for the best possible solution and, on the other hand, be open to a potential future degradation. Concretely, the robot should never attempt to perform an action that may prevent graceful degradation. In our example, while, in principle, running would be the most efficient way to reach the destination, it may cause a catastrophic failure in tier level 1, precluding the goals of every tier. Thus, the robot must be conservative and should cautiously move by walking.
In what follows, we propose a framework to specify problems like this one in the realms of automated planning, de-fine its solution concept, and show how to compute solutions with existing planning technology.
In this section we propose a multi-tier automated planning framework in which the knowledge engineer can specify a ranked set of assumptions about the environment and a corresponding set of objective goals. A solution to such framework will display adaptive behavior at execution time, by aligning the model and objectives w.r.t. run-time observations. Before doing so, though, we first go over the standard technical machinery for non-deterministic planning.
A fully observable non-deterministic (FOND) planning domain (Rintanen 2008; Gerevini, Bonet, and Givan 2006) is a pair consisting of a set of Boolean state variables V and an operator set O. A state
is the set of propositional variables that are true in the state. We use S to denote the set of all states and l to denote the complement of literal l.
An operator is a tuple Pre
Eff
, where o is a unique name, Pre
is a Boolean condition over V describing the preconditions of operator o, and Eff
, with
, is the (non-deterministic) effect of o where each
is a (set of) conditional effects
with C being a Boolean condition over V and E a set (conjunction) of literals.2 The intended meaning is that one of the
effects ensues nondeterministically, by the environment’s choice.
A policy controller is a function that maps state
to a set of (executable) actions
. A policy C executed from state
on domain D de-fines a set of possible executions Ex
of the form
, where
,
Pre
, and
is a possible successor state when
is executed in state
w.r.t. domain D, for all
. We use last
to denote the last state in (finite) execution
and Ex(D, s) to the set of all possible executions in D from state
(i.e., Ex(D, s) = Ex
, where
).
Finally, a FOND planning problem consists of a FOND domain D, an initial state
, and a goal
as a conjunction of literals from V . There has been several solution concepts for FOND planning depending on the fairness of non-deterministic actions. Roughly speaking, a fair action is one in which all effects occur infinitively often when the action is executed infinitively many times in the same state (Geffner and Bonet 2013; Sardina and D’Ippolito 2015). When all actions are assumed fair, a strong-cyclic plan guarantees that the agent, by “retrying,” eventually achieves the goal (Cimatti et al. 2003). In turn, when no fairness can be assumed, a plan with acyclic executions that reaches the goal in a bounded number of steps—a strong policy—is required. The Dual FOND (or FOND+) hybrid variation has recently been introduced to deal with domains that have both fair and unfair actions/-effects (Camacho and McIlraith 2016; Geffner and Geffner 2018). In that setting, a solution amounts to a policy whose “fair” executions w.r.t. the actions/effects assumed to be fair (not necessarily all) are goal reaching. Lastly, we note that while planning under non-determinism is EXPTIMEcomplete (Rintanen 2004), effective optimized techniques and solvers have been developed, and is an area of significant active work (e.g., (Muise, McIlraith, and Beck 2012; Kuter et al. 2008; Kissmann and Edelkamp 2009; Muise, Belle, and McIlraith 2014; Geffner and Geffner 2018)).
With the technical machinery on FOND planning at hand, we are ready to formally present our framework for multi-tier adaptive planning. Following (D’Ippolito et al. 2014), we aim for the knowledge engineering to be able to specify a variety of models carrying different assumptions.
Definition 1. A multi-tier planning domain (MTD) is a tuple such that:
1. is a set of FOND planning domains over the same variables V and operator signatures, and every operator has the same preconditions across all domains in
;
2. is a partial-order relation over
such that
implies Ex
Ex
for all states
; and
3. has a greatest element in
, denoted
, as well as a minimum element.
The first condition states that an MTD is just a collection of planning domains over the same vocabulary. While the action names and their preconditions ought to be the same in all domains, the effects of each operator may differ across domains. Such differences in operator effects will reflect different assumptions on the environment. However, the differences cannot be arbitrary: they are to reflect model “re-finements.” This is achieved by the second condition, which specifies that domains in lower tiers of the hierarchy () must produce the same behaviors as higher models (
), and possibly more. The intuition is that higher-level models are “refinements” of lower-level models, posing possibly more assumptions on the environment (e.g., by actions having fewer non-deterministic effects), hence permitting fewer execution runs.
As in standard planning, a problem instance task adds a specific initial situation and a (set of) objectives.
Definition 2. A multi-tier planning problem (MTP) is a tuple where
is an
is M’s initial state, and G is a function mapping each domain D in
to a goal G(D) (or just
Observe that unlike standard planning approaches, we allow the designer to specify various goals, depending on the risk imposed by the assumptions on the environment. Often, the weaker the assumptions, the lower level of functionality that may be guaranteed (D’Ippolito et al. 2014).
Example 1. Listing 1 depicts an MTP for the no-running example of Section , in which the modeler specifies three planning domains. The two actions walk and run are modeled at each of the three tiers. The actions’ schema and preconditions are the same across all tiers. However, the effects differ progressively, as assumptions are relaxed from higher tiers to lower tiers. For example, in third tier model , the most idealized model, it is assumed that movement actions always succeed, whereas in the lowest tier model
, actions may fail and may even do so catastrophically. Also observe the goals for each tier may differ, since when assumptions are relaxed, more ambitious goals may not be achievable. Because effects are incrementally relaxed across tiers, runs at lowers tiers are strict super sets of those in upper tiers.
Finally, we define a structure that associates a specific policy to each domain in an MTD, prescribing what behavior should ensue from the executor under the different models.
Definition 3. A multi-tier controller (MTC) for an MTD is a function
mapping each domain
to a specific policy C(D) (or just
The challenge now is to formally capture when an MTC amounts to a “solution” strategy for a multi-tier planning problem. To do so, it is important to first understand how the MTC structure is meant to be deployed in the environment. Intuitively, at any point in time, the executor is operating relative on some planning domain (i.e., model) of the world D from the ones available in , by carrying out its corresponding policy C(D) so as to bring about the level’s goal G(D). Initially, the executor deploys policy
from the initial problem state
on the most idealized domain
, aiming at achieving the most ambitious goal
. However, if at any point during execution, an inconsistency with the current model
is observed, the executor ought to switch to an alternative domain
such that
. Technically, an inconsistency amounts to observing an actual state s that cannot be explained with planning domain
. Of course, once the executor switches downwards—referred as degradation—the model it operates on to a more permissive one (i.e., one with weaker assumptions), the objective sought, and hence the strategy, must be changed too. A smart executor, though, aims to degrade gracefully, that is, as little as possible, switching to a planning domain that retains as many assumptions as possible about the environment (and the most ambitious goal).
Let us now develop the solution concept for MTPs. We define the set of triggering states for a domain in an MTD as those states in which the executor, when deployed in a given multi-tier controller as per the above operational scheme, may need to start operating under such domain. As expected, the initial state is the triggering state for the highest level, most idealized, domain
. For other domains, a triggering state amounts to a degradation step.
Definition 4. Let C be an MTC for a , and let s be a state (over variables
). We inductively define the set of triggering initial states for each planning domain
under C, denoted Init(D, C), as follows:
Let us explain the second case. Suppose the executor has so far been carrying out policy on a domain model
, from some (triggering) state
. Suppose this has yielded execution run
(consistent with
). However, when executing the next prescribed operator o (as per the corresponding policy
for
), the resulting evolution to a state s yields an execution
that can be explained by (i.e., its a legal execution in) domain D but not by any model higher than D (including
). When this happens, state s is a triggering state for D, that is state where the executor may have to start operating under domain model D when using policy
.
Next, for a controller C to be a solution for an MTP M, it must achieve the associated goal of a domain in M from all the triggering states of the domain in question.
Definition 5. An MTC C is a solution controller for an MTP iff for every domain
, the projected policy
is a solution plan for planning problem
, for every state
Init
Note that, unlike standard planning, this definition requires each policy to work from more than one initial state. However, it is not the case that all policies in C need to work from the initial state . Such a requirement would be too demanding for capturing the intended operational framework as described above, this is because most policies, if not all but
, will ever be used at state
(unless the system comes back to such state after some degradation).
Informally, an MTP is a collection of similar planning problems and a solution amounts to solution policies for each problem that can be “connected”, if necessary, at degradation time. A naive approach thus would repetitively compute solution policies for each planning problem, making sure they “connect.” We show here we can solve the whole problem in a principled manner and in one shot. Concretely, we build a single Dual FOND planning task from a given MTP M such that a strong-cyclic solution for
amount to an MTC solution for M. To argue for technique’s generality, we first identify a meaningful fragment of MTDs.
Definition 6. A planning domain is an oneof-refinement of a domain
iff
and for every
Pre
Eff
, there is a
-operator
Pre
Eff
such that Pre
Pre
and Eff
Eff
That is, is like
but may contain fewer non-deterministic effects on some operators. It turns out that, in the context of Dual FOND planning, oneof-refinements capture all possible refinements in a multi-tier planning task— any MTD is equivalent to a oneof-refinement type.
Theorem 1. Let be an MTD and
. Then,
(i.e., planning domain
is a refinement of domain
) iff there exists a planning domain
such that:
1. ExEx
, for all
(that is,
are equivalent planning domains); and
2. is an oneof-refinement of
.
This states that the only meaningful difference between ordered domains in comes, only, in the refined domain (
) having less (in terms of set inclusion) non-deterministic effects in some operators.
Compilation to Dual FOND planning Let be a a multi-tier planning problem such that
if and only if
is a oneof-refinement of D. Due to Theorem 1, restricting
to a oneof-refinement relation does not affect generality. From now on, for technical legibility and without loss of generality, we assume domains in
are STRIP-like with no conditional effects.3
In this section, we shall construct a single dual-FOND planning problem that will fully capture problem M. For compactness, we use Eff
to de- note the effects of operator o in planning domain D. We also abuse notation and treat non-deterministic effects as sets.
Let us start by explaining the general strategy being encoded into . Roughly speaking, the planning problem
will model a dynamic system running as per multi-tier specification M. As such, at any time, the system is operating relative to some model D in
(initially, the most ambitious
), trying to achieve D’s goal via an appropriate plan, and degrading to an adequate (lower) model when action outcomes’ do not align with model D. To achieve this, the encoding will model an iterative two-phase process in which an acting phase is, sometimes if necessary, followed by an alignment & degradation phase. A special variable act is used to distinguish both phases. As expected, during an acting phase, an operator representing some domain action is executed. This step involves the execution of a non-deterministic action with fair semantics, and the optional subsequent execution of an unfair version of the action. In the latter case, the system will then evolve to an alignment phase, in which the encoding verifies whether the outcomes seen correspond to the assumed current model D; and if not, the behavior is “degraded” to an appropriate (lower-level) model that is able to explain the observed outcome.
It turns out that one of the key challenges is to encode a proper and scalable alignment phase in a planning domain (i.e., in PDDL): how can we encode that a given effect could be explained by some model in (but not by another model)? In some sense, doing so would amount to reducing meta-level reasoning to the object (PDDL) level. We show that, via a clever encoding, that reduction is indeed possible. The technical difficulty is depicted in the following example. Example 2. Consider the case in which the robot is operating in the highest domain model
and decides to execute action walk. Upon execution, the robot senses variable scratch true—the robot is now damaged. In the standard (intuitive) configuration, in which the robot starts nondamaged, the robot should degrade its operational model to domain
, as that is the highest model explaining the damage. However, if the robot happens to start damaged already, then domain model
still explains the transition, and no degradation should occur. Here, walk’s effects under
and
are indistinguishable.
This example shows that just observing a proposition (scratch) in a transition that does not appear in an effect (walk’s effect under ) does not directly imply the effect may not explain the transition. Can we then characterize, succinctly, under which conditions a set of observed propositions E is explained by some operator o in a domain model D? It turns out we can.
Definition 7 (Effect Explicability). Let E be a set of literals (e.g., effects that have just seen to be true after an action execution). The conditions for operator o in domain D to explain E, denoted Explains[o, D, E], is defined as follows (recall is the set symmetric difference operation):
That is, some effect of o in model D yields the same result as effect E, if all the literals that one effect has an the other does not were already true (at the outset of o’s execution). In our Example 2, if we take E to be the second effect of walk in
(i.e., E = {(not (at ?o)), (at ?d), (scratch)}) we have Explains
, as the literal scratch is the only one in the effects’ symmetric difference.
Observe that if E and are inconsistent, the formula will contain the conjunction of a proposition and its negation, thus reducing to false. In fact, the following result guarantees the intended meaning of the above definition.
Lemma 1. Let be two domain states. Let
be a set of literals including at least all new literals in
w.r.t. to s. Then, state
is a possible successor when operator o is executed in state
under model D if and only if s |= Explains[o, D, E].
We are now ready to provide the encoding of M into a dual-FOND planning problem .
Domain variables. The set of propositional variables of
is obtained by extending the set of variables V in M’s domains with the following additional variables:
• , for each domain
, that will be used to signal that model D is a/the highest model explaining the effect of the last executed action;
• , for each domain
, that will be used to track the most “ambitious” compatible model so far;
• act, use to denote the system is in the acting phase (otherwise, it is in the alignment phase);
• , for each operator
, that will be used to ensure the execution of a unfair action; and
• end, used to denote the goal achievement of the current model of execution.
Initial state & goal condition. The initial state of is:
This encodes the initial state of the MTP M and the fact that the system starts in the
’s greatest, most ambitious, domain model
(proposition
and all other
’s are false) and in the action phase. In addition, all effect level signaling variables
are initialized to false (no action has been executed), as well as the goal variable end.
Finally, the goal condition of is simply
end. We will see below which actions make variable end true.
Domain operators. The planning domain will include two types of operators, one for modeling the actual domain actions and one for implementing the alignment check (and potential degradation) process. Let us start with the former.
So, for each (domain) operator o in domain , we include a operator
Pre, Eff
in
, where:
• Pre = Preact
, that is, action
is executable when o itself is executable in D, and the system is currently operating under model D and is the fair-acting phase (act is true and all
are false); and
• Eff = Eff , that is, when operator
is executed,either one of original effects of o in D ensues or a distinguished predicate
is made true.
When one of the effects of o in domain D happens, it just resembles the dynamics of domain D. However, if the effect that ensues is , the system evolves into a “unfair-acting” phase (act
), explained after the following example.
Example 3. The resulting walk action for the domain level in the compilation would look as follows in PDDL:
:parameters (?o - cell ?d - cell) :precondition (and (at ?o) (adj ?o ?d) (not (broken))
Next, when the the system evolves to the unfair-acting phase (e.g., due to effect u_walk happening in the example above), the only executable action will be a second version of domain operator o, which in turn will include all effects of o across all domains in , together with additional bookkeeping variables
to support the next alignment, and potential degradation, reasoning phase. More concretely, for each domain operator o in
includes a rather powerful operator
Pre, Eff
, where:
• Pre = act Pre
, that is, the system is in the unfair-acting phase for operator o; and
• Eff is a set of nondeterministic effects, each being a col-
where Explains
Explains
. Intuitively, the operator
contains each possible effects E of o (from any domain in M) as a non-deterministic option. In turn, the set of conditional effects for a particular effect E will not only make the effect E itself ensue, but will also set a “marker” proposition
signaling the highest domains explaining the effect in question. To realize that, condition
above states that the original effect E is explained (as per Definition 7) by (some effect of) operator o at domain model
but not by any other model higher than
. When that is the case, proposition
is set to true, recording the fact that
is the highest model explaining such effect. Observe that by the way all conditions are designed, they ought to be mutually exclusive, so only one
will be made true. In addition, act is set to false so as to force the reasoner into the alignment phase, to be explained shortly. (We note that the effects of level D itself are accounted when
.)
Importantly, while operator will be treated fair, action
will be treated as unfair—this is where dual FOND semantics (Geffner and Geffner 2018) come into play. Also, as the following example shows, significant syntactic simplifi-cations can be achieved in
by analyzing conditions in conditional effects and precondition of the action.
Example 4. Let us see complete Example 3 by showing the unfair version of the walk action. After syntactic simplifica-tion w.r.t. conditions and the action precondition, we obtain the simpler, more readable, equivalent action:
As we discussed before, this unfair action contemplates the effects present in all the domain models. The intended meaning of this is that whenever an action executes, it may fail and we may observe effects of any domain level. However, we do not want the planner to rely on these possible failures, so we contemplate them as unfair actions.
Alignment & degradation operators. When the unfair version of a domain operator has been executed, an effect could ensue that might not be explained by the current domain under which the reasoner is operating under (encoded via propositions ). If so, the system ought to gracefully degrade to a lower level model that is able to explain the last system evolution. We encode this reasoning, and potential degradation, in the so-called alignment phase (act is false).
In the best case, the state observed after the execution of an action corresponds to one of the expected ones w.r.t. the current planning domain model the executor is operating under. Technically, the reasoner continues operating under current model D (proposition is true), provided domain D has been able to explain the evolution of the last executed action: proposition
has been set to true for some domain
that is either D itself or a higher one in the hierarchy (recall effects in higher level domains are subsets of ). So, in such case, the planner (and executor) is able to execute special action
Pre, Eff
to keep planning/executing under the current model and goal:
• Pre act
, that is, the action can be executed during the alignment phase when the current domain or one of its refinements accounts for the last effect outcome.
• Eff = (act , that is, effect signals are allreset and the system goes back to the action phase.
If, on the other hand, the state observed does not conform to the current operating model (i.e., proposition is false), then the system must degrade to a lower tier where the environment model would fit the observation, and adjust the objective to the corresponding (often less ambitious) goal. Needless to say, we expect a “smart” reasoner/executor to degrade as little as possible, by retaining as many assumptions on the environment as possible and only dropping those that have been observed to be wrong. This will allow the agent to aim for the highest, most valuable, goal so far.
Technically, when such that
, we include an operator
Pre, Eff
in
, where:
• Eff ∧
∧ act).
That is, the controller can degrade from current operating domain D to domain if the last effect seen was explained by lower domain
or any other domain higher than
that is unrelated to D (so as to handle MTPs with a non-linear structure). The effect results in the controller being degraded to level
(proposition
becomes true), all booking explicability effect prepositions
being reset, and the reasoner progressing to the acting phase.
Note that, effectively, the dynamics of level variables are outside the control of the reasoner, as these depend only on which non-deterministic effects have occurred and how (i.e., how variables
have been set).
Goal operators. The only part remaining is the overall goal of the multi-tier problem. Intuitively this should be “achieve the highest level goal”, which under a conservative degradation process, it reduces to “achieve the goal of the current operating model.” We therefore include goal actions end
, one per domain
.
This completes the encoding of a multi-tier planning problem M into a single non-deterministic planning domain . We now prove its correctness w.r.t. Definition 5. First, any solution policy for the planning task amounts, as is, to a solution to the corresponding multi-tier planning problem.
Theorem 2. If is a strong-cyclic solution for
, then controller
is an MTC solution for M, where:
Proof Sketch. Consider and
such that
Init
, and an infinite and fair execution
Ex
. We show that goal G(D) holds true somewhere along
as follows:
1. We transform into an execution
Ex
, with
act
, by adding propositions act and
to every state in
and replacing every domain operator o with its
version.
2. If appears infinitively often in
, we replace every second appearance of the form
by two-action steps
act
such that
is the highest domain in
that contains the effect of o that supports the transition
—we know there is one because
is a legal execution in
from state
. By do- ing this changes in
we are guarantee that the execution is fair, while still preserving the fact that every domain action in it has the effects as per domain D. So, execution
mirrors the original
for domain D but over the extended language of
.
3. Because Init
, there exists a finite execution
Ex
(i.e., execution in
via policy
) that ends in state
act}. This means that
Ex
, and since
is fair (w.r.t. the
fair actions) and has
always true, it follows that
has to reach the
’s goal by executing operator CHECKGOAL
. Then, its precondition
holds true at some point in
and therefore in
too.
That is, the MTC controller C under domain D and in state s, what the strong-cyclic solution for prescribes when
is true and the reasoning cycle is in the acting phase.
In addition, any possible MTC solution will be represented by some strong-cyclic policy of (i.e., completeness). Theorem 3. If C is an MTC solution for M, then there exists a strong-cyclic solution
for
such that
, for every domain D in M (where
is as in Theorem 2). Proof Sketch. Policy
follows the domain actions prescribed by
exactly, augmented with the booking auxiliary actions as needed. A similar argument, based on execution traces, as in Theorem 2 can be built.
We close by noting that the size of is increased by a linear number of bookkeeping propositional variables, and a quadratic number (w.r.t. the number of domains in
) of extra actions. So, while the multi-tier planning framework appears to be more involved than the standard (non-deterministic) planning, it can be suitably reduced to the latter, with an encoding that is, arguably, fairly natural and comparable in size. Importantly, though, the solution proposed relies on the fact that we can specify both fair and unfair actions in the same planning model. This is a feature that will prove a challenge when actually realizing the technique in current planning technology, as we shall see next.
In this section, we demonstrate that MTPs can indeed be solved today with existing planning technology, but argue that additional effort in Dual FOND is necessary. The first obstacle is the availability of FOND planning technology supporting both fair and unfair assumptions. To the best of our knowledge, the only off-the-shelf planner to do so is Geffner and Geffner (2018)’s FOND-SAT system. By leveraging on SAT solvers, their system yields an elegant declarative technique for FOND planning that features the possibility of combining fair and unfair actions. So, we report on using FOND-SAT over the encoding for our non-running example. Notwithstanding, the experiments reported are intended to demonstrate the existence of systems to solve MTPs and to provide a baseline for future work, rather than for providing a performance evaluation. Listing 2 shows a fragment, in a readable plan-like format, of the outcome when FOND-SAT is ran on our encoding for the non-running example. The full controller is shown in Appendix A.4. First of all, the plan cautiously avoids the run action altogether, as it may get the robot broken and precludes the achievement of all tier goals. After performing the walk fair-version action in (line 2) corresponding to the highest model , the plan checks its
effects (line 3). If proposition u_walk remains false (lines 4-9), the effect in model has occurred—the robot has done a successful move. If another walk (line 4) succeeds as well (lines 5-7), the robot achieves the top level
’ goal (line 6). Note that, in such a run, no alignment action is included: the walk unfair version has never been performed and hence only effects of
has ensued.
If, instead, the first walking action (line 2) yields the special effect u_walk, the plan jumps to line 11. There, the only action available is the unfair version of walking (line 11), which has all the effects, as non-deterministic options, of the walking action across all domains , and
. As FOND-SAT does not handle conditional effects, we simulate each conditional effect for the effect chosen by a set of walk_eE_explained_by_dx whose precondition is
, together with the original precondition of the operator (walk in this case). Finally, if the effect chosen could be explained by the current operating domain (line 13, explained by domain
), the system executes a continue operation at the current level, enabling the next domain action. On the other hand, when the effect is explained by a lower domain than the one operating under (lines 18 and 23), degradation to the corresponding domain is carried out (line 19 and 24).
It is easy to see how this plan also represents a multi-tier controller which only outputs the fair version of the operators, and all the other auxiliar actions and propositions are the controller internal memory.
Now, what would happen if the robot starts scratched as discussed in Example 2? It turns out the problem becomes unsolvable. The reason is that any observed scratch after movement does not need to be explained by a different model than (e.g., the walk effect of
that scratches the robot), as the scratch is explained already by being true originally. Thus, when walking always advances the agent, it would never degrade its behavior, remain operating in
without ever achieving
’s goal. If, however, we drop the non-scratched requirement from
, the problem would be solvable again, though with a slightly different policy. The robot would just try to achieve the top goal, degrading only to
if it does not move after a walk action. Since, as discussed,
’s scratch effect would be explained by
itself, line 18 would become walk_e2_explained_by_l3 and line 19 would become continue_l3. The policy for this example is shown in Appendix A.
While the above demonstrates the possibility to solve MTPs using (the only) existing planning technology, running our, arguably simple, example takes around 600 seconds to produce the 29 states first controller shown in the Appendix A in an i7-4510 CPU with 8GB of RAM, when using the off-the-shelf version of the planner. This clearly indicates the need for more and better dual-FOND implementations or the development of specialized optimizations for MTPs. For example, as we are only allowing degradation and not upgrades, one can modify the SAT encoding to specify a number of controllers to be used per domain level, without allowing transitions from lower to upper levels. In preliminary tests we did we experienced a speed-up of approx 30%. Another optimization involves an estimation of the number of controllers required to solve the MTP, for example by running the top level domain which will provide a lower bound.
The work presented in this paper is mostly related to existing work that aim to better handle execution failures and exceptions as well work that aim to provide richer goal spec-ifications. Fault Tolerant Planning (FTP) explores how to build plans that can cope with operations’ failures. Notably, the work of Domshlak (2013) considers the FTP problem via reductions to planning. There, k-admissible plans are defined as those that guarantee to achieve the goal even if an operation happens to fail up to k times. Like ours, the approach reduces to (classical) planning via an intelligent compilation inspired in that of (Bonet and Geffner 2011). In other words, the problem is reduced to bounded liveness. We do not impose such a restriction and our adaptive framework is actually orthogonal to theirs. In fact, it would be possible to accommodate k-admissible plans within our hierarchy, so that degradation is trigger only after some number of observed failures. In Robust Planning (RP), the usual approach is to search for the best plan assuming the worst possible environment, e.g., (Buffet and Aberdeen 2005). Typically, RP looks for subclasses of MDPs for which policies are guaranteed to eventually reach a goal state. Policies are normally computed as solutions to Stochastic Shortest Path problems. Our approach, instead, is based on a qualitative formaliza-
tion of action and change, with no specification of probabilities, which can sometimes be not trivial, unfeasible, or simply uneconomical to obtain. More importantly, having a single problem description forces the engineer to represent a goal that may never be achieved in the actual environment. Indeed, the engineer may find it difficult, or impossible, to model degraded goals depending on the actual environment’s dynamics; it would require the goals to somehow predicate on the probabilities associated to the assumed degraded behavior. We in turn provide an accessible formalism for modelling different levels of assumptions and goals. Also, our technique is not rooted in dynamic programming or optimization algorithms, but in planning ones.
In terms of representation formalisms, there has been efforts to provide more powerful goal specifications, such as Lago, Pistore, and Traverso (2002)’s EAGLE language, De Giacomo et al. (2016)’s Agent Planning Programs, and Shiv- ashankar et al. (2012)’s Hierarchical Goal Networks (HGN), but always assuming operation on a single model of the environment aiming for a single goal.
Our work also keeps some relationship with deliberative control architectures such as T-REX (McGann et al. 2008), and other similar planning-execution-monitoring architectures like Propice-Plan (Despouys and Ingrand 1999) or CPEF (Myers 1999). Our approach differs in that those systems usually replan when goals (or perceived state) change, whereas our proposal aims to generate a policy that takes into account all the scenarios prior to the system’s execution (as per model). As a result, we can provide guarantees of goal achievability and program termination (although it s more demanding computationally). Also, our framework puts more emphasis on formal semantics (based on PDDL and transition systems), modeling, and plan synthesis, whereas the mentioned agent architectures primarly focus on implemented platforms and execution.
As stated, our work is inspired by that of D’Ippolito et al. (2014) in the area of Software Engineering. At a general level, our account is rooted in Knowledge Representation, and specifically, automated planning, which allows us to leverage on advanced representation formalisms (e.g., PDDL) as well as computational techniques for such representations. Still, as ours, their approach provides support for a tiered architecture with multiple behavioural models and goals. Unlike ours, though, their account is limited to a lineal hierarchy of models, so independent assumptions, as in our example, cannot be represented. More importantly, D’Ippolito et al. require solution (sub-)controllers of lower tiers to simulate those in upper tiers, and thus it would not handle our simple no-running example. Finally, being rooted in knowledge representation, we area able to exploit planning technology.
The overarching motivation of this work is to addresses Ghallab, Nau, and Traverso (2014)’s call for an actors’ perspective on planning, by proposing a pure planningbased framework that “integrates better planning and acting.” In our framework, the knowledge engineer has the opportunity to consider multiple levels of assumptions and goals. The problem amounts to synthesize a meta-controller that, while running, is able to gracefully degrade its “level of service” when the assumptions on the environment are not met. We developed a compilation technique to construct such adaptive meta-controllers via dual-FOND planning. We note that plain FOND planning, where every action is assumed fair, would not work, as the agent may decide to keep trying an action to obtain one of the “failing” effects to achieve an “easier” lower-level goal (this artifact was already noted by Camacho and McIlraith (2016)).
There are several limitations of our framework in its current form. The approach is based on lattice structures that do not guarantee full ordering of environment models. As a consequence, the executor may have multiple optimal options to degrade execution from a given tier, and it has to “guess” one. But, upon a later inconsistency, the executor may find out it should have degraded differently. Since, the proposal does not provide support to move “sideways” in the same tier, the executor is forced to degrade downwards again. Also, we have not provided a mechanism for enhancement, that is, for “upgrading” to more refined models, for example, when certain transient failure has been fixed.
Alberto Pozanco carried out this work during his visit to RMIT University supported by FEDER/Ministerio de Ciencia, Innovaci´on y Universidades Agencia Estatal de Investigaci´on TIN2017-88476-C2-2-R and RTC-2016-5407-4.
[2011] Bonet, B., and Geffner, H. 2011. Planning under partial observability by classical replanning: Theory and experiments. In Proc. of the International Joint Conference on Artificial Intelligence (IJCAI), 1936–1941.
[2015] Bonet, B., and Geffner, H. 2015. Policies that gener- alize: Solving many planning problems with the same policy. In Proc. of the International Joint Conference on Artifi-cial Intelligence (IJCAI), 2798–2804.
[2005] Buffet, O., and Aberdeen, D. 2005. Robust planning with (l)rtdp. In Proc. of the International Joint Conference on Artificial Intelligence (IJCAI), 1214–1219.
[2016] Camacho, A., and McIlraith, S. A. 2016. Strong- cyclic planning when fairness is not a valid assumption.
[2003] Cimatti, A.; Pistore, M.; Roveri, M.; and Traverso, P. 2003. Weak, strong, and strong cyclic planning via symbolic model checking. Artificial Intelligence 147(1-2):35–84.
[2016] De Giacomo, G.; Gerevini, A.; Patrizi, F.; Saetti, A.; and Sardina, S. 2016. Agent planning programs. Artificial Intelligence 231:64–106.
[1999] Despouys, O., and Ingrand, F. F. 1999. Propiceplan: Toward a unified framework for planning and execution. In Biundo, S., and Fox, M., eds., Recent Advances in AI Planning, 5th European Conference on Planning, ECP’99, Durham, UK, September 8-10, 1999, Proceedings, volume 1809 of Lecture Notes in Computer Science, 278– 293. Springer.
[2014] D’Ippolito, N.; Braberman, V. A.; Kramer, J.; Magee, J.; Sykes, D.; and Uchitel, S. 2014. Hope for the best, prepare for the worst: multi-tier control for adaptive systems. 688–699.
[2013] Domshlak, C. 2013. Fault tolerant planning: Com- plexity and compilation. In Proc. of the International Conference on Automated Planning and Scheduling (ICAPS).
[2006] Fox, M.; Gerevini, A.; Long, D.; and Serina, I. 2006. Plan stability: Replanning versus plan repair. In Proc. of the International Conference on Automated Planning and Scheduling (ICAPS), 193–202. AAAI Press.
[2013] Geffner, H., and Bonet, B. 2013. A Concise Introduction to Models and Methods for Automated Planning. Morgan & Claypool Publishers.
[2018] Geffner, T., and Geffner, H. 2018. Compact policies for fully observable non-deterministic planning as SAT. In Proceedings of the Twenty-Eighth International Conference on Automated Planning and Scheduling, ICAPS 2018, Delft, The Netherlands, June 24-29, 2018., 88–96.
[2006] Gerevini, A.; Bonet, B.; and Givan, B., eds. 2006. Booklet of 4th International Planning Competition.
[2004] Ghallab, M.; Nau, D. S.; and Traverso, P. 2004. Automated Planning: Theory and Practice. Morgan Kaufmann Publishers Inc.
[2014] Ghallab, M.; Nau, D. S.; and Traverso, P. 2014. The actor’s view of automated planning and acting: A position paper. Artificial Intelligence 208:1–17.
[2009] Kissmann, P., and Edelkamp, S. 2009. Solving fully- observable non-deterministic planning problems via translation into a general game. In Proceedings of the Annual German Conference on AI, 1–8.
[2008] Kuter, U.; Nau, D., S.; Reisner, E.; and Goldman, R., P. 2008. Using classical planners to solve nondeterministic planning problems. In Proc. of the International Conference on Automated Planning and Scheduling (ICAPS), 190–197.
[2002] Lago, U. D.; Pistore, M.; and Traverso, P. 2002. Plan- ning with a language for extended goals. In Proc. of the National Conference on Artificial Intelligence (AAAI), 447– 454.
[2008] McGann, C.; Py, F.; Rajan, K.; Thomas, H.; Hen- thorn, R.; and McEwen, R. S. 2008. A deliberative architecture for AUV control. In 2008 IEEE International Conference on Robotics and Automation, ICRA 2008, May 19-23, 2008, Pasadena, California, USA, 1049–1054. IEEE.
[2014] Muise, C.; Belle, V.; and McIlraith, S. A. 2014. Computing contingent plans via fully observable non-deterministic planning. In Proc. of the National Conference on Artificial Intelligence (AAAI).
[2012] Muise, C.; McIlraith, S. A.; and Beck, J. C. 2012. Im- proved non-deterministic planning by exploiting state relevance. In Proc. of the International Conference on Automated Planning and Scheduling (ICAPS), 172–180.
[1999] Myers, K. L. 1999. CPEF: A continuous planning and execution framework. AI Magazine 20(4):63–69.
[2003] Rintanen, J. 2003. Expressive equivalence of for- malisms for planning with sensing. In Proc. of the International Conference on Automated Planning and Scheduling (ICAPS), 185–194.
[2004] Rintanen, J. 2004. Complexity of planning with par- tial observability. In Proc. of the International Conference on Automated Planning and Scheduling (ICAPS), 345–354.
[2008] Rintanen, J. 2008. Regression for classical and non- deterministic planning. In Proc. of the European Conference in Artificial Intelligence (ECAI), 568–572.
[2015] Sardina, S., and D’Ippolito, N. 2015. Towa rds fully observable non-deterministic planning as assumption-based reactive synthesis. In Proc. of the International Joint Conference on Artificial Intelligence (IJCAI), 3200–3206.
[2012] Shivashankar, V.; Kuter, U.; Nau, D. S.; and Alford, R. 2012. A hierarchical goal-based formalism and algorithm for single-agent planning. In Proc. of the International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), 981–988.
[2008] Srivastava, S.; Immerman, N.; and Zilberstein, S. 2008. Foundations of Generalized Planning. Technical Report UM-CS-2008-039, Dept. of Computer Science, Univ. of Massachusetts, Amherst.
The following is graphical representation of the policy obtained for the non-running scenario:
In turn, the controller for the variant of the scenario where (1) the robot is initially scratched; and (2) the non-scratched requirement from ’s goal is dropped is as follows: