b

DiscoverSearch
About
My stuff
The Lyapunov Neural Network: Adaptive Stability Certification for Safe Learning of Dynamical Systems
2018·arXiv
Abstract
Abstract

: Learning algorithms have shown considerable prowess in simulation by allowing robots to adapt to uncertain environments and improve their performance. However, such algorithms are rarely used in practice on safety-critical systems, since the learned policy typically does not yield any safety guarantees. That is, the required exploration may cause physical harm to the robot or its environment. In this paper, we present a method to learn accurate safety certificates for nonlinear, closed-loop dynamical systems. Specifically, we construct a neural network Lyapunov function and a training algorithm that adapts it to the shape of the largest safe region in the state space. The algorithm relies only on knowledge of inputs and outputs of the dynamics, rather than on any specific model structure. We demonstrate our method by learning the safe region of attraction for a simulated inverted pendulum. Furthermore, we discuss how our method can be used in safe learning algorithms together with statistical models of dynamical systems.

image

Safety is among the foremost open problems in robotics and artificial intelligence [1]. Many autonomous systems, such as self-driving cars and robots for palliative care, are safety-critical due to their interaction with human life. At the same time, learning is necessary for these systems to perform well in a priori unknown environments. During learning, they must safely explore their environment by avoiding dangerous states from which they cannot recover. For example, consider an autonomous robot in an outdoor environment affected by rough terrain and adverse weather conditions. These factors introduce uncertainty about the relationship between the robot’s speed and maneuverability. While the robot should learn about its capabilities in such conditions, it must not perform a maneuver at a high speed that would cause it to crash. Conversely, traveling at only slow speeds to avoid accidents is not conducive to learning about the extent of the robot’s capabilities.

To ensure safe learning, we must verify a safety certificate for a state before it is explored. In control theory, a set of states is safe if system trajectories are bounded within it and asymptotically converge to a fixed point under a fixed control policy. Within such a region of attraction (ROA) [2], the system can collect data during learning and can always recover to a known safe point. In this paper, we leverage Lyapunov stability theory to construct provable, neural network-based safety certificates, and adapt them to the size and shape of the largest ROA of a general nonlinear dynamical system.

Related work Lyapunov functions are convenient tools for stability (i.e., safety) certification of dynamical systems [2] and for ROA estimation [3, 4, 5]. These functions encode long-term behaviour of state trajectories in a scalar value [6], such that a ROA can be encoded as a level set of the Lyapunov function. However, Lyapunov functions for general dynamical systems are difficult to find; computational approaches are surveyed in [7]. A Lyapunov function can be identified efficiently via a semi-definite program (SDP, [8]) when the dynamics are polynomial and the Lyapunov function is restricted to be a sum-of-squares (SOS) polynomial [9]. Other methods to compute ROAs include maximization of a measure of ROA volume over system trajectories [10], and sampling-based approaches that generalize information about stability at discrete points to a continuous region [11].

This paper is particularly concerned with safety certificates for dynamical systems with uncertainties in the form of model errors. In robust control [12], the formulation of SDPs with SOS Lyapunov functions is used to compute ROA estimates for uncertain linear dynamical systems with the assumption of a worst-case linear perturbation from a known bounded set [13, 14]. Learning-based control methods with a Gaussian process (GP, [15]) model of the system instead consider uncertainty in a Bayesian manner, where model errors are reduced in regions where data has been collected. The methods in [16, 17] estimate a ROA with Lyapunov stability certificates computed on a discretization of the state space, which is used for safe reinforcement learning (RL, [18]). The Lyapunov function is assumed to be given in [16], while [17] uses the negative value (i.e., cost) function from RL with a quadratic reward. Ultimately, this approach is limited by a shape mismatch between level sets of the Lyapunov function and the true largest ROA. For example, a quadratic Lyapunov function has ellipsoidal level sets, which cannot characterize a non-ellipsoidal ROA, while the SOS approach is restricted to fixed monomial features. To improve safe exploration for general nonlinear dynamics, we want to learn these features to determine a Lyapunov function with suitably shaped level sets.

Contributions In this paper, we present a novel method for learning accurate safety certificates for general nonlinear dynamical systems. We construct a neural network Lyapunov candidate and, unlike past work in [19, 20], we structure our candidate such that it always inherently yields a provable safety certificate. Then, we specify a training algorithm that adapts the candidate to the shape of the dynamical system’s trajectories via classification of states as safe or unsafe. We do not depend on any specific structure of the dynamics for this. We show how our construction relates to SOS Lyapunov functions, and compare our approach to others on a simulated inverted pendulum benchmark. We also discuss how our method can be used to make safe learning more effective.

We consider a discrete-time, time-invariant, deterministic dynamical system of the form

image

where  t ∈ Nis the time step index, and  xt ∈ X ⊂ Rdand  ut ∈ U ⊂ Rpare the state and control inputs respectively at time step t. The system is controlled by a feedback policy  π: X → Uand the resulting closed-loop dynamical system is given by  xt+1 = fπ(xt)with  fπ(x) = f(x, π(x)). We assume this policy is given, but it can, for example, be computed online with RL or optimal control. This policy  πis safe to use within a subset  Sπof the state space X. The set  Sπis a ROA for  fπ, i.e., every system trajectory of  fπthat begins at some  x ∈ Sπalso remains in  Sπand asymptotically approaches an equilibrium point  xO ∈ Sπwhere  fπ(xO) = xO[2]. We assume  xO = 0without loss of generality. Hereafter, we use  Sπto denote the true largest ROA in X under the policy  π.

A reliable estimate of  Sπis critical to online learning systems, since we need to ensure that a policy is safe to use on the real system before it can be deployed. The goal of this paper is to estimate the largest safe set  Sπ. We must also ensure safety by never overestimating  Sπ, i.e., we must not identify unsafe states as safe. For this to be feasible, we make a regularity assumption about the closed-loop dynamics; we assume  fπis Lipschitz continuous on X with Lipschitz constant  Lfπ ∈ R>0. This is a weak assumption and is even satisfied when a neural network policy is used [21].

2.1 Safety Certification with Lyapunov Functions

One way to estimate the safe region  Sπis by using a Lyapunov function. Given a suitable Lyapunov function v, a safe region for the closed-loop dynamical system  xt+1 = fπ(xt)can be determined.

Theorem 1 (Lyapunov’s stability theorem [6]): Suppose  fπis locally Lipschitz continuous and has an equilibrium point at  xO = 0. Let  v : X → Rbe locally Lipschitz continuous on X. If there exists a set  Dv ⊆ Xcontaining 0 on which v is positive-definite and  ∆v(x) := v(fπ(x)) − v(x) < 0, ∀x ∈ Dv \ {0}, then  xO = 0is an asymptotically stable equilibrium. In this case, v is known as a Lyapunov function for the closed-loop dynamics  fπ, and  Dvis the Lyapunov decrease region for v.

image

Figure 1. Fig. 1a illustrates a shape mismatch between the largest level set V(c) (blue ellipsoid) of a quadratic Lyapunov function v contained within the decrease region  Dv(green dashes), and the safe region  Sπ (black).We cannot certify all of  Sπ with v, which limits exploration in safe learning. Instead, we train a Lyapunov candidate  vθwith parameters  θ to match Sπwith a level set  Vθ(cS), as in Fig. 1b, via classification of sampled states as “safe” with ground-truth label  y = +1 (i.e., x ∈ Sπ) or “unsafe” with  y = −1 (i.e., x /∈ Sπ).

Theorem 1 states that a Lyapunov function v characterizes a “basin” of safe states where trajectories of  fπ“fall” towards the origin  xO = 0. If we can find a positive-definite v such that the dynamics always map downwards in the value of v(x), then trajectories eventually reach v(x) = 0, thus x = 0. To find a ROA, rather than checking if v decreases along entire trajectories, it is sufficient to verify the one-step decrease condition  ∆v(x) < 0for every state x in a level set of v.

Corollary 1 (Safe level sets [6]): Every level set  V(c) := �x | v(x) ≤ c�, c ∈ R>0contained within the decrease region  Dvis invariant under  fπ. That is,  fπ(x) ∈ V(c), ∀x ∈ V(c). Furthermore,  limt→∞ xt = 0for every  xtin these level sets, so each one is a ROA for  fπand  xO = 0.

Intuitively, if v(x) decreases everywhere in the level set  V(c1), except at  xO = 0where it is zero, then  V(c1)is invariant, since the image of  V(c1)under  fπis the smaller level set  V(c2)with  c2 < c1. If v is also positive-definite, then this ensures trajectories that start in a level set V(c) contained in the decrease region  Dvremain in V(c) and converge to  xO = 0. To identify safe level sets, we must check if a given Lyapunov candidate v satisfies the conditions of Theorem 1. However, the decrease condition  ∆v(x) < 0is difficult to verify throughout a continuous subset  Dv ⊆ X. It is sufficient to verify the tightened safety certificate  ∆v(x) < −L∆vτat a finite set of points that cover  Dv, where  L∆v ∈ R>0is the Lipschitz constant of  ∆vand  τ ∈ R>0is a measure of how densely the points cover  Dv[17]. We can even couple this with bounds on  fπfrom a statistical model to certify high-probability safe sets with the certificate  ∆ˆv(x) < −L∆vτ, where  ∆ˆv(x)is an upper confidence bound on  ∆v(x). A GP model of  fπis used for this purpose in [17].

2.2 Computing SOS Lyapunov Functions

In general, a suitable Lyapunov candidate v is difficult to find. Computational methods often restrict v to a particular function class for tractability. The SOS approach restricts v(x) to be polynomial, but is limited to polynomial dynamical systems, i.e., when  fπ(x)is a vector of polynomials in the elements of x [9, 22, 23]. In particular, the SOS approach enforces  v(x) = m(x)⊤Qm(x), where m(x) is a vector of a priori fixed monomial features in the elements of x, and Q is an unknown positive-semidefinite matrix. This makes v(x) a quadratic function on a monomial feature space. A SDP can be efficiently solved to yield a Q that simultaneously guarantees that v satisfies the assumptions of Theorem 1 and has the largest possible level set in its decrease region  Dv. That is, the positive-definiteness of v and the negative-definiteness of  ∆vin  Dvare enforced as constraints in the SDP. This contrasts the more general approach described in Sec. 2.1, where a Lyapunov candidate v is given and then the assumptions of Theorem 1 are verified by checking discrete points.

With the SOS approach and a suitable choice of  m(x), Sπcan be estimated well with a level set V(c) of v, since the monomial features allow Lyapunov functions with shapes beyond simple ellipsoids to be found. However, the SOS approach requires polynomial dynamics, and the best choice of m(x) can be difficult to determine. Without a suitable Lyapunov function, we face the problem of a shape mismatch between V(c) and  Sπ. This is exemplified in Fig. 1a, where level sets of quadratic v are ellipsoidal while  Sπis not, which limits the region of the state space that is certifiable as safe by v.

In this section, we establish a more flexible class of parameterized Lyapunov candidates that can satisfy the assumptions on v in Theorem 1 by virtue of their structure and gradient-based parameter training. In particular, we show how a binary classification problem based on whether each state x lies within the safe region  Sπcan be formulated to train the parameterized Lyapunov candidate.

3.1 Construction of a Neural Network Lyapunov Function

We take the SOS approach in Sec. 2.2 as a starting point to construct a neural network Lyapunov candidate. The SOS Lyapunov candidate  v(x) = m(x)⊤Qm(x)is a Euclidean inner product on the transformed space  Y :=�φ(x), ∀x ∈ X�with  φ(x) := Q1/2m(x). The ability of the SOS Lyapunov candidate v to certify safe states for  fπdepends on the choice of monomials in m(x). We interpret these choices as engineered features that define the expressiveness of v in delineating the decision boundary between safe and unsafe states. Rather than choose such features manually and parameterize  φ(x)with Q only, we propose the Lyapunov candidate  vθ(x) = φθ(x)⊤φθ(x)to learn the requisite features, where  φθ : Rd → RDis a feed-forward neural network with parameter vector  θ. Feed-forward neural networks are expressive in that they can approximate any continuous function on compact subsets of  Rdwith a finite number of parameters [24, 25]. In Sec. 3.2, we exploit this property together with gradient-based parameter training to closely match the true ROA  Sπwith a level set of the candidate  vθwithout the need to engineer individual features of  φ.

We cannot use an arbitrary feed-forward neural network  φθin our Lyapunov candidate, since the conditions of Theorem 1 must be satisfied. Otherwise, the resulting candidate  vθcannot provide any safety information. In general,  φθis a sequence of function compositions or layers. Each layer has the form  yℓ(x) = ϕℓ(Wℓyℓ−1(x)), where  yℓ(x)is the output of layer  ℓfor state  x ∈ X, ϕℓis a fixed element-wise activation function, and  Wℓyℓ−1(x)is a linear transformation parameterized by  Wℓ ∈ Rdℓ×dℓ−1. To satisfy the assumptions of Theorem 1,  vθmust be Lipschitz continuous on X and positive-definite on some subset of X around  xO = 0. To this end, we restrict  vθto be positive-definite and Lipschitz continuous on X for all values of  θ := {Wℓ}ℓwith a suitable choice of structure for  φθ.

Theorem 2 (Lyapunov neural network): Consider  vθ(x) = φθ(x)⊤φθ(x)as a Lyapunov candidate function, where  φθis a feed-forward neural network. Suppose, for each layer  ℓin  φθ, the activation function  ϕℓand weight matrix  Wℓ ∈ Rdℓ×dℓ−1each have a trivial nullspace. Then  φθhas a trivial nullspace, and  vθis positive-definite with  vθ(0) = 0and  vθ(x) > 0, ∀x ∈ X \ {0}. Furthermore, if  ϕℓis Lipschitz continuous for each layer  ℓ, then  vθis locally Lipschitz continuous.

We provide a formal proof of Theorem 2 in Appendix A and briefly outline it here. As an inner product,  vθ(x) = φθ(x)⊤φθ(x)is already positive-definite for any neural network output  φθ(x), and thus is at least nonnegative for any state  x ∈ X. The step from nonnegativity to positive-definiteness of  vθon X now only depends on how the origin  0 ∈ Xis mapped through  φθ. If  φθmaps  0 ∈ Xuniquely to the zero output  φθ(0) = 0, i.e., if  φθhas a trivial nullspace, then  vθis positive-definite. For this, it is sufficient that each layer of  φθhas a trivial nullspace, i.e., that each layer “passes along”  0 ∈ Xto its zero output  yℓ(0) = 0until the final output  φθ(0) = 0.

In Theorem 2, each layer  ℓhas a trivial nullspace as long as its weight matrix  Wℓand activation function  ϕℓhave trivial nullspaces. Consequently, this requires that  dℓ ≥ dℓ−1for each layer  ℓ, where  dℓis the output dimension of layer  ℓ. That is,  Wℓmust not decrease the dimension of its input. To ensure that  Wℓhas a trivial nullspace, we structure it as

image

where  Gℓ1 ∈ Rqℓ×dℓ−1for some  qℓ ∈ N≥1, Gℓ2 ∈ R(dℓ−dℓ−1)×dℓ−1, Idℓ−1 ∈ Rdℓ−1×dℓ−1is the identity matrix, and  ε ∈ R>0is a constant. The top partition  G⊤ℓ1Gℓ1 + εIdℓ−1is positive-definite for  ε > 0, thus  Wℓalways has full rank and a trivial nullspace. Otherwise,  Wℓwould have a non-empty nullspace of dimension  dℓ−1 − min(dℓ, dℓ−1) = dℓ−1 − dℓ > 0by the rank-nullity theorem. With this choice of structure for  Wℓ, the parameters of the neural network  φθare given by  θ := {Gℓ1, Gℓ2}ℓ. Finally, we choose activation functions that have trivial nullspaces and that are Lipschitz continuous in X, such as  tanh(·)and the leaky ReLU. We can then compute a Lipschitz constant for  φθ[21].

image

Figure 2. Illustration of training the parameterized Lyapunov candidate  vθto expand the safe level set  Vθ(ck)(blue ellipsoid) towards the true largest ROA  Sπ(black). States in the gap  G between Vθ(ck) and Vθ(αck)(orange ellipsoid) are simulated forward to determine regions (green) towards which we can expand the safe level set. This information is used in Algorithm 1 to iteratively adapt safe level sets of  vθto the shape of  Sπ.

3.2 Learning a Safe Set via Classification

Previously, we constructed a neural network Lyapunov candidate  vθin Theorem 2 that satisfies the positive-definiteness and Lipschitz continuity requirements in Theorem 1. As a result, we can always use the one-step decrease condition  ∆vθ(x) := vθ(fπ(x)) − vθ(x) < 0as a provable safety certificate to identify safe level sets that are subsets of the largest safe region  Sπ. Now, we design a training algorithm to adapt the parameters  θsuch that the resulting Lyapunov candidate  vθsatisfies  ∆vθ(x) < 0throughout as large of a decrease region  Dvθ ⊆ Xas possible. This also makes  vθa valid Lyapunov function for the closed-loop dynamics  fπ.

For now, we assume the entire safe region  Sπis known. We want to use a level set  Vθ(c)of  vθto certify the entire set  Sπas safe. According to Theorem 1, this requires the Lyapunov decrease condition  ∆vθ(x) < 0to be satisfied for each state  x ∈ Sπ. We formally state this problem as

image

where  Vol(·)is some measure of set volume. Thus, we want to find the largest level set of  vθthat is contained in the true largest ROA  Sπ; see Fig. 2a. We fix  c = cSwith some  cS ∈ R>0, as it is always possible to rescale  vθby a constant, and focus on optimizing over  θ. We can then interpret (3) as a classification problem. Consider Fig. 1b, where we assign the ground-truth label y = +1 whenever a state x is contained in  Sπ, and  y = −1otherwise. We use  vθtogether with Theorem 1 to classify states by their membership in the level set  V(cS). This is described by the decision rule

image

That is, each state within the level set  V(cS)obtains the label y = +1. However, we must also satisfy the Lyapunov decrease condition imposed by Theorem 1. This can be written as the constraint

image

which means that we can assign the label y = +1 only if the decrease condition is also satisfied. The decision rule (4) together with the constraint (5) ensures that the resulting estimated safe set  V(cS)satisfies all of the conditions in Theorem 1. We want to select the neural network parameters  θso that this rule can perfectly classify  x ∈ Sπas “safe” with  ˆyθ(x) = +1(i.e.,  cS − vθ(x) > 0) or x /∈ Sπas “unsafe” with  ˆyθ(x) = −1(i.e.,  cS − vθ(x) ≤ 0). To this end, the decision boundary vθ(x) = cSmust exactly delineate the boundary of  Sπ. Furthermore, the value of  θmust ensure (5) holds, such that  vθsatisfies the decrease condition of Theorem 1 on  Sπ.

Since we have rewritten the optimization problem in (3) as a classification problem, we can use ideas from the corresponding literature [26]. In particular, we define a loss function  ℓ(y, x; θ)that penalizes misclassification of the true label y at a state x under the decision rule (4) associated with  θ. Many common choices for the loss function are possible; for simplicity, we use the perceptron loss, which penalizes misclassifications more when they occur far from the decision boundary.

image

We choose not to use the “maximum margin” objective of the hinge loss, since it may be unsuitable for us to accurately delineate  Sπ, where states can lie arbitrarily close to the decision boundary in the continuous state space X. Since we use the level set  Vθ(cS)in our classification setting, this corresponds to  ℓ(y, x; θ) = max�0, −y ·�cS − vθ(x)��. Here,  cS − vθ(x)is the signed distance from the decision boundary  vθ(x) = cS, which separates the safe set  Sπfrom the rest of the state space  X \ Sπ. This classifier loss has a magnitude of��cS − vθ(x)��in the case of a misclassification, and zero otherwise. This ensures that decisions far from the decision boundary, such as those near the origin, are considered more important than the more difficult decisions close to the boundary.

Ideally, we would like to minimize this loss throughout the state space with  min�X l(y, x; θ) dxsubject to the constraint (5). Since this problem is intractable, we use gradient-based optimization together with mini-batches instead, as is typically done in machine learning. To this end, we sample states  Xb = {xi}ifrom the state space X at random and assign the ground-truth labels  {yi}ito them. Based on this finite set, the optimization objective can be written as

image

where the batch  Xbis re-sampled after every gradient step. We can apply a Lagrangian relaxation

image

in order to make the problem tractable. Here,  λ ∈ R>0is a Lagrangian multiplier and the term λ((y + 1)/2) max�0, ∆vθ(x)�is the Lyapunov decrease loss, which penalizes violations of (5). The decrease condition  ∆vθ(x) < 0only needs to be enforced within the safe region  Sπ, so we do not want to incur a loss if it is violated at a state where  y = −1. Thus, we use the multiplier (y +1)/2 to map  {+1, −1}to {1, 0}, such that the Lyapunov decrease loss is zeroed-out if  y = −1.

However, there are two issues when this formulation is compared to the exact problem in (3). Firstly, the objective (7) only penalizes violations of the decrease condition  ∆vθ(x) < 0, rather than constraining  θto enforce it. Thus, while  ∆vθ(x) < 0is always a provable safety certificate, we must verify that it holds over some level set whenever we update  θ. Secondly, ground-truth labels of  Sπare not known in practice. To address these issues, we can use any method to check Lyapunov safety certificates over continuous state spaces to certify a level set  Vθ(c)as safe, and then use  Vθ(c)to estimate labels y from  Sπ. For this work, we check the tightened certificate  ∆vθ(x) < −L∆vθτon a discretization of X, as described in Sec. 2.1. This method exposes the Lipschitz constant  L∆vθof  ∆vθ, which can conveniently be used for regularization in practice [21]. Possible alternatives to this safety verification method include the use of an adaptive discretization for better scaling to higher-dimensional state spaces [11], and formal verification methods for neural networks [27, 28].

Since such an estimate of  Sπis limited by the largest safe level set of  vθ, we propose Algorithm 1 to iteratively “grow” an estimate of  Sπ. We initialize  vθ, then use it to identify the largest safe level set  Vθ(c0)by verifying the condition  ∆vθ(x) < 0. At first, we use  Vθ(c0)to estimate  Sπ. At iteration  k ∈ N≥0, we consider the safe level set  Vθ(ck)and the expanded level set  Vθ(αck)for some  α ∈ R>1; see Fig. 2b. Then, states in the “gap”  G := Vθ(αck)\Vθ(ck)are forward-simulated

image

Figure 3. Results for training the neural network (NN) Lyapunov candidate  vθfor an inverted pendulum. In Fig. 3a, system trajectories (black) converge to the origin only within the largest safe region  Sπ(green). The NN candidate (orange) characterizes  Sπwith a level set better than both the LQR (blue ellipsoid) and SOS (yellow) candidates, as it adapts to the shape of  Sπ. In Fig. 3b, the safe level  ck of vθconverges non-monotonically towards the fixed boundary  cS = 1, and the safe level set  Vθ(ck)grows to cover most of  Sπ. However, as discussed at the end of Sec. 3, convergence of  Vθ(ck) to Sπis not guaranteed in general by Algorithm 1.

with the dynamics  fπfor  T ∈ N≥1time steps. States that fall in  Vθ(ck)before or after forward-simulation form a new estimate of  Sπ, since trajectories become “trapped” in  Vθ(ck)and converge to the origin. We use this estimate of  Sπto identify labels y for classification, then apply SGD with the objective (7) to update  θand encourage  Vθ(ck)to grow. Finally, we certify the new largest safe level set  Vθ(ck+1). These steps are repeated until a choice of stopping criterion is satisfied.

In general, Algorithm 1 does not guarantee convergence of the safe level set  Vθ(ck)to  Sπ, nor that  Vθ(ck)monotonically grows in volume. Furthermore, it is not guaranteed that the iterated safe level  ck ∈ R>0approaches the safe level  cSthat is prescribed to delineate  Sπ. This is typical of gradient-based parameter training, since the parameters  θcan become “stuck” in local optima. However, since the Lyapunov candidate  vθis guaranteed to satisfy the positive-definiteness and Lipschitz continuity conditions of Theorem 1 by its construction in Theorem 2,  ∆vθ(x) < 0is always a provable safety certificate for identifying safe level sets. Thus, we can always use  vθto identify at least a subset of  Sπ, without ever identifying unsafe states as safe.

In the previous section, we developed Algorithm 1 to train the parameters  θof a neural network Lyapunov candidate  vθconstructed according to Theorem 2. This construction ensures the positive-definiteness and Lipschitz continuity assumptions on  vθin Theorem 1 are satisfied. Algorithm 1 encourages  vθto satisfy the decrease condition and match the true largest ROA  Sπfor the closed-loop dynamics  fπwith a level set  Vθ(cS). In this section, we present details for the implementation of Algorithm 1 to learn the largest safe region of a simulated inverted pendulum system, and experimental results in a comparison to other methods of computing Lyapunov functions.

Inverted Pendulum Benchmark The inverted pendulum is governed by the differential equation mℓ2¨θ = mgℓ sin θ − β ˙θ + uwith state  x := (θ, ˙θ), where  θis the angle from the upright equilibrium  xO = 0, uis the input torque, m is the pendulum mass, g is the gravitational acceleration, ℓis the pole length, and  βis the friction coefficient. We discretize the dynamics with a time step of  ∆t = 0.01 sand enforce a saturation constraint  u ∈ [−¯u, ¯u], such that the pendulum falls over past a certain angle and cannot recover. For a linear policy  u = π(x) = Kx, this yields the safe region  Sπin Fig. 3 around the upright equilibrium for the closed-loop dynamics  fπ. In particular, we fix K to the linear quadratic regulator (LQR) solution for the discretized, linearized, unconstrained form of the dynamics [29]. Outside of  Sπ, the pendulum falls down without the ability to recover and the system trajectories diverge away from  xO = 0.

Practical Considerations To train the parameters of the Lyapunov candidate  vθto adapt to the shape of  Sπ, we use Algorithm 1 with SGD. To certify the safety of continuous level sets of  vθwhenever  θis updated, we check the stricter decrease condition  ∆vθ(x) < −L∆vθτat a discrete set of points that cover X in increasing order of the value of  vθ(x), as in [17]. Algorithm 1 does not guarantee that the safe level set estimate  Vθ(ck)grows monotonically in volume towards  Sπwith each iteration k. In fact, the estimate  Vθ(ck)may shrink if  vθinitially succeeds and then fails to satisfy the decrease condition  ∆vθ(x) < 0in some regions of the state space. This tends to occur near the origin, where  vθ(0) = ∆vθ(0) = 0and the “basin of attraction” characterized by  vθ“flattens”. To alleviate this, we use a large Lagrange multiplier  λ = 1000in the SGD objective (7) to strongly “push”  θtowards values that ensure  vθcontinues to satisfy the decrease condition. In addition, we normalize the Lyapunov decrease loss  λ((y + 1)/2) max�0, ∆vθ(x)�in (7) by  vθ(x). This more heavily weighs sampled states near the origin, i.e., where  vθ(x)is small.

Results We implement Algorithm 1 on the inverted pendulum benchmark with the Python code available at https://github.com/befelix/safe_learning, which is based on TensorFlow [30]. For the neural network Lyapunov candidate  vθ, we use three layers of 64  tanh(·)activation units each. We prescribe  Vθ(cS)with  cS = 1as the level set that delineates the safe region  Sπ. Fig. 3 shows the results of training  vθwith Algorithm 1, and the largest safe level set  Vθ(c18)with 10 SGD iterations per update. Fig. 3a visualizes how this level set has “moulded” to the shape of  Sπ. Fig. 3b shows how the safe level  ckconverges towards the prescribed level  cS = 1that delineates  Sπ, and how the fraction of  Sπcovered by  Vθ(ck)approaches 1. The true largest ROA  Sπis estimated by forward-simulating all of the states in a state space discretization, and set volume is estimated by counting discrete states. Fig. 3a also shows the largest safe sets for a LQR Lyapunov candidate and a SOS Lyapunov candidate. The LQR candidate  vLQR(x) = x⊤Pxis computed in closed-form for the same discretized, linearized, unconstrained form of the dynamics used to determine the LQR policy  π(x) = Kx[29]. The SOS Lyapunov candidate  vSOS(x) = m(x)⊤Qm(x)uses up to third-order monomials in x, thus it is a sixth-order polynomial. It is computed with the toolbox SOSTOOLS [31] and the SDP solver SeDuMi [32] in MATLAB for the unconstrained nonlinear dynamics with a Taylor polynomial expansion of  sin θ. While the SOS approach is a powerful specialized method for polynomial dynamical systems, it cannot account for the non-differentiable nonlinearity introduced by the input saturation, which drastically alters the closed-loop dynamics. As a result, while  vSOSis optimized for the system without saturation, it is ill-suited to the true closed-loop dynamics and yields a small safe level set. Overall, our neural network Lyapunov candidate  vθperforms the best at certification of as much of  Sπas possible, since it only relies on inputs and outputs of  fπ, and adapts to the shape of  Sπ.

Comments on Safe Learning Fig. 3a demonstrates that a neural network Lyapunov candidate  vθcan certify more of the true largest safe region  Sπthan other common Lyapunov candidates. This has important implications for safe exploration during learning for dynamical systems; with more safe states available to visit, an agent can better learn about itself and its environment under a wider range of operating conditions. For example, our method is applicable in the safe reinforcement learning framework of [17]. This past work provides safe exploration guarantees for a GP model of the dynamics  fπwith confidence bounds on the Lyapunov stability certificate, but these guarantees are limited by the choice of Lyapunov function. As our results have shown, certain Lyapunov candidates may poorly characterize the shape of the true largest safe region  Sπ. Since our neural network Lyapunov candidate can adapt to the shape of  Sπduring learning by using, for example, the mean estimate of  fπfrom the GP model, we could enlarge the estimated safe region more quickly as data is collected. Our method is also applicable to exploration algorithms within safe motion planning that depends on knowledge of a safe region, such as in [33]. Overall, our method strongly warrants consideration for use in safe learning methods that leverage statistical models of dynamical systems.

We have demonstrated a novel method for learning safety certificates for general nonlinear dynamical systems. Specifically, we developed a flexible class of parameterized Lyapunov candidate functions and a training algorithm to adapt them to the shape of the largest safe region for a closed-loop dynamical system. We believe that our method is appealing due to its applicability to a wide range of dynamical systems in theory and practice. Furthermore, it can play an important role in improving safe exploration during learning for real autonomous systems in uncertain environments.

This research was supported in part by SNSF grant 200020 159557, the Vector Institute, and a fellowship by the Open Philanthropy Project.

[1] D. Amodei, C. Olah, J. Steinhardt, P. Christiano, J. Schulman, and D. Man´e. Concrete prob- lems in AI safety. Technical report, 2016. arXiv:1606.06565v2 [cs.AI].

[2] H. K. Khalil. Nonlinear Systems. Prentice Hall, Upper Saddle River, NJ, 3 edition, 2002.

[3] A. Vannelli and M. Vidyasagar. Maximal Lyapunov functions and domains of attraction for autonomous nonlinear systems. Automatica, 21(1):69–80, 1985.

[4] D. J. Hill and I. M. Y. Mareels. Stability theory for differential/algebraic systems with ap- plication to power systems. IEEE Transactions on Circuits and Systems, 37(11):1416–1423, 1990.

[5] J. M. G. da Silva Jr. and S. Tarbouriech. Antiwindup design with guaranteed regions of sta- bility: An LMI-based approach. IEEE Transactions on Automatic Control, 50(1):106–111, 2005.

[6] R. Kalman and J. Bertram. Control system analysis and design via the “second method” of Lyapunov II: Discrete-time systems. Transactions of the American Society of Mechanical Engineers (ASME): Journal of Basic Engineering, 82(2):394–400, 1960.

[7] P. Giesl and S. Hafstein. Review on computational methods for Lyapunov functions. Discrete and Continuous Dynamical Systems, Series B, 20(8):2291–2331, 2016.

[8] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, UK, 2009.

[9] P. A. Parrilo. Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization. PhD thesis, California Institute of Technology, 2000.

[10] D. Henrion and M. Korda. Convex computation of the region of attraction of polynomial control systems. IEEE Transactions on Automatic Control, 59(2):297–312, 2014.

[11] R. Bobiti and M. Lazar. A sampling approach to finding Lyapunov functions for nonlinear discrete-time systems. In Proc. of the European Control Conference (ECC), pages 561–566, 2016.

[12] K. Zhou and J. C. Doyle. Essentials of Robust Control. Prentice Hall, Upper Saddle River, NJ, 1998.

[13] A. Trofino. Robust stability and domain of attraction of uncertain nonlinear systems. In Proc. of the American Control Conference (ACC), pages 3707–3711, 2000.

[14] U. Topcu, A. K. Packard, P. Seiler, and G. J. Balas. Robust region-of-attraction estimation. IEEE Transactions on Automatic Control, 55(1):137–142, 2010.

[15] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006.

[16] F. Berkenkamp, R. Moriconi, A. P. Schoellig, and A. Krause. Safe learning of regions of attraction for uncertain, nonlinear systems with Gaussian processes. In Proc. of the IEEE Conference on Decision and Control (CDC), pages 4661–4666, 2016.

[17] F. Berkenkamp, M. Turchetta, A. P. Schoellig, and A. Krause. Safe model-based reinforce- ment learning with stability guarantees. In Proc. of the Conference on Neural Information Processing Systems (NIPS), pages 908–918, 2017.

[18] R. S. Sutton and A. G. Barto. Reinforcement Learning. MIT Press, Cambridge, MA, 2 edition, 2018. (draft).

[19] V. Petridis and S. Petridis. Construction of neural network based Lyapunov functions. In Proc. of the IEEE International Joint Conference on Neural Network Proceedings, pages 5059–5065, 2006.

[20] N. Noroozi, P. Karimaghaee, F. Safaei, and H. Javadi. Generation of Lyapunov functions by neural networks. In Proc. of the World Congress on Engineering (WCE), volume 1, pages 61–65, 2008.

[21] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In Proc. of the International Conference on Learning Representations (ICLR), 2014.

[22] A. Papachristodoulou and S. Prajna. On the construction of Lyapunov functions using the sum of squares decomposition. In Proc. of the IEEE Conference on Decision and Control (CDC), pages 3482–3487, 2002.

[23] A. Papachristodoulou. Scalable analysis of nonlinear systems using convex optimization. PhD thesis, California Institute of Technology, 2005.

[24] G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2(4):303–314, 1989.

[25] K. Hornik. Some new results on neural network approximation. Neural Networks, 6(8):1069– 1072, 2001.

[26] C. M. Bishop. Pattern Recognition and Machine Learning. Springer-Verlag, New York, NY, 2006.

[27] X. Huang, M. Kwiatkowska, S. Wang, and M. Wu. Safety verification of deep neural networks. Technical report, 2017. arXiv:1610.06940v3 [cs.AI].

[28] G. Katz, C. Barrett, D. Dill, K. Julian, and M. Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. In Proc. of the International Conference on Computer Aided Verification (CAV), 2017.

[29] F. L. Lewis, D. L. Vrabie, and V. L. Syrmos. Optimal Control. John Wiley & Sons, Inc., Hoboken, NJ, 3 edition, 2012.

[30] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: A system for largescale machine learning. In Proc. of the USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 265–283, 2016.

[31] S. Prajna, A. Papachristodoulou, and P. A. Parrilo. Introducing SOSTOOLS: A general purpose sum of squares programming solver. In Proc. of the IEEE Conference on Decision and Control (CDC), pages 741–746, 2002.

[32] J. F. Sturm. Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optimization Methods and Software, 11(1–4):625–653, 1999.

[33] T. Koller, F. Berkenkamp, M. Turchetta, and A. Krause. Learning-based model predictive control for safe exploration. In Proc. of the IEEE Conference on Decision and Control (CDC), 2018. (to appear).

Theorem 2 (Lyapunov neural network): Consider  vθ(x) = φθ(x)⊤φθ(x)as a Lyapunov candidate function, where  φθis a feed-forward neural network. Suppose, for each layer  ℓin  φθ, the activation function  ϕℓand weight matrix  Wℓ ∈ Rdℓ×dℓ−1each have a trivial nullspace. Then  φθhas a trivial nullspace, and  vθis positive-definite with  vθ(0) = 0and  vθ(x) > 0, ∀x ∈ X \ {0}. Furthermore, if  ϕℓis Lipschitz continuous for each layer  ℓ, then  vθis locally Lipschitz continuous.

Proof. We begin by showing that  φθhas a trivial nullspace in X by induction, and then use this to prove that  vθis positive-definite on X. Recall that a feed-forward neural network is a successive composition of its layer transformations, such that the output  yℓ(x)of layer  ℓfor the state  x ∈ Xis the input to layer  ℓ + 1. Consider  ℓ = 0with the input  y0(x) := x, and the first layer output y1(x) = ϕ1(W1y0(x)). Clearly  y0has a trivial nullspace in X, since it is just the identity function. Since  W1, ϕ1, and  y0each have a trivial nullspace in their respective input spaces, the sequence of logical statements

image

holds. Thus,  x = 0 ⇐⇒ ϕ1(W1y0(x)) = 0holds, and  y1has a trivial nullspace in X. If we now assume  yℓhas a trivial nullspace in X, it is clear that  yℓ+1has a trivial nullspace in X, since

image

holds in a similar fashion. As a result,  yℓhas a trivial nullspace for each layer  ℓby induction. Since φθis a composition of a finite number of layers,  φθ = yLfor some  L ∈ N≥0, thus  φθhas a trivial nullspace in X.

We now use this property of  φθto prove that the Lyapunov candidate  vθ(x) = φθ(x)⊤φθ(x)is positive-definite on X. As an inner product,  φθ(x)⊤φθ(x)is positive-definite on the transformed space  Y :=�φθ(x), ∀x ∈ X�. Thus,  vθ(x) = 0 ⇐⇒ φθ(x) = 0and  vθ(x) > 0otherwise. Since we have already proven  φθ(x) = 0 ⇐⇒ x = 0, combining these statements shows that vθ(x) = 0 ⇐⇒ x = 0and  vθ(x) > 0otherwise. As a result,  vθ(x)is positive-definite on X.

Finally, we need to show that if every activation function  ϕℓis Lipschitz continuous, then  vθis locally Lipschitz continuous. If the neural network  φθis Lipschitz continuous, then clearly  vθis locally Lipschitz continuous, since it is quadratic and thus differentiable with respect to  φθ. To show that  φθis Lipschitz continuous, it is sufficient to show that each layer is Lipschitz continuous. This is due to the fact that any function composition f(g(x)) is Lipschitz continuous with Lipschitz constant LfLgif f has Lipschitz constant  Lfand g has Lipschitz constant  Lg. This fact can be seen from ��f(g(x)) − f(g(x′))�� ≤ Lf��g(x) − g(x′)�� ≤ LfLg��x − x′��, for each pair  x, x′ ∈ X. By the Lipschitz continuity of function composition and the linearity of  Wℓyℓ−1, each layer transformation yℓ = ϕℓ(Wℓyℓ−1)is Lipschitz continuous if  ϕℓis Lipschitz continuous. As a result, the neural network  φθis Lipschitz continuous, and the Lyapunov candidate  vθis locally Lipschitz continuous.

image

Remark 1: In (2), we ensured each weight matrix  Wℓhas a trivial nullspace with the structure

image

where  Gℓ1 ∈ Rqℓ×dℓ−1for some  qℓ ∈ N≥1, Gℓ2 ∈ R(dℓ−dℓ−1)×dℓ−1, Idℓ−1 ∈ Rdℓ−1×dℓ−1is the identity matrix, and  ε ∈ R>0is a constant. To minimize the number of free parameters required by our neural network Lyapunov candidate, we choose  qℓto be the minimum integer such that each entry in  G⊤ℓ1Gℓ1 ∈ Rdℓ−1×dℓ−1is independent from the others. Since  G⊤ℓ1Gℓ1is symmetric, it has �dℓ−1j=1 j = dℓ−1(dℓ−1 + 1)/2free parameters, thereby requiring  qℓdℓ−1 ≥ dℓ−1(dℓ−1 + 1)/2or qℓ ≥ (dℓ−1 + 1)/2. For this, we choose  qℓ =�(dℓ−1 + 1)/2�.


Designed for Accessibility and to further Open Science