The Lyapunov Neural Network: Adaptive Stability Certification for Safe Learning of Dynamical Systems

2018·arXiv

Abstract

1 Introduction

Safety is among the foremost open problems in robotics and artificial intelligence [1]. Many autonomous systems, such as self-driving cars and robots for palliative care, are safety-critical due to their interaction with human life. At the same time, learning is necessary for these systems to perform well in a priori unknown environments. During learning, they must safely explore their environment by avoiding dangerous states from which they cannot recover. For example, consider an autonomous robot in an outdoor environment affected by rough terrain and adverse weather conditions. These factors introduce uncertainty about the relationship between the robot’s speed and maneuverability. While the robot should learn about its capabilities in such conditions, it must not perform a maneuver at a high speed that would cause it to crash. Conversely, traveling at only slow speeds to avoid accidents is not conducive to learning about the extent of the robot’s capabilities.

To ensure safe learning, we must verify a safety certificate for a state before it is explored. In control theory, a set of states is safe if system trajectories are bounded within it and asymptotically converge to a fixed point under a fixed control policy. Within such a region of attraction (ROA) [2], the system can collect data during learning and can always recover to a known safe point. In this paper, we leverage Lyapunov stability theory to construct provable, neural network-based safety certificates, and adapt them to the size and shape of the largest ROA of a general nonlinear dynamical system.

Related work Lyapunov functions are convenient tools for stability (i.e., safety) certification of dynamical systems [2] and for ROA estimation [3, 4, 5]. These functions encode long-term behaviour of state trajectories in a scalar value [6], such that a ROA can be encoded as a level set of the Lyapunov function. However, Lyapunov functions for general dynamical systems are difficult to find; computational approaches are surveyed in [7]. A Lyapunov function can be identified efficiently via a semi-definite program (SDP, [8]) when the dynamics are polynomial and the Lyapunov function is restricted to be a sum-of-squares (SOS) polynomial [9]. Other methods to compute ROAs include maximization of a measure of ROA volume over system trajectories [10], and sampling-based approaches that generalize information about stability at discrete points to a continuous region [11].

This paper is particularly concerned with safety certificates for dynamical systems with uncertainties in the form of model errors. In robust control [12], the formulation of SDPs with SOS Lyapunov functions is used to compute ROA estimates for uncertain linear dynamical systems with the assumption of a worst-case linear perturbation from a known bounded set [13, 14]. Learning-based control methods with a Gaussian process (GP, [15]) model of the system instead consider uncertainty in a Bayesian manner, where model errors are reduced in regions where data has been collected. The methods in [16, 17] estimate a ROA with Lyapunov stability certificates computed on a discretization of the state space, which is used for safe reinforcement learning (RL, [18]). The Lyapunov function is assumed to be given in [16], while [17] uses the negative value (i.e., cost) function from RL with a quadratic reward. Ultimately, this approach is limited by a shape mismatch between level sets of the Lyapunov function and the true largest ROA. For example, a quadratic Lyapunov function has ellipsoidal level sets, which cannot characterize a non-ellipsoidal ROA, while the SOS approach is restricted to fixed monomial features. To improve safe exploration for general nonlinear dynamics, we want to learn these features to determine a Lyapunov function with suitably shaped level sets.

Contributions In this paper, we present a novel method for learning accurate safety certificates for general nonlinear dynamical systems. We construct a neural network Lyapunov candidate and, unlike past work in [19, 20], we structure our candidate such that it always inherently yields a provable safety certificate. Then, we specify a training algorithm that adapts the candidate to the shape of the dynamical system’s trajectories via classification of states as safe or unsafe. We do not depend on any specific structure of the dynamics for this. We show how our construction relates to SOS Lyapunov functions, and compare our approach to others on a simulated inverted pendulum benchmark. We also discuss how our method can be used to make safe learning more effective.

2 Problem Statement and Background

We consider a discrete-time, time-invariant, deterministic dynamical system of the form

where is the time step index, and and are the state and control inputs respectively at time step t. The system is controlled by a feedback policy and the resulting closed-loop dynamical system is given by with . We assume this policy is given, but it can, for example, be computed online with RL or optimal control. This policy is safe to use within a subset of the state space X. The set is a ROA for , i.e., every system trajectory of that begins at some also remains in and asymptotically approaches an equilibrium point where [2]. We assume without loss of generality. Hereafter, we use to denote the true largest ROA in X under the policy .

A reliable estimate of is critical to online learning systems, since we need to ensure that a policy is safe to use on the real system before it can be deployed. The goal of this paper is to estimate the largest safe set . We must also ensure safety by never overestimating , i.e., we must not identify unsafe states as safe. For this to be feasible, we make a regularity assumption about the closed-loop dynamics; we assume is Lipschitz continuous on X with Lipschitz constant . This is a weak assumption and is even satisfied when a neural network policy is used [21].

2.1 Safety Certification with Lyapunov Functions

One way to estimate the safe region is by using a Lyapunov function. Given a suitable Lyapunov function v, a safe region for the closed-loop dynamical system can be determined.

Theorem 1 (Lyapunov’s stability theorem [6]): Suppose is locally Lipschitz continuous and has an equilibrium point at . Let be locally Lipschitz continuous on X. If there exists a set containing 0 on which v is positive-definite and , , then is an asymptotically stable equilibrium. In this case, v is known as a Lyapunov function for the closed-loop dynamics , and is the Lyapunov decrease region for v.

Figure 1. Fig. 1a illustrates a shape mismatch between the largest level set V(c) (blue ellipsoid) of a quadratic Lyapunov function v contained within the decrease region (green dashes), and the safe region We cannot certify all of , which limits exploration in safe learning. Instead, we train a Lyapunov candidate with parameters with a level set , via classification of sampled states as “safe” with ground-truth label ) or “unsafe” with

Theorem 1 states that a Lyapunov function v characterizes a “basin” of safe states where trajectories of “fall” towards the origin . If we can find a positive-definite v such that the dynamics always map downwards in the value of v(x), then trajectories eventually reach v(x) = 0, thus x = 0. To find a ROA, rather than checking if v decreases along entire trajectories, it is sufficient to verify the one-step decrease condition for every state x in a level set of v.

Corollary 1 (Safe level sets [6]): Every level set contained within the decrease region is invariant under . That is, . Furthermore, for every in these level sets, so each one is a ROA for and .

Intuitively, if v(x) decreases everywhere in the level set , except at where it is zero, then is invariant, since the image of under is the smaller level set with . If v is also positive-definite, then this ensures trajectories that start in a level set V(c) contained in the decrease region remain in V(c) and converge to . To identify safe level sets, we must check if a given Lyapunov candidate v satisfies the conditions of Theorem 1. However, the decrease condition is difficult to verify throughout a continuous subset . It is sufficient to verify the tightened safety certificate at a finite set of points that cover , where is the Lipschitz constant of and is a measure of how densely the points cover [17]. We can even couple this with bounds on from a statistical model to certify high-probability safe sets with the certificate , where is an upper confidence bound on . A GP model of is used for this purpose in [17].

2.2 Computing SOS Lyapunov Functions

In general, a suitable Lyapunov candidate v is difficult to find. Computational methods often restrict v to a particular function class for tractability. The SOS approach restricts v(x) to be polynomial, but is limited to polynomial dynamical systems, i.e., when is a vector of polynomials in the elements of x [9, 22, 23]. In particular, the SOS approach enforces , where m(x) is a vector of a priori fixed monomial features in the elements of x, and Q is an unknown positive-semidefinite matrix. This makes v(x) a quadratic function on a monomial feature space. A SDP can be efficiently solved to yield a Q that simultaneously guarantees that v satisfies the assumptions of Theorem 1 and has the largest possible level set in its decrease region . That is, the positive-definiteness of v and the negative-definiteness of in are enforced as constraints in the SDP. This contrasts the more general approach described in Sec. 2.1, where a Lyapunov candidate v is given and then the assumptions of Theorem 1 are verified by checking discrete points.

With the SOS approach and a suitable choice of can be estimated well with a level set V(c) of v, since the monomial features allow Lyapunov functions with shapes beyond simple ellipsoids to be found. However, the SOS approach requires polynomial dynamics, and the best choice of m(x) can be difficult to determine. Without a suitable Lyapunov function, we face the problem of a shape mismatch between V(c) and . This is exemplified in Fig. 1a, where level sets of quadratic v are ellipsoidal while is not, which limits the region of the state space that is certifiable as safe by v.

3 Learning Lyapunov Candidates

In this section, we establish a more flexible class of parameterized Lyapunov candidates that can satisfy the assumptions on v in Theorem 1 by virtue of their structure and gradient-based parameter training. In particular, we show how a binary classification problem based on whether each state x lies within the safe region can be formulated to train the parameterized Lyapunov candidate.

3.1 Construction of a Neural Network Lyapunov Function

We take the SOS approach in Sec. 2.2 as a starting point to construct a neural network Lyapunov candidate. The SOS Lyapunov candidate is a Euclidean inner product on the transformed space with . The ability of the SOS Lyapunov candidate v to certify safe states for depends on the choice of monomials in m(x). We interpret these choices as engineered features that define the expressiveness of v in delineating the decision boundary between safe and unsafe states. Rather than choose such features manually and parameterize with Q only, we propose the Lyapunov candidate to learn the requisite features, where is a feed-forward neural network with parameter vector . Feed-forward neural networks are expressive in that they can approximate any continuous function on compact subsets of with a finite number of parameters [24, 25]. In Sec. 3.2, we exploit this property together with gradient-based parameter training to closely match the true ROA with a level set of the candidate without the need to engineer individual features of .

We cannot use an arbitrary feed-forward neural network in our Lyapunov candidate, since the conditions of Theorem 1 must be satisfied. Otherwise, the resulting candidate cannot provide any safety information. In general, is a sequence of function compositions or layers. Each layer has the form , where is the output of layer for state is a fixed element-wise activation function, and is a linear transformation parameterized by . To satisfy the assumptions of Theorem 1, must be Lipschitz continuous on X and positive-definite on some subset of X around . To this end, we restrict to be positive-definite and Lipschitz continuous on X for all values of with a suitable choice of structure for .

Theorem 2 (Lyapunov neural network): Consider as a Lyapunov candidate function, where is a feed-forward neural network. Suppose, for each layer in , the activation function and weight matrix each have a trivial nullspace. Then has a trivial nullspace, and is positive-definite with and . Furthermore, if is Lipschitz continuous for each layer , then is locally Lipschitz continuous.

We provide a formal proof of Theorem 2 in Appendix A and briefly outline it here. As an inner product, is already positive-definite for any neural network output , and thus is at least nonnegative for any state . The step from nonnegativity to positive-definiteness of on X now only depends on how the origin is mapped through . If maps uniquely to the zero output , i.e., if has a trivial nullspace, then is positive-definite. For this, it is sufficient that each layer of has a trivial nullspace, i.e., that each layer “passes along” to its zero output until the final output .

In Theorem 2, each layer has a trivial nullspace as long as its weight matrix and activation function have trivial nullspaces. Consequently, this requires that for each layer , where is the output dimension of layer . That is, must not decrease the dimension of its input. To ensure that has a trivial nullspace, we structure it as

where for some is the identity matrix, and is a constant. The top partition is positive-definite for , thus always has full rank and a trivial nullspace. Otherwise, would have a non-empty nullspace of dimension by the rank-nullity theorem. With this choice of structure for , the parameters of the neural network are given by . Finally, we choose activation functions that have trivial nullspaces and that are Lipschitz continuous in X, such as and the leaky ReLU. We can then compute a Lipschitz constant for [21].

Figure 2. Illustration of training the parameterized Lyapunov candidate to expand the safe level set (blue ellipsoid) towards the true largest ROA (black). States in the gap (orange ellipsoid) are simulated forward to determine regions (green) towards which we can expand the safe level set. This information is used in Algorithm 1 to iteratively adapt safe level sets of to the shape of

3.2 Learning a Safe Set via Classification

Previously, we constructed a neural network Lyapunov candidate in Theorem 2 that satisfies the positive-definiteness and Lipschitz continuity requirements in Theorem 1. As a result, we can always use the one-step decrease condition as a provable safety certificate to identify safe level sets that are subsets of the largest safe region . Now, we design a training algorithm to adapt the parameters such that the resulting Lyapunov candidate satisfies throughout as large of a decrease region as possible. This also makes a valid Lyapunov function for the closed-loop dynamics .

For now, we assume the entire safe region is known. We want to use a level set of to certify the entire set as safe. According to Theorem 1, this requires the Lyapunov decrease condition to be satisfied for each state . We formally state this problem as

where is some measure of set volume. Thus, we want to find the largest level set of that is contained in the true largest ROA ; see Fig. 2a. We fix with some , as it is always possible to rescale by a constant, and focus on optimizing over . We can then interpret (3) as a classification problem. Consider Fig. 1b, where we assign the ground-truth label y = +1 whenever a state x is contained in , and otherwise. We use together with Theorem 1 to classify states by their membership in the level set . This is described by the decision rule

That is, each state within the level set obtains the label y = +1. However, we must also satisfy the Lyapunov decrease condition imposed by Theorem 1. This can be written as the constraint

which means that we can assign the label y = +1 only if the decrease condition is also satisfied. The decision rule (4) together with the constraint (5) ensures that the resulting estimated safe set satisfies all of the conditions in Theorem 1. We want to select the neural network parameters so that this rule can perfectly classify as “safe” with (i.e., ) or as “unsafe” with (i.e., ). To this end, the decision boundary must exactly delineate the boundary of . Furthermore, the value of must ensure (5) holds, such that satisfies the decrease condition of Theorem 1 on .

Since we have rewritten the optimization problem in (3) as a classification problem, we can use ideas from the corresponding literature [26]. In particular, we define a loss function that penalizes misclassification of the true label y at a state x under the decision rule (4) associated with . Many common choices for the loss function are possible; for simplicity, we use the perceptron loss, which penalizes misclassifications more when they occur far from the decision boundary.

We choose not to use the “maximum margin” objective of the hinge loss, since it may be unsuitable for us to accurately delineate , where states can lie arbitrarily close to the decision boundary in the continuous state space X. Since we use the level set in our classification setting, this corresponds to . Here, is the signed distance from the decision boundary , which separates the safe set from the rest of the state space . This classifier loss has a magnitude ofin the case of a misclassification, and zero otherwise. This ensures that decisions far from the decision boundary, such as those near the origin, are considered more important than the more difficult decisions close to the boundary.

Ideally, we would like to minimize this loss throughout the state space with subject to the constraint (5). Since this problem is intractable, we use gradient-based optimization together with mini-batches instead, as is typically done in machine learning. To this end, we sample states from the state space X at random and assign the ground-truth labels to them. Based on this finite set, the optimization objective can be written as

where the batch is re-sampled after every gradient step. We can apply a Lagrangian relaxation

in order to make the problem tractable. Here, is a Lagrangian multiplier and the term is the Lyapunov decrease loss, which penalizes violations of (5). The decrease condition only needs to be enforced within the safe region , so we do not want to incur a loss if it is violated at a state where . Thus, we use the multiplier (y +1)/2 to map to {1, 0}, such that the Lyapunov decrease loss is zeroed-out if .

However, there are two issues when this formulation is compared to the exact problem in (3). Firstly, the objective (7) only penalizes violations of the decrease condition , rather than constraining to enforce it. Thus, while is always a provable safety certificate, we must verify that it holds over some level set whenever we update . Secondly, ground-truth labels of are not known in practice. To address these issues, we can use any method to check Lyapunov safety certificates over continuous state spaces to certify a level set as safe, and then use to estimate labels y from . For this work, we check the tightened certificate on a discretization of X, as described in Sec. 2.1. This method exposes the Lipschitz constant of , which can conveniently be used for regularization in practice [21]. Possible alternatives to this safety verification method include the use of an adaptive discretization for better scaling to higher-dimensional state spaces [11], and formal verification methods for neural networks [27, 28].

Since such an estimate of is limited by the largest safe level set of , we propose Algorithm 1 to iteratively “grow” an estimate of . We initialize , then use it to identify the largest safe level set by verifying the condition . At first, we use to estimate . At iteration , we consider the safe level set and the expanded level set for some ; see Fig. 2b. Then, states in the “gap” are forward-simulated

Figure 3. Results for training the neural network (NN) Lyapunov candidate for an inverted pendulum. In Fig. 3a, system trajectories (black) converge to the origin only within the largest safe region (green). The NN candidate (orange) characterizes with a level set better than both the LQR (blue ellipsoid) and SOS (yellow) candidates, as it adapts to the shape of , the safe level converges non-monotonically towards the fixed boundary , and the safe level set grows to cover most of . However, as discussed at the end of Sec. 3, convergence of is not guaranteed in general by Algorithm 1.

with the dynamics for time steps. States that fall in before or after forward-simulation form a new estimate of , since trajectories become “trapped” in and converge to the origin. We use this estimate of to identify labels y for classification, then apply SGD with the objective (7) to update and encourage to grow. Finally, we certify the new largest safe level set . These steps are repeated until a choice of stopping criterion is satisfied.

In general, Algorithm 1 does not guarantee convergence of the safe level set to , nor that monotonically grows in volume. Furthermore, it is not guaranteed that the iterated safe level approaches the safe level that is prescribed to delineate . This is typical of gradient-based parameter training, since the parameters can become “stuck” in local optima. However, since the Lyapunov candidate is guaranteed to satisfy the positive-definiteness and Lipschitz continuity conditions of Theorem 1 by its construction in Theorem 2, is always a provable safety certificate for identifying safe level sets. Thus, we can always use to identify at least a subset of , without ever identifying unsafe states as safe.

4 Experiments and Discussion

In the previous section, we developed Algorithm 1 to train the parameters of a neural network Lyapunov candidate constructed according to Theorem 2. This construction ensures the positive-definiteness and Lipschitz continuity assumptions on in Theorem 1 are satisfied. Algorithm 1 encourages to satisfy the decrease condition and match the true largest ROA for the closed-loop dynamics with a level set . In this section, we present details for the implementation of Algorithm 1 to learn the largest safe region of a simulated inverted pendulum system, and experimental results in a comparison to other methods of computing Lyapunov functions.

Inverted Pendulum Benchmark The inverted pendulum is governed by the differential equation with state , where is the angle from the upright equilibrium is the input torque, m is the pendulum mass, g is the gravitational acceleration, is the pole length, and is the friction coefficient. We discretize the dynamics with a time step of and enforce a saturation constraint , such that the pendulum falls over past a certain angle and cannot recover. For a linear policy , this yields the safe region in Fig. 3 around the upright equilibrium for the closed-loop dynamics . In particular, we fix K to the linear quadratic regulator (LQR) solution for the discretized, linearized, unconstrained form of the dynamics [29]. Outside of , the pendulum falls down without the ability to recover and the system trajectories diverge away from .

Practical Considerations To train the parameters of the Lyapunov candidate to adapt to the shape of , we use Algorithm 1 with SGD. To certify the safety of continuous level sets of whenever is updated, we check the stricter decrease condition at a discrete set of points that cover X in increasing order of the value of , as in [17]. Algorithm 1 does not guarantee that the safe level set estimate grows monotonically in volume towards with each iteration k. In fact, the estimate may shrink if initially succeeds and then fails to satisfy the decrease condition in some regions of the state space. This tends to occur near the origin, where and the “basin of attraction” characterized by “flattens”. To alleviate this, we use a large Lagrange multiplier in the SGD objective (7) to strongly “push” towards values that ensure continues to satisfy the decrease condition. In addition, we normalize the Lyapunov decrease loss in (7) by . This more heavily weighs sampled states near the origin, i.e., where is small.

Results We implement Algorithm 1 on the inverted pendulum benchmark with the Python code available at https://github.com/befelix/safe_learning, which is based on TensorFlow [30]. For the neural network Lyapunov candidate , we use three layers of 64 activation units each. We prescribe with as the level set that delineates the safe region . Fig. 3 shows the results of training with Algorithm 1, and the largest safe level set with 10 SGD iterations per update. Fig. 3a visualizes how this level set has “moulded” to the shape of . Fig. 3b shows how the safe level converges towards the prescribed level that delineates , and how the fraction of covered by approaches 1. The true largest ROA is estimated by forward-simulating all of the states in a state space discretization, and set volume is estimated by counting discrete states. Fig. 3a also shows the largest safe sets for a LQR Lyapunov candidate and a SOS Lyapunov candidate. The LQR candidate is computed in closed-form for the same discretized, linearized, unconstrained form of the dynamics used to determine the LQR policy [29]. The SOS Lyapunov candidate uses up to third-order monomials in x, thus it is a sixth-order polynomial. It is computed with the toolbox SOSTOOLS [31] and the SDP solver SeDuMi [32] in MATLAB for the unconstrained nonlinear dynamics with a Taylor polynomial expansion of . While the SOS approach is a powerful specialized method for polynomial dynamical systems, it cannot account for the non-differentiable nonlinearity introduced by the input saturation, which drastically alters the closed-loop dynamics. As a result, while is optimized for the system without saturation, it is ill-suited to the true closed-loop dynamics and yields a small safe level set. Overall, our neural network Lyapunov candidate performs the best at certification of as much of as possible, since it only relies on inputs and outputs of , and adapts to the shape of .

Comments on Safe Learning Fig. 3a demonstrates that a neural network Lyapunov candidate can certify more of the true largest safe region than other common Lyapunov candidates. This has important implications for safe exploration during learning for dynamical systems; with more safe states available to visit, an agent can better learn about itself and its environment under a wider range of operating conditions. For example, our method is applicable in the safe reinforcement learning framework of [17]. This past work provides safe exploration guarantees for a GP model of the dynamics with confidence bounds on the Lyapunov stability certificate, but these guarantees are limited by the choice of Lyapunov function. As our results have shown, certain Lyapunov candidates may poorly characterize the shape of the true largest safe region . Since our neural network Lyapunov candidate can adapt to the shape of during learning by using, for example, the mean estimate of from the GP model, we could enlarge the estimated safe region more quickly as data is collected. Our method is also applicable to exploration algorithms within safe motion planning that depends on knowledge of a safe region, such as in [33]. Overall, our method strongly warrants consideration for use in safe learning methods that leverage statistical models of dynamical systems.

5 Conclusion

We have demonstrated a novel method for learning safety certificates for general nonlinear dynamical systems. Specifically, we developed a flexible class of parameterized Lyapunov candidate functions and a training algorithm to adapt them to the shape of the largest safe region for a closed-loop dynamical system. We believe that our method is appealing due to its applicability to a wide range of dynamical systems in theory and practice. Furthermore, it can play an important role in improving safe exploration during learning for real autonomous systems in uncertain environments.

Acknowledgments

This research was supported in part by SNSF grant 200020 159557, the Vector Institute, and a fellowship by the Open Philanthropy Project.

References

[1] D. Amodei, C. Olah, J. Steinhardt, P. Christiano, J. Schulman, and D. Man´e. Concrete prob- lems in AI safety. Technical report, 2016. arXiv:1606.06565v2 [cs.AI].

[2] H. K. Khalil. Nonlinear Systems. Prentice Hall, Upper Saddle River, NJ, 3 edition, 2002.

[3] A. Vannelli and M. Vidyasagar. Maximal Lyapunov functions and domains of attraction for autonomous nonlinear systems. Automatica, 21(1):69–80, 1985.

[4] D. J. Hill and I. M. Y. Mareels. Stability theory for differential/algebraic systems with ap- plication to power systems. IEEE Transactions on Circuits and Systems, 37(11):1416–1423, 1990.

[5] J. M. G. da Silva Jr. and S. Tarbouriech. Antiwindup design with guaranteed regions of sta- bility: An LMI-based approach. IEEE Transactions on Automatic Control, 50(1):106–111, 2005.

[6] R. Kalman and J. Bertram. Control system analysis and design via the “second method” of Lyapunov II: Discrete-time systems. Transactions of the American Society of Mechanical Engineers (ASME): Journal of Basic Engineering, 82(2):394–400, 1960.

[7] P. Giesl and S. Hafstein. Review on computational methods for Lyapunov functions. Discrete and Continuous Dynamical Systems, Series B, 20(8):2291–2331, 2016.

[8] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, UK, 2009.

[9] P. A. Parrilo. Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization. PhD thesis, California Institute of Technology, 2000.

[10] D. Henrion and M. Korda. Convex computation of the region of attraction of polynomial control systems. IEEE Transactions on Automatic Control, 59(2):297–312, 2014.

[11] R. Bobiti and M. Lazar. A sampling approach to finding Lyapunov functions for nonlinear discrete-time systems. In Proc. of the European Control Conference (ECC), pages 561–566, 2016.

[12] K. Zhou and J. C. Doyle. Essentials of Robust Control. Prentice Hall, Upper Saddle River, NJ, 1998.

[13] A. Trofino. Robust stability and domain of attraction of uncertain nonlinear systems. In Proc. of the American Control Conference (ACC), pages 3707–3711, 2000.

[14] U. Topcu, A. K. Packard, P. Seiler, and G. J. Balas. Robust region-of-attraction estimation. IEEE Transactions on Automatic Control, 55(1):137–142, 2010.

[15] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006.

[16] F. Berkenkamp, R. Moriconi, A. P. Schoellig, and A. Krause. Safe learning of regions of attraction for uncertain, nonlinear systems with Gaussian processes. In Proc. of the IEEE Conference on Decision and Control (CDC), pages 4661–4666, 2016.

[17] F. Berkenkamp, M. Turchetta, A. P. Schoellig, and A. Krause. Safe model-based reinforce- ment learning with stability guarantees. In Proc. of the Conference on Neural Information Processing Systems (NIPS), pages 908–918, 2017.

[18] R. S. Sutton and A. G. Barto. Reinforcement Learning. MIT Press, Cambridge, MA, 2 edition, 2018. (draft).

[19] V. Petridis and S. Petridis. Construction of neural network based Lyapunov functions. In Proc. of the IEEE International Joint Conference on Neural Network Proceedings, pages 5059–5065, 2006.

[20] N. Noroozi, P. Karimaghaee, F. Safaei, and H. Javadi. Generation of Lyapunov functions by neural networks. In Proc. of the World Congress on Engineering (WCE), volume 1, pages 61–65, 2008.

[21] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In Proc. of the International Conference on Learning Representations (ICLR), 2014.

[22] A. Papachristodoulou and S. Prajna. On the construction of Lyapunov functions using the sum of squares decomposition. In Proc. of the IEEE Conference on Decision and Control (CDC), pages 3482–3487, 2002.

[23] A. Papachristodoulou. Scalable analysis of nonlinear systems using convex optimization. PhD thesis, California Institute of Technology, 2005.

[24] G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2(4):303–314, 1989.

[25] K. Hornik. Some new results on neural network approximation. Neural Networks, 6(8):1069– 1072, 2001.

[26] C. M. Bishop. Pattern Recognition and Machine Learning. Springer-Verlag, New York, NY, 2006.

[27] X. Huang, M. Kwiatkowska, S. Wang, and M. Wu. Safety verification of deep neural networks. Technical report, 2017. arXiv:1610.06940v3 [cs.AI].

[28] G. Katz, C. Barrett, D. Dill, K. Julian, and M. Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. In Proc. of the International Conference on Computer Aided Verification (CAV), 2017.

[29] F. L. Lewis, D. L. Vrabie, and V. L. Syrmos. Optimal Control. John Wiley & Sons, Inc., Hoboken, NJ, 3 edition, 2012.

[30] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: A system for largescale machine learning. In Proc. of the USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 265–283, 2016.

[31] S. Prajna, A. Papachristodoulou, and P. A. Parrilo. Introducing SOSTOOLS: A general purpose sum of squares programming solver. In Proc. of the IEEE Conference on Decision and Control (CDC), pages 741–746, 2002.

[32] J. F. Sturm. Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optimization Methods and Software, 11(1–4):625–653, 1999.

[33] T. Koller, F. Berkenkamp, M. Turchetta, and A. Krause. Learning-based model predictive control for safe exploration. In Proc. of the IEEE Conference on Decision and Control (CDC), 2018. (to appear).

A Proofs

Theorem 2 (Lyapunov neural network): Consider as a Lyapunov candidate function, where is a feed-forward neural network. Suppose, for each layer in , the activation function and weight matrix each have a trivial nullspace. Then has a trivial nullspace, and is positive-definite with and . Furthermore, if is Lipschitz continuous for each layer , then is locally Lipschitz continuous.

Proof. We begin by showing that has a trivial nullspace in X by induction, and then use this to prove that is positive-definite on X. Recall that a feed-forward neural network is a successive composition of its layer transformations, such that the output of layer for the state is the input to layer . Consider with the input , and the first layer output . Clearly has a trivial nullspace in X, since it is just the identity function. Since , and each have a trivial nullspace in their respective input spaces, the sequence of logical statements

holds. Thus, holds, and has a trivial nullspace in X. If we now assume has a trivial nullspace in X, it is clear that has a trivial nullspace in X, since

holds in a similar fashion. As a result, has a trivial nullspace for each layer by induction. Since is a composition of a finite number of layers, for some , thus has a trivial nullspace in X.

We now use this property of to prove that the Lyapunov candidate is positive-definite on X. As an inner product, is positive-definite on the transformed space . Thus, and otherwise. Since we have already proven , combining these statements shows that and otherwise. As a result, is positive-definite on X.

Finally, we need to show that if every activation function is Lipschitz continuous, then is locally Lipschitz continuous. If the neural network is Lipschitz continuous, then clearly is locally Lipschitz continuous, since it is quadratic and thus differentiable with respect to . To show that is Lipschitz continuous, it is sufficient to show that each layer is Lipschitz continuous. This is due to the fact that any function composition f(g(x)) is Lipschitz continuous with Lipschitz constant if f has Lipschitz constant and g has Lipschitz constant . This fact can be seen from , for each pair . By the Lipschitz continuity of function composition and the linearity of , each layer transformation is Lipschitz continuous if is Lipschitz continuous. As a result, the neural network is Lipschitz continuous, and the Lyapunov candidate is locally Lipschitz continuous.

Remark 1: In (2), we ensured each weight matrix has a trivial nullspace with the structure

where for some is the identity matrix, and is a constant. To minimize the number of free parameters required by our neural network Lyapunov candidate, we choose to be the minimum integer such that each entry in is independent from the others. Since is symmetric, it has free parameters, thereby requiring or . For this, we choose .

Designed for Accessibility and to further Open Science