Scalable synthesis of safety certificates from data with application to learning-based control

2017·arXiv

Abstract

I. INTRODUCTION

Digitalization opens new perspectives for control engineering and automation by making large amounts of data from experiments and numerical models available. Learning-based control exploits this cumulated knowledge and potentially also performs autonomous exploration of unseen system behavior in order to find an optimal control policy. An example is deep reinforcement learning (RL), providing prominent results, one of which is the application to Atari Arcade video-games [1].

Compared with traditional control techniques, learning-based methods offer the potential to reduce modeling and controller design effort. However, many industrial applications are safety-critical systems, i.e. systems with physical constraints that have to be satisfied. This essentially limits the application of most available learning-based control algorithms, which do not provide safety certificates. In order to address this limitation, we present efficient and scalable methods for the synthesis of a safety strategy consisting of a safe set and corresponding safe control law, which are cheap to implement and can be applied together with existing controllers or modern learning techniques to enhance them with safety guarantees.

Contributions: We consider dynamical systems with unknown Lipschitzian nonlinearity and formulate the safe set and safe controller synthesis as convex optimization problems, which directly employ available data. Our analysis considers an approximate linear model of the system and

Kim Wabersich (wabersich@kimpeter.de) and Melanie N. Zeilinger (mzeilinger@ethz.ch) are with the Institute for Dynamic Systems and Control, ETH Zurich, Switzerland. This work was supported by the Swiss National Science Foundation under grant no. PP00P2 157601/1.

uses data to incorporate the unknown nonlinear effects. The computations are based on Lyapunov’s method and result in two optimization problems: The first optimization problem defines a quadratic approximation of the nonlinearity in the Lyapunov conditions and the second one describes the computation of the safe set and controller. Similar to [2], [3] the framework can be used to augment any desired controller, which is lacking safety guarantees, in particular one which is based on learning.

We extend the technique to reduce conservatism of the safe set by putting a prior on the unknown dynamics in the form of a Gaussian process model, which is beneficial, especially in case of high dimensional systems and sparse data. Due to its less conservative nature, this extension favors safe exploration beyond the system behavior seen so far and is well suited for iteratively learning in closed-loop.

We illustrate the approach using examples, including a convoy of partly non-cooperative autonomous cars.

Related work: Given its relevance in industrial applications, there has been a growing interest in safe learning methods in the past years. Extensions of existing RL methods have been developed to enable safe RL with respect to different notions of safety, see [4] for a survey. A detailed literature review regarding RL, focusing on safety with respect to state and input constraints as also considered in this work, can be found in [3]. There are few results for efficient controller tuning from data with respect to best worst-case performance (also worst-case stability under physical constraints) by Bayesian min-max optimization, see e.g. [5], or by safety constrained Bayesian optimization as e.g. in [6], [7]. In [8] a method was developed that allows to analyze a given closed-loop system (under an arbitrary RL algorithm) with respect to safety.

Recent developments include the concept of a supervisory framework, which consists of a safe set in the state space and a safety controller. As long as the system state is in the interior of the safe set, any control law (e.g. unsafe learning-based control) can be applied. The safety controller only interferes if necessary, in case that the system reaches the boundary of the safe set, see e.g. [2], [3]. Such a framework allows for certifying an arbitrary learning-based control algorithm with safety guarantees. Previously proposed techniques are based on a differential game formulation, which results in solving a min-max optimal control problem. An active field of research aims at extending these techniques to larger scale systems, mostly by considering special cases as described, e.g., in [9], [10], [11]. For some relevant cases, the existence of analytic solutions [12] has been shown.

c2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Fig. 1: Illustration of Assumption II.1. Red dots display observations .

The results presented in this paper are based on the concept of a safety framework, but compared to previous work we focus on approximation techniques to improve scalability with respect to the system dimension.

Structure of the paper: In Section II we state the problem and in Section III we present our main result for safe set and controller computation. We then show an extension using a stronger assumption on the unknown system dynamics by considering Gaussian processes in Section IV in order to reduce conservatism of the safe set. The results are demonstrated on numerical examples within the respective sections. We conclude the paper in Section V.

Notation: The set of symmetric matrices of dimension n is , the set of positive (semi-) definite matrices is () , the set of integers in the interval is , the set of integers in the interval is , and for let . The boundary of an arbitrary compact set is . Given a set D = , let and . Define as , which picks the closest element in with respect to under the 2-norm. Given a set and a locally Lipschitz continuous function , the local Lipschitz constant for all is denoted by . The Minkowski sum of two sets is denoted by .

II. PROBLEM DESCRIPTION

We consider deterministic nonlinear systems of the form

where and is locally Lipschitz continuous. The system is subject to polytopic state constraints , and polytopic input constraints . The origin is contained in X, (A, B) is controllable, and the system state is fully observable.

The explicit form of the nonlinearity d is unknown, hence d is assumed to be a memoryless black-box function, which will be identified from system measurements. However, we assume that B, i.e. the influences of the control input, are known. The matrix A incorporates system knowledge in form of a linear model, which will be used in the design procedure in Section III-D. The linear model can, e.g., be selected as an approximate system model. For identification of d(x), we have access to finitely many observations D = that fulfill the following property (Figure 1).

Assumption II.1. Given a set with N data tuples, where , there exists a non-trivial subset such that for any there exists an such that .

Remark II.2. Assumption II.1 implies that there exists a region, in which the collected data samples are dense. Intuitively, one can think of together with the Lipschitz constant L, as a ‘measure of knowledge’ that we have about d(x) inside . The ‘knowledge’ increases as gets smaller.

Remark II.3. For simplicity, we assume noise-free data D. It would, however, be possible to incorporate bounded or stochastic noise with only minor changes.

We consider the problem of providing a safety certificate for an arbitrary control law by means of a safe set and controller, as proposed in [2], [3]. Consider a potentially unsafe control strategy , obtained for example by application of RL (which often cannot guarantee constraint satisfaction). In order to achieve minimal interference with the desired control , the goal is to compute a set of states S, for which we know a control strategy such that input and state constraints will be satisfied for all future times, in particular considering that d(x) is unknown. The control can then safely be applied in the interior of S, until it becomes necessary to take a safety-ensuring action on the boundary of S which guarantees that we stay in S, i.e. that we can still provide a safe control strategy in the future. More formally:

Definition II.4. A set is called a safe set for system (1) if there exists a safe control law such that for an arbitrary (learning-based) policy , the safe controller

guarantees that the state x(t) is contained in S for all t > 0 if .

In particular, we aim at finding an algorithm that scales well in computational complexity with respect to the dimen- sionality of system (1) as well as the number of measurements.

III. SAFE-SETS FOR NONLINEAR SYSTEMS FROM DATA

We first introduce the class of safe-sets considered. Afterwards, we motivate the proposed method and highlight its basic idea using an example, in order to then introduce the algorithm for safe set and controller computation in the remainder of the section.

A. Ellipsoidal safe set

In order to provide a scalable optimization-based approach, we restrict the form of the safe set to an ellipsoidal

set of the form

with , and the safe controller to the class of linear state feedback control laws with . To construct , we leverage Lyapunov’s direct method, with a quadratic Lyapunov function . By standard Lyapunov arguments, the following sufficient conditions ensure, analogously to [2, Lemma 1], that fulfills Definition II.4:

for all .

B. Motivating Example

Consider the system with subject to input constraints and state constraints . The task is to find a safe interval S = [a, b] and a corresponding safe control law according to Definition II.4. The nonlinearity d(x) is unknown, but we are given a set of noise-free observations such that the convex hull equals the state space (this will not be a necessary assumption later, see also Assumption II.1). We consider a linear state feedback .

Robust approach: Without any knowledge of d(x), a robust approach is to consider the dynamics , with , where the bound on w is estimated from D. In this case, there does not exist a feasible controller gain k w.r.t. input constraints for any state and , such that , i.e. there does not exist a safe set.

Proposed approach: Let be our Lyapunov candidate function. We analyze . At the boundary of the state space we have the measurements , providing that . By standard Lyapunov arguments, this implies that for k = 0 we have that for all , for all t > 0. We conclude that is a safe controller for which the state constraint set constitutes a safe set.

This example highlights that rather than taking uniform bounds on the unknown dynamics, we can provide less conservative safe sets by quantifying the effect of the unknown dynamics in the form of state-dependent disturbances, which can be inferred from the available data D. In the following, we will exploit this concept for safe set and controller computations and conclude the section with two examples.

C. Computation of the safe set and controller

Given a matrix P (see Section III-D) that determines the shape of the safe set, we write the problem of finding the size of the safe set and a corresponding safe controller as two consecutive convex optimization problems.

Remark III.1. Given the shape P of the safe set (3), there exists a small enough such that Assumption II.1 is satisfied on the sub-level set , i.e.

The proposed procedure first uses data to bound the effects of the nonlinearity on the Lyapunov decrease by a quadratic form in the largest possible safe set, i.e. over . This quadratic bound is then used as input to the second optimization problem, which computes the controller and set size in order to take into account the nonlinearities in addition to the linear system dynamics. The restriction to a quadratic bound of the nonlinearity is motivated by the fact that it can be treated efficiently by means of a convex problem.

In order to reduce conservatism, we bound the nonlinearity on sub-regions of the safe set, described by intervals

which are defined such that for any . Note that the selection of sub-intervals is possible as we will use the quadratic bound for upper bounding (4c), which is only required to hold on . For every interval , we then formulate two convex optimization problems in order to determine the volume of the safe set and the safe controller. In case no solution exists, the interval can be reduced. In general, the smaller the intervals are chosen, the less conservative the bound will be.

Bounding the nonlinear effects: Given an interval , consider the neighborhood . The indices of data samples inside the set are given by . We seek to find a quadratic bound on the nonlinearity arising in the Lyapunov decrease (4c) for all , i.e. to find a such that

The first optimization problem for bounding the nonlinearity over each interval is given by

with and as defined in Assumption II.1.

Lemma III.2. Let Assumption II.1 hold and let satisfy (5). Consider an interval according to (6). If (8) attains a solution, then for all it holds that

Proof. We prove that for all the implication (8b), (8c) (9) holds. First note that for any there exists a such that by the definition of the intervals, i.e. , see also Remark III.1. For notational ease let . Equation (9) reads . For all and for all we have therefore by Lipschitz continuity . Note that by the definition of the intervals and Remark III.1 the relation holds. As a consequence, if for all and for all we have that , then the quadratic bound (9) holds for all . Finally, using the S-Lemma (see [13]) the condition is equal to (8b),(8c) which completes the proof.

Remark III.3. The optimization problem in (8) is a convex semidefinite programming problem. In case that there are more observations () than the optimization algorithm can handle in (8), one can iteratively calculate : Solve (8) using a subset of D in order to obtain , in the next iteration choose another disjoint subset of D and add the constraint to (8) in order to obtain . Repeat until all subsets of D are processed which yields a feasible, possibly suboptimal solution to (8).

Problem (8) provides a bound on the nonlinear effect in the Lyapunov decrease by means of the Lipschitz constant of . In practice, a local Lipschitz constant can e.g. be obtained from data as

Since does not have to be positive definite, stabilizing effects of the nonlinearity can be considered in (7), i.e. can be negative and therefore contribute to rendering negative on the boundary of the safe set. This is demonstrated in Section III-E, see also Figure 2 (right).

Calculation of safe level set and safe controller: By using the quadratic bound from the first optimization problem, we are able to state the second optimization problem for safe set and controller design satisfying the conditions (4a)-(4c) as follows. Let be chosen according to (6). The optimization problem is given by

(10b)

Theorem III.4. Let Assumption II.1 hold and let satisfy (5). Consider an interval according to (6). If (10) attains a solution , then is a safe set for system (1) according to Definition II.4 with .

Proof. We prove the result in two steps: 1) Conditions (10d)-(10e) imply (4a) and (4b). By [14, Section 5.2.2] we can rewrite (4b) as which equals (10e). The matrix inequality for the states can be derived similarly. 2) Conditions (10b)-(10c) ensure that fulfills (4c). For all we have to fulfill (4c) which is implied by (7). A sufficient condition for (4c) is therefore , i.e. that for all . Multiplying from left and right by yields (10c). We have shown that is a safe set according to Definition II.4. The objective in (10) yields the largest safe set under these sufficient conditions.

Note that optimizing over P and K in (10) is not possible, because the bound obtained in the first optimization step depends on P.

Problem (8) and (10) provide (semidefinite) convex optimization problems for computing a safe set and controller by directly employing data points. Such problems can be solved efficiently even for higher dimensions, see e.g. [15], [16]. While this offers a general approach with favorable scalability properties, the limitation is that the resulting safe set cannot be larger than the convex hull of the data points plus a -neighborhood. Exploration is therefore limited. Nevertheless, initially collected data can be iteratively extended inside such that from Assumption II.1 gets smaller over time. Recomputation of the safe set can then reduce conservatism. We present an extension in Section IV which further reduces conservatism and improves exploration.

D. Shape of the safe set

Using the linear model of the system dynamics in (1), we define an approximate initial shape matrix P of the safe set by neglecting the unknown nonlinearity. Assume that is given by with , which can be e.g. calculated as the minimum volume covering ellipse of the data points D, see [17, p. 222]. We can find a safe set for system (1) by setting d(x) = 0, resulting in the following optimization problem with and

min logdet(E) (11a)

s.t. AE (11b) AE + EABY + Y (11c)

If (11) attains a solution, then analogously to Theorem III.4, we obtain a safe set for system (1) according to Defini-tion II.4 with , and , but for d(x) = 0. Constraint (11b) ensures that the safe set is a subset of , i.e. the set in which Assumption II.1 is satisfied and therefore the data is dense. This implies that (0, 1], i.e. , is the maximum set size such

Fig. 2: Left: Sample trajectories under the safe control law, when starting on the boundary of the safe set . Red dots: Data points, black lines: Closed-loop system trajectories, elliptic ring , which contains . Dashed set: Safe set using from Section IV. Right: Quadratic bound according to Lemma III.2 (blue) of non-linear term (red).

that Assumption II.1 as well as all constraints are fulfilled. The optimization problem (10) then aims at improving the initial approximation obtained via (11) with respect to the nonlinearity by designing a different control law and safe set size.

E. Illustrative numerical example

Consider a nonlinear system of the form (1), where A = with input constraints and state constraints . The origin is not a stable equilibrium, and is neither an equilibrium point. For simplicity we consider a grid of data points as illustrated in Figure 2, however any data set fulfilling Assumption II.1 could be used. The data was taken inside the set with obtained by solving (11), which also corresponds to with . We apply Theorem III.4 by solving (8) and (10) for and obtain . The results are illustrated in Figure 2. Note that in general the quadratic bound must only hold on and could therefore be violated around the origin.

F. Simulation: Safety for autonomous convoys

Consider a convoy of cars or trucks as depicted in Figure 3. Given a target velocity and a possibly small target distance , the goal is to drive closely behind each other, in order to leverage slipstream effects for efficiency. We assume that it is possible to overwrite the local controllers, i.e. the acceleration of car 1, car 3 and car 4 in a centralized way if necessary to ensure safety. During a supervised observation phase, initial data about the system is collected. We consider the problem of finding a safe, centralized control law, and a safe set such that the cars will not crash, even if we cannot determine the acceleration of car 2 and car 5.

Let be the difference between the target distance and the actual distance of car i + 1 and car i. Let be the difference between the

Fig. 3: Illustration of the autonomous car convoy. The acceleration of red cars cannot be controlled.

target velocity of the convoy and the velocity of car i. The dynamics of all cars i = 1, ..., 5 are given as where is the applied acceleration. The control law of car 1 is given by , of cars 3, 4 by and of cars 2, 5 (which we cannot overwrite) by , , i.e. they apply a saturated, stabilizing state feedback law and are therefore nonlinear. The target distance between the cars is 1 meter. In order to avoid a crash the state constraints are given by . The maximum acceleration of cars 1, 3, and 4 is , i.e. . We are given observations of and in the interval with .

In Figure 4, a numerical simulation under the resulting safe control law (2) is shown, starting from the boundary of the safe set with and the remaining states equal to zero, which represents the situation that the second car is too close to the first one and its velocity is slightly higher than the reference velocity. As we can see in Figure 4, the first car has to accelerate quickly several times during the first two seconds for safety reasons, since the second car (which cannot be controlled) would decelerate more as car 3 would be able to compensate. The same situation occurs at 4.2 seconds between car 3 and car 4 and at around 10 seconds between car 1 and car 2 again. After six safety interventions in total, the local controllers of the cars are able to stabilize the overall system.

IV. SAFE-SETS USING GAUSSIAN PROCESSES

The previous sections are based on Lipschitz continuity of the unknown nonlinearity d(x), which was incorporated into the quadratic upper bound in (10), see Lemma III.2. A shortcoming of using only Lipschitz continuity as ‘prior’ knowledge is the requirement of relatively dense and structured data, see Assumption II.1. This implies that almost no ‘exploration’ can be made beyond the data, observed so far. By putting a stronger prior on the class of functions for modeling the nonlinearity d(x), we develop a less conservative quadratic bound, which can then be used in (10) and is generally expected to yield a larger safe set. In addition, it improves exploration beyond the data points allowing to iteratively improve the safe set during closed-loop operation, where initially few data points are generally available.

A. Gaussian Processes We use a Gaussian process model (GP) in order to perform Bayesian inference on the unknown nonlinearity (see. e.g.

Fig. 4: Sample trajectory of the car convoy under the safety framework, when starting on the boundary of . Grey lines indicate times when the safe control law is applied.

[18]) for each element of d(x). The GP is defined by a mean function , together with a covariance (kernel) function , denoted with for the GP prior on in short. We set the mean prior function to be constant, i.e. . Given observations at locations where , the posterior distribution of is given by

where , is the posterior covariance, is the variance, is the positive definite covariance matrix matrix and is a vector of N elements, each equal to . The posterior mean for the vector d(x) is and the variance .

B. GP-based bounding of nonlinearity

The GP model provides a measure for the posterior variance of d(x), which is used to improve the bound on the effect of the nonlinearity on the Lyapunov decrease. Instead of using Lipschitz-based arguments, as in Section III, we calculate a strict quadratic bound on highly probable, worst-case realizations of the nonlinear term for all , given an interval . The intervals in this section are not limited to a dense data region . Therefore one can drop the constraint (11b) in the computation of P in (11) and choose for the construction of the

intervals (6). By relying on the GP, the largest interval can be chosen independent of Assumption II.1 and Remark III.1.

Algorithm 1 summarizes the calculation of the quadratic bound, implementing the following idea. Let f(x) be a function that has to be quadratically upper bounded by for all . In order to enforce the infinite dimensional constraint , we proceed iteratively by starting with a finite approximation, which will be improved until holds for all .

fined, which returns the maximum value of the nonlinear term with a chosen probability, e.g. with 99.73% by letting c = 3. Starting with a finite number of samples of f(x) for (lines 3,4) we compute an initial guess for a quadratic bound on given by in line 5, where

s.t. for all , which yields a quadratic upper bound on .

We search for potential violations of the current bound in line 8 and add it to the set of data points in lines 9 and 10. After that we update the quadratic bound in line 11. The algorithm iterates until there is no violation. This way for all , the quadratic form will be a bound on with high probability.

Remark IV.1. The optimization problem in line 8 is continuous (compare for example [19, Chapter 2G]), but nonconvex. An alternative approach is to build a discretization of , which we denote by , with grid size , , and evaluate the posterior mean and covariance for each . By selecting with as defined in [20, Lemma 3], the bound can be approximated using (8) as a convex optimization problem.

For the second step of calculating a safe set size and controller, we can simply use the bound instead of in (10) in order to obtain a set, which is safe with the selected probability. By construction, will be less conservative, or equal than . This is due to the fact that we put a prior on the unknown nonlinearity d(x),

Fig. 5: Safe RL using the proposed safety framework together with the PGSD [21] algorithm. The distance of the state to the origin, as well as the volume of the safe set is shown over time. Grey-shaded time points depict application of the safe control law .

which allows for Bayesian inference and therefore improved extra- and interpolation based on the data, as opposed to using estimates based on Lipschitz continuity. In Figure 2 (left), the benefit of using over the Lipschitz based bound Q is illustrated. Moreover, the main advantage is that we do not require particular assumptions on the data and the safe set is not limited to a subset of , as it is the case in Lemma III.2.

C. Numerical example: Exploration

Consider a nonlinear system of the form (1) with A = , input constraints , and state constraints . We use a squared exponential kernel as defined in [18] with and . Given initial data in with , we solve (10) using the high probability (c = 3, 99.73%) bound with an initial P obtained via (11) and for i = 1, 2, .., 8, i.e. we start with i = 1 and iterate i = 2, 3, .. until we find a feasible solution. We assume that the desired control input is given by the policy gradient with signed derivative (PGSD) algorithm (see [21]), which is a policy search RL method without any safety guarantees. During closed-loop operation under the safe control law (2) we collect data D. Every 0.2 seconds we recompute the safe level set, i.e. , where the safe set size converges after 2.5 seconds. In Figure 5 the evolution of the safe set size (volume of the ellipse) is shown as well as the distance of the system state to the origin, which has to be minimized by the PGSD learning based control law. The unsafe RL input is ‘overwritten’ three times indicated by the grey lines to ensure safety, until it begins to converge.

V. CONCLUSION

This paper presents a safety framework that allows to en- hance arbitrary learning-based and unsafe control strategies, for nonlinear and potentially larger scale systems with safety certificates. The nonlinearity is assumed to be unknown and we only require a possibly inaccurate linear system model and observations of the system. A key feature is that the proposed method directly exploits the available data, without the need of an additional learning mechanism. By relying on convex optimization problems, the proposed method is scalable with respect to the system dimension and number of data points. In order to reduce conservatism of the safe set calculations, the approach was extended using a Gaussian process model as prior on the nonlinearity. This modification enables safe exploration and thereby iterative computation of the safe set during closed-loop operation. The results were demonstrated using several numerical example problems, showing that the safety framework can be used to certify arbitrary and in particular learning-based controllers.

REFERENCES

[1] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.

[2] A. K. Akametalu, J. F. Fisac, J. H. Gillula, S. Kaynama, M. N. Zeilinger, and C. J. Tomlin, “Reachability-based safe learning with gaussian processes,” in 53rd IEEE Conference on Decision and Control (CDC). IEEE, 2014, pp. 1424–1431.

[3] J. F. Fisac, A. K. Akametalu, M. N. Zeilinger, S. Kaynama, J. Gillula, and C. J. Tomlin, “A general safety framework for learning-based control in uncertain robotic systems,” arXiv preprint arXiv:1705.01292, 2017.

[4] J. Garcıa and F. Fern´andez, “A comprehensive survey on safe reinforcement learning,” Journal of Machine Learning Research, vol. 16, pp. 1437–1480, 2015.

[5] K. P. Wabersich and M. Toussaint, “Automatic testing and minimax optimization of system parameters for best worst-case performance,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sept 2015, pp. 5533–5539.

[6] F. Berkenkamp and A. P. Schoellig, “Safe and robust learning control with gaussian processes,” in European Control Conference (ECC), July 2015, pp. 2496–2501.

[7] F. Berkenkamp, A. P. Schoellig, and A. Krause, “Safe controller optimization for quadrotors with gaussian processes,” in Proc. of the International Conference on Robotics and Automation (ICRA), 2016, pp. 491–496.

[8] F. Berkenkamp, R. Moriconi, A. P. Schoellig, and A. Krause, “Safe learning of regions of attraction for uncertain, nonlinear systems with gaussian processes,” in 55th IEEE Conference on Decision and Control (CDC), Dec 2016, pp. 4661–4666.

[9] M. Chen, J. F. Fisac, S. Sastry, and C. J. Tomlin, “Safe sequential path planning of multi-vehicle systems via double-obstacle hamilton-jacobi-isaacs variational inequality,” in European Control Conference (ECC), July 2015, pp. 3304–3309.

[10] S. Kaynama, I. M. Mitchell, M. Oishi, and G. A. Dumont, “Scalable safety-preserving robust control synthesis for continuous-time linear systems,” IEEE Transactions on Automatic Control, vol. 60, no. 11, pp. 3065–3070, 2015.

[11] J. F. Fisac and S. S. Sastry, “The pursuit-evasion-defense differential game in dynamic constrained environments,” in 54th IEEE Conference on Decision and Control (CDC), Dec 2015, pp. 4549–4556.

[12] J. Darbon and S. Osher, “Algorithms for overcoming the curse of dimensionality for certain Hamilton–Jacobi equations arising in control theory and elsewhere,” Research in the Mathematical Sciences, vol. 3, no. 1, p. 19, 2016.

[13] I. P´olik and T. Terlaky, “A survey of the S-lemma,” SIAM review, vol. 49, no. 3, pp. 371–418, 2007.

[14] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan, Linear matrix inequalities in system and control theory. SIAM, 1994.

[15] K.-C. Toh, M. J. Todd, and R. H. T¨ut¨unc¨u, “SDPT3a matlab software package for semidefinite programming, version 1.3,” Optimization methods and software, vol. 11, no. 1-4, pp. 545–581, 1999.

[16] M. ApS, The MOSEK optimization toolbox for MATLAB manual. Version 7.1 (Revision 28)., 2016. [Online]. Available: http://docs.mosek.com/7.1/toolbox/index.html

[17] S. Boyd and L. Vandenberghe, Convex optimization. Cambridge university press, 2004.

[18] C. E. Rasmussen and C. K. Williams, Gaussian processes for machine learning. MIT press Cambridge, 2006, vol. 1.

[19] A. L. Dontchev and R. T. Rockafellar, “Implicit functions and solution mappings,” Springer Monogr. Math., 2009.

[20] F. Berkenkamp, M. Turchetta, A. P. Schoellig, and A. Krause, “Safe model-based reinforcement learning with stability guarantees,” in Proc. of the Conference on Neural Information Processing Systems (NIPS), 2017.

[21] J. Z. Kolter and A. Y. Ng, “Policy search via the signed derivative.” in Robotics: science and systems, 2009, p. 34.

Designed for Accessibility and to further Open Science