Bayesian Learning-Based Adaptive Control for Safety Critical Systems

2019·Arxiv

Abstract

Deep learning has enjoyed much recent success, and applying state-of-the-art model learning methods to controls is an exciting prospect. However, there is a strong reluctance to use these methods on safety-critical systems, which have constraints on safety, stability, and real-time performance. We propose a framework which satisfies these constraints while allowing the use of deep neural networks for learning model uncertainties. Central to our method is the use of Bayesian model learning, which provides an avenue for maintaining appropriate degrees of caution in the face of the unknown. In the proposed approach, we develop an adaptive control framework leveraging the theory of stochastic CLFs (Control Lyapunov Functions) and stochastic CBFs (Control Barrier Functions) along with tractable Bayesian model learning via Gaussian Processes or Bayesian neural networks. Under reasonable assumptions, we guarantee stability and safety while adapting to unknown dynamics with probability 1. We demonstrate this architecture for high-speed terrestrial mobility targeting potential applications in safety-critical high-speed Mars rover missions.

Index Terms— Robust/Adaptive Control of Robotic Systems, Robot Safety, Probability and Statistical Methods, Bayesian Adaptive Control, Deep Learning, Mars Rover

I. INTRODUCTION

The rapid growth of Artificial Intelligence (AI) and Machine Learning (ML) has had a tremendous impact on many disciplines, including finance, medicine, and general cyber-physical systems. The ability of ML algorithms to learn high-dimensional dependencies has expanded the capabilities of traditional disciplines and opened up new opportunities for developing decision-making systems which operate in complex scenarios. Despite these recent successes [1], there is low acceptance of AI and ML algorithms in safety-critical domains, including human-centered robotics, and particularly in the flight and space industries. For example, both recent and near-future planned Mars rover missions largely rely on daily human decision making and piloting, due to a very low acceptable risk for trusting black-box autonomy algorithms. Therefore there is a need to develop computational tools and algorithms that bridge two worlds: the canonical structure of control theory, which is important for providing guarantees in safety-critical applications, and the data-driven abstraction and representational power of machine learning, which is necessary for adapting the system to achieve resiliency against unmodeled disturbances.

Fig. 1: The left image depicts a 1/5th scale RC car platform driving at the Mars Yard at JPL; the right image is a platform from the Mars Exploration Rover (MER) mission.

Towards this end, we propose a novel, lightweight framework for Bayesian adaptive control for safety-critical systems, which we call BALSA (BAyesian Learning-based Safety and Adaptation). This framework leverages ML algorithms for learning uncertainty representations of dynamics, which in turn are used to generate sufficient conditions for stability using stochastic CLFs and for safety using stochastic CBFs. Treating the problem within a stochastic framework allows for a cleaner and more optimal approach to handling modeling uncertainty, in contrast to deterministic, discrete-time, or robust control formulations. We apply our framework to the problem of high-speed agile autonomous vehicles, a domain where learning is especially important because the dynamics are complex and difficult to model (e.g., fast autonomous driving over rough terrain). Potential Mars Sample Return (MSR) missions are one example in this domain. Current Mars rovers (i.e., Opportunity and Curiosity) have driven on average 3 km/year [2], [3]. In contrast, if MSR launches in 2028, then the rover has only 99 sols (102 days) to complete potentially 10 km [4], [5]. After factoring in the intermittent and heavily delayed communications with Earth, the need for adaptive, high-speed autonomous mobility could be crucial to mission success.

Along with the requirements for safety and adaptation, computational efficiency is of paramount importance for real systems. Hardware platforms often have severe power and weight requirements, which significantly reduce their computational power. Probabilistic learning and control over deep Bayesian models is a computationally intensive problem. To keep this cost manageable, we shorten the planning horizon and rely on a high-level, lower-fidelity planner to plan desired trajectories. Our method then guarantees safe trajectory tracking behavior, even if the given trajectory is not safe. This frees up the computational budget for other tasks, such as online model training and inference.

Related work - Machine-learning based planning and control is a quickly growing field. From Model Predictive Control (MPC) based learning [6], [7], safety in reinforcement learning [8], and belief-space learning and planning [9], to imitation learning [10], these approaches all demand considerations of safety under learning [11], [12], [13], [14]. Closely related to our work is Gaussian Process-based Bayesian Model Reference Adaptive Control (GP-MRAC) [15], where modeling error is approximated with a Gaussian Process (GP). However, the computational cost of GPs scales poorly with the amount of data ($O(N^3)$), and sparse approximations lack representational power. Another closely related work is that of [16], who showed how to formulate a robust CLF which is tolerant to bounded model error. Extensions to robust CBFs were given in [17]. A stated drawback of this approach is the conservative nature of the bounds on the model error. In contrast, we incorporate model learning into our formulation, which allows for more optimal behavior, and leverage stochastic CLF and CBF theory to guarantee safety and stability with probability 1. Other related works include [18], which uses GPs in CBFs to learn the drift term in the dynamics f(x), but uses a discrete-time, deterministic formulation. The work in [19] combined L1 adaptive control and CLFs. Learning in CLFs and CBFs using adaptive control methods (including neuro-adaptive control) has been considered in several works, e.g. [20], [21], [22], [23].

Contributions - Here we take a unique approach to address the aforementioned issues, with the requirements of 1) adaptation to changes in the environment and the system, 2) adaptation which can take into account high-dimensional data, 3) guaranteed safety during adaptation, 4) guaranteed stability during adaptation and convergence of tracking errors, and 5) low computational cost and high control rates. Our contributions are as follows. First, we introduce a Bayesian adaptive control framework which explicitly uses the model uncertainty to guarantee stability, and is agnostic to the type of Bayesian model learning used. Second, we extend recent stochastic safety theory to systems with switched dynamics to guarantee safety with probability 1. In contrast to traditional adaptive control, switching dynamics are used to account for model updates which may only occur intermittently. Third, we combine these approaches in a novel online-learning framework (BALSA). Fourth, we compare the performance of our framework using different Bayesian model learning and uncertainty quantification methods. Finally, we apply this framework to a high-speed driving task on rough terrain using an Ackermann-steering vehicle and validate our method in both simulation and hardware experiments.

II. SAFETY AND STABILITY UNDER MODEL LEARNING VIA STOCHASTIC CLF/CBFS

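Consider a stochastic system with SDE (stochastic differential equation) dynamics:

$$dx_1 = x_2\,dt, \qquad dx_2 = \big(f(x) + g(x)u\big)\,dt + \Sigma(x)\,d\xi(t), \tag{1}$$

where $x_1, x_2 \in \mathbb{R}^n$, $x = [x_1^\top, x_2^\top]^\top$, the controls are $u \in \mathbb{R}^n$, the diffusion is $\Sigma(x) \in \mathbb{R}^{n \times n}$, and $\xi(t) \in \mathbb{R}^n$ is a zero-mean Wiener process. For simplicity we restrict our analysis to systems of this form, but emphasize that our results are extensible to systems of higher relative degree [24], as well as hybrid systems with periodic orbits [25]. A wide range of nonlinear control-affine systems in robotics can be transformed into this form. In general, on a real system, the drift $f$ and the diffusion $\Sigma$ may not be fully known. We assume g(x) is known and invertible, which makes the analysis more tractable. It will be interesting in future work to extend our approach to unknown, non-invertible control gains, or non-control-affine systems (e.g. $\dot{x} = f(x, u)$). Let $\hat{f}(x)$ be a given approximate model of f(x). We formulate a pre-control law with pseudo-control $\mu \in \mathbb{R}^n$:

$$u = g(x)^{-1}\big(\mu - \hat{f}(x)\big), \tag{2}$$

which leads to the system dynamics being

$$dx_1 = x_2\,dt, \qquad dx_2 = \big(\mu + \Delta(x)\big)\,dt + \Sigma(x)\,d\xi(t), \tag{3}$$

where $\Delta(x)$ is the modeling error, with $\Delta(x) = f(x) - \hat{f}(x)$.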

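Suppose we are given a reference model and reference control from, for example, a path planner:

$$dx_{1,rm} = x_{2,rm}\,dt, \qquad dx_{2,rm} = f_{rm}(x_{rm}, u_{rm})\,dt. \tag{4}$$

The utility of the methods outlined in this work is for adaptive tracking of this given trajectory with guaranteed safety and stability. We assume that $f_{rm}$ is continuously differentiable in $x_{rm}$, that $u_{rm}$ is bounded and piecewise continuous, and that $x_{rm}$ is bounded for a bounded $u_{rm}$. Define the error $e = x - x_{rm}$. We split the pseudo-control input into four separate terms:

$$\mu = \mu_{rm} + \mu_{pd} + \mu_{qp} - \mu_{ad}, \tag{5}$$

where we assign $\mu_{rm} = \dot{x}_{2,rm}$ and $\mu_{pd}$ to a PD controller:

$$\mu_{pd} = [\,-K_P \;\; -K_D\,]\,e. \tag{6}$$

Additionally, we assign $\mu_{qp}$ as a pseudo-control which we optimize for and $\mu_{ad}$ as an adaptive element which will cancel out the model error. Then we can write the dynamics of the tracking error e as:

$$de = \begin{bmatrix} 0 & I \\ -K_P & -K_D \end{bmatrix} e\,dt + \begin{bmatrix} 0 \\ I \end{bmatrix}\Big(\big(\mu_{qp} - \mu_{ad} + \Delta(x)\big)\,dt + \Sigma(x)\,d\xi(t)\Big) = \big(Ae + G(\mu_{qp} - \mu_{ad} + \Delta(x))\big)\,dt + G\Sigma(x)\,d\xi(t), \tag{7}$$

where the matrices A and G are used for ease of notation. The gains $K_P$, $K_D$ should be chosen such that A is Hurwitz. When $\mu_{ad} = \Delta(x)$, the drift modeling error term is canceled out from the error dynamics.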
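As a minimal sketch of the gain-selection requirement above, the following Python snippet builds A and G for the scalar case (n = 1, an assumption made here for brevity) and checks whether A is Hurwitz. The function names and the example gains are illustrative and do not come from the paper.

```python
import cmath

def error_matrices(kp, kd):
    """A and G of the error dynamics for a scalar (n = 1) system."""
    A = [[0.0, 1.0], [-kp, -kd]]
    G = [0.0, 1.0]
    return A, G

def eigenvalues_2x2(A):
    """Eigenvalues of a 2x2 matrix from its characteristic polynomial."""
    trace = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + disc) / 2.0, (trace - disc) / 2.0

def is_hurwitz(A):
    """True if every eigenvalue has a strictly negative real part."""
    return all(lam.real < 0.0 for lam in eigenvalues_2x2(A))

# Hypothetical gains chosen only for illustration.
A, G = error_matrices(kp=4.0, kd=2.0)
print(eigenvalues_2x2(A), is_hurwitz(A))
```

For this 2x2 structure, A is Hurwitz exactly when both gains are positive, so the check mainly guards against sign mistakes when the gains are tuned.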

Next, we require a method for learning or approximating the drift and diffusion terms $\Delta(x)$ and $\Sigma(x)$. Such methods include Bayesian SDE approximation methods [26], NeuralSDEs [27], or differential GP flows [28], to name a few. This model should know what it doesn’t know [29], and should capture both the epistemic uncertainty of the model, i.e., the uncertainty from lack of data, as well as the aleatoric uncertainty, i.e., the uncertainty inherent in the system [30]. We expect that these methods will continue to be improved by the community. We can use the second equation in (3) to generate data points to use for learning these terms in the SDE. In discrete time, the learning problem is formulated as finding a mapping from input data $\bar{X}_t = x(t)$ to output data $\bar{Y}_t = (x_2(t + dt) - x_2(t))/dt - \mu(t)$. Given the $i$-th dataset $D_i = \{\bar{X}_t, \bar{Y}_t\}_{t = 0, dt, \dots, t_i}$ with $i \in \mathbb{N}$, we can construct the $i$-th model $\{m_i(x), \sigma_i(x)\}$, where $m_i(x)$ approximates the drift term $\Delta(x)$ and $\sigma_i(x)$ approximates the diffusion term $\Sigma(x)$. Note that we do not require updating the model at each timestep, which significantly reduces computational load requirements and allows for training more expressive models (e.g., neural networks).
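The data-generation step can be sketched as follows, assuming a scalar velocity state and a finite-difference estimate of the acceleration. The regression target is the measured acceleration minus the applied pseudo-control, which by the second equation in (3) equals the drift modeling error plus noise. The function name and the toy rollout are illustrative assumptions.

```python
def build_dataset(states, velocities, pseudo_controls, dt):
    """Pair each state with the finite-difference drift residual.

    target_t = (x2[t + 1] - x2[t]) / dt - mu[t]
    """
    inputs, targets = [], []
    for t in range(len(velocities) - 1):
        accel = (velocities[t + 1] - velocities[t]) / dt
        inputs.append(states[t])
        targets.append(accel - pseudo_controls[t])
    return inputs, targets

# Noise-free toy rollout with a constant (hypothetical) modeling error.
dt, delta_true = 0.1, 0.5
mu = [1.0, -0.2, 0.3]
x1, x2 = [0.0], [0.0]
for m in mu:
    x1.append(x1[-1] + x2[-1] * dt)
    x2.append(x2[-1] + (m + delta_true) * dt)
states = list(zip(x1, x2))
X, Y = build_dataset(states, x2, mu, dt)
print(Y)
```

In this noise-free rollout every target recovers the constant modeling error up to floating-point rounding; with process noise the targets scatter around it, which is what the Bayesian model is meant to capture.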

In practical terms, in this work we opt for an approximate method for learning the drift and diffusion models, in which we view each data point in the dataset as an independently and identically distributed sample, and set up a single-timestep Bayesian regression problem, in which we model each output as a multivariate Gaussian random variable whose mean approximates the drift term and whose covariance represents the uncertainty. This approximation ignores the SDE nature of (3) and will not be a faithful approximation (see [31] for insightful comments on this problem). However, until Bayesian SDE approximation methods improve, we believe this approach to be reasonable in practice. Methods for producing reliable confidence bounds include a large class of Bayesian neural networks ([32], [33], [34]), Gaussian Processes and their many approximate variants ([35], [36]), and many others. We compare several methods in our experimental results. We leave a more principled learning approach using Bayesian SDE learning methods for future work.
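To make the single-timestep regression view concrete, here is a minimal sketch using conjugate Bayesian linear regression with one scalar weight. This is a stand-in chosen for brevity, not one of the models compared in the paper; the prior precision and noise variance are assumed values. The predictive variance is the sum of an epistemic part, which shrinks with data, and an aleatoric part, which does not.

```python
import math

def fit_bayes_linear(xs, ys, prior_precision=1.0, noise_var=0.01):
    """Gaussian posterior over a scalar weight w in y = w * x + noise."""
    precision = prior_precision + sum(x * x for x in xs) / noise_var
    mean = sum(x * y for x, y in zip(xs, ys)) / noise_var / precision
    return mean, 1.0 / precision

def predict(x, w_mean, w_var, noise_var=0.01):
    """Gaussian predictive mean and standard deviation at input x."""
    mean = w_mean * x
    var = x * x * w_var + noise_var  # epistemic + aleatoric
    return mean, math.sqrt(var)

xs = [0.5, 1.0, 1.5, 2.0]
ys = [2.0 * x for x in xs]  # toy data with true weight 2
w_few = fit_bayes_linear(xs[:1], ys[:1])
w_all = fit_bayes_linear(xs, ys)
print(predict(3.0, *w_few), predict(3.0, *w_all))
```

The predicted standard deviation at the same query point shrinks as more data are used, which is the behavior the controller relies on to become less cautious over time.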

After obtaining the joint model of the drift, $m_i(x)$, and the diffusion, $\sigma_i(x)$, we set the adaptive element to $\mu_{ad} = m_i(x)$ and replace the diffusion by its learned approximation. Then (7) can be written as the following switching SDE:

$$de = \big(Ae + G(\mu_{qp} + \varepsilon_i^m(x))\big)\,dt + G\sigma_i(x)\,d\xi(t), \tag{8}$$

with $\varepsilon_i^m(x) = \Delta(x) - m_i(x)$, where $i \in \mathbb{N}$ is a switching index which updates each time the model is updated. The main problem which we address is how to find a pseudo-control $\mu_{qp}$ which provably drives the tracking error to 0 while simultaneously guaranteeing safety.

Since $\varepsilon_i^m(x)$ is not known a priori, one approach is to assume that $\|\varepsilon_i^m(x)\|$ is bounded by some known term. The size of this bound will depend on the type of model used to represent the uncertainty, its training method, and the distribution of the data $D_i$. See [15] for such an analysis for sparse online Gaussian Processes. For neural networks in general there has been some work on analyzing these bounds [37], [38]. For simplicity, let us assume the modeling error $\varepsilon_i^m(x) = 0$, and instead rely on $\sigma_i(x)$ to fully capture any remaining modeling error in the drift. Then we have the following dynamics:

$$de = \big(Ae + G\mu_{qp}\big)\,dt + G\sigma_i(x)\,d\xi(t). \tag{9}$$

This is valid as long as $\sigma_i(x)$ captures both the epistemic and aleatoric uncertainty accurately. Note also that if the bounds on $\|\varepsilon_i^m(x)\|$ are known, then our results are easily extensible to this case via (8).
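The simplified error dynamics, with a Hurwitz linear drift, a pseudo-control input, and learned diffusion, can be simulated with the Euler-Maruyama scheme. The sketch below assumes a scalar system (n = 1), a constant diffusion value, and a constant pseudo-control; all names and numbers are illustrative assumptions.

```python
import math
import random

def simulate_error_sde(e0, kp, kd, sigma, mu_qp=0.0, dt=0.01, steps=1000, seed=0):
    """Euler-Maruyama rollout of de = (A e + G mu_qp) dt + G sigma dxi, n = 1."""
    rng = random.Random(seed)
    e1, e2 = e0
    for _ in range(steps):
        d_xi = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment
        de1 = e2 * dt
        de2 = (-kp * e1 - kd * e2 + mu_qp) * dt + sigma * d_xi
        e1, e2 = e1 + de1, e2 + de2
    return e1, e2

print(simulate_error_sde((1.0, 0.0), kp=4.0, kd=2.0, sigma=0.0))
print(simulate_error_sde((1.0, 0.0), kp=4.0, kd=2.0, sigma=0.2))
```

With zero diffusion the error decays to the origin, while a nonzero diffusion leaves a residual fluctuation, which is why the stability results in this setting are stated in the mean-square sense.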

A. Stochastic Control Lyapunov Functions for Switched Systems

We establish sufficient conditions on the pseudo-control $\mu_{qp}$ to guarantee convergence of the error process e(t) to 0. The result is a linear constraint similar to deterministic CLFs (e.g., [17]). The difference here is the construction of a stochastic CLF condition for switched systems. The switching is needed to account for online updates to the model as more data is accumulated.

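In general, consider a switched SDE of Itô type [39] defined by:

$$dX(t) = a(t, X(t))\,dt + \sigma_i(t, X(t))\,d\xi(t), \tag{10}$$

where $X \in \mathbb{R}^{n_X}$, $\xi(t)$ is a Wiener process, a(t, X) is an $n_X$-vector function, $\sigma_i(t, X)$ is an $n_X \times n_X$ matrix, and $i \in \mathbb{N}$ is a switching index. The switching index may change a finite number of times in any finite time interval. For each switching index, a(t, X) and $\sigma_i(t, X)$ must satisfy a Lipschitz condition on a domain D with D compact. Then the solution of (10) is a continuous Markov process.

Definition II.1. X(t) is said to be exponentially mean square ultimately bounded uniformly in i if there exist positive constants $K$, $c$, $C$ such that for all $t$, $X_0$, and $i$, we have that $\mathbb{E}_{X_0}\|X(t)\|^2 \le K + C\|X_0\|^2 e^{-ct}$.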

We first restate the following theorem from [15]:

Theorem II.1. Let X(t) be the process defined by the solution to (10), and let V (t, X) be a function of class $C^2$ with respect to X, and class $C^1$ with respect to t. Denote the Itô differential generator by $\mathcal{L}$. If 1) for real ; and 2) for real , and all i; then the process X(t) is exponentially mean square ultimately bounded uniformly in i. Moreover,

Proof. See [15] Theorem 1.

We use Theorem II.1 to derive a stochastic CLF sufficient condition on $\mu_{qp}$ for the tracking error e(t). Consider the stochastic Lyapunov candidate function $V(e) = \frac{1}{2}e^\top P e$, where P is the solution to the Lyapunov equation $A^\top P + PA = -Q$, and Q is any symmetric positive-definite matrix.
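For intuition, the Lyapunov equation $A^\top P + PA = -Q$ can be solved in closed form when the error state is two-dimensional (n = 1) and A has the PD structure with scalar gains. The sketch below assumes that scalar setting and a symmetric Q; the function names and gains are illustrative, and a general implementation would use a numerical Lyapunov solver instead.

```python
def solve_lyapunov_2x2(kp, kd, q11=1.0, q12=0.0, q22=1.0):
    """Solve A^T P + P A = -Q for A = [[0, 1], [-kp, -kd]] and symmetric Q."""
    p12 = q11 / (2.0 * kp)
    p22 = (p12 + q22 / 2.0) / kd
    p11 = kd * p12 + kp * p22 - q12
    return [[p11, p12], [p12, p22]]

def lyapunov_residual(kp, kd, P, Q):
    """Entrywise residual A^T P + P A + Q, which should be zero."""
    A = [[0.0, 1.0], [-kp, -kd]]
    res = [[0.0, 0.0], [0.0, 0.0]]
    for r in range(2):
        for c in range(2):
            at_p = sum(A[k][r] * P[k][c] for k in range(2))
            p_a = sum(P[r][k] * A[k][c] for k in range(2))
            res[r][c] = at_p + p_a + Q[r][c]
    return res

Q = [[1.0, 0.0], [0.0, 1.0]]
P = solve_lyapunov_2x2(kp=4.0, kd=2.0)
print(P, lyapunov_residual(4.0, 2.0, P, Q))
```

Because A is Hurwitz for positive gains, the resulting P is symmetric positive definite, which is what makes the quadratic form a valid Lyapunov candidate.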

Theorem II.2. Let e(t) be the switched stochastic process defined by (9), and let be a positive constant. Suppose for all and the relaxation variable satisfy the inequality:

Then e(t) is exponentially mean-square ultimately bounded uniformly in i. Moreover, if (11) is satisfied with the relaxation variable equal to zero for all i, then e(t) converges to 0 exponentially in the mean-square sense.

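Proof. With $V(e) = \frac{1}{2}e^\top P e$, the Lyapunov candidate function V (e) is bounded above and below by

$$\tfrac{1}{2}\lambda_{\min}(P)\|e\|^2 \le V(e) \le \tfrac{1}{2}\lambda_{\max}(P)\|e\|^2.$$

We have the following Itô differential of the Lyapunov candidate along the error dynamics (9):

$$\mathcal{L}V(e) = -\tfrac{1}{2}e^\top Q e + e^\top P G\,\mu_{qp} + \tfrac{1}{2}\mathrm{tr}\big(G\sigma_i\sigma_i^\top G^\top P\big).$$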

Rearranging, (11) becomes , we see that the conditions for Theorem II.1 are satisfied and e(t) is exponentially mean square ultimately bounded uniformly in i. Moreover,

where $\lambda_{\max}(P)/\lambda_{\min}(P)$ is the condition number of the matrix P. Therefore, if the relaxation variable is zero for all i, e(t) converges to 0 exponentially in the mean square sense.

The relaxation variable allows us to find solutions for $\mu_{qp}$ which may not always strictly satisfy a Lyapunov stability criterion. This allows us to incorporate additional constraints on $\mu_{qp}$ at the cost of losing convergence of the error e to 0. Fortunately, the error will remain bounded, with a bound determined by the largest value of the relaxation variable. In practice we re-optimize for a new relaxation variable at each timestep. This does not affect the result of Theorem II.2 as long as we re-optimize a finite number of times for any given finite interval.

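One highly relevant set of constraints we want to satisfy are control constraints $A_c u \le b_c$, where $A_c$ is a matrix and $b_c$ is a vector. Let $\mu_d = \mu_{rm} + \mu_{pd} - \mu_{ad}$, so that $\mu = \mu_d + \mu_{qp}$. Recall the pre-control law (2), $u = g(x)^{-1}(\mu - \hat{f}(x))$. Then the control constraint is:

$$A_c\,g(x)^{-1}\mu_{qp} \le b_c - A_c\,g(x)^{-1}\big(\mu_d - \hat{f}(x)\big).$$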

Next we formulate additional constraints to guarantee safety.

B. Stochastic Control Barrier Functions for Switched Systems

We leverage recent results on stochastic control barrier functions [40] to derive constraints linear in $\mu_{qp}$ which guarantee the process x(t) satisfies a safety constraint, i.e., $x(t) \in C$ for all $t$, where the safe set $C$ is defined by a locally Lipschitz function $h$ as $C = \{x : h(x) \ge 0\}$ and $\partial C = \{x : h(x) = 0\}$. We first extend the results of [40] to switched stochastic systems.

Definition II.2. Let X(t) be a switched stochastic process defined by (10). Let the function $B : C \to \mathbb{R}$ be locally Lipschitz and twice-differentiable on int(C). If there exist class-K functions $\gamma_1$ and $\gamma_2$ such that for all X, $1/\gamma_1(h(X)) \le B(X) \le 1/\gamma_2(h(X))$, then B(x) is called a candidate control barrier function.

Definition II.3. Let B(x) be a candidate control barrier function. If there exists a class-K function $\gamma_3$ such that $\mathcal{L}B(X) \le \gamma_3(h(X))$ for all i, then B(x) is called a control barrier function (CBF).

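Theorem II.3. Suppose there exists a CBF for the switched stochastic process X(t) defined by (10). If $X(0) \in C$, then $X(t) \in C$ for all $t$ with probability 1.

Proof. [40] Theorem 1 provides a proof of the result for non-switched stochastic processes. Let $\{t_i\}$ denote the switching times of X(t), i.e., when $t \in [0, t_0)$, the process X(t) has diffusion matrix $\sigma_0$, and when $t \in [t_{i-1}, t_i)$ for i > 0, the process X(t) has diffusion matrix $\sigma_i$. If $X(0) \in C$, then $X(t) \in C$ for all $t \in [0, t_0)$ with probability 1, since the process X(t) does not switch in the time interval $[0, t_0)$. By a similar argument, for any i > 0, if $X(t_{i-1}) \in C$ then $X(t) \in C$ for all $t \in [t_{i-1}, t_i)$ with probability 1. This also implies that $X(t_i) \in C$, since X(t) is a continuous Markov process. Then $X(t) \in C$ for all $t \in [t_i, t_{i+1})$ with probability 1. Then by induction, $X(t) \in C$ for all $t$ with probability 1.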

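Next, we establish a linear constraint condition on $\mu_{qp}$ that is sufficient to guarantee safety for (9). Using $x = e + x_{rm}$, rewrite (9) in terms of x(t) as:

$$dx_1 = x_2\,dt, \qquad dx_2 = \big(\mu_{rm} + \mu_{pd} + \mu_{qp}\big)\,dt + \sigma_i(x)\,d\xi(t). \tag{16}$$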

Theorem II.4. Let x(t) be a switched stochastic process defined by (16). Let B(x) be a candidate control barrier function. Let $\gamma$ be a class-K function. Suppose for all $t$, the pseudo-control $\mu_{qp}$ satisfies the inequality:

Then B(x) is a CBF and (17) is a sufficient condition for safety, i.e., if $x(0) \in C$, then $x(t) \in C$ for all $t$ with probability 1.

Proof. We have the following Itô differential of the CBF candidate B(x):

Rearranging (17) it is clear that . Then B(x) is a CBF and the result follows from Theorem II.3.

C. Safety and Stability under Model Adaptation

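We can now construct a CLF-CBF Quadratic Program (QP) in terms of the pseudo-control, incorporating both the adaptive stochastic CLF and CBF conditions, along with control limits (Equation (18)). The QP is subject to three sets of constraints: the adaptive CLF constraint, the adaptive CBF constraint, and the control constraints.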

In practice, several modifications to this QP are often made ([24],[41]). In addition to a relaxation term for the CLF in Theorem II.2, we also include a relaxation term for the CBF. This helps to ensure the QP is feasible and allows for slowing down as much as possible when the safety constraint cannot be avoided due to control constraints, creating, e.g., lower impact collisions. Safety is still guaranteed as long as the relaxation term is less than 0. For an example of guaranteed safety in the presence of this relaxation term see [17], also see [21] for an approach to handling safety with control constraints. The emphasis of this work is on guaranteeing safety in the presence of adaptation so we leave these considerations for future work. Our entire framework is outlined in Algorithm 1.
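The structure of such a QP can be illustrated with a scalar pseudo-control, for which the solution is available in closed form. The sketch below is not the solver used in this work (the experiments use OSQP [42]) and makes several assumptions: a single scalar decision variable, a relaxed CLF constraint `a_clf * mu + b_clf <= slack` with `slack >= 0`, a hard CBF constraint `a_cbf * mu + b_cbf <= 0`, box control limits, and the cost `mu**2 + penalty * slack**2`. All names and coefficients are hypothetical.

```python
def solve_scalar_clf_cbf_qp(a_clf, b_clf, a_cbf, b_cbf, u_min, u_max, penalty=100.0):
    """Minimize mu^2 + penalty * slack^2 under a relaxed CLF and a hard CBF."""
    # Hard constraints (CBF and control limits) define an interval for mu.
    lo, hi = u_min, u_max
    if a_cbf > 0.0:
        hi = min(hi, -b_cbf / a_cbf)
    elif a_cbf < 0.0:
        lo = max(lo, -b_cbf / a_cbf)
    elif b_cbf > 0.0:
        return None  # CBF constraint cannot be met by any mu
    if lo > hi:
        return None  # CBF and control limits conflict
    # Unconstrained minimizer of the convex cost with the optimal slack.
    if b_clf <= 0.0:
        mu_free = 0.0
    else:
        mu_free = -penalty * a_clf * b_clf / (1.0 + penalty * a_clf ** 2)
    # A convex 1-D cost is minimized on an interval by clipping.
    mu = min(max(mu_free, lo), hi)
    slack = max(0.0, a_clf * mu + b_clf)
    return mu, slack

# The CLF asks for mu <= -2, but the CBF only allows mu >= -1.
print(solve_scalar_clf_cbf_qp(1.0, 2.0, -1.0, -1.0, -5.0, 5.0))
```

In the conflicting case the solution sits on the CBF boundary and the CLF is relaxed instead, which mirrors the priority of safety over tracking in the formulation above.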

III. APPLICATION TO FAST AUTONOMOUS DRIVING

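In this section we validate BALSA on a kinematic bicycle model for car-like vehicles. We model the state as $x = [p_x, p_y, \theta, v]^\top$, i.e., position in x and y, heading, and velocity respectively, with dynamics $\dot{p}_x = v\cos\theta$, $\dot{p}_y = v\sin\theta$, $\dot{\theta} = v\tan(\delta)/L$, $\dot{v} = a$, where $a$ is the input acceleration, $L$ is the vehicle length, and $\delta$ is the steering angle. We employ a simple transformation to obtain dynamics in the form of (1). Let $z = [z_1^\top, z_2^\top]^\top$ where $z_1 = [p_x, p_y]^\top$, $z_2 = \dot{z}_1$, and $c = \tan(\delta)/L$. Let the controls be $u = [c, a]^\top$. Then the dynamics of $z$ fit the canonical form of (1), with $\dot{z}_2 = g\,u$ and $g = \begin{bmatrix} -v^2\sin\theta & \cos\theta \\ v^2\cos\theta & \sin\theta \end{bmatrix}$, which is invertible whenever $v \neq 0$. To ascertain the importance of learning and adaptation, we add the following disturbance to the model to use as a “true” model: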
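A minimal sketch of the kinematic bicycle model and of the control gain obtained after the change of variables is given below. It assumes the standard rear-axle form of the model, forward-Euler integration, and steering entering through tan(steer)/L; the function names are illustrative, and the wheelbase of 0.48 m is the value reported for the hardware platform.

```python
import math

def bicycle_step(state, accel, steer, length, dt):
    """One forward-Euler step of the kinematic bicycle model."""
    px, py, theta, v = state
    return (px + v * math.cos(theta) * dt,
            py + v * math.sin(theta) * dt,
            theta + v * math.tan(steer) / length * dt,
            v + accel * dt)

def control_gain(theta, v):
    """Matrix g mapping [tan(steer)/L, accel] to planar acceleration."""
    return [[-v * v * math.sin(theta), math.cos(theta)],
            [v * v * math.cos(theta), math.sin(theta)]]

state = (0.0, 0.0, 0.0, 1.0)
for _ in range(10):
    state = bicycle_step(state, accel=0.0, steer=0.0, length=0.48, dt=0.1)
g = control_gain(theta=0.3, v=2.0)
det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
print(state, det)
```

The determinant of the gain matrix is the negative squared speed, so the inverse required by the pre-control law exists only while the vehicle is moving; a practical implementation would need to handle speeds near zero separately.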

This constitutes a non-linearity in the forward velocity and a tendency to drift to the right.

We use the following barrier function for pointcloud-based obstacles. Similar to [17], we design this barrier function with an extra component to account for position-based constraints which have a relative degree greater than 1. This is done by including the time-derivative of the position-based constraint as an additional term in the barrier function, which penalizes velocities (or higher order derivatives) leading to a decrease of the level set function h. Let our safety set be $C = \{x : h(x) \ge 0\}$ with $h(x) = \|p - p_o\|_2 - r$, where $p = [p_x, p_y]^\top$ is the vehicle position, $p_o$ is the position of an obstacle, and r > 0 is the radius of a circle around the obstacle. Then construct a barrier function $B(x) = 1/\big(\gamma_p h(x) + \dot{h}(x)\big)$. As shown by [24], B(x) is a CBF, where $\gamma_p$ helps to control the rate of convergence. We chose
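The barrier construction described above can be sketched for a single circular obstacle as follows. The exact form used here, a distance-based level set h, its time derivative along the current velocity, and a reciprocal barrier B = 1 / (gamma * h + h_dot), is an assumption consistent with the description in the text; the names and numerical values are illustrative.

```python
import math

def barrier(pos, vel, obstacle, radius, gamma):
    """Return h, its time derivative, and the reciprocal barrier value."""
    dx, dy = pos[0] - obstacle[0], pos[1] - obstacle[1]
    dist = math.hypot(dx, dy)
    h = dist - radius                      # positive outside the circle
    h_dot = (dx * vel[0] + dy * vel[1]) / dist
    denom = gamma * h + h_dot
    # The barrier blows up as the combined term approaches zero.
    return h, h_dot, (1.0 / denom if denom > 0.0 else math.inf)

# Vehicle 3 m from an obstacle of radius 1 m, at three different velocities.
print(barrier((3.0, 0.0), (1.0, 0.0), (0.0, 0.0), 1.0, 1.0))   # moving away
print(barrier((3.0, 0.0), (-1.0, 0.0), (0.0, 0.0), 1.0, 1.0))  # approaching
print(barrier((3.0, 0.0), (-2.0, 0.0), (0.0, 0.0), 1.0, 1.0))  # too fast
```

At the same distance, the barrier value grows as the approach speed increases, so fast approaches are penalized well before the vehicle reaches the obstacle boundary.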

A. Validation of BALSA in Simulation

One iteration of the algorithm for this problem takes less than 4 ms on a 3.7 GHz Intel Core i7-8700K CPU, in Python code which has not been optimized for speed. We make our code publicly available. Because training the model occurs on a separate thread and can be performed anytime online, we do not include the model training time in this benchmark. We use OSQP [42] as our QP solver.

In Figure 2, we compare BALSA with several different baseline algorithms. We use a neural network trained with dropout and a negative-log-likelihood loss function for capturing the uncertainty [34]. We place several obstacles in the direct path of the reference trajectory. We also place velocity barriers for driving too fast or too slow. We observe that the vehicle using our algorithm maintains good tracking errors while avoiding barriers and maintaining safety, while the other approaches suffer from various drawbacks. The adaptive controller (ad) and PD controller (pd) violate safety constraints. The (qp) controller with an inaccurate model also violates constraints and exhibits highly suboptimal behavior (Figure 3). A robust (rob) formulation, which uses a fixed robust bound meant to bound any model uncertainty [17], does not violate safety constraints but is too conservative and non-adaptive, and has trouble tracking the reference trajectory. In contrast, BALSA adapts to model error with guaranteed safety. We also plot the model uncertainty and error in Figure 3.

Fig. 2: Comparison of the performance of BALSA and four baseline algorithms in tracking and avoiding barrier regions (red ovals). ref is the reference trajectory. ad is an adaptive controller; qp is a non-adaptive safety controller; pd is a proportional derivative controller; rob is a robust controller which uses a fixed bound to compensate for modeling errors. balsa is the full adaptive CLF-CBF-QP approach outlined in this paper and in Algorithm 1.

Fig. 3: Top: Velocities of each algorithm. Red dotted line indicates safety barrier. Middle: Output prediction error of model, decreasing with time. Solid and dashed lines indicate both output dimensions. Bottom: Model uncertainty, also decreasing with time. Predictions are made after 10 seconds to accumulate enough data to train the network. During this time we choose an upper bound for the uncertainty.

B. Comparing Different Modeling Methods in Simulation

Next we compared the performance of BALSA on three different Bayesian modeling algorithms: Gaussian Processes, a Neural Network with dropout, and ALPaCA [33], a meta-learning approach which uses a hybrid neural network with Bayesian regression on the last layer. For all methods we retrained the model intermittently, every 40 new datapoints. In addition to the current state, we also included as input to the model the previous control, angular velocity in yaw, and the current roll and pitch of the vehicle. For the GP we reoptimized hyperparameters with each training. For the dropout NN, we used 4 fully-connected layers with 256 hidden units each, and trained for 50 epochs with a batch size of 64. Lastly, for ALPaCA we used 2 hidden layers, each with 128 units, and 128 basis functions. We used a batch size of 150, 20 context data points, and 20 test data points. The model was trained using 100 gradient steps, and online adaptation (during prediction) was performed using 20 of the most recent context data points with the current observation (see [33] for details of the meta-learning capabilities of ALPaCA). At each training iteration we retrain both the neural network and the last Bayesian linear regression layer. Figure 4 and Table I show a comparison of tracking error for these methods. We found GPs to be computationally intractable with more than 500 data points, although they exhibited good performance. Neural networks with dropout converged quickly and were efficient to train and run. ALPaCA exhibited slightly slower convergence but good tracking as well.

Fig. 4: Comparison of adaptation performance in a Gazebo simulation using three different probabilistic model learning methods.

TABLE I: Average tracking error in position for different modeling methods in sim, split into the first minute and second minute.
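The intermittent retraining schedule can be sketched with a small data buffer that signals when a new model should be trained. The retraining interval of 40 datapoints follows the text; the optional cap on the buffer size, the class name, and the method names are assumptions made for illustration, and the actual training call (which runs on a separate thread in this work) is left out.

```python
class RetrainBuffer:
    """Collects datapoints and signals when the model should be retrained."""

    def __init__(self, retrain_every=40, max_size=500):
        self.retrain_every = retrain_every
        self.max_size = max_size  # assumed cap, e.g. to keep a GP tractable
        self.data = []
        self.new_since_training = 0

    def add(self, x, y):
        """Store one datapoint; return True when retraining is due."""
        self.data.append((x, y))
        if len(self.data) > self.max_size:
            self.data.pop(0)  # drop the oldest point
        self.new_since_training += 1
        if self.new_since_training >= self.retrain_every:
            self.new_since_training = 0
            return True
        return False

buffer = RetrainBuffer()
triggers = sum(buffer.add(float(k), 0.0) for k in range(100))
print(triggers, len(buffer.data))
```

Each trigger corresponds to one switch of the model index in the switched-system analysis, so the number of switches in any finite interval stays finite as long as data arrive at a finite rate.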

C. Hardware Experiments on Martian Terrain

To validate that BALSA meets real-time computational requirements, we conducted hardware experiments on the platform depicted in Figure 5. We used an off-the-shelf RC car (Traxxas Xmaxx) in 1/5th scale (wheelbase 0.48 m), equipped with sensors such as a 3D LiDAR (Velodyne VLP-16) for obstacle avoidance and a stereo camera (RealSense T265) for on-board state estimation. The power train consists of a single brushless DC motor, which drives the front and rear differentials, operating in current control mode for controlling acceleration. Steering commands were fed to a servo position controller. The on-board computer (Intel NUC i7) ran Ubuntu 18.04 and ROS [43].

Experiments were conducted in a Martian simulation environment, which contains sandy soil, gravel, rocks, and rough terrain. We gave figure-eight reference trajectories at 2m/s and evaluated the vehicle’s tracking performance (Figure 5). Due to large achieving good tracking performance at higher speeds is difficult. We observed that BALSA is able to adapt to bumps and changes in friction, wheel slip, etc., exhibiting improved tracking performance over a non-adaptive baseline (Table II).

We also evaluated the safety of BALSA under adaptation. We used LiDAR pointclouds to create barriers at each LiDAR return location. Although this creates a large number of constraints, the QP solver is able to handle these in real-time. Figure 6 shows what happens when an obstacle is placed in the path of the reference trajectory. The vehicle successfully slows down and comes to a stop if needed, avoiding the obstacle altogether.

TABLE II: Mean, standard deviation, and max tracking error on our rover platform for a figure-8 task.

Fig. 5: Left: A high-speed rover vehicle. Right: Figure-8 tracking on our rover platform on rough and sandy terrain, comparing adaptation vs. no adaptation.

Fig. 6: Vehicle avoids collision despite localization drift and unmodeled dynamics. Blue line is the reference trajectory, colored pluses are the vehicle pose, colored points are obstacles. Colors indicate time, from blue (earlier) to red (later). Note that localization drift results in the obstacles appearing to shift position. Green circle indicates location of the obstacle at the last timestep. Despite this drift the vehicle does not collide with the obstacle.

IV. CONCLUSION

In this work, we have described a framework for safe, fast, and computationally efficient probabilistic learning-based control. The proposed approach satisfies several important real-world requirements and takes steps towards enabling safe deployment of high-dimensional data-driven controls and planning algorithms. Further development for other types of robots, including drones, legged robots, and manipulators, is straightforward. Incorporating better uncertainty-representing modeling methods and training on higher-dimensional data (vision, LiDAR, etc.) will also be a fruitful direction of research.

ACKNOWLEDGEMENT

The authors would like to thank Joel Burdick’s group for their hardware support. This research was partially carried out at the Jet Propulsion Laboratory (JPL), California Institute of Technology, and was sponsored by the JPL Year Round Internship Program and the National Aeronautics and Space Administration (NASA). Jennifer Nguyen was supported in part by NASA EPSCoR Research Cooperative Agreement WV-80NSSC17M0053 and NASA West Virginia Space Grant Consortium, Training Grant #NX15AI01H. Evangelos A. Theodorou was supported by the C-STAR Faculty Fellowship at Georgia Institute of Technology. Copyright ©2019. All rights reserved.

REFERENCES

[1] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton et al., “Mastering the game of go without human knowledge,” Nature, vol. 550, no. 7676, p. 354, 2017.

[2] NASA, “Where is Curiosity? - NASA Mars Curiosity Rover,” 2018. [Online]. Available: https://mars.nasa.gov/msl/mission/whereistherovernow/

[3] NASA, “Opportunity Updates,” 2018. [Online]. Available: https://mars.nasa.gov/mer/mission/rover-status/opportunity/recent/all/

[4] E. Klein, E. Nilsen, A. Nicholas, C. Whetsel, J. Parrish, R. Mattingly, and L. May, “The mobile mav concept for mars sample return,” in 2014 IEEE Aerospace Conference, March 2014, pp. 1–9.

[5] A. Nelessen, C. Sackier, I. Clark, P. Brugarolas, G. Villar, A. Chen, A. Stehura, R. Otero, E. Stilley, D. Way, K. Edquist, S. Mohan, C. Giovingo, and M. Lefland, “Mars 2020 entry, descent, and landing system overview,” in 2019 IEEE Aerospace Conference, March 2019, pp. 1–20.

[6] N. Wagener, C. Cheng, J. Sacks, and B. Boots, “An online learning approach to model predictive control,” CoRR, vol. abs/1902.08967, 2019. [Online]. Available: http://arxiv.org/abs/1902.08967

[7] G. Williams, P. Drews, B. Goldfain, J. M. Rehg, and E. A. Theodorou, “Information-theoretic model predictive control: Theory and applications to autonomous driving,” IEEE Transactions on Robotics, vol. 34, no. 6, pp. 1603–1622, 2018.

[8] F. Berkenkamp, M. Turchetta, A. Schoellig, and A. Krause, “Safe model-based reinforcement learning with stability guarantees,” in Advances in neural information processing systems, 2017, pp. 908–918.

[9] S.-K. Kim, R. Thakker, and A.-A. Agha-Mohammadi, “Bi-directional value learning for risk-aware planning under uncertainty,” IEEE Robotics and Automation Letters, vol. 4, no. 3, pp. 2493–2500, 2019.

[10] S. Ross, G. Gordon, and D. Bagnell, “A reduction of imitation learning and structured prediction to no-regret online learning,” in Proceedings of the fourteenth international conference on artificial intelligence and statistics, 2011, pp. 627–635.

[11] C. J. Ostafew, A. P. Schoellig, and T. D. Barfoot, “Robust Constrained Learning-based NMPC enabling reliable mobile robot path tracking,” The International Journal of Robotics Research, vol. 35, no. 13, pp. 1547–1563, nov 2016. [Online]. Available: http://journals.sagepub.com/doi/10.1177/0278364916645661

[12] K. Pereida and A. P. Schoellig, “Adaptive Model Predictive Control for High-Accuracy Trajectory Tracking in Changing Conditions,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, oct 2018, pp. 7831–7837. [Online]. Available: https://ieeexplore.ieee.org/document/8594267/

[13] L. Hewing, J. Kabzan, and M. N. Zeilinger, “Cautious Model Predictive Control using Gaussian Process Regression,” arXiv, may 2017. [Online]. Available: http://arxiv.org/abs/1705.10702

[14] G. Shi, X. Shi, M. O’Connell, R. Yu, K. Azizzadenesheli, A. Anandkumar, Y. Yue, and S.-J. Chung, “Neural Lander: Stable Drone Landing Control using Learned Dynamics,” arXiv, nov 2018. [Online]. Available: http://arxiv.org/abs/1811.08027

[15] G. Chowdhary, H. A. Kingravi, J. P. How, and P. A. Vela, “Bayesian Nonparametric Adaptive Control Using Gaussian Processes,” IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 3, pp. 537–550, mar 2015. [Online]. Available: http://ieeexplore.ieee.org/document/6823109/

[16] Q. Nguyen and K. Sreenath, “Optimal Robust Control for Bipedal Robots through Control Lyapunov Function based Quadratic Programs.” Robotics: Science and Systems, 2015.

[17] Q. Nguyen and K. Sreenath, “Optimal robust control for constrained nonlinear hybrid systems with application to bipedal locomotion,” in 2016 American Control Conference (ACC). IEEE, 2016, pp. 4807–4813.

[18] R. Cheng, G. Orosz, R. M. Murray, and J. W. Burdick, “End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks,” arXiv, mar 2019. [Online]. Available: http://arxiv.org/abs/1903.08792

[19] Q. Nguyen and K. Sreenath, “L1 adaptive control for bipedal robots with control Lyapunov function based quadratic programs,” in 2015 American Control Conference (ACC). IEEE, jul 2015, pp. 862–867. [Online]. Available: http://ieeexplore.ieee.org/document/7170842/

[20] A. J. Taylor, V. D. Dorobantu, M. Krishnamoorthy, H. M. Le, Y. Yue, and A. D. Ames, “A Control Lyapunov Perspective on Episodic Learning via Projection to State Stability,” arXiv, mar 2019. [Online]. Available: http://arxiv.org/abs/1903.07214

[21] T. Gurriet, A. Singletary, J. Reher, L. Ciarletta, E. Feron, and A. Ames, “Towards a framework for realizable safety critical control through active set invariance,” in Proceedings of the 9th ACM/IEEE International Conference on Cyber-Physical Systems. IEEE Press, 2018, pp. 98–106.

[22] V. Azimi and P. A. Vela, “Robust adaptive quadratic programming and safety performance of nonlinear systems with unstructured uncertainties,” in 2018 IEEE Conference on Decision and Control (CDC). IEEE, 2018, pp. 5536–5543.

[23] V. Azimi and P. A. Vela, “Performance reference adaptive control: A joint quadratic programming and adaptive control framework,” in 2018 Annual American Control Conference (ACC). IEEE, 2018, pp. 1827–1834.

[24] Q. Nguyen and K. Sreenath, “Exponential Control Barrier Functions for enforcing high relative-degree safety-critical constraints,” in 2016 American Control Conference (ACC). IEEE, jul 2016, pp. 322–328. [Online]. Available: http://ieeexplore.ieee.org/document/7524935/

[25] A. D. Ames, K. Galloway, K. Sreenath, and J. W. Grizzle, “Rapidly exponentially stabilizing control Lyapunov functions and hybrid zero dynamics,” IEEE Transactions on Automatic Control, vol. 59, no. 4, pp. 876–891, 2014.

[26] A. Look and M. Kandemir, “Differential Bayesian neural nets,” arXiv preprint arXiv:1912.00796, 2019.

[27] X. Liu, S. Si, Q. Cao, S. Kumar, and C.-J. Hsieh, “Neural SDE: Stabilizing neural ODE networks with stochastic noise,” arXiv preprint arXiv:1906.02355, 2019.

[28] P. Hegde, M. Heinonen, H. Lähdesmäki, S. Kaski et al., “Deep learning with differential Gaussian process flows,” in International Conference on Artificial Intelligence and Statistics. PMLR, 2019.

[29] L. Li, M. L. Littman, T. J. Walsh, and A. L. Strehl, “Knows what it knows: a framework for self-aware learning,” Machine learning, vol. 82, no. 3, pp. 399–443, 2011.

[30] C. J. Roy and W. L. Oberkampf, “A comprehensive framework for verification, validation, and uncertainty quantification in scientific computing,” Computer Methods in Applied Mechanics and Engineering, vol. 200, no. 25-28, pp. 2131–2144, jun 2011.

[31] T. Lew, A. Sharma, J. Harrison, and M. Pavone, “On the Problem of Reformulating Systems with Uncertain Dynamics as a Stochastic Differential Equation,” Technical Report, 2020. [Online]. Available: http://asl.stanford.edu/wp-content/papercite-data/pdf/dynsSDE.pdf

[32] D. Hafner, D. Tran, T. Lillicrap, A. Irpan, and J. Davidson, “Reliable Uncertainty Estimates in Deep Neural Networks using Noise Contrastive Priors,” arXiv, jul 2018. [Online]. Available: http://arxiv.org/abs/1807.09289

[33] J. Harrison, A. Sharma, and M. Pavone, “Meta-Learning Priors for Efficient Online Bayesian Regression,” arXiv, jul 2018. [Online]. Available: http://arxiv.org/abs/1807.08912

[34] Y. Gal and Z. Ghahramani, “Dropout as a Bayesian approximation: Representing model uncertainty in deep learning,” in International Conference on Machine Learning, 2016, pp. 1050–1059.

[35] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. De Freitas, “Taking the human out of the loop: A review of Bayesian optimization,” Proceedings of the IEEE, vol. 104, no. 1, pp. 148–175, 2015.

[36] Y. Pan, X. Yan, E. A. Theodorou, and B. Boots, “Prediction under uncertainty in sparse spectrum Gaussian processes with applications to filtering and control,” in Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR.org, 2017, pp. 2760–2768.

[37] D. Yarotsky, “Error bounds for approximations with deep ReLU networks,” Neural Networks, vol. 94, pp. 103–114, 2017.

[38] G. Shi, X. Shi, M. O’Connell, R. Yu, K. Azizzadenesheli, A. Anandkumar, Y. Yue, and S.-J. Chung, “Neural lander: Stable drone landing control using learned dynamics,” in 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 9784–9790.

[39] R. Khasminskii, Stochastic stability of differential equations. Springer Science & Business Media, 2011, vol. 66.

[40] A. Clark, “Control barrier functions for complete and incomplete information stochastic systems,” in 2019 American Control Conference (ACC), July 2019, pp. 2928–2935.

[41] A. D. Ames, X. Xu, J. W. Grizzle, and P. Tabuada, “Control barrier function based quadratic programs for safety critical systems,” IEEE Transactions on Automatic Control, vol. 62, no. 8, pp. 3861–3876, 2016.

[42] B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd, “OSQP: An operator splitting solver for quadratic programs,” ArXiv e-prints, Nov. 2017.

[43] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A. Y. Ng, “ROS: an open-source robot operating system,” in ICRA workshop on open source software, vol. 3, no. 3.2. Kobe, Japan, 2009, p. 5.
