Safe Multi-Agent Interaction through Robust Control Barrier Functions with Learned Uncertainties

2020·arXiv

Abstract

R. Cheng, A. D. Ames, and J. W. Burdick are with the Department of Mechanical and Civil Engineering, California Institute of Technology. M. J. Khojasteh is with the Department of Electrical Engineering, California Institute of Technology. {rcheng,mjkhojas,ames,jwb}@caltech.edu

Fig. 1: Diagram overviewing the control structure. Our approach guarantees safety by utilizing a Bayesian Inference Module to learn dynamic uncertainties, and handles them with our proposed Robust CBF module.

the Multi-Agent CBF while maintaining the computational efficiency of the underlying controller (i.e., a quadratic program).

Our approach first focuses on learning high-confidence polytopic bounds on the (possibly coupled) uncertainties in both the robot dynamics and the other agents’ dynamics. To achieve this, we use Matrix-Variate Gaussian Processes (MVG) and optimize their hyperparameters offline from interaction data; this allows us to predict ellipsoidal uncertainty in our dynamics online, which we convert to an uncertainty polytope given a desired confidence level. Using these polytopic bounds, we formulate a robust CBF as a min-max optimization problem over the robot controls and the potential uncertainties, respectively. We then transform this min-max problem into a quadratic program that can be efficiently solved to find a safe control action that is robust with respect to our estimated uncertainty. See Fig. 1 for an overview of our approach.

Organization: Sec. II reviews related work in the safe collision avoidance literature. Sec. III provides background on CBFs, which are used to guarantee safety, and on Matrix-Variate Gaussian Processes, which are used to estimate correlated uncertainties. Sec. IV introduces the robust multi-agent CBF and shows how it can be solved efficiently under given uncertainty bounds. Sec. V examines how to learn these confidence bounds on the uncertainty. Finally, Sec. VI presents simulation results illustrating the benefits of our algorithm and verifying the safety of our controller.

Fig. 2: Sample path of a multi-agent system based on the nominal CBF (cf. [6]) and our proposed Robust CBF. The robot (blue) tries to navigate from a start position to a random goal position while avoiding collisions with other agents (red). Approximately half of the other agents blindly travel towards their own randomly chosen goals, while the rest exhibit varying degrees of collision-avoidance behavior (the robot does not know their behavior a priori). For more details and simulation results, see Sec. VI-A. (a) Initial robot/environment configuration, (b) Intermediate configuration, (c) Intermediate configuration showing that the nominal CBF controller experiences a collision (top), while the robust CBF avoids collision (bottom). (d) Final configuration before the robot reaches its goal position (star). See https://youtu.be/hXg5kZO86Lw for the simulation videos.

Multi-agent collision avoidance is a long-studied problem, with different approaches proposed for enabling safe control in varying situations. Velocity obstacles are a popular approach that limits control actions to a set of “safe” actions, though its assumption of constant velocity and linear dynamics is limiting [10], [11]. Related works in this direction have loosened these assumptions, but require significant sampling of the action space and do not incorporate dynamic uncertainty [12], [13]. More recently, Buffered Voronoi Cells (BVC) have been proposed as a tool to provide safety guarantees with only positional information, though these guarantees hold only under linear dynamics and without uncertainty [14], [15]. Also, [16]–[18] provide safety guarantees under worst-case disturbances by solving the Hamilton-Jacobi-Isaacs equation to obtain a minimally invasive control law. However, the heavy computational expense prohibits application to large-scale multi-agent systems.

Reinforcement learning methods that directly learn actions in multi-agent settings in response to the observed environment have recently emerged [19], [20]. However, these methods provide no formal guarantees of safety, and as such are prone to collision in novel environments. Furthermore, they have not been shown to scale to settings with many agents. To ensure safety in reinforcement learning settings, [21] combined a safe backup policy with the learned policies. [22] incorporated safety constraints into the reinforcement learning framework, although this work considered only decoupled uncertainties and was limited to polytopic safety constraints. Bayesian inference was used in [23]–[25] to learn system dynamics online while ensuring safety (with high probability) using Control Barrier Functions for a single agent.

Control Barrier Functions (CBF) are a tool for enforcing set invariance of dynamical systems [26]–[28], and they have been used to guarantee collision avoidance in multi-agent settings by projecting desired actions to the closest (in the least-squares sense) safe actions according to a CBF condition [29]–[31]. However, defining a valid CBF for general systems remains a challenge, especially in cases with significant uncertainty. Recent works have looked at learning CBFs for general systems, either implicitly or explicitly [32], [33]. A multi-agent CBF was defined explicitly for multi-agent systems in the case of continuous-time linear dynamics [6], [7]. However, this multi-agent CBF does not incorporate uncertainty or nonlinearity in the dynamics. Other work on robust CBFs deals only with highly conservative worst-case bounds [8]. We seek to address this gap in this paper.

Our work builds upon the current literature by computing the minimally invasive action necessary to maintain safety in a computationally efficient manner for discrete-time systems while incorporating nonlinear dynamics and learned uncertainty.

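Our robotic system is represented by nonlinear control-affine dynamics in discrete time:

$$x_{t+1} = \begin{bmatrix} p_{t+1} \\ v_{t+1} \\ z_{t+1} \end{bmatrix} = \begin{bmatrix} f_p(x_t) \\ f_v(x_t) \\ f_z(x_t) \end{bmatrix} + \begin{bmatrix} g_p(x_t) \\ g_v(x_t) \\ g_z(x_t) \end{bmatrix} u_t + \begin{bmatrix} d_p(x_t) \\ d_v(x_t) \\ d_z(x_t) \end{bmatrix}, \qquad (1)$$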

where $p$, $v$, and $z$ denote position, velocity, and other states, respectively. Here, $f_j$, $g_j$, and $d_j$ are real-valued functions for $j \in \{p, v, z\}$, and $u \in U$, where $U := \{u : |u| \le u_{\max}\}$. The functions $f(x)$ and $g(x)$ are assumed to be known, whereas $d(x)$ represents unknown uncertainty in the dynamics, which we model with a Gaussian process [cf. Sec. III-B]. We assume that this system has relative degree 2 with respect to the positional output $p$; in discrete time, this directly implies that $g_p(x) = 0$. Similarly, we represent each of the other agents within our multi-agent system with dynamics

$$x^{h_i}_{t+1} = f^{h_i}(x^{h_i}_t) + d^{h_i}(x^{h_i}_t), \qquad (2)$$

where $i$ indexes each of the other agents in our system. We assume the control inputs of the other agents are (uncertain) functions of their states at the given time, so we do not show control inputs explicitly in (2).

As our robot is interacting with other unknown agents, it is important to account for the uncertainties $d$ when considering safety via CBFs. For the rest of the paper, we assume that we perfectly observe each agent’s current state $x_t$, but do not know (i.e., can only estimate) their uncertain dynamics $d$ and $d^{h_i}$.

A. Control Barrier Functions

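Consider a safe set $\mathcal{C} \subset \mathbb{R}^n$, defined as the super-level set of a continuously differentiable function $h: \mathbb{R}^n \to \mathbb{R}$:

$$\mathcal{C} = \{x \in \mathbb{R}^n : h(x) \ge 0\}. \qquad (3)$$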

To maintain safety during the learning process, the system state must always remain within the safe set $\mathcal{C}$ (i.e., the set $\mathcal{C}$ must be forward invariant with respect to the system dynamics). A set $\mathcal{C}$ is forward invariant if for every $x_0 \in \mathcal{C}$, $x_t \in \mathcal{C}$ for all $t \ge 0$. In our multi-agent setting, this set could include all states from which collision can be avoided given the robot’s input bounds. Control barrier functions use a Lyapunov-like argument to provide a sufficient condition for forward invariance of the safe set $\mathcal{C}$ under the controlled dynamics.
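To make the discrete-time condition concrete, the sketch below applies it to a deliberately simple, hypothetical system that is not from this paper: the scalar integrator $x_{t+1} = x_t + u_t$ with $h(x) = x$, so that $\mathcal{C} = \{x \ge 0\}$. Requiring $h(x_{t+1}) + (\eta - 1)h(x_t) \ge 0$, a condition of the kind formalized in Definition 1, reduces to $u_t \ge -\eta x_t$, and the least-squares closest safe input to a desired input is then a simple clamp. The function names are illustrative only.

```python
def cbf_filter_1d(x, u_des, eta):
    """Minimally invasive input for the toy system x_{t+1} = x_t + u_t, h(x) = x.

    The barrier condition h(x_{t+1}) + (eta - 1) h(x_t) >= 0 becomes
    x + u + (eta - 1) x >= 0, i.e. u >= -eta * x, so the closest safe
    input to u_des (in the least-squares sense) is a clamp from below.
    """
    return max(u_des, -eta * x)


def simulate(x0, u_des, eta, steps):
    """Roll out the filtered toy system and return every visited state."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        u = cbf_filter_1d(x, u_des, eta)
        x = x + u
        xs.append(x)
    return xs


# The desired input pushes steadily toward the unsafe region x < 0.
traj = simulate(x0=1.0, u_des=-0.5, eta=0.5, steps=10)
print(min(traj) >= 0.0)  # True: the safe set {x >= 0} is never left
```

When the constraint is active the state contracts as $x_{t+1} = (1 - \eta)x_t$ instead of crossing the boundary, which is the forward-invariance behavior the barrier condition is meant to produce.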

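Definition 1: Given a set $\mathcal{C}$ defined by (3), the continuously differentiable function $h: \mathbb{R}^n \to \mathbb{R}$ is a discrete-time control barrier function (CBF) for dynamical system (1) if there exists $\eta \in [0, 1]$ such that for all $x_t \in \mathcal{C}$,

$$\sup_{u_t \in U} \big[ h(x_{t+1}) + (\eta - 1)\, h(x_t) \big] \ge 0. \qquad (4)$$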

If a function $h(x)$ is a CBF, then there exists a controller such that the set $\mathcal{C}$ is forward invariant [27], [34]. In other words, system safety is guaranteed by ensuring satisfaction of condition (4). Our goal is to ensure safety by computing, online, a minimally invasive control action that satisfies (4). In particular, we utilize the following multi-agent CBF inspired by [6]:

$$h(x) = \frac{\Delta p^T \Delta v}{\|\Delta p\|} + \sqrt{a_{\max}\left(\|\Delta p\| - D_s\right)}, \qquad (5)$$

where $a_{\max}$ represents our robot’s maximum acceleration in the collision direction, $D_s$ is the collision margin, $\Delta p$ is the positional difference between the agents, and $\Delta v$ is the velocity difference between the agents.
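The pairwise barrier can be evaluated directly from the relative position and velocity of two agents. This sketch assumes (5) has the form $h = \Delta p^T \Delta v / \|\Delta p\| + \sqrt{a_{\max}(\|\Delta p\| - D_s)}$, following the structure of [6], with $\Delta p$ and $\Delta v$ taken as robot minus other agent; the function names are hypothetical. Returning $-\infty$ when the agents are already inside the margin $D_s$ is our own convention, since the square root is undefined there.

```python
import math


def multi_agent_cbf(dp, dv, a_max, d_s):
    """Evaluate the assumed pairwise barrier h for relative position/velocity.

    dp, dv: sequences (e.g. 2-D), dp = p_robot - p_other, dv = v_robot - v_other.
    Returns -inf when the agents are at or inside the collision margin d_s.
    """
    dist = math.sqrt(sum(c * c for c in dp))
    if dist == 0.0 or dist < d_s:
        return -math.inf  # inside the margin: treat as unsafe
    # Rate of change of the separation (negative when the agents approach).
    v_hat = sum(a * b for a, b in zip(dp, dv)) / dist
    return v_hat + math.sqrt(a_max * (dist - d_s))


def cbc(h_next, h_now, eta):
    """Discrete-time barrier condition value h(x_{t+1}) + (eta - 1) h(x_t)."""
    return h_next + (eta - 1.0) * h_now


# Agents 5 m apart with relative velocity (-1, 0): closing speed 0.6 m/s.
h = multi_agent_cbf(dp=(3.0, 4.0), dv=(-1.0, 0.0), a_max=1.0, d_s=1.0)
print(round(h, 6))  # 1.4
```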

The work [6] introduced a CBF similar to (5) for continuous-time linear systems, such that amax can be determined easily. In Sec. IV, under an appropriate assumption, we show that (5) is a valid CBF for the discrete-time nonlinear dynamics (1) and (2).

B. Matrix-Variate Gaussian Process Regression

Here we illustrate how Bayesian learning can be used to acquire a distribution over the uncertainty in the dynamics. Since we are estimating a multivariate uncertainty $d(x)$, we must consider potential correlations among its components. Thus, we use the Matrix-Variate Gaussian Process (MVG) model to learn the system dynamics and uncertainty from data. By learning $d$ and $d^{h_i}$ in tandem with the controller, we can obtain high-probability confidence intervals on the unknown dynamics, which adapt/shrink as we obtain more information (i.e., measurements) on the system. We start by defining the MVG distribution [24], [35]–[37] as follows.

Definition 2: We say the random matrix X is distributed according to a MVG distribution when its probability density function is defined as:

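where $M$ denotes the mean, $\Sigma$ encodes the covariance matrix of the rows, and $\Omega$ encodes the covariance matrix of the columns. In this case, we write $X \sim \mathcal{MN}(M, \Sigma, \Omega)$, and we have $\mathrm{vec}(X) \sim \mathcal{N}(\mathrm{vec}(M), \Omega \otimes \Sigma)$, where $\mathrm{vec}(X)$ is the vectorization of $X$, obtained by stacking the columns of $X$, and $\otimes$ is the Kronecker product.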
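The vectorization identity in Definition 2 is easy to check numerically: the matrix-variate log-density of $X$ with row covariance `U` and column covariance `V` should equal the multivariate normal log-density of $\mathrm{vec}(X)$ (columns stacked) with covariance `V` $\otimes$ `U`. A minimal NumPy sketch, with helper names of our own choosing:

```python
import numpy as np


def matrix_normal_logpdf(X, M, U, V):
    """Log-density of X ~ MN(M, U, V); U is the row covariance (a x a),
    V the column covariance (b x b)."""
    a, b = X.shape
    R = X - M
    _, logdet_u = np.linalg.slogdet(U)
    _, logdet_v = np.linalg.slogdet(V)
    # tr(V^{-1} R^T U^{-1} R), computed with linear solves instead of inverses
    quad = np.trace(np.linalg.solve(V, R.T) @ np.linalg.solve(U, R))
    return -0.5 * (a * b * np.log(2 * np.pi) + b * logdet_u + a * logdet_v + quad)


def mvn_logpdf(x, mean, cov):
    """Log-density of a multivariate normal N(mean, cov)."""
    k = x.size
    r = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (k * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(cov, r))


def vec(Z):
    """Stack the columns of Z into one vector."""
    return Z.flatten(order="F")


rng = np.random.default_rng(0)
a, b = 3, 2
A = rng.normal(size=(a, a))
U = A @ A.T + a * np.eye(a)  # random SPD row covariance
B = rng.normal(size=(b, b))
V = B @ B.T + b * np.eye(b)  # random SPD column covariance
M = rng.normal(size=(a, b))
X = rng.normal(size=(a, b))

lhs = matrix_normal_logpdf(X, M, U, V)
rhs = mvn_logpdf(vec(X), vec(M), np.kron(V, U))
print(np.isclose(lhs, rhs))  # True
```

The ordering matters: with column stacking the covariance is (column covariance) $\otimes$ (row covariance); swapping the Kronecker factors would break the equality.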

We continue by modeling $d(x)$ as an MVG. Without loss of generality, we assume zero mean for the MVG, with positive semi-definite parameter covariance matrix $\Omega$ and kernel $\kappa$. That is,

where $K$ has entries $K_{ij} = \kappa(x_i, x_j)$. There are many potential choices for the kernel function $\kappa(x_i, x_j)$ (cf. [38]); in this work we use the simple squared-exponential kernel,

where $\sigma$ and $l$ are kernel hyperparameters. Since $d(x)$ is an MVG, the training observations $Y$, whose rows are $d(x_1), \ldots, d(x_N)$ at sampling points $X = [x_1, \ldots, x_N]$, and the predictive target $d_*$ at the query test point $x_*$ are jointly Gaussian as

where $K$ has entries $K_{ij} = \kappa(x_i, x_j)$, and $K_*$ has entries $(K_*)_i = \kappa(x_*, x_i)$. Thus, we can compute the posterior distribution as follows:

This allows us to estimate our unknown dynamics and their possibly correlated uncertainties.
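A compact sketch of MVG prediction under a zero-mean prior $\mathcal{MN}(0, K, \Omega)$ on the stacked observations (rows are samples, columns are components of $d$): the posterior mean of $d(x_*)$ is $k_*^T K^{-1} Y$ and its covariance is $(\kappa(x_*, x_*) - k_*^T K^{-1} k_*)\,\Omega$. The kernel form $\sigma_f^2 \exp(-\|x - x'\|^2 / (2 l^2))$, the parameter names, and the small diagonal `noise` term are assumptions for illustration, not the paper's exact choices.

```python
import numpy as np


def se_kernel(XA, XB, sigma_f=1.0, length=1.0):
    """Squared-exponential kernel matrix between the rows of XA and XB."""
    sq = (
        np.sum(XA**2, axis=1)[:, None]
        + np.sum(XB**2, axis=1)[None, :]
        - 2.0 * XA @ XB.T
    )
    return sigma_f**2 * np.exp(-0.5 * np.maximum(sq, 0.0) / length**2)


def mvg_posterior(X, Y, x_star, Omega, sigma_f=1.0, length=1.0, noise=1e-6):
    """Posterior mean (n,) and covariance (n, n) of d(x_star) under MN(0, K, Omega)."""
    K = se_kernel(X, X, sigma_f, length) + noise * np.eye(len(X))
    k_star = se_kernel(X, x_star[None, :], sigma_f, length)[:, 0]  # shape (N,)
    alpha = np.linalg.solve(K, k_star)      # K^{-1} k_*
    mean = alpha @ Y                        # k_*^T K^{-1} Y
    row_var = sigma_f**2 - k_star @ alpha   # kappa(x_*, x_*) - k_*^T K^{-1} k_*
    return mean, row_var * Omega


X = np.array([[0.0], [1.0], [2.0], [3.0]])                        # N = 4 inputs
Y = np.array([[0.1, -0.2], [0.3, 0.0], [-0.1, 0.4], [0.2, 0.2]])  # rows are d(x_i)
Omega = np.array([[1.0, 0.3], [0.3, 0.5]])  # coupling between the two components
mean, cov = mvg_posterior(X, Y, np.array([1.0]), Omega, length=0.5)
```

At a training input the mean reproduces the observation and the covariance nearly vanishes; far from the data the mean returns to zero and the covariance to $\sigma_f^2\Omega$. The covariance depends on the inputs only through the kernel, not on the observed values.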

In this section, we first show that, under certain assumptions, (5) is a multi-agent CBF for our discrete-time nonlinear dynamics. Then, given bounds on the uncertainty in each agent’s dynamics, we incorporate robustness to these uncertainties into the CBF while maintaining the computational efficiency of a quadratic program.

Extending Multi-Agent CBF to Discrete-Time, Nonlinear Systems: The multi-agent CBF introduced in [6] was originally designed for continuous-time linear systems. However, we prove that, under a suitable assumption, $h(x)$ defined in (5) is a discrete-time CBF for the discrete-time nonlinear system (1)/(2). The tradeoff is additional conservativeness in $a_{\max}$, introduced by the following assumption. Intuitively, this assumption ensures that the robot can accelerate in any direction relative to the other agents, as proved in Lemma 2.

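Assumption 1: Assume that for all $x \in \mathcal{C}$, $g_v(x)$ is invertible and $\frac{\|\bar f_v(x)\| + a_{\max}}{u_{\max}\,\sigma_{\min}(g_v(x))} \le 1$, where $\bar f_v(x) := f_v(x) + d_v(x) - v_t$ and $\sigma_{\min}(g_v(x))$ is the minimum singular value of $g_v(x)$.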

Remark 1: This assumption ensures controllability and restricts our agent’s dynamics relative to its actuator authority. If $u_{\max}$ is large, the restriction is minimal or non-existent, and vice versa. As a simple example, a car at rest would not satisfy this assumption, though a moving car would likely satisfy it (with a higher velocity corresponding to a larger $a_{\max}$).

Lemma 2: Under Assumption 1, which places controllability restrictions on the dynamics, the expression (5), defining the set $\mathcal{C}$, represents a discrete-time CBF for system (1), with

Proof: First, we must show that the set $\mathcal{C}$ defined by expression (5) is control invariant for the dynamics (1), given that the robot has acceleration authority of at least $a_{\max}$ in any direction for all $x \in \mathcal{C}$. For this, we follow the proof structure of [6], the main difference being that we have discrete-time (rather than continuous-time) dynamics. Let $\hat v(x_t)$ denote the component of the velocity $v(x_t)$ in the direction of collision.

We know that collision can be avoided if we can match the other agent’s velocity (i.e., $\hat v = 0$) by the time we reach them. If we can accelerate by $a_{\max}$ in any direction, we are guaranteed to achieve $\hat v = 0$ within time $T_c$. In our discrete-time formulation, the following condition implies collision avoidance:

Note that this constraint is only active when two agents are moving closer to each other ($\hat v < 0$); no constraint is needed when they are moving away from each other ($\hat v \ge 0$). Therefore, collision can always be avoided under the following condition,

Based on our geometric argument, we know that the set $\{x : h(x) \ge 0\}$ defined by $h(x) := \frac{\Delta p^T \Delta v}{\|\Delta p\|} + \sqrt{a_{\max}(\|\Delta p\| - D_s)}$ is control invariant. This implies that $h(x)$ is a discrete-time CBF [27], given that the robot can accelerate by at least $a_{\max}$ in any direction.

Therefore, our second step is to show that for all $x \in \mathcal{C}$ and any unit vector $\hat e$, it holds that $\sup_{u \in U} (v_{t+1} - v_t)^T \hat e \ge a_{\max} > 0$.

where the last inequality follows directly from Assumption 1. Therefore, we are guaranteed that the robot can accelerate by at least $a_{\max} > 0$ in any direction. Combined with the first part of the proof, this shows that $h$ defined in (5) is a discrete-time CBF.
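The second step of the proof can be sanity-checked numerically. Writing $\bar f_v := f_v(x) + d_v(x) - v_t$ (our shorthand), the largest velocity change achievable along a unit direction $\hat e$ with inputs in the Euclidean ball $\|u\|_2 \le u_{\max}$ is $\hat e^T \bar f_v + u_{\max}\|g_v^T \hat e\|$, which is at least $u_{\max}\sigma_{\min}(g_v) - \|\bar f_v\|$. The sketch below compares the two over many directions, using made-up values for $g_v$ and $\bar f_v$. It restricts to a 2-norm input ball, which lies inside an elementwise bound $|u| \le u_{\max}$, so the lower bound remains valid there.

```python
import numpy as np


def max_accel_along(e_hat, f_bar_v, g_v, u_max):
    """sup over ||u||_2 <= u_max of e_hat^T (f_bar_v + g_v u)."""
    return e_hat @ f_bar_v + u_max * np.linalg.norm(g_v.T @ e_hat)


def authority_lower_bound(f_bar_v, g_v, u_max):
    """Direction-independent bound u_max * sigma_min(g_v) - ||f_bar_v||."""
    sigma_min = np.linalg.svd(g_v, compute_uv=False).min()
    return u_max * sigma_min - np.linalg.norm(f_bar_v)


g_v = np.array([[0.1, 0.02], [0.0, 0.08]])  # hypothetical input matrix
f_bar_v = np.array([0.05, -0.03])           # hypothetical drift minus current velocity
u_max = 2.0

bound = authority_lower_bound(f_bar_v, g_v, u_max)
angles = np.linspace(0.0, 2.0 * np.pi, 721)
worst = min(
    max_accel_along(np.array([np.cos(a), np.sin(a)]), f_bar_v, g_v, u_max)
    for a in angles
)
print(bound > 0, worst >= bound - 1e-12)  # True True
```

Any $a_{\max}$ no larger than this bound is then available in every direction, which is what the first part of the proof requires.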

Incorporating Robustness into CBF: While uncertainty in robot/environmental dynamics can be directly incorporated into the Control Barrier Condition (CBC) for simple systems/constraints (e.g., linear CBFs) [22], this is not the case for the multi-agent CBF with discrete-time dynamics: there, incorporating the uncertainty directly into the CBC does not preserve a quadratic program.

Consider our CBF (5) and the dynamics defined in (1) and (2). Based on these, we can compute the CBC with respect to each other agent $i$ as follows:

If we can (a) determine bounds on the dynamic uncertainties $d$ in (1) and (2), and (b) compute control actions that satisfy $\mathrm{CBC} \ge 0$ in an online fashion, then we can obtain robust safety guarantees using the multi-agent CBF. Ideally, we could incorporate (14) into an efficiently solvable program as follows,

$$\min_{u_t} \|u_t - u_{des}\|_2^2 \quad \text{s.t.} \quad \mathrm{CBC}(x_t, u_t, d_t) \ge 0 \ \ \forall d_t \in \mathcal{D}, \qquad |u_t| \le u_{\max}, \qquad (15)$$

where $u_{des}$ is any, potentially unsafe, desired control action passed to our CBF (e.g., from a linear MPC controller [39]) and $\mathcal{D}$ is our bound on the uncertainty (discussed further in Section V). Note that the CBC constraint (14) in (15) is neither linear nor convex. Therefore, the resulting program is non-convex and cannot be solved at the high frequency needed for adequate safety assurances. However, recall that our system has relative degree 2, which allows us to derive the following bound,

$$\mathrm{CBC}(x_t, u_t, d_t) \ge k_c(x_t) + H_1(x_t)\, d_t + u_t^T H_2(x_t)\, d_t + H_3(x_t)\, u_t, \qquad (16)$$

where the terms $(k_c, H_1, H_2, H_3)$ are defined in (27) in the Appendix. Due to space limitations, the derivation of the bound (16) is also given in the Appendix. For the rest of the paper, we drop the index $i$ for notational convenience. The following lemma allows us to use the CBC bound (16) to obtain safety guarantees under polytopic uncertainties.

Lemma 3: Suppose the uncertainty in our dynamics $d$ is bounded in the polytope $\mathcal{D} := \{d : G d \le g\}$. Then the action $u$ obtained from solving the following optimization problem (17) robustly satisfies the CBC condition (14) (i.e., renders the set $\mathcal{C}$ forward invariant).

Proof: The robust optimization problem (17) can be equivalently represented by the following optimization problem (i.e., (17) is the dual of (18) with no duality gap [40], where $\mu$ is the dual variable):

This lemma shows that if we bound d(xt) in a polytope, we can transform our robust multi-agent CBF (15) into a quadratic program (17), which gives us a computationally efficient way to provide robust guarantees of safety under robot and environment uncertainties. Hence, in the following section, we examine the problem of learning accurate polytopic bounds on the uncertainty d in an online fashion.
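When the polytope is a box aligned with some set of axes, as produced in Sec. V, the worst case of the bound (16) has a closed form, which gives a quick way to check a candidate action. The sketch assumes (16) reads $k_c + H_1 d + u^T H_2 d + H_3 u$ with $H_1$ and $H_3$ stored as 1-D arrays, and a box $\{\hat d + V s : |s_j| \le r_j\}$; all names and the random data are hypothetical. It compares the closed form against brute-force enumeration of the box vertices.

```python
import itertools

import numpy as np


def worst_case_cbc(u, k_c, H1, H2, H3, d_hat, V, r):
    """Closed-form min of k_c + H1 d + u^T H2 d + H3 u over the box
    {d_hat + V s : |s_j| <= r_j}."""
    c = H1 + u @ H2  # coefficient vector multiplying d
    return k_c + H3 @ u + c @ d_hat - np.sum(r * np.abs(V.T @ c))


def worst_case_cbc_bruteforce(u, k_c, H1, H2, H3, d_hat, V, r):
    """Same quantity by checking every vertex of the box (a linear function
    attains its minimum over a box at a vertex)."""
    vals = []
    for signs in itertools.product((-1.0, 1.0), repeat=len(r)):
        d = d_hat + V @ (np.array(signs) * r)
        vals.append(k_c + H1 @ d + u @ H2 @ d + H3 @ u)
    return min(vals)


rng = np.random.default_rng(2)
m, n_d = 2, 4  # input and uncertainty dimensions (hypothetical)
k_c = 1.0
H1 = rng.normal(size=n_d)
H2 = rng.normal(size=(m, n_d))
H3 = rng.normal(size=m)
V, _ = np.linalg.qr(rng.normal(size=(n_d, n_d)))  # orthonormal box axes
d_hat = 0.1 * rng.normal(size=n_d)
r = np.array([0.2, 0.1, 0.3, 0.05])
u = rng.normal(size=m)
closed = worst_case_cbc(u, k_c, H1, H2, H3, d_hat, V, r)
brute = worst_case_cbc_bruteforce(u, k_c, H1, H2, H3, d_hat, V, r)
print(np.isclose(closed, brute))  # True
```

Because this worst case is concave and piecewise linear in $u$, adding auxiliary variables $t_j \ge |(V^T c(u))_j|$, with $c(u) = H_1 + u^T H_2$, turns the robust constraint into linear inequalities, so minimizing $\|u - u_{des}\|^2$ remains a QP. This is an alternative route to the dual construction behind (17) for box-shaped sets, not necessarily the authors' exact formulation.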

In this section, our goal is to learn accurate confidence supports for the uncertainties $d$ and $d^{h_i}$ (for all agents $i$) in an online manner, which will allow us to guarantee safety with high probability. To this end, we use Matrix-Variate Gaussian Processes, which provide multivariate Gaussian distributions over the uncertainties $d$ and $d^{h_i}$.

Using (1), we have $d(x_t) = x_{t+1} - f(x_t) - g(x_t)u_t$. A similar relation holds based on (2) for the other agents. Thus, given a sequence of measurements $(x_t, u_t, x_{t+1})$ over a horizon $T$, we compute the uncertainty samples $d_{t-T}, \ldots, d_{t-1}$ over that horizon. Then, we infer a distribution over the query point $d_t$ (i.e., the next time point), as described in Equation (8).
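Computing the training targets amounts to one residual per time step. The sketch below uses a hypothetical 2-D double-integrator nominal model (state $[p, v]$, inputs acting on velocity so that $g_p = 0$) and a made-up constant drift, and checks that the residuals recover the injected disturbance; `uncertainty_samples`, `f`, `g`, and `dt` are illustrative names.

```python
import numpy as np


def uncertainty_samples(xs, us, f, g):
    """Residuals d_t = x_{t+1} - f(x_t) - g(x_t) u_t along a logged trajectory.

    xs: (T+1, n) states, us: (T, m) inputs; f(x) -> (n,), g(x) -> (n, m).
    Returns a (T, n) array whose rows can serve as MVG training targets.
    """
    return np.array([xs[t + 1] - f(xs[t]) - g(xs[t]) @ us[t] for t in range(len(us))])


dt = 0.1


def f(x):
    """Hypothetical nominal drift for x = [p (2), v (2)]: p+ = p + dt v, v+ = v."""
    return np.concatenate([x[:2] + dt * x[2:], x[2:]])


def g(x):
    """Hypothetical input matrix: inputs change velocity only, so g_p = 0."""
    return np.vstack([np.zeros((2, 2)), dt * np.eye(2)])


rng = np.random.default_rng(3)
d_true = np.array([0.0, 0.0, 0.02, -0.01])  # made-up unmodeled drift
us = rng.normal(size=(5, 2))
xs = [np.zeros(4)]
for u in us:
    xs.append(f(xs[-1]) + g(xs[-1]) @ u + d_true)
D = uncertainty_samples(np.array(xs), us, f, g)
print(np.allclose(D, d_true))  # True
```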

Learning Kernel Parameters: Direct application of the MVG (8) to our multi-agent setup will be problematic without first training the MVG model hyperparameters. This is easy to see by noting that the posterior covariance does not depend on the observed values $Y$. Furthermore, the coupling between uncertainties, captured by $\Omega$, is completely independent of our online measurements. Instead, much of the uncertainty prediction is baked into the kernel parameters and the matrix $\Omega$. Thus, to obtain accurate estimates of $d$, we must learn the MVG model parameters offline from data. For example, some agents might behave predictably while others behave more erratically, and hyperparameter optimization is necessary to capture these uncertainty profiles in our Bayesian inference.

Based on the probability density function (6), we obtain the negative log-likelihood of a given set of training data X,

which we optimize (over the kernel hyperparameters and $\Omega$) using Stochastic Gradient Descent (see hyperparameter optimization in Fig. 1) [41]. Recall that $N$ denotes the number of training samples in our batch, and $n$ denotes the dimension of the output $Y$ (i.e., of $d(x)$). We run the optimization several times with different initializations to reduce the chance of getting stuck in poor local optima. The gradient update expressions are given in Equation (20). Note that we use projected gradient updates for $\Omega$ in order to enforce the condition that $\Omega$ must be positive definite.
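For a zero-mean MVG with row covariance $K$ and column covariance $\Omega$, the negative log-likelihood of $Y \in \mathbb{R}^{N \times n}$ is $\tfrac12\big(Nn\log 2\pi + n\log|K| + N\log|\Omega| + \mathrm{tr}(\Omega^{-1}Y^T K^{-1} Y)\big)$. The sketch below evaluates it and takes projected gradient steps in $\Omega$ alone, projecting onto $\{\Omega \succeq \epsilon I\}$ by clipping eigenvalues. It is full-batch gradient descent with $K$ held fixed (an identity stand-in), a simplification of the stochastic, joint optimization over kernel hyperparameters described above; the step size and `eps` are assumptions.

```python
import numpy as np


def mvg_nll(Y, K, Omega):
    """Negative log-likelihood of Y (N x n) under MN(0, K, Omega)."""
    N, n = Y.shape
    _, logdet_k = np.linalg.slogdet(K)
    _, logdet_o = np.linalg.slogdet(Omega)
    quad = np.trace(np.linalg.solve(Omega, Y.T) @ np.linalg.solve(K, Y))
    return 0.5 * (N * n * np.log(2 * np.pi) + n * logdet_k + N * logdet_o + quad)


def project_pd(Omega, eps=1e-6):
    """Frobenius-norm projection onto {Omega : Omega - eps * I is PSD}."""
    sym = 0.5 * (Omega + Omega.T)
    w, Q = np.linalg.eigh(sym)
    return (Q * np.maximum(w, eps)) @ Q.T


def omega_gradient_step(Y, K, Omega, lr, eps=1e-6):
    """One projected gradient step on Omega, with K held fixed."""
    N = Y.shape[0]
    S = Y.T @ np.linalg.solve(K, Y)           # Y^T K^{-1} Y, shape (n, n)
    Oinv = np.linalg.inv(Omega)
    grad = 0.5 * (N * Oinv - Oinv @ S @ Oinv)  # gradient of the NLL w.r.t. Omega
    return project_pd(Omega - lr * grad, eps)


rng = np.random.default_rng(4)
Y = rng.normal(size=(50, 3))
K = np.eye(50)  # stand-in for a kernel matrix built from fitted hyperparameters
Omega = np.eye(3)
for _ in range(100):
    Omega = omega_gradient_step(Y, K, Omega, lr=0.01)
```

For fixed $K$ the minimizer over $\Omega$ is $Y^T K^{-1} Y / N$, which gives a convenient check that the gradient expression vanishes at the optimum.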

Converting GP Uncertainty to Polytopic Bound: After learning the kernel parameters, we can obtain the mean $\hat M$ and covariance $\hat\Sigma$ from data observed online based on the Matrix-Variate Gaussian Process (8). Then, the uncertainties should follow the distribution,

where $\chi^2_n$ represents the chi-squared distribution with $n$ degrees of freedom (equal to the dimension of $d$). This allows us to obtain the confidence support,

However, this set defines an ellipsoid over d rather than a polytope, which we require for the robust optimization; while we could directly utilize the ellipsoidal constraint, this would not lead to an efficiently solvable QP. To obtain a polytope, we compute the minimum bounding box surrounding the uncertainty ellipsoid.
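The conversion from ellipsoid to box can be sketched as follows, assuming the confidence ellipsoid is $\{d : (d - \hat M)^T \hat\Sigma^{-1} (d - \hat M) \le q\}$, with mean $\hat M$, covariance $\hat\Sigma$, and $q$ the $(1-\delta)$ chi-squared quantile. The box is aligned with the eigenvectors of $\hat\Sigma$, has half-widths $\sqrt{q\lambda_j}$, and is returned in the $\{d : G d \le g\}$ form used by the QP. The closed-form quantile below holds only for two degrees of freedom; for other dimensions a numerical quantile such as `scipy.stats.chi2.ppf(1 - delta, n)` would be needed. Function names are hypothetical.

```python
import math

import numpy as np


def chi2_quantile_2dof(delta):
    """(1 - delta)-quantile of chi-squared with 2 dof, from its CDF 1 - exp(-x/2)."""
    return -2.0 * math.log(delta)


def ellipsoid_bounding_box(mean, cov, q):
    """Principal-axis box around {d : (d - mean)^T cov^{-1} (d - mean) <= q}.

    Returns (G, g) such that the box equals {d : G d <= g}.
    """
    lam, V = np.linalg.eigh(cov)           # cov = V diag(lam) V^T
    r = np.sqrt(q * np.maximum(lam, 0.0))  # half-width along each eigenvector
    center = V.T @ mean
    G = np.vstack([V.T, -V.T])
    g = np.concatenate([center + r, -center + r])
    return G, g


delta = 0.05
q = chi2_quantile_2dof(delta)
mean = np.array([0.1, -0.2])
cov = np.array([[0.04, 0.01], [0.01, 0.02]])
G, g = ellipsoid_bounding_box(mean, cov, q)

# Every point on the ellipsoid boundary should satisfy G d <= g.
lam, V = np.linalg.eigh(cov)
t = np.linspace(0.0, 2.0 * np.pi, 400)
circle = np.vstack([np.cos(t), np.sin(t)])
boundary = mean[:, None] + V @ (np.sqrt(q * lam)[:, None] * circle)
print(np.all(G @ boundary <= g[:, None] + 1e-9))  # True
```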

Lemma 4: Suppose our robot/environment uncertainty can be described by our MVG model (i.e., by the distribution (21)). With probability $1 - \delta$, the following polytopic bound on the uncertainty $d$ holds:

where $e_j$ and $\lambda_j$ represent the eigenvectors and eigenvalues of $\hat\Sigma$, respectively.

where $e_j$ represent the eigenvectors of $\hat\Sigma$ contained in $E$, and $\lambda_j$ are the eigenvalues of $\hat\Sigma$ contained in $\Lambda$. We can then conclude that with probability $1 - \delta$ the following relations hold, giving us our polytopic bound,

Remark 5: Recall from Section III-B that in deriving our uncertainty bounds using the Matrix-Variate Gaussian Process, we rely on the assumption that the uncertainties $(d_1, \ldots, d_N)$ are distributed according to a multivariate Gaussian. While this is a strong assumption that may not hold in general [42], it can provide a good approximation of agent behavior in many cases. Alternatively, if the uncertainty belongs to a Reproducing Kernel Hilbert Space (RKHS), the high-confidence bounds developed in [1] could be used.

High-Confidence Safety Guarantee: Combining the uncertainty bound on d with our result from Sec. IV leads us to the main result, summarized in the following Theorem.

Theorem 6: Using the polytopic bounds (23), the control action obtained from the quadratic program (17) guarantees robust safety (i.e., collision avoidance between agents) with probability at least $1 - \delta$.

Proof: We can represent (23) in the form $\mathcal{D} = \{d : G d \le g\}$; therefore, with probability $1 - \delta$, the uncertainty $d$ is contained in the set $\mathcal{D}$ (by Lemma 4). From Lemma 3 and Equation (16), if we solve the quadratic program (17), we are guaranteed that $\mathrm{CBC} \ge k_c + H_1 d + u^T H_2 d + H_3 u \ge 0$ for all $d \in \mathcal{D}$. Therefore, the CBF condition is satisfied with probability $1 - \delta$, so safety is guaranteed with probability at least $1 - \delta$ (by Definition 1 and the forward invariance property of CBFs [26]).

We test our algorithm in a simulated multi-agent environment in which our robot, with nonlinear dynamics satisfying Assumption 1, navigates from a start to goal position while avoiding collisions, in the presence of a random number of other agents (3-12 agents). Each of the other agents has a randomized (unknown) goal that they try to reach. Approximately half of them blindly travel from their start to goal position without accounting for others, while the other half exhibit some collision avoidance behavior through their own control barrier functions (with random CBF parameters). An example simulation instance is shown in Fig. 2. See the code (referenced below) for further simulation details/parameters and agent dynamics.

We simulate several instances of the other agents moving and interacting, and use this data for hyperparameter optimization of an MVG model as described in Sec. V. We then equip our robot with the robust CBF described in Sec. IV, using the optimized MVG for uncertainty prediction.

By running 1000 simulated tests in randomized environments, we show that the robust CBF avoids collision in 98.5% of cases (with $\delta = 0.05$ [cf. (22)]), performing much better than the nominal multi-agent CBF (cf. [6]), which avoids collisions in 85.0% of cases. The simulation results are summarized in Table I.

TABLE I: Performance statistics for the robust vs nominal multi-agent CBF across 1000 randomized trials. For fair comparison, the robust and nominal CBFs were tested in the same randomized 1000 trials. Collision Rate: Percentage of trials that ended in collision. Distance to Collision: For trials without collision, the robot’s margin from collision. The closer the robust CBF is to the nominal CBF, the less conservativeness is introduced by the uncertainty prediction.

Robustness generally comes at some cost in performance (e.g., we can reach the goal faster if we do not care about collisions). To investigate the conservativeness of our approach, we examined the uncertainty predictions of the MVG; Fig. 3 shows the uncertainty ellipse (over the 4-dimensional disturbance $d^h$) projected onto the two velocity dimensions, together with the true disturbances $d^h_v$. We found that 97% and 99% of disturbances were within the 2σ and 3σ confidence ellipsoids, respectively, in line with the expectations of the MVG model. Furthermore, the results in Table I show that the robust CBF introduces only slight conservativeness, as the margin from collision (in instances where the CBF was active) was very similar for the robust and nominal CBFs. This suggests that the MVG model captures the uncertainties well.
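A coverage check of this kind can be sketched on synthetic data: sample from a known 2-D Gaussian, count the fraction of samples inside the $k$-sigma ellipse (Mahalanobis distance at most $k$), and compare with the chi-squared CDF. The synthetic covariance and the reading of "$k\sigma$" as Mahalanobis radius $k$ are assumptions for illustration; this is not the paper's evaluation code, and its normalization of the ellipsoids is described only in Fig. 3.

```python
import math

import numpy as np


def coverage(samples, mean, cov, q):
    """Fraction of samples whose squared Mahalanobis distance is at most q."""
    diff = samples - mean
    m2 = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    return float(np.mean(m2 <= q))


rng = np.random.default_rng(5)
mean = np.zeros(2)
cov = np.array([[0.04, 0.01], [0.01, 0.02]])
samples = rng.multivariate_normal(mean, cov, size=20000)
for k in (2.0, 3.0):
    q = k**2  # "k-sigma" ellipse as Mahalanobis radius k
    expected = 1.0 - math.exp(-q / 2.0)  # chi-squared CDF with 2 dof
    print(k, round(coverage(samples, mean, cov, q), 3), round(expected, 3))
```

Under a well-specified model the empirical and expected fractions agree up to sampling noise; a persistent shortfall would indicate overconfident uncertainty predictions.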

Fig. 3: The normalized 2σ (red) and 3σ (blue) uncertainty ellipsoids over the other agents’ dynamics, $d^h$, projected onto the velocity dimensions, $d^h_v$. The true disturbances over 1000 time steps (across different trials) are plotted as the blue dots. We found that the percentages of disturbances within the 2σ and 3σ uncertainty ellipsoids were consistent with expectations based on the MVG model. Note that the uncertainty ellipse is state-dependent, so we normalize the ellipsoid for each point, $(x^h, d^h_v)$, for fair comparison.

The code for implementing the robust multi-agent CBF in our simulated environment can be found at https://github.com/rcheng805/robust_cbf.

A video of the simulations can be found at https://youtu.be/hXg5kZO86Lw.

Robot navigation in unstructured environments with humans must be safe, but such environments are fraught with uncertainty due to the unpredictability of agents. In this work, we have introduced a robust multi-agent control barrier formulation, which guarantees safety with high probability in the presence of multiple uncontrolled, uncertain agents. We learn uncertainties online for the agents in the environment using Matrix-Variate Gaussian Processes, and design our CBF to be robust to the learned uncertainties.

Future work will look at learning and designing safe controllers for a larger class of uncertainties, not captured by our Matrix-Variate GP. In particular, many agents in the real world exhibit multi-modal uncertainties, and it will be important to design safe, robust controllers for such uncertainties.

REFERENCES

[1] F. Berkenkamp, M. Turchetta, A. Schoellig, and A. Krause, “Safe model-based reinforcement learning with stability guarantees,” in Advances in Neural Information Processing Systems, 2017.

[2] J. F. Fisac, A. K. Akametalu, M. N. Zeilinger, S. Kaynama, J. Gillula, and C. J. Tomlin, “A general safety framework for learning-based control in uncertain robotic systems,” IEEE Transactions on Automatic Control, 2018.

[3] T. Koller, F. Berkenkamp, M. Turchetta, and A. Krause, “Learning-based model predictive control for safe exploration,” in 2018 IEEE Conference on Decision and Control (CDC). IEEE, 2018.

[4] K. P. Wabersich and M. N. Zeilinger, “Safe exploration of nonlinear dynamical systems: A predictive safety filter for reinforcement learning,” arXiv preprint arXiv:1812.05506, 2018.

[5] L. Janson, T. Hu, and M. Pavone, “Safe motion planning in unknown environments: Optimality benchmarks and tractable policies,” in Robotics: Science and Systems, Pittsburgh, USA, June 2018.

[6] U. Borrmann, L. Wang, A. D. Ames, and M. Egerstedt, “Control Barrier Certificates for Safe Swarm Behavior,” IFAC Conference on Analysis and Design of Hybrid Systems, 2015.

[7] L. Wang, A. Ames, and M. Egerstedt, “Safety barrier certificates for heterogeneous multi-robot systems,” in Proceedings of the American Control Conference, 2016.

[8] A. Singletary, P. Nilsson, T. Gurriet, and A. D. Ames, “Online Active Safety for Robotic Manipulators,” 2020.

[9] F. Bartoli, G. Lisanti, L. Ballan, and A. Del Bimbo, “Context-Aware Trajectory Prediction,” in International Conference on Pattern Recognition, 2018.

[10] P. Fiorini and Z. Shiller, “Motion planning in dynamic environments using velocity obstacles,” International Journal of Robotics Research, 1998.

[11] J. Van Den Berg, M. Lin, and D. Manocha, “Reciprocal velocity obstacles for real-time multi-agent navigation,” in Proceedings - IEEE International Conference on Robotics and Automation, 2008.

[12] D. Wilkie, J. Van Den Berg, and D. Manocha, “Generalized velocity obstacles,” in International Conference on Intelligent Robots and Systems, 2009.

[13] D. Bareiss and J. Van Den Berg, “Generalized reciprocal collision avoidance,” International Journal of Robotics Research, 2015.

[14] D. Zhou, Z. Wang, S. Bandyopadhyay, and M. Schwager, “Fast, On-line Collision Avoidance for Dynamic Vehicles Using Buffered Voronoi Cells,” IEEE Robotics and Automation Letters, 2017.

[15] M. Wang, Z. Wang, S. Paudel, and M. Schwager, “Safe Distributed Lane Change Maneuvers for Multiple Autonomous Vehicles Using Buffered Input Cells,” in International Conference on Robotics and Automation, 2018.

[16] J. F. Fisac, M. Chen, C. J. Tomlin, and S. Shankar Sastry, “Reach-avoid problems with time-varying dynamics, targets and constraints,” in International Conference on Hybrid Systems: Computation and Control, HSCC, 2015.

[17] M. Chen and C. J. Tomlin, “Hamilton-Jacobi Reachability: Some Recent Theoretical Advances and Applications in Unmanned Airspace Management,” Annual Review of Control, Robotics, and Autonomous Systems, 2018.

[18] S. Bansal, M. Chen, S. Herbert, and C. J. Tomlin, “Hamilton-Jacobi reachability: A brief overview and recent advances,” in IEEE Conference on Decision and Control (CDC), 2017.

[19] Y. F. Chen, M. Everett, M. Liu, and J. P. How, “Socially aware motion planning with deep reinforcement learning,” in IEEE International Conference on Intelligent Robots and Systems, 2017.

[20] M. Everett, Y. F. Chen, and J. P. How, “Motion Planning among Dynamic, Decision-Making Agents with Deep Reinforcement Learning,” in International Conference on Intelligent Robots and Systems, 2018.

[21] W. Zhang, O. Bastani, and V. Kumar, “MAMPS: Safe Multi-Agent Reinforcement Learning via Model Predictive Shielding,” arXiv e-prints, p. arXiv:1910.12639, Oct 2019.

[22] R. Cheng, G. Orosz, R. M. Murray, and J. W. Burdick, “End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks,” Proceedings of the AAAI Conference on Artificial Intelligence, 2019.

[23] L. Wang, E. A. Theodorou, and M. Egerstedt, “Safe learning of quadrotor dynamics using barrier certificates,” in International Conference on Robotics and Automation (ICRA). IEEE, 2018.

[24] M. J. Khojasteh, V. Dhiman, M. Franceschetti, and N. Atanasov, “Probabilistic safety constraints for learned high relative degree system dynamics,” arXiv preprint arXiv:1912.10116, 2019.

[25] D. D. Fan, J. Nguyen, R. Thakker, N. Alatur, A.-a. Agha-mohammadi, and E. A. Theodorou, “Bayesian learning-based adaptive control for safety critical systems,” arXiv preprint arXiv:1910.02325, 2019.

[26] A. D. Ames, J. W. Grizzle, and P. Tabuada, “Control barrier function based quadratic programs with application to adaptive cruise control,” in Proceedings of the IEEE Conference on Decision and Control, 2014.

[27] A. D. Ames, X. Xu, J. W. Grizzle, and P. Tabuada, “Control Barrier Function Based Quadratic Programs for Safety Critical Systems,” IEEE Transactions on Automatic Control, 2017.

[28] Q. Nguyen and K. Sreenath, “Exponential control barrier functions for enforcing high relative-degree safety-critical constraints,” in 2016 American Control Conference (ACC). IEEE, 2016, pp. 322–328.

[29] A. Clark, “Control barrier functions for complete and incomplete information stochastic systems,” in 2019 American Control Conference (ACC). IEEE, 2019, pp. 2928–2935.

[30] G. Yang, C. Belta, and R. Tron, “Self-triggered control for safety critical systems using control barrier functions,” in 2019 American Control Conference (ACC). IEEE, 2019, pp. 4454–4459.

[31] M. Srinivasan, N.-s. P. Hyun, and S. Coogan, “Weighted polar finite time control barrier functions with applications to multi-robot systems,” in IEEE Conference on Decision and Control, 2019.

[32] T. Gurriet, P. Nilsson, A. Singletary, and A. D. Ames, “Realizable set invariance conditions for cyber-physical systems,” in Proceedings of the American Control Conference, 2019.

[33] M. Srinivasan, A. Dabholkar, S. Coogan, and P. Vela, “Synthesis of Control Barrier Functions using a Supervised Machine Learning Approach,” arXiv, 2020.

[34] A. Agrawal and K. Sreenath, “Discrete control barrier functions for safety-critical control of discrete systems with application to bipedal robot navigation,” in Robotics: Science and Systems, 2017.

[35] A. K. Gupta and D. K. Nagar, Matrix variate distributions. Chapman and Hall/CRC, 2018.

[36] C. Louizos and M. Welling, “Structured and efficient variational deep learning with matrix Gaussian posteriors,” in International Conference on Machine Learning, 2016, pp. 1708–1716.

[37] S. Sun, C. Chen, and L. Carin, “Learning Structured Weight Uncertainty in Bayesian Neural Networks,” in International Conference on Artificial Intelligence and Statistics (AISTATS), 2017, pp. 1283–1292.

[38] C. K. Williams and C. E. Rasmussen, Gaussian processes for machine learning. MIT press Cambridge, MA, 2006, vol. 2, no. 3.

[39] S. Singh, A. Majumdar, J.-J. Slotine, and M. Pavone, “Robust online motion planning via contraction theory and convex optimization,” in 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017, pp. 5883–5890.

[40] Y. Chen and J. Anderson, “System level synthesis with state and input constraints,” in Conference on Decision and Control (CDC), 2019.

[41] S. Crepey and M. F. Dixon, “Gaussian Process Regression for Derivative Portfolio Modeling and Application to CVA Computations,” arXiv, 2019.

[42] A. Lederer, J. Umlauft, and S. Hirche, “Uniform error bounds for Gaussian process regression with application to safe control,” in Advances in Neural Information Processing Systems, 2019, pp. 659–669.

APPENDIX I PARAMETERS OF THE CONTROL BARRIER CONDITION

Here we define the terms used in the lower bound of the Control Barrier Condition (CBC) in (16):

APPENDIX II PROOF OF CBC LOWER BOUND

In this section, we prove that the lower bound in (16) holds. We begin by expanding the full CBC condition in (14); using our assumption of a relative degree 2 system, we reach the following expression:

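$$\mathrm{CBC}(x,u,d) = \frac{\bar p^T \bar v}{\|\bar p\|} + \sqrt{a_{\max}\left(\|\bar p\| - D_s\right)} + (\eta - 1)\, h(x),$$

where, since $g_p(x) = 0$, the next-step relative position and velocity are

$$\bar p := f_p(x) + d_p(x) - f^h_p(x) - d^h_p(x), \qquad \bar v := f_v(x) + g_v(x)\, u + d_v(x) - f^h_v(x) - d^h_v(x),$$

so that the numerator expands as

$$\bar p^T \bar v = (f_p - f^h_p)^T (f_v - f^h_v) + (f_p - f^h_p)^T g_v u + (f_p - f^h_p)^T (d_v - d^h_v) + (d_p - d^h_p)^T (f_v - f^h_v) + u^T g_v^T (d_p - d^h_p) + (d_p - d^h_p)^T (d_v - d^h_v).$$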

By bounding the positional uncertainty terms $d_p$ and $d^h_p$, we obtain a lower bound on $\mathrm{CBC}(x,u,d)$:

CBCminfpf hpfvf hv fpf hpamaxfpf hpDs) +

fpf hp(x) fpf hpfvf hv fpf hpfpf hpfpf hp

uR , dp , dv , dhp , dhvuTRgvfpf hpgvfpf hpdp dv dhp dhvfpf hpfpf hp

fpf hpfpf hp

Grouping the terms, this can be written in simplified form using the parameters defined in Appendix I:

$$\mathrm{CBC}(x,u,d) \ge k_c(x) + H_1(x)\, d + u^T H_2(x)\, d + H_3(x)\, u.$$
