An Efficient Reachability-Based Framework for Provably Safe Autonomous Navigation in Unknown Environments

2019·Arxiv

Abstract

Abstract

Real-world autonomous vehicles often operate in a priori unknown environments. Since most of these systems are safety-critical, it is important to ensure they operate safely in the face of environment uncertainty, such as unseen obstacles. Current safety analysis tools enable autonomous systems to reason about safety given full information about the state of the environment a priori. However, these tools do not scale well to scenarios where the environment is being sensed in real time, such as during navigation tasks. In this work, we propose a novel, real-time safety analysis method based on Hamilton-Jacobi reachability that provides strong safety guarantees despite environment uncertainty. Our safety method is planner-agnostic and provides guarantees for a variety of mapping sensors. We demonstrate our approach in simulation and in hardware to provide safety guarantees around a state-of-the-art vision-based, learning-based planner. Videos of our approach and experiments are available on the project website1.

I. INTRODUCTION

Autonomous vehicles operating in the real world must navigate through a priori unknown environments using on-board, limited-range sensors. As a vehicle makes progress towards a goal and receives new sensor information about the environment, rigorous safety analysis is critical to ensure that the system’s behavior does not lead to dangerous collisions. In order to provide such safety guarantees for real vehicles, any analysis should take into account multiple sources of uncertainty, such as modelling error, external disturbances, and unknown parts of the environment.

A variety of mechanisms have been proposed to ensure robustness to modeling error and external disturbances [24], [16], [34]. Additionally, safety guarantees for systems using limited-range sensors in unknown environments have been the subject of recent investigation [21], [22], [32], [19]. Although interesting results have emerged from these studies, the safety guarantees are provided by imposing specific assumptions on the sensor and/or the planner that are rather restrictive for a variety of real-world autonomous systems and sensors used for navigational purposes. In contrast, rather than proposing a specific planning and sensing paradigm that guarantees safety, we aim to design a safety framework that is compatible with a broad class of sensors, planners, and dynamics.

There are two main challenges with providing such a framework. The first challenge relates to ensuring safety with respect to unknown parts of the environment and

Fig. 1. Overview: We consider the problem of safe navigation from an initial state to a goal state in an a priori unknown environment. Our approach treats the unsensed environment as an obstacle, and uses a HJ reachability framework to compute a safe controller for the vehicle, which is updated in real-time as the vehicle explores the environment. We show an application of our approach on a Turtlebot using a vision-based planner. When the robot is at risk of colliding, the safe controller () keep the system safe.

external disturbances while minimally interfering with goaldriven behavior. Second, real-time safety assurances need to be provided as new environment information is acquired, which requires approximations that are both computationally efficient and not overly conservative. Moreover, this safety analysis should be applicable to a wide variety of real-world sensors, planners, and vehicles.

In this paper, we propose a safety framework that can overcome these challenges for autonomous vehicles operating in a priori unknown static environments under the assumption that the sensors work perfectly within their ranges. Erroneous and noisy sensors can make safety analysis significantly more challenging and we defer this to future work. Our framework is based on Hamilton Jacobi (HJ) reachability analysis [25], [27], a verification method for guaranteeing safety and performance for dynamical systems with general nonlinear dynamics and disturbances. This approach provides not only the set of states from which the dynamical system can always satisfy state constraints (i.e. remain collision-free), but also provides an optimal controller that guarantees the system will never violate the state constraints. In particular, we treat the unknown environment at any given time as an obstacle and use HJ reachability to compute the backward reachable set (BRS), i.e. the set of states from which the vehicle can enter the unknown and potentially unsafe part of the environment, despite the best control effort. The complement of the BRS therefore represents the safe set for the vehicle. With this computation, we also obtain the corresponding least restrictive safety controller, which does not interfere with the planner unless the safety of the vehicle is at risk. Use of HJ reachability analysis in our framework thus allows us to overcome the first challenge—our framework can be applied to general nonlinear vehicles, sensors, and planners.

In general, due to the computationally expensive nature of HJ computations, this approach has not been leveraged in settings where the environment is not known beforehand and rather is sensed at run-time. To overcome this challenge, we propose a novel, real-time algorithm to compute the BRS. Our algorithm only locally updates the BRS in light of new environment information, which significantly alleviates the computational burden of HJ reachability while still maintaining the safety guarantees at all times. To summarize, our key contributions are:

1) a provably-safe framework for navigation in static unknown environments that is applicable to a broad class of sensors, planners, and dynamics,

2) an algorithm for online safe set updates from new sensor measurements as the robot navigates,

3) demonstration of our approach on different sensors and planners on a vehicle with nonlinear dynamics in the presence of external disturbances,

4) a hardware demonstration of our approach to provide safety around a state-of-the-art learning-based planner which uses only monocular RGB images for planning.

II. RELATED WORK

An extensive body of research deals with motion planning and safe exploration for robots in unknown environments, some of which focuses on safety guarantees despite modeling error and external disturbances. We cannot hope to summarize all these works here, but we attempt to discuss several of the most closely related approaches.

A. Safe motion planning

Methods that ensure safety despite modeling error and disturbances are largely motivated by the trade-off between safety and efficiency during real-time planning. A popular approach is to perform offline computations that quantify disturbances and modeling error which can be used online to determine collision-free trajectories [24], [16], [34]. Alternatively, [2], [3] use control barrier functions to design provably stable controllers while satisfying given state-space constraints. However, these methods assume that a recursively feasible collision-free path can be obtained despite the unknown environment, which may not be possible in real-world environments. Several works address this problem for single-agent scenarios within a model predictive control framework [29], [31], as well as for multiple vehicles using sequential trajectory planning [33], [5]. However, these works assume a priori knowledge of all obstacles, whereas our framework guarantees safety in an a priori unknown environments for potentially high-order nonlinear dynamics.

Ensuring safety with respect to both modeling error and limited sensing horizons have been studied using sum-of-squares [21], linear temporal logic [22], reactive synthesis approaches [32], graph-based kinodynamic planner [19] among others. These works typically impose restrictions on sensors or planners to ensure safety with respect to the unknown environment. In contrast, the proposed framework is sensor and planner agnostic, provided that the sensor can accurately identify the obstacles within it’s sensing region.

B. Safe exploration

The problem of finding feasible trajectories to a specified goal in an unknown environment has also been studied in the robotic exploration literature for simplified kinematic motion models using frontier-exploration methods [36] and D* [20]. Other works include sampling-based motion planners for drift-less dynamics [8] and dynamic exploration methods for vehicles with a finite stopping time [17]. Robotic exploration has been also studied within the context of fully and partially observable Markov decision processes [30], [28] and reinforcement learning [1], [18] to reduce collision probabilities; however, no theoretical safety guarantees are typically provided.

Safe exploration has also been studied in terms of Lyapunov stability [9], [11]. Even though stability is often desirable, it is insufficient to guarantee collision avoidance. In contrast, our formulation uses a stronger definition of safety, and is more in line with [15], [13], which characterize safety using reachable sets.

III. PROBLEM STATEMENT

where , and represent the state, the control, and the disturbance experienced by the vehicle. Here, d can include the effect of both the external disturbances or dynamics mismatch. For convenience, we partition the state x into the position component and the non-position component . We assume that the flow field is uniformly continuous in time, and Lipschitz continuous in x for fixed u and d. With this assumption, given , there exists a unique trajectory solving (1) [12]. We also assume that the vehicle state x can be accurately sensed at all times.

Let and denote the start and the goal state of the vehicle. The vehicle aims to navigate from to in an a priori unknown environment, E, whose map or topology is not available to the robot. At any time t and state x(t), the vehicle has a planner , which outputs the control command u(t) to be applied at time t. The vehicle also has a sensor which at any given time exposes a region of the state space , and provides a conservative estimate of the occupancy within . For example, if the vehicle has a camera sensor, would be a triangular region (prismatic in 3D) representing the field-of-view of the camera. We assume perfect perception within this limited sensor range. Dealing with erroneous perception, sensor noise, and dynamic environments are problems in their own right, and we defer them to future work. Finally, we assume that there is a known initial obstacle-free region around given by ; e.g. this is the case when the vehicle is starting at rest and its initial state is collision-free.

Given , the planner , and the sensor S, the goal of this paper is to design a least restrictive control mechanism to navigate the vehicle to the goal state while remaining safe, which means avoiding obstacles at all times. Since the environment E is unknown, the safety needs to be ensured given the partial observations of the environment obtained through the sensor, which in general is challenging. We use the HJ reachability-based framework to ensure safety despite only partial knowledge of the environment.

IV. RUNNING EXAMPLE

To illustrate our approach and provide intuition behind the proposed framework, we introduce a simple running example: a 3-dimensional Dubins’ car system with disturbances added to the velocity. The dynamics of the system are given by:

where is the state, is the position, is the heading, and is the disturbance experienced by the vehicle. The control of the vehicle is , where v is the speed and is the turn rate. Both controls have a lower and upper bound, which for this example are chosen to be 1rad/s. The disturbance bound is chosen as .

The environment setup for is shown in Figure 2. The vehicle start and the goal state are given by (shown in black) and (the center of the green area). The goal is to reach within 0.3m of (the light green area). However, there is an obstacle in the environment which is not known to the vehicle beforehand (shown in grey). At the beginning of the running example navigation task, we assume that there is no obstacle within 1.5m of , and obtain the initial obstacle-free region (the area inside the dashed black line).

To demonstrate the sensor-agnostic nature of our approach, we simulate the Dubins’ car with two different sensors: a LiDAR and a camera. For a LiDAR, the sensing region is given by a circle of radius R centered around the current position p(t), where R = 3m in this simulation (shown in Figure 2a). For a camera, the sensing region is determined by a triangular region with solid angle F (also called the field-of-view) and apex at the current vehicle heading, and a maximum extent of R. We use and R = 20m for our simulations (shown in Figure 2b). However, part of the regions of can be occluded by the obstacles, as would be the case for any real-world sensors.

Fig. 2. The initial setup for the running example. The goal is to safely reach the goal (center of the green area) from the initial position (black marker) in the presence of an unknown obstacle (the grey square). We also show the initial sensing region for the LiDAR and camera sensors.

Additionally, for each sensor, we demonstrate our approach on two different planners : a sampling-based planner and a model-based planner. For the sampling-based planner, we use a Rapidly-Exploring Random Tree (RRT) [23], and for the model-based planner, we use a splinebased planner [35]. Our goal is to safely navigate to the goal position despite the unknown obstacles.

V. HAMILTON-JACOBI REACHABILITY

Our framework is based on Hamilton Jacobi (HJ) reachability analysis [25], [27]. This analysis has been successfully applied in a variety of domains, such as aircraft auto-landing and safe multi-vehicle path planning [5], [6]. In this work, we will be using reachability analysis to compute a backward reachable set (BRS) given a set of unsafe states L. Intuitively, is the set of states such that the system trajectories that start from this set can enter L within a time horizon of for some disturbance despite the best control efforts. In contrast, for any trajectory that starts from , there exists a control such that the system trajectory will never enter L, despite the worst-case disturbance. Here, represents the complement of the set . The computation of the BRS can be formulated as a differential game between the control and disturbance, which can be solved using the principle of dynamic programming. The cost functional corresponding to this differential game is given by:

where represents the system state at time s starting from state x at time and applying control with disturbance . In (3), the function l(x) is the implicit surface function representing the unsafe set 0}. Intuitively, J keeps track of whether the system trajectory even entered the unsafe set during the time horizon , and if so, the cost corresponding to that trajectory is negative.

The value function corresponding to the cost functional in (3) is given by:

where represents the set of non-anticipative strategies [27]. If the value function is negative for a given state, then starting from this state the system cannot avoid entering into the unsafe set eventually. Thus, the value function in (4) keeps track of all unsafe trajectories of the system, which in turn can be used to compute the safe trajectories for the system. For further details on this formulation, we refer the interested readers to [27], [6].

The value function in (4) can be obtained using dynamic programming, which yields a Hamilton Jacobi-Isaacs Variational Inequality (HJI-VI) [14], [25]. Ultimately, a BRS can be computed by solving the following final value HJI-VI:

τ, x) + H(τ, x, τ, x)), lτ, x)} = 0

Here, and denote the time and space derivatives of the value function respectively. The Hamiltonian, , encodes the role of system dynamics, control, and disturbance, and is given by

Once the value function is computed, the BRS, and consequently, the set of safe states are given by

HJI reachability also provides the optimal control to keep the system in the safe set and is given by

In fact, the system can safely apply any control as long as it is not at the boundary of the unsafe region. If the system reaches the boundary of , the control in (9) steers the system away from the unsafe states. This least restrictive controller provided by HJI reachability is also the basis for ensuring safety for any planner in a least restrictive fashion in our framework.

VI. OUR APPROACH

We propose an HJ-reachability-based framework to ensure safety in an a priori unknown environment. Our framework also uses a novel, real-time computation of a conservative approximation of the safe set based on new observations of the environment as the vehicle is navigating. We first describe our framework, and then present a real-time algorithm to update the safe set.

A. Ensuring safety in unknown environments

Our framework treats the unsensed environment at any given time as an obstacle. The unsensed environment along with the sensed obstacles are used to compute a safe region for the vehicle using HJI reachability. This ensures that the vehicle never enters the unknown and potentially unsafe part of the environment, despite the worst case disturbance.

Let denote the sensed obstacle-free region of the environment at any time t. Given the initial obstacle-free region , we compute the corresponding safe set by solving the HJI-VI in (5), assuming everything outside is an obstacle. For this computation, the initial value function in (5) is given by , where is positive inside and negative outside. One such function is given by the signed distance to . Starting from , the HJI-VI is solved to obtain the value function . Here, is the dummy computation variable in (5). is then used to compute the safe region (see (8)). As long as the vehicle is inside , a controller exists to ensure that it does not collide with the known or unknown obstacles.

We next execute a controller on the system for the time horizon as per the following control law:

where is the optimal safe controller corresponding to and is given by (9). Also, until the safe set is updated, we use the last computed safe set for finding the optimal safe controller, i.e., . The control mechanism in (10) is least restrictive in the sense that it lets the planner execute the desirable control on the system, except when the system is at the risk of violating safety. Note that the control horizon T in our framework can be arbitrarily chosen by the system designer while still ensuring safety.

While the system is executing the control law in (10), it will obtain new sensor measurements at each time t, which is used to obtain , the free space sensed at that time. If the sensor is completely occluded by an obstacle at any time, the corresponding free space is an empty set. Thus, the overall known free space at time t is given by:

At the end of the control time horizon, we compute another safe region assuming everything outside is an obstacle. This safe region is obtained by solving HJI-VI until convergence. We then execute a control law similar to in (10), except that the safety controller intervenes only when the system is at the boundary of . The entire procedure is repeated until the system reaches the goal state.

Since the safety controller does not allow the system trajectory to leave the known free space, the proposed framework is guaranteed to avoid collision at all times. However, the safe set can be rather conservative especially early on when most of the environment is still unexplored, which is a trade-off we make to ensure safety against all unexpected obstacles. If additional information about the obstacles in the environment is known, it can be incorporated and will only reduce the conservativeness of the safe set.

Note that the safe set does not necessarily need to be updated every T seconds. It can be updated faster, slower or at the same rate as the control horizon. Essentially one can use the most recent safe set in the control law in (10), and still ensure safety at all times. This is because the safe set at any time is only smaller than the safe set at time when . However, the safe set should be updated as quickly as possible to minimize interference with the planner.

B. Efficient update of the BRS

Our framework requires the computation of a safe set in real-time as the vehicle is navigating through the environment. In general, this is challenging due to the exponentially scaling computational complexity of HJI reachability with respect to the state dimension [6]. To mitigate some of the computational challenges, we introduce two novel approaches to computing the BRS: warm-starting and local value function updates.

1) Warm-start approach: At any given time, the vehicle senses only a small additional part of the environment. Consequently, the free space map M only changes by a small amount in a small time horizon. Intuitively, this should only cause a small change in the safe region. We leverage this intuition to propose a novel, faster way to update the reachable set. For brevity, we explain our approach assuming that the safe set is updated every T seconds, but the same results hold when the safe set is updated at a non-fixed rate.

Given the last computed safe set at time , and the maps at and the current time t, we “warm-start” the value function in (5) for the BRS computation at time t as follows:

where as before is defined such that it is positive inside and negative outside. Intuitively, instead of initializing the value function with everywhere in the state space, (12) initializes it with the last computed value function for the states where no new information has been obtained since the last computation, and with only at the states which were previously assumed to be occupied but are actually obstacle-free. This leads to a much faster computation of BRS because the value function needs to be updated only for a much smaller number of states that are newly found out to be free. At all the other states, the value function is already almost accurate and only small refinements are required. Interestingly, this procedure also maintains the conservativeness of the safe region, which is sufficient to ensure collision avoidance at all times.

Lemma 1: Let be the solution of the following warm-started HJI-VI:

τ, x)+H(τ, x, τ, x)), lτ, x)} = 0,

where is defined as in (12). Let be the solution of the HJI-VI in (5) with . Then for all . In particular, and .

Proof: Since represents the converged value function at time , we have . Therefore, from (12), . Now we are ready to prove our claim.

1 : Start and the goal states

8 while the vehicle is not at the goal do

11 for every T seconds do

13 Warm-start the BRS computation using , M, and (12) 14 Obtain the new safe region, , by solving HJI-VI with the warm-started value function

15 Wlast ←− W; Vlast ←− W; ←− t

For , by dynamic programming (see [6] for details) we know that:

˜)) = maxminmininf(0))maxminmininf(0))

where the second inequality follows from the fact that . Thus, from (7), we have that for all . In particular, this implies that . Finally, (8) implies that since the safe set is the complement of the BRS.

Intuitively, Lemma 1 states that the safe set obtained by the warm-start approach is an under-approximation of the actual safe set obtained by solving full HJI-VI. Thus, it can be used to ensure safety for the vehicle while being computationally efficient. In practice, for the sensors and navigation problems in this paper, the amount of conservatism incurred by warm-starting is very small, as we demonstrate in Section VII. Our overall approach with warm-starting to update the safe set is summarized in Algorithm 1. We start with the initial known free space and compute the initial safe set using HJI-VI (Line 6). The value function for this computation is initialized by the signed distance to . We also maintain the last computed BRS , the safe set , and the corresponding time (Line 7). At every state, the vehicle obtains the current sensor observation and extracts the sensed free space (Line 9 and 10). Next, a control command is applied to the vehicle (Line 11). If the vehicle is inside , the planner is used to obtain the control command; otherwise,

the safety controller is applied. Every T seconds, the safe set and controller are updated based on the free space sensed by the vehicle so far using HJI-VI (Line 15). The value function for this computation is warm-started with except at the states which are discovered to be obstacle free since as described in (12) (Line 14). The whole procedure is repeated until the vehicle reaches its goal.

2) Local update of the BRS: In the last section, we discussed how warm-starting the value function computation might lead to a faster convergence of the value function; however, the value function is still computed over the entire state space. In this section, we present a more practical algorithm that leverages the advantages of warm-starting by computing and updating the value function only locally at the states for which new information has been obtained since the last value function computation. Our safety framework is still same as what described in Algorithm 1—only the computational procedure for the safe set computation (Line 15 in Algorithm 1) is being modified to update the value function locally. We outline this procedure in Algorithm 2.

In Algorithm 2, we maintain a list of states Q at which the value function needs to be updated in light of the new environment observations. Q is initialized to be the set of states that are newly discovered to be free since , i.e., (Line 1). Since the change in the value of the states in Q (compared to ) would also cause a change in the value of the neighboring states, N(Q), we also add them to Q (Line 2). Thus, . Typically, the value function in HJI-VI is computed by discretizing the state-space into a grid and solving the VI over that grid. In such cases, the spatial derivative of the value function (required to compute the Hamiltonian in the HJI-VI in (5)) is computed numerically using the neighboring grid points. This spatial derivative is precisely responsible for the propagation of the change in the value function at a state to its neighboring states. In such cases, N(Q) might represent the neighboring grid points used to compute the spatial derivative of the value function for the states in Q; however, other neighboring criteria can be used.

Once the neighbors are added to Q, the value for all the states in Q is initialized as per (12) (Line 3), and their value is updated using HJI-VI in (5) for some time step (Line 6). This computation is much faster than classical HJI-VI computation since it is typically performed for many fewer states. Next, we remove all those states from Q whose value function hasn’t changed significantly over this (Line 8 and 9), as these states won’t cause any change in the value function for any other state. The neighbors of the remaining states are next added to Q (Line 10) and the entire procedure is repeated until the value function is converged for all states. Note that Algorithm 2 still maintains the conservatism of the safe set since it is just a different computational procedure for computing the warm-started value function, which is still used within the safety framework in Algorithm 1.

VII. SIMULATIONS

A. Running example revisited

We now return to our running example and demonstrate the proposed approach in simulation (described in Section IV). We implement our safety framework with three different methods to update the BRS: using the full HJI-VI, the warm-start approach (Section VI-B.1), and the local update approach (Section VI-B.2). The corresponding system trajectories for different planners and sensors for all the three methods are shown in Figure 3. For all combinations of planners and sensors, the proposed framework is able to safely navigate the vehicle to its goal position despite the external disturbances and no a priori knowledge of the obstacle (none of the trajectories go through the obstacle). As the vehicle navigates through the environment, the planner makes optimistic decisions at several states that might lead to a collision; however, the safety controller intervenes to ensure safety. States where the safety controller is applied are marked in red. Note that the safety controller intervenes more frequently for the camera sensor as compared to the LiDAR. This is because the field-of-view (FoV) of a camera is typically much narrower than a LiDAR (which senses the obstacles in all directions within a range). Given this limited FoV, the safety controller needs to account for a much larger unexplored environment, which in turn leads to more cautious control.

We compare the computation time required for each of the three methods to compute the BRS for the camera and LiDAR sensors in Table I. All computations were done on a MATLAB implementation on a desktop computer with a Core i7 5820K processor using the Level Set toolbox [26]. As expected, across all scenarios, warm-starting the value function for the BRS computation leads to a significant improvement in computation time compared to full HJI-VI; however, the computation time might still not be practical for most real-world applications. Only locally updating the value function in addition to warm-starting leads to a significant further improvement in the computation time, and the BRS

Fig. 3. The vehicle trajectories for the problem setting in Figure 2 for both planners (RRT and Spline planners) and both sensors (LiDAR and Camera sensors) with the safety controller computed from each of the three candidate safety approaches. The proposed framework is able to safely navigate the vehicle to the goal in all cases. When the planner makes unsafe decisions, the safety controller intervenes (the states marked in red) to ensure safety. is updated in approximately 1s on average for all sensors and planners. This improvement is impressive considering that the computation was done in MATLAB without any parallelization which is known to decrease the computation time by a factor of 100 [10].

Lemma 1 indicates that the safe set obtained by warm-starting the value function is conservative compared to the one obtained by full HJI-VI. Therefore, we also compare the percentage volume of the states at which the safe set is conservative. This over-conservative volume is typically limited to 0.5% which indicates that the warm-starting approach is able to approximate the true value function quite well.

Finally, we take a closer look at how the safe control comes into play when the system is operating with a rangelimited sensor. Figure 4a showcases a Dubins’ car with a camera sensor and an RRT planner, where the current robot state is shown in black, the corresponding sensed region is in dark blue, and the trajectory and corresponding sensed regions are shown in grey and light blue respectively. Since the camera’s FoV is occluded by an obstacle at the current state, it cannot sense the environment past the obstacle. Figure 4b illustrates the corresponding current belief map of the environment which is the union of the free space sensed by the vehicle so far (shown in white). Since the current sensed region is contained within the sensed region at the previous state, no new environment information is obtained and hence the BRS is not updated. The slice of the safe set at the current vehicle heading is shown in Figure 4b (the

TABLE I. Numerical comparison of average compute time and relative volume of over-conservative states for each planner and sensor across different BRS update methods. Local updates compute an almost exact BRS in 1 second, and are significantly faster than both HJI-VI and warm-start.

Fig. 4. (a) The sensed region by the vehicle at different states in time for the camera sensor. (b) The overall free space sensed by the vehicle and the corresponding safe set (interior of the red boundary). Since the vehicle is at the boundary of the safe set, the safety controller is applied to steer the robot inside the safe set and ensure collision avoidance.

area within the red boundary). Since the vehicle is at the boundary of the safe set, the safety controller intervenes and applies a control that leads the system towards the interior of the safe set (the red arrow) to ensure collision avoidance.

B. Safety for a learning-based planner

Since the proposed safety framework is planner-agnostic, we can use it to ensure safe navigation even in the presence of a learning-based planner. In particular, we use the vision-based planner proposed in [7], which takes an RGB camera image and the goal position as input, and uses a Convolutional Neural Network-based perception module to produce a desired next state that moves the robot towards its goal while trying to avoid obstacles on its way. This desired next state is used by a model-based low-level planner to produce a smooth trajectory from the vehicle’s current state to the desired state. The authors demonstrate that the proposed planner can leverage robot’s prior experience to navigate efficiently in novel indoor cluttered environments; however, it still leads to collisions in several real-world scenarios, like when the vehicle needs to go through narrow spaces. We use the proposed safety framework to ensure both safe and

Fig. 5. The proposed framework can be exploited to provide safety guarantees around vision-based planners that incorporate learning in the loop. The vision-based planner plans a path through the doorway. Without safety control (a) this results in collision, however with safety (b) the robot avoids collision and reaches the goal. efficient planning in such difficult navigation scenarios.

The task setup for our simulation is shown in Figure 5a. The robot needs to go through a very narrow hallway, followed by a door into the room to reach its goal (the green circle) starting from the initial state (black arrow). At the beginning, the robot has no knowledge about the obstacles (shown in dark grey). We simulate this scenario using the S3DIS dataset which contains mesh scans of several Stanford buildings [4]. By rendering this mesh at any state, we can obtain the image observed by the camera (used by the planner) as well as the occupancy information within the robot’s FoV (used for the safety computation). For the robot dynamics, we use the 4D Dubins’ car model:

where is the position, is the heading, and v is the speed of the vehicle. The control is , where is the acceleration and is the turn rate.

The trajectory taken by the learning-based planner in the absence of the safety module is shown in Figure 5a. Even though the vehicle is able to go through the narrow hallway, it collides with the door eventually. The trajectory taken by the vehicle when the planner is combined with the proposed safety framework is shown in Figure 5b. When the planner takes an unsafe action near the door, the safety controller intervenes (marked in red) and guides the robot to safely go through the doorway. We also illustrate the image observed by the robot near the doorway in Figure 5a. Even though most of the robot’s vision is blocked by the door, the planner makes a rather optimistic decision of moving forward and leads to a collision. In contrast, the safety controller makes a conservative decision of rotating in place to explore the environment more before moving forward, and eventually goes through the doorway to reach the goal. The planner-agnostic nature of our framework allows us to provide safety guarantees around learning-based planners as well.

VIII. EXPERIMENTS

We test the proposed approach in hardware using a TurtleBot 2 with a mounted stereo RGB camera. For the vehicle state measurement, we use the on-board odometry sensors on the TurtleBot. In our experiment, the vehicle needs to navigate through an unknown cluttered indoor environment to reach its goal (shown in Figure 1). For the BRS computation, we use the dynamics model in (13). We pre-map the environment using an open-source Simultaneous Localization and Mapping (SLAM) algorithm and the on-board stereo camera. This pre-mapping step is used to avoid the significant delay and inaccuracies in the real-time SLAM map update. However, the full map is not provided to the robot during deployment. Instead, for the safe set computation at any given time, the current FoV of the camera is projected on the SLAM map and only the information within the FoV is used to update the safe set. This emulates the limited sensor range of the Turtlebot’s camera. Regardless, this alludes to one of the important and interesting future research directions of ensuring safety despite sensor noise.

For planning, we use the learning-based planner described in Section VII-B that uses the current RGB image to determine a candidate next state. A top-view of our experiment setting is shown in Figure 6. The vehicle starting position and heading are shown in black, the goal region is shown in green, and the obstacles (unknown to the vehicle beforehand) are shown in grey. We ran the experiment with and without the safety controller and show the corresponding trajectories in Figure 6. Without the safety controller, the learning-based planner struggles with making sharp turns near the corner, and eventually collides into the obstacle (the chair, in this case). For context, we also show the RGB observation received by the planner near the corner. Even though the robot is very close to the chair, the planner makes the unsafe decision of continuing to move forward. However, when the learning-based planner is used within the proposed safety framework, the safety controller is able to account for this unsafe situation and safely steer the vehicle away from the obstacle. We show the corresponding safe set when the vehicle is at the obstacle boundary and the corresponding vehicle trajectory obtained using the safety controller. Afterwards, the planner takes over and steers the vehicle to the goal.

IX. PRACTICAL CONSIDERATIONS

When implementing a safety framework on real systems, there are many practical considerations that should be acknowledged. Below we discuss some of the main practical considerations we encountered during our simulation and hardware experiments when using our proposed framework:

• Since the value function is computed over a discretized state space, it might incur some numerical inaccuracies. Using a finer discretization and a higher order approximation for the spatial derivative is often helpful in alleviating these issues; however, the computation time also increases consequently. In our experience, we have found the 3rd or higher order approximations schemes to work pretty well.

• Due to the complicated geometry of real-world obstacles, the sensed map at any given time could appear highly irregular. Such irregular maps induce irregular

Fig. 6. We show the top-view of our experiment setting, and the corresponding system trajectories with and without the proposed safety framework. Without the safety framework, the robot collides into the chair. In contrast, our safety framework is able to safely navigate the robot to its goal by intervening when the vehicle is too close to the obstacles.

implicit surface functions, which can significantly hamper the value function computation. Thus, it might often be desirable to convert the occupancy map into a regular, well-behaved function, such as a signed distance map, and use that for the value function computation.

• Theoretically speaking, the safety controller only needs to be applied when the vehicle is at the boundary of the safe set. However, in practice, due to numerical inaccuracies, it might be desirable to apply the safety controller at a positive level of the value function.

X. CONCLUSION AND FUTURE WORK

Providing safety guarantees for real-world autonomous systems operating in a priori unknown environments is a challenging but important problem. In this paper, we propose an HJ reachability-based safety framework for navigation in unknown environments that is applicable to a wide variety of planners and sensors. To overcome the computation complexity of classical HJ reachability analysis, we propose a novel, real-time algorithm to update the reachable set as the vehicle traverses the environment. We demonstrate our approach on multiple sensors and planners, including a learning-based planner, both in simulation and on a hardware testbed.

Several interesting future directions emerge from this work. First, the proposed framework currently assumes perfect state estimation and sensor measurements. However, these assumptions may not hold in practice, and need to be appropriately accounted for in the framework. Additionally, the current safety guarantees hold for static environments– extensions to dynamic, multi-agent environments is another interesting and valuable research direction.

ACKNOWLEDGMENTS

This research is supported in part by the DARPA Assured Autonomy program under agreement number FA8750-18-C-0101, by NSF under the CPS Frontier project VeHICaL project (1545126), and by SRC under the CONIX Center, and by Berkeley Deep Drive.

The authors would also like to thank Kene Akametalu, Ellis Ratner, and Anca Dragan for their helpful advice on various technical issues examined in this paper.

REFERENCES

[1] J. Achiam, D. Held, A. Tamar, and P. Abbeel. Constrained policy optimization. In ICML, 2017.

[2] A. Agrawal and K. Sreenath. Discrete control barrier functions for safety-critical control of discrete systems with application to bipedal robot navigation. In RSS, 2017.

[3] A. D. Ames, X. Xu, J. W. Grizzle, and P. Tabuada. Control barrier function based quadratic programs for safety critical systems. IEEE Transactions on Automatic Control, 62(8):3861–3876, 2017.

[4] I. Armeni, A. Sax, A. R. Zamir, and S. Savarese. Joint 2D-3D-Semantic Data for Indoor Scene Understanding. arXiv preprint, 2017.

[5] S. Bansal, M. Chen, J. Fisac, and C. J. Tomlin. Safe sequential path planning under disturbances and imperfect information. In ACC, 2017.

[6] S. Bansal, M. Chen, S. Herbert, and C. J. Tomlin. Hamilton-Jacobi reachability: A brief overview and recent advances. In CDC, 2017.

[7] S. Bansal, V. Tolani, S. Gupta, J. Malik, and C. Tomlin. Combining optimal control and learning for visual navigation in novel environments. arXiv preprint, 2019.

[8] K. E. Bekris and L. E. Kavraki. Greedy but safe replanning under kinodynamic constraints. In ICRA, 2007.

[9] F. Berkenkamp, M. Turchetta, A. Schoellig, and A. Krause. Safe model-based reinforcement learning with stability guarantees. NIPS ’17.

[10] M. Chen, S. Bansal, K. Tanabe, and C. J. Tomlin. Provably safe and robust drone routing via sequential path planning: A case study in san francisco and the bay area. arXiv preprint, 2017.

[11] Y. Chow, O. Nachum, E. Duenez-Guzman, and M. Ghavamzadeh. A lyapunov-based approach to safe reinforcement learning. In NIPS ’18.

[12] E. A. Coddington and N. Levinson. Theory of ordinary differential equations. McGraw-Hill, New York, 1955. pp. 1–42.

[13] J. Fisac, A. Akametalu, M. Zeilinger, S. Kaynama, J. Gillula, and C. J. Tomlin. A general safety framework for learning-based control in uncertain robotic systems. IEEE Trans. on Automatic Control, 2018.

[14] J. Fisac, M. Chen, C. J. Tomlin, and S. Sastry. Reach-avoid problems with time-varying dynamics, targets and constraints. In HSCC, 2015.

[15] T. Fraichard and H. Asama. Inevitable collision states-a step towards safer robots? Advanced Robotics, 18(10):1001–1024, 2004.

[16] S. Herbert, M. Chen, S. Han, S. Bansal, J. Fisac, and C. J. Tomlin. Fastrack: a modular framework for fast and guaranteed safe motion planning. In CDC, 2017.

[17] L. Janson, T. Hu, and M. Pavone. Safe motion planning in unknown environments: Optimality benchmarks and tractable policies. arXiv preprint, 2018.

[18] G. Kahn, A. Villaflor, V. Pong, P. Abbeel, and S. Levine. Uncertaintyaware reinforcement learning for collision avoidance. arXiv preprint, 2017.

[19] D.F. Keil, J. Fisac, and C. J. Tomlin. Safe and complete real-time planning and exploration in unknown environments. arXiv preprint, 2018.

[20] S. Koenig and M. Likhachev. Fast replanning for navigation in unknown terrain. IEEE Trans. on Robotics, 21(3):354–363, 2005.

[21] S. Kousik, S. Vaskov, F. Bu, M. J. Roberson, and R. Vasudevan. Bridging the gap between safety and real-time performance in recedinghorizon trajectory design for mobile robots. arXiv preprint, 2018.

[22] M. Lahijanian, M. R. Maly, D. Fried, L. E. Kavraki, H. Kress-Gazit, and M. Vardi. Iterative temporal planning in uncertain environments with partial satisfaction guarantees. IEEE Trans. on Robotics, 32:583– 599, 2016.

[23] Steven M LaValle. Rapidly-exploring random trees: A new tool for path planning. 1998.

[24] A. Majumdar and R. Tedrake. Funnel libraries for real-time robust feedback motion planning. The International Journal of Robotics Research, 36(8):947–982, 2017.

[25] K. Margellos and J. Lygeros. Hamilton-Jacobi Formulation for ReachAvoid Differential Games. IEEE Trans. on Automatic Control, 56(8):1849–1861, 2011.

[26] I. Mitchell. A toolbox of level set methods. http://www. cs. ubc. ca/mitchell/ToolboxLS/toolboxLS. pdf, Tech. Rep. TR-2004-09, 2004.

[27] I. Mitchell, A. Bayen, and C. J. Tomlin. A time-dependent hamilton-jacobi formulation of reachable sets for continuous dynamic games. IEEE Trans. on automatic control, 50(7):947–957, 2005.

[28] T. Moldovan and P. Abbeel. Safe exploration in markov decision processes. arXiv preprint, 2012.

[29] A. Richards and J. How. Model predictive control of vehicle maneuvers with guaranteed completion time and robust feasibility. ACC ’03.

[30] C. Richter, W. Vega-Brown, and N. Roy. Bayesian learning for safe high-speed navigation in unknown environments. In Robotics Research, pages 325–341. 2018.

[31] U. Rosolia and F. Borrelli. Learning model predictive control for iterative tasks. a data-driven control framework. IEEE Trans. on Automatic Control, 63(7):1883–1896, 2018.

[32] S. Sarid, Bingxin X., and H. Kress-gazit. Guaranteeing high-level behaviors while exploring partially known maps. In RSS, 2012.

[33] T. Schouwenaars, J. How, and E. Feron. Decentralized cooperative trajectory planning of multiple aircraft with hard safety guarantees. In AIAA Guidance, Navigation, and Control Conference, 2004.

[34] S. Singh, A. Majumdar, J. Slotine, and M. Pavone. Robust online motion planning via contraction theory and convex optimization. In ICRA ’17.

[35] R. Walambe, N. Agarwal, S. Kale, and V. Joshi. Optimal trajectory generation for car-type mobile robot using spline interpolation. IFACPapersOnLine, 49(1):601–606, 2016.

[36] L. Yoder and S. Scherer. Autonomous exploration for infrastructure modeling with a micro aerial vehicle. In Field and service robotics, 2016.

designed for accessibility and to further open science