Incorporating insights from rich, perceptual sensing modalities such as cameras remains a major challenge in controlling complex autonomous systems. While such sensing systems clearly have the potential to convey more information than simple, single output sensor devices, interpreting and robustly acting upon the high-dimensional data streams remains difficult. Recent end-to-end approaches tackle the problem of image based control by learning an optimized map from pixel values directly to low level control inputs. Though there has been tremendous success in accomplishing sophisticated tasks, critical gaps in understanding robustness and safety still remain [38]. On the other hand, methods rooted in classical state estimation and robust control explicitly characterize a model of the underlying system and its environment in order to design a feedback controller. These approaches have provided strong and rigorous guarantees of robustness and safety in domains such as aerospace and process control, but the level of specification of the underlying system required has thus far limited their impact for complex sensor inputs.
In this paper, we aim to leverage contemporary techniques from machine learning and robust control to understand the conditions under which safe and reliable behavior can be achieved in controlling a system with uncertain and high-dimensional sensors. Whereas much recent work has been devoted to proving safety and performance guarantees for learning-based controllers applied to systems with unknown dynamics [1, 2, 4, 5, 12, 16, 22, 23, 32, 47, 63, 65, 68], we focus on the practical scenario where the underlying dynamics of a system are well understood, and it is instead the integration of perceptual sensor data into the control loop that must be learned. Specifically, we consider controlling a known linear dynamical system for which partial state information can only be extracted from complex observations. Our approach is to design a virtual sensor by learning both a perception map (i.e., a map from observations to a linear function of the state) and a bound on its estimation errors. We show that under suitable smoothness assumptions, we can guarantee bounded errors within a neighborhood of the training data. This model of uncertainty allows us to synthesize a robust controller that ensures that the system does not deviate too far from states visited during training. Finally, we show that the resulting perception and robust control loop is able to robustly generalize under adversarial noise models. To the best of our knowledge, this is the first such guarantee for a vision-based control system.
1.1 Related work
Vision based estimation, planning, and control There is a rich body of work, spanning several research communities, that integrate complex sensing modalities, specifically cameras, into estimation, planning, and control loops. The robotics community has focused mainly on integrating camera measurements with inertial odometry via an Extended Kalman Filter (EKF) [31, 34, 35]. Similar approaches have also been used as part of Simultaneous Localization and Mapping (SLAM) algorithms in both ground [42] and aerial [41] vehicles. We note that these works focus solely on the estimation component, and do not consider downstream use of state estimates in control loops. In contrast, the papers [39, 40, 62] demonstrate techniques that use camera measurements to aid inertial position estimates to enable aggressive control maneuvers in unmanned aerial vehicles.
The machine learning community has taken a more data-driven approach. The earliest such example is likely [52], in which a 3-layer neural-network is trained to infer road direction from images. Modern approaches to vision based planning, typically relying on deep neural networks, include learning maps from image to trail direction [28], learning Q-functions for indoor navigation using 3D CAD images [56], and using images to specify waypoints for indoor robotic navigation [11]. Moving from planning to low-level control, end-to-end learning for vision based control has been achieved through imitation learning from training data generated via human [15] and model predictive control [50]. The resulting policies map raw image data directly to low-level control tasks. In Codevilla et al. [19], higher level navigational commands, images, and other sensor measurements are mapped to control actions via imitation learning. Similarly, Williams et al. [68] and related works, image and inertial data is mapped to a cost landscape, that is then optimized via a path integral based sampling algorithm. More closely related to our approach is Lambert et al. [37], where a deep neural network is used to learn a map from image to system state – we note that this perception module is naturally incorporated into our proposed pipeline. To the best of our knowledge, none of the aforementioned results provide safety or performance guarantees.
Learning, robustness, and control Our theoretical contributions are similar in spirit to those of the online learning community, in that we provide generalization guarantees under adversarial noise models [6, 7, 30, 36, 70]. Similarly, Agarwal et al. [4] shows that adaptive disturbance feedback control of a linear system under adversarial process noise achieves sublinear regret – we note that this approach assumes full state information. We also draw inspiration from recent work that seeks to bridge the gap between linear control and learning theory. These assume a linear time invariant system, and derive finite-time guarantees for system identification [22, 26, 27, 29, 49, 51, 57, 58, 61, 64], and/or integrate learned models into control schemes with finite-time performance guarantees [1–3, 20, 22, 23, 43, 48, 53, 55].
1.2 Notation
We use letters such as x and A to denote vectors and matrices, and boldface letters such as x and to denote infinite horizon signals and linear convolution operators. Thus, for
, we have by definition that
for the history of signal x up to time t. For a function
), we write f(x) to denote the signal
We overload the norm so that it applies equally to elements
, and linear operators
. For any element norm, we define the signal norm as
= sup
and the linear operator norm as
. We primarily focus on the triple (
). Note that as
is an induced norm, it satisfies the sub-multiplicative property
We define the ball ) =
. We say that a fuction f is locally S-slope bounded for a radius
, or similarly locally L-Lipschitz if for
Consider the LTI dynamical system
with system state , control input
, disturbance
, and observation
. We take the dynamics matrices (A, B, H) to be known. The observation process is determined by the unknown generative model q, which is nonlinear and potentially quite high-dimensional. As an example, consider a camera affixed to the dashboard of a car tasked with driving along a road. Here, the observations
are the captured images and the map q generates these images as a function of position and velocity.
Motivated by such vision based control systems, our goal is to solve the optimal control problem
where c(x, u) is a suitably chosen cost function and is a measurable function of the image history
. This problem is made challenging by the nonlinear, high-dimensional, and unknown generative model.
Suppose instead that there exists a perception map p that imperfectly predicts partial state information; that is a known matrix, and error
. Using this map, we define a new measurement model in which the map p plays the role of a noisy sensor:
Figure 1: (Left) A diagram of the proposed perception-based control pipeline. (Right) The conceptual rearrangement of the closed-loop system permitted through our perception error characterization.
This allows us to reformulate problem (3) as a linear optimal control problem, where the measurements are defined by (4) and the control law ) is a linear function of the outputs of past measurements
. As illustrated in Figure 1, guarantees of performance, safety, and robustness require designing a controller which suitably responds to system disturbance w and sensor noise e. For linear optimal control problems, a variety of cost functions and noise models have well-understood solutions; we expand on this observation further in Section 2.1.
In light of this discussion, we approach our problem with the following data-driven procedure: First, we use supervised learning to fit a perception map and explicitly characterize its errors in terms of the data it was trained on. Second, we compute a robust controller that mitigates the effects of the measurement error e. We will show that under suitable local smoothness assumptions on the generative model q and perception map p we can guarantee closed-loop robustness using a number of samples depending only on the state dimension.
2.1 Background on linear optimal control
We first recall some basic concepts from linear optimal control in the partially observed setting. In particular we consider the optimal control problem
for the state,
the control input,
the process noise,
the sensor noise, K a linear-time-invariant operator, and c(x, u) a suitable cost function.
Control design depends on how the disturbance w and measurement error e are modeled, as well as performance objectives. By modeling the disturbance w and sensor noise e as being drawn from different signal spaces, and by choosing correspondingly suitable cost functions, we can incorporate practical performance, safety, and robustness considerations into the design process. For example, the well-studied LQG control problem minimizes the expected quadratic cost for user-specified positive definite matrices and normally distributed process and measurement noise. The resulting optimal estimation and control policy is to apply a static state feedback controller with the Kalman filter state estimate. The robust version of this problem considers instead the worst-case quadratic cost for or power-norm bounded noise. This is the
optimal control problem, which has a rich history [72].
As a final example, we recall the optimal control problem, which seeks to minimize the
induced norm of a system. This formulation best accommodates real-time safety constraints and actuator saturation, which we motivate in the following example.
Example 2.2 (Waypoint Tracking). Suppose that the control objective is to keep the system within a bounded distance of a sequence of waypoints at every timestep, and that the distance between sequential waypoints is bounded. Denoting the system state as and the waypoint sequence as r, the cost function is
If we specify costs by Q = diag(1and R = diag(1
, then as long as the optimal cost is less than 1, we can guarantee bounded tracking error
and actuation
for all possible realizations of the waypoint and sensor error processes. By defining an augmented state, i.e.
:= [
], and modeling waypoints as bounded disturbances, i.e.
:=
, we can reformulate this cost into a standard
optimal control problem.
In Appendix A, we provide further details on these optimal control problem formulations and solutions. Returning to the goals of perception-based control, it is clear that the perception errors will not follow a Gaussian distribution. Because this invalidates the optimality of the LQG control strategy, we focus on bounded norm assumptions, particularly for the case of
In this section, we introduce a procedure to estimate regions for which a learned perception map can be used safely during operation. We first suppose access to initial training data used to learn a perception map via any wide variety of traditional supervised methods. We then estimate safe regions around the training data, potentially using parameters learned from a second dataset. The result is a characterization of how quickly the learned perception map degrades as we move away from the initial training data.
We will describe the regions of the state space within which the sensing is reliable using a safe set which approximates sub-level sets of the error function e(x) = . We make this precise in the following Lemma, defining a safe set which is valid under an assumption that the error function is locally slope bounded around training data.
Lemma 3.1 (Closeness implies generalization). Suppose that is locally S-slope bounded with a radius of r around training datapoints. Define the safe set
Then for any (, the perception error is bounded:
Proof: The proof follows by a simple argument. For training data point () such that
The second line follows from the local slope bounded assumption. Then further choosing to correspond to minimizing over training data in the definition of
around the training data. We remark that this notion of slope boundedness has connections to sector bounded nonlinearities, a classic setting for nonlinear system stability analysis [24]. Deriving the slope boundedness of the error function relies on the learned perception map as well as the underlying generative model, so we propose a second learning step to estimate a bound on S. We use an additional dataset composed of samples around each training data point: , where each
contains N points densely sampled from
). For each
in the training set, we then fix a radius r and estimate S with the maximum observed slope:
By taking the maximum over training datapoints , the slope bound S can be estimated with
Proposition 3.2. Assume that the error function is locally L-Lipschitz for a radius r around a training point
. Further assume that the true maximum slope
is achieved by a point with distance from
greater than 0
estimated with an
Proof: Let
for in the
cover of
). Because S can be achieved at least
away from
, and
is at most
away from x’, we can bound S as
on S. This margin decreases with the number of samples N and depends on an assumption about the local smoothness of the system. This estimation problem is similar to Lipschitz estimation, which has been widely studied with an eye towards optimization [60]. We note that Lipschitz smoothness is a stronger condition than our bounded slope condition, and requires several additional smoothness assumptions to estimate [69]. By combining Lemma 3.1 with Proposition 3.2, we estimate bounds on the perception errors within a neighborhood of the training data. In what follows, we use these local error bounds in robust control.
The local generalization result in Lemma 3.1 is useful only if the system remains close to states visited during training. In this section, we show that robust control can ensure that the system will remain close to training data so long as the perception map generalizes well. By then enforcing that the composition of the two bounds is a contraction, a natural notion of controller robustness emerges that guarantees favorable behavior and generalization. In what follows, we adopt an adversarial noise model and exploit the fact that we can design system behavior to bound how far the system deviates from states visited during training.
Recall that for a state-observation pair (x, z), the perception error, defined as e := , acts as additive noise to the measurement model y = p(z). While standard linear control techniques can handle uniformly bounded errors, more care is necessary to further ensure that the system remains with a safe region of the state space, as determined by the training data. Through a suitable convex reformulation of the safe region, this goal can be addressed through receding horizon strategies (e.g. [45, 66]). While these methods are effective in practice, constructing terminal sets and ensuring a priori that feasible solutions exist is not an easy task. To make explicit connections between learning and control, we turn our analysis to a system level perspective on the closed-loop to characterize its sensitivity to noise.
4.1 System-level parametrization
The system level synthesis (SLS) framework, proposed by Wang et al. [67], provides a parametrization of our problem that makes explicit the effects of errors e on system behavior. Namely, for any controller that is a linear function of the history of system outputs, we can write the state and input as a convolution of the system noise and the closed-loop system responses
As the equation (8) is linear in the system response elements Φ, convex constraints on state and input translate to convex constraints on the system response elements. Wang et al. [67] show that for any elements constrained to obey, for all
there exists a feedback controller that achieves the desired system responses (8), and therefore any optimal control problem over linear systems can be cast as a corresponding optimization problem over system response elements. To lessen the notational burden of working with these system response elements, we will work with their z transforms. This is particularly useful for keeping track of semi-infinite sequences, especially since convolutions in time can now be represented as multiplications, i.e.
The corresponding control law u = Ky is given by . This controller can be implemented via a state-space realization [8] or as an interconnection of the system response elements [67]. The affine realizability constraints can be rewritten as
where we denote the affine space defined by (A, B, C) as A. In this SLS framework, many robust control costs can be written as system norms,
For LQG control, the objective is equivalent to a system norm [72].
A comment on finite-dimensional realizations Although the constraints and objective function in this framework are infinite dimensional, two finite-dimensional approximations have been successfully applied. The first consists of selecting an approximation horizon T, and enforcing that ) = 0 for some appropriately large T, which is always possible for systems that are controllable and observable. When this is not possible, one can instead enforce bounds on the norm of
use robustness arguments to show that the sub-optimality gap incurred by this finite dimensional approximation decays exponentially in the approximation horizon T [9, 14, 22, 44]. Finally, in the interest of clarity, we always present the infinite horizon version of the optimization problems, with the understanding that finite horizon approximations are necessary in practice.
4.2 Robust control for generalization
In what follows, we specialize controller design concerns to our perception-based setting, and develop further conditions on the closed-loop response to incorporate into the synthesis problem. Because system level parametrization makes explicit the effects of perception errors e on system behavior, we can transparently keep track of closed-loop behavior, as made precise in the following Lemma.
Lemma 4.1 (Generalization implies closeness). For a perception map p with errors let the system responses
lie in the affine space defined by dynamics (A, B, C), and let K be the associated controller. Then the state trajectory x achieved by the control law u = Kp(z) and driven by noise process w, satisfies, for any target trajectory
where we define the nominal closeness to be the deviation from the target trajectory in the absence of measurement errors.
Proof: Notice that over the course of a trajectory, we have system outputs y = p(z) = Cx + e. Then recalling that the system responses are defined such that , we have that
to visit states during operation that are similar to those seen during training. The second term is a measure of the robustness of our system to the error e. We are now in a position to state the main result of the paper, which shows that under an additional robustness condition, Lemmas 3.1 and 4.1 combine to define a control invariant set around the training data within which we can bound the perception errors and consequently the performance.
Theorem 4.2. Let the assumptions of Lemmas 3.1 and 4.1 hold and suppose that the training error is bounded, i.e. . Then as long as
the perception errors remain bounded
and the closed-loop trajectory lies within
Proof: Recall that as in the proof of Lemma 3.1, as long as ) for all k and some arbitrary
By assumption, . Substituting this expression into the result of Lemma 4.1, we see that as long as
Next, to ensure that the the radius is bounded, first note that by norm definition, maxr if and only if
. A sufficient condition for this is given by
and thus we arrive at the robustness condition. Since the bound on is valid, we now use it to bound
, starting with (13) and rearranging,
generalization. Feasibility of the synthesis problem depends on the controllability and observability of the system (A, B, C), which impose limits on how small can be made to be, and on the planned deviation from training data as captured by the quantity
4.3 Robust synthesis for waypoint tracking
We now specialize to the task of waypoint tracking to simplify the term and propose a robust control synthesis problem. Recall that waypoint tracking can be encoded by defining the state
] as the concatenation of tracking error and waypoint and the disturbance as the change in reference,
Proposition 4.3. For a reference tracking problem with and a reference trajectory within a ball of radius
from the training data,
Proof: For the training data, we can assume zero tracking errors, so that . Then
]. Thus, we can rewrite
Using this setting, we add the robustness condition in (11) to a control synthesis problem as
Notice the trade-off between the size of different parts of the system response. Because the responses must lie in an affine space, they cannot both become arbitrarily small. Therefore, the synthesis problem must trade off between sensitivity to measurement errors and tracking fidelity. These trade-offs are mediated by the quality of the perception map, through its slope-boundedness and training error, and the ambition of the control task, through its deviation from training data and distances between waypoints.
While the robustness constraint in (11) is enough to guarantee a finite cost, it does not guarantee a small cost. To achieve this, we incorporate the error bound to arrive at the following robust synthesis procedure:
This is a convex program for fixed , so the full problem can then be approximately solved via one-dimensional search.
4.4 Necessity of robustness
Robust control is notoriously conservative, and our main result reslies heavily on small gain-like arguments. Can the conservatism inherent in this approach be generally reduced? In this section, we answer in the negative by describing a class of examples for which the robustness condition in Theorem 4.2 is necessary. For simplicity, we specialize to the goal of regulating a system to the origin, and assume that (0) is in the training set, with
We consider the following optimal control problem
and define ¯as a minimizing argument for the problem without the inequality constraint. For simplicity, we consider only systems in which the closed-loop is strictly proper and has a state-space realization.
Proposition 4.4. Suppose that
Then there exists a differentiable error function with slope bound S such that x = 0 is an asymptotically stable equilibrium if and only if
Proof: Sufficiency follows from our main analysis, or alternatively from a simple application of the small gain theorem. The necessity follows by construction, with a combination of classic nonlinear instability arguments and by considering real stability radii.
Recall that + (
). Define
to represent the internal state of the controller. Then we can write the system’s closed-loop behavior as (ignoring for now the effect of process noise, which does not affect the stability analysis)
Suppose that the error function e(x) is differentiable at x = 0 with derivative J. Then the stability of the origin depends on the linearized closed loop system ) :=
where
0] picks out the relevant component of the closed-loop state. If any eigenvalues of
lie outside of the unit disk, then x = 0 is not an asymptotically stable equilibrium.
Switching gears, recall that ) =
. We define S = 1
and notice that if
, then ¯
is also the solution to the constrained problem. By assumption,
(1)
. Thus, there exist real w, v such that
(1)
. If we set
, then
) has an eigenvalue on the unit disk and x = 0 is not an asymptotically stable equilibrium. To see why this is true, recall the classic result for stability radii (e.g. Corollary 4.5 in Hinrichsen and Pritchard [33]) and notice
Thus, for any error function with derivative at zero, the robustness condition is necessary as well as sufficient. One such error function is simply e(x) = Jx, though many more complex forms exist.
We now present a simple example in which this condition is satisfied, and construct the corresponding error function which results in an unstable system.
Example Consider the double integrator
Figure 2: In (a), visual inputs for the synthetic (left) and CARLA (right) examples. In (b), (left) the
bounded ball the synthetic circle must remain inside in the face of bounded adversarial noise, (center) the nominal trajectory the synthetic circle is driven to follow, and (right) and the nominal trajectory the simulated vehicle is driven to follow.
and the control law resulting from the optimal control problem with identity weighting matrices along with the robustness constraint
. Notice that the solution to the unconstrained problem would be the familiar LQG combination of Kalman filtering and static feedback on the estimated state. We denote the optimal unconstrained system response by ¯
Consider an error function with slope bounded by and derivative at x = 0 equal to
Calculations confirm that the norm satisfies the relevant property and that the origin is not a stable fixed point if the synthesis constraint
We demonstrate our theoretical results with examples of control from pixels, using both simple synthetic images and complex graphics simulation. The synthetic example uses generated 64 64 pixel images of a moving blurry white circle on a black background; the complex example uses 800
600 pixel dashboard camera images of a vehicle in the CARLA simulator platform [25]. Figure 2(a) shows representative images seen by the controllers.
For both visual settings, we demonstrate our results in controlling a 2D double integrator to track a periodic reference. The 2D double integrator state is given by = [(
)
)
], and each component evolves independently according to the dynamics in (14) with dt = 0.1. For all examples, the sensing matrix C extracts the position of the system, i.e.,
]. Training and validation trajectories are generated by driving the system with an optimal state feedback controller (i.e. where measurement y = x) to track a desired reference trajectory w = r + v, where r is a nominal reference, and v is a random norm bounded random perturbation satisfying
We consider three different perception maps: a linear map for the synthetic example and both visual odometry and a convolutional neural net for the CARLA example. For the CNN, we collect a dense set of training examples around the reference to train the model. We use the approach proposed by Coates and Ng [18] to learn a convolutional representation of each image: each resized and scaled image is passed through a single convolutional, ReLU activation, and max pooling layer. We then fit a linear map of these learned image features to position and heading of the camera pose. We require approximately 30000 training points. During operation, pixel-data z is passed through the CNN to obtain position estimates y, which are then used by the robust controller. We note that
Figure 3: (top) Test perception error vs. distance to the nearest training point
; and (bottom) median
tracking error for 200 rollouts using the (a) linear map on synthetic images, (b) SLAM and (c) CNN on CARLA dashboard images. Error bars show upper and lower quartiles.
other more sophisticated architectures for feature extraction would also be reasonable to use in our control framework; we find that this one is conceptually simple and sufficient for tracking around our fixed reference.
To perform visual odometry, we first collect images from known poses around the desired reference trajectory. We then use ORB SLAM [46] to build a global map of visual features and a database of reference images with known poses. This is the “training” phase. We use one trajectory of 200 points; the reference database is approximately this size. During operation, an image z is matched with an image in the database. The reprojection error between the matched features in
with known pose
and their corresponding features in z is then minimized to generate pose estimate y. For more details on standard visual odometry methods, see [59]. We highlight that modern visual SLAM algorithms incorporate sophisticated filtering and optimization techniques for localization in previously unseen environments with complex dynamics; we use a simplified algorithm under this training and testing paradigm in order to better isolate the data dependence.
We then estimate safe regions for all three maps, as described in Section 3. In the top panels of Figure 3 we show the error profiles as a function of distance to the nearest training point for the linear (left), SLAM (middle), and CNN (right) maps. We see that these data-dependent localization schemes exhibit the local slope bounded property posited in the previous section.
We compare robust synthesis to the behavior of nominal controllers that do not take into account the nonlinearity in the measurement model. In particular, we compare the performance of naively synthesized LQG and optimal controllers with controllers designed with the robustness condition (11). LQG is a standard control scheme that explicitly separates state estimation (Kalman Filtering) from control (LQR control), and is emblematic of much of standard control practice.
optimal control minimizes worst case state deviation and control effort by modeling process and sensor errors as
bounded adversarial processes. For further discussion and details, refer to Section 2.1 and Appendix B. The bottom panels of Figure 3 show that the robustly synthesized controllers remain within a bounded neighborhood around the training data. On the other hand, the nominal controllers drive the system away from the training data, leading to a failure of the perception and control loop. We note that although visual odometry may not satisfy smoothness assumptions when the feature detection and matching fails, we nevertheless observe safe system behavior, suggesting that using our robust controller, no such failures occur.
Though standard practice is to treat the output of a perception module as an ordinary signal, we have demonstrated both in theory and experimentally that accounting for the inherent uncertainty of perception based sensors can dramatically improve the performance of the resulting control loop. Moreover, we have shown how to quantify and account for such uncertainties with tractable data-driven safety guarantees. We hope to extend this study to the control of more complex systems, and to apply this framework to standard model-predictive control pipelines which form the basis of much of contemporary control practice.
We further hope to highlight the challenges involved in adapting learning-theoretic notions of generalization to the setting of controller synthesis. First note that if we collect data using one controller, and then use this data to build a new controller, there will be a distribution shift in the observations seen between the two controllers. Any statistical generalization bounds on performance must necessarily account for this shift. Second, from a more practical standpoint, most generalization bounds require knowing instance specific quantities governing properties of the class of functions we use to fit a predictor. Hence, they will include constants that are not measurable in practice. This issue can perhaps be mitigated using some sort of bootstrap technique for post-hoc validation. However, we note that the sort of bounds we aim to bootstrap are worst case, not average case. Indeed, the bootstrap typically does not even provide a consistent estimate of the maximum of independent random variables, see for instance [13], and Ch 9.3 in [17]. Other measures such as conditional value at risk require billions of samples to guarantee five 9s of reliability [54]. We highlight these issues only to point out that adapting statistical generalization to robust control is an active area with many open challenges to be considered in future work.
This work is generously supported in part by ONR awards N00014-17-1-2191, N00014-17-1-2401, and N00014-18-1-2833, the DARPA Assured Autonomy (FA8750-18-C-0101) and Lagrange (W911NF-16-1-0552) programs, a Siemens Futuremakers Fellowship, and an Amazon AWS AI Research Award. SD and VY are supported by an NSF Graduate Research Fellowship under Grant No. DGE 1752814.
[1] Yasin Abbasi-Yadkori and Csaba Szepesvári. Regret bounds for the adaptive control of linear quadratic systems. In Proceedings of the 24th Annual Conference on Learning Theory, pages 1–26, 2011.
[2] Yasin Abbasi-Yadkori, Nevena Lazic, and Csaba Szepesvári. Model-free linear quadratic control via reduction to expert prediction. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 3108–3117, 2019.
[3] Marc Abeille and Alessandro Lazaric. Improved regret bounds for thompson sampling in linear quadratic control problems. In International Conference on Machine Learning, pages 1–9, 2018.
[4] Naman Agarwal, Brian Bullins, Elad Hazan, Sham M Kakade, and Karan Singh. Online control with adversarial disturbances. arXiv preprint arXiv:1902.08721, 2019.
[5] Anayo K Akametalu, Jaime F Fisac, Jeremy H Gillula, Shahab Kaynama, Melanie N Zeilinger, and Claire J Tomlin. Reachability-based safe learning with gaussian processes. In 53rd IEEE Conference on Decision and Control, pages 1424–1431. IEEE, 2014.
[6] Oren Anava, Elad Hazan, Shie Mannor, and Ohad Shamir. Online learning for time series prediction. In Conference on learning theory, pages 172–184, 2013.
[7] Oren Anava, Elad Hazan, and Shie Mannor. Online learning for adversaries with memory: price of past mistakes. In Advances in Neural Information Processing Systems, pages 784–792, 2015.
[8] James Anderson and Nikolai Matni. Structured state space realizations for sls distributed controllers. In 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 982–987. IEEE, 2017.
[9] James Anderson, John C Doyle, Steven Low, and Nikolai Matni. System level synthesis. arXiv preprint arXiv:1904.01634, 2019.
[10] MOSEK ApS. MOSEK Optimizer API for Python Release 9.0.88, 2019. URL https://docs.mosek. com/9.0/pythonapi.pdf.
[11] Somil Bansal, Varun Tolani, Saurabh Gupta, Jitendra Malik, and Claire Tomlin. Combining optimal control and learning for visual navigation in novel environments. arXiv preprint arXiv:1903.02531, 2019.
[12] Felix Berkenkamp, Matteo Turchetta, Angela Schoellig, and Andreas Krause. Safe model-based reinforcement learning with stability guarantees. In Advances in neural information processing systems, pages 908–918, 2017.
[13] Peter J Bickel, David A Freedman, et al. Some asymptotic theory for the bootstrap. The annals of statistics, 9(6):1196–1217, 1981.
[14] Ross Boczar, Nikolai Matni, and Benjamin Recht. Finite-data performance guarantees for the outputfeedback control of an unknown system. In 2018 IEEE Conference on Decision and Control (CDC), pages 2994–2999. IEEE, 2018.
[15] Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, et al. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316, 2016.
[16] Richard Cheng, Gábor Orosz, Richard M Murray, and Joel W Burdick. End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks. arXiv preprint arXiv:1903.08792, 2019.
[17] Michael R Chernick. Bootstrap methods: A guide for practitioners and researchers, volume 619. John Wiley & Sons, 2011.
[18] Adam Coates and Andrew Y Ng. Learning feature representations with k-means. In Neural networks: Tricks of the trade, pages 561–580. Springer, 2012.
[19] Felipe Codevilla, Matthias Miiller, Antonio López, Vladlen Koltun, and Alexey Dosovitskiy. End-to-end driving via conditional imitation learning. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 1–9. IEEE, 2018.
[20] Alon Cohen, Tomer Koren, and Yishay Mansour. Learning linear-quadratic regulators efficiently with onlyarXiv preprint arXiv:1902.06223, 2019.
[21] M. Dahleh and J. Pearson. -optimal feedback controllers for MIMO discrete-time systems. IEEE Transactions on Automatic Control, 32(4):314–322, April 1987. ISSN 0018-9286. doi: 10.1109/TAC.1987. 1104603.
[22] Sarah Dean, Horia Mania, Nikolai Matni, Benjamin Recht, and Stephen Tu. On the sample complexity of the linear quadratic regulator. arXiv preprint arXiv:1710.01688, 2017.
[23] Sarah Dean, Horia Mania, Nikolai Matni, Benjamin Recht, and Stephen Tu. Regret bounds for robust adaptive control of the linear quadratic regulator. In Advances in Neural Information Processing Systems, pages 4192–4201, 2018.
[24] Charles A Desoer and Mathukumalli Vidyasagar. Feedback systems: input-output properties, volume 55. Siam, 1975.
[25] Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. Carla: An open urban driving simulator. arXiv preprint arXiv:1711.03938, 2017.
[26] Salar Fattahi and Somayeh Sojoudi. Data-driven sparse system identification. In 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 462–469. IEEE, 2018.
[27] Salar Fattahi, Nikolai Matni, and Somayeh Sojoudi. Learning sparse dynamical systems from a single sample trajectory. arXiv preprint arXiv:1904.09396, 2019.
[28] Alessandro Giusti, Jérôme Guzzi, Dan C Cireşan, Fang-Lin He, Juan P Rodríguez, Flavio Fontana, Matthias Faessler, Christian Forster, Jürgen Schmidhuber, Gianni Di Caro, et al. A machine learning approach to visual perception of forest trails for mobile robots. IEEE Robotics and Automation Letters, 1(2):661–667, 2015.
[29] Moritz Hardt, Tengyu Ma, and Benjamin Recht. Gradient descent learns linear dynamical systems. The Journal of Machine Learning Research, 19(1):1025–1068, 2018.
[30] Babak Hassibi and Thomas Kaliath. H-bounds for least-squares estimators. IEEE Transactions on Automatic Control, 46(2):309–314, 2001.
[31] Joel A Hesch, Dimitrios G Kottas, Sean L Bowman, and Stergios I Roumeliotis. Camera-imu-based localization: Observability analysis and consistency improvement. The International Journal of Robotics Research, 33(1):182–201, 2014.
[32] Lukas Hewing and Melanie N Zeilinger. Cautious model predictive control using gaussian process regression. arXiv preprint arXiv:1705.10702, 2017.
[33] Diederich Hinrichsen and Anthony J Pritchard. Real and complex stability radii: a survey. In Control of uncertain systems, pages 119–162. Springer, 1990.
[34] Eagle S Jones and Stefano Soatto. Visual-inertial navigation, mapping and localization: A scalable real-time causal approach. The International Journal of Robotics Research, 30(4):407–430, 2011.
[35] Jonathan Kelly and Gaurav S Sukhatme. Visual-inertial sensor fusion: Localization, mapping and sensor-to-sensor self-calibration. The International Journal of Robotics Research, 30(1):56–79, 2011.
[36] Vitaly Kuznetsov and Mehryar Mohri. Time series prediction and online learning. In Conference on Learning Theory, pages 1190–1213, 2016.
[37] Alexander Lambert, Amirreza Shaban, Amit Raj, Zhen Liu, and Byron Boots. Deep forward and inverse perceptual models for tracking and prediction. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 675–682. IEEE, 2018.
[38] Sergey Levine, Chelsea Finn, Trevor Darrell, and Pieter Abbeel. End-to-end training of deep visuomotor policies. The Journal of Machine Learning Research, 17(1):1334–1373, 2016.
[39] Yi Lin, Fei Gao, Tong Qin, Wenliang Gao, Tianbo Liu, William Wu, Zhenfei Yang, and Shaojie Shen. Autonomous aerial navigation using monocular visual-inertial fusion. Journal of Field Robotics, 35(1): 23–51, 2018.
[40] Giuseppe Loianno, Chris Brunner, Gary McGrath, and Vijay Kumar. Estimation, control, and planning for aggressive flight with a small quadrotor with a single camera and imu. IEEE Robotics and Automation Letters, 2(2):404–411, 2016.
[41] Simon Lynen, Markus W Achtelik, Stephan Weiss, Margarita Chli, and Roland Siegwart. A robust and modular multi-sensor fusion approach applied to mav navigation. In 2013 IEEE/RSJ international conference on intelligent robots and systems, pages 3923–3929. IEEE, 2013.
[42] Simon Lynen, Torsten Sattler, Michael Bosse, Joel A Hesch, Marc Pollefeys, and Roland Siegwart. Get out of my lab: Large-scale, real-time visual-inertial localization. In Robotics: Science and Systems, 2015.
[43] Horia Mania, Stephen Tu, and Benjamin Recht. Certainty equivalent control of lqr is efficient. arXiv preprint arXiv:1902.07826, 2019.
[44] Nikolai Matni, Yuh-Shyang Wang, and James Anderson. Scalable system level synthesis for virtually localizable systems. In 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pages 3473–3480. IEEE, 2017.
[45] David Q Mayne, SV Raković, Rolf Findeisen, and Frank Allgöwer. Robust output feedback model predictive control of constrained linear systems. Automatica, 42(7):1217–1222, 2006.
[46] Raul Mur-Artal and Juan D. Tardós. ORB-SLAM2: an open-source SLAM system for monocular, stereo and RGB-D cameras. CoRR, abs/1610.06475, 2016. URL http://arxiv.org/abs/1610.06475.
[47] C. J. Ostafew, A. P. Schoellig, and T. D. Barfoot. Learning-based nonlinear model predictive control to improve vision-based mobile robot path-tracking in challenging outdoor environments. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pages 4029–4036, May 2014. doi: 10.1109/ICRA.2014.6907444.
[48] Y. Ouyang, M. Gagrani, and R. Jain. Control of unknown linear systems with thompson sampling. In 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 1198–1205, Oct 2017. doi: 10.1109/ALLERTON.2017.8262873.
[49] Samet Oymak and Necmiye Ozay. Non-asymptotic identification of lti systems from a single trajectory. arXiv preprint arXiv:1806.05722, 2018.
[50] Yunpeng Pan, Ching-An Cheng, Kamil Saigol, Keuntaek Lee, Xinyan Yan, Evangelos Theodorou, and Byron Boots. Agile autonomous driving using end-to-end deep imitation learning. Proceedings of Robotics: Science and Systems. Pittsburgh, Pennsylvania, 2018.
[51] José Pereira, Morteza Ibrahimi, and Andrea Montanari. Learning networks of stochastic differential equations. In Advances in Neural Information Processing Systems, pages 172–180, 2010.
[52] Dean A Pomerleau. Alvinn: An autonomous land vehicle in a neural network. In Advances in neural information processing systems, pages 305–313, 1989.
[53] Anders Rantzer. Concentration bounds for single parameter adaptive control. In 2018 Annual American Control Conference (ACC), pages 1862–1866. IEEE, 2018.
[54] R Tyrrell Rockafellar, Stanislav Uryasev, et al. Optimization of conditional value-at-risk. Journal of risk, 2:21–42, 2000.
[55] Daniel J. Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, and Zheng Wen. A tutorial on thompson sampling. Foundations and Trends on Machine Learning, 11(1):1–96, July 2018. doi: 10.1561/2200000070.
[56] Fereshteh Sadeghi and Sergey Levine. Cad2rl: Real single-image flight without a single real image. arXiv preprint arXiv:1611.04201, 2016.
[57] Tuhin Sarkar and Alexander Rakhlin. How fast can linear dynamical systems be learned? arXiv preprint arXiv:1812.01251, 2018.
[58] Tuhin Sarkar, Alexander Rakhlin, and Munther A Dahleh. Finite-time system identification for partially observed lti systems of unknown order. arXiv preprint arXiv:1902.01848, 2019.
[59] Davide Scaramuzza and Friedrich Fraundorfer. Visual odometry [tutorial]. IEEE robotics & automation magazine, 18(4):80–92, 2011.
[60] Yaroslav D Sergeyev and Dmitri E Kvasov. L ipschitz global optimization. Wiley Encyclopedia of Operations Research and Management Science, 2010.
[61] Max Simchowitz, Horia Mania, Stephen Tu, Michael I Jordan, and Benjamin Recht. Learning without mixing: Towards a sharp analysis of linear system identification. In Conference On Learning Theory, pages 439–473, 2018.
[62] Sarah Tang, Valentin Wüest, and Vijay Kumar. Aggressive flight with suspended payloads using vision-based control. IEEE Robotics and Automation Letters, 3(2):1152–1159, 2018.
[63] Andrew J Taylor, Victor D Dorobantu, Hoang M Le, Yisong Yue, and Aaron D Ames. Episodic learning with control lyapunov functions for uncertain robotic systems. arXiv preprint arXiv:1903.01577, 2019.
[64] Anastasios Tsiamis and George J Pappas. Finite sample analysis of stochastic system identification. arXiv preprint arXiv:1903.09122, 2019.
[65] Kim P Wabersich and Melanie N Zeilinger. Linear model predictive safety certification for learning-based control. In 2018 IEEE Conference on Decision and Control (CDC), pages 7130–7135. IEEE, 2018.
[66] Zhaoyang Wan and Mayuresh V Kothare. Robust output feedback model predictive control using off-line linear matrix inequalities. Journal of Process Control, 12(7):763–774, 2002.
[67] Yuh-Shyang Wang, Nikolai Matni, and John C Doyle. A system level approach to controller synthesis. arXiv preprint arXiv:1610.04815, 2016.
[68] Grady Williams, Paul Drews, Brian Goldfain, James M Rehg, and Evangelos A Theodorou. Informationtheoretic model predictive control: Theory and applications to autonomous driving. IEEE Transactions on Robotics, 34(6):1603–1622, 2018.
[69] GR Wood and BP Zhang. Estimation of the lipschitz constant of a function. Journal of Global Optimization, 8(1):91–103, 1996.
[70] Sholeh Yasini and Kristiaan Pelckmans. Worst-case prediction performance analysis of the kalman filter. IEEE Transactions on Automatic Control, 63(6):1768–1775, 2018.
[71] Seungil You and Ather Gattami. H infinity analysis revisited. arXiv preprint arXiv:1412.6160, 2014.
[72] Kemin Zhou, John Comstock Doyle, Keith Glover, et al. Robust and optimal control, volume 40. Prentice Hall, 1996.
Table 1: Different noise model classes induce different cost functions, and can be used to model different phenomenon, or combinations thereof [21, 72].
Table 1 shows several common cost functions that arise from different system desiderata and different classes of disturbances and measurement errors ). We recall the power norm,
We now recall some examples from linear optimal control in the partially observed setting.
Example A.1 (Linear Quadratic Regulator). Suppose that the cost function is given by
for some user-specified positive definite matrices Q and , and that the controller is given full information about the system, i.e., that C = I and
= 0 such that the measurement model collapses to
. Then the optimal control problem reduces to the familiar Linear Quadratic Regulator (LQR) problem
For stabilizable (A, B), and detectable (A, Q), this problem has a closed-form stabilizing controller based on the solution of the discrete algebraic Riccati equation (DARE) [72]. This optimal control policy is linear, and given by
where P is the positive-definite solution to the DARE defined by (A, B, Q, R).
Example A.2 (Linear Quadratic Gaussian Control). Suppose that we have the same setup as the previous example, but that now the measurement is instead given by (4) for some C such that the pair (A, C) is detectable, and that ). Then the optimal control problem reduces to the Linear Quadratic Gaussian (LQG) control problem, the solution to which is:
where ˆis the Kalman filter estimate of the state at time k. The steady state update rule for the state estimate is given by
for filter gain +
where P is the solution to the DARE defined by (
). This optimal output feedback controller satisfies the separation principle, meaning that the optimal controller
is computed independently of the optimal estimator gain
These first two examples are widely known due to the elegance of their closed-form solutions and the simplicity of implementing the optimal controllers. However, this optimality rests on stringent assumptions about the distribution of the disturbance and the measurement noise. We now turn to an example, first introduced in Example 2.2, for which disturbances are adversarial and the separation principle fails.
Consider a waypoint tracking problem where it is known that both the distances between waypoints and sensor errors are instantaneously bounded, and we want to ensure that the system remains within a bounded distance of the waypoints. In this setup, the
optimal control problem is most natural, and our cost function is then
for some user-specified positive definite matrices . Then if the optimal cost is less than 1, we can guarantee that
for all possible realizations of the waypoint and sensor error processes. Considering the one-step lookahead case,
we can define the augmented state
= [
] and pose the problem with bounded disturbances
. We can then formulate the following
optimal control problem,
This optimal control problem is an instance of robust control [21]. The optimal controller does not obey the separation principle, and as such, there is no clear notion of an estimated state.
The robust SLS procedure we propose and analyze requires solving a finite dimensional approximation to an infinite dimensional optimization problem, as and the corresponding constraints (9) and objective function are infinite dimensional objects. As an approximation, we restrict the system responses
to be finite impulse response (FIR) transfer matrices of length T = 200, i.e., we enforce that
) = 0. We then solve the resulting optimization problem with MOSEK under an academic license [10]. More explicitly, we define vec(F ) :=
are the FIR coefficients of the system responses. We further define
The SLS constraints (9) and FIR condition are then enforced as
We then solve the following optimization problem
where the cost() and norm(
) operators are problem dependent. For the
robust problem, both the cost function and robust norm constraint reduce to the
induced matrix norm for an FIR transfer response F with coefficients in
where vec(denotes the ith row of vec(F ). For the robust LQG problem, the cost function reduces to the Frobenius norm for an FIR cost transfer matrix Z, i.e.,
The robustness constraint is the same induced matrix norm as in the
control problem.