Learning Stabilizable Nonlinear Dynamics with Contraction-Based Regularization

2019·Arxiv

Abstract

We propose a novel framework for learning stabilizable nonlinear dynamical systems for continuous control tasks in robotics. The key contribution is a control-theoretic regularizer for dynamics fitting rooted in the notion of stabilizability, a constraint which guarantees the existence of robust tracking controllers for arbitrary open-loop trajectories generated with the learned system. Leveraging tools from contraction theory and statistical learning in Reproducing Kernel Hilbert Spaces, we formulate stabilizable dynamics learning as a functional optimization with convex objective and bi-convex functional constraints. Under a mild structural assumption and relaxation of the functional constraints to sampling-based constraints, we derive the optimal solution with a modified Representer theorem. Finally, we utilize random matrix feature approximations to reduce the dimensionality of the search parameters and formulate an iterative convex optimization algorithm that jointly fits the dynamics functions and searches for a certificate of stabilizability. We validate the proposed algorithm in simulation for a planar quadrotor, and on a quadrotor hardware testbed emulating planar dynamics. We verify, both in simulation and on hardware, significantly improved trajectory generation and tracking performance with the control-theoretic regularized model over models learned using traditional regression techniques, especially when learning from small supervised datasets. The results support the conjecture that the use of stabilizability constraints as a form of regularization can help prune the hypothesis space in a manner that is tailored to the downstream task of trajectory generation and feedback control, resulting in models that are not only dramatically better conditioned, but also data efficient.

1 Introduction

The problem of efficiently and accurately estimating an unknown dynamical system,

$$\dot{x}(t) = F(x(t), u(t)), \tag{1}$$

from a small set of sampled trajectories, where $x \in \mathbb{R}^n$ is the state and $u \in \mathbb{R}^m$ is the control input, is a central task in model-based Reinforcement Learning (RL). In this setting, a robotic agent strives to pair an estimated dynamics model with a feedback policy in order to act optimally in a dynamic and uncertain environment. The model of the dynamical system can be continuously updated as the robot experiences the consequences of its actions, and the improved model can be leveraged for different tasks, affording a natural form of transfer learning. When it works, model-based RL typically offers major improvements in sample efficiency in comparison to state-of-the-art model-free methods such as Policy Gradients (Chua et al., 2018; Nagabandi et al., 2017) that do not explicitly estimate the underlying system. Yet, all too often, when standard supervised learning with powerful function approximators such as Deep Neural Networks and Kernel Methods is applied to model complex dynamics, the resulting controllers do not perform on par with model-free RL methods in the limit of increasing sample size, due to compounding errors across long time horizons. The main goal of this paper is to develop a new control-theoretic regularizer for dynamics fitting rooted in the notion of stabilizability, which guarantees the existence of a robust tracking controller for arbitrary open-loop trajectories generated with the learned system.

Problem Statement: The motion planning task we wish to solve is to compute a (possibly non-stationary) policy mapping state and time to control that drives any given initial state to a desired compact goal region, while satisfying state and control input constraints, and minimizing some task-specific performance cost (e.g., control effort and time to completion). However, in this work, we assume that the dynamics function F(x, u) is unknown to us and we are instead provided with a dataset of tuples $\{(x_i, u_i, \dot{x}_i)\}_{i=1}^{N}$ taken from a collection of observed trajectories (e.g., expert demonstrations) on the robot. Accordingly, the objective of this work is to learn a dynamics model $\hat{F}(x, u)$ for the robot that is subsequently amenable for use within standard planning algorithms.

Approach Overview: Our parametrization of the policy takes the form $\pi(x, t) = u^*(t) + k(x^*(t), x(t))$, where $(x^*(t), u^*(t))$ is a nominal open-loop state-input control trajectory tuple, and $k(\cdot, \cdot)$ is a feedback tracking controller. The performance of such a policy, however, is strongly reliant upon the quality of the computed state-input trajectory and the tracking controller.

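Formally, a reference state-input trajectory tuple $(x^*(t), u^*(t))$, $t \in [0, T]$, for system (1) is termed exponentially stabilizable at rate $\lambda > 0$ if there exists a feedback controller $k(\cdot, \cdot)$ such that the solution x(t) of the system

$$\dot{x}(t) = F\big(x(t),\, u^*(t) + k(x^*(t), x(t))\big)$$

converges exponentially to $x^*(t)$ at rate $\lambda$, that is,

$$\|x(t) - x^*(t)\|_2 \leq C e^{-\lambda t}\, \|x(0) - x^*(0)\|_2$$

for some constant C > 0. The system (1) is termed exponentially stabilizable at rate $\lambda$ in an open, connected, bounded region $\mathcal{X} \subset \mathbb{R}^n$ if all state trajectories $x^*(t)$ satisfying $x^*(t) \in \mathcal{X}$ for all $t \in [0, T]$ are exponentially stabilizable at rate $\lambda$.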
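As a concrete reading of this definition, the following minimal sketch checks the bound $\|x(t) - x^*(t)\| \leq C e^{-\lambda t} \|x(0) - x^*(0)\|$ on sampled trajectories. The function name `satisfies_exponential_bound`, the small numerical tolerance, and the toy trajectories are illustrative assumptions and are not taken from the paper; a sampled check of this kind can only falsify the bound on the given samples, not certify stabilizability.

```python
import numpy as np

def satisfies_exponential_bound(t, x, x_ref, lam, C):
    """Check ||x(t) - x*(t)|| <= C * exp(-lam * t) * ||x(0) - x*(0)|| at every sample."""
    t = np.asarray(t, dtype=float)
    # Tracking error norm at each sample time (rows are time samples).
    err = np.linalg.norm(np.asarray(x) - np.asarray(x_ref), axis=1)
    bound = C * np.exp(-lam * t) * err[0]
    # Small tolerance (an assumption) guards against round-off at t = 0.
    return bool(np.all(err <= bound + 1e-12))

# Toy example: the error decays like exp(-2 t) around a zero reference.
t = np.linspace(0.0, 5.0, 51)
x_ref = np.zeros((t.size, 2))
x = np.outer(np.exp(-2.0 * t), np.array([1.0, -1.0]))

ok_rate_1 = satisfies_exponential_bound(t, x, x_ref, lam=1.0, C=1.0)  # slower rate: holds
ok_rate_3 = satisfies_exponential_bound(t, x, x_ref, lam=3.0, C=1.0)  # faster rate: violated
```

In the toy example the error decays at rate 2, so the bound holds for rate 1 and fails for rate 3 with C = 1.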

In this work, we illustrate that naïve regression techniques used to estimate the dynamics model from a small set of sample trajectories can yield model estimates that are severely ill-conditioned for trajectory generation and feedback control. Instead, this work advocates for the use of a constrained regression approach in which one attempts to solve the following problem:

$$\min_{\hat{F} \in \mathcal{H}} \ \sum_{i=1}^{N} \big\|\hat{F}(x_i, u_i) - \dot{x}_i\big\|_2^2 + \mu \|\hat{F}\|_{\mathcal{H}}^2 \quad \text{subject to} \quad \hat{F} \text{ is stabilizable},$$

where $\mathcal{H}$ is an appropriate normed function space and $\mu > 0$ is a regularization parameter. Note that we use $(\hat{\cdot})$ to differentiate the learned dynamics from the true dynamics. We demonstrate that for systems that are indeed stabilizable, enforcing such a constraint drastically prunes the hypothesis space, and therefore plays the role of a control-theoretic regularizer that is potentially more powerful and, ultimately, more pertinent for the downstream control task of generating and tracking new trajectories.

Statement of Contributions: Stabilizability of trajectories is not only a complex task in non-linear control, but also a difficult notion to capture (in an algebraic sense) within a unified control theory. In this work, we leverage recent advances in contraction theory for control design through the use of Control Contraction Metrics (CCMs) (Manchester and Slotine, 2017; Singh et al., 2017) that turn stabilizability constraints into convex state-dependent Linear Matrix Inequalities (LMIs). Contraction theory (Lohmiller and Slotine, 1998) is a method of analyzing nonlinear systems in a differential framework, i.e., via the associated variational system (Crouch and van der Schaft, 1987, Chp 3), and is focused on the study of convergence between pairs of state trajectories towards each other. Thus, at its core, contraction explores a stronger notion of stability – that of incremental stability between solution trajectories, instead of the stability of an equilibrium point or invariant set. Importantly, we harness recent results in (Manchester et al., 2015; Manchester and Slotine, 2017; Singh et al., 2017) that illustrate how to use contraction theory to obtain a certificate for trajectory stabilizability and an accompanying tracking controller with exponential stability properties. For self containment, we provide a brief summary of these results in Section 3, which in turn will form the foundation of this work.

Our paper makes the following primary contributions.

• We formulate the learning stabilizable dynamics problem through the lens of control contraction metrics (Section 4). The resulting optimization problem is not only infinite-dimensional, as it is formulated over function spaces, but also infinitely-constrained due to the state-dependent LMI representing the stabilizability constraint.

• Under an arguably weak assumption on the structural form of the true dynamics model and a relaxation of the functional constraints to sampling-based constraints (Section 5), we derive a Representer Theorem (Schölkopf and Smola, 2001) specifying the form of the optimal solutions for the dynamics functions and the certificate of stabilizability by leveraging the powerful framework of vector-valued Reproducing Kernel Hilbert Spaces (Section 6). We motivate the sampling-based relaxation of the functional constraints from a standpoint of viewing the stabilizability condition as a novel control-theoretic regularizer for dynamics learning.

• By leveraging theory from randomized matrix feature approximations, we derive a tractable algorithm leveraging alternating convex optimization problems and adaptive sampling to iteratively solve a finite-dimensional optimization problem (Section 7).

• We perform an extensive set of numerical simulations on a 6-state, 2-input planar quadrotor model and provide a comprehensive study of various aspects of the iterative algorithm. Specifically, we demonstrate that naïve regression-based dynamics learning can yield estimated models that generate completely unstabilizable trajectories. In contrast, the control-theoretic regularized model generates vastly superior quality trackable trajectories, especially when learning from small supervised datasets (Sections 2.1 and 7.2).

• We validate our algorithm on a quadrotor testbed (Section 8) with partially closed control loops to emulate a planar quadrotor, where we verify that the stabilizability regularization effects in low-data regimes observed in simulations do indeed generalize to real-world noisy data. In particular, with just 150 noisy tuples of $(x, u, \dot{x})$, we are able to stably track a challenging test trajectory, which is generated with the learned model and substantially different from any of the training data. In contrast, a model learned using traditional regression techniques leads to consistently unstable behavior and eventual failure as the quadrotor repeatedly flips out of control and crashes (see Figure 1).

Figure 1: Time-lapse of a quadrotor trying to execute a figure-eight maneuver (blue curve) using a reference trajectory and an LQR feedback tracking controller generated using the learned dynamical system. Left: Model learned using traditional ridge-regression; Right: Model learned using control-theoretic regularization proposed within this work. The models were trained with the same, extremely limited (150 points) set of $(x, u, \dot{x})$ supervisory tuples. The quadrotor consistently failed and crashed into the floor with the trajectory and controller generated by the model learned with ridge-regression; the red triangles mark the points along the reference and actual trajectories at the moment of crash, a separation of 1.6 m. In contrast, despite imperfect tracking (not unexpected given the extremely limited amount of supervision given to the learning algorithm), which leads to a slight graze along the floor at one point during the maneuver, the quadrotor manages to maintain bounded tracking error while using the model learned with control-theoretic regularization.

A preliminary version of this paper was presented at WAFR 2018 (Singh et al., 2018). In this revised and extended version, we include the following additional contributions: (i) rigorous derivation of the stabilizability-regularized finite-dimensional optimization problem using RKHS theory and random matrix features; (ii) extensive additional numerical studies into the convergence behavior of the iterative algorithm and comparison with traditional ridge-regression techniques; and (iii) validation of the algorithm on a quadrotor testbed with partially closed control loops to emulate a planar quadrotor.

Related Work: Model-based RL has enjoyed considerable success in various application domains within robotics such as underwater vehicles (Cui et al., 2017), soft robotic manipulators (Thuruthel et al., 2019), and control of agents with non-stationary dynamics (Ohnishi et al., 2019). While the literature on model-based RL is substantial (see (Polydoros and Nalpantidis, 2017) for a recent review), we focus our attention on five broad categories relevant to the problem we address in this work. Namely, these are: (i) direct regression for learning the full dynamics, where one ignores any control-theoretic notions tied to the learning task and treats dynamics estimation as a standard regression problem; (ii) residual learning, where one only attempts to learn corrections to a nominal prediction model that may have been derived, for example, from physics-based reasoning; (iii) uncertainty-aware model-based RL, where one tries to additionally represent the uncertainty in the learned model using probabilistic representations that are subsequently leveraged within the planning phase using robust or stochastic control techniques; (iv) hybrid model-based/model-free methods; and (v) imitation learning, where one learns dynamical representations of stable closed-loop behavior for a set of outputs (e.g., the end-effector on a robotic arm), and assumes knowledge of the robot controlled dynamics to realize the learned closed-loop motion, for instance, using dynamic inversion.

The simplest approach to learning dynamics is to ignore stabilizability and treat the problem as a standard one-step time series regression task (Punjani and Abbeel, 2015; Bansal et al., 2016; Nagabandi et al., 2017; Polydoros and Nalpantidis, 2017). However, coarse dynamics models trained on limited training data typically generate trajectories that rapidly diverge from expected paths, inducing controllers that are ineffective when applied to the true system. This divergence can be reduced by expanding the training data with corrections to boost multi-step prediction accuracy (Venkatraman et al., 2015, 2016). Despite being effective, these methods are still heuristic in the sense that the existence of a stabilizing feedback controller is not explicitly guaranteed. Alternatively, one can leverage strong physics-based priors and use learning to only regress the unmodeled dynamics. For instance, (Mohajerin et al., 2019; Shi et al., 2019; Punjani and Abbeel, 2015) aim to capture the unmodeled aerodynamic disturbance terms as corrections to a prior rigid body dynamics model. (Punjani and Abbeel, 2015) accomplish this for helicopter dynamics using a deep neural network, but then do not use the learned model for control. (Shi et al., 2019) attempt to capture the unmodeled ground-effect forces on quadrotors to build better controllers for near-ground tracking and precision landing. (Mohajerin et al., 2019) leverage a residual RNN in combination with a rigid-body model to generate time-series predictions for linear and angular velocities of a quadrotor as a function of current state and candidate future motor inputs, but do not use the model for closed-loop control. 
Finally, (Zhou et al., 2017) adopt a different perspective to learning “corrections” in that they attempt to learn the inverse dynamics (output to reference) for a system and pre-cascade the resulting predictions to correct an existing controller’s reference signal in order to improve trajectory tracking performance. The approach relies on the existence of a stabilizing controller and the stability of the system’s zero dynamics, thereby decoupling the effects of learning from stability. In similar spirit, (Taylor et al., 2019) leverage input-output feedback linearization to derive a Control Lyapunov Function (CLF) for a nominal dynamics model, assume that this function is a CLF for the actual dynamics as well, and regress only the correction terms in the derivative of this CLF. While leveraging physics-based priors can certainly be powerful, especially when the residual errors to be learned are small enough such that the system is feedback stabilizable with a controller derived from the physics model, in this work we are interested in the far more challenging scenario when such priors are unavailable and the full dynamics model must be learned from scratch. While exemplified using quadrotor models that can certainly be accurately stabilized even in the absence of learning, the insights provided in this work shed light on fundamental topics in the context of control-theoretic learning, which hopefully may influence dynamics-learning methods in more complex settings where priors are unavailable or too simple to be useful for adequate control.

An alternative strategy to cope with error in the learned dynamics model is to use uncertainty-aware model-based RL where control policies are optimized with respect to stochastic rollouts from probabilistic dynamics models (Kocijan et al., 2004; Kamthe and Deisenroth, 2018; Deisenroth and Rasmussen, 2011; Chua et al., 2018). For instance, PILCO (Deisenroth and Rasmussen, 2011) leverages a Gaussian Process (GP) state transition model and moment matching to analytically estimate the expected cost of a rollout with respect to the induced distribution. (Kamthe and Deisenroth, 2018) extend this formulation using nonlinear model predictive control (MPC) to incorporate chance constraints. (Chua et al., 2018) leverage an ensemble of probabilistic models to capture both epistemic (i.e., model) and aleatoric (i.e., intrinsic) uncertainty, and compute their control policy in receding horizon fashion through finite sample approximation of the random cost. Probabilistic models such as GPs may also be used to capture the residual error between a nominal physics-based model and the true dynamics. In (Ostafew et al., 2016), a GP is incrementally learned over multiple trials to capture unmodeled disturbances. The 3σ prediction range is subsequently leveraged to formulate chance constraints as a robust nonlinear MPC problem. The goal of (Fisac et al., 2017) and (Berkenkamp et al., 2017) is motivated from a safety perspective, where one wishes to actively learn a control policy while remaining “safe” in the presence of unmodeled dynamics, represented as GPs. The authors in (Fisac et al., 2017) leverage Hamilton-Jacobi reachability analysis to give high-probability invariance guarantees for a region of the state-space within which the learning controller is free to explore. On the other hand, (Berkenkamp et al., 2017) utilize Lyapunov analysis and smoothness arguments to incrementally grow the Lyapunov function’s region of attraction while simultaneously updating the GP.
For the special case where the underlying dynamics are linear-time-invariant, (Dean et al., 2019) derive high-probability convergence rates for the estimated model and leverage system-level robust control techniques (Wang et al., 2019) for guaranteeing state and control constraint satisfaction.

While utilizing probabilistic prediction models along with a control strategy that incorporates this uncertainty, such as robust or approximate stochastic MPC, can certainly help guard against imperfect dynamics models, large uncertainty in the dynamics can lead to overly conservative strategies. This is true especially when the learned model is not merely a correction or residual term, or if the probabilistic model is computationally intractable to use within planning (e.g., GPs without additional sparsifying simplifications), thereby forcing conservative approximations. Finally, with the exception of the “safe” RL methods mentioned above, the learning algorithms themselves do not incorporate knowledge of the downstream application of the function being regressed, in that learning is viewed purely from a statistical point-of-view, rather than within a control-theoretic context.

More recently, hybrid combinations of model-based and model-free techniques have gained attention within the learning community. The authors in (Bansal et al., 2017) use Bayesian optimization to find an optimal linear dynamics model whose induced MPC policy minimizes the task-specific cost. In similar spirit, (Amos et al., 2018) differentiate through the fixed-point solutions of a parametric MPC problem to find optimal MPC cost and dynamics functions in order to minimize the actual task-specific cost. (Nagabandi et al., 2017) use behavioral cloning with respect to an MPC policy generated from a learned dynamics model to initialize model-free policy fine-tuning. The works in (Levine et al., 2016; Finn et al., 2016; Chebotar et al., 2017) leverage subroutines where local time-varying dynamics are fitted around a set of policy rollouts, and then used to perform trajectory optimization via an LQR backward pass. The induced local linear-time-varying policy from this rollout is then used as a supervisory signal for global policy optimization. While these lines of work try to frame dynamics fitting within the downstream context of the task, thereby imbuing the resulting learning algorithm with a more closed-loop flavor, the learned dynamics may be substantially different from the actual dynamics of the robot since, with the exception of the local time-varying dynamics fitting, the true goal is to optimize the task-specific cost. This can yield distorted dynamic models whose induced policies are more cost-optimal than policies extracted from the true dynamics. Thus, while the work presented herein espouses a closed-loop learning ideology, it does so from the control-theoretic perspective of trajectory stabilizability, i.e., the true objective is dynamics fitting which will subsequently be used to derive optimal trajectories and tracking controllers.

Finally, we address lines of work closest in spirit to this work. Learning dynamical systems satisfying some desirable stability properties (such as asymptotic stability about an equilibrium point, e.g., for point-to-point motion) has been studied in the autonomous case, $\dot{x}(t) = f(x(t))$, in the context of imitation learning. In this line of work, one assumes perfect knowledge and invertibility of the robot’s controlled dynamics to solve for the input that realizes this desirable closed-loop motion (Lemme et al., 2014; Khansari-Zadeh and Khatib, 2017; Ravichandar et al., 2017; Khansari-Zadeh and Billard, 2011; Medina and Billard, 2017). In particular, for a vector-valued RKHS formulation in the autonomous case with constant (identity) contraction metric, see (Sindhwani et al., 2018). Crucially, in our work, we do not require knowledge or invertibility of the robot’s controlled dynamics. We seek to learn the full controlled dynamics of the robot, under the constraint that the resulting learned dynamics generate dynamically feasible and, most importantly, stabilizable trajectories. Thus, this work generalizes existing literature by additionally incorporating the controllability limitations of the robot within the learning problem.

The tools we develop may also be used to extend standard adaptive robot control design, such as (Slotine and Li, 1987), a technique which achieves stable concurrent learning and control using a combination of physical basis functions and general mathematical expansions, e.g., radial basis function approximations (Sanner and Slotine, 1992). Notably, our work allows us to handle complex underactuated systems, a consequence of the significantly more powerful function approximation framework developed herein, as well as of the use of a differential (rather than classical) Lyapunov-like setting, as we shall detail.

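Notation: Let $\mathbb{S}_n$ be the set of symmetric matrices in $\mathbb{R}^{n \times n}$, and denote $\mathbb{S}_n^{\geq 0}$, respectively $\mathbb{S}_n^{> 0}$, as the set of symmetric positive semi-definite, respectively, positive definite matrices in $\mathbb{R}^{n \times n}$. Given a matrix $X$, let $\hat{X} := X + X^T$. We denote the components of a vector $y \in \mathbb{R}^n$ as $y^j$, $j = 1, \ldots, n$, its Euclidean norm as $\|y\|$, and its weighted norm as $\|y\|_A := \sqrt{y^T A y}$ for $A \in \mathbb{S}_n^{> 0}$. Let $\partial_y M(x)$ denote a matrix with $(i, j)$ entry given by the Lie derivative of the function $M_{ij}(x)$ along the vector y. Finally, let $\bar{\lambda}(A)$ and $\underline{\lambda}(A)$ denote the maximum and minimum eigenvalues of a square matrix A.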

2 Problem Formulation and Solution Methodology

In this section we formally outline the structure of the problem we wish to solve and describe a general solution methodology rooted in model-based RL. To motivate the contributions of this work, we additionally present an attempt at a solution that uses traditional model-fitting techniques, and demonstrate how it fails to capture the nuances of the problem and ultimately yields sub-par results.

Consider a robotic system with state $x \in \mathcal{X}$, where $\mathcal{X}$ is an open, connected, bounded subset of $\mathbb{R}^n$, and control $u \in \mathcal{U}$, where $\mathcal{U}$ is a closed, bounded subset of $\mathbb{R}^m$, governed by the following continuous-time dynamical system:

$$\dot{x}(t) = F(x(t), u(t)),$$

where F is Lipschitz continuous in the state for fixed control, so that for any measurable control function $u(t)$, there exists a unique state trajectory. The motion planning task we wish to solve is to find a (possibly non-stationary) policy $\pi(x, t)$ that (i) drives the state x to a compact region $\mathcal{X}_{\mathrm{goal}} \subset \mathcal{X}$, (ii) satisfies the state and input constraints, and (iii) minimizes a quadratic cost over the horizon ending at the first time $x(t)$ enters $\mathcal{X}_{\mathrm{goal}}$.

While there exist several methods in the literature on how to solve this problem given knowledge of the dynamical system, in this work, we assume that we do not know the governing model F(x, u). The problem we wish to address is how to solve the above motion planning task, given a dataset of tuples $\{(x_i, u_i, \dot{x}_i)\}_{i=1}^{N}$ from observed trajectories on the robot.

The solution approach presented in this work adopts the model-based RL paradigm, whereby one first estimates a model of the dynamical system $\hat{F}(x, u)$ using some form of regression, and then uses the learned model to solve the motion planning task with traditional planning algorithms. In this work, our strategy to solve the planning task is to parameterize general state-feedback policies as a sum of a nominal (open-loop) input $u^*$ and a feedback term designed to track the nominal state trajectory $x^*$ (induced by $u^*$):

$$\pi(x(t), t) = u^*(t) + k(x^*(t), x(t)).$$

This formulation represents a compromise between the general class of state-feedback control laws (a computationally intractable space over which to optimize) and a purely open-loop formulation (i.e., no tracking). Note that we do not present a new methodology for solving the planning task. Specifically, it is assumed that there exists an algorithm for computing (i) the open-loop state and control trajectories $(x^*(t), u^*(t))$ that minimize the open-loop cost, and (ii) the feedback tracking controller $k(\cdot, \cdot)$, given a dynamical model. The focus of this paper is on how to design the regression algorithm for computing the model estimate $\hat{F}$.
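The policy structure described above, a nominal open-loop input plus a feedback term that tracks the nominal state, can be sketched as follows. This is a minimal illustration under stated assumptions: the feedback term is taken to be linear, `gains[i] @ (x_ref[i] - x)`, the reference is stored on a time grid, and a zero-order hold is used for lookup. The name `make_tracking_policy` and these choices are hypothetical and not part of the paper.

```python
import numpy as np

def make_tracking_policy(t_grid, x_ref, u_ref, gains):
    """Return pi(x, t) = u*(t) + K(t) (x*(t) - x) with zero-order-hold lookup.

    t_grid: (T,) increasing sample times; x_ref: (T, n); u_ref: (T, m);
    gains: (T, m, n) feedback gain matrices (assumed linear feedback).
    """
    t_grid = np.asarray(t_grid, dtype=float)
    x_ref, u_ref, gains = map(np.asarray, (x_ref, u_ref, gains))

    def policy(x, t):
        # Index of the last grid time <= t, clipped to the valid range.
        i = int(np.clip(np.searchsorted(t_grid, t, side="right") - 1,
                        0, t_grid.size - 1))
        return u_ref[i] + gains[i] @ (x_ref[i] - np.asarray(x, dtype=float))

    return policy

# Toy reference: zero state, nominal inputs 1, 2, 3 on a three-point grid.
t_grid = [0.0, 1.0, 2.0]
x_ref = np.zeros((3, 2))
u_ref = np.array([[1.0], [2.0], [3.0]])
gains = np.tile(np.array([[1.0, 0.5]]), (3, 1, 1))  # shape (3, 1, 2)

pi = make_tracking_policy(t_grid, x_ref, u_ref, gains)
u_on_reference = pi(np.zeros(2), 1.5)             # feedback term vanishes
u_off_reference = pi(np.array([1.0, 2.0]), 0.0)   # feedback corrects the offset
```

On the reference the policy returns the nominal input unchanged; away from it, the feedback term pushes the input in the direction that reduces the tracking error under the assumed gain.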

2.1 Motivating Example

We ground the formalism within the following running example that will feature throughout this work.

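Example 1 (PVTOL). Consider the 6-state planar vertical-takeoff-vertical-landing (PVTOL) system depicted in Figure 2. The system is defined by the state $x := (p_x, p_z, \phi, v_x, v_z, \dot{\phi})$, where $(p_x, p_z)$ is the position in the 2D plane, $(v_x, v_z)$ is the body-reference velocity, and $(\phi, \dot{\phi})$ are the roll and angular rate respectively, and $u \in \mathbb{R}^2$ are the controlled motor thrusts. The true dynamics are given by:

$$\dot{x} = \begin{bmatrix} v_x \cos\phi - v_z \sin\phi \\ v_x \sin\phi + v_z \cos\phi \\ \dot{\phi} \\ v_z \dot{\phi} - g \sin\phi \\ -v_x \dot{\phi} - g \cos\phi \\ 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 1/m & 1/m \\ l/J & -l/J \end{bmatrix} u,$$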

where g is the acceleration due to gravity, m is the mass, l is the moment-arm of the thrusters, and J is the moment of inertia about the roll axis.

Figure 2: Definition of planar quadrotor state variables: l denotes the thrust moment arm (symmetric), and $u_1$ and $u_2$ denote the right and left thrust forces respectively.

The planar quadrotor is a complex non-minimum phase dynamical system that has been heavily featured within the acrobatic robotics literature and therefore serves as a suitable case-study.
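For readers who wish to simulate the example, a minimal sketch of a PVTOL model in control-affine form is given below. It assumes the standard body-frame planar quadrotor equations with state ordering (p_x, p_z, phi, v_x, v_z, phi_dot); the function names and the default values of `m`, `l`, and `J` are illustrative placeholders rather than the parameters used in the paper.

```python
import numpy as np

G = 9.81  # gravitational acceleration in m/s^2

def pvtol_drift(x, g=G):
    """Drift term f(x) for the assumed state ordering (px, pz, phi, vx, vz, phi_dot)."""
    _, _, phi, vx, vz, phi_dot = x
    return np.array([
        vx * np.cos(phi) - vz * np.sin(phi),   # world-frame horizontal velocity
        vx * np.sin(phi) + vz * np.cos(phi),   # world-frame vertical velocity
        phi_dot,
        vz * phi_dot - g * np.sin(phi),        # body-frame acceleration along x
        -vx * phi_dot - g * np.cos(phi),       # body-frame acceleration along z (no thrust)
        0.0,
    ])

def pvtol_input_matrix(m, l, J):
    """Constant input matrix B: thrusts act only on v_z and phi_dot."""
    B = np.zeros((6, 2))
    B[4, :] = 1.0 / m
    B[5, :] = [l / J, -l / J]
    return B

def pvtol_dynamics(x, u, m=0.5, l=0.25, J=0.004, g=G):
    """x_dot = f(x) + B u; parameter defaults are placeholders, not the paper's values."""
    return pvtol_drift(x, g) + pvtol_input_matrix(m, l, J) @ np.asarray(u, dtype=float)

# Sanity check: level hover with each thruster carrying half the weight.
m = 0.5
x_hover = np.zeros(6)
u_hover = np.full(2, 0.5 * m * G)
xdot_hover = pvtol_dynamics(x_hover, u_hover, m=m)
```

At level hover with equal thrusts summing to the weight, the state derivative is zero, which serves as a quick consistency check on signs and units.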

2.1.1 Solution Parametrization

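The dynamics assume the general control-affine form:

$$\dot{x}(t) = f(x(t)) + B(x(t))\, u(t),$$

where $B : \mathcal{X} \to \mathbb{R}^{n \times m}$ is the input matrix, depicted in column-stacked form as $(b_1, \ldots, b_m)$. Let us define the model estimate also in control-affine form as $\dot{x} = \hat{f}(x) + \hat{B}(x) u$, where $\hat{B} = (\hat{b}_1, \ldots, \hat{b}_m)$. Consider, as a first solution attempt, the following linear parametrization for the vector-valued functions $\hat{f}$ and $\hat{b}_j$:

$$\hat{f}(x) = \Phi_f(x)^T \alpha, \qquad \hat{b}_j(x) = \Phi_b(x)^T \beta_j, \quad j = 1, \ldots, m,$$

where $\alpha \in \mathbb{R}^{d_f}$ and $\beta_j \in \mathbb{R}^{d_b}$ are constant vectors to be optimized over, and $\Phi_f : \mathcal{X} \to \mathbb{R}^{d_f \times n}$, $\Phi_b : \mathcal{X} \to \mathbb{R}^{d_b \times n}$ are a priori chosen feature mappings. To replicate the sparsity structure of the PVTOL input matrix, the feature matrix $\Phi_b$ has all zeros in its first $n - m$ columns.

The justification for a linear model and the construction of the feature mappings will be elaborated upon later. At this moment, we wish to study the quality of the learned models obtained from solving the following convex optimization problem:

$$\min_{\alpha,\, \beta_1, \ldots, \beta_m} \ \sum_{i=1}^{N} \big\|\hat{f}(x_i) + \hat{B}(x_i)\, u_i - \dot{x}_i\big\|_2^2 + \mu_f \|\alpha\|_2^2 + \mu_b \sum_{j=1}^{m} \|\beta_j\|_2^2, \tag{9}$$

where $\mu_f, \mu_b > 0$ are given regularization constants and $\{(x_i, u_i, \dot{x}_i)\}_{i=1}^{N}$ are the training tuples. Note that the above optimization corresponds to the ubiquitous ridge-regression problem and is therefore a viable solution approach.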
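The ridge-regression fit of a control-affine model can be sketched in closed form as below. This is a simplified illustration and not the paper's implementation: it assumes scalar-valued features shared across output dimensions (rather than matrix-valued feature maps), stacks the regressors as [phi(x), u_1 phi(x), ..., u_m phi(x)], and applies separate penalties `mu_f` and `mu_b` to the drift and input-matrix weights. The function names, the affine feature map, and the toy linear system are all assumptions made for the example.

```python
import numpy as np

def fit_control_affine_ridge(X, U, Xdot, feats, mu_f, mu_b):
    """Fit x_dot ~ W_f^T phi(x) + sum_j u_j W_b[j]^T phi(x) by ridge regression."""
    Phi = feats(X)                              # (N, d) feature matrix
    d, m = Phi.shape[1], U.shape[1]
    # Regressors [phi, u_1 * phi, ..., u_m * phi], shape (N, d * (m + 1)).
    Z = np.hstack([Phi] + [U[:, [j]] * Phi for j in range(m)])
    # Separate penalties on the drift block and the input-matrix blocks.
    reg = np.concatenate([np.full(d, mu_f), np.full(d * m, mu_b)])
    W = np.linalg.solve(Z.T @ Z + np.diag(reg), Z.T @ Xdot)
    return W[:d], W[d:].reshape(m, d, -1)       # W_f: (d, n), W_b: (m, d, n)

def predict(x, u, feats, W_f, W_b):
    """Evaluate the learned model f_hat(x) + B_hat(x) u at a single point."""
    phi = feats(x[None, :])[0]
    f_hat = W_f.T @ phi
    B_hat = np.stack([W_b[j].T @ phi for j in range(W_b.shape[0])], axis=1)
    return f_hat + B_hat @ u

# Toy data from a linear system x_dot = A x + B u (hypothetical, for illustration).
rng = np.random.default_rng(0)
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
X = rng.normal(size=(200, 2))
U = rng.normal(size=(200, 1))
Xdot = X @ A.T + U @ B.T

affine = lambda X: np.hstack([np.ones((X.shape[0], 1)), X])  # features [1, x]
W_f, W_b = fit_control_affine_ridge(X, U, Xdot, affine, mu_f=1e-8, mu_b=1e-8)

x_test, u_test = np.array([0.3, -0.2]), np.array([0.7])
pred = predict(x_test, u_test, affine, W_f, W_b)
truth = A @ x_test + B @ u_test
```

Because the toy system lies inside the model class and the data are noiseless, the fitted model reproduces the true derivative up to the small bias introduced by the penalties.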

To evaluate the feasibility of this solution approach, we extracted a collection of training tuples from simulations of the PVTOL system without any noise (for further details, please see Section 7.2). We learned three models: (i) N-R: an un-regularized model, (ii) R-R: a standard ridge-regularized model, and (iii) CCM-R: a control-theoretic regularized model, corresponding to the algorithm proposed within this work and elaborated upon in the remainder of the paper.

We learned four versions of each model corresponding to varying training dataset sizes N. The parameter vectors for the drift and input-matrix models both had dimension 576 (corresponding to 96 parameters per state dimension). The feature mappings themselves are described in Section 7.2 and Appendix A. The regularization constants were held fixed for all N.

2.1.2 Evaluation

The evaluation corresponded to the motion planning task of generating and tracking trajectories using the learned models. We gridded the $(p_x, p_z)$ position plane to create a set of 120 initial conditions between 4 m and 12 m away from (0, 0), and randomly sampled the other states for the rest of the initial conditions. These conditions were held fixed for all models and for all training dataset sizes to evaluate model improvement.

For each model at each value of N, the evaluation task was to (i) solve a trajectory optimization problem to compute a dynamically feasible trajectory for the learned model to go from the initial state to the goal state, a stable hover at (0, 0) at near-zero velocity; and (ii) track this trajectory with a feedback controller computed using time-varying LQR (TV-LQR). Note that all simulations without any feedback controller (i.e., open-loop control rollouts) led to the PVTOL crashing. This is understandable since the dynamics fitting objective does not optimize for multi-step error. The trajectory optimization step was solved as a fixed-endpoint, fixed-final time optimal control problem using the Chebyshev pseudospectral method (Fahroo and Ross, 2002) with the objective of minimizing the open-loop cost. The final time T for a given initial condition was held fixed between all models. Note that 120 trajectory optimization problems were solved for each model and each value of N.
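To make the tracking step concrete, the sketch below computes time-varying LQR gains with a backward Riccati recursion. It is a discrete-time simplification chosen for brevity and is not claimed to match the formulation used in the experiments; the linearizations `A_list` and `B_list` about the reference and the weights `Q`, `R`, `Q_final` are assumed to be supplied by the user.

```python
import numpy as np

def tv_lqr_gains(A_list, B_list, Q, R, Q_final):
    """Backward Riccati pass for x_{k+1} = A_k x_k + B_k u_k (deviation coordinates).

    Returns gains K_k such that the tracking input is
    u_k = u*_k - K_k (x_k - x*_k).
    """
    P = Q_final
    gains = [None] * len(A_list)
    for k in reversed(range(len(A_list))):
        A, B = A_list[k], B_list[k]
        # Optimal gain for the cost-to-go matrix P at step k + 1.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update for the cost-to-go matrix at step k.
        P = Q + A.T @ P @ (A - B @ K)
        gains[k] = K
    return gains

# Scalar integrator with unit weights: the gain converges to (sqrt(5) - 1) / 2.
horizon = 50
A_list = [np.array([[1.0]])] * horizon
B_list = [np.array([[1.0]])] * horizon
Q = R = Q_final = np.array([[1.0]])
gains = tv_lqr_gains(A_list, B_list, Q, R, Q_final)
first_gain = gains[0][0, 0]
```

For the scalar integrator, the stationary Riccati solution is the golden ratio, so the gain far from the terminal time approaches (sqrt(5) - 1) / 2, which gives a simple analytical check.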

Figure 4 shows a boxplot comparison of the trajectory-wise RMS full state errors ($\|x^*(t) - x(t)\|_2$, where $x^*(t)$ is the reference trajectory obtained from the optimizer and x(t) is the actual realized trajectory) for each model and all training dataset sizes. As N increases, the spread of the RMS errors decreases for both R-R and CCM-R models as expected. However, we see that the N-R model generates several unstable trajectories, indicating the need for a form of regularization.
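A trajectory-wise RMS full-state error of this kind can be computed as in the short sketch below. The exact normalization used for the figure is not specified in the text, so the definition here, the square root of the time-averaged squared Euclidean error over uniformly sampled points, is an assumption, as is the function name `rms_state_error`.

```python
import numpy as np

def rms_state_error(x, x_ref):
    """Root-mean-square (over time samples) of the full-state tracking error norm."""
    err = np.asarray(x, dtype=float) - np.asarray(x_ref, dtype=float)
    # Squared Euclidean error at each sample, averaged over the trajectory.
    return float(np.sqrt(np.mean(np.sum(err ** 2, axis=1))))

# Two-sample toy trajectory: error norms are 5 and 0.
x_actual = np.array([[3.0, 4.0], [0.0, 0.0]])
x_reference = np.zeros((2, 2))
rms = rms_state_error(x_actual, x_reference)
```

Collecting this scalar for every initial condition and every model yields the per-model error distributions that a boxplot would summarize.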

For N = 100 (which is at the extreme lower limit of the necessary number of samples since there are 96 features for each dimension of the dynamics function), both N-R and R-R models generate a large number of unstable trajectories. In contrast, all trajectories generated with the CCM-R model were successfully tracked with bounded error. The CCM-R model consistently achieves a lower RMS error distribution than both the N-R and R-R models for all training dataset sizes. Most notable, however, is its performance when the number of supervision training samples is small and there is considerable risk of overfitting. It appears the stabilizability constraints leveraged to compute the CCM-R model have a notable regularizing effect on the resulting model trajectories (recall that the initial conditions of the trajectories are held fixed between the models). In Figure 3, we highlight two trajectories that start from the same initial conditions, one generated and tracked using the R-R model, the other using the CCM-R model, for N = 250. Overlaid on the plot are snapshots of the vehicle outline itself, illustrating the aggressive flight-regime of the trajectories (the initial bank angle is 40°). While tracking the R-R model generated trajectory eventually ends in complete loss of control, the system successfully tracks the CCM-R model generated trajectory to the stable hover at (0, 0).

Figure 3: Comparison of reference and tracked trajectories in the $(p_x, p_z)$ position plane for R-R and CCM-R models starting at the same initial conditions with N = 250. Red (dashed): nominal, Blue (solid): actual, Green dot: start, Black dot: nominal endpoint, Blue dot: actual endpoint; Top: CCM-R, Bottom: R-R. The vehicle successfully tracks the CCM-R model generated trajectory to the stable hover at (0, 0) while losing control when attempting to track the R-R model generated trajectory.

2.1.3 Effect of Regularization

At this point, one might wonder if the choice of the regularization parameter may be sub-optimal for the R-R model. Traditionally, such parameters are tuned using regression error on a validation dataset. In Figure 5 we plot the mean regression error over an independently sampled validation dataset of 2000 demonstration tuples, as a function of the regularization parameter for all R-R models.

Figure 5: Mean regression error over an independent validation dataset as a function of the regularization parameter for the R-R model learned using (9), with varying training set size N. The best out-of-sample performance is achieved with zero regularization, with the remaining constant held fixed.

The plot illustrates that the best out-of-sample performance is achieved with zero regularization. However, this corresponds to the N-R model which, as we learned in the previous section, generated several unstable trajectories for all training dataset sizes. This is not a surprising result; the feature mapping used as the basis for the dynamics model corresponds to the randomized matrix approximation of a reproducing kernel (see Section 6.4). Recent results, as in (Liang and Rakhlin, 2018) and references therein, corroborate such a pattern and even advocate for “ridgeless” regression.
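The tuning procedure discussed here, sweeping the ridge parameter and scoring each fit on held-out data, can be sketched as follows. This is an illustrative toy only: the scalar target function, the random Fourier features, the feature count, and the grid of regularization values are all assumptions made for the example, and none of them reproduce the paper's quadrotor model or the objective in (9).

```python
import numpy as np

rng = np.random.default_rng(0)

def random_fourier_features(x, omega, phase):
    """Random Fourier features approximating a Gaussian kernel."""
    d = omega.shape[1]
    return np.sqrt(2.0 / d) * np.cos(x @ omega + phase)

def fit_ridge(phi, y, mu):
    """Minimize ||phi w - y||^2 + mu ||w||^2; mu = 0 falls back to least squares."""
    if mu == 0.0:
        return np.linalg.lstsq(phi, y, rcond=None)[0]
    d = phi.shape[1]
    return np.linalg.solve(phi.T @ phi + mu * np.eye(d), phi.T @ y)

# Hypothetical 1-D regression target standing in for one dynamics component.
def target(x):
    return np.sin(3.0 * x[:, 0])

x_train = rng.uniform(-1.0, 1.0, size=(100, 1))
x_val = rng.uniform(-1.0, 1.0, size=(2000, 1))
y_train = target(x_train) + 0.01 * rng.standard_normal(100)
y_val = target(x_val)

# Shared random features so that all fits have the same model capacity.
omega = rng.standard_normal((1, 48))
phase = rng.uniform(0.0, 2.0 * np.pi, size=48)
phi_train = random_fourier_features(x_train, omega, phase)
phi_val = random_fourier_features(x_val, omega, phase)

# Sweep the regularization parameter and record the mean validation error.
val_error = {}
for mu in [0.0, 1e-6, 1e-4, 1e-2, 1.0]:
    w = fit_ridge(phi_train, y_train, mu)
    val_error[mu] = float(np.mean((phi_val @ w - y_val) ** 2))
best_mu = min(val_error, key=val_error.get)
```

The point of the passage is that `best_mu` chosen this way optimizes one-step regression error only; it carries no information about whether trajectories planned with the fitted model can be tracked.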

Given that the CCM-R model uses the same feature mapping as the R-R and N-R models (i.e., the model capacity of all three models is the same), and is given the same set of demonstration tuples, it appears that traditional model-fitting techniques such as ridge-regression and associated hyper-parameter tuning rules are ill-suited to learn representations of dynamics that are appropriate for planning and control. This motivates the need for constrained dynamics learning, where the notion of model stabilizability is encoded as a constraint within the learning algorithm (as opposed to the unconstrained optimization in (9)). In the next section, we introduce conditions for nonlinear trajectory stabilizability which can be encoded as algebraic constraints within a model learning algorithm to prune the hypothesis space in a manner that is tailored to the downstream task of trajectory generation and feedback control.

3 Review of Contraction Theory

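The core principle behind contraction theory (Lohmiller and Slotine, 1998) is to study the evolution of the distance between any two arbitrarily close neighboring trajectories and to draw conclusions about the distance between any pair of trajectories that are finitely far apart. Given an autonomous system of the form $\dot{x}(t) = f(x(t))$, consider two neighboring trajectories separated by an infinitesimal (virtual) displacement $\delta_x$; formally, $\delta_x$ is a vector in the tangent space $T_x X$ at $x$. The dynamics of this virtual displacement are given by:

$$\dot{\delta}_x = \frac{\partial f}{\partial x}\,\delta_x,$$

where $\partial f/\partial x$ is the Jacobian of $f$. The dynamics of the infinitesimal squared distance $\delta_x^T \delta_x$ between these two trajectories is then given by:

$$\frac{d}{dt}\left(\delta_x^T \delta_x\right) = 2\,\delta_x^T \frac{\partial f}{\partial x}\,\delta_x.$$

Then, if the (symmetric part) of the Jacobian matrix $\partial f/\partial x$ is uniformly negative definite, i.e.,

$$\sup_x \lambda_{\max}\left(\frac{1}{2}\,\widehat{\frac{\partial f(x)}{\partial x}}\right) \le -\lambda < 0,$$

where $\widehat{(\cdot)} := (\cdot) + (\cdot)^T$ and $\lambda > 0$, one has that the squared infinitesimal length $\delta_x^T \delta_x$ is exponentially convergent to zero at rate $2\lambda$. By path integration of $\delta_x$ between any pair of trajectories, one has that the distance between any two trajectories shrinks exponentially to zero. The vector field $f$ is thereby referred to as contracting at rate $\lambda$, and $\lambda$ is referred to as the contraction rate.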
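A quick numerical check can make the contraction property concrete. The sketch below uses a hypothetical two-dimensional vector field (not one from the paper) whose Jacobian is A - diag(3x^2) with A + A^T = -2I, so its symmetric part is bounded above by -I and the contraction rate is 1 in the identity metric. It integrates two trajectories with a hand-written RK4 step and compares their distance to the predicted exponential envelope.

```python
import numpy as np

A = np.array([[-1.0, 2.0], [-2.0, -1.0]])

def f(x):
    """Hypothetical vector field; Jacobian A - diag(3 x^2), symmetric part <= -I."""
    return A @ x - x ** 3

def rk4_step(x, h):
    """One classical Runge-Kutta step for x' = f(x)."""
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

lam, h, steps = 1.0, 0.01, 500          # contraction rate, step size, horizon
xa = np.array([1.5, -0.5])              # two arbitrary initial conditions
xb = np.array([-1.0, 1.0])
d0 = np.linalg.norm(xa - xb)

ratios = []
for k in range(1, steps + 1):
    xa, xb = rk4_step(xa, h), rk4_step(xb, h)
    envelope = d0 * np.exp(-lam * k * h)            # predicted bound on the distance
    ratios.append(np.linalg.norm(xa - xb) / envelope)
max_ratio = max(ratios)                              # stays at or below 1 up to integration error
```

The distance itself decays at rate λ, which matches the squared distance decaying at rate 2λ as stated above.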

Contraction metrics generalize this observation by considering, as the infinitesimal squared length, a symmetric positive definite function $V(x, \delta_x) = \delta_x^T M(x)\,\delta_x$, where $M$ is a mapping from X to the set of uniformly positive definite symmetric matrices. Formally, M(x) may be interpreted as a Riemannian metric tensor, endowing the space X with the Riemannian squared length element $V(x, \delta_x)$. A fundamental result in contraction theory (Lohmiller and Slotine, 1998) is that any contracting system admits a contraction metric M(x) such that the associated function $V(x, \delta_x)$ satisfies:

$$\dot{V}(x, \delta_x) \le -2\lambda\, V(x, \delta_x) \quad \forall (x, \delta_x) \in TX,$$

for some positive contraction rate $\lambda$. Thus, the function $V(x, \delta_x)$ may be interpreted as a differential Lyapunov function.
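To see how the differential Lyapunov condition turns into a pointwise matrix inequality, one can differentiate $V$ along the variational dynamics. The short derivation below assumes $M$ is differentiable and writes $\partial_f M$ for the derivative of $M$ along the vector field $f$; this notation is introduced here for the sketch.

```latex
\begin{aligned}
V(x,\delta_x) &= \delta_x^T M(x)\,\delta_x,
\qquad \dot{\delta}_x = \frac{\partial f}{\partial x}\,\delta_x,
\qquad \partial_f M(x) := \sum_i \frac{\partial M(x)}{\partial x^i}\, f^i(x), \\
\dot{V}(x,\delta_x) &= \delta_x^T\, \partial_f M(x)\, \delta_x + 2\,\delta_x^T M(x)\,\dot{\delta}_x \\
&= \delta_x^T \left( \partial_f M(x) + M(x)\frac{\partial f}{\partial x}
   + \frac{\partial f}{\partial x}^T M(x) \right) \delta_x .
\end{aligned}
```

Hence $\dot{V} \le -2\lambda V$ holds for every $\delta_x$ exactly when $\partial_f M + M\,\frac{\partial f}{\partial x} + \frac{\partial f}{\partial x}^T M + 2\lambda M \preceq 0$ at every $x$. Setting $M = I$ recovers the condition on the symmetric part of the Jacobian from the beginning of this section.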

3.1 Control Contraction Metrics

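Control contraction metrics (CCMs) generalize contraction analysis to the controlled dynamical setting, in the sense that the analysis searches jointly for a controller design and the metric that describes the contraction properties of the resulting closed-loop system. Consider a control-affine dynamical system of the form in (7). To define a CCM, analogously to the previous section, we first analyze the variational dynamics, i.e., the dynamics of an infinitesimal displacement $\delta_x$:

$$\dot{\delta}_x = \underbrace{\left(\frac{\partial f(x)}{\partial x} + \sum_{j=1}^{m} u^j\, \frac{\partial b_j(x)}{\partial x}\right)}_{:= A(x,u)} \delta_x + B(x)\,\delta_u, \qquad (10)$$

where $b_j$ is the $j$-th column of $B$, $u^j$ is the $j$-th component of the control input $u \in \mathbb{R}^m$, and $\delta_u$ is an infinitesimal (virtual) control vector at $u$ (i.e., $\delta_u$ is a vector in the control input tangent space, i.e., $\mathbb{R}^m$). A uniformly positive definite matrix-valued function M(x) is a CCM for the system {f, B} if there exists a function $\delta_u(x, \delta_x, u)$ such that the function $V(x, \delta_x) = \delta_x^T M(x)\,\delta_x$ satisfies

$$\dot{V}(x, \delta_x, u) = \delta_x^T\left(\partial_{f+Bu} M(x) + M(x)A(x,u) + A(x,u)^T M(x)\right)\delta_x + 2\,\delta_x^T M(x) B(x)\,\delta_u \le -2\lambda\, V(x, \delta_x) \quad \forall (x, \delta_x) \in TX,\ u \in \mathbb{R}^m, \qquad (11)$$

where $\partial_{v} M(x)$ denotes the derivative of $M$ along the vector $v$. Given the existence of a CCM, one can then construct an exponentially stabilizing (in the sense of (2)) feedback controller as described in Appendix B.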

Some important observations are in order. First, the function $V(x, \delta_x)$ may be interpreted as a differential CLF, in that there exists a differential controller $\delta_u$ that stabilizes the variational dynamics (10) in the sense of (11). Second, and more importantly, we see that by stabilizing the variational dynamics (essentially an infinite family of linear dynamics in $(\delta_x, \delta_u)$) pointwise, everywhere in the state-space, we obtain a stabilizing controller for the original nonlinear system. Crucially, this is an exact stabilization result, not one based on local linearization-based control. Consequently, one can show several useful properties, such as invariance to state-space transformations (Manchester and Slotine, 2017) and robustness (Singh et al., 2017; Manchester and Slotine, 2018). Third, the CCM approach only requires a weak form of controllability, and therefore is not restricted to feedback linearizable (i.e., invertible) systems.
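For intuition, it can help to specialize the CCM definition to the linear time-invariant case. The example below is purely illustrative: the constant matrices $A$, $B$, $K$ and the constant metric $M$ are assumptions made for this sketch and do not come from the paper.

```latex
\begin{aligned}
&\text{System: } \dot{x} = A x + B u, \qquad
 \text{variational dynamics: } \dot{\delta}_x = A\,\delta_x + B\,\delta_u, \\
&\text{Differential feedback: } \delta_u = K\,\delta_x, \qquad
 V(\delta_x) = \delta_x^T M\,\delta_x \ \text{ with constant } M \succ 0, \\
&\dot{V} = \delta_x^T\left( M(A + BK) + (A + BK)^T M \right)\delta_x \le -2\lambda\, \delta_x^T M\,\delta_x \\
&\iff\ (A + BK)^T M + M(A + BK) + 2\lambda M \preceq 0 .
\end{aligned}
```

In this special case the CCM condition is the familiar Lyapunov inequality certifying that $A + BK$ is exponentially stable with rate $\lambda$. Integrating $\delta_u = K\delta_x$ along the straight line between a reference state $x^*$ and the current state $x$ gives the linear tracking law $u = u^* + K(x - x^*)$. For nonlinear systems, $M$, $A$, and $B$ vary with the state, which is why the condition has to hold pointwise everywhere.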

4 CCM Constrained Dynamics Learning

Leveraging the characterization of stabilizability via CCMs, we can now formalize our dynamics learning problem as follows. Given a supervised dataset of demonstration tuples $\{(x_i, u_i, \dot{x}_i)\}_{i=1}^{N}$, we wish to learn the dynamics functions f(x) and B(x) in (7), subject to the constraint that there exists a CCM M(x) for the learned dynamics. That is, the CCM M(x) plays the role of a certificate of stabilizability for the learned dynamics.

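As shown in (Manchester and Slotine, 2017), a necessary and sufficient characterization of a CCM M(x) for the system $\{\hat{f}, \hat{B}\}$ is given in terms of its dual $W(x) := M(x)^{-1}$ by the following two conditions:

$$\hat{B}_\perp^T\left(\partial_{\hat{b}_j} W(x) - \frac{\partial \hat{b}_j(x)}{\partial x}\, W(x) - W(x)\,\frac{\partial \hat{b}_j(x)}{\partial x}^T\right)\hat{B}_\perp = 0, \quad j = 1, \ldots, m, \quad \forall x \in X, \qquad (12)$$

$$\underbrace{\hat{B}_\perp(x)^T\left(-\partial_{\hat{f}} W(x) + \frac{\partial \hat{f}(x)}{\partial x}\, W(x) + W(x)\,\frac{\partial \hat{f}(x)}{\partial x}^T + 2\lambda W(x)\right)\hat{B}_\perp(x)}_{:= F_\lambda(x;\, \hat{f}, W)} \prec 0, \quad \forall x \in X, \qquad (13)$$

where $\hat{B}_\perp$ is the annihilator matrix for $\hat{B}$, i.e., $\hat{B}(x)^T \hat{B}_\perp(x) = 0$ for all $x$, and $\hat{b}_j$ is the $j$-th column of $\hat{B}$. In the definition above, we write $F_\lambda(x; \hat{f}, W)$ since $\{W, \hat{f}\}$ will be optimization variables in our formulation while $\lambda$ is treated as a hyper-parameter. Thus, the learning task reduces to finding the functions $\{\hat{f}, \hat{B}, W\}$ that jointly satisfy the above constraints, while minimizing an appropriate regularized regression loss function. Formally, problem (3) can be re-stated as:

$$\min_{\hat{f} \in \mathcal{H}^f,\ \hat{b}_j \in \mathcal{H}^B,\ W \in \mathcal{H}^W,\ \underline{w}, \overline{w} > 0} \ J_d(\hat{f}, \hat{B}) + J_m(W, \underline{w}, \overline{w}) \quad \text{s.t.} \quad \text{(12), (13)} \ \ \forall x \in X, \qquad \underline{w}\, I \preceq W(x) \preceq \overline{w}\, I \ \ \forall x \in X,$$

where $\mathcal{H}^f$ and $\mathcal{H}^B$ are appropriately chosen vector-valued function classes on X for $\hat{f}$ and $\hat{b}_j$ respectively, and $\mathcal{H}^W$ is a suitable symmetric positive definite matrix-valued function space on X. The objective is composed of a dynamics term $J_d$, consisting of regression loss and regularization terms, and a metric term $J_m$, consisting of a condition number surrogate loss on the metric W(x) and a regularization term. The metric cost term is motivated by the observation that the state tracking error (i.e., $\|x(t) - x^*(t)\|_2$) in the presence of bounded additive disturbances is proportional to the ratio $\overline{w}/\underline{w}$; see (Singh et al., 2017).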
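The dual CCM conditions are easiest to check by hand in a case where the derivative terms vanish. The sketch below assumes the standard dual form from (Manchester and Slotine, 2017) and a hypothetical double integrator with a constant candidate dual metric W; the matrices and the rate λ = 0.5 are illustrative choices, not values from the paper. With constant W and constant B, condition (12) holds trivially and the coupling constraint (13) reduces to B_perp^T (A W + W A^T + 2 λ W) B_perp being negative definite.

```python
import numpy as np

# Hypothetical double integrator: x' = A x + B u (illustration only).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
B_perp = np.array([[1.0], [0.0]])          # annihilator of B: B^T B_perp = 0

W = np.array([[1.0, -1.0], [-1.0, 2.0]])   # candidate constant dual metric
lam = 0.5                                  # contraction rate, a hyper-parameter

# Reduced coupling constraint: must be negative definite (here a 1x1 matrix).
F = B_perp.T @ (A @ W + W @ A.T + 2.0 * lam * W) @ B_perp   # F[0, 0] is -1.0

# Uniform bounds w_lower * I <= W <= w_upper * I from the eigenvalues of W.
w_lower, w_upper = np.linalg.eigvalsh(W)[[0, -1]]
condition_ratio = w_upper / w_lower        # the ratio the metric cost term aims to keep small

is_ccm_dual = bool(F[0, 0] < 0.0 and w_lower > 0.0)
```

For this toy case the constraint is linear in W once the dynamics are fixed, and linear in A once W is fixed, which is the structure that an alternating scheme over the two sets of variables relies on.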

Notice that the coupling constraint (13) is a bi-linear matrix inequality in the decision variables $\hat{f}$ and W. Thus, at a high level, a solution algorithm must consist of alternating between two convex sub-problems, defined by the objective and decision variable pairs ((