The problem of efficiently and accurately estimating an unknown dynamical system,
from a small set of sampled trajectories, where is the state and
is the control input, is the central task in model-based Reinforcement Learning (RL). In this setting, a robotic agent strives to pair an estimated dynamics model with a feedback policy in order to optimally act in a dynamic and uncertain environment. The model of the dynamical system can be continuously updated as the robot experiences the consequences of its actions, and the improved model can be leveraged for different tasks affording a natural form of transfer learning. When it works, model-based Reinforcement Learning typically offers major improvements in sample efficiency in comparison to state of the art RL methods such as Policy Gradients [3,23] that do not explicitly estimate the underlying system. Yet, all too often, when standard supervised learning with powerful function approximators such as Deep Neural Networks and Kernel Methods are applied to model complex dynamics, the resulting controllers do not perform at par with model-free RL methods in the limit of increasing sample size, due to compounding errors across long time horizons. The main goal of this paper is to develop a new control-theoretic regularizer for dynamics fitting rooted in the notion of stabilizability, which guarantees that the learned system can be accompanied by a robust controller capable of stabilizing any trajectory that the system may generate.
Formally, a reference state-input trajectory pair for system (1) is termed exponentially stabilizable at rate
if there exists a feedback controller
such that the solution x(t) of the system:
converges exponentially to at rate
. That is,
for some constant C > 0. The system (1) is termed exponentially stabilizable at rate in an open, connected, bounded region
if all state trajectories
satisfying
are exponentially stabilizable at rate
.
Problem Statement: In this work, we assume that the dynamics function f(x, u) is unknown to us and we are instead provided with a dataset of tuples taken from a collection of observed trajectories (e.g., expert demonstrations) on the robot. Our objective is to solve the problem:
where H is an appropriate normed function space and is a regularization parameter. Note that we use
to differentiate the learned dynamics from the true dynamics. We expect that for systems that are indeed stabilizable, enforcing such a constraint may drastically prune the hypothesis space, thereby playing the role of a “control-theoretic” regularizer that is potentially more powerful and ultimately, more pertinent for the downstream control task of generating and tracking new trajectories.
Related Work: The simplest approach to learning dynamics is to ignore stabilizability and treat the problem as a standard one-step time series regression task [3,5,23]. However, coarse dynamics models trained on limited training data typically generate trajectories that rapidly diverge from expected paths, inducing controllers that are ineffective when applied to the true system. This divergence can be reduced by expanding the training data with corrections to boost multi-step prediction accuracy [35, 36]. In recent work on uncertainty-aware model-based RL, policies [3,23] are optimized with respect to stochastic rollouts from probabilistic dynamics models that are iteratively improved in a model predictive control loop. Despite being effective, these methods are still heuristic in the sense that the existence of a stabilizing feedback controller is not explicitly guaranteed.
Learning dynamical systems satisfying some desirable stability properties (such as asymptotic stability about an equilibrium point, e.g., for point-to-point motion) has been studied in the autonomous case, , in the context of imitation learning. In this line of work, one assumes perfect knowledge and invertibility of the robot’s controlled dynamics to solve for the input that realizes this desirable closed-loop motion [10, 11, 14, 20, 27, 31]. Crucially, in our work, we do not require knowledge, or invertibility of the robot’s controlled dynamics. We seek to learn the full controlled dynamics of the robot, under the constraint that the resulting learned dynamics generate dynamically feasible, and most importantly, stabilizable trajectories. Thus, this work generalizes existing literature by additionally incorporating the controllability limitations of the robot within the learning problem. In that sense, it is in the spirit of recent model-based RL techniques that exploit control theoretic notions of stability to guarantee model safety during the learning process [1]. However, unlike the work in [1] which aims to maintain a local region of attraction near a known safe operating point, we consider a stronger notion of safety – that of stabilizability, that is, the ability to keep the system within a bounded region of any exploratory open-loop trajectory.
Potentially, the tools we develop may also be used to extend standard adaptive robot control design, such as [33] – a technique which achieves stable concurrent learning and control using a combination of physical basis functions and general mathematical expansions, e.g. radial basis function approximations [29]. Notably, our work allows us to handle complex underactuated systems, a consequence of the significantly more powerful function approximation framework developed herein, as well as of the use of a differential (rather than classical) Lyapunov-like setting, as we shall detail.
Statement of Contributions: Stabilizability of trajectories is a complex task in non-linear control. In this work, we leverage recent advances in contraction theory for control design through the use of control contraction metrics (CCM) [17] that turns stabilizability constraints into convex Linear Matrix Inequalities (LMIs). Contraction theory [16] is a method of analyzing nonlinear systems in a differential framework, i.e., via the associated variational system [4, Chp 3], and is focused on the study of convergence between pairs of state trajectories towards each other. Thus, at its core, contraction explores a stronger notion of stability – that of incremental stability between solution trajectories, instead of the stability of an equilibrium point or invariant set. Importantly, we harness recent results in [17,19,32] that illustrate how to use contraction theory to obtain a certificate for trajectory stabilizability and an accompanying tracking controller with exponential stability properties. In Section 2, we provide a brief summary of these results, which in turn will form the foundation of this work.
Our paper makes four primary contributions. First, we formulate the learning stabilizable dynamics problem through the lens of control contraction metrics (Section 3). Second, under an arguably weak assumption on the sparsity of the true dynamics model, we present a finite-dimensional optimization-based solution to this problem by leveraging the powerful framework of vector-valued Reproducing Kernel Hilbert Spaces (Section 4.2). We further motivate this solution from a standpoint of viewing the stabilizability constraint as a novel control-theoretic regularizer for dynamics learning. Third, we develop a tractable algorithm leveraging alternating convex optimization problems and adaptive sampling to iteratively solve the finite-dimensional optimization problem (Section 5). Finally, we verify the proposed approach on a 6-state, 2-input planar quadrotor model where we demonstrate that naive regression-based dynamics learning can yield estimated models that generate completely unstabilizable trajectories. In contrast, the control-theoretic regularized model generates vastly superior quality, trackable trajectories, especially for smaller training sets (Section 6).
The core principle behind contraction theory [16] is to study the evolution of distance between any two arbitrarily close neighboring trajectories and drawing conclusions on the distance between any pair of trajectories. Given an autonomous system of the form: , consider two neighboring trajectories separated by an infinitesimal (virtual) displacement
(formally,
is a vector in the tangent space
at x). The dynamics of this virtual displacement are given by:
where is the Jacobian of f. The dynamics of the infinitesimal squared distance
between these two trajectories is then given by:
Then, if the (symmetric part) of the Jacobian matrix is uniformly negative defi-nite, i.e.,
where
ponentially convergent to zero at rate . By path integration of
between any pair of trajectories, one has that the distance between any two trajectories shrinks exponentially to zero. The vector field is thereby referred to be contracting at rate
. Contraction metrics generalize this observation by considering as infinitesimal squared length distance, a symmetric positive definite function
, where
, is a mapping from X to the set of uniformly positive-definite
symmetric matrices. Formally, M(x) may be interpreted as a Riemannian metric tensor, endowing the space X with the Riemannian squared length element
. A fundamental result in contraction theory [16] is that any contracting system admits a contraction metric M(x) such that the associated function
satisfies:
for some . Thus, the function
may be interpreted as a differential Lyapunov function.
2.1 Control Contraction Metrics
Control contraction metrics (CCMs) generalize contraction analysis to the controlled dynamical setting, in the sense that the analysis searches jointly for a controller design and the metric that describes the contraction properties of the resulting closed-loop system. Consider dynamics of the form:
where is the input matrix, and denote B in column form as
and u in component form as
. To define a CCM, analogously to the previous section, we first analyze the variational dynamics, i.e., the dynamics of an infinites-imal displacement
:
where is an infinitesimal (virtual) control vector at u (i.e.,
is a vector in the control input tangent space, i.e.,
). A CCM for the system {f, B} is a uniformly positive-definite symmetric matrix function M(x) such that there exists a function
so that the function
satisfies
where is the matrix with element (i, j) given by Lie derivative of
along the vector g. Given the existence of a valid CCM, one then constructs a stabilizing (in the sense of eq. (2)) feedback controller
as described in Appendix D.
Some important observations are in order. First, the function may be interpreted as a differential control Lyapunov function, in that, there exists a stabilizing differential controller
that stabilizes the variational dynamics (6) in the sense of eq. (7). Second, and more importantly, we see that by stabilizing the variational dynamics (essentially an infinite family of linear dynamics in
) pointwise, everywhere in the state-space, we obtain a stabilizing controller for the original nonlinear system. Crucially, this is an exact stabilization result, not one based on local linearization-based control. Consequently, one can show several useful properties, such as invariance to state-space transformations [17] and robustness [18,32]. Third, the CCM approach only requires a weak form of controllability, and therefore is not restricted to feedback linearizable (i.e., invertible) systems.
Using the characterization of stabilizability using CCMs, we can now formalize our problem as follows. Given our dataset of tuples , the objective of this work is to learn the dynamics functions f(x) and B(x) in eq. (5), subject to the constraint that there exists a valid CCM M(x) for the learned dynamics. That is, the CCM M(x) plays the role of a certificate of stabilizability for the learned dynamics.
As shown in [17], a necessary and sufficient characterization of a CCM M(x) is given in terms of its dual by the following two conditions:
where is the annihilator matrix for B, i.e.,
for all x. In the definition above, we write
since
will be optimization variables in our formulation. Thus, our learning task reduces to finding the functions {f, B, W} and constant
that jointly satisfy the above constraints, while minimizing an appro- priate regularized regression loss function. Formally, problem (3) can be re-stated as:
subject to eqs. (8), (9) (10b)
(10c)
where and
are appropriately chosen
-valued function classes on X for
and
respectively, and
is a suitable
-valued function space on X. The objective is composed of a dynamics term
– consisting of regression loss and regularization terms, and a metric term
– consisting of a condition number surrogate loss on the metric W(x) and a regularization term. The metric cost term
is motivated by the observation that the state tracking error (i.e.,
) in the presence of bounded additive disturbances is proportional to the ratio w/w (see [32]).
Notice that the coupling constraint (9) is a bi-linear matrix inequality in the decision variables sets and W. Thus at a high-level, a solution algorithm must consist of alternating between two convex sub-problems, defined by the objective/decision variable pairs
and
.
When performing dynamics learning on a system that is a priori known to be exponentially stabilizable at some strictly positive rate , the constrained problem formulation in (10) follows naturally given the assured existence of a CCM. Albeit, the infinite-dimensional nature of the constraints is a considerable technical challenge, broadly falling under the class of semi-infinite optimization [8]. Alternatively, for systems that are not globally exponentially stabilizable in X, one can imagine that such a constrained formulation may lead to adverse effects on the learned dynamics model.
Thus, in this section we propose a relaxation of problem (10) motivated by the concept of regularization. Specifically, constraints (8) and (9) capture this notion of stability of infinitesimal deviations at all points in the space X. They stem from requiring that in eq (7) when
, i.e., when
can have no effect on
. This is nothing but the standard control Lyapunov inequality, applied to the differential setting. Constraint (8) sets to zero, the terms in (7) affine in u, while constraint (9) enforces this “natural” stability condition.
The simplifications we make are (i) relax constraints (9) and (10c) to hold pointwise over some finite constraint set , and (ii) assume a specific sparsity structure for input matrix estimate
. We discuss the pointwise relaxation here; the sparsity assumption on
is discussed in the following section and Appendix A.
First, from a purely mathematical standpoint, the pointwise relaxation of (9) and (10c) is motivated by the observation that as the CCM-based controller is exponentially stabilizing, we only require the differential stability condition to hold locally (in a tubelike region) with respect to the provided demonstrations. By continuity of eigenvalues for continuously parameterized entries of a matrix, it is sufficient to enforce the matrix inequalities at a sampled set of points [13].
Second, enforcing the existence of such an “approximate” CCM seems to have an impressive regularization effect on the learned dynamics that is more meaningful than standard regularization techniques used in for instance, ridge or lasso regression. Specifically, problem (10), and more generally, problem (3) can be viewed as the projection of the best-fit dynamics onto the set of stabilizable systems. This results in dynamics models that jointly balance regression performance and stabilizablity, ultimately yielding systems whose generated trajectories are notably easier to track. This effect of regularization is discussed in detail in our experiments in Section 6.
Practically, the finite constraint set , with cardinality
, includes all
in the regression training set of
tuples. However, as the LMI constraints are independent of
, the set
is chosen as a strict superset of
(i.e.,
N) by randomly sampling additional points within X, drawing parallels with semi-supervised learning.
4.1 Sparsity of Input Matrix Estimate ˆB
While a pointwise relaxation for the matrix inequalities is reasonable, one cannot apply such a relaxation to the exact equality condition in (8). Thus, the second simplification made is the following assumption, reminiscent of control normal form equations.
Assumption 1 Assume to take the following sparse representation:
where b(x) is an invertible matrix for all
.
For the assumed structure of , a valid
matrix is then given by:
Therefore, constraint (8) simply becomes:
where is the upper-left
block of W(x). Assembling these constraints for the (p, q) entry of
, i.e.,
, we obtain:
Since the matrix b(x) in (11) is assumed to be invertible, the only solution to this equation is for
, and all
. That is,
cannot be a function of the last m components of x – an elegant simplifica-tion of constraint (8). Due to space limitations, justification for this sparsity assumption is provided in Appendix A.
4.2 Finite-dimensional Optimization
We now present a tractable finite-dimensional optimization for solving problem (10) under the two simplifying assumptions introduced in the previous sections. The derivation of the solution algorithm itself is presented in Appendix B, and relies extensively on vector-valued Reproducing Kernel Hilbert Spaces.
Step 1: Parametrize the functions , the columns of
, and
as a linear combination of features. That is,
where ,
,
are constant vectors to be optimized over, and
,
and
are a priori chosen feature mappings. To enforce the sparsity structure in (11), the feature matrix
must have all 0s in its first
columns. The features
are distinct from
in that the former are only a function of the first
components of x (as per Section 4.1). While one can use any function approximator (e.g., neural nets), we motivate this parameterization from a perspective of Reproducing Kernel Hilbert Spaces (RKHS); see Appendix B.
Step 2: Given positive regularization constants and positive tolerances
and
, solve:
We wish to highlight the following key points regarding problem (16). Constraints (16b) and (16c) are the pointwise relaxations of (9) and (10c) respectively. Constraint (16d) captures the fact that W(x) is a symmetric matrix. Finally, constraint (16e) imposes some tolerance requirements to ensure a well conditioned solution. Additionally, the
tolerances and
are used to account for the pointwise relaxations of the matrix inequalities. A key challenge is to efficiently solve this constrained optimization problem, given a potentially large number of constraint points in
. In the next section, we present an iterative algorithm and an adaptive constraint sampling technique to solve problem (16).
The fundamental structure of the solution algorithm consists of alternating between the dynamics and metric sub-problems derived from problem (16). We also make a few additional modifications to aid tractability, most notable of which is the use of a dynamically updating set of constraint points at which the LMI constraints are enforced at the
iteration. In particular
with
being ideally much less than
, the cardinality of the full constraint set
. Formally, each major iteration k is characterized by three minor steps (sub-problems):
1. Finite-dimensional dynamics sub-problem at iteration k:
where is an additional regularization parameter for s – an
dimensional non-negative slack vector. The quantity
is defined as
That is, captures the worst violation for the
LMI over the entire constraint set
, given the parameters at the end of iteration
. 2. Finite-dimensional metric sub-problem at iteration k:
3. Update sub-problem. Choose a tolerance parameter
. Then, define
Thus, in the update step, we balance addressing points where constraints are being violated (
) and discarding points where constraints are satisfied with sufficient strict inequality (
). This prevents overfitting to any specific subset of the constraint points. A potential variation to the union above is to only add up to say K constraint violating points from
(e.g., corresponding to the K worst violators), where K is a fixed positive integer. Indeed this is the variation used in our experiments and was found to be extremely efficient in balancing the size of the set
and thus, the complexity of each iteration. This adaptive sampling technique is inspired by exchange algorithms for semi-infinite optimization, as the one proposed in [37] where one is trying to enforce the constraints at all points in a compact set X.
Note that after the first major iteration, we replace the regularization terms in and
with
, and
. This is done to prevent large updates to the parameters, particularly due to the dynamically updating constraint set
. The full pseudocode is summarized below in Algorithm 1.
Some comments are in order. First, convergence in Algorithm 1 is declared if either progress in the solution variables stalls or all constraints are satisfied within tolerance. Due to the semi-supervised nature of the algorithm in that the number of constraint points can be significantly larger than the number of supervisory regression tuples N, it is impractical to enforce constraints at all
points in any one iteration. Two key consequences of this are: (i) the matrix function W(x) at iteration k resulting from variables
does not have to correspond to a valid dual CCM for the interim learned dynamics at iteration k, and (ii) convergence based on constraint satisfaction at all
points is justified by the fact that at each iteration, we are solving relaxed sub-problems that collectively generate a sequence of lower-bounds on the overall objective. Potential future topics in this regard are: (i) investigate the properties of the converged dynamics for models that are a priori unknown unstabilizable, and (ii) derive sufficient conditions for convergence for both the infinitely- and finitely- constrained versions of problem (10).
Second, as a consequence of this iterative procedure, the dual metric and contraction rate pair do not possess any sort of “control-theoretic” optimality. For instance, in [32], for a known stabilizable dynamics model, both these quantities are optimized for robust control performance. In this work, these quantities are used solely as regularizers to promote stabilizability of the learned model. A potential future topic to explore in this regard is how to further optimize
for control performance for the final learned dynamics.
In this section we validate our algorithms by benchmarking our results on a known dynamics model. Specifically, we consider the 6-state planar vertical-takeoff-vertical-landing (PVTOL) model. The system is defined by the state: where
is the position in the 2D plane,
is the body-reference velocity,
are the roll and angular rate respectively, and 2-dimensional control input u corresponding to the motor thrusts. The true dynamics are given by:
where g is the acceleration due to gravity, m is the mass, l is the moment-arm of the thrusters, and J is the moment of inertia about the roll axis. We note that typical benchmarks in this area of work either present results on the 2D LASA handwriting dataset [10] or other low-dimensional motion primitive spaces, with the assumption of full robot dynamics invertibility. The planar quadrotor on the other hand is a complex non-minimum phase dynamical system that has been heavily featured within the acrobatic robotics literature and therefore serves as a suitable case-study.
6.1 Generation of Datasets
The training dataset was generated in 3 steps. First, a fixed set of waypoint paths in were randomly generated. Second, for each waypoint path, multiple smooth polynomial splines were fitted using a minimum-snap algorithm. To create variation amongst the splines, the waypoints were perturbed within Gaussian balls and the time durations for the polynomial segments were also randomly perturbed. Third, the PVTOL system was simulated with perturbed initial conditions and the polynomial trajectories as references, and tracked using a sub-optimally tuned PD controller; thereby emulating a noisy/imperfect demonstrator. These final simulated paths were sub-sampled at 0.1s resolution to create the datasets. The variations created at each step of this process were sufficient to generate a rich exploration of the state-space for training.
Due to space constraints, we provide details of the solution parameterization (number of features, etc) in Appendix C.
6.2 Models
Using the same feature space, we trained three separate models with varying training dataset (i.e., tuples) sizes of
. The first model, N-R was an unconstrained and un-regularized model, trained by solving problem (17) without constraints or
regularization (i.e., just least-squares). The second model, R-R was an unconstrained ridge-regression model, trained by solving problem (17) without any constraints (i.e., least-squares plus
regularization). The third model, CCM-R is the CCM-regularized model, trained using Algorithm 1. We enforced the CCM regularizing constraints for the CCM-R model at
points in the state-space, composed of the N demonstration points in the training dataset and randomly sampled points from X (recall that the CCM constraints do not require samples of
).
As the CCM constraints were relaxed to hold pointwise on the finite constraint set as opposed to everywhere on X, in the spirit of viewing these constraints as regularizers for the model (see Section 4), we simulated both the R-R and CCM-R models using the time-varying Linear-Quadratic-Regulator (TV-LQR) feedback controller. This also helped ensure a more direct comparison of the quality of the learned models themselves, independently of the tracking feedback controller. The results are virtually identical using a tracking MPC controller and yield no additional insight.
6.3 Validation and Comparison
The validation tests were conducted by gridding the plane to create a set of 120 initial conditions between 4m and 12m away from (0, 0) and randomly sampling the other states for the rest of the initial conditions. These conditions were held fixed for both models and for all training dataset sizes to evaluate model improvement.
For each model at each value of N, the evaluation task was to (i) solve a trajectory optimization problem to compute a dynamically feasible trajectory for the learned model to go from initial state to the goal state - a stable hover at (0, 0) at nearzero velocity; and (ii) track this trajectory using the TV-LQR controller. As a baseline, all simulations without any feedback controller (i.e., open-loop control rollouts) led to the PVTOL crashing. This is understandable since the dynamics fitting objective is not optimizing for multi-step error. The trajectory optimization step was solved as a fixed-endpoint, fixed final time optimal control problem using the Chebyshev pseudospectral method [6] with the objective of minimizing
. The final time T for a given initial condition was held fixed between all models. Note that 120 trajectory optimization problems were solved for each model and each value of N.
Figure 1 shows a boxplot comparison of the trajectory-wise RMS full state errors (where
is the reference trajectory obtained from the optimizer and x(t) is the actual realized trajectory) for each model and all training dataset sizes. As N increases, the spread of the RMS errors decreases for both R-R and CCM-R models as expected. However, we see that the N-R model generates several unstable trajectories for N = 100, 500 and 1000, indicating the need for some form of regularization. The CCM-R model consistently achieves a lower RMS error distribution than both the N-R and R-R models for all training dataset sizes. Most notable however, is its performance when the number of training samples is small (i.e.,
) when there is
Fig. 1: Box-whisker plot comparison of trajectory-wise RMS state-tracking errors over all 120 trajectories for each model and all training dataset sizes. Top row, left-to-right: N = 100, 250, 500, 1000; Bottom row, left-to-right: N = 100, 500, 1000 (zoomed in). The box edges correspond to the 25th, median, and 75th percentiles; the whiskers extend beyond the box for an additional 1.5 times the interquartile range; outliers, classified as trajectories with RMS errors past this range, are marked with red crosses. Notice the presence of unstable trajectories for N-R at all values of N and for R-R at N = 100, 250. The CCM-R model dominates the other two at all values of N, particularly for N = 100, 250.
considerable risk of overfitting. It appears the CCM constraints have a notable effect on the stabilizability of the resulting model trajectories (recall that the initial conditions of the trajectories and the tracking controllers are held fixed between the models).
For N = 100 (which is really at the extreme lower limit of necessary number of samples since there are effectively 97 features for each dimension of the dynamics function), both N-R and R-R models generate a large number of unstable trajectories. In contrast, out of the 120 generated test trajectories, the CCM-R model generates one mildly (in that the quadrotor diverged from the nominal trajectory but did not crash) unstable trajectory. No instabilities were observed with CCM-R for .
Figure 2a compares the traces between R-R and CCM-R corresponding to the five worst performing trajectories for the R-R N = 100 model. Similarly, Figure 2b compares the
traces corresponding to the five worst performing trajectories for the CCM-R N = 100 model. Notice the large number of unstable trajectories generated using the R-R model. Indeed, it is in this low sample training regime where the control-theoretic regularization effects of the CCM-R model are most noticeable.
Finally, in Figure 3, we highlight two trajectories, starting from the same initial conditions, one generated and tracked using the R-R model, the other using the CCM model, for N = 250. Overlaid on the plot are the snapshots of the vehicle outline itself, illustrating the quite aggressive flight-regime of the trajectories (the initial starting bank angle is ). While tracking the R-R model generated trajectory eventually ends in complete loss of control, the system successfully tracks the CCM-R model generated trajectory to the stable hover at (0, 0).
An interesting area of future work here is to investigate how to tune the regularization parameters . Indeed, the R-R model appears to be extremely sensitive to
, yielding drastically worse results with a small change in this parameter. On the other hand, the CCM-R model appears to be quite robust to variations in this parameter. Standard cross-validation techniques using regression quality as a metric are unsuitable as a tuning technique here; indeed, recent results even advocate for “ridgeless” regres-
Fig. 2: traces for R-R (left column) and CCM-R (right column) corresponding to the 5 worst performing trajectories for (a) R-R, and (b) CCM-R models at N = 100. Colored circles indicate start of trajectory. Red circles indicate end of trajectory. All except one of the R-R trajectories are unstable. One trajectory for CCM-R is slightly unstable.
sion [15]. However, as observed in Figure 1, un-regularized model fitting is clearly unsuitable. The effect of regularization on how the trajectory optimizer leverages the learned dynamics is a non-trivial relationship that merits further study.
In this paper, we presented a framework for learning controlled dynamics from demonstrations for the purpose of trajectory optimization and control for continuous robotic tasks. By leveraging tools from nonlinear control theory, chiefly, contraction theory, we introduced the concept of learning stabilizable dynamics, a notion which guarantees the existence of feedback controllers for the learned dynamics model that ensures trajectory trackability. Borrowing tools from Reproducing Kernel Hilbert Spaces and convex optimization, we proposed a bi-convex semi-supervised algorithm for learning stabilizable dynamics for complex underactuated and inherently unstable systems. The algorithm was validated on a simulated planar quadrotor system where it was observed that our control-theoretic dynamics learning algorithm notably outperformed traditional ridge-regression based model learning.
There are several interesting avenues for future work. First, it is unclear how the algorithm would perform for systems that are fundamentally unstabilizable and how the resulting learned dynamics could be used for “approximate” control. Second, we will explore sufficient conditions for convergence for the iterative algorithm under the finite-and infinite-constrained formulations. Third, we will address extending the algorithm to work on higher-dimensional spaces through functional parameterization of the control-theoretic regularizing constraints. Fourth, we will address the limitations imposed by
Fig. 3: Comparison of reference and tracked trajectories in the plane for R-R and CCMR models starting at same initial conditions with N = 250. Red (dashed): nominal, Blue (solid): actual, Green dot: start, black dot: nominal endpoint, blue dot: actual endpoint; Top: CCM-R, Bottom: R-R. The vehicle successfully tracks the CCM-R model generated trajectory to the stable hover at (0, 0) while losing control when attempting to track the R-R model generated trajectory.
the sparsity assumption on the input matrix B using the proposed alternating algorithm proposed in Section 4.1. Finally, we will incorporate data gathered on a physical system subject to noise and other difficult to capture nonlinear effects (e.g., drag, friction, backlash) and validate the resulting dynamics model and tracking controllers on the system itself to evaluate the robustness of the learned models.
1. Berkenkamp, F., Turchetta, M., Schoellig, A., Krause, A.: Safe model-based reinforcement learning with stability guarantees. In: Conf. on Neural Information Processing Systems (2017)
2. Brault, R., Heinonen, M., d’Alch´e Buc, F.: Random fourier features for operator-valued ker- nels. In: Asian Conf. on Machine Learning. pp. 110–125 (2016)
3. Chua, K., Calandra, R., McAllister, R., Levine, S.: Deep reinforcement learning in a handful of trials using probabilistic dynamics models. arXiv preprint arXiv:1805.12114 (2018)
4. Crouch, P.E., van der Schaft, A.J.: Variational and Hamiltonian Control Systems. Springer (1987)
5. Deisenroth, M.P., Rasmussen, C.E.: PILCO: A model-based and data-efficient approach to policy search. In: Int. Conf. on Machine Learning. pp. 465–472 (2011)
6. Fahroo, F., Ross, I.M.: Direct trajectory optimization by a chebyshev pseudospectral method. AIAA Journal of Guidance, Control, and Dynamics 25(1), 160–166 (2002)
7. Hearst, M.A., Dumais, S.T., Osuna, E., Platt, J., Scholkopf, B.: Support vector machines. IEEE Intelligent Systems and their Applications 13(4), 18–28 (1998)
8. Hettich, R., Kortanek, K.O.: Semi-infinite programming: Theory, methods, and applications. SIAM Review 35(3), 380–429 (1993)
9. Huang, P.S., Avron, H., Sainath, T.N., Sindhwani, V., Ramabhadran, B.: Kernel methods match deep neural networks on TIMIT. In: IEEE Int. Conf. on Acoustics, Speech and Signal Processing. pp. 205–209. IEEE (2014)
10. Khansari-Zadeh, S.M., Billard, A.: Learning stable nonlinear dynamical systems with Gaus- sian mixture models. IEEE Transactions on Robotics 27(5), 943–957 (2011)
11. Khansari-Zadeh, S.M., Khatib, O.: Learning potential functions from human demonstra- tions with encapsulated dynamic and compliant behaviors. Autonomous Robots 41(1), 45–69 (2017)
12. Kurdila, A.J., Zabarankin, M.: Convex functional analysis. Springer Science & Business Media, Birkh¨auser Basel edn. (2006)
13. Lax, P.: Linear Algebra and its Applications. John Wiley & Sons, 2 edn. (2007)
14. Lemme, A., Neumann, K., Reinhart, R.F., Steil, J.J.: Neural learning of vector fields for encoding stable dynamical systems. Neurocomputing 141(1), 3–14 (2014)
15. Liang, T., Rakhlin, A.: Just interpolate: Kernel ridgeless regression can generalize. arXiv preprint arXiv:1808.00387v1 (2018)
16. Lohmiller, W., Slotine, J.J.E.: On contraction analysis for non-linear systems. Automatica 34(6), 683–696 (1998)
17. Manchester, I.R., Slotine, J.J.E.: Control contraction metrics: Convex and intrinsic criteria for nonlinear feedback design. IEEE Transactions on Automatic Control (2017), In Press
18. Manchester, I.R., Slotine, J.J.E.: Robust control contraction metrics: A convex approach to
20. Medina, J.R., Billard, A.: Learning stable task sequences from demonstration with linear parameter varying systems and hidden Markov models. In: Conf. on Robot Learning. pp. 175–184 (2017)
21. Micheli, M., Glaun´es, J.A.: Matrix-valued kernels for shape deformation analysis. Geometry, Imaging and Computing 1(1), 57–139 (2014)
22. Minh, H.Q.: Operator-valued bochner theorem, fourier feature maps for operator-valued ker- nels, and vector-valued learning. arXiv preprint arXiv:1608.05639 (2016)
23. Nagabandi, A., Kahn, G., Fearing, R.S., Levine, S.: Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning. arXiv preprint arXiv:1708.02596 (2017)
24. Olfati-Saber, R.: Nonlinear Control of Underactuated Mechanical Systems with Application to Robotics and Aerospace Vehicles. Ph.D. thesis, Massachusetts Inst. of Technology (2001)
25. Rahimi, A., Recht, B.: Random features for large-scale kernel machines. In: Conf. on Neural Information Processing Systems. pp. 1177–1184 (2007)
26. Rahimi, A., Recht, B.: Uniform approximation of functions with random bases. In: Allerton Conf. on Communications, Control and Computing. IEEE (2008)
27. Ravichandar, H., Salehi, I., Dani, A.: Learning partially contracting dynamical systems from demonstrations. In: Conf. on Robot Learning (2017)
28. Reyhanoglu, M., van der Schaft, A., McClamroch, N.H., Kolmanovsky, I.: Dynamics and control of a class of underactuated mechanical systems. IEEE Transactions on Automatic Control 44(9), 1663–1671 (1999)
29. Sanner, R.M., Slotine, J.J.E.: Gaussian networks for direct adaptive control. IEEE Transac- tions on Neural Networks 3(6), 837–863 (1992)
30. Scholk¨opf, B., Smola, A.J.: Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press (2001)
31. Sindhwani, V., Tu, S., Khansari, M.: Learning contracting vector fields for stable imitation learning. arXiv preprint arXiv:1804.04878 (2018)
32. Singh, S., Majumdar, A., Slotine, J.J.E., Pavone, M.: Robust online motion planning via con- traction theory and convex optimization. In: Proc. IEEE Conf. on Robotics and Automation (2017), Extended Version, Available at http://asl.stanford.edu/wp-content/ papercite-data/pdf/Singh.Majumdar.Slotine.Pavone.ICRA17.pdf
33. Slotine, J.J.E., Li, W.: On the adaptive control of robot manipulators. Int. Journal of Robotics Research 6(3), 49–59 (1987)
34. Spong, M.W.: Underactuated mechanical systems. In: Control Problems in Robotics and Automation. Springer Berlin Heidelberg (1998)
35. Venkatraman, A., Capobianco, R., Pinto, L., Hebert, M., Nardi, D., Bagnell, A.: Improved learning of dynamics models for control. In: Int. Symp. on Experimental Robotics. pp. 703– 713. Springer (2016)
36. Venkatraman, A., Hebert, M., Bagnell, J.A.: Improving multi-step prediction of learned time series models. In: Proc. AAAI Conf. on Artificial Intelligence (2015)
37. Zhang, L., Wu, S.Y., L´opez, M.A.: A new exchange method for convex semi-infinite pro- gramming. SIAM Journal on Optimization 20(6), 2959–2977 (2010)
38. Zhou, D.X.: Derivative reproducing properties for kernel methods in learning theory. Journal of Computational and Applied Mathematics 220(1-2), 456–463 (2008)
Physically, structural assumption (11) is not as mysterious as it appears. Indeed, consider the standard dynamics form for mechanical systems:
where is the configuration vector,
is the inertia matrix,
contains the centrifugal and Coriolis terms, g are the gravitational terms,
is the (full rank) input matrix mapping and
is the input. For fully actuated systems,
. For underactuated systems,
. By rearranging the configuration vector [24,28,34], one can partition q as
where
represents the unactuated degrees of freedom and
represents the actuated degrees of freedom. Applying this partitioning to the dynamics equation above yields
where is an invertible square matrix. As observed in [28], a substantial class of underactuated systems can be represented in this manner. Of course, fully actuated systems also take this form (
). Thus, by taking as state
where
is momentum (so that
), the dynamics can be written as (5):
Notice that the input matrix takes the desired normal form in (11). To address the apparent difficult of working with the state representation of (q, p) (when usually only measurements of are typically available and the inertia matrix H(q) is unknown), we make use of the following result from [17]:
Theorem 1 (CCM Invariance to Diffeomorphisms). Suppose there exists a valid CCM with respect to the state x. Then, if
is a diffeomorphism, then, the CCM conditions also hold with respect to state z with metric
, where
evaluated at
.
Thus, for the (substantial) class of underactuated systems of the form (20), one would solve problem (10) by alternating between fitting (using the demonstrations
) and leveraging Theorem 1 and the previous estimate of H to enforce the matrix inequality constraints using the state representation (q, p). This allows us to borrow several existing results from adaptive control on estimating mechanical system by leveraging the known linearity of H(q) in terms of unknown mass property parameters multiplying known physical basis functions. We leave this extension however, to future work.
To go from the general problem definition in (10) to the finite dimensional problem in (16), we first must define appropriate function classes for , and W. We will do this using the framework of RKHS.
B.1 Reproducing Kernel Hilbert Spaces
Scalar-valued RKHS: Kernel methods [30] constitute a broad family of non-parametric modeling techniques for solving a range of problems in machine learning. A scalar-valued positive definite kernel function generates a Reproducing Kernel Hilbert Space (RKHS) of functions, with the nice property that if two functions are close in the distance derived from the norm (associated with the Hilbert space), then their pointwise evaluations are close at all points. This continuity of evaluation functionals has the far reaching consequence that norm-regularized learning problems over RKHSs admit finite dimensional solutions via Representer Theorems. The kernel k may be (non-uniquely) associated with a higher-dimensional embedding of the input space via a feature map,
, such that
, where D is infinite for universal kernels associated with RKHSs that are dense in the space of square integrable functions. Standard regularized linear statistical models in the embedding space implicitly provide non-linear inference with respect to the original input representation. In a nutshell, kernel methods provide a rigorous algorithmic framework for deriving non-linear counterparts of a whole array of linear statistical techniques, e.g. for classifi-cation, regression, dimensionality reduction and unsupervised learning. For details, we point the reader to [7].
Vector-valued RKHS: Dynamics estimation is a vector-valued learning problem. Such problems can be naturally formulated in terms of vector-valued generalizations of RKHS conceps. The theory and formalism of vector-valued RKHS can be traced as far back as the work of Laurent Schwarz in 1964, with applications ranging from solving partial differential equations to machine learning. Informally, we say that that H is an RKHS of -valued maps if for any
, the linear functional that maps
to
is continuous. More formally, denote the standard inner product on
as
and let
bethe vector space of all functions
and let
be the space of all bounded linear operators on
, i.e.,
matrices. A function
is an operator-valued positive definite kernel if for all
and for all finite set of points
and
,
Given such a K, we define the unique -valued RKHS
with reproducing kernel K as follows. For each
and
, define the function
. That is, for all
. Then, the Hilbert space
is defined to be the completion of the linear span of functions
with inner product between functions
, defined as:
Notably, the kernel K satisfies the following reproducing property:
and is thereby referred to as the reproducing kernel for . As our learning problems involves the Jacobian of f, we will also require a representation for the derivative of the kernel matrix. Accordingly, suppose the kernel matrix K lies in
. For any
, define the matrix functions
and
as
where the derivative of the kernel matrix is understood to be element-wise. Define the function
on
as
The following result provides a useful reproducing property for the derivatives of functions in that will be instrumental in deriving the solution algorithm.
Theorem 2 (Derivative Properties on [21]). Let
be a RKHS in
with reproducing kernel
. Then, for all
:
As mentioned in Section 5, the bi-linearity of constraint (9) forces us to adopt an alternating solution strategy whereby in the “dynamics” sub-problem, {W, w, w} are held fixed and we minimize with respect to
. In the “metric” sub-problem,
are held fixed and we minimize
with respect to {W, w, w}.
In the following we derive several useful representer theorems to characterize the solution of the two sub-problems, under the two simplifying assumptions introduced in Section 4.2.
B.2 Dynamics Sub-Problem
Let be the reproducing
kernel for an
-valued RKHS
and let
be another reproducing kernel for an
-valued RKHS
. Define the finite-dimensional subspaces:
Note that all taken from the training dataset of
tuples are a subset of
.
Theorem 3 (Representer Theorem for f, B). Suppose the reproducing kernel is chosen such that all functions
satisfy the sparsity structure
for
. Consider then the pointwise-relaxed dynamics sub-problem for problem (10):
Suppose the feasible set for the LMI constraint is non-empty. Denote and
1, . . . , m as the optimizers for this sub-problem. Then,
and
for all j = 1, . . . , m.
Proof. For a fixed W, the constraint is convex in by linearity of the matrix F in
and
. By assumption (1), and the simplifying form for
, the matrix F is additionally independent of B. Then, by strict convexity of the objective functional, there exists unique minimizers
, provided the feasible region is non-empty [12].
Since and
are closed, by the Projection theorem,
and
. Thus, any
may be written as
where
,
, and
. Similarly, any
may be written as
where
, and
.
We can re-write as:
Jf, B) =
since is closed under addition, and
. Thus,
simplifies to
Column p of takes the form
where is the
standard basis vector in
. Thus,
plays no role in pointwise relaxation of constraint (9) and thus does not affect problem feasibility. Given assumption 1,
also has no effect on problem feasibility. Thus, by non-negativity of the term in the square brackets in (25), we have that the optimal f lies in
and optimal
lies in
The key consequence of this theorem is the reduction of the infinite-dimensional search problem for the functions f and to a finite-dimensional convex optimization problem for the constant vectors
, by choos- ing the function classes
and
. Next, we characterize the optimal solution to the metric sub-problem.
B.3 Metric Sub-Problem
By the simplifying assumption in Section 4.1, constraint (8) requires that be only a function of the first
components of x. Thus, define
as a scalar reproducing kernel with associated real-valued scalar RKHS
. Additionally, define
as another scalar reproducing kernel with associated real-valued scalar RKHS
. In particular,
is only a function of the first
components of
in both arguments.
For all , we will take
(corresponding to
) and all other entries of W as functions in
. Define the kernel derivative functions:
From [38], it follows that the kernel derivative functions satisfy the following two properties, similar to Theorem 2:
A similar property holds for and
. Consider then the following finite-dimensional spaces:
and the proposed representation for W(x):
where are constant symmetric matrices with non-zero entries only in the top-left
block, and
are constant symmetric matrices with zero entries in the top-left
block.
Theorem 4 (Representer Theorem for W). Consider the pointwise-relaxed metric sub-problem for problem (10):
Suppose the feasible set of the above LMI constrains is non-empty. Denote as the optimizer for this sub-problem. Then,
takes the form given in (28).
Proof. Notice that while the regularizer term is strictly convex, the surrogate loss function for the condition number is affine. However, provided the feasible set is non-empty, there still exists a minimizer (possible non-unique) for the above sub-problem.
Now, notice that the (p, q) element of takes the form
Additionally, the (p, q) element of takes the form
That is, constraints (9) and uniform definiteness at the constraint points in can be written in terms of functions in
and
alone. By strict convexity of the regularizer, and recognizing that W(x) is symmetric, the result follows.
Similar to the previous section, the key property here is the reduction of the infinite-dimensional search over W(x) to the finite-dimensional convex optimization problem over the constant symmetric matrices by choosing the function class for the entries of W using the scalar-valued RKHS.
At this point, both sub-problems are finite-dimensional convex optimization problems. Crucially, the only simplifications made are those given in Section 4.2. However, a final computational challenge here is that the number of parameters scales with the number of training N and constraint points. This is a fundamental challenge in all non-parametric methods. In the next section we present the dimensionality reduction techniques used to alleviate these issues.
B.4 Approximation via Random Matrix Features
The size of the problem using full matrix-valued kernel expansions in grows rapidly in , the number of constraint points times the state dimensionality. This makes training slow for even moderately long demonstrations even in low-dimensional settings. The induced dynamical system is slow to evaluate and integrate at inference time. Random feature approximations to kernel functions have been extensively used to scale up training complexity and inference speed of kernel methods [9, 25] in a number of applications. The quality of approximation can be explicitly controlled by the number of random features. In particular, it has been shown [26] that any function in the RKHS associated with the exact kernel can be approximated to arbitrary accuracy by a linear combination of sufficiently large number of random features.
These approximations have only recently been extended to matrix-valued kernels [2, 22]. Given a matrix-valued kernel K, one defines an appropriate matrix-valued feature map with the property
where d controls the quality of this approximation. With such an approximation, notice that any function g in the associated RKHS can be re-parameterized as:
where . Thus, the function g is now summarized by the
dimensional vector
instead of
vectors in
(
parameters). Applying a similar trick to:
one can approximate terms of the form
by for some
. Applying this approximation to the subspaces in (22), (23), and (28), we finally arrive at the function parameterizations4 in eqs. (13)–(15), and the constraints reformulation in problem (16) in terms of the constant vectors. The regularization terms in the objective follow from the definition of the inner product in
:
∥
A canonical example is the the Gaussian separable kernel
with feature map:
where are i.i.d. draws from
.
Notice that the true input matrix for the PVTOL system satisfies Assumption 1. Furthermore, it is a constant matrix. Thus, the feature mapping is therefore just a constant matrix with the necessary sparsity structure.
The feature matrix for f was generated using the Random Fourier approximation to the Gaussian separable matrix-valued kernel (see Appendix B.4) with and s = 8n = 48 sampled Gaussian directions, yielding a feature mapping matrix
with
(96 features for each component of f). The scalar-valued reproducing kernels for the entries of W were taken to be the Gaussian kernel with
. To satisfy condition (8), the kernel for
was only a function of the first
components of x. A total of s = 36 Gaussian samples were taken to yield feature vectors
and
of dimension
. Furthermore, by symmetry of W(x) only n(n + 1)/2 functions were actually parameterized. Thus, the learning problem in (16) comprised of
parameters for the functions, plus the extra scalar constants
.
The learning parameters used were: model N-R (all , R-R (all
, CCM-R (all
;
. Tolerance parameters: constraints:
; discard tolerance
. Note that a small penalization on
was necessary for all models due to the fact that feature matrix
is rank deficient.
Let be the set of smooth curves
satisfying c(0) = p, c(1) = q, and define
. At each time t, given the nominal state/control pair
and current actual state x:
By existence of the metric/differential controller pair , the set K is always non-empty. The resulting
then ensures that the solution x(t) indeed converges towards
exponentially [32].
From an implementation perspective, note that having obtained the curve , constraint (33) is simply a linear inequality in k. Thus, one can analytically compute a feasible feedback control value, e.g., by searching for the smallest (in any norm) element of K; for additional details, we refer the reader to [17,32].