Neural Lander: Stable Drone Landing Control using Learned Dynamics

Precise near-ground trajectory control is difficult for multi-rotor drones, due to the complex aerodynamic effects caused by interactions between multi-rotor airflow and the environment. Conventional control methods often fail to properly account for these complex effects and fall short in accomplishing smooth landing. In this paper, we present a novel deep-learning-based robust nonlinear controller (Neural-Lander) that improves control performance of a quadrotor during landing. Our approach combines a nominal dynamics model with a Deep Neural Network (DNN) that learns high-order interactions. We apply spectral normalization (SN) to constrain the Lipschitz constant of the DNN. Leveraging this Lipschitz property, we design a nonlinear feedback linearization controller using the learned model and prove system stability with disturbance rejection. To the best of our knowledge, this is the first DNN-based nonlinear feedback controller with stability guarantees that can utilize arbitrarily large neural nets. Experimental results demonstrate that the proposed controller significantly outperforms a Baseline Nonlinear Tracking Controller in both landing and cross-table trajectory tracking cases. We also empirically show that the DNN generalizes well to unseen data outside the training domain.

Unmanned Aerial Vehicles (UAVs) require high precision control of aircraft positions, especially during landing and take-off. This problem is challenging largely due to complex interactions of rotor and wing airflows with the ground. The aerospace community has long identified such ground effect that can cause an increased lift force and a reduced aerodynamic drag. These effects can be both helpful and disruptive in flight stability [1], and the complications are exacerbated with multiple rotors. Therefore, performing automatic landing of UAVs is risk-prone, and requires expensive high-precision sensors as well as carefully designed controllers.

Compensating for ground effect is a long-standing problem in the aerial robotics community. Prior work has largely focused on mathematical modeling (e.g. [2]) as part of system identification (ID). These models are later used to approximate aerodynamic forces during flights close to the ground and combined with controller design for feed-forward cancellation (e.g. [3]). However, existing theoretical ground effect models are derived based on steady-flow conditions, whereas most practical cases exhibit unsteady flow. Alternative approaches, such as integral or adaptive control methods, often suffer from slow response and delayed feedback. [4] employs Bayesian Optimization for open-air control but not for take-off/landing. Given these limitations, the precision of existing fully automated systems for UAVs is still insufficient for landing and take-off, thereby necessitating the guidance of a human UAV operator during those phases.

To capture complex aerodynamic interactions without being overly constrained by conventional modeling assumptions, we take a machine-learning (ML) approach to build a black-box ground effect model using Deep Neural Networks (DNNs). However, incorporating such models into a UAV controller faces three key challenges. First, it is challenging to collect sufficient real-world training data, as DNNs are notoriously data-hungry. Second, due to high-dimensionality, DNNs can be unstable and generate unpredictable output, which makes the system susceptible to instability in the feedback control loop. Third, DNNs are often difficult to analyze, which makes it difficult to design provably stable DNN-based controllers.

The aforementioned challenges pervade previous works using DNNs to capture high-order non-stationary dynamics. For example, [5], [6] use DNNs to improve system ID of helicopter aerodynamics, but not for controller design. Other approaches aim to generate reference inputs or trajectories from DNNs [7]–[10]. However, these approaches can lead to challenging optimization problems [7], or rely heavily on a well-designed closed-loop controller and require a large amount of labeled training data [8]–[10]. A more classical approach of using DNNs is direct inverse control [11]–[13], but the non-parametric nature of a DNN controller also makes it challenging to guarantee stability and robustness to noise. [14] proposes a provably stable model-based Reinforcement Learning method based on Lyapunov analysis, but it requires a potentially expensive discretization step and relies on the native Lipschitz constant of the DNN.

Contributions. In this paper, we propose a learning-based controller, Neural-Lander, to improve the precision of quadrotor landing with guaranteed stability. Our approach directly learns the ground effect on coupled unsteady aerodynamics and vehicular dynamics. We use deep learning for system ID of residual dynamics and then integrate it with nonlinear feedback linearization control.

We train DNNs with layer-wise spectrally normalized weight matrices. We prove that the resulting controller is globally exponentially stable under bounded learning errors. This is achieved by exploiting the Lipschitz bound of spectrally normalized DNNs. It has earlier been shown that spectral normalization of DNNs leads to good generalization, i.e. stability in a learning-theoretic sense [15]. It is intriguing that spectral normalization simultaneously guarantees stability both in a learning-theoretic and a control-theoretic sense.

We evaluate Neural-Lander on trajectory tracking of a quadrotor during take-off, landing, and cross-table maneuvers. Neural-Lander lands a quadrotor much more accurately than a Baseline Nonlinear Tracking Controller with a pre-identified system. In particular, compared to the baseline, Neural-Lander decreases the error along the z axis from 0.13 m to 0 and mitigates x and y drifts by as much as 90% in the landing case. In the cross-table trajectory tracking task, Neural-Lander decreases the z error from 0.153 m to 0.027 m. We also demonstrate that the learned model can handle temporal dependency and improves on steady-state theoretical models.

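Given quadrotor states as global position $p \in \mathbb{R}^3$, velocity $v \in \mathbb{R}^3$, attitude rotation matrix $R \in SO(3)$, and body angular velocity $\omega \in \mathbb{R}^3$, we consider the following dynamics:

$$\dot{p} = v, \qquad m\dot{v} = m\mathbf{g} + R f_u + f_a, \tag{1a}$$

$$\dot{R} = R S(\omega), \qquad J\dot{\omega} = J\omega \times \omega + \tau_u + \tau_a, \tag{1b}$$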

where $m$ and $J$ are the mass and inertia matrix of the system, respectively, and $S(\cdot)$ is the skew-symmetric mapping. $\mathbf{g} = [0, 0, -g]^\top$ is the gravity vector, and $f_u = [0, 0, T]^\top$ and $\tau_u = [\tau_x, \tau_y, \tau_z]^\top$ are the total thrust and body torques from the four rotors predicted by a nominal model. We use $\eta = [T, \tau_x, \tau_y, \tau_z]^\top$ to denote the output wrench. A typical quadrotor control input uses squared motor speeds $u = [n_1^2, n_2^2, n_3^2, n_4^2]^\top$ and is linearly related to the output wrench by $\eta = B_0 u$, with


where $c_T$ and $c_Q$ are rotor force and torque coefficients, and $l_{arm}$ denotes the length of the rotor arm. The key difficulty of precise landing is the influence of unknown disturbance forces $f_a = [f_{a,x}, f_{a,y}, f_{a,z}]^\top$ and torques $\tau_a = [\tau_{a,x}, \tau_{a,y}, \tau_{a,z}]^\top$, which originate from complex aerodynamic interactions between the quadrotor and the environment.

Problem Statement: We aim to improve controller accuracy by learning the unknown disturbance forces $f_a$ and torques $\tau_a$ in (1). As we mainly focus on landing and take-off tasks, the attitude dynamics is limited and the aerodynamic disturbance torque $\tau_a$ is bounded. Thus the position dynamics (1a) and $f_a$ will be our primary concern. We first approximate $f_a$ using a DNN with spectral normalization to guarantee its Lipschitz constant, and then incorporate the DNN in our exponentially-stabilizing controller. Training is done off-line, and the learned dynamics is applied in the on-board controller in real-time to achieve smooth landing and take-off.

We learn the unknown disturbance force $f_a$ using a DNN with Rectified Linear Units (ReLU) activation. In general, DNNs equipped with ReLU converge faster during training, demonstrate more robust behavior with respect to changes in hyperparameters, and have fewer vanishing gradient problems compared to other activation functions such as sigmoid [16].

A. ReLU Deep Neural Networks

A ReLU deep neural network represents the functional mapping from the input $x$ to the output $f(x, \theta)$, parameterized by the DNN weights $\theta = W^1, \cdots, W^{L+1}$:

$$f(x, \theta) = W^{L+1} \phi\Big(W^{L} \phi\big(W^{L-1}(\cdots \phi(W^{1} x) \cdots)\big)\Big), \tag{3}$$

where the activation function $\phi(\cdot) = \max(\cdot, 0)$ is the element-wise ReLU function. ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations. However, deep neural networks are usually trained by first-order gradient-based optimization, which is highly sensitive to the curvature of the training objective and can be unstable [17]. To alleviate this issue, we apply the spectral normalization technique [15].

B. Spectral Normalization

Spectral normalization stabilizes DNN training by constraining the Lipschitz constant of the objective function. Spectrally normalized DNNs have also been shown to generalize well [18], which is an indication of stability in machine learning. Mathematically, the Lipschitz constant of a function, $\|f\|_{\text{Lip}}$, is defined as the smallest value such that

$$\forall\, x, x' : \quad \|f(x) - f(x')\|_2 \,/\, \|x - x'\|_2 \le \|f\|_{\text{Lip}}.$$

It is known that the Lipschitz constant of a general differentiable function $f$ is the maximum spectral norm (maximum singular value) of its gradient over its domain: $\|f\|_{\text{Lip}} = \sup_x \sigma(\nabla f(x))$.

The ReLU DNN in (3) is a composition of functions. Thus we can bound the Lipschitz constant of the network by constraining the spectral norm of each layer $g^l(x) = \phi(W^l x)$. For a linear map $g(x) = Wx$, the spectral norm of each layer is given by $\|g\|_{\text{Lip}} = \sup_x \sigma(\nabla g(x)) = \sup_x \sigma(W) = \sigma(W)$. Using the fact that the Lipschitz norm of the ReLU activation function $\phi(\cdot)$ is equal to 1, with the inequality $\|g_1 \circ g_2\|_{\text{Lip}} \le \|g_1\|_{\text{Lip}} \cdot \|g_2\|_{\text{Lip}}$, we can find the following bound on $\|f\|_{\text{Lip}}$:

$$\|f\|_{\text{Lip}} \le \prod_{l=1}^{L+1} \sigma(W^{l}). \tag{4}$$

In practice, we can apply spectral normalization to the weight matrices in each layer during training as follows:

$$\bar{W} = \frac{W}{\sigma(W)} \cdot \gamma^{\frac{1}{L+1}}, \tag{5}$$

where $\gamma$ is the intended Lipschitz constant for the DNN. The following lemma bounds the Lipschitz constant of a ReLU DNN with spectral normalization.

Lemma 3.1: For a multi-layer ReLU network $f(x, \theta)$, defined in (3) without an activation function on the output layer, using spectral normalization, the Lipschitz constant of the entire network satisfies:

$$\|f(x, \bar{\theta})\|_{\text{Lip}} \le \gamma,$$

with spectrally-normalized parameters $\bar{\theta} = \bar{W}^1, \cdots, \bar{W}^{L+1}$.

Proof: As in (4), the Lipschitz constant can be written as a composition of spectral norms over all layers. The proof follows from the spectral norms constrained as in (5).
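The per-layer rescaling and the resulting bound of Lemma 3.1 can be checked numerically. The following is a minimal sketch in plain Python, not the training code used in the experiments: the layer widths, the random weights, the target constant `gamma`, and the use of power iteration to estimate $\sigma(W)$ are all illustrative assumptions.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def spectral_norm(W, iters=200):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    Wt = [list(col) for col in zip(*W)]
    v = [1.0] * len(W[0])
    for _ in range(iters):
        v = matvec(Wt, matvec(W, v))
        norm = math.sqrt(sum(c * c for c in v))
        if norm == 0.0:
            return 0.0
        v = [c / norm for c in v]
    return math.sqrt(sum(c * c for c in matvec(W, v)))


def spectrally_normalize(weights, gamma):
    """Rescale every layer so its spectral norm is gamma**(1/(number of layers))."""
    per_layer = gamma ** (1.0 / len(weights))
    normalized = []
    for W in weights:
        scale = per_layer / spectral_norm(W)
        normalized.append([[w * scale for w in row] for row in W])
    return normalized


def relu_network(weights, x):
    """Evaluate a ReLU network with no activation on the output layer."""
    h = x
    for W in weights[:-1]:
        h = [max(a, 0.0) for a in matvec(W, h)]
    return matvec(weights[-1], h)


random.seed(0)
sizes = [4, 8, 8, 3]  # hypothetical layer widths (input 4, output 3)
weights = [
    [[random.gauss(0.0, 1.0) for _ in range(sizes[i])] for _ in range(sizes[i + 1])]
    for i in range(len(sizes) - 1)
]
gamma = 2.0  # hypothetical intended Lipschitz constant
normalized = spectrally_normalize(weights, gamma)
lipschitz_bound = math.prod(spectral_norm(W) for W in normalized)
print("product of layer spectral norms:", lipschitz_bound)
```

The product of the layer spectral norms equals `gamma` by construction, so any pair of inputs should satisfy $\|f(x) - f(x')\|_2 \le \gamma \|x - x'\|_2$, which can be spot-checked on random pairs.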

C. Constrained Training

We apply gradient-based optimization to train the ReLU DNN with a bounded Lipschitz constant. Estimating $f_a$ in (1) boils down to optimizing the parameters $\theta$ in the ReLU network in (3), given the observed value of $x$ and the target output. In particular, we want to control the Lipschitz constant of the ReLU network.

The optimization objective is as follows, where we minimize the prediction error with constrained Lipschitz constant:


In our case, $y_t$ is the observed disturbance force and $x_t$ is the observed states and control inputs. According to the upper bound in (4), we can substitute the constraint by minimizing the spectral norm of the weights in each layer. We use stochastic gradient descent (SGD) to optimize (6) and apply spectral normalization to regulate the weights. From Lemma 3.1, the trained ReLU DNN has a bounded Lipschitz constant.

Our Neural-Lander controller for 3-D trajectory tracking is constructed as a nonlinear feedback linearization controller whose stability guarantees are obtained using the spectral normalization of the DNN-based ground-effect model. We then exploit the Lipschitz property of the DNN to solve for the resulting control input using fixed-point iteration.

A. Reference Trajectory Tracking

The position tracking error is defined as $\tilde{p} = p - p_d$. Our controller uses a composite variable $s$, with $s = 0$ defining a manifold on which $\tilde{p}(t) \to 0$ exponentially:

$$s = \dot{\tilde{p}} + \Lambda \tilde{p} = \dot{p} - v_r, \tag{7}$$

with $\Lambda$ as a positive definite or diagonal matrix. Now the trajectory tracking problem is transformed to tracking a reference velocity $v_r = \dot{p}_d - \Lambda\tilde{p}$. Using the methods described in Sec. III, we define $\hat{f}_a(\zeta, u)$ as the DNN approximation to the disturbance aerodynamic forces, with $\zeta$ being the partial states used as input features to the network. We design the total desired rotor force $f_d$ as

$$f_d = (R f_u)_d = \bar{f}_d - \hat{f}_a, \quad \text{with } \bar{f}_d = m\dot{v}_r - K_v s - m\mathbf{g}. \tag{8}$$

Substituting (8) into (1), the closed-loop dynamics simply become $m\dot{s} + K_v s = \epsilon$, with approximation error $\epsilon = f_a - \hat{f}_a$. Hence, $\tilde{p}(t) \to 0$ globally and exponentially with bounded error, as long as $\|\epsilon\|$ is bounded [19]–[21].
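To illustrate the bounded-error claim, the closed-loop relation $m\dot{s} + K_v s = \epsilon$ can be integrated numerically for a scalar $s$. The sketch below is a minimal illustration, not the flight code: the gain `k_v`, the error signals, the initial condition, and the forward-Euler step are hypothetical choices, and only the mass of 1.47 kg is taken from the experimental setup.

```python
import math


def simulate_s(m, k_v, eps, s0, dt=1e-3, t_end=10.0):
    """Forward-Euler integration of m * s_dot + k_v * s = eps(t) for scalar s."""
    s, t = s0, 0.0
    history = []
    for _ in range(int(round(t_end / dt))):
        s += dt * (-k_v * s + eps(t)) / m
        t += dt
        history.append(s)
    return history


m = 1.47         # drone mass in kg
k_v = 4.0        # hypothetical velocity gain
eps_const = 0.2  # hypothetical constant learning error in N
eps_amp = 0.2    # hypothetical amplitude of a time-varying learning error in N

# Constant error: s settles at eps / k_v.
const_run = simulate_s(m, k_v, lambda t: eps_const, s0=1.0)

# Bounded time-varying error: s stays inside the ball of radius eps_amp / k_v.
sine_run = simulate_s(m, k_v, lambda t: eps_amp * math.sin(t), s0=1.0)

print("final s (constant error):", const_run[-1])
print("max |s| over last half (sinusoidal error):",
      max(abs(s) for s in sine_run[len(sine_run) // 2:]))
```

In both runs the initial offset decays at rate `k_v / m`, and what remains is set by the size of the error term rather than by the initial condition.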

Consequently, the desired total thrust $T_d$ and desired force direction $\hat{k}_d$ can be computed as

$$T_d = f_d \cdot \hat{k}, \qquad \hat{k}_d = f_d / \|f_d\|, \tag{9}$$

with $\hat{k}$ being the unit vector of the rotor thrust direction (typically the z-axis in quadrotors). Using $\hat{k}_d$ and fixing a desired yaw angle, the desired attitude $R_d$ can be deduced [22]. We assume that a nonlinear attitude controller uses the desired torque $\tau_d$ from the rotors to track $R_d(t)$. One such example is in [21]:


where the reference angular rate $\omega_r$ is designed similar to (7), so that when $\omega \to \omega_r$, exponential trajectory tracking of a desired attitude $R_d(t)$ is guaranteed within some bounded error in the presence of bounded disturbance torques.

B. Learning-based Discrete-time Nonlinear Controller

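From (2), (9) and (10), we can relate the desired wrench $\eta_d = [T_d, \tau_d^\top]^\top$ with the control signal $u$ through

$$B_0 u = \begin{bmatrix} \big(\bar{f}_d - \hat{f}_a(\zeta, u)\big) \cdot \hat{k} \\ \tau_d \end{bmatrix}. \tag{11}$$

Because of the dependency of $\hat{f}_a$ on $u$, the control synthesis problem here is non-affine. Therefore, we propose the following fixed-point iteration method for solving (11):

$$u_k = B_0^{-1} \begin{bmatrix} \big(\bar{f}_d - \hat{f}_a(\zeta, u_{k-1})\big) \cdot \hat{k} \\ \tau_d \end{bmatrix}, \tag{12}$$

where $u_k$ and $u_{k-1}$ are the control inputs for the current and previous time-steps in the discrete-time controller. Next, we prove the stability of the system and convergence of the control inputs in (12).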
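A scalar analogue makes the fixed-point scheme concrete. The sketch below is only an illustration under stated assumptions: `b0` stands in for $B_0$, `f_bar` for the thrust component of $\bar{f}_d$, and `f_hat` is a hypothetical smooth stand-in for the learned model with Lipschitz constant 0.5 in $u$. The states are frozen and the loop is run to convergence, whereas the discrete-time controller applies one update per time step.

```python
import math


def fixed_point_control(b0, f_bar, f_hat, u_init, iters=50):
    """Iterate u_k = (f_bar - f_hat(u_{k-1})) / b0 and return all iterates."""
    u = u_init
    trace = [u]
    for _ in range(iters):
        u = (f_bar - f_hat(u)) / b0
        trace.append(u)
    return trace


def f_hat(u):
    """Hypothetical learned disturbance model; |d f_hat / d u| <= 0.5."""
    return 0.5 * math.tanh(u)


b0 = 2.0     # hypothetical scalar allocation gain
f_bar = 3.0  # hypothetical desired force before disturbance compensation
alpha = 0.5 / b0  # contraction factor: Lipschitz constant of f_hat divided by b0

trace = fixed_point_control(b0, f_bar, f_hat, u_init=0.0)
u_star = trace[-1]
residual = b0 * u_star + f_hat(u_star) - f_bar
print("converged control input:", u_star)
print("residual of the non-affine equation:", residual)
```

Because `alpha` is below 1, successive iterates move closer together by at least that factor, so a handful of updates already gives a usable control input.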

The closed-loop tracking error analysis provides direct guidance on how to tune the neural network and controller parameters to improve control performance and robustness.

A. Control Allocation as Contraction Mapping

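We first show that the control input $u_k$ converges to the solution of (11) when all states are fixed.

Lemma 5.1: Define the mapping $u_k = F(u_{k-1})$ based on (12) and fix all current states:

$$F(u) = B_0^{-1} \begin{bmatrix} \big(\bar{f}_d - \hat{f}_a(\zeta, u)\big) \cdot \hat{k} \\ \tau_d \end{bmatrix}. \tag{13}$$

If $\hat{f}_a(\zeta, u)$ is $L_a$-Lipschitz continuous and $\sigma(B_0^{-1}) \cdot L_a < 1$, then $F(\cdot)$ is a contraction mapping, and $u_k$ converges to the unique solution of $u^* = F(u^*)$.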

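Proof: $\forall\, u_1, u_2 \in \mathcal{U}$, with $\mathcal{U}$ being a compact set of feasible control inputs, and given fixed states as $\bar{f}_d$, $\tau_d$ and $\hat{k}$:

$$\|F(u_1) - F(u_2)\|_2 = \left\| B_0^{-1} \left( \hat{f}_a(\zeta, u_1) - \hat{f}_a(\zeta, u_2) \right) \right\|_2 \le \sigma(B_0^{-1}) \cdot L_a \|u_1 - u_2\|_2.$$

Thus, $\exists\, \alpha < 1$ s.t. $\|F(u_1) - F(u_2)\|_2 \le \alpha \|u_1 - u_2\|_2$. Hence, $F(\cdot)$ is a contraction mapping.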
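As a worked consequence of the contraction property (a standard result of the Banach fixed-point theorem, stated here for illustration with all states held fixed), the distance to the fixed point shrinks geometrically:

$$\|u_k - u^*\|_2 \le \alpha^{k} \, \|u_0 - u^*\|_2, \qquad \alpha = \sigma(B_0^{-1}) \cdot L_a < 1.$$

For example, with a hypothetical $\alpha = 0.5$, ten iterations reduce the initial error by a factor of $2^{-10} \approx 10^{-3}$. A smaller Lipschitz constant $L_a$, which spectral normalization controls through $\gamma$, therefore directly speeds up convergence of the control input.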

B. Stability of Learning-based Nonlinear Controller

Before continuing to prove the stability of the full system, we make the following assumptions.

Assumption 1: The desired states along the position trajectory $p_d(t)$, $\dot{p}_d(t)$, and $\ddot{p}_d(t)$ are bounded.

Assumption 2: The one-step difference of the control signal satisfies $\|u_k - u_{k-1}\| \le \rho \|s\|$ with a small positive $\rho$.

Here we provide the intuition behind this assumption. From (13), we can derive the following approximate relation with $\Delta(\cdot)_k = \|(\cdot)_k - (\cdot)_{k-1}\|$:


Because the update rates of the attitude controller (> 100 Hz) and motor speed control (> 5 kHz) are much higher than that of the position controller (≈ 10 Hz), in practice, we can safely neglect $\Delta s_k$, $\Delta \dot{v}_{r,k}$, and $\Delta \zeta_k$ in one update (Theorem 11.1 [23]). Furthermore, $\Delta \tau_{d,k}$ can be limited internally by the attitude controller. It leads to:


with $c$ being a small constant and $\sigma(B_0^{-1}) \cdot L_a < 1$ from Lemma 5.1, we can deduce that $\Delta u$ rapidly converges to a small ultimate bound between each position controller update.

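Assumption 3: The learning error of $\hat{f}_a(\zeta, u)$ over the compact sets $\zeta \in \mathcal{Z}$, $u \in \mathcal{U}$ is upper bounded by $\epsilon_m = \sup_{\zeta \in \mathcal{Z}, u \in \mathcal{U}} \|\epsilon(\zeta, u)\|$, where $\epsilon(\zeta, u) = f_a(\zeta, u) - \hat{f}_a(\zeta, u)$.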

DNNs have been shown to generalize well to the set of unseen events that are from almost the same distribution as the training set [24], [25]. This empirical observation has also been studied theoretically in order to shed more light on the complexity of these models [18], [26]–[28]. Based on the above assumptions, we can now present our overall stability and robustness result.

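Theorem 5.2: Under Assumptions 1-3, for a time-varying $p_d(t)$, the controller defined in (8) and (12) with $\lambda_{\min}(K_v) > L_a \rho$ achieves exponential convergence of the composite variable $s$ to the error ball $\lim_{t \to \infty} \|s(t)\| = \epsilon_m / (\lambda_{\min}(K_v) - L_a \rho)$ with rate $(\lambda_{\min}(K_v) - L_a \rho)/m$. And $\tilde{p}$ exponentially converges to the error ball

$$\lim_{t \to \infty} \|\tilde{p}(t)\| = \frac{\epsilon_m}{\lambda_{\min}(\Lambda) \, (\lambda_{\min}(K_v) - L_a \rho)} \tag{14}$$

with rate $\lambda_{\min}(\Lambda)$.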

Proof: We begin the proof by selecting a Lyapunov function as $V(s) = \frac{1}{2} m \|s\|^2$. Then, by applying the controller (8), we get the time-derivative of $V$:


Let $\lambda = \lambda_{\min}(K_v)$ denote the minimum eigenvalue of the positive-definite matrix $K_v$. By applying the Lipschitz property of $\hat{f}_a$ (Lemma 3.1) and Assumption 2, we obtain



Fig. 1: (a) Intel Aero drone; (b) Training data trajectory. Part I (0 to 250 s) contains maneuvers at different heights (0.05 m to 1.50 m). Part II (250 s to 350 s) includes random x, y, and z motions for maximum state-space coverage.

Using the Comparison Lemma [23], we define $W(t) = \sqrt{V(t)} = \sqrt{m/2}\,\|s\|$ and $\dot{W} = \dot{V} / (2\sqrt{V})$ to obtain


It can be shown that this leads to finite-gain $\mathcal{L}_p$ stability and input-to-state stability (ISS) [29]. Furthermore, the hierarchical combination between $s$ and $\tilde{p}$ in (7) results in $\lim_{t \to \infty} \|\tilde{p}(t)\| = \lim_{t \to \infty} \|s(t)\| / \lambda_{\min}(\Lambda)$, yielding (14).
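The chain of inequalities behind Theorem 5.2 can be written out as follows. This is a sketch derived from the controller (8), the one-step delay in (12), the Lipschitz bound of Lemma 3.1, and Assumptions 2 and 3; the grouping of intermediate terms is an assumption about how the steps are arranged. Since the control applied at step $k$ compensates with $\hat{f}_a(\zeta_k, u_{k-1})$, the closed loop is $m\dot{s} = -K_v s + \hat{f}_a(\zeta_k, u_k) - \hat{f}_a(\zeta_k, u_{k-1}) + \epsilon(\zeta_k, u_k)$, so that

$$
\begin{aligned}
\dot{V} &= s^\top \big( -K_v s + \hat{f}_a(\zeta_k, u_k) - \hat{f}_a(\zeta_k, u_{k-1}) + \epsilon(\zeta_k, u_k) \big) \\
&\le -\lambda \|s\|^2 + \|s\| \big( L_a \|u_k - u_{k-1}\| + \epsilon_m \big) \\
&\le -(\lambda - L_a \rho) \|s\|^2 + \epsilon_m \|s\|, \\
\dot{W} &= \frac{\dot{V}}{2\sqrt{V}} \le -\frac{\lambda - L_a \rho}{m} \, W + \frac{\epsilon_m}{\sqrt{2m}}.
\end{aligned}
$$

Setting the right-hand side of the last line to zero gives $W_\infty = \sqrt{m/2}\; \epsilon_m / (\lambda - L_a \rho)$, and dividing by $\sqrt{m/2}$ recovers the error ball $\|s\| = \epsilon_m / (\lambda - L_a \rho)$ with rate $(\lambda - L_a \rho)/m$ stated in the theorem.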

In our experiments, we evaluate both the generalization performance of our DNN as well as the overall control performance of Neural-Lander. The experimental setup is composed of a motion capture system with 17 cameras, a WiFi router for communication, and an Intel Aero drone, weighing 1.47 kg with an onboard Linux computer (2.56 GHz Intel Atom x7 processor, 4 GB DDR3 RAM). We retrofitted the drone with eight reflective infrared markers for accurate position, attitude and velocity estimation at 100 Hz. The Intel Aero drone and the test space are shown in Fig. 1(a).

A. Bench Test

To identify a good nominal model, we first measured the mass $m$, the rotor diameter $D$, the air density $\rho$, and gravity $g$. Then we performed a bench test to determine the thrust constant $c_T$, as well as the non-dimensional thrust coefficient $C_T = \frac{c_T}{\rho D^4}$. Note that $C_T$ is a function of propeller speed $n$, and here we picked a nominal value at $n = 2000$ RPM.

B. Real-World Flying Data and Preprocessing

To estimate the disturbance force $f_a$, an expert pilot manually flew the drone at different heights, and we collected training data consisting of sequences of state estimates and control inputs $\{(p, v, R, u), y\}$, where $y$ is the observed value of $f_a$. We utilized the relation $f_a = m\dot{v} - m\mathbf{g} - R f_u$ from (1) to calculate $f_a$, where $f_u$ is calculated based on the nominal $c_T$ from the bench test in Sec. VI-A. Our training set is a single continuous trajectory with varying heights and velocities. The trajectory has two parts shown in Fig. 1(b). We aim to learn the ground effect through Part I of the training set, and other aerodynamic forces such as air drag through Part II.
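The labeling relation $f_a = m\dot{v} - m\mathbf{g} - R f_u$ can be sketched for a single pair of consecutive samples. This is a minimal illustration rather than the preprocessing pipeline used for the experiments: estimating $\dot{v}$ by a finite difference is an assumption (how the acceleration was obtained is not specified here), and the sample values below are hypothetical.

```python
def disturbance_force(m, v_prev, v_next, dt, R, thrust, g=9.81):
    """Estimate f_a = m * v_dot - m * g_vec - R * f_u from two velocity samples.

    v_dot is approximated by a finite difference (an assumption of this sketch),
    g_vec = [0, 0, -g] is the gravity vector and f_u = [0, 0, thrust] is the
    nominal body-frame thrust.
    """
    v_dot = [(b - a) / dt for a, b in zip(v_prev, v_next)]
    g_vec = [0.0, 0.0, -g]
    f_u = [0.0, 0.0, thrust]
    R_f_u = [sum(r * f for r, f in zip(row, f_u)) for row in R]
    return [m * a - m * gi - rf for a, gi, rf in zip(v_dot, g_vec, R_f_u)]


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical hover sample: level attitude, zero velocity, and a nominal
# thrust of 13.0 N that is smaller than the weight of the 1.47 kg drone.
f_a = disturbance_force(
    m=1.47,
    v_prev=[0.0, 0.0, 0.0],
    v_next=[0.0, 0.0, 0.0],
    dt=0.01,
    R=identity,
    thrust=13.0,
)
print("estimated disturbance force:", f_a)
```

In this hover example the drone holds its height with less nominal thrust than its weight, so the missing upward force is attributed to $f_{a,z}$, which is the signature of ground effect.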


Fig. 2: (a) Learned $\hat{f}_{a,z}$ compared to the ground effect model with respect to height $z$, with $v_z = v_x = v_y = 0$ m/s, $R = I$, $u = 6400$ RPM. Ground truth points are from hovering data at different heights. (b) Learned $\hat{f}_{a,z}$ with respect to rotation speed $n$ ($z = 0.2$ m, $v_z = 0$ m/s), compared to $C_T$ measured in the bench test. (c) Heatmaps of learned $\hat{f}_{a,z}$ versus $z$ and $v_z$. (Left) ReLU network with spectral normalization. (Right) ReLU network without spectral normalization.

C. DNN Prediction Performance

We train a deep ReLU network $\hat{f}_a(\zeta, u) = \hat{f}_a(z, v, R, u)$, with $z$, $v$, $R$, $u$ corresponding to global height, global velocity, attitude, and control input. We build the ReLU network using PyTorch [30]. Our ReLU network consists of four fully connected hidden layers, with input and output dimensions 12 and 3, respectively. We use spectral normalization (5) to constrain the Lipschitz constant of the DNN.

We compare the near-ground estimation accuracy of our DNN model with an existing 1D steady ground effect model [1], [3]:


where $T$ is the thrust generated by propellers, $n$ is the rotation speed, $n_0$ is the idle RPM, and $\mu$ depends on the number and the arrangement of propellers ($\mu = 1$ for a single propeller, but must be tuned for multiple propellers). Note that $c_T$ is a function of $n$. Thus, we can derive $\bar{f}_{a,z}(n, z)$ from $T(n, z)$.
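For intuition about the steady model, the sketch below evaluates the classical Cheeseman-Bennett thrust ratio from [1], $T_{\text{IGE}} / T_{\text{OGE}} = 1 / (1 - \mu (D / (8z))^2)$, which reduces to the single-rotor form $1 / (1 - (r / (4z))^2)$ with rotor radius $r = D/2$ and $\mu = 1$. Treating this as the form of the 1D model, ignoring the dependence of $c_T$ on $n$, and the propeller diameter of 0.23 m are all assumptions made for illustration only.

```python
def ground_effect_thrust_ratio(z, D, mu=1.0):
    """Ratio of thrust in ground effect to free-air thrust at height z.

    Uses 1 / (1 - mu * (D / (8 z))**2); the model is singular very close
    to the ground, so such inputs are rejected.
    """
    x = mu * (D / (8.0 * z)) ** 2
    if x >= 1.0:
        raise ValueError("model is not valid this close to the ground")
    return 1.0 / (1.0 - x)


def ground_effect_force(T_free, z, D, mu=1.0):
    """Extra lift predicted by the steady model for a free-air thrust T_free."""
    return T_free * (ground_effect_thrust_ratio(z, D, mu) - 1.0)


D = 0.23                # hypothetical propeller diameter in m
T_hover = 1.47 * 9.81   # assumes T = m g far from the ground

for height in (0.06, 0.1, 0.2, 0.5, 1.5):
    extra = ground_effect_force(T_hover, height, D)
    print(f"z = {height:4.2f} m -> predicted extra lift = {extra:6.3f} N")
```

The predicted extra lift decays quickly with height and does not depend on velocity, which is one reason a steady model of this kind cannot capture the unsteady effects targeted by the learned $\hat{f}_a$.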

Fig. 2(a) shows the comparison between the estimated $f_a$ from the DNN and the theoretical ground effect model (15) at different $z$ (assuming $T = mg$ when $z = \infty$). We can see that our DNN can achieve much better estimates than the theoretical ground effect model. We further investigate the trend of $\bar{f}_{a,z}$ with respect to the rotation speed $n$. Fig. 2(b) shows the learned $\hat{f}_{a,z}$ over the rotation speed $n$ at a given height, in comparison with the $C_T$ measured from the bench test. We observe that the increasing trend of the estimates $\hat{f}_{a,z}$ is consistent with bench test results for $C_T$.

Fig. 3: Baseline Controller and Neural-Lander performance in take-off and landing. Means (solid curves) and standard deviations (shaded areas) of 10 trajectories.

To understand the benefits of SN, we compared $\hat{f}_{a,z}$ predicted by the DNNs trained both with and without SN, as shown in Fig. 2(c). Note that $v_z$ from $-1$ m/s to $1$ m/s is covered in our training set, but $-2$ m/s to $-1$ m/s is not. We observe the following differences:

1) Ground effect: $\hat{f}_{a,z}$ increases as $z$ decreases, which is also shown in Fig. 2(a).

2) Air drag: $\hat{f}_{a,z}$ increases as the drone goes down ($v_z < 0$) and it decreases as the drone goes up ($v_z > 0$).

3) Generalization: the spectrally normalized DNN is much smoother and can also generalize to new input domains not contained in the training set.

In [18], the authors theoretically show that spectral normalization can provide tighter generalization guarantees on unseen data, which is consistent with our empirical observation. We will connect generalization theory more tightly with our robustness guarantees in the future.

D. Baseline Controller

We compared the Neural-Lander with a Baseline Nonlinear Tracking Controller. We implemented both a Baseline Controller similar to (7) and (8) with $\hat{f}_a \equiv 0$, as well as an integral controller variation with $v_r = \dot{p}_d - 2\Lambda\tilde{p} - \Lambda^2 \int_0^t \tilde{p}(\tau)\,d\tau$. Though an integral gain can cancel steady-state error during set-point regulation, our flight results showed that the performance can be sensitive to the integral gain, especially during trajectory tracking. This can be seen in the demo video.

Fig. 4: Neural-Lander performance in take-off and landing with different DNN capacities. 1 layer means $\hat{f}_a = Ax + b$; 0 layer means $\hat{f}_a = b$; Baseline means $\hat{f}_a \equiv 0$.

E. Setpoint Regulation Performance

First, we tested the two controllers' performance in take-off/landing, by commanding the position setpoint $p_d$ from (0, 0, 0) to (0, 0, 1), then back to (0, 0, 0), with $\dot{p}_d \equiv 0$. From Fig. 3, we can conclude that there are two main benefits of our Neural-Lander. (a) Neural-Lander can control the drone to precisely and smoothly land on the ground surface, while the Baseline Controller struggles to achieve 0 terminal height due to the ground effect. (b) Neural-Lander can mitigate drifts in the x-y plane, as it also learned about additional aerodynamics such as air drag.

Second, we tested Neural-Lander performance with different DNN capacities. Fig. 4 shows that, compared to the baseline ($\hat{f}_a \equiv 0$), the 1-layer model could decrease the z error, but not enough to land the drone. The 0-layer model generated significant error during take-off.

In experiments, we observed that the Neural-Lander without spectral normalization can even produce unexpected controller outputs leading to a crash, which empirically supports the necessity of SN in training the DNN and designing the controller.

F. Trajectory Tracking Performance

To show that our algorithm can handle more complicated environments where physics-based modelling of dynamics would be substantially more difficult, we devise a task of tracking an elliptic trajectory very close to a table with a period of 10 seconds shown in Fig. 5. The trajectory is partially over the table with significant ground effects, and a sharp transition to free space at the edge of the table. We compared the performance of both Neural-Lander and Baseline Controller on this test.

In order to model the complex dynamics near the table, we manually flew the drone in the space close to the table to collect another data set. We trained a new ReLU DNN model with x-y positions as additional input features: $\hat{f}_a(p, v, R, u)$. Similar to the setpoint experiment, the benefit of spectral normalization can be seen in Fig. 5(a), where only the spectrally-normalized DNN exhibits a clear table boundary.

Fig. 5: (a) Heatmaps of learned $\hat{f}_{a,z}$ versus $x$ and $y$, with other inputs fixed. (Left) ReLU network with spectral normalization. (Right) ReLU network without spectral normalization. (b) Tracking performance and statistics.

Fig. 5(b) shows that Neural-Lander outperformed the Baseline Controller for tracking the desired position trajectory in all x, y, and z axes. Additionally, Neural-Lander showed a lower variance in height, even at the edge of the table, as the controller captured the changes in ground effects when the drone flew over the table.

In summary, the experimental results with multiple ground interaction scenarios show that much smaller tracking errors are obtained by Neural-Lander, which is essentially the nonlinear tracking controller with feedforward cancellation of a spectrally-normalized DNN.

In this paper, we present Neural-Lander, a deep learning based nonlinear controller with guaranteed stability for precise quadrotor landing. Compared to the Baseline Controller, Neural-Lander is able to significantly improve control performance. The main benefits are (1) our method can learn from coupled unsteady aerodynamics and vehicle dynamics to provide more accurate estimates than theoretical ground effect models, (2) our model can capture both the ground effect and other non-dominant aerodynamics and outperforms the conventional controller in all axes (x, y and z), and (3) we provide rigorous theoretical analysis of our method and guarantee the stability of the controller, which also implies generalization to unseen domains.

Future work includes further generalizing the capabilities of Neural-Lander to handle unseen state and disturbance domains, such as those generated by a wind fan array.

The authors thank Joel Burdick, Mory Gharib and Daniel Pastor Moreno. The work is funded in part by Caltech’s Center for Autonomous Systems and Technologies and Raytheon Company.

[1] I. Cheeseman and W. Bennett, “The effect of ground on a helicopter rotor in forward flight,” 1955.

[2] K. Nonaka and H. Sugizaki, “Integral sliding mode altitude control for a small model helicopter with ground effect compensation,” in American Control Conference (ACC), 2011. IEEE, 2011, pp. 202–207.

[3] L. Danjun, Z. Yan, S. Zongying, and L. Geng, “Autonomous landing of quadrotor based on ground effect modelling,” in Control Conference (CCC), 2015 34th Chinese. IEEE, 2015, pp. 5647–5652.

[4] F. Berkenkamp, A. P. Schoellig, and A. Krause, “Safe controller optimization for quadrotors with Gaussian processes,” in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2016, pp. 493–496. [Online]. Available: https://arxiv.org/abs/1509.01066

[5] P. Abbeel, A. Coates, and A. Y. Ng, “Autonomous helicopter aerobatics through apprenticeship learning,” The International Journal of Robotics Research, vol. 29, no. 13, pp. 1608–1639, 2010.

[6] A. Punjani and P. Abbeel, “Deep learning helicopter dynamics models,” in Robotics and Automation (ICRA), 2015 IEEE International Conference on. IEEE, 2015, pp. 3223–3230.

[7] S. Bansal, A. K. Akametalu, F. J. Jiang, F. Laine, and C. J. Tomlin, “Learning quadrotor dynamics using neural network for flight control,” in Decision and Control (CDC), 2016 IEEE 55th Conference on. IEEE, 2016, pp. 4653–4660.

[8] Q. Li, J. Qian, Z. Zhu, X. Bao, M. K. Helwa, and A. P. Schoellig, “Deep neural networks for improved, impromptu trajectory tracking of quadrotors,” in Robotics and Automation (ICRA), 2017 IEEE International Conference on. IEEE, 2017, pp. 5183–5189.

[9] S. Zhou, M. K. Helwa, and A. P. Schoellig, “Design of deep neural networks as add-on blocks for improving impromptu trajectory tracking,” in Decision and Control (CDC), 2017 IEEE 56th Annual Conference on. IEEE, 2017, pp. 5201–5207.

[10] C. Sánchez-Sánchez and D. Izzo, “Real-time optimal control via deep neural networks: study on landing problems,” Journal of Guidance, Control, and Dynamics, vol. 41, no. 5, pp. 1122–1135, 2018.

[11] S. Balakrishnan and R. Weil, “Neurocontrol: A literature survey,” Mathematical and Computer Modelling, vol. 23, no. 1-2, pp. 101–117, 1996.

[12] M. T. Frye and R. S. Provence, “Direct inverse control using an artificial neural network for the autonomous hover of a helicopter,” in Systems, Man and Cybernetics (SMC), 2014 IEEE International Conference on. IEEE, 2014, pp. 4121–4122.

[13] H. Suprijono and B. Kusumoputro, “Direct inverse control based on neural network for unmanned small helicopter attitude and altitude control,” Journal of Telecommunication, Electronic and Computer Engineering (JTEC), vol. 9, no. 2-2, pp. 99–102, 2017.

[14] F. Berkenkamp, M. Turchetta, A. P. Schoellig, and A. Krause, “Safe model-based reinforcement learning with stability guarantees,” in Proc. of Neural Information Processing Systems (NIPS), 2017. [Online]. Available: https://arxiv.org/abs/1705.08551

[15] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, “Spectral normalization for generative adversarial networks,” arXiv preprint arXiv:1802.05957, 2018.

[16] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.

[17] T. Salimans and D. P. Kingma, “Weight normalization: A simple reparameterization to accelerate training of deep neural networks,” in Advances in Neural Information Processing Systems, 2016, pp. 901–909.

[18] P. L. Bartlett, D. J. Foster, and M. J. Telgarsky, “Spectrally-normalized margin bounds for neural networks,” in Advances in Neural Information Processing Systems, 2017, pp. 6240–6249.

[19] J. Slotine and W. Li, Applied Nonlinear Control. Prentice Hall, 1991.

[20] S. Bandyopadhyay, S.-J. Chung, and F. Y. Hadaegh, “Nonlinear attitude control of spacecraft with a large captured object,” Journal of Guidance, Control, and Dynamics, vol. 39, no. 4, pp. 754–769, 2016.

[21] X. Shi, K. Kim, S. Rahili, and S.-J. Chung, “Nonlinear control of autonomous flying cars with wings and distributed electric propulsion,” in 2018 IEEE Conference on Decision and Control (CDC). IEEE, 2018, pp. 5326–5333.

[22] D. Morgan, G. P. Subramanian, S.-J. Chung, and F. Y. Hadaegh, “Swarm assignment and trajectory optimization using variable-swarm, distributed auction assignment and sequential convex programming,” Int. J. Robotics Research, vol. 35, no. 10, pp. 1261–1285, 2016.

[23] H. Khalil, Nonlinear Systems, ser. Pearson Education. Prentice Hall, 2002.

[24] C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, “Understanding deep learning requires rethinking generalization,” arXiv preprint arXiv:1611.03530, 2016.

[25] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.

[26] B. Neyshabur, S. Bhojanapalli, D. McAllester, and N. Srebro, “A PAC-Bayesian approach to spectrally-normalized margin bounds for neural networks,” arXiv preprint arXiv:1707.09564, 2017.

[27] G. K. Dziugaite and D. M. Roy, “Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data,” arXiv preprint arXiv:1703.11008, 2017.

[28] B. Neyshabur, S. Bhojanapalli, D. McAllester, and N. Srebro, “Exploring generalization in deep learning,” in Advances in Neural Information Processing Systems, 2017, pp. 5947–5956.

[29] S.-J. Chung, S. Bandyopadhyay, I. Chang, and F. Y. Hadaegh, “Phase synchronization control of complex networks of Lagrangian systems on adaptive digraphs,” Automatica, vol. 49, no. 5, pp. 1148–1161, 2013.

[30] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” 2017.
