Deep learning for universal linear embeddings of nonlinear dynamics

2017·Arxiv

Abstract

Abstract

Identifying coordinate transformations that make strongly nonlinear dynamics approximately linear is a central challenge in modern dynamical systems. These transformations have the potential to enable prediction, estimation, and control of nonlinear systems using standard linear theory. The Koopman operator has emerged as a leading data-driven embedding, as eigenfunctions of this operator provide intrinsic coordinates that globally linearize the dynamics. However, identifying and representing these eigenfunctions has proven to be mathematically and computationally challenging. This work leverages the power of deep learning to discover representations of Koopman eigenfunctions from trajectory data of dynamical systems. Our network is parsimonious and interpretable by construction, embedding the dynamics on a low-dimensional manifold parameterized by these eigenfunctions. In particular, we identify nonlinear coordinates on which the dynamics are globally linear using a modified auto-encoder. We also generalize Koopman representations to include a ubiquitous class of systems that exhibit continuous spectra, ranging from the simple pendulum to nonlinear optics and broadband turbulence. Our framework parametrizes the continuous frequency using an auxiliary network, enabling a compact and efficient embedding, while connecting our models to half a century of asymptotics. In this way, we benefit from the power and generality of deep learning, while retaining the physical interpretability of Koopman embeddings.

Keywords– Dynamical systems, Koopman theory, machine learning, deep learning.

1 Introduction

Nonlinearity is a hallmark feature of complex systems, giving rise to a rich diversity of observed dynamical behaviors across the physical, biological, and engineering sciences [17, 10]. Although computationally tractable, there exists no general mathematical framework for solving nonlinear dynamical systems. Thus representing nonlinear dynamics in a linear framework is particularly appealing because of powerful and comprehensive techniques for the analysis and control of linear systems [12], which do not readily generalize to nonlinear systems. Koopman operator theory, developed in 1931 [25, 26], has recently emerged as a leading candidate for the systematic linear representation of nonlinear systems [38, 35]. This renewed interest in Koopman analysis has been driven by a combination of theoretical advances [38, 35, 9, 36, 34], improved numerical methods such as dynamic mode decomposition (DMD) [46, 45, 29], and an increasing abundance of data. Eigenfunctions of the Koopman operator are now widely sought, as they provide intrinsic coordinates that globally linearize nonlinear dynamics. Despite the immense promise of Koopman embeddings, obtaining representations has proven difficult in all but the simplest systems, and representations are often intractably complex or are the output of uninterpretable black-box optimizations. In this work, we utilize the power of deep learning for flexible and general representations of the Koopman operator, while enforcing a network structure that promotes parsimony and interpretability of the resulting models.

Neural networks (NNs), which form the theoretical architecture of deep learning, were inspired by the primary visual cortex of cats where neurons are organized in hierarchical layers of cells to process visual stimulus [20]. The first mathematical model of a NN was the neocognitron [13] which has many of the features of modern deep neural networks (DNNs), including a multi-layer structure, convolution, max pooling and nonlinear dynamical nodes. Importantly, the universal approximation theorem [11, 18, 19] guarantees that a NN with sufficiently many hidden units and a linear output layer is capable of representing any arbitrary function, including our desired Koopman eigenfunctions. Although NNs have a four-decade history, the analysis of the ImageNet data set [28], containing over 15 million labeled images in 22,000 categories, provided a watershed moment [30]. Indeed, powered by the rise of big data and increased computational power, deep learning is resulting in transformative progress in many data-driven classi-fication and identification tasks [30, 28, 16]. A strength of deep learning is that features of the data are built hierarchically, which enables the representation of complex functions. Thus, deep learning can accurately fit functions without hand-designed features or the user choosing a good basis [16]. However, a current challenge in deep learning research is the identification of parsimonious, interpretable, and transferable models [54].

Deep learning has the potential to enable a scaleable and data-driven architecture for the discovery and representation of Koopman eigenfunctions, providing intrinsic linear representations of strongly nonlinear systems. This approach alleviates two key challenges in modern dynamical systems: 1) equations are often unknown for systems of interest [5, 47, 8], as in climate, neuroscience, epidemiology, and finance; and, 2) low-dimensional dynamics are typically embedded in a high-dimensional state space, requiring scaleable architectures that discover dynamics on latent variables. Although NNs have also been used to model dynamical systems [15] and other physical processes [39] for decades, great strides have been made recently in using DNNs to learn Koopman embeddings, resulting in several excellent papers [50, 33, 48, 53, 44, 31]. For example, the VAMPnet architecture [50, 33] uses a time-lagged auto-encoder and a custom variational score to identify Koopman coordinates on an impressive protein folding example. In all of these recent studies, DNN representations have been shown to be more flexible and exhibit higher accuracy than other leading methods on challenging problems.

The focus of this work is on developing DNN representations of Koopman eigenfunctions that remain interpretable and parsimonious, even for high-dimensional and strongly nonlinear systems. Our approach (See Fig. 1) differs from previous studies, as we are focused specifically on obtaining parsimonious models that match the intrinsic low-rank dynamics while avoiding overfitting and remaining interpretable, thus merging the best of DNN architectures and Koopman theory. In particular, many dynamical systems exhibit a continuous eigenvalue spectrum, which confounds low-dimensional representation using existing DNN or Koopman representations. This work develops a generalized framework and enforces new constraints specifically designed to extract the fewest meaningful eigenfunctions in an interpretable manner. For systems with continuous spectra, we utilize an augmented network to parameterize the linear dynamics on the intrinsic coordinates, avoiding an infinite asymptotic expansion in harmonic eigenfunctions. Thus, the resulting networks remain parsimonious, and the few key eigenfunctions are interpretable. We demonstrate our deep learning approach to Koopman on several examples designed to illustrate the strength of the method, while remaining intuitive in terms of classic dynamical systems.

2 Data-driven dynamical systems

To give context to our deep learning approach to identify Koopman eigenfunctions, we first summarize highlights and challenges in the data-driven discovery of dynamics. Throughout this work, we will consider discrete-time dy-

namical systems,

where is the state of the system and F represents the dynamics that map the state of the system forward in time. Discrete-time dynamics often describe a continuoustime system that is sampled discretely in time, so that with sampling time . The dynamics in F are generally nonlinear, and the state x may be high dimensional, although we typically assume that the dynamics evolve on a low-dimensional attractor governed by persistent coherent structures in the state space [10]. Note that F is often unknown and only measurements of the dynamics are available.

The dominant geometric perspective of dynamical systems, in the tradition of Poincar´e, concerns the organization of trajectories of Eq. 1, including fixed points, periodic orbits, and attractors. Formulating the dynamics as a system of differential equations in x often admits compact and ef-ficient representations for many natural systems [8]; for example, Newton’s second law is naturally expressed by Eq. 1. However, the solution to these dynamics may be arbitrarily complicated, and possibly even irrepresentable, except for special classes of systems. Linear dynamics, where the map F is a matrix that advances the state x, are among the few systems that admit a universal solution, in terms of the eigenvalues and eigenvectors of the matrix F, also known as the spectral expansion.

Koopman operator theory

In 1931, B. O. Koopman provided an alternative description of dynamical systems in terms of the evolution of functions in the Hilbert space of possible measurements y = g(x) of the state [25]. The so-called Koopman operator, K, that advances measurement functions is an infinite-dimensional linear operator:

Koopman analysis has gained significant attention recently with the pioneering work of Mezic et al. [38, 35, 9, 36, 34], and in response to the growing wealth of measurement data and the lack of known equations for many systems [8, 29]. Representing nonlinear dynamics in a linear framework, via the Koopman operator, has the potential to enable advanced nonlinear prediction, estimation, and control using the comprehensive theory developed for linear systems. However, obtaining finite-dimensional approximations of the infinite-dimensional Koopman operator has proven challenging in practical applications.

Finite-dimensional representations of the Koopman operator are often approximated using the dynamic mode decomposition (DMD) [45, 29], introduced by Schmid [46]. By construction, DMD identifies spatio-temporal coherent structures from a high-dimensional dynamical system, although it does not generally capture nonlinear transients since it is based on linear measurements of the system,

Figure 1: Diagram of our deep learning schema to identify Koopman eigenfunctions . (a) Our network is based on a deep auto-encoder, which is able to identify intrinsic coordinates and decode these coordinates to recover . (b,c) We add an additional loss function to identify a linear Koopman model K that advances the intrinsic variables y forward in time. In practice, we enforce agreement with the trajectory data for several iterations through the dynamics, i.e. . In (b), the loss function is evaluated on the state variable x and in (c) it is evaluated on y.

g(x) = x. Extended DMD (eDMD) and the related variational approach of conformation dynamics (VAC) [41, 42, 43] enriches the model with nonlinear measurements [51, 31]; for more details, see SI Appendix. Identifying regression models based on nonlinear measurements will generally result in closure issues, as there is no guarantee that these measurements form a Koopman invariant subspace [7]. The resulting models are of exceedingly high dimension, and when kernel methods are employed [52], the models may become uninterpretable. Instead, many approaches seek to identify eigenfunctions of the Koopman operator directly, satisfying:

Eigenfunctions are guaranteed to span an invariant subspace, and the Koopman operator will yield a matrix when restricted to this subspace [7, 21]. In practice, Koopman eigenfunctions may be more difficult to obtain than the solution of (1); however, this is a one-time up-front cost that yields a compact linear description. The challenge of identifying and representing Koopman eigenfunctions provides strong motivation for the use of powerful emerging deep learning methods [50, 33, 48, 53, 44, 31].

Koopman for systems with continuous spectra

The Koopman operator provides a global linearization of the dynamics. The concept of linearizing dynamics is not new, and locally linear representations are commonly obtained by linearizing around fixed points and periodic orbits [17]. Indeed, asymptotic and perturbation methods have been widely used since the time of Newton to approximate solutions of nonlinear problems by starting from the exact solution of a related, typically linear problem. The classic pendulum, for instance, satisfies the differential equation and has eluded an analytic solution since its mathematical inception. The linear problem associated with the pendulum involves the small angle approximation whereby and only the first term is retained in order to yield exact sinusoidal solutions. The next correction involving the cubic term gives the Duffing equation which is one of the most commonly studied nonlinear oscillators in physics [17]. Importantly, the cubic contribution is known to shift the linear oscillation frequency of the pendulum, as well as generate harmonics such as [4, 22]. An exact representation of the solution can be derived in terms of Jacobi elliptic functions, which have a Taylor series representation in terms of an infinite sum of sinusoids with frequencies where . Thus, the simple pendulum oscillates at the (linear) natural frequency for small deflections, and as the pendulum energy is increased, the frequency decreases continuously, resulting in a so-called continuous spectrum.

The importance of accounting for the continuous spectrum was discussed in 1932 in an extension by Koopman and von Neumann [26]. A continuous spectrum, as described for the simple pendulum, is characterized by a continuous range of observed frequencies, as opposed to the discrete spectrum consisting of isolated, fixed frequencies. This phenomena is observed in a wide range of physical systems that exhibit broadband frequency content, such as turbulence and nonlinear optics. The continuous spectrum thus confounds simple Koopman descriptions, as there is

Figure 2: Schematic of modified schema with auxiliary network to identify (parametrize) the continuous eigenvalue spectrum . This facilitates an aggressive dimensionality reduction in the auto-encoder, avoiding the need for higher harmonics of the fundamental frequency that are generated by the nonlinearity [4, 22]. For purely oscillatory motion, as in the pendulum, we identify the continuous frequency .

not a straightforward finite approximation in terms of a small number of eigenfunctions [34]. Indeed, away from the linear regime, an infinite Fourier sum is required to approximate the shift in frequency and eigenfunctions. In fact, in some cases, eigenfunctions may not exist at all.

Recently, there have been several algorithmic advances to approximate systems with continuous spectra, including nonlinear Laplacian spectral analysis [14] and the use of delay coordinates [6, 2]. A critically enabling innovation of the present work is explicitly accounting for the parametric dependence of the Koopman operator on the continuously varying , related to the classic perturbation results above. By constructing an auxiliary network (See Fig. 2) to first determine the parametric dependency of the Koopman operator on the frequency , an interpretable low-rank model of the intrinsic dynamics can then by constructed. In particular, a nonlinear oscillator with continuous spectrum may now be represented as a single pair of conjugate eigenfunctions, mapping trajectories into perfect sines and cosines, with a continuous eigenvalue parameterizing the frequency. If this explicit frequency dependence is unaccounted for, then a high-dimensional network is necessary to account for the shifting frequency and eigenvalues. We conjecture that previous Koopman models using high-dimensional DNNs represent the harmonic series expansion required to approximate the continuous spectrum for systems such as the Duffing oscillator.

3 Deep learning to identify Koopman eigenfunctions

The overarching goal of this work is to leverage the power of deep learning to discover and represent eigenfunctions of the Koopman operator. Our perspective is driven by the need for parsimonious representations that are efficient, avoid overfitting, and provide minimal descriptions of the dynamics on interpretable intrinsic coordinates. Unlike previous deep learning approaches to Koopman [50, 33, 48, 53], our network architecture is designed specifically to handle a ubiquitous class of nonlinear systems characterized by a continuous frequency spectrum generated by the nonlinearity. A continuous spectrum presents unique challenges for compact and interpretable representation, and our approach is inspired by the classical asymptotic and perturbation approaches in dynamical systems.

Our core network architecture is shown in Fig. 1, and it is modified in Fig. 2 to handle the continuous spectrum. The objective of this network is to identify a few key intrinsic coordinates spanned by a set of Koopman eigenfunctions , along with a dynamical system . There are three high-level requirements for the network, corresponding to three types of loss functions used in training:

1. Intrinsic coordinates that are useful for reconstruction. We seek to identify a few intrinsic coordinates where the dynamics evolve, along with an inverse x = so that the state x may be recovered. This is achieved using an auto-encoder (See Figure 1 a.), where is the encoder and is the decoder. The dimension p of the auto-encoder subspace is a hyperparameter of the network, and this choice may be guided by knowledge of the system. Reconstruction accuracy of the auto-encoder is achieved using the following loss: .

2. Linear dynamics. To discover Koopman eigenfunctions, we learn the linear dynamics K on the intrinsic coordinates, i.e., . Linear dynamics are achieved using the following loss: . More generally, we enforce linear prediction over m time steps with the loss: . (See Figure 1 c.)

3. Future state prediction. Finally, the intrinsic coordinates must enable future state prediction. Specifically, we identify linear dynamics in the matrix K. This corresponds to the loss , and more generally . (See Figure 1 b.)

Our norm is mean-squared error, averaging over dimension then number of examples, and we add regularization.

To address the continuous spectrum, we allow the eigenvalues of the matrix K to vary, parametrized by the function , which is learned by an auxiliary network (See Fig. 2). The eigenvalues are then used to parametrize block-diagonal . For each pair of complex eigenvalues, the discrete-time K has a Jordan block of the form:

This network structure allows the eigenvalues to vary across phase space, facilitating a small number of eigen-

Figure 3: Demonstration of neural network embedding of Koopman eigenfunctions for simple system with a discrete eigenvalue spectrum.

functions. To enforce circular symmetry in the eigenfunction coordinates, we often parameterize the eigenvalues by the radius . The second and third prediction loss function must also be modified for systems with continuous spectrum, as discussed in the SI Appendix.

To train our network, we generate trajectories from random initial conditions, which are split into training, validation, and test sets. Models are trained on the training set and compared on the validation set, which is also used for early stopping to prevent overfitting. We report accuracy on the test set.

4 Results

We demonstrate our deep learning approach to identify Koopman eigenfunctions on several example systems, including a simple model with a discrete spectrum and two examples that exhibit a continuous spectrum: the nonlinear pendulum and the high-dimensional unsteady fluid flow past a cylinder.

Simple model with discrete spectrum

Before analyzing systems with the additional challenges of a continuous spectrum and high-dimensionality, we consider a simple nonlinear system with a single fixed point and a discrete eigenvalue spectrum:

This dynamical system has been well-studied in the literature [49, 7], and for stable eigenvalues , the system exhibits a slow manifold given by ; we use and . As shown in Fig. 3, the Koopman embedding identifies nonlinear coordinates that flatten this inertial manifold, providing a globally linear representation of the dynamics; moreover, the correct Koopman eigenvalues are identified. Specific details about the network and training procedure are provided in the SI Appendix.

Nonlinear pendulum with continuous spectrum

As a second example, we consider the nonlinear pendulum, which exhibits a continuous eigenvalue spectrum with increasing energy:

Although this is a simple mechanical system, it has eluded parsimonious representation in the Koopman framework. The deep Koopman embedding is shown in Fig. 4, where it is clear that the dynamics are linear in the eigenfunction coordinates, given by . As the Hamiltonian energy of the system increases, corresponding to an elongation of the oscillation period, the parameterized Koopman network accounts for this continuous frequency shift and provides a compact representation in terms of two conjugate eigenfunctions. Alternative network architectures that are not specifically designed to account for continuous spectra with an auxiliary network would be forced to approximate this frequency shift with the classical asymptotic expansion in terms of harmonics. The resulting network would be overly bulky and would limit interpretability.

Recall that we have three types of losses on the network: reconstruction, prediction, and linearity. Figure 4II.A shows that the network is able to function as an auto-encoder, accurately reconstructing the ten example trajectories. Next, we show that the network is able to predict the evolution of the system. Figure 4II.B shows the prediction horizon for ten initial conditions that are simulated forward with the network, stopping the prediction when the relative error reaches 10%. As expected, the prediction horizon deteriorates as the energy of the initial condition increases, although the prediction is still quite accurate. Finally, we demonstrate that the dynamics in the intrinsic coordinates y are truly linear, as shown by the nearly concentric circles in Fig. 4II.C. The eigenfunctions and are shown in Fig. 4III, and are connected to theory [37] in the SI Appendix.

High-dimensional nonlinear fluid flow

As our final example, we consider the nonlinear fluid flow past a circular cylinder at Reynolds number 100 based on diameter, which is characterized by vortex shedding. This model has been a benchmark in fluid dynamics for decades [40], and has been extensively analyzed in the context of data-driven modeling [8, 32] and Koopman analysis [3]. In 2003, Noack et al. [40] showed that the high-dimensional dynamics evolve on a low-dimensional attractor, given by a slow-manifold in the following model:

This mean-field model exhibits a stable limit cycle corresponding to von Karman vortex shedding, and an unstable

Figure 4: Illustration of deep Koopman eigenfunctions for the nonlinear pendulum. The pendulum, although a simple mechanical system, exhibits a continuous spectrum, making it difficult to obtain a compact representation in terms of Koopman eigenfunctions. However, by leveraging a generalized network, as in Fig. 2, it is possible to identify a parsimonious model in terms of a single complex conjugate pair of eigenfunctions, parameterized by the frequency . In eigenfunction coordinates, the dynamics become linear, and orbits are given by perfect circles. For the sake of visualization, we use ten evenly spaced trajectories instead of the random trajectories in the testing set.

equilibrium corresponding to a low-drag condition. Starting near this equilibrium, the flow unwinds up the slow manifold toward the limit cycle. In [32], Loiseau showed that this flow may be modeled by a nonlinear oscillator with state-dependent damping, making it amenable to the continuous spectrum analysis. We use trajectories from this model to train a Koopman network, and the resulting eigenfunctions are shown in Fig. 5.

In this example, the damping rate and frequency are allowed to vary along level sets of the radius in eigenfunction coordinates, so that and , where ; this is accomplished with an auxiliary network as in Fig. 2. Although we only show the ability of the model to predict the future state in Fig. 5, corresponding to the second and third loss functions, the network also functions as an autoencoder.

5 Discussion

In summary, we have employed powerful deep learning approaches to identify and represent coordinate transformations that recast strongly nonlinear dynamics into a globally linear framework. Our approach is designed to discover eigenfunctions of the Koopman operator, which provide an intrinsic coordinate system to linearize nonlinear systems, and have been notoriously difficult to identify and represent using alternative methods. Building on a deep auto-

encoder framework, we enforce additional constraints and loss functions to identify Koopman eigenfunctions where the dynamics evolve linearly. Moreover, we generalize this framework to include a broad class of nonlinear systems that exhibit a continuous eigenvalue spectrum, where a continuous range of frequencies is observed. Continuousspectrum systems are notoriously difficult to analyze, especially with Koopman theory, and naive learning approaches require asymptotic expansions in terms of higher order harmonics of the fundamental frequency, leading to unwieldy models. In contrast, we utilize an auxiliary network to parametrize and identify the continuous frequency, which then parameterizes a compact Koopman model on the auto-encoder coordinates. Thus, our deep neural network models remain both parsimonious and interpretable, merging the best of neural network representations and Koopman embeddings. In most deep learning applications, although the basic architecture is extremely general, considerable expert knowledge and intuition is typically used in the training process and in designing loss functions and constraints. Throughout this paper, we have also used physical insight and intuition from asymptotic theory and continuous spectrum dynamical systems to design parsimonious Koopman embeddings.

The use of deep learning in physics and engineering is increasing at an incredible rate, and this trend is only expected to accelerate. Nearly every field of science is revis-

Figure 5: Learned Koopman eigenfunctions for the mean-field model of fluid flow past a circular cylinder at Reynolds number 100. (top) Reconstruction of trajectory from linear Koopman model with two states; modes for each of the state space variables x are shown along the coordinate axes. (bottom) Koopman reconstruction in eigenfunction coordinates y, along with eigenfunctions .

iting challenging problems of central importance from the perspective of big data and deep learning. With this explosion of interest, it is imperative that we as a community seek machine learning models that favor interpretability and promote physical insight and intuition. In this challenge, there is a tremendous opportunity to gain new understanding and insight by applying increasingly powerful techniques to data. For example, discovering Koopman eigenfunctions will result in new symmetries and conservation laws, as conserved eigenfunctions are related to conservation laws via a generalized Noether’s theorem. It will also be important to apply these techniques to increasingly challenging problems, such as turbulence, epidemiology, and neuroscience, where data is abundant and models are needed. The goal is to model these systems with a small number of coupled nonlinear oscillators using similar parameterized Koopman embeddings. Finally, the use of deep learning to discover Koopman eigenfunctions may enable transformative advances in the nonlinear control of complex systems. All of these future directions will be facilitated by more powerful network representations.

Acknowledgements

We acknowledge generous funding from the Army Research Office (ARO W911NF-17-1-0306) and the Defense Advanced Research Projects Agency (DARPA HR0011-16-C-0016). We would like to thank many people for valuable discussions about neural networks and Koopman theory: Bing Brunton, Karthik Duraisamy, Eurika Kaiser, Bernd Noack, and Josh Proctor, and especially Jean-Christophe Loiseau, Igor Mezi´c, and Frank No´e.

References

[1] Mart´ın Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Man´e, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Vi´egas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

[2] Hassan Arbabi and Igor Mezi´c. Ergodic theory, dynamic mode decomposition and computation of spectral properties of the koopman operator. SIAM J. Appl. Dyn. Syst., 16(4):2096–2126, 2017.

[3] Shervin Bagheri. Koopman-mode decomposition of the cylinder wake. Journal of Fluid Mechanics, 726:596– 623, 2013.

[4] Carl M Bender and Steven A Orszag. Advanced mathematical methods for scientists and engineers I: Asymptotic methods and perturbation theory. Springer Science & Business Media, 2013.

[5] Josh Bongard and Hod Lipson. Automated reverse en- gineering of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 104(24):9943–9948, 2007.

[6] S. L. Brunton, B. W. Brunton, J. L. Proctor, E. Kaiser, and J. N. Kutz. Chaos as an intermittently forced linear system. Nature Communications, 8(19):1–9, 2017.

[7] S. L. Brunton, B. W. Brunton, J. L. Proctor, and J. N Kutz. Koopman invariant subspaces and finite linear representations of nonlinear dynamical systems for control. PLoS ONE, 11(2):e0150171, 2016.

[8] S. L. Brunton, J. L. Proctor, and J. N. Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Nat. Acad. Sci., 113(15):3932–3937, 2016.

[9] Marko Budiˇsi´c, Ryan Mohr, and Igor Mezi´c. Applied Koopmanism a). Chaos: An Interdisciplinary Journal of Nonlinear Science, 22(4):047510, 2012.

[10] Mark C Cross and Pierre C Hohenberg. Pattern forma- tion outside of equilibrium. Reviews of modern physics, 65(3):851, 1993.

[11] G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems (MCSS), 2(4):303–314, December 1989.

[12] Geir. E. Dullerud and Fernando Paganini. A course in robust control theory: A convex approach. Texts in Applied Mathematics. Springer, Berlin, Heidelberg, 2000.

[13] F. Fukushima. A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetic, 36:193–202, 1980.

[14] Dimitrios Giannakis and Andrew J Majda. Nonlinear laplacian spectral analysis for time series with intermittency and low-frequency variability. Proc. Nat. Acad. Sci., 109(7):2222–2227, 2012.

[15] R Gonzalez-Garcia, R Rico-Martinez, and IG Kevrekidis. Identification of distributed parameter systems: A neural net based approach. Comp. & Chem. Eng., 22:S965–S968, 1998.

[16] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www. deeplearningbook.org.

[17] Philip Holmes and John Guckenheimer. Nonlinear oscillations, dynamical systems, and bifurcations of vector fields, volume 42 of Applied Mathematical Sciences. SpringerVerlag, 1983.

[18] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural networks, 2(5):359–366, 1989.

[19] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural networks, 3(5):551–560, 1990.

[20] D. H. Hubel and T. N. Wiesel. Receptive fields, binoc- ular interaction and functional architecture in the cat’s visual cortex. Journal of Physiology, 160:106–154, 1962.

[21] E. Kaiser, J. N. Kutz, and S. L. Brunton. Datadriven discovery of Koopman eigenfunctions for control. arXiv preprint arXiv:1707.01146, 2017.

[22] Jirayr Kevorkian and Julian D Cole. Perturbation methods in applied mathematics, volume 34. Springer Science & Business Media, 2013.

[23] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[24] Stefan Klus, Feliks N¨uske, P´eter Koltai, Hao Wu, Ioan- nis Kevrekidis, Christof Sch¨utte, and Frank No´e. Datadriven model reduction and transfer operator approximation. Journal of Nonlinear Science, 2018.

[25] B. O. Koopman. Hamiltonian systems and transformation in Hilbert space. Proceedings of the National Academy of Sciences, 17(5):315–318, 1931.

[26] BO Koopman and J v Neumann. Dynamical systems of continuous spectra. Proceedings of the National Academy of Sciences of the United States of America, 18(3):255, 1932.

[27] Milan Korda and Igor Mezi´c. Linear predictors for nonlinear dynamical systems: Koopman operator meets model predictive control. arXiv preprint arXiv:1611.03537, 2016.

[28] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hin- ton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.

[29] J. N. Kutz, S. L. Brunton, B. W. Brunton, and J. L. Proc- tor. Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems. SIAM, 2016.

[30] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.

[31] Qianxiao Li, Felix Dietrich, Erik M Bollt, and Ioan- nis G Kevrekidis. Extended dynamic mode decomposition with dictionary learning: A data-driven adaptive spectral decomposition of the koopman operator. Chaos: An Interdisciplinary Journal of Nonlinear Science, 27(10):103111, 2017.

[32] J.-C. Loiseau and S. L. Brunton. Constrained sparse Galerkin regression. Journal of Fluid Mechanics, 838:42– 67, 2018.

[33] Andreas Mardt, Luca Pasquali, Hao Wu, and Frank No´e. VAMPnets: Deep learning of molecular kinetics. Nature Communications, 9(5), 2018.

[34] I. Mezi´c. Spectral operator methods in dynamical systems: Theory and applications. Springer, 2017.

[35] Igor Mezi´c. Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dynamics, 41(1-3):309–325, 2005.

[36] Igor Mezic. Analysis of fluid flows via spectral prop- erties of the Koopman operator. Annual Review of Fluid Mechanics, 45:357–378, 2013.

[37] Igor Mezic. Koopman operator spectrum and data analysis. arXiv preprint arXiv:1702.07597, 2017.

[38] Igor Mezi´c and Andrzej Banaszuk. Comparison of sys- tems with complex behavior. Physica D: Nonlinear Phenomena, 197(1):101–133, 2004.

[39] Michele Milano and Petros Koumoutsakos. Neural network modeling for near wall turbulent flow. Journal of Computational Physics, 182(1):1–26, 2002.

[40] B. R. Noack, K. Afanasiev, M. Morzynski, G. Tadmor, and F. Thiele. A hierarchy of low-dimensional models for the transient and post-transient cylinder wake. Journal of Fluid Mechanics, 497:335–363, 2003.

[41] Frank No´e and Feliks Nuske. A variational approach to modeling slow processes in stochastic dynamical systems. Multiscale Modeling & Simulation, 11(2):635–655, 2013.

[42] Feliks N¨uske, Bettina G Keller, Guillermo P´erez- Hern´andez, Antonia SJS Mey, and Frank No´e. Variational approach to molecular kinetics. Journal of chemical theory and computation, 10(4):1739–1752, 2014.

[43] Feliks N¨uske, Reinhold Schneider, Francesca Vitalini, and Frank No´e. Variational tensor approach for approximating the rare-event kinetics of macromolecular systems. J. Chem. Phys., 144(5):054105, 2016.

[44] Samuel E Otto and Clarence W Rowley. Linearlyrecurrent autoencoder networks for learning dynamics. arXiv preprint arXiv:1712.01378, 2017.

[45] C. W. Rowley, I. Mezi´c, S. Bagheri, P. Schlatter, and D.S. Henningson. Spectral analysis of nonlinear flows. J. Fluid Mech., 645:115–127, 2009.

[46] P. J. Schmid. Dynamic mode decomposition of numer- ical and experimental data. Journal of Fluid Mechanics, 656:5–28, August 2010.

[47] Michael Schmidt and Hod Lipson. Distilling freeform natural laws from experimental data. Science, 324(5923):81–85, 2009.

[48] Naoya Takeishi, Yoshinobu Kawahara, and Takehisa Yairi. Learning koopman invariant subspaces for dynamic mode decomposition. In Advances in Neural Information Processing Systems, pages 1130–1140, 2017.

[49] Jonathan H. Tu, Clarence W. Rowley, Dirk M. Luchten- burg, Steven L. Brunton, and J. Nathan Kutz. On dynamic mode decomposition: Theory and applications. Journal of Computational Dynamics, 1(2):391–421, 2014.

[50] Christoph Wehmeyer and Frank No´e. Time-lagged au- toencoders: Deep learning of slow collective variables for molecular kinetics. arXiv preprint arXiv:1710.11239, 2017.

[51] Matthew O Williams, Ioannis G Kevrekidis, and Clarence W Rowley. A data-driven approximation of the Koopman operator: extending dynamic mode decomposition. J. Nonlin. Sci., 6:1307–1346, 2015.

[52] Matthew O Williams, Clarence W Rowley, and Ioan- nis G Kevrekidis. A kernel approach to data-driven Koopman spectral analysis. Journal of Computational Dynamics, 2(2):247–265, 2015.

[53] Enoch Yeung, Soumya Kundu, and Nathan Hodas. Learning deep neural network representations for Koopman operators of nonlinear dynamical systems. arXiv preprint arXiv:1708.06850, 2017.

[54] Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? In Advances in neural information processing systems, pages 3320–3328, 2014.

Supporting Information (SI)

Model problems and training datasets

We demonstrate the ability of deep learning to represent Koopman eigenfunctions on three example systems, shown in Fig. 6.

The first system has a single fixed point and a discrete eigenvalue spectrum:

where and . This provides a benchmark system to test our architecture. The second system is the nonlinear pendulum:

The nonlinear pendulum exhibits a continuous eigenvalue spectrum as the energy of the pendulum is increased, posing a significant challenge for classical Koopman embedding techniques. In this example, we consider the frictionless pendulum, so that the system is conservative and trajectories evolve on Hamiltonian energy level sets.

The third example is the mean-field model [40, 32] for the fluid flow past a circular cylinder at Reynolds number 100:

where , and . This system is a challenging canonical system in fluid dynamics, and is a model for the vortex shedding observed behind bluff bodies. The system exhibits a slow manifold, and we consider two cases corresponding to trajectories that start on the slow manifold and trajectories that start off of the manifold.

Creating the datasets

We create our datasets by solving the systems of differential equations in MATLAB using the ode45 solver.

For each dynamical system, we choose 5000 initial conditions for the test set, 5000 for the validation set, and 5000-20000 for the training set (see Table 1). For each initial condition, we solve the differential equations for some time span. That time span is t = 0, .02, . . . , 1 for the discrete spectrum and pendulum datasets. Since the dynamics on the slow manifold for the fluid flow example are slower and more complicated, we increase the time span for that dataset to t = 0, .05, . . . , 6. However, when we include data off the slow manifold, we want to capture the fast dynamics as the trajectories are attracted to the slow manifold, so we change the time span to t = 0, .01, . . . , 1.

The discrete spectrum dataset is created from random initial conditions x where , since this portion of phase space is sufficient to capture the dynamics.

The pendulum dataset is created from random initial conditions x where (just under ), , and the potential function is under 0.99. The potential function for the pendulum is . These ranges are chosen to sample the pendulum in the full phase space where the pendulum approaches having an infinite period.

Figure 6: Three model problems used to demonstrate deep neural network embedding of Koopman eigenfunctions. (top) Simple system with discrete spectrum and a single fixed point, (middle) nonlinear pendulum, exhibiting a continuous spectrum, and (bottom) unsteady fluid flow past a cylinder at Reynolds number 100.

The fluid flow problem limited to the slow manifold is created from random initial conditions x on the bowl where , and . This captures all types of dynamics on the slow manifold, including spiraling in towards the limit cycle at r = 1 and spiraling out towards it.

The fluid flow problem beyond the slow manifold is created from random initial conditions x where , and . These limits are chosen to include the dynamics on the slow manifold covered by the previous dataset, as well as trajectories that begin off the slow manifold. Any trajectory that grows to is eliminated so that the domain is reasonably compact and well-sampled.

Table 1: Dataset Sizes

Network architecture and training

Code

We use the Python API for the TensorFlow framework [1] and the Adam optimizer [23] for training. All of our code is available online at github.com/BethanyL/ DeepKoopman.

Network architecture

Each hidden layer has the form of Wx + b followed by an activation with the rectified linear unit (ReLU): f(x) = max{0, x}. In our experiments, training was significantly faster with ReLU as the activation function than with sigmoid. See Table 2 for the number of hidden layers in the encoder, decoder, and auxiliary network, as well as their widths. The output layers of the encoder, decoder, and auxiliary network are linear (simply Wx + b).

The input to the auxiliary network is y, and it outputs the parameters for the eigenvalues of K. For each complex conjugate pair of eigenvalues , the network de-fines a function mapping to and , where and are the corresponding eigenfunctions. Similarly, for each real eigenvalue , the network defines a function mapping to . For example, for the fluid flow problem off the attractor, we have three eigenfunctions. The auxiliary network learns a map from to and and an- other map from to . This could be implemented as one network defining a mapping where the layers are not fully connected (should not influence and should not influence and ). However, for simplicity, we implement this as two separate auxiliary networks, one for the complex conjugate pair of eigenvalues and one for the the real eigenvalue.

Table 2: Network Architecture

Explicit loss function

Our loss function has three weighted mean-squared error components: reconstruction accuracy , future state prediction , and linearity of dynamics . Since we know that there are no outliers in our data, we also use an term to penalize the data point with the largest loss. Finally, we add regularization on the weights W to avoid overfitting. More specifically:

where MSE refers to mean squared error and T is the number of time steps in each trajectory. The weights , and are hyperparameters. The integer is a hyperparameter for how many steps to check in the prediction loss. The hyperparameters , and are listed in Table 3.

Table 3: Loss Hyperparameters

The matrix K is parametrized by the function , which is learned by an auxiliary network. The eigenvalues can vary along a trajectory, so in and . However, on Hamiltonian systems, such as the pendulum, the eigenvalues are constant along each trajectory. If a system is known to be Hamiltonian, the network training could be sped up by encoding the constraint that . In order to demonstrate that this specialized knowledge is not necessary, we use the more general case for all of our datasets, including the pendulum.

Training

We initialize each weight matrix W randomly from a uniform distribution in the range for , where a is the dimension of the input of the layer. This distribution was suggested in [16]. Each bias vector b is initialized to 0. The model for the discrete spectrum example is trained for two hours on an NVIDIA K80 GPU. The other models are each trained for six hours. The learning rate for the

Figure 7: On the left is the average prediction error as the number of prediction steps increases for the discrete spectrum example. On the right, for each trajectory, we show how many steps the network can take before reaching 10% relative error.

Adam optimizer is 0.001. On the pendulum and fluid flow datasets, for five minutes, we pretrain the network to be a simple autoencoder, using the autoencoder loss but not the linearity or prediction losses, as this speeds up the training. For each dynamical system, we train multiple models in a random search of parameter space and choose the one with the lowest validation error. Each model is initialized with different random weights. We also use early stopping; for each model, at the end of training, we resume the step with the lowest validation error.

Results

The training, validation, and test errors for all examples are reported in Table 4.

Discrete spectrum example

Figure 7 shows the average prediction error versus the number of prediction steps. Even for a large number of steps, the error is quite small, giving good prediction. This fig-ure also demonstrates prediction performance on example trajectories. The eigenfunctions for this example are shown in Fig. 8. We see that one is quadratic and the other is linear. This is expected because we can analytically derive that and is a pair of eigenfunctions for this system, where . When the eigenvalues are allowed to vary with the auxiliary network used for continuous spectrum systems, the eigenvalues remain relatively constant, near the true values of and , as shown in Fig. 9.

Table 4: Errors for each Problem

Figure 8: Eigenfunctions for discrete spectrum example.

Figure 9: When the eigenvalues of the discrete spectrum example are allowed to vary in terms of and , they remain relatively constant; i.e., they are close to the true values and as expected.

Nonlinear pendulum

The nonlinear pendulum is one of the simplest examples that exhibits a continuous eigenvalue spectrum. Using the auxiliary network, we allow the frequency of the Koopman eigenvalues to vary continuously with the embedded coordinates and , as shown in Fig. 10. The frequency varies smoothly with the radius, from around to as the energy is increased. When the damping rate is also allowed to vary continuously, it remains nearly constant around the value of , since the system is conservative. Figure 11 shows the Koopman eigenfunctions in magnitude and phase coordinates, where it can be seen that that magnitude essentially traces level sets of the Hamiltonian energy. This is consistent with previous theoretical derivations of Mezic [37], and we thank him for communicating this connection to us.

Fluid flow on attractor

For the final example, we consider the nonlinear fluid vortex shedding behind a cylinder. We begin by considering dynamics on the attracting manifold. When we train the network with trajectories on the slow manifold, we are able to identify a single conjugate eigenfunction pair, shown in Fig. 13. The corresponding continuously varying eigenvalues are shown in Fig. 12, where it can be seen that the frequency is extremely close to the true constant , while the damping varies significantly, and in fact switches stability for trajectories outside the natural limit cycle. This is consistent with the data-driven model of Loiseau [32].

Figure 10: Eigenvalues for the pendulum vary in terms of and . Note that the frequency decreases as the radius increases, and .

Figure 11: Magnitude and phase of the pendulum eigenfunctions.

Figure 12: Continuous eigenvalues as a function of and . Note that the frequency . The parameter shows growth inside the limit cycle (marked in red) and decay outside the limit cycle.

Figure 13: Magnitude and phase of the eigenfunctions for the fluid flow on the attracting slow manifold.

Figure 14: The upper left plot is the average prediction error as the number of prediction steps increases for the fluid flow example with trajectories starting off the attractor. The upper right plot shows a trajectory on the attractor in linear coordinates and . The bottom row shows two examples of trajectories that begin off the attractor. The Koopman model is able to reconstruct both given only the initial condition.

Fluid flow off attractor

We now consider the case where we train a network using trajectories that start off of the attracting slow manifold. Figure 14 shows the prediction performance of the Koopman neural network for trajectories that start away from the bowl; in both cases, the dynamics are faithfully captured and the dynamics are propagated forward until the limit cycle.

The eigenfunctions are shown in Fig. 15, where it can be seen that the mode shapes match those in the on-attractor data in Fig. 13. The continuously varying eigenvalues are shown in Fig. 16. Again, similar to the on-attractor case, the damping varies considerably with radius, while the frequency is very nearly a constant .

Miscellaneous notes

Connection between eDMD and VAC

It has recently been shown that eDMD is equivalent to the variational approach of conformation dynamics (VAC) [41, 42, 43], first derived by No´e and N¨uske in 2013 to simulate molecular dynamics with a broad separation of timescales. Further connections between eDMD and VAC and between DMD and the time lagged independent component analysis (TICA) are explored in a recent review [24]. A key contribution of VAC is a variational score enabling the objec-

Figure 15: Eigenfunctions for the fluid flow example for trajectories starting off the attractor, corresponding to the complex conjugate pair of eigenvalues; the second row contains the magnitude and phase of those eigenfunctions.

Figure 16: Parameter variations for the complex eigenvalues in terms of and . Note that this is a natural extension of Fig. 12, which is limited to data on the bowl.

tive assessment of Koopman models via cross-validation. Recently, eDMD has been demonstrated to improve model predictive control performance in nonlinear systems [27].

designed for accessibility and to further open science