Discovery of Dynamics Using Linear Multistep Methods

2019·Arxiv

Abstract

Abstract

Linear multistep methods (LMMs) are popular time discretization techniques for the numerical solution of differential equations. Traditionally they are applied to solve for the state given the dynamics (the forward problem), but here we consider their application for learning the dynamics given the state (the inverse problem). This repurposing of LMMs is largely motivated by growing interest in data-driven modeling of dynamics, but the behavior and analysis of LMMs for discovery turn out to be significantly different from the well-known, existing theory for the forward problem. Assuming a highly idealized setting of being given the exact state with a zero residual of the discrete dynamics, we establish for the first time a rigorous framework based on refined notions of consistency and stability to yield convergence using LMMs for discovery. When applying these concepts to three popular step LMMs, the Adams-Bashforth, Adams-Moulton, and Backwards Differentiation Formula schemes, the new theory suggests that Adams-Bashforth for M ranging from 1 and 6, Adams-Moulton for M = 0 and M = 1, and Backwards Differentiation Formula for all positive M are convergent, and, otherwise, the methods are not convergent in general. In addition, we provide numerical experiments to both motivate and substantiate our theoretical analysis.

Key words. discovery of dynamics, data-driven modeling, linear multistep methods, stability and convergence, root condition, learning dynamics, artificial intelligence

1. Introduction. In this work, we focus on developing a new numerical analysis framework for the discovery of dynamical systems with given states, where finitely many discrete measurements are used to approximately recover the unknown dynamical system – a data-driven discovery of dynamics [5, 44]. Data-driven discovery of dynamical systems is experiencing a renaissance as costs of sensors, data storage, and computational resources has decreased [42]. Meanwhile, advancements in the fields of machine learning and data science [17, 22, 27, 28, 45] have brought in renewed vigor and enabled expansive view to this field. At the same time, the growth of data-driven discovery of dynamical systems has also led to a new solution method and model reduction approach to study multiscale and high dimensional complex problems. For more discussions, we refer to works such as [3, 6, 18, 19, 23, 25, 26, 29, 30, 31, 35, 36, 37, 38, 39, 40, 41, 43, 48, 50, 51, 53, 54, 55, 56].

methods. In this work, we consider using linear multistep methods (LMMs) to discover unspecified dynamics given the state at equidistant time steps and contribute to the fundamental theory of using LMMs for data-driven discovery. Historically, LMMs have been developed as popular schemes for numerically integrating known dynamic systems [16], with well-established mathematical theory in the last century [2, 12, 15, 21, 32]. Recent works

2 KELLER AND DU.

combine the classical numerical technique of linear multistep methods with neural networks for dynamics discovery [39, 50, 55].

Figure 1: Absolute -errors for the first coordinate of the 2D Damped Cubic System (6.1) on 5] with varying time mesh size h = 0.01, 0.02, 0.03, using a single hidden layer neural network with tanh activation function, as used in [39], after a fixed number of training iterations for each M.

Coined “LMNet,” LMMs are combined with neural networks for discovery of dynamics in [39, 50, 55]. Figure 1 shows the absolute errors associated with learning f for a nonlinearlydamped, 2D cubic oscillator (6.1) using neural networks with three representative schemes of LMMs – Adams-Moulton (AM), Adams-Bashforth (AB), and Backwards Differentiation Formula (BDF). These results are generated using the code repository built for [39]; reported are the errors of the dynamics rather than the integrated dynamics, which are shown in [39]. For solving differential equations with smooth solutions, increasing M corresponds to higher accuracy if the scheme is also stable. The AM scheme is an example of such a method; hence, the perplexing behavior in the errors of AM as observed in [39, 55] (see Tables 1 and 2 of [39]

DISCOVERY OF DYNAMICS 3

and Table 1 of [55]). As M increases and h decreases, the errors do not decrease. Further, as we expand the width, thereby increasing the expressibility of the network, the scheme still does not exhibit stable behavior. On the other hand, as shown in Figure 1, the AB and BDF methods with a fixed network size of 256 show a trend of convergence as M and the mesh size h decrease, while the AM methods show erratic behavior for the same width, persistent even with more expressibility of the network by widening the hidden layer (Figure 1b). Since AM is a stable method as a time integrator, these findings warrant further investigation. Indeed, it has also been observed by others that increased resolution does not necessarily imply better neural network representation and prediction without a mathematically sound formulation of the learning problem [3]. While there are many contributing factors such as the neural network structure and size as well as the training process, it is the goal of this paper to investigate these findings and provide a theoretical explanation of the phenomena.

To begin, we pose the problem of discovery of dynamics. In contrast to numerically integrating dynamics to learn the state, as many classical numerical methods do, this study focuses on learning the dynamics given the state. Dynamics discovery may be viewed as an inverse problem to the forward problem of classical numerical integration. is using a classical numerical technique for the inverse problem. Well-studied for the forward problem, LMMs in this inverse setting raises questions of classical notions of consistency, stability, and convergence. We seek in this work to investigate if the classical theory for LMMs as time integrators to solve the forward problem has an analog or counterpart in solving the inverse problem of learning dynamics. To initiate studies in this direction, we introduce a systematic framework for the numerical analysis of discovery of dynamics using LMMs. Our new framework is rooted in the classical theory for LMMs as numerical integrators of differential equations, but it adopts new stability and convergence criteria due to the inverse nature of using discrete time integrators for dynamics discovery. Consequently, it draws different conclusions regarding convergence in stark contrast to the conventional wisdom. The stability properties of particular schemes depart from the traditional numerical differential equation viewpoint, and some methods that are stable for the forward problem do not retain the property for the inverse problem dynamics discovery. Our theory is able to explain the unusual phenomena as reported in Figure 1 and lays a rigorous foundation for further elucidating the effect of neural networks on dynamics discovery via LMMs through follow-up studies. Therefore, this helps the scientific community broadly in our goal of making machine learning more transparent, explainable, stable and trustworthy.

1.2. Summary of Results. We present a framework in Section 3 consisting of nuanced notions of consistency and stability to handle unique challenges presented by using LMMs for discovery. These concepts are then combined to prove convergence. A set of algebraic criteria is developed to check for the consistency and stability, and thus convergence, of LMMs for dynamics discovery. With this foundation, in Theorems 4.1 and 4.2 , we outline consistency and stability properties of the Adams-Bashforth, Adams-Moulton, and Backwards Differentiation Formula schemes, and consequentially, Corollary 4.3, their convergence guarantees.

1.3. Outline. This paper is organized as follows. In Section 2 we briefly review LMMs and their theory for solving ordinary differential equations, including the standard notions in numerical analysis of truncation error, consistency, stability, and convergence, along with

4 KELLER AND DU.

an algebraic root condition for stability. In Section 3 we frame the problem of discovery using LMMs and develop nuanced versions of consistency and stability for discovery. In particular, in Section 3.3, we discuss how truncation error for discovery is inherited from the forward problem and introduce a stronger notion of consistency; in Section 3.4 we refine the traditional definition of stability and the algebraic root condition, and we show equivalent theorems connecting the root conditions and the refined notions of stability. In Section 4, the discovery framework of Section 3 is applied to characterize convergence properties of the Adams-Bashforth, Adams-Moulton, and Backwards Differentiation Formula schemes. Some discussions on the long time dynamics discovery are made in Section 5. Then, in Section 6, we show results of numerical experiments. Finally, in Section 7, we summarize the results and discuss future directions.

2. LMMs: Quick Review. In this section, we introduce notation used throughout this work and briefly review the theory of LMMs as time integrators. While LMMs are welldocumented in standard textbooks for solving ordinary differential equations (see [15, 32, 2, 21]), we include the salient points to facilitate direct comparison with the new theory for the discovery of unknown dynamics developed in the next section.

2.1. LMMs: Notation and Concepts. Consider the ordinary differential equation (ODE)

where is assumed to be a Lipschitz continuous, smooth, and bounded function. Discretizing the model problem (2.1), we assume a grid on the interval [ a, b ] defined to be a set of points: with equidistant mesh (denote this ordered set. We denote the set of grid functions Γ-step LMM approximates the ) in terms of the previous M (1) time steps step linear multistep method is given by

where ], the coefficients function f is assumed to be given and Lipschitz, and the LMM scheme (2.2) defines an iterative procedure stepping forward in the independent variable ] to solve for x(t) at the gridpoints. Associated with an step LMM are its first and second characteristic polynomials, given, respectively, by

where it is assumed that

DISCOVERY OF DYNAMICS 5

For the numerical integration of differential equations, the method (2.2) is called explicit if = 0 and implicit otherwise [15, 32, 2]. Implicit methods require a nonlinear solver to the generated system of equations, whereas explicit methods do not. Existence and uniqueness of solutions in the case of implicit schemes is shown in [15, 21]. For both implicit and explicit methods, a kickstarting method for initial M values must be chosen, and as such a critical component of analyzing any multistep method scheme is to understand how much errors in initial values pollute the subsequent calculations [15]. This aspect of numerical methods is called numerical stability [2].

Finally, for any index set S with cardinality ¯maxdenote the standard discrete norms for any vector z naturally embedded in where can be any vector norm of . The same notations are used also for discrete grid functions given either in Γ] or its subsets.

Remark 2.1. To fix ideas, we use the hat notation ˆ to mark grid functions generated by the discretization (2.2). In the forward problem, the state x(t) is iteratively produced by LMMs, and hence we study ˆx, whereas for dynamics discovery, we study ˆf, see Section 3.

2.2. The Adams Family and BDF. Adams-Bashforth (AB), Adams-Moulton (AM), and the Backwards Differentiation Formula (BDF) are three popular multistep method schemes that arise from a Lagrange interpolating polynomial of the state or dynamics at time data from previous time steps. Without loss of generality, we consider the scalar model problem in this section; for higher dimensions, the theory need only be applied in each dimension. Let Λ. The Lagrange interpolating polynomial of a function over the set is the polynomial of degree and degree obtained from the linear combination of basis functions

with Λ as the coefficient of the linear combination. The M-step AdamsMoulton (or AM-M) and Adams-Bashforth (or AB-M) are M-step LMMs that arise from interpolating the dynamics with Lagrange interpolating polynomials corresponding to ˜Λ = Λ, respectively, and then applying the fundamental theorem of calculus on the model problem (2.1). Letting )) for brevity, we have

BDF-M, on the other hand, is an M-step LMM for 1 derived from interpolating the state ) directly on the lattice Λ

6 KELLER AND DU.

By the change of variables , we have a scaled Lagrange interpolating polynomial, denoted

With (2.6), the integrand of (2.5) may be written independent of the time step, so that

The simplified coefficients for the BDF method with uniform mesh can be obtained similarly.

2.3. Truncation Error and Consistency. In this section, we introduce the residual and notions related to analytical error for LMMs. The residual operator is given by [15]:

defined for ˆ]. How accurately the discretization (2.2) approximates the solution of (2.1) is measured by the truncation error, defined below.

] be the exact solution of the dynamic system (2.1) defined at the grid coordinates. The local truncation error is given by

For smooth functions f and x, we have

where

Now, we proceed to define order of error and the notion of consistency.

Definition 2.3 (Order of Error [15]). A linear multistep method has error order of p if 0 and admits a ] provided

or simply,

DISCOVERY OF DYNAMICS 7

Definition 2.4 (Consistency [15]). A linear multistep method is consistent with the differential equation provided

The Adams family and BDF are consistent in the sense of Definition 2.4. Moreover, the local truncation error associated with the step AB and BDF schemes are for the step AM, the local truncation error is

It is well-known that consistency can be formulated algebraically in terms of the characteristic polynomials [11]. In particular, the consistency condition, i.e., is equivalent to (1) = 0 and (1). Moreover, the truncation error is order k if

2.4. Stability and the Root Condition. In this section, we review definitions of stability and the root condition for LMMs. Stability is defined as follows.

step method for the ordinary differential equation ˙x = f(t, x(t)) is called stable on [ a, b ] provided there exists a constant K not depending on h such that, for any two grid functions ], we have for all h sufficiently small

The characteristic polynomials defined in (2.3) may be used to determine the stability of a linear multistep method via the root condition.

Definition 2.6 (Algebraic Root Condition [32, 15] ). A polynomial satisfies the root condition provided the roots of the polynomial do not exceed magnitude 1, and those of magnitude 1 are simple.

The following theorem states the equivalence between the stability and the root condition.

Theorem 2.7 (Stability and the Root Condition [32, 15]). A linear multistep method is stable if and only if its first characteristic polynomial satisfies the algebraic root condition given by Definition 2.6.

Note that all AB and AM schemes satisfy the root condition and are stable by Definition 2.5, whereas BDF-M satisfies the root condition and is stable only for 1

2.5. Convergence. Finally, we introduce the definition of convergence for LMMs and the celebrated equivalence theorem for determining it.

Definition 2.8 (Convergence [15]). Consider the initial value problem (2.1) and a fixed linear multistep method defined by (2.2). Let ˆ] be the grid function obtained by applying (2.2) on a uniform, real-valued grid of [ a, b ] with mesh size the exact solution of (2.1) at the grid points. The linear multistep method is said to converge on [ a, b ] if

8 KELLER AND DU.

With Definition 2.8, one can obtain the Dahlquist Equivalence Theorem, Theorem 2.9 [32].

Theorem 2.9 (Equivalence Theorem [15]). The multistep method (2.2) converges in the sense of Definition 2.8 for all Lipschitz f if and only if it is consistent and stable.

From the Equivalence Theorem, it can be shown that the order of the error the same order as the truncation error (Definition 2.3) and thus the order of approximation, provided the initial error maxis also of the same order.

In this work, we develop an analogous theory for multistep methods modifying these theorems to deal with the discovery of dynamics rather than solving the differential equation. In particular, we show how the second characteristic polynomial is determinant of stability for discovery and whether the Adams family and BDF are stable or not.

3. Discovery of Dynamics. In this study, we consider a data-driven technique to solve for the dynamics f given information on the state x at equidistant time steps [39]. First, we introduce the problem and then discuss notions of consistency, stability, and convergence. We now proceed to define the problem of LMMs for discovery.

3.1. Problem Definition. Following earlier discussions, we are concerned with the initial value problem (2.1). In this section and the next, multivariate functions representing the continuum models are denoted by scalar notations, i.e., f = f(x) and x = x(t), so that boldface symbols can be reserved for vectors corresponding to discrete forms of dynamics, which should be clear in context without ambiguity. The task of learning is to produce a function to approximately represent the dynamics, f = f(x), based on a set of observed states, that conforms with the discrete dynamics described by a linear multistep method. In practice, one often encounters situations with only partial (incomplete) data or data containing observation errors and uncertainties; these complications are typical for inverse problems. When combined with deep networks, the approximation is produced by a network in a learned parametrized form, which introduces further approximations as well as implicit regularizations.

As the first step to develop a rigorous numerical analysis framework, we consider a very idealized setting in this work by assuming that (A1) a complete set of exact values of the state, , given at equally distributed, ordered grid points ; (A2) the neural networks (or the underlying function classes used to represent the dynamics) have sufficient approximation capability to produce zero residual for the discrete dynamical system; (A3) approximated values of the exact dynamics for some observed initial states are available.

Although the assumptions make the situation very idealized, the study is a very constructive step towards the understanding of the mathematical and computational issues related to the data-driven modeling using neural networks and discretized forms of the unknown dynamics, which are the focuses of our ongoing work. The findings made here shed light on future studies of similar issues under more realistic conditions, as discussed in Section 3.2 and further in Section 7. Under the assumptions (A1), (A2), and (A3) stated above, the procedure of learning dynamics can be described as follows. Given and ˆas suitable approximations of in a suitable subset of have zero residuals for the discrete dynamics based on the LMM discretization for

DISCOVERY OF DYNAMICS 9

n = M, . . . , N, i.e,

Indeed, the above system for ˆf is simply (2.2) rewritten for learning the dynamics rather than the state. To help with later discussions, we let + 1 denote the number of linear equations in the system. Given that the values of affect the structure of the resulting system, we let be the smallest and the largest index, respectively, among those m’s satisfying

We collect the ordered coefficients of the LMM scheme in the vectors and The system for ˆf in this reduced notation is then

For brevity, we introduce the index sets set of indices of the grid associated with the values of unknown dynamics and for the set of indices for supplied initial dynamics. The linear system (3.1) may be written in compact matrix-vector form:

where + 1) matrix of coefficients for corresponding to matrix banded lower-triangular matrix with its diagonal entries given by -th subdiagonal entries given by the ordered vector of unknowns is defined as

which can be generated from the assumed, suitably approximated starting values We note that since always exists so that (3.2) is solvable whenever the right hand terms are prescribed.

see how the theory developed in this work is connected to the increasingly popular machine learning based data-driven discovery of dynamics, we briefly recall the relevant learning problems here. For more extensive works on machine learning, we refer to [4, 17, 33, 34, 45].

In a generic supervised machine learning setting of learning an unknown function f, one often assumes knowledge of ˜N samples of input-output data,

10 KELLER AND DU.

sample dataset is often divided into sets of training and test sets, and one attempts to find a neural network (NN) representation of , through an empirical loss minimization over the training set. We let ˜denote an ordered subset of ˜data, so that (˜The loss is a suitably-defined function ) measuring a distance between . When evaluated over only training data, this loss leads to the training error. The desired goal is to learn only minimizes the loss in the training set (i.e., the training error), but also achieves a small loss in the remaining test samples (i.e., the generalization error).

In the setting of dynamics discovery, it is important to note that the dynamics, or output, data is not given directly. Instead, only the state, or the input, is provided, and information on the true dynamics f is inferred by constraining the data to conform with some dynamical system. For the LMM discretization of the dynamics given by (3.1), conformity is achieved by minimizing the error associated with the LMM system, which we call the LMM residual. A total loss function of the optimization problem may be effectively viewed as

where the loss ˜is an increasing function of the LMM residual and vanishes at the origin, e.g., ˜. A network approximation with sufficient accuracy would attempt to conform with the discretized LMM dynamics by minimizing the LMM residual to find the unknown data ˜f. Alternatively, as done in LMNet, the neural network approximation may be supplied to the LMM residual ˜, where the initial dynamics in ˆg are also learned. If the approximation can be as accurate as desired, we would be led to the idealized setting that as the network is trained more, given sufficient width, the neural network would converge to ˆsatisfies (3.2).

Naturally, due to other practical considerations as well as the finite approximation power of the neural networks, more general loss functions, regularization techniques, and network architectures may also be taken into account, see Section 7 for further discussions. Our main focus here is to illustrate the impact of using different LMM on the learning process by developing a rigorous mathematical theory of consistency, stability and convergence for the dynamics discovery, beginning with the highly idealized setting of exact state data.

3.3. Truncation Error and Consistency. LMMs for discovery inherit the truncation error of solving ordinary differential equations with LMMs. Indeed, truncation error is specific to the discretization of the continuous problem; therefore, the truncation error of a scheme for dynamics discovery remains the same as that for solving an ordinary differential equation for the state defined by (2.9). However, in addition to inheriting the same concept of consistency from Section 2, Definition 2.4, we also introduce some strengthened notions of consistency for dynamics discovery. We complement these concepts later on with refined notions of stability for a more nuanced discussion of convergence for discovery using LMMs. Consistency and its strengthened forms are defined below.

Definition 3.1 (Consistency for Dynamics Discovery). An LMM is consistent with the differential equation for dynamics discovery provided 0, and it is strongly consistent if 0. Furthermore, a method is consistent of degree provided

DISCOVERY OF DYNAMICS 11

Remark 3.2. With the Definition 3.1, all LMMs having at least k-th order truncation error are consistent of degree at least k. Moreover, since

LMMs having at least second-order truncation are automatically consistent of degree 2 and thus strongly consistent.

Following from the classical truncation error analysis for LMMs, we have the algebraic criteria for the consistency.

Lemma 3.3 (Consistency). A linear multistep method scheme for dynamics discovery is consistent provided that . Furthermore, it is consistent of degree k if it is order k in the sense of Definition 2.3, that is, if

3.4. Stability and the Root Condition for Discovery. In this section we develop stability in a similar spirit as in Section 2 but also introduce more refined notions of stability for convergence analysis. For discovery, the main distinction from theory for solving the forward problem is that now we consider perturbations to the recovered dynamics as opposed to the integrated states for the numerical solution of the differential equation. To begin we introduce a linear operator given by

Notice ( ˆarises from its forward counterpart (2.8) with the reduced

step method for the dynamics discovery is called stable on [ a, b ] provided there exists a constant depending on N, such that, for any two grid functions ], we have

for the dynamics discovery is called marginally stable on [ a, b ] provided that there exists a constant , not depending on N, such that, for any two grid functions we have

method for the dynamics discovery is called weakly stable of degree provided that there exists a constant , not depending on N, such that, for any two grid functions ], we have

12 KELLER AND DU.

In all cases, the norm on the left-hand-side is taken over the learned components and . This convention is used in the rest of the paper. Note that we choose to use negative degree values so that more negative degrees correspond to weaker stability. Similar to the observation given in Remark 3.2, we see that weak stability of degree 2 follows from the marginal stability in Definition 3.5.

We would like to turn the property of stability into an algebraic condition as for the case of numerical solution to ODEs. For the forward problem, the algebraic root condition (Definition 2.6) serves this purpose; however, for the inverse problem, we require a more subtle treatment of the root condition to capture the nuances in stability for dynamics discovery.

Definition 3.7 (Strong Root Condition [1, 46, 13, 2] ). A polynomial satisfies the strong root condition provided the roots of the polynomial have magnitude less than 1.

Likewise, we also generalize the above root conditions.

A polynomial satisfies the root condition of degree provided the roots of the polynomial do not exceed magnitude 1, and those of magnitude 1 have multiplicity no larger than k.

Remark 3.9. One may view the conventional (algebraic) root condition (Definition 2.6) and the strong root condition (Definition 3.8) as special cases of the -multiplicity root condition of Definition 3.8 with k = 1 and k = 0 respectively. The strong root condition has been used in the numerical analysis, control theory, and linear recurrence relation literature for study of relative stability for LMM as time integrators and asymptotic properties associated with the linear recurrence relations [1, 46, 13, 2].

Naturally, we can see that the notions of stability for discovery for LMMs are tied to the bounds on the solutions to the linear recurrence equations determined by the coefficients . We now relate them to the root conditions. Notice that while the stability in Theorem 2.7 for numerical integration of the given dynamics is concerned with the first characteristic polynomial ), the stability in Theorem 3.10 for the discovery of dynamics is concerned with the second characteristic polynomial ) defined by (2.3). More precisely, the root condition can be stated for a reduced second characteristic polynomial

Hence, we see a fundamental difference in the two stability notions. The dependence of stability on )) might be unexpected as it has not appeared in the numerical differential equation literature. However, it is also not surprising given the inverse problem nature of using LMMs for dynamics discovery.

Theorem 3.10 (Stability for Discovery). A linear multistep method for discovery of dynamics is stable provided that the second characteristic polynomial or the reduced ˆsatisfies the strong root condition in Definition 3.7. Likewise, an LMM for discovery of dynamics is marginally stable provided that satisfies the algebraic root condition in Definition 2.6. Furthermore, an LMM for discovery of dynamics is weakly stable of degree

DISCOVERY OF DYNAMICS 13

) provided that satisfies the (-multiplicity root condition in Definition 3.8.

] are both generated by solving the LMM (3.2). By setting ) with the operator ˆdefined in (3.3), we have

By standard recurrence and linear algebra theory [1, 15], the difference ˆe can be determined by the companion matrix of the above recurrence relation, denoted by Z. This matrix is an () matrix with its first row given by rest of the rows are of the form (I, 0) where I is the identity matrix of size ((is the zero column vector in . The matrix Z is associated with a characteristic polynomial given by ˆ) that shares the same set of roots as that of ), except a possible root at 0.

To consider the propagation of the difference ˆe, we form the matrix where has its rows given by its first row given by the vector , and all subsequent rows by zeros. Then

where is given by the initial data . Thus, stability is equivalent to

or equivalently the strong root condition. Meanwhile, marginal stability is equivalent to

which is equivalent to the (multiplicity root condition.

14 KELLER AND DU.

3.5. Error Analysis and Convergence. In this section, we use the truncation error to study the error for discovery, including defining convergence and the order of approximation of LMM schemes for discovery.

Definition 3.11 (Convergence and Order of Approximation for Discovery). Consider the initial value problem (2.1) discretized by an step LMM given by (3.1). Let where f is the exact grid function on the N +1 grid points the approximation solved from (3.1). The LMM is convergent for dynamics discovery ifas 0 whenever maxMoreover, ifconstant c, then p is called the convergence order, or alternatively, the order of approximation for dynamics discovery.

Using the introduced notions of consistency and stability, we now present convergence theorems for dynamics discovery.

(2.1) discretized by an step LMM given by (3.1). Let is the exact grid function on the N + 1 grid points the approximation solved from (3.1). Then,

where is the local truncation error of the scheme, is given by

Moreover, in the senses of consistency outlined in Definition 3.1 and stability in Defini-tions 3.4-3.6, if an LMM is consistent and stable, or strongly consistent and marginally stable, then it is convergent for dynamics discovery in the sense of Definition 3.11. Furthermore, if it is consistent of degree k and weakly stable of degree , then provided that , we also have

Proof. By Equation (3.1) and the truncation error defined in Equation (2.9), we have

Subtracting the equations, we observe

DISCOVERY OF DYNAMICS 15

or equivalently on the left side refers to those components

indexed in I.

Now, by the definitions of stability given in Definitions 3.4 and 3.5, there exists a constant independent of h, for h sufficiently small, such that

where = 1, if the LMM is stable or marginally stable, respectively. Thus, by Definition 3.1 on consistency and strong consistency, we haveLikewise, if the LMM is weakly stable of degree

By the definition on the consistency of degree k, together with the assumption on the initial data that 0, convergence also follows.

Theorem 3.12 states that for LMM based dynamics discovery, convergence follows from both stability and consistency, as in the case of LMM-based time integration. The equation (3.5) shows the interplay between the stability aspect of solving the system, manifested in and the consistency component of truncation error, , from discretization of the differential equation.

We note that Theorem 3.12 contains only sufficient conditions for convergence. There are examples of LMMs that are consistent and marginally stable, but not strongly consistent nor stable, which may still be convergent for dynamics discovery. An example is the LMM with 2; convergence for this LMM can be checked using calculations similar to that presented in the proof of Corollary 4.3. Nevertheless, in the spirit of the Dahlquist Equivalence Theorem, we also have the following result establishing consistency and some form of stability from convergence.

Theorem 3.13 (Convergence Theorems for Discovery II). Consider the dynamical system (2.1) discretized by an step LMM given by (3.1). If the LMM is convergent for dynamics discovery in the sense of Definition 3.11, then it is consistent and marginally stable in the senses of Definitions 3.1 and 3.5.

) = 1, respectively with x(a) = 0. If the LMM is convergent in the sense of Definition 3.11, then the dynamical system (3.1) leads to a linear recurrence relation with constants (1), respectively, serving as inhomogeneous terms on the right hand side. Since the LMM is convergent, the learned dynamics approach 0 or 1, respectively. Thus, as (1) = 0 and (1). Consequentially, the consistency conditions from Lemma 3.3 are satisfied. Using the theory on the linear recurrence relations given in the proof of Theorem 3.12, in order for0 whenever maxthere must exist some constant 0 such that maxFor this bound to exist, the root condition must be satisfied, and hence the method must be marginal stable.

16 KELLER AND DU.

As seen from the proof of Theorem 3.12, under some assumptions on the initial dynamics, we immediately get the order of convergence for LMM-based dynamics discovery.

Theorem 3.14 (Order of Convergence). Given an LMM with a truncation error of order , as in Definition 2.3. Then, as if the LMM is stable and max. Moreover, provided

that max

namics discovery. These refinements motivate accompanying definitions for the degree of consistency in Definition 3.1, whereas traditionally the order of error matches with the order of convergence (see Definition 2.3). For dynamics discovery, the order of convergence and degree of consistency matches for strongly stable schemes. While this might not hold generically for marginally or weakly stable LMMs, resulting in possible lower order of convergence than the degree of consistency. We show later that, for some cases such as AM-1, the same order can still be maintained.

4. Application to AB, AM, and BDF. We now apply the general theorem on LMM for the dynamics discovery to three popular special classes of methods– Adams-Bashforth (AB), Adams-Moulton (AM), and Backwards Differentiation Formula (BDF).

4.1. Consistency of AB, AM and BDF. It is well-known that the Adams Family schemes and BDF are consistent as time integrators. Specifically, AB-M and BDF-M have order of error M, while AM-M has order of error M + 1. As a result, these three classes of LMM methods remain consistent for dynamics discovery. Moreover, as a consequence of the order of error, AB-M and BDF-M are consistent of degree M, and the AM-M schemes are consistent of degree M + 1, as noted in Remark 3.2. Indeed, the latter fact is crucial to the convergence of AM-1.

M and BDF-M schemes are all consistent for dynamics discovery. Furthermore, AM-1 is consistent of degree 2 and thus strongly consistent .

Theorem 4.2. With the notions of stability defined in Definitions 3.5 and 3.4, 1. BDF-, and AM-0 are stable; 2. AM-1 is marginally stable and thus weakly stable of degree 3. AB-are unstable.

The proof of Theorem 4.2 is given in Section 4.3.

are convergent, with convergence order M. AB-M for 1 are convergent, with convergence order M. AM-0 is convergent with first-order convergence. AM-1 is convergent with second-order convergence if we have second order error on the initial data.

DISCOVERY OF DYNAMICS 17

Proof. The conclusions of Corollary 4.3 on the convergence of LMM schemes under consideration follow immediately from the application of Theorems 4.1, 4.2, and 3.14. The order of convergence follows, with the exception of AM-1. Indeed, a direct application would imply only first order convergence due to its degree-1 marginal stability. However, we note in this special case, the recurrence relation (3.6) is given by ˆUsing the error expansion given in Definition 2.3, the leading order of ˆof the form

where ) is assumed to be a smooth function depending on the solution of the exact dynamic x = x(t). Therefore, given second-order error in the initial data, AM-1 has second-order convergence even though it is not a strongly stable method.

Remark 4.4. The finite range of instability with respect to the order M for the AB scheme is due to limitation of explicit calculations. We conjecture that the scheme is unstable for all 7. Interestingly, that M = 6 is a threshold for stability of the polynomial echoes the stability criterion for the forward problem BDF [21], for which M = 6 is also the largest known order method that is stable. Explicit numerical calculation or Routh Arrays (see [13]) are used to show this fact [21, 10, 14]. Schur polynomials have since been used [9] to show a generalized stability argument for ]. We leave open a generalized stability result for 7 using the polynomial roots, but we have validated numerically the instability for 7

4.3. Verification of Root Conditions for AB, AM and BDF. We now verify, for the three classes of LMMs, the root condition holds for cases stated in Theorem 4.2.

We begin by calculating the roots of the second characteristic polynomial associated with AB and AM since ) in both cases. We first present some results for AB-M and AM-10 as computational evidence (with exact symbolic computation). We have also numerically validated instability for AB-20 as well and expect instability to persist for all 7. However, there is no theoretical proof so far. For AM-M, a general instability result for 2 is proved in Lemma 4.7.

Fix , where we recall from Section 2 that ΛΛ. Exchanging the integral and the summand in the formula for the Lagrange interpolating polynomial, one can observe that finding the roots of the second characteristic polynomial is equivalent to choosing satisfying a mean-zero equation. That is, for ; ˜Λ) defined in (2.6), we have

18 KELLER AND DU.

Table 1: Largest Magnitude Roots

As we see in Table 1, which is computed symbolically by Mathematica, the profile of the roots of the characteristic polynomial associated with the different schemes varies significantly. Equation (4.1) and the data in Table 1 immediately lead to the following lemma.

defined in (2.6). Then, we can characterize the roots of the second characteristic polynomial as the solution to the equation

Moreover, for AM-

Let us state some useful properties of the second characteristic polynomial ) associated with the AM methods and the corresponding coefficients of its B matrix.

step AM scheme has coefficients

for m = 0, 1, . . . , M. The coefficients are given by

DISCOVERY OF DYNAMICS 19

Certainly,

We prove (4.4) by induction. As the base case, M = 2. For M = 2, we have 12. Now assume (4.4) holds up to some arbitrary 2. We will show the result for M + 1.

as desired. Note we used the inductive hypothesis on the second term in (4.6). The proof by induction showing for is complete. To prove Part 2, note that the relation of signs between coefficients follows from the sign of the Lagrange basis polynomials in the integrand of the coefficients. For the integrand of (4.3) are of the same sign, and therefore the sign of depends only on the multiplier (Hence Part 2 of Lemma 4.6 follows.

Finally, for Part 3, we note that

This completes the proof.

The linear multistep method formed by the Adams-Moulton scheme for does not satisfy the root condition.

20 KELLER AND DU.

Taking the limit as , we see that (Meanwhile,

Hence, it follows from the Intermediate Value Theorem that there is at least one root of ) that is real in (1), violating the root condition. The result thus follows.

Theorem 4.8 (Root Condition of AB, AM, BDF). The strong root condition for discovery is satisfied by BDF-scheme for 1 The algebraic root condition, or the root condition with k = 1, is satisfied for AM-M with M = 1. On the other hand, the root condition is not satisfied for the AB-M scheme with 7 or the AM-M scheme with

Proof. The case of AM-0 is trivial. Lemma 4.5 implies the results of Theorem 4.8 for AB-10 and for AM-10. Furthermore, by Lemma 4.7, the AM-M scheme for 2 violates the root condition and hence is unstable. Finally, BDF-M has ) = 1, for all 1. Hence, the root condition is always satisfied for the BDF scheme for arbitrary 1. As a result, AM-0, identical to BDF-1, satisfies the root condition as well.

Finally, Theorem 4.2 follows directly from Theorems 4.8 and 3.10.

4.4. Discussions on the Effect of Initial Conditions. The theory developed so far is under the assumption that some initial data of the dynamics are provided, which leads to learning the approximated dynamics at later times. One may consider a situation where the some terminal data are given instead. In such cases, the approximate dynamics would be solved backward in time, yielding a modified system of equations. It is not hard to check that the stability would become dependent on a modified second characteristic polynomial whose roots are the reciprocals of those of ˆ. Naturally, it is of interest to check root conditions for the three classes of LMMs as well. For BDF, we clearly see the strong root condition holds as ˆ) = 1. For AM-0 and AB-1, the same also hold. Likewise for AM-1, the root condition but not the strong root condition remains true. For AM-Lemma 4.6 implies the product of the roots of ˆ) is less than one. Therefore, there might be at least one root of the modified second characteristic polynomial outside the unit disc, and hence instability for these methods is again expected. Interestingly, unlike in the case with initial data where there is not yet rigorous theory but only computational results for the AB methods, one can prove rigorously in the next lemma a result of instability for the backwards-in-time AB-via a similar argument as Part 3 of Lemma 4.6.

= 0 is true by construction. For 2, we have

DISCOVERY OF DYNAMICS 21

for m = 1, 2, . . . , M + 1. The coefficients

which completes the proof of the lemma.

From the above, we see that root conditions do not hold for the modified second characteristic polynomial associated with AB-2, so that instability would occur when terminal data are supplied. In practice, it is often the case that such initial dynamics are represented by neural networks as part of the unknown as well. Thus, the stability in such cases is worthy of further investigation, particularly in conjunction with the approximation properties of the neural networks to be employed. Clearly, the successful runs using neural networks in Figure 1 have good correspondence with those schemes enjoying some stability properties in one or both types of initial/terminal data .

5. Long Time Dynamics Discovery. In this section, we consider the problem of discov- ering dynamics of (2.1) over a variable interval (0, T), with terminal time 1 fixed mesh h. Notice by increasing T we increase the number of grid points N = T/h; hence we hope to relate our previous studies with variable mesh and fixed domain to this setting. For the numerical analysis of time integration, this study is reminiscent to that of asymptotic stability, which is often treated via the study of linear dynamics [15, 32, 2].

By rescaling time, ˜and defining ˜), we have via change of variables that the scaled dynamics ˜f may be related to that of the original variables by

Then, if we define ˜)), the rescaled differential equation becomes

Now, consider applying the LMM scheme to ˜x using the transformed model problem (5.1) with a step size ˜h = 1/N. Under this rescaling of time, one can check directly the leading truncation error term of an LMM of order p in the sense of Definitions 2.2 and 2.3 is

In light of (5.2), we can see that the truncation error of the discovered dynamics of (2.1) in the original time scale is a multiple of the truncation error of the rescaled model (5.1) by the factor . Meanwhile, from the analysis of Section 3.4, the error from stability is only directly dependent on , not the specific time domain.

Using these observations of the effects on consistency and stability, we can deduce the behavior of an LMM in the long-time regime. For a strongly stable order LMM, the global error behaves like ) provided that max

22 KELLER AND DU.

uniformly bounded as T increases. Hence, we may view strongly stable LMMs as A-stable, in the case of dynamics discovery, for fixed . This can be seen as another difference with the case of the forward problem of time integration, where the order of A-stable LMMs is known to be limited by 2 due to the celebrated Dahlquist barrier theorems [12, 15, 32, 2]. On the other hand, for unstable methods, the exponential growth in N of the inverse matrix dominates over any gain in accuracy from consistency. Thus, lack of stability leads to an exponentially increasing error as T grows linearly.

As a special example, the marginally stable AM-1 is not stable for dynamics discovery, but as stated in the Corollary 4.3 and the derivation in its proof, we can use the rescaling to get the global error in the form ), Thus, we expect AM-1, for a fixed h, to have a constant error as T increases, which is supported by numerical experiments presented in the next section.

To recap, from the analysis in this section, for dynamics discovery, BDFs enjoy asymptotic stability for a fixed time step size h as T increases. Same holds for AB-M, at least for a small value of M that enjoys the stability as 0 for a given terminal time. While this also holds for AM-1, it does not hold for AM-2. As shown in Figure 3, the errors from AB and BDF remain fixed across various values of T, while the AM methods yield exponential growth of error in

systems associated with each of the studied multistep methods and show numerical evidence consistent with the theoretical findings. We limit ourselves to the idealized setting of numerically exact states considered for the theoretical analysis and to low dimensional dynamic systems for the sake of illustration and benchmarking. In addition, we also take the initial data for the dynamics to be exact. For a model problem, we consider the 2D Cubic System, a nonlinearly damped oscillator, specified as in [39, 6].

6.1. Fixed Time Domain. First we study the methods on a fixed time domain, with varying time step. We show in Figure 2 the results from the Adams family and BDF methods. The exact dynamics are computed by numerically integrating (6.1) on a very refined mesh. The errors of the discovered dynamics in the norm are shown in Figure 2 for different M against different number of grid points. In addition, Figure 2d shows a segment of the approximated dynamics captured over the interval versus the true dynamics using a stable and convergent method (AB-3) when h = 0.01. In this figure, the dotted and dashed lines represent the true dynamics in the first and second coordinates, i.e. , respectively. The crosses and asterisks denote the learned dynamics in the first and second coordinates, i.e. ˆ, respectively. The method is able to capture the twist and intersection of the two coordinates. Clearly, the numerical results support the theoretical findings of this paper.

6.2. Long Time Behavior. Here, we consider the problem of discovering dynamics over a changing domain [0with fixed mesh size h. In Figure 3, we discover the dynamics

DISCOVERY OF DYNAMICS 23

Figure 2: Numerical results of the three types of schemes on the 2D cubic system (6.1) on the unit time interval for different choices of M and N.

of the 2D Cubic System over specified ranges of T (T = 12.5, 25, 37.5). For AM (Figure 3a, 3b, and 3c) we use h = 0.01 to first generate data over [0, 50] and then select the slice of data matching the Ts. AM-M clearly suffers from the exponential error growth when it has a constant error when M = 1, as predicted in Section 5. Meanwhile, also consistent with the analysis of Secton 5, AB and BDF are robust for the long-time dynamics discovery – yielding a constant error for fixed mesh as T increases and a decreasing error for larger M.

7. Conclusions and Future Steps. In this paper, we extend the foundational work of solving ordinary differential equations using LMMs to the problem of dynamics discovery. We introduce refined notions of consistency, and stability, and convergence for discovery based on classical definitions, and we showed how three prominent schemes – Adams-Bashforth, Adams-Moulton, and Backwards Differentiation Formula – may or may not be convergent

24 KELLER AND DU.

Figure 3: Long Time Errors for Discovery of 2D Cubic System (6.1)

numerical methods for dynamics discovery in general. To do so, we first derive algebraic criteria to determine the consistency and stability of the LMM, in a spirit similar to the counterpart for the classical theory. The key difference lies in the characteristic polynomial of attention; instead of the root condition for the first characteristic polynomial, as classically attributed to LMMs as time integrators, stability for discovery of dynamics is attributed to root conditions on the second characteristic polynomial. While the conditions are trivial for the BDF class, their validity in the case of AM schemes requires the study of some new properties of the Lagrange interpolants. The case of AB, at the present, has to be investigated computationally. Numerical results are presented to show agreement with the theoretical findings. In conclusion, we find theoretically and numerically that the systems for BDF-M for all 6, and AM-M for M = 0 and 1 are and convergent, while AB-10 and AM-2 are not, as summarized in Table 2. These conclusions are drawn provided some initial data on the dynamics are supplied. Modifications need to be made, as discussed in Section 4.4, if other types of additional data on the dynamics are provided. LMM schemes are well-studied for the forward problem in numerical analysis. As such tools, they can be useful to the subject of machine learning. For example, they can be applied to the design and training of neural networks that are seen as discrete forms of

Table 2: LMMs: similarities and differences for integrating and learning dynamics.

DISCOVERY OF DYNAMICS 25

dynamic systems [52, 7, 47]. Different from such applications, the new study given here is motived by recent interest in using machine learning [4, 17, 33, 34, 45] to formalize a variety of inverse problems such as learning dynamics using classical discretization techniques like LMMs. The change of the problem type from forward integration to inverse learning leads to different mathematical theory as illustrated in the Table 2. Note that in particular, BDF provides a class of methods convergent for integrating and learning dynamics, while not all AB and AM methods can share the same conclusion. Our framework can be applied to check on other LMMs besides these examples. Furthermore, it will be interesting to explore if there are systematic ways to generate broader classes of LMMs good for both tasks of model-based time integration and data-driven learning.

As discussed in Section 3.2, our current study assumes the best possible case that the exact states along with suitable approximations to the initial dynamics are all given, together with the assumption that the neural network representation can produce zero residual for the LMM dynamics. While this setting is highly idealized, based on the conclusions drawn, we can speculate about the impact on the properties of stability and convergence caused by different choices of time discretization schemes for a more informed attempt at discovery of unknown dynamics in more practical settings. The latter leads to many interesting issues to be considered in the future. For instance, instead of assuming only data on the state with a loss function ), we may consider a more general loss function with data on the state and dynamics, i.e. ), given by

For LMMs with grid functions, the loss associated with dynamics conformity comes from the discretization (3.1), and ˜], the space of grid functions. The total loss can be taken as an expectation over training samplesand minimized to obtain some optimal representation of the state or dynamics. LMNet is an example where the conformity term is minimized over parameterized neural networks of various types, so that ˆ, as studied in [37, 39, 50, 55]. Whenever the term involving the LMM residual is accounted for, the framework developed in this paper would be relevant. For stable LMMs considered here, one may expect that it may be possible to extend the convergence results for exact and complete data if the set of neural networks can satisfy some universal approximation properties. The convergence would be expected to be in the sense of function approximations which would imply good generalization error, at least among suitable classes of smooth dynamic systems. For systems displaying chaotic behavior and sharp transitions, new ideas are likely needed in order to assure accurate discovery of the underlying complex dynamics.

In this more general setting, neural network representations may also provide implicit regularization of the learned dynamics so that unstable LMMs could potentially be stabilized. However, regularization likely produces additional consistency error so the convergence has to be more carefully examined. Moreover, we may consider compressed representation and treat incomplete data by promoting sparsity or exploring the use of partial physics as regularization

26 KELLER AND DU.

to achieve physics-informed and data-driven discovery of the dynamics. Finally, there are many avenues of exploration to extend the results reported here. Some interesting topics for future studies include

1. the effects of regularization by specifying various forms of the regularization terms , such as those promoting smoothness, sparsity, low dimensionality, and extending the above tasks for study of the dynamics discovery problem with incomplete and uncertain data.

2. different reduced-order models via choice of constrained representations on the dynamics or the state variables or both [3, 55];

3. extension of the stability framework to incorporate other multistep and multistage schemes such as predictor-corrector, Milne and Runge-Kutta [43];

4. derivation of a general class of LMMs that are convergent for both the forward problem of time integration and the backward problem of dynamics discovery.

5. the errors in numerically integrated states based on learned dynamics [39]; 6. distributed dynamic systems such as time-dependent PDEs and examine the additional effect due to spatial discretization;

7. generalizing to the study of dynamics for a suitable set of initial conditions.

Naturally, learning dynamics has strong connections to the subject of time-series prediction using deep learning [8, 20, 24, 49]. Our current work here may motivate further rigorous numerical analysis studies in such a direction as well. To conclude, we see from this study that there are many new challenges in physics-based and data-driven modeling and simulations warranting further numerical analysis research.

8. Acknowledgments. The authors would like to thank the CM3 group at Columbia University for invigorating discussions, Wen Ding for his stimulating suggestions, and the referees and Associate Editor of SIAM Journal of Numerical Analysis for their valuable comments.

REFERENCES

[1] R. P. Agarwal, Difference equations and inequalities: theory, methods, and applications, CRC Press, 2000.

[2] K. Atkinson, W. Han, and D. E. Stewart, Numerical solution of ordinary differential equations, vol. 108, John Wiley & Sons, 2011.

[3] K. Bhattacharya, B. Hosseini, N. B. Kovachki, and A. M. Stuart, Model reduction and neural networks for parametric pdes, arXiv preprint arXiv:2005.03180, (2020).

[4] C. M. Bishop, Pattern recognition and machine learning, springer, 2006.

[5] S. L. Brunton and J. N. Kutz, Data-driven science and engineering: Machine learning, dynamical systems, and control, Cambridge University Press, 2019.

[6] S. L. Brunton, J. L. Proctor, and J. N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proceedings of the National Academy of Sciences, 113 (2016), pp. 3932–3937.

[7] R. T. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud, Neural ordinary differential equations, in Advances in neural information processing systems, 2018, pp. 6571–6583.

[8] J. T. Connor, R. D. Martin, and L. E. Atlas, Recurrent neural networks and robust time series prediction, IEEE transactions on neural networks, 5 (1994), pp. 240–254.

[9] D. M. Creedon and J. J. Miller, The stability properties ofq-step backward difference schemes, BIT Numerical Mathematics, 15 (1975), pp. 244–249.

[10] C. W. Cryer, On the instability of high order backward-difference multistep methods, BIT Numerical

DISCOVERY OF DYNAMICS 27

Mathematics, 12 (1972), pp. 17–25.

[11] G. Dahlquist, Convergence and stability in the numerical integration of ordinary differential equations, Mathematica Scandinavica, (1956), pp. 33–53.

[12] G. G. Dahlquist, A special stability problem for linear multistep methods, BIT Numerical Mathematics, 3 (1963), pp. 27–43.

[13] R. C. Dorf and R. H. Bishop, Modern control systems, Pearson, 2011.

[14] C. Fredebeul, A-BDF: a generalization of the backward differentiation formulae, SIAM journal on numerical analysis, 35 (1998), pp. 1917–1938.

[15] W. Gautschi, Numerical analysis, Springer Science & Business Media, 1997.

[16] H. H. Goldstine, A History of Numerical Analysis from the 16th through the 19th Century, vol. 2, Springer Science & Business Media, 2012.

[17] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning, MIT press, 2016.

FD-Net with auxiliary time steps: Fast prediction of PDEs using Hessian-free trust-region methods, arXiv preprint arXiv:1910.12680, (2019).

[19] J. Han, A. Jentzen, and E. Weinan, Solving high-dimensional partial differential equations using deep learning, Proceedings of the National Academy of Sciences, 115 (2018), pp. 8505–8510.

[20] M. Han, J. Xi, S. Xu, and F.-L. Yin, Prediction of chaotic time series based on the recurrent predictor neural network, IEEE transactions on signal processing, 52 (2004), pp. 3409–3416.

[21] P. Henrici, Discrete variable methods in ordinary differential equations, (1962).

[22] M. I. Jordan and T. M. Mitchell, Machine learning: Trends, perspectives, and prospects, Science, 349 (2015), pp. 255–260.

[23] S. H. Kang, W. Liao, and Y. Liu, Ident: Identifying differential equations with numerical time evolution, arXiv preprint arXiv:1904.03538, (2019).

[24] F. Karim, S. Majumdar, H. Darabi, and S. Chen, LSTM fully convolutional networks for time series classification, IEEE access, 6 (2017), pp. 1662–1669.

[25] I. G. Kevrekidis, C. W. Rowley, and M. O. Williams, A kernel-based method for data-driven Koopman spectral analysis, Journal of Computational Dynamics, 2 (2016), pp. 247–265.

[26] Y. Khoo, J. Lu, and L. Ying, Solving parametric PDE problems with artificial neural networks, arXiv preprint arXiv:1707.03351, (2017).

[27] A. Krizhevsky, I. Sutskever, and G. E. Hinton, Imagenet classification with deep convolutional neural networks, in Advances in neural information processing systems, 2012, pp. 1097–1105.

[28] Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, nature, 521 (2015), p. 436.

[29] Z. Long, Y. Lu, and B. Dong, PDE-Net 2.0: Learning PDEs from data with a numeric-symbolic hybrid deep network, Journal of Computational Physics, 399 (2019), p. 108925.

[30] F. Lu, M. Zhong, S. Tang, and M. Maggioni, Nonparametric inference of interaction laws in systems of agents from trajectory data, Proceedings of the National Academy of Sciences, 116 (2019), pp. 14424–14433.

[31] C. Ma, J. Wang, and W. E, Model reduction with memory and the machine learning of dynamical systems, arXiv preprint arXiv:1808.04258, (2018).

[32] , Cambridge University Press, 2003.

[33] M. Mohri, A. Rostamizadeh, and A. Talwalkar, Foundations of machine learning, MIT press, 2018.

[34] K. P. Murphy, Machine learning: a probabilistic perspective, MIT press, 2012.

[35] S. Pan and K. Duraisamy, Data-driven discovery of closure models, SIAM Journal on Applied Dynamical Systems, 17 (2018), pp. 2381–2413.

[36] E. Qian, B. Kramer, B. Peherstorfer, and K. Willcox, Lift & learn: Physics-informed machine learning for large-scale nonlinear dynamical systems, Physica D: Nonlinear Phenomena, 406 (2020), p. 132401.

[37] T. Qin, K. Wu, and D. Xiu, Data driven governing equations approximation using deep neural networks, Journal of Computational Physics, 395 (2019), pp. 620–635.

[38] M. Raissi, Deep hidden physics models: Deep learning of nonlinear partial differential equations, The Journal of Machine Learning Research, 19 (2018), pp. 932–955.

[39] M. Raissi, P. Perdikaris, and G. E. Karniadakis, Multistep neural networks for data-driven discovery of nonlinear dynamical systems, arXiv preprint arXiv:1801.01236, (2018).

28 KELLER AND DU.

[40] M. Raissi, P. Perdikaris, and G. E. Karniadakis, Numerical Gaussian processes for time-dependent and nonlinear partial differential equations, SIAM Journal on Scientific Computing, 40 (2018), pp. A172–A198, https://doi.org/10.1137/17M1120762.

[41] M. Raissi, P. Perdikaris, and G. E. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics, 378 (2019), pp. 686–707.

[42] S. H. Rudy, S. L. Brunton, J. L. Proctor, and J. N. Kutz, Data-driven discovery of partial differential equations, Science Advances, 3 (2017), p. e1602614.

[43] S. H. Rudy, J. N. Kutz, and S. L. Brunton, Deep learning of dynamics and signal-noise decomposition with time-stepping constraints, Journal of Computational Physics, 396 (2019), pp. 483–506.

[44] M. Schmidt and H. Lipson, Distilling free-form natural laws from experimental data, science, 324 (2009), pp. 81–85.

[45] S. Shalev-Shwartz and S. Ben-David, Understanding machine learning: From theory to algorithms, Cambridge University Press, 2014.

[46] S. Strelitz, On the Routh-Hurwitz problem, The American Mathematical Monthly, 84 (1977), pp. 542– 544.

[47] Q. Sun, Y. Tao, and Q. Du, Stochastic training of residual networks: a differential equation viewpoint, arXiv preprint arXiv:1812.00174, (2018).

[48] Y. Sun, L. Zhang, and H. Schaeffer, Neupde: Neural network based ordinary and partial differential equations for modeling time-dependent data, arXiv preprint arXiv:1908.03190, (2019).

[49] Y. Tao, L. Ma, W. Zhang, J. Liu, W. Liu, and Q. Du, Hierarchical attention-based recurrent highway networks for time series prediction, arXiv preprint arXiv:1806.00685, (2018).

[50] R. Tipireddy, P. Perdikaris, P. Stinis, and A. Tartakovsky, A comparative study of physics-informed neural network models for learning unknown dynamics and constitutive relations, 2019, https://arxiv.org/abs/1904.04058.

[51] M. Wang, H.-X. Li, X. Chen, and Y. Chen, Deep learning-based model reduction for distributed parameter systems, IEEE Transactions on Systems, Man, and Cybernetics: Systems, 46 (2016), pp. 1664–1674.

[52] E. Weinan, A proposal on machine learning via dynamical systems, Communications in Mathematics and Statistics, 5 (2017), pp. 1–11.

[53] M. O. Williams, I. G. Kevrekidis, and C. W. Rowley, A data–driven approximation of the Koopman operator: Extending dynamic mode decomposition, Journal of Nonlinear Science, 25 (2015), pp. 1307– 1346.

[54] K. Wu and D. Xiu, Data-driven deep learning of partial differential equations in modal space, arXiv preprint arXiv:1910.06948, (2019).

[55] X. Xie, G. Zhang, and C. G. Webster, Non-intrusive inference reduced order modeling of fluid dynamics using linear multistep network, arXiv preprint arXiv:1809.07820, (2018).

[56] Y. Zhu and N. Zabaras, Bayesian deep convolutional encoder–decoder networks for surrogate modeling and uncertainty quantification, Journal of Computational Physics, 366 (2018), pp. 415–447.