Learning reduced systems via deep neural networks with memory

2020·Arxiv


XIAOHAN FU∗, LO-BIN CHANG∗, AND DONGBIN XIU†

Abstract. We present a general numerical approach for constructing governing equations for unknown dynamical systems when only data on a subset of the state variables are available. The unknown equations for these observed variables are thus a reduced system of the complete set of state variables. Reduced systems possess memory integrals, based on the well known Mori-Zwanzig (MZ) formalism. Our numerical strategy to recover the reduced system starts by formulating a discrete approximation of the memory integral in the MZ formulation. The resulting unknown approximate MZ equations are finite dimensional, in the sense that a finite number of past history data are involved. We then present a deep neural network structure that directly incorporates the history terms to produce memory in the network. The approach is suitable for any practical system with finite memory length. We then use a set of numerical examples to demonstrate the effectiveness of our method.

Key words. Deep neural network, reduced system, Mori-Zwanzig formulation, memory integral

1. Introduction. Designing data-driven numerical methods to discover unknown physical laws has received increasing attention lately. Several methods have been developed for dynamical systems using traditional numerical approximation techniques. In these approaches, the unknown governing equations are treated as target functions, whose inputs are the state variables and whose outputs are their temporal derivatives. Methods using sparse recovery, as well as more standard polynomial approximations, have been developed, cf. [29, 3, 25, 9, 30, 26, 22, 24, 34, 33, 12, 19, 20, 18, 11, 28]. More recently, increasing research effort has been devoted to the use of modern machine learning techniques, particularly deep neural networks (DNNs). The studies include recovery of ordinary differential equations (ODEs) [21, 16, 23] and partial differential equations (PDEs) [12, 19, 20, 18, 11, 28]. A notable development along this line is the use of the flow map for modeling the unknown dynamical equations [16]. The flow map describes the (unknown) mapping between two system states. Once it is accurately approximated, it can serve as a model for system prediction. The major advantage of using the flow map is that it avoids requiring temporal derivative data, which can be difficult to acquire in practice and are often subject to larger errors. In particular, the residual network (ResNet), developed in the image analysis community ([8]), is well suited to equation recovery, in the sense that it can be an exact integrator [16]. This approach has since been extended and applied to other problems [35, 15, 4].

The aforementioned approaches are data driven and rely on observational data of the state variables to numerically estimate the underlying dynamical systems. For many practical systems, however, one does not have access to data for all the state variables. Instead, one often only has data on a subset of the variables, i.e., the observables. It is then natural to seek a governing equation for the evolution of the observed variables. This, however, introduces additional challenges from a mathematical point of view. Even when the underlying governing equations for the full variable set are autonomous, the effective governing equations for the observed variables, i.e., the reduced system of equations, include memory terms and become non-autonomous. This is a direct result of the well known Mori-Zwanzig (MZ) formulation [13, 37]. The memory term in the MZ formulation represents a significant computational challenge. Various approximation techniques have been developed to facilitate efficient estimation of the memory effect; see, for example, [6, 1, 7, 27, 5, 31, 36] and the references therein. More recently, data driven methods were developed to provide effective closure or estimation for the memory integral [10, 2].

∗Department of Statistics, The Ohio State University, Columbus, OH 43210, USA.

†Department of Mathematics, The Ohio State University, Columbus, OH 43210, USA. Email: xiu.16@osu.edu. Funding: This work was partially supported by AFOSR FA9550-18-1-0102.

The topic of this paper is data driven learning of unknown dynamical systems when only data on a subset of the state variables, i.e., the observables, are available. We make a general assumption that the underlying unknown complete system of equations is autonomous. Our goal is to construct a dynamical model for the evolution of the observables, whose data are available, thus discovering a reduced system. Due to the MZ formulation, the unknown governing equations for the observables are non-autonomous and possess memory integrals. Therefore, the aforementioned existing data driven methods for equation discovery are not applicable. On the other hand, the existing approximation techniques for the memory integral in MZ equations are not applicable either, as the underlying complete system is unavailable. We therefore propose a new method to directly learn the evolution equations for the observables, with a built-in memory effect. To accomplish this, we make a general assumption that the reduced systems for the observables have "decaying memory" over a longer time horizon. When the observables are representative of the full system states, this usually holds true, as the evolution of the observables depends on their current states and their immediate past, and the dependence does not usually extend to the infinite past. In other words, the initial states of the observables should have diminishing effects on their evolution over longer time. Based on the decaying memory assumption, we then truncate the memory integral in the MZ formulation up to its "memory length" to obtain an approximate MZ (AMZ) equation. The AMZ equation is then discretized by using a set of time instances inside the memory interval. The resulting discrete approximate MZ (d-AMZ) equation, still unknown at this stage, becomes our goal of equation learning. We then design a deep neural network (DNN) structure that explicitly incorporates the observable data inside the memory interval. The proposed DNN structure is an extension of the ResNet structure used for autonomous system learning ([16]). By incorporating data from the immediate past, the new DNN can explicitly model the memory terms in the MZ equation. We remark that our current method is similar to a recent and independent work [32], where a similar truncation and discretization of the MZ formulation was proposed. However, the work of [32] utilizes a long short-term memory (LSTM) neural network structure to achieve the memory effect. Our proposed DNN structure takes a much simpler form, in the sense that it is basically a standard fully connected network and does not require any "gates" as in LSTM. The new network also allows a direct conceptual and numerical connection with the "true" memory of the underlying reduced systems.

This paper is organized as follows. After the problem setup in Section 2, we present the main method in Section 3. The decaying memory assumption is first discussed in Section 3.1, followed by the discrete approximate Mori-Zwanzig formulation in Section 3.2. The DNN structure is then presented in Section 3.3, along with its data set construction and training in Section 3.4. Numerical examples are then presented in Section 4 to demonstrate the properties of the proposed approach.

2. Setup and Preliminaries. Let us consider a system of ordinary differential equations (ODEs),

$$\frac{dx}{dt} = f(x), \qquad x(0) = x_0, \tag{2.1}$$

where $x \in \mathbb{R}^n$ are the state variables. We assume that the form of the governing equations, which manifests itself via $f: \mathbb{R}^n \to \mathbb{R}^n$, is unknown. Let x = (z; w), where $z \in \mathbb{R}^d$ is the subset of the state variables with available data, and $w \in \mathbb{R}^{n-d}$ is the unobserved subset of the state variables. Our goal is to construct an effective governing equation for the observed variables z.

2.1. Data. We assume trajectory data are available only for the observables z and not for the full set of the state variables x. Let $N_T$ be the total number of such partially observed trajectories. For each i-th trajectory, we have

$$z\big(t_k^{(i)}\big), \qquad k = 1, \dots, K^{(i)}, \quad i = 1, \dots, N_T, \tag{2.2}$$

where $\{t_k^{(i)}\}$ are discrete time instances at which the data are available. Note that each trajectory originates from an unknown initial condition $x_0^{(i)} = (z_0^{(i)}; w_0^{(i)})$. For notational convenience, we shall assume a constant time step

$$\Delta \equiv t_{k+1}^{(i)} - t_k^{(i)}, \qquad \forall\, k, i. \tag{2.3}$$

We then seek to develop a numerical model for the evolution dynamics of z(t), without data on w and without knowledge of the full model (2.1).

2.2. Learning of Full System. When data on the full set of state variables x are available, the task of recovering the full model (2.1) is relatively more straightforward. A number of different approaches exist. In this paper, we adopt and modify the approach developed in [16], which seeks to recover the underlying flow map of (2.1) as opposed to the right-hand side of (2.1). In particular, suppose data of the full set of state variables are available as

$$x\big(t_k^{(i)}\big), \qquad k = 1, \dots, K^{(i)}, \quad i = 1, \dots, N_T,$$

for a total number of $N_T$ trajectories over time instances with a constant step size, $\Delta = t_{k+1}^{(i)} - t_k^{(i)}$ for all i and k. One can then re-group the data into pairs of two adjacent time instances, for each $i = 1, \dots, N_T$,

$$\big\{ x\big(t_k^{(i)}\big),\; x\big(t_{k+1}^{(i)}\big) \big\}, \qquad k = 1, \dots, K^{(i)} - 1.$$

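Note that for the autonomous system (2.1), time t can be arbitrarily shifted and only the relative time difference is relevant. One can then define the data set as

$$S = \big\{ \big(x_j(0),\, x_j(\Delta)\big) : j = 1, \dots, J \big\}, \tag{2.4}$$

where J is the total number of data pairs. Integrating (2.1) exactly over one time step gives

$$x(\Delta) = x(0) + \int_0^{\Delta} f(x(s))\, ds = \big[ I_n + \psi(\cdot, \Delta) \big](x(0)), \tag{2.5}$$

where $I_n$ is the identity matrix of size $n \times n$, and for any $x \in \mathbb{R}^n$, $\psi(x, \Delta)$ denotes the effective increment of the solution starting from x over the time step ∆.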

Based on (2.5), it was then proposed in [17] to use a residual network (ResNet) ([8]) to recover the system. The ResNet has the structure

$$x^{out} = \big[ I_n + N \big](x^{in}), \tag{2.6}$$

where $N: \mathbb{R}^n \to \mathbb{R}^n$ is the operator corresponding to a fully connected deep neural network. Upon using the data set (2.4), the ResNet (2.6) can be trained to approximate the dynamics (2.5), with the deep network operator N approximating the effective increment in (2.5).
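As a minimal sketch of the residual structure in (2.6), the NumPy code below evaluates one step $x^{out} = x^{in} + N(x^{in})$ with a small fully connected network. The layer widths, the tanh activation, the random initialization, and the helper names `init_mlp`, `mlp` and `resnet_step` are illustrative assumptions, not taken from [16]; training is omitted.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Random weights and zero biases for a fully connected network (hypothetical helper)."""
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Forward pass: tanh on hidden layers, linear output layer."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def resnet_step(params, x):
    """One residual step x_out = x_in + N(x_in), cf. (2.6)."""
    return x + mlp(params, x)

rng = np.random.default_rng(0)
n = 2                                   # dimension of the full state (assumed)
params = init_mlp([n, 16, 16, n], rng)  # untrained network N
x0 = np.array([1.0, -0.5])
x1 = resnet_step(params, x0)            # model state one time step later
```

Composing `resnet_step` with itself advances the state by several time steps, which is how a trained flow-map model would be used for prediction.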

2.3. Mori-Zwanzig Formulation for Reduced System. The approach in the previous section, along with most other existing equation recovery methods, does not apply to the problem considered in this paper. The reason is that here we seek to discover the dynamic equations for only the observables z, which are a subset of the full set of variables x. Even though the full system (2.1) is autonomous, a crucial property required in most of the existing equation recovery methods, the evolution equations for the subset variables z become non-autonomous. This is well understood from the Mori-Zwanzig (MZ) formulation ([13],[37]). The evolution of the reduced set of variables z follows a generalized Langevin equation of the following form,

$$\frac{d}{dt} z(t) = R(z(t)) + \int_0^t K(z(t-s), s)\, ds + F(t, x_0). \tag{2.7}$$

The first term R depends only on the reduced variables z at the current time and is Markovian. The second term, known as the memory, depends on the reduced variables z at all times, from the initial time 0 to the current time t. Its integrand involves K, commonly known as the memory kernel. The last term F is called the orthogonal dynamics, which depends on the unknown initial condition of the entire variable $x(0) = x_0$ and is treated as noise. Note that this formulation is an exact representation of the dynamics of the observed variables z. The presence of the memory term makes the system non-autonomous and induces computational challenges. We remark again that, even though various techniques exist to estimate the memory integral (cf. [6, 1, 7, 27, 5, 31, 36]), they rely on knowledge of the full system (2.1) and are thus not applicable in the setting of this paper, where the full system is unknown.

3. Main Method. In this section, we discuss the details of our proposed numerical method. We first present a finite memory approximation to the exact Mori-Zwanzig formulation (2.7). We then discuss a discrete approximation to the finite memory MZ formulation. Our neural network model is then constructed to approximate the unknown discrete approximate MZ formulation.

3.1. Finite Memory Approximation. We make a basic assumption on the memory term of the Mori-Zwanzig formulation (2.7). That is, we assume the memory kernel K in the memory term decays over time, so that for sufficiently large t > 0 the contribution of the distant past to the memory integral is negligible. More specifically, this is defined as follows: for any small $\epsilon > 0$, there exists a constant $T_M > 0$ such that, for all $t \ge T_M$,

$$\left\| \int_{T_M}^{t} K(z(t-s), s)\, ds \right\| \le \epsilon. \tag{3.2}$$

For any small $\epsilon$, the decaying memory assumption implies that the memory term in (2.7) depends only on the reduced variables z from its current state at time t to its recent past up to $t - T_M$. The memory effect from the "earlier time" $[0, t - T_M)$ is negligible, up to the choice of $\epsilon$. We remark that this assumption holds true for many practical physical systems, whose states usually depend on their immediate past and not on a history extending indefinitely backward in time. In other words, the initial conditions have diminishing influence on the system states as time evolves. The constant $T_M$ is independent of t and is called the memory length. Its value is obviously problem dependent. Note that at the early stage of the system evolution when time t is small, i.e., $t < T_M$, the condition (3.2) is trivially satisfied by letting $T_M = \min(T_M, t)$. (Or, by setting z(t) = 0 for t < 0.) Therefore, throughout the rest of the paper we shall use $T_M$ without explicitly stating this trivial case. Upon adopting the decaying memory assumption (3.2), we define the following approximate Mori-Zwanzig (AMZ) dynamical system

$$\frac{d}{dt} \hat z(t) = R(\hat z(t)) + \int_0^{T_M} K(\hat z(t-s), s)\, ds, \tag{3.3}$$

where $T_M$ is the memory length for a chosen small error threshold $\epsilon$ from (3.2), and the noise term F in (2.7) is embedded in the approximation.
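To make the role of the memory length concrete, suppose (as an assumption for illustration only, not a property claimed in this paper) that the kernel obeys an exponential bound $\|K(z(t-s), s)\| \le C e^{-\lambda s}$. The neglected part of the memory integral, over lags $s > T_M$, is then at most $(C/\lambda) e^{-\lambda T_M}$, so any $T_M \ge \lambda^{-1} \ln\big(C/(\lambda \epsilon)\big)$ meets a tolerance $\epsilon$. The function names and the values of C, λ and ε below are hypothetical.

```python
import math

def memory_length(C, lam, eps):
    """Smallest T_M >= 0 with (C / lam) * exp(-lam * T_M) <= eps,
    under the assumed bound |K| <= C * exp(-lam * s)."""
    return max(0.0, math.log(C / (lam * eps)) / lam)

def memory_steps(T_M, dt):
    """Number of time steps of size dt needed to cover T_M (rounded up)."""
    return math.ceil(T_M / dt - 1e-12)

T_M = memory_length(C=1.0, lam=10.0, eps=1e-3)   # ln(100) / 10, about 0.46
n_M = memory_steps(T_M, dt=0.02)                 # 24 steps of size 0.02
```

In practice such a bound is not known for an unknown system, which is why the numerical examples increase the memory length progressively until the results stop changing.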

3.2. Discrete Finite-Memory Approximation. We now consider a discrete representation of the AMZ system (3.3). Let $\hat z_n = \hat z(t_n)$ be the solution at time $t_n = n\Delta$ over a constant time step ∆. Let $T_M > 0$ be the memory length in the AMZ equation (3.3). Let $n_M \ge 1$ be the number of time levels inside the memory range $T_M$, i.e., $T_M = n_M \Delta$. We assume there exists a function M such that, for $n \ge n_M$,

$$\left\| \int_0^{T_M} K(\hat z(t_n - s), s)\, ds - M(\hat z_{n-n_M}, \dots, \hat z_{n-1}, \hat z_n) \right\| \le \eta, \tag{3.4}$$

where $\eta \ge 0$ is the error. That is, we have assumed that the finite memory integral in AMZ (3.3) can be approximated by the $(n_M + 1)$ dimensional function M. Note that this merely assumes that a finite integral can be approximated by using the values of its integrand at a set of discrete locations inside the integration domain. This is a very mild assumption used in the theory of numerical integration, which is a classic numerical analysis topic. For example, one can always choose M to be a certain numerical quadrature rule using the nodes $\hat z_{n-n_M}, \dots, \hat z_n$, with established error behavior. Our approach in this paper, however, does not use any pre-selected approximation rule for the memory integral. Instead, we shall leave M unspecified and treat it as unknown.

With the discrete approximation to the memory integral in place, we now define a discrete approximate Mori-Zwanzig (d-AMZ) equation,

$$\frac{d}{dt} \hat z(t) \Big|_{t = t_n} = R(\hat z_n) + M(\hat z_{n-n_M}, \dots, \hat z_{n-1}, \hat z_n), \tag{3.5}$$

where M is defined in (3.4). Obviously, this is an approximation to the AMZ (3.3) at time level $t = t_n$, where the approximation error stems from the use of (3.4). Note that both R and M on the right-hand side are unknown at this stage.
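As one concrete instance of the quadrature remark above, the sketch below takes M to be the composite trapezoidal rule applied to the truncated memory integral $\int_0^{T_M} K(\hat z(t_n - s), s)\, ds$. The kernel $K(z, s) = e^{-s} z$ and the constant history are hypothetical choices, made only so that the exact value $1 - e^{-T_M}$ is available for comparison; the method of this paper leaves M unspecified and learns it from data.

```python
import numpy as np

def memory_trapezoid(kernel, z_hist, dt):
    """Trapezoidal approximation of int_0^{T_M} K(z(t_n - s), s) ds.

    z_hist[j] holds z(t_n - j * dt) for j = 0, ..., n_M, so that T_M = n_M * dt.
    """
    n_M = len(z_hist) - 1
    lags = dt * np.arange(n_M + 1)
    values = np.array([kernel(z, s) for z, s in zip(z_hist, lags)])
    weights = np.full(n_M + 1, dt)
    weights[0] = weights[-1] = dt / 2.0      # half weights at the two end points
    return weights @ values

def kernel(z, s):
    """Hypothetical memory kernel K(z, s) = exp(-s) * z."""
    return np.exp(-s) * z

dt, n_M = 0.02, 30
z_hist = np.ones((n_M + 1, 1))               # constant history z = 1
approx = memory_trapezoid(kernel, z_hist, dt)
exact = 1.0 - np.exp(-n_M * dt)              # closed form for this kernel and history
```

The point of the sketch is only that a function of the $n_M + 1$ most recent states can reproduce the memory integral to within a small error $\eta$; a neural network plays that role when K itself is unknown.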

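3.3. Neural Network Structure. The d-AMZ formulation (3.5) serves as our foundation for learning the dynamics of the reduced variables z. It indicates that the evolution of z at any time $t_n$ depends not only on its current state $z_n$ but also on a finite number $n_M$ of its past history states $z_{n-1}, \dots, z_{n-n_M}$, where the number $n_M$ depends on the memory length $T_M$ and the time step size ∆. Based on this, we propose to build a deep neural network structure to create a mapping from $(z_n, z_{n-1}, \dots, z_{n-n_M})$ to $z_{n+1}$ and utilize the observational data on z to train the network.

Let us define $D = d \times (n_M + 1)$, the concatenated vector

$$Z = \big( z_n^\top, z_{n-1}^\top, \dots, z_{n-n_M}^\top \big)^\top \in \mathbb{R}^{D}, \tag{3.7}$$

and a $(d \times D)$ matrix

$$\hat I = [\, I_d, 0, \dots, 0 \,],$$

where a $(d \times d)$ size identity matrix $I_d$ is concatenated by $n_M$ zero matrices of size $(d \times d)$. Let

$$N(\cdot\,; \Theta): \mathbb{R}^{D} \to \mathbb{R}^{d}$$

be the operator of a fully connected feedforward neural network with parameter set Θ. We then define a deep neural network in the following manner

$$z^{out} = \big[ \hat I + N \big](Z^{in}). \tag{3.9}$$

An illustration of this network with $n_M = 2$ memory terms is shown in Fig. 3.1. It is straightforward to see that this network creates a mapping

$$z_{n+1} = z_n + N(z_n, z_{n-1}, \dots, z_{n-n_M}; \Theta). \tag{3.10}$$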
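The structure (3.9) can be sketched in a few lines of NumPy. The input stacks the current state and $n_M$ past states, and the output adds a fully connected network to the current state only. The layer widths, the tanh activation, and the names `init_mlp`, `mlp` and `memory_net` are assumptions for illustration; the network here is untrained.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Random weights and zero biases for a fully connected network (hypothetical helper)."""
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Forward pass: tanh on hidden layers, linear output layer."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def memory_net(params, Z, d):
    """z_out = I_hat Z + N(Z); I_hat Z is the first d entries of Z (the current state)."""
    return Z[..., :d] + mlp(params, Z)

rng = np.random.default_rng(0)
d, n_M = 2, 2                            # observable dimension and memory steps (assumed)
D = d * (n_M + 1)
params = init_mlp([D, 20, 20, d], rng)
I_hat = np.hstack([np.eye(d)] + [np.zeros((d, d))] * n_M)   # the d x D matrix [I_d, 0, ..., 0]
Z = rng.normal(size=D)                   # stacked (z_n, z_{n-1}, z_{n-2})
z_next = memory_net(params, Z, d)
```

Slicing the first d entries is equivalent to multiplying by `I_hat`, which is built here only to make that correspondence explicit.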

3.4. Data, Network Training and Prediction. To train the network (3.9), we re-organize the data set (2.2). Let $T_M$ be the memory length. With time step ∆ chosen, we have $n_M = T_M / \Delta$.

Fig. 3.1: Illustration of the proposed neural network with $n_M = 2$ memory steps.

For the trajectory data (2.2) on z, let us consider each of the i-th trajectory data, where $i = 1, \dots, N_T$. We assume $K^{(i)} \ge n_M + 2$. That is, each trajectory needs to contain at least $(n_M + 2)$ data entries. (Otherwise this trajectory is discarded.) We then select a sequence of $(n_M + 2)$ data entries of consecutive time instances along this trajectory and group them into two vectors, with the first one as the concatenation of the first $(n_M + 1)$ entries in the form of (3.7) and the second one as the last entry, in the following form,

$$\big\{ Z_j^{(i)},\; z_j^{(i)} \big\}, \qquad j = 1, \dots, J^{(i)}, \tag{3.11}$$

where $J^{(i)}$ is the number of such selected sequences of length $(n_M + 2)$, and

$$Z_j^{(i)} = \Big( z\big(t^{(i)}_{k_j + n_M}\big)^\top, \dots, z\big(t^{(i)}_{k_j + 1}\big)^\top, z\big(t^{(i)}_{k_j}\big)^\top \Big)^\top, \qquad z_j^{(i)} = z\big(t^{(i)}_{k_j + n_M + 1}\big). \tag{3.12}$$

Here $k_j$ is the "starting" position of this sequence in the i-th trajectory. Obviously, when the total number of data entries along the trajectory is exactly $(n_M + 2)$, the starting position has to be $k_j = 1$ and the number of such groupings is $J^{(i)} = 1$.

When the number of data entries $K^{(i)}$ is more than $(n_M + 2)$, one may choose $J^{(i)} \ge 1$ such groups. We here discuss two straightforward selections.

• Deterministic selection. This is done by selecting the starting position sequentially from $k_j = 1$ to $k_j = K^{(i)} - n_M - 1$, and then for each selected starting position taking a sequence of $(n_M + 2)$ entries to form the group (3.11). This results in $J^{(i)} = K^{(i)} - n_M - 1$ groups.
• Random selection. Choose $J^{(i)} < K^{(i)} - n_M - 1$ as the number of selected groups. Randomly select $J^{(i)}$ starting positions from the index set $\{1, \dots, K^{(i)} - n_M - 1\}$, and for each selected starting position form a group in the form of (3.11).

In our numerical studies, we have found the random selection to be more effective than the deterministic selection.

The aforementioned group selection procedure is then repeated for each trajectory, $i = 1, \dots, N_T$. We then form the training data set by collecting all the groups from (3.11),

$$S = \big\{ (Z_j, z_j) : j = 1, \dots, J \big\}, \tag{3.13}$$

where $J = J^{(1)} + \cdots + J^{(N_T)}$ is the total number of data groupings. This is the training data set, where we have re-labeled the entries using a single index j for computational convenience.
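The grouping procedure can be sketched as follows for a single trajectory stored as an array of shape (K, d). The function name `make_groups`, its arguments, and the use of 0-based starting positions are assumptions of this sketch; passing `n_groups=None` corresponds to the deterministic selection, and an integer to the random selection.

```python
import numpy as np

def make_groups(traj, n_M, n_groups=None, rng=None):
    """Split one trajectory of shape (K, d) into (Z, z_next) training pairs.

    Each Z stacks n_M + 1 consecutive states, newest first; z_next is the state
    that follows them. Trajectories shorter than n_M + 2 yield no pairs.
    """
    K, d = traj.shape
    n_starts = K - n_M - 1                 # number of admissible starting positions
    if n_starts < 1:
        return np.empty((0, d * (n_M + 1))), np.empty((0, d))
    starts = np.arange(n_starts)
    if n_groups is not None:               # random selection without replacement
        if rng is None:
            rng = np.random.default_rng()
        starts = rng.choice(starts, size=min(n_groups, n_starts), replace=False)
    Z = np.stack([traj[k:k + n_M + 1][::-1].reshape(-1) for k in starts])
    z_next = np.stack([traj[k + n_M + 1] for k in starts])
    return Z, z_next

traj = np.arange(10.0).reshape(5, 2)       # toy trajectory with K = 5, d = 2
Z_all, z_all = make_groups(traj, n_M=2)    # deterministic: K - n_M - 1 = 2 groups
Z_rand, z_rand = make_groups(traj, n_M=2, n_groups=1, rng=np.random.default_rng(0))
```

Stacking the outputs of `make_groups` over all trajectories gives arrays that play the role of the single-index data set (3.13).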

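By using the data set (3.13) and our network structure (3.9), we then train the neural network by finding its parameter set $\Theta^*$ that minimizes the mean-squared loss, i.e.,

$$\Theta^* = \operatorname*{argmin}_{\Theta} \frac{1}{J} \sum_{j=1}^{J} \big\| z^{out}(Z_j; \Theta) - z_j \big\|^2, \tag{3.14}$$

where

$$z^{out}(Z_j; \Theta) = \big[ \hat I + N(\cdot\,; \Theta) \big](Z_j)$$

is the network output via (3.9) for input $Z_j$. Upon finding the optimal network parameter $\Theta^*$, we obtain a trained network model

$$z^{out} = \big[ \hat I + N(\cdot\,; \Theta^*) \big](Z^{in}).$$

This in turn defines a predictive model for the unknown dynamical system for the observed variables z,

$$z_{n+1} = z_n + N(z_n, z_{n-1}, \dots, z_{n-n_M}; \Theta^*), \qquad n \ge n_M. \tag{3.16}$$

With initial data on z, one can iteratively apply the network model to predict the evolution of z at later time.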
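The iterative prediction can be sketched as a short loop that keeps the $n_M + 1$ most recent states and feeds them, newest first, to the model. The name `rollout` and the stand-in model `toy_step` (a linear extrapolation used in place of a trained network) are hypothetical.

```python
import numpy as np

def rollout(step, z_init, n_steps):
    """Iterate z_{n+1} = step(Z_n), where Z_n stacks (z_n, z_{n-1}, ..., z_{n-n_M}).

    step   : callable mapping a vector of length d * (n_M + 1) to the next state (length d)
    z_init : array of shape (n_M + 1, d), ordered from oldest to newest
    Returns an array of shape (n_M + 1 + n_steps, d) with initial data and predictions.
    """
    history = [np.asarray(z, dtype=float) for z in z_init]
    n_hist = len(history)
    for _ in range(n_steps):
        Z = np.concatenate(history[:-n_hist - 1:-1])   # last n_hist states, newest first
        history.append(step(Z))
    return np.stack(history)

def toy_step(Z):
    """Hypothetical stand-in for a trained network (d = 1, n_M = 1): linear extrapolation."""
    z_n, z_prev = Z[:1], Z[1:2]
    return z_n + (z_n - z_prev)

pred = rollout(toy_step, z_init=np.array([[0.0], [1.0]]), n_steps=3)
```

Note that the prediction needs $n_M + 1$ initial states rather than a single initial condition, since the model input includes the memory terms.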

It is worthwhile to discuss the difference between the trained network model (3.16) and the Euler forward approximation of the d-AMZ equation (3.5). If the operators R and M in (3.5) are known, its Euler forward approximation takes the following form,

$$\hat z_{n+1} = \hat z_n + \Delta \big[ R(\hat z_n) + M(\hat z_{n-n_M}, \dots, \hat z_{n-1}, \hat z_n) \big]. \tag{3.17}$$

This obviously induces temporal discretization error. In this case of Euler forward the error is O(∆).

Although our model (3.16) and the Euler approximation (3.17) resemble each other, we emphasize that they are fundamentally different. Our neural network model (3.16) is a direct nonlinear approximation to the time average of the right-hand-side of (3.5), whereas the Euler method (3.17) is a pre-selected piecewise constant approximation to the time average. Therefore, the Euler method requires the knowledge of the operators R and M, which is not available in our setting, and has O(∆) temporal error. Our neural network model (3.16), on the other hand, does not contain this temporal error and uses data to directly approximate the operators in (3.5). For autonomous dynamical systems, it was shown that the neural network model is exact in temporal integration ([17]), with the only source of errors being the training error (3.14). Error analysis for the reduced system model (3.16) in this paper is considerably more complicated, and will be pursued in separate studies.

4. Numerical Examples. In this section, we present numerical examples to examine the performance of the proposed learning method. Our examples include two linear systems, where the exact Mori-Zwanzig formulation for the reduced system is available, and two nonlinear systems, one of which is chaotic. Since the true models are available in all examples, we are able to compute their solutions with a high resolution numerical solver. This creates reference solutions, against which we compare the predictions of our neural network models. For the chaotic system, an analytically defined reduced model is also available, and its results are compared against those of our trained reduced network model.

The training data for the reduced variables are synthetic and generated by solving the true systems with high resolution. In each example, we first choose a range of interest for the full variables x = (z; w). This will be the range in which we seek an accurate model for the observed variable z. We then randomly generate $N_T$ initial conditions using the uniform distribution on this range. For each $i = 1, \dots, N_T$, we solve the underlying true system of equations with high resolution and march forward in time with time step ∆. In all examples, we set ∆ = 0.02. Each trajectory is marched forward in time for $K^{(i)} \equiv K = \text{const}$ steps. We then only keep the trajectory data for the observed variables z to create our raw data set (2.2). For benchmarking purposes, we did not add noise to the data. This allows us to examine more closely the properties of the method.

The memory length $T_M$ in (3.2) is problem dependent. In each example, we progressively increase the memory length to achieve converged results. The number of memory steps is then determined as $n_M = T_M / \Delta$. We then randomly select $J^{(i)}$ sequences of $(n_M + 2)$ data entries from each of the i-th trajectory data, where $i = 1, \dots, N_T$, as described in Section 3.4, to form the training data set (3.13). We fix $J^{(i)}$ to be the same constant for all trajectories. The total number of data entries in the training data set (3.13) is then $J = J^{(i)} N_T$. In all the tests, we purposefully keep the number of data J to be roughly 5 to 10 times the number of parameters in the neural network structure. This is to avoid any potential training accuracy loss due to lack of data and/or overfitting, thus allowing us to focus on the properties of the numerical methods. In practical computations when data are limited, proper care needs to be taken during network training. This is a well recognized and well studied topic outside the scope of this paper.
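A quick count shows what this sizing rule implies. The layer widths below are hypothetical (they are not taken from the experiments in this paper); the sketch counts the weights and biases of a fully connected network with input dimension $d (n_M + 1)$ and reports the corresponding range of 5 to 10 times as many data entries.

```python
def count_params(sizes):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(m * n + n for m, n in zip(sizes[:-1], sizes[1:]))

d, n_M = 1, 30                             # one observable, 30 memory steps
sizes = [d * (n_M + 1), 30, 30, 30, d]     # hypothetical: 3 hidden layers of 30 nodes
n_params = count_params(sizes)             # 960 + 930 + 930 + 31 = 2851
J_low, J_high = 5 * n_params, 10 * n_params
```

With $J^{(i)}$ sequences taken per trajectory, the number of trajectories needed is then roughly J divided by $J^{(i)}$.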

system

where is a parameter controlling the decay rate of the solution. We set to be the observed variable and to be the unobserved variable. Note that this is a simplified case of the well documented case

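$$\frac{dz}{dt} = A_{11} z + A_{12} w, \qquad \frac{dw}{dt} = A_{21} z + A_{22} w, \tag{4.2}$$

where the matrices A's are of proper sizes. The exact Mori-Zwanzig dynamics for the observed variable z is known as

$$\frac{dz}{dt} = A_{11} z(t) + A_{12} \int_0^t e^{A_{22} s} A_{21}\, z(t-s)\, ds + A_{12}\, e^{A_{22} t}\, w(0). \tag{4.3}$$

Fig. 4.1: Example 1: fast decay case. Neural network model prediction of the observed variable with memory length $T_M = 0.6$ using four different initial conditions.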

In our example, we set the domain of interest to be = [2]. For data generation, we set all trajectory lengths to be $(n_M + 2)$. Hence, each trajectory contributes one data entry ($J^{(i)} = 1$) to the training data set (3.13). Based on the solution behavior, two cases are presented here: (1) a fast decay case, with the decay parameter set to 2; and (2) a slow decay case, with the decay parameter set to 1.1.

We first consider the fast decay case. In Fig. 4.1, we plot the neural network model prediction of the observed variable for up to t = 20, using $n_M = 30$ memory terms, which corresponds to memory length $T_M = n_M \Delta = 0.6$. We observe that, for four arbitrarily chosen initial conditions, the network predictions match the exact solution well. Since the solution decays to zero quickly, we did not conduct model prediction over a longer term.

Longer-term predictions are conducted for the slow decay case, with the decay parameter set to 1.1, as shown in Fig. 4.2. The NN model is constructed using memory length $T_M = 0.6$. The model prediction again shows good agreement with the exact solution for up to t = 100.

We then examine the effect of different memory lengths in the NN modeling. The prediction errors produced by the models with varying $n_M$ are shown in Fig. 4.3. We observe that the NN models become more accurate as the memory length increases. The accuracy starts to saturate around $n_M = 30$, which corresponds to memory length $T_M = 0.6$. Further increasing the memory length induces no further accuracy improvement.

Fig. 4.2: Example 1: slow decay case. Neural network model prediction of the observed variable with memory length $T_M = 0.6$ using two different initial conditions.

4.2. Example 2: Nonlinear System. We now consider a damped pendulum system, which is a simple nonlinear system,

$$\frac{dx_1}{dt} = x_2, \qquad \frac{dx_2}{dt} = -\alpha x_2 - \beta \sin x_1, \tag{4.4}$$

where $\alpha = 0.1$ and $\beta = 8.91$. The domain of interest is set as = [2] [4]. Only one of the two state variables is observed, and we seek to construct an NN model for its prediction. Our training data sets (3.13) are generated by collecting $J^{(i)} = 5$ sequences of data randomly from each trajectory data of length K = 50. The memory step is tested for $n_M$ = 3, 5, 8, 10, 13, 15, 18, 20. The numerical errors in the model predictions with different memory steps are shown in Fig. 4.4. We observe that the accuracy improvement with increasing $n_M$ starts to saturate for $n_M \ge 10$. The NN model predictive result with $n_M = 20$ is shown in Fig. 4.5. This corresponds to memory length $T_M = n_M \Delta = 0.4$. We again observe very good agreement with the reference solution for long-term integration up to t = 100.

Fig. 4.3: Example 1: NN prediction errors in log scale vs. number of memory terms.

Fig. 4.4: Example 2. Prediction errors in log scale vs. the number of memory steps.

tem ([14])

where $\epsilon > 0$ is a small parameter. In this example, we choose the slow variables to be the observed variables z and let the fast variable y be the unobserved variable. Note that for this system, there exists a homogenized system for the slow variables,

The reduced system is a good approximation for the true system when $\epsilon \ll 1$. Here we will construct NN models for the reduced variables z and compare the prediction results against the true solution of (4.5), as well as those obtained by the reduced system (4.6). We set $\epsilon = 0.01$, in which case the reduced system (4.6) is considered an accurate approximation of the true system.

Fig. 4.5: Example 2. NN model prediction of the observed variable with memory length $T_M = 0.4$ using two arbitrary initial conditions.

The domain of interest is set to be = [10]18]100], which is sufficiently large to enclose the solution trajectories for different initial conditions. To generate the training data set (3.13), we solve the true system (4.5) using randomly sampled initial conditions via a high resolution numerical solver and record the trajectories of z with length K = 100, which corresponds to a time lapse of K∆ = 2. In each trajectory, we randomly select $J^{(i)} = 5$ sequences of data with length $(n_M + 2)$ for our training data set. Different memory steps of $n_M$ = 10, 20, 30, 40, 50, 60, 70, 80 are examined. This corresponds to memory length $T_M$ ranging from 0.2 to 1.6. Our results indicate that $n_M = 60$, i.e., $T_M = 1.2$, delivers accurate predictions. Further increasing the memory length does not lead to better predictions.

The evolution of the prediction errors, measured against the reference true solution, is shown in Fig. 4.6, for both our neural network model and the reduced model (4.6), for long term prediction up to t = 400. Here the errors are averaged over 100 simulations using randomly selected initial conditions. We observe that our NN model produces noticeably more accurate results than the reduced system (4.6), even for the case of small $\epsilon$, when (4.6) is supposed to be highly accurate. More importantly, the NN model exhibits much smaller error growth over long time, compared to the reduced system (4.6). The solution behavior is shown in Fig. 4.7, for long-term integration up to t = 400 for the first component of z. (The behavior of the other components is similar.) The NN model produces visually better results than the reduced system (4.6), when compared to the true solution, especially in terms of capturing the phase/frequency of the solution.

Fig. 4.6: Example 3. Errors in model predictions over time, by the NN model with $T_M = 1.2$ and the reduced system (4.6). The errors are averaged over 100 simulations of different initial conditions.

system involving 20 state variables x = (,

where $I$ is the identity matrix of size $10 \times 10$, and $\Sigma_{ij}$, i = 1, 2, j = 1, 2. We set the observed variables to be $z = p$ and let the remaining variables be the unobserved variables. Note that, as a linear system (4.2), the exact Mori-Zwanzig equations for the reduced variables z are available in analytical form (4.3). We set the entries of $\Sigma_{ij}$ to be small and consider them as perturbations to an oscillatory system. The exact values of the entries of the matrices $\Sigma_{ij}$ are presented in the Appendix, for self completeness of the paper.

The domain of interest is set to be = [2]. In generating the training data sets, we randomly select $J^{(i)} = 5$ sequences of data from each trajectory of z with length K = 100. We test different memory steps for $n_M$ = 10, 15, 20, 25, 30, 35, 40, 45, 50. Our experiments show that $n_M = 30$, which corresponds to a memory length $T_M = 0.6$, provides accurate prediction. Further increasing the memory length does not lead to better prediction accuracy. Our NN model predictions for long-term integration up to t = 150 are shown in Fig. 4.8. Compared to the reference solution obtained from the true system, we observe very good agreement, where the NN predictions overlap the true solutions and are visually indistinguishable from them.

Fig. 4.7: Example 3. Long-term model prediction of the first component of z by the NN model with $T_M = 1.2$ and the reduced system (4.6). Zoomed view for $t \in [200, 400]$ with three arbitrarily chosen initial conditions.

5. Conclusion. We presented the construction of a deep neural network (DNN) model for approximating unknown dynamical systems when only a subset of the variables is observed. The DNN model then provides a reduced model for the unknown dynamical system. Based on the Mori-Zwanzig (MZ) formulation for reduced systems, we established a discrete Mori-Zwanzig formulation under a finite memory assumption. We then designed a straightforward DNN structure to explicitly incorporate the system memory into the predictive model. Numerical tests on both linear and nonlinear systems demonstrated good accuracy of the DNN models. This invites further in-depth study of the approach, both theoretically and numerically.

Fig. 4.8: Example 4. NN model prediction of p for up to t = 150, with memory length $T_M = 0.6$.

REFERENCES

[1] D. Bernstein, Optimal prediction of Burgers's equation, Multiscale Model. Simul., 6 (2007), pp. 27–52.

[2] C. Brennan and D. Venturi, Data-driven closures for stochastic dynamical systems, J. Comput. Phys., 372 (2018), pp. 281–298.

[3] S. L. Brunton, J. L. Proctor, and J. N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proc. Natl. Acad. Sci., 113 (2016), pp. 3932–3937.

[4] Z. Chen and D. Xiu, On generalized residue network for deep learning of unknown dynamical systems, J. Comput. Phys., submitted (2020).

[5] A. Chertock, D. Gottlieb, and A. Solomonoff, Modified optimal prediction and its application to a particle-method problem, J. Sci. Comput., 37 (2008), pp. 189–201.

[6] A. J. Chorin, O. H. Hald, and R. Kupferman, Optimal prediction with memory, Physica D: Nonlinear Phenomena, 166 (2002), pp. 239–257.

[7] O. Hald and P. Stinis, Optimal prediction and the rate of decay for solutions of the Euler equations in two and three dimensions, Proc. Natl. Acad. Sci., 104 (2007), pp. 6527–6532.

[8] K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.

[9] S. H. Kang, W. Liao, and Y. Liu, IDENT: Identifying differential equations with numerical time evolution, arXiv preprint arXiv:1904.03538, (2019).

[10] H. Lei, N. Baker, and X. Li, Data-driven parameterization of the generalized Langevin equation, Proc. Natl. Acad. Sci., 113 (2016), pp. 14183–14188.

[11] Z. Long, Y. Lu, and B. Dong, PDE-Net 2.0: Learning PDEs from data with a numeric-symbolic hybrid deep network, arXiv preprint arXiv:1812.04426, (2018).

[12] Z. Long, Y. Lu, X. Ma, and B. Dong, PDE-net: Learning PDEs from data, in Proceedings of the 35th International Conference on Machine Learning, J. Dy and A. Krause, eds., vol. 80 of Proceedings of Machine Learning Research, Stockholmsmässan, Stockholm, Sweden, 10–15 Jul 2018, PMLR, pp. 3208–3216.

[13] H. Mori, Transport, collective motion, and Brownian motion, Progress of Theoretical Physics, 33 (1965), pp. 423–455.

[14] G. Pavliotis and A. Stuart, Multiscale methods: averaging and homogenization, Springer, 2008.

[15] T. Qin, Z. Chen, J. Jakeman, and D. Xiu, A neural network approach for uncertainty quantification for time-dependent problems with random parameters, Inter. J. Uncertainty Quantification, submitted (2020).

[16] T. Qin, K. Wu, and D. Xiu, Data driven governing equations approximation using deep neural networks, J. Comput. Phys., 395 (2019), pp. 620–635.

[17] T. Qin, K. Wu, and D. Xiu, Structure-preserving method for reconstructing unknown Hamiltonian systems from trajectory data, SIAM J. Sci. Comput., submitted (2019).

[18] M. Raissi, Deep hidden physics models: Deep learning of nonlinear partial differential equations, Journal of Machine Learning Research, 19 (2018), pp. 1–24.

[19] M. Raissi, P. Perdikaris, and G. E. Karniadakis, Physics informed deep learning (part i): Data-driven solutions of nonlinear partial differential equations, arXiv preprint arXiv:1711.10561, (2017).

[20] M. Raissi, P. Perdikaris, and G. E. Karniadakis, Physics informed deep learning (part ii): Data-driven discovery of nonlinear partial differential equations, arXiv preprint arXiv:1711.10566, (2017).

[21] M. Raissi, P. Perdikaris, and G. E. Karniadakis, Multistep neural networks for data-driven discovery of nonlinear dynamical systems, arXiv preprint arXiv:1801.01236, (2018).

[22] S. H. Rudy, S. L. Brunton, J. L. Proctor, and J. N. Kutz, Data-driven discovery of partial differential equations, Science Advances, 3 (2017), p. e1602614.

[23] S. H. Rudy, J. N. Kutz, and S. L. Brunton, Deep learning of dynamics and signal-noise decomposition with time-stepping constraints, J. Comput. Phys., 396 (2019), pp. 483–506.

[24] H. Schaeffer, Learning partial differential equations via data discovery and sparse optimization, Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 473 (2017).

[25] H. Schaeffer and S. G. McCalla, Sparse model selection via integral terms, Phys. Rev. E, 96 (2017), p. 023302.

[26] H. Schaeffer, G. Tran, and R. Ward, Extracting sparse high-dimensional dynamics from limited data, SIAM Journal on Applied Mathematics, 78 (2018), pp. 3279–3295.

[27] P. Stinis, Higher order Mori-Zwanzig models for the Euler equations, Multiscale Model. Simul., 6 (2007), pp. 741–760.

[28] Y. Sun, L. Zhang, and H. Schaeffer, NeuPDE: Neural network based ordinary and partial differential equations for modeling time-dependent data, arXiv preprint arXiv:1908.03190, (2019).

[29] R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society. Series B (Methodological), (1996), pp. 267–288.

[30] G. Tran and R. Ward, Exact recovery of chaotic systems from highly corrupted data, Multiscale Model. Simul., 15 (2017), pp. 1108–1129.

[31] D. Venturi and G. Karniadakis, Convolutionless Nakajima-Zwanzig equations for stochastic analysis in nonlinear dynamical systems, Proc. R. Soc. A, 470 (2014), p. 20130754.

[32] Q. Wang, N. Ripamonti, and J. Hesthaven, Recurrent neural network closure of parametric POD-Galerkin reduced-order models based on the Mori-Zwanzig formalism, J. Comput. Phys., https://doi.org/10.1016/j.jcp.2020.109402 (2020).

[33] K. Wu, T. Qin, and D. Xiu, Structure-preserving method for reconstructing unknown Hamiltonian systems from trajectory data, arXiv preprint arXiv:1905.10396, (2019).

[34] K. Wu and D. Xiu, Numerical aspects for approximating governing equations using data, J. Comput. Phys., 384 (2019), pp. 200–221.

[35] K. Wu and D. Xiu, Data-driven deep learning of partial differential equations in modal space, J. Comput. Phys., 408 (2020), p. 109307.

[36] Y. Zhu and D. Venturi, Faber approximation of the Mori-Zwanzig equation, J. Comput. Phys., 372 (2018), pp. 694–718.

[37] R. Zwanzig, Nonlinear generalized Langevin equations, Journal of Statistical Physics, 9 (1973), pp. 215–220.

The detailed setting of Example 4 is

where I is the identity matrix of size 10 × 10, and Σ_ij, i = 1, 2, j = 1, 2. The matrices Σ_ij are defined as follows.
