Structure-preserving Method for Reconstructing Unknown Hamiltonian Systems from Trajectory Data

2019·arXiv

Abstract

KAILIANG WU , TONG QIN , AND DONGBIN XIU∗ Abstract. We present a numerical approach for approximating unknown Hamiltonian systems using observational data. A distinct feature of the proposed method is that it is structure-preserving, in the sense that it enforces the conservation of the reconstructed Hamiltonian. This is achieved by directly approximating the underlying unknown Hamiltonian, rather than the right-hand-side of the governing equations. We present the technical details of the proposed algorithm and its error estimate in a special case, along with a practical de-noising procedure to cope with noisy data. A set of numerical examples are presented to demonstrate the structure-preserving property and eﬀectiveness of the algorithm. Key words. data-driven discovery, Hamiltonian systems, equation approximation, structure-preserving method

1. Introduction. Data-driven discovery of physical laws has received an increasing amount of attention recently. While earlier attempts such as [2, 37] used symbolic regression to select the proper physical laws and determine the underlying dynamical systems, more recent efforts tend to treat the problem as an approximation problem. In this approach, the sought-after governing equation is treated as an unknown target function relating the data of the state variables to their temporal derivatives. Methods along this line of approach usually seek to exactly recover the equations by using certain sparse approximation techniques (e.g., [40]) from a large set of dictionaries; see, for example, [4]. Many studies have been conducted to effectively deal with noise in data [4, 35], corruptions in data [41], limited data [36], partial differential equations [32, 34], etc, and in conjunction with other methods such as model selection approach [22], Koopman theory [3], and Gaussian process regression [28], to name a few. Standard approximations using polynomials without seeking exact recovery can also be effective (cf. [44]). More recently, there is a surge of work that tackles the problem using machine learning methods, particularly via neural networks [29, 30], to systems involving ordinary differential equations (ODEs) [10, 31, 33, 26, 25] and partial differ-ential equations (PDEs) [23, 21, 15, 13, 27, 20, 45]. Neural network structures such as residual network (ResNet) were shown to be highly suitable for this type of problems [26].

Hamiltonian systems are an important class of governing equations in science and engineering. One of the most important properties of Hamiltonian systems is conservation of Hamiltonian, usually a nonlinear function of state variables, along trajectories. Research efforts have been devoted to estimating Hamiltonian function of a given system from measurements; cf. [38, 43, 11, 1, 19]. However, few studies exist to reconstruct an unknown Hamiltonian dynamical systems from trajectory data of state variables.

In this paper, we present a numerical approach to reconstruct an unknown Hamiltonian system from its trajectory data. The focus of this paper is on the conservation of the reconstructed Hamiltonian along the solution trajectories. The current method is an extension of the method proposed in [44], which seeks accurate approximation of unknown governing equations using orthogonal polynomials. However, instead of

∗Department of Mathematics, The Ohio State University, Columbus, OH 43210, USA.

wu.3423@osu.edu, qin.428@osu.edu, xiu.16@osu.edu. Funding: This work was partially sup-

ported by AFOSR FA9550-18-1-0102. 1

approximating the governing equations directly, as in [44] and most of other existing studies, our current method seeks to approximate the unknown Hamiltonian first and then reconstruct the approximate governing equations using the approximate Hamiltonian. The approximation of the unknown Hamiltonian is conducted using orthogonal polynomials and with controllable numerical errors. Since in most practical situations, the Hamiltonian takes the form of smooth functions, polynomial approximation can achieve high order accuracy with modest degree polynomials. The resulting approximate governing equations, which are derived from the reconstructed Hamiltonian, can then automatically satisfy the conservation of the reconstructed Hamiltonian, which is an accurate approximation of the true Hamiltonian. This structure preserving (SP) property—the conservation of Hamiltonians along trajectories—is a distinctly new feature of our present method, not found in most of the existing studies. Along with a detailed exposition of the algorithm, we also provide an error estimate of the method in a special case and use a set of numerical examples to demonstrate the properties of the method.

2. Preliminaries. In this section, we introduce some basics about Hamiltonian systems and the setup of our data-driven discovery of Hamiltonian systems.

2.1. Hamiltonian Systems. Let us consider a Hamiltonian system

where p and q are column vectors in , and H(p, q) is a continuously differentiable scalar function called Hamiltonian. The Hamiltonian often represents the total energy of the system. It is not unique and is defined up to an arbitrary constant.

Let u := (be the state variable vector. The Hamiltonian system (2.1) can be equivalently written as

where stands for full gradient and the matrix J takes the form

with and being identity matrix and zero matrix of size , respectively. Hereafter, we will use in place of , unless confusion arises otherwise.

The Hamiltonian system (2.2) is an autonomous system and conserves the Hamiltonian along the integral curves (c.f., [24]). That is, the solution of the Hamiltonian system (2.2) satisfies

where is the initial state of the system at t = 0, and ) stands for the solution u at time t with an initial state .

2.2. Data and Problem Setup. We assume that the equations of the Hamiltonian system (2.1), or (2.2), are not known. Our data-driven approach for approximating the unknown equations requires the availability of a set of data pairings between the solution states and their corresponding time derivatives, in the form of

where ˙x denotes the time derivative of x, and K is the total number of data pairs.

2.2.1. Direct Data Collection. Let be a bounded domain. It is the domain-of-interest, inside which we seek to construct an accurate approximation to the unknown governing equations (2.2). Our data set (2.4) shall be collected in the domain D.

When time derivatives of the state variables are readily available, either directly measured from experiments or computed via certain numerical simulation techniques, the data collection procedure is straightforward. Let 1 be the number of solution trajectories, originated from initial states. Let 0 = be a sequence of time instances on the m-th trajectory, for m = 1, . . . , M. We assume that the state variables data and their derivative data are available on these time instances, i.e., for 0 and 1 ,

where and are errors/noises in the data for the state variables and their time derivatives, respectively.

Once all the data pairs are collected, they are grouped in the set (2.4), where we omit the subscripts and superscripts for notational convenience. This is because our method for approximating the governing equation using the data does not utilize the trajectory or time instance information associated with each data pair.

2.2.2. Time Derivatives Approximation and De-noising. In many practi- cal situations, time derivative data are not available. Consequently, one only possesses trajectory data for the state variables. In this case, it is necessary to estimate the time derivatives of the state variables via a numerical procedure.

Again, let M be the number of trajectories where only the state variable data are available. For notational convenience we let 0 = be a sequence of the same time instances on all trajectories, where the state variable data

are available. Again, stands for errors/noises in the data. To numerically estimate the time derivatives, it is necessary to require 1, i.e., there need to be at least two data entries of the state variables along each trajectory in order to estimate the time derivatives.

For noiseless data, i.e. , time derivatives can be computed by straightforward numerical differentiation. For example, for equally spaced time instances with uniform step-size ∆t, a second-order finite difference

with proper one-sided second-order finite difference at the end points j = 0 and j = J. This requires at least three data entries on each trajectory, i.e., 2, and induces errors of ). Higher order approximations requires more data points on each trajectory.

For noisy state variable data with , direct numerical differentiation is less robust, as the errors in estimating ) would scale as . Several techniques have been developed for numerical differentiation of noisy data. See, for example, [16, 42, 9, 5, 17]. In this paper, we employ a straightforward de-noising approach, which has been shown to be effective for equation recovery ([44]). We first construct a least squares polynomial approximation of the trajectory using the available data on x, and then analytically differentiate the least squares fitted polynomial to obtain an estimate of the time derivatives. More specifically, for each trajectory, the least squares polynomial approximation is to find a polynomial vector such that

where denotes vector 2-norm and denotes the space of one-dimensional polynomials of degree at most Q, with 1 . Once the least squares fitting problem is solved, the time derivatives can be approximated by differentiating the polynomials, i.e.,

This approach also provides a filter to de-noise the noisy trajectory data. For noisy data, we advocate the use of the filtered trajectory data to replace the original noisy data, i.e.,

Our numerical experiments indicate that this filtering procedure can improve the learning accuracy for noisy data. This is similar to the results from [44].

Remark 2.1. If the true trajectories ) are non-smooth, estimating time derivatives using global approximation (2.9) might not be sufficiently accurate. In this case, piecewise approximation should be considered.

3. The Main Method. With the data pairs (2.4), our goal is now to accurately approximate the unknown Hamiltonian system (2.2). Let f := , which is the unknown right-hand-side of (2.2). We seek an accurate approximation such that

is an accurate approximation of the true system (2.2). Our key goal is to ensure the approximate system is also Hamiltonian, in the sense that , where becomes an approximation to the true (and unknown) Hamiltonian. The existing methods for equation recovery seek to approximate the right-hand-side of the true system directly and therefore do not enforce the conservation of Hamiltonian.

3.1. Algorithm. To preserve the Hamiltonian, we propose to directly approximate the unknown Hamiltonian first and then derive the approximate governing equations from the approximate Hamiltonian.

Let us assume the unknown Hamiltonian ), which is a weighted Sobolev space on domain equipped with inner product

where ) is a (probability) measure defined on D. Let ) be a finite dimensional subspace. We then define its associated gradient function space as

Let be a basis for V. Then, for each j = 1, . . . , N, there exists a function such that

We then seek as an approximation to the true Hamiltonian H and = as an approximation to . Assume that N < K, i.e., the dimension of the linear subspace is smaller than the total number of available data pairs (2.4), we then define the following least squares problem

This provides a class of approximate Hamiltonians which differ only in an additive constant C:

where the constant C can be arbitrarily chosen and does not affect the resulting approximate Hamiltonian system (3.1). In particular, when taking C = 0, we use the notation for . The problem (3.4) is then equivalent to the following problem for the unknown coefficients c = (

where

with

This is an over-determined system of equations and can be readily solved. Upon solving this least squares type problem, we obtain and subsequently , which gives us the approximate system of equations (3.1). It is trivial to see that the system preserves the approximate Hamiltonian in the following sense.

Theorem 3.1. Let ) be the solution of the system (3.1) with initial state , then,

3.2. Analysis. We now present error analysis for the proposed algorithm in a special case. Our analysis is based on a few basic results from [7] for least squares polynomial approximations, which requires the following assumptions on the basis functions and the data.

mal in the following sense

Note that this assumption is only needed for the theoretical analysis. The practical computation of can be conducted by using any basis of V, for the solution does not depend on the basis. (Also, any non-orthogonal basis can be orthogonalized via Gram-Schmidt procedure.) We remark that the choice of basis affects the stability of the least squares problem (3.7). The actual computation of can be made using any known basis of W, since the solution to the problem (3.4) is independent of the chosen basis. Thus, the error estimates in Section 3.2.3 also hold for any other bases of W.

We assume that data for the state variable = 1, 2, . . . , K, are i.i.d. drawn from a probability measure ) on D. This is a standard assumption, made mostly to facilitate theoretical analysis. See, for example, [7].

3.2.2. Stability. The following stability result holds for the least squares prob- lem (3.7). Lemma 3.2. Consider the problem (3.7)

:= (1 + ) log(1 +

Proof. The proof is a direct extension of the proof of Theorem 1 in [7] (see also [8] for a correction).

Remark 3.1. The function is the “diagonal” of the reproducingkernel of V. It is independent of the choice of the orthonormal basis and only depends

on the space V and the measure . The following result is a direct consequence of Lemma 3.2 with = . Corollary 3.3. The least squares problem (3.7) is stable in the following sense:

for any r > 0,

provided that

3.2.3. Error bound. To analyze errors in the proposed algorithm, we consider only noiseless data case. For noisy data, the analysis is considerably more involved and will be pursued in a separate study.

For noiseless data, we consider the more practical case when only state variable data are available and time derivative data are computed numerically. Since the state variable data are noiseless, the only errors in the data set (2.4) are the numerical approximation errors for the time derivatives, as discussed Section 2.2.2. We assume the approximation errors := ˙are uniformly bounded, i.e.,

where is an assumed bound that depends on the regularity of u(t) in the time interval [0] and the accuracy of the numerical differentiation method.

For any r > 0, under the condition (3.13), it holds that

where the expectation E is taken over the random sequences of is defined in (3.13), L is the bound defined in (3.15), ) is defined by

and Π) denotes the orthogonal projector of onto V, i.e., the best approximation to in V,

We now discuss error bound for the reconstructed Hamiltonian

Note that Hamiltonian is not unique and is defined up to an additive constant C. Therefore, the error between ) and H(x) should be understood in the quotient space .

where is a constant depending only on the domain D and the dimensionality d. Furthermore, under the assumptions of Corollary 3.5, we have

Proof. Let us take

Using the Poincar´e inequality (cf. [18]) we obtain (3.17). The proof is then completed upon combining (3.17) with Corollary 3.5.

4. Numerical Examples. In this section we present numerical examples to demonstrate the properties and effectiveness of the proposed method.

In all the test cases, we generate synthetic trajectory data by solving the underlying Hamiltonian systems using a high resolution numerical solver. More specifically, we use the classical fourth-order explicit Runge-Kutta method (cf. [12, p. 131]) with a very small time step of size 0.0001∆t. The proposed numerical method is then applied to the data to produce the corresponding approximate Hamiltonian systems, whose solutions are then compared against the solutions to the true Hamiltonian systems to examine numerical errors. Note that in all of the tests the only available data are on the solution state variables. The time derivatives of the states are estimated numerically using the procedure discussion in Section 2.2.2.

For convenience, we assume the computational domain D to be a hypercube. Without loss of generality, we employ polynomial basis functions in all the numerical examples. Specifically, we set the finite dimensional subspace W as , the linear space of 2d-dimensional polynomials of total degrees up to 1. That is,

where i = () is multi-index with . In all examples, we use the tensor products of univariate Legendre polynomials as a basis on the hypercube domain D, which are commonly used in many practical applications. See, for example, [39]. Although the Legendre polynomials do not satisfy the orthogonality defined at the beginning of Section 3.2, our numerical results indicate that they are a good choice. Note that the solution to the least squares problem (3.4) is independent of basis choice.

The gradient function space V is defined via (3.2), and we have

The basis functions of V are set as ) = = 1, . . . , N, where are the Legendre polynomials in W.

For noiseless data, we employ second-order finite difference method to estimate the time derivatives. For noisy data, we use the polynomial least squares de-noising (2.9) with a polynomial degree of Q = 5. The detail of the time derivative estimation is discussed in See Section 2.2.2.

Once the approximate system (3.1) is constructed, we simulate its trajectories for some arbitrarily chosen initial state , which is not in the training data (2.4), and then compare the errors against the trajectories u produced by the exact Hamiltonian system from the same initial state . All errors are reported as relatively errors in the following form for any 0:

Example 1: Single pendulum. The Hamiltonian of an ideal single pendulum with unit mass is its total energy

where l is the length of the pendulum, q is the angular displacement of the pendulum from its downward equilibrium position, p the angular momentum, and g = 9.8 the

gravitational constant. The true Hamiltonian formulation of the dynamics is

We set l = 1 and the computational domain D = (). The data pairs (2.4) consist of M = 500 short trajectories, each of which is generated by random initial state in D and contains J = 40 steps. Hereafter, the random initial states are independently drawn from the uniform distribution over D. All data are then perturbed by a multiplicative factor (1+), where is i.i.d. uniform distributed in [08]. This corresponds to 8% relative noise in all data.

6. The numerical solution of the approximate Hamiltonian system is denoted as ). To assess the accuracy of the algorithm, we set an arbitrarily chosen initial state = (876193)and solve both the approximate solution ) and the exact solution ). For cross-comparison, we also implemented the equation approximation algorithm from [44], which directly approximates the right-hand-side of the unknown governing equations and thus, in general, does not preserve any Hamiltonian. We denote this solution ). In Fig. 4.1(a), we plot the evolution of the relative errors in the numerical solutions. We clearly observe that the errors in our structure-preserving (SP) algorithm is notably smaller than the non-SP algorithm from [44]. In Fig. 4.1(b), we examine the time evolution of the Hamiltonians. The exact Hamiltonian is obviously conserved along the exact solution trajectory, i.e., )) = ). As expected from Theorem 3.1, the approximate Hamiltonian of the recovered system (3.1) is also exactly preserved along its trajectory. The only (small) errors in the computed Hamiltonian may (merely) arise from the ODE solver, which we employ to numerically solve the reconstructed system; see Figure 4.2 for the Hamiltonian deviation ∆ ) := ) computed by the classical fourth-order explicit Runge-Kutta solver with different time step-sizes . We clearly observe that the errors in the Hamiltonian deviation decrease quickly as we reduce , and the errors are close to the level of round-off error when = 2. We also list the , and total variation norms of the computed ∆ ) in Table 4.1, which shows that the errors in the Hamiltonian deviation converge to zero at a order related to the employed ODE solver. These results further confirm that the recovered system (3.1) does exactly preserve the approximate Hamiltonian . Note that the non-SP method from [44], albeit quite accurate, generally does not preserve or relate to any Hamiltonian. For the present test case, we have examined that the system recovered by the non-SP method, denoted by

is not a Hamiltonian system, because it does not satisfy = 0, so that there is no Hamiltonian ) satisfying and . Note that the data are noisy, and the approximate functions g(p, q) and h(p, q) are not univariate, not as the functions in the true system (4.1).

The advantage of the proposed SP algorithm is more notable in Fig. 4.3, we present system predictions over longer time. The SP method is able to accurately capture the phase of the solution much better than the non-SP method in [44].

Note that the de-noising procedure (2.11) has been applied in the computation. For comparison, we also apply the proposed SP learning method without using the de-noising procedure (2.11). The results are plotted in Fig. 4.4. Direct comparison of the numerical errors obtained by the two approaches is shown Fig. 4.5. It is evident that the results obtained without the de-noising procedure are less accurate than those by using de-noising.

(a) Evolution of relative errors (b) Evolution of Hamiltonian

Fig. 4.1. Example 1: Solutions of the reconstructed system with an initial state . Left: relative errors against the true solution by the SP method (non-SP method (); Right: time evolution of the Hamiltonian.

(a) τ = 1 × 10−3 (b) τ = 5 × 10−4 (c) τ = 2.5 × 10−4

Fig. 4.2. Example 1: evolution of the Hamiltonian deviation of the reconstructedsystem computed by the fourth-order Runge-Kutta solver with different time step-sizes

whose Hamiltonian is

We set the parameters = 1 and = 1.1 and the computational domain D as [1]. We use M = 300 noiseless short trajectory data, each of which contains J = 2 intervals (i.e. 3 data points). The degree of the polynomials for approximation of the Hamiltonian is n = 6. The reconstructed system is solved with an initial state = (0and compared against the solution of the true system. The relative numerical errors in the solutions of the SP algorithm (denoted as ) and non-SP algorithm from [44] (denoted as ) are plotted in Fig. 4.6, along with the time evolution of the reconstructed Hamiltonian. The higher accuracy of the SP algorithm is again evident from the plot, as it induces smaller errors over long-term integration and preserves the approximate Hamiltonian along its trajectory. In Fig. 4.7 and Fig. 4.8, we present the trajectories and the phase plots generated by the reconstructed system. The advantage of the new SP algorithm is again notable, as it is able to preserves both the phase and amplitude of the solution much better over long-term integration.

(a) p(t) of the learned SP system (b) q(t) of the learned SP system (c) p(t) of the learned non-SP system (d) q(t) of the learned non-SP system

Fig. 4.3. Example 1: Long-term solution of the reconstructed system by the SP algorithm (top plots) and non-SP algorithm (bottom plots), with initial state (Solid lines represent solution of the true system.)

(a) p(t) of the SP method without de-noising (b) q(t) of the SP method without de-noising

Fig. 4.4. Example 1: Long-term solution of the reconstructed systems by the SP algorithm without the de-noising procedure (2.11). (Solid lines represent solution of the true system.)

Example 1: The errors and convergence rates of the Hamiltonian deviation puted by the fourth-order Runge-Kutta solver with different time step-sizes

Fig. 4.5. Example 1: Evolution of relative errors against the true solution by the SP method with the de-noising procedure () and without the procedure (), respectively.

We now consider the H´enon-Heiles system [14],

where the Hamiltonian is

This system is used to describe the motion of stars around a galactic center. Chaotic behavior of the solution will appear when the Hamiltonian is larger than 1/8 ([14]). For our numerical tests, we choose the computational domain D to be [1]and employ M = 500 trajectories, each of which contain J = 2 intervals. Polynomials of degree up to n = 3 are used to approximate the Hamiltonian. The reconstructed system is solved with an initial state = (025)and compared against the true solution. The time evolution of the reconstructed Hamiltonian and the numerical error in the solution are plotted in Fig. 4.9. We observe sufficiently small

(a) Evolution of relative errors (b) Evolution of Hamiltonian

Fig. 4.6. Example 2: Solutions of the reconstructed system with an initial state . Left: relative errors against the true solution by the SP method () and non-SP method (); Right: time evolution of the Hamiltonian.

(a) p(t) of the learned SP system (b) q(t) of the learned SP system (c) p(t) of the learned non-SP system (d) q(t) of the learned non-SP system

Fig. 4.7. Example 2: Comparison of solutions of the learned SP and non-SP systems with the solution of the true system for the initial state . Solid lines represent solution of the true system and the (blue) circles are solutions of the approximate ones.

(a) Exact (b) SP (c) non-SP

Fig. 4.8. Example 2: Phase plots on plane starting from the initial state

and stable numerical errors and good conservation of the Hamiltonian over relatively long-term integration.

In Fig. 4.10 and Fig. 4.11, the trajectory plots and phase plots for the reconstructed system using the new SP algorithm are presented, along with those from the true system as reference. The solutions exhibit non-trivial behavior. And the reconstructed system is able to accurately produce the solutions.

(a) Evolution of relative errors (b) Evolution of Hamiltonian

Fig. 4.9. Example 3: Solutions of the reconstructed systems with an initial state . Left: time evolution of the relative errors; Right: time evolution of the Hamiltonian.

Example 4: Cherry problem. We now consider the Cherry Hamiltonian system [6],

(a) p1(t) (b) p2(t) (c) q1(t) (d) q2(t)

Fig. 4.10. Example 3: Comparison of the solutions of the reconstructed system

(circles) with those of the true system (solid lines), for the same initial state

(a) Plase plots on q1-p1 plane

(b) Plase plots on q2-p2 plane

whose true Hamiltonian is

We take the computational domain as D = (2) 2) 1) 1) and use M = 500 short trajectory data, each of which contain J = 2 intervals. Polynomials of degree up to n = 3 are employed to approximate the Hamiltonian. The reconstructed system is then solved using an arbitrarily chosen initial state = (. The solutions are then compared against those from the true system with the same initial state. The relative errors in the numerical prediction are plotted in Fig. 4.12, along with the time evolution of the reconstructed Hamiltonian and the true Hamiltonian H. We again observe good accuracy by the SP algorithm and conservation of the approximate Hamiltonian. The solution states are plotted in Figs. 4.13, and their phase plots in 4.14. The numerical solutions agree with the true solutions well.

(a) Evolution of relative errors (b) Evolution of Hamiltonian

Fig. 4.12. Example 4: Solutions of the reconstructed system with an initial state . Left: time evolution of the relative error; Right: time evolution of the Hamiltonian.

Example 5: Double pendulum. Finally, we consider a double pendulum problem, as illustrated in Fig. 4.15. Two masses and are connected via massless rigid rods of length and , and and are the angles of the two rods with respect to the vertical direction. We define the canonical momenta of the system as

By letting and , the Hamiltonian of the system is

H(p, p, q, q) = mp+ (m+ m) lp2mllppcos(q) 2ml+ msin(q

(a) p1(t) (b) p2(t) (c) q1(t) (d) q2(t)

Fig. 4.13. Example 4: Comparison of solutions of the reconstructed systems (circles) with the solution of the true system (solid lines), with the initial state

(a) Plase plots on p1-p2 plane

(b) Plase plots on q1-q2 plane

Fig. 4.15. A diagram of the double pendulum.

where

In the numerical experiment, we set = 1 and g = 9.8, and set the computational domain as D = (5) 4) 1) 1). The Hamiltonian in this example is notably more complicated than the ones in the previous examples. Consequently, we employ a higher order polynomial, of degree up to n = 15, to conduct the approximation. The data set include M = 20, 000 short trajectories, each of which contains J = 2 intervals. The reconstructed Hamiltonian system is then solved with an arbitrarily chosen initial state = (0)for up to T = 20. Its solution is compared against the reference solutions from the true system with the same initial state. Fig. 4.16 shows the evolution of the reconstructed Hamiltonian and the true one, which remain constant as expected. The time evolution of the solution states are plotted in Fig. 4.17. We observe good agreement with the true solution. The corresponding phase plots are further displayed in Fig. 4.18, where we also plot the results obtained by using higher order polynomial approximations (n = 16 with M = 20, 000 and n = 18 with M = 60, 000) to show the convergence. We see that the evolution of trajectories is more accurately predicted by the reconstructed Hamiltonian system obtained by using higher degree polynomials. The relative numerical errors, shown in Fig. 4.19, further validate the convergence behavior.

Fig. 4.16. Example 5: Time evolution of the Hamiltonian of the reconstructed system with an initial state

5. Conclusion. We presented a structure-preserving numerical method for reconstructing unknown Hamiltonian systems using observation data. The key ingredient of the method is to approximate the unknown Hamiltonian first and then derive the approximate equations using the reconstructed Hamiltonian. By doing so, the reconstructed system is able to preserve the approximate Hamiltonian along its trajectories. This is an important property often desired by many practical applications. We presented the algorithm, its error estimate in a special case and used a variety of examples to demonstrate the effectiveness of the approach. In its current form, polynomials are used to construct the approximation. Other forms of approximation, such as neural networks, will be explored in a separate work.

Appendix A. Proof of Theorem 3.4. The technique used in the following proof is similar to the proof of Theorem 3 of [7]. However, our Theorem 3.4 applies to vector-valued function in gradient function space. This prevents direct use, in component-by-component manner, of the result from [7], which applies only to scalar-valued function. Also, our analysis incorporates numerical errors induced by estimating time derivatives ˙. These numerical errors often do not follow random distribution. Consequently, we do not employ i.i.d. assumption on the errors, as opposed to the work of [7]. Due to these subtle, and yet significant, differences, we include the proof of Theorem 3.4 here for completeness of the paper.

Proof. Let = be the probability measure of the random sequence . Let Ω be the set of all possible draws, which is divided into the set Ωof all draws such that

and the complement set Ω:= Ω . We consider the following splitting

We now estimate the upper bounds for and .

(a) p1(t) (b) p2(t) (c) q1(t) (d) q2(t)

(a) Plase plots on q1-p2 plane

(b) Plase plots on q2-p1 plane

Fig. 4.18. Example 5: Phase plots starting from the initial state n = 15, M = 20, 000; middle: n = 16, M = 20, 000; bottom: n = 18, M = 60, 000.

We now consider . For every , if , then

so that

Fig. 4.19. Example 5: Time evolution of the relative errors in solutions of the recon- structed system with an initial state

For almost every with respect to ), if , then

for almost every with respect to ). It follows that

Let us rewrite the derivative data as

where denotes the error in the estimated derivative. Define := ). Similar to [7], one can write

where

and

where = (and = (are respectively the solutions of the two systems

with the matrix A defined in (3.8), y = (= (, and

Hence

For each 1 , we estimate as follows:

where the CauchySchwarz inequality has been used in the inequality. It follows that

We now estimate for each 1 .

It follows from (A.6) that

which together with (A.4) complete the proof.

REFERENCES

[1] E. Bairey, I. Arad, and N. H. Lindner, Learning a local hamiltonian from local measurements, Phys. Rev. Lett., 122 (2019), p. 020504.

[2] J. Bongard and H. Lipson, Automated reverse engineering of nonlinear dynamical systems, Proc. Natl. Acad. Sci. U.S.A., 104 (2007), pp. 9943–9948.

[3] S. L. Brunton, B. W. Brunton, J. L. Proctor, E. Kaiser, and J. N. Kutz, Chaos as an intermittently forced linear system, Nature Communications, 8 (2017).

[4] S. L. Brunton, J. L. Proctor, and J. N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proc. Natl. Acad. Sci. U.S.A., 113 (2016), pp. 3932–3937.

[5] R. Chartrand, Numerical differentiation of noisy, nonsmooth data, ISRN Applied Mathematics, 2011 (2011).

[6] T. M. Cherry, V. on periodic solutions of hamiltonian systems differential equations, Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 227 (1928), pp. 137–221.

[7] A. Cohen, M. A. Davenport, and D. Leviatan, On the stability and accuracy of least squares approximations, Foundations of Computational Mathematics, 13 (2013), pp. 819–834.

[8] A. Cohen, M. A. Davenport, and D. Leviatan, Correction to: On the stability and accuracy of least squares approximations, Foundations of Computational Mathematics, 19 (2019), pp. 239–239.

[9] J. Cullum, Numerical differentiation and regularization, SIAM J. Numer. Anal., 8 (1971), pp. 254–265.

[10] W. E, A proposal on machine learning via dynamical systems, Communications in Mathematics and Statistics, 5 (2017), pp. 1–11.

[11] H. Fujita, Y. O. Nakagawa, S. Sugiura, and M. Oshikawa, Construction of hamiltonians by supervised learning of energy and entanglement spectra, Physical Review B, 97 (2018), p. 075114.

[12] D. F. Griffiths and D. J. Higham, Numerical methods for ordinary differential equations: initial value problems, Springer Science & Business Media, 2010.

[13] J. Han, A. Jentzen, and W. E, Solving high-dimensional partial differential equations using deep learning, Proceedings of the National Academy of Sciences, 115 (2018), pp. 8505–8510.

The applicability of the third integral of motion: some numerical experiments, The Astronomical Journal, 69 (1964), p. 73.

[15] Y. Khoo, J. Lu, and L. Ying, Solving parametric pde problems with artificial neural networks, arXiv preprint arXiv:1707.03351, (2018).

[16] I. Knowles and R. J. Renka, Methods for numerical differentiation of noisy data, Electronic Journal of Differential Equations, 21 (2014), pp. 235–246.

[17] I. Knowles and R. Wallace, A variational method for numerical differentiation, Numer. Math., 70 (1995), pp. 91–110.

[18] G. Leoni, A first course in Sobolev spaces, vol. 181, American Mathematical Soc., 2017.

[19] H. Li, C. Collins, M. Tanha, G. J. Gordon, and D. J. Yaron, A density functional tight binding layer for deep learning of chemical hamiltonians, Journal of chemical theory and computation, 14 (2018), pp. 5764–5776.

[20] Z. Long, Y. Lu, and B. Dong, PDE-Net 2.0: Learning PDEs from data with a numericsymbolic hybrid deep network, arXiv preprint arXiv:1812.04426, (2018).

[21] Z. Long, Y. Lu, X. Ma, and B. Dong, PDE-Net: learning PDEs from data, arXiv preprint arXiv:1710.09668, (2017).

[22] N. M. Mangan, J. N. Kutz, S. L. Brunton, and J. L. Proctor, Model selection for dynamical systems via sparse regression and information criteria, Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 473 (2017).

[23] A. Mardt, L. Pasquali, H. Wu, and F. Noe, VAMPnets for deep learning of molecular kinetics, Nature Comm., 9 (2018), p. 5.

[24] J. E. Marsden and T. S. Ratiu, Introduction to mechanics and symmetry: a basic exposition of classical mechanical systems, vol. 17, Springer Science & Business Media, 2013.

[25] T. Qin, Z. Chen, J. Jakeman, and D. Xiu, A neural network approach for uncertainty quantification for time-dependent problems with random parameters, arXiv preprint arXiv:1910.07096, (2019).

[26] T. Qin, K. Wu, and D. Xiu, Data driven governing equations approximation using deep neural networks, J. Comput. Phys., 395 (2019), pp. 620–635.

[27] M. Raissi, Deep hidden physics models: Deep learning of nonlinear partial differential equations, arXiv preprint arXiv:1801.06637, (2018).

[28] M. Raissi, P. Perdikaris, and G. E. Karniadakis, Machine learning of linear differential equations using gaussian processes, Journal of Computational Physics, 348 (2017), pp. 683– 693.

[29] M. Raissi, P. Perdikaris, and G. E. Karniadakis, Physics informed deep learning (part i): Data-driven solutions of nonlinear partial differential equations, arXiv preprint arXiv:1711.10561, (2017).

[30] M. Raissi, P. Perdikaris, and G. E. Karniadakis, Physics informed deep learning (part ii): data-driven discovery of nonlinear partial differential equations, arXiv preprint arXiv:1711.10566, (2017).

[31] M. Raissi, P. Perdikaris, and G. E. Karniadakis, Multistep neural networks for data-driven discovery of nonlinear dynamical systems, arXiv preprint arXiv:1801.01236, (2018).

[32] S. H. Rudy, S. L. Brunton, J. L. Proctor, and J. N. Kutz, Data-driven discovery of partial differential equations, Science Advances, 3 (2017), p. e1602614.

[33] S. H. Rudy, J. N. Kutz, and S. L. Brunton, Deep learning of dynamics and signal-noise decomposition with time-stepping constraints, arXiv preprint arXiv:1808.02578, (2018).

[34] H. Schaeffer, Learning partial differential equations via data discovery and sparse optimization, Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 473 (2017).

[35] H. Schaeffer and S. G. McCalla, Sparse model selection via integral terms, Phys. Rev. E, 96 (2017), p. 023302.

[36] H. Schaeffer, G. Tran, and R. Ward, Extracting sparse high-dimensional dynamics from limited data, arXiv preprint arXiv:1707.08528, (2017).

[37] M. Schmidt and H. Lipson, Distilling free-form natural laws from experimental data, Science, 324 (2009), pp. 81–85.

[38] A. Shabani, M. Mohseni, S. Lloyd, R. L. Kosut, and H. Rabitz, Estimation of many-body quantum hamiltonians via compressive sensing, Phys. Rev. A, 84 (2011), p. 012107.

[39] , American Mathematical Society, Providence, RI, 1939.

[40] R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical

Society. Series B (Methodological), (1996), pp. 267–288.

[41] G. Tran and R. Ward, Exact recovery of chaotic systems from highly corrupted data, Multiscale Model. Simul., 15 (2017), pp. 1108–1129.

[42] J. Wagner, P. Mazurek, and R. Z. Morawski, Regularised differentiation of measurement data, in XXI IMEKO World Congress “Measurement in Research and Industry”, Prague, Czech Republic, 2015.

[43] N. Wiebe, C. Granade, C. Ferrie, and D. G. Cory, Hamiltonian learning and certification using quantum resources, Physical review letters, 112 (2014), p. 190501.

[44] K. Wu and D. Xiu, Numerical aspects for approximating governing equations using data, J. Comput. Phys., 384 (2019), pp. 200–221.

[45] K. Wu and D. Xiu, Data-driven deep learning of partial differential equations in modal space, J. Comput. Phys., 408 (2020), p. 109307.

Designed for Accessibility and to further Open Science