Deep learning has gained unprecedented attention over the last few years, and deservedly so, as it has introduced transformative solutions across diverse scientific disciplines [1, 2, 3, 4]. Despite the ongoing success, there exist many scientific applications that have yet failed to benefit from this emerging technology, primarily due to the high cost of data acquisition. It is well known that the current state-of-the-art machine learning tools (e.g., deep/convolutional/recurrent neural networks) are lacking robustness and fail to provide any guarantees of convergence when operating in the small data regime, i.e., the regime where very few training examples are available.
In the first part of this study, we introduced physics informed neural networks as a viable solution for training deep neural networks with few training examples, for cases where the available data is known to respect a given physical law described by a system of partial differential equations. Such cases are abundant in the study of physical, biological, and engineering systems, where longstanding developments of mathematical physics have shed tremendous insight on how such systems are structured, interact, and dynamically evolve in time. We saw how the knowledge of an underlying physical law can introduce structure that effectively regularizes the training of neural networks, and enables them to generalize well even when only a few training examples are available. Through the lens of different benchmark problems, we highlighted the key features of physics informed neural networks in the context of data-driven solutions of partial differential equations [5, 6].
In this second part of our study, we shift our attention to the problem of data-driven discovery of partial differential equations [7, 8, 9]. To this end, let us consider parametrized and nonlinear partial differential equations of the general form
where u(t, x) denotes the latent (hidden) solution, ] is a nonlinear operator parametrized by
, and Ω is a subset of
. This setup encapsulates a wide range of problems in mathematical physics including conservation laws, diffusion processes, advection-diffusion-reaction systems, and kinetic equations. As a motivating example, the one dimensional Burgers’ equation [10] corresponds to the case where
Here, the subscripts denote partial differentiation in either time or space. Now, the problem of data-driven discovery of partial differential equations poses the following question: given a small set of scattered and potentially noisy observations of the hidden state u(t, x) of a system, what are the parameters
that best describe the observed data?
In what follows, we will provide an overview of our two main approaches to tackle this problem, namely continuous time and discrete time models, as well as a series of results and systematic studies for a diverse collection of benchmarks. In the first approach, we will assume availability of scattered and potential noisy measurements across the entire spatio-temporal domain. In the latter, we will try to infer the unknown parameters from only two data snapshots taken at distinct time instants. All data and codes used in this manuscript are publicly available on GitHub at https://github.com/ maziarraissi/PINNs.
We define f(t, x) to be given by the left-hand-side of equation (1); i.e.,
and proceed by approximating u(t, x) by a deep neural network. This assumption along with equation (2) result in a physics informed neural network f(t, x). This network can be derived by applying the chain rule for differ-entiating compositions of functions using automatic differentiation [11]. It is worth highlighting that the parameters of the differential operator into parameters of the physics informed neural network f(t, x).
2.1. Example (Burgers’ Equation)
As a first example, let us consider the Burgers’ equation. This equation arises in various areas of applied mathematics, including fluid mechanics, nonlinear acoustics, gas dynamics, and traffic flow [10]. It is a fundamental partial differential equation and can be derived from the Navier-Stokes equations for the velocity field by dropping the pressure gradient term. For small values of the viscosity parameters, Burgers’ equation can lead to shock formation that is notoriously hard to resolve by classical numerical methods. In one space dimension the equation reads as
Let us define f(t, x) to be given by
and proceed by approximating u(t, x) by a deep neural network. This will result in the physics informed neural network f(t, x). The shared parameters of the neural networks u(t, x) and f(t, x) along with the parameters (
) of the differential operator can be learned by minimizing the mean squared error loss
where
and
Here, denote the training data on u(t, x). The loss
responds to the training data on
enforces the structure imposed by equation (3) at a finite set of collocation points, whose number and location is taken to be the same as the training data.
To illustrate the effectiveness of our approach, we have created a training data-set by randomly generating N = 2, 000 points across the entire spatio-temporal domain from the exact solution corresponding to and
. The locations of the training points are illustrated in the top panel of figure 1. This data-set is then used to train a 9-layer deep neural network with 20 neurons per hidden layer by minimizing the mean square error loss of (5) using the L-BFGS optimizer [12]. Upon training, the network is calibrated to predict the entire solution u(t, x), as well as the unknown parameters
) that define the underlying dynamics. A visual assessment of the predictive accuracy of the physics informed neural network is given in the middle and bottom panels of figure 1. The network is able to identify the underlying partial differential equation with remarkable accuracy, even in the case where the scattered training data is corrupted with 1% uncorrelated noise.
To further scrutinize the performance of our algorithm, we have performed a systematic study with respect to the total number of training data, the noise corruption levels, and the neural network architecture. The results are summarized in tables 1 and 2. The key observation here is that the proposed
Figure 1: Burgers equation: Top: Predicted solution u(t, x) along with the training data. Middle: Comparison of the predicted and exact solutions corresponding to the three temporal snapshots depicted by the dashed vertical lines in the top panel. Bottom: Correct partial differential equation along with the identified one obtained by learning
methodology appears to be very robust with respect to noise levels in the data, and yields a reasonable identification accuracy even for noise corruption up to 10%. This enhanced robustness seems to greatly outperform competing approaches using Gaussian process regression as previously reported in [7] as well as approaches relying on sparse regression that require relatively clean data for accurately computing numerical gradients [13].
2.1.1. Example (Navier-Stokes Equation)
Our next example involves a realistic scenario of incompressible fluid flow as described by the ubiquitous Navier-Stokes equations. Navier-Stokes equations describe the physics of many phenomena of scientific and engineering
Table 1: Percentage error in the identified parameters
different number of training data N corrupted by different noise levels. Here, the neural network architecture is kept fixed to 9 layers and 20 neurons per layer.
Table 2: Percentage error in the identified parameters
for different number of hidden layers and neurons per layer. Here, the training data is considered to be noise-free and fixed to N = 2, 000.
interest. They may be used to model the weather, ocean currents, water flow in a pipe and air flow around a wing. The Navier-Stokes equations in their full and simplified forms help with the design of aircraft and cars, the study of blood flow, the design of power stations, the analysis of the dispersion of pollutants, and many other applications. Let us consider the Navier-Stokes equations in two dimensions(2D) given explicitly by
where u(t, x, y) denotes the x-component of the velocity field, v(t, x, y) the y-component, and p(t, x, y) the pressure. Here, ) are the unknown parameters. Solutions to the Navier-Stokes equations are searched in the set of divergence-free functions; i.e.,
This extra equation is the continuity equation for incompressible fluids that describes the conservation of mass of the fluid. We make the assumption that
for some latent function Under this assumption, the continuity equation (7) will be automatically satisfied. Given noisy measurements
of the velocity field, we are interested in learning the parameters as well as the pressure p(t, x, y). We define f(t, x, y) and g(t, x, y) to be given by
and proceed by jointly approximating using a single neural network with two outputs. This prior assumption along with equations (8) and (9) results into a
The parameters
of the Navier-Stokes operator as well as the parameters of the neural networks
trained by minimizing the mean squared error loss
Here we consider the prototype problem of incompressible flow past a circular cylinder; a problem known to exhibit rich dynamic behavior and transitions for different regimes of the Reynolds number . Assuming a non-dimensional free stream velocity
kinematic viscosity
01, the system exhibits a periodic steady state behavior characterized by a asymmetrical vortex shedding pattern in the cylinder wake, known as the K´arm´an vortex street [14].
To generate a high-resolution data set for this problem we have employed the spectral/hp-element solver NekTar [15]. Specifically, the solution domain is discretized in space by a tessellation consisting of 412 triangular elements, and within each element the solution is approximated as a linear combination of a tenth-order hierarchical, semi-orthogonal Jacobi polynomial expansion [15]. We have assumed a uniform free stream velocity profile imposed at the left boundary, a zero pressure outflow condition imposed at the right boundary located 25 diameters downstream of the cylinder, and periodicity for the top and bottom boundaries of the [8] domain. We integrate equation (6) using a third-order stiffly stable scheme [15] until the system reaches a periodic steady state, as depicted in figure 2(a). In what follows, a small portion of the resulting data-set corresponding to this steady state solution will be used for model training, while the remaining data will be used to validate our predictions. For simplicity, we have chosen to confine our sampling in a rectangular region downstream of cylinder as shown in figure 2(a).
Given scattered and potentially noisy data on the stream-wise u(t, x, y) and transverse v(t, x, y) velocity components, our goal is to identify the unknown parameters , as well as to obtain a qualitatively accurate reconstruction of the entire pressure field p(t, x, y) in the cylinder wake, which by definition can only be identified up to a constant. To this end, we have created a training data-set by randomly sub-sampling the full high-resolution data-set. To highlight the ability of our method to learn from scattered and scarce training data, we have chosen N = 5, 000, corresponding to a mere 1% of the total available data as illustrated in figure 2(b). Also plotted are representative snapshots of the predicted velocity components u(t, x, y) and v(t, x, y) after the model was trained. The neural network architecture used here consists of 9 layers with 20 neurons in each layer.
A summary of our results for this example is presented in figure 3. We observe that the physics informed neural network is able to correctly identify
Figure 2: Navier-Stokes equation: Top: Incompressible flow and dynamic vortex shedding past a circular cylinder at Re = 100. The spatio-temporal training data correspond to the depicted rectangular region in the cylinder wake. Bottom: Locations of training datapoints for the the stream-wise and transverse velocity components, u(t, x, y) and v(t, x, t), respectively.
the unknown parameters with very high accuracy even when the training data was corrupted with noise. Specifically, for the case of noise-free training data, the error in estimating
is 0.078%, and 4.67%, respectively. The predictions remain robust even when the training data are corrupted with 1% uncorrelated Gaussian noise, returning an error of 0.17%, and 5.70%, for
, respectively.
A more intriguing result stems from the network’s ability to provide a qualitatively accurate prediction of the entire pressure field p(t, x, y) in the absence of any training data on the pressure itself. A visual comparison against the exact pressure solution is presented in figure 3 for a represen-
Figure 3: Navier-Stokes equation: Top: Predicted versus exact instantaneous pressure field p(t, x, y) at a representative time instant. By definition, the pressure can be recovered up to a constant, hence justifying the different magnitude between the two plots. This remarkable qualitative agreement highlights the ability of physics-informed neural networks to identify the entire pressure field, despite the fact that no data on the pressure are used during model training. Bottom: Correct partial differential equation along with the identified one obtained by learning
tative pressure snapshot. Notice that the difference in magnitude between the exact and the predicted pressure is justified by the very nature of the Navier-Stokes system, as the pressure field is only identifiable up to a constant. This result of inferring a continuous quantity of interest from auxiliary measurements by leveraging the underlying physics is a great example of the enhanced capabilities that physics informed neural networks have to offer, and highlights their potential in solving high-dimensional inverse problems.
Our approach so far assumes availability of scattered data throughout the entire spatio-temporal domain. However, in many cases of practical interest, one may only be able to observe the system at distinct time instants. In the next section, we introduce a different approach that tackles the data-driven discovery problem using only two data snapshots. We will see how, by leveraging the classical Runge-Kutta time-stepping schemes, one can construct discrete time physics informed neural networks that can retain high predictive accuracy even when the temporal gap between the data snapshots is
very large.
We begin by applying the general form of Runge-Kutta methods with q stages to equation (1) and obtain
Here, . This general form encapsulates both implicit and explicit time-stepping schemes, depending on the choice of the parameters
. Equations (11) can be equivalently expressed as
where
We proceed by placing a multi-output neural network prior on
This prior assumption along with equations (13) result in two physics in-
and
Given noisy measurements at two distinct temporal snapshots of the system at times
, respectively, the shared parameters of the neural networks (14), (15), and (16) along with the parameters
of the differential operator can be trained by minimizing the sum
of squared errors
where
and
Here,
3.1. Example (Burgers’ Equation)
Let us illustrate the key features of this method through the lens of the Burgers’ equation. Recall the equation’s form
and notice that the nonlinear operator in equation (13) is given by
Given merely two training data snapshots, the shared parameters of the neural networks (14), (15), and (16) along with the parameters of the Burgers’ equation can be learned by minimizing the sum of squared errors in equation (17). Here, we have created a training data-set comprising of
solution at time instants
9, respectively. The training data are shown in the top and middle panel of figure 4. The neural network architecture used here consists of 4 hidden layers with 50 neurons each, while the number of Runge-Kutta stages is empirically chosen to yield a temporal error accumulation of the order of machine precision
by setting
Table 3: Percentage error in the identified parameters
different gap size ∆t between two different snapshots and for different noise levels.
where the time-step for this example is ∆t = 0.8. The bottom panel of figure 4 summarizes the identified parameters ) for the cases of noise-free data, as well as noisy data with 1% of Gaussian uncorrelated noise corruption. For both cases, the proposed algorithm is able to learn the correct parameter values
with remarkable accuracy, despite the fact that the two data snapshots used for training are very far apart, and potentially describe different regimes of the underlying dynamics.
A sensitivity analysis is performed to quantify the accuracy of our predictions with respect to the gap between the training snapshots ∆t, the noise levels in the training data, and the physics informed neural network architecture. As shown in table 3, the proposed algorithm is quite robust to both ∆t and the noise corruption levels, and it consistently returns reasonable estimates for the unknown parameters. This robustness is mainly attributed to the flexibility of the underlying implicit Runge-Kutta scheme to admit an arbitrarily high number of stages, allowing the data snapshots to be very far apart in time, while not compromising the accuracy with which the nonlinear dynamics of equation (18) are resolved. This is the key highlight of our discrete time formulation for identification problems, setting it apart from competing approaches [7, 13]. Lastly, table 4 presents the percentage error in the identified parameters, demonstrating the robustness of our estimates with respect to the underlying neural network architecture.
3.1.1. Example (Kortewegde Vries Equation)
Our final example aims to highlight the ability of the proposed framework to handle governing partial differential equations involving higher order derivatives. Here, we consider a mathematical model of waves on shallow
Figure 4: Burgers equation: Top: Solution u(t, x) along with the temporal locations of the two training snapshots. Middle: Training data and exact solution corresponding to the two temporal snapshots depicted by the dashed vertical lines in the top panel. Bottom: Correct partial differential equation along with the identified one obtained by learning
water surfaces; the Korteweg-de Vries (KdV) equation. This equation can also be viewed as Burgers’ equation with an added dispersive term. The KdV equation has several connections to physical problems. It describes the evolution of long one-dimensional waves in many physical settings. Such physical
Table 4: Percentage error in the identified parameters
different number of hidden layers and neurons in each layer.
settings include shallow-water waves with weakly non-linear restoring forces, long internal waves in a density-stratified ocean, ion acoustic waves in a plasma, and acoustic waves on a crystal lattice. Moreover, the KdV equation is the governing equation of the string in the Fermi-Pasta-Ulam problem [17] in the continuum limit. The KdV equation reads as
with () being the unknown parameters. For the KdV equation, the nonlinear operator in equations (13) is given by
and the shared parameters of the neural networks (14), (15), and (16) along with the parameters ) of the KdV equation can be learned by minimizing the sum of squared errors (17).
To obtain a set of training and test data we simulated (20) using conventional spectral methods. Specifically, starting from an initial condition ) and assuming periodic boundary conditions, we have integrated equation (20) up to a final time t = 1.0 using the Chebfun package [18] with a spectral Fourier discretization with 512 modes and a fourth-order explicit Runge-Kutta temporal integrator with time-step ∆
this data-set, we then extract two solution snapshots at time
8, and randomly sub-sample them using
to generate a training data-set. We then use these data to train a discrete time physics informed neural network by minimizing the sum of squared error loss of equation (17) using L-BFGS [12]. The network architecture used here comprises of 4 hidden layers, 50 neurons per layer, and an output layer predicting the solution at the q Runge-Kutta stages, i.e.,
where q is computed using equation (19) by setting ∆t = 0.6.
The results of this experiment are summarized in figure 5. In the top panel, we present the exact solution u(t, x), along with the locations of the two data snapshots used for training. A more detailed overview of the exact solution and the training data is given in the middle panel. It is worth noticing how the complex nonlinear dynamics of equation (20) causes dramatic differences in the form of the solution between the two reported snapshots. Despite these differences, and the large temporal gap between the two training snapshots, our method is able to correctly identify the unknown parameters regardless of whether the training data is corrupted with noise or not. Specifically, for the case of noise-free training data, the error in estimating is 0.023%, and 0.006%, respectively, while the case with 1% noise in the training data returns errors of 0.057%, and 0.017%, respectively.
We have introduced physics informed neural networks, a new class of universal function approximators that is capable of encoding any underlying physical laws that govern a given data-set, and can be described by partial differential equations. In this work, we design data-driven algorithms for discovering dynamic models described by parametrized nonlinear partial differential equations. The inferred models allow us to construct computationally efficient and fully differentiable surrogates that can be subsequently used for different applications including predictive forecasting, control, and optimization.
Although a series of promising results was presented, the reader may perhaps agree that this two-part treatise creates more questions than it answers. In a broader context, and along the way of seeking further understanding of such tools, we believe that this work advocates a fruitful synergy between machine learning and classical computational physics that has the potential to enrich both fields and lead to high-impact developments.
Figure 5: KdV equation: Top: Solution u(t, x) along with the temporal locations of the two training snapshots. Middle: Training data and exact solution corresponding to the two temporal snapshots depicted by the dashed vertical lines in the top panel. Bottom: Correct partial differential equation along with the identified one obtained by learning
This work received support by the DARPA EQUiPS grant N66001-15-2-4055, the MURI/ARO grant W911NF-15-1-0562, and the AFOSR grant FA9550-17-1-0013. All data and codes used in this manuscript are publicly
[1] A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, in: Advances in neural information processing systems, pp. 1097–1105.
[2] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436–444.
[3] B. M. Lake, R. Salakhutdinov, J. B. Tenenbaum, Human-level concept learning through probabilistic program induction, Science 350 (2015) 1332–1338.
[4] B. Alipanahi, A. Delong, M. T. Weirauch, B. J. Frey, Predicting the se- quence specificities of DNA-and RNA-binding proteins by deep learning, Nature biotechnology 33 (2015) 831–838.
[5] M. Raissi, P. Perdikaris, G. E. Karniadakis, Numerical Gaussian pro- cesses for time-dependent and non-linear partial differential equations, arXiv preprint arXiv:1703.10230 (2017).
[6] M. Raissi, P. Perdikaris, G. E. Karniadakis, Inferring solutions of dif- ferential equations using noisy multi-fidelity data, Journal of Computational Physics 335 (2017) 736–746.
[7] M. Raissi, G. E. Karniadakis, Hidden physics models: Machine learning of nonlinear partial differential equations, arXiv preprint arXiv:1708.00588 (2017).
[8] M. Raissi, P. Perdikaris, G. E. Karniadakis, Machine learning of linear differential equations using Gaussian processes, Journal of Computational Physics 348 (2017) 683 – 693.
[9] S. H. Rudy, S. L. Brunton, J. L. Proctor, J. N. Kutz, Data-driven discovery of partial differential equations, Science Advances 3 (2017).
[10] C. Basdevant, M. Deville, P. Haldenwang, J. Lacroix, J. Ouazzani, R. Peyret, P. Orlandi, A. Patera, Spectral and finite difference solutions of the Burgers equation, Computers & fluids 14 (1986) 23–41.
[11] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, J. M. Siskind, Automatic differentiation in machine learning: a survey, arXiv preprint arXiv:1502.05767 (2015).
[12] D. C. Liu, J. Nocedal, On the limited memory BFGS method for large scale optimization, Mathematical programming 45 (1989) 503–528.
[13] S. L. Brunton, J. L. Proctor, J. N. Kutz, Discovering governing equa- tions from data by sparse identification of nonlinear dynamical systems, Proceedings of the National Academy of Sciences 113 (2016) 3932–3937.
[14] T. Von K´arm´an, Aerodynamics, volume 9, McGraw-Hill New York, 1963.
[15] G. Karniadakis, S. Sherwin, Spectral/hp element methods for computa- tional fluid dynamics, Oxford University Press, 2013.
[16] A. Iserles, A first course in the numerical analysis of differential equa- tions, 44, Cambridge University Press, 2009.
[17] T. Dauxois, Fermi, Pasta, Ulam and a mysterious lady, arXiv preprint arXiv:0801.1590 (2008).
[18] T. A. Driscoll, N. Hale, L. N. Trefethen, Chebfun guide, 2014.