Reservoir computing model of two-dimensional turbulent convection

2020·arXiv

Abstract

Abstract

Reservoir computing is an efficient implementation of a recurrent neural network that can describe the evolution of a dynamical system by supervised machine learning without solving the underlying mathematical equations. In this work, reservoir computing is applied to model the large-scale evolution and the resulting low-order turbulence statistics of a two-dimensional turbulent RayleighB´enard convection flow at a Rayleigh number Ra = 10and a Prandtl number Pr = 7 in an extended spatial domain with an aspect ratio of 6. Our data-driven approach which is based on a long-term direct numerical simulation of the convection flow comprises a two-step procedure. (1) Reduction of the original simulation data by a Proper Orthogonal Decomposition (POD) snapshot analysis and subsequent truncation to the first 150 POD modes which are associated with the largest total energy amplitudes. (2) Setup and optimization of a reservoir computing model to describe the dynamical evolution of these 150 degrees of freedom and thus the large-scale evolution of the convection flow. The quality of the prediction of the reservoir computing model is comprehensively tested by a direct comparison of the results of the original direct numerical simulations and the fields that are reconstructed by means of the POD modes. We find a good agreement of the vertical profiles of mean temperature, mean convective heat flux, and root mean square temperature fluctuations. In addition, we discuss temperature variance spectra and joint probability density functions of the turbulent vertical velocity component and temperature fluctuation the latter of which is essential for the turbulent heat transport across the layer. At the core of the model is the reservoir, a very large sparse random network characterized by the spectral radius of the corresponding adjacency matrix and a few further hyperparameters which are varied to investigate the quality of the prediction. Our work demonstrates that the reservoir computing model is capable to model the large-scale structure and low-order statistics of turbulent convection which can open new avenues for modeling mesoscale convection processes in larger circulation models.

I. INTRODUCTION

The application of machine learning (ML) methods, in particular of deep neural networks (DNN)[1–5], to fluid flows has transformed the way of processing and analyzing large amounts of data. ML methods are used to parametrize unresolved scales in Reynolds stresses and subgrid scale models for complex physical or geometrical flow configurations at high Reynolds numbers which still remain inaccessible to direct numerical simulations or even large eddy simulations [6–10]. They are also used for the detailed segmentation of complex images [11, 12]. DNNs converted large-scale patterns in an extended three-dimensional turbulent convection flow [13, 14] into a planar dynamical network where the edges are regions of locally enhanced convective heat flux. Physics-informed and physics-constrained neural networks were developed as a substitute for solving the underlying partial differential equations of the fluid motion [15–17]. All supervised ML algorithms make use of the fact that it is often easier to train a DNN with a number of labeled training data of an intended input-output behavior, than to develop a specific numerical code to provide the correct answer for all possible input data. During the training, information propagates forward through the network while weights and biases are updated at each neuron in each hidden layer of the network backwards from the output to the input layer. This back-and-forth iteration (which is called epoch) has to be repeated multiple times until a minimum of the loss function is obtained. Typically, (stochastic) gradient descent algorithms are used for the minimum search of the loss function [5]. Such a training process requires typically big data records. One further disadvantage of DNNs is their limited capability to process unseen data at very different flow parameters, such as Reynolds or Rayleigh numbers. Possible solutions to overcome this limitation to stick with comprehensive training sets from different cases or to switch to transfer learning methods [5] which continue the training to adapt a pre-trained DNN to the new input data.

Turbulence problems are inherently highly chaotic with stochastic temporal variation of the involved fields. Highdimensional data due to the large grid sizes have to be processed by DNNs to predict statistical properties or the flow evolution; in some cases they fail completely as for example discussed in detail in refs. [18, 19]. These methods suffer from the already mentioned large dimension of the input vector which leads to expensive training procedures of the model. Recurrent neural networks (RNN) are better suited: due to their “internal memory” and feedback mechanisms they are by construction more appropriate to learn the dynamics (and thus the resulting statistics of the flow). The long short-term memory (LSTM) network [20] is a specific subclass of RNNs, which provides a “gated mechanism” for information flow in a feedback loop. The internal memory state allows the sequential processing of the data, using a smaller layer size or less layers and processing the time series in smaller batches compared to other DNNs [21]. LSTMs have been applied recently with success for a small Galerkin 9-mode model of a turbulent shear flow [22] and compared with a standard DNN [21]. In order to circumvent expensive training procedures, echo state networks (ESN) or reservoir computing models (RCM) [23, 24] seem to be a further alternative which is considered here. RCM have received recently renewed attention as a method of equation-free modeling of nonlinear dynamical systems, such as for the already mentioned Galerkin 9-mode model of a turbulent shear flow [25], the Lorenz96 model [26] or the one-dimensional partial differential Kuramoto-Sivashinsky (KS) equation [27, 28] on up to 512 grid points. None of the previous examples handles however two-dimensional spatial fields and evaluates reservoir computing on the basis of turbulent statistics. We also mention here that larger numbers of degrees of freedom of these dynamical systems can been processed with parallel versions of RNNs, such as in ref. [26] for a LSTM.

The present work derives a reservoir computing model for a two-dimensional, fully turbulent Rayleigh-B´enard convection (RBC) flow at a relatively large Rayleigh number of Ra = 10and a Prandtl number of Pr = 7 [29, 30] in a domain Ω = with an aspect ratio Γ = L/H = 6. Here L is the horizontal length and H the height of the simulation domain. We set up a (Boussinesq-) equation-free model to describe the large-scale dynamics of the flow and to reproduce low-order statistics, such as profiles of the temperature fluctuations and the convective heat flux across the layer as well as the joint statistics of velocity components and temperature. This requires the following subsequent steps: (1) Data-driven reduction of the fully resolved direct numerical simulation (DNS) record to the most energetic degrees of freedom by a standard Proper Orthogonal Decomposition (POD) based on the snapshot method [31–34] which is applied to the three-dimensional vector field composed of velocity field and temperature fluctuations. This truncation will contain here 83% of the mean total turbulent energy and cause a compression by 92%. (2) Construction and training of a RCM that predicts the dynamical evolution of the most energetic degrees of freedom obtained in step (1) and thus of the large-scale flow and the resulting low-order statistics. We substitute here a Galerkin-truncated reduced-order model, which is obtainable by a projection of the Boussinesq equations of turbulent convection on the individual POD eigenspaces, by the simple dynamics on a large reservoir (which will be explained further below) that does not have any explicit information on the underlying RBC dynamics and is purely data-driven.

The combined application of POD analysis with a small feedforward neural network on two-dimensional fields was reported by Krischer et al. [35]. The usage of RNNs in combination with POD have been reported by Wan et al. [36] for a two-dimensional Kolmogorov flow, by Vlachas et al. [37] for the Lorenz96 model and the KS equation, by Deng et al. [38] for the reconstruction of time-resolved turbulent flow measurement from discrete data obtained from the experiments, or by Mohan and Gaitonde [39] for turbulent flow control. Pawar et al. [40] combined POD with a standard DNN for predictions in a heated cavity flow.

The outline of this paper is as follows. Section II describes the Boussinesq model of turbulent Rayleigh-B´enard convection along with numerical procedure and some basic results. Section III illustrates the POD snapshot method. Section IV presents the RCM employed in this work and the variation of hyperparameters of the reservoir. Section V discusses also the results which are obtained for the present application and provides a comparison with the original DNS and POD based data. We conclude with a summary and an outlook in section VI.

II. DIRECT NUMERICAL SIMULATION

Direct numerical simulations are applied to solve the Boussinesq equations for the two-dimensional case. These equations are made dimensionless by the height of the layer H, the free-fall velocity = , where g is the acceleration due to gravity, is the thermal expansion coefficient at constant pressure, and ∆T represents the imposed temperature difference between the bottom and the top. Times are expressed in units of the free-fall time . The Rayleigh number is given by Ra = ) = 10, the Prandtl number by Pr = = 7, and the aspect ratio is Γ = L/H = 6 . Here, is the kinematic viscosity, and is the thermal diffusivity of the fluid. The equations of motion in dimensionless form, which couple the two-dimensional velocity field u = (), the pressure field p, and the temperature field T, are given by

No-slip boundary conditions are imposed at the bottom (y = 0) and top (y = 1) for the velocity field. The temperature field is constant, T = 1 at y = 0 and T = 0 at y = 1. The side walls obey periodic boundary conditions for all fields. Equations (1)–(4) are numerically solved by the Nek5000 spectral element method package [41]. We use = 4816 spectral elements and apply Lagrangian interpolations polynomials of order N = 11 on each element and in each spatial direction. Further details pertaining to the numerical procedure can be found in refs. [14, 30]. We sampled the turbulent flow every 0for the subsequent POD snapshot analysis.

We apply the standard Reynolds decomposition which is given by

Mean profiles which have been obtained by a combined x-line and time average are displayed in Fig. 1. In panel (a) of this figure we show linear profile of diffusive heat transfer ) = 1 which exists for Rayleigh numbers below the instability threshold Ra= 1708 together with the mean temperature profile for Ra Ra. As expected the magnitude of mean velocity is practically zero which is shown in Fig. 1(b). This means that and = as indicated in Eqns. (5) and (6). The DNS start from the diffusive equilibrium state and relax into the statistically stationary turbulent convection state after an initial period of 100. Any subsequent snapshot analysis starts for t > 100. For the data analysis, we interpolate all DNS snapshots spectrally onto a uniform, somewhat coarser two-dimensional mesh of = 160 108 points. Our data base consists of = 2000 equidistant simulation snapshots spanning a time range of 500. Figure 2 displays the time series of the Nusselt number, the dimensionless measure of the turbulent heat transfer, which is taken at the bottom and top plates as an average of the diffusive heat flux, Nu(y = 0, 1) = . The Nusselt number is obtained as an arithmetic average of both means and follows to Nu = 1454. This value agrees well with the results from three-dimensional simulations of RBC for the same parameters which were reported in ref. [12]

FIG. 1. Mean profiles of (a): Temperature and (b): Velocity components. The average quantities are obtained by averaging in time and in homogeneous direction x. The diffusive equilibrium temperature profile ) is added to panel (a).

FIG. 2. Temporal variation of Nusselt number for bottom and top plates obtained in the spectral element DNS. We start sampling our data for times to exclude the initialization effects which are indicated by the green-shaded region.

III. PROPER ORTHOGONAL DECOMPOSITION OF SIMULATION DATA

Rather than using slices of DNS data as a direct input into a ML algorithm as done for example by King et al. [18] in a semi-supervised algorithm, we introduce an intermediate data reduction step. We take a standard POD analysis for simplicity, a prominent method to extract a subset of the energetically dominant degrees of freedom from the fully resolved turbulent convection data [31–34, 42, 43]. In detail, we will apply the snapshot method [44, 45] which was developed by Sirovich and Park [31, 32]. Input is the three-dimensional vector field = () = () with zero mean, = 0. A mean total turbulent energy or variance (which comprises turbulent kinetic energy and scalar fluctuation variance) is given by

We want to determine POD modes Φ) that maximize the averaged projection of onto Φ, i.e.,

where () denotes a scalar product on (Ω) with = () and (Ω) which induces a norm = (. We use the Einstein summation convention and drop the summation symbols for same indices, e.g., () is the short notation for ). Variational calculus translates (9) into a search of maxima of a constrained functional J [46], i.e.,

with the Lagrangian multiplier and . This leads to the following integral equation

with the Hermitean, non-negative kernel operator ˆ. The tensor product symbol is omitted for the rest. Indices m, n = 1, 2, 3 and the Einstein summation convection are again used. There is no summation over index p = 1, 2, . . . which stands for the POD modes that all solve (11). For the present case ; the kernel operator is a 3matrix with the size of the uniform data mesh. The number 3corresponds to the (finite) number of degrees of freedom, , of the flow, which is approximated on a computational grid and consists of three fields. In addition, we use a Fourier expansion in the homogeneous x-direction which reduces the POD analysis to the vertical coordinate y only and uses the periodicity along x. Thus (11) translates into

with k, l = 1, 2, 3 and (see [34] for details). Here,

Thus the velocity components and temperature fluctuation are expanded in the following POD base

with p = 1= 51840 (see section II), k = 1, 2, 3, 2, and the reality condition ) = ). Equation (11) (or (12)) would correspond to a large eigenvalue problem which has to be solved by a direct method. Following Sirovich [44], the complexity of this task can be reduced. The snapshot method converts the original eigenvalue problem (11) for the kernel operator into one for a (snapshot) matrix with . We therefore chose the following expansion ansatz for each POD mode

We substitute the time average by an arithmetic average over the snapshots. Expansion (15) is inserted into (12) and results to the following approximation

One has to find the eigenvectors () and eigenvalues which completes the procedure [34]. Here, is the integer that stands for DNS snapshot at time . Note that the index p runs now from 1 to only.

Figure 3 displays the spectrum of the eigenvalue analysis. The eigenvalues are sorted with respect to their magnitude which stands for the energy contained in the corresponding POD mode. We denote these modes again by Φ. As typical for a turbulent flow, the spectrum falls off quickly, but shows a long tail. We will truncate the POD mode expansion to the = 150 most energetic POD modes () which contain 83 % after the total energy E as given by (8). This is shown in Fig. 3b. Other truncation levels can be taken, but would not change the subsequent results qualitatively. Figure 4 illustrates the spatial structure of the three components of the modes

FIG. 3. Eigenvalue spectrum of POD modes as obtained from the analysis of 2000 DNS snapshots. (a): Individual contribution of each mode, and (b): Cumulative contribution of modes. The shaded region show the contribution of first 150 modes which capture the 83% of the total energy of the convection flow and thus provides a good approximation of the large-scale structure.

Φ(x, y) and Φ(x, y). We obtain the time-dependence of the expansion coefficients ) of the most energetic POD modes Φby

Here, m = 1, 2, 3 and p, q = 1. Orthogonality of the POD modes has been used to obtain the final relation. The time series ) are used to train the reservoir computing model which is discussed in the following section. They are considered as the ground truth (GT). The first step of the RCM setup is now completed.

IV. RESERVOIR COMPUTING MODEL

A. Architecture and training of reservoir computing model

In the following, we summarize briefly the basics of the RCM, a special type of RNNs [5]. The RCM architecture is inspired by the brain where many neurons are randomly, recurrently and sparsely connected. Figure 5 shows the general structure of a RCM. The input consists of training data in form of time series of the POD expansion coefficients ) with i = 1. They are converted at each instant into a reservoir state vector with . This is done by means of the random weight matrix which is determined at the beginning of the training,

Here N is the number of nodes in the reservoir – a big sparse random network which is described by a (symmetric) adjacency matrix . The initialization of A is again random and different strategies for this initialization have been suggested which can improve the results as shown by Strauss et al. [47]. Two important parameters of A are the reservoir density D of active nodes and the spectral radius ), which is set by the largest absolute value of the eigenvalues. Across the reservoir nodes, a simple nonlinear dynamical system evolves which comprises the short-term memory of the network,

with j, k = 1, . . . , N and m = 1. The nonlinearity enters in form of a typical activation function, here a hyperbolic tangent. A further parameter – the leakage rate – enters the model which blends linear and nonlinear

FIG. 4. Spatial structure of two POD modes. Temperature fluctuation field ) snapshot taken at time and resulting by projection onto (a) Φ) and (b) Φ). Horizontal velocity component ) by projection onto (c) Φand (d) Φ). Wallnormal velocity component ) by projection onto (e) Φ) and (f) Φmodes are shown in simulation domain and were obtained by the inverse Fourier transform.

contributions. Optimally, the reservoir should be operated close to an instability which implies a spectral radius 1. The final element is the random output weight matrix which maps the updated reservoir vector back to the POD expansion coefficients,

This procedure is repeated for = 700 time steps and all reservoir states are saved. The big advantage of the RCM is that the training is performed with respect to the output layer only. Thus a back propagation procedure that is required in case of DNNs is avoided (see e.g. [27, 28] for more details). As a consequence RCM obtains training input from the preceding time step only while other RNNs take a time step sequence from the past. The optimized output weight matrix, , is obtained by a minimization of a regularized quadratic cost function C. A regularization term is added to C to tackle the over-fitting problem [5]. The function is given by

During this training process, the number of nodes N, the reservoir density D, the spectral radius ), the leakage rate , and the prefactor 0 of the regularization term of the quadratic cost function are hyperparameters to tune additionally. Prefactor in (21) is the ridge regression parameter. The result of this optimization procedure is the matrix (as already said) which completes the training.

The prediction mode of the RCM after the training with given POD coefficient time series, ), as an initial condition is given by

Now the reservoir dynamics is closed and no further input is required. The dynamics of eq. (22) are used to examine the large-scale evolution and low-order statistics of a two-dimensional turbulent convection flow without using the underlying Boussinesq equations (1)–(3). The translation back to the turbulent fields goes via to the ) and the subsequent reconstruction by the corresponding POD modes Φ(x, y).

FIG. 5. The basic architecture of a reservoir computing model RCM (or echo state network ESN) consisting of an input layer, a reservoir, and an output layer. Dotted arrows mark connections which remain fixed during the training. The reservoir consists of N nodes and, both, the input and output layer have units. The reservoir substitutes the stack of hidden layers in a deep neural network.

B. Implementation of reservoir computing model

Previous studies that apply reservoir computing were frequently performed for dynamical systems with a smaller number of degrees of freedom as we discussed in the introduction. Here, we want to take the RCM approach to a new level of complexity by an application to a fully turbulent flow in an extended domain. This will result in additional challenges that start with the architecture and training. The data base consists of the last 1400 (out of the originally 2000) snapshots of the turbulent velocity and temperature fields as reconstructed from the 150 POD modes. These most energetic 150 POD modes account for 83% of the mean total turbulent energy E (see eq. 8). By selecting 150 modes, we were able to compress our data by 92%, while losing 17% of the information. We also checked that the separate variances of the two-dimensional velocity field and the temperature fluctuations have a similar magnitude.

We use a 50-50% split of this dataset for training and testing. The current problem is related to time-series forecasting; therefore a random splitting of the data is not considered. The first = 700 snapshots are used for the training of our RCM and the remaining 700 snapshots are exclusively used for (blind) testing of our framework. As already mentioned earlier, there are few tunable configuration parameters, commonly known as hyperparameters. All these parameters represent the dynamics of the reservoir A and thus affect the RCM [23, 24]. These hyperparameters have to be tuned carefully. Bayesian optimization [48, 49] is one of the sophisticated methods to find out the optimized hyperparameters in any machine learning task. Here, we applied a simpler grid search as a favorable option which resulted in 100 different hyperparameter settings (). We note here that RCM can become linearly unstable once 1 which sets an upper bound for this parameter. On top of this grid search, the ridge regression parameter had to be tuned.

We use two measures to determine the quality of the RCM output in relation to GT – the training data. Note again, that ground truth in our case comprises the reconstructions of the large-scale convection flow as obtained from the first 150 POD modes. Therefore, we monitor the following mean square error (MSE) which is given by

where = 150 and = 700 for training and test data, respectively. We are however also interested in the actual flow quantities, such as mean and fluctuation profiles across the convection layer, and thus take furthermore the normalized average relative error (NARE) to ground truth, following [21] in this respect. For example, this error

is given for the mean temperature profile by

Table I summarizes the MSE for both, training and test phase, for different values of reservoir node number N (5 different values), spectral radii ) (5 different values), and leakage rates (4 different values). In order to obtain the prescribed spectral radius, an adjacency matrix ˜A with the prescribed reservoir density is initialized randomly first. The spectral radius ( ˜A) is determined subsequently and the desired spectral radius ) is enforced afterwards by a rescaling A = ˜( ˜A). The hyperparameter tuning proceeds as follows: a hyperparameter grid () with 5 4 = 100 tripels is investigated for three different ridge regression parameters . The MSE is determined together with the three NARE for mean temperature, temperature fluctuations, and convective heat flux for each of the 300 cases. The three NARE values follow typically the same trend in the parameter grid search. These NARE magnitudes are calculated after the training or test runs by reconstructing the turbulence fields and performing the combined line-time average as given in (24). The tables for the three NARE, which would correspond to table I, are not shown here but have been evaluated. It is observed that the reservoir dynamics show high sensitivity on all these hyperparameters. The joint evaluation by means of MSE and NARE avoids overfitting. For example, the MSE shows its lowest training value at ) = 0.4 (see columns 3 and 4 of table I), but the reconstructed flow results in a high NARE amplitudes. This supports our earlier argument regarding the necessity of additional NARE monitoring of flow quantities rather than solely relying on the MSE measure of the POD expansion coefficients.

Table II depicts the final optimal values from the grid search. These values are obtained by a cross-validation procedure, i.e., by monitoring both defined measures (23) and (24). We model a dynamical system that comprises of = 150 modes as the degrees of freedom. On the one hand, this requires a large number of nodes in the reservoir A; as a rule of thumb N should exceed by an order of magnitude. On the other hand, N should not be chosen too large which can cause an overfitting because irrelevant statistical fluctuations in the training data will be learnt by the model [50]. The leakage rate = 0.95 is kept close to 1 which indicates that reservoir evolves slowly [24, 27]. This value gave the smallest training MSE error as seen in two last columns of table I. The spectral radius of adjacency matrix of the reservoir is the central parameter to tune [50]. A spectral radius close to one, which is eventually taken here, drives the reservoir dynamics close to an instability as discussed in [5]. The training phases are relatively short and inexpensive when compared to standard DNN or LSTM networks since RCMs do not require back-propagation. With the present data size, it took less than a minute when performed on 24 CPUs.

TABLE I. Performance of the reservoir computing model for different hyperparameters tripels () quantified by the mean square error (MSE) for training and test phases. The tripels consist of the number of reservoir nodes N, the spectral radius ), and the leakage rate . A total of 5 4 = 100 tripels were investigated times 3 different ridge regression parameters . The first two columns display results for (), the second for (), and the third two columns for (= 0.95, and = 0.95 (see also table 2).

TABLE II. Optimized hyperparameters which result from the whole grid search study (and indicated with asterisk). We list the number of reservoir nodes N, the spectral radius ), the leakage parameter , the reservoir density D (not varied), and the prefactor of the regularization term . Also displayed are the four measures: mean square error MSE, see Eq. (23), and the normalized average relative error (NARE) to ground truth as given by Eq. (24) for temperature fluctuations, mean temperature, and convective heat flux. Optimal hyperparameters are found with respect to all of these four measures.

V. PREDICTION OF TWO-DIMENSIONAL TURBULENT CONVECTION

FIG. 6. Temporal evolution of four POD modes coefficients. From top to bottom: (a,b) and (g,h) The green shaded range shows the training phase while the other range depicts the test phase. Here = 1400 is the snapshot index which translates into the time into a time instant. The corresponding panels in the right column show the initial phase of the forecast. The distance between two snapshots corresponds to 50 integration time steps of the original DNS model.

Figure 6 shows examples of the temporal variation of four selected POD modes –the GT – together with their predicted time series from our RCM. The predicted time series follow the general trends of the POD data closely, such as amplitudes and frequencies. For instance, the time series of ) varies on a much smaller frequency as the one for ), which is seen in Figs. 6(a,g). A comparison of the exact time evolution of four POD modes is displayed in the panels (b,d,f,h) of Fig. 6. The modes with a larger energy content match the GT for a longer period while the modes with a smaller content deviate quickly. We recall that the turbulent flow modeled here is a highly nonlinear dynamical system. Roundoff errors in the time integration are amplified quickly and lead to an exponentially fast separation of neighboring trajectories in phase space. We conclude that the forecast skills of our RCM model are limited. The GT is a sequence of snapshots which are separated by 0.25 of each other. This corresponds to 50 integration time steps dt = 5 10in the original DNS – an additional source of roundoff errors.

Figure 7a depicts the absolute error of the POD modes coefficients. The error fluctuates around zero mean which suggests that trained model does not suffer from any bias in a specific direction. The MSE as a function of time is also illustrated in Figure 7b for both training and testing phase. In correspondance with Fig. 6, the MSE quickly grows from 0 to 0.0005 with time, but saturates again. The RCM model in its present form is limited in view to a detailed forecast, but can reproduce the low-order statistics as we will see further below. A detailed determination of the forecast quality requires to determine the Lyapunov spectrum of the turbulent flow as done by Pathak et al. [51] or Vlachas et al. [26] for other dynamical systems. This is beyond the scope of the present work which focusses on a reproduction of statistical properties.

Figure 8 depicts an instantaneous snapshot of all three fields from the original DNS and compares the data with reconstructed field from the 150 POD modes as well as with the prediction from our RCM. We present results only for the blind test data to show the generalization ability of the trained network in predicting all three variables, i.e., the temperature and the two velocity components. It can be observed that the RCM can capture all the large-scale features of the turbulent convection flow with a very good quality even though the forecast skills are limited. This

FIG. 7. Temporal evolution of the error for RCM and GT. (a) Absolute error shown for four individual POD mode coefficients with the mode number indicated by the legend. (b) Mean square error MSE() versus number of snapshots (see also eq. (23)).

FIG. 8. Comparison of the turbulence fields in the test phase for = 201. (a,d,g): Original DNS snapshot. (b,e,h): POD model reconstruction with the 150 most energetic modes which serves as the ground truth for our machine learning problem. (c,f,i): Predicted field from the reservoir computing model. In panels (a,b,c), the instantaneous temperature field shown, in (d,e,f) the velocity component ), and in (g,h,i) the wallnormal velocity component

includes a reproduction of the large-scale motion as seen in Figs. 8(b,c). Thermal plumes, which are a result of thermal boundary layer instabilities [52], can also be captured by the RCM which is a challenging task at such high Rayleigh number (see Fig. 8a).

Next, we report mean and fluctuations profiles of the flow in Fig. 9. Again we show a comparison between the DNS, the projection of the snapshots on the POD modes and the prediction from RCM. In the Boussinesq case, the flow has an up-down-symmetry with respect to the midline at y = 0.5 such that we show the upper half only and mirror the data from the lower half. This symmetry which is in line with a transformation () could possibly be embedded as a physical constraint into the model, a task which we leave for the future work on the three-dimensional case. We also refer here to ref. [53] that describes how to embed such a symmetry in a neural network for a Hamiltonian system.

Figure 9a compares the mean temperature profiles which is perfectly reproduced by the POD as well as the reservoir computing. Figure 9b displays the mean profiles of the turbulent convective heat flux, . It can be seen that the RCM result is in good agreement with the POD data, both close to the wall and the center of the layer. The deviation between the POD data and the DNS data is caused by the truncation at 83% of the total energy, while the deviation between the POD data and the RCM is caused by the model error. Figure 9c shows the temperature fluctuations . Again the agreement is very good. To summarize, the RCM successfully predicts first and second-order statistics as demonstrated by these mean profiles.

FIG. 9. Comparison line- and time averaged mean profiles obtained from original DNS, truncated POD, and RCM. (a): Mean temperature, (b): Mean turbulent convective heat flux, and (c): Root mean square fluctuations of temperature. The profiles in case of the RCM were obtained from the test data set only and thus not used in the training phase.

To have a thorough comparison, we add results obtained for the Fourier spectra. Figure 10 depicts the spectrum of the temperature variance at three different locations in the wall-normal direction. As expected, both POD and RCM spectra deviate from the DNS spectra, particularly in the high-wavenumber tail representing the small eddies. The reason is the truncation to 150 POD modes. Our statistics prediction from the RCM follows however closely the trend of the POD-based data.

Furthermore, we show a comparison of probability density functions (PDF) of local convective heat flux at three different vertical locations in our turbulent convection domain. Figure 11 displays the PDFs and we can again observe a very good agreement, even for most of the tails of the distribution. After having found such a good agreement of the PDFs of turbulent heat flux, we assess the joint probability density function of the two individual fluctuating fields of the turbulent heat flux. We have therefore sampled instantaneous fluctuations of the wall normal velocity component and the temperature fluctuation at the midline. With these data, we determined the joint probability density function and their contours are shown in figure 12. The RCM is capable to reproduce the plume ejections from the bottom and the top which correspond to the skewed contours in the first and third quadrants, respectively.

FIG. 10. Comparison of energy spectra for temperature variance at different distances from the wall (a): 15, and (c): 50. The legend indicates the three data sets taken for the calculation. Line color is the same for all three panels.

FIG. 11. Comparison of probability density function (PDF) of the local convective heat flux (y = 0.15, and (c): y = 0.50. The legend indicates the three data sets taken for the calculation. Line color is the same for all three panels.

FIG. 12. Comparison of joint probability density function of the turbulent vertical velocity component and the turbulent temperature fluctuation, P(), at the midline at y = 0.50 (a): DNS, (b): POD, and (c): RCM. The isocontour levels in the legend are logarithmically spaced and hold for all three panels.

VI. CONCLUSIONS AND OUTLOOK

In the present work, we have discussed a machine learning–based approach to convective turbulence that aimed at a simple modeling of the large-scale evolution and low-order statistics. We applied a specific recurrent neural network with an efficient design, the reservoir computing model also known as the echo state network. Since it is not possible to feed the direct numerical simulation data directly into such a model, in particular in view to future applications to three-dimensional cases, we had to add an intermediate data reduction procedure. In this work, we applied therefore a standard and well-established Proper Orthogonal Decomposition snapshot analysis which extracted the most energetic degrees of freedom in form of empirical orthogonal modes of the fully developed turbulent flow at hand. Turbulence fields which are reconstructed from this finite number of POD modes are the ground truth and serve as training and testing data for the reservoir computing model. The quality of the prediction of the reservoir computing model is in the final part of the work comprehensively tested. This is done by a direct comparison of the RCM results with both, the original direct numerical simulations and the fields reconstructed with the POD modes. In detail, we find a good agreement of the vertical profiles of mean temperature, mean convective heat flux, and root mean square temperature. In addition, we discuss temperature variance spectra and joint probability density functions of the turbulent vertical velocity component and temperature fluctuation, the latter of which is essential for the turbulent heat transport across the layer.

Our presented work should be considered as a first step and a proof-of-concept investigation which certainly can be extended into several directions which we want to discuss briefly now. (1) Although, we report a rather comprehensive analysis of the reservoir parameters, a full Bayesian hyperparameter optimization might improve the performance of the RCM even further. (2) For the present case, it was not necessary to augment the training data since we could generate a long-term DNS record for the two-dimensional case with a reasonable computational effort. In a three-dimensional application this will not be the case anymore as discussed for example in refs. [12, 14]. Data augmentation can be done for example by using symmetries of the corresponding turbulent flow, an idea that goes back to Sirovich and Park in their application of POD to convection [31, 32]. Similar aspects of physics-informed ML were recently discussed in another context by Mattheakis et al. [53] for the symplectic nature of Hamiltonian systems which were modeled by neural networks. (3) First efforts have been discussed to generalize the RCM or ESN to a deep-ESN which consists of multiple reservoirs for a possible direct simulation data processing [54]. The successful application of this network architecture to turbulence is yet open.

To summarize, our presented reservoir computing model could successfully capture essential properties of the dynamics of the larger scales of a turbulent convection flow. It is a recurrent neural network that describes the turbulent convection flow without explicit knowledge of the Boussinesq equations which can lead to interesting applications, e.g., for global circulation models where mesoscale (moist) convection processes [55] have to be parametrized [56]. Our presented efforts will be extended to the three-dimensional RBC case and to subsequent tests of the performance of the RCM with respect to variations of Ra and Pr.

ACKNOWLEDGMENTS

The work is supported by the Deutsche Forschungsgemeinschaft with Grant No. SCHU 1410/30-1 and in parts by the project “DeepTurb – Deep Learning in and of Turbulence” which is funded by the Carl Zeiss Foundation. The generation of the long-term DNS data base was possible by supercomputing resources which were provided by the project grant HIL12 of the John von Neumann Institute for Computing. We thank Christian Cierpka, Florian Heyder, Patrick M¨ader, and Karl Worthmann for fruitful discussions.

[1] G. Hinton, L. Deng, and D. et al. Yu, “Deep neural networks for acoustic modeling in speech recognition,” IEEE Signal Process. Mag. 29, 82–97 (2012).

[2] M. I. Jordan and T. M. Mitchell, “Machine learning: Trends, perspectives, and prospects,” Science 349, 255–260 (2015).

[3] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521, 436–444 (2015).

[4] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Netw. 61, 85–117 (2015).

[5] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning (MIT press, 2016).

[6] J. Ling, A. Kurzawski, and J. Templeton, “Reynolds averaged turbulence modeling using deep neural networks with embedded invariance,” J. Fluid Mech. 807, 155–166 (2016).

[7] J. N. Kutz, “Deep learning in fluid dynamics,” J. Fluid Mech. 814, 1–4 (2017).

[8] K. Duraisamy, G. Iaccarino, and H. Xiao, “Turbulence modeling in the age of data,” Annu. Rev. Fluid Mech. 51, 357–377 (2019).

[9] R. Fang, D. Sondak, P. Protopapas, and S. Succi, “Neural network models for the anisotropic Reynolds stress tensor in turbulent channel flow,” J. Turb. , 1–19 (2019).

[10] S. Brunton, B. R. Noack, and P. Koumoutsakos, “Machine learning for fluid mechanics,” Annu. Rev. Fluid Mech. 52, 477–508 (2020).

[11] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” LNCS 9351, 234–241 (2015).

[12] E. Fonda, A. Pandey, J. Schumacher, and K. R. Sreenivasan, “Deep learning in turbulent convection networks,” Proc. Natl. Acad. Sci. USA 116, 8667–8672 (2019).

[13] M. S. Emran and J. Schumacher, “Large-scale mean patterns in turbulent convection,” J. Fluid Mech. 776, 96–108 (2015).

[14] A. Pandey, J. D. Scheel, and J. Schumacher, “Turbulent superstructures in Rayleigh-B´enard convection,” Nat. Commun. 9, 2118 (2018).

[15] M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations,” J. Comput. Phys. 378, 686–707 (2019).

[16] N. Geneva and N. Zabaras, “Modeling the dynamics of pde systems with physics-constrained deep auto-regressive networks,” J. Comp. Phys. 403, 109056 (2020).

[17] M. Raissi, A. Yazdani, and G. E. Karniadakis, “Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations,” Science 367, 1026–1030 (2020).

[18] R. King, O. Hennigh, A. T. Mohan, and M. Chertkov, “From deep to physics-informed learning of turbulence: Diagnostics,” arXiv:1810.07785 (2018).

[19] A. T. Mohan, D. Daniel, M. Chertkov, and D. Livescu, “Compressed convolutional LSTM: An efficient deep learning framework to model high fidelity 3d turbulence,” arXiv:1903.00033 (2019).

[20] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput. 9, 1735–1780 (1997).

[21] P. A. Srinivasan, L. Guastoni, H. Azizpour, P. Schlatter, and R. Vinuesa, “Predictions of turbulent shear flows using deep neural networks,” Phys. Rev. Fluids 4, 054603 (2019).

[22] J. Moehlis, H. Faisst, and B. Eckhardt, “A low-dimensional model for turbulent shear flows,” New J. Phys. 6, 56 (2004).

[23] H. Jaeger and H. Haas, “Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication,” Science 304, 78–80 (2004).

[24] M. Lukoˇseviˇcius and H. Jaeger, “Reservoir computing approaches to recurrent neural network training,” Comput. Sci. Rev. 3, 127–149 (2009).

[25] S. Pandey, J. Schumacher, and K. R. Sreenivasan, “A perspective on machine learning in turbulent flows,” J. Turb. , 1–18 (2020).

[26] P. R. Vlachas, J. Pathak, B. R. Hunt, T. P. Sapsis, M. Girvan, E. Ott, and P. Koumoutsakos, “Forecasting of spatiotemporal chaotic dynamics with recurrent neural networks: a comparative study of reservoir computing and backpropagation algorithms,” Neural Netw. 126, 191–217 (2020).

[27] Z. Lu, J. Pathak, B. R. Hunt, M. Girvan, R. Brockett, and E. Ott, “Reservoir observers: Model-free inference of unmeasured variables in chaotic systems,” Chaos 27, 041102 (2017).

[28] J. Pathak, B. R. Hunt, M. Girvan, Z. Lu, and E. Ott, “Model-free prediction of large spatiotemporally chaotic systems from data: A reservoir computing approach,” Phys. Rev. Lett. 120, 024102 (2018).

[29] F Chill`a and J Schumacher, “New perspectives in turbulent Rayleigh-B´enard convection,” Eur. Phys. J. E 35, 58 (2012).

[30] J. D. Scheel, M. S. Emran, and J. Schumacher, “Resolving the fine-scale structure in turbulent Rayleigh-B´enard convec- tion,” New J. Phys. 15, 113063 (2013).

[31] L. Sirovich and H. Park, “Turbulent thermal convection in a finite domain: Part I. Theory,” Phys. Fluids A 2, 1649–1658 (1990).

[32] H. Park and L. Sirovich, “Turbulent thermal convection in a finite domain: Part II. Numerical results,” Phys. Fluids A 2, 1659–1668 (1990).

[33] J. Bailon-Cuba, M. S. Emran, and J. Schumacher, “Aspect ratio dependence of heat transfer and large-scale flow in turbulent convection,” J. Fluid Mech. 655, 152–173 (2010).

[34] J. Bailon-Cuba and J. Schumacher, “Low-dimensional model of turbulent Rayleigh-B´enard convection in a Cartesian cell with square domain,” Phys. Fluids 23, 077101 (2011).

[35] K. Krischer, R. Rico-Mart´ınez, I. G. Kevrekidis, H. H. Rotermund, G. Ertl, and J. L. Hudson, “Model identification of a catalytic spatiotemporally varying reaction,” AIChE J. 39, 89–98 (1993).

[36] Z. Y. Wan, P. Vlachas, P. Koumoutsakos, and T. P. Sapsis, “Data-assisted reduced-order modeling of extreme events in complex dynamical systems,” PLoS ONE 13, e0197704 (2018).

[37] P. R. Vlachas, W. Byeon, Z. Y. Wan, T. P. Sapsis, and P. Koumoutsakos, “Data-driven forecasting of high-dimensional chaotic systems with long short-term memory networks,” Proc. R. Soc. A 474, 20170844 (2018).

[38] Z. Deng, Y. Chen, Y. Liu, and K. C. Kim, “Time-resolved turbulent velocity field reconstruction using a long short-term memory (LSTM)-based artificial intelligence framework,” Phys. Fluids 31, 075108 (2019).

[39] A. T. Mohan and D. V. Gaitonde, “A deep learning based approach to reduced order modeling for turbulent flow control using LSTM neural networks,” arXiv:1804.09269 (2018).

[40] S. Pawar, S. M. Rahman, H. Vaddireddy, O. San, A. Rasheed, and P. Vedula, “A deep learning enabler for nonintrusive reduced order modeling of fluid flows,” Phys. Fluids 31, 085101 (2019).

[41] P. F. Fischer, “An overlapping Schwarz method for spectral element solution of the incompressible Navier-Stokes equations,” J. Comput. Phys. 133, 84–101 (1997).

[42] B. Podvin and A. Sergent, “Proper orthogonal decomposition investigation of turbulent Rayleigh-B´enard convection in a rectangular cavity,” Phys. Fluids 24, 105106 (2012).

[43] B. Podvin and A. Sergent, “A large-scale investigation of wind reversal in a square Rayleigh-B´enard cell,” J. Fluid Mech. 766, 172–201 (2015).

[44] L. Sirovich, “Turbulence and the dynamics of coherent structures. Part I: Coherent structures,” Quaterly of Applied Math. XLV, 561–571 (1987).

[45] J. Weiss, “A tutorial on the Proper Orthogonal Decomposition,” in AIAA Aviation 2019 Forum (2019) p. 3333.

[46] P. Holmes, J. L. Lumley, and Berkooz G., Turbulence, Coherent Structures, Dynamical Systems and Symmetry (Cambridge University Press, 1996).

[47] T. Strauss, W. Wustlich, and R. Labahn, “Design strategies for weight matrices of echo state networks,” Neural Comput. 24, 3246–3276 (2012).

[48] J. S. Bergstra, R. Bardenet, Y. Bengio, and B. K´egl, “Algorithms for hyper-parameter optimization,” in Advances in Neural Information Processing Systems (2011) pp. 2546–2554.

[49] J. Snoek, H. Larochelle, and R. P. Adams, “Practical bayesian optimization of machine learning algorithms,” in Advances in Neural Information Processing Systems (2012) pp. 2951–2959.

[50] H. Jaeger, Tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the ”echo state network” approach, Vol. 5 (GMD-Forschungszentrum Informationstechnik Bonn, 2002).

[51] J. Pathak, Z. Lu, B. R. Hunt, M. Girvan, R. Brockett, and E. Ott, “Using machine learning to replicate chaotic attractors and calculate Lyapunov exponents from data,” Chaos 27, 121102 (2017).

[52] A. K. De, V. Eswaran, and P. K. Mishra, “Dynamics of plumes in turbulent Rayleigh-B´enard convection,” Eur. J. Mech. B/Fluids 72, 164–178 (2018).

[53] M. Mattheakis, P. Protopapas, D. Sondak, M. Di Giovanni, and E. Kaxiras, “Physical symmetries embedded in neural networks,” arXiv:1904.08991 (2019).

[54] C. Gallicchio and A. Micheli, “Richness of deep echo state networks,” LNCS 11506, in press (2019).

[55] O Pauluis and J Schumacher, “Self-aggregation of clouds in conditionally unstable moist convection,” Proc. Natl. Acad. Sci. USA 108, 12623–12628 (2011).

[56] P. Gentine, M. Pritchard, S. Rasp, G. Reinaudi, and G. Yacalis, “Could machine learning break the convection parametriza- tion deadlock,” Geophys. Res. Lett. 45, 5742 (2018).

Designed for Accessibility and to further Open Science