b

DiscoverSearch
About
My stuff
Solving for high dimensional committor functions using artificial neural networks
2018·arXiv
Abstract
Abstract

In this note we propose a method based on artificial neural network to study the transition between states governed by stochastic processes. In particular, we aim for numerical schemes for the committor function, the central object of transition path theory, which satisfies a high-dimensional Fokker-Planck equation. By working with the variational formulation of such partial differential equation and parameterizing the committor function in terms of a neural network, approximations can be obtained via optimizing the neural network weights using stochastic algorithms. The numerical examples show that moderate accuracy can be achieved for high-dimensional problems.

In this paper, we study the transition between two states described by the overdamped Langevin process

image

where  Xt ∈ Ω ⊂ Rd, U : Rd → R, β= 1/T, and  Wtis a d-dimensional Wiener process, using the transition path theory [7, 8, 15]. The central object in the transition path theory is the committor function. Let  τDbe the first hitting time for region  D ⊂Ω. For two disjoint regions  A, B ⊂Ω, the committor function is defined as

image

which is the probability of hitting region B before region A with the stochastic process (1) starting at x. The committor function q(x) provides useful statistical description on properties such as density and probability current of reaction trajectory. However, obtaining the committor function q(x) can be a formiddable task, as it involves solving a high-dimensional Fokker-Planck equation

image

The high dimensional nature of (3) renders obtaining q(x) via finite-element-type methods intractable. On the other hand, since the transition paths are often localized to a quasi one-dimensional reaction tube, the region of interest is rather small compare to Ω. Under this approximation, the finite temperature string method [6, 18] is developed to simultaneously find the best “tube” and the corresponding committor function. Another approach is based on explicit dimension reduction using e.g., the leading eigenfunctions

of the generator  −β−1∆+∇U(x)·∇by means of diffusion maps [5, 4] to approximate q(x). More recently, a direct point cloud discretization for the Fokker-Planck equation [13] has been also considered. In recent years, the artificial neural-network (NN) has shown great success in representing high-dimensional probablity distributions or classifiers in a variety of machine learning tasks [10, 14, 16]. Motivated by those recent success, in this note we use an NN to provide a low-dimensional parameterization of the committor function q(x) → qθ(x) (4) where  θ ∈ Rpis the parameter vector of the NN.  qθ(x) is then obtained by minimizing the variational formulation of (3) over  θas a nonlinear Galerkin method. In this way, we switch from solving a partial differential equation to solving an optimization problem. Optimizing the cost in the variational formulation of (3) using gradient-descent-based method involves computing an integral with respect to the equilibrium measure. For doing such integration, we use a Monte-Carlo method where the samples are generated according to the stochastic process (1). An NN parameterized committor function can potentially be used to guide further sampling in transition regions wherein sample density is low (in similar spirit as [19] where an NN parameterized force field is used to guide sampling in the application of molecular dynamics). This paper is organized as the following. In Section 2, we design an NN tailored for solving for the committor function. In Section 3, we demonstrate the success of the proposed method in a few examples. In Section 4, we conclude the note. Before moving on, in the next subsection we survey related methods for solving (3).

1.1 Previous approaches

A popular way to solve (3) is to discretize the generator  −β−1∆ + ∇U(x) · ∇of the overdamped Langevin process (1) on the sampled points using diffusion map [5, 4]. The lower eigenmodes of such discretized operator in principle can provide a low-dimensional (nonlinear) reparameterization of the committor function. If the transition trajectories lie on a low dimensional manifold, it is possible to discretize the generator accurately via sampling. More precisely, with N samples, let  K ∈ RN×N, the diagonal matrix D ∈ RN×N, the discretized generator  L ∈ RN×Nbe defined as

image

respectively. Let  qA, qB, qΩ\A∪Bbe vectors corresponding to committor function values on the points belong to regions  A, B, Ω \ A ∪ B, the committor function q satisfies

image

In the presence of a spectral gap the eigenmodes of  L(Ω\A∪B, Ω\A∪B) provides reduced coordinates for the reaction tube, thus solving (6) for q can be seen as expanding q using the reduced coordinates. However, discretizing the generator using diffusion map may suffer from low order of convergence. Therefore [13] improves upon diffusion map by explicitly constructing the tangent plane of each point in the sampled point cloud and discretizing the generator in each of the tangent plane.

On the other hand, recent years have seen usage of machine learning techniques in solving high-dimensional partial differential equations. The success of [3] where an NN parameterized spin wavefunction is used as an ansatz for solving the many-body Schr¨odinger equation motivates us to consider solving (3) using an NN as well. Our work is also similar to the methods in [12, 17, 2, 9] for solving partial differential equations. [12, 2] demonstrate success of NN-based method for solving boundary values problem

image

where  b(x) = 0 on ∂Ω, a(x) is a smooth function that satisfies the boundary conditions on  ∂Ω, and n(x)is an NN-parameterized function. Then u is found from solving

image

leading to an optimization problem over the NN parameters. The improvement of [2] over [12] is that a(x) and b(x) are also learned as a neural-network separately from n(x), whereas in [12] they are obtained via explicit construction. Such methods remove the need of specifying basis for discretizing u therefore can complement Galerkin-type method. While these methods obtain rather impressive results in low-dimension, their performance in high dimension is unexplored, which is in fact the most interesting regime. In a very recent work [17], a neural-network is used to parameterize the solution to a high-dimensional parabolic equation. Although such setting is similar the one we consider, in our case the boundary conditions might result singularities in u, making it more difficult to be approximated using an NN, which will be address in Section 2. Moreover, since [12, 17, 2] work with the strong form of a partial differential equation, the computational cost can be high as the second order derivative of u(x) is needed, whereas our approach is based on the variational formulation of the PDE. Although using an NN in solving the variational formulation of a PDE [9] has been explored before, [9] does not face the type of singularity issue arises in our application.

In this section, we present the general strategy of solving (3) using a neural-network. For simplicity, we let Ω =  Rd, U(x) be a confining potential that gives rise to an equilibrium measure  µ(x) :=exp(−βU(x))/Z(β)normalized on the region Ω\(A ∪ B), where  Z(β) :=�Ω\A∪Bexp(−βU(x))dx. Instead of working with the strong form (3), we solve the variational problem

image

To see the boundary conditions for  q on ∂Ω, let q∗(x) be the minimizer of (10) and  q(x, λ) = q∗(x)+λη(x).Since  q∗(x) is a stationary point, for any  η(x)

image

The third equality follows from  η(x) = 0 on  ∂A, ∂B, and requiring

image

via imposing suitable boundary condition on  ∂Ω. Here�∂Ω dsstands for the surface integral. When the domain is unbounded as the considered case, the condition

image

where  BRdenotes a ball with radius R, can ensure (12). Notice that we simply need  ∇q(x) to have subexponential growth as  |x| → ∞ for(13) to hold, as long as exp(−βU(x)) ≤exp(−a|x|) when |x| > Rfor some R, a > 0. The last equality in (11) implies that a solution to (10) provides a solution to (3).

As mentioned earlier, to cope with the high-dimensionality of q(x), the proposed method consists of

parameterizing q(x) as an NN function  qθ(x). Instead of (10), we solve

image

where the boundary conditions are only enforced as soft-constraints (with hardness tuned by the choice of ρ). The first integral is then approximated via sampling according to the overdamped Langevin process (1). To approximate the second and third integrals, for our problems there exist rather convenient scheme for drawing samples from  µ∂A(x), µ∂B(x). The choice of the measures on the boundaries is based on the consideration of sampling convenience. In our examples, we mainly work with regions A and B being balls, therefore the samples on  ∂A and ∂Bare drawn by normalizing and recentering normally distributed samples. Note that we can rewrite (14) as a single expectation

image

if we define a mixture measure

image

where  χΣ(x) is the characteristic function of region Σ,  αis a parameter that controls the proportion between the sample size in  ∂A ∪ ∂Bwith the sample size in Ω\(A ∪ B). This allows us to solve for (3) as an optimization problem (15) over NN weights using stochastic gradient type methods based on stochastic approximation of the expectation. While this is in principle straightforward, challenges arise due to the specific nature of the high dimensional Fokker-Planck equation we aim to solve. We discuss those challenges below and then the proposed neural-network design to overcome them.

2.1 Challenge in high T regime

Although this method seems straight-forward, the main difficulty in using an NN to approximate the committor function is that in some situations, singularities are present within region A and B. Consider the case where A and B are two balls of radius  r and Ω = Rd, d ≥3, centered at (−w0/2, 0, . . . , 0), (w0/2, 0, . . . , 0).When  T → ∞and  β →0, (3) becomes

image

The last boundary condition is there in order to satisfy (13). A solution to (16) can be obtained by first solving the Laplace’s equation with Dirichlet’s boundary conditions

image

and letting q(x) = (1/2)(˜q(x) + 1). A classical way to solve (17) analytically is via method of images. Using the Green’s function

image

where Γ(·) is the gamma function, the solution of (17) can be obtained as

image

where

image

image

Figure 1. The solution to (16) obtained from method of images (Red) and solving (15) (Blue) without taking care of the singularity issue.

where  c0= (2π)d/2/Γ(d/2). Due to the singularities in ˜q, the committor function may be steep near regions A, B which can present difficulties when using an NN approximation. To illustrate, we let d = 3, w0= 1, r = 0.15 and we plot the solution of (16) along the  x1-dimension in Fig. 1. As a contrast, we minimize (15) using an NN with 3 hidden-layers and tanh nonlinearities, where each hidden-layer has 12 nodes. 3e+04 samples sampled uniformly from the box [−2, 2]3 are used in the optimization problem. We let  ρ = 666 and α = 1/15 to enforce the boundary condition on  ∂A and ∂B. As shown in Fig. 1, the NN has difficulty capturing the behavior of the committor function near the singularities.

2.2 Challenge in low T regime

A different type of singularities can exist in the low temperature regime. Consider the potential

image

Here, (22) resembles a double potential well, and  ∂Aand  ∂Bare located in the potential wells. When the temperature is low, the equilibrium distribution for such U(x) is concentrated in A and B while the midpoint of A and B has a low density. Therefore, as q(x) goes from 0 to 1 when  x1goes from -1 to 1, it is preferrable for  |∇q(x)|to concentrate around the midpoint of A and B in order to have a low cost

image

As an example, we plot the committor function when T = 0.05 in Fig. 2 when d = 10. While the committor function is steep around  x1= 0, unlike the case of high T, an NN with a single hidden layer with tanh activation function gives a good approximation. Since our goal is simply to show qualitatively that an NN is capable of handling such singularity issue, we defer the implementation details to Section 3.

2.3 Neural-network architecture

In this subsection, we present an NN network architecture that can deal with the aforementioned challenges. When solving problem (16) via its variational formulation, a natural choice of the NN architecture is to mimic (20) and take

image

image

Figure 2. The committor function in (22) along  x1dimension when T = 0.05 for an arbitrarily chosen (x2, . . . , xd) with d = 10.

where  nθ1, nθ2are neural-network parameterized functions,  θ= [θT1 , θT2]T. By this ansatz, we explicitly remove the dominant singularities at  y+,0, y−,0. On the other hand, in order to determine the committor function of the transition process in the potential (22) when  β → ∞, (tanh(wf(x1)) + 1)/2 where w is alarge scalar and f is certain smooth function may be used to capture the sharp transition around the midpoint between  ∂Aand  ∂B. This suggests using tanh as nonlinearity in an NN.

Therefore, to deal the issue of singularity, we propose the following neural-network architecture to solve for the committor function

image

where  nθk(x)’s are functions parameterized by the NN,  Nsis the number of singularities, and each  Sk(x, yk)is a problem dependent function with singularity at  yk. The vector  θ= [θT0 , . . . , θTNs]Tcontains the parameters of the neural networks. Except  nθ0(x), each nθk(x) is an NN with 3 hidden layers where each hidden layer has 6 nodes.  nθ0(x) consists of multiple hidden layers each having 12 nodes. A hyperparameter we tune here is the number of hidden layers in  nθ0(x), where the choice of it is made using cross-validation. More precisely, the NN for committor function should give similar cost�x∈Ω\A∪B |∇qθ(x)|2exp(−βU(x))dxin training and testing samples. We use tanh as the activation function of the hidden nodes. At high temperature, we expect singularities of 1/|x|d−2type to be dominant, whereas at low temperature the function  nθ0(x) withtanh nonlinearities should be the main contributor to the committor function. The pipeline of solving for  qθ(x) is depicted in Fig. 3. As shown in Fig. 4, when using such architecture to solve for the variational form of (16), we indeed recover the 1/|x| type behavior near A and B.

In this section, we evaluate the proposed method in a few numerical examples. In these examples, the minimization in (15) using such NN architecture is done using the Adam [11] optimizer, a variant of stochastic gradient descent, in the TensorFlow [1] engine. The ratio 2αof samples on  ∂A ∪ ∂Bto samples on Ω  \ ∂A ∪ ∂Bare kept between 1/10 to 1/100. Then  ρis tuned in order to have the boundary conditions satisfied with 10−3accuracy. In all of the experiments, 2000 boundary samples are used, and we set the batch size to be 3000 in the Adam optimizer. We evaluate the performance using the following metric

image

image

Figure 3. An example of the neural network architecture for a committor function with two 1/|x|d−2 typesingularities.

image

Figure 4. The solution to (3) from method of images (Blue) and from solving (15) when explicitly including 1/|x| type singularities in the NN.

where  vR(q) :=  kBT/Z(β)�Ω\A∪B |∇q(x)|2exp(−βU(x))dxis the rate of reaction. Note that  vRis the energy one minimizes for q in the variational formulation. To calculate these errors, we generate samples by simulating the stochastic process (1).

In the first numerical experiment, we solve for the committor function in the potential well (22) with regions A and B being (23) when d = 10. In this case, q(x) =  f(x1) where

image

To solve this problem using an NN, we set  Ns= 0 in (26) as there is no singularity in this problem. In nθ0(x), only one hidden layer is used. In this example, we sample differently from what is presented in (15). When T is small, it is difficult to obtain sufficient samples near the x = 0 saddle point of U(x). Therefore, instead of working with (15) directly, we sample  x1uniformly from [−1, 1], (x2, . . . , xd)from a  d −1-dimensional gaussian distribution, and change the first term of the integrand in (15) from |∇qθ(x)|2χΩ\A∪B(x) to

image

to ensure sufficient sample coverages along  x1. In this case,

image

For a subset of these samples, we let  x1 = 1, −1 to get the samples on the boundaries  ∂A, ∂B. We use aseparate batch of samples, serving as validation dataset, to determine  E1and  E2. In Table 1 we report the error and the number of samples used for solving this problem in dimension d = 10 with temperature T = 0.2, 0.05.

image

Table 1. Results for the double well potential (22) between two planes.

In the second experiment, we solve for the committor function for the transition process between a pair of coecentric spheres, with potential

image

In this example, even with moderate T, the committor function still display a singular behavior  q ∼1/|x|d−2, d ≥3. Therefore in (26) we let  Ns= 1,  S1= 1/|x|d−2. We use 3 hidden layers for  nθ0. The equilibrium density is proportional to exp(−β|x|2), therefore the samples can be drawn from the gaussian distribution. The samples on the two boundaries are obtained via rescaling samples from the normal distribution to have norm a or b. The results for T = 2, d = 6, a = 1, b = 0.25 are summarized in Table 2. We compare the solution with and without including the 1/|x|d−2 type singularity. It is worth noting that the explicit inclusion of singularity is rather important for this example even at moderate temperature. In Fig. 5a, we plot  qθ(x) along several randomly chosen radial directions to check whether  qθ(x) is close to a single-variable function when explicitly including a singular function in the NN architecture. In Fig. 5b, we plot the NN committor function when the singularity is not explicitly taken care of.

image

Table 2. Results for the coecentric spheres example. We compare the cases when  Ns= 0 and  Ns= 1.

image

Figure 5. The committor function for the stochastic process (31) between a pair of coecentric spheres as a function of |x|. We compare the case when singularity is explictly included in the NN parameterization and the case when singularity is not included (in (a) and (b) respectively). Red: The ground truth committor function. Blue: The NN parameterized committor function along three different choices of radial direction.

In the third experiment, we work with the rugged-Muller potential

image

considered in [13]. This is a Muller potential perturbed by a rugged potential in the first two dimension, where the roughness is controlled by  γ, k. In the rest of the dimensions, we place a quadratic potential well where its strength is controlled by  σ. The domain Ω = [−1.5,1]  × [−0.5,2]. The parameters in (33) are taken from [13], for completeness we provide them in the following:

image

In this example, we let T = 40, 22, regions A and B being two balls with radius 0.1 centered at (−0.57, 1.43)and (−0.56, 0.044). The points are again sampled using Euler-Maruyama scheme. The ground truth is obtained via applying finite element method on uniform grid to (3), where the code is provided by the authors of [13]. We use an NN with two singularities of type log(|x − y|) where y is the position of singularity, and  nθ0that has 3 hidden layers. The results are reported in Table 3 and the contours of the committor function are shown in Fig. 6. As shown in the table, although we can achieve few percents accuracy, for the case with lower temperature more samples are needed to determine the committor function (since the equilibrium distribution is less smooth).

image

Table 3. Results for the rugged Muller potential example.

image

Figure 6. Figures for the rugged Muller potential on a 2-dimensional hyperplane. (a) and (d): The equilibrium distribution when T = 40, 22 for the rugged-Muller potential. (b) and (c): The ground truth committor function and the NN parameterized committor function for T = 40. (e) and (f): The ground truth committor function and the NN parameterized committor function for T = 22.

In this note, we develop method based on neural-network to represent the high-dimensional committor function. The neural-network parameters are found via optimizing the variational form of the FokkerPlanck equation. In order to better approximate the committor function, the NN function has to be designed carefully in order to deal with the singularities in high and low T regime. Through numerical experiments, we show the usefulness of the proposed alternative approach in dealing with high-dimensional partial differential equations. We remark that the quality of the learned committor function depends crucially on sampling. When the temperature is low, due to the sparsity of samples between regions A and B when a naive sampling scheme is used, the NN approximation to the committor function tends to make a transition that is too sharp compare to the ground truth. The usage of enhanced sampling schemes, for example using the currently learned NN to guide further sampling, is certainly an important future direction to investigate.

[1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, J. Kudlur, Manjunath Levenberg, D. Mane, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viegas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.

[2] J. Berg and K. Nystr¨om. A unified deep artificial neural network approach to partial differential equations in complex geometries. arXiv preprint arXiv:1711.06464, 2017.

[3] G. Carleo and M. Troyer. Solving the quantum many-body problem with artificial neural networks. Science, 355(6325):602–606, 2017.

[4] R. R. Coifman, I. G. Kevrekidis, S. Lafon, M. Maggioni, and B. Nadler. Diffusion maps, reduction coordinates, and low dimensional representation of stochastic systems. Multiscale Modeling & Simulation, 7(2):842–864, 2008.

[5] R. R. Coifman and S. Lafon. Diffusion maps. Applied and computational harmonic analysis, 21(1):5–30, 2006.

[6] W. E, W. Ren, and E. Vanden-Eijnden. Finite temparture string method for the study of rare events. J. Phys. Chem. B, 109:6688–6693, 2005.

[7] W. E and E. Vanden-Eijnden. Towards a theory of transition paths. Journal of statistical physics, 123(3):503, 2006.

[8] W. E and E. Vanden-Eijnden. Transition path theory and path-finding algorithms for the study of rare events. Ann. Rev. Phys. Chem., 61:391–420, 2010.

[9] W. E and B. Yu. The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, pages 1–12.

[10] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.

[11] D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[12] I. E. Lagaris, A. Likas, and D. I. Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. IEEE Transactions on Neural Networks, 9(5):987–1000, 1998.

[13] R. Lai and J. Lu. Point cloud discretization of Fokker-Planck operators for committor functions. Multiscale Model. Simul. in press; arXiv preprint arXiv:1703.09359, 2017.

[14] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521(7553):436–444, 2015.

[15] J. Lu and J. Nolen. Reactive trajectories and the transition path process. Probab. Theory Related Fields, 161:195–244, 2015.

[16] J. Schmidhuber. Deep learning in neural networks: An overview. Neural networks, 61:85–117, 2015.

[17] J. Sirignano and K. Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. arXiv preprint arXiv:1708.07469, 2017.

[18] E. Vanden-Eijnden and M. Venturoli. Revisiting the finite temperature string method for the calculation of reaction tubes and free energies. J. Chem. Phys., 130:194103, 2009.

[19] L. Zhang, H. Wang, and W. E. Reinforced dynamics of large atomic and molecular systems. arXiv preprint arXiv:1712.03461, 2017.


Designed for Accessibility and to further Open Science