In this paper, we study the transition between two states described by the overdamped Langevin process
where = 1/T, and
is a d-dimensional Wiener process, using the transition path theory [7, 8, 15]. The central object in the transition path theory is the committor function. Let
be the first hitting time for region
Ω. For two disjoint regions
Ω, the committor function is defined as
which is the probability of hitting region B before region A with the stochastic process (1) starting at x. The committor function q(x) provides useful statistical description on properties such as density and probability current of reaction trajectory. However, obtaining the committor function q(x) can be a formiddable task, as it involves solving a high-dimensional Fokker-Planck equation
The high dimensional nature of (3) renders obtaining q(x) via finite-element-type methods intractable. On the other hand, since the transition paths are often localized to a quasi one-dimensional reaction tube, the region of interest is rather small compare to Ω. Under this approximation, the finite temperature string method [6, 18] is developed to simultaneously find the best “tube” and the corresponding committor function. Another approach is based on explicit dimension reduction using e.g., the leading eigenfunctions
of the generator by means of diffusion maps [5, 4] to approximate q(x). More recently, a direct point cloud discretization for the Fokker-Planck equation [13] has been also considered. In recent years, the artificial neural-network (NN) has shown great success in representing high-dimensional probablity distributions or classifiers in a variety of machine learning tasks [10, 14, 16]. Motivated by those recent success, in this note we use an NN to provide a low-dimensional parameterization of the committor function
) (4) where
is the parameter vector of the NN.
) is then obtained by minimizing the variational formulation of (3) over
as a nonlinear Galerkin method. In this way, we switch from solving a partial differential equation to solving an optimization problem. Optimizing the cost in the variational formulation of (3) using gradient-descent-based method involves computing an integral with respect to the equilibrium measure. For doing such integration, we use a Monte-Carlo method where the samples are generated according to the stochastic process (1). An NN parameterized committor function can potentially be used to guide further sampling in transition regions wherein sample density is low (in similar spirit as [19] where an NN parameterized force field is used to guide sampling in the application of molecular dynamics). This paper is organized as the following. In Section 2, we design an NN tailored for solving for the committor function. In Section 3, we demonstrate the success of the proposed method in a few examples. In Section 4, we conclude the note. Before moving on, in the next subsection we survey related methods for solving (3).
1.1 Previous approaches
A popular way to solve (3) is to discretize the generator of the overdamped Langevin process (1) on the sampled points using diffusion map [5, 4]. The lower eigenmodes of such discretized operator in principle can provide a low-dimensional (nonlinear) reparameterization of the committor function. If the transition trajectories lie on a low dimensional manifold, it is possible to discretize the generator accurately via sampling. More precisely, with N samples, let
, the diagonal matrix
, the discretized generator
be defined as
respectively. Let be vectors corresponding to committor function values on the points belong to regions
, the committor function q satisfies
In the presence of a spectral gap the eigenmodes of ) provides reduced coordinates for the reaction tube, thus solving (6) for q can be seen as expanding q using the reduced coordinates. However, discretizing the generator using diffusion map may suffer from low order of convergence. Therefore [13] improves upon diffusion map by explicitly constructing the tangent plane of each point in the sampled point cloud and discretizing the generator in each of the tangent plane.
On the other hand, recent years have seen usage of machine learning techniques in solving high-dimensional partial differential equations. The success of [3] where an NN parameterized spin wavefunction is used as an ansatz for solving the many-body Schr¨odinger equation motivates us to consider solving (3) using an NN as well. Our work is also similar to the methods in [12, 17, 2, 9] for solving partial differential equations. [12, 2] demonstrate success of NN-based method for solving boundary values problem
where ) is a smooth function that satisfies the boundary conditions on
is an NN-parameterized function. Then u is found from solving
leading to an optimization problem over the NN parameters. The improvement of [2] over [12] is that a(x) and b(x) are also learned as a neural-network separately from n(x), whereas in [12] they are obtained via explicit construction. Such methods remove the need of specifying basis for discretizing u therefore can complement Galerkin-type method. While these methods obtain rather impressive results in low-dimension, their performance in high dimension is unexplored, which is in fact the most interesting regime. In a very recent work [17], a neural-network is used to parameterize the solution to a high-dimensional parabolic equation. Although such setting is similar the one we consider, in our case the boundary conditions might result singularities in u, making it more difficult to be approximated using an NN, which will be address in Section 2. Moreover, since [12, 17, 2] work with the strong form of a partial differential equation, the computational cost can be high as the second order derivative of u(x) is needed, whereas our approach is based on the variational formulation of the PDE. Although using an NN in solving the variational formulation of a PDE [9] has been explored before, [9] does not face the type of singularity issue arises in our application.
In this section, we present the general strategy of solving (3) using a neural-network. For simplicity, we let Ω = ) be a confining potential that gives rise to an equilibrium measure
exp(
normalized on the region Ω
), where
) :=
exp(
. Instead of working with the strong form (3), we solve the variational problem
To see the boundary conditions for ) be the minimizer of (10) and
Since
) is a stationary point, for any
)
The third equality follows from ) = 0 on
, and requiring
via imposing suitable boundary condition on Ω. Here
stands for the surface integral. When the domain is unbounded as the considered case, the condition
where denotes a ball with radius R, can ensure (12). Notice that we simply need
) to have subexponential growth as
(13) to hold, as long as exp(
exp(
for some R, a > 0. The last equality in (11) implies that a solution to (10) provides a solution to (3).
As mentioned earlier, to cope with the high-dimensionality of q(x), the proposed method consists of
parameterizing q(x) as an NN function ). Instead of (10), we solve
where the boundary conditions are only enforced as soft-constraints (with hardness tuned by the choice of ). The first integral is then approximated via sampling according to the overdamped Langevin process (1). To approximate the second and third integrals, for our problems there exist rather convenient scheme for drawing samples from
). The choice of the measures on the boundaries is based on the consideration of sampling convenience. In our examples, we mainly work with regions A and B being balls, therefore the samples on
are drawn by normalizing and recentering normally distributed samples. Note that we can rewrite (14) as a single expectation
if we define a mixture measure
where ) is the characteristic function of region Σ,
is a parameter that controls the proportion between the sample size in
with the sample size in Ω
). This allows us to solve for (3) as an optimization problem (15) over NN weights using stochastic gradient type methods based on stochastic approximation of the expectation. While this is in principle straightforward, challenges arise due to the specific nature of the high dimensional Fokker-Planck equation we aim to solve. We discuss those challenges below and then the proposed neural-network design to overcome them.
2.1 Challenge in high T regime
Although this method seems straight-forward, the main difficulty in using an NN to approximate the committor function is that in some situations, singularities are present within region A and B. Consider the case where A and B are two balls of radius 3, centered at (
When
and
0, (3) becomes
The last boundary condition is there in order to satisfy (13). A solution to (16) can be obtained by first solving the Laplace’s equation with Dirichlet’s boundary conditions
and letting q(x) = (1/2)(˜q(x) + 1). A classical way to solve (17) analytically is via method of images. Using the Green’s function
where Γ() is the gamma function, the solution of (17) can be obtained as
where
Figure 1. The solution to (16) obtained from method of images (Red) and solving (15) (Blue) without taking care of the singularity issue.
where = (2
2). Due to the singularities in ˜q, the committor function may be steep near regions A, B which can present difficulties when using an NN approximation. To illustrate, we let d = 3,
= 1, r = 0.15 and we plot the solution of (16) along the
-dimension in Fig. 1. As a contrast, we minimize (15) using an NN with 3 hidden-layers and tanh nonlinearities, where each hidden-layer has 12 nodes. 3e+04 samples sampled uniformly from the box [
are used in the optimization problem. We let
15 to enforce the boundary condition on
. As shown in Fig. 1, the NN has difficulty capturing the behavior of the committor function near the singularities.
2.2 Challenge in low T regime
A different type of singularities can exist in the low temperature regime. Consider the potential
Here, (22) resembles a double potential well, and and
are located in the potential wells. When the temperature is low, the equilibrium distribution for such U(x) is concentrated in A and B while the midpoint of A and B has a low density. Therefore, as q(x) goes from 0 to 1 when
goes from -1 to 1, it is preferrable for
to concentrate around the midpoint of A and B in order to have a low cost
As an example, we plot the committor function when T = 0.05 in Fig. 2 when d = 10. While the committor function is steep around = 0, unlike the case of high T, an NN with a single hidden layer with tanh activation function gives a good approximation. Since our goal is simply to show qualitatively that an NN is capable of handling such singularity issue, we defer the implementation details to Section 3.
2.3 Neural-network architecture
In this subsection, we present an NN network architecture that can deal with the aforementioned challenges. When solving problem (16) via its variational formulation, a natural choice of the NN architecture is to mimic (20) and take
Figure 2. The committor function in (22) along dimension when T = 0.05 for an arbitrarily chosen (
) with d = 10.
where are neural-network parameterized functions,
= [
]
. By this ansatz, we explicitly remove the dominant singularities at
. On the other hand, in order to determine the committor function of the transition process in the potential (22) when
tanh(
large scalar and f is certain smooth function may be used to capture the sharp transition around the midpoint between
and
. This suggests using tanh as nonlinearity in an NN.
Therefore, to deal the issue of singularity, we propose the following neural-network architecture to solve for the committor function
where )’s are functions parameterized by the NN,
is the number of singularities, and each
is a problem dependent function with singularity at
. The vector
= [
contains the parameters of the neural networks. Except
) is an NN with 3 hidden layers where each hidden layer has 6 nodes.
) consists of multiple hidden layers each having 12 nodes. A hyperparameter we tune here is the number of hidden layers in
), where the choice of it is made using cross-validation. More precisely, the NN for committor function should give similar cost
exp(
in training and testing samples. We use tanh as the activation function of the hidden nodes. At high temperature, we expect singularities of 1
type to be dominant, whereas at low temperature the function
tanh nonlinearities should be the main contributor to the committor function. The pipeline of solving for
) is depicted in Fig. 3. As shown in Fig. 4, when using such architecture to solve for the variational form of (16), we indeed recover the 1/|x| type behavior near A and B.
In this section, we evaluate the proposed method in a few numerical examples. In these examples, the minimization in (15) using such NN architecture is done using the Adam [11] optimizer, a variant of stochastic gradient descent, in the TensorFlow [1] engine. The ratio 2of samples on
to samples on Ω
are kept between 1/10 to 1/100. Then
is tuned in order to have the boundary conditions satisfied with 10
accuracy. In all of the experiments, 2000 boundary samples are used, and we set the batch size to be 3000 in the Adam optimizer. We evaluate the performance using the following metric
Figure 3. An example of the neural network architecture for a committor function with two 1singularities.
Figure 4. The solution to (3) from method of images (Blue) and from solving (15) when explicitly including 1/|x| type singularities in the NN.
where ) :=
exp(
is the rate of reaction. Note that
is the energy one minimizes for q in the variational formulation. To calculate these errors, we generate samples by simulating the stochastic process (1).
In the first numerical experiment, we solve for the committor function in the potential well (22) with regions A and B being (23) when d = 10. In this case, q(x) = ) where
To solve this problem using an NN, we set = 0 in (26) as there is no singularity in this problem. In
), only one hidden layer is used. In this example, we sample differently from what is presented in (15). When T is small, it is difficult to obtain sufficient samples near the x = 0 saddle point of U(x). Therefore, instead of working with (15) directly, we sample
uniformly from [
from a
1-dimensional gaussian distribution, and change the first term of the integrand in (15) from
) to
to ensure sufficient sample coverages along . In this case,
For a subset of these samples, we let 1 to get the samples on the boundaries
separate batch of samples, serving as validation dataset, to determine
and
. In Table 1 we report the error and the number of samples used for solving this problem in dimension d = 10 with temperature T = 0.2, 0.05.
Table 1. Results for the double well potential (22) between two planes.
In the second experiment, we solve for the committor function for the transition process between a pair of coecentric spheres, with potential
In this example, even with moderate T, the committor function still display a singular behavior 1
3. Therefore in (26) we let
= 1,
= 1
. We use 3 hidden layers for
. The equilibrium density is proportional to exp(
), therefore the samples can be drawn from the gaussian distribution. The samples on the two boundaries are obtained via rescaling samples from the normal distribution to have norm a or b. The results for T = 2, d = 6, a = 1, b = 0.25 are summarized in Table 2. We compare the solution with and without including the 1
type singularity. It is worth noting that the explicit inclusion of singularity is rather important for this example even at moderate temperature. In Fig. 5a, we plot
) along several randomly chosen radial directions to check whether
) is close to a single-variable function when explicitly including a singular function in the NN architecture. In Fig. 5b, we plot the NN committor function when the singularity is not explicitly taken care of.
Table 2. Results for the coecentric spheres example. We compare the cases when = 0 and
= 1.
Figure 5. The committor function for the stochastic process (31) between a pair of coecentric spheres as a function of |x|. We compare the case when singularity is explictly included in the NN parameterization and the case when singularity is not included (in (a) and (b) respectively). Red: The ground truth committor function. Blue: The NN parameterized committor function along three different choices of radial direction.
In the third experiment, we work with the rugged-Muller potential
considered in [13]. This is a Muller potential perturbed by a rugged potential in the first two dimension, where the roughness is controlled by . In the rest of the dimensions, we place a quadratic potential well where its strength is controlled by
. The domain Ω = [
1]
2]. The parameters in (33) are taken from [13], for completeness we provide them in the following:
In this example, we let T = 40, 22, regions A and B being two balls with radius 0.1 centered at (and (
044). The points are again sampled using Euler-Maruyama scheme. The ground truth is obtained via applying finite element method on uniform grid to (3), where the code is provided by the authors of [13]. We use an NN with two singularities of type log(
) where y is the position of singularity, and
that has 3 hidden layers. The results are reported in Table 3 and the contours of the committor function are shown in Fig. 6. As shown in the table, although we can achieve few percents accuracy, for the case with lower temperature more samples are needed to determine the committor function (since the equilibrium distribution is less smooth).
Table 3. Results for the rugged Muller potential example.
Figure 6. Figures for the rugged Muller potential on a 2-dimensional hyperplane. (a) and (d): The equilibrium distribution when T = 40, 22 for the rugged-Muller potential. (b) and (c): The ground truth committor function and the NN parameterized committor function for T = 40. (e) and (f): The ground truth committor function and the NN parameterized committor function for T = 22.
In this note, we develop method based on neural-network to represent the high-dimensional committor function. The neural-network parameters are found via optimizing the variational form of the FokkerPlanck equation. In order to better approximate the committor function, the NN function has to be designed carefully in order to deal with the singularities in high and low T regime. Through numerical experiments, we show the usefulness of the proposed alternative approach in dealing with high-dimensional partial differential equations. We remark that the quality of the learned committor function depends crucially on sampling. When the temperature is low, due to the sparsity of samples between regions A and B when a naive sampling scheme is used, the NN approximation to the committor function tends to make a transition that is too sharp compare to the ground truth. The usage of enhanced sampling schemes, for example using the currently learned NN to guide further sampling, is certainly an important future direction to investigate.
[1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, J. Kudlur, Manjunath Levenberg, D. Mane, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viegas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.
[2] J. Berg and K. Nystr¨om. A unified deep artificial neural network approach to partial differential equations in complex geometries. arXiv preprint arXiv:1711.06464, 2017.
[3] G. Carleo and M. Troyer. Solving the quantum many-body problem with artificial neural networks. Science, 355(6325):602–606, 2017.
[4] R. R. Coifman, I. G. Kevrekidis, S. Lafon, M. Maggioni, and B. Nadler. Diffusion maps, reduction coordinates, and low dimensional representation of stochastic systems. Multiscale Modeling & Simulation, 7(2):842–864, 2008.
[5] R. R. Coifman and S. Lafon. Diffusion maps. Applied and computational harmonic analysis, 21(1):5–30, 2006.
[6] W. E, W. Ren, and E. Vanden-Eijnden. Finite temparture string method for the study of rare events. J. Phys. Chem. B, 109:6688–6693, 2005.
[7] W. E and E. Vanden-Eijnden. Towards a theory of transition paths. Journal of statistical physics, 123(3):503, 2006.
[8] W. E and E. Vanden-Eijnden. Transition path theory and path-finding algorithms for the study of rare events. Ann. Rev. Phys. Chem., 61:391–420, 2010.
[9] W. E and B. Yu. The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, pages 1–12.
[10] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.
[11] D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[12] I. E. Lagaris, A. Likas, and D. I. Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. IEEE Transactions on Neural Networks, 9(5):987–1000, 1998.
[13] R. Lai and J. Lu. Point cloud discretization of Fokker-Planck operators for committor functions. Multiscale Model. Simul. in press; arXiv preprint arXiv:1703.09359, 2017.
[14] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
[15] J. Lu and J. Nolen. Reactive trajectories and the transition path process. Probab. Theory Related Fields, 161:195–244, 2015.
[16] J. Schmidhuber. Deep learning in neural networks: An overview. Neural networks, 61:85–117, 2015.
[17] J. Sirignano and K. Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. arXiv preprint arXiv:1708.07469, 2017.
[18] E. Vanden-Eijnden and M. Venturoli. Revisiting the finite temperature string method for the calculation of reaction tubes and free energies. J. Chem. Phys., 130:194103, 2009.
[19] L. Zhang, H. Wang, and W. E. Reinforced dynamics of large atomic and molecular systems. arXiv preprint arXiv:1712.03461, 2017.