Neural networks and their modern computational implementations, generally called deep learning, have been successfully applied in several areas of mathematics and science in recent years. In this paper, we will generalize the methodology proposed in Sirignano and Spiliopoulos [2018] to numerically solve PDEs, known as Deep Galerkin Method (DGM), to an infinite dimensional setting. Applications of deep learning to solve PDEs date back to Lee and Kang [1990], Lagaris et al. [1998], Parisi et al. [2003]. Lately, many articles have dealt with the finite-dimensional PDE problems, see, for instance, E et al. [2017, 2018], Raissi et al. [2019], Al-Aradi et al. [2019]. Many of them examine non-linear PDEs and then consider the Backward Stochastic Differential Equation (BSDE) technique. We will not pursue this approach here.
We propose a numerical method based on neural networks to solve path-dependent partial differential equations (PPDEs) that arise from the functional calculus framework proposed in Dupire [2009]. This theory was first proposed in the aforesaid reference with the goal of extending results available for vanilla derivative contracts in Financial Mathematics to more general, path-dependent derivatives such as Asian, barrier and lookback options. Additionally, non-linear PPDEs appear in the context of stochastic optimal control and differential games, see Saporito [2019] and Pham and Zhang [2014].
One of the main features of this functional calculus is that all the modelling is non-anticipative, meaning that it does not look into the future of the evolution of the state dynamics. This fact suggests the choice of Long Short-Term Memory (LSTM) networks to model these objects. In fact, we propose a novel architecture that combines LSTM and feed-forward networks, which we call the Path-Dependent Deep Galerkin Method (PDGM) architecture, that captures the non-anticipativeness of functionals and deals with the necessary path deformations from this functional calculus.
Recently, in Fouque and Zhang [2019], it has been shown that the LSTM network can be used to numerically solve coupled forward anticipated BSDEs, and effectively approximate the conditional expectation for a non-Markovian process.
There are very few methods available to solve PPDEs; for a discussion about them, see Ren and Tan [2017] and references therein. In that paper, the authors summarize some numerical methods to deal with PPDEs, namely finite difference, trinomial tree and probabilistic schemes. These methods are either Monte Carlo or tree based. Our method differs from all of them by adopting the recent neural network approach for differential equations.
The closest work to ours, but different nonetheless, is Jacquier and Oumgari [2019]. In that paper, the authors consider the functional framework proposed by Viens and Zhang [2019], which generalizes the functional Itô calculus to deal with the fractional Brownian motion in a very inventive way. The numerical procedure proposed in Jacquier and Oumgari [2019] combines BSDEs and deep learning to numerically solve the PPDE that arises from the rough Heston model. Our approach could be modified to handle those PPDEs. However, this is outside the scope of this paper.
The paper is organized as follows. In Section 2 we introduce the functional Itô calculus and the main theoretical object of our study, the PPDEs. The algorithm is presented and studied in Section 3. Finally, we show several numerical examples in Section 4. In order to show the capabilities of the method, we mostly consider cases where closed-form solutions are available.
In this section, we review the notion of Path-Dependent Partial Differential Equations (PPDEs) and the theory from which they arise, the functional Itô calculus. Proposed in the seminal paper Dupire [2009], this framework allows us to apply the techniques of differential calculus to functions that depend on the history of the state variable being considered. It was first developed in the setting of Itô's stochastic calculus, but the generalization could equally be applied in the usual, deterministic differential calculus. Below we present the definitions and results necessary to define precisely what a PPDE is.
2.1 Functional Itô Calculus
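We start by fixing a time horizon $T > 0$. Denote by $\Lambda_t$ the space of càdlàg paths in $[0, t]$ taking values in $\mathbb{R}$ and define $\Lambda = \bigcup_{t \in [0,T]} \Lambda_t$. Capital letters will denote elements of $\Lambda$ (i.e. paths) and lower-case letters will denote spot values of paths. In symbols, $Y_t \in \Lambda$ means $Y_t \in \Lambda_t$ and $y_s = Y_t(s)$, for $s \le t$.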
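A functional is any function $f : \Lambda \to \mathbb{R}$. For such objects, we define, when the limits exist, the time and space functional derivatives, respectively, as
$$\Delta_t f(Y_t) = \lim_{\delta t \to 0^+} \frac{f(Y_{t,\delta t}) - f(Y_t)}{\delta t}, \qquad \Delta_x f(Y_t) = \lim_{h \to 0} \frac{f(Y_t^h) - f(Y_t)}{h},$$
where
$$Y_{t,\delta t}(u) = \begin{cases} y_u, & 0 \le u \le t, \\ y_t, & t \le u \le t + \delta t, \end{cases} \qquad Y_t^h(u) = \begin{cases} y_u, & 0 \le u < t, \\ y_t + h, & u = t, \end{cases}$$
see Figures 1 and 2. In the case when the path lies in a multidimensional space, the path deformations above are understood as follows: the flat extension is applied to all dimensions jointly and equally and the bump is applied to each dimension individually.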
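We consider here continuity of functionals as the usual continuity in metric spaces with respect to the metric:
$$d_\Lambda(Y_t, Z_s) = \|Y_{t,s-t} - Z_s\|_\infty + |s - t|,$$
where, without loss of generality, we are assuming $s \ge t$, $Y_{t,s-t}$ is the flat extension of $Y_t$ to $[0, s]$, and
$$\|Y_t\|_\infty = \sup_{u \in [0,t]} |y_u|.$$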
Figure 1: Flat extension of a path.
Figure 2: Bumped path.
The norm $|\cdot|$ is the usual Euclidean norm in the appropriate Euclidean space, depending on the dimension of the path being considered. This continuity notion could be relaxed, see, for instance, Oberhauser [2016].
Moreover, we say a functional f is boundedness-preserving if, for every compact set , there exists a constant C such that
, for every path
satisfying
for some
, see Cont and Fournié [2010].
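A functional is said to belong to $\mathbb{C}^{1,2}$ if it is $\Lambda$-continuous, boundedness-preserving and it has $\Lambda$-continuous, boundedness-preserving derivatives $\Delta_t f$, $\Delta_x f$ and $\Delta_{xx} f$. Here, clearly, $\Delta_{xx} f = \Delta_x(\Delta_x f)$.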
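Our numerical method is based on the following approximation of the functional derivatives: for a smooth functional $f$, we use
$$\Delta_t f(Y_t) \approx \frac{f(Y_{t,\delta t}) - f(Y_t)}{\delta t}, \qquad \Delta_x f(Y_t) \approx \frac{f(Y_t^h) - f(Y_t)}{h}, \qquad \Delta_{xx} f(Y_t) \approx \frac{f(Y_t^h) - 2 f(Y_t) + f(Y_t^{-h})}{h^2}, \tag{3}$$
where $Y_{t,\delta t}$ denotes the flat extension of $Y_t$ by $\delta t$ and $Y_t^{\pm h}$ the path bumped by $\pm h$ at time $t$. Additionally, one could obviously consider
$$\Delta_x f(Y_t) \approx \frac{f(Y_t^h) - f(Y_t^{-h})}{2h}.$$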
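A minimal NumPy sketch of these finite-difference approximations, assuming that a path is stored as an array of values on a uniform time grid and that the flat extension appends exactly one grid step. The helper names (`flat_extension`, `bump`, `functional_derivatives`) and the two test functionals are illustrative and are not taken from the paper's code; the space derivative uses the central difference.

```python
import numpy as np

def flat_extension(path, n_steps=1):
    """Extend the path by repeating its last value n_steps times (time deformation)."""
    return np.concatenate([path, np.full(n_steps, path[-1])])

def bump(path, h):
    """Shift only the last value of the path by h (space deformation)."""
    bumped = path.copy()
    bumped[-1] += h
    return bumped

def functional_derivatives(f, path, dt, h):
    """Finite-difference approximations of the time and space functional derivatives.

    f takes (path, dt) and returns a float; path holds y_0, y_dt, ..., y_t.
    """
    base = f(path, dt)
    d_t = (f(flat_extension(path), dt) - base) / dt
    up, down = f(bump(path, h), dt), f(bump(path, -h), dt)
    d_x = (up - down) / (2.0 * h)            # central difference
    d_xx = (up - 2.0 * base + down) / h**2   # second difference
    return d_t, d_x, d_xx

def running_integral(path, dt):
    """Left Riemann sum of the path over [0, t]."""
    return dt * np.sum(path[:-1])

def squared_spot(path, dt):
    """Path-independent functional y_t^2."""
    return path[-1] ** 2

rng = np.random.default_rng(0)
dt = 0.01
path = np.cumsum(np.sqrt(dt) * rng.standard_normal(51))  # illustrative discretized path
print(functional_derivatives(running_integral, path, dt, h=0.01))
print(functional_derivatives(squared_spot, path, dt, h=0.01))
```

For the running integral, the sketch returns a time derivative equal to the current spot value and vanishing space derivatives; for the squared spot value, it returns approximately $(0, 2y_t, 2)$.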
2.2 PPDEs
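For any $s \le t$ in $[0, T]$, denote by $\Lambda_{s,t}$ the space of $\mathbb{R}$-valued càdlàg paths on $[s, t]$. Now define the operator $\otimes : \Lambda_{0,s} \times \Lambda_{s,t} \to \Lambda_{0,t}$, the concatenation of paths, by
$$(Y \otimes Z)(u) = \begin{cases} y_u, & 0 \le u < s, \\ z_u - z_s + y_s, & s \le u \le t, \end{cases}$$
which is a paste of Y and Z.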
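Given functionals $\mu : \Lambda \to \mathbb{R}$ and $\sigma : \Lambda \to \mathbb{R}$ and fixing a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, we consider a process $x$ given by the stochastic differential equation (SDE)
$$dx_s = \mu(X_s)\,ds + \sigma(X_s)\,dw_s, \tag{4}$$
with $s \ge t$ and $X_t = Y_t$. The process $(w_s)_{s \in [0,T]}$ denotes a standard Brownian motion in $(\Omega, \mathcal{F}, \mathbb{P})$ and we assume $\mu$ and $\sigma$ are such that there exists a unique strong solution for the SDE (4). This unique solution will be denoted by $x_s^{Y_t}$ and the path solution from $t$ to $T$ by $X_{t,T}^{Y_t}$. We refer the reader, for instance, to Rogers and Williams [2000] for results on SDEs with functional coefficients.
Finally, we define the conditioned expectation as
$$\mathbb{E}[g(X_T) \mid Y_t] = \mathbb{E}\big[g\big(Y_t \otimes X_{t,T}^{Y_t}\big)\big],$$
for any functional $g$ on $\Lambda_T$ and any $Y_t \in \Lambda$. The path $Y_t \otimes X_{t,T}^{Y_t}$ is equal to the path $Y_t$ up to $t$ and follows the dynamics of the SDE (4) from $t$ to $T$ with initial path $Y_t$. Moreover, if we define the filtration $\mathcal{F}_t^x$ generated by $\{x_s ;\ s \le t\}$, one may prove
$$\mathbb{E}[g(X_T) \mid X_t] = \mathbb{E}[g(X_T) \mid \mathcal{F}_t^x] \quad \mathbb{P}\text{-a.s.},$$
where the expectation on the left-hand side is the one discussed above and the one on the right-hand side is the usual conditional expectation.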
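As a rough illustration of this conditioned expectation, the following Monte Carlo sketch assumes Brownian dynamics (zero drift, unit volatility) and paths stored on a uniform time grid. The function names and the running-integral payoff are hypothetical choices made for the example; the pasting step follows the concatenation rule described above, shifting the simulated piece so that it starts at the last observed value.

```python
import numpy as np

def concatenate(y_path, z_path):
    """Paste z (on [t, T]) after y (on [0, t]), shifting z to start at y's last value."""
    shifted = z_path - z_path[0] + y_path[-1]
    return np.concatenate([y_path[:-1], shifted])

def conditioned_expectation(g, y_path, n_future, dt, n_sims=20_000, seed=0):
    """Monte Carlo estimate of E[g(X_T) | Y_t] under Brownian dynamics (assumption)."""
    rng = np.random.default_rng(seed)
    increments = np.sqrt(dt) * rng.standard_normal((n_sims, n_future))
    zeros = np.zeros((n_sims, 1))
    future = np.concatenate([zeros, np.cumsum(increments, axis=1)], axis=1)
    values = [g(concatenate(y_path, z), dt) for z in future]
    return float(np.mean(values))

def running_integral(path, dt):
    """Left Riemann sum of the path over its whole time interval."""
    return dt * np.sum(path[:-1])

dt = 0.01
y_path = np.linspace(0.0, 1.0, 51)  # observed path on [0, 0.5]
estimate = conditioned_expectation(running_integral, y_path, n_future=50, dt=dt)
print(estimate)
```

For this payoff the estimate should be close to the integral of the observed path plus the last observed value times the remaining time, since the Brownian increments have zero mean.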
The Feynman-Kac formula in the classical stochastic calculus is a very important result that relates conditional expectations of functions of diffusions and PDEs. It turns out that a functional extension of this result is available.
Theorem 2.1 (Functional Feynman-Kac Formula; Dupire [2009]) Let x be a process given by the SDE (4). Consider functionals and
and define the functional f as
for any path . Thus, if
and
and
are
-continuous, then f satisfies the (linear) Path-dependent Partial Differential Equation (PPDE):
with , for any
in the topological support of the stochastic process x.
Remark 2.2 In diffusion models (and
), under mild assumptions on
and
, the Stroock-Varadhan Support Theorem states that the topological support of x is the space of continuous paths starting at
, see for instance [Pinsky, 1995, Chapter 2]. So, under these assumptions, the PPDE (6) will hold for any continuous path. See Jazaerli and Saporito [2017] for a discussion on this type of result in the case of SDEs with functional coefficients. For instance, the arithmetic and geometric Brownian motions have full support on the space of continuous paths, with the geometric Brownian motion (GBM) restricted to paths taking positive values.
Remark 2.3 Existence and uniqueness of classical solutions (in the functional sense) of PPDEs of the form (6) were studied in Flandoli and Zanco [2016], for instance. We refer the reader to the aforesaid reference for conditions on the functional parameters that ensure this result. Furthermore, several results have been developed to study non-linear versions of such PPDEs and the existence and uniqueness of viscosity solutions, see Ekren et al. [2014], Ekren et al. [2016a,b].
In this section, we will present our algorithm to numerically solve a vast class of PPDEs. The main idea of the algorithm is to apply the DGM methodology of Sirignano and Spiliopoulos [2018] to the PPDE framework. By DGM methodology we mean approximating the solution of the equation by a neural network that approximately satisfies, in a given sense, the equation and any additional conditions. In order to achieve this, we need a neural network architecture that correctly models the functionals that appear in PPDEs. Since functionals are non-anticipative, their value at any given time t does not depend on state values after t. Because of this characteristic, we consider a combination of feed-forward and LSTM networks.
Another difference between our setting and the DGM is the space where the equation is defined. In their case, the domain of the PDE is some subset of a Euclidean space. In our case, it is a subset of the space of paths or of the space of continuous paths.
3.1 Long Short-Term Memory (LSTM) architecture
We start by stating some useful definitions. A set of layers with input
in a feed-forward neural network can be defined as
where is some activation function such as
and
. Then the set of feed-forward neural networks with
hidden layers is defined as a composition of layers:
Unlike the unordered inputs of a feed-forward neural network, the inputs we encounter are often sequential, which is the case in our application. For instance, in natural language processing, one of the main topics is sentiment analysis, where, given paragraphs of text, one classifies them into different categories. In such cases, recurrent neural networks (RNNs) come into play: they store the information seen so far and use it to perform computations in the next step. However, it was shown in Bengio et al. [1994] that plain RNNs suffer from exploding or vanishing gradient problems. The LSTM network of Hochreiter and Schmidhuber [1997] is designed to tackle this problem; its inputs and outputs are controlled by gates inside each LSTM cell. This architecture is powerful for capturing long-range dependence in the data. Each LSTM cell is composed of a cell state, which contains the information, and three gates, which regulate the flow of information. Mathematically, the rule inside the ith cell follows, for ,
where the operator denotes the element-wise product. Additionally,
is known as the output vector with initial value
, and
is called the cell state vector with initial value
refers to the number of hidden units. Moreover,
are weight matrices, and
is the bias vector. These parameters are learned during training via a stochastic gradient descent algorithm combined with backpropagation to compute the gradients.
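To make the gate mechanics concrete, here is a minimal NumPy sketch of a standard LSTM cell (forget, input and output gates plus a candidate cell update). It follows the textbook formulation rather than the paper's exact notation, and the zero initial values, the weight scaling and the names `init_lstm` and `lstm_forward` are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_lstm(input_dim, hidden_dim, seed=0):
    """Random input weights A, recurrent weights U and zero biases b for the four blocks."""
    rng = np.random.default_rng(seed)
    gates = ("forget", "input", "output", "cell")
    return {
        g: (0.1 * rng.standard_normal((hidden_dim, input_dim)),
            0.1 * rng.standard_normal((hidden_dim, hidden_dim)),
            np.zeros(hidden_dim))
        for g in gates
    }

def lstm_forward(params, inputs):
    """Run the cell over a sequence and return the output vectors a_1, ..., a_N."""
    hidden_dim = params["forget"][2].shape[0]
    a = np.zeros(hidden_dim)  # output vector, assumed initial value 0
    c = np.zeros(hidden_dim)  # cell state, assumed initial value 0
    outputs = []
    for x in inputs:
        pre = {g: A @ x + U @ a + b for g, (A, U, b) in params.items()}
        f_gate = sigmoid(pre["forget"])
        i_gate = sigmoid(pre["input"])
        o_gate = sigmoid(pre["output"])
        c = f_gate * c + i_gate * np.tanh(pre["cell"])  # element-wise products
        a = o_gate * np.tanh(c)
        outputs.append(a)
    return np.array(outputs)

params = init_lstm(input_dim=2, hidden_dim=4)
sequence = np.random.default_rng(1).standard_normal((10, 2))
outputs = lstm_forward(params, sequence)
print(outputs.shape)
```

Because the output at step i only depends on inputs up to step i, changing later inputs leaves earlier outputs unchanged, which is the non-anticipative property the architecture relies on.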
The set of LSTM networks up to time i is defined as
where are defined in (8) and
.
3.2 PDGM architecture
In order to model the objects from the functional Itô calculus, we propose a novel neural network architecture that combines LSTM and feed-forward networks in order to guarantee non-anticipativeness and to deal with the necessary path deformations from the functional Itô calculus. We call this architecture the Path-Dependent Deep Galerkin Method (PDGM). The network structure is displayed in Figure 3.
Figure 3: PDGM architecture.
The PDGM architecture approximates a functional f as follows. We start by considering a time discretization , with
. We then approximate
by a feed-forward neural network
, where
. Here
, where a is an output vector from an LSTM network, i.e.
, for some
are the neural network’s parameters. The spatial and time extensions can be properly obtained by assigning the correct inputs to the feed-forward neural network. This shows the effectiveness of the PDGM architecture for the functional Itô calculus setting. Therefore, the functional derivatives can be approximated as in (3):
3.3 Algorithm
where , for some subset of paths
. As an illustration, L could be given by the linear operator
We train the neural network to minimize the following objective function:
where is the operator L with the finite difference approximation of the functional derivatives and
Here, and
are measures in the path space
and
, respectively. The choice of these measures, and consequently how we should sample the paths
in order to approximate the theoretical loss function J will be discussed below. Additionally, we apply stochastic gradient descent to minimize the loss function over a set of parameters
. The neural network with optimized parameters
delivers an approximation of the solution of the PPDE (11). Given M simulated paths according to the laws
and
, time and space discretization parameters
and h, the loss J will be approximated by
Moreover, when a closed-form solution is available, we compute the -error from this one and our numerical solution approximating the mean squared error
defined above.
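The evaluation of such a discretized objective can be sketched as follows, under several assumptions: the equation is taken to be the path-dependent heat equation $\Delta_t f + \tfrac{1}{2}\Delta_{xx} f = 0$ with a running-integral terminal condition, the interior and terminal terms are weighted equally, and a plain Python callable (here the known closed-form functional) stands in for the neural network. All names are hypothetical and the sketch only computes the loss; it performs no training.

```python
import numpy as np

T, N = 1.0, 100
dt = T / N

def candidate(path, t):
    """Stand-in for the network: closed form for g(Y_T) = integral of y over [0, T]."""
    return dt * np.sum(path[:-1]) + path[-1] * (T - t)

def terminal_condition(path):
    """Running integral of the whole path (left Riemann sum)."""
    return dt * np.sum(path[:-1])

def residual(f, path, t, h):
    """Finite-difference value of Delta_t f + 0.5 * Delta_xx f at the path stopped at t."""
    def bumped(eps):
        q = path.copy()
        q[-1] += eps
        return q
    base = f(path, t)
    flat = np.append(path, path[-1])  # flat extension by one time step
    d_t = (f(flat, t + dt) - base) / dt
    d_xx = (f(bumped(h), t) - 2.0 * base + f(bumped(-h), t)) / h**2
    return d_t + 0.5 * d_xx

def discretized_loss(f, paths, h=0.01):
    """Mean squared residual over interior times plus mean squared terminal error."""
    interior, terminal = [], []
    for path in paths:
        for i in range(N):
            interior.append(residual(f, path[: i + 1], i * dt, h) ** 2)
        terminal.append((f(path, T) - terminal_condition(path)) ** 2)
    return float(np.mean(interior) + np.mean(terminal))

rng = np.random.default_rng(0)
M = 16  # small batch of simulated Brownian paths
increments = np.sqrt(dt) * rng.standard_normal((M, N))
paths = np.concatenate([np.zeros((M, 1)), np.cumsum(increments, axis=1)], axis=1)
loss = discretized_loss(candidate, paths)
print(loss)
```

Since the stand-in functional solves the equation exactly, the computed loss is zero up to floating-point error; a trained network would be judged by how close it brings the same quantity to zero.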
Our algorithm works as follows:
3.3.1 Simulation
One important ingredient of the method above is the simulation of the paths . The goal of this step is to select good representatives of the set
, the domain of the PPDE. Usually, for problems that arise from the Feynman-Kac formula (even in its non-linear form), it is straightforward to choose the generating process that should be considered for the simulation of these paths. For instance, for the path-dependent heat equation (studied in Section 4.1), one should simulate from the Brownian motion.
However, the simulation does not have to be as precise as in Monte Carlo methods. The reason is that the PPDE itself contains the dynamics of the state variable in its formulation. For example, in the Heston model studied in Section 4.2.4, one could simulate the CIR dynamics while simplifying the natural reflecting barrier at 0 (taking the maximum of the simulated value and zero, for instance).
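A minimal sketch of such a simplified CIR simulation, assuming a plain Euler scheme in which each step is truncated at zero. The function name `simulate_cir` and all parameter values are illustrative assumptions, not the values used in the numerical experiments.

```python
import numpy as np

def simulate_cir(v0, kappa, theta, xi, T, n_steps, n_paths, seed=0):
    """Euler scheme for dv = kappa (theta - v) dt + xi sqrt(v) dw, truncated at zero."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    v = np.empty((n_paths, n_steps + 1))
    v[:, 0] = v0
    for i in range(n_steps):
        dw = np.sqrt(dt) * rng.standard_normal(n_paths)
        step = v[:, i] + kappa * (theta - v[:, i]) * dt + xi * np.sqrt(v[:, i]) * dw
        v[:, i + 1] = np.maximum(step, 0.0)  # crude handling of the barrier at 0
    return v

# All parameter values below are assumed for illustration only.
variance_paths = simulate_cir(v0=0.04, kappa=2.0, theta=0.04, xi=0.5, T=1.0,
                              n_steps=100, n_paths=128)
print(variance_paths.shape, variance_paths.min())
```

The truncation keeps the square root well defined at every step; a more careful scheme would matter for Monte Carlo pricing but is less critical when the paths only serve as training samples.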
Nonetheless, one should be aware of the choice of simulated paths for the training. Usually, the space is much bigger than the possible simulated paths (e.g. in the Brownian case,
is the space of continuous paths in [0, T]). An interesting exercise is to verify that training from a given set of simulated paths gives the algorithm sufficient knowledge to predict the value of the functional on a different type of path. Numerical experiments showed us that one important aspect is the range of the test paths. If the range is very different from the trained paths, the approximation will not work very well. In the numerical examples below, we consider the exact model coming from the PPDE to simulate the paths for the training sets and test the trained functional in very smooth and very rough paths different from the generating process of the training paths, but respecting their range. The method performs very well in all of them.
Remark 3.1 An idea similar to control variates applied in Monte Carlo methods would be the following. Suppose that is a path-independent functional (i.e.
) such that
with
being somewhat analogous to
(e.g. grows similarly). Then, the functional
solves the same PPDE and the final condition might be better behaved. We then could apply the algorithm to approximate
and use the formula
to find an approximation for f.
3.3.2 Convergence Result
The derivation of convergence results similar to the ones shown in Sirignano and Spiliopoulos [2018] is very challenging in this setting. We leave it for possible future work, since it would require some new results from the functional Itô calculus theory. However, we sketch an approach for the proof of existence of a PDGM network such that the loss function is arbitrarily small.
The argument would be as follows: fix N as the time discretization parameter and consider the approximation of the functional f as where,
. If f is smooth, then
is smooth in the last variable and
as
, where the convergence is of the functional and their derivatives. Now, fundamentally, the PDGM approximates
and this is very similar to the argument presented in Sirignano and Spiliopoulos [2018]. The result, under possible additional technical conditions, could be formulated as:
Conjecture 3.1 Assume there exists a classical solution for the PPDE (11). Then, for any , there exists
,
and
such that
with
satisfies
In this section we will provide several examples of PPDEs with their closed-form and PDGM solutions. We will consider different dynamics (Brownian motion, geometric Brownian motion and Heston model) and different path-dependent final conditions (running integral, running maximum and running minimum). Moreover, we will also consider a non-linear case. The algorithm could handle more complex problems, as for instance, high-dimensional PPDEs. We decided to choose classic examples for pedagogical reasons: they are well-known to the readers, they have closed-form solutions and demonstrate how powerful the method is. Furthermore, as it was clear from the exposition of the method, the PDGM is able to deal with any path-dependent structure as long as it might be written as a PPDE of the form (11). Additional conditions, such as boundary and integral conditions, could be added to the loss function similarly to the DGM methodology.
We have used a personal desktop with an Intel Core i7, 16GB of RAM and an NVIDIA RTX 2080 graphics card to run these numerical examples. Additionally, we have used TensorFlow. Each epoch takes approximately 0.4s to 0.8s depending on the complexity of the neural network and the PPDE. The Python code for an illustrative example of the geometric Asian option shown in Section 4.2.1 is available at https://github.com/zhaoyu-zhang/PDGM-Geometric_Asian.
4.1 Brownian motion
In this section, we consider the class of examples
where is any continuous path, see Remark 2.2. These PPDEs arise from the linear expectations of a path-dependent final condition g under a Brownian model. Under smoothness conditions on f, the PPDE above holds for any continuous path Y. These simple examples allow us to provide a very clear introduction to the method and serve as illustrations. Below we will consider five different final conditions g: path-independent, linear running integral, quadratic running integral, one high-dimensional case and a strongly path-dependent example.
Training paths in this subsection are sampled from standard Brownian motion paths with T = 1 and time discretization N = 100. For the path-independent, linear running integral and quadratic running integral examples, we choose a mini-batch size of M = 128 paths. We use a single-layer LSTM network with 64 units connected to a deep feed-forward neural network consisting of three hidden layers with 64, 128 and 64 neurons, respectively. Although we only train our neural network using simulated standard Brownian motion paths, our algorithm is able to provide a good approximation to the true solution for other paths as well. Furthermore, we show the train losses, test losses and MSE after 10,000 epochs in the table below.
Table 1: Train and test losses for the Brownian case
For the path-independent, linear running integral and quadratic running integral examples, three representative test paths with their corresponding solutions and derivatives are plotted in Figures 5, 7, and 12, respectively. Path 1 is a standard Brownian motion path. Path 2 is the smooth path for
. Path 3 is a realization of a sequence of uniform random variables between -1 and 1, i.e.,
, for each
.
4.1.1 Path Independent
As a sanity check, consider the case where , which yields a PDE with solution
As an example, we consider , which gives
. Figure 4 shows the training and testing losses. Three representative paths with their corresponding solution and derivatives are shown in Figure 5. It can be seen that our algorithm provides a good approximation. The functional derivatives for this example are
and
, which are also captured by the algorithm.
Figure 4: Train and test losses for the path-independent example.
Figure 5: Three representative paths with corresponding solutions and functional derivatives for the path-independent example.
4.1.2 Linear Running Integral
In this example, training and test losses reach after 10000 epochs, which is shown in Figure 6. Figure 7 plots three representative paths (as in the path-independent example) with their corresponding solution and functional derivatives. The predicted solutions are approximately the same as true solutions. From the plot, the derivatives in this example for both
and
are 0 which is true also by direct computation.
Figure 6: Train and test losses for the linear running integral example.
Figure 7: Three representative paths with corresponding solutions and functional derivatives for the linear running integral example.
Though using Brownian motions as training paths yields a faster convergence with a small number of neurons, one drawback is that when the test path is outside the domain of the trained paths, it would yield a poor prediction, as it was discussed in Section 3.3.1. In particular, Figure 8 plots 128 Brownian paths used for training, showing that the domain is from to 2. In Figure 9, the neural network is not able to find the right solutions to a Brownian path with volatility 4 which starts at
.
Figure 8: A sample of 128 Brownian motion paths.
Figure 9: Prediction failure due to the limitation of training domain.
One easy remedy for the above problem is to use Brownian paths with varying volatility and varying initial values. In addition, one may also need to enlarge the neural network. For example, the training paths for Figure 10 are Brownian paths with volatility and initial value
. The single-layer LSTM network consists of 128 units, and each of the three hidden layers of the feed-forward neural network contains 128 neurons. For example, in Figure 10, test path 1 is a Brownian path with volatility 4 and starting at
; test path 2 is a function
; test path 3 is a realization of a sequence of i.i.d. uniform random variables drawn from
to 5. As a result, with the above setup, the neural network is capable of predicting solutions for paths with a wider domain.
4.1.3 Quadratic Running Integral
In order to consider a more complicated case, take
Figure 10: Three representative paths with corresponding solutions and functional derivatives for the linear running integral example.
Moreover
yielding
Similar to the above examples, training and testing loss is around after 10000 epochs as in Figure 11, and three representative paths with their corresponding solution and derivatives are plotted in Figure 12.
4.1.4 High-Dimensional Example
Here we will show that the methodology we have developed can handle high-dimensional PPDEs. Since this is not the main focus of the paper, the example serves more as an illustration of the method under this setting. Several numerical improvements could be introduced following the suggestions outlined in Sirignano and Spiliopoulos [2018].
Figure 11: Train and test losses for the quadratic running integral example.
Figure 12: Three representative paths with corresponding solutions and functional derivatives for the quadratic running integral example.
Consider a d-dimensional Brownian motion and the payoff functional
It can be straightforwardly shown that
As an illustration, we choose d = 20, T = 1 and . Each dimension of the training paths is sampled from Brownian motions. For the numerical implementation, our algorithm works the same as in the previous examples. On the left of Figure 13, we plot each dimension of a 20-dimensional path separately. The first 10 dimensions are sampled from standard Brownian motion paths. For the remaining 10 dimensions, the points at each time step are sampled from a uniform distribution between -2 and 2. The right plot in Figure 13 compares the true solution and the predicted solution, which are similar. Time and spatial derivatives can also be found on the right plot of Figure 13.
Figure 13: High-Dimensional Example
4.1.5 Hitting Time of the Final Value
which is the hitting time of the final value of the path . This example presents a stronger type of path-dependence than the ones presented so far. The value of
can be found in closed form. Indeed, notice that in the Brownian case,
where is the hitting time of the value x of a Brownian bridge from 0 to x and
is the probability density of
. Fixing T = 1 and x > 0, one might show that the probability density of
is given by
We might then compute
Therefore,
In this example, we use Brownian motion paths with starting points following a standard normal distribution as training paths. The discretization mesh size is again chosen to be 0.01. The training and testing losses approach 0.01 after 25,000 epochs as shown on the left side of Figure 14. On the right of Figure 14, we compare of 12,800 paths between the PDGM architecture and the Monte Carlo method. In the Monte Carlo method, for each starting position
, we simulate 5,000 Brownian motion paths in order to compute the sample mean. The results from the Monte Carlo simulation have a bell shape with mean around 0.54, but the results from our method are more concentrated at 0.53, and the difference is less than 1%. This bias comes from the discretization of time, as also discussed in Remark 4.1.
Figure 15 shows three representative test paths and their solutions from both our method and Monte Carlo simulation. Finding the solution from Monte Carlo simulation for an entire path is quite expensive. At each time step of a given path, we simulate 2,000 Brownian motion paths concatenated with the original path, i.e. we need to simulate 200,000 paths to approximate the pathwise solution. Test path 1 is a Brownian motion path starting at 0.1233; test path 2 is a straight line from 0 to 3; test path 3 is a realization of a sequence of i.i.d. uniform random variables between 2 and 2.5. The solutions are similar, and the solution from the PDGM algorithm tends to be smoother. Once properly trained, our algorithm is able to compute solutions for any path with a similar range, whereas the Monte Carlo simulation can only compute the solution for one entire path at a time.
Figure 14: Train and test losses on the left, and the histogram comparison of between PDGM architecture and Monte Carlo method on the right for the hitting time example.
4.2 Applications in Mathematical Finance
Functional Itô calculus, and hence PPDEs, was born from the need to deal with path-dependent financial derivatives in the Mathematical Finance literature. In this section we will consider the classical Black–Scholes model, where the spot value follows a geometric Brownian motion with constant parameters
Figure 15: Three representative paths with corresponding solutions for the hitting time example.
Under this model, the price of a general path-dependent financial derivative with maturity T and payoff
for any continuous path taking positive values, see Remark 2.2.
We will consider three examples (Geometric Asian, Lookback and Barrier options) where closed-form solutions are available. Moreover, we will consider one path-dependent example with the process x having stochastic volatility. Additionally, one could consider several other path-dependent, exotic derivatives with different dynamics. The PDGM could be applied similarly to these cases requiring possibly more computational power or time.
In the examples below, we consider the PPDE (16) with parameters and T = 1. The payoff functional g will vary for each case. We use the geometric Brownian motion with these parameters to generate training paths for the algorithm, with a mini-batch size of M = 128 paths and N = 100 time steps. Moreover, there are 128 units in a single-layer LSTM cell, and the deep feed-forward neural network consists of three hidden layers with 128 neurons each.
For geometric Asian option and the lookback option, Figures 16 and 17 show three representative test paths with corresponding closed-form solutions. Path 1 is a geometric Brownian motion path with the same parameters as above. Path 2 is the smooth path for
. Path 3 is a realization of a sequence of i.i.d. uniform random variables between 1 and 3. For the barrier option, we consider a down-and-out option. Figure 18 shows three representative test paths (different from the ones above and defined in Section 4.2.3) with corresponding closed-form solutions.
The solutions predicted by our algorithm are approximately the same as the true solutions. Our algorithm is able to predict solutions for any given path in the domain of the training paths, regardless of the shape of the path. Furthermore, the losses after 15,000 epochs in these examples, together with the MSE when a closed-form solution is available, are given in the table below.
Table 2: Train and test losses for the Mathematical Finance examples
4.2.1 Geometric Asian Option
The case of continuously-monitored geometric Asian options with fixed strike is determined by the payoff
where is the positive part of x and K > 0 is called the strike. A closed-form solution is available in this case:
where is the cumulative distribution function of the standard normal,
In the numerical examples below, we fix the strike at K = 0.4. Three representative test paths with corresponding closed-form solutions are given in Figure 16.
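For orientation, the time-0 value of a continuously monitored, fixed-strike geometric Asian call under the Black–Scholes model can be computed with the standard textbook formula, which uses the fact that the logarithm of the geometric average is Gaussian. The sketch below implements only this time-0 special case, not the path-dependent formula of this section; the values of the interest rate, dividend yield, volatility and initial spot are assumptions chosen for illustration, while K = 0.4 and T = 1 follow the text.

```python
import math

def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def geometric_asian_call(s0, strike, r, q, sigma, T):
    """Time-0 price of a continuously monitored, fixed-strike geometric Asian call."""
    # The log of the geometric average is Gaussian with mean m and variance v.
    m = math.log(s0) + (r - q - 0.5 * sigma**2) * T / 2.0
    v = sigma**2 * T / 3.0
    d2 = (m - math.log(strike)) / math.sqrt(v)
    d1 = d2 + math.sqrt(v)
    forward = math.exp(m + 0.5 * v)  # expected geometric average
    return math.exp(-r * T) * (forward * normal_cdf(d1) - strike * normal_cdf(d2))

# r, q, sigma and s0 are assumed values; K = 0.4 and T = 1 follow the text.
price = geometric_asian_call(s0=1.0, strike=0.4, r=0.03, q=0.01, sigma=0.2, T=1.0)
print(price)
```

With such a low strike the option is deep in the money, so the price is essentially the discounted difference between the expected geometric average and the strike.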
4.2.2 Lookback option
A lookback call option with floating strike is given by the payoff
If we denote , the closed-form solution, assuming q = 0, for the price of this option can be written as
where
Figure 17 shows three representative test paths with corresponding closed-form solutions. Our algorithm shows a promising result.
Figure 16: Three representative paths with corresponding solutions for the geometric Asian option.
Figure 17: Three representative paths with corresponding solutions for the lookback option.
4.2.3 Barrier option
There are several types of barrier options, see for instance, Reiner and Rubinstein [1991] and Fouque et al. [2011]. Here, we will focus on the case of down-and-out call options. More precisely, the option becomes worthless if the spot value crosses a down barrier . Otherwise, the payoff is a call with strike
. The payoff functional can then be written as
In addition, the solution should also satisfy the boundary condition if the barrier B was crossed by the path
. A closed-form solution is available:
where
is the price of a call option with strike K and maturity T at
and
Because the option becomes worthless when the stock price crosses the barrier, we need to slightly modify the loss function in our algorithm. In this case, the loss for a given sample path j at time is
The total loss is calculated as
We then minimize the above loss objective using the stochastic gradient descent algorithm and update the parameter .
In the numerical implementation, we choose B = 0.6 and K = 0.8. Figure 18 plots three representative paths and the corresponding solutions. Test path 1 is a geometric Brownian motion with the parameters described above. Note that this path does not cross the barrier. Test path 2 is another geometric Brownian motion but with . This path crosses the barrier from above around t = 0.4. The third test path is a smooth path
. As a result, the predicted solutions and the true solutions are approximately the same.
Remark 4.1 In our numerical experiment, we simulate paths Y at times equally spaced by
. When verifying whether the barrier was crossed, we have available these discretized values
. It would be possible to have
, but
. This problem vanishes when
goes to 0. One approach would be then to consider a sufficiently small
in order to diminish this issue. This concern appears in the usual Monte Carlo methods to price barrier options and the methods used there might be adapted to assist us here, see Gobet [2009].
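To illustrate the kind of correction used in the Monte Carlo literature, the sketch below evaluates the standard Brownian-bridge probability that a geometric Brownian motion touches the barrier between two consecutive grid points, given both endpoint values. This is not part of our algorithm; the function name and the symbols `sigma` (volatility) and `dt` (grid spacing) are assumptions made for the example, and the formula requires a positive barrier and positive volatility.

```python
import numpy as np

def bridge_crossing_probability(y_left, y_right, barrier, sigma, dt):
    """P(path touches the barrier between two grid points | both endpoint values).

    Uses the Brownian-bridge formula exp(-2 a b / (sigma^2 dt)) for the
    log-price, where a and b are the log-distances of the endpoints to the barrier.
    """
    y_left = np.asarray(y_left, dtype=float)
    y_right = np.asarray(y_right, dtype=float)
    # Clipping at the barrier makes a or b equal to zero when an endpoint is
    # already at or below the barrier, so the probability is then exactly one.
    a = np.log(np.maximum(y_left, barrier) / barrier)
    b = np.log(np.maximum(y_right, barrier) / barrier)
    return np.exp(-2.0 * a * b / (sigma ** 2 * dt))

# Hypothetical values: barrier B = 0.6, volatility 0.2.
p_far = bridge_crossing_probability(1.0, 1.0, 0.6, 0.2, 0.01)
p_near_coarse = bridge_crossing_probability(0.61, 0.61, 0.6, 0.2, 0.01)
p_near_fine = bridge_crossing_probability(0.61, 0.61, 0.6, 0.2, 0.001)
p_hit = bridge_crossing_probability(0.5, 1.0, 0.6, 0.2, 0.01)
```

The undetected-crossing probability is non-negligible only when both endpoints are close to the barrier, and it decreases as the grid spacing shrinks, in line with the remark above.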
4.2.4 Exotic Option in Stochastic Volatility Models
The price at time t of a general path-dependent option with maturity T and payoff can be written as the functional
and solves the PPDE
Figure 18: Three representative paths with corresponding solutions for the down-and-out call option.
The generalization of our algorithm to this multidimensional case is straightforward. We consider the geometric Asian option as in Section 4.2.1. For the numerical implementation, we specify . For a fixed maturity T = 1, Figure 19 plots, on the left-hand side, a realization of the stock price path together with the corresponding volatility path. Solutions predicted by our algorithm are shown on the right for different strikes from 0 to 1. On the left of Figure 20, we plot option prices against strikes for fixed maturity T = 1, and on the right we plot option prices against maturities for fixed strike K = 0.4.
Figure 19: Solutions to the Heston model given a pair of stock price and volatility paths (on the left), for varying strike prices.
Figure 20: On the left: prices vs strike prices K. On the right: prices vs maturity times T.
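For illustration, pairs of stock price and variance paths in a Heston-type model could be generated as in the sketch below, which uses an Euler scheme with full truncation of the variance and a zero interest rate. The parameter values, the function names, and the trapezoidal form of the geometric Asian payoff are assumptions made for this example and are not the specification used in our experiments.

```python
import numpy as np

def simulate_heston(n_paths, n_steps, T, s0, v0, kappa, theta, xi, rho, seed=0):
    """Euler scheme with full truncation for (stock, variance) paths, zero rate."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    s = np.empty((n_paths, n_steps + 1))
    v = np.empty((n_paths, n_steps + 1))
    s[:, 0] = s0
    v[:, 0] = v0
    for i in range(n_steps):
        z1 = rng.standard_normal(n_paths)
        # Correlate the variance noise with the stock noise.
        z2 = rho * z1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n_paths)
        v_pos = np.maximum(v[:, i], 0.0)  # truncate negative variance
        s[:, i + 1] = s[:, i] * np.exp(-0.5 * v_pos * dt + np.sqrt(v_pos * dt) * z1)
        v[:, i + 1] = v[:, i] + kappa * (theta - v_pos) * dt + xi * np.sqrt(v_pos * dt) * z2
    return s, v

def geometric_asian_call_payoff(paths, T, strike):
    """Call payoff on the geometric average, via a trapezoidal rule on log-prices."""
    dt = T / (paths.shape[1] - 1)
    log_paths = np.log(paths)
    integral = dt * (0.5 * log_paths[:, 0]
                     + log_paths[:, 1:-1].sum(axis=1)
                     + 0.5 * log_paths[:, -1])
    return np.maximum(np.exp(integral / T) - strike, 0.0)

# Hypothetical parameters, chosen only to make the example run.
stock, variance = simulate_heston(n_paths=4, n_steps=50, T=1.0, s0=1.0, v0=0.04,
                                  kappa=1.5, theta=0.04, xi=0.3, rho=-0.5)
strikes = np.linspace(0.0, 1.0, 5)
payoffs = np.stack([geometric_asian_call_payoff(stock, 1.0, k) for k in strikes], axis=1)
```

Each row of `payoffs` corresponds to one simulated path and each column to one strike, mirroring the way solutions are displayed for varying strikes.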
4.3 Non-linear PPDE
As our last numerical example, we consider a non-linear PPDE with a closed-form solution. This example was studied in Ren and Tan [2017].
The closed-form solution is given by , where
is the running integral, and the
We want to find a suitable source term such that f satisfies the PPDE above. Notice that
Rearranging
Then,
The motivation for this problem is the following stochastic differential game:
where
We use standard Brownian motion paths to train the neural network. We specify the coefficients to be and
. Keeping track of the signs of the spatial derivatives, our algorithm works in the same way as in the other examples. The loss reaches around
after 15000 epochs and is plotted in Figure 21. Three representative paths with their corresponding solutions are presented in Figure 22. Test path 1 is a realization of a standard Brownian motion. Test path 2 is a smooth path
. Test path 3 is
.
Figure 21: Train and test losses for the non-linear PPDE example.
Figure 22: Three representative paths with corresponding solutions for the non-linear PPDE.
We have proposed a new method to solve PPDEs based on neural networks, called the Path-Dependent Deep Galerkin Method (PDGM). A novel network architecture was developed in order to deal with the objects from the functional Itô calculus. There are very few methods available to solve these equations; for a discussion of them, see Ren and Tan [2017] and references therein. We then demonstrated the capabilities of the PDGM on various examples.
Future work could be divided into two main avenues. Firstly, one could study theoretical questions regarding the PDGM, such as its consistency, rate of convergence and stability. Secondly, one could apply the method to more complex situations. As mentioned in the introduction, one could also extend the PDGM to the different family of PPDEs originating from Viens and Zhang [2019].
Additionally, the notion of monotone numerical schemes was generalized to the PPDE setting in Ren and Tan [2017]. As future research, one could study whether the method proposed here is monotone in the sense of that reference.
References
A. Al-Aradi, A. Correia, D. Naiff, G. Jardim, and Y. F. Saporito. Applications of the Deep Galerkin Method to Solving Partial Integro-Differential and Hamilton-Jacobi-Bellman Equations. preprint, 2019. Available at arXiv: http://arxiv.org/abs/1912.01455.
Y. Bengio, P. Simard, and P. Frasconi. Learning Long-Term Dependencies with Gradient Descent is Difficult. IEEE Transactions on Neural Networks, 5:157–166, 1994.
R. Cont and D.-A. Fournié. Change of Variable Formulas for Non-Anticipative Functionals on Path Space. J. Funct. Anal., 259(4):1043–1072, 2010.
B. Dupire. Functional Itô Calculus. Quantitative Finance, 2019:721–729, 2009.
W. E, J. Han, and A. Jentzen. Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in Mathematics and Statistics, 5(4):349–380, 2017.
W. E, J. Han, and A. Jentzen. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34):8505–8510, 2018.
I. Ekren, C. Keller, N. Touzi, and J. Zhang. On Viscosity Solutions of Path Dependent PDEs. Ann. Probab., 42: 204–236, 2014.
I. Ekren, N. Touzi, and J. Zhang. Viscosity Solutions of Fully Nonlinear Parabolic Path Dependent PDEs: Part I. Ann. Probab., 44:1212–1253, 2016a.
I. Ekren, N. Touzi, and J. Zhang. Viscosity Solutions of Fully Nonlinear Parabolic Path Dependent PDEs: Part II. Ann. Probab., 44:2507–2553, 2016b.
F. Flandoli and G. Zanco. An infinite-dimensional approach to path-dependent Kolmogorov equations. Ann. Probab., 44, 2016.
J.-P. Fouque and Z. Zhang. Deep learning methods for mean field control problems with delay. Available at arXiv: https://arxiv.org/abs/1905.00358, 2019.
J.-P. Fouque, G. Papanicolaou, R. Sircar, and K. Sølna. Multiscale Stochastic Volatility for Equity, Interest Rate, and Credit Derivatives. Cambridge University Press, 2011.
E. Gobet. Advanced Monte Carlo methods for barrier and related exotic options. In A. Bensoussan, Q. Zhang, and P. Ciarlet, editors, Handbook of Numerical Analysis, pages 497–528. Elsevier, 2009.
S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
A. Jacquier and M. Oumgari. Deep PPDEs for rough local stochastic volatility. Available at arXiv: https://arxiv.org/abs/1906.02551, 2019.
S. Jazaerli and Y. F. Saporito. Functional Itô Calculus, Path-dependence and the Computation of Greeks. Stochastic Process. Appl., 127:3997–4028, 2017.
I. E. Lagaris, A. Likas, and D. I. Fotiadis. Artificial Neural Networks for Solving Ordinary and Partial Differential Equations. IEEE Transactions on Neural Networks, 9(5):987–1000, 1998.
H. Lee and I. S. Kang. Neural algorithm for solving differential equations. Journal of Computational Physics, 91(1):110–131, Nov. 1990. ISSN 00219991. doi: 10.1016/0021-9991(90)90007-N. URL http://linkinghub.elsevier.com/retrieve/pii/002199919090007N.
H. Oberhauser. An extension of the Functional Itô Formula under a Family of Non-dominated Measures. Stoch. Dyn., 16, 2016.
D. R. Parisi, M. C. Mariani, and M. A. Laborde. Solving differential equations with unsupervised neural networks. Chemical Engineering and Processing: Process Intensification, 42(8-9):715–721, Aug. 2003. ISSN 02552701. doi: 10.1016/S0255-2701(02)00207-6. URL http://linkinghub.elsevier.com/retrieve/pii/S0255270102002076.
T. Pham and J. Zhang. Two Person Zero-sum Game in Weak Formulation and Path Dependent Bellman-Isaacs Equation. SIAM J. Control Optim., 52(4):2090–2121, 2014.
R. G. Pinsky. Positive Harmonic Functions and Diffusion. Cambridge University Press, 1995.
M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
E. Reiner and M. Rubinstein. Breaking Down the Barriers. Risk Magazine, 4, 1991.
Z. Ren and X. Tan. On the convergence of monotone schemes for path-dependent PDEs. Stochastic Process. Appl., 127:1738–1762, 2017.
L. Rogers and D. Williams. Diffusions, Markov Processes and Martingales. Cambridge Mathematical Library, second edition, 2000.
Y. F. Saporito. Stochastic Control and Differential Games with Path-Dependent Influence of Controls on Dynamics and Running Cost. SIAM Journal on Control and Optimization, 57(2):1312–1327, 2019. Available at arXiv: http://arxiv.org/abs/1611.00589.
J. Sirignano and K. Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics, 375:1339–1364, 2018.
F. Viens and J. Zhang. A martingale approach for fractional Brownian motions and related path dependent PDEs. To appear in Ann. Appl. Probab., 2019.