In the past five years there has been rapid progress in the development of deep generative models for continuous data like images [Kingma and Welling, 2013] and sequences like natural language [Sutskever et al., 2011]). The two most prominent models developed are generative adversarial networks (GANs) [Goodfellow et al., 2014] and variational autoencoders (VAEs) [Kingma and Welling, 2013], both learn a distribution parameterized by neural networks.
There has also been tremendous progress in deep generative models for combinatorial structures, particularly graphs. Graph generation is an important research area with signifi-cant applications in drug and material designs. However, discovering new compounds with specific properties is an extremely challenging task because of the huge, unstructured and discrete nature of the search space. Hence, more effort needs to be directed towards building simple yet efficient models with the proper inductive biases. We focus our efforts on this domain specific area of generating molecular graphs.
One of the first graph generative models is the Erdos-R´enyi (ER) random graph model [Erd˝os and R´enyi, 1960] where each edge and node exists with independent probability. The modern approach from deep generative models in this class is
Figure 1: The Erdos-Renyi model for c1cnon1
GVAE where the decoder outputs independent probabilities for edge and node features.
Another class of generative models of graphs are sequential models that construct graphs sequentially, node by node. When generating molecules, these models can be constrained to ensure they only generate valid molecules, but they must be trained one molecule at a time since the generative process depends on a sequence of probabilistic decisions. This also makes them significantly more difficult to train and tune.
Models in the ER family such as GVAE, do not have this requirement and are significantly easier to train. However, because they do not consider edge correlations, it is much harder to match the training distribution with these models. This also makes them difficult if not impossible to constrain so that they only generate valid molecules. For example, [Ma et al., 2018] proposes a regularization framework for GVAEs to in order for them to generate semantically valid graphs. They formulate penalty terms that address validity constraints. However, they only obtain limited success on small graphs and are sig-nificantly less successful on larger molecules.
We propose that more effort should be directed towards improving the underlying model design of GVAE before attempting to construct regularization frameworks. In this work, we focus on improving GVAE’s basic structure: in particular we note that its MLP decoder does not have the proper inductive bias for graph structured data. Therefore, we introduce a simple model for graph generation by building a message passing neural network into the encoder and decoder of a VAE (MPGVAE). Our model is simple to implement but is more adept at domain specific graph generation and even allows us to avoid using graph matching. We demonstrate the MPGVAE on a few standard tasks in molecular generative modeling.
In this paper we focus on undirected graphs G with n nodes denoted G = (A, E, X), where the adjacency matrix is such that
implies nodes u and v are connected. The edge feature tensor
specifies edge features, with
denoting that nodes u and v have edge type w with specific edge features
. The node feature matrix
stacks all node feature vectors
.
2.1 Neural message passing
Message passing neural networks (MPNNs) [Gilmer et al., 2017] operate on graphs G. First, a message passing phase runs for T propagation steps and is defined in terms of message functions and node update functions
. During the message passing phase, hidden states
of each node in the graph are updated based on messages
according to
Every node v receives an aggregate message from its neighbours , in this case, by simple summation. Then in the second phase we readout predictions y based on final node embeddings, after T propagation steps.
2.2 Variational Auto-Encoders for Graphs
A GVAE learns a probability distribution from a set of training graphs such that we can sample new graphs from it. To do this, a VAE learns a latent representation z of those graphs so that the generative model
, which is defined by a neural network with parameters
, can generate a graph G. Assuming the training data are independent, the objective is to maximize the log-evidence of the data
which is intractable but can be lower-bounded by introducing a variational approximation . Another neural network called the inference network encodes a training graph G into a latent representation z and outputs the parameters of
The latent space z is low dimensional to make sure the model does not just memorize the training data. We maximize the lower bound with respect to the generator and inference network parameters :
where denotes the Kullback–Leibler divergence. The first term in (6) is the reconstruction loss which ensures the generated graphs are similar to the training graphs. The second term in (6) regularizes the latent space to ensure that we spread the density of the generative model around. The prior p(z) is a standard normal.
Graph neural networks. The first neural network operating on graphs model was proposed by [Scarselli et al., 2008], and later improved upon by [Li et al., 2015]. A general framework based on neural message passing was constructed by [Gilmer et al., 2017]. Also, [Duvenaud et al., 2015] introduced a convolutional neural network for molecular graphs.
Generating SMILES. Several works have explored training generative models on SMILES representations of molecules. One of the first, CharacterVAE [G´omez-Bombarelli et al., 2018] is based on a VAE with recurrent neural networks. The GrammarVAE [Kusner et al., 2017] and SDVAE [Dai et al., 2018] constrain the decoder in order to follow particular syntactic and semantic rules. Recently, these works were extended by [Krenn et al., 2019] to ensure 100% reconstruction validity.
Sequential models. The first of such models comes from [Johnson, 2016], which incrementally constructs a graph as a sequence of probabilistic decisions, in order to do some reasoning task. Using this framework, DeepGMG [Li et al., 2018] built an autoregressive model for graphs conditioned on the full generation history. CGVAE [Liu et al., 2018] improves upon DeepGMG by building a Gated Graph NN [Li et al., 2015] into the encoder and decoder of a VAE [Kingma and Welling, 2013], but it only conditions on the current partial graph during generation. Graph convolution policy network [You et al., 2018a] uses a GCN [Kipf and Welling, 2016] to sequentially generate molecules in goal-directed way through reinforcement learning. In general, sequential models have some issues with stability and scaling as larger graphs require more propagation steps.
More generally GraphRNN [You et al., 2018b] generates the adjacency matrix sequentially, one entry or one column at a time through an RNN. However, the GraphRNN scales quadratically in number of nodes and can’t handle long term dependencies. A recent work, improves upon this, GRAN [Liao et al., 2019] generates graphs one block of nodes and associated edges at a time.
Modern ER models. These are models which generate entire graphs at a time where each edge and node is independent; these include Graphvae [Simonovsky and Komodakis, 2018] and MolGAN [De Cao and Kipf, 2018] which avoids issues arising from node ordering that are associated with likelihood based methods by using an adversarial loss instead [Goodfel- low et al., 2014].
MPGVAE as domain specific Graphite. Graphite [Grover et al., 2018] is a recently introduced method for generative model of graphs parameterizing a variational auto-encoder with a graph neural network using a iterative graph refinement strategy in the decoder. This particular work is very similar to MPGVAE but uses a simple GCN [Kipf and Welling, 2016] and includes latent variables per node, our model can be seen as a domain specific version of Graphite for more complicated constrained graphs.
Node Updates
Edge Updates
Message Passing
Figure 2: The Generative Process for MPGVAE 1) first we read in an unconnected graph from the fixed dimension latent space then 2) we perform message passing on the raw graph creating both an edge and node representation to 3) read out a generated graph using that learned representation
We introduce the Generative process and the decoder as well as the inference model and the encoder. Both the Encoder and Decoder use a message passing neural network (MPNN) to learn a graph structured representation during inference and generation.
We use a variant of the MPNN from [Gilmer et al., 2017] that uses graph attention [Veliˇckovi´c et al., 2017] as an aggregation method and the message function similar to interaction networks [Battaglia et al., 2016].
The node update function uses a GRUCell [Chung et al., 2014]. After propagation through message passing layers, we use the set2set model [Vinyals et al., 2015] as the readout function to combine the node hidden features into a fixed-size graph level representation vector.
4.1 Generative process
In this section we describe the basic structure of the generative process starting with the underlying observation model and then discuss the structure of the decoder which consists of three main graph generation steps:
1) Graph Read In 2) Message Passing and 3) Graph Readout.
The basic structure of the generative mode, including these three steps, is visualized in Figure 2.
The observation model. Based on the assumptions of the ER family, we assume the observation model factorizes:
For molecular graphs we have that both edges and node features are categorical
we also treat non-existent nodes and edges as a category. The three main steps the decoder uses to readout graphs from this distribution are
1. Graph Read In. The first step, reads in the initial graph representation from the fixed dimensional latent space by projecting the latent representation with a linear layer then passing each vector through a RNN cell to construct an initial state for each node. Each edge is initialized with a zero vector.
2. Message Passing. Afterwards, using the initialized graph we perform message passing on both the node and edge representations. This allows us to create a representation we can use to read out a graph
3. Graph Readout. Lastly, using the final edge and node representation we transform the edge representation and predict independent edge and node probabilities.
4.2 Inference Model.
The variational posterior q(z|G) is a factored Gaussian with mean vector and variance vector
. We use a standard Gaussian prior p(z) = N(0, I) .
Encoder The encoder is a MPNN [Gilmer et al., 2017] which takes in a molecular graph and encodes it into a fixed dimensional representation that is mapped to the parameters of the variational approximation. We describe the model below:
Messages We pass messages along edges where each message is updated using the current edge and node representation
(9) this is a similar message function, to the one found in interaction networks [Battaglia et al., 2016].
Edge Update we use the message representation to directly update edges For all edges in the graph. The edges are initialized with the bond features.
Node Update for each node we construct a message by aggregating using a graph attention [Veliˇckovi´c et al., 2017] where we first compute attention coefficients
Then aggregate over neighbourhoods by summation :
Then each node is updated using the previous state and messages using a Gated Recurrent cell [Chung et al., 2014] and [Li et al., 2015]
Read Out After propagation through message passing layers, we use the set2set model [Vinyals et al., 2015] as the readout function to combine the node hidden features into a fix-sized hidden graph level representation.
Using this representation, the mean and log variance of the variational posterior are computed with two separate linear layers.
4.3 Graph Read In and Message Passing
Graph Read In First we read in the raw graph representation from the fixed dimensional latent space by projecting the latent representation to a high dimensional space with a linear layer with a sigmoid activation . Then we pass each transformed vector through a RNN cell to read in an initial state for each node. Each edge is initialized with a zero vector. This is described in algorithm box 2.
Message Passing Using the initialized graph we perform message passing using both the node and edge representations. We use a MPNN that is identical to the encoder’s MPNN except does not perform any graph level aggregation. After T propagation rounds we come to the final representation for each edge and node.
4.4 Graph Readout
Taking in the final edge and node representation we transform each edge representation such that we obtain a symmetric edge tensor– simply by adding its transpose to itself and dividing by 2 thus making it symmetric.
Then we use a linear layer given for nodes and edges to map the learned representation from a high dimensional space back to the dimension space of the graph (has dimension number of bond types and each
has dimension num atoms.
From this state, using both representations we can sample independent edge and node probabilities using a softmax function to normalize the probabilities.
In the next section we discuss our experimental results.
*
* OH
We focus our evaluation efforts on a few standard tasks in molecular generative modeling following previous works [G´omez-Bombarelli et al., 2018] [Simonovsky and Ko- modakis, 2018] including :
• Molecular Generation Quality We test the MPGVAE on the task of generating molecules when sampling from the prior distribution using a variety of established measures and plot samples from the model.
• Matching The Training Data Distribution We also test the MPGVAE on its ability to match the distributional statistics of the training data.
• Conditional Generation Lastly, we test the ability of the MPGVAE to generate molecules conditionally based on atom histograms.
Dataset In all experiments, we used QM9 [Ramakrishnan et al., 2015] a subset of the massive 166.4 billion molecules GDB-17 chemical database [Ruddigkeit et al., 2012]. QM9 contains 133,885 organic compounds of up to 9 heavy atoms: carbon, oxygen, nitrogen and fluorine (CNOF).
Baselines We consider 2 main baselines GraphVAE [Si- monovsky and Komodakis, 2018] and MolGAN [De Cao and Kipf, 2018]. We also compare with the Character VAE (CVAE)[G´omez-Bombarelli et al., 2018] and GrammarVae (GraVAE) [Kusner et al., 2017] on the first task.
Table 1: Comparison of Metrics
Model Configuration Both the encoder and decoder MPNN use 4 layers with [32,64,64,128] units in the encoder and [64,64,32,32] in the decoder. The set2vec and vec2set functions use 256 hidden units in the graph level representation and latent space uses 18 dimensions.
5.1 Molecular Generation Quality
Metrics We generate samples from the prior and assess them using the following statistics defined in [Simonovsky and Komodakis, 2018]: 1) Validity is the ratio between the number of valid and generated molecules. 2) Uniqueness is the ratio between the number of unique samples and valid samples. 3) Novelty measures the ratio between the set of valid samples not in the training data and the total number of valid samples.
Figure 3: Distributions of QED (left), logp (middle), and SA (right) for sampled molecules and QM9
We can see from table 1 that MPGVAE produces the best balance of the three measures while MolGAN, GVae and GraVAE cannot generate unique molecules –meaning they only learn to generate a few kinds of molecules – leading to lower number of total valid, unique and novel generated (column Num). MPGVAE generates 300 percent more novel, valid unique molecules than GVAE- a substantial improvement. Also upon visual inspection MPGVAE generates more reasonable molecules than both GraVAE and MolGAN – both of which generate a few molecules with disconnected, isolated scaffolding.
5.2 Matching distributional statistics
In this section, we use various statistics to compare our model’s learned distribution to the training distribution
continuous statistics. Similar to [Seff et al., 2019] we leverage three commonly used quantities when assessing molecules: the quantitative estimate of drug-likeness (QED) score, the synthetic accessibility (SA) score, and the log octanol-water partition coefficient (logP). Th metrics depend on many molecular features allowing for an overall comparison of distributional statistics.
discrete statistics we consider a second set of discrete statistics to measure how well each model captures the training distribution. Following [Liu et al., 2018], we measure the average number of each atom type in the generated samples, and count the average number of rings in each molecule.
Results We evaluate the quality of our generative model by comparing the distribution that MPGVAE generates to those in the original data. In Fig. 3, we display Gaussian kernel density estimates (KDE) of the continuous metrics for generated sets of molecules from two baseline methods, in addition to MPGVAE. A normalized histogram of the QM9 training distribution is also shown for visual comparison. In Fig. 4. we plot a stacked histogram of the average number of atoms and rings in each molecule. For each method, we use samples from the model.
In figure 3, we see that MPGVAE more closely matches the distribution from QM9 for the continuous measures. For the discrete statistics, in figure 4 we see as well that MPGVAE’s
Figure 4: Distributions of atom, bond and ring count
Table 2: Conditional Generation Task
stacked barplot is closer to QM9’s than the baselines. GVAE does not match the ring statistics while MPGVAE does.
5.3 Conditional generation
For more control over the generated molecules we can condition both encoder and decoder on a label vector y of atom histograms associated with each input molecule G [Sohn et al., 2015]. The decoder p(G|z, y) takes in z and y, while in the encoder y is concatenated to node features.
Again we draw samples from the prior and as in [Si- monovsky and Komodakis, 2018] compute the discrete point estimate of what is decoded argmax p(G|z, y) We are interested in accuracy, which is the ratio of chemically valid molecules with atom histograms equal to their label y over number of sampled molecules. We again see in table 2 that MPGVAE has improved accuracy over GVAE.
In this work we addressed the problem of generating graphs from within the ER family without edge correlation. We built a MPNN into both the decoder and encoder of a VAE and demonstrated it substantially improves GVAE.
[Battaglia et al., 2016] Peter Battaglia, Razvan Pascanu, Matthew Lai, Danilo Jimenez Rezende, et al. Interaction networks for learning about objects, relations and physics. In Advances in neural information processing systems, pages 4502–4510, 2016.
[Chung et al., 2014] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.
[Dai et al., 2018] Hanjun Dai, Yingtao Tian, Bo Dai, Steven Skiena, and Le Song. Syntax-directed variational autoencoder for structured data. arXiv preprint arXiv:1802.08786, 2018.
[De Cao and Kipf, 2018] Nicola De Cao and Thomas Kipf. Molgan: An implicit generative model for small molecular graphs. arXiv preprint arXiv:1805.11973, 2018.
[Duvenaud et al., 2015] David Duvenaud, Dougal Maclaurin, Jorge Aguilera-Iparraguirre, Rafael G´omezBombarelli, Timothy Hirzel, Al´an Aspuru-Guzik, and Ryan P. Adams. Convolutional networks on graphs for learning molecular fingerprints. In Neural Information Processing Systems, 2015.
[Erd˝os and R´enyi, 1960] Paul Erd˝os and Alfr´ed R´enyi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 5(1):17–60, 1960.
[Gilmer et al., 2017] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine LearningVolume 70, pages 1263–1272. JMLR. org, 2017.
[G´omez-Bombarelli et al., 2018] Rafael G´omezBombarelli, Jennifer N Wei, David Duvenaud, Jos´e Miguel Hern´andez-Lobato, Benjam´ın S´anchezLengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D Hirzel, Ryan P Adams, and Al´an AspuruGuzik. Automatic chemical design using a data-driven continuous representation of molecules. ACS central science, 4(2):268–276, 2018.
[Goodfellow et al., 2014] Ian Goodfellow, Jean PougetAbadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
[Grover et al., 2018] Aditya Grover, Aaron Zweig, and Stefano Ermon. Graphite: Iterative generative modeling of graphs. arXiv preprint arXiv:1803.10459, 2018.
[Johnson, 2016] Daniel D Johnson. Learning graphical state transitions. 2016.
[Kingma and Welling, 2013] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
[Kipf and Welling, 2016] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
[Krenn et al., 2019] Mario Krenn, Florian H¨ase, AkshatKumar Nigam, Pascal Friederich, and Al´an Aspuru-Guzik. Selfies: a robust representation of semantically constrained graphs with an example application in chemistry. arXiv preprint arXiv:1905.13741, 2019.
[Kusner et al., 2017] Matt J Kusner, Brooks Paige, and Jos´e Miguel Hern´andez-Lobato. Grammar variational autoencoder. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 1945– 1954. JMLR. org, 2017.
[Li et al., 2015] Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493, 2015.
[Li et al., 2018] Yujia Li, Oriol Vinyals, Chris Dyer, Razvan Pascanu, and Peter Battaglia. Learning deep generative models of graphs. arXiv preprint arXiv:1803.03324, 2018.
[Liao et al., 2019] Renjie Liao, Yujia Li, Yang Song, Shenlong Wang, Will Hamilton, David K Duvenaud, Raquel Urtasun, and Richard Zemel. Efficient graph generation with graph recurrent attention networks. In Advances in Neural Information Processing Systems, pages 4257– 4267, 2019.
[Liu et al., 2018] Qi Liu, Miltiadis Allamanis, Marc Brockschmidt, and Alexander Gaunt. Constrained graph variational autoencoders for molecule design. In Advances in Neural Information Processing Systems, pages 7795–7804, 2018.
[Ma et al., 2018] Tengfei Ma, Jie Chen, and Cao Xiao. Constrained generation of semantically valid graphs via regularizing variational autoencoders. In Advances in Neural Information Processing Systems, pages 7113–7124, 2018.
[Ramakrishnan et al., 2015] Raghunathan Ramakrishnan, Mia Hartmann, Enrico Tapavicza, and O Anatole Von Lilienfeld. Electronic spectra from tddft and machine learning in chemical space. The Journal of chemical physics, 143(8):084111, 2015.
[Ruddigkeit et al., 2012] Lars Ruddigkeit, Ruud Van Deursen, Lorenz C Blum, and Jean-Louis Reymond. Enumeration of 166 billion organic small molecules in the chemical universe database gdb-17. Journal of chemical information and modeling, 52(11):2864–2875, 2012.
[Scarselli et al., 2008] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80, 2008.
[Seff et al., 2019] Ari Seff, Wenda Zhou, Farhan Damani, Abigail Doyle, and Ryan P Adams. Discrete object generation with reversible inductive construction. In Advances in Neural Information Processing Systems, pages 10353– 10363, 2019.
[Simonovsky and Komodakis, 2018] Martin Simonovsky and Nikos Komodakis. Graphvae: Towards generation of small graphs using variational autoencoders. In International Conference on Artificial Neural Networks, pages 412–422. Springer, 2018.
[Sohn et al., 2015] Kihyuk Sohn, Honglak Lee, and Xinchen Yan. Learning structured output representation using deep conditional generative models. In Advances in neural information processing systems, pages 3483–3491, 2015.
[Sutskever et al., 2011] Ilya Sutskever, James Martens, and Geoffrey E Hinton. Generating text with recurrent neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 1017–1024, 2011.
[Veliˇckovi´c et al., 2017] Petar Veliˇckovi´c, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.
[Vinyals et al., 2015] Oriol Vinyals, Samy Bengio, and Manjunath Kudlur. Order matters: Sequence to sequence for sets. arXiv preprint arXiv:1511.06391, 2015.
[You et al., 2018a] Jiaxuan You, Bowen Liu, Zhitao Ying, Vijay Pande, and Jure Leskovec. Graph convolutional policy network for goal-directed molecular graph generation. In Advances in Neural Information Processing Systems, pages 6410–6421, 2018.
[You et al., 2018b] Jiaxuan You, Rex Ying, Xiang Ren, William L Hamilton, and Jure Leskovec. Graphrnn: Generating realistic graphs with deep auto-regressive models. arXiv preprint arXiv:1802.08773, 2018.