Graph Convolutional Networks: analysis, improvements and results

2019·Arxiv

Abstract In the current era of neural networks and big data, higher dimensional data is processed for automation of diﬀerent application areas. Graphs represent a complex data organization in which dependencies between more than one object or activity occur. Due to the high dimensionality, this data creates challenges for machine learning algorithms. Graph convolutional networks were introduced to utilize the convolutional models concepts that shows good results. In this context, we enhanced two of the existing Graph convolutional network models by proposing four enhancements. These changes includes: hyper parameters optimization, convex combination of activation functions, topological information enrichment through clustering coeﬃcients measure, and structural redesign of the network through addition of dense layers. We present extensive results on four state-of-art benchmark datasets. The performance is notable not only in terms of lesser computational cost compared to competitors, but also achieved competitive results for three of the datasets and state-of-the-art for the fourth dataset.

Abstract In the current era of neural networks and big data, higher dimensional data is processed for automation of diﬀerent application areas. Graphs represent a complex data organization in which dependencies between more than one object or activity occur. Due to the high dimensionality, this data creates challenges for machine learning algorithms. Graph convolutional networks were introduced to utilize the convolutional models concepts that shows good results. In this context, we enhanced two of the existing Graph convolutional network models by proposing four enhancements. These changes includes: hyper parameters optimization, convex combination of activation functions, topological information enrichment through clustering coeﬃcients measure, and structural redesign of the network through addition of dense layers. We present extensive results on four state-of-art benchmark datasets. The performance is notable not only in terms of lesser computational cost compared to competitors, but also achieved competitive results for three of the datasets and state-of-the-art for the fourth dataset.

1. Introduction In the last few years, data is usually represented as points in a vector space. Today, structured data is omnipresent, able to include structural information between points and can be particularly important to better represent the models learned on them. For this purpose, the graphs are widely used to represent this type of information through nodes/vertices and edges, including local and spatial information derived from the data. Very often the interest concerns a prediction about the node properties in such graphs. For example, given a network that represents a human phenomenon, like a common exchange of messages in a social network, the goal is to predict the area of belonging i.e. users with common interests. Performing a forecasting process, especially in a semi-supervised environment, has been at the center of graph-based semi-supervised learning (SSL) Rozza et al. [2014]. In graph-based SSL, a small set of nodes is ini-

tially labeled. Starting from this information the remaining part of the graph structure is adopted, initially without the label, to label the nodes. Notationally, the structure described by the graph is normally incorporated as an explicit regularizer which applies a sliding constraint on the labels of the nodes to estimate. Recently, Graph Convolutional Networks (GCN)Deﬀerrard et al. [2016]; Kipf and Welling [2017] have been proposed with a purpose to work on deep neural networks and graph-structured data. In this paper, our attention is on the task of graph-based SSL using GCNs. GCN progressively estimate a transformation from graph to vector space, also called embedding, and an aggregation of neighborhood nodes, whereas a target loss function for backpropagation errors is adopted. Indeed, node embedding result represents an estimation for label scores on the nodes. Moreover, it would be appropriate to obtain an association of conﬁdence estimation to the label scores. This conﬁdence scores can be adopted to understand the reliability of the estimated labels on a generic node. This improvement is introduced in the model called Conﬁdence based Graph Convolutional Networks (Con-

fGCN) Vashishth et al. [2019]. In this context, the aim of our paper is fourfold: 1. The models provide a set of parameters to be optimized. In this phase the goal is to ﬁnd the optimal combination (activation function, loss function, hidden layers, number of nodes, etc) in order to obtain the best performance. It is assumed that this phase is very long and expensive. 2. In past few years, many researchers have been worked designing novel activation functions in order to help deep neural networks to converge and obtain better performance. Standard neural networks employ logistic sigmoid activation functions which is aﬀected by saturation problem and and consequently the eﬀectiveness and eﬃciency of the classiﬁcation is reduced. In this paper, we introduce an eﬃcient approach to learn, during training, combinations of base activation functions (such as Relu6); our goal is to check a search space for the activation functions through a convex combination of the base functions. 3. The models adopt only the information relating to the degree of the single nodes (matrix ˜D in equation 1) to process the graphs. In this regard, we introduce a measure that provides additional topological information called clustering coeﬃcients. 4. In deep learning the goal is to improve performance by focus attention on adding new layers, modifying the activation functions or changing the regularization methods. Furthermore, current structure, layers, can be redesigned to obtain optimal results compared to existing models. To this aim, we combined GCN and Dense layers. This new model provides a mixture of both GCN and Dense layers and compared to individual GCN, resulted in better performance. Additionally, we analyze the two baseline models, GCN and ConfCGN, in order to show the impact of proposed changes during training and testing phases. The paper is organized as follows: section 2 gives an overview of related work. Section 3 provides an overview of the two models GCN and ConfCGN. Section 4 explains the enhancement we proposed in this paper. Whereas, section 5 discusses the results achieved with proposed approach on GCN and confGCN. Finally, section 6 gives some future directions and concludes our paper. 2. Related Work Recent literature provides some interesting insights about application of neural networks and data organized as graphs. In Kipf and Welling [2017] a variant of convolutional neural networks, called Graph Convolutional Networks (GCNs), which operate directly on graphs is presented. Main motivation of convolutional architecture is related to localized ﬁrst-order approximation of spectral graph convolutions. The model works by scaling linearly nodes connections and adopts hidden layer representations which encode both structure and features of graphs. ———s on graphs, is presented. The proposed formulation does not alter the computational complexity of standard CNNs, despite being found to be processing graph structures. In Marcheggiani and Titov [2017], an enhanced version of Kipf and Welling [2017] is introduced. It is able to work with syntactic dependency graphs in form of sentence encoders and extracts words latent feature representations arranged in a sentence. Moreover, the authors showed that layers are complementary to LSTM layers. In Veliˇckovi´c et al. [2018], a neural network architecture for inductive and transductive problems, based on masked self-attentional layers, called graph attention networks (GATs), for graph-structured data is presented. In this model, nodes are able to contribute about neighboring features extraction and diﬀerent weights to diﬀerent nodes in a neighborhood are enabled, eliminating expensive matrix operations. In this way, several key challenges of spectral-based graph neural networks are addressed at the same time. In Vashishth et al. [2019], a modiﬁed version called Conﬁdence-based Graph Convolutional Networks (ConfGCN) of Kipf and Welling [2017] is introduced. The improvement concerns a conﬁdence estimation about label scores that has not been explored in GCNs. ConfGCN adopts label scores estimation to identify the inﬂuence of a node on the neighborhood during aggregation, thus acquiring anisotropic abilities. In Liao et al. [2018] a graph partition neural networks (GPNN), an extension of graph neural networks (GNNs), useful to work with large graphs are described. GPNNs combine local information between nodes in small subgraphs and global information between the subgraphs. Graphs are partitioned in eﬃcient way through several al-2

gorithms and, additionally, a novel variant for fast processing of large scale graphs is introduced. In Yadav et al. [2019] a modiﬁed version of Kipf and Welling [2017] named Lov´asz Convolutional Networks (LCNs) is introduced. The model is able to capture global graph properties through Lov´asz orthonormal embeddings of the nodes. In Atwood and Towsley [2016], a Diﬀusion-Convolutional Neural Networks (DCNNs) are described. Diﬀusion-convolution operation is useful to learn representations as an eﬀective basis for node classiﬁcation. The model includes diﬀerent qualities such as latent representation for graphical data, invariance under isomorphism, polynomial-time prediction and learning. In Bruna et al. [2014], possible generalizations of Convolutional Neural Networks (CNNs) to signals is deﬁned for more general domains. In particular, two models, one based upon a hierarchical clustering of the domain and another based on the spectrum of the graph Laplacian are described. The model is able to learn convolutional layers with a number of parameters independent of the input size, resulting in eﬃcient deep architectures. Further, a deep architecture with small learning complexity on general non-Euclidean domains is introduced in Henaﬀ et al. [2015]. The model is an extension of Spectral Networks which includes a graph estimation procedure. Finally, in Li et al. [2016] authors describe Gated graph sequence neural networks (GGNN), an extended versions of Graph Neural Networks (GNN) Scarselli et al. [2008], which uses modiﬁed gated recurrent units modern optimization techniques and extends output sequences. In the following section, we will explain the two baseline models i.e. GCN and ConfGCN. 3. Baseline Models In this section, we brieﬂy introduce Graph Convolutional Networks (GCNs) Kipf and Welling [2017] and its enhancement, Conﬁdence-based Graph Convolutional Networks Vashishth et al. [2019]. The two models are compared and analyzed in detail in terms of their limitations and diﬀerences. Subsequently, starting from these points a set of improvements are proposed and demonstrated experimentally. 3.1. Notations In this section we deﬁne the important elements for this research. Given G = (V, E, X) an undirected graph, where V = Vl ∪ Vu the set containing labeled (Vl) and unlabeled (Vu) nodes in the graph of dimension nl and nu, E is the set of edges and X ∈ R(nl+nu)×d is the input node features. The label of a node v is represented by a vector Yv ∈ Rm, belonging to m classes. In this context, the goal is to predict the labels, Y ∈ Rnl×m, of the unlabeled nodes of G. To consider the conﬁdence, label distribution µv ∈ Rm and a diagonal co-variance matrix Σv ∈ Rm×m estimations are added, ∀v ∈ V. µv,i represents the score of label i on node v, while (Σv)ii represents the variance in the estimation of µv,i. In other words, (Σ−1v )ii is conﬁdence in µv,i. 3.2. Graph Convolutional Networks Graph Convolutional Networks (GCNs) Kipf and Welling [2017] works on undirected graphs. Given a graph G = (V, E, X), the node representation after a single layer of GCN can be deﬁned as: H = f(( ˜D− 12 (A + I) ˜D− 12 )XW) (1) W ∈ Rd×d includes the model parameters, A represents

function such as ReLU, f(x) = max(0, x). Equation 1 can be reformulated as: hv = f� �

Whu + b�, ∀v ∈ V (2) b ∈ Rd represents bias, N(v) includes nodes neighborhood of v in graph G including v and hv is representation of node v. The goal is to acquire multi-hop dependencies between nodes, diﬀerent GCN layers can be superimposed over one another. The representation of the node v after k layers can be written as hv = f� �

(Wkhku + bk)�, ∀v ∈ V (3) where, Wk and bk represent the weight and bias parameters of GCN layer. 3

3.3. Conﬁdence based Graph Convolutional Networks In Vashishth et al. [2019] Conﬁdence-based Graph Convolutional Networks (ConfGCN) is described. The authors deﬁne the inﬂuence score of node u considering its near node v during GCN process as follows: ruv = 1dM(u, v) (4) dM(u, v) = (µu − µu)T(Σ−1u + Σ−1v )(µu − µu) (5) dM(u, v) represents Mahalanobis distance between two nodes Orbach and Crammer [2012]. Speciﬁcally, considering nodes u and v, with label distributions µu and µv and co-variance matrices Σu and Σv, ruv gives more importance to spatially close nodes belonging to same class, otherwise reduces importance of nodes with low conﬁdence scores. This results leads to inclusion of anisotropic capability during neighborhood exploration. For a node v, equation 3 can be rewritten as: hv = f� �

ruv × (Wkhku + bk)�, ∀v ∈ V. (6) The ﬁnal label prediction is obtained by equation 7 with K number of layers. ˜Yv = so ftmax(WKhKv + bK), ∀v ∈ V (7) 3.4. GCN versus ConfGCN Models The diﬀerences between the two models are shown below: 1. The major diﬀerence in both models is that GCN implements the nodes embedding, projection from graph space to vector space, to describe the neighborhood while ConfGCN implements the conﬁ-dence based prediction scheme where the neighboring nodes having higher conﬁdence would be important parameters for the label of the unknown nodes. 2. GCN model implements Chebychev polynomial method for the computational cost reduction while ConfGCN model uses the Loss Smoothening, regularization and optimization for better eﬃciency. Despite having more executional time per epoch ConfGCN model has better eﬃciency with the similar datasets. 3. GCN doesnt have constraints on the number of nodes that inﬂuences the representation of a given target node and each node is inﬂuenced by all the nodes in its k-hop neighborhood. Whereas, in ConfGCN the label conﬁdences are used to ignore less conﬁdent nodes and nodes having higher conﬁdence would be considered important. Furthermore, number of nodes inﬂuencing do not sway the prediction of the wrong labels. 4. For more number of nodes in graphs such as Cora and CoraML datasets, ConfGCN has signiﬁcantly better performances than Kipf GCN as the previous model implements the Nodes entropy of neighborhood calculation. The limitations of the two models are shown below: 1. In GCN, memory requirement grows linearly in the size of the dataset. Whereas, ConfGCN requires higher memory requirement. 2. GCN is not applicable to directed graphs. It does not support edge features and is limited to undirected graphs (weighted or unweighted). 3. In GCN, locality for the nodes are assumed. 4. ConfGCN require more computational cost compared to the basic model. Cost increases as a con-ﬁdence value, (equation 4), for the exploration of the neighborhood node. 5. ConfGCN require more time for execution. 6. In ConfGCN, increasing layers reduces the accuracy. This behavior is connected to the increase of inﬂu-encing nodes with increasing layers, which results in average information during aggregation. In the following section, we will explain the proposed enhancement we did for selecting an optimal deep model that may result in fewer executing time and enhanced performance. 4. Proposed Models: Enhanced GCN and ConfGCN We proposed four major enhancement for both the models. The ﬁrst enhancement is changing the hyper-parameters and training algorithm. The second and third are major enhancement i.e. adding more structural information to adjacency matrix and canonical optimization technique (also referred as convex). Finally, the fourth 4

concerns a combination of two base models with introduction of additional dense layers. All these enhancement are applied on both the baseline models. Following section will explain how these models are designed and implemented. 4.1. Optimizing Hyper-parameters First we optimized the baseline models by ﬁne-tuning the hyper-parameter that include activation function (AF), loss function (LF), and the number of nodes in each hidden layer. For AF we have explored it with ReLu, ReLu6, Elu, and S elu. In case of LF, we utilized simple cross entropy and cross entropy so ftmax V2. Whereas, to explore the best number of nodes, we have taken nodes in each layer as 16, 32, 48, 64, 80, 96, 100, 112 and 200. We explored to ﬁnd the best combination of these parameters to provide optimal result in minimum amount of time. From now we will call the two enhanced versions Optimized Graph Convolutional Networks (OpGCN) and Optimized Conﬁdence based Graph Convolutional Networks (OpConfGCN). 4.2. Convex combination of activation functions A standard neural network Nd can be composed of a set of hidden layers d and a set functions Li that lead to a ﬁnal mappingL related to a problem to address: Nd =L◦Ld ◦· · ·◦L1. Speciﬁcally, each hidden layer function Li is composed of two functions, gi and σi, which include parameters within the spaces Hgi and Hσi. A remapping of the layer input neurons in form of activation function can be seen as: Li = σi ◦ gi. The learning process of Li consists in a procedure of optimization in the space Hi = Hσi × Hgi. Commonly, σi does not provide for a learning phase and Hσi is a singleton. Then, Hi = {σi} × Hgi. If we consider a fully-connected layer from Rni to Rmi which adopts Relu activation function, Hgi represents the set of all aﬃne transformations from Rni to Rmi, then Hi = ReLu × Lin(Rni, Rmi) × K(Rmi), where Lin(A, B) and K(B) are respectively the sets of linear maps between A and B, and the set of translations of B. In this paper, we adopt a technique to deﬁne learnable activation functions Manessi and Rozza [2018] that can be adopted in all hidden layers of a GCN architecture. The approach consists of a hypothesis space Hσi and is based on the following idea: • select a set of activation functions F = {f1, . . . , fN}, in which elements can be adopted as base elements; • ﬁx the activation function σi to combine in linear way as elements belonging to F set; • look for an optimal hypothesis space; • look for GCN optimization respect to Hi = Hσi×Hgi. conv(A) := {Σiciai|Σici = 1, ci ≥ 0, ai ∈ A}; (8) conv(A) is not vector subspace of V and is a generic convex subset in V reducing to a (|A| − 1)-dimensional simplex when the elements of A are linearly independent. If we consider F := {f0, f1, . . ., fN} the set of activation functions fi, the vector space F is deﬁned from F considering all linear combinations �i ci fi with ci ≥ 0, Σici = 1. Note that, despite F is a spanning set of F, it is not generally a basis; indeed |F| ≥ dim F. Based on previous deﬁni-tions, we can now deﬁne the technique to build learnable activation functions as follows: • ﬁx a ﬁnite set F = {f1, . . ., fN}, where each fi is a learneable activation function; • create an additional activation functionf as a linear combination of all the fi ∈ F; • select as hypothesis space H

The results are obtained for the following combination for F: F := {Relu6, Relu6} (9) where Relu6 = min(max(0, x), 6) (10) In this work we have implemented two methods: 1. Taking two input layers of the model, use the diﬀer-ent activation for them and then applying any mathematical operations on the inputs, i.e. Summation, Subtraction, Maximum, minimum and Average values of both Input layers output. 5

2. Looking at those results we got to know that summation operations are having the best results so we applied the canonical form on the outputs. In this case the convex combination becomes conv(A) := c1Relu6 + c2Relu6. The Structure of Base-Line model with optimized results is shown in Table 1:

Table 1: Baseline Model structure for enhancing with convex approach

Whereas its enhanced model structure is given in Table 2: From now we will call the two enhanced versions Convex Graph Convolutional Networks (ConvGCN) and Convex Conﬁdence based Graph Convolutional Networks (ConvConfGCN). 4.3. Clustering coeﬃcients In equation 1 the adjacency matrix A, which describes the topology of the network, is very signiﬁcant part of both models. Furthermore, the identity matrix I is added to A in order to remove zero values on the main diagonal. Our idea is to add more information about nodes by introducing a particular property called Clustering Coeﬃ-cients. In graph theory, the clustering coeﬃcient describes the degree of aggregation of nodes in a graph. The measure is based on triplets of nodes. A triplet is deﬁned as three connected nodes. A triangle can include three closed triplets, each one centered on one of the nodes. Two possible versions can be deﬁned: the Global Clustering Coeﬃ-cients (GCCs) and the local Clustering Coeﬃcients (CCs) Opsahl [2013]. We adopt the second deﬁned as: CCi = δiki(ki − 1) (11) ki is the degree of node i and δi is the number of edges between the ki neighbors of node i. The measure is in the range {0, . . ., 1}, 0 if none of the neighbors of a node is connected and 1 if all of the neighbors are connected. Topological information is provided through CCs, which is connected to other structural properties Strang et al. [2018], such as transitivity, density, characteristic path length, and eﬃciency, useful for representation in the vector space. In this work we are suggesting that there is another possibility of the matrix I which is to replace the main diagonal of the matrix I with CCs values. For a graph of n × n nodes the identity matrix becomes:

(12) From now we will call the two enhanced versions Clustering Coeﬃcients Graph Convolutional Networks (CCGCN) and Clustering Coeﬃcients Conﬁdence based Graph Convolutional Networks (CCConfGCN). The Structure of Base-Line model with optimized results is similar to Table 1. Matrix was added to the Adjacency matrix while pre-processing of the input and the combined matrix was considered as input to the neural network. The new matrix In, having the same size as Identity matrix, is added to the adjacency matrix instead of plain identity matrix. 4.4. GCN and Dense Layer combination Deep learning models have shown that beside creating a new layer, activation function, regularization method etc., if one can redesign existing layers etc. in a proper way. It can result in optimal performance as compare to the previous models. We adopted the same GCN and Dense layers and created a model that gave the optimal results. A dense layer is commonly known as fully connected layer and it is represented as:

Here, ylnu represents the neuron at layer n, wlni,v represents the weight (i, v) for that neuron multiplied with input neuron yln−1i , and blnv represents that bias that is added to the weighted sum. The resultant weighted sum value is passed through an activation function fln. Table 3 shows the structure of the model. We used this model on all four datasets. After extensive experiments, the best results are shown in Table 5. This combination provides a mixture 6

of both GCN and Dense layers and result in better performance compared to individual GCN or Dense layer. The training phase adopts the same Adam optimizer (similar to all other models). In each layer, we used Relu6 activation function. From now we will call the two enhanced versions Dense Graph Convolutional Networks (DGCN) and Dense Conﬁdence based Graph Convolutional Networks (DConfGCN).

Table 3: Model having both GCN and Dense Layer

In Table 3, ’In-Nodes’ represents the input nodes to a layer, ’Out-Nodes’ represent the output nodes of a layer, ’AF’ represents the activation function, whereas drop out rate is represented by ’DO’. 5. Results This section describes the results obtained on public datasets with the proposed improvements. In addition, the achieved results will be compared to the state-of-the-art models in literature. 5.1. Datasets For performance evaluation we adopt several semi-supervised classification datasets that are commonly used by other researchers. The set of dataset comprise of Cora, Citeseer, Pubmed Sen et al. [2008], and Cora-ML Bojchevski and G¨unnemann [2018]. The setup is the same as being followed in Vashishth et al. [2019]. Our aim concerns to classify documents into one of the pre-defined classes. Datasets represent citation networks, in which each document is encoded using bag-of-words features with undirected edges between nodes. The dataset statistics is summarized in table 4. Label mismatch concerns the fraction of edges between nodes with different labels in the training data. The datasets have substantially low label mismatch rate except Cora-ML.

Table 4: Dataset statistics.

5.2. Competitors Our method is compared with approaches of diﬀerent nature. Competitors can be divided into four groups. First group includes approaches based on extensions of the GCN model. G-GCN Marcheggiani and Titov [2017] provides an extension adopting edge-wise gating to remove noisy edges during aggregation. GAT Veliˇckovi´c et al. [2018] provides a method based on attention which gives diﬀerent weights to diﬀerent nodes by allowing nodes to attend to their neighborhood. Dual-GCN Monti et al. [2018] allows to learn both vertex and edge features and generalizes the GAT model Veliˇckovi´c et al. [2018]. LGCN Gao et al. [2018] works based on a learnable graph convolutional layer (LGCL). LGCL automatically selects a ﬁxed number of neighboring nodes for each feature based on value ranking in order to transform graph data into grid-like structures in 1-D format. Fast-GCN Liang et al. [2015] is an accelerated and optimized tool for constructing gene co-expression networks that can fully harness the parallel nature of GPU (Graphic Processing Unit) architectures. Second group includes approaches based on extensions of the GNN model Scarselli et al. [2008]. GGNN Li et al. [2016] generalizes RNN framework for graph-structured data application. GPNN Liao et al. [2018] adopts partition approach to spread the information after the subdivision of large graphs into subgraphs. Third group includes approaches based on embedding. SemiEmb Weston et al. [2012] is a framework which provides semi-supervised regularization to improve training. DeepWalk Perozzi et al. [2014] adopts random walks to learns node features. Planetoid Yang et al. [2016] adopts a transductive and inductive approach for class label prediction using neighborhood information. Fourth group includes baseline approaches. LP Zhu et al. [2003] is a label propagation algorithm which spreads labels information to neighborhoodfollowing the proximity. ManiReg Belkin et al. [2006] provides geometric regularization on data. Feat Yang et al. [2016] works based on node features ignoring the structure infor-7

mation. 5.3. Comparison We have summarized our results by showing the best results of all the enhancements for all the datasets. Table 5 shows the accuracy of all the models mentioned in Section 4. We have been successful in getting state-of-the-art result on one dataset as well as very close to the state-of-the-art work done till now on the other three datasets as highlighted in green in Table 5. On CoraML dataset we achieved the current best accuracy of 86.9 ± 0.4 using DConfGCN model. This is the current state-of-the-art based on our knowledge as the most recent papers i.e. Dual-GCN, LGCN, and Fast-GCN did not reported their results on CoraML dataset. In case of Citeseer dataset, the best result which we achieved is 73.26%, this is more than Dual GCN and 0.3% less from LGCN. This makes our accuracy with ConvConfGCN the second best till date. However, just to highlight that LGCN Gao et al. [2018] report only the best result whereas our result are based on 100 run which are more stronger compare to reporting one highest performance. We have got the 3rd best accuracy for Pubmed dataset i.e. 79.8 ± 0.4. Finally, on Cora dataset, we achieved 82.1±1.2 accuracy with DGCN that is better than baseline GCN and ConfGCn by slight margin, but at 4th position overall in the list. One of the reason for not having the best result for Citeseer, Cora, and Pubmed could be that the best reported results in the papers Gao et al. [2018]; Monti et al. [2018]; Liang et al. [2015] are not having the mean performance over multiple runs. Another reason is that, our model can not be directly compared with model like LGCN as it uses regular convolutional kernel in their model. Rather designing new kernels to work on graph data, in LGCN the authors organized the graph data in a way that normal convolutional kernel can operate over it and learn feature from them. These enhancement and results are reported to provide baseline for future works to be done in the field of SSL for the Graphs. In table 6 execution time for PubMed dataset is shown. As the size of the features in each dataset varies, that is why the time (in seconds) per epoch varies for each dataset. GCN and its enhancements are faster compared to confGCN and its enhancements. While optimizing based on hyper-parameters, we found that the major reduction in computational cost was due to usage of the

Table 5: Performance comparison of different methods on described datasets.

cross-entropy softmax V2 function rather than simple cross-entropy. Therefore, in all our later experiments we used this loss function. ConfGCN based models took more time compare to GCN based models. The optimal models interms of execution time is OPGCN.

Table 6: Execution time on Pubmed dataset

6. Conclusions We present enhanced models of GCN and ConfGCN for the Graph Convolution with Semi-supervised learning. The main focus among all the enhancements was on four changes: parametric conﬁguration, adding more structural information to adjacency matrix for graph representation, convex optimization related to activation functions 8

and combination of base models and dense layers. As this work is related to the Graph convolutions and with these enhanced models we have been able to show that the addition of the layers can be helpful for the increment of the accuracy, so this process has opened a path where addition of layer means accuracy reduction limitation of SSL has been removed. Also currently all the Graph Convolutional Layers are using 1D convolutions to operate the model, there can be 2D or 3D dimensional weighing schemes can be implemented on the concurrent models. The GCN was a new approach for SSL and in that the layer-wise propagation rule was implemented while ConfGCN is a model which estimates label scores with labels conﬁdence. We have prepared six diﬀerent models with diﬀerent conﬁgurations and we have validated our models with four benchmark datasets. In majority of the enhancements, we have been successful in increasing the accuracy as well as the execution time for all the best possible con-ﬁgurations in all four data-sets. Acknowledgments The ﬁrst two authors acknowledge the guidance and supervision of their late Prof. Alfredo Petrosino. May he rest in peace. References

James Atwood and Don Towsley. Diffusion-convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1993–2001, 2016.

Mikhail Belkin, Partha Niyogi, and Vikas Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of machine learning research, 7(Nov):2399–2434, 2006.

Aleksandar Bojchevski and Stephan G¨unnemann. Deep gaussian embedding of graphs: Unsupervised inductive learning via ranking. In International Conference on Learning Representations, pages 1–13, 2018.

Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann Lecun. Spectral networks and locally connected networks on graphs. In International Conference on

Learning Representations (ICLR2014), CBLS, April 2014, 2014.

Micha¨el Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems, pages 3844– 3852, 2016.

Hongyang Gao, Zhengyang Wang, and Shuiwang Ji. Large-scale learnable graph convolutional networks. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1416–1424. ACM, 2018.

Mikael Henaff, Joan Bruna, and Yann LeCun. Deep convolutional networks on graph-structured data. CoRR, abs/1506.05163, 2015.

Thomas N. Kipf and Max Welling. Semi-supervised clas- sification with graph convolutional networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017.

Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard S. Zemel. Gated graph sequence neural networks. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, 2016.

Meimei Liang, Futao Zhang, Gulei Jin, and Jun Zhu. Fast- gcn: a gpu accelerated tool for fast gene co-expression networks. PloS one, 10(1):e0116776, 2015.

Renjie Liao, Marc Brockschmidt, Daniel Tarlow, Alexan- der L. Gaunt, Raquel Urtasun, and Richard S. Zemel. Graph partition neural networks for semi-supervised classification. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Workshop Track Proceedings, 2018.

Franco Manessi and Alessandro Rozza. Learning com- binations of activation functions. In 2018 24th International Conference on Pattern Recognition (ICPR), pages 61–66. IEEE, 2018.

Diego Marcheggiani and Ivan Titov. Encoding sentences with graph convolutional networks for semantic role labeling. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1506–1515, 2017.

Federico Monti, Oleksandr Shchur, Aleksandar Bo- jchevski, Or Litany, Stephan G¨unnemann, and Michael M Bronstein. Dual-primal graph convolutional networks. arXiv preprint arXiv:1806.00770, 2018.

Tore Opsahl. Triadic closure in two-mode networks: Re- defining the global and local clustering coefficients. Social Networks, 35(2):159–167, 2013.

Matan Orbach and Koby Crammer. Graph-based trans- duction with confidence. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 323–338. Springer, 2012.

Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deep- walk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 701–710. ACM, 2014.

Alessandro Rozza, Mario Manzo, and Alfredo Petrosino. A novel graph-based fisher kernel method for semi-supervised learning. In 2014 22nd International Conference on Pattern Recognition, pages 3786–3791. IEEE, 2014.

Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80, 2008.

Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Galligher, and Tina Eliassi-Rad. Collective classification in network data. AI magazine, 29 (3):93–93, 2008.

Alexander Strang, Oliver Haynes, Nathan D Cahill, and Darren A Narayan. Generalized relationships between characteristic path length, efficiency, clustering coeffi-cients, and density. Social Network Analysis and Mining, 8(1):14, 2018.

Shikhar Vashishth, Prateek Yadav, Manik Bhandari, and Partha Talukdar. Confidence-based graph convolutional networks for semi-supervised learning. In The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019, 16-18 April 2019, Naha, Okinawa, Japan, pages 1792–1801, 2019.

Petar Veliˇckovi´c, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Li`o, and Yoshua Bengio. Graph attention networks. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, 2018.

Jason Weston, Fr´ed´eric Ratle, Hossein Mobahi, and Ro- nan Collobert. Deep learning via semi-supervised embedding. In Neural Networks: Tricks of the Trade, pages 639–655. Springer, 2012.

Prateek Yadav, Madhav Nimishakavi, Naganand Yadati, Shikhar Vashishth, Arun Rajkumar, and Partha Talukdar. Lov´asz convolutional networks. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 1978–1987, 2019.

Zhilin Yang, William W Cohen, and Ruslan Salakhutdi- nov. Revisiting semi-supervised learning with graph embeddings. In Proceedings of the 33rd International Conference on International Conference on Machine Learning-Volume 48, pages 40–48. JMLR. org, 2016.

Xiaojin Zhu, Zoubin Ghahramani, and John D Lafferty. Semi-supervised learning using gaussian fields and harmonic functions. In Proceedings of the 20th International conference on Machine learning (ICML-03), pages 912–919, 2003.