Reconstruction of Gene Regulatory Networks usingMultiple Datasets

2019·Arxiv

Abstract

Abstract

Motivation: Laboratory gene regulatory data for a species are sporadic. Despite the abundance of gene regulatory network algorithms that employ single data sets, few algorithms can combine the vast but disperse sources of data and extract the potential information. With a motivation to compensate for this shortage, we developed an algorithm called GENEREF that can accumulate information from multiple types of data sets in an iterative manner, with each iteration boosting the performance of the prediction results. Results: The algorithm is examined extensively on data extracted from the quintuple DREAM4 networks and DREAM5’s Escherichia coli and Saccharomyces cerevisiae networks and sub-networks. Many single-dataset and multi-dataset algorithms were compared to test the performance of the algorithm. Results show that GENEREF surpasses non-ensemble state-of-the-art multi-perturbation algorithms on the selected networks and is competitive to present multiple-dataset algorithms. Specifically, it outperforms dynGENIE3 and is on par with iRafNet. Also, we argued that a scoring method solely based on the AUPR criterion would be more trustworthy than the traditional score. Availability: The Python implementation along with the data sets and results can be downloaded from github.com/msaremi/GENEREF

1 INTRODUCTION

GENE Regulatory Networks (GRNs) model the crucialinteractional patterns that control the molecular machinery in the cells of an organism [1]. With the advent of high-troughput technologies, simultaneous expression data from several genes have been available for many species and many algorithms have been proposed that can reconstruct GRNs from these data. Though algorithms perform quite well on artificial data sets, the shortcomings the algorithms encounter when dealing with laboratory data, keeps deciphering GRN data an open challenge in the field of systems biology. Multiplicity and variety of genomic data sets gathered for a species provide an opportunity to develop algorithms that employ multiple data sets to extract more information and yield more accurate results in the real world challenges. In this paper, we provide an algorithm, named GENEREF (GEne NEtwork inference with REgularized Forests), that brings together the idea of boosting in machine learning [2] and the capability of feature scoring in decision trees into a single model. The model is capable of using multiple types of data sets for the task of GRN reconstruction in arbitrary orders.

There are typically two types of transcriptome data that are used to reconstruct GRNs: Steady-state data and time series data. The steady-state data are obtained by applying simultaneous perturbations on a number of genes and measuring the expressions after the network reaches a steady state [3], [4]. Data of this type is plentiful [5]. In the case of multiple perturbations, the system can be corresponding to profiles acquired from different donors or biological replicates [3]. The time series data on the other hand can in principle be more informative, since they capture the dynamic behaviour of the system throughout the time. Despite the temporal information, they are less available due to the capturing and timing difficulties in the process. The systematical issues that arise with regards to this type of data sets include rarity of fairly uniform cell populations to sample over time, de-synchronization of cells during the experiment, and high costs of dense sampling of the population [6].

In the realm of steady state data, ensemble methods based on regressor trees have shown to be one of the most effective approaches to extract the relation between the genes in a GRN [3], [7], [8], [9], with GENIE3 being the best performer in the DREAM4 challenge [10]. The DREAM4 challenge measures performance of variant algorithms on the reverse engineering of five steady state in silico networks. GENIE3 divides a problem with g genes into g sub-problems and proceeds to solve each problem by the intrinsic feature ranking property of random forests or extra trees. This work can be thought of as a multi-iteration algorithm where each iteration contains an extension to the GENIE3 algorithm applied on a data set. Unlike GENIE3, it can exploit multiple data sets and is less prone to over-fitting because of the regularization mechanisms it employs.

An advent algorithm that we will consider as the competing algorithm with our work is dynGENIE3. It is another extension to GENIE3, that introduces the differential equations governing the gene expression relationships to the typical tree-based ensemble methods used in GENIE3. The differential equations are formed based on the assumption that the transcription rate of the target gene is a function of the expression levels of all other genes and the decay rate of the products of the target gene itself. Compared to dynGENIE3, two advantages of our algorithm are that it does not depend on the decay rate parameters included in the differential equations and it can be extended to use different types of steady state data (e.g. double knock-out and multi-perturbation) [9].

In sum, GENEREF is a highly scalable method that consistently increases the performance of the base algorithm (GENIE3), and outperforms the present multi-perturbation GRN algorithms. The algorithm also receives better scores than the competing algorithms (dynGENIE3 and iRafNet) on some networks when using the typical scoring method incorporating both AUROC and AUPR. Moreover, GENEREF drastically outperforms dynGENIE3 when a new scoring method based only on the AUPR metric is used.

In what follows we will introduce our method. This paper also encompasses the evaluation of our algorithm on the in silico DREAM4 networks in comparison to other network reverse-engineering methods.

2 METHODS

The problem of GRN reconstruction can be summarized as initially inferring rankings for the potential regulatory links from the data and eventually applying a threshold on these rankings to obtain a predicted network. GENEREF – along with many other algorithms in the field [3], [9], [12], [13] – deals only with the former issue and puts the latter aside. Although, it is possible to compute the expected optimal threshold cut-off [14], comparable algorithms typically satisfy with taking into account all of the distinguishing thresholds when reporting their evaluation results. (We will discuss these evaluations metrics further in sub-section 3.2.) In what follows in this section, we provide the details of problem and the preliminary structures of the algorithm, along with its definition.

2.1 Network Inference with Tree-based Methods

Tree-based algorithms belong to the group of GRN inference methods that associate a value to each ordered pair of genes predicting how certainly there is a link in the ground truth network from the first to the second gene. This is done by finding a regression that predicts the expression of one gene based on the others. The basic idea is to decompose the problem of finding the regulatory links in g genes into g sub-problems. Each sub-problem will then be a regression problem in which the goal is to find the best prediction of the target gene based on the expression values of all the other genes.

Generally, tree-based methods – be them random forest or extra trees – are non-parametric algorithms that implement the idea of regression algorithms using regression trees: For each gene ), a sub-problem is defined. In the j-th sub-problem, the expression values of the j-th gene will be considered as the target values and the expression values of all other genes will express the feature values. A regression is then solved for each sub-problem using a tree-based method.

Trees have the innate capability of calculating feature importance scores. As the importance of the i-th feature in the j-th sub-problem gives an estimation about the certainty of the existence of an edge from gene to , the tree-based algorithms merge these scores measured for each sub-problem to produce a final ranking for all edges in the network.

In a random forest setup, the average of the importance values of all features for a tree is close to the variance of the output variable. Therefore, to make the feature importances in all sub-problems homogeneous, the expression levels of all genes are typically normalized before the data is provided to the algorithm [3]. In our work, we applied this pre-processing step for each data set before it is fed to a random forest.

2.2 Regularized Random Forests

Regularized Random Forests (RRFs) are an extension to the typical random forests that are superior to the regular forests in handling the issue of feature redundancy. When used as a feature selection technique, RRFs can run with the same setup as regular forests. Firstly introduced in 15, RRFs addressed the feature redundancy issue in the problem of feature selection by setting a penalty against adding new features to the currently selected features set and trying to maintain a minimal feature set.

In the original RRF framework, consider a feature selection problem with g features represented as the feature set . Trees in a forest are formed in a top-down manner, selecting one feature and a split point for each node that is being added to the hierarchy of the tree. The selected feature and its split point will determine the importance of each feature after the construction of all trees. As a tree in the forest is built, there will be more and more features based on which the learning data is split. We name the set of these features . In every node of each tree, the node is configured based on the improvement function as follows:

ImprovementRImprovementImprovementotherwise. (1) where is the node being configured and is a regularization parameter in the range [0, 1). Improvement(.) is the function used in regular random forests and in the case of a regression problem, can be any of the Mean Squared Error (MSE) or Mean Absolute Error (MAE) gains [16]. The ImprovementR function is tested for a randomly selected set of features and their randomly selected split points and the feature with the maximum improvement will be assigned to the node:

where is the feature assigned to node is then updated by the insertion of . It can be seen that as shrinks in equation 1, the tree biases more towards selecting features that are already in the selected feature set, and hence will be more picky about choosing features in computed nodes. By choosing the proper value for , it will therefore become possible to omit the redundant features from the final selected features using an importance score.

After the construction of the whole forest, the relative importance of a feature is measured by averaging the values of the Improvement function over all the nodes split by that feature:

where is the set of all nodes in the whole forest with the feature. These importance measurements guide the procedure of feature selection in the RRF.

Empirical data indicate that GRNs are sparse networks with in-degree distribution concentrated around the mean connectivity [17], [18], [19]. We speculate that the prediction performance of such networks can be enhanced using regularization methods. The regularization also reduces the probability of over-fitting. Based on these grounds, we utilize the same idea of RRFs and use an algorithm similar to the guided regularized random forests proposed in 20. We propose an iterative algorithm that uses various sets of learning data to reconstruct the final regulatory network. Each of these data sets can be seen as the output of an experiment. This work also inherits the decomposition approach used in many GRN reverse engineering algorithms like GENIE3 [3] and TIGRESS [12]. On each iteration, the algorithm picks a data set and uses the regularization parameters obtained from the previous iteration to guide a random forest. At the end, the network edges are ranked based on the final feature importance values.

2.3 Problem Definition

Suppose that we want to discover the regulatory links between g genes () of a certain species. Also, consider there to be M learning data sets , (l = 1, ..., M) of that species. Each data set represents a set of gene expression profiles obtained from a corresponding experiment. The l-th data set is comprised of gene expression profiles. Each gene profile is a snapshot containing the expression values of all of the g genes.

Here, we consider two types of experiments: steady state experiments and time-series experiments. To differentiate between the type of data sets that are generated using these experiments, we superscribe them with appropriate abbreviations. In our case, every arbitrary data set is either represented as if it holds the measurements of a steady state experiment, or as if it holds the measurements of a time-series experiment.

In a steady state experiment, perturbations are applied to randomly selected subset of genes (i.e. their basal expression is modified) and snapshots are captured after the network reaches a steady state. Each element in a steady state data set corresponds to the expression values of all genes in one of these snapshots. If the experiment producing entails profiles labeled with , then will be comprised of the following elements:

where is a vector containing the expression levels of all g genes in profile in that experiment:

In this vector, is the expression level of gene in profile . We also denote the vector of the expression level of all genes except in profile using :

In a time series experiment, interventions are applied to a subset of genes, and while their causal effect is propagated through the GRN, the expression of all genes are probed simultaneously at specific time points. Consider that the profiles in the experiment producing are captured at time points after the perturbation is applied (i < j indicates will consist of elements ordered as follows:

where is a vector that contains the expression levels of all g genes at time :

For each multi-factorial perturbation data set, a learning problem composed of g sub-problems can be constructed, such that in the j-th sub-problem, the j-th gene values become the target and all the other values constitute the input features. Every sub-problem is parallel to a regression problem where the goal is to find functions that minimizes the errors in the following equation:

and

Note that in the steady-state setup the value of is excluded from the domain of function , because steady-state data sets have no information regarding causal selfloops.

Time-series learning data can be treated similarly, with the difference that the causal effect of the regulators appear after a time delay. Hence, the expression of each gene will be a function of the expression of all genes (possibly including the same gene) in a previous time point. This is equivalent to an auto-regressive model [21]. We use the same expression as equation 9 with the exception that the input features will be chosen from a previous time point ):

For the sake of simplicity, we will keep k = 1 throughout this paper.

These regression sub-problems can be solved by the regular random forests or the guided RRFs within which the importance of a feature will be measured using equation 3. When the guided RRFs are used, the regularization parameters should be provided as introduced in sub-section 2.4. Throughout the rest of this paper we will drop the superscript from the notation of data sets wherever the type of the data set is not important.

Fig. 1: Folded data flow diagram of GENEREF. On the l-th iteration, the l-th data set and the l-th regularization matrix are fed to a random forest and the l-th confidence matrix will be produced. The next level’s regularization matrix is then constructed based on .

2.4 The GENEREF framework

The goal of a multi-dataset algorithm is to combine available data sets in a plausible manner to produce a network as similar to the ground-truth network as possible. (We will discuss the similarity criteria in sub-section 3.2.) Here, we represent the ground-truth network by an adjacency matrix . Our algorithm, GENEREF, produces confidence matrices, ), that are of the same size as W. The elements of the last confidence matrix, , are expected to show how certainly the corresponding edges will be appearing on the actual network. The goal of GENEREF is therefore to produce a on which the elements corresponding to the real connections of the network get higher values than the others.

GENEREF works in an iterative manner: On each iteration it takes a pre-processed1 data set and provides a set of regressors (here, the regular or regularized random forests) with it. Figure 1 shows the overall structure of the algorithm. There are M data sets in total. The data sets can be of either type. There is also no restriction upon the uniqueness of the data sets. Two data sets, and , are allowed to share elements or be equal. In principle, the algorithm accepts any order of data sets.

On each iteration, GENEREF picks a data set and decomposes it to g new data sets, each of which is represented as a separate sub-problem to be solved by a regular or regularized random forest. On the first iteration, a set of g regular random forests are trained with the decompositions of data set . Consider sub-problem j on this iteration. The random forest regressor solves this sub-problem and provides importance scores for the features. We represent these scores as a vector where is the importance score for the i-th feature in the j-th sub-problem of data set and is caculated based on equation 3. Note that in the steady state configuration, the j-th feature is not present and therefore . However,

in a time-series data set, the j-th feature can be of a higher value. Up to this stage, our algorithm parallels GENIE3 [3]. GENIE3 generates a temporary matrix by simply merging the column vectors , and represents it as the final edge confidence matrix. The issue with this approach is that since the sub-problems have been solved independently, feature importance values will not be comparable across the columns and cannot be joined without further modification. Therefore, a modulation phase is needed on the columns before joining them. An ideal regressor is expected to assign the highest possible value (i.e. 1) to the “relevant” features and the least possible value (i.e. 0) to the “irrelevant” ones. By diversifying the values on each column to these extents, the modulation phase helps the potential links on various columns to be treated equally. While GENIE3 keeps as is and reports it as the final confidence matrix, GENEREF goes on to performing the modulation phase by applying the following standardization function on each column of :

where is a small constant. The edge confidence matrix is constructed by merging the column vectors :

On each of the next iterations, GENEREF exploits the next data set. The feature importance values are computed using the regularized random forest. However, since on these iterations GENEREF has already acquired knowledge about which features are potentially more important using the previous learning data, it uses this knowledge to regularize the random forests. To this end, GENEREF uses a vector of regularization values for each sub-problem at each iteration. We represent these vectors by (), where is the regularization value of the

j-th feature in the i-th subproblem on iteration l. The vectors form the l-th iteration’s regularization matrix using the following equation:

The details about how the regularization parameters are computed are skipped for now and will be provided in sub-section 2.4.1. After these parameters are calculated for the current iteration, the RRF computes the improvement function for each node in the forest using equation 15 and retrieves the new feature importance scores .

Here, is equivalent to the feature corresponding to the j-th gene in the i-th sub-problem.

GENEREF provides a matrix for each iteration in the same way as the second iteration. Once the algorithm finishes iteration M, it returns the final edge confidence matrix . Every element on this matrix, shows the certainty that an edge exists from gene to gene . Please refer to algorithm 1 for the concise definition of this method. Standard performance criteria, as described in section 3, can be used to evaluate this matrix by comparing it to the adjacency matrix W.

2.4.1 Regularization parameters

The regularization parameters matrix used on the l-th iteration of GENEREF, , is calculated using the edge con-fidence matrix values obtained in iteration in the algorithm, . In this sub-section, we will introduce a method to properly convert to . This method encompasses an operation followed by a mapping. We refer to the mapping as the regularization mapping throughout the rest of this paper.

The more the certainty of the presence of an edge, the more potentially important that edge will be, and therefore it should be penalized less. Although the confidence values provide an ordering for the importance of each edge, there is no guarantee that their values depict the relative importance of the edges. In fact, using the random forest regressors is not a proper means to get accurate regularization values. Instead, we remove the relative importance values while keeping their ordering. The following function does this.

where is the indicator function. The operation helps us make sure that the whole domain of the mapping takes its effect evenly at each iteration. This gives a full control of how we want to take into account different edges based on their importance.

It has been reported that GRN’s have fast decreasing in-degree distributions [22], [23]. This means that for each node there are only a few direct regulatory genes. Also, it has been shown that GRN’s typically have few global regulatory genes [22], [23], [24], [25]. It is therefore expected that a few links should be preserved as the important ones and all the others should be left as unimportant. The important ones are preferably those with the highest importance values. It is the regularization mapping that determines the regularization values based on the retrieved importance of the edges.

For the regularization mapping, we chose a simple, yet fairly flexible function – the Kumaraswamy cumulative distribution function (CDF) [26]. This extension of the power function, ensures selecting the few global regulators if proper parameters are set. However, any other nondecreasing functions that try to satisfy the aforementioned properties of GRNs can be tested too. We denote the Kumaraswamy CDF function using KCDF :

Consider that is the matrix comprised of the elements . The Kumaraswamy CDF alters the importance values by using two constant shape parameters and and returns the regularization parameters matrix:

Both and range from 0 to 1. Roughly speaking, greater values extend the domain of non-regulatory edges, while greater values favour preserving a broader domain for regulatory links. When , all features are treated equally, there will be no regularization, and the RRF treats all features as equally important. We describe how selective the regularization mapping is by defining a property called the “selectivity” of the mapping. In section 3, we will make use of this property to discuss the optimal value of and .

We define the selectivity property as the area between the regularization mapping and the line y = 1 and denote it by Sel . When the regularization mapping is the KCDF function, the property is defined as:

In essence, the selectivity property is a determiner of how selective the regularization mapping is; the higher the selectivity, the more picky the mapping about taking into account less important edges.

3 RESULTS

3.1 Data selection

In this work, we chose the DREAM4 and DREAM5 networks as the base gold-standard networks from which the data sets were generated. The DREAM4 data sets [27] are available as part of the DREAM4 challenge [10] and consist of five in silico networks. Each network has 100 genes and is generated from specific gene modules chosen to mimic networks in two model species Escherichia coli and Saccharomyces cerevisiae. The DREAM5 networks are part of the DREAM5 standard challenge [11]. Two of the DREAM5 networks that we used in our experiments belong to the same model species, but are larger and are made based on curated interactions in E. coli and high-confidence set of interactions extracted from genome-wide transcription factor binding data (ChIP-chip) and evolutionarily conserved binding motifs in S. cerevisiae [11], [37]. In addition to the original DREAM5 networks, we also extracted some subnetworks these networks.

We used GeneNetWeaver (GNW) 3.0 [28] as the tool to generate the in silico data sets from the gold-standard networks. For each network in the DREAM4 challenge, we produced 10 collections of training data. In each collection, there are three data sets: a multi-perturbation, a knockouts and a time-series data set were generated. All of the data sets were produced using the recommended settings in DREAM4. As in the DREAM4 challenge, internal noise based on stochastic differential equations and a mix of normal and log-normal noise as the measurement noise were added to each network.

We used the same piece of software to extract sub-networks from the original networks in the DREAM5 challenge. In addition to the two original DREAM5 networks, this let us produce ten more networks of these two model species:

• E. coli sub-networks: 5 sub-networks of the E. coli gold-standard network comprised of 40, 80, 160, 320, and 640 genes,

• S. cerevisiae sub-networks: 5 sub-networks of the S. cerevisiae gold-standard network comprised of 100, 200, 400, 800, and 1600 genes.

All of the above networks were generated using the GNW’s network extraction functionality with its default settings, after removing the self-regulatory links. In the case of the S. cerevisiae sub-networks, the greater sub-networks are super-networks of the smaller subnetworks. For example, S. cerevisiae of size 1600 is a super-network of S. cerevisiae of size 800. We generated four data sets for each of the five networks in any group: A multi-factorial perturbation data set, a time-series data set, and two knockout data sets. All of these data sets were generated using GNW with the default settings.

Overall, there were 5 DREAM4 networks, 2 DREAM5 networks, and 10 DREAM5 sub-networks. We used each of these groups for distinct tests and evaluations. While the original networks will be used for the final evaluations, we will use the sub-networks as the validation sets, and extract the optimal parameters of GENEREF based on them.

3.2 Performance evaluation

To evaluate the results from the algorithm, each final edge confidence matrix ( ) is compared to the corresponding gold-standard network. Two evaluation metrics were considered: Area under the receiver operating characteristics (AUROC) and area under precision-recall curve (AUPR). The p-values of the metrics – which are the probability that a random reconstruction algorithm produces an equal or greater value than the metric – were calculated based on the methods introduced in 29. Based on the p-values, the prediction performance metric for a single network can be defined as:

where and are respectively the AUPR p-value of the k-th network and the AUROC p-value of the k-th network. The overall score of the algorithm over all C networks is determined by averaging the scores obtained for each network:

Here, C depends on the number of networks in the experiment. For example, if our experiment is performed on the five DREAM4 data sets, then C = 5.

Our observations suggest that the AUROC and AUOR scores do not show uncorrelatedness to a confidential extent

Fig. 2: The AUROC and AUPR of 100,000 evaluations of a random predictor used on a network with 100 genes and 176 edges (Network 1 from DREAM4).

on sparse networks on the scale of DREAM4’s. Refer to figure 2 as an example, where the random AUROC and AUPR values of Network 1 in DREAM4 are depicted and a lack of complete correltedness can be inferred. (We will further discuss the correlation between these two metrics in a network in Appendix A.) Moreover, it has been shown that the AUPR metric is more informative than the AUROC in classification problems with skewed class distributions as in biological regulatory networks and the DREAM4 and DREAM5 networks [30]. Considering these two facts, we determined to introduce a second score metric too:

and similar to equation 21, the overall score for scoreAUPR will be calculated using the following equation:

3.3 Evaluation of the competitive methods

Among the competing algorithms are those that utilize only one type of data sets: SVR [32] (which uses support vector regressors) and its ensemble variant E-SVR; TIGRES [12], which applies the linear regression equation directly to the problem; GENIE3 [3] and its time-lagged variant tl-GENIE3, which are roughly equivalent to one iteration of GENEREF; BiXGBoost [13], which considers two local structure of GRNs into a bidirectional model and applies a gradient boosting with regressor trees [31] to solve the model; and NIMEFI (GENIE3+E-SVR) [33], an ensemble method that combines predictions from GENIE3 and E-SVR. In order to have a fairer comparison, we also included dynGENIE3 [9], which utilizes both types of data sets, and iRafNet [8], which guides the training of a data set using an other data set. Both GENIE3 and dynGENIE3 are tree-based ensemble methods, with dynGENIE3 taking into account the ordinary differential equations acting in both the steady state and time series data sets. iRafNet, is another tree based algorithm that applies prior knowledge extracted from other data set during the training of random forests on the main data set. Unlike GENEREF, that alters the Improvement function, iRafNet uses the prior weights to alter the distribution of feature selection probability during the construction of the trees.

The DREAM4 scores that we report for SVR, E-SVR, TIGRESS, and NIMEFI were taken from 33. We ran our own implementations for GENIE3, tl-GENIE3, and GENEREF. In order to obtain the results for dynGENIE3, the original implementation was run on our data sets. The scoring metrics of iRafNet were calculated using the AUROC and AUPR values reported in 8. Also, we calculated the scoring metrics of BiXGBoost based on its AUROC and AUPR values reported in 13.

3.4 Performance evaluation on the DREAM4 networks

As GENEREF can be launched with different arrangement of data sets, here we evaluate the influence of the number of the data sets and their types. We tested GENEREF with different combination of three data sets: a multi-factorial perturbation steady state data set, a multi-perturbation time-series data set, and a single knock-out steady state data set. For the sake of abbreviation, we call them respectively MF, TS, and KO. All data sets were generated using the GNW application, with the default settings. This means that all the data sets were generated with the same settings as the DREAM4 contest. We call an ordering of the data sets a configuration. Based on the number of data sets that GENEREF was trained on, the configurations that we used for our experiments can be divided into three categories:

• GENEREF(MF), GENEREF(TS), and GENEREF(KO): On each of these configurations, only one data set was exploited. Note that the GENEREF algorithm in these configurations is equivalent to GENIE3 or tl-GENIE3 with the exception that GENEREF applies a modulation phase.

• GENEREF(KO+TS), GENEREF(TS+MF), and GENEREF(KO+MF): In this set of configurations, only two of the data sets were used. For example, GENEREF(TS+MF) trains on the MF data set, using the regularization matrix attained from the TS data set.

• GENEREF(KO+TS+MF): This configuration utilizes all the three data sets, in the indicated order, to infere the GRN.

Unless explicitly stated for an experiment, other orderings of the data sets were set aside.

There are two parameters in GENEREF that should be kept in consideration. They are the and parameters that are the shape parameters of the mapping applied on the confidence matrices to obtain the feature importance matrices. Although the parameter can differ from one iteration in the algorithm to another, we chose to keep it constant throughout the algorithm for the sake of simplicity.

To eliminate the parameters and get an overall assessment for the performance that is comparable to present state-of-the-art methods, we performed a 5-fold cross validation on the five DREAM4 networks. To this end, for

tl-GENIE3(TS) GENIE3(KO) GENIE3(MF)

GENEREF(TS) GENEREF(KO) GENEREF(MF)

Fig. 3: The effect of the modulation phase on the performance of the first iteration of GENEREF. 10 experiments for each model were done in order to obtain the robustness values.

each of the five networks, we applied a grid search with 13-by-13 logarithmically uniform intervals over the range offor andfor to retrieve an and a value that maximized an assumed metric such as the AUROC. Then, for each of the five target networks, we averaged the optimal and values of the other four networks and used these averaged values as the evaluation parameters to re-compute the metric on the target network. Finally, we reported the metric corresponding to the evaluation parameters on each of the target network as our algorithm’s computed metric.

As our first experiment, we evaluated the influence of the modulation phase on GENEREF. The evaluation was done only using the first category of configurations, i.e., one-level GENEREF trained on the three data sets. To get a proper evaluation, we compared each GENEREF run with the corresponding configuration of the GENIE3 algorithm. Figure 3 shows the the comparison of the robustness of these two algorithms. In order to evaluate the robustness, the 10 collections of data sets for each of the 5 networks were used and the corresponding score and scoreAUPR values were measured. The figure depicts an obvious improvement on every one of the 5 networks.

We also examined the robustness of our algorithm based on the number of iterations it was run. The same data sets were used. Figure 4 compares the robustness of all three categories of configurations. On each network, the first three box and whisker plots show the algorithm run on only one data set. The next three plots, depict the robustness of the two-dataset configurations of the algorithm. The last plot belongs to a run with three data sets. Although not universal, a general pattern of enhancement is observed when more data sets are added to the algorithm. Except for Network 2 with negligible improvement, all other data sets have a noticeable improvement when the third category of configurations is used in comparison to the second category. The improvement is in terms of both score and scoreAUPR.

Along with the robustness evaluation, we directly tested the influence of the number of iterations on the performance of GENEREF. Figure 5 compares the score and scoreAUPR criteria of GENEREF versus the number of iterations. Here, instead of using the aforementioned categories, all M-permutations of the three data sets were tested for the M-iteration algorithm. This eliminates the effect of the order of data sets and gives an expectation of improvement based on M. An exceptionless improvement is observed as the number of iteration increses from 1 to 3.

To further evaluate our method, we also performed a Mann-Whitney test that compares four configurations of the algorithm with GENIE3 trained on the KO data set, GENIE3 trained on the MF data set, tl-GENIE3 trained on the TS data set, and dynGENIE3 trained on both the TS and MF data sets. These are the only algorithms that we managed to test directly. The results are shown in table 1, which compares the overall-score and overall-scoreAUPR metrics. As can be seen, the U-statistic marginally depend on the score metric. While GENEREF outperforms the other corresponding algorithms when using either score, the difference is more considerable when using the scoreAUPR metric. Also, based on the U-statistic, it can be concluded again that

Fig. 4: The robustness evaluation of the score and scoreAUPR metric using seven configurations of GENEREF: three 1-iteration, three 2-iteration, and one 3-iteration configurations.

modulation phase has improved the overall performance of the algorithm.

For the last experiment, we compared the performance of various algorithms. We compared our algorithm to eight other methods: SVR, E-SVR, GENIE3, BiXGBoost, NIMEFI, iRafNet, and dynGENIE3. The results can be found on table 2. To have a fair comparison, we only compared those configurations of GENEREF that utilized the same data set as the algorithms in comparison. The competing algorithms use one of the four groups of data sets. SVR, E-SVR, TIGRESS, NIMEFI and TIGRESS only use the MF data set. tl-GENIE3 works only based on the TS data set. iRafNet trains on the TS data set based on the information obtained from the KO data set. dynGENIE3 merges the MF and TS data sets and then performs the inference. None of the competing algorithms were run on all of the three data sets. Although the KO and MF data sets can hypothetically be combined to boost the performance, neither of the original papers have reported results based on a combination of these two data sets either. Therefore, the only algorithm using the three data sets and reported on the last group of data sets, is GENEREF.

Two metrics were used for the comparisons: score and scoreAUPR. With any of the two metrics, GENEREF has the highest hit-count in every group of data sets. It also has the highest overall score and scoreAUPR among all the algorithms. When only one data set is used (MR or TS) GENEREF outperforms the competing algorithms on most of the networks. Even though it only applies a single improvement on GENEIE3 or tl-GENIE3, it turns out that this improvement has a drastic effect. When the information from the MF and TS data sets is used GENEREF outperforms dynGENIE3 in all cases. iRafNet is the only algorithm that has higher values in more than one of the networks when the KO and TS have been used. It also has the higher overall performance in terms of both score and scoreAUPR. However, after the third data set (the MF data set) comes into play, the overall performance of GENEREF goes higher than that of iRafNet.

3.5 Performance evaluation on the DREAM5 networks

Our cross-validation tests on the DREAM4 networks suggest that “independent networks of the same type and size [10]” lead to the same range of the optimal and parameters. In this subsection, we will bring visual and systematic investigations of the optimal parameters of GENEREF. We start by evaluating the effect of three network characteristics of GRNs on the parameters: the type of the network (here, by comparing the variation of the parameters in E. coli versus S. cerevisiae – a prokaryote specimen versus an eukaryote one), the size of the network, and also the number

Fig. 5: Average score and scoreAUPR on the DREAM4 networks based on the number of iterations in GENEREF. For iterations, the average was taken on all M-permutations of the three data sets (the KO, TS, and MF data sets). All configurations with repetitive data sets have been dismissed.

of iterations (number of data sets). After that, we proceed to evaluate GENEREF on the DREAM5 data sets and compare it to the competitive algorithms.

We used the ten sub-networks that had been extracted from the DREAM5 original networks as the gold-standard networks of the first set of experiments. We presented the effect of the characteristics on the AUROC metric in figure 6. This figure compares the GENEREF algorithm performance on each of the ten networks based on the number of iterations. Since the possible ordering of data sets for each number of iterations is arbitrary, we averaged all the permutations of the data sets that would constitute that number of iterations. As we are only interested in the optimal range of parameters and not their corresponding metric values, we scaled all the plots to the same [min, max] range to make the plots comparable. Each plot is the scaled AUROC value over logaritmically uniform parameters in the range of . Figure 7 shows the performance of the same networks when the AUPR metric is used.

Based on an intuitive visual analysis of figures 6 and 7, the effect of the size of the network on the range of the AUROC- and AUPR-maximizing and values is not significant. The parameters approach their optimal values uniformly, with the uniformity becoming more robust as the size of network increases; there is no clear-cut difference be-

tween the plots of E. coli 320 and 640, or S. cerevisiae 800 and 1600. Neither does the addition of iterations influence the pattern drastically. The type of the network does not present a significant effect either; the difference between the patterns of either of the E. coli and S. cerevisiae is not considerable. Overall, the AUROC-maximizing parameter values ranged from to , and the AUPR-maximizing parameter values ranged from to confidence interval), with no visually evident effect on any of the mentioned network characteristics.

Besides the visual analysis, we analyzed the effects of the characteristics systematically. We will discuss these effects in terms of the optimal selectivity of the algorithm, and define the optimal selectivity of GENEREF as the value of the selectivity function corresponding to the and values that maximize an assumed metric – typically score or scoreAUPR.

We expect the size of the network and number of iterations to have an effect on the optimal value of the parameters. It is known that the average number of regulators per gene (denoted by K) remains roughly constant as the number of genes increases [36]. In other words, the edge density of the network (E) decreases by a factor of g:

TABLE 1: The U statistic values derived from the Mann-Whitney test for comparison of GENEREF to the base algorithms. Number on a cell shows how probably the GENEREF algorithm performs better than the competing algorithm.

TABLE 2: Comparison of the performance of algorithms on the DREAM4 networks using various scoring metrics. The best results in each data set group are highlighted in boldface. The best total results are marked with an asterisk.

This raises the hypothesis that the parameters should converge toward a higher optimal selectivity when networks become larger. Moreover, the algorithm is expected to have accumulated more information at higher iterations. If the accumulated information converge toward suggesting the same set of links, the optimal selectivity of the algorithm is expected to increase over iterations. On the other hand, if each new data set suggests novel links, the optimal selectivity may decrease. In general, we expect the optimal selectivity to be affected by the number of iterations too. We tested these hypotheses by comparing changes in the size of the network to the changes of optimal selectivity in terms of AUROC and AUPR. To this end, we performed a p-value hypothesis testings on the group-wise correlation coefficients of three variables: the logarithm of the size of the network (log(g)), the number of iterations (M), and the optimal selectivity of the network (optimal SelKCDF ). We brought a rigorous explanation of these tests in appendix B. In sum, both of these tests affirmed that each of the network size and number of iterations have an effect on the optimal selectivity of the regularization mapping (

and p < 0.01, respectively), although contrary to our expectation, the size of the network had a negative impact on the optimal selectivity when the AUPR metric was used.

Table 3 reports the scores of various algorithms on the DREAM5 networks in comparison to competing algorithms. Here, we used the original learning data from the DREAM5 context as our data sets. Unlike the DREAM4 contest, DREAM5 has only one file consisting all of the expression levels of all of the experiments. To get the standard data sets, we dissociated each experiment thoroughly and formed two separate data sets; one by pooling all of the steady state profiles (MF/KO) and the other one by embedding all of the time-series profiles (TS). We performed a two-iteration

networks [32]. However, the sparsity does not explain the consistency of the optimal range of parameters among the network of different sizes, because the edge density of GRNs decreases by a factor of g (number of genes) as the number of genes increases [32]. This consistency may be explainable by the fact that the average number of regulatory genes for each gene (K) is nearly identical across networks of different type and size [32]: At the modulation phase, our algorithm homogenizes the confidence matrix generated on the most recent iteration. This ensures that the regulatory genes of each gene are independently regularized at the regularization phase and will have an independent effect on the RRFs of the next iteration. As the size of the network increases, the average number of relevant regulatory genes for each gene (K) remains roughly constant, and therefore the proportion of the total relevant links () to the total potential links () remains constant:

The constant average in degree across the networks suggests that

The optimal mapping function is expected to change shape and become more selective for networks of the greater size, because the edge density of GRNs decreases as the number of edges increase [32]. We should mention that we could not increase the parameter much beyond the selected range. Although the AUROC starts decreasing at around , beyond this limit point, the Kumaraswamy CDF function becomes so steep that our implementation fails to do the precise calculations of the confidence matrices and encounters artifacts. To keep the results unaffected by these artifacts, we confined the parameter below .

Figures 6 and 7 are also indicative of , and scoreAUPR optimal parameters as these are monotonic functions of AUROC and AUPR.

4 DISCUSSION

In this paper, we introduced GENEREF, an iterative algorithm that can derive regulatory information between genes from various types of data sets. The evaluation was done on the synthetic data set networks and although GENEREF does not systemically acquire the best performance when compared to other competitive algorithms, it has the highest overall scores when all the three data sets are used. Therefore, we conclude that GENEREF performs on par with other algorithms that utilize multiple data sets. On the one hand, the diversity of existing data sets stems from the ongoing process of generating data from different kinds of experiments in laboratories around the world. Because of the widespread prevalence in production min

Fig. 6: The AUROC metric as a function of the and parameters. Five Escherichia coli and five Saccharomyces cerevisiae sub-networks were tested. Each color plot is scaled over the same range.

Fig. 7: The AUPR metric as a function of the and parameters. Five Escherichia coli and Five Saccharomyces cerevisiae sub-networks were tested. Each color plot is scaled over the same range.

Fig. 6: The standardized AUROC metric as a function of the and parameters. Five E. coli and five S. cerevisiae sub-networks were tested. Each color plot is scaled over the same range.

GENREF on these two data sets.

Similar to DREAM4, we used cross-validation to obtain the optimal parameters for each of the DREAM5 networks. To get the expected optimal and parameters for the

E. coli network, we performed a linear regression on the S. cerevisiae sub-networks (including the original network itself) that maximized the considered metric in terms of the number of genes. We considered size of the network (g) as

Fig. 6: The AUROC metric as a function of the and parameters. Five Escherichia coli and five Saccharomyces cerevisiae sub-networks were tested. Each color plot is scaled over the same range.

Fig. 7: The AUPR metric as a function of the and parameters. Five Escherichia coli and Five Saccharomyces cerevisiae sub-networks were tested. Each color plot is scaled over the same range.

networks [32]. However, the sparsity does not explain the consistency of the optimal range of parameters among the network of different sizes, because the edge density of GRNs decreases by a factor of g (number of genes) as the number of genes increases [32]. This consistency may be explainable by the fact that the average number of regulatory genes for each gene (K) is nearly identical across networks of different type and size [32]: At the modulation phase, our algorithm homogenizes the confidence matrix generated on the most recent iteration. This ensures that the regulatory genes of each gene are independently regularized at the regularization phase and will have an independent effect on the RRFs of the next iteration. As the size of the network increases, the average number of relevant regulatory genes for each gene (K) remains roughly constant, and therefore the proportion of the total relevant links () to the total potential links () remains constant:

The constant average in degree across the networks suggests that

The optimal mapping function is expected to change shape and become more selective for networks of the greater size, because the edge density of GRNs decreases as the number of edges increase [32]. We should mention that we could not increase the parameter much beyond the selected range. Although the AUROC starts decreasing at around , beyond this limit point, the Kumaraswamy CDF function becomes so steep that our implementation fails to do the precise calculations of the confidence matrices and encounters artifacts. To keep the results unaffected by these artifacts, we confined the parameter below .

Figures 6 and 7 are also indicative of , and scoreAUPR optimal parameters as these are monotonic functions of AUROC and AUPR.

4 DISCUSSION

In this paper, we introduced GENEREF, an iterative algorithm that can derive regulatory information between genes from various types of data sets. The evaluation was done on the synthetic data set networks and although GENEREF does not systemically acquire the best performance when compared to other competitive algorithms, it has the highest overall scores when all the three data sets are used. Therefore, we conclude that GENEREF performs on par with other algorithms that utilize multiple data sets.

On the one hand, the diversity of existing data sets stems from the ongoing process of generating data from different kinds of experiments in laboratories around the world. Because of the widespread prevalence in production

Fig. 6: The AUROC metric as a function of the and parameters. Five Escherichia coli and five Saccharomyces cerevisiae sub-networks were tested. Each color plot is scaled over the same range.

Fig. 7: The AUPR metric as a function of the and parameters. Five Escherichia coli and Five Saccharomyces cerevisiae sub-networks were tested. Each color plot is scaled over the same range.

Fig. 7: The standardized AUPR metric as a function of the and parameters. Five E. coli and five S. cerevisiae sub-networks were tested. Each color plot is scaled over the same range.

the input variable and the parameters as the output variables of the regression. We also set the number of iterations to 2 (since were only two data sets) and only considered the optimal ordering of the data sets (MF/KO+TS for both

score and scoreAUPR). We did the same procedure on the E. coli networks to obtain the expected optimal parameter values of S. cerevisiae.

Again GENEREF does not systemically surpass the other

TABLE 3: Comparison of the performance of algorithms on the DREAM5 networks. The best results in each metric group are highlighted in boldface and the best total results are marked with an asterisk. We omitted SVR and E-SVR from this table due to their low performance on the smaller DREAM4 data sets. Also, TIGRESS, NIMEFI, BiXGBoost, and iRafNet are excluded because they either did not report their results on DREAM5 or used other types of data like protein-protein interactions.

algorithms. Yet, when the algorithms are fed the whole learning data, GENEREF gives the best score for the E. coli network and performs drastically better than the competitive algorithms when scoreAUPR is considered. When the MF/KO or TS data set is considered, GENEREF still improves GENIE3. However, dynGENIE3 is the best performer on the TS data set.

4 DISCUSSION

In this paper, we introduced GENEREF, an iterative algorithm that can derive regulatory information between genes from various types of data sets. The evaluation was done on the synthetic and in vitro data set networks. Although GENEREF does not systemically acquire the best performance when compared to other competitive algorithms, it has the highest overall scores when all the three DREAM4 data sets are used. Besides, GENEREF outperforms the competitive algorithms when performed on the E. coli data sets, if the score metric is used and on the S. cerevisiae, if the scoreAUPR metric is used. Therefore, we conclude that GENEREF performs on par with other algorithms that utilize multiple data sets.

On the one hand, the diversity of existing data sets stems from the ongoing process of generating data from different kinds of experiments in laboratories around the world. Because of the widespread prevalence in production of data sets, it seems necessary to have advanced multi-dataset algorithms. Because our algorithm exploits the data sets in an incremental manner, it also has the advantage that new data sets can be fed to the algorithm without the need to reutilize the previously used data sets.

On the other hand, our observations reveal that, on our synthetic data sets, as the number of data sets increased, the improvement in the performance increase of adding the next data set plummeted. We hypothesize that this reduction can have been induced by at least one of these two scenarios: Firstly, if the data sets contain redundant information, the data sets that are on the close-to-final iterations will have less new information to reveal. The other problem that can arise when dealing with linear algorithms is that the patterns learned on previous iterations can start to fade out. In our case, the previously inferred regulatory links may vanish when new data sets are learned. This problem can be more prominent with the regulatory links that are detected only through surgical interventions (e.g. single knock-out/knock-down experiments). More experiments will be needed to reveal the exact cause of the plummeting improvement phenomenon.

It is recommended that data sets that can be merged are not considered as separate learning data and are fed to the algorithm as a single data set, even if multiple times. The recommendation applies to all multi-perturbation and knock-out data sets and time series data sets with the same configuration (e.g. those with identical time gaps and profiled in similar laboratory conditions). The reason is, as it is argued in [20], in a tree node with a small number of instances, RRF is likely to select a feature not strongly relevant, and therefore performs poorly if the number of data records falls dramatically. In our tests on the DREAM5 networks, we obtained good results after coalescing all the steady-state and all of the time-series data sets, although the time gaps between the time-series data sets were not equal. However, further experiments to test the validity of this hypothesis might be required.

Although GENEREF principally outperforms both of the base algorithms, i.e. GENIE3 and GENIE3 (time-lagged), it cannot be concluded that it is the case for all GRNs. The results on the five DREAM4 networks are very datadependent. The difference in the performance can be explained as a result of various aspects of the networks that control the regulatory relationship between the genes. We determined the number of genes as one of the regulating characteristics on the performance of the algorithm. Further investigations might be needed to determine other influential characteristics impacting the performance of the algorithm. Also, to investigate more deeply the performance of GENEREF, we plan to test it on a bigger real-world data set in the future.

It should be noted that there are “combinatory methods” that take into account the results of multiple algorithms and outperform the current generation of multi-dataset GRN reconstruction algorithms as with dynGENIE3. MCZ+dynGENIE3 [9] is one of these combinatory methods that makes use of the predictions of the two algorithms – MCZ [34] and dynGENIE3 [9]. It multiplies the edge confidence matrices obtained from each of these algorithms. We did not discuss these combinatory methods throughout this paper and limited our comparisons to algorithms that generated the predictions independently. However, the usage of the predictions obtained from GENEREF in such combinatory methods can be a topic of research.

Similar to the AdaBoost algorithm [35], on the conceptual level GENEREF can be thought of as machine learning meta algorithm that can exploit various regressors into a single model. Though, we used only random forests as our regressors, other regressor algorithms, including TIGRESS and dynGENIE3, can take this place. More generally, the data obtained from any other algorithm can be used to guide GENEREF’s RRFs. Recently, methods based on deep learning and graph convolutional neural network methods have proven to be a great potential for GRN reconstruction and have taken a lot of attention [38], [39]. The combination and replacement of other regression algorithms or making use of information obtained from any of these advent approaches in the meta algorithm will be an other topic for exploration in our future work.

5 AUTHOR CONTRIBUTIONS STATEMENT

M.S. performed the experiments and analyzed the results. M.S. and M.A. reviewed the manuscript.

6 ADDITIONAL INFORMATION

Mr. Saremi declares no competing interests. Dr. AmirMazlaghani, who has supervised this thesis, has received compensation as an assistant professor at Amirkabir University of Technology for her supervision.

ACKNOWLEDGMENTS

The authors would like to thank Dr. Fatemeh Zare Mirakabad (Amirkabir University of Technology) who gave them continued support within the area of biology, during this research and the composition of this article.

TABLE 4: The correlation between AUROC and AUPR of DREAM4 size 100 and DREAM 5 networks. g: number of genes; K: average number of transcriptional regulators per gene; Correlation Coefficient: the correlation between the AUROC and AUPR values of a random solver. Each correlation coefficient value is obtained from 100,000 randomly generated networks.

TABLE 5: The correlation between AUROC and AUPR of biological networks of different sizes. The average number of transcriptional regulators per gene has been kept constant (K = 1.875).

APPENDIX A CORRELATION COEFFICIENT BETWEEN AUROC

AND AUPR

In order to have an evaluation of the dependency between the null distribution of both AUROC and AUPR, we calculated the correlation coefficient for the networks in both DREAM4 and DREAM5, as in table 4. In this table, g represents the number of genes, and K is the average number of regulators per gene. In none of these networks the AUROC and AUPR metrics demonstrate negligible linear dependency.

Furthermore, we demonstrated the effect of the network size on the correlation values in table 5. Note that the null AUROC and AUPR distributions only depend on the size number of links in the gold-standard network. We used a consistent edge density across all of these networks which we believe could represent that of real biological network. We averaged the values from table I in [36] of the following networks: two D. melanogaster networks, Sea urchin, S. cerevisiae, two S. cerevisiae networks, E. coli, and

TABLE 6: The correlation between the logarithm of the number of genes (log(g)) and the optimal selectivity of the regularization mapping (optimal SelKCDF ). The last row of each group shows Fisher’s combined p-value for all of the tests in that group.

Arabidopsis thaliana.

It can be inferred from the tables that the number of actual edges (gK) has a positive impact on the correlation between the two metrics. However, the number of potential edges () has compensated this effect. In none of the networks, the correlation coefficient value has fallen below 0.7.

APPENDIX B THE RELATION BETWEEN NETWORK FEATURES AND THE OPTIMAL SELECTIVITY

In order to test the effect of the size of the network on the - and -optimal selectivity of the regularization mapping we performed p-value hypothesis testings on the correlation coefficient between these two variables. We denoised the optimal parameters by taking the upper decile of each plot in figures 6 and 7 and averaging the corresponding and values. Then computed the correlation coefficient values between these two variables grouped by the type of the network and the number of iterations. Table 6 shows the results of the p-value tests on each of these groups.

We performed the same procedure to test the effect of the number of iterations and the optimal selectivity of the regularization mapping. Results are shown in table 7.

TABLE 7: The correlation between the number of iterations (M) and the optimal selectivity of the regularization mapping (optimal SelKCDF ). The last row of each group shows Fisher’s combined p-value for all of the tests performed in that group.

REFERENCES

[1] Barabasi, A.-L. & Oltvai, Z. N. Network biology: understanding the cell’s functional organization. Nat. reviews genetics 5, 101 (2004).

[2] Freund, Y., Schapire, R. & Abe, N. A short introduction to boosting. Journal-Japanese Soc. For Artif. Intell. 14, 1612 (1999).

[3] Irrthum, A., Wehenkel, L., Geurts, P. et al. Inferring regulatory networks from expression data using tree-based methods. PloS one 5, e12776 (2010).

[4] Guo, Shun and Jiang, Qingshan and Chen, Lifei and Guo, Donghui Gene regulatory network inference using PLS-based methods. BMC bioinformatics 17, 1, 545 (Springer, 2016).

[5] Xiong, Jie, Zhou, Tong Gene regulatory network inference from multifactorial perturbation data using both regression and correlation analyses. PloS one 7, e43819 (2012).

[6] Bar-Joseph, Z., Gitter, A. & Simon, I. Studying and modelling dynamic biological processes using time-series gene expression data. Nat. Rev. Genet. 13, 552 (2012).

[7] Huynh-Thu, V. A. & Sanguinetti, G. Combining tree-based and dynamical systems for the inference of gene regulatory networks. Bioinformatics 31, 1614–1622 (2015).

[8] Petralia, F., Wang, P., Yang, J. & Tu, Z. Integrative random forest for gene regulatory network inference. Bioinformatics 31, i197–i205 (2015).

[9] Geurts, P. et al. dyngenie3: dynamical genie3 for the inference of gene networks from time series expression data. Sci. reports 8, 3384 (2018).

[10] Stolovitzky, G. et al. Dialogue on reverse-engineering assessment and methods: the DREAM of high-throughput pathway inference. Annals New York Acad. Sci. 1115, 1–22 (2007).

[11] Marbach, D. et al. Wisdom of crowds for robust gene network inference. Nat. methods 9, 796 (2012).

[12] Haury, A.-C., Mordelet, F., Vera-Licona, P. & Vert, J.-P. Tigress: trustful inference of gene regulation using stability selection. BMC systems biology 6, 145 (2012).

[13] Zheng, R. et al. Bixgboost: a scalable, flexible boosting-based method for reconstructing gene regulatory networks. Bioinformatics 35, 1893–1900 (2018).

[14] Grimaldi, M. et al. RegnANN: reverse engineering gene networks using artificial neural networks. PloS one 6, e28646 (2011).

[15] Deng, H. & Runger, G. Feature selection via regularized trees. In The 2012 International Joint Conference on Neural Networks (IJCNN), 1–8 (IEEE, 2012).

[16] Breiman, L. Classification and regression trees (Routledge, 2017).

[17] Thieffry, D., Huerta, A. M., Pérez-Rueda, E. & Collado-Vides, J. From specific gene regulation to genomic networks: a global analysis of transcriptional regulation in escherichia coli. Bioessays 20, 433–440 (1998).

[18] Shen-Orr, S. S., Milo, R., Mangan, S. & Alon, U. Network motifs in the transcriptional regulation network of escherichia coli. Nat. genetics 31, 64 (2002).

[19] Guelzim, N., Bottani, S., Bourgine, P. & Képès, F. Topological and causal structure of the yeast transcriptional regulatory network. Nat. genetics 31, 60 (2002).

[20] Deng, H. & Runger, G. Gene selection with guided regularized random forest. Pattern Recognit. 46, 3483–3489 (2013).

[21] Michailidis, G. & d’Alché Buc, F. Autoregressive models for gene regulatory network inference: Sparsity, stability and causality issues. Math. biosciences 246, 326–334 (2013).

[22] Guelzim, N., Bottani, S., Bourgine, P. & Képès, F. Topological and causal structure of the yeast transcriptional regulatory network. Nat. genetics 31, 60 (2002).

[23] Nair, A., Chetty, M. & Wangikar, P. Improving gene regulatory network inference using network topology information. Mol. BioSystems 11, 2449–2463 (2015).

[24] Roy, S., Shah, V. K. & Das, S. K. Characterization of e. coli gene regulatory network and its topological enhancement by edge rewiring. In BICT, 391–398 (2015).

[25] Lawson, C. L. et al. Catabolite activator protein: Dna binding and transcription activation. Curr. opinion structural biology 14, 10–20 (2004).

[26] Kumaraswamy, P. A generalized probability density function for double-bounded random processes. J. Hydrol. 46, 79–88 (1980).

[27] Marbach, D., Schaffter, T., Floreano, D., Prill, R. J. & Stolovitzky, G. The dream4 in-silico network challenge. Draft. version 0.3 (2009).

[28] Schaffter, T., Marbach, D. & Floreano, D. Genenetweaver: in sil- ico benchmark generation and performance profiling of network inference methods. Bioinformatics 27, 2263–2270 (2011).

[29] Stolovitzky, G., Prill, R. J. & Califano, A. Lessons from the dream2 challenges: a community effort to assess biological network inference. Annals New York Acad. Sci. 1158, 159–195 (2009).

[30] Saito, T. & Rehmsmeier, M. The precision-recall plot is more informative than the roc plot when evaluating binary classifiers on imbalanced datasets. PloS one 10, e0118432 (2015).

[31] Chen, T. He, T. et al. Xgboost: A scalable tree boosting system. Proc. 22nd acm sigkdd international conference on knowledge discovery data mining 785–794 (2015).

[32] Drucker, H., Burges, C. J., Kaufman, L., Smola, A. J. & Vapnik, V. Support vector regression machines. In Advances in neural information processing systems, 155–161 (1997).

[33] Ruyssinck, J. et al. Nimefi: gene regulatory network inference using multiple ensemble feature importance algorithms. PLoS One 9, e92709 (2014).

[34] Greenfield, A., Madar, A. et al. DREAM4: Combining genetic and dynamic information to identify biological networks and dynamical models. PloS one 5, e13397 (2010).

[35] Freund, Y., Schapire, R. E. et al. Experiments with a new boosting algorithm. In icml, vol. 96, 148–156 (Citeseer, 1996).

[36] Leclerc, R. D. Survival of the sparsest: robust gene networks are parsimonious. Mol. systems biology 4, 1 (2008).

[37] Fan, A., Wang, H. et al. Inferring large-scale gene regulatory networks using a randomized algorithm based on singular value decomposition. IEEE/ACM transactions on computational biology bioinformatics 16, 6 (2018).

[38] Zitnik, M., Agrawal, M., Leskovec, J. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics 34, 13, i457–i466 (2018).

[39] Li, Y., Huang, C. et al. Deep learning in bioinformatics: Introduction, application, and perspective in the big data era. Methods 166, 4–21 (2019).

Mehrzad Saremi is an M.Sc. graduate from Amirkabir University of Technology in Artificial Intelligence. He received his M.Sc. in 2019. His research interests revolve around Bioinformatics and Causal Graph Models.

Maryam Amirmazlaghani received the Ph.D. degree in electrical engineering from the Amirkabir University of Technology, Tehran, in 2009. She is currently a Faculty Member with the Department of Computer Engineering and Information Technology, Amirkabir University of Technology.