Causal discovery of linear non-Gaussian acyclic models in the presence of latent confounders
2020·arXiv
Abstract

Causal discovery from data affected by latent confounders is an important and difficult challenge. Causal functional model-based approaches have not been used to present variables whose relationships are affected by latent confounders, while some constraint-based methods can present them. This paper proposes a causal functional model-based method called repetitive causal discovery (RCD) to discover the causal structure of observed variables affected by latent confounders. RCD repeats inferring the causal directions between a small number of observed variables and determines whether the relationships are affected by latent confounders. RCD finally produces a causal graph where a bi-directed arrow indicates the pair of variables that have the same latent confounders, and a directed arrow indicates the causal direction of a pair of variables that are not affected by the same latent confounder. The results of experimental validation using simulated data and real-world data confirmed that RCD is effective in identifying latent confounders and causal directions between observed variables.

Keywords: Causal discovery, Causal structures, Latent confounders


Many scientific questions aim to find the causal relationships between variables rather than only find the correlations. While the most effective measure for identifying the causal relationships is controlled experimentation, such experiments are often too costly, unethical, or technically impossible to conduct. Therefore, the development of methods to identify causal relationships from observational data is important.

Many algorithms that have been developed for constructing causal graphs assume that there are no latent confounders (e.g., PC [1], GES [2], and LiNGAM [3]). They do not work effectively if this assumption is not satisfied. Conversely, FCI [4] is an algorithm that presents the pairs of variables that have latent confounders. However, since FCI infers causal relations on the basis of the conditional independence in the joint distribution, it cannot distinguish between the two graphs that entail exactly the same sets of conditional independence. Therefore, to understand the causal relationships of variables where latent confounders exist, we need a new method that satisfies the following criteria: (1) the method should accurately (without being biased by latent confounders) identify the causal directions between the observed variables that are not affected by latent confounders, and (2) it should present variables whose relationships are affected by latent confounders.

Compared to the constraint-based causal discovery methods (e.g., PC [1] and FCI [4]), causal functional model-based approaches [5, 6, 7, 8, 9] can identify the entire causal model under proper assumptions. They represent an effect Y as a function of its direct cause X. They infer that variable X is the cause of variable Y when X is independent of the residual obtained by the regression of Y on X but not independent of Y.

Most of the existing methods based on causal functional models identify the causal structure of multiple observed variables that form a directed acyclic graph (DAG) under the assumption that there is no latent confounder. They assume that the data generation model is acyclic, and that the external effects of all the observed variables are mutually independent. Such models are called additive noise models (ANMs). Their methods discover the causal structures by the following two steps: (1) identifying the causal order of variables and (2) eliminating unnecessary edges. DirectLiNGAM [8], which is a variant of LiNGAM [3], performs regression and independence testing to identify the causal order of multiple variables. DirectLiNGAM finds a root (a variable that is not affected by other variables) by performing regression and independence testing of each pair of variables. If a variable is exogenous to the other variables, then it is regarded as a root. Thereafter, DirectLiNGAM removes the effect of the root from the other variables and finds the next root in the remaining variables. DirectLiNGAM determines the causal order of variables according to the order of identified roots. RESIT [9], a method extended from Mooij et al. [6] identifies the causal order of variables in a similar manner by performing an iterative procedure. In each step, RESIT finds a sink (a variable that is not a cause of the other variables). A variable is regarded as a sink when it is endogenous to the other variables. RESIT disregards the identified sinks and finds the next sink in each step. Thus, RESIT finds a causal order of variables. DirectLiNGAM and RESIT then construct a complete DAG, in which each variable pair is connected with the directed edge based on the identified causal order. Thereafter, DirectLiNGAM eliminates unnecessary edges using AdaptiveLasso [10]. 
RESIT eliminates each edge X → Y if X is independent of the residual obtained by the regression of Y on Z \ {X}, where Z is the set of causes of Y in the complete DAG.

Causal functional model-based methods effectively discover the causal structures of observed variables generated by an additive noise model when there is no latent confounder. However, the results obtained by these methods are likely disturbed when there are latent confounders because they cannot find a causal function between variables affected by the same latent confounders. Furthermore, the causal functional model-based approaches have not been used to show variables that are affected by the same latent confounder, as FCI does.

This paper proposes a causal functional model-based method called repetitive causal discovery (RCD) to discover the causal structures of the observed variables that are affected by latent confounders. RCD is aimed at producing causal graphs where a bi-directed arrow indicates the pair of variables that have the same latent confounders, and a directed arrow indicates the direct causal direction between two variables that do not have the same latent confounder. It assumes that the data generation model is linear and acyclic, and that external influences are non-Gaussian. Many causal functional model-based approaches discover causal relations by identifying the causal order of variables and eliminating unnecessary edges. However, RCD discovers the relationships by finding the direct or indirect causes (ancestors) of each variable, distinguishing direct causes (parents) from indirect causes, and identifying the pairs of variables that have the same latent confounders.

Our contributions can be summarized as follows:

• We developed a causal functional model-based method that can present

image

• The method can also identify the causal direction of variable pairs that

image

• The results of experimental validation using simulated data and real-world

image

A briefer version of this work without detailed proofs can be found in [11].

2.1. Data generation process

This study aims to analyze the causal relations of observed variables confounded by unobserved variables. We assume that the relationship between each pair of (observed or unobserved) variables is linear, and that the external influence of each (observed or unobserved) variable is non-Gaussian. In addition, we assume that (observed or unobserved) data are generated from a process represented graphically by a directed acyclic graph (DAG). The generation model is formulated using Equation 1.

$$x_i = \sum_{j \neq i} b_{ij} x_j + \sum_{k} \lambda_{ik} f_k + e_i \qquad (1)$$

where x_i denotes an observed variable, b_ij is the causal strength from x_j to x_i, f_k denotes a latent confounder, λ_ik denotes the causal strength from f_k to x_i, and e_i is an external effect. The external effects e_i and the latent confounders f_k are assumed to follow non-Gaussian continuous-valued distributions with zero mean and nonzero variance and are mutually independent. The zero/nonzero pattern of b_ij and λ_ik corresponds to the absence/existence pattern of directed edges. Without loss of generality [12], latent confounders f_k are assumed to be mutually independent. In matrix form, the model is described by Equation 2:

$$\mathbf{x} = B\mathbf{x} + \Lambda\mathbf{f} + \mathbf{e} \qquad (2)$$

where the connection strength matrices B and Λ collect b_ij and λ_ik, and the vectors x, f, and e collect x_i, f_k, and e_i.
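As a minimal sketch of the matrix form, assuming NumPy and a hypothetical three-variable example (the matrices `B` and `Lam` and the uniform noise below are illustrative choices, not values from the paper), samples can be drawn by solving x = (I − B)^{-1}(Λf + e):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000

# Hypothetical strengths: x0 -> x1 -> x2, plus one latent confounder f0 of x0 and x2.
B = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [0.0, -0.6, 0.0]])
Lam = np.array([[0.7], [0.0], [0.9]])

# Zero-mean, non-Gaussian (uniform), mutually independent external effects and confounder.
e = rng.uniform(-1.0, 1.0, size=(3, n_samples))
f = rng.uniform(-1.0, 1.0, size=(1, n_samples))

# x = Bx + Lam f + e  is equivalent to  x = (I - B)^{-1} (Lam f + e).
x = np.linalg.solve(np.eye(3) - B, Lam @ f + e)
```

Because B corresponds to a DAG, I − B is invertible, so the linear solve is always well defined.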

2.2. Research goals

This study has two goals. First, we extract the pairs of observed variables that are affected by the same latent confounders. This is formulated by a matrix C whose element c_ij is defined by Equation 3:

image

Element c_ij equals 0 when there is no latent confounder affecting variables x_i and x_j. Element c_ij equals 1 when variables x_i and x_j are affected by the same latent confounders.

Figure 1: (a) Data generation model (f_1 and f_2 are latent confounders). (b) Causal graph that RCD produces. A bi-directed arrow indicates that two variables are affected by the same latent confounders.

The second goal is to estimate the absence/existence of the causal relations between the observed variables that do not have the same latent confounder. This is defined by a matrix P whose element p_ij is expressed by Equation 4:

image

Here, p_ij = 0 when c_ij = 1 because we do not aim to identify the causal direction between the observed variables that are affected by the same latent confounders.

Finally, RCD produces a causal graph where a bi-directed arrow indicates the pair of variables that have the same latent confounders, and a directed arrow indicates the causal direction of a pair of variables that are not affected by the same latent confounder. For example, given the data generation model shown in Figure 1-(a), our final goal is to draw the causal diagram shown in Figure 1-(b), where variables f_1 and f_2 are latent confounders, and variables A–H are observed variables.

3.1. The framework

RCD involves three steps: (1) It extracts a set of ancestors of each variable. An ancestor is a direct or indirect cause. In this paper, M_i denotes the set of ancestors of x_i, and M denotes the sequence of these sets. Each M_i is initialized as M_i = ∅. RCD repeats the inference of causal directions between variables and updates M. When inferring the causal directions between observed variables, RCD removes the effect of the already identified common ancestors. The causal direction between variables x_i and x_j can be identified when the set of identified common causes (i.e., M_i ∩ M_j) satisfies the back-door criterion [13, 14] relative to x_i and x_j. The repetition of causal inference is stopped when M no longer changes. (2) RCD extracts parents (direct causes) from M. When x_j is an ancestor but not a parent of x_i, the causal effect of x_j on x_i is mediated through M_i \ {x_j}. RCD distinguishes direct causes from indirect causes by inferring conditional independence. (3) RCD finds the pairs of variables that are affected by the same latent confounders by extracting the pairs of variables that remain correlated but whose causal direction is not identified.

3.2. Finding ancestors of each variable

RCD repeats the inference of causal directions between a given number of variables to extract the ancestors of each observed variable. We introduce Lemmas 1 and 2, by which the ancestors of each variable can be identified when there is no latent confounder. Then, we extend them to Lemma 3, by which RCD extracts the ancestors of each observed variable when latent confounders exist. We first quote the Darmois-Skitovitch theorem (Theorem 1), proved in [15, 16], because it is used to prove the lemmas.

Theorem 1. Define two random variables y_1 and y_2 as linear combinations of independent random variables s_i (i = 1, ..., q): y_1 = Σ_{i=1}^{q} α_i s_i and y_2 = Σ_{i=1}^{q} β_i s_i. Then, if y_1 and y_2 are independent, all variables s_j for which α_j β_j ≠ 0 are Gaussian.

Lemma 1. Let r_i^(j) denote the residual obtained by the linear regression of x_i on x_j, and let r_j^(i) denote the residual obtained by the linear regression of x_j on x_i. The causal relation between variables x_i and x_j is determined as follows: (1) If x_i and x_j are not linearly correlated, then there is no causal effect between x_i and x_j. (2) If x_i and x_j are linearly correlated and x_j is independent of residual r_i^(j), then x_j is an ancestor of x_i. (3) If x_i and x_j are linearly correlated, x_j is dependent on r_i^(j), and x_i is dependent on r_j^(i), then x_i and x_j have a common ancestor. (4) There is no case in which x_i and x_j are linearly correlated, x_j is independent of r_i^(j), and x_i is independent of r_j^(i).

Proof. The causal relationship between two variables x_i and x_j can be classified into the following four cases: (Case 1) There is no common cause of the two variables, and there is no causal effect between them; (Case 2) There is no common cause of the two variables, and one variable is a cause of the other variable; (Case 3) There are common causes of the two variables, and there is no causal effect between them; (Case 4) There are common causes of the two variables, and one variable is a cause of the other variable. Cases 1, 2, 3, and 4 are modeled by Equations 5, 6, 7, and 8, respectively:

$$x_i = e_i, \quad x_j = e_j \qquad (5)$$
$$x_i = b_{ij} x_j + e_i, \quad x_j = e_j \qquad (6)$$
$$x_i = c_i + e_i, \quad x_j = c_j + e_j \qquad (7)$$
$$x_i = b_{ij} x_j + c_i + e_i, \quad x_j = c_j + e_j \qquad (8)$$

where e_i and e_j are the non-Gaussian external effects that are mutually independent, b_ij is the non-zero causal strength from x_j to x_i, and c_i and c_j are the linear combinations of the common causes of x_i and x_j. The linear combinations of the common causes c_i and c_j are linearly correlated and are independent of e_i and e_j. We investigate the following three points for each case: (1) whether x_i and x_j are linearly correlated, (2) whether x_j is independent of r_i^(j), and (3) whether x_i is independent of r_j^(i).

Case 1: Variables x_i and x_j are mutually independent because of Equation 5. Therefore, x_i and x_j are not linearly correlated. Let α denote the coefficient of x_j when x_i is regressed on x_j. Since x_i and x_j are mutually independent, α = 0. Then,

$$r_i^{(j)} = x_i - \alpha x_j = x_i \qquad (9)$$

Therefore, x_j is independent of r_i^(j) because x_i and x_j are mutually independent. Similarly, x_i is independent of r_j^(i).

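Case 2: Variables x_i and x_j are linearly correlated because x_i = b_ij x_j + e_i. Let α denote the coefficient of x_j when x_i is regressed on x_j. Then, α = b_ij because b_ij x_j is the only term on the right side of equation x_i = b_ij x_j + e_i that covaries with x_j. Then, we have r_i^(j):

$$r_i^{(j)} = x_i - \alpha x_j = b_{ij} x_j + e_i - b_{ij} x_j = e_i \qquad (10)$$

Then, x_j is independent of r_i^(j) because x_j is independent of e_i. Let β denote the coefficient of x_i when x_j is regressed on x_i. Since x_i and x_j are linearly correlated, β ≠ 0. Then, we have r_j^(i):

$$r_j^{(i)} = x_j - \beta x_i = (1 - \beta b_{ij}) x_j - \beta e_i \qquad (11)$$

Then, x_i is not independent of r_j^(i) by Theorem 1, because the non-Gaussian e_i appears with a nonzero coefficient both in x_i and, through the term −β e_i in Equation 11, in r_j^(i).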

Case 3: Since c_i and c_j are linearly correlated, x_i and x_j are linearly correlated. Let α denote the coefficient of x_j when x_i is regressed on x_j. Since x_i and x_j are linearly correlated, α ≠ 0. Then, we have r_i^(j):

$$r_i^{(j)} = x_i - \alpha x_j = (c_i - \alpha c_j) + e_i - \alpha e_j \qquad (12)$$

Then, x_j is not independent of r_i^(j) because of the term −α e_j in Equation 12 and Theorem 1. Similarly, x_i is not independent of r_j^(i).

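Case 4: Since c_i and c_j are linearly correlated, x_i and x_j are linearly correlated. Let α denote the coefficient of x_j when x_i is regressed on x_j. Then, α ≠ b_ij because x_j covaries with both terms b_ij x_j and c_i on the right side of equation x_i = b_ij x_j + c_i + e_i. We have r_i^(j):

$$r_i^{(j)} = x_i - \alpha x_j = (b_{ij} - \alpha) c_j + c_i + (b_{ij} - \alpha) e_j + e_i \qquad (13)$$

Then, x_j is not independent of r_i^(j) because of the term (b_ij − α) e_j in Equation 13 and Theorem 1. Let β denote the coefficient of x_i when x_j is regressed on x_i. Since x_i and x_j are linearly correlated, β ≠ 0. Then, we have r_j^(i):

$$r_j^{(i)} = x_j - \beta x_i = (1 - \beta b_{ij})(c_j + e_j) - \beta c_i - \beta e_i \qquad (14)$$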

Then, x_i is not independent of r_j^(i) because of the term −β e_i in Equation 14 and Theorem 1.

These cases can be summarized as follows: (Case 1) x_i and x_j are not linearly correlated; (Case 2) x_i and x_j are linearly correlated, x_j is independent of r_i^(j), and x_i is not independent of r_j^(i) when the causal direction is x_i ← x_j; (Cases 3 and 4) x_i and x_j are linearly correlated, x_j is not independent of r_i^(j), and x_i is not independent of r_j^(i).

Lemma 1-(1) assumes that x_i and x_j are not linearly correlated. This assumption only corresponds to Case 1. Therefore, there is no causal effect between x_i and x_j. Lemma 1-(2) assumes that x_i and x_j are linearly correlated, and x_j is independent of r_i^(j). This assumption only corresponds to Case 2. Therefore, x_j is an ancestor of x_i. Lemma 1-(3) assumes that x_i and x_j are linearly correlated, x_j is not independent of r_i^(j), and x_i is not independent of r_j^(i). This corresponds to Case 3 or Case 4. Therefore, x_i and x_j have common ancestors. As stated in Lemma 1-(4), there is no case among Cases 1–4 where x_i and x_j are linearly correlated, x_j is independent of r_i^(j), and x_i is independent of r_j^(i).
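The asymmetry in Case 2 can be illustrated numerically. The following is a minimal sketch, assuming NumPy, uniform external effects, and a biased HSIC estimate with Gaussian kernels and median-heuristic bandwidths as the dependence measure (the helper names `residual` and `hsic` are hypothetical, and the paper's procedure uses HSIC p-values rather than raw statistics):

```python
import numpy as np

def residual(y, x):
    """Residual of the least-squares regression of y on x (both 1-D arrays)."""
    xc, yc = x - x.mean(), y - y.mean()
    return yc - (xc @ yc) / (xc @ xc) * xc

def hsic(a, b):
    """Biased HSIC estimate with Gaussian kernels (median-heuristic bandwidth)."""
    def centered_gram(v):
        d2 = (v[:, None] - v[None, :]) ** 2
        K = np.exp(-d2 / (2.0 * np.median(d2[d2 > 0])))
        # Double centering, i.e. H K H with H = I - 11^T / n.
        return K - K.mean(axis=0) - K.mean(axis=1)[:, None] + K.mean()
    n = len(a)
    return np.sum(centered_gram(a) * centered_gram(b)) / n**2

rng = np.random.default_rng(0)
n = 1000
x_j = rng.uniform(-1, 1, n)
x_i = 1.0 * x_j + rng.uniform(-1, 1, n)       # Case 2: x_j -> x_i with b_ij = 1

dep_forward = hsic(x_j, residual(x_i, x_j))   # x_j versus r_i^(j)
dep_reverse = hsic(x_i, residual(x_j, x_i))   # x_i versus r_j^(i)
```

Under these assumptions, `dep_forward` should be the smaller of the two values, which is the pattern Lemma 1-(2) relies on to conclude that x_j is an ancestor of x_i.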

It is necessary to remove the effect of common causes to infer the causal directions between variables. When the set of the identified common causes of variables x_i and x_j satisfies the back-door criterion, the causal direction between x_i and x_j can be identified. The back-door criterion [13, 14] is defined as follows:

Definition 1. A set of variables Z satisfies the back-door criterion relative to an ordered pair of variables (x_i, x_j) in a DAG G if no node in Z is a descendant of x_i, and Z blocks every path between x_i and x_j that contains an arrow into x_i.

Lemma 1 is generalized to Lemma 2 to incorporate the process of removing the effects of the identified common causes. Lemma 2 can also be used to determine whether the identified common causes are sufficient to detect the causal direction between the two variables.

Lemma 2. Let H_ij denote the set of common ancestors of x_i and x_j. Let y_i and y_j denote the residuals when x_i and x_j are regressed on H_ij, respectively. Let r_i^(j) and r_j^(i) denote the residuals obtained by the linear regression of y_i on y_j and of y_j on y_i, respectively. The causality and the existence of the confounders are determined by the following criteria: (1) If y_i and y_j are not linearly correlated, then there is no causal effect between x_i and x_j. (2) If y_i and y_j are linearly correlated and y_j is independent of the residual r_i^(j), then x_j is an ancestor of x_i. (3) If y_i and y_j are linearly correlated, y_j is dependent on r_i^(j), and y_i is dependent on r_j^(i), then x_i and x_j have a common ancestor that is not in H_ij, and H_ij does not satisfy the back-door criterion relative to (x_i, x_j) or (x_j, x_i). (4) There is no case in which y_i and y_j are linearly correlated, y_j is independent of r_i^(j), and y_i is independent of r_j^(i).

Figure 2: (a) Variables A, B, and C are the causes of variable D, and they have a common cause, f_1. (b) A and B are the causes of D, but C is not.

Proof. When Lemma 1 is applied to y_i and y_j, Lemma 2 is derived.
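The role of the residuals y_i and y_j in Lemma 2 can be sketched as follows, assuming NumPy and a hypothetical example in which a single observed variable `h` is the only common ancestor of x_i and x_j and there is no causal effect between them (the helper `residualize` and all coefficients are illustrative):

```python
import numpy as np

def residualize(target, regressors):
    """Residual of the least-squares regression of target on regressors (with intercept)."""
    A = np.column_stack([np.ones(len(target)), regressors])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ coef

rng = np.random.default_rng(1)
n = 2000
h = rng.uniform(-1, 1, n)                  # observed common ancestor, H_ij = {h}
x_i = 0.9 * h + rng.uniform(-1, 1, n)
x_j = -0.7 * h + rng.uniform(-1, 1, n)     # no causal effect between x_i and x_j

y_i = residualize(x_i, h)
y_j = residualize(x_j, h)
corr_before = np.corrcoef(x_i, x_j)[0, 1]  # driven by the common ancestor
corr_after = np.corrcoef(y_i, y_j)[0, 1]   # common ancestor removed
```

In this setting the raw variables are correlated only through `h`, so the residuals should be nearly uncorrelated, which corresponds to criterion (1) of Lemma 2.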

Next, we consider the case that there are latent confounders. In Lemma 2, the direction between two variables is inferred by regression and independence tests. However, if there are two paths from latent confounder f_k to x_i, and x_j is only on one of the paths, then M_i ∩ M_j cannot satisfy the back-door criterion. For example, in Figure 2-(a), variables A, B, and C are the causes of variable D, and the causes are also affected by the same latent confounder f_1. The causal direction between A and D cannot be inferred only by inferring the causality between them because the effect of f_1 is mediated through B and C to D. Therefore, A, B, and C are the causes of D when they are independent of the residual obtained by the multiple regression of D on {A, B, C}. However, it is necessary to confirm that variables in each proper subset of {A, B, C} are not independent of the residual obtained by the regression of D on the proper subset (i.e., no proper subset of {A, B, C} satisfies the back-door criterion). For example, in Figure 2-(b), C is not a cause of D, but A, B, and C are all independent of the residual obtained by the multiple regression of D on {A, B, C}. C should not be regarded as a cause of D because A and B are also independent of the residual when D is regressed on {A, B}. This example is generalized and formulated by Lemma 3:

Lemma 3. Let X denote the set of all observed variables. Let U denote a subset of X that contains x_i (i.e., U ⊆ X and x_i ∈ U). Let M denote the sequence of M_j, where M_j is a set of ancestors of x_j. For each x_j ∈ U, let y_j denote the residual obtained by the multiple linear regression of x_j on the common ancestors of U, where the set of common ancestors of U is ⋂_{x_j ∈ U} M_j. We define f(x_i, U, M) as a function that returns 1 when each y_j ∈ {y_j | x_j ∈ U \ {x_i}} is independent of the residual obtained by the multiple linear regression of y_i on {y_j | x_j ∈ U, j ≠ i}; otherwise it returns 0. If f(x_i, V, M) = 0 for each V ⊂ U and f(x_i, U, M) = 1, then each x_j ∈ U \ {x_i} is an ancestor of x_i.

Proof. We prove Lemma 3 by contradiction. Assume that x_j ∈ U \ {x_i} is not an ancestor of x_i, even though f(x_i, V, M) = 0 for each V ⊂ U and f(x_i, U, M) = 1. Let D_j denote the set that consists of the descendants of x_j and x_j itself. Then,

image

Let H_U denote the set of common causes of U (i.e., H_U = ⋂_{x_j ∈ U} M_j). Let α_k denote the coefficient of x_k ∈ H_U when x_i is regressed on H_U. Then,

image

Let s_i^U denote the residual obtained by the multiple regression of y_i on {y_j | x_j ∈ U \ {x_i}}, and let β_k denote the coefficient of y_k obtained by the multiple regression of y_i on {y_k | x_k ∈ U \ {x_i}}. Then, we have s_i^U:

image

There is no term that includes e_j, the external effect of y_j, other than −β_j y_j in Equation 15. External effect e_j is independent of the other terms in Equation 15. Since y_j is independent of s_i^U, β_j = 0 by Theorem 1. Therefore, we have s_i^U as follows:

image

Every y_k with x_k ∈ U \ {x_i, x_j} is independent of s_i^U. This means f(x_i, U \ {x_j}, M) = 1, which contradicts the assumption that f(x_i, V, M) = 0 for each V ⊂ U.

We describe the procedure and the implementation of how RCD extracts the ancestors of each observed variable in Algorithm 1. The output of the algorithm is sequence M = {M_i}, where M_i is the set of identified ancestors of x_i. Argument α_C is the alpha level for the p-value of Pearson's correlation. If the p-value of two variables is smaller than α_C, then we estimate that the variables are linearly correlated. Argument α_I is the alpha level for the p-value of the Hilbert-Schmidt independence criterion (HSIC) [17]. If the p-value of the HSIC of two variables is greater than α_I, then we estimate that the variables are mutually independent. Argument α_S is the alpha level to test whether a variable is generated from a non-Gaussian process using the Shapiro-Wilk test [18]. Argument n is the maximum number of explanatory variables used in multiple linear regression for identifying causal directions; i.e., the maximum number of (|U| − 1) in Lemma 3. In practice, this should be set to a small number when the number of samples is smaller than the number of variables. RCD does not perform multiple regression analysis of more than n explanatory variables.
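The correlation and non-Gaussianity decisions described above can be sketched with SciPy's `pearsonr` and `shapiro`. This is a minimal illustration under assumed data (cubed Gaussian noise and a hypothetical coefficient of 0.8), not a reproduction of Algorithm 1; the HSIC p-value test is omitted because SciPy does not provide one:

```python
import numpy as np
from scipy import stats

alpha_C, alpha_S = 0.01, 0.01          # alpha levels for correlation and Shapiro-Wilk

rng = np.random.default_rng(2)
e1 = rng.normal(0.0, 0.5, 300) ** 3    # non-Gaussian external effects (illustrative)
e2 = rng.normal(0.0, 0.5, 300) ** 3
x1 = e1
x2 = 0.8 * x1 + e2

# Linearly correlated if the p-value of Pearson's correlation is below alpha_C.
_, p_corr = stats.pearsonr(x1, x2)
is_correlated = p_corr < alpha_C

# Treated as non-Gaussian if the Shapiro-Wilk test rejects normality at alpha_S.
_, p_sw = stats.shapiro(x2)
is_non_gaussian = p_sw < alpha_S
```

Note the direction of each comparison: small p-values indicate correlation and non-Gaussianity, whereas for the HSIC test a p-value above α_I is read as independence.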

RCD initializes M_i to be an empty set for each x_i ∈ X. RCD repeats the inference between the variables in each U ⊂ X that has (l + 1) elements. Number l is initialized to 1. If there is no change in M, l is increased by 1. If there is a change in M, l is set to 1. When l exceeds n, the repetition ends. Variable changed records whether there is a change in M within an iteration.
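The control flow of this repetition can be sketched as below. The callback `try_update` is hypothetical: it stands in for the regression and independence tests of Lemma 3 and is replaced here by a toy function that consults a known ancestor table, so only the handling of l, n, and changed is illustrated:

```python
from itertools import combinations

def extract_ancestors(variables, n, try_update):
    """Outer loop only; try_update(U, M) may add ancestors to M and returns True if M changed."""
    M = {v: set() for v in variables}
    l = 1
    while l <= n:
        changed = False
        for U in combinations(variables, l + 1):   # every subset with l + 1 elements
            if try_update(U, M):
                changed = True
        l = 1 if changed else l + 1                # reset on change, otherwise enlarge U
    return M

# Toy stand-in for the statistical tests: reveals true ancestors for pairs only.
TRUE_ANCESTORS = {"a": set(), "b": {"a"}, "c": {"a", "b"}}

def toy_update(U, M):
    changed = False
    if len(U) == 2:
        for x, y in (U, U[::-1]):
            if x in TRUE_ANCESTORS[y] and x not in M[y]:
                M[y].add(x)
                changed = True
    return changed

M = extract_ancestors(["a", "b", "c"], n=2, try_update=toy_update)
```

Resetting l to 1 after every change lets newly identified ancestors be removed as common causes before larger subsets are examined again.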

In line 16 of Algorithm 1, RCD confirms that there is no identified ancestor of x_i in U by checking that M_i ∩ U = ∅. This confirms that f(x_i, V, M) = 0 for each V ⊂ U in Lemma 3. In lines 17–24, RCD checks whether f(x_i, U, M) = 1 in Lemma 3. When f(x_i, U, M) = 1 is satisfied, x_i is put into S. S is a set of candidates for a sink (a variable that is not a cause of the others) in U. It is necessary to test whether there is only one sink in U because two variables may be misinterpreted as causes of each other when the alpha level for the independence test (α_I) is too small.

We use least squares regression for removing the effect of common causes in line 12 of Algorithm 1, but we use a variant of multiple linear regression called multilinear HSIC regression (MLHSICR) to examine the causal directions between variables in U in line 20 of Algorithm 1 when l ≥ 2. Coefficients obtained by multiple linear regression using the ordinary least squares method with linearly correlated explanatory variables often differ from true values due to estimation errors. Thus, the relationship between the explanatory variables and the residual may be misinterpreted as dependent when the explanatory variables are affected by the same latent confounders. To avoid such failure, we use MLHSICR defined as follows:

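Definition 2. Let variable x_i denote an explanatory variable, x denote a vector that collects the explanatory variables x_i, and y denote a response variable. MLHSICR models the relationship y = λ^⊤ x by the coefficient vector λ in the following equation:

$$\lambda = \operatorname*{arg\,min}_{\tilde{\lambda}} \sum_{i} \widehat{\mathrm{HSIC}}\left(x_i, \; y - \tilde{\lambda}^{\top} \mathbf{x}\right) \qquad (17)$$

where \widehat{HSIC}(a, b) denotes the Hilbert-Schmidt independence criterion of a and b.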

Mooij et al. [6] have developed a method to estimate the nonlinear causal function between variables by minimizing the HSIC between the explanatory variables and the residual. RCD estimates λ by minimizing the sum of the HSICs in Equation 17 using the L-BFGS method [19], similar to Mooij et al. [6]. L-BFGS is a quasi-Newton method, and RCD uses the coefficients obtained by the least squares method as the initial value of λ.
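A rough sketch of this estimation step is given below, assuming NumPy, SciPy's `minimize` with `method="L-BFGS-B"` and numerical gradients, and a biased Gaussian-kernel HSIC estimate with median-heuristic bandwidths. The function names `hsic` and `mlhsicr` and the toy data are hypothetical; the kernel, bandwidth, and optimizer settings used by RCD may differ:

```python
import numpy as np
from scipy.optimize import minimize

def hsic(a, b):
    """Biased HSIC estimate with Gaussian kernels (median-heuristic bandwidth)."""
    def centered_gram(v):
        d2 = (v[:, None] - v[None, :]) ** 2
        K = np.exp(-d2 / (2.0 * np.median(d2[d2 > 0])))
        return K - K.mean(axis=0) - K.mean(axis=1)[:, None] + K.mean()
    return np.sum(centered_gram(a) * centered_gram(b)) / len(a) ** 2

def mlhsicr(X, y):
    """X: (n, d) explanatory variables, y: (n,) response. Returns (lam, lam0, objective)."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    lam0, *_ = np.linalg.lstsq(Xc, yc, rcond=None)       # least-squares initial value

    def objective(lam):
        r = yc - Xc @ lam                                 # residual for this lam
        return sum(hsic(Xc[:, k], r) for k in range(Xc.shape[1]))

    res = minimize(objective, lam0, method="L-BFGS-B")    # gradients by finite differences
    # Keep the least-squares start if the optimizer did not improve on it.
    lam = res.x if res.fun <= objective(lam0) else lam0
    return lam, lam0, objective

rng = np.random.default_rng(3)
n = 200
f = rng.uniform(-1, 1, n)                                 # latent confounder of both regressors
X = np.column_stack([f + rng.uniform(-1, 1, n), f + rng.uniform(-1, 1, n)])
y = X @ np.array([1.0, -0.5]) + rng.uniform(-1, 1, n)
lam, lam0, objective = mlhsicr(X, y)
```

Starting from the least-squares solution keeps the search local, so the result stays close to ordinary regression unless a nearby coefficient vector makes the residual measurably less dependent on the explanatory variables.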

3.3. Finding parents of each variable

When x_j is an ancestor but not a parent of x_i, the effect of x_j on x_i is mediated through M_i \ {x_j}. Therefore, x_j ⊥⊥ x_i | M_i \ {x_j}. Reference [20] proposed a method to test conditional independence using unconditional independence testing, stated as Theorem 2 (proved in [20]):

Theorem 2. If x_i and x_j are neither directly connected nor unconditionally independent, then there must exist a set of variables Z and two functions f and g such that x_i − f(Z) ⊥⊥ x_j − g(Z), and x_i − f(Z) ⊥⊥ Z or x_j − g(Z) ⊥⊥ Z.

image

where f and g are multiple linear regression functions of x_j on M_i \ {x_j} and x_i on M_i \ {x_j}, respectively. Since (M_i \ {x_j}) ∩ M_j = M_i ∩ M_j, we can assume that x_j ⊥⊥ x_i | (M_i \ {x_j}) ⇔ x_j − h(M_i ∩ M_j) ⊥⊥ x_i − g(M_i \ {x_j}), where h is a multiple linear regression function of x_j on (M_i ∩ M_j).

Based on Theorem 2, RCD uses Lemma 4 to distinguish the parents from the ancestors. We proved Lemma 4 without using Theorem 2.

image

Lemma 4. Assume that x_j ∈ M_i; that is, x_j is an ancestor of x_i. Let z_i denote the residual obtained by the multiple regression of x_i on M_i \ {x_j}. Let w_j denote the residual obtained by the multiple regression of x_j on (M_i ∩ M_j). If z_i and w_j are linearly correlated, then x_j is a parent of x_i; otherwise, x_j is not a parent of x_i.

Proof. Variables x_i and x_j are formulated as follows:

image

Let α_k denote the coefficient of x_k ∈ (M_i \ {x_j}) when x_i is regressed on M_i \ {x_j}. Then,

image

Let β_k denote the coefficient of x_k ∈ (M_i ∩ M_j) when x_j is regressed on M_i ∩ M_j. Then,

image

=

image

=

image

From Equations 20 and 21,

image

Since x_i and x_j do not have the same latent confounder:

image

From Equations 21, 22, and 23, z_i and w_j are linearly correlated when b_ij ≠ 0. This means that x_j is a parent (direct cause) of x_i. When b_ij = 0, z_i and w_j are not linearly correlated. This means that x_j is not a parent of x_i.
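Lemma 4 can be illustrated on a hypothetical chain x1 → x2 → x3 without latent confounders, where x1 is an ancestor but not a parent of x3. The sketch below assumes NumPy; the helper names and unit coefficients are illustrative:

```python
import numpy as np

def residualize(target, regressors):
    """Residual of the least-squares regression of target on a list of 1-D regressors."""
    A = np.column_stack([np.ones(len(target)), *regressors])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ coef

def parent_correlation(data, M, i, j):
    """Correlation between z_i and w_j of Lemma 4 (x_j is assumed to be in M[i])."""
    z_i = residualize(data[i], [data[k] for k in M[i] - {j}])
    w_j = residualize(data[j], [data[k] for k in M[i] & M[j]])
    return np.corrcoef(z_i, w_j)[0, 1]

rng = np.random.default_rng(4)
n = 2000
x1 = rng.uniform(-1, 1, n)
x2 = x1 + rng.uniform(-1, 1, n)
x3 = x2 + rng.uniform(-1, 1, n)
data = {"x1": x1, "x2": x2, "x3": x3}
M = {"x1": set(), "x2": {"x1"}, "x3": {"x1", "x2"}}   # ancestor sets

corr_x2 = parent_correlation(data, M, "x3", "x2")     # x2 is a parent of x3
corr_x1 = parent_correlation(data, M, "x3", "x1")     # x1 is only an ancestor of x3
```

Here `corr_x2` should be clearly nonzero while `corr_x1` should be close to zero, so only x2 would be kept as a parent of x3.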

3.4. Identifying pairs of variables that have the same latent confounders

RCD infers that two variables are affected by the same latent confounders when those two variables are linearly correlated even after removing the effects of all the parents. RCD identifies the pairs of variables affected by the same latent confounders by using Lemma 5.

Lemma 5. Let M_i and M_j respectively denote the sets of ancestors of x_i and x_j, and P_i and P_j respectively denote the sets of parents of x_i and x_j. Assume that x_i ∉ M_j and x_j ∉ M_i. Let y_i denote the residual obtained by the multiple regression of x_i on P_i, and y_j denote the residual obtained by the multiple regression of x_j on P_j. If y_i and y_j are linearly correlated, then x_i and x_j have the same latent confounders.

Proof. Variables x_i and x_j are formulated as follows:

image

Let α_k denote the coefficient of x_k ∈ P_i when x_i is regressed on P_i. Then,

image

Let β_k denote the coefficient of x_k ∈ P_j when x_j is regressed on P_j. Then,

image

Variables e_i and e_j are independent of each other. If we assume that x_i and x_j do not have the same latent confounder, then,

image

Then, y_i and y_j are mutually independent. However, this contradicts the assumption of Lemma 5 that y_i and y_j are linearly correlated. Therefore, x_i and x_j have the same latent confounders.
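A small numerical illustration of Lemma 5, assuming NumPy and a hypothetical model in which x_i has one observed parent x_p and shares a latent confounder f with x_j, while a third variable x_k is unrelated (all names and coefficients are illustrative):

```python
import numpy as np

def residualize(target, regressors):
    """Residual of the least-squares regression of target on a list of 1-D regressors."""
    A = np.column_stack([np.ones(len(target)), *regressors])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ coef

rng = np.random.default_rng(5)
n = 2000
f = rng.uniform(-1, 1, n)                          # latent confounder (not observed)
x_p = rng.uniform(-1, 1, n)                        # observed parent of x_i
x_i = 0.8 * x_p + 0.9 * f + rng.uniform(-1, 1, n)
x_j = 0.9 * f + rng.uniform(-1, 1, n)              # no observed parents
x_k = rng.uniform(-1, 1, n)                        # unrelated variable, no parents

y_i = residualize(x_i, [x_p])                      # remove the effect of P_i = {x_p}
y_j = residualize(x_j, [])                         # P_j is empty: only centering
y_k = residualize(x_k, [])

corr_confounded = np.corrcoef(y_i, y_j)[0, 1]      # remains correlated through f
corr_unrelated = np.corrcoef(y_i, y_k)[0, 1]       # should be close to zero
```

Under these assumptions the pair (x_i, x_j) would be connected by a bi-directed arrow, and the pair (x_i, x_k) would not.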

We evaluated the performance of RCD relative to the existing methods in terms of how accurately it finds the pairs of variables that are affected by the same latent confounders and how accurately it infers the causal directions of the pairs of variables that are not affected by the same latent confounder. In regard to the latent confounders, we compared RCD with FCI [4], RFCI [21], and GFCI [22]. In addition to these three methods, we compared RCD with PC [1], GES [2], DirectLiNGAM [8], and RESIT [9] to evaluate the accuracy of causal directions. In the following sections, DirectLiNGAM is called LiNGAM for simplicity.

4.1. Performance on simulated structures

image

Figure 3: Performance evaluation on causal graphs using simulated data: The vertical red lines indicate the median values of the results. The evaluation of the latent confounders corresponds to the evaluation of bi-directed arrows. The evaluation of causality corresponds to the evaluation of directed arrows.

We performed 100 experiments to evaluate RCD relative to the existing methods. We prepared 300 sets of samples for each experiment. The data of each experiment were generated as follows: The data generation process was modeled the same as Equation 1. The number of observed variables x_i was set to 20 and the number of latent confounders f_k was set to 4. Let X and Y denote the stochastic variables, and assume that Y ∼ N(0.0, 0.5) and X = Y^3. We used the random samples of X for e_i and f_k because X is non-Gaussian. The number of causal arrows between the observed variables is 40, and the start point and the end point of each causal arrow were randomly selected. We randomly drew two causal arrows from each latent confounder to the observed variables. Let Z denote a stochastic variable that comes from a uniform distribution on [−1.0, −0.5] and [0.5, 1.0]. We used the random samples of Z for b_ij and λ_ik.
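The simulation design can be sketched as follows, assuming NumPy. Several details are assumptions rather than statements from the paper: 0.5 is read as the standard deviation of Y, acyclicity is enforced by drawing arrows consistent with a random causal order, and all function names are hypothetical:

```python
import numpy as np

def sample_coefficients(rng, size):
    """Draws from the uniform distribution on [-1.0, -0.5] and [0.5, 1.0]."""
    return rng.uniform(0.5, 1.0, size) * rng.choice([-1.0, 1.0], size)

def sample_noise(rng, shape, scale=0.5):
    """X = Y**3 with Y ~ N(0, scale); treating 0.5 as the standard deviation is an assumption."""
    return rng.normal(0.0, scale, shape) ** 3

def generate(rng, n_obs=20, n_latent=4, n_edges=40, n_samples=300):
    order = rng.permutation(n_obs)                       # random causal order (assumption)
    pairs = [(order[a], order[b]) for a in range(n_obs) for b in range(a)]
    chosen = rng.choice(len(pairs), n_edges, replace=False)
    B = np.zeros((n_obs, n_obs))
    for idx, coef in zip(chosen, sample_coefficients(rng, n_edges)):
        child, parent = pairs[idx]                       # parent precedes child in the order
        B[child, parent] = coef
    Lam = np.zeros((n_obs, n_latent))
    for k in range(n_latent):                            # two arrows per latent confounder
        targets = rng.choice(n_obs, 2, replace=False)
        Lam[targets, k] = sample_coefficients(rng, 2)
    e = sample_noise(rng, (n_obs, n_samples))
    f = sample_noise(rng, (n_latent, n_samples))
    x = np.linalg.solve(np.eye(n_obs) - B, Lam @ f + e)  # solves x = Bx + Lam f + e
    return x, B, Lam

x, B, Lam = generate(np.random.default_rng(6))
```

Each call produces one data set; repeating the call with different seeds would give the repeated experiments.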

We evaluated (1) how accurately each method infers the pairs of variables that are affected by the same latent confounders (called the evaluation of latent confounders), and (2) how accurately each method infers causality between the observed variables that are not affected by the same latent confounder (called the evaluation of causality). The evaluation of latent confounders corresponds to the evaluation of bi-directed arrows in a causal graph, and the evaluation of causality corresponds to the evaluation of directed arrows. We used precision, recall, and F-measure as evaluation measures. In regard to the evaluation of latent confounders, true positive (TP) is the number of true bi-directed arrows that are correctly inferred. In regard to causality, TP is the number of true directed arrows that a method correctly infers in terms of their positions and directions. Precision is TP divided by the number of estimations, and recall is TP divided by the number of all true arrows. F-measure is defined as F-measure = 2 · precision · recall / (precision + recall).
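These measures can be computed as in the short sketch below. The function name and the edge representation are assumptions: directed arrows are written as (cause, effect) tuples, so a reversed arrow does not count as a true positive, and bi-directed arrows could be represented as `frozenset` pairs in the same way:

```python
def precision_recall_f(true_edges, estimated_edges):
    """Precision, recall, and F-measure for two collections of edges."""
    true_edges, estimated_edges = set(true_edges), set(estimated_edges)
    tp = len(true_edges & estimated_edges)               # correct position and direction
    precision = tp / len(estimated_edges) if estimated_edges else 0.0
    recall = tp / len(true_edges) if true_edges else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

true_edges = {("A", "B"), ("B", "C"), ("C", "D")}
estimated = {("A", "B"), ("C", "B"), ("C", "D"), ("A", "D")}   # one reversed, one spurious
precision, recall, f_measure = precision_recall_f(true_edges, estimated)
```

In this toy example two of the four estimated arrows are correct and two of the three true arrows are recovered.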

The arguments of RCD, that is, α_C (alpha level for Pearson's correlation), α_I (alpha level for independence), α_S (alpha level for the Shapiro-Wilk test), and n (maximum number of explanatory variables for multiple linear regression), were set as α_C = 0.01, α_I = 0.01, α_S = 0.01, and n = 2.

In regard to the types of edges, FCI, RFCI, and GFCI produce partial ancestral graphs (PAGs) that include six types of edges: → (directed), ↔ (bi-directed), − (undirected), ◦→ (partially directed), ◦−◦ (nondirected), and ◦− (partially undirected). In the evaluation, we only used the directed and bi-directed edges. PC, GES, LiNGAM, and RESIT produce causal graphs only with the directed edges; thus, we did not evaluate those methods in terms of latent confounders.

The box plots in Figure 3 display the results. The vertical red lines indicate the median values. Note that some median values are the same as the upper or lower quartiles. For example, the median and the upper quartile of the recalls of RCD in the results of latent confounders are the same. This means that all the results between the median and the upper quartile have the same value. In regard to the evaluation of latent confounders, the precision, recall, and F-measure values are almost the same for RCD, FCI, RFCI, and GFCI, but the medians of the precision, recall, and F-measure values of RCD are the highest among them. In regard to causality, RCD scores the highest medians of the precision and F-measure values among all the methods, and the median of recall for RCD is the second highest next to RESIT.

The results suggest that RCD does not greatly improve the performance metrics compared to the existing methods. However, there is no other method that has the highest or the second highest performance for each metric. FCI, RFCI, and GFCI perform as well as RCD in terms of finding the pairs of variables that are affected by the same latent confounders, but they do not perform well in terms of the recall of causality. In addition, no other method performs well in terms of both precision and recall of causality. RCD can successfully find the pairs of variables that are affected by the same latent confounders and identify the causal direction between variables that are not affected by the same latent confounder.

4.2. Performance on real-world structures

Causal structures in the real world are often very complex. Therefore, RCD is likely to produce a causal graph where each pair of observed variables is connected with a bi-directed arrow. The result of identifying latent confounders is affected by the threshold of the p-value for the independence test, α_I. If α_I is too large or too small, then all the variable pairs are likely to be concluded to have the same latent confounders. Therefore, we need to find the most appropriate value of α_I. We increased k from 1 to 25, set α_I = 0.1^k, and ran RCD for each value. We adopted the result that has the smallest number of pairs of variables with the same latent confounders.
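The selection procedure can be sketched as a simple sweep, assuming the schedule α_I = 0.1^k for k = 1, ..., 25. The callable `run_rcd` is a hypothetical stand-in for any RCD implementation; it is assumed to take the data and α_I and to return the inferred confounded pairs and directed edges. Keeping the first (largest) α_I on ties is also an assumption, since the text does not say how ties are broken.

```python
def select_alpha_i(data, run_rcd, k_max=25):
    """Sweep alpha_I = 0.1**k and keep the run with the fewest confounded pairs.

    run_rcd is a hypothetical callable: run_rcd(data, alpha_i) must return
    (confounded_pairs, directed_edges) as two collections.
    """
    best = None
    for k in range(1, k_max + 1):
        alpha_i = 0.1 ** k
        confounded_pairs, directed_edges = run_rcd(data, alpha_i)
        n_pairs = len(confounded_pairs)
        # Strict "<" keeps the earliest k on ties (assumed tie-breaking rule).
        if best is None or n_pairs < best["n_confounded_pairs"]:
            best = {
                "k": k,
                "alpha_i": alpha_i,
                "n_confounded_pairs": n_pairs,
                "confounded_pairs": confounded_pairs,
                "directed_edges": directed_edges,
            }
    return best
```

The other arguments of RCD would stay fixed inside `run_rcd` during the sweep, so that only the independence threshold varies between runs.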

We analyzed the General Social Survey data set, taken from a sociological data repository. The data have been used for the evaluation of DirectLiNGAM in Shimizu et al. [8]. The sample size is 1380. The variables and the possible directions are shown in Figure 4. The directions were determined based on the domain knowledge in Duncan et al. [23] and temporal orders.

We evaluated the directed arrows (causality) in the causal graphs produced by RCD and the existing methods, based on the directed arrows in Figure 4. In addition, we counted a bi-directed arrow produced by a method as an accurate inference if the same pair of variables is connected by a directed arrow in Figure 4.

Figure 4: Variables and causal relations in the General Social Survey data set used for the evaluation.

Table 1: The results of the application to sociological data.

The results are listed in Table 1. In regard to bi-directed arrows (latent confounders), the number of successful inferences by RCD is the highest, and the precisions of RCD, FCI, and RFCI are all 1.0. In regard to the directed arrows (causality), the numbers of the successful arrows of RCD, RESIT, and LiNGAM are the highest. The precisions of RCD and LiNGAM are also the highest. The causal graph produced by RCD is shown in Figure 5. The dashed arrow x_3 ← x_5 is an incorrect inference, but the others are correct.

Figure 5: Causal graph produced by RCD. The dashed arrow, x_3 ← x_5, is an incorrect inference.

RCD performs the best among the compared methods in terms of both identifying the pairs of variables that are affected by the same latent confounders and identifying the causal direction of the pairs of variables that are not affected by the same latent confounder.

We developed a method called repetitive causal discovery (RCD) that produces a causal graph where a directed arrow indicates the causal direction between the observed variables, and a bi-directed arrow indicates that a pair of variables has the same latent confounder. RCD produces a causal graph by (1) finding the ancestors of each variable, (2) distinguishing the parents from the indirect causes, and (3) identifying the pairs of variables that have the same latent confounders. We confirmed that RCD effectively analyzes data confounded by unobserved variables through validations using simulated and real-world data.

In this paper, we did not discuss the utilization of prior knowledge. However, it is possible to make use of prior knowledge of causal relations in practical applications of RCD. In this study, information about the ancestors of each variable was initialized to be an empty set. If we have prior knowledge about causal relations, the information about the ancestors of each variable that RCD retains can be set according to the prior knowledge.
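One way such an initialization might look is sketched below. This is an illustration under assumptions, not part of the method as evaluated: prior knowledge is assumed to arrive as known (cause, effect) pairs, the ancestor information is assumed to be one set per variable, and the helper name `init_ancestors` is hypothetical. The sets are closed under transitivity so that a known chain x_0 → x_1 → x_2 also records x_0 as an ancestor of x_2.

```python
def init_ancestors(n_variables, known_edges=()):
    """Build initial ancestor sets from prior knowledge (hypothetical helper).

    known_edges is an iterable of (cause, effect) index pairs. With no prior
    knowledge, every variable starts with an empty ancestor set.
    """
    ancestors = {i: set() for i in range(n_variables)}
    for cause, effect in known_edges:
        ancestors[effect].add(cause)

    # Transitive closure: an ancestor of an ancestor is also an ancestor.
    changed = True
    while changed:
        changed = False
        for v in ancestors:
            inherited = set()
            for a in ancestors[v]:
                inherited |= ancestors[a]
            if not inherited <= ancestors[v]:
                ancestors[v] |= inherited
                changed = True
    return ancestors


no_prior = init_ancestors(4)
with_prior = init_ancestors(4, known_edges=[(0, 1), (1, 2)])
```

Here `no_prior` maps every variable to an empty set, matching the setting used in this study, while `with_prior` records x_0 as an ancestor of x_1 and both x_0 and x_1 as ancestors of x_2.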

There is still room for improvement in the RCD method. The optimal settings of the arguments of RCD and the extension of RCD for nonlinear causal relations will be investigated in future studies.

We thank Dr. Samuel Y. Wang for his useful comments on a previous version of our algorithm proposed in [11]. Takashi Nicholas Maeda has been partially supported by Grant-in-Aid for Scientific Research (C) from Japan Society for the Promotion of Science (JSPS) #20K19872. Shohei Shimizu has been partially supported by ONRG NICOP N62909-17-1-2034 and Grant-in-Aid for Scientific Research (C) from Japan Society for the Promotion of Science (JSPS) #16K00045 and #20K11708.

[10] H. Zou, The adaptive lasso and its oracle properties, Journal of the Amer-

[11] T. N. Maeda, S. Shimizu, RCD: Repetitive causal discovery of linear non-

[12] P. O. Hoyer, S. Shimizu, A. J. Kerminen, M. Palviainen, Estimation of

[13] J. Pearl, Comment: Graphical models, causality and intervention, Statis-

[14] J. Pearl, Causality: models, reasoning and inference, Cambridge University

[15] G. Darmois, Analyse générale des liaisons stochastiques: etude particulière

[16] V. P. Skitovitch, On a property of the normal distribution, Doklady

[17] A. Gretton, K. Fukumizu, C. H. Teo, L. Song, B. Schölkopf, A. J. Smola, A

[18] S. S. Shapiro, M. B. Wilk, An analysis of variance test for normality (com-

[19] D. C. Liu, J. Nocedal, On the limited memory BFGS method for large scale

[20] H. Zhang, S. Zhou, K. Zhang, J. Guan, Causal discovery using regression-

[21] D. Colombo, M. H. Maathuis, M. Kalisch, T. S. Richardson, Learning

[22] J. M. Ogarrio, P. Spirtes, J. Ramsey, A hybrid causal search algorithm for

[23] O. D. Duncan, D. L. Featherman, B. Duncan, Socioeconomic background

