Estimating Transfer Entropy via Copula Entropy

arXiv, 2019

Abstract

Causal discovery is a fundamental problem in statistics with wide applications in many fields. Transfer Entropy (TE) is an important notion for measuring causality and is essentially a conditional Mutual Information (MI). Copula Entropy (CE) is a theory for measuring statistical independence and is equivalent to MI. In this paper, we prove that TE can be represented with only CE and then propose a non-parametric method for estimating TE via CE. The proposed method was applied to analyze the Beijing PM2.5 data in the experiments. Experimental results show that the proposed method can infer causal relationships from data effectively and hence help to understand the data better.

Keywords: Copula Entropy; Transfer Entropy; Conditional Independence; Causal Discovery; Estimation

1 Introduction

Causality concerns the relationship between cause and effect and is ubiquitous in the natural and social worlds. The search for such relationships is a central topic in many sciences. Causal discovery [1] is the statistical problem of identifying such causal relations from observational or experimental data collected from the underlying systems. In statistics, association is closely related to causality. However, the former does not imply the latter: as is well known, correlation does not imply causation. Discovering causal relationships therefore requires conceptual tools beyond association.

Granger [2, 3] developed the notion of causality between two stationary time series based on the philosophy that a cause should improve the prediction of its effects. Given two time series $X$ and $Y$, Granger Causality (GC) is defined as follows:

Definition 1 (Granger Causality).
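One common way to make this idea precise, given here as a standard linear formulation rather than the exact statement in [2, 3], compares one-step prediction errors. Writing $\mathbf{y}_i^{(k)} = (y_i, \dots, y_{i-k+1})$ and $\mathbf{x}_i^{(l)} = (x_i, \dots, x_{i-l+1})$ for the past $k$ states of $Y$ and the past $l$ states of $X$ (notation assumed for this sketch), $X$ is said to Granger-cause $Y$ if

$$\sigma^2\left(y_{i+1} \mid \mathbf{y}_i^{(k)}, \mathbf{x}_i^{(l)}\right) < \sigma^2\left(y_{i+1} \mid \mathbf{y}_i^{(k)}\right),$$

where $\sigma^2(\cdot \mid \cdot)$ denotes the variance of the error of the optimal linear one-step predictor of $y_{i+1}$ built from the indicated past. In words, adding the past of $X$ strictly reduces the error of predicting $Y$.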

Transfer Entropy (TE) is another measure of causality, defined in information-theoretic terms for stationary time series by Schreiber [4]. Inspired by the notion of Mutual Information (MI), TE is defined as follows:

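Definition 2 (Transfer Entropy). Given two stationary time series $X = \{x_i\}$ and $Y = \{y_i\}$, $i = 1, \dots, T$, the TE from $X$ to $Y$ is defined as

$$T_{X \to Y} = \sum p\left(y_{i+1}, \mathbf{y}_i^{(k)}, \mathbf{x}_i^{(l)}\right) \log \frac{p\left(y_{i+1} \mid \mathbf{y}_i^{(k)}, \mathbf{x}_i^{(l)}\right)}{p\left(y_{i+1} \mid \mathbf{y}_i^{(k)}\right)}, \tag{1}$$

where $\mathbf{y}_i^{(k)} = (y_i, \dots, y_{i-k+1})$ and $\mathbf{x}_i^{(l)} = (x_i, \dots, x_{i-l+1})$ denote the past $k$ states of $Y$ and the past $l$ states of $X$.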

The above definition can be written as a conditional MI, as follows:

$$T_{X \to Y} = I\left(y_{i+1}; \mathbf{x}_i^{(l)} \mid \mathbf{y}_i^{(k)}\right). \tag{2}$$

By this definition, TE is a model-free measure and can be interpreted as the information that the cause provides about the future of the effect beyond what the effect's own past already provides. TE assumes the time series to be stationary. Additionally, since the definition conditions on a finite number of past states, TE also assumes Markovianity. Despite its great potential for applications in many fields, TE is considered notoriously difficult to estimate [5].
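For discrete series, the conditional-MI form (2) can be estimated directly by counting. The sketch below is a plug-in estimator for history lengths $k = l = 1$ on a synthetic binary example (the function name and the data-generating process are illustrative assumptions, not part of the method in this paper); it shows what TE measures before any copula machinery is introduced. Here $y_{i+1}$ copies $x_i$ with a 10% flip rate, so in theory $T_{X \to Y} = \log 2 - H_b(0.1) \approx 0.368$ nats and $T_{Y \to X} = 0$.

```python
import math
from collections import Counter

import numpy as np


def te_plugin_discrete(x, y):
    """Plug-in estimate (nats) of T_{X->Y} = I(y_{i+1}; x_i | y_i) for discrete
    series, i.e. Definition 2 with history lengths k = l = 1."""
    x, y = np.asarray(x).tolist(), np.asarray(y).tolist()
    fut, past, cause = y[1:], y[:-1], x[:-1]
    n = len(fut)
    n_abc = Counter(zip(fut, past, cause))  # counts of (y_{i+1}, y_i, x_i)
    n_ab = Counter(zip(fut, past))          # counts of (y_{i+1}, y_i)
    n_bc = Counter(zip(past, cause))        # counts of (y_i, x_i)
    n_b = Counter(past)                     # counts of y_i
    te = 0.0
    for (a, b, c), cnt in n_abc.items():
        # p(a | b, c) / p(a | b) = (cnt / n_bc) / (n_ab / n_b)
        te += cnt / n * math.log(cnt * n_b[b] / (n_bc[(b, c)] * n_ab[(a, b)]))
    return te


# Synthetic example: y copies the previous value of x, flipped with probability 0.1.
rng = np.random.default_rng(0)
n = 20_000
x = rng.integers(0, 2, size=n)
flip = rng.random(n) < 0.1
y = np.zeros(n, dtype=int)
y[1:] = np.where(flip[1:], 1 - x[:-1], x[:-1])

te_xy = te_plugin_discrete(x, y)  # theory: log 2 - H_b(0.1), about 0.368 nats
te_yx = te_plugin_discrete(y, x)  # theory: 0, since x is i.i.d.
```

Plug-in counting only works for small discrete alphabets; for continuous data this is exactly where estimation becomes hard, which motivates the CE-based approach below.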

Both GC and TE are based on the same philosophy of causality. Barnett et al. showed that GC and TE are equivalent under the Gaussianity assumption [6]. Since TE makes no such distributional assumption, it applies to a broader range of cases than GC. Directed Information is a notion closely related to TE, defined as the sum of a group of TE terms and one MI term [7].

Copula theory concerns the representation of statistical dependence [8, 9]. Copula Entropy (CE) is a recently introduced theory for measuring statistical independence [10]. CE has been proved to be equivalent to MI [10] and therefore connects copula theory with information theory. It is an ideal measure of statistical independence thanks to several good properties: it is multivariate, symmetric, non-positive (zero iff independent), invariant to monotonic transformations, and equivalent to the correlation coefficient in Gaussian cases. In [10], Ma and Sun also proposed a non-parametric method for estimating CE (see Section 3.2). CE has been applied to discover associations in data [11].

In this paper, we propose a non-parametric method for estimating TE with CE. Specifically, we first prove that TE can also be represented with only CE and hence propose a method for estimating TE via CE based on this representation. We test the proposed method on the Beijing PM2.5 data to identify the causality in environmental and meteorological systems and also compare it with the related methods.

2 Related Research

Estimating TE is fundamental to its applications. One basic family of estimation methods follows the definition of TE directly. Faes et al. [12] estimated TE from its original definition as the difference between two conditional entropies. Vicente et al. [13] proposed to estimate TE by expanding its definition into the sum of four entropies. Kontoyiannis and Skoularidou [14] proposed to estimate directed information via a likelihood-based MI estimator and proved the asymptotic convergence of this estimator.

Copula has been applied to estimate measures of causality. Bouezmarni et al. represented the hypothesis of Conditional Independence (CI) in terms of copula densities and then defined a test statistic based on the Hellinger distance between two copula density terms [15]. Hu and Liang proposed to compute GC with copula: they rewrote the definition of GC in terms of conditional copula densities and then estimated the conditional copula density empirically [16]. That work estimates GC with non-parametric PDF approximation, which lacks a convergence guarantee. Song [17] proposed to test CI after deriving the conditional copula with Rosenblatt transforms. Veraverbeke et al. [18] proposed estimators of conditional association measures based on the estimated conditional copula. The three studies above all test CI based on the conditional copula; however, estimating a conditional copula from data is usually biased and unstable. Since TE is essentially a conditional MI, it is natural to link it to CE. Wieczorek and Roth [19] proposed the notion of causal compression with CE and proved that Directed Information can be represented with only CE. Kim et al. proposed a copula nonlinear GC for a VAR system using a copula version of the beta regression model [20]. Cui et al. considered learning causal structure from data with missing values under a Gaussian copula assumption [21]. Reddi and Póczos [22] proposed a scale-invariant conditional dependence measure by transforming Hilbert-Schmidt dependence measures with copula functions. Testing CI through the partial copula has recently become a popular topic. Bergsma [23, 24] first proposed to test CI by testing independence between variables derived from the original variables by the partial copula transformation. Bianchi et al. [25] introduced a test statistic for CI called the weighted partial copula, defined as the weighted distance between the estimated conditional (or partial) copula and the independence copula.
Also with the partial copula, Petersen and Hansen [26] proposed a test statistic based on the generalized correlation between residuals estimated with quantile regression. Frattarolo and Guegan [27] proposed to test CI with empirical conditional copula projections.

Another line of related research estimates Conditional MI (CMI) by building on the kNN method for estimating MI [28, 29, 30, 31]; TE is essentially a CMI. These works share the same idea: expand CMI by definition into four entropy terms and then estimate each term with a kNN entropy estimator. In this paper, we propose a different kNN-based method for estimating TE (or CMI).

In this paper, we propose a method for estimating TE via CE based on the representation of TE with only CE. Together with the previous work [10], this yields a theoretical framework for testing both independence and CI based on CE. Two similar frameworks for (conditional) independence testing were previously proposed, based on kernel methods in machine learning [32, 5] and on distance covariance/correlation [33, 34, 35]. Both can be considered nonlinear generalizations of traditional (partial) correlation, and both have non-parametric estimation methods. The kernel-based framework builds on kernel mean embedding: it tests independence [32] or CI [5] by mapping distributions into a reproducing kernel Hilbert space (RKHS) with kernel functions. The other framework defines a concept called distance correlation via characteristic functions [33, 34]. Building on this concept, Wang et al. [35] proposed a measure for testing CI called Conditional Distance Correlation (CDC), defined with conditional characteristic functions.

3 Copula Entropy

3.1 Theory

Copula theory concerns the representation of multivariate dependence with copula functions [8, 9]. At its core is Sklar's theorem [36], which states that a multivariate probability density function can be represented as the product of its marginal densities and a copula density function that represents the dependence structure among the random variables. This representation separates the dependence structure, i.e., the copula function, from the properties of the individual variables, i.e., the marginals, which makes it possible to deal with the dependence structure alone, regardless of the joint and marginal distributions. This section defines a statistical independence measure with copula. For notation, please refer to [10].
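For continuous random variables with marginal CDFs $F_i$ and marginal densities $p_i$ (symbols assumed here for illustration), the density form of Sklar's theorem described above reads

$$p(\mathbf{x}) = c(\mathbf{u}) \prod_{i=1}^{N} p_i(x_i), \qquad u_i = F_i(x_i).$$

For example, if the variables are independent, the copula density is $c(\mathbf{u}) = 1$ on $[0,1]^N$ and the joint density reduces to the product of the marginals.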

With the copula density, Copula Entropy is defined as follows [10]:

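Definition 3 (Copula Entropy). Let $\mathbf{X}$ be random variables with marginal distributions $\mathbf{u}$ and copula density $c(\mathbf{u})$. The CE of $\mathbf{X}$ is defined as

$$H_c(\mathbf{x}) = -\int_{\mathbf{u}} c(\mathbf{u}) \log c(\mathbf{u}) \, d\mathbf{u}.$$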

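In information theory, MI and entropy are two different concepts [37]. In [10], Ma and Sun proved that they are essentially the same: MI is also a kind of entropy, namely negative CE, as stated in the following theorem.

Theorem 1. The MI of random variables is equivalent to their negative CE:

$$I(\mathbf{x}) = -H_c(\mathbf{x}).$$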

The proof of Theorem 1 is simple [10]. There is also an immediate corollary (Corollary 1) relating the information in the joint probability density function, the marginal density functions, and the copula density function:

Corollary 1.

$$H(\mathbf{x}) = \sum_{i} H(x_i) + H_c(\mathbf{x}).$$
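Both results follow in a few lines from the density form of Sklar's theorem, $p(\mathbf{x}) = c(\mathbf{u}) \prod_i p_i(x_i)$ with $u_i = F_i(x_i)$. Taking $-\mathbb{E}[\log(\cdot)]$ of both sides, and noting that $\mathbf{u}$ is distributed according to $c$, so that $-\mathbb{E}[\log c(\mathbf{u})] = H_c(\mathbf{x})$:

$$H(\mathbf{x}) = -\mathbb{E}\left[\log c(\mathbf{u})\right] - \sum_i \mathbb{E}\left[\log p_i(x_i)\right] = H_c(\mathbf{x}) + \sum_i H(x_i),$$

which is Corollary 1. Since the MI of $\mathbf{x}$ (in its multivariate, total-correlation form) is $I(\mathbf{x}) = \sum_i H(x_i) - H(\mathbf{x})$, substituting gives $I(\mathbf{x}) = -H_c(\mathbf{x})$, i.e., Theorem 1.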

The above results shed light on the relationships among entropy, MI, and copula through CE, and therefore build a bridge between information theory and copula theory. CE itself provides a mathematical theory for measuring statistical independence.
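Theorem 1 can be checked numerically in the Gaussian case, where the MI has the closed form $-\tfrac{1}{2}\log(1-\rho^2)$. The sketch below (a sanity check, not part of the estimation method; the function name and sample size are illustrative) estimates $H_c = -\mathbb{E}[\log c(\mathbf{u})]$ by Monte Carlo using the closed-form log-density of the bivariate Gaussian copula. Sampling $\mathbf{z}$ from the bivariate normal is equivalent to sampling $\mathbf{u} = \Phi(\mathbf{z})$ from the copula, so the inverse CDF is never needed.

```python
import numpy as np


def gaussian_copula_ce_mc(rho, n=200_000, seed=0):
    """Monte Carlo estimate of H_c = -E[log c(U)] for a bivariate Gaussian copula."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    z1, z2 = z[:, 0], z[:, 1]
    # log c(u) = log phi_2(z1, z2; rho) - log phi(z1) - log phi(z2), with z_j = Phi^{-1}(u_j)
    log_c = (-0.5 * np.log(1.0 - rho**2)
             - (rho**2 * (z1**2 + z2**2) - 2.0 * rho * z1 * z2) / (2.0 * (1.0 - rho**2)))
    return -log_c.mean()


rho = 0.7
ce = gaussian_copula_ce_mc(rho)
mi_exact = -0.5 * np.log(1.0 - rho**2)  # closed-form MI of a bivariate Gaussian
# Theorem 1 predicts ce to be approximately -mi_exact
```

The check also illustrates the sign convention: CE is non-positive and vanishes only for independent variables ($\rho = 0$ gives $\log c \equiv 0$).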

3.2 Estimation

Estimating MI is widely considered to be notoriously difficult. Building on Theorem 1, Ma and Sun [10] proposed a simple and elegant non-parametric method for estimating CE (MI) from data, which consists of only two steps:

1. Estimating Empirical Copula Density (ECD);

2. Estimating CE.

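For Step 1, given data samples $\{\mathbf{x}_1, \dots, \mathbf{x}_T\}$ generated i.i.d. from random variables $\mathbf{X} = \{x_1, \dots, x_N\}^T$, one can easily estimate the ECD as follows:

$$F_i(x_i) = \frac{1}{T} \sum_{t=1}^{T} \chi\left(\mathbf{x}_t^i \le x_i\right),$$

where $i = 1, \dots, N$ and $\chi$ represents the indicator function. Let $\mathbf{u} = [F_1, \dots, F_N]$; one can then derive a new sample set $\{\mathbf{u}_1, \dots, \mathbf{u}_T\}$ as data from the ECD $c(\mathbf{u})$. In practice, Step 1 can be easily implemented non-parametrically with rank statistics.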
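Step 1 reduces to column-wise ranking. A minimal sketch, assuming NumPy and SciPy are available (the function name is illustrative): `scipy.stats.rankdata` with `method="max"` returns, for each value, the number of samples less than or equal to it, which is exactly the indicator sum in the formula above, including in the presence of ties.

```python
import numpy as np
from scipy.stats import rankdata


def empirical_copula(x):
    """Step 1: map a (T, N) sample matrix to pseudo-observations
    u_t^i = F_i(x_t^i) = (1/T) * #{s : x_s^i <= x_t^i}."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D input as a single variable
    # method="max" counts how many samples are <= each value, matching the indicator sum
    return rankdata(x, method="max", axis=0) / x.shape[0]


x = np.array([[3.0, 10.0], [1.0, 30.0], [2.0, 20.0]])
u = empirical_copula(x)
# u == [[1, 1/3], [1/3, 1], [2/3, 2/3]]; unchanged under monotone maps such as np.exp
```

Because only ranks are used, the pseudo-observations (and hence CE) are invariant to strictly increasing transformations of each variable.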

Once the ECD is estimated, Step 2 is essentially an entropy estimation problem, for which many methods exist. Among them, the kNN method [38] was suggested in [10]. Combining rank statistics with the kNN method yields a non-parametric method for estimating CE that can be applied in any situation without assumptions on the underlying system.
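Putting the two steps together gives a complete CE estimator. The sketch below assumes the Kozachenko-Leonenko kNN entropy estimator under the maximum norm (the variant used in [38]) with an assumed default of $k = 3$ neighbours; it is a Python illustration of the procedure, not the code of the authors' R package 'copent' [42], and the function names are hypothetical. With the max-norm and neighbour distances doubled to diameters, the unit-ball volume constant equals 1.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma
from scipy.stats import rankdata


def entropy_knn(u, k=3):
    """Kozachenko-Leonenko kNN entropy estimate (nats) under the max-norm."""
    u = np.asarray(u, dtype=float)
    n, d = u.shape
    # column 0 of the query result is each point itself, so column k is its k-th neighbour
    eps = cKDTree(u).query(u, k=k + 1, p=np.inf)[0][:, k]
    # with the max-norm the ball-volume constant is 1 when using the diameter 2 * eps
    return digamma(n) - digamma(k) + d * np.mean(np.log(2.0 * eps))


def copula_entropy(x, k=3):
    """Step 1: rank-based empirical copula; Step 2: kNN entropy of the pseudo-observations."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    u = rankdata(x, method="max", axis=0) / x.shape[0]
    return entropy_knn(u, k)


rng = np.random.default_rng(1)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=2000)
mi_hat = -copula_entropy(z)  # Theorem 1: MI = -CE; the exact Gaussian MI is about 0.223 nats
```

Points duplicated in every coordinate would give a zero neighbour distance and a $\log 0$; this does not occur with continuous data, but tied discrete data would need jittering or a different estimator.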

4 TE via CE

In this section, we propose a method for estimating TE via CE. Before that, we derive a representation of TE with CE.

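Proposition 1. TE can be represented with only CE as

$$T_{X \to Y} = -H_c\left(y_{i+1}, \mathbf{y}_i^{(k)}, \mathbf{x}_i^{(l)}\right) + H_c\left(y_{i+1}, \mathbf{y}_i^{(k)}\right) + H_c\left(\mathbf{x}_i^{(l)}, \mathbf{y}_i^{(k)}\right) - H_c\left(\mathbf{y}_i^{(k)}\right). \tag{12}$$

Proof. The proof starts from the definition of TE (2). After expanding the definition, it can be directly transformed into an expression composed of four different CE terms.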

The last term $H_c(\mathbf{y}_i^{(k)})$ in (12) equals 0 if $k = 1$, since the CE of a single variable is zero.
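Written out, the expansion in the proof uses the entropy form of conditional MI, $I(A;B \mid C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)$, with $A = y_{i+1}$, $B = \mathbf{x}_i^{(l)}$, $C = \mathbf{y}_i^{(k)}$, and applies Corollary 1 to each joint entropy. Writing $S(\cdot)$ for the sum of the marginal entropies of the listed components (a shorthand introduced only for this derivation):

$$\begin{aligned}
H(A,C) &= S(A) + S(C) + H_c(A,C),\\
H(B,C) &= S(B) + S(C) + H_c(B,C),\\
H(A,B,C) &= S(A) + S(B) + S(C) + H_c(A,B,C),\\
H(C) &= S(C) + H_c(C).
\end{aligned}$$

In $H(A,C) + H(B,C) - H(A,B,C) - H(C)$ every $S(\cdot)$ term cancels, leaving

$$T_{X \to Y} = -H_c(A,B,C) + H_c(A,C) + H_c(B,C) - H_c(C),$$

which is (12) term by term. No marginal entropy has to be estimated, and no separate CE term for the cause appears, whatever the history length $l$.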

The above proposition shows that TE is a signed sum of four terms: the joint CE of the cause X and the past and future of the effect Y, the self-joint CE of Y, the association between the cause X and the past of the effect Y, and the joint CE of the past of Y (present only if it is multivariate). The second term removes the information of the self-dynamics of the effect from the joint CE, and the third term removes the information shared between the cause and the past of the effect.

With this representation, we propose a method for estimating TE via CE in two simple steps:

1. estimating the three or four CE terms in (12);

2. calculating TE from these estimated CEs.

CE can be estimated with the method in Section 3.2, which yields a non-parametric method for estimating TE. The proposed method inherits the merits of the CE estimator: it is model-free and tuning-free, with good convergence performance and a low computational burden.
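The two steps can be sketched in Python as follows, reusing the rank-plus-kNN CE estimator of Section 3.2 (repeated here so the block is self-contained). `te_via_ce` is a hypothetical helper, not the API of 'copent'; it assumes a single cause value ($l = 1$) and an alignment in which the cause $x_i$ is paired with the future $y_{i+\text{lag}}$ and the effect's past $(y_{i+\text{lag}-1}, \dots, y_{i+\text{lag}-k})$. The synthetic data drive $y$ by $x$ three steps earlier, so in theory TE is about 0.51 nats at lag 3 and zero at lag 1 and in the reverse direction.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma
from scipy.stats import rankdata


def copula_entropy(x, k=3):
    """CE estimate (nats): rank-based empirical copula + max-norm kNN entropy."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    if d == 1:
        return 0.0  # a single variable has a uniform copula, so its CE is 0
    u = rankdata(x, method="max", axis=0) / n
    eps = cKDTree(u).query(u, k=k + 1, p=np.inf)[0][:, k]
    return digamma(n) - digamma(k) + d * np.mean(np.log(2.0 * eps))


def te_via_ce(x, y, lag=1, hist=1, k=3):
    """Hypothetical helper: T_{X->Y} from the four CE terms of (12).

    Assumed alignment: future y_{i+lag}, effect past (y_{i+lag-1}, ...,
    y_{i+lag-hist}), and cause x_i (a single cause value, l = 1)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    i = np.arange(max(0, hist - lag), len(y) - lag)
    fut = y[i + lag][:, None]
    past = np.column_stack([y[i + lag - j] for j in range(1, hist + 1)])
    cause = x[i][:, None]
    return (-copula_entropy(np.hstack([fut, past, cause]), k)  # joint CE
            + copula_entropy(np.hstack([fut, past]), k)        # self-joint CE of Y
            + copula_entropy(np.hstack([cause, past]), k)      # cause vs. past of Y
            - copula_entropy(past, k))                         # past of Y (0 if hist == 1)


# Synthetic check: y is driven by x three steps earlier.
rng = np.random.default_rng(0)
n = 4000
x = rng.standard_normal(n)
y = rng.standard_normal(n)
y[3:] = 0.8 * x[:-3] + 0.6 * rng.standard_normal(n - 3)

te_lag3 = te_via_ce(x, y, lag=3)  # theory: -0.5 * log(1 - 0.64), about 0.51 nats
te_lag1 = te_via_ce(x, y, lag=1)  # theory: 0
te_rev = te_via_ce(y, x, lag=3)   # theory: 0 (x does not depend on y)
```

Finite-sample estimates of the zero cases come out slightly negative because kNN entropy estimates on the unit cube are biased near its boundary, and that bias does not cancel exactly across terms of different dimension.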

5 Experiments and Results

5.1 Data

The data used in our experiments is the Beijing PM2.5 dataset from the UCI machine learning repository [39], which concerns air pollution in Beijing. This hourly dataset contains PM2.5 measurements from the US Embassy in Beijing, together with meteorological data from Beijing Capital International Airport. The data has previously been analyzed at the monthly scale [40, 41].

The meteorological factors in the data include dew point, temperature, pressure, cumulated wind speed, combined wind direction, cumulated hours of snow, and cumulated hours of rain. The first four factors are analyzed in our experiments. The data was collected hourly from Jan. 1st, 2010 to Dec. 31st, 2014, giving 43824 samples, some of which have missing values. To avoid handling missing values, only the data from April 2nd, 2010 to May 14th, 2010 was used in our experiments, which contains 1000 samples without missing values.

In our experiments, we analyze the causal relationships between meteorological factors and PM2.5 at the hourly scale. Studying these relationships can help to understand the underlying mechanism of pollution generation and to build forecasting models of PM2.5.

5.2 Experiments

In the experiments, we estimated TE from meteorological factors to PM2.5 to measure how the former affect the latter after a lag of several hours. For (12), two cases are considered: $k = 1$ and $k = 4$, i.e., conditioning on the past one and four states respectively. The latter case is used to test the Markovianity of the time series. The time lags range from 1 to 24 hours. To investigate the relationship among the three CE terms in TE, we also estimated them from the data separately. To investigate the dynamic relationships among meteorological factors, we estimated TE for four pairs of factors: temperature to pressure and dew point, and cumulated wind speed to temperature and pressure. TE is estimated with the proposed non-parametric method. The non-parametric method for estimating CE is implemented in the R package 'copent' on CRAN [42].
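To make the two experimental settings concrete ($k = 1$ and $k = 4$, lags 1 to 24 hours), the hypothetical helper below aligns the samples that enter the CE terms of (12). The lag convention (future $y_{i+\tau}$, past $y_{i+\tau-1}, \dots, y_{i+\tau-k}$, cause $x_i$) is an assumption made for illustration. Using an index series as both inputs makes the alignment easy to inspect, since each value equals its own time index.

```python
import numpy as np


def lagged_design(x, y, lag, hist):
    """Hypothetical helper: align (future, past, cause) samples for TE at a given lag.

    Assumed convention: future y_{i+lag}, past (y_{i+lag-1}, ..., y_{i+lag-hist}), cause x_i."""
    x, y = np.asarray(x), np.asarray(y)
    i = np.arange(max(0, hist - lag), len(y) - lag)  # keep only fully observed rows
    fut = y[i + lag]
    past = np.column_stack([y[i + lag - j] for j in range(1, hist + 1)])
    cause = x[i]
    return fut, past, cause


# Index series: every value equals its time index, so offsets can be read off directly.
t = np.arange(1000)
for hist in (1, 4):          # the two conditioning settings in the text
    for lag in range(1, 25):  # lags of 1 to 24 hours
        fut, past, cause = lagged_design(t, t, lag, hist)
        # fut - cause == lag, and past[:, j - 1] == fut - j
```

With 1000 hourly samples, each setting still leaves well over 900 aligned rows even at a 24-hour lag.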

We also conducted an experiment comparing our method with two other methods for testing CI: kernel-based CI (KCI) [5] and CDC [35]. The three methods were compared on inferring causal relationships between meteorological factors and PM2.5 from data. The R packages 'CondIndTests' and 'cdcsis' are used in the experiments as the implementations of KCI and CDC, respectively.

Figure 1: TE of meteorological factors on PM2.5 at different time lags.

5.3 Results

The estimated TE of the four factors is shown in Figure 1. TE of all four meteorological factors increases sharply over the first 9 hours of lag; TE of temperature and cumulated wind speed peaks at a lag of 9 hours, while TE of the other two factors keeps increasing at a relatively slow rate. Broadly, the TE curves of dew point and pressure are similar, as are those of the other two factors. The three CE terms of the TE of temperature on PM2.5 are illustrated in Figure 2. Association strength measured by CE does not increase as causal strength measured by TE increases, and the increasing trend of TE is mostly contributed by the difference between the joint CE and the self-joint CE. TE between meteorological factors is shown in Figure 3. The influence of temperature on pressure and dew point increases with lag and takes more than 10 hours to reach its peak, whereas the influence of cumulated wind speed on temperature and pressure increases very quickly over the first 4-5 hours of lag.

Figure 2: CE terms of the TE of temperature on PM2.5.

Figure 3: TE between meteorological factors.

The comparison of TE, KCI, and CDC on estimating causality from pressure to PM2.5 is shown in Figure 4. TE and CDC present similar results, with an increasing phase followed by a slowly increasing phase, while the result of KCI does not show such a trend.

6 Discussion

In this paper, we developed a theory of representing TE with only CE. It is essentially a theory of testing CI through CE. In previous research [10], we defined CE as a measure of statistical independence. Together, these give a simple and elegant framework for testing both independence and CI based on CE alone: (conditional) independence can be tested by estimating CE. Since (conditional) independence is of fundamental importance in statistics [43], such a theoretical framework and estimation method are expected to have wide applications in related fields.

Figure 4: Estimated TE, CDC, and KCI between pressure and PM2.5.

Two similar frameworks for testing (conditional) independence were previously proposed, based on kernel methods in machine learning [32, 5] and on distance covariance/correlation [33, 34, 35]. Compared with these two frameworks, the CE-based framework is more theoretically sound owing to the rigorous definition of CE, and more computationally efficient owing to the simple method for estimating CE. As shown in Figure 4, our method estimates TE or CI more effectively than its competitors.

The CE-based representation of TE in Proposition 1 offers theoretical insight into how causality is measured, by attaching a causal meaning to each term. The joint CE can be interpreted as measuring all the causal effects on Y from the two time series; the self-joint CE as measuring the effect of the past of Y on its future, which corresponds to the causal dynamical mechanism of Y; and the association as the influence of possible common causes that affect both the cause X and the past of the effect Y. In short, TE can be interpreted as the difference between all the effects on Y and the effects of all factors except X on Y, i.e., the effect of X alone on Y.

Compared with previous work on estimating TE, the proposed method is more computationally efficient. Previous methods based on the definitions or on the partial copula usually require estimating (conditional) CDFs or PDFs explicitly, which is computationally unstable, especially with small samples and high dimensions. Our method is based on non-parametric CE estimation, which does not need to estimate (conditional) PDFs and thus has a good convergence guarantee. Like ours, previous work estimating TE with the kNN method [38] is also non-parametric. The difference is that our method rests on Proposition 1 and is therefore more computationally efficient, since it needs to estimate only three CE terms when k = 1, compared with four entropy terms in those works.

The application of TE requires the time series to be stationary. In our experiments, the data was collected in Beijing from April to May. Considering the weather conditions in Beijing during this period, stationarity of the data can be assumed. Another assumption of TE is Markovianity. In our experiments, only one past state (1 hour lag) is conditioned on, which amounts to assuming Markovianity of meteorological processes at the local (meso-gamma) scale [44]. Considering the distance between the two locations where the data were collected, we argue that this assumption is reasonable at this temporal and spatial scale in a meteorological sense. To validate this assumption, the TE of meteorological factors conditioned on the past four hours is also estimated, as shown in Figure 5. Comparing Figure 1 with Figure 5 shows that conditioning on more past hours does not change TE much and the trends of TE remain the same, which suggests that Markovianity is a reasonable assumption for the experiments.

The experimental results show that the effects of meteorological factors on PM2.5 increase in roughly two phases: a sharply increasing phase over the first 9 hours of lag, peaking at about 9 hours, and a flat phase during which TE of dew point and pressure increases at a relatively slow rate while TE of temperature and cumulated wind speed does not increase any more. This may mean that the effects of meteorological factors on PM2.5 do not appear immediately but accumulate through meteorological processes, and that they affect air quality about 9 hours later the most. We conjecture that this corresponds to an underlying dynamical mechanism of PM2.5 generation.

Figure 5: TE of meteorological factors at different time lags, conditioned on the states of the past four hours.

The experimental results also show clearly how meteorological factors affect each other. For example, Figure 3 shows that wind changes temperature and pressure very quickly in Beijing: more specifically, wind changes temperature 3 hours later and pressure 5 hours later. Comparing Figure 1(d) with Figures 3(c) and 3(d) shows that wind has a causal effect on temperature and pressure several hours sooner than on PM2.5. This may be explained by how entrained air is blended throughout the mixed layer, and may help to build meteorological forecasting models for air quality [44].

The above experimental results help to understand the data, with reasonable explanations in light of meteorological knowledge. This indicates that the proposed method can estimate TE effectively to infer causal relationships from observational data.

The results also show that association and causality are different. Figure 2 shows that even when the association between temperature and PM2.5 does not increase, their TE still increases clearly. This suggests that association (or correlation) alone is not suitable for investigating causal relationships between temporal factors.

Several studies have analyzed air pollution data with causality tools. Dahlhaus and Eichler [45] tried to infer GC within air pollution data. However, instead of GC itself, their tool for measuring causal relationships was partial correlation on time series, which makes implicit Gaussian assumptions and captures linear relationships only. Zhu et al. [46] applied TE to analyze the spatio-temporal causal relationships of air pollutants at different locations including Beijing. However, they also estimated TE under the Gaussian assumption, which is unreasonable both theoretically and empirically due to the non-linearity and non-Gaussianity of the weather system [47]. In contrast, our estimation method makes no assumption on the underlying distributions and therefore yields more reliable results. Another related work by Kreuzer et al. [41] applied copula to extend the Gaussian state space model for predicting air pollution with the same dataset as ours. However, although introducing copula into the state space model adds flexibility for handling non-linearity and non-Gaussianity, the state space model remains a questionable description of the dynamics of the underlying atmospheric system. For example, the state variables have no clear meaning in theory and are unobservable in practice. Moreover, selecting a parametric copula model brings the risk of model misspecification.

7 Conclusion

In this paper, we prove that TE can be represented with only CE and then propose a non-parametric method for estimating TE via CE, which consists of only two simple steps. The proposed method was applied to analyze the Beijing PM2.5 data in the experiments. Experimental results show that the proposed method can identify causal relationships from data, discover how meteorological factors affect PM2.5 and each other, and hence help to understand the data better and to build better forecasting models for time series data. The experiments comparing the proposed method with other methods for testing CI show the advantage of our method.

References

[1] Paul W Holland. Statistics and causal inference. Journal of the American statistical Association, 81(396):945–960, 1986.

[2] Clive WJ Granger. Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society, pages 424–438, 1969.

[3] Clive WJ Granger. Testing for causality: a personal viewpoint. Journal of Economic Dynamics and Control, 2:329–352, 1980.

[4] Thomas Schreiber. Measuring information transfer. Physical Review Letters, 85(2):461, 2000.

[5] Kun Zhang, Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. Kernel-based conditional independence test and application in causal discovery. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, UAI'11, page 804–813, Arlington, Virginia, USA, 2011. AUAI Press.

[6] Lionel Barnett, Adam B Barrett, and Anil K Seth. Granger causality and transfer entropy are equivalent for gaussian variables. Physical Review Letters, 103(23):238701, 2009.

[7] James Massey. Causality, feedback and directed information. In Proc. Int. Symp. Inf. Theory Applic.(ISITA-90), pages 303–305. Citeseer, 1990.

[8] Roger B Nelsen. An introduction to copulas. Springer Science & Business Media, 2007.

[9] Harry Joe. Dependence modeling with copulas. CRC press, 2014.

[10] Jian Ma and Zengqi Sun. Mutual information is copula entropy. Tsinghua Science & Technology, 16(1):51–54, 2011.

[11] Jian Ma. Discovering association with copula entropy. arXiv preprint arXiv:1907.12268, 2019.

[12] Luca Faes, Giandomenico Nollo, and Alberto Porta. Compensated transfer entropy as a tool for reliably estimating information transfer in physiological time series. Entropy, 15(1):198–219, 2013.

[13] Raul Vicente, Michael Wibral, Michael Lindner, and Gordon Pipa. Transfer entropy—a model-free measure of effective connectivity for the neurosciences. Journal of Computational Neuroscience, 30(1):45–67, 2011.

[14] Ioannis Kontoyiannis and Maria Skoularidou. Estimating the directed information and testing for causality. IEEE Transactions on Information Theory, 62(11):6053–6067, 2016.

[15] Taoufik Bouezmarni, Jeroen VK Rombouts, and Abderrahim Taamouti. Nonparametric copula-based test for conditional independence with applications to granger causality. Journal of Business & Economic Statistics, 30(2):275–287, 2012.

[16] Meng Hu and Hualou Liang. A copula approach to assessing granger causality. NeuroImage, 100:125–134, 2014.

[17] Kyungchul Song. Testing conditional independence via rosenblatt transforms. The Annals of Statistics, 37(6B):4011–4045, 2009.

[18] Noël Veraverbeke, Marek Omelka, and Irene Gijbels. Estimation of a conditional copula and association measures. Scandinavian Journal of Statistics, 38(4):766–780, 2011.

[19] Aleksander Wieczorek and Volker Roth. Causal compression. arXiv preprint arXiv:1611.00261, 2016.

[20] Jong-Min Kim, Namgil Lee, and Sun Young Hwang. A copula nonlinear granger causality. Economic Modelling, 88:420–430, 2020.

[21] Ruifei Cui, Perry Groot, and Tom Heskes. Learning causal structure from mixed data with missing values using gaussian copula models. Statistics and Computing, 29(2):311–333, 2019.

[22] Sashank J Reddi and Barnabás Póczos. Scale invariant conditional dependence measures. In International Conference on Machine Learning, pages 1355–1363. PMLR, 2013.

[23] Wicher P. Bergsma. Testing conditional independence for continuous random variables. Report Eurandom, 2004048, 2004.

[24] Wicher Bergsma. Nonparametric testing of conditional independence by means of the partial copula. Available at SSRN 1702981, 2010.

[25] Pascal Bianchi, Kevin Elgui, and François Portier. Conditional independence testing via weighted partial copulas. arXiv preprint arXiv:2006.12839, 2020.

[26] Lasse Petersen and Niels Richard Hansen. Testing conditional independence via quantile regression based partial copulas. arXiv preprint arXiv:2003.13126, 2020.

[27] Lorenzo Frattarolo and Dominique Guegan. Empirical projected copula process and conditional independence an extended version. 2013.

[28] Stefan Frenzel and Bernd Pompe. Partial mutual information for coupling analysis of multivariate time series. Physical Review Letters, 99(20):204101, 2007.

[29] Martin Vejmelka and Milan Paluš. Inferring the directionality of coupling with conditional mutual information. Physical Review E, 77(2):026214, 2008.

[30] Barnabás Póczos and Jeff Schneider. Nonparametric estimation of conditional information and divergences. In Artificial Intelligence and Statistics, pages 914–923. PMLR, 2012.

[31] Jakob Runge. Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information. In International Conference on Artificial Intelligence and Statistics, pages 938–947. PMLR, 2018.

[32] Arthur Gretton, Kenji Fukumizu, Choon H. Teo, Le Song, Bernhard Schölkopf, and Alex J. Smola. A kernel statistical test of independence. In Advances in Neural Information Processing Systems 20, volume 20, pages 585–592, 2007.

[33] Gábor J. Székely, Maria L. Rizzo, and Nail K. Bakirov. Measuring and testing dependence by correlation of distances. Annals of Statistics, 35(6):2769–2794, 2007.

[34] Gábor J. Székely and Maria L. Rizzo. Brownian distance covariance. The Annals of Applied Statistics, 3(4):1236–1265, 2009.

[35] Xueqin Wang, Wenliang Pan, Wenhao Hu, Yuan Tian, and Heping Zhang. Conditional distance correlation. Journal of the American Statistical Association, 110(512):1726–1734, 2015.

[36] M Sklar. Fonctions de répartition à n dimensions et leurs marges. Publ. inst. statist. univ. Paris, 8:229–231, 1959.

[37] Thomas M Cover. Elements of information theory. John Wiley & Sons, 1999.

[38] Alexander Kraskov, Harald Stögbauer, and Peter Grassberger. Estimating mutual information. Physical Review E, 69(6):066138, 2004.

[39] Arthur Asuncion and David Newman. UCI machine learning repository, 2007.

[40] Xuan Liang, Tao Zou, Bin Guo, Shuo Li, Haozhe Zhang, Shuyi Zhang, Hui Huang, and Song Xi Chen. Assessing Beijing's PM2.5 pollution: severity, weather impact, APEC and winter heating. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 471(2182):20150257, 2015.

[41] Alexander Kreuzer, Luciana Dalla Valle, and Claudia Czado. A bayesian non-linear state space copula model to predict air pollution in Beijing. arXiv preprint arXiv:1903.08421, 2019.

[42] Jian Ma. copent: Estimating copula entropy in R. arXiv preprint arXiv:2005.14025, 2020.

[43] A. P. Dawid. Conditional independence in statistical theory. Journal of the Royal Statistical Society Series B-Methodological, 41(1):1–15, 1979.

[44] Nelson L Seaman. Meteorological modeling for air-quality assessments. Atmospheric Environment, 34(12-14):2231–2259, 2000.

[45] Rainer Dahlhaus and Michael Eichler. Causality and graphical models in time series analysis. Oxford Statistical Science Series, pages 115–137, 2003.

[46] Julie Yixuan Zhu, Yu Zheng, Xiuwen Yi, and Victor OK Li. A gaussian bayesian model to identify spatio-temporal causalities for air pollution based on urban big data. In 2016 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pages 3–8. IEEE, 2016.

[47] Philip Sura, Matthew Newman, Cécile Penland, and Prashant Sardeshmukh. Multiplicative noise and non-gaussianity: A paradigm for atmospheric regimes? Journal of the Atmospheric Sciences, 62(5):1391–1409, 2005.
