Minimum entropy production in multipartite processes due to neighborhood constraints

2020·Arxiv

Abstract

Abstract

I consider multipartite processes in which there are constraints on each subsystem’s rate matrix, restricting which other subsystems can directly affect its dynamics. I derive a strictly nonzero lower bound on the minimal achievable entropy production rate of the process in terms of these constraints on the rate matrices of its subsystems. The bound is based on constructing counterfactual rate matrices, in which some subsystems are held fixed while the others are allowed to evolve. This bound is related to the “learning rate” of stationary bipartite systems, and more generally to the “information flow” in bipartite systems.

Introduction.— Many systems are naturally modeled as having two or more interacting subsystems. Recent research in stochastic thermodynamics [7, 21, 26, 29] has started to investigate such composite systems [1, 9, 11, 12, 18–20]. So far, most of the research has been on the special case of bipartite processes, i.e., systems composed of two co-evolving subsystems, which have zero probability of making a state transition simultaneously [1, 2, 9, 11, 12, 18–20, 22, 23]. However, given that many systems have more than just two interacting subsystems, research is starting to extend to fully multipartite processes [10, 12, 30].

The definition of any composite system specifies which subsystems directly affect the dynamics of which other subsystems. It is now known that just by itself, such a specification of which subsystem affects which other one can cause a strictly positive lower bound on the entropy production rate (EP) of the overall composite system [4, 27, 29]. This minimal EP has sometimes been called “Landauer loss”, because it is the extra EP beyond the minimal amount (namely, zero) implicit in the Landauer bound [27–29]

Previous analyses of Landauer loss focused on scenarios where every subsystem evolves in isolation, without any direct coupling to the other subsystems. This is a severe limitation of those analyses. As an illustration, consider a composite system with three subsystems A,B and C. B evolves independently of A and C. However, B is continually observed by C as well as A. Moreover, suppose that A is really two subsystems, 1 and 2. Only subsystem 2 directly observes B, whereas subsystem 1 observes subsystem 2, e.g., to record a running average of the values of subsystem 2 (see Fig. 1).

There has been some work on a simplified version of this scenario, in which subsystem 4 is absent and subsystem 3 is required to be at equilibrium [3, 9]. But this work has focused on issues other than the minimal EP.

To investigate Landauer loss in these kinds of composite systems, here I model them as multipartite processes, in which each subsystem evolves according to its own rate matrix [10]. So restrictions on the direct coupling of

FIG. 1. Four subsystems, {1,2,3,4} interacting in a multipartite process. The red arrows indicate dependencies in the associated four rate matrices. B evolves autonomously, but is continually observed by A and C. So the statistical coupling between A and C could grow with time, even though their rate matrices do not involve one another. The three overlapping sets indicated at the bottom of the figure specify the three communities of a community structure for this process.

any subsystem i to the other subsystems are modeled as restrictions on the rate matrix of subsystem i, to only involve a limited set of other subsystems, called the “community” of i. (These are instead called “neighborhoods” in [10], but that expression already means something in topology, and so I don’t use it here.)

In this paper I derive a lower bound on the Landauer loss rate of composite systems, by deriving an exact equation for that minimal EP rate as a sum of non-negative expressions. One of those expressions is related to quantities that were earlier considered in the literature. It reduces to what has been called the “learning rate” in the special case of stationary bipartite systems [1, 5, 9]. That expression is also related to what (in a different context) has been called the “information flow” between a pair of subsystems [10, 11].

lar set of N subsystems, with finite state spaces {Xi : i = 1,...N}. x indicates a vector in X, the joint space of N . For any A ⊂ N , I write −A := N \A. So for example x−A is the vector of all components of x other than those in A. A distribution over a set of values x at time t is written as pX(t), with its value for x ∈ X written as pXx (t), or just px(t) for short. Similarly, pX|Yx,y (t) is the conditional dis- tribution of X given Y at time t, evaluated for the event X = x,Y = y (which I sometimes shorten to px|y(t)). I write Shannon entropy as S(pX(t)), St(X), or SX(t), as convenient. I also write the conditional entropy of X given Y at t as SX|Y(t). I write the Kronecker delta as both δ(a,b) or δab.

The joint system evolves as a multi-partite process, there is a set of time-varying stochastic rate matrices, {Kx′x (i;t) : i = 1,...,N}, where for all i, Kx′x (i;t) = 0 if x′−i ̸= x−i, and where the joint dynamics over X is gov- erned by the master equation

Note that each subsystem can be driven by its own external work reservoir, according to a time-varying protocol. For any A ⊆ N I define

Each subsystem i’s marginal distribution evolves as

due to the multipartite nature of the process [14]. Eq. (5) shows that in general the marginal distribution pxi will not evolve according to a continuous-time Markov chain (CTMC) over ∆Xi.

For each subsystem i, I write r(i;t) for any set of subsystems at time t that includes i where we can write

for an appropriate set of functions K x′r(i;t) xr(i;t) (i;t). In gen-

eral, r(i;t) is not uniquely defined, since I make no re-

quirement that it be minimal. I refer to the elements of r(i;t) as the leaders of i at time t. Note that the leader relation need not be symmetric. A community ω at time t is a set of subsystems such that i ∈ ω implies that r(i;t) ⊆ ω. Any intersection of two communities is a community, as is any union of two communities. Intuitively, a community is any set of subsystems whose evolution is independent of the states of the subsystems outside the community (although in general, the evolution of those external subsystems may depend on the states of subsystems in the community).

A specific set of communities that covers N and is closed under intersections is a community structure. A community topology is a community structure that is closed under unions, with the communities of the structure being the open sets of the topology. However, in general, unless explicitly stated otherwise, any community structure being discussed does not have N itself as a member.

As an example of these definitions, [1, 8, 9] investigate a special type of bipartite system, where the “internal” subsystem B observes the “external” subsystem A, but cannot affect the dynamics of that external subsystem. So A is its own community, evolving independently of B, while B is not its own community; its dynamics depends on the state of A as well as its own state. Another example of these definitions is illustrated in Fig. 1.

For simplicity, from now on I assume that the set of communities doesn’t change with t. Accordingly I shorten r(i;t) to r(i). For any community ω I write

(See SI.) So a community evolves according to a self-contained CTMC, in contrast to the general case of a single subsystem (cf. Eq. (5)).

I assume that each subsystem is attached to at most one thermal reservoir, and that all such reservoirs have the same temperature [10]. Accordingly, the expected entropy flow (EF) rate of any community ω ⊆ N at time t is

which I often shorten to ⟨ ˙Qω(t)⟩ [7, 26]. (Note that this is entropy flow from ω into the environment.) Make the associated definition that the expected EP rate of ω at time t is

which I often shorten to ⟨ ˙σω(t)⟩.

I refer to ⟨ ˙σω(t)⟩ as a local EP rate, and define the global EP rate as ⟨ ˙σ(t)⟩ := ⟨ ˙σN (t)⟩. For any community ω, ⟨ ˙σω(t)⟩ ≥ 0, since ⟨ ˙σω(t)⟩ has the usual form of an EP rate of a single system. In addition, that lower bound of 0 is achievable, e.g., if Kx′ωxω (ω;t)px′ω(t) = Kxωx′ω (ω;t)pxω(t) at time t for all xω,x′ω.

It is worth comparing the local EP rate to similar quantities that have been investigated in the literature. In contrast to ⟨ ˙σω(t)⟩, the quantity “σX” introduced in the analysis of (autonomous) bipartite systems in [23] is the EP of a single trajectory, integrated over time. More importantly, its expectation can be negative, unlike (the time-integration of) ⟨ ˙σω(t)⟩. On the other hand, the quantity “ ˙SXi ” considered in the analysis of bipartite systems in [11] is a proper expected EP rate, and so is non-negative. However, it (and its extension considered in [10]) is one term in a decomposition of the expected EP rate generated by a single community. It does not concern the EP rate of an entire community in a system with multiple communities. Finally, the quantity “σΩ” considered in [23] is also non-negative. However, it gives the total EP rate generated by a subset of all possible global state transitions, rather than the EP rate of a community [15].

EP bounds from counterfactual rate matrices.— To analyze the minimal EP rate in multipartite processes, we need to introduce two more definitions. First, given any function f : ∆X → R and any A ⊂ N (not necessarily a community), define the A-(windowed) derivative of f (p(t)) under rate matrix K(t) as

(See Eq. (3).) Intuitively, this is what the derivative of f (p(t)) would be if (counterfactually) only the subsystems in A were allowed to change their states.

In particular, the A-derivative of the conditional entropy of X given XA is

which I sometimes write as just dAdt SX|XA(t). (See Eq. 4

in [10] for a similar quantity.) dAdt SX|XA(t) measures how quickly the statistical coupling between XA and X−A changes with time, if rather than evolving under the actual rate matrix, the system evolved under a counterfactual rate matrix, in which x−A is not allowed to change. In the SI it is shown that in the special case that A is

a community, dAdt SX|XA(t) is the derivative of the neg- ative mutual information between XA and X−A, under the counterfactual rate matrix K(A;t), and is therefore non-negative [16].

The second definition we need is a variant of ⟨ ˙σω;K(t)⟩, which will be indicated by using subscripts rather than superscripts. For any A ⊆ B ⊆ N where B is a community (but A need not be),

which I abbreviate as ⟨ ˙σK(A;t)(t)⟩ when B = N . ⟨ ˙σK(A;t)(t)⟩ is a global EP rate, only evaluated under the counterfactual rate matrix K(A;t). Therefore it is non-negative. In contrast, ⟨ ˙σω;K(t)⟩ is a local EP rate. In the special case that A = ω is a community, these two EP rates are related by ⟨ ˙σK(A;t)(t)⟩ = ⟨ ˙σA;K(t)⟩ + dAdt SX|XA(t) (see Eq. (B3) in the SI).

In the SI it is shown that for any pair of communities, ω and ω′ ⊂ ω,

(See Fig. 1 for an illustration of such a pair of communities ω,ω′ ⊂ ω.) The first term on the RHS is the EP rate arising from the subsystems within community ω′, and the second term is the “left over” EP rate from the subsystems that are in ω but not in ω′. The third term is a time-derivative of the conditional entropy between those two sets of subsystems. All three of these terms are non-negative, so each of them provides a lower bound on the EP rate.

Eq. (15) is the major result of this paper. In particular, setting ω = N and then consolidating notation by rewriting ω′ as ω, Eq. (15) shows that for any community ω,

(where the shorthand notation has been used).

As an example of Eq. (17), consider again the type of bipartite process analyzed in [1, 8, 9]. Suppose we set ω to contain only what in [8] is called the “external” subsystem. Then if we also make the assumption of those papers that the full system is in a stationary state, dSX/dt = dSXω/dt = dSX−ω/dt = 0. So by Eq. (13),

(The RHS is called the “learning rate” of the internal subsystem about the external subsystem — see Eq. (8) in [5], and note that the rate matrix is normalized.)

So in this scenario, Eq. (17) above reduces to Eq. 7 of [1], which lower-bounds the global EP rate by the learning rate. However, Eq. (17) lower-bounds the global EP rate even if the system is not in a stationary state, which need be the case with the learning rate [17]. More generally, Eq. (16) applies to arbitrary multipartite processes, not just those with two subsystems, and is an exact equality rather than just a bound.

In some situations we can get an even more refined decomposition of EP rate by substituting Eq. (15) into Eq. (16) to expand the first EP rate on the RHS of Eq. (16). This gives a larger lower bound on ⟨ ˙σ(t)⟩ than the one in Eq. (17). For example, if ω and ω′ ⊂ ω are both communities under K(t), then

Both of the terms on the RHS in Eq. (20) are non-negative. In addition, both can be evaluated without knowing the detailed physics occurring within communities ω or ω′, only knowing how the statistical coupling between communities evolves with time.

This can be illustrated with the scenario depicted in Fig. 1. Using the communities ω and ω′ specified there, Eq. (20) says that the global EP rate is lower-bounded by the sum of two terms. The first is the derivative of the negative mutual information between subsystem 4 and the first three subsystems, if subsystem 4 were held fixed. The second is the derivative of the negative mutual information between subsystem 3 and the first two subsystems, if those two subsystems were held fixed.

Alternatively, suppose that ω is a community under K, and that some set of subsystems α is a community under K(N \ ω;t). Then since the term ⟨ ˙σK(N \ω;t)(t)⟩ in Eq. (16) is a global EP rate over N under rate matrix K(N \ω;t), we can again feed Eq. (15) into Eq. (16), (this time to expand the second rather than first term on the RHS of Eq. (16)) to get

The RHS of Eq. (22) also exceeds the bound in Eq. (17), by the negative α-derivative of the mutual information between XN \α and Xα, under the rate matrix K(N \ω;t).

Example.— Depending on the full community structure, we may be able to combine Eqs. (15) and (21) into an even larger lower bound on the global EP rate than Eq. (22). To illustrate this, return to the scenario depicted in Fig. 1. Take ω = {1,2,3} and α = {3,4}, as indicated in that figure. Note that the four sets {1},{2},{3},{3,4} form a community structure of

since under Kx′x ({4};t), neither subsystem 1,2 nor 3 changes its state. So α is a member of a community structure of K(N \ ω;t), and we can apply Eq. (21).

The first term in Eq. (21), ⟨ ˙σω(t)⟩, is the local EP rate that would be jointly generated by the set of three subsystems {1,2,3), if they evolved in isolation from the other subsystem, under the self-contained rate matrix

The third term in Eq. (21) is the local EP rate that would be jointly generated by the two subsystems {3,4}, if they evolved in isolation from the other two subsystems, but rather than do so under the rate matrix K(α;t) = K({3,4};t), they did so under the rate matrix Kx′x (N \ω;t) given in Eq. (23). (Note that Kx′x (N \ω;t) = 0 if x′3 ̸= x3, unlike Kx′x ({3,4};t).) The fourth term in Eq. (21) is the global EP rate that would be generated by evolving all four subsystems under the rate matrix for the subsystems in (N \ ω) \ α. But there are no subsystems in that set. So this fourth term is zero.

Both that first and third term in Eq. (21) are non-negative. The remaining two terms – the second and the fifth in Eq. (21) — are also non-negative. However, in contrast to the terms just discussed, these two depend only on derivatives of mutual informations. Specifically, the second term in Eq. (21) is the negative derivative of the mutual information between the joint random variable X1,2,3 and X4, under the rate matrix Kx′x ({1,2,3};t). Next, since N \ α = {1,2}, the fifth term is the negative derivative of the mutual information between X1,2 and X3,4, under the rate matrix given by windowing α onto K(N \ ω;t), i.e., under the rate matrix Kx′x ({4};t).

Recalling that ω := {1,2,3},α := {3,4} and defining γ := {4}, we can combine these results to express the global EP rate of the system illustrated in Fig. 1 in terms of the rate matrices of the four subsystems:

All five terms on the RHS of Eq. (25) are non-negative. Translated to this scenario, previous results concerning learning rates consider the special case of a stationary state px(t), and only tell us that the global EP rate is bounded by the fourth term on the RHS of Eq. (25):

Finally note that we also have a community ω′ = {3} which is a proper subset of both ω and α. So, for example, we can plug this ω′ into Eq. (15) to expand the first term in Eq. (21), ⟨ ˙σω;K(ω;t)(t)⟩, replacing it with the sum of three terms. The first of these three new terms, ⟨ ˙σω′;K(ω;t)(t)⟩, is the local EP rate generated by subsystem {3} evolving in isolation from all the other subsystems. The second of these new terms, ⟨ ˙σK(ω\ω′;t);ω(t)⟩, is the EP rate that would be generated if the set of three subsystems {1,2,3} evolved in isolation from the remaining subsystem, 4, but under the rate matrix

The third new term is the negative derivative of the mutual information between X1,2 and X3, under rate matrix K(ω;t). All three of these new terms are non-negative.

Discussion.— There are other decompositions of the global EP rate which are of interest, but don’t always provide non-negative lower-bounds on the EP rate. One of them based on the inclusion-exclusion principle is discussed in Appendix E. Future work involves combining these (and other) decompositions, to get even larger lower bounds.

I would like to thank Sosuke Ito, Artemy Kolchinsky, Kangqiao Liu, Alec Boyd, Paul Riechers, and especially Takahiro Sagawa for stimulating discussion. This work was supported by the Santa Fe Institute, Grant No. CHE-1648973 from the US National Science Foundation and Grant No. FQXi-RFP-IPW-1912 from the FQXi foundation. The opinions expressed in this paper are those of the author and do not necessarily reflect the view of the National Science Foundation.

[1] Andre C. Barato, David Hartich, and Udo Seifert, Effi-ciency of cellular information processing, New Journal of Physics 16 (2014), no. 10, 103024.

[2] Gili Bisker, Matteo Polettini, Todd R Gingrich, and Jordan M Horowitz, Hierarchical bounds on entropy production inferred from partial information, Journal of Statistical Mechanics: Theory and Experiment 2017 (2017), no. 9, 093210.

[3] Stefano Bo, Marco Del Giudice, and Antonio Celani, Thermodynamic limits to information harvesting by sensory systems, Journal of Statistical Mechanics: Theory and Experiment 2015 (2015), no. 1, P01014.

[4] Alexander B Boyd, Dibyendu Mandal, and James P Crutchfield, Thermodynamics of modularity: Structural costs beyond the landauer bound, Physical Review X 8 (2018), no. 3, 031036.

[5] Rory A Brittain, Nick S Jones, and Thomas E Ouldridge, What we learn from the learning rate, Journal of Statistical Mechanics: Theory and Experiment 2017 (2017), no. 6, 063502.

[6] Thomas M. Cover and Joy A. Thomas, Elements of information theory, John Wiley & Sons, 2012.

[7] Massimiliano Esposito and Christian Van den Broeck, Three faces of the second law. i. master equation formulation, Physical Review E 82 (2010), no. 1, 011143.

[8] D. Hartich, A. C. Barato, and U. Seifert, Stochastic thermodynamics of bipartite systems: transfer entropy inequalities and a Maxwell’s demon interpretation, Journal of Statistical Mechanics: Theory and Experiment 2014 (2014), no. 2, P02016.

[9] David Hartich, Andre C. Barato, and Udo Seifert, Sensory capacity: an information theoretical measure of the performance of a sensor, Physical Review E 93 (2016), no. 2, arXiv: 1509.02111.

[10] Jordan M. Horowitz, Multipartite information flow for multiple Maxwell demons, Journal of Statistical Mechanics: Theory and Experiment 2015 (2015), no. 3, P03006.

[11] Jordan M Horowitz and Massimiliano Esposito, Thermodynamics with continuous information flow, Physical Review X 4 (2014), no. 3, 031015.

[12] Sosuke Ito and Takahiro Sagawa, Information thermodynamics on causal networks, Physical review letters 111 (2013), no. 18, 180603.

[13] William McGill, Multivariate information transmission, Transactions of the IRE Professional Group on Information Theory 4 (1954), no. 4, 93–111.

[14] To see this, note that if x′i ̸= xi, then the only way for Kx′x (j;t)px′(t) to be nonzero is if x′−i = x−i and j = i. If instead x′i = xi, j can differ from i. However, if j ̸= i then the sum over x−i in Eq. (4) runs over all values of xj. By normalization of the rate matrix Kx′x (j;t), that sum must equal zero.

[15] It is possible to choose that subset of state transitions so that σΩ concerns all transitions in which one particular subsystem changes state while all others do not. In this case, like the quantity ˙SXi considered in [11], σΩ is a single term in the decomposition of the EP rate generated by a single community.

[16] In [10, 11], the A-derivative of the mutual information between A and N \ A, dAI(XA;XN \A)/dt, is interpreted as the “information flow” from N \ A to A. (See Eq. 12 in [10].) However, in the scenarios considered in this paper A will always be a community. Therefore none of the subsystems in A will evolve in a way directly dependent on the state of any subsystem in N \ A (nor vice-versa). So the fact that dAI(XA;XN \A)/dt < 0 will not indicate that information in A concerning N \ A “flows” between A and N \A in any sense. In the current context, it would be more accurate to refer to −dAS(X|XA)/dt as the “forgetting rate” of A concerning N \ A, than as “information flow”. dAI(XA;XN \A)/dt is also related to what is termed “nostalgia” in [24]. However, that paper considers discrete-time rather than continuous-time processes,

where subsystems are required to start in thermal equilibrium.

[17] Recall from the discussion above of the scenario consid- ered in [1] that while the external subsystem ω is its own community, the internal subsystem is not. This means that in general, if the full system is not in a stationary state, then the learning rate of the internal subsystem about the external subsystem (as defined in [1]) cannot be expressed as dωSX|Xω(t)/dt with ω being a community.

[18] Juan MR Parrondo, Jordan M Horowitz, and Takahiro Sagawa, Thermodynamics of information, Nature Physics 11 (2015), no. 2, 131–139.

[19] Takahiro Sagawa and Masahito Ueda, Second law of thermodynamics with discrete quantum feedback control, Physical review letters 100 (2008), no. 8, 080403.

[20] , Minimal energy cost for thermodynamic information processing: measurement and information erasure, Physical review letters 102 (2009), no. 25, 250602.

[21] Udo Seifert, Stochastic thermodynamics, fluctuation theorems and molecular machines, Reports on Progress in Physics 75 (2012), no. 12, 126001.

[22] Ito S. Kawaguchi K. Shiraishi, N. and T. Sagawa, Role of measurement-feedback separation in autonomous maxwell?s demons, New Journal of Physics (2015).

[23] Sagawa T. Shiraishi, N., Fluctuation theorem for partially masked nonequilibrium dynamics, Physical Review E (2015).

[24] Susanne Still, David A Sivak, Anthony J Bell, and Gavin E Crooks, Thermodynamics of prediction, Physical review letters 109 (2012), no. 12, 120604.

[25] Hu Kuo Ting, On the amount of information, Theory of Probability & Its Applications 7 (1962), no. 4, 439–447.

[26] Christian Van den Broeck and Massimiliano Esposito, Ensemble and trajectory thermodynamics: A brief introduction, Physica A: Statistical Mechanics and its Applications 418 (2015), 6–16.

[27] David Wolpert and Artemy Kolchinsky, The thermodynamics of computing with circuits, New Journal of Physics (2020).

[28] David H. Wolpert, Overview of information theory, computer science theory, and stochastic thermodynamics for thermodynamics of computation, Energetics of computing in life and machines (David H. Wolpert, Christopher P Kempes, Peter Stadler, and Josh Grochow, eds.), Santa Fe Institute Press, 2019.

[29] , The stochastic thermodynamics of computation, Journal of Physics A: Mathematical and Theoretical (2019).

[30] David H. Wolpert, Christopher P Kempes, Peter Stadler, and Josh Grochow (eds.), The energetics of computing in life and machines, Santa Fe Institute Press, 2019.

Write

If j ̸∈ ω, then a sum over all x−ω in particular runs over all xj. Therefore we get

Using the fact that we have a multipartite process and then the fact that ω is a community, we can expand this remaining expression as

To complete the proof plug in the definition of Kx′ωxω (ω;t).

Appendix B: Expansions of EP rates in multipartite processes

Lemma 1. Suppose we have a multipartite process over a set of systems N defined by a set of rate matrices {Kx′x (i;t)} and a subset A ∈ N . Then

� =� x,x′ Kx′x (A;t)px′(t)ln�Kx′x (A;t)Kxx′(A;t)

= � i∈A,x,x′ Kx′x (i;t)px′(t)ln�Kx′x (i;t)Kxx′(i;t)

If in addition A is a community under K, then we can also write the quantity in Eq. (B1) as

K x′A xA (A;t)px′A(t)ln

Proof. Invoking the multipartite nature of the process allows us to write

� = � i∈A,x′,x Kx′x (i;t)px′(t)ln�Kx′x (t)Kxx′(t)

�j Kx′i,x−ixi,x−i (j;t) �j Kxi,x−ix′i,x−i (j;t)

�j∈A Kx′i,x−ixi,x−i (j;t)

�j∈A Kxi,x−ix′i,x−i (j;t)

Eq. (B4) establishes Eq. (B1) and Eq. (B5) establishes Eq. (B2). To establish Eq. (B3), use the hypothesis that A is a community to expand

� =� x,x′ Kx′x (A;t)δx′−Ax−Apx′(t)ln

=� x,x′ K x′A xA (A;t)δx′−Ax−Apx′(t)ln

If A is a community, then

− � x,x′ Kx′x (A;t)px′(t)lnpxA(t) = −� xA,x′A

= ddt SXA(t) (C1)

We can combine this with Eq. (13) to expand

dA dt SX|XA(t) = ddt SX(t) − ddt SXA(t) (C2)

(Note that this expansion need not hold if A is not a community.) Suppose we could also establish that because subsystems outside of A don’t evolve under K(A;t), then SX−A(t) doesn’t change in time, i.e., that

dA dt SX−A(t) = 0 (C3)

This would then imply that

dA dt SX|XA(t) = dAdt IX−A|XA(t) (C4)

the windowed time-derivative of the mutual information between the communities in A and those outside of it. However, SX−A(t) is given by marginalizing pX(t) down to the subsystems in −A, by averaging over xA. In general, if A is not a community, those subsystems are statistically coupled with the ones in A. So as the subsystems in A evolve, SX−A(t) might change, i.e., Eq. (C3) may not hold.

This turns out not to be a problem when A is a community. To see this, first simplify notation by using P rather than p to indicate joint distributions that would evolve if K(t) were replaced by the counterfactual rate matrix K(A;t), starting from px(t). By definition,

δxA(t),x−A(t)xA(t+δt),x−A(t+δt) − P (xA(t + δt),x−A(t + δt) | xA(t),x−A(t)) δt (C5)

However, since by hypothesis A is a community,

KxA(t),x−A(t)xA(t+δt),x−A(t+δt)(A;t) = KxA(t)xA(t+δt)(A;t)δx−A(t)x−A(t+δt) (C6)

Plugging this into Eq. (C5) and summing both sides over xA(t + δt) shows that to leading order in δt,

P (x−A(t + δt) | xA(t),x−A(t)) = δx−A(t+δt)x−A(t) (C7)

Eq. (C7) in turn implies that to leading order in δt,

P(x(t + δt) | x(t)) = P (xA(t + δt) | x−A(t + δt),xA(t),x−A(t))P (x−A(t + δt) | xA(t),x−A(t)) (C8)

= P (xA(t + δt) | xA(t),x−A(t),x−A(t + δt) = x−A(t))δx−A(t+δt)x−A(t) (C9)

This formalizes the statement in the text that under the rate matrix K(A), x−A does not change its state. Next, since A is a community under K(A;t), we can expand further to get

P(xA(t + δt),x−A(t + δt) | xA(t),x−A(t)) = P (xA(t + δt) | xA(t))δx−A(t+δt)x−A(t) (C10)

So the full joint distribution is

P(xA(t + δt),x−A(t + δt),xA(t),x−A(t)) = P (xA(t + δt) | xA(t))δx−A(t+δt)x−A(t) P(xA(t),x−A(t)) (C11)

We can use this form of the joint distribution to establish the following two equations

SP(X−A(t + δt) | X−A(t),XA(t + δt) = 0 (C12) SP(X−A(t) | X−A(t + δt),XA(t + δt) = 0 (C13)

Applying the chain rule for entropy to decompose SP(X−A(t),X−A(t + δt) | XA(t + δt)) in two different ways, and plugging Eqs. (C12) and (C13), respectively, into those two decompositions, we see that

SP(X−A(t + δt) | XA(t + δt)) = SP(X−A(t) | XA(t + δt)) (C14)

Next, use Eq. (C14) to expand

dA;K dt SX|XA(t) = limδt→0SP(X−A(t) | XA(t)) − SP(X−A(t + δt) | XA(t + δt))δt (C15)

= lim δt→0 SP(X−A(t) | XA(t)) − SP(X−A(t) | XA(t + δt)) δt (C16)

Add and subtract S(X−A(t)) in the numerator on the RHS to get

dA;K dt SX|XA(t) = limδt→0I(X−A(t) | XA(t)) − I(X−A(t);XA(t + δt))δt (C17)

Since X−A(t) and XA(t + δt) are conditionally independent given XA(t), the difference of mutual informations in the numerator on the RHS is non-negative, by the data-processing inequality [6]. This completes the proof.

For simplicity of the exposition, treat ω as though it were all of N , i.e., suppress the ω index in xω and x′ω, suppress the ω argument of K(ω;t), and implicitly restrict sums over subsystems i to elements of ω. Then using the definition of K(ω′;t), we can expand

Since ω′ is a community, by Eq. (B3) we can rewrite the first sum on the RHS of Eq. (D2) as

� ������� −�x,x′Kx′x (ω′;t)px′(t)ln� px(t)pxω′ (t)

Moreover, by Eq. (B2), even though ω \ ω′ need not be a community, the second sum in Eq. (D2) can be rewritten as

� =� x,x′ Kx′x (ω \ ω′;t)px′(t)ln � Kx′x (ω \ ω′;t) Kxx′(ω \ ω′;t)px(t)

= ⟨ ˙σK(ω\ω′;t)(t)⟩ (D5)

Combining completes the proof. In order to express that proof as in the main text, with the implicit ω once again made explicit, use the fact that windowing K(ω;t) to ω′ ⊂ ω is the same as windowing K(t) to ω′.

For all n > 1, write N n for the multiset of all intersections of n of the sets of subsystems ωi:

N 2 = {ωi ∩ ωj : 1 ≤ i < j < |N 1|} (E1) N 3 = {ωi ∩ ωj ∩ ωk : 1 ≤ i < j < k < |N 1|} (E2)

and so on, up to N |N 1|. Any community structure N 1 specifies an associated set of sets,

N := N 1 ∪ N 2 ∪ ... ∪ N |N 1| (E3)

Note that every element of N is itself a community, since intersections of communities are unions of communities. Given any function f : N → R, the associated inclusion-exclusion sum (or just “in-ex sum”) is

Σf := � ω∈N 1 f (ω) − � ω∈N 2 f (ω) + � ω∈N 3 f (ω) − ... (E4)

In particular, given any distribution px, there is an associated real-valued function mapping any ω ∈ N to the marginal entropy of (the subsystems in) ω. So using SN to indicate that function,

ΣS := � ω∈N 1 Sω − � ω∈N 2 Sω + � ω∈N 3 Sω − ... (E5)

where Sω is shorthand for SXω. I refer to ΣS − SN as the in-ex information. As an example, if N 1 consists of two subsets, ω1,ω2, with no intersection, then the in-ex information is just the mutual information I(Xω1;Xω2). As another example, if N 1 consists of all singletons i ∈ N , then the in-ex information is the multi-information of the N separate random variables.

The global EP rate is the negative derivative of the in-ex information, plus the in-ex sum of local EP rates:

⟨ ˙σ(t)⟩ = dSN (t)dt + ⟨ ˙QN (t)⟩ (E6)

�SN (t) − ΣS(t)�+ Σ⟨ ˙σN (t)⟩ (E7)

Proof. To establish Eq. (E6), first plug in to the result in Appendix B and use the normalization of the rate matrices to see that the EP rate of the full set of N coupled subsystems is

⟨ ˙σ(t)⟩ =� x′,x Kx′x (t)px′(t)ln�Kx′x (t)px′(t)Kxx′(t)px(t)

=� i,x′,x Kx′x (i;t)px′(t)ln�Kx′x (t)px′(t)Kxx′(t)px(t)

=� i,x′,x Kx′x (i;t)px′(t)ln � Kx′x (t) Kxx′(t)px(t)

Now introduce the shorthand

G(A ⊆ N ) := � i∈A,x′,x Kx′x (i;t)px′(t)ln � Kx′x (t) Kxx′(t)px(t)

Note that N itself is a community; G is an additive function over subsets of N ; and ⟨ ˙σ(t)⟩ = G(N ). Accordingly, we can apply the inclusion-exclusion principle to Eq. (E8) for the set of subsets N (t) to get

⟨ ˙σ(t)⟩ = � ω∈N 1(t) G(ω) − � ω∈N 2(t) G(ω) + � ω∈N 3(t) G(ω) − ...

�

� x,x′ Kx′x (i;t)px′(t)ln � Kx′x (t) Kxx′(t)px(t)

� x,x′ Kx′x (i;t)px′(t)ln � Kx′x (t) Kxx′(t)px(t)

Now use Eq. (B3) in Lemma 1 to rewrite Eq. (E10) as

⟨ ˙σ(t)⟩ = �

� ������ Kx′ωxω (ω;t)Kxωx′ω (ω;t)pxω,x−ω(t)

Next, use the same kind of reasoning that resulted in Eq. (E11) to show that the sum �

� xω,x−ω,x′ω Kx′ωxω (ω;t)px′ω,x−ω(t)lnpxω,x−ω(t) − � ω∈N 2(t)

can be written as �x,x′ Kx′x (t)px′(t)lnpx(t) = SN (t). We can use this to rewrite Eq. (E11) as

⟨ ˙σ(t)⟩ = ddt

� ������ Kx′ωxω (t)Kxωx′ω (t)pxω(t)

�SXN (t) − ΣS(t)�+� ω∈N ⟨ ˙σω(t)⟩ − � ω∈N 2(t) ⟨ ˙σω(t)⟩ + ... (E13)

This establishes the claim.

If we use Eq. (10) to expand each local EP term in Eq. (E7) and then compare to Eq. (E6), we see that the global expected EF rate equals the in-ex sum of the local expected EF rates:

⟨ ˙QN (t)⟩ = Σ⟨ ˙QN (t)⟩ (E14)

Note as well as that we can apply Eq. (E7) to itself, by using it to expand any of the local EP terms σω(t) that occur

in the in-ex sum Σ⟨ ˙σN (t)⟩ on its own RHS.

Eq. (E7) can be particularly useful when combined with the fact that for any two communities ω,ω′ ⊂ ω, ⟨ ˙σω′⟩ ≤ ⟨ ˙σω⟩ (see Eq. (15)). To illustrate this, return to the scenario of Fig. 1. There are three communities in N 1 (namely, {1,2,3},{3},{3,4}), three in N 2 (namely, three copies of {3}), and one in N 3 (namely, {3}). Therefore using obvious shorthand,

⟨ ˙σ(t)⟩ = dS1,2,3,4(t)dt − dS1,2,3(t)dt − dS3,4(t)dt + dS3(t)dt + ⟨ ˙σ1,2,3⟩ + ⟨ ˙σ3,4⟩ − ⟨ ˙σ3⟩

(Note that in contrast to lower bounds involving windowed derivatives, none of the terms in Eq. (E15) involve counterfactual rate matrices.) So if the entropy of subsystem 4 conditioned on subsystems 1,2 and 3 is growing, while its entropy conditioned on only subsystem 3 is shrinking, then the global EP rate is strictly positive.

As a final comment, it is worth noting that in contrast to multi-information, in some situations the in-ex information can be negative. (In this it is just like some other extensions of mutual information to more than two variables [13, 25].) As an example, suppose N = 6, and label the subsystems as N = {12,13,14,23,24,34}. Then take N 1 to have four elements, {12,13,14},{23,24,12},{34,13,23} and {34,24,14}. (So the first element consists of all subsystems whose label involves a 1, the second consists of all subsystems whose label involves a 2, etc.). Also suppose that with probability 1, the state of every subsystem is the same. Then if the probability distribution of that identical state is p, the in-ex information is −S(p) + 4S(p) − 6S(p) = −3S(p) ≤ 0.