b

DiscoverSearch
About
My stuff
Spin Glass Models of Syntax and Language Evolution
2015·arXiv
Abstract
Abstract

Using the SSWL database of syntactic parameters of world languages, and the MIT Media Lab data on language interactions, we construct a spin glass model of language evolution. We treat binary syntactic parameters as spin states, with languages as vertices of a graph, and assigned interaction energies along the edges. We study a rough model of syntax evolution, under the assumption that a strong interaction energy tends to cause parameters to align, as in the case of ferromagnetic materials. We also study how the spin glass model needs to be modified to account for entailment relations between syntactic parameters. This modification leads naturally to a generalization of Potts models with external magnetic field, which consists of a coupling at the vertices of an Ising model and a Potts model with q = 3, that have the same edge interactions. We describe the results of simulations of the dynamics of these models, in different temperature and energy regimes. We discuss the linguistic interpretation of the parameters of the physical model.

The evolution of languages through the interaction of their speakers is a topic of interest to computational linguists and, like many interacting many-body problems, is difficult to study analytically. In this paper we follow an approach that views languages at the level of syntax, with syntactic structures encoded as a string of binary syntactic parameters, a point of view originating in the Principles and Parameters model of Generative Linguistics, [4], [5] (see also [1] for a more expository account). It is known that syntactic parameters can change in the course of language evolution. Cases of parameter flipping have been identified in the historical development of some Indo-European languages, see for example [22]. For recent results on language evolution from the point of view of syntactic parameters, see [10].

We construct a model for language evolution inspired by the physics of spin glasses. These are systems of interacting spin variables, with spins located at the vertices of a graph and with interaction energies along the edges that tend to favor alignment (ferromagnetic) or anti-alignment (anti-ferromagnetic) of the spin variables at the endpoints of each edge. The dynamics of the system also depends on thermodynamic temperature parameter, which is a measure of disorder in the system, so that the spin variables tend to be frozen onto the ground state at low temperature, while higher energy states become accessible to the dynamics at higher temperature.

We interpret each syntactic parameter as a spin variable, on a graph representing world languages and their interaction. We obtain the binary values of the syntactic parameters from the Syntactic Structures of the World’s Languages (SSWL) database1, which documents these values for 111 syntactic parameters and over 200 natural languages. To model the interaction strengths between languages, we use data from the MIT Media Lab2, by defining the strength of the influ-ence of language A on language B as the likelihood that two languages are to be co-spoken. In particular, in their database, two languages are connected when users that edit an article in one Wikipedia language edition are significantly more likely to also edit an article in another language edition. The frequency of such occurrences provides an estimate of the strength of the interaction.

The idea of modeling syntactic parameters as spin variables in a statistical mechanical setting is not entirely new to Computational Linguistics. A model based on this idea was proposed in [20]. The main difference with respect to the approach we follow here is that, in the model of [20], the vertices of the graph are individual speakers in a fixed population, rather than languages (populations of speakers) as in our model. A statistical physics model of language change based on syntactic parameters was also constructed in [3].

1.1. General assumptions of the model. We make a series of simplifying assumptions, to the purpose of obtaining a computationally feasible model. We will examine the plausibility of these assumptions and their interpretation from a Linguistics point of view.

First, we assume that the languages we simulate are sufficiently distinct and never converge and do not concern ourselves with, for example, whether a dialect of language A is truly distinct from language A or whether two closely related languages A and B will at some point just become the “same” language. Instead, we assume there exists a definition of a language for which the notion of distinct languages is precise and for which the languages we have identified are always distinct.

The second simplification we make is that for a given syntactic parameter, such as the “SubjectVerb” syntax, a language either has it or does not have it. One could account for finer syntactical structures by considering syntaxes of arbitrary length, but this would still admit a binary classi-fication over the languages.

A third assumption is that because language interaction occurs microscopically by human interaction, and a foreign language is easier to acquire if its syntax is familiar, interacting languages will generally prefer to align their syntactic parameters. From these assumptions, we construct a Hamiltonian and evolve the system from its current syntactic parameter state toward equilibrium.

1.2. Discussion of assumptions. Considering languages as “discrete” objects (as opposed to a continuum of dialects) is a rather common linguistic assumption. Alternative models, such as wave models of transmission of linguistic changes are also possible (see for example [18]), but we will not consider them in this paper. It would be interesting to see whether statistical physics methods could be relevant to wave models of languages, but that is outside the purpose of our present investigation.

The second assumption listed above is clearly more problematic: it is a drastic simplification, which ignores phenomena of entailment between syntactic parameters. Indeed, it is well known that there are relations between different syntactic parameters, while our assumption leads us to treat them as independent spin variables. For example, there are pairs of parameters (p, p′)with the property that if  p = +1 then p′ is undefined, while if  p = −1, then p′ can take either value  ±1: see [15], [16] for some explicit examples of this entailment behavior. Thus, in a more refined version of the model, the second assumption above should be modified in two ways: (1) an additional possible value 0 of the parameters should be introduced, which accounts for the case where a parameter is undefined; (2) relations between parameters should be introduced modeling the entailment property described above. The first modification simply corresponds, in spin glass models, to considering Potts models with number of possible spin states q = 3, instead of Ising models with q = 2 (see [21] for a survey of the mathematical properties and differences between these models), while the second modification requires relations (at the vertices of the graph) between different spin variables (syntactic parameters) and can be modeled on versions of Potts models with external magnetic fields, [9]. We will first consider the simpler version of the model, with the second assumption as listed above, and then discuss the effect of introducing modifications as indicated here. The case with entailment of parameters is discussed in  §5. A simple modification of the entailment model of  §5 also account for variants of the entailment relation (see [15], [16]) where  p = −1 implies p′ undefined, and  p = +1 implies p′ = ±1, or where the parameter  p2 inentailed by more than one parameter. The latter case requires the coupling at vertices of more than two spin glass models. We will focus on only one case for simplicity.

Regarding the third assumption, since our model of interaction energies for the spin glass system is based on measures of bilingualism, as obtained by Media Lab, the evolution of syntactic parameters that we describe would be best understood, linguistically, within the context of the theory of bilingual Code-switching, [17]. For example, as observed in the study of [23] of English–Spanish bilingual population, the language spoken by this population “acquires” the Pro-Drop parameter (which is +1 in Spanish but  −1 in English) in the sense that Spanish verbs inserted in English sentences retain their Pro-Drop form from Spanish. In the evolution of syntactic parameters we obtain from the spin glass simulation, we see previously inactive  −1 parameters switch to the activated form +1 as an effect of “ferromagnetic” alignment with the corresponding parameters of nearest neighbor vertices. In a realistic model, this should be interpreted (at least in a first phase of language evolution) in the sense of Code-switching theory, as imported forms from the neighboring languages that are inserted in the original syntactic structure, while retaining their parameters activated.

Finally, we should comment on our overall choice of describing languages in terms of syntactic parameters, within the Principles and Parameters setting of [4], [5]. While this framework can been criticized, see for example [11], it is clear that coding syntactic structures as a string of binary variables makes it extremely suitable for modeling via statistical physics methods, hence it is a very natural setting for our purposes. We proceed to a more detailed description of the model in the coming section.

The general data of a spin glass model (Potts model) consist of a finite graph G; a finite set A of possible spin states at a vertex, with #A = q; a set Σ = Maps(V (G), A) of possible states of the system, namely assignments of a spin state in A at each vertex in V (G); a set of interaction energies associated to the edges  e ∈ E(G). Usually the energies are assumed to be equal to zero is the spins at the endpoints  v ∈ ∂(e) are not aligned and equal to a value  −Jeif they are aligned. The “ferromagnetic” case corresponds to the assumption that all  Je ≥0 and the “antiferromagnetic” case corresponds to  −∞ ≤ Je ≤0. One often expresses the interaction energies in terms of edge variables  te = eβJe − 1, where βis a thermodynamic parameter (an inverse temperature).

In the Ising model case, where spin variables have two possible values  ±1, a typical form of the Hamiltonian of a spin glass model is, for states  S ∈ Σ

image

where  B = (Bv) is an external magnetic field: the first term measures the degree of alignment between nearby spins and the second the alignment of spins with the direction of the overall external field. When we allow for more general Potts model cases, where the spin variables have q ≥2 possible values, Hamiltonian (2.1) is rewritten as

image

The partition function of the system is

image

and the associated Gibbs probability measure is

(2.4) PG,β(S) = e−βH(S)ZG(β) .

For more general cases, and a more detailed mathematical discussion see [9], [21]. In particular, one can fit into this model the case where there are two different interaction energies  Je ̸= J¯eassociated to the two opposite orientations of e, by doubling all edges in the graph, so as to consider both orientations, with respective energies  Je and J¯e. This case will be relevant to our application.

2.1. Topology and Energetics of the System. With these simplifications and resources, the stage is set to map the problem onto a classical statistical mechanics problem. We identify each language with the vertex of a digraph, and each interaction of the form language A influences language B with a directed edge from vertex A to vertex B. Such an edge has a weight or interaction strength  JABgiven by the aforementioned metric used by MIT Media Lab. Note that in general, JAB ̸= JBA.

We represent the presence or absence of a particular syntactic parameter by associating with each vertex a spin-1/2 variable, where a spin of +1/2 indicates the former and  −1/2 represents the latter. For simplicity, in the following we just use either +1 or  −1 for the two possible spin states, as described above, referring to them as “spin-up” and “spin-down”, respectively, absorbing a factor of 1/2 into the term  Jℓℓ′. The physical analog of this setup is the association of a particle with spin attributes to each vertex and inter-particle interactions with the edges, the usual setting of Ising models in statistical physics.

The full specification of the configuration of the system in the single-particle (single-language) basis is then given by

image

where  Sℓ,p ∈ {−1, +1}, Pis the set of all parameters, and L is the set of all languages.

With this picture in mind, we turn to the evolution. In this tensor product basis, we can simulate the evolution of each of the 111 syntactic parameters, in the independent parameters approximation, by endowing the system with an interacting Hamiltonian for each syntactic parameter:

image

This Hamiltonian is minimized by the alignment of adjacent spins, as  Jℓℓ′ ≥ 0 ∀ℓ, ℓ′. Note that this is not the most general form of spin glass models, where the Hamiltonian typically contains additional disorder terms. We are also not considering here the possible presence of terms playing the role of an external magnetic field B, as in (2.3).

2.2. Dynamics and Equilibrium Physics. We now need to impart dynamics into the model by introducing fluctuations or thermal noise. Continuing in the physical analogy, this system at thermal equilibrium with a “heat bath” at temperature T will have a distribution of configurations specified by the probability density function

(2.7) P(| ⃗Sp⟩) = 1Z(β) exp(−βHp(| ⃗Sp⟩))

where Z is the normalization or partition function

image

and  β = T −1 an inverse temperature parameter (or (kBT)−1 in Physics, with  kBthe Boltzmann constant).

Once again, let us translate this to Linguistics. The temperature T defines the extent to which languages are “noisy” due to external factors (the heat bath); if they are individually very noisy, then we can expect that these fluctuations will, on average at equilibrium, result in no overall alignment of a syntactic parameter (magnetization of zero). In this limit, the languages fluctuate independently. On the other hand, at T = 0, all of the noise is frozen out, so the system will be found only in the configuration with the largest probability as per equation (2.7). That is the ground state, or the state with minimum  Hp, in which all languages either possess or lack a given parameter (magnetization of  ±1).

2.3. Temperature parameter. We comment more in detail here on the interpretation, in Linguistics, of the temperature parameter of the spin glass model. As we mentioned above, we want to interpret that as a heat bath that introduces a certain level of noisiness in the behavior of the parameters. In Linguistics, a new probabilistic approach to syntactic parameters was recently developed in [14]. It makes a strong case, based on an analysis of treebanks of different sizes for a sample of 20 languages, for viewing the syntactic parameters of a given language not as frozen on the up or down position,  ±1, but as a binary probability distribution  {P, 1 − P}, whichexpresses the tendency of the language of have a given syntactic parameter expressed or not. For example, the case of the Head-Initial structure is analyzed in [14] and it is shown that languages typically tend to express elements of both  ±1 possibilities for this parameters, with a certain probability. Similar results are obtained, in the same paper, for Subject-Verb, Object-Verb, and Adjective-Noun parameters.

In the light of this approach, we can regard each language  ℓ, and each parameter p, as endowed with the additional data of probability distributions  {Pℓ.p, 1 − Pℓ,p}, which represent how “noisy” the setting of the parameter p is in the language  ℓ.

In general, one expects that the probability distributions  {Pℓ.p, 1 − Pℓ,p}will be different for different languages  ℓand different parameter p. However, we can assume that they all depend on an overall thermodynamic parameter  βthat can be tuned to increase or decrease the amount of noise. In first approximation, one can assume for simplicity that all probabilities are the same, with a form like  Pℓ,p(β) = P(β) = 1 − e−β2 , so that, at zero temperature (β = ∞) the parameter p is set to +1 without any noisiness (or to  −1 in the symmetric case), while for  T → ∞ (at β = 0)one has the maximum amount of noise, with the uniform distribution P = 1/2 assigning equal probability to the two  ±1 values of the syntactic parameter.

One can consider more general functions  Pℓ,p(β), which are continuous and monotonically interpolating between the values  Pℓ,p(∞) = 1 and Pℓ,p(0) = 1/2. For example, one can slightly modify the case discussed above by taking  Pℓ,p(β) = 1 − e−βγℓ,p2, where the exponents  γℓ,pshould be fitted to data like those collected in [14], so that there is some common value  β0at which the probability distribution  Pℓ,p(β0) match the statistics for specific syntactic parameters in specific languages. Unfortunately, at present, we do not have a sufficiently large set of data, of the type collected in [14], to fully implement this analysis, so we only outlined here the general approach.

2.4. Variant of the Model: Entailment of Parameters. In the form of the spin glass model described here above, we made the overall simplifying assumption of treating all syntactic parameters as mutually independent and uncoupled. Thus, the spin glass model is simply an uncoupled collection of many independent Ising models, with the same underlying graph topology and interaction energies, one for each parameter p.

This “independent parameters assumption” is the most problematic. phenomena of entailment of parameters are well known, see [15], [16] for specific examples. In such cases, what typically happen is that one or more parameters depend on the setting of another parameter (or more). As a typical example, we can take the case of two parameters (p1, p2) with the property that, if the parameter  p1is activated (set to +1 value) in a language  ℓ, then the second parameter  p2 can takeeither value +1 or  −1, while if  p1 is set to −1, then the parameter  p2is just undefined in that language. Several such examples, in the family of Indo-European languages, are reported in the table of syntactic parameters given in Table A, Figure 1, in the Appendix of [15]. We will focus on the analysis of one such example in  §5.

In addition to the entailment problem, a more general problem, which is less well understood, is whether there are other dependencies between different syntactic parameters, and what would constitute a minimal number of independent binary variables. This is part of the general critique that is addressed at the Principles and Parameters model, [11]. For the purpose of the present paper, we will ignore this further problem, and we will assume that, except for the entailment phenomena described above, the syntactic parameters can be considered independent and uncoupled.

2.5. Comparison with other models. A statistical physics model of language acquisition and language evolution was developed in [3]. Their model is also based on the Principles and Parameters approach. However, the statistical physics model they construct is significantly different from the one we describe in the present paper. They are mostly interested in modeling language acquisition. They model by a Gibbs state the probability with which a speaker selects sentences, so that the choice should follow euphonic considerations: how best a given sentence fits the prosodic patterns of the language. To this purpose, they model prosody as a potential, which defines the Hamiltonian of the Gibbs state. The grammar is then estimated by a maximum likelihood argument maximizing the Gibbs probability measure.

Another linguistic application of statistical physics models, and in particular of the Ising spin glass model, occurs in  §13.6.2–13.6.3 of [20]. This is a model of language evolution, where the author considers a population of linguistic agents, corresponding to the vertices of the graph, with edges representing the interaction between individual agents. The model also assumes that there are two distinct languages in the population. The adoption of one or the other language by a given speaker corresponds to the two possible spin states  ±1 at that vertex in the spin glass system. The inverse temperature parameter of the Ising model is interpreted as the probability with which speakers of each language produce “cues” (in a cue-based learning algorithm). The behavior of the system then models a situation where, above a certain critical temperature, individual agents behave independently and both languages are equally represented in the population, while below a critical temperature one language becomes dominant.

To our knowledge, a model like ours, where the syntactic parameters themselves are treated as spin variables in a spin glass model, and interactions are modeled by bilingualism data, has not been previously considered.

Having established the equilibrium physics expected of a spin glass model of syntax, we seek to explore the evolution of the system computationally. Naively exploring the configuration space is

image

Figure 1. Initial state of the Subject-Verb parameter for various languages in the database. Most languages currently possess this syntax (value +1 of the parameter, red colored vertices). Graph built on MIT Media Lab data of language interaction.

computationally too expensive, so instead we employed a Markov Chain Monte Carlo simulation to propagate stochastically an initial configuration of syntactic parameter values (as specified by the SSWL) toward equilibrium, through the Metropolis algorithm, see for instance [12], [19].

The algorithm (see, for instance,  §2 and 3 of [19]) is based on the detailed balance condition

(3.1) P(s)P(s → s′) = P(s′)P(s′ → s),

where  P(s → s′) is the probability of transitioning from a state s to a state  s′, represents the condition that P(s) is the stationary distribution of a Markov process with transition probabilities

image

Figure 2. In the low temperature regime T = 0.000001 equilibrium, all of the languages are expected to acquire the Subject-Verb parameter in the +1 position.

P(s → s′). Uniqueness of the stationary distribution follows from an ergodicity property of the Markov process, usually implied by conditions of aperiodicity and positive recurrence.

In the Metropolis–Hastings algorithm, a Markov process with the required properties is constructed by obtaining transition probabilities  P(s → s′) in terms of acceptance-rejection rates:

(3.2) P(s → s′) = πA(s → s′) · π(s → s′),

where  π(s → s′) is the conditional probability of proposing a state  s′ given the state s and πA(s → s′) is the conditional probability of accepting the proposed state  s′ given s. The P(s → s′)are normalized to add up to one, by requiring that, in the remaining cases the proposed state is

image

Figure 3. In the high temperature regime T = 20 equilibrium for the SubjectVerb parameter, all vertices have local magnetization close to zero, with half of them approaching zero from the positive direction.

s′ = s. The usual Metropolis–Hastings choice of the acceptance distribution is

(3.3) πA(s → s′) = min{1, P(s′) π(s′ → s)P(s′) π(s → s′)}

which satisfies the balance condition (3.1)

image

image

Figure 4. Average (over vertices) value of spin, in the low temperature regime T = 0.000001 (left) and in the high temperature range T = 20 (right), as a function of the number of steps in the Monte Carlo simulation.

The selection probabilities  π(s → s′) in the Metropolis–Hastings algorithm are chosen so that the ergodicity condition holds. In the case where the graph is a lattice, this is usually achieved by considering single-spin-flip dynamics, where in each allowed transition  s → s′ the final state s′ differs from the initial state s by the flipping of a single spin. On these possible states, the probability is taken to be uniform,  π(s → s′) = 1N , with Nthe number of allowed states.

In the case where P is the Gibbs measure of a spin glass model, a choice of the acceptance probabilities  πA(s → s′) that satisfy the detailed balance condition (3.1) is given by setting

image

Namely, a new state  s′ with energy lower than or equal to that of s is automatically accepted, while one with a higher energy is accepted with probability exp(−β(H(s′) − H(s))). See §3.1 of[19] for more details.

3.1. Ergodicity and Metropolis–Hastings algorithm. The ergodicity property is necessary for the working of the Metropolis–Hastings algorithm. A Markov chain is ergodic if and only if it is irreducible and aperiodic. Irreducibility is the condition that the Markov process can reach any possible state of the system starting from any given state in a finite number of steps. A state is periodic if the chain can return to it only at multiples of some period d > 1. Aperiodicity is the absence of periodic states (d = 1 for all states). Under the assumption of single-spin-flip dynamics, ergodicity is satisfied for the Ising model on a lattice (see  §3.1 of [19]).

The graph we consider here is not a lattice. However, it makes sense in terms of the linguistic interpretation of the model, to assume a similar type of single-spin-flip dynamics. For similar arguments in favor of single-spin-flip dynamics in models of language acquisition and evolution, see [20]. Ergodicity of the Metropolis–Hastings algorithm, with the single-spin-flip dynamics, still holds for our graph. The following simple argument applies more generally.

Irreducibility can be checked using the notion of accessibility. Given states  s and s′, let τdenote the minimum number of steps needed for the Markov chain to reach  s′ starting at s. The state  s′ is accessible from s if there is a positive probability that  τ < ∞.Suppose that the states  s and s′ differ at a certain set  A ⊂ V (G) of sites, with k = #A and N = #V (G). Then P(τ < ∞) ≥ P(τ = k). Let LAbe the set of all k! orderings of the elements of A. For j = 1, . . . , k!, and  σj ∈ LA, let sj = (sj,r)r=0...,k−1be the corresponding list of k states in Σ, each obtained from the previous one by flipping the spin located at the site  σj(r) ∈ A. The probability  P(τ = k)is then the sum of the products of the probability of obtaining a string  σj in LAby choosing uniformly randomly a sequence of k elements in V (G), times the product of the probabilities of the flipping of the individual spins at the sites  σj(r) for r = 1, . . . , k,

image

The precise values of these probabilities depends on the topology of the graph G, through the set of edges, that determine the change in energy between successive states, according to (3.4). It is however cleat that  P(τ = k) >0, hence all states are accessible from a given one, which is equivalent to the irreducibility condition for the Markov chain.

For a single parameter, the set Σpof possible states has #Σp = 2#V (G). The number of states S′p that are accessible from a given state  Sp, in a single-spin-flip dynamics, is the number of states that differ from  Spby flipping a single spin at one of the vertices. Hence, for any given  Sp, thereare N = #V (G) accessible states  S′p. The probability of choosing one of these states is uniform π(Sp → S′p) = 1/N. For a given state  Sp, we have P(Sp → Sp) = 0 if and only if all the accessiblestates  S′p with π(Sp → S′p) = 1/N >0 have lower energy, so that all  πA(Sp → S′p) = 1 andP(Sp → Sp) = 1 − �S′p P(Sp → S′p) = 1 − 1N #{S′p} = 0. Thus, states S with P(Sp → Sp) = 0 aremaxima of the energy. All other states  Sp have P(Sp → Sp) >0, hence they are aperiodic states of the Markov chain. If a Markov chain is irreducible then all states have same period, hence the remaining states must also be aperiodic. Together with irreducibility this implies ergodicity.

Thus, if we focus on the behavior of a single syntactic parameter, we simply have a Ising model on our graph of languages G, with given interaction energies, and we apply the Metropolis– Hastings algorithm as above. When we consider more than one parameter at the same time, so that states are of the form ⃗Sp, under the simplifying assumption that different syntactic parameters are independent of each other, we just run the same kind of Metropolis–Hastings algorithm on each parameter, by each time flipping a single spin  Spof a single parameter p. Namely, we proceed as described above, using the Metropolis–Hastings algorithm with single-spin-flip dynamics. We propagate a state  st → st+1by proposing a configuration  | ⃗Sp⟩and accepting it,  st+1 = | ⃗Sp⟩, orrejecting it,  st+1 = st, with some probability. To simplify the noise, we assume it acts by trying to flip the spin at each vertex independently, although in reality this is more likely to occur in clusters. At each step, the algorithm chooses a vertex at random and tries to flip its spin, accepting or rejecting the flip according to (3.4):

image

Propagating the state for many steps produces a distribution of configurations that asymptotically approaches equation (2.7), allowing one to calculate thermal averages of various functions (such as magnetization) by evaluating them on the state at each step and averaging over steps.

3.2. Model with Entailment. We now discuss how one can adapt the previous computational setting to the more refined model that also takes into consideration dependencies and entailment of syntactic parameters.

The first modification in the model consists of the fact that we use, for each parameter, a set of three possible values  {−1, 0, +1}instead of the binary values  {±1}, thus allowing for the possibility that a parameter may be undefined (by effect of the values of other parameters). Thus, instead of the Ising model, we are looking at a Potts model with q = 3. The natural generalization of the Metropolis dynamics of the Ising model is the wider class of Glauber dynamics on graphs, where the single-spin-flip dynamics is replaced by the assumption that each move changes the parameter value at just one vertex, by choosing a possible value uniformly at random.

Consider again the situation described in  §2.4, where we have two parameters (p1, p2) with theproperty that, if the parameter  p1 = +1 then the parameter p2can take values  ±1, while if the parameter  p1 = −1 then p2is undefined. As before, we associate spin variables  Sp1 and Sp2 tothe two parameters, but now we assume that  Sp1has two possible spin states  {±1}, as before, while  Sp2has three possible spin states,  {+1, 0, −1}, with 0 for the case when the parameter is undefined.

Thus, we see that, in order to accommodate the entailment relations, we need to construct a multivariable spin glass model, where some of the spin variables associated to the vertices behave like an Ising model (with number of spin states q = 2), while other variables behave like a Potts model (with number of spin states q = 3).

Moreover, the entailment relation should be modeled by additional interaction terms in the Hamiltonian, that couple the variables  Sp1 and Sp2, in such a way that the configurations with Sp1 = 1 and Sp2 = ±1, and with  Sp1 = −1 and Sp2 = 0 are favored energetically over all the otherpossible combinations.

This leads us into a new class of spin glass model, which to our knowledge has not been considered in physics so far, where different spin variables, with different numbers of spin states coexist on the same graph and are coupled with one another. We describe a possible choice of an interaction term that achieves the desired result.

We focus on a case with only two parameters  p1, p2with an entailment relation as above. Let Sℓ,pbe the spin variables, as before, where  Sℓ,p1 ∈ {±1} and Sℓ,p2 ∈ {±1, 0}. We consider a change of variable for the first spin, with the new variable  Xℓ,p1 ∈ {0, 1} defined by Sℓ,p1 = exp(πiXℓ,p1).This just corresponds to the two possible choices of representing the group Z/2Z in the additive form {0, 1} or in the multiplicative form  {+1, −1}. We also associate to the spin variables  Sℓ,p2 ∈{+1, 0, −1}a variable  Yℓ,p2 = |Sℓ,p2| ∈ {0, 1}.

We then consider a Hamiltonian given by the sum of two terms  H = HE + HV, where the term HEexpresses the edge interactions between different languages, as in (2.6), written in the Potts model form (2.2)

image

where we treat the two parameters as independent, while the interaction between the parameters due to the entailment relation is encoded in the  HV term as

image

For  Jℓ >0, this gives an anti-ferromagnetic pairing between the variables  Xℓ,p1 and Yℓ,p2.

The preferred energy states for the Hamiltonian  HVare those where either  Xℓ,p1 = 0 andYℓ,p2 = 1, or Xℓ,p1 = 1 and Yℓ,p2 = 0, which correspond, respectively, to the entailment relationsSℓ,p1 = 1 and Sℓ,p2 = ±1, or Sℓ,p1 = −1 and Sℓ,p2 = 0. Notice that the size of the parameters Jℓ inthe term  HVin the Hamiltonian determine how strongly enforced the entailment relation is in a given language: for  Jℓ → ∞, an infinite energy barrier separates the ground states  HV,ℓ = 0 fromthe excited states  HV,ℓ = Jℓ, hence the entailment is strongly enforced, while lower values of  Jℓwould imply that the language  ℓadmits a certain frequency of exceptions to this syntactic rule. These should be interpreted in the same sense as the probabilistic approach to setting syntactic parameters discussed in [14], see also the discussion in  §2.3 above.

Notice that, if we freeze one of the parameter  p1and only consider the evolution of the second parameter, then the Hamiltonian above can be regarded simply as a case of Potts model with external magnetic field.

We then run the analog of the Metropolis–Hastings algorithm for the Hamiltonian  H = HE+HVon the graph of languages and language interactions. Since we now have spin variables with more than two possible states, instead of the single-spin-flip dynamics one uses the appropriate Glauber dynamics. For general results about ergodicity and convergence time of Glauber dynamics on some classes of graphs, we refer the reader to [2], [8]. In the simple case we analyze in  §5 below,we will work with a complete graph, so the results of [8] apply.

3.3. Implementation. Part of the implementation consisted of parsing and converting the SSWL database of syntactic parameters in a form usable for data analysis. Additionally, discrepancies in nomenclature between the MIT Media Lab topology source and the SSWL database had to be identified and manually resolved. The implementation for the independent parameters simulation was executed in MATLAB. The implementation for the entailment of parameters was executed in Java. The source files of the code are available at the GitHub repository:

image

As a first simulation, we consider all parameters as independent spin variables, hence we can focus on just one parameter at a time and run a single-parameter simulation. We report here the case of the Subject-Verb parameter as an illustrative example: we describe the behavior at low and high temperature T, averaging over a million steps. This is enough to obtain a clear understanding of the phase diagrams of simulations for all of the other parameters, when considered independently. The code in the GitHub repository can be used to generate analogous simulations for all parameters.

Here, when we refer to the “low” and “high” temperature regimes, we mean the range in which the ratio  T/⟨Jℓℓ′⟩is very small or, respectively, very large.

4.1. Subject-Verb Parameter. The initial state specified by the SSWL is given in Figure 1, where the vertices are represented as circular nodes colored red if the language possesses the parameter or blue if it lacks the parameter. The sizes of the vertices do not signify anything in these graphs: variable sizes are only used for ease of visualization.

Figure 1 suggests that the vast majority of languages currently possess this parameter in the activated form +1. Evolving this system at very low temperature (T = 0.000001), we want to know the local magnetization, or the thermal average of the spin value at each vertex. Since we are in the low temperature regime, we expect that the vertices will tend to orient their spins in the same direction (either all of the languages will tend to possess the parameter or will lack it). Marking vertices with  ⟨Sℓ,p⟩H >0 red and vertices with  ⟨Sℓ,p⟩H <0 blue, we can look at a graph of these languages at equilibrium, given in Figure 2.

As expected, since the system is close to T = 0, it is also close to the ground state. In fact, a configuration randomly sampled from the equilibrium ensemble will also look like Figure 2 nearly 100% of the time, because of how dominant the all-up configuration is: it is nearly impossible at low T for the state to propagate to a configuration that is not the ground state. At T = 0, this is the only configuration in the ensemble. At this low temperature, the languages quickly converge during their evolution to a state in which they all acquire the parameter in the activated form +1. A more physical way to view this is to look at a plot of the average spin as a function of time, as given in the first graph of Figure 4.

One may wonder why this system always ends up in this “all-up” state instead of an “all-down” state. The key is that the initial configuration is dominated by spin-up vertices, so the energy barrier is much higher for a droplet of spin-down to expand and dominate the system than it is for the sea of spin-up to swallow the droplets.

What happens when T is very large? Taking T = 20, the local magnetizations approach zero, with exactly half of the vertices approaching zero from the positive direction as in Figure 3. In fact, the local magnetizations vary with mean  −2.3927×10−4 and median 7×10−6 on the interval [−0.0186, 0.0147].

In this case, a configuration picked at random from the equilibrium ensemble is most likely to have approximately half of the vertices with spin up. Because T is large, a state can propagate to almost any other configuration fairly easily. At equilibrium, the languages here evolve approximately independently of each other, so the average (over vertices) spin fluctuates about zero, as in the second graph of Figure 4.

There are then two points to note: the model behaves as predicted, and it is marked by the same symmetry-breaking phase transition observed in the 2D Ising Ferromagnet at a critical temperature. This is not surprising, since this model is at its core just a ferromagnet with a more complicated topology.

We consider here a simple case where we focus on two parameters with an entailment relation, over a small set of Indo-European languages. In this case, we take the data on the values of the syntactic parameters from [15], [16], while we still use the same data from the MIT Media Lab database for the interaction strengths between different languages. In this simulation we used the Media Lab data based on book translations instead of Wikipedia editing3.

We consider, for example two parameters  p1, p2expressing Definiteness (respectively, number 7 and 12 in the list in Table A of [15]). The first parameter  p1expresses Partial Definiteness, while the parameter  p2is Definiteness Checking. See for instance [7] for a discussion of the role of these parameters. We consider three languages: English, Russian, and Bulgarian. We input interaction energies taken from the same MIT Media Lab data graph, and we neglect the effect of interaction with other languages. The initial configuration of the two parameters, taken from Table A of [15] is according to the following table:

image

Our second example considers another pair of parameters  p1, p2, given, respectively, by Strong Deixis and Strong Anaphoricity (numbers 52 and 53 in Table A of [15]). For a detailed discussion of deixis see [13], while regarding anaphoricity, see for instance [6]. In this example, we work with the complete graph on four vertices (tetrahedron graph), where the vertices  ℓ1, . . . , ℓ4correspond to the languages: English, Welsh, Russian, and Bulgarian. Notice that, according to the Media Lab graph, the interaction energies between the pairs Welsh/Russian and Welsh/Bulgarian are

image

Figure 5. Average value of spin for  p1 = Partial Definiteness (left) and forp2 = Definiteness Checking (right) in the high temperature/high energy (HT/HE)regime (top) and in the low temperature/high energy (LT/HE) regime (bottom), as a function of the number of steps in the Monte Carlo simulation.

negligible compared to the other interaction energies. The initial configuration, from Table A of [15], for this pair of parameters and these four languages is the following:

image

5.1. Entailment dynamics. We ran a simulation of the dynamics of the system for two entailed parameters  p1, p2, with Hamiltonian  H = HE + HV , where HE and HVare given, respectively, by (3.6) and (3.7), in the case of the two examples described above.

In this simulation, we replaced the acceptance probabilities (3.4) with

image

The dynamics depends on two parameters, the temperature and the coupling energy of the entailment relation. We ran simulations for high/low temperature and high/low entailment energy.

image

Figure 6. Average value of spin for  p1 = Partial Definiteness (left) and forp2 = Definiteness Checking (right) in the high temperature/low energy (HT/LE)regime (top) and in the low temperature/low energy (LT/LE) regime (bottom), as a function of the number of steps.

In the first case, for the graph with languages  {ℓ1, ℓ2, ℓ3} = {English, Russian, Bulgarian} andthe parameters  {p1, p2} = {Partial Definiteness, Definiteness Checking}, we consider an initial state as given in the first table above. The average value of spin for the two parameters is illustrated in Figures 5 and 6, in the different regimes of high temperature and high entailment energy (HT/HE), high temperature and low entailment energy (HT/LE), low temperature and high entailment energy (LT/HE), low temperature and low entailment energy (LT/LE). The final equilibrium states for the dynamics, in each of these cases, would then be as follows:

image

image

Figure 7. Average value of spin for  p1 = Strong Deixis (left) and for p2 =Strong Anaphoricity (right) in the HT/HE regime (top) and the LT/HE regime (bottom), as a function of the number of steps.

In the second case, with languages  {ℓ1, ℓ2, ℓ3, ℓ4} = {English, Welsh, Russian, Bulgarian} andwith parameters  {p1, p2} = {Strong Deixis, Strong Anaphoricity}, the initial state is given by the values of the parameters in the second table above, and the average value of spin in the different HT/HE, HT/LE, LT/HE, LT/LE regimes is illustrated in Figures 7 and 8. The final equilibrium states for the dynamics are then of the form

image

These examples are only illustrative and not entirely realistic, because we have singled out a small portion of the language graph that exhibits an interesting configuration of entailed parameters in the initial state, and we have run the simulation neglecting the interactions with all the other languages in the rest of the larger language graphs, which will also affect the behavior of the system. These examples, however, are interesting because they show situations where a con-figuration of entailed parameters reaches an equilibrium states where parameters of the individual languages have undergone some changes, but have not always converged to a configuration where all the parameters are aligned. While in the first example one obtains complete alignment of all

image

Figure 8. Average value of spin for  p1 = Strong Deixis (left) and for p2 =Strong Anaphoricity (right) in the HT/LE regime (top) and the LT/LE regime (bottom), as a function of the number of steps.

the parameters in the low temperature and low energy regime, in the second example, even in this range, parameters do not fully align. This shows that the presence of entailment between syntactic parameters can have a substantial effect on the dynamics that differs significantly in outcome from the case where one assumes an independence hypothesis on parameters.

In this paper, we introduced a new tool to study the flow of syntactic parameters by mapping the problem onto a spin glass model. This tool provides some flexibility, as one could provide any topology and any adjacency matrix. We have shown that, under a hypothesis of independence between syntactic parameters, in the evolution of the system parameters tend to align to the +1 position. We related this to linguistic models of bilingual code-switching. We also showed that, when entailment relations between parameters are taken into account, the uncoupled Ising models are replaced by a coupling at the vertices of Ising and Potts models with same edge interactions. One obtains then more complicated equilibrium configurations, depending on the entailment energy parameter, which gives the strength of the coupling, and its relation to the “temperature” parameter. Even in the low energy and low temperature regime one finds cases that do not yield a complete alignment of parameters.

One factor not accounted for in this project was the number of speakers for each language, which could significantly change the interaction strengths. Another interesting scenario would involve disordering terms in which two interacting languages actually prefer to antialign their parameters.

An overall hypothesis of this model is that the interaction energies along the edges, which are modeled on estimates of the size of the bilingual population, remain constant during the evolution. In a realistic model, the strength of the interaction would also be co-evolving. It is also likely that factors such as the existence of better and more efficient automated translators will eventually cause drastic changes in the size of human bilingual or multilingual populations, so the feasibility of such models of linguistic evolution may also be altered by external factors of this sort.

Acknowledgment. This work was performed as part of the activities of the last author’s Mathematical and Computational Linguistics lab and CS101/Ma191 class at Caltech. The last author is partially supported by NSF grants DMS-1201512 and PHY-1205440.

[1] M. Baker, The Atoms of Language, Basic Books, 2001.

[2] N. Berger, C. Kenyon, E. Mossel, Y. Peres, Glauber dynamics on trees and hyperbolic graphs, Probab. Theory Related Fields 131 (2005) N. 3, 311–340.

[3] M. Cassandro, P. Collet, A. Galves, C. Galves, A statistical-physics approach to language acquisition and language change, Physica A: Statistical Mechanics and its Applications, Vol.263 (1999) 427–437.

[4] N. Chomsky, Lectures on Government and Binding, Dordrecht: Foris Publications, 1982.

[5] N. Chomsky, H. Lasnik, The theory of Principles and Parameters, in “Syntax: An international handbook of contemporary research”, pp.506–569, de Gruyter, 1993.

[6] P. Cole, Indefiniteness and Anaphoricity, Language, Vol. 50 (1974) N.4, Part 1, 665–674.

[7] A. Cornilescu, A. Nicolae, On the Syntax of Romanian Definite Phrases: Changes in the Patterns of Definite-ness Checking, in “The Noun Phrase in Romance and Germanic”, pp.193–221, John Benjamins, 2011.

[8] P. Cuff, J. Ding, O. Louidor, E. Lubetzky, Y. Peres, A. Sly, Glauber dynamics for the mean-field Potts model, J. Stat. Phys. 149 (2012) 432–477.

[9] J.A. Ellis-Monaghan, I. Moffatt, The Tutte-Potts connection in the presence of an external magnetic field, Adv. in Appl. Math. 47 (2011) no. 4, 772–782.

[10] C. Galves (Ed.) Parameter Theory and Linguistic Change, Oxford University Press, 2012.

[11] M. Haspelmath, Parametric versus functional explanations of syntactic universals, in “The limits of syntactic variation”, pp. 75–107, John Benjamins, 2008.

[12] W. Krauth, Statistical Mechanics: Algorithms and Computations, Oxford University Press, 2006.

[13] F. Lenz, Deictic Conceptualisation of Space, Time, and Person, John Benjamins Publishing, 2003.

[14] H. Liu, Dependency direction as a means of word-order typology: a method based on dependency treebanks, Lingua, 120 (2010) 1567–1578.

[15] G. Longobardi, C. Guardiano, Evidence for syntax as a signal of historical relatedness, Lingua, 119 (2009) 1679–1706.

[16] G. Longobardi, C. Guardiano, G. Silvestri, A. Boattini, A. Ceolin, Towards a syntactic phylogeny of modern Indo-European languages, Journal of Historical Linguistics, Vol.3 (2013) N.1, 122–152.

[17] J. MacSwan, Grammatical Theory and Bilingual Codeswitching, MIT Press, 2014.

[18] J. Nerbonne, Measuring the diffusion of linguistic change, Phil. Trans. R. Soc. B (2010) Vol. 365, 3821–3828.

[19] M.E.J. Newman, G.T. Barkema, Monte Carlo Methods in Statistical Physics, Oxford University Press, 1999.

[20] P. Niyogi, The computational nature of language learning and evolution, MIT Press, 2006.

[21] A.D. Sokal, The multivariate Tutte polynomial (alias Potts model) for graphs and matroids, in “Surveys in combinatorics 2005”, 173–226, London Math. Soc. Lecture Note Ser., 327, Cambridge Univ. Press, 2005.

[22] A. Taylor, The change from SOV to SVO in Ancient Greek, Language Variation and Change, Vol.6 (1994) 1–37.

[23] E. Woolford, Bilingual code-switching and syntactic theory, Linguistic Inquiry, Vol.14 (1983) N.3, 520–536.

[24] SSWL Database of Syntactic Parameters: http://sswl.railsplayground.net/

Division of Physics, Mathematics, and Astronomy, California Institute of Technology, 1200 E. California Blvd, Pasadena, CA 91125, USA

image


Designed for Accessibility and to further Open Science