b

DiscoverSearch
About
My stuff
Data-driven Discovery of Emergent Behaviors in Collective Dynamics
2019·arXiv
Abstract
Abstract

Particle- and agent-based systems are a ubiquitous modeling tool in many disciplines. We consider the fundamental problem of inferring the governing structure, i.e. interaction kernels, in a nonparametric fashion from observations of agent-based dynamical systems. In particular, we are interested in collective dynamical systems exhibiting emergent behaviors with complicated interaction kernels, and for kernels which are parameterized by a single unknown parameter. This work extends the estimators introduced in [1], which are based on suitably regularized least squares estimators, to these larger classes of systems. We provide extensive numerical evidence that the estimators provide faithful approximations to the interaction kernels, and provide accurate predictions for trajectories started at new initial conditions, both throughout the “training” time interval in which the observations were made, and often much beyond. We demonstrate these features on prototypical systems displaying collective behaviors, ranging from opinion dynamics, flocking dynamics, self-propelling particle dynamics, synchronized oscillator dynamics, to a gravitational system. Our experiments also suggest that our estimated systems can display the same emergent behaviors of the observed systems, that occur at larger timescales than those in the training data. Finally, in the case of families of systems governed by a parametric family of interaction kernels, we introduce novel estimators that estimate the parametric family of kernels, splitting it into a common interaction kernel and the action of parameters. We demonstrate this in the case of gravity, by learning both the “common component” 1/r2and the dependency on mass, without any a priori knowledge of either one, from observations of planetary motions in our solar system.

image

Emergent behavior in collective dynamics, such as clustering of opinions [2, 3, 4, 5], flocking of birds [6, 7, 8], milling of fish [9, 10, 11, 12], and concentric trajectories of planetary motion [13], is among one of the most interesting phenomena in macroscopic and microscopic scale systems. It occurs in systems used across many disciplines, including biology, social science, particle physics, astronomy, economics, and many more. Extensive studies have been conducted in order to understand the mechanism behind such intricate and yet geometrically simple behaviors. As shown in [14, 15, 7, 16, 17, 18, 5], these emergent behaviors are steady-states of various types of collective dynamics, and they can be qualitatively studied when the governing equations are known beforehand. However, if only the short-time trajectories of the dynamics are observed, it may be challenging to make accurate prediction about the emergent behaviors of the observed dynamics without prior knowledge of the governing equations. We offer a learning approach to overcome this difficulty by first discovering the governing equations from the observational data, and then use the estimated equations for large-time prediction.

Research on discovering governing equations of dynamical systems has enjoyed a long history in the science and engineering community; it can be traced back to the earlier work of Lagrange, Laplace and Gauss [19]. Among the many inspiring studies, the lengthy discovery of gravity had immense impact. In 1605, Kepler announced his first law of planetary motion, from his work on showing Mars’ elliptical orbit based on Tycho Brahe’s observational data. Based on Kepler’s first law and the assumption that gravity has a parametric form, namely 1rp, Newton formulated his law of universal gravitation, i.e., that gravity has the form 1/r2, in 1687. Our learning approach can re-discover the 1/r2form of the law of universal gravitation in a highly efficient and precise manner without the assumption of gravitation having a parametric form and planetary motion being elliptical, for details see Sec. 7.

Our work is focused on discovering governing equations of collective dynamics (also known as self-organized dynamics), a special kind of interacting particle- and agent-based dynamical systems. These are a distinguished subset of general autonomous systems of ODEs

image

In this general case, given  {X(t)}t∈[T0,T ](0  ≤ T0 < T), system identification consists in inferring f from X(t) and ˙X(t) observed at various t’s. Classical regression techniques (e.g. [20, 21, 22, 23]) have recently been brought to bear on this problem (e.g. [24, 25, 26, 27]). However, lack of independence among the observation data and the curse of dimensionality due to D typically being large, are all obstructions to finding the desired f effectively and efficiently (see [1] for an extended discussion) of these points. The works in [28, 1, 29] proposed a learning approach that exploits the special form of collective dynamics to overcome the difficulties mentioned above. It began with first order systems of the form

image

The 1-dimensional function  φis referred to as the interaction kernel. Here and in what follows we assume, with possibly abuse of notation, that the term  i′= i in the sum in the r.h.s. is 0, even in cases where  φmay not be defined at 0 (e.g.  φ(r) = 1/r2). The aforementioned works consider the problem of estimating φgiven trajectory observations, in terms of positions and velocities of the agents at various times, along one or multiple trajectories (with different initial conditions (ICs), e.g. sampled at random from some probability distribution on the state space). In [28, 1] a nonparametric learning approach to construct an estimator �φfor φis considered, that exploits the governing structure of the dynamics in (2), which is a special (yet ubiquitous) case of the general equations (1). The work [28] considered a first-order model of homogeneous agents (derived from gradient flow), and studied the convergence to its mean field limit and the inference of the mean-field limit interaction kernel from observations of trajectories of the system with a finite and yet increasing number of agents. The work [1] extended the approach in [28] to the situation where the number of agents is fixed, but the number of observations increases, showing that the nonparametric estimators for the interaction kernel converge at the near optimal rate for regression in one dimension, in particular independent of the dimension of the state space. It generalized the estimators to first and second-order of heterogeneous agents with 1-dimensional interaction kernels based on pairwise distances, providing substantial numerical evidence of the performance of these generalizations. The work [29] analyzes in detail the estimators for first order heterogeneous agent-based systems, generalizing the theoretical results of [1] to that case, while sharpening some of the constructions.

Here we extend the application of these approaches to rather general classes of agent-based systems, driven by first- and second-order dynamics, interaction kernels depending not only on pairwise distances but also on other pairwise data, which also depends on the states between the agents. We show that the estimators can be generalized to these settings, and measure their performance at both approximating the interaction kernel  φ(in a suitable, dynamics-driven weighted  L2-distance) and in predicting trajectories from the same initial conditions (ICs). These estimators may be constructed in a memory-efficient way, i.e. scalable to large data sets with a large number of agents1. The estimated interaction kernels, inferred from short-time trajectory data, can provide very good approximations to the original unknown interaction kernels, and yield predictions from new initial conditions. Furthermore, the estimated interaction kernels can also provide insight into discovering the correct emergent behaviors at large time, as we will demonstrate in several examples in section 5 and section 6. We also extend these estimators to consider not only a single system, but a family of systems, governed by a family of interaction kernels  {φk}k. We consider the case of gravity in section 7, and show that we can discover both the “common structure” of the interaction kernel, namely the 1/r2dependency on pairwise distance, and the dependency on mass.

The structure of our paper is as follows. In section 2 we discuss in detail the two models, of first and second-order systems respectively, which we are considering. In section 3 we outline the learning algorithm for each model. These algorithms are efficient and scalable. They enjoy highly favorable performance in terms of computational complexity as described in section 3.1. Section 4 defines the various performance measures, confusion matrices, and pattern indicator scores. It also discuses how we set up the numerical experiments. Section 5 to Section 7 provide a detailed study of five fundamental dynamical systems that vary across order, interaction kernel form, and agent characteristics, as well as learning of interaction kernels that involve parameters. Finally, we conclude the paper and discuss various future research directions stimulated by these results in section 8.

The main focus of this work is to numerically investigate the capability of the estimators at predicting emergent behaviors of various collective dynamics using our extended learning approach. Our extended learning covers dynamical systems much more elaborate than those considered in [1]. These systems have a complex balance between non-collective and collective forces, include interaction laws depending on more than one variable, and allow for parametric families of interaction laws. We motivate these extensions with families of dynamical systems exhibiting these more intricate governing structures, which were motivated by various applications and whose dynamical properties have been studied in the scientific literature.

Here we consider particle- and agent-based systems that model rather general complex systems, beyond those considered in [28, 1, 29]. The first-order models are governed by the following system of coupled ODEs

image

These systems contain heterogeneous agents: the agents are partitioned into K different types, with  Ckcontaining the indices of the agents of type k, for k = 1, . . . , K. Table 1 shows the definitions of variables in (3).

image

Table 1: Notation for first-order models

Remark 2.1. Compared to Eqn. (8) in [1], our new equation (3) has the following additions: the new  ξivariable, non-collective forces  Fx, Fξ, and 2-dimensional interaction laws (as opposed to only single-variable, pairwise-distance-based interactions).

image

for i = 1, · · · , N. Table 2 shows the definitions of variables in (4). Natural regularity and growth assumptions on the functions in the right-hand side are made so that the system has a unique solution for all times. For example assuming that the functions involved are at least Lipschitz and decay sufficiently rapidly at infinity would suffice.

image

Table 2: Notation for second-order models

Remark 2.2. Compared to Eqn. (11) in [1], our new equation (3) has the following additions: slightly different non-collective forces,  F ˙x, Fξ, and 2-dimensional interaction laws.

image

We are given observation data, namely  {ymi(tl), ˙ymi(tl)}N,Mi,m=1(yi=�xiξi

for second order systems) at time instances  T0 = t1 < · · · < tL = T. In the case of missing derivative data, namely ˙ymi, we will approximate ˙ymiusing appropriate finite difference schemes. The observation data is generated from M initial conditions (ICs),  {(ymi(0))i}m, which are i.i.d samples from a (typically unknown) probability distribution  µy(µy=  µx⊕µξfor first order and  µy=  µx⊕µ ˙x⊕µξfor second order). The unknowns in these systems are the interaction laws and the distribution of the initial conditions, while everything else is assumed known. We construct estimators for  φxki,ki′ , φξki,ki′(resp.  φxki,ki′ , φ ˙xki,ki′ , φξki,ki′for second-order systems) that are close to the true interaction laws with high probability. Moreover, such estimators yield approximate systems, whose dynamics are approximations to the dynamics of the original system within the training time interval [T0, T], but can also provide approximations for emergent behaviors of collective dynamics, ranging from first-order opinion dynamics to second-order gravitational dynamics governing the planetary movement in our solar system. A key component of evaluating the emergent dynamics are appropriate measures of the presence of a specific emergent behavior, which will be discussed in 4.

Similar to the the algorithm presented in [1], the learning algorithm which we use for the more complex dynamics considered here starts from the introduction of suitable cost functions whose minimizers, over a suitable approximation space, determine the estimators. Equation (3) can be rewritten in a more compact form: � ˙X = f nc,x(X, Ξ) +  f φx(X, Ξ)

image

Here  X =�x⊤1 · · · x⊤N�⊤ ∈ RNd, Ξ =�ξ1 · · · ξN�⊤ ∈ RN; for the interaction kernels, we use the vectorized notations,  φx=  {φxk,k′ ∈ Hxk,k′}Kk,k′=1and  φξ = {φξk,k′ ∈ Hξk,k′}Kk,k′=1, and  f φx, f φξare the collection of the

corresponding right hand side terms in (3) respectively. Lastly,  f nc,x(X, Ξ) is defined as the vectorization of

the non-collective forces  Fx(xi, ξi) ∈ Rdand  f nc,ξ(X, Ξ) is defined as the vectorization of the non-collective

image

where the  ∥·∥S(·)norm is defined as

image

for  Z =�z⊤1 · · · z⊤N�⊤with each  zi ∈ Rd′ (d′ = dor 1). Here  ∥·∥is the same norm used in (3) and (4); Hx= �Kk,k′=1 Hxk,k′and  Hξ = �Kk,k′=1 Hξk,k′are finite-dimensional hypothesis spaces. We choose each of the hypothesis space  Hxk,k′be to a finite dimensional function space of piece-wise polynomials of degree p, with p = 0 or 1 (polynomials of higher degree can be used and other type of basis functions are also possible, e.g., clamped B-splines, see [1]), with polynomial pieces supported on intervals that form a uniform partition of the observed range of variables [Rxk,k′,min, Rxk,k′,max] × [Sxk,k′,min, Sxk,k′,max]. Hence, each  ϕxk,k′can be expressed in terms of the linear combination of the basis functions as follows

image

Similar definitions are used for each  Hξk,k′. Substituting this expression into the functionals above, the mini- mization becomes a set of linear equations,

image

image

Remark 3.1. In the case of missing ˙xi(t) (for first order system) or ¨xi(t) (for second order system), we will approximate it using an appropriate finite difference scheme. See Sec. 4.4 for details on how we setup the examples with or without derivative information.

In the case of the second-order dynamics described in (4), we introduce a new variable  vi(t) = ˙xi(t) ∈ Rdand let  V =�v⊤1 · · · v⊤N�⊤, a compact form of (4) is given as follows,

image

Here  H ˙x= �Kk,k′=1 H ˙xk,k′. By choosing appropriate finite dimensional hypothesis spaces for  Hx, H ˙xand  Hξ, e.g., piece-wise polynomials of degree p, we can simplify the least square problems down to the following linear systems which we solve to generate the necessary coefficients:

image

Here,  ⃗α = �(⃗αx)⊤ (⃗α ˙x)⊤�⊤with  ⃗αxbeing the collection of  αxk,k′,ηxk,k′’s and  ⃗α ˙xbeing the collection of α ˙xk,k′,η ˙xk,k′’s.

Remark 3.2. For further details regarding the construction of the learning matrices,  A, Aξ, and the right hand side vectors, ⃗b,⃗bξ, we refer the reader to Section 2 : Algorithm of the Supplementary Information of [1].

3.1 Computational Complexity

The learning approach, which is described in Sec. 3, can be easily parallelized in the m (number of initial conditions) variable. Although it takes MLD double-precision floating-point numbers (D = Nd + N for a first-order system, and D = 2Nd + N for a second-order system) to store the discrete trajectory data, each computing core j only needs to store  MjLDfloating-point numbers, with  Mj ≈ MNumber of Cores. Furthermore, each computing core does not need to hold all of the trajectory data in memory, since the assembly of the learning matrix and the right-hand-side vector needs only LD floating-point numbers (one system trajectory at a time). The sizes for the learning matrix and right hand side vector are:  n × nand  n ×1 (n = nxor n = nξfor a first-order system and  n = nx+  n ˙xor  n = nξfor a second-order system), respectively. Since we have  n2 ≪ LD, n2 ≪ MLD, which makes solving for our estimators extremely memory efficient. At each time instance, we have to compute the various pairwise variables, requiring  O(N 2) distance calculations, hence the algorithm performs a total of  O(MLN 2) computations of pairwise variables. In solving the linear system, it performs  O(n3) operations (or  O(n2log(n)), we take the worst cases scenario since we use the built-in pseudoinverse routine in MATLAB to avoid any possible issues with numerical stability). The total computational complexity is  O(MLN 2+  n3). Online learning can be built into our learning approach: as trajectory data from different initial conditions comes in, one can simply average the estimators from previous trajectory data with the estimators from the new trajectory data to obtain a better approximation.

We consider three different kinds of performance measures: how close the estimated interaction kernel(s) are to the true one(s), how well the trajectories of the system driven by the estimated interaction kernel(s) approximate the trajectories of the original system, and finally how well emergent patterns are reproduced/predicted in the system driven by the estimated interaction kernels. We use appropriate dynamics adapted measures, specified in section 4.1 and the Appendix.

4.1 Estimation error of interaction kernels

Following the definitions in [1], we introduce a set of probability measures to calculate the learning error between

image

ρE,k,k′T(for continuous trajectory) and  ρL,E,k,k′T(for discrete trajectory) are only used in the theoretical setting; in practice, we use  ρL,M,E,k,k′T(with large M and L) for actual implementations and applications. These measures depend on the dynamical system and the distribution of initial conditions, weighting the areas of pairwise distances (the variable r) and of variables  sxbased on how often trajectories of the system explore them.

Table 3 explains the definitions of the variables in (5).

image

Table 3: ρT’s, Definition of the Variables

In the case of  Nk,k′= 0, we define the corresponding  ρE,k,k′T (r, sx) to a zero function.

We measure the error of the interaction kernel estimators,  φxk,k′ − ˆφxk,k′, using the dynamics-induced weighted

image

However since  ρE,k,k′Tis not calculable, we use  ρL,M,E,k,k′Tinstead. The weight,  r2, comes from the governing structure of (3). Theoretical guarantees such as those in [1, 29] bound these errors, with high probability, as M grows. Extending those bounds to the general types of systems considered here will be investigated in future work. The results of our numerical experiments suggest that the learning rate, i.e. the rate of decrease of the error in (6), as a function of M is independent of the dimension of the state space of the system, and only depends crucially on the number of variables in the interaction kernel. The curse of dimensionality (of the state space) is therefore avoided.

4.2 Trajectory errors

We consider another performance measure, which might be estimated from data, especially when the true interaction kernel is not known, that quantifies the prediction capability of our estimators, by comparing the observed trajectories to the estimated trajectories evolved from the same initial conditions but using the estimated interaction laws. We will consider both X(t) =�x⊤1(t) · · · x⊤N(t)�⊤and  Ξ(t) =�ξ1(t) · · · ξN(t)�⊤for  t ∈ [T0, T]. Let  X[T0,T ] = {X(t)}t∈[T0,T ], then the following norm is used

image

Here ˆX[T0,T ]is the estimated trajectory using our estimators with the same initial condition as in  X[T0,T ]. The scaling by maxt∈[0,T ] ∥X(t)∥S(d)enables us to compare trajectory errors for different kinds of dynamics, especially those with large  ∥xi∥. Similarly,

image

For performance measures defined for �φξand the second order systems, please see sec. A in the appendix.

4.3 Confusion Matrix and Pattern Indicator Scores

When a system is highly sensitive on small perturbations, or even chaotic, it is hopeless to expect that the estimated system will produce trajectories that are accurate approximations of the trajectories of the original system, except perhaps for very small times. However we have observed that certain large-time aspects of the dynamics of the system, such as certain emergent behavior including flocking or milling or clustering, are preserved in our estimated system, even when the trajectory-wise errors are relatively large. On the one hand, this may seem surprising, as at no point do we inject any knowledge about such emergent behaviors, into our estimator; on the other hand if such emergent behaviors are thought of as being “structurally robust” to perturbations of the system, and even perturbations of the laws of the system (the interaction kernels), then it becomes reasonable to expect that our estimated systems should preserve, at least to some degree, such emergent behaviors. We therefore introduce a way of measuring quantitatively the presence of such emergent behaviors, and quantify the performance in reproducing them in our estimated systems.

In order to accurately describe the capability of our estimators to predict the correct emergent behaviors at large time  Tf ≫ T, we consider confusion matrices and “pattern indicator scores”. These are defined differently for each dynamical system to measure its unique emergent behavior.

Several aspects of the emergent behaviors that we are interested in are observables (i.e. functions defined on the state variables of the system). We define various emergent behavior scores, such as the flocking score, the milling score, etc., and choose a target range for the score to be in as an indicator of occurrence of the emergent behavior. For example, if the flocking score is within (0.99, 1], then flocking occurs. We calculate these scores on the true and estimated systems (systems with the same initial conditions as the true systems but evolved using the learned interaction law(s)). From this indicator of whether the emergent behavior occurred in the true/estimated system, we construct a confusion matrix, given as follows (in the case of learning flocking systems),

image

Table 4: General form of a confusion matrix. Each  pi,jshows a probability (represented in percentages) of the combination, e.g.,  p1,1is the probability of the predicted system (evolved using the estimated interaction laws) showing non-flocking behavior given that the true system shows non-flocking behavior with the same initial conditions.

It is used to present the probability of the occurrence of the desired emergent behaviors in the true and estimated systems. Namely, if the true systems exhibit flocking with high probability, then the estimated systems should ideally show flocking with similar probability.

In order to provide deeper insight about the prediction of emergent behavior via confusion matrices, we also consider the following statistics from the confusion matrix.

image

Table 5: Definition of the statistical terms related to the confusion matrix.

Next we use a more refined measurement, a so-called ‘pattern indicator score’, to further demonstrate the capabilities of the estimated system at predicting emergent behaviors. Besides the emergent behavior scores, there are other quantitative descriptions of the emergent behaviors, such as the center-of-mass velocity in flocking, the common rotational axis in milling, the conservation of total energy in concentric trajectories, etc. The pattern indicator scores use these, sometimes together with the previously defined emergent behavior scores, to measure how well the estimated systems are predicting these observables compared to the true systems. Details of the definition of the confusion matrices and pattern indicator scores for each dynamics are in Sec. 5 to Sec. 7.

4.4 Setup of the Numerical Experiments

Here we describe the general setup for the subsequent sections of experiments. The various dynamical systems we consider exhibit a wide variety of emergent behaviors: clustering, flocking, milling, synchronization, and concentric trajectories. Different forms of interaction kernels are also considered, i.e.,  φ(r), φ(r, s) and  φ(r; P), where P is an unknown vector of parameters. These dynamics range from first-order dynamics of homogeneous agents to second-order dynamics of heterogeneous agents. We arrange the examples in three major sections based on the different types of the interaction laws.

The experiments are setup as follows: we first run  Mρdifferent initial conditions generated i.i.d from the probability measure  µyfor initial condition, and evolve2the dynamics from 0 to T: the dynamics observed in [T0, T] is used to compute the probability measures  ρLT’s, which are empirical approximations to the prob- ability measures  ρT’s. We do this only to compute and report the  L2(ρT) approximation errors; in practice this step is not required nor needed. Next, we generate another set of M random initial conditions and corresponding trajectories of the dynamics for  t ∈[0, T], with each dynamics observed at L equidistant times T0 = t1 < t1 < · · · < tL = T, producing the observation data, i.e.  {yi(tl)m}N,L,Mi,l,m=1, without the corresponding derivative information (i.e., ˙yi(tl)mis not given, except for Synchronized Oscillator Dynamics and Gravitational Dynamics), as input to our estimation procedure. We construct the hypothesis spaces, where the estimators are found, on the learning intervals, e.g. [rk,k′min, rk,k′max] × [sk,k′min, sk,k′max], derived from the observation data; the numbers of basis functions, as well as their degrees, are reported in each section. We report the  L2(ρT) errors between the estimated and true interaction kernels, as well as the trajectory errors based on the statistics over the training set and over a testing set (with new initial conditions), in the form of (mean value)±(standard deviation). Then we consider the emergent behavior of the true dynamics and the predicted dynamics at  Tf ≫ T, and evaluate “pattern indicator scores” and confusion matrices corresponding to the various kinds of emergent behaviors. The parameters used by all experiments are reported in table 6.

image

Table 6: Common Parameters

Each section/subsection is presented in a similar manner: we introduce the model and discuss why such model interesting for our learning approach; then, we relate the model equation to the learning paradigm presented in (3) and (4). Next, we present our learning results in figures and tables, in terms of approximation error, trajectory error, confusion matrix and pattern indicators. We end with a brief discussion of the learning results.

We consider here three prototypical types of emergent behavior: clustering, flocking and milling. We examined four different examples of collective dynamics in order to comprehensively explore all three types of emergent behaviors, with very different dynamical behaviors.

5.1 Opinion Dynamics

The opinion dynamics (OD) model, first introduced in [2], is a prototypical first-order model of homogeneous agents which describes the interaction of people’s opinions through time, see details and extensions in [3, 4, 5, 30, 31, 32]. These models have gained popularity in modeling human’s social behavior, and they can be used to predict interesting social phenomena, namely, clustering/consensus of opinions.

The governing equations (xi ∈ Rdbeing a vector of opinions) are:

image

Here  φx(r) ≥0 for all  r ≥0. With the interaction kernels giving attractive influences only, these models are bound to have clusters of opinions at large time. Table 7a shows how this dynamical system is mapped to the general form (3). The parameters used for setting up the experiment used are shown in table 7b.

image

Table 7: Opinion Dynamics (OD)

We consider the following interaction law,

image

Piece-wise constant polynomials with  nx= 99 basis functions are used to approximate  φx. The comparison of the true  φxand the estimated ˆφxis shown in Fig.1.

image

Figure 1: (OD) Comparison of  φxand ˆφx, with the relative error being 1.4 ·10−1 ± 2 ·10−2(calculated using (6)). The true interaction kernel is shown in black solid line, whereas the mean estimated interaction kernel is shown in blue solid line with its confidence interval shown in blue dotted lines. Shown in the background is the comparison of approximated  ρLTversus the empirical  ρL,MT.

As it is shown in Fig. 1, not only can our estimator detect the discontinuity in the  φ, but also can it detect the compact support of  φ. Meanwhile, there is higher uncertainty in learning the interaction kernel at r = 0 (the information of  φx(0) is lost since it is weighted by corresponding  ri,i′) and at those discontinuity points. Since  φxis non-negative, the agents in the system would eventually converge to clusters, this decreases the effective number of pairwise distance data for inferring  φx. However, we are still able to provide an accurate estimator of  φxby the continuity of the estimator. The comparison of a trajectory driven by the true  φxversus the other one driven by the estimated ˆφxis shown in Fig. 2: there is no major visual difference between the true and predicted trajectories (generated from the training initial condition); the differences are quantified in table 8.

image

Figure 2: (OD) Comparison of X and ˆX, with the errors reported in table 8. The first row of trajectories are generated from an initial condition taken from the observation data. The second row of trajectories are generated from another randomly chosen initial condition. The first column of trajectories are generated from the true interaction kernel, whereas the second column of trajectories are generated from our estimated kernel with the same initial conditions. The color of the trajectory indicates the flow of time, from deep blue (at t = T0) to light green (at  t = Tf).

image

Table 8: (OD) Trajectory Errors: Initial Conditions (ICs) used in the training set (first two rows), new ICs randomly drawn from  µx(second set of two rows). The trajectory errors is calculated using (7).

The confusion matrix and pattern indicator scores used to examine the capability of our estimators predicting the proper emergent behaviors associated with the Opinion Dynamics model are defined as follows. First, a confusion matrix is used to show the accuracy of our estimator to display the same clustering behavior as the true systems, see the results in table 9.

image

Table 9: (OD) Confusion Matrix: ICs used in the training set (first two rows), new ICs randomly drawn from µx(second set of two rows). It is generated using table 4.

We provide more statistics about the confusion matrix in order to understand our prediction of clustering better in table 10.

image

Table 10: (OD) Confusion Matrix Statistics: ICs used in the training set, new ICs randomly drawn from  µx. It is generated using table 5.

Next, when the true system has clustering, we want to know if the predicted system can have the same number of clusters as the true system has. Hence, we assign a score of 1 when the predicted system shows the same number of clusters as the true systems; and a score of 0 when it predicts the wrong number of clusters. PI1is the average of those scores over M trials. Lastly, we want to compare the clusters between the true and predicted systems. Let C contain the centers of the clusters at time  T 3from the true system, ˆC contain the centers of clusters from the estimated system; we shall use Hausdorff distance to calculate the distance between C and ˆC. PI2is the average of M trials of such distances. See table 11 for details.

image

Table 11: (OD) Pattern Indicator Scores: ICs used in the training set (first two rows), new ICs randomly drawn from  µx(second set of two rows).

The smaller PI1is, the better we are at predicting the number of clusters right. Meanwhile, not only could we predict the number of clusters with high confidence, but also could we predict the actual location of the clusters.

5.2 Cucker-Smale Dynamics

Modeling how animals (or other living agents) move in a cohesive group formation has been a challenging and well-studied problem [6, 33, 34], [9, 35, 12], [36, 37, 15]. There are different degress of cohesion in a collective system: flocking (where each agent shares a common velocity), milling (where each agent rotates around the same axis or about the same oint), and swarming (transition state between flocking and milling). We first consider the simplest cohesion in a collective system, namely flocking (see detailed work in [38, 39, 8, 40, 41, 34, 42], its mean field limit in [43, 44, 45], and extension to a stochastic system in [46] and references therein), and investigate the learnability of these flocking systems.

The Cucker-Smale (CS) dynamics is one of the prototypical examples of flocking agents4. Its governing equations are

image

Here  ai,i′(X) =  H ·(1 +  ∥xi′ − xi∥2)−βwhere  H, βare chosen parameters. Table 12 shows how this dynamical system is mapped to the general form (4).

image

Table 13: (CS) Parameters for Experiment Setup

With certain choices of H and  β, the Cucker-Smale system is guaranteed to produce flocking (where all agents have the same final velocity) see [38]. For example, when  β < 12, the system is guaranteed to have flocking regardless of initial conditions; when  β = 12, the system has conditional flocking depending on the initial configuration of velocities; when  β > 12, the system has conditional flocking depending on the initial configuration of both positions and velocities.

We consider the following interaction law,

image

With this interaction kernel, the agents are guaranteed to flock (see theorem 2, 3 in [38]). We use the following parameters in table 13 to set up the experiment. Piece-wise linear polynomials with  n ˙x= 100 basis functions are used to approximate  φ ˙x. The comparison of the true  φ ˙xand the estimated ˆφ ˙xis shown in Fig.3.

image

Figure 3: (CS) Comparison of  φ ˙xand ˆφ ˙xtogether with a plot of  ρLT, ˙rversus  ρL,MT, ˙r, with the relative error being 4.7 ·10−3 ± 2 ·10−4(calculated using (19)). The true interaction kernel is shown by in a black solid line, whereas the mean estimated interaction kernel is shown in a blue solid line with its confidence interval shown in blue dotted lines. Shown in the background is the comparison of approximated  ρLTversus the empirical  ρL,MT.

As it is shown in Fig. 3, our learning approach produce faithful approximation to  φ ˙x, especially capturing the tail behavior of the original interaction law, notwithstanding the scarcity of samples in that region of pairwise distances and speeds; towards r = 0, the estimated kernel is also close to the true kernel. The comparison of true trajectory X(t) and learned ˆX(t) is shown in Fig. 4.

image

Figure 4: (CS) Comparison of X and ˆX, with the errors reported in table 14. The first row of trajectories are generated from an initial condition taken from the observation data. The second row of trajectories are generated from another randomly chosen initial condition. The first column of trajectories are generated from the true interaction kernel, whereas the second column of trajectories are generated from our estimated kernel with the same initial conditions. The color of the trajectory indicates the flow of time, from deep blue (at t = T0) to light green (at  t = Tf).

Fig. 4 shows no visual difference between the true trajectories and the learned trajectories (for the training initial condition and a randomly chosen initial condition), we provide a quantitative insight into the difference between trajectories in table 14.

image

Table 14: (CS) Trajectory Errors: Initial Conditions (ICs) used in the training set (first two rows), new ICs randomly drawn from  µx(second set of two rows). The trajectory errors in x/v is calculated using (7)/(8).

We consider the Flocking score (at  t = Tf) taken from [6],

image

When  Iflock= 0, perfect flocking occurs; however we use  Iflock < 0.1 to indicate flocking. The prediction capability of the estimated systems in the form of confusion matrix is reported in table 15.

image

Table 15: (CS) Confusion Matrix: ICs used in the training set (first two rows), new ICs randomly drawn from µx(second set of two rows). It is generated using table 4.

With  β= 14 < 12, the true system is guaranteed to show flocking. Since we have no control over when the flocking would occur, we use  Iflockto help us capture the essence of flocking behavior, i.e.  Iflock= 0 (or close to 0). And our estimated system can capture the same behavior (flocking or not) with high probability. We provide more statistics about the confusion matrix in order to understand our prediction of clustering better in table 16.

image

Table 16: (CS) Confusion Matrix Statistics: ICs used in the training set, new ICs randomly drawn from  µx. It is generated using table 5.

In order for us to provide more quantitative insight into the predication capability of our estimator for the case of flocking, we consider two different pattern indicator scores. First, PI1is the relative error of  Iflockbetween true and predicted systems, averaged over M trials. Second, we consider another important quantity, the center of mass velocity,  vCM. It is given by  vCM=

for i = 1, · · · , N), vCM= 1N�Ni=1 vi(Tf). Then we define PI2to be the relative error of the predicted center of mass velocity and true center of mass velocity averaged over M trials. The scores are reported in table 17.

image

Table 17: (CS) Pattern Indicator Scores: ICs used in the training set (first two rows), new ICs randomly drawn from  µx(second set of two rows).

As it is shown in table 17, our estimated system can predict  Iflockwith relatively high accuracy. Surprisingly, our estimated system can reproduce  vCMdown to numerical accuracy.

5.3 Fish Milling in 2 dimensions

Next we consider a more complicated cohesive collective system: a dynamical system which produces milling patterns, where each agent rotates around the same axis or about the same point. The models we consider have been proposed in [9, 12] (see references therein for a variety of sources for the biological roots of these models). Useful background references for the two-dimensional models are [47, 10] as well as the primer [48]. Further theoretical study of models of this type has been done in [49, 50, 11].

The governing equations of the Fish Milling Dynamics in  R2(FM2D) of [9] are,

image

Here,  Uiis the Morse potential describing the interaction of the  ithagent with the other agents in the system, defined as follows

image

Here  Ca/Crare the attraction/repulsion strengths and  ℓa/ℓrare the effective attraction/repulsion lengths. Table 18 shows how the FM2D dynamics fits into the framework of (4).

image

Table 19: (FM2D) Parameters for Experiment Setup

The delicate balance between the self-propelling force produced by  Fx(xi, ˙xi) and the collective force induced by the energy kernel  Uican create a wide range of patterns for different initial conditions. Milling patterns (single or double milling) are one of the most interesting ones. Unlike the well-understood Cucker-Smale model, necessary and sufficient conditions on the interaction kernels and ICs that guarantee the existence milling patterns seem to be unknown. These milling patterns result from the balance of the non-collective force and the collective force induced by the energy kernel  Ui(especially when  Uiis not H-stable, double-milling would occur, see Fig. 1 in [9]), and are therefore rather sensitive to the selection of parameters: relatively small differences in the interaction laws can correspond to dynamical systems with very different dynamical patterns. The estimator error between the true and estimated interaction kernel may therefore offer little insight information into how well our estimated dynamics can re-produce milling patterns at large time. The confusion matrix and pattern indicator scores are finer indicators of performance in this case.

We consider the following interaction law,

image

With this setup, a double-milling pattern appears 100% of the time (see [12]). The other parameters are reported in table 19. We use piecewise constant polynomials with  nx= 122 basis functions to approximate  φx. The comparison of the true  φ ˙xand the estimated ˆφ ˙xis shown in Fig.5.

image

Figure 5: (FM2D) Comparison of  φxand ˆφx, with the relative error being 6.0·10−2±2·10−3(calculated using (6)). The true interaction kernel is shown in black solid line, whereas the mean estimated interaction kernel is shown in blue solid line with its confidence interval shown in blue dotted lines. Shown in the background is the comparison of approximated  ρLTversus the empirical  ρL,MT.

As it is shown in Fig. 5, our estimator closely resembles  φx, however when r is close to 0, there is a sharp drop of  φxto  −∞, the availability of r data close to 0 becomes scarcer, and since we are using a uniform basis to approximate  φx, the difference between  φxand ˆφxis apparent in this range. The comparison of the true trajectory X(t) and learned ˆX(t) is shown in Fig. 6.

image

Figure 6: (FM2D) Comparison of X and ˆX, with the errors reported in table 20. The first row of trajectories are generated from an initial condition taken from the observation data. The second row of trajectories are generated from another randomly chosen initial condition. The first column of trajectories are generated from the true interaction kernel, whereas the second column of trajectories are generated from our estimated kernel with the same initial conditions. The color of the trajectory indicates the flow of time, from deep blue (at t = T0) to light green (at  t = Tf).

Our predicted system can still estimate the position/velocity of the agents in large time, i.e., for  t ≫ T, with relatively small error, around 10−2. Moreover, when the dynamics enters its milling state, our predicted system is also in the same milling state. We provide a quantitative insight into the difference between trajectories in table 20.

image

Table 20: (FM2D) Trajectory Errors: Initial Conditions (ICs) used in the training set (first two rows), new ICs randomly drawn from  µx(second set of two rows). The trajectory errors in x/v is calculated using (7)/(8).

We are getting 10−1relative accuracy for estimating positions/velocities within the learning time interval (i.e. [0, T]); however, as time goes on, we can see roughly linear growth of the errors (Tf= 5T).

We consider the center of mass position  xCM(t) =

xCM(t) = 1N xi(t). We consider the indicator score  Is(at  t = Tf), Is = Iflock − Imill, where  Iflock, Imillare taken from [9]. Here, the flocking score  Iflockis defined as,

image

Again  Iflock= 1 when perfect flocking occurs. Then we consider the the milling score  Imillas follows,

image

When  Imill= 1 when perfect milling5(around the same axis) occurs; meanwhile  Iflock= 0 if  Imill= 1. Therefore Is ∈ [−1,1]. As suggested by the thresholds in [12], we use the case,  Is ≤ −0.5, to indicate milling. The true systems always show milling pattern (in fact, it shows double milling), and 100% of our estimated systems also show milling.

Next for the pattern indicator scores, let PI1be the relative error of  Isover M trials. And PI2is the relative error between the pair (Iflock, Imill) (in  ℓ2norm) over M trials. The scores are reported in table 21.

image

Table 21: (FM2D) Pattern Indicator Scores: ICs used in the training set (first two rows), new ICs randomly drawn from  µx(second set of two rows).

Milling patterns in dynamics are very delicate. The intricate balance  α/βand the H-stability of  Uidecides the appearance of milling in a dynamics, especially when  Uiis not H-stable for double milling patterns. In the case of the true dynamics showing milling (to be exact, double milling), our predicted systems can capture the same behavior (with high accuracy) both in terms of  Isand the pair, (Iflock, Imill).

5.4 Fish Milling in 3 dimensions

Next, we consider a cohesive collective dynamics in 3D of self-propelled particles within a fluid environment, introduced in [12]. It is a more complicated 3D extension of the FM2D model, where agents could experience self-propelling force in a fluid.

The governing equations of the Fish Milling Dynamics in  R3(FM3D) are,

image

Here, u is the lab-frame fluid velocity generated at position  xi, −γ( ˙xi − u(xi)) gives the drag force (γ >0), FM( ˙xi, u(xi)) represents the self-propelling motility force, and  −∇xiUiis the agent-to-agent interaction force on agent i, and the energy potential  Uiis the same Morse potential defined in sec. 5.3.  FMis defined as follows

image

The parameters,  α, β >0, give the self-acceleration and deceleration, respectively; 0  ≤ λ ≤1 is a perception coefficient, with  λ= 0 showing a “clear” fluid (and it gives the classical Rayleigh-Helmholtz friction), and  λ= 1 for an “opaque” fluid; and the lab-frame fluid velocity u is given as follows

image

Table 23: (FM3D) Parameters for Experiment Setup

The delicate balance between the self-propelling force (in the presence of a fluid environment) and the collective force induced by the energy kernel  Uican create a wide range of patterns for such dynamics. And the H-stability of  Uiis the key at producing milling patterns. Again, we want to understand the milling pattern and predict such a pattern with our estimators when the true system shows milling.

We consider the following interaction law,

image

We also use the parameters in table 23 to set up the experiment. Piece-wise linear polynomials with  nx= 74 basis functions are used to approximate  φx. The comparison of the true  φ ˙xand the estimated ˆφ ˙xis shown in Fig.7.

image

Figure 7: (FM3D) Comparison of  φxand ˆφx, with the relative error being 1.49 ·10−1 ± 3.4 ·10−3(calculated using (6)). The true interaction kernel is shown in black solid line, whereas the mean estimated interaction kernel is shown in blue solid line with its confidence interval shown in blue dotted lines. Shown in the background is the comparison of approximated  ρLTversus the empirical  ρL,MT.

As it is shown in Fig. 7, our estimator, ˆφx, deviates from  φxfor r close to 0, for similar reasons as those discussed for the 2D case. The comparison of the true trajectory X(t) and learned ˆX(t) is shown in Fig. 8.

image

Figure 8: (FM3D) Comparison of X and ˆX, with the errors reported in table 24. The first row of trajectories are generated from an initial condition taken from the observation data. The second row of trajectories are generated from another randomly chosen initial condition. The first column of trajectories are generated from the true interaction kernel, whereas the second column of trajectories are generated from our estimated kernel with the same initial conditions. The color of the trajectory indicates the flow of time, from deep blue (at t = T0) to light green (at  t = Tf).

A 3D milling pattern is more complicated than its 2D counterpart. In the case of our experiments, some of the trajectories show a pattern of rotation about a fixed point. And our estimated dynamics also shows similar behavior. We provide a quantitative insight into the difference between trajectories in table 24.

image

Table 24: (FM3D) Trajectory Errors: Initial Conditions (ICs) used in the training set (first two rows), new ICs randomly drawn from  µx(second set of two rows). The trajectory errors in x/v is calculated using (7)/(8).

image

We consider the center of mass velocity  vCM(t) =

image

Here, the flocking score  Iflockis defined as,

image

next,

image

When  Imill= 1 when perfect milling (around the same axis) occurs; meanwhile  Iflock= 0 if  Imill= 1. Therefore Is ∈ [−1,1]. As suggested by the thresholds in [12], we use the case,  Is ≤ −0.5, to indicate milling, see the results in table 25.

image

Table 25: (FM3D) Confusion Matrix: ICs used in the training set (first two rows), new ICs randomly drawn from  µx(second set of two rows). It is generated using table 4.

The true FM3D systems show a non-milling6pattern. Furthermore, our predicted systems show a exceptionally similar probability of displaying the non-milling patterns.

We provide more statistics about the confusion matrix in order to understand our prediction of milling behavior better in table 26.

image

Table 26: (FM3D) Confusion Matrix Statistics: ICs used in the training set, new ICs randomly drawn from µx. It is generated using table 5.

Next, we use the pattern indicator scores to probe deeper into the actual large-time behavior of our predicted systems. PI1is the relative error of  Is. And PI2is the relative error of predicting the pair (Iflock, Imill) (in  ℓ2norm). The scores are reported in table 27.

image

Table 27: (FM3D) Pattern Indicator Scores: ICs used in the training set (first two rows), new ICs randomly drawn from  µx(second set of two rows).

Not only can we predict  Iswith relatively high accuracy, but also can we offer insight into the actual pair of scores, (Iflock, Imill).

The flexibility of the learning algorithm given in [1] allows for a generalization of the dynamical system where the interaction kernels can depend on more than just one variable, i.e., more than just r (the pairwise distance data). For example, in modeling the movement of groups of animals, field of vision can affect how individuals influence each other; in synchronized fireflies, not only can the fireflies form spatial patterns, their light-emitting states can be also be locked in synchronization. We consider an important example in [51] which models how oscillators can sync and swarm together, hence the interaction kernels depend on both r and  ξ(the pairwise difference in phases). Further study of this type of model has been done in [52, 53, 54, 55], a review with applications to computation is given in [56], and a historical review of the development of the synchronization models can be found in [57]. These authors sought to develop a plausible model that could explain systems where a phase or real-valued feature affects the motion – and vice versa. They called such systems “swarmalators” to emphasize the combined behavior of swarming and synchronized oscillation of phases in the system. For the Synchronized Oscillator Dynamics (SOD), each agent is indexed by  i, ξiis its phase,  xiis (as usual) its position,  ωiis the fixed natural frequency,  viis the fixed self-propulsion velocity. The dynamics of  xiand ξiare governed by the following equations, �˙xi = vi+ 1N�Ni′=1� xi′−xi∥xi′−xi∥(A + Jcos(ξi′ − ξi)) − B xi′−xi∥xi′−xi∥2�

image

Table 28 shows how the SOD dynamics fits into the framework of (3).

image

Table 29: (SOD) Parameters for Experiment Setup

With certain choices of A, J, B and K, the SOD dynamics is going to produce either a static or a non-static spatial pattern with either phases in sync or out of sync (a total of 5 different states, see [51] for details). We consider the following interaction law,

image

Here K and J are changing and we take  sx=  sξ=  ξ(the pairwise difference in the phases, i.e.,  ξi′ − ξi). We consider a particular set of (J, K) values, i.e. (J, K) = (0.1, 1), which gives a static synchronous state. In table 29 we describe the other parameters that we use to set up the experiment. Here we use piecewise linear polynomials with  nx= 900 (with 30 basis functions in each dimension) and  nξ= 900 (with 30 basis functions in each dimension) basis functions to approximate  φxand  φξ. The comparison of the true  φ ˙xand the estimated ˆφ ˙xis shown in Fig.9.

image

Figure 9: (SOD) The true interaction laws are shown in black, and the mean estimated interaction laws are shown in blue.

As is shown in Fig. 9a and Fig. 9b, even with the interaction laws being 2-dimensional, we can still infer from the data with around 10−1relative accuracy with relatively small number of basis functions. A comparison of the true trajectory X(t) and learned ˆX(t) is shown in Fig. 10.

image

Figure 10: (SOD) Comparison of X and ˆX, with the errors reported in table 30. The first row of trajectories are generated from an initial condition taken from the observation data. The second row of trajectories are generated from another randomly chosen initial condition. The first column of trajectories are generated from the true interaction kernel, whereas the second column of trajectories are generated from our estimated kernel with the same initial conditions. The color of the trajectory indicates the flow of time, from deep blue (at t = T0) to light green (at  t = Tf).

A visual comparison of the trajectories between the true and estimated dynamics shows that the difference is small. An more quantitative insight into the difference between trajectories is provided in table 30.

image

Table 30: (SOD) Trajectory Errors: Initial Conditions (ICs) used in the training set (first two rows), new ICs randomly drawn from  µx(second set of two rows). The trajectory errors in  x/ξis calculated using (7)/(9).

The Synchronized Oscillator dynamics is a complex dynamical system with  xiand a periodic  ξiinteracting with each other within the agents themselves and also collectively among the other agents; however we are still able to maintain 2-digit relative accuracy in reproducing the trajectories, and in predicting the large-time behavior of the trajectories, the errors do not grow exponentially. We are considering the static synchronous state, hence we check the distribution of the phases at time  Tfto see if the phases are in sync. In particular, we use the mean and the variance of the phases at time  Tf. If the variance of  {ξi(Tf)}Ni=1is smaller than 0.01, we say that the dynamics is in static synchronous state. Since the true systems always show synchronous state, our estimated systems also shows synchronization of phases with the same certainty.

Next, we use the following pattern indicator scores to discuss the quantitative predication performance of our estimators. PI1is the relative error of the variance of the phases at time  Tf, and PI2is the relative error of the mean of the phases at time  Tf. The scores are reported in table 31.

image

Table 31: (SOD) Pattern Indicator Scores: ICs used in the training set (first two rows), new ICs randomly drawn from  µx(second set of two rows).

Our estimated systems display exactly the same synchronization behavior as the true system (variance of the phases is exactly 0). Meanwhile, we can reproduce the final synchronized phase with relative high accuracy, however it comes with a big uncertainty.

image

In recent years, there has been rapid growth in developing algorithms (either theoretical or numerical) to identify the governing equations of physical systems based on observed data. A notable collection of these approaches assume a parametric form of the equations to perform various kinds of regression, usually sparse regression against an enormous library of standard mathematical functions, to fit the parameters [25, 58, 59]. Other approaches use force-based, statistical mechanics, and multiscale methods – see the works [60, 61, 37, 27, 24]. Our non-parametric learning approach can also be used to discover the elaborate structure of the true interaction law, i.e.,  φ(r) =  φ(r; P), where  P =�p1 · · · pk�is a vector of parameters. In many settings,  φcan be written as  φ(r; P) =  J(P )φm(r), where  J(·) might offer physical insight through its effect on the parameters. In this paper, we will focus on the case when P is 1-dimensional, i.e., a family of one-parameter interaction laws.

We consider a simplified planetary movement in our solar system (GSS) as a second order collective dynamical system example with parametric interaction laws. We take  xi(t) ∈ R2or  R3as the position of each planet (only the planets in the inner-solar system are considered, i.e., Mercury, Venus, Earth and Mars, hence N = 5). Their positions are governed by the following form of Newton’s Law,

image

˜mIiis the inertia mass of the  ithastronomical object (AO), and ˜mGiis the gravitational mass of the corresponding AO. In our setting we will assume that they are the same, hence (12) can be simplified to,

image

Here ˜miis the unknown mass of the  ithAO, and G = 6.67408  ·10−11m3kg−1s−2is the gravitational constant (known to the algorithm). There are a total of 5 different types of agents (each AO is of its own type) in this system, and the true interaction laws are

image

Here, the  φxk,k′is parameterized by  J(p1) =  Gp1with  p1= ˜mk′. Table 32 shows how the GSS dynamics fits into the framework of (4).

image

Table 33: (GSS) Parameters for Experiment Setup

We also use the parameters in table 33 to set up the experiment. These parameters are based on simple astronomical features of the system and are used for simulation of the dynamics and getting an appropriate and realistic number of observations. We use piecewise linear polynomials with  nEk,k′= 100 for  φxk,k′when  k ̸= k′, and piecewise constant polynomials with  nEk,k= 1 for  φxk,k. Each AO is given an index as follows: the Sun is assigned an index 1, and depending on the distance from the Sun, the index is increased gradually, and stopping at 5 for Mars. We use the following units: 1 day for time scale, 106km for length scale, and 1024kg for mass scale. The gravitational constant G becomes 8.642 · 6.67408  ·10−6(106km)3(1024kg)−1day−2. We also use the following data from NASA in table 34.

image

Table 34: (GSS) NASA Data for Each AO

The initial position distribution for the astronomical objects,  µx, is constructed as follows: the Sun is always placed at the origin, whereas the planets are randomly placed on ellipses with their corresponding perihelion and aphelion data, and the Sun is sitting at one of the foci (Sun is the common focus of all initial elliptical trajectory). We construct a distribution,  µ ˙x, which gives the initial velocities for the astronomical objects as follows: the Sun always has zero initial velocity, whereas the planets will have their initial velocity depending on their initial position and satisfying the Vis-Viva equation (see [13] for details). The comparison of the true φxk,k′’s and the estimated ˆφxk,k′’s is shown in Fig.11.

image

Figure 11: (GSS) Comparison of  φxk,k′’s and ˆφxk,k′’s , the relative errors are reported in table 35. For example, φx1,2on cell (1, 2) of the plot represents the true force Mercury has on Sun. Within each sub-plot, the true interaction kernel is shown in black solid line, whereas the mean estimated interaction kernel is shown in blue solid line with its confidence interval shown in blue dotted lines. Shown in the background of each sub-plot is the comparison of approximated  ρL,k,k′T,r(in lighter color) versus the empirical  ρL,M,k,k′T,r(in darker color).

We inferred a total of  N 2= 25 different interaction laws all together from the observation data. As shown in Fig. 11, the interactions from planets on the Sun and the Sun on planets are estimated with high accuracy, however the estimated inter-planet interactions offer little valuable insight of the original interactions. This is likely driven by the domination of the sun in terms of effect on the dynamics – due to its mass. The effect of the Sun’s mass creates a form of ill-posedness of the system which affects the accuracy of our estimation. Realizing the possibility of a parametric form of the interaction laws, we go through a delicate decoupling procedure detailed in 7.1, and produce a cleaned up version of ˆφxk,k′’s, shown in Fig. 12.

image

Figure 12: (GSS) Comparison of  φxk,k′’s and cleaned-up ˆφxk,k′’s , the relative errors are reported in table 36. Similar layout and setup as in Fig. 11.

As shown in Fig. 12, we are able to de-noise the original estimators and obtain a much cleaner presentation of the interaction laws. Relative  L2(ρT)-errors for each  φxk,k′are provided in tables 35 and 36 in order to re-affirm our claim.

image

Table 35: (GSS) Relative errors for the estimators, ˆφxk,k′(calculated using (6)).

image

Table 36: (GSS) Relative errors for the cleaned-up estimators, ˆφxk,k′(calculated using (6)).

The comparison of true trajectory X(t) and ˆX(t) is shown in Fig. 13.

image

Figure 13: (GSS) Comparison of X and ˆX, with the errors reported in table 37. The first row of trajectories are generated from an initial condition taken from the observation data. The second row of trajectories are generated from another randomly chosen initial condition. The first column of trajectories are generated from the true interaction kernel, whereas the second column of trajectories are generated from our estimated kernel with the same initial conditions. The color of the trajectory indicates the flow of time, from t = 0 to t = 4T; and each AO uses a different set of colors, as given by the color bar on the right.

image

Table 37: (GSS) Trajectory Errors: Initial Conditions (ICs) used in the training set (first two rows), new ICs randomly drawn from  µx(second set of two rows). The trajectory errors in x/v is calculated using (7)/(8).

Since the conservation of the sum of gravitational potential energy and kinetic energy of each planet produces the elliptical orbits around the Sun, we will consider the conservation of total energy (as the sum of gravitational potential energy and kinetic energy) of each planet and Sun as the emergent behavior. The total energy for each planet at time t is calculated as,

image

ri,1(t) =  ∥xi(t) − x1(t)∥and  si = ∥vi(t)∥. Then we consider the variance and mean of the total energy

(associated to each planet) over time, i.e.,

image

When  EVari <10−2for i = 2, · · ·5, we consider the total energy to be conserved. Not surprisingly, with the total energy of the true system being always conserved, and with the predicted positions as well as their corresponding velocities of each AO estimated with about 10−2relative errors, 100% of the estimated systems show conservation of total energy. We also consider a set of Pattern Indicator scores to quantitatively measure the capability of our estimators to predict limit cycles correctly for GSS. PI1measures the relative errors between the energy variance from the true system and the predicted system over M trials. And PI2measures the relative errors between the mean energy from the true system and the predicted system over M trials. The scores are reported in table 38.

image

Table 38: (GSS) Pattern Indicator Scores: ICs used in the training set (first two rows), new ICs randomly drawn from  µx(second set of two rows).

Notice that the original system has its total energy variance being close to zero, and we are able to reproduce the total energy variance which is close to zero; moreover, the total energy of the predicted system for each planet resembles closely of its counterpart in the true system.

We have studied the Solar system as an interacting agent-based systems where each agent representing a different type due to the mass-based gravity. However, it is clear that there is only one underlying interaction law with an associated parameter (the mass) for each agent. Our learning approach performs well without any knowledge of this structure and produces each pairwise interaction as a different function. But in fact, they are a family of functions parameterized by one single parameter, which is the mass of each agent. Therefore, in this subsequent section, we proceed to show that the learned functions are close to the known gravitational kernel and that we can discover the underlying masses using an appropriate decoupling procedure.

7.1 Discovery of the Parametric Form

Having examined the behaviors of ˆφx1,k′and ˆφxk′,1(for  k′= 2, · · · , N) closely, we observe an interesting behavior of our estimators, which is that ˆφx1,k′and ˆφxk′,1(for  k′ ̸= 1) behave roughly the same, except at different scales. Such behavior prompts us to consider a single-parameter parametric structure of ˆφxk,k′’s, i.e.,

image

Remark 7.1. We do not assume any particular form of ˆφxm(r), except that ˆφxm(r) being continuous.

In fact, the original gravitational interaction kernels are parameterized by  G ˜mi′, i.e.  φxk,k′(r) =  G ˜mk′ · 1r3.

Remark 7.2. The gravitational constant G represents the length and time scales on which the experiment is conducted, and it will not be identifiable by our decoupling procedure. Therefore, we assume that G is known. In fact, the first implicit measurement of G with about 1% accuracy is attributed to Henry Cavendish in the Cavendish experiment performed in 1797  −1798, and the result was published in Philosophical Transactions of the Royal Society. Using the estimated G, with the radius of Earth first calculated by the Greek mathematician Eratosthenes in approximately 230 BC, and the gravitational acceleration,  g ≈ 9.8m/sec2, determined by Galileo in the 16thcentury, one can calculate the mass of the Earth, by connecting Newton’s second law and universal law of gravitation, to get  MEarth= 5.98 ·1024kg.

Since ˆφxk,k′ ≈ φxk,k′(for k = 1 or  k′= 1), we want to decouple  βk′and ˆφxm(r) from ˆφxk,k′through a three- step optimization procedure. First, we consider a sequence of points  {rq}Qq=1from the supports of ˆφx1,k′for

k′= 2, · · · N (rq’s are taken as the centers of the sub-intervals where the basis functions are built), and the following loss function,

image

f1is minimized over  βk′ ≥0 for  k′= 1, · · · , Nand ˆφxm(rq) ∈ Rfor q = 1, · · · , Q. We only keep portion of the minimizer, namely,  {ˆφx,∗m(rq)}Qq=1, due to the fact that the Sun related terms have significantly more dominance in  f1. Second, we extend the discrete values of  {ˆφx,∗m(rq)}Qq=1to a continuous function, and express ˆφxmas a linear combination of basis functions  ψη(clamped B-spline functions of degree 2) over the interval [R1, R2], where

image

Hence,

image

Then, we do a regularized least square fit to  {ˆφx,∗m(rq)}Qq=1, using the following loss function,

image

Here we take  λ= 10−3. The result of extending the discrete points to a continuous function is shown in Fig. 14.

image

Figure 14: (GSS) Extension from discrete  {ˆφx,∗m(rq)}Qq=1’s to a continuous ˆφxm. The 1r3line is shown as a reference line and it is not used in the learning of  {ˆφx,∗m(rq)}Qq=1’s nor the extension procedure.

image

function,

image

The re-scaling by �Qq=1(ˆφxk,1(rq))2dρL,M,x,k,1T (rq) and �Qq=1(ˆφx1,k′(rq))2dρL,M,x,1,k′T (rq) is to keep all terms balanced, and in this particular instance it especially counters the dominance of the mass of the Sun, whose mass takes up more than 95% of the mass of the whole solar system. The appropriate use of the dynamics-adapted measures enables us to identify parameters correctly.  f3is minimized over  βk′ ≥0 for  k′= 1, · · · , N. The minimizer  β∗k′together with ˆφxm(from the previous two steps), will have the following form

image

In order to offer deeper understanding of the difficulty of estimating the masses of each AO, we provide the relative errors for estimating the mass of each astronomical object in table 39 along with the mean and standard deviation of the estimated masses.

image

Table 39: (GSS) True, Estimated Masses, and the Relative Errors. Recall that the masses are measured in unit: 1024kg. Notice the immense difference in the scales of the masses, with the mass of the Sun taking up over 99% of the whole Solar system, which makes the mass estimation problem severely ill-posed.

We have demonstrated the effectiveness and efficiency of a nonparametric inference procedure to estimate the governing structure of various kinds of collective dynamics from observation of short-time trajectory data. Such estimators can be also used to predict the correct type of emergent behaviors of the observed systems at larger timescales than those obtained from the training data. The governing models proposed in section 2 encompass a wide range of dynamical systems of significant theoretical and computational interests to the physics, biology, and social science communities; and the algorithm in section 3 scales efficiently to a large number of homogeneous or heterogeneous agents.

The systems included first-order, one-dimensional interaction kernels (Opinion Dynamics in Sec. 5.1), second-order one-dimensional interaction kernels (Cucker-Smale, Self-Propelling Particles in 2D/3D, in Sec. 5.2 to 5.4), first-order two-dimensional interaction kernels (Synchronized Oscillator in Sec. 6), and second-order families of one-dimensional interaction kernels with underlying, but unknown, single parameters (Gravitational System in Sec. 7). In all cases, our estimators exhibit high precision in terms of standard performance measures, as well as high accuracy at capturing the proper type of emergent behaviors as measured by the confusion matrix and pattern indicator scores appropriate to the system. Our final example studied the intrinsic parametric structure of our learned estimators, which leads to the discovery of some fundamental physical concepts, such as accurate mass and the underlying shared kernel of 1r2for gravitational force.

Further study of more intricate parametric structure of the interaction laws is ongoing as well as the theoretical foundations of the systems (3),(4). We are also preparing the study of emergent behaviors on more complex systems with more elaborate interaction laws and governing structures.

Acknolwedgements. We acknowledge support from NSF-ATD-1737984, AFOSR FA9550-17-1-0280, NSF-IIS-1546392, NSF-IIS-1837991, NIH - T32GM119998; we thank Duke University and Prisma Analytics Inc. for free use of computing resources.

Similar to what we have defined for measuring the performance of �φx, we will use  ρξ,k,k′Tto give the performance

image

For (�φx, �φ ˙x, �φξ) learned from any second-order system, we will need two new sets of probability distributions.

image

Here, Y =

image

Here,  ξi,i′(t) =��ξi′(t) − ξi(t)��. The prediction error,  φxk,k′ − ˆφxk,k′, is measured in the same norm defined in (6); for  φξk,k′ − ˆφξk,k′, but it is weighted differently,

image

[1] F. Lu, M. Zhong, S. Tang, M. Maggioni, Nonparametric inference of interaction laws in systems of agents from trajectory data, Proceedings of the National Academy of Sciences 116 (29) (2019) 14424–14433. doi:10.1073/pnas.1822012116.

[2] U. Krause, A discrete nonlinear and non-autonomous model of consensus formation, Commun Part Diff Eq 2000 (2000) 227–236.

[3] I. Couzin, J. Krause, N. Franks, S. Levin, Effective leadership and decision-making in animal groups on the move, Nature 433 (7025) (2005) 513 – 516.

[4] V. D. Blodel, J. M. Hendricks, J. N. Tsitsiklis, On Krause’s multi-agent consensus model with state- dependent connectivity, IEEE Trans Automat Contr 54 (11) (2009) 2586 – 2597.

[5] S. Mostch, E. Tadmor, Heterophilious Dynamics Enhances Consensus, SIAM Rev 56 (4) (2014) 577 – 621.

[6] F. Cucker, S. Smale, Emergent behavior in flocks, IEEE Trans Automat Contr 52 (5) (2007) 852.

[7] F. Cucker, E. Mordecki, Flocking in noisy environments, J Math Pure Appl 89 (3) (2008) 278 – 296.

[8] F. Cucker, J.-G. Dong, A general collision-avoiding flocking framework, IEEE Trans Automat Contr 56 (5) (2011) 1124 – 1129.

[9] Y. li Chuang, M. R. D’Orsogna, D. Marthaler, A. L. Bertozzi, L. S. Chayes, State transitions and the continuum limit for a 2D interacting, self-propelled particle system, Physica D: Nonlinear Phenomena 232 (1) (2007) 33–47. doi:10.1016/j.physd.2007.05.007.

[10] N. Abaid, M. Porfiri, Fish in a ring: Spatio-temporal pattern formation in one-dimensional animal groups, Journal of the Royal Society Interface 7 (51) (2010) 1441–1453. doi:10.1098/rsif.2010.0175.

[11] G. Albi, D. Balagu´e, J. A. Carrillo, J. V. Brecht, Stability analysis of flock and mill rings for second order models in swarming, SIAM Journal on Applied Mathematics 74 (3) (2014) 794–818. arXiv:arXiv: 1304.5459v1, doi:10.1137/13091779X.

[12] Y. L. Chuang, T. Chou, M. R. D’Orsogna, Swarming in viscous fluids: Three-dimensional patterns in swimmer- and force-induced flows, Physical Review E 93 (4) (2016) 1–12. doi:10.1103/PhysRevE.93. 043112.

[13] T. S. Logsdon, Orbital mechanics: theory and applications, John Wiley, 1998.

[14] T. Vicsek, A. Czir´ok, E. Ben-Jacob, I. Cohen, O. Shochet, Novel Type of Phase Transition in a System of Self-Driven Particles, Physical Review Letters 75 (1995) 1226–1229. arXiv:cond-mat/0611743, doi: 10.1103/PhysRevLett.75.1226.

[15] K. Tonstrom, Y. Katz, C. C. Ioannou, C. Huepe, M. J. Kutz, I. D. Couzin, Collective states, multistability and transitional behavior in schooling fish, Computational Biology 9. arXiv:e1002915.

[16] G. Gr´egoire, H. Chat´e, Onset of collective and cohesive motion, Phys Rev Lett 92 (2004) 025702. doi: 10.1103/PhysRevLett.92.025702. URL https://link.aps.org/doi/10.1103/PhysRevLett.92.025702

[17] Y. Chuang, Y. Huang, M. D’Orsogna, A. Bertozzi, Multi-vehicle flocking: scalability of cooperative control algorithms using pairwise potentials, IEEE Intern Conf Robotics and Automation (2007) 2292 – 2299.

[18] N. Bellomo, P. Degond, E. Tadmor (Eds.), Active Particles, Volume 1, 1st Edition, Springer International Publishing AG, Switerland, 2017.

[19] S. M. Stigler, The History of Statistics: The Measurement of Uncertainty Before 1900, 1st Edition, Harvard University Press, 1986.

[20] A. Tsybakov, Introduction to Nonparametric Estimation, 1st Edition, Springer Publishing Company, In- corporated, New York, 2008.

[21] L. Gy¨orfi, M. Kohler, A. Krzyzak, H. Walk, A distribution-free theory of nonparametric regression, Springer, New York, 2002.

[22] R. DeVore, G. Kerkyacharian, D. Picard, V. Temlyakov, Approximation methods for supervised learning, Found Comput Math 6 (1) (2006) 3–58.

[23] P. Binev, A. Cohen, W. Dahmen, R. DeVore, V. Temlyakov, Universal algorithms for learning theory part i: piecewise constant functions, J Mach Learn Res 6 (Sep) (2005) 1297–1321.

[24] H. Schaeffer, R. Caflisch, C. D. Hauck, S. Osher, Sparse dynamics for partial differential equations, Proc Natl Acad Sci USA 110 (17) (2013) 6634–6639. arXiv:https://www.pnas.org/content/110/17/6634. full.pdf, doi:10.1073/pnas.1302752110.

image

[25] S. Brunton, J. Proctor, J. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proc Natl Acad Sci USA 113 (15) (2016) 3932–3937. arXiv:http://www. pnas.org/content/113/15/3932.full.pdf, doi:10.1073/pnas.1517384113.

image

[26] G. Tran, R. Ward, Exact recovery of chaotic systems from highly corrupted data, Multi Model Simul 15 (3) (2017) 1108–1129.

[27] W. Bialek, A. Cavagna, I. Giardina, T. Mora, E. Silvestri, M. Viale, A. M. Walzak, Statistical mechanics for natural flocks of birds, Proc Natl Acad Sci USA 109 (2012) 4786 – 4791.

[28] M. Bongini, M. Fornasier, M. Hansen, M. Maggioni, Inferring interaction rules from observations of evolu- tive systems I: The variational approach, Math Mod Methods Appl Sci 27 (05) (2017) 909–951.

[29] F. Lu, M. Maggioni, S. Tang, Learning interaction kernels in heterogenous systems of agents from multiple trajectories, submitted.

[30] C. Brugna, G. Toscani, Kinetic models of opinion formation in the presence of personal conviction, Phys Rev E 92 (5) (2015) 052818. doi:10.1103/PhysRevE.92.052818.

[31] P.-E. Jabin, S. Motsch, Clustering and asymptotic behavior in opinion formation, Journal of Differential Equations 257 (11) (2014) 4165 – 4187. doi:https://doi.org/10.1016/j.jde.2014.08.005. URL http://www.sciencedirect.com/science/article/pii/S002203961400326X

[32] M. Dolfin, M. Lachowicz, Modeling opinion dynamics: How the network enhances consensus, Networks and Heterogeneous Media 10 (4) (2015) 877–896. doi:10.3934/nhm.2015.10.877.

[33] F. Cucker, S. Smale, On the mathematics of emergence, Jpn J Math 2 (1) (2007) 197 – 227.

[34] Y. P. Choi, S. Y. Ha, Z. Li, Emergent dynamics of the cuckerSmale flocking model and its variants, Modeling and Simulation in Science, Engineering and Technology (9783319499949) (2017) 299–331. arXiv:arXiv: 1604.04887v1, doi:10.1007/978-3-319-49996-3_8.

[35] V. Gazi, On lagrangian dynamics based modeling of swarm behavior, Physica D: Nonlinear Phenomena 260 (2013) 159 – 175, emergent Behaviour in Multi-particle Systems with Non-local Interactions. doi: https://doi.org/10.1016/j.physd.2013.06.010.

image

[36] H. Niwa, Self-organizing dynamic model of fish schooling, J Theor Biol 171 (1994) 123 – 136.

[37] Y. Katz, K. Tunstrom, C. Ioannou, C. Huepe, I. Couzin, Inferring the structure and dynamics of interactions in schooling fish, Proc Natl Acad Sci USA 108 (2011) 18720–8725.

[38] F. Cucker, S. Smale, Emergent behavior in flocks, IEEE Transactions on Automatic Control 52 (5) (2007) 852–862. doi:10.1109/TAC.2007.895842.

[39] F. Cucker, J. G. Dong, Avoiding collisions in flocks, IEEE Transactions on Automatic Control 55 (5) (2010) 1238–1243. doi:10.1109/TAC.2010.2042355.

[40] F. Cucker, J.-G. Dong, A conditional, collision-avoiding, model for swarming, Discrete Continuous Dyn Syst 43 (3) (2014) 1009 – 1020.

[41] S. M. Ahn, H. Choi, S. Y. Ha, H. Lee, On collision-avoiding initial configurations to Cucker-smale type flocking models, Communications in Mathematical Sciences 10 (2) (2012) 625–643. doi:10.4310/CMS. 2012.v10.n2.a10.

[42] T. Vicsek, A. Czirok, E. Ben-Jacob, I. Cohen, O. Shochet, Novel Type of Phase Transition in a System of Self-Driven Particles, Phys Rev Lett 75 (6) (1995) 1226 – 1229.

[43] S. Y. Ha, J. G. Liu, A simple proof of the Cucker-Smale flocking dynamics and mean-field limit, Commu- nications in Mathematical Sciences 7 (2) (2009) 297–325. doi:10.4310/CMS.2009.v7.n2.a2.

[44] M. Nourian, P. E. Caines, R. P. Malham´e, Mean Field analysis of controlled Cucker-Smale type flocking: Linear analysis and perturbation equations, IFAC Proceedings Volumes (IFAC-PapersOnline) 44 (1 PART 1) (2011) 4471–4476. doi:10.3182/20110828-6-IT-1002.03639.

[45] R. Shvydkoy, E. Tadmor, Eulerian dynamics with a commutator forcing, Transactions of Mathematics and Its Applications 1 (1) (2017) 1–26. doi:10.1093/imatrm/tnx001.

[46] R. Erban, J. Haˇskovec, Y. Sun, A Cucker-Smale model with noise and delay, SIAM Journal on Applied Mathematics 76 (4) (2016) 1535–1557. arXiv:arXiv:1507.04432v1, doi:10.1137/15M1030467.

[47] R. Lukeman, Y.-X. Li, L. Edelstein-Keshet, A conceptual model for milling formations in biological aggre- gates, Bulletin of Mathematical Biology 71 (2) (2008) 352. doi:10.1007/s11538-008-9365-7. URL https://doi.org/10.1007/s11538-008-9365-7

[48] A. J. Bernoff, C. M. Topaz, A primer of swarm equilibria, SIAM Journal on Applied Dynamical Systems 10 (1) (2011) 212–250. arXiv:arXiv:1008.0881v1, doi:10.1137/100804504.

[49] J. Carrillo, M. D’Orsogna, V. Panferov, Double milling in self-propelled swarms from kinetic theory, Kinetic and Related Models 2 (2) (2009) 363–378. doi:10.3934/krm.2009.2.363.

[50] P. Degond, J.-G. Liu, S. Motsch, V. Panferov, Hydrodynamic models of self-organized dynamics: Derivation and existence theory, Methods and Applications of Analysis 20 (2) (2013) 89–114. arXiv:arXiv:1108. 3160v1, doi:10.4310/maa.2013.v20.n2.a1.

[51] K. P. O’Keeffe, H. Hong, S. H. Strogatz, Oscillators that sync and swarm, Nature Communications 8 (1) (2017) 1–12. doi:10.1038/s41467-017-01190-3. URL http://dx.doi.org/10.1038/s41467-017-01190-3

[52] A. Gupta, A. Roy, A. Saha, S. S. Ray, Flocking of Active Particles in a Turbulent Flow (2018) 1–6arXiv: 1812.10288. URL http://arxiv.org/abs/1812.10288

[53] D. Levis, A. Diaz-Guilera, I. Pagonabarraga, M. Starnini, Flocking and spreading dynamics in populations of self-propelled agents (2019) 1–12arXiv:1901.08831. URL http://arxiv.org/abs/1901.08831

[54] N. Kruk, Y. Maistrenko, H. Koeppl, Self-propelled chimeras, Physical Review E 98 (3) (2018) 1–15. arXiv: arXiv:1511.04738v2, doi:10.1103/PhysRevE.98.032219.

[55] K. P. O’Keeffe, J. H. Evers, T. Kolokolnikov, Ring states in swarmalator systems, Physical Review E 98 (2).

image

[56] K. O’Keeffe, C. Bettstetter, A review of swarmalators and their potential in bio-inspired computing (2019) 85arXiv:arXiv:1903.11561v1, doi:10.1117/12.2518682.

[57] S. H. Strogatz, From Kuramoto to Crawford: exploring the onset of synchronization in populations of coupled oscillators, Physica D (143) (2000) 1 – 20.

[58] S. H. Rudy, S. L. Brunton, J. L. Proctor, J. N. Kutz, Data-driven discovery of partial differential equations, Sci Adv 3 (4) (2017) e1602614. doi:10.1126/sciadv.1602614. URL http://advances.sciencemag.org/content/3/4/e1602614

[59] S. Brunton, J. N. Kutz, J. L. Proctor, Data-driven discovery of governing physical laws, SIAM News 50 (1). URL https://sinews.siam.org/Details-Page/data-driven-discovery-of-governing-physical-laws

[60] M. Ballerini, N. Cabibbo, R. Candelier, A. Cavagna, E. Cisbani, I. Giardina, V. Lecomte, A. Orlandi, G. Parisi, A. Procaccini, M. Viale, V. Zdravkovic, Interaction ruling animal collective behavior depends on topological rather than metric distance: Evidence from a field study, Proc Natl Acad Sci USA 105 (4) (2008) 1232–1237. arXiv:http://www.pnas.org/content/105/4/1232.full.pdf, doi:10.1073/pnas. 0711437105.

image

[61] R. Lukeman, Y.-X. Li, L. Edelstein-Keshet, Inferring individual rules from collective behavior, Proc Natl Acad Sci USA 107 (2010) 12576 – 12580.


Designed for Accessibility and to further Open Science