b

DiscoverSearch
About
My stuff
A Functional EM Algorithm for Panel Count Data with Missing Counts
2020·arXiv
Abstract
Abstract

Panel count data describes aggregated counts of recurrent events observed at discrete time points. To understand dynamics of health behaviors and predict future negative events, the field of quantitative behavioral research has evolved to increasingly rely upon panel count data collected via multiple self reports, for example, about frequencies of smoking using in-the-moment surveys on mobile devices. However, missing reports are common and present a major barrier to downstream statistical learning. As a first step, under a missing completely at random assumption (MCAR), we propose a simple yet widely applicable functional EM algorithm to estimate the counting process mean function, which is of central interest to behavioral scientists. The proposed approach wraps several popular panel count inference methods, seamlessly deals with incomplete counts and is robust to misspecification of the Poisson process assumption. Theoretical analysis of the proposed algorithm provides finite-sample guarantees by expanding parametric EM theory [3, 35] to our general non-parametric setting. We illustrate the utility of the proposed algorithm through numerical experiments and an analysis of smoking cessation data. We also discuss useful extensions to address deviations from the MCAR assumption and covariate effects.

A major goal in behavioral medicine is identifying temporal patterns of risk factors preventing an individual from successfully modifying a health-related behavior. In smoking cessation, one would like to know times of day, locations, and other contextual factors such as smoking opportunity [16] that may precipitate lapse to inform interventions to prevent lapse [22]. A basic task is to describe when smoking occurs through modeling the counting process of repeated negative events. One goal is to estimate the mean function of the counting process to characterize the dynamics of health-related behaviors at the population level.

The most common and widely-used measurement for behaviors is self-report via Ecological Momentary Assessment (EMA): a participant responds to prompts and enters data via a phone app [26]. An EMA can be randomly triggered by the app. Random EMAs provide a less-biased data collection paradigm relative to user initiated EMAs and have been used in a wide variety of behavioral studies [25, 21, 20]. Here, a participant is prompted 3-4 times a day at random times. As EMA times will not generally co-occur with events of interest, participants report the cumulative number of events since the last EMA. For example, how many cigarettes they smoked since the last prompt. This takes the form of panel count data [15][27], where exact times of a counting process are unobserved. Only the cumulative number of events since the last observation are measured, where observation times are assumed to follow some unknown distribution.

The counting process mean function can describe population-level temporal patterns of smoking. It converts the discrete patterns of smoking counts from a population of participants (see Fig. 1) into a temporally-continuous summary of smoking behavior (see Fig. 2b). However, the missingness inherent in EMA data makes consistent estimation of the mean function difficult. There are several conditions causing missingness. An EMA may be ignored by the user or opened and then abandoned. Second, the mobile app itself may postpone the triggering of an EMA for any of several reasons (e.g. battery low). While EMAs are triggered randomly, the random process is constrained to have a minimum temporal spacing between EMAs, in order to keep participant burden at an acceptable level. If EMAs are postponed or ignored too many times, it will not be possible to trigger the full set of EMAs for the day, resulting in missing EMAs. Missing data is a common issue in studies that involve EMAs, with [14] noting that over 126 studies, the average missingness rate is 25%.

For mean function estimation, missing EMAs cause problems when they lead to inaccurate counts of the total cigarettes smoked between EMAs. Due to recall bias, longer intervals between EMAs are less reliable.1 Behavioral scientists have developed heuristic imputation schemes to adjust for missing counts [11], but these may not consistently estimate the mean function. This paper presents the first self-contained and systematic treatment of missing data for panel count data, by providing a simple Expectation-Maximization (EM) algorithm [7, 3] for estimating the mean function with finite-sample theoretical guarantees.

Our primary methodological contribution is a functional EM algorithm to wrap standard non-parametric mean function estimators to handle missing data under a missing completely at random (MCAR) assumption [17]. The E-step uses estimates from a fitting method to impute missing data, and the M-step calls that fitting method to estimate a mean function. This extends several classic non-parametric methods [32, 18] and the baseline-only version of a semi-parametric mean function estimation method [30] to the setting of missing data.

We analyze our EM algorithm using the frameworks described in [3, 35] and obtain finite sample guarantees. This requires care as we are extending their work from the parametric to the non-parametric setting. This paper addresses three major theoretical challenges in the context of functional EM by: (i) noting that an infinite dimensional derivative in our setting is analogous to the inner products used in [35, 3], (ii) showing local uniform strong concavity of our population E-step, and (iii) a high probability finite sample bound on the convergence of the M-step. The lack of ground truth for missing data in real-world settings makes evaluation difficult and theoretical guarantees important. Finally, our proposed algorithm can consistently estimate the mean function even when the Poisson process assumption is violated. This is achieved by recovering the population MLE, and noting that under certain integrability conditions the population MLE of the Poisson process log-likelihood is the true mean function of the counting process, even when the counting process is not Poisson.

Ecological Momentary Assessments are frequently used to record counts for behavior, including smoking counts [24, 29, 10], alcohol counts [6] and promiscuous behaviors [34]. However, none of these papers use panel count data methods to estimate the mean function.

Panel count data analysis comes from nonparametric statistics, but our missing data problem has not been addressed. The closest works to ours are [30, 32, 18]. These papers estimate the mean function from panel count data, but cannot account for missing EMAs as in our motivating application. [30] develop an EM approach under a more limited missingness model assuming that the total counts for each participant are known prior to data analysis. This is unsuitable for analyzing data with missing EMAs. However, two strengths are that they can incorporate baseline covariate information, and they relax the standard assumption of independence between observation times and the counting process.

image

Figure 1: Raw cigarette counts smoked between observations. We want to convert this to a mean function of cumulative cigarettes smoked (Figure 2b), but some counts are missing or unreliable.

Neither of [32, 18] handle missing counts, and cannot be applied directly to our setting. We propose an EM algorithm for missing data to wrap any of [30, 32, 18], building on existing methods in the M-step. [32] provides the first strong theoretical guarantees for a mean function estimator in the panel count setting. [18] improve their rate of convergence with spline mean functions.

Many other papers in the literature focus on either extending to the semi-parametric setting, relaxing assumptions in either the non-parametric or semi-parametric setting, or improving estimation under specific distributional assumptions. They do not address missing data. [33] extended [32] to the semi-parametric setting, [13] analyzed panel count with informative observation times and subject-specific frailty, and [12] proposed using a smooth semi-parametric estimator to handle over-dispersion in panel count data. [8] proposed using a squared Gaussian process intensity function.

EM Theory. Also relevant is recent work on convergence of the EM algorithm to the true parameter [3, 35]. In [3], they show that under good initialization and certain regularity conditions, population EM converges to the true parameter. With finite sample uniform convergence of the M-step, one can also show a finite-sample bound for EM. [35] show that by assuming the ability to optimize over a ball around the true parameter, one can weaken regularity conditions and replace uniform convergence of the M-step with three concentration inequalities, often easier to prove. We base our population guarantees on [35]. We base our finite-sample guarantees on [3], because [35] assumes that empirical and population norms are the same which does not hold in our non-parametric setting.

3.1 Complete Data

Let  N = {N(u) : u ≥ 0}be a univariate counting process. The goal is to estimate the mean function  Λ∗(u) = E[N(u)]over a study window  [0, τ]. Let Kbe the random number of observations for a participant. Let  T = (T1, · · · , TK) ∈ RK+be a random vector of observation times, with T0 = 0. Let  ∆N = (∆N1, · · · , ∆NK)be the count increments:  ∆Nj = N(Tj) − N(Tj−1). Let ∆Λ(Tj) = Λ(Tj) − Λ(Tj−1)be mean function increments. For each participant  i = 1, · · · , n, we observe  Y = (∆N, T, K) ∈ N × T × K where N, T , Kare the corresponding sample spaces. Let Pn, Pdenote the empirical and true measures on  N ×T ×K, respectively. For a measurable function f, let  Pnf = 1n�ni=1 f(Yi)and  Pf =�fdP. Under a non-homogeneous Poisson process, the complete data sample log-likelihood is

image

and the population log-likelihood is similar but with P instead of  nPn. The goal is consistent mean function estimation even when the Poisson process assumption is violated.

3.2 Missing Data

Unlike previous work, we assume certain observations  ∆Njare missing. For  ∆N ∈ RK, each observation  ∆Njis missing completely at random with probability (w.p.)  ϵ ∈ [0, 1). Let τ ∈ {◦, 1}Kbe the missingness pattern, where  ◦ = 0but the notation represents missingness. Let  s = 1 − τ, thussj = 1if  ∆Njis missing. Let  ∆N (τ) = ∆N ⊙ τand  ∆N (s) = ∆N ⊙ s, where  ⊙is elementwise product. Then  ∆N = ∆N (s) + ∆N (τ). We observe  ∆N (τ)but not  ∆N (s). Let  Z ≡ ∆N (s)represent our missing data vector, and let Z be the space of values for our missing data. Then Y = (∆N (τ), T, K), and (Y, Z) ∈ N × T × K × Zgives us the observed and missing parts of the data, respectively. In this case  Pn and Pare now the empirical and true measures for  N ×T ×K×Z.

We first describe the general setting for EM. Let  fΛ(y, z), pΛ(y), and kΛ(y|z)be joint, marginal, and conditional densities respectively. Let  Θbe a convex set of functions and  {Θn}a sieve: a nested sequence  Θ1 ⊂ Θ2 ⊂ · · · such that ∪∞n=1Θn ⊆ Θis dense in  Θ. We define the following.

Definition 1. (Population  Q-function) Q(Λ′|Λ) ≡�Y��Z(y) log(fΛ′(y, z))kΛ(z|y)dz�pΛ∗(y)dy.

Definition 2. (Sample  Q-function) Qn(Λ′|Λ; {Yi}ni=1) ≡ 1n�ni=1 EΛ[log fΛ′(y, z)|Yi].

Definition 3.  Br(Λ∗) ≡ {Λ : ∥Λ − Λ∗∥∞ ≤ r} where ∥ · ∥∞is the essential supremum.

Definition 4. (Population  M-step) M(Λ(t)) = arg maxΛ′∈Θ∩Br(Λ∗) Q(Λ′|Λ(t)).

Definition 5. (Sample  M-step) Mn(Λ(t)) = arg maxΛ′∈Θn∩Br(Λ∗) Qn(Λ′|Λ(t)).

A Q-function is an E-step of the EM algorithm.. Population EM repeatedly takes  Λ(t+1) = M(Λ(t)),where  Λ(t) denotes iteration t’s estimate. Sample EM repeatedly takes  Λ(t+1) = Mn(Λ(t)). Next wedescribe EM computation. Algorithm 1 describes sample EM; population EM is similar.

image

Algorithm 1 assumes  Br(Λ⋆)is known following [35]: in practice it is unknown, and we suggest trying multiple initializations and choosing the one that leads to the highest final log-likelihood. In our numerical experiments and data analysis, for illustration we assume  Θn ={monotone step functions with at most n steps}. Next we describe the E- and M-steps.

4.1 E-Step

The population and sample Q-functions replace missing counts with mean function estimates. The sample Q-function (the population version is similar but uses P instead of  Pn.) is

image

4.2 M-Step

The population M-step maximizes  Q(Λ′|Λ) using

image

where  Uallis a uniform upper bound for functions in this set. The sample M-step uses a convex set Θn ⊂ Θ(sieve estimator).  Θnis potentially monotonic step functions [32, 30] or monotonic splines [18], subject to the constraint of having the upper bound of  Uallover the study time. Maximization proceeds via existing methods [32, 18, 30]. Like  Br(Λ∗), Uallis unknown.

Section 5.1 defines assumptions, 5.2 defines distances and a quasi-inner product, and 5.3 gives two regularity condition lemmas based on [35]. Proposition 1 shows that with good initialization and sufficiently small missingness probability, a population EM step gives a contraction, moving our estimate closer to the true mean function after every iteration. Theorem 1 then shows that population EM converges linearly to the true mean function.

We then show finite-sample theory. Proposition 2 states that with high probability, the sample M-step converges uniformly to the population M-step. Theorem 2 states that with high probability the distance between the current estimate and the true mean function is bounded by two terms: one describes applying population EM, and the other involves the uniform convergence of the M-step.

5.1 Assumptions

We make the following assumptions, similar assumptions were made in [32, 18, 30]:

1. The counting process N is independent of the number of observations K and observation times T, respectively;

2. The observation times are random variables taking values in the bounded set  [τ0, τ]where 0 < τ0 < τ and τ ∈ (0, ∞);

3. The number of observations is bounded, i.e., there exists  k0 > 0 such that P(K ≤ k0) = 1;

4. The true mean function is uniformly bounded over the study, satisfying  Λ∗(u) ≤ U ≤ Uallfor some  U ∈ (0, ∞) and all u ∈ [τ0, τ]. Recall Uallis a uniform upper bound on functions in  Θ;

5. The first derivative of  Λ∗(u)has a positive lower bound in  [τ0, τ].2

6. The observation times are  α-separated. That is,  P(Tj − Tj−1 ≥ α) = 1for some  α > 0and all  j = 1, · · · , K;

7. E[exp(aN(t))] is uniformly bounded for  t ∈ [0, τ]some constant a;

8. The count increments  ∆Njare missing completely at random (MCAR) w.p.  ϵ > 0. That is,sji, ji = 1, · · · , Ki, i = 1, · · · , nare iid Bernoulli(ϵ)random variables.

In the context of an observational smoking study, some of these assumptions can be scientifically expressed: 1) and 2) EMAs are delivered at random times throughout the study and are independent of smoking times 3) there is a maximum number of EMAs delivered over the study 5) there is a minimum smoking risk throughout the study 6) there is a minimum time between EMAs 8) the probability that an EMA is missed is  ϵ > 0. The only new assumption for panel count is assumption 8.

5.2 Measures, Distance Metrics, and Quasi-Inner Product

We next define measures for constructing distance metrics between mean functions. We then define a quasi-inner product that is used to show regularity conditions for population EM theory. Let B, B1, B2be the intersection of Borel sets in  R with [0, τ].

Definition 6 (Measures for sets containing observation times).  µ(B) ≡ E��Kj=1 1B(Tj)�and

image

Then  µ(B) = E|{observations in  B}| and µ2(B1 × B2) = E|{one observation in  B1, next in B2}|.Definition 7 (d2metric for mean functions).  ∥Λ1 − Λ2∥ ≡ [� τ0� τ0 |(Λ1(v) − Λ1(u)) − (Λ2(v) −Λ2(u))|2dµ2(u, v)]1/2.

This is the  d2metric of [32]. Under assumption 3, convergence in  ∥ · ∥implies convergence in  L2(µ).We base our theory on convergence in this norm. Now define

image

We do not claim that (1) is a valid inner product, but it is closely related to the inner product from [35, 3], the Q-function’s directional derivative in the direction of a parameter vector, while (1) is the Q-function’s right Gateaux derivative in the direction of a mean function, analogous in function space. The connection to inner products from [3, 35] is key to using existing EM theory to prove our Lemmas 1 and 2. We prove Equation (1) in A.1 in the supplementary material.

5.3 Population Theory

Before stating our main population EM convergence theorem, we define important constants and state two lemmas for the population Q-function. Lemmas 1 and 2 mirror gradient stability and local uniform strong concavity from [35], respectively, but they studied Q-functions with quadratic dependence on parameters. Our objective function does not have quadratic dependence on the mean function. However, we can decompose our Q-function into a sum of two terms: one locally uniformly strongly concave, and one strictly concave. The sum is then locally uniformly strongly concave. Lemma 2 shows this lower bound holds, and is a key step that allows us to apply [35] to our setting. We then prove Proposition 1 which states that population EM steps contract, bringing estimates closer to the true mean function after each iteration.

Definition 8. Let  c ≡ inf ∆Λ∗ > 0be a uniform lower bound on increments of the true mean function. This exists by assumptions 5 and 6.

Definition 9. Let  b ≡ sup{∆Λ : Λ ∈ Br(Λ∗)} > 0be a uniform upper bound on increments of mean functions in  Br(Λ∗). Note b ≤ Uall, where Uallis a uniform upper bound on functions in  Θ.

Definition 10. Let  γ = ϵc and ν = 1−ϵ3b

Lemma 1. (Gradient Stability) Assume assumptions 1, 2, 3, 4, 5, 6, and 8 hold. Then for  Λ, Λ′ ∈ Θ(proof in A.2),

image

Lemma 2. (Local Uniform Strong Concavity) Assume all assumptions hold. Then if  r ≤ c4and Λ′, Λ ∈ Br(Λ∗)(proof in A.3),

image

See section A.5 in the supplementary material for a detailed proof. This is similar to Proposition 3.2 in [35]. In order for this to give a contraction, we need  γ < ν, which holds if  ϵ < c3b+c. Thus if the uniform lower bound on increments of the true mean function goes up or the uniform upper bound on increments in the ball  Br(Λ∗)goes down, we can tolerate a higher probability of missing data.

Theorem 1. (Population EM Convergence to True Mean Function) Suppose the assumptions of the above hold and  0 < γ < ν. Take the EM sequence  Λ(t+1) = arg maxΛ′∈Θ∩Br(Λ∗) Q(Λ′|Λ(t))and

image

The proof is in A.6 in the supplementary material. At each EM step, we move towards the true function by a multiplicative factor of� γν�. This is similar to Theorem 3.1 in [35].

5.4 Sample Theory

We next discuss convergence of the sample EM algorithm. We again require the initial estimate Λ0and subsequent estimates  Λ(t)to be close, i.e., in the set  Br(Λ∗) ∩ Θn. The key additional assumption is that the sample size is large enough to satisfy certain conditions. For all n greater than this minimum sample size, we can guarantee approximate contraction of the sample EM algorithm, where the term that does not contract goes to 0 in large samples.

Proposition 2. Suppose all assumptions hold. Assume  Br(Λ∗)has radius  r ≤ c4. For any L > 0,

there exists  uLL→∞→ 0such that w.p.  1 − uL

image

See B.1 of the supplementary material for proof, as well as a definition for  uL. This is the rate of convergence an M-estimator, and generalizes the rate of convergence of [1] from maximum likelihood estimates to M-steps. We now state our main result.

Theorem 2. Suppose  0 < r ≤ c4 and 0 < γ < νand all assumptions hold such that the population contractivity holds. Let  κ = γν . Take the EM sequence  Λ(t+1) = arg maxΛ′∈Br(Λ∗)∩Θn Qn(Λ′|Λ(t))and  Λ0 ∈ Br(Λ∗) ∩ Θ. Then if the sample size is large enough that  2Ln−1/3 ≤ (1 − κ)r, then w.p. at least  1 − uL

image

See section B.2 in the supplementary material for a proof. The proof follows that of Theorem 5 of [3] closely. Note that this immediately implies that our algorithm can recover the true population MLE in large samples. Further by [32], the population MLE is the true mean function under assumptions 1,4 and 7 even under Poisson process violations. Thus our method is robust to Poisson process violations.

Here we show numerical and real data results to illustrate the utility of the proposed functional EM algorithm. In the M-step, we can choose any reasonable mean function estimator. In our experiments, we use a general likelihood-based augmented estimating equation (AEE) method [30], which uses monotone step functions when obtaining the mean function estimate with complete data. [31, 5] provide software implementations of AEE and a wide variety of other panel count methods. First, we perform synthetic experiments that demonstrate accurate recovery of the true mean function (see Section C of the supplemtary material). In particular, we also demonstrate good recovery of the true mean function using simulated data based on mixed poisson processes (which violate the Poisson process assumption) hence confirming theoretical robustness of functional EM to Poisson process assumption. In the following, we focus on two experiments involving real data.

6.1 Real Data with Synthetic Missingness

We analyze blaTum (bladder tumor dataset) [4] with 85 patients and counts of tumors taken at appointment times. We artificially delete intervals completely at random with probability 0.2. We then initialize  Λ(0) by replacing the missing data with Poisson(1) random variables and fitting AEE. We bootstrap 1,000 times, and plot the sample mean of our learned mean functions under complete data. We also set counts to zero in intervals with missing counts, which biases the mean function estimates. Figure 2a compares inference from complete vs partially missing data using our wrapper vs zeroing out missing data. Our wrapper learns a model much closer to the complete data than its initialization or the zeroing model. Section D of the supplementary material has additional experiments: we investigate sensitivity to initialization and different missingness probabilities. We also replace AEE with other M-step methods in Section D of the supplementary material and again confirm the utility of functional EM in approximating the mean function. The actual estimates vary slightly. In practice we recommend comparing results when making scientific conclusions; similar results from a wide variety of flexible methods indicate more robust scientific evidence.

image

Figure 2: The estimated mean function and the 95% bootstrap confidence band from a) bladder tumor dataset: synthetic 20% missingness probability. Initialization has missing data set to Poisson(1) values. b) smoking cessation study. Initialization treats self-reported counts in intervals over 24 hours (about 5.6% of observations) as valid.

6.2 Smoking Cessation Study

We analyze data from an ongoing smoking cessation study in which the following EMA question was delivered randomly 3-4 times per day: "Since the last assessment, how many cigarettes did you smoke?" We have 125 participants tracked over 14 days, with 3-4 random EMAs targeted. The first five days are their normal smoking behavior, while the remaining days involve them attempting to quit. In both curves the smoking rates are much higher pre-quit than post-quit, suggesting that they do in fact attempt to quit. After discussion with psychologists, we treat counts in intervals of longer than 24 hours as missing because they are considered to be unreliable. We use the counts to initialize our model. Unlike the previous experiments, we lack a ground truth. Note that this study is currently ongoing and we cannot link to it or share the dataset.

Figure 2b shows the results based on 1, 000 non-parametric bootstrap samples and two models: one treating long intervals as valid (“initial"), and the other treating them as missing where we use EM. Long intervals make up 5.6% of observations. The EM algorithm estimates that on average smokers attempting to quit smoked 80.4 cigarettes by the end of the study; in contrast, AEE underestimated (66.11), a difference of 21.7%. This is consistent with our collaborating behavioral scientists’ hypothesis that participants may under-report a cigarette count when the gap between completed EMAs is large. The proposed functional EM is able to borrow count information across multiple shorter overlapping intervals from other participants to alleviate this under-reporting. We discuss other analyses of the dataset further in section E of the supplementary material.

This paper proposed a functional EM algorithm to estimate the mean function for incomplete panel count data. Extending existing EM algorithm analysis to general non-parametric settings, we provided finite sample convergence guarantees to the truth. We conducted extensive experiments to illustrate the effectiveness of the proposed algorithm in recovering the true mean function. We applied the proposed algorithm to a smoking cessation study and found that participants may underestimate their cigarettes smoked over intervals longer than 24 hours. To the best of our knowledge, we are the first to apply panel count data methods to EMAs, despite their central role in behavioral science research.

From a theoretical perspective, the main limitation is the MCAR assumption. Deviations from MCAR can happen if non-reporting depends on a subject’s emotional state, which may be related to their smoking counts. Extensions to other missingness mechanisms such as Missing at Random (MAR) warrant future research; our analysis provides a general framework and first step, which can eventually be extended to also incorporate covariates under a more complex missing data model. Like some existing theoretical analysis of EM algorithms [35], another limitation is the need to optimize over a ball around the true mean function, which is unknown. Relaxing this condition is likely very challenging but also important future work. Finally, we would like test statistics for our estimator. This is challenging as the differences between fitted and true mean functions in norm do not converge to a normal distribution. However, there are test statistics based on weighting [2] that could be extended to this setting.

Understanding the dynamics for individuals who attempt to change and maintain behaviors to improve health has important societal value, for example, a comprehensive understanding of how smokers attempt to quit smoking may guide behavioral scientists to design better intervention strategies that tailor to the highest risk windows of relapse. Our theory and method provide a valid approach to understanding a particular aspect of the smoking behavior (mean function). The resulting algoithm is robust, readily adaptable and simple to implement, highlighting the potential for its wider adoption. The negative use case could be lack of sensitivity analysis around the assumptions such as missing data mechanism which may lead to misleading conclusions. Our current recommendation is to consult scientists about the plausibility of the assumption about missing data.

[1] BALAKRISHNAN, N., AND ZHAO, X. A class of multi-sample nonparametric tests for panel count data. Annals of the Institute of Statistical Mathematics 63, 1 (2011), 135–156.

[2] BALAKRISHNAN, N., ZHAO, X., ET AL. New multi-sample nonparametric tests for panel count data. The Annals of Statistics 37, 3 (2009), 1112–1149.

[3] BALAKRISHNAN, S., WAINWRIGHT, M. J., YU, B., ET AL. Statistical guarantees for the em algorithm: From population to sample-based analysis. The Annals of Statistics 45, 1 (2017), 77–120.

[4] BYAR, D. The veterans administration study of chemoprophylaxis for recurrent stage i bladder tumours: comparisons of placebo, pyridoxine and topical thiotepa. In Bladder tumors and other topics in urological oncology. Springer, 1980, pp. 363–370.

[5] CHIOU, S. H., HUANG, C.-Y., XU, G., AND YAN, J. Semiparametric regression analysis of panel count data: A practical review. International Statistical Review 87, 1 (2019), 24–43.

[6] COLLINS, R. L., KASHDAN, T. B., AND GOLLNISCH, G. The feasibility of using cellular phones to collect ecological momentary assessment data: Application to alcohol consumption. Experimental and clinical psychopharmacology 11, 1 (2003), 73.

[7] DEMPSTER, A. P., LAIRD, N. M., AND RUBIN, D. B. Maximum likelihood from incomplete data via the em algorithm. Journal of the Royal Statistical Society: Series B (Methodological)

39, 1 (1977), 1–22.

[8] DING, H., LEE, Y., SATO, I., AND SUGIYAMA, M. Variational inference for gaussian process with panel count data. Uncertainty in Artificial Intelligence (2018).

[9] GERW (HTTPS://MATH.STACKEXCHANGE.COM/USERS/58577/GERW). Taylor expansion for gteaux derivative. Mathematics Stack Exchange. URL:https://math.stackexchange.com/q/3076726 (version: 2019-01-17).

[10] GRIFFITH, S. D., SHIFFMAN, S., AND HEITJAN, D. F. A method comparison study of timeline followback and ecological momentary assessment of daily cigarette consumption. Nicotine & tobacco research 11, 11 (2009), 1368–1373.

[11] HALLGREN, K. A., AND WITKIEWITZ, K. Missing data in alcohol clinical trials: a comparison of methods. Alcoholism: Clinical and Experimental Research 37, 12 (2013), 2152–2160.

[12] HUA, L., ZHANG, Y., AND TU, W. A spline-based semiparametric sieve likelihood method for over-dispersed panel count data. Canadian Journal of Statistics 42, 2 (2014), 217–245.

[13] HUANG, C.-Y., WANG, M.-C., AND ZHANG, Y. Analysing panel count data with informative observation times. Biometrika 93, 4 (2006), 763–775.

[14] JONES, A., REMMERSWAAL, D., VERVEER, I., ROBINSON, E., FRANKEN, I. H., WEN, C. K. F., AND FIELD, M. Compliance with ecological momentary assessment protocols in substance users: a meta-analysis. Addiction 114, 4 (2019), 609–619.

[15] KALBFLEISCH, J., AND LAWLESS, J. F. The analysis of panel data under a markov assumption. Journal of the American Statistical Association 80, 392 (1985), 863–871.

[16] KIRCHNER, T. R., CANTRELL, J., ANESETTI-ROTHERMEL, A., GANZ, O., VALLONE, D. M., AND ABRAMS, D. B. Geospatial exposure to point-of-sale tobacco: real-time craving

and smoking-cessation outcomes. American journal of preventive medicine 45, 4 (2013), 379–385.

[17] LITTLE, R. J., AND RUBIN, D. B. Statistical analysis with missing data, vol. 793. John Wiley & Sons, 2019.

[18] LU, M., ZHANG, Y., AND HUANG, J. Estimation of the mean function with panel count data using monotone polynomial splines. Biometrika 94, 3 (2007), 705–718.

[19] MA, S., AND KOSOROK, M. R. Robust semiparametric m-estimation and the weighted bootstrap. Journal of Multivariate Analysis 96, 1 (2005), 190–217.

[20] O’CONNELL, K. A., GERKOVICH, M. M., COOK, M. R., SHIFFMAN, S., HICKCOX, M., AND KAKOLEWSKI, K. E. Coping in real time: Using ecological momentary assessment techniques to assess coping with the urge to smoke. Research in Nursing & Health 21, 6 (1998), 487–497.

[21] PIASECKI, T. M., JAHNG, S., WOOD, P. K., ROBERTSON, B. M., EPLER, A. J., CRONK, N. J., ROHRBAUGH, J. W., HEATH, A. C., SHIFFMAN, S., AND SHER, K. J. The subjective effects of alcohol–tobacco co-use: An ecological momentary assessment investigation. Journal of abnormal psychology 120, 3 (2011), 557.

[22] REHG, J. M., MURPHY, S. A., AND KUMAR, S. Mobile Health. Springer, 2017.

[23] SEN, B. A gentle introduction to empirical process theory and applications.

[24] SHIFFMAN, S. How many cigarettes did you smoke? assessing cigarette consumption by global report, time-line follow-back, and ecological momentary assessment. Health Psychology 28, 5 (2009), 519.

[25] SHIFFMAN, S., BALABANIS, M. H., GWALTNEY, C. J., PATY, J. A., GNYS, M., KASSEL, J. D., HICKCOX, M., AND PATON, S. M. Prediction of lapse from associations between smoking and situational antecedents assessed by ecological momentary assessment. Drug and alcohol dependence 91, 2-3 (2007), 159–168.

[26] SHIFFMAN, S., STONE, A. A., AND HUFFORD, M. R. Ecological momentary assessment. Annu. Rev. Clin. Psychol. 4 (2008), 1–32.

[27] SUN, J., AND KALBFLEISCH, J. Estimation of the mean function of point processes based on panel count data. Statistica Sinica (1995), 279–289.

[28] VAN DER VAART, A. W., AND WELLNER, J. A. Weak convergence. In Weak convergence

and empirical processes. Springer, 1996, pp. 16–28.

[29] WANG, H., SHIFFMAN, S., GRIFFITH, S. D., AND HEITJAN, D. F. Truth and memory: Linking instantaneous and retrospective self-reported cigarette consumption. The annals of applied statistics 6, 4 (2012), 1689.

[30] WANG, X., MA, S., AND YAN, J. Augmented estimating equations for semiparametric panel count regression with informative observation times and censoring time. Statistica Sinica (2013), 359–381.

[31] WANG, X., AND YAN, J. Fitting semiparametric regressions for panel count survival data with an r package spef. Computer methods and programs in biomedicine 104, 2 (2011), 278–285.

[32] WELLNER, J. A., ZHANG, Y., ET AL. Two estimators of the mean of a counting process with panel count data. The Annals of statistics 28, 3 (2000), 779–814.

[33] WELLNER, J. A., ZHANG, Y., ET AL. Two likelihood-based semiparametric estimation methods for panel count data with covariates. The Annals of Statistics 35, 5 (2007), 2106–2142.

[34] WRAY, T. B., KAHLER, C. W., AND MONTI, P. M. Using ecological momentary assessment (ema) to study sex events among very high-risk men who have sex with men (msm). AIDS and Behavior 20, 10 (2016), 2231–2242.

[35] WU, C., YANG, C., ZHAO, H., AND ZHU, J. On the convergence of the em algorithm: A data-adaptive analysis. arXiv preprint arXiv:1611.00519 (2016).

A.1 Proof of Equality in Equation 1

We start with the numerator

image

Now consider

image

We next need to show that we can pull the limit as  η ↓ 0under integrals. There are two relevant terms:

image

Noting that 1η log∆Λ(l)j +η∆Λ′j∆Λ(l)jis monotone increasing to ∆Λ′j∆Λ(l)j as η ↓ 0, we can apply the monotone convergence theorem to pull the limit under the integral. Then

image

image

then any integral of this over  T × K × Zwill also be finite, and we can apply Fubini’s theorem to such integrals. Then recalling that  ϵ > 0is the MCAR probability (assumption 8),

image

Here we used Fubini’s theorem to pull out  ϵ. Applying the CS inequality for inner products in  l2, andfinally applying the CS inequality for expectations gives us the result. The first term on rhs is then:

image

This gives us

image

A.3 Proof of Lemma 2

Note that

image

Define

image

and note that if �Kj=1 τj[∆Nj log[∆Λ′j] − Λ′(Tj)]and  �Kj=1 sj[∆Nj log[∆Λ′j] − Λ′(Tj)]are in- tegrable over  N × T × K × Zwith respect to the measure P, we can apply Fubini’s theorem to obtain

image

This proof takes place in four parts. We first show an inequality that allows us to characterize r for Br(Λ∗). Next we show that in this ball the integrability conditions above hold. We then show that  Q1and  Q2are both strictly concave. Finally we show that  Q1is strongly concave and thus Q, a positive linear combination of a strongly concave and a strictly concave function, is strongly concave.

A.3.1 Characterizing the Radius of Contraction

image

Proof. Let  φ(y) = (1 + y)(log(1 + y) − 1) + 1. Then we can Taylor expand

image

where  ξlies between 0 and y. Then noting that  φ(0) = φ′(0) = 0 and φ′′(y) = 11+y,

image

for  y ∈ (−1, 1). Letting h(x) = x(log x − 1) + 1this implies that  h(x) ≥ 13(x − 1)2 for (x − 1) ∈

(−1, 1)i.e.  x ∈ (0, 2). Let  x =∆Λ∗j∆Λ′j. Then we need  ∆Λ′j ≥ 12∆Λ∗jfor  h(x) ≥ 13(x − 1)2to hold. Note that if

image

Then

image

and if z = 0.5 the desired result holds. For equation 15 to hold it suffices to have  (∆Λ∗j − ∆Λ′j)2 ≤z2∆Λ∗2j ≤ z2c2. Now noting that  (a − b)2 ≤ 2(a2 + b2),

image

w.p. 1, and noting that we want z = 0.5 a sufficient condition is

image

A.4 Integrability Conditions

First note by assumption 7, all moments of  N(τ)are uniformly bounded and thus  ∥N(τ)∥∞ < ∞. Then for  Λ, Λ′ ∈ Br(Λ∗)and by assumption 3,

image

A.4.1 Strict Concavity of  Q1 and Q2

For  Q1 we have

image

and for  Q2the same argument can be made.

A.4.2 Strong Concavity of Q

Now note that

image

where the first inequality holds by claim 1. Now by strict concavity of  Q2as shown in A.4.1 we have

image

Summing  (1 − ϵ)Q1 and ϵQ2 we obtain

image

A.5 Proof of Population Contractivity

We now state the main proof of population contractivity. Denote

image

then

image

and rearranging terms and dividing both sides by  ∥Λ′ − Λ∗∥gives the desired result. Note that we used  ⟨∇Q(Λ∗|Λ∗), Λ′ − Λ∗⟩ ≤ 0, which if  ⟨·, ·⟩were a valid inner product would be the KKT conditions. However since  ⟨·, ·⟩may not be a valid inner product, they must be checked specifically. [32] does it in the sample case for the true log-likelihood: it is easy to verify that it still holds in the population case for Q-functions. Noting that  Λ∗maximizes  Q(Λ′|Λ∗), we have that

image

A.6 Proof of Theorem 1

By induction. It holds for t = 0. Assume it holds for  t ≥ 0. Then  Λ(t+1) ∈ Br(Λ∗)and by assumption  Q(Λ(t+1)|Λ(t)) ≥ Q(Λ∗|Λ(t)). Applying population contractivity and the induction assumption,

image

B.1 Proof of Proposition 2

B.1.1 Definitions and Background from the Literature

Before proving the proposition, we restate several important results that we use from existing literature. We repeat two definitions and two theorems from [23], adjusted to our notation. Note that  η-bracketsare normally called  ϵ-brackets. However, since we have already used  ϵto denote MCAR probabilities, we call them  η-brackets.

Definition 11. (η-bracket) Let (F, d) be a normed space of functions distance metric d induced by some norm. Given two functions  l(·) and g(·), the bracket [l, g] is the set of all functions  f ∈ F withl(u) ≤ f(u) ≤ g(u)∀u ∈ [0, τ]. An η-bracket is a bracket  [l, g] with d(l, g) < η.

Definition 12. (Bracketing numbers). The bracketing number  N[](η, F, L2(P))is the minimum number of  η-brackets needed to cover  F using L2(P) distance.

Definition 13. (Bracketing Integral) The bracketing integral is defined as

image

Note that since any non-empty set requires at least one bracket to cover it and  log(x + 1) ≤ 1+log(x)for  x ≥ 1,

image

for some  c1 > 0. We call this the separation condition. Further, let

image

and assume that there exists some function  φn(·)satisfying the following three conditions

image

3. for the rate of convergence  rnφn(rn) ≲ √nr2n (26)as n varies.

Here  ≲ means ≤the right hand side times a constant. Then for every  L > 0, ∥Mn(Λ) − M(Λ)∥ ≤2Lrnwith probability at least  1 − uL. Here  uL = ˜c �j>M 2j(α−2), where  ˜conly depends on the constants in the separation condition and the expected supremum bound.

Note that this is essentially a special case of Theorem 3.2.5 of [28], but it uses expectations instead of outer expectations. It makes the stronger version of their assumptions and draws a stronger conclusion, giving a finite sample bound. The key in our setting will be to ensure that  ˜cdoes not vary across iterations, which requires that the constants for the separation condition and the expected supremum bound do not vary across iterations. Importantly, we cannot always apply this theorem in the general functional EM setting: it requires that the sample Q-function at its maximizer over a Sieve is equal to the sample Q-function at its maximizer over the full function space. This holds for the Poisson process log-likelihood for panel count data for a range of cases: for instance with step functions or splines where jumps/knots occur at observation times. It does not necessarily hold for arbitrary models. In the latter case we may be able to use Theorem 6.1 of [23], but this would require checking it carefully for each potential model.

We also note Theorem 4. (Theorem 4.12 of [23]) For any class F of measurable functions  f : X → R such that

image

where ˜K > 0is some constant.

Importantly, ˜K is a universalconstant and does not depend on F. This was noted by [19]. A version of this theorem was originally Lemma 3.4.2 in [28].

B.1.2 Outline

We follow [1], which proves the rate of convergence for the maximum pseudo-likelihood of a Poisson process objective function for panel count data: extending their proof to the expected complete data log-likelihood case is straightforward. However, we face the issue that we want a high probability uniform bound on the distance between sample and population M-steps across EM iterations, whereas they neeeded an asymptotic high probability rate of convergence for the pseudo MLE. This poses three challenges: 1) our objective function at each iteration is the expected log-likelihood rather than the log-likelihood. Thus we cannot prove that the separation condition holds using the same techniques. 2) the separation condition must hold always rather than only in a neighborhood of the optimum 3) we need to check that constants are the same across EM iterations. Our aim is to apply Theorem 3 and show that the constant  ˜cin  uLdoes not vary across EM iterations. Note that other than checking the separation condition, the majority of this proof simply repeats the proof strategy of [1] but fills in details of results they call to make sure that constants don’t vary across iterations of EM.

This proof takes place in four parts. We first show that the separation condition holds. We then bound the expectation of the supremum of the magnitude of an empirical process. We next prove the two properties of the function involved in that bound to show the rate of convergence, and finally conclude by applying Theorem 3.

B.1.3 Separation Condition

We first prove that the separation condition given by Equation 22 holds. This involves applying functional second order Taylor expansions to  Q1and  Q2and using the remainder terms to obtain quadratic lower bounds.

Claim 2. For any  Λ, Λ′ ∈ Br(Λ∗),

image

Proof. Consider the functional second order expansion of  Q(Λ′|Λ)at  Q(M(Λ)|Λ), which we can do by [9].

image

where the last line follows since  Λ ∈ Br(Λ∗) so that ∥Λ − Λ∗∥∞ ≤ c4 and thus ∆Λ ≥ 12∆Λ∗ ≥ 12cw.p. 1. Noting that  ⟨∇Q(M(Λ)|Λ), Λ′ − M(Λ)⟩ ≤ 0by the KKT conditions,

image

and thus the separation condition holds since we optimized over  Θn ∩ Br(Λ∗).

B.1.4 Bounding the Expectation of the Supremum of the Magnitude of the Empirical Process

Our aim in this section is to apply Theorem 4 and use the result to show that the expected supremum condition, Equation 24 in Theorem 3 holds. Let

image

This section proceeds as follows. We first show that for all  f ∈ Mδ, Pf 2 ≤ c2δ2 for some constant c2 > 0and  ∥f∥∞ ≤ c3for some  c3 > 0. We next show a bound on the bracketing entropy in terms of the bracket size. We then use this to bound the bracketing integral using  δ1/2. We combine all of this to bound the expectation of the supremum of interest. We must carefully note that relevant constants do not vary across iterations.

Claim 3. For  Λ′ ∈ Θδ, P|mΛ′,Λ(Y )−mM(Λ),Λ(Y )|2 ≤ c2δ2 for some c2 > 0that does not depend on  Λ′ or Λ.

Proof.

image

Then note

image

where the first line uses the inequality  1 − 1x ≤ log(x) ≤ x − 1which implies�log�xy��2≤(x − y)2/ min(x, y)2. Plugging this back into equation 29 we obtain

image

so that we have  P|mΛ′,Λ(Y ) − mM(Λ),Λ(Y )|2 ≤ c2δ2 for some constant  c2, and c2does not depend on either  Λ′ or Λ.

Claim 4. For  Λ′ ∈ Θδ, ∥mΛ′,Λ(Y ) − mM(Λ),Λ(Y )∥∞ ≤ c3, where c3does not depend on  Λ or Λ′.

Proof. Again using���log�xy���� ≤ |x − y|/| min(x, y)|, we have

image

where we used that  Λ′, Λ∗, M(Λ) ∈ Br(Λ∗) an L∞ ball with r = c4

By theorem 2.7.5 of [28], which bounds the bracketing number of monotone functions mapping to [0, 1], and noting that  Mδhas bracketing number less than or equal to that of  Θ ∩ Br(Λ∗), which was shown in [2],

image

where  c4only depends on  Uall, the uniform upper bound in  Θ. Noting that  Mδ ∪ {0} = Mδ, we have

image

where again  c5only depends on  Uall. Let

image

and note that

image

Claim 5. Because  P|mΛ′,Λ(Y ) − mM(Λ),Λ(Y )|2 ≤ c2δ2 and ∥mΛ′,Λ(Y ) − mM(Λ),Λ(Y )∥∞ ≤ c3,we have

image

for  φn(δ) = δ1/2 + δ−1n−1/2, where c6does not depend on  Λ′ or Λ.

Proof. Here we use Theorem 4 in order to prove Equation 24 in Theorem 3. By Theorem 4 and again noting that  Mδ ∪ {0} = Mδ, we have

image

With this we have the bound on the expected supremum and have proven Equation 24 for Theorem 3.

B.1.5 Characterizing the Function in the Bound

We first prove Equation 25 in Theorem 3.

image

and thus  α = 12.

Next we prove Equation 26. Let  rn = n−1/3. Then

image

and

image

and thus

image

B.1.6 Putting it All Together

We have now proven that all of the assumptions of Theorem 3 hold, and that constants do not vary across iterations. Then for some constant  ˜c whichdoes not vary across iterations, for any L > 0 and using  α = 12, w.p. 1 − ˜c �j>M 2−3j/2,

image

B.2 Proof of Theorem 2

Note that if  ∀Λ ∈ Br(Λ∗), ∥Mn(Λ) − M(Λ)∥ ≤ 2Ln−1/3, then we have  supΛ∈Br(Λ∗) ∥Mn(Λ) −M(Λ)∥ ≤ 2Ln−1/3. We claim for any t > 0,

image

We prove by induction. First, this holds for t = 1.

image

Now assume holds true for t > 0. Then for t + 1,

image

Now iterating we have

image

C.1 Square Root Synthetic Experiment

We generate synthetic panel count data from mixed inhomogeneous Poisson processes with conditional mean functions  Λ∗(u)|X = Xu1/2, Λ∗(u)|X = Xu2, where  X ∼ uniform(0, 2). The mean functions are then  Λ∗(u) = u1/2and  Λ∗(u) = u2, respectively. The counting process conditional on X is Poisson, but the marginal counting process is not. We use 100 trajectories, each with 30 observations and for each observation set it to missing with probability 0.2. We initialize the mean function  Λ(0)by replacing the missing data with Poisson(1) random variables and fitting a model. We generate 1000 Monte Carlo runs and create Monte Carlo marginal confidence intervals from those runs.

Fig 3 compares the true mean function against AEE wrapped with our method vs AEE directly on the corrupted data. Taking the corrupted data as given learns highly biased results, while wrapping AEE with our algorithm learns close to the true mean function for both experiments.

image

Figure 3: Mean+95% Monte Carlo CIs of 1000 runs of Mixed Poisson processes, which violate the Poisson process assumption. Synthetic datasets: 20% missingness using AEE wrapped with our method. Black/solid is the true mean function. Red/dotted is initialization with missing data set to Poisson(1) values. Blue/dashed is our EM algorithm initialized at red. (a)  Λ∗(u) = √ua square root mean function (b)  Λ∗(u) = u2 a quadratic mean function

D.1 Comment on Mean and Confidence Intervals

In order to form the mean and marginal confidence intervals, we can note that the observation times are all discrete valued at times (months) 1 to 50. Thus we can simply take mean and marginal confidence intervals at those points. To do so, for each bootstrap replicate, we add a small amount of noise  ±1e − 6to the observed time points for identifiability purposes in order to train, and then take mean and marginal confidence intervals at the described points.

D.2 Varying Missingness Probabilities

In this experiment we vary the missingness probability with  ϵ = 0.1, 0.2, 0.3, 0.4. We do so for each of the five following methods: the non-parametric step function maximum pseudo-likelihood (NPMPLE) of [32], the smoothed maximum pseudo-likelihood (MPLs) and and maximum likelihood (MLs) estimators of [18], and the step function solution to the augmented estimating equation (AEE) and the informative censoring (AEEX) version methods of [30]. We show that bias is low but increasing as a function of the missingness probability for the five methods. Figure 4 shows results. The bias is very low in all cases. Further, in all cases it is much lower than the initialization with corrupted data shown in Figure 2a.

D.3 Varying Initialization

In this experiment we vary the initialization. We note that the mean count across all observations is 0.44. We investigate Poisson(1) to Poisson(4) initializations under both  ϵ = 0.2and  ϵ = 0.4. We use AEE in all cases. Figure 5 shows results. We see that further initializations increase bias, but only slightly for  ϵ = 0.2and more so for  ϵ = 0.4. This is not surprising as our theory requires good initialization and sufficiently low missingness probability.

D.4 Heterogeneity in Missingness Probabilities

Even if the MCAR assumption does hold, the missingness probability may vary between subjects. In this experiment, we let  ϵ|X = ϵmeanXwhere  X ∼ uniform(0, 2)and  ϵmean = 0.2, 0.4. This can capture between subject heterogeneity in missingness. Within each subject, we compare initialization of Poisson(1) to Poisson(4). Figure 6 shows results using the AEE method. They look very similar to results without heterogeneity in the missingness probabilities between subjects.

D.5 Missing at Random

Here we investigate the recovery under missing at random (MAR). For MAR, we note that 83.7% of the counts are 0: that is, there are no new tumors. We then set the missingness probability for a

image

Figure 4: results for mean and 95% CIs for 1000 bootstrap replicates for bladder tumor dataset, with the missingness probability  ϵ set to ϵ = 0.1, 0.2, 0.3, 0.4. We see that bias is very low in all settings, although it increases with the missingness probability. The methods wrapped with our method are (a) maximum pseudo-likelihood (MPL) of [32] (b) augmented estimating equations (AEE) of [30] (c) augmented estimating equations with informative censoring (AEEX) of [30] (d) maximum pseudo-likelihood splines (MPLs) of [18] note that for this we need to reduce the time axis slightly as spline models cannot interpolate far past the region where they have values in learning (e) maximum likelihood splines (MLs) of [18], same issue.

image

Figure 5: Here we vary the initialization by initializing increasingly far from the mean observed under complete data. The true sample mean of all intervals is 0.44. We initialize to Poisson random variables with means 1 to 4. We see that initializing further from the truth increases the bias, and that it is worse for the higher missingness probability of 0.4. Our theory requires good initialization and the missingness probability to be sufficiently low, so this is not surprising.

image

Figure 6: Here we let the missingness probability vary per participant while still being MCAR by multiplying the mean missingness by some value drawn from uniform[0, 2]. We again initialize to Poisson(1) to Poisson(4) for a) mean missingness 0.2 and b) mean missingness 0.4.

image

Figure 7: Missing at random (MAR). The time-varying missingness probability is  ϵj = 0.1if the previous  ∆Nj−1 = 0(approximately 84% of observations) and  ϵj = 0.3otherwise.

specific observation for a participant to  ϵj = 0.10if  ∆Nj−1 = 0and otherwise  ϵj = 0.3. Figure 7 shows the results. Recovery is very good.

In this section we plot several histograms and boxplots to get insight into behavior variability in the study. We plot cigarettes since the last assessment and number of days between EMAs, where for both we aggregate both between and within subjects. We also plot the percentage of long intervals between subjects. That is, we take the percentage for each participant and then take the boxplots and histograms with the summary statistic for each participant as a data point.

Figure 8 shows the plots for number of cigarettes since the last assessment. We see that most of the time they don’t smoke any cigarettes. The histogram looks like a geometric distribution for the number of cigarettes. The mean number of cigarettes is 1.76, the median is 0, and the 25 and 75 percentiles are 0 and 2, respectively. Thus, in at least half of the intervals they don’t smoke, and in 75% percent of them they smoke at most two cigarettes, but in some cases they smoke some huge number: in one case someone smoked over 40 cigarettes. That said, the number of cigarettes also will depend on the length of time elapsed since the last assessment, and the intensity function is time-varying: particularly, we noted in Figure 2b) that they tend to smoke more frequently in the pre-quit period.

Figure 9 plots the days between EMAs. Here we see that most observations are under one day, but there are a substantial number of outliers. In a number of cases the time between EMAs is over two days, and in several cases it is over six days. The mean time between observations is 0.34 days or approximately eight hours, the median is 0.17 or approximately four hours, and the 25 and 75 percentile are approximately two and 11 hours, respectively. Summarizing, most inter-EMA durations are under one day, but a few are very long.

Finally, also relevant is whether there is heterogeneity in the proportion of long intervals between subjects. Figure 10 investigates this. We see that participants vary between having no unreliable intervals and in one case having all unreliable intervals. The mean is 11%, the median is 6%, and the

image

Figure 8: a) boxplot b) histogram for number of cigarettes since the last assessment. We see that the median number of cigarettes is 0, and that there are many outliers.

image

Figure 9: a) boxplot b) histogram for number of days between EMAs. Recall that we treat intervals over one day as missing/unreliable. From the boxplot we see that the mean and 75% percentile are under one day, but that there are a substantial number of observations over one day. The histogram suggests a similar finding.

25 and 75 percentile are 1.8% and 11.4%, respectively. Thus there is substantial heterogeneity. While our theory does not currently account for this and we leave it to future work, the experiments from the previous section where each subject had a different missingness probability (Figure 6) suggests that our model can handle this issue well.

image

Figure 10: a) boxplot b) histogram for percent of long intervals between subjects. We see that most participants have a small proportion of long intervals (the median is 6.1%), while a few have a much larger proportion of long intervals.


Designed for Accessibility and to further Open Science