Super-resolution estimation of cyclic arrival rates

2016·arXiv

Abstract

1. Introduction

Real world arrival patterns typically exhibit cyclic (but not necessarily periodic) behaviour. Motivated by the need for tractable yet flexible functional forms for the arrival rate in queuing literature (Chen et al. [7]), we consider the following problem: Suppose we observe the jump times of a nonhomogeneous Poisson process (NHPP) in [0, T]. Here, N(t) denotes the number of arrivals in (0, t], and the intensity and the cumulative rate function are defined as

Our goal is to use the observed data to estimate arrival rates of the form

where the even number p of frequency components, the frequencies in a pre-specified band , and the complex coefficients are all unknown. Given the connections to Fourier series, this specification is very flexible and was introduced by Shao and Lii [19, 20]. They resolve the estimation problem under the classical setting where the frequencies are assumed to be spaced more than order 1/T apart. In this paper we examine the problem from the super-resolution perspective: We propose a simple procedure for estimating (1.1) when the frequencies can be up to order 1/T of each other. This is the finest possible resolution in the sense that no estimator can generally resolve frequencies separated by less than 1/T in the presence of noise [14].

Our approach modifies the classic periodogram and combines it with the super-resolution literature on total-variation/atomic norm regularization. Three ingredients (to be specified in Proposition 3) are used in Algorithm 1: i) A window function w(t) supported on [0, T]; ii) a threshold ; and iii) a neighbourhood exclusion radius r > 0. The simple but elegant intuition behind the thresholding idea (Donoho and Johnstone [8]) as applied to our situation is that the spectral energy (given by as defined in the algorithm) should be concentrated at the signal frequencies . If the signals are strong enough that exceed the ambient noise level, then setting above the noise level will result in the algorithm isolating a neighbourhood around each (see Figure 1.1). It will be shown that if the frequencies are separated from one another by a gap (resolution) of at least g(T)/T where , then with high probability our procedure will recover each with a precision of 2/T, provided that the dynamic range of the amplitudes is less than 14.5. This can be dramatically relaxed as the frequency gap is increased: For example if then the maximum allowable dynamic range exceeds 100. As discussed in section 3 of [20], some sort of dynamic range condition is needed even in the classical setting where the frequency gap is larger than order 1/T. Our analysis provides a way for quantifying the maximum allowable range when T is finite.

Figure 1.1. Visualization of Algorithm 1. In the depicted periodogram there are two signal frequencies and . Setting (horizontal line) above the ambient noise results in the algorithm selecting neighbourhoods (between the pairs of vertical lines) that contain and .

A notable aspect of our methodology is in introducing the windowed periodogram to the point process domain: Bartlett [1]’s classic ‘unwindowed’ periodogram for point processes is essentially, which is a special case of when w(t) is the rectangle window on [0, T]. We show that this window can and should be replaced with one that has faster decaying spectral tails. Doing so has two benefits. First, super-resolution can be achieved without needing to solve a semidefinite program. Second, even under the classical setting where the frequencies are spaced more than order 1/T apart (as ), frequency estimation is more precise with a windowed periodogram. For example, Figure 1.2 presents a log-log plot of the frequency estimation error versus T for various choices of g(T). The details of this experiment are elaborated upon in section 5.1. The plots show that the rate of convergence increases with g(T), with the windowed periodogram outperforming the unwindowed one until g(T) reaches order , whereupon both achieve the maximum rate of as predicted by theory.

The remainder of this paper is organized as follows. Our contributions to existing literature will be described below. Section 2 reviews some basic results from signal processing and shows how spectral leakage and windowing manifest themselves in arrivals data. This motivates the design of our estimation procedure. Frequency recovery is discussed in section 3 where , and r are specified. Under the conditions given in the section, it will be shown that our procedure will recover all frequencies to within a precision of 2/T with high probability. We will also articulate the tradeoffs involved relative to methods designed for the classical resolution setting. The estimator for the corresponding amplitudes and phases is given in section 4. In section 5 we use simulations to compare our procedure to the model selection approach in [19]. Concluding remarks can be found in section 6.

Contributions to literature. Our estimation procedure sits in between two streams of literature on spectral estimation. At one end, the classic approach is to visually inspect the unwindowed periodogram for point processes [1] to find frequencies corresponding to peaks in the plot. If it is assumed that there is only one frequency (Lewis [12], Vere-Jones [22]), the frequency corresponding to the

Figure 1.2. Frequency recovery error for the sim- ulation (5.4) as a function of T, for . ‘Win’ refers to the windowed periodogram and ‘Unwin’ refers to the classic one. The error rates are where s is the slope of the relevant fitted line.

largest peak of the periodogram is selected. Other approaches [2, 3, 11] also exist but it is unclear if they generalize to the setting with multiple frequencies.

Under the classical setting where the frequency gap is assumed to be 1/o(T), Shao and Lii [19, 20] extend the periodogram method to the multiple frequencies setting. Their procedure corresponds to setting in Algorithm 1 and running step 3 until p frequencies have been selected. The choice of p is determined using the AIC/BIC model selection criterion derived in the dissertation of Shao [19]. For BIC, [19] states that the probability of selecting the true p eventually approaches 1 as . Our procedure builds on [19, 20] in two directions. First, the use of windowing enables periodogram methods to achieve super-resolution, and this can be combined with either thresholding or model selection to estimate p. Second, finite sample performance bounds can be derived for our thresholding approach. This complements the BIC approach which does not come with high probability guarantees for finite data. Moreover, the bounds also provide a way for quantifying the allowable amplitude dynamic range when T is finite. As discussed in section 3 of [20], some sort of dynamic range condition is needed even if the frequency gap is larger than order 1/T. Of course, this will be more restrictive in the super-resolution setting, so there will be a cost to using our approach if the frequencies are in fact spaced far apart. This tradeoff will be discussed in section 3.

The other related stream of work is the super-resolution literature that uses total-variation or atomic norm regularization to select frequencies (Bhaskar, Tang, and Recht [4], Candès and Fernandez-Granda [6], Fernandez-Granda [10], Tang, Bhaskar, and Recht [21]). These papers study a generic spectral estimation problem in a discrete time setting. They generalize the -norm for a finite number of variables to the case where there is a continuum of predictors . An infinite dimensional extension of Lasso is then formulated and solved as a semidefinite program to select predictors and their coefficients. While the authors show that this method outperforms existing ones for the setting described, challenges arise when trying to adapt it to our problem. First, the arrival counts must be discretized into time bins, which introduces aliasing effects. Second, the required computational effort is overly taxingfor the size of problems we consider. For example, [7] analyzes 652 days of arrivals data from an emergency department and used 5,216 bins of 3 hour widths for the Lasso extension. The ADMM implementation recommended in [4] takes at least ten days to run on a computer with Intel i7 6500 cores. By contrast our procedure takes only a few minutes.

In terms of frequency recovery, the approaches in [10, 21] are guaranteed to pick out one or more frequencies within some C/T of each signal frequency when the resolution is 4/T. In the stochastic noise setting of [21] the guarantee holds with high probability, and they further conjecture that it is possible to prevent the selection of spurious frequencies. We contribute to this literature by resolving the conjecture in the affirmative, since our procedure recovers exactly p + 1 frequencies with high probability, one within 2/T of each true signal. The tradeoff with using a periodogram method is that a bound on the dynamic range of the amplitudes is needed. However as mentioned earlier, this can be dramatically relaxed by widening the frequency gap slightly, from 4/T to 6/T for example.

2. Overview of the estimation approach

Let the continuum of complex exponentialsbe our dictionary for constructing an arrival rate. Suppose the rate for the underlying NHPP is (1.1), which belongs in the collection

(2.1)

Since is real-valued, (1.1) will lie in the subset where the presence of implies its conjugate , so in particular will be real and positive. The quantity of interest is the (p + 1)-vector of frequencies in (1.1), where p is even but unknown. Given these, the coefficients in (1.1) will be estimated by the complex-valued least squares solution (4.1) described in section 4. Since is unobservable, we only see arrivals in the time window [0, T]. Estimating the intensity therefore becomes a question of recovering from the frequency components in the trajectory . To make the connection between the spectrums of the two quantities clearer, rewrite the latter in its Doob-Meyer form of signal and noise components

where is the indicator function of . Even in the absence of noise, the spectrum of the signal component is itself a distorted version of the one for : Denoting the Fourier transform of f(t) as

we can write the spectrum of as the sum of the Dirac delta spikes centred at :

On the other hand is the result of truncating due to T being finite, a spectrum distorting operation known as leakage: Denote the convolution operator ∗ by , the h-smoothed average of f about the point t. The spectrum of is

a weighted average of ’s spectral values concentrated at . Thus trun- cation has the effect of smearing the frequency spikes in into a continuous spectrum: For can have a non-zero value, creating an arti-ficial noise floor. The noise floor around strong signal frequencies may mask weaker neighbouring signals, leading to resolution loss and making it difficult to recover from . Leakage distortion is a manifestation of the uncertainty principle because perfect frequency localization requires , but this is only possible if , i.e. an infinite time window is needed.

The key idea that Algorithm 1 uses to deal with leakage is to replace with a suitably chosen window function w(t) to obtain the weighted arrival process . We see from (2.3) that the extent of leakage depends on the tail decay of , as this dictates the influence that distant frequencies has on the local spectral value. Since can be truncated to (0, T] using any w(t) supported on (0, T], we can multiply with one whose Fourier transform has lighter tails.

While the usual anti-leakage benefits of non-uniform windows is well known in signal processing, they are in fact needed in our procedure for attaining frequency resolutions of order 1/T: The tail decay of is of order . Thus if are spaced 1/T apart, the leakage (2.3) around a neighbourhood of from the other frequencies can be of order log p for the rectangle window. This can easily mask the periodogram spike at when p is large enough. Hence the classic periodogram method is generally unable to attain frequency resolutions of order 1/T. Interestingly the window that is usually considered optimal for signal processingis actually suboptimal for frequency recovery: Theorem 3.44 of [15] shows that the spectral tail decay of the prolate spheroidal function is also of order when it is time-limited to [0, T].

Returning to the problem of recovering from (2.2), consider the (1/T)-scaled spectrum of the windowed data :

(2.4)

Recall from Algorithm 1 that is defined as the windowed periodogram. For sufficiently far from , the noise level outside the vicinity of these frequencies should be low for light tailed :

(2.5) || ≤ ∥

If the signal strengths are sufficiently strong, then intuitively a neighbourhood of can be isolated by simply excluding frequency regions in where is below some threshold (see Figure 1.1). This is the idea behind step 2 of Algorithm 1. The analysis presented in the next section will guide our choices for , and r in our estimation procedure.

3. Frequency recovery

To guarantee that Algorithm 1 will recover the true signal frequencies with high probability, we will assume that conditions A1 and A2 given in this section hold from the point they are stated. First, since no method can distinguish among frequencies that are clustered arbitrarily close together, we impose a minimum separation gap.

The gap g(T)/T represents the frequency resolution for our procedure, and our recovery results cover all possible rates of growth for g(T) as . The lower bound of 4/T benchmarks the frequency gap employed in the super-resolution literature [10, 21]. If instead the benchmark target is the classical setting in [19, 20], then A1 may be relaxed to 6/T, see the remark following Proposition 3 below.

Under A1, we must localize each to within a neighbourhood of radius 2/T to avoid possible ambiguity from overlapping. To achieve this with thresholding, note from (2.5) that if is at least 2/T away from the nearest , then is strictly

less than

where the tail sum bounds the leakage noise floor outside the vicinity of , and the last term is the statistical noise level. The unknown can be estimated using the highest peak of the periodogram: It is shown in Appendix A that

(3.2)

Substituting the bound (3.2) for into (3.1) shows that the threshold level in Algorithm 1 should be

in order to remove from the region R all frequencies not within 2/T of any . Our procedure will then select a unique frequency within 2/T of each if , so we can set r = 2/T. In view of (2.4) and (3.3), a sufficient condition for is

for , or equivalently

(3.5)

It will be shown that the first two terms are dominant. Hence to first order, as the tail sums and become small relative to , a larger margin of separation between signal and leakage noise is attained in frequency domain. Therefore window functions with rapidly decaying spectral tails are desired. Of the commonly used continuous time windows presented in Table 3.1 of Prabhu [16]

Figure 3.1. Plot of for the Hann window (3.6). Left panel: Most of the energy is concentrated in the main lobe between . Right panel: The side lobes are of width 1/T and have successively lower peaks.

with spectral energy concentrated inside , the time-shifted Hann window has the lightest spectral tails (order ):

where is the sinc kernel. Note from Figure 3.1 that is symmetric and most of its energy is concentrated inside the main lobe between . The sidelobes are of width 1/T and have successively lower peaks. The following lemma provides estimates for and .

Lemma 1. For the Hann window

Furthermore if we define , then for any ,

The remaining quantity not yet examined in (3.4) and (3.5) is the supremum spectral density of the windowed statistical noise. Noting that and , the following lemma shows that the scaled spectral noise level is of order .

and

Lemmas 1 and 2 can be used in (3.4) to define the data-driven threshold

As T grows the last term in A2 vanishes, so to first order the condition requires the dynamic range of the amplitudes to be less than 14.5. The smaller the tailsums and are, the larger the allowable range. In particular if the gap in A1 is slightly relaxed from 4/T to 6/T, the value of 14.5 can be increased to over 100 by replacing the Hann window with the lighter spectral-tailed window [16]. Thus windows with light spectral tails provide a solution for detecting weak frequency signals in the presence of strong ones. This addresses a point mentioned in passing on page 110 of [20]: Issues with the periodogram method arise when the dynamic range is large, even in the classical setting where the frequency gap is 1/o(T). Our analysis provides a way for quantifying this for both the windowed and unwindowed periodograms when T is finite. In the special case where all the frequencies have the same amplitude , A2 simplifies to requiring the amplitude to be larger than a multiple of the statistical noise level (last term of A2).

The main frequency recovery result can now be stated under A1 and A2.

Proposition 3. Let w(t) in Algorithm 1 be the Hann window (3.6), and set r = 2/T and as (3.7). Then with probability at least

our procedure will select exactly p + 1 frequencies with precision . Furthermore,

Remark. Through the use of windowing, we obtain the first periodogram peakhunting method that is able to achieve super-resolution. Note from the definition of that if then the procedure will recover all frequencies with precision o(1/T). In particular, if g(T) is or greater then the estimation error is up to a log factor. For the unwindowed periodogram in the closely related time series setting, Theorem 6.8b of Li [13] shows that the same rate is achieved when g(T) is greater than . This is because has a bias of due to the slower spectral tail decay of the rectangle window (Remark 6.14 of [13]). Thus even under the classical resolution setting, windowing is still beneficial since it sharpens the precision of the frequency estimates.

Remark. In applications, , and are chosen to balance a number of considerations. First is the expected dynamic range (A2) for the particular problem being considered. Second is the bandwidth B: If A2 holds for values of satisfying , then the probability bound above is to leading order in B/T, in which case B has the same asymptotic scaling as [19, 20, 22]. Third is the desired recovery probability. One possible choice that balances these considerations is , and , which simpli-fies the probability bound to

Remark. When p is known, no thresholding is necessary, and the asymptotic normality results in [20] for the classic periodogram can be extended to the windowed one. The details are provided in Appendix B.

When does the approach of [19, 20] perform better? If are infact spaced more than order 1/T apart from one another, then it follows from (2.5) that the leakage outside a O(g(T)/T)-neighbourhood of the frequencies is of order for the Hann-windowed periodogram. Hence the threshold (3.7) is conservative in this setting. While it will still work within the dynamic range implied by A2, we expect the method in [19, 20], which was specifically designed for the classical resolution setting, to recover more of the frequencies with amplitude less than 1/14.5 of the largest one. Of course, the Hann window analyzed here can also be used with the method in [19, 20].

Connection to super-resolution literature. There are clear connections between our results and those arising from the work on super-resolution recovery of discrete time signals [4, 6, 10, 21]. In that setting the authors assume a discrete time signal and the observations are of the form y = x+e where is a noise vector.

For a bounded e, [6, 10] establish signal and support recovery guarantees for their semidefinite programming approach. On the other hand for the related AST approach [4, 21] achieves near minimax rates. Furthermore if is larger than some multiple of , then with high probability AST is guaranteed to pick out one or more frequencies within some C/n of each signal frequency. The authors conjecture that it is possible to prevent the selection of spurious frequencies, and that the sparsity p can be dropped from the lower bound on . The following corollary shows that our procedure resolves these conjectures in the affirmative when applied to this setting.

Corollary 4. Suppose T is replaced by n and is replaced by in A2. Under the discrete time setting above, with high probability our procedure will select exactly p frequencies within distance of the true ones.

Modified threshold. The last term in the threshold (3.7) comes from the spectral noise bound in Lemma 2, whose constant may be conservative. As a result we observe in experiments that a large value of T is sometimes needed for the guarantees to hold with high probability. To obtain a tighter estimate, one idea is to approximate the spectral noise level of the underlying nonhomogeneous Poisson process with that of a homogeneous one. This is motivated by the fact that the noise bound in Lemma 2 depends on only through the average rate , regardless of whether the Poisson process is homogeneous or not. Thus for a given , consider the modified threshold

where is the simulated for the homogeneous Poisson process with rate over [0, T]. It is equivalent to applying the expression (5.2) to simulated data. Clearly, if the second quantity in the curly bracket is smaller then we effectively recover (3.7). In experiments we find that thresholding with performs better than in practice. The following corollary provides a large sample recovery guarantee for .

all frequencies will be recovered with the precision stated in Proposition 3 when we threshold with .

Remark. If the second condition is to ever hold, must then be at least of order .

4. Amplitude and phase estimation

As noted by Rice and Rosenblatt [17] for the case of cyclic time series and [19, 20] for the case of cyclic Poisson processes, it is necessary for the estimated frequencies to be within o(1/T) of if we wish to estimate the coefficients consistently. We will therefore let in Proposition 3 so that . Our estimator is the complex-valued least squares solution to (2.2) in the limit :

where the j-th entry of the (p + 1)-vector y is , and the (j, k)- entry of the matrix is

where is the Fourier transform of the rectangle. Since are symmetric about zero, it can be shown for that and are conjugate pairs, hence the estimator for is always real-valued. We note that the corresponding estimator in [19, 20] can be recovered by setting to the identity matrix, which is asymptotically valid because converges to an orthonormal design as . Our choice of provides a second order correction when T is finite.

Proposition 6. Suppose the conditions for Proposition 3 hold with , and that

is invertible. Then with probability at least

i) is also invertible for sufficiently large T; and ii)

5. Numerical examples

We use simulations to compare our thresholding procedure (based on the modi- fied threshold) to the windowed periodogram combined with BIC model selection, and also to the classic periodogram in [19, 20] combined with BIC. We also use our procedure to analyze arrivals data from an academic emergency department in the United States. We focus on the BIC because it is asymptotically consistent, and the corresponding penalized log-likelihood for Poisson processes is derived in section 3.3.4 of [19]:

(5.1)

The algorithm for using the windowed periodogram with BIC selection corresponds to setting in Algorithm 1 and running step 3 until p frequencies have been selected. The value of p is chosen to minimize (5.1).

Since by default the frequency is always selected, we work with the centralized version of instead:

which is one way to generalize the centralized unwindowed periodogram given by equation 4 in [20]. This approximately removes from the windowed periodogram the peak at the origin.

The asymptotic analysis in [19, 20] recommends a minimum exclusion radius

of r = 3/T, which corresponds to assuming a frequency gap of at least 6/T in the finite T setting. Hence, in order to compare the methods, we also set r = 3/T in Algorithm 1 and assume that . The modified threshold corresponding to (3.8) is then

where we choose to be small. In all our analyses, turns out to be always smaller than the lower bound for the second quantity in the curly bracket. When the maximum allowable dynamic range widens to 47 under the Hann window.

5.1. Frequency recovery error rate. We use the following simulation to empirically study the error rates for frequency recovery in Proposition 3, which shows that when g(T) is constant the error is no greater than O(1/T). As remarked after the proposition, the error rate becomes for g(T) equal to or greater than . In the closely related time series setting, the unwindowed periodogram achieves the same rate when g(T) is greater than (Theorem 6.8b of [13]). We will therefore examine as a function of T at the frequency resolutions corresponding to . Consider the following class of arrival rates

whose frequencies are spaced apart by g(T)/T. The amplitudes are drawn randomly from U[1, 1.5], and the phases from . For each combination of T and g(T) we sample 100 sets of the amplitudes and phases, and then use each set to simulate the corresponding arrival process up to time T.

For the values of T considered, all frequencies are detected by the three methods. Hence BIC selection and thresholding produce the same results when applied to the windowed periodogram. Figure 1.2 plots on a log-log scale the error averaged across the 100 simulations for each combination of T and g(T). The slopes of the fitted lines estimate the error rate. For this example the windowed periodogram performs even better than what the theory predicts, achieving an error rate of almost even for g(T) = 6. All methods attain this rate when .

5.2. Misspecified arrival rate in the classical resolution setting. The saw-

tooth wave in [20] provides a nice example for testing the robustness of the methods to misspecifications to (1.1). Consider the arrival rate

which has an infinite number of Fourier series frequencies spaced apart. We simulate 100 realizations of the arrival process up to time T = 1, 000, which is well within the classical setting where the frequencies are spaced 1/o(T) apart. To assess the accuracies of the three methods at estimating , we use the average of the MSE across the 100 samples as the performance metric. We also report the average number of correct and spurious frequenciesrecovered by each method in Table 1.

Per the discussion in section 3 regarding when the classic periodogram method should outperform our approach, (5.5) fits the bill since the frequencies are separated by much more than order 1/T. Interestingly, the differences in performance among the methods are not statistically significant for this example.

Table 1. Results for the sawtooth intensity (5.5). Column UBIC is the unwindowed periodogram combined with BIC selection, WBIC is the windowed periodogram with BIC selection, and WThres is the windowed periodogram with thresholding. Averages over 100 simulations are reported (standard errors in parentheses).

5.3. A super-resolution example with varying dynamic range. The follow-

ing arrival rate is inspired by Professor E.H. Kaplan’s analysis of arrivals data to a psychiatric ward, where the existence of a lunar and a monthly cycle are verified:

The two frequencies at 1/28 and 1/30 are separated by a gap that is slightly larger than 6/T when T = 3, 000. The monthly cycle is r times stronger than the lunar one, meaning that leakage from the former can easily mask the latter when r is large. The left panels of Figure 5.1 display the centralized windowed periodograms for different values of the dynamic range r, and the right panels display the corresponding unwindowed periodograms. Here we apply thresholding to the windowed periodograms; BIC selection performs similarly.

For r = 10 (top row), both periodograms are able to resolve the two frequencies. For r = 15 (middle row) only the windowed periodogram is able to detect the weaker lunar cycle. Both methods fail to identify the lunar cycle when r = 50 (bottom row), although the windowed periodogram is still able to do so for r = 45 (not shown). This illustrates the role of windowing in suppressing leakage, thereby allowing for super-resolution frequency recovery. Moreover, our findings match the calculations at the beginning of this section that show the Hann-windowed periodogram has a maximum allowable dynamic range of 47 when . If there are actually more frequencies in (5.6) that are O(1/T) away from the lunar cycle, then the leakage around 1/28 in the classic periodogram will be of order log p as explained in section 2. In such cases the classic periodogram may not be able to detect the lunar cycle even if the dynamic range is 1.

5.4. Patient arrivals to an emergency department. Our last example ana-

lyzes arrivals data from the emergency department of an academic hospital in the United States. We focus in particular on the arrivals of 66,240 mid-acuity levelpatients from 2014 to Q3 of 2015 (T = 652 days).

As shown in the left panel of Figure 5.2, three intraday frequencies and five week-based ones are selected from the centralized periodogram. The intraday frequencies include a daily cycle (), a 12 hour cycle (), and an 8 hour cycle (). The week-based ones include a weekly cycle (), a half week cycle (), a 1/5 week cycle (), a 1/6 week cycle (),

Figure 5.1. Results for (5.6). Left panels: The centralized Hannwindowed periodograms. The threshold is represented by the horizontal line, and the locations of the frequencies and their estimates are given by the vertical ones. Right panels: The unwindowed centralized periodograms. Top row: r = 10, middle row: r = 15, bottom row: r = 50.

and a 1/8 week cycle (). Given that the fitted rate has a weekly period, we can compare it to the average arrival rate for each of the 168 hours of the week (right panel of Figure 5.2). Overall, we see that using 8 frequencies to model the arrival rate does almost as well as using 168 piecewise constant hourly fits. Moreover the sinusoidal estimate reveals two intraday peaks, the first at around 11am and the second at around 5pm. We also see that the intensity of arrivals fade steadily into the weekend.

Figure 5.2. ESI level 2 arrivals. Left panel: The centralized windowed periodogram. The selected threshold is represented by the dashed horizontal line, and the location of the frequency estimates are given by the vertical ones. Right panel: The estimated arrival rate (arrivals per day) over the course of a week is given by the solid line. The dash-dot line represents the empirical average arrival rate for each hour of the week.

6. Discussion

By a novel use of windowing, this paper shows that simple periodogram methods can in fact achieve super-resolution frequency recovery for cyclic arrival rates. This improves the resolution of classic periodograms, while being much faster to compute than the SDP approach in super-resolution literature. Under mild assumptions on the dynamic range of the frequency amplitudes, our approach guarantees that no spurious frequencies will be recovered. To establish the consistency of the coefficient estimates, our finite sample results show that if the frequency gap is 1/o(T), then the frequencies can be recovered with precision o(1/T) as required. Whether the gap can be relaxed to order 1/T is a question that is left for future research.

Another area for future research is to extend the cyclic specification (1.1) to allow for higher order non-cyclical components as well. One approach is to add wavelets to the basis of complex exponentials. It might then be possible to leverage the rateoptimal procedure in Brown et al. [5] to estimate the time-localized components of the arrival rate.

Acknowledgements

The review team provided many insightful comments that significantly improved the paper. Special thanks to Ed Kaplan and Don Green for stimulating discussions on spectral analysis. The emergency department arrivals data was kindly provided by Dr. Kito Lord. NC acknowledges the support from the HKUST start-up fund R9382. SNN acknowledges support from NSF Award DMS 1723128.

which establishes (3.2). For (3.3), let and suppose is the signal frequency closest to . If then (2.5) gives

(A.1)

Consider the alternative . If then the l-th signal frequency to the left of is at least 4l/T away, and the l-th signal frequency to the right of is at least away. Hence

(A.2)

The same bound also applies when , so combining (A.1) and (A.2) gives (3.3).

Proof of Lemma 1. We begin by estimating the side lobe heights of the Hann window’s spectrum.

Lemma 7. For defined in (3.6), let be the location of the peak of ’s side lobe in the interval for . Then where is the larger root of . Hence

Proof. By symmetry it suffices to consider the heights of the side lobes of 1} over intervals for k even, and for odd .

The first order condition implies that is the root of

Over the left hand side is decreasing from to and crosses zero at . The right hand side is positive and also decreasing, therefore the two sides intersect somewhere in . On this subinterval, linearizing about yields the lower bound . The locations at which this intersects

must all be less than . Therefore the larger root of 1/k)},

is a lower bound for . The bounds on in the lemma follow directly from the bounds on , and from noting that and for

where the tail is bounded using the polygamma function of order 2. The corresponding lower estimate is

The tail sum can be estimated in the same manner. To derive the remaining bounds stated in the lemma, define

so that for . If then for some integer under A1. Hence

Bearing in mind that ,

Therefore

The bound on the sum of the derivatives follows from some algebra showing that

Proof of Lemma 2. Lemma 8 below is required for the proof of Lemma 2. It gives a concentration bound for weighted sums of Poisson process increments using standard results for sub-exponential variables.

1. Then for z > 0,

and

Proof. Recall that for a centred Poisson random variable Z with rate

and that for we have

The claim is clear for . For one can show that by comparing its power series to a dominating geometric sum. Furthermore note that for any , hence optimizing Chernoff’s bound for

within this range gives

with the minimizer being

We now prove Lemma 2.

Proof. Recall from (2.2) and (2.4) that . Partitioning [0, T] into L intervals each of width , we find that for [0, B],

(A.3)

where are independent and centred Poisson random variables with rates . By rewriting , it follows from Lemma 8 in Appendix A that the first term above exceeds

(A.4)

with probability less than . To control the supremum over

of the last term in (A.3), we express it in the dual norm terminology of

[4]:

According to Appendix C of [4] the dual norm and its approximation

are equivalent:

Therefore, to bound , note that

where the third inequality follows from the union bound and the fourth one from . The last inequality follows from Lemma 8.

Substituting the bounds (A.4) and (A.5) into (A.3) reveals that

with probability at least

Since this holds for arbitrary , setting gives

with probability at least

Finally, it is easy to show using Chernoff’s bound that

Proof of Proposition 3. The proof requires the use of two results. First, it is easy to show for the Hann window that

Second, under the setting of Lemma 2,

To see this, note that . The window v(t) = tw(t)/T satisfies and under the conditions in the lemma. Hence (A.7) follows from applying Lemma 2.

Proof of proposition. That with high probability is clear from the development of section 3. We will use this as the starting point for obtaining a sharper bound in the region .

By a unitary renormalization of and a time shift, we may assume that is real and positive and that . Rewrite (2.4) as

By the local optimality of ,

so combining this with the right hand side of (A.6) yields

Returning to (A.8), an application of the mean value theorem to yields

for some . In view of the left hand side of (A.6) as well as (A.9), . Furthermore, Lemmas 1 and 2 imply that

where the last inequality comes from . Since ,

we desire a lower bound for in (A.10). Putting the bounds for and into (A.10) yields

which when combined with the right hand side of (A.6) gives . The derivative can be bounded using Lemma 1 and (A.7):

Proof of Corollary 5.

Proof. Specializing (3.2) and (3.3) to the Hann window gives

which will used throughout to bound in terms of and vice versa. In addition, Lemma 2 shows that with the stated probability the spectral

noise level is controlled by

When is at least 2/T away from the nearest , (3.4) tells us that

Hence, no spurious frequencies will be selected. To select the k-th frequency it suffices for . Along the lines of deriving (3.5) we see that

Furthermore

so it follows from the strengthened version of A2 that for all k, and we inherit the estimation precision of Proposition 3.

Proof of Proposition 6. The following bound is needed in the proof below: For any frequency pair ,

This follows from

where the inequality obtained from interchanging the modulus and integral is valid for complex-valued integrals. The second equality follows from , and the penultimate inequality from .

We will also need a matrix norm for the proof: When the collection of (p + 1) complex-valued matrices is equipped with the maximum row sum norm

it becomes a Banach algebra because is submultiplicative. Hence the resolvent admits the expansion for (Theorem 18.3 of [18]). Thus an invertible matrix remains invertible when perturbed by an error D with norm smaller than :

Proof of proposition. Under Proposition 3 we have , hence (A.11) implies that

so and result i) follows from (A.12). Henceforth we will assume that exists.

Next, observe that is the value of the periodogram (2.4) at when the rectangle window is used, so

where is a vector whose j-th entry is , and E is a matrix with (j, k)-entry . It follows from (A.11) that .

Furthermore, since the rectangle window satisfies the conditions in Lemma 2,

To complete the derivation of result ii), note that and

The derivation of the asymptotic normality results herein closely follows the setting and argument in Shao and Lii [20]. We extend the result for the estimator obtained from the classic periodogram under the known p setting to the windowed periodogram. Let be the frequency estimates obtained from the windowed periodogram, and consider the cosine representation of the arrival rate in (1.1), , where we can assume without loss of generality that .

Proposition 9. If as , then is asymptotically normal with zero mean and covariance

where

Proof. The following quantities are asymptotically normal with mean zero:

and the asymptotic covariance of and is

A Taylor expansion shows that

By our hypothesis that , the last two terms are , and also . Hence

and therefore

Putting everything together establishes the claimed asymptotic normality for

If as , the asymptotic behaviour of the coefficient estimate given by (4.1) is identical to that of y since converges to an orthonormal design. A standard application of the delta method then establishes the asymptotic

normality for the real and imaginary parts of

References

[1] MS Bartlett. The spectral analysis of point processes. J. R. Statist. Soc. B, 25(2):264–296, 1963.

[2] M Bebbington and R Zitikis. A robust heuristic estimator for the period of a Poisson intensity function. Methodol. Comput. Appl. Probab., 6(4):441–462, 2004.

[3] E Belitser, P Serra, and H van Zanten. Estimating the period of a cyclic non-homogeneous Poisson process. Scandinavian Journal of Statistics, 40(2): 204–218, 2013.

[4] BN Bhaskar, G Tang, and B Recht. Atomic norm denoising with applications to line spectral estimation. IEEE Trans. Sig. Process., 61(23):5987–5999, 2013.

[5] L Brown, T Cai, R Zhang, L Zhao, and H Zhou. The root–unroot algorithm for density estimation as implemented via wavelet block thresholding. Probability theory and related fields, 146(3-4):401–433, 2010.

[6] EJ Candès and C Fernandez-Granda. Super-resolution from noisy data. Journal of Fourier Analysis and Applications, 19(6):1229–1254, 2013.

[7] N Chen, DKK Lee, and HP Shen. Can customer arrival rates be modelled by sine waves? Working paper, 2018.

[8] DL Donoho and JM Johnstone. Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3):425–455, 1994.

[9] A Dutt and V Rokhlin. Fast Fourier transforms for nonequispaced data. SIAM J. Sci. Comput., 14(6):1368–1393, 1993.

[10] C Fernandez-Granda. Support detection in super-resolution. In Proceedings of the 10th International Conference on Sampling Theory and Applications, pages 145–148, 2013.

[11] R Helmers and IW Mangku. On estimating the period of a cyclic Poisson process. Mathematical Statistics and Applications: Festschrift for Constance van Eeden (eds Moore, Froda, Leger), IMS Beachwood, pages 345–356, 2003.

[12] PAW Lewis. Remarks on the theory, computation and application of the spec- tral analysis of series of events. Journal of Sound and Vibration, 12(3):353–375, 1970.

[13] TH Li. Time series with mixed spectra. CRC Press, 2014.

[14] A Moitra. Super-resolution, extremal functions and the condition number of vandermonde matrices. In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, pages 821–830. ACM, 2015.

[15] A Osipov, V Rokhlin, and H Xiao. Prolate spheroidal wave functions of order zero. Springer Ser. Appl. Math. Sci, 187, 2013.

[16] K Prabhu. Window functions and their applications in signal processing. CRC Press, 2013.

[17] JA Rice and M Rosenblatt. On frequency estimation. Biometrika, 75(3): 477–484, 1988.

[18] W Rudin. Real and complex analysis. McGraw-Hill, 1987.

[19] N Shao. Modeling Almost Periodicity in Point Processes. PhD thesis, University of California, Riverside, 2010.

[20] N Shao and KS Lii. Modelling non-homogeneous Poisson processes with almost periodic intensity functions. J. R. Statist. Soc. B, 73(1):99–122, 2011.

[21] G Tang, BN Bhaskar, and B Recht. Near minimax line spectral estimation. IEEE Trans. Inf. Theory, 61(1):499–512, 2015.

[22] D Vere-Jones. On the estimation of frequency in point-process data. J. of Appl. Probab., pages 383–394, 1982.

Designed for Accessibility and to further Open Science