Many problems that arise from sensor arrays, stochastic control, mobile communication, and signal processing in general, are modeled by a linear combination of damped sinusoids, with stationary frequencies. However, in the current age of big data, particularly in time series analysis, the sinusoids are mostly non-stationary, in that the phase functions are not necessarily linear in time. A general mathematical model for such non-stationary signals or time series may be formulated by
or more generally by the complex variant
where the phase functions ) are differentiable, the amplitude functions
) are complex-valued and continuous, and
) is a minimally oscillatory real-valued function, called the trend of
). Here, the subscript I of
) is used to indicate that noise is not attached to the model. This is only a convenience of notation. Our methods do work well in the presence of noise, as will be proved theoretically in Theorem 2.1 and experimentally in Section 5. Without the trend function
) in (1.1), the model is called the amplitude modulation-frequency modulation (AMFM) model in signal processing, and the adaptive harmonic model (AHM) in the current mathematics literature (see, for example, [17, 9, 19, 6, 8, 5]). In our recent paper [7], the model (1.2) was called the Hilbert spectrum model (HSM). The objective of this paper is to introduce and develop an efficient and effective construction of deep networks for solving the inverse problem of recovering the number K of terms in (1.2), the instantaneous frequencies (IFs) defined by
), the amplitude functions
), and the signal components
) :=
) cos
), for j = 1
, as well as the trend
), from a sufficiently “dense” finite data set
) : k = 0
, that are allowed to be non-uniformly sampled on a bounded time interval [b, c]. Of course our constructions and theoretical development apply to infinite non-uniform data samples on an infinite interval as well, simply by ignoring the need of meeting the requirement of certain boundary conditions.
It is important to point out that our objective of solving an inverse problem is different from that of the popular empirical mode decomposition (EMD) method, introduced in [14]. Indeed, EMD is an ad hoc computational scheme for the decomposition of a non-stationary signal or time series g(t) into its intrinsic mode functions “IMFs” with residue, called the “trend”, without the concern of recovering the actual IMFs and trend that constitute the source signal or time series g(t). To emphasize the difference between EMD and our inverse problem, we recall the pioneering work [13] of Gaspard de Prony in resolving an exponential sum H(t) with K exponents and K corresponding constant coefficients, from 2K samples H(0), H(1)1), where K is known, by first computing the K exponents before computing the K coefficients. In particular, if the exponents are
, with
, Prony’s method can be applied to solving the inverse problem (1.2), at least in theory, for stationary signals without the trend
). Unfortunately, Prony’s method fails when some of the frequencies
are very close to one another. The reason is that when these unknown exponents are considered as zeros of a polynomial p(z), with coefficients to be computed from Hankel matrix inversion, computation of these coefficients and the zeros from solving some corresponding eigenvalue problem is highly unstable. In any case, the procedure of first recovering the instantaneous frequencies before computing the signal components, as pioneered by Daubechies in [10, 9], and just about all later development in mathematics, including [17, 20, 5, 6, 19, 8, 7], is opposite to that of EMD, which first computes the signal components, called intrinsic mode functions (IMFs) by repeated applications of the “sifting process”, before computing the instantaneous frequencies by applying the Hilbert transform to each IMF.
On the other hand, the approach in our present paper is substantially different from those in the published literature, including [19, 17, 10, 9, 20, 5, 6, 8, 7]. First, unlike these papers, we work with non-uniform samples of . We then approximate
using a spline quasi-interpolant [1]. This quasi-interpolant can be implemented using deep networks as suggested in [15]. The signal separation operator, (SSO), proposed in [6] can then be evaluated using uniform samples of this quasi-interpolant. The SSO operator evaluates a trigonometric polynomial, which can also be synthesized using a further neural network as proved in [16], and its thresholding by another layer of a network computing the rectified linear units ReLU activation functions. As in [6], clustering and finding local maxima then lead to the determination of the instantaneous frequencies. Importantly, in contrast to [9], we obtain the (complex) amplitudes simply by evaluating the SSO at these frequencies.
One interesting aspect of our construction is that the deep networks need not be “trained” in the usual sense, but are theory inspired, in that their parameters are prescribed by the theory without training. Our constructions are presented schematically in Figure 1 below.
Figure 1: Theory inspired deep network to compute the thresholded values of SSO given non-uniform samples of the signal .
After describing our Signal Separation Operator defined in [6] in Section 2, we develop the spline approximation
in Section 3. The actual construction of deep networks as in Figure 1 is described in detail in Section 4, and illustrated in Section 5.
In this section, we review our construction of the signal separation operator (SSO) and its properties in [6]. In [6], trend extraction was done separately; the operator SSO is designed to separate from the Hilbert spectrum model
the components ) and the instantaneous frequencies
), finding the number K of these quantities in the process. The following definition summarizes the conditions assumed on the signal.
Definition 2.1. For each , let H(t) denote the collection of functions f of the form (2.1) where each component
) =
) exp(
)), j = 1
, satisfies each of the following conditions.
1. is continuous, and
is continuously differentiable.
2. With B = B(t) := max(2.2)
A crucial ingredient in the definition of SSO is the notion of a lowpass window function defined below.
Definition 2.2. A real-valued function h(u), defined for all , is said to be an admissible window function, if 0
) is an even function with support
1], such that
0 for some
).
Observe that since h is continuous, 0 implies that h(u) > 0 in some neighborhood of
, so that
for all sufficiently large values of n > 0. In the sequel, we assume h to be a fixed low pass window function that is at least 3 times continuously differentiable. In the definition of the operator SSO, we allow a perturbation of the original signal. Thus, we write
where f is as in (2.1), and is a perturbation. For each fixed
, the operator SSO works with equidistant samples
for some
0. In the remainder of this paper, let T denote the quotient space of R with equivalence relation
defined by (
, so that
) mod 2
.
Definition 2.3. (Signal separation operator, SSO) For and
, the signal separation operator
, applied to functions F in (2.5), is defined by
where h is an admissible window function and 0 are parameters, with n chosen to be an integer so that
, as defined in (2.4), is positive.
The statement of our main theorem [6, Theorem 2.4] given below in Theorem 2.1 below requires some further notation. We will consider to be fixed, and denote for brevity
We will further use the following abbreviated notation
to facilitate the statement of the theorem. Observe that if the parameter of the SSO
is chosen to satisfy
where B = B(t) is defined by (2.2), it follows from (2.2) and (2.8) that (0
2] for each k = 1
, and
Theorem 2.1. Let be fixed, and F(u) = f(u) +
) as defined in (2.5), with
) and
for some constant E > 0, where is as in Definition 2.1. Also, let n be the smallest integer satisfying
Then the following statements hold for all sufficiently small 0.
(a) The set [0
] :
can be expressed as a disjoint union of exactly K non–empty sets
= 1
, where K is the number of signal components
of f in (1.1), with the following properties:
We note that the application of SSO requires equidistant samples around each . When the signal is given only as a set of samples at non-uniform nodes, our idea is to use a spline function to approximate the signal, and then use this approximation in place of F in the application of SSO. The purpose of this section is to describe this
construction.
In this section, we fix an integer 1, and assume that the values
are known for a set of points
. We write
= [
= 0
2,
= [
],
We denote the class of all algebraic polynomials of degree < m by Π. For any g : [
, we now define an approximation operator. For j = 0
1, we define the polynomial
to be the unique polynomial that interpolates g at
; i.e.,
) =
) for k = jm + 1
.
The approximation to g is then defined by
Constatnt convention
In the sequel, the symbols will denote generic positive constants depending only the fixed parameters in the discussion, such as m, h, etc. Their values may be different at different occurrences, even within a single formula. The notation
means
.
Obviously,
If each , and hence, f is m-times continuously differentiable on [a, b], then it is well known that
Theorem 3.1. Let f be m times continuously differentiable on [a, b], f be supported on [a, b] (in particular, ) =
) = 0 for
= 0
1), and (2.13) be replaced by
Then the conclusions of Theorem 2.1 hold if ) is replaced by
))(t, u).
Remark 3.1. It is clear that there exists a polynomial such that
) =
) and
) =
) for
= 0
1. Therefore, the assumption in Theorem 3.1 holds without the apparently extra condition about f being supported on [a, b]. In this case, we need to deal with the function
instead of F . In [6], we have derived several algorithms for removing the “polynomial trend” without any knowledge of the polynomial in advance.
Remark 3.2. Although is stated with the piecewise polynomial interpolant as defined above, for the purposes of actual computation, there are many other methods available in the spline literature, which we use for the actual computations in Section 5. In particular, spline quasi-interpolation, introduced in [12] and discussed in some details in ([11, pages 178 and 194]), is attractive, since it is a local computational scheme that assures optimal order of approximation. To avoid using derivative data required in [11], the spline quasi-interpolation scheme constructed in [1] can be applied for real-time applications by using only data samples. On the other hand, if the data samples are valuable for certain applications, the quasi-interpolation scheme must be modified to possess both the interpolation and quasi-interpolation properties. A “prediction-correction” formulation of such schemes, called “local blending spline interpolation,” was introduced in [4]. In combination with the real-time spline quasi-interpolation scheme in [1], the local blending spline interpolation scheme for non-uniform knots is developed in [5] and adopted to satisfy the derivative boundary conditions in [18, 8], by considering the knot sequence:
For example, with m = 4 stacked knots at the boundary to interpolate the first and second divided differences at the end points b and c for cubic spline interpolation at for k = 0, . . . , N, where
may be chosen by taking the average of
and
for k = 0
1. We remark that interpolation of derivatives or divided differences at the boundary not only minimizes boundary artifact, but also allows the “extrapolation” capability of our computational scheme for each IMF and the trend, at least for
and t > c +d for some positive value d, with the larger extrapolation interval [
] and [c, c + d] by using higher m-th order B-splines to interpolate up to the (
2)-th order derivatives of divided differences at b and c by any even m > 4. To end this section, we point out that if the data
) are available for equally spaced time samples, at
, then local blending spline interpolation can be computed in real-time, simply by up-sampling, followed by moving averaging with weights derived in ([3, pp.115-117]). (See also [2] for the bivariate setting in terms of box splines).
In this section, we describe the implementation of our method in terms of deep neural networks.
4.1 Piecewise polynomials as deep networks
The first step in our algorithm sketched in Figure 1 is to implement the quasi-interpolant as a deep network. We observe that a piecewise polynomial Q with knots is a linear combination of the form
Next, let J be an integer such that 2, and
) = (
. The function
can be implemented exactly as ((
((
; i.e., as a neural network with J layers with one neuron each, evaluating the activation function
. We denote this network by
. Finally, since the function
can be expressed as a linear combination of the functions
by taking the divided difference, the function
can be implemented as the same linear combination of the networks
). A linear combination of these in turn implement Q exactly. See [15] for further details.
4.2 Trigonometric polynomials as neural networks
The next step is to implement the function )) as a neural network. In some sense, since
)) is a trigonometric polynomial of degree < n, it is already a neural network with the activation function
cos t. We have demonstrated in [16] how any trigonometric polynomial can be implemented approximately using other sufficiently smooth activation functions. For example, we consider the smooth ReLU function
log(1 +
) =
). Then the function
) = log
(1 +
)(1 +
)(1 +
is an analytic function on T. With 1, we construct the network
Then our results in [16] imply that
for some constant c > 0 and (0, 1).
4.3 Thresholding as a deep network
The next step is to implement min(). This is easy using ReLU networks; i.e., a network that uses the activation function
) =
, equivalently,
+ (
. Indeed, for any real numbers a, b,
Thus, min() can be implemented as network with two layers, receiving the inputs
) and
as shown in Figure 1.
In this section, we demonstrate the effectiveness of the theory introduced in this paper, by solving the inverse problem of recovering all instantaneous frequencies (IFs), amplitude functions, and IMFs, as well as the trend from the blind source signal , plus an additive noise. We will implement our deep networks as spline quasiinterpolants as described in Remark 3.2, followed by the SSO, thresholding, and the evaluation of another SSO as indicated in Figure 1. In each synthetic experiment,we consider white noise with zero mean and variance
, and the Signal-to-Noise Ratio (SNR), is defined by
The mean square error (MSE) is used as a performance measure, defined by
where f is the original signal and denotes the recovered signal. Since the choice of the non-uniform samples as well as the noise is random, we repeat each experiment 50 times. The accuracy of the reconstructed results of IFs and IMFs as well as the trend is measured by means of the normalized mean square error (NMSE), calculated by averaging MSE of 50 independent trials, and the standard deviation (STD).
Example 5.1. (Close-by IFs) The first example is a signal consisting of four IMFs with very close-by frequencies and a non-monotone trend, given by
In Figure 2, we plot the original signal ) and the observed signal (SNR= 20dB), together with the reconstructed results of IMFs (0 < t < 50) and trend (0 < t < 200). The numerical errors in the recovery are listed in Table 1.
Example 5.2. (Split signal) Next, we consider a challenging example introduced in [9], defined by
and add a noise with SNR= 10dB to this signal. This signal does not satisfy the conditions of Theorem 2.1. Nevertheless, the quasi-interpolatory spline approximation being highly localized, our techniques are able to recover the IFs and IMFs accurately as indicated in Figure 3 and reported on in further detail in Table 1.
Figure 2: Top (left to right): blind-source signal ) with additive noise, recovered frequencies. Middle (left to right): recovered 1st, 2nd, 3rd IMFs. Bottom (left to right): recovered 4th IMF, recovered trend.
Example 5.3. (Variable amplitudes) In this example, we illustrate the proposed method when applied to a non-stationary signal; i.e., a signal where the amplitudes are also time-dependent in addition to the frequencies, defined by
The signal ) is added with a white noise with SNR= 25dB. The results are illustrated graphically in Figure 4, with details given in Table 1.
Example 5.4. (Robustness) The following example demonstrates the robustness of our method.
Table 1: Reconstructed results of Examples 1 3
Figure 3: Top (left to right): blind-source signal ) with additive noise, recovered frequencies. Middle (left to right): recovered 1st IMF, recovered 2nd IMF, Bottom: recovered trend.
To sufficiently analyze the robustness, we use our method with the pure blind source signal ) and again the signal added with various levels of white noise (SNR= 0
8). The reconstructed results of signals with different noise levels are shown graphically in Figure 5 and and numerically in Table 2.
Example 5.5. (Extrapolation) Our aim in this example is to examine the utility of our methods in extrapolating the signal a little bit beyond the time interval where its values are available, both in the past and in the future. We consider
Figure 4: Top (left to right): blind-source signal ) with additive noise, recovered frequencies. Middle (left to right): recovered 1st IMF, recovered 2nd IMF, recovered 3rd IMF. Bottom: recovered 4th IMF, recovered trend.
To verify that our approach applies to data extrapolation, we process the data on the time-interval [0, 30], and predict the value of the IFs, IMFs, and the trend on the interval [30.1]; i.e., for 0.1 time units in the past and the same in the future. To demonstrate that our extrapolated graphs are good approximation of the unknown ground truth, we zoom-in to the intervals [
7] and [29.3, 30.1]. The results are shown graphically in Figure 6.
Example 5.6. (Bat Echo-location) We consider a real-word signal, namely “a bat echolocation signal”
emitted by a large brown bat, and discover that it consists of four IMFs,
To convince ourselves that this decomposition makes sense, we add an unknown signal , given by
to and decompose the combined blind source signal
, and discover that it consists of five IMFs,
,. . . ,
. The MSEs of the five IMFs are 2
10
, 1
10
, 1
10
, 7
10
, and
Figure 5: Top (left to right): blind-source signal ), recovered 1st IMF, recovered 2nd IMF, recovered trend. Second row (left to right):
) with additive noise (SNR=0), recovered 1st IMF, recovered 2nd IMF, recovered trend. Third row (left to right):
) with additive noise (SNR=-5), recovered 1st IMF, recovered 2nd IMF, recovered trend. Bottom: (left to right):
) with additive noise (SNR=-8), recovered 1st IMF, recovered 2nd IMF , recovered trend.
1, respectively, where MSE is measured by
Through the comparison of recovery, it demonstrates that the proposed computational scheme can recover the IMFs effectively. Furthermore, it enables us to identity and quantify the blind-source signal through efficient extraction of information from the individual components.
In this paper, we have developed a rigorous theory, along with an effective algorithm and efficient computational scheme, for the construction of deep networks to resolving the problem of separation of signal components and extraction of their instantaneous frequencies, based on random, non-uniform samples of a blind-source composite signal on a bounded time interval. This provides an effective solution of an important problem in signal processing, that none of the existing methods, including the popular empirical mode decomposition (EMD), synchrosqueezed wavelet transform (SST), as well as our previous signal separation operation (SSO), could work satisfactorily in
Table 2: Reconstructed results of Examples 4
general. A highlight of our computational procedure is that it precisely determines the exact number of signal components without an a priori knowledge of the composite signal. In addition, our deep networks are “theory inspired”; i.e., constructed on the basis of a solid mathematical theory which allows the use of pre-fabricated networks that do not require training in the classical sense. Our results are proved mathematically as well as demonstrated experimentally, including an example of real-world bat echo-location signal.
[1] G. Chen, C. K. Chui, and M. Lai. Construction of real-time spline quasiinterpolation schemes. Approx. Theory Appl, 4(4):61–75, 1988.
[2] C. K. Chui. Multivariate splines. SIAM, 1988.
[3] C. K. Chui. An introduction to wavelets. Academic press, San Diego, 1992.
[4] C. K. Chui and H. Diamond. A general framework for local interpolation. Numerische Mathematik, 58(1):569– 581, 1990.
[5] C. K. Chui, Y.-T. Lin, and H.-T. Wu. Real-time dynamics acquisition from irregular samples—with application to anesthesia evaluation. Analysis and Applications, 14(04):537–590, 2016.
[6] C. K. Chui and H. N. Mhaskar. Signal decomposition and analysis via extraction of frequencies. Applied and Computational Harmonic Analysis, 40(1):97–136, 2016.
[7] C. K. Chui, H. N. Mhaskar, and M. D. van der Walt. Data-driven atomic decomposition via frequency extraction of intrinsic mode functions. GEM-International Journal on Geomathematics, 7(1):117–146, 2016.
[8] C. K. Chui and M. D. van der Walt. Signal analysis via instantaneous frequency estimation of signal compo- nents. GEM-International Journal on Geomathematics, 6(1):1–42, 2015.
[9] I. Daubechies, J. Lu, and H. T. Wu. Synchrosqueezed wavelet transforms: an empirical mode decomposition-like tool. Applied and computational harmonic analysis, 30(2):243–261, 2011.
[10] I. Daubechies and S. Maes. A nonlinear squeezing of the continuous wavelet transform based on auditory nerve models. Wavelets in medicine and biology, pages 527–546, 1996.
[11] C. de Boor. A practical guide to splines. Springer Verlag., 1978.
[12] C. de Boor and G. Fix. Spline approximation by quasiinterpolants. Journal of Approximation Theory, 8(1):19– 45, 1973.
[13] B. G. R. De Prony. Essai ´experimental et analytique: sur les lois de la dilatabilit´e de fluides ´elastique et sur celles de la force expansive de la vapeur de l’alkool,a diff´erentes temp´eratures. , 1(22):24–76, 1795.
[14] N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N. Yen, C. C. Tung, and H. H. Liu. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 454(1971):903–995, 1998.
-20
-15
-10
10
15
-4
-3
-2
-4
-3
-2
Figure 6: Top (left to right): blind-source signal ) with additive noise, recovered frequencies. In the second and third rows, we zoom-in to the sub-intervals [0, 0.7] and [29.3, 30] and extrapolate the data outside the time domain to [
0] and [30, 30.1], respectively. Middle (left to right): Extrapolated 1st IMF, predicted 1st IMF, extrapolated 2nd IMF. Bottom (left to right): Predicted 2nd IMF, extrapolated trend, predicted trend.
[15] H. N. Mhaskar. Approximation properties of a multilayered feedforward artificial neural network. Advances in Computational Mathematics, 1(1):61–80, 1993.
[16] H. N. Mhaskar and C. A. Micchelli. Degree of approximation by neural and translation networks with a single hidden layer. Advances in Applied Mathematics, 16(2):151–183, 1995.
[17] G. Thakur and H. T. Wu. Synchrosqueezing-based recovery of instantaneous frequency from nonuniform samples. SIAM Journal on Mathematical Analysis, 43(5):2078–2095, 2011.
[18] M. D. van der Walt. Wavelet Analysis of Non-stationary Signals with Applications. PhD thesis, University of Missouri, St. Louis, 2015.
[19] H.-T. Wu. Adaptive Analysis of Complex Data Sets. PhD thesis, Princeton University, 2012.
[20] H. T. Wu, P. Flandrin, and I. Daubechies. One or two frequencies? the synchrosqueezing answers. Advances in Adaptive Data Analysis, 3(01n02):29–39, 2011.
0.05
0.1
0.05
0.05
Figure 7: Top (left to right): blind-source signal, added component ), combined signal. Middle (left to right): recovered 1st IMF, recovered 2nd IMF, recovered 3rd IMF. Bottom (left to right): recovered 4th IMF, recovered 5th IMF.