Bayesian kernel-based system identification with quantized output data

2015·Arxiv

Abstract

Abstract

: In this paper we introduce a novel method for linear system identification with quantized output data. We model the impulse response as a zero-mean Gaussian process whose covariance (kernel) is given by the recently proposed stable spline kernel, which encodes information on regularity and exponential stability. This serves as a starting point to cast our system identification problem into a Bayesian framework. We employ Markov Chain Monte Carlo (MCMC) methods to provide an estimate of the system. In particular, we show how to design a Gibbs sampler which quickly converges to the target distribution. Numerical simulations show a substantial improvement in the accuracy of the estimates over state-of-the-art kernel-based methods when employed in identification of systems with quantized data.

1. INTRODUCTION

Identification of systems from quantized data finds applications in a wide range of areas such as communications, networked control systems, bioinformatics (see e.g. [Bae and Mallick, 2004] and [Wang et al., 2010]).

From a system identification perspective, identification of systems having quantized output data constitutes a challenging problem. In fact, the presence of a quantizer cascaded to a dynamic system, causes a loss of information on the behavior of that dynamic system. Thus, standard system identification techniques, such as least-squares or prediction error method (PEM) [Ljung, 1999], [S¨oderstr¨om and Stoica, 1989], may give poor performances. For this reason, in recent years several techniques for system identi-fication from quantized data have been proposed in a series of papers. Some of these methods are specifically tailored for identification of systems with binary measurements [Wang et al., 2003], [Wang et al., 2006], and are possibly implemented in a recursive fashion [Guo and Zhao, 2013], [Jafari et al., 2012]. Other methods, such as [Colinet and Juillard, 2010], exploit the knowledge of a dithering signal to improve the identification performances. Specific input design techniques are studied in [Godoy et al., 2014], [Casini et al., 2011] and [Casini et al., 2012]. Methods for handling general types of quantization of data have been proposed recently [Godoy et al., 2011], [Chen et al., 2012b]. In such contributions, the problem of identifying a linear dynamic system with quantized data is posed as a likelihood problem. In particular, in [Chen et al., 2012b] authors exploit the recently proposed Bayesian kernel-based formulation of the linear dynamic system identifi-cation problem (see [Pillonetto et al., 2014] for a survey).

Similarly to [Chen et al., 2012b], the starting point of this paper is the formulation of the problem of identifying a linear dynamic systems with quantized data using a Bayesian approach. We model the impulse response of the unknown system as a realization of a Gaussian random process. Such a process has zero mean and its covariance matrix (in this context also called a kernel) is given by the recently introduced stable spline kernels, [Pillonetto and De Nicolao, 2010], [Pillonetto et al., 2011], [Bottegal and Pillonetto, 2013] which are specifically designed for linear system identification purposes. The structure of this type of kernels depends on two hyperparameters, namely a scaling parameter and a shaping parameter, whose tuning permits more flexibility in the identification process and can be seen as a model selection step. In the standard setting (i.e. when there is no quantizer), kernel hyperparameters are chosen as those maximizing the marginal likelihood of the output measurements, obtained by integrating out the dependence on the system. Once the hyperparameters are chosen, the impulse response of the system is computed as the minimum mean-square Bayes estimate given the observed input/output data (see e.g. [Pillonetto and De Nicolao, 2010], [Chen et al., 2012a]).

A key assumption in kernel-based methods is that the output data and the system admit a joint Gaussian description. Such an assumption does not hold with quantized data and we need to think of a different approach. In this paper we propose a solution based on Markov Chain Monte Carlo (MCMC) techniques [Gilks et al., 1996]. To this end, we define a target probability density; the estimate of the system can be obtained by drawing samples from it. Such a probability density is function of the following random variables: 1) the (unavailable) non-quantized output of the linear system, 2) the scaling hyperparameter of the kernel, 3) the unknown measurement noise variance, 4) the impulse response of the system. The main contribution of this paper is to show how to design a Gibbs sampler [Gilks et al., 1996] by exploiting the knowledge of the conditional densities of the target distribution. The main advantage of the Gibbs sampler is that it does not require any rejection criterion of the generated samples and quickly converges to the target distribution. Note that MCMC-based approaches have recently gained popularity in system identification [Ninness and Henriksen, 2010], [Lindsten et al., 2012], [Bottegal et al., 2014].

The paper is organized as follows. In the next section, we introduce the problem of the identification of dynamic systems from quantized data. In Section 3, we give a Bayesian description of the variables entering the system. In Section 4, we describe the proposed method for iden-tification. Section 5 shows some simulations to assess the performances of the proposed method. Some conclusions end the paper.

2. PROBLEM STATEMENT

We consider the following linear time-invariant BIBO stable output error system

where is the impulse response characterizing the unknown system, which is fed by the input . The set T corresponds to either or , depending on whether the system is continuous-time or discrete-time. The output is corrupted by the additive white Gaussian noise , which has zero mean and unknown variance , and measured at the time instants . If the system is continuous-time, then I can represent any non-uniform sampling, whereas in the discrete-time case we shall consider (i.e., no downsampling). For ease of exposition, in this paper we shall derive our algorithm in the discrete-time case only; the extension to the continuous-time is quite straightforward (see e.g. [Wahba, 1990], [Pillonetto and De Nicolao, 2010]). Actually, the

Fig. 1. Block scheme of the system identification scenario.

output is not directly measurable, and only a quantized version is available, namely

where Q is a known map (our quantizer) of the type

It is well-known that a condition on the threshold to guarantee identifiability of the system is when C = 0, the system can be determined up to a scaling factor [Godoy et al., 2011].

Without loss of generality, let us assume the system to be strictly causal, i.e. input-output data samples are collected during an experiment. We formulate our system identification problem as the problem of estimating the impulse response g for n time instants, namely obtain . Recall that, if n is sufficiently large, these samples can be used to approximate the dynamics of the systems with arbitrary accuracy [Ljung and Wahlberg, 1992]. Introducing the vector notation

the input-output relation for the available samples can be

so that our estimation problem can be cast in, say, a “linear regression plus quantization” form.

3. BAYESIAN MODELS FOR THE QUANTITIES OF INTEREST

3.1 Establishing a prior for the system

In this paper we cast the system identification problem into a Bayesian framework. Our starting point is the setting of a proper prior on g. Following a Gaussian regression approach [Rasmussen and Williams, 2006], we model g as a zero-mean Gaussian random vector, i.e. we assume the following probability density function for g:

where is a covariance matrix whose structure depends on the value of the and 0 is the scaling hyperparameter. In this context, is usually called a kernel and determines the properties of the realizations of g. In this paper, we choose from the family of stable spline kernels [Pillonetto and De Nicolao, 2010], [Pillonetto et al., 2011]. Such kernels are specifically designed for system identification purposes and give clear advantages compared to other standard kernels [Bottegal and Pillonetto, 2013], [Pillonetto and De Nicolao, 2010] (like the quadratic kernel or the Laplacian kernel, see [Sch¨olkopf and Smola, 2001]). In this paper we make use of the first-order stable spline kernel (or TC kernel in [Chen et al., 2012a]). It is defined by

The above kernel is parameterized by , which regulates the decaying velocity of the generated impulse responses.

3.2 Bayesian description of the non-quantized output

Since we have assumed Gaussian distribution of the noise v, the joint distribution of the vectors z and g, given values of and the noise variance , is jointly Gaussian, namely

where

and Σ. It follows also that the posterior distribution of g, given the knowledge of z, is Gaussian, namely

where

In [Pillonetto and De Nicolao, 2010], an impulse response estimator is derived starting from (9). In fact, the minimum mean-squared error (MMSE) estimate of g is (see e.g. [Anderson and Moore, 1979])

Such an estimator depends on the kernel hyperparameters and the noise variance. A common strategy to choose the kernel hyperparameters is to maximize the marginal likelihood of z, that is

An estimate ˆof can be computed by means of the following steps:

ˆ(13) in order to obtain an estimate of g; (2) Compute the empirical estimate of

Clearly, the system identification method described above is not applicable in our problem, since z is not available. However, we can draw some information on such a vector from the quantized output y. First, note that from (7) it follows that

Note also that, once g is given, (15) is independent of and . Let denote the t-th row of U. Then, for each entry it holds that (see e.g. [Bae and Mallick, 2004])

where ) denotes a Gaussian distribution trun- cated below a and above b, whose original mean and variance are and respectively. Note that, for

due to the assumption on whiteness of noise.

3.3 Bayesian description of hyperparameters and noise variance

As mentioned in the previous subsection, knowing the values of hyperparameters is of paramount importance in kernel-based methods. The marginal likelihood maximization approach (12) is not applicable here, so we have to think of alternative ways to estimate the values of the hyperparameters.

Let us denote by Ga(a, b) the Gamma distribution with parameters a and b. The following result, drawn from [Magni et al., 1998], shows the marginal distribution of the inverse of the hyperparameter given g and .

Lemma 3.1. The posterior probability distribution of given g and is

Remark 3.2. To obtain the result of the above lemma, we have implicitly set an improper prior on with nonnegative support, i.e., ), where ) is the indicator function with support .

A similar argument holds for the noise variance . Recalling that v is Gaussian with variance equal to and readapting Lemma 3.1 to this case, it turns out that

where also here we have assumed the improper prior ).

In some situations, e.g. when the quantizer has mild effects on the measured signal (e.g., when the resolution of the quantizer is high), a sufficiently reliable estimate of can be obtained by using (14), with z replaced by y. However, note that in general ˆis not a consistent estimate of .

We conclude this section by recalling that, unfortunately, establishing a Bayesian model for is still an unsolved problem. In this paper, we shall consider such an hyperparameter as deterministic. A method for its choice will be discussed in the next section.

4. PROPOSED SYSTEM IDENTIFICATION METHOD

In this section we show how to exploit the Bayesian models introduced in the previous section to derive our system identification method. Assume for the moment that is known and let us drop it from the notation below. The impulse response estimate ˆg can be obtained by computing the following integral

where

denotes the joint distribution of the random quantities described in the previous section, given the quantized output y. The above integral can be computed using Markov Chain Monte Carlo (MCMC) methods [Gilks et al., 1996], by drawing a large number of samples from the distribution ) (usually called target distribution) and then averaging over g. In general, drawing samples from a distribution is a hard problem, if its probability density function does not admit a closed-form expression. However, if all the conditional probability densities of such a distribution are available in closed-form, the problem of sampling from the target distribution can be solved efficiently by resorting on a special case of the MetropolisHastings method, namely the Gibbs sampler (see e.g. [Gilks et al., 1996]). The basic idea is that each conditional random variable is the state of a Markov chain; then, by drawing samples from each conditional probability density iteratively, we converge to the stationary state of this Markov chain and generate samples of the target distribution. Here, the conditionals of (21) are as follows.

where each of the factors is a truncated Gaussian according to (16).

(2) ). Once g is given, becomes independent of all the other variables, i.e. such conditional density becomes and corresponds to (18), namely a Gamma distribution with parameters

(3) ). Once g and z are given, one can compute Recalling (19), it follows that this conditional density can be written as ) and is distributed as a Gamma random variable with parameters ( ).

(4) ). Given z, information carried by y becomes redundant and can be discarded, so that this conditional probability density corresponds to Its closed-form expression is given by (9), namely a Gaussian distribution with mean Cz and covariance matrix P (see (10)).

Given the above conditional densities, we are in position to illustrate the proposed identification algorithm.

In the above algorithm, the parameters M and are introduced. M the number of samples to be generated; clearly, large values of M should guarantee more accurate estimates of is the number of initial samples drawn from the conditional of g to be discarded and is also known as burn-in [Meyn and Tweedie, 2009]. In fact, the conditionals from which those samples are drawn are to be considered as non-stationary, since the Gibbs sampler takes a certain number of iterations to get close to a stationary distribution.

Setting initial values of the Gibbs sampler The initial estimate can be computed using the kernel-based method introduced in [Pillonetto and De Nicolao, 2010] and briefly revisited in Section 3.2. Replacing z with y in (11), one can obtain a (very) rough estimate of g, which can serve as initial condition for the Gibbs sampler.

Similarly, the initial value can be computed from (14), again by replacing z with y.

4.1 Estimation of the hyperparameter β

It remains to set a scheme for estimating . In [Chen et al., 2012b], an exact marginal likelihood maximization approach is proposed, but it is shown that such an approach needs a solution of a complicated integral. In this paper, we adopy a simple (and approximate) way to estimate . It consists in maximizing the cost function of (12), where, instead of using the non-available data z, we plug the quantized output y. Clearly, one should not expect to get very good results in general, especially when the difference between z and y is high (e.g. when the quantizer is binary). However, numerical experiments have shown that the accuracy on the estimation of with this strategy is satisfactory enough in order to obtain a good performance of the algorithm.

4.2 Comparison with [Chen et al., 2012b]

Although the methods proposed here and the method proposed in [Chen et al., 2012b] exploit the same Bayesian modeling of the unknown system, the techniques used to carry out the estimate of g are substantially different. In [Chen et al., 2012b], the impulse response is obtained as the maximum a posteriori (MAP) estimate given the quantized output data. Here instead, g is computed by means of (20), that is a minimum mean square error Bayes estimator.

5. NUMERICAL EXPERIMENTS

We test the proposed algorithm by means of 2 Monte Carlo experiments of 100 runs each. For each Monte Carlo run, we generate a linear system by picking 10 pairs of complex conjugate zeros with magnitude randomly chosen in [0, 0.95] and random phase. Similarly, we pick 10 pairs of complex conjugate poles with magnitude randomly chosen in [0, 0.93] and random phase. The goal is to estimate n = 50 samples of the impulse response from N input-output data. The inputs are realizations of white noise with unit variance. We compare the following estimators.

• B-Q-GS: This is the method described in this paper, namely a Bayesian system identification method for Quantized output data that uses the Gibbs Sampler. The parameter M, denoting the number of samples generated by the sampler, is set to 3 10. The first samples are discarded. The validity of the choice of M and is checked by assessing that quantiles 0.25, 0.5, 0.75 are estimated with good precision [Raftery and Lewis, 1996].

• SS-ML: This is the nonparametric kernel-based method proposed in [Pillonetto and De Nicolao, 2010] and revisited in [Chen et al., 2012a], which is not designed for handling quantized data. This method requires the estimation of the same parameters as our proposed

method. The kernel adopted for identification is (6). Its hyperparameters are estimated using (12), while is estimated through (14) (in both cases replacing z with y).

• LS: This is the least-squares estimator, where the data employed to estimate g are the quantized output measurements y. Note that, in principle, here the parameter n should be estimated from data using complexity criteria such as AIC or BIC [Ljung, 1999]. Here, for simplicity, we fix it to 50.

• SS-ML-NQ: Same as SS-ML. However, here we make use of the non-quantized vector z. Hence, this estimator exploits information which is not available in practice in this problem.

• LS-NQ: Least-squares estimator having access to the vector z. The parameter n is fixed as for the LS estimator.

The performances of the estimators are evaluated by means of the fitting score, computed as

where is the impulse response generated at the i-th run, ¯its mean and ˆthe estimate computed by the tested methods.

5.1 Binary quantizer

The first experiment is on the following binary quantizer

For each Monte Carlo run, the noise variance is such that var

noiseless (non-quantized) output and the noise is equal to 10. We generate N = 500 data samples. Figure 2

Fig. 2. Box plots of the fitting scores for the binary quantizer experiment.

shows the results of the Monte Carlo runs. The advantage of using the proposed identification technique, compared to methods which do not account for the quantizer, is evident. Despite the large loss of information caused by the quantizer, the proposed method gives a fit which is quite comparable to the oracle methods. Figure 3 reports one of the generated scenarios. It can be seen that there is a substantial difference between y and z. Nonetheless, the accuracy of the estimation of the impulse response is acceptable.

5.2 Ceil-type quantizer

In the second experiment we test the performance of our method on systems followed by a ceil-type quantizer, which is defined as

Again, for each Monte Carlo run, the noise variance is such that var

As shown in Figure 4, in this case, if one compares the oracle-type methods (i.e. SS-ML-Or. and LS-Or.) with the same methods using quantized data (SS-ML and LS), the loss of accuracy is relatively low. This because this type of quantizer has a mild effect on the measurements. It can be seen, however, that the proposed method is able to give a fit that is comparable to the standard kernel-based method that uses non-quantized data (SS-ML-Or). Moreover, it outperforms the least-squares estimator equipped with the knowledge of non-quantized data.

Fig. 4. Box plots of the fitting scores for the ceil-type quantizer experiment.

6. CONCLUSIONS

In this paper, we have introduced a novel method for system identification when the output is subject to quantization. We have proposed a MCMC scheme that exploits the Bayesian description of the unknown system. In particular, we have shown how to design an integration scheme based on the Gibbs sampler by exploiting the knowledge of the conditional probability density functions of the variables entering the system. We have highlighted, through some numerical experiments, the advantages of employing our method when quantizers affect the accuracy of measurements.

Important questions such as consistency of the method and robust selection of the kernel hyperparameter are currently under study.

REFERENCES

B. D. O. Anderson and J. B. Moore. Optimal Filtering. Prentice-Hall, Englewood Cliffs, N.J., USA, 1979.

K. Bae and B.K. Mallick. Gene selection using a twolevel hierarchical Bayesian model. Bioinformatics, 20 (18):3423–3430, 2004.

G. Bottegal and G. Pillonetto. Regularized spectrum estimation using stable spline kernels. Automatica, 49 (11):3199–3209, 2013.

Fig. 3. Left: Example of output data with the binary quantizer (samples between time instants 100 and 200). Right: Impulse response of the system generating the data and its estimates.

G. Bottegal, A.Y. Aravkin, H. Hjalmarsson, and G. Pillonetto. Outlier robust system identification: a Bayesian kernel-based approach. In Proceedings of IFAC World Congress, 2014.

M. Casini, A. Garulli, and A. Vicino. Input design in worst-case system identification using binary sensors. Automatic Control, IEEE Transactions on, 56(5):1186– 1191, 2011.

M. Casini, A. Garulli, and A. Vicino. Input design in worst-case system identification with quantized measurements. Automatica, 48(12):2997–3007, 2012.

T. Chen, H. Ohlsson, and L. Ljung. On the estimation of transfer functions, regularizations and Gaussian processes - revisited. Automatica, 48(8):1525–1535, 2012a.

T. Chen, Y. Zhao, and L. Ljung. Impulse response estimation with binary measurements: a regularized FIR model approach. In Proceedings of the 16th IFAC Symposium on System Identification, volume 16, pages 113–118, 2012b.

E. Colinet and J. Juillard. A weighted least-squares approach to parameter estimation problems based on binary measurements. Automatic Control, IEEE Transactions on, 55(1):148–152, 2010.

W.R. Gilks, S. Richardson, and D.J. Spiegelhalter. Markov chain Monte Carlo in Practice. London: Chapman and Hall, 1996.

B.I. Godoy, G.C. Goodwin, J.C. Ag¨uero, D. Marelli, and T. Wigren. On identification of FIR systems having quantized output data. Automatica, 47(9):1905–1915, 2011.

B.I. Godoy, P.E. Valenzuela, C.R. Rojas, J.C. Aguero, and B. Ninness. A novel input design approach for systems with quantized output data. In Control Conference (ECC), 2014 European, pages 1049–1054. IEEE, 2014.

J. Guo and Y. Zhao. Recursive projection algorithm on FIR system identification with binary-valued observations. Automatica, 49(11):3396–3401, 2013.

K. Jafari, J. Juillard, and M. Roger. Convergence analysis of an online approach to parameter estimation problems based on binary observations. Automatica, 48(11):2837– 2842, 2012.

F. Lindsten, T. B. Sch¨on, and M. Jordan I. A semiparametric bayesian approach to wiener system identifica-tion. In Proceedings of the 16th IFAC Symposium on System Identification, 2012.

L. Ljung. System Identification, Theory for the User. Prentice Hall, 1999.

L. Ljung and B. Wahlberg. Asymptotic properties of the least-squares method for estimating transfer functions and disturbance spectra. Advances in Applied Probability, pages 412–440, 1992.

P. Magni, R. Bellazzi, and G. De Nicolao. Bayesian function learning using MCMC methods. IEEE Transactions on Pattern Analysis Machince Intelligence, 20 (12):1319–1331, 1998.

S.P. Meyn and R.L. Tweedie. Markov chains and stochastic stability. Cambridge University Press, 2009.

B. Ninness and S. Henriksen. Bayesian system identifica-tion via Markov chain Monte Carlo techniques. Automatica, 46(1):40–51, 2010.

G. Pillonetto and G. De Nicolao. A new kernel-based approach for linear system identification. Automatica, 46(1):81–93, 2010.

G. Pillonetto, A. Chiuso, and G. De Nicolao. Prediction error identification of linear systems: a nonparametric Gaussian regression approach. Automatica, 47(2):291– 305, 2011.

G. Pillonetto, F. Dinuzzo, T. Chen, G. De Nicolao, and L. Ljung. Kernel methods in system identification, machine learning and function estimation: A survey. Automatica, 50(3):657–682, 2014.

A. E. Raftery and S. M. Lewis. The number of iterations, convergence diagnostics and generic Metropolis algorithms. In Markov Chain Monte Carlo in practice. Chapman & Hall, 1996.

C.E. Rasmussen and C.K.I. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006.

B. Sch¨olkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. (Adaptive Computation and Machine Learning). The MIT Press, 2001.

T. S¨oderstr¨om and P. Stoica. System Identification. Prentice-Hall, 1989.

G. Wahba. Spline models for observational data. SIAM, Philadelphia, 1990.

L.Y. Wang, J.F. Zhang, and G.G. Yin. System identifi-cation using binary sensors. Automatic Control, IEEE Transactions on, 48(11):1892–1907, 2003.

L.Y. Wang, G.G. Yin, and J.F. Zhang. Joint identification of plant rational models and noise distribution functions using binary-valued observations. Automatica, 42(4): 535–547, 2006.

L.Y. Wang, G.G. Yin, J.F. Zhang, and Y. Zhao. System identification with quantized observations. Springer, 2010.

designed for accessibility and to further open science