An Empirical Analysis of Constrained Support Vector Quantile Regression for Nonparametric Probabilistic Forecasting of Wind Power

2018·arXiv

Abstract

Introduction

Predicting and managing uncertainty in wind power production is one of the biggest challenges facing its integration into the smart grid. Forecasts of wind uncertainty are needed for many operational applications in a wind farm, from turbine and storage control to bidding and trading in energy markets. Forecasting horizons can be categorized into three main time scales: short-term, looking out several hours or days; long-term, looking out weeks or a month; and seasonal. Traditionally, wind power prediction is based on deterministic point forecasts, which provide an expected output for a given look-ahead time. These forecasts, however, lack uncertainty information. As such, the renewables forecasting community has recently devoted a large research effort (Hong et al. 2016) to producing full probabilistic predictions, which give quantitative information on the uncertainty associated with power output. Although various methods have been proposed, it is still a challenge to make accurate and robust probabilistic predictions for highly nonlinear and complex data such as wind.

Probabilistic wind models are based either on meteorological ensembles obtained from a weather model (Giebel et al. 2003) or on statistical learning methods (Foley et al. 2012). Statistical learning methods can be applied to forecast full predictive distributions in the form of quantiles or prediction intervals. For instance, in (Pinson and Kariniotakis 2004) prediction intervals are estimated by adaptive re-sampling, which is a common probabilistic forecasting strategy. Quantile regression (QR) is another very popular approach. In (Bremnes 2004) local QR is applied to estimate different quantiles, while in (Nielsen, Madsen, and Nielsen 2006) spline-based QR is used to estimate quantiles of wind power. In (Landry et al. 2016) gradient boosted machines with a quantile loss are used to estimate 99 quantiles, and in (Juban et al. 2016) multiple quantile regression is used to predict a full distribution, with optimization done using the alternating direction method of multipliers. A thorough overview of probabilistic wind power forecasting is provided in (Zhang, Wang, and Wang 2014).

In most of these approaches, each quantile is estimated independently. This can lead to the quantile crossing problem, where a lower quantile estimate exceeds a higher one. This is undesirable as it violates the principle that the inverse of a distribution function should be monotonically increasing. A way to prevent this issue is to use a simple heuristic of reordering the estimated quantiles; however, this does not have much theoretical basis and may lead to inappropriate quantiles.
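The crossing problem and the reordering heuristic can be illustrated with a minimal sketch. The function names and the numeric estimates below are hypothetical and chosen only for illustration; the sketch assumes the quantile estimates for one time step are listed in ascending order of nominal proportion.

```python
from typing import List, Sequence


def has_crossing(quantiles: Sequence[float]) -> bool:
    """Return True if any lower-proportion estimate exceeds a higher one.

    `quantiles` is assumed to be ordered by ascending nominal proportion.
    """
    return any(lo > hi for lo, hi in zip(quantiles, quantiles[1:]))


def reorder(quantiles: Sequence[float]) -> List[float]:
    """Post-hoc heuristic: sort the estimates so they are nondecreasing."""
    return sorted(quantiles)


# Hypothetical estimates for proportions 0.1, 0.5, 0.9 at one time step,
# each fitted independently; the 0.5 estimate falls below the 0.1 estimate.
independent_fit = [0.32, 0.30, 0.55]

crossed = has_crossing(independent_fit)
repaired = reorder(independent_fit)
print(crossed, repaired)
```

Sorting restores monotonicity, but it only relabels the fitted values after the fact; it does not change how the individual models were estimated, which is why the text treats it as a heuristic.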

The solution then is to optimize quantiles together with non-crossing constraints. In (Takeuchi et al. 2006) a constrained support vector quantile regression (CSVQR) method was developed with non-crossing constraints where it was used to fit quantiles on static data. This formulation is re-purposed here for probabilistic forecasting. Other machine learning frameworks have been used before for uncertainty prediction of renewables such as nearest neighbors (Mangalova and Shesterneva 2016), neural networks (Sideratos and Hatziargyriou 2012), and extreme learning machines (Wan et al. 2014) but support vector machines (SVMs) have yet to be examined for wind uncertainty forecasting. We propose that SVMs are not only effective in long term prediction due to their ability to handle nonlinear data via kernels but can be easily extended with constraints to ensure non-overlapping quantile estimates. Our study is the first to showcase the use of CSVQR with a sliding window of training data as well as showcase the effectiveness of constraints to ensure monotonically increasing quantiles for probabilistic prediction. We provide the derivation of CSVQR and analysis of experimental results on publicly available wind data. Several common benchmark methods are used for comparison.

Nonparametric Probabilistic Forecasting

This section highlights the underlying theory and evaluation methods used in probabilistic forecasting. For a random variable $Y_i$, such as wind power at time $i$, its probability density function is denoted $f_i$ and its cumulative distribution function $F_i$. If $F_i$ is strictly increasing, the quantile $q_i^{(\tau)}$ with proportion $\tau \in [0, 1]$ of the random variable $Y_i$ is uniquely defined as the value $x$ such that $P(Y_i < x) = \tau$, or equivalently as the inverse of the distribution function, $q_i^{(\tau)} = F_i^{-1}(\tau)$. A quantile forecast $\hat{q}_{i+k|i}^{(\tau)}$ with nominal proportion $\tau$ is an estimate of the true quantile $q_{i+k}^{(\tau)}$ for the lead time $i + k$, given predictor values (such as numerical wind speed forecasts). Prediction intervals then give a range of possible values within which an observed value is expected to lie with a certain probability $1 - \beta$. A prediction interval $\hat{I}_{i+k|i}^{(\beta)}$ produced at time $i$ for future horizon $i + k$ is defined by its lower and upper bounds, which are the quantile forecasts $\hat{q}_{i+k|i}^{(\tau_l)}$ and $\hat{q}_{i+k|i}^{(\tau_u)}$ whose nominal proportions $\tau_l$ and $\tau_u$ are such that $\tau_u - \tau_l = 1 - \beta$.

If it is assumed that the future density function will take a certain form, this is called parametric probabilistic forecasting. For a nonlinear and bounded process such as wind generation, however, probability distributions of future wind power may be skewed and heavy-tailed (Dorvlo 2002). If instead no assumption is made about the shape of the distribution, a nonparametric probabilistic forecast $\hat{f}_{i+k|i}$ (Pinson et al. 2007) of the density function can be made by gathering a set of $M$ quantile forecasts such that

$$\hat{f}_{i+k|i} = \left\{ \hat{q}_{i+k|i}^{(\tau_m)},\ m = 1, \dots, M \ \middle|\ 0 \le \tau_1 < \dots < \tau_M \le 1 \right\}$$

with chosen nominal proportions spread on the unit interval. In this paper we consider nonparametric forecasting of wind power at a resolution of one hour (predicting out to a month's worth of values). On a short time scale of an hour, the wind density may fluctuate, which makes nonparametric forecasting more suitable than fitting a parametric density (Zhang, Wang, and Wang 2014). For nonparametric probabilistic forecasting, quantile regression, introduced by (Koenker and Bassett Jr 1978), is a popular choice for estimating conditional quantiles. It is closely related to models for the conditional median (Koenker 2005): minimizing the mean absolute error leads to an estimate of the conditional median. By applying asymmetric weights to errors through a tilted form of the absolute value function, the conditional quantiles of a predictive distribution can be computed. To achieve this, the pinball loss function is used, which is defined by

$$\rho_\tau(u) = \begin{cases} \tau u & \text{if } u \ge 0 \\ (\tau - 1)\, u & \text{if } u < 0 \end{cases}$$

where $u$ is the residual between an observation and its quantile estimate and $\tau \in (0, 1)$ is the target proportion. A visualization of the pinball function for several different values of $\tau$ is shown in Fig. 1. Given a vector of predictors $x_i$, where $i = 1, \dots, N$, with weights $w$ and intercept $b$ as in linear regression, the conditional quantile is given by $\hat{q}^{(\tau)}(x_i) = w^\top x_i + b$. The weights and intercept can be estimated by solving the following minimization problem

$$\min_{w,\, b} \ \sum_{i=1}^{N} \rho_\tau\left( y_i - w^\top x_i - b \right) \qquad (1)$$

where $y_i$ is the observed value of the predictand. The problem in Eq. (1) can be solved by linear programming.

Figure 1: Plot of the pinball function for different values of $\tau$.
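To make the role of the pinball loss concrete, the following minimal sketch evaluates the tilted absolute value function and checks, by brute force, that the constant minimizing the total loss over a sample is an empirical quantile of that sample. The function names and the toy sample are assumptions made for illustration; no predictors are used, so this is the intercept-only special case rather than the full regression problem.

```python
from typing import Sequence


def pinball_loss(u: float, tau: float) -> float:
    """Tilted absolute value: tau * u for u >= 0, (tau - 1) * u otherwise."""
    return tau * u if u >= 0 else (tau - 1.0) * u


def total_loss(q: float, ys: Sequence[float], tau: float) -> float:
    """Sum of pinball losses when the constant q is used as the estimate."""
    return sum(pinball_loss(y - q, tau) for y in ys)


def best_constant_quantile(ys: Sequence[float], tau: float) -> float:
    """Brute-force search over the observed values for the best constant."""
    return min(sorted(ys), key=lambda q: total_loss(q, ys, tau))


# Toy sample (hypothetical values, not taken from the wind data).
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

median_estimate = best_constant_quantile(sample, 0.5)
upper_estimate = best_constant_quantile(sample, 0.9)
lower_estimate = best_constant_quantile(sample, 0.1)
print(lower_estimate, median_estimate, upper_estimate)
```

With `tau = 0.5` both branches weigh errors equally and the search lands on the sample median; moving `tau` toward 0 or 1 penalizes one side more heavily and pushes the minimizer toward the corresponding tail.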

Evaluation Methods

In probabilistic forecasting it is important to evaluate the quantile estimates and the derived prediction intervals. Prediction intervals (PIs) show where future wind power observations are expected to lie with an assigned probability, termed the PI nominal confidence (PINC), $100(1 - \beta)\%$. The coverage probability of the estimated PIs is expected to approach this nominal level of confidence over the test data. A good measure of reliability, which shows the target coverage of the PIs, is the PI coverage probability (PICP), defined by

$$\text{PICP} = \frac{1}{N} \sum_{i=1}^{N} c_i \times 100\%, \qquad c_i = \begin{cases} 1 & \text{if } \hat{q}_i^{(\tau_l)} \le y_i \le \hat{q}_i^{(\tau_u)} \\ 0 & \text{otherwise} \end{cases}$$

where $c_i$ is the indicator of PICP, $\hat{q}_i^{(\tau_l)}$ and $\hat{q}_i^{(\tau_u)}$ are the lower and upper bounds of the PI for observation $y_i$, and $N$ is the number of test samples. For reliable PIs, the examined PICP should be close to its corresponding PINC. A related assessment index is the average coverage error (ACE), which is defined by

$$\text{ACE} = \text{PICP} - \text{PINC}$$

To ensure PIs with high reliability, the ACE should be as close to zero as possible. Next, to evaluate quantile estimates and full predictive densities, the pinball function is used as an assessment score called the quantile score (Q-score). The Q-score is obtained for every estimated quantile and is averaged over all target quantiles for all future time steps. For a quantile forecast $\hat{q}^{(\tau)}$ the Q-score is defined as

$$Q_\tau = \rho_\tau\left( y - \hat{q}^{(\tau)} \right)$$

where $\rho_\tau$ is the pinball function and $y$ is the observation used for forecast evaluation. A lower Q-score indicates a better forecast.
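The three evaluation measures can be sketched in a few lines. This is a minimal illustration under stated assumptions: PICP and PINC are expressed in percent, ACE is taken as PICP minus PINC, intervals are treated as closed, and the Q-score is a plain average of pinball losses over all quantiles and time steps. The function names and toy numbers are hypothetical.

```python
from typing import Dict, Sequence


def picp(observations: Sequence[float],
         lowers: Sequence[float],
         uppers: Sequence[float]) -> float:
    """Percentage of observations that fall inside their interval."""
    hits = sum(1 for y, lo, hi in zip(observations, lowers, uppers)
               if lo <= y <= hi)
    return 100.0 * hits / len(observations)


def ace(picp_value: float, pinc: float) -> float:
    """Average coverage error, assumed here to be PICP minus PINC."""
    return picp_value - pinc


def pinball_loss(u: float, tau: float) -> float:
    """Tilted absolute value used as the quantile score."""
    return tau * u if u >= 0 else (tau - 1.0) * u


def q_score(observations: Sequence[float],
            quantile_forecasts: Dict[float, Sequence[float]]) -> float:
    """Mean pinball loss over every proportion and every time step.

    `quantile_forecasts` maps a nominal proportion tau to the list of
    forecasts for that proportion, aligned with `observations`.
    """
    losses = [pinball_loss(y - q, tau)
              for tau, forecasts in quantile_forecasts.items()
              for y, q in zip(observations, forecasts)]
    return sum(losses) / len(losses)


# Toy example (hypothetical values): four observations and 80% intervals.
obs = [0.2, 0.5, 0.9, 0.4]
lower_bounds = [0.1, 0.3, 0.5, 0.5]
upper_bounds = [0.3, 0.6, 0.8, 0.7]

coverage = picp(obs, lower_bounds, upper_bounds)
coverage_error = ace(coverage, 80.0)
print(coverage, coverage_error)
```

A negative ACE, as in this toy case, indicates intervals that cover fewer observations than their nominal level promises; a positive value indicates intervals that are wider than needed.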

Support Vector Quantile Regression

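To fit the nonlinearity of wind data, nonlinear quantile regression (NQR) can be utilized. NQR is implemented by projecting an input vector $x$ into a potentially higher dimensional feature space $F$ using a nonlinear mapping function $\phi(\cdot)$ implicitly defined by a kernel $K$. This gives the functional form $q_\tau(x) = w_\tau^\top \phi(x) + b_\tau$, where $q_\tau(x)$ is the $\tau$-th quantile of the distribution of $y$ conditional on the values of $x$ and $w_\tau$ is a vector of parameters. NQR simplifies to linear quantile regression if $\phi(x) = x$. The NQR problem can be expressed by the following formulation, with an added penalty weighted by a regularization constant $C$ to prevent overfitting

$$\min_{w_\tau,\, b_\tau} \ \frac{1}{2} \lVert w_\tau \rVert^2 + C \sum_{i=1}^{N} \rho_\tau\left( y_i - w_\tau^\top \phi(x_i) - b_\tau \right)$$

By introducing slack variables $\xi_i$ and $\xi_i^*$, the problem can be rewritten as a support vector quantile regression problem

$$\min_{w_\tau,\, b_\tau,\, \xi,\, \xi^*} \ \frac{1}{2} \lVert w_\tau \rVert^2 + C \sum_{i=1}^{N} \left( \tau \xi_i + (1 - \tau)\, \xi_i^* \right) \qquad (2)$$

$$\text{s.t.} \quad y_i - w_\tau^\top \phi(x_i) - b_\tau \le \xi_i, \quad w_\tau^\top \phi(x_i) + b_\tau - y_i \le \xi_i^*, \quad \xi_i,\ \xi_i^* \ge 0, \quad i = 1, \dots, N$$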

Non-crossing Quantile Constraints

In Eq. (2) a single quantile is estimated. To estimate multiple quantiles, this formulation could be solved independently for different values of $\tau$. However, in doing so the quantiles may cross each other, which is undesirable since it violates the principle of monotonically increasing inverse distribution functions. To prevent this, constraints need to be introduced (Takeuchi et al. 2006). Let $0 < \tau_1 < \tau_2 < \dots < \tau_M < 1$ be defined as the orders of the $M$ conditional quantiles to be estimated. To ensure these quantiles do not cross each other, the following constraint is needed: $q_{\tau_m}(x_i) \le q_{\tau_{m+1}}(x_i)$ for $m = 1, \dots, M - 1$ and $i = 1, \dots, N$, where $q_{\tau_m}(x_i)$ denotes the estimated $\tau_m$-th conditional quantile at $x_i$. With this constraint the primal problem of the non-crossing conditional quantile estimator is given by

The Lagrangian for the problem is then defined by

where a Lagrange multiplier is introduced for m = , and . By letting the partial derivatives of L with respect to be zero, the following is obtained

Partial derivatives of the other primal variables and are

Plugging these equalities back into Eq. (4) the following dual minimization problem can be obtained

From this dual formulation the conditional quantile can then be given by

Since the dual form is a quadratic programming (QP) problem, it can be solved by a number of QP methods. For testing the constrained SVQR (CSVQR) method, the radial basis function (RBF) kernel is utilized, as it is a popular kernel function choice for support vector machines. Other kernels were tested on the case data sets described in the next section but gave poor results. The RBF kernel, given two samples $x$ and $x'$ which are represented as feature vectors, is calculated as

$$K(x, x') = \exp\left( -\frac{\lVert x - x' \rVert^2}{2\sigma^2} \right)$$

where $\sigma > 0$ is the kernel width. An advantage of an RBF kernel is that it can project vectors into an infinite dimensional feature space. In order to quickly solve for conditional quantile estimates, sequential minimal optimization (Platt and others 1998) is applied to Eq. (8).
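As a small illustration of the kernel computation, the sketch below evaluates the RBF kernel between feature vectors and assembles the Gram matrix that a QP or SMO solver would consume. It assumes the common width parametrization with a parameter `sigma`; the text does not report the kernel hyperparameters, so the default value and the sample vectors are hypothetical.

```python
import math
from typing import List, Sequence


def rbf_kernel(x: Sequence[float], x_prime: Sequence[float],
               sigma: float = 1.0) -> float:
    """exp(-||x - x'||^2 / (2 * sigma^2)) for two feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def gram_matrix(samples: Sequence[Sequence[float]],
                sigma: float = 1.0) -> List[List[float]]:
    """Pairwise kernel values K[i][j] = k(samples[i], samples[j])."""
    return [[rbf_kernel(a, b, sigma) for b in samples] for a in samples]


# Hypothetical normalized feature vectors (two features each).
features = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.25]]
K = gram_matrix(features, sigma=1.0)
print(K[0][1])
```

The matrix is symmetric with ones on the diagonal, and its entries decay toward zero as samples move apart, so `sigma` controls how local the fitted quantile curves are.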

Application To The GEFCom2014 Dataset

Data for this case study come from the publicly available Global Energy Forecasting Competition 2014 (Hong et al. 2016). The goal of the competition was to design parametric or nonparametric forecasting methods that describe conditional predictive densities of wind power generation as a function of input data, namely future weather forecasts and/or past wind power. Data are provided for the years 2012 and 2013 from 10 wind farms, titled Zone 1 to Zone 10. The predictors are numerical weather predictions (NWPs) in the form of wind speeds at an hourly resolution at two heights, 10m and 100m above ground level. These forecasts are for the zonal and meridional wind components (denoted U and V). It was up to users to deduce exact wind speed, direction, and other wind features if necessary. The NWPs were provided for the exact locations of the wind farms. Additionally, power measurements at the various wind farms, at an hourly resolution, are provided. All power measurements are normalized by the nominal capacity of their wind farm. The goal in forecasting was to learn to associate the provided NWPs (or derived features) with wind power. NWPs are then provided for the forecasting horizon of one month, and it is up to a learning model to use those NWPs as input to predict quantiles at each future time step. Fig. 2 shows an example month of data, where Fig. 2.a shows the four NWPs given and Fig. 2.b shows the corresponding normalized wind power output.

Figure 2: (a) Example plot of numerical wind predictions at 10m and 100m for U and V directions used as inputs to forecast wind power. (b) Observed wind power corresponding to the same time stamps.

In our analysis of CSVQR we used the summer months of June 2013 to August 2013 and the fall months of September 2013 to November 2013 from Zone 1 for testing. Training was done using a sliding window of the three previous months to forecast the fourth month. For instance, to predict June, training was done on observed data from March to May; to predict July, training was done on April to June; and so on. Thirteen features were derived from the raw data for training the CSVQR model. The features used are derived wind speeds at 10m and 100m, wind direction at 10m and 100m, wind energy at 10m and 100m, wind shear, wind energy difference (between 10m and 100m), wind direction difference (between 10m and 100m), and the four raw wind speeds at 10m and 100m for the U and V directions. All features were normalized between 0 and 1. Denoting u and v as the wind components and d as the energy density (we used d = 1), the equations used to compute wind speed (ws), wind direction (wd), wind energy (we), and wind shear (wsh) are
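One way the thirteen features could be derived from the four raw components is sketched below. Only the wind speed formula is unambiguous; everything else here is an assumption made for illustration and is not confirmed by the text: the direction convention `atan2(u, v)` in degrees, the energy form `0.5 * d * ws**3`, wind shear as the power-law exponent between the two heights, differences taken as the 100m value minus the 10m value, and min-max scaling for the normalization. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def wind_speed(u: float, v: float) -> float:
    """Magnitude of the horizontal wind vector."""
    return math.hypot(u, v)


def wind_direction(u: float, v: float) -> float:
    """Direction in degrees in [0, 360); atan2(u, v) is an ASSUMED convention."""
    return math.degrees(math.atan2(u, v)) % 360.0


def wind_energy(ws: float, d: float = 1.0) -> float:
    """ASSUMED form 0.5 * d * ws^3, with d = 1 as stated in the text."""
    return 0.5 * d * ws ** 3


def wind_shear(ws10: float, ws100: float) -> float:
    """ASSUMED power-law shear exponent between 10 m and 100 m."""
    if ws10 <= 0.0 or ws100 <= 0.0:
        return 0.0  # guard against log of zero for calm conditions
    return math.log(ws100 / ws10) / math.log(100.0 / 10.0)


def derive_features(u10: float, v10: float, u100: float, v100: float,
                    d: float = 1.0) -> Dict[str, float]:
    """Build the nine derived features plus the four raw components."""
    ws10, ws100 = wind_speed(u10, v10), wind_speed(u100, v100)
    wd10, wd100 = wind_direction(u10, v10), wind_direction(u100, v100)
    we10, we100 = wind_energy(ws10, d), wind_energy(ws100, d)
    return {
        "ws10": ws10, "ws100": ws100,
        "wd10": wd10, "wd100": wd100,
        "we10": we10, "we100": we100,
        "wsh": wind_shear(ws10, ws100),
        "we_diff": we100 - we10,   # ASSUMED sign: 100 m minus 10 m
        "wd_diff": wd100 - wd10,   # ASSUMED sign: 100 m minus 10 m
        "u10": u10, "v10": v10, "u100": u100, "v100": v100,
    }


def min_max_normalize(column: Sequence[float]) -> List[float]:
    """Scale one feature column to [0, 1]; a constant column maps to 0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


# Hypothetical NWP components for a single hour.
example = derive_features(u10=3.0, v10=4.0, u100=6.0, v100=8.0)
print(len(example), example["ws10"], example["ws100"])
```

In practice the normalization bounds would be computed on the training window and reused on the forecast month, so that no information from the test period leaks into the features.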

To empirically analyze the CSVQR model as an appropriate method for wind forecasting, it is compared with two industry models and a naive model that are used for benchmarking in probabilistic wind forecasting applications (Sideratos and Hatziargyriou 2012; Pinson et al. 2007; Pinson and Kariniotakis 2010). The first is the persistence method, which is the most common benchmark and is considered difficult to outperform for short-term forecasting. Its predictive distribution, the persistence distribution, is formed from the most recent observations. For this case study, the past 12 hours of wind power observations were used to form the persistence distribution. The second method is the climatology approach, whose predictive distribution is unconditional and based on all available past wind power observations. It is considered harder to beat in long-term forecasting. Lastly, the uniform distribution is used as a naive benchmark, which assumes that all wind power values at each time step occur with equal probability.
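The three benchmarks described above can be sketched as quantile generators. This is a minimal illustration under stated assumptions: empirical quantiles are computed with linear interpolation between order statistics (the text does not say which estimator was used), and the uniform benchmark is taken over the interval [0, 1] because the power measurements are normalized by nominal capacity. Function names and the toy history are hypothetical.

```python
import math
from typing import List, Sequence


def empirical_quantile(sample: Sequence[float], tau: float) -> float:
    """Empirical quantile with linear interpolation (an ASSUMED estimator)."""
    ordered = sorted(sample)
    position = tau * (len(ordered) - 1)
    lower = int(math.floor(position))
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def persistence_quantiles(history: Sequence[float], taus: Sequence[float],
                          window: int = 12) -> List[float]:
    """Quantiles of the most recent `window` hourly observations."""
    recent = history[-window:]
    return [empirical_quantile(recent, tau) for tau in taus]


def climatology_quantiles(history: Sequence[float],
                          taus: Sequence[float]) -> List[float]:
    """Unconditional quantiles of all available past observations."""
    return [empirical_quantile(history, tau) for tau in taus]


def uniform_quantiles(taus: Sequence[float]) -> List[float]:
    """Quantiles of a uniform distribution on [0, 1]: the proportion itself."""
    return [float(tau) for tau in taus]


# Toy normalized power history (hypothetical): 12 calm hours, then 12 at 0.5.
history = [0.0] * 12 + [0.5] * 12
proportions = [0.1, 0.5, 0.9]

persistence = persistence_quantiles(history, proportions)
climatology = climatology_quantiles(history, proportions)
uniform = uniform_quantiles(proportions)
print(persistence, climatology, uniform)
```

The toy history shows why the benchmarks behave differently: persistence sees only the last 12 hours and collapses onto the recent level, while climatology spreads its quantiles over the whole record.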

Results

To visualize a probabilistic forecast, Fig. 3 shows an example prediction of the 80%, 60%, 40%, and 20% prediction intervals for the month of July 2013. Observed wind power is shown in red. From such probabilistic forecasts it is then possible to derive full predictive density functions, provided that the estimated conditional quantiles are nondecreasing (Quinonero-Candela et al. 2006).

Figure 3: Example plot of estimated 80%, 60%, 40%, and 20% prediction intervals along with observed wind power in red for the month of July 2013.

Evaluation results for the reliability of the probabilistic forecasts, in the form of prediction intervals of wind power over the months of June 2013 to November 2013, are shown in Table 1. Results are shown for the CSVQR method and for the climatology, persistence, and uniform benchmark methods. The evaluation metrics for a given PINC are the PICP and ACE. For the months of June and October, the climatology method was slightly better, but this was because this model can yield wide intervals that cover more data. In all other months, however, CSVQR outperformed all three benchmarks by several magnitudes. To evaluate the forecasts more fully, it is also important to look at the quantile score to measure the coverage of the estimated quantiles. Table 2 shows a summary of the Q-scores averaged across all quantiles from all look-ahead periods for every forecast month. Their standard deviation is also provided to quantify the amount of variation among the quantiles. The Q-scores of the proposed approach were very low, and the approach gave excellent probabilistic forecasts across all months.

Discussion

Wind power forecasting is crucial for many decision making problems in power systems operations, and is a vital component in integrating more wind into the power grid. Due to the chaotic nature of the wind it is often difficult to forecast. Uncertainty analysis in the form of probabilistic wind prediction can provide a better picture of future wind coverage. This paper studies a framework for probabilistic forecasting using support vector quantile regression with non-crossing constraints to ensure multiple quantiles can be predicted without overlapping each other. Effectiveness of the CSVQR approach is validated with the real world dataset of the Global Energy Forecasting Competition 2014. Forecasts are compared to common benchmarks and are evaluated using the quantile score and reliability metrics. Results show adequate reliability and low quantile scores across the prediction horizon, which verify effectiveness of the model for forecasting while preventing estimated quantiles from overlapping. Furthermore, this approach has the potential to be applied across a variety of domains. Future work will look into applying CSVQR to forecast electricity pricing and load demand for smart grid applications.

References

[Bremnes 2004] Bremnes, J. B. 2004. Probabilistic wind power forecasts using local quantile regression. Wind Energy 7(1):47–54.

[Dorvlo 2002] Dorvlo, A. S. 2002. Estimating wind speed distribution. Energy Conversion and Management 43(17):2311–2318.

[Foley et al. 2012] Foley, A. M.; Leahy, P. G.; Marvuglia, A.; and McKeogh, E. J. 2012. Current methods and advances in forecasting of wind power generation. Renewable Energy 37(1):1–8.

[Giebel et al. 2003] Giebel, G.; Landberg, L.; Badger, J.; Sattler, K.; Feddersen, H.; Nielsen, T. S.; Nielsen, H. A.; and Madsen, H. 2003. Using ensemble forecasting for wind power. Proceedings Cd-rom. Cd 2.

[Hong et al. 2016] Hong, T.; Pinson, P.; Fan, S.; Zareipour, H.; Troccoli, A.; and Hyndman, R. J. 2016. Probabilistic energy forecasting: Global energy forecasting competition 2014 and beyond. International Journal of Forecasting 32(3):896–913.

[Juban et al. 2016] Juban, R.; Ohlsson, H.; Maasoumy, M.; Poirier, L.; and Kolter, J. Z. 2016. A multiple quantile regression approach to the wind, solar, and price tracks of gefcom2014. International Journal of Forecasting 32(3):1094–1102.

[Koenker and Bassett Jr 1978] Koenker, R., and Bassett Jr, G. 1978. Regression quantiles. Econometrica: journal of the Econometric Society 33–50.

[Koenker 2005] Koenker, R. 2005. Quantile regression. Number 38. Cambridge university press.

[Landry et al. 2016] Landry, M.; Erlinger, T. P.; Patschke, D.; and Varrichio, C. 2016. Probabilistic gradient boosting machines for gefcom2014 wind forecasting. International Journal of Forecasting 32(3):1061–1066.

[Mangalova and Shesterneva 2016] Mangalova, E., and Shesterneva, O. 2016. K-nearest neighbors for gefcom2014 probabilistic wind power forecasting. International Journal of Forecasting 32(3):1067–1073.

Table 1: Results of prediction interval reliability in different months.

[Nielsen, Madsen, and Nielsen 2006] Nielsen, H. A.; Madsen, H.; and Nielsen, T. S. 2006. Using quantile regression to extend an existing wind power forecasting system with probabilistic forecasts. Wind Energy 9(1-2):95–108.

[Pinson and Kariniotakis 2004] Pinson, P., and Kariniotakis, G. 2004. On-line assessment of prediction risk for wind power production forecasts. Wind Energy 7(2):119–132.

[Pinson and Kariniotakis 2010] Pinson, P., and Kariniotakis, G. 2010. Conditional prediction intervals of wind power generation. IEEE Transactions on Power Systems 25(4):1845–1856.

[Pinson et al. 2007] Pinson, P.; Nielsen, H. A.; Møller, J. K.; Madsen, H.; and Kariniotakis, G. N. 2007. Non-parametric probabilistic forecasts of wind power: required properties and evaluation. Wind Energy 10(6):497–516.

[Platt and others 1998] Platt, J., et al. 1998. Sequential minimal optimization: A fast algorithm for training support vector machines.

[Quinonero-Candela et al. 2006] Quinonero-Candela, J.; Rasmussen, C. E.; Sinz, F.; Bousquet, O.; and Schölkopf, B. 2006. Evaluating predictive uncertainty challenge. 1–27.

[Sideratos and Hatziargyriou 2012] Sideratos, G., and Hatziargyriou, N. D. 2012. Probabilistic wind power forecasting using radial basis function neural networks. IEEE Transactions on Power Systems 27(4):1788–1796.

[Takeuchi et al. 2006] Takeuchi, I.; Le, Q. V.; Sears, T. D.; and Smola, A. J. 2006. Nonparametric quantile estimation. Journal of Machine Learning Research 7(Jul):1231–1264.

[Wan et al. 2014] Wan, C.; Xu, Z.; Pinson, P.; Dong, Z. Y.; and Wong, K. P. 2014. Probabilistic forecasting of wind power generation using extreme learning machine. IEEE Transactions on Power Systems 29(3):1033–1044.

[Zhang, Wang, and Wang 2014] Zhang, Y.; Wang, J.; and Wang, X. 2014. Review on probabilistic forecasting of wind power generation. Renewable and Sustainable Energy Reviews 32:255–270.

Table 2: Summary of the mean Q-score across all quantiles for a given method and month and their standard deviation.
