Variational Bayesian Methods for Stochastically Constrained System Design Problems

2020·Arxiv

Abstract

Abstract

We study system design problems stated as parameterized stochastic programs with a chance-constraint set. We adopt a Bayesian approach that requires the computation of a posterior predictive integral which is usually intractable. In addition, for the problem to be a well-defined convex program, we must retain the convexity of the feasible set. Consequently, we propose a variational Bayes-based method to approximately compute the posterior predictive integral that ensures tractability and retains the convexity of the feasible set. Under certain regularity conditions, we also show that the solution set obtained using variational Bayes converges to the true solution set as the number of observations tends to infinity. We also provide bounds on the probability of qualifying a true infeasible point (with respect to the true constraints) as feasible under the VB approximation for a given number of samples.

1 Introduction

A general system design problem can be formally stated as the following constraint optimization problem

where is the input/control vector in a convex set is the system parameter vector. The function encodes the cost/risk associated with the given values of parameter and control variable respectively. Similarly, the functions define the constraints on . Under certain regularity conditions on the cost and the constraint functions, and for a given value of the true system parameter , the problem (TP) at can be solved to obtain the optimal control vector . In practice, the true system parameters are unknown and these parameters must be estimated using observed data.

In this paper, we take a Bayesian approach and model the uncertainty over the parameters by computing a posterior distribution ) for a given prior distribution ) and the likelihood ) of observing data . We approximate the true problem (TP), using the posterior distribution, with the following joint chance-constrained problem:

where 1) is the specified confidence level desired by the decision maker (DM) based on the requirement, usually . We provide a supporting example (see Appendix B) to motivate the chance-constraint formulation as opposed to using expectations, in which case the constraints are only satisfied on an average. The unconstrained version of the above problem has been studied as a special case in Jaiswal et al. (2019).

In practice, computing posterior distributions is challenging and mostly intractable, and is typically approximated using Markov Chain Monte Carlo (MCMC) or Variational Bayesian(VB) methods. MCMC methods has its own drawbacks like poor mixing, large variance, and complex diagnostics, which have been the usual motivation for using VB (Blei et al., 2017). Here, we provide another important motivation for using VB in the chance-constrained Bayesian inference setting. In particular, we present an example (motivated from Pena-Ordieres et al. (2019)) where a sampling based approach to approximate the chance-constraint convex feasibility set (constraint set) in (BJCCP), results in a non-convex approximation; whereas an appropriate VB approximation retains its convexity. Therefore, we approximate (BJCCP) using a VB approximate posterior

minimize s.t.

Under certain regularity conditions, we also show that the optimizers of (VBJCCP) are consistent with those of (TP). More precisely, we show that the solution set obtained in (VBJCCP) converges to the true solution set as the number of observations, n tends to infinity. We also provide bounds on the probability of qualifying a true infeasible point (with respect to the true constraints) as feasible under the VB approximation for a given number of samples. As part of the future work, we want to analyze the risk-sensitive VB approximation of the (BJCCP), where the risk is quantified as the deviation of the approximate feasibility set from the true.

2 Variational Bayes for Chance-Constrained System Design

Bayesian statistics delineates natural principles to model uncertainty in parameter estimation, using observed data combined with prior knowledge. Let independent and identically distributed samples from the F measurable random vector ) with support Ω on probability space (Ωas the associated probability measure, with parameter

Using the posterior distribution ), we approximate (TP) as a data-driven joint chance-constrained problem, stated formally as:

and 1) is the specified confidence level desired by the decision maker (DM) based on the requirement. These are the two significant challenges in solving (BJCCP):

1. Computing the posterior distribution: While in some cases conjugate priors can be used, this is not acceptable in most problems; resulting in an intractable computation. The posterior intractability is the common motivation for using VB (Blei et al., 2017) and MCMC techniques for approximate Bayesian inference.

2. Convexity of the feasibility set: Ideally, one should expect (BJCCP) to be a convex program to take advantage of the well established convex solvers. But, even if the posterior distribution is computable, to qualify (BJCCP) as a convex program, the feasibility set,

must be convex. It might be possible that the above set is not convex even when the underlying constraint functions are so (in x) and thus finding a global optimum becomes challenging (Pr´ekopa, 1995).

Note that, if the constraint function has some specified structural regularity and the posterior distribution belongs to a certain class of distributions, then it can be shown that the feasibility set in (1) is convex. For instance, it can be shown that if the constraint functions {1, 2, . . . m} are quasi-convex in () and the distribution is log-concave then the feasibility set in (BJCCP) is convex ((Shapiro et al., 2009, Chapter 4) and Pr´ekopa (2003)). Also, Lagoa et al. (2005) showed that if the constraint function ) is of the form and has a symmetric log-concave density then with the feasibility set in (BJCCP) is convex. To address the posterior intractability, Monte Carlo (MC) methods offer one way to do approximate Bayesian inference with asymptotic guarantees. However, their asymptotic guarantees are offset by issues like poor mixing, large variance and complex diagnostics in practical settings with finite computational budgets. Apart from these common issues, there is another important reason due to which any sampling-based method can not be used directly to solve (BJCCP). Using the empirical approximation to the posterior distribution (constructed using the samples generated from MCMC algorithm) to approximate the chance-constraint feasibility set in (BJCCP), results in a non-convex feasibility set (Pena-Ordieres et al., 2019). To illustrate this, consider the following simple example (modified slightly) of a chance-constraint feasibility set from Pena-Ordieres et al. (2019). We plot in Figure 1(a) the following chance-constraint feasibility set

and its empirical approximator using 8000 MCMC (Metropolis-Hastings with 3000 burn-in samples ) samples generated from the underlying correlated multivariate Gaussian distribution. We observe that the resulting MC approximate feasibility set is non-convex.

Therefore, due to the posterior intractability and the non-convexity of the feasible region when using sampling approaches, as an alternative, we propose to use Variational Bayes (VB) methods.

The idea behind VB is to approximate the intractable posterior ) with an element ) of a simpler variational family Q. Examples of Q include the family of Gaussian distributions, delta functions, or the family of factorized ‘mean-field’ distributions that discard correlations between components of . The variational solution is the element of Q that is ‘closest’ to ), where closeness is usually measured in the Kullback-Leibler (KL) sense. Thus,

Using this, we approximate (BJCCP) with,

minimize s.t.

where is the confidence level. Choosing the approximation to the posterior distribution from a class of ‘simple’ distributions would facilitate in addressing the two critical problems associated with (BJCCP). Besides the tractability of the posterior distribution, for instance, using the results in Pr´ekopa (2003) and Lagoa et al. (2005) the choice of a log-concave family of distributions as the approximating family could retain the convexity of the feasibility set, if the constraint functions have certain structural regularity.

Next, we show that using the popular mean-field variational family to approximate the correlated multivariate Gaussian distribution in the same example in (2), we obtain a smooth and convex approximation to the (BJCCP) feasibility set. First, we compute mean-field approximation for fours different covariance matrices Σ, with fixed variance = 1 but varying covariance . Then, we plot the respective approximate VB chance-constraint feasibility region in Figure 1. We observe that VB approximation provides a smooth convex approximation to the true feasibility set, but it could be outside the true feasibility region if the are positively correlated.

Figure 1: Feasible Region : True Distribution vs Monte Carlo Approximation (5000 samples) vs. VB (mean field approximation).

2.1 Theoretical properties of (VBJCCP)

In this section, we establish theoretical guarantees on the approximate optimal solution set obtained using the VB approximation and show that it converges to the optimal solution set (TP) almost surely in . We show similar result for their corresponding optimal values and . The consistency of the approximate solution follows using techniques from the variational calculus and the consistency of the VB-approximate posterior distribution, which is proved under certain conditions on the prior distribution, likelihood model, and the variational approximation in Wang and Blei (2018). For brevity, we state the following results without any assumptions and proofs; it will be stated formally in Appendix C.

In the next result, we show that the solution obtained in (VBJCCP) are feasible with high probability. Let us define the set where the true constraint is satisfied as and VB-approximate feasibility set is denoted as ˆWe prove the next result using the convergence rate results for VB approximation in Zhang and Gao (2019).

Proposition 2.2. We show that if , then there exists constant {1, 2, . . . m}, such that

References

Aksin, Z., Armony, M., and Mehrotra, V. (2009). The modern call center: A multi-disciplinary perspective on operations management research. Production and Operations Management, 16(6):665–688.

Aktekin, T. and Ekin, T. (2016). Stochastic call center staffing with uncertain arrival, service and abandonment rates: A bayesian perspective. Naval Research Logistics (NRL), 63(6):460–478.

Blei, D. M., Kucukelbir, A., and McAuliffe, J. D. (2017). Variational inference: A review for statisticians. Journal of the American Statistical Association.

Dupacova, J. and Wets, R. (1988). Asymptotic behavior of statistical estimators and of optimal solutions of stochastic optimization problems. The Annals of Statistics, 16(4):1517–1549.

Gans, N., Koole, G., and Mandelbaum, A. (2003). Telephone call centers: Tutorial, review, and research prospects. Manufacturing & Service Operations Management, 5(2):79–141.

Gross, D., Shortie, J. F., Thompson, J. M., and Harris, C. M. (2008). Simple Markovian Queueing Models. Wiley.

Hong, L. J., Yang, Y., and Zhang, L. (2011). Sequential convex approximations to joint chance constrained programs: A monte carlo approach. Operations Research, 59(3):617–630.

Jaiswal, P., Honnappa, H., and Rao, V. A. (2019). Risk-sensitive variational bayes: Formulations and bounds. arXiv preprint arXiv:1903.05220v3.

Lagoa, C. M., Li, X., and Sznaier, M. (2005). Probabilistically constrained linear programs and risk-adjusted controller design. SIAM Journal on Optimization, 15(3):938–951.

Pena-Ordieres, A., Luedtke, J. R., and Wachter, A. (2019). Solving chance-constrained problems via a smooth sample-based nonlinear approximation.

Pr´ekopa, A. (1995). Stochastic Programming. Springer Netherlands.

Pr´ekopa, A. (2003). Probabilistic programming. Handbooks in operations research and management science, 10:267–351.

Shapiro, A., Dentcheva, D., and Ruszczy´nski, A. (2009). Lectures on stochastic programming: modeling and theory. SIAM.

Wang, Y. and Blei, D. M. (2018). Frequentist consistency of variational bayes. Journal of the American Statistical Association, 0(0):1–15.

Zhang, F. and Gao, C. (2019). Convergence rates of variational posterior distributions. Annals of Statistics.

A A System Design Problem

To illustrate the system design problem in (TP) with an example, we model a queueing system and show that the optimal staffing problem aptly fits into the above framework. Consider a staffing problem where the decision maker (DM) has to decide the optimal number of servers, c, after observing the arrival and service data in an M/M/c queueing system; a queueing system where the inter-arrival times and service times are exponentially distributed (Markovian) with c number of servers, is denoted as M/M/c queueing system. We assume that the rate parameters of the exponentially distributed inter-arrival and service times distribution are unknown and denoted as respectively. Note that , combined together form the system parameter and the number of servers c is the control/input variable. The DM first uses a single server and collects data after the system reaches its ‘steady state’. Since the DM observed congestion in the queues, he/she decides to employ more servers. The DM collects n realizations of the random vector V := {T, S, E}, denoted as are the random variables denoting the arrival, service-start, and service-end time of each customer respectively. We also assume that there is no time lag between the two successive states for any customer and the inter-arrival and service times are independent, that is is independent of 1. The joint likelihood of the arrival and departure times for n customers is

Constraint functions: The DM chooses the number of servers c to maintain a constant measure of congestion. Congestion is usually measured as 1) is the probability that the customer did not wait in the queue. The closed-form expression for 1to be (see Gross et al. (2008))

where is also known as traffic intensity and for a stable queue DM fixes , which is maximum desired fraction of customers delayed in queue and the smallest c is chosen that satisfies:

Referring to the queueing literature, we will use the term the Quality of Service(QoS) constraint for the first constraint. The corresponding constraint optimization problem is

The above staffing problem and its variations are well studied in the queueing literature; interested reader may refer to Gans et al. (2003) and Aksin et al. (2009).

B Other Data-Driven Approaches to solve (TP)

Since the system parameters are unknown in practice, these are usually estimated using the observed data . The simplest approach could be to substitute the maximum likelihood estimates(MLE) ˆ) of the parameters in the (TP) and solve the following approximate problem:

We solved the queueing staffing problem in Appendix A using the MLE approach on simulated data (n observations, with n in 50-400 in increments of 50) and computed the approximate optimal number of servers denoted as . We repeated this experiment over 100 sample paths and

Table 1: Fraction of times violates QoS constraint.

computed ), the fraction of experiments violates the QoS constraint. Table 1 shows that the QoS constraint is violated in over 50% of the experiments.

It is anticipated that the MLE approach is unable to capture the uncertainty in parameter estimation therefore an alternative method is proposed using forecasting techniques. In this approach first, the uncertainty over the parameter estimation is captured by forecasting a probability distribution ) over the system parameters and then the forecast distribution is used to solve the (TP) problem using one of the following two formulations:

• Average-Constraint(AC)

• Chance-Constraint(CC)

Now consider a simple example from Hong et al. (2011), where the true problem is to find is unknown the DM uses data to forecast that is normally distributed. Using the AC formulation, notice that the approximate optimal solution 5. The above simple example shows that AC optimal solution could violate the constraint 50% of the times. On the other hand, CC formulation enables the DM to ensure that the approximate optimal solution satisfy the constraints with higher confidence by setting a higher confidence level(). In forecasting approach, the DM needs to forecast each time the new data is collected. We propose a principled data-driven approach using Bayesian methods, wherein we combine forecasting and optimization. A similar approach has also been discussed in Aktekin and Ekin (2016) to solve the M/M/c staffing problem with abandonment, but crucially relies on the availability of conjugate priors.

C Proofs

C.1 Proof of Proposition 2.1

Next define an indicator function 0 and 0 if t > 0 .

Lemma C.1. We show that for each

Proof. Recall the result in Wang and Blei (2018) that the VB approximate posterior consistent; that is for every

Observe that for any

Observe that, the result in (4) combined with the fact that the first term in (5) is always positive and bounded, implies that limtaking limits on either side of (5), we have

and the lemma follows.

Next we define hypo-convergence and epi-convergence of a sequence of function

Definition C.1 (Hypo-convergence). A sequence of functions hypo-converges to h(x); that is hypo

1. for every

2. there exists a sequence , such that lim inf

Definition C.2 (Epi-convergence). A sequence of functions epi-converges to h(x); that is ep

1. for every

2. there exists a sequence , such that lim sup

Lemma C.2. Under Assumption C.1, we show that,

Proof. Due to Assumption C.1, both the results above are a direct consequence of the result in (Du- pacova and Wets, 1988, Theorem 3.7).

Lemma C.3. We show that under Assumption C.1, hypo-converges to

Proof. Since by Assumption C.1 each ) is continuous in x, therefore upper-semicontinuous(USC) in ) is USC. Also, since the product of non-negative USC functions are also USC, it follows that )) is USC. Similarly, since by assumption function, therefore )) is a random upper- semicontinuous function (Dupacova and Wets, 1988). Now, using the reverse Fatou’s Lemma, for any

therefore

and the result follows.

Proof of Proposition 2.1. We will first show that the assertion of the theorem is true for m = 1. Recall ) is the solution of (VBJCCP): and is the solution of (TP).

Now observe that, since both )) are upper- semicontinuous their corresponding super-level sets are closed; and if X is bounded than the corresponding feasible sets are also compact. Also, if the the corresponding feasibility sets are non-empty then the corresponding optimal sets are also non-empty.

Next let us assume that there exists a true solution of (TP) which lies in the interior of X, that is for any 0, there is 0. It implies that there exists a sequence 0. Since, due to our result in Lemma C.1 ) converges pointwise to , therefore there exists an such that for all . Hence for all is a feasible solution of (VBJCCP) and therefore ). Taking lim sup on either sides, we obtain

where the last inequality follows from Lemma C.2 (1). Now, since x can be chosen arbitrarily close to , it follows that

Next, let ˆ; that is ˆSince X is compact, we assume that ˆ. Due to Lemma C.3, hypo-converges to , therefore we have

Now using the fact that 1, it follows from (11) that is a feasible point of (TP), since lim supTherefore, it follows that Since, due to Lemma C.2 (2), lim inf, it follows that

Hence, it follows from (10) and (12) that and it also follows that true solution of (TP), therefore . The above arguments can be easily generalized for the general case with m number of constraints.

C.2 Proof of Proposition 2.2

Proof. Using Markov’s inequality observe that for any

Now using Theorem 2.1 in Zhang and Gao (2019), it follows that for each

satisfies assumption (C1), then there exists a constant

where . Now observe that, using the above result in (14) directly proves the assertion of the proposition.