b

DiscoverSearch
About
My stuff
Variational Bayesian Methods for Stochastically Constrained System Design Problems
2020·arXiv
Abstract
Abstract

We study system design problems stated as parameterized stochastic programs with a chance-constraint set. We adopt a Bayesian approach that requires the computation of a posterior predictive integral which is usually intractable. In addition, for the problem to be a well-defined convex program, we must retain the convexity of the feasible set. Consequently, we propose a variational Bayes-based method to approximately compute the posterior predictive integral that ensures tractability and retains the convexity of the feasible set. Under certain regularity conditions, we also show that the solution set obtained using variational Bayes converges to the true solution set as the number of observations tends to infinity. We also provide bounds on the probability of qualifying a true infeasible point (with respect to the true constraints) as feasible under the VB approximation for a given number of samples.

A general system design problem can be formally stated as the following constraint optimization problem

image

where  x ∈ X ⊆ Rp is the input/control vector in a convex set  X and ξ ∈ Θ ⊆ Rq is the system parameter vector. The function  f(x, ξ) : X ×Θ �→ Rencodes the cost/risk associated with the given values of parameter and control variable  ξ and xrespectively. Similarly, the functions  gi(x, ξ) :X × Θ �→ Rdefine the constraints on  ξ and x. Under certain regularity conditions on the cost and the constraint functions, and for a given value of the true system parameter  ξ0, the problem (TP) at  ξ = ξ0can be solved to obtain the optimal control vector  x∗. In practice, the true system parameters are unknown and these parameters must be estimated using observed data.

In this paper, we take a Bayesian approach and model the uncertainty over the parameters  ξby computing a posterior distribution  π(ξ|Xn) for a given prior distribution  π(ξ) and the likelihood  Pξ(Xn) of observing data  Xn. We approximate the true problem (TP), using the posterior distribution, with the following joint chance-constrained problem:

image

where  β ∈ (0,1) is the specified confidence level desired by the decision maker (DM) based on the requirement, usually  β > 12. We provide a supporting example (see Appendix B) to motivate the chance-constraint formulation as opposed to using expectations, in which case the constraints are only satisfied on an average. The unconstrained version of the above problem has been studied as a special case in Jaiswal et al. (2019).

In practice, computing posterior distributions is challenging and mostly intractable, and is typically approximated using Markov Chain Monte Carlo (MCMC) or Variational Bayesian(VB) methods. MCMC methods has its own drawbacks like poor mixing, large variance, and complex diagnostics, which have been the usual motivation for using VB (Blei et al., 2017). Here, we provide another important motivation for using VB in the chance-constrained Bayesian inference setting. In particular, we present an example (motivated from Pena-Ordieres et al. (2019)) where a sampling based approach to approximate the chance-constraint convex feasibility set (constraint set) in (BJCCP), results in a non-convex approximation; whereas an appropriate VB approximation retains its convexity. Therefore, we approximate (BJCCP) using a VB approximate posterior

image

minimize Eq∗(ξ|Xn)[f(x, ξ)] (VBJCCP)s.t. q∗ (gi(x, ξ) ≤ 0, i ∈ {1, 2, 3, . . . , m}|Xn) ≥ β, ∀x ∈ X.

Under certain regularity conditions, we also show that the optimizers of (VBJCCP) are consistent with those of (TP). More precisely, we show that the solution set obtained in (VBJCCP) converges to the true solution set as the number of observations, n tends to infinity. We also provide bounds on the probability of qualifying a true infeasible point (with respect to the true constraints) as feasible under the VB approximation for a given number of samples. As part of the future work, we want to analyze the risk-sensitive VB approximation of the (BJCCP), where the risk is quantified as the deviation of the approximate feasibility set from the true.

Bayesian statistics delineates natural principles to model uncertainty in parameter estimation, using observed data combined with prior knowledge. Let  Xn = {X1, X2, . . . , Xn}, be nindependent and identically distributed samples from the F measurable random vector  X(ω) with support Ω  ⊂ Rdon probability space (Ω, F, Pξ), with Pξas the associated probability measure, with parameter  ξ.

Using the posterior distribution  π(ξ|Xn), we approximate (TP) as a data-driven joint chance-constrained problem, stated formally as:

image

and  β ∈ (0,1) is the specified confidence level desired by the decision maker (DM) based on the requirement. These are the two significant challenges in solving (BJCCP):

1. Computing the posterior distribution: While in some cases conjugate priors can be used, this is not acceptable in most problems; resulting in an intractable computation. The posterior intractability is the common motivation for using VB (Blei et al., 2017) and MCMC techniques for approximate Bayesian inference.

2. Convexity of the feasibility set: Ideally, one should expect (BJCCP) to be a convex program to take advantage of the well established convex solvers. But, even if the posterior distribution is computable, to qualify (BJCCP) as a convex program, the feasibility set,

image

must be convex. It might be possible that the above set is not convex even when the underlying constraint functions  gi(x, ξ), i ∈ {1, 2, . . . m}are so (in x) and thus finding a global optimum becomes challenging (Pr´ekopa, 1995).

Note that, if the constraint function has some specified structural regularity and the posterior distribution belongs to a certain class of distributions, then it can be shown that the feasibility set in (1) is convex. For instance, it can be shown that if the constraint functions  gi(x, ξ), i ∈{1, 2, . . . m} are quasi-convex in (x, ⃗ξ) and the distribution is log-concave then the feasibility set in (BJCCP) is convex ((Shapiro et al., 2009, Chapter 4) and Pr´ekopa (2003)). Also, Lagoa et al. (2005) showed that if the constraint function  gi(x, ξ) is of the form  {⃗aT x ≤ ⃗b}, where ξ = (⃗aT ,⃗b)Tand has a symmetric log-concave density then with  β > 12 the feasibility set in (BJCCP) is convex. To address the posterior intractability, Monte Carlo (MC) methods offer one way to do approximate Bayesian inference with asymptotic guarantees. However, their asymptotic guarantees are offset by issues like poor mixing, large variance and complex diagnostics in practical settings with finite computational budgets. Apart from these common issues, there is another important reason due to which any sampling-based method can not be used directly to solve (BJCCP). Using the empirical approximation to the posterior distribution (constructed using the samples generated from MCMC algorithm) to approximate the chance-constraint feasibility set in (BJCCP), results in a non-convex feasibility set (Pena-Ordieres et al., 2019). To illustrate this, consider the following simple example (modified slightly) of a chance-constraint feasibility set from Pena-Ordieres et al. (2019). We plot in Figure 1(a) the following chance-constraint feasibility set

image

and its empirical approximator using 8000 MCMC (Metropolis-Hastings with 3000 burn-in samples ) samples generated from the underlying correlated multivariate Gaussian distribution. We observe that the resulting MC approximate feasibility set is non-convex.

Therefore, due to the posterior intractability and the non-convexity of the feasible region when using sampling approaches, as an alternative, we propose to use Variational Bayes (VB) methods.

The idea behind VB is to approximate the intractable posterior  π(ξ|Xn) with an element q∗(ξ|Xn) of a simpler variational family Q. Examples of Q include the family of Gaussian distributions, delta functions, or the family of factorized ‘mean-field’ distributions that discard correlations between components of  ξ. The variational solution  q∗ is the element of Q that is ‘closest’ to  π(ξ|Xn), where closeness is usually measured in the Kullback-Leibler (KL) sense. Thus,

image

Using this, we approximate (BJCCP) with,

minimize Eq∗(ξ|Xn)[f(x, ξ)] (VBJCCP)s.t. q∗ (gi(x, ξ) ≤ 0, i ∈ {1, 2, 3, . . . , m}|Xn) ≥ β, ∀x ∈ X,

where  βis the confidence level. Choosing the approximation to the posterior distribution from a class of ‘simple’ distributions would facilitate in addressing the two critical problems associated with (BJCCP). Besides the tractability of the posterior distribution, for instance, using the results in Pr´ekopa (2003) and Lagoa et al. (2005) the choice of a log-concave family of distributions as the approximating family could retain the convexity of the feasibility set, if the constraint functions have certain structural regularity.

Next, we show that using the popular mean-field variational family to approximate the correlated multivariate Gaussian distribution in the same example in (2), we obtain a smooth and convex approximation to the (BJCCP) feasibility set. First, we compute mean-field approximation  qA(ξ) and qB(ξ) of N�ξ|µ = [0, 0]T , Σ�for fours different covariance matrices Σ, with fixed variance  σ11 = σ22= 1 but varying covariance  σ12 = {−0.1, −0.025, 0.025, 0.1}. Then, we plot the respective approximate VB chance-constraint feasibility region in Figure 1. We observe that VB approximation provides a smooth convex approximation to the true feasibility set, but it could be outside the true feasibility region if the  ξ1 and ξ2are positively correlated.

image

Figure 1: Feasible Region : True Distribution vs Monte Carlo Approximation (5000 samples) vs. VB (mean field approximation).

2.1 Theoretical properties of (VBJCCP)

In this section, we establish theoretical guarantees on the approximate optimal solution set  S∗V B(Xn)obtained using the VB approximation and show that it converges to the optimal solution set  S∗ of(TP) almost surely in  P0. We show similar result for their corresponding optimal values  V ∗V B(Xn)and  V ∗. The consistency of the approximate solution follows using techniques from the variational calculus and the consistency of the VB-approximate posterior distribution, which is proved under certain conditions on the prior distribution, likelihood model, and the variational approximation in Wang and Blei (2018). For brevity, we state the following results without any assumptions and proofs; it will be stated formally in Appendix C.

image

In the next result, we show that the solution obtained in (VBJCCP) are feasible with high probability. Let us define the set where the true constraint  i ∈ {1, 2, . . . m}is satisfied as  F i0 :={x ∈ X : {gi(x, ξ0) ≤ 0}, },and VB-approximate feasibility set is denoted as ˆFV B(Xn) := {x ∈ X :q∗ (gi(x, ξ) ≤ 0, i ∈ {1, 2, 3, . . . , m}|Xn) ≥ β}.We prove the next result using the convergence rate results for VB approximation in Zhang and Gao (2019).

Proposition 2.2. We show that if  x ∈ X\F i0, then there exists constant  Ci > 0 for each i ∈{1, 2, . . . m}, such that  P0[x ∈ ˆFV B(Xn)] ≤ Ciβ (ϵ2n + η2n), where ϵ2n → 0 as n → ∞ and η2n :=

image

Aksin, Z., Armony, M., and Mehrotra, V. (2009). The modern call center: A multi-disciplinary perspective on operations management research. Production and Operations Management, 16(6):665–688.

Aktekin, T. and Ekin, T. (2016). Stochastic call center staffing with uncertain arrival, service and abandonment rates: A bayesian perspective. Naval Research Logistics (NRL), 63(6):460–478.

Blei, D. M., Kucukelbir, A., and McAuliffe, J. D. (2017). Variational inference: A review for statisticians. Journal of the American Statistical Association.

Dupacova, J. and Wets, R. (1988). Asymptotic behavior of statistical estimators and of optimal solutions of stochastic optimization problems. The Annals of Statistics, 16(4):1517–1549.

Gans, N., Koole, G., and Mandelbaum, A. (2003). Telephone call centers: Tutorial, review, and research prospects. Manufacturing & Service Operations Management, 5(2):79–141.

Gross, D., Shortie, J. F., Thompson, J. M., and Harris, C. M. (2008). Simple Markovian Queueing Models. Wiley.

Hong, L. J., Yang, Y., and Zhang, L. (2011). Sequential convex approximations to joint chance constrained programs: A monte carlo approach. Operations Research, 59(3):617–630.

Jaiswal, P., Honnappa, H., and Rao, V. A. (2019). Risk-sensitive variational bayes: Formulations and bounds. arXiv preprint arXiv:1903.05220v3.

Lagoa, C. M., Li, X., and Sznaier, M. (2005). Probabilistically constrained linear programs and risk-adjusted controller design. SIAM Journal on Optimization, 15(3):938–951.

Pena-Ordieres, A., Luedtke, J. R., and Wachter, A. (2019). Solving chance-constrained problems via a smooth sample-based nonlinear approximation.

Pr´ekopa, A. (1995). Stochastic Programming. Springer Netherlands.

Pr´ekopa, A. (2003). Probabilistic programming. Handbooks in operations research and management science, 10:267–351.

Shapiro, A., Dentcheva, D., and Ruszczy´nski, A. (2009). Lectures on stochastic programming: modeling and theory. SIAM.

Wang, Y. and Blei, D. M. (2018). Frequentist consistency of variational bayes. Journal of the American Statistical Association, 0(0):1–15.

Zhang, F. and Gao, C. (2019). Convergence rates of variational posterior distributions. Annals of Statistics.

To illustrate the system design problem in (TP) with an example, we model a queueing system and show that the optimal staffing problem aptly fits into the above framework. Consider a staffing problem where the decision maker (DM) has to decide the optimal number of servers, c, after observing the arrival and service data in an M/M/c queueing system; a queueing system where the inter-arrival times and service times are exponentially distributed (Markovian) with c number of servers, is denoted as M/M/c queueing system. We assume that the rate parameters of the exponentially distributed inter-arrival and service times distribution are unknown and denoted as λ and µrespectively. Note that  λ and µ, combined together form the system parameter  ξ = {λ, µ}and the number of servers c is the control/input variable. The DM first uses a single server and collects data after the system reaches its ‘steady state’. Since the DM observed congestion in the queues, he/she decides to employ more servers. The DM collects n realizations of the random vector V := {T, S, E}, denoted as  Xn := {V1, . . . Vn} where T, S, and Eare the random variables denoting the arrival, service-start, and service-end time of each customer  i ∈ {1, 2, . . . n}respectively. We also assume that there is no time lag between the two successive states for any customer and the inter-arrival and service times are independent, that is  Ti − Ti−1is independent of  Ei − Si for eachi ≥1. The joint likelihood of the arrival and departure times for n customers is

image

Constraint functions: The DM chooses the number of servers c to maintain a constant measure of congestion. Congestion is usually measured as 1−Wq(c, λ, µ), where Wq(c, λ, µ) is the probability that the customer did not wait in the queue. The closed-form expression for 1−Wq(c, λ, µ) is knownto be (see Gross et al. (2008))

image

where  r = λµ and ρ = rc with ρ < 1. ρis also known as traffic intensity and for a stable queue  ρ < 1.DM fixes  α, which is maximum desired fraction of customers delayed in queue and the smallest c is chosen that satisfies:

image

Referring to the queueing literature, we will use the term the Quality of Service(QoS) constraint for the first constraint. The corresponding constraint optimization problem is

image

The above staffing problem and its variations are well studied in the queueing literature; interested reader may refer to Gans et al. (2003) and Aksin et al. (2009).

Since the system parameters are unknown in practice, these are usually estimated using the observed data  Xn. The simplest approach could be to substitute the maximum likelihood estimates(MLE) ˆξ(Xn) of the parameters  ξin the (TP) and solve the following approximate problem:

image

We solved the queueing staffing problem in Appendix A using the MLE approach on simulated data (n observations, with n in 50-400 in increments of 50) and computed the approximate optimal number of servers denoted as  CnMLE. We repeated this experiment over 100 sample paths and

image

Table 1: Fraction of times  CnMLE violates QoS constraint.  λ0 = 16, µ0 = 4, α = 0.37.

computed  φ(CnMLE), the fraction of experiments  CnMLE violates the QoS constraint. Table 1 shows that the QoS constraint is violated in over 50% of the experiments.

It is anticipated that the MLE approach is unable to capture the uncertainty in parameter estimation therefore an alternative method is proposed using forecasting techniques. In this approach first, the uncertainty over the parameter estimation is captured by forecasting a probability distribution  P(ξ) over the system parameters and then the forecast distribution is used to solve the (TP) problem using one of the following two formulations:

Average-Constraint(AC)

image

Chance-Constraint(CC)

image

Now consider a simple example from Hong et al. (2011), where the true problem is to find c∗ = min{c : ξ − c ≤ 0}. Since, ξis unknown the DM uses data to forecast that  ξ ∼ N(·|0, 1)is normally distributed. Using the AC formulation, notice that the approximate optimal solution c∗A = min{c : Eξ[ξ] − c ≤ 0} = 0 and Pξ{ξ ≥ c∗A} = 0.5. The above simple example shows that AC optimal solution could violate the constraint 50% of the times. On the other hand, CC formulation enables the DM to ensure that the approximate optimal solution satisfy the constraints with higher confidence by setting a higher confidence level(β). In forecasting approach, the DM needs to forecast each time the new data is collected. We propose a principled data-driven approach using Bayesian methods, wherein we combine forecasting and optimization. A similar approach has also been discussed in Aktekin and Ekin (2016) to solve the M/M/c staffing problem with abandonment, but crucially relies on the availability of conjugate priors.

C.1 Proof of Proposition 2.1

image

Next define an indicator function  I(−∞,0](t) := 1 if t ≤0 and 0 if t > 0 .

Lemma C.1. We show that for each  x ∈ X

image

Proof. Recall the result in Wang and Blei (2018) that the VB approximate posterior  q∗(ξ|Xn) isconsistent; that is for every  η > 0.

image

Observe that for any  x ∈ X and η > 0,

image

Observe that, the result in (4) combined with the fact that the first term in (5) is always positive and bounded, implies that limn→∞�∥ξ−ξ0∥>η�mi=1 I(−∞,0](gi(x, ξ))q∗(ξ|Xn)dξ = 0 P0 − a.s. Nowtaking limits on either side of (5), we have

image

and the lemma follows.

Next we define hypo-convergence and epi-convergence of a sequence of function  {hk(x)} to h(x).

Definition C.1 (Hypo-convergence). A sequence of functions  {hk(x)}hypo-converges to h(x); that is hypo  − limn→∞ hk(x) = h(x), if

1. for every  xk → x, lim supk→∞ hk(xk) ≤ h(x), and

2. there exists a sequence  xk → x, such that lim infk→∞ hk(xk) ≥ h(x).

Definition C.2 (Epi-convergence). A sequence of functions  {hk(x)}epi-converges to h(x); that is ep  − limn→∞ hk(x) = h(x), if

1. for every  xk → x, lim infk→∞ hk(xk) ≥ h(x), and

2. there exists a sequence  xk → x, such that lim supk→∞ hk(xk) ≤ h(x).

Lemma C.2. Under Assumption C.1, we show that,

1. for each x ∈ X, limn→∞ Eq∗(ξ|Xn)[f(x, ξ)] = f(x, ξ0) P0 − a.s.

2. and, ep − limn→∞ Eq∗(ξ|Xn)[f(xn, ξ)] = f(x0, ξ0) P0 − a.s.

Proof. Due to Assumption C.1, both the results above are a direct consequence of the result in (Du- pacova and Wets, 1988, Theorem 3.7).

Lemma C.3. We show that under Assumption C.1,  q∗ ��mi=1 I(−∞,0](gi(x, ξ))|Xn�hypo-converges to �mi=1 I(−∞,0](gi(x, ξ0)) P0 − a.s as n → ∞; that is

image

Proof. Since by Assumption C.1 each  gi(x, ξ0) is continuous in x, therefore  I(−∞,0](gi(x, ξ0)) isupper-semicontinuous(USC) in  x because I(−∞,0](·) is USC. Also, since the product of non-negative USC functions are also USC, it follows that �mi=1 I(−∞,0](gi(x, ξ0)) is USC. Similarly, since by assumption  gi(x, ξ) is Carath´eodoryfunction, therefore �mi=1 I(−∞,0](gi(x, ξ)) is a random upper- semicontinuous function (Dupacova and Wets, 1988). Now, using the reverse Fatou’s Lemma, for any  xk → x0

image

therefore�

image

and the result follows.

Proof of Proposition 2.1. We will first show that the assertion of the theorem is true for m = 1. Recall  S∗V B(Xn) is the solution of (VBJCCP): and  S∗is the solution of (TP).

Now observe that, since both  q∗ (g(x, ξ) ≤ 0|Xn) and I(−∞,0](g(x, ξ0)) are upper- semicontinuous their corresponding super-level sets are closed; and if X is bounded than the corresponding feasible sets are also compact. Also, if the the corresponding feasibility sets are non-empty then the corresponding optimal sets  S∗V B(Xn) and S∗are also non-empty.

Next let us assume that there exists a true solution  x∗ of (TP) which lies in the interior of X, that is for any  ϵ >0, there is  x ∈ X such that ∥x − x∗∥ < ϵ and g(x, ξ0) ≤0. It implies that there exists a sequence  {xk} ⊂ X such that xk → x∗ as k → ∞ and g(xk, ξ0) ≤ 0 for all k ≥ 1. Now fixx ∈ X such that g(x, ξ0) ≤0. Since, due to our result in Lemma C.1  q∗ (g(x, ξ) ≤ 0|Xn) converges pointwise to  I(−∞,0](g(x, ξ0)) P0 −a.s, therefore there exists an  n0such that for all  n ≥ n0, we haveq∗ (g(x, ξ) ≤ 0|Xn) ≥ β. Hence for all  n ≥ n0, xis a feasible solution of (VBJCCP) and therefore Eq∗(ξ|Xn)[f(x, ξ)] ≥ V ∗V B(Xn). Taking lim sup on either sides, we obtain

image

where the last inequality follows from Lemma C.2 (1). Now, since x can be chosen arbitrarily close to  x∗, it follows that

image

Next, let ˆxn ∈ S∗V B; that is ˆxn ∈ X, q∗ (g(ˆxn, ξ) ≤ 0|Xn) ≥ β and V ∗V B(Xn) = Eq∗(ξ|Xn)[f(ˆxn, ξ)].Since X is compact, we assume that ˆxn → x∗ P0 − a.s. Due to Lemma C.3,  q∗ (g(x, ξ) ≤ 0|Xn)hypo-converges to  I(−∞,0](g(x, ξ0)) P0 − a.s as n → ∞, therefore we have

image

Now using the fact that  q∗ (g(ˆxn, ξ) ≤ 0|Xn) ≥ β for every n ≥1, it follows from (11) that  x∗is a feasible point of (TP), since lim supn→∞ q∗ (g(ˆxn, ξ) ≤ 0|Xn) ≥ β implies I(−∞,0](g(x∗, ξ0)) ≥β and β ∈ (0, 1).Therefore, it follows that  f(x∗, ξ0) ≥ V ∗.Since, due to Lemma C.2 (2), lim infn→∞ Eq∗(ξ|Xn)[f(ˆxn, ξ)] ≥ f(x∗, ξ0) P0 − a.s, it follows that

image

Hence, it follows from (10) and (12) that  V ∗V B(Xn) → V ∗ P0 − a.sand it also follows that  x∗ is thetrue solution of (TP), therefore  D(S∗V B(Xn), S∗) → 0 P0 − a.s. The above arguments can be easily generalized for the general case with m number of constraints.

C.2 Proof of Proposition 2.2

Proof. Using Markov’s inequality observe that for any  x ∈ X,

image

Now using Theorem 2.1 in Zhang and Gao (2019), it follows that for each  i ∈ {1, 2, 3, . . . , m} if

image

satisfies assumption (C1), then there exists a constant  Ci for each i ∈ {1, 2, 3, . . . , m} such that

image

where  η2n := 1n infq∈Q EP0��Θ q(ξ) log q(ξ)π(ξ|Xn)dξdξ�. Now observe that, using the above result in (14) directly proves the assertion of the proposition.


Designed for Accessibility and to further Open Science