Finding Optimal Points for Expensive Functions Using Adaptive RBF-Based Surrogate Model Via Uncertainty Quantification

2020·Arxiv

Abstract

Abstract

Global optimization of expensive functions has important applications in physical and computer experiments. It is a challenging problem to develop efficient optimization scheme, because each function evaluation can be costly and the derivative information of the function is often not available. We propose a novel global optimization framework using adaptive Radial Basis Functions (RBF) based surrogate model via uncertainty quantification. The framework consists of two iteration steps. It first employs an RBF-based Bayesian surrogate model to approximate the true function, where the parameters of the RBFs can be adaptively estimated and updated each time a new point is explored. Then it utilizes a model-guided selection criterion to identify a new point from a candidate set for function evaluation. The selection criterion adopted here is a sample version of the expected improvement (EI) criterion. We conduct simulation studies with standard test functions, which show that the proposed method has some advantages, especially when the true surface is not very smooth. In addition, we also propose modified approaches to improve the search performance for identifying global optimal points and to deal with the higher dimension scenarios.

Keywords Expected improvement Markov chain Monte Carlo Radial basis functions Sequential design

1 Introduction

In this paper, we consider the problem of global optimization of expensive functions, i.e., functions which require large computational costs to evaluate. For physical and computational experiments, these functions represent the relationship between input and output variables, and may require days or even weeks to evaluate at a single input setting. One example is the high-pressure mixing and combustion processes in liquid rocket engines, which requires numerically solving a large, coupled system of partial differential equations; see Oefelein and Yang (1998). Even when computation is parallelized over thousands of processing cores, a comprehensive simulation of a single injector may take months to complete. An important problem about expensive functions is how to optimize the output/response by choosing appropriate settings of the input variables. This problem can be challenging for two reasons. First, it is not feasible to conduct extensive runs of function evaluations to find the optimal input settings, since each function evaluation is expensive. It is thus desirable to identify the optimal input settings with as few runs as possible. The second challenge comes from the complicated nature of the functional relationship. They are usually regarded as “black-boxes”, because there is no explicit relationship between the input and output. Although various local optimization methods are available when the derivatives of the functions are known or can be easily obtained, see Boyd and Vandenberghe (2004), such methods are not applicable in the present scenario.

In the literature, a widely used practice for global optimization of expensive functions is to sequentially select input settings for function evaluations based on some criterions. The approach consists of two steps. First, it constructs a surrogate model to approximate the true function based on all the observed function outputs. The advantage of employing surrogate model is that it can provide predictions at any input settings with much cheaper computation. Second, it identifies a new input setting for function evaluation according to some surrogate model based selection criteria. With this approach, it is feasible to approximate the global optimizer of the true function via the surrogate model optimization. The commonly used surrogate models are the kriging models (Jones et al., 1998; Jones, 2001) and models based on radial basis functions (Gutmann, 2001; Regis and Shoemaker, 2007). Chen et al. (2011, 2017) proposed to construct the surrogate models via overcomlpete pre-specified basis functions. In addition, another type of optimization approach is statistical global optimization which chooses the next point based on a probability improvement criterion, like P-algorithm (Torn and ˇZilinskas, 1989). Gutmann (2001) and ˇZilinskas (2010) have showed the equivalence of the P-algorithm and the surrogate approach proposed in Gutmann (2001) under certain conditions. For more details along these lines, see a review in ˇZilinskas and Zhigljavsky (2016).

The primary objective of this paper is to propose a novel global optimization framework for optimizing expensive functions. Our approach is motivated by Regis and Shoemaker (2007), in which they utilize Radial Basis Functions (RBF) to build a deterministic surrogate model and guide the selection of the next explored point based on the predicted response and some distance criteria. The rationale of using RBFs is that they can capture the nonlinear trend of functions. However, the RBFs they used are pre-determined and lack the flexibility of modeling. Also, it is less efficient to perform function evaluation from their surrogate model, because they use RBFs in a deterministic way without providing prediction uncertainties. Although a distance criterion is used to avoid getting trapped at local optima, it does not incorporate the information in response values for the surrogate models. To make better use of all information in the data, we propose to construct surrogate model with RBFs that are chosen adaptively based on the updated outputs, and to select new points based on surrogate models with quantified uncertainties.

There are other approaches for global optimization of expensive functions in the literature. Jones et al. (1998) propose a global optimization scheme by constructing a surrogate model with the kriging method. Our approach is different in that they make strong assumptions on the correlation structure between explored points while ours does not. A detailed review related to the kriging model in global optimization can be found in Jones (2001). Chen et al. (2017) propose a global optimization scheme that builds a mean prediction model with linear basis functions selected from a dictionary of functions, and then imposes a Bayesian structure over the mean model to quantify the uncertainty of the prediction. Our approach is also different from Chen et al. (2017). Instead of using a predetermined discrete function dictionary with a large number of linear functions, we use a moderate number of RBFs that can be adaptively updated based on observed data.

The paper is organized as follows. In Section 2, we give a mathematical formulation of the global optimization problem, and provide a review of the RBFs. In Section 3, we present the proposed global optimization framework that utilizes adaptive RBF-based Bayesian surrogate model. In Section 4, we present simulation studies to validate and compare our proposed method with the methods by Regis and Shoemaker (2007) and Jones et al. (1998). Further simulation results for 3- and 4-dimensional functions are given in Section 5. A modification of the proposed method to avoid getting trapped in local optima is presented in Section 6.1. In addition, we study the effect of the grid size which is used as the candidate points in the proposed method, and also come out a grid-free approach. Concluding remarks and future research directions are given in Section 7.

2 Problem Formulation and Review of RBFs

Suppose f(x) is an expensive function of interest, where x = (and V is a p-dimensional convex domain in . The objective is to identify an optimal input setting that maximizes f(x),

Because it is not practical to evaluate f(x) over V to search the global maximizer due to the huge computational cost, a well-established practice is to sequentially select a few input settings for function evaluation using a two-step strategy. Suppose a set of N function evaluations are taken. In step 1, a surrogate model is constructed and the resulting model approximation is denoted by Unlike the true function f(x), the surrogate model is much cheaper to build and evaluate. Therefore it is feasible to predict function values over all In step 2, the next input setting is selected for function evaluation via certain criterion based on the surrogate model from step 1. Steps 1 and 2 iterate until the total computational budget is met. The best point among all the chosen input settings, ˆ= arg max), can be treated as an approximation to the true optimal point . By adopting this two-step strategy, we will present in Section 3 our proposed framework in detail. Note that the surrogate construction may not necessarily be an interpolator of the observed points, i.e., Because our goal is optimization, the surrogate is used to predict the location of the optimal points, rather than to approximate the response with high accuracy (Chen et al., 2011). Thus we want to capture the trend of the true response surface quickly and to serve this purpose, our surrogate model does not have to meet the interpolation requirement.

In the remaining part of this section, we give a brief review of the RBFs, which will be used in the proposed framework for the surrogate model construction. In the literature, the RBF is popularly deployed in applied mathematics and neural networks. See Buhmann (2003) and Bishop (2006). Several commonly used functions are: (1) Gaussian functions: ) = exp; (2) generalized multi-quadratic functions: ) = (+ with 1; (3) generalized inverse multi-quadratic functions: ) = (+ with 0; (4) thin plate spline functions: ) = ln(where is the center of the function, and s is a pre-specified constant which varies with the chosen function. In our work, we will focus on the Gaussian RBFs. The Gaussian RBFs have two types of parameters: the center parameter that determines the location of the RBFs, and the scale parameter s that measures the degree of fluctuation of the function. One advantage of using the Gaussian RBFs over other basis functions is that it can capture different trends of response by choosing differ-ent centers and scales. For example, a larger s indicates a more concentrated change in the surface, and vice versa.

3 General Global Optimization Framework

In this section, we propose a global optimization framework that utilizes adaptive RBF-based surrogate model via uncertainty quantification. In Section 3.1, we propose a novel hierarchical normal mixture Bayesian surrogate model with RBFs to approximate the true function, where the model coefficients are sparsely represented to avoid over-fitting, and the parameters of the RBFs are adaptively updated each time a new point is explored. This allows us to predict the function value at any given candidate point. In Section 3.2, we propose a model-guided selection criterion and based on the posterior samples, a sample version of the expected improvement criterion is adopted. A new point can then be selected to identify a more promising area of global maximizer. A summary of the algorithm and some discussions will be presented in Section 3.3.

3.1 Normal Mixture Surrogate Model with RBFs

Suppose we observe N explored points and its function values y = (= (. Without loss of generality, we assume ) = 0, because otherwise we can approximate ()’s instead of ’s, where ¯y is the sample mean of , i.e., ¯y = . We propose to construct a surrogate model by a summation of N Gaussian RBFs ) = expand an error term ):

Here, an error term is used to model the discrepancy between the model approximation ) constructed by the RBFs and the true function f(x). We assume that ) are independent normal distributions with mean 0 and vairance . Note that if the center parameters ’s and the scale parameters ’s are known and fixed, then the surrogate model in (2) is exactly the same as linear regression.

Because both ’s and ’s are unknown, the proposed modeling approach can handle highly nonlinear functions. A uniform prior over a rectangular region is used for = (

where = [min(max())], i.e., the smallest hyper-rectangle to cover the current explored points, and it is adaptively changed with the addition of new explored points, see Andrieu et al. (2001). A gamma prior is used for the scale parameters s = (

where and are common to all i’s.

We also impose a hierarchical structure on the coefficients ’s. Define a latent variable = (to indicate whether a certain basis function is important or not: = 1 indicates that the ith basis is important, while = 0 indicates the opposite. Specifically, we set = 0) ) with small and = 1) ) with relatively large C, where C can be interpreted as a variance ratio. This hierarchical setting is first employed in the Stochastic Search Variable Selection (SSVS) scheme by George and McCulloch (1993) and is also used for uncertainty quantification studies in Chen et al. (2017). Indeed, it is one type of the “g-prior” (see Zellner (1986)) for avoiding over-fitting. Now the mixture normal prior of the model coefficient = (can be written as follows:

with = 1 if = 0 and = C if = 1, and a binomial prior for the latent variable

Note that the choice of C plays an important role in the posterior sampling and control the model complexity. We also impose an inverse-gamma prior for the residual variance

By combining (2)-(7) with independent prior assumptions, we obtain the full posterior distribution of

det(2exp12

where the coefficient matrix ) is defined as

and the indicator function 1) = 1 if = 0 if

The posterior distribution defined in (8) is computationally intractable. Markov Chain Monte Carlo (MCMC) method is utilized to solve this problem, see Andrieu et al. (2001) and Koutsourelakis (2009). That is, we use the MCMC method to generate the posterior samples from ). Thus we sequentially sample and s by fixing the other components and the data x, y. Under certain conditions, we can guarantee that these samples can be treated as the posterior samples of . Here the MCMC method iterates the following two steps:

Specifically, we use the Gibbs sampler to generate the posterior samples for the parameters and the Metropolis-Hasting algorithm to obtain the posterior samples for the parameters and s, because there is no explicit formula for the posterior distributions of and s.

The samples of can be generated by

For the samples of , it would be simple to sample sequentially conditional on the other components, and can be generated by

where

= 1= 1det(exp12)

with = and with = 1;

with = and with = 0. Here the notation = (represents the vector of all ’s except

Now we turn to the parameters and s. First, consider the sampling procedure for . Instead of directly sampling the vector , we suggest sampling sequentially from

where = () denotes the vector of all ’s except We use the Metropolis-Hasting algorithm to generate posterior samples for Specifically, at a new step (k + 1), we set the proposed density to be a mixture of two densities, and a temporary sample can be obtained from the whole domain with uniform probability, or it can be a perturbation of the current iteration within its local neighborhood, i.e.,

Then we accept this temporary sample with the acceptance rate

Similarly, we can use the Metropolis-Hasting algorithm to generate samples of At step (k+1), we choose a temporary as a perturbation of the current sample by the proposed density

And we accept such sample with the acceptance rate

where = (From (9)–(14), we generate samples for iteratively based the updated estimate for the remaining parameters. Then, the Gibbs sequence

can be obtained, where K is the total number of iterations. After discarding the first say 40% samples, the remaining samples can be treated as the posterior samples of and s from ). Thus the posterior sample (˜x) for model approximation at a candidate explored point ˜x can be calculated by

Then the function prediction ) can then be calculated as the average of (˜x)’s, and the prediction uncertainties can be calculated as the sample variance of (˜x)’s.

Finally, we note that the mean value of the posterior density of in (9) is h = (()which is a biased estima- tor of with a nugget value Hence, this estimate of can be regarded as a ridge-type regression estimate. It is deployed to prevent the model co-efficients from being too large. Its use can lead to a more stable surrogate model.

A remaining issue in the Bayesian computation is the tuning of the hyper-parameters, which is critical for the model performance. For the hyper-parameters related to the RBF, we adopt the settings in Andrieu et al. (2001) and Koutsourelakis (2009). Specifically, for the proposed density of the RBF centers in (13), we set = 0.001. For the prior of the RBF scales in (4), we set = 2= 0, and for the proposed density of in (14), we set = 0.5. For the hyper- parameters related to model coefficients and residuals, we follow the settings in Chipman et al. (1997). Specifically, for , we suggest to set ), where = max(min(i.e., the largest change in and 5. For the prior of the indicator variable , we set = 0.5, i.e., the probability of selecting a variable is 50%. For the hyper parameter and in (7), we set = 2, and to be the 99% quantile of the inverse gamma prior that is close toConsider the variance ratio C. Usually we choose a large positive value for C, e.g., 10. From our experience, we fix C = 25 in the first simulation example. However, it should be problem-dependent and can be changed for different optimization problems.

3.2 A Point Selection Criterion

In this section, we discuss how to select new explored points based on the uncertainty of the response prediction for exploring uncertain regions. The ideal selection criterion should perfectly balance between exploration and exploitation properties to efficiently identify the optimal points within the given search budget. Here a sample version of the Expected Improvement criterion is adopted.

The EI criterion, initially proposed by Mockus et al. (1978), is used to select points close to the global maxima based on a chosen surrogate model. Using this criterion, an explored point is selected to maximize the expected improvement over the best observed response

where = maxis the maximum of the observed model outputs. It is pointed out in Jones et al. (1998) that under the Gaussian assumption of )) has the following closed form expression:

By examining the terms, we see that the expected improvement is large for those x having either (i) a predicted value at x that is much larger than the maximum of outputs obtained so far, i.e., or (ii) having much uncertainty about the value of y(x), i.e., when is large.

In our scenario, since the proposed surrogate model does not satisfy the Gaussian assumption, there is no analytical form for y, and thus it is not practical to calculate E(I(x)) directly. Instead, we calculate the Sampled Expected Improvement (SEI) as suggested in Chipman et al. (2012) and Chen et al. (2017), i.e., to estimate E(I(x)) based on the posterior samples of y,

where ) = ) is the mth posterior sample by (15), and M is the total number of posterior samples. Unlike in the Gaussian case, the SEI value in (18) may not be expressed as a weighted sum of the improvement term and the prediction uncertainty term. From its definition, only the prediction posterior samples ) that are larger than the current best value, , are taken in the summation. Thus SEI first identifies the possible “improvement” area, for some m}, and then sums over these terms.

A new explored point at step N + 1 is then selected to maximize the SEI criterion ˆE(I(x)),

where is the current explored point set. Since the SEI criterion does not have a closed form, it may not be easy to identify the next explored point by solving (19). Thus to identify the next point, we may limit the search over a pre-specified grid, , instead of over the whole region, V .

3.3 The Proposed Algorithm and Remarks

In the first part of this section, we will present a summary of the algorithm and the flexible usage of the proposed adaptive RBF-based global optimization framework. For abbreviation, we will refer to the proposed method as BaRBF, where “Ba” stands for “Bayesian adaptive”. In the second part, we will compare our method with the baseline method proposed in Regis and Shoemaker (2007).

Algorithm 1 summarizes the proposed global optimization method. In the beginning, the initial design is chosen as a space-filling design. In this paper, the maximin Latin hypercube design (Morris and Mitchell, 1995) is used. Then the main body of Algorithm 1 is to iterate between the two steps for the surrogate model construction in Section 3.1 and the point selection criterion in Section 3.2. Note that the proposed BaRBF can be flexibly used in different scenarios. For example, when the number of available function evaluations is small to moderate, there may not be enough observations to estimate all the parameters. In this case, we only need to update some part of RBF parameters, say the scale parameter s by setting all the scale parameters = 1, ..., N), and need not update the parameters. The choice of whether to

update all parameters or part of them can be decided based on the magnitude of the model residuals at the initial stage. If updating all parameters leads to relative large model residuals, then we can fix certain parameters instead. The formulas of the posterior distribution in (8)-(14) need some minor changes accordingly if certain RBF parameters are fixed. For the above example, one only needs to set in eq. (8) to be the same s, and update only one s in (14), and does not need to update the ’s in (12) and (13).

For the remaining part of this section, we will compare our BaRBF with the Global metric stochastic RBF (G-MSRBF) algorithm proposed by Regis and Shoemaker (2007) from a theoretical perspective. The G-MSRBF method will be regarded as the baseline method from now on. First we give a brief review. The GMSRBF employs a surrogate model ) using RBFs,

The RBF parameters in (20) are pre-specified, i.e., the RBF centers are set at the explored points , and s is pre-calculated at the initial stage of optimization. The model coefficients in (20) are estimated by solving a deterministic linear system of equation where = (And their point selection criterion

is a weighted average of the scaled response prediction (x) with

and the maximin distance criterion (x) with

where = max= min) = min= min = max The can take values in {1, 0.8, 0.6, 0.4, 0.2} periodically. For example, if at time N = 20= 0.8, then at the next time N = 21= 0.6. Then a new point is selected to maximize ), and finally the global maximizer is also estimated by = arg max).

Although both methods use RBFs, there are two main differences. First, the surrogate model is different. BaRBF uses a Bayesian surrogate model that provides not only predictions but also its uncertainties, while the G-MSRBF utilizes a deterministic surrogate model that only provides predictions. Because our proposed surrogate model is similar to ridge regression, the approximation of response is more robust and smooth compared to the interpolation surrogate model in G-MSRBF. The second difference lies in the choice of the selection criterion for new explored points. In our method, we utilize the expected improvement criterion E(maxwhich can be regarded as a soft-thresholding version of E(y). As previously discussed, thresholding the prediction makes it easier to identify global optima. In addition, under the Gaussian prediction, EI criterion contains the measure of the prediction improvement and the model uncertainty. Here the part of the prediction improvement can be treated as exploitation and the uncertainty part is used to explore the search space. In G-MSRBF, the global exploration is based on the maximin distance criterion, (x), and the (x) is used for local refining. The weighted average of these two criteria is adopted with pre-specific weight pattern. Simulation studies will be presented in Sections 4 and 5 to further understand and compare the empirical performance of the two methods.

4 Simulation Study

To assess the performance of BaRBF, we compare it with G-MSRBF, which is regarded as the baseline method. In G-MSRBF, the candidate set for the next point selection is chosen in each iteration. To make a fair comparison, the candidate points are fixed as a pre-specified grid, , in the experimental region and both methods will be implemented over the same grid.

In addition to the G-MSRBF, we also consider another global optimization approach based on Gaussian process surrogate model for comparisons. Jones et al. (1998) proposed the efficient global optimization (EGO) approach by using the Gaussian process for surrogate construction. EGO starts from an initial point set. After evaluating the response values of the initial design points, a numerical optimization approach, like meta-heuristic algorithm, is used to obtain the MLE of the parameters in the Gaussian process and then the corresponding surrogate model is obtained. Since the Gaussian process prediction follows a normal distribution, the EI criterion in Eq. (17) is used to identify the next explored point over the feasible point set, , and then the surrogate model is updated. Iterate these two steps until a stopping criterion is met. Usually the stopping criterion can be the number of explored points or the maximum value of the EI criterion over the unexplored points.

Fig. 1 The contour plot of the Branin function on [0with grid size 0.04. The red triangle represents the global optimum and two red crosses denote the other two local optima.

First we start with a 2D global optimization example, whose objective function is quit smooth. Then we consider another 2D objective function which is not smooth and has multiple global optima.

4.1 2D Branin function

We consider the standard 2D test function “Branin function”, which has been widely used in the global optimization literature, e.g. Jones et al. (1998). The scaled version of “Branin function” we use here is defined as follows,

where ¯= 15= 15and [0, 1][0, 1]. To simplify our code, we further restrict this function on the evenly spaced grid = [0, 0.04, . . ., 1]. The contour plot of the Branin function over the pre-specified grid points is given in Figure 1, where there will be two local maxima and one global maximum on [0.96, 0.16] with the maximum value 1.0473. In BaRBF, we first measure the prediction uncertainties for all grid points in and then identify the next explored point via (19) from the set .

The objective is to find x that maximizes f(x) in (24) with as few evaluations as possible. At each iteration of the algorithm, the current optimal point, , and its function value ) are recorded together with all the explored points. We randomly choose a small set of (= 16) initial explored points using a maximin Latin hypercube design (Santner et al., 2013). All three methods start with the same set of ’s. Each time the surrogate model is updated by incorporating the f value of a new explored point. Then we calculate and update the value. For each algorithm, new explored points are selected and evaluated sequentially until the total number of explored points reaches + 30) = 46. This process is repeated 60 times, and the average performances are reported and compared for the two methods.

For fair comparison among BaRBF and G-MSRBF, we set the initial sampler of the RBF parameters in BaRBF to be the same as the fixed RBF parameters in G-MSRBF. Specifically, we use Algorithm 1 in Fasshauer and Zhang (2007) to select an optimal value of s in G-MSRBF that minimizes a cost function that collects the errors for a sequence of partial fits to the data. The center parameters ’s are set as the explored points ’s.

From many simulation trials, we found out that, for the Branin test function, updating all parameters in BaRBF will lead to relatively large model residuals that do not converge. This might be caused by the small number of function evaluations. Thus we only update one scale parameter s with all and fix the center parameter ’s at the explored points. We iterate the MCMC 10,000 times. Also, we discard the first 40% of the samples, and take 1 out of every 5 samples in the remaining 60% of the samples, in order to obtain stable and less correlated posterior samples for model fitting. In order to implement BaRBF, two important tuning parameters need to be pre-specified. The first one is the value of C in the coefficient prior. Here we set C = 25. Since the weighted selection criterion is adopted to select the next explored point, the parameter d in the weight is fixed as 5 after we have tested several different possibilities.

In this subsection, we first illustrate the proposed BaRBF with one particular simulation sample. Figure 2 plots the contours of the surrogate model in BaRBF and the locations of the explored points using BaRBF with N = 16, 21, 26, 31, 36, 41 for one simulation sample. Figure 2(a) shows the initial status of BaRBF. The initial design is a 16-run maximin Latin hypercube indicated by 16 green squares. The next explored point chosen by the selection criterion, i.e. the 17th point, is indicated by the black circle in the lower right corner. In Figure 2(b), the five additional points (17th to 21st) are indicated by the five blue squares. These five points are divided into three sets, one closer to the global maximum, the other closer to the other two local maximums. As in Figure 2(a), the next explored point, i.e., the 22nd point, is indicated by the black circle. In Figure 2(c), all the 21 points from Figure 2(b) are indicated by green squares, the additional five points by blue squares, and the next explored point by black circle. Then the same symbols are used in Figure 2(d) to (f) to demonstrate the progression of points for N = 31, 36 and 41. Amazingly, except the initial design points, all the explored points are located closer to the three maxima, none for exploring bad regions. Finally the global maximum is identified in point 39 as shown by the black square in Figure 2(f). In summary these contour plots show that BaRBF efficiently explores the experimental space and quickly approaches the optimal points.

We report the performance of BaRBF and G-MSRBF based on 60 replications by randomly generating the initial LHD designs. The purpose is to see

Table 1 Summary of optimal values obtained by BaRBF, G-MSRBF and EGO with 60 replications in the 2-dimensional experiment with Branin function. The run size of the initial design is 16, and the optimal value of the Branin function is 1.0473.

whether BaRBF provides a more efficient search path to identify the global maximum compared with G-MSRBF, for the same number of function evaluations. The numerical results are summarized in Table 1. First BaRBF has the higher mean value, 1.0443, than that of G-MSRBF, 1.0425, and is more stable because of its smaller standard deviation. Based on the sample quantiles of the optimal values identified by both approaches, there is a detectable differ-ence at the 5% quantile values. We found out that for several cases, the best points identified by G-MSRBF are not close to the true optimal point and after checking the corresponding search processes, G-MSRBF did not efficiently explore the experimental space by properly choosing the next points. On the other hand, G-MSRBF performs better than BaRBF for the first quartile Q1. In addition, among the 60 replications, BaRBF can identify the true optimal values 26 times, which is higher than 20 times for G-MSRBF. Overall BaRBF has a better performance. Here we plot in Figure 3 the mean value as well as the 5% and 95% quantile curves of the current optimal values with respect to the number of iterations for both methods. The mean curves in the two plots are very similar. More meaningful is the comparison of the two 5% quantile curves. For G-MSRBF, the curve moves up quickly until N =13; then it gets stuck (flat) until about N = 23. By comparison, the 5% quantile curve for BaRBF moves up quickly until N = 18. By then, the band between the upper and lower quantile curves is very narrow and continues to shrink. The corresponding band for G-MSRBF does not shrink even to the end (N = 30). In fact it remains very wide when N = 26. This figure gives a more informative comparison than the numerical results in Table 1. It clearly shows the better performance of BaRBF over G-MSRBF.

When we compare the results with EGO, it seems that EGO works perfectly in this Branin example because EGO can quickly identify the global maximum point in each replications. The possible reason should be that since the Branin function can be treated as a smooth function, it can be fitted quit well by the Gaussian process model and thus the EI criterion in EGO can rapidly guide the search process to the target point. For our BaRBF, we do have a error assumption in the surrogate model and the model fitting may not be as well as Gaussian process model. In addition, SEI is computed as the sample expectation of the improvement function without any distributed assumption. Thus if the Gaussian assumption is satisfied, it is not surprised

Table 2 Summary of optimal values obtained by BaRBF, G-MSRBF and EGO with 60 replications of the Ronkkonen function. (The run size of the initial design is 16, and the optimal value of the Ronkkonen function is 0.4777.)

that EI can be more efficient in getting the global optimal point. In fact, when we monitor the search process of BaRBF, sometimes it may stay in a local area for a while. This should be related to that the exploration effect of the SEI criterion does not function well.

4.2 2D Ronkkonen function

In addition, we consider another 2D objective function in R¨onkk¨onen et al. (2011), i.e.,

where = for i = 1, 2, = 4, and = (0, 0.1, 0.2, 0.5, 1)= (0, 0.5, 0.8, 0.9, 1) and the experimental region is [0, 1]. This objective function has been used as a test function in Chipman et al. (2012). Here we also restrict the function on the evenly spaced grid = [0, 0.04, . . ., 1]. The contour plot of this Ronkkonen function over the pre-specified grid points is given in Figure 4, where there are 12 local maximums and 4 global maximal points with the maximum value, 0.4777. Because of its multiple local and global optimal points, this Ronkkonen function is not as smooth as the Branin function.

In this example, the initial point sets are the same as those in Section 4.1, and the total number of explored points is set as = 16 + 30 = 46, i.e., based on 16 initial points, the search algorithm iterates 30 times by sequentially adding 30 points. Then the best value among the 46 explored points is reported. Here the goal is to identify one of the four global maximum points. The average performances of BaRBF, G-MSRBF and EGO over 60 replications are summarized in Table 2.

First, G-MSRBF performs worst in terms of the frequencies of reaching one of the four global maximum points (see the last column of Table 2). Then we focus on comparing the performance between BaRBF and EGO. From Table 2, the mean of the best function value found by BaRBF is 0.4775 with standard deviation 4.6850e-4, while the corresponding values for EGO are 0.4526 and 0.0344 respectively. We also compute the sample quantiles of the best values found by both methods. Table 2 shows that BaRBF touches the global maximum at the 50% sample quantile, while EGO reaches 0.4777 at the 75% quantile. In addition, for the BaRBF, the frequency of reaching a global optimum is 43/60, while the frequency for the EGO is 18/60. The mean value of 60 replicates, and the 5% and 95% quantile curves of the current optimal values with respect to the number of iterations for EGO and BaRBF are shown in Figure 5. The mean curve of the BaRBF moves up quickly to the global optimal value, while the curve for EGO moves up more slowly. In addition, the 5% quantile curve for EGO does not get much improvement before adding 25 explored points, i.e., N = 16 + 25 = 41, while the same curve for BaRBF moves up quickly and gets close to the global optimal value after adding 15 points, i.e., N = 16 + 15 = 31. Another attractive feature for BaRBF is the extremely low standard deviation (see the std column), which may suggest that the BaRBF can perform stably over repeated implementations. In summary, the BaRBF outperforms the EGO in this example.

Chipman et al. (2012) have pointed that when the test function has multiple global optima, the EI criterion might lead the search process to jump out of the neighborhood of one global optimum to another one and cannot stick around one global optimum. This may be explained by the fact that the EI criterion tries to minimize the prediction uncertainties among different global optima. The 5% quantile curve for EGO in Figure 5 seems to support this point. For the BaRBF, by using SEI, it can quickly locate one neighborhood of a global maximum and then identify the best value. In addition, since this Ronkkonen function is not as smooth as the Branin function, the normality assumption and the interpolation property of the Gaussian process may not give advantages for the surrogate construction. For the BaRBF, the surrogate model consists of additive radial basis functions and their parameters are simultaneously adjusted via the proposed Bayesian approach. Thus our surrogate construction approach may be more advantageous for non-smooth test functions.

5 Simulation Studies for Three and Four Dimensions

In addition to these two different 2D test functions, we consider 3D and 4D examples to illustrate that the proposed BaRBF can deal with higher dimension cases and also compare its performances with EGO and G-MSRBF.

5.1 3D Ronkkonen Function

The Ronkkonen function defined in R¨onkk¨onen et al. (2011) is quite general and can be extended to different dimensions. Here we consider a 3D Ronkkonen function which is generalized from the 2D function in Section 4.2 and is shown

below,

where = for i = 1, 2, 3, = 4, and = (0, 0.1, 0.2, 0.5, 1); = (0, 0.5, 0.8, 0.9, 1); = (0, 0.6, 0.7, 0.9, 1). The experimental region considered here is [0, 1]. According to R¨onkk¨onen et al. (2011), there are 5local maximum points and 27 of them are the global maximum points.

We divide the experimental region into (26)grid points by setting the grid size as 0.04. For this grid, there is only one global maximum point (0.3200, 0.6800, 0.4400) with the function value 0.3584. We start by choosing the initial maximin LHD with 50 points for the 3-dimensional case. After the initial stage, we run BaRBF for 50 iterations. To implement BaRBF, we follow the same RBF set-up in Section 4.1. That is we set the scale parameter s with all and fix the center parameter ’s at the explored points. For the tuning variable C, we fix C as 15. The performance of the BaRBF is measured by the maximum function value identified within 100 explored points and is summarized based on 20 replications by independently re-generating the initial design points. For the comparison purpose, we also implement EGO and G-MSRBF.

Table 3 is a summary of the maximum values obtained by the three approaches. G-MSRBF performs worst in this case because G-MSRBF cannot identify any true global maximum value within 20 replications. Between BaRBF and EGO, BaRBF has better performance in the average maximum function values and the frequency of reaching the global maximum point, but both approaches share similar values in the first and third quartiles, Q1 and Q3, and the median value. Because there are few replications, the maximum value identified by EGO is less than 0.34. A possible reason should be similar to what was stated in Section 4.2, namely, the Ronkkonen function is not a smooth function and EGO may jump around different local modes. Finally, the std value for BaRBF is much lower than that for the others. This is similar to what we observe in Table 2 for the 2D case and has a similar implication on the stability of the BaRBF.

5.2 4D Hartmann Function

In this section, we study the performance for the 4-dimensional Hartmann function. This function has been studied in Picheny et al. (2013) and is defined as

Table 3 Summary of maximum values obtained by BaRBF, G-MSRBF and EGO with 20 replications in the 3-dimensional case. The optimal value of the 3-dimensional Ronkkonen function is 0.3584.

where = () = (1.0, 1.2, 3.0, 3.2); A = () =

P = () = 10

this function over the experimental region [0, 1]. Here we divide the experimental region int (21)grid points by setting the grid size as 0.05. The maximum function value over these points is 3.1218.

The proposed method, BaRBF, with SEI selection criterion is used to identify the true optimal point over the pre-specified grid for this 4D function. For the parameters in radial basis function, the scale parameter s is set as and the center parameters, ’s, are fixed at the explored points. In addition, the variance, C, in the mixture normal priors of the coefficients is fixed as 10.

We start by choosing the initial maximin LHD with 50 points for the 4-dimensional case. After the initial stage, we run BaRBF for 50 iterations. Thus there are in total 100 explored points in this example. To track the performance of the proposed method, we record the maximum function values in each iteration, i.e., max). That was repeated 20 times by randomly generating the initial maximin LHD. We repeat the same procedure for GMSRBF and EGO. A summary of the maximum values obtained by the three approaches are shown in Tables 4. For this 4-dimensional Hartmann function, BaRBF outperforms G-MSRBF in terms of the frequency, the mean of the optimal values and a smaller standard deviation, and as shown in Table 4, for BaRBF, the middle 50% of values between Q1 and Q3 is extremely tiny and smaller than that for G-MSRBF. However, EGO has the best performance in this case. We think it should be related to the target function because this Hartmann function should be a smooth function and thus it is fever to the EGO approach.

Table 4 Summary of maximum values obtained by BaRBF, G-MSRBF and EGO with 20 replications in the 4-dimensional case. The optimal value of the 4-dimensional Hartmann function is 3.1218.

6 Discussions

In this session, several issues are studied. First we modify the proposed algorithm by adding a step to force the search process to jump out from a local optimal area. Then we study the effects of grid size on the performance of the proposed method. Finally, in order to reduce the computational burden for high-dimensional optimization problems, a grid-free version of the proposed method is introduced and a higher dimension example is illustrated.

6.1 A Modified Version of BaRBF

From tracing the search process of the BaRBF in the Branin function example, we found out that sometimes the BaRBF gets stuck in a local area and cannot leave the area for a while. To overcome this potential weakness, we have the following modification. To jump out of this local area, we add an additional step, called the escape step, by monitoring the search process of the current best value. That is, we record the number of consecutive non-improvement iterations, i.e., iterations for which the current best function value cannot get improved from the new explored point. Denote this number by . Once exceeds a pre-specified number, , it indicates that the BaRBF is stuck in a local area. Instead of continuing the search, we add some additional points to explore the experiment region. The additional point is chosen based on the maximin-distance criterion in the region. That is, given the current explored points, we find the point such that the union set of this point with the current explored points has the maximal value of the minimal distance between any two points in this union set. The purpose is to put additional points in the unexplored area as far away as possible from the existing points. We continue adding points until we obtain a better function value or until we add points. Then we will return to the original search procedure. A similar idea has been adopted in Regis and Shoemaker (2007) to detect if the search algorithm converges to a local optimum. We refer to this modificcation as M-BaRBF.

We implement the M-BaRBF for the Branin function by setting = = 3. For the same initial points sets and other tuning parameters, the average performance of the 60 replicates for M-BaRBF is shown in Table 5.

Table 5 Summary of optimal values obtained by M-BaRBF with Branin function. The run size of the initial design is 16, and the optimal value of the Branin function is 1.0473.

Table 6 Summary of the 60 replications for the optimal values obtained by BaRBF with grid size, 0.02, in the 2-dimensional experiment with Branin function. The run size of the initial design is 16, and the optimal value of the Branin function is 1.0473.

Compare with the results for BaRBF in Table 1. Except for the 5% sample quantile value, the M-BaRBF performs better in the mean value and median value. In addition, the frequency to reach the global optimum is 38/60 which is much higher than 29/60 for the BaRBF. The Q1 value for M-BaRBF is significantly higher than that for BaRBF and the median value touches the global maximum of the Branin function.

6.2 The Effects of the Grid Size

In the numerical examples, we suggest to pre-specify the grid size first when we implement our algorithm. This size may be chosen based on the prior knowledge. In the numerical examples in Section 4, the grid size is fixed as 0.04 or 0.05. Suppose we can consider different grid sizes for a given optimization problem. We will illustrate the effects of the grid sizes on the performance.

We revisit the 2D Branin function example in Section 4.1. Instead of setting the grid size as 0.04, we choose the finer size 0.02 and divide the region into (51)grid points which still cover the original grid, [0, 0.04, . . ., 1]. Based on this finer grid, there are still two local maxima and the global optimal point is located at [0.96, 0.16]. Since the number of grid points is now about four times that of the original, we take more iterations, 4 30 = 120, for the proposed BaRBF. Then based on the same initial points and tuning parameters, the results with 60 replications are summarized in Table 6. In this table, in addition to the results with 120 iterations, we also report the results with 30 iterations for comparison purpose.

are 1.0471, 1.0471 and 1.0473 respectively which are significantly higher than the corresponding values shown in Table 1. Obviously BaRBF(120) has the higher mean value 1.0472 and a smaller standard deviation. This may be related to the fact that there are more candidate points and larger number of iterations. Thus BaRBF can identify better function values due to finer grid and can still explore the experimental space because of a larger number of iterations. To support this guess, we also report the summary of the BaRBF with finer grid and 30 iterations, denoted by BaRBF(30). The corresponding 5%, 25% sample quantiles and the median value are still better than the corresponding values shown in Table 1. But the frequency for obtaining the global optimal point is only 17/60. It means that 30 iterations may not be large enough for BaRBF to explore the whole region and the search process may get stuck in some local areas. Thus we need to have more iterations to increase the probability to jump out of these local areas. Overall we can conclude that when we have a finer grid, a larger number of iterations should be necessary.

6.3 A Grid-free Method

When we demonstrate the performance of the proposed BaRBF in Section 4, we mention that we need to pre-specify a grid as the explored candidate point set before implementing the BaRBF. For example, we set the grid size as 0.04 for the 2 and 3 dimension cases. Suppose we consider the application in parameter tuning. In practice, the precision of each variable should be limited and due to this grid set-up, we can easily select the next explored point based on the SEI criterion; otherwise it should be treated as another non-differentialable optimization problem. However, the problem for the grid set is the curse of dimensionality. For example, when we consider a 7-dimension region, [0, 1]. Suppose we choose the grid size as 0.2. Then the number of grid points is 6= 279936 points, which is huge. To keep such huge number of grid points in our process would slow down the computational speed. Since the grid point set is used as the candidate set for selecting the next explored point, we would propose another modified BaRBF without pre-specifying grid point set.

The idea is similar to the G-MSRBF in Regis and Shoemaker (2007). Instead of fixing the grid points for the candidate point set, we randomly generate the candidate points in each iterations. Regis and Shoemaker (2007) have mentioned two conditions for this point generation procedure. One is the conditionally independent sampling condition and another one is the dense condition, that is, any point in can be chosen as a candidate point with a positive probability. Since our approach satisfies the first condition, sample the candidate points from the uniform distribution over the experimental region to satisfy the dense condition to guarantee the convergence property, i.e., when N goes to infinity, the global optimal point can be identified by such search process with high probability. Thus in our algorithm, we add one more step to generate the candidate point set from the uniform distribution over the region and then choose the next explored points by applying the SEI criterion

Table 7 Summary of maximum values obtained by grid-free BaRBF with 30 replications in the 8-dimensional case. The optimal value of the 8-dimensional Rastrigin function is 0.0000.

to . After updating the , we would continue the selection process by re-generating the new candidate points.

To demonstrate this new modified BaRBF, named the grid-free BaRBF, the Rastrigin function (Pohlheim, 2015) is taken as the objective function, i.e.,

This Rastrigin function has multiple local maxima and the locations of the local maximal points are uniformly distributed. The global optimal point is [0.5, . . . , 0.5] with maximum value 0. Here we choose d = 8, that is, we consider a 8-dimensional problem. At first the 80 initial points are chosen from a Latin hypercube design over the experimental region, = [0, 1]. The number of iteration for grid-free BaRBF is 60 and at each iteration, we uniformly sample 8,000 points from . The results based on 30 independently replications are summarized in Table 7. Figure 6 shows the 5%, 95% quantile curves and the mean values with respect to the number of iterations. Overall the results suggest that the proposed grid-free BaRBF does improve the objective function values over the whole search process; especially in the first few iterations, the improvement is significant. However, the improvement does slow down later. In addition we also tested the case of the Rastrigin function with d = 10 and the results share a similar pattern. Thus we do not show the corresponding results here to save space. The G-MSRBF is also implemented for this Rastrigin function example. Following the same set-up for generating the candidate points, the results with 30 independent replications are also summarized in Table 7. The proposed Grid-free BaRBF performs better than the G-MSRBF, especially in terms of the three sample quantiles and the mean values.

7 Conclusion and Future Work

We have proposed a global optimization framework that utilizes an adaptive RBF-based Bayesian surrogate model to approximate the true function, and to guide the selection of new points for function evaluation. There is novelty in both steps of the strategy. First, we use a hierarchical normal mixture surrogate model, where the parameters in the RBFs can be automatically updated to best approximate the true function. Second, the sample EI criterion is employed as a selection criterion. We have conducted some extensive numerical studies on standard test functions. The results demonstrate that the proposed BaRBF is more efficient and stable for searching the global maximizer compared with the G-MSRBF. For the comparison between the EGO and the BaRBF, their performance depends on the characterization of the true objective functions. Since the Branin function and the 4D Hartmann function are quit smooth, the EGO has better performances. For the 2D and 3D Ronkkonen functions, the functions are not smooth and have more multiple extreme points. The BaRBF outperforms the EGO.

There are some directions for future research. First, the point selection criterion is a key element of BaRBF. A good selection criterion is to balance the trade-off between exploitation and exploration. Currently the Sampled EI criterion is adopted in the BaRBF. When the Gaussian prediction assumption is held, the EI criterion is a weighted sum of the prediction improvement and prediction variation which can be treated as a way to balance the exploitation and exploration. However, in the BaRBF, we do not have the distribution assumption and thus how SEI to balance these two properties is uncertain. Based on the numerical results in two 2D functions, our SEI criterion tends to have the exploitation property but less effect related to the exploration, because for the smooth Branin function, BaRBF may be stuck in a local area, but BaRBF can quickly identify the global maximum close to the explored points in the example of Ronkkonen function. Thus one possibility is to add the prediction variation measurement, i.e., to quantify the prediction uncertainties by the 95% confidence interval bandwidth of ):

where )) are the upper CI and lower CI calculated as the 97.5% and 2.5% quantiles of the posterior samples (x). However, the problem should be how to integrate the SEI and CIB together. In addition, there should be a data-driven tuning procedure for hyper-parameters in BaRBF, e.g., the value of C in the mixture normal prior. The tuning procedure suggested in Chen et al. (2017) might serve as a starting point.

In BaRBF, to simplify our algorithm, we fix the center parameters as the explored points . Thus it only depends on the current N explored points and is independent of the dimension of the region V and the resolution of the grid. Since we need to pre-specify the candidate point set for next point selection, in this paper, we fix the candidate points as the whole grid. However, when the number of grid points is large, it will not be efficient to consider the whole grid. Instead we propose a grid-free version by iteratively generating the candidate points from uniform distribution over the experimental region, and then implement the procedure over this candidate set to identify next explored points. Based on the theoretical results in Regis and Shoemaker (2007), the convergence property can hold when the number of explored points goes to infinity. In addition, to identify the true optimal point, an adaptive BaRBF method can be considered as follows. At each iteration, we refine the current grid locally based on the hot spot areas identified from the surrogate surface, and then re-run BaRBF in these local areas independently. Take the Branin function example in Figure 2 for illustration. In Figure 2(f), we can identify three hot spot areas. Then we can choose three smaller disjoint regions with finer grid to cover each hot spot area, and then individually implement BaRBF for each region. We can continue this procedure until the grid size in each region is small enough. In fact, this adaptive procedure is similar to the Multistart Local MSRBF (ML-MSRBF) in Regis and Shoemaker (2007) and the two versions can be integrated. Following the ML-MSRBF, we can treat the area around the current best value as the local hot spot and restart the BaRBF by choosing candidates from a normal distribution centered at the best point in this local hot spot with a certain variance. After some number of iterations, we may identify another best point and restart the BaRBF again. Repeat this procedure until the stopping criterion is met.

Finally we mention a possible approach for tuning the value of C in our proposed procedure. In Section 3, we give the guideline that we should choose C larger than 0. But how large is the value of C? According to the Bayesian variable selection literature, C can be tuned via the cross validation approach. Based on this approach, we can consider the following suggestion. After collecting the initial observations, we may tune the value of C via cross validation and keep this value for a certain number of iterations. Then we can re-tune the value of C via cross validation again.

Acknowledgements Chen’s research is supported by Ministry of Science and Technology (MOST) of Taiwan 104-2918-I-006-005 and the Mathematics Division of the National Center for Theoretical Sciences in Taiwan. Wu’s research is supported by ARO W911NF-17-1-0007 and NSF DMS-1564438.

References

Andrieu, C., De Freitas, N., and Doucet, A. (2001). Robust full bayesian learning for radial basis networks. Neural Computation, 13(10):2359–2407.

Bishop, C. M. (2006). Pattern Recognition and Machine Learning. New York: NY, Springer.

Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. New York: NY, Cambridge University Press.

Buhmann, M. D. (2003). Radial Basis Functions: Theory and Implementations, volume 12. New York: NY, Cambridge University Press.

Chen, R. B., Wang, W., and Wu, C. F. J. (2011). Building surrogates with overcomplete bases in computer experiments with applications to bistable laser diodes. IIE Transactions, 43(1):39–53.

Chen, R.-B., Wang, W., and Wu, C. F. J. (2017). Sequential designs based on bayesian uncertainty quantification in sparse representation surrogate modeling. Technometrics, 59(2):139–152.

Chipman, H., Hamada, M., and Wu, C. F. J. (1997). A bayesian variable- selection approach for analyzing designed experiments with complex aliasing. Technometrics, 39(4):372–381.

Chipman, H., Ranjan, P., and Wang, W. (2012). Sequential design for com- puter experiments with a flexible bayesian additive model. The Canadian Journal of Statistics, 40:663–678.

Fasshauer, G. E. and Zhang, J. G. (2007). On choosing “optimal” shape parameters for rbf approximation. Numerical Algorithms, 45(1-4):345–368.

George, E. I. and McCulloch, R. E. (1993). Variable selection via gibbs sam- pling. Journal of the American Statistical Association, 88(423):881–889.

Gutmann, H. M. (2001). A radial basis function method for global optimiza- tion. Journal of Global Opt, 19(3):201–227.

Jones, D. R. (2001). A taxonomy of global optimization methods based on response surfaces. Journal of Global Opt, 21(4):345–383.

Jones, D. R., Schonlau, M., and Welch, W. J. (1998). Efficient global optimiza- tion of expensive black-box functions. Journal of Global Opt, 13(4):455–492.

Koutsourelakis, P. S. (2009). Accurate uncertainty quantification using in- accurate computational models. SIAM Journal on Scientific Computing, 31(5):3274–3300.

Mockus, J., Tiesis, V., and Zilinskas, A. (1978). The application of bayesian methods for seeking the extremum. Towards Global Optimization, 2:117– 129.

Morris, M. D. and Mitchell, T. J. (1995). Exploratory designs for computer experiment. J. Statist. Plann. Inference, 43:381–402.

Oefelein, J. C. and Yang, V. (1998). Modeling high-pressure mixing and com- bustion processes in liquid rocket engines. Journal of Propulsion and Power, 14(5):843–857.

Picheny, V., Wagner, T., and Ginsbourger, D. (2013). A benchmark of kriging- based infill criteria for noisy optimization. Structural and Multidisciplinary Optimization, 48(3):607–626.

Pohlheim, H. (2015). Geatbx examples: Examples of objective functions. http://www.geatbx.com/download/GEATbx_ObjFunExpl_v37.pdf.

Regis, R. G. and Shoemaker, C. A. (2007). A stochastic radial basis func- tion method for the global optimization of expensive functions. INFORMS Journal on Computing, 19(4):497–509.

R¨onkk¨onen, J., Li, X., and Kyrki, V. (2011). A framework for generating tunable test function for multimodal optimization. Soft Comput, 15:1689– 1706.

Santner, T. J., Williams, B. J., and Notz, W. I. (2013). The Design and Analysis of Computer Experiments. Springer Science.

Torn, A. and ˇZilinskas, A. (1989). Global Optimization. Springer-Verlag, Berlin, Heidelberg.

ˇZilinskas, A. (2010). On similarities between two models of global optimization: statistical models and radial basis functions. Journal of Global Opt, 48:173– 182.

ˇZilinskas, A. and Zhigljavsky, A. (2016). Stochastic global optimization: a review on the occasion of 25 years of informatica. Informatica, 27:229–256.

Zellner, A. (1986). On assessing prior distributions and bayesian regression analysis with g-prior distributions. Bayesian Inference and Decision Tech-

niques: Essays in Honor of Bruno De Finetti, 6:233–243.

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Fig. 2 The contours of surrogate model using BaRBF. Each of the six plots corresponds to a surrogate model with N = 16, 21, 26, 31, 36, 41 respectively. (Explanation of symbols is given in the text.)

Fig. 3 The mean value (solid line) and the 5% and 95% quantiles (dashed line) of current optimal values based on 60 replications for the example of Branin function. Upper panel: G-MSRBF, lower panel: BaRBF.

Fig. 4 The contour plot of the Ronkkonen function on [0with grid size 0.04. The red triangle represents the global optimum and 12 red crosses denote the other two local optima.

Fig. 5 The mean value (solid line) and the 5% and 95% quantiles (dashed line) of current optimal values based on 60 replications, Ronkkonen function. Upper panel: EGO, lower panel: BaRBF.

Fig. 6 Three lines are the 5% quantile; mean value and 95% quantile of current optimal values based on 30 replications for the example of Rastrigin function.