Adaptive Design of Experiments for Conservative Estimation of Excursion Sets

2016·Arxiv

Abstract

Abstract

We consider the problem of estimating the set of all inputs that leads a system to some particular behavior. The system is modeled by an expensive-to-evaluate function, such as a computer experiment, and we are interested in its excursion set, i.e. the set of points where the function takes values above or below some prescribed threshold. The objective function is emulated with a Gaussian Process (GP) model based on an initial design of experiments enriched with evaluation results at (batch-) sequentially determined input points. The GP model provides conservative estimates for the excursion set, which control false positives while minimizing false negatives. We introduce adaptive strategies that sequentially select new evaluations of the function by reducing the uncertainty on conservative estimates. Following the Stepwise Uncertainty Reduction approach we obtain new evaluations by minimizing adapted criteria. Tractable formulae for the conservative criteria are derived, which allow more convenient optimization. The method is benchmarked on random functions generated under the model assumptions in different scenarios of noise and batch size. We then apply it to a reliability engineering test case. Overall, the proposed strategy of minimizing false negatives in conservative estimation achieves competitive performance both in terms of model-based and model-free indicators.

Keywords: Batch sequential strategies; Conservative estimates; Stepwise Uncertainty Reduction; Gaussian process model.

1 Introduction

The problem of estimating the set of inputs that leads a system to a particular behavior is common in many applications, such as reliability engineering (see, e.g., Bect et al., 2012; Chevalier et al., 2014a), climatology (see, e.g., French and Sain, 2013; Bolin and Lindgren, 2015) and many other fields (see, e.g., Bayarri et al., 2009; Arnaud et al., 2010). Here we consider a system modeled as a continuous, expensive-to-evaluate function where X is a compact subset of . Section 5 shows an example of such systems. Given a

few evaluations of f and a fixed closed set , we are interested in estimating

In the motivating test case in section 5, for example, Γ(f) represents the safe region, i.e. all values of the physical parameters that lead the system of interest to a subcritical response, taking

There is much heterogeneity in the literature on how to name Γ(f). Here we follow Adler and Taylor (2007) and we call Γ(is often also called excursion set above t (see, e.g., Aza¨ıs and Wschebor, 2009; Bolin and Lindgren, 2015), but also level set (Berkenkamp et al., 2017). If ] the set is sometimes referenced as sojourn set (Spodarev, 2014) or sublevel set (Gotovos et al., 2013). Our work is primarily motivated by the case ), however it may also applied to

Throughout the article, we model f as a realization of a Gaussian process (GP) and, following Sacks et al. (1989) and Santner et al. (2018), we emulate f with the posterior GP distribution given the available function evaluations. The posterior GP distribution can be used as building block for different estimates of Γ(f), see, e.g., Azzimonti (2016).

Consider now a generic estimate ) and denote with vol(A) the volume of Simple quality indicators for a set estimate are the volume of false positives vol()), i.e. the volume of points estimated in the set while actually outside Γ(f), and the volume of false negatives vol(Γ(Γ), i.e. the volume of points estimated not in the excursion set while actually inside. For example, in Chevalier (2013) and Chevalier et al. (2014a), Γ(f) is estimated with the Vorob’ev expectation, a notion borrowed from random set theory (Molchanov, 2005, Chapter 2), that aims to minimize the overall volume of misclassified points. Figure 1a shows an analytical example where the input space is X = [0, 1] and the function f is generated as a realization of a GP (purple dashed line) with mean zero and Mat´ern covariance kernel with hyper-parameters see, e.g., Rasmussen and Williams (2006), Chapter 4, for details on the parametrization. We build a GP model (black solid line) from n = 10 evaluations of f (black triangles) chosen with a maximin Latin hypercube sample (LHS) design and we estimate Γ(f) (purple dotted horizontal line), where ), with the Vorob’ev expectation (, middle horizontal blue line); a comparison of with the true excursion set shows that here has volumes of false positive (0.084) and negatives (0.025) of the same order of magnitude.

The Vorob’ev expectation, explained in more details in section 2.1, gives a similar importance to false positives and false negatives. However, in a number of applications, the cost of misclassification is not symmetric with higher penalties for false positives, for instance, than for false negatives. Practitioners may be interested in set estimates which would very likely be included in an excursion set of the form Γ(Such a property naturally gives more importance to the minimization of (the volume of) false positives than of false negatives. French and Sain (2013) and Bolin and Lindgren (2015) introduced the concept of conservative estimates which select sets that are deliberately smaller – in volume – than Γ(f) and are included in the excursion set with a large . The empty set trivially satisfies this probabilistic inclusion property, therefore conservative estimates are selected as sets with maximal volume in a family of possible estimates. A conservative estimate at level 1 thus enforces a low probability of false positives.

In a reliability engineering framework, the excursion set can be the set of safe config-urations and a conservative estimate aims at selecting a region which is included in the safe set. Figure 1a shows a conservative estimate at level , green top horizontal line). In this example, CEhas a false positive volume equal to zero, however a much higher volume of false negative (0.121) than the Vorob’ev expectation. For a fixed threshold t, the excursion set above t is trivially the complement of the sojourn set below t. Note, however, that this does not hold for their respective set estimates. In particular,

Figure 1: Example of excursion set estimation. Comparison of Vorob’ev expectation (and conservative estimate (CE). The numbers near the new points indicate the order in which they were added to the DoE.

the conservative estimate of an excursion set is not the complement of the conservative estimate of the corresponding sojourn set due to the probabilistic inclusion property.

French and Sain (2013) and Bolin and Lindgren (2015) proposed an approach to compute conservative estimates for a fixed Design of Experiments (DoE). However, to the best of our knowledge, there is no study on how to reduce the uncertainty on conservative estimates with adaptive strategies. Here we focus on the problem of sequentially choosing evaluation points in order to reduce the uncertainty on conservative estimates. In order to illustrate this concept, consider the example introduced in figure 1a and notice how conservative set estimates do not intercept the excursion near x = 0.8. In this case we can increase the size of our design of experiments by adaptively choosing new evaluations of f that help us better localize the excursion. Figure 1b shows an example of such adaptive DoEs where, starting from the initial DoE in figure 1a, 10 additional points are selected with strategy C introduced in section 3.

Previous adaptive DoE strategies for excursion set estimation mainly focused on recovering the boundaries of the set. In particular, Picheny et al. (2010) introduced the targeted IMSE (integrated mean squared error), tIMSE, criterion to add points at locations that improve the accuracy of the model around a certain level of the response variable. Bect et al. (2012) investigated the concept of Stepwise Uncertainty Reduction (SUR) strategies for GP (see also Vazquez and Bect, 2009; Chevalier et al., 2014a; Bect et al., 2017). Such strategies, however, do not provide any control on false positives and as such are not adapted to the conservative estimation case. Here, by shifting the focus on the control of false positives, we extend the conservative estimation framework introduced by Bolin and Lindgren (2015) to sequential design of experiments. For example, notice how in figure 1b, some points (e.g. numbers 1, 2, 8) are chosen far from the boundary, in order to improve the confidence on the classification of those regions. Here we consider a definition of conservative estimates well suited to excursion sets of Gaussian processes and we provide a SUR strategy with tractable criteria to reduce the uncertainty on conservative estimates. The adaptive strategies are introduced in the case of excursion sets above our R implementation, available on CRAN, allows also for excursions below t.

1.1 Outline of the paper

The remainder of the paper is structured as follows. In the next section we briefly recall some background material. In particular, section 2.1 reviews set estimates preliminary to this work and section 2.2 recalls the concept of SUR strategies. In section 3.1 we introduce the metrics used to quantify uncertainty on such estimates. In section 3.2, we detail the proposed sequential strategies and, in appendix A, we derive fast-to-compute formulae for the associated criteria and illustrate their implementation. Section 4 presents a benchmark study where we analyze a trade-off between noisy evaluations and batch size in three scenarios. Section 5 shows the results obtained on a reliability engineering test case. In appendix B we provide more properties for conservative estimates that further justify the choices made in section 2.1. In supplementary material we also apply the proposed strategies on a coastal flood problem. All proofs are in appendices A and B.

2 Background

Let us consider n observations of the function f, possibly tampered by measurement noise

with independent realizations of standard Gaussian measurement noise and homogeneous noise variance.

In a Bayesian framework (see, e.g., Chil`es and Delfiner, 2012, and references therein) we consider f as a realization of an almost surely continuous Gaussian process (GP) ), with mean function ] and covariance function Cov(The mean function could potentially have a structure such as are parameters to be estimated and are known basis functions. With this notation, is a realization of 0, we denote by the observations at an initial design of experiments (DoE) . The posterior distribution of the process is Gaussian with mean and covariance computed as the conditional mean conditional covariance given the observations, see, e.g., Santner et al. (2018), Chapter 4, for closed-form formulae.

2.1 Vorob’ev expectation and conservative estimates

The prior distribution on induces a (random) set Γ(omit the dependency on when obvious and refer to Γ() as Γ when appropriate. By using the posterior distribution of , we can provide estimates for Γ(f). See, e.g. Chevalier et al. (2014a), Bolin and Lindgren (2015) and Azzimonti (2016) for summaries of different approaches. A central tool for the approach presented here is the coverage probability

function of a random closed set Γ, defined as

In our case we consider the posterior coverage function , defined with the posterior probability

Table 1: Summary values for example in figure 2, estimated from 100 GP realizations.

) is the CDF of a standard Normal random variable and

). The coverage function defines the family of

with 1]. These sets are closed for each 1] (see Molchanov, 2005, Proposition 1.34) and form a family of possible estimates parametrized by

The level can be selected in different ways. The choice 5 leads to the median, which is not conservative. (Vorob’ev, 1984; Molchanov, 2005; Chevalier et al., 2013) relies on the notion of measure. In the example in figure 1b and in the applications presented here we use the standard volume, however here we introduce the concept in a slightly more general form by using a finite measure , for example, be a probability distribution on X. The Vorob’ev expectation is defined as the quantile is also the minimizer of among all measurable sets such that (Γ)], see, e.g., Molchanov (2005, Theorem 2.3, Chapter 2). Vorob’ev expectation minimizes a uniformly weighted combination of the expected measure of false positives (Γ)], also called type I error) and false negatives () among sets with measure equal to (Γ)]. In appendix B.2 we prove a similar result for generic Vorob’ev quantiles. The quantity )], for two random sets Γ, is often called expected distance in measure. Chevalier (2013) used this distance to adaptively reduce the uncertainty on Vorob’ev expectations. In section 3.1 we adapt it for conservative estimates.

Figure 2: 1-dimensional example, n = 10 evaluations. Type I (red, diamonds) and Type II (green, triangles) errors for Vorob’ev expectation and conservative estimate.

Conservative estimates (Bolin and Lindgren, 2015; French and Sain, 2013) embed probabilistic control on false positives in the estimator. Denote with C a family of closed subsets

in ) is a set CE

The set CEis therefore a maximal set (according to ) in the family C such that the posterior probability of inclusion is at least . Here, by following French and Sain (2013), Bolin and Lindgren (2015) and Azzimonti and Ginsbourger (2018), we choose C as the family of Vorob’ev quantiles as introduced in equation (3). While the concept of probabilistic inclusion might seem unusual at first, conservative estimates are actually linked with the well known concept of confidence regions, as we briefly show in appendix B.1. Note further that the condition the probability of false positives. We can visualize this property on the one dimensional example introduced in figure 1 by empirically estimating the expected measure of false positives, Γ)]. Figure 2 shows 80 posterior realizations of the GP (light dashed black lines) and for each realization we computed the false positive (type I error, horizontal red lines) and false negatives (type II error, horizontal green lines). Notice how they are symmetrically minimized by the Vorob’ev expectation (figure 2a) while the conservative estimate with 95 (figure 2b) has small false positives and much larger false negatives. Table 1 reports the values for the expected volume of type I and II errors and the estimated probability of inclusion, ˆΓ). The Vorob’ev expectation may be closer to the truth than conservative estimates, especially for small DoEs, however CEgives control on the probability of false positives. Table 1 also reports the values for the Vorob’ev quantile , i.e. a non adaptive high quantile choice for . Note that fact, the quantile’s definition based on the marginal probability not imply any statement on the joint probability of inclusion.

The computation of CEin equation (4) requires finding a set C of maximum measure among sets included in the random set Γ with probability at least family of Vorob’ev quantiles 1], this optimization can be solved with a simple dichotomic search on . See appendix B.2 for more details. If we approximate is a set of large. The probability above is then computed efficiently with the integration technique proposed by Azzimonti and Ginsbourger (2018). The number generally chosen as large as the computational budget allows. This technique can also be used for excursion sets above t. An alternative method, not used here, is Monte Carlo with conditional realizations of the field, see, e.g. Azzimonti et al. (2016) for fast approximations of conditional realizations. The optimal Vorob’ev level chosen for conservative estimates at level is denoted by in what follows : CE

2.2 SUR strategies

Sequential design of experiments adaptively chooses the next evaluation points according to a strategy with the aim of improving the estimation of a quantity (or set, here) of interest. As shown in the introduction we can improve the set estimates in figure 1a by carefully adding new function evaluations, see figure 1b. There are many ways to build a sequential DoE, see, e.g., Santner et al. (2018), chapter 6. Here we follow the Stepwise Uncertainty Reduction approach (SUR, see, e.g., Fleuret and Geman, 1999; Bect et al., 2012; Chevalier et al., 2014a; Bect et al., 2019) and select a sequence of points in order to reduce the uncertainty of a quantity of interest.

In the remainder of the paper we consider that the first respective evaluations are known and we denote by ] the expectation conditional on We are interested in selecting the next batch of q locations . The advantage of batches with q > 1 lies in the fact that parallel function evaluations, when available, can save the user wall-clock time. In a sequential setting the response values at these points are unknown before evaluations, therefore we denote by ] the conditional expectation given the first n evaluations and with the next locations fixed at

For a specific problem, we consider a quantity, denoted by , which measures the residual uncertainty at step n. We define this quantity for the conservative estimation problem in section 3.1. If the first n locations and evaluations are known, then (deterministic) real number quantifying the residual uncertainty on the estimate. As an example, consider the setup in figure 1 and the uncertainty

Figure 3: Adaptive DoE with SUR strategy on the example introduced in figure 1.

can compute 23. On the other hand, the quantity , seen from step n, is random because is random. The next batch of q locations can then be selected following the principles of a SUR strategy, i.e. by

setting

a minimizer of the future uncertainty in expectation. For a more complete and theoretical perspective on SUR strategies see, e.g., Bect et al. (2019) and references therein. There are many ways to proceed with the minimization introduced above. See, e.g., Osborne et al. (2009), Ginsbourger and Le Riche (2010), Bect et al. (2012), Gonz´alez et al. (2016) and references therein. The objective function in equation (5) is a batch sequential one-step

lookahead sampling criterion and is denoted by

We can build a SUR strategy with the uncertainty )] mentioned above. The criterion associated with this uncertainty has the remarkable property that it can be computed with fast-to-evaluate formulae, thus making its optimization more convenient. Figure 3a shows the function is chosen as the minimizer of this function restricted to a finite discretization of the domain. Figure 3b shows the updated GP model and CEwhich, compared to figure 1a,

is now larger while still included inside the true set.

The expectation can only be computed if we know K which is often chosen from a parametric family depending on few hyper-parameters, in the analytical example. In practice, the hyper-parameters are unknown and can be estimated with a plug-in or with a fully Bayesian approach. In this work we follow the previous literature on boundary estimation with GP models (see, e.g. Ranjan et al., 2008; Picheny et al., 2013; Chevalier et al., 2014a; Azzimonti and Ginsbourger, 2018) and we use plug-in maximum likelihood estimates for the hyper-parameters. Model checking procedures, such as cross-validation, can be used to evaluate the robustness of hyper-parameters’ estimates. If only few observations are available, a fully Bayesian approach might better capture the overall uncertainty. However such an approach is not straightforward for many SUR criteria (see, e.g., Stroh, 2018) and it involves an additional computational cost.

In the next section we detail two uncertainty functions tailored for conservative estimates and we show how their respective SUR criteria can be computed.

3 SUR strategies for conservative estimates

3.1 Uncertainty quantification on conservative estimates

Our object of interest is Γ(f), therefore we require uncertainty functions that take into account the whole set. Chevalier et al. (2013) and Chevalier (2013) evaluate the uncertainty on the Vorob’ev expectation with the , i.e. the expected distance in measure between the current estimate and the set Γ. In this section we introduce an uncertainty suited for conservative estimates. The idea is to describe the uncertainty by looking at the expected measure of false negatives. In the example of figure 2b, this quantity is the mean measure of the sets in green. Expected distance in measure and false negatives are related concepts and, in order to highlight this connection, let us first recall

(Chevalier, 2013) that the Vorob’ev deviation of a quantile

In the following sections, will be chosen either as the Vorob’ev median, the threshold for a conservative estimate at level evaluations,

Let us denote by ) the random variables associated with the measure of the first and the second set difference in equation (6) and recall that their expectations, i.e. )], are called Type I and Type II errors respectively. Type II error provides a quantification of the residual uncertainty on the conservative estimate; we formalize this concept with the following definition.

Definition 1 (Type II uncertainty). Consider the Vorob’ev quantile corresponding

to the conservative estimate at level Type II uncertainty is defined as

This definition of residual uncertainty is reasonable for conservative estimates because they aim at controlling the error )]. In particular it is possible to show that the ratio between the Type I error and the measure of a conservative estimate is bounded by a constant which is close to zero when is close to one.

Proposition 1. Consider the conservative estimate , then the ratio between the error and the measure is bounded by 1

Proof. See appendix B.

If the posterior GP mean provides a good approximation of the function f, conservative estimates with high tend to be inside the true set Γ(f). In such situations the Type I error is usually very small while Type II error could be rather large. Note the differences in Type I/II errors reported in table 1 for the analytical example. Type II uncertainty is thus a relevant quantity when evaluating conservative estimates. In the test case studies we also compute the expected type I error to check that it is consistently small.

Figure 2b provides a visualization of type II uncertainty: the green horizontal lines are realizations of Γ obtained from posterior GP draws. Type II uncertainty is the expected value of the measure of such sets. In the example shown, this amounts to 0.187. Consider now the updated GP estimate in figure 1b where 10 new points were added to the initial DoE by following an adaptive strategy that will be described later. Figure 4a

Figure 4: 1d example, DoE with initial points of figure 1b plus 10 adaptive new points.

shows how the type II uncertainty is reduced in the updated model: the green horizontal lines are much shorter, resulting in a smaller expected measure of 0

The expectation and integration operators can be exchanged in equation (7), therefore

type II uncertainty can be further written as

where denotes the indicator function of the complement of set A. Figure 4b plots the integrand in equation (8) for the example in figure 1b with n = 10, after the initial LHS design, (black solid line), and for n = 20 (red dashed line), after 10 points were added with the same adaptive strategy as in figure 4a. The large bumps shown for n = 10 are reduced to small spikes after the new points are added in appropriate locations.

3.2 SUR criteria

Suppose that the first n locations and their respective function evaluations are known. Here we introduce one-step lookahead SUR criteria for conservative estimates based on the measures of residual uncertainty previously introduced. In a sequential algorithm we minimize such criteria to select the next batch of q > 0 locations

Since the locations and the responses are unknown, the

uncertainty and the conservative level are random variables. The criteria intro- duced below (equations (9) and (10)) are properly defined for , the conservative level after n + q evaluations, however, in that case the expectations involved can only be computed with an expensive Monte Carlo procedure. The criteria’s implementations use the last known level which allows to expand the criteria in fast-to-evaluate formu- lae. We consider two sampling criterion based on the uncertainty functions in equations (6) and (7).

The criterion is defined as

for is the Vorob’ev quantile obtained with n + q evaluations of the function at level , the conservative level obtained with n eval- uations. This is an adaptation of the Vorob’ev criterion introduced by Chevalier (2013) based on the Vorob’ev deviation (Vorob’ev, 1984; Molchanov, 2005; Chevalier et al., 2013). In Chapter 4.2, Chevalier (2013), derives the formula for this criterion for the Vorob’ev expectation, i.e. the quantile at level

Note that each evaluation of ) requires calculating the expectation could, in principle, be achieved with a Monte Carlo procedure that draws samples from the posterior distribution of Z, generate posterior samples for Γ and uses such samples to empirically evaluate the expectation. This procedure however could potentially be very costly and, since many evaluations of are required in order to find its minimum, we provide in appendix A, proposition 2 a fast-to-evaluate formula to compute this criterion for any

In the case of conservative estimates with high level , each term of equation (6) does not contribute equally to the expected distance in measure, as observed in proposition 1. It is thus reasonable to consider the following criterion.

Definition 2 (Type II criterion), the Vorob’ev quantile from n+q eval- uations with , the conservative level obtained with n evaluations. The Type II criterion

Figure 5: Comparison of (Bect et al., 2012) and on the example in figure 1.

is defined as

Proposition 3, appendix A, provides a fast-to-evaluate formula for (10).

The criteria are implemented in this work with a plug-in approach for covariance hyper-parameters, i.e. at each step the hyper-parameters Θ are estimated with maximum likelihood. A fully Bayesian approach would lead to a higher degree of conservativeness for the final estimate, as the hyper-parameter uncertainty would be accounted for. The formulae introduced in propositions 2 and 3 could be adapted to a fully Bayesian approach, however their evaluation requires advanced Monte Carlo techniques (Stroh, 2018) and it will be a future topic of research.

Figure 5 shows a comparison of estimated type I and type II errors obtained with strategy (Bect et al., 2012, equation (23)) and with strategy in the experimental setting of figure 1. Note how for the Vorob’ev expectation (blue lines) the two strategies (dashed or solid lines) produce very similar results, however for conservative estimates (green + lines), (solid lines) reduces type II error faster than (dashed lines).

Table 2: MC function evaluation scenarios, total cost

3.3 Implementation details

Propositions 2 and 3, in appendix A, provide fast-to-evaluate expressions for the criteria, however their computation requires numerical approximations. See appendix A for more details. New points are obtained by minimizing numerically the selected criterion, we use the genetic algorithm using derivatives of Mebane and Sekhon (2011) to solve the optimization problem.

The strategies are implemented in the R programming language (R Core Team, 2018) in the package KrigInv (Chevalier et al., 2014c). The function EGIparallel in KrigInv produces adaptive designs such as the one in figure 1b by automatically optimizing the criterion interfaces with DiceKriging (Roustant et al., 2012) for GP modeling, rgenoud (Mebane and Sekhon, 2011) for the optimization routine and anMC (Azzimonti and Ginsbourger, 2018) for conservative estimates. See algorithm 1, in supplementary materials.

4 Numerical benchmark for batch-sequential criteria

4.1 Noisy function evaluation scenarios

In this section we consider a synthetic numerical study that shows a practical use for batch-sequential criteria. The implementation of uncertainty quantification for expensive-to-evaluate computer experiments is often run on cloud computing platforms. When using such platforms, practitioners often have a fixed total computational budget which is determined, for example, by the money/time available to run experiments. Resources can be deployed in parallel by creating new computational nodes or sequentially by employing one node for longer time. Nodes are often virtual on such platforms, so there is no restriction on the number of parallel units available.

Here we consider how to allocate resources in order to provide a conservative estimate and reduce the uncertainties for the set Γ(f) in (1). In our setting the evaluations of the function f are approximated with Monte Carlo sampling, i.e. for i = 1, . . . , n, we obtain a value are realizations of i.i.d. Gaussian random variables with zero mean and variance . The number of Monte Carlo samples, fixed before the experiment starts and kept constant throughout. The observation model above can be written as with overall measurement noise cost of one observation is proportional to and for larger costs we achieve smaller variance . Note that noise variance is assumed here homogeneous, i.e. does not depend on the location . See Picheny et al. (2013) for an example of online allocation applied to the problem of minimizing a noisy function with tunable precision. We consider an adaptive strategy with k iterations of a batch sequential strategy that selects 1 new points for each iteration, i.e. n = kq function evaluations overall. The total cost of the procedure is therefore ), where we assume that the costs of training the GP and of optimizing the criterion are negligible. Since is fixed, then the choices of k and q are linked. A larger leads to more precise observations, however to a smaller overall number of evaluation n.

We study four possible allocations of resources described in table 2 which range from a “purely sequential” strategy, where at each iteration all resources are used to reduce evaluation noise at a single input location, to a batch-sequential strategy where at each iteration q=16 different locations are explored with a high noise level. Note that the second strategy is a hybrid strategy where we add 1 new evaluation with the criterion selected and 7 others with a randomized LHS space-filling criterion.

We analyze the trade-off between batch size and noise level on a synthetic test case where we assume that the function f is a realization of () with constant

Table 3: Strategies implemented in the test cases.

mean function m and Mat´ern covariance kernel K with smoothness parameter variance 2. The noise variance is described by the column in table 2 and depends on the specific scenario. The set to estimate is Γ(, an excursion above as the measure . For each scenario we consider an initial DoE of size model with zero prior mean and Matern 3/2 covariance kernel with hyperparameters and known and set to the values specified above. We select the next function evaluation with the strategies listed in table 3, where we recall that the strategy IMSE chooses the next evaluation by minimizing the integrated mean squared error of the prediction, see, e.g. Sacks et al. (1989) and tIMSE is the targeted IMSE strategy described in Picheny et al. (2010). We run each strategy for the number of iteration specified in table 2. We consider with different values for

Figure 6 shows a comparison of expected type II errors for the strategies listed in table 3 averaged over the 10 replications and the 10 initial DoEs. Scenario q = 1 shows a faster decrease in type II error for strategies B and C, however there are not enough function evaluations to reach convergence. For the other scenarios, the change in the median value of type II error is smaller than 1% between successive iterations before the final iteration. Note that in figure 6 (a), (c), (d) the differences between strategies are clear: strategies C and B are the fastest in reducing the error, followed by A and tIMSE; IMSE instead achieves the slowest error reduction. This reflects the fact that adaptive strategies tailored

Figure 6: Expected Type II error in different batch sequential scenarios.

to the problem require fewer iterations to reduce type II error. In figure 6 (b), the ranking between strategies is similar however the differences between strategies are less important. This is due to the effect of adding 7 new input points with the same space-filling strategy independently of the criterion used to select the first point.

Figure 6 shows that, in this example, the parallel scenarios provide a much faster convergence for all strategies considered. Another aspect to consider is wall-clock time: under the assumption that Monte Carlo samples can be evaluated in parallel, then all scenarios require the same wall-clock time. In some cases, however, the MC samples required for evaluating the function at one new input can be computed only on one computational node. Then the procedure with ever, for q > 1, each new input could be evaluated in parallel and the wall-clock time would become . Therefore, in case of non-parallelizable MC computations, a greatly reduced wall-clock time would also be an additional benefit of batch sequential scenarios with q > 1. Note that here the noise is set relatively low in all scenarios. In

Figure 7: GP realizations. Average proportion of points inside the excursion region.

high noise scenarios, where a strong trade-off between noise and number of evaluations is required, the situation is less clear. In supplementary material we present an example where the observations have higher noise variance. In such example, batch-sequential strategies still outperform sequential ones however the difference between scenarios is less pronounced.

4.2 Model-free comparison of strategies

The metrics presented in the previous sections are based on the GP model. In this section we compare the strategies with a simpler metric independent from the underlying model.

We consider the number of evaluation points that are selected inside and outside the excursion set. At each iteration i, this quantity is computed as the total number of points at iteration are the evaluations. Figure 7 shows the proportion of points inside the excursion set at each iteration for the three scenarios outlined in the previous section. Strategy IMSE is a space filling strategy therefore the proportion of points inside the excursion approximates the volume of excursion. In the scenario q = 1, the strategies have not yet reached convergence, therefore both B and C tend to select more points inside the set than A and tIMSE to consolidate the conservative estimation. The effect of the 7 points chosen with a space-filling strategy in scenario q = 1 + 7 is clear in figure 7b where all strategies show proportions very close to IMSE which remain stable as the iteration number grows. On the other hand, figure 7c and figure 7d again show a fast increase in the proportion of points inside for strategies B and C, however this is a transitory behavior and this proportion starts to decrease already after iterations 8 and 10 respectively. This highlights a tendency to explore the space by those strategies which was also verified visually by looking at the sequence of DoEs. Strategies A and tIMSE instead tend to choose points around the boundary of the set therefore they initially choose fewer points inside the set and even after convergence they do not show an exploratory behavior.

5 Reliability engineering test case

In reliability engineering applications, the set Γ(f) in equation (1) often represents safe inputs for a system. In such settings, it is vital to avoid flagging unsafe regions as safe.

Figure 8 shows an example of such reliability engineering applications: a test case from the French Institute for Radiological Protection and Nuclear Safety (IRSN). The problem concerns a nuclear storage facility and we are interested in estimating the set of parameters that lead to safe storage of the material. Since this is closely linked to the production of neutrons, the safety of a system is evaluated with the neutron multiplication factor produced by fissile materials, called k-effective or k-eff : ). In our application 5] with the two parameters representing the fissile material density,

PuO, and the water thickness, HO. We are interested in the set of safe configurations

where the threshold t = 0.92 was chosen, for safety reasons, lower than the true critical case (k-eff = 1.0) where an uncontrolled chain reaction occurs. Figure 8a shows the set Γ(k-eff) shaded in blue and the contour levels for the true function computed from evaluations

Figure 8: Nuclear criticality safety test case. k-eff function (left), conservative and non-conservative estimates with 15 (LHS design, middle) and 75 (15+60 strategy C) evaluations.

over a 50 50 grid, used as ground truth. The true data result from a MCMC simulation

and have a heterogeneous noise variance. Here we consider the k-eff function in figure 8 obtained from 50 50 evaluations of k-eff smoothed with a GP model that accounts for a prescribed value of noise variance provided by the simulator and considered as the true variance.

We consider a GP model with covariance function from the Mat´ern family and homogeneous noise variance estimated from the data. We choose the regularity parameter in order to represent the regularity of the underlying phenomenon. The initial DoE is a Latin hypercube sample design with triangles in figure 8a. We consider the five strategies listed in table 3 and we compare them on from the package lhs (Carnell, 2019) in R.

Figure 8b shows a conservative estimate at level 95 (shaded green) and a non conservative one (Vorob’ev expectation, shaded red) obtained from one of the 10 DoEs, the true set Γ(k-eff) is delimited in blue. Figure 8c shows that, as more evaluations are

Figure 9: Nuclear criticality safety test case, randomized initial DoEs.

available, conservative and non-conservative estimates both get closer to the true safe set. The estimates in this example are computed from 75 function evaluations, where the last 60 points were selected sequentially with strategy C.

We now test how to adaptively reduce the uncertainty on the estimate with the strategies in table 3. We run n = 20 iterations of each strategy and at each step we select a batch of q = 3 new points where k-eff is evaluated. The covariance hyper-parameters are re-estimated at each iteration. The conservative estimates are computed with the Lebesgue measure

Figure 9a shows the type II error (as percentage of the total measure of X) at the last iteration, i.e. after 75 evaluations of the function, for each initial DoE and each strategy. Strategy C achieves a median type II error 27% lower than IMSE. Strategy B median type II error is 25% lower than IMSE and strategy A’s 12% lower than IMSE.

Figure 9b shows the relative volume error as a function of the iteration number for strategies IMSE, tIMSE, A, B, C. The relative volume error is computed by comparing the conservative estimate with a ground truth for Γ(f) obtained from evaluations of k-eff on a 50 50 grid. The volume of Γ(f) computed with numerical integration from this grid of evaluations is 88.16% of the total volume of the input space. All strategies show a strong decrease in relative volume error in the first 10 iterations, i.e. until 30 evaluations of k-eff are added, and strategies B, C show the strongest decline in error in the first 5 iterations. Overall, strategy C, the minimization of the expected type II error, seems to yield the best uncertainty reductions both in terms of relative volume error and type II error.

6 Discussion

In this paper we introduced sequential uncertainty reduction strategies for conservative estimates. Such set estimates proved to be useful in a reliability engineering example, however they could be of interest in any situation where practitioners aim at controlling the overestimation of the set. The estimator CE, however, depends on the quality of the underlying GP model. Under the model, conservative estimates control, by definition, the false positive or type I error. If the GP model is not reliable then such estimates are not necessarily conservative. For a fixed model, increasing the level of confidence might mitigate this problem. We presented test cases with fixed 95, however testing different levels, e.g. 995, and comparing the results is a good practice. The computation of the estimator CE requires the approximation of the exceedance probability of a Gaussian process. This is currently achieved with a discrete approximation, however continuous approximations might be more effective.

The sequential strategies proposed here provide a way to reduce the uncertainty on conservative estimates by adding new function evaluations. They were introduced with a homogeneous noise variance observation model, however as shown in appendix A, the criteria implementations are available also in the heterogeneous noise variance case. Under such observation model, estimating the heteroskedastic noise variance structure can be challenging, see Binois et al. (2018) for more details. The numerical studies presented in the homogeneous and noise-free cases showed that adaptive strategies provide a better uncertainty reduction than generic strategies. In particular, strategy C, i.e. the criterion ), is among the best criteria in terms of Type 2 uncertainty and relative volume error in all test cases. In this work we mainly focused on showing the differences between strategies with a-posteriori measures of uncertainty. Expected type I and II errors could also be used to provide stopping criteria for the sequential strategies. Further studies on those quantities could lead to a better understanding of their the limit behavior as n

increases.

The strategies proposed in this work focus on reducing the uncertainty on conservative estimates. This objective does not necessarily lead to better overall models for the function or to good covariance hyper-parameters estimation. The sequential behavior of hyper-parameters maximum likelihood estimators under SUR strategies needs to be studied in more details and, in supplementary material, we report a small preliminary study on this aspect. On the other hand, a fully Bayesian approach, accounting for hyper-parameter uncertainty, could be used to strengthen the procedure’s overall conservativeness.

Supplementary Materials: additional test case, more details on the numerical bench-

On-line code: a git repository that allows (partial) reproducibility for the experiments

References

Adler, R. J. and Taylor, J. E. (2007). Random Fields and Geometry. Springer, Boston.

Arnaud, A., Bect, J., Couplet, M., Pasanisi, A., and Vazquez, E. (2010). ´Evaluation d’un risque d’inondation fluviale par planification s´equentielle d’exp´eriences. In 42`emes Journ´ees de Statistique de la SFdS.

Aza¨ıs, J.-M. and Wschebor, M. (2009). Level sets and extrema of random processes and fields. Wiley Online Library.

Azzimonti, D. (2016). Contributions to Bayesian set estimation relying on random field priors. PhD thesis, University of Bern.

Azzimonti, D., Bect, J., Chevalier, C., and Ginsbourger, D. (2016). Quantifying uncertain- ties on excursion sets under a Gaussian random field prior. SIAM/ASA J. Uncertain. Quantif., 4(1):850–874.

Azzimonti, D. and Ginsbourger, D. (2018). Estimating orthant probabilities of high di- mensional gaussian vectors with an application to set estimation. J. Comput. Graph. Statist., 27(2):255–267.

Bayarri, M. J., Berger, J. O., Calder, E. S., Dalbey, K., Lunagomez, S., Patra, A. K., Pitman, E. B., Spiller, E. T., and Wolpert, R. L. (2009). Using statistical and computer models to quantify volcanic hazards. Technometrics, 51(4):402–413.

Bect, J., Bachoc, F., and Ginsbourger, D. (2019). A supermartingale approach to Gaussian process based sequential design of experiments. Bernoulli, 25(4A):2883–2919.

Bect, J., Ginsbourger, D., Li, L., Picheny, V., and Vazquez, E. (2012). Sequential design of computer experiments for the estimation of a probability of failure. Stat. Comput., 22 (3):773–793.

Bect, J., Li, L., and Vazquez, E. (2017). Bayesian subset simulation. SIAM/ASA J. Uncertain. Quantif., 5(1):762–786.

Berkenkamp, F., Turchetta, M., Schoellig, A., and Krause, A. (2017). Safe model-based reinforcement learning with stability guarantees. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R., editors, Advances in Neural Information Processing Systems 30, pages 908–918. Curran Associates, Inc.

Binois, M., Gramacy, R. B., and Ludkovski, M. (2018). Practical Heteroscedastic Gaussian Process Modeling for Large Simulation Experiments. Journal of Computational and Graphical Statistics, 27(4):808–821.

Bolin, D. and Lindgren, F. (2015). Excursion and contour uncertainty regions for latent Gaussian models. J. R. Stat. Soc. Ser. B. Stat. Methodol., 77(1):85–106.

Carnell, R. (2019). lhs: Latin Hypercube Samples. R package version 1.0.1.

Chevalier, C. (2013). Fast uncertainty reduction strategies relying on Gaussian process models. PhD thesis, University of Bern.

Chevalier, C., Bect, J., Ginsbourger, D., Vazquez, E., Picheny, V., and Richet, Y. (2014a). Fast kriging-based stepwise uncertainty reduction with application to the identification of an excursion set. Technometrics, 56(4):455–465.

Chevalier, C., Ginsbourger, D., Bect, J., and Molchanov, I. (2013). Estimating and quan- tifying uncertainties on level sets using the Vorob’ev expectation and deviation with Gaussian process models. In Uci´nski, D., Atkinson, A., and Patan, C., editors, mODa 10 – Advances in Model-Oriented Design and Analysis. Physica-Verlag HD.

Chevalier, C., Ginsbourger, D., and Emery, X. (2014b). Corrected kriging update formulae for batch-sequential data assimilation. In Mathematics of Planet Earth, Lecture Notes in Earth System Sciences, pages 119–122. Springer Berlin Heidelberg.

Chevalier, C., Picheny, V., and Ginsbourger, D. (2014c). The KrigInv package: An effi- cient and user-friendly R implementation of kriging-based inversion algorithms. Comput. Statist. Data Anal., 71:1021–1034.

Chil`es, J.-P. and Delfiner, P. (2012). Geostatistics: Modeling Spatial Uncertainty, Second Edition. Wiley, New York.

Emery, X. (2009). The kriging update equations and their application to the selection of neighboring data. Comput. Geosci., 13(3):269–280.

Fleuret, F. and Geman, D. (1999). Graded learning for object detection. In Proceedings of the workshop on Statistical and Computational Theories of Vision of the IEEE international conference on Computer Vision and Pattern Recognition (CVPR/SCTV), volume 2.

French, J. P. and Sain, S. R. (2013). Spatio-temporal exceedance locations and confidence regions. Ann. Appl. Stat., 7(3):1421–1449.

Ginsbourger, D. and Le Riche, R. (2010). Towards gaussian process-based optimization with finite time horizon. In mODa 9 – Advances in Model-Oriented Design and Analysis, pages 89–96. Springer.

Gonz´alez, J., Osborne, M., and Lawrence, N. D. (2016). GLASSES: Relieving The Myopia Of Bayesian Optimisation. In 19th International Conference on Artificial Intelligence and Statistics, pages 790–799.

Gotovos, A., Casati, N., Hitz, G., and Krause, A. (2013). Active learning for level set esti- mation. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, IJCAI ’13, pages 1344–1350. AAAI Press.

Mebane, W. R. J. and Sekhon, J. S. (2011). Genetic optimization using derivatives: The rgenoud package for R. Journal of Statistical Software, 42(11):1–26.

Molchanov, I. (2005). Theory of Random Sets. Springer, London.

Osborne, M. A., Garnett, R., and Roberts, S. J. (2009). Gaussian processes for global optimization. In 3rd international conference on learning and intelligent optimization (LION3), pages 1–15.

Picheny, V., Ginsbourger, D., O., R., Haftka, R., and Kim, N. (2010). Adaptive designs of experiments for accurate approximation of a target region. ASME. J. Mech. Des., 132(7):071008–071008–9.

Picheny, V., Ginsbourger, D., Richet, Y., and Caplin, G. (2013). Quantile-based optimiza- tion of noisy computer experiments with tunable precision. Technometrics, 55(1):2–13.

R Core Team (2018). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Ranjan, P., Bingham, D., and Michailidis, G. (2008). Sequential experiment design for contour estimation from complex computer codes. Technometrics, 50(4):527–541.

Rasmussen, C. E. and Williams, C. K. (2006). Gaussian processes for machine learning. MIT Press.

Roustant, O., Ginsbourger, D., and Deville, Y. (2012). DiceKriging, DiceOptim: Two R packages for the analysis of computer experiments by kriging-based metamodelling and optimization. Journal of Statistical Software, 51(1):1–55.

Sacks, J., Welch, W. J., Mitchell, T. J., and Wynn, H. P. (1989). Design and analysis of computer experiments. Statist. Sci., 4(4):409–435.

Santner, T. J., Williams, B. J., and Notz, W. I. (2018). The Design and Analysis of Computer Experiments. Springer New York, New York, NY.

Spodarev, E. (2014). Limit Theorems for Excursion Sets of Stationary Random Fields, pages 221–241. Springer International Publishing, Cham.

Stroh, R. (2018). Sequential design of numerical experiments in multi-fidelity : Application to a fire simulator. Theses, Universit´e Paris-Saclay.

Vazquez, E. and Bect, J. (2009). A sequential bayesian algorithm to estimate a probability of failure. IFAC Proceedings Volumes, 42(10):546–550.

Vorob’ev, O. Y. (1984). Srednemernoje modelirovanie (mean-measure modelling). Nauka, Moscow, In Russian.

A Fast-to-evaluate formulae for sequential strategies

In this section we prove two propositions that allow for the computation of the criteria in equations (9) and (10). We consider a more generic observation model than equation (2), where the noise variance is heterogeneous, i.e. ) is a function of

First we extend the result in Chevalier (2013) to any level which is a function of past n observations.

Proposition 2 (Criterion

where is a fixed threshold, then the criterion defined by equation (9) can be

expanded as

where

assumed invertible, is the cumulative dis-

tribution of the bivariate centered Normal with covariance matrix Σ and Φ is the standard Normal cumulative distribution.

Proof. Recall that

Notice that, for each , the coverage function can be written as

where are defined in equation (13) and -dimensional normal

random vector. The first part of equation (15) is

where the second equality follows from equation (17) and the third from

Moreover

where Ψ is the p.d.f. of 1). By equations (14) to (16), (18) and (19) we obtain equation (12).

We provide below a formulation for the SUR criterion in equation (10) which is fast-to-evaluate and allows for faster optimization.

Proposition 3 (Type II criterion)

[is a fixed threshold, the criterion can be expanded as

Proof. The proof follows from equations (15) and (19).

The evaluation of require the computation of an integral over X with respect to . The integral can be computed with an importance sampling Monte Carlo method as in Chevalier et al. (2014c) or by fixing the integration points with space filling designs, such as a Sobol’ sequence or uniform sampling. If the dimension of X is high, the region of interest for sampling could become very small with respect to X and this would make simple Monte Carlo or importance sampling methods very inefficient. We did not observe this behavior in our experiments, however, in such cases sequential Monte Carlo (SMC) methods could provide better results. See, e.g., Bect et al. (2017) and references therein. We exploit the kriging update formulas (Chevalier et al., 2014b; Emery, 2009) for faster updates of the posterior mean and covariance when new evaluations are added.

B Properties of conservative estimates

B.1 Conservative estimates and confidence regions

and recall that a conservative estimate for Γ is chosen as the Vorob’ev quantile with ), then we have that is also the minimizer of ) under the constraint can look at as a confidence region for Γ, in the sense that it is the smallest set that contains Γwith a given probability.

In a reliability framework, if Γ is the set of safe configurations, then by selecting a conservative estimate for the safe set, we are actually selecting a confidence region for the dangerous configurations.

B.2 Conservative estimates with Vorob’ev quantiles

The conservative estimate definition in equation (4) requires a family C in which to search for the optimal set CE. In practice, it is convenient to choose a parametric family indexed by a real parameter. Here we choose , i.e. the Vorob’ev quantiles. This

is a nested family indexed by and, for each

We now detail how to compute CE, for a fixed tions. For each 1], we define the function 1] that associates to each the probability Γ). The function is non decreasing due to the nested property in equation (21). Moreover, . The computation of CEamounts to finding the smallest , which is achievable, for example, with a simple dichotomic search. The procedure above is valid for any nested family of sets indexed by a real parameter, however, the Vorob’ev quantiles, in addition, have the following property.

Proposition 4. Consider a measure and an arbitrary A Vorob’ev quantile minimizes the expected distance in measure with Γ among all measurable

Proposition 4 is an extension of Theorem 2.3, Molchanov (2005) to a generic Vorob’ev quantile. As a consequence, a conservative estimate CEcomputed with Vorob’ev quantiles minimizes the expected measure of false negatives (Γ) for fixed probability of false positives (Γ). In general, the Vorob’ev quantile chosen for CEprocedure is not the set S with the largest measure satisfying the property See supplementary material for a counterexample.

B.3 Proofs

In the following, let us denote by (Ω, F, P) a probability space.

Proof of proposition 4. We want to show that the set

for each measurable set ). Let us consider a measurable set M

such that ). For each Ω, we have

By applying the expectation on both sides, since ), we obtain

where the second equality comes from the definition of . Moreover, since

which shows that verifies equation equation (22).

Proof of proposition 1. Notice that for all Ω such that

0. By applying the law of total expectation we obtain

designed for accessibility and to further open science