We consider as reference problem the black-box optimization: the objective function and/or constraints are analytically unknown and evaluating them might be very expensive and noisy. In black-box situations as we cannot assume any prior knowledge about the objective function , any functional form is a priori admissible and the value of the function at a point says nothing about the value at other points: the only way to develop a problem specific algorithm is to assume a model of and to learn through a sample of function values.
Such an algorithm must be sample efficient, because the cost of function evaluations is the dominating cost. This problem has been addressed in several fields under different names, including active learning (Kruschke et al., 2008; Griffiths et al., 2008; Wilson et al., 2015), Bayesian Optimization (BO) (Zhigljavsky and Zilinskas, 2007), (Candelieri et al., 2018), (Archetti et al. 2019), hyperparameter optimization (Eggensperger et al., 2019) and others.
In the BO framework a surrogate model of the objective function is built to sum up our a priori beliefs about the objective function and the informative value of new observations. Two probabilistic frameworks are usually considered: the Gaussian Processes (GPs) and Random Forests (RF) which offer alternative ways to update the beliefs as new data arrives and to provide an estimate of the expected value of the objective function and the uncertainty in this estimate.
A distinction is usually drawn among them accordingly to the type of design variables: continuous ones are better dealt with GP while integer/categorical and conditional ones with RF.
In both cases the next sampled point is chosen on the basis of its informative value through the maximization of an acquisition function (also called infill): this choice brings up the so called βexploration vs exploitation dilemmaβ, where exploration means devoting resources to know more about possible solutions while exploitation devotes resources to improve on solutions already identified in previous phases. The search for the new point must strike an effective balance between the needs of exploration and exploitation.
Psychologists have extensively studied how humans balance exploration and exploitation (Krusche et al., 2008), (Mehlhorn et al., 2015), with a recent attention on the links between modern machine learning algorithms and psychological processes. (Gershman, 2018; Schulz et al., 2016; Gopnik et al., 2017). Psychological research has mostly focused on how people learn functions according to a protocol in which an input is presented to participants and they are asked to predict the corresponding output value. Then they observe the true output value (usually noisy) in order to update their own βpredictive modelβ that is to adjust their internal representation of the underlying function. Psychologists have largely focused on GP: the issue of GP regression, kernel composition for different degrees of smoothness and safe optimization in their relation to cognition is studied in a recent survey by (Shultz et al., 2018). Directed exploration is realized by adding the so-called uncertainty bonuses to estimated values obtaining the upper confidence bound (UCB) algorithm (Srinivas et al., 2010). In (Wu et al., 2018) the human search strategy is analysed for rewards under limited search horizons, concluding that GP offers the best model for generalization and UCB the best solution of the exploration/exploitation dilemma.
A significant application of RF is given in (Plonsky et al., 2019) as a hybrid model of machine learning and decision mechanisms. A key driver in the above research activities is that Human learners are increasingly fast at adapting to unfamiliar environments. Psychologists are investigating the intriguing gap between the capabilities of human and machine learning.
Most previous research findings in human learning refer to function learning because is related to a probabilistic perspective on predictability and provides a proxy to generalization capability. Contrary to function learning, optimization is not yet widely considered in the literature; in (Borji & Itti, 2013) a simple 1-D optimization problem has been considered.
The approach presented in this paper has been sketched in (Archetti et al., 2019). The present paper has been significantly augmented and rewritten. A set of new computational results are related to the use, along with the GP, of the RF as surrogate model. The set of references has been also enlarged and the whole perspective has been widened to reflect that: learning and optimization of black-box functions are two faces of the same coin. GP and RF are shown to offer a reasonable unifying framework of human function learning, active sampling and optimization.
The structure of the paper is as follows: section 2 outlines the methodological background of BO including the basis of the 2 main surrogate models GP and RF and the management of the exploration/exploitation dilemma. Section 3 is devoted to the experimental set-up and section 4 reports the experimental results about the behavioural patterns of humans in optimizing black box functions. Section 5 outlines the conclusions of this study and the perspectives of future works.
This section provides the underlying methodological framework of the study. The global optimization problem we consider is defined as:
where the search space is generally box-bounded and is black-box meaning
that no gradient information is available and that we have only access to noisy observation of which are computationally expensive.
2.1 Gaussian Processes
GPs are a powerful non-parametric model for implementing both regression and classification. One way to interpret a GP is as a distribution over functions, with inference taking place directly in the space of functions (Williams and Rasmussen, 2006). A GP, therefore, is a collection of correlated random variables, any finite number of which have a joint Gaussian distribution. A GP is completely specified by its mean function and covariance function
and will be denoted by: . This means that the behaviour of
the model can be controlled entirely through the mean and covariance.
Usually, for notational simplicity we will take the prior of the mean function to be zero, although this is not necessary. The covariance function assumes a critical role int the GP modelling, as it specifies the distribution over functions, depending on a sample
trix whose entry thus the covariance specifies how to points a correlated and it controls the shape of the objective function.
noisy objective function.
Therefore, the predictive equations for GP regression, that are can be easily updated, by conditioning the joint Gaussian prior distribution on the observations:
The covariance function is the crucial ingredient in a GP predictor, as it encodes assumptions about the function to approximate: function evaluations that are near to a given point should be informative about the prediction at that point. Under the GP view it is the covariance function that defines nearness or similarity. Once the prior mean and the kernel are chosen, they are updated with the observation of to find a-posterior distribution and this allows us to find the expected value of the function at any point and to calculate its uncertainty through its predicted variance.
Where is the length scale (updated by maximum likelihood destination) and is a parameter governing the regularity of the gp samples which are differential and where is the gamma function and is the modified Bessel function of the second kind. The most widely adopted versions, specifically in the Machine Learning community and considered in this paper, are . The Matern kernel encodes the expected smoothness of the target function explicitly.
2.2 Random Forest
Random Forest (RF) is an ensemble learning method, based on decision trees, for both classification and regression problems (Ho, 1995). According to the originally proposed implementation, RF aims at generating a multitude of decision trees, at training time, and providing as output the mode of the classes (classification) or the mean/median prediction (regression) of the individual trees.
Although originally designed and presented as a machine learning algorithm, RF is also an effective and efficient alternative to GP for implementing BO. RF consists of an ensemble of different regressors (i.e., decision trees), it is possible to computeβas for GP β both ΞΌ(x) and Ο(x), simply as mean and variance of the samples of the individual outputs provided by the regressors. Due to the different nature of RF and GP, the associated probabilistic surrogate models will also result significantly different. While GP is well-suited to model smooth functions in search space spanned by continuous variables, RF can also deal with discrete and conditional variables.
A basic description of the RF learning algorithm is provided in 3.2. Albeit GPs offer better mathematical characterization, RFs often result more computationally efficient than GP, (even with continuous variables), also because RFs do not require to invert any kernel matrix.
2.3 The acquisition functions
The acquisition function is the mechanism to implement the trade-off between exploration and exploitation in BO. More precisely, any acquisition function aims to guide the search of the optimum towards points with potentially high values of objective function either because the prediction of , based on the probabilistic surrogate model, is high or the uncertainty, also based on the same model, is high (or both). Indeed, exploitation means to consider the area providing more chance to improve over current solution, while exploration means to move towards less explored regions of the search space where predictions based on the surrogate model are more uncertain, with higher variance. There are many acquisition functions, we quote only those used in this study. Probability of Improvement (PI) (Kushner 1964) and Expected Improvement (EI) (Mockus 1975) measure, respectively, the probability and the expectation of the improvement over the best observed value of given the predictive distribution of the probabilistic surrogate model. More recently, Upper/Lower Confidence Bound, (Srinivas et al., 2010) is widely used. It is an acquisition function that manages explorationexploitation by being optimistic in the face of uncertainty where Upper and Lower are used, respectively, for maximization and minimization problems:
where is the parameter to manage the trade-off between exploration and ex-
ploitation (is for pure exploitation; on the contrary, higher values of emphasizes exploration by inflating the model uncertainty). In (Srinivas et al., 2010), a policy is provided for updating the value of along function evaluations, with also a proof of convergence of such a policy.
In the case of a minimization problem the next point is chosen as
while, in the case of a maximization problem the next point is selected as
2.4 Bayesian optimization
The following algorithm summarizes a general Bayesian Optimization process where the acquisition function, whichever it is, is denoted by . This function is generally maximized, but in the case of
With respect to the probabilistic surrogate model, the summarized algorithm does not specify the probabilistic surrogate model, as well as the kernel in the case of a GP. This is basically done in order to maintain the algorithm as general as possible.
In this study we have used GP (considering the kernel presented in the previous section) and RF as surrogate probabilistic models. The three different acquisition functions previously described have been used for both the two surrogates.
General Bayesian Optimization Algorithm
Generate an initial set of randomly sampled (e.g., via Latin Hypercube
Sampling)
Evaluate the function in the initial set of points and obtain
2 update the surrogate model obtaining the new estimates of 3 select a new by optimizing an acquisition function , such that
3.1 User interface
Stimuli are 15 different 2D functions among which at the start of each game the test function is randomly chosen (Adorio et al., 2005). Each subject was informed about the goal of the experiment and the available number of clicks for the play, before to start. Subjects started by clicking any point in the screen getting the corresponding value; previously clicked points remained on the screen until the end of the trial. In particular, points are colored and resized according to the associated score, providing a visual feedback about the distribution of the scores collected so far.
We have developed three different game modalities:
1. Find the point with maximum value without knowing its value. In this case the hu-
mans have to optimize the simple regret (without knowing
2. Find the point with maximum value knowing its value. As in the previous modality,
the humans have to optimize the simple regret, but without knowing
3. Maximize the total score, that is the cumulative value of the selected points, without
knowing the value of the maximum. In this case the humans have to optimize the cumulative regret (without knowledge):
Data of each game are stored in a database with the following structure (Fig. 1). In games table, each row represents a single point of a game, specifying the user, the function and the game mode. The games are identified by the timestamp relative to the end of the game.
Fig. 1. ER diagram of database
The following picture (Fig. 2) shows a frame of the game.
Fig. 2. An example of a game play
3.2 Procedure
For each player, at each iteration, a GP and RF models are fitted on the observed points and the three acquisition functions, previously mentioned in section 2, are used to select the next point to query. All these points are compared, via Euclidean distance, to the corresponding choice made by the player.
Moreover, for the GP, five kernels have been used to consider different possibility of smoothness approximation of the objective function.
Human player choices and Bayesian Optimization are considered compliant, pointwise, if the distance between the point chosen by the human player and the algorithmic player is less than a given βthresholdβ. Finally, the strategy of the human player is assimilated to the acquisition function most frequently compliant, pointwise, along a play. The procedures for GP-based and RF-based BO are summarized in the following pseudo-codes:
- a search strategy (i.e., an acquisition fuction, under a GP with a given kernl, at a specific iteration)
5 compute
Finally, for a given kernel , the search strategy of the participant is compliant to the most frequent acquisition function in the series
1 2 fit a Random Forest 3 4 select
5 compute
10 else
Finally, the search strategy of the participant is compliant to the most frequent acquisition function in the series
3.3 Software resources and analysis
Software resources consists of a pipeline of two components developed in R. The first component is the procedure to compute the distance between the humansβ and the BOβs choices during each game, the second aggregates all the calculated distances and generates the related statistics.
The first component uses the R package named βmlrMBOβ (https://cran.r- project.org/web/packages/mlrMBO/vignettes/mlrMBO.html). This library offers both GP and RF as surrogate model along with the acquisition functions adopted in this study. L-BFGS algorithm is used to optimize the acquisition function based on GP and with continuous search space; otherwise, βfocussearchβ algorithm (Bischl et al. 2017) is adopted: it can handle with numeric, discrete and mixed search spaces, also involving conditional variables. Focus-search starts with a large set of random points where the acquisition function is evaluated. Then, it shrinks the search space around the current best point and perform a new random sampling of points within the βfocused spaceβ. The shrinkage operation is iteratively performed until a maximum number of iterations and the entire procedure can be restarted multiple times to mitigate the risk to converge to a local optimum. Finally, the best point over all restarts and iterations is returned as the solution.
4.1 Experiment 1 β Gaussian Process
According to the mentioned procedure, the following figures summarize the main results of the study.
Fig. 3. Number of human players whose strategy is compliant with respect to kernel type and acquisition functions, with βthresholdβ set to 0.10, and in UCB. Last bar represents the number of non-compliant.
From Fig. 3, EI is preferred, indicating that exploitative behaviour is dominant among humans. This outcome is relatively independent on the kernel.
Fig. 4. Number of human players whose strategy is compliant with respect to kernel type and acquisition functions, with βthresholdβ set to 0.15, and in UCB. Last bar represents the number of non-compliant.
From Fig. 4, results are coherent with previous Fig. 3. Moreover, the number of non-compliant is reduced with the less restrictive threshold.
Fig. 5. Number of human players whose strategy is compliant with respect to kernel type and acquisition functions, with βthresholdβ set to 0.15, and in UCB. Last bar represents the number of non-compliant.
From Fig. 5, one can see that with the reduction in , UCB gains a larger share of participants than previously.
Fig. 6. Number of human players whose strategy is compliant with respect to kernel type and acquisition functions, with βthresholdβ set to 0.15, and in UCB. Last bar represents the number of non-compliant.
From Fig. 6, the last confirmation that βgreed is goodβ: means no-exploration in UCB whose fully exploitative version gets the largest share of participants.
4.2 Experiment 2 β Random Forest
From Fig. 7, with and explorative UCB (), Probability of Improvement, a no-
toriously exploitative acquisition function, gets the larger share. The situation changes reducing the exploration component in UCB, which with nates choices among the compliant.
Fig. 7 Number of human players whose strategy is compliant with respect to different acquisition functions, with RF, in different settings of βthresholdβ and values of UCB. From left to right: threshold = 0.10 and ; threshold = 0.15 and and
BO is a principled approach for adding a mathematical structure to the search and optimization process which resembles human optimization strategies. This can be explained by the fact that probabilistic surrogate models explain function approximation in humans. In order to search efficiently for the optimum, one needs to learn the function landscape by updating its approximation through observations. Indeed, learning and optimization of black-box functions are 2 faces of the same coin.
GP and RF have been argued to offer a reasonable unifying framework of human function learning, efficient active sampling and search. While most research findings have been focused on human errors in function learning we focus on optimization of black-box functions, linked to human active search behavior.
The number of BO compliant participants is very high (less than 10% of participants resulted non-compliant to all models and acquisition functions). A general conclusion is that the exploitative oriented acquisition functions are get consistently the larger share. Also interesting is the analysis of which space model, that is kernel, and which exploitation-exploration balance, that is the acquisition function, are implied by human search. Based on the limited set of results presented in this paper, kernel is not a major factor in determining compliance. The acquisition function, and its parametrization for UCB, are the main determinants of the choice. The value of significantly affects the relative importance of UCB over the search processes performed by the compliant participants. Finally, we can conclude that βgreed is goodβ, that is the human behavior is quite exploitative (i.e., threshold=0.15 and =0 for UCB).
Ethical approval: All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed consent: Informed consent was obtained from all individual participants included in the study.
1. Adorio, E. P., Diliman, U. P. MVF - Multivariate Test Functions Library in C for Unconstrained Global Optimization (2005). Retrieved June 2013.
2. Archetti, F., et al.: "Are Humans Bayesian in the Optimization of Black-Box Functions?." Numerical Computations: Theory and Algorithms: Third International Conference, NUMTA 2019, Crotone, Italy, June 15β21, 2019, Revised Selected Papers, Part II. Springer Nature.
3. Archetti, F., Candelieri A.: Bayesian Optimization and Data Science. Springer International Publishing, 2019.
4. Bischl, B., Richter, J., Bossek, J., Horn, D., Thomas, J., Lang, M.: mlrMBO: A modular framework for model-based optimization of expensive black-box functions. arXiv preprint arXiv:1703.03373, (2017).
5. Borji, A., Itti, L.: Bayesian optimization explains human active search. In: Advances in Neural Information Processing System 26 (NIPS2013). pp. 55β63 (2013)
6. Candelieri, A., Perego, R., & Archetti, F. (2018). Bayesian optimization of pump operations in water distribution systems. Journal of Global Optimization, 1-23.
7. Eggensperger, K., Lindauer, M., & Hutter, F. (2019). Pitfalls and best practices in algorithm configuration. Journal of Artificial Intelligence Research, 64, 861-893.
8. Gershman, S. J. (2018). Uncertainty and exploration. bioRxiv, 265504, https://doi.org/10.1101/265504
9. Gopnik, A., OβGrady, S., Lucas, C. G., Griffiths, T. L., Wente, A., Bridgers, S.,... & Dahl, R. E. (2017). Changes in cognitive flexibility and hypothesis search across human life history from childhood to adolescence to adulthood. Proceedings of the National Academy of Sci-ences, 114(30), 7892-7899.
10. Griffiths, T.L., Kemp, C. & Tenenbaum, J.B., 2008. Bayesian models of cognition. Cambridge handbook of computational cognitive modeling, pp.59β100.
11. Hartford, J.S., Wright, J.R. & Leyton-Brown, K., 2016. Deep learning for predicting human strategic behavior. In Advances in Neural Information Processing Systems. pp. 2424β2432.
12. Ho, Tin Kam. "Random decision forests." Proceedings of 3rd international conference on document analysis and recognition. Vol. 1. IEEE, 1995.
13. Kruschke, J. K. (2008). Bayesian approaches to associative learning: From passive to active learning. Learning & behavior, 36(3), 210-226.
14. Kushner, Harold J. "A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise." (1964): 97-106.
15. Mehlhorn, K., Newell, B. R., Todd, P. M., Lee, M. D., Morgan, K., Braithwaite, V. A., ... & Gonzalez, C. (2015). Unpacking the explorationβexploitation tradeoff: A synthesis of human and animal literatures. Decision, 2(3), 191.
16. MoΔkus, Jonas. "On Bayesian methods for seeking the extremum." Optimization techniques IFIP technical conference. Springer, Berlin, Heidelberg, 1975.
17. Plonsky, O. et al., 2019. Predicting human decisions with behavioral theories and machine learning. arXiv preprint arXiv:1904.06866.
18. Schulz, E. et al., 2015. Assessing the Perceived Predictability of Functions. In CogSci.
19. Schulz, Eric, et al. "Quantifying mismatch in Bayesian optimization." Nips workshop on bayesian optimization: Black-box optimization and beyond. 2016.
20. Schulz, E., Konstantinidis, E. & Speekenbrink, M., 2018. Putting bandits into context: How function learning supports decision making. Journal of Experimental Psychology: Learning, Memory, and Cognition, 44(6), p.927.
21. Schulz, E., Tenenbaum, J., Duvenaud, D. K., Speekenbrink, M., & Gershman, S. J. (2016). Probing the compositionality of intuitive functions. In Advances in neural information processing systems (pp. 3729-3737).
22. Schulz, E., Speekenbrink, M., & Krause, A. (2018). A tutorial on GP regression: Modelling, exploring, and exploiting functions. Journal of Mathematical Psychology, 85, 1-16.
23. Srinivas, N., Krause, A., Kakade, S., & Seeger, M. (2010, June). GP optimization in the bandit setting: no regret and experimental design. In Proceedings of the 27th International Conference on International Conference on Machine Learning (pp. 1015-1022). Omnipress.
24. Williams, Christopher KI, and Carl Edward Rasmussen. GPes for machine learning. Vol. 2. No. 3. Cambridge, MA: MIT press, 2006.
25. Wilson, A.G. et al., 2015. The human kernel. In Advances in neural information processing systems. pp. 2854β2862.
26. Wu, C. M., Schulz, E., Speekenbrink, M., Nelson, J. D., & Meder, B. (2018). Generalization guides human exploration in vast decision spaces. Nature Human Behaviour, 2(12), 915.
27. Zhigljavsky, Anatoly, and Antanasz Zilinskas. Stochastic global optimization. Vol.
9. Springer Science & Business Media, 2007.