An Improved Generic Bet-and-Run Strategy for Speeding Up Stochastic Local Search

2018·Arxiv

1. Introduction

1. Introduction

Optimization algorithms are widely used in a variety of domains, such as production scheduling and planning or vehicle routing. In many such practical applications, the total time budget t available for optimization is limited to at most a few minutes. The goal is to find a solution which is as-good-as-possible within this budget. One method to do so is to develop better optimization algorithms. Another method is to make the best use of an existing solver.

There are two straightforward methods when approaching an optimization problem with one algorithm and a total time budget t: One can either assign the whole budget t to a single run of the algorithm or execute k independent restarts of the algorithm (Mart´ı 2003, Lourenc¸o et al. 2010) and therefore divide the budget t into k equally-sized chunks. It can be expected that the former strategy is the best for small budgets while the latter one is the better choice for large budgets. Budgets t of a few minutes, however, fall in neither category for many problem types, which, of course, depends on the instance and the solver.

Here, BET-AND-RUN strategies (Fischetti and Monaci 2014) pose a compromise by using an initialization time budget which is divided evenly amongst k independent runs. The run with the best-so-far solution is then continued for the remaining time units. Friedrich et al. [2017] showed recently that this simple approach can routinely outperform the two budgeting approaches above. Yet, it makes only use of a single unit of information per run for the “decision” which of the k runs to resume, namely the solution quality they reached at the end of their respective initialization budgets.

In order to investigate the question Can we do better than that?, we generalize the BET-AND-RUN concept as illustrated in Figure 1: The budget t is divided into three pieces, i.e., and used as follows.

Phase 1 The initialization budget is divided among a set of k initial, independent runs according to a

Phase 2 Then, a decision maker D is applied which may access the history of each run in form of

Phase 3 The remainder of the total budget t is then divided evenly among the m chosen runs, which thus

In the following, we show that our approach can yield an advantage above the two straightforward methods for solving optimization problems mentioned at the beginning of the introduction, and it also often outperforms the BET-AND-RUN strategy by Friedrich et al. [2017] when this is possible.

Figure 1 Our generic BET-AND-RUN strategy receives a total time budget t. It starts k independent runs

In this article, we first survey existing work in Section 2 and introduce the used dataset in Section 3. Then, we motivate the use of different decision makers in Section 4 before we present the results of our comprehensive study in Section 5. All datasets and the complete implementation of our algorithms have been made publicly available (Weise and Wagner 2018).

2. Related Work

In practice, stochastic search algorithms and randomized search heuristics are frequently restarted: If a run does not conclude within a pre-determined solution quality limit, we restart the algorithm (Mart´ı 2003, Lourenc¸o et al. 2010). One of the advantages of this simple approach is that it helps to avoid heavy-tailed runtime distributions (Gomes et al. 2000). However, due to the added complexity of designing an appropriate restart strategy for a given target algorithm, the two most common techniques used are to either restart with a certain probability at the end of each iteration, or to employ a fixed schedule of restarts.

Some theoretical results exist on how to construct optimal restart strategies. For example, Luby et al. [1993] showed that, for Las Vegas algorithms with known run time distribution, there is an optimal stopping time in order to minimize the expected running time. Even if the distribution is unknown, there is a universal sequence of running times given by (1,1,2,1,1,2,4,1,1,2,1,1,2,4,8,...), which is the optimal restarting strategy up to constant factors. While these results can be used for every problem setting, they only apply to Las Vegas algorithms.

Fewer results are known for the optimization case. Introductions to practical approaches for such restart strategies are given by Mart´ı [2003] and Lourenc¸o et al. [2010]. A relatively recent theoretical result is presented by Schoenauer et al. (2012). Several studies show the substantial impact of the restart policy on the efficiency of solvers for satisfiability problems (Biere and Fr¨ohlich 2015, Huang 2007). In this context, restarts have also been used to learn “no-goods” during backtracking Cir´e et al. (2014).

Quite often, classical optimization algorithms are deterministic and thus cannot be improved by restarts, as run time and outcome will not change. However, their characteristics can be subject to change. For example, Lalla-Ruiz and Voß [2016] exploited this by using different mathematical programming formulations so as to provide different starting points for the solver. While many other modern optimization algorithms also work mostly deterministically, they often have some randomized component, for example by choosing a random starting point. These initial solutions often strongly influence the quality of the outcome and the speed of reaching it. In our opinion, it follows quite naturally that algorithms should be run several times.

Fischetti and Monaci [2014] extended the classical restart strategies to the so-called BET-AND-RUN strategy:

Phase 1 perform k runs of the algorithm for some (short) time limit , assigning time units to

Phase 2 use remaining time to continue only the best run from the first phase until timeout.

Fischetti and Monaci [2014] experimentally studied this for mixed-integer programming. They explicitly introduce diversity in the starting conditions of the used MIP solver (IBM ILOG CPLEX) by directly accessing internal mechanisms. For them, k = 5 performed best.

de Perthuis de Laillevault et al. [2015] have shown that a BET-AND-RUN strategy can also benefit asymp-

totically from larger k. For the pseudo-boolean test function ONEMAX it was proven that choosing k > 1 decreases the O(nlog n) expected run time of the (1+1) evolutionary algorithm by an additive term of

Lissovoi et al. [2017] investigated BET-AND-RUN for a family of pseudo-Boolean functions, consisting of a plateau and a slope, as an abstraction of real fitness landscapes with promising and deceptive regions. The authors proved that BET-AND-RUN with non-trivial k and are necessary to find the global optimum efficiently, and that the choice of is linked to features of the problem. They also provided a fixed budget analysis to guide selection of the BET-AND-RUN parameters to maximize the solution quality.

Friedrich et al. [2017] investigated a comprehensive range of BET-AND-RUN strategies on the traveling salesperson problem and the minimum vertex cover problem. Their best strategy performed 40 short runs in the initial phase with a time limit that is 1% of the total time budget each, and then it used the remaining 60% of the total time budget to continue the best run of the first phase. They investigated the use of the universal sequence of Luby et al. [1993] as well, using various choices of , however, it turned out inferior.

Building on the success of BET-AND-RUN approaches for restarted local search solvers, Kadioglu et al. [2017] introduced the idea of adaptive restart strategies. Inside their approach, a learned black-box decision procedure dynamically decides whether to continue the current run, to continue a previous run, or to start a new run. While their approach performed favorably, the internal mechanisms were black-box and it remained unclear which algorithmic components and which decisions contributed to the success.

Note that the stream of BET-AND-RUN-related research is related to the very mature field of multi-armed bandits. To the best of our knowledge, however, there are no existing works there that make use of the core ideas of BET-AND-RUN to solve a single instance, i.e., to have an overall limited budget as well as the idea to exclusively stick to one arm after some first exploratory phase. For example, Gagliolo and Schmidhuber [2011] propose a method for allocating computation time to algorithm portfolios for solving instance sets (thus working on a much higher/coarser granularity), however, their approach does not carry over to our fine-grained scenario of using partial runs for optimizing a single instance.

3. Benchmarks

Here we shortly introduce the two benchmark problems and the optimization algorithms used to solve them.

3.1. Minimum Vertex Cover Problem

Solving the minimum vertex cover problem (MVC) means finding the smallest set of vertexes of a graph which contains at least one vertex from every edge. The MVC is one of the classical NP-hard problems with many applications (Gomes et al. 2006). It also is closely related to the problem of finding a maximum clique Abu-Khzam et al. (2006). The state-of-the-art algorithms for solving the MVC comprise FASTVC (Cai 2015), NuMVC (Cai et al. 2013), TwMVC (Cai et al. 2015), and FastWVC (Li et al. 2017).

Kadioglu et al. [2017] applied FASTVC Cai (2015) in their experiments, one of the best algorithms for large MVC instances. FASTVC is based on two low-complexity heuristics. The first one constructs an initial vertex cover and the second one chooses the vertex to be removed in each exchanging step, which involves random draws from a set of candidates. We use the data that Kadioglu et al. [2017] gathered in 10,000 independent runs on all 86 instances used by Cai (2015). These instances are of rather large scale and most of them are sparse, which is challenging for solvers. The number of vertices in the instances ranges from about 1000 to over 4 million and the number of edges from about 2000 to over 56 million.

3.2. Traveling Salesperson Problem

The traveling salesperson problem (TSP) (Applegate et al. 2007, Lawler et al. 1985) is one of the most well-known combinatorial optimization tasks. A TSP instance is defined as a fully-connected graph. Each edge in the graph has a weight, representing the distance between the two nodes it connects. A candidate solution is a cycle that visits each node in the graph exactly once and returns back to its starting node. The objective function, subject to minimization, is the sum of the weights of all edges in the tour, i.e., the total tour length. This optimization version of the TSP is NP-hard (Gary and Johnson 1979, Gutin and Punnen 2002). The state-of-the-art algorithms for the TSP include EAX (Nagata and Kobayashi 2013), the Chained-Lin-Kernighan heuristic (Applegate et al. 2003, Cook 2005), Partition Crossover (Whitley 2016), as well as hybrid metaheuristics (Liu et al. 2015).

We use the data from Kadioglu et al. [2017], who used the Chained-Lin-Kernighan heuristic (Apple- gate et al. 2003, Cook 2005) as TSP solver. They applied it 10,000 times to each of the 112 instances from TSPLib Reinelt (1991) and additionally to the large instances ch71009, mona-lisa100k, and usa115475. We omit instance linhp318, as no data was available on it.

In the next sections, for the sake of readability, we will refer to the combination of solver and problem by just using the problem domain, i.e., TSP and MVC.

Figure 2 k = 40 selected runs from the datasets brd14051 (TSP) and socfb-Stanford3 (MVC) for a total

4. BET-AND-RUN with Better Decision Makers

Our generalized BET-AND-RUN strategy can simulate a range of existing approaches. For instance, the simple multi-run strategy of restarting from scratch k times is a special case by choosing and . The single-run strategy corresponds to a multi-run method with k = 1. The strategies from Fischetti and Monaci, Friedrich et al. [2014, 2017] and Lissovoi et al. [2017] are special cases by choosing m = 1 and having . Our experiments detailed in the next section therefore also cover these approaches.

Note that in all related work, D = currentBest is applied, which picks one of the runs with the current best result after initialization and resumes it where it was paused. Intuitively, this is a robust strategy for which holds, but it does not necessarily pick the run which yields the best result after all budget has been exhausted, as illustrated in Figure 2. In such scenarios, paying some time cost for a more sophisticated choice could yield a better overall result.

In our experiments, we need to use clock time as time measure and cannot apply any other measure common in optimization (Weise et al. 2014) such as function evaluations (FEs). This is because the decision makers do not evaluate the objective function or generate candidate solutions by themselves, but only process the data already collected, i.e., the aforementioned (time,quality) tuples. Using clock time to measure the computational effort of both the optimization algorithm and the decision maker has the further advan-

Table 1 Baseline performance of currentBest. Shown is the number (and percentage) of instances from the

tage that we can also consider otherwise “hidden” costs, such as for the initialization of data structures and bookkeeping.

In order to get an initial estimate on how likely such scenarios are, we randomly draw 1’000’000 samples of k runs from our each of our 86 MVC and 113 TSP datasets. In Table 1, we count how often the runs chosen by currentBest, which were best after time , are outperformed by another run after time

t/k.

As can be seen, at least for t = 100s and cannot be beaten in the majority of samples. Still, in 78% of the MVC benchmark instances, there were at least some samples of k = 4 runs where currentBest made the wrong choice. For k = 40, the chance to theoretically being able to outperform currentBest on a random instance of the MVC problem is 23%. For the TSP, these chances tend to be lower, but there is still a potential to improve the overall performance. However, these are mean probabilities, and the actual values can deviate significantly. For example, for the two instances shown in Figure 2, the observed probabilities of outperforming currentBest when varying range from 0.25 to 0.93 (brd14051) and 0.01 to 0.10 (socfb-Stanford3).

A decision maker better than currentBest would need to, e.g., outperform it in at least some of these scenarios while not performing worse in others. Although we have confirmed that there exist sufficiently many scenarios where this is potentially possible, there is another requirement which may decrease these chances: The performance data collected during the initialization budget of a run i must permit making a sufficiently accurate prediction regarding the future progress of that run. If this is true, then sophisticated decision makers have a chance to yield better results. Qi et al., for instance, showed that perceptrons have good prediction accuracy in their experiments on the Maximum Satisfiability Problem Qi et al. (2017) and the TSP Qi et al. (2018) with simple solvers. This prediction capability should make them suitable for determining which solution quality a run would yield if continued for a certain amount of time. But other techniques, such as linear predictors, might yield improvements as well. An investigation of different BET-AND-RUN configurations and decision makers therefore is worthwhile.

5. Experimental Study

5.1. Experimental Setup

To investigate the performance of our approach, and in particular to investigate the benefits of our more general BET-AND-RUN setup, we first perform a wide scan of many different setups and then investigate fewer setups in more detail. All datasets have been made publicly available (Weise and Wagner 2018).

5.1.1. Initial Large-Scale Experiment. In the first set of experiments, we limit ourselves to 20 random

samples for each benchmark dataset and setup. We cover the total budgets . For and , we test 25 different values of . These are automatically chosen according to a heuristic based on the data from Kadioglu et al. (2017) for each instance before the experiment in order to maximize the number of different, meaningful outputs, e.g., the smallest values of are chosen that there is at least one data point. We briefly investigated more choices for m, but found that the preliminary results did not look very promising while the overall experimental time required would have more than doubled.

For the distribution of among the k initial runs, two strategies are tested. EVEN assigns for each . LUBY instead follows the Luby sequence Luby et al. (1993) and sets , which equals to if and to otherwise, with . The values of are chosen to be multiples of for the LUBY experiments and multiples of k for those using EVEN.

Our decision makers have access to the measured data points collected until is exhausted in the form of tuples of (time,quality). The set of basic decision makers includes currentBest, which picks the runs with the best quality value in their last measured data point, random, which randomly picks runs, and currentWorst, which picks the worst runs. The latter two performed worse and were included as sanity tests only. mostImprovements simply choses the run(s) i with the highest number of improvements (measure points) divided by the logarithm of their consumed time in the hope that they may be likely to attain further improvements. logTimeSum chooses the runs for which the sum of the logarithms of all time stamps at which improvements were made are the highest.

We also propose model-based decision makers that try to construct, for each run, a functional relationship between the time stamps and the achieved quality. These relationships are used to predict the quality that a run would reach if it was selected and pick the runs with the best predicted results.

As model types we test linear, quadratic, and cubic polynomials as well as perceptrons. The latter is suggested by Qi et al., Qi et al. [2017, 2018] for modeling optimization algorithm behavior. We apply perceptrons PER(n) with nodes on a single hidden layer and such just with input/output layer (n = 0). We use either tanh or the linear step as activation function. The parameters of the polynomials can either be computed directly based on two, three, or four data points or fitted using the Levenberg-Marquardt algorithm (Levenberg 1944, Marquardt 1963) algorithm based on last ten measured points. The parameters of the perceptrons are obtained by either applying SepCMA-ES (Ros and Hansen 2008) or CSA (Arnold and Beyer 2008) for at most 400 function evaluations, on the last 10 points collected in the run. We chose 10 points only in order to limit the runtime consumed for training the perceptrons, which grows linearly with the number of points.

For the modeling, time and quality may be either used directly or logarithmically scaled. Furthermore, if the time value of the last measured tuple (time,quality) is less than , we may add a “virtual end point” (timeto the dataset of run i. This makes sense because an optimization process may first quickly trace down a local optimum and then not improve anymore at all. In that case, no further measure point would appear in its initial budget and simply extrapolating its initial progress while ignoring this fact may yield wrong predictions. Finally, we test a linear model extrapolating from the very first measured point and the “virtual end point” of a run into the future.

This results in 44 decision maker setups, yielding a total of experiments simulated on the data from Kadioglu et al. (2017).

5.1.2. Targeted Smaller Experiment. In a second experiment we investigate fewer, selected values of

, which also allows us to test additional configurations.

We investigate one additional decision maker, diminishing, which is based on the idea of diminishing returns Samuelson and Nordhaus (2001). We set , where be the last improvement in terms of quality a run has made and the previous one. We further set where and are the corresponding required runtime. The decision maker assumes that it will take longer by factor to achieve each further improvement for the run, which, in turn, will be smaller by factor . Improvements and times are always discretized.

In this experiment, we set m = 1. We choose in correspondence to Kadioglu et al. (2017), who used the range 50s to 500s for MVC and 100s to 5000s for TSP. We take 1000 samples for each setup.

5.2. Results

5.2.1. Initial Large-Scale Experiment. An experiment of the scale and with as many parameters as

described in Section 5.1.1 cannot be discussed in full here. Our findings confirm that currentBest is a very robust basic strategy that performs the best in many situations. We know from Section 4 that good decision makers should perform very similar to it and only sometimes can yield better results. Averaged over all benchmark instances, it should be possible to gain an advantage of a few percent. From Table 1 we can predict that this advantage should be bigger on the MVC than on the TSP.

Indeed, in Figure 3, we can observe exactly this.We find that perceptron-based decision makers work generally well and are (slightly) more likely to most-often outperform a single run with the full budget than currentBest on MVC for all while this only holds for on the TSP.

Figure 3 Performance of the decision makers compared with a single run executed over the whole budget t,

Larger total budgets t seem to be beneficial when the goal is to outperform single runs or currentBest. Note that this is a parameter which cannot be controlled by the user as it results from application requirements.

The time needed by the decision makers is generally the highest for perceptron-based methods (influ-enced by the presence and size of the hidden layer) and in the 100ms range. If we do not consider in our simulated experiments, i.e., artificially set , the outcome of the experiments stays almost the same. is deducted from to be used for continuing the selected runs. It would be conceivable that using much time to make a decision could decrease too much so that the gain from better prediction is destroyed by the loss of budget for actually attaining the gain. However, the experiment indicates that using more

Figure 4 The average scores of the different budgeting strategies for t = 100s over all decision makers and

complex decision makers requiring more time may be viable, e.g., using more than the last 10 points to train our perceptrons would have been possible.

We now analyze the impact of the budgeting strategy, i.e., the choices of k, m, and whether to apply the EVEN or LUBY time distributions. Good choices of these parameters should obviously depend on the available total budget t and the runtime behavior of the solvers. In Figure 4 we plot the performance of the different configurations for t = 100s, averaged over all decision makers and for different values of .

Using EVEN with k = 10, m = 1 is the best choice for the MVC and the TSP for . For smaller budgets of the MVC and t = 100s on the TSP, it makes sense to just perform k = 40 independent restarts distributing the time according to the LUBY strategy. This may result from the fact that the set of decision makers over which we average also contains worse performing methods such as currentWorst and random. Continuing two runs (m = 2) only is a good choice for the large budgets t = 400s on both the MVC and TSP. This cost of continuing a second run is only then outweighed by the benefits of exploiting the variance of runtime performance.

5.2.2. Alternatives to currentBest. As shown in Section 4 and Figure 3, it is theoretically and prac-

tically possible to outperform the predictor currentBest. Next, we show to which extent and under which conditions we are able to do so given the predictors described above.

We compare our results with the best approach from Friedrich et al. (2017) (named F17 here and used as a benchmark by us), which uses currentBest to pick m = 1 run from k = 40 initial runs, each of which received 1% of the total time budget t, i.e., . The purpose of this comparison is to see whether improvements are possible, and also whether they are statistically significant.

In Figure 5, we show a qualitatively representative subset of our results. We have chosen two extreme total time budgets (a very small one of 2s and a very large one of 2’000s) and selected a diverse set of predictors. Note that we have chosen pie charts on purpose as they allow for a quick qualitative comparison of results.

It turns out, that the benchmark approach dominates or is dominated, depending on the problem domain, the instances, and the total time budget. For example, it is no surprise that the benchmark approach can beat the single run (lots of green) when the total time budget is large, as performance variance can be exploited. Also, we can see that the last phase of BET-AND-RUN, i.e. when a run is continued, is generally helpful when the total runtime is short, as both EVEN and LUBY are beaten significantly and often in both the MVC and TSP case (lots of green).

For MVC and small budgets, many predictors can beat the benchmark approach. This advantage vanishes as the total time budget increases, which is due to the algorithm’s convergence within the used time for the individual initial runs. Consequently, performances are typically not distinguishable anymore from F17 (lots of gray), while the differences remain statistically significant for the TSP.

When it comes to the different problem domains, it also turns out that for MVC many predictors perform better than the benchmark approach. For example, the diminishing approach is significantly better on 43 instances while worse only on 24 instances; similar ratios hold for the other predictors. For the TSP and the long total time budget, however, there are a few deviations from the “usual” pie chart in this category. Noteworthy deviations are the perceptrons PER(0) without hidden layer and diminishing. In both cases, the benchmark is better on only three instances (as visible by the little green section), while being beaten on 19 instances (shown in red).

Lastly, we briefly compare the performance of different predictors when only 4 instead of 40 initial runs are performed. The results in Figure 6 show that predictors more elaborate than currentBest are again significantly more successful in picking the best run for both small and large total time budgets (lots of red and gray).

6. Summary and Conclusions

Despite the appeal of a one-size-fits-all recommendation, we have observed in our studies that the best BET-AND-RUN configuration varies depending on total runtime budget t and instance. The resulting “best” configurations have varied from “make many short runs” to “just make a single run”.

This is further complicated by the observation that the very same algorithm on the very same instance can show significantly different behavior with intersecting performance profiles (see Figure 2). These then cause difficulties in choosing the right run to continue: depending on the total time budget, this causes a switch of the best decision maker from “currentBest” even to “currentWorst” in some cases. Theoretical results are needed to characterize this further in the context of stochastic algorithms and this will be the subject of our future work.

Still, over a wide variety of scenarios, we found that predicting the future performance of the initial runs in order to select those to continue is feasible. This means that it is possible to discriminate between “good” and “bad” sample runs, and to increasing the correlation of the chosen run with the a-posteriori best one. In particular, the crude and very fast concept of diminishing returns has led to surprisingly good results. Another good approach was to fit the parameters of a perceptron to the observed data, using numerical black-box optimizers.

As our generic BET-AND-RUN approach is configurable, per-instance configuration should be possible. Our preliminary work in this direction indicates that this is feasible, however, the uneven heterogeneity of the instance features in combination with the small number of instances is currently posing a major challenge.

Figure 5 Statistical comparison of the best BET-AND-RUN configuration from Friedrich et al. (2017) (here

Figure 6 Comparison of predictors for . The style is identical to that of Figure 5. As a

Also, the BET-AND-RUN approaches to date make their decisions purely based on solution quality and also consider just a single algorithm. Both aspects can be extended by giving the decision maker access to features of the solutions, and also by allowing for a diverse set of solvers (or configurations thereof) to participate in the overall optimization, with the overall goal to better exploit performance variance of solvers.

Endnotes

rentBest is outperformed and such where it is not. We attempted to select figures without bias.

Acknowledgments

T. Weise acknowledges support by the National Natural Science Foundation of China under Grants 61673359, 61150110488, and 71520107002. We used Alexandre Devert’s great implementation of SLPs, MLPs, SepCMA, and CSA (see http://github.com/marmakoide/jpack). M. Wagner acknowledges support by the ARC Discovery Early Career Researcher Award DE160100850.

References

Abu-Khzam F, Langston M, Shanbhag P, Symons C (2006) Scalable parallel algorithms for FPT problems. Algorithmica 45:269–284, URL http://dx.doi.org/10.1007/s00453-006-1214-1.

Applegate D, Bixby R, Chv´atal V, Cook W (2007) The Traveling Salesman Problem: A Computational Study (Princeton University Press).

Applegate D, Cook W, Rohe A (2003) Chained Lin-Kernighan for large traveling salesman problems. INFORMS Journal on Computing 15, URL http://dx.doi.org/10.1287/ijoc.15.1.82.15157.

Arnold D, Beyer H (2008) Evolution strategies with cumulative step length adaptation on the noisy parabolic ridge. Natural Computing 7:555–587, URL http://dx.doi.org/10.1007/s11047-006-9025-5.

Biere A, Fr¨ohlich A (2015) Evaluating CDCL restart schemes. International Workshop on Pragmatics of SAT (POS), 16, URL http://fmv.jku.at/papers/BiereFroehlich-POS15.pdf, preliminary version.

Cai S (2015) Balance between complexity and quality: Local search for minimum vertex cover in massive graphs. Yang Q, Wooldridge M, eds., Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015), 747–753 (AAAI Press), code: http://lcs.ios.ac.cn/caisw/MVC.html, accessed 2017-12-28.

Cai S, Lin J, Su K (2015) Two weighting local search for minimum vertex cover. 29th AAAI Conference on Artificial Intelligence (AAAI 2015), January 25–30, 2015, Austin, United States, 1107–1113 (AAAI Press).

Cai S, Su K, Luo C, Sattar A (2013) Numvc: An efficient local search algorithm for minimum vertex cover. Journal of Artificial Intelligence Research 46:687–716, URL http://dx.doi.org/10.1613/jair.3907.

Cir´e A, Kadioglu S, Sellmann M (2014) Parallel restarted search. Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 842–848.

Cook W (2005) The traveling salesperson problem: Downloads (website). http://www.math.uwaterloo.ca/tsp/ concorde/downloads/downloads.htm, accessed 2017-12-28.

de Perthuis de Laillevault A, Doerr B, Doerr C (2015) Money for nothing: Speeding up evolutionary algo- rithms through better initialization. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO’15), July 11-15, 2015, , Madrid, Spain, 815–822, URL http://dx.doi.org/10.1145/2739480.2754760.

Fischetti M, Monaci M (2014) Exploiting erraticism in search. Operations Research 62:114–122, URL http://dx.doi. org/10.1287/opre.2013.1231.

Friedrich T, K¨otzing T, Wagner M (2017) A generic bet-and-run strategy for speeding up stochastic local search. Singh SP, Markovitch S, eds., 31st AAAI Conference on Artificial Intelligence, 801–807 (AAAI Press).

Gagliolo M, Schmidhuber J (2011) Algorithm portfolio selection as a bandit problem with unbounded losses. Annals of Mathematics and Artificial Intelligence 61(2):49–86, URL http://dx.doi.org/10.1007/s10472-011-9228-z.

Gary MR, Johnson DS (1979) Computers and Intractability: A Guide to the Theory of NP-Completeness (New York, NY, USA: W. H. Freeman and Company), ISBN 0-7167-1045-5.

Gomes C, Selman B, Crato N, Kautz H (2000) Heavy-tailed phenomena in satisfiability and constraint satisfaction problems. Journal of Automated Reasoning 24(1):67–100, URL http://dx.doi.org/10.1023/A:1006314320276.

Gomes F, de Meneses C, Pardalos P, Viana G (2006) Experimental analysis of approximation algorithms for the vertex cover and set covering problems. Computers & OR 33:3520–3534, URL http://dx.doi.org/10.1016/j.cor.2005. 03.030.

Gutin G, Punnen A, eds. (2002) The Traveling Salesman Problem and its Variations, volume 12 of Combinatorial Optimization (Berlin, Heidelberg: Kluwer).

Huang J (2007) The effect of restarts on the efficiency of clause learning. International Joint Conference on Artifical Intelligence (IJCAI), 2318–2323.

Kadioglu S, Sellmann M, Wagner M (2017) Learning a reactive restart strategy to improve stochastic search. Learning and Intelligent Optimization – 11th International Conference (LION 11), June 19-21, 2017, Nizhny Novgorod, Russia, Revised Selected Papers, 109–123, URL http://dx.doi.org/10.1007/978-3-319-69404-7 8.

Lalla-Ruiz E, Voß S (2016) Improving solver performance through redundancy. Systems Science and Systems Engineering 25(3):303–325, URL http://dx.doi.org/10.1007/s11518-016-5301-9.

Lawler E, Lenstra J, Rinnooy Kan A, Shmoys D (1985) The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization (New Year, USA: Wiley).

Levenberg K (1944) A method for the solution of certain non-linear problems in least squares. Quarterly of Applied Mathematics 2:164–168.

Li Y, Cai S, Hou W (2017) An efficient local search algorithm for minimum weighted vertex cover on massive graphs. Proceedings of the 11th International Conference on Simulated Evolution and Learning (SEAL 2017),, 145–157, URL http://dx.doi.org/10.1007/978-3-319-68759-9 13.

Lissovoi A, Sudholt D, Wagner M, Zarges C (2017) Theoretical results on bet-and-run as an initialisation strategy. Bosman PAN, ed., Proceedings of the Genetic and Evolutionary Computation Conference (GECCO’17), July 15-19, 2017, Berlin, Germany, 857–864 (ACM), URL http://dx.doi.org/10.1145/3071178.3071329.

Liu W, Weise T, Wu Y, Chiong R (2015) Hybrid ejection chain methods for the traveling salesman problem. Gong M, Pan L, Song T, Tang K, Zhang X, eds., Proceedings of the 10th International Conference on Bio-Inspired Computing – Theories and Applications (BIC-TA’15), September 25–28, 2015, Hefei, Anhui, China, volume 562 of Communications in Computer and Information Science, 268–282 (Berlin/Heidelberg: Springer-Verlag), ISBN 978-3-662-49013-6, URL http://dx.doi.org/10.1007/978-3-662-49014-3 25.

Lourenc¸o H, Martin O, St¨utzle T (2010) Iterated local search: Framework and applications. Gendreau M, Potvin JY, eds., Handbook of Metaheuristics, volume 146 of International Series in Operations Research & Management Science (ISOR), 363–397 (Springer), URL http://dx.doi.org/10.1007/978-1-4419-1665-5 12.

Luby M, Sinclair A, Zuckerman S (1993) Optimal speedup of Las Vegas algorithms. Information Processing Letters 47:173–180, URL http://dx.doi.org/10.1016/0020-0190(93)90029-9.

Marquardt D (1963) An algorithm for least-squares estimation of nonlinear parameters. SIAM Journal on Applied Mathematics 11(2):431–441.

Mart´ı R (2003) Multi-start methods. Glover F, Kochenberger GA, eds., Handbook of Metaheuristics, volume 57 of International Series in Operations Research & Management Science (ISOR), chapter 12, 355–368 (Springer).

Nagata Y, Kobayashi S (2013) A powerful genetic algorithm using edge assembly crossover for the traveling salesman problem. INFORMS Journal on Computing 25:346–363, URL http://dx.doi.org/10.1287/ijoc.1120.0506.

Qi Q, Weise T, Li B (2017) Modeling optimization algorithm runtime behavior and its applications. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO’17) Companion, July 15-19, 2017, Berlin, Germany, 115–116 (ACM), URL http://dx.doi.org/10.1145/3067695.3076042.

Qi Q, Weise T, Li B (2018) Optimization algorithm behavior modeling: A study on the traveling salesman prob- lem. Proceedings of the Tenth International Conference on Advanced Computational Intelligence (ICACI’18), March 29-31, 2018, Xiamen, China, 845–850 (IEEE), ISBN 978-1-5386-4362-4.

Reinelt G (1991) TSPLIB – a traveling salesman problem library. ORSA Journal on Computing 3(4):376–384, URL http://dx.doi.org/10.1287/ijoc.3.4.376, instances: http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/ tsp/, accessed 2017-12-28.

Ros R, Hansen N (2008) A simple modification in CMA-ES achieving linear time and space complexity. Rudolph G, Jansen T, Lucas SM, Poloni C, Beume N, eds., 10th International Conference on Parallel Problem Solving from

Nature (PPSN X), September 13-17, 2008, Dortmund, Germany, volume 5199 of Lecture Notes in Computer Science, 296–305 (Springer), URL http://dx.doi.org/10.1007/978-3-540-87700-4 30.

Samuelson P, Nordhaus W (2001) Microeconomics (McGraw-Hill).

Schoenauer M, Teytaud F, Teytaud O (2012) A rigorous runtime analysis for quasi-random restarts and decreasing stepsize. Artificial Evolution – 10th International Conference, Evolution Artificielle (EA’11), October 24-26, 2011, Angers, France, Revised Selected Papers, 37–48 (Springer), URL http://dx.doi.org/10.1007/ 978-3-642-35533-2 4.

Weise T, Chiong R, Tang K, L¨assig J, Tsutsui S, Chen W, Michalewicz Z, Yao X (2014) Benchmarking optimization algorithms: An open source framework for the traveling salesman problem. IEEE Computational Intelligence Magazine 9(3):40–52, URL http://dx.doi.org/10.1109/MCI.2014.2326101.

Weise T, Wagner M (2018) Results of Bet-and-Run Strategies with Different Decision Makers on the Traveling Sales- man Problem and the Minimum Vertex Cover Problem. URL http://dx.doi.org/10.5281/zenodo.1253770.

Whitley D (2016) Blind no more: Deterministic partition crossover and deterministic improving moves. Friedrich T, Neumann F, Sutton AM, eds., Companion Material Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), 515–532 (ACM), ISBN 978-1-4503-4323-7, URL http://dx.doi.org/10.1145/2908961. 2926987.

designed for accessibility and to further open science