Parameterized Complexity Analysis of Randomized Search Heuristics

2020·Arxiv

Abstract

This chapter compiles a number of results that apply the theory of parameterized algorithmics to the running-time analysis of randomized search heuristics such as evolutionary algorithms. The parameterized approach articulates the running time of algorithms solving combinatorial problems in finer detail than traditional approaches from classical complexity theory. We outline the main results and proof techniques for a collection of randomized search heuristics tasked to solve NP-hard combinatorial optimization problems such as finding a minimum vertex cover in a graph, finding a maximum leaf spanning tree in a graph, and the traveling salesperson problem.

1 Introduction

Randomized search heuristics (RSHs) are a class of general-purpose algorithms that are often deployed to tackle hard combinatorial optimization problems that arise in practice. Instances of practical, real-world problems are usually structured or restricted in some way, and it is typically assumed that RSH techniques are successful when the underlying strategy is able to exploit the structural properties of the resulting search space.

The mathematical analysis of the running time of randomized search heuristics on discrete optimization problems has advanced in the last decade. For a wide array of these techniques, rigorous and precise asymptotic bounds on the performance as a function of problem size are now available. However, many of these kinds of results are restricted only to toy problems. While such analyses are useful for gaining an understanding of the general working principles underlying RSH techniques, it is often not clear how they might be interpreted in the context of classically hard problems in computer science.

Unless P = NP, no algorithm for an NP-hard problem has a worst-case runtime that can be bounded from above by a polynomial in the input size. This worst-case view is rather restrictive, and it often tells us nothing about the typical behavior of algorithms on problems that are likely to be encountered in practice. For example, many experimental studies confirm that randomized search heuristics such as evolutionary algorithms (EAs), ant colony optimization, simulated annealing, and simple hill-climbing perform well on practical instances of NP-hard problems. An important research question for RSH techniques applied to combinatorial optimization is: which features of a given instance determine its hardness, and how do such parameters influence the runtime?

The field of parameterized complexity offers a refinement of classical time complexity by analyzing the running time of an algorithm not just as a function of problem size, but also as a function of further parameters of the input, for example, solution size, structural restrictions, or quality of approximation [12, 15]. The idea is to capture the essence of what makes a problem instance hard, and try to isolate this hardness to some structural feature of the instance or its solution. The inevitable combinatorial explosion in the runtime is confined to a function of this parameter, with only polynomial dependence on the input size. Even large instances may exhibit a very restricted structure and can be easier to solve, independent of size. Parameterized complexity is therefore an obvious candidate for systematically studying what features of a particular problem are hard for RSH techniques. It can also offer advice on what types of problem might be soluble or insoluble by such approaches, and guide algorithm design. It should be noted that parameterized analysis can also be applied to study the efficiency of modules of an evolutionary algorithm. A good example is the hypervolume indicator, which has been widely applied in the area of evolutionary multiobjective optimization. Computing the optimal hypervolume is hard when the dimension grows, and the computation of the hypervolume has been investigated in [5] from a parameterized and average-case perspective.

Many hard problems have “easy parts” that can be efficiently solved in order to effectively shrink a problem to its computationally hard core structure. This can be done by efficiently reducing the problem instance to a smaller instance (kernelization), or constraining the search tree to a manageable size that is still guaranteed to contain a solution (bounded search tree method). A slower exact algorithm (even brute-force search) can then be run on the resulting smaller instance or search space. With little to no hope of a polynomial-time solution, one instead seeks algorithms that can solve a problem in time that grows polynomially with the problem size, although perhaps superpolynomially with respect to some instance parameter. In other words, if the parameter is fixed to be small, the problem class is tractable, even as its instances grow large. Such a problem class (and corresponding algorithm) is called fixed-parameter tractable (FPT). A slightly less desirable situation is an algorithm that runs in so-called slicewise polynomial time (XP). Here the runtime is a polynomial in the problem size, but a polynomial whose degree depends on the parameter.

This kind of demarcation into hard and easy components can also be useful for the analysis of RSH techniques. At the extreme end of the spectrum are functions such as Needle, whose black-box complexity establishes that no RSH could even beat simple random sampling in expectation. At the other extreme are problems from the OneMax class that are solved efficiently by even very simple approaches. Likely, practical optimization problems lie somewhere between these two extremes, containing some mixture of components that can be efficiently exploited by randomized search heuristics and components that essentially require random sampling. If the hard core component that demands random sampling is guaranteed to be small by the nature of the problem class, then RSH techniques can be a reasonable approach. The theory of parameterized complexity is therefore useful for isolating the structural features that can be efficiently exploited by RSH techniques from the hard “core” of a problem, on which an approach must resort to some kind of stochastic brute-force search behavior such as random walks, lucky jumps, or explicit restarts.

It should therefore not come as a surprise that analyzing randomized search heuristics from the perspective of parameterized complexity can lead to useful theoretical insights into algorithm design. For example, it has been shown that the specific choice of search operator can directly influence the fixed-parameter tractability of an algorithm on certain problems, for example, tree-preserving mutation on the maximum-leaf spanning tree problem [24] or standard uniform crossover on the closest-string problem [39].

The aim of this chapter is to discuss a number of results in the field of parameterized complexity applied to RSH techniques. We begin in Section 2 by introducing some background and technical details. In Section 3, we consider the maximum-leaf spanning tree problem and show that the use of a mutation operator commonly used for spanning trees reduces the XP runtime to FPT runtime when compared with standard bit mutations. In Section 4, we discuss multiobjective evolutionary algorithms that quickly focus their search on a kernel of minimum vertex cover instances, and subsequently perform random sampling on that kernel, resulting in FPT runtime. Decomposing the runtime analysis of an algorithm into a set of instance parameters is useful in its own right to better understand the components of a problem that influence the behavior of search heuristics. In Section 5, we present results on the maximization of submodular functions under different constraints. These results derive the expected time that simple evolutionary algorithms need to produce approximations as a function of both the problem size and additional parameters of the input. In Section 6, we describe the analysis of a standard evolutionary algorithm (EA) applied to the Euclidean traveling salesperson problem (TSP), which bounds the running time in the context of a well-known TSP parameterization (the number of points interior to the convex hull). In this case, it is possible to prove that the performance of the algorithm is bounded by the number of interior points, although this is not enough to obtain the desired fixed-parameter tractable runtime. On the other hand, if the EA is allowed to use some problem-specific information (namely, the cyclic order of points as they appear on the convex hull), it can explicitly focus its search on a small subset of states. This dramatic search space reduction yields fixed-parameter tractable runtimes for algorithms on parameterized TSP instances. 
We summarize the chapter in Section 7 and briefly discuss some open research problems.

2 Parameterized Complexity Analysis

Extending traditional runtime analysis by parameterization requires conducting a rigorous runtime analysis of an algorithm on a parameterization of a problem class. A parameterization of a problem class is a mapping of problem instances into the set of natural numbers. The running time of the algorithm is then expressed in terms of both the problem size and this extra parameter.

Let L be a language over a finite alphabet $\Sigma$. A parameterization of L is a mapping $\kappa : \Sigma^* \to \mathbb{N}$. The corresponding parameterized problem is the pair $(L, \kappa)$. For a string $x \in \Sigma^*$, let $k = \kappa(x)$ and $n = |x|$. An algorithm deciding $x \in L$ in time bounded by $n^{g(k)}$ is called a slicewise polynomial-time algorithm (or XP algorithm). Here, $g : \mathbb{N} \to \mathbb{N}$ is an arbitrary but computable function. An algorithm deciding $x \in L$ in time bounded by $g(k) \cdot n^{O(1)}$ is called a fixed-parameter tractable (or FPT) algorithm for the parameterization $\kappa$. Both kinds of algorithms run in polynomial time for fixed k, but an XP algorithm allows the degree of the polynomial to depend on the parameter, while the degree of the polynomial for the running time is independent of both n and k for an FPT algorithm.
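To make the gap between the two notions concrete, the following minimal sketch compares one XP-style bound with one FPT-style bound. The specific choices $g(k) = k$ for the XP bound, and $g(k) = 2^k$ with polynomial degree 2 for the FPT bound, are illustrative assumptions and are not taken from any result in this chapter.

```python
def xp_bound(n: int, k: int) -> int:
    """XP-style bound n^{g(k)}, with the illustrative choice g(k) = k."""
    return n ** k


def fpt_bound(n: int, k: int, degree: int = 2) -> int:
    """FPT-style bound g(k) * n^c, with the illustrative choices g(k) = 2^k, c = degree."""
    return (2 ** k) * n ** degree


# For a fixed parameter value, both bounds are polynomial in n,
# but only the XP bound has a degree that grows with k.
for n in (10, 100, 1000):
    print(n, xp_bound(n, 10), fpt_bound(n, 10))
```

For fixed k = 10, the FPT-style bound grows quadratically in n, while the XP-style bound grows like the tenth power of n.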

Randomized search heuristics are typically stochastic processes that are allowed to run for a certain number of iterations, after which the best-so-far result is collected and returned. In each iteration, the process keeps a set of one or more candidate solutions, and evaluates their quality via a fitness or objective function. The candidate solutions for the next iteration are then computed using a number of transformation operations.

To analyze this class of algorithm, we consider a random variable T that measures the number of basic iterations (usually measured in calls to the objective function) until a solution is first discovered. Here, a solution may be, depending on the context, an element that maximizes or minimizes the objective function. This allows us to treat optimization problems in the same manner as one would treat decision problems. Specifically, given a class of instances of an optimization problem, for each $k \in \mathbb{N}$ one can construct a decision problem as the set of all instances on which the maximum (or minimum) objective function value is at least (or at most) k.

The quantity E[T] is the expected optimization time, and is the most commonly used performance measure in the rigorous runtime analysis of randomized search heuristics. We say an algorithm is a Monte Carlo FPT algorithm for a parameterized problem $(L, \kappa)$ if it accepts $x \in L$ with probability at least 1/2 in time $g(k) \cdot n^{O(1)}$ and accepts $x \notin L$ with probability zero. Thus, any randomized search heuristic with a bound $E[T] \le g(k) \cdot n^{O(1)}$ on L can be trivially transformed into a Monte Carlo FPT algorithm by stopping its execution after $2 g(k) \cdot n^{O(1)}$ iterations.
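The stopping rule follows from Markov's inequality. As a short derivation, write $t(k, n)$ for an assumed upper bound on $E[T]$ of the FPT form (the symbol $t$ is introduced here only for illustration):

```latex
\[
  \Pr\bigl[T \ge 2\,t(k,n)\bigr]
  \;\le\; \frac{E[T]}{2\,t(k,n)}
  \;\le\; \frac{t(k,n)}{2\,t(k,n)}
  \;=\; \frac{1}{2}.
\]
```

A run truncated after $2\,t(k,n)$ iterations therefore finds a solution with probability at least 1/2, and it never reports a solution on an instance that has none.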

Note that the parameter is allowed to depend on the input in more or less an arbitrary way. The selection of a meaningful parameterization depends strongly on what a “typical” problem instance looks like. In most cases, one hopes to choose a parameter that is assumed to be small over the set of problems one wishes to solve. Ideally, the parameter should somehow capture the source of exponential complexity for the problem [15].

The goal of applying parameterized complexity analysis to the field of randomized search heuristics is thus to understand in more detail how much information from the fitness function can be exploited. At the worst extreme, there is no exploitable information in the fitness of solutions at all (i.e., the fitness of a solution tells us nothing about its relationship to a global optimum), and we are in a blind Needle-like case. Any RSH technique that employs such a fitness function must then rely entirely on getting lucky enough to stumble on an optimal solution. However, as previously mentioned, for most realistic problems we conjecture that there exists some structure in the fitness function that can be implicitly used by the RSH technique. Parameterized analysis can be seen as a technique that allows us to inspect the fitness function to assist in bounding how much “luck” is required to solve the problem.

3 Maximum-Leaf Spanning Trees

The classical minimum spanning tree problem, which can be solved in polynomial time by well-known deterministic algorithms such as those of Kruskal and Prim, has gained significant attention in the evolutionary computation literature [32, 11]. This includes the investigations of Witt [43], who considered an additional structural parameter of the given graph. He gave an upper bound on the runtime of simple evolutionary algorithms for the minimum spanning tree problem that depends on the circumference of the given graph. We will not present the details here, as the focus of this chapter is on NP-hard problems. We instead refer the interested reader to the original articles.

We start our investigations by considering an NP-hard variant of a spanning tree problem where the choice of mutation operator affects the parameterized runtime. Specifically, the commonly used standard bit mutation operation results in XP runtime, whereas a mutation operator that creates feasible solutions produces FPT runtime.

The problem we consider is the maximum-leaf spanning tree problem, and we summarize the results given in [24]. Given an undirected, connected graph G = (V, E), the goal is to find a spanning tree of G such that the number of leaves is maximum.

The authors of [24] considered two simple evolutionary algorithms that differ in the choice of the mutation operator. The first algorithm uses a general mutation operator carrying out standard bit mutations, and the second is specific to spanning tree problems. Both algorithms start with an arbitrary spanning tree T of G. We denote by m the number of edges in G, and by $\ell(T)$ the number of leaves of the spanning tree T. A new solution is accepted only if it is a spanning tree whose number of leaves is at least as high as the number of leaves in the current solution. The algorithm called the Generic (1+1) EA is given in Algorithm 1.

Swapping an edge in the mutation step of the Generic (1+1) EA means that if an edge is present in T then it is not contained in $T'$ with probability 1/m. On the other hand, if an edge is not present in T then it is contained in $T'$ with probability 1/m. An edge does not change from T to $T'$ with probability $1 - 1/m$ in each mutation step, independently of the other edges.

The mutation operator of Algorithm 1 does not necessarily create an offspring that is a tree. If the offspring is not a tree, then this individual is discarded, as it represents an infeasible solution.
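One iteration of the generic approach can be sketched as follows. This is a minimal illustration of the prose description above, not a reproduction of Algorithm 1: the edge representation (sorted vertex pairs), the function names, and the union-find feasibility check are all assumptions made for this example.

```python
import random


def is_spanning_tree(n, edge_set):
    """Return True iff edge_set is a spanning tree on the vertices 0..n-1."""
    if len(edge_set) != n - 1:
        return False
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edge_set:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # the edge closes a cycle
        parent[ru] = rv
    return True  # n - 1 edges and no cycle: connected and acyclic


def count_leaves(n, edge_set):
    """Number of degree-one vertices in the given edge set."""
    degree = [0] * n
    for u, v in edge_set:
        degree[u] += 1
        degree[v] += 1
    return sum(1 for d in degree if d == 1)


def generic_step(n, edges, tree, rng):
    """Swap each edge independently with probability 1/m, then apply selection."""
    m = len(edges)
    offspring = set(tree)
    for e in edges:
        if rng.random() < 1.0 / m:
            offspring ^= {e}  # insert e if absent, delete it if present
    feasible = is_spanning_tree(n, offspring)
    if feasible and count_leaves(n, offspring) >= count_leaves(n, tree):
        return offspring
    return tree  # infeasible or worse offspring is discarded
```

Most offspring produced this way are not spanning trees, since a feasible offspring needs exactly as many insertions as deletions and must stay acyclic.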

The second algorithm we consider is called the Tree-Based (1+1) EA and is illustrated in Algorithm 2. This approach uses a problem-specific mutation operator that ensures valid solutions, i.e., spanning trees. It is well known that, given a spanning tree T , a new spanning tree can be created by introducing an edge and removing an edge from the resulting cycle. Mutation operators based on this idea are commonly used when applying evolutionary algorithms to NP-hard spanning tree problems.
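A single edge exchange of this kind can be sketched as below. The sketch covers only one exchange; how many exchanges Algorithm 2 performs per mutation step is not specified in this text and is left to the caller. The helper names and the representation of edges as sorted vertex pairs are assumptions for illustration.

```python
import random


def cycle_path(tree, u, v):
    """Return the edges on the unique u-v path in the spanning tree."""
    adj = {}
    for a, b in tree:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    parent = {u: None}
    stack = [u]
    while stack:  # depth-first search from u
        x = stack.pop()
        for y in adj.get(x, []):
            if y not in parent:
                parent[y] = x
                stack.append(y)
    path = []
    x = v
    while parent[x] is not None:  # walk back from v to u
        path.append(tuple(sorted((x, parent[x]))))
        x = parent[x]
    return path


def edge_exchange(edges, tree, rng):
    """Insert a random non-tree edge and delete a random edge of the cycle it closes."""
    non_tree = [e for e in edges if e not in tree]
    if not non_tree:
        return set(tree)  # the graph itself is a tree; nothing to exchange
    u, v = rng.choice(non_tree)
    removed = rng.choice(cycle_path(tree, u, v))
    return (set(tree) - {removed}) | {(u, v)}
```

Because the removed edge always lies on the cycle closed by the inserted edge, the result is again a spanning tree, so no offspring is wasted on infeasible solutions.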

Our goal is to point out the differences between the two algorithms. To do this, we compare the expected optimization time E[T ] of the two algorithms. This shows that the problem-specific mutation operator of Algorithm 2 makes the difference between a fixed-parameter evolutionary algorithm and an evolutionary algorithm that cannot compute an optimal solution in expected FPT time.

For the Generic (1+1) EA, the authors of [24] gave a lower bound which showed that the algorithm cannot solve the problem in FPT time. They considered the graph given in Fig. 1. The instance contains a local optimum whose distance to the global optimum is measured in the number of edges that have to be exchanged. The number of these edge exchanges depends on the number of nodes, r, the magnitude of which can be chosen to make it hard or easy to escape from the local optimum.

Figure 1: Local optimum, shown with dashed edges, and global optimum, shown with dotted edges; shared edges are drawn solid.

Formally, our graph, called $G_{\mathrm{loc}}$ (see Fig. 1), contains two components consisting of r vertices each. In component i, $1 \le i \le 2$, two vertices $u_i$ and $v_i$ are connected to all the other vertices in that component. The vertex $u_i$ is connected to vertex x, which lies outside the component. Similarly, vertex $v_i$ is connected to vertex y. In addition, x and y share an edge. The graph is completed by attaching a path of 2 vertices to the vertex x. A spanning tree has to contain all the edges of the path attached to x. In addition, at least one of the edges $\{u_i, x\}$ and $\{v_i, y\}$ has to be chosen for each i. For a given component, the number of possible leaves is at most $r - 1$. This can be obtained by attaching all nodes of the component either to $u_i$ or to $v_i$.

The graph $G_{\mathrm{loc}}$ contains a local optimum $T_{\mathrm{loc}}$, which consists of all edges attached to the vertices $v_i$, $1 \le i \le 2$, the edge {x, y}, and all path edges. The global optimum $T_{\mathrm{opt}}$ consists of all edges attached to the vertices $u_i$, $1 \le i \le 2$, the edge {x, y}, and all path edges. Compared with $T_{\mathrm{loc}}$, $T_{\mathrm{opt}}$ has an extra leaf, namely the vertex y. However, $T_{\mathrm{loc}}$ and $T_{\mathrm{opt}}$ differ by $4(r - 1)$ edges, which makes it hard for the algorithms under consideration to obtain $T_{\mathrm{opt}}$ if $T_{\mathrm{loc}}$ has been produced before.

$T_{\mathrm{loc}}$ can only be improved by swapping at least $2(r - 2)$ edges, as all nonsolid edges adjacent to at least one node $v_i$ need to be swapped to reach an improvement. As each bit corresponding to an edge of the graph is flipped with probability 1/m in the Generic (1+1) EA, the following lower bound on the expected optimization time of the Generic (1+1) EA is obtained.

Theorem 1. The expected optimization time of the Generic (1+1) EA on $G_{\mathrm{loc}}$ is lower bounded by $(m/c)^{2(r-2)}$, where c is an appropriate constant.

Using the same arguments, a lower bound of $((r - 2)/c)^{r-2}$, where c is an appropriate constant, has been given for the Tree-Based (1+1) EA. Again the bound considers the time to improve the locally optimal solution, which requires $r - 2$ edge exchanges. The mutation operator of the Tree-Based (1+1) EA has the benefit that a spanning tree is always created by introducing an edge and removing an edge from the resulting cycle, which results in a lower bound that is smaller than the one obtained for the Generic (1+1) EA. In terms of upper bounds, the Tree-Based (1+1) EA runs in FPT time when the value of an optimal solution k is the parameter.

The proof of the main result builds on the following lemma, which upper bounds the number of edges and the number of nodes of degree at least three as a function of k.

Lemma 2. Any connected graph G on n nodes and with a maximum number of k leaves in any spanning tree has at most $n + 5k^2 - 7k$ edges and at most $10k - 14$ nodes of degree at least three.

Each spanning tree has $n - 1$ edges, which implies that the number of edge exchanges needed to obtain a maximum-leaf spanning tree from any spanning tree is at most $n + 5k^2 - 7k - (n - 1) \le 5k^2$. Furthermore, a nonoptimal spanning tree can be improved by removing an edge of degree two from the cycle. The number of nodes of degree at least 3 is at most $10k - 14$, which gives a lower bound of $1/(20k)$ on the probability of removing an edge of degree two from the cycle.

The upper bound for the Tree-Based (1+1) EA is given in the following theorem, and the proof uses the arguments stated above.

Theorem 3. If the maximum number of leaf nodes in any spanning tree of G is k, then the Tree-Based (1+1) EA finds an optimal solution in expected time $O(2^{15 k^2 \log k})$.

4 Minimum Vertex Cover

The minimum vertex cover problem is an important classical NP-hard combinatorial optimization problem. Given an undirected connected graph G = (V, E), the task is to find a minimum set of vertices $V' \subseteq V$ such that each edge is covered by one of the chosen nodes, i.e., $e \cap V' \neq \emptyset$ holds for each $e \in E$. A set of vertices covering each edge is called a vertex cover.

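Using a binary variable $x_i \in \{0, 1\}$ for each vertex $v_i \in V$, the minimum vertex cover problem can be formulated as the following integer linear program (ILP):

\[ \min \sum_{i=1}^{n} x_i \quad \text{s.t.} \quad x_i + x_j \ge 1 \;\; \forall \{v_i, v_j\} \in E, \qquad x_i \in \{0, 1\} \;\; \forall i. \]

The linear program (LP) relaxation is obtained by relaxing the requirement $x_i \in \{0, 1\}$ to $x_i \in [0, 1]$.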

The vertex cover problem is the most prominent problem in the area of parameterized complexity. As stated before, this area usually deals with decision problems. In the case of the vertex cover problem, one asks whether a given graph G has a vertex cover of at most k nodes.

Earlier studies [16, 33] on the performance of the (1 + 1) EA have shown that this algorithm may get stuck in the smaller component of a complete bipartite graph when the two partitions have different sizes. Escaping this local optimum requires the algorithm to flip all bits belonging to the global optimum at once, and therefore has a waiting time of $\Omega(n^{\mathrm{OPT}})$, where OPT is the value of an optimal solution. Furthermore, if the two partitions $V_1$ and $V_2$ of the bipartite graph are extremely unbalanced, say $|V_1| = n^{\varepsilon}$ and $|V_2| = n - n^{\varepsilon}$, where $\varepsilon > 0$ is an arbitrarily small constant, then the approximation ratio achieved by getting stuck in a local optimum is only $(n - n^{\varepsilon})/n^{\varepsilon} = n^{1-\varepsilon} - 1$ and can therefore be made very close to the trivial approximation achieved by selecting all vertices of the given graph.

4.1 Global SEMO

We consider the search space $\{0, 1\}^n$, where each bit $x_i$ of a search point x corresponds to a vertex $v_i$ of the given graph G. The vertex $v_i$ is chosen in the solution x iff $x_i = 1$. The task is to find a solution x with a minimum number of vertices that covers all edges. This motivates us to introduce a fitness function based on the number of edges left uncovered by x.

We denote by E(x) the set of edges covered by the cover x, i.e., $E(x) := \{e \in E \mid e \cap V(x) \neq \emptyset\}$, where $V(x) := \{v_i \mid x_i = 1\}$ is the subset of vertices chosen by x.

Kratsch and Neumann [25] considered two fitness functions for minimum vertex cover. The first fitness function is $f_1(x) := (|x|_1, u(x))$, where $|x|_1 = |\{i \mid x_i = 1\}|$ corresponds to the number of chosen vertices and $u(x) := |E \setminus E(x)|$ is the number of edges left uncovered by x. Note that u(x) is useful for directing the search process towards a feasible solution, i.e., a solution x for which u(x) = 0 holds. This function had already been considered in [16] in the context of approximations.

In addition, the authors of [25] examined a second fitness function that uses additional information obtained from a linear program. Let $G(x) = (V, E \setminus E(x))$ be the graph obtained from G by removing all edges covered by nodes in x. We also consider the fitness function $f_2(x) := (|x|_1, LP(x))$, where LP(x) denotes the optimum value of the relaxed vertex cover ILP for G(x), i.e., the cost of an optimal fractional vertex cover of G(x).

The multiobjective approach uses the Global SEMO algorithm (see Algorithm 3). The algorithm starts with a bit string chosen uniformly at random. In each iteration, one individual x of the current population P is selected uniformly at random and undergoes standard bit mutation to produce an offspring $x'$. The offspring $x'$ is added to the population iff it is not strictly dominated by any other individual in P. In this case, all individuals in P that are (weakly) dominated by $x'$ are removed from P. We will examine Global SEMO for the minimum vertex cover problem in this section and for maximization in several different types of problem involving submodular functions in the next section.
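The loop just described can be sketched for the bi-objective vertex cover setting as follows. This is a minimal illustration rather than a transcription of Algorithm 3: the function names, the tuple representation of bit strings, and the use of the pair (number of chosen vertices, number of uncovered edges) as the minimized objective vector are assumptions made for the example.

```python
import random


def f1(x, edges):
    """Objective vector: (number of chosen vertices, number of uncovered edges)."""
    uncovered = sum(1 for u, v in edges if not (x[u] or x[v]))
    return (sum(x), uncovered)


def weakly_dominates(a, b):
    """a is at least as good as b in every minimized objective."""
    return all(ai <= bi for ai, bi in zip(a, b))


def strictly_dominates(a, b):
    """a weakly dominates b and differs from it in at least one objective."""
    return weakly_dominates(a, b) and a != b


def global_semo(n, fitness, iterations, rng):
    """Run Global SEMO with standard bit mutation; return the final population."""
    x = tuple(rng.randint(0, 1) for _ in range(n))
    population = {x: fitness(x)}
    for _ in range(iterations):
        parent = rng.choice(list(population))
        # Standard bit mutation: flip each bit independently with probability 1/n.
        child = tuple(b ^ 1 if rng.random() < 1.0 / n else b for b in parent)
        fc = fitness(child)
        if not any(strictly_dominates(fp, fc) for fp in population.values()):
            # Remove everything the child weakly dominates, then insert the child.
            population = {y: fy for y, fy in population.items()
                          if not weakly_dominates(fc, fy)}
            population[child] = fc
    return population
```

Since individuals with equal objective vectors weakly dominate each other, the population holds at most one individual per objective vector, which is what bounds its size in the analysis.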

When minimizing the number of uncovered edges and the number of chosen vertices at the same time, Global SEMO achieves an approximation to within a factor of O(log n) for the minimum vertex cover problem. These results may be generalized to the wider class of set cover problems. Kratsch and Neumann [25] have used a modification of Global SEMO (called Global SEMO$_{\mathrm{alt}}$) and shown that their approach computes an optimal solution in FPT time.

The results presented rely on an alternative mutation operator (see Algorithm 4) that has the ability to perform bit flips with a high probability if the corresponding node is adjacent to at least one uncovered edge (line 7 of Algorithm 4). This allows the algorithm to perform random sampling on the subgraph consisting of the uncovered edges. If this subgraph constitutes a kernel of the problem, the random sampling process is similar to a brute-force search on the kernel. We will summarize those results in the following.
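One plausible reading of such an operator is sketched below. Algorithm 4 itself is not reproduced in this text, so the details are assumptions: the fair coin that switches between the two modes, the flip probability of 1/2 for vertices adjacent to an uncovered edge, and the fallback probability 1/n for all other bits.

```python
import random


def alternative_mutation(x, edges, rng):
    """Mutate x, sampling aggressively on vertices that touch an uncovered edge."""
    n = len(x)
    uncovered = [(u, v) for u, v in edges if not (x[u] or x[v])]
    touched = {w for e in uncovered for w in e}
    # Assumed mode switch: with probability 1/2 focus on the uncovered subgraph,
    # otherwise behave like standard bit mutation.
    focused = bool(touched) and rng.random() < 0.5
    child = []
    for i, bit in enumerate(x):
        p = 0.5 if (focused and i in touched) else 1.0 / n
        child.append(bit ^ 1 if rng.random() < p else bit)
    return tuple(child)
```

In focused mode, every assignment to the touched vertices is produced with probability $2^{-t}$ for $t$ touched vertices, which is the sense in which the operator performs random sampling on the subgraph of uncovered edges.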

We outline the results for the algorithms introduced in this section, but should also mention that the vertex cover problem has been subject to further parameterized analyses in the context of randomized search heuristics. For example, the investigations of the vertex cover problem that we present in this section have been extended to the weighted vertex cover problem [35]. Gao et al. [18] have studied random initialization heuristics as well as local search algorithms in terms of parameterized complexity and approximation. Furthermore, the vertex cover problem has been analyzed in dynamic settings where edges can be removed from or added to the graph [34].

4.2 Parameterized Analysis

The first parameterized result in the context of optimal vertex covers considers Global SEMO$_{\mathrm{alt}}$ together with the objective function $f_1$, which uses the number of uncovered edges as the second objective. The population size of the algorithm is upper bounded by n + 1, as the main objective (number of chosen nodes) can only take on that many different values. The same upper bound on the population size applies when using $f_2$.

The first analysis relies on the following basic insight. Let OPT be the value of an optimal solution; then an optimal solution has to include all nodes of degree at least OPT + 1. This is based on the simple observation that if a node v of degree at least OPT + 1 is not selected, all neighbors of v have to be selected, resulting in a nonoptimal solution.
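The observation can be checked on small graphs with a brute-force sketch such as the following. The function names and the example graph are hypothetical, and the exhaustive enumeration is only meant for tiny instances.

```python
from itertools import combinations


def high_degree_vertices(n, edges, budget):
    """Vertices of degree at least budget + 1; each lies in every cover of size <= budget."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {v for v in range(n) if degree[v] >= budget + 1}


def minimum_vertex_covers(n, edges):
    """Enumerate all minimum vertex covers by exhaustive search (exponential time)."""
    for size in range(n + 1):
        covers = [set(c) for c in combinations(range(n), size)
                  if all(u in c or v in c for u, v in edges)]
        if covers:
            return covers
    return []


# Example graph: a star centered at vertex 0 plus the disjoint edge {4, 5}.
edges = [(0, 1), (0, 2), (0, 3), (4, 5)]
covers = minimum_vertex_covers(6, edges)
forced = high_degree_vertices(6, edges, len(covers[0]))
print(forced, covers)
```

Here the minimum cover size is 2, vertex 0 has degree 3, and it appears in every minimum cover.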

Theorem 4. The expected optimization time of Global SEMO$_{\mathrm{alt}}$ for the minimum vertex cover problem using the fitness function $f_1$ is upper bounded by $O(\mathrm{OPT} \cdot n^4 + n \cdot 2^{\mathrm{OPT}^2 + \mathrm{OPT}})$.

The proof of the theorem proceeds in several different phases. First, the expected time until the search point $0^n$ is included in the population is analyzed. The proof for this part focuses on selecting the individual with the smallest number of 1-bits, which happens with probability at least 1/(n + 1), as the number of different values for $|x|_1$ is at most n + 1. Producing a solution with a smaller number of 1-bits is always accepted, and the problem can be seen as maximizing the number of 0-bits, slowed down by a population of size at most n + 1. Hence, after an expected number of $O(n^2 \log n)$ steps of Global SEMO or Global SEMO$_{\mathrm{alt}}$ using $f_1$ or $f_2$, the search point $0^n$ is included in the population.

We now consider $f_1$ and assume that the search point $0^n$ is already included in the population. Subsequently, the expected number of steps where the population does not contain a solution x that is a kernel for the problem is upper bounded by $O(\mathrm{OPT} \cdot n^4)$. For $f_1$, a solution x is a kernel iff the vertices chosen by x constitute a subset of an optimal solution and the maximum degree of G(x) is at most OPT. In order to upper bound the number of steps where the population does not contain a solution x that is a kernel, a potential function with $O(\mathrm{OPT} \cdot n^2)$ different values is taken into account that measures the population with respect to the number of uncovered edges that its individuals have. It can be shown that the potential can always be improved with probability at least $\Omega(1/n^2)$ if no kernel is contained in the population. As the potential cannot increase, the expected number of steps where the population does not contain a kernel is $O(\mathrm{OPT} \cdot n^4)$.

Denoting by $\hat{x}$ the resulting kernel solution, the kernel instance $G(\hat{x})$ has at most $\mathrm{OPT}^2 + \mathrm{OPT}$ nonisolated nodes. In this case, the alternative mutation operator is able to produce the optimal solution from $\hat{x}$ in expected time $O(n \cdot 2^{\mathrm{OPT}^2 + \mathrm{OPT}})$. In this upper bound, the factor n accounts for selecting the individual $\hat{x}$ with probability at least 1/(n + 1) and the term $2^{\mathrm{OPT}^2 + \mathrm{OPT}}$ accounts for mutating this individual into an optimal solution. The exponential component of the runtime arises from the waiting time to make a lucky random jump, but this jump is now required only on a reasonably small kernel instance.

The runtime bound can be improved if the value of an optimal linear program LP(x) for the graph G(x) consisting only of the uncovered edges is used as the second criterion, leading to the fitness function $f_2$. The goal is to minimize the penalty LP(x), and we have LP(x) = 0 iff x is a vertex cover.

The analysis is based on the following result of Nemhauser and Trotter [31], who proved a very strong relation between optimal fractional vertex covers and minimum vertex covers.

Theorem 5. Let $x^*$ be an optimal fractional vertex cover and let $P_0, P_1 \subseteq V$ be the vertices whose corresponding components of $x^*$ are 0 or 1, respectively. Then there exists a minimum vertex cover that contains $P_1$ and no vertex of $P_0$.

Theorem 5 implies that one can take all vertices set to 1 in an optimal fractional vertex cover and reduce the size of the problem in this way. Furthermore, it is well known that every basic feasible solution x of the vertex cover LP relaxation is half-integral, i.e., we have $x_i \in \{0, 1/2, 1\}$ for $1 \le i \le n$ [4]. Using these properties, the following result has been shown.

Theorem 6. The expected optimization time of Global SEMO$_{\mathrm{alt}}$ for the minimum vertex cover problem using the fitness function $f_2$ is upper bounded by $O(n^2 \log n + \mathrm{OPT} \cdot n^2 + n \cdot 4^{\mathrm{OPT}})$.

We now explain the key ideas of the proof. We already know that the population contains the search point $0^n$ after an expected number of $O(n^2 \log n)$ steps. After $0^n$ has been included in the population, the number of steps where the population does not contain a kernel is investigated. For $f_2$, a solution x is a kernel iff $LP(x) = LP(0^n) - |x|_1$ and each optimal fractional vertex cover assigns 1/2 to each nonisolated vertex of G(x). The number of steps where P does not contain such a kernel x after $0^n$ has been included in the population can be bounded by $O(\mathrm{OPT} \cdot n^2)$ using the following arguments. Solutions with objective value $(r, LP(0^n) - r)$ are Pareto optimal. The proof proceeds by considering the solution x with objective vector $(r, LP(0^n) - r)$ and the largest value of r in the population. If x is not a kernel, then x can be chosen for mutation with a probability of at least 1/(n + 1) and one specific bit can be flipped with a probability of at least 1/(en) to produce a Pareto-optimal offspring with objective vector $(r + 1, LP(0^n) - r - 1)$. As the value of the LP is upper bounded by OPT, at most OPT of such steps can happen. This upper bounds the number of additional steps (after $0^n$ has been included in the population) by $O(\mathrm{OPT} \cdot n^2)$.

Let $\hat{x}$ be the kernel with objective vector $(r, LP(0^n) - r)$, where r is the maximum such that all nonisolated vertices of $G(\hat{x})$ obtain a value of 1/2 in each optimal fractional vertex cover. Then $G(\hat{x})$ has at most $2(\mathrm{OPT} - r)$ nonisolated vertices, as the vertices that are chosen belong to an optimal solution and every nonisolated vertex contributes 1/2 to the LP value. The expected time to produce an optimal solution after a kernel $\hat{x}$ has been included in the population is $O(n \cdot 4^{\mathrm{OPT} - r}) = O(n \cdot 4^{\mathrm{OPT}})$, as the optimal solution can be obtained by choosing $\hat{x}$ for mutation and flipping exactly the bits corresponding to the nonisolated nodes of an optimal solution while not flipping the remaining bits.

Kratsch and Neumann have also given the following trade-off results with respect to runtime and approximation. These results show the previous FPT time bound (= 0), as well as that Global SEMOachieves a 2-approximation (= 1) in expected polynomial time.

Theorem 7. Using the fitness function , the expected number of iterations of Global SEMOuntil it has generated a (1 + )-approximate vertex cover, i.e., a solution of fitness (r, 0) with (1 + , is log + ).

The proof of Theorem 7 uses the same kernelization arguments as the proof of Theorem 6. Once a solution $\hat{x}$ that is a kernel of the problem has been produced, it is shown that if $\hat{x}$ is selected for mutation, then it mutates with probability $\Omega(4^{-(1-\epsilon) \cdot OPT})$ into a solution x for which

$$|x|_1 + 2 \cdot LP(x) \le (1 + \epsilon) \cdot OPT$$

holds. Such a solution can be turned into a vertex cover by single mutation steps that reduce LP(x) by at least 1/2 while increasing the size of the vertex cover by one. Each such step leaves $|x|_1 + 2 \cdot LP(x)$ no larger than before, leading to a vertex cover of size at most $(1 + \epsilon) \cdot OPT$.

5 Submodular Functions with Constraints

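Submodular functions constitute a broad class of interesting problems. A function $f\colon 2^X \to \mathbb{R}^+$ on a ground set X is submodular iff $f(A \cup B) + f(A \cap B) \le f(A) + f(B)$ for all $A, B \subseteq X$. In the context of optimizing a submodular function f, we will often consider the incremental value of adding a single element, leading to an equivalent definition. We denote by $F_i(A) = f(A \cup \{i\}) - f(A)$ the marginal value of i with respect to A. A function f is submodular iff $F_i(A) \ge F_i(B)$ for all $A \subseteq B \subseteq X$ and $i \in X \setminus B$.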
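The diminishing-returns characterization can be checked by brute force on tiny ground sets. The following is a minimal sketch, not part of the cited analyses: the names `coverage` and `is_submodular` are hypothetical, the ground set is assumed to be {0, ..., n-1}, and the check is exponential in n.

```python
from itertools import combinations

def coverage(sets, chosen):
    """Coverage function: number of universe elements covered by the chosen sets."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)

def is_submodular(f, n):
    """Check F_i(A) >= F_i(B) for all A subset of B and i not in B (brute force)."""
    ground = range(n)
    subsets = [frozenset(c) for size in range(n + 1)
               for c in combinations(ground, size)]
    for b in subsets:
        for a in subsets:
            if not a <= b:
                continue
            for i in ground:
                if i in b:
                    continue
                # marginal value of i must not grow when the base set grows
                if f(a | {i}) - f(a) < f(b | {i}) - f(b):
                    return False
    return True

sets = [{1, 2}, {2, 3}, {3, 4, 5}]
print(is_submodular(lambda A: coverage(sets, A), 3))  # coverage functions are submodular
print(is_submodular(lambda A: len(A) ** 2, 3))        # marginal values grow, so not submodular
```

Coverage functions are a standard example of monotone submodular functions, which is why one is used here.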

We consider the problem of maximizing a given submodular function f. The problem is NP-hard, as it generalizes many NP-hard combinatorial optimization problems, such as maximum cut [19, 14] and several others [1, 7, 21, 14]. The class of submodular functions also includes the class of linear functions, which have been well studied in the theory of evolutionary computation. Friedrich and Neumann [17] have analyzed the maximization of submodular functions with different constraints and carried out runtime analyses depending on the parameters of the given constraint. We will summarize the results in this section.

Friedrich and Neumann considered the maximization of a given submodular function f under a given set of matroid constraints. A matroid is a pair (X, I) composed of a ground set X and a nonempty collection I of subsets of X satisfying (1) if $A \subseteq B$ and $B \in I$ then $A \in I$, and (2) if $A, B \in I$ and |A| > |B| then $B \cup \{x\} \in I$ for some $x \in A \setminus B$. The sets in I are called independent, and the rank of a matroid is the size of any maximal independent set. We will consider several different classes of submodular functions together with different types of matroid constraints.

Friedrich and Neumann analyzed the (1 + 1) EA and Global SEMO as baseline algorithms. For the (1 + 1) EA, the fitness function h(x) = (v(x), f(x)) was considered. Here, v(x) measures the constraint violation of x. Generalizing the fitness function used by Reichel and Skutella [37] for the intersection of two matroids, they considered problems with k matroid constraints $M_1, \ldots, M_k$ and set

$$v(x) = \sum_{j=1}^{k} \left( |x|_1 - r_{M_j}(x) \right),$$

where $r_{M_j}(x)$ denotes the rank of x in matroid $M_j = (X, I_j)$, i.e.,

$$r_{M_j}(x) = \max\{ |Y| : Y \subseteq X_x,\ Y \in I_j \}$$

for the set $X_x \subseteq X$ given by x. We have v(x) = 0 iff x is a feasible solution and v(x) > 0 otherwise. The function h(x) is optimized in lexicographic order, i.e.,

$$h(x) \ge h(y) \iff v(x) < v(y) \ \lor\ \big( v(x) = v(y) \land f(x) \ge f(y) \big).$$
We denote by F the set of feasible solutions. For Global SEMO, Friedrich and Neumann set $z(x) = f(x)$ iff $x \in F$ and $z(x) = -1$ iff $x \notin F$, and considered the multiobjective problem $g(x) := (z(x), |x|_0)$, where $|x|_0 = n - |x|_1$ denotes the number of 0-bits in the given bit string x. Adding the number of 0-bits as the second objective to be maximized forces the empty set to be Pareto optimal, and allows the algorithm to construct solutions greedily.
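To make the penalty concrete, the sketch below evaluates a rank-deficiency violation and the lexicographic comparison for partition matroids, whose rank has a simple closed form. This is an illustration under stated assumptions: the helper names are hypothetical, each matroid is given as a list of disjoint parts with capacities, and the violation is taken to be the sum over matroids of $|x|_1$ minus the rank.

```python
def partition_rank(parts, caps, x):
    """Rank of the set selected by bit string x in a partition matroid:
    each part contributes at most its capacity."""
    return sum(min(cap, sum(x[e] for e in part)) for part, cap in zip(parts, caps))

def violation(matroids, x):
    """v(x): summed rank deficiency of x over all matroids (0 iff x is feasible)."""
    ones = sum(x)
    return sum(ones - partition_rank(parts, caps, x) for parts, caps in matroids)

def h_at_least(vx, fx, vy, fy):
    """Lexicographic order on h = (v, f): smaller violation first, then larger f."""
    return vx < vy or (vx == vy and fx >= fy)

# Two partition matroids over the ground set {0, 1, 2, 3}.
matroids = [
    ([[0, 1], [2, 3]], [1, 1]),
    ([[0, 2], [1, 3]], [1, 2]),
]
print(violation(matroids, (1, 1, 0, 0)))  # infeasible in the first matroid
print(violation(matroids, (1, 0, 0, 1)))  # feasible in both
```

Any infeasible solution compares worse than any feasible one under `h_at_least`, regardless of its f-value, which is what drives the (1 + 1) EA into the feasible region first.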

5.1 Monotone Functions with Uniform Constraints

We now summarize the results for the special class of monotone submodular functions under one uniform matroid constraint. A function f is monotone iff $f(A) \le f(B)$ for all $A \subseteq B$. A uniform matroid constraint of size r means that a set is feasible iff it consists of at most r elements, i.e., $F = \{x : |x|_1 \le r\}$.

A key property of Global SEMO that is often employed in theoretical analysis is that it constructs solutions in a manner similar to a greedy algorithm. Furthermore, the population size can be bounded by n + 1, as the number of different objective values for the second objective is n + 1. This implies that one particular individual that is needed for the analysis is selected with probability Ω(1/n). The algorithm removes elements in order to maximize the number of zeros. Using the number of zeros as the second objective implies that the algorithm maintains a population where the solution with the smallest number of elements is never removed. Furthermore, each solution that has a smaller number of selected elements than the solutions previously found is included in the population. Eventually, this leads to a population which includes the solution consisting of the empty set. In terms of the first objective (the overall goal function), the algorithm tries to maximize its objective value in a greedy manner. It does so by adding elements that provide the largest benefit to a current solution. Putting these arguments together, the following approximation result can be obtained for Global SEMO and the maximization of monotone submodular functions with a uniform constraint.

Theorem 8. The expected time until Global SEMO has obtained a $(1 - 1/e)$-approximation for a monotone submodular function f under a uniform constraint of size r is $O(n^2(\log n + r))$.

The proof of the theorem uses the fact that the population size is always bounded by n + 1 and therefore one particular individual is selected with probability at least 1/(n + 1) in each step. The first phase of the proof shows that the empty set, represented by the bit string $0^n$, is included in the population in expected time $O(n^2 \log n)$. Similarly to the analysis for vertex cover in the previous section, this bound is obtained by considering the factor O(n) for the population size and bounds on a coupon collector process for maximizing the number of 0-bits. The $O(n^2 r)$ term accounts for the greedy process, in which the correct individual in the population is selected with probability Ω(1/n) and the appropriate greedy step is applied to this individual with probability Ω(1/n). There are at most r of these steps, as no more than r elements can be inserted owing to the given constraint. The approximation ratio follows from the greedy process.
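The mechanics used in this argument can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the implementation analyzed in [17]: f is a coverage function, the objective vector is (z(x), number of 0-bits) with z(x) = -1 for infeasible x, parents are chosen uniformly, and each bit is flipped independently with probability 1/n. All function names are hypothetical.

```python
import random

def coverage(sets, x):
    """f(x): number of universe elements covered by the sets selected in x."""
    covered = set()
    for chosen, s in zip(x, sets):
        if chosen:
            covered |= s
    return len(covered)

def objectives(sets, r, x):
    """g(x) = (z(x), |x|_0) with z(x) = f(x) if |x|_1 <= r and -1 otherwise."""
    ones = sum(x)
    z = coverage(sets, x) if ones <= r else -1
    return (z, len(x) - ones)

def weakly_dominates(a, b):
    """Both objectives are maximized."""
    return a[0] >= b[0] and a[1] >= b[1]

def global_semo(sets, r, iterations, rng):
    n = len(sets)
    x = tuple(rng.randint(0, 1) for _ in range(n))
    pop = {x: objectives(sets, r, x)}
    for _ in range(iterations):
        parent = rng.choice(list(pop))
        child = tuple(b ^ (rng.random() < 1.0 / n) for b in parent)
        g = objectives(sets, r, child)
        # discard the child if some population member strictly dominates it
        if any(weakly_dominates(h, g) and h != g for h in pop.values()):
            continue
        # otherwise remove everything the child weakly dominates and insert it
        pop = {y: h for y, h in pop.items() if not weakly_dominates(g, h)}
        pop[child] = g
    return pop

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}, {7}]
pop = global_semo(sets, r=2, iterations=2000, rng=random.Random(1))
print(len(pop), max(z for z, _ in pop.values()))
```

Because the surviving objective vectors are pairwise incomparable, at most one individual survives per value of the second objective, which gives the n + 1 bound on the population size used in the proof.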

5.2 Monotone Submodular Functions under Matroid Constraints

Now we take a look at more complex problems. Again we consider monotone submodular functions, but with k matroid constraints. The algorithm that we consider is the (1 + 1) EA. The number k of matroid constraints is the important parameter, and it determines the approximation ratio that is achieved, as well as the exponent of the runtime. Furthermore, there is a parameter $p \ge 1$ that allows, for a fixed value of k, a trade-off between the approximation quality and the runtime of the algorithm.

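Theorem 9. For any integers $k \ge 2$, $p \ge 1$ and a real value $\epsilon > 0$, the expected time until the (1 + 1) EA has obtained a $(1/(k + 1/p + \epsilon))$-approximation for any monotone submodular function f under k matroid constraints is $O\left(\frac{1}{\epsilon} \cdot n^{2p(k+1)+1} \cdot k \cdot \log n\right)$.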

We summarize the main ideas of the proof here. The first part of the proof consists of showing that the algorithm reaches a feasible solution x with $f(x) \ge OPT/n$. The expected time until the (1 + 1) EA has obtained such a solution can be upper bounded by $O(kn(\log k + \log n) + n^{k+1})$. To attain this bound, the proof first argues that the (1 + 1) EA obtains a feasible solution in expected time $O(kn(\log k + \log n))$ by using the fitness level method applied to the value of the penalty v(x). Afterwards, it is shown that, from any feasible solution x, a feasible solution y with $f(y) \ge OPT/n$ can be obtained by flipping k + 1 specific bits. The expected waiting time for this event is $O(n^{k+1})$.

A p-exchange operation applied to the current solution x introduces at most 2p new elements and deletes at most 2kp elements of x. A solution y that can be obtained from x by a p-exchange operation is called a p-exchange neighbor of x. According to [27], every solution x for which there exists no p-exchange neighbor y with $f(y) \ge (1 + \epsilon/(nk)) \cdot f(x)$ is a $(1/(k + 1/p + \epsilon))$-approximation for any monotone submodular function. The proof therefore first analyzes the time until a feasible solution has been obtained. Afterwards, it uses the fact that an improving p-exchange neighbor exists unless the desired approximation quality has already been obtained.

5.3 Symmetric Submodular Functions under Matroid Constraints

We now summarize the main result for Global SEMO for the optimization of symmetric submodular functions under k matroid constraints. The following theorem makes use of the greedy and local search ability that the algorithm Global SEMO has.

Theorem 10. The expected number of iterations until Global SEMO attains a $\frac{1}{(k+2)(1+\epsilon)}$-approximation for any symmetric submodular function under k matroid constraints is $O\left(\frac{1}{\epsilon} n^{k+6} \log n\right)$, for any constant $\epsilon > 0$.

The analysis makes use of the following result in [26], which shows that there are always locally improving steps as long as the desired approximation quality has not been obtained.

Lemma 11. Let x be a solution such that no solution with fitness at least $(1 + \epsilon/n^4) \cdot f(x)$ can be achieved by deleting one element or by inserting one element and deleting at most k elements. Then x is a $\frac{1}{(k+2)(1+\epsilon)}$-approximation.

The proof of Theorem 10 uses this lemma together with the fact that Global SEMO introduces the search point $0^n$ into the population after an expected number of $O(n^2 \log n)$ steps. As the search point $0^n$ is Pareto optimal, it stays in the population once it has been introduced. Selecting $0^n$ for mutation and inserting the element that leads to the largest increase in the f-value produces a solution y with $f(y) \ge OPT/n$. The reason for this is that the number of elements is limited by n and that f is submodular. Global SEMO will also always have a solution with the largest f-value obtained so far in the population. Selecting this solution x for mutation and flipping at most k + 1 specific bits according to Lemma 11 produces a solution y with $f(y) \ge (1 + \epsilon/n^4) \cdot f(x)$ as long as x does not yet have the desired approximation quality. The expected waiting time for this event is $O(n^{k+2})$, as at most k + 1 specific bits of x have to be flipped and the population size is at most n + 1.

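The number of steps that improve the solution with the largest f-value needed in order to achieve the desired approximation is upper bounded by

$$\log_{1 + \epsilon/n^4} \frac{OPT}{OPT/n} = O\left(\frac{1}{\epsilon} n^4 \log n\right),$$

which implies that the expected time to achieve a $\frac{1}{(k+2)(1+\epsilon)}$-approximation is $O\left(\frac{1}{\epsilon} n^{k+6} \log n\right)$.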

6 Euclidean TSP

Given a set V of n points in the plane, the objective of the Euclidean TSP is to find a permutation x on V that minimizes the cost function

$$f(x) = \sum_{i=1}^{n} d(x(i), x(i+1)),$$

where $d(u, v)$ denotes the Euclidean distance separating the points u and v, and index arithmetic is taken to be modulo n. The Euclidean TSP is NP-hard, but can be approximated to within a factor $(1 + \epsilon)$ for every fixed $\epsilon > 0$ in polynomial time [2].

It is convenient to consider the complete undirected graph G = (V, E) and define the Hamiltonian cycle induced by the edges followed by a given permutation x:

$$C(x) = \{ \{x(i), x(i+1)\} : i \in \{1, \ldots, n\} \}.$$

We will refer to the cycle $C(x)$ as a tour.
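As a small illustration of the cost of a tour, the sketch below sums the Euclidean lengths of the edges followed by a permutation, including the closing edge. The function name `tour_cost` and the representation (a list of coordinate pairs plus a list of 0-based indices) are assumptions made for the example.

```python
import math

def tour_cost(points, perm):
    """Sum of Euclidean distances between consecutive points of the tour,
    including the closing edge back to the first point (indices mod n)."""
    n = len(perm)
    return sum(math.dist(points[perm[i]], points[perm[(i + 1) % n]])
               for i in range(n))

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(tour_cost(square, [0, 1, 2, 3]))  # boundary tour of the unit square
print(tour_cost(square, [0, 2, 1, 3]))  # tour using both diagonals
```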

Iterative improvement methods rely on the iterated exchange of a small number of edges and are powerful approaches for solving large-scale TSP instances in practice. These heuristics move through the space of candidate solutions by repeatedly applying move or mutation operators to pivot between tours. For the TSP, this is typically some variant of the k-opt operation. The k-opt move takes a candidate tour, deletes k mutually disjoint edges, and reassembles the remaining fragments into a new valid tour. The operation induces a neighborhood structure on the search space of tours, and thus serves as a strong and easy-to-implement local search operator. However, instances exist where this approach is provably inefficient. For example, local search algorithms employing a k-opt neighborhood operator can take exponential time even to find a locally optimal solution [6]. This holds even for the Euclidean case [13].

The convex hull of V is the smallest convex set containing V. A point $v \in V$ is called an inner point if v lies in the interior of the convex hull of V. We denote by Inn(V) the set of inner points of V, and define Out(V) := V \ Inn(V). The TSP parameterized by k = |Inn(V)| is in FPT. Specifically, Deĭneko et al. [9] showed that if a Euclidean TSP instance with n vertices has k vertices interior to the convex hull, there is a dynamic programming FPT algorithm. Other parameterizations are not as propitious; for example, finding a local optimum in the k-opt neighborhood for the metric TSP is hard for W[1] [28]. The class FPT is contained in W[1], but the containment is conjectured to be proper [15], in which case no such FPT algorithm can exist.
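The parameter k = |Inn(V)| is easy to compute for a concrete instance. The sketch below uses Andrew's monotone chain algorithm, chosen here only for brevity and not taken from the source; the function names are hypothetical. It assumes that no three points are collinear, since collinear boundary points would otherwise be reported as inner points.

```python
def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Hull vertices in counterclockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inner_points(points):
    """Inn(V): points that are not vertices of the convex hull."""
    hull = set(convex_hull(points))
    return [p for p in points if p not in hull]

V = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(convex_hull(V))   # cyclic order of Out(V)
print(inner_points(V))  # Inn(V), so k = 1 here
```

The hull order returned by `convex_hull` is the kind of cyclic ordering of Out(V) that the problem-specific algorithms of Section 6.2 assume to be available.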

Parameterized results for evolutionary algorithms for the Euclidean TSP have been developed in a series of papers [40, 29, 30, 41] in the context of the inner-point parameterization of Deĭneko et al. [9]. We also mention that the generalized traveling salesperson problem has been investigated in the context of parameterized complexity. In this problem, the cities belong to different clusters and the goal is to compute a shortest tour that visits each cluster exactly once. For details of the generalized TSP, we refer the interested reader to Corus et al. [8].

The remainder of this section sketches these results, starting with the setting in which the algorithm is oblivious to problem-specific information (other than the cost of a tour) and ending with algorithms that exploit problem-specific structure.

6.1 Black-Box Algorithms

In the black-box setting, heuristics are not allowed any access to domain-specific knowledge about the instance other than the cost of a tour. For Euclidean TSP instances with k = |Inn(V)| inner points, it is possible to show that the $(\mu + \lambda)$ EA generates an optimal solution in slicewise polynomial time (that is, in time $O(n^{g(k)})$, where g depends only on k). Later, in Section 6.2, we will discuss how it is possible to improve this to FPT time when domain knowledge is incorporated into the design of the algorithm.

The 2-opt operator mentioned above corresponds to segment reversal in the linear form of the corresponding tour permutation. We refer to the 2-opt operation as the inversion operation and illustrate it in Fig. 2. We consider random local search (RLS), defined in Algorithm 5, and the $(\mu + \lambda)$ EA, defined in Algorithm 6. Note that RLS maintains a population of size one, and performs exactly one inversion operation in each iteration. On the other hand, the $(\mu + \lambda)$ EA maintains a population of $\mu$ permutations and produces $\lambda$ offspring in each generation by applying Poisson mutation (see Function mutate).

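Definition 12. The inversion operation $\sigma^{I}_{ij}$ transforms permutations into one another by segment reversal in their linear forms. A permutation x is transformed into a permutation $\sigma^{I}_{ij}[x]$ by inverting the subsequence of the linear form of x from position i to position j, where $1 \le i < j \le n$:

$$x = (x(1), \ldots, x(i-1), x(i), x(i+1), \ldots, x(j-1), x(j), x(j+1), \ldots, x(n)),$$
$$\sigma^{I}_{ij}[x] = (x(1), \ldots, x(i-1), x(j), x(j-1), \ldots, x(i+1), x(i), x(j+1), \ldots, x(n)).$$
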
Figure 2: The effect of the inversion operation on a tour. Inverting a subsequence in the permutation representation corresponds to a 2-opt move in which a pair of edges in the current tour is replaced by a pair of edges not in the tour.

We also consider the permutation jump operator studied by Scharnow, Tinnefeld, and Wegener [38] in the context of sorting problems.

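Definition 13. The jump operation $\sigma^{J}_{ij}$ transforms permutations into one another by position shifts in their linear form. A permutation x is transformed into a permutation $\sigma^{J}_{ij}[x]$ by moving the element in position i in the linear form of x into position j in the linear form of $\sigma^{J}_{ij}[x]$, while the other elements between position i and position j are shifted in the appropriate direction. Without loss of generality, suppose i < j. Then,

$$x = (x(1), \ldots, x(i-1), x(i), x(i+1), \ldots, x(j-1), x(j), x(j+1), \ldots, x(n)),$$
$$\sigma^{J}_{ij}[x] = (x(1), \ldots, x(i-1), x(i+1), \ldots, x(j-1), x(j), x(i), x(j+1), \ldots, x(n)).$$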
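Both operators act on the linear form of a permutation and can be sketched in a few lines. The function names `inversion` and `jump` are hypothetical; positions are 1-based to match the definitions, and the input list is left unmodified.

```python
def inversion(x, i, j):
    """Reverse the segment of x between positions i and j (1-based, inclusive)."""
    i, j = min(i, j), max(i, j)
    return x[:i - 1] + x[i - 1:j][::-1] + x[j:]

def jump(x, i, j):
    """Move the element at position i to position j (1-based); the elements
    in between are shifted by one position towards i."""
    y = list(x)
    y.insert(j - 1, y.pop(i - 1))
    return y

x = [1, 2, 3, 4, 5, 6]
print(inversion(x, 2, 5))  # segment 2..5 reversed
print(jump(x, 2, 5))       # element 2 moved right to position 5
print(jump(x, 5, 2))       # element 5 moved left to position 2
```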

Every tour corresponds to a set of edges that describe a closed polygon in the plane. If V is noncollinear (no three points are collinear), the vertices on the boundary of the convex hull of V appear in their cyclic order in a minimum-cost tour, and no two edges of such a tour intersect [36]. When a tour contains a pair of edges that intersect at a point p, those edges form the diagonals of a convex quadrilateral. Together with the sides of this quadrilateral, they describe nondegenerate triangles in the Euclidean plane. Thus, as long as no three points are collinear, the triangle inequality is strict, and removing the intersecting edges and replacing them with the corresponding nonintersecting edges results in a strictly shorter tour. This is illustrated in Fig. 3.

Figure 3: Removing the intersecting edges (s, t) and (u, v) and reconnecting the two disconnected tour path segments with edges (s, v) and (u, t) results in a strictly shorter tour.
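The uncrossing step can be tried out on a tiny instance. The following sketch finds one pair of properly intersecting tour edges and removes it by a segment reversal. It assumes that no three points are collinear, and all function names are hypothetical.

```python
import math

def tour_cost(points, order):
    n = len(order)
    return sum(math.dist(points[order[i]], points[order[(i + 1) % n]])
               for i in range(n))

def orient(a, b, c):
    """Sign tells on which side of the line through a and b the point c lies."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def properly_cross(p, q, r, s):
    """True if segments pq and rs intersect in a point interior to both."""
    return (orient(p, q, r) * orient(p, q, s) < 0 and
            orient(r, s, p) * orient(r, s, q) < 0)

def remove_one_crossing(points, order):
    """Return a tour with one crossing removed, or the same tour if none exists."""
    n = len(order)
    for a in range(n):
        for b in range(a + 2, n):
            if a == 0 and b == n - 1:
                continue  # the first and last edge share a vertex
            p, q = points[order[a]], points[order[a + 1]]
            r, s = points[order[b]], points[order[(b + 1) % n]]
            if properly_cross(p, q, r, s):
                # 2-opt move: reverse the segment between the crossing edges
                return order[:a + 1] + order[a + 1:b + 1][::-1] + order[b + 1:]
    return order

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
crossed = [0, 2, 1, 3]  # uses both diagonals of the square
fixed = remove_one_crossing(square, crossed)
print(fixed, tour_cost(square, crossed), tour_cost(square, fixed))
```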

6.1.1 Avoiding Arbitrarily Small Improvements

Worst-case proofs for 2-opt on the TSP exploit the fact that when points are allowed in arbitrary positions, the smallest change in fitness between neighboring solutions can be made arbitrarily small [13]. This allows the possibility of exponential-length paths between a candidate solution and a reachable local optimum. Sutton and Neumann [40] circumvented this by imposing bounds on the angles between points. A set of points V is angle-bounded by $\epsilon$ for some $0 < \epsilon < \pi/2$ if, for any three points $u, v, w \in V$, $0 < \epsilon < \theta < \pi - \epsilon$, where $\theta$ denotes the angle formed by the line from u to v and the line from v to w. Under this condition, the runtime bound depends on the angle bound $\epsilon$, and so we may consider it as an additional parameterization of the instance. This is also applicable to the class of TSP instances whose points are embedded in an $m \times m$ grid (with the further restriction that no three points are collinear). This kind of quantization can result when the coordinates of each point are rounded to the nearest value in a set of m equidistant values. In these cases, the changes in cost between neighboring solutions can be bounded from below, avoiding exponentially long improvement chains to reach a local optimum.

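Definition 14. Let V be a set of points angle-bounded by $\epsilon$. We define

$$A(\epsilon) = \frac{d_{\max}}{d_{\min}} \cdot \frac{\cos \epsilon}{1 - \cos \epsilon},$$

where $d_{\max}$ and $d_{\min}$ denote the maximum and minimum Euclidean distances, respectively, between points in V.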

Quantized instances yield a more meaningful interpretation of $A(\epsilon)$, as is captured by the following proposition.

Proposition 15. Let V be a set of points embedded in an $m \times m$ grid with no three points collinear. Then V is angle-bounded by $\epsilon$ such that $A(\epsilon) = O(m^5)$.

Proposition 15 follows from Definition 14 and the fact that V is angle-bounded by $\arctan(1/(2(m-2)^2))$ and $d_{\max}/d_{\min} = O(m)$.

6.1.2 Instances in Convex Position

A set of points V is in convex position when $Inn(V) = \emptyset$. In this case, we must wait only for the process to remove all intersecting edges. Upper bounds on the time until RLS and the $(\mu + \lambda)$ EA have removed all such edges (and thus produced an optimal tour) can be expressed as a function of the angle-bounding function A. More conveniently, when an instance is embedded in an $m \times m$ grid, both processes can solve the instance in time polynomial in both n and m.

Theorem 16. Let V be a set of planar points in convex position angle-bounded by $\epsilon$. The expected time for RLS to solve the TSP on V is $O(n^3 \cdot A(\epsilon))$, where A is as defined in Definition 14.

The proof of Theorem 16 relies on the fact that any 2-opt move that replaces a pair of intersecting edges with a pair of nonintersecting edges in an angle-bounded instance results in an improvement of the tour by at least

$$2\, d_{\min} \cdot \frac{1 - \cos \epsilon}{\cos \epsilon}. \qquad (6.2)$$

Any pair of intersecting edges can be removed with a particular 2-opt operation (each of which occurs with probability $\Omega(n^{-2})$), and thus we can derive a straightforward bound on the waiting time until all such intersections have been removed.

Theorem 17. Let V be a set of planar points in convex position angle-bounded by $\epsilon$. The expected number of fitness evaluations needed by the $(\mu + \lambda)$ EA using 2-opt mutation to solve the TSP on V is bounded from above by $O(n \cdot A(\epsilon) \cdot \max\{\lambda, \mu n^2\})$, where A is as defined in Definition 14.

The proof of Theorem 17 is similar to the proof of Theorem 16, except that we must account for any slowdown incurred by selecting from a population. Specifically, the probability that at least one of the $\lambda$ offspring improves on the current best-so-far point is at least $1 - (1 - p)^{\lambda}$ with $p = \Omega(1/(\mu n^2))$. When $\lambda \ge \mu n(n-1)/2$, an intersection is removed with constant probability in each generation and we must wait only $O(n \cdot A(\epsilon))$ generations to find an intersection-free tour (owing to the improvement guarantee from (6.2)). On the other hand, when $\lambda < \mu n(n-1)/2$, the improvement probability can be as low as $\Omega(\lambda/(\mu n^2))$. The runtime bound follows by accounting for this and the extra fitness evaluations that need to occur in each generation.

6.1.3 Bounded Number of Inner Points

The polynomial-time results on angle-bounded instances in convex position raise the question of what kind of influence the number of inner points can have on the running time of the above-mentioned algorithms. In this section, we discuss how the Euclidean TSP parameterized by the number of inner points can be solved in slicewise polynomial time in the black-box setting.

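Theorem 18. Let V be a set of points angle-bounded by $\epsilon$ such that |Inn(V)| = k. The expected number of fitness evaluations needed for the $(\mu + \lambda)$ EA using 2-opt mutation to solve the TSP on V is bounded from above by

$$O\left(n \cdot A(\epsilon) \cdot \max\{\lambda, \mu n^2\} + \mu \cdot n^{4k} (2k-1)!\right),$$

and the expected optimization time for the (1 + 1) EA is

$$O\left(n^3 \cdot A(\epsilon) + n^{4k} (2k-1)!\right).$$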
Theorem 18 can be proved by partitioning the time the $(\mu + \lambda)$ EA spends into time on tours that contain intersections and time on tours that do not. In particular, let $x_t$ be the best-so-far tour found by generation t of the $(\mu + \lambda)$ EA. If $x_t$ contains a pair of intersecting edges, the probability of the EA creating a strictly improving tour via a 2-opt mutation on $x_t$ is bounded from below. Moreover, the angle-boundedness of the instance guarantees an additional lower bound on the amount of actual fitness improvement when such a mutation occurs. Hence, the total expected time that the process spends on tours with intersecting edges is bounded as in Theorem 17.

In the case where the best-so-far tour $x_t$ contains no intersecting edges, the vertices on the boundary of the convex hull must appear in $x_t$ in their correct cyclic order for a minimum-cost tour [36]. An optimal tour can then be produced from $x_t$ by rearranging the points in Inn(V) to the correct positions. Poisson mutation (see Function mutate) is capable of performing this rearrangement by selecting at most 2|Inn(V)| = 2k specific inversion operations. This occurs with probability at least

$$\Omega\left(\frac{1}{n^{4k} (2k-1)!}\right),$$
which yields a simple upper bound on the waiting time to jump from an intersection-free tour to an optimal solution. The claim then follows by carefully accounting for the correct parent selection probabilities and summing the bounds on the expected time spent on tours with intersections and nonoptimal intersection-free tours.

6.1.4 Mixed-Mutation Strategies

The proofs of the theorems in the preceding sections rely on the inversion operator to construct an intersection-free tour, and then rely on the inversion operator again to simulate jump operations in order to transform the intersection-free tour into an optimal solution. The analysis can be improved by relying on a mixed-mutation strategy (see Function mixed-mutation) that performs a mixture of both inversion and jump operations, each with constant probability. This improves the upper bound on the running time by a factor of $\Omega(n^{2k} (2k-1)!/(k-1)!)$.

y ← x;

draw r from a uniform distribution on the interval [0, 1];

draw s from a Poisson distribution with unit expectation;

if r < 1/2 then perform s + 1 random inversion operations on y;

else perform s + 1 random jump operations on y;

return y;
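A runnable counterpart of Function mixed-mutation might look as follows. This is a sketch under stated assumptions: a "random" inversion or jump is taken to mean a uniformly chosen pair of distinct positions, the Poisson variate is drawn with Knuth's multiplication method because the Python standard library has no Poisson sampler, and the function names are hypothetical.

```python
import math
import random

def poisson_unit(rng):
    """Sample from a Poisson distribution with expectation 1 (Knuth's method)."""
    limit, k, p = math.exp(-1.0), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def mixed_mutation(x, rng):
    """Apply s + 1 inversions or s + 1 jumps (fair coin), s ~ Poisson(1)."""
    y = list(x)
    n = len(y)
    ops = poisson_unit(rng) + 1
    use_inversion = rng.random() < 0.5
    for _ in range(ops):
        i, j = rng.sample(range(n), 2)  # two distinct 0-based positions
        if use_inversion:
            lo, hi = min(i, j), max(i, j)
            y[lo:hi + 1] = y[lo:hi + 1][::-1]
        else:
            y.insert(j, y.pop(i))
    return y

rng = random.Random(0)
print(mixed_mutation(list(range(8)), rng))
```

The operator type is drawn once per call, so a single offspring is produced either purely by inversions or purely by jumps, as in the pseudocode above.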

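Theorem 19. Let V be a set of points angle-bounded by $\epsilon$ such that |Inn(V)| = k. The expected number of fitness evaluations needed for the $(\mu + \lambda)$ EA using mixed mutation to solve the TSP on V is bounded from above by

$$O\left(n \cdot A(\epsilon) \cdot \max\{\lambda, \mu n^2\} + \mu \cdot n^{2k} (k-1)!\right),$$

and the expected optimization time for the (1 + 1) EA is bounded from above by

$$O\left(n^3 \cdot A(\epsilon) + n^{2k} (k-1)!\right).$$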
The proof is similar to the proof of Theorem 18. With mixed mutation, a 2-opt operation still occurs with constant probability, so the likelihood of a sufficient improvement is asymptotically equivalent to the case of Theorem 18. A jump operation also occurs with constant probability, and the probability that a sequence of at most k jumps reaches an optimal solution (by correctly rearranging the positions of the points in Inn(V)) is bounded from below by

$$\Omega\left(\frac{1}{n^{2k} (k-1)!}\right).$$
6.2 FPT Evolutionary Algorithms

In the case where search heuristics have access to problem-specific information, FPT results are also available. Specifically, we consider heuristics that have access to both fitness values and the cyclic ordering of the points on the convex hull. This ordering can be precomputed in polynomial time [20] and stored so that it is available to the heuristic at any time.

6.2.1 A Population-Based Approach

Building on a previous study of Theile [42], Sutton et al. [41] constructed a population-based evolutionary algorithm that efficiently solves the Euclidean TSP when the number of inner points is not too large. They showed that a small modification to Theile's $(\mu + 1)$ EA, one that carefully maintains the invariant that the points in Out(V) remain in correct convex-hull order for each individual, results in an FPT evolutionary algorithm for the inner-point parameterization of the Euclidean TSP.

The EA maintains a large population of permutations on subtours in the graph G = (V, E) (a subtour is a Hamiltonian cycle on a subset of V ). In each generation, a new offspring is created via a specialized mutation operator that extends the subtour by incorporating an additional randomly chosen vertex, and a modified truncation selection is applied that chooses the best individual for a subtour. The EA can be seen as an evolutionary approach to dynamic programming, the framework for which was presented in [10].

For a set of n points V in the plane with |Inn(V)| = k, we denote by $\gamma := (p_1, \ldots, p_{n-k})$ a linear order on the points of Out(V) such that for all $i \in \{1, \ldots, n-k-1\}$, $p_i$ and $p_{i+1}$ are adjacent on the boundary of the convex hull of V. For any subset $U \subseteq V$, a permutation on U is a bijection $x\colon \{1, \ldots, |U|\} \to U$. We say that a permutation x on U is $\gamma$-respecting if and only if, for all $p_i, p_j \in U \cap Out(V)$, $i < j$ implies $x^{-1}(p_i) < x^{-1}(p_j)$. We call U the ground set of the permutation x on U. We refer to the first element x(1) in the linear order of such a permutation as the head vertex and the last element x(|U|) as the tail vertex.

The $(\mu + \lambda)$ EA maintains a population P of $\gamma$-respecting permutations on subsets of V. For each subset $S \subseteq Inn(V)$ and each $i \in [n-k]$, the population P contains permutations on the ground set $S \cup \{p_1, \ldots, p_i\}$. There are (|S| + i)! possible permutations on this ground set. If we were to allow all of them in the population, |P| would be exponential in n. Hence, the key to the FPT running time of the EA is the realization that in an optimal solution, the points in Out(V) must always appear in their order around the hull. Therefore it is wasteful to consider permutations that are not $\gamma$-respecting.

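To exploit this, for each possible ground set $S \cup \{p_1, \ldots, p_i\}$, the population contains exactly |S| + 1 $\gamma$-respecting permutations on that ground set, one for each possible unique tail vertex from the ground set. Specifically, for every $S \subseteq Inn(V)$ and every $i \in [n-k]$ there is a permutation x for every $r \in S \cup \{p_i\}$ such that

$$x(1) = p_1 \quad \text{and} \quad x(|S| + i) = r.$$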
We denote a permutation over the ground set $S \cup \{p_1, \ldots, p_i\}$ with tail vertex r by $x_{(S,i,r)}$. The corresponding subtour of $x_{(S,i,r)}$ is a cycle that starts at $x(1) = p_1$ and runs through each point of the ground set U exactly once (the i points of Out(V) are visited in the order in which they appear in $\gamma$). Finally, the cycle visits r before returning to $p_1$. An illustration of a subtour for an example permutation on a small ground set is depicted in Fig. 4. The fitness function utilized by the $(\mu + \lambda)$ EA is simply the cost of the subtour of an individual:

$$f(x_{(S,i,r)}) = \sum_{j=1}^{|S|+i} d(x(j), x(j+1)),$$

where the summation indices are taken to be modulo |S| + i.

Figure 4: The subtour defined by a permutation $x_{(S,4,r)}$, where S = {u, v, r} and i = 4. The positions of the points $p_1, \ldots, p_4$ in the linear order of the permutation respect their cyclic order around the convex hull.

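For any given $S \subseteq Inn(V)$, there are n − k ways to construct a ground set (by choosing i) and |S| + 1 ways to choose the tail vertex from $S \cup \{p_i\}$. The total number of individuals in the population is thus

$$\sum_{S \subseteq Inn(V)} (n-k)(|S|+1) = (n-k) \sum_{j=0}^{k} \binom{k}{j} (j+1) = (n-k)(k+2)\,2^{k-1}.$$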
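The specially designed mutation operator extends a permutation by adding exactly one new point to its ground set, preserving the validity constraints. In particular, for a permutation $x = x_{(S,i,r)}$, a vertex v is chosen uniformly at random from the remaining vertices in $(Inn(V) \setminus S) \cup \{p_{i+1}\}$ (where $p_{i+1}$ is omitted when i = n − k). A new permutation $x'$ is constructed from x by concatenating v with the linear order described by x; that is, for $j \in \{1, \ldots, |S| + i + 1\}$,

$$x'(j) = \begin{cases} x(j) & \text{if } j \le |S| + i, \\ v & \text{otherwise.} \end{cases}$$

Thus $x'$ is a permutation over the ground set extended by v and uses v as the new tail vertex:

$$x' = \begin{cases} x_{(S \cup \{v\},\, i,\, v)} & \text{if } v \in Inn(V), \\ x_{(S,\, i+1,\, v)} & \text{if } v = p_{i+1}. \end{cases}$$

When i = n − k and S = Inn(V), the mutation operator has no effect, since the ground set cannot be extended for such an individual.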

In each generation of the $(\mu + \lambda)$ EA, $\lambda$ individuals are selected uniformly at random from P. For each selected individual x, an offspring is generated by composing the mutation operator described above s + 1 times, where s is drawn from a Poisson distribution with unit expectation. Survival selection proceeds by ensuring that each mutated offspring may replace only the individual in the parent population with the same ground set and tail vertex, and this replacement occurs only when the fitness of the offspring is at least as good as the fitness of the corresponding parent. In this way, the surviving population maintains the invariant that each valid combination of ground set and tail vertex is represented exactly once.

Theorem 20. Let V be a set of n points in the Euclidean plane with |Inn(V)| = k. After $O(\max\{2^k k^2 n^2 \lambda^{-1}, n\})$ generations, the $(\mu + \lambda)$ EA solves the TSP on V to optimality in expectation and with probability $1 - e^{-\Omega(n)}$.

Note that this bound translates to $O(\max\{2^k k^2 n^2, \lambda n\})$ fitness evaluations in expectation, by taking the random numbers counting fitness evaluations and generations to be $T_f$ and $T_g$, respectively, and noting that for Algorithm 7, $E[T_f] = \lambda E[T_g]$. The proof of Theorem 20 proceeds by bounding the time it takes to increase the set of optimal subtours in the population. In particular, we say that a population is solved to order m when it contains an individual permutation on a ground set of size m that corresponds to an optimal subtour on that ground set. Such subtours are never lost (since they cannot be replaced by a suboptimal subtour), and the initial population is solved to order 1 since it contains the individual $x_{(\emptyset, 1, p_1)}$. The claim follows by bounding the probability of a transformation from a population solved to order m to one solved to order m + 1, and subsequently bounding the waiting time to get a population solved to order n.

6.2.2 Inner-Point Permutations

As we saw in Section 6.2.1, incorporating domain knowledge into the design of an EA can allow us to create a randomized FPT algorithm for a particular parameterization of the Euclidean TSP. Algorithm 7, however, potentially needs a large population, specifically one of size $O(2^k k n)$. Another approach is to keep a small population and use an EA to search for the optimal ordering on the inner points. Specifically, we let $\gamma = (p_1, \ldots, p_{n-k})$ be the fixed order of points in Out(V) as they appear on the convex hull. For any permutation x on Inn(V), we can compute the value of the optimal tour through Inn(V) and Out(V) respecting the order of both $\gamma$ and x. The naive approach is to try all $O(n^k)$ possible ways of merging the linear orders of the permutations $\gamma$ and x. This would violate our FPT requirement, since the parameter appears in the exponent of the polynomial. Instead, to preserve our FPT conditions, we can directly use a dynamic programming approach to compute the fitness of the permutation x on Inn(V).

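We define two $(n-k) \times (k+1)$ matrices $F^{out}$ and $F^{in}$, where $F^{out}[i, j]$ (or $F^{in}[i, j]$) stores the value of the minimum-weight subtour of all tours through the points $p_1, \ldots, p_i$ and $x(1), x(2), \ldots, x(j)$ such that they respect the orders of both $\gamma$ and x, and they end on an outer point (or inner point, respectively). Then the value of the optimal tour given the permutations $\gamma$ and x is

$$Dyn(x) = \min\{ F^{out}[n-k, k] + d(p_{n-k}, p_1),\ F^{in}[n-k, k] + d(x(k), p_1) \}.$$

Taking the boundary case as $F^{out}[1, 0] = 0$ (the subtour consisting only of $p_1$), we can compute

$$F^{out}[i, j] = \min\{ F^{out}[i-1, j] + d(p_{i-1}, p_i),\ F^{in}[i-1, j] + d(x(j), p_i) \}$$

for $i \in \{2, \ldots, n-k\}$ and $j \in \{0, 1, \ldots, k\}$, and

$$F^{in}[i, j] = \min\{ F^{out}[i, j-1] + d(p_i, x(j)),\ F^{in}[i, j-1] + d(x(j-1), x(j)) \}$$

for $i \in \{1, \ldots, n-k\}$ and $j \in \{1, \ldots, k\}$. Entries that do not correspond to valid subtours, namely $F^{out}[1, j]$ for $j \ge 1$ (since the tour cannot end on $p_1$ and then return to $p_1$) and $F^{in}[i, 0]$ for $i \ge 1$ (since a subtour cannot end on an inner point when the inner-point set is empty), are set to $\infty$.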

The two F matrices can be computed in O(nk) time using dynamic programming. Thus, the time complexity of the fitness evaluation of Dyn(x) is O(nk).
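The dynamic program can be written down compactly. The sketch below is one way to implement the merge of the hull order with a fixed order of the inner points; it is not claimed to reproduce the exact recurrences of [41]. The names `dyn`, `f_out`, `f_in` and `brute_force` are hypothetical, points are coordinate pairs, and `brute_force` enumerates all interleavings only as a cross-check on tiny inputs.

```python
import math
from itertools import combinations

def dyn(outer, inner):
    """Cost of the cheapest tour that starts at outer[0], visits `outer` in the
    given hull order and `inner` in the given order; O(nk) table entries."""
    m, k = len(outer), len(inner)
    INF = math.inf
    # f_out[i][j] / f_in[i][j]: cheapest path through the first i outer and first
    # j inner points that ends on an outer / inner point, respectively.
    f_out = [[INF] * (k + 1) for _ in range(m + 1)]
    f_in = [[INF] * (k + 1) for _ in range(m + 1)]
    f_out[1][0] = 0.0  # the path consisting only of outer[0]
    for i in range(1, m + 1):
        for j in range(k + 1):
            if i >= 2:
                via_out = f_out[i - 1][j] + math.dist(outer[i - 2], outer[i - 1])
                via_in = (f_in[i - 1][j] + math.dist(inner[j - 1], outer[i - 1])
                          if j >= 1 else INF)
                f_out[i][j] = min(via_out, via_in)
            if j >= 1:
                via_out = f_out[i][j - 1] + math.dist(outer[i - 1], inner[j - 1])
                via_in = (f_in[i][j - 1] + math.dist(inner[j - 2], inner[j - 1])
                          if j >= 2 else INF)
                f_in[i][j] = min(via_out, via_in)
    close_out = f_out[m][k] + math.dist(outer[m - 1], outer[0])
    close_in = f_in[m][k] + math.dist(inner[k - 1], outer[0]) if k >= 1 else INF
    return min(close_out, close_in)

def brute_force(outer, inner):
    """Try every interleaving of the two orders (exponential; for checking only)."""
    m, k = len(outer), len(inner)
    best = math.inf
    for slots in combinations(range(1, m + k), k):
        tour, o, i = [outer[0]], 1, 0
        for t in range(1, m + k):
            if t in slots:
                tour.append(inner[i]); i += 1
            else:
                tour.append(outer[o]); o += 1
        cost = sum(math.dist(tour[a], tour[(a + 1) % len(tour)])
                   for a in range(len(tour)))
        best = min(best, cost)
    return best

outer = [(0, 0), (4, 0), (4, 3), (0, 3)]
inner = [(1, 1), (3, 2)]
print(dyn(outer, inner), brute_force(outer, inner))
```

An EA over inner-point permutations would call `dyn` once per fitness evaluation, which keeps the per-evaluation cost polynomial while the search over the k! inner orders is left to mutation and selection.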

Theorem 21. Let V be a set of n points in the Euclidean plane with |Inn(V ) | = k. Assuming ), the () EAsolves the TSP on V using at most + (1)!) fitness evaluations with the jump operation as the basic mutation operation. This bound can be improved to + (2)!) by using 2-opt mutation. Moreover, each fitness evaluation has time complexity O(nk).

Note that we state the theorem slightly differently than in [41], in which the expected number of generations was proved to be O(max1)!) for jumps and O(max2)!) for 2-opt mutation. The bounds stated in Theorem 21 follow by noting that the number of fitness evaluations in generations of Algorithm 8 is , and the added assumption about . The proof of Theorem 21 relies again on the probability that a given mutation correctly arranges the inner points. Since the mutation operation performs s + 1 random basic operations, where s is Poisson distributed (with unit expectation), the probability that it performs exactly $k$ basic operations is $1/(e\,(k-1)!)$. On a permutation of length k, a distinct jump (or 2-opt) move is chosen uniformly at random with probability at least $1/k^2$, so the probability that a specific sequence of $k$ basic operations occurs is at least

$$\frac{1}{e\,(k-1)!\,k^{2k}}.$$

Therefore, the waiting time to create a globally optimal offspring is bounded in terms of the diameter of the search space induced by the mutation operator. For 2-opt, this diameter is at most $k - 1$ [3], and for the jump operation, it is at most $k$. In the case of jump, the probability that at least one of the offspring created in any generation is optimal is at least 1 (1 min. The claim follows from a standard waiting-time argument. We improve the bound for 2-opt by substituting $k - 1$ for $k$ in the above transformation probability.
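The mutation operator used in this argument can be sketched in Python as follows. This is an illustrative sketch under assumptions, not the operator as implemented in [41]: the function names are hypothetical, the Poisson parameter is assumed to be 1, 2-opt on a permutation is modeled as a segment reversal, and the two positions are drawn uniformly from all distinct pairs.

```python
import math
import random

def poisson1(rng):
    """Sample from a Poisson distribution with parameter 1 (Knuth's method)."""
    threshold, count, product = math.exp(-1.0), 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def jump(perm, i, j):
    """Move the element at position i to position j, shifting the others."""
    result = list(perm)
    result.insert(j, result.pop(i))
    return result

def inversion(perm, i, j):
    """2-opt style move: reverse the segment between positions i and j."""
    lo, hi = min(i, j), max(i, j)
    return perm[:lo] + perm[lo:hi + 1][::-1] + perm[hi + 1:]

def mutate(perm, basic_op, rng=random):
    """Apply s + 1 random basic operations, where s ~ Poisson(1) (assumed)."""
    result = list(perm)
    for _ in range(poisson1(rng) + 1):
        i, j = rng.sample(range(len(result)), 2)  # two distinct positions
        result = basic_op(result, i, j)
    return result
```

For example, `mutate(list(range(6)), jump, random.Random(0))` returns some permutation of the six inner-point indices. The waiting-time argument only needs the probability that one particular sequence of basic operations is drawn.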

7 Conclusion

In this chapter, we have presented an outline of recent results on the parameterized complexity analysis of randomized search heuristics. This approach of incorporating additional salient parameters into running-time analysis allows a finer-grained understanding of the influence of problem structure on the behavior of these general-purpose optimization techniques.

We have seen that a parameterized analysis can illuminate the inherent efficiency of particular search operators, as well as reveal the difficult components that might arise in the search space of a problem instance. This is the case for the maximum-leaf spanning tree problem. On graphs where k is the maximum number of leaves in a spanning tree, a tree-preserving mutation operator guarantees that the (1 + 1) EA can find such a tree in fixed-parameter tractable time ). This is in contrast to standard mutation, for which there exist graphs with m edges requiring (steps.

We have also observed that the concept of kernelization from the theory of parameterized complexity can be useful. Multiobjective algorithms using a specialized mutation operator can focus the search on a problem kernel of the vertex cover problem, leading to an FPT running time. We have further explored how parameterized analysis can strengthen our understanding of how simple evolutionary algorithms behave on the components of very general problem classes. This is the case, for example, with the maximization of submodular functions under different constraints.

For the Euclidean TSP, the inner-point parameterization of Deĭneko et al. [9] illuminates the difficulty for RSH techniques arising from the number of points that lie inside the convex hull of the instance. This informs the design of FPT problem-specific evolutionary algorithms, but so far the best known black-box analysis for this parameterization remains in XP time. An open problem is therefore either to prove that this is a lower bound for the parameterization, or to improve the upper bound to FPT time.

Traditional running-time analyses of randomized search heuristics on some artificial benchmark functions have already implicitly used a parameterized perspective. One clear example is the Jump function, the running-time analysis of which is typically parameterized by the jump-gap size (k) and the string length (n). Indeed, the running-time dichotomy between mutation-only evolutionary algorithms ($\Omega(n^k)$ [22]) and recombinant evolutionary algorithms (poly(n) [22, 23]) already exhibits an “FPT-like” flavor. The application of parameterized analysis to running-time analysis of randomized search heuristics on combinatorial optimization problems with well-established parameterizations from the classical community is therefore a very natural research direction.
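For concreteness, the Jump function can be written as a short Python sketch. It assumes the standard definition from the literature, in which the fitness rises with the number of one-bits up to a gap of width k just below the all-ones optimum. The name `jump_k` and the list-of-bits representation are choices made for this example.

```python
def jump_k(bits, k):
    """Jump benchmark with gap size k on a list of 0/1 values."""
    n, ones = len(bits), sum(bits)
    if ones <= n - k or ones == n:
        return k + ones  # outside the gap: fitness grows with the one-bits
    return n - ones      # inside the gap: fitness points away from the optimum
```

A mutation-only algorithm sitting at n - k one-bits must flip the k remaining zero-bits in a single step to reach the optimum, which is where the dependence on both n and k arises.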

Perhaps the most significant research requirement is the need for good problem parameterizations. This requires theoreticians to work closely with practitioners in order to understand what problem components are the most meaningful and relevant in the real world, i.e., what features are most likely to be manifested (or be restricted) in practice, and what problem characteristics might be exploitable by different techniques. This emphasizes the importance of a strong and vibrant relationship between theory and practice.

References

[1] Ageev, A.A., Sviridenko, M.: An 0.828-approximation algorithm for the uncapacitated facility location problem. Discrete Applied Mathematics 93(2-3), 149–156 (1999)

[2] Arora, S.: Polynomial time approximation schemes for Euclidean traveling salesman and other geometric problems. J. ACM 45(5), 753–782 (1998). DOI 10.1145/290179.290180.

[3] Bafna, V., Pevzner, P.A.: Genome rearrangements and sorting by reversals. SIAM Journal on Computing 25(2), 272–289 (1996)

[4] Balinski, M.L.: On maximum matching, minimum covering and their connections. In: Proceedings of the Princeton Symposium on Mathematical Programming, pp. 434–445 (1970)

[5] Bringmann, K., Friedrich, T.: Parameterized average-case complexity of the hypervolume indicator. In: C. Blum, E. Alba (eds.) Proceedings of the Genetic and Evolutionary Computation Conference, pp. 575–582. ACM (2013). DOI 10.1145/2463372.2463450.

[6] Chandra, B., Karloff, H., Tovey, C.: New results on the old k-opt algorithm for the traveling salesman problem. SIAM Journal on Computing 28(6), 1998–2029 (1999)

[7] Cornuejols, G., Fisher, M., Nemhauser, G.L.: On the uncapacitated location problem. In: Studies in Integer Programming, Annals of Discrete Mathematics, vol. 1, pp. 163–177. Elsevier (1977)

[8] Corus, D., Lehre, P.K., Neumann, F., Pourhassan, M.: A parameterised complexity analysis of bilevel optimisation with evolutionary algorithms. Evolutionary Computation 24(1), 183–203 (2016). DOI 10.1162/EVCOa

[9] Deĭneko, V.G., Hoffman, M., Okamoto, Y., Woeginger, G.J.: The traveling salesman problem with few inner points. Operations Research Letters 34, 106–110 (2006)

[10] Doerr, B., Eremeev, A.V., Neumann, F., Theile, M., Thyssen, C.: Evolutionary algorithms and dynamic programming. Theoretical Computer Science 412(43), 6020–6035 (2011)

[11] Doerr, B., Johannsen, D., Winzen, C.: Multiplicative drift analysis. Algorithmica 64(4), 673–697 (2012)

[12] Downey, R.G., Fellows, M.R.: Parameterized Complexity. Springer (1999)

[13] Englert, M., Röglin, H., Vöcking, B.: Worst case and probabilistic analysis of the 2-opt algorithm for the TSP. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1295–1304. Society for Industrial and Applied Mathematics (2007)