Large-scale optimization with the primal-dual column generation method

2013·Arxiv

Abstract

Abstract

The primal-dual column generation method (PDCGM) is a general-purpose column generation technique that relies on the primal-dual interior point method to solve the restricted master problems. The use of this interior point method variant allows to obtain suboptimal and well-centered dual solutions which naturally stabilizes the column generation. As recently presented in the literature, reductions in the number of calls to the oracle and in the CPU times are typically observed when compared to the standard column generation, which relies on extreme optimal dual solutions. However, these results are based on relatively small problems obtained from linear relaxations of combinatorial applications. In this paper, we investigate the behaviour of the PDCGM in a broader context, namely when solving large-scale convex optimization problems. We have selected applications that arise in important real-life contexts such as data analysis (multiple kernel learning problem), decision-making under uncertainty (two-stage stochastic programming problems) and telecommunication and transportation networks (multicommodity network flow problem). In the numerical experiments, we use publicly available benchmark instances to compare the performance of the PDCGM against recent results for different methods presented in the literature, which were the best available results to date. The analysis of these results suggests that the PDCGM offers an attractive alternative over specialized methods since it remains competitive in terms of number of iterations and CPU times even for large-scale optimization problems.

Keywords: column generation; cutting plane method; interior point methods; convex optimization; multiple kernel learning problem; two-stage stochastic programming; multi-commodity network flow problem.

1 Introduction

Column generation is an iterative oracle-based approach which has been widely used in the context of continuous as well as discrete optimization [14,51]. In this method, an optimization problem with a huge number of variables is solved by means of a reduced version of it, the restricted master problem (RMP). At each iteration, the RMP is modified by the addition of columns which are generated by the oracle (or, pricing subproblem). To generate these columns, the oracle uses a dual solution of the RMP.

In the standard column generation, optimal dual solutions of the RMP are used in the oracle to generate new columns. Since a simplex method is typically used to optimize the RMP, these solutions correspond to extreme points of the dual feasible set of the RMP. As a result, large variations are typically observed between dual solutions of consecutive column generation iterations, a behavior that may cause the slow convergence of the method. In addition, when active-set methods, such as the simplex method, are used to solve the RMP, degeneracy may adversely affect the performance of the column generation method. These drawbacks are also observed in the cutting plane method [43], which is the dual counterpart of column generation. Several alternatives to overcome such weaknesses have been proposed in the literature. Some of them modify the RMP by adding penalty terms and/or constraints to it with the purpose of limiting the large variation of the dual solutions [14,23,52,55]. Other alternatives use dual price smoothing techniques [58, 69]. Finally, there exist variants of column generation which rely on naturally stable approaches to solve the RMP, such as interior point methods [31,37,53,56].

The primal-dual column generation method (PDCGM) [37] is a variant which relies on well-centered and suboptimal dual solutions of the RMPs. To obtain such solutions, the method uses the primal-dual interior point algorithm [35]. The optimality tolerance used to solve the restricted master problems is loose during the first iterations and it is dynamically reduced as the method approaches optimality. This reduction guarantees that an optimal solution of the original problem is obtained at the end of the procedure. Encouraging computational results are reported in [37] regarding the use of the PDCGM to solve the relaxations of three widely studied mixed-integer programming problems, namely the cutting stock problem, the vehicle routing problem with time windows and the capacitated lot-sizing problem with setup times, after applying a Dantzig-Wolfe decomposition [19] to their standard compact formulations.

In this paper, we extend the computational study presented in [37] by analyzing the performance of the PDCGM applied to solve large-scale convex optimization problems. The applications considered in [37] have relatively small restricted master problems and the bottleneck is in solving the oracle. On the other hand, the applications we address in the current paper have large restricted master problems and the oracle subproblems are relatively easy to solve. Hence, we evaluate the performance of PDCGM operating in very different conditions besides addressing a broader class of optimization problems.

By large-scale problems we mean a formulation which challenges the current state-of-the-art implementations of optimization methods, due to a very large number of constraints and/or variables. Furthermore, we assume that such formulation has a special structure which allows the use of a reformulation technique, such as the Dantzig-Wolfe decomposition. Hence, large-scale refers not only to size, but also structure. The problems we address arise in important real-life contexts such as data analysis, decision-making under uncertainty and telecommunication/transportation networks.

The main contributions of this paper are the following. First, we review three applications which have gained a lot of attention in the optimization community in the past years and describe them in the column generation framework. Second, we study the behavior of the PDCGM to solve publicly available instances of these applications and compare its performance with recent results of other stabilized column generation/cutting plane methods, which we believe are the best results presented in the literature for the addressed problems. As a third contribution, we make our software available for any research use.

The remainder of this paper is organized as follows. In Section 2, we describe the decomposition principle and the column generation technique for different situations. In Section 3, we outline some stabilized column generation and cutting plane algorithms which have proven to be effective for the problems we deal with in this paper. In Sections 4, 5 and 6 we describe the multiple kernel learning (MKL), the two-stage stochastic programming (TSSP) and the multicommodity network flow (MCNF) problems, respectively. In each of these three sections, we present the problem formulation and derive the column generation components, namely the master problem and oracle. Then, we report on computational experiments comparing the performance of PDCGM with other state-of-the-art techniques. Finally, we summarize the main outcomes of this paper in Section 7.

2 Reformulations and the column generation method

Consider an optimization problem stated in the following form

where x is a vector of decision variables, X is a non-empty convex set and and are the problem parameters. We assume that without the presence of the set of linking constraints , problem (1) would be easily solved by taking advantage of the structure of X. More specifically, the n variable indices can be suitably partitioned into K subsets, such that X is given by the Cartesian product , in which the sets are independent from each other. Following this notation, we have the partition ) with for every .

We further assume that each set can be described as a polyhedral set, either equivalently or by using a fine approximation as discussed later in Section 2.3. Hence, let and

denote the sets of indices of all the extreme points and extreme rays of , respectively. With this notation, we represent by and the extreme points and the extreme rays of , with and . Any point can be represented as a combination of these extreme points and extreme rays as follows

with the coefficients 0 and 0, for and . The Dantzig-Wolfe decomposi- tion principle (DWD) [19] consists in using this relationship to rewrite the original variable vector x in problem (1). Additionally, considering the partitions ) and ) which are induced by the structure of X, we define , and for every , and and for every . By using this notation, we can rewrite (1) in the following equivalent form, known as the master problem (MP) formulation

Notice that the coefficients in (2) are now the decision variables in this model. Constraints (4) are called linking constraints as they correspond to the set of constraints in (1). The convexity constraints (5) ensure the convex combination required by (2), for each . As a result, the value of the optimal solution of (3)-(7), denoted by , is the same as the optimal value of (1).

2.1 Column generation

The number of extreme points and extreme rays in the MP formulation (3)-(7) may be excessively large. Therefore, solving this problem by a direct approach is practically impossible. Moreover, in many occasions the extreme points and extreme rays which describe X are not available and have to be generated by a procedure which might be costly. Hence, we rely on the column generation technique [22, 29, 51], which is an iterative process that works as follows. At each outer iteration, we solve a restricted master problem (RMP), which has only a small subset of the columns/variables of the MP. The dual solution of the RMP is then used in the oracle with the aim of generating one or more new extreme points or extreme rays, which may lead to new columns.

Then, if at least one of these new columns has a negative reduced cost, we add it to the RMP and start a new iteration. This step is called an outer iteration, to differentiate to the inner iterations which are the ones required to solve the RMP. The iterative process terminates when we can guarantee that the optimal solution of the current RMP is also optimal for the MP, even though not all the columns have been generated.

The RMP has the same formulation as the MP, except that only a subset of extreme points and extreme rays are considered. In other words, the RMP is defined by the subsets and . These subsets may change at every outer iteration by adding/removing extreme points and/or extreme rays. Therefore, the corresponding RMP can be represented as

The value of a feasible solution of the RMP () provides an upper bound of the optimal value of the MP (), as this solution corresponds to a feasible solution of the MP in which all the components in and are equal to zero.

To verify if the current optimal solution of the RMP is also optimal for the MP, we use the dual solutions. Let and represent the dual optimal solutions associated with constraints (9) and (10), respectively. The oracle checks the dual feasibility of these solutions in the MP by means of the reduced cost information. However, instead of calculating the reduced costs for every single variable of the MP, the oracle solves a pricing subproblem of the type

for each . There are two possible cases: either subproblem (13) has a bounded optimal solution, or its solution is unbounded. In the first case, the optimal solution corresponds to an extreme point of . Let (be the reduced cost of the variable associated to the column which is generated by using . If this reduced cost is negative, then and defines a new column that should be added to the RMP. On the other hand, if ) has an unbounded solution, then we take the direction of unboundedness as an extreme ray for this problem. The reduced cost in this case is given by (. If this reduced cost is negative, defines a column which must be added to the RMP. By summing over all the negative reduced costs, we define the value of the oracle as

By using this value, we obtain the following relationship at any outer iteration of the column generation method

When RMP is also an optimal solution of the MP.

The standard column generation based on optimal dual solutions as we have just described is known to have several drawbacks, specially when the simplex method is used to optimize the RMPs. In such case the resulting dual solutions correspond to extreme points of the dual feasible set. This typically leads to large oscillations of consecutive dual solutions (bang-bang effect), in particular at the beginning of the iterative process (heading-in effect) [51,67]. Moreover, optimal dual solutions contribute also to the tailing-off effect, a slow progress of the method close to termination. All these drawbacks may result in a large number of calls to the oracle, slowing down the column generation method (similar issues can be observed in a cutting plane context). Therefore, several stabilization techniques have been proposed to overcome these limitations. In Section 3, we recall some of the techniques which have been successfully used within the applications that we address in this paper.

2.2 Aggregated formulation

Formulation (3)-(7) is often called the disaggregated master problem, as each column in that formulation comes from an extreme point or an extreme ray of only one set . The separability of the sets is reflected in the master problem formulation, as we have master variables associated to each set , and K convexity constraints in (5). In some situations, it may be interesting to keep the variables aggregated in the master problem, e.g. when the number K is too large so that the disaggregated formulation has a relatively large number of constraints and columns, as pointed out in [61]. Hence, we can use the aggregated formulation of the master problem, in which we do not exploit the structure of X explicitly when defining the master variables. More specifically, let P and R be the sets of indices of extreme points and extreme rays of X, respectively. Any given point can be written as the following combination

where and are extreme points and extreme rays of X. By replacing this relationship in problem (1) and using the notation and for every and and for every , we obtain the aggregated master problem formulation

Although this is not required, we can still exploit the separability of X at the subproblem level, without changing (17)-(21). In such case, we obtain the extreme points and extreme rays in a decomposed way from each set , as in the disaggregated formulation. Then, we use these extreme points or rays together in order to generate a new column. This can be done as any extreme point/ray of X can be written as a Cartesian product of extreme points/rays from each . In order to add this new column, one should use +. Notice that in the particular case of identical subproblems (i.e. ), with and , we need to solve only one of the subproblems. In such case, it is common to make a variable change in the master variables, as and the new column becomes . In the applications that we address in sections 4 and 5, namely the multiple kernel learning problem and the two-stage stochastic programming problem, we use aggregated formulations of the master problem as described here.

2.3 Generalized decomposition for convex programming problems

As pointed out in [27, 28] the principles of DWD are based on a more general concept known as inner linearization. An inner linearization approximates a convex function by its epigraph so its value at a given point is not underestimated. Similarly, DWD can be used to approximate a convex set by means of the convex hull of points selected from the set. These points form a base so that any point that belongs to the resulting convex hull can be written as a convex combination of the points in the base. Hence, we can use the DWD to solve this approximated problem, as observed in [17, Ch. 24]. The accuracy of the approximation depends on the points that we include in the base. In Fig.1 we illustrate inner linearizations for a convex function and a convex set, respectively.

Figure 1: Inner linearizations of a convex function and a convex set.

Let us consider a convex optimization problem defined as follows

in which and are convex functions which are continuous in X. For simplicity, we assume that X is a bounded set and that we have a fine base (, where Q is a finite set of indices of points selected from X [27, 28]. By fine base, we mean a (finite) set of points that provide an approximation as accurate as we require for X. Hence, a point can be approximated by the convex combination of points in X as

Since the function f in problem (22) is convex, the following relationship must be satisfied for any

The right-hand side of (24) is an inner linearization of f(x), which can be used to describe f(x) as closely as desired. As long as we choose the base appropriately, this approximation does not underestimate the value of f(x). The same idea applies to F(x). By denoting ) and , we could approximate problem (22) as closely as require with the following linear master problem

Since the cardinality of Q is typically large, we use the column generation method to solve (25)–(28).

In a given outer iteration, let and be the dual optimal solutions associated with constraints (26) and (27), respectively. We use u to call the oracle, which is given by the following convex subproblem

An optimal solution of this subproblem results in a point of X. If 0, then the point can be added to the current base in order to improve the approximation of X. Hence, we obtain a new column of the master problem (25)-(28). The multiple kernel learning problem presented in Section 4 has a master problem formulation which is similar to (25)-(28) and hence we apply the ideas described above.

3 Stabilized column generation/cutting plane methods

There is a wide variety of stabilized variants of the column generation/cutting plane methods [4,8,23,31,37,44,48,52,55,61,69]. In this section we briefly present some methodologies which have proven to be very effective for the classes of problems addressed in this paper. For the sake of the length of this paper, in this section we only describe a small subset of the many different variants available in the literature.

3.1 ACCPM

The analytic center cutting plane method (ACCPM) proposed in [31, 33] is an interior point approach that relies on central prices. This strategy calculates a dual point which is an approximate analytic center of the localization set associated with the current RMP. This localization set is given by the intersection of the dual space of the current RMP with the half-space provided by the best lower bound found so far. Relying on points in the proximity of the analytic center of the localization set usually prevents the unstable behavior between consecutive dual points and also contributes to the generation of fewer and deeper constraints. When solving convex optimization problems with nonlinear objective function, the ACCPM can take advantage of second order information to enhance its performance [6]. An interesting feature of this approach is given by its theoretical polynomial complexity [1,32,45].

3.2 Bundle methods: Level set

Bundle methods [23,40,44] have become popular techniques to stabilize cutting plane methods. In general, these techniques stabilize the RMP via a proximity control function. There are different classes of bundle methods such as proximal [44], trust region [62] and level set [48], among others. They use a similar Euclidean-like prox-term to penalize any large deviation in the dual variables. The variants differ in the way they calculate the sequence of iterates. For instance, the level set bundle method is an iterative method which relies on piece-wise linear outer approximations of a convex function. At a given iteration, a level set is created using the best bound found by any of the proposed iterates and the best solution of the outer approximation. Then a linear convex combination is considered to create the level set bound. Finally, the next iterate is obtained by minimizing the prox-term function subject to the structural constraints and the level set bound constraint [48].

3.3 PDCGM

The primal-dual column generation method (PDCGM) was originally proposed in [38] and further developed in [37]. A very recent attempt of combining this method in a branch-and-price framework to solve the vehicle routing problem with time windows can be found in [57]. Warmstarting techniques for this method have been proposed in [34, 36]. The method relies on sub-optimal and well-centered RMP dual solutions with the aim of reducing the heading-in and tailing-off effects often observed in the standard column generation. Given a primal-dual feasible solution (˜) of the RMP (8)-(12), which may have a non-zero distance to optimality, we can use it to obtain upper and lower bounds for the optimal value of the MP as follows

The solution (˜) is called sub-optimal or -optimal solution, if it satisfies

for some tolerance 0. Additionally, the primal-dual interior point method provides well-centered dual solutions since it keeps the complementarity products of the primal-dual pairs in the proximity of the central path [35]. We say a point is well-centered if the following conditions are satisfied

where (0, 1) and is the barrier parameter used in the primal-dual interior point algorithm to define the central path [35].

The PDCGM dynamically adjusts the tolerance used to solve each restricted master problem so it does not stall. The tolerance used to solve the RMPs is loose at the beginning and is tightened as the column generation progresses to optimality. The method has a very simple description that makes it very attractive (see Algorithm 1).

Remark 1 The PDCGM converges to an optimal solution of the MP in a finite number of outer iterations. This has been shown in [37] based on the valid lower bound provided by an -optimal solution and the progress of the algorithm even if columns with nonnegative reduced costs are obtained.

Remark 2 Since close-to-optimality solutions of RMPs are used there is no guarantee of monotonic decrease of the upper bound. Therefore, we update the upper bound using (˜.

Remark 3 The only parameters PDCGM requires to be set are the degree of optimality denoted by D and the initial tolerance threshold .

Having described the PDCGM and two of the most successful stabilization techniques used in column generation and cutting plane methods, we now describe three different applications and compare the efficiency of these methods on them.

4 Multiple kernel learning problem (MKL)

Kernel based methods are a class of algorithms that are widely used in data analysis. They compute the similarities between two examples and via the so-called kernel function ). Typically, the examples are mapped into a feature space by using a mapping function Φ, so that Φ(. In practice, one kernel may not be enough to effectively describe the similarities between the data and therefore using multiple kernels provides a more accurate clas-sification. However, this may be more costly in terms of CPU time. Several studies concerning different techniques to solve the multiple kernel learning problem (MKL) are available in the literature, such as [7,46,60,64,65,70,71]. For a thorough taxonomy, classification and comparison of different algorithms developed in this context, see [39]. In this paper, we are interested in solving a problem equivalent to the semi-infinite linear problem formulation proposed in [64] which follows developments in [7] and that can be solved by a column generation method. We have closely followed the developments of both papers, namely [7] and [64], keeping a similar notation. In this section, we describe the single and multiple kernel learning problems in terms of their primal and dual formulations. Then, for the MKL, we derive the column generation components. Finally, we present two computational experiments with the PDCGM for solving publicly available instances and compare these results against several state-of-the-art methods.

4.1 Problem formulation

We focus on a particular case of kernel learning known as the support vector machine (SVM) with soft margin loss function [64]. As described in [68], the SVM is a classifier proposed for binary classification problems and is based on the theory of structural risk minimization. The problem can be posed as finding a linear discriminant (hyperplane) with the maximum margin in a given feature space for n data samples (), where is the d-dimensional input vector (features) and is the binary class label. We start by describing the single kernel problem and then extend the formulations to the multiple kernel case. In SVM with a single kernel, the discriminant function has usually the following form

where is the vector of weight coefficients, is the bias term and Φ : is the function which maps the examples to the feature space of dimension . Hence, we use the sign of ) to verify if an example x should be classified as either 1 or +1.

To obtain the vectors w and b which lead to the best linear discriminant, we can solve the following SVM problem with a single kernel

where is a penalty parameter associated with misclassification and is the vector of variables which measure the error in the classifications. In this formulation, the first term in the objective function aims to maximize the distance between the discriminant function and the features of the training data sample. This distance is called a margin and by maximizing 1we keep this margin as large as possible [9]. The second term in the objective function aims to minimize the misclassification of the data sample. The value of parameter C determines the importance of each term for the optimization.

Let and denote the vectors of dual variables associated to constraints (37) and (38), respectively. To obtain the dual formulation of (36)–(38), we first define the Lagrangian function:

Applying first order optimality conditions to the Lagrangian function and noting that and , we obtain the following conditions

Then, by using these relationships in the Lagrangian function, we obtain the dual problem

Recall that we use Φ(to denote the kernel function which maps . This function induces a with entries ) for each . Following [64], we only consider kernel functions which lead to positive semi-definite kernel matrices, so (39)-(41) is a convex quadratic programming problem. Notice that in an optimal solution of the dual problem (39)-(41), 0 means that the j-th constraint in (37) is active and, therefore, is a support vector.

For many applications and due to the nature of the data, a more flexible approach is to combine different kernels. MKL aims to optimize the kernel weights () while training the SVM. The benefit of using MKL is twofold. On the one hand, it finds the relevant features of a given kernel much like the single kernel learning context, and on the other hand, it leads to an improvement in the classification accuracy since more kernels can be considered. Similar to the single kernel SVM, the discriminant function can be described as

where K represents the set of kernels (each corresponding to a positive semi-definite matrix) with a different set of features and is the weight associated with kernel . We also have that 0 for every and and Φ) are the weight vector and the feature map associated with kernel , respectively. The MKL problem was first formulated as a semi-definite programming problem in [46]. In [7], the authors reformulated this problem as a second-order conic programming problem yielding a formulation which can be solved by sequential minimal optimization (SMO) techniques.

Similar to the single kernel problem formulation (36)-(38), the MKL primal problem for classification can be formulated as

where and is the dimension of the feature space associated with kernel k and , for every . As pointed out in [71], the use of v instead of and w (as in (42)) makes problem (43)-(45) convex. This problem can be interpreted as (i) finding a linear convex combination of kernels while (ii) maximizing the normalized distance between the features of the training data sample and the discriminant function and (iii) minimizing the misclassification of the data sample for each kernel function. Following [7], we associate with constraint (44) so the dual formulation of (43)-(45) can be written as

where for every

and

Note that problem (46)-(48) is a quadratically constrained quadratic problem (QCQP) [50]. Since the number of training examples and kernel matrices used in practice are typically large, it may become a very challenging large-scale problem. Nevertheless, this problem can be effectively solved if we use the inner linearization technique addressed in Section 2.3.

4.2 Decomposition and column generation formulation

Note that we have changed the direction of the optimization problem and now we minimize (compare (46) with (51)). This change does not affect the column generation, but the value of the objective function will have the opposite sign. Also, observe that only appears in the master problem. It is worth mentioning that this master problem formulation is the dual of the semi-infinite linear problem presented in [64]. Due to the large number of possible points Γ, we rely on column generation to solve the MKL. The corresponding oracle is given by a single pricing subproblem. Let and be the dual solutions associated with constraints (52) and (53), respectively. We always have 0 for all . In addition, from the dual of problem (51)-(54), we have that are the weights associated with the kernels in K [64]. The subproblem is defined as

This subproblem turns out to be of the same form as a single kernel problem such as (39)- (41), with ). Therefore, an SVM solver can be used to solve the subproblem. Finally, the value of the oracle is and a new column is obtained if 0.

4.3 Computational experiments

To evaluate the efficiency of the PDCGM for solving the MKL, we have carried out computational experiments using benchmarking instances from the UCI Machine Learning Repository data sets [26]. The pricing subproblem (55) is solved by a state-of-the-art machine learning toolbox, namely SHOGUN 2.0.0 [63] (http://www.shogun-toolbox.org/). The first set of experiments replicates the settings proposed in [60]. Hence, we have selected 3 polynomial kernels of degrees 1 to 3 and 10 Gaussian kernels with widths 2. We have used these 13 kernels to generate 13(d + 1) kernel matrices by using all features and every single feature separately, where d is the number of features of the instance. We have selected 70% of the data as a training sample and normalised it to unit trace. The remaining data (i.e., 30%) is used for testing the accuracy of the approach. We have randomly generated 20 instances per data set. We have chosen C = 100 as penalty parameter and the algorithm stops when the relative duality gap drops below (standard stopping criterion for this application). The degree of optimality was set to D = 10 and the initial RMP had only one column which was generated by solving problem (55) with equal weights (). A Linux PC with an Intel Core i5 2.3 GHz CPU and 4.0 GB of memory was used to run the experiments (single thread).

In Table 1 we compare the performance of the PDCGM with the performances of three other methods presented in [60], namely:

• SILP: The Semi-infinite linear programming algorithm was proposed in [64]; therein a cutting plane method without stabilization is used to solve the dual of (51)-(54).

• simpleMKL: The simple subgradient descent method was proposed in [60]. The method solves problem (43)-(45) with a weighted -norm regularization and an extra constraint for the kernel weights. This constrained optimization problem is solved by a gradient method on the simplex. The simpleMKL updates the descent direction by looking for the maximum admissible step and only calculates the gradient of the objective function when the objective function stops decreasing.

• GD: This is the standard gradient descent method which solves the same constrained optimization problem as the simpleMKL. In this approach, a descent direction is recomputed every time the kernel weights are updated.

The first column in Table 1 provides the name, size of the training sample (n) and number of kernels (|K|) for each class of instances. For each method described in the second column, we show the average results over 20 randomly generated instances of the corresponding class. The first number represents the average while the second, the standard deviation. Columns 3 to 6 show the number of kernels with no vanishing values at the optimal solution (# Kernel), the accuracy (Accuracy(%)), the CPU time in seconds (Time(s)) and the number of calls to the support vector machine solver (# SVM calls). Accuracy corresponds to the percentage of correct classifications made by the resulting discriminant function.

From the results in Table 1, we can observe that the PDCGM solves all the instances in less than 42 seconds on average. Also, it shows a variability of around 10% for the average time. Below the PDCGM results, we have included the results presented in [60] for the same statistics. For columns #Kernel, Accuracy and #SVM calls we have taken the results exactly as they are given in [60], while the values in the column Time have been scaled by a factor of 0.5 with respect to the originals. This is because these experiments were run on a Intel Pentium D 3 GHz CPU and 3 GB of memory. Considering the benchmark available at https: //www.cpubenchmark.net/singleThread.html, this machine has a score of 695 whereas the machine we used has a score of 1331. Nevertheless, these CPU times are provided for information only and indicate that PDCGM is competitive regarding CPU times with respect to the other

Table 1: MKL: comparison between PDCGM results and results reported in [60] for the SILP, simpleMKL and GD when using 70% of the data as training sample and 20 instances per data set.

methods. In addition to that, the results reported in the last column of Table 1 clearly demonstrate that the PDCGM requires fewer calls of the SVM solver than any of the other three methods. Particularly, if we compare the SILP and the PDCGM results, we can observe how much can be gained by using a natural stabilized column generation method over the unstabilized version. Also, the method seems to be at least as effective as the other methods since it provides a similar or higher level of accuracy and a smaller number of kernels for most of the data sets. This indicates that PDCGM could find more efficient weights to combine the available kernels. Finally, an important observation is that the simpleMKL can warmstart the SVM solver due to the small variation from iteration to iteration in the kernel weights and therefore an excessively large number of calls to the SVM solver does not translate into an excessively large CPU time [60]. The other three methods (SILP, GD and PDCGM) do not rely on this feature.

In a second set of computational experiments, we have used the same database [26] to generate instances following the experiments presented in [70]. The only difference with the previous experiments is that we now use 50% of the data available as training sample and the other 50% for testing purposes. This second set of experiments allows us to compare our approach against the SILP and simpleMKL (already described) and the extended level method (LSM), a state-of-the-art implementation to solve the MKL [70]. The LSM belongs to a family of bundle methods and it considers information of the gradients of all previous iterations (as the SILP) and computes the projection onto a level set as a way of regularizing the problem (as the simpleMKL). The results of the PDCGM and the results reported in [70] solving the MKL problem are presented in Table 2.

In this case we have omitted the number of calls to the SVM solver since this information was not reported in [70]. Comparing Table 1 and Table 2 one can observe that for the same databases, namely ionosphere, pima, sonar and wpbc, the time and accuracy have decreased when considering 50% of training sample instead of 70%. These results are expected since the problem sizes are smaller when a 50% training sample is used. The PDCGM solves all the instances in less than 21 seconds on average showing again a small variability on this metric. The machine used in [70] has similar characteristics to the one used to run our experiments (Intel 3.2 GHz and 2 GB of RAM memory). Therefore, we have kept the results as they appeared in [70]

The results obtained with the PDCGM demonstrate that the method is reliable with a performance similar to the state-of-the-art solver, such as the LSM. Moreover, the performance of the PDCGM seems to be more stable than those of the other methods showing the smallest standard deviation on CPU times for every instance set considered. Similar to the previous

Table 2: MKL: comparison between PDCGM results and results reported in [70] for the SILP, simpleMKL and LSM when using 50% of the data as training sample and 20 instances per data set.

results, and in terms of number of kernels with non-vanishing weights and accuracy, the PDCGM seems to be as effective as the SILP, simpleMKL and the LSM.

In additional experiments, we have tested the performance of the PDCGM and the LSM (bundle method), leaving aside the influence of the SVM solvers. The LSM implementation uses the optimization toolbox MOSEK (http://www.mosek.com) to solve the auxiliary subproblems. As the results indicate, the PDCGM shows less variability than the LSM regarding the CPU time spent on solving the RMPs. In addition, the PDCGM bottleneck is in the SVM solver (solving the subproblem) unlike the LSM where the bottleneck is in building the cutting plane model and computing the projection onto the level set (solving the restricted master problem). Indeed, the experiments show that the time required by the PDCGM to solve the RMPs does not exceed 2% of the total CPU time on average. Therefore, combining the PDCGM with a more efficient and well-tuned SVM solver implementation may lead to large savings with respect to the state-of-the-art approaches.

5 Two-stage stochastic programming problem (TSSP)

The stochastic programming field has grown steadily in the last decades. Currently, stochastic programming problems can be used to formulate a wide range of applications which arise in real-life situations. For an introduction to stochastic programming we refer the interested reader to [13,42]. The two-stage stochastic linear programming problem (TSSP) deals with uncertainty through the analysis of possible outcomes (scenarios). Several solution methodologies based on column generation and cutting plane methods have been proposed to deal with this class of problems [8, 12, 61, 66, 72]. The TSSP can be posed as two interconnected problems. One of them called the recourse problem containing only the second-stage variables and the other one called the first-stage problem which takes into consideration information related to the recourse problem but only optimizes over the set of first stage variables.

The first-stage problem can be stated as

where is the vector of cost coefficients, is the vector of first-stage variables (which are scenario independent), Ω is the realization of a random event and Ω the set of possible events, is a scenario independent matrix and the corresponding right hand side. Additionally, )] represents the expected cost of all possible outcomes, while ) is the optimal value of the second-stage problem defined by

where y is the vector of second-stage variables and W, T, h and q are matrices and vectors which depend on the realization of the random event .

5.1 Problem formulation

If we assume that set Ω is discrete and that every realization has a probability of occurrence associated with it, the TSSP problem can be stated as a deterministic equivalent problem (DEP) of the following form

where S is the set of possible scenarios and is the probability that scenario i occurs, with 0 for every scenario . Also, is the column vector cost associated with the second-stage variables for every . For completeness we also define , and for every .

When modeling real-life situations, the DEP is a large-scale problem. Due to the structure of and , this formulation has an L-shaped form [66] and can be solved using Benders decomposition [10], which can be seen as the dual counterpart of the DWD. Thus, to obtain a suitable structure for applying the DWD, we need to work on the dual of problem (62)-(66). By associating the vector of dual variables to constraints (63) and to (64), for every , the dual of the DEP can be stated as

Note that in both problems, the primal and the dual, the number of constraints is proportional to the number of scenarios and therefore decomposition techniques are often used to solve them. The number of constraints of the dual (primal) problem is + ˜m|S|). One may attempt to solve any of these problems directly with a simplex-type or interior point method; however, it has been shown that by using decomposition approaches one can solve very large problems whereas a direct method may fail due to the number of scenarios considered [66,72].

5.2 Decomposition and column generation formulation

The master problem associated with DEP formulation (67)-(69) is derived in this section. Also, we describe the subproblem which turns out to be the dual of the recourse problem (59)-(61) and decomposable by scenario. As in [72], we are interested in the aggregated version of the master problem.

The problem decomposes naturally in scenarios having (68) as the linking constraints. Let us define

. Note the slight difference between (70) and the set defined by constraints (69). This is similar to the substitution used in [18] where the dual variable associated with the constraint in (70) for a given scenario is divided by their corresponding probability, . We assume that (70) describes a non-empty convex polyhedron. Note that unlike the other two problems studied in this paper (i.e., MKL and MCNF), the set defined by (70) is not bounded and therefore we also need extreme rays to write an equivalent Dantzig-Wolfe reformulation.

Let P and R denote the index sets of extreme points and extreme rays of Θ, respectively. We can write any point Θ in terms of these extreme points and rays. Since we exploit the separability of Θ, we have ), with . Hence, any extreme point (ray) of Θ corresponds to the Cartesian product of extreme points (rays) of each Θ, so we have

where , and and correspond to extreme points and rays of set Θ, respectively. Therefore, the aggregated master problem associated with (67)-(69) is

This problem is similar to the one presented in [18] and has n + 1 constraints. Let and be the dual variables associated with constraints (72) and (73), respectively. We obtain |S| subproblems, where the i-th subproblem takes the form

This problem is the dual of (59)-(61). When solving (76) we can obtain: (a) an optimal bounded solution or (b) an unbounded solution. If all the subproblems lead to an optimal bounded solution, then we obtain an extreme point from each subproblem ). In this case, the reduced cost of the aggregated column is given by

Otherwise, let be the subset of subproblems with unbounded solutions. For each we obtain an extreme ray , so the aggregated column is generated by using these extreme rays only. As a result, the reduced cost of the new column is given by

5.3 Computational experiments

In this section, we report the results of computational experiments in which we verify the performance of the PDCGM for solving the two-stage stochastic programming problem. We have selected the instances proposed in [3] and [41], which have been widely used in the stochastic programming literature. All these instances are publicly available in the SMPS format [11]. Table 3 gives the basic information regarding each instance, namely the number of scenarios, the optimal value, and the numbers of columns and rows in the first-stage problem, in the second-stage problem and in the deterministic equivalent problem (DEP). Notice that the same instance name may lead to more than one instance by varying the number of scenarios. From the last two columns of the table, we see that instances with a large number of scenarios have very large-scale corresponding DEPs and hence some of them challenge the state-of-the-art implementations of simplex-type methods or interior point methods [72]. On the other hand, these instances can be effectively solved by using a decomposition technique.

In the computational experiments we apply the PDCGM to solve the aggregated master problem formulation (71)-(75), using one subproblem of type (76) for each scenario in the instance. The initial RMP is given by the columns we generate from the first-stage components of the optimal solution of the expected value problem [13]. In addition, to guarantee the feasibility of any RMP, we have added to it an artificial column with coefficient cost equal to 10, in which the entry on the convexity constraint (73) is equal to 1 and the remaining entries are equal to 0. The optimality tolerance used to stop the column generation procedure was set to and the degree of optimality was set to D = 5.0. The experiments were run on a Linux PC with an Intel Core i7 2.8 GHz CPU and 8.0 GB of memory. To solve the subproblems we have used the solver IBM ILOG CPLEX v. 12.4 with its default settings. In Table 4 we present the results of the computational experiments. The first two columns give the problem name and the number of scenarios which are considered in the instance, respectively. Columns 3 and 4 show the number of outer iterations and the total CPU time (in seconds) to solve the instance by using the PDCGM. Recall that an outer iteration involves solving the RMP and calling the oracle with the aim of generating a new column. The last row in the table gives the total number of outer iterations and the total CPU time to solve all the instances. Notice that the PDCGM was able to solve all the instances in the set in less than 430 seconds, with the total number of outer iterations equal to 1762. The average CPU time was 5.7 seconds, and the largest CPU time was 81.05 seconds. The number of outer iterations to solve an instance was never larger than 70, and on average it was 23.5.

To verify the performance of the PDCGM in relation to other column generation/cutting plane approaches, we have added to Table 4 the results presented in [72] for the instances given in Table 3. More specifically, we borrow from [72] the results which are reported for the standard Benders decomposition [10,66] and the Level-set method [48]. From a column generation point of view, these two methods correspond to solving the aggregated master problem formulation (71)-(75) by the standard column generation and by a bundle-type method (a stabilized column generation variant), respectively. The number of outer iterations and CPU time (in seconds) for solving the instances by these two approaches are given in columns 5 to 8 of Table 4. The remaining columns in the table show the relative performance of these methods in relation to PDCGM, i.e., the ratio between the values in columns 5 to 8 and the corresponding values in columns 3 and 4. We have taken the iteration numbers exactly as they are presented in [72], while the CPU times have been scaled according to the benchmark available at https: //www.cpubenchmark.net/singleThread.html. The computer used in [72] was a Linux PC with an Intel Core i5 2.4 GHz CPU and 6.0 GB of memory, and has a score of 1355 whereas the machine we used to run the PDCGM experiments has a score of 1602. Hence, we have multiplied their CPU times by a factor of 0.85. Moreover, the authors have implemented the methods on top of the FortSP stochastic solver system [21], a state-of-the-art solver for stochastic programming. Therefore, the conclusions about CPU times should be taken cautiously and, hence, we focus on the number of outer iterations.

The results in Table 4 show that even though the standard column generation (Benders) had the smallest CPU times on instances with few scenarios, this method delivers the worst overall

Table 3: TSSP: statistics for instances from [3] and [41]

Table 4: TSSP: PDCGM results and comparison with the results reported in [72] for the standard column generation (Benders), and bundle-type method (Level).

performance in relation to the PDCGM and the bundle-type method (Level). To solve all the instances, the standard approach required 3498 iterations, which is almost twice the figures obtained by the PDCGM. This is justified by the good performance of the PDCGM on the instances with a large number of scenarios, an important feature in practice. Besides, the overall performance of the PDCGM was similar to that of the level method, although the latter had about 16% more iterations in total. Hence, the results indicate that the PDCGM is competitive with respect to the level method, which is considered an efficient method for solving two-stage stochastic programming problems [72].

6 Multicommodity network ﬂow problem (MCNF)

Multicommodity network flow problems (MCNF) have been widely studied in the literature and can be applied in contexts in which commodities (e.g., goods and data packages) must be transported through a network with limited capacity and arc costs [59]. The current real-life applications involve transportation as well as telecommunication networks with a large number of arcs and commodities. Hence they lead to very large-scale optimization problems which require efficient solution methods. Column generation approaches have been successfully used for solving this class of problems [5,6,25,30,49]. In this work, we consider one of the most basic variants of the MCNF problem which includes a linear cost function and where the cost depends on the flow traversing an arc and it is independent of the commodity. In this section, we first describe the column generation formulation of the MCNF with linear objective costs. Then, we present the results of using the primal-dual column generation technique for solving large-scale instances, some of them taken from real-life applications.

6.1 Problem formulation

Consider a set K = {1, . . . , K} of commodities which must be routed through a given network represented by the set of nodes N = {1, . . . , n} and the set of arcs M = {1, . . . , m}. For each commodity there is a source node and a sink node , so that the total demand of the commodity () must be routed from to using one or more arcs of the underlying network. Let A be the node-arc incidence matrix of the network determined by sets N and M. In order to associate the demand of each commodity k with all the nodes in the network, we define an n-vector as follows

Let be the decision variable that determines the flow of commodity assigned to arc (. The total flow assigned to a given arc (cannot exceed the arc capacity . In addition, there is a cost that depends linearly on the total flow assigned to the arc. A compact formulation of the (linear) MCNF is given by

This formulation typically leads to a large-scale problem when modeling real-life situations with many commodities and arcs in the network, as the number of variables and constraints may become very large. Solving (77)-(80) by a linear optimization method may be prohibitive in practice, even for the current state-of-the-art solvers. Fortunately, the coefficient matrix of this formulation has a special structure that can be exploited to obtain a more efficient solution strategy.

In this paper, we focus on MCNF problems in which the costs in the network depend on the arcs only and the commodities compete equally for the capacity on the arcs. These problems are usually solved by approaches which are based on column generation procedures [5, 6, 49]. As recognized in [5], there is a different type of MCNF problems in which the costs depend additionally on the commodities assigned to the arc, and the commodities compete for mutual and/or individual capacities [15,16,24]. No paper in the MCNF literature deals with both types of problems simultaneously.

6.2 Decomposition and column generation formulation

In this section, we apply the Dantzig-Wolfe decomposition (DWD) to the compact formulation of the MCNF, following the description given in Section 2. The coefficient matrix of formulation (77)-(80) has a special structure. Namely, the incidence matrix A is replicated K times resulting in a block-angular structure, making appropriate the use of the DWD. Since the decision variables associated with different commodities are only connected by the linking constraints (78), we define the following K independent subsets

Let be the set of indices of extreme points in . Each set is bounded and hence it can be fully represented by its extreme points . As a result, any point can be rewritten similarly to (2). The master problem related to the compact formulation (77)-(80) is then given by

Since the number of variables can be huge, we recur to the column generation technique for solving the master problem, as described in Section 2. The corresponding oracle is given by K subproblems, so that the k-th subproblem provides an extreme point of which consists in the shortest reduced cost path that routes the total demand of commodity k from source to sink nodes. Let and be the dual solution vectors associated with constraints (83) and (84), respectively. The k-th subproblem can be stated as

It is a shortest path problem with arc lengths , initial node and end node . Since 0 for all (, these subproblems can be solved by the Dijkstra algorithm [20]. From the optimal solution of each subproblem, we generate a new column with reduced cost (and the value of the oracle is

6.3 Computational experiments

In this section, we verify the performance of the PDCGM to solve the master problem formulation of the MCNF. We have run computational experiments based on two sets of instances that have been proposed in [47] and, since then, they have been used as a benchmark for linear and nonlinear MCNF problems [2, 5, 6]. The instances in the first set simulate telecommunication networks which are represented by planar graphs. The instances in the second set represent networks with a grid structure and, hence, the number of different paths between two nodes can be very large. All instances are publicly available and can be downloaded from http: //www.di.unipi.it/di/groups/optimize/Data/MMCF.html. The initial columns in the RMP were generated by using the solution we obtain from calling the oracle with u = 0. The tolerance of the column generation was set to and the degree of optimality was set to D = 10. The experiments were run on a Linux PC with an Intel Core i7 2.8 GHz CPU and 8.0 GB of memory.

In practical MCNF problems, it is very likely that in the optimal solution only a small number of arcs will have full capacity usage. As a consequence, many of the arc capacity constraints (83) will be inactive at the optimum. If it was possible to identify these constraints before solving the problem, they could be discarded without changing the optimal solution. Since knowing the inactive constraints in advance is not possible in general, an active set strategy is used during the optimization process, in order to guess which arc capacity constraints should be classified as active. This idea has been successfully applied in the context of MCNF problems [5,24,54] and, hence, we have added this in our implementation as well. In the first RMP, we assume all the arc capacity constraints are inactive, i.e., they are excluded from the formulation. After each call to the oracle, we verify for each constraint, if the total flow on the corresponding arc violates its capacity. If so, the constraint is added to the active set and included in the formulation. An active constraint can also become inactive if the total flow on the corresponding arc is smaller than a fraction of the capacity of the arc. In the computational experiments presented below, we have adopted 9.

In Table 5, columns 1 to 4 show the name of the instances, the number of nodes (n), the number of arcs (m), and the number of commodities (K), respectively. Recall that the number of constraints in the RMP is m + K (disaggregated formulation). The results obtained by PDCGM are presented in the remaining columns. Column 5 shows the optimal value obtained with the optimality tolerance . Columns 6 to 8 show, the percentage of arc capacity constraints which are active at optimum (% Act), the number of outer iterations (Outer) and the total CPU time in seconds for solving the problem (CPU), respectively. The last column gives the percentage of the total CPU time which was used to solve the oracle subproblems (%Or).

According to the results in Table 5, the PDCGM was able to solve every instance in the planar set in less than 7103 seconds, and every instance in the grid set in less than 73 seconds. Most of the CPU time is spent on solving the RMPs, a typical behavior when disaggregated formulations are used for the MCNF [30]. The number of outer iterations was on average 22.4 for the planar instances and 20.1 for the grid instances. The number of active arcs was relatively small and never larger than 30% of the total number of arcs.

To verify the performance of the PDCGM in relation to the best results available in the literature, we have selected one of the most efficient methods for solving the linear MCNF. In [5], the authors use the analytic center cutting plane method (ACCPM) to solve the dual problem of an aggregated master problem formulation of the MCNF. In addition, they use an active set strategy and the elimination of columns. Table 6 shows the best results presented in [5] for solving the same instances described in Table 5. The first column gives the name of the instances. Columns 2 to 5 show the percentage of arc capacity constraints which are active when the column generation terminates (%Act), the number of outer iterations (Outer), the total CPU time in seconds, and the percentage of the total CPU time that was spent in the oracle (%Or), respectively, as reported in [5] for the ACCPM-based approach. We have scaled the CPU times using a factor of 0.4, as these other results were obtained on a Linux PC with Intel Pentium IV 2.8 Ghz and 2 GB of RAM. According to the benchmark provided at https://www.cpubenchmark.net/singleThread.html, this machine has a score of 635 whereas the machine used to run the PDCGM results has a score of 1602. Columns 6 to 9 show the ratios between the results presented in columns 2 to 5 and the corresponding results for PDCGM presented in Table 5. Table 6 has the purpose of giving an idea of the overall performance of the PDCGM (with its best settings) in relation to the best results presented in the literature for

Table 5: MCNF: dimensions and computational results obtained by PDCGM.

the linear MCNF. These results are just informative, as different computational environments were used and an aggregated formulation was used in [5]. From our experience, we observed that using a disaggregated approach reduces considerably the number of column generation iterations for the MCNF, although the RMP grows quickly and may become more difficult to solve. Nevertheless, a reduction in CPU times can be observed for most of the instances, which indicates that the PDCGM is an efficient alternative for solving the linear MCNF.

7 Conclusions

In this paper we have presented computational evidence of the performance of the primal-dual column generation method (PDCGM) for solving the multiple kernel learning (MKL), the two-stage stochastic programming (TSSP) and the multicommodity network flow (MCNF) problems. We have demonstrated how the Dantzig-Wolfe decomposition and the column generation algorithm can be applied to general convex optimization problems, including the use of aggregated and disaggregated formulations. Additionally, we have provided a thorough presentation of the key components involved in applying these techniques to each addressed application. The performance of the PDCGM was compared against some of the best results available in the literature for MKL, TSSP and MCNF, using different types of column generation/cutting plane methods. These applications provided us with different conditions to test the PDCGM, namely disaggregated and aggregated master problem formulations as well as bounded and unbounded subproblems. The computational experiments presented in this paper provide extensive evidence that the PDCGM is suitable and competitive in a broader context of optimization than that previously addressed in [37, 57]. It is worth mentioning that the PDCGM software is freely available for research use and can be downloaded at http://www.maths.ed.ac.uk/~gondzio/software/pdcgm.html.

Further studies will involve extending the PDCGM to solve a wider range of optimization problems, including those which are defined by nonconvex functions. In addition, handling second-order oracle information, as in [6], seems to be a promising avenue for future research for the PDCGM.

Table 6: MCNF: comparison between PDCGM results and the best results reported in [5] for an ACCPM-based approach.

Acknowledgements

We would like to express our gratitude to Victor Zverovich for kindly making available to us some of the TSSP instances included in this study. Also, we would like to thank Robert Gower for proofreading an early version of this paper. Also, we are very thankful to the anonymous referees for their careful reading and the important suggestions made, which certainly helped on improving the first draft of this paper. Pablo Gonz´alez-Brevis has been partially supported by FONDECYT, Chile through grant 11140521. Pedro Munari has been supported by FAPESP (S˜ao Paulo Research Foundation, Brazil) through grant 14/00939-8.

References

[1] Altman, A., Kiwiel, K.C.: A note on some analytic center cutting plane methods for convex feasibility and minimization problems. Computational Optimization and Applications 5(2), 175–180 (1996)

[2] Alvelos, F., Val´erio de Carvalho, J.M.: An extended model and a column generation algo- rithm for the planar multicommodity flow problem. Networks 50(1), 3–16 (2007)

[3] Ariyawansa, K., Felt, A.J.: On a new collection of stochastic linear programming test problems. INFORMS Journal on Computing 16(3), 291–299 (2004)

[4] Babonneau, F., Beltran, C., Haurie, A., Tadonki, C., Vial, J.P.: Proximal-ACCPM: A ver- satile oracle based optimisation method. In: E.J. Kontoghiorghes, C. Gatu, H. Amman, B. Rustem, C. Deissenberg, A. Farley, M. Gilli, D. Kendrick, D. Luenberger, R. Maes, I. Maros, J. Mulvey, A. Nagurney, S. Nielsen, L. Pau, E. Tse, A. Whinston (eds.) Optimisation, Econometric and Financial Analysis, Advances in Computational Management Science, vol. 9, pp. 67–89. Springer Berlin Heidelberg (2007)

[5] Babonneau, F., du Merle, O., Vial, J.P.: Solving large-scale linear multicommodity flow problems with an active set strategy and proximal-ACCPM. Operations Research 54(1), 184–197 (2006)

[6] Babonneau, F., Vial, J.P.: ACCPM with a nonlinear constraint and an active set strategy to solve nonlinear multicommodity flow problems. Mathematical Programming 120, 179–210 (2009)

[7] Bach, F.R., Lanckriet, G.R.G., Jordan, M.I.: Multiple kernel learning, conic duality, and the SMO algorithm. In: Proceedings of the twenty-first international conference on Machine learning, ICML ’04, p. 6. ACM, New York, NY, USA (2004)

[8] Bahn, O., Merle, O., Goffin, J.L., Vial, J.P.: A cutting plane method from analytic centers for stochastic programming. Mathematical Programming 69, 45–73 (1995)

[9] Ben-Hur, A., Weston, J.: A user’s guide to support vector machines. In: Data mining techniques for the life sciences, pp. 223–239. Springer (2010)

[10] Benders, J.F.: Partitioning procedures for solving mixed-variables programming problems. Numerische Mathematik 4, 238–252 (1962)

[11] Birge, J.R., Dempster, M.A., Gassmann, H.I., Gunn, E.A., King, A.J., Wallace, S.W.: A standard input format for multiperiod stochastic linear programs. COAL Newsletter 17, 1–19 (1987)

[12] Birge, J.R., Louveaux, F.V.: A multicut algorithm for two-stage stochastic linear programs. European Journal of Operational Research 34(3), 384–392 (1988)

[13] Birge, J.R., Louveaux, F.V.: Introduction to Stochastic Programming. Springer (1997)

[14] Briant, O., Lemar´echal, C., Meurdesoif, P., Michel, S., Perrot, N., Vanderbeck, F.: Compar- ison of bundle and classical column generation. Mathematical Programming 113, 299–344 (2008)

[15] Castro, J.: Solving difficult multicommodity problems with a specialized interior-point al- gorithm. Annals of Operations Research 124, 35–48 (2003)

[16] Castro, J., Cuesta, J.: Improving an interior-point algorithm for multicommodity flows by quadratic regularizations. Networks 59(1), 117–131 (2012)

[17] Dantzig, G.B.: Linear programming and its extensions. Princeton University Press, Prince- ton, NJ (1963)

[18] Dantzig, G.B., Madansky, A.: On the solution of two-stage linear programs under un- certainty. In: Proceedings Fourth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 165 – 176. University of California Press, Berkeley (1961)

[19] Dantzig, G.B., Wolfe, P.: The decomposition algorithm for linear programs. Econometrica 29(4), 767–778 (1961)

[20] Dijkstra, E.W.: A note on two problems in connexion with graphs. Numerische Mathematik 1, 269–271 (1959)

[21] Ellison, E., Mitra, G., Zverovich, V.: FortSP: a stochastic programming solver. OptiRisk Systems (2010)

[22] Ford, L.R., Fulkerson, D.R.: A suggested computation for maximal multi-commodity net- work flows. Management Science 5(1), 97–101 (1958)

[23] Frangioni, A.: Generalized bundle methods. SIAM Journal on Optimization 13, 117–156 (2002)

[24] Frangioni, A., Gallo, G.: A bundle type dual-ascent approach to linear multicommodity min-cost flow problems. INFORMS Journal on Computing 11(4), 370–393 (1999)

[25] Frangioni, A., Gendron, B.: A stabilized structured Dantzig-Wolfe decomposition method. Mathematical Programming 140(1), 45–76 (2013)

[26] Frank, A., Asuncion, A.: UCI machine learning repository (2010). URL http://archive. ics.uci.edu/ml

[27] Geoffrion, A.M.: Elements of large-scale mathematical programming Part I: Concepts. Man- agement Science 16(11), 652–675 (1970)

[28] Geoffrion, A.M.: Elements of large-scale mathematical programming Part II: Synthesis of algorithms and bibliography. Management Science 16(11), 676–691 (1970)

[29] Gilmore, P.C., Gomory, R.E.: A linear programming approach to the cutting-stock problem. Operations Research 9(6), 849–859 (1961)

[30] Goffin, J.L., Gondzio, J., Sarkissian, R., Vial, J.P.: Solving nonlinear multicommodity flow problems by the analytic center cutting plane method. Mathematical Programming 76, 131–154 (1996)

[31] Goffin, J.L., Haurie, A., Vial, J.P.: Decomposition and nondifferentiable optimization with the projective algorithm. Management Science 38(2), 284–302 (1992)

[32] Goffin, J.L., Luo, Z.Q., Ye, Y.: Complexity analysis of an interior cutting plane method for convex feasibility problems. SIAM Journal on Optimization 6(3), 638–652 (1996)

[33] Goffin, J.L., Vial, J.P.: Convex nondifferentiable optimization: a survey focused on the analytic center cutting plane method. Optimization Methods and Software 17, 805–868 (2002)

[34] Gondzio, J.: Warm start of the primal-dual method applied in the cutting-plane scheme. Mathematical Programming 83, 125–143 (1998)

[35] Gondzio, J.: Interior point methods 25 years later. European Journal of Operational Re- search 218(3), 587–601 (2012)

[36] Gondzio, J., Gonz´alez-Brevis, P.: A new warmstarting strategy for the primal-dual column generation method. Mathematical Programming pp. 1–34 (2014)

[37] Gondzio, J., Gonz´alez-Brevis, P., Munari, P.: New developments in the primal-dual column generation technique. European Journal of Operational Research 224(1), 41–51 (2013)

[38] Gondzio, J., Sarkissian, R.: Column generation with a primal-dual method. Technical Report 96.6, Logilab (1996)

[39] G¨onen, M., Alpaydin, E.: Multiple kernel learning algorithms. Journal of Machine Learning Research 12, 2211–2268 (2011)

[40] Hiriart-Urruty, J.B., Lemar´echal, C.: Convex Analysis and Minimization Algorithms II: Advanced Theory and Bundle Methods. Springer-Verlag (1993)

[41] Holmes, D.: A (PO)rtable (S)tochastic programming (T)est (S)et (POSTS). Available in: http://users.iems.northwestern.edu/~jrbirge/html/dholmes/post.html [accessed in Apr, 2013] (1995)

[42] Kall, P., Wallace, S.W.: Stochastic Programming. John Wiley and Sons Ltd (1994)

[43] Kelley, L.E.: The cutting-plane method for solving convex programs. Journal of the Society for Industrial and Applied Mathematics 8(4), 703–712 (1960)

[44] Kiwiel, K.C.: Proximity control in bundle methods for convex nondifferentiable minimiza- tion. Mathematical Programming 46, 105–122 (1990)

[45] Kiwiel, K.C.: Complexity of some cutting plane methods that use analytic centers. Math- ematical Programming 74(1), 47–54 (1996)

[46] Lanckriet, G., Cristianini, N., Bartlett, P., Ghaoui, L., Jordan, M.: Learning the kernel matrix with semidefinite programming. The Journal of Machine Learning Research 5, 27– 72 (2004)

[47] Larsson, T., Yuan, D.: An augmented lagrangian algorithm for large scale multicommodity routing. Computational Optimization and Applications 27, 187–215 (2004)

[48] Lemar´echal, C., Nemirovskii, A., Nesterov, Y.: New variants of bundle methods. Mathe- matical Programming 69(1-3), 111–147 (1995)

[49] Lemar´echal, C., Ouorou, A., Petrou, G.: A bundle-type algorithm for routing in telecom- munication data networks. Computational Optimization and Applications 44, 385–409 (2009)

[50] Lobo, M.S., Vandenberghe, L., Boyd, S., Lebret, H.: Applications of second-order cone programming. Linear Algebra and its Applications 284(1-3), 193–228 (1998)

[51] L¨ubbecke, M.E., Desrosiers, J.: Selected topics in column generation. Operations Research 53(6), 1007–1023 (2005)

[52] Marsten, R.E., Hogan, W.W., Blankenship, J.W.: The boxstep method for large-scale optimization. Operations Research 23(3), 389–405 (1975)

[53] Martinson, R.K., Tind, J.: An interior point method in Dantzig-Wolfe decomposition. Computers and Operation Research 26, 1195–1216 (1999)

[54] McBride, R.D.: Progress made in solving the multicommodity flow problem. SIAM J. on Optimization 8(4), 947–955 (1998)

[55] du Merle, O., Villeneuve, D., Desrosiers, J., Hansen, P.: Stabilized column generation. Discrete Mathematics 194(1-3), 229–237 (1999)

[56] Mitchell, J.E., Borchers, B.: Solving real-world linear ordering problems using a primal-dual interior point cutting plane method. Annals of Operations Research 62, 253–276 (1996)

[57] Munari, P., Gondzio, J.: Using the primal-dual interior point algorithm within the branch- price-and-cut method. Computers & Operations Research 40(8), 2026–2036 (2013)

[58] Neame, P.: Nonsmooth dual methods in integer programming. Ph.D. thesis, University of Melbourne, Department of Mathematics and Statistics (2000)

[59] Ouorou, A., Mahey, P., Vial, J.P.: A survey of algorithms for convex multicommodity flow problems. Management Science 46(1), 126–147 (2000)

[60] Rakotomamonjy, A., Bach, F., Canu, S., Grandvalet, Y.: SimpleMKL. Journal of Machine Learning Research 9, 2491–2521 (2008)

[61] Ruszczy´nski, A.: A regularized decomposition method for minimizing a sum of polyhedral functions. Mathematical Programming 35, 309–333 (1986)

[62] Schramm, H., Zowe, J.: A version of the bundle idea for minimizing a nonsmooth function: Conceptual idea, convergence analysis, numerical results. SIAM Journal on Optimization 2(1), 121–152 (1992)

[63] Sonnenburg, S., R¨atsch, G., Henschel, S., Widmer, C., Behr, J., Zien, A., de Bona, F., Binder, A., Gehl, C., Franc, V.: The SHOGUN machine learning toolbox. Journal of Machine Learning Research 11, 1799–1802 (2010)

[64] Sonnenburg, S., R¨atsch, G., Sch¨afer, C., Sch¨olkopf, B.: Large scale multiple kernel learning. Journal of Machine Learning Research 7, 1531–1565 (2006)

[65] Suzuki, T., Tomioka, R.: SpicyMKL: a fast algorithm for multiple kernel learning with thousands of kernels. Machine Learning 85, 77–108 (2011)

[66] Van Slyke, R., Wets, R.: L-shaped linear programs with applications to optimal control and stochastic programming. SIAM Journal on Applied Mathematics 17(4), 638–663 (1969)

[67] Vanderbeck, F.: Implementing mixed integer column generation. In: G. Desaulniers, J. Desrosiers, M.M. Solomon (eds.) Column Generation, pp. 331–358. Springer US (2005)

[68] Vapnik, V.: Statistical learning theory. Wiley (1998)

[69] Wentges, P.: Weighted Dantzig-Wolfe decomposition for linear mixed-integer programming. International Transactions in Operational Research 4(2), 151–162 (1997)

[70] Xu, Z., Jin, R., King, I., Lyu, M.: An extended level method for efficient multiple kernel learning. Advances in Neural Information Processing Systems 21, 1825–1832 (2009)

[71] Zien, A., Ong, C.S.: Multiclass multiple kernel learning. In: Proceedings of the 24th international conference on Machine learning, ICML ’07, pp. 1191–1198. ACM, New York, NY, USA (2007)

[72] Zverovich, V., F´abi´an, C.I., Ellison, E.F., Mitra, G.: A computational study of a solver sys- tem for processing two-stage stochastic LPs with enhanced Benders decomposition. Mathematical Programming Computation 4, 211–238 (2012)

designed for accessibility and to further open science