An adaptive data-driven approach to solve real-world vehicle routing problems in logistics

2020·Arxiv

Abstract

Transportation accounts for one-third of logistics costs, and transportation systems therefore largely influence the performance of the logistics system. This work presents an adaptive, data-driven, modular approach for solving real-world Vehicle Routing Problems (VRP) in the field of logistics. The work consists of two basic units: (i) an innovative multi-step algorithm for successful and entirely feasible solving of VRPs in logistics, and (ii) an adaptive approach for adjusting and setting up the parameters and constants of the proposed algorithm. The proposed algorithm combines several data transformation approaches, heuristics, and Tabu search. Moreover, as the performance of the algorithm depends on a set of control parameters and constants, a predictive model that adaptively adjusts these parameters and constants according to historical data is proposed. The acquired results were compared using a Decision Support System with two predictive models: Generalized Linear Models (GLM) and Support Vector Machine (SVM). The algorithm, along with the control parameters acquired using the prediction method, was incorporated into a web-based enterprise system, which is in use in several big distribution companies in Bosnia and Herzegovina. The results of the proposed algorithm were compared on a set of benchmark instances and validated on real benchmark instances as well. The feasibility of the resulting routes in a real environment is also presented.

Keywords: Vehicle routing problem · Multi-step algorithm · Data-driven approach · Parameter setting problem · Real-world constraints in logistics

1 Introduction

Since logistics advanced in the 1950s [1], numerous studies have been carried out within various application domains. Due to the trend of nationalisation and globalisation in recent decades, the importance of logistics management has been growing in various domains. For industries, logistics helps to optimise existing production and distribution processes based on the same resources, through management techniques that promote the efficiency and competitiveness of enterprises. The essential element in a logistics chain is the transportation system, which connects separated activities. Transportation accounts for one-third of logistics costs, and transportation systems largely influence the performance of the logistics system. Transportation is required throughout the whole production procedure, from manufacturing and delivery to the end consumer to the return of goods. Only excellent coordination between the components maximizes the benefits. Without well-developed transportation systems, logistics cannot bring its advantages into full play. In logistics, a good transportation system can provide better efficiency, reduce operating costs, and promote service quality [2]. Successfully solving vehicle routing problems can significantly improve a company's operations in the field of transportation.

The vehicle routing problem is a generalization of the Traveling Salesman Problem (TSP), which is one of the most studied optimization problems. The TSP is concerned with a travelling salesman whose task is to visit a set of cities along the shortest possible path, such that each city is visited only once and the starting city is also the finishing city. It is necessary to find the shortest route that fulfils these conditions [3]-[5]. When the problem involves more than one travelling salesman, all of whom start in the same city, the Vehicle Routing Problem in its most basic form is defined. It is necessary to find a set of shortest possible routes for the travelling salesmen, such that each city is visited by only one salesman. This variant of the problem is called the Multiple Travelling Salesman Problem (MTSP) [6]. MTSP is similar to the basic form of the VRP. There are two main reasons why VRP is one of the most studied problems in the academic community, namely two characteristics that VRP has and TSP lacks [7]. Firstly, there are still no algorithms that solve the VRP as successfully as the best algorithms solve the TSP. Secondly, VRP is a problem class that is exceptionally applicable in practice: various difficulties occurring in companies that deal with transport and logistics are formulated as different variants of the VRP.

To achieve its full effect in practice, the approach and model (algorithm) presented in this paper for solving the real-world VRP should be adequately applied and validated in real conditions. To use this approach in practice, in the area of freight and logistics, it is important to consider various factors that depend on many parameters, primarily the number of served customers and whether the sold/delivered goods are packages or pieces. This directly affects the way of storing the goods and loading the vehicles, through various natural limitations, such as: the fact that some customers often have to use the same vehicle due to temperature or other conditions; the realistic duration of unloading the goods at the delivery points; the calculation of the costs of transport routes; the dependence on loading and vehicle types; legal limits on the maximum service duration of a vehicle and a driver; and others. In real-world situations, transport managers in companies create transportation routes by forming independent clusters, to which vehicles from the available fleet are assigned. However, this region-based approach cannot take all the mentioned factors into consideration when creating the transportation routes in a way that satisfies all the constraints. Most of the available software solutions operate on the same principle, i.e., grouping the customers into clusters (regions), with the possibility of further binding multiple adjacent regions that lie on the same route. This clustering approach is often unable to solve complex problems that occur in practical applications in logistics, resulting in infeasible routes. It does not provide the best solution, especially in larger cities, where it is extremely difficult to define logical and completely separated customer regions.
The clustering approach also runs into performance issues, caused by many delivery points, an extremely heterogeneous vehicle fleet, and many different constraints that need to be fulfilled. Therefore, the purpose of this work is divided into two parts: (i) solving the complex real-world VRP in logistics using the proposed innovative multi-step algorithm, while meeting all of the specified realistic constraints, and (ii) adaptively adjusting the parameters and constants of the given model and algorithm by using historical data. The proposed algorithm for solving the complex real-world VRP is based on the principle of penalization, and as such, uses numerous constants and parameters that are additionally adjusted using the historical data. Hence, this approach is “data-driven”. The transportation routes that are the final result of these two interconnected entities are mostly feasible from the practical point of view, which is the most important point for any company that wants to create the best possible transportation routes. This paper therefore presents the application of a new approach and concept for solving the VRP using the real data of one of the largest distribution companies in Bosnia and Herzegovina. The dataset is public, stored at 4TU.ResearchData, and available to other researchers as new benchmark data ([8]-[9]). The validation of this approach was first done on standardized benchmark data, and then on the actual routes of the mentioned distribution company.

In more detail, in this work we present and propose a multi-step algorithm that solves the real-world VRP. The algorithm implements four successively connected steps, comprising several data transformation methods, heuristics, and Tabu search. The main goal of the proposed algorithm is to solve real-world VRPs in logistics with minimal cost, while meeting all of the realistic constraints. The proposed algorithm has a number of control parameters and constants, and we therefore also present a prediction model for adaptively setting up and adjusting the parameters according to historical data. In this way, the suggested approach to solving the VRP acquires a more adaptive character. The acquired results were compared using the realized decision support system for the analysed prediction models: Generalized Linear Models (GLM) and Support Vector Machine (SVM). The results of the proposed algorithm were compared on a set of benchmark instances, and results on real benchmark data, which contain additional realistic information and constraints, are presented as well. The proposed model and concept, along with the control parameters acquired using the proposed prediction method, were incorporated into a web-based enterprise system that is in production use in the real sector. In the example of the distribution company and its realistic data used in this paper, all the transportation routes obtained by the proposed approach during a testing period of three months were completely feasible, respecting all of the restrictions and constraints. The financial savings obtained by using these routes are considerable in the area of transportation of the given company, and in this way, the proposed approach is successfully validated in practice.

Based on a detailed overview of the state of the art, which is presented in the next section (Section 2), and an analysis of real problems of optimizing transport routes, room has been identified for proposing a new, modular approach and algorithm for adaptive solving of complex VRPs with realistic constraints. Section 3.1 contains a detailed description of the proposed multi-step algorithm, which consists of several steps incorporated into a single unit. The proposed algorithm for solving the VRP has its own constants and control parameters, and Section 3.2 presents an innovative approach for adjusting these parameters. The results are discussed first on standardized input datasets, and afterwards on a real dataset of one of the biggest distribution companies in Bosnia and Herzegovina. The given data has been made available to other researchers as well. Finally, the results of the part of the system responsible for adjusting the control parameters are discussed. All of the aforementioned discussions of results are described and explained in more detail in Section 4, together with the feasibility of the given routes in a real-world environment. Section 5 presents the conclusions of the work and guidelines for further research in this scientific field.

2 Related work

As stated in [10], transportation is a significant component of the total product cost (10%). The complexity of the routing problem increases exponentially with the number of customers or vehicles. Even for smaller instances, manual routing is a difficult task for humans, and software-based solutions can provide better performance than an experienced worker. Many algorithms for heuristic solving of the vehicle routing problem have been described in the literature, including various approaches that use modern and recently introduced algorithms. In [11], a review of the state of the art and of the problem is given. The paper refers to 277 articles, published between 2009 and 2015, that present different approaches to solving the vehicle routing problem.

The vehicle routing problem mostly includes various real-world constraints. All those constraints have to be taken into consideration to obtain a solution that is feasible in real-world use.

The best-known constraints are time windows. The time window represents the time interval in which the customer can be served. In paper [12], the authors presented a mathematical model for the vehicle routing problem with access time windows, a version of the VRP suitable for planning delivery routes in a city with accessibility restrictions, which ban the access of freight vehicles to central urban areas in many European cities. They used the model to find exact solutions to small problem instances based on a case study, and then compared the performance, over larger instances, of a modified savings algorithm, a genetic algorithm, and a Tabu search procedure. The results do not show a clear prevalence of any of them, but they confirm the significance of those additional costs and externalities. In [13], an adaptive memetic algorithm for minimizing distance in the vehicle routing problem with time windows (VRPTW) is described. In [14], different metaheuristic approaches for solving the VRPTW are described, such as Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), and Artificial Bee Colony (ABC). In [15], a hybrid of ACO and the Firefly algorithm (FA) for solving vehicle routing problems is given. The VRPTW has been tested further: in [16], an improved simulated annealing algorithm is described.

Vehicle capacity is an important constraint used in real-world examples. In [17], an improved K-nearest neighbour algorithm is used to solve the capacitated VRP (CVRP). In [18], an improved simulated annealing for the CVRP is described. In [19], an Evolutive Tabu-Search approach is used for the CVRP.

The number of depots varies between companies. When more than one depot is in use, the problem is called a multi-depot vehicle routing problem (MDVRP). In [20], an improved ACO algorithm is used for the multi-depot green VRP. Enhanced differential evolution algorithms for solving the MDVRP are used in [21]. In [22], a discrete FA is used to solve the asymmetric multi-depot VRP. In [23], other approaches are described and a literature review for the multi-depot vehicle routing problem is given.

One of the most important realistic constraints is fuel consumption, whether logistics is carried out with the company's own vehicle fleet or outsourced. Outsourcing logistics operations to third-party logistics providers has attracted more attention in the past several years. However, very few papers have analysed a fuel consumption model in the context of outsourced logistics. In paper [24], the authors presented a hybrid Tabu search algorithm for a real-world open vehicle routing problem involving fuel consumption constraints. Experiments in that paper were conducted on instances based on real road data of Beijing, China, considering that outsourced logistics plays an increasingly important role in China’s freight transportation.

In literature and practice, many other variants and constraints are considered, such as the heterogeneous fleet VRP [25], the city VRP [26], etc. In [27], a taxonomic review of the vehicle routing problem is given, with different approaches and a methodology for classifying the literature. Often, the problems of transport optimization have a dynamic interpretation. Ride-sharing services are transforming urban mobility by providing timely and convenient transportation to anybody, anywhere, and anytime. However, most of the mathematical models do not fully address the potential of ride-sharing. The authors of paper [28] presented a more general mathematical model for real-time high-capacity ride-sharing that (i) scales to large numbers of passengers and trips, and (ii) dynamically generates optimal routes with respect to online demand and vehicle locations. The algorithm is applied to fleets of autonomous vehicles and also incorporates rebalancing of idling vehicles to areas of high demand. This framework is general and can be used for many real-time multi-vehicle, multi-task assignment problems.

From the aspect of practical application in logistics, it is always important to carry out all the necessary preprocessing that leads to optimal transportation routes satisfying all the constraints. In paper [29], the authors presented a multi-phase hybrid approach with clustering, dynamic programming, and a heuristic algorithm to solve a collaborative multiple-centre vehicle routing problem (CMCVRP). The CMCVRP is a multi-constraint combinatorial and game optimization problem containing both vehicle routing optimization and profit distribution procedures. The CMCVRP is generally used to study the adjustment of a logistics network from a non-optimal structure to a collaborative, optimized network of multiple distribution centres (DCs). Optimizing the CMCVRP can effectively improve the vehicle loading rate and reduce the crisscross transportation phenomenon. Designing a reasonable profit distribution mechanism is a critical step in CMCVRP optimization. Collaboration can be organized through a negotiation process by a logistics service provider. On the other hand, the study [30] establishes a linear optimization model to minimize the total cost of a two-echelon logistics joint distribution network. An improved ant colony optimization algorithm integrated with a genetic algorithm is presented to serve customer clustering units and to solve the model formulation by assigning logistics facilities. A collaborative two-echelon logistics joint distribution network can be organized through a negotiation process via logistics service providers or participants existing in the logistics system, which can effectively reduce the crisscross transportation phenomenon and improve the efficiency of the urban freight transportation system.

As can be observed, all heuristic algorithms contain parameters that must be set for the algorithm to provide a quality and usable solution. This makes the parameter-setting problem a significant research area, because the results of the algorithm depend significantly on the parameter values. The majority of the cited research papers note that every real-world VRP is slightly different from the VRP variant that is most similar to it. Among other things, this is affected by two sets of (control) parameters: (i) certain realistic constraints and input data constants, and (ii) constants of the used algorithm.

Each company that requires an implementation of the VRP has its own constraints, defined by the business policy of the given company. That is why the literature often states that constraints and restrictions in these types of problems are non-standard. Lee describes one of these problems in great detail in [31], along with a solution proposal on a concrete realized example. Data Mining (DM) techniques and methods are also often used for adjusting the realistic constraints of the VRP, as are Machine Learning (ML) algorithms and other methods such as Fuzzy logic or Neural Networks (NN). Several available papers describe the application of statistical methods for these purposes. Some interesting real examples are presented in papers [32]-[33]. An especially interesting example of the classic application of real data is presented in research [34], where the term data-driven solving of the VRP is introduced. That work also mentions, for the first time, an additional phase that can be used in a real environment: the human-computer interaction phase, which gives the end user the possibility of manually processing and modifying the suggested routes. No matter how perfect the algorithm seems, there are always real situations that are impossible to predict and classify in advance, which is why that possibility is needed in practical systems.

Each of the analysed approaches and algorithms for solving the VRP, including the one presented in this work, has certain constants and control parameters. Those parameters and constants are used to determine, among other things, certain weight factors and penalty factors for individual criteria, depending on the importance of each criterion for the result in the real vehicle routing situation. In the literature, this is known as the Parameter Setting Problem (PSP). The most interesting work on this topic was presented by Calvet et al. [35], who describe a statistical approach for setting the parameters of metaheuristic algorithms and apply it to VRPs.

3 Proposed adaptive data-driven approach

The key element in the supply chain is the transportation system, which unites different, spatially and temporally separated activities. Transportation accounts for one-third of logistics costs and significantly affects the performance of the logistics system. Distribution companies are often unable to optimize their transportation activities in the best possible way and therefore lose considerable financial resources. Two basic units are presented in this paper (Figure 1):

(i) a multi-step algorithm that is able to optimally solve real-world and extremely complex VRPs, satisfying most of the constraints that can occur in practice (Section 3.1),

(ii) a data-driven approach for setting the control parameters of the proposed algorithm, which contains constants and parameters that can be set based on historical values (Section 3.2).

This two-component model could be represented as a diagram seen in Figure 1. In addition to historical data, the other data are also taken into consideration, such as Global Positioning System (GPS) and Geographic Information System (GIS) data.

Figure 1: Proposed data-driven approach for adaptive solving the vehicle routing problem

Transportation routes obtained this way are optimal from the algorithmic point of view and completely feasible in a real environment, which is the most important fact, from the aspect of the practical application of such systems. The transportation costs are significantly reduced, routes are feasible, and customers of the company are more satisfied as their requirements are fulfilled by using this multicomponent approach on the real example of the distribution company.

3.1 Multi-step algorithm for solving real-world VRP problems in logistics

The proposed algorithm that solves the heterogeneous fleet vehicle routing problem with time windows (HVRPTW), and satisfies realistic constraints, consists of four steps, or rather two steps, one intermediate step and one post-step (Pseudocode 1). In the first step, the initial solution to the problem is created, which is improved in later steps. A modification of the heuristic Clarke-Wright algorithm was implemented. After that, the second step (intermediate step) strives to decrease the number of routes. The solution which is the result of the second step serves as an initial solution to the local search based on Tabu search, or the third step. After Tabu search, the fourth step (post-step) optimizes the order of customers within each acquired route. Before that, the transformation of input data, time windows of customers and time distances is done, which enables the duration of unloading at each customer to become equal to 0 and simplifies the problem. The algorithm also solves the problem where some customers cannot be serviced by certain vehicles from the fleet (Site-Dependent Vehicle Routing Problem - SDVRP), which is one of the most commonly used realistic constraints in practice.
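The first step builds on the Clarke-Wright heuristic. As a point of reference, the following minimal sketch computes the savings list of the classic Clarke-Wright algorithm only; it is not the modified version used in this work, whose details are not given here. The function name `savings_list`, the symmetric distance matrix, and the example values are illustrative assumptions.

```python
from itertools import combinations


def savings_list(dist, depot=0):
    """Return (saving, i, j) triples sorted by decreasing saving.

    dist is a square matrix (list of lists) of travel costs and `depot` is
    the index of the warehouse. Serving customers i and j on one route
    instead of two separate round trips saves d(0,i) + d(0,j) - d(i,j).
    Assumption: the matrix is symmetric, so each pair is evaluated once.
    """
    customers = [k for k in range(len(dist)) if k != depot]
    triples = []
    for i, j in combinations(customers, 2):
        saving = dist[depot][i] + dist[depot][j] - dist[i][j]
        triples.append((saving, i, j))
    # Pairs with the largest saving are the first candidates for merging.
    triples.sort(key=lambda t: t[0], reverse=True)
    return triples


# Hypothetical 4-node instance: node 0 is the depot, nodes 1..3 are customers.
dist = [
    [0, 10, 10, 20],
    [10, 0, 4, 25],
    [10, 4, 0, 18],
    [20, 25, 18, 0],
]
ranked = savings_list(dist)
```

In the classic heuristic, routes are then merged in this order whenever the merge keeps the route feasible; the feasibility checks and the modifications made by the proposed algorithm are outside the scope of this sketch.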

Before the description of the proposed algorithm, the formulations of the basic concepts as well as the problem will be briefly introduced:

Route – this implies a sequence of customers and a vehicle which travels that route. The route has many more data and characteristics, but they are all derived from these two, which means that the route is defined by a sequence of customers and one vehicle.

Solution – a solution is a set of routes in which each customer belongs to exactly one route. The solution depends on the order of the customers in the transport route of the corresponding vehicle. The time of arrival at a customer is set to the earliest possible time, meaning that it is equal either to the start of the customer's time window or to the time at which the vehicle arrives from the previous customer, whichever is later. The time of arrival at the first customer is equal to the start of that customer's time window, or to the start of the time window of the vehicle increased by the time necessary for the vehicle to travel from the warehouse to the first customer.

Realistic constraints which are covered in the proposed algorithm and approach – many of the constraints which can be met in a real environment are included in the proposed four-step algorithm and approach. Some of them are as follows: a large number of customers with their time windows, a heterogeneous fleet of vehicles, working hours of each vehicle, working hours of the drivers, working hours of the depot, site-dependent constraints, blocked roads for each vehicle in the fleet, vehicle limitations per weight and volume (capacity), vehicle filling mode when goods are sold at the unit level (article), dynamic calculation of the cost depending on the weight of the goods the vehicle is transporting, the possibility of multiple and divided trips for each vehicle, a reasonable execution time of the algorithm for normal use in real situations, and others.

Route cost – this is the number of kilometres multiplied by the cost per kilometre of the vehicle driving that route. That is the real route cost. Different penalties are added to it. A route is sometimes allowed to violate certain constraints; for each violated constraint, penalty points are added, and they grow as the violation of that constraint increases.

The real route cost is equal to:

$$C_{real} = d \cdot c_{km}$$

$d$ – total distance of the route travelled, measured in kilometres,

$c_{km}$ – cost in monetary units per kilometre for the vehicle driving that route.

This definition implies that only the variable cost of the vehicles is considered. In real examples, there are fixed costs as well, which can be described as the cost of each of the vehicles (cumulative amortization, registration, maintenance, tires, vehicle insurance and others) per day. The fixed cost can also include the driver cost (for one or more drivers that are necessary for delivery) for each vehicle separately. If the fixed vehicle cost is taken into consideration as well, then the real route cost changes as follows:

$$C_{real} = d \cdot c_{km} + c_{fix}$$

Cost $c_{km}$ mostly depends on the fuel consumption of the vehicle, while cost $c_{fix}$ includes vehicle amortization costs, driver costs, and others. Route delays happen when a vehicle is unable to service one or more customers before the end of the time window. If the vehicle arrives at a customer before the time window, it waits until the start of the time window and then starts delivering.

This waiting does not represent a route delay. Instead, the waiting times are accumulated and used to provide the legally required drivers' breaks for every vehicle during the workday. These waiting times, if they exist, can be combined with the longest unloading at a customer, so that the necessary drivers' breaks are respected. The penalty for route delay is the sum of the delays at all of the customers of that route (in minutes) multiplied by the constant $K_{delay}$:

$$P_{delay} = K_{delay} \sum_{i} \max(0,\; e_i - b_i)$$

$e_i$ – the end time of servicing the $i$-th customer,

$b_i$ – the end of the time window of the $i$-th customer.
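The arrival-time rule and the delay penalty described above can be illustrated with a minimal sketch. It assumes that unloading times are already zero (as after the pre-step transformation), so the end of servicing coincides with the start of servicing; the function names, the depot index 0, and the constant `k_delay` are hypothetical and only serve the example.

```python
def service_start_times(route, travel, window_start, depart_time):
    """Earliest service start (in minutes) at each customer of `route`.

    route: customer ids in visiting order; the depot has id 0 (assumption).
    travel[a][b]: travel time in minutes from a to b.
    Unloading time is taken as 0, so service start equals service end.
    """
    times = []
    now, prev = depart_time, 0
    for c in route:
        arrival = now + travel[prev][c]
        # A vehicle that arrives early waits until the window opens.
        now = max(arrival, window_start[c])
        times.append(now)
        prev = c
    return times


def delay_penalty(times, route, window_end, k_delay):
    """Sum of the minutes of lateness over the route, scaled by k_delay."""
    late = sum(max(0, t - window_end[c]) for t, c in zip(times, route))
    return k_delay * late


# Hypothetical data: times are minutes after midnight.
travel = [[0, 30, 50], [30, 0, 20], [50, 20, 0]]
window_start = {1: 480, 2: 490}
window_end = {1: 540, 2: 495}
route = [1, 2]

times = service_start_times(route, travel, window_start, depart_time=420)
penalty = delay_penalty(times, route, window_end, k_delay=2.0)
```

In this example the vehicle waits 30 minutes at customer 1 and is 5 minutes late at customer 2; only the lateness contributes to the penalty, while the waiting does not.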

The penalties for overloading in volume or weight are calculated from the percentage of the excess, so that 1 m³ of excess costs bigger vehicles less, since in percentage terms it exceeds the given constraint by less for bigger vehicles.

The penalty for volume is equal to the quotient of the excess in cubic metres and the maximum volume that the vehicle driving the route can carry, multiplied by the constant $K_{vol}$: $P_{vol} = K_{vol} \cdot \max(0,\; V - V_{max}) / V_{max}$, where $V$ is the loaded volume and $V_{max}$ the maximum volume of the vehicle.

The weight is penalized analogously to the volume; the penalty is obtained by multiplying the relative excess by the constant $K_{w}$: $P_{w} = K_{w} \cdot \max(0,\; W - W_{max}) / W_{max}$, where $W$ is the loaded weight and $W_{max}$ the maximum weight of the vehicle.
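The effect of penalizing the relative rather than the absolute excess can be seen in a short sketch. The function name `overload_penalty` and the constant value 100 are assumptions chosen for illustration; the same function serves for volume and for weight.

```python
def overload_penalty(load, capacity, k):
    """Penalty proportional to the relative excess of load over capacity.

    Works for volume (m3) or weight (kg) as long as `load` and `capacity`
    use the same unit. Returns 0 when the vehicle is not overloaded.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    excess = max(0.0, load - capacity)
    return k * excess / capacity


# The same 1 m3 of excess volume on two hypothetical vehicles.
small_van = overload_penalty(load=11.0, capacity=10.0, k=100.0)
big_truck = overload_penalty(load=41.0, capacity=40.0, k=100.0)
```

With these values the small vehicle receives a penalty four times larger than the big one for the same absolute excess, which is the behaviour described in the text.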

Given that there are certain constraints in which some customers cannot be serviced by certain vehicles (SDVRP), the route cost also includes the penalty for the violation of those conditions. The penalty for customers in the wrong vehicles is equal to:

the wrong vehicle,

The route cost also includes the penalty, which presents the cost increase for vehicles when they are transporting a bigger weight. In real examples of solving the VRP, it has been established that it is necessary to deliver the merchandise to the end customers as soon as possible, given that the consumption of fuel (and other vehicle parts) is significantly larger when the vehicle is transporting a greater amount of merchandise (measured in weight). That is the reason why the parameter was introduced. It strives to primarily service closer customers, especially those who, in percentages, have a greater impact on the total weight of the whole route due to the weight. can be presented with the following equation:

maximum weight,

In laboratory conditions, this fact is completely disregarded, and the route is primarily considered optimal if customers farther away from the warehouse are serviced first. However, from the aspect of practical application, the given statement is different. If there are customers in a city that is farther away from the warehouse, and a customer (or several) that is very close to the warehouse, but whose ordered weight is, for example, 20% of the total weight of the given route, in laboratory observations the observed vehicle will transport the necessary 20% of route weight from the customer (near the warehouse) to the city and back. Taking into account the fact that the vehicles travelling the routes can be large trucks (with a great transport capacity) and the routes long (several hundred kilometres), then the consumption of fuel in those cases can be larger than in the case where the given customer is serviced among the first (at the start of the route), during departure from the warehouse itself. The given modification decreases the real route cost created using the proposed algorithm. This is one of the contributions of this paper that has not been studied in more detail in other scientific papers.

The total route cost , marked as , is the sum of the real cost and all of the listed penalty costs.

The cost is calculated this way during the third step of the proposed algorithm (which will be explained in greater detail later on), which is conceivably the most important for the optimality of the overall algorithm. During the first step and mid-step of the algorithm, delays and route overloads are not allowed, so those penalties are equal to 0. Also, given that there are no vehicles during the first step of the route, is overlooked. Taking that into account, the cost during the first step is , and during the second step

Solution cost – solution cost is the sum of route costs which that solution entails.
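How the individual terms combine into a route cost and a solution cost can be sketched as follows. The class `RouteCost` and its field names are hypothetical; the penalty values are passed in as plain numbers, because the exact formulas for the wrong-vehicle and weight-related terms are not reproduced in this sketch.

```python
from dataclasses import dataclass


@dataclass
class RouteCost:
    """Real cost plus penalty terms of one route (all in monetary units)."""
    real: float
    delay: float = 0.0          # penalty for late arrivals
    volume: float = 0.0         # penalty for volume overload
    weight: float = 0.0         # penalty for weight overload
    wrong_vehicle: float = 0.0  # penalty for site-dependent violations
    load_distance: float = 0.0  # penalty for carrying heavy loads far

    @property
    def total(self) -> float:
        # Total route cost: the real cost plus all listed penalties.
        return (self.real + self.delay + self.volume + self.weight
                + self.wrong_vehicle + self.load_distance)


def solution_cost(routes):
    """Solution cost: the sum of the total costs of its routes."""
    return sum(r.total for r in routes)


# Two hypothetical routes; unused penalties default to zero.
routes = [RouteCost(real=120.0, delay=10.0), RouteCost(real=80.0, volume=2.5)]
cost = solution_cost(routes)
```

Leaving the penalties at their default of zero corresponds to the steps of the algorithm in which the respective violations are not allowed.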

Now that the main terms and their definitions have been stated, the formulation of the problem for the implemented algorithm can be given as well: for the given set of customers and their orders, vehicles and their characteristics, time constraints related to the servicing of customers, time constraints related to the working hours of each of the vehicles, time constraints related to the warehouse (where all deliveries start and end at one warehouse), constraints related to which vehicle can service which customer, as well as other realistic constraints, it is necessary to find a set of routes and to assign a vehicle to each route, so that the solution cost is minimized. Ideally, the real cost will be minimal, and all the other costs related to penalizing the violation of one or more of the constraints will be equal to zero.

3.1.1 Pre-step – Data initialization and Transformations

Before the transformation of customers’ time windows, as is shown in Figure 2, two other types of modifications/transformations are done. The first type relies on the fact that the proposed algorithm has the possibility of defining two additional warehouse parameters:

warehouse is available for use. More precisely, the given time is the earliest time the vehicles can leave the warehouse.

last time the vehicles can return to the warehouse after they have serviced the customers from the given route.

Based on the two parameters, the working time of the warehouse, as well as the working time of the vehicles themselves, can be controlled, which presents a complex option (generalization) of this constraint. That is included and explained in a later step of the proposed algorithm. The given parameters are taken into account so that the time windows of each of the customers are corrected through the following iterations:

The difference is calculated for each customer, where represents the start of the time window for the customer , and represents the time distance from depot to customer .

If the difference , is given, then the start of the time window of the customer is set to the value . Otherwise the value does not change for customer .

, is calculated for each customer, where presents the end of the time window for customer , and presents the time distance from customer to the depot.

If the sum is given, then the end of the time window for customer is set to the value . Otherwise the value does not change for customer .
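The time-window correction described above can be sketched in Python as follows. The symbols and inequality signs are not legible in the text, so this sketch assumes the natural reading: a window start is moved later if the vehicle could not arrive by then when leaving at the warehouse opening time, and a window end is moved earlier if the vehicle could not return before the warehouse closes. The names `clip_time_windows`, `depot_open`, `depot_close` and the dictionary keys are hypothetical.

```python
def clip_time_windows(customers, depot_open, depot_close):
    """Tighten customer time windows to the warehouse working time.

    customers: list of dicts with keys (all in minutes, names assumed):
      start, end   - customer time window
      from_depot   - travel time depot -> customer
      to_depot     - travel time customer -> depot
    """
    clipped = []
    for c in customers:
        start, end = c["start"], c["end"]
        # Assumed condition: start - from_depot < depot_open.
        earliest_arrival = depot_open + c["from_depot"]
        if start < earliest_arrival:
            start = earliest_arrival
        # Assumed condition: end + to_depot > depot_close.
        latest_departure = depot_close - c["to_depot"]
        if end > latest_departure:
            end = latest_departure
        clipped.append({**c, "start": start, "end": end})
    return clipped


customers = [{"start": 480, "end": 1000, "from_depot": 60, "to_depot": 60}]
result = clip_time_windows(customers, depot_open=450, depot_close=1020)
```

In this example the window start moves from 480 to 510 and the window end from 1000 to 960, because the warehouse opens at 450 and closes at 1020.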

The second transformation addresses how the proposed algorithm performs multiple deliveries to a customer when that customer cannot be serviced by a valid vehicle in a single delivery without violating SDVRP constraints. In order to meet those constraints, the algorithm performs the following iterations during this pre-step of data preparation:

For each customer, the vehicles that can service them are determined and stored in a list of vehicles available to that customer

For each vehicle in that list, the volume and transport (weight) capacity of the vehicle are compared with the volume and weight of the customer’s order

If the list of vehicles available to the customer contains a vehicle whose volume and transport capacity are greater than or equal to the volume and weight ordered by the customer, the iteration for that customer is stopped, and the next customer is reviewed

If the list of vehicles available to the customer contains no vehicle whose volume and transport capacity are greater than or equal to the volume and weight of the customer’s order, several sub-iterations are done within this iteration:

For each allowed vehicle for multiple deliveries, the profitability of the allowed vehicle for the given customer is calculated the following way:

in the following steps of the proposed algorithm. The start of the working time of the most profitable vehicle is set to the value:

When all customers are checked, the iterations are stopped

Besides the listed transformations, other data preparations are done before the start of the algorithm’s execution. The listed preparations of the algorithm’s input data will be explained later in the following sections of this work.

The transformation maps the current problem into a different problem which can be proven to be equivalent to the first but is simpler to solve. This pre-step only changes the algorithm’s input data related to time. After the transformation is done, the unloading time for each customer is 0, and the time distances between the customers have increased.

The implemented transformation was published by Liu and Shen [36], and it can be applied if the distances satisfy the triangle inequality, which is true for real-world problems of transport route optimization. The time distance between every two customers increases by half the unloading duration of each of those customers (intuitively, half the unloading time of each customer is assigned to the time distance to that customer, and the other half is assigned to the time distance from that customer). The time distance of each customer from the depot increases by half the unloading time of that customer. After that, the time windows change as well: half the unloading time of a customer is added to the start of that customer's time window, and the end of the time window decreases by half the unloading time. Meaning:

$t'_{ij} = t_{ij} + (u_i + u_j)/2$, $t'_{0i} = t_{0i} + u_i/2$, $e'_i = e_i + u_i/2$, $l'_i = l_i - u_i/2$,

where $u_i$, $t_{0i}$, $e_i$ and $l_i$ are the unloading time of customer $i$, the time distance from the depot to customer $i$, and the start and the end of the time window of customer $i$, in that order. $t_{ij}$ is the time distance between customers $i$ and $j$, and $t'_{ij}$, $t'_{0i}$, $e'_i$ and $l'_i$ are the new values after the transformation.

After these changes are implemented, the unloading times for each of the customers are equal to 0. The newly acquired time distances between the customers are stored in a new matrix, and all of the data related to the customers, with the exception of the unloading times, are changed in the list of customers. The unloading times are only changed after the routes are made, locally at the customers of the routes, so that it is possible to return the data to the initial state using the inverse transformation (so that the value of the unloading duration is not lost).
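A minimal sketch of this transformation and its inverse is given below, assuming time distances are held in a square matrix between customers plus a separate list of depot distances. The function name `shift_unloading`, its arguments and the `sign` switch are hypothetical and are not taken from the original implementation.

```python
def shift_unloading(dist, depot_dist, unload, windows, sign=1):
    """Fold unloading times into travel times (sign=1) or undo it (sign=-1).

    dist[i][j]    - time distance between customers i and j
    depot_dist[i] - time distance between the depot and customer i
    unload[i]     - unloading time of customer i (kept, so the inverse works)
    windows[i]    - (start, end) of the time window of customer i
    """
    n = len(unload)
    new_dist = [
        [dist[i][j] + sign * (unload[i] + unload[j]) / 2 if i != j else dist[i][j]
         for j in range(n)]
        for i in range(n)
    ]
    new_depot = [depot_dist[i] + sign * unload[i] / 2 for i in range(n)]
    new_windows = [
        (start + sign * unload[i] / 2, end - sign * unload[i] / 2)
        for i, (start, end) in enumerate(windows)
    ]
    return new_dist, new_depot, new_windows


dist = [[0, 30], [30, 0]]
depot_dist = [15, 25]
unload = [10, 20]
windows = [(100, 200), (150, 300)]

t_dist, t_depot, t_windows = shift_unloading(dist, depot_dist, unload, windows)
restored = shift_unloading(t_dist, t_depot, unload, t_windows, sign=-1)
```

Applying the function with `sign=-1` to the transformed data returns the original values, which mirrors the inverse transformation performed before the results are presented to end-users.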

The implemented transformation is not completely the same as in the cited research. Instead, it is adjusted to be accurate in this case. The difference is that the cited work assumes that the delivery starts within the time window and does not necessarily have to end within it, which does not hold for the algorithm presented in this work. Also, in the cited work, the working time of the warehouse is changed in this step as well; considering that this modification was already done in the pre-step of the proposed algorithm, it is not done again in the transformation step.

The problem must stay the same, so after the algorithm is executed, and before the final output of the results, an inverse transformation is done as well, so that all data is returned to the initial state understandable to the end-users. The inverse transformation is easily derived from the transformation (where a quantity was added, it is now subtracted, and where it was subtracted, it is now added).

3.1.2 First step – Generating of initial routes using Clarke and Wright savings algorithm

The Clarke-Wright heuristic (or Clarke and Wright savings algorithm) starts with one route created for each customer. Every route goes from the warehouse to the customer and back to the warehouse. After that, savings are calculated for every two customers $i$ and $j$ with the following formula: $s_{ij} = d_{i0} + d_{0j} - d_{ij}$, where $d_{ij}$ is the distance between $i$ and $j$ and index 0 denotes the warehouse. It shows the savings, measured in distance travelled, when two routes whose end customers are $i$ and $j$ are merged. This means that the vehicle, after servicing customer $i$, instead of going to the warehouse, goes directly to customer $j$ and returns to the warehouse afterwards. That way savings are acquired. It is often assumed that the distances of the VRP satisfy the triangle inequality, $d_{ij} \le d_{ik} + d_{kj}$, for all indexes $i$, $j$, $k$.

After all savings are calculated, they are sorted in descending order. After that, the following is done in the series of iterations (Pseudocode 2):

The largest savings that have not been observed are taken.

It is checked whether the observed customers are in different routes and whether both are end customers (first or last) of their routes. If at least one of these conditions is not satisfied, the iteration ends.

If the weight of the merged route would exceed the capacity, the iteration ends.

Two routes are merged by merging the observed customers.

After these iterations are finished, the current set of routes is the result of the Clarke and Wright savings algorithm. This algorithm was originally written for the CVRP but has later been adapted to other variants of the VRP. It is mostly modified so that the other conditions which need to be met for the given variant of the VRP are checked while merging routes. For example, with the VRPTW, besides checking the route capacity, it would be sufficient to also check whether the new route is feasible concerning time constraints.
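The basic savings procedure described above (not the Liu and Shen variant used in the proposed algorithm) can be sketched as follows for the capacitated case. The sketch assumes a symmetric distance matrix with the warehouse at index 0 and stops at the first non-positive saving; the function and argument names are hypothetical.

```python
def clarke_wright(dist, demand, capacity):
    """Basic Clarke and Wright savings heuristic for the CVRP.

    dist     - symmetric (n+1) x (n+1) matrix, index 0 is the warehouse
    demand   - demand[c] is the ordered weight of customer c (demand[0] unused)
    capacity - maximal weight of one route
    """
    n = len(dist) - 1
    routes = {c: [c] for c in range(1, n + 1)}    # route id -> customer sequence
    route_of = {c: c for c in range(1, n + 1)}    # customer -> route id

    # Savings s_ij = d_i0 + d_0j - d_ij, sorted in descending order.
    savings = sorted(
        ((dist[i][0] + dist[0][j] - dist[i][j], i, j)
         for i in range(1, n + 1) for j in range(i + 1, n + 1)),
        reverse=True,
    )
    for saving, i, j in savings:
        if saving <= 0:
            break  # assumption: merges without positive savings are skipped
        ri, rj = route_of[i], route_of[j]
        if ri == rj:
            continue
        a, b = routes[ri], routes[rj]
        # Both customers must be end customers (first or last) of their routes.
        if i not in (a[0], a[-1]) or j not in (b[0], b[-1]):
            continue
        if sum(demand[c] for c in a + b) > capacity:
            continue
        # Orient the routes so that i ends the first and j starts the second.
        if a[-1] != i:
            a = a[::-1]
        if b[0] != j:
            b = b[::-1]
        merged = a + b
        routes[ri] = merged
        del routes[rj]
        for c in merged:
            route_of[c] = ri
    return list(routes.values())


dist = [
    [0, 10, 10, 10],
    [10, 0, 2, 20],
    [10, 2, 0, 20],
    [10, 20, 20, 0],
]
routes = clarke_wright(dist, demand=[0, 4, 4, 4], capacity=8)
```

Here customers 1 and 2 are close to each other, so they end up in one route, while customer 3 stays alone because merging it yields no savings and would exceed the capacity.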

As it was said, the initial solution is created in the first step. From the main version of the Clarke and Wright algorithm, other versions have been derived over time, some of which give better results. One of those was formed in 1999, in the work of Liu and Shen [36], and that version is implemented in the proposed algorithm. The main difference, from the main version, is that it not only observes the possibility of merging two routes but also observes the possibility of inserting a route between two customers on a different route. Besides that, the insertion of an inverse route between two customers of a different route (a reversed route is acquired by reversing the order of servicing customers), is observed as well. Given that one route can be inserted between two customers of a different route, savings cannot be calculated in advance, as is the case with the general algorithm. Therefore the algorithm is slightly different as well. The initial solution is acquired the same way, but the iterations are performed differently. In each iteration, two routes are merged. Of all the merging possibilities, the one with the highest savings is selected and executed.

During the selection of the routes, each pair of routes that can be merged is observed, taking into account that the new route needs to meet these conditions: weight, volume and time of arrival at the customer. The first step does not allow for a route to have a weight larger than the constant , which is set to the value of the largest vehicle's weight capacity. Also, it is not allowed for a route to have a volume greater than the constant , which is set to the value of the largest vehicle's volume capacity. Each overrun is calculated assuming that the largest vehicle will be driving each route. On the other hand, in the first step, the proposed route would have to depend on the available fleet of vehicles. To take into account the vehicles, route cost, which up to now only depended on the travelled distance, now depends on the fleet of vehicles. The route cost is equal to the distance travelled multiplied by the variable cost of the cheapest vehicle that can drive that route (by capacity).

When it comes to the SDVRP problem, a new term (constant) was introduced in the first step, which shows the similarity between routes and which is affected by constraints related to which vehicle cannot service which customer. During implementation and testing of the algorithm, it can be noticed that customers, who cannot be serviced by a certain vehicle are in several different routes, so that vehicle cannot travel any of those routes. It happened regularly that a set of routes cannot be travelled by an available fleet of vehicles. That’s why the algorithm allows the option to reduce the savings by a number (constant) which shows how similar routes are in terms of constraints, and increases the chance that two routes with similar constraints are merged. Taking into account the number , the algorithm tends to put all of the customers that cannot be serviced by a certain vehicle into the same route, so that only one route cannot be travelled by that vehicle. The number is obtained, so that for each pair of customers, where one customer is in one route and the other is in the second route, 1 is added to each of the differences in constraints of those two customers. This is one of the secondary contributions of this paper.

Each customer has information for each vehicle, regarding whether it can be served by them. Each vehicle, which can serve only one of two customers, represents the difference. After the number of differences is calculated, that number is divided by the sum of the number of customers in those two routes (which normalizes this penalty) and is multiplied by the constant, which often had the value 3. After the highest savings amount is determined, if it's possible to obtain savings by merging (there is a chance that no two routes can be merged), the routes are merged (if the highest savings amount was obtained when a reversed route was inserted into another, the reversed one is inserted). That way, after each iteration, two routes are merged, and the number of routes decreases by 1. If it is not possible to merge any two routes to obtain savings, the algorithm terminates.
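The similarity penalty described above can be illustrated with a short sketch. It assumes that the vehicles allowed for each customer are stored as a set, so that the vehicles able to serve exactly one of two customers form the symmetric difference of the two sets. The names `constraint_difference_penalty`, `allowed` and `weight` are hypothetical; the default of 3 follows the value mentioned in the text.

```python
def constraint_difference_penalty(route_a, route_b, allowed, weight=3.0):
    """Penalty subtracted from the savings of merging route_a and route_b.

    allowed[c] - set of vehicle ids that may service customer c
    weight     - multiplicative constant (the text mentions a typical value of 3)
    """
    # A vehicle that can serve only one of the two customers is one difference.
    differences = sum(
        len(allowed[a] ^ allowed[b]) for a in route_a for b in route_b
    )
    # Normalize by the total number of customers in both routes.
    return weight * differences / (len(route_a) + len(route_b))


allowed = {1: {"v1", "v2"}, 2: {"v1"}, 3: {"v1", "v2"}}
penalty = constraint_difference_penalty([1], [2, 3], allowed)
```

In this example only the pair (1, 2) differs, by one vehicle, so the penalty is 3 * 1 / 3 = 1. Routes whose customers share the same vehicle restrictions receive no penalty and are therefore more likely to be merged.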

3.1.3 First step – Assignment of vehicles to the routes

If it is not possible to merge two routes to obtain savings (the savings will be negative), then the step terminates. The current set of routes is the solution. Because the routes do not have their vehicles yet, in the following step vehicles are assigned to the routes using a brute-force approach.

All possible permutations of vehicle assignments are observed, and the one with the minimal cost is chosen (Pseudocode 3). The number of permutations increases rapidly with the size of the fleet, so this method is used only when the number of available vehicles in the fleet is less than 10-15 (where the number of permutations is still reasonable, based on experimental measurements). For fleets with more than 10-15 vehicles, vehicle assignment is done using a greedy algorithm, because the number of permutations grows too quickly. First, the vehicle with the lowest cost, among the vehicles with a capacity large enough to travel that route, is assigned to the longest route. In the following iterations, the cheapest remaining vehicle is assigned to the next route in the same way. If the number of routes is greater than the number of vehicles, the necessary number of fictitious vehicles is included. A fictitious vehicle has a cost larger than that of the other vehicles. Its cost is a constant , and its capacity is represented using two constants: and .

The following step of the algorithm needs to remove this vehicle (because if it does not, the algorithm does not return a valid solution). In practice, the high cost of this vehicle forces the second step to remove it, and if the second step does not succeed in removing it, then it is intuitively impossible to find a solution with the real fleet of vehicles.
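The two assignment strategies can be compared in a small sketch. It assumes that a route is described only by its length and load, that a vehicle is described by one capacity and a cost per kilometre, and that there are at least as many vehicles as routes (fictitious vehicles are left out). All names are hypothetical.

```python
from itertools import permutations


def assignment_cost(routes, vehicles, order):
    """Total cost when routes[k] is driven by vehicles[order[k]]."""
    total = 0.0
    for route, v in zip(routes, order):
        if route["load"] > vehicles[v]["capacity"]:
            return float("inf")  # infeasible assignment
        total += route["length"] * vehicles[v]["cost_per_km"]
    return total


def assign_brute_force(routes, vehicles):
    """Try every assignment of distinct vehicles to routes (small fleets only)."""
    best = min(
        permutations(range(len(vehicles)), len(routes)),
        key=lambda order: assignment_cost(routes, vehicles, order),
    )
    return list(best)


def assign_greedy(routes, vehicles):
    """Longest route first; each gets the cheapest free vehicle that fits."""
    free = set(range(len(vehicles)))
    result = [None] * len(routes)
    for r in sorted(range(len(routes)), key=lambda r: -routes[r]["length"]):
        feasible = [v for v in free if vehicles[v]["capacity"] >= routes[r]["load"]]
        if feasible:
            v = min(feasible, key=lambda v: vehicles[v]["cost_per_km"])
            result[r] = v
            free.remove(v)
    return result


routes = [{"length": 100, "load": 5}, {"length": 50, "load": 9}]
vehicles = [
    {"capacity": 10, "cost_per_km": 1.0},
    {"capacity": 6, "cost_per_km": 0.5},
]
exact = assign_brute_force(routes, vehicles)
greedy = assign_greedy(routes, vehicles)
```

On this tiny instance both strategies agree, but in general the greedy assignment is only an approximation of the exhaustive one, which is the reason for the fleet-size threshold mentioned above.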

3.1.4 Second step (intermediate step) – Route elimination

When the solution cost is considered together with the number of routes, it can be noticed that the lower the number of routes, the lower the solution cost tends to be. Even though that is not always the case, heuristic algorithms for the VRPTW typically assume that a solution with fewer routes (or used vehicles) is better than a solution with more routes (or used vehicles), and the cost is only compared when two solutions have the same number of routes (or used vehicles in the routing plan). One possible reason is that every route has a fixed cost, which is independent of the variable cost, and so it is better to have a smaller number of routes. In the case of the HVRPTW problem, which is most commonly found in real-world examples, this does not hold: because different vehicles travel the routes, the number of routes alone does not mean much.

After the first step of the algorithm has returned a set of routes, this step of the algorithm attempts to decrease the number of routes. The solution with fewer routes is accepted only if the cost is less or equal to the initial cost. The idea of route elimination was taken from the work of Braysy et al [37]. Route elimination is done using a constant number of iterations, and in each of the iterations the following is done (Pseudocode 4):

A random permutation of the current set of routes is chosen.

Taking the first route of the permutation, the algorithm tries to move each of its customers, in turn, into another route.

If it succeeds in moving all of the customers into other routes, and the cost of the newly obtained solution is less than the cost of the previous solution, the route is removed from the permutation and the previous step is repeated. If it does not succeed in doing that at a lower cost, all of the customers are returned to the initial route and the next iteration is started.

Each permutation is chosen with equal probability, and the way of choosing permutations is explained in the book of Skiena [38]. When a customer is moved, all possible places where that customer can be put are checked, and the one with the smallest cost is selected. If, in the end, the total cost of rearranging the customers is less than or equal to the cost of the route that is removed, the removal of the route is approved, and the next route in the permutation is observed. That route is now the first route of the permutation.
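The route elimination loop can be sketched as follows. This is a simplified illustration, assuming that the route cost is only the travelled distance, that the only constraint is a single weight capacity, and that the depot has index 0; time windows, volume and vehicle costs are left out. All function names are hypothetical.

```python
import random


def route_length(route, dist):
    """Length of the tour depot -> customers -> depot (depot has index 0)."""
    path = [0] + route + [0]
    return sum(dist[a][b] for a, b in zip(path, path[1:]))


def cheapest_insertion(customer, routes, skip, dist, demand, capacity):
    """Best (extra cost, route index, new route) over all feasible positions."""
    best = None
    for r, route in enumerate(routes):
        if r == skip or sum(demand[c] for c in route) + demand[customer] > capacity:
            continue
        for pos in range(len(route) + 1):
            candidate = route[:pos] + [customer] + route[pos:]
            extra = route_length(candidate, dist) - route_length(route, dist)
            if best is None or extra < best[0]:
                best = (extra, r, candidate)
    return best


def try_remove_first(routes, dist, demand, capacity):
    """Move all customers of routes[0] elsewhere; return new routes or None."""
    trial = [list(r) for r in routes]
    added = 0.0
    for customer in routes[0]:
        best = cheapest_insertion(customer, trial, 0, dist, demand, capacity)
        if best is None:
            return None
        extra, r, candidate = best
        trial[r] = candidate
        added += extra
    # Accept only if rearranging costs no more than the removed route.
    return trial[1:] if added <= route_length(routes[0], dist) else None


def eliminate_routes(routes, dist, demand, capacity, iterations=20, seed=0):
    rng = random.Random(seed)
    routes = [list(r) for r in routes]
    for _ in range(iterations):
        rng.shuffle(routes)  # uniformly random permutation of the routes
        while len(routes) > 1:
            reduced = try_remove_first(routes, dist, demand, capacity)
            if reduced is None:
                break  # customers stay where they were; next iteration
            routes = reduced
    return routes


# Depot and three customers on a line at positions 0, 1, 2, 3.
dist = [[abs(i - j) for j in range(4)] for i in range(4)]
result = eliminate_routes([[1], [2], [3]], dist, demand=[0, 1, 1, 1], capacity=3)
```

On this line instance the three single-customer routes collapse into one route, since every removal can be absorbed by the remaining routes at no higher cost.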

3.1.5 Third step – Tabu search

Tabu search, created by Fred W. Glover in 1986 [39] and formalized in 1989 [40]-[41], is a metaheuristic that guides a local search procedure used for mathematical optimization. Local search is a method of solving combinatorial optimization problems which starts from one solution and, in each of a series of iterations, moves from the current solution to a solution in its neighbourhood. The neighbourhood of a solution consists of solutions which are, in some way, similar to it, most often those which can be obtained from the current solution through some form of modification.

Naive local search always chooses the best solution from the neighbourhood of the current solution, but doing so can cause solutions to repeat (for example, the search can cycle between two solutions, each of which is the best in the neighbourhood of the other). Tabu search solves that problem by forbidding some solutions from the neighbourhood, which prevents the repetition.

Those solutions can be forbidden by remembering a list of forbidden solutions, or, as is the case with this algorithm, by remembering the component of the solution that is forbidden. The proposed algorithm forbids a customer from entering a route for a certain number of iterations after ending up in that route. That way, all solutions which have the customer in that route are forbidden. This prevents solutions from repeating, and the algorithm tends not to revisit even similar solutions, so that it can move into a different region of the solution space.

After the previous part is finished, the third step starts, which is the Tabu search with the current solution as the initial solution. The main idea was taken from the work [42], and the algorithm was continuously improved.

The Tabu search procedure, which has a fixed number of iterations, is started. The current solution changes in each iteration. Throughout, the best solution found so far is memorized; in each iteration, it is checked whether the current solution is better than the best one so far, and if it is, the current solution becomes the best solution. At the end of the iterations, the best solution is returned as the result (Pseudocode 5).

For the neighbourhood of solutions, two operators which modify the solution are used: RELOCATE and SWAP. The procedure will be briefly explained in the following text.

The operator RELOCATE removes a customer from their route and relocates them to a different route. All possible combinations are observed, and the one with the highest savings is selected. The algorithm goes through multiple iterations, one iteration for each customer. In every iteration, the following is done for customer :

The savings are calculated for the cost, in the case where the customer would be relocated from their route.

For each route except the one is in, the following is done:

After all of the iterations are done, the relocation which gave the highest savings is performed, provided those savings are higher than 0. Solutions which have overruns in terms of weight and volume, as well as delays, are permitted, but additional penalty points are incurred for not meeting those constraints, as described in the definition of the solution cost at the beginning of this section. For each customer, the number of places where the customer can be relocated to is approximately equal to the number of customers, considering that the customer can be relocated behind every customer except the ones from their own route, and to the start of each route except their current route (so the number of possible places is $n - k + r - 1$, where $n$ is the number of customers, $r$ is the number of routes, and $k$ is the number of customers in the current route of the customer). If there are $n$ customers, every one of the removed customers can be relocated to approximately $n$ places. Therefore, every solution has $O(n^2)$ solutions in its neighbourhood when the RELOCATE operator is used, and that also determines the time complexity of the operator.
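A minimal sketch of the RELOCATE neighbourhood is shown below. It assumes that the cost of a route is only its travelled distance (the penalty terms and tabu checks are omitted) and that the depot has index 0; the function names and the format of the returned move are hypothetical.

```python
def route_length(route, dist):
    """Length of the tour depot -> customers -> depot (depot has index 0)."""
    path = [0] + route + [0]
    return sum(dist[a][b] for a, b in zip(path, path[1:]))


def best_relocate(routes, dist):
    """Return (saving, (src_route, src_index, dst_route, dst_position)).

    The move is None when no relocation has a saving greater than 0.
    """
    best = (0.0, None)
    for src, route in enumerate(routes):
        for idx, customer in enumerate(route):
            without = route[:idx] + route[idx + 1:]
            # Saving obtained by removing the customer from its route.
            removal_saving = route_length(route, dist) - route_length(without, dist)
            for dst, target in enumerate(routes):
                if dst == src:
                    continue
                for pos in range(len(target) + 1):
                    candidate = target[:pos] + [customer] + target[pos:]
                    insertion_cost = (
                        route_length(candidate, dist) - route_length(target, dist)
                    )
                    saving = removal_saving - insertion_cost
                    if saving > best[0]:
                        best = (saving, (src, idx, dst, pos))
    return best


# Depot and three customers on a line at positions 0, 1, 2, 3.
dist = [[abs(i - j) for j in range(4)] for i in range(4)]
move = best_relocate([[1, 3], [2]], dist)
```

The nested loops visit every customer and every insertion position in the other routes, which reflects the quadratic size of the neighbourhood discussed above. In the example, moving customer 2 between customers 1 and 3 removes a whole route and saves 4 distance units.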

The SWAP operator exchanges two customers from different routes. Similarly to the previous operator, every two customers who are not from the same route are observed, as well as the savings which would be obtained should the exchange be performed. In the end, the two customers who give the highest savings are chosen. If those savings are greater than 0, the exchange is performed. The number of possible exchanges depends on the number of routes and the number of customers per route. Considering that the number of routes is always at least 3, and the number of customers per route is approximately equal for every route, each customer can be exchanged with at least $2n/3$ of the other customers, where $n$ is the number of customers (because their own route does not have more than $n/3$ customers). So there are at least $n^2/3$ possible exchanges (every customer is exchanged with at least $2n/3$ others, and the product is divided by 2 because each exchange is counted twice). Therefore, for each solution, there are $O(n^2)$ other solutions that can be obtained by exchanging two customers.

Using the presented operators, in each iteration the proposed algorithm evaluates how RELOCATE and SWAP can change and improve the solution. The move which gives the best solution and is not forbidden in terms of the set constraints is selected.

The algorithm has the characteristics of Tabu search in that it forbids a customer from entering the same route twice within a short number of iterations. That is achieved by keeping a matrix in which rows represent customers and columns represent routes, and which stores for how many subsequent iterations a certain customer is forbidden from entering a certain route. Combinations which are not allowed are not taken into account during the calculation of cost.

Every time a customer enters a route through the RELOCATE operation, the value assigned to that customer and that route becomes equal to the constant TABU, which represents the number of following iterations in which that customer will not be able to enter that route. Every time the SWAP operation happens, and two customers enter new routes, the values in the matrix which represent those two customers and their new routes are set to value . The SWAP operation is not executed only when both customers are forbidden from entering the routes which they would otherwise enter. After the modified solution with the highest savings is selected, the operator is executed. Then, when the operator RELOCATE is applied, the vehicles are assigned to their routes again, using the greedy algorithm. The routes are sorted by length first, and vehicles are sorted by travelled kilometres. Afterwards, the cheapest vehicle from the fleet which can travel that route in terms of capacity (weight and volume) is assigned to the longest route, and that vehicle becomes unavailable. The process then continues: the second-longest route is assigned the cheapest available vehicle which can travel it in terms of capacity, followed by the third-longest and the rest of the routes, until all routes have been processed. Before the assignment is done, the solution cost is memorized, and if the solution cost is greater after the assignment, the previous assignment of vehicles to routes is restored and the new assignment is cancelled.

Every 1000th iteration, all possible assignments of vehicles to routes are checked, and the best one is chosen. Given that checking all possibilities is a demanding operation, it is not feasible to try all of them in every iteration, and that is why vehicles are usually assigned using the greedy algorithm described earlier. This is one of the secondary contributions of this paper.

Also, the proposed algorithm tries to research a greater area of solutions and strives to find more diverse solutions. It does that by calculating how many times the pair (customer, route) appeared in the solution (how many times the route had that customer). That is stored in the table , and if the cost in the transition is positive, the number of times the customer was in the route multiplied by the appropriate constants is added to the cost. Intuitively, this means that only when the current solution does not have a better solution than itself in its neighbourhood, during the calculation of the transition cost, the solutions which appeared less up until that moment are preferred, so that the algorithm has a more diverse search. The idea given in [42] is that for each pair (customer , route ) the best solution in which that pair appears in (in which customer is in route ) is memorized as well. When it happens that a solution cannot be the next solution because customer cannot enter route because of , two options are considered:

The solution cost is not observed at all, which is the simplified form.

The solution cost is calculated, and if that cost is less than the minimal cost so far, in which customer and route participate, then the solution is chosen.

The second form presents a modification of the standard Tabu search.
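The tabu matrix and the second option (accepting a forbidden move when it beats the best cost recorded for that customer and route) can be sketched as follows. This is an illustrative data structure under stated assumptions, not the authors' implementation: the class name `TabuMemory`, its methods and the choice to update the best cost only when a move is executed are all hypothetical.

```python
class TabuMemory:
    """Per (customer, route) tabu counters with a simple aspiration rule."""

    def __init__(self, n_customers, n_routes, tenure):
        self.tenure = tenure  # plays the role of the constant TABU
        # forbidden[c][r]: remaining iterations in which c may not enter r.
        self.forbidden = [[0] * n_routes for _ in range(n_customers)]
        # best_cost[c][r]: lowest solution cost seen with c in route r.
        self.best_cost = [[float("inf")] * n_routes for _ in range(n_customers)]

    def is_allowed(self, customer, route, candidate_cost):
        if self.forbidden[customer][route] == 0:
            return True
        # Aspiration: a forbidden move is accepted if it improves on the
        # best cost recorded so far for this customer and route.
        return candidate_cost < self.best_cost[customer][route]

    def record_move(self, customer, route, cost):
        """Call after the customer has entered the route."""
        self.forbidden[customer][route] = self.tenure
        self.best_cost[customer][route] = min(self.best_cost[customer][route], cost)

    def tick(self):
        """Advance one iteration: all positive counters decrease by one."""
        for row in self.forbidden:
            for r, left in enumerate(row):
                if left > 0:
                    row[r] = left - 1


memory = TabuMemory(n_customers=2, n_routes=2, tenure=2)
memory.record_move(customer=0, route=1, cost=100.0)
blocked = memory.is_allowed(0, 1, candidate_cost=120.0)
aspired = memory.is_allowed(0, 1, candidate_cost=90.0)
memory.tick()
memory.tick()
released = memory.is_allowed(0, 1, candidate_cost=120.0)
```

Storing counters per (customer, route) pair keeps the check constant-time per candidate move, at the cost of forbidding whole families of solutions rather than individual ones.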

3.1.6 Fourth step (post-step) – Improvements within each route

The fourth step of the whole modular algorithm additionally strives to optimize each route separately. Given that the previous step (Tabu search) relocates customers between different routes, the routes resulting from the third step can often be improved further by relocating customers from one position to another within the same route. Besides that, considering that the algorithm is conceived to solve practical problems, where the route cost directly depends on the weight of the merchandise the vehicles are transporting and delivering to each customer, permuting the customers within each route can reduce this cost.

This statement can be especially noticed when a part of a certain route is located in one city, and the other part in another city, which is farther away from the warehouse.

The fourth step could be described in the following way, in terms of iterations (Pseudocode 6):

Take one customer from the route and try to relocate it to every position in the same route.

Calculate the route cost for every customer relocation.

If the newly calculated route cost is less than the route cost before the relocation of the customer, the new route is kept as the best route so far. Otherwise, the initial route of this step is kept, and the customer is returned to its starting position in the route.

Iterations are finished when there are no possible relocations of a single customer within the same route that decrease the whole route cost. This, of course, requires that all the set constraints are met. Considering that the listed operations are neither complex nor demanding in terms of time, since they only involve recalculating the route cost after permuting the customers, there is room for additional improvement, for example, by executing the same procedure for two or three consecutive customers within the same route. In that case, there is an attempt to relocate two consecutive customers to different positions, and if the cost is smaller than before, the new route is kept; otherwise, the route obtained after relocating customers one by one (the previously described procedure) is kept. There is only a small probability that improvements can be achieved if more than three consecutive customers are permuted. Therefore, for each of the obtained routes, the algorithm terminates after the iterations that relocate three consecutive customers. This way of improvement within each route presents one of the contributions of this paper.
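The single-customer version of this post-step can be sketched as follows. The sketch takes the route cost as a callable so that a weight-dependent cost could be plugged in; here a plain distance cost with the depot at index 0 is used as a stand-in, and constraint checks are omitted. The names `improve_route` and `route_length` are hypothetical.

```python
def route_length(route, dist):
    """Length of the tour depot -> customers -> depot (depot has index 0)."""
    path = [0] + route + [0]
    return sum(dist[a][b] for a, b in zip(path, path[1:]))


def improve_route(route, cost):
    """Relocate single customers within one route until no move lowers the cost.

    cost - callable that maps a customer sequence to its route cost
    """
    route = list(route)
    improved = True
    while improved:
        improved = False
        for idx in range(len(route)):
            customer = route[idx]
            rest = route[:idx] + route[idx + 1:]
            for pos in range(len(rest) + 1):
                candidate = rest[:pos] + [customer] + rest[pos:]
                if cost(candidate) < cost(route):
                    # Keep the cheaper route and restart the scan.
                    route, improved = candidate, True
                    break
            if improved:
                break
    return route


# Depot and three customers on a line at positions 0, 1, 2, 3.
dist = [[abs(i - j) for j in range(4)] for i in range(4)]
better = improve_route([2, 1, 3], lambda r: route_length(r, dist))
```

Because every accepted move strictly lowers the cost, the loop terminates. Extending the procedure to blocks of two or three consecutive customers would only change how `customer` and `rest` are formed.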

3.2 Data-driven approach for adjusting the control parameters of the

The previous section presents a modular algorithm, which is able to solve real-world VRP with a great number of various realistic constraints. The proposed algorithm consists of control parameters (constants) which are listed in the following text.

are not assigned to routes. However, it is still not permitted for routes to have a total weight that is greater than the weight of the largest vehicle in the fleet, and analogously, routes cannot have a total volume greater than the volume of the largest vehicle travelling that route. These two constants represent those two values which no route can exceed.

which is added to the solution when a route cannot be travelled by any real vehicles. That can happen when the number of routes is greater than the number of vehicles or when a route is overloaded in terms of volume or weight. The most common value of this parameter is equal to 2.

currently set through the input parameter of the algorithm, and its most common value is 25000.

iteration of Tabu search, it is allowed for the vehicle to be overloaded in terms of weight and volume by the values and . The most commonly set value for the parameter is 50 [kg], and for it is 0.1 [m3].

this constant is 30.

value of this parameter is 0.5 [CurrencyUnit/min].

0.001.

service them (SDVRP). This parameter is very important for meeting the constraints, and its most common value is 400.

represents the coefficient of cost increase when the vehicle is transporting the maximum weight. The set value is 0.2.

represents the cost increase when the volume is overloaded by 100%. The most commonly set value is 400.

represents the cost increase when the weight is overloaded by 100%. The most commonly set value is 400.

Which routes will be obtained as a result, how high the total cost will be, and whether all the set constraints will be met all depend on the values of the given parameters. The primary goal with real-world VRP is to not violate any of the set constraints. Accordingly, some of the listed control parameters are set in the way described in the following text, using several prediction methods and algorithms which use historical data. In short, the whole idea of parameter adjustment is presented in Figure 2.

Figure 2: Control parameter adjustment approach using historical data

From all the parameters that can influence the final result, several fundamental ones, which are memorized for every route, have been isolated during testing and production use in several distribution companies which distribute products from their warehouses to delivery locations (markets, shops, supermarkets, and others). That way, a knowledge base, which is updated every day, is created, and the primary goal of this step of the system is to adjust the control parameters and constants, which are an integral part of the system, according to that historical data. With that, the whole approach gets an adaptive character: the algorithm improves over time, because the parameters which enable obtaining the minimal cost of the transport routes are adjusted automatically while all of the set constraints are met at the same time.

Attributes which are isolated as those that affect the obtained routes are:

number of customers

number of available vehicles

number of different available types of vehicles

number of different cities

the total number of constraints related to which customer cannot be serviced by which vehicle

the total number of articles ordered [items]

the total volume of all articles ordered [m³]

the total weight of all articles ordered [kg]

the total duration of time windows of all the customers [min]

whether all set constraints are met (1 – yes, 0 – no)

The goal attributes which affect the obtained routes and total cost, and which represent the control parameters of the implemented algorithm are:

Each of these parameters is set separately. First, the data were pre-processed, and only the historical records that meet all the constraints (value 1 in the corresponding column) were selected. Next, the redundant attributes were removed. In the Attribute Importance step, the importance of the input attributes was determined for every goal attribute using the Minimum Description Length (MDL) algorithm. Before that, the attributes were normalized and their values rounded to one decimal place. After pre-processing and preparation of the input data, a regression model is created for determining the goal attributes. The model is identical for every goal attribute. The proposed model for one parameter is shown in Figure 3.

Figure 3: Proposed regression model for one parameter
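The pre-processing described above (keeping only feasible historical records, normalizing, and rounding to one decimal place) can be illustrated with a minimal sketch. The record layout, the key name `all_constraints_met`, and the choice of min-max normalization are assumptions made for illustration; the text does not state which normalization was used.

```python
from typing import Dict, List

Record = Dict[str, float]


def preprocess(history: List[Record],
               feasible_key: str = "all_constraints_met") -> List[Record]:
    """Keep feasible records, min-max normalize each attribute, round to 1 decimal.

    Assumption: every record holds the same numeric attributes plus a 0/1 flag
    stored under `feasible_key` (a hypothetical column name).
    """
    # Step 1: select only the records in which all constraints were met.
    feasible = [r for r in history if r[feasible_key] == 1]
    if not feasible:
        return []

    attributes = [k for k in feasible[0] if k != feasible_key]
    lowest = {a: min(r[a] for r in feasible) for a in attributes}
    highest = {a: max(r[a] for r in feasible) for a in attributes}

    # Step 2: normalize to [0, 1] and round to one decimal place.
    result = []
    for r in feasible:
        row = {}
        for a in attributes:
            span = highest[a] - lowest[a]
            # A constant attribute carries no information; map it to 0.0.
            row[a] = round((r[a] - lowest[a]) / span, 1) if span else 0.0
        result.append(row)
    return result
```

Rounding after normalization reduces each attribute to at most eleven distinct levels, which is one plausible reason for such a step before an attribute-importance ranking.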

Two algorithms were used for the regression:

Generalized Linear Models (GLM)

Support Vector Machine (SVM)

These two regression models were selected for several reasons. The advantages of SVM over other methods are that it generalises well to unseen test data, yields a simple optimal solution to the training problem, and has fewer parameters to tune than other methods. Execution speed is not crucial for the problem at hand, so the disadvantages of SVM regression can be ignored. GLM was chosen because it is a generalization of linear regression and is often used when the output variables are not normally distributed. Given that the input data are linearly related, GLM regression was a logical choice. Finally, a Decision Support System was built which, for each goal attribute, compares the results of the two algorithms by the Predictive Confidence [%] indicator and chooses the algorithm with the better value; that algorithm is then used for the goal attribute during new routing.
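The selection rule of the Decision Support System can be sketched as follows. The paper does not define Predictive Confidence, so the formula below (relative improvement of the model error over a naive mean predictor, in percent) is an assumption, and the function names and model labels are hypothetical.

```python
from math import sqrt
from typing import Dict, Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def predictive_confidence(actual: Sequence[float],
                          predicted: Sequence[float]) -> float:
    """ASSUMED definition: improvement over a naive mean predictor, in percent."""
    mean = sum(actual) / len(actual)
    naive_error = rmse(actual, [mean] * len(actual))
    if naive_error == 0:
        return 0.0  # the naive model is already perfect; nothing to improve
    return max(0.0, 1.0 - rmse(actual, predicted) / naive_error) * 100.0


def choose_model(actual: Sequence[float],
                 predictions_by_model: Dict[str, Sequence[float]]) -> str:
    """Return the name of the model with the highest Predictive Confidence."""
    scores = {name: predictive_confidence(actual, predicted)
              for name, predicted in predictions_by_model.items()}
    return max(scores, key=scores.get)
```

Applied once per goal attribute, `choose_model(actual, {"GLM": glm_predictions, "SVM": svm_predictions})` would return the label of the model to use for that attribute in the next routing.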

The following section lists the obtained results for this part of the control parameter adjustment of the algorithm, the obtained results of the whole proposed VRP modular algorithm on standard data sets, as well as actual data sets, in great detail.

4 Discussion and comparison of results

This section will be divided into four subsections:

(4.1) analysis of the obtained results on standardized input data,

(4.2) analysis of the results on real-world benchmark data,

(4.3) analysis of the control parameter adjustment results of the proposed algorithm (from Section 3.2),

(4.4) practical significance of the proposed approach.

4.1 Analysis of the obtained results on standardized benchmark data

For the VRP with time windows, some data instances have become standard over time, and VRP algorithms are commonly tested and validated on them. In 1987, Solomon [43] published a set of VRPTW instances with 25, 50, and 100 customers. For a long time these instances remained a challenge, and no instances with more customers existed. Over the last 20 years, as algorithms able to handle more delivery sites have been developed, a need arose for instances with more customers. In 2005, Gehring and Homberger [44] created a new set of instances with 200, 400, 600, 800, and 1000 customers. They generated their instances in the same way Solomon created his. All of the instances can be found on the web page [45], where the optimal solutions are updated regularly. Those solutions were used in the first method of testing the proposed algorithm.

Solomon’s instances specify the number of vehicles in the fleet, which is always equal to one-quarter of the number of customers in the instance, the capacity of the vehicles, and information about the depot and the customers. Every customer has coordinates, an ordered weight, the start and end of a time window, and an unloading time. The depot is given by its coordinates and its own time window. Distances between customers, and between customers and the depot, are calculated from the coordinates as the Euclidean distance between the two points. The time distances are equal to the path distances. Solomon created 6 different sets of instances: R1, R2, C1, C2, RC1 and RC2. In the R sets, the customer coordinates were chosen at random from a defined interval, so the customers are evenly distributed. In the C sets the customers are clustered, which means that larger numbers of customers are concentrated in several places. The RC sets lie between the R and C sets: the customers are neither evenly distributed nor completely clustered. In the type 1 sets, the time windows were chosen from a smaller interval than in the type 2 sets. As a result, the number of customers per vehicle is significantly greater in the type 2 sets, and the number of routes is correspondingly smaller. In Solomon’s instances, the first optimization goal is the number of routes, followed by the total distance travelled by the vehicles. A solution with fewer routes is therefore better, and if two solutions have the same number of routes, the one with the smaller distance travelled is better.
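The distance rule and the two-level objective described above can be sketched in a few lines. This is a minimal illustration, assuming that index 0 of the point list is the depot and that a route is a list of customer indices; the function names are hypothetical and time windows and capacities are ignored.

```python
from math import hypot
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def distance_matrix(points: Sequence[Point]) -> List[List[float]]:
    """Euclidean distance between every pair of points (index 0 = depot)."""
    return [[hypot(ax - bx, ay - by) for (bx, by) in points]
            for (ax, ay) in points]


def route_length(route: Sequence[int], dist: List[List[float]]) -> float:
    """Length of the tour depot -> customers in order -> depot."""
    stops = [0, *route, 0]
    return sum(dist[a][b] for a, b in zip(stops, stops[1:]))


def solution_key(routes: Sequence[Sequence[int]],
                 dist: List[List[float]]) -> Tuple[int, float]:
    """Lexicographic objective: number of routes first, total distance second."""
    return (len(routes), sum(route_length(r, dist) for r in routes))
```

Because Python compares tuples element by element, `min(candidates, key=lambda s: solution_key(s, dist))` prefers fewer routes and uses the travelled distance only to break ties, which is the ranking used for the benchmark solutions. A purely cost-driven algorithm, as proposed in this work, would rank by cost instead and can therefore return more routes.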

Table 1: Testing results for Solomon’s instances

Each of the 6 listed sets contains 8-12 instances generated in the same way. Two instances from each set were used for testing the algorithm. Because the first criterion in Solomon’s instances is the number of routes, whereas the main idea of the algorithm presented in this work is to minimize the cost while meeting all of the set constraints, for some instances the algorithm found a solution with optimal cost but with a greater number of routes (Table 1).

Table 2: Results of testing on instances for 200 and 400 customers

The algorithm was also tested on the Gehring and Homberger instances with 200 and 400 customers. On the instances with 200 customers, there was no significant difference compared to Solomon’s instances with 100 customers. Some instances have an improved solution in terms of cost, but the number of vehicles (routes) is greater than in the optimal solution. From a practical viewpoint, the obtained solutions are preferable because they decrease the cost for the company. The quality of the solutions decreases slightly as the number of customers increases, but even for 400 customers (instance rc2_4_1), a solution that is better in terms of cost and meets all the given constraints is obtained (Table 2).

The execution time of the implemented algorithm for Solomon’s instances was at most 1 [s]. The execution time grew with the number of customers: for instances with 200 customers it was at most 5 [s], while for instances with 400 customers it was up to 200 [s]. One-third of the total execution time fell on the first step of the algorithm, the second step lasted for a negligibly short period of time, while the third step took up about 2/3 of the remaining execution time of the algorithm. The algorithm was executed on a laptop with an Intel(R) Core(TM) i5-3210M CPU @ 2.50 GHz processor and 16 GB of DDR3 RAM.

4.2 Analysis of the results on real-world benchmark data

The proposed algorithm can meet constraints that are not defined in standardized data sets, as explained in detail in the previous sections. For that reason, the algorithm was also tested on real data from one of the biggest distribution companies in Bosnia and Herzegovina. The data used for testing, which are discussed in the following text, have been deposited at 4TU.ResearchData [8] so that they are available to other researchers as a new real-world benchmark dataset.

Ten different days, for each of which optimal transport routes meeting all of the set constraints had to be created, were used for testing. The results are shown in Table 3. They show that in 9 out of 10 cases all of the realistic constraints, which can be strictly defined and which can significantly complicate the search for an optimal solution, are met. Only in one case was the volume constraint of one vehicle violated, but the overrun equals (0.004 / 3.15) * 100 = 0.13% of the permitted vehicle volume, which is negligible in practice. For that day, the vehicles were filled to 94.61% of their volume.
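The overrun figure quoted above can be reproduced with a short calculation. The function name is hypothetical, and the two inputs are taken directly from the text (an excess of 0.004 over a permitted vehicle volume of 3.15, in the volume unit of the data set).

```python
def overrun_percent(excess: float, capacity: float) -> float:
    """Capacity overrun expressed as a percentage of the permitted capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return excess / capacity * 100.0


# Values quoted in the text for the single violated volume constraint.
print(f"{overrun_percent(0.004, 3.15):.2f}%")  # prints 0.13%
```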

It can also be concluded that the small number of available vehicles (7 or 8), most of which differ from one another (different types and forms of vehicles, with different characteristics), significantly complicates the search for an optimal solution. The quality of the solution depends primarily on the set constraints, and then on the number of customers and the available fleet of vehicles. Accordingly, it can be noticed that the solution cost with 94 customers is 1.5 times greater than the solution cost for a routing with 115 customers.

Table 3: Results of testing on real data of a distribution company

The execution time of the algorithm is mainly affected by the number of customers. The total execution time ranges approximately from 300 to 500 [s], which is somewhat longer than on the standardized input data. This can be explained by the rather complex data set with a large number of additional constraints. One-third of the total execution time fell on the first step of the algorithm, the second step lasted for a negligibly short period, while the third step took up about 2/3 of the remaining execution time of the algorithm. The algorithm was executed on a laptop with an Intel(R) Core(TM) i5-3210M CPU @ 2.50 GHz processor and 16 GB of DDR3 RAM.

Table 4: Results of testing on real data of a distribution company – divided delivery

The obtained routes were followed in practice: the drivers of the distribution company successfully followed the presented routes and completed them in full. Because the proposed algorithm supports divided delivery for customers that cannot be serviced by one delivery (because of the set constraints), an actual input data set for this case is also available at the mentioned 4TU.ResearchData link [8]; the cost calculated by the algorithm for it is shown in Table 4.

Figure 4: Example of divided delivery

The multiple deliveries are easiest to see on a graphic display for one of the used vehicles. In Figure 4, the left picture shows the first delivery and the return of the vehicle to the depot, while the right picture shows a second delivery for the same vehicle. The depot is marked in red, while the customers have numbered blue labels that indicate the delivery sequence.

Figure 5 shows an example of a typical delivery for one vehicle. The obtained optimal routes are shown on the map in order, where the order of servicing each customer is shown with a marker.

Figure 5: Example of a typical delivery for one vehicle

From the visual representation of the delivery, on the geographical map, it can be concluded that customers which have larger orders are serviced first, which is explained in greater detail in the introduction of Section 3.1.

4.3 Analysis of the control parameter adjustment results of the proposed algorithm

As mentioned earlier, an independent regression model is created for every control parameter with two regression algorithms: GLM and SVM. Based on the regression results for each parameter, a Decision Support System was created that selects the predicted value by Predictive Confidence. To enrich the knowledge base of the control parameters, the algorithm was run 5632 times with all of the constraints met, for a variety of different days and input parameters. The data used for testing and validating the prediction model have been deposited at 4TU.ResearchData [9] so that they are available to other researchers.

Table 5: Comparative results of the used regression models

For each control parameter, the Predictive Confidence [%] results were compared; for the given input data set they are shown in Table 5. For every control parameter, SVM gave better prediction results than GLM, which is why the implemented Decision Support System preferred the predictions of the SVM algorithm.

The Attribute Importance step of the prediction model helps determine the importance of every input attribute for the goal control parameter. The average importance of the input attributes across the output control variables, together with their resulting order, is shown in Table 6.

Table 6: Attribute importance results

Analysis of the average effect of the input attributes on each control parameter shows that the input attributes affect the predicted control parameters in the order presented in Table 6. The results are as expected, because routing on actual data with a small number of available vehicles was exceptionally complex and subject to strict constraints. This was especially influenced by the fact that an average of 8 vehicles was available for routing, seven of which were of different types and varieties, which significantly affected the results and the complexity of the algorithm execution. It follows that those attributes are the most important ones in adjusting the control parameters of the algorithm. It can also be concluded that the customers’ time windows are of great importance to the control parameters, since they significantly affect the complexity of the solution search.

The attribute which, according to Table 6, has the least significance in adjusting the values of the control parameters is the number of constraints on which customer cannot be serviced by which vehicle. This number is presented as a summarized indicator. If it were instead presented as a ratio of the customer to the number of vehicles which can service that customer, the significance of this attribute would increase, and it could become the most important one.

For every one of the control parameters, the given results are presented graphically as well. For example, results for the parameter are shown in Figure 6.

Figure 6: Predictive Confidence [%] comparison results for one control parameter

It is also possible to compare the residuals for every control parameter (a residual is the difference between the expected and the predicted value of the dependent variable). An example of the comparison for the parameter is shown in Figure 7.

Figure 7: Residual comparison

The previous figure shows that, besides the mentioned Predictive Confidence indicator, a multitude of other indicators can be obtained for each attribute (they primarily refer to the values of prediction errors), on the basis of which other comparisons can be performed and the model that meets all expectations and needs can be chosen.
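The residual defined above, and two common error indicators derived from it, can be computed as in the following sketch. The choice of mean absolute error and root mean squared error is an assumption made for illustration; the text does not list which error indicators the prediction tool reports.

```python
from math import sqrt
from typing import List, Sequence


def residuals(expected: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Residual = expected value minus predicted value, per observation."""
    return [e - p for e, p in zip(expected, predicted)]


def mean_absolute_error(expected: Sequence[float],
                        predicted: Sequence[float]) -> float:
    """Average magnitude of the residuals."""
    res = residuals(expected, predicted)
    return sum(abs(r) for r in res) / len(res)


def root_mean_squared_error(expected: Sequence[float],
                            predicted: Sequence[float]) -> float:
    """Square root of the average squared residual; penalizes large errors more."""
    res = residuals(expected, predicted)
    return sqrt(sum(r * r for r in res) / len(res))
```

Comparing two regression models on the same held-out records with these functions gives a second opinion alongside Predictive Confidence, for example when the two models score closely on that indicator.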

4.4 Practical significance of the proposed approach

No matter which approaches and methods are used for solving the VRP in real conditions, there is always a risk that the resulting routes are not entirely feasible. This depends primarily on the accuracy of the set parameters, input data and limitations. This work aims to show how some of the basic data required for solving and applying the VRP can be set in a real environment. Practically feasible routes are the only ones that companies are interested in. If the routes they receive cannot be completely carried out, a new level of uncertainty about the operation of the implemented algorithm and model arises.

Table 7: Practical significance of the proposed approach (comparison results)

As a practical conclusion from all the presented results, the proposed approach always tried to use smaller vehicles, while the bigger ones (with a much higher cost, for example one of the biggest and most expensive vehicles in the dataset [8], labelled 875M523) were included only when necessary, that is, when it was not possible to serve all the customers using smaller vehicles. Even in this case, the bigger vehicle was used for serving fewer customers at relatively small distances from the depot, to minimize its cost.

According to the comparative results presented in Table 7, the feasibility of the routes increased significantly (by about 20%) after the previously described approach was introduced. On the other hand, two more vehicles were used over the 10 days of routing, and the routing costs as well as the total distance increased slightly. However, the feasibility of the routes in realistic circumstances is the most important criterion. If the routes are optimal but not feasible, the transport processes of the company become much more difficult and costs increase, because some customers are not served as planned. Infeasible routes also damage another important factor, the company’s reputation, which can result in fewer satisfied customers, cancellations by principals, and ultimately even the closure of the company. Therefore, this work is of strategic importance for every company that transports goods.

5 Conclusions and Future research

This work presents a complex vehicle routing problem in the field of logistics with time windows and a set of real constraints, as well as a modular algorithm that solves that problem adaptively. The proposed algorithm consists of four steps. In the first step, an initial solution to the problem is created using a modification of the Clarke-Wright heuristic. The second, intermediate step then strives to decrease the number of routes. The routes produced by the second step serve as the initial solution for the local Tabu search, which is the third step. After the Tabu search, the fourth step (post-step) strives to optimize the sequence of customers within each of the resulting routes. Before these steps, a pre-step transforms the input data (customer time windows and time distances) to simplify the problem significantly, so that the delivery time at each customer becomes 0. The pre-step also transforms the warehouse and vehicle working times and divides the delivery in cases where a customer cannot be serviced by a vehicle in a single delivery.
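As a reminder of what the first step builds on, the following sketch computes the savings list of the classical Clarke-Wright heuristic, where the saving of joining customers i and j is s(i, j) = d(0, i) + d(0, j) - d(i, j) and 0 denotes the depot. It shows only the textbook savings computation with Euclidean distances, not the modified version used in this work, and the function name is hypothetical.

```python
from math import hypot
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def savings_list(points: Sequence[Point]) -> List[Tuple[float, int, int]]:
    """Classical Clarke-Wright savings, largest first.

    points[0] is assumed to be the depot; all other entries are customers.
    Each result item is (saving, i, j) with i < j.
    """
    def d(a: int, b: int) -> float:
        return hypot(points[a][0] - points[b][0], points[a][1] - points[b][1])

    n = len(points)
    savings = [(d(0, i) + d(0, j) - d(i, j), i, j)
               for i in range(1, n) for j in range(i + 1, n)]
    # Merges are attempted in order of decreasing saving.
    savings.sort(key=lambda item: item[0], reverse=True)
    return savings
```

In the classical heuristic, each customer starts on its own route and pairs are merged in this order whenever the merge keeps the route feasible; a real-world variant would add the capacity, time-window and vehicle-compatibility checks at that point.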

The proposed algorithm relies on constants and control parameters, which are determined from the knowledge base of historical data using the Generalized Linear Models and Support Vector Machine regression models. Both models provide the inputs to the Decision Support System, whose main task is to determine the best values for each of the algorithm’s control parameters. This approximation of the best control parameter values for the given input data set is done in the data preparation phase of each routing, and it gives the whole approach an adaptive character.

The presented modular, adaptive approach can solve real-life VRP problems with several hundred delivery locations while meeting all of the set real-life constraints. The algorithm was tested in two phases. The first phase used standardized data sets, on which the implemented algorithm showed highly satisfying results. For some input data sets the proposed algorithm produced results up to 6.5% better than the currently existing optimal solutions. For other instances it produced slightly worse results (never more than 3% worse than the current optimal solution of the given instance). The second phase used an actual data set, which was also published online to serve other researchers in their work and comparisons. For those data sets, the algorithm mostly managed to meet all of the very strict set constraints and, while achieving minimal cost, its execution time was satisfactory for real-life application. The algorithm is also adaptive: it automatically adjusts its control parameters, so that it improves with every routing.

The approach and this algorithm are in use in some of the biggest distribution companies as an implemented web-based enterprise system. The system enables human-computer interaction by allowing subsequent manual modification of the obtained transport routes. It also provides a graphic representation of the routes and a comparison of the results. The results also showed that the proposed improvements increased the feasibility of the obtained routes in a real environment by approximately 20%. In this way, the company is assured of the quality of the generated transportation routes, and its customers have confidence in the delivery.

Based on the detailed description in the individual sections, the contribution of this work is reflected in several ways. The basic contribution is the proposal of a modular, data-driven approach for successfully solving the vehicle routing problem that can be applied to real cases in the field of logistics. An innovative, predictive and adaptive method for setting up and adjusting the parameters and constants used in the implementation of VRP algorithms, based on historical data, is presented. Aside from the prediction model for the parameters and constants, the proposed approach also comprises a multi-step algorithm that is able to solve complex real-world VRP problems and that accepts some constraints and facts that are not taken into consideration in other scientific papers in this field. These are essential for the practical usage, feasibility and cost effectiveness of the resulting transportation routes in a realistic environment. These innovative segments of the proposed algorithm, which are emphasized in the detailed description of the algorithm itself, are also among the contributions of this work. In addition, the real benchmark dataset has been published so that other researchers can use it for further analyses and experiments. A practical contribution is the implementation of a web-based, easy-to-use system based on the proposed approach, which includes the possibility of subsequently modifying the obtained transportation routes. The system is being used by several of the largest distribution companies in Bosnia and Herzegovina. The resulting transportation routes are completely feasible, which brings financial and other benefits to the companies using it.

Future research could use Variable Neighborhood Search (VNS) or Simulated Annealing for the third step of the algorithm. The third step could also use a hybrid approach that combines multiple metaheuristic algorithms during the local search. Another improvement would be to decrease the run-time of the algorithm’s operators. One idea used in several of the mentioned scientific works is not to examine all possible combinations and choose the best one, but to examine only those combinations in which two geographically close routes can be swapped. Besides geographical location, GPS data, weather conditions and delivery times could be used for certain parameter adjustments.

Data Availability

The data used to support the findings of this study have been deposited in the online repositories:

[1] https://doi.org/10.4121/uuid:598b19d1-df64-493e-991a-d8d655dac3ea

[2] https://doi.org/10.4121/uuid:97006624-d6a3-4a29-bffa-e8daf60699d8

Conflicts of Interests

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank the Faculty of Electrical Engineering in Sarajevo for the resource support, and Info Studio d.o.o. Sarajevo for the possibility of practical use and testing.

References

[1] Dantzig, G. B., & Ramser, J. H. (1959). The truck dispatching problem. Management science, Vol. 6(1), 80-91.

[2] Tseng, Y. Y., Yue, W. L., & Taylor, M. A. P. (2005). The Role of Transportation in Logistic Chain. J. East. Asia Soc. Transp. Stud., Vol. 5, pp. 1657-1672.

[3] Lawler, E. L., Lenstra, J. K., Rinnooy Kan, A. H. G., & Shmoys, D. B. (1986). The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization. Journal of the Operational Research Society, Vol. 32(6), 655-655. doi:10.2307/2582681

[4] Schrijver, A. (2005). On the history of combinatorial optimization (till 1960). Handbooks in Operations Research and Management Science, Vol. 12, 1-68. doi:10.1016/S0927-0507(05)12001-5

[5] Klarreich, E. (2015). Computer Scientists Find New Shortcuts for Infamous Traveling Salesman Problem. WIRED, Simons Science News

[6] Bektas, T. (2006). The multiple traveling salesman problem: An overview of formulations and solution procedures. The International Journal of Management Science, Omega, Vol. 34, 209-219. doi:10.1016/j.omega.2004.10.004

[7] Kumar, S. N., & Panneerselvam, R. (2012). A Survey on the Vehicle Routing Problem and Its Variants. Intelligent Information Management, Vol. 4, 66-74. doi:10.4236/iim.2012.43010

[8] Žunić, E. (Emir) (2018) Real-world VRP benchmark data with realistic non-standard constraints - input data and results. 4TU.Centre for Research Data. Dataset. https://doi.org/10.4121/uuid:598b19d1-df64-493e-991a-d8d655dac3ea

[9] Žunić, E. (Emir) (2018) Real-world VRP data with realistic non-standard constraints - parameter setting problem regression input data. 4TU.Centre for Research Data. Dataset. https://doi.org/10.4121/uuid:97006624-d6a3-4a29-bffa-e8daf60699d8

[10] Rodrigue, J. P., Comtois, C., & Slack, B. (2016). The geography of transport systems. The Geography of Transport Systems, Taylor & Francis Group. doi:10.4324/9781315618159

[11] Braekers, K., Ramaekers K., & Van Nieuwenhuyse, I. (2016). The vehicle routing problem: State of the art classification and review. Computers & Industrial Engineering, Vol. 99, pp. 300-313. doi:10.1016/j.cie.2015.12.007

[12] Grosso, R., Muñuzuri, J., Escudero-Santana, A. & Barbadilla-Martín, E. Mathematical Formulation and Comparison of Solution Approaches for the Vehicle Routing Problem with Access Time Windows. Complexity (2018). doi:10.1155/2018/4621694

[13] Nalepa J., & Blocho M. (2016). Adaptive memetic algorithm for minimizing distance in the vehicle routing problem with time windows. Soft Computing, Vol. 20 (6), pp. 2309–2327. doi:10.1007/s00500-015-1642-4

[14] Dixit A., Mishra A., & Shukla A. (2019) Vehicle Routing Problem with Time Windows Using Meta-Heuristic Algorithms: A Survey. In: Yadav N., Yadav A., Bansal J., Deep K., Kim J. (eds) Harmony Search and Nature Inspired Optimization Algorithms. Advances in Intelligent Systems and Computing, Vol. 741, pp. 539-546. doi:10.1007/978-981-13-0761-4_52

[15] Goel R., & Maini R. (2018). A hybrid of ant colony and firefly algorithms (HAFA) for solving vehicle routing problems. Journal of Computational Science, Vol. 25, pp. 28-37. doi:10.1016/j.jocs.2017.12.012

[16] Mahmudy W. F. (2014). Improved Simulated Annealing for Optimization of Vehicle Routing Problem With Time Windows (VRPTW). Kursor, Vol. 7(3). doi:10.21107/KURSOR.V7I3.1092

[17] Mohammed, M. A., Abd Ghani, M. K., Hamed, R. I., Mostafa, S. A., Ibrahim, D. A., Jameel, H. K., & Alallah, A. H. (2017). Solving vehicle routing problem by using improved K-nearest neighbor algorithm for best solution. Journal of Computational Science, Vol. 21, pp. 232-240. doi:10.1016/j.jocs.2017.04.012

[18] Mari, F., Mahmudy, W., & Santoso, P. (2019). An Improved Simulated Annealing for the Capacitated Vehicle Routing Problem (CVRP). Jurnal Ilmiah Kursor, Vol. 9(3), pp. 117-126. doi:10.28961/kursor.v9i3.178

[19] Caballero-Morales, S. O., Martínez-Flores J. L., & Sánchez-Partida D. (2018). An Evolutive Tabu-Search Metaheuristic Approach for the Capacitated Vehicle Routing Problem. In: García-Alcaraz J., Alor-Hernández G., Maldonado-Macías A., Sánchez-Ramírez C. (eds) New Perspectives on Applied Industrial Tools and Techniques. Management and Industrial Engineering, pp. 477-495. doi:10.1007/978-3-319-56871-3_23

[20] Li, Y., Soleimani, H., & Zohal, M. (2019). An improved ant colony optimization algorithm for the multi-depot green vehicle routing problem with multiple objectives. Journal of Cleaner Production, Vol. 227, pp. 1161-1172. doi:10.1016/j.jclepro.2019.03.185

[21] Kunnapapdeelert, S., & Kachitvichyanukul, V. (2018). New enhanced differential evolution algorithms for solving multi-depot vehicle routing problem with multiple pickup and delivery requests. International Journal of Services and Operations Management, Vol. 31(3). doi:10.1504/IJSOM.2018.095562

[22] Li, J., Li, T., Yu, Y., Zhang, Z., Pardalos, P. M., Zhang, Y., & Ma, Y. (2019). Discrete firefly algorithm with compound neighborhoods for asymmetric multi-depot vehicle routing problem in the maintenance of farm machinery. Applied Soft Computing, Vol. 81. doi:10.1016/j.asoc.2019.04.030

[23] Montoya-Torres, J. R., López Franco, J., Nieto Isaza, S., Felizzola Jiménez, H., & Herazo-Padilla, N. (2015). A literature review on the vehicle routing problem with multiple depots. Computers and Industrial Engineering, Vol. 79, pp. 115-129. doi:10.1016/j.cie.2014.10.029

[24] Niu, Y., Yang, Z., Chen, P. & Xiao, J. (2018). A Hybrid Tabu Search Algorithm for a Real-World Open Vehicle Routing Problem Involving Fuel Consumption Constraints. Complexity. doi:10.1155/2018/5754908

[25] Soonpracha, K., Mungwattana, A., Janssens, G. K., & Manisri, T. (2014). Heterogeneous VRP review and conceptual frameworks. in Lecture Notes in Engineering and Computer Science. ISBN 978-988-19253-3-6.

[26] Kim, G., Ong, Y., Heng, C. K., Tan P. S., & Zhang, N. A. (2015). City Vehicle Routing Problem (City VRP): A Review. in IEEE Transactions on Intelligent Transportation Systems, Vol. 16(4), pp. 1654-1666. doi:10.1109/TITS.2015.2395536

[27] Eksioglu, B., Vural, A. V., & Reisman, A. (2009). The vehicle routing problem: A taxonomic review. Computers and Industrial Engineering, Vol. 57(4), pp. 1472-1483. doi:10.1016/j.cie.2009.05.009

[28] Alonso-Mora, J., Samaranayake, S., Wallar, A., Frazzoli, E. & Rus, D. (2017). On-demand high-capacity ride-sharing via dynamic trip-vehicle assignment. Proc. Natl. Acad. Sci. U. S. A. doi:10.1073/pnas.1611675114

[29] Wang, Y., Ma, X., Li, Z., Liu, Y., Xu, M., & Wang, Y. (2017). Profit distribution in collaborative multiple centers vehicle routing problem. J. Clean. Prod. doi:10.1016/j.jclepro.2017.01.001

[30] Wang, Y., Ma, X., Liu, M., Gong, K., Liu, Y., Xu, M., & Wang, Y. (2017). Cooperation and profit allocation in two-echelon logistics joint distribution network optimization. Applied Soft Computing. doi:10.1016/j.asoc.2017.02.025

[31] Lee, W. L. (2013). Real-Life Vehicle Routing with Non-Standard Constraints. Proceedings of the World Congress on Engineering (WCE) 2013, Vol. I, pp. 432-437.

[32] Carić, T., Galić, A., Fosin, J., Gold, H., & Reinholz, A. (2008). A Modelling and Optimization Framework for Real-World Vehicle Routing Problems. In T. Carić & H. Gold (Eds.), Vehicle Routing Problem. InTech. doi:10.5772/5790

[33] Calvet, L., Ferrer, A., Gomes, M., Juan, A., & Masip, D. (2016). Combining statistical learning with metaheuristics for the Multi-Depot Vehicle Routing Problem with market segmentation. Computers & Industrial Engineering, Vol. 94.

[34] Fu, C., & Wang, H. (2010). The solving strategy for the real-world vehicle routing problem. 3rd International Congress on Image and Signal Processing, Yantai, 2010, pp. 3182-3185.

[35] Calvet, L., Juan, A. A., Serrat, C., & Ries, J. (2016). A statistical learning based approach for parameter fine-tuning of metaheuristics. SORT - Statistics and Operations Research Transactions, Vol. 40(1), pp. 201-240.

[36] Liu, F. H., & Shen, S. Y. (1999). The fleet size and mix vehicle routing problem with time windows. Journal of the Operational Research Society, pp. 721-732.

[37] Braysy, O., Porkka, P. P., Dullaert, W., Repoussis, P. P., & Tarantilis, C. D. (2009). A well-scalable metaheuristic for the fleet size and mix vehicle routing problem with time windows. Expert Systems with Applications, Vol. 36(4), pp. 8460-8475.

[38] Skiena, S. S. (2008). The Algorithm Design Manual. doi:10.1007/978-1-84800-070-4

[39] Glover, F. (1986). Future Paths for Integer Programming and Links to Artificial Intelligence. Computers and Operations Research, Vol. 13(5), pp. 533-549.

[40] Glover, F. (1989). Tabu Search – Part 1. ORSA Journal on Computing, Vol. 1(2), pp. 190-206.

[41] Glover, F. (1990). Tabu Search – Part 2. ORSA Journal on Computing, Vol. 2(1), pp. 4-32.

[42] Cordeau, J. F., Laporte, G., & Mercier, A. (2001). A unified tabu search heuristic for vehicle routing problems with time windows. Journal of the Operational Research Society, Vol. 52(8), pp. 928-936.

[43] Solomon, M. M. (1987). Algorithms for the vehicle routing and scheduling problems with time window constraints. Operations Research, Vol. 35(2), pp. 254-265. doi:10.1287/opre.35.2.254

[44] Homberger, J., & Gehring, H. (2005). A Two-Phase Hybrid Meta-Heuristic for the Vehicle Routing Problem with Time Windows. European Journal of Operational Research, Vol. 162(1), pp. 220-238. doi:10.1016/j.ejor.2004.01.027

[45] Transportation Optimization Portal – TOP, VRPTW benchmark data. SINTEF Applied Mathematics. Last online access: 12/04/2019. Available at: http://www.sintef.no/projectweb/top/vrptw/