Minimum Routing Cost Clustered Tree Problem (CluMRCT) is applied in various fields in both theory and application. Because the CluMRCT is NP-Hard, the approximate approaches are suitable to find the solution for this problem. Recently, Multifactorial Evolu- tionary Algorithm (MFEA) has emerged as one of the most efficient approximation algorithms to deal with many different kinds of problems. Therefore, this paper studies to apply MFEA for solving CluMRCT problems. In the proposed MFEA, we focus on crossover and mutation operators which create a valid solution of CluM- RCT problem in two levels: first level constructs spanning trees for graphs in clusters while the second level builds a spanning tree for connecting among clusters. To reduce the consuming resources, we will also introduce a new method of calculating the cost of CluMRCT solution. The proposed algorithm is experimented on numerous types of datasets. The experimental results demonstrate the effectiveness of the proposed algorithm, partially on large instances.
Mathematics of computing → Approximation algorithms; Discrete optimization.
Clustered Minimum Routing Cost Problem, Clustered Tree Problem, Multifactorial Evolutionary Algorithm, Evolutionary Algorithm
Tran Ba Trung, Huynh Thi Thanh Binh, Le Tien Thanh, Ly Trung Hieu, and Pham Dinh Thanh. 2019. Multifactorial Evolutionary Algorithm For
Clustered Minimum Routing Cost Problem. In The Tenth International Symposium on Information and Communication Technology (SoICT 2019), December 4–6, 2019, Hanoi - Ha Long Bay, Viet Nam. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3368926.3369712
The class of problems related to finding minimum cost clustered trees is among the most widely studied problems in applied mathematics and theoretical computer science. In general, clustered tree problems have many applications in the field of computer network design, computational biology, transportation and logistics as well as water resource management [12, 13]. Numerous problems with different cost function has received much attention such as Clustered Minimum Steiner Tree , Clustered Shortest-Path Tree  and Minimum Routing Cost Clustered Tree Problem (CluMRCT) .
Among those mentioned problems, CluMRCT is one of the most newly investigated problems. This problem has been formally formulated in . Concretely, consider a connected, undirected graph G = (V, E,w) with nonnegative edge length function . The vertices V are partitioned into A spanning tree T of G is a clustered spanning tree if T can be cut into k subtrees by eliminating 1 edges such that each subtree is a spanning tree for only one cluster. The CluMRCT problem focuses on the routing cost, which is the sum of shortest path distance between any pairs of vertices given a clustered spanning tree T. The CluMRCT problem is finding on graph G a clustered spanning treeT having the minimum routing cost. One practical realization of this problem is in computer network application where the communication terminals are vertices, and they are partitioned into many clusters. The communication between those terminals are restricted within a cluster and only a few terminals can be connected to another cluster for maximizing efficiency and other security concerns. Solving CluMRCT is equivalent to facilitate the network architecture that consumes the minimum peer-to-peer communication resources.
The CluMRCT has been proven to be NP-Hard by showing its relationship with an equivalent problem named clustered st-path problem, which is also proven to be NP-Hard . However, it is a pressing problem and approximate solutions are acceptable in practice. There are various approaches for finding approximate solutions for these type of intractable problem such as approximation algorithms, heuristic algorithms or meta-heuristic algorithms. However, methods to solve CluMRCT are limited to only a 2-approximation algorithm for the case of three clusters . In the practical application of CluMRCT for computer network, there would be multiple designs of the computer network topologies with various ways to set up the clusters constraint, thereby arising the problem of solving not only one CluMRCT but multiple CluMRCT instances. These instances do not exist in isolation; thus, effectively sharing the underlying pattern of minimum cost clustered tree for one instance may result in a better solution in another instance.
Motivated by that practical indication, this paper aims to design a Multifactorial Evolutionary Algorithm (MFEA)  to simultaneously solve multiple minimum cost clustered trees problem instances together. MFEA is a variant of Evolutionary Algorithm (EA) [15, 17], which is classified as a meta-heuristic algorithm that capable to provide a near-optimal solution for those mentioned hard problems in an acceptable timing limit. It is also being an algorithmic realization of a widely-study multitasking optimization scheme. It is attracting considerable interest because of its capability to optimize and facilitate better solutions for multiple related problems. The driving force behind MFEA is its genetic transfer using crossover across different tasks that share a partial proportion of high-quality solution from one task to another . If problems optimized by MFEA having the highly correlated fitness landscape and near global optima, the high-quality solution from one task will definitely having high performance in another, leading to significant improvement in every constitutive task [1, 10].
In particular, our contributions can be summarized as follow:
• We proposed a novel evolutionary algorithm to solve the CluMRCT, including a two levels meta-heuristic crossover and mutation operators on adjacency list representation of the problem.
• By following the generic procedure of MFEA, we expanded the representation and operators to facilitate a common platform for facilitating knowledge exchange between different instances of CluMRCT of the various number of clusters and cluster size.
This paper is organized as follows. Section 2 again reformulates the CluMRCT and presents comprehensive details on the way to efficiently compute its cost function. Section 3 examines the literature of the considered problem and the development of MFEA. The next section provides the two proposed algorithms. An empirical study to analyze the effectiveness of the proposed approaches is provided in Section 5. A discussion of the result and future direction is presented in Section 6
Given a weighted undirected graph G = (V, E,w). V and E are the vertex and the edge sets, respectively. In particular, all paths in the graph G are simple path. A path in G is simple if no vertex appears more than once on it. w is a nonnegative edge length function . An edge between vertices denoted by (u,v). The weight of edge (u,v) is denoted by w(u,v). A graph G is connected if all pairs of vertex within the graph is connected. A pair of vertices (u,v) is connected if G contains a path from u to v. For a graph G, V (G) and E(G) denote the vertex and the edge sets, respectively. For a vertex subset U , the subgraph of G induced by U is denoted by G[U ].
represents a cluster . An edge connecting two vertices of Gfi exists if the two corresponding clusters in G are connected by at least one edge. A spanning tree T of the graph G is a clustered spanning tree if all the vertices in the same cluster are clustered together inT. It means thatT can be cut intok subtrees by removing 1 edges such that each subtree is a spanning tree for one cluster
The objective of CluMRCT problem mainly focuses on the routing cost c(T). The routing cost c(T) is the total distance over all pairs of vertices in the connecting path defined by the clustered spanning tree T:
where is the distance between u and v onT. The CluMRCT problem searches for the clustered spanning tree T that minimize the routing cost:
In recent years, there has been considerable interest in solving the minimum cost clustered tree problem and its variants . Major developments have been carried out in developing approximate algorithms for solving the NP-hard Clustered Steiner Tree Problem (CluSteiner) variant and deriving the Steiner ratio of the problem . Chen subsequently proposed another approximate algorithm that improved the performance ratio by 40% under the condition of distance metric satisfying the triangular inequality. Another NPhard variant of the general class of minimum cost clustered tree problem called Clustered Shortest-Path Tree Problem (CluSPT) has also been widely investigated in recent literature, under theoretical viewpoint  as well as developing approximation algorithms for this problem [4, 19]. Following studies from Thanh et al. suggested another approach for solving this class problem more effectively by adopting the evolutionary multitasking paradigm [18, 20].
CluMRCT that we are focusing on this study is also a typical variant of the general minimum-cost clustered tree that having numerous practical applications. For instance, applications of CluM- RCT problem arise in the fields of network design, computational biology and transportation . CluMRCT is proven NP-hard by a transformation to the equivalent problem of. Because of its NPhardness, a popular method to tackling the CluMRCT would be developing approximation algorithm, heuristic or meta-heuristic algorithm. However, to the best of our knowledge, there are no published study has proposed such method for this problem.
In recent literature, MFEA  is a newly founded meta-heuristic algorithm that has been gaining much attention due to its ability to solve multiple hard problems more effectively than the traditional EA . Some preliminary work was carried out suggesting that MFEA is being able to capitalize the knowledge overlap across relevant tasks to facilitate better objective value on solving multiple combinatorial problems concurrently [11, 22]. Other studies attempt to investigate the theoretical foundation of MFEA as well as improve this algorithm such that its performance would not be impeded by solving tasks that are two conflicting in their fitness landscape and global optima [2, 5]. Also, there is no prior attempt to design a MFEA for solving CluMRCT.
In this section, we introduce MFEA to solve multiple CluMRCT problems simultaneously. A CluMRCT problem corresponding to a particular task in MFEA. Tasks are different number of clusters and different dimension is encoded into a unified individual. Each unified individual is divided into two-level, the first deal with constructing the spanning tree inside the clusters, the second problems deal with constructing tree connect between clusters. In the entire optimization process, we introduce novel two-levels genetic operators to construct spanning tree starting from the smallest task to the biggest task. As a result, the unified individual is the biggest tree contains the solution of all tasks.
4.1 Individual encoding, Population Initialization and Decoding method
An individual in the unified search space encompassing the genetic material of all of the tasks. An individual in the unified search space consists of the different solutions of CluMRCT problems corresponding to tasks. An individual is a spanning tree which is the number of clusters is the maximum number of the clusters from all of the tasks, and the cluster inside takes the maximum number of vertices from the all of all tasks. The clusters in each instance of CluMRCT problem is ascending sorted by the number of vertices before encoding into a unified individual to maximize the number of overlapping dimensions of different tasks. As a result, a unified individual captures solution for all task and the decoding method only finds the corresponding sub-graph with the task to produce valid solution for particular CluMRCT task. After defining structures of an individual, we generate a new individual for the CluMRCT problem. Initial individual is constructed randomly by applying PrimRST algorithms in the smallest task, all initial bigger tasks are generated from the smaller task and input graph by detecting and delete cycle until gaining spanning tree for the biggest task. The detail about encoding, decoding, and initializing individual are presented in .
4.2 Crossover operator
We introduce a new crossover operator for individuals in unified search space. In particular, the novel crossover operator generates two offsprings from two parents. Two offsprings are constructed from the set of edges of two parents. The crossover operator starts by selecting two individuals in unified search space from the current population to determine the common edges belong to each
individual, the length of this set represents the similarity of two parents. If two individuals have more common edge meaning that two individual close similarities. The algorithm determines the set of edges the only belong to a particular parent.
After that, a new offspring is constructed based on the combination of a set of common edges and a set of edges only belong to its parent. The crossover operator is process in two-levels: the first is applying for spanning inside cluster and the seconds one is applying for spanning tree which is connect among clusters. The details as presenting in Algorithm 1.
The steps of the DSTX algorithm is presented in Algorithm 2. DSTX generates two spanning trees . Line 2, 3 define two sets of edges only belong to a particular parent. Line 8 23 generates two offsprings spanning tree from the smallest task to the biggest task. Line 9 15 to generate offspring . In each particular task, the algorithm detects the cycles which are created by adding new edges to parents solution, then delete an edge in each cycle path to obtain the new solution. Removing an edge will be created a new valid solution for the current task but without disrupting the solution for the smaller task.
The Figure 1 illustrates the crossover operator on two parents. The Figure 1(b-d) presents crossover for each cluster, other figures present crossover steps for H-Graphs. Figure 1(b) depicts the sub-graph when combine the sub-graph on clusters in Parent 1 and the set of edges only belong to Parent 2. The red dash edges in Figure 1(d) belong to parent 2 obtained by performing the DSTX. Figure1(e) illustrates the sub-graph after building the spanning tree for clusters. Figure 1(d) and Figure 1(g) show graph after combining H-Graphs in second stages. The offspring obtained by crossover operator is depicted in Figure 1(h) after combining two graph in Figure 1(d) and Figure 1(g).
4.3 Mutation operator
This mutation operator to generate new individual based on applying small variation on the original individual. The main idea of the proposed mutation operator (PMO) is that the PMO adds an edge in the input graph G but not in the individual to create a cycle and then removes an edge in the cycle to create a new spanning tree. The details of the MPO is presented in Algorithm 3.
Figure 1: An example of crossover process applying two parents for MFEA with two tasks
4.4 Evaluation method
In this sub-section, we present a method to calculate the cost of CluMRCT solution in O(n) time complexity. The cost function of CluMRCT is the sum of connectivity cost between all pairs of vertices in the graph. The solution of CluMRCT is spanning tree, thus, traveling all vertices in O(n) time complexity and removing an edge from spanning tree will create two connected components. Given T = (E,V ) is the CluMRCT solution, the cost of two adjacency vertices on solutionT are calculated as the following formula is the number of vertices in connected component contains after remove fromT. |V 2| is the number of vertices in connected component contains after remove from is connecting cost between two adjacency vertices
The cost of CluMRCT solution T is calculated as following four- step procedure:
• Step 2: BFS algorithms start with s vertex then store the order of visiting to the visited list.
• Step 3: The cost of two adjacency veritices on solutionT are calculated as following: Travel bottom-up (leaves to root) in visited list to update the label of current vertex to l = d + 1, where d is the number of descendant vertices of current vertex. Update the label of ancestor vertex to . The cost between current vertex and its ancestor is calculated by formulal is the current vertex label, n is the number of vertices, is connecting cost between current vertex and its ancestor
• Step 4: The cost function of CluMRCT is calculated follow as: . Where, is the cost of two adjacency in CluMRCT solution T.
5.1 Problem instances
To the best of our knowledge, there is no publicly available set of benchmark for the CluMRCT problem. Therefore, we utilized a set public test instances for an equivalent clustered tree problem which is the CluSPT dataset version 3 on Mendeley . This dataset included six distinct types of instances that were generated by various algorithms  and categorized into two types conformable to their dimension. The instances were appropriate for evaluating cluster problems . However, to test the effectiveness of proposed algorithms to solving the CluMRCT instead of CluSPT, we ignore the information of the global source vertex of this CluSPT dataset. For evaluation of the proposed algorithms, instances with dimensionality from 30 to 500 were selected.
5.2 Experimental setup
To evaluate the performance of the proposed algorithm for the CluMRCT, the authors implemented the metaheuristic algorithm EA and MFEA. Then, the results of two algorithms for each instance were compared with each other in term of the best and average fitness of the optimized solutions.
Each scenario was simulated for 30 times on the computer (Intel Core i7 âĂŞ 4790, 16GB RAM), with a population size of 100 individuals evolving through 500 generations. The random mating probability is 0.5 and the mutation rate is 0.05. The source codes were installed by Java programming language.
5.3 Experimental criteria
Criteria for evaluating the quality of the result of the algorithms are presented in the following table:
In order to compare the quality of the CluMRCT solutions received from EA and MFEA, RPD is used to compute the difference between the average costs of two algorithms. The RPD is calculated by the following formula: 100% where is the average cost of a solution obtained from EA,is the average cost of a solution obtained from MFEA.
5.4 Experimental results
The experimental results show that the quality of the CluMRCT solution of MFEA algorithm is better than EA algorithm in most scenario. Particularly, MEFAâĂŹs results are better than EAâĂŹs results in all instances in Type 3, Type 4, Type 1 Large, Type 5 Large and Type 6 Large. It means that the larger instance’s size, the more MFEA tends to outperform than EA. There is a noteworthy point that with large instances, the results obtained by MFEA are always better than one obtained by EA in both average result and best-found in any Type. The comparison of results obtained by EA and results obtained by the proposed algorithm is presented in detail in the Table 1.
Table 1: The summarized results demonstrate the propor- tion of instances that MFEA outperformed EA in term of number of instances having higher best-found fitness and average fitness
The details results obtained by algorithms are presented in Table 2 – Table 5. In this table, the italic red cells on a column to show that on those instances, this algorithm outperforms than the other algorithm. The biggest RPD(MFEA, EA) is 98.59% while the biggest RPD(EA, MFEA) is 16.9.
5.5 Convergence trend
The functions in  was used for computing the normalized and averaged normalized objectives for analyzing the proposed MFEA algorithmâĂŹs convergence trends.
The convergence trend during the initial stages of the multi-tasks is depicted in Figure 2(a) for instances 10berlin52 and 10eil51 in Type 1; instances 4berlin52-2x2 and 42rat99-6x7 in Type 6. In this figure, multi-tasks (MT) converges faster than Single task (ST) at any time in the test instances of Type 1. With Type 6âĂŹs instances, ST converges faster than MT at the begin but MT has gradually surpassed ST from evaluation 100 to the end. In summary, the MT performance suggests better overall convergence characteristics than the ST performance.
With the same instances and genetic operators were used in ST and MT, the improvement can be reached because of the exploitation of multiple function landscapes via implicit genetic transfer, such as the evolutionary multitasking paradigm affords.
Refer to Figure 2(b) and Figure 2(c) for a better understanding of the improved performance because of MT. The figure depicts the convergence trends corresponding to each constitutive task. As you can see, in most instances, the convergence of MT is faster in comparison with ST as instances 10berlin52 and 10eil51 in Figure 2(b). But, some case, like 4berlin52-2x2 in Figures 2(c), the convergence
Figure 2: Comparing convergence trends of ˜in multi-tasks and serial single task for instances in Types 1, 6.
of MT was slower than the one of ST. In this case, 4berlin52-2x2 and 42rat99-6x7 are simultaneously solved, but two instances are very different structure of solution. 4berlin52-2x2 has 4 clusters and 52 vertices, meanwhile 42rat99-6x7 has 42 clusters and 99 vertices. In unified search space, the shared representation of two solutions is very small, only 4 clusters 42rat99-6x7 impact to 4berlin52-2x2. As a result, the two instances are less impacting each other in process of transferring knowledge during multitasking is the cause of ST outperform than MT.
This paper proposed a multitask optimization algorithm in the realm of the general MFEA to solve the multiple instances of CluMRCT problem together. Evolutionary operators and a new method for reducing consuming resource of evaluating the solution are also described. The proposed multitasking algorithm is tested on many datasets. Experimental results showed that the proposed MFEA outperforms its single-task variant in most test cases.
 K. K. Bali, A. Gupta, L. Feng, Y. S. Ong, and Tan Puay Siew. 2017. Linearized domain adaptation in evolutionary multitasking. In 2017 IEEE Congress on Evolutionary Computation (CEC). Institute of Electrical and Electronics Engineers (IEEE), Donostia/San Sebastian, Spain., 1295–1302.
 K. K. Bali, Y. Ong, A. Gupta, and P. S. Tan. 2019. Multifactorial Evolutionary Algorithm with Online Transfer Parameter Estimation: MFEA-II. IEEE Transactions on Evolutionary Computation (Early Access), - (march 2019), 1–1.
 Kavitesh Kumar Bali, Yew-Soon Ong, Abhishek Gupta, and Puay Siew Tan. 2019. Multifactorial Evolutionary Algorithm with Online Transfer Parameter Estimation: MFEA-II. IEEE Transactions on Evolutionary Computation (2019).
 Huynh Thi Thanh Binh, Pham Dinh Thanh, and Ta Bao Thang. 2019. New approach to solving the clustered shortest-path tree problem based on reducing the search space of evolutionary algorithm. Knowledge-Based Systems 180 (2019), 12 – 25.
 Y. Chen, J. Zhong, L. Feng, and J. Zhang. 2019. An Adaptive Archive-Based Evolutionary Framework for Many-Task Optimization. IEEE Transactions on Emerging Topics in Computational Intelligence (Early Access), - (june 2019), 1–16.
 Yen Hung Chen. 2017. The Clustered and Bottleneck Clustered Selected-Internal Steiner Tree Problems. In The Second Malta Conference in Graph Theory and Combinatorics 2017. The Second Malta Conference in Graph Theory and Combinatorics, Qawra, St Paul’s Bay, 44.
 Mattia D’Emidio, Luca Forlizzi, Daniele Frigioni, Stefano Leucci, and Guido Proietti. 2019. Hardness, approximability, and fixed-parameter tractability of the clustered shortest-path tree problem. Journal of Combinatorial Optimization 38, 1 (01 Jul 2019), 165–184.
 A. Gupta and Y. Ong. 2016. Genetic transfer or population diversification? Deciphering the secret ingredients of evolutionary multitask optimization. In
2016 IEEE Symposium Series on Computational Intelligence (SSCI). Institute of Electrical and Electronics Engineers (IEEE), Athens, Greece, 1–7.
 A. Gupta, Y. Ong, and L. Feng. 2016. Multifactorial Evolution: Toward Evolutionary Multitasking. IEEE Transactions on Evolutionary Computation 20, 3 (June 2016), 343–357.
 A. Gupta, Y. S. Ong, B. Da, L. Feng, and S. D. Handoko. 2016. Landscape synergy in evolutionary multitasking. In 2016 IEEE Congress on Evolutionary Computation (CEC). Institute of Electrical and Electronics Engineers (IEEE), Vancouver, BC, Canada, 3076–3083.
 Lei Zhou, L. Feng, Jinghui Zhong, Y. Ong, Z. Zhu, and E. Sha. 2016. Evolutionary multitasking in combinatorial search spaces: A case study in capacitated vehicle routing problem. In 2016 IEEE Symposium Series on Computational Intelligence (SSCI). Institute of Electrical and Electronics Engineers (IEEE), Athens, Greece, 1–8.
 Chen-Wan Lin and Bang Ye Wu. 2017. On the minimum routing cost clustered tree problem. Journal of Combinatorial Optimization 33, 3 (01 Apr 2017), 1106–1121.
 Adriano Masone, Maria Elena Nenni, Antonio Sforza, and Claudio Sterle. 2019. The Minimum Routing Cost Tree Problem. Soft Computing 23, 9 (01 May 2019), 2947–2957.
 Mario Mestria, Luiz Satoru Ochi, and Simone de Lima Martins. 2013. GRASP with path relinking for the symmetric Euclidean clustered traveling salesman problem. Computers and Operations Research 40, 12 (2013), 3218 – 3229.
 Dinh Thanh Pham and Thi Thanh Binh Huynh. 2015. An effective combination of genetic algorithms and the variable neighborhood search for solving travelling salesman problem. In 2015 Conference on Technologies and Applications of Artificial Intelligence (TAAI). IEEE, 142–149.
 Pham Dinh Thanh, Huynh Thi Thanh Binh, and Bui Thu Lam. 2015. New mechanism of combination crossover operators in genetic algorithm for solving the traveling salesman problem. In Knowledge and Systems Engineering. Springer, 367–379.
 P. D. Thanh, D. A. Dung, T. N. Tien, and H. T. T. Binh. 2018. An Effective Representation Scheme in Multifactorial Evolutionary Algorithm for Solving Cluster Shortest-Path Tree Problem. In 2018 IEEE Congress on Evolutionary Computation (CEC). Institute of Electrical and Electronics Engineers (IEEE), Rio de Janeiro, Brazil, 1–8.
 P. D. Thanh, H. Thi Thanh Binh, D. D. Dac, N. Binh Long, and L. M. Hai Phong. 2019. A Heuristic Based on Randomized Greedy Algorithms for the Clustered Shortest-Path Tree Problem. In 2019 IEEE Congress on Evolutionary Computation (CEC). Institute of Electrical and Electronics Engineers (IEEE), Wellington, New Zealand, 2915–2922.
 H. ThiThanh Binh, P. Dinh Thanh, T. Ba Trung, and L. Phuong Thao. 2018. Effective Multifactorial Evolutionary Algorithm for Solving the Cluster Shortest Path Tree Problem. In 2018 IEEE Congress on Evolutionary Computation (CEC). Institute of Electrical and Electronics Engineers (IEEE), Rio de Janeiro, Brazil, 1–8.
 Bang Ye Wu and Chen-Wan Lin. 2015. On the clustered Steiner tree problem. Journal of Combinatorial Optimization 30, 2 (01 Aug 2015), 370–386.
 Y. Yuan, Y. Ong, A. Gupta, P. S. Tan, and H. Xu. 2016. Evolutionary multitasking in permutation-based combinatorial optimization problems: Realization with TSP, QAP, LOP, and JSP. In 2016 IEEE Region 10 Conference (TENCON). Institute of Electrical and Electronics Engineers (IEEE), Marina Bay Sands, Singapore, 3157–3164.