A Two stage Adaptive Knowledge Transfer Evolutionary Multi-tasking Based on Population Distribution for Multi/Many-Objective Optimization

2020·arXiv

Abstract

Abstract

Multi-tasking optimization can usually achieve better performance than traditional single-tasking optimization through knowledge transfer between tasks. However, the current multi-tasking optimization algorithms suffer from some deficien-cies. On the problems sharing high similarity, the knowledge that can accelerate the convergence rate of tasks has not been fully taken advantages of. On the less similar problems, the algorithms tend to suffer from negative transfer, which may result in performance degradation. In addition, some knowledge transfer methods proposed previously do not fully consider how to deal with the situation in which the population falls into local optima. To solve these issues, this paper proposes an evolutionary multi-tasking optimization algorithm for multi/many-objective optimization with two-stage adaptive knowledge transfer based on population distribution . The resultant algorithm namely EMTPD can accelerate and improve the convergence performance of tasks based on the knowledge extracted from the probability model that reflects the search trend of the whole population. At the first transfer stage, an adaptive weight is used to adjust the step size of individual’s search, which can reduce the impact of negative transfer. At the second stage of knowledge transfer, the individual’s search range is further adjusted dynamically, which can improve the diversity of population and be benefi-cial for jumping out of local optima. Experimental results on multi-tasking multi-objective optimization test suites show that EMT-PD is superior to other six state-of-the-art evolutionary multi/single-tasking algorithms. To further investigate the effectiveness of EMT-PD on many-objective optimization problems, a multi-tasking many-objective optimization test suite is also designed in this paper. The experimental results on the new test suite also demonstrate the competitiveness of EMT-PD.

Index Terms—Multi-objective Optimization, Many-objective Optimization, Evolutionary multi-tasking, Population distribution, Knowledge transfer

I. INTRODUCTION

MULTI-OBJECTIVE optimization problems (MOPs)widely exist in real world [1–3]. Generally, an MOP can be described as follows:

where represents a D-dimensional decision vector in search space

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Z. Liang, W. Liang, X. Xu, X. Ma, L. Liu and Z. Zhu are with the College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China (e-mail: liangzp@szu.edu.cn; liangwq0131@foxmail.com; 377611694@qq.com; maxiaoliang@yeah.net; liulingcs@szu.edu.cn; zhuzx@szu.edu.cn).

denotes the objective function vectors with n objectives. An MOP is also referred to as a many-objective optimization problem (MaOP) if n > 3 [4–6]. A solution x is said to dominate another solution y, if and only if for all and there exists one objective such that . A solution not dominated by any other solutions is called a Pareto optimal solution. All Pareto optimal solutions form the Pareto optimal set (PS) of which the corresponding projection in the objective space is called the Pareto front (PF).

MOPs and MaOPs can be effectively solved by multi/many-objective evolutionary algorithms [7–20]. The majority of the existing multi/many-objective evolutionary algorithms are designed to handle one problem at a time. To solve a new problem, the algorithms must start from scratch. However, many optimization problems encountered in real-world are correlated with each other. The experience of solving one problem can be beneficial to the solving of another related problem. In this line and inspired by the capability of human brain to process transactions in parallel, Gupta et al. [21] proposed a new optimization paradigm namely evolutionary multi-tasking (EMT) to deal with multiple optimization tasks simultaneously. Compared to the single-tasking counterpart algorithms, EMT has been shown to be able to considerably improve the overall performance in solving multiple related optimization tasks via knowledge transfer.

Following [21], a number of multi-objective EMT algorithms have been proposed in the literature to solve various MOPs. For example, Gupta et al. extended [21] and proposed a multi-objective EMT algorithm (MOMFEA) [22] to deal with multiple MOPs simultaneously through assortative mating and vertical cultural transmission. Yang et al. [23] presented a two-stage assortative mating method for EMT with decision variables classification, i.e., assortative mating is carried out on different variable groups with different parameters to balance the diversity and convergence. Feng et al. [24] proposed an EMT algorithm with explicit genetic transfer (EMT-EGT) to enhance the search ability of the evolution population. Chen et al. [25] introduced a memetic EMT framework for knowledge transfer between subpopulations. Tuan et al. [26] put forward an EMT algorithm employing local search strategy to accelerate the convergence of population. EMT has also achieved successes in real-world applications, e.g., permutation-based combinatorial optimization problems [27], branch testing problems in software engineering [28], modular knowledge representation in neural networks [29], symbolic regression problems [30], multi-objective pollution-routing

problems [31], and hyperspectral image unmixing [32].

The research on EMT algorithms has made remarkably progress, yet there remain room for further improvement and open questions to be addressed. Particularly, on high similar problems, the knowledge of high quality solutions has not bee fully used to improve the convergence of the population. On problems with low similarity, the population distributions of the tasks are generally different, which tends to result in negative transfer between tasks [33]. Moreover, the existing knowledge transfer methods also do not take good care of the situation where the population falls into local optima.

To address the aforementioned issues, this paper proposes a new multi-objective EMT algorithm namely EMT-PD with two-stage adaptive knowledge transfer based on population distribution. EMT-PD firstly builds probability models for each task, and then obtains knowledge from the product of different probability models. The knowledge can help to accelerate the convergence rate of population. At the first stage of knowledge transfer, the step size of individual’s search is adjusted by adaptive weight, which can reduce the probability of generating negative transfer. At the second stage of knowledge transfer, the search range of individual is further adjusted dynamically, which can promote the diversity of population and avoid getting trapped in local optima. EMT-PD is tested on multi-tasking multi/many-objective optimization test suites. The comparison studies with six state-of-the-art evolutionary multi/single-tasking algorithms demonstrate the competitiveness of EMT-PD. The main contributions of this paper are highlighted as follows:

1) A multi-tasking multi/many-objective evolutionary optimization algorithm with novel method of extracting and transferring knowledge is proposed to improve the efficiency and performance of optimization.

2) A multi-tasking many-objective optimization test suite is designed based on a representative many-objectives test suite MaF [34].

3) Based on three test suites, EMT-PD is fully analyzed by comparing with state-of-the-art algorithms.

The rest of this paper is organized as follows. Section II introduces the related work and motivation of EMT-PD. Section III describes the details of EMT-PD. Section IV presents the experimental design. Section V demonstrates the performance of EMT-PD with different probability models. In the end, Section VI concludes this work and discusses some potential future directions.

II. RELATED WORK AND MOTIVATION

This section briefly reviews the related knowledge extract and transfer methods and explains the motivation of the proposed method.

A. Extract and Transfer Knowledge in EMT

In EMT algorithms, knowledge can be extracted from a single or multiple individuals and transferred to other individuals to facilitate their search. Particularly, knowledge transfer methods based on single individual (KTS) refer to extracting knowledge from a single individual of one task, and transferring knowledge to other tasks. EMT algorithms with KTS include MFEA [21], M-BLEA [35], LDA-MFEA [36], S&MMFEA [37], MO-MFEA [22], GMFEA [38], TMO-MFEA [23], MTO-DRA [39], MFEA-II [40], MFEA-GHS [41], and MFGP [42]. All the above algorithms transfer knowledge through assortative mating and vertical cultural transmission [43]. In assortative mating, two individuals are selected from the population randomly, and then generate offspring by simulated binary crossover (SBX) and polynomial mutation. In vertical cultural transmission, each offspring is randomly assigned to a task. In KTS, each individual provides different knowledge for the tasks, and the diversity of population is maintained effectively. However, EMT algorithms with KTS cannot fully utilize the knowledge of high quality solutions to accelerate the convergence rate of the population due to the randomness of knowledge transfer.

Knowledge transfer methods based on multiple individuals (KTM) refer to extracting knowledge from multiple individuals of one task, and transferring knowledge to other tasks. EMT algorithms with KTM can be implemented based on particle swarm optimization (PSO). For example, Feng et al. [44] proposed a multi-tasking PSO algorithm. The convergence of population is accelerated by the guidance of optimal solutions of multiple tasks. Tang et al. [45] proposed an adaptive multi-tasking PSO algorithm by introducing a selfadaption strategy to adjust the inter-task knowledge transfer probability, which reduces the probability of negative transfer effectively. Song et al. [46] proposed a multi-tasking multiswarm optimization algorithm. The quality of the solutions is improved by crossover between optimal individuals of all tasks.

Differential evolution (DE) is another popular platform to construct EMT with KTM. For instance, Liu et al. [47] proposed SaM-MA, with three different mechanisms employed, i.e., DE algorithm, predicting optimal solution via surrogate model, and local searching strategy. The mixed use of these three mechanisms can balance diversity and convergence of population. Zhou et al. [48] proposed a new mutation strategy called DE/best/1, with gradually increased weight of knowledge transfer in the process of evolution, which improves the diversity of population.

Explicit knowledge transfer represents another form of KTM in EMT algorithms. Feng et al. [24] first proposed an EMT algorithm EMT-EGT with explicit knowledge transfer, which allows the incorporation of multiple search mechanisms with different biases. It is able to improve the search ability of population. Shang et al. [49] proposed a credit assignment approach, which selects proper individuals for explicit knowledge transfer. The efficiency of knowledge transfer of this method was shown to be improved.

KTM is beneficial to improve the convergence of population, but it also has the probability of extracting knowledge from inferior individuals. Besides, the search of entire population at one generation is guided by the same solutions in some KTMs, which increase the probability of population falling into local optima.

Fig. 1: The illustration of knowledge transfer and search direction, where x1 and x2 are the two dimensions of the decision variables. Triangles and squares represent the high quality solutions and inferior solutions, respectively. The solutions of task1 and task2 are distinguished with hollow and solid faces, respectively. p1 represents an solution selected from task1. The global optima are supposed to be located in the ellipse areas. (a) and (b) are KTS and KTM in high similarity scenario, respectively. (c) and (d) are KTS and KTM in low similarity scenario, respectively.

B. Motivation of Two-stage Adaptive Knowledge Transfer based on Population Distribution

In this subsection, we take an example of two-tasking problem as shown in Fig. 1 to explain the motivation of the proposed with two-stage adaptive knowledge transfer based on population distribution.

For high similarity problems, knowledge extracted from the high quality solutions of one task can accelerate the convergence of another task effectively [38]. However, for both KTS and KTM, knowledge may be extracted from inferior individuals. Fig. 1(a) shows an example of KTS in high similarity scenario. In the search space, high quality solutions of task1 and task2 converge in the same area, p1 is a solution of task1, p2 is a high quality solution of task2, and p3 is an inferior solution of task2. If knowledge is transferred from p2 to p1, then p1 will be guided to search the centralized area of high quality solutions. However, KTS may transfer knowledge from inferior solution p3 to p1, which would slow down the convergence of task1. Fig. 1(b) shows an example of KTM in high similarity scenario where a solution is guided by two other solutions. p1 is a solution of task1. p2 and p3 are two high quality solutions of task2, respectively. p4 is a inferior solution of task2. If knowledge is transferred from p2 and p3 to p1, p1 will search the centralized area of high quality solutions. However, KTM may also transfer the knowledge of inferior solution p4 to p1, if p1 learns from p3 and p4, which would cause p1 to deviate from the ideal search direction.

For low similarity problems, the population distribution of task1 and task2 is very different [50]. Whether in KTS or KTM, population of different tasks are difficult to guide each other to search with ideal search direction. There would be a high probability of generating negative transfer between those tasks. Fig. 1(c) shows an example of KTS in low similarity scenario, high quality solutions of different tasks are distributed in different areas. p1 represents an individual of task1. p2 is a high quality solution of task2. When knowledge is transferred from p2 to p1, p1 will deviate severely from the convergence area of high quality solutions of task1. Fig. 1(d) shows an example of KTM in low similarity scenario, which is similar to the situation of KTS. Moreover, in KTMs implemented with PSO or DE, the search of the entire population at one generation is only guided by the best solutions of the tasks, which easily leads the population to local optima.

To address the above issues, this paper presents a novel EMT algorithm with two-stage adaptive knowledge transfer based on population distribution. Specifically, the probability models are firstly built for the population of each task, and the knowledge used for transfer is extracted from the maximum point of the probability models’ product. Note that the population distribution rather than individual(s) is used as the knowledge source to relieve the impact of inferior individuals. Afterward, at the first stage of knowledge transfer, the step size of individual’s search is adaptively adjusted to reduce the impact of negative transfer. At the second stage of knowledge transfer, the search range of individual is adjusted again dynamically, which can increase the diversity of population and reduce its probability of falling into local optimum.

Multi-objective optimization algorithms have achieved in many practical applications [1–3]. However, the potential synergies between distinct optimization problems has not been fully explored in traditional multi-objective optimization algorithms. In this section, we propose a multi-tasking algorithm with two-stage knowledge transfer base on population distribution to exploit the potential synergies between problems. In this section, we first introduce the main framework of EMT-PD. Then the process of building probability model and extracting knowledge is proposed. After that, the details of two-stage adaptive knowledge transfer are explained. Finally, computational complexity analysis of EMT-PD is discussed.

A. The Main Framework of EMT-PD

The main framework of EMT-PD is summarized in Algorithm 1. Firstly, the population P is initialized, and divided into two sub-populations and for different tasks. At each evolutionary generation, a probability model is built for each task by Algorithm 2 in line 4. In line 5, the maximum point mp of the product of probability models is obtained by Eq. (6), which is used to guide the search of population. The two-stage adaptive knowledge transfer and offspring generation are carried out by Algorithm 3 in line 6. It is worth noting that EMT-PD is a general algorithm, which supports various types of probability model.

B. Build Probability Model

In this paper, the Maximum Likelihood Estimation (MLE) is used to estimate the parameters of probability model according to population, and the maximum point of probability model is obtained.

Let M denote the probability models of population. is the parameter of M in j-th dimension of decision variable. The MLE is used to find the parameter value , which maximizes the log-likelihood . Firstly, is calculated as follows:

where represents the j-th variance of i-th individual, i = 1, 2, ..., N. N denotes the population size. The type of probability model is R. ln denotes the natural logarithm.

Then, is calculated as follows:

Based on above calculations, the probability model of j- th decision variable be obtained, that is , which also can be labeled as follows:

The maximum point m of probability model M learned from population reflects the centralization of population in each generation of evolution [51]. The formula of calculating is as follows:

Algorithm 2 shows the process of building probability models by MLE. For each dimension of decision variable, the log-likelihood is calculated by Eq. (2) in line 2. Then the parameters of probability model are calculated by Eq. (3) in line 3. After that, the maximum point of is obtained according to Eq. (5) in line 4.

Fig. 2: The maximum point of probability models product reflects the common centralization of two populations.

Finally, the probability model M and the maximum point m of M is obtained, where , , and D is the dimension of decision variable.

C. Extract Knowledge from Population Distribution

The maximum point of probability models product reflects the common centralization of two populations [51]. In our algorithm, the maximum point of probability models product is used to guide the search of each task. It is helpful to accelerate convergence rate and decrease the probability of falling into local optimum.

According to Algorithm 2, the two probability models and can be learned from population of task1 and task2, respectively, where and . The maximum

Fig. 3: An example of two-stage knowledge transfer in the 2-Dimensions unified searching space. xare the two dimensions of decision variable.

point of the product of and can be calculated as follows:

As shown in Fig. 2. The green line is the population distribution of task1. The blue line is the population distribution of task2. and are the maximum point of task1 and task2 respectively. The red line represents the product of population distributions of task1 and task2, which reflects the common search trend of task1 and task2. The maximum point mp of red line reflects the common centralization of populations of task1 and task2, and mp is used to guide the search of each population.

D. Two-stage Knowledge Transfer and Offspring Generation

For high similarity problems, populations of task1 and task2 will converge to the approximate area, and the convergence rate of task1 and task2 can be accelerated by mp directly. But for low similarity problems, populations of task1 and task2 will converge to different areas. If mp is directly used to guide the search of population, there may be a lot of negative transfers. In addition, it is necessary to balance the convergence and the diversity of populations no matter for high or low similarity problems. To solve those issues, we propose a two-stage adaptive knowledge transfer based on population distribution. At the first stage, the step size of individual’s search is adjusted adaptively to reduce the impact of negative transfer. At the second stage, the search range of individual is further based on an intermediate individual, increases the diversity of population and helps to jumping out of local optimum.

Fig. 3 displays the process of two-stage knowledge transfer, where p is a individual selected from population of certain task. mp is the maximum point of product of probability models. At the first stage of knowledge transfer, the knowledge is transferred from mp to p. The search of p is guided by mp, and an intermediate individual is generated by the first stage knowledge transfer. At the second stage, the search range of the intermediate individual is adjusted again dynamically to generate offspring c.

Algorithm 3 is the pseudo code of the two-stage adaptive knowledge transfer and offspring generation for one task. Firstly, the Euclidean distance between mp and the maximum point m of the task’s probability model is calculated in line 1. Then for each individual p, the Euclidean distance between p and m is calculated in line 3. and are calculated as follows:

where tr() denotes the trace operation of a matrix. In line 4, the intermediate individual is generated via the first stage of knowledge transfer. is calculated is as follows:

where w is the adaptive weight of knowledge transfer. w is defined as follows:

As m reflects the centralization of population which p belongs to, if mp is close to m, it means that the population distribution of the two tasks are similar, and the knowledge from mp can effectively guide the search of p. Therefore, it is suitable to increase the wight of knowledge transfer w when mp is close to m. In Formula 10, is the distance between mp and m, the smaller is, the larger w is. On the contrary, if m is far away from mp, the difference of the population distribution of two tasks will become large, and the knowledge from mp may leads negative transfer between tasks. It is necessary to decrease the wight w of knowledge transfer in the situation. In Formula 10, the larger is, the smaller w is. In addition, is used to measure the distance between individual p and the centralization of population. If is small, it means that p is closer to the centralization of population. The weight w needs to be decreased to reduce transfer distance of p. In Formula 10, the smaller is, the smaller w is. On the contrary, if is large, it means p is far away from the centralization of population. The weight w needs to be increased to enlarge transfer distance of p. In Formula 10, w increases with .

In line 5, the offsping c is generated via the second stage of knowledge transfer as follows:

where is an intermediate individual generated by first stage knowledge transfer, and v is a search vector defined as follows:

where D is the dimension of decision variable, Q is the D-dimensional Gaussian White Noise, and F is the scaling factor. The Gaussian white noise Q adopted in this paper follows a standard Gaussian distribution, and its mean and variance are 0 and 1, respectively. Parameter F is used to adjust the order of magnitude of white noise. White noise is introduced to increase population diversity.

The reason to introduce the second stage knowledge transfer is to better balance the convergence and the diversity of populations since all individuals are searched in the same direction guided by mp at the first stage. When is small, the knowledge transfer at the first stage may be effective due to the high similarity between the two tasks. It is not necessary to substantially adjust the range of searching at the second stage in Eq. (12). On the contrary, if d1 is large, it means the low similarity between two tasks. The adjustment of search range at the second stage should be reasonably increased. As for , if it is small, the individual p is close to the convergence area. It is also not necessary to substantially adjust again at the second stage. If is large, the adjustment of search range at the second stage also should be reasonably increased. Polynomial mutation is performed in line 6. Offspring is combined into population C in line 7.

E. Computational Complexity Analysis

In this section, the computational complexity of EMT-PD within a generation is discussed. EMT-PD mainly consists of three parts: 1) building probability model; 2) two-stage knowledge transfer and offspring generation; and 3) environmental selection. The maximum likelihood estimation is used to build probability model, which requires computational complexity of O(DN). In two-stage knowledge transfer and offspring generation, a computational complexity of O(DN) is needed. In environmental selection, computational complexity of non-dominated sorting and crowding distance are and O(nN log(N)), respectively. To sum up, the computational complexity of EMT-PD is , where D and N represent the dimension of decision variable and population size respectively, and n is the number of objectives.

The performance of EMT-PD is firstly evaluated and compared with several state-of-the-art optimization algorithms on multi-tasking multi-objective test suites. Then a novel multi-tasking many-objective test suite is proposed, and the performance of EMT-PD on it is also evaluated and compared with state-of-the-art optimization algorithms. Due to space limitation, the experiment of convergence and running time are put into the Setion III and IV of the supplementary materials.

A. Experiment and Analysis on Multi-objective Problems

1) Test Problems and Compared Algorithms: To assess the performance of EMT-PD, two multi-objective test suites are applied.

The classical multi-tasking multi-objective test suite MTMOPs [52], which can be split into three groups according to the degree of intersection, i.e., complete intersection (CI), partial intersection (PI), and no intersection (NI). Moreover, these groups can be further partitioned into high similarity (HS), medium similarity (MS), and low similarity (LS). Therefore, there are nine problems, namely, CIHS, CIMS, CILS, PIHS, PIMS, PILS, NIHS, NIMS, and NILS.

The complex multi-tasking multi-objective optimization test suite CEC2019-CMO was proposed in IEEE CEC2019 competition on evolutionary multi-tasking optimization. CEC2019-CMO contains ten multi-tasking multi-objective problems. The details of CEC2019-CMO are summarized in [53].

Six state-of-the-art algorithms are used in comparision with EMT-PD, including three multi-objective EMT algorithms, i.e., MO-MFEA [22], TMO-MFEA [23], EMT-EGT [24], and three representative multi-objective evolutionary algorithms, i.e., NSGA-II [17], AR-MOEA [9] and CMOPSO [54]. NSGA-II is a basic multi-objective evolutionary algorithm serving as the baseline here. AR-MOEA is an indicator-based multi-objective optimization algorithms with an enhanced inverted generational distance indicator. CMOPSO is a multi-objective particle swarm optimization algorithm with competitive mechanism.

2) Performance Metric: The Inverted Generational Distance (IGD) [55] is widely used to evaluate the performance of multi-objective algorithms. IGD is calculated as follows:

TABLE I: COMMON PARAMETER SETTINGS OF ALGORITHMS

where d(z, A) refers to the Euclidean distance between a reference point z and a solution A in the objective space, and represents a predefined set of reference points on the PF. The smaller IGD is, the better the convergence and diversity of population is.

3) Parameter Settings: In EMT-PD, the type of probability model R is set to Gaussian probability model, and the scale factor F is set to 0.01. The parameters of TMO-MFEA are set according to [23], i.e., rmp is set to 1 for the diversity variable and 0.3 for the convergence variable. According to the description of EMT-EGT in [24], SPEA2 is used for one task and NSGA-II is used for the other task. The interval of explicit transmission is set to 10. The size of the elite population of CMOPSO is set to 10 according to [54]. Common parameter settings of algorithms are summarized in Table I.

It is worth noting that F is set to a lower weight. There are two main reasons. The first is that the scale of decision variable space is between 0 and 1 after normalization. If a relatively high noise is chosen, it may cause the individual to exceed the search range of decision variable. The second is that the individual which experienced the first knowledge transfer has moved towards the centralized area of high quality solution. If a relatively large disturbance is introduced, it easily drives individual jumping out of the centralized area of high quality solutions, leading to the individual quality deterioration. The sensitivity experiment results of parameter F can refer to Section II of the supplementary materials.

4) Results and Analysis on MTMOPs: Table II shows the experimental results on MTMOPs. The best result is highlighted. The Wilcoxon rank sum test is performed at a significance level of 5%. Symbols ’+’,’’,’’ denote that the result is significantly better, significantly worse, or comparable with that of EMT-PD, respectively. Next, the experimental results are analyzed from the degree of similarity.

EMT algorithms can work well on high similarity problems CI+HS, PI+HS and NI+HS. The overall performance of EMT algorithms is better than single-tasking algorithms. These results confirm that knowledge transfer across tasks in EMT is capable of accelerating convergence and finding better solutions.

For medium similarity problems CI+MS, PI+MS and NI+MS, EMT algorithms except for EMT-PD are not very competitive. CI+MS, PI+MS and NI+MS are comprised of unimodal and multimodal functions. The unimodal function only has a single global optimal solution. With the development of evolution, the population of unimodal function will gradually converge to a small area of search space. On the contrary, multimodal function has multiple global optimal solutions or local optimal solutions, the population of it will gradually converge to multiple different areas of search space. Since populations’ convergence areas of unimodal and multimodal functions are quite different, the probability of negative transfers is high. MO-MFEA, TMO-MFEA and EMTEGT cannot effectively handle those kinds of negative transfer, which makes their performance degradation. However, EMTPD can reduce the probability of generating negative transfer and increase the diversity of population through the two-stage adaptive knowledge transfer, which makes a good performance of EMT-PD on medium similarity problems.

EMT algorithms are still competitive on CI+LS and PI+LS, although they belong to low similarity problems. Because the global optima of two tasks on CI+LS and PI+LS have identical variables in the unified search space, those two tasks can still provide useful knowledge to each other on identical variables. On the contrary, for NI+LS, all global optima variables of two tasks are different. These tasks can only provide less useful knowledge to each other, resulting in the poor performances of EMT algorithms proposed previously. EMT-PD still shows obvious competitiveness on NI+LS owing to the adaptively adjusted step size of search at the first stage of knowledge transfer.

5) Results and Analysis on CEC2019-CMO: Table III shows the experimental results on CEC2019-CMO. The overall performance of EMT-PD is better than other algorithms. CMO2, CMO4 and CMO10 are composed of functions with the same PS. Therefore, the diversity and convergence of population can be maintained well via knowledge transfer between tasks. Compared with single-task algorithms, the performance

Fig. 4: The average performance score of all algorithms on MTMOPs and CEC2019-CMO. The values of EMT-PD are connected by a solid line to assess the score more easily.

of EMT algorithms is excellent. For CMO1, CMO3 and CMO6, objective functions of two tasks are very similar, which means the two tasks have similar properties. EMT algorithms can effectively optimize this kind of problems. For CMO7, CMO8 and CMO9 with different PS and objective functions, the performances of EMT algorithms proposed previously are not significant. However, based on adaptive step size of searching at the first stage of knowledge transfer, EMT-PD can effectively reduce the probability of negative transfer. The PFs of two tasks in CMO5 are very complex. To achieve better performance, the diversity of population needs to be carefully maintained in the optimization process. EMT-EGT provides multiple searching mechanisms for the population, which is conducive to maintain the diversity of population and hence obtain best performance on CMO5.

Fig. 4 shows the average performance score of all algorithms on MTMOPs and CEC2019-CMO. The calculation method of average performance score can be found in [56]. The lower average score is, the better performance of the IGD algorithm gets. EMT-PD has achieved the best results on MTMOPs and CEC2019-CMO, which demonstrates that the proposed knowledge extract and transfer method can effectively improve the performance of algorithm.

B. Experiment and Analysis on Many-objective Problems

1) Test Problems and Compared Algorithms: In order to further validate the performance of EMT-PD on MaOPs, we design a novel multi-tasking many-objective test suite based on the MaF test suite [34], labelled as MTMaOPs. The proposed MTMaOPs includes six problems, as shown in Table IV. MaF-HS1 is composed of MaF3 and MaF4. MaF-HS2 is composed of MaF4 and MaF6. MaF-MS1 is composed of MaF1 and MaF5*, where MaF5* is a shifted MaF5. The shifted individual is , where and the original individual . MaF-MS2 is composed of MaF5 and MaF6*, where MaF6* uses the same shift operation as MaF5*. MaF-LS1 is composed of MaF4 and MaF5. MaF-LS2 is composed of MaF3 and MaF6. The similarity denoted by sim of the problem is calculated as follows[52]:

TABLE II: AVERAGED IGD VALUE OBTAINED BY EMT-PD, MO-MFEA, TMO-MFEA, EMT-EGT, NSGA-II, ARMOEA AND CMOPSO ON OVER 30 INDEPENDENT RUNS ON MTMOPs

TABLE III: AVERAGED IGD VALUE OBTAINED BY EMT-PD, MO-MFEA, TMO-MFEA, EMT-EGT, NSGA-II, ARMOEA AND CMOPSO OVER 30 INDEPENDENT RUNS ON CEC2019-CMO

where and and is the rank of the k-th solution in population with respect to the two tasks. To calculate the value of sim, 1,000,000 solutions are randomly generated, thats means K =1,000,000.

The sim lying in (0, 1/3], (1/3, 2/3], (2/3, 1] is regarded as low, medium, and high similarity, respectively. Accord-

ing to the degree of similarity, MTMaOPs is divided into high similarity problems: MaF-HS1 and MaF-HS2, medium similarity problems MaF-MS1 and MaF-MS2, low similarity problems MaF-LS1 and MaF-LS2. In addition, each problem of MTMaOPs is set to contain three instances of different objective dimensions, that is, n = 10, n = 20, and n = 30.

Six state-of-the-art algorithms are used in comparision with EMT-PD on MTMaOPs, including three EMT algorithms, i.e., MO-MFEA [22], TMO-MFEA [23], EMT-EGT [24], and three many-objective optimization algorithms, i.e., NSGA-III [4],

TABLE IV: SUMMARY OF THE PROPOSED MTMaOPs

VaEA [57] and DDEANS [58]. NSGA-III is a basic many-objective evolutionary algorithms serving as the baseline here. VaEA is characterized by novel selection strategies. DDEANS is characterized by novel dynamical decomposition strategy.

2) Performance Metric: The Modified Inverted Generational Distance (IGD) [59] is adopted in this paper as the performance evaluation measure on MTMaOPs. IGDconsiders the Pareto dominance relation between reference vector and solution. It can accurately evaluate the performance of many-objective optimization algorithms. IGDis calculated as follows:

(15) where represents a reference vector. represents a solution set, n is the dimension of objectives, and denotes a predefined set of reference points on the PF. In this experiment, the number of reference points for calculating IGDis set to 100,000. The smaller value of IGDis, the better convergence and diversity of population gets.

TABLE V: PARAMETERS SETTING FOR DIFFERENT OBJEC- TIVE DIMENSION ON MTaMOPs

TABLE VI: PARAMETER SETTINGS FOR CROSSOVER AND MUTATION ON MTaMOPs

3) Parameter Settings: Because decomposition-based algorithms NSGA-III and VaEA are limited by reference vector, the population size of NSGA-III and VaEA can not be set optionally. For fair comparison, the same setting of population size are employed for all algorithms. The setting of population size can refer to [60]. The details of parameters for different objective dimension are summarized in Table V, where n indicates the number of objectives, H is the number of divisions considered along each objective coordinate, N represent the size of population, G is the maximal iteration, and FEs is maximal function evaluations. The common parameters of crossover and mutation are summarized in Table VI.

4) Results and Analysis on MTMaOPs: Table VII lists the statistical results on MTMaOPs. Except for MaF-HS2 and MaF-MS1, EMT-PD performs better than other algorithms. This result shows that the knowledge extract and transfer of EMT-PD is efficient on many-objective optimization problems. The results of MaF-HS2 and MaF-MS1 are analyzed in details as the following.

MaF-HS2 is a high similarity problem and consists of MaF4 and MaF6. The property of high similarity can promote the cooperation between two tasks, and improve the diversity and convergence of population for EMT algorithms. It makes all EMT algorithms have obvious competitiveness on MaF-HS2 compared with single-tasking many-objective algorithms. MaF4 is a badly scaled many-objective function. The traditional non-dominated sorting employed by EMT-PD cannot normalize the value of objective function, which leads to the fact that the population of MaF4 tends to converge to one side of PF, and cannot guide the search of population toward to convergence area effectively. It makes the poor performance of EMT-PD compared with EMT-EGT and TMO-MFEA.

MaF-MS1 is a medium similarity problem and is composed of MaF1 and MaF5. MaF1 is a linear problem, with its Pareto optimal solutions mainly concentrate on a very small area of searching space. In the early stage of evolution, only a few individuals can accurately reflect the centralization of population for MaF1. VaEA adopts maximum-vector-angle-first principle in environmental selection to ensure the diversity of population. DDEANS can balance dynamically the diversity and convergence of population according to the Euclidean distance between reference vector and population. The results on MaF-MS1 of VaEA and DDEANS are competitive just owing to the diversity mechanism of population effectively. TMO-MFEA uses different crossover parameters for different variables and EMT-EGT provides a variety of search mechanisms to the population. The diversity mechanism of TMOMFEA and EMT-EGT are also superior to EMT-PD, which brings better performance of TMO-MFEA and EMT-EGT than EMT-PD on MaF-MS1.

Fig. 5 shows the average performance score of all algorithms on MTMaOPs. The smaller the score is, the better IGDthe algorithm gets. It can be seen that single-tasking many-objective optimization algorithms achieves better performance than EMT algorithms on most of the test problems, which illustrates that EMT algorithm proposed previously cannot effectively deal with MaOPs. EMT-PD is still competitive on MaOPs, because the search direction of each variable is

TABLE VII: AVERAGED IGD+ VALUE OBTAINED BY EMT-PD, MO-MFEA, TMO-MFEA, EMT-EGT, NSGA-III, VaEA AND DDEANS OVER 30 INDEPENDENT RUNS ON MTMaOPs

Fig. 5: The average performance score of all algorithms on MT- MaOPs. The values of EMT-PD are connected by a solid line to assess the score more easily.

guided by population distribution, which makes the knowledge transfer is more accurate. Fig. 6 shows the average performance score of all algorithms

Fig. 6: The average performance score of all algorithms over 10, 20 and 30 objective dimensions of MTMaOPs.

over 10, 20 and 30 objective dimensions for MTMaOPs in term of IGD. It can be seen that with the increase of

Fig. 7: Average fitting error of EMT-PD with different probability models on some problems.

objective dimension, the average performance score of EMTPD becomes lower, which indicates that EMT-PD has more advantages in handling optimization problems with larger objective dimensions.

V. THE PERFORMANCE OF EMT-PD WITH DIFFERENT PROBABILITY MODELS

EMT-PD supports different probability models to fit population distribution. This section discusses the performance of EMT-PD with different probability models.

A. The Error of Fitting Population Distribution with Different Probability Models

Fig. 7 shows the average error of fitting population distribution with different probability models on CI+MS, PI+MS, 2019CEC-CMO5, 2019CEC-CMO9, MaF-HS2 and MaF-LS2. The error in each generation is calculated as follows[61]:

where represents the j-th dimension of i-th individuals, represents the probability model of j-th dimension calculated by Algorithm 2 at the g-th generation. g = 1, 2, ..., G and G represents the maximum number of iterations. N is the population size, and D is the dimension of decision variable. The average error in each generation is calculated as follows:

It can be seen from Fig. 7 that the average error of EMT-PD with Gaussian probability model is very small, especially for CI+MS, PI+MS, CEC2019-CMO9, MAF-HS2 and MAF-LS2. For CEC2019-CMO5, the average error of Exponential probability model, Gamma probability model and Beta probability model are also small. In summary, Gaussian probability model is the most versatile model for fitting all kinds of test suites, and other probability models may also be suitable for specific problems. In this paper, it is worth noting that the base of the logarithm in all experimental results is natural number e.

B. IGD and IGDof EMT-PD with Different Probability Models

Fig. 8 (a)-(h) shows the average IGD of EMT-PD with different probability models on 30 independent running on PI+MS, NI+LS, 2019CEC-CMO5 and 2019CEC-CMO8. Fig. 8 (i)-(l) shows the average IGDof EMT-PD with different probability models on 30 independent running on MAF-HS2 and MAF-MS1. It can be seen that EMT-PD with Gaussian probability model has better performance than EMT-PD with other probability models. EMT-PD with exponential model and Beta model shows poor performance on NI+LS, MAF-HS2 and MAF-MS1, which indicates that the universality of exponential model and Beta model are weak. It is important to note that the convergence speed of EMT-PD with Gamma model is faster than with Gaussian model on NI+LS in Fig. 8(d), which means that a strategy with adaptive selecting probability model is more suitable for the improvement of performance. The performance results on other problems with different probability models can refer to Section V of the supplementary materials.

VI. EFFECTS OF KNOWLEDGE TRANSFER FROM DIFFERENT INDIVIDUALS

To echo the research motivations and further demonstrate the effectiveness of knowledge transfer from different individuals, this section discusses five variants of EMT-PD.

EMT-SR is the variant of EMT-PD, where the knowledge is randomly extracted from one individual of population. EMT-MR is the variant of EMT-PD, where the knowledge is randomly extracted from three individuals of population. EMT-SH is the variant of EMT-PD, where the knowledge is extracted from the optimal solution of population. EMTMH is the variant of EMT-PD, where the knowledge is extracted from three optimal solutions of population. The optimal solution is the individual with the smallest Euclidean distance from PF. EMT-PD-1 is the variant of EMT-PD, which only has the first stage knowledge transfer. The comparative experiments are performed on five high similarity problems CI+HS, PI+HS, NI+HS, MaF-HS1, MaF-HS2, and five low similarity problems CI+LS, PI+LS, NI+LS, MaF-LS1, MaF-LS2. The experimental results can refer to Section I of the supplementary materials.

Fig. 8: IGDperformance of EMT-PD with different probability models on some problems.

According to experimental results, it can be seen that the variants EMT-SR and EMT-MR extracting knowledge from one or more random individuals cannot converge effectively, and the variants EMT-SH and EMT-MH extracting knowledge from one or more optimal solutions cannot converge steadily. The performance of variant EMT-PD-1 is not good due to the worse diversity of population and the disadvantage of easily falling into local optimum. EMT-PD has the best performance than its variants on both high similarity problems and low similarity problems, which clarifies the motivation of this paper.

This paper proposes a new multi-objective EMT algorithm with two-stage adaptive knowledge transfer based on population distribution. The knowledge extracted from probability model can effectively guide the search of population for convergence. The first stage of knowledge transfer is characterized by a novel adaptive weight, which can effectively reduce the probability of generating negative transfer. At the second stage of knowledge transfer, the search range of individuals is adjusted dynamically again to balance the diversity and the convergence of population and to help jumping across local optimum. At the same time, in order to further study the performance of EMT-PD on MaOPs, a novel test suite MTMaOPs based on MaF test suite is proposed. EMT-PD is compared with state-of-the-art algorithms on MTMOPs, CEC2019-CMO and MTMaOPs. The experimental results show that EMT-PD is competitive.

Although EMT-PD has shown superior performance on various test suites, there are still some further works worth doing. For example, all kinds of real-world problems are very different and complex. Gaussian probability model may not be the best model to fit specific real-world problem. It is valuable to propose an adaptive selection strategy of probability model according to the characteristics of problem. In addition, many-tasking optimization is also an interesting field worth investigation. We will try to expand EMT-PD to many-tasking optimization in the future.

REFERENCES

[1] J. H. Wang, Y. Zhou, Y. Wang, J. Zhang, C. L. P. Chen, and Z. B. Zheng, “Multiobjective vehicle routing problems with simultaneous delivery and pickup and time windows: Formulation, instances, and algorithms,” IEEE Transactions on Cybernetics, vol. 46, no. 3, pp. 582–594, March 2016.

[2] F. Sarro, F. Ferrucci, M. Harman, A. Manna, and J. Ren, “Adaptive multi-objective evolutionary algorithms for overtime planning in software projects,” IEEE Transactions on Software Engineering, vol. 43, no. 10, pp. 898–917, Oct 2017.

[3] L. Z. Cui, C. Xu, S. Yang, J. Z. Huang, J. Q. Li, X. Z. Wang, Z. Ming, and N. Lu, “Joint optimization of energy consumption and latency in mobile edge computing for internet of things,” IEEE Internet of Things Journal, vol. 6, no. 3, pp. 4791–4803, June 2019.

[4] K. Deb and H. Jain, “An evolutionary many-objective opti- mization algorithm using reference-point-based nondominated sorting approach, part i: Solving problems with box constraints,” IEEE Transactions on Evolutionary Computation, vol. 18, no. 4, pp. 577–601, Aug 2014.

[5] B. D. Li, J. L. Li, K. Tang, and X. Yao, “Many-objective evolutionary algorithms: A survey,” ACM Computing Surveys, vol. 48, no. 1, Sep 2015.

[6] M. Li, S. Yang, and X. Liu, “Diversity comparison of pareto front approximations in many-objective optimization,” IEEE Transactions on Cybernetics, vol. 44, no. 12, pp. 2568–2584, Dec 2014.

[7] H. Chen, R. Cheng, W. Pedrycz, and Y. Jin, “Solving many- objective optimization problems via multistage evolutionary search,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, pp. 1–13, Aug 2019, Early Access.

[8] Y. Tian, C. He, R. Cheng, and X. Zhang, “A multistage evolutionary algorithm for better diversity preservation in multiobjective optimization,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, pp. 1–15, Dec 2019, Early Access.

[9] Y. Tian, R. Cheng, X. Y. Zhang, F. Cheng, and Y. C. Jin, “An indicator-based multiobjective evolutionary algorithm with reference point adaptation for better versatility,” IEEE Transactions on Evolutionary Computation, vol. 22, no. 4, pp. 609–622, Aug 2018.

[10] W. J. Hong, K. Tang, A. M. Zhou, H. Ishibuchi, and X. Yao, “A scalable indicator-based evolutionary algorithm for large-scale multiobjective optimization,” IEEE Transactions on Evolutionary Computation, vol. 23, no. 3, pp. 525–537, June 2019.

[11] L. G. de la Fraga and E. Tlelo-Cuautle, “Optimizing an ampli-fier by a many-objective algorithm based on r2 indicator,” in 2015 IEEE International Symposium on Circuits and Systems (ISCAS), May 2015, pp. 265–268.

[12] Y. N. Sun, G. G. Yen, and Z. Yi, “IGD indicator-based evolutionary algorithm for many-objective optimization problems,” IEEE Transactions on Evolutionary Computation, vol. 23, no. 2, pp. 173–187, April 2019.

[13] Q. F. Zhang and H. Li, “MOEA/D: A multiobjective evolutionary algorithm based on decomposition,” IEEE Transactions on Evolutionary Computation, vol. 11, no. 6, pp. 712–731, Dec 2007.

[14] M. Y. Wu, K. Li, S. Kwong, Y. Zhou, and Q. F. Zhang, “Matching-based selection with incomplete lists for decomposition multiobjective optimization,” IEEE Transactions on Evolutionary Computation, vol. 21, no. 4, pp. 554–568, Aug 2017.

[15] R. Cheng, Y. C. Jin, M. Olhofer, and B. Sendhoff, “A reference vector guided evolutionary algorithm for many-objective optimization,” IEEE Transactions on Evolutionary Computation, vol. 20, no. 5, pp. 773–791, Oct 2016.

[16] X. Y. He, Y. R. Zhou, Z. F. Chen, and Q. F. Zhang, “Evolutionary many-objective optimization based on dynamical decomposition,” IEEE Transactions on Evolutionary Computation, vol. 23, no. 3, pp. 361–375, June 2019.

[17] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: NSGA-II,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182–197, Apr 2002.

[18] X. F. Zou, Y. Chen, M. Z. Liu, and L. S. Kang, “A new evolutionary algorithm for solving many-objective optimization problems,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 38, no. 5, pp. 1402–1412, Oct 2008.

[19] Y. Yuan, H. Xu, B. Wang, and X. Yao, “A new dominance relation-based evolutionary algorithm for many-objective optimization,” IEEE Transactions on Evolutionary Computation, vol. 20, no. 1, pp. 16–37, Feb 2016.

[20] V. Palakonda, S. Ghorbanpour, and R. Mallipeddi, “Pareto dominance-based moea with multiple ranking methods for many-objective optimization,” in 2018 IEEE Symposium Series on Computational Intelligence (SSCI), July 2018, pp. 958–964.

[21] A. Gupta, Y. S. Ong, and L. Feng, “Multifactorial evolution: Toward evolutionary multitasking,” IEEE Transactions on Evolutionary Computation, vol. 20, no. 3, pp. 343–357, June 2016.

[22] A. Gupta, Y. S. Ong, L. Feng, and K. C. Tan, “Multiobjective multifactorial optimization in evolutionary multitasking,” IEEE Transactions on Cybernetics, vol. 47, no. 7, pp. 1652–1665, July 2017.

[23] C. E. Yang, J. L. Ding, K. C. Tan, and Y. C. Jin, “Two-stage assortative mating for multi-objective multifactorial evolutionary optimization,” in 2017 IEEE 56th Annual Conference on Decision and Control (CDC), Dec 2017, pp. 76–81.

[24] L. Feng, L. Zhou, J. H. Zhong, A. Gupta, Y. S. Ong, K. Tan, and A. K. Qin, “Evolutionary multitasking via explicit autoencoding,” IEEE Transactions on Cybernetics, vol. 49, no. 9, pp. 3457–3470, Sep. 2019.

[25] Y. L. Chen, J. H. Zhong, and M. K. Tan, “A fast memetic multi- objective differential evolution for multi-tasking optimization,” in 2018 IEEE Congress on Evolutionary Computation (CEC), July 2018, pp. 1–8.

[26] N. Q. Tuan, T. D. Hoang, and H. T. Thanh Binh, “A guided dif- ferential evolutionary multi-tasking with powell search method for solving multi-objective continuous optimization,” in 2018 IEEE Congress on Evolutionary Computation (CEC), July 2018, pp. 1–8.

[27] Y. Yuan, Y. S. Ong, A. Gupta, P. S. Tan, and H. Xu, “Evo- lutionary multitasking in permutation-based combinatorial optimization problems: Realization with TSP, QAP, LOP, and JSP,” in 2016 IEEE Region 10 Annual International Conference (TENCON), Nov 2016, pp. 3157–3164.

[28] R. Sagarna and Y. S. Ong, “Concurrently searching branches in software tests generation through multitask evolution,” in 2016 IEEE Symposium Series on Computational Intelligence (SSCI), Dec 2016, pp. 1–8.

[29] R. Chandra, A. Gupta, Y. S. Ong, and C. K. Goh, “Evolutionary multi-task learning for modular knowledge representation in neural networks,” Neural Processing Letters, vol. 47, no. 3, pp. 993–1009, June 2018.

[30] W. C. J. Zhong, L. Feng and Y. S. Ong, “Multifactorial genetic programming for symbolic regression problems,” IEEE Transactions on Systems, Man, and Cybernetics: System, pp. 1–14, July 2018.

[31] A. Rauniyar, R. Nath, and P. K. Muhuri, “Multi-factorial evo- lutionary algorithm based novel solution approach for multi-objective pollution-routing problem,” Computers and Industrial Engineering, vol. 130, no. 5, pp. 757–771, Apr 2019.

[32] H. Li, Y. S. Ong, M. G. Gong, and Z. K. Wang, “Evolutionary multitasking sparse reconstruction: Framework and case study,” IEEE Transactions on Evolutionary Computation, vol. 23, no. 5, pp. 733–747, Oct 2019.

[33] Y. S. Ong and A. Gupta, “Evolutionary multitasking: A com- puter science view of cognitive multitasking,” Cognitive Computation, vol. 8, no. 2, pp. 125–142, Apr 2016.

[34] M. Q. Li, Y. Tian, X. Y. Zhang, S. X. Yang, Y. C. Jin, and X. Yao, “A benchmark test suite for evolutionary many-objective optimization,” IEEE Computational Intelligence Magazine, vol. 3, no. 1, pp. 67–81, Mar 2017.

[35] A. Gupta, J. Ma´ndziuk, and Y. S. Ong, “Evolutionary multitask- ing in bi-level optimization,” Complex & Intelligent Systems, vol. 1, no. 1, pp. 83–95, Dec 2015.

[36] K. K. Bali, A. Gupta, L. Feng, Y. S. Ong, and P. S. Tan, “Linearized domain adaptation in evolutionary multitasking,” in 2017 IEEE Congress on Evolutionary Computation (CEC), June 2017, pp. 1295–1302.

[37] B. S. Da, A. Gupta, Y. S. Ong, and L. Feng, “Evolutionary multitasking across single and multi-objective formulations for improved problem solving,” in 2016 IEEE Congress on Evolutionary Computation (CEC), July 2016, pp. 1695–1701.

[38] J. L. Ding, C. Yang, Y. C. Jin, and T. Y. Chai, “Generalized mul- titasking for evolutionary optimization of expensive problems,” IEEE Transactions on Evolutionary Computation, vol. 23, no. 1, pp. 44–58, Feb 2017.

[39] M. G. Gong, Z. D. Tang, H. Li, and J. Zhang, “Evolutionary

multitasking with dynamic resource allocating strategy,” IEEE Transactions on Evolutionary Computation, vol. 23, no. 5, pp. 858–869, Oct 2019.

[40] K. K. Bali, Y. S. Ong, A. Gupta, and P. S. Tan, “Multifactorial evolutionary algorithm with online transfer parameter estimation: MFEA-II,” IEEE Transactions on Evolutionary Computation, pp. 1–1, Mar 2019.

[41] Z. P. Liang, J. Zhang, L. Feng, and Z. X. Zhu, “A hybrid of genetic transform and hyper-rectangle search strategies for evolutionary multi-tasking,” Expert Systems with Applications, vol. 138, no. 30, Dec 2019.

[42] J. Zhong, L. Feng, W. Cai, and Y. S. Ong, “Multifactorial genetic programming for symbolic regression problems,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, pp. 1–14, July 2018, Early Access.

[43] B. S. Da, A. Gupta, Y. S. Ong, and L. Feng, “The boon of gene-culture interaction for effective evolutionary multitasking,” Lecture Notes in Computer Science, vol. 9592, pp. 54–65, Feb 2016.

[44] L. Feng, W. Zhou, L. Zhou, S. W. Jiang, J. H. Zhong, B. S. Da, Z. X. Zhu, and Y. Wang, “An empirical study of multifactorial PSO and multifactorial DE,” in 2017 IEEE Congress on Evolutionary Computation (CEC), June 2017, pp. 921–928.

[45] Z. D. Tang and M. G. Gong, “Adaptive multifactorial particle swarm optimisation,” CAAI Transactions on Intelligence Technology, vol. 4, no. 1, pp. 37–46, Mar 2019.

[46] H. Song, A. K. Qin, P. Tsai, and J. J. Liang, “Multitasking multi- swarm optimization,” in 2019 IEEE Congress on Evolutionary Computation (CEC), June 2019, pp. 1937–1944.

[47] D. N. Liu, S. J. Huang, and J. H. Zhong, “Surrogate-assisted multi-tasking memetic algorithm,” in 2018 IEEE Congress on Evolutionary Computation (CEC), July 2018, pp. 1–8.

[48] L. Zhou, L. Feng, K. Liu, C. Chen, S. J. Deng, T. Xiang, and S. W. Jiang, “Towards effective mutation for knowledge transfer in multifactorial differential evolution,” in 2019 IEEE Congress on Evolutionary Computation (CEC), June 2019, pp. 1541–1547.

[49] Q. Shang, L. Zhang, L. Feng, Y. Hou, J. Zhong, A. Gupta, K. C. Tan, and H. L. Liu, “A preliminary study of adaptive task selection in explicit evolutionary many-tasking,” in 2019 IEEE Congress on Evolutionary Computation (CEC), June 2019, pp. 2153–2159.

[50] A. Gupta, Y. S. Ong, B. Da, L. Feng, and S. D. Handoko, “Landscape synergy in evolutionary multitasking,” in 2016 IEEE Congress on Evolutionary Computation (CEC), July 2016, pp. 3076–3083.

[51] J. Zhang, W. E. Zhou, X. Q. Chen, W. Yao, and L. Cao, “Multi-source selective transfer framework in multi-objective optimization problems,” IEEE Transactions on Evolutionary Computation, pp. 1–1, July 2019.

[52] Y. Yuan, Y. S. Ong, L. Feng, A. K. Qin, A. Gupta, B. S. Da, Q. F. Zhang, K. C. Tan, Y. C. Jin, and H. Ishibuchi, “Evolutionary multitasking for multiobjective continuous optimization: Benchmark problems, performance metrics and baseline results,” arXiv preprint arXiv:1706.02766, June 2017.

[53] L. Feng, K. Qin, A. Gupta, Y. Yuan, Y. S. Ong, and X. Chi, “IEEE CEC 2019 Competition on Evolutionary Multi-task Optimization,” http://www.bdsc.site/websites/MTO competiton 2019/MTO Competition CEC 2019.html, 2019.

[54] X. Y. Zhang, X. T. Zheng, R. Cheng, J. F. Qiu, and Y. C. Jin, “A competitive mechanism based multi-objective particle swarm optimizer with fast convergence,” Information Sciences, vol. 427, pp. 63–76, Feb 2018.

[55] E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, and V. G. da Fonseca, “Performance assessment of multiobjective optimizers: an analysis and review,” IEEE Transactions on Evolutionary Computation, vol. 7, no. 2, pp. 117–132, Apr 2003.

[56] J. Bader and E. Zitzler, “Hype: An algorithm for fast hypervolume-based many-objective optimization,” Evolutionary

Computation, vol. 19, no. 1, pp. 45–76, Mar 2011.

[57] Y. Xiang, Y. R. Zhou, M. Q. Li, and Z. F. Chen, “A vector angle-based evolutionary algorithm for unconstrained many-objective optimization,” IEEE Transactions on Evolutionary Computation, vol. 21, no. 1, pp. 131–152, Feb 2017.

[58] X. Y. He, Y. R. Zhou, Z. F. Chen, and Q. F. Zhang, “Evo- lutionary many-objective optimization based on dynamical decomposition,” IEEE Transactions on Evolutionary Computation, vol. 23, no. 3, pp. 361–375, June 2019.

[59] I. Hisao, M. Hiroyuki, and N. Yusuke, “A study on performance evaluation ability of a modified inverted generational distance indicator,” 16th Genetic and Evolutionary Computation Conference, GECCO 2015, pp. 695–702, July 2015.

[60] K. Li, K. Deb, Q. F. Zhang, and S. Kwong, “An evolutionary many-objective optimization algorithm based on dominance and decomposition,” IEEE Transactions on Evolutionary Computation, vol. 19, no. 5, pp. 694–716, Oct 2015.

[61] S. Boyd, L. Vandenberghe, and L. Faybusovich, “Convex op- timization,” IEEE Transactions on Automatic Control, vol. 51, no. 11, pp. 1859–1859, Nov 2006.

Designed for Accessibility and to further Open Science