b

DiscoverSearch
About
My stuff
A centralized reinforcement learning method for multi-agent job scheduling in Grid
2016·arXiv
Abstract
Abstract

One of the main challenges in Grid systems is designing an adaptive, scalable, and model-independent method for job scheduling to achieve a desirable degree of load balancing and system efficiency. Centralized job scheduling methods have some drawbacks, such as single point of failure and lack of scalability. Moreover, decentralized methods require a coordination mechanism with limited communications. In this paper, we propose a multi-agent approach to job scheduling in Grid, named Centralized Learning Distributed Scheduling (CLDS), by utilizing the reinforcement learning framework. The CLDS is a model free approach that uses the information of jobs and their completion time to estimate the efficiency of resources. In this method, there are a learner agent and several scheduler agents that perform the task of learning and job scheduling with the use of a coordination strategy that maintains the communication cost at a limited level. We evaluated the efficiency of the CLDS method by designing and performing a set of experiments on a simulated Grid system under different system scales and loads. The results show that the CLDS can effectively balance the load of system even in large scale and heavy loaded Grids, while maintains its adaptive performance and scalability.

Keywords:

Grid computing; Multi-agent job scheduling; Resource allocation; Reinforcement learning; Load balancing; Scalability;

One of the most significant problems in computer sciences, which is also important in economics, is multi-agent resource allocation. It addresses the problem of distributing a number of items among a number of agents [1]. The resource allocation, also called scheduling, is relevant to a variety of applications, such as Grid computing, public transportation, and network routing.

In recent decades, with the increasing demand for distributed computing, Grid computing emerges as a leading technology for supporting complicated and decentralized computing problems. Grid computing delivers a set of capabilities including aggregation, selection and sharing a number of heterogeneous resources that are distributed among geographically diverse locations [2]. One of the crucial tasks that directly affects the performance of Grid systems is coordinated resource allocation and job scheduling. By employing an efficient and robust algorithm for job scheduling, a Grid system can deliver its peak performance. When an applicable policy is used for resource allocation, Grid system achieves the speedup in job processing and provides high-quality services to users [3]. One of the key issues in Grid systems is load balancing that is defined as completing all the jobs at hand as soon as possible [2]. An adaptive scheduling method, which balances the load of Grid system effectively among the heterogeneous resources, is a requirement to maintain the performance of Grid system at a desirable level.

In recent years, many advances have been done in Grid job scheduling algorithms. These algorithms are divided into centralized and decentralized job scheduling. As centralized scheduling systems, which also known as traditional systems, we can point to PBS [4], SGE [5] and Condor [6]. These systems work effectively by utilizing global state information that is obtained from Grid environment. However, centralized resource allocation has a major drawback, i.e. single point of failure, which is also referred to as the lack of fault-tolerance ability. Furthermore, these methods is not scalable. To overcome the mentioned problems, researchers turning their approach to decentralized scheduling algorithms. However, other challenges were appeared by introducing the decentralized job scheduling methods. Most of decentralized schedulers, like AppleS [7] or Condor-G [8], may encounter many synchronization problems in resource management. These problems arise due to the fact that scheduling policies are performed individually by each scheduler, regardless of the other schedulers’ decisions. In order to tackle these problems, some scheduling methods were proposed based on coordination mechanisms, like Condor Flock P2P [9]. However, a deficiency that brings a high communication overhead to these schedulers is the extra dependency on negotiation among the schedulers. Hence, coordinating the scheduling among the decentralized schedulers, with a reduced communication cost, is an important driving force for new researches.

In recent years, a number of reinforcement learningdriven methods have been proposed for job scheduling in Grid environment. Some of these methods are designed for decentralized scheduling. It means that multiple agents perform the task of learning based on information obtained from previous actions, i.e. selection of resources and receiving feedback from Grid environment. The advantages of these methods are their low communication needs among the scheduler agents, and their scalability. Even though, the reinforcement learning is a proper framework for multi-agent job scheduling, it seems that multiple learner agents may capture the state of entire system and resources with some differences. However, multi-agent scheduling is regarded as an appropriate solution to overcome the single point of failure problem. Therefore, utilizing a centralized and effective learning approach in combination with distributed job scheduling, can resolve the single point of failure, and also improves the load balancing of Grid and the scalability of scheduling method.

In this paper, we propose a Grid job scheduling method, named Centralized Learning Distributed Scheduling (CLDS). In the CLDS method, multiple agents perform the task of scheduling and produce rewards for learning task according to scheduled jobs’ information. In this method, only one agent is responsible for the task of learning. This agent shares its information of Grid with scheduler agents by a utility table, and gets the reinforcement signals from them to make a unified view of Grid system. The advantages of this strategy are: 1) the communication cost is reduced by a simple coordination strategy, 2) since there are multiple schedulers that perform the scheduling task, and each of which can be employed as learner agent, there is no single point of failure, and 3) since there is a unified learner agent, the scheduler agents only see a single and real view of the Grid system. We designed and accomplished a set of experiments on a simulated Grid system to show the effectiveness of the CLDS method, in terms of load balancing, compared with other scheduling methods.

The rest of this paper is organized as follows. Section II gives an overview of previous job scheduling approaches that use reinforcement learning and other learning frameworks. In Section III, we describe the CLDS job scheduling method. Section IV explains the evaluation methodology and discusses the evaluation results. Finally, we draw conclusion and describe future work in Section V.

In Grid environment, the resources are heterogeneous, the performance of resources is vary and the applications are diverse. Therefore, an adaptive scheduling method is needed. Reinforcement learning is a machine learning method that is widely used to solve uncertain decision making problems. It obtains sub-optimal or near-optimal policies by interacting with the environment [10]. The reinforcement learning can solve the difficulties of Grid resource allocation problem by providing a model-free framework.

Tesauro et al. [11] propose a hybrid scheduling approach that benefits from the advantages of both queuing and reinforcement learning models, in open-loop and closed-loop traffics. In their method, a queuing model policy controls the system, while reinforcement learning trains offline on data collected from the system. They use reinforcement learning to train nonlinear function approximators instead of lookup tables, which enables scaling to larger state spaces. In [12], a general framework is proposed to perform dynamic resource allocation among multiple entities. It uses reinforcement learning in combination with fuzzy rule bases, and can be employed in an environment with and without any existing resource allocation policies. Zhang et al. [13] present a multi-agent learning algorithm for optimizing online resource allocation in cluster networks. Each agent decides according to two connected learning problems. The first problem is local allocation that refers to deciding which tasks to be allocated locally. The second problem is task routing that refers to deciding where to forward a task. A number of heuristic strategies is developed, in order to speed up the learning stage and to avoid weak initial policies.

A hybrid resource management method is proposed in [14] to improve the system reliability in Grid and Cloud computing. It employs a reinforcement learning method in combination with neural network to help the scheduler to deal with dynamicity in execution environment. In a previous work [15], a meta-scheduler is presented for job scheduling in computational Grids. It utilizes a fuzzy rulebased system to develop a Grid scheduling middleware. It incorporates a swarm intelligence method aimed at knowledge acquisition, in order to improve the ability of adapting to changes in the resources and applications conditions.

Wu et al. [2] propose a distributed learning algorithm called Ordinal Sharing Learning (OSL) based on the reinforcement learning framework. In the OSL method, each scheduler has a utility table and updates it in two steps. Firstly, it updates the table using the local rewards produced for the resources. Secondly, it updates the table using the utility table of its adjacent scheduler. After updating the utility table, the scheduler sends its table to the neighbor scheduler, and the neighbor agent performs updating and sending of the utility table likewise. Compared to the OSL, our proposed method employs a centralized learning strategy. In the CLDS method, there is a single learner agent and a single utility table. The scheduler agents send their produced rewards to the learner, and it updates the utility table using all the rewards. For the subsequent resource allocation decisions, the scheduler agents will use the updated utility table. With the use of this strategy, all the schedulers see the same view of Grid system.

As noted earlier, the centralized job scheduling methods, which use a single scheduler agent, may fail due to the single point of failure problem. Therefore, the fault tolerance ability of Grid system is violated. Moreover, in decentralized scheduling methods, an efficient and low cost coordination strategy is needed, in order to all schedulers decide according to a unified view of Grid system and the state of resources. We propose a multi-agent scheduling method that tackles the single point of failure problem. Moreover, a coordination strategy is adopted based on a centralized reinforcement learning approach and limited communications.

As mentioned earlier, in model-based job scheduling methods that mostly rely on the Grid Information System (GIS), schedulers may have inaccurate and time-delayed information about resources. To tackle this problem, an adaptive scheduling algorithm is deserved. Such an algorithm should not be dependent on an accurate model. To deal with this type of scheduling problem, we propose a centralized reinforcement learning method in combination with coordinated multi-agent job scheduling. An accurate view of the state of resources is available for all schedulers, because one learner agent performs the learning task and provides the information about Grid system to all the scheduler agents. Moreover, all the scheduler agents can undertake the learning task. As a result, our proposed method can also deal with the single point of failure problem in Grid system. In the following, after describing a general model for job scheduling in Grid, our proposed method is explained in detail.

Before introducing our proposed method, we describe a general job scheduling model in Grid, which has been widely used in the literature to assess job scheduling algorithms [16]. In this model, there are several users, resources, and schedulers. The users produce jobs and submit them to the schedulers. Different schedulers receive the jobs from users and allocate them to the resources in parallel. Each scheduler can allocate jobs to any of the resources. For the simplicity reasons, it is assumed that all the resources are computing resources. In general, the problem of decentralized job scheduling in Grids, can be modeled using a multi-agent job scheduling system [16]. This model is denoted as a 6-tuple ⟨A, R, P,S, C, JSR⟩, where A={a1, …, aN}is a set of agents, R={r1,…, rM}is a set of resources, P:A×N→[0, 1]is a job submission function, S:A×N→ℜis a probabilistic job size function, C:A×N→ℜis a probabilistic capacity function, and JSR is a job scheduling rule [2].

Although the above model is based on some abstractions, it still preserves the dynamicity, heterogeneity, and randomness features of Grid environment.

A. Centralized learning method

One of the reinforcement learning strategies for multi-agent job scheduling is based on multiple independent learners [2]. In such type of learning, there is no explicit communication among the agents, and each agent learns independently based on local state and local reward. However, this strategy may lead to an anomalous resource allocation, because there is no communication among agents and they do not have a real view of Grid system. In this case, the agents learn according to their local information, and a coordination mechanism seems to be required.

To tackle the above problem, in our proposed method, the scheduler agents submit their local rewards to a learner agent. In each time step, the learner agent collects the rewards and updates a utility table that holds the efficiency of selecting the resources. Then, the learner agent sends the updated utility table to the scheduler agents, and the schedulers make their resource allocation decisions according to the utility table. In the following, each step of the learning method is explained thoroughly.

1) Generating local rewards

In each time step, the agent aireceives job scheduling

tasks from one or more users. It makes a resource allocation decision for each job based on the utility table. The resource allocation strategy will be explained in the next subsection. When an agent submits a job to a resource, it records an entry for the submitted job in the Scheduled Job List. In the entry, Job ID, Job Size and Resource ID are recorded, also additional information is recorded, such as Starting Time and Completion Time. The agent aisearches the Scheduled Job List to attain the completion information of the jobs and generate reward for each one of the resources. If a job is completed, a positive reward is generated for the corresponding resource and its record is removed from the list. For each finished job jkin the Scheduled Job List, a positive reward is produced for the corresponding resource rq, based on the Job Size, Starting Time and Completion Time, as follows:

image

where reward(rq)is the produced reward for the resource rq, Job Size(jk)is the size of job jksubmitted to the resource rq, and Time to Completion(jk)is the total time that the resource rqspent to process the job jkand is calculated as Completion Time(jk) – Starting Time(jk). Each finished job produces a positive reward for the corresponding resource in proportion to its size and the total time between its starting and completion. In this way, for two jobs with equal size, the job which was finished in less time, produces a greater positive reward for the corresponding resource.

For each unfinished job jkin the Scheduled Job List, a negative reward (i.e. a penalty) is produced for the corresponding resource rq, based on the Job Size as follows:

image

where reward(rq)is the produced reward for the resource rq, and Job Size(jk)is the size of job jksubmitted to the resource rq. Each unfinished job produces a negative reward for the corresponding resource in proportion to its inversed size. In fact, an unfinished job with greater size produces a lower negative reward, because the bigger job needs more time to be completed.

If an agent has more than one job submitted to same resource in its Scheduled Job List, it calculates the sum of all produced positive and negative rewards for that resource and generates a single reward. After generating local rewards, each agent aiputs the rewards into a Reward Vector and submits it to the learner agent. The qthelement in the Reward Vector holds the produced reward for the resource rq.

2) Updating the utility table

In each time step, the learner agent receives the

reward vectors from all scheduler agents, and updates the utility table U. Then, it sends the new utility table to the scheduler agents. The utility table U is a vector that its size is equal to the number of resources, where U(q) holds the efficiency of the qthresource. The learner agent updates the utility table U using all the reward vectors, as follows:

image

where αis learning factor, and Reward Vectori(q)is the qthelement, which holds the reward for the qthresource, in the reward vector generated by the ithagent. In (3), all the produced rewards for a resource are added together and used for updating the efficiency of the resource.

After updating the utility table, the learner agent sends it to the scheduler agents. At the next time step, the scheduler agents will use the updated utility table to allocate the resources. They will generate the rewards and submit the Reward Vector to the learner agent again. In the following subsection, we will explain how the scheduler agents do the job scheduling task.

B. Multi-agent job scheduling

At each time step, in addition to generating the rewards and receiving the updated utility table, each scheduler agent submits the jobs in its Job Queue to appropriate resources. This decision is made according to the utility table. Each agent ai, for each job Jmin its Job Queue, selects the resource which has the greatest utility value in the utility table and submits the job Jmto it. As mentioned earlier, after submitting a job to a resource, the agent airecords an entry for the submitted job in the Scheduled Job List. Then, it removes the submitted job from its Job Queue.

In the proposed job scheduling method, there is only one learner agent that collects the reward vectors, updates the utility table, and sends the new utility table to the scheduler agents. However, there is no single point of failure in this method, because every scheduler agent can be employed as the learner agent. The learner agent has not any special capabilities, and can be replaced by each one of the scheduler agents.

As discussed above, the communication cost grows linearly with the number of agents, and it is still better than other methods that use traditional coordination mechanisms and need the communications with

exponential order.

We conducted a set of experiments to evaluate the performance of our CLDS job scheduling method on a simulated Grid system. We compared our CLDS method with other three job scheduling methods, i.e. Least Load Selection (LLS), Random Selection (RS), and Decentralized Min-Min Selection (DMMS) [17]. In the LLS method, an agent selects the resource with the least load and submits the current job in the queue to it. If multiple resources have the same minimum load, the agent selects one of them randomly. In the RS approach, an agent selects the resources and allocates them to the jobs in queue based on a uniform probability distribution. In the DMMS approach, each agent performs the scheduling task independently based on decentralized Min-Min algorithm, which is a benchmark scheduling algorithm for performance evaluation.

There is a metric, named Average Load of Resources (ALoR) [2], for evaluating the performance of Grid job scheduling methods. The ALoR can be properly used to assess the efficiency of job scheduling in the Grid model explained in Section III. The simulated Grid system which was used for the evaluations, was thoroughly based on the early discussed model. In the model, resources are different in their processing capacity C. For each resource, the processing capacity is determined as the inverse of CPU time required to perform a job of a unit length [2]. The capacity of resources is generated randomly from a uniform distribution in a given interval. Each resource has a queue for arriving jobs, and performs only one single job at a given time, based on First In First Out (FIFO) order. Moreover, each job has a length that is chosen randomly from a uniform distribution in a given interval. Considering the above assumptions, the ALoR is defined as follows [2]:

image

where |R| is the total number of resources, LoRris the load of rthresource, l_totalris the total length of jobs in the queue of rthresource, and Cris the capacity of rthresource.

In performance evaluation of job scheduling algorithms, a minimum value of ALoR is preferred. The lower ALoR refers to the better load balancing and system efficiency. We evaluated the performance of our proposed job scheduling method under three different system scales and two different system loads. The system scale is determined by the number of schedulers and resources, and the system load is determined by the proportion of the total length of jobs in schedulers’ queue to the total capacity of resources. In Fig. 1 to Fig. 6, the results of performance evaluation are represented. Our proposed method is indicated as CLDS. The different system scales and loads, used for performance evaluation, are given below:

Small scale (50 schedulers, and 200 resources), and

medium load (60%). The results are represented in Fig. 1.

Medium scale (150 schedulers, and 400 resources),

and medium load (60%). The results are represented in Fig. 2.

Large scale (300 schedulers, and 1200 resources),

and medium load (60%). The results are represented in Fig. 3.

Small scale (50 schedulers, and 200 resources), and

heavy load (90%). The results are represented in Fig. 4.

Medium scale (150 schedulers, and 400 resources),

and heavy load (90%). The results are represented in Fig. 5.

Large scale (300 schedulers, and 1200 resources),

and heavy load (90%). The results are represented in Fig. 6.

As can be seen in Fig. 1 to Fig. 6, our proposed job scheduling method, i.e. the CLDS, and the DMMS perform well in load balancing task under different scales and loads. The RS method fails to carry out the job scheduling task, and the LLS method shows the best efficiency among the four methods.

image

Fig. 1. ALoR for our proposed method (CLDS) and the other three methods, under a small scale and medium load system.

image

Fig. 2. ALoR for our proposed method (CLDS) and the other three methods, under a medium scale and medium load system.

image

Fig. 3. ALoR for our proposed method (CLDS) and the other three methods, under a large scale and medium load system.

image

Fig. 4. ALoR for our proposed method (CLDS) and the other three methods, under a small scale and heavy load system.

image

Fig. 5. ALoR for our proposed method (CLDS) and the other three methods, under a medium scale and heavy load system.

image

Fig. 6. ALoR for our proposed method (CLDS) and the other three methods, under a large scale and heavy load system.

At initial steps, our method performs worse than DMMS and LLS methods. It happens because in our method, the scheduler agents have no initial knowledge about the resources and the Grid system. At initial steps, the scheduler agents collect the information about the efficiency of resources, according to generated rewards and updating the utility table by the learner agent. After almost 1000 time steps, the CLDS begins to perform more efficiently than the DMMS. Afterwards, it converges to an efficient ALoR and obtains a sub-optimal policy for resource allocation. The LLS method achieves an impressive load balancing performance, however, it cannot be effectively employed in real world Grids. The LLS method has a high communication and computing cost. Moreover, it is a centralized scheduling method that is taken into account as a single point of failure. The DMMS method achieves the load balancing, although, the performance is lower than the CLDS and the LLS. In DMMS method, schedulers work without any coordination mechanisms, and therefore, some resources may not be utilized effectively or may be overloaded. The RS method does not consider the performance of resources and allocates the resources according to a random manner. Consequently, resources are not utilized based on their efficiency, the ALoR increases with a great degree, and the RS method fails to balance the load of resources.

We evaluated the effectiveness of our proposed job scheduling method under different system scales and system loads, in order to test its scalability and adaptive performance. As can be observed from the results, the CLDS method can converge to a sub-optimal policy in different system scales. The results show that the CLDS method is scalable and can be employed effectively in Grid systems with both small and large number of schedulers and resources. The CLDS method also provide an adaptive performance under medium and heavy system loads. When the load of system increases, the CLDS method still converges to a sub-optimal or even near-optimal policy.

In Grid systems, a job scheduling method that can fairly balance the load of system among all resources is a main requirement to improve the performance of the entire system. Moreover, such scheduling method should have essential abilities, such as scalability, adaptive performance, and fault tolerance. In this paper, we proposed a multi-agent job scheduling method, named CLDS, for Grid system, using reinforcement learning framework. In the CLDS method, a single learner agent is responsible for receiving the produced rewards from multiple scheduler agents, updating a utility table that contains the efficiency of the resources, and sends the updated utility table to all the scheduler agents. The scheduler agents make resource allocation decisions according to the utility table. Each scheduler agent can be employed as learner agent, therefore, there would be no single point of failure in the Grid system. The results of experiments show that the CLDS method performs well in load balancing and converge to a sub-optimal policy, or even to a near-optimal policy in some cases. The coordination mechanism among the schedulers is established through a centralized reinforcement learning approach with limited communications, which makes a unified view of the state of resources to the schedulers. Therefore, the scheduler agents can allocate the resources more effectively. According to the evaluation results under different system scales and loads, the CLDS method shows an adaptive and scalable performance.

In future work, we will try to improve the CLDS method by incorporating a hierarchical scheduling mechanism.

[1] Y. Chevaleyre, P. E. Dunne, U. Endriss, J. Lang, M. Lemaitre, N. Maudet, et al., "Issues in multiagent resource allocation," Informatica, vol. 30, 2006.

[2] J. Wu, X. Xu, P. Zhang, and C. Liu, "A novel multi-agent reinforcement learning approach for job scheduling in Grid computing," Future Generation Computer Systems, vol. 27, pp. 430-439, 2011.

[3] W.-C. Chung and R.-S. Chang, "A new mechanism for resource monitoring in grid computing," Future Generation Computer Systems, vol. 25, pp. 1-7, 2009.

[4] Portable Batch System. <http://www.pbsgridworks.com/>.

[5] Sun Grid Engine. <http://www.sun.com/software/gridware/>.

[6] D. Thain, T. Tannenbaum, and M. Livny, "Distributed computing in practice: the Condor experience," Concurrency and computation: practice and experience, vol. 17, pp. 323-356, 2005.

[7] F. Berman, R. Wolski, H. Casanova, W. Cirne, H. Dail, M. Faerman, et al., "Adaptive computing on the grid using AppLeS," Parallel and Distributed Systems, IEEE Transactions on, vol. 14, pp. 369-382, 2003.

[8] J. Frey, T. Tannenbaum, M. Livny, I. Foster, and S. Tuecke, "Condor-G: A computation management agent for multiinstitutional grids," Cluster Computing, vol. 5, pp. 237-246, 2002.

[9] A. R. Butt, R. Zhang, and Y. C. Hu, "A self-organizing flock of condors," in Proceedings of the 2003 ACM/IEEE conference on Supercomputing, 2003, p. 42.

[10] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction: MIT press, 1998.

[11] G. Tesauro, N. K. Jong, R. Das, and M. N. Bennani, "A hybrid reinforcement learning approach to autonomic resource allocation," in Autonomic Computing, 2006. ICAC'06. IEEE International Conference on, 2006, pp. 65-73.

[12] D. Vengerov, "A reinforcement learning approach to dynamic resource allocation," Engineering Applications of Artificial Intelligence, vol. 20, pp. 383-390, 2007.

[13] C. Zhang, V. R. Lesser, and P. J. Shenoy, "A Multi-Agent Learning Approach to Online Distributed Resource Allocation," in IJCAI, 2009, pp. 361-366.

[14] M. Hussin, N. A. W. A. Hamid, and K. A. Kasmiran, "Improving reliability in resource management through adaptive reinforcement learning for distributed systems," Journal of Parallel and Distributed Computing, vol. 75, pp. 93-100, 2015.

[15] S. García-Galán, R. Prado, and J. M. Exposito, "Fuzzy scheduling with swarm intelligence-based knowledge acquisition for grid computing," Engineering Applications of Artificial Intelligence, vol. 25, pp. 359-375, 2012.

[16] A. Schaerf, Y. Shoham, and M. Tennenholtz, "Adaptive load balancing: A study in multi-agent learning," Journal of artificial intelligence research, pp. 475-500, 1995.

[17] R. F. Freund, M. Gherrity, S. Ambrosius, M. Campbell, M. Halderman, D. Hensgen, et al., "Scheduling resources in multiuser, heterogeneous, computing environments with SmartNet," in Heterogeneous Computing Workshop, 1998.(HCW 98) Proceedings. 1998 Seventh, 1998, pp. 184-199.


Designed for Accessibility and to further Open Science