Today, large production environments often need to handle a large variety of applications, including but not limited to interactive (user-facing) services, latency sensitive applications, batch analytics jobs, stream processing, iterative computations, maintenance services, etc. The standard practice today is to deploy these applications as containers which are then managed by various container orchestration engines
such as Docker-Swarm [16], YARN [43], Mesos [15], or Kubernetes [5]. These orchestration engines allocate resources (e.g., CPU and memory) to these jobs according to the estimated resource limits provided by the developers [5]. In a multi-tenant shared cluster, if multiple applications compete for the same shared resources, they slow each other down due to resource contention [24, 25, 45]. Thus to reduce the chances of contention, orchestration engines use developer specified affinity, and anti-affinity) [5, 9] rules to place applications on different machines. For stateless-services, resource estimates can be a bit aggressive, such that the resources allocated to each of the deployed containers would be enough to make it run smoothly, while the load fluctuations can be handled through an autoscaling mechanism by increasing or decreasing the number of deployed containers on the fly. For stateful-services, autoscaling can be really tricky and reactive migration of containers across machines have high overheads [9]. Hence, the containers are usually deployed with very conservative estimates by specifying large resource limits so that they can sustain phases with substantial increase in the resource demands. However, periods with such high resource usages are rare and often span only a very short fraction of the life-cycle of application, leading to resource wastage during the comparatively idle times.
In most of the real production systems, not all the applications would require to use the peak resource at the same time, and not all phases of their execution would contend for resources in a similar manner [1, 37, 45]. For example, Figure 1 shows the CPU usage characteristics of three production services in our cluster, across various phases of their execution. Their temporal resource usages show several "peaks" and "valleys", some more regular than the others. For user-facing services, such temporal resource usages can have daily and seasonal patterns due to fluctuations in user-demands [30] (e.g. some services are mostly used during working hours, while some services are mostly used during major holidays seasons). Applications can have different resource usage patterns across algorithmic phases [31, 45, 48], e.g., between the map and reduce phases in map-reduce jobs. Because of these temporal variations in the resource usage, deploying for peak using developer-provided limits is inefficient from an overall
Figure 1: Examples of resource usages by three production services with several "peaks" and "valleys".
resource utilization perspective. The variety of applications and the complexity of their temporal resource usage patterns makes it infeasible for the developers to express the placement logic in terms of existing placement rules available in current schedulers, e.g., affinity and anti-affinity rules in Kubernetes [2].
Borg [44] partially addresses this problem by packing a mix of high and low priority jobs in each machine, so that high priority jobs can expand during load spikes whereas low priority jobs can take advantage during the idle periods of the high priority jobs. However, not all clusters see such a health mix of low priority jobs to effectively fill the valleys of the high priority jobs.
Along with the temporal usage patterns, some jobs might have dependent succeeding jobs that rely on the completion of the first job. These dependencies can be intra or inter services. For example, a customer might have a nightly recommendation model builder, post completion of which a service kicks in to generate a new set of recommendations. A job scheduler that is aware of such dependencies can further utilize this information to efficiently schedule the existing jobs while making room for the upcoming jobs. A central scheduler can even discover serendipitous dependencies between different jobs coming from completely different developer groups, opening up scopes for resource alignment among these jobs leading to improved utilization of the cluster.
In this paper, we introduce an early prototype of DeepPlace, a self-learning scheduler that can opportunistically place containerized applications such that their temporal resource usages are aligned, resource contentions are minimized, quality of service is maintained and overall utilization improved. DeepPlace uses deep reinforcement learning (Deep RL) to learn hidden patterns from historical data over time to improve its scheduling policy. Essentially, DeepPlace treats resource usages of the applications as a multivariate timeseries and learns how these timeseries can be placed across different machines so that their resource usages are better aligned. We show through some example cases, how DeepPlace can take non-trivial decisions by anticipating future placement requests in order to optimize the overall resource usage in the cluster. For stateful-services, DeepPlace helps by minimizing the chances of resource contention, without being overly conservative, leading to operational excellence. For stateless-services, the need for scaling-up can be reduced by having a better placement to begin with. With a better placement, a small number of containers might be able to gracefully handle the load up to a certain extent without a need for scaling-up. However, when scaling-up does happen, where to place those new additional containers is another crucial question, as usually in container scaling new machines are not spawn off frequently, that can be answered by DeepPlace.
Reinforcement Learning. In RL, at a high-level, an agent interacts with a system and tries to learn an optimized policy. At each timestep t, the agent observes the state of the system , and chooses to take an
that changes the state to
at timestep t + 1, and the agent receives a reward
. The agent tries to maximize the received reward which would help it to learn an optimized policy. It is assumed that the state transitions and rewards are stochastic and the state transition probabilities and rewards depend only on the state of the environment
and the action taken by the agent
(i.e., show Markov property [41]).
The objective is to maximize the expected cumulative discounted reward: where
determines how much the future rewards contribute to the total reward. More details of theoretical background of RL can be found in [41] and [26]. Inspired by the recent trends in Deep Reinforcement Learning (DeepRL), in this paper, we use deep-neural-networks (DNNs) as a function approximator for the placement policy that DeepPlace wants to learn. The RL algorithm can perform gradient-descent on the parameters of this DNN so that it can maximize the expected cumulative discounted reward over the actions the RL-agent takes. The gradients are estimated by observing the trajectories of execution that are obtained by following the policy.
RL has been used in variety of scenarios including learning complex games [12, 20, 32], robotics [18, 19, 33], and very recently for video streaming [27], routing [4, 28, 42], device placement [29]. But, the application of RL to self-learning schedulers has not been thoroughly explored. Scheduling. To the best of our knowledge, recently proposed DeepRM [26] is the only other self-learning scheduler that also attempts to learn novel scheduling policy using DeepRL. DeepRM has a very simplified view of the cluster and thus comes with several limitations.
We now describe the design of DeepPlace explaining how it operates. DeepPlace observes temporal job behavior to optimize its policy, encoded in a DNN-based policy network, using RL. DeepPlace models the scheduling problem as an RL-environment where the compute cluster is composed of N machines on which the application services or jobs are to be scheduled. Each such machine has amount of total physical resource capacity for resource dimension d (e.g., CPU, Memory, etc.). For a job or service j, DeepPlace observes the time-series of the resource usages denoted as
where
is the resource usage along the resource dimension d. DeepPlace also keeps track of the current placement map of which services or jobs are running on which machines as well as what are the incoming services or jobs that need to be scheduled in the cluster, as a queue. The purpose of the
Figure 2: Input space representation of DeepPlace
Figure 3: Workflow of DeepPlace
queue is to incorporate in the state representation, a view of the upcoming jobs thus allowing the scheduler to learn the arrival patterns and dependencies amongst the jobs. The complete workflow for DeepPlace is shown in Figure 3.
3.1 State Space Representation
DeepPlace’s state space representation is inspired by [26]. Though DeepRM’s representation for scheduling is designed to answer: "what job to schedule when", DeepPlace is designed primarily to answer: "what job to schedule where". In extreme cases, DeepPlace can delay some scheduling decisions if no suitable placement exists. Thus, DeepPlace makes some key improvements in the input-space representation to capture the degree of competition for resources among the jobs sharing the same underlying resources of a machine and their temporal variations in resource usages.
Figure 2 illustrates the input-space representation.
(1) State of each machine in the cluster is represented as a 2D matrix or an image with pixels for each of the resource dimension d, where k is the number of previous logical timesteps.
(2) Within each machine, the vertical direction of the image (i.e., the matrix) represents the time axis and shows the utilization of jobs for up to k previous logical timesteps, and the horizontal direction represents the amount of resource used by each job/services (quantized into units of resources). This type of representation helps the DNN-based RL-agent to learn the temporal resource usage characteristics of each job. k is a configurable parameter that the user can choose. The value of k should be a number reasonably large enough w.r.t. scheduling time-scale so that it helps the agent to capture a significant overlap among applications as well as
temporal variations in the resource usage. However, larger k results in longer convergence time for the RL-agent.
same type of application with assigned number representation 0.2, are running in two different machines (machine 1 and 2), then in the combined state-space representation, these tasks will be represented as 1.2 and 2.2 respectively.
(8) Along with the combined representation of the machines, DeepPlace also keeps a waiting-queue in its state-space representation. This queue represents the tasks waiting to be scheduled. By observing the changes in the queue over time, the RL-agent learns some key dynamics about the arrival characteristics of the jobs, which type of and how many jobs come together, and the temporal dependency amongst them, as previously discussed.
3.2 Reward/Penalty Design
DeepPlace is driven by negative rewards (penalty) which has the following four components: Resource contention penalty. To help DeepPlace learn a placement policy that results in better resource alignment (complementary) and avoid resource contention among tasks scheduled in the same machine we use a modified version of cross-correlation to penalize the RL-agent during its learning. Cross-correlation (Cr) is calculated between all pairs of tasks i and j running on the same machine across resource dimension d as follows:
whereis the length of task
instantaneous resource demand across dimension d by task i at time t. Cross-correlation formula amplifies the effect of two peaks being scheduled together. The Cr for a particular state of the cluster is calculated by taking the sum of cross-correlation of each machine, which includes across all the resource dimensions (CPU or memory), the cross-correlation of each task with every other task in that machine.
Resource over-utilization penalty. To prevent scheduling of more tasks than that can be handled by a machine, there is a high penalty if the machine is not able to meet the resource requirement of tasks scheduled in that machine. It is calculated by adding a high negative factor each time a machine is unable to provide appropriate resources to the running tasks.
Wait-time penalty. To prevent the scheduler from holding jobs for a long time in search of a better place, we add a constant penalty proportional to the state of the waiting queue. It is equal to the number of waiting tasks in the queue multiplied by a negative constant at each time.
Under-utilization penalty. Since our goal is to improve overall utilization of the cluster by helping the scheduler learn how to achieve tighter packing and pack on less number of machines, if possible, we add a penalty proportional to the sum of unused resources in the used machines. White pixels in our state-representations denote the number of unused resources at any given time.
We use the modified version of REINFORCE algorithm as mentioned in [26]. The policy network consists of a single hidden layer of 20 neurons followed by output neurons equal to the number of actions (number of machines under consideration). We use a 36 core CPU server and python multiprocessing to create multiple workers (equal to batch size+1) each operating on distinct examples, taking a fixed number of trajectories and accumulating gradients. The last worker is used to combine the gradients of each worker and send to the policy network for updating the parameters. This gives a major improvement in the training speed. The training time increases significantly as we increase the cluster load. It also depends significantly on the type of applications under consideration (For example, Long running vs Short running jobs). For the hidden layer, we use Relu activation function, while for the output layer we use softmax activation. We use Adam optimizer and a learning rate of 0.001. The number of trajectories taken by each worker is fixed at 20.
[Workload.] In our evaluation setup, jobs arrive online as a Poisson process. The average job arrival rate is calibrated to create three average cluster load scenarios: 30%, 50% and 80%. In our setup, 50% of the jobs are long running and the other half are short running. Each job has 2 dimensions of resource requirements: CPU and memory. The capacity of these two resources in each machine is denoted by {1r, 1r} For each job, dominant resource usage is randomly chosen to be either CPU or memory. The resource usage of the dominant resource is independently chosen from a uniform distribution between 0.3r and 0.5r. The non-dominant resource usage is also independently and uniformly varied between 0.08r and 0.16r. Thus there is no correlation between the CPU and memory usages. Temporal resource usage for each job varies as a square wave with period uniformly chosen between 0.2t and 0.5t and width as one-fourth of the period, where t denotes the job length. Total 50 such different jobs are used for training and 18 for testing. Our evaluation runs with a cluster of 10 machines. [Baselines.] We compare DeepPlace with Tetris [13], which schedules jobs on machines based on how well job’s resource requirement aligns with the machine’s available resources balancing preferences for short jobs and packing in a combined score. We also compare it against Best Fit heuristic which allocates the job to the machine having the least units of the dominant resource of the job left.
Note: It is not possible to directly compare DeepPlace with DeepRM [26] because DeepRM only specifies which job to be scheduled next and does not say on which machine it should be scheduled. Thus DeepRM does not have any concepts of competition for resource usage among applications running in the same machine, resource fragmentation and machine-level over-utilization. Thus, a fair comparison with DeepRM with respect to our desired metrics is not possible.
[Learning Progress.] We first show how DeepPlace’s learning converge across multiple iterations in Figure 4. It can be observed that roughly after 1000 iterations, DeepPlace’s policy learning starts to converge and does not see any further significant drop in the normalized penalty.
[Improvement in Cluster Utilization.] We measure average utilization of machines for each resource as:
where T is the length of the observation period. Since the number of machines that are actually being actively used varies over time, in the denominator, we used maximum number of machines used at any point in time to normalize. [Comparing Scheduling Efficiency.] Here in Figure 7, we show how DeepPlace optimizes for cluster utilization for both CPU and memory. We can see that DeepPlace can provide a 68-100% increase in average utilization compared to Tetris across different cluster-load conditions. This is primarily achieved by efficient packing that requires significantly less number of machines to be used compared to Tetris as shown in Figure 5. Further, it can be observed that the gap between DeepPlace and Tetris in terms of the number of machines required to accommodate the jobs increases with the increase of the cluster load. Although it looks like BestFit provides even higher utilization because it just packs the jobs into the machines without any knowledge of peak or future resource usages of the jobs and as a consequence, BestFit suffers from huge over-utilization of the resources as shown in Figure 6. On the other hand, over-utilization due to DeepPlace’s placement decisions are almost negligible. Tetris already includes peak resource usage information in its placement decision thus resulting in no resource over-utilization.
[Improvement in Resource Fragmentation.] Fragmentation score of a cluster at a high-level measure what part of all the available resource in a cluster are concentrated.
The lower the fragmentation score, the higher the ability of the cluster to schedule unanticipated large jobs. Hence, low resource fragmentation in the cluster is a desirable operational property. In Figure 8, we see DeepPlace provides 6-13% reduction in resource fragmentation compared to Tetris. DeepPlace’s intelligent placement which takes both temporal resource usage characteristics and job arrival patterns
Figure 4: Convergence of ing under 50% average load
Figure 5: Comparison of number of machines used
Figure 6: Comparison of over-utilization in the cluster
Figure 7: Comparison of average resource utilization in the cluster
Figure 8: Comparison of resource fragmentation level in the cluster
Figure 9: Examples of learned placement policies
leaves bigger room in the machines (i.e. less fragmentation score) to accommodate unanticipated large jobs.
In this section, we discuss insights and applicability for real deployments.
What DeepPlace learned? Figure 9 illustrates how DeepPlace achieved better packing that ultimately resulted in higher overall utilization. Figure 9a shows how Job1 and Job2 were placed in the same machine because resource intensive parts of Job1 would finish before the resources are required by Job2. In Figure 9b, resource requirements for Job3 and Job4 alternate in such a manner that they do not exactly overlap with each other and thus were placed in the same machine for better packing. All these patterns were learned by DeepPlace on its own without any guiding rules.
looks at where to schedule an incoming application so that it can either improve the resource utilization or reduce the resource contention. However, how often such a placement decision needs to be made depends on the what kind of workload the cluster is handling. For a cluster handling short or medium-duration batch, cron or interactive
applications, frequent placement decisions need to be made and DeepPlace can be very useful. On the other hand, for long running services, typically new placement decisions are made less frequently, e.g., when the container for an upgraded service is being deployed, etc. However, if auto-scaling is enabled for these services, taking the decision on where the additional auto-scaled container should be placed in the cluster, can be suggested by DeepPlace. Cluster size. Our input-space representation as well as action-space of the RL is proportional to the number of machines in the cluster. Hence, larger the size of the cluster, the more iterations and training examples it needs for its policy learning to converge.
uses historical time-series pattern of resource usages to learn what job is to be scheduled in which machine so that based on their resource usage characteristics, they either improve the overal utilization or avoid aggravating contention by using the same resource at the same time. If DeepPlace starts to learn from scratch, it can be long before it sees sufficient examples required for its learning to converge. An option to speed up learning by bootstrapping the RL-agent’s policy is by replaying the time-series of historical resource usage through a simulation.
To conclude, in this paper we show an early design prototype of a self-learning scheduler that can exploit the temporal resource usage patterns and arrival dependencies of the jobs to provide a better placement policy and thus achieve better utilization without requiring any manually crafted rules or heuristics.
[1] Amvrosiadis, G., Park, J. W., Ganger, G. R., Gibson, G. A., Baseman, E., and DeBardeleben, N. On the diversity of cluster workloads and its impact on research results. In 2018 {USENIX} Annual Technical Conference ({USENIX}{ATC} 18) (2018), pp. 533–546.
[2] Authors, T. K. Affinity and anti-affinity. https://kubernetes.io/docs/ concepts/configuration/assign-pod-node/#affinity-and-anti-affinity.
[3] Boutin, E., Ekanayake, J., Lin, W., Shi, B., Zhou, J., Qian, Z., Wu, M., and Zhou, L. Apollo: Scalable and coordinated scheduling for cloud-scale computing. In OSDI (2014), vol. 14, pp. 285–300.
[4] Boyan, J. A., and Littman, M. L. Packet routing in dynamically changing networks: A reinforcement learning approach. In Advances in neural information processing systems (1994), pp. 671–678.
[5] Burns, B., Grant, B., Oppenheimer, D., Brewer, E., and Wilkes, J. Borg, omega, and kubernetes. Queue 14, 1 (2016), 10.
[6] Delimitrou, C., and Kozyrakis, C. Paragon: Qos-aware scheduling for heterogeneous datacenters. In ASPLOS (2013).
[7] Delimitrou, C., and Kozyrakis, C. Quasar: Resource-efficient and qos-aware cluster management. In ASPLOS (2014).
[8] Ferguson, A. D., Bodik, P., Kandula, S., Boutin, E., and Fonseca, R. Jockey: guaranteed job latency in data parallel clusters. In Proceedings of the 7th ACM european conference on Computer Systems (Eurosys) (2012), ACM, pp. 99–112.
[9] Garefalakis, P., Karanasos, K., Pietzuch, P. R., Suresh, A., and Rao, S. Medea: scheduling of long running applications in shared production clusters. In EuroSys (2018), pp. 4–1.
[10] Ghodsi, A., Zaharia, M., Hindman, B., Konwinski, A., Shenker, S., and Stoica, I. Dominant resource fairness: Fair allocation of multiple resource types. In Nsdi (2011), vol. 11, pp. 24–24.
[11] Ghodsi, A., Zaharia, M., Shenker, S., and Stoica, I. Choosy: Maxmin fair sharing for datacenter jobs with constraints. In Proceedings of the 8th ACM European Conference on Computer Systems (2013), ACM, pp. 365–378.
[12] Gibney, E. Google ai algorithm masters ancient game of go. Nature News 529, 7587 (2016), 445.
[13] Grandl, R., Ananthanarayanan, G., Kandula, S., Rao, S., and Akella, A. Multi-resource packing for cluster schedulers. ACM SIGCOMM Computer Communication Review 44, 4 (2015), 455–466.
[14] Haqe, M. E., He, Y., Elnikety, S., Bianchini, R., McKinley, K. S., et al. Few-to-many: Incremental parallelism for reducing tail latency in interactive services. In ACM SIGPLAN Notices (2015), vol. 50, ACM, pp. 161–175.
[15] Hindman, B., Konwinski, A., Zaharia, M., Ghodsi, A., Joseph, A. D., Katz, R. H., Shenker, S., and Stoica, I. Mesos: A platform for finegrained resource sharing in the data center. In NSDI (2011), vol. 11, pp. 22–22.
[16] Inc., D. Docker swarm. https://docs.docker.com/engine/swarm/ how-swarm-mode-works/services/.
[17] Joe-Wong, C., Sen, S., Lan, T., and Chiang, M. Multiresource allocation: Fairness-efficiency tradeoffs in a unifying framework. IEEE/ACM Transactions on Networking (TON) 21, 6 (2013), 1785–1798.
[18] Kaelbling, L. P., Littman, M. L., and Moore, A. W. Reinforcement learning: A survey. Journal of artificial intelligence research 4 (1996), 237–285.
[19] Kober, J., Bagnell, J. A., and Peters, J. Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32, 11 (2013), 1238–1274.
[20] Lample, G., and Chaplot, D. S. Playing fps games with deep reinforcement learning. In AAAI (2017), pp. 2140–2146.
[21] Leverich, J., and Kozyrakis, C. Reconciling high server utilization and sub-millisecond quality-of-service. In EuroSys (2014).
[22] Lo, D., Cheng, L., Govindaraju, R., Ranganathan, P., and Kozyrakis, C. Heracles: Improving resource efficiency at scale. In ISCA (2015).
[23] Mace, J., Bodik, P., Fonseca, R., and Musuvathi, M. Retro: Targeted resource management in multi-tenant distributed systems. In NSDI (2015), pp. 589–603.
[24] Maji, A. K., Mitra, S., and Bagchi, S. Ice: An integrated configuration engine for interference mitigation in cloud services. In 2015 IEEE International Conference on Autonomic Computing (2015), IEEE, pp. 91– 100.
[25] Maji, A. K., Mitra, S., Zhou, B., Bagchi, S., and Verma, A. Mitigating interference in cloud services by middleware reconfiguration. In Proceedings of the 15th International Middleware Conference (2014), ACM, pp. 277–288.
[26] Mao, H., Alizadeh, M., Menache, I., and Kandula, S. Resource management with deep reinforcement learning. In Proceedings of the 15th ACM Workshop on Hot Topics in Networks (2016), ACM, pp. 50–56.
[27] Mao, H., Netravali, R., and Alizadeh, M. Neural adaptive video streaming with pensieve. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication (2017), SIGCOMM ’17.
[28] Mestres, A., Rodriguez-Natal, A., Carner, J., Barlet-Ros, P., Alar-cón, E., Solé, M., Muntés-Mulero, V., Meyer, D., Barkai, S., Hibbett, M. J., et al. Knowledge-defined networking. ACM SIGCOMM Computer Communication Review (2017).
[29] Mirhoseini, A., Pham, H., Le, Q. V., Steiner, B., Larsen, R., Zhou, Y., Kumar, N., Norouzi, M., Bengio, S., and Dean, J. Device placement optimization with reinforcement learning. In ICML (2017).
[30] Mishra, A. K., Hellerstein, J. L., Cirne, W., and Das, C. R. Towards characterizing cloud backend workloads: insights from google compute clusters. ACM SIGMETRICS Performance Evaluation Review 37, 4 (2010), 34–41.
[31] Mitra, S., Gupta, M. K., Misailovic, S., and Bagchi, S. Phase-aware optimization in approximate computing. In 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) (2017), IEEE, pp. 185–196.
[32] Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013).
[33] Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al. Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529.
[34] Nathuji, R., Kansal, A., and Ghaffarkhah, A. Q-clouds: managing performance interference effects for qos-aware clouds. In Proceedings of the 5th European conference on Computer systems (2010), ACM, pp. 237–250.
[35] Parkes, D. C., Procaccia, A. D., and Shah, N. Beyond dominant resource fairness: Extensions, limitations, and indivisibilities. ACM Transactions on Economics and Computation (TEAC) 3, 1 (2015), 3.
[36] Popa, L., Kumar, G., Chowdhury, M., Krishnamurthy, A., Ratnasamy, S., and Stoica, I. Faircloud: sharing the network in cloud computing. ACM SIGCOMM Computer Communication Review 42, 4 (2012), 187–198.
[37] Reiss, C., Tumanov, A., Ganger, G. R., Katz, R. H., and Kozuch, M. A. Heterogeneity and dynamicity of clouds at scale: Google trace analysis. In Proceedings of the Third ACM Symposium on Cloud Computing (2012), ACM, p. 7.
[38] Ren, X., Ananthanarayanan, G., Wierman, A., and Yu, M. Hopper: Decentralized speculation-aware cluster scheduling at scale. In ACM SIGCOMM Computer Communication Review (2015), vol. 45, ACM, pp. 379–392.
[39] Schwarzkopf, M., Konwinski, A., Abd-El-Malek, M., and Wilkes,
J. Omega: flexible, scalable schedulers for large compute clusters. In Proceedings of the 8th ACM European Conference on Computer Systems (2013), ACM, pp. 351–364.
[40] Suresh, P. L., Canini, M., Schmid, S., and Feldmann, A. C3: Cutting tail latency in cloud data stores via adaptive replica selection. In 12th USENIX Symposium on Networked Systems Design and Implementation (2015), USENIX Association, pp. 513–527.
[41] Sutton, R. S., Barto, A. G., et al. Reinforcement learning: An introduction. MIT press, 1998.
[42] Valadarsky, A., Schapira, M., Shahaf, D., and Tamar, A. Learning to route. In Proceedings of the 16th ACM Workshop on Hot Topics in Networks (2017), ACM, pp. 185–191.
[43] Vavilapalli, V. K., Murthy, A. C., Douglas, C., Agarwal, S., Konar, M., Evans, R., Graves, T., Lowe, J., Shah, H., Seth, S., et al. Apache hadoop yarn: Yet another resource negotiator. In Proceedings of the 4th annual Symposium on Cloud Computing (2013), ACM, p. 5.
[44] Verma, A., Pedrosa, L., Korupolu, M. R., Oppenheimer, D., Tune, E., and Wilkes, J. Large-scale cluster management at Google with Borg. In EuroSys (2015).
[45] Xu, R., Mitra, S., Rahman, J., Bai, P., Zhou, B., Bronevetsky, G., and Bagchi, S. Pythia: Improving datacenter utilization via precise contention prediction for multiple co-located workloads. In Proceedings of the 19th International Middleware Conference (2018), ACM, pp. 146– 160.
[46] Xu, Y., Musgrave, Z., Noble, B., and Bailey, M. Bobtail: Avoiding long tails in the cloud. In NSDI (2013).
[47] Yang, H., Breslow, A., Mars, J., and Tang, L. Bubble-flux: Precise online qos management for increased utilization in warehouse scale computers. In International Symposium on Computer Architecture (ISCA) (2013), pp. 607–618.
[48] Zhang, J., Yousif, M., Carpenter, R., and Figueiredo, R. J. Application resource demand phase analysis and prediction in support of dynamic resource provisioning. In Fourth International Conference on (2007), IEEE, pp. 12–12.
[49] Zhang, X., Tune, E., Hagmann, R., Jnagal, R., Gokhale, V., and Wilkes, J. Cpi2: Cpu performance isolation for shared compute clusters. In EuroSys (2013).