Rigorous Systems Research Group (RSRG) Seminar

Thursday, April 14, 2016
12:00pm to 1:00pm
Annenberg 213
Balancing the Benefits and Costs of Redundancy // Admitting More Tenants with Tail Latency SLOs
Kristy Gardner, Computer Science, Carnegie Mellon University
Timothy Zhu, Computer Science, Carnegie Mellon University

Balancing the Benefits and Costs of Redundancy


Speaker:  Kristen Gardner, Computer Science, Carnegie Mellon University
 
Redundancy is an important tool used to reduce latency in computer systems. The idea is to create multiple copies of the same job and wait for the first copy to complete service. Empirical results have demonstrated that redundancy can provide a significant reduction in response time. However, a major concern is that running multiple copies of the same job adds too much load to the system, thereby potentially hurting response time. Unfortunately, most of the existing theoretical work on redundancy does not address this tradeoff.
 
In this talk, we introduce a new modeling framework that allows us to capture some of the practical concerns in systems with redundancy. We propose a new dispatching policy, Redundant-On-Idle, which is designed to balance the benefits and costs of redundancy, and we derive an approximation for response time under this policy. This work is currently in progress and any feedback is welcome!
 
Joint work with Mor Harchol-Balter and Alan Scheller-Wolf.


-----


Admitting More Tenants with Tail Latency SLOs

Speaker:  Timothy Zhu, Computer Science, Carnegie Mellon University

Meeting tail latency Service Level Objectives (SLOs) in shared datacenter networks is known to be an important and challenging problem. The main challenge is determining limits on multi-tenancy such that SLOs are met. This requires calculating latency guarantees, which is difficult, especially when tenants exhibit bursty behavior, as is common in production environments. Nevertheless, papers from the past two years have shown techniques for calculating latency based on a branch of mathematical modeling called Deterministic Network Calculus (DNC). DNC is designed for adversarial worst-case conditions, which is useful in some scenarios but is often overly conservative. Typical tenants do not require strict worst-case guarantees, but are only looking for SLOs at lower percentiles (e.g., 99th, 99.9th). By considering SLOs at lower percentiles, it is possible to pack together many more tenants while still meeting tail latency SLOs.

In this talk, I'll present a brand new technique for calculating tail latency based on a probabilistic theory called Stochastic Network Calculus (SNC). SNC is a new theory that is actively being developed by the theory community to overcome the limitations of DNC, and we are the first to bring this theory to practice in a real computer system. In experiments on our cluster, we demonstrate that our system can support twice as many tenants as the state of the art while meeting tail latency SLOs.

 

For more information, please contact Sydney Garstang by email at [email protected].