
Timothy Zhu

Department of Computer Science and Engineering


Quality of Service (QoS) support for tail latency SLOs

Meeting tail latency (e.g., 99.9th percentile) Service Level Objectives (SLOs) is important for many user-facing applications. Long tail latencies are pervasive in datacenter environments, and companies such as Amazon and Google repeatedly stress the importance of meeting 99th and 99.9th percentile latency goals [2, 1, 3, 7]. One primary challenge in meeting SLOs is managing the congestion between applications in shared datacenters. While much is known about sharing bandwidth [4, 8, 10, 12, 6], less is known about controlling latency [13, 9, 5]. For latency, it is crucial to account for the effects of burstiness, which applications commonly exhibit in practice. Reactive systems such as Cake [13] struggle to cope with bursty applications. Recent modeling-based systems such as Silo [7] and QJump [3] use Deterministic Network Calculus (DNC) theory to account for burstiness, which enables them to meet SLOs but is conservative in how it shares resources. Moreover, configuring the QoS parameters (e.g., rate limits, priorities) is a non-trivial problem that neither system addresses.
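To make concrete how DNC accounts for burstiness, the following is a minimal sketch (illustrative names and numbers, not code from any of the cited systems) of the classic DNC delay bound: a flow constrained by a token-bucket arrival curve alpha(t) = sigma + rho*t, served by a rate-latency server beta(t) = R*max(t - T, 0), experiences worst-case delay at most T + sigma/R.

```python
# Sketch of the classic DNC delay bound for a token-bucket-constrained flow
# at a rate-latency server. Valid whenever the server keeps up (rho <= R).

def dnc_delay_bound(sigma, rho, R, T):
    """Worst-case delay for a (sigma, rho)-constrained flow at a
    rate-latency server with rate R and latency T."""
    if rho > R:
        raise ValueError("unstable: arrival rate exceeds service rate")
    return T + sigma / R

# Example (hypothetical numbers): a burst of 50 KB at 10 MB/s through a
# 100 MB/s server with 0.1 ms scheduling latency yields a 0.6 ms bound.
print(dnc_delay_bound(sigma=50e3, rho=10e6, R=100e6, T=1e-4))
```

Note how the bound depends on the burst size sigma, not just the average rate rho; this is exactly the burstiness effect that purely reactive systems miss.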

Figure 1: Example storage and network traffic enforcement in a datacenter. The primary QoS parameters we use are prioritization and rate limiting.

I am pioneering efforts to improve support for tail latency SLOs in datacenters by controlling storage and network traffic (Fig. 1). During my internship at Microsoft Research, I worked on the IOFlow [11] project, which provides tools for enforcing priorities and rate limits in both storage and networks. Our work led to a productized QoS feature in Microsoft’s Hyper-V hypervisor. While priority and rate limiting tools are necessary for meeting tail latency SLOs, a key challenge is figuring out how to configure the priorities and rate limits so that the SLOs are actually met. Our PriorityMeister [16] system introduces a new algorithm for configuring priorities based on analyzing the system with DNC theory. We demonstrate how a modeling approach can outperform reactive approaches in handling bursty applications. Since then, two other systems, Silo [7] and QJump [3], have adopted DNC-based approaches to meet tail latency SLOs.
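To show the flavor of configuring priorities via DNC analysis, here is a toy sketch (not PriorityMeister's actual algorithm): it assigns static priorities at a single shared server, bounds each workload's delay using the leftover service after higher-priority traffic, and accepts an ordering only if every bound meets its SLO. The workload parameters and capacity are hypothetical.

```python
from itertools import permutations

# Hypothetical workloads: token-bucket arrival curves (sigma bytes of burst,
# rho bytes/s of rate) and a per-workload latency SLO in seconds.
workloads = {
    "A": dict(sigma=50e3, rho=10e6, slo=1e-3),
    "B": dict(sigma=200e3, rho=20e6, slo=10e-3),
}
C = 100e6  # shared server capacity, bytes/s (illustrative)

def delay_bounds(order, C):
    """DNC delay bound per workload under a static priority order
    (first = highest): each level sees a leftover rate-latency service
    curve after subtracting all higher-priority arrival curves."""
    bounds, hi_sigma, hi_rho = {}, 0.0, 0.0
    for name in order:
        w = workloads[name]
        R = C - hi_rho                # leftover service rate
        if R <= w["rho"]:
            return None               # this ordering is unstable
        T = hi_sigma / R              # leftover service latency
        bounds[name] = T + w["sigma"] / R
        hi_sigma += w["sigma"]
        hi_rho += w["rho"]
    return bounds

# Search for a priority ordering whose bounds meet every SLO.
for order in permutations(workloads):
    b = delay_bounds(order, C)
    if b and all(b[n] <= workloads[n]["slo"] for n in b):
        print("feasible priority order:", order)
```

In this toy instance only the order (A, B) is feasible: giving the tighter-SLO workload higher priority keeps its bound at sigma/C, while the looser-SLO workload can absorb the extra leftover latency.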

While PriorityMeister is a good start in this research area, two important aspects warrant further study. First, DNC-based systems assume adversarial worst-case behavior, which is very conservative in non-adversarial environments. Our SNC-Meister [14] admission control system addresses this issue using a probabilistic theory called Stochastic Network Calculus (SNC). As expected, by not assuming that all applications are adversarially correlated, we are able to co-locate many more applications. Our SNC library is publicly available at https://github.com/timmyzhu/SNC-Meister. Second, while PriorityMeister focuses on configuring priorities, little is known about how to configure rate limits. Our continued research into rate limit configuration has found that the number of applications that can be co-located differs significantly depending on how rate limits are chosen. Our WorkloadCompactor [15] system combines rate limit configuration with application placement, jointly optimizing the rate limit parameter selection and the number of servers needed to meet tail latency SLOs. Our code is open-sourced at https://github.com/timmyzhu/WorkloadCompactor.
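The sketch below illustrates the intuition behind jointly choosing rate limits and placement; it is a simplified stand-in, not WorkloadCompactor's actual optimization, and all names and numbers are hypothetical. Each workload admits a family of token-bucket limits (r, b) that all bound its trace (a lower rate r forces a larger burst allowance b, and vice versa), and a first-fit packer picks whichever candidate point fits a server's remaining rate and burst budget, opening a new server only when none fits.

```python
CAP_R, CAP_B = 100e6, 1e6   # per-server rate (bytes/s) and burst (bytes) budgets
servers = []                # each entry: [remaining rate, remaining burst]

# Candidate (r, b) token-bucket points per workload, all bounding its trace.
workloads = [
    [(10e6, 400e3), (20e6, 100e3)],
    [(30e6, 300e3), (50e6, 50e3)],
    [(60e6, 200e3), (80e6, 20e3)],
]

def place(candidates):
    """First-fit: try every open server with every candidate point; open a
    new server with the lowest-rate candidate if nothing fits."""
    for s in servers:
        for r, b in candidates:
            if r <= s[0] and b <= s[1]:
                s[0] -= r
                s[1] -= b
                return
    r, b = min(candidates)
    servers.append([CAP_R - r, CAP_B - b])

for w in workloads:
    place(w)
print("servers needed:", len(servers))
```

Even this crude heuristic shows why the choice of rate limit matters: committing every workload to a single fixed (r, b) point can strand rate or burst capacity that a different point on the same trade-off curve would have used.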

References

  1. Jeffrey Dean and Luiz André Barroso. The tail at scale. Commun. ACM, 56(2):74-80, February 2013.
  2. Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. Dynamo: Amazon’s highly available key-value store. In ACM SOSP, pages 205-220, 2007.
  3. Matthew P. Grosvenor, Malte Schwarzkopf, Ionel Gog, Robert N. M. Watson, Andrew W. Moore, Steven Hand, and Jon Crowcroft. Queues don’t matter when you can jump them! In USENIX NSDI, 2015.
  4. Ajay Gulati, Irfan Ahmad, and Carl A. Waldspurger. PARDA: Proportional allocation of resources for distributed storage access. In Proceedings of the 7th Conference on File and Storage Technologies, FAST ’09, pages 85-98, Berkeley, CA, USA, 2009. USENIX Association.
  5. Ajay Gulati, Arif Merchant, and Peter J. Varman. pClock: An arrival curve based approach for QoS guarantees in shared storage systems. In Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS ’07, pages 13-24, New York, NY, USA, 2007. ACM.
  6. Ajay Gulati, Arif Merchant, and Peter J. Varman. mClock: Handling throughput variability for hypervisor IO scheduling. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI ’10, pages 1-7, Berkeley, CA, USA, 2010. USENIX Association.
  7. Keon Jang, Justine Sherry, Hitesh Ballani, and Toby Moncaster. Silo: Predictable message latency in the cloud. In ACM SIGCOMM, pages 435-448. ACM, 2015.
  8. Wei Jin, Jeffrey S. Chase, and Jasleen Kaur. Interposed proportional sharing for a storage service utility. In Proceedings of the joint international conference on Measurement and modeling of computer systems, SIGMETRICS ’04/Performance ’04, pages 37-48, New York, NY, USA, 2004. ACM.
  9. Arif Merchant, Mustafa Uysal, Pradeep Padala, Xiaoyun Zhu, Sharad Singhal, and Kang Shin. Maestro: Quality-of-service in large disk arrays. In Proceedings of the 8th ACM International Conference on Autonomic Computing, ICAC ’11, pages 245-254, New York, NY, USA, 2011. ACM.
  10. David Shue, Michael J. Freedman, and Anees Shaikh. Performance isolation and fairness for multi-tenant cloud storage. In Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation, OSDI’12, pages 349-362, Berkeley, CA, USA, 2012. USENIX Association.
  11. Eno Thereska, Hitesh Ballani, Greg O’Shea, Thomas Karagiannis, Antony Rowstron, Tom Talpey, Richard Black, and Timothy Zhu. IOFlow: A Software-defined Storage Architecture. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP ’13, pages 182-196, New York, NY, USA, 2013. ACM.
  12. Matthew Wachs, Michael Abd-El-Malek, Eno Thereska, and Gregory R. Ganger. Argon: performance insulation for shared storage servers. In Proceedings of the 5th USENIX conference on File and Storage Technologies, FAST ’07, pages 5-5, Berkeley, CA, USA, 2007. USENIX Association.
  13. Andrew Wang, Shivaram Venkataraman, Sara Alspaugh, Randy Katz, and Ion Stoica. Cake: Enabling high-level SLOs on shared storage systems. In Proceedings of the Third ACM Symposium on Cloud Computing, SoCC ’12, pages 14:1-14:14, New York, NY, USA, 2012. ACM.
  14. Timothy Zhu, Daniel S. Berger, and Mor Harchol-Balter. SNC-Meister: Admitting More Tenants with Tail Latency SLOs. In ACM SOCC, pages 1-14, New York, NY, USA, 2016. ACM.
  15. Timothy Zhu, Michael A. Kozuch, and Mor Harchol-Balter. WorkloadCompactor: Reducing Datacenter Cost while Providing Tail Latency SLO Guarantees. In ACM SOCC, New York, NY, USA, 2017. ACM.
  16. Timothy Zhu, Alexey Tumanov, Michael A. Kozuch, Mor Harchol-Balter, and Gregory R. Ganger. PriorityMeister: Tail Latency QoS for Shared Networked Storage. In ACM SOCC, pages 29:1-29:14, New York, NY, USA, 2014. ACM.