I am working on designing heterogeneous systems that improve performance and/or reduce cost. Heterogeneous systems allow for optimized designs that exploit the strengths of each type of component. For example, Deep Neural Networks (DNNs) are a popular Machine Learning (ML) technique that has fueled the recent growth in Artificial Intelligence (AI), and we have demonstrated how to optimize their execution by splitting the computation across both GPU and CPU resources [3]. Our code is open-sourced at https://github.com/minus-one/sir_plus. Since most users access these AI tools over the web, we have also researched how networking hardware within servers can be optimized to interact efficiently with GPU and CPU resources [4]. Our code is available at https://github.com/minus-one/splitrpc. By recognizing the heterogeneity within systems and playing to each component's strengths, we expect our work to improve the performance and efficiency of popular AI/ML systems.
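To make the GPU/CPU splitting idea concrete, here is a minimal illustrative sketch, not the actual sir_plus algorithm: given assumed per-layer latency estimates for each device and an assumed one-time transfer cost, it brute-forces the layer boundary at which to hand execution from the GPU to the CPU.

```python
# Hypothetical sketch of splitting DNN inference between GPU and CPU.
# The per-layer latencies and transfer cost are illustrative inputs, not
# measurements from the paper.

def best_split(gpu_ms, cpu_ms, transfer_ms):
    """Return (split, total_ms): layers [0, split) run on the GPU and
    layers [split, n) on the CPU, paying one device transfer at the split."""
    n = len(gpu_ms)
    assert len(cpu_ms) == n
    best = None
    for split in range(n + 1):
        total = sum(gpu_ms[:split]) + sum(cpu_ms[split:])
        if 0 < split < n:  # a GPU->CPU transfer only happens at a true split
            total += transfer_ms
        if best is None or total < best[1]:
            best = (split, total)
    return best
```

For example, with layers that are GPU-friendly early and CPU-friendly late (`gpu_ms=[1, 1, 10]`, `cpu_ms=[5, 5, 2]`, `transfer_ms=1`), the search places the split after layer 2.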
Our research extends beyond hardware heterogeneity to consider how heterogeneity more broadly can improve system design, particularly in the context of cloud computing. For example, our research demonstrates how to combine fast and slow servers to minimize tail latency under a fixed cost budget [2]. We have also investigated how to effectively combine different types of cloud VMs (burstable and on-demand) to lower cost while maintaining performance [1]. From the perspective of a cloud provider, we have studied how to efficiently manage a high degree of heterogeneity [5]. These works will provide cloud users and providers with new techniques for optimizing cost and performance through more efficient heterogeneous designs.
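The fast/slow provisioning problem can be sketched as follows. This is an illustrative toy, not the algorithm from [2]: it enumerates every mix of fast and slow servers that fits the budget and scores each mix with a textbook M/M/1 tail-latency approximation (response time in M/M/1 is exponentially distributed, so the 99th percentile is -ln(0.01)/(mu - lambda)), with load split in proportion to each pool's capacity. The server rates and costs are made-up parameters.

```python
import math

# Toy search over fast/slow server mixes under a cost budget; the tail
# latency model is a simple M/M/1 approximation per server, not the
# analysis from the paper.

def p99_mm1(mu, lam):
    """99th-percentile response time of an M/M/1 queue with service
    rate mu and arrival rate lam (infinite if overloaded)."""
    if lam >= mu:
        return math.inf
    return -math.log(0.01) / (mu - lam)

def best_mix(total_load, budget, fast, slow):
    """fast and slow are (service_rate, unit_cost) tuples. Returns
    (n_fast, n_slow, p99) minimizing the worse pool's tail latency."""
    best = None
    for nf in range(int(budget // fast[1]) + 1):
        ns = int((budget - nf * fast[1]) // slow[1])
        cap_f, cap_s = nf * fast[0], ns * slow[0]
        cap = cap_f + cap_s
        if cap <= total_load:
            continue  # mix cannot keep up with the offered load
        lat = 0.0
        if nf:
            lat = max(lat, p99_mm1(fast[0], total_load * cap_f / cap / nf))
        if ns:
            lat = max(lat, p99_mm1(slow[0], total_load * cap_s / cap / ns))
        if best is None or lat < best[2]:
            best = (nf, ns, lat)
    return best
```

Even this toy shows the trade-off the paper studies: depending on the load, budget, and the relative price-per-unit-capacity of the two server types, the latency-optimal mix can be all-fast, all-slow, or a blend of both.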
References
- [1] Ataollah Fatahi Baarzi, Timothy Zhu, and Bhuvan Urgaonkar. BurScale: Using Burstable Instances for Cost-Effective Autoscaling in the Public Cloud. In ACM SoCC, New York, NY, USA, 2019. ACM.
- [2] Adithya Kumar, Iyswarya Narayanan, Timothy Zhu, and Anand Sivasubramaniam. The Fast and The Frugal: Tail Latency Aware Provisioning for Coping with Load Variations. In The Web Conference (WWW), New York, NY, USA, 2020. ACM.
- [3] Adithya Kumar, Anand Sivasubramaniam, and Timothy Zhu. Overflowing Emerging Neural Network Inference Tasks from the GPU to the CPU on Heterogeneous Servers. In ACM International Conference on Systems and Storage (SYSTOR), New York, NY, USA, 2022. ACM.
- [4] Adithya Kumar, Anand Sivasubramaniam, and Timothy Zhu. SplitRPC: A Control + Data Path Splitting RPC Stack for ML Inference Serving. In ACM SIGMETRICS, New York, NY, USA, 2023. ACM.
- [5] Sultan Mahmud Sajal, Luke Marshall, Beibin Li, Shandan Zhou, Abhisek Pan, Konstantina Mellou, Deepak Narayanan, Timothy Zhu, David Dion, Thomas Moscibroda, and Ishai Menache. Kerveros: Efficient and Scalable Cloud Admission Control. In USENIX OSDI, Berkeley, CA, USA, 2023. USENIX Association.