An important part of systems research and evaluation is being able to test systems under realistic settings. However, without direct access to production systems, researchers must reproduce production behavior in experimental testbeds. This is typically done in an ad-hoc manner for each project, and we as a research community often fall into evaluation pitfalls. My research will identify these pitfalls and introduce tools and techniques for performing realistic replay-based experiments that are repeatable and reproducible.
As an example, one commonly overlooked part of the evaluation methodology is how requests (e.g., web requests) are replayed in an experimental system. Typically, request traces are collected on companies' large-scale production systems and then released, but experimental systems are small and cannot handle the high load in the original data. Researchers therefore downscale the traces to reduce the load, often in ways that introduce pitfalls. Our EuroSys paper on TraceSplitter [1] evaluates the pitfalls of common downscaling approaches and introduces a better approach. Our code is open-sourced at https://github.com/smsajal/TraceSplitter. The goal of our work is to advance the research community's practices toward more realistic experiments whose performance results are reproducible.
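To make the downscaling problem concrete, below is a minimal sketch of two naive downscaling strategies one might apply to a request trace before replaying it on a small testbed. This is purely illustrative, not the TraceSplitter approach; the trace format and function names are assumptions made for the example.

```python
import random

# Assumed trace format for illustration: a list of (arrival_time_seconds, request)
# tuples. Real traces typically carry more fields (size, type, tenant, etc.).

def downscale_by_sampling(trace, fraction, seed=0):
    """Keep a random subset of requests, preserving original timestamps.
    Pitfall: bursts are thinned uniformly, which can distort queueing and tail latency."""
    rng = random.Random(seed)
    return [entry for entry in trace if rng.random() < fraction]

def downscale_by_time_dilation(trace, fraction):
    """Keep all requests but stretch inter-arrival times by 1/fraction.
    Pitfall: burst durations and diurnal patterns are stretched along with the load."""
    return [(t / fraction, req) for (t, req) in trace]

# Example: scale a tiny trace to 10% of its original load in two different ways.
trace = [(0.00, "GET /a"), (0.01, "GET /b"), (0.02, "GET /c"), (5.00, "GET /d")]
print(downscale_by_sampling(trace, 0.1))
print(downscale_by_time_dilation(trace, 0.1))
```

Both strategies hit the target load, yet they produce very different arrival patterns, which is exactly the kind of methodological choice that can silently change experimental results.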
References
- [1] Sultan Mahmud Sajal, Rubaba Hasan, Timothy Zhu, Bhuvan Urgaonkar, and Siddhartha Sen. TraceSplitter: A New Paradigm for Downscaling Traces. In Proceedings of the Sixteenth European Conference on Computer Systems (EuroSys '21), ACM, 2021.