Cacheman: A Comprehensive Last-Level Cache Management System for Multi-tenant Clouds | Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming

Abstract

Competition for the last-level cache (LLC) is a long-standing issue in multi-tenant cloud environments, often leading to severe performance interference among co-located virtual machines. LLC management in the cloud faces unique challenges, including unpredictable tenant workloads, misaligned performance metrics, and the need to ensure fairness under service level agreements (SLAs). Existing LLC allocation methods fall short in addressing these challenges. We present Cacheman, a comprehensive LLC management system designed from real-world cloud deployment experience. Cacheman introduces a novel gradient-based sharing mechanism for LLC ways, enabling smooth LLC allocation adjustments that simultaneously improve fairness and utilization efficiency. Its real-time allocation algorithm promptly detects and mitigates unfair LLC allocation, adapting to dynamic workloads with second-scale responsiveness. Additionally, Cacheman supports performance consistency for tenants running distributed applications by enforcing negotiated upper bounds on cache usage. Extensive experiments demonstrate that Cacheman effectively achieves its multi-dimensional goals, and long-term production deployment further shows that it significantly reduces SLA violations caused by LLC contention.

