Designing a Practical CDN Cache
Abstract
Machine learning is transforming system design, including caching strategies for Content Distribution Networks (CDNs). In this talk, I will share how we leverage ML to enhance CDN cache performance by treating cache replacement as a predictive optimization problem based on historical access patterns. Building a practical ML-driven CDN cache presents several key challenges: (1) developing and refining an ML model that sustains high prediction accuracy in dynamic, real-world CDN environments, (2) minimizing eviction overhead to match the efficiency of heuristic-based approaches, (3) integrating into existing CDN architectures without disrupting performance, and (4) balancing ML model storage requirements against available cache space to optimize overall capacity and efficiency. We will discuss the solutions implemented in two learned CDN cache systems that achieve significantly lower miss ratios than state-of-the-art heuristics while maintaining comparable throughput, with only modest increases in CPU utilization. Our prototypes have been validated on diverse production workloads, and our collaborations with industry partners have resulted in successful deployment in a large-scale CDN.
Bio
Kai Li is the Paul M. Wythes ’55, P’86 and Marcia R. Wythes P’86 Professor at Princeton University. He received his B.S. from Jilin University, M.S. from the University of Science and Technology of China, and Ph.D. from Yale University. His research spans parallel and distributed systems, storage systems, and systems support for machine learning. He was an early advocate for using networks of PCs and servers to solve computational problems in parallel. His Ph.D. dissertation pioneered distributed shared memory, enabling data sharing across networks without physically shared memory. At Princeton, he led the development of user-level communication mechanisms for computer clusters, work that was foundational to InfiniBand, a communication standard widely deployed in today’s high-performance clusters. As co-founder of Data Domain, Inc., Li led the creation of deduplication storage systems achieving 20–30× lossless compression for backup data. This innovation allowed cost-effective replacement of tape libraries, revolutionizing data center storage for backup and disaster recovery. The Data Domain product line has dominated the market for the past two decades. He was the co-PI of ImageNet, which propelled deep learning to the forefront of machine learning and sparked what many call the deep learning revolution. He has received numerous honors and awards, including eight most influential or test-of-time paper awards across various areas of computer science. He has been elected an ACM Fellow, an IEEE Fellow, and a member of the US National Academy of Engineering.
