TY - GEN
T1 - Towards Swap-Free, Continuous Ballooning for Fast, Cloud-Based Virtual Machine Migrations
AU - Negy, Kevin Alarcón
AU - Nightingale, Tycho
AU - Weatherspoon, Hakim
AU - Shen, Zhiming
N1 - Publisher Copyright:
© 2024 Owner/Author.
PY - 2024/11/20
Y1 - 2024/11/20
AB - We have a production need to reduce the time for customers to live-migrate their application virtual machines (VMs) in the cloud. A single customer of ours migrates their nested, cloud-based, user virtual machines tens of thousands of times a month. Ballooning is one technique for modifying the size of a virtual machine and has been used to speed up VM migration and increase VM consolidation. However, it has a significant risk: the ominous out-of-memory (OOM) error. The issue is that it is infeasible to use ballooning during high-risk scenarios, namely during giant memory spikes and during live migration, for fear of swapping or, worse, OOM errors. We advance the state of the art by optimizing the Linux balloon driver for VM migration in a non-overcommitted context, so that it can handle both high-risk scenarios without relying on swapping and without causing OOM errors. We add a user-space continuous ballooning program that, in tandem with our balloon driver modifications, can handle memory spikes of hundreds of gigabytes, as well as survive an indefinite number of migrations. In this paper, we discuss our minimal changes to Linux, describe our continuous ballooning program, and evaluate our now-in-production cloud solution on real-world applications. Our tests are designed to measure resilience in the face of several memory spikes and live migrations. In our tests, we add at most 8% overhead, yet can provide a migration speedup of at least 52% for giant VMs with memory-intensive applications reaching almost 600 GB.
KW - Ballooning
KW - Cloud Computing
KW - Memory Management
KW - Virtual Machine Migration
UR - https://www.scopus.com/pages/publications/85215517774
U2 - 10.1145/3698038.3698543
DO - 10.1145/3698038.3698543
M3 - Conference contribution
AN - SCOPUS:85215517774
T3 - SoCC 2024 - Proceedings of the 2024 ACM Symposium on Cloud Computing
SP - 269
EP - 283
BT - SoCC 2024 - Proceedings of the 2024 ACM Symposium on Cloud Computing
PB - Association for Computing Machinery, Inc
T2 - 15th Annual ACM Symposium on Cloud Computing, SoCC 2024
Y2 - 20 November 2024 through 22 November 2024
ER -