Towards Swap-Free, Continuous Ballooning for Fast, Cloud-Based Virtual Machine Migrations

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Scopus citation

Abstract

We have a production need to reduce the time it takes customers to live migrate their application virtual machines (VMs) in the cloud. A single customer of ours migrates their nested, cloud-based, user virtual machines tens of thousands of times a month. Ballooning is one technique for modifying the size of a virtual machine and has been used to speed up VM migration and increase VM consolidation. However, it carries a significant risk: the ominous out-of-memory (OOM) error. The issue is that it is infeasible to use ballooning during high-risk scenarios, namely during giant memory spikes and during live migration, for fear of swapping or, worse, OOM errors. We advance the state of the art by optimizing the Linux balloon driver for VM migration in a non-overcommitted context, so that it can handle both high-risk scenarios without relying on swapping and without causing OOM errors. We add a user-space continuous ballooning program that, in tandem with our balloon driver modifications, can handle memory spikes of hundreds of gigabytes, as well as survive an indefinite number of migrations. In this paper, we discuss our minimal changes to Linux, describe our continuous ballooning program, and evaluate our now-in-production cloud solution on real-world applications. Our tests are designed to measure resilience in the face of several memory spikes and live migrations. In our tests, we add at most 8% overhead, yet can provide a migration speedup of at least 52% for giant VMs with memory-intensive applications reaching almost 600 GB.

Original language: English (US)
Title of host publication: SoCC 2024 - Proceedings of the 2024 ACM Symposium on Cloud Computing
Publisher: Association for Computing Machinery, Inc
Pages: 269-283
Number of pages: 15
ISBN (Electronic): 9798400712869
State: Published - Nov 20 2024
Externally published: Yes
Event: 15th Annual ACM Symposium on Cloud Computing, SoCC 2024 - Redmond, United States
Duration: Nov 20 2024 to Nov 22 2024

Publication series

Name: SoCC 2024 - Proceedings of the 2024 ACM Symposium on Cloud Computing

Conference

Conference: 15th Annual ACM Symposium on Cloud Computing, SoCC 2024
Country/Territory: United States
City: Redmond
Period: 11/20/24 to 11/22/24

All Science Journal Classification (ASJC) codes

  • Computer Science (miscellaneous)
  • Computer Networks and Communications
  • Computer Science Applications

Keywords

  • Ballooning
  • Cloud Computing
  • Memory Management
  • Virtual Machine Migration
