Guillotine: Hypervisors for Isolating Malicious AIs

James Mickens, Sarah Radway, Ravi Netravali

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

As AI models become more embedded in critical sectors like finance, healthcare, and the military, their inscrutable behavior poses ever-greater risks to society. To mitigate this risk, we propose Guillotine, a hypervisor architecture for sandboxing powerful AI models: models that, by accident or malice, can generate existential threats to humanity. Although Guillotine borrows some well-known virtualization techniques, Guillotine must also introduce fundamentally new isolation mechanisms to handle the unique threat model posed by existential-risk AIs. For example, a rogue AI may try to introspect upon hypervisor software or the underlying hardware substrate to enable later subversion of that control plane; thus, a Guillotine hypervisor requires careful co-design of the hypervisor software and the CPUs, RAM, NIC, and storage devices that support the hypervisor software, to thwart side channel leakage and more generally eliminate mechanisms for AI to exploit reflection-based vulnerabilities. Beyond such isolation at the software, network, and microarchitectural layers, a Guillotine hypervisor must also provide physical fail-safes more commonly associated with nuclear power plants, avionic platforms, and other types of mission-critical systems. Physical fail-safes, e.g., involving electromechanical disconnection of network cables, or the flooding of a datacenter which holds a rogue AI, provide defense in depth if software, network, and microarchitectural isolation is compromised and a rogue AI must be temporarily shut down or permanently destroyed.

Original language: English (US)
Title of host publication: HOTOS 2025 - Proceedings of the 2025 Workshop in Hot Topics in Operating Systems
Publisher: Association for Computing Machinery, Inc
Pages: 18-26
Number of pages: 9
ISBN (Electronic): 9798400714757
State: Published - Jun 6 2025
Event: 20th ACM Workshop on Hot Topics in Operating Systems - Banff, Canada
Duration: May 14 2025 to May 16 2025

Publication series

Name: HOTOS 2025 - Proceedings of the 2025 Workshop in Hot Topics in Operating Systems

Conference

Conference: 20th ACM Workshop on Hot Topics in Operating Systems
Country/Territory: Canada
City: Banff
Period: 5/14/25 to 5/16/25

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems
