TY - GEN
T1 - VectorVisor
T2 - 2023 USENIX Annual Technical Conference, ATC 2023
AU - Ginzburg, Samuel
AU - Shahrad, Mohammad
AU - Freedman, Michael J.
N1 - Publisher Copyright:
© 2023 by The USENIX Association All Rights Reserved.
PY - 2023
Y1 - 2023
N2 - Beyond conventional graphics applications, general-purpose GPU acceleration has had significant impact on machine learning and scientific computing workloads. Yet, it has failed to see widespread use for server-side applications, which we argue is because GPU programming models offer a level of abstraction that is either too low-level (e.g., OpenCL, CUDA) or too high-level (e.g., TensorFlow, Halide), depending on the language. Not all applications fit into either category, resulting in lost opportunities for GPU acceleration. We introduce VectorVisor, a vectorized binary translator that enables new opportunities for GPU acceleration by introducing a novel programming model for GPUs. With VectorVisor, many copies of the same server-side application are run concurrently on the GPU, where VectorVisor mimics the abstractions provided by CPU threads. To achieve this goal, we demonstrate how to (i) provide cross-platform support for system calls and recursion using continuations and (ii) make full use of the excess register file capacity and high memory bandwidth of GPUs. We then demonstrate that our binary translator is able to transparently accelerate certain classes of compute-bound workloads, gaining significant improvements in throughput-per-dollar of up to 2.9× compared to Intel x86-64 VMs in the cloud, and in some cases match the throughput-per-dollar of native CUDA baselines.
AB - Beyond conventional graphics applications, general-purpose GPU acceleration has had significant impact on machine learning and scientific computing workloads. Yet, it has failed to see widespread use for server-side applications, which we argue is because GPU programming models offer a level of abstraction that is either too low-level (e.g., OpenCL, CUDA) or too high-level (e.g., TensorFlow, Halide), depending on the language. Not all applications fit into either category, resulting in lost opportunities for GPU acceleration. We introduce VectorVisor, a vectorized binary translator that enables new opportunities for GPU acceleration by introducing a novel programming model for GPUs. With VectorVisor, many copies of the same server-side application are run concurrently on the GPU, where VectorVisor mimics the abstractions provided by CPU threads. To achieve this goal, we demonstrate how to (i) provide cross-platform support for system calls and recursion using continuations and (ii) make full use of the excess register file capacity and high memory bandwidth of GPUs. We then demonstrate that our binary translator is able to transparently accelerate certain classes of compute-bound workloads, gaining significant improvements in throughput-per-dollar of up to 2.9× compared to Intel x86-64 VMs in the cloud, and in some cases match the throughput-per-dollar of native CUDA baselines.
UR - http://www.scopus.com/inward/record.url?scp=85180362375&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85180362375&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85180362375
T3 - Proceedings of the 2023 USENIX Annual Technical Conference, ATC 2023
SP - 1017
EP - 1037
BT - Proceedings of the 2023 USENIX Annual Technical Conference, ATC 2023
PB - USENIX Association
Y2 - 10 July 2023 through 12 July 2023
ER -