TY - JOUR
T1 - Thallo - Scheduling for High-Performance Large-Scale Non-Linear Least-Squares Solvers
AU - Mara, Michael
AU - Heide, Felix
AU - Zollhöfer, Michael
AU - Nießner, Matthias
AU - Hanrahan, Pat
N1 - Publisher Copyright:
© 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2021/10
Y1 - 2021/10
N2 - Large-scale optimization problems at the core of many graphics, vision, and imaging applications are often implemented by hand in tedious and error-prone processes in order to achieve high performance (in particular on GPUs), despite recent developments in libraries and DSLs. At the same time, these hand-crafted solver implementations reveal that the key for high performance is a problem-specific schedule that enables efficient usage of the underlying hardware. In this work, we incorporate this insight into Thallo, a domain-specific language for large-scale non-linear least squares optimization problems. We observe various code reorganizations performed by implementers of high-performance solvers in the literature, and then define a set of basic operations that span these scheduling choices, thereby defining a large scheduling space. Users can either specify code transformations in a scheduling language or use an autoscheduler. Thallo takes as input a compact, shader-like representation of an energy function and a (potentially auto-generated) schedule, translating the combination into high-performance GPU solvers. Since Thallo can generate solvers from a large scheduling space, it can handle a large set of large-scale non-linear and non-smooth problems with various degrees of non-locality and compute-to-memory ratios, including diverse applications such as bundle adjustment, face blendshape fitting, and spatially-varying Poisson deconvolution, as seen in Figure 1. Abstracting schedules from the optimization, we outperform state-of-the-art GPU-based optimization DSLs by an average of 16× across all applications introduced in this work, and even some published hand-written GPU solvers by 30%+.
AB - Large-scale optimization problems at the core of many graphics, vision, and imaging applications are often implemented by hand in tedious and error-prone processes in order to achieve high performance (in particular on GPUs), despite recent developments in libraries and DSLs. At the same time, these hand-crafted solver implementations reveal that the key for high performance is a problem-specific schedule that enables efficient usage of the underlying hardware. In this work, we incorporate this insight into Thallo, a domain-specific language for large-scale non-linear least squares optimization problems. We observe various code reorganizations performed by implementers of high-performance solvers in the literature, and then define a set of basic operations that span these scheduling choices, thereby defining a large scheduling space. Users can either specify code transformations in a scheduling language or use an autoscheduler. Thallo takes as input a compact, shader-like representation of an energy function and a (potentially auto-generated) schedule, translating the combination into high-performance GPU solvers. Since Thallo can generate solvers from a large scheduling space, it can handle a large set of large-scale non-linear and non-smooth problems with various degrees of non-locality and compute-to-memory ratios, including diverse applications such as bundle adjustment, face blendshape fitting, and spatially-varying Poisson deconvolution, as seen in Figure 1. Abstracting schedules from the optimization, we outperform state-of-the-art GPU-based optimization DSLs by an average of 16× across all applications introduced in this work, and even some published hand-written GPU solvers by 30%+.
KW - 3D Reconstruction
KW - DSL
KW - GPU
KW - non-linear least-squares
KW - optimization
KW - scheduling
UR - http://www.scopus.com/inward/record.url?scp=85124107295&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85124107295&partnerID=8YFLogxK
U2 - 10.1145/3453986
DO - 10.1145/3453986
M3 - Article
AN - SCOPUS:85124107295
SN - 0730-0301
VL - 40
JO - ACM Transactions on Graphics
JF - ACM Transactions on Graphics
IS - 5
M1 - 3453986
ER -