TY - JOUR
T1 - Provably correct automatic subdifferentiation for qualified programs
AU - Kakade, Sham M.
AU - Lee, Jason D.
N1 - Funding Information:
Acknowledgments: We thank Dima Drusvyatskiy for many helpful discussions. Sham Kakade acknowledges funding from the Washington Research Foundation Fund for Innovation in Data-Intensive Discovery, the NSF through award CCF-1740551, and ONR award N00014-18-1-2247. Jason D. Lee acknowledges support of the ARO under MURI Award W911NF-11-1-0303. This work is part of the collaboration between the US DOD, the UK MOD, and the UK Engineering and Physical Sciences Research Council (EPSRC) under the Multidisciplinary University Research Initiative.
Publisher Copyright:
© 2018 Curran Associates Inc. All rights reserved.
PY - 2018
Y1 - 2018
N2 - The Cheap Gradient Principle [Griewank and Walther, 2008] states that the computational cost of computing the gradient of a scalar-valued function is nearly the same (often within a factor of 5) as that of computing the function itself. This principle is of central importance in optimization: it allows us to quickly obtain (high-dimensional) gradients of scalar loss functions, which are subsequently used in black-box gradient-based optimization procedures. The current state of affairs is markedly different with regard to computing subderivatives: widely used ML libraries, including TensorFlow and PyTorch, do not correctly compute (generalized) subderivatives even on simple examples. This work considers the question: is there a Cheap Subgradient Principle? Our main result shows that, under certain restrictions on our library of nonsmooth functions (standard in nonlinear programming), provably correct generalized subderivatives can be computed at a computational cost within a (dimension-free) factor of 6 of the cost of computing the scalar function itself.
AB - The Cheap Gradient Principle [Griewank and Walther, 2008] states that the computational cost of computing the gradient of a scalar-valued function is nearly the same (often within a factor of 5) as that of computing the function itself. This principle is of central importance in optimization: it allows us to quickly obtain (high-dimensional) gradients of scalar loss functions, which are subsequently used in black-box gradient-based optimization procedures. The current state of affairs is markedly different with regard to computing subderivatives: widely used ML libraries, including TensorFlow and PyTorch, do not correctly compute (generalized) subderivatives even on simple examples. This work considers the question: is there a Cheap Subgradient Principle? Our main result shows that, under certain restrictions on our library of nonsmooth functions (standard in nonlinear programming), provably correct generalized subderivatives can be computed at a computational cost within a (dimension-free) factor of 6 of the cost of computing the scalar function itself.
UR - http://www.scopus.com/inward/record.url?scp=85064833535&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85064833535&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85064833535
SN - 1049-5258
VL - 2018-December
SP - 7125
EP - 7135
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 32nd Conference on Neural Information Processing Systems, NeurIPS 2018
Y2 - 2 December 2018 through 8 December 2018
ER -