TY - GEN

T1 - Efficient optimization of loops and limits with randomized telescoping sums

AU - Beatson, Alex

AU - Adams, Ryan P.

PY - 2019/1/1

Y1 - 2019/1/1

AB - We consider optimization problems in which the objective requires an inner loop with many steps or is the limit of a sequence of increasingly costly approximations. Meta-learning, training recurrent neural networks, and optimization of the solutions to differential equations are all examples of optimization problems with this character. In such problems, it can be expensive to compute the objective function value and its gradient, but truncating the loop or using less accurate approximations can induce biases that damage the overall solution. We propose randomized telescope (RT) gradient estimators, which represent the objective as the sum of a telescoping series and sample linear combinations of terms to provide cheap unbiased gradient estimates. We identify conditions under which RT estimators achieve optimization convergence rates independent of the length of the loop or the required accuracy of the approximation. We also derive a method for tuning RT estimators online to maximize a lower bound on the expected decrease in loss per unit of computation. We evaluate our adaptive RT estimators on a range of applications including meta-optimization of learning rates, variational inference of ODE parameters, and training an LSTM to model long sequences.

UR - http://www.scopus.com/inward/record.url?scp=85077957447&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85077957447&partnerID=8YFLogxK

M3 - Conference contribution

T3 - 36th International Conference on Machine Learning, ICML 2019

SP - 836

EP - 854

BT - 36th International Conference on Machine Learning, ICML 2019

PB - International Machine Learning Society (IMLS)

T2 - 36th International Conference on Machine Learning, ICML 2019

Y2 - 9 June 2019 through 15 June 2019

ER -