TY - GEN
T1 - Scalable Bayesian optimization using deep neural networks
AU - Snoek, Jasper
AU - Rippel, Oren
AU - Swersky, Kevin
AU - Kiros, Ryan
AU - Satish, Nadathur
AU - Sundaram, Narayanan
AU - Patwary, Md Mostofa Ali
AU - Prabhat,
AU - Adams, Ryan P.
N1 - Funding Information:
This work was supported by the Director, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This work used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. We would like to acknowledge the NERSC systems staff, in particular Helen He and Harvey Wasserman, for providing us with generous access to the Babbage Xeon Phi testbed. The image caption generation computations in this paper were run on the Odyssey cluster supported by the FAS Division of Science, Research Computing Group at Harvard University. We would like to acknowledge the FASRC staff and in particular James Cuff for providing generous access to Odyssey. Jasper Snoek is a fellow in the Harvard Center for Research on Computation and Society. Kevin Swersky is the recipient of an Ontario Graduate Scholarship (OGS). This work was partially funded by NSF IIS-1421780, the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Canadian Institute for Advanced Research (CIFAR).
Publisher Copyright:
© Copyright 2015 by International Machine Learning Society (IMLS). All rights reserved.
PY - 2015
Y1 - 2015
AB - Bayesian optimization is an effective methodology for the global optimization of functions with expensive evaluations. It relies on querying a distribution over functions defined by a relatively cheap surrogate model. An accurate model for this distribution over functions is critical to the effectiveness of the approach, and is typically fit using Gaussian processes (GPs). However, since GPs scale cubically with the number of observations, it has been challenging to handle objectives whose optimization requires many evaluations and, as such, to massively parallelize the optimization. In this work, we explore the use of neural networks as an alternative to GPs for modeling distributions over functions. We show that performing adaptive basis function regression with a neural network as the parametric form performs competitively with state-of-the-art GP-based approaches, but scales linearly, rather than cubically, with the number of observations. This allows us to achieve a previously intractable degree of parallelism, which we apply to large-scale hyperparameter optimization, rapidly finding competitive models on benchmark object recognition tasks using convolutional networks, and on image caption generation using neural language models.
UR - http://www.scopus.com/inward/record.url?scp=84970022032&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84970022032&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84970022032
T3 - 32nd International Conference on Machine Learning, ICML 2015
SP - 2161
EP - 2170
BT - 32nd International Conference on Machine Learning, ICML 2015
A2 - Bach, Francis
A2 - Blei, David
PB - International Machine Learning Society (IMLS)
T2 - 32nd International Conference on Machine Learning, ICML 2015
Y2 - 6 July 2015 through 11 July 2015
ER -