Input warping for Bayesian optimization of non-stationary functions

Jasper Snoek, Kevin Swersky, Richard Zemel, Ryan P. Adams

Research output: Chapter in Book/Report/Conference proceedingConference contribution

33 Scopus citations

Abstract

Bayesian optimization has proven to be a highly effective methodology for the global optimization of unknown, expensive and multimodal functions. The ability to accurately model distributions over functions is critical to the effectiveness of Bayesian optimization. Although Gaussian processes provide a flexible prior over functions, there are various classes of functions that remain difficult to model. One of the most frequently occurring of these is the class of non-stationary functions. The optimization of the hyperparameters of machine learning algorithms is a problem domain in which parameters are often manually transformed a priori, for example by optimizing in "log-space," to mitigate the effects of spatially-varying length scale. We develop a methodology for automatically learning a wide family of bijective transformations or warpings of the input space using the Beta cumulative distribution function. We further extend the warping framework to multi-task Bayesian optimization so that multiple tasks can be warped into a jointly stationary space. On a set of challenging benchmark optimization tasks, we observe that the inclusion of warping greatly improves on the state-of-the-art, producing better results faster and more reliably.

Original languageEnglish (US)
Title of host publication31st International Conference on Machine Learning, ICML 2014
PublisherInternational Machine Learning Society (IMLS)
Pages3654-3662
Number of pages9
ISBN (Electronic)9781634393973
StatePublished - Jan 1 2014
Externally publishedYes
Event31st International Conference on Machine Learning, ICML 2014 - Beijing, China
Duration: Jun 21 2014Jun 26 2014

Publication series

Name31st International Conference on Machine Learning, ICML 2014
Volume5

Other

Other31st International Conference on Machine Learning, ICML 2014
CountryChina
CityBeijing
Period6/21/146/26/14

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Software

Fingerprint Dive into the research topics of 'Input warping for Bayesian optimization of non-stationary functions'. Together they form a unique fingerprint.

Cite this