Practical Bayesian optimization of machine learning algorithms

Jasper Snoek, Hugo Larochelle, Ryan P. Adams

Research output: Chapter in Book/Report/Conference proceedingConference contribution

5216 Scopus citations

Abstract

The use of machine learning algorithms frequently involves careful tuning of learning parameters and model hyperparameters. Unfortunately, this tuning is often a "black art" requiring expert experience, rules of thumb, or sometimes bruteforce search. There is therefore great appeal for automatic approaches that can optimize the performance of any given learning algorithm to the problem at hand. In this work, we consider this problem through the framework of Bayesian optimization, in which a learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP). We show that certain choices for the nature of the GP, such as the type of kernel and the treatment of its hyperparameters, can play a crucial role in obtaining a good optimizer that can achieve expertlevel performance. We describe new algorithms that take into account the variable cost (duration) of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks.

Original languageEnglish (US)
Title of host publicationAdvances in Neural Information Processing Systems 25
Subtitle of host publication26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012
Pages2951-2959
Number of pages9
StatePublished - 2012
Externally publishedYes
Event26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012 - Lake Tahoe, NV, United States
Duration: Dec 3 2012Dec 6 2012

Publication series

NameAdvances in Neural Information Processing Systems
Volume4
ISSN (Print)1049-5258

Other

Other26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012
Country/TerritoryUnited States
CityLake Tahoe, NV
Period12/3/1212/6/12

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Fingerprint

Dive into the research topics of 'Practical Bayesian optimization of machine learning algorithms'. Together they form a unique fingerprint.

Cite this