Maximum entropy density estimation with generalized regularization and an application to species distribution modeling

Miroslav Dudík, Steven J. Phillips, Robert E. Schapire

Research output: Contribution to journal › Article › peer-review

163 Scopus citations

Abstract

We present a unified and complete account of maximum entropy density estimation subject to constraints represented by convex potential functions or, alternatively, by convex regularization. We provide fully general performance guarantees and an algorithm with a complete convergence proof. As special cases, we easily derive performance guarantees for many known regularization types, including ℓ₁, ℓ₂, ℓ₂², and ℓ₁ + ℓ₂² style regularization. We propose an algorithm solving a large and general subclass of generalized maximum entropy problems, including all discussed in the paper, and prove its convergence. Our approach generalizes and unifies techniques based on information geometry and Bregman divergences as well as those based more directly on compactness. Our work is motivated by a novel application of maximum entropy to species distribution modeling, an important problem in conservation biology and ecology. In a set of experiments on real-world data, we demonstrate the utility of maximum entropy in this setting. We explore effects of different feature types, sample sizes, and regularization levels on the performance of maxent, and discuss interpretability of the resulting models.
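
To make the ℓ₁ case concrete, the following is a minimal sketch under stated assumptions, not the algorithm proposed in the paper. It assumes a small finite domain, a uniform default distribution, and the dual view of ℓ₁-regularized maxent: fit a Gibbs distribution q(x) ∝ exp(λ·f(x)) by minimizing log Z(λ) − λ·μ̃ + β‖λ‖₁, where μ̃ holds the empirical feature means. The solver is plain proximal gradient descent (soft thresholding), swapped in for brevity. The toy domain, features, sample indices, and the names `gibbs`, `soft_threshold`, and `fit_l1_maxent` are all hypothetical.

```python
import math

def gibbs(lam, feats):
    """Gibbs distribution q(x) proportional to exp(lam . f(x)) on a finite domain."""
    scores = [sum(l * f for l, f in zip(lam, fx)) for fx in feats]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]

def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def fit_l1_maxent(feats, sample_idx, beta, step=1.0, iters=2000):
    """Minimize log Z(lam) - lam . emp + beta * ||lam||_1 by proximal gradient.

    Assumes features lie in [0, 1] and there are few of them, so a unit
    step size stays below the inverse Lipschitz constant of the smooth part.
    """
    m = len(feats[0])
    # Empirical feature means computed from the observed samples.
    emp = [sum(feats[i][j] for i in sample_idx) / len(sample_idx) for j in range(m)]
    lam = [0.0] * m
    for _ in range(iters):
        q = gibbs(lam, feats)
        # Feature expectations under the current model.
        model = [sum(q[x] * feats[x][j] for x in range(len(feats))) for j in range(m)]
        # Gradient of the smooth part is (model mean - empirical mean).
        lam = [soft_threshold(l - step * (e - mu), step * beta)
               for l, e, mu in zip(lam, model, emp)]
    return lam, gibbs(lam, feats), emp

# Hypothetical toy problem: 5 sites, a linear feature and a binary feature.
feats = [[x / 4.0, 1.0 if x in (1, 3) else 0.0] for x in range(5)]
sample_idx = [4, 3, 4, 2, 4, 3, 1, 4]  # indices of sites with observations
beta = 0.05
lam, q, emp = fit_l1_maxent(feats, sample_idx, beta)
print("weights:", lam)
print("fitted distribution:", q)
```

At the optimum, each model feature mean lies within β of its empirical mean, which is the relaxed-constraint reading of ℓ₁ regularization; features whose constraint is slack receive weight zero.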

Original language: English (US)
Pages (from-to): 1217-1260
Number of pages: 44
Journal: Journal of Machine Learning Research
Volume: 8
State: Published - Jun 2007
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Software
  • Statistics and Probability
  • Artificial Intelligence

Keywords

  • Density estimation
  • Iterative scaling
  • Maximum entropy
  • Regularization
  • Species distribution modeling
