The Worst-Case Data-Generating Probability Measure in Statistical Learning

Xinying Zou, Samir M. Perlaza, Inaki Esnaola, Eitan Altman, H. Vincent Poor

Research output: Contribution to journal › Article › peer-review


Abstract

The worst-case data-generating (WCDG) probability measure is introduced as a tool for characterizing the generalization capabilities of machine learning algorithms. The WCDG probability measure is shown to be the unique solution to two different optimization problems: (a) the maximization of the expected loss over the set of probability measures on the datasets whose relative entropy with respect to a reference measure is not larger than a given threshold; and (b) the maximization of the expected loss regularized by the relative entropy with respect to the reference measure. The reference measure can be interpreted as a prior on the datasets. The cumulants of the WCDG measure are finite and bounded in terms of the cumulants of the reference measure. To analyze the concentration of the expected empirical risk induced by the WCDG probability measure, the notion of (ε, δ)-robustness of models is introduced. Closed-form expressions are presented for the sensitivity of the expected loss for a fixed model. These results lead to a novel expression for the generalization error of arbitrary machine learning algorithms. This exact expression is given in terms of the WCDG probability measure and yields an upper bound equal, up to a constant factor, to the sum of the mutual information and the lautum information between the models and the datasets. This upper bound is achieved by a Gibbs algorithm, which reveals that analyzing the generalization error of the Gibbs algorithm yields insights applicable to any machine learning algorithm.
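The regularized problem (b) in the abstract admits a classical closed-form solution: the measure maximizing the expected loss minus a relative-entropy penalty is an exponential tilting of the reference measure by the loss. The following is a minimal finite-support sketch of this fact (illustrative only; the variable names, the toy support, and the regularization parameter `lam` are assumptions, not the paper's notation or code):

```python
import numpy as np

def wcdg_measure(q, loss, lam):
    """Exponentially tilted reference measure q: the maximizer of
    E_P[loss] - lam * KL(P || Q) on a finite support."""
    w = q * np.exp(loss / lam)
    return w / w.sum()

def objective(p, q, loss, lam):
    """Expected loss with relative-entropy regularization."""
    kl = np.sum(p * np.log(p / q))
    return p @ loss - lam * kl

# Toy reference measure and per-dataset losses on a support of 5 points.
q = np.full(5, 0.2)
loss = np.array([0.1, 0.5, 0.9, 0.3, 0.7])
lam = 0.5

p_star = wcdg_measure(q, loss, lam)

# The tilted measure attains a regularized objective at least as large
# as that of random alternative measures on the same support.
rng = np.random.default_rng(0)
for _ in range(100):
    p = rng.dirichlet(np.ones(5))
    assert objective(p_star, q, loss, lam) >= objective(p, q, loss, lam) - 1e-9
```

Note how larger losses receive exponentially more probability mass, and the penalty weight `lam` controls how far the worst-case measure drifts from the reference prior.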

Original language: English (US)
Pages (from-to): 175-189
Number of pages: 15
Journal: IEEE Journal on Selected Areas in Information Theory
Volume: 5
DOIs
State: Published - 2024
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Media Technology
  • Artificial Intelligence
  • Applied Mathematics

Keywords

  • Gibbs algorithm
  • Supervised machine learning
  • generalization gap
  • relative entropy
  • sensitivity
  • worst-case
