Model Selection, Data Distributions, and Reproducibility

Richard Shiffrin, Suyog Chandramouli

Research output: Chapter in Book/Report/Conference proceeding › Chapter

5 Scopus citations

Abstract

Models offering insights into the infinitely complex universe in which we reside are always wrong but vary along many dimensions of usefulness. It is most often useful in science to prefer models that are simultaneously a good representation of the "truth" and simple (a form of Occam's Razor). However, we cannot match "truth" to models directly. Thus, in a limited experimental domain, we represent "truth" by a distribution of possible experimental outcomes. Inference is carried out by assuming the observed data are a sample from that unknown data distribution. This chapter discusses the factors that govern the variability of that distribution, termed "replication variance," and how those factors do and should influence both model comparison and reproducibility. We present an extension of Bayesian model selection (BMS) that infers posterior probabilities that a given model instance predicts a data distribution that is the best match to the "true" data distribution. We point out close similarities to the other chief method for model comparison, minimum description length (MDL). Finally, we show how posterior probabilities for the data distributions can be used to produce a predicted distribution for a statistic S defined on the data: reproducibility of S can be assessed by comparing the value of S in the replication to the predicted distribution.

Original language: English (US)
Title of host publication: Reproducibility
Subtitle of host publication: Principles, Problems, Practices, and Prospects
Publisher: Wiley
Pages: 115-140
Number of pages: 26
ISBN (Electronic): 9781118865064
ISBN (Print): 9781118864975
DOIs
State: Published - Jan 1 2015
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • General Mathematics
  • General Social Sciences
