TY - JOUR
T1 - Predictability, complexity, and learning
AU - Bialek, William
AU - Nemenman, Ilya
AU - Tishby, Naftali
PY - 2001/11
Y1 - 2001/11
AB - We define predictive information I_pred(T) as the mutual information between the past and the future of a time series. Three qualitatively different behaviors are found in the limit of large observation times T: I_pred(T) can remain finite, grow logarithmically, or grow as a fractional power law. If the time series allows us to learn a model with a finite number of parameters, then I_pred(T) grows logarithmically with a coefficient that counts the dimensionality of the model space. In contrast, power-law growth is associated, for example, with the learning of infinite-parameter (or nonparametric) models, such as continuous functions with smoothness constraints. There are connections between the predictive information and measures of complexity that have been defined both in learning theory and in the analysis of physical systems through statistical mechanics and dynamical systems theory. Furthermore, in the same way that entropy provides the unique measure of available information consistent with some simple and plausible conditions, we argue that the divergent part of I_pred(T) provides the unique measure of the complexity of the dynamics underlying a time series. Finally, we discuss how these ideas may be useful in problems in physics, statistics, and biology.
UR - http://www.scopus.com/inward/record.url?scp=0035514587&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0035514587&partnerID=8YFLogxK
U2 - 10.1162/089976601753195969
DO - 10.1162/089976601753195969
M3 - Article
C2 - 11674845
AN - SCOPUS:0035514587
VL - 13
SP - 2409
EP - 2463
JO - Neural Computation
JF - Neural Computation
SN - 0899-7667
IS - 11
ER -
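
The abstract above defines predictive information I_pred(T) as the mutual information between the past and the future of a time series, and lists three asymptotic behaviors: finite, logarithmic, and power-law growth. As a companion to the record, here is a minimal numerical sketch of the finite case: a naive plug-in estimate of I_pred(T) for a simulated binary Markov chain. This is our illustration, not code from the paper; the parameter p_stay, the window lengths T, and all function names are our own choices, and the plug-in estimator is known to be biased when the number of window states approaches the sample size.

import numpy as np

def simulate_markov(p_stay, n, rng):
    # Binary chain: repeat the previous symbol with probability p_stay, else flip it.
    flips = (rng.random(n) > p_stay).astype(int)
    x = np.empty(n, dtype=int)
    x[0] = rng.integers(2)
    for t in range(1, n):
        x[t] = x[t - 1] ^ flips[t]
    return x

def plug_in_mi(pairs):
    # Naive (plug-in) mutual information, in bits, from a list of (a, b) samples:
    # I = sum over (a, b) of p(a, b) * log2( p(a, b) / (p(a) * p(b)) ).
    n = len(pairs)
    joint, pa, pb = {}, {}, {}
    for a, b in pairs:
        joint[(a, b)] = joint.get((a, b), 0) + 1
        pa[a] = pa.get(a, 0) + 1
        pb[b] = pb.get(b, 0) + 1
    return sum(c / n * np.log2(c * n / (pa[a] * pb[b]))
               for (a, b), c in joint.items())

def predictive_information(x, T):
    # I_pred(T): mutual information between the length-T past window
    # and the length-T future window, pooled over all times t.
    pairs = [(tuple(x[t - T:t]), tuple(x[t:t + T]))
             for t in range(T, len(x) - T + 1)]
    return plug_in_mi(pairs)

rng = np.random.default_rng(0)
x = simulate_markov(p_stay=0.9, n=100_000, rng=rng)
for T in (1, 2, 4, 6):
    print(f"T={T}  I_pred ~ {predictive_information(x, T):.3f} bits")

For a Markov chain the printed estimates should level off as T grows, consistent with the finite-I_pred behavior the abstract describes; the logarithmic and power-law regimes arise instead from parameter learning and from nonparametric model classes, as discussed in the paper.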