TY - JOUR
T1 - Statistical Learning of Discrete States in Time Series
AU - Li, Hao
AU - Yang, Haw
N1 - Funding Information:
The authors thank the Gordon and Betty Moore Foundation (Grant GBMF4741) and Princeton University for financial support. We thank an anonymous reviewer for the suggestion to include the HMM analysis and the Gaussian-mixture model fitting for comparison.
Publisher Copyright:
© 2018 American Chemical Society.
PY - 2019/1/24
Y1 - 2019/1/24
N2 - Time series obtained from time-dependent experiments contain rich information on the kinetics and dynamics of the system under investigation. This work describes an unsupervised learning framework, along with the derivation of the necessary analytical expressions, for the analysis of Gaussian-distributed time series that exhibit discrete states. After the time series has been partitioned into segments in a model-free manner using the previously developed change-point (CP) method, this protocol starts with an agglomerative hierarchical clustering algorithm to classify the detected segments into possible states. The initial state clustering is further refined using an expectation-maximization (EM) procedure, and the number of states is determined by a Bayesian information criterion (BIC). Also introduced here is an achievement scalarization function, usually seen in the artificial intelligence literature, for quantitatively assessing the performance of state determination. The statistical learning framework, which comprises three stages (detection of signal change, clustering, and number-of-state determination), was thoroughly characterized using simulated trajectories with random intensity segments that have no underlying kinetics, and its performance was critically evaluated. The application to experimental data is also demonstrated. The results suggest that this general framework, the implementation of which is based on firm theoretical foundations and does not require the imposition of any kinetics model, is powerful in determining the number of states, the parameters contained in each state, and the associated statistical significance.
AB - Time series obtained from time-dependent experiments contain rich information on the kinetics and dynamics of the system under investigation. This work describes an unsupervised learning framework, along with the derivation of the necessary analytical expressions, for the analysis of Gaussian-distributed time series that exhibit discrete states. After the time series has been partitioned into segments in a model-free manner using the previously developed change-point (CP) method, this protocol starts with an agglomerative hierarchical clustering algorithm to classify the detected segments into possible states. The initial state clustering is further refined using an expectation-maximization (EM) procedure, and the number of states is determined by a Bayesian information criterion (BIC). Also introduced here is an achievement scalarization function, usually seen in the artificial intelligence literature, for quantitatively assessing the performance of state determination. The statistical learning framework, which comprises three stages (detection of signal change, clustering, and number-of-state determination), was thoroughly characterized using simulated trajectories with random intensity segments that have no underlying kinetics, and its performance was critically evaluated. The application to experimental data is also demonstrated. The results suggest that this general framework, the implementation of which is based on firm theoretical foundations and does not require the imposition of any kinetics model, is powerful in determining the number of states, the parameters contained in each state, and the associated statistical significance.
UR - http://www.scopus.com/inward/record.url?scp=85060030069&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85060030069&partnerID=8YFLogxK
U2 - 10.1021/acs.jpcb.8b10561
DO - 10.1021/acs.jpcb.8b10561
M3 - Article
C2 - 30632755
AN - SCOPUS:85060030069
SN - 1089-5647
VL - 123
SP - 689
EP - 701
JO - Journal of Physical Chemistry B
JF - Journal of Physical Chemistry B
IS - 3
ER -