TY - JOUR
T1 - A knowledge gradient policy for sequencing experiments to identify the structure of RNA molecules using a sparse additive belief model
AU - Li, Yan
AU - Reyes, Kristofer G.
AU - Vazquez-Anderson, Jorge
AU - Wang, Yingfei
AU - Contreras, Lydia M.
AU - Powell, Warren Buckler
N1 - Funding Information:
The authors thank the Welch Foundation [Grant F-756]; the Air Force Office of Scientific Research (AFOSR) Young Investigator program (FA9550-13-1-0160); and the Consejo Nacional de Ciencia y Tecnología for the graduate 445 fellowship granted to J.V.A (CONACYT-94638). This research was supported in part by AFOSR [Grant FA9550-12-1-0200] for Program on Optimization and Discrete Mathematics, with valuable support from the program on Natural Materials and Systems. The authors thank Dr. Rick Russell for kindly providing the raw in vitro DMS footprinting data used in this work and previously published in Russell et al. (2006).
Publisher Copyright:
Copyright: © 2018 INFORMS
PY - 2018
Y1 - 2018
N2 - We present a sparse knowledge gradient (SpKG) algorithm for adaptively selecting the targeted regions within a large RNA molecule to identify which regions are most amenable to interactions with other molecules. Experimentally, such regions can be inferred from fluorescence measurements obtained by binding a complementary probe with fluorescence markers to the targeted regions. We perform a regularized, sparse linear model with a log link function where the marginal contribution to the thermodynamic cycle of each nucleotide is purely additive. The SpKG algorithm uniquely combines the Bayesian ranking and selection problem with the frequentist l 1 regularized regression approach Lasso. We use this algorithm to identify the sparsity pattern of the linear model as well as sequentially decide the best regions to test before exhausting an experimental budget. We also develop two new algorithms: batch SpKG and batch SpKG-LM. The first algorithm generates more suggestions sequentially to run parallel experiments. The second one dynamically adds new alternatives, in the form of types of probes, which are created by inserting, deleting, or mutating nucleotides within existing probes. In simulation, we demonstrate these algorithms on the Tetrahymena Group I intron (a midsize RNA molecule), showing that they efficiently learn the correct sparsity pattern, identify the most accessible region, and outperform several other policies.
AB - We present a sparse knowledge gradient (SpKG) algorithm for adaptively selecting the targeted regions within a large RNA molecule to identify which regions are most amenable to interactions with other molecules. Experimentally, such regions can be inferred from fluorescence measurements obtained by binding a complementary probe with fluorescence markers to the targeted regions. We perform a regularized, sparse linear model with a log link function where the marginal contribution to the thermodynamic cycle of each nucleotide is purely additive. The SpKG algorithm uniquely combines the Bayesian ranking and selection problem with the frequentist l 1 regularized regression approach Lasso. We use this algorithm to identify the sparsity pattern of the linear model as well as sequentially decide the best regions to test before exhausting an experimental budget. We also develop two new algorithms: batch SpKG and batch SpKG-LM. The first algorithm generates more suggestions sequentially to run parallel experiments. The second one dynamically adds new alternatives, in the form of types of probes, which are created by inserting, deleting, or mutating nucleotides within existing probes. In simulation, we demonstrate these algorithms on the Tetrahymena Group I intron (a midsize RNA molecule), showing that they efficiently learn the correct sparsity pattern, identify the most accessible region, and outperform several other policies.
KW - Decision analysis: sequential
KW - Simulation: design of experiments
KW - Statistics: Bayesian
UR - http://www.scopus.com/inward/record.url?scp=85061855117&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85061855117&partnerID=8YFLogxK
U2 - 10.1287/ijoc.2017.0803
DO - 10.1287/ijoc.2017.0803
M3 - Article
AN - SCOPUS:85061855117
SN - 1091-9856
VL - 30
SP - 750
EP - 767
JO - INFORMS Journal on Computing
JF - INFORMS Journal on Computing
IS - 4
ER -