TY - JOUR
T1 - Tree Search-Based Evolutionary Bandits for Protein Sequence Optimization
AU - Qiu, Jiahao
AU - Yuan, Hui
AU - Zhang, Jinghong
AU - Chen, Wentao
AU - Wang, Huazheng
AU - Wang, Mengdi
N1 - Publisher Copyright:
Copyright © 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2024/3/25
Y1 - 2024/3/25
N2 - While modern biotechnologies allow synthesizing new proteins and function measurements at scale, efficiently exploring a protein sequence space and engineering it remains a daunting task due to the vast sequence space of any given protein. Protein engineering is typically conducted through an iterative process of adding mutations to the wild-type or lead sequences, recombination of mutations, and running new rounds of screening. To enhance the efficiency of such a process, we propose a tree search-based bandit learning method, which expands a tree starting from the initial sequence with the guidance of a bandit machine learning model. Under simplified assumptions and a Gaussian Process prior, we provide theoretical analysis and a Bayesian regret bound, demonstrating that the method can efficiently discover a near-optimal design. The full algorithm is compatible with a suite of randomized tree search heuristics, machine learning models, pre-trained embeddings, and bandit techniques. We test various instances of the algorithm across benchmark protein datasets using simulated screens. Experiment results demonstrate that the algorithm is both sample-efficient, diversity-promoting, and able to find top designs using reasonably small mutation counts.
AB - While modern biotechnologies allow synthesizing new proteins and function measurements at scale, efficiently exploring a protein sequence space and engineering it remains a daunting task due to the vast sequence space of any given protein. Protein engineering is typically conducted through an iterative process of adding mutations to the wild-type or lead sequences, recombination of mutations, and running new rounds of screening. To enhance the efficiency of such a process, we propose a tree search-based bandit learning method, which expands a tree starting from the initial sequence with the guidance of a bandit machine learning model. Under simplified assumptions and a Gaussian Process prior, we provide theoretical analysis and a Bayesian regret bound, demonstrating that the method can efficiently discover a near-optimal design. The full algorithm is compatible with a suite of randomized tree search heuristics, machine learning models, pre-trained embeddings, and bandit techniques. We test various instances of the algorithm across benchmark protein datasets using simulated screens. Experiment results demonstrate that the algorithm is both sample-efficient, diversity-promoting, and able to find top designs using reasonably small mutation counts.
UR - http://www.scopus.com/inward/record.url?scp=85189643355&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85189643355&partnerID=8YFLogxK
U2 - 10.1609/aaai.v38i13.29386
DO - 10.1609/aaai.v38i13.29386
M3 - Conference article
AN - SCOPUS:85189643355
SN - 2159-5399
VL - 38
SP - 14686
EP - 14694
JO - Proceedings of the AAAI Conference on Artificial Intelligence
JF - Proceedings of the AAAI Conference on Artificial Intelligence
IS - 13
T2 - 38th AAAI Conference on Artificial Intelligence, AAAI 2024
Y2 - 20 February 2024 through 27 February 2024
ER -