TY - JOUR
T1 - Machine Learning for Polymer Design to Enhance Pervaporation-Based Organic Recovery
AU - Yang, Meiqi
AU - Zhu, Jun Jie
AU - McGaughey, Allyson L.
AU - Priestley, Rodney D.
AU - Hoek, Eric M.V.
AU - Jassby, David
AU - Ren, Zhiyong Jason
N1 - Publisher Copyright: © 2024 American Chemical Society.
PY - 2024/6/11
Y1 - 2024/6/11
N2 - Pervaporation (PV) is an effective membrane separation process for organic dehydration, recovery, and upgrading. However, it is crucial to improve membrane materials beyond the current permeability-selectivity trade-off. In this research, we introduce machine learning (ML) models to identify high-potential polymers, greatly improving the efficiency and reducing cost compared to the conventional trial-and-error approach. We utilized the largest PV data set to date and incorporated polymer fingerprints and features, including membrane structure, operating conditions, and solute properties. Dimensionality reduction, missing data treatment, seed randomness, and data leakage management were employed to ensure model robustness. The optimized LightGBM models achieved RMSE of 0.447 and 0.360 for separation factor and total flux, respectively (logarithmic scale). Screening approximately 1 million hypothetical polymers with ML models resulted in identifying polymers with a predicted permeation separation index >30 and synthetic accessibility score <3.7 for acetic acid extraction. This study demonstrates the promise of ML to accelerate tailored membrane designs.
KW - LightGBM
KW - SHAP
KW - data leakage management
KW - machine learning
KW - membrane
KW - pervaporation
KW - wastewater
UR - http://www.scopus.com/inward/record.url?scp=85193743132&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85193743132&partnerID=8YFLogxK
U2 - 10.1021/acs.est.4c00060
DO - 10.1021/acs.est.4c00060
M3 - Article
C2 - 38743597
AN - SCOPUS:85193743132
SN - 0013-936X
VL - 58
SP - 10128
EP - 10139
JO - Environmental Science and Technology
JF - Environmental Science and Technology
IS - 23
ER -