TY - JOUR
T1 - Predicting Extraction Selectivity of Acetic Acid in Pervaporation by Machine Learning Models with Data Leakage Management
AU - Yang, Meiqi
AU - Zhu, Jun Jie
AU - McGaughey, Allyson
AU - Zheng, Sunxiang
AU - Priestley, Rodney D.
AU - Ren, Zhiyong Jason
N1 - Publisher Copyright:
© 2023 American Chemical Society.
PY - 2023/4/11
Y1 - 2023/4/11
N2 - The extraction of acetic acid and other carboxylic acids from water is an emerging separation need, as these acids are increasingly produced from waste organics and CO2 during carbon valorization. However, the traditional experimental approach can be slow and expensive, and machine learning (ML) may provide new insights and guidance in membrane development for organic acid extraction. In this study, we collected extensive literature data and developed the first ML models for predicting separation factors between acetic acid and water in pervaporation from polymers’ properties, membrane morphology, fabrication parameters, and operating conditions. Importantly, we assessed seed randomness and data leakage problems during model development; these problems have been overlooked in ML studies but lead to over-optimistic results and misinterpreted variable importance. With proper data leakage management, we established a robust model and achieved a root-mean-square error of 0.515 using the CatBoost regression model. In addition, the prediction model was interpreted to elucidate the variables’ importance, and the mass ratio was the most important variable in predicting separation factors. Furthermore, polymers’ concentration and membranes’ effective area contributed to information leakage. These results demonstrate the advances ML models offer in membrane design and fabrication and the importance of rigorous model validation.
KW - acetic acid
KW - data leakage management
KW - machine learning
KW - pervaporation
KW - separation factor
UR - http://www.scopus.com/inward/record.url?scp=85151322288&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85151322288&partnerID=8YFLogxK
U2 - 10.1021/acs.est.2c06382
DO - 10.1021/acs.est.2c06382
M3 - Article
C2 - 36972410
AN - SCOPUS:85151322288
SN - 0013-936X
VL - 57
SP - 5934
EP - 5946
JO - Environmental Science and Technology
JF - Environmental Science and Technology
IS - 14
ER -