TY - GEN
T1 - Distributional Cloning for Stabilized Imitation Learning via ADMM
AU - Zhang, Xin
AU - Li, Yanhua
AU - Zhang, Ziming
AU - Brinton, Christopher G.
AU - Liu, Zhenming
AU - Zhang, Zhi-Li
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
AB - The two leading solution paradigms for imitation learning (IL), behavioral cloning (BC) and generative adversarial imitation learning (GAIL), each suffer from notable drawbacks. BC, a supervised learning approach that mimics expert actions, is vulnerable to covariate shift. GAIL applies adversarial training to minimize the discrepancy between expert and learner behaviors, which makes it prone to unstable training and mode collapse. In this work, we propose Distributional Cloning (DC), a novel IL approach that addresses the covariate shift and mode collapse problems simultaneously. DC directly maximizes the likelihood of observed expert and learner demonstrations and gradually encourages the learner to evolve toward expert behaviors through an averaging effect. The DC solution framework contains two stages in each training loop: in stage one, the mixed expert and learner state distribution is estimated via SoftFlow, and in stage two, the learner policy is trained to match both the expert's policy and state distribution via ADMM. Experimental evaluation of DC against several baselines on 10 different physics-based control tasks reveals superior results in learner policy performance, training stability, and mode distribution preservation.
KW - imitation learning
KW - neural ordinary differential equations
UR - http://www.scopus.com/inward/record.url?scp=85185402652&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85185402652&partnerID=8YFLogxK
U2 - 10.1109/ICDM58522.2023.00091
DO - 10.1109/ICDM58522.2023.00091
M3 - Conference contribution
AN - SCOPUS:85185402652
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 818
EP - 827
BT - Proceedings - 23rd IEEE International Conference on Data Mining, ICDM 2023
A2 - Chen, Guihai
A2 - Khan, Latifur
A2 - Gao, Xiaofeng
A2 - Qiu, Meikang
A2 - Pedrycz, Witold
A2 - Wu, Xindong
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 23rd IEEE International Conference on Data Mining, ICDM 2023
Y2 - 1 December 2023 through 4 December 2023
ER -