TY - GEN
T1 - Stochastic policy gradient reinforcement learning on a simple 3D biped
AU - Tedrake, Russ
AU - Zhang, Teresa Weirui
AU - Seung, Hyunjune Sebastian
N1 - Copyright:
Copyright 2011 Elsevier B.V., All rights reserved.
PY - 2004
Y1 - 2004
N2 - We present a learning system which is able to quickly and reliably acquire a robust feedback control policy for 3D dynamic walking from a blank slate using only trials implemented on our physical robot. The robot begins walking within a minute and learning converges in approximately 20 minutes. This success can be attributed to the mechanics of our robot, which are modeled after a passive dynamic walker, and to a dramatic reduction in the dimensionality of the learning problem. We reduce the dimensionality by designing a robot with only 6 internal degrees of freedom and 4 actuators, by decomposing the control system in the frontal and sagittal planes, and by formulating the learning problem on the discrete return map dynamics. We apply a stochastic policy gradient algorithm to this reduced problem and decrease the variance of the update using a state-based estimate of the expected cost. This optimized learning system works quickly enough that the robot is able to continually adapt to the terrain as it walks.
UR - http://www.scopus.com/inward/record.url?scp=14044262287&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=14044262287&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:14044262287
SN - 0780384636
SN - 9780780384637
T3 - 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
SP - 2849
EP - 2854
BT - 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
T2 - 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Y2 - 28 September 2004 through 2 October 2004
ER -