TY - GEN
T1 - Learning from Interventions
T2 - 16th Robotics: Science and Systems, RSS 2020
AU - Spencer, Jonathan
AU - Choudhury, Sanjiban
AU - Barnes, Matthew
AU - Schmittle, Matthew
AU - Chiang, Mung
AU - Ramadge, Peter
AU - Srinivasa, Siddhartha
N1 - Funding Information:
This work was (partially) funded by the DARPA Dispersed Computing program, NIH R01 (#R01EB019335), NSF CPS (#1544797), NSF NRI (#1637748), the Office of Naval Research, RCTA, Amazon, and Honda Research Institute USA.
Publisher Copyright:
© 2020, MIT Press Journals. All rights reserved.
PY - 2020
Y1 - 2020
N2 - Scalable robot learning from seamless human-robot interaction is critical if robots are to solve a multitude of tasks in the real world. Current approaches to imitation learning suffer from one of two drawbacks. On the one hand, they rely solely on off-policy human demonstration, which in some cases leads to a mismatch in train-test distribution. On the other, they burden the human with labeling every state the learner visits, rendering them impractical in many applications. We argue that learning interactively from expert interventions enjoys the best of both worlds. Our key insight is that any amount of expert feedback, whether by intervention or non-intervention, provides information about the quality of the current state, the optimality of the action, or both. We formalize this as a constraint on the learner’s value function, which we can efficiently learn using no-regret online learning techniques. We call our approach Expert Intervention Learning (EIL), and evaluate it on a real and simulated driving task with a human expert, where it learns collision avoidance from scratch with just a few hundred samples (about one minute) of expert control.
AB - Scalable robot learning from seamless human-robot interaction is critical if robots are to solve a multitude of tasks in the real world. Current approaches to imitation learning suffer from one of two drawbacks. On the one hand, they rely solely on off-policy human demonstration, which in some cases leads to a mismatch in train-test distribution. On the other, they burden the human with labeling every state the learner visits, rendering them impractical in many applications. We argue that learning interactively from expert interventions enjoys the best of both worlds. Our key insight is that any amount of expert feedback, whether by intervention or non-intervention, provides information about the quality of the current state, the optimality of the action, or both. We formalize this as a constraint on the learner’s value function, which we can efficiently learn using no-regret online learning techniques. We call our approach Expert Intervention Learning (EIL), and evaluate it on a real and simulated driving task with a human expert, where it learns collision avoidance from scratch with just a few hundred samples (about one minute) of expert control.
UR - http://www.scopus.com/inward/record.url?scp=85127945884&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85127945884&partnerID=8YFLogxK
U2 - 10.15607/RSS.2020.XVI.055
DO - 10.15607/RSS.2020.XVI.055
M3 - Conference contribution
AN - SCOPUS:85127945884
SN - 9780992374761
T3 - Robotics: Science and Systems
BT - Robotics: Science and Systems
A2 - Toussaint, Marc
A2 - Bicchi, Antonio
A2 - Hermans, Tucker
PB - MIT Press Journals
Y2 - 12 July 2020 through 16 July 2020
ER -