### Abstract

The goal of imitation learning is for an apprentice to learn how to behave in a stochastic environment by observing a mentor demonstrating the correct behavior. Accurate prior knowledge about the correct behavior can reduce the need for demonstrations from the mentor. We present a novel approach to encoding prior knowledge about the correct behavior, where we assume that this prior knowledge takes the form of a Markov Decision Process (MDP) that is used by the apprentice as a rough and imperfect model of the mentor's behavior. Specifically, taking a Bayesian approach, we treat the value of a policy in this modeling MDP as the log prior probability of the policy. In other words, we assume a priori that the mentor's behavior is likely to be a high-value policy in the modeling MDP, though quite possibly different from the optimal policy. We describe an efficient algorithm that, given a modeling MDP and a set of demonstrations by a mentor, provably converges to a stationary point of the log posterior of the mentor's policy, where the posterior is computed with respect to the "value-based" prior. We also present empirical evidence that this prior does in fact speed learning of the mentor's policy, and is an improvement in our experiments over similar previous methods.
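The abstract's central idea — treating a policy's value in a modeling MDP as its log prior probability, then combining that prior with the likelihood of the mentor's demonstrations — can be sketched as follows. This is a hypothetical illustration, not the paper's algorithm: the random modeling MDP, the softmax policy parameterization, the `prior_weight` scaling, and the finite-difference gradient ascent are all our own assumptions, standing in for the efficient procedure the paper describes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.9

# A small random modeling MDP: the apprentice's rough, imperfect model
# of the mentor's task (hypothetical stand-in for a real model).
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] = next-state dist.
R = rng.uniform(size=(n_states, n_actions))                       # reward for (s, a)

def policy(theta):
    """Softmax policy over actions for each state."""
    e = np.exp(theta - theta.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def policy_value(pi):
    """Discounted value of pi in the modeling MDP, averaged over a uniform start state."""
    P_pi = np.einsum('sa,san->sn', pi, P)   # state-to-state transitions under pi
    r_pi = (pi * R).sum(axis=1)             # expected one-step reward per state
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    return v.mean()

# Mentor demonstrations as (state, action) pairs (fabricated for illustration).
demos = [(0, 1), (1, 0), (2, 1), (3, 0)]

def log_posterior(theta, prior_weight=1.0):
    pi = policy(theta)
    log_prior = prior_weight * policy_value(pi)          # the "value-based" prior
    log_lik = sum(np.log(pi[s, a]) for s, a in demos)    # demonstration likelihood
    return log_prior + log_lik

# Ascend the log posterior by finite-difference gradient steps -- a crude
# stand-in for the paper's provably convergent algorithm.
theta = np.zeros((n_states, n_actions))
eps, lr = 1e-5, 0.1
for _ in range(300):
    grad = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        t = theta.copy()
        t[idx] += eps
        grad[idx] = (log_posterior(t) - log_posterior(theta)) / eps
    theta += lr * grad
```

With `prior_weight = 0` this reduces to plain maximum-likelihood imitation; larger values pull the learned policy toward high-value behavior in the modeling MDP even where demonstrations are sparse, which is the speed-up the abstract claims empirically.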

| Original language | English (US) |
| --- | --- |
| Title of host publication | Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence, UAI 2007 |
| Pages | 384-391 |
| Number of pages | 8 |
| State | Published - Dec 1 2007 |
| Event | 23rd Conference on Uncertainty in Artificial Intelligence, UAI 2007 - Vancouver, BC, Canada. Duration: Jul 19 2007 → Jul 22 2007 |

### Publication series

| Name | Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence, UAI 2007 |
| --- | --- |

### Conference

| Conference | 23rd Conference on Uncertainty in Artificial Intelligence, UAI 2007 |
| --- | --- |
| Country | Canada |
| City | Vancouver, BC |
| Period | 7/19/07 → 7/22/07 |

### All Science Journal Classification (ASJC) codes

- Artificial Intelligence


## Cite this

Imitation learning with a value-based prior. In *Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence, UAI 2007* (pp. 384-391). (Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence, UAI 2007).