Session: Genetics-Based Machine Learning to Evolutionary Machine Learning (06/07, 11:15-13:15, Room 5)

Neuroevolution-Based Inverse Reinforcement Learning



The problem of Learning from Demonstration is that of learning to perform tasks from observed examples. One approach to Learning from Demonstration is Inverse Reinforcement Learning, in which observed actions are used to infer the underlying rewards. This work combines a feature-based state evaluation approach to Inverse Reinforcement Learning with neuroevolution, a paradigm for modifying neural networks based on their performance on a given task. Neural networks are used to learn from a demonstrated expert policy and are evolved to generate a policy similar to the demonstration. The algorithm is discussed and evaluated against competitive feature-based Inverse Reinforcement Learning approaches. At the cost of increased execution time, neural networks allow non-linear combinations of features in state evaluations, which yields closer correspondence to the observed examples than linear combinations do. This work also extends existing work on Bayesian Non-Parametric Feature Construction for Inverse Reinforcement Learning by using non-linear combinations of intermediate data to improve performance. The algorithm is observed to be particularly suitable for linearly solvable non-deterministic Markov Decision Processes in which multiple rewards are sparsely scattered across the state space. This translates to real-world control problems, such as those in robotics and automation (e.g. the robust output tracking problem or controlling an n-joint arm), where the underlying equations can be made linear. A conclusive performance hierarchy among the evaluated algorithms is presented.
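
The sketch below is a minimal illustration of the general idea described in the abstract, not the paper's implementation: a flat genome encodes a small neural network that scores states non-linearly, the greedy policy induced by that evaluator is compared against a demonstrated policy on a toy grid world, and a simple mutation-only loop evolves the weights. All names, features, and hyperparameters (GridWorld size, hidden-layer width, mutation scale) are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

SIZE = 5                                        # 5x5 grid world, goal in the top-right corner
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right
GOAL = (0, SIZE - 1)

def features(state):
    # Simple state features: normalised coordinates and distance to the goal.
    r, c = state
    return np.array([r / (SIZE - 1), c / (SIZE - 1),
                     (abs(r - GOAL[0]) + abs(c - GOAL[1])) / (2 * (SIZE - 1))])

def step(state, action):
    # Deterministic transition that clips moves at the grid boundary.
    r, c = state
    dr, dc = action
    return (min(max(r + dr, 0), SIZE - 1), min(max(c + dc, 0), SIZE - 1))

def demo_policy(state):
    # Hand-coded "expert" demonstration: move greedily toward the goal.
    return min(ACTIONS, key=lambda a: abs(step(state, a)[0] - GOAL[0])
                                      + abs(step(state, a)[1] - GOAL[1]))

def unpack(genome, n_in=3, n_hidden=8):
    # Slice a flat genome into the weights of a one-hidden-layer network.
    w1 = genome[:n_in * n_hidden].reshape(n_in, n_hidden)
    b1 = genome[n_in * n_hidden:n_in * n_hidden + n_hidden]
    w2 = genome[n_in * n_hidden + n_hidden:]
    return w1, b1, w2

def evaluate_state(genome, state):
    # Non-linear state evaluation: tanh hidden layer, scalar output.
    w1, b1, w2 = unpack(genome)
    h = np.tanh(features(state) @ w1 + b1)
    return float(h @ w2)

def greedy_action(genome, state):
    # Policy induced by the evolved evaluator: pick the best-scored successor state.
    return max(ACTIONS, key=lambda a: evaluate_state(genome, step(state, a)))

def fitness(genome):
    # Fitness is agreement with the demonstration: the fraction of states in which
    # the induced policy chooses the same action as the expert.
    states = [(r, c) for r in range(SIZE) for c in range(SIZE) if (r, c) != GOAL]
    return sum(greedy_action(genome, s) == demo_policy(s) for s in states) / len(states)

def evolve(pop_size=30, generations=50, sigma=0.2, genome_len=3 * 8 + 8 + 8):
    # Simple (1 + lambda)-style loop: mutate the best genome and keep improvements.
    best = rng.normal(size=genome_len)
    best_fit = fitness(best)
    for _ in range(generations):
        for _ in range(pop_size):
            child = best + sigma * rng.normal(size=genome_len)
            f = fitness(child)
            if f > best_fit:
                best, best_fit = child, f
    return best, best_fit

if __name__ == "__main__":
    genome, agreement = evolve()
    print(f"agreement with demonstration: {agreement:.2f}")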