
The Effect of Planning Shape on Dyna-style Planning in High-dimensional State Spaces

  • Author / Creator
    Holland, Gordon Z.
  • Dyna is an architecture for reinforcement learning agents that interleaves planning, acting, and learning in an online setting. This architecture aims to make fuller use of limited experience to achieve better performance with fewer environmental interactions. Dyna has been well studied in problems with a tabular representation of states, and has also been extended to some settings with larger state spaces that require function approximation. In Dyna, the environment model is typically used to generate one-step rollouts from selected start states, but longer trajectories could also be generated. Given a fixed budget of computation, planning could take on a variety of shapes: many short rollouts, or fewer long rollouts. In this work, one-step Dyna was applied to several games from the Arcade Learning Environment (ALE) and the result was that the model-based updates offered surprisingly little benefit over performing more updates with the agent’s existing experience, even when using a perfect model. However, when the model was used to generate longer trajectories of simulated experience, performance improved dramatically. The results show that to get the most from planning, the model must be used to generate unfamiliar experience, and that performing longer rollouts is an effective strategy to accomplish this. Similar observations were made with pre-trained learned models and a model that was learned online along with the value function.
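
The idea of "planning shape" in the abstract can be illustrated with a small tabular sketch. This is not the thesis's implementation, which works on ALE games with function approximation; it is a toy Q-learning planner under stated assumptions. The names `plan`, `chain_model`, `budget`, and `rollout_length` are hypothetical and chosen only for illustration. A fixed budget of model-based updates is spent as `budget // rollout_length` rollouts, so `rollout_length=1` corresponds to many one-step rollouts and larger values to fewer, longer ones.

```python
import random
from collections import defaultdict


def chain_model(state, action, n_states=50):
    """Hypothetical perfect model of a chain: action 1 moves right, 0 moves left.

    Returns (reward, next_state, done). Reaching the last state ends the episode.
    """
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = next_state == n_states - 1
    return (1.0 if done else 0.0), next_state, done


def plan(q, model, start_states, n_actions, budget, rollout_length,
         alpha=0.1, gamma=0.99, epsilon=0.1, rng=None):
    """Spend `budget` simulated updates as budget // rollout_length rollouts.

    `q` maps (state, action) to a value. Returns the number of updates
    performed and the set of states whose values were updated.
    """
    rng = rng or random.Random()
    updates, visited = 0, set()
    for _ in range(budget // rollout_length):
        state = rng.choice(start_states)  # e.g. states drawn from real experience
        for _ in range(rollout_length):
            # Epsilon-greedy action selection inside the simulated rollout.
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda b: q[(state, b)])
            reward, next_state, done = model(state, action)
            # Standard one-step Q-learning target on the simulated transition.
            best_next = max(q[(next_state, b)] for b in range(n_actions))
            target = reward if done else reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            updates += 1
            visited.add(state)
            if done:
                break
            state = next_state
    return updates, visited
```

Calling `plan(defaultdict(float), chain_model, [0], 2, budget=100, rollout_length=1)` only ever updates the start state, whereas `rollout_length=10` with the same budget lets the simulated trajectory move away from states the agent has already seen. That is the sense in which longer rollouts can generate unfamiliar experience, although this toy setting says nothing about the size of the effect reported in the thesis.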

  • Subjects / Keywords
  • Graduation date
    Fall 2018
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/R3GF0NC3X
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.