Usage
  • 67 views
  • 78 downloads

Fast, Scalable Algorithms for Reinforcement Learning in High Dimensional Domains

  • Author / Creator
    Gendron-Bellemare, Marc
  • This thesis presents new algorithms for dealing with large scale reinforcement learning problems. Central to this work is the Atari 2600 platform, which acts as both a rich evaluation framework and a source of challenges for existing reinforcement learning methods. Three contributions are presented; common to all three is the idea of leveraging the highly structured nature of Atari 2600 games in order to achieve meaningful results. The first part of this work formally introduces the notion of contingency awareness: the recognition that parts of an agent's observation are under its control, while others are solely determined by its environment. Together with this formalization, I provide empirical results showing that contingency awareness can be used to generate useful features for value-based reinforcement learning in Atari 2600 games. The second part investigates the use of hashing in linear value function approximation. My work provides a new, theoretically sound hashing method for linear value function approximation based on prior work on sketches. Empirically, the new hashing method offers a significant performance advantage compared to traditional hashing, at a minuscule computational cost. My third contribution is the quad-tree factorization (QTF) algorithm, an information-theoretic approach to the problem of predicting future Atari 2600 screens. The algorithm relies on the natural idea that future screens can be efficiently factored into image patches. QTF goes a step further by providing a hierarchical-decomposition screen model, so that image patches are only as large as they need to be. Together, the contributions in this thesis are motivated by the need to efficiently handle the Atari 2600's large observation space -- the set of all possible game screens -- in arbitrary Atari 2600 games. This work provides evidence that general, principled approximations can be devised to allow us to tackle the reinforcement learning problem within complex, natural domains.

  • Subjects / Keywords
  • Graduation date
    2013-11
  • Type of Item
    Thesis
  • Degree
    Doctor of Philosophy
  • DOI
    https://doi.org/10.7939/R3DJ58T21
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
  • Language
    English
  • Institution
    University of Alberta
  • Degree level
    Doctoral
  • Department
    • Department of Computing Science
  • Supervisor / co-supervisor and their department(s)
    • Veness, Joel (Computing Science)
    • Bowling, Michael (Computing Science)
  • Examining committee members and their departments
    • Sutton, Richard (Computing Science)
    • Bulitko, Vadim (Computing Science)
    • Singh, Satinder (Electrical Engineering and Computer Science, University of Michigan)
    • Schuurmans, Dale (Computing Science)