Fast, Scalable Algorithms for Reinforcement Learning in High Dimensional Domains

Gendron-Bellemare, Marc

doi:doi:10.7939/R3DJ58T21

This decommissioned ERA site remains active temporarily to support our final migration steps to https://ualberta.scholaris.ca, ERA's new home. All new collections and items, including Spring 2025 theses, are at that site. For assistance, please contact erahelp@ualberta.ca.

View

Download

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Usage

345 views
564 downloads

Fast, Scalable Algorithms for Reinforcement Learning in High Dimensional Domains

Author / Creator

Gendron-Bellemare, Marc
This thesis presents new algorithms for dealing with large scale reinforcement learning problems.
Central to this work is the Atari 2600 platform, which acts as both a rich evaluation framework and
a source of challenges for existing reinforcement learning methods. Three contributions are presented; common to all three is the idea of leveraging the highly structured nature of Atari 2600 games in order to achieve meaningful results.

The first part of this work formally introduces the notion of contingency awareness: the recognition that parts of an agent's observation are under its control, while others are solely determined by its environment. Together with this formalization, I provide empirical results showing that contingency awareness can be used to generate useful features for value-based reinforcement learning in Atari 2600 games.

The second part investigates the use of hashing in linear value function approximation. My work provides a new, theoretically sound hashing method for linear value function approximation based on prior work on sketches. Empirically, the new hashing method offers a significant performance advantage compared to traditional hashing, at a minuscule computational cost.

My third contribution is the quad-tree factorization (QTF) algorithm, an information-theoretic approach to the problem of predicting future Atari 2600 screens. The algorithm relies on the natural idea that future screens can be efficiently factored into image patches. QTF goes a step further by providing a hierarchical-decomposition screen model, so that image patches are only as large as they need to be.

Together, the contributions in this thesis are motivated by the need to efficiently handle the Atari 2600's large observation space -- the set of all possible game screens -- in arbitrary Atari 2600 games. This work provides evidence that general, principled approximations can be devised to allow us to tackle the reinforcement learning problem within complex, natural domains.
Subjects / Keywords
Graduation date

Fall 2013
Type of Item

Thesis
Degree

Doctor of Philosophy
DOI

https://doi.org/10.7939/R3DJ58T21
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Doctoral
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- Veness, Joel (Computing Science)
- Bowling, Michael (Computing Science)
Examining committee members and their departments
- Schuurmans, Dale (Computing Science)
- Bulitko, Vadim (Computing Science)
- Sutton, Richard (Computing Science)
- Singh, Satinder (Electrical Engineering and Computer Science, University of Michigan)