Download the full-sized PDF
Permanent link (DOI): https://doi.org/10.7939/R3DJ58T21
This file is in the following communities:
|Graduate Studies and Research, Faculty of|
This file is in the following collections:
|Theses and Dissertations|
Fast, Scalable Algorithms for Reinforcement Learning in High Dimensional Domains Open Access
- Other title
Bayesian model learning
- Type of item
- Degree grantor
University of Alberta
- Author or creator
- Supervisor and department
Bowling, Michael (Computing Science)
Veness, Joel (Computing Science)
- Examining committee member and department
Singh, Satinder (Electrical Engineering and Computer Science, University of Michigan)
Bulitko, Vadim (Computing Science)
Schuurmans, Dale (Computing Science)
Sutton, Richard (Computing Science)
Department of Computing Science
- Date accepted
- Graduation date
Doctor of Philosophy
- Degree level
This thesis presents new algorithms for dealing with large scale reinforcement learning problems.
Central to this work is the Atari 2600 platform, which acts as both a rich evaluation framework and
a source of challenges for existing reinforcement learning methods. Three contributions are presented; common to all three is the idea of leveraging the highly structured nature of Atari 2600 games in order to achieve meaningful results.
The first part of this work formally introduces the notion of contingency awareness: the recognition that parts of an agent's observation are under its control, while others are solely determined by its environment. Together with this formalization, I provide empirical results showing that contingency awareness can be used to generate useful features for value-based reinforcement learning in Atari 2600 games.
The second part investigates the use of hashing in linear value function approximation. My work provides a new, theoretically sound hashing method for linear value function approximation based on prior work on sketches. Empirically, the new hashing method offers a significant performance advantage compared to traditional hashing, at a minuscule computational cost.
My third contribution is the quad-tree factorization (QTF) algorithm, an information-theoretic approach to the problem of predicting future Atari 2600 screens. The algorithm relies on the natural idea that future screens can be efficiently factored into image patches. QTF goes a step further by providing a hierarchical-decomposition screen model, so that image patches are only as large as they need to be.
Together, the contributions in this thesis are motivated by the need to efficiently handle the Atari 2600's large observation space -- the set of all possible game screens -- in arbitrary Atari 2600 games. This work provides evidence that general, principled approximations can be devised to allow us to tackle the reinforcement learning problem within complex, natural domains.
- Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.
- Citation for previous publication
Bellemare, M. G., Naddaf, Y., Veness, J., and Bowling, M. (2013). The Arcade Learning Environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47.Bellemare, M. G., Veness, J., and Bowling, M. (2012). Investigating contingency awareness using Atari 2600 games. In Proceedings of Twenty-Sixth AAAI Con- ference on Artificial Intelligence.Bellemare, M. G., Veness, J., and Bowling, M. (2012). Sketch-based linear value function approximation. In Advances in Neural Information Processing Systems 25.Bellemare, M. G., Veness, J., and Bowling, M. (2013). Bayesian learning of re- cursively factored environments. In Proceedings of the Thirtieth International Conference on Machine Learning.
- Date Uploaded
- Date Modified
- Audit Status
- Audits have not yet been run on this file.
File format: pdf (Portable Document Format)
Mime type: application/pdf
File size: 1706180
Last modified: 2015:10:12 20:05:00-06:00
Original checksum: d454c36aa24bdaa94d66bfe3b9441458
Well formed: false
Status message: No document catalog dictionary offset=0