An Empirical Study of Model-Free Exploration for Deep Reinforcement Learning

Zhao, Xutong

doi:10.7939/r3-q69n-7d90

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Author / Creator

Zhao, Xutong
Abstract

Reinforcement learning (RL) is a learning paradigm concerned with how agents interact with an environment to maximize the cumulative reward emitted by that environment. The exploration-versus-exploitation trade-off is a central challenge in RL research: the agent must balance taking a sequence of actions known to be rewarding against exploring unknown actions that might be more rewarding. Exploration research in RL often uses algorithms and environments with many degrees of freedom, which can interfere with the interpretability of results. This thesis presents a systematic, yet simple, study of exploration methods for value-based control algorithms. We present a novel suite of small environments that each pose a distinct exploration challenge. Our environment designs allow us to observe the strengths and weaknesses of individual exploration methods, as well as trends across implementation details and conceptual approaches to exploration. We conduct a literature survey and categorize model-free exploration approaches by their underlying heuristics. We also empirically evaluate the performance of representative exploration methods on our exploration domains. Despite the simplicity of our environments, none of the tested exploration methods achieves good performance in all environments. However, some methods consistently improve upon the Q-learning baseline. Beyond our survey results, our suite of interpretable environments can be used as a sanity check to ensure that an exploration method behaves appropriately in simple situations.
Subjects / Keywords
Graduation date

Fall 2021
Type of Item

Thesis
Degree

Master of Science
DOI

https://doi.org/10.7939/r3-q69n-7d90
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Master's
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- White, Adam (Computing Science)