Two-Timescale Networks for Nonlinear Value Function Approximation

Chung, Wesley

doi:10.7939/r3-dx5r-7020

Policy evaluation, the problem of learning value functions, is an integral part of the reinforcement learning problem. In this thesis, I propose a neural network architecture for value function approximation, the Two-Timescale Network (TTN), which estimates the value function as a linear function of learned features. By separating these two learning processes (approximating the value function and learning the features), we can use classic policy evaluation methods designed for linear function approximation while still obtaining nonlinear estimates of the value function. The separation also makes it easier to prove convergence guarantees for the value estimates. This thesis empirically investigates the choice of linear policy evaluation algorithm and the choice of objective for feature learning, and also presents experiments in the control setting.
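The two-timescale idea can be sketched in a few lines. This is a minimal, hypothetical illustration, not the thesis's implementation: it assumes a 5-state random walk (states 1 to 5, reward 1 only when exiting on the right, γ = 1), a one-hidden-layer tanh feature network with a hand-picked initialization, and, as the slow feature-learning objective, semi-gradient TD(0) on an auxiliary nonlinear value head `u`. The names `features`, `train_ttn`, `alpha_fast` and `alpha_slow` are illustrative. The fast learner runs linear TD(0) on the current features with the larger step size.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def features(W, x):
    """Learned features: one tanh unit per (weight, bias) pair in W, plus a constant bias feature."""
    return [math.tanh(w * x + b) for (w, b) in W] + [1.0]


def train_ttn(episodes=2000, alpha_fast=0.05, alpha_slow=0.005, gamma=1.0, seed=0):
    """Hypothetical TTN-style sketch on a 5-state random walk (states 1..5, terminals 0 and 6).

    Fast timescale: linear TD(0) weights `theta` on the current features.
    Slow timescale: feature parameters `W` and an auxiliary head `u`, trained by
    semi-gradient TD(0) on the nonlinear value u . phi(x) (an assumed surrogate objective).
    """
    rng = random.Random(seed)
    W = [(2.0, -1.0), (-2.0, 1.0), (1.0, 0.0), (3.0, -2.0)]  # illustrative fixed initialization
    k = len(W) + 1
    theta = [0.0] * k  # fast: linear value weights
    u = [0.0] * k      # slow: auxiliary head used only to train the features
    for _ in range(episodes):
        s = 3  # every episode starts in the middle state
        while 0 < s < 6:
            s2 = s + rng.choice((-1, 1))
            r = 1.0 if s2 == 6 else 0.0
            x, x2 = s / 6, s2 / 6  # scalar state encoding fed to the feature network
            phi = features(W, x)
            phi2 = features(W, x2) if 0 < s2 < 6 else [0.0] * k  # terminal value is 0
            # Fast timescale: linear TD(0), treating the features as fixed inputs.
            delta = r + gamma * dot(theta, phi2) - dot(theta, phi)
            theta = [t + alpha_fast * delta * f for t, f in zip(theta, phi)]
            # Slow timescale: semi-gradient TD(0) on u . phi(x);
            # d(u_j * tanh(w x + b)) / d(w, b) = u_j * (1 - tanh^2) * (x, 1).
            d_aux = r + gamma * dot(u, phi2) - dot(u, phi)
            W = [(w + alpha_slow * d_aux * u[j] * (1 - phi[j] ** 2) * x,
                  b + alpha_slow * d_aux * u[j] * (1 - phi[j] ** 2))
                 for j, (w, b) in enumerate(W)]
            u = [a + alpha_slow * d_aux * f for a, f in zip(u, phi)]
            s = s2
    # Value estimates from the fast linear learner on the final features.
    return [dot(theta, features(W, s / 6)) for s in range(1, 6)]


print([round(v, 2) for v in train_ttn()])
```

`train_ttn()` returns the linear value estimates for states 1 to 5, which can be compared against the true values s/6 of this walk. The design choice being illustrated is that the feature parameters move on a slower timescale (`alpha_slow` much smaller than `alpha_fast`), so from the linear learner's point of view the features look nearly fixed.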
We find that TTNs perform competitively with other algorithms that train the features and the value function estimates jointly. In particular, least-squares temporal difference (LSTD) methods seem to provide the largest benefit, and eligibility traces can also help linear-time TD algorithms.
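The contrast between least-squares and linear-time TD methods can be illustrated on fixed features. The sketch below is a hypothetical example that assumes one-hot features on a 5-state random walk as a stand-in for learned features; it implements LSTD(λ) with a small ridge term and online linear TD(λ) with accumulating eligibility traces. LSTD accumulates a k×k matrix and solves a linear system, so its per-step cost is quadratic in the number of features k, while TD(λ) only updates the weight and trace vectors, which is linear in k. All function names and parameters (`reg`, `alpha`, `lam`) are illustrative choices.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def onehot(s, k=5):
    """Illustrative fixed features: one-hot encoding of nonterminal state s in 1..k."""
    return [1.0 if i == s - 1 else 0.0 for i in range(k)]


def random_walk_episodes(n, seed=0):
    """Sample n episodes of (phi, reward, phi_next) from a 5-state random walk."""
    rng = random.Random(seed)
    episodes = []
    for _ in range(n):
        s, ep = 3, []
        while 0 < s < 6:
            s2 = s + rng.choice((-1, 1))
            r = 1.0 if s2 == 6 else 0.0
            phi2 = onehot(s2) if 0 < s2 < 6 else [0.0] * 5  # terminal features are zero
            ep.append((onehot(s), r, phi2))
            s = s2
        episodes.append(ep)
    return episodes


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * mc for a, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def lstd(episodes, k, gamma=1.0, lam=0.0, reg=1e-3):
    """LSTD(lambda): solve (reg*I + sum z (phi - gamma*phi')^T) theta = sum z r. O(k^2) per step."""
    A = [[reg if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for ep in episodes:
        z = [0.0] * k  # eligibility trace, reset at the start of each episode
        for phi, r, phi2 in ep:
            z = [gamma * lam * zi + f for zi, f in zip(z, phi)]
            d = [f - gamma * f2 for f, f2 in zip(phi, phi2)]
            for i in range(k):
                b[i] += z[i] * r
                for j in range(k):
                    A[i][j] += z[i] * d[j]
    return solve(A, b)


def td_lambda(episodes, k, alpha=0.05, gamma=1.0, lam=0.8):
    """Online linear TD(lambda) with accumulating traces. O(k) per step."""
    theta = [0.0] * k
    for ep in episodes:
        z = [0.0] * k
        for phi, r, phi2 in ep:
            z = [gamma * lam * zi + f for zi, f in zip(z, phi)]
            delta = r + gamma * dot(theta, phi2) - dot(theta, phi)
            theta = [t + alpha * delta * zi for t, zi in zip(theta, z)]
    return theta


data = random_walk_episodes(500)
print("LSTD:", [round(v, 2) for v in lstd(data, 5)])
print("TD(lambda):", [round(v, 2) for v in td_lambda(data, 5)])
```

On the same batch of episodes, `lstd` with `lam=0` returns the batch TD fixed point, which for one-hot features is the certainty-equivalence estimate, while `td_lambda` moves toward similar values more noisily because of its constant step size.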
Overall, this thesis provides evidence that separating feature and value learning is a promising direction for nonlinear value function approximation.
Graduation date

Fall 2019
Type of Item

Thesis
Degree

Master of Science
DOI

https://doi.org/10.7939/r3-dx5r-7020
License

Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.

Language

English
Institution

University of Alberta
Degree level

Master's
Department
- Department of Computing Science
Specialization
- Statistical Machine Learning
Supervisor / co-supervisor and their department(s)
- White, Martha (Computing Science)