Adaptive Representation for Policy Gradient

Das Gupta, Ujjwal

doi:10.7939/R38Q2B

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Author / Creator

Das Gupta, Ujjwal
Much of the work on finding good representations in reinforcement learning has focused on learning complex non-linear predictors of value. Policy gradient methods, which do not learn a value function and instead represent the policy directly, often need fewer parameters to learn good policies. However, they typically employ a fixed parametric representation that may not be sufficient for complex domains. This thesis introduces two algorithms that learn an adaptive representation of the policy: the Policy Tree algorithm, which learns a decision tree over different instantiations of a base policy, and the Policy Conjunction algorithm, which adds conjunctive features to any base policy that uses a linear feature representation. In both algorithms, policy gradient is used to grow the representation in the way that yields the maximum local increase in the expected return of the policy. Experiments show that these algorithms can choose genuinely helpful splits or features, and that they significantly improve upon the commonly used linear Gibbs softmax policy, which is chosen as the base policy.
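For readers unfamiliar with the base policy named above, the following is a minimal sketch of a linear Gibbs softmax policy, its log-likelihood gradient, and the idea of appending a conjunctive feature. It is an illustration under stated assumptions, not the thesis's implementation: the function names (`gibbs_policy`, `grad_log_policy`, `add_conjunction`) and the list-of-lists feature layout are hypothetical, and the criterion the thesis uses to decide which split or conjunction to add is not shown here.

```python
import math

def gibbs_policy(theta, features):
    """Action probabilities of a linear Gibbs (softmax) policy.

    theta: list of weights.
    features: one feature vector per action, each the same length as theta.
    """
    prefs = [sum(w * x for w, x in zip(theta, phi)) for phi in features]
    m = max(prefs)  # subtract the max preference for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def grad_log_policy(theta, features, action):
    """Gradient of log pi(action | s) with respect to theta.

    For a softmax over linear preferences this equals the chosen action's
    features minus the expected features under the policy.
    """
    probs = gibbs_policy(theta, features)
    n = len(theta)
    expected = [sum(p * phi[i] for p, phi in zip(probs, features)) for i in range(n)]
    return [features[action][i] - expected[i] for i in range(n)]

def add_conjunction(features, i, j):
    """Append the product of features i and j to every feature vector.

    For binary features the product is their logical AND. (Hypothetical
    helper; it shows only what a conjunctive feature looks like.)
    """
    return [phi + [phi[i] * phi[j]] for phi in features]
```

With zero weights the policy is uniform, so `gibbs_policy([0.0, 0.0], [[1, 0], [0, 1]])` gives equal probability to both actions. A newly appended conjunctive feature would need a matching new weight in `theta`; starting that weight at zero leaves the policy unchanged until it is updated.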
Graduation date

Spring 2015
Type of Item

Thesis
Degree

Master of Science
DOI

https://doi.org/10.7939/R38Q2B
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Master's
Department
- Department of Computing Science
Specialization
- Statistical Machine Learning
Supervisor / co-supervisor and their department(s)
- Talvitie, Erik (Computing Science)
- Bowling, Michael (Computing Science)
Examining committee members and their departments
- Hoover, H. James (Computing Science)
- Bowling, Michael (Computing Science)
- Sutton, Richard S. (Computing Science)
- Talvitie, Erik (Computing Science)