Adaptive Representation for Policy Gradient

  • Author / Creator
    Das Gupta, Ujjwal
  • Abstract
    Much of the focus on finding good representations in reinforcement learning has been on learning complex non-linear predictors of value. Methods such as policy gradient, which do not learn a value function and instead represent the policy directly, often need fewer parameters to learn good policies. However, they typically employ a fixed parametric representation that may not be sufficient for complex domains. This thesis introduces two algorithms which can learn an adaptive representation of policy: the Policy Tree algorithm, which learns a decision tree over different instantiations of a base policy, and the Policy Conjunction algorithm, which adds conjunctive features to any base policy that uses a linear feature representation. In both algorithms, policy gradient is used to grow the representation in a way that enables the maximum local increase in the expected return of the policy. Experiments show that these algorithms can choose genuinely helpful splits or features and significantly improve upon the commonly used linear Gibbs softmax policy, which is chosen as the base policy. (An illustrative sketch of the linear Gibbs softmax base policy and a policy-gradient update appears after this record.)

  • Subjects / Keywords
  • Graduation date
    Spring 2015
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/R38Q2B
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
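
The abstract names the linear Gibbs softmax policy as the base policy and policy gradient as the learning method. The Python sketch below is a generic illustration of those standard ingredients only, written from their textbook definitions rather than from the thesis; the function names (gibbs_softmax_probs, log_prob_gradient, reinforce_update) and the episode format are assumptions made for illustration, and the sketch does not implement the thesis's Policy Tree or Policy Conjunction algorithms.

    import numpy as np

    def gibbs_softmax_probs(theta, features):
        """Action probabilities of a linear Gibbs (Boltzmann) softmax policy.

        theta:    (d,) policy parameter vector
        features: (num_actions, d) feature vector phi(s, a) for each action
        """
        prefs = features @ theta          # linear action preferences theta . phi(s, a)
        prefs = prefs - prefs.max()       # shift for numerical stability
        exp_prefs = np.exp(prefs)
        return exp_prefs / exp_prefs.sum()

    def log_prob_gradient(theta, features, action):
        """Gradient of log pi(a|s) for the Gibbs softmax policy:
        phi(s, a) - sum_b pi(b|s) phi(s, b)."""
        probs = gibbs_softmax_probs(theta, features)
        return features[action] - probs @ features

    def reinforce_update(theta, episode, alpha=0.01):
        """One REINFORCE-style policy-gradient update from a single episode.

        episode: list of (features, action, return_from_t) tuples, where
                 return_from_t is the discounted return following time t
                 (an assumed data layout for this illustration).
        """
        for features, action, g in episode:
            theta = theta + alpha * g * log_prob_gradient(theta, features, action)
        return theta

With a fixed feature map phi(s, a), this base policy has a fixed parametric form; the thesis's contribution is to grow that representation adaptively (by splitting the policy in a decision tree or by adding conjunctive features), using the policy gradient to pick the split or feature giving the largest local increase in expected return.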