Leveraging Off-Policy Prediction in Recurrent Networks for Reinforcement Learning

  • Author / Creator
    Schlegel, Matthew K
  • Partial observability---when the senses lack enough detail to make an optimal decision---is the reality for any decision-making agent acting in the real world. While an agent could be made to make do with its available senses, taking advantage of the history of senses can provide more context and enable the agent to make better decisions. This thesis investigates recurrent architectures to learn agent state (a summarization of the agent's history), and identifies some modifications---inspired by predictive representations of state---to enable efficient learning in (continual) reinforcement learning. First, I contribute to standard recurrent neural networks trained through back-propagation through time. This contribution provides pragmatic recommendations for incorporating action information into a recurrent architecture, and through extensive empirical investigations shows the trade-offs of several techniques. Second, I develop a recurrent predictive architecture which uses temporal abstractions---predictions in the form of general value functions---as the basis for its state representation. I show advantages of this architecture over standard recurrent networks in a continuing reinforcement learning domain, derive an objective and corresponding learning algorithm, and discuss several added concerns when using this architecture---such as discovery, what types of networks can be constructed, and off-policy prediction.

  • Subjects / Keywords
  • Graduation date
    Fall 2023
  • Type of Item
    Thesis
  • Degree
    Doctor of Philosophy
  • DOI
    https://doi.org/10.7939/r3-53cp-y292
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.