Learning and Planning with the Average-Reward Formulation

  • Author / Creator
    Wan, Yi
  • Abstract
    The average-reward formulation is a natural and important formulation of learning and planning problems, yet it has received far less attention than the episodic and discounted formulations. This dissertation contributes algorithms and convergence theory for the average-reward formulation, primarily through the lens of reinforcement learning, in three areas. The first is a family of tabular average-reward learning and planning algorithms together with their convergence theory. The second is a complete extension of the options framework for temporal abstraction (Sutton, Precup, and Singh 1999) from the discounted formulation to the average-reward formulation, including general convergent off-policy inter-option learning algorithms, intra-option algorithms for learning values and models, incremental planning variants of the learning algorithms, an option-interrupting algorithm, and convergence theory for these algorithms. The third is an average-reward prediction algorithm with function approximation, its convergence analysis, and an error bound for the convergence point.
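    One member of the tabular family described above is Differential Q-learning (Wan, Naik, and Sutton 2021), which learns action values and an estimate of the reward rate jointly. Below is a minimal sketch; the two-state toy environment and the step-size choices are illustrative, not taken from the dissertation:

    ```python
    import random

    def differential_q_learning(step, n_states, n_actions, steps=20000,
                                alpha=0.1, eta=1.0, epsilon=0.1, seed=0):
        """Tabular Differential Q-learning: jointly learns action values
        and an estimate r_bar of the average reward (reward rate)."""
        rng = random.Random(seed)
        q = [[0.0] * n_actions for _ in range(n_states)]
        r_bar = 0.0  # running estimate of the reward rate
        s = 0
        for _ in range(steps):
            if rng.random() < epsilon:                      # epsilon-greedy exploration
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: q[s][x])
            r, s2 = step(s, a)
            # The TD error subtracts the reward-rate estimate instead of
            # discounting future values.
            delta = r - r_bar + max(q[s2]) - q[s][a]
            q[s][a] += alpha * delta
            r_bar += eta * alpha * delta                    # reward rate tracks the TD error
            s = s2
        return q, r_bar

    # Illustrative two-state ring MDP: the single action moves to the other
    # state; leaving state 0 yields reward 1, leaving state 1 yields 0,
    # so the true reward rate is 0.5.
    def ring_step(s, a):
        return (1.0 if s == 0 else 0.0), 1 - s

    q, r_bar = differential_q_learning(ring_step, n_states=2, n_actions=1)
    # r_bar converges to the true reward rate, 0.5
    ```

    The key design point of this formulation is that no discount factor appears: the reward-rate estimate `r_bar` is subtracted from each reward, so the action values converge to differential (relative) values rather than discounted sums.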

  • Subjects / Keywords
  • Graduation date
    Fall 2023
  • Type of Item
  • Degree
    Doctor of Philosophy
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.