Sample-Efficient Control with Directed Exploration in Discounted MDPs Under Linear Function Approximation

  • Author / Creator
    Kumaraswamy, Raksha K
  • An important goal of online reinforcement learning algorithms is efficient data collection for learning near-optimal behaviour, that is, optimizing the exploration-exploitation trade-off to reduce the sample-complexity of learning. To improve the sample-complexity of learning, it is essential that the agent direct its exploratory behaviour towards either visiting unvisited parts of the environment or reducing the uncertainty it has about the visited parts. In addition to such directed exploration, the sample-complexity of learning can be improved by using a representation space that is amenable to online reinforcement learning. This thesis presents several algorithms that pursue these avenues for improving the sample-complexity of online reinforcement learning, specifically in the setting of discounted MDPs under linear function approximation.
    A key challenge in directing effective online exploration is learning reliable uncertainty estimates. We address this by deriving high-probability confidence bounds for value uncertainty estimation. We use these confidence bounds to design two algorithms that direct effective online exploration; they differ mainly in how they direct exploration towards unknown regions of the environment. The first algorithm uses a heuristic to do so, whereas the second uses a more principled strategy based on optimistic initialization. The second algorithm is also planning-compatible and can be parallelized, scaling its sample-efficiency benefits with the compute resources afforded to it (a generic sketch of a confidence-bound exploration bonus follows the abstract).
    To improve sample-efficiency by utilizing representations that are amenable to online reinforcement learning, the thesis proposes a simple strategy for learning such representations offline. The representation learning algorithm encodes a property we call locality. Locality reduces interference in the learning targets used by online reinforcement learning algorithms, consequently improving their sample-efficiency (an illustrative sketch of locality also follows the abstract). The thesis shows that these learned representations also aid effective online exploration.
    Overall, this thesis proposes algorithms for improving sample-efficiency of online reinforcement learning, motivates their utility, and evaluates their benefits empirically.
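
    As an illustrative sketch only, one standard way to obtain high-probability confidence bounds under linear function approximation is an elliptical bonus of the form beta * sqrt(phi^T A^{-1} phi), where A is the regularized Gram matrix of observed features; the agent then acts greedily on value-plus-bonus. The Python sketch below is a generic LinUCB/LSVI-style construction, not the thesis's exact derivation, and the feature map phi_fn and weight vector w are assumed to exist.

      import numpy as np

      class EllipticalBonus:
          """Generic elliptical confidence bonus for linear value estimates.

          The bonus beta * sqrt(phi^T A^{-1} phi) is a standard
          high-probability confidence width: it shrinks along feature
          directions that have been visited often. Illustrative sketch,
          not the thesis's exact construction.
          """

          def __init__(self, dim, reg=1.0, beta=1.0):
              self.A = reg * np.eye(dim)   # regularized Gram matrix of features
              self.beta = beta             # confidence-level scaling

          def update(self, phi):
              self.A += np.outer(phi, phi)  # record a visited feature direction

          def bonus(self, phi):
              # Width of the confidence ellipsoid along direction phi.
              return self.beta * np.sqrt(phi @ np.linalg.solve(self.A, phi))

      def optimistic_action(w, phi_fn, state, actions, cb):
          """Act greedily on value-plus-bonus (phi_fn and w are assumed)."""
          scores = [phi_fn(state, a) @ w + cb.bonus(phi_fn(state, a))
                    for a in actions]
          return actions[int(np.argmax(scores))]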
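
    The locality property can likewise be illustrated with a deliberately extreme example: one-hot binned features, where distinct states activate disjoint components, so a temporal-difference update at one state leaves value estimates, and hence learning targets, elsewhere untouched. This hand-coded encoding is hypothetical; the thesis learns local representations offline rather than fixing them by hand.

      import numpy as np

      def local_features(x, n_bins=50, lo=0.0, hi=1.0):
          """One-hot binning of a scalar state: an extreme form of locality.

          Hypothetical hand-coded encoding for illustration only.
          """
          idx = min(max(int((x - lo) / (hi - lo) * n_bins), 0), n_bins - 1)
          phi = np.zeros(n_bins)
          phi[idx] = 1.0
          return phi

      def td_update(w, phi, reward, phi_next, gamma=0.99, alpha=0.1):
          """Semi-gradient TD(0) step.

          With local (sparse) features the update moves only the active
          components of w, so values of distant states are undisturbed,
          reducing interference between learning targets.
          """
          delta = reward + gamma * (phi_next @ w) - (phi @ w)  # TD error
          return w + alpha * delta * phi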

  • Graduation date
    Spring 2022
  • Type of Item
    Thesis
  • Degree
    Doctor of Philosophy
  • DOI
    https://doi.org/10.7939/r3-fkfq-2p67
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.