Feature Generalization in Deep Reinforcement Learning: An Investigation into Representation Properties

  • Author / Creator
    Miahi, Erfan
  • Abstract
    In this thesis, we investigate the connection between the properties of representations learned by deep reinforcement learning algorithms and their generalization performance. Much of the earlier work on representation learning for reinforcement learning focused on designing fixed-basis architectures to achieve properties thought to be desirable, such as orthogonality and sparsity. In contrast, the idea behind deep reinforcement learning methods is that the agent designer should not encode representational properties. Instead, the data stream should determine the properties of the representation, with good representations emerging under appropriate training schemes. We bring these two perspectives together by empirically investigating the properties of representations that generalize well in reinforcement learning. This analysis allows us to propose novel hypotheses about the impact of auxiliary tasks in end-to-end training of deep reinforcement learning methods. We introduce six representational properties and measure them over more than 28,000 agent-task settings. We consider DQN agents with convolutional networks in a pixel-based navigation environment. We develop a method to better understand why some representations improve generalization: a systematic approach that varies task similarity and correlates the measured representation properties with generalization performance. Using this insight, we design two novel auxiliary losses and show that they generalize as well as our best baselines.

  • Subjects / Keywords
  • Graduation date
    Fall 2022
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-jys9-3j89
  • License
    This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.