Characterizing Discrete Representations for Reinforcement Learning

  • Author / Creator
    Meyer, Edan J
  • In reinforcement learning (RL), agents learn to maximize a reward signal using nothing but observations from the environment as input to their decision-making processes. Whether the agent is simple, consisting of only a policy that maps observations to actions, or complex, containing auxiliary components like value functions and world models, the agent's sole dependence on observations remains invariant. With this fact in mind, the importance of representation becomes clear: changes to the representation of observations affect every part of an agent. Good representations can help an agent learn better policies faster, and bad representations can have the opposite effect.

    In this work, we advocate for the use of a form of discrete representation in RL. Through a series of three distinct problem settings in pixel-based Minigrid environments, we incrementally build up to the continual RL setting, where an agent must continually adapt to a changing environment to maximize reward. We compare models learned over discrete representation spaces to those learned over continuous representation spaces in each setting, identifying distinct benefits of discrete representations in each. When learning a model of the world, discrete representations enable more accurate modeling. In episodic RL, policies learned over discrete representations learn faster. And in continual RL, agents learning from discrete representations adapt more quickly to changes in the environment. In summary, we find that discrete representations enable both faster learning and better solutions.
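    To give a concrete sense of what "learning over a discrete representation space" can mean, the sketch below shows one common construction: a continuous encoder output is snapped to the nearest entry in a finite codebook, so downstream components (policy, value function, world model) consume integer codes or their quantized vectors rather than raw continuous latents. This is an illustrative, hypothetical example (codebook size, latent dimension, and the `quantize` helper are assumptions for the sketch), not the thesis's exact architecture.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical sizes: a codebook of 8 discrete codes, each a 4-dim vector.
    codebook = rng.normal(size=(8, 4))

    def quantize(z):
        """Snap each continuous latent vector to its nearest codebook index."""
        # z: (n, 4) continuous latents -> (n,) integer codes
        dists = np.linalg.norm(z[:, None, :] - codebook[None, :, :], axis=-1)
        return dists.argmin(axis=1)

    # e.g. encoder outputs for 3 observations
    z = rng.normal(size=(3, 4))
    codes = quantize(z)        # the discrete representation: integer indices
    z_q = codebook[codes]      # quantized vectors passed to policy / world model
    ```

    Under this construction, the representation can only take finitely many values, which is one intuition for why models and policies over such spaces can be easier to fit and faster to adapt.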

  • Subjects / Keywords
  • Graduation date
    Fall 2023
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-2s7y-a232
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.