Methodical Advice Collection and Reuse in Deep Reinforcement Learning

  • Author / Creator
  • Reinforcement learning (RL) has shown great success in solving many challenging tasks through the use of deep neural networks. Although deep learning brings immense representational power to the RL arsenal, it also causes sample inefficiency: the algorithms are data-hungry and require millions of training samples to converge to an adequate policy. One way to combat this issue is action advising in a teacher-student framework, where a knowledgeable teacher provides action advice to a student. Despite the promising results in the action advising literature, there are limitations, such as a limited advice budget and inflexibility in how advice collection and reuse are conducted. This thesis proposes using a single uncertainty (the student agent's) or dual uncertainties (the student's and that of a model of the teacher) to drive the advice collection and reuse process, giving our algorithms more flexibility to exploit a teacher's advice budget more efficiently. Additionally, this thesis introduces a new method for computing the uncertainty of a deep RL agent using a secondary neural network. The results show that using two uncertainties to drive advice collection and reuse improves learning performance across several Atari games.
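The dual-uncertainty idea in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual algorithm: the function names, the ensemble-disagreement proxy for uncertainty, and the thresholds `tau_student` and `tau_teacher` are all illustrative assumptions. It only shows the gating logic: collect advice when the student is uncertain and budget remains; reuse imitated advice when the student is uncertain but the learned model of the teacher is confident.

```python
import numpy as np

def ensemble_uncertainty(q_values):
    # Epistemic-uncertainty proxy (an assumption, not the thesis's method):
    # std of Q-values across ensemble heads, averaged over actions.
    # Higher means the heads disagree, i.e. the agent is less certain.
    return float(np.mean(np.std(np.asarray(q_values), axis=0)))

def should_collect_advice(student_q, budget, tau_student=0.5):
    # Ask the teacher only when the student is uncertain and budget remains.
    return budget > 0 and ensemble_uncertainty(student_q) > tau_student

def should_reuse_advice(student_q, teacher_model_q,
                        tau_student=0.5, tau_teacher=0.3):
    # Dual-uncertainty reuse: the student is uncertain, but the model of
    # the teacher is confident about the advice it has imitated.
    return (ensemble_uncertainty(student_q) > tau_student
            and ensemble_uncertainty(teacher_model_q) < tau_teacher)

# Toy 3-head, 2-action Q-value ensembles.
disagreeing = [[0.0, 2.0], [2.0, 0.0], [4.0, 4.0]]   # heads disagree -> uncertain
agreeing    = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]   # heads agree -> confident

print(should_collect_advice(disagreeing, budget=100))  # True
print(should_reuse_advice(disagreeing, agreeing))      # True
print(should_reuse_advice(agreeing, agreeing))         # False
```

In this toy setup the collection gate spends budget only in states the student finds uncertain, while the reuse gate lets the saved advice keep paying off after the budget is exhausted, which is the flexibility the abstract refers to.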

  • Subjects / Keywords
  • Graduation date
    Spring 2022
  • Type of Item
  • Degree
    Master of Science
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.