On the benefits of sparsity in value function approximators for Reinforcement Learning

  • Author / Creator
    Davelouis Gallardo, Fatima D
  • Abstract
    In machine learning, sparse neural networks offer higher computational efficiency and, in some cases, perform as well as fully-connected networks. In the online and incremental reinforcement learning (RL) problem, Prediction Adapted Networks (PANs; Martin and Modayil, 2021) is an algorithm that adapts the sparse connectivity of a shallow value network with random hidden-layer weights. Martin and Modayil evaluated PANs in the RL prediction setting and showed promising results, suggesting that one can use multiple online predictions of input signals to discover high-performing sparse neural network topologies with no a priori inductive biases. However, open questions remain about this algorithm. For instance, do the statistical benefits of PANs carry over to reinforcement learning control in multiple environments? Do PANs provide performance gains when the sparse value network's weights are learned end-to-end in both the prediction and control settings? How does predictive sparsity compare against sparse network structures learned end-to-end? The contributions of this work are twofold. First, we investigate the above questions and provide answers. Second, we devise a methodology that encodes sparse value network structures as binary masks and systematically evaluates their performance. In one RL control environment, we find that predictive sparsity performs on par with both a fully-connected architecture and a sparse network induced by L1 regularization. However, in another domain, PANs do not generate a sparse structure that can outperform even random sparsity. Surprisingly, in the same RL prediction environment used in the original PANs work, we find that learning the hidden-layer weights does not lead to better performance, suggesting there may be unidentified properties of environments for which PANs are best suited.
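
    To make the idea of encoding a sparse value network structure as a binary mask concrete, the following is a minimal illustrative sketch. It is not the thesis's implementation or the PANs algorithm. The tanh activation, the layer sizes, the random mask, the TD(0) update, and all function names are assumptions chosen for illustration. It shows a shallow value network whose fixed random hidden-layer weights are multiplied elementwise by a 0/1 mask, with only the output weights updated.

```python
import numpy as np

def random_mask(n_hidden, n_inputs, density, rng):
    """Binary mask keeping roughly `density` of the input-to-hidden connections.
    (Random sparsity baseline; hypothetical helper, not the PANs procedure.)"""
    return (rng.random((n_hidden, n_inputs)) < density).astype(np.float64)

def hidden(x, W, mask):
    """Hidden activations of the masked layer; masked-out weights contribute nothing."""
    return np.tanh((W * mask) @ x)

def value(x, W, mask, w_out):
    """Shallow value estimate: masked hidden layer followed by a linear output."""
    return float(w_out @ hidden(x, W, mask))

def td0_update(x, reward, x_next, W, mask, w_out, alpha=0.01, gamma=0.9):
    """One semi-gradient TD(0) step on the output weights only.
    The hidden-layer weights W stay fixed at their random initial values."""
    h = hidden(x, W, mask)
    delta = reward + gamma * value(x_next, W, mask, w_out) - float(w_out @ h)
    return w_out + alpha * delta * h

rng = np.random.default_rng(0)
n_inputs, n_hidden = 8, 16                      # assumed sizes
W = rng.normal(size=(n_hidden, n_inputs))       # random hidden-layer weights
mask = random_mask(n_hidden, n_inputs, density=0.25, rng=rng)
w_out = np.zeros(n_hidden)

x, x_next = rng.normal(size=n_inputs), rng.normal(size=n_inputs)
w_new = td0_update(x, 1.0, x_next, W, mask, w_out)
```

    Under this encoding, comparing sparse structures amounts to swapping the `mask` array (for example, a random mask versus one derived from some other procedure) while keeping the rest of the learner unchanged.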

  • Subjects / Keywords
  • Graduation date
    Spring 2024
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-63xz-v628
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.