Usage
  • 68 views
  • 107 downloads

Learning Programmatic Policies from ReLU Neural Networks

  • Author / Creator
    Orfanos, Spyros
  • Oblique decision trees use linear combinations of features in the decision nodes. Due to the non-smooth structure of decision trees, training oblique decision trees is considerably difficult as the parameters are tuned using expensive non-differentiable optimization techniques or found by extensive search of a discretized space. Recent work showed that one can induce oblique decision trees from the derivatives of ReLU neural networks. For learning with gradient descent, the derivative-based model requires one to anneal from a smooth approximation of ReLU activation functions to ReLUs during training and to use a dynamic programming algorithm to efficiently compute the gradients.

    In this thesis we show that regular ReLU neural networks trained with backpropagation can be written as oblique decision trees. We also show that hidden units from ReLU networks can be used to implicitly train oblique decision trees using computationally efficient algorithms for axis-aligned trees. Our result provides not only simple and efficient ways to induce oblique decision trees, but effective methods for synthesizing programmatic policies. This is because oblique decision trees can be seen as programs written in a domain-specific language commonly used in the programmatically interpretable reinforcement learning literature. All one needs to do is use a ReLU neural network to encode the policy, and learn using any policy gradient algorithm. Our methods can then map the policy learned with gradient descent to a program. Empirical results show that our approaches for synthesizing programmatic policies is competitive with the current state of the art if the synthesized programs are small; our methods outperforms the state of the art in almost all control problems evaluated if it is allowed to synthesize longer programs.

  • Subjects / Keywords
  • Graduation date
    Spring 2023
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-cqbq-2z25
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.