What to do when your discrete optimization is the size of a neural network?

  • Author / Creator
    Silva, Hugo Luis A
  • Machine learning applications that use neural networks often involve solving discrete optimization problems, as in pruning, parameter-isolation-based continual learning, and the training of binary networks. These discrete problems are combinatorial in nature and not amenable to gradient-based optimization. Moreover, classical approaches used in discrete settings do not scale well to large neural networks, forcing scientists and empiricists to rely on alternative methods. Among these, two main sources of top-down information can be used to lead the model to good solutions: (1) extrapolating gradient information from points outside the solution set, and (2) comparing evaluations between members of a subset of the valid solutions. We take continuation path (CP) methods to represent purely the former and Monte Carlo (MC) methods to represent the latter, while noting that some hybrid methods combine the two. The main goal of this work is therefore to compare the two approaches. To that end, we first give an overview of the two classes and discuss some of their drawbacks analytically. Then, in the experimental section, we compare their performance, starting with smaller Microworld experiments, which allow more fine-grained control of problem variables, and gradually moving toward larger problems in the overparametrized regime, including neural network regression and neural network pruning for image classification, where we additionally compare against magnitude-based pruning. A future version of this work will also include experiments on sequential task learning, which are currently underway.

  • Subjects / Keywords
  • Graduation date
    Fall 2023
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-4vy9-pp45
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.