Mixed Low-bit Quantization for Model Compression with Layer Importance and Gradient Estimations

  • Author / Creator
    Liu, Hongyang
  • Deep neural networks (DNNs) have become ubiquitous in recent years. However, their substantial memory consumption and high computational cost make deploying them on devices with limited resources challenging. Model compression methods can provide a remedy here. Among those techniques, neural network quantization has achieved a high compression rate by representing weights and activations with a low bit-width while maintaining the accuracy of the high-precision original network. However, mixed-precision (per-layer bit-width) quantization requires careful tuning to maintain accuracy while achieving further compression and finer granularity than fixed-precision quantization. In this thesis, we propose an accuracy-aware criterion to quantify each layer's importance rank. Our method applies imprinting per layer, which acts as an efficient proxy module for accuracy estimation. We rank the layers by the accuracy gain over previous modules and iteratively quantize those that contribute the least accuracy. Previous mixed-precision methods either rely on expensive search techniques such as reinforcement learning (RL) or on end-to-end optimization that offers little interpretation
    of the resulting quantization configuration. Our method is a one-shot, efficient, accuracy-aware information estimation and thus lends better interpretability to the selected bit-width configuration. We also point out a problem with
    the Straight-Through Estimator (STE), which is commonly used for gradient estimation in the quantization field, and discuss ways to address it.

  • Subjects / Keywords
  • Graduation date
    Spring 2022
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-n9my-f856
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
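For readers unfamiliar with the techniques the abstract mentions, the following minimal sketch illustrates two of them: symmetric uniform low-bit quantization of a weight tensor, and the Straight-Through Estimator, which treats the non-differentiable rounding step as the identity in the backward pass. The function names and the per-tensor symmetric scheme are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def quantize_uniform(w, bits):
    """Symmetric per-tensor uniform quantization to the given bit-width.
    Illustrative helper, not the method proposed in the thesis."""
    levels = 2 ** (bits - 1) - 1          # e.g. 127 positive levels for 8-bit signed
    scale = np.max(np.abs(w)) / levels
    if scale == 0:
        return w.copy()
    q = np.round(w / scale)               # non-differentiable rounding step
    return q * scale                      # dequantize back to floating point

def ste_grad(upstream_grad):
    """Straight-Through Estimator: in the backward pass, round() is treated
    as the identity, so the upstream gradient flows through unchanged."""
    return upstream_grad

w = np.array([0.8, -0.31, 0.05, -1.0])
w4 = quantize_uniform(w, 4)   # 4-bit: only 7 positive levels, coarse grid
w8 = quantize_uniform(w, 8)   # 8-bit: much finer grid, smaller error
```

Lower bit-widths shrink the model more but introduce larger rounding error per layer, which is why per-layer (mixed-precision) bit-width selection, as studied in the thesis, matters.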