Hardware Accelerators for Deep Neural Networks

Machupalli, Raju

doi:10.7939/r3-gywx-8e23

Deep Neural Networks (DNNs) have emerged as the state-of-the-art method for machine learning applications such as object detection, face recognition, and image classification. However, a DNN typically has high computational complexity, and specialized hardware accelerators help achieve real-time performance.
Over the last decade, many accelerators for DNN models have been proposed in the literature. This thesis presents a comprehensive review of the existing DNN accelerators, classified into four categories based on the optimization techniques they use: ALU, Dataflow, Sparsity, and Hybrid. The classification provides a starting point for identifying areas where an accelerator can be further optimized for throughput, latency, and energy performance.
In this thesis, we also explore the bit-precision requirements of the multiply-accumulate (MAC) units used in DNN implementations. A DNN has two modes of operation: training and inference. It is generally known that inference can be done using lower-precision MAC units, whereas training requires higher-precision MAC units. Lower-precision MAC units consume less energy, which may be desirable for low-power applications. We propose an iterative MAC model in which inference is done using a low-precision MAC in a single pass, and training is done with the same low-precision MAC using multiple passes (to achieve higher bit precision). During training, the proposed model determines the number of iterations on the fly by checking the error magnitude. Experimental results with a LeNet-300-100 model implemented using the iterative MAC show satisfactory performance for digit classification.
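The general idea of reusing one low-precision multiplier over several passes to build up a higher-precision product can be illustrated with a small sketch. This is not the thesis's design: the 8-bit multiplier width, the 16-bit unsigned operands, the pass ordering, and the names `low_precision_mul` and `iterative_mul` are all assumptions made for illustration. The rule the thesis uses to pick the number of passes from the error magnitude is not given in the abstract, so the pass count is a plain parameter here.

```python
CHUNK_BITS = 8  # assumed width of the low-precision multiplier (illustrative)


def low_precision_mul(x: int, y: int) -> int:
    """Model of a hypothetical 8-bit x 8-bit unsigned multiplier."""
    limit = 1 << CHUNK_BITS
    assert 0 <= x < limit and 0 <= y < limit, "operand exceeds multiplier width"
    return x * y


def iterative_mul(a: int, b: int, passes: int) -> int:
    """Approximate a 16-bit x 16-bit unsigned product in 1 to 4 passes.

    Each pass issues one low-precision multiplication and accumulates the
    shifted partial product. Passes run from most to least significant, so
    one pass gives a coarse result and four passes give the exact product.
    """
    mask = (1 << CHUNK_BITS) - 1
    a_hi, a_lo = a >> CHUNK_BITS, a & mask
    b_hi, b_lo = b >> CHUNK_BITS, b & mask

    # (left operand, right operand, left shift), most significant first.
    schedule = [
        (a_hi, b_hi, 2 * CHUNK_BITS),
        (a_hi, b_lo, CHUNK_BITS),
        (a_lo, b_hi, CHUNK_BITS),
        (a_lo, b_lo, 0),
    ]

    acc = 0
    for x, y, shift in schedule[:passes]:
        acc += low_precision_mul(x, y) << shift
    return acc


if __name__ == "__main__":
    a, b = 0x1234, 0x5678
    for n in range(1, 5):
        approx = iterative_mul(a, b, n)
        print(f"passes={n}  result={approx}  error={a * b - approx}")
```

In this sketch a single pass plays the role of the low-precision inference path, while additional passes refine the result toward the full-precision product, which is the kind of trade-off between passes and precision that the abstract describes for training.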
Graduation date

Fall 2023
Type of Item

Thesis
Degree

Master of Science
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Master's
Department
- Department of Electrical and Computer Engineering
Specialization
- Computer engineering
Supervisor / co-supervisor and their department(s)
- Mrinal, Mandal (Electrical and Computer Engineering)