Design of FPGA-based Accelerators for Deflate Compression and Decompression using High-Level Synthesis

  • Author / Creator
    Ledwon, Morgan
  • As the volume of data transmitted over the internet continues to increase, data compression is becoming increasingly necessary in order to use fixed data storage space and bandwidth efficiently. The Deflate compression algorithm is one of the most widely used lossless data compression algorithms, forming the basis of the .zip and .gz (gzip) file formats and serving as a content encoding in the Hypertext Transfer Protocol (HTTP). The provision of Field-Programmable Gate Arrays (FPGAs) to implement hardware accelerators alongside conventional Central Processing Unit (CPU) servers in the Internet cloud is becoming increasingly popular. The ability of FPGAs to be rapidly reprogrammed and the inherently parallel resources that they provide make FPGAs especially well suited to certain cloud-computing applications, such as data compression and decompression. High-Level Synthesis (HLS) is a relatively new technology that enables FPGA designs to be specified in a high-level programming language like C or C++ instead of a conventional hardware description language, and enables designs to be implemented at a faster pace. This thesis examines the design and implementation of FPGA-based accelerators for both Deflate compression and decompression using high-level synthesis.

    In Deflate compression, a balance needs to be found between the resulting compression ratio and the compression throughput. Achieving higher compression throughput typically requires sacrificing some compression ratio. In Deflate decompression, the inherently serial nature of the compressed format makes task-level parallelization difficult without altering the standard format. In order to maximize the decompression throughput without altering the format, other sources of parallelism need to be found and exploited. Both our compressor and decompressor designs were specified in C++ and synthesized using Vivado HLS for a clock frequency of 250 MHz on a Xilinx Virtex UltraScale+ XCVU3P-FFVC1517 FPGA. Both were tested using the Calgary corpus benchmark files. In the design of the compressor, we examine many aspects of the design that affect the trade-off between compression ratio and compression throughput, such as the hash bank architecture and the hash function. Our implemented compressor design achieved a fixed compression throughput of 4.0 GB/s and a geometric mean compression ratio of 1.92 on the Calgary corpus. In the design of the decompressor, various FPGA hardware resources are utilized to increase the amount of exploitable parallelism so that the decompression process can be accelerated. Our decompressor design achieved average input throughputs of 70.73 MB/s and 130.58 MB/s on dynamically and statically compressed files, respectively, while occupying only 2.59% of the Lookup Tables (LUTs) and 2.01% of the Block Random-Access Memories (BRAMs) on the FPGA.
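
    The two headline metrics reported above can be illustrated with a minimal sketch. It assumes that compression ratio means uncompressed size divided by compressed size and that 1 GB is 10^9 bytes; neither convention is stated in this record. The helper names are hypothetical, the inputs are small stand-in buffers rather than the Calgary corpus, and the software zlib library is used only as a convenient Deflate implementation, not as the FPGA design described in the thesis.

```python
import math
import zlib


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Uncompressed size divided by compressed size (assumed convention)."""
    return len(original) / len(compressed)


def geometric_mean(values):
    """Geometric mean, computed via logarithms for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def bytes_per_cycle(throughput_bytes_per_s: float, clock_hz: float) -> float:
    """Bytes that must be consumed each clock cycle to sustain a throughput."""
    return throughput_bytes_per_s / clock_hz


# Hypothetical stand-in inputs; these are NOT the Calgary corpus files.
samples = [
    b"abc" * 1000,
    bytes(range(256)) * 8,
    b"the quick brown fox " * 200,
]

# zlib wraps a Deflate stream in a small header and checksum.
ratios = [compression_ratio(s, zlib.compress(s)) for s in samples]
print("geometric mean ratio on stand-in data:", geometric_mean(ratios))

# Under the 10^9-bytes assumption, 4.0 GB/s at 250 MHz is 16 bytes per cycle.
print("bytes per cycle:", bytes_per_cycle(4.0e9, 250e6))
```

    Under these assumptions, a fixed throughput of 4.0 GB/s at 250 MHz corresponds to 16 input bytes per clock cycle. The geometric mean is used here because it keeps a single highly compressible file from dominating the average ratio.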

  • Subjects / Keywords
  • Graduation date
    Fall 2019
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-pg03-an82
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.