An Investigation on Self-Attentive Models for Malware Classification

  • Author / Creator
    Lu, Qikai
  • Malware classification is a critical task in cybersecurity. It offers insight into the threats that different malware pose to victim devices and aids in designing precautionary measures. In real-world applications, because of the vast amount of malware present in networks, real-time malware classification must be both accurate and fast. In this thesis, we first investigate the application of self-attentive models to classifying malicious binary files from raw bytes alone. We propose two transformer-based models. The first model, SeqConvAttn, performs sequence-based classification using byte sequences extracted from binary files. Noting that the feedforward latency of SeqConvAttn scales poorly with input sequence length, we then experiment with converting binary files into images and introduce the second model, ImgConvAttn, which applies self-attention to image-based classification. Next, we investigate integrating the two models into a two-stage framework, so that the superior accuracy and the low latency of the respective models can both be leveraged. Through experiments on the BIG 2015 Dataset provided by the Microsoft Malware Classification Challenge and a select subset of the BODMAS Malware Dataset, we demonstrate that self-attention can enhance the accuracy of malware classifiers for both sequence-based and image-based classification. Furthermore, we demonstrate that our two-stage framework design can significantly reduce inference latency while maintaining high accuracy.
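
The abstract does not give implementation details, so the following is only a minimal sketch of two ideas it mentions: converting a binary file into an image and routing a sample through a two-stage classifier. Everything here is an assumption made for illustration, not the thesis's method: the fixed image width, the zero padding, the confidence threshold, the choice to run the image-based model first with the sequence-based model as a fallback, and the names `bytes_to_image`, `two_stage_classify`, `fast_model`, and `slow_model`. The two models are passed in as plain callables that return class probabilities.

```python
import numpy as np


def bytes_to_image(data: bytes, width: int = 64) -> np.ndarray:
    """Reshape raw bytes into a 2-D grayscale array (one byte per pixel).

    Assumption: fixed row width, with the last row zero-padded.
    """
    arr = np.frombuffer(data, dtype=np.uint8)
    rows = max(1, -(-arr.size // width))  # ceiling division, at least one row
    padded = np.zeros(rows * width, dtype=np.uint8)
    padded[: arr.size] = arr
    return padded.reshape(rows, width)


def two_stage_classify(data: bytes, fast_model, slow_model, threshold: float = 0.9):
    """Hypothetical confidence-gated cascade.

    Stage 1 runs the (assumed) low-latency image-based model. If its top
    class probability reaches `threshold`, that prediction is returned.
    Otherwise stage 2 runs the (assumed) more accurate sequence-based model
    on the raw byte sequence.
    """
    probs = np.asarray(fast_model(bytes_to_image(data)))
    if probs.max() >= threshold:
        return int(probs.argmax()), "fast"
    probs = np.asarray(slow_model(np.frombuffer(data, dtype=np.uint8)))
    return int(probs.argmax()), "slow"
```

In such a design, the threshold trades latency against accuracy: a higher value sends more samples to the slower second stage.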

  • Graduation date
    Fall 2021
  • Degree
    Master of Science
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.