Android Malware Detection based on Factorization Machine

  • Author / Creator
    Li, Chenglin
  • Abstract
    With the increasing popularity of Android smartphones in recent years, the amount of Android malware is growing rapidly. Because of the threat it poses to mobile phone users, Android malware detection has become increasingly important in cyber security. Traditional detection methods, such as signature-based ones, cannot protect users from the ever-increasing sophistication and rapid behavior changes of new types of Android malware. Many recent efforts have therefore used machine learning to characterize and discover the malicious behavior patterns of mobile apps. In this thesis, we propose a novel and highly reliable machine learning algorithm for Android malware detection, based on factorization machines and an extensive study of Android app features. We first extract 7 types of features that are highly relevant to malware detection from the manifest file and source code of each mobile app, including Application Programming Interface (API) calls and permissions.
    We make two observations. First, the numerical feature representation of an app usually forms a long and highly sparse vector. Second, the interactions among different features are critical to revealing some malicious behavior patterns. Based on these observations, we propose to use factorization machines, which model pairwise feature interactions and handle sparse inputs well, as a supervised classifier for malware detection. In an extensive performance evaluation, the proposed method achieves a 99.01% detection rate with a 0.09% false positive rate on the DREBIN dataset, and a 99.2% detection rate with a 0.93% false positive rate on the AMD dataset, significantly outperforming a number of state-of-the-art machine-learning-based Android malware detection methods as well as commercial antivirus engines.
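
    The two observations in the abstract can be illustrated with a minimal sketch of how a second-order factorization machine scores a sparse binary feature vector. This is not the thesis's implementation: the function names, the toy parameters, and the four hypothetical features are assumptions made for illustration only. Each app is represented by the indices of the features it contains, and every pair of present features contributes the dot product of their latent vectors.

```python
import math


def fm_score(active, w0, w, V):
    """Second-order factorization machine score for a binary sparse vector.

    active: indices of the features present in the app (x_i = 1, all others 0)
    w0:     global bias
    w:      per-feature linear weights
    V:      per-feature latent vectors, each of length k
    """
    k = len(V[0])
    linear = sum(w[i] for i in active)
    pairwise = 0.0
    for f in range(k):
        # Identity: sum_{i<j} v_if * v_jf = 0.5 * ((sum_i v_if)^2 - sum_i v_if^2)
        s = sum(V[i][f] for i in active)
        s_sq = sum(V[i][f] ** 2 for i in active)
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def malware_probability(active, w0, w, V):
    """Map the score to (0, 1) with a logistic function (an assumed choice)."""
    return 1.0 / (1.0 + math.exp(-fm_score(active, w0, w, V)))


# Made-up parameters for 4 hypothetical features (e.g. two permissions, two API calls).
w0 = -1.0
w = [0.5, 0.0, 0.25, 0.0]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [0.0, 0.0]]

# An app that contains features 0 and 2 only.
score = fm_score([0, 2], w0, w, V)
prob = malware_probability([0, 2], w0, w, V)
```

    Only the present features are visited, so the cost grows with the number of non-zero entries rather than with the full vector length, which is why this model family suits long, sparse representations.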

  • Graduation date
    Fall 2018
  • Degree
    Master of Science
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.