A Study of Unsupervised Outlier Detection for One-Class Classification

  • Author / Creator
    Swersky, Lorne
  • One-class classification arises when we have data describing objects that belong to a particular class but little or no data describing objects that do not, and we must nevertheless classify new data objects according to whether or not they belong to this class. Outlier detection is a similar problem in which we are presented with an unlabelled collection of data and must determine whether each data object is an outlier or an inlier according to some definition of an outlier. In this thesis we explore the relationship between one-class classification and outlier detection by comparing methods used for each problem in a common framework, investigate some issues unique to applying one-class classification in a realistic setting, and consider methods for combining one-class classifiers.

    One-class classification and outlier detection are similar problems in that both aim to classify data objects as either inliers or outliers. We extend previous comparison studies by evaluating a number of one-class classification and unsupervised outlier detection methods in a rigorous experimental setup, comparing them on a large number of datasets with different characteristics. Performance is measured with the commonly used area under the receiver operating characteristic curve (AUC) as well as the adjusted precision-at-n measure used in unsupervised outlier detection. An additional contribution is the adaptation of the unsupervised outlier detection method GLOSH to the one-class classification setting.
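
    As an illustration of the two evaluation measures named above, the following is a minimal sketch, not the thesis's evaluation code. It assumes that higher scores mean "more outlying", that labels are 1 for outliers and 0 for inliers, and that precision-at-n is adjusted for chance as (P@n - |O|/N) / (1 - |O|/N) with n defaulting to the number of true outliers |O|. The thesis may define the adjustment differently. The function names and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (outlier, inlier) pairs ranked correctly.

    labels: 1 for outlier, 0 for inlier. scores: higher = more outlying.
    A tie between an outlier and an inlier counts as half a correct pair.
    """
    outlier_scores = [s for y, s in zip(labels, scores) if y == 1]
    inlier_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not outlier_scores or not inlier_scores:
        raise ValueError("need at least one outlier and one inlier")
    wins = 0.0
    for o in outlier_scores:
        for i in inlier_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(outlier_scores) * len(inlier_scores))


def adjusted_precision_at_n(labels, scores, n=None):
    """Chance-adjusted precision among the n highest-scored objects.

    Assumption: n defaults to the number of true outliers, so a perfect
    ranking yields 1.0 and a random ranking yields 0.0 in expectation.
    """
    num_outliers = sum(labels)
    total = len(labels)
    if n is None:
        n = num_outliers
    if not 0 < num_outliers < total or n <= 0:
        raise ValueError("need both classes present and n > 0")
    # Sort objects from most to least outlying.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    precision = sum(y for _, y in ranked[:n]) / n
    expected = num_outliers / total  # precision of a random ranking
    return (precision - expected) / (1.0 - expected)


# Hypothetical toy data: four inliers followed by two outliers.
labels = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.9, 0.8, 0.7]
print(roc_auc(labels, scores))                             # 0.75
print(round(adjusted_precision_at_n(labels, scores), 3))   # 0.25
```

    In this toy example one inlier receives the highest score, so only one of the top two objects is a true outlier. AUC counts every correctly ordered outlier/inlier pair, whereas adjusted precision-at-n looks only at the top of the ranking.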

    Because few or no outlier data objects are available for training, we cannot follow the standard procedure of using a validation set to estimate the generalization performance of a one-class classification model, and so selecting good parameters for a method can be difficult. We investigate this problem by comparing the performance of methods at different parameter values to determine how stable their performance is with respect to their parameters, and whether certain parameter settings are likely to do well across multiple datasets.

    To combine one-class classifiers, we apply rank-based combination strategies to the outlier rankings produced by multiple one-class classifiers and compare the strategies. We find that simple combinations of ranks can produce robust classifiers that outperform individual classifiers.
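
    The idea of combining rankings rather than raw scores can be sketched as follows. This is an illustrative sketch only: the abstract does not say which combination strategies the thesis compares, so the mean, minimum and maximum of ranks used here are assumptions, as are the function names and the toy scores. Ranks are used because different classifiers produce scores on different scales.

```python
def to_ranks(scores):
    """Convert outlier scores to ranks, where rank 1 is the most outlying.

    Objects with tied scores share the average of the ranks they span.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next object has the same score.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        average = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = average
        pos = end + 1
    return ranks


def combine_ranks(score_lists, how="mean"):
    """Combine several classifiers' rankings into one rank per object.

    A lower combined value means the object is considered more outlying.
    """
    per_object = list(zip(*(to_ranks(s) for s in score_lists)))
    if how == "mean":
        return [sum(r) / len(r) for r in per_object]
    if how == "min":
        return [min(r) for r in per_object]
    if how == "max":
        return [max(r) for r in per_object]
    raise ValueError("how must be 'mean', 'min' or 'max'")


# Hypothetical scores from three classifiers on four objects,
# deliberately on different scales.
classifier_a = [0.9, 0.1, 0.5, 0.3]
classifier_b = [10.0, 2.0, 30.0, 5.0]
classifier_c = [0.2, 0.1, 0.8, 0.8]
combined = combine_ranks([classifier_a, classifier_b, classifier_c])
print(combined)  # [2.0, 4.0, 1.5, 2.5]
```

    In this toy example the third object has the lowest mean rank and so would be treated as the most outlying, even though only one of the three classifiers ranks it first outright.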

  • Subjects / Keywords
  • Graduation date
    Fall 2018
  • Type of Item
  • Degree
    Master of Science
  • DOI
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.