Building a Competitive Associative Classification Model

Sood, Nitakshi

doi:10.7939/r3-wbe7-g094

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Author / Creator

Sood, Nitakshi
Abstract

The strength of associative classifiers lies in discovering patterns in the data and classifying based on the features that are most indicative for prediction. Although they have emerged as competitive classification systems, they suffer from several limitations. Without prior knowledge, it is cumbersome to choose proper support and confidence threshold values, which vary with the dataset. Most existing rule-based classifiers also produce a large number of classification rules, which affects model readability. This in turn hampers classification accuracy, since noisy rules may not add any useful information for classification, and leads to longer classification time. In this study, we propose SigD2, which uses a novel two-stage pruning strategy that prunes most of the noisy, redundant and uninteresting rules and makes the classification model more accurate and readable.
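As a rough illustration of the support and confidence measures mentioned above, the following sketch computes both for a single class association rule on a toy dataset. The function name, the data layout and the toy transactions are assumptions made for this example; they are not taken from the thesis.

```python
from typing import FrozenSet, List, Tuple

# A transaction is a set of items plus a class label (hypothetical layout).
Transaction = Tuple[FrozenSet[str], str]


def support_and_confidence(
    data: List[Transaction], antecedent: FrozenSet[str], label: str
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule `antecedent -> label`."""
    n = len(data)
    # Labels of all transactions that contain every item of the antecedent.
    covered = [lbl for items, lbl in data if antecedent <= items]
    both = sum(1 for lbl in covered if lbl == label)
    support = both / n if n else 0.0
    confidence = both / len(covered) if covered else 0.0
    return support, confidence


# Toy data, invented purely for illustration.
toy = [
    (frozenset({"a", "b"}), "yes"),
    (frozenset({"a"}), "yes"),
    (frozenset({"a", "c"}), "no"),
    (frozenset({"b", "c"}), "no"),
]
sup, conf = support_and_confidence(toy, frozenset({"a"}), "yes")
```

A threshold-based miner would keep the rule only if both values exceed user-chosen minimums, which is the tuning burden the abstract describes.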
Furthermore, choosing a heuristic for an associative classification system, such as the sum, average, minimum or maximum of the confidence of the rules, is yet another challenging task. In our study, we propose BiLevCSS (Bi-Level Classification using Statistically Significant Rules), a two-stage classification model that learns automatically from the rules. In the first stage of learning, statistically significant classification association rules are derived through association rule mining. In the second stage of learning, we employ a machine learning based algorithm that automatically learns the weights of the rules for classification. We use the p-value obtained from Fisher’s exact test to determine the statistical significance of rules. The rules obtained from the first stage form meaningful features to be used in the second stage of learning. Therefore, in this study, supervised learning classifiers such as neural networks and SVMs, and rule-based classifiers such as RIPPER, learn how to use the rules for classification automatically in the second stage, instead of forcing the use of a specific heuristic.
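To make the significance test above concrete, here is a minimal sketch of a one-sided Fisher's exact test for a rule `X -> c`, computed from the hypergeometric distribution using only the standard library. The function name and the choice of the right tail (co-occurrence at least as high as observed) are assumptions for this example; the thesis may define the test differently.

```python
from math import comb


def fisher_right_tail(n: int, n_x: int, n_c: int, k: int) -> float:
    """One-sided Fisher's exact test p-value for a rule X -> c.

    n   : total number of transactions
    n_x : transactions containing the antecedent X
    n_c : transactions labelled with class c
    k   : transactions containing X and labelled c

    Returns the probability of seeing k or more co-occurrences
    if X and c were independent (hypergeometric right tail).
    """
    upper = min(n_x, n_c)
    total = comb(n, n_x)
    tail = sum(
        comb(n_c, i) * comb(n - n_c, n_x - i) for i in range(k, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 transactions, X appears in 4, class c in 5,
# and all 4 occurrences of X carry class c.
p_value = fisher_right_tail(n=10, n_x=4, n_c=5, k=4)
```

A rule would then be treated as statistically significant when its p-value falls below a chosen significance level; the level itself is not specified here.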
Further, owing to the huge success of deep learning, other machine learning paradigms have had to take a back seat. Yet other models, particularly rule-based ones, are more readable and explainable and can even be competitive when labeled data is not abundant. To make SigDirect more competitive with the most prevalent but uninterpretable machine learning based classifiers, such as neural networks and support vector machines, we further propose bagging and boosting ensembles of the SigDirect classifier. The results of the proposed algorithms are quite promising, and we are able to obtain a minimal set of statistically significant rules for classification without jeopardizing the classification accuracy.
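The bagging idea mentioned above can be sketched generically: train several base models on bootstrap samples and combine them by majority vote. The base learner below is a deliberately trivial, hypothetical stand-in (it is not SigDirect), and all names are assumptions made for this example.

```python
import random
from collections import Counter
from typing import Callable, FrozenSet, List, Tuple

Example = Tuple[FrozenSet[str], str]
Model = Callable[[FrozenSet[str]], str]


def fit_majority(sample: List[Example]) -> Model:
    """Hypothetical stand-in base learner: predicts the sample's most common label."""
    label = Counter(lbl for _, lbl in sample).most_common(1)[0][0]
    return lambda items: label


def bagging(
    data: List[Example],
    fit: Callable[[List[Example]], Model],
    n_models: int = 5,
    seed: int = 0,
) -> Model:
    """Train `n_models` base models on bootstrap samples; predict by majority vote."""
    rng = random.Random(seed)
    # Each bootstrap sample draws len(data) examples with replacement.
    models = [fit([rng.choice(data) for _ in data]) for _ in range(n_models)]

    def predict(items: FrozenSet[str]) -> str:
        votes = Counter(model(items) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Toy data, invented for illustration only.
train = [(frozenset({"a"}), "yes"), (frozenset({"b"}), "yes"), (frozenset({"c"}), "yes")]
ensemble = bagging(train, fit_majority, n_models=3)
prediction = ensemble(frozenset({"a"}))
```

In practice the `fit` argument would be replaced by a real rule-based learner; boosting differs in that later models are trained with more weight on previously misclassified examples.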
Another challenge faced by associative classifiers is their inability to deal with very high dimensional datasets. To address this problem, we divide the high dimensional feature space into smaller subspaces, which are given as input to an ensemble of SigD2 classifiers. Our proposed algorithm, Diverse SubSpace for Ensemble (DSAFE), ensures diversity among the subspaces while ensuring coverage of the total feature space. Although this strategy compromises the explainability of the complete model, each subspace remains a white box, so it is still possible to obtain an explanation of the results. We tested all our models on UCI datasets and found them to outperform various state-of-the-art classifiers, not only in terms of classification accuracy but also in terms of the number of rules. We also tested our classification model on a COVID-19 Kaggle dataset for a prediction problem, and the results obtained are quite promising. Thus, our study highlights the fact that association based classification models can be quite competitive with various other existing approaches. Lastly, we also show that designing a multi-layered architecture for feature transformation on SigD2, as in deep neural networks, can potentially give good results.
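The subspace idea described above, splitting a high dimensional feature set into smaller subspaces that together cover every feature, can be sketched as follows. This example uses a simple random disjoint partition as a placeholder; it does not implement the diversity criterion of DSAFE, and the function name and parameters are assumptions.

```python
import random
from typing import List, Sequence


def split_into_subspaces(
    features: Sequence[str], n_subspaces: int, seed: int = 0
) -> List[List[str]]:
    """Randomly partition `features` into `n_subspaces` disjoint groups.

    Every feature lands in exactly one group, so the groups jointly
    cover the full feature space (placeholder for a diversity-aware split).
    """
    if n_subspaces < 1:
        raise ValueError("n_subspaces must be at least 1")
    shuffled = list(features)
    random.Random(seed).shuffle(shuffled)
    # Stride slicing deals the shuffled features out round-robin.
    return [shuffled[i::n_subspaces] for i in range(n_subspaces)]


# Hypothetical feature names, for illustration only.
all_features = [f"f{i}" for i in range(10)]
subspaces = split_into_subspaces(all_features, 3)
```

Each group would then be handed to one member of the ensemble, so that no single base classifier has to mine rules over the full feature space.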
Graduation date

Fall 2020
Type of Item

Thesis
Degree

Master of Science
DOI

https://doi.org/10.7939/r3-wbe7-g094
License

Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.

Language

English
Institution

University of Alberta
Degree level

Master's
Department
- Department of Computing Science
Supervisor / co-supervisor and their department(s)
- Zaïane, Osmar (Computing Science)