Statistically Significant Dependencies for Spatial Co-location Pattern Mining and Classification Association Rule Discovery

  • Author / Creator
    Li, Jundong
  • Spatial co-location pattern mining and classification association rule discovery are two canonical tasks studied in the data mining community. Both focus on detecting sets of features that show associations. The difference is that in spatial co-location pattern mining, all features are spatial features that carry location information, whereas in classification association rule discovery, the mining process is constrained to generate association rules whose consequent is always a class label. Existing methods for these two tasks mostly mine co-location patterns and classification association rules with the support-confidence framework, either in an Apriori-like way or through an FP-growth approach, which requires setting confounding parameters. However, because this framework does not consider statistical dependencies between features, it may omit many interesting patterns and/or detect meaningless rules. To address these limitations, we fully exploit the property of statistical significance and propose two novel algorithms, one for each task. The first, CMCStatApriori, is a co-location mining algorithm that is able to detect more general and statistically significant co-location rules. We apply it to real datasets with the National Pollutant Release Inventory (NPRI), and propose a classification scheme to help evaluate the discovered co-location rules. The second algorithm, SigDirect, is an associative classifier that mines classification association rules showing statistically significant dependencies between a set of antecedent features and a class label. Experimental results on UCI datasets show that SigDirect achieves competitive, if not better, classification performance while producing a very small number of rules. We also show the potential of integrating statistically significant negative classification association rules into the SigDirect algorithm.

  • Subjects / Keywords
  • Graduation date
  • Type of Item
  • Degree
    Master of Science
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
  • Language
  • Institution
    University of Alberta
  • Degree level
  • Department
    • Department of Computing Science
  • Supervisor / co-supervisor and their department(s)
    • Zaiane, Osmar (Computing Science)
  • Examining committee members and their departments
    • Zaiane, Osmar (Computing Science)
    • Sander, Joerg (Computing Science)
    • Musilek, Petr (Electrical and Computer Engineering)