Fast Parallel Association Rule Mining Without Candidacy Generation

Lu, Paul; El-Hajj, Mohammad; Zaiane, Osmar

doi:doi:10.7939/R3RN3078C

This decommissioned ERA site remains active temporarily to support our final migration steps to https://ualberta.scholaris.ca, ERA's new home. All new collections and items, including Spring 2025 theses, are at that site. For assistance, please contact erahelp@ualberta.ca.

View

Download

Communities and Collections

Computing Science, Department of / Technical Reports (Computing Science)

Usage

243 views
379 downloads

Fast Parallel Association Rule Mining Without Candidacy Generation

Author(s) / Creator(s)
Technical report TR01-12. Searching for frequent patterns in transactional databases is considered one of the most important data mining problems. Most current association mining algorithms, whether sequential or parallel, adopt an apriori-like algorithm that requires full multiple I/O scans of the data set and expensive computation to generate the potential frequent items. The recent explosive growth in data collection made the current association rule mining algorithms restricted and inadequate to analyze excessively large transaction sets due to the above mentioned limitations. In this paper we introduce a new parallel algorithm MLFPT (Multiple Local Frequent Pattern Tree) for parallel mining of frequent patterns, based on FP-growth mining, that uses only two full I/O scans of the database, eliminating the need for generating the candidate items, and distributing the work fairly among processors to achieve near optimum load balance. We have devised partitioning strategies at different stages of the mining process to achieve near optimal balancing between processors. This algorithm has been experimented on databases made of hundreds of thousands of dimensions and size larger than 10 Giga bytes using 64 processors SGI origin shared memory machine. We have successfully tested our algorithm on datasets larger than 50 million transactions. This paper presents the results of our algorithm on different data sizes experimented on different numbers of processors, and studies the effect of these variations on the overall performance. | TRID-ID TR01-12
Date created

2001
Subjects / Keywords
Type of Item

Report
DOI

https://doi.org/10.7939/R3RN3078C
License

Attribution 3.0 International

Language
- English