Fast Parallel Association Rule Mining Without Candidacy Generation

  • Author(s) / Creator(s)
  • Technical report TR01-12. Searching for frequent patterns in transactional databases is considered one of the most important data mining problems. Most current association mining algorithms, whether sequential or parallel, adopt an apriori-like algorithm that requires full multiple I/O scans of the data set and expensive computation to generate the potential frequent items. The recent explosive growth in data collection made the current association rule mining algorithms restricted and inadequate to analyze excessively large transaction sets due to the above mentioned limitations. In this paper we introduce a new parallel algorithm MLFPT (Multiple Local Frequent Pattern Tree) for parallel mining of frequent patterns, based on FP-growth mining, that uses only two full I/O scans of the database, eliminating the need for generating the candidate items, and distributing the work fairly among processors to achieve near optimum load balance. We have devised partitioning strategies at different stages of the mining process to achieve near optimal balancing between processors. This algorithm has been experimented on databases made of hundreds of thousands of dimensions and size larger than 10 Giga bytes using 64 processors SGI origin shared memory machine. We have successfully tested our algorithm on datasets larger than 50 million transactions. This paper presents the results of our algorithm on different data sizes experimented on different numbers of processors, and studies the effect of these variations on the overall performance. | TRID-ID TR01-12

  • Date created
  • Subjects / Keywords
  • Type of Item
  • DOI
  • License
    Attribution 3.0 International