Permanent link (DOI): https://doi.org/10.7939/R32J6858P

Communities

This file is in the following communities:

Computing Science, Department of

Collections

This file is in the following collections:

Technical Reports (Computing Science)

Inverted Matrix: Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining Open Access

Descriptions

Author or creator
El-Hajj, Mohammad
Zaiane, Osmar
Subject/Keyword
Data Mining
Frequent Patterns
Database Systems
Type of item
Report
Language
English
Description
Technical report TR03-08. Existing association rule mining algorithms suffer from many problems when mining massive transactional datasets. One major problem is high memory dependency: either the gigantic data structure built is assumed to fit in main memory, or the recursive mining process is too voracious in memory resources. Another major impediment is the repetitive and interactive nature of any knowledge discovery process: tuning parameters requires many runs of the same algorithm, so these huge data structures are built time and again. This paper proposes a new disk-based association rule mining algorithm called Inverted Matrix, which achieves its efficiency by applying three new ideas. First, transactional data is converted into a new database layout, called Inverted Matrix, that prevents multiple scans of the database during the mining phase; frequent patterns can be found in less than a full scan with random access. Second, for each frequent item, a relatively small independent tree is built, summarizing co-occurrences. Finally, a simple, non-recursive mining process reduces memory requirements because only minimal candidate generation and counting are needed. Experimental studies reveal that our Inverted Matrix approach outperforms FP-Tree, especially in mining very large transactional databases with a very large number of unique items. Our random access disk-based approach is particularly advantageous in a repetitive and interactive setting.
Date created
2003
DOI
doi:10.7939/R32J6858P
License information
Creative Commons Attribution 3.0 Unported
Rights

Citation for previous publication

Source
Link to related item
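
Illustration (not taken from the report): the description above motivates an item-oriented layout that can be reused across many mining runs. The sketch below shows that general idea with a plain in-memory inverted index from items to transaction ids. It is an assumption-laden toy, not the Inverted Matrix layout or the co-occurrence trees of TR03-08, whose details the abstract does not give. The function names and the sample transactions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_inverted_index(transactions):
    """Scan the transactions once and map each item to the ids of the
    transactions that contain it (a generic inverted index)."""
    index = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            index[item].add(tid)
    return dict(index)


def frequent_items(index, min_support):
    """Return {item: support} for items meeting the support threshold.
    Only the index is consulted; the transactions are not rescanned."""
    return {
        item: len(tids)
        for item, tids in index.items()
        if len(tids) >= min_support
    }


def frequent_pairs(index, min_support):
    """Return {(a, b): support} for frequent item pairs, computed by
    intersecting the transaction-id sets of frequent single items."""
    items = sorted(frequent_items(index, min_support))
    result = {}
    for a, b in combinations(items, 2):
        support = len(index[a] & index[b])
        if support >= min_support:
            result[(a, b)] = support
    return result


# Hypothetical sample data, for illustration only.
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "C"},
]

index = build_inverted_index(transactions)  # built once

# Interactive tuning: try several thresholds against the same index.
for min_support in (2, 3):
    print(min_support, frequent_items(index, min_support),
          frequent_pairs(index, min_support))
```

The point of the sketch is only that the expensive conversion step happens once, after which each change of the support threshold is answered from the converted layout. The report's disk-based layout and non-recursive mining process are described in the PDF itself.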

File Details

Date Uploaded
Date Modified
2014-04-30T23:19:46.899+00:00
Characterization
File format: pdf (Portable Document Format)
Mime type: application/pdf
File size: 191390
Last modified: 2015-10-12T17:00:52-06:00
Filename: TR03-08.pdf
Original checksum: 73505644105269aade502abc9e572d13
Well formed: true
Valid: true
Page count: 10