Infrequent Weighted Item set Mining Using
Frequent Pattern Growth
Abstract:
Frequent weighted item sets represent correlations that hold frequently
in data in which items may be weighted differently. However, in some
contexts, e.g., when the need is to minimize a certain cost function,
discovering rare data correlations is more interesting than mining
frequent ones. Our method operates on a graph where vertices correspond
to frequent items and edges correspond to frequent item sets of size two.
Utility-based data mining is a new research area interested in all types of
utility factors in data mining processes and targeted at incorporating
utility considerations in data mining tasks. The UMining algorithm is
used to find all high utility itemsets within the given utility constraint
threshold. Fast Utility Frequent Mining is a more precise and very recent
algorithm. It takes both the utility and the support measure into
consideration. This method gives the itemsets that are both high utility
and frequent. A new concept is proposed for generating four kinds of
itemsets, namely High utility and high frequent itemsets (HUHF), High
utility and low frequent itemsets (HULF), Low utility and high frequent
itemsets (LUHF) and Low utility and low frequent itemsets (LULF). These
itemsets are generated using the FP-Growth algorithm as the basic
framework.
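As an illustration of the FP-Growth framework mentioned above, the following is a minimal sketch of the first stage of FP-Growth: building the prefix tree (FP-tree) from a transaction database. It is not the project's implementation. The class name, method names and the toy database are assumptions made for this example, and the header table and the recursive mining step are omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: all class and method names are hypothetical.
class FpTreeSketch {
    static class Node {
        final String item;   // null for the root
        int count;           // number of transactions sharing this prefix
        final Map<String, Node> children = new HashMap<String, Node>();
        Node(String item) { this.item = item; }
    }

    // Pass 1: count the transactions containing each item
    // (assumes an item appears at most once per transaction).
    static Map<String, Integer> countItems(List<List<String>> db) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (List<String> t : db) {
            for (String item : t) {
                Integer c = counts.get(item);
                counts.put(item, c == null ? 1 : c + 1);
            }
        }
        return counts;
    }

    // Pass 2: drop infrequent items, order the rest by descending count
    // (ties broken alphabetically) and insert each transaction as a path.
    static Node build(List<List<String>> db, int minCount) {
        final Map<String, Integer> counts = countItems(db);
        Node root = new Node(null);
        for (List<String> t : db) {
            List<String> kept = new ArrayList<String>();
            for (String item : t) {
                if (counts.get(item) >= minCount) kept.add(item);
            }
            Collections.sort(kept, new Comparator<String>() {
                public int compare(String a, String b) {
                    int byCount = counts.get(b) - counts.get(a);
                    return byCount != 0 ? byCount : a.compareTo(b);
                }
            });
            Node cur = root;
            for (String item : kept) {
                Node child = cur.children.get(item);
                if (child == null) {
                    child = new Node(item);
                    cur.children.put(item, child);
                }
                child.count++;
                cur = child;
            }
        }
        return root;
    }

    // Counts the tree nodes, excluding the root.
    static int countNodes(Node n) {
        int total = n.item == null ? 0 : 1;
        for (Node c : n.children.values()) total += countNodes(c);
        return total;
    }

    // Toy database invented for this example.
    static List<List<String>> sampleDb() {
        List<List<String>> db = new ArrayList<List<String>>();
        db.add(Arrays.asList("a", "b", "c"));
        db.add(Arrays.asList("a", "b"));
        db.add(Arrays.asList("a", "d"));
        db.add(Arrays.asList("b", "c"));
        return db;
    }

    public static void main(String[] args) {
        Node root = build(sampleDb(), 2);
        System.out.println("tree nodes: " + countNodes(root));
    }
}
```

With a minimum count of 2, item d is dropped and the four toy transactions share prefixes, so the tree needs only five nodes. This compression is what FP-Growth exploits during mining.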
Existing System:
• Traditional association rule mining (ARM) is used to identify
frequently occurring patterns of item sets.
• The ARM model treats all the items in the database equally,
considering only whether an item is present in a transaction or not.
• The frequent item set mining approach may not satisfy a sales
manager's goal.
• The support measure reflects the statistical correlation of items, but
it does not reflect their semantic significance. In other words,
statistical correlation may not measure how useful an item set is in
accordance with a user's preferences (e.g., profit).
• The profit of an item set depends not only on the support of the
item set, but also on the prices of the items in that item set.
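The gap between support and profit described above can be sketched as follows. The data layout (each transaction as a map from item to purchased quantity), the per-unit profits and all names are assumptions made for this example, not part of the original system.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: names, quantities and profits are hypothetical.
class SupportVsProfitSketch {
    // Builds a two-item transaction: item -> purchased quantity.
    static Map<String, Integer> tx(String i1, int q1, String i2, int q2) {
        Map<String, Integer> t = new HashMap<String, Integer>();
        t.put(i1, q1);
        t.put(i2, q2);
        return t;
    }

    static boolean containsAll(Map<String, Integer> t, String[] itemSet) {
        for (String item : itemSet) {
            if (!t.containsKey(item)) return false;
        }
        return true;
    }

    // Support looks only at presence: fraction of transactions with the item set.
    static double support(List<Map<String, Integer>> db, String[] itemSet) {
        int hits = 0;
        for (Map<String, Integer> t : db) {
            if (containsAll(t, itemSet)) hits++;
        }
        return (double) hits / db.size();
    }

    // Profit also uses quantities and per-unit profit of each item.
    static int profit(List<Map<String, Integer>> db,
                      Map<String, Integer> unitProfit, String[] itemSet) {
        int total = 0;
        for (Map<String, Integer> t : db) {
            if (!containsAll(t, itemSet)) continue;
            for (String item : itemSet) {
                total += t.get(item) * unitProfit.get(item);
            }
        }
        return total;
    }

    static List<Map<String, Integer>> sampleDb() {
        List<Map<String, Integer>> db = new ArrayList<Map<String, Integer>>();
        db.add(tx("bread", 2, "milk", 1));
        db.add(tx("bread", 1, "milk", 1));
        db.add(tx("bread", 1, "milk", 2));
        db.add(tx("tv", 1, "milk", 1));
        return db;
    }

    static Map<String, Integer> sampleUnitProfit() {
        Map<String, Integer> p = new HashMap<String, Integer>();
        p.put("bread", 1);
        p.put("milk", 2);
        p.put("tv", 300);
        return p;
    }

    public static void main(String[] args) {
        String[][] sets = { {"bread", "milk"}, {"tv"} };
        for (String[] s : sets) {
            System.out.println(java.util.Arrays.toString(s)
                    + " support=" + support(sampleDb(), s)
                    + " profit=" + profit(sampleDb(), sampleUnitProfit(), s));
        }
    }
}
```

In this toy data {bread, milk} has support 0.75 but a profit of only 12, while {tv} has support 0.25 and a profit of 300, so a support-only ranking would miss the more profitable item set.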
Disadvantage of Existing System:
The practical usefulness of frequent itemset mining is limited by
the significance of the discovered itemsets. There are two principal
limitations. First, a huge number of frequent itemsets that are not
interesting to the user are often generated when the minimum support is
low. For example, there may be thousands of combinations of products that
occur in 1% of the transactions. If too many uninteresting frequent
itemsets are found, the user is forced to do additional work to select the
itemsets that are indeed interesting. Second, as noted for the existing
system, the support measure does not reflect the semantic significance of
an itemset, such as its profit.
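How quickly the number of frequent itemsets grows as the minimum support is lowered can be illustrated with a brute-force count. This is only a sketch on a toy database: transactions are encoded as bitmasks over four hypothetical items, and the exhaustive enumeration is practical only for a handful of items.

```java
// Illustrative sketch only: names and the toy database are hypothetical.
class FrequentCountSketch {
    // Each transaction is a bitmask: bit i set means item i is present.
    static int[] sampleDb() {
        return new int[] {0b1111, 0b0111, 0b0011, 0b0001, 0b1001};
    }

    // Counts the non-empty item sets contained in at least minCount transactions.
    static int countFrequent(int[] db, int numItems, int minCount) {
        int frequent = 0;
        for (int set = 1; set < (1 << numItems); set++) {
            int hits = 0;
            for (int t : db) {
                if ((t & set) == set) hits++;   // transaction contains the whole set
            }
            if (hits >= minCount) frequent++;
        }
        return frequent;
    }

    public static void main(String[] args) {
        int[] db = sampleDb();
        for (int minCount = 3; minCount >= 1; minCount--) {
            System.out.println("minCount=" + minCount + " -> "
                    + countFrequent(db, 4, minCount) + " frequent itemsets");
        }
    }
}
```

On this toy database the count goes from 3 to 9 to 15 as the minimum count drops from 3 to 2 to 1, which is the effect the paragraph above describes at a much larger scale.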
Proposed System:
• Two novel quality measures are proposed to drive the infrequent
weighted item set (IWI) mining process. The mining of infrequent item
sets that do not contain any infrequent subset is also proposed.
• Experiments, performed on both synthetic and real-life data sets,
show the efficiency and effectiveness of the proposed approach.
• In particular, they show the characteristics and usefulness of the
item sets discovered from benchmarking and real-life data.
• To reduce the computational time, the authors introduce the concept
of a residual tree.
• The item sets that are both highly frequent and of high utility can be
obtained using this method.
• Customer Relationship Management (CRM) is then incorporated
into the system by tracking the customers who are frequent buyers
of the different kinds of item sets.
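The notion of infrequent item sets that do not contain any infrequent subset can be sketched with a brute-force check. This example uses plain support counts as a stand-in, because the two weighted quality measures are not defined in this text. The bitmask encoding, the names and the toy database are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: plain (unweighted) support, hypothetical names.
class MinimalInfrequentSketch {
    // Each transaction is a bitmask: bit i set means item i is present.
    static int[] sampleDb() {
        return new int[] {0b1111, 0b0111, 0b0011, 0b0001, 0b1001};
    }

    static int supportCount(int[] db, int set) {
        int hits = 0;
        for (int t : db) {
            if ((t & set) == set) hits++;
        }
        return hits;
    }

    // True if the set is infrequent and every proper non-empty subset is frequent.
    // Because support never increases when items are added, it is enough to
    // check the subsets obtained by removing a single item.
    static boolean isMinimalInfrequent(int[] db, int numItems, int set, int minCount) {
        if (supportCount(db, set) >= minCount) return false;
        for (int i = 0; i < numItems; i++) {
            int bit = 1 << i;
            if ((set & bit) == 0) continue;
            int sub = set & ~bit;
            if (sub != 0 && supportCount(db, sub) < minCount) return false;
        }
        return true;
    }

    // Returns the bitmasks of all minimal infrequent item sets, in increasing order.
    static List<Integer> minimalInfrequent(int[] db, int numItems, int minCount) {
        List<Integer> result = new ArrayList<Integer>();
        for (int set = 1; set < (1 << numItems); set++) {
            if (isMinimalInfrequent(db, numItems, set, minCount)) result.add(set);
        }
        return result;
    }

    public static void main(String[] args) {
        for (int set : minimalInfrequent(sampleDb(), 4, 2)) {
            System.out.println(Integer.toBinaryString(set));
        }
    }
}
```

With a minimum count of 2, six item sets in the toy database are infrequent, but only two of them ({1, 3} and {2, 3}) have no infrequent subset. Restricting the output to such sets keeps the result compact.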
Advantages of Proposed System:
• In the proposed system, Customer Relationship Management (CRM) is
incorporated by tracking the customers who are frequent buyers of the
different kinds of item sets. To this end, we split Fast Utility
Frequent Mining (FUFM) into four categories:
a. HUHF
b. HULF
c. LUHF
d. LULF
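The four categories and the customer tracking can be sketched as follows. The thresholds, the purchase record layout (customer id followed by the purchased items) and all names are assumptions made for this example. A real system would read these records from the database.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative sketch only: names, thresholds and data are hypothetical.
class ItemsetCategorySketch {
    // Labels an item set by comparing its utility and support count
    // against the two thresholds.
    static String categorize(int utility, int supportCount,
                             int minUtility, int minSupportCount) {
        String u = utility >= minUtility ? "HU" : "LU";
        String f = supportCount >= minSupportCount ? "HF" : "LF";
        return u + f;
    }

    // Each record is {customerId, item, item, ...}.
    static String[][] samplePurchases() {
        return new String[][] {
            {"alice", "bread", "milk"},
            {"alice", "bread", "milk"},
            {"bob", "tv", "milk"},
            {"carol", "bread", "milk"}
        };
    }

    // Item set -> category, using utilities and counts assumed for the example.
    static Map<Set<String>, String> sampleCategories() {
        Map<Set<String>, String> cat = new HashMap<Set<String>, String>();
        cat.put(new HashSet<String>(Arrays.asList("bread", "milk")),
                categorize(12, 3, 10, 2));
        cat.put(new HashSet<String>(Arrays.asList("tv")),
                categorize(300, 1, 10, 2));
        return cat;
    }

    // category -> customer -> number of purchases containing an item set
    // of that category.
    static Map<String, Map<String, Integer>> purchaseCounts(
            String[][] purchases, Map<Set<String>, String> categoryOf) {
        Map<String, Map<String, Integer>> counts =
                new TreeMap<String, Map<String, Integer>>();
        for (String[] p : purchases) {
            Set<String> basket =
                    new HashSet<String>(Arrays.asList(p).subList(1, p.length));
            for (Map.Entry<Set<String>, String> e : categoryOf.entrySet()) {
                if (!basket.containsAll(e.getKey())) continue;
                Map<String, Integer> perCustomer = counts.get(e.getValue());
                if (perCustomer == null) {
                    perCustomer = new TreeMap<String, Integer>();
                    counts.put(e.getValue(), perCustomer);
                }
                Integer c = perCustomer.get(p[0]);
                perCustomer.put(p[0], c == null ? 1 : c + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(purchaseCounts(samplePurchases(), sampleCategories()));
    }
}
```

A customer could then be treated as a frequent buyer of a category once the corresponding count reaches a chosen threshold. That threshold is left out here because the text does not specify one.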
Software requirements:
• Software : Java 1.7
• Tool : NetBeans 7.1
• Database : SQL Server 2000
• Operating System : Windows XP
Hardware requirements:
• 1 GB RAM
• 40 GB hard disk
• Intel(R) Core 2 Duo processor (2.00 GHz)