International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 4, Issue 3, March 2014)
Pattern Mining Techniques of Data Mining
Vijay Subhash Patil¹, Prof. Neeta A. Deshpande²
¹PG Student, Department of Computer Engineering, MCOERC, Nashik, Maharashtra, India.
²Associate Professor, Department of Computer Engineering, MCOERC, Nashik, Maharashtra, India.
Abstract— Frequent patterns are patterns that appear frequently in a data set. Frequent pattern mining searches for recurring relationships in a given data set. Several techniques have been proposed to improve the performance of frequent pattern mining algorithms. This paper presents a review of different frequent pattern mining techniques, including Apriori-based algorithms, association rule mining, the CP-tree, and FP-growth. A brief description of each technique is provided, and the techniques are compared on various parameters of importance. We have studied techniques for frequent pattern mining proposed by different researchers and scientists; each technique has its own merits and demerits, and the performance of a particular technique depends on the input data and the available resources. These techniques are found in many applications such as the market basket approach, including applications in marketing and e-commerce, classification, clustering, web mining, bio-informatics, and finance.
Keywords— Data Mining, Frequent Patterns, Knowledge Discovery, Pattern Mining, Information Retrieval.
I. INTRODUCTION
Data mining refers to extracting or "mining" knowledge from large amounts of data. Frequent pattern mining has been a focused theme in data mining research for the past decades. Abundant literature has been dedicated to this research and tremendous progress has been made, ranging from efficient and scalable algorithms for frequent itemset mining in transaction databases to numerous research frontiers, such as sequential pattern mining [1], structural pattern mining [2], correlation mining, associative classification [3], and frequent-pattern-based clustering, as well as their wide applications. In this paper, we provide an overall description of the current status of frequent pattern mining and discuss a few promising research directions. We believe that frequent pattern mining research has substantially broadened the scope of data analysis and will have a deep impact on data mining methodologies and applications in the long run. However, some challenging research issues still need to be solved before frequent pattern mining can claim to be a cornerstone approach in data mining applications.
This paper performs a high-level overview of frequent pattern mining methods, extensions, and applications. Given the rich body of literature on this theme, we organize our discussion by technique. Data mining has been applied in areas such as loan/credit card approval, predicting good customers on the basis of old customers (banking); identifying those who are likely to leave for a competitor (customer relationship management); fraud detection (telecommunications, financial transactions); and identifying likely responders to promotions (targeted marketing). Algorithms based on Apriori [4] and the FP-tree [2], applied to association rule mining, are the most useful for discovering frequent patterns.
The remainder of this paper is organized as follows. Section 2 gives a short discussion of data mining, the KDD process, and the challenges involved in it. Section 3 gives a brief overview of recent work in Frequent Pattern Mining (FPM) and presents a comparison of recent FPM methods, followed by the conclusion in Section 4.
II. DATA MINING AND KNOWLEDGE DISCOVERY
Data mining, also popularly referred to as knowledge
discovery from data (KDD), is the automated or convenient
extraction of patterns representing knowledge implicitly
stored or captured in large databases, data warehouses, the
Web, other massive information repositories, or data
streams. Knowledge discovery as a process is depicted in
Figure 1 and consists of an iterative sequence of the
following steps:
1. Data cleaning - To remove noise and inconsistent data.
2. Data integration - Where multiple data sources may be combined.
3. Data selection - Where data relevant to the analysis
task are retrieved from the database.
4. Data transformation - Where data are transformed or
consolidated into forms appropriate for mining by
performing summary or aggregation operations, for
instance.
5. Data mining - An essential process where intelligent
methods are applied in order to extract data patterns.
6. Pattern evaluation - To identify the truly interesting patterns representing knowledge, based on interestingness measures.
7. Knowledge presentation - Where visualization and
knowledge representation techniques are used to
present the mined knowledge to the user.
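For illustration, these seven steps can be lined up as a small pipeline. The following Python sketch is ours, not the paper's; the step functions and the toy records are hypothetical stand-ins:

# A minimal sketch of the KDD steps chained as a pipeline; all names and
# data below are invented for illustration.

def clean(records):            # 1. data cleaning: drop noisy/inconsistent rows
    return [r for r in records if r.get("amount") is not None]

def integrate(*sources):       # 2. data integration: combine multiple sources
    return [r for src in sources for r in src]

def select(records):           # 3. data selection: keep task-relevant fields
    return [{"customer": r["customer"], "amount": r["amount"]} for r in records]

def transform(records):        # 4. data transformation: aggregate per customer
    totals = {}
    for r in records:
        totals[r["customer"]] = totals.get(r["customer"], 0) + r["amount"]
    return totals

def mine(totals, threshold):   # 5. data mining: extract a simple pattern
    return {c: t for c, t in totals.items() if t >= threshold}

def evaluate(patterns):        # 6. pattern evaluation (trivially keeps all here)
    return patterns

def present(patterns):         # 7. knowledge presentation
    for c, t in sorted(patterns.items()):
        print(f"high-value customer: {c} (total {t})")

src_a = [{"customer": "ann", "amount": 120}, {"customer": "bob", "amount": None}]
src_b = [{"customer": "ann", "amount": 80}, {"customer": "cat", "amount": 40}]
present(evaluate(mine(transform(select(clean(integrate(src_a, src_b)))), 150)))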
Steps 1 to 4 are different forms of data preprocessing,
where the data are prepared for mining. From a database
perspective on knowledge discovery, efficiency and
scalability are key issues in the implementation of data
mining systems. Having a large amount of redundant data
may slow down or confuse the knowledge discovery
process.
Figure 1. The KDD Process
Data mining aims at clustering, association rules, functional dependencies, data summarization, web applications, image retrieval, and so on. Some of the challenges to the use of data mining methodologies include the following:
• The scalability problem involved in extremely large heterogeneous databases spread over multiple files. Data may be on different disks or across the web in different geographical locations, and combining such data into a single very large file may be infeasible.
• To improve prediction accuracy, we need to evaluate features and reduce the dimensionality of datasets.
• To handle dynamic changes in data, we need to choose appropriate metrics and evaluation techniques.
• Integration of user interaction and domain knowledge.
• Quantitative estimation of performance.
• Efficient incorporation of soft computing tools.
Data mining supports knowledge discovery by finding hidden patterns and associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools. The subject of KDD has evolved, and continues to evolve, from the intersection of research in such fields as databases, machine learning, pattern recognition, statistics, artificial intelligence, reasoning with uncertainty, knowledge acquisition for expert systems, data visualization, machine discovery, and high-performance computing. KDD systems incorporate theories, algorithms, and methods from all these fields.
In general, we believe that knowledge discovery is most effective if one can develop an environment for human-centered, exploratory mining of data, that is, where the human user is allowed to play a key role in the process.
III. VARIOUS FREQUENT PATTERN MINING TECHNIQUES
In this section we cover the basics of itemset mining. Most pertinent to this paper, three major classes of itemsets are introduced:
1. A frequent itemset is simply a set of items occurring in at least a given percentage of the transactions (the minimum support).
2. A closed itemset is a set of items that is as large as it can possibly be without losing any transactions.
3. A maximal frequent itemset is a frequent itemset that is not contained in any other frequent itemset.
Unlike closed itemsets, maximal itemsets do not tell us anything about the supports of their subsets.
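To make these three classes concrete, here is a small Python sketch (our illustration; the four-transaction database is invented) that enumerates the frequent, closed, and maximal itemsets for minimum support 2:

from itertools import combinations

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
min_sup = 2

def support(itemset):
    return sum(1 for t in transactions if itemset <= t)

items = set().union(*transactions)
frequent = [set(c) for k in range(1, len(items) + 1)
            for c in combinations(sorted(items), k) if support(set(c)) >= min_sup]

# Closed: no proper superset has the same support.
closed = [x for x in frequent
          if not any(x < y and support(y) == support(x) for y in frequent)]
# Maximal: no proper superset is frequent at all.
maximal = [x for x in frequent if not any(x < y for y in frequent)]

print("frequent:", frequent)   # {a}, {b}, {c}, {a,b}, {a,c}
print("closed:  ", closed)     # {a}, {a,b}, {a,c}
print("maximal: ", maximal)    # {a,b}, {a,c}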
A. Association Rule Mining:
Association rule mining was proposed by Agrawal et al. [3] to understand the relationships among items in transactions or market baskets. For instance, if a customer buys cheese, what is the chance that he or she buys butter at the same time? Such information may be useful for decision makers to determine strategies in a store. More formally, we are given a set I = {I1, I2, …, In} of items (e.g. mango, onion, and knife in a supermarket). The database contains a number of transactions. Each transaction t is a binary vector with t[k] = 1 if t bought item Ik and t[k] = 0 otherwise (e.g. {1, 0, 0, 1, 0}). An association rule is of the form X ⇒ Ij, where X is a set of some items in I, and Ij is a single item not in X (e.g. {tomato, knife} ⇒ plate). A transaction t satisfies X if t[k] = 1 for all items Ik in X. The support of a rule X ⇒ Ij is the fraction of transactions that satisfy the union of X and Ij. A rule X ⇒ Ij has confidence c% if and only if c% of the transactions that satisfy X also satisfy Ij.
Confidence(P → Q) = Probability(Q | P) = support(P ∪ Q) / support(P)
The most common form of association rule is the implication rule A → B, where A ⊂ I, B ⊂ I, and A ∩ B = ∅. The support of the rule A → B is equal to the percentage of transactions in D containing A ∪ B. The confidence of the rule A → B is equal to the percentage of transactions in D containing A that also contain B. The mining process of association rules can be divided into two steps.
1. Generation of frequent item sets: Generate all sets of items that have support greater than a certain threshold, called min support.
2. Generation of association rules: From the frequent item sets, generate all association rules that have confidence greater than a certain threshold, called min confidence.
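As a worked illustration of the support and confidence measures defined above (a Python sketch with an invented five-transaction database, not from the paper):

# Support and confidence for a candidate rule A -> B on a toy database.
transactions = [
    {"cheese", "butter", "bread"},
    {"cheese", "butter"},
    {"cheese", "milk"},
    {"butter", "bread"},
    {"cheese", "butter", "milk"},
]

A, B = {"cheese"}, {"butter"}
n_A  = sum(1 for t in transactions if A <= t)          # transactions containing A
n_AB = sum(1 for t in transactions if (A | B) <= t)    # containing both A and B

support = n_AB / len(transactions)   # fraction of all transactions with A u B
confidence = n_AB / n_A              # fraction of A-transactions also having B

print(f"support = {support:.2f}")      # 3/5 = 0.60
print(f"confidence = {confidence:.2f}")  # 3/4 = 0.75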
Association rule mining plays an important role in the data mining literature, and many challenges remain in the development of efficient and effective methods. On closer examination, we find that the application of association rules requires much more investigation in order to address more specific targets, and we may see a trend towards the study of applications of association rules. The Apriori and FP-growth algorithms are used in mining association rules.
B. Apriori Principle Based Mining:
The Apriori principle calculates the probability of an item being present in a frequent itemset given that another item or items are also present; that is, every subset of a frequent itemset must itself be frequent. E.g., if {biscuits, milk, nuts} is frequent, {biscuits, milk} must be frequent too. Apriori is an influential algorithm for mining frequent item sets for Boolean association rules [3]. The Apriori method was proposed by Agrawal and Srikant in 1994 [5]; a similar level-wise algorithm was given by Mannila et al. in 1994 [6]. Figure 2 illustrates the working of the Apriori principle. Apriori Property: a subset of a frequent itemset must be frequent, i.e., if {A, B} is a frequent itemset, both {A} and {B} must be frequent itemsets. Join Operation: to find Lk, a set of candidate k-itemsets is generated by joining Lk-1 with itself; the frequent itemsets are the sets of items that have minimum support.

Figure 2. Illustrating the Apriori principle

Apriori Algorithm (Ck: candidate itemsets of size k; Lk: frequent itemsets of size k):
L1 = {frequent 1-itemsets};
for (k = 2; Lk-1 ≠ ∅; k++) do begin
    Ck = candidates generated from Lk-1;
    for each transaction t in the database do
        increment the count of all candidates in Ck that are contained in t;
    Lk = candidates in Ck with min_support;
end
return ∪k Lk;
Apriori finds frequent itemsets of cardinality 1 to k (k-itemsets) in iterations, and then uses the frequent itemsets to generate association rules. According to the Apriori property, all subsets of a frequent itemset must also be frequent, so any candidate with an infrequent subset can be eliminated without counting its support.
For example, take {I1, I2, I3}. Its 2-item subsets are {I1, I2}, {I1, I3} and {I2, I3}. Since all 2-item subsets of {I1, I2, I3} are members of L2, we keep {I1, I2, I3} in C3.
Another example, {I2, I3, I5}, shows how the pruning is performed. Its 2-item subsets are {I2, I3}, {I2, I5} and {I3, I5}. But {I3, I5} is not a member of L2 and hence is not frequent, violating the Apriori property, so we have to remove {I2, I3, I5} from C3. Therefore, C3 = {{I1, I2, I3}, {I1, I2, I5}} after checking all members of the result of the join operation for pruning. Now the transactions in D are scanned in order to determine L3, consisting of those candidate 3-itemsets in C3 having minimum support.
At the end of each scan, transactions that are potentially useful are retained for the next iteration. A technique called scan reduction uses the candidate 2-item sets to generate subsequent candidate item sets. If all intermediate data can be held in main memory, only one scan is required to generate all candidate frequent item sets; one more scan of the data is then required to verify whether the candidate frequent item sets are actually frequent.
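The level-wise procedure can be sketched in Python as follows (our rendering of the pseudocode above, not the authors' code; the toy database is chosen so that the run reproduces the C3 pruning example just described):

from itertools import combinations

def apriori(transactions, min_sup):
    """Level-wise Apriori: generate C_k from L_{k-1}, prune by the Apriori
    property, then count supports with one database pass per level."""
    items = sorted(set().union(*transactions))
    L = [frozenset([i]) for i in items
         if sum(1 for t in transactions if i in t) >= min_sup]
    all_frequent = list(L)
    k = 2
    while L:
        # Join step: candidates of size k from pairs of frequent (k-1)-itemsets.
        candidates = {a | b for a in L for b in L if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in set(L) for s in combinations(c, k - 1))}
        # Count supports in one scan of the database.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        L = [c for c, n in counts.items() if n >= min_sup]
        all_frequent.extend(L)
        k += 1
    return all_frequent

db = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
      {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
      {"I1", "I2", "I3"}]
print(apriori(db, min_sup=2))   # C3 keeps {I1,I2,I3} and {I1,I2,I5} only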
C. FP-Growth Method:
The FP-growth method is a divide-and-conquer strategy that decomposes both the mining task and the database according to the frequent patterns obtained, thereby focusing the search on smaller conditional databases. Its other advantages include no candidate generation and no candidate tests, working instead on a compressed database. This scalable frequent pattern method was proposed by Han et al. [7] in 2000 and is based on the FP-tree; the FP-growth algorithm mines frequent patterns without candidate generation.
The Frequent Pattern tree (FP-tree) is designed as a compact data structure for efficient frequent pattern mining. Let the transaction database be DB and the minimum support threshold be 3 (i.e., ξ = 3). A compact data structure can be designed based on the following observations. Since only the frequent items play a role in frequent-pattern mining, it is necessary to perform one scan of the transaction database DB to identify the set of frequent items (with their frequency counts obtained as a by-product). If the set of frequent items of each transaction can be stored in some compact structure, repeated scanning of the original transaction database can be avoided. If multiple transactions share a set of frequent items, the shared sets can be merged, with the number of occurrences registered as a count. It is easy to check whether two sets are identical if the frequent items in all of the transactions are listed according to a fixed order.
The FP-tree is one of the best approaches to discovering frequent patterns and overcomes the drawbacks of the Apriori algorithm. It requires only two passes of processing: one pass for ordering and counting the frequent items, and the other for inserting those frequent items into the tree. The FP-tree performs better than Apriori because it reduces database scans. The FP-growth method compresses a large database into a compact Frequent-Pattern tree (FP-tree) structure, which is highly condensed but complete for frequent pattern mining, and thus avoids costly database scans. The steps taken to perform pattern mining are simple: scan DB once and find the frequent 1-itemsets (single-item patterns); order the frequent items in descending order of their frequency; then scan DB again and construct the FP-tree.
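These two passes can be sketched as follows (our illustration, assuming a simplified node structure; ties in frequency are broken alphabetically, and the toy database is invented):

from collections import Counter

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}

def build_fp_tree(transactions, min_sup):
    # Pass 1: count item frequencies and keep only the frequent items.
    freq = Counter(i for t in transactions for i in t)
    freq = {i: n for i, n in freq.items() if n >= min_sup}
    order = sorted(freq, key=lambda i: (-freq[i], i))   # frequency-descending

    # Pass 2: insert each transaction's frequent items along a shared prefix path.
    root = Node(None, None)
    for t in transactions:
        node = root
        for item in [i for i in order if i in t]:
            child = node.children.setdefault(item, Node(item, node))
            child.count += 1
            node = child
    return root

def dump(node, depth=0):
    if node.item is not None:
        print("  " * depth + f"{node.item}:{node.count}")
    for child in node.children.values():
        dump(child, depth + 1)

db = [{"f", "a", "c", "m", "p"}, {"f", "a", "c", "b", "m"},
      {"f", "b"}, {"c", "b", "p"}, {"f", "a", "c", "m", "p"}]
dump(build_fp_tree(db, min_sup=3))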
D. ECLAT Algorithm:
The Eclat algorithm is used to perform itemset mining. Zaki 2000 [8] used the Eclat algorithm for exploring the vertical data format. The idea of the Eclat algorithm is to use tidset intersections to compute the support of a candidate itemset, avoiding the generation of subsets that do not exist in the prefix tree.
The Eclat algorithm is recursively defined. Initially it uses all the single items with their tidsets. In each recursive call, each itemset-tidset pair is intersected with all the other pairs to generate new candidates; if a new candidate is frequent, it is added to the set of frequent itemsets, and the algorithm then recursively finds all the frequent itemsets in that branch. The algorithm searches in a depth-first manner to find all the frequent sets.
Eclat takes a depth-first search and adopts a vertical layout to represent the database, in which each item is represented by the set of transaction IDs (called a tidset) of the transactions that contain the item. It is difficult to exploit the downward closure property as in Apriori. However, tidsets have the advantage that no separate support counting is needed: the support of an itemset is simply the size of the tidset representing it. The main operation of Eclat is intersecting tidsets, so the size of the tidsets is one of the main factors affecting the running time and memory usage of Eclat; the larger the tidsets, the more time and memory are required.
Zaki proposed a new vertical data representation, called the diffset, and introduced dEclat, an Eclat-based algorithm using diffsets. Instead of tidsets, dEclat uses the differences of tidsets (called diffsets). Using diffsets considerably reduces the size of the sets representing itemsets, and thus operations on sets are much faster. dEclat has been shown to achieve significant improvements in performance as well as memory usage over Eclat, especially on dense databases. However, when the dataset is sparse, diffsets lose their advantage over tidsets. Therefore, Zaki suggested using the tidset format at the start for sparse databases and then switching to the diffset format later, when a switching condition is met.
Eclat is based on two main steps: 1) candidate generation and 2) pruning. In the candidate generation step, each k-itemset candidate is generated from two frequent (k-1)-itemsets and its support is counted; if its support is lower than the threshold it is discarded, otherwise it is a frequent itemset and is used to generate (k+1)-itemsets. Since Eclat uses the vertical layout, counting support is trivial. Candidate generation is in effect a search of the search tree: a depth-first search that starts with the frequent items in the item base, so that 2-itemsets are reached from 1-itemsets, 3-itemsets from 2-itemsets, and so on. Eclat starts with the prefix {}, so the search tree is the initial search tree. To divide the initial search tree, it picks the prefix {a}, generates the corresponding equivalence class, and mines frequent itemsets in the subtree of all itemsets containing {a}; this subtree is divided further into two subtrees by picking the prefix {ab}: the first subtree consists of all itemsets containing {ab}, the other of all itemsets containing {a} but not {b}. This process recurses until all itemsets in the initial search tree are visited. The search tree of an item base {a, b, c, d, e} is represented by the tree in Figure 3 below.
Figure 3. Search tree on item base {a, b, c, d, e}
Following the depth-first search of Eclat, we pick the prefix {a} and generate an equivalence class with the item sets {ab, ac, ad, ae}, which are all the 2-itemsets containing a. In this subtree we pick the prefix {ab}, and the equivalence class we obtain consists of the item sets {abc, abd, abe}. Each node in the tree is thus the prefix of an equivalence class whose item sets sit right below it. It can be seen that Eclat does not fully exploit the downward closure property because of its depth-first search.
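A small Python sketch of this tidset-based depth-first search (our illustration with an invented vertical database; a dEclat variant would store, for each extension, the diffset t(P) - t(Pxy) instead of the intersected tidset):

def eclat(prefix, pairs, min_sup, out):
    """Depth-first search over (itemset, tidset) pairs; the support of an
    itemset is simply the size of its tidset."""
    while pairs:
        item, tids = pairs.pop(0)
        out[tuple(sorted(prefix + [item]))] = len(tids)
        # Extend the current prefix: intersect tidsets with the remaining items.
        suffix = [(other, tids & other_tids)
                  for other, other_tids in pairs
                  if len(tids & other_tids) >= min_sup]
        if suffix:
            eclat(prefix + [item], suffix, min_sup, out)
    return out

# Vertical layout: each item maps to the set of transaction IDs containing it.
vertical = {"a": {1, 2, 3, 4}, "b": {1, 2, 5}, "c": {1, 3, 4}, "d": {2, 4}}
min_sup = 2
pairs = [(i, t) for i, t in sorted(vertical.items()) if len(t) >= min_sup]
print(eclat([], pairs, min_sup, {}))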
E. Mining Closed & Maximal Itemsets:
Mining closed and maximal itemsets can use a vertical bitmap representation that extracts the information by ANDing columns. This method has better scalability and interoperability. Mining maximal frequent item sets is one of the most fundamental problems in data mining.
Burdick et al. [9] extend the idea of DepthProject and give an algorithm called MAFIA to mine maximal frequent item sets. Similar to DepthProject, their method also uses a bitmap representation, where the count of an item set is based on a column of the bitmap (called a "vertical bitmap"). As an example, in Figure 4(a), the bit vectors for items B, C, and D are 111110, 011111, and 110110, respectively. To get the bit vector for any item set, we only need to apply the bitwise AND operation to the bit vectors of the items in the item set. In this example, the bit vector for item set BC is 111110 AND 011111, which equals 011110, while the bitmap for item set BCD can be calculated from the bitmaps of BC and D, i.e., 011110 AND 110110, which is 010110. The count of an item set is the number of 1's in its bit vector. MAFIA is a depth-first algorithm; the testing order is indicated by the number at the top-right side of each item set. Besides subset-infrequency pruning and superset-frequency pruning, some other pruning techniques are also used in MAFIA. As an example, the support of an item set X ∪ Y equals the support of X if and only if t(X ∪ Y) = t(X); this is the case if the bit vector for Y has a 1 in every position where the bit vector for X has a 1. This last condition is easy to test, and it allows us to conclude without counting that X ∪ Y is also frequent. The technique is called Parent Equivalence Pruning.

Figure 4. Bitmap representation of depth-first search
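The bitwise counting and the Parent Equivalence Pruning test can be sketched in a few lines of Python, using integers as bit vectors (the vectors for B, C, and D are taken from the example above; everything else is our illustration):

# Vertical bitmaps: one bit per transaction (leftmost bit = first transaction).
B, C, D = 0b111110, 0b011111, 0b110110

BC  = B & C        # 0b011110 -> bit vector for itemset {B, C}
BCD = BC & D       # 0b010110 -> bit vector for itemset {B, C, D}

count = bin(BCD).count("1")     # support = number of 1-bits in the vector
print(f"{BCD:06b}", count)      # prints: 010110 3

# Parent Equivalence Pruning test: support(X u Y) == support(X) exactly
# when Y's vector has a 1 wherever X's vector does, i.e. X & Y == X.
X, Y = BC, C
print(X & Y == X)  # True: adding C to BC cannot change the support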
For a frequent itemset A, if there exists no item B such that every transaction containing A also contains B, then A is a frequent closed pattern. In other words, frequent itemset A is closed if there is no item B, not already in A, that always accompanies A in all transactions where A occurs. Closed itemsets are a concise representation of frequent patterns: all frequent patterns, with their supports, can be generated from the frequent closed ones, which helps to reduce the number of patterns and rules extracted from a given transactional dataset.
Frequent itemset A is maximal if there is no other frequent itemset B that is a superset of A. In other words, no other frequent pattern includes a maximal pattern. Maximal itemsets are an even more concise representation of frequent patterns, but the information about supports is lost: all frequent patterns can be generated from the frequent maximal ones, but without their respective supports. The difference between the two is shown in Figure 5.

Figure 5. Maximal vs. closed itemsets
F. CP Tree:
The CP-tree, also known as the compact pattern tree, captures database information with one scan and provides the same mining performance as the FP-growth method. The CP-tree [10] introduces the concept of dynamic tree restructuring to produce a highly compact frequency-descending tree structure at runtime.
The CP-tree method retains all items in the tree structure, regardless of whether they are frequent or not. Since the CP-tree maintains the complete information of DB in a highly compact frequency-descending manner, it is quite easy to delete or update transactions and to insert new transactions into the tree. When adding new transactions to the tree, each transaction is inserted according to the current I-list order. Consequently, through the tree restructuring operation, the final compact tree has the highest count values at the uppermost portion of the tree.
CP-tree construction mainly consists of two phases:
(i) Insertion phase: the algorithm scans transactions and inserts them into the tree according to the current item order of the I-list, then updates the frequency counts of the respective items in the I-list (the I-list maintains the current frequency value of each item).
(ii) Restructuring phase: this phase rearranges the I-list according to the frequency-descending order of the items and restructures the tree nodes according to this newly rearranged I-list.
These two phases are executed dynamically in an alternating fashion, starting with the insertion phase that scans and inserts the first part of DB, and finishing with the restructuring phase at the end of DB.
Figure 6. CP-tree insertion and restructuring phases

Another objective of the periodic tree restructuring technique is to construct a frequency-descending tree with reduced overall restructuring cost. The construction of a CP-tree starts with an insertion phase. As shown in Figure 6, the first insertion phase begins by inserting the first transaction, {b, a, e}, into the tree in lexicographical item order. Since the tree is restructured after every three transactions, the first insertion phase ends there and the first restructuring phase starts immediately. Since the items inserted so far are not in frequency-descending order, the CP-tree at this stage is like a frequency-independent tree with a lexicographical item order. To rearrange the tree structure, first the item order of the I-list is rearranged into frequency-descending order, and then the tree is restructured according to the new I-list item order. Note that items with higher count values are arranged at the uppermost portion of the tree; the CP-tree at this stage is therefore a frequency-descending tree offering greater prefix sharing among patterns in tree nodes. The first restructuring phase terminates when the full process of I-list rearrangement and tree restructuring is completed, and CP-tree construction then enters the next insertion phase. The CP-tree repeats this procedure until all transactions of the database have been inserted into the tree. The CP-tree thus improves the possibility of prefix sharing among all the patterns in DB with one DB scan: more frequently occurring items are more likely to be shared, and they are arranged closer to the root of the CP-tree.
Thus, the goal of constructing the CP-tree is to obtain a significant improvement in mining performance based on its compact tree structure.
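A rough Python sketch of the two alternating phases (our simplification: restructuring is emulated by re-inserting the transactions seen so far in the new I-list order, which reproduces the effect, though not the in-place mechanics, of CP-tree restructuring; the initial item order here is arrival order rather than the paper's lexicographical order):

from collections import Counter

class Node:
    def __init__(self):
        self.count, self.children = 0, {}

def insert(root, transaction, order):
    node = root
    for item in sorted(transaction, key=order.index):
        node = node.children.setdefault(item, Node())
        node.count += 1

def build_cp_tree(db, restructure_every=3):
    i_list, counts, seen = [], Counter(), []
    root = Node()
    for n, t in enumerate(db, 1):
        # Insertion phase: insert in the *current* I-list order.
        for item in t:
            if item not in i_list:
                i_list.append(item)          # new items join the I-list
        counts.update(t)
        seen.append(t)
        insert(root, t, i_list)
        # Restructuring phase: every few transactions, sort the I-list in
        # frequency-descending order and rebuild the tree in the new order.
        if n % restructure_every == 0:
            i_list.sort(key=lambda i: (-counts[i], i))
            root = Node()
            for old in seen:
                insert(root, old, i_list)
    return root, i_list

db = [{"b", "a", "e"}, {"a", "c"}, {"b", "a"}, {"d", "a"}, {"b", "c"}, {"a", "b"}]
root, i_list = build_cp_tree(db)
print("final I-list order:", i_list)   # frequency-descending, e.g. a before b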
G. Sliding Window Method:
Let I = {i1, i2, …, in} be the set of all data items. An itemset X is a subset of I (X ⊆ I), and an itemset containing k items is a k-itemset. A transaction T is an itemset, and a data stream can be seen as a continuously arriving transaction sequence DS = {T1, T2, …, TN}, where T1 is the transaction with the earliest arrival time in the data stream and TN is the latest-arriving transaction. Let w represent the fixed size of the basic sliding window; that is, only the w most recent transactions are kept in the basic sliding window [11].
Figure 7. Sliding window protocol
The data stream DS can be segmented according to the number w: each group of w transactions corresponds to a sub-sequence of the data stream, so the size (or width) of the basic sliding window is w. The current basic sliding window is represented as swi = {T'1, T'2, …, T'w}, where sw indicates the basic sliding window and i is the current window number (i.e., the i-th basic window). The sliding window SW consists of a continuous series of basic windows swi, denoted <sw1, sw2, …, swk>; the number of basic windows it contains is the size of the sliding window, denoted |SW| = k. The size of a sliding window defines the desired lifetime of the information in a newly generated transaction. A sliding window method is used to find recently frequent itemsets over an online data stream, so that recent changes in the data stream can be adaptively reflected in the current mining result.
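A minimal Python sketch of a basic sliding window of size w over a stream (our illustration; the brute-force itemset counting and the toy stream stand in for the model of [11]):

from collections import deque
from itertools import combinations

def frequent_in_window(window, min_sup):
    """Count itemsets only over the w most recent transactions."""
    counts = {}
    for t in window:
        for k in range(1, len(t) + 1):
            for c in combinations(sorted(t), k):
                counts[c] = counts.get(c, 0) + 1
    return {c: n for c, n in counts.items() if n >= min_sup}

w = 3                      # size of the basic sliding window
window = deque(maxlen=w)   # keeps only the w most recent transactions

stream = [{"a", "b"}, {"a", "c"}, {"a", "b"}, {"b", "c"}, {"b"}]
for i, t in enumerate(stream, 1):
    window.append(t)       # the oldest transaction falls out automatically
    print(f"after T{i}: {frequent_in_window(window, min_sup=2)}")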
IV. CONCLUSION
We have studied several frequent pattern mining techniques used in data mining and compared them with one another. Our analysis shows that each technique is unique, with its own merits and demerits; the performance of a particular technique depends on the input data and the available resources. These techniques are found in many applications beyond the market basket approach, including marketing and e-commerce, classification, clustering, web mining, bio-informatics, and finance. In future work, we would like to study and evaluate the applications of each technique in real-world scenarios.
REFERENCES
[1] Agrawal, R. and Srikant, R. 1995. Mining sequential patterns. In Proc. 1995 Int. Conf. Data Engineering (ICDE'95), Taipei, Taiwan, pp. 3-14.
[2] Pei, J., Han, J., Lu, H., Nishio, S., Tang, S., and Yang, D. 2001. H-Mine: Hyper-structure mining of frequent patterns in large databases. In Proc. 2001 Int. Conf. Data Mining (ICDM'01), San Jose, CA, pp. 441-448.
[3] Agrawal, R., Imielinski, T., and Swami, A. 1993. Mining association rules between sets of items in large databases.
[4] Goswami, D.N. et al. 2010. An algorithm for frequent pattern mining based on Apriori. (IJCSE) International Journal on Computer Science and Engineering, Vol. 02, No. 04, pp. 942-947.
[5] Agrawal, R. and Srikant, R. 1994. Fast algorithms for mining association rules. In Proc. 1994 Int. Conf. Very Large Data Bases (VLDB'94), Santiago, Chile, pp. 487-499.
[6] Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., and Verkamo, A.I. 1996. Fast discovery of association rules.
[7] Han, J., Pei, J., and Yin, Y. 2000. Mining frequent patterns without candidate generation. In Proc. 2000 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'00), Dallas, TX, pp. 1-12.
[8] Zaki, M.J. 2000. Scalable algorithms for association mining. IEEE Trans. Knowledge and Data Engineering, 12:372-390.
[9] Burdick, D., Calimlim, M., and Gehrke, J. 2001. MAFIA: A maximal frequent itemset algorithm for transactional databases. In Proc. 2001 Int. Conf. Data Engineering (ICDE'01), Heidelberg, Germany, pp. 443-452.
[10] Tanbeer, S.K., Ahmed, C.F., Jeong, B.-S., and Lee, Y.-K. 2009. Efficient single-pass frequent pattern mining using a prefix-tree. Information Sciences, 179(5), 559-583.
[11] Deypir, M., Sadreddini, M.H., and Hashemi, S. 2012. Towards a variable size sliding window model for frequent itemset mining over data streams. Computers & Industrial Engineering, 63, 161-172.