Mining Frequent Patterns Without Candidate Generation

TITLE
What it should contain: objective, method, and significance
The title should convey the objective/purpose, method, and significance
of the paper. The more appealing, the better.
Example:
A Support-Ordered Trie for Fast Frequent Itemset Discovery
Method: Support-Ordered Trie
Objective: Frequent Itemset Discovery
Significance: Fast
TITLE
The title should convey the objective/purpose, method, and
significance of the paper. The more appealing, the better.
Example:
Mining Frequent Patterns Without Candidate Generation: A Frequent-Pattern Tree Approach
Method: Frequent-Pattern Tree
Objective: Mining Frequent Patterns Without Candidate Generation
Significance: fast/efficient??
Jiawei Han et al., "Mining Frequent Patterns Without Candidate Generation: A Frequent-Pattern
Tree Approach," Data Mining and Knowledge Discovery, 8, 53-87, 2004.
ABSTRACT
An abstract should briefly:
• Re-establish the topic of the research.
• Give the research problem and/or main objective of the research (this usually comes first).
• Indicate the methodology used.
• Present the main findings.
• Present the main conclusions.
ABSTRACT
The Body of the Abstract
The abstract is a very brief overview of your
ENTIRE study. It tells the reader
- WHAT you did,
- WHY you did it,
- HOW you did it,
- WHAT you found, and
- WHAT it means.
ABSTRACT
Briefly state the purpose of the research
(introduction), how the problem was studied/solved
(methods), the principal findings (results), and what
the findings mean (discussion and conclusion).
It is important to be descriptive but concise: say only
what is essential, using no more words than necessary
to convey meaning.
Common Problems
Too long. An overly long abstract may bore the reader, and abstracts
usually have a specified maximum word count.
Too much detail. Abstracts that are too long often contain unnecessary
details. The abstract is not the place for detailed explanations of the
methodology or for details about the context of your research problem.
Too short. Shorter is not necessarily better. If your word limit is 200 but
you write only 95 words, you probably have not written in sufficient
detail. Many writers do not give enough information about their findings.
Failure to include important information. Be careful to cover the
points listed above. People often omit some of them because they spend
too long explaining, for example, the methodology, and then lack the
space to present their conclusions.
ABSTRACT
Mining frequent patterns in transaction databases, time-series databases, and
many other kinds of databases has been studied popularly in data mining research.
Most of the previous studies adopt an Apriori-like candidate set generation-and-test
approach. However, candidate set generation is still costly, especially when there
exist a large number of patterns and/or long patterns.
In this study, we propose a novel frequent-pattern tree (FP-tree) structure, which is
an extended prefix-tree structure for storing compressed, crucial information about
frequent patterns, and develop an efficient FP-tree based mining method, FP-growth,
for mining the complete set of frequent patterns by pattern fragment growth.
Efficiency of mining is achieved with three techniques: (1) a large database is
compressed into a condensed, smaller data structure, FP-tree which avoids costly,
repeated database scans, (2) our FP-tree-based mining adopts a pattern-fragment
growth method to avoid the costly generation of a large number of candidate sets,
and (3) a partitioning-based, divide-and-conquer method is used to decompose the
mining task into a set of smaller tasks for mining confined patterns in conditional
databases, which dramatically reduces the search space.
Our performance study shows that the FP-growth method is efficient and scalable for
mining both long and short frequent patterns, and is about an order of magnitude
faster than the Apriori algorithm and also faster than some recently reported new
frequent-pattern mining methods.
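This abstract names three techniques. The first compresses the database into an FP-tree. The second grows patterns from fragments instead of generating candidates. The third recursively mines smaller conditional databases.

The minimal Python sketch below illustrates all three. It works under some simplifying assumptions:
- Transactions are (items, count) pairs, so the same function can also mine conditional pattern bases.
- Each item's node-links are kept as a plain list in a hypothetical `header` dict.
- Ties in support are broken alphabetically.
- The paper's single-path optimization is omitted.

The names `FPNode`, `build_fp_tree`, and `fp_growth` are illustrative and not taken from the paper.

```python
from collections import defaultdict


class FPNode:
    """A prefix-tree node: one item, its count, and parent/child links."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns (root, header, frequent): `header` maps each frequent item to
    the list of nodes holding it (a stand-in for the paper's node-links),
    and `frequent` maps each frequent item to its support.
    """
    # Scan 1: count the support of every item.
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}

    root = FPNode(None, None)
    header = defaultdict(list)
    # Scan 2: insert each transaction's frequent items in support-descending
    # order (ties broken alphabetically) so common prefixes share nodes.
    for items, count in transactions:
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return root, header, frequent


def fp_growth(transactions, min_support, suffix=()):
    """Yield (pattern, support) for every frequent pattern, no candidates."""
    _, header, frequent = build_fp_tree(transactions, min_support)
    # Grow patterns bottom-up: least frequent items first.
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        pattern = suffix + (item,)
        yield frozenset(pattern), frequent[item]
        # Conditional pattern base: the prefix path above each `item` node,
        # weighted by that node's count.
        conditional = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        # Divide and conquer: recursively mine the smaller conditional database.
        yield from fp_growth(conditional, min_support, pattern)


if __name__ == "__main__":
    # Illustrative five-transaction database, each transaction weighted 1.
    db = [(list(t), 1) for t in ["facdgimp", "abcflmo", "bfhjo", "bcksp", "afcelpmn"]]
    patterns = dict(fp_growth(db, min_support=3))
    print(len(patterns))  # 18
    print(patterns[frozenset("fc")])  # 3
```

The database is read only twice to build each tree, and every later step works on a conditional pattern base that is smaller than its parent. At no point is a candidate itemset generated and tested against the data.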
ABSTRACT
Mining frequent patterns in transaction databases, time-series databases, and many
other kinds of databases has been studied popularly in data mining research. Most of
the previous studies adopt an Apriori-like candidate set generation-and-test
approach. [WHAT]
However, candidate set generation is still costly, especially when there
exist a large number of patterns and/or long patterns. [WHY]
In this study, we propose a novel frequent-pattern tree (FP-tree) structure, and
develop an efficient FP-tree based mining method, FP-growth, for mining the
complete set of frequent patterns by pattern fragment growth. Efficiency of mining is
achieved with three techniques: … [HOW]
Our performance study shows that the FP-growth method is efficient and scalable for
mining both long and short frequent patterns, and is about an order of magnitude
faster than the Apriori algorithm and also faster than some recently reported new
frequent-pattern mining methods. [WHAT YOU FOUND]
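The annotated abstract tags the Apriori-like candidate set generation-and-test approach as the [WHAT] and its cost as the [WHY]. To make that cost concrete, here is a minimal sketch of a level-wise generate-and-test loop. It is a simplified Apriori, not a faithful reimplementation:
- Candidates are formed by unioning pairs of frequent k-itemsets rather than by the classic prefix join.
- A candidate is pruned if any of its k-subsets is infrequent.
- Each level ends with a full database scan to count the surviving candidates.

The function name `apriori` is illustrative.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise generate-and-test mining; returns {itemset: support}."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: one scan to count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(current)
    k = 1
    while current:
        # Generate: union pairs of frequent k-itemsets into (k+1)-candidates,
        # keeping only those whose k-subsets are all frequent (Apriori pruning).
        level = list(current)
        candidates = set()
        for i in range(len(level)):
            for j in range(i + 1, len(level)):
                union = level[i] | level[j]
                if len(union) == k + 1 and all(
                    frozenset(sub) in current for sub in combinations(union, k)
                ):
                    candidates.add(union)
        # Test: another full database scan to count every candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: n for s, n in counts.items() if n >= min_support}
        result.update(current)
        k += 1
    return result


if __name__ == "__main__":
    db = ["facdgimp", "abcflmo", "bfhjo", "bcksp", "afcelpmn"]
    print(len(apriori(db, min_support=3)))  # 18
```

Each level costs one more pass over the data. The candidate sets can also grow combinatorially when there are many or long frequent patterns, which is the cost the abstract points to.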
ABSTRACT
The importance of data mining is apparent with the advent of
powerful data collection and storage tools; raw data is so abundant that
manual analysis is no longer possible. [WHAT]
Unfortunately, data mining problems are difficult to solve, and this
prompted the introduction of several novel data structures to improve
mining efficiency. [WHY]
Here, we critically examine existing preprocessing data structures
used in association rule mining for enhancing performance in an attempt
to understand their strengths and weaknesses. Our analysis culminates in
a practical structure called the SOTrieIT (Support-Ordered Trie
Itemset) and two synergistic algorithms to accompany it for the fast
discovery of frequent itemsets. [HOW]
Experiments involving a wide range of synthetic data sets reveal that its
algorithms outperform FP-growth, a recent association rule mining
algorithm with excellent performance, by up to two orders of magnitude,
thus verifying its efficiency and viability. [WHAT YOU FOUND]