Mining Frequent Patterns Without Candidate Generation

TITLE

What should be in it: objective, method, and significance.

The title should convey the objective/purpose, the method, and the significance of the paper. The nicer the better.

Example: A Support-Ordered Trie for Fast Frequent Itemset Discovery
• Method: Support-Ordered Trie
• Significance: Fast
• Objective: Frequent Itemset Discovery

Example: Mining Frequent Patterns Without Candidate Generation: A Frequent-Pattern Tree Approach
• Method: Frequent-Pattern Tree
• Objective: Mining Frequent Patterns Without Candidate Generation
• Significance: fast/efficient ??

Jiawei Han et al., "Mining Frequent Patterns Without Candidate Generation: A Frequent-Pattern Tree Approach", Data Mining and Knowledge Discovery, 8, 53-87, 2004

ABSTRACT

An abstract should briefly:
• Re-establish the topic of the research.
• Give the research problem and/or main objective of the research (this usually comes first).
• Indicate the methodology used.
• Present the main findings.
• Present the main conclusions.

ABSTRACT

The Body of the Abstract

The abstract is a very brief overview of your ENTIRE study. It tells the reader
- WHAT you did,
- WHY you did it,
- HOW you did it,
- WHAT you found, and
- WHAT it means.

ABSTRACT

Briefly state the purpose of the research (introduction), how the problem was studied/solved (methods), the principal findings (results), and what the findings mean (discussion and conclusion). It is important to be descriptive but concise: say only what is essential, using no more words than necessary to convey meaning.

Common Problems

Too long. If your abstract is too long, readers may lose interest. An abstract usually has a specified maximum number of words.

Too much detail. Abstracts that are too long often contain unnecessary details. The abstract is not the place for detailed explanations of methodology or for details about the context of your research problem.

Too short. Shorter is not necessarily better. If your word limit is 200 but you write only 95 words, you probably have not written in sufficient detail. Many writers do not give sufficient information about their findings.

Failure to include important information. You need to be careful to cover the points listed above. Often people do not cover all of them because they spend too long explaining, for example, the methodology and then do not have enough space to present their conclusion.

ABSTRACT

Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist a large number of patterns and/or long patterns. In this study, we propose a novel frequent-pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a condensed, smaller data structure, FP-tree, which avoids costly, repeated database scans, (2) our FP-tree-based mining adopts a pattern-fragment growth method to avoid the costly generation of a large number of candidate sets, and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent-pattern mining methods.
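To make the three techniques named in the abstract above more concrete, here is a minimal Python sketch of an FP-tree and FP-growth-style mining. It is an illustration under stated assumptions, not the authors' implementation: the names FPNode, build_fp_tree, fp_growth and frequent_itemsets, the tie-breaking rule for items with equal support, and the toy transactions are all hypothetical and chosen only for this example.

```python
from collections import defaultdict


class FPNode:
    """One node of the prefix tree: an item, a count, and a parent link."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build a tree from (items, count) pairs using two passes over the data."""
    # Pass 1: count the support of every single item.
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}

    # Pass 2: insert each transaction with its frequent items sorted by
    # descending support, so that common prefixes share nodes (compression).
    root = FPNode(None, None)
    header = defaultdict(list)  # item -> every tree node that carries it
    for items, count in transactions:
        ordered = sorted(
            (i for i in set(items) if i in frequent),
            key=lambda i: (-frequent[i], i),  # assumed tie-break: item name
        )
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return root, header, frequent


def fp_growth(transactions, min_support, suffix=()):
    """Yield (itemset, support) pairs by growing pattern fragments."""
    _, header, frequent = build_fp_tree(transactions, min_support)
    for item, item_support in frequent.items():
        pattern = (item,) + suffix
        yield frozenset(pattern), item_support
        # Conditional database for this item: the prefix path above each of
        # its nodes, weighted by that node's count (divide and conquer).
        conditional = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        yield from fp_growth(conditional, min_support, pattern)


def frequent_itemsets(transactions, min_support):
    """Convenience wrapper: plain lists of items in, {itemset: support} out."""
    weighted = [(list(t), 1) for t in transactions]
    return dict(fp_growth(weighted, min_support))


# Made-up toy data, used only to exercise the sketch.
toy = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
for itemset, count in sorted(
    frequent_itemsets(toy, 3).items(), key=lambda kv: sorted(kv[0])
):
    print(sorted(itemset), count)
```

The sketch never enumerates candidate itemsets: each recursive call sees only the prefix paths of one item, which corresponds to the "conditional databases" mentioned in the abstract. For brevity it rebuilds a small tree from each conditional database instead of reusing tree nodes, so it shows the idea rather than the performance claimed in the paper.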
ABSTRACT

Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. [WHAT]

However, candidate set generation is still costly, especially when there exist a large number of patterns and/or long patterns. [WHY]

In this study, we propose a novel frequent-pattern tree (FP-tree) structure, and develop an efficient FP-tree based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: … [HOW]

Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent-pattern mining methods. [WHAT YOU FOUND]

ABSTRACT

The importance of data mining is apparent with the advent of powerful data collection and storage tools; raw data is so abundant that manual analysis is no longer possible. [WHAT]

Unfortunately, data mining problems are difficult to solve and this prompted the introduction of several novel data structures to improve mining efficiency. [WHY]

Here, we will critically examine existing preprocessing data structures used in association rule mining for enhancing performance in an attempt to understand their strengths and weaknesses. Our analysis culminates in a practical structure called the SOTrieIT (Support-Ordered Trie Itemset) and two synergistic algorithms to accompany it for the fast discovery of frequent itemsets. [HOW]

Experiments involving a wide range of synthetic data sets reveal that its algorithms outperform FP-growth, a recent association rule mining algorithm with excellent performance, by up to two orders of magnitude and, thus, verifying its efficiency and viability. [WHAT YOU FOUND]
