Effective Pattern Discovery for Text Mining
Presenter : Chuang, Kai-Ting
Authors: Ning Zhong, Yuefeng Li, and Sheng-Tang Wu
2012, TKDE
Intelligent Database Systems Lab
Outlines
Motivation
Objectives
Methodology
Experiments
Conclusions
Comments
Motivation
• Many data mining techniques have been proposed
for mining useful patterns in text documents.
• How to effectively use and update discovered
patterns is still an open research issue.
TID   Items
1     Bread, Milk
2     Bread, Diaper, Beer, Eggs
3     Milk, Diaper, Beer, Coke
4     Bread, Milk, Diaper, Beer
5     Bread, Milk, Diaper, Coke
Frequent Itemset
E.g.: {Milk, Bread, Diaper}, {Milk}
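The frequent itemsets in this example can be checked by counting how many transactions contain each candidate. A minimal sketch, assuming the five transactions listed on this slide; the names `support_count` and `frequent_itemsets` and the threshold `min_count=2` are illustrative choices, not taken from the paper.

```python
from itertools import combinations

# Transactions copied from the TID/Items table on this slide.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset, transactions=TRANSACTIONS):
    """Number of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions)


def frequent_itemsets(transactions=TRANSACTIONS, min_count=2):
    """Brute-force enumeration of all itemsets with support >= min_count.

    Assumption: `min_count` is a hypothetical threshold chosen for this toy
    example. Exhaustive enumeration is only practical for tiny inputs.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = support_count(candidate, transactions)
            if count >= min_count:
                result[frozenset(candidate)] = count
    return result


if __name__ == "__main__":
    print(support_count({"Milk"}))                     # 4 of 5 transactions
    print(support_count({"Milk", "Bread", "Diaper"}))  # 2 of 5 transactions
```

With a threshold of two transactions, both {Milk} and {Milk, Bread, Diaper} qualify as frequent, while {Eggs} (one transaction) does not.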
Objectives
• This paper presents an innovative and effective
pattern discovery technique which includes the
processes of pattern deploying and pattern evolving.
Pattern-based (Phrase-based)
• Low frequency.
• Misinterpretation.
Methodology-Framework
• Pattern Taxonomy Model
• Pattern Deploying Method
• Inner Pattern Evolution
Methodology-Frequent and Closed Patterns
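A frequent pattern is closed when no proper superset has the same support. A minimal sketch of that check, assuming the market-basket transactions from the Motivation slide stand in for the paragraphs and terms used in the paper; `support_count` and `is_closed` are illustrative names, not the authors' code.

```python
# Assumption: toy transactions reused from the Motivation slide; the paper
# itself works with terms occurring in paragraphs of documents.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset, transactions=TRANSACTIONS):
    """Number of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions)


def is_closed(itemset, transactions=TRANSACTIONS):
    """True if every one-item extension of `itemset` has strictly lower support.

    Checking one-item extensions is enough: support never increases when
    items are added, so any larger superset with equal support implies a
    one-item extension with equal support.
    """
    itemset = frozenset(itemset)
    count = support_count(itemset, transactions)
    all_items = set().union(*transactions)
    return all(
        support_count(itemset | {item}, transactions) < count
        for item in all_items - itemset
    )


if __name__ == "__main__":
    # {Beer} always co-occurs with Diaper, so {Beer} is not closed,
    # while {Beer, Diaper} is.
    print(is_closed({"Beer"}))
    print(is_closed({"Beer", "Diaper"}))
```

Keeping only closed patterns removes shorter patterns that carry no extra information: here {Beer} is dropped because {Beer, Diaper} occurs in exactly the same transactions.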
Methodology-Pattern Taxonomy
Methodology-PDM
Methodology-D-Pattern Mining Algorithm
Methodology-IPEvolving
Methodology-Shuffling
The list of methods used for evaluation
Baseline Models-Concept-Based Models
CBM:
CBM Pattern Matching:
Baseline Models-Term-Based Methods
Rocchio:
Prob:
TF-IDF:
BM25:
SVM:
Comparison of all methods on the first 50 topics
Experiment
Conclusions
• The experimental results show that the proposed
model outperforms not only other pure data mining-
based methods and the concept-based model, but
also term-based state-of-the-art models, such as
BM25 and SVM-based models.
Comments
• Advantages
– The approach is helpful.
• Applications
– Text mining.