Classification
Classification and Prediction
Classification
  predicts categorical class labels
  class labels are discrete or nominal
  constructs a model from the training data based on the values of the classifying attribute (the class label), and uses it to classify new data
Prediction
  models continuous-valued functions, i.e., predicts unknown or missing values
Classification Problem
Given
  a database D = {t1, t2, …, tn} and
  a set of classes C = {C1, …, Cm},
the Classification Problem is to define a mapping f: D → C where each ti is assigned to one class.
This actually divides D into equivalence classes.
Prediction is similar, but may be viewed as having an infinite number of classes.
Classification Ex: Grading
 If x >= 90 then grade = A.
 If 80 <= x < 90 then grade = B.
 If 70 <= x < 80 then grade = C.
 If 60 <= x < 70 then grade = D.
 If x < 60 then grade = F.
[Figure: the equivalent decision tree tests x < 90, x < 80, x < 70, x < 60 in turn; the branches >= 90, >= 80, >= 70, >= 60 lead to grades A, B, C, D, and the remaining branch leads to F.]
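The grading rules above are already a complete classifier; a minimal sketch of the same mapping in Python (the function name and the sample scores are illustrative):

def grade(x: float) -> str:
    """Map a numeric score x to a letter grade using the rules above."""
    if x >= 90:
        return "A"
    if x >= 80:
        return "B"
    if x >= 70:
        return "C"
    if x >= 60:
        return "D"
    return "F"

print([grade(s) for s in (95, 83, 71, 65, 40)])   # ['A', 'B', 'C', 'D', 'F']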
Classification Ex: Letter Recognition
View letters as constructed from 5 components:
[Figure: the letters A through F, each drawn from a small set of line-segment components.]
Classification Examples
 Teachers classify students' grades as A, B, C, D, or F.
 Identify mushrooms as poisonous or edible.
 Classify a river as flooding or not.
 Identify an individual as a certain credit risk.
 Speech recognition: identify the spoken "word" from the classified examples (rules).
 Pattern recognition: identify a pattern from the classified ones.
 Typical applications
 credit approval, target marketing, medical diagnosis, treatment effectiveness analysis
Classification: Two Major Steps
 Model construction: describe a set of predetermined classes
 every record is assumed to belong to a predetermined class, as determined by the class label attribute
 training set: the set of records used to build the model
 the model is represented as (1) classification rules, (2) decision trees, or (3) mathematical formulae
 Model usage: classify future or unknown objects
 estimate the accuracy of the model
 feed test samples with known class labels into the model and compare the model's answer with the known class
 accuracy rate: the percentage of test set samples correctly classified by the model
 the test set must be independent of the training set, otherwise over-fitting occurs
 if the accuracy is acceptable, use the model to classify data whose class labels are unknown
 Most common techniques use DTs, NNs, or are based on distances or statistical methods
Process 1: Model Construction
Training Data:
  NAME   RANK            YEARS  TENURED
  Mike   Assistant Prof  3      no
  Mary   Assistant Prof  7      yes
  Bill   Professor       2      yes
  Jim    Associate Prof  7      yes
  Dave   Assistant Prof  6      no
  Anne   Associate Prof  3      no
The classification algorithm produces the classifier (model), e.g.:
  IF rank = 'professor' OR years > 6
  THEN tenured = 'yes'
Process 2: Using the Model
Testing Data:
  NAME     RANK            YEARS  TENURED
  Tom      Assistant Prof  2      no
  Merlisa  Associate Prof  7      no
  George   Professor       5      yes
  Joseph   Assistant Prof  7      yes
Unseen Data:
  (Jeff, Professor, 4) -> Tenured?
Defining Classes
[Figures: classes defined by distance (distance based) and by partitioning the space (partitioning based).]
Supervised vs. Unsupervised Learning
 Supervised learning (classification)
 supervision: the training data (observations, measurements, etc.) carry labels indicating the class of each observation
 new data is classified according to the training set
 Unsupervised learning (clustering)
 the class labels of the training data are unknown
 given a set of observations or measurements, the aim is to establish the classes or clusters present in the data
Issues (1): Data Preparation
 Data cleaning
 preprocess data in order to reduce noise and handle missing values
 Relevance analysis, i.e., attribute (feature) selection
 remove irrelevant or redundant attributes
 Data transformation
 generalize and/or normalize data
 Missing data
 ignore, or replace with an assumed value
Issues (2): Evaluating Classification Methods
 Predictive accuracy
 classification accuracy on test data
 confusion matrix
 OC curve
 Speed and scalability
 time to build the classifier
 time to use the classifier
 Robustness
 handling noise and missing values
 Scalability
 efficiency on disk-resident databases
 Interpretability
 understanding and insight provided by the model
 Goodness of the rules
 decision tree size
 compactness of the classification rules
Height Example Data
Name       Gender  Height  Output1  Output2
Kristina   F       1.6m    Short    Medium
Jim        M       2m      Tall     Medium
Maggie     F       1.9m    Medium   Tall
Martha     F       1.88m   Medium   Tall
Stephanie  F       1.7m    Short    Medium
Bob        M       1.85m   Medium   Medium
Kathy      F       1.6m    Short    Medium
Dave       M       1.7m    Short    Medium
Worth      M       2.2m    Tall     Tall
Steven     M       2.1m    Tall     Tall
Debbie     F       1.8m    Medium   Medium
Todd       M       1.95m   Medium   Medium
Kim        F       1.9m    Medium   Tall
Amy        F       1.8m    Medium   Medium
Wynette    F       1.75m   Medium   Medium
(Output1 and Output2 are two alternative class assignments whose accuracy can be compared.)
Classification Performance
[Figure: the four outcomes of a binary classifier: true positive, false negative, false positive, true negative.]
Confusion Matrix Example
Using the height data example, with Output1 as the correct class membership and Output2 as the actual assignment:
  Actual \ Assigned   Short   Medium   Tall
  Short               0       4        0
  Medium              0       5        3
  Tall                0       1        2
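As a check on the matrix above, a small sketch that tabulates Output1 (treated as the correct class) against Output2 (the assigned class) for the 15 height tuples; the variable names are illustrative:

from collections import Counter

# (Output1 = correct class, Output2 = assigned class) for the 15 height tuples
pairs = [("Short", "Medium"), ("Tall", "Medium"), ("Medium", "Tall"),
         ("Medium", "Tall"), ("Short", "Medium"), ("Medium", "Medium"),
         ("Short", "Medium"), ("Short", "Medium"), ("Tall", "Tall"),
         ("Tall", "Tall"), ("Medium", "Medium"), ("Medium", "Medium"),
         ("Medium", "Tall"), ("Medium", "Medium"), ("Medium", "Medium")]

labels = ["Short", "Medium", "Tall"]
counts = Counter(pairs)   # (actual, assigned) -> count

for actual in labels:
    print(actual, [counts[(actual, assigned)] for assigned in labels])
    # Short [0, 4, 0] / Medium [0, 5, 3] / Tall [0, 1, 2]

accuracy = sum(counts[(c, c)] for c in labels) / len(pairs)
print("accuracy =", accuracy)   # 7 correct out of 15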
Operating Characteristic Curve
Classification Using Decision Trees
Partitioning based: divide the search space into rectangular regions.
A tuple is placed into a class based on the region within which it falls.
DT approaches differ in how the tree is built (DT induction).
Internal nodes are associated with attributes, and arcs with values of those attributes.
Algorithms: ID3, C4.5, CART
Decision Tree
Given:
  D = {t1, …, tn} where ti = <ti1, …, tih>
  the database schema contains {A1, A2, …, Ah}
  classes C = {C1, …, Cm}
a Decision or Classification Tree is a tree associated with D such that
  each internal node is labeled with an attribute Ai,
  each arc is labeled with a predicate that can be applied to the attribute at its parent, and
  each leaf node is labeled with a class Cj.
DT Induction
DT Splits Area
[Figure: the search space of the height example split into regions by Gender (M/F) and Height.]
Comparing DTs
[Figure: two decision trees for the same data, one balanced and one deep.]
Decision Tree Induction
Training Dataset:
  age    income  student  credit_rating  buys_computer
  <=30   high    no       fair           no
  <=30   high    no       excellent      no
  31…40  high    no       fair           yes
  >40    medium  no       fair           yes
  >40    low     yes      fair           yes
  >40    low     yes      excellent      no
  31…40  low     yes      excellent      yes
  <=30   medium  no       fair           no
  <=30   low     yes      fair           yes
  >40    medium  yes      fair           yes
  <=30   medium  yes      excellent      yes
  31…40  medium  no       excellent      yes
  31…40  high    yes      fair           yes
  >40    medium  no       excellent      no
Output: A Decision Tree for "buys_computer"
  age?
    <=30   -> student?
                no  -> no
                yes -> yes
    31..40 -> yes
    >40    -> credit_rating?
                excellent -> no
                fair      -> yes
(an example from Quinlan's ID3)
Decision Tree Induction Algorithm
 Basic method (a greedy algorithm)
 the tree is constructed top-down, recursively, in a divide-and-conquer manner
 at the start, all the examples are at the root
 only categorical attributes are handled (continuous-valued attributes are discretized in advance)
 examples are partitioned recursively based on the selected attributes
 attributes are selected on the basis of (1) a heuristic or (2) a statistical measure (e.g., information gain)
 Conditions for stopping the partitioning
 all samples at a node belong to the same class
 no attributes remain for further partitioning; the class of the leaf is decided by majority vote
 no samples are left
Decision Tree Induction is often based on
Information Theory
Information
DT Induction
When all the marbles in the bowl are mixed up, little information is given.
When the marbles in the bowl are all from one class and those in the other two classes are on either side, more information is given.
Use this approach with DT induction!
Information/Entropy
 Given probabilities p1, p2, …, ps whose sum is 1, entropy is defined as
   H(p1, p2, …, ps) = Σ_i pi · log2(1/pi) = − Σ_i pi · log2(pi)
 Entropy measures the amount of randomness, surprise, or uncertainty.
 Goal in classification
 no surprise
 entropy = 0
Entropy
[Figure: plots of log(1/p) and of the binary entropy H(p, 1−p) as functions of p.]
Attribute Selection: Information Gain (ID3/C4.5)
 Select the attribute with the highest information gain.
 S contains si tuples of class Ci for i = {1, …, m}
 information measures the info required to classify any arbitrary tuple:
   I(s1, s2, …, sm) = − Σ_{i=1..m} (si / s) · log2(si / s)
 entropy of attribute A with values {a1, a2, …, av}:
   E(A) = Σ_{j=1..v} ((s1j + … + smj) / s) · I(s1j, …, smj)
 information gained by branching on attribute A:
   Gain(A) = I(s1, s2, …, sm) − E(A)
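A minimal sketch of these three measures in Python; the function names and argument layout are illustrative, not part of ID3/C4.5 itself:

import math

def info(counts):
    """I(s1, ..., sm): expected information to classify a tuple, from the class counts."""
    s = sum(counts)
    return -sum((si / s) * math.log2(si / s) for si in counts if si > 0)

def entropy_of_attribute(partitions):
    """E(A): information weighted over the partitions induced by attribute A.
    `partitions` is a list of per-value class-count lists [s1j, ..., smj]."""
    s = sum(sum(p) for p in partitions)
    return sum((sum(p) / s) * info(p) for p in partitions)

def gain(class_counts, partitions):
    """Gain(A) = I(s1, ..., sm) - E(A)."""
    return info(class_counts) - entropy_of_attribute(partitions)

print(round(gain([9, 5], [[2, 3], [4, 0], [3, 2]]), 3))   # ~0.247, the Gain(age) of the example below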
Information Gain Example
 Class P: buys_computer = "yes"
 Class N: buys_computer = "no"
 I(p, n) = I(9, 5) = 0.940
 Compute the entropy for age (using the buys_computer training set shown earlier):
   age     pi  ni  I(pi, ni)
   <=30    2   3   0.971
   31…40   4   0   0
   >40     3   2   0.971
 E(age) = (5/14)·I(2,3) + (4/14)·I(4,0) + (5/14)·I(3,2) = 0.694
 (5/14)·I(2,3) means that "age <= 30" covers 5 of the 14 tuples, of which 2 are 'yes' and 3 are 'no'.
 Hence
   Gain(age) = I(p, n) − E(age) = 0.246
 Similarly,
   Gain(income) = 0.029
   Gain(student) = 0.151
   Gain(credit_rating) = 0.048
Generating Rules from a Decision Tree
 Represent the knowledge in the form of IF-THEN rules
 one rule is created for each path from the root to a leaf
 each attribute-value pair along a path forms a conjunction of conditions
 the leaf node holds the class prediction
 rules are easier for humans to understand
 Example (see the sketch below)
 IF age = "<=30" AND student = "no" THEN buys_computer = "no"
 IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
 IF age = "31…40" THEN buys_computer = "yes"
 IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "no"
 IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "yes"
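A sketch of the path-to-rule idea in Python. The nested-tuple tree representation and the function name are assumptions made for illustration, not a prescribed data structure; the tree encoded is the buys_computer tree shown earlier:

# buys_computer decision tree: (attribute, {branch value: subtree or leaf class})
tree = ("age", {
    "<=30":   ("student", {"no": "no", "yes": "yes"}),
    "31..40": "yes",
    ">40":    ("credit_rating", {"excellent": "no", "fair": "yes"}),
})

def rules(node, conditions=()):
    """Yield one IF-THEN rule per root-to-leaf path."""
    if isinstance(node, str):                      # leaf: emit the accumulated conjunction
        cond = " AND ".join(f'{a} = "{v}"' for a, v in conditions) or "TRUE"
        yield f'IF {cond} THEN buys_computer = "{node}"'
        return
    attribute, branches = node
    for value, subtree in branches.items():
        yield from rules(subtree, conditions + ((attribute, value),))

for r in rules(tree):
    print(r)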
Avoiding Overfitting
 Overfitting: the induced tree may overfit the training data
 too many branches: some may reflect anomalies due to noise or outliers
 poor accuracy on unseen samples
 Two approaches
 Prepruning: halt tree construction early
 do not split a node if the split would cause the goodness measure to fall below a threshold
 choosing an appropriate threshold is difficult
 Postpruning: remove branches from a "fully grown" tree
 this yields a sequence of progressively pruned trees
 use a data set different from the training data to decide which pruned tree is best
Approaches to Determining the Final Tree Size
 Separate training (2/3) and testing (1/3) sets
 used for data sets with a large number of samples
 Use cross validation, e.g., 10-fold cross validation (sketched below)
 divide the data set into k subsamples
 use k−1 subsamples as training data and one subsample as test data (k-fold cross validation)
 suitable for data sets of moderate size
 Use all the data for training
 but apply a statistical test (e.g., chi-square) to estimate whether expanding or pruning a node can improve the overall distribution
 Use the minimum description length (MDL) principle
 halt tree growth when the encoding is minimized
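A minimal sketch of the k-fold splitting described above, in plain Python; the evaluate(train, test) callback is an assumption standing in for "build the classifier on train and return its accuracy on test":

import random

def k_fold_indices(n, k, seed=0):
    """Split the indices 0..n-1 into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(data, k, evaluate):
    """Average the score of `evaluate` over k train/test splits."""
    folds = k_fold_indices(len(data), k)
    scores = []
    for i, test_idx in enumerate(folds):
        test = [data[j] for j in test_idx]
        train = [data[j] for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(evaluate(train, test))
    return sum(scores) / k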
Information Gain example
YES: 9, NO: 5, Total = 14
P(YES) = 9/14 = 0.64
P(NO) = 5/14 = 0.36
I(Y, N) = −( 9/14 · log2(9/14) + 5/14 · log2(5/14) ) = 0.94
-----------------------------------------------------------------------
  age     buys_computer
  <=30    no
  <=30    no
  31…40   yes
  >40     yes
  >40     yes
  >40     no
  31…40   yes
  <=30    no
  <=30    yes
  >40     yes
  <=30    yes
  31…40   yes
  31…40   yes
  >40     no
Age (<=30):  I(Y, N) = −( 2/5 · log2(2/5) + 3/5 · log2(3/5) ) = 0.97   (YES 2, NO 3)
Age (31..40): I(Y, N) = −( 4/4 · log2(4/4) + 0/4 · log2(0/4) ) = 0     (YES 4, NO 0)
Age (>40):   I(Y, N) = −( 2/5 · log2(2/5) + 3/5 · log2(3/5) ) = 0.97   (YES 3, NO 2)
E(age) = 5/14 · I(Y,N)_{age<=30} + 4/14 · I(Y,N)_{age 31..40} + 5/14 · I(Y,N)_{age>40} = 0.694
Gain = 0.94 − 0.694 = 0.246
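The same arithmetic as a self-contained Python sketch, grouping the 14 tuples by age; the variable names are illustrative:

import math
from collections import Counter, defaultdict

# (age, buys_computer) for the 14 training tuples
data = [("<=30", "no"), ("<=30", "no"), ("31..40", "yes"), (">40", "yes"),
        (">40", "yes"), (">40", "no"), ("31..40", "yes"), ("<=30", "no"),
        ("<=30", "yes"), (">40", "yes"), ("<=30", "yes"), ("31..40", "yes"),
        ("31..40", "yes"), (">40", "no")]

def info(counts):
    s = sum(counts)
    return -sum(c / s * math.log2(c / s) for c in counts if c)

overall = info(list(Counter(label for _, label in data).values()))   # I(9, 5)

groups = defaultdict(list)
for age, label in data:
    groups[age].append(label)

e_age = sum(len(v) / len(data) * info(list(Counter(v).values())) for v in groups.values())
print(round(overall, 2), round(e_age, 3), round(overall - e_age, 3))  # 0.94 0.694 0.247 (0.246 with the slide's rounding)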
Example: Error Rate
[Figure: a decision tree evaluated on a test set of 84 tuples. The root sends 2/3 of them (56) into the left subtree and 1/3 (28) into the right. The left subtree's leaves receive 1/7, 3/7, and 3/7 of its tuples, with error rates E1 = 2/(6+2), E2 = 1/6, and E3 = 1/12; the right subtree's leaves receive 3/4 and 1/4 of its tuples, with error rates E4 = 2/7 and E5 = 3/7.]
Each Ei = (number of misclassified tuples at the leaf) / (number of tuples that reach the leaf), e.g., E1 = 2/(6+2).
Etree = 2/3·(1/7·E1 + 3/7·E2 + 3/7·E3) + 1/3·(3/4·E4 + 1/4·E5)
Example of overfitting
Before the split: "Yes": 11, "No": 9, Predict: Yes
  Entropy = −( 11/20·log2(11/20) + 9/20·log2(9/20) ) = 0.993
After the split:
  Left leaf:  "Yes": 3, "No": 0, Predict: Yes
  Right leaf: "Yes": 8, "No": 9, Predict: No
  Average entropy = −17/20·( 8/17·log2(8/17) + 9/17·log2(9/17) ) = 0.848
Enhancements to the Basic Decision Tree
 Allow continuous-valued attributes
 dynamically define new discrete attributes that partition the continuous attribute values into discrete intervals
 Handle missing attribute values
 assign a common value
 assign a probability to each of the possible values
 Attribute construction
 create new attributes based on existing, sparsely represented ones
 this reduces (simplifies) fragmentation, repetition, and replication
DT Issues
Choosing Splitting Attributes
Ordering of Splitting Attributes
Splits
Tree Structure
Stopping Criteria
Training Data
Pruning
ID3
 Creates the tree using information theory concepts and tries to reduce the expected number of comparisons.
 ID3 chooses the split attribute with the highest information gain (Gain(A), as defined earlier).
ID3 Example (Output1)
 Starting state entropy:
   4/15·log(15/4) + 8/15·log(15/8) + 3/15·log(15/3) = 0.4384
 Gain using gender:
 Female: 3/9·log(9/3) + 6/9·log(9/6) = 0.2764
 Male: 1/6·log(6/1) + 2/6·log(6/2) + 3/6·log(6/3) = 0.4392
 Weighted sum: (9/15)(0.2764) + (6/15)(0.4392) = 0.34152
 Gain: 0.4384 − 0.34152 = 0.09688
 Gain using height:
   0.4384 − (2/15)(0.301) = 0.3983
 Choose height as the first splitting attribute
CART, C4.5, CHAID
 CART (Classification And Regression Tree)
 maximize diversity(before) − diversity(after)
 diversity: a node containing only a single class has low diversity
 e.g., entropy/information
 compute I(2, 3, 9)?
 C4.5
 post-pruning
 CHAID
 the tree is "pruned" by halting its construction early (pre-pruning)
 CHAID is restricted to categorical variables
 continuous variables must be broken into ranges or replaced with classes such as high, medium, low
C4.5
 ID3 favors attributes with a large number of divisions.
 Improved version of ID3:
 missing data
 continuous data
 pruning
 rules
 GainRatio(A) = Gain(A) / SplitInfo(A), where SplitInfo(A) = − Σ_{j=1..v} (|Sj|/|S|) · log2(|Sj|/|S|)
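A sketch of the standard C4.5 gain ratio (gain divided by the split information of the attribute); the function names and the example numbers are illustrative:

import math

def split_info(partition_sizes):
    """SplitInfo(A) = -sum(|Sj|/|S| * log2(|Sj|/|S|)) over the partitions induced by A."""
    s = sum(partition_sizes)
    return -sum(sj / s * math.log2(sj / s) for sj in partition_sizes if sj)

def gain_ratio(gain_a, partition_sizes):
    """GainRatio(A) = Gain(A) / SplitInfo(A); penalizes attributes with many divisions."""
    return gain_a / split_info(partition_sizes)

# e.g., age splits the 14 buys_computer tuples into groups of 5, 4, 5, and Gain(age) = 0.246
print(round(gain_ratio(0.246, [5, 4, 5]), 3))   # ~0.156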
CART
 Creates a binary tree
 Uses entropy
 Formula to choose the split point s for node t:
   Φ(s/t) = 2·PL·PR·Σ_{j=1..m} | P(Cj | tL) − P(Cj | tR) |
 PL, PR: the probability that a tuple in the training set will be on the left or right side of the tree.
CART Example
At the start, there are six choices for the split point (right branch on equality):
  P(Gender) = 2(6/15)(9/15)(2/15 + 4/15 + 3/15) = 0.224
  P(1.6) = 0
  P(1.7) = 2(2/15)(13/15)(0 + 8/15 + 3/15) = 0.169
  P(1.8) = 2(5/15)(10/15)(4/15 + 6/15 + 3/15) = 0.385
  P(1.9) = 2(9/15)(6/15)(4/15 + 2/15 + 3/15) = 0.256
  P(2.0) = 2(12/15)(3/15)(4/15 + 8/15 + 3/15) = 0.32
Split at 1.8.
Classification in Large Databases
 Classification—a classical problem extensively studied
by statisticians and machine learning researchers
 Scalability: Classifying data sets with millions of
examples and hundreds of attributes with reasonable
speed
 Why decision tree induction in data mining?
 relatively faster learning speed (than other classification
methods)
 convertible to simple and easy to understand classification
rules
 can use SQL queries for accessing databases
 comparable classification accuracy with other methods
Scalable Decision Tree Induction in Data Mining
 SLIQ (EDBT’96 — Mehta et al.)
 builds an index for each attribute and only class list and the
current attribute list reside in memory
 SPRINT (VLDB’96 — J. Shafer et al.)
 constructs an attribute list data structure
 PUBLIC (VLDB’98 — Rastogi & Shim)
 integrates tree splitting and tree pruning: stop growing the tree
earlier
 RainForest (VLDB’98 — Gehrke, Ramakrishnan &
Ganti)
 separates the scalability aspects from the criteria that
determine the quality of the tree
 builds an AVC-list (attribute, value, class label)
Presentation of Classification Results
Decision Tree in SGI/MineSet 3.0
Tool: Decision Tree
C4.5, the "classic" decision-tree tool, developed by J. R. Quinlan
  http://www.cse.unsw.edu.tw/~quinlan
Classification Tree in Excel
EC4.5, a more efficient version of C4.5
  http://www.-kdd.di.unipi.it/software
IND, provides Gini and C4.5 decision trees
  http://ic.arc.nasa.gov/projects/bayes-group/ind/INDprogram.html
Regression
 Assume the data fits a predefined function
 Determine the best values for the regression coefficients c0, c1, …, cn
 Assume an error: y = c0 + c1x1 + … + cnxn + e
 Estimate the error using the mean squared error over the training set:
   MSE = (1/n) · Σ_{i=1..n} ( yi − (c0 + c1xi1 + … + cnxin) )²
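A sketch of fitting such a linear model by least squares; numpy's lstsq is used here as one possible solver, and the toy data is made up for illustration:

import numpy as np

# Toy data generated from y = 1 + 2*x1 + 3*x2 plus a little noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 2))
y = 1 + 2 * X[:, 0] + 3 * X[:, 1] + rng.normal(0, 0.05, size=50)

A = np.column_stack([np.ones(len(X)), X])        # prepend a column of 1s for c0
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)   # estimated [c0, c1, c2]

mse = np.mean((y - A @ coeffs) ** 2)             # mean squared error on the training set
print(coeffs.round(2), round(float(mse), 4))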
Linear Regression Poor Fit
Classification Using Regression
Division: use the regression function to divide the area into regions.
Prediction: use the regression function to predict a class membership function; the input includes the desired class.
Division
Prediction
Classification Using Distance
Place items in the class to which they are "closest".
Must determine the distance between an item and a class.
Classes are represented by
  Centroid: central value
  Medoid: representative point
  Individual points
Algorithm: KNN
K Nearest Neighbor (KNN)
The training set includes the classes.
Examine the K items nearest to the item being classified.
The new item is placed in the class with the largest number of these close items.
O(q) for each tuple to be classified (here q is the size of the training set).
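A minimal KNN sketch on the height data; using the numeric height alone as the distance and K = 5 are choices made for illustration:

from collections import Counter

# (height in m, class) from the height example, using Output1 as the class
training = [(1.6, "Short"), (2.0, "Tall"), (1.9, "Medium"), (1.88, "Medium"),
            (1.7, "Short"), (1.85, "Medium"), (1.6, "Short"), (1.7, "Short"),
            (2.2, "Tall"), (2.1, "Tall"), (1.8, "Medium"), (1.95, "Medium"),
            (1.9, "Medium"), (1.8, "Medium"), (1.75, "Medium")]

def knn_classify(x, k=5):
    """Classify x by majority vote among the k training items nearest to it."""
    nearest = sorted(training, key=lambda item: abs(item[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify(1.94))   # prints Medium: most of the 5 nearest heights are Medium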
KNN
KNN Algorithm
Classification Using Neural Networks
 Typical NN structure for classification:
 one output node per class
 the output value is the class membership function value
 Supervised learning
 for each tuple in the training set, propagate it through the NN and adjust the weights on the edges to improve future classification
 Algorithms: propagation, backpropagation, gradient descent
NN Issues
 Number of source nodes
 Number of hidden layers
 Training data
 Number of sinks
 Interconnections
 Weights
 Activation Functions
 Learning Technique
 When to stop learning
Decision Tree vs. Neural Network
Propagation
[Figure: a tuple's values fed as input and propagated through the network.]
NN Propagation Algorithm
Example Propagation
NN Learning
Adjust the weights to perform better on the associated test data.
Supervised: use feedback from knowledge of the correct classification.
Unsupervised: no knowledge of the correct classification is needed.
NN Supervised Learning
Supervised Learning
 Possible error values, assuming the output from node i is yi but should be di:
 Change the weights on the arcs based on the estimated error.
NN Backpropagation
Propagate changes to the weights backward from the output layer to the input layer.
Delta rule: Δwij = c · xij · (dj − yj)
Gradient descent: a technique to modify the weights in the graph.
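A sketch of one delta-rule update as written above; the learning rate c and the example numbers are assumptions made for illustration:

def delta_rule_update(weights, inputs, desired, actual, c=0.1):
    """w_ij <- w_ij + c * x_ij * (d_j - y_j) for each input x_ij feeding output node j."""
    return [w + c * x * (desired - actual) for w, x in zip(weights, inputs)]

w = [0.5, -0.2]
w = delta_rule_update(w, inputs=[1.0, 2.0], desired=1, actual=0.3)
print([round(v, 2) for v in w])   # [0.57, -0.06]: each weight moves in proportion to its input and the error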
Backpropagation
Backpropagation Algorithm
Gradient Descent
Gradient Descent Algorithm
Output Layer Learning
Hidden Layer Learning
Types of NNs
Different NN structures are used for different problems:
  Perceptron
  Self Organizing Feature Map
  Radial Basis Function Network
Perceptron
 Perceptron is one of the simplest NNs.
 No hidden layers.
Perceptron Example
 Suppose:
 Summation: S = 3x1 + 2x2 − 6
 Activation: if S > 0 then 1 else 0
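A direct transcription of this perceptron into Python; the sample inputs are illustrative:

def perceptron(x1, x2):
    """Perceptron with weights 3 and 2, bias -6, and a step activation."""
    s = 3 * x1 + 2 * x2 - 6          # summation
    return 1 if s > 0 else 0         # activation

for x1, x2 in [(0, 0), (1, 1), (2, 1), (1, 2)]:
    print((x1, x2), "->", perceptron(x1, x2))   # fires (outputs 1) only when 3*x1 + 2*x2 > 6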
Self Organizing Feature Map (SOFM)
SOM
Competitive, unsupervised learning
Observe how neurons work in the brain:
  firing impacts the firing of those nearby
  neurons far apart inhibit each other
  neurons have specific nonoverlapping tasks
Ex: Kohonen network
Kohonen Network
Kohonen Network
 Competitive layer, viewed as a 2D grid
 Similarity between competitive nodes and input nodes:
 input: X = <x1, …, xh>
 weights: <w1i, …, whi>
 similarity is defined based on the dot product
 The competitive node most similar to the input "wins"
 The winning node's weights (as well as the surrounding nodes' weights) are increased.
Radial Basis Function Network
An RBF function has a Gaussian shape
RBF networks
  three layers
  hidden layer: Gaussian activation function
  output layer: linear activation function
Classification Using Rules
Perform classification using IF-THEN rules
Classification rule: r = <a, c>
  antecedent, consequent
Rules may be generated from other techniques (DT, NN) or generated directly.
Algorithms: Gen, RX, 1R, PRISM
Generating Rules from DTs
Generating Rules Example
Generating Rules from NNs
1R Algorithm
1R Example
PRISM Algorithm
PRISM Example
Decision Tree vs. Rules
 A tree has an implied order in which the splitting is performed.
 A tree is created by looking at all classes.
 Rules have no ordering of predicates.
 Only one class needs to be examined to generate its rules.
 What are the advantages and disadvantages of the decision tree approach?