Data Mining
CS 341, Spring 2007

Lecture 6: Classification – Issues, Regression, Bayesian Classification

© Prentice Hall
Data Mining Core Techniques
• Classification
• Clustering
• Association Rules
Classification Outline
Goal: Provide an overview of the classification problem and introduce some of the basic algorithms.
• Classification Problem Overview
• Classification Techniques
  – Regression
  – Bayesian classification
  – Distance
  – Decision Trees
  – Rules
  – Neural Networks

Classification Problem
• Given a database D = {t1, t2, …, tn} and a set of classes C = {C1, …, Cm}, the Classification Problem is to define a mapping f: D → C where each ti is assigned to one class.
• The mapping actually divides D into equivalence classes (see the sketch below).
• Prediction is similar, but may be viewed as having an infinite number of classes.
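To make the definition concrete, here is a minimal sketch of such a mapping f: D → C in Python; the thresholds are illustrative assumptions, not values from the lecture:

    # A classifier is just a mapping f: D -> C onto predefined classes.
    classes = ["short", "medium", "tall"]

    def f(height_m: float) -> str:
        # Illustrative thresholds; any rule that assigns exactly one
        # class per tuple defines such a mapping.
        if height_m <= 1.7:
            return "short"
        elif height_m <= 1.9:
            return "medium"
        return "tall"

    print(f(1.95))  # -> tall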
Classification Examples
• Teachers classify students' grades as A, B, C, D, or F.
• Identify mushrooms as poisonous or edible.
• Predict when a river will flood.
• Identify individuals with credit risks.
• Speech recognition
• Pattern recognition
Classification Ex: Grading
• If x >= 90 then grade = A.
• If 80 <= x < 90 then grade = B.
• If 70 <= x < 80 then grade = C.
• If 60 <= x < 70 then grade = D.
• If x < 60 then grade = F.
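These threshold rules map directly to code; a minimal sketch (the function name is illustrative):

    def grade(x: float) -> str:
        # Mirror the slide's rules: one predefined class per score range.
        if x >= 90: return "A"
        if x >= 80: return "B"
        if x >= 70: return "C"
        if x >= 60: return "D"
        return "F"

    print(grade(85))  # -> B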
[Figure: decision tree for the grading example, testing x against 90, 80, 70, and 60 to reach the leaves A, B, C, D, and F.]

Classification Ex: Letter Recognition
• View letters as constructed from 5 components:
[Figure: letters A, B, C, D, E, and F built from the five components.]

Classification Techniques
• Approach:
  1. Create a specific model by evaluating training data (or using domain experts' knowledge).
  2. Apply the model developed to new data.
• Classes must be predefined.
• The most common techniques use decision trees (DTs), neural networks (NNs), or are based on distances or statistical methods.
Defining Classes
[Figure: defining classes by a distance-based approach and by a partitioning-based approach.]

Issues in Classification
• Missing Data
  – Ignore
  – Replace with an assumed value (see the sketch after this list)
• Overfitting
  – Use a large set of training data
  – Filter out erroneous or noisy data
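For the missing-data case, "replace with an assumed value" often means substituting a summary statistic; a minimal sketch, where the mean is one common choice (an assumption, not the lecture's prescription):

    # Impute missing heights with the mean of the observed values.
    heights = [1.6, None, 1.9, 1.88, None, 1.85]
    known = [h for h in heights if h is not None]
    assumed = sum(known) / len(known)            # mean of observed data
    filled = [h if h is not None else assumed for h in heights]
    print(filled)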
Measuring Performance
• Classification accuracy on test data
• Confusion matrix
• OC curve
Classification Accuracy
• True positive (TP): ti predicted to be in Cj and is actually in it.
• False positive (FP): ti predicted to be in Cj but is not actually in it.
• True negative (TN): ti not predicted to be in Cj and is not actually in it.
• False negative (FN): ti not predicted to be in Cj but is actually in it.

Classification Performance
[Figure: 2 x 2 grid of the four outcomes — true positive, false negative, false positive, true negative.]

Confusion Matrix
• An m x m matrix.
• Entry Ci,j indicates the number of tuples assigned to Cj but whose correct class is Ci.
• The best solution will have nonzero values only on the diagonal.

Height Example Data
Name       Gender  Height  Output1  Output2
Kristina   F       1.6m    Short    Medium
Jim        M       2m      Tall     Medium
Maggie     F       1.9m    Medium   Tall
Martha     F       1.88m   Medium   Tall
Stephanie  F       1.7m    Short    Medium
Bob        M       1.85m   Medium   Medium
Kathy      F       1.6m    Short    Medium
Dave       M       1.7m    Short    Medium
Worth      M       2.2m    Tall     Tall
Steven     M       2.1m    Tall     Tall
Debbie     F       1.8m    Medium   Medium
Todd       M       1.95m   Medium   Medium
Kim        F       1.9m    Medium   Tall
Amy        F       1.8m    Medium   Medium
Wynette    F       1.75m   Medium   Medium
Confusion Matrix Example
Using the height data example, with Output1 as the correct assignment and Output2 as the actual assignment:

Actual        Assignment
Membership    Short  Medium  Tall
Short           0      4      0
Medium          0      5      3
Tall            0      1      2

Operating Characteristic Curve
[Figure: operating characteristic (OC) curve.]
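The matrix can be tallied directly from the two label columns; a minimal sketch in Python:

    # Count (actual, assigned) pairs from Output1 (correct) and Output2 (assigned).
    from collections import Counter

    classes = ["Short", "Medium", "Tall"]
    output1 = ["Short", "Tall", "Medium", "Medium", "Short", "Medium", "Short",
               "Short", "Tall", "Tall", "Medium", "Medium", "Medium", "Medium", "Medium"]
    output2 = ["Medium", "Medium", "Tall", "Tall", "Medium", "Medium", "Medium",
               "Medium", "Tall", "Tall", "Medium", "Medium", "Tall", "Medium", "Medium"]

    counts = Counter(zip(output1, output2))
    for actual in classes:
        print(actual, [counts[(actual, assigned)] for assigned in classes])
    # Short  [0, 4, 0]
    # Medium [0, 5, 3]
    # Tall   [0, 1, 2]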
Classification Outline
Goal: Provide an overview of the classification problem and introduce some of the basic algorithms.
• Classification Problem Overview
• Classification Techniques
  – Regression
  – Bayesian classification
  – Distance
  – Decision Trees
  – Rules
  – Neural Networks

Regression
• Assume the data fit a predefined function.
• Determine the best values for the parameters in the model.
• Estimate an output value based on input values.
• Can be used for classification and prediction.
Linear Regression
• Assume the relation of the output variable to the input variables is a linear function of some parameters.
• Determine the best values for the regression coefficients c0, c1, …, cn.
• Assume an error term: y = c0 + c1x1 + … + cnxn + ε
• Estimate the error using the mean squared error over the training set of k examples:
  MSE = (1/k) Σi (yi − (c0 + c1xi1 + … + cnxin))²
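A direct transcription of that error measure (a sketch; the names are illustrative):

    # Mean squared error of a linear model with coefficients [c0, c1, ..., cn].
    def mse(xs, ys, coeffs):
        total = 0.0
        for x, y in zip(xs, ys):                     # x is a list [x1, ..., xn]
            y_hat = coeffs[0] + sum(c * xi for c, xi in zip(coeffs[1:], x))
            total += (y - y_hat) ** 2
        return total / len(ys)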
Example 4.3
• Y = c0 + ε
• Find the value of c0 that best partitions the height values into the classes short and medium.
• The training data for yi is
  {1.6, 1.9, 1.88, 1.7, 1.85, 1.6, 1.7, 1.8, 1.95, 1.9, 1.8, 1.75}
• How?

Example 4.4
• Y = c0 + c1x1 + ε
• Find the values of c0 and c1 that best predict the class.
• Assume 0 for the short class and 1 for the medium class.
• The training data for (xi, yi) is
  {(1.6,0), (1.9,1), (1.88,1), (1.7,0), (1.85,1), (1.6,0), (1.7,0), (1.8,1), (1.95,1), (1.9,1), (1.8,1), (1.75,1)}
• How? (See the sketch below.)

Linear Regression Poor Fit
[Figure: a straight-line fit to the 0/1 class labels, illustrating the poor fit.]
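One way to answer "how": minimize the mean squared error with a least-squares fit. A minimal sketch (numpy is assumed; the 0/1 labels follow Output1 as above):

    import numpy as np

    x = np.array([1.6, 1.9, 1.88, 1.7, 1.85, 1.6, 1.7, 1.8, 1.95, 1.9, 1.8, 1.75])
    y = np.array([0,   1,   1,    0,   1,    0,   0,   1,   1,    1,   1,   1])

    c1, c0 = np.polyfit(x, y, deg=1)      # least-squares line, highest degree first
    print(f"y = {c0:.2f} + {c1:.2f} x")

    # Classify by thresholding the fitted value at 0.5 (a simple division rule).
    print(((c0 + c1 * x) >= 0.5).astype(int))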
Classification Using Regression
• Division: Use the regression function to divide the area into regions.
• Prediction: Use the regression function to predict a class membership function.

Division
[Figure: regression used to divide the space into class regions.]

Prediction
[Figure: regression used to predict class membership.]
Logistic Regression
• A generalized linear model.
• Extensively used in the medical and social sciences.
• It has the following form:
  loge(p / (1 − p)) = c0 + c1x1 + … + ckxk
  where p is the probability of being in the class and 1 − p is the probability of not being in it.
• The parameters c0, c1, …, ck are usually estimated by maximum likelihood (maximizing the probability of observing the given values).
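Solving the log-odds equation for p gives the logistic (sigmoid) function, which is easy to evaluate in code. A minimal sketch; the coefficients below are hypothetical, not fitted values from the lecture:

    import math

    def prob(x, c0, c1):
        z = c0 + c1 * x                  # linear predictor = log odds
        return 1 / (1 + math.exp(-z))    # p = e^z / (1 + e^z), always in (0, 1)

    c0, c1 = -31.0, 18.0                 # hypothetical coefficients
    for x in (1.6, 1.75, 1.9):
        print(x, round(prob(x, c0, c1), 3))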
Why Logistic Regression?
• p is in the range [0, 1].
  – A good model would like to have p values close to 0 or 1.
• A linear function is not suitable for p.
• Consider the odds p / (1 − p):
  – As p increases, the odds p / (1 − p) increase.
  – The odds lie in the range [0, +∞), asymmetric.
  – The log odds lie in the range (−∞, +∞), symmetric.

Linear Regression vs. Logistic Regression
[Figure: side-by-side comparison of a linear fit and a logistic fit.]
Classification Outline
Goal: Provide an overview of the classification problem and introduce some of the basic algorithms.
• Classification Problem Overview
• Classification Techniques
  – Regression
  – Bayesian classification
  – Distance
  – Decision Trees
  – Rules
  – Neural Networks

Bayes Theorem
• Posterior probability: P(h1 | xi)
• Prior probability: P(h1)
• Bayes theorem:
  P(h1 | xi) = P(xi | h1) P(h1) / P(xi)
• Assign probabilities of hypotheses given a data value.
Naïve Bayes Classification
• Assume that the contributions of all attributes are independent and that each contributes equally to the classification problem.
• ti has m independent attributes {xi1, …, xim}.
• P(ti | Cj) = ∏k P(xik | Cj)

Example 4.5
• Using Output1 as the classification results (the height example data shown earlier).
• Step 1: Calculate the prior probability of each class:
  – P(short) = 4/15 = 0.267
  – P(medium) = 8/15 = 0.533
  – P(tall) = 3/15 = 0.2
• Step 2: Calculate the conditional probabilities:
  – P(Genderi | Cj), with Genderi = F or M and Cj = short, medium, or tall
  – P(Heighti | Cj), with Heighti in (0,1.6], (1.6,1.7], (1.7,1.8], (1.8,1.9], (1.9,2.0], or (>2.0)
Example 4.5 (cont'd)

Attribute            Count                  Probability p(xi | Cj)
                     short  medium  tall    short  medium  tall
Gender   M             1      2      3       1/4    2/8    3/3
         F             3      6      0       3/4    6/8    0/3
Height   (0,1.6]       2      0      0       2/4     0      0
         (1.6,1.7]     2      0      0       2/4     0      0
         (1.7,1.8]     0      3      0        0     3/8     0
         (1.8,1.9]     0      4      0        0     4/8     0
         (1.9,2.0]     0      1      1        0     1/8    1/3
         (>2.0)        0      0      2        0      0     2/3

Example 4.5 (cont'd)
• Given a tuple t = {Adam, M, 1.95m}
• Step 3: Calculate P(t | Cj):
  – P(t|short) = 1/4 × 0 = 0
  – P(t|medium) = 2/8 × 1/8 = 0.031
  – P(t|tall) = 3/3 × 1/3 = 0.333
• Step 4: Calculate P(t):
  P(t) = P(t|short)P(short) + P(t|medium)P(medium) + P(t|tall)P(tall) ≈ 0.0833
Example 4.5 (cont'd)
• Step 5: Calculate P(Cj | t) using Bayes rule:
  – P(short|t) = P(t|short)P(short) / P(t) = 0
  – P(medium|t) = P(t|medium)P(medium) / P(t) = 0.2
  – P(tall|t) = P(t|tall)P(tall) / P(t) = 0.799
• Last step: Classify the new tuple as tall.

A Summary
• Step 1: Calculate the prior probability of each class, P(Cj).
• Step 2: Calculate the conditional probability for each attribute value, P(xik | Cj).
• Step 3: Calculate the conditional probability of the tuple, P(t | Cj).
• Step 4: Calculate the prior probability of the tuple, P(t).
• Step 5: Calculate the posterior probability of each class given the tuple, P(Cj | t), using Bayes rule.
• Step 6: Classify the tuple: it belongs to the class with the highest posterior probability. (See the sketch below.)
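The six steps translate into a compact program; a minimal sketch that reproduces Example 4.5 on the height data (the variable names and binning helper are illustrative):

    from collections import Counter, defaultdict

    BINS = [(0.0, 1.6), (1.6, 1.7), (1.7, 1.8), (1.8, 1.9), (1.9, 2.0), (2.0, 9.9)]

    def height_bin(h):
        # Intervals are (lo, hi]; return the index of the bin containing h.
        for i, (lo, hi) in enumerate(BINS):
            if lo < h <= hi:
                return i

    # (gender, height in m, Output1 class)
    train = [("F",1.6,"short"),("M",2.0,"tall"),("F",1.9,"medium"),("F",1.88,"medium"),
             ("F",1.7,"short"),("M",1.85,"medium"),("F",1.6,"short"),("M",1.7,"short"),
             ("M",2.2,"tall"),("M",2.1,"tall"),("F",1.8,"medium"),("M",1.95,"medium"),
             ("F",1.9,"medium"),("F",1.8,"medium"),("F",1.75,"medium")]
    classes = ["short", "medium", "tall"]
    n = len(train)

    class_ct = Counter(c for _, _, c in train)    # Step 1: P(Cj) = class_ct[c] / n
    gender_ct = defaultdict(Counter)              # Step 2: per-class attribute counts
    height_ct = defaultdict(Counter)
    for g, h, c in train:
        gender_ct[c][g] += 1
        height_ct[c][height_bin(h)] += 1

    def classify(gender, height):
        hb = height_bin(height)
        # Step 3: P(t|Cj) = P(gender|Cj) * P(height bin|Cj)
        like = {c: (gender_ct[c][gender] / class_ct[c]) *
                   (height_ct[c][hb] / class_ct[c]) for c in classes}
        # Step 4: P(t) = sum over classes of P(t|Cj) P(Cj)
        pt = sum(like[c] * class_ct[c] / n for c in classes)
        # Step 5: Bayes rule; Step 6: take the largest posterior
        post = {c: like[c] * (class_ct[c] / n) / pt for c in classes}
        return max(post, key=post.get), post

    label, post = classify("M", 1.95)   # Adam from Example 4.5
    print(label, {c: round(p, 3) for c, p in post.items()})
    # -> tall {'short': 0.0, 'medium': 0.2, 'tall': 0.8}

Computed with exact fractions, the posterior for tall is 0.8; the 0.799 on the slide reflects rounding of the intermediate values.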
Next Lecture
• Classification:
  – Distance-based algorithms
  – Decision tree-based algorithms
• HW2 will be announced!