Classification: Naïve Bayes Classifier
© Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.
Joint, Marginal, Conditional Probability…
We want to determine the probabilities of events that result from combining other events in various ways.
There are several types of combinations and relationships between events:
• Complement of an event [everything other than that event]
• Intersection of two events [event A and event B] or [A*B]
• Union of two events [event A or event B] or [A+B]
Example of Joint Probability
Why are some mutual fund managers more successful than others? One possible factor is where the manager earned their MBA. The following table compares mutual fund performance against the ranking of the school where the fund manager earned their MBA:
                          Mutual fund outperforms   Mutual fund doesn't
                          the market                outperform the market
Top 20 MBA program              .11                        .29
Not top 20 MBA program          .06                        .54

[Figure: Venn diagram of the two events]

E.g. the entry .11 is the probability that a mutual fund outperforms the market AND the manager was in a top-20 MBA program; it's a joint probability [intersection].
Example of Joint Probability
Alternatively, we could introduce shorthand notation to
represent the events:
A1 = Fund manager graduated from a top-20 MBA program
A2 = Fund manager did not graduate from a top-20 MBA program
B1 = Fund outperforms the market
B2 = Fund does not outperform the market
         B1     B2
A1      .11    .29
A2      .06    .54

E.g. P(A2 and B1) = .06 = the probability that a fund outperforms the market and the manager isn't from a top-20 school.
Marginal Probabilities…
Marginal probabilities are computed by adding across rows and down columns; that is, they are calculated in the margins of the table:

         B1     B2    P(Ai)
A1      .11    .29     .40
A2      .06    .54     .60
P(Bj)   .17    .83    1.00

P(A2) = .06 + .54 = .60 ("what's the probability a fund manager isn't from a top school?")
P(B1) = .11 + .06 = .17 ("what's the probability a fund outperforms the market?")

BOTH margins must add to 1 (useful error check).
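As a quick arithmetic check, here is a minimal Python sketch (the variable names are mine; the values come from the table) that recomputes both margins from the joint probabilities:

```python
# Joint probabilities from the mutual-fund example (rows A1/A2, columns B1/B2)
joint = {
    ("A1", "B1"): 0.11, ("A1", "B2"): 0.29,
    ("A2", "B1"): 0.06, ("A2", "B2"): 0.54,
}

# Marginals: sum the joint probabilities over the other event
p_A = {a: sum(p for (ai, _), p in joint.items() if ai == a) for a in ("A1", "A2")}
p_B = {b: sum(p for (_, bj), p in joint.items() if bj == b) for b in ("B1", "B2")}

print(p_A)                  # approximately {'A1': 0.40, 'A2': 0.60}
print(p_B)                  # approximately {'B1': 0.17, 'B2': 0.83}
print(sum(joint.values()))  # 1.0: both margins must add to 1
```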
Conditional Probability…
Conditional probability is used to determine how two events are related; that is, we can determine the probability of one event given the occurrence of another related event.
Experiment: randomly select one student in the class.
P(randomly selected student is male) = ?
P(randomly selected student is male | student is in the 3rd row) = ?
Conditional probabilities are written as P(A | B), read as "the probability of A given B", and are calculated as:
P(A | B) = P(A and B) / P(B)
Conditional Probability…
Again, the probability of an event given that another event
has occurred is called a conditional probability…
P(A and B) = P(A) P(B | A) = P(B) P(A | B); both forms are true.
Keep this in mind!
Conditional Probability…
Example 6.2: What's the probability that a fund will outperform the market given that the manager graduated from a top-20 MBA program?
Recall:
A1 = Fund manager graduated from a top-20 MBA program
A2 = Fund manager did not graduate from a top-20 MBA program
B1 = Fund outperforms the market
B2 = Fund does not outperform the market
Thus, we want to know “what is P(B1 | A1) ?”
Conditional Probability…
We want to calculate P(B1 | A1).

         B1     B2    P(Ai)
A1      .11    .29     .40
A2      .06    .54     .60
P(Bj)   .17    .83    1.00

P(B1 | A1) = P(A1 and B1) / P(A1) = .11 / .40 = .275

Thus, there is a 27.5% chance that a fund will outperform the market given that the manager graduated from a top-20 MBA program.
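The 27.5% figure follows directly from the definition P(B1 | A1) = P(A1 and B1) / P(A1); a minimal sketch (names mine, values from the table):

```python
# P(B1 | A1) = P(A1 and B1) / P(A1)
p_A1_and_B1 = 0.11
p_A1 = 0.11 + 0.29              # marginal probability of a top-20 manager

p_B1_given_A1 = p_A1_and_B1 / p_A1
print(round(p_B1_given_A1, 3))  # 0.275
```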
Independence…
One of the objectives of calculating conditional probability is
to determine whether two events are related.
In particular, we would like to know whether they are
independent, that is, if the probability of one event is not
affected by the occurrence of the other event.
Two events A and B are said to be independent if
P(A|B) = P(A)
and
P(B|A) = P(B)
E.g. P(you have a flat tire going home | your radio quits working) = P(you have a flat tire going home), since the two events are (presumably) unrelated.
Are B1 and A1 Independent?
Independence…
For example, we saw that
P(B1 | A1) = .275
The marginal probability for B1 is: P(B1) = 0.17
Since P(B1|A1) ≠ P(B1), B1 and A1 are not independent
events.
Stated another way, they are dependent. That is, the
probability of one event (B1) is affected by the occurrence of
the other event (A1).
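The independence condition can be checked in one comparison; a small sketch using the numbers above (variable names are mine):

```python
# B1 and A1 are independent only if P(B1 | A1) equals P(B1)
p_B1_given_A1 = 0.11 / 0.40   # 0.275
p_B1 = 0.11 + 0.06            # 0.17

print(abs(p_B1_given_A1 - p_B1) < 1e-9)  # False: the events are dependent
```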
Union…
Determine the probability that a fund outperforms (B1)
or the manager graduated from a top-20 MBA program (A1).
A1 or B1 occurs whenever A1 and B1 occur, A1 and B2 occur, or A2 and B1 occur.

         B1     B2    P(Ai)
A1      .11    .29     .40
A2      .06    .54     .60
P(Bj)   .17    .83    1.00

P(A1 or B1) = .11 + .06 + .29 = .46
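Equivalently (this route is not shown on the slide), the union can be computed with the addition rule P(A1 or B1) = P(A1) + P(B1) - P(A1 and B1); a quick sketch:

```python
p_A1, p_B1, p_A1_and_B1 = 0.40, 0.17, 0.11
p_union = p_A1 + p_B1 - p_A1_and_B1
print(round(p_union, 2))  # 0.46, matching .11 + .06 + .29
```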
Data Mining
Classification: Naïve Bayes Classifier
© Tan, Steinbach, Kumar, Introduction to Data Mining
Classification: Definition
Given a collection of records (the training set), where each record contains a set of attributes, one of which is the class:
Find a model for the class attribute as a function of the values of the other attributes.
Goal: previously unseen records should be assigned a class as accurately as possible.
A test set is used to determine the accuracy of the model. Usually, the given
data set is divided into training and test sets, with training set used to build the
model and test set used to validate it.
Illustrating Classification Task

Training Set:
Tid  Attrib1  Attrib2  Attrib3  Class
 1   Yes      Large    125K     No
 2   No       Medium   100K     No
 3   No       Small     70K     No
 4   Yes      Medium   120K     No
 5   No       Large     95K     Yes
 6   No       Medium    60K     No
 7   Yes      Large    220K     No
 8   No       Small     85K     Yes
 9   No       Medium    75K     No
10   No       Small     90K     Yes

Induction: a learning algorithm is applied to the training set to learn a model.

Test Set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small     55K     ?
12   Yes      Medium    80K     ?
13   Yes      Large    110K     ?
14   No       Small     95K     ?
15   No       Large     67K     ?

Deduction: the learned model is applied to the test set to assign a class to each record.
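The slides do not prescribe any particular software; purely as an illustrative sketch, the induction/deduction workflow could look like this in Python with scikit-learn (the library choice and the numeric encoding are my assumptions; the records are the toy values from the figure):

```python
# Sketch of the induction/deduction workflow; scikit-learn is one possible choice,
# not something the slides specify.  Categorical attributes are encoded as numbers:
# Attrib1 Yes=1/No=0, Attrib2 Small=0/Medium=1/Large=2, Attrib3 in thousands.
from sklearn.tree import DecisionTreeClassifier

X_train = [[1, 2, 125], [0, 1, 100], [0, 0, 70], [1, 1, 120], [0, 2, 95],
           [0, 1, 60], [1, 2, 220], [0, 0, 85], [0, 1, 75], [0, 0, 90]]
y_train = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]

X_test = [[0, 0, 55], [1, 1, 80], [1, 2, 110], [0, 0, 95], [0, 2, 67]]  # Tid 11-15

model = DecisionTreeClassifier(random_state=0)  # the "learning algorithm"
model.fit(X_train, y_train)                     # induction: learn the model
print(model.predict(X_test))                    # deduction: classify unseen records
```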
Examples of Classification Task
• Predicting tumor cells as benign or malignant
• Classifying credit card transactions as legitimate or fraudulent
• Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
• Categorizing news stories as finance, weather, entertainment, sports, etc.
Classification Techniques
• Decision Tree based Methods
• Rule-based Methods
• Memory based reasoning
• Neural Networks
• Naïve Bayes and Bayesian Belief Networks
• Support Vector Machines
Example of a Decision Tree
Training Data:
Tid  Refund  Marital Status  Taxable Income  Cheat
 1   Yes     Single          125K            No
 2   No      Married         100K            No
 3   No      Single           70K            No
 4   Yes     Married         120K            No
 5   No      Divorced         95K            Yes
 6   No      Married          60K            No
 7   Yes     Divorced        220K            No
 8   No      Single           85K            Yes
 9   No      Married          75K            No
10   No      Single           90K            Yes

Model: Decision Tree (splitting attributes: Refund, MarSt, TaxInc)

Refund = Yes: NO
Refund = No:
    MarSt = Married: NO
    MarSt = Single or Divorced:
        TaxInc < 80K: NO
        TaxInc > 80K: YES
Another Example of Decision Tree
(Same training data as above.)

Model: an alternative decision tree over the same attributes

MarSt = Married: NO
MarSt = Single or Divorced:
    Refund = Yes: NO
    Refund = No:
        TaxInc < 80K: NO
        TaxInc > 80K: YES

There could be more than one tree that fits the same data!
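Either tree can be written directly as nested conditionals. The sketch below encodes the first tree; the function and argument names are mine, and the boundary case of exactly 80K (which the slide leaves unspecified) is treated as the "> 80K" branch:

```python
def classify_cheat(refund, marital_status, taxable_income_k):
    """Decision tree from the slides: Refund -> MarSt -> TaxInc (income in K)."""
    if refund == "Yes":
        return "No"
    # Refund == "No"
    if marital_status == "Married":
        return "No"
    # Single or Divorced: split on taxable income at 80K
    return "No" if taxable_income_k < 80 else "Yes"
```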
Decision Tree Classification Task
(Same training set (Tid 1-10) and test set (Tid 11-15) as in the classification-task illustration above.)

Induction: a tree induction algorithm learns a decision tree (the model) from the training set.
Deduction: the learned decision tree is applied to the test set to assign a class to each record.
Apply Model to Test Data

Start from the root of the tree.

Test Data:
Refund  Marital Status  Taxable Income  Cheat
No      Married         80K             ?

Walking the record down the first tree above: Refund = No, so take the No branch to MarSt; MarSt = Married, so take the Married branch, which is a leaf labelled NO.

Assign Cheat to "No".
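Using the classify_cheat sketch defined after the decision-tree examples above, the same traversal can be reproduced for this test record:

```python
# Test record from the slide: Refund = No, Marital Status = Married, Taxable Income = 80K
print(classify_cheat("No", "Married", 80))  # "No", matching the slide's conclusion
```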
Bayes Classifier
A probabilistic framework for solving classification
problems
Conditional probability:
P(C | A) = P(A, C) / P(A)
P(A | C) = P(A, C) / P(C)

Bayes theorem:
P(C | A) = P(A | C) P(C) / P(A)
Bayes Theorem
Given a hypothesis h and data D which bears on the hypothesis:
P(h | D) = P(D | h) P(h) / P(D)
P(h): independent probability of h: prior probability
P(D): independent probability of D
P(D|h): conditional probability of D given h: likelihood
P(h|D): conditional probability of h given D: posterior
probability
Example of Bayes Theorem
Given:
A doctor knows that meningitis causes stiff neck 50% of the time
Prior probability of any patient having meningitis is 1/50,000
Prior probability of any patient having stiff neck is 1/20
If a patient has stiff neck, what’s the probability he/she
has meningitis?
P(M | S) = P(S | M) P(M) / P(S) = (0.5 × 1/50000) / (1/20) = 0.0002
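The arithmetic can be checked directly; a minimal sketch (the variable names are mine, the values come from the slide):

```python
p_stiff_given_men = 0.5      # P(S | M): meningitis causes stiff neck 50% of the time
p_men = 1 / 50000            # P(M): prior probability of meningitis
p_stiff = 1 / 20             # P(S): prior probability of stiff neck

p_men_given_stiff = p_stiff_given_men * p_men / p_stiff
print(p_men_given_stiff)     # approximately 0.0002
```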
Bayesian Classifiers
Consider each attribute and class label as random variables
Given a record with attributes (A1, A2,…,An)
Goal is to predict class C
Specifically, we want to find the value of C that maximizes P(C|
A1, A2,…,An )
Can we estimate P(C| A1, A2,…,An ) directly from data?
Bayesian Classifiers
Approach:
compute the posterior probability P(C | A1, A2, …, An) for all
values of C using the Bayes theorem
P(C | A1 A2 … An) = P(A1 A2 … An | C) P(C) / P(A1 A2 … An)
Choose value of C that maximizes
P(C | A1, A2, …, An)
Equivalent to choosing value of C that maximizes
P(A1, A2, …, An|C) P(C)
How to estimate P(A1, A2, …, An | C )?
The Bayes Classifier
In Bayes theorem above, P(D | h) is the likelihood, P(h) is the prior, and P(D) is the normalization constant.
Maximum A Posteriori
Based on Bayes theorem, we can compute the Maximum A Posteriori (MAP) hypothesis for the data.
We are interested in the best hypothesis from some space H given observed training data D.

h_MAP = argmax_{h in H} P(h | D)
      = argmax_{h in H} P(D | h) P(h) / P(D)
      = argmax_{h in H} P(D | h) P(h)

H: the set of all hypotheses.
Note that we can drop P(D) because the probability of the data is constant (and independent of the hypothesis).
Naïve Bayes Classifier
Assume independence among the attributes Ai when the class is given:
P(A1, A2, …, An | Cj) = P(A1 | Cj) P(A2 | Cj) … P(An | Cj)
We can estimate P(Ai | Cj) for all Ai and Cj.
A new point is classified to Cj if P(Cj) ∏ P(Ai | Cj) is maximal.
Example: Play Tennis
[Figure: the 14-record Play Tennis training set, with attributes Outlook, Temperature, Humidity, Wind and class Play; it is summarized by the frequency tables below.]
Example
Learning Phase

Outlook      Play=Yes  Play=No
Sunny          2/9       3/5
Overcast       4/9       0/5
Rain           3/9       2/5

Temperature  Play=Yes  Play=No
Hot            2/9       2/5
Mild           4/9       2/5
Cool           3/9       1/5

Humidity     Play=Yes  Play=No
High           3/9       4/5
Normal         6/9       1/5

Wind         Play=Yes  Play=No
Strong         3/9       3/5
Weak           6/9       2/5

P(Play=Yes) = 9/14
P(Play=No) = 5/14
Example
Test Phase

Given a new instance:
x’ = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)

Look up the tables:
P(Outlook=Sunny | Play=Yes) = 2/9          P(Outlook=Sunny | Play=No) = 3/5
P(Temperature=Cool | Play=Yes) = 3/9       P(Temperature=Cool | Play=No) = 1/5
P(Humidity=High | Play=Yes) = 3/9          P(Humidity=High | Play=No) = 4/5
P(Wind=Strong | Play=Yes) = 3/9            P(Wind=Strong | Play=No) = 3/5
P(Play=Yes) = 9/14                         P(Play=No) = 5/14

MAP rule:
P(Yes | x’) ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes) = 0.0053
P(No | x’) ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play=No) = 0.0206
Since P(Yes | x’) < P(No | x’), we label x’ as "No".
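The same MAP computation in a few lines of Python; this sketch simply replays the slide's arithmetic with the learned conditional probabilities:

```python
# Scores for x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
likelihood_yes = (2/9) * (3/9) * (3/9) * (3/9)  # P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)
likelihood_no  = (3/5) * (1/5) * (4/5) * (3/5)  # the same factors conditioned on No

score_yes = likelihood_yes * 9/14               # proportional to P(Yes | x')
score_no  = likelihood_no  * 5/14               # proportional to P(No | x')

print(round(score_yes, 4), round(score_no, 4))  # approximately 0.0053 0.0206
print("Yes" if score_yes > score_no else "No")  # "No"
```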
Naïve Bayes Classifier
If one of the conditional probabilities is zero, then the entire expression becomes zero.

Probability estimation:
Original:  P(Ai | C) = Nic / Nc
Laplace:   P(Ai | C) = (Nic + 1) / (Nc + c)

Nc: number of training examples of class C; Nic: number of class-C examples with attribute value Ai; c: number of classes.
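A small sketch of the two estimators (the helper function and its parameter names are mine; the formulas are from the slide):

```python
def conditional_estimate(n_ic, n_c, n_classes, laplace=True):
    """Estimate P(Ai | C) from counts; Laplace smoothing avoids zero probabilities."""
    if laplace:
        return (n_ic + 1) / (n_c + n_classes)
    return n_ic / n_c

# An attribute value never seen with class C among 9 training examples, 2 classes:
print(conditional_estimate(0, 9, 2, laplace=False))  # 0.0, would zero out the whole product
print(conditional_estimate(0, 9, 2, laplace=True))   # about 0.09, small but non-zero
```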
Problem Solving
Name            Give Birth  Can Fly  Live in Water  Have Legs  Class
human           yes         no       no             yes        mammals
python          no          no       no             no         non-mammals
salmon          no          no       yes            no         non-mammals
whale           yes         no       yes            no         mammals
frog            no          no       sometimes      yes        non-mammals
komodo          no          no       no             yes        non-mammals
bat             yes         yes      no             yes        mammals
pigeon          no          yes      no             yes        non-mammals
cat             yes         no       no             yes        mammals
leopard shark   yes         no       yes            no         non-mammals
turtle          no          no       sometimes      yes        non-mammals
penguin         no          no       sometimes      yes        non-mammals
porcupine       yes         no       no             yes        mammals
eel             no          no       yes            no         non-mammals
salamander      no          no       sometimes      yes        non-mammals
gila monster    no          no       no             yes        non-mammals
platypus        no          no       no             yes        mammals
owl             no          yes      no             yes        non-mammals
dolphin         yes         no       yes            no         mammals
eagle           no          yes      no             yes        non-mammals

A: attributes    M: mammals    N: non-mammals

New record to classify:
Give Birth  Can Fly  Live in Water  Have Legs  Class
yes         no       yes            no         ?
Solution
(Using the same table as above, with A = (Give Birth=yes, Can Fly=no, Live in Water=yes, Have Legs=no), M = mammals, N = non-mammals.)

P(A | M) = 6/7 × 6/7 × 2/7 × 2/7 = 0.06
P(A | N) = 1/13 × 10/13 × 3/13 × 4/13 = 0.0042

P(A | M) P(M) = 0.06 × 7/20 = 0.021
P(A | N) P(N) = 0.0042 × 13/20 = 0.0027

P(A | M) P(M) > P(A | N) P(N)  =>  Mammals
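For reference, the same computation as a short Python sketch (the counts are read off the table above; variable names are mine):

```python
# A = (Give Birth=yes, Can Fly=no, Live in Water=yes, Have Legs=no)
p_A_given_M = (6/7) * (6/7) * (2/7) * (2/7)       # about 0.06
p_A_given_N = (1/13) * (10/13) * (3/13) * (4/13)  # about 0.0042

p_M, p_N = 7/20, 13/20                            # priors: 7 mammals, 13 non-mammals

print(p_A_given_M * p_M)  # about 0.021
print(p_A_given_N * p_N)  # about 0.0027
print("mammals" if p_A_given_M * p_M > p_A_given_N * p_N else "non-mammals")
```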
Classifier Evaluation Metrics: Confusion Matrix
Confusion Matrix:

Actual class \ Predicted class    C1                     ¬C1
C1                                True Positives (TP)    False Negatives (FN)
¬C1                               False Positives (FP)   True Negatives (TN)

Example of Confusion Matrix:

Actual class \ Predicted class   buy_computer = yes   buy_computer = no   Total
buy_computer = yes               6954                 46                  7000
buy_computer = no                412                  2588                3000
Total                            7366                 2634                10000

Given m classes, an entry CM_i,j in a confusion matrix indicates the number of tuples in class i that were labeled by the classifier as class j.
May have extra rows/columns to provide totals.
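A short sketch computing the standard metrics from the example matrix (the metric definitions appear on the following slides; the variable names are mine):

```python
# buy_computer confusion matrix from the example above
TP, FN = 6954, 46     # actual class = yes
FP, TN = 412, 2588    # actual class = no
P, N = TP + FN, FP + TN
total = P + N

accuracy    = (TP + TN) / total   # about 0.9542
error_rate  = (FP + FN) / total   # about 0.0458
sensitivity = TP / P              # recall / hit rate, about 0.9934
specificity = TN / N              # about 0.8627
precision   = TP / (TP + FP)      # about 0.9441

print(accuracy, error_rate, sensitivity, specificity, precision)
```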
Classifier Evaluation Metrics: Accuracy, Error Rate, Sensitivity and Specificity

A \ P    C     ¬C
C        TP    FN    P
¬C       FP    TN    N
         P'    N'    All

Class Imbalance Problem: one class may be rare, e.g. fraud or HIV-positive; there is a significant majority of the negative class and a minority of the positive class.

Classifier Accuracy (recognition rate): the percentage of test set tuples that are correctly classified.
Accuracy = (TP + TN) / All
Error rate = 1 - accuracy = (FP + FN) / All
Sensitivity (True Positive recognition rate) = TP / P
Specificity (True Negative recognition rate) = TN / N
Measuring Error
Error rate = # of errors / # of instances = (FN + FP) / N
Recall = # of found positives / # of positives = TP / (TP + FN) = sensitivity = hit rate
Precision = # of found positives / # of found = TP / (TP + FP)
Specificity = TN / (TN + FP)
False alarm rate = FP / (FP + TN) = 1 - Specificity
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)