CLASSIFICATION: Bayesian Classifiers
Uses Bayes’ (Thomas Bayes, c. 1701-1761) Theorem to build
probabilistic models of relationships between attributes
and classes
Statistical principle for combining prior class knowledge
with new evidence from data
Multiple implementations
Naïve Bayes
Bayesian networks
CLASSIFICATION: Bayesian Classifiers
Requires concept of conditional probability
Measures the probability of an event given that another
event is known (by evidence or information) to have occurred
Notation: P(A|B) = probability of A given knowledge
that B occurred
P(A|B) = P(A∩B)/P(B), provided P(B) ≠ 0
Equivalently, P(A∩B) = P(A|B)P(B)
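A quick numeric sanity check of the definition (a minimal Python sketch; the die scenario is illustrative, not from the slides):

```python
# P(A|B) from first principles on one fair die roll:
# A = "roll is even", B = "roll is greater than 3".
outcomes = [1, 2, 3, 4, 5, 6]

p_b = sum(1 for x in outcomes if x > 3) / len(outcomes)                       # P(B) = 3/6
p_a_and_b = sum(1 for x in outcomes if x % 2 == 0 and x > 3) / len(outcomes)  # P(A∩B) = 2/6

print(p_a_and_b / p_b)  # P(A|B) = P(A∩B)/P(B) = (2/6)/(3/6) = 2/3
```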
BAYESIAN CLASSIFIERS: Conditional Probability
Example:
Suppose 1% of a specific population has a form of cancer
A new diagnostic test
produces correct positive results 99% of the time for
those with the cancer
produces correct negative results 98% of the time for
those without the cancer
P(cancer) = 0.01
P(positive test | cancer) = 0.99
P(negative test | no cancer) = 0.98
BAYESIAN CLASSIFIERS: Conditional Probability
Example:
But what if you tested positive? What is the probability
that you actually have cancer?
Bayes’ Theorem “reverses” the process to provide us
with an answer.
BAYESIAN CLASSIFIERS: Bayes’ Theorem
P(B|A)
= P(B∩A)/P(A), if P(A)≠0
= P(A∩B)/P(A)
= P(A|B)P(B)/P(A)
Application to our example:
P(cancer | test positive)
= P(test positive | cancer) × P(cancer) / P(test positive)
= (0.99 × 0.01)/(0.99 × 0.01 + 0.02 × 0.99) ≈ 0.33
BAYESIAN CLASSIFIERS: Bayes’ Theorem
[Probability tree: P(cancer) = 0.01, P(no cancer) = 0.99; given cancer: P(test positive) = 0.99, P(test negative) = 0.01; given no cancer: P(test positive) = 0.02, P(test negative) = 0.98]
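A minimal Python sketch of this calculation (variable names are mine):

```python
# Bayes' theorem on the diagnostic-test example above.
p_cancer = 0.01                       # prior: P(cancer)
p_pos_given_cancer = 0.99             # P(positive test | cancer)
p_neg_given_healthy = 0.98            # P(negative test | no cancer)
p_pos_given_healthy = 1 - p_neg_given_healthy   # false-positive rate = 0.02

# Evidence: total probability of testing positive.
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_healthy * (1 - p_cancer)

# P(cancer | positive) = P(positive | cancer) * P(cancer) / P(positive)
print(p_pos_given_cancer * p_cancer / p_pos)   # ≈ 0.3333
```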
BAYESIAN CLASSIFIERS: Naïve Bayes’
Bayes’ Theorem Interpretation
P(class C | F1, F2, … , Fn) =
P(class C) × P(F1, F2, … , Fn | C) / P(F1, F2, … , Fn)
posterior = prior × likelihood/evidence
BAYESIAN CLASSIFIERS: Naïve Bayes’
Key concepts
Denominator independent of class C
Denominator effectively constant
Numerator equivalent to joint probability model
P(C, F1, F2, … , Fn)
Naïve conditional independence assumptions
P(C | F1, F2, … , Fn) ∝ P(C) P(F1|C) P(F2|C) ⋯ P(Fn|C)
∝ P(C) ∏ᵢ₌₁ⁿ P(Fi|C)
BAYESIAN CLASSIFIERS: Naïve Bayes’
Multiple distributional assumptions possible
Gaussian
Multinomial
Bernoulli
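For reference, scikit-learn implements all three variants in sklearn.naive_bayes (GaussianNB, MultinomialNB, BernoulliNB). A minimal sketch of the Gaussian variant on toy data (the data values here are illustrative):

```python
from sklearn.naive_bayes import GaussianNB   # also: MultinomialNB, BernoulliNB

# Toy continuous features; count data would call for MultinomialNB,
# binary features for BernoulliNB.
X = [[6.00, 180, 12], [5.92, 190, 11], [5.00, 100, 6], [5.50, 150, 8]]
y = ["male", "male", "female", "female"]

clf = GaussianNB().fit(X, y)
print(clf.predict([[6.0, 130, 8]]))   # predicted class label for the new sample
```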
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
Training set (example from Wikipedia)
Sex    | Height (feet) | Weight (pounds) | Foot size (inches)
male   | 6             | 180             | 12
male   | 5.92 (5'11")  | 190             | 11
male   | 5.58 (5'7")   | 170             | 12
male   | 5.92 (5'11")  | 165             | 10
female | 5             | 100             | 6
female | 5.5 (5'6")    | 150             | 8
female | 5.42 (5'5")   | 130             | 7
female | 5.75 (5'9")   | 150             | 9
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
Assumptions
Continuous data
Gaussian (Normal) distribution
p(x) = 1/√(2πσ²) · exp(−(x − μ)²/(2σ²))
P(male) = P(female) = 0.5
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
Classifier generated from training set
Sex    | Height mean | Height variance | Weight mean | Weight variance | Foot size mean | Foot size variance
male   | 5.855       | 0.035033        | 176.25      | 122.92          | 11.25          | 0.91667
female | 5.4175      | 0.097225        | 132.5       | 558.33          | 7.5            | 1.6667
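These means and sample variances can be reproduced directly from the training table (a numpy sketch; note ddof=1, i.e. the n−1 sample variance, which is what the table above uses):

```python
import numpy as np

# Columns: height (feet), weight (pounds), foot size (inches).
males   = np.array([[6.00, 180, 12], [5.92, 190, 11], [5.58, 170, 12], [5.92, 165, 10]])
females = np.array([[5.00, 100,  6], [5.50, 150,  8], [5.42, 130,  7], [5.75, 150,  9]])

for name, data in [("male", males), ("female", females)]:
    print(name, data.mean(axis=0), data.var(axis=0, ddof=1))
# male:   means ≈ [5.855, 176.25, 11.25],  variances ≈ [0.035033, 122.92, 0.91667]
# female: means ≈ [5.4175, 132.5, 7.5],    variances ≈ [0.097225, 558.33, 1.6667]
```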
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
Test sample
Sex    | Height | Weight | Foot size
sample | 6      | 130    | 8
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
Calculate posterior probabilities for both genders
Posterior(male) = P(male) P(height|male) P(weight|male) P(foot size|male) / evidence
Posterior(female) = P(female) P(height|female) P(weight|female) P(foot size|female) / evidence
The evidence term is the same constant in both cases, so we
can ignore the denominators
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
Calculations for male
P(male) = 0.5 (assumed)
P(height|male) = 1/√(2π × 0.035033) × exp(−(6 − 5.855)²/(2 × 0.035033)) ≈ 1.5789
(note: this is a probability density, not a probability, so values above 1 are fine)
P(weight|male) = 1/√(2π × 122.92) × exp(−(130 − 176.25)²/(2 × 122.92)) ≈ 5.9881×10⁻⁶
P(foot size|male) = 1/√(2π × 0.91667) × exp(−(8 − 11.25)²/(2 × 0.91667)) ≈ 1.3112×10⁻³
Posterior numerator (male) = 0.5 × 1.5789 × 5.9881×10⁻⁶ × 1.3112×10⁻³ ≈ 6.1984×10⁻⁹
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
Calculations for female
P(female) = 0.5 (assumed)
P(height|female) = 1/√(2π × 0.097225) × exp(−(6 − 5.4175)²/(2 × 0.097225)) ≈ 0.22346
P(weight|female) = 1/√(2π × 558.33) × exp(−(130 − 132.5)²/(2 × 558.33)) ≈ 0.016789
P(foot size|female) = 1/√(2π × 1.6667) × exp(−(8 − 7.5)²/(2 × 1.6667)) ≈ 0.28669
Posterior numerator (female) = 0.5 × 0.22346 × 0.016789 × 0.28669 ≈ 5.3778×10⁻⁴
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
Conclusion
Posterior numerator is significantly greater for the
female classification than for the male, so we classify
the sample as female
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
Note
We did not calculate P(evidence) [the normalizing
constant] since it is not needed for classification, but we could:
P(evidence) = P(male) P(height|male) P(weight|male) P(foot size|male)
+ P(female) P(height|female) P(weight|female) P(foot size|female)
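The whole worked example, including the normalizing evidence just described, fits in a short Python sketch (names are mine; it reproduces the posterior numerators above):

```python
import math

def gaussian(x, mean, var):
    # Gaussian (normal) density with the given mean and variance.
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# (mean, variance) per feature, taken from the classifier table above.
params = {
    "male":   {"height": (5.855,  0.035033), "weight": (176.25, 122.92), "foot": (11.25, 0.91667)},
    "female": {"height": (5.4175, 0.097225), "weight": (132.5,  558.33), "foot": (7.5,   1.6667)},
}
prior = {"male": 0.5, "female": 0.5}
sample = {"height": 6.0, "weight": 130.0, "foot": 8.0}

# Posterior numerator: prior times the product of per-feature densities.
numerator = {}
for sex in params:
    p = prior[sex]
    for feature, value in sample.items():
        p *= gaussian(value, *params[sex][feature])
    numerator[sex] = p

evidence = sum(numerator.values())   # P(evidence), the normalizing constant
for sex, num in numerator.items():
    print(sex, num, num / evidence)
# male   ≈ 6.1984e-09, posterior ≈ 1.15e-05
# female ≈ 5.3778e-04, posterior ≈ 0.99999
```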
BAYESIAN CLASSIFIERS: Bayesian Networks
Judea Pearl (UCLA Computer Science, Cognitive
Systems Lab): one of the pioneers of Bayesian
Networks
Author of Probabilistic Reasoning in Intelligent Systems, 1988
Father of journalist Daniel Pearl, who was kidnapped and
murdered in Pakistan in 2002 by al-Qaeda
BAYESIAN CLASSIFIERS: Bayesian Networks
Probabilistic graphical model
Represents random variables and conditional
dependencies using a directed acyclic graph
(DAG)
Nodes of graph represent random variables
BAYESIAN CLASSIFIERS: Bayesian Networks
Edges of graph represent conditional
dependencies
Nodes that are not connected represent variables that are
conditionally independent of each other
Does not require all attributes to be conditionally
independent
BAYESIAN CLASSIFIERS: Bayesian Networks
Each node has a probability table that relates it to its
immediate parent nodes
If node X has no immediate parents, table
contains only prior probability P(X)
If one parent Y, table contains P(X|Y)
If multiple parents {Y1, Y2, ⋯ , Yn}, table contains
P(X|Y1, Y2, ⋯ , Yn)
BAYESIAN CLASSIFIERS: Bayesian Networks
[Figure: the rain/sprinkler/wet-grass network, with edges Raining → Sprinkler, Raining → Grass wet, Sprinkler → Grass wet, and a conditional probability table at each node]
BAYESIAN CLASSIFIERS: Bayesian Networks
Model encodes relevant probabilities from
which probabilistic inferences can then be
calculated
Joint probability: P(G, S, R) = P(R) P(S|R) P(G|S, R)
G = “Grass wet”
S = “Sprinkler on”
R = “Raining”
BAYESIAN CLASSIFIERS: Bayesian Networks
We can then calculate, for example:
P(it is raining | grass is wet) = P(it is raining AND grass is wet) / P(grass is wet)
= Σ_{sprinkler ∈ {T,F}} P(grass wet = T, sprinkler, raining = T)
  / Σ_{sprinkler ∈ {T,F}, raining ∈ {T,F}} P(grass wet = T, sprinkler, raining)
BAYESIAN CLASSIFIERS: Bayesian Networks
That is, writing P(TTT) for P(grass wet = T, sprinkler = T, raining = T) and so on:
P(it is raining | grass is wet)
= [P(TTT) + P(TFT)] / [P(TTT) + P(TTF) + P(TFT) + P(TFF)]
= (0.00198 + 0.1584) / (0.00198 + 0.288 + 0.1584 + 0.0) ≈ 0.3577
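A brute-force enumeration sketch of this inference in Python (the CPT values below are the standard ones for this classic example and are consistent with the arithmetic above):

```python
from itertools import product

# Conditional probability tables, keyed by True/False.
P_R = {True: 0.2, False: 0.8}                    # P(Raining)
P_S = {True: 0.01, False: 0.4}                   # P(Sprinkler = T | Raining)
P_G = {(True, True): 0.99, (True, False): 0.9,   # P(Grass wet = T | Sprinkler, Raining)
       (False, True): 0.8, (False, False): 0.0}

def joint(g, s, r):
    # P(G, S, R) = P(R) P(S|R) P(G|S, R), the network factorization.
    p = P_R[r]
    p *= P_S[r] if s else 1 - P_S[r]
    p *= P_G[(s, r)] if g else 1 - P_G[(s, r)]
    return p

# P(raining | grass wet): sum out the sprinkler variable.
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(True, s, r) for s, r in product((True, False), repeat=2))
print(num / den)   # ≈ 0.3577
```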
BAYESIAN CLASSIFIERS: Bayesian Networks
Building the model
Create network structure (graph)
Determine probability values of tables
Simplest case
Network defined by user
Most real-world cases
Defining the network by hand too complex
Use machine learning: many algorithms
BAYESIAN CLASSIFIERS: Bayesian Networks
Algorithms built into Weka
User defined network
Conditional independence tests
Genetic search
Hill climber
K2
Simulated annealing
Maximum weight spanning tree
Tabu search
BAYESIAN CLASSIFIERS: Bayesian Networks
Many other implementations available online
BNT (Bayes Net Toolbox) for Matlab
Kevin Murphy, University of British Columbia
http://www.cs.ubc.ca/~murphyk/Software/