CLASSIFICATION: Bayesian Classifiers
 Uses Bayes’ (Thomas Bayes, 1701-1761) Theorem to build
probabilistic models of relationships between attributes
and classes
 Statistical principle for combining prior class knowledge
with new evidence from data
 Multiple implementations
 Naïve Bayes
 Bayesian networks
CLASSIFICATION: Bayesian Classifiers
 Requires concept of conditional probability
 Measures the probability of an event given that (by
evidence or information) another event has occurred
 Notation: P(A|B) = probability of A given that B has occurred
 P(A|B) = P(A∩B)/P(B), if P(B) ≠ 0
 Equivalently, P(A∩B) = P(A|B)P(B)
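A minimal sketch of the definition in Python (the dice events are illustrative choices, not from the slides): enumerate the 36 equally likely rolls of two dice and check P(A|B) = P(A∩B)/P(B) with A = "total is 8" and B = "first die shows 3".

```python
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice
outcomes = list(product(range(1, 7), repeat=2))

p_b = sum(1 for d1, d2 in outcomes if d1 == 3) / 36        # P(B) = 6/36
p_a_and_b = sum(1 for d1, d2 in outcomes
                if d1 == 3 and d1 + d2 == 8) / 36          # P(A∩B) = 1/36

print(p_a_and_b / p_b)   # P(A|B) = (1/36)/(6/36) ≈ 0.1667
```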
BAYESIAN CLASSIFIERS: Conditional
Probability
 Example:
 Suppose 1% of a specific population has a form of cancer
 A new diagnostic test
 produces correct positive results for those with the cancer 99% of the time
 produces correct negative results for those without the cancer 98% of the time
 P(cancer) = 0.01
 P(positive test|cancer) = 0.99
 P(negative test|no cancer) = 0.98
BAYESIAN CLASSIFIERS: Conditional
Probability
 Example:
 But what if you tested positive? What is the probability
that you actually have cancer?
 Bayes’ Theorem “reverses” the process to provide us
with an answer.
BAYESIAN CLASSIFIERS: Bayes’
Theorem
 P(B|A)
= P(B∩A)/P(A), if P(A)≠0
= P(A∩B)/P(A)
= P(A|B)P(B)/P(A)
 Application to our example
 P(cancer|test positive)
= P(test positive|cancer)P(cancer)/P(test positive)
= (0.99 × 0.01)/(0.99 × 0.01 + 0.02 × 0.99) ≈ 0.33
BAYESIAN CLASSIFIERS: Bayes’
Theorem
[Probability tree: P(cancer) = 0.01 and P(no cancer) = 0.99; given cancer, P(test positive) = 0.99 and P(test negative) = 0.01; given no cancer, P(test positive) = 0.02 and P(test negative) = 0.98]
BAYESIAN CLASSIFIERS: Naïve Bayes’
 Bayes’ Theorem Interpretation
 P(class C| F1, F2, … , Fn) =
P(class C) × P(F1, F2, … , Fn| C)/P(F1, F2, … , Fn)
 posterior = prior × likelihood/evidence
BAYESIAN CLASSIFIERS: Naïve Bayes’
 Key concepts
 Denominator independent of class C
 Denominator effectively constant
 Numerator equivalent to joint probability model
 P(C, F1, F2, … , Fn)
 Naïve conditional independence assumptions
 P(C|F1, F2, … , Fn) ∝ P(C)P(F1|C)P(F2|C) ⋯ P(Fn|C)
= P(C) ∏i=1…n P(Fi|C)
BAYESIAN CLASSIFIERS: Naïve Bayes’
 Multiple distributional assumptions possible
 Gaussian
 Multinomial
 Bernoulli
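For reference, these three variants map directly onto scikit-learn's GaussianNB, MultinomialNB, and BernoulliNB classes; a minimal sketch, assuming scikit-learn is available (the tiny data set here is illustrative):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB  # MultinomialNB, BernoulliNB also available

# Continuous features (height, weight, foot size), so the Gaussian variant fits
X = np.array([[6.0, 180, 12], [5.92, 190, 11], [5.0, 100, 6], [5.5, 150, 8]])
y = np.array(["male", "male", "female", "female"])

clf = GaussianNB().fit(X, y)
print(clf.predict([[6.0, 130, 8]]))
# MultinomialNB suits count features; BernoulliNB suits binary features.
```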
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
 Training set (example from Wikipedia)
Sex      Height (feet)    Weight (pounds)   Foot size (inches)
male     6                180               12
male     5.92 (5'11")     190               11
male     5.58 (5'7")      170               12
male     5.92 (5'11")     165               10
female   5                100               6
female   5.5 (5'6")       150               8
female   5.42 (5'5")      130               7
female   5.75 (5'9")      150               9
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
 Assumptions
 Continuous data
 Gaussian (Normal) distribution
p(x) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²))
 P(male) = P(female) = 0.5
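A direct transcription of this density into Python (a sketch; the spot check uses a mean and variance that appear on the later slides):

```python
import math

def gaussian_pdf(x, mean, var):
    """Normal density: exp(-(x - mean)**2 / (2*var)) / sqrt(2*pi*var)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

print(gaussian_pdf(6, 5.855, 0.035033))   # ≈ 1.5789, i.e. P(height|male) below
```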
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
 Classifier generated from training set
Sex      Height mean   Height variance   Weight mean   Weight variance   Foot size mean   Foot size variance
male     5.855         0.035033          176.25        122.92            11.25            0.91667
female   5.4175        0.097225          132.5         558.33            7.5              1.6667
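These parameters can be reproduced from the training set; note that the table uses the unbiased sample variance (n − 1 in the denominator), which is what Python's statistics.variance computes. A sketch:

```python
from statistics import mean, variance   # variance() divides by n - 1

# Training set from the earlier slide: (height, weight, foot size) per class
data = {
    "male":   [(6.0, 180, 12), (5.92, 190, 11), (5.58, 170, 12), (5.92, 165, 10)],
    "female": [(5.0, 100, 6), (5.5, 150, 8), (5.42, 130, 7), (5.75, 150, 9)],
}

for sex, rows in data.items():
    for attr, col in zip(("height", "weight", "foot size"), zip(*rows)):
        print(sex, attr, round(mean(col), 6), round(variance(col), 6))
# e.g. male height 5.855 0.035033, female weight 132.5 558.333333
```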
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
 Test sample
Sex      Height   Weight   Foot size
sample   6        130      8
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
 Calculate posterior probabilities for both genders
 Posterior(male) = P(male)P(height|male)P(weight|male)P(foot size|male)/evidence
 Posterior(female) = P(female)P(height|female)P(weight|female)P(foot size|female)/evidence
 The evidence term is the same constant for both, so we can ignore the denominators
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
 Calculations for male
 P(male) = 0.5 (assumed)
 P(height|male) = (1/√(2π(0.035033))) exp(−(6 − 5.855)²/(2(0.035033))) ≈ 1.5789
 (a probability density, so values greater than 1 are possible)
 P(weight|male) = (1/√(2π(122.92))) exp(−(130 − 176.25)²/(2(122.92))) ≈ 5.9881×10⁻⁶
 P(foot size|male) = (1/√(2π(0.91667))) exp(−(8 − 11.25)²/(2(0.91667))) ≈ 1.3112×10⁻³
 Posterior numerator (male) ≈ 0.5 × 1.5789 × 5.9881×10⁻⁶ × 1.3112×10⁻³ ≈ 6.1984×10⁻⁹
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
 Calculations for female
 P(female) = 0.5 (assumed)
 P(height|female) = (1/√(2π(0.097225))) exp(−(6 − 5.4175)²/(2(0.097225))) ≈ 0.22346
 P(weight|female) = (1/√(2π(558.33))) exp(−(130 − 132.5)²/(2(558.33))) ≈ 0.016789
 P(foot size|female) = (1/√(2π(1.6667))) exp(−(8 − 7.5)²/(2(1.6667))) ≈ 0.28669
 Posterior numerator (female) ≈ 0.5 × 0.22346 × 0.016789 × 0.28669 ≈ 5.3778×10⁻⁴
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
 Conclusion
 Posterior numerator (significantly) greater for
female classification than for male, so classify
sample as female
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
 Note
 We did not calculate P(evidence) (the normalizing constant) since it is not needed for classification, but we could:
 P(evidence) = P(male)P(height|male)P(weight|male)P(foot size|male) + P(female)P(height|female)P(weight|female)P(foot size|female)
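Pulling the whole worked example together in Python (a sketch assuming Python 3.8+ for math.prod; parameters are taken from the classifier table above):

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# (mean, variance) for height, weight, foot size, from the classifier table
params = {
    "male":   [(5.855, 0.035033), (176.25, 122.92), (11.25, 0.91667)],
    "female": [(5.4175, 0.097225), (132.5, 558.33), (7.5, 1.6667)],
}
prior = {"male": 0.5, "female": 0.5}
sample = [6, 130, 8]

# Posterior numerator: prior times the product of per-attribute likelihoods
numer = {c: prior[c] * math.prod(gaussian_pdf(x, m, v)
                                 for x, (m, v) in zip(sample, params[c]))
         for c in params}
evidence = sum(numer.values())                       # P(evidence), as above
print(numer)                                         # ≈ 6.1984e-09 (male), 5.3778e-04 (female)
print({c: n / evidence for c, n in numer.items()})   # normalized posteriors
print(max(numer, key=numer.get))                     # -> female
```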
BAYESIAN CLASSIFIERS: Bayesian
Networks
 Judea Pearl (UCLA Computer Science, Cognitive
Systems Lab): one of the pioneers of Bayesian
Networks
 Author: Probabilistic Reasoning in Intelligent Systems, 1988
 Father of journalist Daniel Pearl
 Kidnapped and murdered in Pakistan in 2002 by Al-Qaeda
BAYESIAN CLASSIFIERS: Bayesian
Networks
 Probabilistic graphical model
 Represents random variables and conditional
dependencies using a directed acyclic graph
(DAG)
 Nodes of graph represent random variables
BAYESIAN CLASSIFIERS: Bayesian
Networks
 Edges of graph represent conditional
dependencies
 A node is conditionally independent of its non-descendants, given its parents
 Does not require all attributes to be conditionally
independent
BAYESIAN CLASSIFIERS: Bayesian
Networks
 Probability table associating each node with its immediate parent nodes
 If node X has no immediate parents, table
contains only prior probability P(X)
 If one parent Y, table contains P(X|Y)
 If multiple parents {Y1, Y2, ⋯ , Yn}, table contains
P(X|Y1, Y2, ⋯ , Yn)
BAYESIAN CLASSIFIERS: Bayesian
Networks
[Figure: example network with nodes Raining, Sprinkler on, and Grass wet; Raining is a parent of Sprinkler on and of Grass wet, and Sprinkler on is a parent of Grass wet, each node carrying its probability table]
BAYESIAN CLASSIFIERS: Bayesian
Networks
 Model encodes relevant probabilities from
which probabilistic inferences can then be
calculated
 Joint probability: P(G, S, R) = P(R)P(S|R)P(G|S, R)
 G = “Grass wet”
 S = “Sprinkler on”
 R = “Raining”
BAYESIAN CLASSIFIERS: Bayesian
Networks
 We can then calculate, for example:
P(it is raining | grass is wet) = P(it is raining AND grass is wet)/P(grass is wet)
= Σ sprinkler∈{T,F} P(grass is wet = T AND sprinkler AND raining = T)
  / Σ sprinkler∈{T,F}, raining∈{T,F} P(grass is wet = T AND sprinkler AND raining)
BAYESIAN CLASSIFIERS: Bayesian
Networks
 That is, writing each joint outcome as (grass wet, sprinkler, raining):
P(it is raining | grass is wet)
= [P(TTT) + P(TFT)] / [P(TTT) + P(TTF) + P(TFT) + P(TFF)]
= (0.00198 + 0.1584)/(0.00198 + 0.288 + 0.1584 + 0.0) ≈ 0.3577
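The same inference can be reproduced by brute-force enumeration over the joint distribution. In the sketch below, the CPT values (20% chance of rain, and the sprinkler and grass-wet probabilities) are assumptions taken from the classic sprinkler example; they are consistent with the arithmetic on this slide:

```python
from itertools import product

p_rain = {True: 0.2, False: 0.8}                       # P(R)
p_sprinkler = {True: 0.01, False: 0.4}                 # P(S=T | R)
p_grass = {(True, True): 0.99, (True, False): 0.9,     # P(G=T | S, R)
           (False, True): 0.8, (False, False): 0.0}

def joint(g, s, r):
    """P(G, S, R) = P(R) P(S|R) P(G|S, R), the DAG factorization."""
    ps = p_sprinkler[r] if s else 1 - p_sprinkler[r]
    pg = p_grass[(s, r)] if g else 1 - p_grass[(s, r)]
    return p_rain[r] * ps * pg

# P(raining = T | grass wet = T): sum out the sprinkler variable
numer = sum(joint(True, s, True) for s in (True, False))
denom = sum(joint(True, s, r) for s, r in product((True, False), repeat=2))
print(numer / denom)   # ≈ 0.3577
```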
BAYESIAN CLASSIFIERS: Bayesian
Networks
 Building the model
 Create network structure (graph)
 Determine probability values of tables
 Simplest case
 Network defined by user
 Most real-world cases
 Defining network too complex
 Use machine learning: many algorithms
BAYESIAN CLASSIFIERS: Bayesian
Networks
 Algorithms built into Weka
 User defined network
 Conditional independence tests
 Genetic search
 Hill climber
 K2
 Simulated annealing
 Maximum weight spanning tree
 Tabu search
BAYESIAN CLASSIFIERS: Bayesian
Networks
 Many other implementations available online
 BNT (Bayes Net Toolbox) for Matlab
 Kevin Murphy, University of British Columbia
 http://www.cs.ubc.ca/~murphyk/Software/