Eco 6380
Predictive Analytics For Economists
Spring 2016
Professor Tom Fomby
Department of Economics
SMU
Presentation 13
Naïve Bayes Classifier
Chapter 8 in SPB
Basis of the Naïve Bayes Classifier:
Bayes' Theorem
The Naïve Bayes classifier is a classification method based on Bayes' theorem. Let C_j denote that an observation belongs to the j-th class, j = 1, 2, ..., J, out of J possible classes. Let P(C_j | X_1, X_2, ..., X_p) denote the (posterior) probability of belonging to the j-th class given the individual characteristics X_1, X_2, ..., X_p. Furthermore, let P(X_1, X_2, ..., X_p | C_j) denote the probability of observing the individual characteristics X_1, X_2, ..., X_p given that the case belongs to the j-th class, and let P(C_j) denote the unconditional (i.e., without regard to individual characteristics) prior probability of belonging to the j-th class. For a total of J classes, Bayes' theorem gives us the following probability rule for calculating the case-specific probability of falling into the j-th class:

P(C_j | X_1, X_2, ..., X_p) = P(X_1, X_2, ..., X_p | C_j) P(C_j) / Denom

where Denom = P(X_1, X_2, ..., X_p | C_1) P(C_1) + ... + P(X_1, X_2, ..., X_p | C_J) P(C_J)
The Naïve Independence Assumption –
P(X_1, X_2, ..., X_p | C_j) = P(X_1 | C_j) · P(X_2 | C_j) ··· P(X_p | C_j).
This assumption states that, given Class j, the joint probability of a case's characteristics is (naively) equal to the product of the individual conditional probabilities of each characteristic given Class j. This simplifies the computation of the class probabilities of cases and helps prevent class probabilities from predominantly being singular (i.e., either zero or one) across a majority of the cases. Then, under the independence assumption, the terms on the right-hand side of the above equation can be calculated simply as the relative frequencies of the individual X_i's in Class C_j. For example, the training data set could be used to calculate the relative frequency
P(X_i | C_j) = (# of X_i in C_j) / (total # of cases in C_j)
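The estimation and classification steps above can be sketched in a few lines of Python. This is a minimal illustration, not code from the course; all function and variable names are our own. It estimates the priors and conditional probabilities as training-set relative frequencies, then scores a new case by the prior times the product of per-feature conditionals and normalizes by Denom.

```python
from collections import Counter, defaultdict

def train_naive_bayes(X, y):
    """Estimate P(C_j) and P(X_i | C_j) as relative frequencies from
    categorical feature vectors X and class labels y (illustrative sketch)."""
    n = len(y)
    class_counts = Counter(y)
    # Empirical Bayes prior: training-set relative frequency of each class
    priors = {c: class_counts[c] / n for c in class_counts}
    # cond_counts[(i, v, c)] = number of class-c cases with X_i equal to v
    cond_counts = defaultdict(int)
    for xs, c in zip(X, y):
        for i, v in enumerate(xs):
            cond_counts[(i, v, c)] += 1
    def cond_prob(i, v, c):
        # P(X_i = v | C_j = c) = (# of v in class c) / (total # of cases in c)
        return cond_counts[(i, v, c)] / class_counts[c]
    return priors, cond_prob

def classify(xs, priors, cond_prob):
    """Posterior class probabilities under the naive independence assumption:
    score_j = P(C_j) * prod_i P(X_i | C_j), then divide each score by Denom."""
    scores = {}
    for c, prior in priors.items():
        prob = prior
        for i, v in enumerate(xs):
            prob *= cond_prob(i, v, c)
        scores[c] = prob
    denom = sum(scores.values())  # Denom from Bayes' theorem above
    return {c: s / denom for c, s in scores.items()} if denom > 0 else scores
```

For instance, training on four cases with two categorical characteristics and then calling `classify` on a new case returns a dictionary of posterior probabilities that sums to one.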
Two Ways of Calculating the
Prior Class Probabilities P(C_j)
β€’ β€œPure” Bayesian Uniform (Uninformative) Prior – each
Class has an equal probability
P(C_j) = 1/J, j = 1, 2, ..., J
β€’ Empirical Bayes Prior – the training set relative
frequencies of each Class are used as the β€œEmpirical”
Bayes Prior
P(C_j) = (# of training cases falling into C_j) / (total # of training cases)
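The two prior choices can be written out directly; this is a small illustrative sketch (names are our own, not from the slides):

```python
from collections import Counter

def uniform_prior(classes):
    """'Pure' Bayesian uninformative prior: each of the J classes gets 1/J."""
    J = len(classes)
    return {c: 1.0 / J for c in classes}

def empirical_prior(y):
    """Empirical Bayes prior: training-set relative frequency of each class."""
    counts = Counter(y)
    n = len(y)
    return {c: counts[c] / n for c in counts}
```

With a balanced training set the two priors coincide; they differ only when the class frequencies are unequal.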
β€’ For more details on the Naïve Bayes Classifier see
the pdf file β€œNaïve Bayes Classifier.pdf” on the
class website.
Classroom Exercise:
Exercise 8