Transcript
```
Naïve Bayes Classifier
The naïve Bayes classifier is one of the simplest classification models proposed in the
literature. It rests on Bayes' theorem and makes an independence assumption in order to
calculate the probability of an instance belonging to each of the class labels. Let us
look at the mathematical formulation of the naïve Bayes model.
Given a set of variables X = {x1, x2, ..., xd}, we want to find the posterior probability
of the class Cj among a set of possible class labels C = {c1, c2, ..., cm}.
Using Bayes' rule we can say that

P(Cj | x1, x2, ..., xd) = P(x1, x2, ..., xd | Cj) P(Cj) / P(x1, x2, ..., xd)    (1)
Under the naïve Bayes assumption that the attribute values are conditionally
independent given the class, the likelihood factorizes into a product of terms:

P(X | Cj) = ∏_{k=1}^{d} P(xk | Cj)    (2)
Substituting this in Bayes' rule we derive

P(Cj | X) ∝ P(Cj) ∏_{k=1}^{d} P(xk | Cj)    (3)

where P(Cj | X) is defined as the naïve Bayes posterior probability.
Once the posteriors for all the class labels have been estimated, the instance is
classified by assigning it the class label Cj for which P(Cj | X) is the maximum
among all the posterior probability values.
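The training-and-classification procedure above can be sketched as follows. This is a minimal illustration for categorical attributes using raw frequency counts (no smoothing), not the exact Weka implementation; the function names are only illustrative.

```python
from collections import Counter, defaultdict

def train_naive_bayes(X, y):
    """Estimate class priors P(C) and per-attribute conditionals
    P(x_k | C) from categorical training data by frequency counting."""
    class_counts = Counter(y)
    priors = {c: cnt / len(y) for c, cnt in class_counts.items()}
    counts = {c: defaultdict(Counter) for c in class_counts}
    for row, c in zip(X, y):
        for k, v in enumerate(row):
            counts[c][k][v] += 1
    cond = {c: {k: {v: n / class_counts[c] for v, n in vals.items()}
                for k, vals in attrs.items()}
            for c, attrs in counts.items()}
    return priors, cond

def classify(x, priors, cond):
    """Pick the class maximizing the unnormalized posterior
    P(C) * prod_k P(x_k | C), i.e. equation (3)."""
    scores = {c: priors[c] for c in priors}
    for c in priors:
        for k, v in enumerate(x):
            scores[c] *= cond[c][k].get(v, 0.0)  # unseen value -> probability 0
    return max(scores, key=scores.get)
```

With no smoothing, an attribute value never seen with a class drives that class's score to zero, which is why practical implementations typically add Laplace smoothing.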
To understand the working of the naïve Bayes classification model, let us look at its
working on the example of the weather dataset. In the figure below we present a few
samples of the weather dataset.
Looking at this dataset, the first thing we must identify is the class label, which here
is the attribute play, a binary attribute. Before we proceed to the classification task,
we must estimate the class prior probabilities. This can be done simply by computing the
relative frequencies of the two class labels over the 14 instances. So the priors are

P(yes) = P(C1) = 9/14
P(no)  = P(C2) = 5/14
The second step deals with estimating the probability of the data given the class. If we
denote the individual attributes as x1, x2, x3, x4, then by equation (2) this probability
is computed as a product of the individual conditional probabilities. For example,

P(x1 = sunny | C2) = 3/5,

which can be read off the table above. Computing all such probabilities, the probability
of occurrence of the first instance given class C2 is

P(x = (sunny, hot, high, false) | C2)
  = P(x1 = sunny | C2) * P(x2 = hot | C2) * P(x3 = high | C2) * P(x4 = false | C2)
  = 3/5 * 2/5 * 4/5 * 2/5 = 48/625
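As a quick sanity check, the product above can be reproduced with exact fractions:

```python
from fractions import Fraction as F

# Conditional probabilities for class C2 = "no", as read off the
# table in the text, for x = (sunny, hot, high, false).
likelihood_no = F(3, 5) * F(2, 5) * F(4, 5) * F(2, 5)
print(likelihood_no)  # -> 48/625
```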
At the end of the second step we have computed the probability of occurrence of the data
given the class. Finally, the posterior probability, i.e. the probability of the class
given that the instance occurred, is calculated using equation (3). For class C2,

P(C2 | x) = P(x = (sunny, hot, high, false) | C2) * P(C2) / P(x),

whose numerator is (48/625) * (5/14) ≈ 0.0274. As the denominator P(x) is constant for
the same data sample, we are only interested in computing the numerator for the different
classes. We compute P(C1 | x) similarly, compare both values, and assign the class whose
posterior value is higher.
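This comparison can be carried out end to end with exact arithmetic. The C2 = "no" conditionals are the ones derived above; the C1 = "yes" values are the standard counts for the full 14-instance weather (play-tennis) dataset, included here on the assumption that the figure shows that usual table:

```python
import math
from fractions import Fraction as F

priors = {"yes": F(9, 14), "no": F(5, 14)}
# P(x_k | C) for x = (sunny, hot, high, false).
# "no" values are from the text; "yes" values assume the standard
# 14-instance weather dataset.
cond = {"yes": [F(2, 9), F(2, 9), F(3, 9), F(6, 9)],
        "no":  [F(3, 5), F(2, 5), F(4, 5), F(2, 5)]}

# Unnormalized posteriors (equation 3 numerators) for each class.
scores = {c: priors[c] * math.prod(cond[c]) for c in priors}
label = max(scores, key=scores.get)  # "no": ~0.0274 beats ~0.0071
```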
It has generally been observed that naïve Bayes works well despite the independence
assumption, and experiments show it to be quite competitive with other methods on
standard datasets. The naïve Bayes algorithm is readily implemented in the Weka toolkit.
As this algorithm needs nominal/categorical attributes for classification, we have to
discretize numerical data in order to run naïve Bayes on it. The quality of the results
in such cases also depends on the quality of the discretization procedure.
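As one illustration, a minimal equal-width binning sketch might look like the following; this is a simple unsupervised scheme, whereas toolkits such as Weka also offer supervised discretization filters:

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to one of n_bins equal-width intervals
    spanning [min(values), max(values)]; returns integer bin indices."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against constant columns
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```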
The table below summarizes the performance of the naïve Bayes classifier on a few
standard UCI datasets, which include both numerical and categorical data. The split
ratio taken is 80% for training and 20% for testing, and the metric reported is the
accuracy of the classifier.
| Dataset Name         | Instances/Attributes | Number of classes | Accuracy (%) |
|----------------------|----------------------|-------------------|--------------|
| Iris                 | 150/4                | 3                 | 86*          |
| Pima Indian Diabetes | 768/9                | 2                 | 77           |
| Segment              | 2310/19              | 7                 | 92*          |
| Teacher Assistant    | 151/4                | 3                 | 50           |
| Haberman             | 306/3                | 2                 | 44           |
```