Naïve Bayes Classifier

The Naïve Bayes classifier is one of the simplest classification models proposed in the literature. It is based on Bayes' rule and makes a simplifying independence assumption in order to calculate the probability that an instance belongs to each class label.

Let us look at the mathematical formulation of the Naïve Bayes model. Given a set of variables $X = \{x_1, x_2, \ldots, x_d\}$, we want to find the posterior probability of the event $C_j$ among a set of possible outcomes $C = \{C_1, C_2, \ldots, C_m\}$. Using Bayes' rule we can say that

$$P(C_j \mid x_1, x_2, \ldots, x_d) = \frac{P(x_1, x_2, \ldots, x_d \mid C_j)\, P(C_j)}{P(x_1, x_2, \ldots, x_d)} \tag{1}$$

Under the Naïve Bayes assumption that the variables are conditionally independent given the class, the likelihood can be written as a product of terms:

$$P(X \mid C_j) = \prod_{k=1}^{d} P(x_k \mid C_j) \tag{2}$$

Substituting this into Bayes' rule, we derive

$$P(C_j \mid X) = \frac{P(C_j) \prod_{k=1}^{d} P(x_k \mid C_j)}{P(X)} \tag{3}$$

where $P(C_j \mid X)$ is the Naïve Bayes posterior probability. Once the posteriors for all the class labels have been estimated, the instance is assigned the class label $C_j$ for which $P(C_j \mid X)$ is the largest of all the posterior probabilities.

To understand how the Naïve Bayes classification model works, let us apply it to the weather dataset. The figure below presents a few samples of the weather dataset. The first thing we must identify in this dataset is the class label, which here is the attribute play. It is a binary attribute. Before we proceed to the classification task, we must first estimate the class prior probabilities. This can be done simply by computing the relative frequencies of the two class labels among the 14 instances. The priors are

$$P(\text{yes}) = P(C_1) = \frac{9}{14}, \qquad P(\text{no}) = P(C_2) = \frac{5}{14}$$

The second step is to estimate the probability of the data given the class.
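The prior estimation described above amounts to counting labels. The following is a minimal Python sketch of that step, using only the class counts quoted in the text (9 yes, 5 no). The helper name `class_priors` is hypothetical and is introduced here purely for illustration.

```python
from collections import Counter
from fractions import Fraction


def class_priors(labels):
    """Estimate P(C_j) as the relative frequency of each class label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: Fraction(n, total) for label, n in counts.items()}


# Class labels of the 14 weather instances: 9 "yes" and 5 "no" (counts from the text).
labels = ["yes"] * 9 + ["no"] * 5
priors = class_priors(labels)

print(priors["yes"])  # 9/14
print(priors["no"])   # 5/14
```

Exact fractions are used instead of floats so that the printed values can be compared directly with the hand calculation.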
If we denote the individual attributes by $x_1, x_2, x_3, x_4$, then by equation (2) this probability is computed as the product of the individual conditional probabilities. For example, $P(x_1 = \text{sunny} \mid C_2) = 3/5$, which can be observed from the table above. Computing all such probabilities, the likelihood of the first instance is

$$P(x = (\text{sunny}, \text{hot}, \text{high}, \text{false}) \mid C_2) = P(x_1 = \text{sunny} \mid C_2)\, P(x_2 = \text{hot} \mid C_2)\, P(x_3 = \text{high} \mid C_2)\, P(x_4 = \text{false} \mid C_2) = \frac{3}{5} \cdot \frac{2}{5} \cdot \frac{4}{5} \cdot \frac{2}{5} = \frac{48}{625}$$

At the end of the second step we have therefore computed the probability of the data given the class. Finally, we calculate the posterior probability, which is the probability of the class given that the instance has occurred, using equation (3). For class $C_2$,

$$P(C_2 \mid x) = \frac{P(x = (\text{sunny}, \text{hot}, \text{high}, \text{false}) \mid C_2)\, P(C_2)}{P(x)}$$

where the numerator is $\frac{48}{625} \cdot \frac{5}{14} \approx 0.0274$ in this case. As the denominator $P(x)$ is the same for every class given the same data sample, we only need to compute the numerator for each class. We compute the numerator for $C_1$ in the same way, compare the two values, and assign the class whose value is higher.

It has generally been observed that Naïve Bayes works well despite the independence assumption, and experiments show it to be quite competitive with other methods on standard datasets.

The Naïve Bayes algorithm is readily available in the Weka toolkit. As this algorithm needs nominal (categorical) attributes for classification, we have to discretize numerical data in order to run Naïve Bayes on it. The quality of the results in such cases also depends on the effectiveness of the discretization procedure.

The table below summarizes the performance of the Naïve Bayes classifier on a few standard UCI datasets, which contain both numerical and categorical data. The data were split 80% for training and 20% for testing, and the reported metric is the accuracy of the classifier.
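The worked example above can be reproduced end to end with a short Python sketch. It rests on one assumption: the 14 rows below are the standard nominal weather data as commonly distributed with Weka, which is not reproduced in this text but agrees with every count quoted above (9 yes, 5 no, and the conditional frequencies 3/5, 2/5, 4/5, 2/5 for class no). The function names `fit_counts` and `class_scores` are hypothetical, and no smoothing is applied, so an attribute value never seen with a class gives that class a score of zero.

```python
from collections import Counter
from fractions import Fraction

# Assumed data: (outlook, temperature, humidity, windy, play).
ROWS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def fit_counts(rows):
    """Count class labels and (attribute index, value, label) co-occurrences."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = Counter()
    for row in rows:
        label = row[-1]
        for k, value in enumerate(row[:-1]):
            value_counts[(k, value, label)] += 1
    return class_counts, value_counts


def class_scores(x, class_counts, value_counts):
    """Return P(C_j) * prod_k P(x_k | C_j) for each class (numerator of eq. 3)."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = Fraction(n, total)  # class prior
        for k, value in enumerate(x):
            score *= Fraction(value_counts[(k, value, label)], n)
        scores[label] = score
    return scores


class_counts, value_counts = fit_counts(ROWS)
scores = class_scores(("sunny", "hot", "high", "false"), class_counts, value_counts)
prediction = max(scores, key=scores.get)

print(scores["no"])   # 24/875 (about 0.0274)
print(scores["yes"])  # 4/567 (about 0.0071)
print(prediction)     # no
```

Dividing each score by their sum recovers the normalized posteriors, but the comparison between classes does not need it, since the denominator is shared.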
| Dataset name | Number of instances/attributes | Number of classes | Accuracy (%) |
| Iris | 150/4 | 3 | 86* |
| Pima Indian Diabetes | 768/9 | 2 | 77 |
| Segment | 2310/19 | 7 | 92* |
| Teacher Assistant | 151/4 | 3 | 50 |
| Haberman | 306/3 | 2 | 44 |
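The evaluation protocol behind these figures (an 80%/20% train/test split scored by accuracy) can be sketched as follows. This is only an illustration of the protocol, not the Weka procedure that produced the numbers: the text does not say whether the split was shuffled or stratified, so the seeded shuffle here is an assumption, and the helper names `split_80_20` and `accuracy` are hypothetical.

```python
import random


def split_80_20(rows, seed=0):
    """Shuffle a copy of the rows (assumed) and split 80% train / 20% test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]


def accuracy(true_labels, predicted_labels):
    """Percentage of test instances whose predicted label matches the true one."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


# Stand-in for a 150-instance dataset such as Iris (indices only, no real data).
rows = list(range(150))
train_rows, test_rows = split_80_20(rows)
print(len(train_rows), len(test_rows))  # 120 30

# Toy labels, only to show how the metric is computed.
print(accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"]))  # 75.0
```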