Classification and Bayesian Learning
Supervisor: Prof. Dr. Mohamed Batouche
Presented by: Abdu Hassan Al-Gomai

Contents

- Classification vs. Prediction
- Classification: A Two-Step Process
- Supervised vs. Unsupervised Learning
- Major Classification Models
- Evaluating Classification Methods
- Bayesian Classification

Classification vs. Prediction

What is the difference between classification and prediction? A decision tree is a classification model, applied to existing data; if you apply it to new data, for which the class is unknown, you also get a prediction of the class. [From http://www.kdnuggets.com/faq/classification-vs-prediction.html]

Classification constructs a model from the training set and the values (class labels) of a classifying attribute, and then uses that model to classify new data.

Typical applications:
- Text classification
- Target marketing
- Medical diagnosis
- Treatment-effectiveness analysis

Classification: A Two-Step Process

1. Model construction: describing a set of predetermined classes.
   - Each tuple/sample is assumed to belong to a predefined class, as determined by the class-label attribute.
   - The set of tuples used for model construction is the training set.
   - The model is represented as classification rules, decision trees, or mathematical formulae.
2. Model usage: classifying future or unknown objects.
   - First estimate the accuracy of the model: the known label of each test sample is compared with the model's prediction, and the accuracy rate is the percentage of test-set samples correctly classified by the model. The test set must be independent of the training set.
   - If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known.

Classification Process (1): Model Construction

Training data:

NAME   RANK             YEARS   TENURED
Mike   Assistant Prof   3       no
Mary   Assistant Prof   7       yes
Bill   Professor        2       yes
Jim    Associate Prof   7       yes
Dave   Assistant Prof   6       no
Anne   Associate Prof   3       no

Running a classification algorithm on this training set yields the classifier (model):

IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Classification Process (2): Use the Model in Prediction

Testing data:

NAME      RANK             YEARS   TENURED
Tom       Assistant Prof   2       no
Merlisa   Associate Prof   7       no
George    Professor        5       yes
Joseph    Assistant Prof   7       yes

Unseen data: (Jeff, Professor, 4) -> tenured?

Supervised vs. Unsupervised Learning

- Supervised learning (classification): the training data (observations, measurements, etc.) are accompanied by labels indicating the class of each observation (a teacher presents input-output pairs). New data is classified based on the training set.
- Unsupervised learning (clustering): the class labels of the training data are unknown. Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.

Major Classification Models

- Bayesian classification
- Decision tree induction
- Neural networks
- Support vector machines (SVM)
- Classification based on associations
- Other methods: k-nearest neighbours (KNN), boosting, bagging, ...

Evaluating Classification Methods

- Predictive accuracy (see the sketch after this list).
- Speed: time to construct the model and time to use the model.
- Robustness: handling noise and missing values.
- Scalability: efficiency with respect to large data.
- Interpretability: understanding and insight provided by the model.
- Goodness of rules: compactness of the classification rules.
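To make step 2 and the accuracy measure concrete, here is a minimal sketch, assuming Python; the rule and the data come from the two process slides above, while the scaffolding (the names classify and test_set) is illustrative:

    # Step 1 produced this rule from the training data; step 2 checks it
    # against the held-out test set.
    def classify(rank, years):
        # Learned model: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
        return "yes" if rank == "Professor" or years > 6 else "no"

    test_set = [  # (name, rank, years, actual tenured label)
        ("Tom",     "Assistant Prof", 2, "no"),
        ("Merlisa", "Associate Prof", 7, "no"),
        ("George",  "Professor",      5, "yes"),
        ("Joseph",  "Assistant Prof", 7, "yes"),
    ]

    correct = sum(classify(rank, years) == tenured
                  for _, rank, years, tenured in test_set)
    print(f"accuracy = {correct}/{len(test_set)}")  # accuracy = 3/4

The rule misclassifies Merlisa (Associate Prof, 7 years): the years > 6 clause predicts 'yes', but she is not tenured, so the accuracy rate on this test set is 75%.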
Bayesian Classification

Here we learn Bayesian classification, e.g. how to decide whether a patient is ill or healthy, based on:
- a probabilistic model of the observed data, and
- prior knowledge.

The Classification Problem

Training data: examples of the form (d, h(d)), where d is a data object to classify (the input) and h(d) is the correct class of d, with h(d) ∈ {1, ..., K}. Goal: given a new object d_new, provide h(d_new).

Why Bayesian?

- It provides practical learning algorithms, e.g. Naïve Bayes.
- Prior knowledge and observed data can be combined.
- It is a generative (model-based) approach, which offers a useful conceptual framework: any kind of object can be classified, based on a probabilistic model specification (e.g. sequences can also be classified this way).

Bayes' Rule

P(h | d) = P(d | h) P(h) / P(d)

Here d is the data and h is the hypothesis (model). To understand why the rule holds, rearrange it: P(h | d) P(d) = P(d | h) P(h) = P(d, h), the same joint probability on both sides.

Who is who in Bayes' rule:
- P(h): the prior belief (probability of hypothesis h before seeing any data).
- P(d | h): the likelihood (probability of the data if the hypothesis h is true).
- P(d) = Σ_h P(d | h) P(h): the data evidence (marginal probability of the data).
- P(h | d): the posterior (probability of hypothesis h after having seen the data d).

Naïve Bayes Classifier

What can we do if our data d has several attributes? The Naïve Bayes assumption: the attributes a_1, ..., a_T that describe a data instance are conditionally independent given the classification hypothesis:

P(d | h) = P(a_1, ..., a_T | h) = Π_t P(a_t | h)

This is a simplifying assumption, and it may obviously be violated in reality; in spite of that, it works well in practice. The Bayesian classifier that uses the Naïve Bayes assumption and picks the hypothesis with the maximum posterior probability is called the Naïve Bayes classifier. It is one of the most practical learning methods, with successful applications in medical diagnosis and text classification.
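Before the worked examples, here is a minimal sketch of such a classifier for categorical attributes, assuming Python; the function names (train, posterior) are illustrative and not from the slides:

    from collections import Counter, defaultdict

    def train(rows):
        """rows: list of (attribute_tuple, class_label) pairs."""
        class_counts = Counter()
        value_counts = defaultdict(Counter)  # (label, attr index) -> value counts
        for attrs, label in rows:
            class_counts[label] += 1
            for i, v in enumerate(attrs):
                value_counts[(label, i)][v] += 1
        return class_counts, value_counts

    def posterior(class_counts, value_counts, attrs):
        n = sum(class_counts.values())
        scores = {}
        for label, c in class_counts.items():
            p = c / n                                 # prior P(h)
            for i, v in enumerate(attrs):
                p *= value_counts[(label, i)][v] / c  # P(a_t | h)
            scores[label] = p                         # P(h) * prod_t P(a_t | h)
        total = sum(scores.values())                  # normalization; plays the role of P(d)
        return {label: s / total for label, s in scores.items()}

Trained on the 14-row weather table of Example 1 below, posterior(class_counts, value_counts, ("Sunny", "Cool", "High", "True")) returns roughly {"Yes": 0.205, "No": 0.795}, matching the hand computation that follows.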
Naïve Bayesian Classifier: Example 1

The evidence E combines all attributes, without exception. Suppose we observe a new day:

Outlook   Temp.   Humidity   Windy   Play
Sunny     Cool    High       True    ?

Probability of class "yes":

Pr[yes | E] = Pr[Outlook = Sunny | yes] × Pr[Temperature = Cool | yes]
              × Pr[Humidity = High | yes] × Pr[Windy = True | yes] × Pr[yes] / Pr[E]
            = (2/9 × 3/9 × 3/9 × 3/9 × 9/14) / Pr[E]

The probabilities come from the 14-day training data:

Outlook    Temp   Humidity   Windy   Play
Sunny      Hot    High       False   No
Sunny      Hot    High       True    No
Overcast   Hot    High       False   Yes
Rainy      Mild   High       False   Yes
Rainy      Cool   Normal     False   Yes
Rainy      Cool   Normal     True    No
Overcast   Cool   Normal     True    Yes
Sunny      Mild   High       False   No
Sunny      Cool   Normal     False   Yes
Rainy      Mild   Normal     False   Yes
Sunny      Mild   Normal     True    Yes
Overcast   Mild   High       True    Yes
Overcast   Hot    Normal     False   Yes
Rainy      Mild   High       True    No

Counts and conditional probabilities derived from it:

Attribute     Value      Yes   No    P(value|yes)   P(value|no)
Outlook       Sunny       2     3       2/9            3/5
              Overcast    4     0       4/9            0/5
              Rainy       3     2       3/9            2/5
Temperature   Hot         2     2       2/9            2/5
              Mild        4     2       4/9            2/5
              Cool        3     1       3/9            1/5
Humidity      High        3     4       3/9            4/5
              Normal      6     1       6/9            1/5
Windy         False       6     2       6/9            2/5
              True        3     3       3/9            3/5
Play          (prior)     9     5       9/14           5/14

Compute the prediction for the new day (Sunny, Cool, High, True). Likelihood of the two classes:

For "yes": 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053
For "no":  3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0206

Conversion into probabilities by normalization:

P("yes") = 0.0053 / (0.0053 + 0.0206) = 0.205
P("no")  = 0.0206 / (0.0053 + 0.0206) = 0.795

Naïve Bayesian Classifier: Example 2

Training dataset, with classes C1: buys_computer = "yes" and C2: buys_computer = "no":

age     income   student   credit_rating   buys_computer
<=30    high     no        fair            no
<=30    high     no        excellent       no
31…40   high     no        fair            yes
>40     medium   no        fair            yes
>40     low      yes       fair            yes
>40     low      yes       excellent       no
31…40   low      yes       excellent       yes
<=30    medium   no        fair            no
<=30    low      yes       fair            yes
>40     medium   yes       fair            yes
<=30    medium   yes       excellent       yes
31…40   medium   no        excellent       yes
31…40   high     yes       fair            yes
>40     medium   no        excellent       no

Data sample to classify: X = (age = "<=30", income = "medium", student = "yes", credit_rating = "fair").

Compute P(X | Ci) for each class:

P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
P(age = "<=30" | buys_computer = "no") = 3/5 = 0.6
P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
P(income = "medium" | buys_computer = "no") = 2/5 = 0.4
P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
P(student = "yes" | buys_computer = "no") = 1/5 = 0.2
P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
P(credit_rating = "fair" | buys_computer = "no") = 2/5 = 0.4

P(X | Ci):

P(X | buys_computer = "yes") = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
P(X | buys_computer = "no") = 0.6 × 0.4 × 0.2 × 0.4 = 0.019

P(X | Ci) × P(Ci), with priors P(buys_computer = "yes") = 9/14 and P(buys_computer = "no") = 5/14:

P(X | buys_computer = "yes") × P(buys_computer = "yes") = 0.028
P(X | buys_computer = "no") × P(buys_computer = "no") = 0.007

Therefore X belongs to class buys_computer = "yes".

Naïve Bayesian Classifier: Advantages and Disadvantages

Advantages:
- Easy to implement.
- Good results obtained in most cases.

Disadvantages:
- It relies on the class-conditional independence assumption, and therefore loses accuracy when that assumption is violated.
- In practice, dependencies do exist among variables. E.g., in hospital data, a patient's profile (age, family history, etc.), symptoms (fever, cough, etc.) and diseases (lung cancer, diabetes, etc.) are interdependent, and such dependencies cannot be modelled by a Naïve Bayesian classifier.

How to deal with these dependencies? Bayesian belief networks.

References

- Software: NB for classifying text: http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
- Useful reading for those interested in learning more about NB classification, beyond the scope of this module: http://www-2.cs.cmu.edu/~tom/NewChapters.html
- http://www.cs.unc.edu/Courses/comp790-090-s08/Lecturenotes
- Introduction to Bayesian Learning, School of Computer Science, University of Birmingham ([email protected]).
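A closing note that goes slightly beyond the slides: the counts table of Example 1 contains a zero (Overcast occurs 0/5 times for "no"), so any overcast day would receive P(no | E) = 0 regardless of the other attributes. A standard remedy, assumed here rather than taken from the slides, is Laplace (add-one) smoothing:

    def smoothed(count, class_total, n_values):
        # P(value | class) with add-one smoothing: add 1 to the count and add
        # the number of distinct attribute values to the denominator.
        return (count + 1) / (class_total + n_values)

    # Outlook has 3 values (Sunny, Overcast, Rainy), so P(Outlook = Overcast | no)
    # becomes (0 + 1) / (5 + 3) = 0.125 instead of 0/5 = 0.
    print(smoothed(0, 5, 3))  # 0.125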