CLASSIFICATION: Bayesian Classifiers
Uses Bayes’ (Thomas Bayes, c. 1701-1761) Theorem to build
probabilistic models of relationships between attributes
and classes
Statistical principle for combining prior class knowledge
with new evidence from data
Multiple implementations
Naïve Bayes
Bayesian networks
CLASSIFICATION: Bayesian Classifiers
Requires concept of conditional probability
Measures the probability of an event given that another
event is known (by evidence or information) to have occurred
Notation: P(A|B) = probability of A given knowledge
that B occurred
P(A|B) = P(A∩B)/P(B), provided P(B) ≠ 0
Equivalently, P(A∩B) = P(A|B)P(B)
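A quick numeric sanity check of the definition (a minimal Python sketch; the die scenario is illustrative, not from the slides):

```python
# P(A|B) from first principles on one fair die roll:
# A = "roll is even", B = "roll is greater than 3".
outcomes = [1, 2, 3, 4, 5, 6]

p_b = sum(1 for x in outcomes if x > 3) / len(outcomes)                       # P(B) = 3/6
p_a_and_b = sum(1 for x in outcomes if x % 2 == 0 and x > 3) / len(outcomes)  # P(A∩B) = 2/6

print(p_a_and_b / p_b)  # P(A|B) = P(A∩B)/P(B) = (2/6)/(3/6) = 2/3
```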
BAYESIAN CLASSIFIERS: Conditional Probability
Example:
Suppose 1% of a specific population has a form of cancer
A new diagnostic test
produces correct positive results 99% of the time for
those with the cancer
produces correct negative results 98% of the time for
those without the cancer
P(cancer) = 0.01
P(positive test | cancer) = 0.99
P(negative test | no cancer) = 0.98
BAYESIAN CLASSIFIERS: Conditional Probability
Example:
But what if you tested positive? What is the probability
that you actually have cancer?
Bayes’ Theorem “reverses” the process to provide us
with an answer.
BAYESIAN CLASSIFIERS: Bayes’ Theorem
P(B|A)
= P(B∩A)/P(A), if P(A)≠0
= P(A∩B)/P(A)
= P(A|B)P(B)/P(A)
Application to our example:
P(cancer | test positive)
= P(test positive | cancer) × P(cancer) / P(test positive)
= (0.99 × 0.01)/(0.99 × 0.01 + 0.02 × 0.99) ≈ 0.33
BAYESIAN CLASSIFIERS: Bayes’ Theorem
[Probability tree: P(cancer) = 0.01, P(no cancer) = 0.99; given cancer: P(test positive) = 0.99, P(test negative) = 0.01; given no cancer: P(test positive) = 0.02, P(test negative) = 0.98]
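A minimal Python sketch of this calculation (variable names are mine):

```python
# Bayes' theorem on the diagnostic-test example above.
p_cancer = 0.01                       # prior: P(cancer)
p_pos_given_cancer = 0.99             # P(positive test | cancer)
p_neg_given_healthy = 0.98            # P(negative test | no cancer)
p_pos_given_healthy = 1 - p_neg_given_healthy   # false-positive rate = 0.02

# Evidence: total probability of testing positive.
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_healthy * (1 - p_cancer)

# P(cancer | positive) = P(positive | cancer) * P(cancer) / P(positive)
print(p_pos_given_cancer * p_cancer / p_pos)   # ≈ 0.3333
```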
BAYESIAN CLASSIFIERS: Naïve Bayes’
Bayes’ Theorem Interpretation
P(class C | F1, F2, … , Fn) =
P(class C) × P(F1, F2, … , Fn | C) / P(F1, F2, … , Fn)
posterior = prior × likelihood/evidence
BAYESIAN CLASSIFIERS: Naïve Bayes’
Key concepts
Denominator independent of class C
Denominator effectively constant
Numerator equivalent to joint probability model
P(C, F1, F2, … , Fn)
Naïve conditional independence assumptions
P(C | F1, F2, … , Fn) ∝ P(C) P(F1|C) P(F2|C) ⋯ P(Fn|C)
∝ P(C) ∏ᵢ₌₁ⁿ P(Fi|C)
BAYESIAN CLASSIFIERS: Naïve Bayes’
Multiple distributional assumptions possible
Gaussian
Multinomial
Bernoulli
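For reference, scikit-learn implements all three variants in sklearn.naive_bayes (GaussianNB, MultinomialNB, BernoulliNB). A minimal sketch of the Gaussian variant on toy data (the data values here are illustrative):

```python
from sklearn.naive_bayes import GaussianNB   # also: MultinomialNB, BernoulliNB

# Toy continuous features; count data would call for MultinomialNB,
# binary features for BernoulliNB.
X = [[6.00, 180, 12], [5.92, 190, 11], [5.00, 100, 6], [5.50, 150, 8]]
y = ["male", "male", "female", "female"]

clf = GaussianNB().fit(X, y)
print(clf.predict([[6.0, 130, 8]]))   # predicted class label for the new sample
```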
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
Training set (example from Wikipedia)
Sex    | Height (feet) | Weight (pounds) | Foot size (inches)
male   | 6             | 180             | 12
male   | 5.92 (5'11")  | 190             | 11
male   | 5.58 (5'7")   | 170             | 12
male   | 5.92 (5'11")  | 165             | 10
female | 5             | 100             | 6
female | 5.5 (5'6")    | 150             | 8
female | 5.42 (5'5")   | 130             | 7
female | 5.75 (5'9")   | 150             | 9
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
Assumptions
Continuous data
Gaussian (Normal) distribution
p(x) = 1/√(2πσ²) · exp(−(x − μ)²/(2σ²))
P(male) = P(female) = 0.5
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
Classifier generated from training set
Sex    | Height mean | Height variance | Weight mean | Weight variance | Foot size mean | Foot size variance
male   | 5.855       | 0.035033        | 176.25      | 122.92          | 11.25          | 0.91667
female | 5.4175      | 0.097225        | 132.5       | 558.33          | 7.5            | 1.6667
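These means and sample variances can be reproduced directly from the training table (a numpy sketch; note ddof=1, i.e. the n−1 sample variance, which is what the table above uses):

```python
import numpy as np

# Columns: height (feet), weight (pounds), foot size (inches).
males   = np.array([[6.00, 180, 12], [5.92, 190, 11], [5.58, 170, 12], [5.92, 165, 10]])
females = np.array([[5.00, 100,  6], [5.50, 150,  8], [5.42, 130,  7], [5.75, 150,  9]])

for name, data in [("male", males), ("female", females)]:
    print(name, data.mean(axis=0), data.var(axis=0, ddof=1))
# male:   means ≈ [5.855, 176.25, 11.25],  variances ≈ [0.035033, 122.92, 0.91667]
# female: means ≈ [5.4175, 132.5, 7.5],    variances ≈ [0.097225, 558.33, 1.6667]
```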
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
Test sample
Sex    | Height | Weight | Foot size
sample | 6      | 130    | 8
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
Calculate posterior probabilities for both genders
Posterior(male) = P(male) P(height|male) P(weight|male) P(foot size|male) / evidence
Posterior(female) = P(female) P(height|female) P(weight|female) P(foot size|female) / evidence
The evidence term is the same constant in both cases, so we
can ignore the denominators
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
Calculations for male
P(male) = 0.5 (assumed)
P(height|male) = 1/√(2π × 0.035033) × exp(−(6 − 5.855)²/(2 × 0.035033)) ≈ 1.5789
(note: this is a probability density, not a probability, so values above 1 are fine)
P(weight|male) = 1/√(2π × 122.92) × exp(−(130 − 176.25)²/(2 × 122.92)) ≈ 5.9881×10⁻⁶
P(foot size|male) = 1/√(2π × 0.91667) × exp(−(8 − 11.25)²/(2 × 0.91667)) ≈ 1.3112×10⁻³
Posterior numerator (male) = 0.5 × 1.5789 × 5.9881×10⁻⁶ × 1.3112×10⁻³ ≈ 6.1984×10⁻⁹
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
Calculations for female
P(female) = 0.5 (assumed)
P(height|female) = 1/√(2π × 0.097225) × exp(−(6 − 5.4175)²/(2 × 0.097225)) ≈ 0.22346
P(weight|female) = 1/√(2π × 558.33) × exp(−(130 − 132.5)²/(2 × 558.33)) ≈ 0.016789
P(foot size|female) = 1/√(2π × 1.6667) × exp(−(8 − 7.5)²/(2 × 1.6667)) ≈ 0.28669
Posterior numerator (female) = 0.5 × 0.22346 × 0.016789 × 0.28669 ≈ 5.3778×10⁻⁴
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
Conclusion
Posterior numerator is significantly greater for the
female classification than for the male, so we classify
the sample as female
BAYESIAN CLASSIFIERS: Naïve Bayes’
Example
Note
We did not calculate P(evidence) [the normalizing
constant] since it is not needed for classification, but we could:
P(evidence) = P(male) P(height|male) P(weight|male) P(foot size|male)
+ P(female) P(height|female) P(weight|female) P(foot size|female)
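The whole worked example, including the normalizing evidence just described, fits in a short Python sketch (names are mine; it reproduces the posterior numerators above):

```python
import math

def gaussian(x, mean, var):
    # Gaussian (normal) density with the given mean and variance.
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# (mean, variance) per feature, taken from the classifier table above.
params = {
    "male":   {"height": (5.855,  0.035033), "weight": (176.25, 122.92), "foot": (11.25, 0.91667)},
    "female": {"height": (5.4175, 0.097225), "weight": (132.5,  558.33), "foot": (7.5,   1.6667)},
}
prior = {"male": 0.5, "female": 0.5}
sample = {"height": 6.0, "weight": 130.0, "foot": 8.0}

# Posterior numerator: prior times the product of per-feature densities.
numerator = {}
for sex in params:
    p = prior[sex]
    for feature, value in sample.items():
        p *= gaussian(value, *params[sex][feature])
    numerator[sex] = p

evidence = sum(numerator.values())   # P(evidence), the normalizing constant
for sex, num in numerator.items():
    print(sex, num, num / evidence)
# male   ≈ 6.1984e-09, posterior ≈ 1.15e-05
# female ≈ 5.3778e-04, posterior ≈ 0.99999
```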
BAYESIAN CLASSIFIERS: Bayesian Networks
Judea Pearl (UCLA Computer Science, Cognitive
Systems Lab): one of the pioneers of Bayesian
Networks
Author of Probabilistic Reasoning in Intelligent Systems, 1988
Father of journalist Daniel Pearl, who was kidnapped and
murdered in Pakistan in 2002 by al-Qaeda
BAYESIAN CLASSIFIERS: Bayesian Networks
Probabilistic graphical model
Represents random variables and conditional
dependencies using a directed acyclic graph
(DAG)
Nodes of graph represent random variables
BAYESIAN CLASSIFIERS: Bayesian Networks
Edges of graph represent conditional
dependencies
Nodes that are not connected represent variables that are
conditionally independent of each other
Does not require all attributes to be conditionally
independent
BAYESIAN CLASSIFIERS: Bayesian Networks
Each node has a probability table that relates it to its
immediate parent nodes
If node X has no immediate parents, table
contains only prior probability P(X)
If one parent Y, table contains P(X|Y)
If multiple parents {Y1, Y2, ⋯ , Yn}, table contains
P(X|Y1, Y2, ⋯ , Yn)
BAYESIAN CLASSIFIERS: Bayesian Networks
[Figure: the rain/sprinkler/wet-grass network, with edges Raining → Sprinkler, Raining → Grass wet, Sprinkler → Grass wet, and a conditional probability table at each node]
BAYESIAN CLASSIFIERS: Bayesian Networks
Model encodes relevant probabilities from
which probabilistic inferences can then be
calculated
Joint probability: P(G, S, R) = P(R) P(S|R) P(G|S, R)
G = “Grass wet”
S = “Sprinkler on”
R = “Raining”
BAYESIAN CLASSIFIERS: Bayesian Networks
We can then calculate, for example:
P(it is raining | grass is wet) = P(it is raining AND grass is wet) / P(grass is wet)
= Σ_{sprinkler ∈ {T,F}} P(grass wet = T, sprinkler, raining = T)
  / Σ_{sprinkler ∈ {T,F}, raining ∈ {T,F}} P(grass wet = T, sprinkler, raining)
BAYESIAN CLASSIFIERS: Bayesian Networks
That is, writing P(TTT) for P(grass wet = T, sprinkler = T, raining = T) and so on:
P(it is raining | grass is wet)
= [P(TTT) + P(TFT)] / [P(TTT) + P(TTF) + P(TFT) + P(TFF)]
= (0.00198 + 0.1584) / (0.00198 + 0.288 + 0.1584 + 0.0) ≈ 0.3577
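A brute-force enumeration sketch of this inference in Python (the CPT values below are the standard ones for this classic example and are consistent with the arithmetic above):

```python
from itertools import product

# Conditional probability tables, keyed by True/False.
P_R = {True: 0.2, False: 0.8}                    # P(Raining)
P_S = {True: 0.01, False: 0.4}                   # P(Sprinkler = T | Raining)
P_G = {(True, True): 0.99, (True, False): 0.9,   # P(Grass wet = T | Sprinkler, Raining)
       (False, True): 0.8, (False, False): 0.0}

def joint(g, s, r):
    # P(G, S, R) = P(R) P(S|R) P(G|S, R), the network factorization.
    p = P_R[r]
    p *= P_S[r] if s else 1 - P_S[r]
    p *= P_G[(s, r)] if g else 1 - P_G[(s, r)]
    return p

# P(raining | grass wet): sum out the sprinkler variable.
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(True, s, r) for s, r in product((True, False), repeat=2))
print(num / den)   # ≈ 0.3577
```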
BAYESIAN CLASSIFIERS: Bayesian Networks
Building the model
Create network structure (graph)
Determine probability values of tables
Simplest case
Network defined by user
Most real-world cases
Defining the network by hand too complex
Use machine learning: many algorithms
BAYESIAN CLASSIFIERS: Bayesian Networks
Algorithms built into Weka
User defined network
Conditional independence tests
Genetic search
Hill climber
K2
Simulated annealing
Maximum weight spanning tree
Tabu search
BAYESIAN CLASSIFIERS: Bayesian Networks
Many other implementations available online
BNT (Bayes Net Toolbox) for Matlab
Kevin Murphy, University of British Columbia
http://www.cs.ubc.ca/~murphyk/Software/