Bayesian Networks

Distinguished Prof. Dr. Panos M. Pardalos
Center for Applied Optimization
Department of Industrial & Systems Engineering
Computer & Information Science & Engineering Department
Biomedical Engineering Program, McKnight Brain Institute
University of Florida
http://www.ise.ufl.edu/pardalos/

Lecture Outline

- Introduction
- Fundamentals of Probability: Probability Space, Conditional Probability, Bayes Rule
- Naïve Bayes Classifier: Bayesian Approach, Example
- Bayesian Networks: Graph Theory Concepts, Definition, Inferencing
- Conclusions

Introduction

- Bayesian networks are applied in cases of uncertainty, when we know certain (conditional) probabilities and are looking for unknown probabilities given specific conditions.
- Applications: bioinformatics and medicine, engineering, document classification, image processing, data fusion, decision support systems, etc.
- Examples:
  - Inference: P(Diagnosis|Symptom)
  - Anomaly detection: is this observation anomalous?
  - Active data collection: what is the next diagnostic test, given a set of observations?

Discrete Random Variables

- Let A denote a boolean-valued random variable.
- If A denotes an event, there is some degree of uncertainty as to whether A occurs.
- Examples:
  - A = patient has tuberculosis
  - A = coin flipping outcome is heads
  - A = France will win the World Cup in 2010

Intuition Behind Probability

- Intuitively, the probability of event A equals the proportion of the outcomes where A is true.
- Ω is the set of all possible outcomes; its area is P(Ω) = 1.
- The set colored in orange corresponds to the outcomes where A is true.
- P(A) = area of the orange oval. Clearly 0 ≤ P(A) ≤ 1.

Kolmogorov's Probability Axioms

"The theory of probability as a mathematical discipline can and should be developed from axioms in exactly the same way as geometry and algebra."
Andrey Nikolaevich Kolmogorov. Foundations of the Theory of Probability, 1933.

1. P(A) ≥ 0, ∀A ⊆ Ω
2. P(Ω) = 1
3. σ-additivity: any countable sequence of pairwise disjoint events A1, A2, ... satisfies
   P(⋃_i A_i) = Σ_i P(A_i)
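To make the axioms concrete, here is a minimal Python sketch on a fair die (the die example and all names are our own, not from the slides). It checks the axioms on a finite space and illustrates the frequency intuition from the previous slide:

```python
import random
from fractions import Fraction

# One roll of a fair die as a finite probability space.
omega = set(range(1, 7))

def P(event):
    # For equally likely outcomes, P(A) = |A ∩ Ω| / |Ω|.
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}   # "the roll is even"
B = {4, 5, 6}   # "the roll is at least 4"

assert P(A) >= 0 and P(omega) == 1            # axioms 1 and 2
assert P({1} | {3}) == P({1}) + P({3})        # additivity for disjoint events (axiom 3, finite case)
assert P(A | B) == P(A) + P(B) - P(A & B)     # a consequence derived on a later slide

# Frequency intuition: the empirical proportion of rolls in A approaches P(A).
rolls = [random.randint(1, 6) for _ in range(100_000)]
print(sum(r in A for r in rolls) / len(rolls), float(P(A)))  # both close to 0.5
```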
Other Ways to Deal with Uncertainty

- Three-valued logic: True / False / Maybe
- Fuzzy logic (truth values between 0 and 1)
- Non-monotonic reasoning (especially focused on penguin informatics)
- Dempster-Shafer theory (and an extension known as quasi-Bayesian theory)
- Possibilistic logic
- But...

Coherence of the Axioms

- Kolmogorov's axioms of probability are the only model with the following property:
- Wagers (probabilities) are assigned in such a way that, no matter what set of wagers your opponent chooses, you are not exposed to certain loss.

Bruno de Finetti. Probabilismo. Napoli, Logos 14, 1931, pp. 163-219.
Bruno de Finetti. Probabilism: A Critical Essay on the Theory of Probability and on the Value of Science (translation of the 1931 article), Erkenntnis, volume 31, September 1989, pp. 169-223.

Consequences of the Axioms

- P(Ā) = 1 − P(A), where Ā = Ω∖A
- P(∅) = 0
- P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
- P(A) = P(A ∩ B) + P(A ∩ B̄)

Conditional Probability

P(B|A) = proportion of the space in which A is true that also has B true.

Formal definition:
P(B|A) = P(A ∩ B) / P(A)

Conditional Probability: Example

- Let us draw a card from a deck of 52 playing cards.
- A = the card is a court card. P(A) = 12/52 = 3/13.
- B = the card is a queen. P(B) = 4/52 = 1/13, and P(B ∩ A) = P(B) = 1/13.
- Applying the definition, we obtain very intuitive results:
  - P(B|A) = (1/13)/(3/13) = 1/3
  - P(A|B) = (1/13)/(1/13) = 1
- C = the suit is spades. P(C) = 1/4.
- Note that P(C|A) = P(C) = 1/4. In other words, event C is independent of A.

Independent Events

Definition. Two events A and B are independent if and only if P(A ∩ B) = P(A)P(B). Let us denote independence of A and B as I(A, B).

The independence of A and B implies
- P(A|B) = P(A), if P(B) ≠ 0
- P(B|A) = P(B), if P(A) ≠ 0
- Why?

Conditional Independence

- One might observe that people with longer arms tend to have higher levels of reading skill.
- If age is fixed, this relationship disappears.
- Arm length and reading skill are conditionally independent given age.

Definition. Two events A and B are conditionally independent given C if and only if P(A ∩ B|C) = P(A|C)P(B|C). Notation: I(A, B|C).

- P(A|B, C) = P(A|C), if P(B|C) ≠ 0
- P(B|A, C) = P(B|C), if P(A|C) ≠ 0
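Returning to the card example, the conditional probabilities and the independence claim can be checked by simulation. A minimal sketch, with our own variable names:

```python
import random

# A 52-card deck as (rank, suit) pairs; the encoding is ours.
ranks = [str(n) for n in range(2, 11)] + ["J", "Q", "K", "A"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = [(r, s) for r in ranks for s in suits]

trials = 200_000
n_A = n_AB = n_AC = 0
for _ in range(trials):
    rank, suit = random.choice(deck)
    if rank in ("J", "Q", "K"):      # event A: court card
        n_A += 1
        n_AB += rank == "Q"          # event B: queen
        n_AC += suit == "spades"     # event C: spade

print("P(B|A) ≈", n_AB / n_A)   # theory: 1/3
print("P(C|A) ≈", n_AC / n_A)   # theory: 1/4 = P(C), so C is independent of A
```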
Bayes Rule

- The definition of conditional probability,
  P(A|B) = P(A ∩ B) / P(B),
- implies the chain rule: P(A ∩ B) = P(A|B)P(B).
- By symmetry, P(A ∩ B) = P(B|A)P(A).
- Equating the right-hand sides and doing some algebra, we obtain Bayes Rule:
  P(B|A) = P(A|B)P(B) / P(A)

Monty Hall Problem

- The treasure is equally likely to be contained in one of the boxes A, B, and C, i.e., P(A) = P(B) = P(C) = 1/3. You are offered to choose one of them.
- Let us say you choose box A.
- Then the host of the game opens a box which you did not choose and which does not contain the treasure.
- For instance, the host has opened box C.
- You are then offered an option to reconsider your choice. What would you do?
- In other words, what are the probabilities P(A|N_{A,C}) and P(B|N_{A,C}), where N_{A,C} denotes the event that you chose box A and the host then opened box C?
- What does your intuition advise?
- Now apply Bayes Rule.
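Before (or after) working this out with Bayes Rule, the answer can be checked by simulation. A minimal sketch under the rules stated above; the strategy flag and names are ours:

```python
import random

def play(switch, trials=100_000):
    """Win rate of the stay/switch strategies in the Monty Hall game."""
    wins = 0
    for _ in range(trials):
        boxes = ["A", "B", "C"]
        treasure = random.choice(boxes)
        pick = "A"   # as in the slides, you initially choose box A
        # The host opens a box you did not choose that holds no treasure,
        # choosing at random when more than one such box exists.
        opened = random.choice([b for b in boxes if b != pick and b != treasure])
        if switch:
            pick = next(b for b in boxes if b != pick and b != opened)
        wins += pick == treasure
    return wins / trials

print("stay:  ", play(switch=False))   # ≈ 1/3
print("switch:", play(switch=True))    # ≈ 2/3, matching the Bayes Rule computation
```

Randomizing the host's choice when both unopened boxes are empty matters: a deterministic tie-breaking rule would change the conditional probabilities given which box was opened.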
Classification Based on Bayes Theorem

- Let Y denote the class variable. For example, we want to predict whether a borrower will default.
- Let X = (X1, X2, ..., Xk) denote the attribute set (e.g., home owner, marital status, annual income).
- We can treat X and Y as random variables and determine P(Y|X) (the posterior probability).
- Knowing the probability P(Y|X′), we can assign the record X′ to the class that maximizes the posterior probability.
- How can we estimate P(Y|X) from training data?

 #   Home Owner   Marital Status   Annual Income   Defaulted Borrower
     (binary)     (categorical)    (continuous)    (class)
 1   Yes          Single           125K            No
 2   No           Married          100K            No
 3   No           Single            70K            No
 4   Yes          Married          120K            No
 5   No           Divorced          95K            Yes
 6   No           Married           60K            No
 7   Yes          Divorced         220K            No
 8   No           Single            85K            Yes
 9   No           Married           75K            No
10   No           Single            90K            Yes

Table: Historical data for default prediction

Bayes Approach

- An accurate estimate of the posterior probability for every possible combination of attributes and classes requires a very large training set, even for a moderate number of attributes.
- We can utilize Bayes theorem instead:
  P(Y|X) = P(X|Y) × P(Y) / P(X)
- P(X) is a constant and can be calculated as a normalization multiplier.
- P(Y) can easily be estimated from the training set (the fraction of training records that belong to each class).
- Estimating P(X|Y) is a more challenging task. Methods:
  - Naïve Bayes Classifier
  - Bayesian Network

Naïve Bayes Classifier

- Attributes are assumed to be conditionally independent, given the class label y:
  P(X|Y = y) = ∏_{i=1}^{k} P(X_i|Y = y)
- thus
  P(Y|X) = P(Y) ∏_{i=1}^{k} P(X_i|Y = y) / P(X)
- Now we need to estimate P(X_i|Y) for i = 1, ..., k.

Estimating Probabilities

- P(X_i = x|Y = y) is estimated as the fraction of training instances in class y that take on the particular attribute value x. For example:
  - P(Home Owner = Yes|Y = No) = 3/7
  - P(Marital Status = Single|Y = Yes) = 2/3
- What about continuous attributes? One solution is to discretize each continuous attribute and then replace its value with the corresponding interval (transform continuous attributes into ordinal attributes).
- How can we discretize?..

Continuous Attributes

- Assume a certain type of probability distribution for the continuous attribute.
- For example, it can be a Gaussian distribution with p.d.f.
  f_ij(x_i) = (1 / (√(2π) σ_ij)) · exp(−(x_i − μ_ij)² / (2σ_ij²))
- The parameters of f_ij can be estimated from the training records that belong to class y_j.

Continuous Attributes

- Using the approximation
  P(x_i < X_i ≤ x_i + ε | Y = y_j) = ∫_{x_i}^{x_i+ε} f_ij(y) dy ≈ ε · f_ij(x_i)
- and the fact that ε cancels out when we normalize the posterior probability P(Y|X), we may take
  P(X_i = x_i|Y = y_j) = f_ij(x_i)
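As a quick illustration, the following sketch fits a Gaussian to the class-No incomes from the table and evaluates its density at 120; it reproduces the number derived by hand in the worked example that follows. The function name and layout are ours:

```python
import math

# Annual incomes (in $K) of the class-No records from the training table.
incomes_no = [125, 100, 70, 120, 60, 220, 75]

def gaussian_density(x, values):
    """Fit a Gaussian to the values and evaluate its p.d.f. at x."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)  # sample variance
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

print(gaussian_density(120, incomes_no))  # ≈ 0.0072, as in the worked example below
```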
Example

- The sample mean of the annual income attribute with respect to class No:
  x̄ = (125 + 100 + 70 + ... + 75) / 7 = 110
- Variance:
  s² = ((125 − 110)² + (100 − 110)² + ... + (75 − 110)²) / 6 = 2975
  s = √2975 = 54.54
- Given a test record with income $120K:
  P(Income = 120|No) = (1 / (√(2π) · 54.54)) · exp(−(120 − 110)² / (2 · 2975)) = 0.0072

Example

- Suppose X = (Home Owner = No, Marital Status = Married, Income = $120K).
- P(Home Owner = Yes | Y = No) = 3/7
- P(Home Owner = No | Y = No) = 4/7
- P(Home Owner = Yes | Y = Yes) = 0
- P(Home Owner = No | Y = Yes) = 1
- P(Marital Status = Divorced | Y = No) = 1/7
- P(Marital Status = Married | Y = No) = 4/7
- P(Marital Status = Single | Y = No) = 2/7
- P(Marital Status = Divorced | Y = Yes) = 1/3
- P(Marital Status = Married | Y = Yes) = 0
- P(Marital Status = Single | Y = Yes) = 2/3

Example

- For annual income:
  - class No: x̄ = 110, s² = 2975
  - class Yes: x̄ = 90, s² = 25
- Class-conditional probabilities:
  - P(X|No) = P(Home Owner = No|No) × P(Status = Married|No) × P(Annual Income = $120K|No) = 4/7 × 4/7 × 0.0072 = 0.0024
  - P(X|Yes) = P(Home Owner = No|Yes) × P(Status = Married|Yes) × P(Annual Income = $120K|Yes) = 1 × 0 × 1.2 × 10⁻⁹ = 0

Example

- Posterior probabilities:
  - P(No|X) = α × 7/10 × 0.0024 = 0.0016α
  - P(Yes|X) = 0
- where α = 1/P(X)
- Since P(No|X) > P(Yes|X), the record is classified as "No".
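The whole computation above can be packaged as a short sketch: a minimal naïve Bayes over the training table, with all names our own. No smoothing is applied, so the zero count for Married given class Yes zeroes out that class exactly as in the example:

```python
import math

# Rows of the training table: (home_owner, marital_status, income_K, class).
data = [
    ("Yes", "Single",   125, "No"),  ("No", "Married", 100, "No"),
    ("No",  "Single",    70, "No"),  ("Yes", "Married", 120, "No"),
    ("No",  "Divorced",  95, "Yes"), ("No", "Married",  60, "No"),
    ("Yes", "Divorced", 220, "No"),  ("No", "Single",   85, "Yes"),
    ("No",  "Married",   75, "No"),  ("No", "Single",   90, "Yes"),
]

def score(y, home, status, income):
    """P(y) * prod_i P(x_i | y), proportional to the posterior P(y | X)."""
    rows = [r for r in data if r[3] == y]
    prior = len(rows) / len(data)                                 # P(Y = y)
    p_home = sum(r[0] == home for r in rows) / len(rows)          # categorical
    p_status = sum(r[1] == status for r in rows) / len(rows)      # categorical
    vals = [r[2] for r in rows]                                   # Gaussian for income
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)
    p_income = math.exp(-(income - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    return prior * p_home * p_status * p_income

for y in ("No", "Yes"):
    print(y, score(y, "No", "Married", 120))  # No: ≈ 0.0016, Yes: 0.0
```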
Naïve Bayes Classifier: Discussion

- Robust to isolated noise points, because such points are averaged out when conditional probabilities are estimated from data.
- Can handle missing values by ignoring the example during model building and classification.
- Robust to irrelevant attributes: if X_i is irrelevant, then P(X_i|Y) is almost uniformly distributed and thus has little impact on the posterior probability.
- Correlated attributes can degrade performance, because conditional independence no longer holds.
- Bayesian networks account for dependence between attributes.

Directed Graph

Definition. A directed graph or digraph G is an ordered pair G := (V, A), where
- V is a set whose elements are called vertices or nodes,
- A ⊆ V × V is a set of ordered pairs of vertices, called directed edges, arcs, or arrows.

Example:
- V = {V1, V2, V3, V4, V5}
- A = {(V1, V1), (V1, V4), (V2, V1), (V4, V2), (V5, V5)}
- Cycle: V1 → V4 → V2 → V1

Directed Acyclic Graph

Definition. A directed acyclic graph (DAG) is a directed graph with no directed cycles; that is, for any vertex v, there is no nonempty directed path that starts and ends on v.

Some Graph Theory Notions

With respect to the DAG shown in the lecture figure:
- V1 and V4 are parents of V2: (V1, V2) ∈ A and (V4, V2) ∈ A.
- V5, V3, and V2 are descendants of V1: V1 is connected to V5, V3, and V2 by directed paths.
- V4 and V2 are ancestors of V3: there exist directed paths from V4 and V2 to V3.
- V6 and V4 are nondescendents of V1: directed paths from V1 to V4 and V6 do not exist.

Bayesian Network Definition

- Elements of a Bayesian network:
  - A directed acyclic graph (DAG) encodes the dependence relationships among a set of variables.
  - A probability table associates each node with its immediate parent nodes.
- Each node of the graph represents a variable.
- Each arc asserts the dependence relationship between the pair of variables.
- The DAG satisfies the Markov condition.

The Markov Condition

Definition. Suppose we have a joint probability distribution P of the random variables in some set V and a DAG G = (V, E). We say that (G, P) satisfies the Markov condition if each variable X ∈ V is conditionally independent of the set of all its nondescendents (ND_X) given the set of all its parents (PA_X):
I({X}, ND_X | PA_X).

The definition implies that a root node X, which has no parents, is unconditionally independent of its nondescendents.

Figure: Bayes network: a case study

Markov Condition Example

Node   Parents   Independency
E      ∅         I(E, {D, Hb})
D      ∅         I(D, E)
HD     E, D      I(HD, Hb|{E, D})
Hb     ?         ?
B      ?         ?
C      ?         ?

Note that I(A, B|C) implies I(A, D|C) whenever D ⊂ B.

Naïve Bayes Classifier Representation

- Recall that a naïve Bayes classifier assumes conditional independence of the attributes X1, X2, ..., Xk, given the target class Y.
- This can be represented using the Bayesian network below.

Inferencing

- We can compute the joint probability from a Bayesian network:
  P(X1, X2, ..., Xn) = ∏_{i=1}^{n} P(X_i|parents(X_i))
- Thus we can compute any conditional probability:
  P(X_k|X_m) = P(X_k, X_m) / P(X_m) = (Σ_{entries X matching X_k, X_m} P(X)) / (Σ_{entries X matching X_m} P(X))
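To make these two formulas concrete, here is a brute-force sketch over the four-variable fragment (E, D, HD, B) of the case-study network. The CPT entries are the ones used in the hand computation that follows; the dictionary layout and function names are our own:

```python
from itertools import product

p_e  = {"Yes": 0.7, "No": 0.3}               # P(E)
p_d  = {"Healthy": 0.25, "Unhealthy": 0.75}  # P(D)
p_hd = {("Yes", "Healthy"): 0.25, ("Yes", "Unhealthy"): 0.45,   # P(HD=Yes | E, D)
        ("No",  "Healthy"): 0.55, ("No",  "Unhealthy"): 0.75}
p_b  = {"Yes": 0.85, "No": 0.2}              # P(B=High | HD)

def joint(e, d, hd, b):
    """P(E, D, HD, B) = product over nodes of P(node | parents)."""
    p = p_e[e] * p_d[d]
    p *= p_hd[(e, d)] if hd == "Yes" else 1 - p_hd[(e, d)]
    p *= p_b[hd] if b == "High" else 1 - p_b[hd]
    return p

def prob(hd=None, b=None):
    """Marginal of a partial assignment, by summing matching joint entries."""
    return sum(joint(e, d, h, bb)
               for e, d, h, bb in product(p_e, p_d, ("Yes", "No"), ("High", "Low"))
               if (hd is None or h == hd) and (b is None or bb == b))

print(prob(hd="Yes"))                              # 0.49
print(prob(b="High"))                              # 0.5185
print(prob(hd="Yes", b="High") / prob(b="High"))   # ≈ 0.8033
```

Summing the joint over all matching entries is exactly the formula above; as the Complexity Issues slide notes, the number of entries grows exponentially with the number of variables.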
Example of Inferencing

- Suppose no prior information about the person is given.
- What is the probability of developing heart disease?
- Let α ∈ {Yes, No} and β ∈ {Healthy, Unhealthy}. Then
  P(HD = Yes) = Σ_α Σ_β P(HD = Yes|E = α, D = β) · P(E = α, D = β)
              = Σ_α Σ_β P(HD = Yes|E = α, D = β) · P(E = α) · P(D = β)
              = 0.25 · 0.7 · 0.25 + 0.45 · 0.7 · 0.75 + 0.55 · 0.3 · 0.25 + 0.75 · 0.3 · 0.75
              = 0.49

- Now let us compute the probability of heart disease when the person has high blood pressure. Let γ ∈ {Yes, No}.
- Probability of high blood pressure:
  P(B = High) = Σ_γ P(B = High|HD = γ) · P(HD = γ) = 0.85 · 0.49 + 0.2 · 0.51 = 0.5185
- The posterior probability of heart disease given high blood pressure is
  P(HD = Yes|BP = High) = P(BP = High|HD = Yes) · P(HD = Yes) / P(BP = High) = (0.85 · 0.49) / 0.5185 = 0.8033

Complexity Issues

- Recall that we can compute any conditional probability:
  P(X_k|X_m) = P(X_k, X_m) / P(X_m) = (Σ_{entries X matching X_k, X_m} P(X)) / (Σ_{entries X matching X_m} P(X))
- In general, this requires an exponentially large number of operations.
- We can apply various tricks to reduce the complexity.
- But querying Bayes nets is NP-hard.

D. M. Chickering, D. Heckerman, C. Meek. Large-Sample Learning of Bayesian Networks is NP-Hard. Journal of Machine Learning Research, 5 (2004) 1287-1330.

Bayesian Networks Discussion

- A Bayes network is an elegant way of encoding causal probabilistic dependencies.
- The dependency model can be represented graphically.
- Constructing a network requires effort, but adding a new variable is quite straightforward.
- Well suited for incomplete data.
- Due to the probabilistic nature of the model, the method is robust to model overfitting.

What we have learned

- Independence and conditional independence
- Bayes theorem
- Naïve Bayes classification
- The definition of a Bayes net
- Computing probabilities with a Bayes net

Literature

Pang-Ning Tan, Michael Steinbach, Vipin Kumar. Introduction to Data Mining. Addison-Wesley, 2005.
Finn V. Jensen, Thomas D. Nielsen. Bayesian Networks and Decision Graphs. 2nd Ed., Springer, 2007.
Richard E. Neapolitan. Learning Bayesian Networks. Prentice Hall, 2003.
Judea Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.