Used in Spring 2012, Spring 2013, Winter 2014 (partially)

1. Bayesian Networks
2. Conditional Independence
3. Creating Tables
4. Notations for Bayesian Networks
5. Calculating conditional probabilities from the tables
6. Calculating conditional independence
7. Markov Chain Monte Carlo
8. Markov Models
9. Markov Models and probabilistic methods in vision

Introduction to Probabilistic Robotics
1. Probabilities
2. Bayes rule
3. Bayes filters
4. Bayes networks
5. Markov chains

Bayesian Networks and Markov Models
• Bayesian networks and Markov models have:
1. Applications in user modeling
2. Applications in natural language processing
3. Applications in robotic control
4. Applications in robot vision
and many more.

Bayesian Networks (BNs) – Overview
• Introduction to BNs
– Nodes, structure and probabilities
– Reasoning with BNs
– Understanding BNs
• Extensions of BNs
– Decision Networks
– Dynamic Bayesian Networks (DBNs)

Definition of Bayesian Networks
• A data structure that represents the dependence between variables
• Gives a concise specification of the joint probability distribution
• A Bayesian network is a directed acyclic graph (DAG) in which the following holds:
1. A set of random variables makes up the nodes of the network
2. A set of directed links connects pairs of nodes
3. Each node has a probability distribution that quantifies the effects of its parents

Conditional Independence
The relationship between conditional independence and BN structure is important for understanding how BNs work.

Conditional Independence – Causal Chains
• Causal chains give rise to conditional independence:
A → B → C, with P(C | A, B) = P(C | B)
• Example: "Smoking (A) causes cancer (B), which causes dyspnoea (C)." (A numeric sketch of this case appears at the end of this introduction.)

Conditional Independence – Common Causes
• Common causes (or ancestors) also give rise to conditional independence:
A ← B → C, with P(C | A, B) = P(C | B), i.e. A is independent of C given B
• Example: "Cancer (B) is a common cause of the two symptoms: a positive X-ray (A) and dyspnoea (C)." I have dyspnoea (C) because of cancer (B), so I do not need an X-ray test.

Conditional Dependence – Common Effects
• Common effects (or their descendants) give rise to conditional dependence:
A → B ← C: A and C are marginally independent, but P(A | C, B) ≠ P(A | B), i.e. A and C are not independent given B
• Example: "Cancer (B) is a common effect of pollution (A) and smoking (C)." Given cancer, smoking "explains away" pollution: since we know that you smoke and have cancer, we do not need to assume that your cancer was caused by pollution.

Joint Distributions for Describing Uncertain Worlds
• Researchers have already found numerous and dramatic benefits of joint distributions for describing uncertain worlds
• Students in robotics and artificial intelligence have to understand the problems with using joint distributions
• You should discover how the Bayes net methodology allows us to build joint distributions in manageable chunks

Bayes Net Methodology: Why Do Bayesian Methods Matter?
1. Bayesian methods are one of the most important conceptual advances in the machine learning / AI field to have emerged since 1995.
2. They offer a clean, clear, manageable language and methodology for expressing what the robot designer is certain and uncertain about.
3. There are already many practical applications in medicine, factories and helpdesks, for instance:
– P(this problem | these symptoms) (we will use P for probability)
– the anomalousness of this observation
– choosing the next diagnostic test given these observations
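The numeric sketch promised in the causal-chains slide: a minimal Python check, with made-up CPT values (they are not from the slides), that in the chain A → B → C the variable A adds nothing about C once B is known. The helper names joint, prob and conditional are ours, introduced only for illustration.

```python
from itertools import product

# Made-up CPTs for the chain A -> B -> C (smoking -> cancer -> dyspnoea).
P_A = {True: 0.3, False: 0.7}
P_B_GIVEN_A = {True: 0.2, False: 0.05}   # P(B=True | A=a)
P_C_GIVEN_B = {True: 0.8, False: 0.1}    # P(C=True | B=b)

def joint(a, b, c):
    """Chain rule for the chain: P(a, b, c) = P(a) P(b | a) P(c | b)."""
    pb = P_B_GIVEN_A[a] if b else 1 - P_B_GIVEN_A[a]
    pc = P_C_GIVEN_B[b] if c else 1 - P_C_GIVEN_B[b]
    return P_A[a] * pb * pc

def prob(**fixed):
    """Marginal probability of the partial assignment in `fixed`."""
    return sum(joint(a, b, c)
               for a, b, c in product([True, False], repeat=3)
               if all({"a": a, "b": b, "c": c}[k] == v for k, v in fixed.items()))

def conditional(target, value, **given):
    return prob(**{target: value, **given}) / prob(**given)

# P(C | A, B) equals P(C | B): knowing A changes nothing once B is known.
print(conditional("c", True, a=True, b=True))   # 0.8
print(conditional("c", True, b=True))           # 0.8
```

Rewiring the same worlds as a collider (B depending on both A and C) would show the opposite effect: conditioning on B makes A and C dependent.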
Problem 1: Creating the Joint Distribution Table
• The joint distribution table is an important concept: a probabilistic truth table
• It is a truth table of all combinations of the Boolean variables
• You can guess this table, you can take the data from some statistics, or you can build it from partial tables
• Idea: use decision diagrams to represent these data, and use independence while creating the tables

Wet-Sprinkler-Rain Example
Variables: W (grass is wet), S (sprinkler was on), R (it rained). Our goal is to derive the joint table. Observe that if I know 7 of its entries, the eighth is unique, as their sum is 1. So for n = 3 variables I need to guess, calculate or find 2^n - 1 = 7 values. But the same data can be stored explicitly or implicitly, not necessarily in the form of a table! What extra assumptions can help to create this table?

Understanding of Causation
You need to understand causation when you create the table, for instance for "sprinkler on under the condition that it rained."

Independence Simplifies Probabilities
We use the independence of the variables S and R: P(S | R) = P(S), i.e. "sprinkler on under the condition that it rained" is the same as "sprinkler on." Because S and R are independent, we can use these probabilities to create the table.

Creating the CPTs
We create the tables for S ("sprinkler was on") and R ("it rained") based on our knowledge of the problem, plus a conditional probability table (CPT) for W ("grass is wet"). What about children playing, or a dog pissing? That is still possible, and it is covered by the small value 0.1. This first step shows the collected data.

Full Joint for Only S and R
The independence of S and R is used: P(S, R) = P(S) P(R). The CPT for W is:

P(W = T | S = T, R = T) = 0.95
P(W = T | S = T, R = F) = 0.90
P(W = T | S = F, R = T) = 0.90
P(W = T | S = F, R = F) = 0.01

Chain Rule for Probabilities
For the random variables W, S, R, the chain rule gives the full joint probability: P(W, S, R) = P(W | S, R) P(S | R) P(R) = P(W | S, R) P(S) P(R).

Full Joint Probability
• You have a table
• You want to calculate some probability, e.g. P(~W)
• When you have this table you can modify it, and you can also calculate everything!

Wet-Sprinkler-Rain Example
The independence of S and R implies calculating fewer numbers to create the complete joint table for W, S and R: P(S), P(R) and the four entries of P(W | S, R), six numbers in total. We reduced only from seven to six numbers.

Explanation of Diagrammatic Notations such as Bayes Networks
You do not need to build the complete table! You can build a graph of tables, i.e. nodes which correspond to certain types of tables.

Problem 2: Calculating Conditional Probabilities from the Joint Distribution Table
Dividing the probability that S = T and W = T by the probability that S = T gives the probability that the grass is wet under the assumption that the sprinkler was on: P(W = T | S = T) = P(S = T, W = T) / P(S = T). We showed examples of both causal inference and diagnostic inference.

"Explaining Away" the Facts from the Table
Explaining away can also be demonstrated by comparing the conditional probabilities calculated earlier from this table.

Conclusions on This Problem
1. The table can be used for explaining away
2. The table can be used to calculate conditional independence
3. The table can be used to calculate conditional probabilities
4. The table can be used to determine causality
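A minimal sketch of Problems 1 and 2 in code. The four P(W | S, R) values are the ones quoted above; the priors for S and R are not in the transcript, so the two numbers below are assumptions chosen only to make the example run.

```python
from itertools import product

P_S = 0.2   # P(S = T): assumed, not from the slides
P_R = 0.3   # P(R = T): assumed, not from the slides
P_W = {(True, True): 0.95, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.01}   # P(W = T | S, R)

# Chain rule with S independent of R: P(w, s, r) = P(s) P(r) P(w | s, r).
joint = {}
for s, r, w in product([True, False], repeat=3):
    ps = P_S if s else 1 - P_S
    pr = P_R if r else 1 - P_R
    pw = P_W[(s, r)] if w else 1 - P_W[(s, r)]
    joint[(s, r, w)] = ps * pr * pw

assert abs(sum(joint.values()) - 1.0) < 1e-12   # the eight entries sum to 1

# Problem 2: conditional probabilities straight from the joint table.
p_s_and_w = sum(p for (s, r, w), p in joint.items() if s and w)
p_s = sum(p for (s, r, w), p in joint.items() if s)
print("P(W = T | S = T) =", p_s_and_w / p_s)
print("P(~W) =", sum(p for (s, r, w), p in joint.items() if not w))
```

Note that only six numbers (two priors and four CPT entries) generate all eight joint entries, exactly as the slide counts them.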
Problem 3: What If S and R Are Dependent? Calculating Conditional Independence
• Conditional independence of S and R (Wet-Sprinkler-Rain example)
• Diagrammatic notation for the conditional independence of two variables (Wet-Sprinkler-Rain example extended)
• Conditional independence formalized for sets of variables S1, S2, S3
• Now we will explain conditional independence on the CLOUDY version of the Wet-Sprinkler-Rain example

Example – Lung Cancer Diagnosis
1. A patient has been suffering from shortness of breath (called dyspnoea) and visits the doctor, worried that he has lung cancer.
2. The doctor knows that other diseases, such as tuberculosis and bronchitis, are possible causes, as well as lung cancer.
3. She also knows that other relevant information includes whether or not the patient is a smoker (increasing the chances of cancer and bronchitis) and what sort of air pollution he has been exposed to.
4. A positive X-ray would indicate either TB or lung cancer.

Nodes and Values in Bayesian Networks
Q: What are the nodes to represent, and what values can they take?
A: Nodes can be discrete or continuous.
• Boolean nodes represent propositions taking binary values. Example: the Cancer node represents the proposition "the patient has cancer"
• Ordered values. Example: a Pollution node with values low, medium, high
• Integral values. Example: Age with possible values 1-120

Lung Cancer Example: Nodes and Values

Node name   Type      Values
Pollution   Binary    {low, high}
Smoker      Boolean   {T, F}
Cancer      Boolean   {T, F}
Dyspnoea    Boolean   {T, F}
Xray        Binary    {pos, neg}

Lung Cancer Example: Bayesian Network Structure
These variables are the nodes of the BN: Pollution → Cancer ← Smoker, with Cancer → Xray and Cancer → Dyspnoea.

Conditional Probability Tables (CPTs) in Bayesian Networks
After specifying the topology, we must specify the CPT for each discrete node:
1. Each row of a CPT contains the conditional probability of each node value for one possible combination of values of its parent nodes
2. Each row of a CPT must sum to 1
3. A CPT for a Boolean variable with n Boolean parents contains 2^(n+1) probabilities
4. A node with no parents has one row (its prior probabilities)

Lung Cancer Example: Example of a CPT
The CPT for Cancer gives the probability of cancer for each state of the variables P and S, e.g. the row for "Smoking is true, Pollution is low." Abbreviations: C = cancer, P = pollution, S = smoking, X = X-ray, D = dyspnoea.
• Several small CPTs are used to create larger JDTs (joint distribution tables).
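A sketch of those small CPTs as dictionaries. The transcript shows the numbers only as slide images, so the values below follow the standard textbook version of this example and should be read as assumptions; storing just P(value = T) per row keeps every row's two entries summing to 1 automatically.

```python
# Lung-cancer network CPTs, keyed by the values of the parent nodes.
# All numbers are assumptions (the usual textbook values), not transcript data.
P_POLLUTION_LOW = 0.9                     # P(P = low); no parents, one row
P_SMOKER_T = 0.3                          # P(S = T); no parents, one row

P_CANCER_T = {                            # P(C = T | P, S): 2 parents, 4 rows
    ("low", True): 0.03,  ("low", False): 0.001,
    ("high", True): 0.05, ("high", False): 0.02,
}
P_XRAY_POS = {True: 0.9, False: 0.2}      # P(X = pos | C): 1 parent, 2 rows
P_DYSPNOEA_T = {True: 0.65, False: 0.3}   # P(D = T | C): 1 parent, 2 rows

# A Boolean node with n Boolean parents has 2**n rows, i.e. 2**(n+1)
# probabilities when both the T and the F entry of each row are listed.
assert len(P_CANCER_T) == 2 ** 2
assert len(P_XRAY_POS) == 2 ** 1
```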
The Markov Property for Bayesian Networks
• Modelling with BNs requires assuming the Markov property:
– There are no direct dependencies in the system being modelled which are not already explicitly shown via arcs
• Example: smoking can influence dyspnoea only through causing cancer
Markov's idea: all information is modelled by the arcs.

Software NETICA for Bayesian Networks and Joint Probabilities
Reasoning with numbers using the Netica software; here are the collected data.

Representing the Joint Probability Distribution: Example
We want to calculate Pr(P = low, S = F, C = T, X = pos, D = T). The network factorization gives

Pr(P = low, S = F, C = T, X = pos, D = T)
= Pr(P = low) Pr(S = F) Pr(C = T | P = low, S = F) Pr(X = pos | C = T) Pr(D = T | C = T)

This shows how we can calculate the joint probability from the other probabilities in the network (a numeric sketch of this product appears after the causality discussion below). Abbreviations: P = pollution, S = smoking, X = X-ray, D = dyspnoea.

Problem 4: Determining Causality – Bayes Nets and the Advertisement Example
• Bayes nets allow one to learn about causal relationships
• One more example:
– Marketing analysts want to know whether to increase, decrease or leave unchanged the exposure of some advertisement in order to maximize profit from the sale of some product
– Advertised (A), for someone having seen the advertisement, and Buy (B), for someone having purchased the product, will be the variables

Causality Example
1. So we want to know the probability that B = true given that we force A = true, or A = false
2. We could do this by finding two similar populations and observing B based on A = true for one and A = false for the other
3. But it may be difficult or expensive to find such populations

How Can Causality Be Represented in a Graph? The Markov Condition and the Causal Markov Condition
• But how do we learn whether or not A causes B at all?
• The Markov Condition states:
– Any node in a Bayes net is conditionally independent of its non-descendants given its parents
• The Causal Markov Condition (CMC) states:
– Any phenomenon in a causal net is independent of its non-effects given its direct causes

Acyclic Causal Graph versus Bayes Net
• Thus, if we have a directed acyclic causal graph C for the variables in X, then, by the Causal Markov Condition, C is also a Bayes net for the joint probability distribution of X
• The reverse is not necessarily true: a network may satisfy the Markov condition without depicting causality

Causality Example: When We Learn That p(b|a) and p(b|a') Are Not Equal
• Given the Causal Markov Condition, we can infer causal relationships from the conditional (in)dependence relationships learned from the data
• Suppose we learn with high Bayesian probability that p(b|a) and p(b|a') are not equal
• Given the CMC, there are four simple causal explanations for this (and more complex ones too):
1. A causes B: if they advertise more, you buy more
2. B causes A: if you buy more, they have more money to advertise
3. A hidden common cause of A and B (e.g. income): in a rich country they advertise more and they buy more
4. A and B are both causes for data selection (a.k.a. selection bias, perhaps if the database did not record false instances of A and B): if you increase the information about Ad in the database, then you also increase the information about Buy in the database
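The numeric sketch promised above, evaluating the factorization of Pr(P = low, S = F, C = T, X = pos, D = T). The five factors reuse the assumed CPT values from the earlier lung-cancer sketch; they are not transcript data.

```python
# One entry of the full joint as a product of five CPT lookups (values assumed).
p_P_low = 0.9      # Pr(P = low)
p_S_F = 0.7        # Pr(S = F) = 1 - Pr(S = T)
p_C_T = 0.001      # Pr(C = T | P = low, S = F)
p_X_pos = 0.9      # Pr(X = pos | C = T)
p_D_T = 0.65       # Pr(D = T | C = T)

print(p_P_low * p_S_F * p_C_T * p_X_pos * p_D_T)   # about 0.00037
```

The same five-lookup pattern yields any of the 2 × 2 × 2 × 2 × 2 joint entries, which is what makes the factored representation so compact.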
Causality Example Continued
• But we still do not know if A causes B
• Suppose:
– We learn about the Income (I) and geographic Location (L) of the purchaser
– And we learn with high Bayesian probability the network on the right, over Advertised (A = Ad) and Buy (B)

Causality Example – Using the CMC
• Given the Causal Markov Condition, the ONLY causal explanation for the conditional (in)dependence relationships encoded in that Bayes net is that Ad is a cause for Buy
• That is, none of the other relationships, or combinations thereof, produce the probabilistic relationships encoded here

Causality in Bayes Networks
• Thus, Bayes nets allow the inference of causal relationships by the Causal Markov Condition (CMC)

Problem 5: Determining D-separation in Bayesian Networks
• We will formulate a graphical criterion of conditional independence
• We can determine whether a set of nodes X is independent of another set Y, given a set of evidence nodes E, via the Markov property:
– If every undirected path from a node in X to a node in Y is d-separated by E, then X and Y are conditionally independent given E

Determining D-separation (cont.)
• A set of nodes E d-separates two sets of nodes X and Y if every undirected path from a node in X to a node in Y is blocked given E
• A path is blocked given a set of nodes E if there is a node Z on the path for which one of three conditions holds:
1. Z is in E and Z has one arrow on the path leading in and one arrow out (chain)
2. Z is in E and Z has both path arrows leading out (common cause)
3. Neither Z nor any descendant of Z is in E, and both path arrows lead into Z (common effect)

Another Example of Bayesian Networks: Alarm
• Let us draw the BN from these data: the Bayes net corresponding to the Alarm-Burglar problem, with nodes Burglary, Earthquake, Alarm, John calls and Mary calls

Compactness, Global Semantics, Local Semantics and the Markov Blanket
• Compactness of a Bayes net, its global semantics and its local semantics are useful concepts, illustrated on the alarm example
• A node's Markov blanket consists of:
1. its parents
2. its children
3. its children's parents

Problem 6: How to Systematically Build a Bayes Network – Example
The alarm network is built step by step: a variable is added, we check whether it depends directly on the variables already in the network, and if so we add an arrow.

Bayes Net for the Car That Does Not Want to Start
Such networks can be used for robot diagnostics, or for the diagnosis of a human done by a robot.

Inference in Bayes Nets and How to Simplify It
• First method of simplification: enumeration (a code sketch appears at the end of this section)
• Second method: variable elimination; in the alarm example, variable A is eliminated
• 3SAT example: polytrees are better
• IDEA: convert the DAG to polytrees; clustering is used to convert non-polytree BNs (example: a network that is not a polytree is turned into one that is)

• Approximate inference:
1. Direct sampling methods
2. Rejection sampling
3. Likelihood weighting
4. Markov chain Monte Carlo
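The enumeration sketch referenced above, on the alarm network. The transcript carries the CPTs only as figures, so the numbers below are the usual textbook values for this network and are assumptions here.

```python
from itertools import product

# Alarm network: Burglary -> Alarm <- Earthquake, Alarm -> John, Alarm -> Mary.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A = T | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J = T | A)
P_M = {True: 0.70, False: 0.01}                      # P(M = T | A)

def joint(b, e, a, j, m):
    """P(b, e, a, j, m): one entry from each CPT, multiplied."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# Inference by enumeration: sum the hidden variables E and A out of the
# joint, then normalize over the query variable B.
num = sum(joint(True, e, a, True, True)
          for e, a in product([True, False], repeat=2))
den = sum(joint(b, e, a, True, True)
          for b, e, a in product([True, False], repeat=3))
print("P(B = T | J = T, M = T) =", num / den)   # about 0.284
```

Enumeration touches every assignment of the hidden variables; variable elimination avoids that by caching intermediate factors, which is why it scales better on large networks.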
1. Direct Sampling Methods
Direct sampling generates minterms together with their probabilities. We start from the top of the network (C = cloudy, S = sprinkler, R = rain, W = wet). In the Cloudy-Wet-Sprinkler-Rain example, one run first samples Cloudy = yes, then Sprinkler = no, then Rain and Wet: we have generated the sample minterm C S' R W.

2. Rejection Sampling Methods
• Reject the samples that are inconsistent with the evidence (Wet-Sprinkler-Rain example).

3. Likelihood Weighting Methods
The slides walk through the same Cloudy-Wet-Sprinkler-Rain network step by step, with the evidence variables held fixed and each sample weighted.

Likelihood Weighting vs. Rejection Sampling
1. Both generate consistent estimates of the joint distribution conditioned on the values of the evidence variables
2. Likelihood weighting converges faster to the correct probabilities
3. But even likelihood weighting degrades with many evidence variables, because a few samples will have nearly all of the total weight
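A sketch of direct sampling and likelihood weighting on the Cloudy-Sprinkler-Rain-Wet network. The CPT numbers are again not in the transcript; the values below are the usual ones for this example and are assumptions, as is the query P(R = T | W = T).

```python
import random

# Cloudy -> {Sprinkler, Rain} -> Wet, with assumed (textbook-style) CPTs.
P_C = 0.5
P_S = {True: 0.1, False: 0.5}                      # P(S = T | C)
P_R = {True: 0.8, False: 0.2}                      # P(R = T | C)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.01}  # P(W = T | S, R)

def direct_sample(rng):
    """Sample one minterm top-down, parents before children."""
    c = rng.random() < P_C
    s = rng.random() < P_S[c]
    r = rng.random() < P_R[c]
    w = rng.random() < P_W[(s, r)]
    return c, s, r, w

def weighted_sample(rng, w_evidence=True):
    """Likelihood weighting: fix the evidence W, weight by its likelihood."""
    c = rng.random() < P_C
    s = rng.random() < P_S[c]
    r = rng.random() < P_R[c]
    weight = P_W[(s, r)] if w_evidence else 1 - P_W[(s, r)]
    return (c, s, r), weight

rng = random.Random(0)
print("one direct sample (C, S, R, W):", direct_sample(rng))

# Estimate P(R = T | W = T): weighted fraction of samples with R = T.
num = den = 0.0
for _ in range(100_000):
    (c, s, r), weight = weighted_sample(rng)
    den += weight
    num += weight * r
print("P(R = T | W = T) ~", num / den)
```

Every sample counts here (no rejections); rejection sampling would instead sample W too and throw away the runs where W came out false, which is why it wastes samples when the evidence is unlikely.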