Probabilistic Reasoning
ECE457 Applied Artificial Intelligence, Spring 2007, Lecture #9
R. Khoury (2007)

Outline
- Bayesian networks
- D-separation and independence
- Inference
- Readings: Russell & Norvig, sections 14.1 to 14.4

Recall the Story from FOL
- Anyone passing their 457 exam and winning the lottery is happy.
- Anyone who studies or is lucky can pass all their exams.
- Bob did not study but is lucky.
- Anyone who's lucky can win the lottery.
- Is Bob happy?

Add Probabilities
- Anyone passing their 457 exam and winning the lottery has a 99% chance of being happy. Anyone only passing their 457 exam has an 80% chance, someone only winning the lottery has a 60% chance, and someone who does neither has a 20% chance of being happy.
- Anyone who studies has a 90% chance of passing their exams. Anyone who's lucky has a 50% chance of passing their exams. Anyone who's both lucky and who studied has a 99% chance of passing, but someone who didn't study and is unlucky has a 1% chance of passing.
- There's a 20% chance that Bob studied, but a 75% chance that he'll be lucky.
- Anyone who's lucky has a 40% chance of winning the lottery, while an unlucky person only has a 1% chance of winning.
- What's the probability of Bob being happy?

Probabilities in the Story
- Examples of probabilities in the story:
  P(Lucky) = 0.75
  P(Study) = 0.2
  P(PassExam|Study) = 0.9
  P(PassExam|Lucky) = 0.5
  P(Win|Lucky) = 0.4
  P(Happy|PassExam,Win) = 0.99
- Some variables directly affect others!
- Can we build a graphical representation of the dependencies and conditional independencies between variables?
Bayesian Network
- A belief network over the variables Lucky, Study, Win, PassExam and Happy
- Directed acyclic graph: nodes represent variables, edges represent conditional relationships
- Concise representation of any full joint probability distribution

Bayesian Network
- Nodes with no parents have prior probabilities
- Nodes with parents have conditional probability tables, with one entry for each truth-value combination of their parents

Bayesian Network
- Edges: Lucky → Win, Lucky → PassExam, Study → PassExam, Win → Happy, PassExam → Happy
- P(L) = 0.75, P(S) = 0.2
- P(W|L):
    L=F: 0.01    L=T: 0.4
- P(E|L,S):
    L=F,S=F: 0.01    L=T,S=F: 0.5    L=F,S=T: 0.9    L=T,S=T: 0.99
- P(H|W,E):
    W=F,E=F: 0.2    W=T,E=F: 0.6    W=F,E=T: 0.8    W=T,E=T: 0.99

Bayesian Network
(Figure: a much larger example network with dozens of nodes.)

Chain Rule
- Recall the chain rule:
  P(A,B) = P(A|B)P(B)
  P(A,B,C) = P(A|B,C)P(B,C) = P(A|B,C)P(B|C)P(C)
  P(A1,A2,…,An) = P(A1|A2,…,An)P(A2|A3,…,An)…P(An-1|An)P(An)
  P(A1,A2,…,An) = ∏i=1..n P(Ai|Ai+1,…,An)

Chain Rule
- If we know the value of a node's parents, we don't care about more distant ancestors; their influence is included through the parents
- A node is conditionally independent of its predecessors given its parents
- More generally, a node is conditionally independent of its non-descendants given its parents
- Updated chain rule:
  P(A1,A2,…,An) = ∏i=1..n P(Ai|parents(Ai))
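The factored representation and the updated chain rule above can be written down directly in Python. This is a minimal sketch of the lecture's network; the dictionary layout and the names `parents`, `cpt`, `p` and `joint` are my own, not from the slides:

```python
# Sketch of the lecture's network: each node maps to its tuple of parents.
parents = {
    "Lucky": (),
    "Study": (),
    "Win": ("Lucky",),
    "PassExam": ("Lucky", "Study"),
    "Happy": ("Win", "PassExam"),
}

# CPTs: for each node, parent truth-value combinations -> P(node = True).
cpt = {
    "Lucky": {(): 0.75},
    "Study": {(): 0.2},
    "Win": {(False,): 0.01, (True,): 0.4},
    "PassExam": {(False, False): 0.01, (True, False): 0.5,
                 (False, True): 0.9, (True, True): 0.99},
    "Happy": {(False, False): 0.2, (True, False): 0.6,
              (False, True): 0.8, (True, True): 0.99},
}

def p(node, value, assignment):
    """P(node = value | this node's parents, as set in assignment)."""
    row = cpt[node][tuple(assignment[q] for q in parents[node])]
    return row if value else 1.0 - row

def joint(assignment):
    """Updated chain rule: product of P(Ai | parents(Ai)) over all nodes."""
    result = 1.0
    for node in parents:
        result *= p(node, assignment[node], assignment)
    return result
```

For example, p("Win", True, {"Lucky": True}) returns 0.4, and summing joint() over all 32 truth assignments gives 1, as any full joint distribution must.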
Chain Rule Example
- Probability that Bob is happy because he won the lottery and passed his exam, because he's lucky but did not study:
  P(H,W,E,L,¬S) = P(H|W,E) P(W|L) P(E|L,¬S) P(L) P(¬S)
  = 0.99 * 0.4 * 0.5 * 0.75 * 0.8
  = 0.1188 ≈ 0.12

Constructing Bayesian Nets
- Build from the top down: start with the root nodes, add their children, and continue down to the leaves

Constructing Bayesian Nets
- What happens if we build with the wrong order? The network becomes needlessly complicated
- Node ordering is important!

Connections
- We can understand dependence in a network by considering how evidence is transmitted through it
- Information entered at one node propagates to descendants and ancestors through connected nodes, provided no node on the path already has evidence (in which case the propagation stops there)

Serial Connection (Study → PassExam → Happy)
- Study and Happy are dependent
- Study and Happy are independent given PassExam
- Intuitively, the only way Study can affect Happy is through PassExam

Converging Connection (Lucky → PassExam ← Study)
- Lucky and Study are independent
- Lucky and Study are dependent given PassExam
- Intuitively, Lucky can be used to explain away Study

Diverging Connection (Win ← Lucky → PassExam)
- Win and PassExam are dependent
- Win and PassExam are independent given Lucky
- Intuitively, Lucky can explain both Win and PassExam; Win and PassExam can affect each other by changing the belief in Lucky
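The chain-rule example above is just a product of five CPT entries, and can be checked term by term; note the last factor is P(¬S) = 0.8, since the example states that Bob did not study:

```python
# P(H,W,E,L,¬S) = P(H|W,E) * P(W|L) * P(E|L,¬S) * P(L) * P(¬S)
p_joint = 0.99 * 0.4 * 0.5 * 0.75 * 0.8
print(round(p_joint, 4))  # 0.1188, which the slide rounds to 0.12
```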
D-Separation
- Determines whether two variables are independent given some other variables
- X is independent of Y given Z if X and Y are d-separated given Z
- X is d-separated from Y if, on every (undirected) path between X and Y, there exists a node Z for which either:
  - the connection is serial or diverging and there is evidence for Z, or
  - the connection is converging and there is no evidence for Z or any of its descendants

D-Separation
(Figure: the connection patterns. In a serial or diverging connection, the middle node blocks the path if it is in evidence; in a converging connection, the middle node blocks the path if neither it nor any of its descendants is in evidence.)

D-Separation
- Can be computed in linear time using a depth-first-search algorithm
- A fast algorithm to determine whether two nodes are independent
- Lets us infer whether learning the value of one variable might give us information about another, given what we already know
- All d-separated variables are independent, but not all independent variables are d-separated

D-Separation Exercise
(Figure: an exercise network over nodes a to j.)
- If we observe a value for node g, what other nodes are updated? Nodes f, h and i
- If we observe a value for node a, what other nodes are updated? Nodes b, c, d, e and f

D-Separation Exercise
- Given an observation of c, are nodes a and f independent? Yes
- Given an observation of i, are nodes g and j independent? No

Other Independence Criteria
(Figure: a large example network.)
- A node is conditionally independent of its non-descendants given its parents (recall the updated chain rule)
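One standard way to implement the linear-time test mentioned above is the "Bayes ball" reachability traversal: mark every node reachable from X through an active trail given the evidence, and Y is d-separated from X iff it is never reached. A sketch, with the lecture's network as the test case; the function and variable names (`d_separated`, `net`) are my own:

```python
from collections import defaultdict

def d_separated(parents, x, y, evidence):
    """True iff x and y are d-separated given the evidence set."""
    children = defaultdict(set)
    for node, ps in parents.items():
        for par in ps:
            children[par].add(node)
    # Ancestors of the evidence (including the evidence itself): a
    # converging node lets influence through iff it is in this set.
    ancestors, frontier = set(), set(evidence)
    while frontier:
        n = frontier.pop()
        if n not in ancestors:
            ancestors.add(n)
            frontier |= set(parents[n])
    # Depth-first traversal of active trails from x; the flag records
    # whether we reached a node from a child ("up") or a parent ("down"),
    # which decides which connections through it are open.
    reachable, visited = set(), set()
    stack = [(x, "up")]
    while stack:
        n, direction = stack.pop()
        if (n, direction) in visited:
            continue
        visited.add((n, direction))
        if n not in evidence:
            reachable.add(n)
        if direction == "up" and n not in evidence:
            stack += [(par, "up") for par in parents[n]]
            stack += [(c, "down") for c in children[n]]
        elif direction == "down":
            if n not in evidence:          # serial connection open
                stack += [(c, "down") for c in children[n]]
            if n in ancestors:             # converging connection open
                stack += [(par, "up") for par in parents[n]]
    return y not in reachable

# The lecture's network: each node maps to its tuple of parents.
net = {"Lucky": (), "Study": (), "Win": ("Lucky",),
       "PassExam": ("Lucky", "Study"), "Happy": ("Win", "PassExam")}
```

On this network, d_separated(net, "Win", "PassExam", {"Lucky"}) is True (the diverging connection is blocked by evidence on Lucky), while d_separated(net, "Lucky", "Study", {"PassExam"}) is False (evidence on the converging node makes Lucky and Study dependent).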
Other Independence Criteria
(Figure: the same large network, highlighting one node's parents, children, and children's parents.)
- A node is conditionally independent of all other nodes in the network given its parents, children, and children's parents
- This set of nodes is called the node's Markov blanket

Inference in Bayesian Networks
- Compute the posterior probability of a query variable given an observed event
  P(A1,A2,…,An) = ∏i=1..n P(Ai|parents(Ai))
- Observed evidence variables E = E1,…,Em
- Query variable X
- Between them: nonevidence (hidden) variables Y = Y1,…,Yl
- The belief network is X ∪ E ∪ Y

Inference in Bayesian Networks
- We want P(X|E)
- Recall Bayes' theorem: P(A|B) = P(A,B) / P(B)
  P(X|E) = α P(X,E)
- Recall marginalization: P(Ai) = Σj P(Ai,Bj)
  P(X|E) = α ΣY P(X,E,Y)
- Recall the updated chain rule: P(A1,A2,…,An) = ∏i=1..n P(Ai|parents(Ai))
  P(X|E) = α ΣY ∏A∈{X}∪E∪Y P(A|parents(A))

Inference Example
- (The network and CPTs from before: P(L) = 0.75, P(S) = 0.2, P(W|L), P(E|L,S) and P(H|W,E).)

Inference Example #1
- With only the information from the network (and no observations), what's the probability that Bob won the lottery?
  P(W) = Σl P(W,l)
       = Σl P(W|l)P(l)
       = P(W|L)P(L) + P(W|¬L)P(¬L)
       = 0.4 * 0.75 + 0.01 * 0.25
       = 0.3025

Inference Example #2
- Given that we know that Bob is happy, what's the probability that Bob won the lottery?
- From the network, we know P(h,e,w,s,l) = P(l)P(s)P(e|l,s)P(w|l)P(h|w,e)
- We want to find P(W|H) = α Σl Σs Σe P(l)P(s)P(e|l,s)P(W|l)P(H|W,e)
- P(¬W|H) is also needed to normalize
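Example #1 above is a one-line marginalization over Lucky, with the two terms P(W|L)P(L) and P(W|¬L)P(¬L) taken straight from the CPTs:

```python
# P(W) = P(W|L)P(L) + P(W|¬L)P(¬L)
p_win = 0.4 * 0.75 + 0.01 * 0.25
print(round(p_win, 4))  # 0.3025
```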
Inference Example #2
- Terms of the sum for W = T:
  l  s  e    P(l)  P(s)  P(e|l,s)  P(W|l)  P(H|W,e)  product
  F  F  F    0.25  0.8   0.99      0.01    0.6       0.0011880
  T  F  F    0.75  0.8   0.5       0.4     0.6       0.0720000
  F  T  F    0.25  0.2   0.1       0.01    0.6       0.0000300
  T  T  F    0.75  0.2   0.01      0.4     0.6       0.0003600
  F  F  T    0.25  0.8   0.01      0.01    0.99      0.0000198
  T  F  T    0.75  0.8   0.5       0.4     0.99      0.1188000
  F  T  T    0.25  0.2   0.9       0.01    0.99      0.0004455
  T  T  T    0.75  0.2   0.99      0.4     0.99      0.0588060
  P(W|H) = α 0.2516493

Inference Example #2
- Terms of the sum for W = F:
  l  s  e    P(l)  P(s)  P(e|l,s)  P(¬W|l)  P(H|¬W,e)  product
  F  F  F    0.25  0.8   0.99      0.99     0.2        0.039204
  T  F  F    0.75  0.8   0.5       0.6      0.2        0.036000
  F  T  F    0.25  0.2   0.1       0.99     0.2        0.000990
  T  T  F    0.75  0.2   0.01      0.6      0.2        0.000180
  F  F  T    0.25  0.8   0.01      0.99     0.8        0.001584
  T  F  T    0.75  0.8   0.5       0.6      0.8        0.144000
  F  T  T    0.25  0.2   0.9       0.99     0.8        0.035640
  T  T  T    0.75  0.2   0.99      0.6      0.8        0.071280
  P(¬W|H) = α 0.328878

Inference Example #2
- P(W|H) = α <0.2516493, 0.328878> = <0.4335, 0.5665>
- Note that P(¬W|H) > P(W|H), because P(¬W|L) > P(W|L)
- The probability of Bob having won the lottery has increased by 13.1 percentage points (from 0.3025 to 0.4335) thanks to our knowledge that he is happy!

Expert Systems
- Bayesian networks are used to implement expert systems: diagnostic systems that contain subject-specific knowledge
- Knowledge (nodes, relationships, probabilities) is typically provided by human experts
- The system observes evidence by asking questions to the user, then infers the most likely conclusion

Pathfinder
- An expert system for medical diagnosis of lymph-node diseases
- A very large Bayesian network: over 60 diseases, over 100 features of lymph nodes, and over 30 features for clinical information
- A lot of work from medical experts: 8 hours to define the features and diseases, 35 hours to build the network topology, and 40 hours to assess the probabilities
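The tables for example #2 above are inference by enumeration, and can be reproduced with a short loop over the hidden variables. A sketch using the CPTs from the lecture; the names `val`, `unnormalized` and `p_w_given_h` are my own:

```python
from itertools import product

# CPTs from the lecture's network: P(node = True | parent values).
P_L, P_S = 0.75, 0.2
P_W = {False: 0.01, True: 0.4}                    # P(W | l)
P_E = {(False, False): 0.01, (True, False): 0.5,  # P(E | l, s)
       (False, True): 0.9, (True, True): 0.99}
P_H = {(False, False): 0.2, (True, False): 0.6,   # P(H | w, e)
       (False, True): 0.8, (True, True): 0.99}

def val(p_true, truth):
    """P(X = truth), given P(X = True)."""
    return p_true if truth else 1.0 - p_true

# For each value of the query W, sum out the hidden variables l, s, e,
# with the evidence fixed at H = True.
unnormalized = {}
for w in (True, False):
    total = 0.0
    for l, s, e in product((True, False), repeat=3):
        total += (val(P_L, l) * val(P_S, s) * val(P_E[(l, s)], e)
                  * val(P_W[l], w) * val(P_H[(w, e)], True))
    unnormalized[w] = total

alpha = 1.0 / sum(unnormalized.values())
p_w_given_h = {w: alpha * t for w, t in unnormalized.items()}
print(round(unnormalized[True], 7))   # 0.2516493
print(round(unnormalized[False], 6))  # 0.328878
print(round(p_w_given_h[True], 4))    # 0.4335
```

Each term of the inner sum is one row of the tables above, so the two sums and the normalized answer match the slide's values.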
Pathfinder
- One node for each disease; assumes the diseases are mutually exclusive and exhaustive
- A large domain is hard to handle: several small networks for individual diagnostic tasks were built separately, then combined into a single large network

Pathfinder
- Testing the network: 53 test cases (real diagnoses)
- Diagnostic accuracy as good as a medical expert's

Assumptions
- Learning agent
- Environment: fully observable / partially observable; deterministic / strategic / stochastic; sequential; static / semi-dynamic; discrete / continuous; single agent / multi-agent

Assumptions Updated
- We can handle a new combination!
- Fully observable & deterministic: no uncertainty (map of Romania)
- Fully observable & stochastic: games of chance (Monopoly, Backgammon)
- Partially observable & deterministic: logic (Wumpus World)
- Partially observable & stochastic: probabilistic reasoning (this lecture)