Lec13-BayesNet
Bayesian Networks
• Central question: what is the likelihood of X given evidence E, i.e. P(X|E) = ?

Issues
• Representational power – allows for unknown, uncertain information
• Inference – question: what is the probability of X if E is true? Processing is, in general, exponential
• Acquisition or learning – network structure: human input; probabilities: data + learning

Bayesian Network
• Directed acyclic graph
• Nodes are RVs
• Edges denote dependencies
• Root nodes = nodes without predecessors – each has a prior probability table
• Non-root nodes – conditional probabilities given all predecessors

Bayes Net Example: Structure
• Five nodes: Burglary → Alarm ← Earthquake; Alarm → John Calls; Alarm → Mary Calls

Probabilities
• Structure dictates what probabilities are needed:
• Priors: P(B) = .001, P(-B) = .999; P(E) = .002, P(-E) = .998
• Alarm: P(A|B&E) = .95, P(A|B&-E) = .94, P(A|-B&E) = .29, P(A|-B&-E) = .001
• Calls: P(JC|A) = .90, P(JC|-A) = .05; P(MC|A) = .70, P(MC|-A) = .01

Joint Probability Yields All
• Event = fully specified values for the RVs.
• Probability of an event: P(x1,x2,...,xn) = P(x1|Parents(X1)) * ... * P(xn|Parents(Xn))
• E.g. P(j&m&a&-b&-e) = P(j|a)*P(m|a)*P(a|-b&-e)*P(-b)*P(-e) = .9*.7*.001*.999*.998 ≈ .00062
• Do this for all events and then sum as needed (see the enumeration sketch at the end of this section).
• Yields the exact probability (assuming the tables are right).

Many Questions
• With 5 boolean variables, the joint distribution has 2^5 = 32 entries, one for each event.
• A query corresponds to the sum of a subset of these entries.
• Hence 2^(2^5) possible queries – about 4 billion.

Probability Calculation Cost
• The full joint over 5 boolean variables needs 2^5 entries; in general, 2^n entries for n booleans.
• A Bayes net only needs tables for the conditional probabilities and the priors.
• With at most k inputs per node and n RVs, at most n*2^k table entries are needed.
• Both data and computation are reduced.

Example Computation
• Method: transform the query until every term matches a stored table (bold on the slide = stored in a table).
• P(Burglary|Alarm) = P(B|A) = P(A|B)*P(B) / P(A)
• P(A|B) = P(A|B,E)*P(E) + P(A|B,-E)*P(-E), since B and E are independent; P(A) expands the same way over B and E.
• Done – plug and chug.

Query Types
• Diagnostic, from effects to causes – P(Burglary | JohnCalls)
• Causal, from causes to effects – P(JohnCalls | Burglary)
• Explaining away, multiple causes for one effect – P(Burglary | Alarm and Earthquake)
• Everything else

Approximate Inference
• Simple sampling: the logic sample.
• Use the Bayes network as a generative model.
• E.g. generate a million or more models, assigning variables in topological order.
• This generates examples with the appropriate distribution.
• Now use the examples to estimate probabilities.

Logic Sampling: Simulation
• Query: P(j&m&a&-b&-e)
• Topologically sort the variables, i.e. any order that preserves the partial order – e.g. B, E, A, MC, JC.
• Use the probability tables, in that order, to set values:
  – E.g. P(B=t) = .001 => create a world with B true once in a thousand times.
  – Use the values of B and E to set A, then MC and JC.
• One run of a million samples yielded .000606 rather than .00062.
• Generally a huge number of simulations is needed for small probabilities (see the sampling sketch at the end of this section).

Sampling -> Probabilities
• Generate examples with the proper probability density.
• Use the ordering of the nodes to construct events.
• Finally, count to yield an estimate of the exact probability.

Sensitivity Analysis: Confidence of the Estimate
• Given n examples of which k are heads, how many examples are needed to be 99% certain that k/n is within .01 of the true p?
• From statistics: mean = np, variance = npq.
• For high confidence, t = 3.25 (from a table).
• Requiring 3.25*sqrt(pq/N) < .01 gives N > 3.25^2 * pq / .0001, i.e. N > roughly 26,000 in the worst case pq = 1/4.
• But correct probabilities are often not needed, just the correct ordering.
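To make "compute the joint, then sum" concrete, here is a minimal sketch in Python. The language, the names (prob, joint, query), and the dictionary layout are illustrative choices, not from the slides; the numbers are the tables above. It reproduces the event probability (the slides round to .00062) and answers the diagnostic query P(B|A) from the Example Computation slide by brute-force enumeration of all 2^5 events:

```python
from itertools import product

# CPTs from the Probabilities slide; each P_*_true gives P(X = true | parents).
P_B_true = 0.001
P_E_true = 0.002
P_A_true = {(True, True): 0.95, (True, False): 0.94,   # keyed by (B, E)
            (False, True): 0.29, (False, False): 0.001}
P_J_true = {True: 0.90, False: 0.05}                   # keyed by A
P_M_true = {True: 0.70, False: 0.01}                   # keyed by A

def prob(p_true, value):
    """P(X = value), given P(X = true)."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """Chain rule: P(b,e,a,j,m) = P(b) P(e) P(a|b,e) P(j|a) P(m|a)."""
    return (prob(P_B_true, b) * prob(P_E_true, e) *
            prob(P_A_true[(b, e)], a) *
            prob(P_J_true[a], j) * prob(P_M_true[a], m))

def query(num, den):
    """P(num | den): sum the joint over all 32 events, then divide."""
    n_sum = d_sum = 0.0
    for world in product([False, True], repeat=5):
        p = joint(*world)
        if den(*world):
            d_sum += p
            if num(*world):
                n_sum += p
    return n_sum / d_sum

print(joint(b=False, e=False, a=True, j=True, m=True))  # 0.000628 (slides: .00062)
print(query(num=lambda b, e, a, j, m: b,                # P(Burglary | Alarm)
            den=lambda b, e, a, j, m: a))               # about 0.374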
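Logic sampling admits an equally short sketch: draw each variable in topological order (B, E, A, then the calls) and count matching worlds. Again the names (sample_world, estimate) and the seed are my own choices; the estimate wobbles from run to run, as the slide warns:

```python
import random

# Same CPTs as above, from the Probabilities slide.
P_B_true, P_E_true = 0.001, 0.002
P_A_true = {(True, True): 0.95, (True, False): 0.94,
            (False, True): 0.29, (False, False): 0.001}
P_J_true = {True: 0.90, False: 0.05}
P_M_true = {True: 0.70, False: 0.01}

def sample_world(rng):
    """One world, set in topological order: B, E, then A, then JC, MC."""
    b = rng.random() < P_B_true           # B comes up true ~1 in 1000 worlds
    e = rng.random() < P_E_true
    a = rng.random() < P_A_true[(b, e)]   # A depends only on B and E
    j = rng.random() < P_J_true[a]        # the calls depend only on A
    m = rng.random() < P_M_true[a]
    return (b, e, a, j, m)

def estimate(n=1_000_000, seed=0):
    """Estimate P(j & m & a & -b & -e) by counting matching samples."""
    rng = random.Random(seed)
    target = (False, False, True, True, True)
    hits = sum(sample_world(rng) == target for _ in range(n))
    return hits / n

print(estimate())  # near .00062, but only ~600 hits in 10^6 samples: noisy
```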
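And the sample-size bound from the Sensitivity Analysis slide, solved for N; the function name and default arguments are illustrative, with the slide's t and accuracy as defaults:

```python
from math import ceil

def samples_needed(t=3.25, eps=0.01, pq=0.25):
    """Smallest N with t*sqrt(pq/N) < eps, i.e. N > t^2 * pq / eps^2.
    pq = p*(1-p) peaks at 1/4, so pq = 0.25 is the worst case."""
    return ceil(t * t * pq / (eps * eps))

print(samples_needed())  # 26407 samples in the worst case
```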
Lymphoma Diagnosis: the PathFinder Systems
• 60 diseases, 130 features.
• PathFinder I: rule based – performance OK.
• PathFinder II: used MYCIN-style confidence factors – better.
• PathFinder III: a Bayes net – best so far.
• PathFinder IV: a better Bayes net, adding utility theory
  – outperformed the experts
  – solved the combination-of-expertise problem

Summary
• Bayes nets are easier to construct than rule-based expert systems
  – years for the rules, days for the random variables and structure.
• Probability theory provides a sound basis for decisions
  – though getting correct probabilities is still a problem.
• Many diagnostic applications.
• Explanation is less clear: use the strong influences.