Probabilistic Inference

Reading: Chapter 13

Next time: How should we define artificial intelligence?
Reading for next time (see Links, Reading for Retrospective Class):
* Turing paper
* Mind, Brain and Behavior, John Searle
Prepare discussion points by midnight Wednesday (see end of slides).

Transition to Empirical AI
Add in:
* Ability to infer new facts from old
* Ability to generalize
* Ability to learn based on past observation
Key:
* Observation of the world
* Best decision given what is known

Overview of Probabilistic Inference
* Some terminology
* Inference by enumeration
* Bayesian networks

Probability Basics
* Sample space
* Atomic event
* Probability model
* An event A

Random Variables
* Random variable
* Probability for a random variable

Logical Propositions and Probability
* Proposition = event (set of sample points)
* Given Boolean random variables A and B:
  * Event a = set of sample points where A(ω) = true
  * Event ¬a = set of sample points where A(ω) = false
  * Event a∧b = set of sample points where A(ω) = true and B(ω) = true
* Often the sample space is the Cartesian product of the ranges of the variables.
* A proposition is the disjunction of the atomic events in which it is true:
  (a∨b) = (¬a∧b) ∨ (a∧¬b) ∨ (a∧b)
  P(a∨b) = P(¬a∧b) + P(a∧¬b) + P(a∧b)

Axioms of Probability
* All probabilities are between 0 and 1.
* Necessarily true propositions have probability 1.
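The disjunction identity above can be checked directly by treating propositions as events, i.e., as sets of sample points; a minimal sketch (the four sample points and their probabilities are made up for illustration):

```python
# Toy sample space for two Boolean variables A and B.
# Each atomic event (sample point) carries a probability; values are made up.
P = {
    ("a", "b"):   0.3,   # A=true,  B=true
    ("a", "-b"):  0.2,   # A=true,  B=false
    ("-a", "b"):  0.4,   # A=false, B=true
    ("-a", "-b"): 0.1,   # A=false, B=false
}

def prob(event):
    """Probability of a proposition = sum over the sample points where it holds."""
    return sum(p for point, p in P.items() if event(point))

a = lambda w: w[0] == "a"
b = lambda w: w[1] == "b"
a_or_b = lambda w: a(w) or b(w)

# (a v b) = (-a ^ b) v (a ^ -b) v (a ^ b), so the probabilities add up:
assert abs(prob(a_or_b) - (0.4 + 0.2 + 0.3)) < 1e-9

# Inclusion-exclusion: P(a v b) = P(a) + P(b) - P(a ^ b)
a_and_b = lambda w: a(w) and b(w)
assert abs(prob(a_or_b) - (prob(a) + prob(b) - prob(a_and_b))) < 1e-9
```

Any assignment of non-negative weights summing to 1 would satisfy these identities; they follow from the axioms, not from the particular numbers.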
* Necessarily false propositions have probability 0.
* The probability of a disjunction is P(a∨b) = P(a) + P(b) − P(a∧b).
* P(¬a) = 1 − P(a)

The axioms imply that logically related events must have related probabilities, e.g., P(a∨b) = P(a) + P(b) − P(a∧b).

Prior Probability
* Prior or unconditional probabilities of propositions: P(Female = true) = 0.5 corresponds to belief prior to the arrival of any evidence.
* A probability distribution gives values for all possible assignments: for Color ranging over (green, blue, purple), P(Color) = ⟨0.6, 0.3, 0.1⟩ (normalized: sums to 1).
* A joint probability distribution for a set of random variables gives the probability of every atomic event on those variables (i.e., every sample point): P(Color, Gender) = a 3×2 matrix.

Inference by Enumeration
* Start with the joint distribution.
* Marginal probability: P(HasTeeth) = 0.06 + 0.12 + 0.02 = 0.2
* Disjunction: P(HasTeeth ∨ Color = green) = 0.06 + 0.12 + 0.02 + 0.24 = 0.44

Conditional Probability
* Conditional or posterior probabilities, e.g.:
  P(PlayerWins | HostOpensDoor = 1, PlayerPicksDoor = 2, Door1 = goat) = 0.5
* If we know more (e.g., HostOpensDoor = 3 and Door3 = goat): P(PlayerWins) = 1
* Note: the less specific belief remains valid after more evidence arrives, but is not always useful.
* New evidence may be irrelevant, allowing simplification:
  P(PlayerWins | CaliforniaEarthquake) = P(PlayerWins) = 0.3
* A general version holds for joint distributions:
  P(PlayerWins, HostOpensDoor1) = P(PlayerWins | HostOpensDoor1) × P(HostOpensDoor1)

Inference by Enumeration (continued)
* Compute conditional probabilities:
  P(¬HasTeeth | Color = green) = P(¬HasTeeth ∧ Color = green) / P(Color = green)
  = 0.24 / (0.06 + 0.24) = 0.8

Normalization
* The denominator can be viewed as a normalization constant α:
  P(HasTeeth | Color = green) = α P(HasTeeth, Color = green)
  = α [P(HasTeeth, Color = green, Female) + P(HasTeeth, Color = green, ¬Female)]
  = α [⟨0.03, 0.12⟩ + ⟨0.03, 0.12⟩] = α ⟨0.06, 0.24⟩ = ⟨0.2, 0.8⟩
* General idea: compute the distribution on the query variable by fixing the evidence variables and summing over the hidden variables.

Independence
* A and B are independent iff P(A|B) = P(A) or P(B|A) = P(B) or P(A, B) = P(A)P(B).
* 32 entries reduced to 12; for n independent biased coins, 2^n → n.
* Absolute independence is powerful but rare: real domains may have hundreds of variables, none of which are independent.

Conditional Independence
* If my length is ≤ 0.2, the probability that I am female doesn't depend on whether or not I have teeth:
  P(Female | Length ≤ 0.2, HasTeeth) = P(Female | Length ≤ 0.2)
* The same independence holds if my length is > 0.2:
  P(Male | Length > 0.2, HasTeeth) = P(Male | Length > 0.2)
* Gender is conditionally independent of HasTeeth given Length.
* In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n.
* Conditional independence is our most basic and robust form of knowledge about uncertain environments.

Next Class: Turing Paper
* A discussion class.
* Graduate and non-degree students (anyone beyond a bachelor's): prepare a short statement on the paper. It can be your reaction, your position, a place where you disagree, or an explication of a point.
* Undergraduates: be prepared with questions for the graduate students.
* All: submit your statement or your question by midnight Wednesday night. All statements and questions will be printed and distributed in class on Wednesday.
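The enumeration, conditional-probability, and normalization steps worked through above can be sketched in Python. The green-column entries of the joint come from the slides; the blue and purple gender splits are hypothetical fill-ins chosen only so the table sums to 1 while matching the quoted marginals:

```python
# Inference by enumeration over a 2x3x2 joint P(HasTeeth, Color, Gender).
# Green-column values are from the slides; the blue/purple gender splits
# are hypothetical fill-ins that preserve the quoted totals.
joint = {
    # (hasteeth, color, female): probability
    (True,  "green",  True):  0.03, (True,  "green",  False): 0.03,
    (False, "green",  True):  0.12, (False, "green",  False): 0.12,
    (True,  "blue",   True):  0.06, (True,  "blue",   False): 0.06,  # hypothetical split
    (False, "blue",   True):  0.20, (False, "blue",   False): 0.20,  # hypothetical split
    (True,  "purple", True):  0.01, (True,  "purple", False): 0.01,  # hypothetical split
    (False, "purple", True):  0.08, (False, "purple", False): 0.08,  # hypothetical split
}

def p(pred):
    """Sum the probabilities of the atomic events where the proposition holds."""
    return sum(q for event, q in joint.items() if pred(*event))

# Marginal: P(HasTeeth) = 0.06 + 0.12 + 0.02 = 0.2
print(round(p(lambda t, c, f: t), 2))                      # 0.2

# Conditional: P(-HasTeeth | Color=green) = 0.24 / (0.06 + 0.24) = 0.8
print(round(p(lambda t, c, f: not t and c == "green")
            / p(lambda t, c, f: c == "green"), 2))         # 0.8

# Normalization: distribution over HasTeeth given Color=green,
# summing out the hidden variable Gender, then rescaling by alpha.
unnorm = [p(lambda t, c, f, v=v: t == v and c == "green") for v in (True, False)]
alpha = 1 / sum(unnorm)
print([round(alpha * u, 2) for u in unnorm])               # [0.2, 0.8]
```

The same `p` helper answers any query against the joint, which is exactly why enumeration is conceptually simple but exponential in the number of variables.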