CSE 321 Discrete Structures, Winter 2008
Lecture 19: Probability Theory

Announcements
• Readings
  – Probability theory
    • 6.1, 6.2 (5.1, 5.2): Probability theory
    • 6.3 (new material!): Bayes' theorem
    • 6.4 (5.3): Expectation
  – Advanced counting techniques, Ch. 7: not covered

Discrete Probability
Experiment: a procedure that yields an outcome
Sample space: the set of all possible outcomes
Event: a subset of the sample space
If S is a sample space of equally likely outcomes and E is an event, the probability of E is p(E) = |E| / |S|.

Example: Dice

Example: Poker
Probability of four of a kind

Combinations of Events
E^c is the complement of E: p(E^c) = 1 - p(E)
p(E_1 ∪ E_2) = p(E_1) + p(E_2) - p(E_1 ∩ E_2)

Probability Concepts
• Probability distribution
• Conditional probability
• Independence
• Bernoulli trials / binomial distribution
• Random variables

Discrete Probability Theory
• Set S
• Probability distribution p : S → [0, 1]
  – For s ∈ S, 0 ≤ p(s) ≤ 1
  – Σ_{s ∈ S} p(s) = 1
• Event E, where E ⊆ S
• p(E) = Σ_{s ∈ E} p(s)

Examples

Conditional Probability
Let E and F be events with p(F) > 0. The conditional probability of E given F, written p(E | F), is defined as
  p(E | F) = p(E ∩ F) / p(F).

Examples

Independence
The events E and F are independent if and only if p(E ∩ F) = p(E) p(F).
Equivalently, when p(F) > 0, E and F are independent if and only if p(E | F) = p(E).

Are these independent?
• Flip a coin three times
  – E: the first coin is a head
  – F: the second coin is a head
• Roll two dice
  – E: the sum of the two dice is 5
  – F: the first die is a 1
• Roll two dice
  – E: the sum of the two dice is 7
  – F: the first die is a 1
• Deal two five-card poker hands
  – E: hand one has four of a kind
  – F: hand two has four of a kind

Bernoulli Trials and the Binomial Distribution
• Bernoulli trial: success probability p, failure probability q = 1 - p
The probability of exactly k successes in n independent Bernoulli trials is
  C(n, k) p^k q^(n-k).

Random Variables
A random variable is a function from a sample space to the real numbers.

Bayes' Theorem
Suppose that E and F are events from a sample space S such that p(E) > 0 and p(F) > 0. Then
  p(F | E) = p(E | F) p(F) / [ p(E | F) p(F) + p(E | F^c) p(F^c) ].

False Positives, False Negatives
Let D be the event that a person has the disease.
Let Y be the event that a person tests positive for the disease.

Testing for Disease
The disease is very rare: p(D) = 1/100,000.
The test is accurate:
• False negative rate: 1%
• False positive rate: 0.5%
Suppose you get a positive result. What do you conclude? Compute p(D | Y).

Spam Filtering
A sample message:
  From: Zambia Nation Farmers Union [[email protected]]
  Subject: Letter of assistance for school installation
  To: Richard Anderson
  Dear Richard, I hope you are fine, Iam through talking to local headmen about the possible assistance of school installation. the idea is and will be welcome. I trust that you will do your best as i await for more from you. Once again Thanking you very much Sebastian Mazuba.

Bayesian Spam Filters
• Classification domain
  – Cost of a false negative
  – Cost of a false positive
• Criteria for spam
  – v1agra, ONE HUNDRED MILLION USD
• Basic question: given an email message, based on the spam criteria, what is the probability that it is spam?

Email Message with the Phrase "Account Review"
• 250 of 20,000 messages known to be spam contain the phrase
• 5 of 10,000 messages known not to be spam contain the phrase
• Assuming 50% of messages are spam, what is the probability that a message containing "Account Review" is spam?

Proving Bayes' Theorem
By definition, p(F | E) = p(E ∩ F) / p(E). Write the numerator as p(E | F) p(F), and expand the denominator using the law of total probability: p(E) = p(E | F) p(F) + p(E | F^c) p(F^c).
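With the theorem in hand, the two questions above (testing for disease, the "Account Review" message) reduce to plugging in the slides' numbers. The short Python sketch below is not part of the lecture; the helper name bayes_posterior is mine.

def bayes_posterior(p_e_given_f, p_f, p_e_given_not_f):
    # p(F | E) from the prior p(F) and the likelihoods p(E | F), p(E | F^c).
    numerator = p_e_given_f * p_f
    return numerator / (numerator + p_e_given_not_f * (1.0 - p_f))

# Disease testing: p(D) = 1/100,000; a 1% false negative rate means
# p(Y | D) = 0.99, and a 0.5% false positive rate means p(Y | D^c) = 0.005.
print(bayes_posterior(0.99, 1 / 100_000, 0.005))        # ~0.0020

# "Account Review": p(phrase | spam) = 250/20,000,
# p(phrase | not spam) = 5/10,000, prior p(spam) = 0.5.
print(bayes_posterior(250 / 20_000, 0.5, 5 / 10_000))   # ~0.9615

The disease example is the classic base-rate surprise: even with an accurate test, a positive result leaves the probability of disease below 0.2%, because the prior p(D) is so small.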
Expectation
The expected value of a random variable X(s) on sample space S is
  E(X) = Σ_{s ∈ S} X(s) p(s).

Flip a coin until the first head. What is the expected number of flips?

Probability Space
The outcome "first head on flip i" has probability (1/2)^i, and X = i on that outcome.
Computing the expectation:
  E(X) = Σ_{i ≥ 1} i (1/2)^i = 2.

Linearity of Expectation
E(X_1 + X_2) = E(X_1) + E(X_2)
E(aX) = a E(X)

Hashing
H : M → [0..n-1]
• If k elements have been hashed to random locations, what is the expected number of elements in bucket j?
• What is the expected number of collisions when hashing k elements to random locations?

Hashing Analysis
Sample space: [0..n-1] × [0..n-1] × ... × [0..n-1]
Random variables:
• X_j = number of elements hashed to bucket j
• C = total number of collisions
• B_ij = 1 if element i is hashed to bucket j; B_ij = 0 otherwise
• C_ab = 1 if elements a and b are hashed to the same bucket; C_ab = 0 otherwise

Counting Inversions
Let p_1, p_2, ..., p_n be a permutation of 1..n.
The pair (p_i, p_j) is an inversion if i < j and p_i > p_j.
Examples:
  4, 2, 5, 1, 3
  1, 6, 4, 3, 2, 5
  7, 6, 5, 4, 3, 2, 1
What is the expected number of inversions in a random permutation?

Insertion Sort
Example array: 4, 2, 5, 1, 3

for i := 1 to n-1 {
  j := i;
  while (j > 0 and A[j-1] > A[j]) {
    swap(A[j-1], A[j]);
    j := j - 1;
  }
}

What is the expected number of swaps for insertion sort?

Left-to-Right Maxima
max_so_far := A[0];
for i := 1 to n-1
  if (A[i] > max_so_far)
    max_so_far := A[i];

Example: 5, 2, 9, 14, 11, 18, 7, 16, 1, 20, 3, 19, 10, 15, 4, 6, 17, 12, 8
What is the expected number of left-to-right maxima in a random permutation?
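All of these questions yield to indicator random variables plus linearity of expectation, and the standard answers are E(X_j) = k/n, E(C) = C(k, 2)/n, expected inversions n(n-1)/4, and expected left-to-right maxima H_n = 1 + 1/2 + ... + 1/n. The expected number of insertion-sort swaps equals the expected number of inversions, since each adjacent swap removes exactly one inversion. The Python simulation below is a sketch of mine (not from the lecture) comparing empirical averages against those closed forms.

import random
from itertools import combinations

def one_trial(n, k):
    # Hash k elements into n random buckets, and draw one random permutation of 1..n.
    buckets = [random.randrange(n) for _ in range(k)]
    collisions = sum(1 for a, b in combinations(range(k), 2) if buckets[a] == buckets[b])
    perm = list(range(1, n + 1))
    random.shuffle(perm)
    inversions = sum(1 for i, j in combinations(range(n), 2) if perm[i] > perm[j])
    maxima = sum(1 for i in range(n) if perm[i] == max(perm[:i + 1]))
    return collisions, inversions, maxima

n, k, trials = 10, 5, 20_000
totals = [0, 0, 0]
for _ in range(trials):
    for slot, value in enumerate(one_trial(n, k)):
        totals[slot] += value

print("collisions:", totals[0] / trials, "expected:", k * (k - 1) / 2 / n)  # C(k,2)/n = 1.0
print("inversions:", totals[1] / trials, "expected:", n * (n - 1) / 4)      # 22.5
print("maxima:    ", totals[2] / trials, "expected:", sum(1 / i for i in range(1, n + 1)))  # H_10 ≈ 2.93

The point of linearity of expectation is visible in the code: each quantity is a sum of 0/1 indicators (C_ab for collisions, one indicator per pair for inversions, one per position for maxima), so its expectation is the sum of the individual indicator probabilities, and no independence between the indicators is needed.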