Chapters 13 and 14
• Probability and counting

Birthday Problem
What is the smallest number of people you need in a group so that the probability of 2 or more people having the same birthday is greater than 1/2?
Answer: 23

No. of people    23     30     40     60
Probability      .507   .706   .891   .994

We will solve this problem a few slides later using the laws of probability.

Probability
• Formal study of uncertainty
• The engine that drives Statistics
• Primary objectives:
  1. Use the rules of probability to calculate appropriate measures of uncertainty.
  2. Learn the probability basics so that we can do Statistical Inference.

Introduction
Nothing in life is certain. We gauge the chances of successful outcomes in business, medicine, weather, and other everyday situations such as the lottery or the birthday problem.

Randomness and probability
Randomness ≠ chaos
A phenomenon is random if individual outcomes are uncertain, but there is nonetheless a regular distribution of outcomes in a large number of repetitions.

Coin toss
The result of any single coin toss is random. But the result over many tosses is predictable, as long as the trials are independent (i.e., the outcome of a new coin flip is not influenced by the result of the previous flip).
The probability of heads is 0.5 = the proportion of times you get heads in many repeated trials.
[Figure: first series of tosses; second series]

Approaches to Probability
1. Relative frequency
   event probability = x/n, where x = # of occurrences of the event of interest and n = total # of observations
   Examples: coin, die tossing; nuclear power plants?
   Limitation: repeated observations are not always practical

Approaches to Probability (cont.)
2. Subjective probability
   An individual assigns a probability based on personal experience, anecdotal evidence, etc.
3. Classical approach
   Every possible outcome has equal probability (more later)

Basic Definitions
Experiment: an act or process that leads to a single outcome that cannot be predicted with certainty
Examples: 1. Toss a coin 2.
Draw 1 card from a standard deck of cards 3. Arrival time of flight from Atlanta to RDU

Basic Definitions (cont.)
Sample space: all possible outcomes of an experiment. Denoted by S
Event: any subset of the sample space S; typically denoted A, B, C, etc.
Null event: the empty set ∅
Certain event: S

Examples
1. Toss a coin once
   S = {H, T}; A = {H}, B = {T}
2. Toss a die once; count dots on upper face
   S = {1, 2, 3, 4, 5, 6}
   A = even # of dots on upper face = {2, 4, 6}
   B = 3 or fewer dots on upper face = {1, 2, 3}
3. Select 1 card from a …

Laws of Probability
1. 0 ≤ P(A) ≤ 1, for any event A
2. P(∅) = 0, P(S) = 1

Probability rules (cont'd)
Coin Toss Example: S = {Head, Tail}
Probability of heads = 0.5
Probability of tails = 0.5
3. The complement of any event A is the event that A does not occur, written A'. The complement rule states that the probability of an event not occurring is 1 minus the probability that it does occur.
   P(not A) = P(A') = 1 − P(A)
   Tail' = not Tail = Head
   P(Tail') = 1 − P(Tail) = 0.5
Venn diagram: the sample space is made up of an event A and its complement A', i.e., everything that is not A.

Birthday Problem
What is the smallest number of people you need in a group so that the probability of 2 or more people having the same birthday is greater than 1/2?
Answer: 23

No. of people    23     30     40     60
Probability      .507   .706   .891   .994

Example: Birthday Problem
A = {at least 2 people in the group have a common birthday}
A' = {no one has a common birthday}
3 people:  P(A') = (364/365) × (363/365)
23 people: P(A') = (364/365) × (363/365) × … × (343/365) ≈ .493
so P(A) = 1 − P(A') ≈ 1 − .493 = .507

Unions: ∪, or
Intersections: ∩, and

Mutually Exclusive (Disjoint) Events
Mutually exclusive or disjoint events: no outcomes from S in common
Venn diagrams: A and B disjoint (A ∩ B = ∅); A and B not disjoint

Addition Rule for Disjoint Events
4. If A and B are disjoint events, then P(A or B) = P(A) + P(B)

Laws of Probability (cont.)
General Addition Rule
5.
For any two events A and B, P(A or B) = P(A) + P(B) − P(A and B)

General Addition Rule
For any two events A and B, P(A or B) = P(A) + P(B) − P(A and B)
Venn diagram example: P(A) = 6/13, P(B) = 5/13, P(A and B) = 3/13, so
P(A or B) = 6/13 + 5/13 − 3/13 = 8/13

Laws of Probability (cont.)
Multiplication Rule
6. For two independent events A and B, P(A and B) = P(A) × P(B)
Note: assuming events are independent doesn't make it true.

Multiplication Rule
The probability that you encounter a green light at the corner of Dan Allen and Hillsborough is 0.35, a yellow light 0.04, and a red light 0.61. What is the probability that you encounter a red light on both Monday and Tuesday?
It's reasonable to assume that the color of the light you encounter on Monday is independent of the color on Tuesday. So
P(red on Monday and red on Tuesday) = P(red on Monday) × P(red on Tuesday) = 0.61 × 0.61 = 0.3721

Laws of Probability: Summary
1. 0 ≤ P(A) ≤ 1 for any event A
2. P(∅) = 0, P(S) = 1
3. P(A') = 1 − P(A)
4. If A and B are disjoint events, then P(A or B) = P(A) + P(B)
5. For any two events A and B, P(A or B) = P(A) + P(B) − P(A and B)
6. For two independent events A and B, P(A and B) = P(A) × P(B)

M&M candies
If you draw an M&M candy at random from a bag, the candy will have one of six colors. The probability of drawing each color depends on the proportions manufactured, as described here:

Color         Brown   Red   Yellow   Green   Orange   Blue
Probability   0.3     0.2   0.2      0.1     0.1      ?

What is the probability that an M&M chosen at random is blue?
S = {brown, red, yellow, green, orange, blue}
P(S) = P(brown) + P(red) + P(yellow) + P(green) + P(orange) + P(blue) = 1
P(blue) = 1 − [P(brown) + P(red) + P(yellow) + P(green) + P(orange)]
        = 1 − [0.3 + 0.2 + 0.2 + 0.1 + 0.1] = 0.1

What is the probability that a random M&M is any of red, yellow, or orange?
P(red or yellow or orange) = P(red) + P(yellow) + P(orange) = 0.2 + 0.2 + 0.1 = 0.5

Example: toss a fair die once
S = {1, 2, 3, 4, 5, 6}
A = even # appears = {2, 4, 6}
B = 3 or fewer = {1, 2, 3}
P(A or B) = P(A) + P(B) − P(A and B)
          = P({2, 4, 6}) + P({1, 2, 3}) − P({2})
          = 3/6 + 3/6 − 1/6 = 5/6

Example: college students
Suppose 56% of all students live on campus, 62% of all students purchase a campus meal plan, and 42% do both.
Question: what is the probability that a randomly selected student either lives OR eats on campus?
L = {student lives on campus}
M = {student purchases a meal plan}
P(a student either lives or eats on campus) = P(L or M) = P(L) + P(M) − P(L and M) = 0.56 + 0.62 − 0.42 = 0.76

THE RELATIONSHIP BETWEEN ODDS AND PROBABILITIES
• World Series Odds
• The odds at the above link are the odds against a team winning the World Series, though the author claims they're "odds for winning the World Series"
• Odds are frequently a source of confusion. Odds for? Odds against?
• From probability to odds
• From odds to probability

From Probability to Odds
If event A has probability P(A), then the odds in favor of A are P(A) to 1 − P(A). It follows that the odds against A are 1 − P(A) to P(A).
If the probability of an earthquake in California is .25, then the odds in favor of an earthquake are .25 to .75, or 1 to 3. The odds against an earthquake are .75 to .25, or 3 to 1.

From Odds to Probability
If the odds in favor of an event E are a to b, then P(E) = a/(a + b); in addition, P(E') = b/(a + b).
If the odds in favor of UNC winning the NCAA's are 3 (a) to 1 (b), then P(UNC wins) = 3/4 and P(UNC does not win) = 1/4.

Probability Models
The Equally Likely Approach (also called the Classical Approach)

Assigning Probabilities
If an experiment has N outcomes, then each outcome has probability 1/N of occurring.
If an event A1 has n1 outcomes, then P(A1) = n1/N.

Dice
You toss two dice. What is the probability of the outcomes summing to 5?
This is S: {(1,1), (1,2), (1,3), … etc.}
There are 36 possible outcomes in S, all equally likely (given fair dice). Thus, the probability of any one of them is 1/36.
P(the roll of two dice sums to 5) = P(1,4) + P(2,3) + P(3,2) + P(4,1) = 4/36 = 0.111

We Need Efficient Methods for Counting Outcomes

Product Rule for Ordered Pairs
A student wishes to commute to a junior college for 2 years and then commute to a state college for 2 years. Within commuting distance there are 4 junior colleges and 3 state colleges. How many junior college-state college pairs are available to her?

junior colleges: 1, 2, 3, 4
state colleges: a, b, c
possible pairs:
(1, a) (1, b) (1, c)
(2, a) (2, b) (2, c)
(3, a) (3, b) (3, c)
(4, a) (4, b) (4, c)
4 junior colleges, 3 state colleges: total number of possible pairs = 4 × 3 = 12

In general, if there are n1 ways to choose the first element of the pair and n2 ways to choose the second element, then the number of possible pairs is n1 × n2. Here n1 = 4, n2 = 3.

Counting in "Either-Or" Situations
• NCAA Basketball Tournament: how many ways can the "bracket" be filled out?
  1. How many games?
  2. 2 choices for each game
  3. Number of ways to fill out the bracket: 2^63 ≈ 9.2 × 10^18
• Earth pop. about 6 billion; even if everyone fills out 1 million different brackets, the chance of getting all games correct is about 1 in 1,000

Counting Example
Pollsters minimize lead-in effect by rearranging the order of the questions on a survey.
If Gallup has a 5-question survey, how many different versions of the survey are required if all possible arrangements of the questions are included?
Solution
There are 5 possible choices for the first question, 4 remaining questions for the second question, 3 choices for the third question, 2 choices for the fourth question, and 1 choice for the fifth question.
The number of possible arrangements is therefore 5 × 4 × 3 × 2 × 1 = 120

Efficient Methods for Counting Outcomes
Factorial Notation: n! = 1 × 2 × … × n
Examples: 1! = 1; 2! = 1 × 2 = 2; 3! = 1 × 2 × 3 = 6; 4! = 24; 5! = 120
Special definition: 0! = 1

Factorials with calculators and Excel
Calculator:
  non-graphing: x! (second function)
  graphing: bottom of p. 9, TI Calculator Commands (math button)
Excel: Insert function: Math and Trig category, FACT function

Factorial Examples
20! = 2.43 × 10^18
1,000,000 seconds? About 11.5 days
1,000,000,000 seconds? About 31 years
31 years ≈ 10^9 seconds
10^18 = 10^9 × 10^9
20! is roughly the age of the universe in seconds

Permutations
A B C D E
How many ways can we choose 2 letters from the above 5, without replacement, when the order in which we choose the letters is important?
5 × 4 = 20

Permutations (cont.)
5 × 4 = 5!/(5 − 2)! = 5!/3! = 20
Notation: 5P2 = 5!/(5 − 2)! = 20

Permutations with calculator and Excel
Calculator:
  non-graphing: nPr
  graphing: p. 9 of TI Calculator Commands (math button)
Excel: Insert function: Statistical, PERMUT

Combinations
A B C D E
How many ways can we choose 2 letters from the above 5, without replacement, when the order in which we choose the letters is not important?
5 × 4 = 20 when order is important
Divide by 2: (5 × 4)/2 = 10 ways

Combinations (cont.)
5C2 = 5!/((5 − 2)! 2!) = 5!/(3! 2!) = (5 × 4)/(1 × 2) = 20/2 = 10
nCr = n!/((n − r)! r!), also written as the binomial coefficient "n choose r"; for example, 5C2 is "5 choose 2"

ST 311 Powerball Lottery
From the numbers 1 through 20, choose 6 different numbers. Write them on a piece of paper.
And the numbers are ... 16 11 2 10 8 4

Chances of Winning?
Choose 6 numbers from 20, without replacement, order not important. Number of possibilities?
20C6 = 20!/((20 − 6)! 6!) = 38,760

Example: Illinois State Lottery
Choose 6 numbers from 54 numbers without replacement; order not important
54C6 = 54!/(48! 6!) = 25,827,165
(about 1 second in 10 months) (1200 ft^2 house, 16.5 million ping pong balls)

North Carolina Powerball Lottery
Prior to Jan. 1, 2009
  5 from 1-55: 55!/(5! 50!) = 3,478,761
  1 from 1-42 (p'ball #): 42!/(1! 41!) = 42
  3,478,761 × 42 = 146,107,962
After Jan. 1, 2009
  5 from 1-59: 59!/(5! 54!) = 5,006,386
  1 from 1-39 (p'ball #): 39!/(1! 38!) = 39
  5,006,386 × 39 = 195,249,054

The Forrest Gump Visualization of Your Lottery Chances
How large is 195,249,054?
$1 bill and $100 bill both 6" in length
10,560 bills = 1 mile
Let's start with 195,249,053 $1 bills and one $100 bill …
… and take a long walk, putting down bills end-to-end as we go
Raleigh to Ft. Lauderdale …
… still plenty of bills remaining, so continue from Ft. Lauderdale to San Diego …
… still plenty of bills remaining, so continue from San Diego to Seattle …
… still plenty of bills remaining, so continue from Seattle to New York …
… still plenty of bills remaining, so continue from New York back to Raleigh …
… still plenty of bills remaining, so … Go around again! Lay a second path of bills
Still have ~5,000 bills left!!

Chances of Winning NC Powerball Lottery?
Remember: one of the bills you put down is a $100 bill; all others are $1 bills.
Put on a blindfold and begin walking along the trail of bills.
Your chance of winning the lottery is the same as your chance of selecting the $100 bill if you stop at a random location along the trail and pick up a bill.

Virginia State Lottery
Pick 5: 50C5 = 50!/(45! 5!) = 2,118,760
2,118,760 × 25C1 = 2,118,760 × 25!/(24! 1!) = 2,118,760 × 25 = 52,969,000

Probability Trees
A Graphical Method for Complicated Probability Problems

Probability Tree Example: probability of playing professional baseball
6.1% of high school baseball players play college baseball. Of these, 9.4% will play professionally.
Unlike football and basketball, high school players can also go directly to professional baseball without playing in college. Studies have shown that, given that a high school player does not compete in college, the probability he plays professionally is .002.
Question 1: What is the probability that a high school baseball player ultimately plays professional baseball?
Question 2: Given that a high school baseball player played professionally, what is the probability he played in college?

Question 1: What is the probability that a high school baseball player ultimately plays professional baseball?
Probability tree:
HS BB Player
  Play coll (.061)
    Play prof. (.094): .061 × .094 = .005734
    Does not play prof. (.906)
  Does not play coll (.939)
    Play prof. (.002): .939 × .002 = .001878
    Does not play prof. (.998)
P(hs bb player plays professionally) = .061 × .094 + .939 × .002 = .005734 + .001878 = .007612

Question 2: Given that a high school baseball player played professionally, what is the probability he played in college?
From the same tree:
P(hs bb player plays professionally) = .005734 + .001878 = .007612
P(played in college given that played professionally) = .005734/.007612 = .7533

Example: AIDS Testing
V = {person has HIV}; CDC: Pr(V) = .006
P: test outcome is positive (test indicates HIV present)
N: test outcome is negative
Clinical reliabilities for a new HIV test:
1. If a person has the virus, the test result will be positive with probability .999
2. If a person does not have the virus, the test result will be negative with probability .990

Question 1
What is the probability that a randomly selected person will test positive?

Probability Tree Approach
A probability tree is a useful way to visualize this problem and to find the desired probability.
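
The "multiply along the branches, then add the paths" recipe used in both tree examples can be sketched in a few lines of Python. This is only an illustrative sketch: the function name two_stage_tree and its parameter names are made up here and do not come from the slides. It returns the total probability of the event of interest and the share of that total contributed by the first branch.

```python
def two_stage_tree(p_first, p_event_given_first, p_event_given_not_first):
    """Return (P(event), P(first | event)) for a two-stage probability tree.

    Hypothetical helper for illustration only.
    p_first: probability of the first branch (e.g. person has the virus).
    p_event_given_first / p_event_given_not_first: conditional probability
    of the event of interest (e.g. a positive test) on each branch.
    """
    # Multiply along each path that ends in the event of interest.
    path_first = p_first * p_event_given_first
    path_not_first = (1 - p_first) * p_event_given_not_first
    # Add the two paths to get the total probability of the event.
    p_event = path_first + path_not_first
    # Share of that total that comes from the first branch.
    return p_event, path_first / p_event


# HIV test: Pr(V) = .006, Pr(P | V) = .999, Pr(P | not V) = 1 - .990
p_positive, p_hiv_given_positive = two_stage_tree(0.006, 0.999, 1 - 0.990)

# Baseball: 6.1% play in college; .094 and .002 then play professionally
p_pro, p_college_given_pro = two_stage_tree(0.061, 0.094, 0.002)

print(round(p_positive, 5), round(p_hiv_given_positive, 3))
print(round(p_pro, 6), round(p_college_given_pro, 4))
```

The second returned value is the conditional probability asked for in Question 2 of each example: the path through the first branch divided by the sum of all paths that end in the event.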
Probability Tree
[Figure: probability tree for the HIV test; the two branches labeled "clinical reliability" carry the probabilities .999 and .990. Multiply branch probabilities along each path.]

Question 1: What is the probability that a randomly selected person will test positive?
Pr(P) = (.006 × .999) + (.994 × .010) = .00599 + .00994 = .01593

Question 2
If your test comes back positive, what is the probability that you have HIV?
(Remember: we know that if a person has the virus, the test result will be positive with probability .999; if a person does not have the virus, the test result will be negative with probability .990.)
Looks very reliable

Question 2: If your test comes back positive, what is the probability that you have HIV?
Pr(P) = .00599 + .00994 = .01593
P(have HIV given that test is positive) = .00599/(.00599 + .00994) = .376

Summary
Question 1: Pr(P) = .00599 + .00994 = .01593
Question 2: two sequences of branches lead to a positive test; only 1 sequence represents people who have HIV.
Pr(person has HIV given that test is positive) = .00599/(.00599 + .00994) = .376

Recap
We have a test with very high clinical reliabilities:
1. If a person has the virus, the test result will be positive with probability .999
2. If a person does not have the virus, the test result will be negative with probability .990
But we have extremely poor performance when the test is positive:
Pr(person has HIV given that test is positive) = .376
In other words, 62.4% of the positives are false positives! Why? When the characteristic the test is looking for is rare, most positives will be false.

Examples
1. P(A) = .3, P(B) = .4; if A and B are mutually exclusive events, then P(A ∩ B) = ?
   A ∩ B = ∅, so P(A ∩ B) = 0
2. 15 entries in pie baking contest at state fair. Judge must determine 1st, 2nd, 3rd place winners. How many ways can the judge make the awards?
   15P3 = 15 × 14 × 13 = 2730
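
As a cross-check on the counting results and the birthday table in these slides, here is a small Python sketch. It assumes Python 3.8 or later (for math.perm and math.comb); the variable names and the helper p_shared_birthday are illustrative and not part of the original slides. The birthday helper uses the complement rule and assumes 365 equally likely birthdays, ignoring leap years.

```python
import math

# Permutations (order matters) and combinations (order does not matter).
awards = math.perm(15, 3)             # 15P3: 1st, 2nd, 3rd place from 15 pies
letters_ordered = math.perm(5, 2)     # 5P2: 2 letters from A B C D E, ordered
letters_unordered = math.comb(5, 2)   # 5C2: same choice, order ignored
illinois = math.comb(54, 6)           # 54C6: Illinois State Lottery
# NC Powerball after Jan. 1, 2009: 5 from 59, then 1 powerball from 39.
powerball = math.comb(59, 5) * math.comb(39, 1)


def p_shared_birthday(n, days=365):
    """P(at least 2 of n people share a birthday), via the complement rule.

    Illustrative helper: assumes equally likely birthdays, no leap years.
    """
    p_all_different = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    return 1 - p_all_different


print(awards, letters_ordered, letters_unordered)
print(illinois, powerball)
for group_size in (23, 30, 40, 60):
    print(group_size, round(p_shared_birthday(group_size), 3))
```

The loop at the end recomputes the four entries of the birthday table, so any group size can be checked the same way by changing the tuple.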