Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
History of randomness wikipedia , lookup
Indeterminism wikipedia , lookup
Probabilistic context-free grammar wikipedia , lookup
Probability box wikipedia , lookup
Infinite monkey theorem wikipedia , lookup
Law of large numbers wikipedia , lookup
Boy or Girl paradox wikipedia , lookup
Ars Conjectandi wikipedia , lookup
Inductive probability wikipedia , lookup
Risk aversion (psychology) wikipedia , lookup
CS110: Introduction to Computer Science: Lab12 Birthdays Parties and Counting Blind Using Probabilities to Build Estimates Suppose you are throwing a surprise party for your friends. However, it is essential that no two people at the party share a birthday. How many people can you invite and still be confident that this happens. Interestingly, solutions to this problem are useful for counting things that are too big to capture. Core Quantitative Issue Computation Probabilities Estimation Supporting Computational Issues Random Numbers Combinations Experimental Science Prepared by Fred Annexstein University of Cincinnati CC Some rights reserved. 2007 1 The Birthday Paradox A simple problem in elementary probability and statistics is the Birthday Problem: What is the probability that at least two of N randomly selected people have the same birthday? (Same month and day, but not necessarily the same year.) Random num from 0 to 364 BirthDay= 01/01 + num 1/1/2007 154 6/4/2007 216 8/5/2007 182 7/2/2007 344 12/11/2007 2 1/3/2007 338 12/5/2007 216 8/5/2007 253 9/11/2007 57 2/27/2007 Build the following table in Excel that gives a random listing of Birthdays, starting with any day (we chose 01/01) 181 7/1/2007 362 12/29/2007 164 6/14/2007 You will use Excel’s “conditional formatting” to highlight those rows that indicate days that are duplicates of previously appearing days on the list. 116 4/27/2007 292 10/20/2007 23 1/24/2007 128 5/9/2007 57 2/27/2007 2 7/12/2007 The problem is usually simplified by assuming two things: Nobody was born on February 29. People's birthdays are uniformly distributed over the other 365 days of the year. Is it surprising how fast duplicates begin to appear?! 192 The Birthday Problem in terms of Probabilities and Recursion This problem, like others we have seen before, is easier to solve by considering the complementary problem: What is the probability that N randomly selected people have all different birthdays? Well let’s see: Suppose for the moment that we knew the probability that N-1 people all had different birthdays. Let us call that value All_Different (N-1). Good, put that in your pocket. Now focus on the probability that the Nth person coming in to the party also had a different birthday from all others. Sure, we know that would be the chance of hitting any of the days not seen so far. Let us call that value Diff_Person = 365-(N-1) / 365 Hence we know that: All_Different(N) = All_Different(N-1) * Diff_Person This is called a Recursive Function – an extremely important concept in Computer Science. To solve it all you need is an initial value, say for one person N=1, and then iterate. 3 Sample Spreadsheet 4 Questions: 1. When does the probability become greater than 50-50 that two people at the party share a birthday? 2. When does the probability become greater than 75% that two people at the party share a birthday? 3. When does the probability become greater than 99.9% that two people at the party share a birthday? 4. On the planet Mars, there are 669 Martian days in a Martian year. For a party on Mars when does the probability become greater than 50-50 that two Martians at a party share a birthday? 5 BUT WHAT ABOUT LEAP YEARS? What happens to the probabilities if we add February 29 to the mix? Do you think it will increase or decrease the chance that 2 people share a birthday? To model the leap year, it gets somewhat more complicated, but we are smart and can make some additional assumptions: • Equal numbers of people are born on days other than February 29. • The number of people born on February 29 is one-fourth of the number of people born on any other day. Why? Hence the probability that a randomly selected person was born on February 29 is 1 out of 1461 = 0.25/365.25, and the probability that a randomly selected person was born on another specified day is 1/365.25. The probability that N persons, possibly including one born on February 29, have all different birthdays is the sum of two probabilities: • That the N persons were born on N different days other than February 29. • That the N persons were born on N different days, and include one person born on February 29. We can add the probabilities because the two cases are mutually 6 exclusive. Now each probability can be expressed recursively: Leap Year Recursion First compute the case that excludes birthdays on Feb29. We have that the probability that N persons were born on N different days other than February 29 is: All_Diff_no_Feb29(N) = All_Diff_no_Feb29 (N-1) * Diff_Person_no_Feb29 (N) Diff_Person_no_Feb29 (N) = (365- (N-1)) / 365.25 is the probability the Nth person has different birthday from all others. For the second case, which includes one birthday on Feb 29: All_Diff_yes_Feb29 (N) = All_Diff_yes_Feb29(N-1) * Diff_Person_no_Feb_29 (N-1) + All_Diff_no_Feb29(N-1) * 0.25 / 365.25; Redo the calculations with the new ‘leap year’ formulas. Questions: 5. Does the new model increase or decrease the chance that 2 people share a birthday? 6. Do any of your previous answers (#1-3) change with the new calculations? 7 Estimating the Size of Big We can apply the logic we just used in reverse to estimate the size of something that might be very difficult to count directly, e.g., the number of web-pages on the WWW, the number of deer in Mt. Airy, or the number of ants in the Amazon rainforest. Blind Counting Theorem: Given a large set of N objects. If we select objects at random and with replacement, then the number of expected draws before a repetition occurs is approximately: =SQRT(N * pi() / 2) Run 6 experiments using N=10,100,1000,10000,10^5,10^6 and using your method from Slide 2 you must determine when a repetition occurs. Compute the absolute and relative errors of your results with respect to this Theorem on each experiment. Produce a table like the following: Experiment N First repeat Theorem Value Error Abs Error Rel 1 10 6 3.96 2.04 51.39% 2 100 16 12.53 3.47 27.66% 3 1000 50 39.63 10.37 26.16% 4 10000 120 125.33 5.33 4.25% 5 100000 301 396.33 95.33 24.05% 6 1000000 1400 1253.31 146.69 11.70% 8