Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
History of randomness wikipedia , lookup
Indeterminism wikipedia , lookup
Probabilistic context-free grammar wikipedia , lookup
Dempster–Shafer theory wikipedia , lookup
Probability box wikipedia , lookup
Infinite monkey theorem wikipedia , lookup
Law of large numbers wikipedia , lookup
Boy or Girl paradox wikipedia , lookup
Risk aversion (psychology) wikipedia , lookup
Ars Conjectandi wikipedia , lookup
Inductive probability wikipedia , lookup
CS110: Introduction to Computer Science: Lab12 Birthdays, Leap Years, and Blind Counting Using Probabilities to Build Estimates Suppose you are throwing a surprise party for your friends. However, it is essential that no two people at the party share a birthday. How many people can you invite and still be confident that this happens. Interestingly, solutions to this problem are useful for counting things that too big to capture. Core Quantitative Issue Computation Probabilities Securing Supporting Computational Issues Random Numbers Combinations Experimental Science Prepared by Fred Annexstein University of Cincinnati CC Some rights reserved. 2007 1 The Birthday Paradox A simple and fun problem in elementary probability and statistics is the Birthday Problem: What is the probability that at least two of N randomly selected people have the same birthday? (Same month and day, but not necessarily the same year.) The problem is usually simplified by assuming two things: Nobody was born on February 29. People's birthdays are uniformly distributed over the other 365 days of the year. Build the following Table in Excel that gives a random listing of Birthdays, starting with any day (we chose 01/01) You will use Excel’s “conditional formatting” to highlight those rows that indicate days that are duplicates of previously appearing days on the list. It should be surprising how fast duplicates begin to appear! Random num from 0 to 364 Birthday+num 1/1/2007 154 6/4/2007 216 8/5/2007 182 7/2/2007 344 12/11/2007 2 1/3/2007 338 12/5/2007 216 8/5/2007 253 9/11/2007 57 2/27/2007 181 7/1/2007 362 12/29/2007 164 6/14/2007 116 4/27/2007 292 10/20/2007 23 1/24/2007 128 5/9/2007 2 2/27/2007 57 The Birthday Problem in terms of Probabilities and Recursion This problem, like others we have seen before, is easier to solve by considering the complementary problem: What is the probability that N randomly selected people have all different birthdays? Well let’s see: Suppose for the moment that we knew the probability that N-1 people had different birthdays. Call that value All_Different (N-1). Good. Now we focus on the probability that the Nth person coming in to such a party also had a different birthday from all other partygoers. Sure, we know that would be the chance of hitting any of the days not seen so far: Diff_Person = 365-(N-1) / 365 Hence we know that: All_Different(N) = All_Different(N-1) * Diff_Person This is called a Recursive Function – an extremely important concept in Computer Science. To solve it all you need is an initial value, say for one person N=1, and then iterate. 3 Sample Spreadsheet 4 Questions: 1. When does the probability become greater than 50-50 that two people at the party share a birthday? 2. When does the probability become greater than 75% that two people at the party share a birthday? 3. When does the probability become greater than 99.9% that two people at the party share a birthday? 4. On the planet Mars, there are 669 Martian days in a Martian year. For a party on Mars when does the probability become greater than 50-50 that two Martians at the party share a birthday? 5 BUT WHAT ABOUT LEAP YEAR? What happens to the probabilities if we add February 29 to the mix? It gets somewhat more complicated, but we are smart and can make some additional assumptions: • Equal numbers of people are born on days other than February 29. • The number of people born on February 29 is one-fourth of the number of people born on any other day. Why? Hence the probability that a randomly selected person was born on February 29 is 0.25/365.25, and the probability that a randomly selected person was born on another specified day is 1/365.25. The probability that N persons, possibly including one born on February 29, have all different birthdays is the sum of two probabilities: • That the N persons were born on N different days other than February 29. • That the N persons were born on N different days, and include one person born on February 29. We can add the probabilities because the two cases are mutually exclusive. Now each probability can be expressed recursively: 6 Leap Years First compute the case that excludes birthdays on Feb29, we have that the probability that N persons were born on N different days other than February 29 is: All_Diff_ex_Feb29(N) = All_Diff_ex_Feb29 (N-1) * Diff_Person_ex_Feb29 (N) Diff_Person_ex_Feb29 (N) = (365- (N-1)) / 365.25 is the probability the Nth person has different birthday from others. For the second case, which includes one birthday on Feb 29: All_Diff_in_Feb29 (N) = All_Diff_in_Feb29(N-1) * Diff_Person_ex_Feb_29 (N-1) + All_Diff_ex_Feb29(N-1) * 0.25 / 365.25; Redo the calculations with the new ‘leap year’ information. Question: 5. Do any of your previous answers (#1-3) change by adding Feb 29 to the calculations? 7 Estimating the Size of Big We can apply the logic we just used in reverse to estimate the size of something that might be very difficult to count directly, e.g., the number of web-pages on the WWW, the number of ants in the Amazon rainforest, etc. Blind Counting Theorem: Given a large set of N objects. If we select objects at random and with replacement, then the number of expected draws before a repetition occurs is approximately: =SQRT(N * pi() / 2) Run 6 experiments using N=10,100,1000,10000,10^5,10^6 and using your method from Slide 2 you must determine when a repetition occurs. Compute the absolute and relative errors of your results with respect to this Theorem on each experiment. Produce a table like the following: Experiment N First repeat Theorem Value Error Abs Error Rel 1 10 6 3.96 2.04 51.39% 2 100 16 12.53 3.47 27.66% 3 1000 50 39.63 10.37 26.16% 4 10000 120 125.33 5.33 4.25% 5 100000 301 396.33 95.33 24.05% 6 1000000 1400 1253.31 146.69 11.70% 8