Download All_Diff_ex_Feb29 (N-1) - University of Cincinnati

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of randomness wikipedia , lookup

Indeterminism wikipedia , lookup

Probabilistic context-free grammar wikipedia , lookup

Dempster–Shafer theory wikipedia , lookup

Randomness wikipedia , lookup

Probability box wikipedia , lookup

Infinite monkey theorem wikipedia , lookup

Law of large numbers wikipedia , lookup

Boy or Girl paradox wikipedia , lookup

Risk aversion (psychology) wikipedia , lookup

Ars Conjectandi wikipedia , lookup

Inductive probability wikipedia , lookup

Probability interpretations wikipedia , lookup

Birthday problem wikipedia , lookup

Transcript
CS110: Introduction to Computer Science: Lab12
Birthdays, Leap Years, and Blind Counting
Using Probabilities to Build Estimates
Suppose you are throwing a surprise
party for your friends. However, it is
essential that no two people at the
party share a birthday. How many
people can you invite and still be
confident that this happens.
Interestingly, solutions to this
problem are useful for counting
things that too big to capture.
Core Quantitative Issue
Computation
Probabilities
Securing
Supporting Computational Issues
Random Numbers
Combinations
Experimental Science
Prepared by Fred Annexstein
University of Cincinnati
CC Some rights reserved. 2007
1
The Birthday Paradox
A simple and fun problem in elementary
probability and statistics is the Birthday
Problem: What is the probability that at
least two of N randomly selected people
have the same birthday? (Same month
and day, but not necessarily the same
year.)
The problem is usually simplified by assuming
two things: Nobody was born on February
29. People's birthdays are uniformly
distributed over the other 365 days of the
year.
Build the following Table in Excel that gives a
random listing of Birthdays, starting with
any day (we chose 01/01)
You will use Excel’s “conditional formatting” to
highlight those rows that indicate days that
are duplicates of previously appearing
days on the list.
It should be surprising how fast duplicates
begin to appear!
Random num
from 0 to 364
Birthday+num
1/1/2007
154
6/4/2007
216
8/5/2007
182
7/2/2007
344
12/11/2007
2
1/3/2007
338
12/5/2007
216
8/5/2007
253
9/11/2007
57
2/27/2007
181
7/1/2007
362
12/29/2007
164
6/14/2007
116
4/27/2007
292
10/20/2007
23
1/24/2007
128
5/9/2007
2
2/27/2007
57
The Birthday Problem in terms of Probabilities and Recursion
This problem, like others we have seen before, is easier to solve by
considering the complementary problem: What is the probability that
N randomly selected people have all different birthdays?
Well let’s see: Suppose for the moment that we knew the probability that
N-1 people had different birthdays. Call that value All_Different (N-1).
Good. Now we focus on the probability that the Nth person coming in to
such a party also had a different birthday from all other partygoers.
Sure, we know that would be the chance of hitting any of the days not
seen so far:
Diff_Person = 365-(N-1) / 365
Hence we know that:
All_Different(N) = All_Different(N-1) * Diff_Person
This is called a Recursive Function – an extremely important
concept in Computer Science. To solve it all you need is an
initial value, say for one person N=1, and then iterate.
3
Sample Spreadsheet
4
Questions:
1. When does the probability become greater than 50-50 that two
people at the party share a birthday?
2. When does the probability become greater than 75% that two
people at the party share a birthday?
3. When does the probability become greater than 99.9% that two
people at the party share a birthday?
4. On the planet Mars, there are 669 Martian days in a Martian year.
For a party on Mars when does the probability become greater
than 50-50 that two Martians at the party share a birthday?
5
BUT WHAT ABOUT LEAP YEAR?
What happens to the probabilities if we add February 29 to the mix?
It gets somewhat more complicated, but we are smart and can make
some additional assumptions:
• Equal numbers of people are born on days other than February 29.
• The number of people born on February 29 is one-fourth of the
number of people born on any other day. Why?
Hence the probability that a randomly selected person was born on
February 29 is 0.25/365.25, and the probability that a randomly
selected person was born on another specified day is 1/365.25.
The probability that N persons, possibly including one born on February
29, have all different birthdays is the sum of two probabilities:
• That the N persons were born on N different days other than
February 29.
• That the N persons were born on N different days, and include one
person born on February 29.
We can add the probabilities because the two cases are mutually
exclusive. Now each probability can be expressed recursively:
6
Leap Years
First compute the case that excludes birthdays on Feb29, we have that
the probability that N persons were born on N different days other
than February 29 is:
All_Diff_ex_Feb29(N) = All_Diff_ex_Feb29 (N-1) *
Diff_Person_ex_Feb29 (N)
Diff_Person_ex_Feb29 (N) = (365- (N-1)) / 365.25
is the probability the Nth person has different birthday from others.
For the second case, which includes one birthday on Feb 29:
All_Diff_in_Feb29 (N) = All_Diff_in_Feb29(N-1) *
Diff_Person_ex_Feb_29 (N-1) +
All_Diff_ex_Feb29(N-1) *
0.25 / 365.25;
Redo the calculations with the new ‘leap year’ information. Question:
5. Do any of your previous answers (#1-3) change by adding Feb 29
to the calculations?
7
Estimating the Size of Big
We can apply the logic we just used in reverse to estimate the size of
something that might be very difficult to count directly, e.g., the number
of web-pages on the WWW, the number of ants in the Amazon
rainforest, etc.
Blind Counting Theorem: Given a large set of N objects. If we select
objects at random and with replacement, then the number of expected
draws before a repetition occurs is approximately:
=SQRT(N * pi() / 2)
Run 6 experiments using N=10,100,1000,10000,10^5,10^6 and using your
method from Slide 2 you must determine when a repetition occurs.
Compute the absolute and relative errors of your results with respect to
this Theorem on each experiment. Produce a table like the following:
Experiment
N
First repeat
Theorem Value
Error Abs
Error Rel
1
10
6
3.96
2.04
51.39%
2
100
16
12.53
3.47
27.66%
3
1000
50
39.63
10.37
26.16%
4
10000
120
125.33
5.33
4.25%
5
100000
301
396.33
95.33
24.05%
6
1000000
1400
1253.31
146.69
11.70%
8