Chapter 3
Discrete Random Variables and Probability Distributions
• 3.1 - Random Variables
• 3.2 - Probability Distributions for Discrete Random Variables
• 3.3 - Expected Values
• 3.4 - The Binomial Probability Distribution
• 3.5 - Hypergeometric and Negative Binomial Distributions
• 3.6 - The Poisson Probability Distribution
POPULATION
Discrete random variable X
Examples: shoe size, dosage (mg), # cells, …

Pop values    Probabilities    Cumul Probs
x             p(x)             F(x)
x1            p(x1)            p(x1)
x2            p(x2)            p(x1) + p(x2)
x3            p(x3)            p(x1) + p(x2) + p(x3)
⋮             ⋮                ⋮
                               1
Total         1

[Figure: probability histogram of X; Total Area = 1]

Mean: $\mu = \sum_{\text{all } x} x \, p(x)$
Variance: $\sigma^2 = \sum_{\text{all } x} (x - \mu)^2 \, p(x)$
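As a minimal sketch of the table and formulas above, written in R (the language used later in these slides), the cumulative probabilities, mean, and variance can be computed directly from the values x and their probabilities p(x). The three-point distribution below is a made-up illustration, not data from the slides, and the variable names are arbitrary.

```r
# Hypothetical discrete distribution (illustrative values only)
x <- c(1, 2, 3)          # population values
p <- c(0.2, 0.5, 0.3)    # probabilities p(x); they must sum to 1

Fx     <- cumsum(p)               # cumulative probabilities F(x)
mu     <- sum(x * p)              # mean: sum over all x of x * p(x)
sigma2 <- sum((x - mu)^2 * p)     # variance: sum over all x of (x - mu)^2 * p(x)

print(data.frame(x, p, Fx))
cat("mean =", mu, " variance =", sigma2, "\n")
```

The last entry of Fx equals 1, matching the "Total" row of the table.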
~ The Binomial Distribution ~
• Used only when dealing with binary outcomes (two categories: “Success” vs. “Failure”), with a fixed probability of Success (π) in the population.
• Calculates the probability of obtaining any given number of Successes in a random sample of n independent “Bernoulli trials.”
• Has many applications and generalizations, e.g., multiple categories, variable probability of Success, etc.
POPULATION: 40% Male, 60% Female

For any randomly selected individual, define a binary random variable:
Y = 1 if Male, with prob π = 0.4
Y = 0 if Female, with prob 1 − π = 0.6

RANDOM SAMPLE: n = 100
Discrete random variable X = # Males in sample (0, 1, 2, 3, …, 99, 100)

How can we calculate the probability p(x) = P(X = x), for x = 0, 1, 2, 3, …, 100? That is, P(X = 0), P(X = 1), P(X = 2), …, P(X = 99), P(X = 100)?
How can we calculate F(x) = P(X ≤ x), for x = 0, 1, 2, 3, …, 100?
Example: How can we calculate the probability p(25) = P(X = 25)?
Solution: Model the sample as a sequence of independent coin tosses, with 1 = Heads (Male), 0 = Tails (Female), where P(H) = 0.4, P(T) = 0.6.
How many possible outcomes of n = 100 tosses exist? $2^{100}$
How many possible outcomes of n = 100 tosses exist with X = 25 Heads?
[Figure: 100 coin slots numbered 1, 2, 3, 4, 5, …, 97, 98, 99, 100]
X = 25 Heads: { H1, H2, H3,…, H25 }
permutations of 25 among 100
There are 100 possible open slots for H1 to occupy.
For each one of them, there are 99 possible open slots left for H2 to occupy.
For each one of them, there are 98 possible open slots left for H3 to occupy.
…etc…etc…etc…
For each one of them, there are 77 possible open slots left for H24 to occupy.
For each one of them, there are 76 possible open slots left for H25 to occupy.
Hence, there are 100 × 99 × 98 × … × 77 × 76 possible outcomes.
This value is the number of permutations of the coins, denoted $_{100}P_{25}$.
This number unnecessarily includes the distinct permutations of the 25 Heads among themselves, all of which have Heads in the same positions.
For example: We would not want to count this as a distinct outcome.
[Figure: the same 100 numbered slots, with the Heads labels rearranged among the same positions]
How many is that? By the same logic, 25 × 24 × 23 × … × 3 × 2 × 1 (“25 factorial”, denoted 25!).

$\frac{100 \times 99 \times 98 \times \cdots \times 77 \times 76}{25 \times 24 \times 23 \times \cdots \times 3 \times 2 \times 1} = \frac{100!}{25! \, 75!}$

“100-choose-25”, denoted $\binom{100}{25}$ or $_{100}C_{25}$
R: choose(100, 25)
Calculator: 100 nCr 25
This value counts the number of combinations of 25 Heads among 100 coins.
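The counting argument above can be checked numerically. The following is a minimal R sketch (variable names are illustrative): it forms the number of ordered placements of 25 Heads, divides out the 25! orderings of the Heads among themselves, and compares the result with R's built-in choose().

```r
n <- 100   # number of tosses
k <- 25    # number of Heads

perms  <- prod((n - k + 1):n)   # 100 x 99 x ... x 76: ordered placements (100P25)
orders <- factorial(k)          # 25!: orderings of the 25 Heads among themselves
combos <- perms / orders        # 100! / (25! 75!)

# Agreement with the built-in, up to floating-point rounding
print(all.equal(combos, choose(n, k)))
print(combos)
```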
How many possible outcomes of n = 100 tosses exist with X = 25 Heads?
Answer: $\binom{100}{25}$
[Figure: one such outcome, with each of the 100 tosses labeled by its probability, 0.4 for Heads or 0.6 for Tails: 0.4, 0.6, 0.6, 0.4, 0.6, …, 0.6, 0.4, 0.4, 0.6]
What is the probability of each such outcome?
Recall that, per toss, P(Heads) = π = 0.4 and P(Tails) = 1 − π = 0.6.
Answer: Via independence in binary outcomes between any two coins,
0.4 × 0.6 × 0.6 × 0.4 × 0.6 × … × 0.6 × 0.4 × 0.4 × 0.6 = $(0.4)^{25} (0.6)^{75}$.
Therefore, the probability P(X = 25) is equal to $\binom{100}{25} (0.4)^{25} (0.6)^{75}$.
R: dbinom(25, 100, .4)
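As a quick check of the result above, the product "number of outcomes × probability of each outcome" can be compared with dbinom(). This is a sketch with illustrative variable names, not part of the original slides.

```r
n    <- 100   # tosses (sample size)
x    <- 25    # number of Heads (Males)
prob <- 0.4   # P(Heads) per toss

by_hand  <- choose(n, x) * prob^x * (1 - prob)^(n - x)   # count x per-outcome probability
built_in <- dbinom(x, n, prob)                           # R's binomial pmf

print(c(by_hand = by_hand, built_in = built_in))
print(all.equal(by_hand, built_in))
```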
Question: What if the coin were “fair” (unbiased), i.e., π = 1 − π = 0.5?
How many possible outcomes of n = 100 tosses exist with X = 25 Heads? Answer: $\binom{100}{25}$, as before.
What is the probability of each such outcome? Recall that, per toss, P(Heads) = π = 0.5 and P(Tails) = 1 − π = 0.5.
Answer: Via independence in binary outcomes between any two coins,
0.5 × 0.5 × 0.5 × … × 0.5 × 0.5 = $(0.5)^{25} (0.5)^{75} = (0.5)^{100} = 1/2^{100}$.
Therefore, the probability P(X = 25) is equal to $\binom{100}{25} (0.5)^{100} = \binom{100}{25} / 2^{100}$.
This is the “equally likely” scenario!
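In the fair-coin case the probability reduces to "favorable outcomes divided by total outcomes". A short R sketch of that ratio, with illustrative variable names:

```r
n <- 100   # tosses
x <- 25    # Heads

favorable <- choose(n, x)   # outcomes with exactly 25 Heads
total     <- 2^n            # all outcomes of 100 tosses, each equally likely

ratio <- favorable / total
print(ratio)
print(all.equal(ratio, dbinom(x, n, 0.5)))   # same value from the binomial pmf
```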
POPULATION: “Success” vs. “Failure”
RANDOM SAMPLE: size n

For any randomly selected individual, define a binary random variable:
Y = “Success”, with prob π
Y = “Failure”, with prob 1 − π

Discrete random variable X = # “Successes” in sample (0, 1, 2, 3, …, n)

Example: The probability P(X = 25) = $\binom{100}{25} (0.4)^{25} (0.6)^{75}$ generalizes to P(X = x) = $\binom{n}{x} \pi^x (1 - \pi)^{n - x}$.

Model the sample as a sequence of n Bernoulli trials that are independent, with constant probability (π) per trial: P(“Success”) = π, P(“Failure”) = 1 − π.
Then X is said to follow a Binomial distribution, written X ~ Bin(n, π), with “probability mass function”
$p(x) = \binom{n}{x} \pi^x (1 - \pi)^{n - x}$, x = 0, 1, 2, …, n.
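The probability mass function can be written out as a small R function and compared with the built-ins. This is a minimal sketch: binom_pmf is a hypothetical helper name, and the n = 100, π = 0.4 values are taken from the Male/Female example.

```r
# Hypothetical helper: binomial pmf written directly from the formula
binom_pmf <- function(x, n, prob) {
  choose(n, x) * prob^x * (1 - prob)^(n - x)
}

n    <- 100
prob <- 0.4
x    <- 0:n

p  <- binom_pmf(x, n, prob)   # p(x) = P(X = x) for every x
Fx <- cumsum(p)               # F(x) = P(X <= x)

print(sum(p))                                # probabilities add up to 1
print(all.equal(p,  dbinom(x, n, prob)))     # matches R's pmf
print(all.equal(Fx, pbinom(x, n, prob)))     # matches R's cdf
```

In R, dbinom() gives p(x) and pbinom() gives F(x), so the helper is only needed for illustration.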
Example: Blood Type probabilities, revisited

Blood Type    Rh+     Rh−     Total
O             .384    .077    .461
A             .323    .065    .388
B             .094    .017    .111
AB            .032    .007    .039
Total         .833    .166    .999

Suppose n = 10 individuals are to be selected at random from the population.
Probability table for X = #(Type O)
Binomial model applies?
Check:
1. Independent outcomes? Reasonably assume that outcomes “Type O” vs. “Not Type O” between two individuals are independent of each other. ✓
2. Constant probability π? From table, π = P(Type O) = .461 throughout population. ✓
Binomial model applies. X ~ Bin(10, .461)
$p(x) = \binom{10}{x} (.461)^x (.539)^{10 - x}$
R: dbinom(0:10, 10, .461)

x        p(x)                                                   F(x)
0        $\binom{10}{0} (.461)^{0} (.539)^{10}$ = 0.00207       0.00207
1        $\binom{10}{1} (.461)^{1} (.539)^{9}$ = 0.01770        0.01977
2        $\binom{10}{2} (.461)^{2} (.539)^{8}$ = 0.06813        0.08790
3        $\binom{10}{3} (.461)^{3} (.539)^{7}$ = 0.15538        0.24328
4        $\binom{10}{4} (.461)^{4} (.539)^{6}$ = 0.23257        0.47585
5        $\binom{10}{5} (.461)^{5} (.539)^{5}$ = 0.23870        0.71455
6        $\binom{10}{6} (.461)^{6} (.539)^{4}$ = 0.17013        0.88468
7        $\binom{10}{7} (.461)^{7} (.539)^{3}$ = 0.08315        0.96783
8        $\binom{10}{8} (.461)^{8} (.539)^{2}$ = 0.02667        0.99450
9        $\binom{10}{9} (.461)^{9} (.539)^{1}$ = 0.00507        0.99957
10       $\binom{10}{10} (.461)^{10} (.539)^{0}$ = 0.00043      1.00000
Total    1
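The probability table for X = #(Type O) can be regenerated in R. A minimal sketch, assuming the values n = 10 and π = .461 from the example; the column names are illustrative.

```r
n    <- 10     # individuals sampled
prob <- .461   # P(Type O), from the blood type table

x  <- 0:n
px <- dbinom(x, n, prob)   # p(x) = P(X = x)
Fx <- pbinom(x, n, prob)   # F(x) = P(X <= x), the running total of p(x)

tab <- data.frame(x = x, p = round(px, 5), Fx = round(Fx, 5))
print(tab)
```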
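# Histogram of the Bin(10, .461) distribution for X = #(Type O)
n <- 10                                # sample size
p <- .461                              # P(Type O)
pmf <- function(x) dbinom(x, n, p)     # binomial probability mass function
N <- 100000                            # size of the artificial data set
x <- 0:n
bin.dat <- rep(x, round(N * pmf(x)))   # repeat each x in proportion to p(x)
hist(bin.dat, freq = FALSE, breaks = c(-.5, x + .5), col = "green")
axis(1, at = x)                        # tick mark at every integer x
axis(2)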
Also, can show mean $\mu = \sum x \, p(x) = n\pi = (10)(.461) = 4.61$
and variance $\sigma^2 = \sum (x - \mu)^2 \, p(x) = n\pi(1 - \pi) = 2.48$
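The mean and variance shortcuts can be verified against the defining sums. This R sketch uses the Bin(10, .461) example; variable names are illustrative.

```r
n    <- 10
prob <- .461

x <- 0:n
p <- dbinom(x, n, prob)

mu     <- sum(x * p)             # mean from the definition
sigma2 <- sum((x - mu)^2 * p)    # variance from the definition

print(c(by_sum = mu,     shortcut = n * prob))                # both about 4.61
print(c(by_sum = sigma2, shortcut = n * prob * (1 - prob)))   # both about 2.48
```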
Example: Blood Type probabilities, revisited
Suppose n = 1500 individuals are to be selected at random from the population.
Probability table for X = #(Type AB–)
Binomial model applies. X ~ Bin(1500, .007)
Therefore, $p(x) = \binom{1500}{x} (.007)^x (.993)^{1500 - x}$, x = 0, 1, 2, …, 1500.
Also, can show mean $\mu = \sum x \, p(x) = n\pi = 10.5$
and variance $\sigma^2 = \sum (x - \mu)^2 \, p(x) = n\pi(1 - \pi) = 10.43$
RARE EVENT!
Is there a better alternative?
[Figure: plot of p(x) for X ~ Bin(1500, .007)] Long positive skew as x → 1500, but contribution ≈ 0
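To illustrate how little the long right tail contributes, the following R sketch computes the probability beyond a cutoff and the tail's share of the mean for X ~ Bin(1500, .007). The cutoff of 30 is a hypothetical choice made only for this illustration.

```r
n      <- 1500
prob   <- .007
cutoff <- 30    # hypothetical cutoff, well above the mean of 10.5

# P(X > cutoff): total probability in the tail
tail_prob <- pbinom(cutoff, n, prob, lower.tail = FALSE)

# Contribution of the tail values to the mean, sum of x * p(x) for x > cutoff
xs        <- (cutoff + 1):n
tail_mean <- sum(xs * dbinom(xs, n, prob))

print(c(tail_prob = tail_prob, tail_mean = tail_mean))
```

Both quantities are tiny compared with the mean of 10.5, which is why values of x far out toward 1500 can be ignored in practice.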
Poisson distribution
$p(x) = \frac{e^{-\mu} \mu^x}{x!}$, x = 0, 1, 2, …,
where mean and variance are μ = nπ = 10.5 and σ² = nπ = 10.5
X ~ Poisson(10.5)
Notation: Sometimes the symbol λ (“lambda”) is used instead of μ (“mu”).
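The slides state the Poisson formula without derivation. One standard way to see where it comes from, sketched here as a supplement rather than as the author's own argument, is to set π = μ/n in the binomial probability mass function and let n grow with μ held fixed:

```latex
\binom{n}{x} \pi^x (1 - \pi)^{n - x}
  = \frac{n (n-1) \cdots (n - x + 1)}{n^x}
    \cdot \frac{\mu^x}{x!}
    \cdot \left(1 - \frac{\mu}{n}\right)^{n}
    \cdot \left(1 - \frac{\mu}{n}\right)^{-x}
  \;\longrightarrow\;
  1 \cdot \frac{\mu^x}{x!} \cdot e^{-\mu} \cdot 1
  = \frac{e^{-\mu} \mu^x}{x!}
  \qquad (n \to \infty,\ \pi = \mu / n).
```

In the same limit the binomial variance nπ(1 − π) approaches nπ = μ, which is consistent with the values 10.43 and 10.5 quoted for the Type AB– example.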
For X ~ Poisson(10.5): $p(x) = \frac{e^{-10.5} (10.5)^x}{x!}$, x = 0, 1, 2, …
Ex: Probability of exactly X = 15 Type(AB–) individuals = ?
Binomial: $\binom{1500}{15} (.007)^{15} (.993)^{1485}$
Poisson: $\frac{e^{-10.5} (10.5)^{15}}{15!}$
(both ≈ .0437)
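The two answers can be computed side by side in R. This is a small sketch with illustrative variable names; the slides report both values as approximately .0437.

```r
n    <- 1500
prob <- .007
x    <- 15

p_binom <- dbinom(x, n, prob)            # exact binomial probability
p_pois  <- dpois(x, lambda = n * prob)   # Poisson approximation with mean n * prob = 10.5

print(round(c(binomial = p_binom, poisson = p_pois), 5))

# Largest pointwise gap between the two pmfs over x = 0..60
print(max(abs(dbinom(0:60, n, prob) - dpois(0:60, n * prob))))
```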
Example: Deaths in Wisconsin
Assuming deaths among young adults
are relatively rare, we know the following:
• Average λ = 584 deaths per year
• Mortality rate (α) seems constant.
Therefore, the Poisson distribution can be used as a good model to make
future predictions about the random variable X = “# deaths” per year, for this
population (15-24 yrs)… assuming current values will still apply.
• Probability of exactly X = 600 deaths next year:
P(X = 600) = $\frac{e^{-584} (584)^{600}}{600!}$ ≈ 0.0131
R: dpois(600, 584)
• Probability of exactly X = 1200 deaths in the next two years:
Mean of 584 deaths per yr ⇒ Mean of 1168 deaths per two yrs, so let λ = 1168:
P(X = 1200) = $\frac{e^{-1168} (1168)^{1200}}{1200!}$ ≈ 0.00746
• Probability of at least one death per day: λ = (584 deaths / yr) / (365 days / yr) = 1.6 deaths/day
P(X ≥ 1) = P(X = 1) + P(X = 2) + P(X = 3) + … True, but not practical.
P(X ≥ 1) = 1 − P(X = 0) = 1 − $\frac{e^{-1.6} (1.6)^0}{0!}$ = 1 − $e^{-1.6}$ = 0.798
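The three Wisconsin calculations can be reproduced in R with dpois(). A minimal sketch, using the λ = 584 deaths per year from the example; the variable names are illustrative.

```r
lambda_year <- 584   # average deaths per year (from the example)

# Exactly 600 deaths next year
p600 <- dpois(600, lambda_year)

# Exactly 1200 deaths in the next two years: the mean doubles to 1168
p1200 <- dpois(1200, 2 * lambda_year)

# At least one death on a given day: rescale the mean to one day
lambda_day <- lambda_year / 365        # 1.6 deaths per day
p_day      <- 1 - dpois(0, lambda_day) # complement rule: 1 - P(X = 0) = 1 - exp(-1.6)

print(round(c(p600 = p600, p1200 = p1200, p_day = p_day), 5))
```

The complement rule avoids summing the infinite series P(X = 1) + P(X = 2) + …; ppois(0, lambda_day, lower.tail = FALSE) gives the same value.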
● Binomial ~ X = # Successes in n trials, P(Success) = π
● Poisson ~ As above, but n large, π small, i.e., Success RARE
● Negative Binomial ~ X = # trials for k Successes, P(Success) = π
● Geometric ~ As above, but specialized to k = 1
● Hypergeometric ~ As Binomial, but π changes between trials
● Multinomial ~ As Binomial, but for multiple categories, with π1 + π2 + … + πlast = 1 and x1 + x2 + … + xlast = n