Econ 240A
Power Four
Last Time
• Probability
Problem 6.61
• A survey of middle aged men reveals
that 28% of them are balding at the
crown of their head. Moreover, it is
known that such men have an 18%
probability of suffering a heart attack in
the next ten years. Men who are not
balding in this way have an 11%
probability of a heart attack. Find the
probability that a middle aged man will
suffer a heart attack in the next ten
years.
[Diagram: the population of middle aged men (MA), split into Bald and Not Bald]
P(Bald and MA) = 0.28
P(HA/Bald and MA) = 0.18
P(HA/Not Bald and MA) = 0.11
Probability of a heart attack in
the next ten years
• P(HA) = P(HA and Bald and MA) +
P(HA and Not Bald and MA)
• P(HA) = P(HA/Bald and MA)*P(BALD
and MA) + P(HA/Not BALD and MA)*
P(Not Bald and MA)
• P(HA) = 0.18*0.28 + 0.11*0.72 =
0.0504 + 0.0792 = 0.1296
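The calculation above can be checked with a few lines of Python. This is only a sketch of the law of total probability using the numbers from the problem; the variable names are made up for illustration.

```python
# Law of total probability for Problem 6.61 (numbers taken from the slide).
p_bald = 0.28                 # share of middle aged men who are balding
p_ha_given_bald = 0.18        # P(HA / Bald and MA)
p_ha_given_not_bald = 0.11    # P(HA / Not Bald and MA)

# Weight each conditional probability by the probability of its condition.
p_ha = p_ha_given_bald * p_bald + p_ha_given_not_bald * (1 - p_bald)
print(round(p_ha, 4))
```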
Random Variables
Outline
• Random Variables & Bernoulli Trials
• example: one flip of a coin
– expected value of the number of heads
– variance in the number of heads
• example: two flips of a coin
• a fair coin: frequency distribution of the
number of heads
– one flip
– two flips
Outline (Cont.)
• Three flips of a fair coin, the number of
combinations of the number of heads
• The binomial distribution
• frequency distributions for the binomial
• The expected value of a discrete
random variable
• the variance of a discrete random
variable
Concept
• Bernoulli Trial
– two outcomes, e.g. success or failure
– successive independent trials
– probability of success is the same in each
trial
• Example: flipping a coin multiple times
Flipping a Coin Once
The random variable k is the number of heads.
It is variable because k can equal one or zero.
It is random because the value of k depends on the probabilities of occurrence, p and 1-p.
Heads, k=1: Prob. = p
Tails, k=0: Prob. = 1-p
Flipping a coin once
• Expected value of the number of heads
is the value of k weighted by the
probability that value of k occurs
– E(k) = 1*p + 0*(1-p) = p
• variance of k is the value of k minus its
expected value, squared, weighted by
the probability that value of k occurs
– $\mathrm{VAR}(k) = (1-p)^2\,p + (0-p)^2\,(1-p)$
– $\mathrm{VAR}(k) = (1-p)\,p\,[(1-p) + p] = (1-p)\,p$
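As a rough check on the one-flip results, the sketch below computes the mean and variance directly from their definitions as probability-weighted sums. The function name `bernoulli_moments` is hypothetical, chosen only for this illustration.

```python
def bernoulli_moments(p):
    """Mean and variance of the number of heads k in one flip."""
    values = [1, 0]        # heads -> k = 1, tails -> k = 0
    probs = [p, 1 - p]
    # Expected value: each value of k weighted by its probability.
    mean = sum(k * q for k, q in zip(values, probs))
    # Variance: squared deviation from the mean, weighted by probability.
    var = sum((k - mean) ** 2 * q for k, q in zip(values, probs))
    return mean, var

mean, var = bernoulli_moments(0.3)
print(mean, var)  # should agree with p and p*(1-p) up to rounding
```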
Flipping a coin twice: 4
elementary outcomes
[Tree diagram: each flip branches to heads with Prob = p and to tails with Prob = 1-p]
h, h: Prob = $p^2$, k=2
h, t: Prob = $p(1-p)$, k=1
t, h: Prob = $(1-p)p$, k=1
t, t: Prob = $(1-p)^2$, k=0
Flipping a Coin Twice
• Expected number of heads
– $E(k) = 2p^2 + 1 \cdot p(1-p) + 1 \cdot (1-p)p + 0 \cdot (1-p)^2$
– $E(k) = 2p^2 + p - p^2 + p - p^2 = 2p$
– so we might expect the expected value of k in n
independent flips to be n*p
• Variance in k
– $\mathrm{VAR}(k) = (2-2p)^2 p^2 + 2(1-2p)^2 p(1-p) + (0-2p)^2 (1-p)^2$
Continuing with the variance in k
– $\mathrm{VAR}(k) = (2-2p)^2 p^2 + 2(1-2p)^2 p(1-p) + (0-2p)^2 (1-p)^2$
– $\mathrm{VAR}(k) = 4(1-p)^2 p^2 + 2(1 - 4p + 4p^2)\,p(1-p) + 4p^2 (1-p)^2$
– adding the first and last terms, $8p^2 (1-p)^2 + 2(1 - 4p + 4p^2)\,p(1-p)$
– and expanding this last term, $2p(1-p) - 8p^2 (1-p) + 8p^3 (1-p)$
– $\mathrm{VAR}(k) = 8p^2 (1-p)^2 + 2p(1-p) - 8p^2 (1-p)(1-p)$
– so $\mathrm{VAR}(k) = 2p(1-p)$, or twice VAR(k) for 1 flip
• So we might expect the variance in n
flips to be np(1-p)
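The conjecture that the mean is n*p and the variance is n*p*(1-p) can be checked by brute force for small n. The sketch below enumerates every elementary outcome of n flips; `heads_moments` is a hypothetical helper name, and the approach is only practical for small n because there are 2^n outcomes.

```python
from itertools import product

def heads_moments(n, p):
    """Mean and variance of the number of heads in n independent flips,
    computed by enumerating all 2**n elementary outcomes."""
    outcomes = []
    for flips in product([1, 0], repeat=n):   # 1 = heads, 0 = tails
        k = sum(flips)                        # number of heads on this branch
        prob = p ** k * (1 - p) ** (n - k)    # probability of this branch
        outcomes.append((k, prob))
    mean = sum(k * prob for k, prob in outcomes)
    var = sum((k - mean) ** 2 * prob for k, prob in outcomes)
    return mean, var

for n in (1, 2, 3):
    print(n, heads_moments(n, 0.3))
```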
Frequency Distribution for the
Number of Heads
• A fair coin
One Flip of the Coin
[Bar chart: probability vs. # of heads; 0 heads: 1/2, 1 head: 1/2]
Two Flips of a Fair Coin
[Bar chart: probability vs. # of heads; 0 heads: 1/4, 1 head: 1/2, 2 heads: 1/4]
Three Flips of a Fair Coin
• It is not so hard to see what the value
of the number of heads, k, might be for
three flips of a coin: zero, one, two, or
three
• But one head can occur three ways, as
can two heads
• Hence we need to consider the number
of ways k can occur, i.e. the
combinations of branching probabilities
where order does not count
Three flips of a coin; 8 elementary outcomes
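[Tree diagram: each flip branches to H with probability p and to T with probability 1-p]
H, H, H: Prob = $p^3$, 3 heads
H, H, T: Prob = $p^2(1-p)$, 2 heads
H, T, H: Prob = $p^2(1-p)$, 2 heads
H, T, T: Prob = $p(1-p)^2$, 1 head
T, H, H: Prob = $p^2(1-p)$, 2 heads
T, H, T: Prob = $p(1-p)^2$, 1 head
T, T, H: Prob = $p(1-p)^2$, 1 head
T, T, T: Prob = $(1-p)^3$, 0 heads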
Three Flips of a Coin
• There is only one way of getting three
heads or of getting zero heads
• But there are three ways of getting two
heads or getting one head
• One way of calculating the number of
combinations is $C_n(k) = n!/[k!\,(n-k)!]$
• Another way of calculating the number
of combinations is Pascal’s triangle
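Both ways of counting combinations can be sketched in Python. The function names `combinations_count` and `pascal_row` are hypothetical; the factorial version follows the formula for Cn(k), and the second builds one row of Pascal's triangle by adding adjacent entries of the row above.

```python
from math import factorial

def combinations_count(n, k):
    """Number of ways to get k heads in n flips: n! / (k! * (n-k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))

def pascal_row(n):
    """Row n of Pascal's triangle (row 0 is [1])."""
    row = [1]
    for _ in range(n):
        # Each new entry is the sum of the two entries above it.
        row = [a + b for a, b in zip([0] + row, row + [0])]
    return row

print([combinations_count(3, k) for k in range(4)])
print(pascal_row(3))
```

For three flips both give 1, 3, 3, 1, matching the counts of branches with 0, 1, 2 and 3 heads.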
Three Flips of a Coin
[Bar chart: probability vs. # of heads; 0 heads: 1/8, 1 head: 3/8, 2 heads: 3/8, 3 heads: 1/8]
The Probability of Getting k Heads
• The probability of getting k heads (along a
given branch) in n trials is:
$p^k (1-p)^{n-k}$
• The number of branches with k heads in n
trials is given by $C_n(k)$
• So the probability of k heads in n trials is
$\mathrm{Prob}(k) = C_n(k)\, p^k (1-p)^{n-k}$
• This is the discrete binomial distribution
where k can only take on discrete values of
0, 1, …, n
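A minimal sketch of the binomial probability of k heads in n trials, using `math.comb` from the Python standard library for the number of combinations. The name `binomial_pmf` is an assumption made for this example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n independent flips."""
    # Number of branches with k heads, times the probability of one such branch.
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Three flips of a fair coin.
dist = [binomial_pmf(k, 3, 0.5) for k in range(4)]
print(dist)       # probabilities for 0, 1, 2, 3 heads
print(sum(dist))  # the probabilities should sum to one
```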
Expected Value of a discrete
random variable
• $E(x) = \sum_{i=0}^{n} x(i)\, p[x(i)]$
• the expected value of a discrete
random variable is the weighted
average of the observations where the
weight is the frequency of that
observation
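The weighted average can be written as a one-line sum. The sketch below applies it to the number of heads in three flips of a fair coin; `expected_value` is a hypothetical helper name.

```python
def expected_value(values, probs):
    """Weighted average of the values, with probabilities as weights."""
    return sum(x * q for x, q in zip(values, probs))

# Number of heads in three flips of a fair coin.
heads = [0, 1, 2, 3]
probs = [1 / 8, 3 / 8, 3 / 8, 1 / 8]
print(expected_value(heads, probs))  # n*p = 3 * 0.5
```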
Expected Value of the sum of
random variables
• E(x + y) = E(x) + E(y)
Expected Number of Heads After
Two Flips
• Flip One: $k_i^{I}$ heads
• Flip Two: $k_j^{II}$ heads
• Because of independence, $p(k_i^{I} \text{ and } k_j^{II}) = p(k_i^{I})\, p(k_j^{II})$
• Expected number of heads after two flips:
$E(k_i^{I} + k_j^{II}) = \sum_{i=0}^{1} \sum_{j=0}^{1} (k_i^{I} + k_j^{II})\, p(k_i^{I})\, p(k_j^{II})$
Cont.
• $E(k_i^{I} + k_j^{II}) = \sum_{i=0}^{1} k_i^{I}\, p(k_i^{I}) \sum_{j=0}^{1} p(k_j^{II}) + \sum_{i=0}^{1} p(k_i^{I}) \sum_{j=0}^{1} k_j^{II}\, p(k_j^{II})$
• $E(k_i^{I} + k_j^{II}) = E(k_i^{I}) + E(k_j^{II}) = p \cdot 1 + p \cdot 1 = 2p$
• So the mean after n flips is n*p
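The double sum over both flips can be evaluated directly. This sketch assumes two independent flips with the same probability of heads p; the function name `expected_heads_two_flips` is made up for illustration.

```python
def expected_heads_two_flips(p):
    """E(k_I + k_II) as a double sum over the outcomes of two independent flips."""
    pmf = {1: p, 0: 1 - p}   # heads -> 1, tails -> 0
    total = 0.0
    for k1, q1 in pmf.items():
        for k2, q2 in pmf.items():
            # Independence: the joint probability is the product q1 * q2.
            total += (k1 + k2) * q1 * q2
    return total

print(expected_heads_two_flips(0.3))  # should agree with 2p up to rounding
```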
Variance of a discrete random
variable
• $\mathrm{VAR}(x_i) = \sum_{i=0}^{n} \{x(i) - E[x(i)]\}^2\, p[x(i)]$
• the variance of a discrete random
variable is the weighted sum of each
observation minus its expected value,
squared, where the weight is the
frequency of that observation
Cont.
• $\mathrm{VAR}(x_i) = \sum_{i=0}^{n} \{x(i) - E[x(i)]\}^2\, p[x(i)]$
• $\mathrm{VAR}(x_i) = \sum_{i=0}^{n} \{[x(i)]^2 - 2E[x(i)]\, x(i) + [E x(i)]^2\}\, p[x(i)]$
• $\mathrm{VAR}(x_i) = \sum_{i=0}^{n} \{[x(i)]^2\, p[x(i)]\} - \{E[x(i)]\}^2$
• So the variance equals the second moment
minus the first moment squared
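A quick numerical check that the two expressions for the variance agree. The example distribution is the number of heads in three flips of a fair coin; `variance_two_ways` is a hypothetical name for this sketch.

```python
def variance_two_ways(values, probs):
    """Return the variance by definition and as E(x^2) - [E(x)]^2."""
    mean = sum(x * q for x, q in zip(values, probs))
    by_definition = sum((x - mean) ** 2 * q for x, q in zip(values, probs))
    second_moment = sum(x ** 2 * q for x, q in zip(values, probs))
    return by_definition, second_moment - mean ** 2

heads = [0, 1, 2, 3]
probs = [1 / 8, 3 / 8, 3 / 8, 1 / 8]
print(variance_two_ways(heads, probs))  # both should equal n*p*(1-p) = 0.75
```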
The variance of the sum of
discrete random variables
• $\mathrm{VAR}[x_i + y_j] = E[x_i + y_j - E(x_i + y_j)]^2$
• $\mathrm{VAR}[x_i + y_j] = E[(x_i - Ex_i) + (y_j - Ey_j)]^2$
• $\mathrm{VAR}[x_i + y_j] = E[(x_i - Ex_i)^2 + 2(x_i - Ex_i)(y_j - Ey_j) + (y_j - Ey_j)^2]$
• $\mathrm{VAR}[x_i + y_j] = \mathrm{VAR}[x_i] + 2\,\mathrm{COV}[x_i, y_j] + \mathrm{VAR}[y_j]$
The variance of the sum if x
and y are independent
• $\mathrm{COV}[x_i, y_j] = E[(x_i - Ex_i)(y_j - Ey_j)]$
• $\mathrm{COV}[x_i, y_j] = \sum_{i=0}^{m} \sum_{j=0}^{n} (x_i - Ex_i)(y_j - Ey_j)\, p[x(i), y(j)]$
• $\mathrm{COV}[x_i, y_j] = \sum_{i=0}^{m} (x_i - Ex_i)\, p[x(i)] \cdot \sum_{j=0}^{n} (y_j - Ey_j)\, p[y(j)]$
• $\mathrm{COV}[x_i, y_j] = 0$
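The zero covariance under independence can be illustrated numerically. In this sketch the joint probability is built as the product of the two marginal probabilities, which is exactly the independence assumption; the function name and the example value p = 0.3 are assumptions for illustration.

```python
def covariance_if_independent(pmf_x, pmf_y):
    """Covariance of x and y when p[x, y] = p[x] * p[y]."""
    mean_x = sum(x * q for x, q in pmf_x.items())
    mean_y = sum(y * q for y, q in pmf_y.items())
    cov = 0.0
    for x, qx in pmf_x.items():
        for y, qy in pmf_y.items():
            # Joint probability factors into qx * qy under independence.
            cov += (x - mean_x) * (y - mean_y) * qx * qy
    return cov

flip = {1: 0.3, 0: 0.7}   # one flip with p = 0.3
cov = covariance_if_independent(flip, flip)
print(cov)  # zero, up to floating-point rounding
```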
Variance of the number of
heads after two flips
• Since we know the variance of the number
of heads on the first flip is p*(1-p)
• and ditto for the variance in the number of
heads for the second flip
• then the variance in the number of heads
after two flips is the sum, 2p(1-p)
• and the variance after n flips is np(1-p)
The Los Angeles Times Poll
• In a sample of approximately 2000
people, 56% indicate they will vote to
recall Governor Davis
• If the poll is an accurate reflection or
subset of the population of voters next
Tuesday, what is the expected
proportion that will vote for the recall?
• How much uncertainty is in that
expectation?
LA Times Poll
• The estimated proportion, from the
sample, that will vote for recall is:
p̂ = k / n
• where p̂ is 0.56 or 56%
• k is the number of “successes”, the
number of people sampled who are for
recall, approximately 1,120
• n is the size of the sample, 2000
LA Times Poll
• What is the expected proportion of
voters next Tuesday that will vote for
recall?
• E(p̂) = E(k)/n = np/n = p, where
from the binomial distribution, E(k) =
np
• So if the sample is representative of
voters and their preferences, 56%
should vote for recall next Tuesday
LA Times Poll
• How much dispersion is in this estimate, i.e.
as reported in newspapers, what is the margin
of sampling error?
• The margin of sampling error is calculated as
the standard deviation or square root of the
variance in p̂
• $\mathrm{VAR}(\hat{p}) = \mathrm{VAR}(k)/n^2 = np(1-p)/n^2 = p(1-p)/n$
• and using 0.56 as an estimate of p,
• $\mathrm{VAR}(\hat{p}) = 0.56 \times 0.44/2000 \approx 0.0001$
LA Times Poll
• So the sampling error should be 0.01 or
1%, i.e. the square root of 0.0001
• The LA Times story reports the
sampling error as 3%. What gives?
• Reading how the poll was constructed,
at least two samples were mixed: a
smaller sample, with a larger standard
error, was added to the larger sample,
increasing the overall margin of error
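The textbook sampling error for a single simple random sample can be reproduced as follows. This is a sketch using the poll's reported figures (56% for recall, a sample of about 2000); the variable names are illustrative only.

```python
from math import sqrt

p_hat = 0.56   # sample proportion for recall
n = 2000       # approximate sample size

var_p_hat = p_hat * (1 - p_hat) / n   # p(1-p)/n
std_error = sqrt(var_p_hat)           # standard deviation of p_hat

print(round(var_p_hat, 6))
print(round(std_error, 4))
```

Unrounded, the standard error is a little over 0.011, so it rounds to the 1% figure above and remains well below the 3% reported in the story.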
LA Times Poll
• Is it possible that Governor Davis could
survive? This estimate of 0.56 plus or
minus twice the sampling error of 0.03,
creates an interval of 0.50 to 0.62.
• Based on a normal approximation to the
binomial, the true proportion voting to
recall should fall in this interval with
probability of about 95%.
LA Times Poll
• The probability of falling below 0.50,
i.e. Davis surviving, is only about 2.5%
if this poll accurately reflects voter
sentiment and nothing happens to
change voters’ minds before Oct. 7
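The interval and the tail probability can be sketched with the normal approximation, taking the reported sampling error of 0.03 as the standard deviation. The normal CDF is written with `math.erf` from the standard library; the helper name `normal_cdf` is an assumption for this example.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

p_hat = 0.56
std_error = 0.03   # sampling error reported by the LA Times

lower = p_hat - 2 * std_error
upper = p_hat + 2 * std_error
print(round(lower, 2), round(upper, 2))

# Probability that the true proportion for recall falls below 0.50.
z = (0.50 - p_hat) / std_error
p_below = normal_cdf(z)
print(round(p_below, 4))
```

Two standard errors below the estimate corresponds to z = -2, which gives a tail probability of a bit under 2.5%.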