Section 2: Conditional Probability and Conditional Expectation
2.1 Introduction
One of the most useful concepts in probability theory is that of conditional probability and
conditional expectation. The reason is twofold. First, in practice, we are often interested in
calculating probabilities and expectations when some partial information is available; hence,
the desired probabilities and expectations are conditional ones. Secondly, in calculating a
desired probability or expectation it is often extremely useful to first condition on some
appropriate random variable.
2.2 The Discrete Case
For any two events A and B, the conditional probability of A given B is defined, as long as $P(B) > 0$, by
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
Hence, if X and Y are discrete random variables, then it is natural to define the conditional
probability mass function of X given that Y = y, by
p X∣Y x ∣ y = PX = x ∣ Y = y
PX = x, Y = y
=
PY = y
px, y
=
p Y y
for all values of y such that $P(Y = y) > 0$. Similarly, the conditional probability distribution function of X given that Y = y is defined, for all y such that $P(Y = y) > 0$, by
$$F_{X \mid Y}(x \mid y) = P(X \le x \mid Y = y) = \sum_{a \le x} p_{X \mid Y}(a \mid y)$$
Finally, the conditional expectation of X given that Y = y is defined by
EX ∣ Y = y =
∑ x.PX = x ∣ Y = y
x
=
∑ x.p X∣Yx ∣ y
x
If X is independent of Y, then the conditional mass function, distribution, and expectation are the same as the unconditional ones. This follows since, if X is independent of Y, then
$$p_{X \mid Y}(x \mid y) = P(X = x \mid Y = y) = P(X = x)$$
Example (2.1) :
Consider an experiment which results in one of three possible outcomes, with outcome i occurring with probability $p_i$, $i = 1, 2, 3$, and $\sum_{i=1}^{3} p_i = 1$. Suppose that n independent replications of this experiment are performed, and let $X_i$, $i = 1, 2, 3$, denote the number of times outcome i appears. Determine the conditional expectation of $X_1$ given that $X_2 = m$.
Solution :
For k ≤ n − m,
$$P(X_1 = k \mid X_2 = m) = \frac{P(X_1 = k, X_2 = m)}{P(X_2 = m)}$$
Now, if $X_1 = k$ and $X_2 = m$, then it follows that $X_3 = n - k - m$. However,
$$P(X_1 = k, X_2 = m, X_3 = n - k - m) = \frac{n!}{k!\,m!\,(n-k-m)!}\, p_1^k\, p_2^m\, p_3^{n-k-m} \tag{2.1}$$
This follows since any particular sequence of the n experiments having outcome 1 appear k times, outcome 2 appear m times, and outcome 3 appear n − k − m times has probability $p_1^k\, p_2^m\, p_3^{n-k-m}$ of occurring. Since there are $\frac{n!}{k!\,m!\,(n-k-m)!}$ such sequences, equation (2.1) follows.
Therefore, we have
$$P(X_1 = k \mid X_2 = m) = \frac{\dfrac{n!}{k!\,m!\,(n-k-m)!}\, p_1^k\, p_2^m\, p_3^{n-k-m}}{\dfrac{n!}{m!\,(n-m)!}\, p_2^m\, (1 - p_2)^{n-m}}$$
where we have used the fact that $X_2$ has a binomial distribution with parameters n and $p_2$. Hence,
$$P(X_1 = k \mid X_2 = m) = \frac{(n-m)!}{k!\,(n-m-k)!}\left(\frac{p_1}{1-p_2}\right)^{k}\left(\frac{p_3}{1-p_2}\right)^{n-m-k}$$
or equivalently, writing $p_3 = 1 - p_1 - p_2$,
$$P(X_1 = k \mid X_2 = m) = \binom{n-m}{k}\left(\frac{p_1}{1-p_2}\right)^{k}\left(1 - \frac{p_1}{1-p_2}\right)^{n-m-k}$$
In other words, the conditional distribution of $X_1$, given that $X_2 = m$, is binomial with parameters $n - m$ and $\frac{p_1}{1-p_2}$. Consequently,
$$E[X_1 \mid X_2 = m] = (n - m)\,\frac{p_1}{1-p_2}$$
#
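This result can be spot-checked by simulation. The sketch below assumes NumPy is available; the values of n, m and the $p_i$ are illustrative choices, not taken from the text. It draws multinomial replications, keeps those with $X_2 = m$, and compares the empirical mean of $X_1$ with $(n - m)p_1/(1 - p_2)$.

```python
# Sketch: Monte Carlo check of E[X1 | X2 = m] = (n - m) * p1 / (1 - p2).
# n, m, and the probabilities below are illustrative choices, not from the text.
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 6
p = np.array([0.5, 0.3, 0.2])                    # p1, p2, p3

samples = rng.multinomial(n, p, size=200_000)    # each row is (X1, X2, X3)
conditioned = samples[samples[:, 1] == m]        # keep replications with X2 = m

print("simulated  E[X1 | X2 = m]:", conditioned[:, 0].mean())
print("formula   (n - m)p1/(1-p2):", (n - m) * p[0] / (1 - p[1]))
```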
Example (2.2) :
There are n components. On a rainy day, component i will function with probability p i ; on
a nonrainy day, component i will function with probability q i , for i = 1, 2, ..., n. It will rain
tomorrow with probability α. Calculate the conditional expected number of components that
function tomorrow, given that it rains.
Solution :
Let
$$X_i = \begin{cases} 1, & \text{if component } i \text{ functions tomorrow} \\ 0, & \text{otherwise} \end{cases}$$
Then, with Y defined to equal 1 if it rains tomorrow, and 0 otherwise, the desired conditional expectation is obtained as follows:
$$E\left[\sum_{i=1}^{n} X_i \;\Big|\; Y = 1\right] = \sum_{i=1}^{n} E[X_i \mid Y = 1] = \sum_{i=1}^{n} p_i$$
#
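As a quick numerical sanity check (NumPy assumed; the $p_i$ values below are illustrative), simulating the components on a rainy day reproduces $\sum_i p_i$:

```python
# Sketch: expected number of functioning components given rain equals sum(p_i).
# The p_i values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.9, 0.8, 0.7, 0.95])                     # p_1, ..., p_n
counts = (rng.random((100_000, p.size)) < p).sum(axis=1)
print(counts.mean(), p.sum())                           # both close to 3.35
```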
2.3 The Continuous Case
If X and Y have a joint probability density function f(x, y), then the conditional probability density function of X given that Y = y is defined, for all values of y such that $f_Y(y) > 0$, by
$$f_{X \mid Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)}$$
To motivate this definition, multiply the left side by dx and the right side by $\frac{dx\,dy}{dy}$ to get
$$f_{X \mid Y}(x \mid y)\,dx = \frac{f(x, y)\,dx\,dy}{f_Y(y)\,dy} \approx \frac{P(x \le X \le x + dx,\; y \le Y \le y + dy)}{P(y \le Y \le y + dy)} = P(x \le X \le x + dx \mid y \le Y \le y + dy)$$
In other words, for small values of dx and dy, $f_{X \mid Y}(x \mid y)\,dx$ is approximately the conditional probability that X is between x and x + dx given that Y is between y and y + dy. The conditional expectation of X, given that Y = y, is defined for all values of y such that $f_Y(y) > 0$, by
$$E[X \mid Y = y] = \int_{-\infty}^{\infty} x\, f_{X \mid Y}(x \mid y)\,dx$$
Example (2.3) :
The joint density of X and Y is given by
$$f(x, y) = \begin{cases} \tfrac{1}{2}\, y\, e^{-xy}, & 0 < x < \infty,\ 0 < y < 2 \\[4pt] 0, & \text{otherwise} \end{cases}$$
What is $E\big[e^{X/2} \mid Y = 1\big]$?
Solution :
The conditional density of X, given that Y = 1, is given by
$$f_{X \mid Y}(x \mid 1) = \frac{f(x, 1)}{f_Y(1)} = \frac{\tfrac{1}{2} e^{-x}}{\int_0^{\infty} \tfrac{1}{2} e^{-x}\,dx} = e^{-x}$$
Hence,
$$E\big[e^{X/2} \mid Y = 1\big] = \int_0^{\infty} e^{x/2}\, f_{X \mid Y}(x \mid 1)\,dx = \int_0^{\infty} e^{x/2} e^{-x}\,dx = 2$$
#
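Since the conditional density $e^{-x}$ makes X given Y = 1 an exponential random variable with rate 1, the answer can be verified numerically. A minimal sketch, assuming SciPy and NumPy are available:

```python
# Sketch: numerical check that the integral of e^{x/2} e^{-x} over (0, inf) equals 2.
from scipy.integrate import quad
import numpy as np

value, err = quad(lambda x: np.exp(x / 2) * np.exp(-x), 0, np.inf)
print(value)   # 2.0 (up to quadrature error)
```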
2.4 Computing Expectations by Conditioning
Denote by EX ∣ Y that function of the random variable Y whose value at Y = y is
EX ∣ Y = y. Note that EX ∣ Y is itself a random variable. An extremely important
property of conditional expectation is that for all random variables X and Y
EX = EEX ∣ Y
provided that EX exists.
Theorem (2.1) :
If Y is a discrete random variable, then 2.2 states that
15
(2.2)
∞
EX =
∑ EX ∣ Y = yPY = y
(2.3)
y
while if Y is continuous with density f Y y, then 2.2 says that
EX =
∞
∫ −∞ EX ∣ Y = yf Yydy
Proof :
∑ EX ∣ Y = y.PY = y = ∑ ∑ xPX = x ∣ Y = yPY = y
y
y
=
= x, Y = y
PY = y
∑ ∑ x PXPY
= y
y
=
x
∑ x ∑ PX = x, Y = y
x
=
x
∑ ∑ xPX = x, Y = y
y
=
x
y
∑ xPX = x
x
= EX
#
One way to understand equation (2.3) is to interpret it as follows. It states that to calculate $E[X]$ we may take a weighted average of the conditional expected value of X given that Y = y, each of the terms $E[X \mid Y = y]$ being weighted by the probability of the event on which it is conditioned.
Example (2.4) : Expectation of the Sum of a Random Number of Random Variables
Suppose that the expected number of accidents per week at an industrial plant is four.
Suppose also that the numbers of workers injured in each accident are independent random
variables with a common mean of 2. Assume also that the number of workers injured in each
accident is independent of the number of accidents that occur. What is the expected number of
injuries during a week?
Solution :
Let
N = the number of accidents, and
$X_i$ = the number of injuries in the ith accident, $i = 1, 2, \ldots$
Then the total number of injuries can be expressed as $\sum_{i=1}^{N} X_i$. Now
$$E\left[\sum_{i=1}^{N} X_i\right] = E\left[E\left[\sum_{i=1}^{N} X_i \;\Big|\; N\right]\right]$$
But
$$E\left[\sum_{i=1}^{N} X_i \;\Big|\; N = n\right] = E\left[\sum_{i=1}^{n} X_i \;\Big|\; N = n\right] = E\left[\sum_{i=1}^{n} X_i\right] \quad \text{(by the independence of the $X_i$ and $N$)} = n\,E[X]$$
which yields that
$$E\left[\sum_{i=1}^{N} X_i \;\Big|\; N\right] = N\,E[X]$$
and thus
$$E\left[\sum_{i=1}^{N} X_i\right] = E\big[N\,E[X]\big] = E[N]\,E[X]$$
Therefore, in the example, the expected number of injuries during a week equals 4 × 2 = 8. #
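The identity $E\big[\sum_{i=1}^{N} X_i\big] = E[N]\,E[X]$ is easy to check by simulation. In the sketch below (NumPy assumed), the number of accidents is taken to be Poisson with mean 4 and the number injured per accident Poisson with mean 2; these distributional choices are illustrative assumptions, since only the means matter for the identity.

```python
# Sketch: E[sum_{i=1}^N X_i] = E[N] E[X] for the accident/injury example.
# The Poisson choices are illustrative assumptions; only the means (4 and 2) matter.
import numpy as np

rng = np.random.default_rng(0)
weeks = 100_000
total_injuries = np.empty(weeks)

for w in range(weeks):
    n_accidents = rng.poisson(4)                    # N, mean 4
    injuries = rng.poisson(2, size=n_accidents)     # X_1, ..., X_N, each with mean 2
    total_injuries[w] = injuries.sum()

print("simulated mean:", total_injuries.mean())     # close to 8
print("E[N]*E[X]     :", 4 * 2)
```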
Example (2.5) :
Independent trials, each of which is a success with probability p, are performed until there
are k consecutive successes. What is the mean number of necessary trials?
Solution :
Let
N k = number of necessary trials to obtain k consecutive successes
M k = mean
We will obtain a recursive equation for M k by conditioning on N k−1 , the number of trials
needed for k − 1 consecutive successes. This yields
M k = EN k  = EEN k ∣ N k−1 
Now,
EN k ∣ N k−1  = N k−1 + 1 + 1 − pEN k 
where the preceding follows since if it takes N k−1 trials to obtain k − 1 consecutive successes,
then either the next trial is a success and we have our k in a row or it is a failure and we must
begin anew. Taking expectations of both sides of the preceding yields
17
M k = M k−1 + 1 + 1 − pM k
= 1p + Mpk−1
Since N 1 , the time of the first success, is geometric with parameter p, we see that M 1 = 1p
and, recursively,
M 2 = 1p + 12
p
M 3 = 1p + 12 + 13
p
p
...
M k = 1p + 12 + ... + 1k
p
p
#
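The recursion $M_k = \frac{1}{p} + \frac{M_{k-1}}{p}$ is easy to evaluate numerically and compare with the closed form $\sum_{i=1}^{k} p^{-i}$. A small sketch (the values of p and k are illustrative):

```python
# Sketch: mean number of trials until k consecutive successes.
# Evaluates the recursion M_k = 1/p + M_{k-1}/p and compares with the closed form.
p, k = 0.5, 5           # illustrative values

M = 0.0                 # M_0 = 0: zero consecutive successes need no trials
for _ in range(k):
    M = 1 / p + M / p   # M_j = 1/p + M_{j-1}/p

closed_form = sum(1 / p**i for i in range(1, k + 1))
print(M, closed_form)   # both 62.0 for p = 0.5, k = 5
```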
2.5 Computing Variances by Conditioning
Conditional expectations can also be used to compute the variance of a random variable.
The conditional variance of X given that Y = y is defined by
VX ∣ Y = y = EX − EX ∣ Y = y 2 ∣ Y = y
= EX 2 ∣ Y = y − EX ∣ Y = y 2
Letting VX ∣ Y denote that function of Y whose value when Y = y is VX ∣ Y = y, we
have the following result.
Theorem (2.2) :
The conditional variance formula
VX = EVX ∣ Y + VEX ∣ Y
Proof :
EVX ∣ Y = EEX 2 ∣ Y − EX ∣ Y 2 
= EEX 2 ∣ Y − EEX ∣ Y 2 
= EX 2  − EEX ∣ Y 2 
and
VEX ∣ Y = EEX ∣ Y 2  − EEX ∣ Y 2
= EEX ∣ Y 2  − EX 2
therefore
18
(2.4)
EVX ∣ Y + VEX ∣ Y = EX 2  − EX 2
#
Example (2.6) : Variance of a Compound Random Variable
Let $X_1, X_2, \ldots$ be independent and identically distributed random variables with distribution F having mean μ and variance $\sigma^2$, and assume that they are independent of the nonnegative integer-valued random variable N. As noted in Example (2.4), where its expected value was determined, the random variable $S = \sum_{i=1}^{N} X_i$ is called a compound random variable. Find its variance.
Solution :
Whereas we could obtain $E[S^2]$ by conditioning on N, let us instead use the conditional variance formula. Now,
$$V(S \mid N = n) = V\left(\sum_{i=1}^{N} X_i \;\Big|\; N = n\right) = V\left(\sum_{i=1}^{n} X_i \;\Big|\; N = n\right) = V\left(\sum_{i=1}^{n} X_i\right) = n\sigma^2$$
By the same reasoning,
$$E[S \mid N = n] = n\mu$$
Therefore, $V(S \mid N) = N\sigma^2$, $E[S \mid N] = N\mu$, and the conditional variance formula gives
$$V(S) = E[N\sigma^2] + V(N\mu) = \sigma^2 E[N] + \mu^2 V(N)$$
If N is a Poisson random variable, then $S = \sum_{i=1}^{N} X_i$ is called a compound Poisson random variable. Because the variance of a Poisson random variable is equal to its mean, it follows that for a compound Poisson random variable having $E[N] = \lambda$,
$$V(S) = \lambda\sigma^2 + \lambda\mu^2 = \lambda E[X^2]$$
where X has the distribution F.
#
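A Monte Carlo check of the compound Poisson variance formula $V(S) = \lambda E[X^2]$ is sketched below; NumPy is assumed, and both λ and the choice of F (exponential with mean 2) are illustrative assumptions.

```python
# Sketch: check V(S) = lambda * E[X^2] for a compound Poisson random variable.
# Both lam and the choice of F (exponential with mean 2) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
lam, mu = 3.0, 2.0                          # E[N] = lam; each X_i ~ Exp with mean mu
trials = 100_000

N = rng.poisson(lam, size=trials)
S = np.array([rng.exponential(mu, size=n).sum() for n in N])

# For an exponential with mean mu: E[X^2] = 2 * mu^2, so V(S) should be lam * 2 * mu^2 = 24.
print("simulated V(S):", S.var())
print("lambda*E[X^2] :", lam * 2 * mu**2)
```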
2.6 Computing Probabilities by Conditioning
Let E denote an arbitrary event and define the indicator random variable X by
$$X = \begin{cases} 1, & \text{if } E \text{ occurs} \\ 0, & \text{if } E \text{ does not occur} \end{cases}$$
It follows from the definition of X that
$$E[X] = P(E)$$
$$E[X \mid Y = y] = P(E \mid Y = y) \qquad \text{for any random variable } Y$$
Therefore,
$$P(E) = \begin{cases} \displaystyle\sum_y P(E \mid Y = y)\, P(Y = y), & \text{if } Y \text{ is discrete} \\[8pt] \displaystyle\int_{-\infty}^{\infty} P(E \mid Y = y)\, f_Y(y)\,dy, & \text{if } Y \text{ is continuous} \end{cases}$$
Example (2.7) : The Ballot Problem
In an election, candidate A receives n votes and candidate B receives m votes, where n > m. Assuming that all orderings are equally likely, show that the probability that A is always ahead in the count of votes is $\frac{n - m}{n + m}$.
Solution :
Let $P_{n,m}$ denote the desired probability. By conditioning on which candidate receives the last vote counted, we have
$$P_{n,m} = P(\text{A always ahead} \mid \text{A receives last vote})\,\frac{n}{n+m} + P(\text{A always ahead} \mid \text{B receives last vote})\,\frac{m}{n+m}$$
Now, given that A receives the last vote, we can see that the probability that A is always ahead is the same as if A had received a total of n − 1 and B a total of m votes. Because a similar result is true when we are given that B receives the last vote, we see from the preceding that
$$P_{n,m} = \frac{n}{n+m}\, P_{n-1,m} + \frac{m}{n+m}\, P_{n,m-1} \tag{2.5}$$
We can now prove that $P_{n,m} = \frac{n-m}{n+m}$ by induction on n + m. As it is true when n + m = 1, that is, $P_{1,0} = 1$, assume it whenever n + m = k. Then when n + m = k + 1, we have by equation (2.5) and the induction hypothesis that
$$P_{n,m} = \frac{n}{n+m}\cdot\frac{n-1-m}{n-1+m} + \frac{m}{n+m}\cdot\frac{n-m+1}{n+m-1} = \frac{n-m}{n+m}$$
and the result is proven.
#
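Equation (2.5), together with the boundary cases, also gives a direct numerical check of the ballot-problem answer. A short sketch (the vote counts n and m are illustrative):

```python
# Sketch: compute P_{n,m} from recursion (2.5) and compare with (n - m)/(n + m).
from functools import lru_cache

@lru_cache(maxsize=None)
def P(n, m):
    # Probability that A is always ahead, given n votes for A and m for B.
    if m == 0:
        return 1.0          # A leads throughout if B has no votes
    if n <= m:
        return 0.0          # A cannot be ahead at every stage otherwise
    return n / (n + m) * P(n - 1, m) + m / (n + m) * P(n, m - 1)

n, m = 7, 4                 # illustrative vote counts
print(P(n, m), (n - m) / (n + m))   # both 3/11, about 0.2727
```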
Exercise (2.1) :
1. Let X and Y be two discrete random variables taking values in {1, 2, 3, ...}. Suppose
$$P(X = m, Y = n) = 0.64\,(0.2)^{n+m-2}, \qquad n, m = 1, 2, \ldots$$
and Z = X + Y. Compute
a. P(X = m, Z = k) for m = 5 and k = 7 first, and then in general.
b. P(X = 5 ∣ Z = 7)
c. P(Z = 7 ∣ X = 5)
d. P(Y = 3 ∣ Z = 14)
e. P(X = 5, Y = 3 ∣ Z = 8)
f. P(Z = 8 ∣ X = 6, Y = 2)
2. Let X and Y denote, respectively, the number of babies born on a certain day in a hospital
and the number of them which are boys. Suppose their joint distribution is
PX = n, Y = m =
e −14 7.14 m 6.68 n−m
, if m = 0, 1, ..., n; n = 0, 1, ...
m!n − m!
0,
otherwise
Find PX = n, PY = m and PX − Y = k|Y = m for all m, n ∈ 0, 1, ....
3. Suppose that p(x, y), the joint probability mass function of X and Y, is given by
$$p(1, 1) = 0.5, \qquad p(1, 2) = 0.1, \qquad p(2, 1) = 0.1, \qquad p(2, 2) = 0.3$$
Calculate the conditional probability mass function of X given that Y = 1.
4. Let A, B, and C be independent random variables with distributions indicated below:
$$P(A = 1) = 0.4, \qquad P(A = 2) = 0.6$$
$$P(B = -3) = 0.25, \quad P(B = -2) = 0.25, \quad P(B = -1) = 0.25, \quad P(B = 1) = 0.25$$
$$P(C = 1) = 0.5, \qquad P(C = 2) = 0.4, \qquad P(C = 3) = 0.1$$
What is the probability that $Ax^2 + Bx + C$ has real roots?
5. Reliability is the probability of a device performing its purpose adequately for the period
of time intended under the operating conditions encountered. A piece of equipment
consists of three components in series: for the equipment to function, all three
components must be functioning. Let X 1 , X 2 , and X 3 be the respective lifetimes of the
components 1, 2, and 3, measured in hours. Suppose
$$P(X_1 \le t) = 1 - e^{-t/10000}, \qquad t \ge 0$$
$$P(X_2 \le t) = 1 - e^{-2t/10000}, \qquad t \ge 0$$
$$P(X_3 \le t) = 1 - e^{-3t/100000}, \qquad t \ge 0$$
If the lifetimes of the components are independent, what is the reliability of the equipment in a mission requiring 4000 hours?
6. If X and Y are independent Poisson random variables with respective means $\lambda_1$ and $\lambda_2$, calculate the conditional expected value of X given that X + Y = n.
7. Suppose the joint density of X and Y is given by
$$f(x, y) = \begin{cases} 4y(x - y)e^{-(x+y)}, & 0 < x < \infty,\ 0 \le y \le x \\[4pt] 0, & \text{otherwise} \end{cases}$$
Compute $E[X \mid Y = y]$.
8. The Mean of a Geometric Distribution. A coin, having probability p of coming up
heads, is to be successively flipped until the first head appears. What is the expected
number of flips required?
9. A miner is trapped in a mine containing three doors. The first door leads to a tunnel that
takes him to safety after two hours of travel. The second door leads to a tunnel that
returns him to the mine after three hours of travel. The third door leads to a tunnel that
returns him to his mine after five hours. Assuming that the miner is at all times equally
likely to choose any of the doors, what is the expected length of time until the miner
reaches safety?
10. Variance of the Geometric Random Variable. Independent trials, each resulting in a
success with probability p, are performed in sequence. Let N be the trial number of the
first success. Find VN.
11. A total of n people have been invited to a party honoring an important official. The party
begins at time 0. The arrival times of the n guests are independent exponential random
variables with mean 1, and the arrival time of the official is uniformly distributed
between 0 and 1.
a. Find the probability that exactly k of the guests arrive before the official.
b. Find the expected number of guests who arrive before the official.
12. A vehicle insurance company classifies each of its policyholders as being of one of the
types i = 1, 2, ..., k. It supposes that the numbers of accidents that a type i policyholder
has in successive years are independent Poisson random variables with mean $\lambda_i$, $i = 1, 2, \ldots, k$. The probability that a newly insured policyholder is of type i is $p_i$, with $\sum_{i=1}^{k} p_i = 1$. Given that a policyholder had n accidents in her first year, what is the
expected number that she has in her second year? What is the conditional probability that
she has m accidents in her second year?
13. Let X 1 and X 2 be independent geometric random variables having the same parameter p.
Determine
PX 1 = i ∣ X 1 + X 2 = n
22
14. An urn contains three white, six red and five black balls. Six of these balls are randomly
selected from the urn. Let X and Y denote respectively the number of white and black
balls selected. Compute the conditional probability mass function of X given that Y = 3.
Also compute EX ∣ Y = 1. Assume that when a ball is selected its colour is noted and
it is then replaced in the urn before the next selection is made.
15. Prove that if X and Y are jointly continuous, then
EX =
∞
∫ −∞ EX ∣ Y = yf Yydy
16. A coin having probability p of coming up heads is successively flipped until two of the
most recent three flips are heads. Let N denote the number of flips. Note that if the first
two flips are heads, then N = 2. Find $E[N]$.
17. Suppose X is a Poisson random variable with mean λ. The parameter λ is itself a random
variable whose distribution is exponential with mean 1. Show that
$$P(X = n) = \left(\tfrac{1}{2}\right)^{n+1}$$
18. An insurance company supposes that the number of accidents that each of its
policyholders will have in a year is Poisson distributed, with the mean of the Poisson
depending on the policyholder. If the Poisson mean of a randomly chosen policyholder
has a gamma distribution with density function
gλ = λe −λ ,
λ≥0
what is the probability that a randomly chosen policyholder has exactly n accidents next
year?