Probability
Dr J. Marchini
January 10, 2011
Contents

1 Lecture 1
  1.1 Random Walks
      1.1.1 Reminders
      1.1.2 Random Walks - simplest case
      1.1.3 Random Walks - Questions of interest

2 Lecture 2
  2.1 Random Walks Continued
  2.2 Using Conditioning and Expectation
      2.2.1 Some reminders
      2.2.2 Application to Random Walks

3 Lecture 3
  3.1 Using Conditioning and Expectation Continued
      3.1.1 Application to Random Walks Continued

4 Lecture 4
  4.1 Using Conditioning and Expectation Continued
  4.2 Random Samples
  4.3 Sums of Random Variables

5 Lecture 5
  5.1 Continuous Random Variables
      5.1.1 Reminder: Discrete Random Variables
      5.1.2 Continuous Random Variables
      5.1.3 Properties: pdf's and pmf's

6 Lecture 6
  6.1 Continuous Random Variables Continued
      6.1.1 Expectation of a Continuous Random Variable
      6.1.2 Functions of Random Variables and their pdfs

7 Lecture 7
  7.1 Continuous Random Variables Continued
      7.1.1 Functions of Random Variables and their pdfs Continued
      7.1.2 The Normal Distribution

8 Lecture 8
  8.1 Continuous Random Variables Continued
      8.1.1 Jointly Continuous Distributions
      8.1.2 Expectation with Joint Distributions
      8.1.3 Independence
      8.1.4 Random Samples for Continuous Random Variables
1 Lecture 1

1.1 Random Walks

1.1.1 Reminders
We will need the Partition Theorem, which rests on the definition of conditional probability:

    P(A | B) = P(A ∩ B) / P(B)

Definition 1 A Partition {Bi} satisfies B1 ∪ . . . ∪ Bn = Ω with Bi ∩ Bj = ∅ when i ≠ j, where Ω is the set of all possible outcomes (called the sample space).
Theorem 1 (Partition Theorem)

    P(A) = Σ_{i=1}^{n} P(A | Bi) P(Bi)

Proof

    P(A) = P(A ∩ Ω)
         = P(A ∩ (B1 ∪ . . . ∪ Bn))
         = P((A ∩ B1) ∪ . . . ∪ (A ∩ Bn))        (distributive law)
         = P(A ∩ B1) + . . . + P(A ∩ Bn)         (Bi disjoint)
         = Σ_{i=1}^{n} P(A ∩ Bi)

now use the definition of conditional probability:

         = Σ_{i=1}^{n} P(A | Bi) P(Bi)
Difference Equations - second reminder

Example 1 (1st order) u_n − 2u_{n−1} = 2^n for n = 1, 2, 3, . . ..

Complementary Solution u_n − 2u_{n−1} = 0. Try u_n = A.λ^n, which gives

    λ − 2 = 0    ⇒    u_n = A.2^n

Particular Solution No good trying α2^n as it is part of the complementary solution. Try u_n = αn2^n, which gives

    αn2^n − 2α(n − 1)2^{n−1} = 2^n
    ⇒    α2^n = 2^n
    ⇒    α = 1

Hence u_n = A.2^n + n2^n. We would require a value of u_0, say, to fix A.
Example 2 (2nd order) u_n − 3u_{n−1} + 2u_{n−2} = 3^n for n = 2, 3, . . ..

Complementary Solution u_n − 3u_{n−1} + 2u_{n−2} = 0. Try u_n = A.λ^n

    ⇒    λ^2 − 3λ + 2 = 0
    ⇒    λ = 1 or 2
    ⇒    u_n = A + B.2^n

Particular Solution Try u_n = α3^n

    ⇒    α3^n − 3α3^{n−1} + 2α3^{n−2} = 3^n
    ⇒    α(3^2 − 3^2 + 2) = 3^2
    ⇒    α = 9/2

Hence u_n = A + B.2^n + (9/2).3^n and we would need u_0 and u_1 to determine A and B.
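The two worked solutions above can be sanity-checked by substituting them back into their recurrences. The following minimal Python sketch (not part of the original notes; the constants A and B are arbitrary illustrative choices) does this numerically.

    # Check the general solutions of Examples 1 and 2 by direct substitution.
    # u1(n) = A*2^n + n*2^n should satisfy u_n - 2*u_{n-1} = 2^n.
    # u2(n) = A + B*2^n + (9/2)*3^n should satisfy u_n - 3*u_{n-1} + 2*u_{n-2} = 3^n.

    A, B = 1.7, -0.3   # arbitrary constants; any values should work

    u1 = lambda n: A * 2**n + n * 2**n
    u2 = lambda n: A + B * 2**n + 4.5 * 3**n

    for n in range(1, 8):
        assert abs(u1(n) - 2 * u1(n - 1) - 2**n) < 1e-6
    for n in range(2, 8):
        assert abs(u2(n) - 3 * u2(n - 1) + 2 * u2(n - 2) - 3**n) < 1e-6
    print("both general solutions satisfy their recurrences")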
1.1.2 Random Walks - simplest case

A random walk on the integers 0, 1, 2, . . . , N . If the walk gets to 0 or N it stops - so these points are called absorbing boundaries. At each (discrete) unit of time the walk moves to an adjacent point. So suppose we are at point k:

    p = probability of moving to k + 1 from k
    q = probability of moving to k − 1 from k

and in the simplest case, p + q = 1 and p, q do not depend on k.
Example 3 (Gambler’s ruin) A gambler playing a series of games in a casino
where in each game he has a probability p of winning £1 and probability q of
losing £1. He has £k in his pocket. He stops if he loses all £k or wins and has
£N in which case he leaves.
In this case we can assume p < q. At the start, we set X_0 = k, say. Let

    X_1 be the position after 1 move
    X_2 be the position after 2 moves, and so on

If X_0 = k, then

    P(X_1 = k + 1) = p
    P(X_1 = k − 1) = q
    P(X_1 = j) = 0            for j ≠ k + 1, k − 1

and

    P(X_2 = k + 2) = p^2
    P(X_2 = k) = 2pq
    P(X_2 = k − 2) = q^2
    P(X_2 = j) = 0            for j ≠ k + 2, k, k − 2
So we can build up probabilities of where the walk is after 2 moves, 3 moves
and even n moves. But it gets very complicated. The gambler is interested in
the probability of going broke or winning £N (some given large amount). This
is a simpler question.
1.1.3 Random Walks - Questions of interest

Question 1 What is the probability of absorption at 0? (i.e. arriving at 0
Question 1 What is the probability of absorption at 0? (i.e. arriving at 0
before N ).
Solution The outcome is a path on the integers 0 to N , ending at either 0
or N and consists of the entire set of moves to absorption. Let wk be the
probability of reaching 0 before N starting from k. (So the question is: what is
the probability of absorption at 0 rather than N ?).
The partition we use is the set of first moves: k → k + 1 or k → k − 1.
    P(0 before N from k) = P(0 before N from k | k → k + 1)P(k → k + 1)
                           + P(0 before N from k | k → k − 1)P(k → k − 1)
    ⇒  wk = P(0 before N from k + 1)p + P(0 before N from k − 1)q
    ⇒  wk = pwk+1 + qwk−1

with boundaries w0 = 1 and wN = 0. We now solve:

    pwk+1 − wk + qwk−1 = 0

which gives the auxiliary equation

    pλ^2 − λ + q = 0
    (pλ − q)(λ − 1) = 0

hence λ = q/p or 1 and wk = A(q/p)^k + B, assuming p ≠ q. Furthermore:

    from w0:   1 = A + B
    from wN:   0 = A(q/p)^N + B
    ⇒          1 = A(1 − (q/p)^N)

which gives

    wk = (q/p)^k/(1 − (q/p)^N) − (q/p)^N/(1 − (q/p)^N)
       = ((q/p)^k − (q/p)^N)/(1 − (q/p)^N)
NB If you solve the problem of absorption at N before 0 you can check P(absorption) = 1
from any starting point.
What happens if p = q? Then λ = 1 twice and you need to recall how to deal
with a repeated root.
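The formula for wk can be checked by direct simulation. The following minimal Python sketch (not part of the original notes; the values p = 0.45, N = 10, k = 3 and the function name absorbed_at_zero are illustrative choices) compares a Monte Carlo estimate with the closed form.

    # Monte Carlo check of w_k = ((q/p)^k - (q/p)^N) / (1 - (q/p)^N)
    import random

    p, N, k = 0.45, 10, 3        # illustrative values
    q = 1 - p

    def absorbed_at_zero(start):
        pos = start
        while 0 < pos < N:
            pos += 1 if random.random() < p else -1
        return pos == 0

    trials = 100_000
    estimate = sum(absorbed_at_zero(k) for _ in range(trials)) / trials
    r = q / p
    exact = (r**k - r**N) / (1 - r**N)
    print(f"simulated {estimate:.4f}   formula {exact:.4f}")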
Question 2 Suppose we know absorption occurs at 0. What is the probability
that the first move was k → k − 1? (we assume p ≠ q).
Solution

    P(k → k − 1 | 0 from k) = P((k → k − 1) ∩ (0 from k)) / P(0 from k)
                            = P(0 from k | k → k − 1)P(k → k − 1) / P(0 from k)
                            = P(0 from k − 1)P(k → k − 1) / P(0 from k)
                            = wk−1 q / wk
                            = ((q/p)^{k−1} − (q/p)^N) q / ((q/p)^k − (q/p)^N)
We would expect this result to be bigger than q - it is! Check this result for
p > q and q > p.
Example 4 Now suppose we randomly allocate the initial point on the integers
0, 1, . . . , N (again assuming p ≠ q). What is the probability that absorption
occurs at 0 and not N ?
Partitioning over the starting points 0, 1, . . . , N , the Partition Theorem gives

    P(absorption at 0) = Σ_{k=0}^{N} P(absorption at 0 | k)P(k)

Random selection of the starting point gives P(k) = 1/(N + 1), for k = 0, 1, . . . , N . Hence:

    P(absorption at 0) = (1/(N + 1)) Σ_{k=0}^{N} wk
                       = (1/(N + 1)) Σ_{k=0}^{N} ((q/p)^k − (q/p)^N)/(1 − (q/p)^N)
                       = (1/(N + 1)) (Σ_{k=0}^{N} (q/p)^k − (N + 1)(q/p)^N)/(1 − (q/p)^N)
                       = (1 − (q/p)^{N+1} − (N + 1)(1 − (q/p))(q/p)^N) / ((N + 1)(1 − (q/p))(1 − (q/p)^N))
                       = (1 − (N + 1)(q/p)^N + N(q/p)^{N+1}) / ((N + 1)(1 − (q/p))(1 − (q/p)^N))
2 Lecture 2

2.1 Random Walks Continued
Example 5 In principle, we can deal with different move possibilities and probabilities.

    P(k → k − 1) = q
    P(k → k + 1) = p1
    P(k → k + 2) = p2

with q + p1 + p2 = 1. Partitioning over these cases gives:

    P(0 from k) = P(0 from k | k → k − 1)q + P(0 from k | k → k + 1)p1
                  + P(0 from k | k → k + 2)p2

Let wk = P(getting to 0 before N ), then we have:

    wk = qwk−1 + p1 wk+1 + p2 wk+2
    ⇒  0 = p2 wk+2 + p1 wk+1 − wk + qwk−1

Obviously a much more complicated system - a third order difference equation. In principle we need to solve the cubic:

    wk = A.λ^k    ⇒    p2 λ^3 + p1 λ^2 − λ + q = 0
but since we know that λ = 1 is still a root, in practice we only need to solve a
quadratic. Note that the boundary conditions might become rather complicated
and we would need 3 conditions.
2.2 Using Conditioning and Expectation

2.2.1 Some reminders

Definition 2 (Conditional Expectation) If X is a discrete random variable and A is an event with P(A) > 0, then the expectation of X given A, denoted by E(X | A), is defined by

    E(X | A) = Σ_x x P(X = x | A).
Example 6 A coin with probability p of throwing a head is repeatedly tossed.
Let a run be an unbroken series of heads or tails. What is the expected number
in the run given that the first in the run is a head?
Solution Let H be the event that the first throw in the sequence is a head, and X be the number in the run. Then:

    P(X = k | H) = p^{k−1} q    for q = 1 − p and k = 1, 2, . . .

Therefore

    E(X | H) = Σ_{k=1}^{∞} k P(X = k | H)
             = Σ_{k=1}^{∞} k p^{k−1} q
             = q Σ_{k=1}^{∞} k p^{k−1}
             = q/(1 − p)^2
             = 1/q
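A quick simulation can be used to check the value 1/q. The sketch below (not part of the original notes; p = 0.3 and the function name run_length_given_head are illustrative choices) conditions on the run starting with a head and counts its length.

    # Monte Carlo sketch: expected run length given the first toss is a head should be 1/q.
    import random

    p = 0.3            # illustrative value
    q = 1 - p

    def run_length_given_head():
        # the run starts with a head; each further head (probability p) extends it
        length = 1
        while random.random() < p:
            length += 1
        return length

    trials = 100_000
    avg = sum(run_length_given_head() for _ in range(trials)) / trials
    print(f"simulated {avg:.3f}   1/q = {1/q:.3f}")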
Theorem 2 (Partition Theorem for Expectation) For a discrete random variable X and a partition of the sample space {A1 , A2 , . . . , An} such that P(Ai) > 0 for each i, then

    E(X) = Σ_{i=1}^{n} E(X | Ai)P(Ai).

Proof Note that E(X | Ai) = Σ_x x P(X = x | Ai). This gives:

    Σ_i E(X | Ai)P(Ai) = Σ_i Σ_x x P(X = x | Ai)P(Ai)

interchanging order of summation (assume absolute convergence)

                       = Σ_x x (Σ_i P(X = x | Ai)P(Ai))

then by the partition theorem, we have

                       = Σ_x x P(X = x)
                       = E(X)
Example 7 Suppose in the coin experiment of the previous example, we now
ask, what is the expected length of the first sequence?
Solution Let X be the length of the first sequence, and partition over the two
events:
H = {1st toss results in a head}
T = {1st toss results in a tail}
Then:

    E(X) = E(X | H)P(H) + E(X | T )P(T )
         = (1/q).p + (1/p).q
         = (1 − 2pq)/(pq)

since E(X | H) = 1/q and by symmetry E(X | T ) = 1/p.
Example 8 Suppose a coin is tossed (with p = q = 1/2) until 2 heads appear consecutively for the first time. Calculate the expected number of throws of the coin.

Solution Partition on the 1st and 2nd throws:

    T  = {1st result is a tail}                                ⇒ start all over again
    HT = {1st result is a head, 2nd result is a tail}          ⇒ start all over again
    HH = {1st result is a head, 2nd result is a head}          ⇒ stop

and let X be the total number of throws. Then:

    E(X) = E(X | T )P(T ) + E(X | HT )P(HT ) + E(X | HH)P(HH)
         = (1 + E(X)).(1/2) + (2 + E(X)).(1/4) + 2.(1/4)
         = 3/2 + (3/4)E(X)
    ⇒ E(X) = 6
Try extending this argument to 3 heads consecutively.
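The answer 6 is easy to confirm by simulation. The minimal Python sketch below (not part of the original notes; the function name tosses_until_two_heads is an illustrative choice) does so; changing "run < 2" to "run < 3" lets you check the three-heads extension, which gives about 14.

    # Monte Carlo sketch: expected number of fair-coin tosses until HH first appears (answer 6).
    import random

    def tosses_until_two_heads():
        count, run = 0, 0
        while run < 2:
            count += 1
            run = run + 1 if random.random() < 0.5 else 0   # extend or reset the run of heads
        return count

    trials = 100_000
    avg = sum(tosses_until_two_heads() for _ in range(trials)) / trials
    print(f"simulated {avg:.3f}   expected 6")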
Example 9 Suppose that household i for i = 1, 2, . . . , n has probability pi of
owning at least one computer. From n households a subsample of m houses is
to be selected at random.
(a) If one household is selected at random, what is the probability that it has at least one computer? Suppose

    X = 1 if the selected household has at least one computer
        0 otherwise

What is E(X)?
(b) If m households are randomly selected, what is the expected number with
at least one computer?
Solution

(a) Suppose that household i is selected. Then we see

        P(X = 1 | i selected) = P(X = 1 | i) = pi.

    Partitioning on household i for i = 1, . . . , n gives:

        P(X = 1) = Σ_{i=1}^{n} P(X = 1 | i)P(i selected)
                 = Σ_{i=1}^{n} pi (1/n)
                 = (1/n) Σ_{i=1}^{n} pi

    Again, using partition for expectation:

        E(X) = Σ_{i=1}^{n} E(X | i)P(i selected)
             = Σ_{i=1}^{n} (1.pi + 0.(1 − pi)) (1/n)
             = (1/n) Σ_{i=1}^{n} pi
(b) Let Z be the number of households with at least one computer if m households are randomly selected, and let

        Xj' = 1 if the j-th selected house has a computer
              0 otherwise

    where j = 1, 2, . . . , m indexes the j-th selection (as opposed to the j-th household).
    [Note that the {Xj'} are not independent.] Then

        Z = X1' + . . . + Xm'
        ⇒ E(Z) = E(X1' + . . . + Xm')
               = E(X1') + . . . + E(Xm')            (by linearity of E(·))
               = (m/n) Σ_{i=1}^{n} pi

    as the Xj' are identically distributed and E(Xj') = (1/n) Σ pi, from part (a).

2.2.2 Application to Random Walks
What is the expected number of steps to absorption (either 0 or N )? Let Xk
be the number of steps to absorption from k. Then
E(Xk ) = E(Xk | k → k − 1)P(k → k − 1) + E(Xk | k → k + 1)P(k → k + 1)
and if we set ek = E(Xk ), then
    ek = (1 + ek−1)q + (1 + ek+1)p
    ⇒ pek+1 − ek + qek−1 = −q − p = −1

with boundary conditions e0 = eN = 0. Solving:

Complementary Solution The homogeneous equation pek+1 − ek + qek−1 = 0 gives the auxiliary equation

    pλ^2 − λ + q = 0
    ⇒ λ = 1 or q/p
    ⇒ ek = A + B(q/p)^k,    assuming p ≠ q

Particular Solution Try ek = αk (ek = constant is no good as this is part of the complementary solution). Then

    pα(k + 1) − αk + qα(k − 1) = −1
    ⇒ (p − q)α = −1
    ⇒ α = −1/(p − q)

General Solution Hence ek = A + B(q/p)^k − k/(p − q). Solving for boundary conditions:

    e0 = 0:   0 = A + B
    eN = 0:   0 = A + B(q/p)^N − N/(p − q)
    ⇒ B(1 − (q/p)^N) = −N/(p − q)
    ⇒ B = −N/((p − q)(1 − (q/p)^N))

Hence

    ek = N/((p − q)(1 − (q/p)^N)) − (N/((p − q)(1 − (q/p)^N)))(q/p)^k − k/(p − q)

[Similarly, we can solve for p = q = 1/2.]
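The expected-steps formula can also be sanity-checked by simulation. The following minimal Python sketch (not part of the original notes; p = 0.6, N = 10, k = 4 and the function name steps_to_absorption are illustrative choices) compares a Monte Carlo estimate with the closed form.

    # Monte Carlo sketch: expected steps to absorption from k, versus the closed form above.
    import random

    p, N, k = 0.6, 10, 4        # illustrative values
    q = 1 - p

    def steps_to_absorption(start):
        pos, steps = start, 0
        while 0 < pos < N:
            pos += 1 if random.random() < p else -1
            steps += 1
        return steps

    trials = 50_000
    avg = sum(steps_to_absorption(k) for _ in range(trials)) / trials
    r = q / p
    A = N / ((p - q) * (1 - r**N))
    exact = A * (1 - r**k) - k / (p - q)
    print(f"simulated {avg:.3f}   formula {exact:.3f}")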
3 Lecture 3

3.1 Using Conditioning and Expectation Continued

3.1.1 Application to Random Walks Continued
Example 10 (Reflecting barrier at N , absorption at 0) Suppose for some random walk with p ≠ q, there is a barrier at N which simply reflects back.

Solution Let ek be the expected number of steps to absorption (which can now only happen at 0). The general equation is the same: pek+1 − ek + qek−1 = −1 with e0 = 0, but at N

    E(XN) = E(XN | N → N − 1)P(N → N − 1)
    eN    = (1 + eN−1).1

so the boundary condition is eN = 1 + eN−1. Solving:

    ek = A + B(q/p)^k − k/(p − q)        (from previous lecture)

with e0 = 0 = A + B and eN = eN−1 + 1. The second condition gives

    A + B(q/p)^N − N/(p − q) = A + B(q/p)^{N−1} − (N − 1)/(p − q) + 1
    ⇒ B = −(2p^2/(p − q)^2)(p/q)^{N−1} = −A

Hence

    ek = (2p^2/(p − q)^2)(p/q)^{N−1}(1 − (q/p)^k) − k/(p − q)

for p ≠ q and k = 0, 1, . . . , N . Note that we need to re-solve the difference equation when p = q = 1/2.
Corollary 1 (to Partition Theorem for Expectation) If Y = h(X) is a function of a discrete random variable X, and B1 , B2 , . . . , Bn is a partition of the sample space, then

    E(Y ) = E(h(X)) = Σ_{i=1}^{n} E(h(X) | Bi)P(Bi) = Σ_{i=1}^{n} E(Y | Bi)P(Bi).

In particular, if the probability generating function (p.g.f.) of X is

    GX(s) = Σ_{x=0}^{∞} px s^x = E(s^X)

then

    E(s^X) = Σ_{i=1}^{n} E(s^X | Bi)P(Bi)
Proof Any function of a random variable is itself a random variable.
Example 11 Suppose that there are N red balls, N white balls and 1 blue ball
in an urn. A ball is selected at random and then replaced. Let X be the number
of red balls selected before a blue ball is chosen. Find:
(a) the probability generating function of X,
(b) E(X),
(c) Var X
Solution Condition on the first selected ball, i.e. partition on the colour of the
first selection, Red (R), White (W) or Blue(B). Then
(a)
    GX(s) = E(s^X)
          = E(s^X | R)P(R) + E(s^X | W )P(W ) + E(s^X | B)P(B)
          = E(s^{1+X}) N/(2N + 1) + E(s^X) N/(2N + 1) + E(s^0) 1/(2N + 1)
          = sE(s^X) N/(2N + 1) + E(s^X) N/(2N + 1) + 1.(1/(2N + 1))
    ⇒ GX(s) = (Ns/(2N + 1)) GX(s) + (N/(2N + 1)) GX(s) + 1/(2N + 1)
    ⇒ GX(s) = 1/(N + 1 − Ns)

which is the p.g.f. for a geometric distribution with parameter p = 1/(N + 1).
(b)
    E(X) = G'X(1)
         = N/(N + 1 − Ns)^2 |_{s=1}
         = N
         = −1 + 1/p

(c)
    Var X = G''X(1) + G'X(1) − (G'X(1))^2
          = 2N^2/(N + 1 − Ns)^3 |_{s=1} + N − N^2
          = 2N^2/1^3 + N − N^2
          = N(N + 1)
          = q/p^2
If we were asked for just E(X), it would be easier to calculate:

    E(X) = E(X | R)P(R) + E(X | W )P(W ) + E(X | B)P(B)
         = (1 + E(X)) N/(2N + 1) + E(X) N/(2N + 1) + 0.(1/(2N + 1))
    ⇒ E(X) = N        (as before)

To calculate Var X, it's easier to find GX(s) = E(s^X) first.
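The values E(X) = N and Var X = N(N + 1) are easy to check by simulating the urn directly. The following minimal Python sketch (not part of the original notes; N = 4 and the trial count are illustrative choices) counts red draws before the first blue.

    # Monte Carlo sketch: reds drawn (with replacement) before the first blue,
    # from an urn with N red, N white, 1 blue.  Expect mean N and variance N(N+1).
    import random

    N = 4
    trials = 200_000
    samples = []
    for _ in range(trials):
        reds = 0
        while True:
            u = random.random() * (2 * N + 1)
            if u < 1:            # blue drawn: stop
                break
            if u < 1 + N:        # red drawn
                reds += 1
        samples.append(reds)

    mean = sum(samples) / trials
    var = sum((s - mean) ** 2 for s in samples) / trials
    print(f"mean {mean:.2f} (expect {N}),  variance {var:.2f} (expect {N*(N+1)})")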
Corollary 2 Consider the special case when the partition is given by another discrete random variable Z, so that we want E(X) and the partition is given by {Z = n}, n = 0, 1, 2, . . .. Then

    E(X) = Σ_{n=0}^{∞} E(X | Z = n)P(Z = n)

provided this sum converges absolutely. For those with an eye to mathematical elegance, this is more succinctly expressed as

    E(E(X | Z)) = E(X).
Example 12 In a commercial market garden, let Xi be the number of fruit produced by a plant which germinates from a seed and K be the number of seeds which germinate from a total of n seeds planted. Let Xi for i = 1, 2, . . . , K be independent random variables having Poisson distributions with mean µ, let K ∼ B(n, p), and let Z = X1 + . . . + XK be the total number of fruit which the commercial grower has from the planted seeds. Find the expected number of fruit and the variance.
Solution First of all:

    E(s^{Xi}) = Σ_{k=0}^{∞} s^k P(Xi = k)
              = Σ_{k=0}^{∞} s^k µ^k e^{−µ}/k!
              = e^{µs−µ}

Hence EXi = VarXi = µ. Also K ∼ B(n, p), hence:

    GK(s)   = E(s^K) = (q + ps)^n
    ⇒ E(K)  = G'K(1) = np
    ⇒ VarK  = G''K(1) + G'K(1) − (G'K(1))^2 = np(1 − p).

This gives:

    E(Z) = Σ_{k=0}^{n} E(Z | K = k)P(K = k)
         = Σ_{k=0}^{n} E(X1 + . . . + Xk)P(K = k)
         = Σ_{k=0}^{n} kE(X1)P(K = k)
         = Σ_{k=0}^{n} kµ C(n, k) p^k (1 − p)^{n−k}
         = µE(K)
         = µnp.
4 Lecture 4

4.1 Using Conditioning and Expectation Continued
Solution (continued)

    E(s^Z) = Σ_{k=0}^{n} E(s^Z | K = k)P(K = k)
           = Σ_{k=0}^{n} E(s^{X1 + . . . + Xk})P(K = k)                (X1 , . . . , Xk independent)
           = Σ_{k=0}^{n} (e^{µs−µ})^k C(n, k) p^k (1 − p)^{n−k}
           = (1 − p + pe^{µs−µ})^n
    ⇒ G'Z(s)  = µnp e^{µs−µ}(1 − p + pe^{µs−µ})^{n−1}
    ⇒ G''Z(s) = µnp.µe^{µs−µ}(1 − p + pe^{µs−µ})^{n−1}
                + µnp e^{µs−µ}(n − 1)pµe^{µs−µ}(1 − p + pe^{µs−µ})^{n−2}
    ⇒ VarZ    = G''Z(1) + G'Z(1) − (G'Z(1))^2
              = µ^2 np + µ^2 np^2 (n − 1) + µnp − µ^2 n^2 p^2
              = µ^2 np − µ^2 np^2 + µnp
              = µ^2 np(1 − p) + µnp
Alternatively we could find the variance directly. X1 , . . . , XK are independent (here all Poisson) and K (here Binomial) independent of the Xi . We require

    VarZ = Var(X1 + . . . + XK) = E(Z^2) − (E(Z))^2

Hence

    E(Z^2) = Σ_{k=0}^{n} E(Z^2 | K = k)P(K = k)
           = Σ_{k=0}^{n} E((X1 + . . . + Xk)^2)P(K = k)
           = Σ_{k=0}^{n} E(X1^2 + . . . + Xk^2 + X1 X2 + . . . + Xk−1 Xk)P(K = k)

where the squares give k terms and the cross products give k(k − 1) terms. Also

    E(X1^2) = VarX1 + µX^2 = µ + µ^2            for the k terms

and

    E(X1 X2) = E(X1).E(X2) = µ^2                for the k(k − 1) terms    (X1 , X2 independent)

So

    E(Z^2) = Σ_{k=0}^{n} (k(µ + µ^2) + k(k − 1)µ^2)P(K = k)
           = Σ_{k=0}^{n} (kµ + k^2 µ^2)P(K = k)
           = µ Σ_{k=0}^{n} kP(K = k) + µ^2 Σ_{k=0}^{n} k^2 P(K = k)
           = µE(K) + µ^2 (VarK + µK^2)
    ⇒ VarZ = µnp + µ^2 (np(1 − p) + n^2 p^2) − µ^2 n^2 p^2
           = µnp + µ^2 np(1 − p)
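Both the mean and the variance of the compound sum Z can be checked by simulation. The minimal Python sketch below (not part of the original notes; n = 20, p = 0.3, µ = 2.5 and the sampler name poisson are illustrative choices) draws K from a Binomial and then sums K Poisson counts.

    # Monte Carlo sketch: Z = X1 + ... + XK with K ~ B(n, p), Xi ~ Poisson(mu).
    # Expect E(Z) = mu*n*p and Var(Z) = mu^2*n*p*(1-p) + mu*n*p.
    import math, random

    n, p, mu = 20, 0.3, 2.5     # illustrative values
    trials = 100_000

    def poisson(m):
        # Knuth's product-of-uniforms Poisson sampler
        L, k, prod = math.exp(-m), 0, 1.0
        while True:
            prod *= random.random()
            if prod <= L:
                return k
            k += 1

    samples = []
    for _ in range(trials):
        K = sum(random.random() < p for _ in range(n))        # K ~ Binomial(n, p)
        samples.append(sum(poisson(mu) for _ in range(K)))    # Z = X1 + ... + XK

    mean = sum(samples) / trials
    var = sum((z - mean) ** 2 for z in samples) / trials
    print(f"mean {mean:.2f}  (mu*n*p = {mu*n*p:.2f})")
    print(f"variance {var:.2f}  (expected {mu**2*n*p*(1-p) + mu*n*p:.2f})")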
4.2 Random Samples
Definition 3 Let X1 , X2 , . . . , Xn denote n independent random variables,
each of which has the same distribution. These random variables are said to
constitute a random sample from the distribution.
Statistics often involves random samples where the distribution (“parent
distribution”) is unknown. A realisation of such a random sample is used to
make inferences about the distribution.
Definition 4 The sample mean is X̄ = (1/n) Σ_{i=1}^{n} Xi.
This is a key random variable which itself has an expectation and a variance.
Expectation of X̄

    E(X̄) = E((1/n) Σ_{i=1}^{n} Xi)
          = (1/n) Σ_{i=1}^{n} E(Xi)        (by linearity of E)
          = (1/n).nE(Xi)                   (Xi are identically distributed)
          = E(Xi)
          = µX , say

Variance of X̄

    VarX̄ = E((X̄ − µX)^2)
Lemma 3

    Var(X + Y ) = VarX + VarY + 2Cov(X, Y )

where the covariance is defined to be

    Cov(X, Y ) = E((X − µX)(Y − µY)).

Proof

    Var(X + Y ) = E((X + Y − µX − µY)^2)
                = E(((X − µX) + (Y − µY))^2)
                = E((X − µX)^2) + 2E((X − µX)(Y − µY)) + E((Y − µY)^2)        (by linearity of E)
                = VarX + 2Cov(X, Y ) + VarY

If X, Y are independent then Cov(X, Y ) = 0 since

    E((X − µX)(Y − µY)) = E(X − µX).E(Y − µY) = 0        (by independence)

By a straightforward, but cumbersome extension,

    Var(X1 + . . . + Xk) = Σ_{i=1}^{k} VarXi + Σ_{i=1}^{k} Σ_{j≠i} Cov(Xi , Xj)

and so if Xi , Xj are all independent then

    Var(X1 + . . . + Xk) = Σ_{i=1}^{k} VarXi.
So returning to X̄ we have

    VarX̄ = Var((1/n)(X1 + . . . + Xn))
          = (1/n^2) Var(X1 + . . . + Xn)
          = (1/n^2) Σ_{i=1}^{n} VarXi        (since Xi independent)
          = (1/n^2).nVar(Xi)                 (since X1 , . . . , Xn identically distributed)
          = (1/n) VarXi.

This is the variance of X̄ for a random sample.

Example 13 Let X1 , . . . , Xn be a random sample from a Bernoulli distribution with parameter p. E(Xi) = p, VarXi = p(1 − p). Hence E(X̄) = p and Var(X̄) = p(1 − p)/n.
4.3 Sums of Random Variables
Because the sample mean X̄ is important, it is generally the case that the sum
X1 + . . . + Xn is also important.
Example 14 Sum of two independent Poisson variables. Suppose X1 ∼ Poi(µ), X2 ∼ Poi(µ). What is the distribution of Z = X1 + X2 ?

Solution

    E(s^{X1}) = Σ_{k=0}^{∞} s^k P(X1 = k)
              = Σ_{k=0}^{∞} s^k µ^k e^{−µ}/k!
              = e^{µs−µ}

Hence

    GX1+X2(s) = E(s^{X1+X2})
              = E(s^{X1}.s^{X2})
              = E(s^{X1}).E(s^{X2})            (since X1 , X2 are independent)
              = e^{µ(s−1)}.e^{µ(s−1)}
              = e^{2µ(s−1)}
    ⇒ X1 + X2 ∼ Poi(2µ)

Similarly for Xi ∼ Poi(µ), random sample X1 , . . . , Xn :

    GX1+...+Xn(s) = E(s^{X1+...+Xn})
                  = E(s^{X1}).E(s^{X2}). . . . .E(s^{Xn})        (by independence)
                  = e^{nµ(s−1)}
    ⇒ X1 + . . . + Xn ∼ Poi(nµ)
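A simulation gives a quick distributional check of the Poisson addition result. The minimal Python sketch below (not part of the original notes; µ = 1.5 and the sampler name poisson are illustrative choices) compares the empirical distribution of X1 + X2 with the Poi(2µ) probabilities.

    # Monte Carlo sketch: the sum of two independent Poisson(mu) counts should be Poisson(2*mu).
    import math, random

    mu = 1.5                    # illustrative value
    trials = 200_000

    def poisson(m):
        # Knuth's product-of-uniforms Poisson sampler
        L, k, prod = math.exp(-m), 0, 1.0
        while True:
            prod *= random.random()
            if prod <= L:
                return k
            k += 1

    counts = {}
    for _ in range(trials):
        z = poisson(mu) + poisson(mu)
        counts[z] = counts.get(z, 0) + 1

    for z in range(6):
        empirical = counts.get(z, 0) / trials
        exact = math.exp(-2 * mu) * (2 * mu) ** z / math.factorial(z)
        print(f"P(Z={z}):  simulated {empirical:.4f}   Poi(2mu) {exact:.4f}")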
A summary of discrete random variables is included in Table 1 below.
Finally:

    E(X)  = G'X(1)
    VarX  = G''X(1) + G'X(1) − (G'X(1))^2
Table 1: Summary of Discrete Random Variables

Bernoulli B(1, p)
    Mass function: P(X = 1) = p, P(X = 0) = 1 − p = q;  range {0, 1}
    Generating function: GX(s) = q + ps;  mean p;  variance pq

Binomial B(n, p)  (sum of n independent i.d. Bernoulli)
    Mass function: P(X = k) = C(n, k) p^k q^{n−k},  k = 0, 1, . . . , n
    Generating function: GX(s) = (q + ps)^n;  mean np;  variance np(1 − p) = npq

Poisson Poi(µ)  (sum of independent Poisson is Poisson)
    Mass function: P(X = k) = µ^k e^{−µ}/k!,  k = 0, 1, 2, . . .
    Generating function: GX(s) = e^{µ(s−1)};  mean µ;  variance µ

Geometric (a) min X = 1
    Mass function: P(X = k) = q^{k−1} p,  k = 1, 2, . . .
    Generating function: GX(s) = ps/(1 − qs);  mean 1/p;  variance q/p^2

Geometric (b) min X = 0
    Mass function: P(X = k) = q^k p,  k = 0, 1, . . .
    Generating function: GX(s) = p/(1 − qs);  mean q/p;  variance q/p^2

Negative Binomial  (sum of k independent Geometric type (a))
    Mass function: P(X = n) = C(n − 1, k − 1) p^k q^{n−k},  n = k, k + 1, k + 2, . . .
    Generating function: GX(s) = (ps/(1 − qs))^k;  mean k/p;  variance kq/p^2
5 Lecture 5

5.1 Continuous Random Variables

5.1.1 Reminder: Discrete Random Variables
A discrete random variable is:

    X : Ω −→ a countable subset (could be finite) of R

For each outcome there is an associated real number which is the observed value of X, e.g. toss a fair coin 3 times so that p(H) = 1/2, and X is the number of heads, then

    X : {T T T, HT T, . . . , HHH} −→ {0, 1, 2, 3}

and for example

    P(X = 2) = P({HHT, HT H, T HH}) = 3.(1/2)^3

X has a probability mass function (pmf):

    P(X = x) = px , for x in the image of X, with px > 0 and Σ_x px = 1

Then if the coin is fair, we know p0 = p3 = 1/8 and p1 = p2 = 3/8.
There is a fundamental function which we can also define which is shared
with continuous random variables.
Definition 5 The Cumulative Distribution Function (cdf) of a random variable X is the function FX , with domain R and codomain [0, 1], defined by

    FX(x) = P(X ≤ x)    ∀x ∈ R
Example 15 For a fair coin, tossed 3 times, p0 = 1/8, p1 = p2 = 3/8, p3 = 1/8. Then

    P(X ≤ x) = FX(x) =   0                        ,  x < 0
                         1/8                      ,  0 ≤ x < 1
                         1/8 + 3/8 = 1/2          ,  1 ≤ x < 2
                         1/8 + 3/8 + 3/8 = 7/8    ,  2 ≤ x < 3
                         1                        ,  x ≥ 3
Properties of F (cdf)

1. Non-decreasing,
2. 0 ≤ F (x) ≤ 1,
3. As x → −∞, F (x) → 0; as x → ∞, F (x) → 1,
4. F (x) is right continuous, i.e. lim_{h↓0} F (x + h) = F (x),
5. P(X ∈ (a, b ]) = F (b) − F (a).
5.1.2 Continuous Random Variables
Definition 6 If X associates a real number with each possible outcome in the
sample space Ω (X : Ω → R) in such a way that FX (x) exists for all x and is
left continuous (as well as right continuous), then X is said to be a continuous
random variable.
What this actually means for us is that FX (x) is continuous for all x. In
this course for continuous random variables, FX (x) is not only continuous but
0
also differentiable almost everywhere. Suppose f (x) = FX
(x), then
P(X ∈ (x − h, x ])
= FX (x) − FX (x − h)
0
= FX (x) − FX (x) + hFX
(x) + O(h2 )
by taking the Taylor expansion. Taking the limit as h → 0 gives
lim
h↓0
P(X ∈ (x − h, x ])
0
= fX (x)(= FX
(x))
h
Definition 7 If FX(x) is differentiable (almost everywhere) and is continuous then fX(x) = F'X(x) = dFX/dx is called the probability density function (pdf) of X.
The Fundamental Theorem of Calculus then gives:

    P(X ∈ (a, b ]) = F (b) − F (a) = ∫_a^b fX(x)dx

Hence

    P(X < ∞) = "F (∞) − F (−∞)" = ∫_{−∞}^{∞} fX(x)dx = 1

Additionally, F being non-decreasing gives that F'X(x) = fX(x) ≥ 0.

N.B. If F is a step function, then X is a discrete random variable.
5.1.3 Properties: pdf's and pmf's

    pdf (continuous)                        pmf (discrete)
    fX(x) ≥ 0   ∀x ∈ R                      px ≥ 0   ∀x in image of X
    ∫_{−∞}^{∞} fX(x)dx = 1                  Σ_x px = 1
    FX(x) = ∫_{−∞}^{x} fX(x)dx              FX(x) = Σ_{k≤x} pk
Why do we need continuous random variables with their pdf’s?
A Suppose you wish to calculate the number of working adults who regularly contribute to charity. You might model this number as X out of n, where n is the total number of working adults in the UK. We could, in theory, model this as a binomial B(n, p) where p = P(adult contributes), but n is measured in millions. So instead consider Y ≈ X/n as a continuous random variable, the "proportion", with observations in [0, 1].
B Some outcomes are essentially continuous. Suppose you are making a
precision part for a piece of sophisticated equipment (e.g. a NASA rocket)
and a specific length is required. There will be tolerances below and above
which parts have to be rejected. Modern measuring techniques imply
looking at a continuous outcome for the length of the part.
Example 16 In one gram of soil there are possibly 10^10 microbes. If one was
looking for the number of the most abundant species, one would not use a discrete
r.v. Instead, we would use a continuous r.v., say X to be the proportion of the
most abundant species. Hence X takes values in (0, 1 ).
Earthquake data We have data on the time in days between successive serious earthquakes worldwide which measured at least 7.5 on the Richter Scale, or
killed over 1000 people. Data was recorded between 16/12/1902 and 4/03/1977
for 63 earthquakes, which gives 62 recorded waiting times, with a minimum
of 9 days and a maximum of 1901 days. If earthquakes occur at random, an
exponential model should fit.
Histogram Area is proportional to the frequency of times within the interval
on the time axis (x-axis).
6 Lecture 6

6.1 Continuous Random Variables Continued
Example 17 Due to variations in a commercial coffee blender, the proportion of coffee in a coffee-chicory mix is represented by a continuous random variable X with pdf

    fX(x) = cx^2 (1 − x) ,  x ∈ [0, 1]
            0            ,  otherwise

Find the constant c and an expression for the cdf.

Solution

• The constant, c:

    1 = ∫_{−∞}^{∞} fX(x)dx
      = ∫_0^1 cx^2 (1 − x)dx
      = c [x^3/3 − x^4/4]_0^1
    ⇒ c = 12

• The cdf:

    FX(x) = ∫_{−∞}^{x} fX(x)dx
          =   0               ,  x < 0
              ∫_0^x fX(x)dx   ,  0 ≤ x < 1
              1               ,  x ≥ 1

Since

    ∫_0^x 12x^2 (1 − x)dx = 12 (x^3/3 − x^4/4)

    ⇒ FX(x) =   0                       ,  x < 0
                12 (x^3/3 − x^4/4)      ,  0 ≤ x < 1
                1                       ,  x ≥ 1
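Both the normalising constant and the cdf can be checked by numerical integration. The following minimal Python sketch (not part of the original notes; the step count and test points are illustrative, and f and integrate are names chosen here) uses a simple trapezium rule.

    # Numerical sketch: check that c = 12 normalises f(x) = c x^2 (1 - x) on [0, 1]
    # and that F(x) = 12(x^3/3 - x^4/4) matches numerical integration of the pdf.

    def f(x, c=12.0):
        return c * x**2 * (1 - x) if 0.0 <= x <= 1.0 else 0.0

    def integrate(g, a, b, steps=10_000):
        # simple trapezium rule
        h = (b - a) / steps
        total = 0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, steps))
        return total * h

    print("integral of pdf over [0,1]:", round(integrate(f, 0, 1), 6))   # should be 1
    for x in (0.25, 0.5, 0.9):
        F_exact = 12 * (x**3 / 3 - x**4 / 4)
        F_numeric = integrate(f, 0, x)
        print(f"F({x}) exact {F_exact:.6f}   numerical {F_numeric:.6f}")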
Example 18 The duration in minutes of mobile phone calls made by students is represented by a random variable, X, with pdf

    fX(x) = (1/6) e^{−x/6} ,  x > 0
            0              ,  otherwise.

What is the probability that a call lasts:

(i) between 3 and 6 minutes,
(ii) more than 6 minutes?

Solution

(i)
    P(X ∈ (3, 6)) = ∫_3^6 fX(x)dx
                  = ∫_3^6 (1/6) e^{−x/6} dx
                  = [−e^{−x/6}]_3^6
                  = e^{−1/2} − e^{−1}

(ii)
    P(X > 6) = ∫_6^∞ fX(x)dx
             = ∫_6^∞ (1/6) e^{−x/6} dx
             = [−e^{−x/6}]_6^∞
             = e^{−1}
6.1.1 Expectation of a Continuous Random Variable

Recall the definitions of mean and variance for a discrete r.v.:

    µX   = E(X) = Σ_x x px
    σX^2 = VarX = Σ_x (x − µX)^2 px = E(X^2) − µX^2

and in general that E(h(X)) = Σ_x h(x) px.
Definition 8 Let X be a continuous r.v. with pdf fX and let h be a real function such that ∫_{−∞}^{∞} h(x)fX(x)dx exists and is absolutely convergent, i.e.

    ∫_{−∞}^{∞} | h(x) | fX(x)dx < ∞.

Then we define the expectation of h(X) to be:

    E(h(X)) = ∫_{−∞}^{∞} h(x)fX(x)dx

In particular

(a) mean

    µX = E(X) = ∫_{−∞}^{∞} x fX(x)dx

(b) variance

    σX^2 = VarX = E(X^2) − µX^2,    where E(X^2) = ∫_{−∞}^{∞} x^2 fX(x)dx.
Properties of expectation (assuming all exist as appropriate)

1. If c is a constant and h(x) = c for all x, then

    E(c) = ∫_{−∞}^{∞} c fX(x)dx = c ∫_{−∞}^{∞} fX(x)dx = c.

2. E(ch(X)) = cE(h(X)) (from properties of integrals).

3. E(c1 h1(X) + c2 h2(X)) = c1 E(h1(X)) + c2 E(h2(X)) (from properties of integrals).

These properties hold whether X is continuous or discrete.

[NB E(s^{a+X}) = E(s^a.s^X) = s^a E(s^X) because s is a real-valued variable rather than a random variable.]
Example 19 An archer fires at a target which has four concentric circles of radii 15, 30, 45 and 60 cm. A shot is subject to error and the distance in cm from the centre is represented by a random variable X with pdf:

    fX(x) = (x/100) e^{−x/10} ,  x > 0
            0                 ,  x < 0

Find

(i) The expectation of X
(ii) The probability that he hits gold (inside the centre circle)
(iii) The variance of X.

Solution

(i)
    E(X) = ∫_0^∞ (x^2/100) e^{−x/10} dx
         = (1/100)[−10x^2 e^{−x/10}]_0^∞ + (1/10) ∫_0^∞ 2x e^{−x/10} dx

Then, since ∫_0^∞ (x/100) e^{−x/10} dx = 1, this gives E(X) = 2 × 10 = 20. (Alternatively integrate again by parts.)

(ii)
    P(X ≤ 15) = ∫_0^{15} (x/100) e^{−x/10} dx
              = (1/100)[−10x e^{−x/10} + 10(−10e^{−x/10})]_0^{15}
              = (1/100)((−150 − 100)e^{−1.5} + 100)
              = 1 − 2.5e^{−1.5}
              = 0.442 . . .

(iii)
    E(X^2) = ∫_0^∞ (x^3/100) e^{−x/10} dx
           = [−(x^3/100).10e^{−x/10}]_0^∞ + (3/10) ∫_0^∞ x^2 e^{−x/10} dx
           = 30 × 20
           = 600
    ⇒ VarX = E(X^2) − (E(X))^2 = 600 − 400 = 200

Standard deviation is √VarX and this is 10√2.
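The three answers can be checked by simulation. This pdf is the density of the sum of two independent exponential variables with mean 10, so the minimal Python sketch below (not part of the original notes; the trial count is an illustrative choice) simulates X that way.

    # Monte Carlo sketch for the archer example: f(x) = (x/100) e^{-x/10} is the density of
    # the sum of two independent Exponential(mean 10) variables.
    # Check E(X) = 20, Var(X) = 200 and P(hit gold) = P(X <= 15) = 1 - 2.5 e^{-1.5}.
    import math, random

    trials = 200_000
    xs = [random.expovariate(0.1) + random.expovariate(0.1) for _ in range(trials)]

    mean = sum(xs) / trials
    var = sum((x - mean) ** 2 for x in xs) / trials
    p_gold = sum(x <= 15 for x in xs) / trials
    print(f"mean {mean:.2f} (expect 20),  variance {var:.1f} (expect 200)")
    print(f"P(X <= 15) simulated {p_gold:.4f}   exact {1 - 2.5*math.exp(-1.5):.4f}")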
Example 20 A gambling game works as follows:
Bets are placed on the position of a light on the horizontal axis. The light is
generated by an arm which rotates with uniform speed. It is randomly stopped
and the light appears at P .
(i) What is the expected distance of the light from O?
(ii) Where should you bet if the winner is the one whose guess is closest to the
light when it stops?
Answer

(i) Let Θ be a random variable representing the angle when the arm is stopped. Then Θ ∼ U (0, 2π), the uniform distribution on (0, 2π), and so

        fΘ(θ) = 1/(2π) ,  0 ≤ θ < 2π
                0      ,  otherwise

    Then the length OP is | cos Θ |, hence

        E(| cos Θ |) = ∫_0^{2π} | cos θ | (1/(2π)) dθ
                     = 4 ∫_0^{π/2} (1/(2π)) cos θ dθ
                     = (2/π) [sin θ]_0^{π/2}
                     = 2/π

(ii) Intuition suggests that we should bet on the extremities of the diameter. (Why?) Can we derive this?
6.1.2 Functions of Random Variables and their pdfs
We have Θ in the previous example, together with its pdf. Can we find the pdf
of the position on the horizontal diameter?
Example 21 1-1 transformation: Let R represent the distance from a given tree to its nearest neighbour in a forest. Suppose it has pdf:

    fR(r) = (r/λ) e^{−r^2/2λ} ,  r > 0
            0                 ,  elsewhere

Find the distribution of the "tree-free" area A around the given tree, so that A = πR^2.

Solution We use the cdf, FR(r) = ∫_{−∞}^{r} fR(r)dr.

    FR(r) = P(R ≤ r)
          =   0                       ,  r < 0
              [−e^{−r^2/2λ}]_0^r      ,  r ≥ 0
          =   0                       ,  r < 0
              1 − e^{−r^2/2λ}         ,  r ≥ 0

Hence

    FA(a) = P(A ≤ a)
          = P(πR^2 ≤ a)
          = P(R ≤ √(a/π))
          =   0                 ,  a < 0
              1 − e^{−a/2πλ}    ,  a ≥ 0

    ⇒ fA(a) =   0                         ,  a < 0
                (1/2πλ) e^{−a/2πλ}        ,  a ≥ 0

Hence A is distributed exponentially, with mean 2πλ.

The method used is to consider the cdf, and re-write it in terms of A.
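The transformation can be checked by simulation: sample R by inverting its cdf and then look at A = πR^2. The minimal Python sketch below (not part of the original notes; λ = 3 and the test value a0 = 10 are illustrative choices) compares the sample with the exponential distribution of mean 2πλ.

    # Monte Carlo sketch for Example 21: sample R via r = sqrt(-2*lam*ln(1-u)),
    # then check that A = pi*R^2 has mean 2*pi*lam and cdf 1 - e^{-a/(2*pi*lam)}.
    import math, random

    lam = 3.0                   # illustrative value
    trials = 200_000
    areas = []
    for _ in range(trials):
        u = random.random()
        r = math.sqrt(-2 * lam * math.log(1 - u))   # inverse of F_R
        areas.append(math.pi * r * r)

    mean = sum(areas) / trials
    a0 = 10.0
    p_sim = sum(a <= a0 for a in areas) / trials
    p_exact = 1 - math.exp(-a0 / (2 * math.pi * lam))
    print(f"mean area {mean:.2f} (expect {2*math.pi*lam:.2f})")
    print(f"P(A <= {a0}) simulated {p_sim:.4f}   exponential cdf {p_exact:.4f}")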
7 Lecture 7

7.1 Continuous Random Variables Continued

7.1.1 Functions of Random Variables and their pdfs Continued
Example 22 (extra) Let X be a random variable with pdf:

    fX(x) = 4xe^{−2x} ,  x > 0
            0         ,  elsewhere

Find

(a) FX(x)
(b) E(X) and E(1/X).

Solution

(a)
    FX(x) =   ∫_{−∞}^{x} 0 dx = 0                           ,  x < 0
              ∫_0^x 4xe^{−2x} dx = 1 − (1 + 2x)e^{−2x}      ,  x > 0

(b)

(i) E(X):
    E(X) = ∫_{−∞}^{∞} x fX(x)dx
         = ∫_0^∞ x.4xe^{−2x} dx
         = 1

(ii) E(1/X):
    E(1/X) = ∫_{−∞}^{∞} (1/x) fX(x)dx
           = ∫_0^∞ 4e^{−2x} dx
           = [−2e^{−2x}]_0^∞
           = 2

NB in general E(1/X) ≠ 1/E(X).
Example 23 transformation which is not 1-1: Let Θ be the random variable (as before) with pdf:

    fΘ(θ) = 1/(2π) ,  0 ≤ θ < 2π
            0      ,  elsewhere

Let X be the position on the diameter (as before), so that X = cos Θ. Find the pdf of X.

Solution

    FΘ(θ) = P(Θ ≤ θ)
          =   0        ,  θ < 0
              θ/(2π)   ,  0 ≤ θ < 2π
              1        ,  θ ≥ 2π

We want FX(x) = P(X ≤ x) = P(cos Θ ≤ x). We note that observable values of x satisfy −1 ≤ x ≤ 1, and that for each x there are 2 values of θ. Hence, selecting arccos x ∈ [0, π]:

    FX(x) = P(cos Θ ≤ x)
          = P(arccos x ≤ Θ ≤ 2π − arccos x)            (2 values of θ for each x)
          = FΘ(2π − arccos x) − FΘ(arccos x)
          = (1/2π)(2π − arccos x) − (1/2π) arccos x
          = (1/2π)(2π − 2 arccos x)
          = 1 − (1/π) arccos x                          (for −1 ≤ x ≤ 1)

    ⇒ fX(x) = 1/(π√(1 − x^2)) ,  −1 < x < 1    (undefined at x = ±1)
              0               ,  otherwise

Hence intuition is satisfied in Example 20 as fX → ∞ as x → ±1 (even though the area under the curve is 1).
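The cdf just derived can be checked against a simulation of X = cos Θ. The minimal Python sketch below (not part of the original notes; the test points are illustrative choices) compares the empirical cdf with 1 − arccos(x)/π.

    # Monte Carlo sketch for Example 23: X = cos(Theta), Theta ~ U(0, 2*pi);
    # compare the empirical cdf with F_X(x) = 1 - arccos(x)/pi.
    import math, random

    trials = 200_000
    xs = [math.cos(random.uniform(0, 2 * math.pi)) for _ in range(trials)]

    for x in (-0.9, -0.5, 0.0, 0.5, 0.9):
        empirical = sum(v <= x for v in xs) / trials
        exact = 1 - math.acos(x) / math.pi
        print(f"F({x:+.1f}): simulated {empirical:.4f}   formula {exact:.4f}")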
A summary of common continuous distributions is shown in Table 2 below.
Table 2: Common Continuous Distributions

Exponential (exp(λ))
    pdf: fX(x) = λ e^{−λx}, x > 0;   cdf: FX(x) = 1 − e^{−λx}, x > 0
    mean 1/λ;   variance 1/λ^2

Gamma (Γ(α, λ))
    pdf: fX(x) = (λ^α/Γ(α)) x^{α−1} e^{−λx}, x > 0
    mean α/λ;   variance α/λ^2

Normal (N (µ, σ^2))
    pdf: fX(x) = (1/√(2πσ^2)) e^{−(x−µ)^2/2σ^2}, x ∈ R;   cdf: Φ((x − µ)/σ)
    mean µ;   variance σ^2

Standard Normal (N (0, 1))
    pdf: fX(x) = (1/√(2π)) e^{−x^2/2}, x ∈ R;   cdf: Φ(x) = ∫_{−∞}^{x} (1/√(2π)) e^{−x^2/2} dx
    mean 0;   variance 1

Uniform (U (a, b))
    pdf: fX(x) = 1/(b − a), a < x < b;   cdf: FX(x) = (x − a)/(b − a), a < x < b
    mean (a + b)/2;   variance (b − a)^2/12

• Exponential (Gamma also) distributions can arise from measuring "lifetime", e.g. genuinely the life of a person.

• Normal distributions can arise from very many distributions in the following way: generate 100 samples, each containing 62 observations, from an exponential distribution with (parameter) mean of 437, i.e. X ∼ exp(1/437). Calculate x̄ = (1/62) Σ xi for each of the 100 samples. Then x̄ is approximately drawn from a normal distribution with mean 437 and standard deviation 437/√62.
7.1.2 The Normal Distribution

A Normally distributed random variable, X ∼ N (µ, σ^2), has pdf:

    fX(x) = (1/√(2πσ^2)) e^{−(x−µ)^2/2σ^2} ,    x ∈ R

If we set Z = (X − µ)/σ, then

    fZ(z) = (1/√(2π)) e^{−z^2/2} ,    z ∈ R

hence Z ∼ N (0, 1), where N (0, 1) is the standard normal distribution.
Z ∞
z2
To justify the normalising constant: Let I =
e− 2 dz, then
−∞
I
2
Z
∞
=
2
e
Z
− z2
−∞
∞ Z ∞
=
−∞
Z
∞
2
dz
e
− y2
dy
−∞
1
e− 2 (y
2
+z 2 )
dydz.
−∞
∂(y, z) = r, and
Let y = r cos θ, z = r sin θ then | J |= ∂(r, θ) I2
Z
∞
Z
2π
e−
=
r=0
=
h
−e−
θ=0
i∞
r2
2
0
Z
∞
⇒
−∞
z2
1
√ e− 2 dx
2π
=
2π
=
1
37
r2
2
rdrdθ
2π
. [θ]0
Mean and Variance:

• N (0, 1):

    E(Z) = ∫_{−∞}^{∞} (1/√(2π)) z e^{−z^2/2} dz

  The integrand is odd, hence E(Z) = 0. Hence:

    VarZ = E(Z^2)
         = (1/√(2π)) ∫_{−∞}^{∞} z^2 e^{−z^2/2} dz

  (setting du/dz = z e^{−z^2/2}, v = z)

         = (1/√(2π)) [−z e^{−z^2/2}]_{−∞}^{∞} + (1/√(2π)) ∫_{−∞}^{∞} e^{−z^2/2} dz
         = 0 + 1

  So VarZ = 1.

• N (µ, σ^2):

    X ∼ N (µ, σ^2)  ⟺  Z = (X − µ)/σ ∼ N (0, 1).

  So:

    E(X) = E(σZ + µ) = σE(Z) + µ = µ        (E is linear)

  and

    VarX = Var(σZ + µ)
         = E((σZ + µ − µ)^2)
         = E(σ^2 Z^2)
         = σ^2 E(Z^2)
         = σ^2
N (0, 1) cdf: Let

    Φ(z) = ∫_{−∞}^{z} (1/√(2π)) e^{−z'^2/2} dz'        (standard notation)
         = P(Z ≤ z)
         = FZ(z)

Then for X ∼ N (µ, σ^2)

    FX(x) = P(X ≤ x)
          = P(σZ + µ ≤ x)
          = P(Z ≤ (x − µ)/σ)
          = Φ((x − µ)/σ)
8 Lecture 8

8.1 Continuous Random Variables Continued

8.1.1 Jointly Continuous Distributions
Definition 9 Let X be a continuous r.v. with pdf fX(x) and cdf FX(x) = ∫_{−∞}^{x} fX(x)dx, and similarly define Y , fY(y) and FY(y). Then X, Y are jointly continuous if ∃ fX,Y such that ∀x, y ∈ R

    FX,Y(x, y) = ∫_{−∞}^{y} ∫_{−∞}^{x} fX,Y(x, y) dx dy = P(X ≤ x, Y ≤ y).

Then fX,Y(x, y) is called the joint density function of X and Y .

For example, in 1 gram of soil, let X be the proportion of the largest species of microbes, and Y be the proportion of the second largest. Certainly not independent since X ≥ Y ≥ 0 and X + Y ≤ 1.
Properties

1. fX,Y ≥ 0,  ∀x, y

2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y(x, y) dx dy = 1
Example 24 Let the joint pdf of X and Y be

    f (x, y) = (1/2)(x + y) ,  x ∈ [0, 1], y ∈ [1, 2]
               0            ,  otherwise

Is f (x, y) a pdf ?

Solution

1. Clearly f ≥ 0.

2.
    ∫_{−∞}^{∞} ∫_{−∞}^{∞} f dx dy = ∫_{x=0}^{1} ∫_{y=1}^{2} (1/2)(x + y) dy dx
                                  = ∫_0^1 [xy/2 + y^2/4]_{y=1}^{2} dx
                                  = ∫_0^1 ((1/2)x + 3/4) dx
                                  = [(1/4)x^2 + (3/4)x]_0^1
                                  = 1
Marginal pdf

Consider the previous example. Suppose we wish to find P(X > 1/2). There are two methods:

1.
    P(X > 1/2) = ∫_{x=1/2}^{1} ∫_{y=1}^{2} (1/2)(x + y) dy dx
               = ∫_{1/2}^{1} ((1/2)x + 3/4) dx
               = 9/16

2. Alternatively, find the marginal distribution of X by integrating Y out:

    fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy
          = ∫_1^2 (1/2)(x + y) dy
          = (1/2)x + 3/4 ,    0 ≤ x ≤ 1

   Then

    P(X > 1/2) = ∫_{1/2}^{1} fX(x) dx = 9/16
Definition 10 If X, Y have joint density function fX,Y(x, y) then the marginal pdf of X is

    fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy

and similarly for Y , the marginal pdf is

    fY(y) = ∫_{−∞}^{∞} fX,Y(x, y) dx.
8.1.2 Expectation with Joint Distributions

    E(h(X, Y )) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(x, y) fX,Y(x, y) dx dy

In particular

    Cov(X, Y ) = E((X − µX)(Y − µY)) = E(XY ) − µX µY

where µX = E(X) and µY = E(Y ).
Example 25 As in the previous example, let X and Y have joint pdf

    f (x, y) = (1/2)(x + y) ,  x ∈ [0, 1], y ∈ [1, 2]
               0            ,  otherwise

Find

(i) E(X), E(Y )
(ii) Cov(X, Y )

Solution

(i)
    E(X) = ∫_0^1 ∫_1^2 x (1/2)(x + y) dy dx
         = 13/24

or

    E(X) = ∫_0^1 x fX(x) dx
         = ∫_0^1 x ((1/2)x + 3/4) dx
         = 13/24

Similarly

    E(Y ) = ∫_0^1 ∫_1^2 y (1/2)(x + y) dy dx
          = 37/24

(ii)
    E(XY ) = ∫_0^1 ∫_1^2 xy (1/2)(x + y) dy dx
           = [x^3/6]_0^1 [y^2/2]_1^2 + [x^2/2]_0^1 [y^3/6]_1^2
           = (1/6)(2 − 1/2) + (1/2)(8/6 − 1/6)
           = 5/6

    ⇒ Cov(X, Y ) = E(XY ) − µX µY
                 = 5/6 − (13/24)(37/24)
                 = −1/576

Note that E(XY ) ≠ E(X)E(Y ) and so X, Y are not independent. If X and Y are independent, then Cov(X, Y ) = 0 as in the discrete case.
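The four quantities above can be confirmed by numerical double integration. The minimal Python sketch below (not part of the original notes; the midpoint grid size and the helper name double_integral are illustrative choices) integrates g(x, y) f(x, y) over [0, 1] × [1, 2].

    # Numerical sketch for Example 25: midpoint-rule double integrals of g(x,y)*(x+y)/2
    # over [0,1] x [1,2], checking E(X), E(Y), E(XY) and Cov(X, Y) = -1/576.

    def double_integral(g, steps=400):
        hx, hy = 1.0 / steps, 1.0 / steps      # x in [0,1], y in [1,2]
        total = 0.0
        for i in range(steps):
            x = (i + 0.5) * hx
            for j in range(steps):
                y = 1.0 + (j + 0.5) * hy
                total += g(x, y) * 0.5 * (x + y)
        return total * hx * hy

    EX = double_integral(lambda x, y: x)
    EY = double_integral(lambda x, y: y)
    EXY = double_integral(lambda x, y: x * y)
    print(f"E(X)  {EX:.5f}  (13/24 = {13/24:.5f})")
    print(f"E(Y)  {EY:.5f}  (37/24 = {37/24:.5f})")
    print(f"E(XY) {EXY:.5f} (5/6  = {5/6:.5f})")
    print(f"Cov   {EXY - EX*EY:.6f}  (-1/576 = {-1/576:.6f})")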
8.1.3 Independence

Definition 11 Let the continuous random variables X, Y have joint pdf fX,Y and respective marginal pdfs fX , fY . Then X and Y are independent if

    fX,Y(x, y) = fX(x) fY(y)

for almost all points (x, y) in R^2.
Example 26 Are X, Y independent if

(a) fX,Y(x, y) = 2, for 0 < x < y < 1,

(b) fX,Y(x, y) = x + y, for 0 < x < 1, 0 < y < 1,

(c) fX,Y(x, y) = (1/2π) e^{−(x^2+y^2)/2}, for x, y ∈ R?

Solution

(a)
    fX(x) = ∫_x^1 2 dy = 2(1 − x),    0 < x < 1
    fY(y) = ∫_0^y 2 dx = 2y,          0 < y < 1

    Hence fX,Y ≠ fX fY.

(b)
    fX(x) = ∫_0^1 (x + y) dy = x + 1/2,    0 < x < 1
    fY(y) = ∫_0^1 (x + y) dx = 1/2 + y,    0 < y < 1

    Hence fX,Y ≠ fX fY.

(c)
    fX(x) = ∫_{−∞}^{∞} (1/√(2π)) e^{−x^2/2} (1/√(2π)) e^{−y^2/2} dy
          = (1/√(2π)) e^{−x^2/2} ∫_{−∞}^{∞} (1/√(2π)) e^{−y^2/2} dy        (the integral is 1)
          = (1/√(2π)) e^{−x^2/2},    x ∈ R

    Similarly fY(y) = (1/√(2π)) e^{−y^2/2} for y ∈ R. Hence fX,Y = fX fY everywhere in R^2.

Hence X, Y are independent in (c) only.
8.1.4 Random Samples for Continuous Random Variables

Recall that X1 , . . . , Xn are a random sample if they are independent and identically distributed with pdf f (x), say. Then the joint pdf is

    fX1,...,Xn(x1 , . . . , xn) = f (x1) . . . f (xn) = Π_{i=1}^{n} f (xi)

Mean of Sample X̄ = (1/n)(X1 + . . . + Xn)

• E(X̄) = (1/n) Σ_{i=1}^{n} E(Xi) = E(Xi), as before.

• Var(X̄) = Var((1/n) Σ_{i=1}^{n} Xi)        (NB Cov(Xi , Xj) = 0)
          = (1/n^2) Σ_{i=1}^{n} VarXi
          = (1/n) VarXi

  since the Xi are independent and identically distributed.
Example 27 Suppose that Xi ∼ U (a, b) and that our random sample is X1 , . . . , Xn . Find the mean and variance of X̄.

Solution We quote the following results:

    E(Xi) = (b + a)/2 ,    VarXi = (b − a)^2/12.

So

    E(X̄) = (a + b)/2    and    VarX̄ = (b − a)^2/(12n).
43