Download Probability

Document related concepts
no text concepts found
Transcript
Probability
Prof. F.P. Kelly
Lent 1996
These notes are maintained by Paul Metcalfe.
Comments and corrections to [email protected].
Revision: 1.1
Date: 2002/09/20 14:45:43
The following people have maintained these notes.
– June 2000
June 2000 – July 2004
July 2004 – date
Kate Metcalfe
Andrew Rogers
Paul Metcalfe
Contents
Introduction
1
2
3
4
5
6
v
Basic Concepts
1.1 Sample Space . . . . .
1.2 Classical Probability .
1.3 Combinatorial Analysis
1.4 Stirling’s Formula . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
1
1
1
2
2
The Axiomatic Approach
2.1 The Axioms . . . . . .
2.2 Independence . . . . .
2.3 Distributions . . . . .
2.4 Conditional Probability
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
5
5
7
8
9
Random Variables
3.1 Expectation . . . . . . . . . .
3.2 Variance . . . . . . . . . . . .
3.3 Indicator Function . . . . . . .
3.4 Inclusion - Exclusion Formula
3.5 Independence . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
11
11
14
16
18
18
Inequalities
4.1 Jensen’s Inequality . . . .
4.2 Cauchy-Schwarz Inequality
4.3 Markov’s Inequality . . . .
4.4 Chebyshev’s Inequality . .
4.5 Law of Large Numbers . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
23
23
26
27
27
28
Generating Functions
5.1 Combinatorial Applications . . . . . .
5.2 Conditional Expectation . . . . . . .
5.3 Properties of Conditional Expectation
5.4 Branching Processes . . . . . . . . .
5.5 Random Walks . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
31
34
34
35
37
42
Continuous Random Variables
6.1 Jointly Distributed Random Variables . . . . . . . . . . . . . . . . .
6.2 Transformation of Random Variables . . . . . . . . . . . . . . . . . .
6.3 Moment Generating Functions . . . . . . . . . . . . . . . . . . . . .
47
50
57
64
.
.
.
.
.
.
.
.
.
.
iii
iv
CONTENTS
6.4
6.5
Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . .
Multivariate normal distribution . . . . . . . . . . . . . . . . . . . .
67
71
Introduction
These notes are based on the course “Probability” given by Prof. F.P. Kelly in Cambridge in the Lent Term 1996. This typed version of the notes is totally unconnected
with Prof. Kelly.
Other sets of notes are available for different courses. At the time of typing these
courses were:
Probability
Analysis
Methods
Fluid Dynamics 1
Geometry
Foundations of QM
Methods of Math. Phys
Waves (etc.)
General Relativity
Combinatorics
Discrete Mathematics
Further Analysis
Quantum Mechanics
Quadratic Mathematics
Dynamics of D.E.’s
Electrodynamics
Fluid Dynamics 2
Statistical Physics
Dynamical Systems
Bifurcations in Nonlinear Convection
They may be downloaded from
http://www.istari.ucam.org/maths/.
v
vi
INTRODUCTION
Chapter 1
Basic Concepts
1.1
Sample Space
Suppose we have an experiment with a set Ω of outcomes. Then Ω is called the sample
space. A potential outcome ω ∈ Ω is called a sample point.
For instance, if the experiment is tossing coins, then Ω = {H, T }, or if the experiment was tossing two dice, then Ω = {(i, j) : i, j ∈ {1, . . . , 6}}.
A subset A of Ω is called an event. An event A occurs is when the experiment is
performed, the outcome ω ∈ Ω satisfies ω ∈ A. For the coin-tossing experiment, then
the event of a head appearing is A = {H} and for the two dice, the event “rolling a
four” would be A = {(1, 3), (2, 2), (3, 1)}.
1.2
Classical Probability
If Ω is finite, Ω = {ω1 , . . . , ωn }, and each of the n sample points is “equally likely”
then the probability of event A occurring is
P(A) =
|A|
|Ω|
Example. Choose r digits from a table of random numbers. Find the probability that
for 0 ≤ k ≤ 9,
1. no digit exceeds k,
2. k is the greatest digit drawn.
Solution. The event that no digit exceeds k is
Ak = {(a1 , . . . , ar ) : 0 ≤ ai ≤ k, i = 1 . . . r} .
r
Now |Ak | = (k + 1)r , so that P(Ak ) = k+1
.
10
Let Bk be the event that k is the greatest digit drawn. Then Bk = Ak \ Ak−1 . Also
r
−kr
Ak−1 ⊂ Ak , so that |Bk | = (k + 1)r − k r . Thus P(Bk ) = (k+1)
10r
1
2
CHAPTER 1. BASIC CONCEPTS
The problem of the points
Players A and B play a series of games. The winner of a game wins a point. The two
players are equally skillful and the stake will be won by the first player to reach a target.
They are forced to stop when A is within 2 points and B within 3 points. How should
the stake be divided?
Pascal suggested that the following continuations were equally likely
AAAA
AAAB
AABA
ABAA
BAAA
AABB
ABBA
ABAB
BABA
BAAB
BBAA
ABBB
BABB
BBAB
BBBA
BBBB
This makes the ratio 11 : 5. It was previously thought that the ratio should be 6 : 4
on considering termination, but these results are not equally likely.
1.3
Combinatorial Analysis
The fundamental rule is:
Suppose r experiments are such that the first may result in any of n1 possible outcomes and such that for each of the possible outcomes of the first i − 1 experiments
there are ni possible outcomes
Qr to experiment i. Let ai be the outcome of experiment i.
Then there are a total of i=1 ni distinct r-tuples (a1 , . . . , ar ) describing the possible
outcomes of the r experiments.
Proof. Induction.
1.4
Stirling’s Formula
For functions g(n) and h(n), we say that g is asymptotically equivalent to h and write
g(n)
g(n) ∼ h(n) if h(n)
→ 1 as n → ∞.
Theorem 1.1 (Stirling’s Formula). As n → ∞,
log √
and thus n! ∼
√
n!
→0
2πnnn e−n
2πnnn e−n .
We first prove the weak form of Stirling’s formula, that log(n!) ∼ n log n.
Pn
Proof. log n! = 1 log k. Now
Z n
Z n+1
n
X
log xdx ≤
log k ≤
log xdx,
1
and
Rz
1
1
1
logx dx = z log z − z + 1, and so
n log n − n + 1 ≤ log n! ≤ (n + 1) log(n + 1) − n.
Divide by n log n and let n → ∞ to sandwich
Therefore log n! ∼ n log n.
log n!
n log n
between terms that tend to 1.
3
1.4. STIRLING’S FORMULA
Now we prove the strong form.
Proof. For x > 0, we have
1 − x + x2 − x3 <
1
< 1 − x + x2 .
1+x
Now integrate from 0 to y to obtain
y − y 2 /2 + y 3 /3 − y 4 /4 < log(1 + y) < y − y 2 /2 + y 3 /3.
Let hn = log
n!en
.
nn+1/2
Then1 we obtain
1
1
1
1
−
≤ hn − hn+1 ≤
+ 3.
2
3
2
12n
12n
12n
6n
1
For n ≥ 2, P
0 ≤ hn − hn+1 ≤ P
n2 . Thus hn is a decreasing sequence, and 0 ≤
n
∞
h2 −hn+1 ≤ r=2 (hr −hr+1 ) ≤ 1 r12 . Therefore hn is bounded below, decreasing
so is convergent. Let the limit be A. We have obtained
n! ∼ eA nn+1/2 e−n .
R π/2
We need a trick to find A. Let Ir = 0 sinr θ dθ. We obtain the recurrence Ir =
r−1
r Ir−2
by integrating by parts. Therefore I2n =
Now In is decreasing, so
1≤
(2n)!
(2n n!)2 π/2
I2n
I2n−1
1
→ 1.
≤
=1+
I2n+1
I2n+1
2n
But by substituting our formula in, we get that
2π
I2n
π 2n + 1 2
→ 2A .
∼
I2n+1
2 n e2A
e
Therefore e2A = 2π as required.
1 by
playing silly buggers with log 1 +
1
n
and I2n+1 =
(2n n!)2
(2n+1)! .
4
CHAPTER 1. BASIC CONCEPTS
Chapter 2
The Axiomatic Approach
2.1
The Axioms
Let Ω be a sample space. Then probability P is a real valued function defined on subsets
of Ω satisfying :1. 0 ≤ P(A) ≤ 1 for A ⊂ Ω,
2. P(Ω) = 1,
3. P
for a finite or infinite sequence A1 , A2 , · · · ⊂ Ω of disjoint events, P(∪Ai ) =
i P(Au ).
The number P(A) is called the probability of event A.
We can look at some distributions here. Consider an arbitrary finite or countable
Ω = {ω1 , ω2 , . . . } and an arbitrary collection {p1 , p2 , . . . } of non-negative numbers
with sum 1. If we define
X
P(A) =
pi ,
i:ωi ∈A
it is easy to see that this function satisfies the axioms. The numbers p1 , p2 , . . . are
called a probability distribution. If Ω is finite with n elements, and if p1 = p2 = · · · =
pn = n1 we recover the classical definition of probability.
Another example would be to let Ω = {0, 1, . . . } and attach to outcome r the
r
probability pr = e−λ λr! for some λ > 0. This is a distribution (as may be easily
verified), and is called the Poisson distribution with parameter λ.
Theorem 2.1 (Properties of P). A probability P satisfies
1. P(Ac ) = 1 − P(A),
2. P(∅) = 0,
3. if A ⊂ B then P(A) ≤ P(B),
4. P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
5
6
CHAPTER 2. THE AXIOMATIC APPROACH
Proof. Note that Ω = A∪Ac , and A∩Ac = ∅. Thus 1 = P(Ω) = P(A)+P(Ac ). Now
we can use this to obtain P(∅) = 1 − P(∅c ) = 0. If A ⊂ B, write B = A ∪ (B ∩ Ac ),
so that P(B) = P(A) + P(B ∩ Ac ) ≥ P(A). Finally, write A ∪ B = A ∪ (B ∩ Ac )
and B = (B ∩ A) ∪ (B ∩ Ac ). Then P(A ∪ B) = P(A) + P(B ∩ Ac ) and P(B) =
P(B ∩ A) + P(B ∩ Ac ), which gives the result.
Theorem 2.2 (Boole’s Inequality). For any A1 , A2 , · · · ⊂ Ω,
!
n
n
[
X
P
Ai ≤
P(Ai )
1
i
∞
[
P
!
≤
Ai
∞
X
1
P(Ai )
i
Proof. Let BS1 = A1 S
and then inductively let Bi = Ai \
disjoint and i Bi = i Ai . Therefore
!
!
[
[
P
Ai = P
Bi
i
Si−1
1
Bk . Thus the Bi ’s are
i
=
X
P(Bi )
i
≤
X
as Bi ⊂ Ai .
P(Ai )
i
Theorem 2.3 (Inclusion-Exclusion Formula).


!
n
[
X
\
P
Ai =
(−1)|S|−1 P
Aj  .
1
j∈S
S⊂{1,...,n}
S6=∅
Proof. We know that P(A1 ∪ A2 ) = P(A1 ) + P(A2 ) − P(A1 ∩ A2 ). Thus the result
is true for n = 2. We also have that
P(A1 ∪ · · · ∪ An ) = P(A1 ∪ · · · ∪ An−1 ) + P(An ) − P((A1 ∪ · · · ∪ An−1 ) ∩ An ) .
But by distributivity, we have
!
!
!
n
n−1
n−1
[
[
[
P
Ai = P
Ai + P(An ) − P
(Ai ∩ An ) .
1
i
1
Application of the inductive hypothesis yields the result.
Corollary (Bonferroni Inequalities).

X
S⊂{1,...,r}
S6=∅
|S|−1
(−1)

!
n
≤
[
P
Aj  or P
Ai
1
≥
j∈S
\
according as r is even or odd. Or in other words, if the inclusion-exclusion formula is
truncated, the error has the sign of the omitted term and is smaller in absolute value.
Note that the case r = 1 is Boole’s inequality.
7
2.2. INDEPENDENCE
Proof. The result is true for n = 2. If true for n − 1, then it is true for n and 1 ≤ r ≤
n − 1 by the inductive step above, which expresses a n-union in terms of two n − 1
unions. It is true for r = n by the inclusion-exclusion formula.
Example (Derangements). After a dinner, the n guests take coats at random from a
pile. Find the probability that at least one guest has the right coat.
1
Solution. Let AS
k be the event that guest k has his own coat.
n
We want P( i=1 Ai ). Now,
P(Ai1 ∩ · · · ∩ Air ) =
(n − r)!
,
n!
by counting the number of ways of matching guests and coats after i1 , . . . , ir have
taken theirs. Thus
X
n (n − r)!
1
P(Ai1 ∩ · · · ∩ Air ) =
= ,
r
n!
r!
i <···<i
1
r
and the required probability is
!
n
[
1
(−1)n−1
1
,
P
Ai = 1 − + + · · · +
2! 3!
n!
i=1
which tends to 1 − e−1 as n → ∞.
Furthermore, let Pm (n) be the probability that exactly m guests take the right coat.
Then P0 (n) → e−1 and n! P0 (n) is the number of derangements of n objects. Therefore
n 1 × P0 (n − m) × (n − m)!
Pm (n) =
m
n!
−1
P0 (n − m)
e
=
→
as n → ∞.
m!
m!
2.2
Independence
Definition 2.1. Two events A and B are said to be independent if
P(A ∩ B) = P(A) P(B) .
More generally, a collection of events Ai , i ∈ I are independent if
!
\
Y
P
Ai =
P(Ai )
i∈J
i∈J
for all finite subsets J ⊂ I.
Example. Two fair dice are thrown. Let A1 be the event that the first die shows an odd
number. Let A2 be the event that the second die shows an odd number and finally let
A3 be the event that the sum of the two numbers is odd. Are A1 and A2 independent?
Are A1 and A3 independent? Are A1 , A2 and A3 independent?
1 I’m
not being sexist, merely a lazy typist. Sex will be assigned at random...
8
CHAPTER 2. THE AXIOMATIC APPROACH
Solution. We first calculate the probabilities of the events A1 , A2 , A3 , A1 ∩A2 , A1 ∩A3
and A1 ∩ A2 ∩ A3 .
Event
Probability
18
36
A1
A2
1
2
=
As above,
A3
6×3
36
=
1
2
A1 ∩ A2
3×3
36
=
1
4
A1 ∩ A3
3×3
36
=
1
4
A1 ∩ A2 ∩ A3
1
2
0
Thus by a series of multiplications, we can see that A1 and A2 are independent, A1
and A3 are independent (also A2 and A3 ), but that A1 , A2 and A3 are not independent.
Now we wish to state what we mean by “2 independent experiments”2 . Consider
Ω1 = {α1 , . . . } and Ω2 = {β1 , . . . } with associated probability distributions {p1 , . . . }
and {q1 , . . . }. Then, by “2 independent experiments”, we mean the sample space
Ω1 × Ω2 with probability distribution P((αi , βj )) = pi qj .
Now, suppose A ⊂ Ω1 and B ⊂ Ω2 . The event A can be interpreted as an event in
Ω1 × Ω2 , namely A × Ω2 , and similarly for B. Then
X
P(A ∩ B) =
αi ∈A
βj ∈B
pi q j =
X
αi ∈A
pi
X
qj = P(A) P(B) ,
βj ∈B
which is why they are called “independent” experiments. The obvious generalisation
to n experiments can be made, but for an infinite sequence of experiments we mean a
sample space Ω1 × Ω2 × . . . satisfying the appropriate formula ∀n ∈ N.
You might like to find the probability that n independent tosses of a biased coin
with the probability of heads p results in a total of r heads.
2.3
Distributions
The binomial distribution
with parameters n and p, 0 ≤ p ≤ 1 has Ω = {0, . . . , n} and
probabilities pi = ni pi (1 − p)n−i .
Theorem 2.4 (Poisson approximation to binomial). If n → ∞, p → 0 with np = λ
held fixed, then
n r
λr
p (1 − p)n−r → e−λ .
r
r!
2 or
more generally, n.
9
2.4. CONDITIONAL PROBABILITY
Proof.
n r
n(n − 1) . . . (n − r + 1) r
p (1 − p)n−r =
p (1 − p)n−r
r
r!
nn−1
n − r + 1 (np)r
=
...
(1 − p)n−r
n n
n
r!
n −r
r Y
n − i + 1 λr
λ
λ
=
1−
1−
n
r!
n
n
i=1
λr
× e−λ × 1
r!
λr
= e−λ .
r!
→1×
Suppose an infinite sequence of independent trials is to be performed. Each trial
results in a success with probability p ∈ (0, 1) or a failure with probability 1 − p. Such
a sequence is called a sequence of Bernoulli trials. The probability that the first success
occurs after exactlyP
r failures is pr = p(1 − p)r . This is the geometric distribution with
∞
parameter p. Since 0 pr = 1, the probability that all trials result in failure is zero.
2.4
Conditional Probability
Definition 2.2. Provided P(B) > 0, we define the conditional probability of A|B 3 to
be
P(A ∩ B)
.
P(A|B) =
P(B)
Whenever we write P(A|B), we assume that P(B) > 0.
Note that if A and B are independent then P(A|B) = P(A).
Theorem 2.5.
1. P(A ∩ B) = P(A|B) P(B),
2. P(A ∩ B ∩ C) = P(A|B ∩ C) P(B|C) P(C),
3. P(A|B ∩ C) =
P(A∩B|C)
P(B|C) ,
4. the function P(◦|B) restricted to subsets of B is a probability function on B.
Proof. Results 1 to 3 are immediate from the definition of conditional probability. For
result 4, note that A∩B ⊂ B, so P(A ∩ B) ≤ P(B) and thus P(A|B) ≤ 1. P(B|B) =
1 (obviously), so it just remains to show the last axiom. For disjoint Ai ’s,
!
S
[ P( i (Ai ∩ B))
P
Ai B =
P(B)
i
P
P(Ai ∩ B)
= i
P(B)
X
=
P(Ai |B) , as required.
i
3 read
“A given B”.
10
CHAPTER 2. THE AXIOMATIC APPROACH
Theorem 2.6 (Law of total probability). Let B1 , B2 , . . . be a partition of Ω. Then
X
P(A) =
P(A|Bi ) P(Bi ) .
i
Proof.
X
X
P(A ∩ Bi )
!
[
=P
A ∩ Bi
P(A|Bi ) P(Bi ) =
i
= P(A) , as required.
Example (Gambler’s Ruin). A fair coin is tossed repeatedly. At each toss a gambler
wins £1 if a head shows and loses £1 if tails. He continues playing until his capital
reaches m or he goes broke. Find px , the probability that he goes broke if his initial
capital is £x.
Solution. Let A be the event that he goes broke before reaching £m, and let H or
T be the outcome of the first toss. We condition on the first toss to get P(A) =
P(A|H) P(H) + P(A|T ) P(T ). But P(A|H) = px+1 and P(A|T ) = px−1 . Thus
we obtain the recurrence
px+1 − px = px − px−1 .
Note that px is linear in x, with p0 = 1, pm = 0. Thus px = 1 −
x
m.
Theorem 2.7 (Bayes’ Formula). Let B1 , B2 , . . . be a partition of Ω. Then
P(A|Bi ) P(Bi )
P(Bi |A) = P
.
j P(A|Bj ) P(Bj )
Proof.
P(Bi |A) =
by the law of total probability.
P(A ∩ Bi )
P(A|Bi ) P(Bi )
=P
,
P(A)
j P(A|Bj ) P(Bj )
Chapter 3
Random Variables
Let Ω be finite or countable, and let pω = P({ω}) for ω ∈ Ω.
Definition 3.1. A random variable X is a function X : Ω 7→ R.
Note that “random variable” is a somewhat inaccurate term, a random variable is
neither random nor a variable.
Example. If Ω = {(i, j), 1 ≤ i, j ≤ t}, then we can define random variables X and
Y by X(i, j) = i + j and Y (i, j) = max{i, j}
Let RX be the image of Ω under X. When the range is finite or countable then the
random variable is said to be discrete.
P
We write P(X = xi ) for ω:X(ω)=xi pω , and for B ⊂ R
P(X ∈ B) =
X
P(X = x) .
x∈B
Then
(P(X = x) , x ∈ RX )
is the distribution of the random variable X. Note that it is a probability distribution
over RX .
3.1
Expectation
Definition 3.2. The expectation of a random variable X is the number
E[X] =
X
ω∈Ω
provided that this sum converges absolutely.
11
pw X(ω)
12
CHAPTER 3. RANDOM VARIABLES
Note that
X
E[X] =
pw X(ω)
ω∈Ω
X
=
X
pω X(ω)
x∈RX ω:X(ω)=x
X
=
x
X
pω
x∈RX
ω:X(ω)=x
X
xP(X = x) .
=
x∈RX
Absolute convergence allows the sum to be taken in any order.
P
If X is a positive random variable and if ω∈Ω pω X(ω) = ∞ we write E[X] =
+∞. If
X
xP(X = x) = ∞ and
x∈RX
x≥0
X
xP(X = x) = −∞
x∈RX
x<0
then E[X] is undefined.
r
Example. If P(X = r) = e−λ λr! , then E[X] = λ.
Solution.
E[X] =
∞
X
r
re−λ λr!
r=0
= λe−λ
Example. If P(X = r) =
n
r
∞
X
λr−1
= λe−λ eλ = λ
(r
−
1)!
r=1
pr (1 − p)n−r then E[X] = np.
13
3.1. EXPECTATION
Solution.
E[X] =
=
n
X
r=0
n
X
r
n−r
rp (1 − p)
r
n!
pr (1 − p)n−r
r!(n − r)!
r=0
n
X
=n
n
r
(n − 1)!
pr (1 − p)n−r
(r − 1)!(n − r)!
r=1
n
X
= np
r=1
(n − 1)!
pr−1 (1 − p)n−r
(r − 1)!(n − r)!
n−1
X
(n − 1)! r
p (1 − p)n−1−r
(r)!(n
−
r)!
r=1
n−1
X n − 1
= np
pr (1 − p)n−1−r
r
r=1
= np
= np
For any function f : R 7→ R the composition of f and X defines a new random
variable f and X defines the new random variable f (X) given by
f (X)(w) = f (X(w)).
Example. If a, b and c are constants, then a + bX and (X − c)2 are random variables
defined by
(a + bX)(w) = a + bX(w)
2
and
2
(X − c) (w) = (X(w) − c) .
Note that E[X] is a constant.
Theorem 3.1.
1. If X ≥ 0 then E[X] ≥ 0.
2. If X ≥ 0 and E[X] = 0 then P(X = 0) = 1.
3. If a and b are constants then E[a + bX] = a + bE[X].
4. For any random variables X, Y then E[X + Y ] = E[X] + E[Y ].
h
i
2
5. E[X] is the constant which minimises E (X − c) .
Proof.
1. X ≥ 0 means Xw ≥ 0 ∀ w ∈ Ω
X
So E[X] =
pω X(ω) ≥ 0
ω∈Ω
2. If ∃ω ∈ Ω with pω > 0 and X(ω) > 0 then E[X] > 0, therefore P(X = 0) = 1.
14
CHAPTER 3. RANDOM VARIABLES
3.
E[a + bX] =
X
(a + bX(ω)) pω
ω∈Ω
=a
X
pω + b
ω∈Ω
X
pω X(ω)
ω∈Ω
= a + E[X] .
4. Trivial.
5. Now
E (X − c)2 = E (X − E[X] + E[X] − c)2
= E [(X − E[X])2 + 2(X − E[X])(E[X] − c) + [(E[X] − c)]2 ]
= E (X − E[X])2 + 2(E[X] − c)E[(X − E[X])] + (E[X] − c)2
= E (X − E[X])2 + (E[X] − c)2 .
This is clearly minimised when c = E[X].
Theorem 3.2. For any random variables X1 , X2 , ...., Xn
" n
#
n
X
X
E
Xi =
E[Xi ]
i=1
i=1
Proof.
"
E
n
X
#
Xi
i=1
"n−1
#
X
=E
Xi + Xn
i=1
"n−1 #
X
=E
Xi + E[X]
i=1
Result follows by induction.
3.2
Variance
2
Var X = E X 2 − E[X]
2
= E[X − E[X]] = σ
√
Standard Deviation = Var X
for Random Variable X
2
Theorem 3.3. Properties of Variance
(i) Var X ≥ 0 if Var X = 0, then P(X = E[X]) = 1
Proof - from property 1 of expectation
(ii) If a, b constants, Var (a + bX) = b2 Var X
15
3.2. VARIANCE
Proof.
Var a + bX = E[a + bX − a − bE[X]]
= b2 E[X − E[X]]
= b2 Var X
2
(iii) Var X = E X 2 − E[X]
Proof.
2
E[X − E[X]] = E X 2 − 2XE[X] + (E[X])2
2
= E X 2 − 2E[X] E[X] + E[X]
2
= E X − (E[X])2
Example. Let X have the geometric distribution P(X = r) = pq r with r = 0, 1, 2...
and p + q = 1. Then E[X] = pq and Var X = pq2 .
Solution.
E[X] =
∞
X
rpq r = pq
r=0
∞
X
rq r−1
r=0
∞
1 X d r
d 1 =
(q ) = pq
pq r=0 dq
dq 1 − q
q
= pq(1 − q)−2 =
p
∞
2 X
E X =
r2 p2 q 2r
r=0
= pq
X
∞
r(r + 1)q
r=1
r−1
−
∞
X
rq
r−1
r=1
2
1
2q
q
= pq(
−
= 2 −
(1 − q)3
(1 − q)2
p
p
2
Var X = E X 2 − E[X]
2q
q
q2
−
−
p2
p
p
q
= 2
p
=
Definition 3.3. The co-variance of random variables X and Y is:
Cov(X, Y ) = E[(X − E[X])(Y − E[Y ])]
The correlation of X and Y is:
Cov(X, Y )
Corr(X, Y ) = √
Var X Var Y
16
CHAPTER 3. RANDOM VARIABLES
Linear Regression
Theorem 3.4. Var (X + Y ) = Var X + Var Y + 2Cov(X, Y )
Proof.
2
Var (X + Y ) = E (X + Y )2 − E[X] − E[Y ]
= E (X − E[X])2 + (Y − E[Y ])2 + 2(X − E[X])(Y − E[Y ])
= Var X + Var Y + 2Cov(X, Y )
3.3
Indicator Function
Definition 3.4. The Indicator Function I[A] of an event A ⊂ Ω is the function
(
1,
I[A](w) =
0,
if ω ∈ A;
if ω ∈
/ A.
(3.1)
NB that I[A] is a random variable
1.
E[I[A]] = P(A)
X
E[I[A]] =
pω I[A](w)
ω∈Ω
= P(A)
2. I[Ac ] = 1 − I[A]
3. I[A ∩ B] = I[A]I[B]
4.
I[A ∪ B] = I[A] + I[B] − I[A]I[B]
I[A ∪ B](ω) = 1 if ω ∈ A or ω ∈ B
I[A ∪ B](ω) = I[A](ω) + I[B](ω) − I[A]I[B](ω) WORKS!
Example. n ≥ couples are arranged randomly around a table such that males and females alternate. Let N = The number of husbands sitting next to their wives. Calculate
17
3.3. INDICATOR FUNCTION
the E[N ] and the Var N .
N=
n
X
I[Ai ]
Ai = event couple i are together
i=1
"
E[N ] = E
n
X
#
I[Ai ]
i=1
=
=
n
X
i=1
n
X
i=1
E[I[Ai ]]
2
n
2
=2
n

!2 
n
X
2
E N = E
I[Ai ] 
Thus E[N ] = n
i=1


2
n
X
X
= E
I[Ai ] + 2
I[Ai ]I[Aj ]
i=1
i≤j
= nE I[Ai ]2 + n(n − 1)E[(I[A1 ]I[A2 ])]
2
E I[Ai ]2 = E[I[Ai ]] =
n
E[(I[A1 ]I[A2 ])] = IE[[A1 ∩ B2 ]] = P(A1 ∩ A2 )
= P(A1 ) P(A2 |A1 )
2
1
1
n−2 2
=
−
n n−1n−1 n−1n−1
2
Var N = E N 2 − E[N ]
2
=
(1 + 2(n − 2)) − 2
n−1
2(n − 2)
=
n−1
18
3.4
CHAPTER 3. RANDOM VARIABLES
Inclusion - Exclusion Formula
N
[
N
\
Ai =
1
I
"N
[
!c
Aci
1
#
"
Ai = I
N
\
1
!c #
Aci
1
=1−I
"N
\
#
Aci
1
=1−
N
Y
I[Aci ]
1
=1−
N
Y
(1 − I[Ai ])
1
=
N
X
I[Ai ] −
X
i1 ≤ i2 I[A1 ]I[A2 ]
1
+ ... + (−1)j+1
X
I[A1 ]I[A2 ]...I[Aj ] + ...
i1 ≤i2 ...≤ij
Take Expectation
"N
#
!
N
[
[
E
Ai = P
Ai
1
1
=
N
X
P(Ai ) −
X
i1 ≤ i2 P(A1 ∩ A2 )
1
+ ... + (−1)j+1
X
P Ai1 ∩ Ai2 ∩ .... ∩ Aij + ...
i1 ≤i2 ...≤ij
3.5
Independence
Definition 3.5. Discrete random variables X1 , ..., Xn are independent if and only if
for any x1 ...xn :
P(X1 = x1 , X2 = x2 .......Xn = xn ) =
N
Y
P(Xi = xi )
1
Theorem 3.5 (Preservation of Independence).
If
X1 , ..., Xn are independent random variables and f1 , f2 ...fn are functions R → R
then f1 (X1 )...fn (Xn ) are independent random variables
19
3.5. INDEPENDENCE
Proof.
X
P(f1 (X1 ) = y1 , . . . , fn (Xn ) = yn ) =
P(X1 = x1 , . . . , Xn = xn )
x1 :f1 (X1 )=y1 ,...
xn :fn (Xn )=yn
=
N
Y
X
P(Xi = xi )
1 xi :fi (Xi )=yi
=
N
Y
P(fi (Xi ) = yi )
1
Theorem 3.6. If X1 .....Xn are independent random variables then:
"N
#
N
Y
Y
Xi =
E[Xi ]
E
1
1
P
P
NOTE that E[ Xi ] =
E[Xi ] without requiring independence.
Proof. Write Ri for RXi the range of Xi
"N
#
Y
X
X
E
Xi =
....
x1 ..xn P(X1 = x1 , X2 = x2 ......., Xn = xn )
x1 ∈R1
1
=
=
xn ∈Rn
!
N
Y
X
1
xi ∈Ri
N
Y
P(Xi = xi )
E[Xi ]
1
Theorem 3.7. If X1 , ..., Xn are independent random variables and f1 ....fn are function R → R then:
"N
#
N
Y
Y
E
fi (Xi ) =
E[fi (Xi )]
1
1
Proof. Obvious from last two theorems!
Theorem 3.8. If X1 , ..., Xn are independent random variables then:
Var
n
X
i=1
!
Xi
=
n
X
i=1
Var Xi
20
CHAPTER 3. RANDOM VARIABLES
Proof.
Var
n
X

!
Xi
= E
n
X
i=1
!2 
" n
#2
X

Xi
−E
Xi
i=1
i=1


" n
#2
X
X
X
2


=E
Xi +
Xi Xj − E
Xi
i
i=1
i6=j
X X
X
X
2
=
E Xi2 +
E[Xi Xj ] −
E[Xi ] −
E[Xi ] E[Xj ]
i
i
i6=j
i6=j
X 2
=
E Xi2 − E[Xi ]
i
=
X
Var Xi
i
Theorem 3.9. If X1 , ..., Xn are independent identically distributed random variables
then
n
Var
1X
Xi
n i=1
!
=
1
Var Xi
n
Proof.
n
Var
1X
Xi
n i=1
!
=
1
Var Xi
n2
n
1 X
= 2
Var Xi
n i=1
=
1
Var Xi
n
Example. Experimental Design.
Two rods of unknown lengths a, b. A rule can
measure the length but with but with error having 0 mean (unbiased) and variance σ 2 .
Errors independent from measurement to measurement. To estimate a, b we could take
separate measurements A, B of each rod.
E[A] = a
Var A = σ 2
E[B] = b
Var B = σ 2
21
3.5. INDEPENDENCE
Can we do better? YEP! Measure a + b as X and a − b as Y
E[X] = a + b
Var X = σ 2
E[Y ] = a − b
Var Y = σ 2
X +Y
E
=a
2
X +Y
1
Var
= σ2
2 2
X −Y
E
=b
2
X −Y
1
Var
= σ2
2
2
So this is better.
Example. Non standard dice.
a → B P(A ≥ B) = 23 .
You choose 1 then I choose one. Around this cycle
So the relation ’A better that B’ is not transitive.
22
CHAPTER 3. RANDOM VARIABLES
Chapter 4
Inequalities
4.1
Jensen’s Inequality
A function f ; (a, b) → R is convex if
f (px + qy) ≤ pf (x) + (1 − p)f (y) - ∀x, y ∈ (a, b) - ∀p ∈ (0, 1)
Strictly convex if strict inequality holds when x 6= y
f is concave if −f is convex. f is strictly concave if −f is strictly convex
23
24
CHAPTER 4. INEQUALITIES
Concave
neither concave or convex.
00
We know that if f is twice differentiable and f (x) ≥ 0 for x ∈ (a, b) the if f is
00
convex and strictly convex if f (x) ≥ 0 forx ∈ (a, b).
Example.
f (x) = − log x
0
−1
f (x) =
x
00
1
f (x) = 2 ≥ 0
x
f (x) is strictly convex on (0, ∞)
Example.
f (x) = −x log x
0
f (x) = −(1 + logx)
00
−1
≤0
f (x) =
x
Strictly concave.
Example. f (x = x3 is strictly convex on (0, ∞) but not on (−∞, ∞)
Theorem 4.1. Let f : (a, b) → R be a convex function. Then:
!
n
n
X
X
pi f (xi ) ≥ f
pi xi
i=1
i=1
P
x1 , . . . , Xn ∈ (a, b), p1 , . . . , pn ∈ (0, 1) and
pi = 1. Further more if f is strictly
convex then equality holds if and only if all x’s are equal.
E[f (X)] ≥ f (E[X])
25
4.1. JENSEN’S INEQUALITY
Proof. By induction on n n = 1 nothing to prove n = 2 definition of convexity.
Assume
results holds up to n-1. Consider x1 , ..., xn ∈ (a, b), p1 , ..., pn ∈ (0, 1) and
P
pi = 1
n
X
0
0
pi
For i = 2...n, set pi =
, such that
pi = 1
1 − pi
i=2
Then by the inductive hypothesis twice, first for n-1, then for 2
n
X
pi fi (xi ) = p1 f (x1 ) + (1 − p1 )
n
X
1
0
pi f (xi )
i=2
≥ p1 f (x1 ) + (1 − p1 )f
n
X
!
0
pi xi
i=2
≥f
p1 x1 + (1 − p1 )
n
X
!
0
pi xi
i=2
=f
n
X
!
pi xi
i=1
f is strictly convex n ≥ 3 and not all the x0i s equal then we assume not all of x2 ...xn
are equal. But then
!
n
n
X
X
0
0
pi xi
pi f (xi ) ≥ (1 − pj )f
(1 − pj )
i=2
i=2
So the inequality is strict.
Corollary (AM/GM Inequality). Positive real numbers x1 , . . . , xn
n
Y
i=1
! n1
xi
n
≤
1X
xi
n i=1
Equality holds if and only if x1 = x2 = · · · = xn
Proof. Let
1
n
then f (x) = − log x is a convex function on (0, ∞).
So
P(X = xi ) =
E[f (x)] ≥ f (E[x]) (Jensen’s Inequality)
−E[log x] ≥ log E[x]
[1]
n
n
1X
1X
Therefore −
log xi ≤ − log
x
n 1
n 1
! n1
n
n
Y
1X
xi
≤
xi
[2]
n i=1
i=1
For strictness since f strictly convex equation holds in [1] and hence [2] if and only if
x1 = x2 = · · · = xn
26
CHAPTER 4. INEQUALITIES
If f : (a, b) → R is a convex function then it can be shown that at each point
y ∈ (a, b)∃ a linear function αy + βy x such that
f (x) ≤ αy + βy x
f (y) = αy + βy y
x ∈ (a, b)
0
If f is differentiable at y then the linear function is the tangent f (y) + (x − y)f (y)
Let y = E[X], α = αy and β = βy
f (E[X]) = α + βE[X]
So for any random variable X taking values in (a, b)
E[f (X)] ≥ E[α + βX]
= α + βE[X]
= f (E[X])
4.2
Cauchy-Schwarz Inequality
Theorem 4.2. For any random variables X, Y ,
2
E[XY ] ≤ E X 2 E Y 2
Proof. For a, b ∈ R Let
LetZ = aX − bY
Then0 ≤ E Z 2 = E (aX − bY )2
= a2 E X 2 − 2abE[XY ] + b2 E Y 2
quadratic in a with at most one real root and therefore has discriminant ≤ 0.
27
4.3. MARKOV’S INEQUALITY
Take b 6= 0
2
E[XY ] ≤ E X 2 E Y 2
Corollary.
|Corr(X, Y )| ≤ 1
Proof. Apply Cauchy-Schwarz to the random variables X − E[X] and Y − E[Y ]
4.3
Markov’s Inequality
Theorem 4.3. If X is any random variable with finite mean then,
P(|X| ≥ a) ≤
E[|X|]
for any a ≥ 0
a
Proof. Let
A = |X| ≥ a
T hen |X| ≥ aI[A]
Take expectation
E[|X|] ≥ aP(A)
E[|X|] ≥ aP(|X| ≥ a)
4.4
Chebyshev’s Inequality
Theorem 4.4. Let X be a random variable with E X 2 ≤ ∞. Then ∀ ≥ 0
E X2
P(|X| ≥ ) ≤
2
Proof.
I[|X| ≥ ] ≤
x2
∀x
2
28
CHAPTER 4. INEQUALITIES
Then
I[|X| ≥ ] ≤
x2
2
Take Expectation
2
E X2
x
P(|X| ≥ ) ≤ E 2 =
2
Note
1. The result
is “distribution free” - no assumption about the distribution of X (other
than E X 2 ≤ ∞).
2. It is the “best possible” inequality, in the following sense
c
22
c
= − with probability 2
2
c
= 0 with probability 1 − 2
c
Then P(|X| ≥ ) = 2
E X2 = c
E X2
c
P(|X| ≥ ) = 2 =
2
X = + with probability
3. If µ = E[X] then applying the inequality to X − µ gives
P(X − µ ≥ ) ≤
Var X
2
Often the most useful form.
4.5
Law of Large Numbers
Theorem 4.5 (Weak law of large numbers). Let X1 , X2 ..... be a sequences of independent identically distributed random variables with Variance σ 2 ≤ ∞ Let
Sn =
n
X
Xi
i=1
Then
Sn
− µ ≥ → 0 as n → ∞
∀ ≥ 0, P n
29
4.5. LAW OF LARGE NUMBERS
Proof. By Chebyshev’s Inequality
Sn
E ( Snn − µ)2
P − µ ≥ ≤
n
2
E (Sn − nµ)2
properties of expectation
=
n2 2
Var Sn
= 2 2 Since E[Sn ] = nµ
n But Var Sn = nσ 2
Sn
nσ 2
σ2
Thus P − µ ≥ ≤ 2 2 = 2 → 0
n
n n
Example. A1 , A2 ... are independent events, each with probability p. Let Xi = I[Ai ].
Then
Sn
nA
number of times A occurs
=
=
n
n
number of trials
µ = E[I[Ai ]] = P(Ai ) = p
Theorem states that
Sn
P − p ≥ → 0 as n → ∞
n
Which recovers the intuitive definition of probability.
Example. A Random Sample of size n is a sequence X1 , X2 , . . . , Xn of independent
identically distributed random variables (’n observations’)
Pn
Xi
X̄ = i=1
is called the SAMPLE MEAN
n
Theorem states that provided the variance of Xi is finite, the probability that the sample
mean differs from the mean of the distribution by more than approaches 0 as n → ∞.
We have shown the weak law of large numbers. Why weak?
larger numbers.
Sn
P
→ µ as n → ∞ = 1
n
∃ a strong form of
This is NOT the same as the weak form. What does this mean?
ω ∈ Ω determines
Sn
,
n = 1, 2, . . .
n
as a sequence of real numbers. Hence it either tends to µ or it doesn’t.
Sn (ω)
P ω:
→ µ as n → ∞ = 1
n
30
CHAPTER 4. INEQUALITIES
Chapter 5
Generating Functions
In this chapter, assume that X is a random variable taking values in the range 0, 1, 2, . . ..
Let pr = P(X = r) r = 0, 1, 2, . . .
Definition 5.1. The Probability Generating Function (p.g.f) of the random variable
X,or of the distribution pr = 0, 1, 2, . . . , is
∞
∞
X
X
pr z r
z r P(X = r) =
p(z) = E z X =
r=0
r=0
This p(z) is a polynomial or a power series. If a power series then it is convergent for
|z| ≤ 1 by comparison with a geometric series.
X
X
r
|p(z)| ≤
pr |z| ≤
pr = 1
r
r
Example.
1
r = 1, . . . , 6
6
1
p(z) = E z X =
1 + z + . . . z6
6
z 1 − z6
=
6 1−z
pr =
Theorem 5.1. The distribution of X is uniquely determined by the p.g.f p(z).
Proof. We know that we can differential p(z) term by term for |z| ≤ 1
0
p (z) = p1 + 2p2 z + . . .
0
and so p (0) = p1
(p(0) = p0 )
Repeated differentiation gives
p(i) (z) =
∞
X
r=i
r!
pr z r−i
(r − i)!
and has p(i) = 0 = i!pi Thus we can recover p0 , p1 , . . . from p(z)
31
32
CHAPTER 5. GENERATING FUNCTIONS
Theorem 5.2 (Abel’s Lemma).
E[X] = lim p0 (z)
z→1
Proof.
p0 (z) =
∞
X
rpr z r−1
|z| ≤ 1
r=i
For z ∈ (0, 1), p0 (z) is a non decreasing function of z and is bounded above by
E[X] =
∞
X
rpr
r=i
Choose ≥ 0, N large enough that
N
X
rpr ≥ E[X] − r=i
Then
lim
z→1
∞
X
rpr z r−1 ≥ lim
z→1
r=i
N
X
rpr z r−1 =
r=i
N
X
rpr
r=i
True ∀ ≥ 0 and so
E[X] = lim p0 (z)
z→1
Usually p0 (z) is continuous at z=1, then E[X] = p0 (1).
z 1 − z6
Recall p(z) =
6 1−z
Theorem 5.3.
E[X(X − 1)] = lim p00 (z)
z→1
Proof.
00
p (z) =
∞
X
r(r − 1)pz r−2
r=2
Proof now the same as Abel’s Lemma
Theorem 5.4. Suppose that X1 , X2 , . . . , Xn are independent random variables with
p.g.f’s p1 (z), p2 (z), . . . , pn (z). Then the p.g.f of
X1 + X2 + . . . X n
is
p1 (z)p2 (z) . . . pn (z)
Proof.
E z X1 +X2 +...Xn = E z X1 .z X2 . . . .z Xn
= E z X1 E z X2 . . . E z Xn
= p1 (z)p2 (z) . . . pn (z)
33
Example. Suppose X has Poisson Distribution
P(X = r) = e−λ
λr
r!
r = 0, 1, . . .
Then
∞
X
λr
E zX =
z r e−λ
r!
r=0
= e−λ e−λz
= e−λ(1−z)
Let’s calculate the variance of X
0
00
p = λe−λ(1−z)
p = λ2 e−λ(1−z)
Then
0
0
0
E[X] = lim p (z) = p (1)( Since p (z) continuous at z = 1 )E[X] = λ
z→1
00
E[X(X − 1)] = p (1) = λ2
2
Var X = E X 2 − E[X]
2
= E[X(X − 1)] + E[X] − E[X]
= λ 2 + λ − λ2
=λ
Example. Suppose that Y has a Poisson Distribution with parameter µ. If X and Y are
independent then:
E z X+Y = E z X E z Y
= e−λ(1−z) e−µ(1−z)
= e−(λ+µ)(1−z)
But this is the p.g.f of a Poisson random variable with parameter λ + µ. By uniqueness
(first theorem of the p.g.f) this must be the distribution for X + Y
Example. X has a binomial Distribution,
n r
P(X = r) =
p (1 − p)n−r
r = 0, 1, . . .
r
n X X
n r
p (1 − p)n−r z r
E z =
r
r=0
= (pz + 1 − p)n
This shows that X = Y1 + Y2 + · · · + Yn . Where Y1 + Y2 + · · · + Yn are independent
random variables each with
P(Yi = 1) = p
P(Yi = 0) = 1 − p
Note if the p.g.f factorizes look to see if the random variable can be written as a sum.
34
CHAPTER 5. GENERATING FUNCTIONS
5.1
Combinatorial Applications
Tile a (2 × n) bathroom with (2 × 1) tiles. How many ways can this be done? Say fn
fn = fn−1 + fn−2
Let
F (z) =
f0 = f1 = 1
∞
X
fn z n
n=0
fn z n = fn−1 z n + fn−2 z n
∞
∞
∞
X
X
X
fn z n =
fn−1 z n +
fn−2 z n
n=2
n=2
n=0
F (z) − f0 − zf1 = z(F (z) − f0 ) + z 2 F (z)
F (z)(1 − z − z 2 ) = f0 (1 − z) + zf1
= 1 − z + z = 1.
1
Since f0 = f1 = 1, then F (z) = 1−z−z
2
Let
√
√
1− 5
1+ 5
α2 =
α1 =
2
2
1
(1 − α1 z)(1 − α2 z)
α1
α2
=
−
(1 − α1 z) (1 − α2 z)
!
∞
∞
X
X
1
n n
n n
α1
α1 z − α2
α2 z
=
α1 − α2
n=0
n=0
F (z) =
The coefficient of z1n , that is fn , is
fn =
5.2
1
(αn+1 − α2n+1 )
α1 − α2 1
Conditional Expectation
Let X and Y be random variables with joint distribution
P(X = x, Y = y)
Then the distribution of X is
P(X = x) =
X
P(X = x, Y = y)
y∈Ry
This is often called the Marginal distribution for X. The conditional distribution for X
given by Y = y is
P(X = x|Y = y) =
P(X = x, Y = y)
P(Y = y)
5.3. PROPERTIES OF CONDITIONAL EXPECTATION
35
Definition 5.2. The conditional expectation of X given Y = y is,
E[X = x|Y = y] =
X
xP(X = x|Y = y)
x∈Rx
The conditional Expectation of X given Y is the random variable E[X|Y ] defined by
E[X|Y ] (ω) = E[X|Y = Y (ω)]
Thus E[X|Y ] : Ω → R
Example. Let X1 , X2 , . . . , Xn be independent identically distributed random variables with P(X1 = 1) = p and P(X1 = 0) = 1 − p. Let
Y = X1 + X2 + · · · + Xn
Then
P(X1 = 1, Y = r)
P(Y = r)
P(X1 = 1, X2 + · · · + Xn = r − 1)
=
P(Y = r)
P(X1 ) P(X2 + · · · + Xn = r − 1)
=
P(Y = r)
n−1 r−1
p
p (1 − p)n−r
= r−1
n r
n−r
r p (1 − p)
n−1
P(X1 = 1|Y = r) =
=
=
r−1
n
r
r
n
Then
E[X1 |Y = r] = 0 × P(X1 = 0|Y = r) + 1 × P(X1 = 1|Y = r)
r
=
n
1
E[X1 |Y = Y (ω)] = Y (ω)
n
1
Therefore E[X1 |Y ] = Y
n
Note a random variable - a function of Y .
5.3
Properties of Conditional Expectation
Theorem 5.5.
E[E[X|Y ]] = E[X]
36
CHAPTER 5. GENERATING FUNCTIONS
Proof.
E[E[X|Y ]] =
X
P(Y = y) E[X|Y = y]
y∈Ry
=
X
P(Y = y)
y
=
X
P(X = x|Y = y)
x∈Rx
XX
y
xP(X = x|Y = y)
x
= E[X]
Theorem 5.6. If X and Y are independent then
E[X|Y ] = E[X]
Proof. If X and Y are independent then for any y ∈ Ry
E[X|Y = y] =
X
xP(X = x|Y = y) =
X
xP(X = x) = E[X]
x
x∈Rx
Example. Let X1 , X2 , . . . be i.i.d.r.v’s with p.g.f p(z). Let N be a random variable
independent of X1 , X2 , . . . with p.g.f h(z). What is the p.g.f of:
X1 + X2 + · · · + XN
E z X1 +,...,Xn = E E z X1 +,...,Xn |N
∞
X
=
P(N = n) E z X1 +,...,Xn |N = n
=
n=0
∞
X
P(N = n) (p(z))n
n=0
= h(p(z))
Then for example
E[X1 +, . . . , Xn ] =
0
d
h(p(z))
dz
z=1
0
= h (1)p (1) = E[N ] E[X1 ]
Exercise Calculate
d2
dz 2 h(p(z))
and hence
Var X1 +, . . . , Xn
In terms of Var N and Var X1
37
5.4. BRANCHING PROCESSES
5.4
Branching Processes
X0 , X1 . . . sequence of random variables. Xn number of individuals in the nth generation of population. Assume.
1. X0 = 1
2. Each individual
lives for unit time then on death produces k offspring, probabilP
ity fk .
fk = 1
3. All offspring behave independently.
Xn+1 = Y1n + Y2n + · · · + Ynn
Where Yin are i.i.d.r.v’s. Yin number of offspring of individual i in generation n.
Assume
1. f0 ≥ 0
2. f0 + f1 ≤ 1
Let F(z) be the probability generating function ofYin .
F (z) =
∞
X
h ni
fk z k = E z Xi = E z Yi
n=0
Let
Fn (z) = E z Xn
Then F1 (z) = F (z) the probability generating function of the offspring distribution.
Theorem 5.7.
Fn+1 (z) = Fn (F (z)) = F (F (. . . (F (z)) . . . ))
Fn (z) is an n-fold iterative formula.
38
CHAPTER 5. GENERATING FUNCTIONS
Proof.
Fn+1 (z) = E z Xn+1
= E E z Xn+1 |Xn
∞
X
=
P(Xn = k) E z Xn+1 |Xn = k
=
=
=
n=0
∞
X
n=0
∞
X
n=0
∞
X
h n n
i
n
P(Xn = k) E z Y1 +Y2 +···+Yn
h ni
h ni
P(Xn = k) E z Y1 . . . E z Yn
k
P(Xn = k) (F (z))
n=0
= Fn (F (z))
Theorem 5.8. Mean and Variance of population size
∞
X
If m =
k=0
∞
X
and σ 2 =
kfk ≤ ∞
(k − m)2 fk ≤ ∞
k=0
Mean and Variance of offspring distribution.
Then E[Xn ] = mn
(
Var Xn =
0
σ 2 mn−1 (mn −1)
,
m−1
2
nσ ,
m 6= 1
m=1
00
Proof. Prove by calculating F (z), F (z) Alternatively
E[Xn ] = E[E[Xn |Xn−1 ]]
= E[m|Xn−1 ]
= mE[Xn−1 ]
= mn by induction
E (Xn − mXn−1 )2 = E E (Xn − mXn−1 )2 |Xn
= E[Var (Xn |Xn−1 )]
= E σ 2 Xn−1
= σ 2 mn−1
Thus
2 2
E Xn2 − 2mE[Xn Xn−1 ] + m2 E Xn−1
= σ 2 mn−1
(5.1)
39
5.4. BRANCHING PROCESSES
Now calculate
E[Xn Xn−1 ] = E[E[Xn Xn−1 |Xn−1 ]]
= E[Xn−1 E[Xn |Xn−1 ]]
= E[Xn−1 mXn−1 ]
2 = mE Xn−1
2
Then E Xn2 = σ 2 mn−1 + m2 E[Xn−1 ]
2
Var Xn = E Xn2 − E[Xn ]
2 2
= m2 E Xn−1
+ σ 2 mn−1 − m2 E[Xn−1 ]
= m2 Var Xn−1 + σ 2 mn−1
= m4 Var Xn−2 + σ 2 (mn−1 + mn )
= m2(n−1) Var X1 + σ 2 (mn−1 + mn + · · · + m2n−3 )
= σ 2 mn−1 (1 + m + · · · + mn )
To deal with extinction we need to be careful with limits as n → ∞. Let
An = Xn = 0
= Extinction occurs by generation n
∞
[
An
and let A =
1
= the event that extinction ever occurs
Can we calculate P(A) from P(An )?
More generally let An be an increasing sequence
A1 ⊂ A2 ⊂ . . .
and define
A = lim An =
∞
[
n→∞
An
1
Define Bn for n ≥ 1
B1 = A1
Bn = An ∩
= An ∩
n−1
[
i=1
Acn−1
!c
Ai
40
CHAPTER 5. GENERATING FUNCTIONS
Bn for n ≥ 1 are disjoint events and
∞
[
i=1
n
[
P
∞
[
Ai =
Ai =
i=1
∞
[
Bi
i=1
n
[
Bi
i=1
∞
[
!
Ai
=P
i=1
!
Bi
i=1
=
∞
X
P(Bi )
1
= lim
n→∞
= lim
n→∞
= lim
n→∞
n
X
1
n
[
i=1
n
[
P(Bi )
Bi
Ai
i=1
= lim P(An )
n→∞
Thus
P lim An = lim P(An )
n→∞
n→∞
Probability is a continuous set function. Thus
P(extinction ever occurs) = lim P(An )
n→∞
= lim P(Xn = 0)
n→∞
= q,
Say
Note P(Xn = 0), n = 1, 2, 3, . . . is an increasing sequence so limit exists. But
P(Xn = 0) = Fn (0)
Fn is the p.g.f of Xn
So
q = lim Fn (0)
n→∞
Also
F (q) = F
lim Fn (0)
n→∞
= lim F (Fn (0))
n→∞
= lim Fn+1 (0)
n→∞
Thus F (q) = q
“q” is called the Extinction Probability.
Since F is continuous
41
5.4. BRANCHING PROCESSES
Alternative Derivation
X
q=
P(X1 = k) P(extinction|X1 = k)
k
=
X
P(X1 = k) q k
= F (q)
Theorem 5.9. The probability of extinction, q, is the smallest positive root of the equation F (q) = q. m is the mean of the offspring distribution.
If m ≤ 1 then q = 1, while if m ≥ 1thenq ≤ 1
Proof.
F (1) = 1
m=
∞
X
0
0
00
F (z) =
∞
X
0
kfk = lim F (z)
z→1
j(j − 1)z j−2 in 0 ≤ z ≤ 1 Since f0 + f1 ≤ 1 Also F (0) = f0 ≥ 0
j=z
Thus if m ≤ 1, there does not exists a q ∈ (0, 1) with F (q) = q. If m ≥ 1 then let α
be the smallest positive root of F (z) = z then α ≤ 1. Further,
F (0) ≤ F (α) = α
F (F (0)) ≤ F (α) = α
Fn (0) ≤ α
∀n ≥ 1
q = lim Fn (0) ≤ 0
n→∞
q=α
Since q is a root of F (z) = z
42
CHAPTER 5. GENERATING FUNCTIONS
5.5
Random Walks
Let X1 , X2 , . . . be i.i.d.r.vs. Let
Sn = S0 + X 1 + X 2 + · · · + X n
Where, usually S0 = 0
Then Sn (n = 0, 1, 2, . . . is a 1 dimensional Random Walk.
We shall assume
(
1,
with probability p
Xn =
−1, with probability q
This is a simple random walk. If p = q =
1
2
(5.2)
then the random walk is called symmetric
Example (Gambler’s Ruin). You have an initial fortune of A and I have an initial
fortune of B. We toss coins repeatedly I win with probability p and you win with
probability q. What is the probability that I bankrupt you before you bankrupt me?
43
5.5. RANDOM WALKS
Set a = A + B and z = B Stop a random walk starting at z when it hits 0 or a.
Let pz be the probability that the random walk hits a before it hits 0, starting from
z. Let qz be the probability that the random walk hits 0 before it hits a, starting from
z. After the first step the gambler’s fortune is either z − 1 or z + 1 with prob p and q
respectively. From the law of total probability.
0≤z≤a
pz = qpz−1 + ppz+1
Also p0 = 0 and pa = 1. Must solve pt2 − t + q = 0.
√
√
1 ± 1 − 4pq
1 ± 1 − 2p
q
t=
=
= 1 or
2p
2p
p
General Solution for p 6= q is
z
q
pz = A + B
p
A + B = 0A =
1−
and so
1−
pz =
1−
1
a
q
p
z
q
p
a
q
p
If p = q, the general solution is A + Bz
pz =
z
a
To calculate qz , observe that this is the same problem with p, q, z replaced by p, q, a−z
respectively. Thus
qz =
q
p
a
−
a
q
p
q
p
−1
z
if p 6= q
44
CHAPTER 5. GENERATING FUNCTIONS
or
a−z
if p = q
z
Thus qz + pz = 1 and so on, as we expected, the game ends with probability one.
qz =
P(hits 0 before a) = qz
a
q
p
qz =
Or =
− ( pq )z
a
q
p
if p 6= q
−1
a−z
if p = q
z
What happens as a → ∞?
∞
[
P( paths hit 0 ever) =
path hits 0 before it hits a
a=z+1
P(hits 0 ever) = lim P(hits 0 before a)
a→∞
= lim qz
a→∞
q
=
p
=1
p≥q
p=q
Let G be the ultimate gain or loss.
(
a − z, with probability pz
G=
−z,
with probability qz
(
apz − z,
E[G] =
0,
if p 6= q
if p = q
(5.3)
(5.4)
Fair game remains fair if the coin is fair then then games based on it have expected
reward 0.
Duration of a Game
Let Dz be the expected time until the random walk hits 0
or a, starting from z. Is Dz finite? Dz is bounded above by x the mean of geometric
random variables (number of window’s of size a before a window with all +10 s or
−10 s). Hence Dz is finite. Consider the first step. Then
Dz = 1 + pDz+1 + qDz−1
E[duration] = E[E[duration | first step]]
= p (E[duration | first step up]) + q (E[duration | first step down])
= p(1 + Dz+1 ) + q(1 + Dz−1 )
Equation holds for 0 ≤ z ≤ a with D0 = Da = 0. Let’s try for a particular solution
Dz = Cz
Cz = Cp (z + 1) + Cq (z − 1) + 1
C=
1
q−p
for p 6= q
45
5.5. RANDOM WALKS
Consider the homogeneous relation
pt2 − t + q = 0
t1 = 1
t2 =
q
p
General Solution for p 6= q is
Dz = A + B
z
q
z
+
p
q=p
Substitute z = 0, a to get A and B
1−
Dz =
z
q
p
z
a
−
q−p q−p1− q a
p 6= q
p
If p = q then a particular solution is −z 2 . General solution
Dz − z 2 + A + Bz
Substituting the boundary conditions given.,
Dz = z(a − z)
p=q
Example. Initial Capital.
p
0.5
0.45
0.45
q
0.5
0.55
0.55
z
90
9
90
a
100
10
100
P(ruin)
0.1
0.21
0.87
E[gain]
0
-1.1
-77
E[duration]
900
11
766
Stop the random walk when it hits 0 or a.
We have absorption at 0 or a. Let
Uz,n = P(r.w. hits 0 at time n—starts at z)
Uz,n+1 = pUz+1,n + qUz−1,n
U0,n = Ua,n = 0
n≥0
Ua,0 = 1Uz,0 = 0
0≤z≤a
∞
X
Let Uz =
Uz,n sn .
n=0
Now multiply by sn+1 and add for n = 0, 1, 2 . . .
Uz (s) = psUz+1 (s) + qsUz−1 (s)
Where U0 (s) = 1 and Ua (s) = 0
Look for a solution
z
Ux (s) = (λ(s)) λ(s)
0≤z≤a
n≥0
46
CHAPTER 5. GENERATING FUNCTIONS
Must satisfy
2
λ(s) = ps ((λ(s)) + qs
Two Roots,
λ1 (s), λ2 (s) =
1±
p
1 − 4pqs2
2ps
Every Solution of the form
z
z
Uz (s) = A(s) (λ1 (s)) + B(s) (λ2 (s))
Substitute U0 (s) = 1 and Ua (s) = 0.A(s) + B(s) = 1 and
a
a
A(s) (λ1 (s)) + B(s) (λ2 (s)) = 0
a
z
z
a
(λ1 (s)) (λ2 (s)) − (λ1 (s)) (λ2 (s))
a
a
(λ1 (s)) − (λ2 (s))
q
But λ1 (s)λ2 (s) = recall quadratic
p
a−z
a−z
− (λ2 (s))
q (λ1 (s))
Uz (s) =
a
a
p
(λ1 (s)) − (λ2 (s))
Uz (s) =
Same method give generating function for absorption probabilities at the other barrier.
Generating function for the duration of the game is the sum of these two generating
functions.
Chapter 6
Continuous Random Variables
In this chapter we drop the assumption that Ω id finite or countable. Assume we are
given a probability p on some subset of Ω.
For example, spin a pointer, and let ω ∈ Ω give the position at which it stops, with
Ω = ω : 0 ≤ ω ≤ 2π. Let
θ
(0 ≤ θ ≤ 2π)
2π
Definition 6.1. A continuous random variable X is a function X : Ω → R for which
Z b
P(a ≤ X(ω) ≤ b) =
f (x)dx
P(ω ∈ [0, θ]) =
a
Where f (x) is a function satisfying
1. f (x) ≥ 0
R +∞
2. −∞ f (x)dx = 1
The function f is called the Probability Density Function.
For example, if X(ω) = ω given position of the pointer then x is a continuous
random variable with p.d.f
(
1
, (0 ≤ x ≤ 2π)
f (x) = 2π
(6.1)
0,
otherwise
This is an example of a uniformly distributed random variable. On the interval [0, 2π]
47
48
CHAPTER 6. CONTINUOUS RANDOM VARIABLES
in this case. Intuition about probability density functions is based on the approximate
relation.
Z
x+xδx
P(X ∈ [x, x + xδx]) =
f (z)dz
x
Proofs however more often use the distribution function
F (x) = P(X ≤ x)
F (x) is increasing in x.
If X is a continuous random variable then
Z x
F (x) =
f (z)dz
−∞
and so F is continuous and differentiable.
0
F (x) = f (x)
(At any point x where then fundamental theorem of calculus applies).
The distribution function is also defined for a discrete random variable,
X
F (x) =
pω
ω:X(ω)≤x
and so F is a step function.
In either case
P(a ≤ X ≤ b) = P(X ≤ b) − P(X ≤ a) = F (b) − F (a)
49
Example. The exponential distribution. Let
(
1 − e−λx , 0 ≤ x ≤ ∞
F (x) =
0,
x≤0
(6.2)
The corresponding pdf is
f (x) = λe−λx
0≤x≤∞
this is known as the exponential distribution with parameter λ. If X has this distribution then
P(X ≤ x + z|X ≤ z) =
P(X ≤ x + z)
P(X ≤ z)
e−λ(x+z)
e−λz
−λx
=e
= P(X ≤ x)
=
This is known as the memoryless property of the exponential distribution.
Theorem 6.1. If X is a continuous random variable with pdf f (x) and h(x) is a continuous strictly increasing function with h−1 (x) differentiable then h(x) is a continuous
random variable with pdf
d −1
h (x)
fh (x) = f h−1 (x)
dx
Proof. The distribution function of h(X) is
P(h(X) ≤ x) = P X ≤ h−1 (x) = F h−1 (x)
Since h is strictly increasing and F is the distribution function of X Then.
d
P(h(X) ≤ x)
dx
is a continuous random variable with pdf as claimed fh . Note usually need to repeat
proof than remember the result.
50
CHAPTER 6. CONTINUOUS RANDOM VARIABLES
Example. Suppose X ∼ U [0, 1] that is it is uniformly distributed on [0, 1] Consider
Y = − log x
P(Y ≤ y) = P(− log X ≤ y)
= P X ≥ e−Y
Z 1
=
1dx
e−Y
= 1 − e−Y
Thus Y is exponentially distributed.
More generally
Theorem 6.2. Let U ∼ U [0, 1]. For any continuous distribution function F, the random
variable X defined by X = F −1 (u) has distribution function F .
Proof.
P(X ≤ x) = P F −1 (u) ≤ x
= P(U ≤ F (x))
= F (x) ∼ U [0, 1]
Remark
1. a bit more messy for discrete random variables
P(X = Xi ) = pi
i = 0, 1, . . .
Let
X = xj if
j−1
X
pi ≤ U ≤
i=0
j
X
pi
U ∼ U [0, 1]
i=0
2. useful for simulations
6.1
Jointly Distributed Random Variables
For two random variables X and Y the joint distribution function is
F (x, y) = P(X ≤ x, Y ≤ y)
F : R2 → [0, 1]
Let
FX (x) = P(Xz ≤ x)
= P(X ≤ x, Y ≤ ∞)
= F (x, ∞)
= lim F (x, y)
y→∞
51
6.1. JOINTLY DISTRIBUTED RANDOM VARIABLES
This is called the marginal distribution of X. Similarly
FY (x) = F (∞, y)
X1 , X2 , . . . , Xn are jointly distributed continuous random variables if for a set c ∈ Rb
ZZ
Z
P((X1 , X2 , . . . , Xn ) ∈ c) =
...
f (x1 , . . . , xn )dx1 . . . dxn
(x1 ,...,xn )∈c
For some function f called the joint probability density function satisfying the obvious
conditions.
1.
f (x1 , . . . , xn )dx1 ≥ 0
2.
ZZ
Z
...
f (x1 , . . . , xn )dx1 . . . dxn = 1
Rn
Example. (n = 2)
F (x, y) = P(X ≤ x, Y ≤ y)
Z x Z y
=
f (u, v)dudv
−∞
−∞
2
and so f (x, y) =
∂ F (x, y)
∂x∂y
Theorem 6.3. provided defined at (x, y). If X and y are jointly continuous random
variables then they are individually continuous.
Proof. Since X and Y are jointly continuous random variables
P(X ∈ A) = P(X ∈ A, Y ∈ (−∞, +∞))
Z Z
∞
=
f (x, y)dxdy
A
−∞
= fA fX (x)dx
Z
∞
where fX (x) =
f (x, y)dy
−∞
is the pdf of X.
Jointly continuous random variables X and Y are Independent if
f (x, y) = fX (x)fY (y)
Then P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B)
Similarly jointly continuous random variables X1 , . . . , Xn are independent if
f (x1 , . . . , xn ) =
n
Y
i=1
fXi (xi )
52
CHAPTER 6. CONTINUOUS RANDOM VARIABLES
Where fXi (xi ) are the pdf’s of the individual random variables.
Example. Two points X and Y are tossed at random and independently onto a line
segment of length L. What is the probability that:
|X − Y | ≤ l?
Suppose that “at random” means uniformly so that
f (x, y) =
1
L2
x, y ∈ [0, L]2
6.1. JOINTLY DISTRIBUTED RANDOM VARIABLES
53
Desired probability
ZZ
=
f (x, y)dxdy
A
area of A
L2
2
L − 2 21 (L − l)2
=
L2
2Ll − l2
=
L2
=
Example (Buffon’s Needle Problem). A needle of length l is tossed at random onto a
floor marked with parallel lines a distance L apart l ≤ L. What is the probability that
the needle intersects one of the parallel lines.
Let θ ∈ [0, 2π] be the angle between the needle and the parallel lines and let x be
the distance from the bottom of the needle to the line closest to it. It is reasonable to
suppose that X is distributed Uniformly.
X ∼ U [0, L]
Θ ∼ U [0, π)
and X and Θ are independent. Thus
f (x, θ) =
1
0 ≤ x ≤ L and 0 ≤ θ ≤ π
lπ
54
CHAPTER 6. CONTINUOUS RANDOM VARIABLES
The needle intersects the line if and only if X ≤ sin θ The event A
ZZ
=
f (x, θ)dxdθ
A
Z π
sin θ
=l
dθ
πL
0
2l
=
πL
Definition 6.2. The expectation or mean of a continuous random variable X is
Z ∞
E[X] =
xf (x)dx
−∞
R∞
provided not both of
−∞
xf (x)dx and
R0
−∞
xf (x)dx are infinite
Example (Normal Distribution). Let
f (x) = √
−(x−µ)2
1
e 2σ2
2πσ
−∞≤x≤∞
This is non-negative for it to be a pdf we also need to check that
Z ∞
f (x)dx = 1
−∞
Make the substitution z =
x−µ
σ .
Then
Z ∞
−(x−µ)2
1
e 2σ2 dx
2πσ −∞
Z Z ∞
−z 2
1
=√
e 2 dz
2π
−∞
I=√
Thus I 2 =
1
2π
Z
∞
e
−∞
−x2
2
Z
dx
∞
e
−y 2
2
dy
−∞
=
1
2π
Z
∞
Z
∞
e
−(y 2 +x2 )
2
dxdy
−∞ −∞
Z 2π Z ∞
−θ 2
1
re 2 drdθ
2π 0
0
Z 2π
=
dθ = 1
=
0
Therefore I = 1. A random variable with the pdf f(x) given above has a Normal
distribution with parameters µ and σ 2 we write this as
X ∼ N [µ, σ 2 ]
The Expectation is
Z ∞
−(x−µ)2
1
xe 2σ2 dx
2πσ −∞
Z ∞
Z ∞
−(x−µ)2
−(x−µ)2
1
1
=√
(x − µ)e 2σ2 dx + √
µe 2σ2 dx.
2πσ −∞
2πσ −∞
E[X] = √
6.1. JOINTLY DISTRIBUTED RANDOM VARIABLES
55
The first term is convergent and equals zero by symmetry, so that
E[X] = 0 + µ
=µ
Theorem 6.4. If X is a continuous random variable then,
Z ∞
Z ∞
E[X] =
P(X ≥ x) dx −
P(X ≤ −x) dx
0
0
Proof.
Z
∞
Z
∞
Z
∞
P(X ≥ x) dx =
Z0 ∞ Z
0
f (y)dy dx
x
∞
I[y ≥ x]f (y)dydx
=
0
0
Z
∞
Z
y
dxf (y)dy
=
Z0 ∞
=
0
yf (y)dy
0
∞
Z
Z
0
P(X ≤ −x) dx =
Similarly
yf (y)dy
−∞
0
result follows.
Note This holds for discrete random variables and is useful as a general way of
finding the expectation whether the random variable is discrete or continuous.
If X takes values in the set [0, 1, . . . , ] Theorem states
E[X] =
∞
X
P(X ≥ n)
n=0
and a direct proof follows
∞
X
P(X ≥ n) =
n=0
=
=
∞ X
∞
X
I[m ≥ n]P(X = m)
n=0 m=0
∞
∞
X
X
m=0
∞
X
!
I[m ≥ n] P(X = m)
n=0
mP(X = m)
m=0
Theorem 6.5. Let X be a continuous random variable with pdf f (x) and let h(x) be
a continuous real-valued function. Then provided
Z
∞
−∞
|h(x)| f (x)dx ≤ ∞
Z ∞
h(x)f (x)dx
E[h(x)] =
−∞
56
CHAPTER 6. CONTINUOUS RANDOM VARIABLES
Proof.
∞
Z
P(h(X) ≥ y) dy
0
∞
Z
"Z
#
f (x)dx dy
=
0
x:h(x)≥0
∞
Z
Z
I[h(x) ≥ y]f (x)dxdy
=
0
x:h(x)≥0
"Z
Z
h(x)≥0
#
dy f (x)dx
=
0
x:h(x)
Z
=
h(x)f (x)dy
x:h(x)≥0
∞
Z
Z
P(h(X) ≤ −y) = −
Similarly
0
h(x)f (x)dy
x:h(x)≤0
So the result follows from the last theorem.
Definition 6.3. The variance of a continuous random variable X is
Var X = E (X − E[X])2
Note The properties of expectation and
are the same for discrete and continR
P variance
uous random variables just replace
with in the proofs.
Example.
2
Var X = E X 2 − E[X]
Z
Z ∞
=
x2 f (x)dx −
−∞
X −µ
≤z
σ
2
xf (x)dx
−∞
Example. Suppose X ∼ N [µ, σ 2 ] Let z =
P(Z ≤ z) = P
∞
X−µ
σ
then
= P(X ≤ µ + σz)
Z µ+σz
−(x−µ)2
1
√
=
e 2σ2 dx
2πσ
−∞
Z z
x−µ
1 −u2
√ e 2 du
Let u =
=
σ
2π
−∞
= Φ(z) The distribution function of a N (0, 1) random variable
Z ∼ N (0, 1)
6.2. TRANSFORMATION OF RANDOM VARIABLES
57
What is the variance of Z?
2
Var X = E Z 2 − E[Z] Last term is zero
Z ∞
−z 2
1
z 2 e 2 dz
=√
2π −∞
∞
Z ∞
−z 2
−z 2
1
= − √ ze 2
+
e 2 dz
2π
−∞
−∞
=0+1=1
Var X = 1
Variance of X?
X = µ + σz
Thus E[X] = µ we know that already
Var X = σ 2 Var Z
Var X = σ 2
X ∼ (µ, σ 2 )
6.2
Transformation of Random Variables
Suppose X1 , X2 , . . . , Xn have joint pdf f (x1 , . . . , xn ) let
Y1 = r1 (X1 , X2 , . . . , Xn )
Y2 = r2 (X1 , X2 , . . . , Xn )
..
.
Yn = rn (X1 , X2 , . . . , Xn )
Let R ∈ Rn be such that
P((X1 , X2 , . . . , Xn ) ∈ R) = 1
Let S be the image of R under the above transformation suppose the transformation
from R to S is 1-1 (bijective).
58
CHAPTER 6. CONTINUOUS RANDOM VARIABLES
Then ∃ inverse functions
x1 = s1 (y1 , y2 , . . . , yn )
x2 = s2 (y1 , y2 , . . . , yn ) . . .
xn = sn (y1 , y2 , . . . , yn )
Assume that
∂si
∂yj
exists and is continuous at every point (y1 , y2 , . . . , yn ) in S
∂s1
∂y
1
J = ...
∂sn
...
..
.
...
∂y1
∂s1 ∂yn .. . ∂sn (6.3)
∂yn
If A ⊂ R
Z
P((X1 , . . . , Xn ) ∈ A) [1] =
Z
···
f (x1 , . . . , xn )dx1 . . . dxn
A
Z
=
Z
···
f (s1 , . . . , sn ) |J| dy1 . . . dyn
B
Where B is the image of A
= P((Y1 , . . . , Yn ) ∈ B) [2]
Since transformation is 1-1 then [1],[2] are the same
Thus the density for Y1 , . . . , Yn is
g((y1 , y2 , . . . , yn ) = f (s1 (y1 , y2 , . . . , yn ), . . . , sn (y1 , y2 , . . . , yn )) |J|
y1 , y 2 , . . . , y n ∈ S
= 0 otherwise.
Example (density of products and quotients). Suppose that (X, Y ) has density
(
4xy, for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
f (x, y) =
0,
Otherwise.
Let U =
X
Y
and V = XY
(6.4)
59
6.2. TRANSFORMATION OF RANDOM VARIABLES
√
X=
x=
r
V
U
r
v
y=
u
r
∂x
1 u
=
∂v
2 v
UV
Y =
√
uv
r
∂x
1 v
=
∂u
2 u
1
∂y
−1 v 2
=
∂u
2 u 23
Therefore |J| =
1
2u
∂y
1
= √ .
∂v
2 uv
and so
1
(4xy)
2u
r
√
v
1
× 4 uv
=
2u
u
u
if (u, v) ∈ D
=2
v
=0
Otherwise.
g(u, v) =
Note U and V are NOT independent
u
g(u, v) = 2 I[(u, v) ∈ D]
v
not product of the two identities.
When the transformations are linear things are simpler still. Let A be the n × n
invertible matrix.
 
 
Y1
X1
 .. 
 .. 
 .  = A . .
Yn
Xn
−1
|J| = det A
= det A−1
Then the pdf of (Y1 , . . . , Yn ) is
g(y1 , . . . ,n ) =
1
f (A−1 g)
det A
Example. Suppose X1 , X2 have the pdf f (x1 , x2 ). Calculate the pdf of X1 + X2 .
Let Y = X1 + X2 and Z = X2 . Then X1 = Y − Z and X2 = Z.
1 −1
−1
A =
(6.5)
0 1
det A−1 = 1
1
det A
Then
g(y, z) = f (x1 , x2 ) = f (y − z, z)
60
CHAPTER 6. CONTINUOUS RANDOM VARIABLES
joint distributions of Y and X.
Marginal density of Y is
Z ∞
g(y) =
f (y − z, z)dz
−∞≤y ≤∞
−∞
Z ∞
f (z, y − z)dz By change of variable
or g(y) =
−∞
If X1 and X2 are independent, with pgf’s f1 and f2 then
f (x1 , x2 ) = f (x1 )f (x2 )
Z ∞
and then g(y) =
f (y − z)f (z)dz
−∞
- the convolution of f1 and f2
For the pdf f(x) x̂ is a mode if f (x̂) ≥ f (x)∀x
x̂ is a median if
Z x̂
Z ∞
1
f (x)dx −
f (x)dx =
2
−∞
x̂
For a discrete random variable, x̂ is a median if
1
1
or P(X ≥ x̂) ≥
2
2
If X1 , . . . , Xn is a sample from the distribution then recall that the sample mean is
P(X ≤ x̂) ≥
n
1X
Xi
n 1
Let Y1 , . . . , Yn (the statistics) be the values of X1 , . . . , Xn arranged in increasing
order. Then the sample median is Y n+1 if n is odd or any value in
2
h
i
Y n2 , Y n+1 if n is even
2
If Yn = maxX1 , . . . , Xn and X1 , . . . , Xn are iidrv’s with distribution F and density f then,
P(Yn ≤ y) = P(X1 ≤ y, . . . , Xn ≤ y)
n
= (F (y))
61
6.2. TRANSFORMATION OF RANDOM VARIABLES
Thus the density of Yn is
g(y) =
d
n
(F (y))
dy
n−1
= n (F (y))
f (y)
Similarly Y1 = minX1 , . . . , Xn and is
P(Y1 ≤ y) = 1 − P(X1 ≥ y, . . . , Xn ≥ y)
n
= 1 − (1 − F (y))
Then the density of Y1 is
n−1
= n (1 − F (y))
f (y)
What about the joint density of Y1 , Yn ?
G(y, yn ) = P(Y1 ≤ y1 , Yn ≤ yn )
= P(Yn ≤ yn ) − P(Yn ≤ yn , Y1 ≥1 )
= P(Yn ≤ yn ) − P(y1 ≤ X1 ≤ yn , y1 ≤ X2 ≤ yn , . . . , y1 ≤ Xn ≤ yn )
n
n
= (F (yn )) − (F (yn ) − F (y1 ))
Thus the pdf of Y1 , Yn is
g(y1 , yn ) =
∂2
G(y1 , yn )
∂y1 ∂yn
n−2
= n(n − 1) (F (yn ) − F (y1 ))
=0
otherwise
f (y1 )f (yn )
− ∞ ≤ y1 ≤ yn ≤ ∞
What happens if the mapping is not 1-1? X = f (x) and |X| = g(x)?
Z
P(|X| ∈ (a, b)) =
b
(f (x) + f (−x)) dx
g(x) = f (x) + f (−x)
a
Suppose X1 , . . . , Xn are iidrv’s. What is the pdf of Y1 , . . . , Yn the order statistics?
(
n!f (y1 ) . . . f (yn ), y1 ≤ y2 ≤ · · · ≤ yn
g(y1 , . . . , yn ) =
(6.6)
0,
Otherwise
Example. Suppose X1 , . . . , Xn are iidrv’s exponentially distributed with parameter
λ. Let
z1 = Y1
z2 = Y2 − Y1
..
.
zn = Yn − Yn−1
Where Y1 , . . . , Yn are the order statistics of X1 , . . . , Xn . What is the distribution of
the z 0 s?
62
CHAPTER 6. CONTINUOUS RANDOM VARIABLES
Z = AY
Where

1
0
0
−1 1
0


A =  0 −1 1
 ..
..
..
 .
.
.
0
0 ...

...0 0
. . . 0 0

. . . 0 0

..
.. 
.
.
−1 1
(6.7)
det(A) = 1
h(z1 , . . . , zn ) = g(y1 , . . . , yn )
= n!f (y1 ) . . . f (yn )
= n!λn e−λy1 . . . e−λyn
= n!λn e−λ(y1 +···+yn )
= n!λn e−λ(z1 2z2 +···+nzn )
n
Y
λie−λizn+1−i
=
i=1
Thus h(z1 , . . . , zn ) is expressed as the product of n density functions and
Zn+1−i ∼ exp(λi)
exponentially distributed with parameter λi, with z1 , . . . , zn independent.
Example. Let X and Y be independent N[0, 1] random variables. Let D = R² = X² + Y²
and tan Θ = Y/X, so that

    d = x² + y²    and    θ = arctan(y/x).

Then

    |J| = det (  2x                       2y                   )
              ( −(y/x²)/(1 + (y/x)²)     (1/x)/(1 + (y/x)²)    )  =  2.          (6.8)

Since

    f(x, y) = (1/√(2π)) e^{−x²/2} (1/√(2π)) e^{−y²/2} = (1/2π) e^{−(x²+y²)/2},

we get

    g(d, θ) = f(x, y)/|J| = (1/(4π)) e^{−d/2},    0 ≤ d < ∞,  0 ≤ θ ≤ 2π.

But this is just the product of the densities

    g_D(d) = (1/2) e^{−d/2},    0 ≤ d < ∞,
    g_Θ(θ) = 1/(2π),            0 ≤ θ ≤ 2π.

Thus D and Θ are independent, with D exponential with mean 2 and Θ ∼ U[0, 2π].
Note this is useful for the simulation of normal random variables. We know we can
simulate a random variable with distribution function F by X = F^{−1}(U) where U ∼
U[0, 1], but this is difficult for a N[0, 1] random variable, since

    F(x) = Φ(x) = ∫_{−∞}^{x} (1/√(2π)) e^{−z²/2} dz

cannot be inverted in closed form. Instead, let U1 and U2 be independent U[0, 1]. Let
R² = −2 log U1, so that R² is exponential with mean 2, and let Θ = 2πU2, so that Θ ∼
U[0, 2π]. Now let

    X = R cos Θ = √(−2 log U1) cos(2πU2),
    Y = R sin Θ = √(−2 log U1) sin(2πU2).

Then X and Y are independent N[0, 1] random variables.
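A minimal sketch of this construction in Python (the sample size and seed are arbitrary choices, not from the notes): the outputs should have mean 0, variance 1 and negligible correlation.

# Box-Muller: two independent U[0,1] variables give two independent N[0,1] variables.
import numpy as np

def box_muller(n, rng):
    u1 = rng.random(n)
    u2 = rng.random(n)
    r = np.sqrt(-2.0 * np.log(u1))      # R, with R^2 exponential of mean 2
    theta = 2.0 * np.pi * u2            # Theta ~ U[0, 2*pi]
    return r * np.cos(theta), r * np.sin(theta)

rng = np.random.default_rng(2)
x, y = box_muller(500_000, rng)
print("means:", x.mean(), y.mean())             # both should be near 0
print("variances:", x.var(), y.var())           # both should be near 1
print("correlation:", np.corrcoef(x, y)[0, 1])  # should be near 0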
Example (Bertrand's Paradox). Calculate the probability that a "random chord" of
a circle of radius 1 has length greater than √3, the length of the side of an inscribed
equilateral triangle.
There are at least 3 interpretations of a random chord.
(1) The ends are independently and uniformly distributed over the circumference. Then

    answer = 1/3.

(2) The chord is perpendicular to a given diameter and the point of intersection is
uniformly distributed over the diameter. The chord is longer than √3 when the point
lies within distance a of the centre, where a² + (√3/2)² = 1², i.e. a = 1/2, so

    answer = 1/2.

(3) The foot of the perpendicular from the centre of the circle to the chord is
uniformly distributed over the interior of the circle. The chord is longer than √3
when the foot lies in the concentric circle of radius 1/2, so

    answer = π(1/2)² / (π 1²) = 1/4.
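A Monte Carlo sketch of the three interpretations (not part of the notes; sample size and seed arbitrary): the estimated probabilities should come out near 1/3, 1/2 and 1/4 respectively.

# Monte Carlo for Bertrand's paradox: fraction of chords longer than sqrt(3).
import numpy as np

rng = np.random.default_rng(3)
reps = 500_000
target = np.sqrt(3)

# (1) endpoints uniform on the circumference
a, b = rng.uniform(0, 2 * np.pi, (2, reps))
len1 = 2 * np.abs(np.sin((a - b) / 2))

# (2) distance of the chord from the centre uniform on [0, 1]
d = rng.uniform(0, 1, reps)
len2 = 2 * np.sqrt(1 - d**2)

# (3) foot of the perpendicular uniform over the interior of the circle
r = np.sqrt(rng.uniform(0, 1, reps))   # radius giving an area-uniform point
len3 = 2 * np.sqrt(1 - r**2)

for k, lengths in enumerate([len1, len2, len3], start=1):
    print(f"interpretation ({k}): P(length > sqrt(3)) ~ {(lengths > target).mean():.3f}")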
6.3 Moment Generating Functions
If X is a continuous random variable then the analogue of the pgf is the moment
generating function, defined by

    m(θ) = E[e^{θX}]

for those θ such that m(θ) is finite. Thus

    m(θ) = ∫_{−∞}^{∞} e^{θx} f(x) dx,

where f(x) is the pdf of X.
Theorem 6.6. The moment generating function determines the distribution of X, provided m(θ) is finite for some interval containing the origin.
Proof. Not proved.
Theorem 6.7. If X and Y are independent random variables with moment generating
functions mX(θ) and mY(θ), then X + Y has moment generating function

    mX+Y(θ) = mX(θ) × mY(θ).

Proof.

    E[e^{θ(X+Y)}] = E[e^{θX} e^{θY}] = E[e^{θX}] E[e^{θY}] = mX(θ) mY(θ).
Theorem 6.8. The rth moment of X, i.e. the expected value E[X^r], is the coefficient
of θ^r/r! in the series expansion of m(θ).
Proof. Sketch of proof:

    e^{θX} = 1 + θX + (θ²/2!) X² + . . .
    E[e^{θX}] = 1 + θ E[X] + (θ²/2!) E[X²] + . . .
Example. Recall X has an exponential distribution with parameter λ if it has density
λe^{−λx} for 0 ≤ x < ∞. Then

    E[e^{θX}] = ∫_0^∞ e^{θx} λe^{−λx} dx = λ ∫_0^∞ e^{−(λ−θ)x} dx = λ/(λ − θ) = m(θ)
                                                                        for θ < λ.
Hence

    E[X] = m′(0) = λ/(λ − θ)² |_{θ=0} = 1/λ,
    E[X²] = m″(0) = 2λ/(λ − θ)³ |_{θ=0} = 2/λ².

Thus

    Var X = E[X²] − (E[X])² = 2/λ² − 1/λ² = 1/λ².
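As a small symbolic check (illustrative, not from the notes), differentiating m(θ) = λ/(λ − θ) at θ = 0 reproduces these moments.

# Moments of the exponential distribution from its mgf, via sympy.
import sympy as sp

theta, lam = sp.symbols('theta lam', positive=True)
m = lam / (lam - theta)

EX = sp.diff(m, theta).subs(theta, 0)        # first moment
EX2 = sp.diff(m, theta, 2).subs(theta, 0)    # second moment
print(sp.simplify(EX))            # 1/lam
print(sp.simplify(EX2))           # 2/lam**2
print(sp.simplify(EX2 - EX**2))   # Var X = 1/lam**2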
Example. Suppose X1, . . . , Xn are iidrv's, each exponentially distributed with
parameter λ.
Claim: X1 + · · · + Xn has a gamma distribution, Γ(n, λ), with parameters n and λ,
i.e. with density

    λ^n e^{−λx} x^{n−1} / (n − 1)!,    0 ≤ x < ∞.

(We can check that this is a density by integrating by parts and showing that it
integrates to 1.) Now

    E[e^{θ(X1+···+Xn)}] = E[e^{θX1}] . . . E[e^{θXn}] = (E[e^{θX1}])^n = (λ/(λ − θ))^n.

Suppose that Y ∼ Γ(n, λ). Then

    E[e^{θY}] = ∫_0^∞ e^{θx} λ^n e^{−λx} x^{n−1}/(n − 1)! dx
              = (λ/(λ − θ))^n ∫_0^∞ (λ − θ)^n e^{−(λ−θ)x} x^{n−1}/(n − 1)! dx
              = (λ/(λ − θ))^n,

since the last integrand is the Γ(n, λ − θ) density. Hence the claim, since the moment
generating function characterizes the distribution.
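A quick Monte Carlo sketch of this identity (not from the notes; λ, n and θ are arbitrary values with θ < λ): the empirical mgf of the sum should be close to (λ/(λ − θ))^n.

# Empirical check that E[exp(theta*(X1+...+Xn))] ~ (lam/(lam-theta))^n for Exp(lam) summands.
import numpy as np

rng = np.random.default_rng(4)
lam, n, theta, reps = 3.0, 4, 1.0, 500_000

s = rng.exponential(scale=1 / lam, size=(reps, n)).sum(axis=1)
empirical = np.exp(theta * s).mean()
exact = (lam / (lam - theta)) ** n
print("empirical mgf:", empirical, "exact:", exact)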
Example (Normal Distribution). Let X ∼ N[µ, σ²]. Then

    E[e^{θX}] = ∫_{−∞}^{∞} e^{θx} (1/(√(2π)σ)) e^{−(x−µ)²/(2σ²)} dx
              = ∫_{−∞}^{∞} (1/(√(2π)σ)) exp[ −(x² − 2xµ + µ² − 2θσ²x)/(2σ²) ] dx
              = ∫_{−∞}^{∞} (1/(√(2π)σ)) exp[ −((x − µ − θσ²)² − 2µσ²θ − θ²σ⁴)/(2σ²) ] dx
              = e^{µθ + θ²σ²/2} ∫_{−∞}^{∞} (1/(√(2π)σ)) exp[ −(x − µ − θσ²)²/(2σ²) ] dx.

The integral equals 1, as it is the integral of the density of N[µ + θσ², σ²]. Thus

    E[e^{θX}] = e^{µθ + θ²σ²/2},

which is the moment generating function of a N[µ, σ²] random variable.
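For a concrete sanity check (illustrative parameter values, not from the notes), the integral can be evaluated numerically and compared with e^{µθ + θ²σ²/2}.

# Numerical check of the normal mgf for arbitrary mu, sigma, theta.
import numpy as np
from scipy.integrate import quad

mu, sigma, theta = 1.5, 0.7, 0.9   # arbitrary choices

def integrand(x):
    density = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    return np.exp(theta * x) * density

numeric, _ = quad(integrand, -np.inf, np.inf)
exact = np.exp(mu * theta + theta ** 2 * sigma ** 2 / 2)
print("numeric:", numeric, "exact:", exact)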
Theorem 6.9. Suppose X and Y are independent, with X ∼ N[µ1, σ1²] and Y ∼ N[µ2, σ2²].
Then
1. X + Y ∼ N[µ1 + µ2, σ1² + σ2²];
2. aX ∼ N[aµ1, a²σ1²].
Proof.
1.

    E[e^{θ(X+Y)}] = E[e^{θX}] E[e^{θY}]
                  = e^{µ1θ + σ1²θ²/2} e^{µ2θ + σ2²θ²/2}
                  = e^{(µ1+µ2)θ + (σ1²+σ2²)θ²/2},

which is the moment generating function of N[µ1 + µ2, σ1² + σ2²].
2.

    E[e^{θ(aX)}] = E[e^{(θa)X}]
                 = e^{µ1(θa) + σ1²(θa)²/2}
                 = e^{(aµ1)θ + a²σ1²θ²/2},

which is the moment generating function of N[aµ1, a²σ1²].
6.4 Central Limit Theorem
Let X1, . . . , Xn be iidrv's with mean 0 and variance σ², so Var Xi = σ². Then

    Var(X1 + · · · + Xn) = nσ²,
    Var((X1 + · · · + Xn)/n) = σ²/n,
    Var((X1 + · · · + Xn)/√n) = σ².

So √n is the right scaling for a non-degenerate limit: the sum spreads out and the
average concentrates, while (X1 + · · · + Xn)/√n keeps a fixed variance.
Theorem 6.10. Let X1, . . . , Xn be iidrv's with E[Xi] = µ and Var Xi = σ² < ∞, and let

    Sn = Σ_{i=1}^{n} Xi.

Then for all (a, b) with −∞ ≤ a ≤ b ≤ ∞,

    lim_{n→∞} P( a ≤ (Sn − nµ)/(σ√n) ≤ b ) = ∫_a^b (1/√(2π)) e^{−z²/2} dz,

where the integrand is the pdf of a N[0, 1] random variable.
Proof. Sketch of proof.
WLOG take µ = 0 and σ² = 1 (otherwise replace Xi by (Xi − µ)/σ). The mgf of Xi is

    mXi(θ) = E[e^{θXi}] = 1 + θE[Xi] + (θ²/2) E[Xi²] + (θ³/3!) E[Xi³] + . . .
                        = 1 + θ²/2 + (θ³/3!) E[Xi³] + . . .

The mgf of Sn/√n is

    E[e^{θ Sn/√n}] = E[e^{(θ/√n)(X1+···+Xn)}]
                   = E[e^{(θ/√n)X1}] . . . E[e^{(θ/√n)Xn}]
                   = (E[e^{(θ/√n)X1}])^n
                   = (mX1(θ/√n))^n
                   = (1 + θ²/(2n) + θ³E[X³]/(3! n^{3/2}) + . . .)^n
                   → e^{θ²/2}    as n → ∞,

which is the mgf of a N[0, 1] random variable.
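A short simulation sketch of the theorem (not from the notes; the Exp(1) summands, interval and sample sizes are arbitrary): the probability that the standardized sum lies in [a, b] should be close to the normal limit.

# CLT demonstration: standardized sums of iid Exp(1) variables (mu = sigma = 1).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
n, reps = 100, 200_000
a, b = -1.0, 2.0

s = rng.exponential(scale=1.0, size=(reps, n)).sum(axis=1)
z = (s - n * 1.0) / (1.0 * np.sqrt(n))      # (S_n - n*mu)/(sigma*sqrt(n))

empirical = ((z >= a) & (z <= b)).mean()
limit = norm.cdf(b) - norm.cdf(a)
print("P(a <= Z_n <= b):", empirical, "normal limit:", limit)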
Note that if Sn ∼ Bin[n, p], i.e. Sn = Σ Xi with Xi = 1 with probability p and Xi = 0
with probability q = 1 − p, then

    (Sn − np)/√(npq) ≈ N[0, 1].

This is called the normal approximation to the binomial distribution; it applies as
n → ∞ with p constant. Earlier we discussed the Poisson approximation to the binomial,
which applies when n → ∞ and np is constant.
Example. There are two competing airlines and n passengers, each of whom selects one
of the two planes at random. The number of passengers in plane one is

    S ∼ Bin[n, 1/2].

Suppose each plane has s seats and let f(s) = P(S > s), the probability that plane one
is over-full. Since

    (S − np)/√(npq) ≈ N[0, 1],

we have

    f(s) = P( (S − n/2)/(√n/2) > (s − n/2)/(√n/2) ) ≈ 1 − Φ( (2s − n)/√n ).

Therefore if n = 1000 and s = 537 then f(s) ≈ 0.01: the two planes together hold 1074
seats, only 74 in excess of the number of passengers.
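As a check of these figures (using the n = 1000, s = 537 of the example; the exact/approximate comparison is an illustration, not part of the notes):

# Exact binomial overflow probability versus the normal approximation.
import numpy as np
from scipy.stats import binom, norm

n, s = 1000, 537
exact = binom.sf(s, n, 0.5)                         # P(S > s)
approx = 1 - norm.cdf((2 * s - n) / np.sqrt(n))     # 1 - Phi((2s - n)/sqrt(n))
print("exact:", exact, "normal approximation:", approx)   # both roughly 0.01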
Example. An unknown fraction of the electorate, p, vote Labour. It is desired to find
p with an error not exceeding 0.005. How large should the sample be?
Let the fraction of Labour voters in a sample of size n be p′ = Sn/n. We can never be
certain (without complete enumeration) that |p′ − p| ≤ 0.005; instead choose n so that
the event |p′ − p| ≤ 0.005 has probability ≥ 0.95. Now

    P(|p′ − p| ≤ 0.005) = P(|Sn − np| ≤ 0.005n)
                        = P( |Sn − np|/√(npq) ≤ 0.005√n/√(pq) ),

and since

    ∫_{−1.96}^{1.96} (1/√(2π)) e^{−x²/2} dx = 2Φ(1.96) − 1 = 0.95,

we must choose n so that

    0.005√n/√(pq) ≥ 1.96.

But we don't know p. However pq ≤ 1/4, with the worst case p = q = 1/2, so

    n ≥ (1.96²/0.005²) × 1/4 ≈ 40,000

suffices. If we replace 0.005 by 0.01 then n ≥ 10,000 will be sufficient, and if we
replace 0.005 by 0.045 then n ≥ 475 will suffice.
Note that the answer does not depend upon the total size of the electorate.
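The same worst-case calculation in a few lines of Python (illustrative; the three error tolerances are those quoted above):

# Worst-case (p = q = 1/2) sample size: n >= (z / (2*eps))^2 with z the 97.5% normal quantile.
import math
from scipy.stats import norm

z = norm.ppf(0.975)   # ~1.96
for eps in [0.005, 0.01, 0.045]:
    n = math.ceil((z / (2 * eps)) ** 2)
    print(f"error {eps}: n >= {n}")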
6.5 Multivariate normal distribution
Let X1, . . . , Xn be iid N[0, 1] random variables with joint density

    g(x1, . . . , xn) = Π_{i=1}^{n} (1/√(2π)) e^{−xi²/2}
                      = (1/(2π)^{n/2}) e^{−(1/2) Σ_{i=1}^{n} xi²}
                      = (1/(2π)^{n/2}) e^{−(1/2) xᵀx},

where x = (x1, . . . , xn)ᵀ. Write X = (X1, X2, . . . , Xn)ᵀ and let Z = µ + AX, where A
is an invertible matrix, so that x = A⁻¹(z − µ). The density of Z is

    f(z1, . . . , zn) = (1/((2π)^{n/2} |det A|)) exp[ −(1/2) (A⁻¹(z − µ))ᵀ (A⁻¹(z − µ)) ]
                     = (1/((2π)^{n/2} |Σ|^{1/2})) exp[ −(1/2) (z − µ)ᵀ Σ⁻¹ (z − µ) ],

where Σ = AAᵀ. This is the multivariate normal density:

    Z ∼ MVN[µ, Σ].
Now Cov(Zi, Zj) = E[(Zi − µi)(Zj − µj)], which is the (i, j) entry of

    E[(Z − µ)(Z − µ)ᵀ] = E[(AX)(AX)ᵀ] = A E[XXᵀ] Aᵀ = A I Aᵀ = AAᵀ = Σ,

the covariance matrix.
If the covariance matrix of the MVN distribution is diagonal, say

    Σ = diag(σ1², σ2², . . . , σn²),

then the components of the random vector Z are independent, since

    f(z1, . . . , zn) = Π_{i=1}^{n} (1/((2π)^{1/2} σi)) e^{−(1/2)((zi−µi)/σi)²}.

This is not necessarily true if the distribution is not MVN; recall sheet 2 question 9.
Example (bivariate normal).

    f(x1, x2) = (1/(2π(1 − ρ²)^{1/2} σ1 σ2))
                × exp[ −(1/(2(1 − ρ²))) ( ((x1 − µ1)/σ1)²
                      − 2ρ((x1 − µ1)/σ1)((x2 − µ2)/σ2) + ((x2 − µ2)/σ2)² ) ],

with σ1, σ2 > 0 and −1 ≤ ρ ≤ 1. This is the joint density of a bivariate normal random
variable.

Example. The bivariate normal above is the MVN density with

    Σ⁻¹ = (1/(1 − ρ²)) (  1/σ1²        −ρ/(σ1σ2) )
                       ( −ρ/(σ1σ2)     1/σ2²     ),

    Σ = (  σ1²      ρσ1σ2 )
        (  ρσ1σ2    σ2²   ).

Here E[Xi] = µi, Var Xi = σi² and Cov(X1, X2) = ρσ1σ2, so that

    Correlation(X1, X2) = Cov(X1, X2)/(σ1σ2) = ρ.
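A minimal sampling sketch (arbitrary parameter values, not from the notes): build Z = µ + AX with Σ = AAᵀ via the Cholesky factor and check the sample covariance and correlation.

# Sampling a bivariate normal via Z = mu + A X, with Sigma = A A^T.
import numpy as np

rng = np.random.default_rng(6)
mu = np.array([1.0, -2.0])
sigma1, sigma2, rho = 2.0, 0.5, 0.7
Sigma = np.array([[sigma1**2, rho * sigma1 * sigma2],
                  [rho * sigma1 * sigma2, sigma2**2]])

A = np.linalg.cholesky(Sigma)               # Sigma = A A^T
X = rng.standard_normal((500_000, 2))       # iid N[0,1] components
Z = mu + X @ A.T

print("sample covariance:\n", np.cov(Z, rowvar=False))
print("sample correlation:", np.corrcoef(Z[:, 0], Z[:, 1])[0, 1], "rho:", rho)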