Notes on random variables

7 Random variables
A random variable is a real-valued function defined on some sample space. That is, it
associates to each elementary outcome in the sample space a numerical value.
Example 1. Consider tossing a coin n times. If X is the number of “heads” obtained, X
is a random variable.
Example 2. Consider a stock price which moves each day either up one unit or down one
unit, and suppose its initial value is 10$. Let T be the first time the value of the stock hits
either 0$ or 20$. Then T is a random variable.
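The hitting time in Example 2 can be explored by simulation. The sketch below assumes that the stock moves up or down with probability 1/2 each day (the example does not specify these probabilities), and the function name `hitting_time` is purely illustrative.

```python
import random

def hitting_time(start=10, low=0, high=20, rng=random):
    """Days until a +/-1 walk started at `start` first hits `low` or `high`.

    Assumption: each daily move is up or down with probability 1/2.
    """
    value, days = start, 0
    while low < value < high:
        value += rng.choice((-1, 1))
        days += 1
    return days

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [hitting_time(rng=rng) for _ in range(5)]
print(samples)
```

Each run of the experiment produces one numerical value of T, which is exactly what it means for T to be a function on the sample space.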
Example 3. The lifetime T of a lightbulb is a random variable.
In the last example, if we can measure time with infinite precision, then the possible
values of T are the non-negative real numbers [0, ∞). This is an uncountable set: there is
no way to enumerate [0, ∞) in a sequence. We will have to treat random variables of this
type separately from the random variables which take values in a countable set. While in
practice time can only be measured up to finite precision and consequently the possible
values of T will in fact be countable, it is still more convenient mathematically to make
the idealization that all values in [0, ∞) are possible, and we will do so.
7.1 Distribution functions
With a random variable \(X\) we can associate the distribution function \(F_X(\cdot)\), sometimes
called the cumulative distribution function, defined by
\[ F_X(t) = P(X \le t) . \tag{1} \]
Notice that \(F_X(\cdot)\) is defined on all real numbers. The distribution function determines the
probability that \(X\) falls in an interval:
\[ P(a < X \le b) = P(X \le b) - P(X \le a) = F_X(b) - F_X(a) . \]
Example 4. Suppose a coin with probability p of landing “heads” is tossed until the first
time a “heads” appears. Let T be the number of tosses required.
For a real number t, let [t] denote the integer part of t. We have
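\[ P(T > t) = P(T > [t]) = P(\text{first } [t] \text{ tosses are “tails”}) = (1 - p)^{[t]} \]
for \(t \ge 0\). Consequently,
\[ F_T(t) = P(T \le t) = \begin{cases} 1 - (1 - p)^{[t]} & \text{if } t \ge 0 , \\ 0 & \text{if } t < 0 . \end{cases} \]
We can use this to compute
\[ P(2 < T \le 5) = \left[ 1 - (1 - p)^5 \right] - \left[ 1 - (1 - p)^2 \right] = (1 - p)^2 - (1 - p)^5 . \]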
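As a quick numerical check of Example 4, the following sketch evaluates the distribution function of T and the interval probability P(2 < T ≤ 5). The function name `geometric_cdf` and the choice p = 0.5 are illustrative assumptions, not part of the notes.

```python
import math

def geometric_cdf(t, p):
    """F_T(t) = 1 - (1 - p)**[t] for t >= 0, and 0 for t < 0."""
    if t < 0:
        return 0.0
    return 1.0 - (1.0 - p) ** math.floor(t)

p = 0.5  # illustrative value
# P(2 < T <= 5) = F_T(5) - F_T(2) = (1 - p)**2 - (1 - p)**5
prob = geometric_cdf(5, p) - geometric_cdf(2, p)
print(prob)
```

Note that the function is constant between integers, so for instance `geometric_cdf(2.7, p)` equals `geometric_cdf(2, p)`.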
We may want to find the probability that X falls in a closed interval. To this end, we
need the following:
Proposition 1. Let X be a random variable with distribution function F . Then
\[ P(X < t) = \lim_{s \uparrow t} F(s) . \]
The value \(\lim_{s \uparrow t} F(s)\) is called the left limit of \(F\) at \(t\), and is sometimes denoted by
\(F(t-)\). Part of the conclusion of Proposition 1 is that a distribution function has left limits
everywhere.
To prove this, we first need to show that a probability P (·) obeys a certain kind of
continuity.
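Lemma 2.
(i) Let \(A_1 \subset A_2 \subset A_3 \subset \cdots\) be a non-decreasing sequence of events. Then
\[ \lim_{n \to \infty} P(A_n) = P\Big( \bigcup_{k=1}^{\infty} A_k \Big) . \]
(ii) Let \(A_1 \supset A_2 \supset A_3 \supset \cdots\) be a non-increasing sequence of events. Then
\[ \lim_{n \to \infty} P(A_n) = P\Big( \bigcap_{k=1}^{\infty} A_k \Big) . \]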
Proof. We prove (i). The proof of (ii) is obtained by looking at complements and using (i),
and is left to the reader as an exercise.
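Define \(B_k = A_k \setminus A_{k-1}\), with the convention \(A_0 = \emptyset\). The events \(\{B_k\}\) are disjoint, \(A_n = \bigcup_{k=1}^{n} B_k\), and \(\bigcup_{k=1}^{\infty} A_k = \bigcup_{k=1}^{\infty} B_k\). Thus,
\[
\begin{aligned}
P\Big( \bigcup_{k=1}^{\infty} A_k \Big) = P\Big( \bigcup_{k=1}^{\infty} B_k \Big) &= \sum_{k=1}^{\infty} P(B_k) \\
&= \lim_{n \to \infty} \sum_{k=1}^{n} P(B_k) = \lim_{n \to \infty} P\Big( \bigcup_{k=1}^{n} B_k \Big) = \lim_{n \to \infty} P(A_n) .
\end{aligned}
\]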
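Proof of Proposition 1. Notice that \(\{X < t\} = \bigcup_{k=1}^{\infty} \{X \le t - \tfrac{1}{k}\}\). Then applying Lemma
2 gives
\[ P(X < t) = \lim_{n \to \infty} P\Big(X \le t - \frac{1}{n}\Big) = \lim_{n \to \infty} F\Big(t - \frac{1}{n}\Big) = \lim_{s \uparrow t} F(s) . \]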
Thus, we can also use the distribution function of X to calculate other probabilities
involving X:
\[ P(a \le X \le b) = P(X \le b) - P(X < a) = F(b) - F(a-) , \]
\[ P(X = a) = P(X \le a) - P(X < a) = F(a) - F(a-) . \tag{2} \]
A random variable X is proper if P (−∞ < X < ∞) = 1. Almost all random variables
we will encounter will be proper, but it is worth noting that there do exist random variables
which are not proper.
Example 5. Suppose a particle moves on the integers {. . . , −3, −2, −1, 0, 1, 2, 3, . . .} as
follows: at each move, it moves up one integer with probability 2/3, and moves down one
integer with probability 1/3. The particle starts at 0. Let T be the first time that the
particle is at −1. The event that the particle never hits −1 is {T = ∞}. We will see later
that P (T < ∞) < 1, so that T is not a proper random variable.
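A simulation gives some intuition for Example 5. The sketch below is only an approximation: a computer cannot wait forever, so each walk is cut off after `max_steps` moves (an assumed truncation, as are the function name and the number of trials), and walks that have not reached −1 by then are counted as never reaching it. The estimate should land near 1/2, the value given by the classical gambler's-ruin ratio (1/3)/(2/3), which these notes have not derived at this point.

```python
import random

def hits_minus_one(rng, p_up=2/3, max_steps=200):
    """Return True if the walk started at 0 reaches -1 within max_steps moves."""
    position = 0
    for _ in range(max_steps):
        position += 1 if rng.random() < p_up else -1
        if position == -1:
            return True
    return False  # treated as "never hits -1" (truncation assumption)

rng = random.Random(1)  # fixed seed for reproducibility
trials = 10_000
estimate = sum(hits_minus_one(rng) for _ in range(trials)) / trials
print(estimate)  # Monte Carlo estimate of P(T < infinity)
```

Because the walk drifts upward, a walk that survives many steps is very unlikely to return to −1, which is why the truncation has little effect on the estimate.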
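Writing \(\{X < \infty\} = \bigcup_{k=1}^{\infty} \{X \le k\}\), if \(X\) is a proper random variable then applying
Lemma 2 again shows
\[ 1 = P(X < \infty) = \lim_{n \to \infty} P(X \le n) = \lim_{n \to \infty} F_X(n) . \]
Also, since \(\{X = -\infty\} = \bigcap_{k=1}^{\infty} \{X \le -k\}\), part (ii) of Lemma 2 gives
\[ 0 = P(X = -\infty) = \lim_{n \to \infty} P(X \le -n) = \lim_{n \to \infty} F_X(-n) = \lim_{n \to -\infty} F_X(n) . \]
Finally, since \(\{X \le t\} = \bigcap_{k=1}^{\infty} \{X \le t + \tfrac{1}{k}\}\), using part (ii) of Lemma 2 shows that
\(\lim_{s \downarrow t} F(s) = F(t)\), and so \(F\) is right-continuous everywhere.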
We summarize the properties of the distribution function of a random variable X as
follows:
Proposition 3. Let X be a proper random variable with distribution function F . Then
(i) \(F\) is right-continuous: \(\lim_{s \downarrow t} F(s) = F(t)\),
(ii) the left limits of \(F\) exist everywhere,
(iii) \(\lim_{t \to \infty} F(t) = 1\),
(iv) \(\lim_{t \to -\infty} F(t) = 0\).
The first two properties imply that the worst behavior possible of a distribution function
is that it jumps.
8 Discrete random variables
We call a random variable which can take on only countably many values a discrete random
variable.
8.1 Probability mass functions
Let \(X\) be a discrete random variable which takes values in the set \(A = \{a_0, a_1, a_2, \ldots\}\).
Associated to \(X\) is the function \(p_X(\cdot)\), defined by
\[ p_X(a) = P(X = a) . \tag{3} \]
The function \(p_X(\cdot)\) is defined for all real numbers, although it will be strictly positive only
for \(a\) in the set \(A\). A function \(p(\cdot)\) satisfying
(i) \(p(a) \ge 0\) for all \(a\),
(ii) \(\sum_a p(a) = 1\)
is called a probability mass function, or pmf for short. It is easily checked that pX (·) satisfies
these conditions, and we call it the pmf of X. We write X ∼ p(·) to indicate that X has
pmf p(·).
Notice that from (2) we have
\[ p_X(a) = P(X = a) = F_X(a) - F_X(a-) , \tag{4} \]
so the pmf of X can be determined if the distribution function of X is known.
Example 6. Suppose n independent experiments are performed, each of which can result in
either “success” or “failure”, and suppose that the probability of success on each experiment
is p. Such a sequence of experiments is called Bernoulli trials. Let X be the number of
successes in these \(n\) experiments. The event \(\{X = k\}\) contains all outcomes containing
exactly \(k\) successes and \(n - k\) failures. There are \(\binom{n}{k}\) such outcomes, each having probability
\(p^k (1 - p)^{n-k}\) (by independence). Thus
\[ p_X(k) = P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k} . \tag{5} \]
A random variable \(X\) having the pmf in (5) is called a Binomial(n, p) random variable,
and we write X ∼ Binomial(n, p).
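The Binomial pmf is easy to tabulate. The following sketch (the function name `binomial_pmf` and the values n = 10, p = 0.3 are illustrative choices) evaluates the formula for every k and checks numerically that the values sum to 1, as a pmf must.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p); zero outside k = 0, 1, ..., n."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # illustrative values
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
total = sum(pmf)
print(total)  # should be 1 up to floating-point rounding
```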
8.2 The distribution of a discrete random variable
An event determined by \(X\) is an event of the form \(\{X \in A\}\), where \(A\) is a subset of the real
numbers. We can find the probability of any event determined by \(X\) using only the pmf of
\(X\):
\[ P(X \in A) = \sum_{a \in A} p_X(a) . \tag{6} \]
Applying (6) to the set \(A = (-\infty, t]\) gives
\[ F_X(t) = P(X \le t) = \sum_{a \le t} p_X(a) . \tag{7} \]
Thus the distribution function of X can be computed from the pmf of X.
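The passage between pmf and distribution function can be made concrete. In the sketch below the pmf is a made-up example (values 0, 1, 3 with probabilities 1/4, 1/2, 1/4), and the helper names `cdf` and `left_limit` are illustrative. The distribution function is built from the pmf by summing over values at most t, and the pmf is then recovered from the jumps F(a) − F(a−).

```python
from fractions import Fraction

# Hypothetical pmf, chosen only for illustration.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 3: Fraction(1, 4)}

def cdf(t):
    """F(t) = P(X <= t): sum of the pmf over values a <= t."""
    return sum(p for a, p in pmf.items() if a <= t)

def left_limit(t):
    """F(t-) = P(X < t): sum of the pmf over values a < t."""
    return sum(p for a, p in pmf.items() if a < t)

# Recover the pmf from the jumps of the distribution function.
recovered = {a: cdf(a) - left_limit(a) for a in pmf}
print(recovered == pmf)
```

Exact fractions are used so that the comparison is not affected by rounding.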
To summarize, we record the following:
Proposition 4. Let X be a discrete random variable. Each of the following can be computed
using any of the others:
(i) The probabilities of all events determined by X, that is, the collection of probabilities
{P (X ∈ A) : A ⊂ R},
(ii) the pmf pX (·) of X,
(iii) the distribution function FX (·) of X.
Proof. This is the content of the combination of equations (4), (6), and (7).
The collection of probabilities {P (X ∈ A) : A ⊂ R} is called simply the distribution of
X. It contains all the probabilistic information about the random variable X. Proposition
4 says that for a discrete random variable, it is enough to specify either the pmf or the
distribution function to specify the distribution. Thus, if one is asked to determine the
distribution of X, it is sufficient to provide either the pmf or the distribution function.
9 Continuous random variables
A probability density function (abbreviated pdf or sometimes simply density) is a real-valued function \(f\) defined on the real numbers satisfying
(i) \(f(t) \ge 0\) for all real numbers \(t\),
(ii) \(\int_{-\infty}^{\infty} f(t)\,dt = 1\).
A continuous random variable is a random variable \(X\) for which there exists a pdf \(f_X\)
so that
\[ P(a < X \le b) = \int_a^b f_X(t)\,dt \quad \text{for all } a < b . \tag{8} \]
In fact, if (8) holds, then for any subset \(A\) of the real numbers such that \(\int_A f_X(t)\,dt\) is defined,
the identity
\[ P(X \in A) = \int_A f_X(t)\,dt \tag{9} \]
is valid.
Applying (9) to the set \((-\infty, t]\) shows that
\[ F_X(t) = P(X \le t) = \int_{-\infty}^{t} f_X(s)\,ds , \tag{10} \]
and so the distribution function of X can be determined from the density function of X.
Note that a consequence of (10) is that FX is a continuous function for a continuous
random variable, and in particular
P (X = a) = FX (a) − FX (a−) = 0 .
Applying the Fundamental Theorem of Calculus to (10) shows that
\[ \frac{d}{dt} F_X(t) = f_X(t) \tag{11} \]
at all points t where fX is continuous. Thus if fX is piecewise continuous, as will be the
case in this course, then it can be determined from the distribution function via (11).
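Relations (10) and (11) can be checked numerically. The sketch below uses a hypothetical density f(t) = 2t on [0, 1] (so that F(t) = t² there), approximates the integral in (10) with a midpoint rule, and approximates the derivative in (11) with a central difference. The density, the function names, and the step sizes are all illustrative assumptions.

```python
def density(t):
    """Hypothetical pdf: f(t) = 2t on [0, 1], zero elsewhere."""
    return 2.0 * t if 0.0 <= t <= 1.0 else 0.0

def cdf(t, steps=10_000):
    """Approximate F(t), the integral of f over (-inf, t], by the midpoint rule."""
    if t <= 0.0:
        return 0.0
    upper = min(t, 1.0)  # the density vanishes outside [0, 1]
    h = upper / steps
    return sum(density((i + 0.5) * h) for i in range(steps)) * h

t0, h = 0.6, 1e-3
# Central-difference approximation of dF/dt at t0, to compare with f(t0).
derivative = (cdf(t0 + h) - cdf(t0 - h)) / (2 * h)
print(cdf(0.5), derivative, density(t0))
```

The point t0 = 0.6 is a continuity point of the density, which is where (11) is claimed to hold.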
The following summarizes the situation for continuous random variables with piecewise
continuous densities:
Proposition 5. Let \(X\) be a continuous random variable with piecewise continuous density.
Each of the following can be computed using any of the others:
(i) The probabilities of all events determined by \(X\), that is, the collection of probabilities
\(\{P(X \in A) : A \subset \mathbb{R} \text{ such that } \int_A f_X(t)\,dt \text{ is defined}\}\),
(ii) the pdf fX (·) of X,
(iii) the distribution function FX (·) of X.
Proof. This is what equations (9), (10), (11) say.
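9.1 Interpretation of density function
What is the interpretation of the density function? Suppose that \(X\) has a density \(f\), which
is continuous at the point \(a\). We have
\[ \frac{P(a \le X \le a + \Delta)}{\Delta} = \frac{F(a + \Delta) - F(a)}{\Delta} . \]
The right-hand side tends to \(F'(a) = f(a)\) as \(\Delta \to 0\). Thus we can write
\[ \frac{P(a \le X \le a + \Delta)}{\Delta} = f(a) + \varepsilon_0(\Delta) , \]
where \(\varepsilon_0(\Delta) \to 0\) as \(\Delta \to 0\). Multiplying both sides by \(\Delta\), we have
\[ P(a \le X \le a + \Delta) = f(a)\Delta + \underbrace{\Delta\,\varepsilon_0(\Delta)}_{\varepsilon(\Delta)} . \]
Since \(\varepsilon(\Delta) = \Delta\,\varepsilon_0(\Delta)\), we have \(\varepsilon(\Delta)/\Delta \to 0\) as \(\Delta \to 0\). Thus we can write
\[ P(a \le X \le a + \Delta) \approx f(a)\Delta , \tag{12} \]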
where the error in the approximation is ε(∆) and satisfies ε(∆)/∆ → 0 as ∆ → 0. Equation
(12) is useful in interpreting the meaning of a probability density function: the probability
of X falling in a very small interval near a is approximated by f (a)∆, where ∆ is the length
of the interval.
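To see the approximation (12) at work, the sketch below takes a hypothetical distribution function F(t) = t² on [0, 1], with density f(t) = 2t, and computes the relative error ε(∆)/∆ for shrinking ∆. For this particular choice the error is exactly ∆², so the ratio equals ∆ and visibly tends to 0. The distribution and the point a = 0.3 are illustrative assumptions.

```python
def cdf(t):
    """Hypothetical distribution function: F(t) = t**2 on [0, 1]."""
    return min(max(t, 0.0), 1.0) ** 2

def density(t):
    """Matching density: f(t) = 2t on [0, 1], zero elsewhere."""
    return 2.0 * t if 0.0 <= t <= 1.0 else 0.0

a = 0.3
ratios = []
for delta in (0.1, 0.01, 0.001):
    exact = cdf(a + delta) - cdf(a)       # P(a <= X <= a + delta)
    error = exact - density(a) * delta    # epsilon(delta)
    ratios.append(error / delta)          # epsilon(delta) / delta
print(ratios)
```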
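10 Expected Value
Let \(X\) be a discrete random variable with the following pmf:
\[ p_X(a) = \begin{cases} \frac{2}{5} & \text{if } a = -1 , \\ \frac{2}{5} & \text{if } a = 0 , \\ \frac{1}{5} & \text{if } a = 1 , \\ 0 & \text{if } a \notin \{-1, 0, 1\} . \end{cases} \]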
How should the “average” value of X be defined? A first attempt might be to say that the
average value should be 0, since 0 is in the center of the three possible values {−1, 0, 1}. But
this does not take into account that X does not assume these values with equal probability.
The average should account for not just the values taken on by X, but also the probabilities
associated to each of these values.
This leads to the definition of the expectation of X, which is a weighted average of the
values of X, the weights determined by the pmf or pdf. Precisely, we define
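\[ E(X) = \begin{cases} \sum_a a\,p_X(a) & \text{if } X \text{ is discrete,} \\ \int_{-\infty}^{\infty} t\,f_X(t)\,dt & \text{if } X \text{ is continuous.} \end{cases} \tag{13} \]
E(X) is only defined when the sum or integral in (13) converges absolutely, that is, we
need
\[ \begin{cases} \sum_a |a|\,p_X(a) < \infty & \text{if } X \text{ is discrete,} \\ \int_{-\infty}^{\infty} |t|\,f_X(t)\,dt < \infty & \text{if } X \text{ is continuous.} \end{cases} \]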
In the example above,
\[ E(X) = -1 \cdot \frac{2}{5} + 0 \cdot \frac{2}{5} + 1 \cdot \frac{1}{5} = -\frac{1}{5} . \]
We use the terms expected value and mean interchangeably to refer to the expectation of X; it is also called the first moment of X.
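The weighted average in (13) is a one-line computation for a finite pmf. The sketch below applies it to the example pmf of this section (probabilities 2/5, 2/5, 1/5 on the values −1, 0, 1), using exact fractions; the helper name `expectation` is illustrative.

```python
from fractions import Fraction

# pmf of the example: P(X = -1) = 2/5, P(X = 0) = 2/5, P(X = 1) = 1/5
pmf = {-1: Fraction(2, 5), 0: Fraction(2, 5), 1: Fraction(1, 5)}

def expectation(pmf):
    """E(X) = sum over a of a * p_X(a), for a pmf with finitely many values."""
    return sum(a * p for a, p in pmf.items())

mean = expectation(pmf)
print(mean)  # -1/5
```

The naive "center" of the values is 0, while the weighted average is pulled toward −1 because that value carries more probability than +1.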
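Example 7. Let \(X\) be a Binomial(n, p) random variable. This means that \(X\) has a pmf
given by
\[ p_X(k) = \binom{n}{k} p^k (1 - p)^{n-k} , \]
for \(k = 0, 1, \ldots, n\). The pmf is 0 for any other values. Then
\[
\begin{aligned}
E(X) &= \sum_{k=0}^{n} k \binom{n}{k} p^k (1 - p)^{n-k} \\
&= \sum_{k=1}^{n} k\,\frac{n!}{k!\,(n - k)!}\, p^k (1 - p)^{n-k} \\
&= \sum_{k=1}^{n} \frac{n!}{(k - 1)!\,(n - k)!}\, p^k (1 - p)^{n-k} \\
&= np \sum_{k=1}^{n} \frac{(n - 1)!}{(k - 1)!\,(n - 1 - (k - 1))!}\, p^{k-1} (1 - p)^{n-1-(k-1)} \\
&= np \sum_{k=0}^{n-1} \underbrace{\underbrace{\frac{(n - 1)!}{k!\,(n - 1 - k)!}}_{\binom{n-1}{k}}\, p^k (1 - p)^{n-1-k}}_{\text{pmf of Binomial}(n-1,\,p)\text{ r.v.}} \\
&= np .
\end{aligned}
\]
The last step uses the fact that the pmf of a Binomial(n − 1, p) random variable sums to 1.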
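The identity E(X) = np can be confirmed numerically by summing k·p_X(k) directly. In this sketch the function name `binomial_mean` and the parameters n = 12, p = 0.25 are illustrative choices.

```python
from math import comb

def binomial_mean(n, p):
    """Sum of k * C(n, k) * p**k * (1 - p)**(n - k) over k = 0, ..., n."""
    return sum(k * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))

n, p = 12, 0.25  # illustrative values
print(binomial_mean(n, p), n * p)  # the two numbers should agree up to rounding
```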
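Example 8. We say that \(X\) is an Exponential random variable with parameter \(\lambda\) if it has
a pdf
\[ f(t) = \begin{cases} \frac{1}{\lambda} e^{-\frac{1}{\lambda} t} & \text{if } t \ge 0 , \\ 0 & \text{if } t < 0 . \end{cases} \]
\(X\) has the property that, for \(s, t \ge 0\),
\[ P(X > t + s \mid X > t) = P(X > s) . \]
(The reader should check that!) The expected value is the integral
\[ E(X) = \int_0^{\infty} t\,\frac{1}{\lambda} e^{-\frac{1}{\lambda} t}\,dt . \]
We can evaluate this by integration by parts: Set
\[ u = t , \qquad du = dt , \qquad dv = \frac{1}{\lambda} e^{-\frac{1}{\lambda} t}\,dt , \qquad v = -e^{-\frac{1}{\lambda} t} , \]
so that
\[
\begin{aligned}
\int_0^{\infty} t\,\frac{1}{\lambda} e^{-\frac{1}{\lambda} t}\,dt &= \Big[ -t e^{-\frac{1}{\lambda} t} \Big]_0^{\infty} + \int_0^{\infty} e^{-\frac{1}{\lambda} t}\,dt \\
&= \Big[ -\lambda e^{-\frac{1}{\lambda} t} \Big]_0^{\infty} \\
&= \lambda .
\end{aligned}
\]
The boundary term \(\big[ -t e^{-\frac{1}{\lambda} t} \big]_0^{\infty}\) vanishes, which gives the second line.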
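Both claims of Example 8 can be checked numerically. The sketch below uses the parameterization of these notes, in which λ is the mean (density (1/λ)e^{−t/λ}), approximates E(X) with a midpoint rule on a truncated interval, and compares the conditional probability P(X > t + s | X > t) with P(X > s). The function names, the truncation point 50λ, and the values λ = 2, t = 3, s = 1.5 are illustrative assumptions.

```python
import math

def exponential_pdf(t, lam):
    """Density (1/lam) * exp(-t/lam) for t >= 0, zero for t < 0."""
    return math.exp(-t / lam) / lam if t >= 0 else 0.0

def survival(t, lam):
    """P(X > t) = exp(-t/lam) for t >= 0."""
    return math.exp(-t / lam) if t >= 0 else 1.0

def mean_by_midpoint(lam, upper_factor=50.0, steps=100_000):
    """Midpoint-rule approximation of the integral of t * f(t) over [0, upper_factor * lam]."""
    h = upper_factor * lam / steps
    return sum((i + 0.5) * h * exponential_pdf((i + 0.5) * h, lam)
               for i in range(steps)) * h

lam, t, s = 2.0, 3.0, 1.5  # illustrative values
mean = mean_by_midpoint(lam)
conditional = survival(t + s, lam) / survival(t, lam)  # P(X > t+s | X > t)
print(mean, conditional, survival(s, lam))
```

The tail beyond 50λ carries a negligible part of the integral, so the truncation does not visibly affect the approximation.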
10.1 Functions of random variables
If g : R → R is a function, and X is a random variable, then Y = g(X) is a new random
variable. To calculate E(Y) according to the definition (13), we need the pmf of Y if Y is
discrete, or the pdf of Y if Y is continuous. Fortunately, the following proposition tells us how
to compute E(Y) without finding its density or pmf.
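Proposition 6. Let \(X\) be a random variable, and \(g\) a real-valued function. Then
\[ E(g(X)) = \begin{cases} \sum_a g(a)\,p_X(a) & \text{if } X \text{ is discrete with pmf } p_X , \\ \int_{-\infty}^{\infty} g(t)\,f_X(t)\,dt & \text{if } X \text{ is continuous with pdf } f_X . \end{cases} \]
Proof. We prove the case where X is discrete:
\[
\begin{aligned}
E(g(X)) &= \sum_b b\,P(g(X) = b) \\
&= \sum_b b \sum_{a \,:\, g(a) = b} P(X = a) \\
&= \sum_b \sum_{a \,:\, g(a) = b} b\,p_X(a) \\
&= \sum_b \sum_{a \,:\, g(a) = b} g(a)\,p_X(a) \\
&= \sum_a g(a)\,p_X(a) .
\end{aligned}
\]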
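The two sides of Proposition 6 can be compared on a small case. The sketch below takes the pmf with probabilities 2/5, 2/5, 1/5 on −1, 0, 1 and the illustrative function g(x) = x². It first builds the pmf of Y = g(X) by grouping the values a with the same g(a), as in the proof, and then computes E(Y) both from that pmf and from the shortcut formula.

```python
from collections import defaultdict
from fractions import Fraction

pmf_x = {-1: Fraction(2, 5), 0: Fraction(2, 5), 1: Fraction(1, 5)}

def g(x):
    """Illustrative choice of g; any real-valued function would do."""
    return x * x

# pmf of Y = g(X): P(Y = b) is the sum of p_X(a) over all a with g(a) = b.
pmf_y = defaultdict(Fraction)
for a, p in pmf_x.items():
    pmf_y[g(a)] += p

direct = sum(b * q for b, q in pmf_y.items())       # definition (13) applied to Y
shortcut = sum(g(a) * p for a, p in pmf_x.items())  # Proposition 6
print(direct, shortcut)  # both equal 3/5
```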
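An immediate corollary is the following:
Corollary 7. Let \(X\) be a random variable, and let \(\alpha\) and \(\beta\) be constants. Then
\[ E(\alpha X + \beta) = \alpha E(X) + \beta . \]
Proof. We write the continuous case; the discrete case is similar. Applying Proposition 6
to \(g(x) = \alpha x + \beta\) gives
\[
\begin{aligned}
E(\alpha X + \beta) &= \int_{-\infty}^{\infty} (\alpha t + \beta)\,f_X(t)\,dt \\
&= \alpha \int_{-\infty}^{\infty} t\,f_X(t)\,dt + \beta \int_{-\infty}^{\infty} f_X(t)\,dt \\
&= \alpha E(X) + \beta ,
\end{aligned}
\]
where the last step uses \(\int_{-\infty}^{\infty} f_X(t)\,dt = 1\).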
11 Variance
Expectation measures the center of mass of a density or pmf. Variance is a measure of how
spread out the density or pmf of X is.
The random variable \(Y = (X - E(X))^2\) gives the squared distance of \(X\) to its mean
value. This measures how far X is from its center of mass. Taking the expectation of Y
gives the variance of X:
\[ V(X) = E\big[(X - E(X))^2\big] = \begin{cases} \sum_a (a - E(X))^2\,p_X(a) & \text{if } X \text{ is discrete with pmf } p_X , \\ \int_{-\infty}^{\infty} (t - E(X))^2\,f_X(t)\,dt & \text{if } X \text{ is continuous with pdf } f_X . \end{cases} \]
The following is a useful way to compute variance.
Proposition 8. For a random variable \(X\),
\[ V(X) = E(X^2) - [E(X)]^2 . \]
Proof. The proof is similar in the continuous and discrete cases; we show here the discrete
case:
\[
\begin{aligned}
V(X) &= \sum_a (a - E(X))^2\,p_X(a) \\
&= \sum_a \big(a^2 - 2aE(X) + [E(X)]^2\big)\,p_X(a) \\
&= \sum_a a^2\,p_X(a) - 2E(X) \sum_a a\,p_X(a) + [E(X)]^2 \sum_a p_X(a) \\
&= E(X^2) - 2E(X)E(X) + [E(X)]^2 \\
&= E(X^2) - [E(X)]^2 .
\end{aligned}
\]
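Proposition 8 can be checked on the pmf with probabilities 2/5, 2/5, 1/5 on the values −1, 0, 1. The sketch below computes the variance once from the definition and once from the shortcut E(X²) − [E(X)]², using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

pmf = {-1: Fraction(2, 5), 0: Fraction(2, 5), 1: Fraction(1, 5)}

mean = sum(a * p for a, p in pmf.items())               # E(X)   = -1/5
second_moment = sum(a * a * p for a, p in pmf.items())  # E(X^2) =  3/5

# Definition: V(X) = sum over a of (a - E(X))**2 * p_X(a)
variance_by_definition = sum((a - mean) ** 2 * p for a, p in pmf.items())
# Proposition 8: V(X) = E(X^2) - [E(X)]^2
variance_by_shortcut = second_moment - mean**2

print(variance_by_definition, variance_by_shortcut)  # both equal 14/25
```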