Continuous Random Variables
COMP 245 STATISTICS
Dr N A Heard
Contents
1 Continuous Random Variables
  1.1 Introduction
  1.2 Probability Density Functions
  1.3 Transformations

2 Mean, Variance and Quantiles
  2.1 Expectation
  2.2 Variance
  2.3 Quantiles

3 Continuous Distributions
  3.1 Uniform
  3.2 Exponential
  3.3 Gaussian
1 Continuous Random Variables

1.1 Introduction
Definition
Suppose again we have a random experiment with sample space S and probability measure
P.
Recall our definition of a random variable as a mapping X : S → R from the sample space
S to the real numbers inducing a probability measure PX ( B) = P{ X −1 ( B)}, B ⊆ R.
We define the random variable X to be (absolutely) continuous if ∃ f X : R → R s.t.
PX(B) = ∫_{x∈B} fX(x) dx,   B ⊆ R,   (1)
in which case f X is referred to as the probability density function (pdf) of X.
Comments
A connected sequence of comments:
• One consequence of this definition is that the probability of any singleton set B = { x }, x ∈
R is zero for a continuous random variable,
PX ( X = x ) = PX ({ x }) = 0.
• This in turn implies that any countable set B = { x1 , x2 , . . .} ⊆ R will have zero probability measure for a continuous random variable, since PX ( X ∈ B) = PX ( X = x1 ) + PX ( X =
x2 ) + . . ..
• This automatically implies that the range of a continuous random variable will be uncountable.
• This tells us that a random variable cannot be both discrete and continuous.
Examples
The following quantities would typically be modelled with continuous random variables.
They are measurements of time, distance and other phenomena that can, at least in theory, be
determined to an arbitrarily high degree of accuracy.
• The height or weight of a randomly chosen individual from a population.
• The duration of this lecture.
• The volume of fuel consumed by a bus on its route.
• The total distance driven by a taxi cab in a day.
Note that there are obvious ways to discretise all of the above examples.
1.2 Probability Density Functions
pdf as the derivative of the cdf
From (1), we notice the cumulative distribution function (cdf) for a continuous random
variable X is therefore given by
FX(x) = ∫_{−∞}^{x} fX(t) dt,   ∀ x ∈ R.
This expression leads us to a definition of the pdf of a continuous r.v. X for which we
already have the cdf; by the Fundamental Theorem of Calculus we find the pdf of X to be
given by
fX(x) = (d/dx) FX(x) = FX′(x).
Properties of a pdf
Since the pdf is the derivative of the cdf, and because we know that a cdf is non-decreasing,
this tells us the pdf will always be non-negative.
So, in the same way as we did for cdfs and discrete pmfs, we have the following checklist
to ensure f X is a valid pdf.
pdf:

1. fX(x) ≥ 0, ∀ x ∈ R;

2. ∫_{−∞}^{∞} fX(x) dx = 1.
Interval Probabilities
Suppose we are interested in whether a continuous r.v. X lies in an interval (a, b]. From the
definition of a continuous random variable, this is given by
PX(a < X ≤ b) = ∫_a^b fX(x) dx.

That is, the area under the pdf between a and b.

[Figure: pdf f(x) with the area between a and b shaded, representing P(a < X < b).]
Further Comments:
• Besides still being a non-decreasing function satisfying FX (−∞) = 0, FX (∞) = 1, the cdf
FX of a continuous random variable X is also (absolutely) continuous.
• For a continuous r.v., since P(X = x) = 0 ∀ x, FX(x) = P(X ≤ x) ≡ P(X < x).
• For small δx, f X ( x )δx is approximately the probability that X takes a value in the small
interval, say, [ x, x + δx ).
• Since the density (pdf) f X ( x ) is not itself a probability, then unlike the pmf of a discrete
r.v. we do not require f X ( x ) ≤ 1.
• From (1) it is clear that the pdf of a continuous r.v. X completely characterises its distribution, so we often just specify f X .
Example
Suppose we have a continuous random variable X with probability density function
f(x) = { cx²,   0 < x < 3
       { 0,     otherwise
for some unknown constant c.
Questions
1. Determine c.
2. Find the cdf of X.
3. Calculate P(1 < X < 2).
Solutions
1. To find c: We must have

   1 = ∫_{−∞}^{∞} f(x) dx = ∫_0^3 cx² dx = c [x³/3]_0^3 = 9c   ⇒   c = 1/9.

2. F(x) = ∫_{−∞}^{x} f(u) du = { 0,                        x < 0
                                { ∫_0^x (u²/9) du = x³/27,   0 ≤ x ≤ 3
                                { 1,                        x > 3.

3. P(1 < X < 2) = F(2) − F(1) = 8/27 − 1/27 = 7/27 ≈ 0.2593.
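A quick numerical check of these three answers, as a minimal Python sketch assuming SciPy is available (the names f, total and prob are just illustrative):

    # Numerical check of the worked example above.
    from scipy.integrate import quad

    c = 1 / 9
    f = lambda t: c * t**2          # the pdf on (0, 3); zero elsewhere

    total, _ = quad(f, 0, 3)        # integral of the pdf over its support; should be 1
    prob, _ = quad(f, 1, 2)         # P(1 < X < 2)
    print(round(total, 4), round(prob, 4))   # 1.0 and 0.2593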
[Figure: pdf f(x) and cdf F(x) of this example, plotted for x from −1 to 5.]

1.3 Transformations
Transforming random variables
Suppose we have a continuous random variable X and wish to consider the transformed
random variable Y = g( X ) for some function g : R → R, s.t. g is continuous and strictly
monotonic (so g−1 exists).
Suppose g is monotonic increasing. Then for y ∈ R, Y ≤ y ⇐⇒ X ≤ g−1 (y). So
FY (y) = PY (Y ≤ y) = PX ( X ≤ g−1 (y)) = FX ( g−1 (y))
By the chain rule of differentiation,

fY(y) = FY′(y) = fX{g−1(y)} (g−1)′(y).

Note (g−1)′(y) = (d/dy) g−1(y) is positive since we assumed g was increasing.

If we had g monotonic decreasing, Y ≤ y ⇐⇒ X ≥ g−1(y) and
FY (y) = PX ( X ≥ g−1 (y)) = 1 − FX ( g−1 (y)).
So by comparison with before, we would have
fY(y) = FY′(y) = − fX{g−1(y)} (g−1)′(y)

with (g−1)′(y) always negative.
So overall, for Y = g( X ) we have
fY(y) = fX{g−1(y)} |(g−1)′(y)|.   (2)
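Formula (2) can be checked by simulation. The sketch below is illustrative only: it assumes NumPy is available and uses the arbitrary choice g(x) = x² with X ∼ U(0,1), for which (2) gives fY(y) = 1/(2√y) on (0, 1).

    # Monte Carlo check of the change-of-variables formula (2).
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, size=100_000)
    y = x**2                                    # Y = g(X) with g(x) = x^2

    grid = np.linspace(0.1, 0.9, 5)
    predicted = 1.0 / (2.0 * np.sqrt(grid))     # fY(y) from formula (2)

    # Empirical density estimate of Y from a histogram
    hist, edges = np.histogram(y, bins=50, range=(0.0, 1.0), density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    empirical = np.interp(grid, centres, hist)

    print(np.round(predicted, 2))
    print(np.round(empirical, 2))               # the two rows should roughly agree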
2 Mean, Variance and Quantiles

2.1 Expectation
E( X )
For a continuous random variable X we define the mean or expectation of X,
µX or EX(X) = ∫_{−∞}^{∞} x fX(x) dx.

More generally, for a function of interest of the random variable g : R → R we have

EX{g(X)} = ∫_{−∞}^{∞} g(x) fX(x) dx.
Linearity of Expectation
Clearly, for continuous random variables we again have linearity of expectation
E(aX + b) = aE(X) + b,   ∀ a, b ∈ R,
and that for two functions g, h : R → R, we have additivity of expectation
E{ g( X ) + h( X )} = E{ g( X )} + E{ h( X )}.
2.2 Variance
Var( X )
The variance of a continuous random variable X is given by
σX² or VarX(X) = E{(X − µX)²} = ∫_{−∞}^{∞} (x − µX)² fX(x) dx,
and again it is easy to show that
VarX(X) = ∫_{−∞}^{∞} x² fX(x) dx − µX² = E(X²) − {E(X)}².
For a linear transformation aX + b we again have Var(aX + b) = a² Var(X), ∀ a, b ∈ R.
2.3 Quantiles
QX (α)
Recall we defined the lower and upper quartiles and median of a sample of data as points
(¼,¾,½)-way through the ordered sample.
For α ∈ [0, 1] and a continuous random variable X we define the α-quantile of X, QX(α), as

QX(α) = min{q ∈ R : FX(q) = α}.
If FX is invertible then
QX (α) = FX−1 (α).
In particular the median of a random variable X is QX(0.5). That is, the median is a solution to the equation FX(x) = 1/2.
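When FX is known but not invertible in closed form, QX(α) can be found numerically by root-finding. A minimal sketch, assuming SciPy is available (the helper name quantile is our own):

    # Solve FX(q) = alpha numerically on a bracketing interval [lo, hi].
    from scipy.optimize import brentq

    def quantile(F, alpha, lo, hi):
        return brentq(lambda q: F(q) - alpha, lo, hi)

    # Example: the cdf F(x) = x^3/27 on (0, 3) from the running example.
    F = lambda x: x**3 / 27
    print(quantile(F, 0.5, 0.0, 3.0))   # the median, approximately 2.381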
Example (continued)
Again suppose we have a continuous random variable X with probability density function
given by

f(x) = { x²/9,   0 < x < 3
       { 0,      otherwise.
Questions
1. Calculate E( X ).
2. Calculate Var( X ).
3. Calculate the median of X.
Solutions
1. E(X) = ∫_{−∞}^{∞} x f(x) dx = ∫_0^3 x · (x²/9) dx = [x⁴/36]_0^3 = 3⁴/36 = 2.25.

2. E(X²) = ∫_{−∞}^{∞} x² f(x) dx = ∫_0^3 x² · (x²/9) dx = [x⁵/45]_0^3 = 3⁵/45 = 5.4.

   So Var(X) = E(X²) − {E(X)}² = 5.4 − 2.25² = 0.3375.

3. From earlier, F(x) = x³/27 for 0 < x < 3.

   Setting F(x) = 1/2 and solving, we get x³/27 = 1/2 ⇐⇒ x = ∛(27/2) = 3/∛2 ≈ 2.3811 for the median.
3 Continuous Distributions

3.1 Uniform
U( a, b)
Suppose X is a continuous random variable with probability density function
f(x) = { 1/(b − a),   a < x < b
       { 0,           otherwise,
and hence corresponding cumulative distribution function
F(x) = { 0,                 x ≤ a
       { (x − a)/(b − a),   a < x < b
       { 1,                 x ≥ b.
Then X is said to follow a uniform distribution on the interval ( a, b) and we write X ∼
U( a, b).
Example: U(0,1)
[Figure: pdf f(x) and cdf F(x) of U(0,1).]
Notice from the cdf that the quantiles of U(0,1) are the special case where Q(α) = α.
Relationship between U( a, b) and U(0,1)
Suppose X ∼ U(0, 1), so FX ( x ) = x, 0 ≤ x ≤ 1. For a < b ∈ R, if Y = a + (b − a) X then
Y ∼ U( a, b).
[Diagram: the map Y = a + (b − a)X sending X ∈ (0, 1) to Y ∈ (a, b).]
Proof:

We first observe that for any y ∈ (a, b), Y ≤ y ⇐⇒ a + (b − a)X ≤ y ⇐⇒ X ≤ (y − a)/(b − a).

From this we find Y ∼ U(a, b), since FY(y) = P(Y ≤ y) = P(X ≤ (y − a)/(b − a)) = FX((y − a)/(b − a)) = (y − a)/(b − a).
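This relationship is also how U(a, b) draws are typically generated from U(0, 1) draws. A small simulation sketch, assuming NumPy is available (a = 2 and b = 5 are arbitrary illustrative values):

    # Scaling U(0,1) samples gives U(a,b) samples.
    import numpy as np

    rng = np.random.default_rng(1)
    a, b = 2.0, 5.0
    x = rng.uniform(0.0, 1.0, size=100_000)    # X ~ U(0,1)
    y = a + (b - a) * x                        # Y = a + (b - a)X

    # Compare the empirical cdf of Y with FY(y) = (y - a)/(b - a) at a few points.
    for q in (2.5, 3.5, 4.5):
        print(q, np.mean(y <= q), (q - a) / (b - a))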
Mean and Variance of U( a, b)
To find the mean of X ∼ U( a, b),
E(X) = ∫_{−∞}^{∞} x f(x) dx = ∫_a^b x/(b − a) dx = [x²/(2(b − a))]_a^b
     = (b² − a²)/(2(b − a)) = (b − a)(b + a)/(2(b − a)) = (a + b)/2.

Similarly we get Var(X) = E(X²) − E(X)² = (b − a)²/12, so

µ = (a + b)/2,   σ² = (b − a)²/12.

3.2 Exponential
Exp(λ)
Suppose now X is a random variable taking value on R+ = [0, ∞) with pdf
f(x) = λ e^{−λx},   x ≥ 0,
for some λ > 0.
Then X is said to follow an exponential distribution with rate parameter λ and we write
X ∼ Exp(λ).
Straightforward integration between 0 and x leads to the cdf,
F(x) = 1 − e^{−λx},   x ≥ 0.
The mean and variance are given by
µ = 1/λ,   σ² = 1/λ².
Example: Exp(1), Exp(0.5) & Exp(0.2) pdfs

[Figure: pdfs f(x) of Exp(1), Exp(0.5) and Exp(0.2) for x from 0 to 10.]

Example: Exp(0.2), Exp(0.5) & Exp(1) cdfs

[Figure: cdfs F(x) of Exp(0.2), Exp(0.5) and Exp(1) for x from 0 to 10.]
Lack of Memory Property
First notice that from the exponential distribution cdf equation we have P(X > x) = e^{−λx}.
An important (and not always desirable) characteristic of the exponential distribution is
the so called lack of memory property.
For x, s > 0, consider the conditional probability P( X > x + s| X > s) of the additional
magnitude of an exponentially distributed random variable given we already know it is greater
than s.
Well,

P(X > x + s | X > s) = P(X > x + s) / P(X > s),

which, when X ∼ Exp(λ), gives

P(X > x + s | X > s) = e^{−λ(x+s)} / e^{−λs} = e^{−λx},

again an exponential distribution with parameter λ.
So if we think of the exponential variable as the time to an event, then knowledge that we
have waited time s for the event tells us nothing about how much longer we will have to wait
- the process has no memory.
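The lack of memory property is easy to see empirically; a minimal sketch assuming NumPy is available (the values of λ, x and s are arbitrary illustrative choices):

    # Empirical illustration of P(X > x + s | X > s) = P(X > x) for X ~ Exp(lam).
    import numpy as np

    rng = np.random.default_rng(2)
    lam, x, s = 0.5, 1.0, 2.0
    t = rng.exponential(scale=1 / lam, size=1_000_000)

    p_unconditional = np.mean(t > x)                      # P(X > x)
    p_conditional = np.mean(t > x + s) / np.mean(t > s)   # P(X > x + s | X > s)
    print(p_unconditional, p_conditional, np.exp(-lam * x))   # all approximately equal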
Examples
Exponential random variables are often used to model the time until occurrence of a random event where there is an assumed constant risk (λ) of the event happening over time, and so are frequently used as the simplest model, for example, in reliability analysis. Examples include:
• the time to failure of a component in a system;
• the time until we find the next mistake on my slides;
• the distance we travel along a road until we find the next pothole;
• the time until the next job arrives at a database server.
Link with Poisson Distribution
Notice the duality between some of the exponential r.v. examples and those we saw for a
Poisson distribution. In each case, “number of events” has been replaced with “time between
events”.
Claim:
If events in a random process occur according to a Poisson distribution with rate λ then the
time between events has an exponential distribution with rate parameter λ.
Proof:
Suppose we have some random event process such that ∀ x > 0, the number of events
occurring in [0, x ], Nx , follows a Poisson distribution with rate parameter λ, so Nx ∼ Poi(λx ).
Such a process is known as an homogeneous Poisson process. Let X be the time until the first
event of this process arrives.
Then we notice that
P(X > x) ≡ P(Nx = 0) = (λx)⁰ e^{−λx} / 0! = e^{−λx},
and hence X ∼ Exp(λ). The same argument applies for all subsequent inter-arrival times.
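A simulation sketch of this claim, assuming NumPy is available (the values of λ and x are arbitrary illustrative choices): generating Exp(λ) inter-arrival times and counting events in [0, x] reproduces the Poi(λx) behaviour.

    # Counts in [0, x] of a process with Exp(lam) inter-arrival times.
    import numpy as np

    rng = np.random.default_rng(3)
    lam, x, reps = 2.0, 3.0, 50_000

    inter = rng.exponential(scale=1 / lam, size=(reps, 50))   # Exp(lam) gaps
    arrivals = np.cumsum(inter, axis=1)                       # event times
    counts = np.sum(arrivals <= x, axis=1)                    # N_x per replicate

    print(np.mean(counts), lam * x)                 # mean of Poi(lam*x) is lam*x
    print(np.mean(counts == 0), np.exp(-lam * x))   # P(N_x = 0) = P(X > x)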
3.3 Gaussian
N(µ, σ2 )
Suppose X is a random variable taking value on R with pdf
f(x) = 1/(σ√(2π)) exp{−(x − µ)²/(2σ²)},
for some µ ∈ R, σ > 0.
Then X is said to follow a Gaussian or normal distribution with mean µ and variance σ2 ,
and we write X ∼ N(µ, σ2 ).
The cdf of X ∼ N(µ, σ2 ) is not analytically tractable for any (µ, σ), so we can only write
F(x) = 1/(σ√(2π)) ∫_{−∞}^{x} exp{−(t − µ)²/(2σ²)} dt.
Example: N(0,1), N(2,1) & N(0,4) pdfs

[Figure: pdfs f(x) of N(0,1), N(2,1) and N(0,4) for x from −4 to 6.]

Example: N(0,1), N(2,1) & N(0,4) cdfs

[Figure: cdfs F(x) of N(0,1), N(2,1) and N(0,4) for x from −4 to 6.]
N(0, 1)
Setting µ = 0, σ = 1 and Z ∼ N(0, 1) gives the special case of the standard normal, with
simplified density
f(z) ≡ φ(z) = (1/√(2π)) e^{−z²/2}.
Again for the cdf, we can only write
F(z) ≡ Φ(z) = (1/√(2π)) ∫_{−∞}^{z} e^{−t²/2} dt.
Statistical Tables
Since the cdf, and therefore any probabilities, associated with a normal distribution are not
analytically available, numerical integration procedures are used to find approximate probabilities.
In particular, statistical tables contain values of the standard normal cdf Φ(z) for a range of
values z ∈ R, and the quantiles Φ−1 (α) for a range of values α ∈ (0, 1). Linear interpolation is
used for approximation between the tabulated values.
But why just tabulate N(0, 1)? We will now see how all normal distribution probabilities
can be related back to probabilities from a standard normal distribution.
Linear Transformations of Normal Random Variables
Suppose we have X ∼ N(µ, σ2 ). Then it is also true that for any constants a, b ∈ R, the
linear combination aX + b also follows a normal distribution. More precisely,
X ∼ N(µ, σ²) ⇒ aX + b ∼ N(aµ + b, a²σ²),   a, b ∈ R.
(Note that the mean and variance parameters of this transformed distribution follow from the
general results for expectation and variance of any random variable under linear transformation.)
In particular, this allows us to standardise any normal r.v.,
X ∼ N(µ, σ²) ⇒ (X − µ)/σ ∼ N(0, 1).
Standardising Normal Random Variables
So if X ∼ N(µ, σ²) and we set Z = (X − µ)/σ, then since σ > 0 we can first observe that for any x ∈ R,

X ≤ x ⇐⇒ (X − µ)/σ ≤ (x − µ)/σ ⇐⇒ Z ≤ (x − µ)/σ.
Therefore we can write the cdf of X in terms of Φ,

FX(x) = P(X ≤ x) = P(Z ≤ (x − µ)/σ) = Φ((x − µ)/σ).
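In code, Φ (and hence FX for any normal X) can be evaluated via the error function rather than tables. A minimal sketch using only the Python standard library (the values µ = 2, σ = 3 and x = 5 are illustrative):

    # FX(x) = Phi((x - mu)/sigma), with Phi computed from the error function.
    from math import erf, sqrt

    def Phi(z):
        # standard normal cdf via the error function
        return 0.5 * (1 + erf(z / sqrt(2)))

    mu, sigma, x = 2.0, 3.0, 5.0
    print(Phi((x - mu) / sigma))   # Phi(1.0), approximately 0.8413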
Table of Φ
z      Φ(z)       z      Φ(z)       z      Φ(z)       z       Φ(z)
0      .5         0.9    .816       1.8    .964       2.8     .997
.1     .540       1.0    .841       1.9    .971       3.0     .998
.2     .579       1.1    .864       2.0    .977       3.5     .9998
.3     .618       1.2    .885       2.1    .982       1.282   .9
.4     .655       1.3    .903       2.2    .986       1.645   .95
.5     .691       1.4    .919       2.3    .989       1.96    .975
.6     .726       1.5    .933       2.4    .992       2.326   .99
.7     .758       1.6    .945       2.5    .994       2.576   .995
.8     .788       1.7    .955       2.6    .995       3.09    .999
Using Table of Φ
First of all notice that Φ(z) has been tabulated for z ≥ 0.
This is because the standard normal pdf φ is symmetric about 0, so φ(−z) = φ(z). For the
cdf Φ, this means
Φ(z) = 1 − Φ(−z).
So for example, Φ(−1.2) = 1 − Φ(1.2) ≈ 1 − 0.885 = 0.115.
Similarly, if Z ∼ N(0, 1) and we want P( Z > z), then for example P( Z > 1.5) = 1 − P( Z ≤
1.5) = 1 − Φ(1.5).
Important Quantiles of N(0, 1)
We will often have cause to use the 97.5% and 99.5% quantiles of N(0, 1), given by Φ−1 (0.975)
and Φ−1 (0.995).
Φ(1.96) ≈ 97.5%.
So with 95% probability an N(0, 1) r.v. will lie in [−1.96, 1.96] (≈ [−2, 2]).
Φ(2.58) ≈ 99.5%.
So with 99% probability an N(0, 1) r.v. will lie in [−2.58, 2.58].
More generally, for α ∈ (0, 1) and defining z1−α/2 to be the (1 − α/2) quantile of N(0, 1), if
Z ∼ N(0, 1) then
PZ ( Z ∈ [−z1−α/2 , z1−α/2 ]) = 1 − α.
More generally still, if X ∼ N(µ, σ2 ), then
PX ( X ∈ [µ − σz1−α/2 , µ + σz1−α/2 ]) = 1 − α,
and hence [µ − σz1−α/2 , µ + σz1−α/2 ] gives a (1 − α) probability region for X centred around
µ.
This can be rewritten as
PX (| X − µ| ≤ σz1−α/2 ) = 1 − α.
Example
An analogue signal received at a detector (measured in microvolts) may be modelled as a
Gaussian random variable X ∼ N(200, 256).
1. What is the probability that the signal will exceed 240µV?
2. What is the probability that the signal is larger than 240µV given that it is greater than
210µV?
Solutions:
1. P(X > 240) = 1 − P(X ≤ 240) = 1 − Φ((240 − 200)/√256) = 1 − Φ(2.5) ≈ 0.00621.

2. P(X > 240 | X > 210) = P(X > 240)/P(X > 210) = {1 − Φ((240 − 200)/√256)}/{1 − Φ((210 − 200)/√256)} ≈ 0.02335.
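A quick numerical check of both answers, assuming SciPy is available:

    # X ~ N(200, 256), so the standard deviation is sqrt(256) = 16.
    from scipy.stats import norm

    mu, sigma = 200, 16
    p1 = 1 - norm.cdf(240, loc=mu, scale=sigma)         # P(X > 240)
    p2 = p1 / (1 - norm.cdf(210, loc=mu, scale=sigma))  # P(X > 240 | X > 210)
    print(p1, p2)                                       # approximately 0.00621 and 0.0233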
The Central Limit Theorem
Let X1 , X2 , . . . , Xn be n independent and identically distributed (i.i.d.) random variables
from any probability distribution, each with mean µ and variance σ2 .
From before we know E(∑_{i=1}^n Xi) = nµ and Var(∑_{i=1}^n Xi) = nσ². First notice that

E(∑_{i=1}^n Xi − nµ) = 0,   Var(∑_{i=1}^n Xi − nµ) = nσ².

Dividing by √n σ,

E{(∑_{i=1}^n Xi − nµ)/(√n σ)} = 0,   Var{(∑_{i=1}^n Xi − nµ)/(√n σ)} = 1.
But we can now present the following, astonishing result.
lim_{n→∞} (∑_{i=1}^n Xi − nµ)/(√n σ) ∼ Φ.
This can also be written as

lim_{n→∞} (X̄ − µ)/(σ/√n) ∼ Φ,   where X̄ = (∑_{i=1}^n Xi)/n.

Or finally, for large n we have approximately

X̄ ∼ N(µ, σ²/n),   or   ∑_{i=1}^n Xi ∼ N(nµ, nσ²).
We note here that although all these approximate distributional results hold irrespective
of the distribution of the { Xi }, in the special case where Xi ∼ N(µ, σ2 ) these distributional
results are, in fact, exact. This is because the sum of independent normally distributed random
variables is also normally distributed.
Example
Consider the most simple example, that X1 , X2 , . . . are i.i.d. Bernoulli( p) discrete random
variables taking value 0 or 1.
Then the { Xi } each have mean µ = p and variance σ2 = p(1 − p).
By definition, we know that for any n,
∑_{i=1}^n Xi ∼ Binomial(n, p).
But now, by the Central Limit Theorem (CLT), we also have for large n that approximately
∑_{i=1}^n Xi ∼ N(nµ, nσ²) ≡ N(np, np(1 − p)).
So for large n
Binomial(n, p) ≈ N(np, np(1 − p)).
Notice that the LHS is a discrete distribution, and the RHS is a continuous distribution.
Binomial(10,½) pmf & N(5,2.5) pdf

[Figure: Binomial(10,0.5) pmf p(x) alongside the N(5,2.5) pdf f(x), for x from 0 to 10.]

Binomial(100,½) pmf & N(50,25) pdf

[Figure: Binomial(100,0.5) pmf p(x) alongside the N(50,25) pdf f(x), for x from 20 to 80.]

Binomial(1000,½) pmf & N(500,250) pdf

[Figure: Binomial(1000,0.5) pmf p(x) alongside the N(500,250) pdf f(x), for x from 400 to 600.]
So suppose X was the number of heads found on 1000 tosses of a fair coin, and we were
interested in P( X ≤ 490).
Using the binomial distribution pmf, we would need to calculate P( X ≤ 490) = p X (0) +
p X (1) + p X (2) + . . . + p X (490) (!) (≈ 0.27).
However, using the CLT we have approximately X ∼ N(500, 250) and so

P(X ≤ 490) ≈ Φ((490 − 500)/√250) = Φ(−0.632) = 1 − Φ(0.632) ≈ 0.26.
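The two calculations can be compared directly in code; a minimal sketch assuming SciPy is available:

    # Exact binomial probability versus the CLT (normal) approximation.
    from scipy.stats import binom, norm

    exact = binom.cdf(490, 1000, 0.5)                # P(X <= 490) for X ~ Binomial(1000, 0.5)
    approx = norm.cdf(490, loc=500, scale=250**0.5)  # X approximately N(500, 250)
    print(exact, approx)                             # roughly 0.27 and 0.26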
Log-Normal Distribution
Suppose X ∼ N(µ, σ2 ), and consider the transformation Y = e X .
Then if g(x) = e^x, g−1(y) = log(y) and (g−1)′(y) = 1/y.
Then by (2) we have

fY(y) = 1/(σy√(2π)) exp{−(log(y) − µ)²/(2σ²)},   y > 0,

and we say Y follows a log-normal distribution.
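The construction Y = e^X can be checked by simulation against the density just derived; a minimal sketch assuming NumPy is available (µ = 0, σ = 1 are illustrative values):

    # Exponentiate normal draws and compare a histogram with fY(y).
    import numpy as np

    rng = np.random.default_rng(4)
    mu, sigma = 0.0, 1.0
    y = np.exp(rng.normal(mu, sigma, size=200_000))   # Y = exp(X), X ~ N(mu, sigma^2)

    def f_Y(v):
        return np.exp(-(np.log(v) - mu)**2 / (2 * sigma**2)) / (v * sigma * np.sqrt(2 * np.pi))

    grid = np.array([0.5, 1.0, 2.0, 4.0])
    hist, edges = np.histogram(y, bins=200, range=(0.0, 10.0), density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    print(np.round(f_Y(grid), 3))
    print(np.round(np.interp(grid, centres, hist), 3))   # should roughly match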
Example: LN(0,1), LN(2,1) & LN(0,4) pdfs
[Figure: pdfs f(x) of LN(0,1), LN(2,1) and LN(0,4) for x from 0 to 10.]