Lecture 03: Continuous random variables
Ba Chu
E-mail: ba [email protected]
Web: http://www.carleton.ca/∼bchu
(Note that this is a lecture note. Please refer to the textbooks suggested in the course outline for
details. Examples will be given and explained in class.)
1 Objectives
In reality, we often encounter situations where the random event under consideration has uncountably many possible outcomes. The purpose of this lecture is to introduce the concept of continuous random
variables, which are associated with events having uncountably many outcomes. First, I will motivate this
concept with the continuous uniform distribution. Second, I will define the probability density
functions (pdf) and the cumulative distribution functions (cdf) of continuous random variables.
Third, I will explain typical continuous random variables such as the normal random variable,
the t random variable, and the χ2 random variable. Fourth, I will explain the concept of moment
generating functions (mgf) and their usefulness in deriving moments of random variables. Examples
will be explained along the way.
2 Motivation
Consider a random variable X whose set of possible values is the unit interval, i.e., [0, 1]. Specifically,
we ask the following question:
what probability distribution formalizes the notion of ‘equally likely’ outcomes in the
unit interval [0, 1]?
We have learnt to formalize the notion of ‘equally likely’ by assigning the same probability to
each individual outcome in the sample space. Thus, if S = {s_1, . . . , s_N}, then P(s_i) = 1/N. If
E = {s_{i_1}, . . . , s_{i_k}} denotes an event in S, then
P(E) = \sum_{j=1}^{k} P(\{s_{i_j}\}) = \sum_{j=1}^{k} \frac{1}{N} = \frac{k}{N}.
Now, if E is an event with countably many outcomes taking real values in [0, 1] (say, E = {1/2, 1/3, 1/4, . . . }), a difficulty arises. This is because, if we assume that every value in [0, 1] is equally likely with some common probability ε > 0, i.e., P(X = x) = ε for every x ∈ [0, 1], we obtain

P(E) = \sum_{j=2}^{\infty} P(\{1/j\}) = \sum_{j=2}^{\infty} ε = \infty.    (2.1)
Hence (since no probability can exceed 1), we must assign a probability of zero to every outcome in [0, 1], i.e., P(X = x) = 0 for
every x ∈ [0, 1]; thus P(E) = 0. This may seem confusing. However, in probability we always
emphasize that
If an event is impossible, then it necessarily has zero probability; however, having zero
probability does not necessarily mean that an event is impossible.
As we have seen, the probability of any countable event must be zero. Hence, to obtain positive
probabilities, we must consider events whose cardinality is uncountable.
Consider the events [0, 0.5] and [0.5, 1]. Because all outcomes are equally likely, these events
must have the same probability, i.e., P (X ∈ [0, 0.5]) = P (X ∈ [0.5, 1]). Since P (X ∈ [0, 1]) = 1, we
have: P (X ∈ [0, 0.5]) = P (X ∈ [0.5, 1]) = 0.5. The intuitive conclusion here is that the probability
of each subinterval equals the proportion of the entire interval occupied by the subinterval. In
mathematical notation, our conclusion can be expressed as follows:
Suppose that X(S) = [0, 1] and each x ∈ [0, 1] is equally likely. If 0 ≤ a ≤ b ≤ 1, then
P (X ∈ [a, b]) = b − a.
The probability distribution that we discussed above is called the uniform distribution. If X ∼
Uniform[0, 1], then the cdf of X is given by:
• If y < 0, then F(y) = P(X ≤ y) = 0.
• If y ∈ [0, 1], then F(y) = y.
• If y > 1, then F(y) = 1.
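As a quick numerical illustration (not part of the original note), here is a minimal Python sketch, assuming NumPy is available, that estimates P(X ∈ [a, b]) for X ∼ Uniform[0, 1] by simulation and compares it with b − a:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=1_000_000)   # one million draws from Uniform[0, 1]

a, b = 0.25, 0.6
estimate = np.mean((a <= x) & (x <= b))     # empirical estimate of P(X in [a, b])
print(estimate, b - a)                      # both should be close to 0.35
```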
3 Basic Concepts
Consider a real-valued function f : R −→ R. Let Area_{[a,b]}(f) denote the area of the region bounded by the
graph of f between the end points a and b. By elementary integral calculus, we can determine that
Area_{[a,b]}(f) = \int_a^b f(x)\,dx. Now, we introduce the notion of pdfs of continuous random variables
on the real probability space.
Definition 1. A pdf is a function f : R −→ R such that
1. f (x) ≥ 0 for every x ∈ R.
2. Area_{[−∞,+∞]}(f) = \int_{-\infty}^{\infty} f(x)\,dx = 1.
It is apparent that the cdf of X is given by F(y) = P(X ≤ y) = \int_{-\infty}^{y} f(x)\,dx; and f(y) = \frac{d}{dy} F(y).
Definition 2. The expected value of X is

\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx.

If the function g : R −→ R is such that Y = g(X) is a random variable, then we have

E(Y) = E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx.
The variance of X is

\sigma^2 = Var(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx.
Example 1. What is the probability that a battery-powered watch will stop with its minute hand
positioned between 10 and 20 minutes past the hour?
Answer: Let X denote the number of minutes past the hour to which the minute hand points
when the watch stops. The pdf that describes the notion of equally likely values in the interval [0, 60]
is

f(x) = \begin{cases} 0, & x \in (-\infty, 0), \\ 1/60, & x \in [0, 60), \\ 0, & x \in [60, \infty). \end{cases}

To check that f is a pdf, observe that f(x) ≥ 0 for every x ∈ R and that Area_{[0,60)}(f) = (60 − 0) \cdot \frac{1}{60} = 1. Thus, P(X ∈ (10, 20)) = Area_{(10,20)}(f) = (20 − 10) \cdot \frac{1}{60} = 1/6.
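A quick numerical check of this answer, assuming SciPy is available (the uniform(loc, scale) parameterization below is SciPy's convention, not the note's):

```python
from scipy.stats import uniform

X = uniform(loc=0, scale=60)   # X ~ Uniform[0, 60): scipy's uniform lives on [loc, loc + scale]
p = X.cdf(20) - X.cdf(10)      # P(10 < X < 20)
print(p)                       # 0.1666... = 1/6
```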
Example 2. Consider two battery-powered watches. Let X1 denote the number of minutes past the
hour at which the first watch stops and let X2 denote the number of minutes past the hour at which
the second watch stops. What is the probability that the larger of X1 and X2 will be between 30 and
50?
Answer: Here we have two independent random variables, each distributed as Uniform[0, 60),
and a third random variable Y = max(X1, X2). Let F denote the cdf of Y. We want to calculate
P(30 < Y < 50) = F(50) − F(30). By the independence of X1 and X2, we have, for y ∈ [0, 60),

F(y) = P(Y ≤ y) = P((X1 ≤ y) ∩ (X2 ≤ y)) = P(X1 ≤ y) P(X2 ≤ y) = \frac{y^2}{60^2}.

Hence, P(30 < Y < 50) = (50² − 30²)/60² = 4/9.
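A Monte Carlo sanity check of this calculation, assuming NumPy is available (the sample size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
x1 = rng.uniform(0, 60, n)           # first watch
x2 = rng.uniform(0, 60, n)           # second watch
y = np.maximum(x1, x2)               # Y = max(X1, X2)

print(np.mean((30 < y) & (y < 50)))  # Monte Carlo estimate, approx 0.444
print((50**2 - 30**2) / 60**2)       # exact value F(50) - F(30) = 4/9
```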
4 Normal Random Variable
This is the most important family of distributions in probability or statistics.
Definition 3. A continuous random variable X is normally distributed with mean µ and variance
σ² > 0, denoted X ∼ N(µ, σ²), if the pdf of X is

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right\}.
Here are some useful properties of normal random variables.
1. f (x) > 0 and X(S) = (−∞, ∞).
2. f is symmetric about µ, i.e., f (µ + x) = f (µ − x).
3. f (x) decreases as |x − µ| increases. In fact, the decrease is very rapid. We express this by
saying that f has very light tails.
4. P (µ − σ < X < µ + σ) = 0.683.
5. P (µ − 3σ < X < µ + 3σ) = 0.997.
Definition 4. The standard normal distribution is N(0, 1). Its cdf is often written as Φ(x); and
the values of Φ(x) for every x ∈ R are given in standard statistical tables.
If X ∼ N(µ, σ²), then Z = (X − µ)/σ ∼ N(0, 1).
Example 3. If X ∼ N(2, 16), then the probability that X assumes a value between 0 and 10 is

P(0 < X < 10) = P\left( \frac{0 - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{10 - \mu}{\sigma} \right) = P(−0.5 < Z < 2) = P(Z < 2) − P(Z < −0.5).
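For a numerical value, here is a short sketch assuming SciPy is available; norm.cdf plays the role of the statistical table for Φ:

```python
from scipy.stats import norm

mu, sigma = 2, 4   # X ~ N(2, 16) means the standard deviation is 4
p = norm.cdf((10 - mu) / sigma) - norm.cdf((0 - mu) / sigma)
print(p)           # P(Z < 2) - P(Z < -0.5) ~= 0.9772 - 0.3085 = 0.6687
```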
Theorem 1. If Z1, . . . , Zn are independent N(0, 1) random variables and Y = Z1² + · · · + Zn², then Y is Chi-squared distributed
with n degrees of freedom, denoted χ²(n).
Let T = Z/\sqrt{Y/n}, where Z ∼ N(0, 1) is independent of Y ∼ χ²(n); then the distribution of T is called a t distribution with n degrees of freedom.
We will denote this distribution by t(n).
Finally, let Y1 ∼ χ²(n1) and Y2 ∼ χ²(n2) be independent random variables and consider the
random variable F = (Y1/n1)/(Y2/n2). The distribution of F is called the F distribution with n1 and n2
degrees of freedom, respectively. We will denote this distribution by F(n1, n2).
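The claim in Theorem 1 can be illustrated by simulation, assuming NumPy and SciPy are available (the threshold 9.0 and the choice n = 5 are arbitrary):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
n, reps = 5, 200_000
z = rng.standard_normal((reps, n))   # rows of independent N(0, 1) draws
y = (z**2).sum(axis=1)               # Y = Z1^2 + ... + Zn^2

# Compare an empirical tail probability with the chi-squared(n) survival function.
print(np.mean(y > 9.0), chi2(df=n).sf(9.0))
```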
Theorem 2. Let Fn denote the cdf of t(n). Then

\lim_{n \to \infty} F_n(t) = \Phi(t)

for every t ∈ (−∞, +∞).
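A numerical illustration of this convergence, assuming SciPy is available (the evaluation point t = 1.5 is an arbitrary choice):

```python
from scipy.stats import norm, t

# F_n(1.5) approaches Phi(1.5) ~= 0.9332 as n grows
for n in (5, 30, 200):
    print(n, t(df=n).cdf(1.5), norm.cdf(1.5))
```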
5 Moment Generating Functions (MGF) and Their Properties
The mgf m(t) of a random variable Y is defined as m(t) = E[e^{tY}], for values of t in a neighbourhood
of zero for which this expectation exists. In view of the Taylor expansion of e^{tY},

e^{tY} = 1 + tY + \frac{(tY)^2}{2!} + \frac{(tY)^3}{3!} + \ldots,

we have

m(t) = 1 + t E(Y) + \frac{t^2}{2!} E(Y^2) + \frac{t^3}{3!} E(Y^3) + \ldots,

m'(t) = E(Y) + \frac{2t}{2!} E(Y^2) + \frac{3t^2}{3!} E(Y^3) + \ldots,

and

m''(t) = E(Y^2) + \frac{3 \cdot 2\, t}{3!} E(Y^3) + \ldots.

It is immediate to see that m'(0) = E(Y) and m''(0) = E(Y^2). Hence, if you have the mgf of a
random variable, you can find its moments by taking derivatives and evaluating them at zero.
Note that the variance of Y can then be derived using the formula Var(Y) = E(Y^2) − (E(Y))^2.
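The derivative-at-zero recipe can also be carried out symbolically. Here is a minimal sketch, assuming SymPy is available, using the standard normal mgf e^{t²/2} (derived in Example 6 below):

```python
import sympy as sp

t = sp.symbols('t')
m = sp.exp(t**2 / 2)                 # mgf of Z ~ N(0, 1) (see Example 6 below)

EZ = sp.diff(m, t, 1).subs(t, 0)     # m'(0)  = E(Z)
EZ2 = sp.diff(m, t, 2).subs(t, 0)    # m''(0) = E(Z^2)
print(EZ, EZ2, EZ2 - EZ**2)          # 0, 1, 1  =>  Var(Z) = 1
```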
If Y is a discrete random variable taking values 0, 1, 2, . . . , then

m(t) = E(e^{tY}) = \sum_{y=0}^{\infty} e^{ty}\, pdf(y) = 1 + t \sum_{y=0}^{\infty} y\, pdf(y) + \frac{t^2}{2!} \sum_{y=0}^{\infty} y^2\, pdf(y) + \ldots.
Example 4. Suppose Y is a Poisson random variable with parameter λ. Then pdf(y) = \frac{e^{-\lambda} \lambda^y}{y!} for
y = 0, 1, 2, . . . . The mgf of Y is given by

m(t) = \sum_{y=0}^{\infty} e^{ty}\, \frac{e^{-\lambda} \lambda^y}{y!} = e^{-\lambda} \sum_{y=0}^{\infty} \frac{(e^t \lambda)^y}{y!} = e^{-\lambda} e^{e^t \lambda} = e^{\lambda(e^t - 1)}.
The moments of Y can be immediately found by taking the derivatives and evaluating them at zero.
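For instance, assuming SymPy is available, the following sketch differentiates the Poisson mgf just derived and evaluates at zero:

```python
import sympy as sp

t, lam = sp.symbols('t lam', positive=True)
m = sp.exp(lam * (sp.exp(t) - 1))                # mgf of Poisson(lam) from Example 4

EY = sp.simplify(sp.diff(m, t, 1).subs(t, 0))    # E(Y)   = lam
EY2 = sp.simplify(sp.diff(m, t, 2).subs(t, 0))   # E(Y^2) = lam**2 + lam
print(EY, EY2, sp.simplify(EY2 - EY**2))         # Var(Y) = lam
```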
Example 5. Suppose that Y has a geometric distribution with success probability p. Given that P(Y = y) = p q^{y−1} for y = 1, 2, . . . , where q = 1 − p, the mgf of Y is m(t) = \frac{p e^t}{1 − q e^t}.
Example 6. Suppose that Z ∼ N(0, 1). Then m(t) = E[e^{tZ}] = e^{t²/2}.
Example 7. Suppose that Y has a binomial distribution with parameters (n, p). Then the mgf of
Y is given by m(t) = (p e^t + q)^n.
The Uniqueness Theorem that I am going to state is very important in using mgfs to find the
probability distribution of a function of random variables.
Theorem 3. Suppose that the random variables X and Y have mgfs given by mX(t) and mY(t),
respectively. If mX(t) = mY(t) for all values of t in an open interval containing zero, then X and Y have the same probability distribution.
Example 8. If Z ∼ N(0, 1) and Y = σZ + µ, show that Y ∼ N(µ, σ²).
Answer: We know that mZ(t) = e^{t²/2}. Hence, mY(t) = E[e^{t(σZ + µ)}] = e^{tµ} mZ(σt) = e^{tµ + σ²t²/2}, which is the mgf
of N(µ, σ²). This implies that Y ∼ N(µ, σ²) by the uniqueness theorem.
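The same manipulation can be reproduced symbolically; a minimal sketch assuming SymPy is available:

```python
import sympy as sp

t, mu = sp.symbols('t mu')
sigma = sp.symbols('sigma', positive=True)

mZ = sp.exp(t**2 / 2)                        # mgf of Z ~ N(0, 1)
mY = sp.exp(t * mu) * mZ.subs(t, sigma * t)  # mgf of Y = sigma*Z + mu
print(sp.simplify(mY))                       # equals exp(mu*t + sigma**2*t**2/2), the mgf of N(mu, sigma^2)
```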
Example 9. Suppose that Y1 ∼ Poisson(λ1 ) and Y2 ∼ Poisson(λ2 ) and that they are independent.
Show that W = Y1 + Y2 is Poisson(λ1 + λ2 ).
Answer: By independence, mW(t) = mY1(t) mY2(t) = e^{λ1(e^t − 1)} e^{λ2(e^t − 1)} = e^{(λ1 + λ2)(e^t − 1)}. This implies W is Poisson(λ1 + λ2) by the
uniqueness theorem.
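A simulation-based sanity check of this result, assuming NumPy is available (λ1 = 2 and λ2 = 3 are arbitrary choices; only the first two moments are compared here):

```python
import numpy as np

rng = np.random.default_rng(3)
lam1, lam2, reps = 2.0, 3.0, 500_000
w = rng.poisson(lam1, reps) + rng.poisson(lam2, reps)   # W = Y1 + Y2

direct = rng.poisson(lam1 + lam2, reps)                 # draws from Poisson(lam1 + lam2)
print(w.mean(), direct.mean())                          # both close to 5.0
print(w.var(), direct.var())                            # both close to 5.0
```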
Example 10. Let Y1, Y2, . . . , Yn denote independent identically distributed (IID) normal random
variables with mean µ and variance σ². Show that \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i is normally distributed with
mean µ and variance σ²/n.
Answer: It is immediate to verify that m_{\bar{Y}}(t) = \prod_{i=1}^{n} m_{Y_i}(t/n) = e^{\mu t + \frac{\sigma^2 t^2}{2n}}. Hence, \bar{Y} ∼ N(µ, σ²/n) by the
uniqueness theorem.
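A simulation check of this sampling distribution of the mean, assuming NumPy is available (µ = 1, σ = 2, n = 25 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, reps = 1.0, 2.0, 25, 200_000
ybar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)   # sample means of n IID N(mu, sigma^2) draws

print(ybar.mean(), mu)              # close to 1.0
print(ybar.var(), sigma**2 / n)     # close to sigma^2/n = 0.16
```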
6 Exercises
1. Suppose that Y is a normal random variable with mean µ and variance σ 2 .
(a) Show that the mgf of Y is e^{\mu t + \frac{\sigma^2 t^2}{2}}.
(b) Show that E(Y ) = µ and V ar(Y ) = σ 2 .
(c) Derive the third and fourth moments of Y , i.e., E(Y 3 ) and E(Y 4 ).
2. Find the first, second, and third moments of the random variable Y for each of the following
mgfs:
(a) m(t) = \frac{1}{1 − 2t}.
(b) m(t) = e^{2(e^t − 1)}.
3. Consider the function f : R −→ R defined by

f(x) = \begin{cases} 0, & x < 0, \\ cx, & 0 < x < 1.5, \\ c(3 − x), & 1.5 < x < 3, \\ 0, & x > 3, \end{cases}
where c is an undetermined constant.
(a) For what value of c is f a probability density function?
(b) Suppose that a continuous random variable X has probability density function f . Compute E(X).
(c) Compute P (X > 2).
(d) Suppose that Y ∼ Uniform(0, 3). Which random variable has the larger variance, X or
Y?
(e) Determine and graph the cdf of X.
4. Imagine that you are throwing darts at a circular dart board, B. Let us measure the dart board
in units for which the radius of B is 1, so that the area of B is π. Suppose that the darts
are thrown in such a way that they are certain to hit a point in B, and that each point in
B is equally likely to be hit. Thus, if A ⊂ B, then the probability of hitting a point in A is
P(A) = \frac{area(A)}{area(B)} = \frac{area(A)}{\pi}.
Define the random variable X to be the distance from the center of
B to the point that is hit.
(a) What are the possible values of X?
(b) Compute P (X ≤ 0.5).
(c) Compute P (0.5 < X ≤ 0.7).
(d) Determine and graph the cdf of X.