Continuous Variables and Their Probability Distributions
Dr. Tai-kuang Ho, National Tsing Hua University
The slides draw from the textbooks of Wackerly, Mendenhall, and Scheaffer (2008) & Devore and Berk (2012)
4.1 Introduction
Examples of continuous random variables: the amount of rainfall; the length of
life.
A random variable that can take on any value in an interval is called continuous.
This does not mean that if we have enough observations, we would eventually
observe an outcome corresponding to every value in the interval. Rather it means
that no value within the interval can be ruled out as a possible value.
This chapter introduces the probability distributions of continuous random variables.
Probability mass function (pmf): discrete random variables
Probability density function (pdf): continuous random variables
The probability distribution for a discrete random variable can always be given
by assigning a nonnegative probability to each of the possible values the variable
may assume. In every case, of course, the sum of all the probabilities that we
assign must be equal to 1.
Unfortunately, the probability distribution for a continuous random variable cannot be specified in the same way. It is mathematically impossible to assign nonzero probabilities to all the points on a line interval while satisfying the requirement that the probabilities of the distinct possible values sum to 1.
4.2 The Probability Distribution for a Continuous Random Variable
DEFINITION: cumulative distribution function of a random variable
Let $Y$ denote any random variable. The distribution function of $Y$, denoted by $F(y)$, is such that $F(y) = P(Y \le y)$ for $-\infty < y < \infty$.
The nature of the distribution function associated with a random variable determines whether the variable is continuous or discrete.
Figure 4.1
$$F(y) = P(Y \le y) = \begin{cases} 0, & \text{for } y < 0 \\ \tfrac{1}{4}, & \text{for } 0 \le y < 1 \\ \tfrac{3}{4}, & \text{for } 1 \le y < 2 \\ 1, & \text{for } 2 \le y \end{cases}$$
Step functions: Distribution functions for discrete random variables are always step functions, because the cumulative distribution function increases only at the finite or countable number of points with positive probabilities.
Theorem: Properties of a Distribution Function
If $F(y)$ is a distribution function, then
1. $F(-\infty) \equiv \lim_{y \to -\infty} F(y) = 0$
2. $F(\infty) \equiv \lim_{y \to \infty} F(y) = 1$
3. $F(y)$ is a non-decreasing function of $y$: if $y_1$ and $y_2$ are any values such that $y_1 < y_2$, then $F(y_1) \le F(y_2)$.
How about the CDF (cumulative distribution function) for a continuous random
variable?
Figure 4.2
DEFINITION: continuous random variables
A random variable $Y$ with distribution function $F(y)$ is said to be continuous if $F(y)$ is continuous for $-\infty < y < \infty$.
If $Y$ is a continuous random variable, then for any real number $y$,
$$P(Y = y) = 0.$$
If $P(Y = y_0) = p_0 > 0$, then $F(y)$ would have a discontinuity (jump) of size $p_0$ at the point $y_0$, violating the assumption that $Y$ is continuous.
Practically speaking, the fact that continuous random variables have zero probability at discrete points should not bother us. Imagine the following two questions, one silly and the other good:
1. What is the probability of a rainfall of exactly 2.13846 inches?
2. What is the probability of a rainfall between 2 and 3 inches?
The probability density function (PDF) is derived from the CDF.
DEFINITION: probability density function
Let $F(y)$ be the distribution function for a continuous random variable $Y$. Then $f(y)$, given by
$$f(y) = \frac{dF(y)}{dy} = F'(y)$$
wherever the derivative exists, is called the probability density function for the random variable $Y$.
Relationship between CDF and PDF:
$$F(y) = \int_{-\infty}^{y} f(t)\, dt$$
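A quick numerical check of this relationship (an illustrative sketch, not from the original slides), using base R's exponential density dexp and its CDF pexp:
# F(y0) recovered by integrating the density from 0 (the density is zero below 0) to y0 = 2
y0 <- 2
integrate(dexp, lower = 0, upper = y0)$value  # numerical integral of the density
pexp(y0)                                      # closed-form CDF; the two values agree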
Figure 4.3
THEOREM: Properties of a Density Function
If $f(y)$ is a density function for a continuous random variable, then
1. $f(y) \ge 0$ for all $y$, $-\infty < y < \infty$ (because $F(y)$ is a non-decreasing function);
2. $\int_{-\infty}^{\infty} f(y)\, dy = 1$ (because $F(\infty) = 1$).
The CDF for a continuous random variable must be continuous, but the PDF need not be everywhere continuous.
It is often of interest to determine the value $y$ of a random variable $Y$ such that $P(Y \le y)$ equals or exceeds some specified value.
VaR: value at risk
http://en.wikipedia.org/wiki/Value_at_risk
In financial mathematics and financial risk management, Value at Risk (VaR) is a widely used measure of the risk of loss on a specific portfolio of financial assets. For a given portfolio, probability, and time horizon, VaR is defined as a threshold value such that the probability that the mark-to-market loss on the portfolio over the given time horizon exceeds this value (assuming normal markets and no trading in the portfolio) is the given probability level.
For example, if a portfolio of stocks has a one-day 5% VaR of $1 million, there is a 0.05 probability that the portfolio will fall in value by more than $1 million over a one-day period if there is no trading.
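An illustrative R sketch of this example (the normal P&L assumption and the standard deviation of 0.608 are hypothetical, chosen only so the numbers roughly match the $1 million figure):
# Assume one-day P&L (in $ millions) is Normal(0, sd); sd = 0.608 is an arbitrary choice
sd_pl <- 0.608
-qnorm(0.05, mean = 0, sd = sd_pl)  # the 5% quantile of the loss: about $1 million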
DEFINITION: quantiles
Let $Y$ denote any random variable. If $0 < p < 1$, the $p$-th quantile of $Y$, denoted by $\phi_p$, is the smallest value such that $F(\phi_p) = P(Y \le \phi_p) \ge p$. If $Y$ is continuous, $\phi_p$ is the smallest value such that $F(\phi_p) = P(Y \le \phi_p) = p$.
Some prefer to call $\phi_p$ the $100p$-th percentile of $Y$.
$\phi_{0.5}$ is called the median of the random variable.
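In R, quantiles of the standard distributions are available through the q-functions, which invert the CDF (a small illustration, assuming base R):
qnorm(0.5)                      # median of the standard normal: 0
qunif(0.25, min = 0, max = 10)  # 25th percentile of Uniform(0, 10): 2.5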
THEOREM: probability in a specific interval
If the random variable $Y$ has density function $f(y)$ and $a < b$, then the probability that $Y$ falls in the interval $[a, b]$ is
$$P(a \le Y \le b) = \int_a^b f(y)\, dy.$$
For a continuous variable $Y$ with constants $a$ and $b$, $a < b$,
$$P(a < Y < b) = P(a \le Y < b) = P(a < Y \le b) = P(a \le Y \le b) = \int_a^b f(y)\, dy.$$
Figure 4.8
4.3 Expected Values for Continuous Random Variables
To study continuous random variables we need numerical descriptive measures
such as mean and variance.
The definitions are for continuous variables, but they are consistent with those for the discrete variables mentioned in Chapter 3.
DEFINITION: expected value of a continuous random variable
The expected value of a continuous random variable $Y$ is
$$E(Y) = \int_{-\infty}^{\infty} y\, f(y)\, dy,$$
provided that the integral exists.
For a discrete random variable, by comparison, $E(Y) = \sum_y y\, f(y)$.
THEOREM: expected value of a function of a random variable
Let $g(Y)$ be a function of $Y$; then the expected value of $g(Y)$ is given by
$$E[g(Y)] = \int_{-\infty}^{\infty} g(y)\, f(y)\, dy,$$
provided that the integral exists.
Taking $g(Y) = (Y - \mu)^2$ gives the variance:
$$V(Y) = E\left[(Y - \mu)^2\right] = E\left(Y^2\right) - \mu^2.$$
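A sketch of the integral definition of $E[g(Y)]$ (illustrative, not from the slides): with $g(y) = y^2$ and the standard normal density, the integral should return $E(Y^2) = 1$.
# E(Y^2) = integral of y^2 * f(y) dy, evaluated numerically for the standard normal
integrate(function(y) y^2 * dnorm(y), lower = -Inf, upper = Inf)$value  # approximately 1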
THEOREM
Let $c$ be a constant and let $g(Y)$, $g_1(Y)$, $g_2(Y)$, $\ldots$, $g_k(Y)$ be functions of a continuous random variable $Y$. Then the following results hold:
1. $E(c) = c$.
2. $E[c\, g(Y)] = c\, E[g(Y)]$.
3. $E[g_1(Y) + g_2(Y) + \cdots + g_k(Y)] = E[g_1(Y)] + E[g_2(Y)] + \cdots + E[g_k(Y)]$.
4.4 The Uniform Probability Distribution
Suppose that a bus always arrives at a particular stop between 8:00 and 8:10 A.M. and that the probability that the bus will arrive in any given subinterval of time is proportional only to the length of the subinterval.
A bus is as likely to arrive between 8:00 and 8:02 as it is to arrive between 8:06 and 8:08.
Figure 4.9
DEFINITION: uniform distribution
If $\theta_1 < \theta_2$, a random variable $Y$ is said to have a continuous uniform probability distribution on the interval $(\theta_1, \theta_2)$ if and only if the density function of $Y$ is
$$f(y) = \begin{cases} \dfrac{1}{\theta_2 - \theta_1}, & \theta_1 \le y \le \theta_2 \\ 0, & \text{elsewhere.} \end{cases}$$
The parameters of the density function are $\theta_1$ and $\theta_2$.
The uniform distribution is very important for theoretical reasons. Simulation
studies are valuable techniques for validating models in statistics. If we desire a
set of observations on a random variable Y with distribution function F (y ), we
often can obtain the desired results by transforming a set of observations on a
uniform random variable.
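A minimal inverse-transform sketch (an illustration, not from the slides; the exponential CDF and the choice $\beta = 2$ are arbitrary): if $U$ is Uniform(0, 1) and $F$ is a continuous CDF, then $F^{-1}(U)$ has CDF $F$.
set.seed(1)
u <- runif(10000)     # observations on a Uniform(0, 1) random variable
y <- -2 * log(1 - u)  # invert F(y) = 1 - exp(-y/2): exponential with beta = 2
mean(y)               # close to beta = 2, as expected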
THEOREM: mean and variance of a uniform distribution
If $\theta_1 < \theta_2$ and $Y$ is a random variable uniformly distributed on the interval $(\theta_1, \theta_2)$, then
$$\mu = E(Y) = \frac{\theta_1 + \theta_2}{2}, \qquad \sigma^2 = V(Y) = \frac{(\theta_2 - \theta_1)^2}{12}.$$
Proof
$$E(Y) = \int_{-\infty}^{\infty} y\, f(y)\, dy = \int_{\theta_1}^{\theta_2} \frac{y}{\theta_2 - \theta_1}\, dy = \left[ \frac{y^2}{2(\theta_2 - \theta_1)} \right]_{\theta_1}^{\theta_2} = \frac{\theta_2^2 - \theta_1^2}{2(\theta_2 - \theta_1)} = \frac{\theta_1 + \theta_2}{2}.$$
R code to plot a uniform distribution
x=seq(-4,4,length=100)
y=dunif(x,min=-4,max=4)
plot(x,y,xlab="x value",ylab="Density",main="Uniform Distributions")
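A simulation check of the theorem (illustrative; theta1 = -4 and theta2 = 4 match the plot above):
set.seed(1)
y <- runif(100000, min = -4, max = 4)
c(mean(y), (-4 + 4) / 2)      # sample mean vs (theta1 + theta2)/2 = 0
c(var(y), (4 - (-4))^2 / 12)  # sample variance vs (theta2 - theta1)^2/12 = 5.33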
4.5 The Normal Probability Distribution
The most commonly used continuous probability distribution is the normal distribution.
DEFINITION: normal probability distribution
A random variable $Y$ is said to have a normal probability distribution if and only if, for $\sigma > 0$ and $-\infty < \mu < \infty$, the density function of $Y$ is
$$f(y) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{(y - \mu)^2}{2\sigma^2} \right], \qquad -\infty < y < \infty.$$
Parameters of a normal distribution: $\mu$ and $\sigma$.
Moment-generating function?
$$M(t) = \exp\left( \mu t + \frac{\sigma^2 t^2}{2} \right)$$
THEOREM: mean and variance of a normal distribution
If $Y$ is a normally distributed random variable with parameters $\mu$ and $\sigma$, then
$$E(Y) = \mu \qquad \text{and} \qquad V(Y) = \sigma^2.$$
Figure 4.10
Areas under the normal density function usually do not have a closed-form expression, and numerical integration techniques are needed.
$$P(a \le Y \le b) = \int_a^b \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{(y - \mu)^2}{2\sigma^2} \right] dy$$
We can always transform a normal random variable $Y$ to a standard normal random variable $Z$ by using the relationship
$$Z = \frac{Y - \mu}{\sigma}.$$
R code to plot a normal distribution
x=seq(-6,6,length=100)
y=dnorm(x,mean=0,sd=1)
plot(x,y,xlab="x value",ylab="Density",main="Standard Normal Distribution")
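The standardization can be checked numerically with pnorm (a sketch; mu = 1, sigma = 2, and the interval are arbitrary choices):
mu <- 1; sigma <- 2; a <- 0; b <- 3
pnorm((b - mu) / sigma) - pnorm((a - mu) / sigma)                  # via the Z transform
pnorm(b, mean = mu, sd = sigma) - pnorm(a, mean = mu, sd = sigma)  # same value directly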
4.6 The Gamma Probability Distribution
Some random variables are always non-negative and for various reasons yield
distributions of data that are skewed (nonsymmetric) to the right.
That is, most of the area under the density function is located near the origin,
and the density function drops gradually as y increases.
Figure 4.15
DEFINITION: Gamma density function
A random variable $Y$ is said to have a gamma distribution with parameters $\alpha > 0$ and $\beta > 0$ if and only if the density function of $Y$ is
$$f(y) = \begin{cases} \dfrac{y^{\alpha - 1} e^{-y/\beta}}{\beta^{\alpha}\, \Gamma(\alpha)}, & 0 \le y < \infty \\ 0, & \text{elsewhere,} \end{cases}$$
where
$$\Gamma(\alpha) = \int_0^{\infty} y^{\alpha - 1} e^{-y}\, dy.$$
$\Gamma(\alpha)$: the gamma function
$$\Gamma(1) = 1, \qquad \Gamma(\alpha) = (\alpha - 1)\,\Gamma(\alpha - 1) \text{ for } \alpha > 1, \qquad \Gamma(n) = (n - 1)! \text{ for integer } n.$$
$\alpha$: shape parameter
The shape of the gamma density differs for different values of $\alpha$. When $\alpha = 1$, it becomes an exponential distribution.
Figure 4.16
$\beta$: scale parameter
Multiplying a gamma-distributed random variable by a positive constant produces a random variable that also has a gamma distribution, with the same value of $\alpha$ but with an altered value of $\beta$.
THEOREM: mean and variance of a gamma distribution
If $Y$ has a gamma distribution with parameters $\alpha$ and $\beta$, then
$$\mu = E(Y) = \alpha\beta \qquad \text{and} \qquad \sigma^2 = V(Y) = \alpha\beta^2.$$
$(\alpha, \beta)$ are the parameters of a gamma distribution. Don't confuse the gamma function $\Gamma(\alpha)$ with the gamma distribution $\Gamma(\alpha, \beta)$.
Why the gamma distribution?
The gamma distribution is often a probability model for waiting time: for instance, waiting time until death.
In fact, the gamma distribution is a good model for non-negative random variables of continuous type: for instance, the distribution of income.
A gamma distribution is related to a Poisson distribution.
Start from the assumption of a Poisson distribution with rate $\lambda$.
$W$: the time that is needed to obtain exactly $k$ changes.
$W$ has a gamma distribution with $\alpha = k$ and $\beta = 1/\lambda$.
Interpretation?
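A simulation sketch of this connection (an illustration, not from the slides; k = 3 and lambda = 2 are arbitrary): the waiting time to the k-th event of a rate-lambda Poisson process is a sum of k independent exponential inter-arrival times.
set.seed(1)
k <- 3; lambda <- 2
w <- replicate(10000, sum(rexp(k, rate = lambda)))  # waiting time to the k-th event
c(mean(w), k / lambda)                              # E(W) = alpha * beta = k * (1/lambda)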
Moment-generating function (mgf) of a gamma distribution:
$$M(t) = E\left(e^{tY}\right) = \int_0^{\infty} e^{ty}\, \frac{y^{\alpha - 1} e^{-y/\beta}}{\beta^{\alpha}\, \Gamma(\alpha)}\, dy = \int_0^{\infty} \frac{y^{\alpha - 1}\, e^{-y(1 - \beta t)/\beta}}{\beta^{\alpha}\, \Gamma(\alpha)}\, dy, \qquad t < \frac{1}{\beta}.$$
Substitute $x = \dfrac{y(1 - \beta t)}{\beta}$, so that $y = \dfrac{\beta x}{1 - \beta t}$ and $dx = \dfrac{1 - \beta t}{\beta}\, dy$:
$$M(t) = \int_0^{\infty} \frac{1}{\beta^{\alpha}\, \Gamma(\alpha)} \left( \frac{\beta x}{1 - \beta t} \right)^{\alpha - 1} e^{-x}\, \frac{\beta}{1 - \beta t}\, dx = \left( \frac{1}{1 - \beta t} \right)^{\alpha} \underbrace{\int_0^{\infty} \frac{1}{\Gamma(\alpha)}\, x^{\alpha - 1}\, e^{-x}\, dx}_{=1}$$
$$M(t) = (1 - \beta t)^{-\alpha}, \qquad t < \frac{1}{\beta}.$$
Derive mean and variance of a gamma distribution:
$$M'(t) = \alpha\beta\, (1 - \beta t)^{-\alpha - 1}$$
$$M''(t) = \alpha(\alpha + 1)\beta^2\, (1 - \beta t)^{-\alpha - 2}$$
$$\mu = M'(0) = \alpha\beta$$
$$\sigma^2 = M''(0) - \mu^2 = \alpha(\alpha + 1)\beta^2 - \alpha^2\beta^2 = \alpha\beta^2$$
DEFINITION: $\chi^2$ distribution
Let $v$ be a positive integer. A random variable $Y$ is said to have a chi-square distribution with $v$ degrees of freedom if and only if $Y$ is a gamma-distributed random variable with parameters $\alpha = v/2$ and $\beta = 2$.
A chi-square random variable has only one parameter, $v$.
mgf of the $\chi^2$ distribution:
$$M(t) = (1 - 2t)^{-v/2}, \qquad t < \frac{1}{2}.$$
THEOREM: mean and variance of the $\chi^2$ distribution
If $Y$ is a chi-square random variable with $v$ degrees of freedom, then
$$\mu = E(Y) = v \qquad \text{and} \qquad \sigma^2 = V(Y) = 2v.$$
If $Y$ has a gamma distribution with $\alpha = n/2$ for some integer $n$, then $2Y/\beta$ has a $\chi^2$ distribution with $n$ degrees of freedom.
The gamma density function in which $\alpha = 1$ is called the exponential density function.
DEFINITION: exponential distribution
A random variable $Y$ is said to have an exponential distribution with parameter $\beta > 0$ if and only if the density function of $Y$ is
$$f(y) = \begin{cases} \dfrac{1}{\beta}\, e^{-y/\beta}, & 0 \le y < \infty \\ 0, & \text{elsewhere.} \end{cases}$$
THEOREM: mean and variance of the exponential distribution
If $Y$ is an exponential random variable with parameter $\beta$, then
$$\mu = E(Y) = \beta \qquad \text{and} \qquad \sigma^2 = V(Y) = \beta^2.$$
$$P(Y > a + b \mid Y > a) = P(Y > b)$$
This property of the exponential distribution is called the memoryless property of the distribution.
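A simulation sketch of the memoryless property (illustrative; beta = 1 and a = b = 1 are arbitrary choices):
set.seed(1)
y <- rexp(100000, rate = 1)
mean(y[y > 1] > 2)  # estimate of P(Y > 2 | Y > 1)
mean(y > 1)         # estimate of P(Y > 1); the two estimates are close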
Let $X_1, X_2, \ldots, X_n$ be independent random variables. Suppose, for $i = 1, \ldots, n$, that $X_i$ has a $\Gamma(\alpha_i, \beta)$ distribution. Let $Y = \sum_{i=1}^n X_i$. Then $Y$ has a $\Gamma\!\left(\sum_{i=1}^n \alpha_i, \beta\right)$ distribution.
Proof
$$M_Y(t) = E\left(e^{tY}\right) = E\left[ \exp\left( t \sum_{i=1}^n X_i \right) \right] = E\left[ \exp\{tX_1 + tX_2 + \cdots + tX_n\} \right] = \prod_{i=1}^n E\left[ \exp\{tX_i\} \right] = \prod_{i=1}^n (1 - \beta t)^{-\alpha_i} = (1 - \beta t)^{-\sum_{i=1}^n \alpha_i}$$
Let $X_1, X_2, \ldots, X_n$ be independent random variables. Suppose, for $i = 1, \ldots, n$, that $X_i$ has a $\chi^2(r_i)$ distribution. Let $Y = \sum_{i=1}^n X_i$. Then $Y$ has a $\chi^2\!\left(\sum_{i=1}^n r_i\right)$ distribution.
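A simulation sketch of the additivity result (illustrative; alpha1 = 1.5, alpha2 = 2.5, and beta = 2 are arbitrary choices):
set.seed(1)
y <- rgamma(100000, shape = 1.5, scale = 2) + rgamma(100000, shape = 2.5, scale = 2)
c(mean(y), (1.5 + 2.5) * 2)    # E(Y) = (alpha1 + alpha2) * beta
c(var(y), (1.5 + 2.5) * 2^2)   # V(Y) = (alpha1 + alpha2) * beta^2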
R code to plot a gamma distribution
x=seq(0,10,length=1000)
y=dgamma(x,shape=2,scale=1)
plot(x,y,xlab="x value",ylab="Density",main="Gamma Distributions")