Continuous Distributions, Mainly the Normal Distribution
STA 281 Fall 2011
1 Continuous Random Variables
Discrete distributions place probability on specific numbers. A Bin(n,p) distribution, for example, has
possible values 0, 1, …, n, and each of these numbers has positive probability of occurring. Some random
phenomena, instead of having a discrete set of possible values, can take any value in a range. For example,
a person’s height can be anywhere between 0 and 10 feet tall. It MIGHT be exactly 6 feet, but odds are
it’s more like 6.003243556 feet. It is a mathematical fact that if you have an interval of numbers, you
can’t place a positive probability on each of them. However, we’d still like to talk about the probability a
randomly chosen person has a height between 72 and 75 inches. We need another way of expressing
probability models for random variables that can take on any value in an interval of the real line. Such
random variables are called continuous random variables.
Just like discrete random variables, continuous random variables have a set of possible values.
However, this set is an interval of numbers (xmin,xmax) instead of a set of numbers like 0, 1, …, n. It is
possible that xmin=-∞ and/or xmax=∞. To compute probabilities for continuous random variables, we have
a probability density function f(x) defined over the range of possible values. To find the probability of
any interval (a,b), integrate the density over that range. Thus
P(a ≤ X ≤ b) = ∫_a^b f(x) dx
This idea is similar to adding probabilities for discrete random variables. If X ~ Bin(n=100,p=0.3) and
you want P(23≤X≤26), you compute it by taking P(X=23)+P(X=24)+P(X=25)+P(X=26). This
amounts to
P(23 ≤ X ≤ 26) = ∑_{x=23}^{26} P(X=x)
For continuous random variables, integration replaces adding the values together.
Not just any f(x) can be a density. Since by integrating densities we are supposed to arrive at
probabilities, densities must obey the conditions
1. ∫ f(x) dx = 1, where the integral is taken over the entire range of possible values. Just like any
probabilities, the probability of the sample space must be 1.
2. f(x)≥0 for all x. If f(x) were negative anywhere, we could integrate in that region and get a
negative probability. Since probabilities cannot be negative, densities can’t either.
One important fact to keep in mind when dealing with densities is that all individual points have 0
probability. Thus, if X is a continuous random variable, P(X=37)=0. Points do not have probability, but
intervals do have probability.
Example. Let X be a random variable with density f(x)=0.5x over the range 0 ≤ x ≤ 2. This is a valid
density since it is nonnegative over its entire range and ∫_0^2 0.5x dx = 1. Suppose we wanted to find
P(0.2<X<0.6). We just integrate the density over that range:
P(0.2 < X < 0.6) = ∫_{0.2}^{0.6} 0.5x dx = 0.25x² |_{0.2}^{0.6} = 0.09 − 0.01 = 0.08
Notice that, since points have zero probability, P(X=0.2)=0 and P(X=0.6)=0 (similarly any probability
involving a single point is zero). You CANNOT just read probabilities off the density; you have to perform the
integration. Furthermore, since points don’t have any probability, whether you use less than or less than or equal
signs doesn’t make any difference. For example
P(0.2 < X < 0.6) = P(0.2 ≤ X < 0.6) = P(0.2 < X ≤ 0.6) = P(0.2 ≤ X ≤ 0.6)
Be careful to keep the range in mind when computing probabilities. Continuing with the example,
suppose you wanted P(-0.4<X<-0.1). There is no need to perform the integration in this example, since
the interval (-0.4,-0.1) is entirely outside the range of X (given to be the interval between 0 and 2). Thus,
P(-0.4<X<-0.1)=0. Also, P(-0.3<X<0.7) must be “edited” to P(0<X<0.7) before the probability is
computed. The interval (-0.3,0) contains no probability by assumption, so we don’t need to integrate on
that region. In general, look at the probability you are interested in, decide where you need to integrate
based on the range of the random variable (don’t integrate outside the range!), and only then perform
the integration to find the probabilities.
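As a quick numerical check (an aside, not part of the handout), the example probabilities can be approximated with a midpoint Riemann sum; this Python sketch defines its own helper names f and prob:

```python
def f(x):
    """Density of the running example: f(x) = 0.5x on 0 <= x <= 2, else 0."""
    return 0.5 * x if 0 <= x <= 2 else 0.0

def prob(a, b, n=100_000):
    """Approximate P(a < X < b) by a midpoint Riemann sum of the density."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

print(round(prob(0.2, 0.6), 4))    # 0.08, matching the hand integration
print(round(prob(-0.4, -0.1), 4))  # 0.0: the interval lies outside the range of X
print(round(prob(-0.3, 0.7), 4))   # 0.1225, the same as P(0 < X < 0.7)
```

Note that the density itself handles the range: f returns 0 outside (0,2), so integrating over (-0.3,0.7) automatically reduces to integrating over (0,0.7).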
2 Cumulative Distribution Functions
We are interested in interval probabilities such as P(a≤X≤b). Instead of doing the integration again and
again for every a and b, one useful function is the cumulative distribution function, or cdf.
Specifically, the cumulative distribution function F(x) is
F(x) = P(X ≤ x) = ∫_{xmin}^{x} f(t) dt   for xmin ≤ x ≤ xmax
with F(x) = 0 for x < xmin and F(x) = 1 for x > xmax.
Notice that we switched the variable of integration from x to t. The variable of integration is irrelevant;
we could use any letter we want. However, the variable of integration cannot match the letter used for
the boundary of integration. Thus, for the cdf, the boundary of integration involves x and we use t here
for the variable of integration. The cdf values outside of (xmin,xmax) are derived simply because X cannot
be outside that range. Thus, the probability of being below any value less than xmin is 0 by the definition
of xmin, and the probability of being below any value greater than xmax is 1 because the entire range of X is
below xmax.
To continue our example, reconsider the density f(x)=0.5x for x between 0 and 2. The cdf in that
range is
F(x) = ∫_0^x 0.5t dt = 0.25x²   for 0 ≤ x ≤ 2
The cdf allows us to compute many kinds of probabilities directly, without performing any integration
(the integration has been “performed”). In particular,
1. P(X≤x)=F(x) (this is by definition).
2. P(X>x)=1-P(X≤x)=1-F(x) (this is by the complement rule). Remember that for continuous
random variables P(X>x)=P(X≥x), since points have no probability.
3. P(a<X≤b)=P(X≤b)-P(X≤a)=F(b)-F(a). Again note that for continuous random variables points
have no probability, so P(a<X≤b)= P(a≤X≤b)= P(a≤X<b)= P(a<X<b).
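The three rules can be illustrated on the running example, whose cdf works out to F(x) = 0.25x² on [0,2]. A minimal Python sketch (an aside; the function name F is ours):

```python
def F(x):
    """Cdf of the running example: F(x) = 0.25 x^2 for 0 <= x <= 2."""
    if x < 0:
        return 0.0
    if x > 2:
        return 1.0
    return 0.25 * x * x

print(round(F(0.6), 4))           # rule 1: P(X <= 0.6) = 0.09
print(round(1 - F(0.7), 4))       # rule 2: P(X > 0.7) = 0.8775
print(round(F(0.6) - F(0.2), 4))  # rule 3: P(0.2 < X <= 0.6) = 0.08
```

No integration is needed at this point; the cdf has the integration built in.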
3 Expected Values and Variances
Recall that, for discrete random variables, we had the formulas
E[X] = ∑ x P(X=x),   E[g(X)] = ∑ g(x) P(X=x),   V[X] = E[(X − E[X])²] = E[X²] − (E[X])²
Similar formulas hold for continuous distributions, except that the sums are replaced with integrals,
and the P(X=x) is replaced with the density f(x). Specifically
E[X] = ∫ x f(x) dx,   E[g(X)] = ∫ g(x) f(x) dx,   V[X] = E[(X − E[X])²] = E[X²] − (E[X])²
In the example with f(x)=0.5x for x in (0,2), we find
E[X] = ∫_0^2 x(0.5x) dx = x³/6 |_0^2 = 8/6 = 4/3
E[X²] = ∫_0^2 x²(0.5x) dx = x⁴/8 |_0^2 = 16/8 = 2
V[X] = E[X²] − (E[X])² = 2 − (4/3)² = 2 − 16/9 = 2/9
As with discrete distributions, the square root of the variance is called the standard deviation.
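The expectation and variance of the f(x) = 0.5x example can be verified numerically. A Python sketch (an aside, not part of the handout; the helper integrate is ours) approximates the integrals with midpoint Riemann sums:

```python
def integrate(g, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of g over (a, b)."""
    width = (b - a) / n
    return sum(g(a + (i + 0.5) * width) for i in range(n)) * width

mean = integrate(lambda x: x * 0.5 * x, 0, 2)        # E[X]  = integral of x f(x)
second = integrate(lambda x: x * x * 0.5 * x, 0, 2)  # E[X²] = integral of x² f(x)
variance = second - mean ** 2                        # V[X]  = E[X²] − (E[X])²

print(round(mean, 4))      # 1.3333, i.e. 4/3
print(round(variance, 4))  # 0.2222, i.e. 2/9
```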
4 Normal Distributions
4.1 Introduction
A Normal distribution has two parameters, µ and σ². The parameter µ is called the mean, and
determines the central location of the distribution. The parameter σ² is called the variance (σ is the
standard deviation) and determines the spread of the distribution. If a random variable X has a normal
distribution with mean µ and variance σ², then we write X ~ N(µ,σ²). One particular normal, called the
standard normal, occurs when µ=0 and σ²=1. Usually a random variable with a standard normal
distribution is written as Z.
The density function f(x) for a normal curve is rather messy. The range of a normal distribution is
the entire real line (-∞,∞) and the density is
f(x) = (1/(σ√(2π))) exp{ −(x − µ)² / (2σ²) }
Plotting this results in a “bell curve”. The expectation and variance of the normal may be found using
the integration formulas from Section 3. These integrals result in
E[X] = ∫_{-∞}^{∞} x (1/(σ√(2π))) exp{ −(x − µ)²/(2σ²) } dx = µ
V[X] = E[(X − E[X])²] = ∫_{-∞}^{∞} (x − µ)² (1/(σ√(2π))) exp{ −(x − µ)²/(2σ²) } dx = σ²
You can find probabilities (in theory!) for the normal distribution using the density f(x) just like any
other continuous density. For any numbers a<b,
P(a < X < b) = ∫_a^b f(x) dx
The difficulty is that the function f(x) cannot be integrated analytically. This is not to say it’s just
extremely hard. It’s more than that. It just can’t be done. The function f(x) does NOT have an
antiderivative. Finding antiderivatives is not the only way of computing integrals, however. One can
also use numerical integration (such as the trapezoidal rule or Simpson’s rule from your calculus
classes). At some point, people did this for the standard normal, which has density
φ(z) = (1/√(2π)) e^{−z²/2}
The numerical integration focused on finding the cumulative distribution function of the standard
normal, which is
Φ(z) = ∫_{-∞}^{z} φ(t) dt
Recall from the previous sections that knowing the cumulative distribution function allows you to compute
all kinds of interval probabilities, such as P(Z≤z), P(Z>z), and P(a<Z≤b). The numerical results for
Φ(z) were typically reported in tables, one of which is available on the course website. Today, we more
commonly use computer programs to find answers. We will focus on both approaches in this course.
Normal tables provide values of Φ(z) for z between about -3.5 and 3.5. For values of z less than -3.5,
P(Z≤z) is essentially 0, and for values of z above 3.5, P(Z≤z) is essentially 1. If you are working
problems where z is outside the range of the table, you may substitute 0 or 1 appropriately for Φ(z). If z
is not divisible by 0.01, just use the closest z value in the table. For example, if you are trying to find
P(Z≤2.3125), just use P(Z≤2.31). Computer programs, by contrast, allow you to put in any number and
get an answer, and are thus easier to use and more accurate (surprise!). In R, the command is
pnorm(2.3125)
Of course, 2.3125 can be replaced with any number.
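The same lookup can be cross-checked outside R (an aside: R is the course tool, but Python's standard library computes the same cdf via statistics.NormalDist):

```python
# Standard normal cdf evaluated at 2.3125, the analogue of R's pnorm(2.3125)
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)
print(round(Z.cdf(2.3125), 4))  # ≈ 0.9896, agreeing with the table entry for z = 2.31
```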
The remainder of this document describes a method for computing probabilities for
arbitrary normal random variables (those with µ and σ² not necessarily 0 and 1) using the cdf of the
standard normal Φ(z).
4.2 Computing Normal Probabilities and Quantiles
4.2.1 Linear Transformations of Normal Distributions
Normal distributions are strange in that despite having an ugly looking density and being impossible to
integrate, virtually everything else you want to do with them works out nicely (albeit occasionally after
a lot of work). One quite useful fact about normal distributions concerns linear transformations.
Suppose X ~ N(µ,σ²) and Y=aX+b, so Y is a linear transformation of X.
We already know from a previous handout that E[Y]=aE[X]+b=aµ+b and V[Y]=a²V[X]=a²σ².
These facts do not require the normality of X. However, when X is normally distributed, the linear
transformation Y=aX+b also has a normal distribution, with the mean and variance given by those same
formulas. Thus µY = aµX + b and σY² = a²σX².
This result applies even more generally, since linear combinations of independent normal random
variables also have normal distributions. In other words, if X ~ N(µX, σX²) and Y ~ N(µY, σY²) with X
independent of Y, then W=aX+bY+c also has a normal distribution. Using the formulas for the mean
and variance of a linear combination, we find E[W] = aµX + bµY + c and (because X and Y are
independent) V[W] = a²σX² + b²σY². We summarize these results in the following theorem:
THEOREM (Linear Transformations and Combinations of Normal Random Variables)
a) If X ~ N(µX, σX²) and Y=aX+b, then Y ~ N(aµX+b, a²σX²).
b) If X ~ N(µX, σX²), Y ~ N(µY, σY²), and X and Y are independent, then
W=aX+bY+c ~ N(aµX+bµY+c, a²σX²+b²σY²).
c) In general, if X1,…,Xn are all independent, Xi ~ N(µi, σi²), and W=a1X1+…+anXn+b, then
W ~ N(a1µ1+…+anµn+b, a1²σ1²+…+an²σn²)
The most important use of this theorem is the Z transformation, which provides a link between standard
normals and any other normal.
COROLLARY (Z Transformations)
a) If X ~ N(µ,σ²), then Z = (X−µ)/σ ~ N(0,1).
b) If Z ~ N(0,1), then X = σZ+µ ~ N(µ,σ²).
Z transformations, together with the standard normal cdf Φ(z), allow us to compute interval
probabilities for any normal distribution.
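The corollary can be illustrated by simulation (an aside, not from the handout): if X ~ N(µ=50, σ²=100), then (X − µ)/σ should behave like N(0,1). A Python sketch using the standard library's random module:

```python
# Simulate X ~ N(50, 100), standardize each draw, and check that the
# resulting Z values have mean near 0 and variance near 1.
import random

random.seed(0)  # fixed seed so the check is reproducible
mu, sigma = 50, 10
zs = [(random.gauss(mu, sigma) - mu) / sigma for _ in range(100_000)]

mean = sum(zs) / len(zs)
var = sum(z * z for z in zs) / len(zs) - mean ** 2
print(round(mean, 2), round(var, 2))  # both should be close to 0 and 1
```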
4.2.2 Computing Normal Probabilities and Quantiles Using the Z Table
The building block for finding probabilities for any normal distribution is the standard normal cdf Φ(z).
Suppose X ~ N(µ,σ²). To find probabilities involving X, we use the Z transformation and then find
probabilities in terms of a standard normal.
1. P(X≤x). Performing some algebra, we find
P(X ≤ x) = P((X−µ)/σ ≤ (x−µ)/σ) = P(Z ≤ (x−µ)/σ) = Φ((x−µ)/σ)
Thus, to find P(X≤x), look up Φ((x-µ)/σ).
2. Recall P(X≥x)=1-P(X≤x).
3. P(a≤X≤b)=P(X≤b)-P(X≤a).
A quantile is a percentile expressed as a probability. The 80th percentile of a distribution, for
example, is the point where 80% of the population is beneath that value. This is equivalent to the 0.8
quantile. Similarly, the 30th percentile is the 0.3 quantile.
Suppose Z ~ N(0,1) and we want to find the point z such that P(Z≤z)=0.4. The margins of a Z-table
contain values of z, while the interior of the table contains probabilities. Thus, if you want to find a
value of z, you are looking for a value on the margin of the table. The given information, 0.4, is a
probability, so you have to find 0.4 in the body of the table. You will find that the closest value to 0.4 in
the table is 0.4013, corresponding to z=(-0.25).
To find the quantile of an arbitrary normal X ~ N(µ,σ²), you first have to find the Z-score for the
given quantile, and then use the formula X=σZ+µ to convert it back to the original X. For example,
suppose X ~ N(µ=50,σ²=100) and we want to find the 0.7 quantile. Finding 0.7 in the body of the table,
the closest value is 0.6985, which corresponds to a Z-score of 0.52. Converting this back to the original
scale, we find X=(10)(0.52)+50=55.2.
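This quantile can be cross-checked in Python (an aside; statistics.NormalDist's inv_cdf plays the role of the reverse table lookup):

```python
# 0.7 quantile of N(50, 100), done without the table
from statistics import NormalDist

X = NormalDist(mu=50, sigma=10)  # σ² = 100, so σ = 10
print(round(X.inv_cdf(0.7), 2))  # ≈ 55.24, close to the table answer of 55.2
```

The small discrepancy from 55.2 comes from rounding the Z-score to 0.52 when using the table.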
4.2.3 Computing Normal Probabilities and Quantiles Using R
Although the equations above involving standard normals will work, R provides easier ways of finding
normal probabilities that avoid the Z transformation. In addition to
pnorm(z)
providing P(Z≤z) for a standard normal Z, you can also just specify the mean and standard deviation. R
uses the standard deviation, not the variance, to measure spread for normal distributions. Thus, if X ~ N(5,3),
where 3 is the variance, you can find P(X≤2) using the R command
pnorm(2,5,sqrt(3))
The formulas in the previous section (P(X≥x)=1-P(X≤x) and P(a≤X≤b)=P(X≤b)-P(X≤a)) still apply, so
you can find P(X>6) using the command
1 - pnorm(6,5,sqrt(3))
and you could find P(1≤X≤7) using the command
pnorm(7,5,sqrt(3))-pnorm(1,5,sqrt(3))
Quantiles in R are also easy to compute. The qnorm command supplies quantiles. Thus, to find the
point x such that P(X≤x)=0.6, just use
qnorm(0.6,5,sqrt(3))
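For comparison (an aside: R is the course tool), the same four computations can be done with Python's standard library, where statistics.NormalDist mirrors pnorm/qnorm and likewise takes the standard deviation:

```python
# N(5, 3) probabilities and quantiles, the Python analogue of the R commands
from math import sqrt
from statistics import NormalDist

X = NormalDist(mu=5, sigma=sqrt(3))  # N(5, 3), where 3 is the variance

print(X.cdf(2))             # P(X <= 2), like pnorm(2,5,sqrt(3))
print(1 - X.cdf(6))         # P(X > 6), like 1 - pnorm(6,5,sqrt(3))
print(X.cdf(7) - X.cdf(1))  # P(1 <= X <= 7)
print(X.inv_cdf(0.6))       # the 0.6 quantile, like qnorm(0.6,5,sqrt(3))
```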
Obviously, using the computer is easier and more accurate than using the table. Thus, it is the
recommended method. Unfortunately, since computers are not standard equipment for exams, you
still need to learn the Z-table method. In addition, should you decide to take more statistics,
understanding the Z transformation is the foundation of some more advanced theorems.
4.3 Examples of Finding Probabilities with Normal Tables
Let X ~ N(15,3²)
a) Find P(X<12.5), P(X>14.2), and P(X=15).
b) Find P(3.5<X<16.4) and P(10.4<X<13.6).
c) Find c such that P(X<c)=0.05 and d such that P(15-d<X<15+d)=0.80.
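A quick numeric sketch of these answers in Python (an aside; this takes the distribution to be N(15, 3²), i.e. σ = 3, whereas in class these would be worked with the Z table):

```python
# Numeric answers to parts (a)-(c) for X ~ N(15, 9)
from statistics import NormalDist

X = NormalDist(mu=15, sigma=3)

# a) P(X = 15) is exactly 0 for a continuous random variable
print(round(X.cdf(12.5), 4), round(1 - X.cdf(14.2), 4), 0)
# b) interval probabilities via differences of the cdf
print(round(X.cdf(16.4) - X.cdf(3.5), 4), round(X.cdf(13.6) - X.cdf(10.4), 4))
# c) c is the 0.05 quantile; by symmetry, d satisfies P(X <= 15 + d) = 0.90
print(round(X.inv_cdf(0.05), 2), round(X.inv_cdf(0.90) - 15, 2))
```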
5 Problems
1) Let X be a continuous random variable with density f(x)=1-(x/2) over the range 0 to 2.
a) Find the cdf of X.
b) What are P(X=0.5), P(X<-1.5), P(X<0.7), P(X>0.2), and P(0.1<X<1.2)?
c) What are E[X] and V[X]?
2) Let X ~ N(µ=5,σ²=9).
a) What are P(X=5), P(X<7), P(X>8), and P(5.2<X≤8.6)?
b) What is the 0.4 quantile of X?
c) Find a central region containing 90% of the probability.
3) Let X be a continuous random variable with density f(x)=x-0.5 for 1<x<2.
a) Find P(0<X<1.5).
b) Find P(X<4).
c) Find the cdf F(x).
4) Let X ~ N(5,4²). Find
a) P(X≥18).
b) P(X<-0.5).
c) P(6<X<11).
5) Let X be a continuous random variable with density 2(1-x) over the range (0,1). Find P(X>0.2),
E[X], and V[X].
6) Let X and Y be independent random variables with X ~ N(3,6²) and Y ~ N(-2,3²). Define
W = 3X − 2Y − 1. Find
b) P(W=3).
c) P(W>40).
7) A student turns in 10 homework assignments over the course of a semester. Suppose scores on each
homework are normally distributed with mean 70 and variance 4², and assume all homeworks are
independent.
a) Let Y be the TOTAL number of points on the 10 homework assignments. Find the distribution
of Y.
b) Let X̄ be the AVERAGE number of points on the 10 homework assignments. Find the
distribution of X̄.
c) Find P(X̄ > 72).
8) A potential buyer is evaluating two different types of servers. Server A’s response times (in ms) are
distributed N(24,3²) while server B’s response times are distributed N(22,4²). To test the servers,
the buyer makes 5 requests to server A and 5 requests to server B. Assume the requests are
independent. Let X̄A be the mean response time for server A and let X̄B be the mean response time for
server B.
a) Find the distribution of X̄A − X̄B.
b) Find P(X̄A > X̄B).