CIVL 3103 – Approximation and Uncertainty
J.W. Hurley, R.W. Meier
4. CONTINUOUS DISTRIBUTIONS
The beautiful thing about learning is nobody can
take it away from you.
– B. B. King
In the last chapter, we said that continuous random variables can take on any value within a continuous
range. Examples of continuous random variables could be time, distance, mass, force, temperature, or any
combination of these (such as stress or velocity). This makes them fundamentally different from discrete
random variables, which can only take on certain specific values (heads/tails,
hearts/spades/clubs/diamonds, etc.)
PROPERTIES OF CONTINUOUS RANDOM VARIABLES
With a continuous random variable, P(X = x) is zero, so you have to address probability in a different
way. You have to talk, instead, about the probability that the random variable X will fall within some
range of values a ≤ X ≤ b. Those probabilities are determined by finding the area under a curve called the
probability density function:
P(a ≤ X ≤ b) = ∫_a^b f(x) dx

[Figure: density curve f(x) with the area under the curve between x = a and x = b shaded]
The probability density function has to satisfy two conditions:
f(x) ≥ 0   and   ∫_{−∞}^{∞} f(x) dx = 1
The former ensures that the area under the curve (probability) is always non-negative and the latter says
that it’s 100% certain that the experiment will have some outcome! These are, of course, the same
conditions that had to be satisfied by the probability mass function for discrete random variables.
Now, no one likes to integrate functions (well, mathematicians might, but engineers sure don't!) and that's
all you can do with the probability density function. You do have a friend, though, and it's called the
cumulative distribution function (or simply the distribution function). Just as for discrete random
variables, the cumulative distribution function gives the probability that the random variable X will take
on a value less than or equal to x when the experiment is performed:
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(u) du
If you have the distribution function, you don't have to integrate anything! To determine the probability
that X will be less than or equal to some value a, simply pull the values off the curve:

P(X ≤ a) = F(a)
[Figure: cumulative distribution function F(x) rising from 0 to 1, with F(a) and F(b) marked at x = a and x = b]
If you want to know the probability that the random variable X will take on a range of values:
P(a ≤ X ≤ b) = F(b) − F(a)
Unfortunately, we don't have distribution functions for all of the various probability density functions
because some of them can’t be integrated in closed form. Instead, they must be integrated numerically
(more about that at the end of the course when we talk about approximation).
Fortunately, someone has already done the numerical integration for you and summarized the results in
tabular form. There is an appendix full of tables in the back of your textbook covering the most
commonly used continuous distribution functions (all of which will be covered later in this chapter and at
various points throughout the rest of the course).
The expected value of a continuous random variable is the probability-weighted average of the possible
outcomes (as it was for discrete random variables). We said earlier that the expected value (also called the
mean) is just the point about which the probability distribution is centered. In engineering terms, it’s the
centroid of the probability distribution. As you no doubt remember from Statics, the centroid of an area
can be calculated as
E(X) = µₓ = ∫_{−∞}^{∞} x f(x) dx
The variance of a continuous random variable is just the probability-weighted average of the squared
deviations from the expected value (as it was for discrete random variables). It can be calculated in an
analogous manner:
V(X) = σₓ² = ∫_{−∞}^{∞} (x − µₓ)² f(x) dx
As before, we can write the variance as the expected value of the squared deviations:
V(X) = E[(X − µₓ)²]
and we can also calculate the variance as
V(X) = E(X²) − [E(X)]²
which is easier to compute than squaring a bunch of deviations.
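Just to make these formulas concrete, here is a minimal Python sketch (added for illustration; it is not part of the original notes) that approximates E(X) and V(X) by numerical integration for a hypothetical density f(x) = 3x² on [0, 1]:

N = 100_000
dx = 1.0 / N
xs = [(i + 0.5) * dx for i in range(N)]   # midpoint-rule grid on [0, 1]

def f(x):
    # Hypothetical example density: f(x) = 3x^2 on [0, 1], zero elsewhere
    return 3.0 * x * x

mean = sum(x * f(x) for x in xs) * dx     # E(X); the exact value is 3/4
ex2 = sum(x * x * f(x) for x in xs) * dx  # E(X^2); the exact value is 3/5
var = ex2 - mean ** 2                     # shortcut formula E(X^2) - [E(X)]^2
print(mean, var)                          # ~0.75, ~0.0375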
SOME IMPORTANT CONTINUOUS DISTRIBUTIONS
In this chapter, we’ll look at three of the most commonly used continuous distributions:
o The Uniform Distribution
o The Exponential Distribution
o The Normal Distribution
In subsequent chapters, we’ll add several more to this list.
THE UNIFORM DISTRIBUTION
The uniform distribution is one in which a value is equally likely to occur anywhere in the range between
the smallest possible value a and the largest possible value b:
[Figure: uniform density f(x), a horizontal line of height c between x = a and x = b]
By definition, the total area under the probability density function must be one, so:
c(b − a) = 1   ⇒   c = 1/(b − a)
From this, we conclude that the probability density function for a uniformly distributed random variable
is simply
f(x) = 1/(b − a)   for a ≤ x ≤ b   (0 elsewhere)
The cumulative distribution function, which gives P(X ≤ x), can be calculated by integrating the probability
density function between −∞ and x. This task is made simpler by the fact that f(x) is zero for any x < a:
F(x) = ∫_{−∞}^{x} f(u) du = ∫_{−∞}^{a} 0 du + ∫_{a}^{x} 1/(b − a) du = (1/(b − a)) ∫_{a}^{x} du
Completing the integration, we get
F(x) = (x − a)/(b − a)   for a ≤ x ≤ b
which is just the equation of a straight line extending from 0 at X = a (there's no chance of X taking on a
value less than a) to 1 at X = b (it's a certainty that X will take on a value less than or equal to b).
The expected value of a uniformly distributed random variable can be found from our earlier equation as:
E(X) = ∫_{−∞}^{∞} x f(x) dx = (1/(b − a)) ∫_{a}^{b} x dx = (1/(b − a)) · (b² − a²)/2
This simplifies to
E(X) = µ = (a + b)/2
which is just the midpoint of the range. Of course you probably didn’t need to integrate to figure this out. If
the random variable X has to be somewhere between a and b then your best guess for the value of that
variable would probably be the midpoint. The phrase “best guess” is synonymous with “expected value.”
The variance of a uniformly distributed random variable can be found using the simplified formula
V(X) = E(X²) − [E(X)]²
The expected value of X² can be calculated as

E(X²) = ∫_{−∞}^{∞} x² f(x) dx = (1/(b − a)) ∫_{a}^{b} x² dx = (1/(b − a)) · (b³ − a³)/3 = (b² + ab + a²)/3
Substituting for E(X) from the equation above,
V(X) = (b² + ab + a²)/3 − [(a + b)/2]²
which can be simplified to
V(X) = σ² = (b − a)²/12
Recalling that the standard deviation is just the square root of the variance, the standard deviation of a
uniformly distributed random variable is
σ = (b − a)/√12
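These three results are easy to sanity-check by simulation. The following short Python sketch (added for illustration, with arbitrarily chosen endpoints a = 2 and b = 10) compares sample statistics against the formulas:

import random
import statistics

a, b = 2.0, 10.0                                      # arbitrary endpoints
sample = [random.uniform(a, b) for _ in range(100_000)]

print(statistics.mean(sample), (a + b) / 2)           # both ~6.0
print(statistics.pvariance(sample), (b - a)**2 / 12)  # both ~5.33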
EXAMPLE: The Uniform Distribution
Most calculators can generate “random numbers” as can most spreadsheets (e.g., Excel uses the function
RAND to put a random number in a cell). These electronically generated “random numbers” appear,
statistically, to be random. By this we mean that we cannot conclude, using various statistical tests, that
these numbers do not follow a uniform distribution. The numbers are not really random, though, because
they are generated by a computer algorithm that uniquely determines the next number to be generated
from the last number generated or from some “seed value” entered by the user or stored in memory. For
example, a very simple random number generator is:
xᵢ = (xᵢ₋₁ + π)⁵ − int[(xᵢ₋₁ + π)⁵]
This generates pseudo-random numbers between 0 and 1, which we abbreviate as X ∼ U[0, 1], meaning
“the random variable X is uniformly distributed over the range [0,1].”
What is the probability that the random number generator (RNG) above will generate a value between 0.2
and 0.3 the next time it is invoked?
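For a U[0,1] random variable, the answer comes straight from the uniform cumulative distribution function: P(0.2 ≤ X ≤ 0.3) = F(0.3) − F(0.2) = 0.3 − 0.2 = 0.1. As a sanity check, here is a small Python sketch (added for illustration; the seed value is arbitrary) that implements the generator above and estimates the probability empirically:

import math

def next_value(x):
    # One step of the (x + pi)^5 fractional-part generator shown above
    y = (x + math.pi) ** 5
    return y - int(y)

x = 0.5                    # arbitrary seed
hits, n = 0, 100_000
for _ in range(n):
    x = next_value(x)
    hits += 0.2 <= x <= 0.3
print(hits / n)            # should be close to the exact answer, 0.1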
THE EXPONENTIAL DISTRIBUTION
The exponential distribution is often used to model the times between successive events, such as
customers arriving at a drive-thru window. If the arrivals are at random (i.e., completely independent of
one another), then the number of arrivals per unit time is Poisson distributed (as we learned in the last
chapter) and the time intervals between arrivals will be exponentially distributed.
The exponential distribution is defined by a single parameter, λ, which is the average number of arrivals
per unit of time. (Yes, that’s the same λ that defines the Poisson distribution):
[Figure: exponential density f(x) = λe^(−λx), starting at height λ at x = 0 and decaying toward zero]
The probability density function for the exponential distribution is:
f(x) = λe^(−λx)   for x ≥ 0
which can be integrated to provide the cumulative distribution function
F(x) = 1 − e^(−λx)
The cumulative distribution function gives the probability that the next arrival will be in no more than x time
units given an expected (or average) time between arrivals of:
E(X) = µ = 1/λ
The variance of the exponential distribution is given by:
V(X) = σ² = 1/λ²
It is just as common to ask “What is the probability that the next arrival will be at least x minutes from now?”
as it is to ask “What is the probability that the next arrival will be in less than x minutes?” The two questions
are complementary:
P(X ≥ x) = 1 − P(X ≤ x)
Substituting the cumulative probability function for P(X ≤ x) gives
P(X ≥ x) = e^(−λx)
An important property of the exponential distribution is that it is memoryless. That means that it doesn’t
matter if the last arrival was one second ago or one hour ago, the probability that an arrival will (or won’t)
occur in the next x minutes is exactly the same. The Poisson process doesn’t remember when the last arrival
was.
This derives from the assumption that the arrivals are completely independent of one another, so the timing of
the last arrival should have no effect on the timing of the next arrival.
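To see the memoryless property in numbers, here is a tiny Python sketch (added for illustration; the rate and times are arbitrary) verifying that P(X > s + t | X > s) = P(X > t):

import math

lam = 0.25                 # arbitrary arrival rate
s, t = 10.0, 3.0           # arbitrary times

def p_gt(x):
    # Survival function P(X > x) for the exponential distribution
    return math.exp(-lam * x)

print(p_gt(s + t) / p_gt(s))   # P(X > s + t | X > s)
print(p_gt(t))                 # identical: the past doesn't matter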
EXAMPLE: The Exponential Distribution
Telephone calls arrive at the help desk of a small computer software company at the rate of 15 per hour.
What is the probability that the next call arrives within 3 minutes?
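Working this out: 15 calls per hour is λ = 15/60 = 0.25 calls per minute, so P(X ≤ 3) = 1 − e^(−0.25·3) ≈ 0.528. A quick Python check (added for illustration):

import math

lam_per_min = 15.0 / 60.0                  # 0.25 arrivals per minute
p = 1.0 - math.exp(-lam_per_min * 3.0)     # F(3) = P(next call within 3 min)
print(round(p, 3))                         # 0.528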
MODELING FAILURE WITH AN EXPONENTIAL DISTRIBUTION
Random equipment failures (as opposed to, say, those that are age-related or triggered by an event, such
as a voltage spike) are often modeled as Poisson processes. Each failure is considered a random “arrival”
so the length of time between failures follows an exponential distribution. In this context, the parameter λ
is referred to as the failure rate (the expected number of failures per unit time) and its inverse (the
expected amount of time between failures) is called the Mean Time Between Failures (MTBF).
EXAMPLE:
If the mean time between failures for a SCSI disk drive is 1,000,000 hours (approximately 114 years),
what is the probability that the disk drive will last 5 years?
E(X) = 1/λ = 1,000,000 hours/failure   (MTBF)

λ = 1/1,000,000 failures/hour   (failure rate)
Asking the probability that the disk drive will last 5 years is the same as asking the probability that the
next failure will be more than 5 years away:
P(X > 5 years) = P(X > 43,800 hours) = e^(−43,800/1,000,000) = 0.957
So there’s a 4.3% chance that the disk drive will fail in 5 years, even though its MTBF is nearly 114 years!
Most equipment is made up of multiple components. The MTBF for a system with multiple components is
given by
MTBF = 1/(λ₁ + λ₂ + ⋯ + λₙ)
For example, for a large corporate server with perhaps 100 disk drives, the MTBF for the system is
MTBF = 1/(100 × 1/1,000,000) = 10,000 hours
which is just slightly more than one year! The probability that the system will last 5 years without a single disk
drive going bad is
P(X > 5 years) = P(X > 43,800 hours) = e^(−43,800/10,000) = 0.0125
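The same calculation in a short Python sketch (added for illustration; the numbers come from the example above):

import math

mtbf_drive = 1_000_000.0                  # hours per failure for one drive
n_drives = 100

# Failure rates of independent components add, so the system rate is
# n * lambda and the system MTBF is its reciprocal.
lam_system = n_drives / mtbf_drive
mtbf_system = 1.0 / lam_system            # 10,000 hours

five_years = 5 * 8760                     # 43,800 hours
p_survive = math.exp(-lam_system * five_years)
print(mtbf_system, round(p_survive, 4))   # 10000.0 0.0125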
THE NORMAL DISTRIBUTION
The normal distribution was discovered in 1733 by De Moivre, who found it to be the limiting case of the
binomial distribution (which, you’ll recall, gives the probability of x successes in n trials) as the number of
trials becomes infinite. The normal distribution was “discovered” again a half century later by both Laplace
and Gauss, each of whom found that it seemed to describe the distribution of errors in astronomical
measurements. The normal distribution is the foundation of many statistical methods used in data analysis
because it does accurately describe the distribution of random errors. Its importance cannot be overstated.
The normal distribution is bell-shaped, symmetrical about the mean, µ, and ranges from -∞ to ∞.
[Figure: bell-shaped normal density f(x), symmetric about its mean µ]
The probability density function is given by
1  x −µ  2

σ 
− 
1
f ( x) =
e 2
σ 2π
−∞< x<∞
−∞<µ <∞
σ >0
We have a shorthand way of describing a normally distributed random variable:
X ∼ N[µ, σ²]
This says “the random variable X is normally distributed with mean µ and variance σ2.”
If, for example, we have X ∼ N[25,100] , the interpretation would be that X is normally distributed with a
mean of 25 and a variance of 100.
µ = 25   (population mean)
σ² = 100   (population variance)
σ = 10   (population standard deviation)
This is known as a specific normal distribution, because it has specified values of µ and σ2. (We could, of
course, talk about specific binomial distributions and specific Poisson distributions, too.)
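In code, drawing a value from this specific normal distribution looks like the following one-liner (an added aside, not part of the original notes). Note that Python's random.gauss takes the standard deviation, 10, not the variance, 100:

import random

x = random.gauss(25, 10)   # one draw from N[25, 100]; gauss takes sigma, not sigma squared
print(x)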
Because the normal distribution is symmetrical, the mean µ determines where on the x-axis the bell-curve is
centered:
[Figure: three identical bell curves centered at different means µ₁, µ₂, µ₃]
The standard deviation σ determines the breadth of the bell-curve and, indirectly, the height of the curve.
Because the area under the curve must always be 1, the height of the curve varies in inverse proportion to its
breadth. For example, if one bell curve is twice as wide as another, it must also be half as high:
[Figure: two bell curves centered at µ, one with σ = 1 and a wider, flatter one with σ = 2]
As with all continuous distributions, the probability density function doesn’t do us much good. What we really
need to know is the cumulative probability function,
F(x) = (1/(σ√(2π))) ∫_{−∞}^{x} e^(−½((u − µ)/σ)²) du
Unfortunately, this integral cannot be evaluated in closed form. Instead, it has to be integrated numerically and
the results tabulated for future use.
This presents a bit of a problem. There are an infinite number of normal random variables, each having a
unique µ and σ. To calculate any probabilities associated with a specific normal variable, we'd have to have a
table corresponding to those exact values of µ and σ. That means we’d need an infinite number of tables!
Fortunately, there is an easy way out.
THE STANDARD NORMAL DISTRIBUTION
If X is a normally distributed random variable with mean µ and variance σ², then (X − µ)/σ is a normally
distributed random variable with zero mean and unit variance. In "shorthand" notation:
If X ∼ N[µ, σ²], then (X − µ)/σ ∼ N[0, 1]
For convenience, we define the Z statistic as:
Z = (X − µ)/σ
so we can write
Z ∼ N[0,1]
The standard normal curve is symmetric about the origin and lies almost entirely within the range [−3, +3], as
shown below:
[Figure: standard normal density f(Z) plotted from Z = −4 to Z = +4, with essentially all of its area between −3 and +3]
“So what?” you say. Well, now we don’t have to integrate specific normal distributions. We can, instead,
numerically integrate the standard normal distribution once, store the values in a table, and use the table for
every problem involving normally distributed random variables. Table 1 of Appendix B in the textbook is an
example of such a table. Table 1 contains values of
F(z) = P(Z ≤ z)
for selected non-negative values of Z. Get used to this table…you will use it a lot!
Wait a minute … read that last paragraph again. Table 1 in Appendix B contains values of F(z) for selected
non-negative values of Z. What happens if Z is negative (meaning X is less than the mean)?
Fortunately, we can use the symmetry of the standard normal distribution to easily calculate probabilities
associated with negative values of Z.
Examining the figure below, it is readily apparent that the area to the left of Z = – z is identical to the area to
the right of Z = + z because the bell curve is symmetrical about the origin:
[Figure: standard normal density with the left-tail area P(Z ≤ −z) and the mirror-image right-tail area P(Z ≥ +z) shaded]
To determine the area to the left of Z = – z , we just have to calculate the area to the right of Z = + z . The area
to the right of Z = + z is P(Z ≥ +z) and, by definition,
P(Z ≥ +z) + P(Z ≤ +z) = 1

(since Z ≥ z is the complement of Z ≤ z), so

P(Z ≥ +z) = 1 − P(Z ≤ +z)

From this, we conclude that

F(−z) = 1 − F(+z)
so you only need to worry about one side of the distribution!
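A quick Python check of this identity (added for illustration; the standard normal CDF is expressed through the standard library's error function):

import math

def phi(z):
    # Standard normal cumulative distribution function F(z)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = 1.5                  # arbitrary test value
print(phi(-z))           # 0.0668...
print(1.0 - phi(z))      # the same number, confirming F(-z) = 1 - F(+z)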
EXAMPLE: The Normal Distribution
The paper “Reliability Enhancement of a New Computer” (ESS Proceedings, 1995) describes a program to
enhance the reliability of the motherboard in a massively parallel RISC computer system. Part of the paper
concentrates on a specific parity error that seems to be temperature dependent. The temperatures at which the
error occurs follow a normal distribution with a mean of 80ºC and a standard deviation of 20ºC.
a. What is the probability that an error occurs at a temperature of less than 77ºC?
b. What is the probability of an error occurring between 77ºC and 83ºC?
c. Over what range of temperatures will 2/3 of the errors occur?
d. What is the probability that an error will occur at a temperature within ±2σ of the mean?
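For reference, here is a Python sketch (added for illustration, not part of the original handout) that works all four parts numerically. The standard normal CDF is built from math.erf, and part (c) inverts it by bisection:

import math

def phi(z):
    # Standard normal cumulative distribution function F(z)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 80.0, 20.0

# (a) P(X < 77)
print(phi((77 - mu) / sigma))                           # ~0.440

# (b) P(77 < X < 83)
print(phi((83 - mu) / sigma) - phi((77 - mu) / sigma))  # ~0.119

# (c) symmetric range about the mean holding 2/3 of the errors:
#     solve phi(z) - phi(-z) = 2/3 by bisection, then convert back to X
lo, hi = 0.0, 4.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if phi(mid) - phi(-mid) < 2.0 / 3.0:
        lo = mid
    else:
        hi = mid
z = 0.5 * (lo + hi)
print(mu - z * sigma, mu + z * sigma)                   # ~60.7 to ~99.3 deg C

# (d) P(X within +/- 2 sigma of the mean)
print(phi(2.0) - phi(-2.0))                             # ~0.954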
THE NORMAL AS AN APPROXIMATION OF THE BINOMIAL
We said earlier that the normal distribution is the limiting case of the binomial distribution as the number of
trials goes to infinity. If the number of trials is large, it becomes difficult to calculate
P(X ≤ a) = F(a) = Σ_{x=0}^{a} C(n, x) pˣ qⁿ⁻ˣ
because of the sheer number of terms in the summation. Since the binomial distribution approaches the normal
distribution for large n, we can approximate the probability above using the normal distribution and forget
about calculating all those terms!
Recall that the mean of the binomial distribution is µₓ = np and the variance is σₓ² = npq. If X follows a
binomial distribution and n is large (actually, if npq is greater than about 10), then:
Z = (a − np)/√(npq)
follows a standard normal distribution and F(z) can be determined from standard tables.
Example
A student is taking a true/false test for which he hasn’t studied, so he randomly circles answers. If there are 50
questions on the test, what are his chances of getting a C or better on the test (i.e., 70% or more correct
answers)?
What we’re asking is the probability of 35 or more successes in 50 trials where the probability of success on
each trial is 50%. Ordinarily, we’d have to calculate the probabilities of getting 35, 36, 37, … , 50 answers
correct and add the results. Using the normal approximation, though, we can determine the probability of
getting 34 or less correct with a single calculation, then simply subtract the result from one:
p = 0.5  ⇒  q = 1 − p = 0.5  ⇒  µ = np = 25  ⇒  σ² = npq = 12.5
Z = (a − µ)/σ = (34 − 25)/√12.5 = 2.546
P(X ≥ 35) = 1 − P(Z ≤ 2.546) = 1 − 0.9945 = 0.0055 !!!
Remember this next time you decide to guess on a test!
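Out of curiosity, here is a Python sketch (an added comparison, not from the handout) that computes the exact binomial tail alongside the normal approximation used above:

import math

n, p = 50, 0.5
q = 1.0 - p

# Exact tail: P(X >= 35), summed term by term from the binomial pmf
exact = sum(math.comb(n, x) * p**x * q**(n - x) for x in range(35, n + 1))

# Normal approximation from the notes: 1 - P(Z <= (34 - np)/sqrt(npq))
z = (34 - n * p) / math.sqrt(n * p * q)
approx = 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(exact)    # ~0.0033
print(approx)   # ~0.0054 (the table value used in the notes rounds to 0.0055)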
End Note:
These aren't all of the continuous distributions that we'll cover … there are still more to come, but we’ll be
exploring them in a slightly different way.
SOME CONTINUOUS DISTRIBUTIONS IN EXCEL
You can use Excel’s Function Wizard to implement the exponential distribution and the normal
distribution. Excel can give you both the value of the probability density function, f(x), and the value of
the cumulative probability distribution, F(x). Just make the last argument of the function TRUE if you
want F(x) and FALSE if you want f(x). If you don’t want to use the function wizard, you can simply type
the functions into a cell just like any other function:
Exponential Distribution
f(x) = EXPONDIST(x, lambda, FALSE)
F(x) = EXPONDIST(x, lambda, TRUE)
Normal Distribution
f(x) = NORMDIST(x, mean, stdev, FALSE)
F(x) = NORMDIST(x, mean, stdev, TRUE)
Excel also lets you implement the cumulative standard normal distribution. The function has just one
argument, which is z:
Standard Normal Distribution
F(z) = NORMSDIST(z)
Of course you could obtain the same results by using the NORMDIST function with a mean of zero and a
standard deviation of one. In fact, that's the only way you can obtain the probability density function, f(z),
for the standard normal distribution, because the NORMSDIST function doesn't have a TRUE/FALSE
argument to toggle between the two.
Many times, you need to use the cumulative normal distribution backwards. For example, to determine
the 95th percentile value of X, you have to go into the body of the Cumulative Standard Normal Table,
find the value 0.950, figure out what value of Z corresponds to that entry, and convert that value into X.
This is called inversion and Excel provides functions for solving inverse problems involving both the
normal distribution and the standard normal distribution:
Normal Distribution
X = NORMINV(probability, mean, stdev)
Standard Normal Distribution
Z = NORMSINV(probability)
The former returns the value of X corresponding to the input probability and the latter returns the value of
Z corresponding to the input probability.
There is no uniform distribution function in Excel (because the calculations are so straightforward), but
Excel has a couple of random number generators. The function RAND() returns a U[0,1] random number
that changes every time the spreadsheet is recalculated. The function RANDBETWEEN(bottom, top)
returns a random integer between bottom and top; if you want a continuous uniform value over a range
[a, b] other than [0,1], you can compute a + (b − a)*RAND().