Download Statistics Notes (6) - Home Page of Vance A. Hughey

Principles of Statistics I
Lecture Notes for
Principles of Statistics I (Economics 261)
Western Nevada College
Copyright © 2010
By Vance A. Hughey
Carson City, Nevada
The following material is copyrighted. The text of this publication, or any part thereof, may not be reproduced
or transmitted in any form or by any means, electronic or mechanical, including photocopying, storage in an
information retrieval system, or otherwise, without the prior written permission of Vance A. Hughey.
Continuous Probability Distributions
6. Continuous Probability Distributions: Introduces the probability distributions of continuous random variables. Three major continuous probability distributions (the uniform, the normal, and the exponential) are introduced.
Learning Objectives:
1. Understand the difference between how probabilities are computed for discrete and continuous random variables.
2. Know how to compute probability values for a continuous uniform probability distribution and be able to compute the expected value and variance for such a distribution.
3. Be able to compute probabilities using a normal probability distribution. Understand the role of the standard normal distribution in this process.
4. Know how and when the normal distribution can be used to approximate binomial probabilities.
5. Be able to compute probabilities using an exponential probability distribution.
6. Understand the relationship between the Poisson and exponential probability distributions.
Topics:
Uniform distributions
Normal probability distribution
Standard normal distribution
Exponential probability distribution
Key concepts:
Uniform distributions
Probability density function
Normal probability distribution
Bell-shaped curve
z-value
Standard normal distribution
Continuity correction factor
Exponential probability distribution
We have just finished studying discrete probability distributions, but we now turn to
continuous probability distributions. The key difference (which leads to a major
difference in how probabilities are computed) stems from the fact that with
continuous probability distributions, the random variable can take on any value in an
interval. There are an infinite number of possible values in any interval, so the
probability of any single exact value is zero and we won’t be talking about a specific
value. Instead we will discuss the probability that a continuous random variable will
lie within a specific interval.
This technique is useful in a broad range of problems. For example, consider the
probability that a can of soup will contain between 8.00 and 8.33 ounces of soup. It is
not meaningful, or is impractical, to ask what the probability is that a can contains
exactly 8.2534 ounces.
Other kinds of problems that can be addressed include the probability that a stock
price will vary from $45 to $76, or the probability that a loan application will be
processed within a particular 30-minute interval, or the probability that a firm’s
accounts receivable will be from 30 to 60 days old. There are many other
applications in engineering, manufacturing, medicine, transportation, psychology, and
other fields.
We are going to discuss three continuous probability distributions: the uniform, the
normal, and the exponential. In each case we will be discussing the probability that a
continuous random variable will lie within a specific interval.
Let’s start with the uniform distribution since it will clearly show the differences
between discrete and continuous probability distributions concerning how
probabilities are computed.
Uniform Distribution
If we assume, as the text does, that the flight between Chicago and New York can
take from 120 to 140 minutes, we can define x as a random variable where:
120 ≤ x ≤ 140 .
If we assume that the flight can never take less than 120 minutes nor more than
140 minutes, then f(x) = 0 for any x < 120 or x > 140.
There are 20 one-minute intervals between 120 and 140 minutes, and if we further
assume that each one-minute interval is equally likely, then the random variable x has
a uniform probability distribution. The probability density function would be:
$$f(x) = \frac{1}{20} \quad \text{for } 120 \le x \le 140$$
The graph of this probability density function (Figure 6.2, page 228) is a rectangle
of height 1/20 over the interval from 120 to 140 on the horizontal axis (flight time
in minutes). The area from 120 to 130 is shaded:
P(120 ≤ x ≤ 130) = area = (1/20)(10) = 0.50.
We can also generalize the uniform probability density function as follows
(Equation 6.1—page 227):
$$f(x) = \begin{cases} \dfrac{1}{b - a} & \text{for } a \le x \le b \\ 0 & \text{elsewhere} \end{cases}$$
where: a = smallest value the variable can assume; and
b = largest value the variable can assume.
In the case of the probability density function f ( x) , the height of the function at any
particular value of x does not represent probability. Instead, we look at the area to
find probability.
What is the probability that a flight time will fall between 120 and 140?
Answer: 1.0.
What is the probability that the flight time will fall between 120 and 130?
Answer: 0.50, since each one-minute interval is equally likely.
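To make the "area, not height" idea concrete, here is a minimal Python sketch of the flight-time example. The function name `uniform_probability` and its arguments are our own choices for illustration and do not come from the text; it simply multiplies the width of the interval by the height 1/(b − a).

```python
def uniform_probability(lo, hi, a=120.0, b=140.0):
    """P(lo <= x <= hi) for a uniform distribution on [a, b] (illustrative helper)."""
    # The density is 0 outside [a, b], so clip the requested interval first.
    lo = max(lo, a)
    hi = min(hi, b)
    if hi <= lo:
        return 0.0
    height = 1.0 / (b - a)   # f(x) = 1 / (b - a)
    width = hi - lo
    return width * height    # area of the rectangle


p_all = uniform_probability(120, 140)    # whole range of flight times
p_half = uniform_probability(120, 130)   # first ten minutes
p_point = uniform_probability(125, 125)  # a single exact value has no width
print(p_all, p_half, p_point)
```

Notice that an interval with no width, such as exactly 125 minutes, has no area and therefore a probability of zero.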
Optional
The mean and variance for the uniform probability distribution are:
$$\mu = \frac{a + b}{2}$$
$$\sigma^2 = \frac{(b - a)^2}{12}$$
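As a quick check, the two formulas can be applied to the flight-time example with a = 120 and b = 140. This is only a sketch; the helper names `uniform_mean` and `uniform_variance` are ours, not the text's.

```python
def uniform_mean(a, b):
    """Expected value of a uniform distribution on [a, b]."""
    return (a + b) / 2


def uniform_variance(a, b):
    """Variance of a uniform distribution on [a, b]."""
    return (b - a) ** 2 / 12


mu = uniform_mean(120, 140)        # 130.0 minutes
var = uniform_variance(120, 140)   # 400 / 12, about 33.33
sd = var ** 0.5                    # standard deviation, in minutes
print(mu, round(var, 2), round(sd, 2))
```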
The Normal Distribution
One of the most common distributions used in business and economics for decision-making
purposes is the normal probability distribution. This distribution is also
known as the Gaussian distribution, named after Carl Friedrich Gauss (1777-1855),
who explored the properties of the normal curve. It has been
used in a wide variety of applications including heights and weights of people,
scientific measurements, test scores, and amounts of rainfall.
(Later we will also see how a continuous normal random variable can be used as an
approximation in situations involving discrete random variables.)
You’ve all probably seen a representation of the normal (or bell-shaped) curve at
some time or another. Perhaps one of your instructors has drawn such a curve in
conjunction with reporting of test scores. The probability density function that
defines the bell-shaped curve of the normal probability distribution is (Equation 6.2—
page 232):
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / 2\sigma^2}$$
Don’t worry about this formula since it only provides the height of the curve for any
value of x.
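If you are curious what the formula produces, the sketch below evaluates the height of the curve for a few values of x. The function name `normal_pdf` and the choice of a mean of 0 and a standard deviation of 1 are assumptions made only for illustration; the result is compared with `NormalDist.pdf` from Python's standard `statistics` module.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x (a height, not a probability)."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)


# Illustrative values only: mean 0 and standard deviation 1.
for x in (-1.0, 0.0, 1.0):
    ours = normal_pdf(x, mu=0.0, sigma=1.0)
    library = NormalDist(mu=0.0, sigma=1.0).pdf(x)
    print(x, round(ours, 4), round(library, 4))
```

The height is greatest at the mean and is the same at equal distances on either side of it, which is the symmetry we describe next.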
The bell-shaped curve looks like this:
[Figure: bell-shaped normal curve centered on the mean, µ]
The normal probability distribution (normal curve) is an extremely important
distribution in statistics, so let’s spend some time reviewing some of the important
characteristics of this kind of distribution.
Characteristics of normal probability distributions
1. There is an entire family of normal probability distributions. Each has a different combination of mean µ and standard deviation σ.
2. The mean, median, and mode are the same for a normal curve and occur at the value of x where the curve is at its highest point.
3. The mean of a normal distribution can be any numerical value: positive, negative, or zero.
4. The normal distribution is symmetric. The “tails” approach but never touch the horizontal axis.
5. The standard deviation determines the width of the curve to the extent that the higher the standard deviation, the flatter the curve.
6. The total area under the curve (as with all continuous probability distributions) is 1.
7. Regardless of the value of the mean, µ, and the standard deviation, σ, probabilities for the normal random variable are given by the area under the curve. We’ll show you how to compute these probabilities shortly.
The Standard Normal Probability Distribution
There is a special normal curve that is used in decision-making called the standard
normal probability distribution. What distinguishes this curve from all of the other
normal curves is that it has a mean, µ , of 0 and a standard deviation, σ , of 1, and
instead of using x as the random variable, we use the letter z.
We use this special curve to find probabilities. We know that there is a 100% chance
or a probability of 1.00 that z falls within an interval under the curve. What is the
probability that z falls within an interval above the mean? 0.50. Half of the z values
are above the mean and half are below the mean. What if we wanted to know the
probability that z was between 0 and 1, inclusive? Here’s where we refer to a table
developed specifically for computing the area under the standard normal curve.
Computing the area under the curve within a defined interval is the same as
determining the probability that z falls within the defined interval.
The tables in the inside front cover of your textbook show the cumulative area under
the curve to the left of any specified z value. If we want to know the probability that
z falls within 1 standard deviation above the mean, we can look up the z value of 1.00
in the right-side table, find 0.8413, and subtract the 0.5000 that lies below the mean
to get 0.3413.
What is the probability that z falls within 1 standard deviation of the mean? In this
case we would be looking for the area both above and below the mean that is within
one standard deviation of the mean. Since we already know the probability of z
falling within the interval z=0 to 1, we can just double it to get 0.6826 (68.3%). Note
that this is the same probability as we see in Figure 6.4 (page 234). If we were to
calculate the probability that the random variable z would take on a value within plus
or minus 2 standard deviations of the mean we would get 95.44% and for 3 standard
deviations we would get 99.72%.
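These table lookups can be checked in software. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `area_within` is our own. Because the table values are rounded to four decimal places, the computed percentages may differ from the table-based ones in the last digit.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # standard normal: mean 0, sd 1


def area_within(k):
    """Area under the standard normal curve between -k and +k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


# Area from the mean (z = 0) up to z = 1, as in the table lookup.
print(round(std_normal.cdf(1.0) - 0.5, 4))

# Areas within 1, 2, and 3 standard deviations of the mean.
for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```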
Before we can use the standard normal probability distribution in any real-world
decision-making, we need to come up with a means of converting an x-value to a
z-value. You see, there are very few real-world cases where the distribution has
a mean of 0 and a standard deviation of 1.
The formula for converting to the standard normal distribution is:
$$z = \frac{x - \mu}{\sigma}$$
Using this formula we can compute probabilities for any normal distribution. Let’s
go through an example now to put this entire discussion in a more meaningful light.
Again, for simplicity, let’s start with the example from the text—The Grear Tire
Company problem on page 239.
This company wants to place a guarantee on its new tire but isn’t sure what the policy
should be. Actual tests show the mean mileage to be 36,500 with a standard
deviation of 5,000 miles. What is the probability that the tire mileage will exceed
40,000 miles for any given tire? Using 40,000 for x, 36,500 for µ , and 5,000 for σ ,
we can use Formula 6.3 (page 238) to convert x to z.
$$z = \frac{x - \mu}{\sigma} = \frac{40{,}000 - 36{,}500}{5{,}000} = \frac{3{,}500}{5{,}000} = 0.70$$
Now that we have a z value, we can compute the probability that z > 0.70 (which is
the same as the probability that x ≥ 40,000). We know that the probability that z is
greater than 0 is 0.50. But we want the probability that z > 0.70, so we’re looking for
an area to the right of z = 0.70. We need to find the probability that z is between
0 and 0.70, and then subtract that value from 0.5. This gives us
0.5000 − 0.2580 = 0.2420
From this we conclude that about 24.2 percent of the tires will last at least
40,000 miles.
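The same calculation can be sketched in Python, using `NormalDist` from the standard `statistics` module in place of the printed table. The variable names are ours; the mean, standard deviation, and mileage are the Grear Tire values given above.

```python
from statistics import NormalDist

mu = 36_500      # mean tire mileage
sigma = 5_000    # standard deviation of tire mileage
x = 40_000       # mileage we are asking about

# Step 1: convert x to a z value.
z = (x - mu) / sigma

# Step 2: area to the right of z under the standard normal curve.
p_exceed = 1.0 - NormalDist().cdf(z)

# Shortcut: work with the original (unstandardized) normal distribution.
p_direct = 1.0 - NormalDist(mu, sigma).cdf(x)

print(round(z, 2), round(p_exceed, 4), round(p_direct, 4))
```

Both routes give the same area, which is the point of the z conversion: any normal distribution can be handled with the one standard normal table.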
Now for some tricky stuff. What should the guarantee be if Grear would like no more
than 10 percent of the tires to be eligible for the discount guarantee? We need to set a
minimum guaranteed mileage where the probability of a tire not getting that mileage
is 10 percent. Figure 6.7 (page 240) shows this situation. The x value that
corresponds with the minimum mileage is the point where the lower half of the
distribution (an area of 0.50) splits into an area of 0.40 between that point and the
mean and an area of 0.10 in the lower tail. So, if we find 0.1000 (actually 0.1003) in the
left-side table on the inside of the front cover of the text, we see that it corresponds with
z = −1.28. The z value is negative since it is below the mean.
We have the z-value but we need to convert back to x:
$$z = \frac{x - \mu}{\sigma} = -1.28$$
$$x - \mu = -1.28\sigma$$
$$x = \mu - 1.28\sigma$$
$$x = 36{,}500 - 1.28(5{,}000) = 30{,}100$$
Grear might want to set a 30,000-mile guarantee for its tires.
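Working backward from a probability to a mileage can also be sketched in Python. Here `NormalDist().inv_cdf` from the standard `statistics` module plays the role of reading the table in reverse; the variable names are ours. Because the software does not round z to two decimal places, the mileage comes out slightly different from 30,100.

```python
from statistics import NormalDist

mu = 36_500          # mean tire mileage
sigma = 5_000        # standard deviation of tire mileage
lower_tail = 0.10    # at most 10 percent of tires should fall below the guarantee

# Find the z value with 10 percent of the area to its left (about -1.28).
z = NormalDist().inv_cdf(lower_tail)

# Convert z back to x using x = mu + z * sigma.
guarantee = mu + z * sigma

print(round(z, 2), round(guarantee))
```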
Normal Approximation of Binomial Probabilities
Suppose, as the text does, that we want to find the probability that a sample of
100 invoices has 12 errors when a 10 percent error rate is average for a particular
company. Since binomial probability tables don’t usually go over n = 20 , we could
use the normal distribution to approximate the binomial probability.
First, we set $\mu = np$ and $\sigma = \sqrt{np(1 - p)}$:
$$\mu = (100)(0.1) = 10$$
$$\sigma = \sqrt{(100)(0.1)(0.9)} = \sqrt{9} = 3$$
Next, we want to find the area under the curve where x = 12 . But since we cannot
find a continuous probability at a point, we need to convert the discrete point to an
interval using a continuity correction factor. We add and subtract 0.5 from 12 to
get an interval of 11.5 to 12.5. Now we can find the area under the curve for this
interval.
Finally, we convert to the standard normal probability distribution using Equation 6.3
(page 238). We’ll find the z value for x = 12.5 and for x = 11.5. Once we have the
probabilities for each area, we’ll subtract to find the probability for the interval.
$$z = \frac{x - \mu}{\sigma} = \frac{12.5 - 10.0}{3} = 0.83 \quad \text{at } x = 12.5$$
$$z = \frac{x - \mu}{\sigma} = \frac{11.5 - 10.0}{3} = 0.50 \quad \text{at } x = 11.5$$
From the right-side table, the area under the curve for z = 0.83 is 0.7967 and for
z = 0.50 is 0.6915. Subtracting, we get P( x = 12) = 0.1052 .
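To see how good the approximation is, the sketch below repeats the calculation in Python and compares it with the exact binomial probability. The variable names are ours, and the exact comparison is an addition for illustration rather than part of the text's example. Since the software does not round z to 0.83, the approximate value differs slightly from 0.1052.

```python
import math
from statistics import NormalDist

n = 100     # invoices in the sample
p = 0.10    # error rate
k = 12      # number of errors we are asking about

mu = n * p                           # mean of the binomial
sigma = math.sqrt(n * p * (1 - p))   # standard deviation of the binomial

# Continuity correction: treat "exactly 12" as the interval 11.5 to 12.5.
approx = NormalDist(mu, sigma)
p_approx = approx.cdf(k + 0.5) - approx.cdf(k - 0.5)

# Exact binomial probability of exactly k errors, for comparison.
p_exact = math.comb(n, k) * p**k * (1 - p) ** (n - k)

print(round(p_approx, 4), round(p_exact, 4))
```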
The Exponential Distribution
A continuous probability distribution that is useful in describing the time or space
between occurrences of an event is the exponential probability distribution.
The formula for calculating the probability of obtaining a value for the exponential
random variable of less than or equal to some specific value of x is:
$$P(x \le x_0) = 1 - e^{-x_0/\mu} \qquad \text{(Equation 6.4)}$$
Let’s go through an example. Let’s say that your experience at a bank is such that it
takes an average of 7 minutes for you to complete your weekly deposit. One day you
decide that you would like to know the probability of getting your business done in
3 minutes or less. You recognize this as a problem involving a continuous probability
distribution and further recognize it as a time interval problem where the exponential
distribution would be appropriate. You drag out the formula for calculating the
probability of obtaining a value for the exponential random variable of less than or
equal to some specific value of x ( x0 ) :
$$P(x \le x_0) = 1 - e^{-x_0/\mu}$$
$$P(x \le 3) = 1 - e^{-3/7}$$
$$P(x \le 3) = 0.3486$$
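The bank example is easy to reproduce in Python. This is a minimal sketch; the function name `exponential_cdf` is our own, and it implements the cumulative probability formula with µ as the mean.

```python
import math


def exponential_cdf(x0, mean):
    """P(x <= x0) for an exponential distribution with the given mean."""
    return 1.0 - math.exp(-x0 / mean)


# Mean service time of 7 minutes; probability of finishing in 3 minutes or less.
p_three_or_less = exponential_cdf(3, 7)
print(round(p_three_or_less, 4))
```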
The exponential and Poisson distributions are similar in that the Poisson distribution
provides a description of the number of occurrences per interval, and the exponential
distribution provides a description of the length of the interval between occurrences.
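One way to see the connection is the standard result that if occurrences follow a Poisson distribution with a mean rate of 1/µ per unit of time, then the time between occurrences is exponential with mean µ. The sketch below checks one consequence of that: the Poisson probability of zero occurrences in an interval equals the exponential probability of waiting longer than that interval. Reusing the 7-minute mean and the 3-minute interval here is purely an assumption for illustration, and the variable names are ours.

```python
import math

mean_gap = 7.0           # assumed mean time between occurrences (minutes)
rate = 1.0 / mean_gap    # Poisson mean number of occurrences per minute
t = 3.0                  # length of the interval (minutes)

# Poisson view: probability of zero occurrences in an interval of length t.
expected_count = rate * t
p_zero_events = math.exp(-expected_count) * expected_count**0 / math.factorial(0)

# Exponential view: probability the wait for the next occurrence exceeds t.
p_wait_longer = 1.0 - (1.0 - math.exp(-t / mean_gap))

print(round(p_zero_events, 4), round(p_wait_longer, 4))
```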
Next class
• Chapter 7: Sampling and Sampling Distributions