Chapter 6
The Normal Probability Distribution
6.1 Properties of the Normal Distribution
Recall that a continuous random variable is a random variable that can take on any real value in a specified range. Probabilities for a continuous random variable are given by areas under its probability density curve. The mathematical function that produces this curve is called the probability density function.
Definition 6.1.1 The probability density function f(x) describes the distribution of probability for a continuous random variable. f(x) has the properties:
1. The total area under the probability density curve is 1;
2. P(a < X < b) = the area under the probability density curve between a and b; and
3. f(x) ≥ 0 for all x.
Example 6.1.1 Suppose we have a continuous random variable X that is uniformly distributed and can take on any value between 0 and 60. The probability density curve is flat: f(x) = 1/60 for 0 ≤ x ≤ 60 and 0 otherwise.
Example 6.1.2 Using the above distribution function, we can calculate the probabilities associated with different outcomes of X:
1. P (0 < X < 6)
2. P (X < 30)
3. P (30 < X < 50)
4. P (X = 15)
Property 6.1.1 For a continuous random variable X, P (X = x) = 0. Why?
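In the uniform case these areas are easy to check by hand, since each probability is just (length of the interval)/60. The short sketch below, added here for illustration and assuming Python with scipy installed, confirms the four probabilities above and the property P(X = x) = 0.

# Sketch (not from the original notes): checking the uniform probabilities
# for X uniformly distributed on [0, 60] with scipy.
from scipy.stats import uniform

X = uniform(loc=0, scale=60)        # uniform distribution on [0, 60]

print(X.cdf(6) - X.cdf(0))          # P(0 < X < 6)   = 6/60  = 0.10
print(X.cdf(30))                    # P(X < 30)      = 30/60 = 0.50
print(X.cdf(50) - X.cdf(30))        # P(30 < X < 50) = 20/60
print(X.cdf(15) - X.cdf(15))        # P(X = 15) = 0: a single point has no area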
6.1.1 The Normal Distribution
The Normal Distribution is one of the most famous continuous probability distributions. It is sometimes called the Gaussian Distribution, as it was "discovered" by Gauss. The Normal Distribution is most commonly known as the Bell Curve.
Definition 6.1.2 A continuous random variable is normally distributed or has a normal probability distribution if the relative frequency histogram of the random variable has the shape of a normal curve (bell-shaped and symmetric).
The highest point of a normal distribution occurs at the population mean μ, which is always the center of the distribution; 1/2 of the area under the curve lies to the left of the population mean and 1/2 lies to the right. The normal distribution is symmetric about μ and is "bell-shaped". The total area under the curve is one. As x increases without bound, the graph approaches but NEVER equals zero; the same is true as x decreases without bound.
We use the notation X ~ N(μ, σ²) to denote that X is a normal random variable with population mean μ and population standard deviation σ. Also,
1. P(μ − σ < X < μ + σ) = 0.683
2. P(μ − 2σ < X < μ + 2σ) = 0.954
3. P(μ − 3σ < X < μ + 3σ) = 0.997
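These three areas hold for every normal distribution. As a quick check (added here, not part of the original notes), the following Python sketch, assuming scipy is available, reproduces the values 0.683, 0.954 and 0.997 from the standard normal cdf.

# Sketch: verifying the 68-95-99.7 rule with scipy's normal cdf.
from scipy.stats import norm

for k in (1, 2, 3):
    # P(μ - kσ < X < μ + kσ) is the same for every normal distribution,
    # so it can be computed on the standard normal.
    area = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} standard deviation(s):", round(area, 3))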
The usefulness of the normal distribution is that IF we know our data is sampled from a normal population and IF we know the population mean μ and the population variance σ², then we can determine the probability of an event occurring by simply determining the area of the corresponding region under the normal curve with mean μ and variance σ².
Example 6.1.3 The washing machines owned by a chain of 24-hour laundromats break down, on average, after 300 days, with a standard deviation of 50 days. Assuming that the times taken for the washing machines to break down are normally distributed, what is the probability that a given washing machine will break down in under 320 days (shading the appropriate region is sufficient for now)?
Shade the region representing the probability that a machine will break down after more than
363 days.
Shade the region representing the probability that a given washing machine will break down
between the 200th and the 350th day.
Example 6.1.4 The percentage of iron was measured in ore specimens randomly sampled from a 35 325 tonne shipload of ore. The sample mean and standard deviation were determined to be 62.96% and 0.61%, respectively. Assuming that the sample mean and sample standard deviation are very good approximations of the population mean and population standard deviation, and that the random variable is normally distributed, for any particular sample:
Shade the region representing the probability that the percentage of iron in the sample will exceed 63%.
Shade the region representing the probability that the percentage of iron in the sample will be less than 62%.
Shade the region representing the probability that the percentage of iron in the sample will lie between 62% and 63%.
From your diagrams for the last two examples, explain why you can or cannot read off the corresponding probabilities.
Is there a systematic way to calculate these probabilities? The answer is YES!!!
Data sampled from any normal population, regardless of the population's mean or standard deviation, can be converted to a standard normal population whose mean is 0 and whose standard deviation is 1 (abbreviated N(0, 1)). We use Z to denote a standard normal random variable; that is, Z is a normal random variable whose expected value is 0 and whose standard deviation is 1.
We can transform the normal random variable X (which has mean μ and variance σ²) to the standard normal random variable Z (which has mean 0 and variance 1) using the formula

Z = (X − μ)/σ.
The above transformation from X to Z has no effect on the underlying probabilities associated with X. One benefit of the transformation is that tables of P(Z < z) (the probabilities associated with the standard normal random variable) have already been calculated.
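To illustrate the transformation (a sketch added here, not part of the original notes), the code below standardizes the breakdown time from Example 6.1.3 and looks up P(Z < z); scipy's norm.cdf plays the role of the printed standard normal table.

# Sketch: standardizing X ~ N(300, 50²) and using the standard normal cdf
# in place of a printed Z table. Assumes scipy is installed.
from scipy.stats import norm

mu, sigma = 300, 50                 # breakdown times from Example 6.1.3
x = 320

z = (x - mu) / sigma                # Z = (X - μ) / σ = 0.4
print("z =", z)
print("P(X < 320) = P(Z < 0.4) =", round(norm.cdf(z), 4))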
6.2 The Standard Normal Distribution
Example 6.2.1 Find the area under the standard normal curve that lies to the left of z = −1.68.
Theorem 6.2.1 The area under the standard normal curve to the right of z = 1 − (the area under the standard normal curve to the left of z).
Example 6.2.2 Find the area under the standard normal curve to the right of z = −0.46.
Example 6.2.3 Find the area under the standard normal curve between z = −1.35 and z = 2.01.
Example 6.2.4 Find the z-score such that the area to the left of the z-score is 0.32.
Example 6.2.5 Find the z-score such that the area to the right of the z-score is 0.4332.
Example 6.2.6 Find the z-scores that divide the middle 90% of the area in the standard
normal distribution from the area in the tails.
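The areas and z-scores in Examples 6.2.1 through 6.2.6 can be read from a standard normal table or computed directly. The sketch below (added here, not in the original notes; it assumes scipy) uses norm.cdf for areas and norm.ppf, the inverse cdf, for the look-ups in reverse.

# Sketch: standard normal areas (cdf) and z-scores (ppf, the inverse cdf).
from scipy.stats import norm

print(norm.cdf(-1.68))                     # Example 6.2.1: area to the left of z = -1.68
print(1 - norm.cdf(-0.46))                 # Example 6.2.2: area to the right of z = -0.46
print(norm.cdf(2.01) - norm.cdf(-1.35))    # Example 6.2.3: area between -1.35 and 2.01

print(norm.ppf(0.32))                      # Example 6.2.4: z with area 0.32 to its left
print(norm.ppf(1 - 0.4332))                # Example 6.2.5: z with area 0.4332 to its right
print(norm.ppf([0.05, 0.95]))              # Example 6.2.6: z-scores bounding the middle 90%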
6.3 Applications of the Normal Distribution
Example 6.3.1 A pediatrician obtains the heights of her 200 three-year-old female patients.
The heights are normally distributed, with mean 38.72 inches and standard deviation 3.17
inches. What percent of the three-year-old females have a height less than 35 inches?
Example 6.3.2 Using the information in the previous example, determine the probability
that a randomly selected three-year-old female is between 35 and 40 inches tall.
Example 6.3.3 Using the information in this section's first example, find the height of a three-year-old female at the 20th percentile.
Example 6.3.4 Using the information in this section's first example, determine the heights of the three-year-old females that separate the middle 98% of the distribution from the bottom and top 1% (in other words, determine the 1st and 99th percentiles).
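As a sketch of how these four calculations go once the heights are modelled as N(38.72, 3.17²) (added here for illustration, not part of the original notes; assumes scipy):

# Sketch: normal-distribution calculations for the three-year-old heights,
# X ~ N(38.72, 3.17²). Assumes scipy is installed.
from scipy.stats import norm

heights = norm(loc=38.72, scale=3.17)

print(heights.cdf(35))                       # Example 6.3.1: P(X < 35)
print(heights.cdf(40) - heights.cdf(35))     # Example 6.3.2: P(35 < X < 40)
print(heights.ppf(0.20))                     # Example 6.3.3: 20th percentile
print(heights.ppf([0.01, 0.99]))             # Example 6.3.4: 1st and 99th percentiles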
6.4 Assessing Normality
We use a normal probability plot to plot the observed data versus the normal scores. A
normal score is the expected z-score of the data value if the distribution of the random
variable is normal. To draw a normal probability plot,
1. Arrange the data in ascending order.
2. Compute fi = (i − 0.375)/(n + 0.25), where i is the index and n is the number of observations. This value represents the expected proportion of observations less than or equal to the i-th data value.
3. Find the z-score corresponding to fi.
4. Plot the observed values on the horizontal axis and the corresponding z-scores on the vertical axis.
If the sample data are taken from a population that is normally distributed, a normal probability plot of the actual values versus the expected z-scores will be approximately linear (a straight line).
Example 6.4.1 The "three-year rates of return" of a randomly selected group of mutual funds are: 15.8, 16.7, 18.2, 18.4, 18.4, 18.5, 19.2, 19.5, 21.3, 22.2, 22.6, 23.7, 23.7, 25.5, 27.0, 27.4, 28.5, 29.1, 29.6. Is there evidence to support the claim that the random variable "three-year rate of return" is normally distributed?
6.5 Sampling Distributions; The Central Limit Theorem
We have now developed enough background to begin discussing the third aspect of statistics:
inference. Typically we wish to learn more about some numerical value associated with a
population. This numerical feature of a population is called a parameter. The true value
of a population parameter is quite often an unknown constant and can only be correctly
determined by studying the entire population. If the population is relatively small, then the
parameter of interest can be computed. What happens if the population is quite large? This is where inference comes into play. We take a sample from the population and compute an estimate for the parameter. This sample-based estimate is called a statistic. A statistic is a numerical-valued function of the sample observations.
The sample mean, X̄, is a statistic because its numerical value depends on the particular values of X1 through Xn in the sample. Since statistics serve as our estimates of population parameters, we must remember:
1. since a sample used to calculate a statistic is only part of a population, a statistic
cannot ever be expected to yield the exact value of the parameter;
2. the observed value of a statistic depends on the particular sample that was selected;
and
3. there will be some variability in the values of a statistic, depending on the sampling.
We have already discussed three important statistics: the sample mean, the sample
variance, and the sample standard deviation.
For a fixed sample size n, each sample statistic is a random variable. Its sample space is the set of all possible samples of size n. As a random variable, each statistic has a probability distribution. Such distributions are called sampling distributions. As an example, we will look at the distribution of the sample mean X̄.
We denote the population mean by μ and the population standard deviation by σ.
Theorem 6.5.1 The sampling distribution of X̄ has a mean μ_X̄ = E(X̄) and a standard deviation σ_X̄. It can be shown that
1. μ_X̄ = μ,
2. σ²_X̄ = σ²/n, and
3. σ_X̄ = √(σ²/n) = σ/√n.
Example 6.5.1 Suppose you have a balanced six-sided die. Let X represent the number on the "face-down" side.
(a) State the distribution function of X.
(b) Consider now that you roll this die twice. Let X1 represent the number on the "face-down" side during the first roll and let X2 represent the number on the "face-down" side during the second roll. State the distribution function of X̄ = (X1 + X2)/2.
(c) Calculate E(X̄) and sd(X̄).
(d) Generate a random sample of size 5 and calculate X̄.
(e) Generate a random sample of size 15 and calculate X̄.
(f) What do you notice about your results in (d) and (e)?
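Parts (d) through (f) ask you to roll the die yourself; a small simulation does the same thing (a sketch added here, not from the original notes, assuming numpy is available).

# Sketch: simulating parts (d)-(f) of Example 6.5.1 with numpy.
import numpy as np

rng = np.random.default_rng()

sample5 = rng.integers(1, 7, size=5)      # (d) five rolls of a fair die (values 1-6)
sample15 = rng.integers(1, 7, size=15)    # (e) fifteen rolls

print(sample5, sample5.mean())            # sample mean of the 5 rolls
print(sample15, sample15.mean())          # sample mean of the 15 rolls
# (f) both sample means vary from run to run but tend to sit near E(X) = 3.5,
#     and the larger sample typically lands closer to 3.5.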
Remark 6.5.1 When sampling from a normal population with mean μ and standard deviation σ, X̄ is normally distributed with mean μ and standard deviation σ/√n. Refer to Example 2 in the textbook for an example.
Theorem 6.5.2 (The Law of Large Numbers) As additional observations are added to the sample, the difference between the sample mean and the population mean approaches zero.
Perhaps the most famous theorem in statistics is the following surprising result.
Theorem 6.5.3 (The Central Limit Theorem) Suppose a random variable X has population mean μ and standard deviation σ and that a random sample of size n has been drawn from this population. Then the sampling distribution of X̄ becomes approximately normal as n increases. The mean of this distribution is μ and the standard deviation is σ/√n.
Example 6.5.2 Over the years the final exam scores for first-year statistics students have been found to be approximately normal with mean 60 and standard deviation 10. A new instructor teaches a small summer section of 30 students. What is the probability that the average final exam score will be below 55? What is the probability that the average final exam score will lie between 50 and 70?
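By the Central Limit Theorem, X̄ for the 30 students is approximately N(60, (10/√30)²). A sketch of the two probability calculations (added here, not from the original notes; assumes scipy):

# Sketch: Example 6.5.2 via the sampling distribution of the mean.
from math import sqrt
from scipy.stats import norm

mu, sigma, n = 60, 10, 30
xbar = norm(loc=mu, scale=sigma / sqrt(n))   # sampling distribution of X-bar

print(xbar.cdf(55))                          # P(X-bar < 55)
print(xbar.cdf(70) - xbar.cdf(50))           # P(50 < X-bar < 70)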
Proposition 6.5.1 For a simple random sample of size n ≤ 0.05N and population proportion p,
1. the mean of the sampling distribution of p̂ = x/n is μ_p̂ = E[p̂] = p;
2. the standard deviation of the sampling distribution of p̂ is σ_p̂ = √(p(1 − p)/n); and
3. if np(1 − p) ≥ 10, then the shape of the sampling distribution is approximately normal.
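For instance (an illustration added here, not from the original notes), with p = 0.6 and n = 100 the sampling distribution of p̂ has mean 0.6, standard deviation √(0.6·0.4/100) ≈ 0.049, and np(1 − p) = 24 ≥ 10, so it is approximately normal. A short sketch assuming scipy:

# Sketch: mean, standard deviation and normal approximation for p-hat
# when p = 0.6 and n = 100 (values chosen only for illustration).
from math import sqrt
from scipy.stats import norm

p, n = 0.6, 100
mean_phat = p
sd_phat = sqrt(p * (1 - p) / n)

print(mean_phat, sd_phat)
# e.g. the approximate probability that a sample proportion falls below 0.5:
print(norm(loc=mean_phat, scale=sd_phat).cdf(0.5))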
6.6 The Normal Approximation to the Binomial Distribution
If our binomial experiment consists of a large number of Bernoulli trials, then calculating the corresponding probability distribution becomes quite time-consuming. Consider tossing a coin 50 times and counting the number of heads that appear. The corresponding distribution would require us to calculate some 51 probabilities!!! No thanks!!!
It can be shown that if the number of Bernoulli trials (for a binomial RV X) is sufficiently large, then X is approximately normal with mean np and standard deviation √(npq), where by sufficiently large we mean np ≥ 5 and nq ≥ 5; that is,

Z = (X − np)/√(npq) is approximately N(0, 1).
Because we are trying to approximate a discrete distribution with a continuous distribution, we must use a correction for continuity; that is, we must add or subtract 0.5 to the values that the random variable can take on.
Example 6.6.1 The probability that a person recovers from a rare blood disease is 0.6. If
100 people are known to have contracted the disease, what is the probability that less than
half will survive?
Example 6.6.2 The probability that an egg is bad is 0.04. You buy twelve dozen eggs. What
is the probability of getting exactly 5 bad eggs?
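A sketch of how Example 6.6.1 goes with the continuity correction (added here, not part of the original notes; assumes scipy), together with the exact binomial answer for comparison:

# Sketch: normal approximation with continuity correction for Example 6.6.1.
# X ~ Binomial(100, 0.6); we want P(X < 50), i.e. P(X <= 49).
from math import sqrt
from scipy.stats import norm, binom

n, p = 100, 0.6
mu, sigma = n * p, sqrt(n * p * (1 - p))       # mean np and sd sqrt(npq)

approx = norm.cdf((49.5 - mu) / sigma)         # continuity correction: X <= 49 becomes 49.5
exact = binom.cdf(49, n, p)                    # exact binomial probability

print(approx, exact)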