M227
Chapter 6
The Normal Distribution
Section 1
OBJECTIVES
• Identify distributions as symmetrical or skewed.
• Identify the properties of the normal distribution.
• Find the area under the standard normal distribution, given various z values.
• Find the probabilities for a normally distributed variable by transforming it into a standard normal variable.
• Find specific data values for given percentages using the standard normal distribution.
• Use the central limit theorem to solve problems involving sample means for large samples.
• Use the normal approximation to compute probabilities for a binomial variable.
INTRODUCTION
• Quantitative random variables can be either discrete or continuous.
• Discrete variables assume a finite number of values between any two given values of the variable.
• Continuous variables can assume an infinite number of values between any two given values of the variable.
• Examples of continuous variables: height, weight, temperature, cholesterol levels.
• Many continuous variables have distributions that are bell-shaped, and these are called approximately normally distributed variables.
• Experiment: measure the heights of women in the U.S.; start with a sample of 100, then keep increasing the sample size and decreasing the class width; observe the shape of the resulting histograms.
• When the sample size becomes very, very large, the histogram approaches a normal distribution (figure (d) above). This distribution is also called a bell curve or the Gaussian distribution.
• No variable fits the normal distribution perfectly, since the normal distribution is a theoretical curve.
• For many variables, the deviation of their distributions from the normal distribution is very small; thus, we can use the properties of the normal distribution in the study of these variables.
Section 2
Properties of the Normal Distribution
• In mathematics, curves can be represented by equations. Examples: equation of a line: $y = mx + b$; equation of a circle: $x^2 + y^2 = r^2$; and so on.
• The normal distribution is a continuous, symmetric, bell-shaped distribution of a variable.
• The equation for the normal distribution curve, developed by the German mathematician Carl Gauss, is:
$$y = \frac{e^{-(X - \mu)^2 / (2\sigma^2)}}{\sigma\sqrt{2\pi}}$$
where
e ≈ 2.718
π ≈ 3.14
µ = population mean
σ = population standard deviation
• The shape and position of the normal distribution curve depend on two parameters, the mean and the standard deviation.
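As a minimal sketch (not part of the original notes), the curve equation can be evaluated directly in Python and compared with the standard library's `statistics.NormalDist`. The function name `normal_curve` and the parameters µ = 0, σ = 1 are illustrative assumptions.

```python
import math
from statistics import NormalDist

def normal_curve(x, mu, sigma):
    """Height y of the normal curve at x, using the equation for the curve."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))

# Illustrative parameters (assumed, not from the notes): mu = 0, sigma = 1.
peak = normal_curve(0, 0, 1)             # height of the curve at the mean
library_peak = NormalDist(0, 1).pdf(0)   # same height from the standard library
print(round(peak, 4), round(library_peak, 4))
```

Changing `mu` shifts the curve left or right, while changing `sigma` makes it wider and flatter or narrower and taller.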
Normal Distribution Properties
1. The normal distribution curve is bell-shaped.
2. The mean, median, and mode are equal and located at the center of the distribution.
3. The normal distribution curve is unimodal (i.e., it has only one mode).
4. The curve is symmetrical about the mean, which is equivalent to saying that its shape is the
same on both sides of a vertical line passing through the center.
5. The curve is continuous, i.e., there are no gaps or holes. For each value of X, there is a
corresponding value of Y.
6. The curve never touches the x-axis. Theoretically, no matter how far in either direction the
curve extends, it never meets the x-axis—but it gets increasingly closer.
7. The total area under the normal distribution curve is equal to 1.00 or 100%.
8. The area under the normal curve that lies within one standard deviation of the mean is
approximately 0.68, or 68%; within two standard deviations, about 0.95, or 95%; and within
three standard deviations, about 0.997 or 99.7%.
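Property 8 can be checked numerically. The sketch below assumes Python's `statistics.NormalDist` as a stand-in for a table of areas; the helper name `area_within` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    # Expect roughly 68%, 95%, and 99.7% of the total area.
    print(k, f"{area_within(k):.1%}")
```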
[Figure: Areas under the Normal Distribution curve]
Section 3
The Standard Normal Distribution
• Since each normally distributed variable has its own mean and standard deviation, the shape and location of these curves will vary. In practical applications, one would have to have a table of areas under the curve for each variable. To simplify this, statisticians use the standard normal distribution.
• The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1.
[Figure: Standard Normal Distribution]
• The values under the curve indicate the proportion of area in each section and represent probabilities (compare this to a relative frequency histogram). For example, the area between the mean and 1 standard deviation above or below the mean is about 0.3413 or 34.13%.
• “Represent probabilities” means: If it were possible to select any z value at random, the probability of choosing one, say, between 0 and 2 would be the same as the area under the curve between 0 and 2. In this case, the area is 0.4772. Therefore, the probability of randomly selecting any z value between 0 and 2 is 0.4772.
• The horizontal axis for the graph of the standard normal distribution is called the z-axis.
• All normally distributed variables can be transformed into the standard normally distributed variable by using the formula for the standard score:
$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} \quad \text{or} \quad z = \frac{X - \mu}{\sigma}$$
• The z value is the number of standard deviations that a particular X value is away from the mean.
• In order to find the area under the standard normal distribution curve for any z value, use Table E in Appendix C.
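The standard score and the table lookup can be sketched in Python. This is an illustration only: `z_score` and `table_area` are hypothetical helper names, the values X = 110, µ = 100, σ = 5 are made up, and `statistics.NormalDist` is assumed as a substitute for Table E (it returns the area to the left of z, so 0.5 is subtracted to get the area between 0 and z).

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies away from the mean."""
    return (x - mu) / sigma

def table_area(z):
    """Area between 0 and z, the kind of value read from Table E."""
    return NormalDist().cdf(abs(z)) - 0.5

# Hypothetical value: X = 110 from a population with mean 100 and SD 5.
z = z_score(110, mu=100, sigma=5)
print(z, round(table_area(z), 4))   # area between 0 and z
print(round(table_area(1), 4))      # area between the mean and 1 SD
```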
Finding the Area Under the Normal Distribution Curve:
1. Between 0 and any z value:
Look up the z value in the table to get the area.
2. In any tail:
a. Look up the z value to get the area.
b. Subtract the area from 0.5000.
3. Between two z values on the same side of the mean:
a. Look up both z values to get the areas.
b. Subtract the smaller area from the larger area.
4. Between two z values on opposite sides of the mean:
a. Look up both z values to get the areas.
b. Add the areas.
5. To the left of any z value (z > mean):
a. Look up the z value to get the area.
b. Add 0.5000 to the area.
6. To the right of any z value (z < mean):
a. Look up the z value to get the area.
b. Add 0.5000 to the area.
7. In any two tails:
a. Look up the z values to get the areas.
b. Subtract each area from 0.5000.
c. Add the answers.
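The seven rules can be sketched as small Python functions. This is an illustration under assumptions: `statistics.NormalDist` stands in for Table E, all function names are hypothetical, and the z values in the usage lines are arbitrary.

```python
from statistics import NormalDist

def table_area(z):
    """Rule 1: area between 0 and z (the table lookup)."""
    return NormalDist().cdf(abs(z)) - 0.5

def tail_area(z):
    """Rule 2: area in the tail beyond z."""
    return 0.5 - table_area(z)

def between_same_side(z1, z2):
    """Rule 3: subtract the smaller area from the larger area."""
    return abs(table_area(z2) - table_area(z1))

def between_opposite_sides(z_neg, z_pos):
    """Rule 4: add the two areas."""
    return table_area(z_neg) + table_area(z_pos)

def left_of_positive(z):
    """Rule 5: area to the left of a z value above the mean."""
    return 0.5 + table_area(z)

def right_of_negative(z):
    """Rule 6: area to the right of a z value below the mean."""
    return 0.5 + table_area(z)

def two_tails(z_neg, z_pos):
    """Rule 7: subtract each area from 0.5, then add the answers."""
    return tail_area(z_neg) + tail_area(z_pos)

print(round(tail_area(1.11), 4))
print(round(between_same_side(2, 2.47), 4))
print(round(between_opposite_sides(-1.4, 1.4), 4))
```

With a cumulative function available, rules 5 and 6 reduce to a single call, but the table-style versions mirror the steps listed above.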
Examples
1. Find the area under the normal distribution curve for 0 < z < 2.34 or P(0 < z < 2.34)
2. Find the area for −1.75 < z < 0 or P( −1.75 < z < 0)
3. Find the area for z > 1.11 or P( z > 1.11)
4. Find the area for z < −1.93 or P( z < −1.93)
5. Find the area for 2 < z < 2.47 or P(2 < z < 2.47)
6. Find the area for −2.48 < z < −0.83 or P( −2.48 < z < −0.83)
7. Find the area for z < 1.99 or P( z < 1.99)
8. Find the area for z > 2.43 or P( z > 2.43)
Examples
1. Find the probability that z is less than 2.03
2. Find the probability that z is within 1.4 standard deviations of the mean.
3. Fill in the blank: P( 0 < z < _______ ) = 0.4279 (or “find z value such that the area
under the standard normal distribution curve between 0 and the z value is 0.4279”)
4. Find two z values, one positive and one negative, so that the areas in the two tails
total 12%.
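One possible way to check answers to these four exercises is sketched below. It assumes Python's `statistics.NormalDist`, whose `inv_cdf` method works backward from a cumulative area to a z value; results may differ slightly from Table E because the table rounds z to two decimals.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# 1. P(z < 2.03)
p_less = std_normal.cdf(2.03)

# 2. P(-1.4 < z < 1.4): within 1.4 standard deviations of the mean
p_within = std_normal.cdf(1.4) - std_normal.cdf(-1.4)

# 3. z0 with P(0 < z < z0) = 0.4279, i.e. cumulative area 0.5 + 0.4279
z0 = std_normal.inv_cdf(0.5 + 0.4279)

# 4. Two tails totalling 12% means 6% in each tail
z_low = std_normal.inv_cdf(0.06)
z_high = std_normal.inv_cdf(1 - 0.06)

print(round(p_less, 4), round(p_within, 4))
print(round(z0, 2), round(z_low, 3), round(z_high, 3))
```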
Examples
Study the section in the book titled “Excel Step by Step”, page 301. Redo some of the
above problems using Excel.
Note: The NORMSDIST function returns the “cumulative” area, that is, the area to the left of z. For example, if z = 1, then NORMSDIST(1) = 0.8413 (0.5000 + 0.3413), as opposed to Table E, which returns the value 0.3413.
[Figure: area for z = 1 as given by the E Table (from 0 to z = 1) and by NORMSDIST (cumulative, to the left of z = 1)]
Section 4
Applications of the Normal Distribution
• For all the problems presented in this chapter, one can assume that the variable is
normally distributed or approximately normally distributed.
• To solve problems by using the standard normal distribution, transform the original variable to a standard normal distribution variable by using the z formula:
$$z = \frac{X - \mu}{\sigma}$$
Example: Let x be a normal random variable with mean 80 and standard deviation 12.
What percentage of values are:
1. Between 85 and 98
2. Outside of 1.5 standard deviations of the mean
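A possible numerical check of this example is sketched below, assuming Python's `statistics.NormalDist` in place of Table E. Because the table rounds z to two decimals, a hand calculation may differ in the third decimal place.

```python
from statistics import NormalDist

x = NormalDist(mu=80, sigma=12)

# 1. Percentage of values between 85 and 98
between = x.cdf(98) - x.cdf(85)

# 2. Percentage of values more than 1.5 standard deviations from the mean
lower, upper = 80 - 1.5 * 12, 80 + 1.5 * 12   # 62 and 98
outside = x.cdf(lower) + (1 - x.cdf(upper))

print(f"{between:.2%}  {outside:.2%}")
```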
Example 6-14: The mean number of hours an American worker spends on a computer is
3.1 hours per workday. Assume the standard deviation is 0.5 hour. Find the percentage of
workers who spend less than 3.5 hours on the computer.
Solution:
1. Draw the figure and represent the area that we want to find.
2. Find the z value corresponding to 3.5: $z = \frac{X - \mu}{\sigma} = \frac{3.5 - 3.1}{0.5} = 0.80$
Hence, 3.5 is 0.8 standard deviations above the mean of 3.1.
3. Find the area using Table E: A(0 < z < 0.8) = 0.2881. Since we need the area to the left of z = 0.8, add 0.5 to it to get 0.7881.
4. Therefore, 78.81% of the workers spend less than 3.5 hours per workday on the computer.
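The same result can be reproduced in a few lines, sketched here with Python's `statistics.NormalDist` (an assumption; the notes themselves use Table E and Excel). Its `cdf` method returns the area to the left of a value directly, so no 0.5 needs to be added.

```python
from statistics import NormalDist

hours = NormalDist(mu=3.1, sigma=0.5)

z = (3.5 - 3.1) / 0.5        # step 2: the z value for 3.5 hours
p_less = hours.cdf(3.5)      # steps 3 and 4: area to the left of 3.5

print(round(z, 2), f"{p_less:.2%}")
```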
Example 6-16: AAA reports that the average time it takes to respond to an emergency call is 25 minutes. Assume that the standard deviation is 4.5 minutes. If 80 calls are randomly selected, approximately how many will be responded to in less than 15 minutes?
1. Find the area to the left of 15: $z = \frac{X - \mu}{\sigma} = \frac{15 - 25}{4.5} = -2.22$
2. Find the area from Table E: 0.4868 (use +2.22)
3. Subtract 0.4868 from 0.5 to get 0.0132
4. Multiply the sample size 80 by 0.0132 to get 1.056. Hence, approximately 1 call will be responded to in under 15 minutes.
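As a rough check of Example 6-16, the sketch below multiplies the tail probability by the number of calls. It assumes Python's `statistics.NormalDist`; since z is not rounded to −2.22 here, the probability differs slightly from the table value 0.0132.

```python
from statistics import NormalDist

response_time = NormalDist(mu=25, sigma=4.5)

p_under_15 = response_time.cdf(15)   # area to the left of 15 minutes
expected_calls = 80 * p_under_15     # expected count among 80 calls

print(round(p_under_15, 4), round(expected_calls))
```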
Calculating Cut-off Values
Example 6-17: To qualify for a police academy, candidates must score in the top 10% on
a general abilities test. The test has a mean of 200 and a standard deviation of 20. Find
the lowest possible score to qualify.
1. We need to find the X value that cuts off the upper 10% of the area under the normal distribution curve.
2. Work backward to solve this problem.
3. Subtract 0.1 from 0.5 to get the area between the mean 200 and X.
4. Find the z value that corresponds to the area of 0.4000. If the specific value cannot be found, use the closest value. If it falls exactly between two z values, use the larger of the two z values.
z value = 1.28
5. Substitute in the z value formula and solve for X:
$$z = \frac{X - \mu}{\sigma} \rightarrow 1.28 = \frac{X - 200}{20} \rightarrow X = 200 + (1.28)(20) = 225.6 \approx 226$$
6. A score of at least 226 is needed in order to qualify.
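Working backward from an area to a data value can be sketched with the inverse of the cumulative area. The helper `cutoff_for_top` is a hypothetical name, and `NormalDist.inv_cdf` from Python's standard library is assumed in place of the reverse table lookup.

```python
import math
from statistics import NormalDist

def cutoff_for_top(fraction, mu, sigma):
    """X value that cuts off the upper `fraction` of the area."""
    return NormalDist(mu, sigma).inv_cdf(1 - fraction)

cutoff = cutoff_for_top(0.10, mu=200, sigma=20)
lowest_score = math.ceil(cutoff)   # scores are whole numbers, so round up

print(round(cutoff, 1), lowest_score)
```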
Example 6-18: For a medical study, a researcher wishes to select people in the middle
60% of the population based on blood pressure. If the mean systolic pressure is 120 and
the standard deviation is 8, find the upper and lower readings that would qualify people to
participate in the study.
Note that two values are needed, one above the mean and one below the mean. Find the
value to the right of the mean first. The closest z value for an area of 0.3000 is 0.84.
Substitute this into the z-score formula to find $X_1$:
$$z = \frac{X_1 - \mu}{\sigma} \rightarrow 0.84 = \frac{X_1 - 120}{8} \rightarrow X_1 - 120 = (0.84)(8) \rightarrow X_1 = 120 + 6.72 = 126.72$$
On the other side: $X_2 = 120 - 6.72 = 113.28$
Therefore, the middle 60% will have blood pressure readings of 113.28 < X < 126.72
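A two-sided cutoff can be sketched the same way: the middle 60% leaves 20% in each tail. The helper `middle_interval` is a hypothetical name and `statistics.NormalDist` is an assumed tool; because it does not round z to 0.84, the endpoints differ from the table-based answer by about 0.01.

```python
from statistics import NormalDist

def middle_interval(fraction, mu, sigma):
    """Lower and upper X values enclosing the middle `fraction` of the area."""
    tail = (1 - fraction) / 2
    dist = NormalDist(mu, sigma)
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)

lower, upper = middle_interval(0.60, mu=120, sigma=8)
print(round(lower, 2), round(upper, 2))
```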
Determine Normality
1. Draw a histogram and see if the curve is bell-shaped. If it is, do step 2.
2. Check for skewness using Pearson’s index of skewness: $PI = \frac{3(\bar{X} - \text{median})}{s}$. If $-1 < PI < 1$, then do step 3. (If not, assume that the data are significantly skewed.)
3. Check for outliers.
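Step 2 can be sketched in Python as follows. The function name `pearson_index` and both data sets are made up for illustration, and s is taken to be the sample standard deviation (`statistics.stdev`), which is an assumption about the notation.

```python
from statistics import mean, median, stdev

def pearson_index(data):
    """Pearson's index of skewness: 3 * (mean - median) / s."""
    return 3 * (mean(data) - median(data)) / stdev(data)

# Hypothetical data sets, not from the notes.
symmetric = [1, 2, 3, 4, 5]
skewed = [1, 2, 3, 4, 50]

for data in (symmetric, skewed):
    pi = pearson_index(data)
    verdict = "not significantly skewed" if -1 < pi < 1 else "significantly skewed"
    print(round(pi, 2), verdict)
```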
Section 5
The Central Limit Theorem
• A sampling distribution of sample means is a distribution using the means computed from all possible random samples of a specific size taken from a population.
• The goal of the Central Limit Theorem is to determine the behavior of the means of samples of the same size taken from the same population, as it relates to the population mean.
• Properties of the Distribution of Sample Means (select all possible samples of a specific size with replacement):
o The mean of the sample means (denoted by $\mu_{\bar{X}}$) will be the same as the population mean.
o The standard deviation of the sample means will be smaller than the standard deviation of the population, and it will be: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
• Example: Consider an 8-point quiz given to 4 students. The results of the quiz were: 2, 6, 4, 8. Assume that the population for this experiment consists of the 4 students. The population mean and standard deviation are:
$$\mu = \frac{2 + 6 + 4 + 8}{4} = 5$$
$$\sigma = \sqrt{\frac{(2 - 5)^2 + (6 - 5)^2 + (4 - 5)^2 + (8 - 5)^2}{4}} = 2.236$$
Select all samples of size 2 taken with replacement, and calculate the mean of each sample:
Sample   Mean   Sample   Mean
2,2      2      6,2      4
2,4      3      6,4      5
2,6      4      6,6      6
2,8      5      6,8      7
4,2      3      8,2      5
4,4      4      8,4      6
4,6      5      8,6      7
4,8      6      8,8      8
Construct an ungrouped frequency distribution of the means:
$\bar{X}$   f
2     1
3     2
4     3
5     4
6     3
7     2
8     1
The histogram for this distribution appears to be approximately normal:
[Figure: histogram of the sample means; horizontal axis: sample means 2 through 8; vertical axis: frequencies 1, 2, 3, 4, 3, 2, 1]
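Calculate the mean of the sample means:
$$\mu_{\bar{X}} = \frac{2 + 3 + \cdots + 8}{16} = \frac{80}{16} = 5 = \mu$$
Calculate the standard deviation of the sample means:
$$\sigma_{\bar{X}} = \sqrt{\frac{(2 - 5)^2 + \cdots + (8 - 5)^2}{16}} = 1.581; \qquad \frac{\sigma}{\sqrt{n}} = \frac{2.236}{\sqrt{2}} = 1.581 = \sigma_{\bar{X}}$$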
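The quiz example is small enough to enumerate by machine. The sketch below (Python standard library only; variable names are illustrative) lists all 16 samples of size 2 taken with replacement and recomputes both quantities.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [2, 6, 4, 8]
n = 2

# All samples of size n taken with replacement, and the mean of each sample.
sample_means = [mean(sample) for sample in product(population, repeat=n)]

mu = mean(population)
sigma = pstdev(population)          # population standard deviation
mu_xbar = mean(sample_means)        # mean of the sample means
sigma_xbar = pstdev(sample_means)   # standard deviation of the sample means

print(len(sample_means), mu, mu_xbar)
print(round(sigma, 3), round(sigma_xbar, 3), round(sigma / sqrt(n), 3))
```

Changing `n` to 3 enumerates 64 samples, and the same two relationships hold.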
• The standard deviation of the sample means is called the standard error of the mean.
• Central Limit Theorem: As the sample size n increases, the shape of the distribution of the sample means taken with replacement from a population with mean µ and standard deviation σ will approach a normal distribution. This distribution will have a mean µ and a standard deviation $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$.
• The central limit theorem can be used to answer questions about sample means in the same manner that the normal distribution can be used to answer questions about individual values.
• A new formula must be used for the z values: $z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
• Practical Alternatives to the Central Limit Theorem:
o When the original population is normally distributed, then the distribution of the sample means will also be normally distributed, for any sample size n.
o When the distribution of the original population might not be normal, then a sample size of 30 or more is needed in order to assume that the distribution of the sample means is approximately normal. The larger the sample, the better the approximation.
Example 6-22: The average age of a vehicle registered in the United States is 8 years (or 96
months). Assume the standard deviation is 16 months. If a random sample of 36 vehicles is
selected, find the probability that the mean of their age is between 90 and 100 months.
Since the sample size is greater than 30, we can assume that the distribution of the sample means is approximately normal.
The two z values are:
$$z_1 = \frac{90 - 96}{16 / \sqrt{36}} = -2.25, \qquad z_2 = \frac{100 - 96}{16 / \sqrt{36}} = 1.5$$
The two areas corresponding to these z values are:
A( −2.25) = 0.4878 and A(1.5) = 0.4332
Since the z values are on opposite sides of the mean, the probability is found by adding the two areas: $P(90 < \bar{X} < 100) = 0.4878 + 0.4332 = 0.9210 = 92.1\%$
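Example 6-22 can be reproduced by treating the sample mean as a normal variable whose standard deviation is the standard error σ/√n. The sketch below assumes Python's `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 96, 16, 36

standard_error = sigma / sqrt(n)            # sigma / sqrt(n) = 16 / 6
sample_mean = NormalDist(mu, standard_error)

z1 = (90 - mu) / standard_error
z2 = (100 - mu) / standard_error
probability = sample_mean.cdf(100) - sample_mean.cdf(90)

print(round(z1, 2), round(z2, 2), round(probability, 3))
```

For a question about a single vehicle rather than a sample mean, the same calculation would use σ = 16 in place of the standard error.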
Example 6-23:
• Emphasize the difference between asking questions about an individual value and asking questions about a sample mean.
Finite Population Correction Factor
OMIT
Section 6
Normal Approximation to the Binomial Distribution
• Often, normal distributions are used to solve problems involving binomial distributions when n is large.
• Characteristics of binomial distributions:
o There must be a fixed number of trials.
o The outcome of each trial must be independent.
o Each trial can have only two outcomes or be reduced to two outcomes.
o The probability of a success must remain the same for each trial.
• A binomial distribution is determined by n (number of trials) and p (probability of success).
• When p is approximately 0.5 and n increases, the shape of the binomial distribution becomes similar to that of the normal distribution.
• Rule of thumb: Use the normal distribution when $n \cdot p \ge 5$ and $n \cdot q \ge 5$, where q = 1 − p.
• In addition to the above condition, a correction for continuity must be used.
• This correction results from the fact that when we deal with a discrete variable X, we must use its boundaries for its probability. For example, for P(X = 7) we use the correction P(6.5 < X < 7.5).
Summary of Normal Approximation to the Binomial Distribution
Binomial      Normal
P(X = a)      P(a − 0.5 < X < a + 0.5)
P(X ≥ a)      P(X > a − 0.5)
P(X > a)      P(X > a + 0.5)
P(X ≤ a)      P(X < a + 0.5)
P(X < a)      P(X < a − 0.5)
• Procedure to use the normal distribution for approximating the binomial distribution:
o Step 1: Check to see whether the normal approximation can be used.
o Step 2: Find the mean µ and the standard deviation σ ($\mu = n \cdot p$, $\sigma = \sqrt{n \cdot p \cdot q}$).
o Step 3: Write the problem in probability notation, using X.
o Step 4: Rewrite the problem using the continuity correction factor, and show the corresponding area under the normal distribution curve.
o Step 5: Find the corresponding z values.
o Step 6: Find the solution.
• Example 6-24.
• Example 6-27.
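The procedure can be sketched in Python and compared with the exact binomial probability. This is an illustration, not the textbook's Examples 6-24 or 6-27: the values n = 20, p = 0.5, a = 7 and all function names are assumptions.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_pmf(n, p, k):
    """Exact binomial probability P(X = k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def approximating_normal(n, p):
    """Steps 1 and 2: check np >= 5 and nq >= 5, then build the normal curve."""
    q = 1 - p
    if n * p < 5 or n * q < 5:
        raise ValueError("normal approximation not appropriate")
    return NormalDist(mu=n * p, sigma=sqrt(n * p * q))

def approx_equal(n, p, a):
    """P(X = a) is approximated by P(a - 0.5 < X < a + 0.5)."""
    dist = approximating_normal(n, p)
    return dist.cdf(a + 0.5) - dist.cdf(a - 0.5)

def approx_at_least(n, p, a):
    """P(X >= a) is approximated by P(X > a - 0.5)."""
    return 1 - approximating_normal(n, p).cdf(a - 0.5)

# Hypothetical problem: n = 20 trials, p = 0.5, a = 7.
n, p, a = 20, 0.5, 7
exact_equal = binomial_pmf(n, p, a)
exact_at_least = sum(binomial_pmf(n, p, k) for k in range(a, n + 1))

print(round(exact_equal, 4), round(approx_equal(n, p, a), 4))
print(round(exact_at_least, 4), round(approx_at_least(n, p, a), 4))
```

The remaining rows of the summary table follow the same pattern, with a − 0.5 or a + 0.5 as the boundary.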
Section 7
Summary
• The normal distribution can be used to describe a variety of variables, such as heights, weights, and temperatures.
• The normal distribution is bell-shaped, unimodal, symmetric, and continuous; its mean, median, and mode are equal.
• The area under the normal distribution curve is 1.
• Mathematicians use the standard normal distribution, which has a mean of 0 and a standard deviation of 1.
• The normal distribution can be used to describe a sampling distribution of sample means.
• These samples must be of the same size and randomly selected with replacement from the population.
• The central limit theorem states that as the size of the samples increases, the distribution of sample means will approach a normal distribution.
• If the normality of the population is not known, use a sample size of 30 or more.
• The normal distribution can be used to approximate other distributions, such as the binomial distribution.
• For the normal distribution to be used as an approximation to the binomial distribution, the conditions np ≥ 5 and nq ≥ 5 must be met.
• A correction for continuity may be used for more accurate results.