Section 5.1 Introduction to Normal Distributions
Properties of a Normal Distribution
• The mean, median, and mode are equal
• Bell shaped and is symmetric about the mean
• The total area that lies under the curve is one or 100%
• As the curve extends farther and farther away from the mean, it
gets closer and closer to the x-axis but never touches it.
• The points at which the curvature changes are called inflection
points. The graph curves downward between the inflection points
and curves upward past the inflection points.
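The slides describe the shape of the curve but do not give its equation. The sketch below assumes the usual normal density formula, f(x) = e^(–(x – µ)²/(2σ²)) / (σ√(2π)), which is not stated in the slides. It checks three of the listed properties numerically for an illustrative curve with µ = 100 and σ = 15: symmetry about the mean, the peak at the mean, and a total area of one. The helpers `normal_pdf` and `total_area` are hypothetical names, and the area is only a trapezoid-rule approximation over µ ± 10σ.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def total_area(mu: float, sigma: float, steps: int = 10_000) -> float:
    """Trapezoid-rule area under the curve from mu - 10*sigma to mu + 10*sigma."""
    lo, hi = mu - 10 * sigma, mu + 10 * sigma
    width = (hi - lo) / steps
    area = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    for i in range(1, steps):
        area += normal_pdf(lo + i * width, mu, sigma)
    return area * width


# Symmetric about the mean: equal heights one unit to either side.
print(normal_pdf(99.0, 100.0, 15.0) == normal_pdf(101.0, 100.0, 15.0))  # True
# The highest point of the curve is at the mean.
print(normal_pdf(100.0, 100.0, 15.0) > normal_pdf(100.5, 100.0, 15.0))  # True
# Total area under the curve is one (up to numerical error).
print(round(total_area(100.0, 15.0), 6))  # 1.0
```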
Means and Standard Deviations
[Figure: normal curves with different means and the same standard deviation, horizontal axis 10 to 20. What are the means?]
[Figure: normal curves with different means and different standard deviations, horizontal axis 9 to 22]
Empirical Rule
• About 68% of the area lies within 1 standard deviation of the mean.
• About 95% of the area lies within 2 standard deviations of the mean.
• About 99.7% of the area lies within 3 standard deviations of the mean.
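As a quick check of the Empirical Rule, the sketch below uses Python's standard-library `statistics.NormalDist` (Python 3.8+) in place of a table. Because any normal distribution can be standardized, it is enough to look at the standard normal curve; `area_within` is an illustrative helper name. The unrounded areas show that 68%, 95%, and 99.7% are rounded figures.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def area_within(k: float) -> float:
    """Area under the standard normal curve between -k and k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```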
Determining Intervals
[Figure: normal curve of assembly times with marks at 3.3, 3.6, 3.9, 4.2, 4.5, 4.8, and 5.1 hours]
Example: An instruction manual claims that assembly time for a
product is normally distributed with a mean of 4.2 hours and a
standard deviation of 0.3 hour. Determine the interval in which
95% of the assembly times fall.
By the Empirical Rule, about 95% of the data fall within 2 standard deviations of the mean:
4.2 – 2(0.3) = 3.6 and 4.2 + 2(0.3) = 4.8.
About 95% of the assembly times will be between 3.6 and 4.8 hours.
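The interval calculation is just µ ± kσ. A minimal sketch, assuming the same normal model for assembly times (µ = 4.2, σ = 0.3); the helper name `interval` is illustrative, and the last line uses `statistics.NormalDist` to show the exact area that the Empirical Rule rounds to 95%.

```python
from statistics import NormalDist


def interval(mu: float, sigma: float, k: int):
    """Values k standard deviations below and above the mean."""
    return mu - k * sigma, mu + k * sigma


low, high = interval(4.2, 0.3, 2)
print(f"{low:.1f} to {high:.1f} hours")  # 3.6 to 4.8 hours

# Under the stated normal model, the exact share of times inside the interval:
assembly = NormalDist(mu=4.2, sigma=0.3)
print(f"{assembly.cdf(high) - assembly.cdf(low):.4f}")  # 0.9545
```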
The Standard Normal Distribution
Standard normal distribution: mean = 0, standard deviation = 1
Using z-scores, any normal distribution can be transformed into the standard normal distribution.
[Figure: standard normal curve, z from –4 to 4]
To find probabilities for a normal distribution using a standard normal table, each value must first be standardized (converted to a z-score).
The Standard Score (from Chapter 2)
The standard score, or z-score, represents the number of standard deviations a random variable x falls from the mean:
$$z = \frac{x - \mu}{\sigma}$$
Test scores for a civil service exam are normally distributed with a mean of 152 and a standard deviation of 7. Find the z-score for a person with a score of:
(a) 161
(b) 148
(c) 152
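The exercise leaves the answers to the reader. The sketch below applies the z-score formula z = (x – µ)/σ to the three scores; `z_score` is an illustrative helper, and results are rounded to two decimals to match a standard normal table.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma


for score in (161, 148, 152):
    print(score, round(z_score(score, mu=152, sigma=7), 2))
# 161 1.29
# 148 -0.57
# 152 0.0
```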
Cumulative Areas
[Figure: standard normal curve, z from –3 to 3]
• The total area under the curve is one.
• The cumulative area (the area to the left of z) is close to 0 for z-scores close to –3.49.
• The cumulative area for z = 0 is 0.50.
• The cumulative area is close to 1 for z-scores close to 3.49.
Cumulative Areas
Using a standard normal table, find the cumulative area
for a z-score of –1.25
In the Standard Normal Table (p. A16), read down the z column to z = –1.2, then across to the column headed .05. The cell value, 0.1056, is the cumulative area.
The probability that z is at most –1.25 is 0.1056.
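A standard normal table lists values of the cumulative distribution function. As a sketch, Python's `statistics.NormalDist().cdf` computes the same cumulative areas directly, so it can be used to check table lookups such as z = –1.25, as well as the behavior at z = 0 and near z = ±3.49.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# cdf(z) is the cumulative area to the left of z, the quantity the table lists.
for z in (-3.49, -1.25, 0.0, 3.49):
    print(f"z = {z:5.2f}: cumulative area = {std_normal.cdf(z):.4f}")
# z = -3.49: cumulative area = 0.0002
# z = -1.25: cumulative area = 0.1056
# z =  0.00: cumulative area = 0.5000
# z =  3.49: cumulative area = 0.9998
```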
Finding Probabilities
To find the probability that z is less than a given value, read the
cumulative area in the table corresponding to that z-score.
Find P(z < –1.45).
Read down the z column to –1.4 and across to the column headed .05: the cumulative area is 0.0735.
P(z < –1.45) = 0.0735
Finding Probabilities
To find the probability that z is greater than a given
value, subtract the cumulative area in the table
from 1.
Find P(z > –1.24).
The cumulative area (area to the left) is 0.1075. So the area
to the right is 1 – 0.1075 = 0.8925.
P(z > –1.24) = 0.8925
Finding Probabilities
To find the probability that z is between two given values, find the cumulative area for each and subtract the smaller area from the larger.
Find P(–1.25 < z < 1.17).
1. P(z < 1.17) = 0.8790
2. P(z < –1.25) = 0.1056
3. P(–1.25 < z < 1.17) = 0.8790 – 0.1056 = 0.7734
Probabilities can’t be negative, so always subtract the smaller area from the larger.
Summary
• To find the probability that z is less than a given value, read the corresponding cumulative area.
• To find the probability that z is greater than a given value, subtract the cumulative area in the table from 1.
• To find the probability that z is between two given values, find the cumulative area for each and subtract the smaller area from the larger.
The cumulative area to the left of z is the value of the cumulative distribution function (cdf) at z.
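The three rules in the summary can be written as small functions. This is a sketch using `statistics.NormalDist` instead of the printed table; the names `p_less`, `p_greater`, and `p_between` are hypothetical. Applied to the earlier examples, it reproduces 0.0735 and 0.8925, and gives about 0.773 for the between case, where the table's rounded entries give 0.7734.

```python
from statistics import NormalDist

std_normal = NormalDist()


def p_less(z: float) -> float:
    """P(Z < z): read the cumulative area directly."""
    return std_normal.cdf(z)


def p_greater(z: float) -> float:
    """P(Z > z): subtract the cumulative area from 1."""
    return 1.0 - std_normal.cdf(z)


def p_between(z1: float, z2: float) -> float:
    """P between z1 and z2: subtract the smaller cumulative area from the larger."""
    a, b = std_normal.cdf(z1), std_normal.cdf(z2)
    return max(a, b) - min(a, b)


print(f"{p_less(-1.45):.4f}")           # 0.0735
print(f"{p_greater(-1.24):.4f}")        # 0.8925
print(f"{p_between(-1.25, 1.17):.3f}")  # 0.773
```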
Section 5.2 Normal Distributions: Finding Probabilities
Probabilities and Normal Distributions
If a random variable, x, is normally distributed, then the
probability that x will fall within an interval is equal to the area
under the curve in the interval.
Example: IQ scores are normally distributed with a mean of
100 and a standard deviation of 15. Find the probability that a
person selected at random will have an IQ score less than 115.
To find the area, first find the standard score equivalent to x = 115:
$$z = \frac{115 - 100}{15} = 1$$
Probabilities and Normal Distributions
[Figure: the area to the left of x = 115 under the normal curve (µ = 100, σ = 15) is the same as the area to the left of z = 1 under the standard normal curve]
To find P(x < 115), find P(z < 1).
From the Standard Normal Table, P(z < 1) = 0.8413, so P(x < 115) = 0.8413.
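The same standardize-then-look-up procedure can be sketched with `statistics.NormalDist`, which also accepts µ and σ directly and standardizes internally. The object name `iq` is illustrative.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Standardize by hand, then read the standard normal cdf...
z = (115 - iq.mean) / iq.stdev
print(z, f"{NormalDist().cdf(z):.4f}")  # 1.0 0.8413
# ...or let NormalDist standardize internally; the area is the same.
print(f"{iq.cdf(115):.4f}")  # 0.8413
```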
Application Example
Monthly utility bills in a city are normally distributed with a mean of
$100 and a standard deviation of $12. A utility bill is randomly
selected. Find the probability it is between $80 and $115.
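Normal distribution: µ = 100, σ = 12
Find P(80 < x < 115). Standardize each endpoint:
$$z = \frac{80 - 100}{12} \approx -1.67, \qquad z = \frac{115 - 100}{12} = 1.25$$
so P(80 < x < 115) = P(–1.67 < z < 1.25).
Subtract the cumulative areas: 0.8944 – 0.0475 = 0.8469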
The probability that a utility bill is between
$80 and $115 is 0.8469.
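A sketch of the same calculation with `statistics.NormalDist`, assuming the stated model (µ = 100, σ = 12). The table method rounds z = –20/12 to –1.67; computing from the unrounded z gives about 0.847, slightly below the table answer of 0.8469.

```python
from statistics import NormalDist

bills = NormalDist(mu=100, sigma=12)

# Table method: z-scores rounded to two decimals.
z_low, z_high = round((80 - 100) / 12, 2), round((115 - 100) / 12, 2)
print(z_low, z_high)  # -1.67 1.25

# Unrounded: P(80 < x < 115) straight from the cdf.
print(f"{bills.cdf(115) - bills.cdf(80):.3f}")  # 0.847
```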
Section 5.3 Normal Distributions: Finding Values
From Areas to z-Scores
Find the z-score corresponding to a cumulative area of 0.9803.
[Figure: standard normal curve with area 0.9803 to the left of z]
Locate 0.9803 in the area portion of the table. Read the values at the beginning of the corresponding row and at the top of the column. The z-score is 2.06, which corresponds roughly to the 98th percentile.
Finding z-Scores from Areas
Find the z-score corresponding to the 90th percentile.
The closest table area is .8997. The row heading is 1.2 and
column heading is .08. This corresponds to z = 1.28.
A z-score of 1.28 corresponds to the 90th percentile.
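These lookups run the table in reverse. As a sketch, `statistics.NormalDist().inv_cdf` takes a cumulative area and returns the z-score, so it can stand in for scanning the body of the table; rounding to two decimals matches the table's precision.

```python
from statistics import NormalDist

std_normal = NormalDist()

# inv_cdf reverses the table lookup: cumulative area in, z-score out.
print(round(std_normal.inv_cdf(0.9803), 2))  # 2.06
print(round(std_normal.inv_cdf(0.90), 2))    # 1.28 (90th percentile)
```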
Finding z-Scores from Areas
Find the z-score with an area of .60 falling to its right.
With .60 to the right, the cumulative area to the left is 1 – .60 = .40. The closest value in the table is .4013, in the row headed –0.2 and the column headed .05, so the z-score is –0.25.
A z-score of –0.25 has an area of .60 to its right. It also corresponds to the 40th percentile.
Finding z-Scores from Areas
Find the z-score such that 45% of the area under the curve
falls between –z and z.
[Figure: area .45 between –z and z, with .275 in each tail]
The area remaining in the tails is 1 – .45 = .55, and half of it lies in each tail. So .55/2 = .275 is the cumulative area for the negative z-value, and .275 + .45 = .725 is the cumulative area for the positive z-value. The closest table area to .275 is .2743, which gives z = –0.60; by symmetry, the positive z-score is 0.60.
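Both of the last two problems convert the given area into a cumulative area before the reverse lookup. A sketch with `statistics.NormalDist().inv_cdf`; the helper names `z_with_area_to_right` and `z_for_middle_area` are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()


def z_with_area_to_right(area: float) -> float:
    """z-score whose right-tail area is `area` (its cumulative area is 1 - area)."""
    return std_normal.inv_cdf(1.0 - area)


def z_for_middle_area(area: float):
    """Return (-z, z) enclosing `area` in the middle; each tail holds (1 - area) / 2."""
    tail = (1.0 - area) / 2.0
    return std_normal.inv_cdf(tail), std_normal.inv_cdf(tail + area)


print(round(z_with_area_to_right(0.60), 2))           # -0.25
print([round(z, 2) for z in z_for_middle_area(0.45)])  # [-0.6, 0.6]
```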
From z-Scores to Raw Scores
To find a data value x when given a standard score z, solve the z-score formula for x:
$$x = \mu + z\sigma$$
Example: The test scores for a civil service exam are normally
distributed with a mean of 152 and a standard deviation of 7. Find
the test score for a person with a standard score of: 2.33, –1.75, 0
x = 152 + (2.33)(7) = 168.31
x = 152 + (–1.75)(7) = 139.75
x = 152 + (0)(7) = 152
A z-score (standard score) is the number of standard deviations a value lies above or below the mean.
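Solving z = (x – µ)/σ for x gives x = µ + zσ. A minimal sketch of the three civil service conversions; `raw_score` is an illustrative name.

```python
def raw_score(z: float, mu: float, sigma: float) -> float:
    """Solve z = (x - mu) / sigma for x."""
    return mu + z * sigma


for z in (2.33, -1.75, 0.0):
    print(z, round(raw_score(z, mu=152, sigma=7), 2))
# 2.33 168.31
# -1.75 139.75
# 0.0 152.0
```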
Finding Percentiles or Cut-off Values
Monthly utility bills in a city are normally distributed with a mean
of $100 and a standard deviation of $12. What is the smallest
utility bill that can be in the top 10% of the bills?
Find the cumulative area in the table that is closest to 0.90. The
area 0.8997 corresponds to a z-score of 1.28.
To find the corresponding x-value, use
x = 100 + 1.28(12) = 115.36
$115.36 is the smallest value in the top 10%.
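The cut-off can be sketched two ways: the table route (z rounded to 1.28, then x = µ + zσ) and a direct reverse lookup with `statistics.NormalDist(100, 12).inv_cdf(0.90)`. The unrounded z (about 1.2816) moves the cut-off to about $115.38, two cents above the table answer.

```python
from statistics import NormalDist

bills = NormalDist(mu=100, sigma=12)

# Table method: z rounded to 1.28, then x = mu + z * sigma.
print(round(100 + 1.28 * 12, 2))      # 115.36
# Unrounded: inv_cdf on the bill distribution returns x directly.
print(round(bills.inv_cdf(0.90), 2))  # 115.38
```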
Section 5.4 The Central Limit Theorem
Sampling Distributions
A sampling distribution is the probability distribution of a
sample statistic that is formed when samples of size n are
repeatedly taken from a population. If the sample statistic is the
sample mean, then the distribution is the sampling distribution
of sample means.
[Figure: repeated samples of size n drawn from a population, each producing its own sample mean]
The sampling distribution consists of the values of these sample means.
The Central Limit Theorem
If samples of size n ≥ 30 are taken from a population with any type of distribution that has mean µ and standard deviation σ, then the sample means x̄ will have an approximately normal distribution with mean
$$\mu_{\bar{x}} = \mu$$
and standard deviation
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}},$$
called the standard error of the mean.
The Central Limit Theorem
If samples of any size n are taken from a normally distributed population with mean µ and standard deviation σ, then the sample means will be normally distributed with
$$\mu_{\bar{x}} = \mu \quad\text{and}\quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}.$$
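The theorem can be illustrated by simulation. This sketch makes assumptions that are not in the slides: the population is exponential with µ = σ = 1 (strongly skewed, far from normal), samples have size n = 30, and 5,000 samples are drawn with a fixed seed. The mean of the sample means should land close to µ, and their standard deviation close to σ/√n ≈ 0.183.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical skewed population: exponential with mean 1 and standard deviation 1.
mu, sigma, n, reps = 1.0, 1.0, 30, 5000

# Draw `reps` samples of size n and keep each sample's mean.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

print(f"mean of sample means: {statistics.fmean(sample_means):.3f} (mu = {mu})")
print(f"SD of sample means:   {statistics.pstdev(sample_means):.3f} "
      f"(sigma/sqrt(n) = {sigma / n ** 0.5:.3f})")
```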
Application
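The mean length of sockeye salmon is µ = 69.2 cm, with σ = 2.9 cm. Random samples of 60 fish are selected. Find the mean and standard deviation (standard error) of the sampling distribution.
Since n = 60 ≥ 30, the distribution of means of samples of size 60 will be approximately normal, with
$$\mu_{\bar{x}} = 69.2 \text{ cm}, \qquad \sigma_{\bar{x}} = \frac{2.9}{\sqrt{60}} \approx 0.374 \text{ cm}$$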
Interpreting the Central Limit Theorem
The mean length of sockeye salmon is µ = 69.2 cm. If a random sample of 60 fish is selected, what is the probability that the mean length of the sample is greater than 70 cm? Assume the standard deviation is 2.9 cm.
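Since n = 60 ≥ 30, the sampling distribution of x̄ will be approximately normal, with
$$\mu_{\bar{x}} = 69.2, \qquad \sigma_{\bar{x}} = \frac{2.9}{\sqrt{60}} \approx 0.374$$
Find the z-score for a sample mean of 70:
$$z = \frac{70 - 69.2}{0.374} \approx 2.14$$
P(z > 2.14) = 1 – 0.9838 = 0.0162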
There is a 0.0162 or 1.62% probability that a sample of 60
sockeye will have a mean length greater than 70 cm.
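What is the probability that a single fish is longer than 70 cm? For one fish, use σ itself rather than the standard error:
$$z = \frac{70 - 69.2}{2.9} \approx 0.28$$
P(z > 0.28) = 1 – 0.6103 = 0.3897 ≈ 39%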
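A sketch of both salmon calculations with `statistics.NormalDist`, assuming the stated values µ = 69.2 cm and σ = 2.9 cm. The only difference between the two questions is the standard deviation used: σ/√n for a sample mean, σ for a single fish. Without rounding z, the probabilities come out to about 0.016 and 0.391, matching the table answers (0.0162 and 0.3897) to within rounding.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 69.2, 2.9, 60
standard_error = sigma / sqrt(n)
print(f"standard error: {standard_error:.3f} cm")  # standard error: 0.374 cm

# Sampling distribution of the mean vs. the distribution of single fish.
sample_means = NormalDist(mu, standard_error)
single_fish = NormalDist(mu, sigma)

print(f"P(sample mean > 70) = {1 - sample_means.cdf(70):.4f}")
print(f"P(one fish > 70)    = {1 - single_fish.cdf(70):.4f}")
```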
Application Central Limit Theorem
A long time ago, the mean price of gasoline in California was $1.164
per gallon. What is the probability that the mean price for a sample of
38 gas stations in California is between $1.169 and $1.179? Assume
the standard deviation = $0.049.
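Since n = 38 ≥ 30, the sampling distribution of x̄ will be approximately normal, with
$$\mu_{\bar{x}} = 1.164, \qquad \sigma_{\bar{x}} = \frac{0.049}{\sqrt{38}} \approx 0.0079$$
Calculate the z-score for sample values of $1.169 and $1.179:
$$z = \frac{1.169 - 1.164}{0.0079} \approx 0.63, \qquad z = \frac{1.179 - 1.164}{0.0079} \approx 1.90$$
P(0.63 < z < 1.90) = 0.9713 – 0.7357 = 0.2356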
The probability is 0.2356 that the mean for the sample is
between $1.169 and $1.179.
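A sketch of the gasoline calculation with `statistics.NormalDist`, assuming the stated values (µ = 1.164, σ = 0.049, n = 38). Carrying the standard error at full precision (about 0.00795 rather than 0.0079) gives z ≈ 0.63 and z ≈ 1.89 and a probability of about 0.235; the z = 1.90 and 0.2356 above come from rounding the standard error first.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1.164, 0.049, 38
standard_error = sigma / sqrt(n)

# Unrounded z-scores for the two sample means.
z1 = (1.169 - mu) / standard_error
z2 = (1.179 - mu) / standard_error
print(f"SE = {standard_error:.5f}, z1 = {z1:.2f}, z2 = {z2:.2f}")

std_normal = NormalDist()
print(f"P(1.169 < mean < 1.179) = {std_normal.cdf(z2) - std_normal.cdf(z1):.4f}")
```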
Hint: drawing the distribution, values, and area of interest will help keep
calculations clear.
Section 5.5 Normal Approximation to Binomial Distributions
Binomial Distribution Characteristics
• There are a fixed number of independent trials, n.
• Each trial has 2 outcomes, Success or Failure.
• The probability of S on a single trial is p and the probability of F is q, the same on every trial, so p + q = 1.
• We can find the probability of exactly x successes out of n trials, where x = 0, 1, 2, …, n.
• x is a discrete random variable representing a count of
the number of S’s in n trials.
Application
34% of Americans have type A+ blood. If 500 Americans are
sampled at random, what is the probability at least 300
have type A+ blood?
Using the methods of Chapter 4, you could calculate the probability that exactly 300, exactly 301, …, exactly 500 Americans have type A+ blood and then add the probabilities (but this should drive you crazy).
Alternatively, use normal-curve probabilities to approximate binomial probabilities.
If np ≥ 5 and nq ≥ 5, then the binomial random variable x is approximately normally distributed with mean
$$\mu = np$$
and standard deviation
$$\sigma = \sqrt{npq}$$
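The conditions and parameters of the approximation can be checked in a few lines. A sketch, assuming the standard binomial formulas µ = np and σ = √(npq); the helper `normal_approx_params` is hypothetical and returns None when np < 5 or nq < 5. For the A+ blood example it gives µ = 170 and σ ≈ 10.59.

```python
from math import sqrt


def normal_approx_params(n: int, p: float):
    """Return (mu, sigma) for the normal approximation, or None if np or nq < 5."""
    q = 1.0 - p
    if n * p < 5 or n * q < 5:
        return None
    return n * p, sqrt(n * p * q)


print(normal_approx_params(500, 0.34))  # A+ blood example: mu = 170, sigma about 10.59
print(normal_approx_params(5, 0.25))    # None, because np = 1.25 < 5
```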
Why np ≥ 5 and nq ≥ 5?
[Figure: binomial histograms with p = 0.25 for n = 5 (np = 1.25, nq = 3.75), n = 20 (np = 5, nq = 15), and n = 50 (np = 12.5, nq = 37.5)]
As n increases, the histograms look more and more like a normal curve; with n = 5, np < 5 and the histogram is clearly skewed.
Binomial Probabilities
The binomial distribution is discrete with a probability histogram
graph. The probability that a specific value of x will occur is equal
to the area of the rectangle with midpoint at x.
Example: If n = 50 and p = 0.25, find P(14 ≤ x ≤ 16).
Add the areas of the rectangles with midpoints at x = 14, x = 15,
and x = 16: 0.111 + 0.089 + 0.065 = 0.265
[Figure: histogram rectangles at x = 14, 15, and 16 with areas 0.111, 0.089, and 0.065]
Correction for Continuity
Use the normal approximation to the binomial distribution to find P(14 ≤ x ≤ 16).
The values of the binomial random variable x are 14, 15, and 16.
Correction for Continuity
The interval of values under the normal curve is 13.5 < x < 16.5.
To ensure the boundaries of each rectangle are included in the interval, subtract 0.5 from the left-hand boundary and add 0.5 to the right-hand boundary.
Normal Approximation to the Binomial
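Use the normal approximation to the binomial to find P(14 ≤ x ≤ 16).
Find the mean and standard deviation using the binomial distribution formulas:
$$\mu = np = 50(0.25) = 12.5, \qquad \sigma = \sqrt{npq} = \sqrt{50(0.25)(0.75)} \approx 3.06$$
Adjust the endpoints to correct for continuity: P(13.5 < x < 16.5).
Convert each endpoint to a standard score:
$$z = \frac{13.5 - 12.5}{3.06} \approx 0.33, \qquad z = \frac{16.5 - 12.5}{3.06} \approx 1.31$$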
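To see how close the approximation is for n = 50 and p = 0.25, the sketch below computes the exact binomial probability P(14 ≤ x ≤ 16) by summing the three rectangle areas with `math.comb`, and the continuity-corrected normal approximation P(13.5 < x < 16.5) with µ = 12.5 and σ = √(50 · 0.25 · 0.75). The exact value is about 0.265 and the approximation about 0.276, a gap that typically shrinks as n grows. Variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 50, 0.25
q = 1 - p

# Exact binomial: add the rectangle areas at x = 14, 15, 16.
exact = sum(comb(n, x) * p**x * q**(n - x) for x in range(14, 17))

# Normal approximation with the correction for continuity: P(13.5 < x < 16.5).
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * q))
approx = approx_dist.cdf(16.5) - approx_dist.cdf(13.5)

print(f"exact  = {exact:.3f}")   # exact  = 0.265
print(f"approx = {approx:.3f}")  # approx = 0.276
```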
Application
A survey of Internet users found that 75% favored government
regulations of “junk” e-mail. If 200 Internet users are
randomly selected, find the probability that fewer than 140
are in favor of government regulation.
Since np = 150 ≥ 5 and nq = 50 ≥ 5, can use the normal
approximation to the binomial distribution.
In binomial terms, “fewer than 140” means 0, 1, 2, 3, …, 139.
Use the correction for continuity to translate this to the continuous variable in the interval x < 139.5. Find P(x < 139.5).
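Use the correction for continuity and find P(x < 139.5), with
$$\mu = np = 200(0.75) = 150, \qquad \sigma = \sqrt{npq} = \sqrt{200(0.75)(0.25)} = \sqrt{37.5} \approx 6.124$$
$$z = \frac{139.5 - 150}{6.124} \approx -1.71$$
P(z < –1.71) = 0.0436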
The probability that fewer than 140 are in favor of government
regulation is approximately 0.0436.
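A sketch of this final calculation without table rounding, using `statistics.NormalDist` and the stated values n = 200 and p = 0.75. Unrounded, z ≈ –1.715 and the probability is about 0.043, close to the table answer of 0.0436 obtained with z rounded to –1.71.

```python
from math import sqrt
from statistics import NormalDist

n, p = 200, 0.75
mu, sigma = n * p, sqrt(n * p * (1 - p))

# "Fewer than 140" -> x <= 139 -> continuity-corrected x < 139.5.
z = (139.5 - mu) / sigma
print(f"mu = {mu}, sigma = {sigma:.2f}, z = {z:.2f}")  # mu = 150.0, sigma = 6.12, z = -1.71
print(f"P(x < 139.5) = {NormalDist(mu, sigma).cdf(139.5):.4f}")
```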