Instruction: The Normal Distribution
This lecture discusses the normal distribution (also called the Gaussian distribution).
Previous lectures discussed samples and their distributions. In this section, we will consider a
particular population distribution called the normal curve. The normal curve is a bell-shaped
symmetric curve that is the graph of a theoretical relative frequency distribution of a continuous
variable associated with a large population. Recall that a continuous variable is one for which
infinitely many possible values fall between any two measured values. Such observations as
weights, lengths, and durations are continuous variables.
Consider for example the sample of weights rounded to the nearest tenth of a microgram
and the sample's relative frequency distribution below.
W = { 2.2, 4.9, 0.6, 2.4, 2.9, 3.1, 1.4, 2.6, 1.7, 2.3,
      1.8, 3.2, 2.1, 4.3, 2.6, 1.9, 3.3, 0.8, 3.9, 2.5,
      2.7, 1.8, 2.3, 3.4, 1.1, 2.7, 3.5, 1.4, 3.8, 2.1 }
[Histogram of W: relative frequency f on the vertical axis, scaled from 0.00 to 0.45]
This symmetrical distribution represents a sample of thirty weights measured in micrograms.
Imagine that the sample is a random sample of thirty weights taken from a type of krill
in the Atlantic Ocean; then imagine the distribution associated with the population of billions of
the same type of krill. The distribution above uses five classes, each with a width of one
microgram. For a population of billions, many more classes with much smaller widths would be
used, with the effect that the graph would be "smoothed" into a continuous bell-shaped curve
similar to Figure A.
Figure A
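The relative frequencies behind the histogram can be recomputed from the sample. The sketch below is a minimal illustration in Python. It assumes the five classes are the half-open intervals [0, 1), [1, 2), ..., [4, 5), which the lecture does not state explicitly, and the function name relative_frequencies is introduced here only for illustration.

```python
# Sample of thirty weights (micrograms) from the lecture.
W = [
    2.2, 4.9, 0.6, 2.4, 2.9, 3.1, 1.4, 2.6, 1.7, 2.3,
    1.8, 3.2, 2.1, 4.3, 2.6, 1.9, 3.3, 0.8, 3.9, 2.5,
    2.7, 1.8, 2.3, 3.4, 1.1, 2.7, 3.5, 1.4, 3.8, 2.1,
]


def relative_frequencies(data, lower=0, upper=5, width=1):
    """Return {(class_start, class_end): relative frequency}.

    Assumed class boundaries: [lower, lower + width), and so on up to upper.
    """
    counts = {}
    start = lower
    while start < upper:
        counts[(start, start + width)] = 0
        start += width
    # Place each observation in the class whose interval contains it.
    for x in data:
        for (a, b) in counts:
            if a <= x < b:
                counts[(a, b)] += 1
                break
    n = len(data)
    return {cls: c / n for cls, c in counts.items()}


freqs = relative_frequencies(W)
for (a, b), f in freqs.items():
    print(f"[{a}, {b}): {f:.3f}")
```

Under that assumption the class counts are 2, 7, 12, 7, and 2, so the relative frequencies are symmetric about the middle class, whose relative frequency is 0.40.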
Indeed, many such populations have relative frequency distributions whose graphs take this
shape, called the normal curve, whose properties are given below.
The graph of a normal curve is bell-shaped and symmetric about a
vertical line through the center of the distribution, which corresponds
to the mean, median, and mode. The normal curve approaches the
X-axis asymptotically at the far right and far left.
The functions that define or yield the distribution of a continuous random variable are called
continuous probability density functions. The normal probability density function is given
below.
The normal probability density function, f(X), is given by
$$f(X) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left[\frac{X-\mu}{\sigma}\right]^{2}}$$
where µ is the mean, σ is the standard deviation, and X is the
continuous random variable.
The independent variable of the normal probability density function is the continuous random
variable. The mean and the standard deviation are parameters such that every distinct
combination of µ and σ produces a different normal probability distribution. As Figure B
shows, normal distributions can vary while maintaining the characteristics that make them
normal.
Figure B
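To see how µ and σ change the curve without changing its character, the density can be evaluated directly. The following is a minimal Python sketch of the formula for f(X); the function name normal_pdf and the three parameter pairs are illustrative choices, not values from the lecture.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal probability density f(X) with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Illustrative parameter choices (assumptions, not from the lecture).
for mu, sigma in [(0, 1), (0, 2), (3, 0.5)]:
    peak = normal_pdf(mu, mu, sigma)
    print(f"mu={mu}, sigma={sigma}: height at the mean = {peak:.4f}")
```

A larger σ gives a lower, wider curve and a smaller σ gives a taller, narrower one, while each curve stays symmetric about its own mean.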
Since the normal curve represents a relative frequency distribution it represents in effect a
probability distribution because the relative frequency of any given interval of X-values equals
the probability that a given object selected at random from the population will have a raw score
that falls in the given interval (assuming that selecting each member of the population is equally
likely). Accordingly, the area under the curve along any interval of X-values corresponds to the
probability that the continuous variable will take a value in that interval. The total
area under the normal curve equals one square unit, representing the population as a whole
(100% of the data) and corresponding to the certain probability.
According to what is called the empirical rule, the areas under the normal curve that
correspond to the intervals within one, two, and three standard deviations of the mean equal
about 68%, 95%, and 99.7% of the total area, respectively. This means that approximately 68% of
the data of a normal population falls within one standard deviation of the mean, approximately
95% falls within two standard deviations, and approximately 99.7% of the data of a
normal population falls within three standard deviations. The empirical rule is displayed
graphically below in Figure C.
Figure C (horizontal axis marked in standard deviations from the mean, −3 to 3)
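The percentages in the empirical rule are rounded values of exact areas under the normal curve. As a rough check, the sketch below uses the identity that the area within k standard deviations of the mean equals erf(k/√2), where erf is the error function in Python's standard math module; the function name area_within is introduced here for illustration.

```python
import math


def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
```

The three areas come out to about 0.6827, 0.9545, and 0.9973, which round to the 68%, 95%, and 99.7% quoted in the rule.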
The empirical rule has implications that can help solve problems. Consider a population
of 500,000 whose distribution is understood to be normal. Suppose the population mean is 12 and
the standard deviation equals 4. How many members of the population have raw scores between 8
and 16? Note that 8 and 16 each lie exactly one standard deviation (that is, one four-unit
interval) from the mean. According to the empirical rule, about 68% of the data fall within one
standard deviation of the mean. Since 0.68 × 500,000 = 340,000, there are an estimated 340,000
members with raw scores between 8 and 16.
The symmetry of the normal curve also has implications that can help solve problems.
Consider the same population above. How many members of the population have raw scores
between 12 and 16? We know that about 340,000 members fall between 8 and 16. Note that the mean
equals the midpoint of 8 and 16. Since the distribution is symmetrical, half of the 340,000
members fall between 8 and the mean while the other half fall between the mean and 16. Thus,
about 170,000 members have a score between 12 and 16.
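The two estimates above amount to a few lines of arithmetic. The sketch below repeats them in Python using the lecture's figures; the variable names are illustrative, and 0.68 is the empirical-rule approximation rather than the exact area.

```python
population = 500_000
mu, sigma = 12, 4

# One standard deviation either side of the mean.
lower, upper = mu - sigma, mu + sigma  # 8 and 16

share_within_one_sd = 0.68  # empirical rule approximation

# Estimated members with raw scores between 8 and 16.
between_lower_and_upper = share_within_one_sd * population

# By symmetry, half of those lie between the mean and 16.
between_mean_and_upper = between_lower_and_upper / 2

print(lower, upper)
print(round(between_lower_and_upper))
print(round(between_mean_and_upper))
```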
The previous paragraphs discussed problems that could be addressed by the empirical
rule, which applies to regions under the normal curve bounded by 1, 2, or 3 standard deviations
from the mean, that is, by whole-number Z-scores. If we transform X-values to Z-scores using
Z = (X − µ)/σ, then it is possible to standardize the normal probability density function by
letting µ = 0 and σ = 1.
The standardized normal probability density function, f(Z), is
given by
$$f(Z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}Z^{2}}$$
Most textbooks provide tables like Table E.2 in our text (Business Statistics, Levine et
al.). These tables help answer questions involving fractional units of standard deviations. Table
E.2 shows the area under the curve for intervals that extend from negative infinity to a particular
Z-score.
The full table is in the textbook, but the segment of the table below will help answer the
following four questions regarding a population with a mean of 476 and standard deviation of 20.
1) What percent of the population has a score below 506?
2) What percent of the population has a score greater than 506?
3) What percent of the population has a score between the mean and 506?
4) What percent of the population has a score between 468 and 506?
To use the table, convert the raw scores to Z-scores as shown below.
$$Z_{506} = \frac{506 - 476}{20} = 1.50$$
$$Z_{468} = \frac{468 - 476}{20} = -0.40$$
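The conversion from a raw score to a Z-score is the same calculation each time, Z = (X − µ)/σ. A minimal Python sketch follows, with the function name z_score introduced here for illustration and the mean and standard deviation taken from the four questions.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that raw score x lies from the mean mu."""
    return (x - mu) / sigma


mu, sigma = 476, 20
print(z_score(506, mu, sigma))  # 1.5
print(z_score(468, mu, sigma))  # -0.4
```

A positive Z-score marks a raw score above the mean and a negative Z-score marks one below it.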
Use the Z-score to consult the table as below.
Z      A        Z      A        Z      A        Z      A        Z      A        Z      A
0.37   0.6443   0.93   0.8238   1.49   0.9319   2.05   0.9798   2.61   0.9955   3.17   0.99924
0.38   0.6480   0.94   0.8264   1.50   0.9332   2.06   0.9803   2.62   0.9956   3.18   0.99926
0.39   0.6517   0.95   0.8289   1.51   0.9345   2.07   0.9808   2.63   0.9957   3.19   0.99929
0.40   0.6554   0.96   0.8315   1.52   0.9357   2.08   0.9812   2.64   0.9959   3.20   0.99931
0.41   0.6591   0.97   0.8340   1.53   0.9370   2.09   0.9817   2.65   0.9960   3.21   0.99934
0.42   0.6628   0.98   0.8365   1.54   0.9382   2.10   0.9821   2.66   0.9961   3.22   0.99936
0.43   0.6664   0.99   0.8389   1.55   0.9394   2.11   0.9826   2.67   0.9962   3.23   0.99938
The table contains the answer to question one: 93.32% of the area under the curve falls below
the Z-score of 1.5. Equivalently, 93.32% of the population has a raw score below 506.
For question two, we use the complement principle. If 93.32% of the data falls below the
raw score of 506, then the rest of the data must fall above 506. Subtracting the answer to
question one from the entirety finds the answer to question two: 100% − 93.32% = 6.68% .
For question three, we will find a difference. According to the table, 93.32% of the area
falls below 1.5. Also, 50% of the area falls below the mean. Thus, 43.32% (93.32 – 50 = 43.32)
of the area falls between the mean and the Z-score 1.5. Equivalently, 43.32% of the population
has a raw score between the mean and 506.
Table E.2 in the textbook shows areas associated with negative Z-scores, but our partial
table above does not. Nevertheless, we can answer question four using the symmetry
of the normal curve. We need the area between the raw score 468, which has a Z-score of –0.4,
and the mean. We find the area below a positive 0.4, which is 65.54%. Subtracting the area below
the mean gives the area between the mean and 0.4: 65.54% − 50.0% = 15.54%. Because the
normal curve is symmetrical, 15.54% of the data also falls between –0.4 and the
mean. Since we know from question three that 43.32% of the data falls between the mean and
1.5, we know that 58.86% of the data falls between the Z-scores –0.4 and 1.5 (because 15.54 plus
43.32 equals 58.86). Equivalently, 58.86% of the data falls between the raw scores 468 and 506.
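The four answers can be cross-checked without a printed table. The sketch below is one way to do it in Python, assuming the standard identity that the area below a Z-score equals 0.5 × (1 + erf(Z/√2)); the function name normal_cdf and the variable names are illustrative and do not come from the textbook.

```python
import math


def normal_cdf(z):
    """Area under the standard normal curve from negative infinity to z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


mu, sigma = 476, 20
z_506 = (506 - mu) / sigma  # 1.5
z_468 = (468 - mu) / sigma  # -0.4

below_506 = normal_cdf(z_506)                        # question 1
above_506 = 1 - below_506                            # question 2 (complement)
mean_to_506 = below_506 - 0.5                        # question 3 (difference)
between_468_and_506 = below_506 - normal_cdf(z_468)  # question 4

answers = [
    ("below 506", below_506),
    ("above 506", above_506),
    ("between the mean and 506", mean_to_506),
    ("between 468 and 506", between_468_and_506),
]
for label, p in answers:
    print(f"{label}: {100 * p:.2f}%")
```

Because the function accepts negative Z-scores directly, question four needs no symmetry argument here; the result agrees with the table-based answer to two decimal places.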
Assignment 5
Problems
#1
Use the empirical rule for the normal distribution to answer the following two questions.
A) In a survey conducted by the National Center for Health Statistics, the sample mean
height of women in the United States (ages 20-29) was 64 inches, with a sample standard
deviation of 2.75 inches. If the sample is normally distributed, about what percent of the
women have heights between 64 inches and 69.5 inches?
B) The mean mileage (in thousands) for a rental car company's fleet is twelve and the standard
deviation (in thousands) is approximately 3.2. Between what two values do 99.7% of the data
lie? (Assume normality.)
#2
Assume that annual consumption of peanuts is normally distributed with a mean of 5.9
pounds per person and a standard deviation of 1.8 pounds per person.
A) What percent of people annually consume less than 3.1 pounds of peanuts?
B) What percent of people annually consume more than 3.1 pounds of peanuts?
#3
The weights of adult male rhesus monkeys are normally distributed, with a mean of 15 pounds
and a standard deviation of 3 pounds. A rhesus monkey is randomly selected and weighed.
A) Assume that a rhesus monkey is randomly selected and weighed. Find the probability
that the monkey's weight is less than thirteen pounds.
B) Assume that a rhesus monkey is randomly selected and weighed. Find the probability
that the monkey's weight is more than seventeen pounds.
#4
According to the National Marine Fisheries Service, the lengths of Atlantic croaker fish are
normally distributed with a mean of ten inches and a standard deviation of two inches.
A) Assume that an Atlantic croaker fish is selected at random. Find the probability that the
fish is less than seven inches in length.
B) Assume that an Atlantic croaker fish is selected at random. Find the probability that the
fish is between seven inches and fifteen inches in length.
#5
Assume that a normally distributed population of weights of a certain species of fish has a
mean of forty pounds and a standard deviation of eight pounds. A certain fishing trawler
throws back any netted fish (of the species in question) that do not weigh at least sixty-four
pounds. Find the probability that at least one acceptable fish will be caught per netting of 100
fish (of the species in question).