Chapter 2 The Normal Distribution
Up to this point we have been developing a strategy for exploring data on a single quantitative variable.
To review:

Start with a graph (e.g., dot plot, stemplot, or histogram)

Look for an overall shape or pattern; then look for deviations from this pattern

Last, but not least, choose a numerical summary to describe center and spread
Here comes a fast ball
We will now add that sometimes the overall pattern of a LARGE number of observations is so regular
that we can describe it by a smooth curve
Below is a histogram of the vocabulary scores of all seventh grade students in Gary, Indiana
The histogram is approximately symmetric, and both tails fall off smoothly from the single peak. There are no
large gaps or obvious outliers. Note that the smooth curve drawn provides a reasonable description of
the overall pattern of the data. Now let's use it!!
The shaded area represents the proportion of scores that are less than or equal to 6. The actual proportion,
computed from the histogram, is 0.303.
Using the smooth curve the proportion of scores less than or equal to 6 is calculated to be 0.293. Close
enough???
Here comes a curve ball!!!
In most cases the curve is easier to work with because the histogram depends on your choice of classes,
while the curve does not if we do the following:

Use the smooth curve to describe what proportion of the observations fall within each RANGE
of values; NOT the counts of observations (relative frequency not frequency)

Adjust the dimensions of the curve so that the area under the curve represents the proportion of
the observations

Further adjust the scale of the graph so that the total area under the curve is exactly 1
(representing 100% of the data)
The resulting curve is a DENSITY CURVE. The area under the curve and above any range of values on the
horizontal axis is equal to the proportion of observations falling in that range.
The shaded area under the density curve is the proportion of observations taking values between 7 and 8
Note that no real set of data is exactly described by a density curve. The curve is an approximation that
is easy to use and accurate enough for our use.
The median and mean of a symmetric density curve
The median and mean of a right-skewed density curve
EXAMPLE
Sketch a density curve that is symmetric but not bell-shaped.
The figure is a density curve of a UNIFORM DISTRIBUTION. Using this curve, answer the following:
a) What is the total area under the curve?
b) What percent of the observations lie above 0.8?
c) What percent of the observations lie between 0.25 and 0.75?
d) What percent of the observations lie between 0.8 and 1.75?
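Because a uniform density curve is a rectangle, each answer is just a width times a height. The sketch below assumes the missing figure shows the uniform density on the interval from 0 to 1 (height 1); that interval and the helper name `uniform_proportion` are assumptions made for illustration and are not read off the figure.

```python
def uniform_proportion(lo, hi, a=0.0, b=1.0):
    """Proportion of a uniform distribution on [a, b] that lies in [lo, hi].

    The default interval [0, 1] is an assumption about the figure.
    """
    height = 1.0 / (b - a)                     # constant height, so total area is 1
    width = max(min(hi, b) - max(lo, a), 0.0)  # overlap of [lo, hi] with [a, b]
    return width * height

total_area = uniform_proportion(0.0, 1.0)         # a) the whole curve
above_08 = uniform_proportion(0.8, float("inf"))  # b) above 0.8
middle = uniform_proportion(0.25, 0.75)           # c) between 0.25 and 0.75
from_08_to_175 = uniform_proportion(0.8, 1.75)    # d) no area lies beyond 1

for label, p in [("a", total_area), ("b", above_08),
                 ("c", middle), ("d", from_08_to_175)]:
    print(f"{label}) {p:.2f}")
```

Under that assumption, parts b) and d) give the same answer, because the curve has no area to the right of 1.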
For the density curve above find the proportion of observations within the interval:
a) 0.6 ≤ X ≤ 0.8
b) 0 ≤ X ≤ 0.4
Density curves that are symmetric, single-peaked, and bell-shaped are called NORMAL CURVES and
they describe NORMAL DISTRIBUTIONS. All normal distributions have the same overall shape. The
exact density curve for a particular normal distribution can be described by giving the mean (µ) and
standard deviation (σ).
Two normal curves showing the mean and standard deviation
The 68-95-99.7 Rule (of thumb)
In the normal distribution with mean (µ) and standard deviation (σ), approximately:
 68% of the observations fall within σ of the mean
 95% of the observations fall within 2σ of the mean
 99.7% of the observations fall within 3σ of the mean
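The three percentages in the rule are rounded areas under the standard normal curve. A minimal check in Python, using `NormalDist` from the standard library's `statistics` module (Python 3.8 or later); the helper name `proportion_within` is an illustrative choice:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def proportion_within(k):
    """Area under the standard normal curve between -k and +k."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {proportion_within(k):.4f}")
```

The printed values are 0.6827, 0.9545, and 0.9973, which is why the rule is called a rule of thumb: 68, 95, and 99.7 are convenient roundings.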
The distribution of heights of young women aged 18 to 24 is approximately normal with mean 64.5 in and
standard deviation 2.5 in. What percentage of young women have heights:
a) less than 64.5 in
b) less than 69.5 in
c) greater than 62 in
d) greater than 72 in
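Each cutoff is a whole number of standard deviations from the mean (69.5 is two above, 62 is one below, 72 is three above), so the rule of thumb gives 50%, 97.5%, 84%, and 0.15%. The sketch below compares those with the areas computed directly from the normal curve; the variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=64.5, sigma=2.5)  # heights in inches

below_64_5 = heights.cdf(64.5)      # a) less than the mean
below_69_5 = heights.cdf(69.5)      # b) less than mean + 2 standard deviations
above_62 = 1 - heights.cdf(62.0)    # c) greater than mean - 1 standard deviation
above_72 = 1 - heights.cdf(72.0)    # d) greater than mean + 3 standard deviations

for label, p in [("a", below_64_5), ("b", below_69_5),
                 ("c", above_62), ("d", above_72)]:
    print(f"{label}) {100 * p:.2f}%")
```

The computed values differ slightly from the rule-of-thumb answers (for example, about 97.7% rather than 97.5% for part b) because the rule rounds the underlying areas.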
Normal distributions are so common in the real world that a shorthand notation has been developed to
describe them. The normal distribution with mean µ and standard deviation σ is referred to as N(µ,σ).
Normal distributions are important in statistics because:

The distribution of many real world data sets can be described by the normal distribution (e.g.,
SAT scores)

Normal distributions are good approximations of many kinds of chance outcomes (e.g., the number of
heads in many tosses of a coin)

Many statistical procedures based on normal distributions work well for other roughly
symmetric distributions.
The Army reports that the distribution of head circumference among male soldiers is approximately
N(22.8,1.1), measured in inches.
a) What percent of soldiers have a head circumference greater than 23.9 in?
b) A head circumference of 23.9 in would be what percentile?
c) What percent of soldiers have head circumferences between 21.7 and 23.9 in?
The length of human pregnancies from conception to birth varies according to a distribution that is
approximately N(266,16), measured in days.
a) Between what values do the lengths of the middle 95% of all pregnancies fall?
b) How short are the shortest 2.5% of all pregnancies?
c) How long are the longest 2.5% of all pregnancies?
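By the rule of thumb, the middle 95% lies within two standard deviations of the mean, so the remaining 5% is split into 2.5% in each tail. The sketch below computes the rule-of-thumb cutoffs and, for comparison, the cutoffs from the inverse of the normal cumulative distribution; the variable names are illustrative.

```python
from statistics import NormalDist

pregnancy = NormalDist(mu=266, sigma=16)  # length in days

# Rule of thumb: mean plus or minus 2 standard deviations.
rule_low = pregnancy.mean - 2 * pregnancy.stdev
rule_high = pregnancy.mean + 2 * pregnancy.stdev

# Direct calculation: the 2.5th and 97.5th percentiles.
exact_low = pregnancy.inv_cdf(0.025)
exact_high = pregnancy.inv_cdf(0.975)

print(f"rule of thumb: {rule_low:.0f} to {rule_high:.0f} days")
print(f"percentiles:   {exact_low:.1f} to {exact_high:.1f} days")
```

The rule of thumb gives 234 to 298 days, so the shortest 2.5% of pregnancies last less than about 234 days and the longest 2.5% last more than about 298 days. The percentile calculation is slightly narrower because the exact multiplier is about 1.96 rather than 2.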
THE STANDARD NORMAL DISTRIBUTION
Normal distributions have similar shapes. In fact, all normal distributions are exactly the same if we
measure the data in units of σ about the mean µ.
STANDARDIZING AND Z-SCORES
If X is an observation from a distribution that has mean µ and standard deviation σ, the standardized
value (sometimes called the z-value) of X is the difference between the value and the mean divided by the
standard deviation
Z = (X-µ)/σ
A standardized observation tells us how many standard deviations the original observation falls away
from the mean and in which direction. Observations larger than the mean have positive z-values and
observations smaller than the mean have negative z-values. Observations equal to the mean give a z-value of zero.
The standard normal distribution is the normal distribution N(0,1) with mean of 0 and standard deviation
of 1. If a variable X has ANY normal distribution N(µ,σ), then the standardized variable Z = (X-µ)/σ has the
standard normal distribution.
Table A, inside the front cover, gives areas under the standard normal curve. The table entry for each
z-value is the area under the curve to the left of z.
What proportion of young women are less than 68 inches tall?
The distribution of heights of young women aged 18 to 24 was approximately N(64.5,2.5). The
standardized height becomes
Z = (height – 64.5)/2.5
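To finish this example, standardize 68 inches and look up the area to its left. The sketch below does the lookup with Python's `statistics.NormalDist` in place of Table A; the variable names are illustrative.

```python
from statistics import NormalDist

mean_height, sd_height = 64.5, 2.5       # inches, from N(64.5, 2.5)

z = (68 - mean_height) / sd_height       # standardized height
proportion_below = NormalDist().cdf(z)   # area to the left of z under N(0, 1)

print(f"z = {z:.2f}, proportion below 68 in = {proportion_below:.4f}")
```

Here z is 1.40 and the area to its left is 0.9192, so about 92% of young women are less than 68 inches tall.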
The level of cholesterol in the blood is important because high cholesterol levels may increase the risk of
heart disease. The distribution of blood cholesterol levels in a large population of people of the same age
and sex is roughly normal. For 14-year-old boys the mean is 170 milligrams of cholesterol per deciliter of
blood (mg/dL) and the standard deviation is 30 mg/dL. Levels above 240 mg/dL may require medical attention.
What percent of 14-year-old boys have more than 240 mg/dL of cholesterol?
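X > 240
(X-170)/30 > (240-170)/30
Z > 2.33
From Table A, the area below 2.33 is 0.9901, so the area above 2.33 is 1 – 0.9901 = 0.0099.
Therefore, about 1% of 14-year-old boys have more than 240 mg/dL of cholesterol.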
What percent of 14-year-old boys have blood cholesterol between 170 and 240 mg/dL?
170 ≤ X ≤ 240
(170-170)/30 ≤ (X-170)/30 ≤ (240-170)/30
0 ≤ Z ≤ 2.33
Using Table A, the area between 0 and 2.33 is the area below 2.33 minus the area below 0.
Area between 0 and 2.33
= area below 2.33 – area below 0.00
= 0.9901 – 0.5000
= 0.4901
Therefore, about 49% of 14-year-old boys have cholesterol levels between 170 and 240 mg/dL.
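Both cholesterol questions can be checked without the table. The sketch below rounds z to two decimals, as a Table A lookup would, and then takes differences of areas to the left; the variable names are illustrative.

```python
from statistics import NormalDist

mean_chol, sd_chol = 170, 30             # mg/dL, 14-year-old boys
standard_normal = NormalDist()           # N(0, 1)

# Round z to two decimals to mirror a Table A lookup.
z_240 = round((240 - mean_chol) / sd_chol, 2)
z_170 = round((170 - mean_chol) / sd_chol, 2)

above_240 = 1 - standard_normal.cdf(z_240)
between_170_and_240 = standard_normal.cdf(z_240) - standard_normal.cdf(z_170)

print(f"above 240 mg/dL:           {above_240:.4f}")
print(f"between 170 and 240 mg/dL: {between_170_and_240:.4f}")
```

The areas are 0.0099 and 0.4901, matching the table-based calculation: about 1% of the boys are above 240 mg/dL and about 49% are between 170 and 240 mg/dL.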
What if the z-value we are interested in falls outside the range covered by the table? For example, what if
we are interested in the area to the left of Z = -4? The table only goes out to Z = -3.4. The desired area is
less than the entry for Z = -3.4, which is 0.0003. There is very little area outside the range covered by
the table. Therefore, you can usually take this area to be zero with little loss in accuracy.
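To see how little area lies beyond the edge of the table, the two tail areas can be computed directly. A minimal sketch using Python's `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()                # N(0, 1)

area_at_table_edge = standard_normal.cdf(-3.4)  # last entry covered by the table
area_beyond_table = standard_normal.cdf(-4.0)   # the area we actually want

print(f"area left of -3.4: {area_at_table_edge:.6f}")
print(f"area left of -4.0: {area_beyond_table:.6f}")
```

The area to the left of -4 is roughly 0.00003, about a tenth of the table's last entry, so treating it as zero changes an answer only in the fifth decimal place.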
What if we want to find the observed value with a given proportion of observations above it or below it?
To do this one must find the desired area in Table A and work backward to find the corresponding
observed value.
Example: Scores on the SAT verbal test in recent years follow approximately the N(505,110)
distribution. How high must a student score in order to place in the top 10% of all students taking the
SAT?
Go to Table A and find the value of Z such that 90% of the area falls to the left.
The closest entry to 0.9 is 0.8997. This entry corresponds to a Z-value of 1.28
Therefore,
Z = 1.28
(X-505)/110 = 1.28
X = 1.28(110) + 505
X = 645.8
Consequently, a student must score at least 646 to place in the highest 10%
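Working backward from an area to an observed value is an inverse lookup. The sketch below repeats the SAT calculation twice: once with the table value Z = 1.28, and once with `inv_cdf` from Python's `statistics.NormalDist`, which returns the z-value without rounding. The variable names are illustrative.

```python
from statistics import NormalDist

sat_mean, sat_sd = 505, 110

# Table-based: closest Table A entry to 0.90 is 0.8997, at Z = 1.28.
table_cutoff = 505 + 1.28 * sat_sd

# Direct: the z-value with exactly 90% of the area to its left.
z_exact = NormalDist().inv_cdf(0.90)
exact_cutoff = sat_mean + z_exact * sat_sd

print(f"table-based cutoff: {table_cutoff:.1f}")
print(f"direct cutoff:      {exact_cutoff:.1f} (z = {z_exact:.4f})")
```

The two cutoffs differ by less than a point, and both round up to the same minimum score of 646.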
Example: Use Table A to find the area under the standard normal curve for parts a) through d), and the value of Z for part e).
a) Z < 2.85
b) Z > 2.85
c) Z ≥ 2.85
d) Z = 2.85
e) The point Z with 25% of the observations falling below it
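These five parts can be checked as follows. The sketch uses Python's `statistics.NormalDist` in place of Table A, and the variable names are illustrative. Note that parts b) and c) have the same answer, because a single point has no area under a density curve, which is also the answer to part d).

```python
from statistics import NormalDist

standard_normal = NormalDist()                 # N(0, 1)

area_below = standard_normal.cdf(2.85)         # a) Z < 2.85
area_above = 1 - area_below                    # b) Z > 2.85 and c) Z >= 2.85
area_at_point = 0.0                            # d) Z = 2.85: a line has zero area
first_quartile_z = standard_normal.inv_cdf(0.25)  # e) 25% of the area lies below

print(f"a) {area_below:.4f}")
print(f"b), c) {area_above:.4f}")
print(f"d) {area_at_point:.4f}")
print(f"e) z = {first_quartile_z:.2f}")
```

The results are 0.9978 for a), 0.0022 for b) and c), 0 for d), and z = -0.67 for e).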
Is It Normal????
Many of the statistical calculations we will use in later chapters assume the distribution of the data is
normal. Therefore you will need to be able to verify normality in order to properly use these statistical
techniques.
1. Plot the data as a histogram, stemplot, or dotplot. Look for a symmetric, bell-shaped pattern. Then
check whether the 68-95-99.7 rule approximately holds.
2. Construct a normal probability plot. Enter the data into a list, then select StatPlot Icon 6 with that
list as the data and X as the axis. If the plot is linear (or roughly so), the data are approximately normal.
Note that the program calculates the percentile for each data point and its corresponding Z-value. It then
plots the Z-value on the Y-axis and the corresponding X-value (original observed value) on the X-axis.
Example: Do the following data on the number of hysterectomies performed per year by male doctors in
Switzerland approximately follow a normal distribution?
27 50 33 25 86 25 85 31 37 44 20 36 59 34 28
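Both checks described above can be sketched in a few lines of Python for these 15 values. Two things here are assumptions rather than a description of what the calculator does internally: the percentile assigned to the i-th smallest of n values is taken to be (i - 0.5)/n, and a correlation coefficient is used as a rough numerical stand-in for judging by eye whether the plot is linear. The names `share_within` and `pearson_r` are illustrative.

```python
from statistics import NormalDist, mean, stdev

counts = [27, 50, 33, 25, 86, 25, 85, 31, 37, 44, 20, 36, 59, 34, 28]

# Check 1: compare the data with the 68-95-99.7 rule.
center, spread = mean(counts), stdev(counts)

def share_within(k):
    """Fraction of observations within k standard deviations of the mean."""
    inside = [x for x in counts if abs(x - center) <= k * spread]
    return len(inside) / len(counts)

# Check 2: points for a normal probability plot.
# Assumption: the i-th smallest of n values gets percentile (i - 0.5) / n.
ordered = sorted(counts)
n = len(ordered)
z_values = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
plot_points = list(zip(ordered, z_values))  # (x, z): x on the X-axis, z on the Y-axis

def pearson_r(xs, ys):
    """Correlation between xs and ys; closer to 1 means closer to a straight line."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

linearity = pearson_r(ordered, z_values)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * share_within(k):.0f}%")
print(f"correlation of plot points: {linearity:.3f}")
```

For these data, 12 of the 15 observations (80%) fall within one standard deviation of the mean, and the two largest values (85 and 86) fall more than two standard deviations above it. That departure from the 68-95-99.7 pattern suggests a right-skewed distribution, which would show up in the normal probability plot as points bending away from a straight line at the upper end.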