Chapter 6: The Normal Distribution

WHY DO WE NEED TO KNOW MODELS?
Scenario I:
A patient visits his doctor complaining of a number of symptoms. The doctor
suspects the patient is suffering from some disease. The doctor performs a diagnostic
test to check for this disease. High responses on the test support that the patient may
have the disease. The patient’s test response is 200. What does this say?
The doctor has a frame of reference—that is, a model for the responses of the
diagnostic test for “healthy subjects”— as shown in the accompanying figure:
[Figure: Model for Healthy Subjects; horizontal axis: Test response, marked from 80 to 200]
Based on this model, it is very unlikely that a test response of 200, or greater, would
have occurred if the subject were actually healthy. Thus, either the patient has this
disease or a very unlikely event has occurred.
Scenario II:
Suppose we wish to compare two drugs, Drug A and Drug B, for relieving arthritis pain.
Subjects suitable for the study are randomized to one of the two drug groups and are
given instructions for dosage and how to measure their “time to relief.” Results of the
study are summarized by presenting the models for the time to relief for the two drugs.
Which drug is better overall?
Consider any point in time, say time = T as indicated on the above axis. A higher
proportion of subjects treated with Drug A have felt “relief” by this time point as
compared to those treated with Drug B. If the study design was sound, and the models
based on the study results adequately portray the models for the populations, then we
might conclude that Drug A appears to be better than Drug B in terms of having a
quicker time to relief. We may wish to assess if the difference between these two drugs
is statistically significant by conducting a more formal statistical test.
MODELING CONTINUOUS VARIABLES
[Histogram 6.1: Proportion (vertical axis, 0.02 to 0.08) by Age (horizontal axis, 30 to 55)]
If we draw a curve through the tops of the bars in Histogram 6.1 and require the
smoothed curve to have total area under it equal to 1, we would have what is called a
density function, also called a density curve.
The key idea when working with
density functions is that area under the curve, above an interval, corresponds to the
proportion of units with values in the interval.
NOTATION
Since we will be discussing models for populations, the mean and standard deviation for
a density curve or model will be represented by $\mu$ (mu) and $\sigma$ (sigma), respectively.
DEFINITION:
A density function is a (nonnegative) function or curve that describes the overall shape
of a distribution. The total area under the entire curve is equal to 1, and proportions are
measured as areas under the density function.
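To make "area equals proportion" concrete, here is a minimal Python sketch. The triangular density f(x) = x/2 on the interval from 0 to 2 is a made-up example (it is not one of the curves in this chapter), and the helper name `area_under` is hypothetical.

```python
def density(x):
    """Hypothetical density: f(x) = x/2 on [0, 2], and 0 elsewhere."""
    return x / 2 if 0 <= x <= 2 else 0.0


def area_under(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


# The total area under a density must be 1.
total_area = area_under(density, 0, 2)

# The area above the interval [1, 2] is the proportion of units with values there.
prop_between_1_and_2 = area_under(density, 1, 2)

print(round(total_area, 4), round(prop_between_1_and_2, 4))
```

For this assumed density, three quarters of the units have values between 1 and 2, because the curve is higher on that side.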
Let's Do It! 1
Lifetime Density Function
Let the variable X represent the length of life, in years, for an electrical
component. The following figure is the density curve for the distribution of X.
(a) What proportion of electrical components lasts longer than 6 years?
(b) What proportion of electrical components lasts longer than 1 year?
(c) Describe the shape of the distribution.
Normal Distributions
[Figure: A normal distribution, labeled with $\mu$, $\sigma$, and the point of inflection]
Three members of the family of normal distributions
Distribution #1: Normal with a mean of 50 and a standard deviation of 10
Distribution #2: Normal with a mean of 80 and a standard deviation of 10
Distribution #3: Normal with a mean of 80 and a standard deviation of 5
[Figure: the three density curves on a common axis marked from 20 to 100]
General Notation
X is $N(\mu, \sigma)$ means that the variable or characteristic X is
normally distributed with mean $\mu$ and standard deviation $\sigma$.
Example 1 IQ Scores
Problem
Let the variable X represent IQ scores of 12-year-olds. Suppose that the distribution of
X is normal with a mean of 100 and a standard deviation of 16—that is, X is
N(100, 16). Jessica is a 12-year-old and has an IQ score of 132. We would like to
determine the proportion of 12-year-olds that have IQ scores less than Jessica’s score
of 132.
Since the area under the density curve corresponds to proportion, we want to find the
area to the left of 132 under an N(100, 16) curve. Sketch this curve and show the
corresponding area that represents this proportion.
[Figure: IQ scores have a normal distribution with mean 100 and standard deviation 16; the IQ Score axis is marked at 68, 84, 100, 116, 132; area to the left of 132 = ?]
How to Calculate Areas under a Normal Distribution
DEFINITION:
If X is $N(\mu, \sigma)$, the standardized normal variable
$Z = \frac{X - \mu}{\sigma}$ is $N(0, 1)$.
DEFINITION:
The z-score or standard score for an observed value tells us how many standard
deviations the observed value is from the mean – that is, it tells us how far the observed
value is from the mean in standard-deviation units. It is computed as follows:
$Z = \frac{X - \mu}{\sigma}$ = number of standard deviations that X differs from the mean

If Z > 0, then the value of X is above (greater than) its mean.
If Z < 0, then the value of X is below (less than) its mean.
If Z = 0, then the value of X is equal to its mean.
Example 2 Standard IQ Score
Problem
Recall the distribution of IQ scores for 12-year-olds—normally distributed with a mean of
100 and a standard deviation of 16.
(a) Jessica had a score of 132. Compute Jessica’s standardized score.
(b) Suppose Jessica has an older brother, Mike, who is 20 years old and has an IQ
score of 144. It wouldn’t make sense to directly compare Mike’s score of 144 to
Jessica’s score of 132. The two scores come from different distributions due to
the age difference. Assume that the distribution of IQ scores for 20-year-olds is
normal with a mean of 120 and a standard deviation of 20. Compute Mike’s
standardized score.
(c) Relative to their respective age group, who had the higher IQ score: Jessica or
Mike?
Solution
Jessica's standard score = $\frac{132 - 100}{16} = 2$.
Mike's standard score = $\frac{144 - 120}{20} = 1.2$.
Thus, relative to their respective age groups, Jessica has a higher IQ score than Mike.
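The same comparison can be sketched in a few lines of Python. The function name `z_score` is a hypothetical helper, not something from the text:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


jessica = z_score(132, mean=100, sd=16)   # 12-year-olds: N(100, 16)
mike = z_score(144, mean=120, sd=20)      # 20-year-olds: N(120, 20)

# The larger z-score is the higher score relative to its own age group.
higher = "Jessica" if jessica > mike else "Mike"
print(jessica, mike, higher)
```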
Example 3
Finding Proportions for the Standard Normal Distribution
Problem
Finding proportions under a normal distribution involves standardization and then
finding the corresponding proportion (area) under the standard normal distribution. Let’s
first work on finding areas under a standard normal N(0, 1) distribution.
(a) Find the area under the standard normal distribution to the left of z = 1.22.
Sketch a picture of the corresponding area and use either Table 3 (page 825)
or your TI-84 to find the area.
Solution
Using TI:
(b) Find the area under the standard normal distribution to the right of z = 1.22.
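For readers without the table or calculator at hand, both areas can be approximated in Python with `statistics.NormalDist`. This is only a stand-in for the table lookup, not the method the text prescribes:

```python
from statistics import NormalDist

Z = NormalDist()            # defaults to mean 0 and standard deviation 1

left = Z.cdf(1.22)          # (a) area to the left of z = 1.22
right = 1 - left            # (b) total area is 1, so the right tail is the remainder

print(round(left, 4), round(right, 4))
```

The left area comes out near 0.89 and the right area near 0.11; the two always sum to 1.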
Let's Do It! 2 6.2 More Standard Normal Areas
(a) Find the area under the standard normal distribution between z = 0 and z = 1.22.
Sketch the area and use Table E or your calculator to find the area.
[Blank Z axis centered at 0 for the sketch]
(b) Find the area under the standard normal distribution to the left of z = -2.55.
Sketch the area and use Table E or your calculator to find the area.
[Blank Z axis centered at 0 for the sketch]
(c) Find the area under the standard normal distribution between z = -1.22 and z = 1.22.
Sketch the area and use Table E or your calculator to find the area.
[Blank Z axis centered at 0 for the sketch]
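All three kinds of area in this exercise (left of a value, right of a value, between two values) reduce to left-hand areas. The sketch below shows the pattern with hypothetical helper names and a different value, z = 2, so the exercise answers are left to the reader:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, N(0, 1)


def area_left(z):
    """Area under N(0, 1) to the left of z."""
    return Z.cdf(z)


def area_right(z):
    """Area to the right of z: the total area 1 minus the left area."""
    return 1 - Z.cdf(z)


def area_between(a, b):
    """Area between a and b (a < b): left area at b minus left area at a."""
    return Z.cdf(b) - Z.cdf(a)


# Symmetry: the area left of -2 matches the area right of +2.
print(round(area_left(-2), 4), round(area_right(2), 4))
print(round(area_between(-2, 2), 4))
```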
Let's Do It! 3 6.3 IQ Scores
We will continue with the model for IQ score of 12-year-olds. In answering the following
questions, remember to use the symmetry of the normal distribution and the fact that
the total area under the curve is 1. It may also be very useful to draw a picture of the
area you are trying to find so you can establish a frame of reference (for example,
should it be larger or smaller than 50%?) and see the way to approach getting the
answer. If you will be using Table II, you will need to first compute the corresponding z-scores.
X = IQ score (12-year-olds) has an N(100, 16) distribution.
(a) What proportion of the 12-year-olds has IQ scores below 84? Sketch it.
[Blank IQ Score axis for the sketch, marked at 52, 68, 84, 100, 116, 132, 148]
(b) What proportion of the 12-year-olds has IQ scores 84 or more? Sketch it.
[Blank IQ Score axis for the sketch, marked at 52, 68, 84, 100, 116, 132, 148]
(d) What proportion of the 12-year-olds has IQ scores between 84 and 116? Sketch it.
[Blank IQ Score axis for the sketch, marked at 52, 68, 84, 100, 116, 132, 148]
Example The Top 1% of the IQ Distribution
Problem
Recall the N(100, 16) model for IQ score of 12-year-olds. What IQ score must a 12-year-old have to place in the top 1% of the distribution of IQ scores?


(a) Draw a picture to show what IQ score you are trying to find.
(b) What percentile do you want to find for the IQ distribution?
(c) Find the percentile using Table E in reverse or your calculator.
Again it may be helpful to draw a picture:
[Figure: N(100, 16) curve for IQ Score; the area to the left of the unknown score "?" is 0.99]
Many calculators can find percentiles of a normal distribution. The TI has
a built-in function called invNorm under the DIST menu. You must first specify the desired area
to the left, then the mean and the standard deviation for the normal distribution. The steps for
finding the 99th percentile of our N(100, 16) distribution are as follows:
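The calculator screens are not reproduced in this transcript. As a rough stand-in (an assumption, not the TI procedure itself), Python's `statistics.NormalDist.inv_cdf` plays the same role: give it the area to the left and it returns the matching value.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=16)   # model for IQ scores of 12-year-olds

# 99th percentile: the score with area 0.99 to its left.
cutoff = iq.inv_cdf(0.99)

# Sanity check: the area to the left of the cutoff should come back as 0.99.
area_check = iq.cdf(cutoff)

print(round(cutoff, 2), round(area_check, 4))
```

The result is a little above 137. It can differ by a few hundredths from a table-based answer, because a printed table rounds the z-value to two decimal places.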
To use Table 3 in reverse:
Step 1: Standardize the score x.
Step 2: Find the z-score on the margins that corresponds to the area 0.99.
Finally, set the expression from Step 1 equal to the z-value from Step 2 and solve for x.
$z = \frac{x - \mu}{\sigma} = \frac{x - 100}{16} = 2.33$
$\Rightarrow x = 100 + (2.33)(16) = 137.28$
A 12-year-old must have an IQ score of at least 137.28 to place in the top 1%.
Let's Do It! 4 6.7 Freestyle Swim Times
The finishing times for 11–12-year-old male swimmers performing the 50-yard freestyle
are normally distributed with a mean of 35 seconds and a standard deviation of 2
seconds.
(a) The sponsors of a swim meet decide to give certificates to all 11–12-year-old
male swimmers who finish their 50-yard race in under 32 seconds. If there are
50 such swimmers entered in the 50-yard freestyle event, approximately how
many certificates will be needed?
(b) In what amount of time must a swimmer finish to be in the “top” fastest 2% of
the distribution of finishing times?
Let's Do It! 5 7 Hours per Week
According to a study, men in the US devote an average of 16 hours per week to
housework. Assume that the number of hours men devote to housework is normally
distributed with a standard deviation of 3.5 hours.
a. Suppose that the lower 10% of men on the distribution devote fewer than x hours
per week. Find the value of x.
b. Suppose the upper 5% of men on the distribution devote more than x hours per
week. Find the value of x.
Let's Do It! 6 Middle portion of the normal distribution
If a one-person household spends an average of $40 per month on medications and
doctor visits, find the maximum and minimum dollar amounts spent per month for the
middle 50% of one-person households. Assume that the standard deviation is $5 and
that the amount spent is normally distributed.
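A "middle portion" problem is two percentile lookups: split the leftover area equally between the two tails. The sketch below uses a hypothetical helper `middle_bounds` and illustrates it on the middle 90% of the standard normal distribution rather than on the exercise's own numbers:

```python
from statistics import NormalDist


def middle_bounds(dist, proportion):
    """Return (low, high) enclosing the central `proportion` of `dist`."""
    tail = (1 - proportion) / 2          # area left over in each tail
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


# Illustration only: the middle 90% of N(0, 1) leaves 5% in each tail.
low, high = middle_bounds(NormalDist(0, 1), 0.90)
print(round(low, 3), round(high, 3))
```

By symmetry the two bounds are equally far from the mean, here roughly 1.645 standard deviations on either side.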
Homework Page 138: 1-9 all, 12-16 all, 31-33 all