8 - THE NORMAL DISTRIBUTION
Examples: Alpha fetoprotein levels of mothers carrying a fetus with spina bifida.
The smooth, bell-shaped, symmetric curve is called the Normal p.d.f. curve, or just the Normal curve.
[Figure: Normal curve, symmetric about the mean, with 50% of the area on each side of the mean.]
If a random variable, X, has a Normal distribution with mean μ and standard deviation σ, we write: X ~ N(μ, σ).
The Normal distribution is important because:

it fits a lot of data reasonably well;

it can be used to approximate other distributions;

it is an important assumption in statistical inference (see later work).
The shape is determined solely by two parameters, μ and σ: the population mean μ controls where the normal
curve is centered, and the population standard deviation σ controls the spread about μ.
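As a rough illustration of how the two parameters shape the curve, the sketch below evaluates the standard formula for the Normal p.d.f. The function name `normal_pdf` and the parameter values are illustrative choices, not part of the handout.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height of the Normal curve with mean mu and standard deviation sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Illustrative values only: same mean, two different spreads.
peak_narrow = normal_pdf(0.0, mu=0.0, sigma=1.0)
peak_wide = normal_pdf(0.0, mu=0.0, sigma=2.0)

print(round(peak_narrow, 4))  # 0.3989
print(round(peak_wide, 4))    # 0.1995
```

Doubling σ halves the height at the mean, because the same total area of 1 is spread over a wider range. Changing μ only slides the curve left or right.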
Example: Alpha fetoprotein levels found in the urine of mothers carrying a foetus with spina bifida.
Let X = alpha fetoprotein level in the urine of a mother carrying a foetus with spina bifida. We will assume
that alpha fetoprotein levels have a normal distribution.
The sample mean AFP level is x̄ = 22.71 μmoles/liter and the sample standard deviation is s = 3.92
μmoles/liter. These are the sample-based estimates of _____ and ______ respectively.
Approximately _______ % of the mothers in this population will have AFP levels within 1 standard
deviation of the mean, i.e. we estimate that approximately ________% of this population of mothers will
have AFP levels:
between __________________ and
_____________________
= between __________ and ___________
Diagram here:
Approximately _______ % of the mothers in this population will have AFP levels within 2 standard
deviations of the mean, i.e. we estimate that approximately ________% of this population of mothers will
have AFP levels:
between _________ and
__________
= between __________ and ___________
Approximately _______ % of the mothers in this population will have AFP levels within 3 standard
deviations of the mean, i.e. we estimate that approximately ________% of this population of mothers will
have AFP levels:
between _________ and
__________
= between __________ and ___________
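The "between ... and ..." endpoints in the three statements above are just the mean plus or minus 1, 2, or 3 standard deviations. A minimal sketch of that arithmetic, using the sample values quoted earlier (22.71 and 3.92); the helper name `interval` is an illustrative choice:

```python
mean = 22.71  # sample mean AFP level quoted in the handout
sd = 3.92     # sample standard deviation quoted in the handout


def interval(mean, sd, k):
    """Endpoints of the range within k standard deviations of the mean."""
    return (round(mean - k * sd, 2), round(mean + k * sd, 2))


for k in (1, 2, 3):
    low, high = interval(mean, sd, k)
    print(f"within {k} SD of the mean: between {low} and {high}")
```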
For the Normal Distribution:
A random observation has approximately:
68% chance of falling within 1σ of μ;
95% chance of falling within 2σ of μ;
99.7% chance of falling within 3σ of μ.
or
In a normal distribution, approximately:
68% of observations are within 1σ of μ;
95% of observations are within 2σ of μ;
99.7% of observations are within 3σ of μ.
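The 68%, 95%, and 99.7% figures can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `within` is an illustrative choice. Because the percentages do not depend on μ or σ, it works on the standard normal scale.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # standard normal distribution


def within(k):
    """Probability that an observation falls within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)


for k in (1, 2, 3):
    # Prints roughly 68.3, 95.4 and 99.7 percent.
    print(f"within {k} SD: {100 * within(k):.1f}%")
```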
OBTAINING PROBABILITIES
To find probabilities associated with a normal distribution with mean μ and standard deviation σ, we need
a mechanism for finding areas beneath the normal curve. Because there are infinitely many means and
standard deviations we might be interested in, we need a standard process by which we can find areas
associated with any normal distribution!
The Standard Normal Distribution and Using the Standard Normal Table
Fact: If X ~ N(μ, σ) and we define a new random variable Z = (X - μ)/σ, then Z ~ N(0, 1), i.e. we create
a new random variable Z where the observed values of Z are the z-scores for the random variable X.
Recall the process of converting a random variable X to z-scores is called standardization. Once
standardized, we can find probabilities/areas of interest using a standard normal table.
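Standardization followed by a lookup of P(Z < z) can be sketched in a few lines. Here the table lookup is replaced by the error function `math.erf`, which gives the same lower-tail area; the function names and the numbers in the usage lines are hypothetical and not taken from the handout.

```python
import math


def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma


def standard_normal_cdf(z):
    """Lower-tail area P(Z < z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(x, mu, sigma):
    """P(X < x) for X ~ N(mu, sigma), found by standardizing first."""
    return standard_normal_cdf(z_score(x, mu, sigma))


# Hypothetical example: X ~ N(100, 15), area below x = 130.
print(z_score(130, 100, 15))               # 2.0
print(round(normal_cdf(130, 100, 15), 4))  # 0.9772
```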
The standard normal table in the appendix of most texts gives P(Z < z), i.e. lower-tail probabilities for a
standard normal distribution (shaded). We can also use the Normal Probability Calculator in JMP, found in the
Tutorials section of the website.
[Figure: standard normal curve for Z, centered at 0, with the area to the left of z shaded. Most tables give shaded area = P(Z < z).]
Basic method for obtaining probabilities
1. Sketch a Normal curve, marking on it the mean and the values of interest.
2. Shade the area under the curve corresponding to the required probability.
3. Convert all values in the original scale to their corresponding z-scores.
4. Obtain the desired probability from the lower-tail areas, P(Z < z), provided by a standard normal table.
Z = standard normal random variable
Z ~ Normal(μ = 0, σ = 1).
Find the following standard normal probabilities:
a) P(Z > 2.25)
b) P(Z < 1.28)
c) P(Z > .50)
d) P(Z < -2.33)
e) P(-1.96 < Z < 1.96)
f) Find z so that P(Z < z) = .90, i.e. what is the 90th percentile of the standard normal distribution?
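One way to check table lookups for parts a) to f) is with `statistics.NormalDist` from the Python standard library. This is a sketch for checking answers, not the table-based method the handout uses; upper-tail areas are obtained as 1 minus the lower-tail area, and the dictionary name `answers` is an illustrative choice.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

answers = {
    "a": 1 - Z.cdf(2.25),             # P(Z > 2.25), upper tail
    "b": Z.cdf(1.28),                 # P(Z < 1.28), lower tail
    "c": 1 - Z.cdf(0.50),             # P(Z > .50)
    "d": Z.cdf(-2.33),                # P(Z < -2.33)
    "e": Z.cdf(1.96) - Z.cdf(-1.96),  # P(-1.96 < Z < 1.96)
    "f": Z.inv_cdf(0.90),             # z with P(Z < z) = .90
}

for label, value in answers.items():
    print(f"{label}) {value:.4f}")
```

Values read from a printed table may differ in the fourth decimal place because the table rounds its entries.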
Spina Bifida Example (continued)
X = AFP level of a randomly selected mother carrying a foetus with spina bifida. Let's assume that
X ~ Normal(μ = 23.05, σ = 4.08), using the sample mean and sample standard deviation.
Find the following:
a) P(X < 15.00)
=
b) P(X < 20.00)
c) P(20.00 < X < 25.00)
d) P(X > 30.00)
e) Find the 90th percentile.
f) Find the 25th percentile
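The same checks can be run on the original AFP scale. The sketch below assumes X ~ Normal(23.05, 4.08) as stated above and lets `statistics.NormalDist` do the standardization internally; the names `afp` and `answers` are illustrative.

```python
from statistics import NormalDist

afp = NormalDist(mu=23.05, sigma=4.08)  # assumed model for X from the handout

answers = {
    "a": afp.cdf(15.00),                   # P(X < 15.00)
    "b": afp.cdf(20.00),                   # P(X < 20.00)
    "c": afp.cdf(25.00) - afp.cdf(20.00),  # P(20.00 < X < 25.00)
    "d": 1 - afp.cdf(30.00),               # P(X > 30.00)
    "e": afp.inv_cdf(0.90),                # 90th percentile
    "f": afp.inv_cdf(0.25),                # 25th percentile
}

for label, value in answers.items():
    print(f"{label}) {value:.4f}")
```

Answers worked by hand with z-scores rounded to two decimals can differ slightly from these unrounded values.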
Original Problem: Spina Bifida Diagnosis
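[Figure: two Normal curves, one centered at 15.73 (normal foetuses) and one centered at 23.05 (foetuses with spina bifida).]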
Recall: For normal foetuses μ = 15.73 and σ = 0.72, and for foetuses with spina bifida μ = 23.05 and σ = 4.08.
Assume the threshold for detecting spina bifida is set at 17.8. (A foetus would be diagnosed as not having
spina bifida if the fetoprotein level is below 17.8.)
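a) What is the probability that a foetus not suffering from spina bifida is correctly diagnosed?
Incorrectly diagnosed?
z = (17.8 - 15.73)/0.72 ≈ 2.88, so P(X < 17.8) = P(Z < 2.88) = .9980, or a 99.8% chance of a correct diagnosis. (Specificity)
The chance of an incorrect diagnosis is 1 - .9980 = .0020.
b) What is the probability that a foetus with spina bifida is correctly diagnosed? Incorrectly diagnosed?
z = (17.8 - 23.05)/4.08 ≈ -1.29, so P(X > 17.8) = P(Z > -1.29) = .9015, or about a 90% chance of a correct diagnosis. (Sensitivity)
The chance of an incorrect diagnosis is 1 - .9015 = .0985.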
c) If they wanted to ensure that 99% of foetuses with spina bifida were correctly diagnosed, at what level
should they set the threshold T?
To convert a z-score back to the original data scale use
the formula x = μ + σ × z.
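Parts a) to c) can be checked with a short sketch, assuming the two Normal models recalled above and `statistics.NormalDist`. For part c), correctly diagnosing 99% of spina bifida cases means P(X > T) = .99, so T is the 1st percentile of the spina bifida distribution. The variable names are illustrative.

```python
from statistics import NormalDist

normal = NormalDist(mu=15.73, sigma=0.72)  # foetuses without spina bifida
spina = NormalDist(mu=23.05, sigma=4.08)   # foetuses with spina bifida
threshold = 17.8

# a) Correct "no spina bifida" diagnosis: AFP level below the threshold.
specificity = normal.cdf(threshold)

# b) Correct spina bifida diagnosis: AFP level above the threshold.
sensitivity = 1 - spina.cdf(threshold)

# c) Threshold T with P(X > T) = .99, via x = mu + sigma * z,
#    where z is the 1st percentile of the standard normal.
z_01 = NormalDist().inv_cdf(0.01)
T = spina.mean + spina.stdev * z_01

print(f"specificity = {specificity:.4f}")
print(f"sensitivity = {sensitivity:.4f}")
print(f"z = {z_01:.3f}, T = {T:.2f}")
```

The computed sensitivity is slightly below the handout's .9015 because the table lookup rounds the z-score to -1.29 first, while the code uses the unrounded value.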
Standard Normal Table – P(Z < z)
Table for negative z-scores, i.e. z < 0
Standard Normal Table – P(Z < z)
Table for positive z-scores, i.e. z > 0