STP231 Brief Class Notes Instructor: Ela Jackiewicz
Chapter 4. THE NORMAL DISTRIBUTION
Introducing Normally Distributed Variables
The distributions of some variables, such as the thickness of an eggshell, the serum
cholesterol concentration in blood, or the white blood cell count in a specimen of blood,
have roughly the shape of a normal curve (a bell-shaped curve).
Normally Distributed Variable
A variable is said to be normally distributed or to have a normal distribution if its
distribution has the shape of a normal curve.
• A normal distribution (curve) is completely determined by its mean (µ) and standard
deviation (σ).
• Parameters of a normal distribution: (µ, σ)
Characteristics of Normal distribution
• Bell-shaped
• Symmetric around the mean µ
• Close to the horizontal axis outside the range from µ-3σ to µ+3σ
• Spread depends on the standard deviation σ.
• Area under the curve is 1 for any (µ,σ).
Notation: Y~N(12, 7) indicates that Y has normal distribution with mean 12 and
standard deviation 7
Normally distributed variables and normal-curve areas
For a normally distributed variable, the percentage of all possible observations that lie
within any specified range equals the corresponding area under its associated normal
curve expressed as a percentage. This holds true approximately for a variable that is
approximately normally distributed.
Example (Heights of Female College Students):
A college has an enrollment of 3264 female students. Records show that the mean height
of these students is 64.4 inches and the standard deviation is 2.4 inches. Since the
relative-frequency histogram of these heights is approximately bell shaped, we assume
that the heights of the female students follow a normal distribution with the same mean
and standard deviation. To find the percentage of students whose heights are between
66 and 68 inches, we evaluate the area under this normal curve from 66 to 68.
$$\text{Area} = \int_{66}^{68} \frac{1}{\sqrt{2\pi(2.4)^2}} \, e^{-\frac{(x-64.4)^2}{2(2.4)^2}} \, dx = 0.1846$$ (by TABLE)
Relative frequency = 0.1100+0.0735 = 0.1835 (by relative frequency distribution)
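As a rough cross-check of the area above, the following sketch integrates the normal density numerically with Simpson's rule and compares the result with a difference of cumulative areas from Python's standard-library statistics.NormalDist. The helper name normal_area_simpson and the choice of 1000 subintervals are illustrative assumptions, not part of the notes. The unrounded area comes out near 0.1857; the table value 0.1846 is slightly smaller because the z-score for 66 is rounded to 0.67 before the table lookup.

```python
from statistics import NormalDist

def normal_area_simpson(a, b, mu, sigma, n=1000):
    """Approximate the area under the N(mu, sigma) curve from a to b (Simpson's rule)."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    dist = NormalDist(mu, sigma)
    h = (b - a) / n
    total = dist.pdf(a) + dist.pdf(b)
    for i in range(1, n):
        # interior points alternate between weight 4 (odd i) and weight 2 (even i)
        total += (4 if i % 2 else 2) * dist.pdf(a + i * h)
    return total * h / 3

heights = NormalDist(mu=64.4, sigma=2.4)
area_by_integration = normal_area_simpson(66, 68, 64.4, 2.4)
area_by_cdf = heights.cdf(68) - heights.cdf(66)
print(round(area_by_integration, 4), round(area_by_cdf, 4))
```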
Standardizing a Normally Distributed Variable
Facts:
1) Once we know the mean and standard deviation of a normally distributed
variable, we know its distribution and associated normal curve.
2) Percentages for a normally distributed variable are equal to areas under its
associated normal curve.
How do we find areas under a normal curve?
Integration? Or tables for each different µ and σ ? Or standardize your normal curve and
use only one table with mean(µ)=0 and standard deviation(σ)=1?
Standard Normal Distribution; Standard Normal Curve
A normally distributed variable having mean 0 and standard deviation 1 is said to have
the standard normal distribution. Its associated normal curve is called the standard
normal curve.
Standardized Normally Distributed Variable
The standardized version of a normally distributed variable Y, Z = (Y − μ)/σ, has the
standard normal distribution.
Areas under the Standard Normal Curve
Basic Properties of the Standard Normal Curve
1. The total area under the standard normal curve is equal to 1.
2. The standard normal curve extends indefinitely in both directions, approaching,
but never touching, the horizontal axis as it does so.
3. The standard normal curve is symmetric about 0; i.e., the part of the curve to the
left of 0 is the mirror image of the part of the curve to the right of 0.
4. Most of the area under the standard normal curve lies between -3 and 3.
Using the Standard-Normal Table
There are infinitely many normally distributed variables. However, because each of them
can be standardized, a single standard normal table can be used to find the areas under
any normal curve.
* The table is set up to accumulate the area under the curve from -∞ to any specified value.
* The table starts at -3.9 and goes to 3.9, since outside this range of values the area is
negligible.
* The table can be used to find a z value given an area, or an area given a z value.
The zα Notation
The symbol zα denotes the z-score having area α (alpha) to its right under the
standard normal curve. zα is read "z sub alpha" or simply "z alpha".
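Since the table accumulates area from the left, zα is the z-score with area 1 − α to its left. A minimal sketch of that relationship, assuming Python's standard-library statistics.NormalDist stands in for the table (the function name z_alpha is a hypothetical helper, not from the notes):

```python
from statistics import NormalDist

def z_alpha(alpha):
    """Return the z-score with area alpha to its RIGHT under the standard normal curve."""
    # Area alpha to the right means area 1 - alpha to the left.
    return NormalDist().inv_cdf(1 - alpha)

z_05 = z_alpha(0.05)
z_10 = z_alpha(0.10)
print(round(z_05, 3), round(z_10, 2))
```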
Working with Normally Distributed Variables
To Determine a Percentage or Probability for a Normally Distributed Variable
1. Sketch the normal curve associated with the variable.
2. Shade the region of interest and mark the delimiting x-values.
3. Compute the z-scores for the delimiting x-values found in step 2.
4. Use Table II to obtain the area under the standard normal curve delimited by the
z-scores found in step 3.
Example (contd.)
Height of female students: normal distribution with µ = 64.4, σ = 2.4.
We want to determine the probability that a randomly selected student has a height
between 66 and 68 inches.
z-score for x = 66: z = (66 - 64.4)/2.4 = 0.67; z-score for x = 68: z = (68 - 64.4)/2.4 = 1.5
Area to the left under the standard normal curve: z = 1.5 -> 0.9332, z = 0.67 -> 0.7486
Resulting probability: 0.9332 - 0.7486 = 0.1846
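The table-based steps of this example can be mirrored in a few lines. This is a sketch under the assumption that statistics.NormalDist from the Python standard library plays the role of Table II; the rounding to two decimals for z and four decimals for areas imitates how the printed table is read, and the names z_score and table_area are illustrative.

```python
from statistics import NormalDist

mu, sigma = 64.4, 2.4
std_normal = NormalDist()  # mean 0, standard deviation 1

def z_score(x, mu, sigma):
    """Standardize an x-value: z = (x - mu) / sigma."""
    return (x - mu) / sigma

# Round z to two decimals, as one would before looking it up in the table.
z_low = round(z_score(66, mu, sigma), 2)
z_high = round(z_score(68, mu, sigma), 2)

# Round each cumulative area to four decimals, like a table entry.
area_low = round(std_normal.cdf(z_low), 4)
area_high = round(std_normal.cdf(z_high), 4)
table_area = round(area_high - area_low, 4)
print(z_low, z_high, area_low, area_high, table_area)
```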
In conclusion: for a normally distributed variable Y, the questions
1) What percentage of the values of Y are in the range a to b?
2) For a randomly selected Y, what is the probability P(a < Y < b)?
can both be answered by computing the area under the normal curve between a and b.
Visualizing a Normal Distribution
1. 68.26% of all possible observations lie within one standard deviation to either
side of the mean, i.e., between µ - σ and µ + σ.
2. 95.44% of all possible observations lie within two standard deviations to either
side of the mean, i.e., between µ - 2σ and µ + 2σ.
3. 99.74% of all possible observations lie within three standard deviations to either
side of the mean, i.e., between µ - 3σ and µ + 3σ.
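These percentages are the areas under the standard normal curve between -k and k for k = 1, 2, 3. A small sketch, assuming Python's standard-library statistics.NormalDist, computes them without table rounding; the unrounded values are about 68.27%, 95.45% and 99.73%, so the figures above differ in the last digit only because they come from a four-decimal table.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

# Area between -k and k for k = 1, 2, 3 standard deviations.
shares = [std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)]

for k, share in zip((1, 2, 3), shares):
    print(f"within {k} SD of the mean: {100 * share:.2f}%")
```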
To Determine the Observations Corresponding to a Specified Percentage or
Probability for a Normally Distributed Variable
1. Sketch the normal curve associated with the variable.
2. Shade the region of interest.
3. Use Table II to obtain the z-scores delimiting the region in step 2.
4. Obtain the x-values having the z-scores found in step 3:
x = μ + z·σ
Example (contd.)
a. Obtain Q3 (the 75th percentile) of the heights of female students.
The z-score corresponding to Q3 is the one having an area of 0.75 to its left under the
standard normal curve. From Table II, that z-score is 0.67, approximately.
So the x-value (height) corresponding to that z-score is 64.4 + (0.67)*2.4 ≈ 66 inches.
b. Obtain the 10th percentile.
The z-score corresponding to P10 is the one having an area of 0.1 to its left under the
standard normal curve. From Table II, that z-score is -1.28, approximately. So the x-value
(height) corresponding to that z-score is 64.4 + (-1.28)*2.4 ≈ 61.33 inches.
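The same two percentiles can be obtained by applying x = μ + z·σ with unrounded z-scores. The sketch below assumes statistics.NormalDist from the Python standard library in place of Table II; the helper name percentile is hypothetical. Its results (about 66.02 and 61.32 inches) differ from the table-based answers only in the second decimal, because the table z-scores are rounded.

```python
from statistics import NormalDist

mu, sigma = 64.4, 2.4

def percentile(p, mu, sigma):
    """x-value with area p to its left under the N(mu, sigma) curve."""
    z = NormalDist().inv_cdf(p)  # z-score with area p to its left
    return mu + z * sigma        # convert back to the original scale

q3 = percentile(0.75, mu, sigma)
p10 = percentile(0.10, mu, sigma)
print(round(q3, 2), round(p10, 2))
```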
Assessing Normality. Normal Probability Plots.
1. Many statistical procedures are based on the assumption that the data analyzed
come from a normally distributed population. One way to assess the normality of
your data is to use the Empirical Rule: compute the percentages of observations
within 1, 2 and 3 SDs of the mean of the data and check whether they are
close to the expected 68-95-99.7.
A visual check of the histogram is also helpful. If the graph is unimodal and nearly
symmetric, with no long or very short tails, we can be reasonably confident that
the normality assumption can be made.
2. With small data sets in particular, the Empirical Rule or a visual check of the
histogram is not as useful. A special statistical graph, the normal probability plot, is
often used instead. The plot is a scatterplot that compares the observed data values
to the values we would expect to see if the population were normal. If the data came
from a normal population, the points would follow a straight line.
The following example illustrates the procedure.
Ex. Y = age of onset of diabetes, sample of size 5: 7, 48, 43, 51, 49. Order your data.
Compute the mean and standard deviation: ȳ = 39.6, s = 18.46
Here z is the z-score having area (i-.5)/n to its left under the standard normal curve.

Observed Y    Adjusted percentile (i-.5)/n    z        Theoretical Y = ȳ + z·s
7             .10                             -1.28    16.0
43            .30                             -0.52    30.0
48            .50                              0       39.6
49            .70                              0.52    49.2
51            .90                              1.28    63.2
Graphing the theoretical values (x-axis) against the observed values (y-axis), we can see
that the points do not follow a straight line. The first value is much smaller than its
expected theoretical value, indicating left skewness.
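The columns used for this plot can be recomputed with a short sketch, assuming Python's standard-library statistics module (NormalDist, mean, stdev) in place of the table. Because it uses unrounded z-scores, the theoretical values may differ from the ones above by about 0.1. The variable names are illustrative.

```python
from statistics import NormalDist, mean, stdev

ages = sorted([7, 48, 43, 51, 49])   # step 1: order the data
n = len(ages)
y_bar, s = mean(ages), stdev(ages)   # sample mean and sample SD

rows = []
for i, observed in enumerate(ages, start=1):
    p = (i - 0.5) / n                # adjusted percentile
    z = NormalDist().inv_cdf(p)      # z-score with area p to its left
    rows.append((observed, p, z, y_bar + z * s))

for observed, p, z, theoretical in rows:
    print(f"{observed:>3}  {p:.2f}  {z:6.2f}  {theoretical:6.1f}")
```

Plotting the fourth column against the first gives the normal probability plot; a point far off the line, like the first one here, signals a departure from normality.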
Check out normal probability graphs on page 137
Instructions for TI-83, 83-Plus, 84-Plus
Computing areas under normal curves:
Use 2nd VARS to get to the DISTR menu:
option 2 normalcdf(lower limit, upper limit, mean, standard deviation)
gives the area between the lower and upper limits
(mean=0 and SD=1 are default values)
Ex1 To find the area between 1 and 1.7 under N(0,1) use
normalcdf(1,1.7,0,1)=.1141
Ex2 To find the area under N(0,1) left of 2.3 use
normalcdf(-1000000, 2.3,0,1)=.9893
(use any "large" negative number as the lower limit)
Ex3 To find the area right of 2.11 under N(0,1) use
normalcdf(2.11,1000000,0,1)=.0174
(use any "large" positive number as the upper limit)
Ex4 To find the area under the N(12,3) curve between 10 and 16 use
normalcdf(10, 16, 12,3)=.6563
Finding points under a normal curve when the area is given:
Use 2nd VARS to get to the DISTR menu:
option 3 invNorm(area to the left, mean, standard deviation)
(mean=0 and SD=1 are default values)
Ex5 To find the third decile of N(0,1) use invNorm(.3,0,1)=-.52
Ex6 To find the 95th percentile of N(0,1) (or to find z.05) use
invNorm(.95,0,1)=1.645
Ex7 To find the third quartile of N(12,3) use invNorm(.75,12,3)=14.02
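For readers without a calculator, the seven examples above can be reproduced with a rough Python sketch. The functions normalcdf and inv_norm below are hypothetical helpers written to mimic the calculator commands, built on statistics.NormalDist from the standard library; they are not the calculator's own implementation.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the N(mean, sd) curve between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_norm(area_left, mean=0.0, sd=1.0):
    """x-value with the given area to its left under the N(mean, sd) curve."""
    return NormalDist(mean, sd).inv_cdf(area_left)

print(round(normalcdf(1, 1.7), 4))           # Ex1
print(round(normalcdf(-1000000, 2.3), 4))    # Ex2
print(round(normalcdf(2.11, 1000000), 4))    # Ex3
print(round(normalcdf(10, 16, 12, 3), 4))    # Ex4
print(round(inv_norm(0.3), 2))               # Ex5
print(round(inv_norm(0.95), 3))              # Ex6
print(round(inv_norm(0.75, 12, 3), 2))       # Ex7
```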