THE NORMAL DISTRIBUTION
WHAT IS A NORMALLY DISTRIBUTED DATA SET?
For example, what is the normal height for 18-year-old males in Manila?
Say, 5½ ft!
How can we say that 5½ ft is the normal height for them?
NORMAL = 5½ ft
I can find a lot of 18-year-old males around who are about 5½ ft tall!
MODE = 5½ ft
The shortest guy I have found is about 5 ft, and the tallest is 6 ft.
5½ ft is right in the middle. Not too short, not too tall!
MEDIAN = 5½ ft
Well, people say 5½ ft is just the average height for them.
MEAN = 5½ ft
Now, can you describe the number of 18-year-old guys who are shorter
or taller than the normal 5½ ft height, if you count them?
The numbers decrease smoothly as I count the guys who
are either shorter or taller than the normal 5½ ft height!
Therefore, in Statistics, a NORMALLY DISTRIBUTED data set means:
MODE = MEDIAN = MEAN, and
smoothly decreasing frequencies for values both lower and higher than the normal value!
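The three measures above can be checked on a small list of heights. The data below is hypothetical, invented for illustration and chosen to be symmetric around 5½ ft; it is not from any survey. The sketch uses Python's standard `statistics` module.

```python
from collections import Counter
from statistics import mean, median, mode

# Hypothetical heights (ft) of 18-year-old males, made up for illustration only
heights = [5.0, 5.25, 5.25, 5.5, 5.5, 5.5, 5.75, 5.75, 6.0]

print("mean:", mean(heights), "median:", median(heights), "mode:", mode(heights))

# Text histogram: frequencies fall off on both sides of the centre value
for value, count in sorted(Counter(heights).items()):
    print(value, "*" * count)
```

Real data would only approximately satisfy MODE = MEDIAN = MEAN; this toy list satisfies it exactly by construction.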
THE NORMAL DISTRIBUTION — WHAT IS A NORMALLY DISTRIBUTED DATA SET?
HISTOGRAM PATTERN OF NORMALLY DISTRIBUTED DATA
HISTOGRAMS IN ACTUAL CASES
[Figure: the normal distribution curve beside histograms of actual data with 27 and with 9 interval classes; in each panel Median = Mode = Mean at the peak. Axes: interval classes (horizontal) vs. frequency (vertical).]
BASIC PROPERTIES
The normal distribution is BELL-SHAPED, due to
its smoothly decreasing frequency pattern.
An “exactly” normal data set IS IMPOSSIBLE in practice!!
Real data can only be approximately normal.
The normal distribution is used as a SUBSTITUTE (a model)
for any approximately normal data.
ONE NORMAL TO REPRESENT ALL THE OTHERS!
TAKE NOTE!
Imagine that you have the ORIGINAL DATA (the complete list of values),
and that you have also computed its MEAN and ST.DEV.
Then, using the z-score formula
$Z = \dfrac{X - \bar{X}}{s}$, where $\bar{X}$ is the mean and $s$ is the standard deviation,
you convert all your original data values into z-scores.
You now have a CONVERTED DATA SET (ALL Z-SCORE VALUES).
Computing the mean and standard deviation of the converted data,
REMEMBER THIS, you will get:
MEAN = 0
ST.DEV. = 1
IN SHORT: If you convert your entire data set into
z-scores, you will have a new data set (all z-scores)
with mean = 0 and standard deviation = 1!
And this works for every data set (as long as its standard deviation is not zero)!
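A minimal sketch of the conversion, assuming the sample standard deviation (`statistics.stdev`, n − 1 in the denominator) plays the role of s. The data list is hypothetical. Whichever standard deviation you choose, using the same one before and after conversion gives a converted set with mean 0 and standard deviation 1.

```python
from statistics import mean, stdev

def to_z_scores(data):
    """Convert every value X into Z = (X - mean) / s."""
    m = mean(data)
    s = stdev(data)  # sample standard deviation (assumption: the slides' s)
    return [(x - m) / s for x in data]

# Hypothetical data values, made up for illustration only
original = [12.0, 15.5, 9.0, 20.0, 14.0, 11.5]
z = to_z_scores(original)

print("mean of z-scores:", mean(z))     # 0 up to floating-point rounding
print("st.dev. of z-scores:", stdev(z)) # 1 up to floating-point rounding
```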
ONE STANDARD NORMAL?
When converted into z-scores, ALL NORMALLY DISTRIBUTED DATA SETS
turn into one and the same normal distribution, with
MEAN = 0 and ST.DEV. = 1:
THE STANDARD NORMAL
A CLOSER VIEW OF THE STANDARD NORMAL BELL CURVE
From the standard normal Table, we will see that the interval
from Z = -1.5 to Z = -0.7 encloses an area of 0.1752 (or 17.52%)
under the bell-curve.
TAKE NOTE!
This AREA is the same as the RELATIVE FREQUENCY of the interval!
We have a special TABLE for the areas under the standard
normal bell-curve. (Later!)
The standard normal is derived from a given
normal data set, by converting its data values
into z-scores. Hence, the horizontal axis of the
standard normal is the axis of the z-scores.
Recall!! For the standard normal, the mean = 0.
So, the point Z=0 must be marked as the point of
symmetry of the bell-curve and its maximal point.
In the standard normal, the vertical axis has no
significance! For the usual (frequency) histograms,
this line is the axis for the class frequencies.
What serves as relative frequencies in the standard
normal is the % AREA enclosed by the DESIGNATED
INTERVAL on the z-score axis, UNDER THE BELL-CURVE!
Lastly, the TOTAL AREA enclosed by the standard
normal bell-curve is 1.00 (or 100%).
[Figure: standard normal bell-curve over the z-score axis (converted data values), with the interval from Z = -1.5 to Z = -0.7 shaded; total area = 1.00.]
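As a cross-check on the 0.1752 figure, Python's standard `statistics.NormalDist` (Python 3.8+) can compute areas under the standard normal directly; here it stands in for the printed Table. This is a sketch, not the method the slides use.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # the standard normal

area = Z.cdf(-0.7) - Z.cdf(-1.5)  # area between Z = -1.5 and Z = -0.7
total = Z.cdf(10) - Z.cdf(-10)    # practically the whole curve

print(round(area, 4))   # 0.1752
print(round(total, 4))  # 1.0
```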
HOW TO READ THE STANDARD NORMAL TABLE
Take a standard normal distribution
TABLE OF CUMULATIVE AREAS.
Does it give us all possible areas under
the standard normal bell curve?
Usually, it only gives the left-tail area under the
standard normal, up to a specified z-value!
[Figure: standard normal with the left tail up to a specified z-value shaded in blue.]
This blue-shaded section of the standard normal is
what the ‘left-tail from a specified z-value’ refers to.
Other areas can be obtained from the left-tail
area by subtraction. (Coming shortly.)
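A table of cumulative areas is just the left-tail area P(Z<z) tabulated on a grid of z-values. A sketch, assuming the common layout (row label = z to one decimal, column = second decimal 0.00 to 0.09) and showing only a few non-negative rows; the left-tail area is computed with the standard identity P(Z<z) = ½(1 + erf(z/√2)).

```python
import math

def left_tail(z):
    """Cumulative (left-tail) area P(Z < z) under the standard normal."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def table_row(row):
    """One table row: z = row + column, for columns 0.00 to 0.09, 4 decimals."""
    return [round(left_tail(row + col / 100), 4) for col in range(10)]

print(" z   " + " ".join(f"{c / 100:.2f}  " for c in range(10)))
for row in (0.0, 0.5, 1.0, 1.2, 1.5):
    print(f"{row:.1f}  " + " ".join(f"{a:.4f}" for a in table_row(row)))
```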
THE PROBABILITY NOTATION P(EVENT) FOR THE AREAS
NOTE!
The area under the standard normal is the relative frequency (%) of
the interval (of z-values) enclosing it. And a relative frequency, appropriately
restated, is a probability. So, the notation P(Event)
is also used to denote an area under the standard normal.
(For now, we will use a rectangle in place of the standard normal bell-curve for easy visualization…)
The shaded area under the standard normal, to the left of 1.12, will be denoted as:
P(Z<1.12)
Read the symbol P(??) as: “the area under the interval (??)”
The interval Z<1.12 consists of all z-values less than 1.12
The value of P(Z<1.12) is what the Table will give us! For the other possible areas:
P(Z>-0.75) = 1 – P(Z<-0.75) (the total area is 1)
P(-1.23<Z<0.95) = P(Z<0.95) – P(Z<-1.23)
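The two subtraction rules can be written as small helpers built on the left-tail area only, just as a cumulative table forces you to do. The helper names are hypothetical; `statistics.NormalDist` (standard library) supplies P(Z<z).

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, st.dev. 1

def p_less(z):
    """P(Z < z): the left-tail area a cumulative table gives."""
    return Z.cdf(z)

def p_greater(z):
    """P(Z > z) = 1 - P(Z < z), since the total area is 1."""
    return 1 - p_less(z)

def p_between(a, b):
    """P(a < Z < b) = P(Z < b) - P(Z < a)."""
    return p_less(b) - p_less(a)

print(round(p_greater(-0.75), 4))        # 0.7734
print(round(p_between(-1.23, 0.95), 4))  # 0.7196
```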
HOW TO FIND P(Z<1.27) IN THE STANDARD NORMAL TABLE
To find the value of P(Z<1.27) in
the Table of Cumulative Areas :
[Figure: standard normal with the left-tail area 0.8980 up to Z = 1.27 shaded; Z = 0 also marked.]
First, find ‘1.2’ in the very first column (the ‘z’ column).
Then, along the row (where you find ‘1.2’) find the value under the column ‘0.07’.
Therefore, P(Z<1.27) = 0.8980
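The row-and-column lookup can be sketched as splitting a 2-decimal z-value into the row label (first column) and the column label (top row). The helper `table_position` is hypothetical; the area itself comes from Python's `statistics.NormalDist`, rounded to 4 decimals like the Table.

```python
from statistics import NormalDist

def table_position(z):
    """Split a 2-decimal z-value into (row label, column label),
    e.g. 1.27 -> row 1.2, column 0.07."""
    hundredths = round(abs(z) * 100)
    sign = -1 if z < 0 else 1
    row = sign * (hundredths // 10) / 10
    col = (hundredths % 10) / 100
    return row, col

z = 1.27
row, col = table_position(z)
area = round(NormalDist().cdf(z), 4)
print(f"row {row}, column {col:.2f}: P(Z<{z}) = {area:.4f}")
```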
EXAMPLE 1. Draw the specified section of the standard normal and find its area.
A. Z<-1.04
Area = P(Z<-1.04) = 0.1492
NOTE!
No need to place the specified z-values
precisely! Just place them correctly relative to the
middle point z = 0, and to each other!
B. Z>0.82
Area = P(Z>0.82)
= 1 – P(Z<0.82)
= 1 – 0.7939
= 0.2061
C. 1.25<Z<2.08
Area = P(1.25<Z<2.08)
= P(Z<2.08) – P(Z<1.25)
= 0.9812 – 0.8944
= 0.0868
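The three areas of Example 1 can be checked with `statistics.NormalDist` from Python's standard library, used here in place of the Table. This is a sketch for verification only.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, st.dev. 1

area_a = Z.cdf(-1.04)               # A. left tail
area_b = 1 - Z.cdf(0.82)            # B. right tail = 1 - left tail
area_c = Z.cdf(2.08) - Z.cdf(1.25)  # C. between two z-values

for label, area in (("A", area_a), ("B", area_b), ("C", area_c)):
    print(label, round(area, 4))
```

For part C the unrounded computation gives about 0.0869 rather than 0.0868: the Table rounds each left-tail area to 4 decimals before subtracting, so the last digit can differ.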
EXAMPLE 2. Find the z-value (?) such that:
A. P(Z<?) = 0.29
Find the z-value in the Table whose left-tail area is
nearest to 0.29 = 0.2900 (written to 4 decimal places, like the Table's areas).
? = -0.55: P(Z<-0.55) = 0.2912 (nearer!)
? = -0.56: P(Z<-0.56) = 0.2877
So, ? = -0.55.
B. P(Z>?) = 0.45
The right-tail area is 0.45, so the left-tail area at ? is 1 – 0.45 = 0.55.
Find the z-value in the Table whose left-tail area is
nearest to 0.55 = 0.5500 (written to 4 decimal places).
? = 0.12: P(Z<0.12) = 0.5478
? = 0.13: P(Z<0.13) = 0.5517 (nearer!)
So, ? = 0.13.
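Searching the Table "backwards" amounts to scanning every 2-decimal z-value and keeping the one whose 4-decimal left-tail area is nearest the target. A sketch, assuming a table range of -3.49 to 3.49; `nearest_table_z` is a hypothetical helper, and `NormalDist.inv_cdf` is printed only as an unrounded comparison.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal

def nearest_table_z(target):
    """Return the 2-decimal z (from -3.49 to 3.49) whose 4-decimal
    left-tail area is nearest to target."""
    grid = [k / 100 for k in range(-349, 350)]
    return min(grid, key=lambda z: abs(round(STD.cdf(z), 4) - target))

print(nearest_table_z(0.29))        # A. P(Z<?) = 0.29
print(nearest_table_z(1 - 0.45))    # B. P(Z>?) = 0.45, so left tail = 0.55
print(round(STD.inv_cdf(0.29), 4))  # unrounded z for part A, for comparison
```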
EXAMPLE 3. It was found that a certain type of storage battery lasts an average of 3.0 years, with a
standard deviation of 0.5 years. If the battery lives are normally distributed:
A. Find the percentage of batteries that will last less than 3.7 years
THINK!
The battery lives (in yrs) are normal,
with MEAN = 3.0 and ST.DEV. = 0.5
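So the histogram for battery lives is a bell curve centered at the mean, 3.0 years.
We want the batteries lasting less than 3.7 years.
To use the standard normal, we
just convert the data values to
z-scores using the formula:
For X = 3.7: $Z = \dfrac{X - \bar{X}}{s} = \dfrac{3.7 - 3.0}{0.5} = 1.4$
[Figure: battery-life axis (years) with 3.0 and 3.7 marked, aligned with the z-score axis with 0 and 1.4 marked; the section left of 3.7 is shaded.]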
The shaded section of the standard normal,
expressed in probability notation is:
P(Z<1.4) = 0.9192 or 91.92%
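Converting to a z-score and reading the standard normal gives the same area as working on the original scale directly. This sketch shows both routes with Python's `statistics.NormalDist`, which is an assumption standing in for the Table.

```python
from statistics import NormalDist

mean, sd = 3.0, 0.5  # battery life in years (Example 3)
x = 3.7

z = (x - mean) / sd                            # z-score of 3.7 years
via_z = NormalDist().cdf(z)                    # standard normal: P(Z < z)
direct = NormalDist(mu=mean, sigma=sd).cdf(x)  # same area, no conversion

print(round(z, 2), round(via_z, 4), round(direct, 4))
```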
B. Find the percentage of batteries that will last at least 2.1 years.
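MEAN = 3.0
ST.DEV. = 0.5
Convert the data values to z-scores:
For X = 2.1: $Z = \dfrac{2.1 - 3.0}{0.5} = -1.8$
[Figure: battery-life axis (years) with 2.1 and 3.0 marked, aligned with the z-score axis with -1.8 and 0 marked; the section right of 2.1 is shaded.]
Shaded section ("at least" and "more than" give the same area for a continuous distribution):
P(Z>-1.8) = 1 – P(Z<-1.8)
= 1 – 0.0359
= 0.9641 or 96.41%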
C. Find the percentage of batteries that will last between 3.5 and 3.8 years.
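MEAN = 3.0
ST.DEV. = 0.5
Convert the data values to z-scores:
For X = 3.5: $Z = \dfrac{3.5 - 3.0}{0.5} = 1$
For X = 3.8: $Z = \dfrac{3.8 - 3.0}{0.5} = 1.6$
[Figure: battery-life axis (years) with 3.0, 3.5 and 3.8 marked, aligned with the z-score axis; the section between 3.5 and 3.8 is shaded.]
Shaded section:
P(1<Z<1.6)
= P(Z<1.6) – P(Z<1)
= 0.9452 – 0.8413
= 0.1039 or 10.39%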
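Parts B and C can be computed on the original scale with one `NormalDist` object (Python standard library); the right-tail and between rules are the same as for the standard normal. A sketch for checking the answers above.

```python
from statistics import NormalDist

battery = NormalDist(mu=3.0, sigma=0.5)  # battery life in years

# B. at least 2.1 years: right tail = 1 - left tail
part_b = 1 - battery.cdf(2.1)
# C. between 3.5 and 3.8 years: difference of two left tails
part_c = battery.cdf(3.8) - battery.cdf(3.5)

print(f"B: {part_b:.2%}  C: {part_c:.2%}")
```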
EXAMPLE 4. The height of miniature poodles is normally distributed with a mean of 30 cm and
a standard deviation of 4.1 cm.
A. Find the percentage of miniature poodles that are taller than 35 cm.
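MEAN = 30
ST.DEV. = 4.1
Convert the data values to z-scores:
For X = 35: $Z = \dfrac{35 - 30}{4.1} \approx 1.22$
[Figure: height axis (cm) with 30 and 35 marked, aligned with the z-score axis with 0 and 1.22 marked; the section right of 35 is shaded.]
Shaded section:
P(Z>1.22) = 1 – P(Z<1.22)
= 1 – 0.8888
= 0.1112 or 11.12%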
B. Is it possible to find a miniature poodle that is shorter than 18 cm?
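MEAN = 30
ST.DEV. = 4.1
Convert the data values to z-scores:
For X = 18: $Z = \dfrac{18 - 30}{4.1} \approx -2.93$
[Figure: height axis (cm) with 30 marked, aligned with the z-score axis with -2.93 and 0 marked; the far left tail is shaded.]
Shaded section: P(Z<-2.93) = 0.0017 or 0.17%
0.17%, less than a 1% chance!
So, such a poodle is possible, but you are very unlikely to find one!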
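A sketch comparing the Table-style answers (z rounded to 2 decimals first) with unrounded ones, using Python's `statistics.NormalDist`. The helper `table_z` is hypothetical and only mimics rounding z before a table lookup.

```python
from statistics import NormalDist

mean, sd = 30, 4.1  # miniature poodle height in cm (Example 4)
std = NormalDist()  # standard normal

def table_z(x):
    """z-score rounded to 2 decimals, as when reading a printed table."""
    return round((x - mean) / sd, 2)

# A. taller than 35 cm
a_table = 1 - std.cdf(table_z(35))
a_exact = 1 - NormalDist(mean, sd).cdf(35)

# B. shorter than 18 cm
b_table = std.cdf(table_z(18))
b_exact = NormalDist(mean, sd).cdf(18)

print(f"A: table {a_table:.4f}, exact {a_exact:.4f}")
print(f"B: table {b_table:.4f}, exact {b_exact:.4f}")
```

The two columns can differ in the fourth decimal because 5/4.1 and -12/4.1 are rounded to 1.22 and -2.93 before the lookup.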
C. Find the height range of the tallest 10% of all miniature poodles.
MEAN = 30
ST.DEV. = 4.1
The tallest 10% belong to the shaded right tail
from some z-value (?) whose area is 0.10 (= 10%).
Therefore, the left-tail area at this z-value (?) is 1 – 0.10 = 0.90.
From the Table, find the z-value whose left-tail area is nearest to 0.90: ? = 1.28
Convert the z-value back to a data value X using:
$X = \bar{X} + Z s$
$X = 30 + (1.28)(4.1) \approx 35.25$ cm
[Figure: height axis (cm) with 30 and 35.25 marked, aligned with the z-score axis with 0 and ? marked; left-tail area 0.90, right-tail area 0.10.]
The height range for the tallest 10% is 35.25 cm and above.
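Working backwards from an area to a data value can be sketched in two ways: mimic the Table (nearest 2-decimal z, then X = X̄ + Zs), or use the unrounded inverse CDF that Python's `statistics.NormalDist` provides. Both routes here are assumptions for checking the answer, not the slides' own procedure.

```python
from statistics import NormalDist

mean, sd = 30, 4.1          # miniature poodle height in cm (Example 4)
right_tail = 0.10           # the tallest 10%
left_tail = 1 - right_tail  # 0.90

# Table-style: the 2-decimal z whose 4-decimal left-tail area is nearest 0.90
grid = [k / 100 for k in range(-349, 350)]
z_table = min(grid, key=lambda z: abs(round(NormalDist().cdf(z), 4) - left_tail))
x_table = mean + z_table * sd  # X = mean + Z * s

# Unrounded inverse CDF on the original scale, for comparison
x_exact = NormalDist(mu=mean, sigma=sd).inv_cdf(left_tail)

print(z_table, round(x_table, 2), round(x_exact, 2))
```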