Session V: The Normal Distribution
Continuous Distributions
(Zar, Chapter 6)
The subject of this chapter:
Continuous Distributions
The most important continuous distribution: The Normal distribution.
1) Originally defined by Laplace (~1799)
2) Developed by Gauss (~1850): often called the Gaussian distribution, although some say he had little to do with it.
3) Named the “Normal Distribution” by Karl Pearson (1920)
4) Has nothing to do with normal vs. abnormal, but is the “common” distribution of many processes.
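Some Review:
[Figure: the Universe (Sample Space) generates random variables X1, X2, X3, ..., Xn, which are observed as the sample {x1, x2, x3, ..., xn}.]
[Figure: Histogram and Density f(x). As n → ∞ and the bin width → 0, the histogram approaches the density.]
[Figure: Cumulative Histogram and Distribution Function F(x), rising from 0 to 1 (100%). As n → ∞ and the bin width → 0, the cumulative histogram approaches the distribution function.]
$F(x) = \int_{-\infty}^{x} f(u)\,du$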
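The “Normal” Distribution
The Density Function $N(\mu, \sigma^2)$:
$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
$\pi \approx 3.14159$; $e \approx 2.71828$
$\mu = \int_{-\infty}^{\infty} x f(x)\,dx$
$\sigma^2 = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx$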
Recall…
$\bar{x} = \sum_{i=1}^{n} x_i / n$ estimates $\mu$
$s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n-1)$ estimates $\sigma^2$
$s = \sqrt{s^2}$ estimates $\sigma$
If $\mu = 0$ and $\sigma = 1$: "Standardized Normal" N(0,1)
$y = \frac{x-\mu}{\sigma} \iff x = y\sigma + \mu$
If $x \sim N(\mu, \sigma^2)$ then $y \sim N(0,1)$
From Table B.2 (App 17):
Pr{y > 0.0} = 0.5000 = Pr{y < 0.0}
Pr{y > 1.0} = 0.1587 = Pr{y < -1.0}
Pr{y > 2.0} = 0.0228 = Pr{y < -2.0}
Pr{y > 2.57} = 0.0051 = Pr{y < -2.57}
Pr{y > 3.0} = 0.0013 = Pr{y < -3.0}
[Figure: standard normal curve with areas of 34%, 13.5% and 2.5% marked on each side of the mean.]
Pr{0.0 < y < 1.0} = Pr{0.0 < y} - Pr{1.0 < y} = .5000 - 0.1587 = .3413 (about 1/3)
Pr{1.0 < y < 2.0} = Pr{1.0 < y} - Pr{2.0 < y} = 0.1587 - 0.0228 = .1359 (about 13.5%)
Pr{y > 2.0} = 0.0228 (about 2.5%)
[Figure: about 2/3 of the area (2 x .3413, or 68%) lies within 1 standard deviation of the mean and about 95% within 2.]
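The tabled tail areas above can be checked without a printed table. The following is a minimal sketch, assuming only the standard identity Pr{y > z} = erfc(z/√2)/2 for y ~ N(0,1); the function name `upper_tail` is illustrative and does not come from the slides.

```python
import math

def upper_tail(z):
    """Pr{y > z} for y ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Reproduce the tabled values quoted from Table B.2
for z in (0.0, 1.0, 2.0, 2.57, 3.0):
    print(f"Pr{{y > {z}}} = {upper_tail(z):.4f}")

# Areas between cut points, obtained by subtracting tail areas
between_0_and_1 = upper_tail(0.0) - upper_tail(1.0)   # about .3413
between_1_and_2 = upper_tail(1.0) - upper_tail(2.0)   # about .1359
```

By symmetry, Pr{y < -z} is the same number as Pr{y > z}, which is why one column of the table serves both tails.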
A test of the normal distribution:
H0: $N(\bar{x}, s^2)$
Bin:       (-∞,-2)   (-2,-1)   (-1,0)   (0,1)   (1,2)   (2,∞)
Obs:       h1        h2        h3       h4      h5      h6
Prob:      .025      .135      .34      .34     .135    .025
Expected:  n(.025)   n(.135)   etc.
$\chi^2 = ?$   DF = 6 - 1 - 1 - 1 = 3 (one subtraction is for $\bar{x}$ and one is for $s^2$)
Moments
Definitions:
$\mu_p = \int_{-\infty}^{\infty} x^p f(x)\,dx$ (Note: $\mu_1 = \mu$, estimated by $\bar{x}$)
Central Moments:
$K_p = \int_{-\infty}^{\infty} (x-\mu)^p f(x)\,dx$
First Central Moment is Zero:
$K_1 = \int_{-\infty}^{\infty} (x-\mu)^1 f(x)\,dx = \int_{-\infty}^{\infty} x f(x)\,dx - \mu \int_{-\infty}^{\infty} f(x)\,dx = \mu - \mu = 0$
Second Central Moment is the Variance:
$K_2 = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx = \sigma^2$
Estimated by $s^2$
“Machine Calculation Formula” for the Variance:
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{\sum x_i^2 - n\bar{x}^2}{n-1}$
Hint: $(a-b)^2 = a^2 - 2ab + b^2$
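To see that the definitional and "machine" forms agree, here is a small sketch. The function names and the five-point sample are made up for illustration; nothing here comes from the slides beyond the two formulas.

```python
import math

def variance_definitional(xs):
    """s^2 = sum (x_i - xbar)^2 / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def variance_machine(xs):
    """s^2 = (sum x_i^2 - n * xbar^2) / (n - 1), the one-pass 'machine' form."""
    n = len(xs)
    xbar = sum(xs) / n
    return (sum(x * x for x in xs) - n * xbar ** 2) / (n - 1)

sample = [2.0, 4.0, 4.0, 5.0, 7.0]   # made-up illustrative data
s2 = variance_definitional(sample)    # 13.2 / 4 = 3.3
s = math.sqrt(s2)                     # estimates sigma
```

The machine form was convenient on calculators because it needs only the running sums of x and x². In floating point it can lose precision when the mean is large relative to the spread, so the definitional form is usually the safer choice in software.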
Moments (Cont.)
Third Moment: Skewness
$K_3 = \int_{-\infty}^{\infty} (x-\mu)^3 f(x)\,dx$
$K_3 < 0$: Mean < Median, Skewed “to the left”
$K_3 = 0$: Mean = Median, Symmetric
$K_3 > 0$: Mean > Median, Skewed “to the right”
Normalized as:
$\gamma_1 = \frac{K_3}{\sigma^3}$ (“Unit”-less)
Estimated by:
$k_3 = \frac{n \sum_{i=1}^{n} (x_i - \bar{x})^3}{(n-1)(n-2)}$
and
$g_1 = \frac{k_3}{s^3}$
“Machine” formula:
$k_3 = \frac{n \sum x_i^3 - 3 \sum x_i \sum x_i^2 + 2 (\sum x_i)^3 / n}{(n-1)(n-2)}$
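Fourth Moment: Kurtosis
$K_4 = \int_{-\infty}^{\infty} (x-\mu)^4 f(x)\,dx$
Normalized by:
$\gamma_2 = \frac{K_4}{\sigma^4} - 3$ (“Unit”-less)
Estimated by:
$k_4 = \frac{\sum (x_i - \bar{x})^4 \, n(n+1)/(n-1) - 3 \left[ \sum (x_i - \bar{x})^2 \right]^2}{(n-2)(n-3)}$
and
$g_2 = \frac{k_4}{s^4}$
Why –3?
Normal Distribution has uncorrected (normalized) 4th moment = 3
$\gamma_2 < 0$: Platy-kurtotic (flatter than the normal)
$\gamma_2 = 0$: Meso-kurtotic
$\gamma_2 > 0$: Lepto-kurtotic (more peaked than the normal)
So, for the normal distribution, $X \sim N(\mu, \sigma^2)$:
$\frac{K_4}{\sigma^4} = 3$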
Moment                    Symbol        For normal    Estimated by
1st moment (mean)         $\mu$         $\mu$         $\bar{x}$
2nd moment (variance)     $\sigma^2$    $\sigma^2$    $s^2$
3rd moment (skewness)     $\gamma_1$    0             $g_1$
4th moment (kurtosis)     $\gamma_2$    0             $g_2$
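The estimators of skewness and kurtosis can be written out directly from the definitional formulas. This is a sketch only: the function name `g1_g2` is illustrative, the kurtosis estimate is taken as k4 divided by s⁴, and the formula needs at least four observations because of the (n-2)(n-3) divisor.

```python
def g1_g2(xs):
    """Return (g1, g2), the sample skewness and kurtosis estimates."""
    n = len(xs)
    xbar = sum(xs) / n
    d2 = sum((x - xbar) ** 2 for x in xs)   # sum of squared deviations
    d3 = sum((x - xbar) ** 3 for x in xs)   # sum of cubed deviations
    d4 = sum((x - xbar) ** 4 for x in xs)   # sum of 4th-power deviations
    s2 = d2 / (n - 1)
    k3 = n * d3 / ((n - 1) * (n - 2))
    k4 = (d4 * n * (n + 1) / (n - 1) - 3 * d2 ** 2) / ((n - 2) * (n - 3))
    return k3 / s2 ** 1.5, k4 / s2 ** 2

# Symmetric, flat data: zero skewness, negative kurtosis
g1_flat, g2_flat = g1_g2([1, 2, 3, 4, 5])
# One large value pulls the mean above the median: positive skewness
g1_right, _ = g1_g2([1, 1, 1, 2, 10])
```

For the evenly spaced values 1 to 5 the skewness estimate is zero and the kurtosis estimate is negative, as expected for a flat, symmetric sample.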
Many distributions can be characterized by their first four moments.
The Pearson system (after Karl Pearson) classifies distributions in this way.
It is not used much any more.
So… Why the “Normal” Distribution?
Distribution of Means:
[Figure: the Universe generates random variables X1, X2, X3, ..., Xn, which are observed as {x1, x2, x3, ..., xn}.]
{x1, x2, x3, ..., xn}: These are data points. If we selected them again they would be different.
{X1, X2, X3, ..., Xn}: These are the random variables that generate {x1, x2, x3, ..., xn}.
Now
$\bar{x}_n = \sum_{i=1}^{n} x_i / n$
is a data point, but if we did the experiment over and over again, calculating a new $\bar{x}$ each time, they would be different and have a distribution based upon the random variable
$\bar{X}_n = \sum_{i=1}^{n} X_i / n$
So if {X1, X2, X3, ..., Xn} is a random sample, what’s the distribution of $\bar{X}_n$?
Start easy! n=2
If X1 and X2 are independent random variables (like a random sample) with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, then
$\mathrm{Mean}(aX_1 + bX_2) = a\mu_1 + b\mu_2$
$\mathrm{Variance}(aX_1 + bX_2) = a^2\sigma_1^2 + b^2\sigma_2^2$
For a general “n”:
If {X1, X2, X3, ..., Xn} is a random sample with mean $\mu$ and variance $\sigma^2$, then
$\mathrm{Mean}(\bar{X}) = \mathrm{Mean}\left(\sum X_i / n\right) = \sum \mathrm{Mean}(X_i) / n = \sum \mu / n = n\mu / n = \mu$
and
$\mathrm{Variance}(\bar{X}) = \sum_{i=1}^{n} \mathrm{Variance}(X_i) / n^2 = \sum \sigma^2 / n^2 = n\sigma^2 / n^2 = \sigma^2 / n$
Note: True for any random sample from any distribution with $\mu$ and $\sigma^2 < \infty$!
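The two results above can be verified exactly for a small discrete case. The sketch below assumes a fair six-sided die as the parent distribution (chosen only for illustration, it is not in the slides) and enumerates every possible sample of size 3, using exact fractions so no rounding is involved.

```python
from fractions import Fraction
from itertools import product

# Parent distribution: a fair die (an illustrative assumption)
faces = [Fraction(k) for k in range(1, 7)]

def mean_var(values):
    """Mean and variance of a list of equally likely values."""
    values = list(values)
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v

mu, sigma2 = mean_var(faces)            # 7/2 and 35/12

n = 3
# Every possible ordered sample of size n is equally likely
sample_means = [sum(sample) / n for sample in product(faces, repeat=n)]
mean_of_xbar, var_of_xbar = mean_var(sample_means)
```

Here `mean_of_xbar` equals `mu` and `var_of_xbar` equals `sigma2 / n` exactly, even though a die is nothing like a normal distribution.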
Distribution of $\bar{X}_n$
Central Limit Theorem:
If $X_1, \ldots, X_n$ is a random sample with mean $\mu$ and variance $\sigma^2 < \infty$, then
$\mathrm{Dist}(\bar{X}_n) \xrightarrow{n \to \infty} N(\mu, \sigma^2 / n)$
or more precisely
$\mathrm{Dist}\left( \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \right) \xrightarrow{n \to \infty} N(0,1)$
This means that if n is large enough, then $\bar{X}$ is normally distributed, or at least close enough.
How large does n have to be?
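Ex: The uniform distribution
$f(x) = \begin{cases} 1 & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$
$F(x) = \int_{-\infty}^{x} f(u)\,du = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } 0 \le x \le 1 \\ 1 & \text{if } x > 1 \end{cases}$
[Figure: the density f(x), equal to 1 between 0 and 1, and the distribution function F(x), rising linearly from 0 to 1.]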
How many samples are necessary before $\bar{X}_n$ is normally distributed?
Authority    Number
IBM          12
DAJ          ~30
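One way to get a feel for these numbers is to simulate means of uniform samples and compare them with the normal benchmarks (about 68% within 1 standard deviation, about 95% within 2). This is a rough sketch: the function names, the seed and the 20,000 trials are arbitrary choices, and the simulation does not by itself justify either authority's figure.

```python
import math
import random

def standardized_uniform_means(n, trials, seed=0):
    """Means of n Uniform(0,1) draws, standardized with mu = 1/2, sigma^2 = 1/12."""
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    out = []
    for _ in range(trials):
        xbar = sum(rng.random() for _ in range(n)) / n
        out.append((xbar - mu) / (sigma / math.sqrt(n)))
    return out

def fraction_within(zs, limit):
    """Share of standardized means with absolute value below limit."""
    return sum(1 for z in zs if abs(z) < limit) / len(zs)

# Normal benchmarks: about 0.683 within 1 and 0.954 within 2
for n in (1, 2, 12, 30):
    zs = standardized_uniform_means(n, trials=20000)
    print(n, round(fraction_within(zs, 1.0), 3), round(fraction_within(zs, 2.0), 3))
```

With n = 1 the fractions are clearly not normal (a single uniform value is never more than 2 standard deviations from its mean), while by n = 12 they are already close to the normal benchmarks.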
Biomedical considerations
Typical sample is actually a large sum of independent
elements all with about the same distribution.
or, at worst,
Sample is the sum of distinct clones
Testing the sample for the Normal Distribution!
Chi-Square Goodness of Fit
In the Chi-Square test we have observed and expected frequencies, which we then combine in the Chi-Square statistic.
For a continuous distribution and sample $x_1, x_2, \ldots, x_n$, form the histogram:
[Figure: histogram with bin counts h1, h2, h3, h4, h5.]
observed: h1, h2, h3, h4, h5
expected: ? From H0
Ex: 6.1
H0: $\mu = \bar{x}$; $\sigma^2 = s^2$; $f = N(\mu, \sigma^2)$
$\bar{x} = 70.17$, $s^2 = 10.96$, $s = 3.31$

Height Class    Height   f_i    z        Tabled value   P̂(Height)   Expected
< 62.5                   0                              .01025       .72
62.5 - 63.5     63       2      -2.317   .01025         .0117        .82
63.5 - 64.5     64       2      -2.015   .0219          .0215        1.51
64.5 - 65.5     65       3      -1.713   .0434          .0357        2.50
65.5 - 66.5     66       5      -1.410   .0791          .0547        3.83
66.5 - 67.5     67       4      -1.109   .1338          .0761        5.33
67.5 - 68.5     68       6      -.807    .2099          .0970        6.79
68.5 - 69.5     69       5      -.505    .3069          .1129        7.90
69.5 - 70.5*    70       8      -.202    .4198          .1199        8.39
70.5 - 71.5     71       7      .0997    .4603          .1164        8.15
71.5 - 72.5     72       7      .4018    .3439          .1032        7.22
72.5 - 73.5     73       10     .7039    .2407          .0835        5.85
73.5 - 74.5     74       6      1.006    .1572          .0618        4.33
74.5 - 75.5     75       3      1.308    .0954          .0417        2.92
75.5 - 76.5     76       2      1.610    .0537          .0258        1.81
76.5 - 77.5     77       0      1.912    .0279          .0145        1.02
> 77.5                   0      2.215    .0134          .0134        .94
Total                    70

z = (lower limit - $\bar{x}$) / s. The tabled value is the tail area beyond z. Expected = 70 · P̂(Height).
H0: X ~ N(70.17, 10.96)
How to Calculate the Probability of a “bin”:
[Figure: normal curve with the bin for height 63 shaded between 62.5 and 63.5.]
Pr(63) = Pr(< 63.5) - Pr(< 62.5) = $\Pr\left\{ y < \frac{63.5 - 70.17}{3.31} \right\} - \Pr\left\{ y < \frac{62.5 - 70.17}{3.31} \right\}$
Est #(63) = Pr(63) · 70
* P(70) = P(69.5 < X < 70.17) + P(70.17 < X < 70.5)
= (.5 - .4198) + (.5 - .4603)
= .0802 + .0397
= .1199
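The bin calculation can be sketched in a few lines. The helper names `normal_cdf` and `bin_probability` are illustrative; the mean, standard deviation and sample size are the values quoted in the example.

```python
import math

def normal_cdf(x, mean, sd):
    """Pr{X < x} for X ~ N(mean, sd^2), via the complementary error function."""
    return 0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2.0)))

def bin_probability(lower, upper, mean, sd):
    """Pr{lower < X < upper} as a difference of two cumulative probabilities."""
    return normal_cdf(upper, mean, sd) - normal_cdf(lower, mean, sd)

mean, sd, n = 70.17, 3.31, 70
p63 = bin_probability(62.5, 63.5, mean, sd)   # about .0117
p70 = bin_probability(69.5, 70.5, mean, sd)   # about .1199, the bin containing the mean
expected_63 = n * p63                          # about .82
```

Working with the cumulative probability on both sides means the bin that straddles the mean needs no special treatment, unlike the hand calculation with one-tailed table values.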
$\chi^2$ Rule of Thumb:
Not more than 20-25% of expected frequencies < 5
and no frequency < 1.
Now back to the example!
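Pooled (adjacent classes combined to satisfy the rule of thumb):
Classes             Observed   Expected
< 62.5, 63, 64      4          3.05
65, 66              8          6.33
67                  4          5.33
68                  6          6.79
69                  5          7.90
70                  8          8.39
71                  7          8.15
72                  7          7.22
73                  10         5.85
74, 75              9          7.25
76, 77, > 77.5      2          3.77 (= 1.81 + 1.02 + .94)
H0: X ~ N(70.17, 10.96)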
25.88
d.f. = 11 - 1 - 2 = 8 (1 for $\sum f_i = 70$; 2 for $\bar{x}$ and $s^2$)
$\chi^2_{0.05,8} = 15.507$
⇒ Accept H0: X is N(70.17, 10.96)
Does this mean that Height is Normally Distributed?
Yes, N(70.17,10.96)!
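The whole goodness-of-fit calculation for the height example can be sketched as follows. The pooling groups are an assumption, chosen to match the eleven pooled classes implied by the example, and the expected frequencies are computed directly from the mean, standard deviation and n rather than from rounded table entries, so the result differs slightly from a hand calculation.

```python
import math

def phi(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

mean, sd, n = 70.17, 3.31, 70
# Observed counts for classes <62.5, 63, 64, ..., 77, >77.5
observed = [0, 2, 2, 3, 5, 4, 6, 5, 8, 7, 7, 10, 6, 3, 2, 0, 0]
limits = [62.5 + k for k in range(16)]          # 62.5, 63.5, ..., 77.5

# Cumulative probability at each class limit, with 0 and 1 at the open ends
cdf = [0.0] + [phi((b - mean) / sd) for b in limits] + [1.0]
expected = [n * (cdf[i + 1] - cdf[i]) for i in range(17)]

# Pool adjacent classes with small expected frequencies (assumed grouping)
groups = [[0, 1, 2], [3, 4], [5], [6], [7], [8], [9], [10], [11],
          [12, 13], [14, 15, 16]]
obs_p = [sum(observed[i] for i in g) for g in groups]
exp_p = [sum(expected[i] for i in g) for g in groups]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_p, exp_p))
df = len(groups) - 1 - 2        # one for the total, two for estimated parameters
critical = 15.507               # chi-square critical value at 0.05 with 8 d.f.
print(f"chi2 = {chi2:.2f}, d.f. = {df}, reject H0: {chi2 > critical}")
```

Under these assumptions the statistic falls well below 15.507, so H0 is not rejected. Strictly, this shows only that the data are consistent with a normal distribution, not that height is proven to be normal.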
Probits: A graphical way to test Normality
The probit transformation:
The value of x that corresponds to a given probability if x is normal with mean 0 and variance 1
(sometimes with mean 3, to make the 95% confidence interval positive).
$P(x) = \int_{-\infty}^{x} f(u)\,du = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} y^2}\,dy$
[Figure: P(x) plotted against x, with the x axis running from -3 to 1.]
Use of probit paper:
If you plot normally distributed data, you get a straight line:
Use of probit transformation on SPSS
Calculate the ranked data using the Syntax window:
rank x. (Creates the ranks in variable rx)
compute px=rx/#samples. (Normalizes to 1.00)
compute probitx=probit(px).
The new variable can be plotted against x.
It should be close to a straight line if x is normally distributed.
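The same idea can be sketched outside SPSS. The version below is an illustration, not a translation of the syntax above: it uses the plotting position (rank - 0.5)/n instead of rank/n, an assumption made so that the largest observation does not get probability 1 (whose probit is infinite), and it does not average tied ranks. The data values are made up.

```python
from statistics import NormalDist

def probit_scores(xs):
    """Return (x, probit) pairs for the sorted data, using (rank - 0.5) / n."""
    n = len(xs)
    std_normal = NormalDist()            # mean 0, variance 1
    ordered = sorted(xs)
    return [(x, std_normal.inv_cdf((rank - 0.5) / n))
            for rank, x in enumerate(ordered, start=1)]

data = [4.1, 5.3, 3.8, 6.0, 4.9, 5.5, 4.4]   # made-up illustrative data
pairs = probit_scores(data)
for x, probit in pairs:
    print(f"{x:5.1f}  {probit:7.3f}")
```

Plotting the second column against the first gives the probit plot; points lying close to a straight line suggest the data are approximately normal.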
Statistics: SAMPLE
N Valid                     61
N Missing                   0
Mean                        -9.8130E-02
Std. Error of Mean          .1278
Median                      -.2140
Mode                        -2.74 (a)
Std. Deviation              .9984
Variance                    .9968
Skewness                    .015
Std. Error of Skewness      .306
Kurtosis                    .061
Std. Error of Kurtosis      .604
Range                       5.14
Minimum                     -2.74
Maximum                     2.40
Percentiles 25              -.8258
Percentiles 50              -.2140
Percentiles 75              .6397
a. Multiple modes exist. The smallest value is shown.
Kolmogorov-Smirnov
goodness of fit test to the normal distribution
Compare the frequency polygon to the hypothesized curve
1. Calculate the frequency polygon.
2. H0: N(mean, variance) or other distribution
3. Find or calculate the parameters (mean, variance, etc.)
4. Calculate the distribution function using H0
5. Compare:
$d^+ = \max_i \left( F(x_{(i)}) - \hat{F}(x_{(i)}) \right)$
$d^- = \min_i \left( F(x_{(i)}) - \hat{F}(x_{(i)}) \right)$
$D = \max\left( |d^+|, |d^-| \right)$
where
$F(x_{(i)})$ is the Hypothesized Distribution at order statistic $x_{(i)}$
$\hat{F}(x_{(i)})$ is the frequency polygon at $x_{(i)}$
[Figure: cumulative frequency polygon (0.0 to 1.0) plotted against x (from -4 to 2), together with the hypothesized distribution function.]
One-Sample Kolmogorov-Smirnov Test: SAMPLE
N                                          61
Normal Parameters (a,b)   Mean             -9.8130E-02
                          Std. Deviation   .9984
Most Extreme Differences  Absolute         .090
                          Positive         .090
                          Negative         -.070
Kolmogorov-Smirnov Z                       .706
Asymp. Sig. (2-tailed)                     .701
a. Test distribution is Normal.
b. Calculated from data.
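A one-sample statistic of this kind can be sketched as below. This uses the usual step-function form, comparing the hypothesized distribution with the cumulative frequency just before and just after each ordered observation. That is an assumption about the details and is not claimed to reproduce the SPSS procedure. The function names and the three-point test sample are illustrative.

```python
import math

def normal_cdf(x, mean, sd):
    """Hypothesized distribution function F(x) for N(mean, sd^2)."""
    return 0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2.0)))

def ks_statistic(xs, mean, sd):
    """Largest absolute gap between the empirical and hypothesized distributions."""
    n = len(xs)
    ordered = sorted(xs)
    # Empirical step function is i/n just after x_(i) and (i-1)/n just before it
    above = max(i / n - normal_cdf(x, mean, sd)
                for i, x in enumerate(ordered, start=1))
    below = max(normal_cdf(x, mean, sd) - (i - 1) / n
                for i, x in enumerate(ordered, start=1))
    return max(above, below)

d = ks_statistic([-1.0, 0.0, 1.0], mean=0.0, sd=1.0)   # about 0.1747
```

In the output above, the Kolmogorov-Smirnov Z of .706 matches the absolute difference of .090 multiplied by the square root of N = 61, which is how the statistic is scaled before the significance level is looked up.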
But, these are tests of the
original distribution!
Do we necessarily care what that
distribution is?
We more often want to compare
the parameters of the distribution.
Has the distribution moved as a result of
treatment? Different populations?
Is the new treatment “better” than the old?