Normal quantile plots
Assessing Normality – intro
IPS 7e, pp. 65-67
Calculate the mean and standard deviation from the data; together these determine the complete shape of a normal distribution. If the data come from a normal distribution, they should "match up" (more or less) with the normal curve that has the same mean and standard deviation.
Picture: Superimpose a normal curve on a histogram, using its mean and s.d.
[Figure: histogram of STDNORM1 with a superimposed normal curve. Stdnorm1 is standard normal data.]
Call Center 80, pp.15-16
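As a numerical companion to the picture, here is a minimal Python sketch (not part of the SPSS procedure) that computes how many observations a normal curve with the data's own mean and standard deviation would put in each histogram bin, so they can be compared with the bar heights. The function names, the tiny data set, and the bin edges are made up for illustration.

```python
from statistics import NormalDist, mean, stdev


def observed_bin_counts(data, edges):
    """Count observations in each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i, (lo, hi) in enumerate(zip(edges, edges[1:])):
            if lo <= x < hi:
                counts[i] += 1
                break
    return counts


def expected_bin_counts(data, edges):
    """Counts that a normal curve with the data's own mean and sd predicts per bin."""
    curve = NormalDist(mean(data), stdev(data))
    n = len(data)
    return [n * (curve.cdf(hi) - curve.cdf(lo)) for lo, hi in zip(edges, edges[1:])]


# Made-up, roughly bell-shaped data and bin edges, for illustration only.
data = [-2, -1, -1, 0, 0, 0, 1, 1, 2]
edges = [-3, -1, 1, 3]
print(observed_bin_counts(data, edges))
print([round(c, 2) for c in expected_bin_counts(data, edges)])
```

Bars that sit far above or below the expected counts in several bins are the numerical version of the histogram not "matching up" with the curve.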
Normal quantiles
The “idea” by hand: Look at the “usual” percentiles in the Call Ctr. data.
(Row A: actual Call Ctr. percentiles from SPSS. Row B: z’s from those percentiles.)
IF a distribution is Normal, its percentiles should (when standardized) match the same percentiles in a Normal Z distribution. You found the z’s for the 10th and 25th percentiles of a Normal distribution in the HW (#1.46, 1.49); I’ve added the 5th and 95th.
Row C: IPS calls these “Normal scores.”
Row D: the Normal scores (z’s for percentiles in a Normal distribution) turned back into lengths.
Call Center 80 percentiles

Percentile                                                5th      10th     25th     50th     75th     90th     95th
Actual A: length (seconds), SPSS %iles                    3.05     9.20     54.25    103.50   200.50   432.80   700.00
Actual B: z = (length-mean)/sd = (x-196.58)/342.022       -.57     -.55     -.42     -.27     .01      .69      1.47
Normal C: Normal score (% to table z)                     -1.645   -1.282   -0.674   0        0.674    1.282    1.645
Normal D: Mean + Normal score*sd (seconds)                -366.00  -241.74  -34.11   196.58   427.27   634.90   759.16
IF the data are Normal, the “real” z-scores (B) should match the expected Normal-score z’s (C), and the “real” lengths should match the lengths calculated from the expected Normal-score z’s (A = D).
Going from A to B, and from C to D, is always just a linear transformation: a change in axis labeling but nothing more. So graphing either A or B against either C or D should give a straight line, IF the data are Normal. They are NOT Normal here.
[Figures: IPS Normal quantile plot, y = A, x = C (cf. p. 66, Fig. 1.30); SPSS Q-Q type plot (transposed), y = A, x = D.]
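The arithmetic behind rows B, C, and D can be checked with a short Python sketch. It assumes only the numbers printed in the table (the seven SPSS percentiles, mean 196.58, sd 342.022) and uses the standard library's `statistics.NormalDist().inv_cdf` in place of a z table.

```python
from statistics import NormalDist

MEAN, SD = 196.58, 342.022  # Call Center 80 mean and sd, as given in the table
percentiles = [0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95]
row_a = [3.05, 9.20, 54.25, 103.50, 200.50, 432.80, 700.00]  # SPSS percentiles (seconds)

row_b = [(x - MEAN) / SD for x in row_a]                # actual z-scores
row_c = [NormalDist().inv_cdf(p) for p in percentiles]  # Normal scores (table z's)
row_d = [MEAN + SD * z for z in row_c]                  # Normal scores back in seconds

for p, a, b, c, d in zip(percentiles, row_a, row_b, row_c, row_d):
    print(f"{p:>5.2f} {a:>8.2f} {b:>6.2f} {c:>7.3f} {d:>9.2f}")
```

The printed rows should agree with the table up to rounding; comparing the B column with the C column is the “by hand” normality check.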
Normal Quantiles (in general): We can find (by computer) the percentile of each observation (the proportion of the data actually below that value). Then we can figure out what z-value this would correspond to IF the data DO come from a normal distribution; that is, we compare the percentile of our observation to the location of the same percentile in a Normal distribution.
If we graph data from a normal distribution this way, with the actual data values on one axis and the expected-if-Normal z- (or x-) values on the other, the points should lie on a straight line (it's just a linear transformation). But if the distribution is not normal, the percentiles won't fall in the right places, and the points won't lie on a straight line.
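The general recipe can be sketched in Python. This is an illustration, not what SPSS does internally: it uses the simple percentile convention (i - 0.5)/n for the i-th smallest of n values (SPSS offers Blom's and other formulas), and as made-up "perfectly Normal" data it uses evenly spaced percentiles of a Normal(100, 15) distribution. The function name is hypothetical.

```python
from statistics import NormalDist


def normal_quantile_points(data):
    """Pair each sorted observation with the z a Normal distribution puts at
    the same percentile. The i-th smallest of n values is treated as the
    (i - 0.5)/n percentile: one simple convention among several."""
    xs = sorted(data)
    n = len(xs)
    std = NormalDist()
    return [(std.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]


# Hypothetical "ideal" Normal(100, 15) data: its own evenly spaced percentiles.
normal_like = [NormalDist(100, 15).inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
points = normal_quantile_points(normal_like)

# Slope between each pair of neighbouring points (x = expected z, y = data).
slopes = [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(points, points[1:])]
print(min(slopes), max(slopes))
```

Because these points satisfy data = 100 + 15z, every slope comes out equal to the sd, 15: a straight line, exactly as the linear-transformation argument predicts.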
NormalQuantile13.doc
Normal quantile-like plots with SPSS 21
Method 1: Built-in “Q-Q” plot. This reverses IPS’s axes! (But we can flip it.)
Analyze>Descriptive Statistics>Q-Q Plots
Click your variable across into the Variables box. The rest should be mostly OK:
Test Distribution: Normal
Proportion Estimation Formula: Blom’s
Rank Assigned to Ties: Mean
(This means that if there is a tie for 3rd, 4th, and 5th, they’ll all be assigned rank “4th.”)
Granularity (lots of equal values, or ties, due to a coarse measurement scale or rounding): Choosing Break ties arbitrarily will arbitrarily call one of the identical numbers 3rd, another 4th, and the last 5th. This results in little “straight line” patterns like the one in Fig. 1.31, p. 67, where 4 countries had the value 23. Choosing the default, Mean, plots them all at the “center” point.
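The two tie rules can be mimicked in a few lines of Python. This is a sketch of the idea, not SPSS's code; in particular, "arbitrarily" is implemented here as "in order of appearance", which is an assumption for illustration, not SPSS's documented rule. The function names are hypothetical.

```python
def ranks_mean_ties(values):
    """'Mean' rule: tied values all get the average of the positions they occupy."""
    order = sorted(values)
    rank_of = {}
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and order[j] == order[i]:
            j += 1
        # 1-based positions i+1 .. j are tied; each gets their average
        rank_of[order[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[v] for v in values]


def ranks_break_ties_arbitrarily(values):
    """Ties get consecutive ranks; here, in the order they appear (an assumption)."""
    order = sorted(range(len(values)), key=lambda k: values[k])  # stable sort
    ranks = [0] * len(values)
    for position, k in enumerate(order, start=1):
        ranks[k] = position
    return ranks


data = [1, 2, 9, 9, 9, 11]  # a tie for 3rd, 4th, and 5th
print(ranks_mean_ties(data))               # [1.0, 2.0, 4.0, 4.0, 4.0, 6.0]
print(ranks_break_ties_arbitrarily(data))  # [1, 2, 3, 4, 5, 6]
```

With the Mean rule the three 9's share one rank and so plot as a single point; breaking ties spreads them over three ranks, which is what produces the little vertical "straight line" runs.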
[Figure panels: IPS, SPSS]
The default gives both axes in original units. You can choose Transform: standardized values to get both in standardized (z) form. (I think this is easier to understand than IPS, which uses original units on one axis and standardized units on the other.)
The graph has a line, y = x. For Normal data, the dots should lie along this line.
Flip axes to get the IPS form: Double-click the graph to go into the Chart Editor, then do Options>Transpose Chart. (I also clicked on and deleted the line.)
[Figure: axes flipped]
Interpreting: Obviously, if the data lie cleanly along the line, the distribution is pretty normal. If not, how do we interpret it? If most values lie along a straight line (not necessarily SPSS’s line) and a few depart, the departures are outliers.
If the points trail off from a straight-line pack in a curve at the “top” end, with observed values bigger than expected (concave up in IPS), that’s right skewed. If the observed values trail off at the bottom end (smaller than expected; concave down in IPS), that’s left skewed.
If they make an “S” curve, the data are either pointier or squarer than a normal distribution, depending on which way the S goes (and on what’s on the horizontal axis).
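The skewness patterns can be checked numerically with a hedged Python sketch. For each sorted observation it computes the observed z-score minus the expected Normal score (Blom's formula, the SPSS default); in IPS's orientation a positive value means the point sits above where a Normal distribution would put it. The "data" are made up: evenly spaced percentiles of an exponential distribution (a standard right-skewed shape), and their negatives as a left-skewed version. The function name is hypothetical.

```python
from math import log
from statistics import NormalDist, mean, stdev


def qq_deviations(data):
    """For each sorted observation: observed z minus expected Normal score.
    Positive means larger than a Normal with the same mean and sd predicts."""
    xs = sorted(data)
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    std = NormalDist()
    return [
        (x - m) / s - std.inv_cdf((i - 0.375) / (n + 0.25))  # Blom's proportion
        for i, x in enumerate(xs, start=1)
    ]


n = 99
# Made-up right-skewed data: evenly spaced percentiles of an exponential distribution.
right_skewed = [-log(1 - (i - 0.5) / n) for i in range(1, n + 1)]
left_skewed = [-x for x in right_skewed]  # mirror image: left-skewed

right = qq_deviations(right_skewed)
left = qq_deviations(left_skewed)
print("right: bottom", round(right[0], 2), "middle", round(right[n // 2], 2), "top", round(right[-1], 2))
print("left:  bottom", round(left[0], 2), "middle", round(left[n // 2], 2), "top", round(left[-1], 2))
```

For the right-skewed data both ends come out positive and the middle negative (the concave-up shape); the left-skewed version gives the mirror image (concave down).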
The Detrended graph (which comes free with the original) has the same x-values, but its y-values are the vertical distances from the line, reversed! Does this help in interpreting? I’m not sure.
Another way to get the plot: Analyze>Descriptive Statistics>Explore, Plots: Normality plots with tests. This has reversed axes like the “native” SPSS Q-Q plot, but is labeled like IPS, with z-scores on one axis and “raw” values on the other.
Method 2 (replicate the IPS graphs “step by step”):
Calculate, for each observation, what percentile it is; then what the corresponding standard normal value is; then what that value should be in the original units. Luckily, SPSS makes some of this straightforward: it will make the new variables we need.
Transform>Rank Cases
Click your variable across to the Variables box.
Ties button: Use the default, Mean. (This dialog doesn’t have the Break ties arbitrarily option.)
Rank Types button: Choose Proportion estimates, which gives percentiles (as .135 instead of 13.5%). Note there are 4 choices for how to compute percentiles; Blom is fine. Also choose Normal scores, which gives the z-score corresponding to the percentile.
Click Continue, then OK. This creates 3 new variables from the original variable length: Rlength (the ranks, from 1st to 80th), Plength (proportions = percentiles), and Nlength (expected Normal scores).
Graphs>Legacy Dialogs>Scatter/Dot>Simple Scatter
Drag the original variable (length here) to the vertical (Y) axis and the N-variable (Nlength here) to the horizontal (X) axis to get the graph. (I’ve prettied it up a bit in the Chart Editor.)
Optional: You can also create a new variable with the standard normal values transformed back to original units, using Transform>Compute Variable (p. 8 of the big SPSS handout). Here, create ExpLength (Expected Normal Length) = 196.58 + 342.022*Nlength, since expected normal = mean + sd*z-score.
Graph the original values against the expected normal values.
Selected data (compare with the hand computation on p. 1; Q1, the median, and Q3 are marked):
Length  Plength  Nlength  ExpLength
1       .008     -2.419   -630.67
2       .026     -1.935   -465.32
2       .026     -1.935   -465.32
3       .045     -1.694   -382.67
4       .058     -1.575   -342.09
9       .083     -1.388   -278.18
9       .083     -1.388   -278.18
9       .083     -1.388   -278.18
11      .107     -1.240   -227.55
19      .126     -1.145   -194.93
19      .126     -1.145   -194.93
~~~~~~~~~~~~~~~~~~~~
51      .220     -.773    -67.96
52      .232     -.732    -53.78
54      .245     -.692    -40.01
55      .257     -.653    -26.62    Q1
56      .269     -.614    -13.56
57      .282     -.577    -.81
59      .294     -.541    11.68
64      .307     -.505    23.93
~~~~~~~~~~~~~~~~~~~~
88      .444     -.141    148.35
89      .456     -.110    159.11
90      .469     -.078    169.84
102     .481     -.047    180.55
103     .494     -.016    191.24
104     .506     .016     201.92    median
106     .519     .047     212.61
~~~~~~~~~~~~~~~~~~~~
148     .674     .452     351.26
157     .693     .505     369.23
178     .706     .541     381.48
179     .718     .577     393.97
182     .731     .614     406.72
199     .743     .653     419.78
201     .755     .692     433.17    Q3
203     .768     .732     446.94
211     .780     .773     461.12
~~~~~~~~~~~~~~~~~~~~
325     .855     1.059    558.69
367     .868     1.115    577.98
372     .880     1.175    598.56
386     .893     1.240    620.71
438     .905     1.310    644.80
465     .917     1.388    671.34
479     .930     1.475    701.10
700     .949     1.631    754.56
700     .949     1.631    754.56
951     .967     1.842    826.71
1148    .980     2.049    897.26
2631    .992     2.419    1023.83
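The Rank Cases arithmetic can be reproduced in Python starting from a case's rank. Blom's proportion estimate is (r - 3/8)/(n + 1/4); with n = 80 it reproduces the Plength column above (for example, rank 1 gives .008). The ranks used below (1st for length 1, 2.5 for the two tied 2's, 41st for length 104) are our inference from the Plength values, not numbers printed in the handout, and the function names are hypothetical.

```python
from statistics import NormalDist

N_CASES, MEAN, SD = 80, 196.58, 342.022  # Call Center 80 values from the text


def blom_proportion(rank, n):
    """Blom's proportion estimate: (r - 3/8) / (n + 1/4)."""
    return (rank - 3 / 8) / (n + 1 / 4)


def rank_to_row(rank, n=N_CASES, mean=MEAN, sd=SD):
    """Plength, Nlength and ExpLength for one case with the given (mean) rank."""
    p = blom_proportion(rank, n)   # like Plength
    z = NormalDist().inv_cdf(p)    # like Nlength
    return p, z, mean + sd * z     # ExpLength = mean + sd*z


# Ranks inferred from the Plength column (an assumption, see lead-in).
for length, rank in [(1, 1), (2, 2.5), (104, 41)]:
    p, z, exp_len = rank_to_row(rank)
    print(f"{length:>4} {p:.3f} {z:7.3f} {exp_len:8.2f}")
```

The printed values should match the corresponding rows of the table to rounding.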
How close does “normal” data come to the straight line?
Three sets of data generated from the standard normal distribution (Transform, RV.NORMAL(0,1); big handout, p. 8 bottom):
[Figures: histograms of STDNORM1, STDNORM2, and STDNORM3; Normal Q-Q plots of STDNORM1, STDNORM2, and STDNORM3 (Observed Value vs. Expected Normal).]
(These are the “native” SPSS Q-Q plots, with axes reversed from IPS. Since there’s little deviation from a straight line, the reversal doesn’t cause a problem in interpretation.)
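A similar experiment can be run outside SPSS. This Python sketch draws three standard normal samples with `random.Random.gauss` (the sample size 80 and the seed are assumptions, not the handout's settings) and summarizes how straight each normal quantile plot is by the correlation between the sorted data and Blom's normal scores; values near 1 mean the points hug a straight line. This correlation is one common summary, not part of the SPSS output shown here.

```python
import random
from statistics import NormalDist, mean


def qq_correlation(data):
    """Correlation between sorted data and Blom normal scores (1.0 = perfectly straight)."""
    xs = sorted(data)
    n = len(xs)
    zs = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, mz = mean(xs), mean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / (sxx * szz) ** 0.5


rng = random.Random(13)  # arbitrary seed so reruns give the same samples
samples = {
    name: [rng.gauss(0, 1) for _ in range(80)]  # sample size 80 is an assumption
    for name in ("STDNORM1", "STDNORM2", "STDNORM3")
}
for name, data in samples.items():
    print(name, round(qq_correlation(data), 3))
```

Even genuinely normal samples wiggle a little around the line, so a correlation slightly below 1 is expected; the skewed Call Center data would give a clearly lower value.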