Assessing Normality: intro (IPS 7e, pp. 65-67)

Calculate the mean and standard deviation from the data. These determine the complete shape of a normal distribution. If the data are from a normal distribution, they should "match up" (more or less) with the normal curve with the same mean and standard deviation.

Picture: Superimpose a normal curve on a histogram, using its mean and s.d.

[Figures: histograms with superimposed normal curves. Call Center 80 (pp. 15-16); STDNORM1 (Stdnorm1 is standard normal data).]

Normal quantiles

The "idea" by hand: Look at the "usual" percentiles in the Call Ctr. data. (Row A: actual Call Ctr. percentiles from SPSS. Row B: z's from those percentiles.) IF a distribution is Normal, its percentiles should (when standardized) match the same percentiles in a Normal Z distribution. You found z's for the 10th and 25th percentiles in a Normal dist. for HW (#1.46, 1.49). I've added the 5th and 95th. Row C: IPS calls these "Normal scores." Row D: the Normal scores (z's for percentiles in a Normal dist.) turned to lengths.

Percentiles, Call Center 80

                                                                5th     10th     25th     50th     75th     90th     95th
Actual  A  Length (seconds): SPSS %iles                        3.05     9.20    54.25   103.50   200.50   432.80   700.00
        B  z = (length - mean)/sd = (x - 196.58)/342.022       -.57     -.55     -.42     -.27      .01      .69     1.47
Normal  C  Normal score (% to table z)                       -1.645   -1.282   -0.674        0    0.674    1.282    1.645
        D  Mean + Normal score*sd (seconds)                 -366.00  -241.74   -34.11   196.58   427.27   634.90   759.16

IF the data are Normal, the "Real" z-scores (B) should match the expected Normal score z's (C). The "Real" lengths should match the lengths calculated from the expected Normal score z's (A = D). From A to B, and from C to D, are always just linear transformations: changes in axis labeling but nothing more. So graphing either of A or B against either of C or D should give a straight line, IF the data are Normal. NOT Normal here.

IPS Normal Quantile: y = A, x = C (cf. p. 66, Fig. 1.30)
SPSS Q-Q type (transposed): y = A, x = D

Normal Quantiles (in general): We can find (by computer) the percentile value of each observation (the proportion actually below that value). Then we can figure, if it IS from a normal distribution, what z-value this would correspond to; that is, we compare the percentile of our observation to the place of the same percentile in a Normal distribution. If we graph data from a normal distribution using this method, with actual data values on one axis and the expected-if-Normal z- (or x-) values on the other, the points should lie on a straight line (it's just a linear transformation). But if it's not a normal distribution, the percentiles won't lie in the right place and they won't lie on a straight line.

NormalQuantile13.doc 1

Normal quantile-like plots with SPSS 21

Method 1: Built-in "Q-Q" plot
This reverses IPS's axes! (But we can flip it.)

Analyze > Descriptive Statistics > Q-Q Plots
Click your variable across into the Variables box. The rest should be mostly OK:
  Test Distribution: Normal
  Proportion Estimation Formula: Blom's
  Rank Assigned to Ties: Mean (This means that if there is a tie for 3rd, 4th, and 5th, they'll all be assigned rank "4th.")

Granularity (lots of equal values (ties), due to coarse measurement scale or rounding): Choosing Break ties arbitrarily will arbitrarily call one of the identical numbers 3rd, another 4th, the last 5th. This will result in little "straight line" patterns like the one in Fig. 1.31, p. 67, where 4 countries had value 23. Choosing the default, Mean, will plot them all at the "center" point.

[Figures: IPS plot; SPSS plot]

The default gives both axes in original units. You can choose Transform: standardized values and get both in standardized (z) form. (I think this is easier to understand than IPS, which uses original units on one axis and standardized on the other.) The graph has a line, y = x. For Normal data, the dots should lie along this line.
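The hand computation in the Call Center 80 percentile table (Rows B, C, and D) can be sketched in a few lines of Python. This is only an illustration, not part of the SPSS procedure: the percentiles, mean, and s.d. are copied from the table, and the standard library's `statistics.NormalDist` stands in for the z table. The variable names are made up for this example.

```python
from statistics import NormalDist

# Values copied from the Call Center 80 table: Row A, plus the mean and s.d.
percents = [5, 10, 25, 50, 75, 90, 95]
actual = [3.05, 9.20, 54.25, 103.50, 200.50, 432.80, 700.00]  # Row A (seconds)
mean, sd = 196.58, 342.022

std_normal = NormalDist()  # standard normal: mean 0, s.d. 1

# Row B: standardize the actual percentiles.
row_b = [(x - mean) / sd for x in actual]
# Row C: Normal scores, the z at the same percentile of a Normal distribution.
row_c = [std_normal.inv_cdf(p / 100) for p in percents]
# Row D: Normal scores turned back into lengths (mean + z * sd).
row_d = [mean + z * sd for z in row_c]

for p, a, b, c, d in zip(percents, actual, row_b, row_c, row_d):
    print(f"{p:>2}th  A={a:8.2f}  B={b:6.2f}  C={c:7.3f}  D={d:8.2f}")
```

If the data were Normal, the B and C columns (and the A and D columns) would roughly agree; for the call lengths they clearly do not.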
Flip axes, to get IPS form: Double-click the graph to go into the Chart Editor, then do Options > Transpose Chart. (I also clicked on and deleted the line.)

[Figure: Axes flipped]

Interpreting: Obviously, if the data lie cleanly along the line, the distribution is pretty normal. If not, how to interpret?
  If most values lie along a straight line (not necessarily SPSS's line) and a few depart, the departures are outliers.
  If they trail off from a straight-line pack in a curve at the "top" end, with observed values bigger than expected (concave up in IPS), that's right skewed.
  If the observed values trail off at the bottom end (smaller than expected; concave down in IPS), that's left skewed.
  If they make an "S" curve, the data are either pointier or squarer than a normal distribution, depending on which way the S goes. (And depending on what's on the horizontal axis.)

The Detrended graph (comes free with the original) has the same x-values, but the y-values are the vertical distances from the line, reversed! Does this help in interpreting? I'm not sure.

Another way to get the plot: Analyze > Descriptive Statistics > Explore: Plots: Normality plots with tests. This has reversed axes like the "native" SPSS Q-Q, but is labeled like IPS: z-scores on one axis, "raw" on the other.

Method 2 (Replicate IPS graphs "step by step"): Calculate, for each observation, what percentile it is; then what the corresponding standard normal value is; then what that value should be in the original units. Luckily, SPSS makes some of this straightforward. SPSS will make us the new variables we need.

Transform > Rank Cases
Click your variable across to the Variables box.
Ties button: Use the default, Mean. (It doesn't have the Break ties arbitrarily option.)
Rank Types button: Choose Proportion estimates, which gives percentiles (as .135 instead of 13.5%). Note there are 4 choices for how to compute percentiles; Blom is fine. Choose Normal scores, which gives the z-score corresponding to the percentile. Continue.
OK creates 3 new variables from the original variable length: Plength (proportions = percentiles), Nlength (expected Normal scores), and Rlength (the ranks, from 1st to 80th).

Graphs > Legacy Dialogs > Scatter/Dot > Simple Scatter
Drag the original variable (length here) to the vertical (Y) axis and the N-variable (Nlength here) to the horizontal (X) axis. Get this graph. (I've prettied it up a bit, in the Chart Editor.)

[Figure: length (Y) against Nlength (X)]

Optional: You can also create a new variable with the standard normal values transformed back to original units: Transform > Compute Variable (p. 8 of big SPSS handout). Here create ExpLength (Expected Normal Length) = 196.58 + 342.022*Nlength. Expected normal = mean + sd*zscore. Graph the original values against the expected normal values.

Selected data: compare with hand computation, p. 1. The Q1, median, and Q3 rows are marked.

Length  Plength  Nlength  Explength
     1     .008   -2.419    -630.67
     2     .026   -1.935    -465.32
     2     .026   -1.935    -465.32
     3     .045   -1.694    -382.67
     4     .058   -1.575    -342.09
     9     .083   -1.388    -278.18
     9     .083   -1.388    -278.18
     9     .083   -1.388    -278.18
    11     .107   -1.240    -227.55
    19     .126   -1.145    -194.93
    19     .126   -1.145    -194.93
~~~~~~~~~~~~~~~~~~~~
    51     .220    -.773     -67.96
    52     .232    -.732     -53.78
    54     .245    -.692     -40.01
    55     .257    -.653     -26.62   Q1
    56     .269    -.614     -13.56
    57     .282    -.577       -.81
    59     .294    -.541      11.68
    64     .307    -.505      23.93
~~~~~~~~~~~~~~~~~~~~
    88     .444    -.141     148.35
    89     .456    -.110     159.11
    90     .469    -.078     169.84
   102     .481    -.047     180.55
   103     .494    -.016     191.24
   104     .506     .016     201.92   median
   106     .519     .047     212.61
~~~~~~~~~~~~~~~~~~~~
   148     .674     .452     351.26
   157     .693     .505     369.23
   178     .706     .541     381.48
   179     .718     .577     393.97
   182     .731     .614     406.72
   199     .743     .653     419.78
   201     .755     .692     433.17   Q3
   203     .768     .732     446.94
   211     .780     .773     461.12
~~~~~~~~~~~~~~~~~~~~
   325     .855    1.059     558.69
   367     .868    1.115     577.98
   372     .880    1.175     598.56
   386     .893    1.240     620.71
   438     .905    1.310     644.80
   465     .917    1.388     671.34
   479     .930    1.475     701.10
   700     .949    1.631     754.56
   700     .949    1.631     754.56
   951     .967    1.842     826.71
  1148     .980    2.049     897.26
  2631     .992    2.419    1023.83

How close does "normal" data come to the straight line?

Three sets of data generated from the standard normal distribution (Transform, RV.NORMAL(0,1); big handout p. 8, bottom):

[Table: summary statistics for STDNORM1, STDNORM3, STDNORM2]

[Figures: histograms of STDNORM1, STDNORM3, and STDNORM2; Normal Q-Q Plots of STDNORM1, STDNORM3, and STDNORM2 (Observed Value on the horizontal axis, Expected Normal on the vertical axis)]

(These are the "native" SPSS Q-Q plots, with reversed axes from IPS. Since there's little deviation from straight, the reversal doesn't cause a problem in interpretation.)
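For readers who want to see the arithmetic behind Method 2 and the "how close is normal data" check, here is a minimal Python sketch. It assumes mean ranks for ties and Blom's proportion estimate, (rank - 3/8)/(n + 1/4), which matches the Plength values in the selected-data table. The function names are hypothetical, and `random.gauss(0, 1)` is only a stand-in for SPSS's RV.NORMAL(0,1); this is not SPSS's own code.

```python
import random
from statistics import NormalDist, mean


def mean_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def blom_proportion(rank, n):
    """Blom's proportion estimate (the 'Plength' column)."""
    return (rank - 0.375) / (n + 0.25)


def normal_scores(values):
    """Expected standard normal z for each observation (the 'Nlength' column)."""
    n = len(values)
    return [NormalDist().inv_cdf(blom_proportion(r, n)) for r in mean_ranks(values)]


def pearson_r(xs, ys):
    """Plain Pearson correlation, used here as a rough straightness measure."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Check against the handout: the smallest of 80 observations (rank 1).
p1 = blom_proportion(1, 80)
z1 = NormalDist().inv_cdf(p1)
print(round(p1, 3), round(z1, 3), round(196.58 + 342.022 * z1, 2))

# Simulated standard normal data, 80 cases (seed chosen arbitrarily).
random.seed(1)
data = [random.gauss(0, 1) for _ in range(80)]
r = pearson_r(data, normal_scores(data))
print("correlation of data with Normal scores:", round(r, 3))
```

A correlation close to 1 corresponds to points lying close to a straight line on the plot. It is a rough numerical companion to looking at the graph, not a replacement for it.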