STATGRAPHICS – Rev. 1/10/2005
© 2005 by StatPoint, Inc.

Multiple Sample Comparison SnapStat

Summary

The Multiple Sample Comparison SnapStat creates a one-page summary that compares two or more independent samples of variable data. It includes tests to determine whether or not there are significant differences between the means and/or standard deviations of the populations from which the samples were taken. In addition, the data is displayed graphically using a multiple scatterplot, a multiple box-and-whisker plot, a means plot, and an ANOM plot. The calculations performed are a subset of those performed in the Multiple Sample Comparison procedure. However, the output is preformatted to fit on a single page.

Sample StatFolio: multsamsnapstat.sgp

Sample Data:

The file pulse rates.sf6 contains the results of an experiment reported by Milliken and Johnson (1992) in which 78 workers were assigned at random to six groups. Each group was given a work task to perform, and pulse rates were measured after each individual had worked on his assigned task for one hour. After several individuals dropped out of the study, the final data were:

Task 1   Task 2   Task 3   Task 4   Task 5   Task 6
27       29       34       34       28       28
31       28       36       34       28       26
26       37       34       43       26       29
32       24       41       44       35       25
39       35       30       40       31       35
37       40       44       47       30       34
38       40       44       34       34       37
39       31       32       31       34       28
30       30       32       45       26       21
28       25       31       28       20       28
27       29                         41       26
27       25                         21
34

The final n = 68 measurements have been arranged in q = 6 columns, one for each group of subjects. Alternatively, the data could have been arranged in a table with all of the pulse rates in a single column, together with a column identifying which task the subject was given. A portion of such a file is shown below:
Subject   Pulse Rate   Task
1         27           1
2         31           1
3         26           1
4         32           1
5         39           1
6         37           1
7         38           1
8         39           1
9         30           1
10        28           1
11        27           1
12        27           1
13        34           1
14        29           2
15        28           2
16        37           2
17        24           2
18        35           2
19        40           2
20        40           2
21        31           2
22        30           2
23        25           2
24        29           2
25        25           2
26        34           3
…         …            …

Either data structure can be analyzed by the Multiple Sample Comparison SnapStat procedure. If the same data is to be used in other procedures such as the General Linear Models procedure, it should be structured in the second manner.

Data Input

When the Multiple Sample Comparison SnapStat is selected from the main menu, the first dialog box displayed asks you to specify the format in which the data has been entered:

• Multiple Data Columns: indicates that each sample has been placed into a separate column.
• Data and Code Columns: indicates that all observations have been placed into a single column, with a second column indicating which sample each observation belongs to.
• Sample Statistics: indicates that the original observations are not available. However, the sample sizes, sample means, and sample standard deviations have been placed into 3 columns of the data sheet. In this case, some options will not be available.

Multiple Data Columns

If the data have been placed in separate columns for each sample, the column names must be entered on the second dialog box:

• Samples: two or more numeric columns containing the observations, one column for each sample.
• Select: subset selection.

Data and Code Columns

If the data from all samples have been placed into a single column, then enter the name of that column and the column containing the group identifiers:

• Data: numeric column containing the observations from all samples.
• Level codes: numeric or non-numeric column containing an identifier for the sample corresponding to each data value.
• Select: subset selection.

Sample Statistics

If the original observations are not available but the means and standard deviations of each sample are known, enter the sample statistics into separate columns of the datasheet:

Task   Size   Mean      Standard Deviation
1      13     31.9231   4.95751
2      12     31.0833   5.66422
3      10     35.8000   5.30827
4      10     38.0000   6.59966
5      12     29.5000   6.00757
6      11     28.8182   4.75012

Then complete the second dialog box as shown below:

• Sample Means: numeric column containing the means of each sample.
• Sample Standard Deviations: numeric column containing the standard deviations of each sample.
• Sample Sizes: numeric column containing the sizes of each sample.
• Sample Labels: optional column containing labels for each sample.
• Select: subset selection.

Output

The output from the SnapStat consists of a single page of graphs and numerical statistics.

SnapStat: Multiple Sample Comparison

Sample   Count   Mean      Sigma
Task 1   13      31.9231   4.95751
Task 2   12      31.0833   5.66422
Task 3   10      35.8      5.30827
Task 4   10      38        6.59966
Task 5   12      29.5      6.00757
Task 6   11      28.8182   4.75012
Total    68      32.3088   6.24203

ANOVA Table
Source    Sum of Squares   Df   Mean Square   F-Ratio
Between   694.439          5    138.888       4.49
Within    1916.08          62   30.9045
Total     2610.51          67
P-Value = 0.0015

Variance Check
Levene's: 0.641611   P-Value = 0.6688

[Graphs on the output page: Scatterplot (response by sample); Box-and-Whisker Plot (response by sample); Means Plot (Mean With 95.0 Percent LSD Intervals); Analysis of Means Plot With 95% Decision Limits (UDL=36.38, CL=32.31, LDL=28.24)]
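As a cross-check, the Count, Mean, and Sigma columns of the output page can be recomputed from the raw pulse rates. The sketch below uses only the Python standard library and is not part of STATGRAPHICS; the names pulse, summarize, and total are illustrative, and Sigma is assumed to be the usual sample standard deviation (divisor n - 1), which matches the values in the table.

```python
from statistics import mean, stdev

# Pulse rates by task, transcribed from the sample data table.
pulse = {
    "Task 1": [27, 31, 26, 32, 39, 37, 38, 39, 30, 28, 27, 27, 34],
    "Task 2": [29, 28, 37, 24, 35, 40, 40, 31, 30, 25, 29, 25],
    "Task 3": [34, 36, 34, 41, 30, 44, 44, 32, 32, 31],
    "Task 4": [34, 34, 43, 44, 40, 47, 34, 31, 45, 28],
    "Task 5": [28, 28, 26, 35, 31, 30, 34, 34, 26, 20, 41, 21],
    "Task 6": [28, 26, 29, 25, 35, 34, 37, 28, 21, 28, 26],
}


def summarize(samples):
    """Return {label: (count, mean, sample standard deviation)}."""
    return {label: (len(v), mean(v), stdev(v)) for label, v in samples.items()}


summary = summarize(pulse)

# The "Total" row treats all observations as one sample.
pooled = [y for values in pulse.values() for y in values]
total = (len(pooled), mean(pooled), stdev(pooled))

for label, (n, m, s) in summary.items():
    print(f"{label}: count={n:2d}  mean={m:.4f}  sigma={s:.5f}")
print(f"Total : count={total[0]:2d}  mean={total[1]:.4f}  sigma={total[2]:.5f}")
```

The same dictionary corresponds to the Multiple Data Columns layout; flattening it into (pulse rate, task) pairs would give the Data and Code Columns layout.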
Summary Statistics (top left)

The top left section of the output displays summary statistics for each sample of observations. The table includes:

1. Count: the number of observations in each sample, $n_j$.
2. Mean: the average pulse rate for the subjects given each of the 6 tasks, $\bar{Y}_j$.
3. Sigma: the standard deviation of each sample, $s_j$.

Note that the group assigned to Task 4 has the highest mean and standard deviation.

Scatterplot (top right)

The top right section displays the observations within each group. It seems to suggest that pulse rates are somewhat higher for subjects assigned to tasks 3 and 4.

Box-and-Whisker Plot (left center)

The left center section of the output displays a multiple box-and-whisker plot. Box-and-whisker plots are constructed in the following manner:

• A box is drawn extending from the lower quartile of the sample to the upper quartile. This is the interval covered by the middle 50% of the data values when sorted from smallest to largest.
• A vertical line is drawn at the median (the middle value).
• If requested, a plus sign is placed at the location of the sample mean.
• Whiskers are drawn from the edges of the box to the largest and smallest data values, unless there are values unusually far away from the box (which Tukey calls outside points). Outside points, which are points more than 1.5 times the interquartile range (box width) above or below the box, are indicated by point symbols. Any points more than 3 times the interquartile range above or below the box are called far outside points, and are indicated by point symbols with plus signs superimposed on top of them. If outside points are present, the whiskers are drawn to the largest and smallest data values which are not outside points.

In the sample data, the variability appears to be similar within each sample, although the locations show some differences. There are no outside points.
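The construction rules above can be sketched in a few lines of Python. This is an illustration, not the STATGRAPHICS implementation: the function name box_whisker is hypothetical, and the quartiles use the standard library's "inclusive" interpolation rule, which is an assumption because the text does not say which quartile definition the program uses.

```python
from statistics import quantiles


def box_whisker(values):
    """Box edges, whisker ends, and outside points under the 1.5/3 IQR rule."""
    data = sorted(values)
    # Assumption: 'inclusive' quartiles; STATGRAPHICS may use another rule.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range = box width

    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_lo, outer_hi = q1 - 3.0 * iqr, q3 + 3.0 * iqr

    far_outside = [y for y in data if y < outer_lo or y > outer_hi]
    outside = [
        y for y in data
        if (y < inner_lo or y > inner_hi) and outer_lo <= y <= outer_hi
    ]
    # Whiskers stop at the most extreme values that are not outside points.
    inside = [y for y in data if inner_lo <= y <= inner_hi]
    return {
        "box": (q1, q3),
        "median": median,
        "whiskers": (inside[0], inside[-1]),
        "outside": outside,
        "far_outside": far_outside,
    }


task5 = [28, 28, 26, 35, 31, 30, 34, 34, 26, 20, 41, 21]
result = box_whisker(task5)
print(result["box"], result["whiskers"], result["outside"], result["far_outside"])
```

For the Task 5 sample the whiskers run to the smallest and largest observations and both lists of outside points are empty, in line with the statement that the sample data contain no outside points.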
ANOVA Table and Variance Check (right center)

The right center section contains both an analysis of variance table and a variance check.

ANOVA Table

This table divides the overall variability among the n measurements into two components:

1. A “within groups” component, which measures the variability among pulse rates of subjects given the same task.
2. A “between groups” component, which measures the variability among subjects given different tasks.

Of particular importance is the F-ratio, which tests the hypothesis that the mean response for all samples is the same. Formally, it tests the null hypothesis

$$H_0: \mu_1 = \mu_2 = \dots = \mu_q$$

versus the alternative hypothesis

$$H_A: \text{not all } \mu_j \text{ equal}$$

If F is sufficiently large, the null hypothesis is rejected. The statistical significance of the F-ratio is most easily judged by its P-value. If the P-value is less than 0.05, the null hypothesis of equal means is rejected at the 5% significance level, as in the current example (P-Value = 0.0015). This does not imply that every mean is significantly different from every other mean. It simply implies that the means are not all the same.

Variance Check

One of the assumptions underlying the analysis of variance is that the variances of the populations from which the samples come are the same. A test is performed to test the hypotheses:

Null Hypothesis: all $\sigma_j$ are equal
Alt. Hypothesis: not all $\sigma_j$ are equal

If the P-Value for the test is small (less than 0.05 if operating at the 5% significance level), then the hypothesis of equal variances is rejected. The test performed depends on the selection on the ANOVA/Regression tab of the Preferences dialog box, accessible from the Edit menu.

For the pulse rate data, the sample means are significantly different (P-Value = 0.0015), but the sample variances are not (Levene's test, P-Value = 0.6688).

Means Plot (bottom left)

This plot shows the sample means together with uncertainty intervals.
The type of interval plotted depends on the setting in the ANOVA/Regression tab of the Preferences dialog box, accessible from the Edit menu. The types of intervals that may be selected are:

• Confidence intervals - displays confidence intervals for the group means using the pooled within-group standard deviation:

$$\bar{Y}_j \pm t_{\alpha/2,\,n-q} \sqrt{\frac{MS_{within}}{n_j}} \qquad (1)$$

• LSD intervals - designed to compare any pair of means with the stated confidence level. The intervals are given by

$$\bar{Y}_j \pm \frac{\sqrt{2}}{2} M \sqrt{\frac{MS_{within}}{n_j}} \qquad (2)$$

where M is defined as in the Multiple Range Tests. This formula also applies to the three selections below.

• Tukey HSD Intervals - designed for comparing all pairs of means. The stated confidence level applies to the entire family of pairwise comparisons.
• Scheffe Intervals - designed for comparing all contrasts. Not usually relevant here.
• Bonferroni Intervals - designed for comparing a selected number of contrasts. Tukey’s intervals are usually tighter.

Analysis of Means Plot (bottom right)

This plot constructs a chart similar to a standard control chart, where each sample mean is plotted together with a centerline and upper and lower decision limits. The centerline is located at the grand average of all of the observations, $\bar{Y}$. The decision limits are located at

$$\bar{Y} \pm h_{n-q,\,1-\alpha} \sqrt{\frac{MS_{within}}{n_j}} \sqrt{\frac{q-1}{q}} \qquad (3)$$

where h is a critical value obtained from a table of the multivariate t distribution. The chart tests the null hypothesis that all of the sample means are equal to the grand mean. Any means that fall outside the decision limits indicate that the corresponding sample differs significantly from that overall mean.

The advantage of the ANOM plot is that it shows at a glance which means are significantly different from the average of all the samples. It also does so using a type of chart with which many engineers and operators are quite familiar. It is easy to see from the above chart that Task 4 has a significantly higher pulse rate than average, while all of the other task means are within the decision limits. The procedure is exact if all sample sizes are equal and approximate if they don’t differ too much.
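The quantities that feed equations (1) to (3), namely the grand mean and MS within, can be recovered from the sample sizes, means, and standard deviations alone, which is what the Sample Statistics input format relies on. The sketch below is an illustration under that assumption, using standard one-way ANOVA identities; the function names anova_from_summary and anom_limits are hypothetical, and the critical value h is taken as an argument because it must be looked up in an ANOM table rather than computed here.

```python
from math import sqrt

# Sample statistics from the Sample Statistics table.
sizes = [13, 12, 10, 10, 12, 11]
means = [31.9231, 31.0833, 35.8000, 38.0000, 29.5000, 28.8182]
sds = [4.95751, 5.66422, 5.30827, 6.59966, 6.00757, 4.75012]


def anova_from_summary(sizes, means, sds):
    """One-way ANOVA table entries computed from summary statistics only."""
    n, q = sum(sizes), len(sizes)
    grand = sum(nj * m for nj, m in zip(sizes, means)) / n
    ss_between = sum(nj * (m - grand) ** 2 for nj, m in zip(sizes, means))
    ss_within = sum((nj - 1) * s ** 2 for nj, s in zip(sizes, sds))
    ms_between = ss_between / (q - 1)
    ms_within = ss_within / (n - q)
    return {
        "grand_mean": grand,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": q - 1,
        "df_within": n - q,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f_ratio": ms_between / ms_within,
    }


def anom_limits(grand_mean, ms_within, n_j, q, h):
    """Decision limits of equation (3); h is supplied by the caller."""
    half_width = h * sqrt(ms_within / n_j) * sqrt((q - 1) / q)
    return grand_mean - half_width, grand_mean + half_width


table = anova_from_summary(sizes, means, sds)
for key, value in table.items():
    print(f"{key:>11}: {value:.4f}")
```

Up to rounding of the inputs, the sums of squares, mean squares, and F-ratio agree with the ANOVA table on the output page. Calling anom_limits with the grand mean, MS within, a sample size, q = 6, and a tabulated h then gives the lower and upper decision limits for that sample.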