Laboratory 4 - School of Computer Science and Statistics
Trinity College, Dublin
Generic Skills Programme
Statistics for Research Students
Laboratory 4: Comparing Two Independent Samples

To complete the laboratory exercise, work your way through this handout, which is self-contained and self-explanatory. Work in pairs (two per machine), and learn from each other. Keep separate logs of your work. The tutor is available to help with technicalities and to discuss substantive issues if necessary.

Invitations to consider the results of Minitab analysis and their statistical and substantive interpretations are printed in italics. Take some time for this; consult your neighbour or tutor. Enter your responses in a Word document, as if they were draft contributions to a report on the experiment and its analysis.

Topics:
1. Comparing two independent samples: initial data analysis; 2-sample t; checking assumptions
2. When are perceived sample differences statistically significant?
3. A comprehensive exercise

The final part of Laboratory 3 showed how to use a one-sample test to compare two samples of measurements when the study design ensures that measurements in the two samples have been appropriately matched, so that the difference between the samples is adequately represented by the differences between matched pairs of measurements. The test of difference then becomes a test of whether the average of the single "sample" of paired differences deviates from 0. When the matched pairs design is appropriate, the variation between measurements made on different subjects is removed from consideration when the paired differences are calculated, thus reducing the standard error to which the mean difference is referred via the t-statistic.

However, pairing is not always possible. Part 1 deals with a case where matching is not possible and the standard "2-sample t test" is appropriate. Initial data analysis is followed by application of the test and then by diagnostic analysis to validate the standard assumptions underlying the test.
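To see why pairing reduces the standard error, the short Python sketch below (a cross-check outside Minitab, not part of the laboratory procedure) compares the standard error of the mean paired difference with the unpaired standard error for the same numbers. The data are hypothetical, chosen so that subjects differ widely from one another while each changes by about the same amount; the function names are illustrative only.

```python
import math
import statistics


def paired_se(x, y):
    """Standard error of the mean paired difference y - x."""
    d = [b - a for a, b in zip(x, y)]
    return statistics.stdev(d) / math.sqrt(len(d))


def unpaired_se(x, y):
    """Standard error of mean(y) - mean(x) if the pairing is ignored."""
    return math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))


# Hypothetical data: four subjects, each measured twice.
before = [10.0, 12.0, 14.0, 16.0]
after = [11.0, 13.2, 14.8, 17.0]

print(f"paired SE:   {paired_se(before, after):.4f}")
print(f"unpaired SE: {unpaired_se(before, after):.4f}")
```

With data like these the paired standard error is much smaller, because the between-subject variation cancels in the differences. When pairing is not possible, as in Part 1, only the unpaired form is available.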
In Part 2, we consider when a perceived difference, perhaps established through initial data analysis, corresponds to a statistically significant difference. The effect of a sequence of increasing hypothetical differences on both the perception and the fact is studied. Using simulated data, the effect of changing sample sizes on these issues may also be investigated.

In Part 3, students are invited to apply the approach to analysis used in Part 1 to another case study and to bring the study to a conclusion in the form of a client report.

Learning Objectives: Be able to
- make and interpret dotplots, boxplots and numerical summaries of two samples of data as part of an initial data analysis, with reference to spread, level, patterns and exceptions
- compare and contrast horizontal and vertical boxplots
- implement and interpret a 2-sample t test
- explain key aspects of the test
- compare and contrast versions of the test assuming and not assuming equal standard deviations, using Minitab Help to seek relevant information
- explain the assumptions underlying the validity of the 2-sample t test and the consequences of their not being valid
- calculate residuals and make and interpret a Normal diagnostic plot of the residuals
- make and use Normal reference plots to assist with interpretation
- check the equal standard deviations assumption
- discuss the implications of exceptional cases in inference regarding standard deviations
- set up a procedure to check the effect of increasing mean difference between samples on the graphical impact of the difference and the statistical significance of the difference
- use computer simulation to assess the effect for varying sample sizes
- implement a comprehensive analysis of the differences between two independent samples and prepare a comprehensive report on the analysis.

1. Comparing two samples

As part of a larger study of academic progress by males and females, IQ scores of samples of seventh-grade boys and girls in a Mid-West USA school district were measured. Assuming that these samples were representative of all the seventh graders, male and female, in the school district, a basic question is: is there evidence of a difference in IQ scores between boys and girls? If so, a supplementary question is: what is the size of this difference?

The data are available in IQ Scores.xls in the GenericSkillsData folder; copy them to Minitab. To facilitate brushing and identification of individual cases later, stack the data in a single column, with group identifiers in another column:
- name C3 "IQ", name C4 "Group",
- from the Data menu, select Stack, then Columns,
- enter Boys and Girls as the columns to stack,
- check "Column of current worksheet" and enter IQ,
- store subscripts in Group,
- ensure "Use variable names in subscript column" is checked,
- click OK.

1.1 Initial data analysis

Produce graphical and numerical summaries as follows:
- from the Graph menu, select Dotplot, then One Y, With Groups [1],
- select IQ as the Graph variable,
- select Group as the categorical variable for grouping, click OK,
- repeat with boxplots, this time clicking on the Scale button and checking Transpose value and category scales,
- from the Stat menu, select Basic Statistics, then Display Descriptive Statistics,
- select IQ as the Variable and Group as the "By variable",
- click the Statistics button, check Mean, Standard deviation, Minimum, Maximum, N nonmissing, uncheck the others, click OK, OK.

Compare Boys' and Girls' IQ with regard to (a) spread and (b) level. Comment on patterns and exceptions.

Transposing the value and category scales gives horizontal boxplots, consistent with the dotplots (and the usual convention for histograms). To see the effect of changing this, produce vertical boxplots. Which do you prefer? Why?
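For readers who want to cross-check Minitab's output outside Minitab, the following Python sketch mimics the stacking step and the grouped descriptive statistics using only the standard library. The function names are illustrative, and the values in `data` are placeholders, not the IQ scores from IQ Scores.xls.

```python
import statistics


def stack(columns):
    """Stack {name: values} into one value list plus a parallel list of group
    labels, like Data > Stack > Columns with variable names as subscripts."""
    values, groups = [], []
    for name, col in columns.items():
        values.extend(col)
        groups.extend([name] * len(col))
    return values, groups


def describe_by(values, groups):
    """N, Mean, StDev, Minimum and Maximum for each group, in first-seen order."""
    summary = {}
    for g in dict.fromkeys(groups):
        xs = [v for v, lab in zip(values, groups) if lab == g]
        summary[g] = {
            "N": len(xs),
            "Mean": statistics.mean(xs),
            "StDev": statistics.stdev(xs),  # sample SD (divisor n - 1)
            "Minimum": min(xs),
            "Maximum": max(xs),
        }
    return summary


# Placeholder values only; substitute the Boys and Girls columns from IQ Scores.xls.
data = {"Boys": [100, 104, 108, 112], "Girls": [96, 100, 104, 116]}
iq, group = stack(data)
for name, row in describe_by(iq, group).items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

Keeping the values in one list with a parallel list of labels mirrors the stacked worksheet, so each case keeps its identity for later brushing-style selection.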
[1] Note that "One Y" means that the data are in one column, with group identifiers in another column. Recall that Y is frequently used in statistical notation to represent a response variable. Here, Y refers to the IQ variable and may be regarded as a response to an IQ test.

1.2 2-sample t

Use a two-sample t-test to test the statistical significance of the difference between mean IQs for boys and girls:
- from the Stat menu, choose Basic Statistics, then 2-Sample t,
- check "Samples in one column", then select IQ for Samples and Group for Subscripts,
- check "Assume equal variances", click OK.

Report on the result of the t-test. Explain the make-up of the Pooled Standard Deviation. Why are the degrees of freedom = 76? Report on the confidence interval estimate.

There is a suggestion that the spread of Girls' IQ exceeds that of Boys'. To allow for this possibility, repeat the 2-Sample t, allowing for unequal variances. List the differences in results.

The key difference between the cases of equal and unequal variances is that the standard error in the denominator of the test statistic is calculated differently (from a pooled variance estimate in the first case, from the two separate sample variances in the second), and so the sampling distribution of the test statistic changes. When the variances are equal (and Normality applies), the t distribution is appropriate. When the variances are not equal, Minitab uses an approximation to the sampling distribution which is also a t distribution, but with different degrees of freedom. To find out more about this, use Minitab's context-sensitive Help:
- edit the last dialog (Ctrl+E), click on the Help button in the dialog box,
- click on the "Equal or unequal variances" link at the end of the first Help page and read the resulting help,
- click on the "main topic" link, then "see also", "Methods and formulas", "Test statistics", and read the Help.

(Note the confusing use of "sample standard deviation, s", when "standard error" is meant.)

Which test do you prefer? Why?
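The two versions of the test can be sketched in Python as follows, assuming the standard textbook formulas: the pooled standard deviation combines the two sample variances weighted by their degrees of freedom, and the unequal-variances version uses the Welch-Satterthwaite approximation for the degrees of freedom. This is an illustration, not a reproduction of Minitab's code; check the "Methods and formulas" Help pages for Minitab's exact conventions.

```python
import math
import statistics


def pooled_t(x, y):
    """Equal-variances 2-sample t. Returns (t, df, pooled SD)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled SD: sample variances weighted by their degrees of freedom.
    sp = math.sqrt(((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / df)
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    return (statistics.mean(x) - statistics.mean(y)) / se, df, sp


def welch_t(x, y):
    """Unequal-variances 2-sample t with Welch-Satterthwaite df. Returns (t, df)."""
    a = statistics.variance(x) / len(x)  # squared SE of mean(x)
    b = statistics.variance(y) / len(y)  # squared SE of mean(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (len(x) - 1) + b ** 2 / (len(y) - 1))
    return t, df


# Small illustrative samples (not the IQ data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
print(pooled_t(x, y))
print(welch_t(x, y))
```

The pooled version always has n1 + n2 - 2 degrees of freedom, while the Welch degrees of freedom are usually fractional and smaller. If SciPy is installed, a two-sided p-value can be obtained as `2 * scipy.stats.t.sf(abs(t), df)`.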
1.3 Checking assumptions

Both t-tests used above make use of the t distribution as the null hypothesis sampling distribution of the 2-sample t statistic. The validity of this use of the t distribution depends on certain assumptions, including an assumption that the underlying frequency distribution of IQ is Normal. In addition, the first test assumes that the standard deviations of Boys' and Girls' IQs are equal. If these assumptions are invalid, then
- the significance level of the test may not be 5%,
- the p-value may be incorrectly calculated,
- the confidence interval may be too narrow or too wide.

It is sensible, therefore, to check these assumptions, especially given the reservations noted at the initial data analysis stage.

1.3.1 Checking Normality

The Normality assumption may be checked using a Normal probability plot of the residuals, defined as the individual IQ values less the relevant sample mean. By combining the two samples, we have a larger sample on which to base the Normality check. By subtracting the relevant means, we ensure that the combined set of residuals has a common sample mean of 0; combining the two samples with their different means would distort the Normal plot. This parallels the calculation of residuals at the end of Part 3 of Laboratory 3.

Use the procedure in Laboratory 3 to calculate residuals:
- use C5, C6, C7 instead of C30, C31, C32,
- name C5 "Boys Res", C6 "Girls Res", C7 "Residuals".

Before proceeding to produce the Normal plot, recall that, by default, Minitab puts the Normal scores on the vertical axis and the data on the horizontal, out of line with popular convention. To change this,
- from the Tools menu, select Options,
- open Individual Graphs, select Probability Plots,
- under Graph Orientation, check "Show raw data on vertical scale", click OK.
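The residual calculation and the coordinates behind a Normal probability plot can be sketched in Python as below. The plotting positions (i - 0.375)/(n + 0.25) (Blom's) are an assumption of this sketch; Minitab's default plotting positions may differ slightly, which moves the Normal scores a little but does not change the overall shape of the plot. The function names and the stacked values are placeholders for illustration.

```python
import statistics
from statistics import NormalDist


def residuals_by_group(values, groups):
    """Each value minus its own group's mean (cf. Boys Res, Girls Res, Residuals)."""
    means = {}
    for g in set(groups):
        means[g] = statistics.mean([v for v, lab in zip(values, groups) if lab == g])
    return [v - means[lab] for v, lab in zip(values, groups)]


def normal_plot_points(data):
    """(Normal score, ordered value) pairs for a Normal probability plot.
    Uses Blom's plotting positions (i - 0.375) / (n + 0.25); an assumption here."""
    n = len(data)
    std_normal = NormalDist()  # mean 0, SD 1
    return [
        (std_normal.inv_cdf((i - 0.375) / (n + 0.25)), v)
        for i, v in enumerate(sorted(data), start=1)
    ]


# Placeholder values only, in stacked form.
iq = [100, 104, 108, 112, 96, 100, 104, 116]
group = ["Boys"] * 4 + ["Girls"] * 4
res = residuals_by_group(iq, group)
for z, r in normal_plot_points(res):
    print(f"{z:6.3f}  {r:6.2f}")
```

Plotting the residual (second element) on the vertical axis against the Normal score (first element) matches the orientation chosen above; a roughly straight line is consistent with the Normal model.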
To make the Normal plot,
- from the Graph menu, click on Probability Plot, choose Single,
- enter Residuals as the Graph variable,
- click on Distribution, then the Data Display tab, uncheck "Show confidence interval", click OK, OK.

Discuss the result.

The confidence interval, unchecked in the instructions above, provides an interval within which individual points from a genuine Normal sample are expected to fit. To see how this applies here, repeat the Normal plot, this time checking the "Show confidence interval" box. Discuss the result. Can you reconcile the values outside the confidence limits with the result of the Anderson-Darling (AD) test?

To assist further with interpreting the Normal plot, make Normal reference plots:
- follow the instructions in Laboratory 2, page 5, for making 19 Normal reference plots, this time generating 78 rows and storing them in C8-C26,
- ensure that the "Show confidence interval" box is unchecked,
- compare the Normal plot of residuals with each of the reference plots in turn.

For each reference plot, note any deviations from a straight line and compare the pattern in the Residuals plot to that in the reference plot. Do you think that the Residuals data follow a Normal model? Explain.

1.3.2 Checking Equal Standard Deviations

The second assumption may be checked formally by testing the significance of the difference between the two sample standard deviations. The standard test for this, referred to as the F-test, is based on the ratio of the two sample variances (the squared standard deviations). This test is known to be sensitive to departures from the Normality assumption, so Minitab also provides a second test, Levene's test, designed to be less sensitive to lack of Normality. These tests may be implemented as follows:
- from the Stat menu, select Basic Statistics, then 2 Variances,
- check "Samples in one column", then select IQ for Samples and Group for Subscripts,
- click OK.
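The two statistics can be sketched in Python. The F statistic is the ratio of the two sample variances. For the Levene-type test, this sketch uses absolute deviations from the group medians (the Brown-Forsythe variant); whether Minitab's Levene's test uses medians or means is an assumption to check in Minitab Help. For two groups, the statistic equals the square of a pooled 2-sample t statistic computed on the absolute deviations, which is why its critical values are squares of t critical values.

```python
import statistics


def f_ratio(x, y):
    """F statistic for comparing spreads: ratio of the two sample variances."""
    return statistics.variance(x) / statistics.variance(y)


def levene_median(x, y):
    """Two-group Levene-type statistic using absolute deviations from the
    group medians (Brown-Forsythe variant). Computed as the square of the
    pooled 2-sample t statistic on those deviations, which equals the
    one-way ANOVA F statistic when there are two groups."""
    mx, my = statistics.median(x), statistics.median(y)
    dx = [abs(v - mx) for v in x]
    dy = [abs(v - my) for v in y]
    n1, n2 = len(dx), len(dy)
    pooled_var = ((n1 - 1) * statistics.variance(dx) + (n2 - 1) * statistics.variance(dy)) / (n1 + n2 - 2)
    t = (statistics.mean(dx) - statistics.mean(dy)) / (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    return t ** 2


# Small illustrative samples (not the IQ data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
print(f_ratio(x, y), levene_median(x, y))
```

Working on absolute deviations rather than squared ones is what makes the Levene-type test less sensitive to non-Normal tails than the F ratio.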
The results are displayed in terms of confidence intervals for the two standard deviations, boxplots of the data, and summary results of the two tests. Comment on the results.

Note the range of possible values of σ included in the confidence interval for the Girls' standard deviation. Discuss:
- what implications does the upper end have for the possible spread of Girls' IQ?
- how does this compare to the spread evident from the boxplots?
- what is needed to reduce the confidence interval width?

Since p-values are reported for the two tests, critical values are not needed. Critical values for the F-test are readily available (and may be computed using Minitab) and will be used later in the course. Levene's test uses approximate critical values which are the squares of the appropriate critical values for the 2-sample t test used above. (These are also F-test critical values, but for a different F-test.)

1.4 Reanalysis

The Normal diagnostic plot and the initial data analysis suggested that the smallest values in both the Boys and Girls data may be exceptional. While it is not easy to decide on this issue, it makes sense to re-apply the t-test with these cases deleted, to see what effect, if any, their deletion has on the formal analysis. To do this,
- revisit the Normal diagnostic plot and brush all four suspect points (hold down the Shift key while pointing at each point in succession); if the brush is not working, remake the Normal diagnostic plot and turn on the brush,
- from the Data menu, select Subset Worksheet,
- check "Specify which rows to exclude", then check "Brushed rows", click OK,
- reapply the 2-sample t-test, as in §1.2 above.

Report on the results of the t-test. Compare them with the results of the earlier application. Comment.

2. When are perceived sample differences statistically significant?
While the initial data analysis suggested that girls' IQs were somewhat lower than boys', the t-test did not indicate that the observed difference was statistically significant. This means that the observed difference could be explained away in terms of chance variation: if different samples of boys and girls had been tested, the observed difference might not have recurred, or might even have been in the opposite direction. (One toss of a fair coin resulting in heads does not imply that all tosses of that coin will result in heads.)

An obvious question is "how big must an observed difference be to be statistically significant?" A partial answer may be found by creating versions of the data adjusted to have increasingly bigger differences between boys and girls, and observing the effect of the increasing difference on both the dotplots and the value of t. Minitab can be set up to do this in a few steps, outlined as follows:
- first, create a column whose first cell will hold potential mean difference values, to be changed as desired,
- next, adjust the Boys IQ values so that Boys and Girls have the SAME mean (subtract the mean difference, calculated earlier to be 5.12), and adjust further so that they have the desired mean difference,
- next, calculate the t-statistic for testing the difference between Boys (as adjusted) and Girls,
- finally, make a dotplot of Boys (as adjusted) and Girls.

Minitab can be set to update the calculations and the dotplot each time the potential mean difference value, set up in the first step, is changed. The effect on both t and the dotplot can then be observed simultaneously as you iterate through a sequence of potential differences.
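The same idea can be sketched in Python, as a cross-check on the Minitab set-up rather than a replacement for it. The sketch computes the observed mean difference from the data instead of hard-coding 5.12, uses the unpooled standard error (as in the SE Diff formula of the handout), and treats |t| > 2 as an approximate 5% two-sided criterion; the critical value of 2, the step size, the function names and the sample values are assumptions for illustration.

```python
import math
import statistics


def t_for_delta(boys, girls, delta):
    """t for Boys+ versus Girls, where Boys+ = Boys - observed difference + delta.
    Uses the unpooled SE, as in the SE Diff column."""
    observed = statistics.mean(boys) - statistics.mean(girls)  # plays the role of 5.12
    boys_plus = [b - observed + delta for b in boys]
    se = math.sqrt(statistics.variance(boys) / len(boys)
                   + statistics.variance(girls) / len(girls))
    return (statistics.mean(boys_plus) - statistics.mean(girls)) / se


def smallest_significant_delta(boys, girls, critical=2.0, step=1.0, max_delta=50.0):
    """Try delta = 0, step, 2*step, ... and return the first with |t| > critical,
    or None if none up to max_delta qualifies."""
    k = 0
    while k * step <= max_delta:
        delta = k * step
        if abs(t_for_delta(boys, girls, delta)) > critical:
            return delta
        k += 1
    return None


# Small illustrative samples (not the IQ data).
boys = [1, 2, 3, 4, 5]
girls = [2, 4, 6, 8, 10]
for delta in range(5):
    print(delta, round(t_for_delta(boys, girls, delta), 3))
print("smallest significant delta:", smallest_significant_delta(boys, girls))
```

Because the adjusted mean difference is exactly delta, t is simply delta divided by the standard error, so the smallest significant delta is roughly twice the standard error. The standard error shrinks as the sample sizes grow, so the same delta becomes significant more easily with larger samples; this is the effect that simulation with varying sample sizes explores.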
In Minitab, this set-up may be achieved as follows:
- name C28 "delta" and enter 0 in row 1 of C28; delta will be set to 0 initially and then to successively increasing difference values,
- name C29 "Boys+", use the Calculator to calculate 'Boys' - 5.12 + 'delta' in C29, check the "Assign as a formula" box, click OK; Boys+ will contain the Boys IQ values, adjusted to have the same mean as the Girls initially, and then increasingly larger values as delta is increased,
- name C30 "Mean Diff" and, in C30, calculate MEAN('Boys+')-MEAN(Girls), the numerator of the t-statistic,
- name C31 "SE Diff" and, in C31, calculate [2] sqrt(STDEV(Boys)**2/COUNT(Boys)+STDEV(Girls)**2/COUNT(Girls)), the denominator of the t-statistic,
- name C32 "t" and, in C32, calculate C30/C31, the t-statistic,
- make a dotplot of Boys+ and Girls, right-click the completed graph and check "Update Graph Automatically".

For C30 and C32 to be recalculated when delta changes, calculate them with "Assign as a formula" checked, as for C29.

How do Boys+ and Girls compare in the dotplot? What is the value of t?

Now, change delta to 1, then 2, then 3, etc. Note changes in the dotplot and in the value of t. How big is delta when the sample means are significantly different according to t? How big is delta when the samples appear "different" according to the dotplot?

3. A comprehensive exercise

High Pressure Liquid Chromatography (HPLC) is a sophisticated methodology to separate, identify and quantify the constituents of chemical compounds. New variants on the basic methods are regularly being developed. At one stage in an HPLC method development programme, two different sulphonic acid sodium salts were used in one phase of the method. The variants were designated as Method A and Method B. The effects of the methods on the percentage recovery of the nominal level of the active ingredients of pharmaceutical products were being compared. The study involved each method being used in 12 separate analytical runs, with the methods being implemented in random order over a period of several days.
The results reported for each run are the averages of a fixed number of within-run replicates. The results (% recovery) for one of the test materials are shown below.

Method A: 95.0 97.3 95.7 95.7 94.8 95.8 94.2 93.0 96.2 95.9 96.2 94.9
Method B: 97.3 97.2 95.2 95.6 99.2 96.2 98.5 95.9 96.0 98.0 95.9 96.8

They are available in the %Recovery dataset in the GenericSkillsData folder.

Carry out a detailed analysis of the data. Include initial data analysis, detailed examination of the validity of the standard model, and standard tests of difference between the methods. Where statistical significance is established, report the result in the form of a confidence interval.

Prepare, in Microsoft Word, a short management report setting out your conclusions, with an appendix including all relevant computer output.

[2] This is the Minitab version of the formula $\sqrt{s_{\mathrm{Boys}}^2/n_{\mathrm{Boys}} + s_{\mathrm{Girls}}^2/n_{\mathrm{Girls}}}$, for combining the standard errors of the mean for Boys and the mean for Girls into a single standard error for the difference between the means.

Conclusion

This concludes Laboratory 4. The learning objectives listed at the outset are reproduced here. Check them individually and ensure that you have achieved each one; seek help from the Tutor if necessary.
Learning Objectives: Be able to
- make and interpret dotplots, boxplots and numerical summaries of two samples of data as part of an initial data analysis, with reference to spread, level, patterns and exceptions
- compare and contrast horizontal and vertical boxplots
- implement and interpret a 2-sample t test
- explain key aspects of the test
- compare and contrast versions of the test assuming and not assuming equal standard deviations, using Minitab Help to seek relevant information
- explain the assumptions underlying the validity of the 2-sample t test and the consequences of their not being valid
- calculate residuals and make and interpret a Normal diagnostic plot of the residuals
- make and use Normal reference plots to assist with interpretation
- check the equal standard deviations assumption
- discuss the implications of exceptional cases in inference regarding standard deviations
- set up a procedure to check the effect of increasing mean difference between samples on the statistical significance of the difference
- use computer simulation to assess the effect for varying sample sizes
- implement a comprehensive analysis of the differences between two independent samples and prepare a comprehensive report on the analysis.