Trinity College, Dublin
Generic Skills Programme
Statistics for Research Students
Laboratory 1, Feedback

Feedback is provided for some of the tasks completed in the Laboratory, typically as model answers to some of the italicised questions.

Comment on the values of the descriptive statistics appearing in the Session window, particularly with regard to between-sample comparisons.

Descriptive Statistics: Before, During, After

Variable   Before   During   After
N              20       20      20
N*              0        0       0
Mean        40.55    51.20   43.00
SE Mean      2.04     2.01    1.36
StDev        9.14     8.98    6.06
Min         30.00    37.00   32.00
Q1          34.00    44.00   40.00
Median      40.00    50.50   42.00
Q3          43.00    58.25   48.00
Max         69.00    69.00   54.00

There are 20 measurements in each sample, with no missing values. The mean duration increased from 40.55 to 51.20 when the modification was made and fell back again, to 43.00, after it was removed. SE Mean is inappropriate as a "descriptive" statistic: it measures the precision of the sample mean, not the spread of the data. The standard deviation was around 9 for the Before and During samples and around 6 for the After sample. The Minimum, Quartiles and Median follow the same general pattern as the mean, while the Maximum follows the same pattern as the standard deviation. Note that both the standard deviation and the Maximum are relatively high for the Before sample. This is due to the presence of an exceptionally large value in the Before sample; the graphical analysis introduced shortly will clarify this.

Check the definition of trimmed mean. Why do you think it is defined in this way?

Minitab calculates a 5% trimmed mean, which excludes the highest 5% and the lowest 5% of the values and averages the rest. The trimming excludes exceptional values at either end of the sample, which might otherwise bias the calculation of the mean. For these data, the trimmed means are shown below, along with the untrimmed means and the medians, which could be described as the ultimately trimmed means.
Descriptive Statistics: Before, During, After

Variable   Before   During   After
Mean        40.55    51.20   43.00
TrMean      39.56    51.00   43.00
Median      40.00    50.50   42.00

In this case, trimming has negligible effect on any of the samples. With 20 values per sample, the 5% trimmed mean drops one value from each end. For Before, for example, (20 × 40.55 − 30 − 69)/18 = 712/18 ≈ 39.56.

Make dotplots

[Figure: Dotplot of Before, During, After; horizontal axis "Data", 30 to 70]

Interpret the results; give a verbal description of any patterns that you see and any exceptions to those patterns.

The plots show the same general pattern of level (location) and spread seen in the numerical summaries. The exceptional case in the Before sample is clear. Apart from this, all three samples show higher frequency in the middle and lower frequency towards the tails, consistent with the Normal model (though not confirming it).

It appears that duration increased when the modification was put in place and decreased again when it was removed. The shift seems substantial, even in the context of the relatively wide spread in each sample.

Making dotplots from the stacked data leads to:

[Figure: Dotplot of Duration by Sample, with samples in alphabetical order (After, Before, During); horizontal axis "Duration", 30 to 70]

Compare and contrast the two plots. Which do you prefer? Why? Which shows the Before – During – After sequence best? Which shows the effect of the process change best?

Time order may be preferable to alphabetical order, in that it is related to a key feature of the study. Placing the two samples without the modification in close proximity highlights their similarity as well as the difference in the "During" sample.

Make boxplots

Compare the results of brushing boxplots with brushing dotplots.

As only exceptional cases are displayed individually in boxplots, only those cases respond to brushing in boxplots.
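Boxplots conventionally flag exceptional cases with Tukey's rule. Under that rule, a value is plotted individually if it lies more than 1.5 interquartile ranges below Q1 or above Q3. The Python sketch below assumes that Minitab's default boxplot follows this rule, and it applies the rule to the Q1, Q3, Min and Max values from the Session output above. Only summary statistics are available, so it can say whether the smallest or largest value in each sample lies beyond a fence, but not how many values do. The names `SUMMARY`, `fences` and `exceptional_ends` are illustrative.

```python
from __future__ import annotations

# Q1, Q3, Min and Max copied from the Minitab Session output (illustrative layout).
SUMMARY = {
    "Before": {"min": 30.0, "q1": 34.0, "q3": 43.0, "max": 69.0},
    "During": {"min": 37.0, "q1": 44.0, "q3": 58.25, "max": 69.0},
    "After": {"min": 32.0, "q1": 40.0, "q3": 48.0, "max": 54.0},
}


def fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return Tukey's (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def exceptional_ends(s: dict) -> list[str]:
    """Report which extreme of the sample ("min" and/or "max") lies beyond a fence."""
    lower, upper = fences(s["q1"], s["q3"])
    flagged = []
    if s["min"] < lower:
        flagged.append("min")
    if s["max"] > upper:
        flagged.append("max")
    return flagged


if __name__ == "__main__":
    for name, s in SUMMARY.items():
        lower, upper = fences(s["q1"], s["q3"])
        ends = exceptional_ends(s) or "none"
        print(f"{name:<7} fences [{lower:.3f}, {upper:.3f}]  beyond fences: {ends}")
```

Only the maximum of the Before sample is flagged: 69 lies beyond the upper fence of 43 + 1.5 × 9 = 56.5. This matches the single exceptional case seen in the dotplots. No value in the During or After samples lies outside its fences.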
Note that, with larger data sets, the dots in dotplots represent more than one case, so the brush does not work at all (see Exercise 3).

Make histograms

The Multiple Graphs button in the Histogram dialog box offers an option to "Show Graph Variables In separate panels of the same graph". This results in the following layout:

[Figure: Histogram of Before, During, After, shown in separate panels arranged in a grid; vertical axis "Frequency" (0 to 8), horizontal axes 30 to 70]

In this layout the histograms are not stacked and are, therefore, difficult to compare. It is preferable to stack the individual histograms using the Layout Tool, as in the Laboratory.

When displaying this, the choice of display size is important. A small display such as

[Figure: Histograms of Before, During and After stacked vertically on a common 30 to 70 axis, small display]

is preferable to an oversized display such as the one below.

[Figure: The same stacked histograms of Before, During and After in an oversized display]

The task of comparing different histograms spread over a large area requires considerable eye travel. There is considerable research to show that this hinders interpretation.

Dot plots, box plots and histograms graphically convey information concerning frequency distributions. Which of the three conveys the most information? Which conveys the least?

The dot plots convey the most information, in that they display all the data points. Boxplots convey the least, being based on 5-number summaries (see SideNote, page 12). For the purpose of assessing the effect of the process change, the choice of display is a matter of personal preference.
Boxplots seem to differentiate more clearly between samples. For the purpose of assessing Normality, the histograms give a better view of the general shape of the frequency distributions. (However, there are better ways of assessing Normality, particularly the Normal probability plot that will arise later in the course.)

A key question is: how different do the samples have to be to conclude that the process change had an effect? Is there really a difference between During and Before or After? Is there really a difference between Before and After? Conceivably, any apparent change could be explained by the chance variation known to affect all processes. A formal answer is supplied by a test of statistical significance. If it is concluded that there is a process change effect, a more important question is what the consequences of the observed effect are. These would be measured in terms of process improvement, consequent customer satisfaction and, ultimately, increased profit.

Which would you prefer as a summary statistic for spread in the Before sample: standard deviation, range or interquartile range?

The interquartile range is insensitive to the presence of the exceptional case, so it gives an assessment of spread more appropriate to the normal operation of the process. Exceptional cases should be treated as such. Statisticians describe estimates that are insensitive to the presence of exceptional cases as robust.
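The Python sketch below makes the comparison concrete. It uses only the standard library to compute the range and interquartile range of each sample from the Session output, then puts the IQR on a standard-deviation scale. It assumes the Normal model, under which the IQR spans about 1.349 standard deviations, so IQR/1.349 serves as a robust stand-in for the standard deviation. The names `STATS`, `IQR_PER_SD` and `spread_summary` are illustrative only.

```python
from statistics import NormalDist

# StDev, quartiles and extremes copied from the Minitab Session output.
STATS = {
    "Before": {"stdev": 9.14, "min": 30.0, "q1": 34.0, "q3": 43.0, "max": 69.0},
    "During": {"stdev": 8.98, "min": 37.0, "q1": 44.0, "q3": 58.25, "max": 69.0},
    "After": {"stdev": 6.06, "min": 32.0, "q1": 40.0, "q3": 48.0, "max": 54.0},
}

# Under a Normal model the IQR spans z(0.75) - z(0.25), about 1.349 standard deviations.
IQR_PER_SD = NormalDist().inv_cdf(0.75) - NormalDist().inv_cdf(0.25)


def spread_summary(s: dict) -> dict:
    """Compare three measures of spread, plus the IQR rescaled to an SD scale."""
    iqr = s["q3"] - s["q1"]
    return {
        "stdev": s["stdev"],
        "range": s["max"] - s["min"],
        "iqr": iqr,
        "sd_from_iqr": iqr / IQR_PER_SD,
    }


if __name__ == "__main__":
    for name, s in STATS.items():
        r = spread_summary(s)
        print(
            f"{name:<7} StDev {r['stdev']:5.2f}  Range {r['range']:5.2f}  "
            f"IQR {r['iqr']:5.2f}  IQR/1.349 {r['sd_from_iqr']:5.2f}"
        )
```

For the Before sample this gives about 6.7, close to the After sample's standard deviation of 6.06. By contrast, the Before standard deviation (9.14) and range (39) are both pulled up by the single value of 69. The During figure (about 10.6) is somewhat higher than its standard deviation of 8.98. This is a reminder that quartiles from 20 observations are themselves imprecise, so these conversions are rough guides rather than estimates to quote.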