Trinity College, Dublin
Generic Skills Programme
Statistics for Research Students
Laboratory 1, Feedback
Feedback is provided for some of the tasks completed in the Laboratory, typically as model
answers to some of the italicised questions.
Comment on the values of the descriptive statistics appearing in the Session window,
particularly with regard to between-sample comparisons.
Descriptive Statistics: Before, During, After

Variable   N  N*   Mean  SE Mean  StDev    Min     Q1  Median     Q3    Max
Before    20   0  40.55     2.04   9.14  30.00  34.00   40.00  43.00  69.00
During    20   0  51.20     2.01   8.98  37.00  44.00   50.50  58.25  69.00
After     20   0  43.00     1.36   6.06  32.00  40.00   42.00  48.00  54.00

There are 20 measurements in each sample, with no missing values. The mean duration
increased from 40.55 to 51.2 when the modification was made and fell back again, to 43, after it
was removed.
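SE Mean is inappropriate as a "descriptive" statistic: it describes the precision of the sample mean as an estimate of the process mean, not the spread of the measurements.
The standard deviation was around 9 for the Before and During samples and around 6 for the After sample.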
The Minimum, Quartiles and Median follow the same general pattern as the mean.
Maximum follows the same pattern as the standard deviation.
Note that both the standard deviation and the Maximum are relatively high for the Before sample. This
is due to the presence of an exceptionally large value in the Before sample. The graphical
analysis introduced shortly will clarify this.
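The point about SE Mean can be checked against the table. Assuming the usual definition, SE Mean is the standard deviation divided by the square root of the sample size, so it measures how precisely the mean is estimated rather than how spread out the individual measurements are. The Python sketch below recomputes SE Mean from the StDev values in the Session window output and shows that it would halve if the sample size were quadrupled, which is why it says little about the spread of the data themselves. The helper name `standard_error` is illustrative only.

```python
import math

# StDev and SE Mean as reported in the Session window output (n = 20 per sample)
reported = {
    "Before": (9.14, 2.04),
    "During": (8.98, 2.01),
    "After": (6.06, 1.36),
}
N = 20

def standard_error(sd, n):
    """Standard error of the mean: standard deviation divided by sqrt(n)."""
    return sd / math.sqrt(n)

for name, (sd, se_reported) in reported.items():
    se = standard_error(sd, N)
    print(f"{name}: StDev = {sd:.2f}, SE Mean = {se:.2f} (reported {se_reported:.2f})")

# For the same spread, quadrupling the sample size halves the standard error
print(f"Before, if n were 80: SE Mean = {standard_error(9.14, 80):.2f}")
```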
Check the definition of trimmed mean. Why do you think it is defined in this way?
Minitab calculates a 5% trimmed mean. A 5% trimmed mean excludes the highest 5% and the
lowest 5% of the values and averages the rest. The trimming excludes exceptional values at
either end of the sample, which might otherwise bias the calculation of the mean.
For these data, the trimmed means are shown below, along with the untrimmed means and the
medians, which could be described as the ultimately trimmed means.
Descriptive Statistics: Before, During, After

Variable   Mean  TrMean  Median
Before    40.55   39.56   40.00
During    51.20   51.00   50.50
After     43.00   43.00   42.00

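In this case, trimming has negligible effect on any of the samples: the largest change, for the Before sample, is about 1 (from 40.55 to 39.56), small in relation to its standard deviation of about 9.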
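The definition of the trimmed mean can be sketched in a few lines of Python. The function below sorts the data, drops round(0.05 × n) values from each end and averages the rest; for n = 20 that is one value from each end, and the arithmetic check that follows confirms that this reproduces the TrMean values in the table from Mean, Min and Max alone. The rounding rule for other sample sizes is an assumption of this sketch and may not match Minitab's exactly; `trimmed_mean` is a hypothetical helper, not a Minitab function.

```python
def trimmed_mean(values, proportion=0.05):
    """Average after dropping round(proportion * n) values from each end of the sorted data."""
    data = sorted(values)
    k = round(proportion * len(data))  # values trimmed from each end (1 when n = 20)
    kept = data[k:len(data) - k]       # assumes 2 * k < n, so something remains
    return sum(kept) / len(kept)

# With n = 20, exactly one value is trimmed from each end, so the trimmed mean
# can be recovered from the table as (n * Mean - Min - Max) / (n - 2).
summary = {  # (Mean, Min, Max) from the Session window output
    "Before": (40.55, 30.00, 69.00),
    "During": (51.20, 37.00, 69.00),
    "After": (43.00, 32.00, 54.00),
}
n = 20
for name, (mean, lo, hi) in summary.items():
    print(f"{name}: TrMean = {(n * mean - lo - hi) / (n - 2):.2f}")
```

The recovered values (39.56, 51.00 and 43.00) agree with the TrMean column, which supports the reading that one value is trimmed from each end of these samples.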
Make dotplots
[Figure: Dotplot of Before, During, After, with one row per sample (Before, During, After) on a common horizontal axis labelled Data, from 30 to 70.]
Interpret the results; give a verbal description of any patterns that you see and any
exceptions to those patterns.
The plots show the same general pattern of level (location) and spread seen in the numerical
summaries. The exceptional case in the Before sample is clear. Apart from this, all three
samples show higher frequency in the middle and lower frequency towards the tails, consistent with the
Normal model (though not confirming it).
It appears that duration increased when the modification was put in place and decreased again
when it was removed. The shift seems substantial, even in the context of the relatively wide
spread in each sample.
Making dotplots from the stacked data leads to
[Figure: Dotplot of Duration, with rows labelled by Sample in alphabetical order (After, Before, During) on a horizontal axis labelled Duration, from 30 to 70.]
Compare and contrast the two plots. Which do you prefer? Why?
Which shows the Before – During – After sequence best?
Which shows the effect of the process change best?
Time order may be preferable to alphabetical order, in that it reflects a key feature of the study.
On the other hand, the alphabetical order places the two samples without the modification (After and
Before) next to each other, which highlights their similarity as well as the difference of the "During" sample.
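The difference between the two plots comes down to how the sample labels are ordered once the data are stacked into a single Duration column with a Sample label column. The Python sketch below uses invented values (not the laboratory data) and hypothetical helpers `stack` and `group` to show that sorting the labels as text gives After, Before, During, whereas the study's time order has to be supplied explicitly.

```python
# Hypothetical unstacked worksheet: one column per sample.
# These values are invented for illustration; they are not the laboratory data.
unstacked = {
    "Before": [38, 41, 69],
    "During": [49, 52, 57],
    "After": [40, 43, 46],
}

def stack(columns):
    """Hypothetical helper: turn one-column-per-sample data into (Duration, Sample) rows."""
    return [(value, sample) for sample, values in columns.items() for value in values]

def group(rows, order):
    """Collect stacked rows back into samples, listed in the given order."""
    return {s: [value for value, sample in rows if sample == s] for s in order}

rows = stack(unstacked)

# Sorting the text labels gives the alphabetical order seen in the second plot
alphabetical = sorted({sample for _, sample in rows})
# The study's time order has to be supplied explicitly
time_order = ["Before", "During", "After"]

print(list(group(rows, alphabetical)))  # panel order if labels are sorted as text
print(list(group(rows, time_order)))    # panel order following the study
```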
Make boxplots
Compare the results of brushing boxplots with brushing dotplots.
As only exceptional cases are displayed individually in boxplots, only those cases respond to
brushing in boxplots.
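Which cases a boxplot displays individually depends on an outlier rule. Assuming the conventional 1.5 × IQR rule (points more than 1.5 interquartile ranges beyond the quartiles are plotted separately; Minitab's documentation should be checked for its exact rule), the Python sketch below applies it to the quartiles in the Session window output. Only the sample extremes can be checked from the summary table, since the other individual values are not listed there; `fences` and `flagged_extremes` are hypothetical helpers.

```python
# (Min, Q1, Q3, Max) from the Session window output
summary = {
    "Before": (30.00, 34.00, 43.00, 69.00),
    "During": (37.00, 44.00, 58.25, 69.00),
    "After": (32.00, 40.00, 48.00, 54.00),
}

def fences(q1, q3, factor=1.5):
    """Limits beyond which a point would be plotted individually under the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr

def flagged_extremes(lo, q1, q3, hi):
    """Return whichever of the sample minimum and maximum fall outside the fences."""
    lower, upper = fences(q1, q3)
    return [x for x in (lo, hi) if x < lower or x > upper]

for name, (lo, q1, q3, hi) in summary.items():
    lower, upper = fences(q1, q3)
    print(f"{name}: fences ({lower:.2f}, {upper:.2f}), extremes flagged: {flagged_extremes(lo, q1, q3, hi)}")
```

Under this rule only the Before maximum of 69 lies beyond its upper fence (43 + 1.5 × 9 = 56.5), consistent with a single exceptional case being the only point that responds to brushing in the boxplots.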
Note that, with larger data sets, each dot in a dotplot may represent more than one case, so brushing
does not work at all (see Exercise 3).
Make histograms
The Multiple Graphs button in the Histogram dialog box offers an option to "Show Graph
Variables In separate panels of the same graph". This results in the following layout:
[Figure: Histogram of Before, During, After, with the three histograms (Before, During, After) in separate panels of one graph; vertical axis Frequency, 0 to 8; horizontal axis 30 to 70.]
in which the histograms are not stacked and are therefore difficult to compare. It is preferable to
stack the individual histograms using the Layout Tool, as in the Laboratory. When displaying
the stacked histograms, the choice of display size is important. A small display such as
[Figure: stacked histograms of Before, During and After, each with a horizontal axis from 30 to 70, in a small display.]
is preferable to an oversized display such as the one below. Comparing histograms spread over a
large area requires considerable eye travel, and a substantial body of research shows that this
inhibits easy interpretation.
[Figure: the same stacked histograms of Before, During and After, each with a horizontal axis from 30 to 70, in an oversized display.]
Dot plots, box plots and histograms graphically convey information concerning
frequency distributions. Which of the three conveys the most information? Which
conveys the least?
The dot plots convey the most information, in that they display all the data points. Boxplots convey
the least, being based on 5-number summaries (see SideNote, page 12).
For the purpose of assessing the effect of the process change, choosing one form of display is a
matter of personal preference.
Boxplots seem to differentiate more clearly between samples.
For the purpose of assessing Normality, the histograms give a better view of the general shape
of the frequency distributions. (However, there are better ways of assessing Normality,
particularly the Normal probability plot, which will be introduced later in the course.)
A key question is: how different do the samples have to be to conclude that the process
change had an effect? Is there really a difference between During and Before or After? Is
there really a difference between Before and After?
Conceivably, any apparent change could be explained by the chance variation known to affect
all processes. A formal answer is supplied by a test of statistical significance. If it is concluded
that there is a process change effect, a more important question is what the consequences
of the observed effect are, as measured in terms of process improvement, consequent customer
satisfaction and, ultimately, increased profit.
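As one illustration of such a test, and not necessarily the one the course will introduce, the two-sample t statistic can be computed from the summary table alone. The Python sketch below uses the Welch form, assuming independent samples and approximately Normal data; `welch_t` is a hypothetical helper. Values of |t| well above about 2 (roughly the two-sided 5% point of the t distribution for the 33 to 38 degrees of freedom involved here) suggest a difference larger than chance variation would readily explain. Note that the exceptional Before value inflates that sample's standard deviation, which affects any comparison involving Before.

```python
import math

# (Mean, StDev, N) from the Session window output
summary_stats = {
    "Before": (40.55, 9.14, 20),
    "During": (51.20, 8.98, 20),
    "After": (43.00, 6.06, 20),
}

def welch_t(a, b):
    """Welch t statistic for mean(b) - mean(a) and its approximate degrees of freedom."""
    (m1, s1, n1), (m2, s2, n2) = summary_stats[a], summary_stats[b]
    v1, v2 = s1**2 / n1, s2**2 / n2  # squared standard errors of the two means
    t = (m2 - m1) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

for a, b in [("Before", "During"), ("After", "During"), ("Before", "After")]:
    t, df = welch_t(a, b)
    print(f"{b} - {a}: t = {t:.2f}, df = {df:.1f}")
```

On these summaries, During differs from both Before (t about 3.7) and After (t about 3.4), while Before and After differ by roughly one standard error (t about 1.0).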
Which would you prefer as a summary statistic for spread in the Before sample: standard
deviation, range or interquartile range?
The interquartile range is insensitive to the presence of the exceptional case and so gives an
assessment of spread more appropriate to the normal operation of the process. Exceptional
cases should be treated as such.
Statisticians describe estimates that are insensitive to the presence of exceptional cases as
robust.
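For the Before sample, the table gives a range of 69 - 30 = 39 against an interquartile range of 43 - 34 = 9. A small sketch of the robustness point, using Python's standard `statistics` module and an invented sample of 20 values (not the laboratory data): replacing the largest value by an exceptional one changes the standard deviation and the range but leaves the interquartile range unchanged, because the quartiles depend only on values near the 25% and 75% positions. `statistics.quantiles` with its default 'exclusive' method interpolates at positions (n + 1)/4 and 3(n + 1)/4; that this matches Minitab's quartile convention exactly is an assumption here. The helper `spread` is hypothetical.

```python
import statistics

def spread(values):
    """Standard deviation, range and interquartile range of a sample."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return {
        "StDev": statistics.stdev(values),
        "Range": max(values) - min(values),
        "IQR": q3 - q1,
    }

# Hypothetical sample of 20 durations (invented, not the laboratory data)
typical = [30, 32, 34, 34, 35, 37, 38, 39, 40, 40,
           41, 41, 42, 43, 43, 44, 45, 46, 47, 49]
# The same sample with its largest value replaced by an exceptional one
with_exception = typical[:-1] + [69]

for label, sample in (("typical", typical), ("with exception", with_exception)):
    print(label, {k: round(v, 2) for k, v in spread(sample).items()})
```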