Download lab 9 anova - Faculty Web Pages

Mathematics 244
Studio Lab 8: ANALYSIS OF VARIANCE
Please answer questions in the answer template (Lab8ans.doc).
Introduction
The analysis of variance (ANOVA) procedure enables us to perform a test of the research
hypothesis, Ha, of whether there are any differences in the means of several populations,
provided that certain assumptions hold. Using data obtained by taking simple random samples
from each population, the test statistic is based on a comparison of the “average” variation
between samples with the “average” variation within samples. The larger the variation between
samples relative to the variation within, the greater the evidence against the null hypothesis, H0,
that the population means are equal.
As a consequence of this approach to testing the null and alternative hypotheses above, for a
given difference between population means (Ha is true), the greater the variation within the
populations, the less likely we are to detect this difference (that is, correctly reject H0).
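The comparison of between-sample and within-sample variation described above can be sketched in a few lines of Python. This is only an illustration: the function name `one_way_f` and the two tiny data sets are hypothetical and are not part of the lab. Both data sets have the same sample means, but the second has much more spread within each sample.

```python
from statistics import mean

def one_way_f(samples):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(samples)                               # number of groups
    n_total = sum(len(s) for s in samples)         # total observations, N
    grand_mean = sum(sum(s) for s in samples) / n_total
    # "Average" variation between samples: mean square between groups.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    ms_between = ss_between / (k - 1)
    # "Average" variation within samples: mean square within groups.
    ss_within = sum(sum((x - mean(s)) ** 2 for x in s) for s in samples)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, k - 1, n_total - k

# Hypothetical data: identical sample means (125, 135, 140), different spread.
tight = [[124, 125, 126], [134, 135, 136], [139, 140, 141]]
loose = [[113, 125, 137], [123, 135, 147], [128, 140, 152]]

f_tight, df1, df2 = one_way_f(tight)
f_loose, _, _ = one_way_f(loose)
print(f"df = ({df1}, {df2})")
print(f"F with small within-sample spread: {f_tight:.2f}")
print(f"F with large within-sample spread: {f_loose:.2f}")
```

The numerator of F is the same for both data sets, so the drop in F comes entirely from the larger within-sample mean square.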
We will explore this idea further in the lab by comparing the sodium levels of three different
breakfast cereals:
“Froot Loops” (manufactured by Kellogg’s), “Fruity Pebbles”
(manufactured by Post), and “Trix” (manufactured by General Mills). Based on the nutritioninformation labels, for 1-ounce (approximately 30-g) servings, the sugar content per serving of
these three cereals is about the same. However, the per-serving sodium content is different: 125
mg for “Froot Loops”, 135 mg for “Fruity Pebbles”, and 140 mg for “Trix”.1
Now it will not be the case that every single serving of these cereals will contain exactly these
stated amounts — there will be variation in the sodium content of single servings of any cereal.
This is the variation within the population of all possible single servings of each cereal produced
by the manufacturing process. To get an idea of the magnitude of this variation for all three
cereals, we consulted a food scientist, who indicated that the standard deviation of sodium
content for each is approximately 12 mg per serving.
¹ 1993 figures. Data obtained from the StatLib archive of the American Statistical Association at
http://lib.stat.cmu.edu/datasets/1993.expo/cereal .
Part I: The Mechanics Behind ANOVA
Using the nominal values of the manufacturers’ stated sodium contents as population means and
the standard deviation of 12 mg, we created normal populations of the sodium content in the
servings for each of the three cereals. Using these populations, we will model the sampling of
servings from the three manufacturers’ processes. We will draw random samples of size 20 from
each cereal type and perform an ANOVA to test the null hypothesis that the mean per-serving
sodium contents are the same for all three cereals vs. the alternative hypothesis that some of the
mean per-serving sodium contents for the three cereals differ.
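For readers who want to see the sampling model outside MINITAB, the setup just described can be mimicked in Python. This is a rough sketch under stated assumptions: it draws directly from normal distributions with the stated means and a standard deviation of 12 mg rather than from the lab's actual populations, and the seed and variable names are arbitrary.

```python
import random
from statistics import mean, stdev

rng = random.Random(244)  # arbitrary seed, used only to make the run repeatable

# Nominal per-serving sodium contents (mg) taken from the lab text.
NOMINAL_MEANS = {"Froot Loops": 125, "Fruity Pebbles": 135, "Trix": 140}
SD = 12   # standard deviation quoted in the lab (mg per serving)
N = 20    # sample size per cereal

# One simulated sample of 20 servings per cereal.
samples = {
    name: [rng.gauss(mu, SD) for _ in range(N)]
    for name, mu in NOMINAL_MEANS.items()
}

for name, values in samples.items():
    print(f"{name:15s} mean = {mean(values):6.1f} mg   sd = {stdev(values):5.1f} mg")
```

Each run with a different seed gives different sample means and standard deviations, which is the sampling variability the ANOVA has to contend with.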
i) What assumptions must be met in order to perform an ANOVA? (You may wish to refer to the
“Procedures” handout.)
ii) Specify the hypotheses to be tested.
H0:
Ha:
iii) Is H0 or Ha true here?
Even though we know which of the two hypotheses is true, we will nevertheless go through the
process of taking independent samples of cereal servings using the three populations of poker
chips to demonstrate the necessary sampling protocol for an analysis of variance, and also to
determine if our test makes the correct decision. We will sample n1, n2, and n3 poker chips
one-at-a-time with replacement from “Froot Loops”, “Fruity Pebbles”, and “Trix”, respectively.
iv) In a blank MINITAB worksheet, label columns C1-C3 with the cereal names. Type all of the
collected sodium-content observations into the appropriate columns.
v) What is the number of observations in each sample? What is the total number of
observations?
n1 =
n2 =
n3 =
N =
Now we will test the hypotheses, using an ANOVA, at the 0.05 level of significance.
vii) Calculate the numbers of degrees of freedom for this test. Be sure to indicate your
calculations. Determine the critical value, Fα, for this test. Use α = 0.05.
df = ( , )
Fα =
viii) Perform an ANOVA to test the hypotheses as follows. MINITAB has an “unstacked” one-way
analysis-of-variance procedure which also produces simultaneous boxplots of the data.
TYPING COMMANDS
MTB > aovoneway C1-C3 ;
SUBS> gbox .
(“aovoneway” may be abbreviated “aovo”)
USING THE MOUSE
Stat → ANOVA → Oneway (Unstacked)... (not just “Oneway”)
Type C1-C3 in the box for the Responses.
Click on Graphs...
Select Boxplots of data.
Click OK.
Click OK.
Copy both the ANOVA output (from the session window) and the boxplots here.
ix) Compare the value of the test statistic, F0, to the critical value, Fα. What decision do you
make?
x) Use the P-value from the ANOVA output to conduct the test and write down your decision.
xi) Does the information provided by the boxplots confirm your decision? Explain.
xii) Write a concluding sentence to interpret the results of the test in the context of the mean
sodium content of these three cereals.
xiii) Is the test decision correct?
xiv) Note that the MINITAB output includes a graphical display of confidence intervals for each
of the three population means. Although these confidence intervals do not provide a complete
picture of the differences in means, they provide a rough idea of when to reject H0, namely, when
two or more intervals don’t overlap. Based on this notion, suggest why the confidence intervals
support the test results.
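The degrees-of-freedom bookkeeping and the critical-value decision rule used in Part I follow a fixed pattern, sketched below in Python. The design (four groups of 10), the observed F of 4.20, and the helper names are all hypothetical and deliberately differ from the lab's own numbers; the critical value 2.87 is an approximate table value for that hypothetical design, not something computed here.

```python
def anova_degrees_of_freedom(sample_sizes):
    """Return (numerator df, denominator df) for a one-way ANOVA."""
    k = len(sample_sizes)          # number of groups being compared
    n_total = sum(sample_sizes)    # total number of observations, N
    return k - 1, n_total - k

def decide(f0, f_crit):
    """Critical-value rule: reject H0 when the observed F exceeds the critical value."""
    return "reject H0" if f0 > f_crit else "do not reject H0"

# Hypothetical design: four groups of 10 observations each.
df = anova_degrees_of_freedom([10, 10, 10, 10])
print("df =", df)

# 2.87 is roughly the upper 5% point of F(3, 36) read from a printed table
# (an approximation assumed for this sketch); 4.20 is a made-up observed F.
print(decide(f0=4.20, f_crit=2.87))
```

The P-value approach in step x leads to the same decision: F0 exceeds the critical value exactly when the P-value falls below α.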
Part II: Effect of Within-Population Variation on the Power of the Test
In this part of the lab, we will sample from large populations of the sodium content of 5000
single servings of these three cereals in order to examine how the level of within-population
variation influences our ability to detect a difference in population means when one exists.
Task A: Small Within-Population Variation
Using the manufacturers’ nominal per-serving sodium content values as the population means
and the quoted standard deviation of 12 mg per serving as the population standard deviations, we
created a normal population of the sodium content of 5000 servings for each of the three cereals.
Columns C1-C3 of the MINITAB worksheet lab8dat.mtw contain these populations.
We will take a random sample of 20 servings from each of the three populations. We want to
know, based on our samples, whether we can detect any difference in the mean sodium content
per serving of these three cereals.
i) Specify the hypotheses to be tested. Which is true?
H0:
Ha:
ii) Use MINITAB to select a sample of size 20 without replacement from each population,
placing the samples in columns C4-C6, respectively. (Recall the SAMPLE command: sample
20 c1 c4 — repeat for the others.)
iii) Stack your three samples, storing the per-serving sodium levels in column C7 and the
subscripts in column C8.
iv) Perform an ANOVA to test the hypotheses using the following procedure:
TYPING COMMANDS
MTB > oneway C7 C8
USING THE MOUSE
Stat → ANOVA → Oneway... (not “Oneway Unstacked”)
Type C7 for the Response variable and C8 for the Factor.
Click OK.
Copy the ANOVA output from the session window here.
v) What decision do you make? Use α = 0.05.
vi) Write a concluding sentence to interpret the results of the test in the context of the mean
sodium content of these three cereals.
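The sampling and stacking steps in Task A can also be pictured in Python. This is a sketch only: the three populations are generated here as stand-ins for columns C1-C3 of lab8dat.mtw (the worksheet's actual values are not reproduced), and the names `values` and `subscripts` are hypothetical labels for what MINITAB stores in C7 and C8.

```python
import random

rng = random.Random(8)  # arbitrary seed for repeatability

# Stand-ins for C1-C3: 5000 servings per cereal, means from the lab text, SD 12 mg.
populations = {
    1: [rng.gauss(125, 12) for _ in range(5000)],  # "Froot Loops"
    2: [rng.gauss(135, 12) for _ in range(5000)],  # "Fruity Pebbles"
    3: [rng.gauss(140, 12) for _ in range(5000)],  # "Trix"
}

# random.sample draws without replacement, like MINITAB's SAMPLE command by default.
samples = {group: rng.sample(pop, 20) for group, pop in populations.items()}

# "Stacked" layout: one column of responses and one column of group subscripts.
values = []       # plays the role of C7
subscripts = []   # plays the role of C8
for group, sample in samples.items():
    values.extend(sample)
    subscripts.extend([group] * len(sample))

print(len(values), "stacked observations")
print("first three rows:", list(zip(subscripts, values))[:3])
```

The unstacked procedure in Part I takes one column per group, while the stacked procedure takes a single response column plus a factor column; the data are the same either way.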
Task B: Large Within-Population Variation
What if the quoted standard deviation of 12 mg per serving were incorrect? For example, what if
the standard deviation of the sodium content per serving were 24 mg instead of 12 mg? Using the
manufacturers’ nominal per-serving sodium content values as the population means and 24 mg per
serving as the population standard deviations, we created a normal population of the sodium
content of 5000 servings for each of the three cereals. Columns C11-C13 of the MINITAB worksheet
lab8dat.mtw contain these populations.
i) Specify the hypotheses to be tested. Which is true?
H0:
Ha:
ii) Use MINITAB to select a sample of size 20 without replacement from each population,
placing the samples in columns C14-C16, respectively.
iii) Stack your three samples, storing the per-serving sodium levels in column C17 and the
subscripts in column C18.
iv) Perform an ANOVA to test the hypotheses.
v) What decision do you make? Use α = 0.05.
vi) Write a concluding sentence to interpret the results of the test in the context of the mean
sodium content of these three cereals.
Report your results to the Lab Instructor, who will collect and display them.
vii) (a) For what proportion of the replications in the lab was the null hypothesis rejected this
time? (b) What quantity does this proportion estimate?
(a)
(b)
Task C: Comparison of the Results from Tasks A and B
i) Plot the data for both sets of stacked samples obtained in Tasks A and B simultaneously,
using the following procedure:
USING THE MOUSE
Graph → Plot...
Type C8 for the Y variable and C7 for the X variable for Graph 1.
Type C18 for the Y variable and C17 for the X variable for Graph 2.
Select Display → Symbol and For each → Group and type C8 for the
Group variables. (Do this only once, not twice.)
Select Frame → Multiple Graphs...
Select Each graph on a separate page.
Select Same X and same Y.
Click OK.
Click OK.
<Note: There is a page-break here for a reason. If your boxes on the next page cross a page
boundary, you may not be able to print your lab.>
Copy the graphs on the left and fill in the corresponding values of si on the right. Identify the
values of x̄1, x̄2, and x̄3 for each graph, writing them in by hand once you have printed your
lab. Draw a vertical line (by hand, once you have printed your lab) from each x̄i to its
associated “dotplot.” Note that the MINITAB output from the ONEWAY command provides
the sample means and standard deviations.
Graph 1 (Task A samples):
s3 =
s2 =
s1 =
Graph 2 (Task B samples):
s3 =
s2 =
s1 =
ii) By comparing the three “dotplots” obtained for the samples in Task A to those for the samples
in Task B, explain how the spread in the plot affects your ability to detect differences in the
means of the three populations.
iii) Compare the sample standard deviations from the samples taken in Task A with the sample
standard deviations from the samples taken in Task B. Which group of values is larger? Is this
surprising? Explain.
iv) What happens to the power of the test when the within-population variation increases?
Explain what this implies in terms of how the within-population variation affects our ability to
detect a difference between population means when one exists.
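The class-wide tally of rejections can be imitated with a small simulation. The sketch below is hedged in several ways: it draws from normal distributions directly rather than from the 5000-serving worksheet populations, it uses 3.16 as an approximate table value for the upper 5% point of F(2, 57), and the function names, seed, and number of replications are arbitrary choices made for illustration.

```python
import random
from statistics import mean

MEANS = (125, 135, 140)  # nominal sodium contents (mg) from the lab text
F_CRIT = 3.16            # approximate upper 5% point of F(2, 57); assumed table value

def f_statistic(samples):
    """One-way ANOVA F statistic: mean square between / mean square within."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    ms_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples) / (k - 1)
    ms_within = sum(sum((x - mean(s)) ** 2 for x in s) for s in samples) / (n_total - k)
    return ms_between / ms_within

def rejection_rate(sd, n=20, reps=2000, seed=244):
    """Proportion of simulated replications in which H0 is rejected at the 5% level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        samples = [[rng.gauss(mu, sd) for _ in range(n)] for mu in MEANS]
        if f_statistic(samples) > F_CRIT:
            rejections += 1
    return rejections / reps

power_sd12 = rejection_rate(sd=12)  # Task A setting
power_sd24 = rejection_rate(sd=24)  # Task B setting
print(f"estimated rejection rate, SD = 12 mg: {power_sd12:.2f}")
print(f"estimated rejection rate, SD = 24 mg: {power_sd24:.2f}")
```

Because the population means really do differ in both settings, each rejection rate is a simulation-based estimate of the power of the test under that setting.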