Statistics 230
All Homework Assignments
Section 0
1. Computer output for analysis of a random sample of data is shown below. Some of the
quantities are missing. Compute the values of the missing quantities:
Variable   N   Mean    SE of mean   Std. Dev.   Variance   Minimum   Maximum
Y          9   19.96   ?            3.12        ?          15.94     27.16
2. Consider the computer output shown below.
One-Sample T: Y
Test of mu=91 vs. not=91
Variable   N    Mean      Std. Dev   SE of mean   95% CI         T      P
Y          25   92.5805   ?          0.4673       (91.6160, ?)   3.38   0.002
(a) Fill in the missing values in the output. Can the null hypothesis be rejected at the 0.05
level? Why?
(b) Is this a one-sided or a two-sided test?
(c) If the hypotheses had been H0 : µ = 90 versus H1 : µ ≠ 90 would you reject the null
hypothesis at the 0.05 level?
(d) Use the output and the t table (or R) to find a 99 percent two-sided CI on the mean.
(e) What would the P-value be if the alternative hypothesis were H1 : µ > 91?
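The missing entries in output of this kind follow from a few identities: SE = s/√n, T = (ȳ − µ0)/SE, and CI = ȳ ± t·SE. The R sketch below is one possible way to check your hand calculations, assuming only the numbers printed in the output above; the variable names are illustrative and not part of the assignment.

```r
# Values copied from the one-sample T output above
n    <- 25
ybar <- 92.5805
se   <- 0.4673
mu0  <- 91

s      <- se * sqrt(n)       # standard deviation implied by SE = s / sqrt(n)
t_stat <- (ybar - mu0) / se  # test statistic for H0: mu = mu0

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_two <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

# Two-sided confidence intervals: ybar +/- t * SE
ci95 <- ybar + c(-1, 1) * qt(0.975, df = n - 1) * se
ci99 <- ybar + c(-1, 1) * qt(0.995, df = n - 1) * se

round(c(s = s, t = t_stat, p = p_two), 4)
round(rbind(ci95, ci99), 4)
```

Changing `mu0` to 90 recomputes the statistic for part (c), and a one-sided p-value for part (e) would use `pt(t_stat, df = n - 1, lower.tail = FALSE)` without the factor of 2.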
3. Two machines are used for filling plastic bottles with a net volume of 16.0 ounces. The
filling processes can be assumed to be normal, with standard deviations of σ1 = 0.015 and
σ2 = 0.018. The quality engineering department suspects that both machines fill to the
same net volume, whether or not this volume is 16.0 ounces. An experiment is performed
by taking a random sample from the output of each machine. (Don’t forget that for parts
(a)-(c) of this problem, we know σ1 and σ2 .)
Machine 1: 16.03, 16.04, 16.05, 16.05, 16.02, 16.01, 15.96, 15.98, 16.02, 15.99
Machine 2: 16.02, 15.97, 15.96, 16.01, 15.99, 16.03, 16.04, 16.02, 16.01, 16.00
(a) State the hypotheses that should be tested in this experiment.
(b) Find the P-value for this test and test these hypotheses using α = 0.05. What are your
conclusions?
(c) Find a 95 percent confidence interval on the difference in the mean fill volume for the
two machines.
(d) Re-do part (b), but this time assume that σ1 = σ2 = σ and that the value of σ is
unknown.
4. Photoresist is a light-sensitive material applied to semiconductor wafers so that the circuit
pattern can be imaged onto the wafer. After application, the coated wafers are baked
to remove the solvent in the photoresist mixture and to harden the resist. Here are
measurements of photoresist thickness (in kA) for eight wafers baked at 95C and eight wafers
baked at 100C. Assume that all 16 of the runs were made in random order.
95 C: 11.176, 7.089, 8.097, 11.739, 11.291, 10.759, 6.467, 8.315
100 C: 5.263, 6.748, 7.461, 7.015, 8.133, 7.418, 3.772, 8.963
Is there evidence to support the claim that the higher baking temperature results in wafers
with a lower mean photoresist thickness? Use α = 0.05. Use the entire State-Plan-Solve-Conclude
process. Write a short (one-paragraph) executive summary of your conclusions.
5. Data on a random variable Y were 12, 8, 14, 20, 26, 26, 20, 21, 18, 24, 30, 21, 18, 16, 10, and
20. Assuming this is a random sample from a normal distribution, test each of the following.
Let α = 0.05.
(a) H0 : µ = 12 versus H1 : µ > 12 assuming that σ = 7
(b) H0 : µ = 16 versus H1 : µ ≠ 16 assuming that σ = 7
(c) H0 : µ = 18 versus H1 : µ > 18 assuming that the value of σ is unknown.
6. Pretest data for experimental and control groups on course content in a special
vocational-industrial course indicated:
Experimental: ȳ1 = 9.333 s1 = 4.945 n1 = 12
Control: ȳ2 = 8.375 s2 = 1.187 n2 = 8
(a) Test the hypothesis of equal means using α = 0.05. Give a conclusion.
(b) Build a 95% confidence interval for µ1 − µ2 . Interpret the interval.
7. Suppose two samples randomly selected from two independent normal populations give
n1 = 9 ȳ1 = 16.0 s1² = 5.0
n2 = 4 ȳ2 = 12.0 s2² = 3.0
(a) Is there enough evidence to claim that the mean of population 1 is greater than the
mean of population 2? Give a clear conclusion. (Use α = 0.05)
(b) Build a 90% confidence interval for µ1 − µ2 . Interpret the interval.
8. Susan and Olivia both took an introductory statistics class; however, Susan attends University
A and Olivia attends University B. The final exam for University A has µ = 50 and σ = 10
and Susan scored 62 points. The final exam for University B has µ = 1500 and σ = 25 and
Olivia scored 1540 points. We want to know who understands statistics better by comparing
Susan’s and Olivia’s final exam scores. Assuming the student body at each university is
comparable, who performed better on the final exam? Explain.
9. Assume we are looking at University A’s final exam (from the previous problem).
(a) With the information given, are you able to calculate the probability of a randomly
selected student scoring higher than 60 points?
(b) What would you have to assume about the distribution of the exam scores in order to
answer part (a)? Make your assumption(s) and calculate your answer.
(c) With the information given, are you able to calculate the probability that a randomly
selected group of 10 students will have a mean score above 60 points?
(d) What would you have to assume about the distribution of the exam scores in order to
answer part (c)? Make your assumption(s) and calculate your answer.
(e) With the information given, are you able to calculate the probability that a randomly
selected group of 100 students will have a mean score above 60 points?
(f) What would you have to assume about the distribution of the exam scores in order to
answer part (e)? Make your assumption(s) and calculate your answer.
10. Suppose we are testing patients for cancer. Our null hypothesis is that the patient is healthy
(i.e., does NOT have cancer).
(a) Describe a type I error.
(b) Describe a type II error.
(c) Would it be more important to minimize α or β in this case? Explain.
11. We are interested in µ = the mean age of current graduate students at BYU. A previous
study (from 2008) estimated the mean to be 25 with a standard deviation of 3.
(a) What sample size do we need so that we can construct a 99% confidence interval
estimating µ that has a margin of error equal to 2 years?
(b) Consider a hypothesis test of Ho : µ = 25 years vs. Ha : µ > 25 years using α = 0.01.
What sample size do we need in order to detect a difference of 2 years with 80% power?
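For planning calculations like these, the usual normal-approximation formulas are n = (z_{α/2} σ / E)² for a margin of error E and n = ((z_α + z_β) σ / δ)² for a one-sided test that should detect a difference δ. The R sketch below applies them under the assumption that the 2008 standard deviation of 3 can be treated as the known σ; a t-based calculation (for example with `power.t.test`) would typically give a slightly larger n.

```r
sigma <- 3  # planning value for sigma, taken from the 2008 study (an assumption)

# (a) 99% confidence interval with margin of error E = 2
E    <- 2
n_ci <- ceiling((qnorm(1 - 0.01 / 2) * sigma / E)^2)

# (b) One-sided test with alpha = 0.01, power = 0.80, difference delta = 2
delta   <- 2
n_power <- ceiling(((qnorm(1 - 0.01) + qnorm(0.80)) * sigma / delta)^2)

c(n_ci = n_ci, n_power = n_power)
```

The `ceiling()` call rounds up because a fractional sample size must be rounded to the next whole subject to meet the stated requirement.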
12. Suppose we are trying to find a faster drying glue. In a study done years ago, Brand A and
Brand B were tested 8 times each (on a total of 16 identical surfaces) and the drying times
were:
      BrandA   BrandB
1     11.56    16.41
2      9.16    18.52
3     10.81    14.13
4     11.35    12.49
5      6.04    16.29
6      8.60    14.54
7      9.72    14.60
8     12.26    15.79
Now suppose Brand B has come out with a new and improved version and we want to
collect a new sample of Brand A and the new Brand B drying times. We believe the drying
time will be more similar and want to be able to compare mean drying times. To estimate
the variability for our new study, we will use the pooled sample variance from the old study
as an estimate for σ².
(a) What sample size do we need so that we can construct a 95% confidence interval
estimating µB − µA that has a margin of error equal to 1 minute?
(b) Consider a hypothesis test of Ho : µB = µA vs. Ha : µB > µA , using α = 0.05. What
sample size do we need in order to detect a difference of 1 minute with 90% power?
13. An article in the Journal of Strain Analysis compares several procedures for predicting the
shear strength for steel plate girders. Data for nine girders in the form of the ratio of predicted
to observed load for two of these procedures, the Karlsruhe and Lehigh methods are as follows:
Girder   Karlsruhe Method   Lehigh Method
1        1.186              1.061
2        1.151              0.992
3        1.322              1.063
4        1.339              1.062
5        1.200              1.065
6        1.402              1.178
7        1.365              1.037
8        1.537              1.086
9        1.559              1.052
(a) Using α = 0.05, is there evidence to support a claim that there is a difference in mean
performance between the two methods? As part of your answer, report the p-value
(using R to compute it) and then interpret the p-value in the context of the problem.
(b) Construct a 95% confidence interval for the difference in mean predicted to observed
load.
14. The Center for the Study of Violence wants to determine whether a conflict-resolution program
in a particular high school alters aggressive behavior among its students. For 10 students,
aggression was measured both before and after they participated in the conflict resolution
course. Their scores were the following (higher scores indicate greater aggressiveness):
      Before participating   After participating
1     10                     8
2      3                     4
3      4                     2
4      8                     5
5      8                     7
6      9                     8
7      5                     4
8      7                     5
9      1                     2
10     7                     5
(a) Test the Null Hypothesis that aggression does not differ as a result of participating in
the conflict-resolution program. Show your work.
(b) Re-evaluate this problem using the two sample t-test method (i.e., ignore the pairing in
the data). As always, provide the test statistic and exact p-value.
(c) Create a confidence interval for µd using the paired-comparison confidence interval
$$\bar{d} \pm t_{\alpha/2,\,n-1}\,\frac{s_d}{\sqrt{n}}$$
and compare with the two-sample confidence interval for µ1 − µ2
$$\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\,n_1+n_2-2}\,\sqrt{s_{pl}^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}.$$
How do the centers of the two intervals compare? How do the widths of the two intervals
compare? Why are the two intervals different?
(d) If you were to rerun this experiment to test the effectiveness of the treatment, would
you use a two sample t-test or a paired comparison test? Explain your reasoning.
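As a check on hand calculations for problem 14, both analyses can be run with `t.test` in R. The sketch below assumes the ten before/after scores are entered in the order listed in the table; the object names are illustrative only.

```r
# Aggression scores for the 10 students, in the order listed in problem 14
before <- c(10, 3, 4, 8, 8, 9, 5, 7, 1, 7)
after  <- c(8, 4, 2, 5, 7, 8, 4, 5, 2, 5)

# Paired analysis: based on the within-student differences (df = n - 1)
paired <- t.test(before, after, paired = TRUE)

# Two-sample analysis with pooled variance: ignores the pairing (df = n1 + n2 - 2)
pooled <- t.test(before, after, var.equal = TRUE)

# Point estimates at the center of each interval
center_paired <- mean(before - after)
center_pooled <- mean(before) - mean(after)

# Interval widths (upper limit minus lower limit)
width_paired <- diff(paired$conf.int)
width_pooled <- diff(pooled$conf.int)

c(center_paired = center_paired, center_pooled = center_pooled,
  width_paired = width_paired, width_pooled = width_pooled)
```

Printing `paired` and `pooled` shows the test statistics and p-values requested in parts (a) and (b); the interpretation is still left to you.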
Section 1.1
15. #1.3 on p. 34-36
16. #1.4 on p. 34-36. For part (c), assume that the measurements are on a collection of sampled
units. For part (h), let the statement begin: “For a SIMPLE random sample,...”
17. #1.5 on p. 34-36
18. #1.6 parts (a) and (b) on p. 34-36
19. #1.8 on p. 34-36
Chapter 4
20. #A3 on p. 109
21. #A5 on p. 109
22. #A6 on p. 109
23. #A8 on p. 109
24. #A9 on p. 109
25. #B1 on p. 116
26. #B4 on p. 116
27. #B8 on p. 117
28. #B10 on p. 117
29. #B11 on p. 117
30. #B14 on p. 118
31. #C2 on p. 124
32. #C3 on p. 124
33. Read the study described below and answer the questions.
(a) Which of the following best describes the study above? (choose one)
i. designed experiment in which experimental units are randomly sampled from the
population of interest
ii. designed experiment using available experimental units
iii. observational study in which samples are randomly selected from preexisting distinct
groups
iv. observational study using nonrandom sample
(b) Can you conclude that arguing with parents protects children from (or causes decreased
susceptibility to) drugs and alcohol? Explain. If causation cannot be concluded, how
could the study be changed to make causation a plausible conclusion?
34. Read the study described below and answer the questions.
(a) Which of the following best describes the study above? (choose one)
i. designed experiment in which experimental units are randomly sampled from the
population of interest
ii. designed experiment using available experimental units
iii. observational study in which samples are randomly selected from preexisting distinct
groups
iv. observational study using nonrandom sample from preexisting distinct groups
(b) Can you conclude that boosting a woman’s confidence improves her spatial reasoning
abilities? Explain. If causation cannot be concluded, how could the study be changed
to make causation a plausible conclusion?
Sections 1.2-1.3
35. #1.14 on p. 36-37
36. Consider the spatial reasoning study described in problem number 34, where subjects received
feedback (either criticism or compliments) after their performance on an unrelated pre-task.
Suppose that the mean spatial rotation scores for the groups of interest were as follows:
• mean score for men that were criticized after the pre-task = 85%
• mean score for men that were complimented after the pre-task = 87%
• mean score for women that were criticized after the pre-task = 70%
• mean score for women that were complimented after the pre-task = X
Given what you know about the study and that the study concluded that there was a
significant interaction between gender and pre-task feedback type, which of the following
values for X is most reasonable: 65%, 70%, 72%, or 87%? Explain how you chose your
answer over the other options.
37. Suppose that an experiment is run comparing the final exam grades of Stat 230 students.
Two factors are considered: (i) lecture time (either morning or afternoon) and (ii) major
(stat or non-stat). Suppose that the sample size is large enough so that a difference of at
least 5% on the final exam would be a significant difference across lecture times or across
majors. Further, suppose that the number of students in each of the four treatment groups is
equal and that the mean final exam score for the morning section stat majors was 83%. For
each problem below, create a table formatted as follows (the numbers in italics will be filled
in by you):
                        Major
Lecture Time    stat    non-stat    overall
morning         83      85          84
afternoon       78      72          75
overall         80.5    78.5
(a) What might the means for the other three groups be IF morning did significantly better
than afternoon, stat did significantly better than non-stat, and there was NO evidence
of a lecture time × major interaction?
(b) What might the means for the other three groups be IF morning did significantly better
than afternoon, stat did significantly better than non-stat, and there was strong evidence
of a lecture time × major interaction?
(c) What might the means for the other three groups be IF morning did significantly better
than afternoon, stat and non-stat were equivalent, and there was strong evidence of a
lecture time × major interaction?
(d) What might the means for the other three groups be IF morning and afternoon were
equivalent, stat and non-stat were equivalent, and there was strong evidence of a lecture
time × major interaction?
Chapter 3
Note that problems 6-10 on p. 103 are based on the introductory paragraph labeled “The
bivariate BF[1] model.”
38. #6 on p. 103
39. Fill in the blanks: The estimated effect for long days tells how far it is from ____ to
____. The residual for the first observation tells how far it is from ____ to ____.
40. #8 on p. 103. Also give a p-value for the day length factor and give a conclusion.
41. #9 on p. 103
42. #10 on p. 103
43. #17 on p. 104
44. #23 on p. 106
45. #24 on p. 106. For false statements, re-write the statement so that it is true (changing as
few words as possible...changing a false statement to “Snow is colder than molten lava” will
not be given points...nice try, though).
46. #26 on p. 106
47. Consider the popcorn data on page 3.8 of the lecture notes, with the complete ANOVA table
on page 3.29 of the lecture notes on the course webpage. Give a 95% C.I. for each of the
following differences in means:
(a) µhigh salt - µlow salt
(b) µbuttery oil - µcanola oil
(c) µhigh salt with buttery - µlow salt with canola
(d) Calculate the width of each of the intervals in (a), (b), and (c). (Calculate the upper
confidence limit minus lower confidence limit.) Why is the width in (c) different from
the widths in (a) and (b)?
Chapter 5
NOTE: For all HW problems requiring statistical computing (e.g., R or SAS),
I expect type-written responses. Make sure that you paste in your code and
the appropriate sections of program output in addition to your type-written
conclusions. DO NOT simply attach pages of computer output. Cut and paste
only parts you refer to in your discussion. Large stacks of computer output will
NOT be graded. Also, working in groups is fine, but each student should write
his or her own interpretations/conclusions. Identical HW assignments will be
treated as plagiarism.
48. If data came from a normal distribution, what fraction of the data will be classified as outliers
when using the “Tukey” boxplot in R? Show your work.
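One way to check a hand calculation for problem 48 is to compute the fence positions from the theoretical normal quartiles. The sketch below assumes the default boxplot rule in R (points beyond 1.5 × IQR from the quartiles are flagged) and uses population quartiles; in a finite sample, `boxplot` works from sample hinges, so the observed fraction will vary.

```r
# Quartiles and IQR of the standard normal (any normal gives the same fraction)
q1  <- qnorm(0.25)
q3  <- qnorm(0.75)
iqr <- q3 - q1

# Tukey fences with the default multiplier of 1.5
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Probability that a normal observation falls outside the fences
frac_outliers <- pnorm(lower_fence) + pnorm(upper_fence, lower.tail = FALSE)
frac_outliers
```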
49. Read the cancer.txt data set into R. (The data set is on the course webpage and there is
code that you can cannibalize in section5.R.) The column names are given in the first row
of the file. Suppose we are interested in seeing if the mean survival time in days is the same
for each cancer type. (We’re NOT doing the ANOVA here yet, just checking conditions with
exploratory data analysis.)
(a) Use means, sds, and boxplots to evaluate whether or not these data are appropriate for
an ANOVA. Specifically, are there outliers, unequal sds across cancer type, or evidence
of non-normality?
(b) Repeat part (a) after taking a log transform of the survival times, e.g.:
logsurv <- log(cancer$days)
(c) Compare your answers in parts (a) and (b) and make a recommendation for analysis.
50. Use the R command below to obtain a randomized ordering for 36 subjects that will be
assigned to one of four treatments (A, B, C, and D).
sample(1:36,36,replace=FALSE)
Give your randomized list and explain how you would use this list to assign treatments to
the 36 subjects.
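One possible way to turn the randomized list into treatment assignments is sketched below. It assumes a balanced design with 9 subjects per treatment; the seed value and the rule "first 9 in the list get A, next 9 get B, and so on" are illustrative choices, not requirements of the problem.

```r
set.seed(230)  # hypothetical seed, used only so the example is reproducible

# Random ordering of the 36 subject labels, as in the command above
ordering <- sample(1:36, 36, replace = FALSE)

# Assign treatments in blocks of 9 along the randomized list
assignment <- data.frame(
  subject   = ordering,
  treatment = rep(c("A", "B", "C", "D"), each = 9)
)

table(assignment$treatment)  # each treatment should appear 9 times
head(assignment)
```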
51. Suppose that a veterinary psychologist runs a balanced BF[1] experiment to study the effect
of diet on depression in dogs. She uses a collection of 15 labrador retrievers that have been
diagnosed with severe depression (e.g., listless, apathetic, no interest when live squirrels are in
the room). She places each dog on one of 3 experimental diets (all Cheetos, all steak, all tofu)
for 3 months and then records the depression score for each at the end of the study, where
high depression scores indicate more extreme depression. The mean depression score for each
group was: Cheetos=27, Steak=22, and Tofu=11. Tofu is amazing! (Note: these data were
made up by your instructor.) Create the factor diagram (aka “decomposition tables”) for
the data, with diets as columns. Fill in the locations in each table with their known values,
leaving a “?” at each location where you don’t have enough information to specify the value.
Properly label your diagram and write the df under each table/box.
52. #D8 on p. 178
53. #D9 on p. 178
54. #D10 on p. 178 (Note the typo: “Cond. avg.” should say “Cond. eff.”)
55. #D14 on p. 179 (If you want, you can generate random numbers in R and check some of
these properties yourself. For example, to generate 100 random numbers from a standard
normal distribution, use: x <- rnorm(100).)
56. #D21 on p. 180. Instead of calculating critical values, use your F-statistic for
“Conditions” to calculate a p-value using R.
57. #D22 on p. 181
58. #D24 on p. 181. Include an F-statistic and a p-value for the Group factor.
59. USING R, do a complete analysis of variance comparing survival times for the cancer types
discussed in problem 49. Remember that you will want to compare the log of survival time
(see problem 49(b)). For full credit, you must show all code and the appropriate output.
(a) Calculate and list the mean log-survival time in days for each cancer type.
(b) Test the hypothesis
H0 : All cancer types have the same mean log-survival time.
Give the ANOVA table. Interpret the F statistic and p-value and then make a conclusion.
60. USING SAS, do a complete analysis of variance comparing mean log-survival times for the
cancer types discussed in problem 49. For full credit, you must show all code. To do the
log transform in SAS, adapt the following code (changing MYDIR to your personal Stat 230
directory name):
data cancer;
infile 'MYDIR\cancer.txt' firstobs=2;
input type $ gender $ age days;
logsurv = log(days);
run;
OR, you could use the following:
data cancer;
input type $ gender $ age days;
logsurv = log(days);
datalines;
[PASTE THE CONTENTS OF CANCER.TXT HERE]
;
run;
(a) Calculate and list the mean log-survival time in days for each cancer type.
(b) Test the hypothesis
H0 : All cancer types have the same mean log-survival time.
Give the ANOVA table. (Since your interpretation should be the same here as in the
previous problem where you used R, there is no need to re-write the same interpretation.)
(c) Use the group means or difference in means from the output, along with the MSE from
the ANOVA table, to calculate (by hand) the confidence interval for µkidney − µstomach .
(The formula is written on the last page of the section3 lecture notes posted on the
webpage.)
61. We are interested in comparing 4 different methods for preparing for the ACT exam:
• Method A: Control–just take the exam
• Method B: Take one practice exam
• Method C: Take a prep course online
• Method D: Be hypnotized the day before
You are interested in assessing the power of the F test (in ANOVA) for detecting differences
in preparation method means when the significance level is α = 0.05.
(a) Suppose that ACT scores have a standard deviation of 4.7, and suppose we would like
to evaluate the possibility that the group means are µA = 21, µB = 23, µC = 25, and
µD = 27. In R, make a plot that shows the power of the F test when n = 2, 3, . . . , 20.
(Print and include this plot with your homework.)
(b) What is the smallest value for the group size (n) that gives 85% power?
(c) What would happen to your power curve if your hypothesized means were µA = 27, µB =
25, µC = 23, and µD = 21? Explain your answer at the level of a Stat 121 student.
(d) What would happen to your power curve if your hypothesized means were µA = 21, µB =
21, µC = 21, and µD = 27? Explain your answer at the level of a Stat 121 student.
(e) What would happen to your power curve if your hypothesized means were µA = 21, µB =
21, µC = 27, and µD = 27? Explain your answer at the level of a Stat 121 student.
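A power curve for problem 61(a) can be sketched with `power.anova.test` from base R's stats package. The code below assumes a balanced one-way design with the hypothesized means and σ = 4.7 given in part (a); it is one possible approach, and the course notes may use a different function.

```r
mu    <- c(A = 21, B = 23, C = 25, D = 27)  # hypothesized group means from part (a)
sigma <- 4.7                                # within-group standard deviation
n     <- 2:20                               # candidate group sizes

# Power of the one-way ANOVA F test at alpha = 0.05 for each group size
power <- sapply(n, function(k) {
  power.anova.test(groups = length(mu), n = k,
                   between.var = var(mu), within.var = sigma^2,
                   sig.level = 0.05)$power
})

plot(n, power, type = "b", xlab = "Group size (n)", ylab = "Power of F test")
abline(h = 0.85, lty = 2)  # reference line at 85% power

n[which(power >= 0.85)[1]]  # smallest n in the range reaching 85% power, if any
```

Because the calculation depends on the means only through `var(mu)`, replacing `mu` with the values in parts (c) through (e) is a quick way to see how the curve responds.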
Chapter 6
62. #A1 on p. 207. Note: There is a typo in this problem. Where it reads “Draw and label
a two-way table showing the two TREATMENTS and...” it should read “Draw and label a
two-way table showing the two FACTORS and...”
63. #B3 on p. 214
64. #B5 on p. 215
65. #B6 on p. 215
66. Use the file snapbean.txt (on the webpage) to conduct a two-way ANOVA in R. This
experiment endeavors to evaluate whether the date of sowing and/or the variety of snapbean
plant will affect the total yield of snapbeans. You should test if “Yield” (the response) is
affected by “sowdate” (1=early,...,4=late), “variety” (1, 2, or 3), or the interaction of sowdate
with variety. Conduct your analysis in R. Give code and appropriate output. For each of the
following effects, write a sentence which gives an appropriate conclusion (including references
to the p-value and the hypotheses of interest):
(a) sowdate
(b) variety
(c) interaction
67. Use the file programmers.txt (on the webpage) to conduct a two-way ANOVA in SAS. This
experiment was run to see how the type of experience of computer programmers and/or
the years of experience for programmers impacts their ability to accurately estimate the
time needed (in programmer days) to complete a large systems project. The response
variable “TimePredictionError” represents the difference between the actual time required to
complete a large systems project and the programmer’s estimated time. Note that all values
are positive, meaning that every subject underestimated the length of the task, but larger
values represent larger time-prediction errors. You should test if “TimePredictionError”
(the response) is affected by “LargeSystemExp” (no=experienced only with small systems,
yes=experienced with large systems), “YearsOfExp” (less5 = less than 5 years, less10 =
between 5 and 10 years, more10=more than 10 years), or the interaction of LargeSystemExp
with YearsOfExp. Conduct your analysis in SAS. Give code and appropriate output. For
each of the following effects, write a sentence which gives an appropriate conclusion (including
references to the p-value and the hypotheses of interest):
(a) LargeSystemExp
(b) YearsOfExp
(c) interaction
68. USING SAS, do a complete analysis of variance on heights of singers in a choir, found in the
file singerheights.csv (note that it is comma-delimited). For full credit, you must show all
code. Fit an ANOVA model that includes terms for gender (“f” or “m”), singing part (“low”
or “high”), and the interaction between gender and part. (Note that the low part for females
is generally called alto, high part for females is soprano, low part for males is bass, and high
part for males is tenor. However, we are interested in the association between singing the
high/low part and height, so we are treating this as a 2 × 2 factorial instead of a one-way
anova with 4 levels of “singing part.”)
(a) Find the complete ANOVA table USING TYPE I SS. Carry out the complete analysis
considering the decision flow diagram discussed in class for two-way ANOVA. Give a
complete interpretation for each of the terms in the model.
(b) Find the complete ANOVA table USING TYPE III SS. Carry out the complete analysis
considering the decision flow diagram discussed in class for two-way ANOVA. Give a
complete interpretation for each of the terms in the model.
(c) Why is the SS for gender so much smaller with Type III SS? Explain.
69. For this problem, you will conduct an analysis of the BF[2] data you gathered in your catapult
experiment.
(a) Give the ANOVA table (using either R or SAS). Give your code and your output.
(b) List the three null hypotheses for your experiment. For each hypothesis, write a
conclusion for your test of the hypothesis. (Make sure your statement for each hypothesis
references the p-value and clearly states the conclusion in context.)
Section 7.1
70. #A1 on pages 250-1
71. #A2 on pages 250-1
72. #A3 on pages 250-1
73. Use the file marketing.txt on the website. The first column is sales of a product of interest
(in dollars), the second column is the shelf height factor (shelf height for the product being
sold), and the third column is day of week (the blocking factor). On each day, the researcher
in this study randomly assigned a product of interest to a location on a five-level store shelf
and then recorded the total sales for each shelf at the end of the day. Our primary interest is
to see if the shelf heights have different mean sales.
(a) Write out a well-labeled factor diagram for these data. Also, write down the statistical
model, carefully defining the parameters in the model.
(b) Why would the researchers choose to treat day of the week as a block?
(c) Use SAS to analyze the data. Does the shelf height for the product affect the sales?
Does the blocking factor turn out to be an important source of variability? (As always,
include your code and use carefully-selected SAS output to justify your conclusions.)
(d) Now ignore the blocks and re-run the analysis as a BF[1] design. How do your conclusions
change? Why are the results different from the CB[1] analysis?
Section 7.3
74. #C1(a,b,c,e) on page 266
75. Consider the experiment described in Example 7.11 on page 261, with data given on the
bottom of page 281.
(a) The following is known about the analysis: mean of all observations = 21.25, SSplants =
483.75, SSdeblading = 24.5, SSinteraction = 265.75, SSresidual = 42.75, SStotal = 16194.
Using what you know about the design of the experiment and the information above,
give the complete ANOVA table for the data including appropriate F-statistics and
p-values. (You will want to use ‘1-pf(blah,blah,blah)’ in R to find the p-values.)
(b) Using the file auxin.csv on the webpage, run the analysis in SAS to check your work in
part (a). Include code and selectively-chosen parts of the SAS output. Discuss the results
of the experiment, including the significance or non-significance of each hypothesis test
of interest (i.e., discuss the test for each factor). What conclusions can be drawn about
the theories about the source of auxin and the role of leaf blades?
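For the p-values in part (a), the `1-pf(...)` call takes the F-statistic, the numerator degrees of freedom, and the denominator degrees of freedom, in that order. The sketch below uses made-up numbers purely to show the call; substitute the F-ratio and the degrees of freedom from your own ANOVA table.

```r
# Hypothetical values for illustration only (not taken from the auxin data)
f_stat <- 4.2
df_num <- 2   # numerator df: the factor being tested
df_den <- 12  # denominator df: the error term used in the F-ratio

p_value <- 1 - pf(f_stat, df_num, df_den)

# Equivalent upper-tail call, which keeps more precision for very small p-values
p_value_alt <- pf(f_stat, df_num, df_den, lower.tail = FALSE)

c(p_value, p_value_alt)
```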
Section 7.2
76. #B1 on pages 256-7
77. Now use the data from the previous problem (found in Figure 7.7 on page 254) to conduct
the formal analysis in SAS. You can either (1) type the data into a spreadsheet, save the file
as .csv format, and read in the data; or (2) type the data directly into a SAS data step, e.g.,
data cows;
input cow period diet $ yield;
cards;
1 1 roughage 608
1 2 partial 716
.
.
.
3 3 partial 832
;
run;
(a) Does diet have a significant effect on yield? Compare the means for the diets.
(b) Did the nuisance variables (cow and time period) have substantial impact on the yield?
Section 7.4
78. #D1 on pages 278-9
79. #D2 on pages 278-9
80. #D3 on pages 278-9
81. #D4 on pages 278-9
82. #D9 on pages 278-9 (this is a CB[1])
83. #D11 on pages 278-9 (this is a SP/RM[1;1]) NOTE: To complete this problem, assume that
the first column of the “Blocks” box reads “1, -2, ?, -1, 1, ?”
Chapter 11
84. In this problem, you will use SAS to do a complete analysis of variance on the head injury
severity scores associated with 7 types of cars. The data are found in the file headinjury.csv
(note that it is comma-delimited). For full credit, you must show all code.
(a) Give the name of the appropriate design for these data and write down the statistical
model, carefully defining the parameters in the model.
(b) Our primary interest is to see if the car types have different mean head-injury severity
scores. Write down the appropriate null and alternative hypotheses, carefully defining
all symbols.
(c) Give the ANOVA table and interpret the proper F-test for the hypotheses of interest.
(d) Assume that our primary interest is in constructing a confidence interval for each possible
pairwise comparison. If we want to ensure that the family-wise error rate is no greater
than 0.05, which multiple-comparison approach is most appropriate? Use your chosen
approach and interpret this set of pairwise comparisons—which means are significantly
different from each other? [Hint: for parts (d) and (e), you can use
means cartype / tukey;
means cartype / bon;
means cartype / scheffe;
in SAS and it will give you Least Significant Difference which is the confidence interval’s
margin of error (half the C.I. width). Alternatively, you can use
means cartype / tukey cldiff;
means cartype / bon cldiff;
means cartype / scheffe cldiff;
which gives all pairwise C.I.’s.]
(e) Compare the width of the interval for one of the pairwise comparisons—say, “µcompact −
µvan”—when using: (i) Tukey’s HSD, (ii) Scheffé, and (iii) Bonferroni. Based on the
width of the intervals, which is the best approach?
(f) Re-do the analysis, this time assuming that instead of looking at all the pairwise
comparisons, you only want to consider 3 different contrasts: (i) mean of the
trucks&vans&minivans minus the mean of the other 4 car types, (ii) mean of the
heavy&medium cars minus the mean of the light&compact cars, and (iii) mean of
minivans minus mean of compact cars. If we want to ensure that the family-wise error
rate is no greater than 0.05, which multiple-comparison approach is most appropriate?
Use your chosen approach and interpret the 3 contrasts described—which contrasts
are statistically significant? NOTE: When specifying contrasts, if you need to enter
−1/3 use “-0.33333333333” not “-0.33”. SAS needs contrasts to sum “exactly” to zero.
Alternatively, you can multiply every element of a contrast by a constant and the test
of the contrast will not be affected. That is, you change the contrast from (−1/3 2/3 −1/3)
to (−1 2 −1) with no change to the F-statistic and p-value for the contrast.
85. Use the analysis of the wear data that we did in class (code is in section11.sas). Instead of
the contrasts previously considered in section11.sas, suppose that we are interested in the
following two contrasts:
• (mean of fabric wear values in filler level 1 (cotton) and proportion level 2 (50% filled))
minus (mean of fabric wear values in filler level 2 (polyester))
• (mean of fabric wear values in filler level 1 and proportion level 1) minus (mean of fabric
wear values in filler level 1 and proportion level 2)
If we want to ensure that the family-wise error rate is no greater than 0.05, which multiple-comparison
approach is most appropriate? Use your chosen approach and interpret the 2
contrasts described—which contrasts are statistically significant?