Statistics 230 All Homework Assignments

Section 0

1. Computer output for analysis of a random sample of data is shown below. Some of the quantities are missing. Compute the values of the missing quantities:

Variable  N  Mean   SE of mean  Std. Dev.  Variance  Minimum  Maximum
Y         9  19.96  ?           3.12       ?         15.94    27.16

2. Consider the computer output shown below.

One-Sample T: Y
Test of mu = 91 vs. not = 91

Variable  N   Mean     Std. Dev.  SE of mean  95% CI        T     P
Y         25  92.5805  ?          0.4673      (91.6160, ?)  3.38  0.002

(a) Fill in the missing values in the output. Can the null hypothesis be rejected at the 0.05 level? Why?
(b) Is this a one-sided or a two-sided test?
(c) If the hypotheses had been H0: µ = 90 versus H1: µ ≠ 90, would you reject the null hypothesis at the 0.05 level?
(d) Use the output and the t table (or R) to find a 99 percent two-sided CI on the mean.
(e) What would the P-value be if the alternative hypothesis were H1: µ > 91?

3. Two machines are used for filling plastic bottles with a net volume of 16.0 ounces. The filling processes can be assumed to be normal, with standard deviations of σ1 = 0.015 and σ2 = 0.018. The quality engineering department suspects that both machines fill to the same net volume, whether or not this volume is 16.0 ounces. An experiment is performed by taking a random sample from the output of each machine. (Don’t forget that for parts (a)-(c) of this problem, we know σ1 and σ2.)

Machine 1: 16.03, 16.04, 16.05, 16.05, 16.02, 16.01, 15.96, 15.98, 16.02, 15.99
Machine 2: 16.02, 15.97, 15.96, 16.01, 15.99, 16.03, 16.04, 16.02, 16.01, 16.00

(a) State the hypotheses that should be tested in this experiment.
(b) Find the P-value for this test and test these hypotheses using α = 0.05. What are your conclusions?
(c) Find a 95 percent confidence interval on the difference in the mean fill volume for the two machines.
(d) Re-do part (b), but this time assume that σ1 = σ2 = σ and that the value of σ is unknown.

4. Photoresist is a light-sensitive material applied to semiconductor wafers so that the circuit pattern can be imaged onto the wafer. After application, the coated wafers are baked to remove the solvent in the photoresist mixture and to harden the resist. Here are measurements of photoresist thickness (in kA) for eight wafers baked at 95 °C and eight wafers baked at 100 °C. Assume that all 16 of the runs were made in random order.

95 °C: 11.176, 7.089, 8.097, 11.739, 11.291, 10.759, 6.467, 8.315
100 °C: 5.263, 6.748, 7.461, 7.015, 8.133, 7.418, 3.772, 8.963

Is there evidence to support the claim that the higher baking temperature results in wafers with a lower mean photoresist thickness? Use α = 0.05. Use the entire State-Plan-Solve-Conclude process. Write a short (one-paragraph) executive summary of your conclusions.

5. Data on a random variable Y were 12, 8, 14, 20, 26, 26, 20, 21, 18, 24, 30, 21, 18, 16, 10, and 20. Assuming this is a random sample from a normal distribution, test each of the following. Let α = 0.05.
(a) H0: µ = 12 versus H1: µ > 12, assuming that σ = 7
(b) H0: µ = 16 versus H1: µ ≠ 16, assuming that σ = 7
(c) H0: µ = 18 versus H1: µ > 18, assuming that the value of σ is unknown.

6. Pretest data for experimental and control groups on course content in a special vocational-industrial course indicated:

Experimental: ȳ1 = 9.333, s1 = 4.945, n1 = 12
Control: ȳ2 = 8.375, s2 = 1.187, n2 = 8

(a) Test the hypothesis of equal means using α = 0.05. Give a conclusion.
(b) Build a 95% confidence interval for µ1 − µ2. Interpret the interval.

7. Suppose two samples randomly selected from two independent normal populations give

n1 = 9, ȳ1 = 16.0, s1² = 5.0
n2 = 4, ȳ2 = 12.0, s2² = 3.0

(a) Is there enough evidence to claim that the mean of population 1 is greater than the mean of population 2? Give a clear conclusion. (Use α = 0.05.)
(b) Build a 90% confidence interval for µ1 − µ2. Interpret the interval.

8. Susan and Olivia both took an introductory statistics class; however, Susan attends University A and Olivia attends University B. The final exam for University A has µ = 50 and σ = 10, and Susan scored 62 points. The final exam for University B has µ = 1500 and σ = 25, and Olivia scored 1540 points. We want to know who understands statistics better by comparing Susan’s and Olivia’s final exam scores. Assuming the student body at each university is comparable, who performed better on the final exam? Explain.

9. Assume we are looking at University A’s final exam (from the previous problem).
(a) With the information given, are you able to calculate the probability of a randomly selected student scoring higher than 60 points?
(b) What would you have to assume about the distribution of the exam scores in order to answer part (a)? Make your assumption(s) and calculate your answer.
(c) With the information given, are you able to calculate the probability that a randomly selected group of 10 students will have a mean score above 60 points?
(d) What would you have to assume about the distribution of the exam scores in order to answer part (c)? Make your assumption(s) and calculate your answer.
(e) With the information given, are you able to calculate the probability that a randomly selected group of 100 students will have a mean score above 60 points?
(f) What would you have to assume about the distribution of the exam scores in order to answer part (e)? Make your assumption(s) and calculate your answer.

10. Suppose we are testing patients for cancer. Our null hypothesis is that the patient is healthy (i.e., does NOT have cancer).
(a) Describe a type I error.
(b) Describe a type II error.
(c) Would it be more important to minimize α or β in this case? Explain.

11. We are interested in µ = the mean age of current graduate students at BYU. A previous study (from 2008) estimated the mean to be 25 with a standard deviation of 3.
(a) What sample size do we need so that we can construct a 99% confidence interval estimating µ that has a margin of error equal to 2 years?
(b) Consider a hypothesis test of H0: µ = 25 years vs. Ha: µ > 25 years using α = 0.01. What sample size do we need in order to detect a difference of 2 years with 80% power?

12. Suppose we are trying to find a faster-drying glue. In a study done years ago, Brand A and Brand B were tested 8 times each (on a total of 16 identical surfaces) and the drying times were:

         1      2      3      4      5      6      7      8
Brand A  11.56  9.16   10.81  11.35  6.04   8.60   9.72   12.26
Brand B  16.41  18.52  14.13  12.49  16.29  14.54  14.60  15.79

Now suppose Brand B has come out with a new and improved version, and we want to collect a new sample of Brand A and the new Brand B drying times. We believe the drying times will be more similar and want to be able to compare mean drying times. To estimate the variability for our new study, we will use the pooled sample variance from the old study as an estimate for σ².
(a) What sample size do we need so that we can construct a 95% confidence interval estimating µB − µA that has a margin of error equal to 1 minute?
(b) Consider a hypothesis test of H0: µB = µA vs. Ha: µB > µA, using α = 0.05. What sample size do we need in order to detect a difference of 1 minute with 90% power?

13. An article in the Journal of Strain Analysis compares several procedures for predicting the shear strength for steel plate girders. Data for nine girders in the form of the ratio of predicted to observed load for two of these procedures, the Karlsruhe and Lehigh methods, are as follows:

Girder            1      2      3      4      5      6      7      8      9
Karlsruhe Method  1.186  1.151  1.322  1.339  1.200  1.402  1.365  1.537  1.559
Lehigh Method     1.061  0.992  1.063  1.062  1.065  1.178  1.037  1.086  1.052

(a) Using α = 0.05, is there evidence to support a claim that there is a difference in mean performance between the two methods? As part of your answer, report the p-value (using R to compute it) and then interpret the p-value in the context of the problem.
(b) Construct a 95% confidence interval for the difference in mean predicted to observed load.

14. The Center for the Study of Violence wants to determine whether a conflict-resolution program in a particular high school alters aggressive behavior among its students. For 10 students, aggression was measured both before and after they participated in the conflict-resolution course. Their scores were the following (higher scores indicate greater aggressiveness):

                      1   2  3  4  5  6  7  8  9  10
Before participating  10  3  4  8  8  9  5  7  1  7
After participating   8   4  2  5  7  8  4  5  2  5

(a) Test the null hypothesis that aggression does not differ as a result of participating in the conflict-resolution program. Show your work.
(b) Re-evaluate this problem using the two-sample t-test method (i.e., ignore the pairing in the data). As always, provide the test statistic and exact p-value.
(c) Create a confidence interval for µd using the paired-comparison confidence interval

    $\bar{d} \pm t_{\alpha/2,\,n-1} \, \dfrac{s_d}{\sqrt{n}}$

and compare with the two-sample confidence interval for µ1 − µ2

    $\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\,n_1+n_2-2} \sqrt{s_{pl}^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}$.

How do the centers of the two intervals compare? How do the widths of the two intervals compare? Why are the two intervals different?
(d) If you were to rerun this experiment to test the effectiveness of the treatment, would you use a two-sample t-test or a paired-comparison test? Explain your reasoning.

Section 1.1

15. #1.3 on p. 34-36
16. #1.4 on p. 34-36. For part (c), assume that the measurements are on a collection of sampled units. For part (h), let the statement begin: “For a SIMPLE random sample,...”
17. #1.5 on p. 34-36
18. #1.6 parts (a) and (b) on p. 34-36
19. #1.8 on p. 34-36

Chapter 4

20. #A3 on p. 109
21. #A5 on p. 109
22. #A6 on p. 109
23. #A8 on p. 109
24. #A9 on p. 109
25. #B1 on p. 116
26. #B4 on p. 116
27. #B8 on p. 117
28. #B10 on p. 117
29. #B11 on p. 117
30. #B14 on p. 118
31. #C2 on p. 124
32. #C3 on p. 124

33. Read the study described below and answer the questions.
(a) Which of the following best describes the study above? (choose one)
    i. designed experiment in which experimental units are randomly sampled from the population of interest
    ii. designed experiment using available experimental units
    iii. observational study in which samples are randomly selected from preexisting distinct groups
    iv. observational study using nonrandom sample
(b) Can you conclude that arguing with parents protects children from (or causes decreased susceptibility to) drugs and alcohol? Explain. If causation cannot be concluded, how could the study be changed to make causation a plausible conclusion?

34. Read the study described below and answer the questions.
(a) Which of the following best describes the study above? (choose one)
    i. designed experiment in which experimental units are randomly sampled from the population of interest
    ii. designed experiment using available experimental units
    iii. observational study in which samples are randomly selected from preexisting distinct groups
    iv. observational study using nonrandom sample from preexisting distinct groups
(b) Can you conclude that boosting a woman’s confidence improves her spatial reasoning abilities? Explain. If causation cannot be concluded, how could the study be changed to make causation a plausible conclusion?

Sections 1.2-1.3

35. #1.14 on p. 36-37

36. Consider the spatial reasoning study described in problem number 34, where subjects received feedback (either criticism or compliments) after their performance on an unrelated pre-task. Suppose that the mean spatial rotation scores for the groups of interest were as follows:
• mean score for men that were criticized after the pre-task = 85%
• mean score for men that were complimented after the pre-task = 87%
• mean score for women that were criticized after the pre-task = 70%
• mean score for women that were complimented after the pre-task = X
Given what you know about the study and that the study concluded that there was a significant interaction between gender and pre-task feedback type, which of the following values for X is most reasonable: 65%, 70%, 72%, or 87%? Explain how you chose your answer over the other options.

37. Suppose that an experiment is run comparing the final exam grades of Stat 230 students. Two factors are considered: (i) lecture time (either morning or afternoon) and (ii) major (stat or non-stat). Suppose that the sample size is large enough so that a difference of at least 5% on the final exam would be a significant difference across lecture times or across majors. Further, suppose that the number of students in each of the four treatment groups is equal and that the mean final exam score for the morning section stat majors was 83%. For each problem below, create a table formatted as follows (the numbers in italics will be filled in by you):

                        Major
Lecture Time  stat   non-stat  overall
morning       83     85        84
afternoon     78     72        75
overall       80.5   78.5

(a) What might the means for the other three groups be IF morning did significantly better than afternoon, stat did significantly better than non-stat, and there was NO evidence of a lecture time × major interaction?
(b) What might the means for the other three groups be IF morning did significantly better than afternoon, stat did significantly better than non-stat, and there was strong evidence of a lecture time × major interaction?
(c) What might the means for the other three groups be IF morning did significantly better than afternoon, stat and non-stat were equivalent, and there was strong evidence of a lecture time × major interaction?
(d) What might the means for the other three groups be IF morning and afternoon were equivalent, stat and non-stat were equivalent, and there was strong evidence of a lecture time × major interaction?

Chapter 3

Note that problems 6-10 on p. 103 are based on the introductory paragraph labeled “The bivariate BF[1] model.”

38. #6 on p. 103
39. Fill in the blanks: The estimated effect for long days tells how far it is from ____ to ____. The residual for the first observation tells how far it is from ____ to ____.
40. #8 on p. 103. Also give a p-value for the day length factor and give a conclusion.
41. #9 on p. 103
42. #10 on p. 103
43. #17 on p. 104
44. #23 on p. 106
45. #24 on p. 106. For false statements, re-write the statement so that it is true (changing as few words as possible...changing a false statement to “Snow is colder than molten lava” will not be given points...nice try, though).
46. #26 on p. 106
47. Consider the popcorn data on page 3.8 of the lecture notes, with the complete ANOVA table on page 3.29 of the lecture notes on the course webpage. Give a 95% C.I. for each of the following differences in means:
(a) µ_{high salt} − µ_{low salt}
(b) µ_{buttery oil} − µ_{canola oil}
(c) µ_{high salt with buttery} − µ_{low salt with canola}
(d) Calculate the width of each of the intervals in (a), (b), and (c). (Calculate the upper confidence limit minus the lower confidence limit.) Why is the width in (c) different from the widths in (a) and (b)?

Chapter 5

NOTE: For all HW problems requiring statistical computing (e.g., R or SAS), I expect type-written responses. Make sure that you paste in your code and the appropriate sections of program output in addition to your type-written conclusions. DO NOT simply attach pages of computer output. Cut and paste only parts you refer to in your discussion. Large stacks of computer output will NOT be graded. Also, working in groups is fine, but each student should write his or her own interpretations/conclusions. Identical HW assignments will be treated as plagiarism.

48. If data came from a normal distribution, what fraction of the data will be classified as outliers when using the “Tukey” boxplot in R? Show your work.

49. Read the cancer.txt data set into R. (The data set is on the course webpage and there is code that you can cannibalize in section5.R.) The column names are given in the first row of the file. Suppose we are interested in seeing if the mean survival time in days is the same for each cancer type. (We’re NOT doing the ANOVA here yet, just checking conditions with exploratory data analysis.)
(a) Use means, sds, and boxplots to evaluate whether or not these data are appropriate for an ANOVA. Specifically, are there outliers, unequal sds across cancer type, or evidence of non-normality?
(b) Repeat part (a) after taking a log transform of the survival times, e.g.:

    logsurv <- log(cancer$days)

(c) Compare your answers in parts (a) and (b) and make a recommendation for analysis.

50. Use the R command below to obtain a randomized ordering for 36 subjects that will be assigned to one of four treatments (A, B, C, and D).

    sample(1:36, 36, replace=FALSE)

Give your randomized list and explain how you would use this list to assign treatments to the 36 subjects.

51. Suppose that a veterinary psychologist runs a balanced BF[1] experiment to study the effect of diet on depression in dogs. She uses a collection of 15 labrador retrievers that have been diagnosed with severe depression (e.g., listless, apathetic, no interest when live squirrels are in the room). She places each dog on one of 3 experimental diets (all Cheetos, all steak, all tofu) for 3 months and then records the depression score for each at the end of the study, where high depression scores indicate more extreme depression. The mean depression score for each group was: Cheetos=27, Steak=22, and Tofu=11. Tofu is amazing! (Note: these data were made up by your instructor.) Create the factor diagram (aka “decomposition tables”) for the data, with diets as columns. Fill in the locations in each table with their known values, leaving a “?” at each location where you don’t have enough information to specify the value. Properly label your diagram and write the df under each table/box.

52. #D8 on p. 178
53. #D9 on p. 178
54. #D10 on p. 178 (Note the typo: “Cond. avg.” should say “Cond. eff.”)
55. #D14 on p. 179 (If you want, you can generate random numbers in R and check some of these properties yourself. For example, to generate 100 random numbers from a standard normal distribution, use: x <- rnorm(100).)
56. #D21 on p. 180. Instead of calculating critical values, use your F-statistic for “Conditions” to calculate a p-value using R.
57. #D22 on p. 181
58. #D24 on p. 181. Include an F-statistic and a p-value for the Group factor.

59. USING R, do a complete analysis of variance comparing survival times for the cancer types discussed in problem 49. Remember that you will want to compare the log of survival time (see problem 49(b)). For full credit, you must show all code and the appropriate output.
(a) Calculate and list the mean log-survival time in days for each cancer type.
(b) Test the hypothesis H0: All cancer types have the same mean log-survival time. Give the ANOVA table. Interpret the F statistic and p-value and then make a conclusion.

60. USING SAS, do a complete analysis of variance comparing mean log-survival times for the cancer types discussed in problem 49. For full credit, you must show all code. To do the log transform in SAS, adapt the following code (changing MYDIR to your personal Stat 230 directory name):

    data cancer;
      infile 'MYDIR\cancer.txt' firstobs=2;
      input type $ gender $ age days;
      logsurv = log(days);
    run;

OR, you could use the following:

    data cancer;
      input type $ gender $ age days;
      logsurv = log(days);
      datalines;
    [PASTE THE CONTENTS OF CANCER.TXT HERE]
    ;
    run;

(a) Calculate and list the mean log-survival time in days for each cancer type.
(b) Test the hypothesis H0: All cancer types have the same mean log-survival time. Give the ANOVA table. (Since your interpretation should be the same here as in the previous problem where you used R, there is no need to re-write the same interpretation.)
(c) Use the group means or difference in means from the output, along with the MSE from the ANOVA table, to calculate (by hand) the confidence interval for µ_{kidney} − µ_{stomach}. (The formula is written on the last page of the section3 lecture notes posted on the webpage.)

61. We are interested in comparing 4 different methods for preparing for the ACT exam:
• Method A: Control (just take the exam)
• Method B: Take one practice exam
• Method C: Take a prep course online
• Method D: Be hypnotized the day before
You are interested in assessing the power of the F test (in ANOVA) for detecting differences in preparation method means when the significance level is α = 0.05.
(a) Suppose that ACT scores have a standard deviation of 4.7, and suppose we would like to evaluate the possibility that the group means are µA = 21, µB = 23, µC = 25, and µD = 27. In R, make a plot that shows the power of the F test when n = 2, 3, ..., 20. (Print and include this plot with your homework.)
(b) What is the smallest value for the group size (n) that gives 85% power?
(c) What would happen to your power curve if your hypothesized means were µA = 27, µB = 25, µC = 23, and µD = 21? Explain your answer at the level of a Stat 121 student.
(d) What would happen to your power curve if your hypothesized means were µA = 21, µB = 21, µC = 21, and µD = 27? Explain your answer at the level of a Stat 121 student.
(e) What would happen to your power curve if your hypothesized means were µA = 21, µB = 21, µC = 27, and µD = 27? Explain your answer at the level of a Stat 121 student.

Chapter 6

62. #A1 on p. 207. Note: There is a typo in this problem. Where it reads “Draw and label a two-way table showing the two TREATMENTS and...” it should read “Draw and label a two-way table showing the two FACTORS and...”
63. #B3 on p. 214
64. #B5 on p. 215
65. #B6 on p. 215

66. Use the file snapbean.txt (on the webpage) to conduct a two-way ANOVA in R. This experiment endeavors to evaluate whether the date of sowing and/or the variety of snapbean plant will affect the total yield of snapbeans. You should test if “Yield” (the response) is affected by “sowdate” (1=early,...,4=late), “variety” (1, 2, or 3), or the interaction of sowdate with variety. Conduct your analysis in R. Give code and appropriate output. For each of the following effects, write a sentence which gives an appropriate conclusion (including references to the p-value and the hypotheses of interest):
(a) sowdate
(b) variety
(c) interaction

67. Use the file programmers.txt (on the webpage) to conduct a two-way ANOVA in SAS. This experiment was run to see how the type of experience of computer programmers and/or the years of experience for programmers impacts their ability to accurately estimate the time needed (in programmer days) to complete a large systems project. The response variable “TimePredictionError” represents the difference between the actual time required to complete a large systems project and the programmer’s estimated time. Note that all values are positive, meaning that every subject underestimated the length of the task, but larger values represent larger time-prediction errors. You should test if “TimePredictionError” (the response) is affected by “LargeSystemExp” (no=experienced only with small systems, yes=experienced with large systems), “YearsOfExp” (less5 = less than 5 years, less10 = between 5 and 10 years, more10 = more than 10 years), or the interaction of LargeSystemExp with YearsOfExp. Conduct your analysis in SAS. Give code and appropriate output. For each of the following effects, write a sentence which gives an appropriate conclusion (including references to the p-value and the hypotheses of interest):
(a) LargeSystemExp
(b) YearsOfExp
(c) interaction

68. USING SAS, do a complete analysis of variance on heights of singers in a choir, found in the file singerheights.csv (note that it is comma-delimited). For full credit, you must show all code. Fit an ANOVA model that includes terms for gender (“f” or “m”), singing part (“low” or “high”), and the interaction between gender and part. (Note that the low part for females is generally called alto, high part for females is soprano, low part for males is bass, and high part for males is tenor. However, we are interested in the association between singing the high/low part and height, so we are treating this as a 2 × 2 factorial instead of a one-way ANOVA with 4 levels of “singing part.”)
(a) Find the complete ANOVA table USING TYPE I SS. Carry out the complete analysis considering the decision flow diagram discussed in class for two-way ANOVA. Give a complete interpretation for each of the terms in the model.
(b) Find the complete ANOVA table USING TYPE III SS. Carry out the complete analysis considering the decision flow diagram discussed in class for two-way ANOVA. Give a complete interpretation for each of the terms in the model.
(c) Why is the SS for gender so much smaller with Type III SS? Explain.

69. For this problem, you will conduct an analysis of the BF[2] data you gathered in your catapult experiment.
(a) Give the ANOVA table (using either R or SAS). Give your code and your output.
(b) List the three null hypotheses for your experiment. For each hypothesis, write a conclusion for your test of the hypothesis. (Make sure your statement for each hypothesis references the p-value and clearly states the conclusion in context.)

Section 7.1

70. #A1 on pages 250-1
71. #A2 on pages 250-1
72. #A3 on pages 250-1

73. Use the file marketing.txt on the website. The first column is sales of a product of interest (in dollars), the second column is the shelf height factor (shelf height for the product being sold), and the third column is day of week (the blocking factor). On each day, the researcher in this study randomly assigned a product of interest to a location on a five-level store shelf and then recorded the total sales for each shelf at the end of the day. Our primary interest is to see if the shelf heights have different mean sales.
(a) Write out a well-labeled factor diagram for these data. Also, write down the statistical model, carefully defining the parameters in the model.
(b) Why would the researchers choose to treat day of the week as a block?
(c) Use SAS to analyze the data. Does the shelf height for the product affect the sales? Does the blocking factor turn out to be an important source of variability? (As always, include your code and use carefully selected SAS output to justify your conclusions.)
(d) Now ignore the blocks and re-run the analysis as a BF[1] design. How do your conclusions change? Why are the results different from the CB[1] analysis?

Section 7.3

74. #C1(a,b,c,e) on page 266

75. Consider the experiment described in Example 7.11 on page 261, with data given on the bottom of page 281.
(a) The following is known about the analysis: mean of all observations = 21.25, SSplants = 483.75, SSdeblading = 24.5, SSinteraction = 265.75, SSresidual = 42.75, SStotal = 16194. Using what you know about the design of the experiment and the information above, give the complete ANOVA table for the data including appropriate F-statistics and p-values. (You will want to use ‘1-pf(blah,blah,blah)’ in R to find the p-values.)
(b) Using the file auxin.csv on the webpage, run the analysis in SAS to check your work in part (a). Include code and selectively chosen parts of the SAS output. Discuss the results of the experiment, including the significance or non-significance of each hypothesis test of interest (i.e., discuss the test for each factor). What conclusions can be drawn about the theories about the source of auxin and the role of leaf blades?

Section 7.2

76. #B1 on pages 256-7

77. Now use the data from the previous problem (found in Figure 7.7 on page 254) to conduct the formal analysis in SAS. You can either (1) type the data into a spreadsheet, save the file as .csv format, and read in the data; or (2) type the data directly into a SAS data step, e.g.,

    data cows;
      input cow period diet $ yield;
      cards;
    1 1 roughage 608
    1 2 partial 716
    . . .
    3 3 partial 832
    ;
    run;

(a) Does diet have a significant effect on yield? Compare the means for the diets.
(b) Did the nuisance variables (cow and time period) have substantial impact on the yield?

Section 7.4

78. #D1 on pages 278-9
79. #D2 on pages 278-9
80. #D3 on pages 278-9
81. #D4 on pages 278-9
82. #D9 on pages 278-9 (this is a CB[1])
83. #D11 on pages 278-9 (this is a SP/RM[1;1]) NOTE: To complete this problem, assume that the first column of the “Blocks” box reads “1, -2, ?, -1, 1, ?”

Chapter 11

84. In this problem, you will use SAS to do a complete analysis of variance on the head injury severity scores associated with 7 types of cars. The data are found in the file headinjury.csv (note that it is comma-delimited). For full credit, you must show all code.
(a) Give the name of the appropriate design for these data and write down the statistical model, carefully defining the parameters in the model.
(b) Our primary interest is to see if the car types have different mean head-injury severity scores. Write down the appropriate null and alternative hypotheses, carefully defining all symbols.
(c) Give the ANOVA table and interpret the proper F-test for the hypotheses of interest.
(d) Assume that our primary interest is in constructing a confidence interval for each possible pairwise comparison. If we want to ensure that the family-wise error rate is no greater than 0.05, which multiple-comparison approach is most appropriate? Use your chosen approach and interpret this set of pairwise comparisons: which means are significantly different from each other?

[Hint: for parts (d) and (e), you can use

    means cartype / tukey;
    means cartype / bon;
    means cartype / scheffe;

in SAS and it will give you the Least Significant Difference, which is the confidence interval’s margin of error (half the C.I. width). Alternatively, you can use

    means cartype / tukey cldiff;
    means cartype / bon cldiff;
    means cartype / scheffe cldiff;

which gives all pairwise C.I.’s.]

(e) Compare the width of the interval for one of the pairwise comparisons (say, “µ_{compact} − µ_{van}”) when using: (i) Tukey’s HSD, (ii) Scheffé, and (iii) Bonferroni. Based on the width of the intervals, which is the best approach?
(f) Re-do the analysis, this time assuming that instead of looking at all the pairwise comparisons, you only want to consider 3 different contrasts: (i) mean of the trucks&vans&minivans minus the mean of the other 4 car types, (ii) mean of the heavy&medium cars minus the mean of the light&compact cars, and (iii) mean of minivans minus mean of compact cars. If we want to ensure that the family-wise error rate is no greater than 0.05, which multiple-comparison approach is most appropriate? Use your chosen approach and interpret the 3 contrasts described: which contrasts are statistically significant?

NOTE: When specifying contrasts, if you need to enter −1/3, use “-0.33333333333”, not “-0.33”. SAS needs contrasts to sum “exactly” to zero. Alternatively, you can multiply every element of a contrast by a constant and the test of the contrast will not be affected. That is, you can change the contrast from (−1/3, 2/3, −1/3) to (−1, 2, −1) with no change to the F-statistic and p-value for the contrast.

85. Use the analysis of the wear data that we did in class (code is in section11.sas). Instead of the contrasts previously considered in section11.sas, suppose that we are interested in the following two contrasts:
• (mean of fabric wear values in filler level 1 (cotton) and proportion level 2 (50% filled)) minus (mean of fabric wear values in filler level 2 (polyester))
• (mean of fabric wear values in filler level 1 and proportion level 1) minus (mean of fabric wear values in filler level 1 and proportion level 2)
If we want to ensure that the family-wise error rate is no greater than 0.05, which multiple-comparison approach is most appropriate? Use your chosen approach and interpret the 2 contrasts described: which contrasts are statistically significant?
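
The remark in problem 84 that a contrast can be multiplied by a constant without changing its test can be checked numerically. The following is a minimal R sketch and not part of the assignment: the data and the names y, g, and contrast_F are made up for illustration, and the single-degree-of-freedom F-statistic is computed by hand as (Σ c_i ȳ_i)² / (MSE · Σ c_i²/n_i) rather than taken from SAS output.

```r
# Hypothetical data for illustration only (NOT from any course file):
# three groups with four observations each.
y <- c(10, 12, 11, 13, 15, 14, 16, 17, 9, 8, 10, 11)
g <- factor(rep(c("A", "B", "C"), each = 4))

fit   <- aov(y ~ g)                      # one-way ANOVA fit
df_e  <- df.residual(fit)                # error degrees of freedom
mse   <- sum(residuals(fit)^2) / df_e    # mean squared error
means <- tapply(y, g, mean)              # group means
n     <- tapply(y, g, length)            # group sizes

# F-statistic (1 numerator df) for the contrast with coefficients cvec
contrast_F <- function(cvec) {
  stopifnot(abs(sum(cvec)) < 1e-12)      # coefficients must sum to zero
  estimate <- sum(cvec * means)
  estimate^2 / (mse * sum(cvec^2 / n))
}

F_frac <- contrast_F(c(-1/3, 2/3, -1/3)) # fractional coefficients
F_int  <- contrast_F(c(-1, 2, -1))       # same contrast multiplied by 3

p_frac <- 1 - pf(F_frac, 1, df_e)
p_int  <- 1 - pf(F_int, 1, df_e)

all.equal(F_frac, F_int)                 # TRUE up to rounding error
all.equal(p_frac, p_int)                 # TRUE up to rounding error
```

Multiplying every coefficient by a constant k multiplies the squared estimate and the variance term by the same factor k², so the ratio, and therefore the p-value, is unchanged.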