ST 311 Evening Problem Session Solutions Week 13 Midterm 2 Review

1. p. 326, Question 18 [Learning Objectives G1, G6]
A dog food company wants to compare a new lower-calorie food with their standard dog food to see if it's effective in helping inactive dogs maintain a healthy weight. They have found several dog owners willing to participate in the trial. The dogs have been classified as small, medium, or large breeds, and the company will randomly select the owners of each size dog to receive one of the two foods. The owners have agreed not to feed their dogs anything else for a period of 6 months, after which the dogs' weights will be checked. This study can best be described as:
(a) Observational study
(b) Completely randomized design
(c) Matched pairs
(d) Blocked experiment

Solution
This is an example of a blocked experiment. The dogs are grouped (blocked) by breed size, and the two foods are randomly assigned within each size group.

2. Was a control group used? Was a placebo used? [Learning Objective G5]

Solution
A control group was used; it was the dogs that ate the standard dog food. You can also think of the standard dog food as a placebo, since it was like the treatment (lower-calorie food) but lacked the active ingredient the company thought would make the treatment effective (i.e. fewer calories).

3. What is the advantage of using this type of design? [Learning Objective G7]

Solution
It allows us to account for any effect we would expect the size of the dog (small, medium, or large) to have on the relationship between diet and weight. For example, we might expect that smaller dogs are less likely to lose weight regardless of the type of food they eat, which might unfairly make the lower-calorie food look less effective at controlling weight.

4. p. 326, Question 8 [Learning Objectives G1, G6]
Among a group of disabled women aged 65 and older who were tracked for several years, those who had a vitamin B12 deficiency were twice as likely to suffer severe depression as those who did not.
This study can best be described as:
(a) Observational study
(b) Completely randomized design
(c) Matched pairs
(d) Blocked experiment

Solution
This is an example of an observational study.

5. Was a control group used? Was a placebo used? [Learning Objective G5]

Solution
A control group was used but a placebo was not. The control group was the women who did not have a B12 deficiency. Note that the researchers had no influence over who was in the treatment (B12 deficiency) group and who was in the control group, since this was an observational study.

6. p. 326, Question 14 [Learning Objectives G1, G6]
Scientists at a major pharmaceutical firm investigated the effectiveness of an herbal compound to treat the common cold. They exposed each subject to a cold virus, then randomly assigned the subjects to either the herbal compound or a sugar solution known to have no effect on colds. Several days later they assessed each patient's condition, using a cold severity scale ranging from 0 to 5. They found no evidence of benefits associated with the compound. This study can best be described as:
(a) Observational study
(b) Completely randomized design
(c) Matched pairs
(d) Blocked experiment

Solution
This is an example of a completely randomized design.

7. Was a control group used? Was a placebo used? [Learning Objective G5]

Solution
A control group and a placebo were used. The control group was the subjects given the sugar solution, which was the placebo.

8. p. 702, Question 2 [Learning Objectives J1, J3, J11-J13, J19-J21]
The European School Study Project on Alcohol and Other Drugs, published in 1995, investigated the use of marijuana and other drugs. Data from 11 countries are summarized in the following scatterplot and regression analysis.
They show the association between the percentage of a country's ninth graders who report having smoked marijuana and the percentage who have used other drugs such as LSD, amphetamines, and cocaine.

R-Squared: 0.873
Reg Equation: Other Drugs = −3.0678 + 0.6150·Marijuana

Table 1: Parameter Estimates
Parameter   Estimate   Std. Err.   Alternative   DF   T-Stat    P-Value
Intercept   −3.0678    2.204       ≠ 0           9    −1.39     0.1974
Slope       0.6150     0.0784      ≠ 0           9    xxxxxxx   xxxxxxx

(a) What is the response variable?
(b) What is the explanatory variable?
(c) What is the correlation coefficient?
(d) Interpret R-Squared in the context of the problem.
(e) Examine the scatterplot of this data. Do the necessary conditions for appropriate inference using a hypothesis test about the slope appear to be met?
(f) Find the appropriate T-statistic and corresponding p-value to test if the slope of the regression line is equal to 0. Use a significance level of α = 0.05.
(g) In the previous part, why did we test that the slope was zero?
(h) In another country, 30% of ninth graders reported using marijuana and 10% of ninth graders reported using other drugs. What is the residual value for this country?

Solutions
(a) The response variable is the percent of ninth graders using other drugs.
(b) The explanatory variable is the percent of ninth graders using marijuana.
(c) The correlation coefficient is $r = \sqrt{0.873} = 0.934$. Note that we use the positive square root here because the slope coefficient is positive.
(d) Approximately 87.3% of the variation in the percent of ninth graders using other drugs can be explained by the linear relationship with the percent of ninth graders using marijuana.
(e) The conditions are: we have a random sample, a straight line is an appropriate model for the data, and the points have the same standard deviation around the line for all values of X. Looking at the scatterplot, we see that the points do seem to follow a straight-line pattern (not a curve). We can also see that the standard deviation is relatively consistent for all values of X (i.e. the points do not seem to get closer to or farther from the line as X increases). Note: we can't check the random sample condition using the scatterplot, and we are not explicitly told it is true in the problem; thus we are left to assume the sample was randomly selected, or is at the very least representative of the population.
(f) The T-statistic can be calculated using values from the table:

$$T = \frac{0.6150 - 0}{0.0784} = 7.84$$

Then, looking at the T-table, the p-value associated with T = 7.84 and 9 degrees of freedom is p < 0.01. So at a significance level of α = 0.05 we reject the null hypothesis. There is significant evidence to suggest that there is a linear relationship between marijuana use in the 9th grade and other drug use.
(g) Because this is the logical way to test whether there is no relationship between the two variables. If there is no relationship, knowing the percent of 9th graders who have smoked marijuana (X) would not give us any additional information about the percent who have used other drugs (Y), meaning that the values of Y would not increase with X but would instead stay flat as X changes. This corresponds to a slope of zero.
(h) We first need to identify the predicted value from our regression equation. We expect that a country in which 30% of ninth graders use marijuana will have Other Drugs = −3.0678 + 0.6150(30) = 15.382. Then, to find the residual value, we calculate the observed minus the predicted value, so Residual = 10 − 15.382 = −5.382.

9. p. 496, Question 18 [Learning Objectives H3, H7-H9]
According to the Association of American Medical Colleges, only 46% of medical school applicants were admitted to a medical school in the fall of 2006. Upon hearing this, the trustees of Striving College expressed concern that only 77 of the 180 students in their class of 2006 who applied to medical school were admitted.
The college president would like to see if the school's success rate is lower than the national average.
(a) Is this question about means or proportions?
(b) Identify the null and alternative hypotheses.
(c) Is the null distribution a t-statistic or z-statistic, and how do you know?
(d) Calculate the test statistic and find the p-value.
(e) Using a significance level of α = 0.05, what is your conclusion? State your conclusion in terms of your hypotheses, and interpret your results in the context of this problem.

Solutions
(a) This is a question about proportions. We know this because the data we record for each student is a yes or no response about whether or not they got into medical school. So this is a categorical variable, and with categorical variables, we use proportions.
(b) $H_0: p = 0.46$, $H_A: p < 0.46$
(c) The null distribution for proportions is a Z distribution.
(d) To begin with, we need to calculate $\hat{p}$ for our sample from Striving College. In this example $\hat{p} = 77/180 \approx 0.428$. The test statistic is calculated as follows:

$$Z = \frac{0.428 - 0.46}{\sqrt{\frac{0.46 \cdot 0.54}{180}}} = \frac{-0.032}{0.037} = -0.87$$

Then, looking at the Z-table, we see that P(Z < −0.87) = 0.1922 is our p-value.
(e) Since the p-value > 0.05, at a significance level of α = 0.05 we fail to reject the null hypothesis. Therefore there is not significant evidence to suggest that the Striving College acceptance rate is lower than the national average.

10. p. 629, Question 24 [Learning Objectives H13]
A tire manufacturer tested the braking performance of one of its tire models on a test track. The company tried the tires on 10 different cars, recording the stopping distance for each car on both wet and dry pavement. The company is interested in knowing if cars stop in a shorter distance on dry pavement than on wet pavement. Results are shown in the table.

Table 2: Stopping Distances
Car          1     2     3     4     5     6     7     8     9     10    Mean    St. Dev
Dry          145   152   141   143   131   148   126   140   135   133   139.4   8.10
Wet          211   191   220   207   198   208   206   177   183   223   202.4   15.07
Difference   -66   -39   -79   -64   -67   -60   -80   -37   -48   -90   -63     17.59

(a) Is this question about means or proportions?
(b) Identify the null and alternative hypotheses.
(c) Is the null distribution a t-statistic or z-statistic, and how do you know?
(d) Calculate the test statistic and find the p-value.
(e) Using a significance level of α = 0.05, what is your conclusion? State your conclusion in terms of your hypotheses, and interpret your results in the context of this problem.

Solutions
(a) This is a question about means. We know this because the data we are recording is a numerical value for each trial.
(b) $H_0: \mu_D = 0$, $H_A: \mu_D < 0$, where $\mu_D$ is the mean of the differences (dry minus wet).
(c) The null distribution for means is a T distribution.
(d) This is a paired differences t-test. Our T statistic is calculated as follows:

$$T = \frac{-63 - 0}{17.59 / \sqrt{10}} = -11.326$$

Then, looking at the t-table for 9 degrees of freedom, we see that the one-tail p-value for a t-statistic of −11.326 is p-value < 0.005.
(e) Since the p-value < 0.05, at a significance level of α = 0.05 we reject the null hypothesis. Therefore there is significant evidence to suggest that the stopping distance on dry pavement is shorter than the stopping distance on wet pavement.

11. We are interested in testing whether NC State students average the recommended 8 hours of sleep per night. It is expected that the distribution of sleep times is greatly skewed left. We randomly select 10 students, and ask the number of hours they slept the previous night. The mean response is 7 hours with a standard deviation of 1.2 hours. [Learning Objectives F10, F17, H6, H11]
(a) If appropriate, conduct a hypothesis test at the .05 level; if not, explain why not.
(b) We repeat the sleep study, but this time randomly select 100 students.
We find a 95% confidence interval for the average hours of sleep the previous night is 6.75 to 7.25 hours. If appropriate, interpret this confidence interval; if not, explain why not.
(c) Consider the hypotheses $H_0: \mu = 8$, $H_A: \mu \neq 8$. Based on the confidence interval above, would we reject the null hypothesis at a 0.05 significance level?

Solutions
(a) Not appropriate, because the population is skewed and our sample size is only 10.
(b) It is appropriate, since we have more than 30 students. The interpretation is: we are 95% confident that NC State students get an average of between 6.75 and 7.25 hours of sleep per night.
(c) Yes, we would reject, since 8 is not in the confidence interval.

12. We test the hypothesis that NC State students sleep 8 hours per night ($H_0: \mu = 8$, $H_A: \mu \neq 8$). In a random sample of 50 students, the average is 7 hours of sleep per night. We get a test statistic of -2.17 and a p-value of 0.035. Which are correct interpretations of a hypothesis test or p-value? If inappropriate, explain why. [Learning Objective H9]
(a) This means the probability NC State students sleep an average 8 hours per night is .035.
(b) At the .05 level, we do not reject the null hypothesis, and cannot conclude NC State students sleep different than 8 hours per night on average.
(c) At the .05 level, we reject the null hypothesis, and conclude there is evidence NC State students sleep different than 8 hours per night on average. In addition, the mean hours of sleep observed is 7 hours, so we can conclude NC State students sleep statistically significantly less than 8 hours per night.
(d) This means that if the true mean sleep time for NC State students were 8 hours per night on average, there is a 3.5% chance we would observe a test statistic less than -2.17 or greater than +2.17 in a random sample of 50 students.

Solutions
(a) Not appropriate. The p-value is not the probability the null hypothesis is true.
(b) Not appropriate. If the p-value is less than the significance level, we reject the null hypothesis.
(c) Yes!
(d) Yes!
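
As a rough check on the arithmetic in Questions 8, 9, and 10, the test statistics can be recomputed with a short Python sketch. The function and variable names below are illustrative assumptions, not part of the course materials. Only the standard library is used, so no t-distribution p-values are computed; the t-statistics are instead compared with 1.833, the usual t-table critical value for 9 degrees of freedom with 0.05 in one tail. Because the code carries unrounded intermediate values, its results can differ from the hand calculations in the last digit or two.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Illustrative helpers (names are assumptions, not from the handout).

def slope_t_stat(slope, std_err, null_value=0.0):
    """T-statistic for testing a regression slope against null_value."""
    return (slope - null_value) / std_err

def residual(observed, intercept, slope, x):
    """Observed response minus the value predicted by the fitted line."""
    return observed - (intercept + slope * x)

def one_proportion_z(successes, n, p0):
    """Z-statistic for a one-proportion test; the SE uses the null value p0."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

def paired_t_stat(differences, null_value=0.0):
    """T-statistic for a paired-differences t-test (df = n - 1)."""
    n = len(differences)
    return (mean(differences) - null_value) / (stdev(differences) / sqrt(n))

# Question 8(f) and 8(h): slope test and residual.
t_slope = slope_t_stat(0.6150, 0.0784)
resid = residual(observed=10, intercept=-3.0678, slope=0.6150, x=30)

# Question 9(d): 77 of 180 admitted, null proportion 0.46, lower-tail test.
z_admit = one_proportion_z(77, 180, 0.46)
p_admit = NormalDist().cdf(z_admit)  # P(Z < z) under the standard normal

# Question 10(d): differences (dry minus wet) from Table 2.
diffs = [-66, -39, -79, -64, -67, -60, -80, -37, -48, -90]
t_paired = paired_t_stat(diffs)

T_CRIT_DF9_ONE_TAIL_05 = 1.833  # standard t-table value, df = 9

print(f"Q8(f) slope T-statistic: {t_slope:.2f}")
print(f"Q8(h) residual: {resid:.3f}")
print(f"Q9(d) Z = {z_admit:.2f}, lower-tail p-value = {p_admit:.4f}")
print(f"Q10(d) paired T-statistic: {t_paired:.2f}")
print("Q10 reject H0 at 0.05:", t_paired < -T_CRIT_DF9_ONE_TAIL_05)
```

The proportion test uses the null value 0.46 in the standard error, matching the hand calculation in Question 9, and the paired test works on the ten differences directly rather than on the separate dry and wet columns.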