ST 311
Evening Problem Session Solutions
Week 13 Midterm 2 Review
1. p. 326, Question 18 [Learning Objectives G1, G6]
A dog food company wants to compare a new lower-calorie food with their standard dog food to see if it’s
effective in helping inactive dogs maintain a healthy weight. They have found several dog owners willing to
participate in the trial. The dogs have been classified as small, medium, or large breeds, and the company
will randomly select the owners of each size dog to receive one of the two foods. The owners have agreed
not to feed their dogs anything else for a period of 6 months, after which the dogs’ weights will be checked.
This study can best be described as:
(a) Observational study
(b) Completely randomized design
(c) Matched pairs
(d) Blocked experiment
Solution This is an example of a blocked experiment.
2. Was a control group used? Was a placebo used? [Learning Objective G5]
Solution A control group was used; it was the dogs that ate the standard dog food. You can also think
of the standard dog food as a placebo, since it was like the treatment (lower-calorie food) but lacked the
active ingredient the company thought would make the treatment effective (i.e. fewer calories).
3. What is the advantage of using this type of design? [Learning Objective G7]
Solution It allows us to account for any effect we would expect the size of the dog (small, medium, or
large) to have on the relationship between diet and weight. For example, we might expect that smaller
dogs are less likely to lose weight regardless of the type of food they eat, which might unfairly make the
lower-calorie food look less effective at controlling weight.
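The random assignment within blocks described in Question 1 can be sketched as follows. This is a minimal illustration, not the company's actual procedure: the dog IDs, the block sizes, the even split between foods, and the function name `blocked_assignment` are all assumptions made for the example.

```python
import random

def blocked_assignment(dogs_by_size, treatments=("standard", "lower-calorie"), seed=0):
    """Randomly assign treatments separately within each block (dog size)."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    assignment = {}
    for size, dogs in dogs_by_size.items():
        shuffled = list(dogs)
        rng.shuffle(shuffled)
        # Deal the shuffled dogs out to the treatments in turn, so each
        # block is split as evenly as possible between the two foods.
        for i, dog in enumerate(shuffled):
            assignment[dog] = (size, treatments[i % len(treatments)])
    return assignment

# Hypothetical roster: these IDs and block sizes are made up for illustration.
dogs_by_size = {
    "small": ["S1", "S2", "S3", "S4"],
    "medium": ["M1", "M2", "M3", "M4"],
    "large": ["L1", "L2", "L3", "L4"],
}

assignment = blocked_assignment(dogs_by_size)
for dog, (size, food) in sorted(assignment.items()):
    print(dog, size, food)
```

Because the shuffle happens inside each size group, every block contributes dogs to both foods, so a difference between the foods cannot be explained by one food getting mostly small or mostly large dogs.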
4. p. 326, Question 8 [Learning Objectives G1, G6]
Among a group of disabled women aged 65 and older who were tracked for several years, those who had
a vitamin B12 deficiency were twice as likely to suffer severe depression as those who did not. This study
can best be described as:
(a) Observational study
(b) Completely randomized design
(c) Matched pairs
(d) Blocked experiment
Solution This is an example of an observational study.
5. Was a control group used? Was a placebo used? [Learning Objective G5]
Solution A control group was used but a placebo was not. The control group was the women who did
not have a B12 deficiency. Note that the researchers had no influence over who was in the treatment (B12
deficiency) group and who was in the control group since this was an observational study.
6. p. 326, Question 14 [Learning Objectives G1, G6]
Scientists at a major pharmaceutical firm investigated the effectiveness of an herbal compound to treat the
common cold. They exposed each subject to a cold virus, then randomly assigned the subjects to either the
herbal compound or a sugar solution known to have no effect on colds. Several days later they assessed
each patient’s condition, using a cold severity scale ranging from 0 to 5. They found no evidence of benefits
associated with the compound. This study can best be described as:
(a) Observational study
(b) Completely randomized design
(c) Matched pairs
(d) Blocked experiment
Solution This is an example of a completely randomized design.
7. Was a control group used? Was a placebo used? [Learning Objective G5]
Solution A control group and a placebo were used. The control group was the subjects given the sugar
solution, which was the placebo.
8. p. 702, Question 2 [Learning Objectives J1, J3, J11-J13, J19-J21]
The European School Study Project on Alcohol and Other Drugs, published in 1995, investigated the use
of marijuana and other drugs. Data from 11 countries are summarized in the following scatterplot and
regression analysis. They show the association between the percentage of a country’s ninth graders who
report having smoked marijuana and who have used other drugs such as LSD, amphetamines, and cocaine.
R-Squared: 0.873
Reg Equation: Other Drugs = −3.0678 + 0.6150·Marijuana
Table 1: Parameter Estimates
Parameter   Estimate   Std. Err.   Alternative   DF   T-Stat    P-Value
Intercept   −3.0678    2.204       ≠ 0           9    −1.39     0.1974
Slope       0.6150     0.0784      ≠ 0           9    xxxxxxx   xxxxxxx
(a) What is the response variable?
(b) What is the explanatory variable?
(c) What is the correlation coefficient?
(d) Interpret R-Squared in the context of the problem.
(e) Examine the scatterplot of this data. Do the necessary conditions for appropriate inference using a
hypothesis test about the slope appear to be met?
(f) Find the appropriate T-statistic and corresponding p-value to test if the slope of the regression line is
equal to 0. Use a significance level of α = 0.05.
(g) In the previous part, why did we test that the slope was zero?
(h) In another country, 30% of ninth graders reported using marijuana and 10% of ninth graders reported
using other drugs. What is the residual value for this country?
Solutions
(a) The response variable is percent of ninth graders using other drugs.
(b) The explanatory variable is percent of ninth graders using marijuana.
(c) The correlation coefficient is $r = \sqrt{0.873} \approx 0.934$. Note that we are using the positive square root here
because the slope coefficient is positive.
(d) Approximately 87.3% of the variation in the percentage of ninth graders using other drugs can be
explained by the linear relationship with the percentage of ninth graders using marijuana.
(e) The conditions are: we have a random sample, a straight line is an appropriate model for the data,
and the points have the same standard deviation around the line for all values of X. Looking at the
scatterplot, we see that the points do seem to follow a straight-line pattern (not a curve). We can also
see that the standard deviation is relatively consistent for all values of X (i.e. the points do not seem
to get closer to or farther from each other as X increases). Note: we cannot check the random sample
condition using the scatterplot and we are not explicitly told it is true in the problem; thus we are left
to assume the sample was randomly selected, or is at the very least representative of the population.
(f) The T-statistic can be calculated using values from the table:
$$T = \frac{0.6150 - 0}{0.0784} = 7.84$$
Then, looking at the t-table, the p-value associated with T = 7.84 and 9 degrees of freedom is p < .01.
So at a significance level of α = 0.05 we reject the null hypothesis. There is significant
evidence to suggest that there is a linear relationship between marijuana use in the 9th grade and other drug
use.
(g) Because this is the logical way to test whether there is no relationship between the two variables. If there is
no relationship, knowing the percent of 9th graders who have smoked marijuana (X) would not give
us any additional information about the percent who have used other drugs (Y), meaning that the
values of Y would not increase with X but would instead stay flat as X changes. This corresponds to
a slope of zero.
(h) We first need to identify the predicted value from our regression equation. We expect that a country
in which 30% of ninth graders use marijuana will have
$$\widehat{\text{Other Drugs}} = -3.0678 + 0.6150(30) = 15.382.$$
Then to find the residual value we need to calculate the observed minus the predicted value, so
$$\text{Residual} = 10 - 15.382 = -5.382.$$
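As a cross-check on parts (c), (f), and (h), the arithmetic can be reproduced with a short script. This is a sketch that takes the estimates from Table 1 as given; the variable names are illustrative, and the critical value 2.262 is the standard t-table entry for a two-sided test at α = 0.05 with 9 degrees of freedom, used here in place of an exact p-value.

```python
import math

# Values read from the regression output in the problem.
r_squared = 0.873
intercept, slope = -3.0678, 0.6150
slope_se = 0.0784

# (c) The correlation takes the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope)

# (f) Test statistic for H0: slope = 0.
t_stat = (slope - 0) / slope_se
t_crit = 2.262  # t-table: two-sided, alpha = 0.05, 9 degrees of freedom
reject_null = abs(t_stat) > t_crit

# (h) Predicted value and residual for a country with 30% marijuana use
# and 10% observed use of other drugs.
predicted = intercept + slope * 30
residual = 10 - predicted

print(f"r = {r:.3f}")
print(f"T = {t_stat:.2f}, reject H0: {reject_null}")
print(f"predicted = {predicted:.3f}, residual = {residual:.3f}")
```

The residual is negative because the observed percentage (10%) lies below the regression line's prediction for that country.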
9. p. 496, Question 18 [Learning Objectives H3, H7-H9]
According to the Association of American Medical Colleges, only 46% of medical school applicants were
admitted to a medical school in the fall of 2006. Upon hearing this, the trustees of Striving College expressed
concern that only 77 of the 180 students in their class of 2006 who applied to medical school were admitted.
The college president would like to see if the school’s success rate is lower than the national average.
(a) Is this question about means or proportions?
(b) Identify the null and alternative hypotheses.
(c) Is the null distribution a t-statistic or z-statistic, and how do you know?
(d) Calculate the test statistic and find the p-value.
(e) Using a significance level of α = 0.05, what is your conclusion? State your conclusion in terms of your
hypotheses, and interpret your results in the context of this problem.
Solutions
(a) This is a question about proportions. We know this because the data we record for each student is
a yes or no response about whether or not they got into medical school. So this is a categorical
variable, and with categorical variables, we use proportions.
(b)
H0 : p = 0.46
HA : p < 0.46
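(c) The null distribution for proportions is a Z distribution. The normal approximation is reasonable here
because the expected counts under H0, 180(0.46) = 82.8 admitted and 180(0.54) = 97.2 not admitted,
are both well above 10.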
(d) To begin, we need to calculate p̂ for our sample from Striving College. In this example
$$\hat{p} = \frac{77}{180} \approx 0.428,$$
or about 0.43. The test statistic is calculated as follows:
$$Z = \frac{0.428 - 0.46}{\sqrt{\frac{0.46 \cdot 0.54}{180}}} = \frac{-0.032}{0.037} = -0.87$$
Then looking at the Z-table, we see that P(Z < −0.87) = 0.1922 is our p-value.
(e) Since the p-value is greater than 0.05, at a significance level of α = 0.05 we fail to reject the null hypothesis.
Therefore there is not significant evidence to suggest that the Striving College acceptance rate is lower
than the national average.
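The one-proportion z-test in parts (d) and (e) can be reproduced with Python's standard library. This is a minimal sketch; the function name `one_proportion_z_test` is illustrative, and the lower-tail p-value matches the alternative HA : p < 0.46.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Lower-tailed z-test of H0: p = p0 against HA: p < p0."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)     # standard error computed under H0
    z = (p_hat - p0) / se
    p_value = NormalDist().cdf(z)    # P(Z < z) for a standard normal
    return p_hat, z, p_value

p_hat, z, p_value = one_proportion_z_test(successes=77, n=180, p0=0.46)
alpha = 0.05
print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Because the script does not round z to two decimals before looking up the tail area, its p-value (about 0.193) differs slightly from the Z-table value; the conclusion is the same.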
10. p. 629, Question 24 [Learning Objectives H13]
A tire manufacturer tested the braking performance of one of its tire models on a test track. The company
tried the tires on 10 different cars, recording the stopping distance for each car on both wet and dry
pavement. The company is interested in knowing if cars stop in a shorter distance on dry pavement than
on wet pavement. Results are shown in the table.
Table 2: Stopping Distances
Car       Dry     Wet     Difference
1         145     211     -66
2         152     191     -39
3         141     220     -79
4         143     207     -64
5         131     198     -67
6         148     208     -60
7         126     206     -80
8         140     177     -37
9         135     183     -48
10        133     223     -90
Mean      139.4   202.4   -63
St. Dev   8.10    15.07   17.59
(a) Is this question about means or proportions?
(b) Identify the null and alternative hypotheses.
(c) Is the null distribution a t-statistic or z-statistic, and how do you know?
(d) Calculate the test statistic and find the p-value.
(e) Using a significance level of α = 0.05, what is your conclusion? State your conclusion in terms of your
hypotheses, and interpret your results in the context of this problem.
Solutions
(a) This is a question about means. We know this because the data we are recording is a numerical value
for each trial.
(b)
H0 : µD = 0
HA : µD < 0
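(c) The null distribution for means is a T distribution, here with n − 1 = 9 degrees of freedom, because the
population standard deviation of the differences is unknown and is estimated by the sample standard deviation.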
(d) This is a paired differences t-test. Our T statistic is calculated as follows:
$$T = \frac{-63 - 0}{17.59/\sqrt{10}} = -11.326$$
Then looking at the t-table for 9 degrees of freedom, we see that the appropriate p-value for a t-statistic
of −11.326 and a one-tail probability is p-value < .005.
(e) Since the p-value is less than 0.05, at a significance level of α = 0.05 we reject
the null hypothesis. Therefore there is significant evidence to suggest that the mean stopping distance on
dry pavement is shorter than the mean stopping distance on wet pavement.
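The paired t-test can also be run directly from the raw stopping distances in Table 2. This is a sketch using only the standard library; the critical value −1.833 is the standard t-table entry for a lower-tailed test at α = 0.05 with 9 degrees of freedom, used here instead of an exact p-value.

```python
from math import sqrt
from statistics import mean, stdev

# Stopping distances from Table 2, one entry per car.
dry = [145, 152, 141, 143, 131, 148, 126, 140, 135, 133]
wet = [211, 191, 220, 207, 198, 208, 206, 177, 183, 223]

# Paired design: analyze the within-car differences (dry minus wet).
diffs = [d - w for d, w in zip(dry, wet)]
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)  # sample standard deviation (divides by n - 1)

t_stat = (d_bar - 0) / (s_d / sqrt(n))
t_crit = -1.833     # t-table: lower tail, alpha = 0.05, 9 degrees of freedom
reject_null = t_stat < t_crit

print(f"mean difference = {d_bar}, s_d = {s_d:.2f}")
print(f"T = {t_stat:.3f}, reject H0: {reject_null}")
```

The script carries the unrounded standard deviation through the calculation, so its T differs from the hand calculation in the third decimal place.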
11. We are interested in testing whether NC State students average the recommended 8 hours of sleep per night.
It is expected that the distribution of sleep times is greatly skewed left. We randomly select 10 students,
and ask the number of hours they slept the previous night. The mean response is 7 hours with a standard
deviation of 1.2 hours. [Learning Objectives F10, F17, H6, H11]
a) If appropriate, conduct a hypothesis test at the .05 level; if not explain why not.
b) We repeat the sleep study, but this time randomly select 100 students. We find a 95% confidence
interval for the average hours of sleep the previous night is 6.75 to 7.25 hours. If appropriate, interpret
this confidence interval; if not, explain why not.
c) Consider the hypotheses H0 : µ = 8, HA : µ ≠ 8. Based on the confidence interval above, would we
reject the null hypothesis at a 0.05 significance level?
Solutions
(a) Not appropriate, because the population is skewed, and our sample size is only 10.
(b) It is appropriate, since we have more than 30 students. The interpretation is that we are 95% confident NC
State students get an average of between 6.75 and 7.25 hours of sleep per night.
(c) Yes, we would reject since 8 is not in the confidence interval.
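The reasoning in part (c) rests on the link between a 95% confidence interval and a two-sided test at the 0.05 level: the test rejects H0 : µ = µ0 exactly when µ0 falls outside the interval. A minimal sketch, with the helper name `rejects_null` chosen for illustration:

```python
def rejects_null(ci_low, ci_high, mu0):
    """A two-sided test at level alpha rejects H0: mu = mu0 exactly when
    mu0 lies outside the corresponding (1 - alpha) confidence interval."""
    return not (ci_low <= mu0 <= ci_high)

# 95% confidence interval from part (b), tested against mu0 = 8 hours.
print(rejects_null(6.75, 7.25, 8))  # True: 8 is outside the interval
```

This shortcut applies only to two-sided alternatives whose significance level matches the confidence level of the interval.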
12. We test the hypothesis that NC State students sleep 8 hours per night (H0 : µ = 8, HA : µ ≠ 8). In a
random sample of 50 students, the average is 7 hours of sleep per night. We get a test statistic of -2.17
and a p-value of 0.035. Which are correct interpretations of a hypothesis test or p-value? If inappropriate,
explain why. [Learning Objective H9]
(a) This means the probability NC State students sleep an average 8 hours per night is .035.
(b) At the .05 level, we do not reject the null hypothesis, and cannot conclude NC State students sleep
different than 8 hours per night on average.
(c) At the .05 level, we reject the null hypothesis, and conclude there is evidence NC State students sleep
different than 8 hours per night on average. In addition, the mean hours of sleep observed is 7 hours,
so we can conclude NC State students sleep statistically significantly less than 8 hours per night.
(d) This means that if the true mean sleep time for NC State students were 8 hours per night on average,
there is a 3.5% chance we would observe a test statistic less than -2.17 or greater than +2.17 in a
random sample of 50 students.
Solutions
(a) Not appropriate. The p-value is not the probability the null hypothesis is true.
(b) Not appropriate. If the p-value is less than the significance level we will reject the null hypothesis.
(c) Yes!
(d) Yes!
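Interpretation (d) describes the p-value as a two-tailed area, and that area can be computed numerically. The sketch below assumes the test statistic follows a t distribution with n − 1 = 49 degrees of freedom; the problem does not state which distribution was used, but this assumption reproduces the reported 0.035. The function names are illustrative, and the integral is approximated with Simpson's rule so that only the standard library is needed.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|), via Simpson's rule on [0, |t_stat|]; steps must be even."""
    b = abs(t_stat)
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # area between -|t| and +|t|, by symmetry
    return 1 - central_area

p_value = two_sided_p_value(-2.17, df=49)
print(f"two-sided p-value = {p_value:.3f}")
```

The result is the probability, computed assuming the null hypothesis is true, of a test statistic at least as extreme as the one observed in either direction. It is not the probability that the null hypothesis is true, which is why interpretation (a) fails.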