Practice Exam Spring 09

PubH 6415
Practice Exam I
NAME:______________________
Directions: This is an open-book, open-notes exam. You may use a calculator of your choosing,
but laptop computers are not permitted. For true/false questions, circle the word corresponding
to your answer. For short answer problems, please show all relevant work necessary to arrive at
a solution; partial credit will be awarded. You will have from 10:10 am to 12:05 pm to complete
the exam. Do not spend too much time on any one problem. Good Luck!!
1. To examine effects of trace amounts of DDT on nerve activity, researchers took a random
sample of 6 male rats and fed them small amounts of DDT. They measured the time it
took for electrical responses in certain leg nerves. They then took a random sample of 6
healthy male rats and measured the time it took for electrical responses in certain leg
nerves, with the results given below. The measurements are given in milliseconds. Use
this to answer questions A-F. (Assume all relevant distributions have approximate
normal distributions.)
Rat                   DDT rat    Control rat
1                     12.2       11.1
2                     16.9        9.7
3                     25.1       12.1
4                     22.4        9.4
5                      8.5        8.2
6                     20.6        6.6
Mean                  17.62       9.52
Standard deviation     6.34       1.97
A. Write the null and alternative hypothesis for the following research question.
Does giving DDT to the rats increase the mean time to electrical responses in leg
nerves?
B. The correct t-test for testing the hypothesis in part A is (circle one):
i. One-sample t-test
ii. Two-sample t-test assuming equal variances
iii. Two-sample t-test not assuming equal variances
iv. Matched pairs t-test
C. Calculate the test statistic.
D. How many degrees of freedom are there to determine the shape of the t-distribution
for comparison of the test statistic?
E. Calculate the range of the p-value or indicate the cutoff value you compare to the
t-statistic. What is your conclusion at α = 0.05?
F. If you reject the null hypothesis at α = 0.05, what is the probability of a
type I error?
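The arithmetic for parts C through E can be checked with a short sketch. Choosing the test is what part B asks, so this sketch assumes the unpooled two-sample test, which is the usual choice when the sample standard deviations differ as much as 6.34 and 1.97 do. The function name `welch_t` and the critical value 2.015 (a table look-up for a one-sided test with 5 degrees of freedom) are assumptions and not part of the exam.

```python
import math

# Summary statistics copied from the table in question 1.
n_ddt, mean_ddt, sd_ddt = 6, 17.62, 6.34
n_ctl, mean_ctl, sd_ctl = 6, 9.52, 1.97


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, Welch-Satterthwaite df) for two independent samples."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


t_stat, df_welch = welch_t(mean_ddt, sd_ddt, n_ddt, mean_ctl, sd_ctl, n_ctl)

# Conservative degrees of freedom often used for hand calculation.
df_conservative = min(n_ddt, n_ctl) - 1

# Assumption: one-sided critical value for alpha = 0.05 and 5 df, from a t table.
T_CRIT_ONE_SIDED_DF5 = 2.015

print(f"t = {t_stat:.3f}")
print(f"Welch df = {df_welch:.2f}, conservative df = {df_conservative}")
print("reject H0" if t_stat > T_CRIT_ONE_SIDED_DF5 else "fail to reject H0")
```

The conservative degrees of freedom, the smaller sample size minus one, make the test slightly harder to reject than the Welch approximation does.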
2. Researchers were interested to see whether men or women receive faster service at
restaurants. Sixteen people, eight women and eight men, were matched on age and race.
All participants were given nice clothing. Each of the eight matched man-woman pairs
was randomly assigned to a restaurant. The order of who arrived first at the
restaurant (man or woman) was determined by the flip of a coin. Each person ordered a
similar drink and similar meal. The time in minutes until the food arrived at the table was
recorded below. Use this to answer questions A-G. (Assume all relevant distributions
have approximate normal distributions.)
Restaurant     1    2    3    4    5    6    7    8    Mean      Standard Deviation
Man           22   14   16   26   18   13    9   27    18.125    6.40
Woman         25   12   13   21   21   14    9   16    16.375    5.45
Difference    -3    2    3    5   -3   -1    0   11     1.75     4.68
A. What are the null and alternative hypotheses that would answer the research
question above?
B. The correct t-test for testing the hypothesis in part A is (circle one):
i. One-sample t-test
ii. Two-sample t-test assuming equal variances
iii. Two-sample t-test not assuming equal variances
iv. Matched pairs t-test
C. Calculate the test statistic.
D. How many degrees of freedom are there to determine the shape of the
t-distribution for comparison of the test statistic?
E. Calculate the range of the p-value or indicate the cutoff value you compare to the
t-statistic (at α = 0.05).
F. What is your conclusion (at α = 0.05)?
G. What potential error might we have made given our conclusion above?
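The calculation for parts C through F can be sketched from the raw times. Because each man is matched with a woman at the same restaurant, this sketch assumes the matched pairs analysis on the differences. The critical value 2.365 (two-sided, α = 0.05, 7 degrees of freedom) is a table look-up and an assumption, not part of the exam.

```python
import math
import statistics

# Minutes until food arrived, restaurants 1 through 8, from the table in question 2.
man = [22, 14, 16, 26, 18, 13, 9, 27]
woman = [25, 12, 13, 21, 21, 14, 9, 16]

# Matched pairs: analyze the within-restaurant differences (man minus woman).
diffs = [m - w for m, w in zip(man, woman)]
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)

n_pairs = len(diffs)
t_stat = mean_diff / (sd_diff / math.sqrt(n_pairs))
df = n_pairs - 1

# Assumption: two-sided critical value for alpha = 0.05 and 7 df, from a t table.
T_CRIT_TWO_SIDED_DF7 = 2.365

print(f"mean difference = {mean_diff:.2f}, sd = {sd_diff:.2f}")
print(f"t = {t_stat:.3f} on {df} df")
print("reject H0" if abs(t_stat) > T_CRIT_TWO_SIDED_DF7 else "fail to reject H0")
```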
3. Researchers are interested in studying the relationships between two of the major
pollutants in vehicle exhaust: Carbon monoxide (CO) and nitrogen oxides (NOX). The
amount of the pollutants emitted by 46 light-duty engines of the same type is measured.
The units of measurement are in grams per mile. Researchers calculate the correlation
between these measures. The value is –0.7192.
A. Determine whether each of the statements regarding the correlation coefficient is
true or false.
i. The correlation coefficient equals the proportion of times that two
variables lie on a straight line.
True or False
ii. The correlation coefficient is always +1 if all the data points lie on a
perfectly horizontal straight line.
True or False
iii. The correlation coefficient measures the strength of any relationship that
may be present between two quantitative variables.
True or False
iv. The correlation coefficient does not have a unit of measure.
True or False
v. The correlation coefficient lies between –1.0 and +1.0 inclusive.
True or False
B. Determine whether each of the statements is true or false based on the correlation
coefficient given in the problem above.
vi. An arithmetic mistake was made. Correlation must be positive.
True or False
vii. There is more carbon monoxide than nitrogen oxides.
True or False
viii. If the correlation between CO and NOX is –0.7192 then the correlation
between NOX and CO is +0.7192.
True or False
ix. Based on the correlation we can conclude that a negative linear
relationship best describes the relationship between NOX and CO.
True or False
4. A researcher is studying treatments for agoraphobia with panic disorder. The three
treatments under study are two levels of Imipramine and a placebo. Thirty patients were
randomly divided into three groups of 10 each. One group was assigned to placebo and
the other two groups were assigned to one of the two levels of Imipramine. After 24
weeks on treatment a measure of symptoms was evaluated (high score indicating fewer
symptoms). Assume the data from the three groups are independent and the responses
are approximately normal. Below are the means and standard deviation of the test scores
for the three groups.
                      Placebo    Imipramine Level 1    Imipramine Level 2
Mean                  75.7       84.1                  102.4
Standard Deviation    12.61      18.44                 20.82
The researchers did an ANOVA test of the data and obtained the following results.
Source    DF      SS        MS        F
Groups    ____    3727.8    ____      ____
Error     ____    ____      310.87
Total     ____    ____
A. Fill in the missing pieces of the ANOVA table.
B. Is the assumption of equal variances reasonable? Show your work.
C. What is the value of the pooled standard deviation?
D. What is the p-value for the test of equal population means versus the alternative that at
least two means differ?
E. What is your conclusion at α = 0.05?
F. If we wish to make all pair-wise comparisons, what is the new value we need to compare
our p-value to for conclusions in order to keep our overall type I error rate at 0.05 using the
Bonferroni technique?
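The pieces of the ANOVA table in question 4 follow from the two given entries and the design (3 groups of 10), and can be checked with a sketch like the one below. It assumes that 3727.8 is the sum of squares for groups and 310.87 is the mean square for error, which the group means and standard deviations support. The rule of thumb that the largest standard deviation should be less than twice the smallest, and the critical value 3.35 for F(2, 27) at α = 0.05, are assumptions taken from common textbook practice and tables.

```python
import math

# Design and summary statistics from question 4.
k, n = 3, 10  # three groups, ten patients each
means = [75.7, 84.1, 102.4]
sds = [12.61, 18.44, 20.82]
ss_groups = 3727.8  # given in the table
ms_error = 310.87   # given in the table

# Check the given SS for groups against the group means (equal group sizes).
grand_mean = sum(means) / k
ss_groups_check = n * sum((m - grand_mean) ** 2 for m in means)

# Degrees of freedom.
df_groups = k - 1
df_error = k * n - k
df_total = k * n - 1

# Remaining table entries.
ms_groups = ss_groups / df_groups
ss_error = ms_error * df_error
ss_total = ss_groups + ss_error
f_stat = ms_groups / ms_error

# Pooled standard deviation is the square root of the mean square for error.
pooled_sd = math.sqrt(ms_error)

# Rule of thumb (assumption): equal variances are reasonable if this ratio is below 2.
sd_ratio = max(sds) / min(sds)

# Bonferroni: divide alpha by the number of pairwise comparisons.
n_comparisons = k * (k - 1) // 2
bonferroni_alpha = 0.05 / n_comparisons

# Assumption: critical value of F(2, 27) at alpha = 0.05, from an F table.
F_CRIT_2_27 = 3.35

print(f"df: groups={df_groups}, error={df_error}, total={df_total}")
print(f"MS groups={ms_groups:.1f}, SS error={ss_error:.2f}, SS total={ss_total:.2f}")
print(f"F={f_stat:.2f}, pooled sd={pooled_sd:.2f}, sd ratio={sd_ratio:.2f}")
print(f"per-comparison alpha={bonferroni_alpha:.4f}")
```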
5. Twelve (6 men, 6 women) School of Public Health recent graduates are randomly
selected from each of three different programs from which they matriculated: Maternal
and Child Health (m), Community Health Education (c) and Public Health
Administration and Policy (a). The total sample size is 36 alumni. Each alumnus is
asked his or her current salary. Investigators would like to use ANOVA to investigate the
relationship between starting salary, gender and program.
A. What is the correct ANOVA test to be used?
i.) Matched pairs ANOVA for gender
ii.) Three-Way ANOVA for salary
iii.) 3 X 2 One-Way ANOVA
iv.) 2 X 3 Two-Way ANOVA
v.) 2 X 1 Two-Way ANOVA
B. Fill in the missing ANOVA table below.
TWO-WAY ANOVA
Program and Gender Analysis of Salary (in thousands of dollars)
The ANOVA Procedure
Dependent Variable: salary

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               5    6563.666667       1312.733333    50.15      <.0001
Error              30     785.333333         26.177778
Corrected Total    35    7349.000000

R-Square    Coeff Var    Root MSE    salary Mean
0.893137    11.58435     5.116422    44.16667

Source            DF    Anova SS       Mean Square    F Value    Pr > F
program            2    5702.166667    2851.083333    108.91     <.0001
gender             1     113.777778     113.777778      4.35     0.0457
program*gender     2     747.722222     373.861111     14.28     <.0001
C. Based on the output, does there appear to be a significant
interaction between program and gender?
D. Use the plot below to describe the main effect for program.
E. Use the plot below to describe the interaction of gender and
program.
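The F values in the two-way ANOVA output for question 5 can be reproduced by hand, which is useful practice for filling in such a table. The sketch below is only an arithmetic check under the usual fixed-effects setup: each effect's mean square is its sum of squares divided by its degrees of freedom, and each F value is that mean square divided by the error mean square. The variable names are illustrative.

```python
# Values copied from the ANOVA output in question 5.
ms_error = 26.177778
model_ss = 6563.666667

# Each effect: (Anova SS, DF).
effects = {
    "program": (5702.166667, 2),
    "gender": (113.777778, 1),
    "program*gender": (747.722222, 2),
}

# Mean square = SS / DF; F = mean square / error mean square.
mean_squares = {name: ss / df for name, (ss, df) in effects.items()}
f_values = {name: ms / ms_error for name, ms in mean_squares.items()}

# In this balanced design the effect sums of squares add up to the model SS.
effects_ss_total = sum(ss for ss, _ in effects.values())

for name in effects:
    print(f"{name}: MS = {mean_squares[name]:.3f}, F = {f_values[name]:.2f}")
print(f"sum of effect SS = {effects_ss_total:.3f} (model SS = {model_ss:.3f})")
```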
6. In a statistics course, a linear regression equation was computed to predict the final exam
score from the score on the midterm exam. The midterm scores ranged from 60 points
to 99 points. Assume there is a reasonable linear relationship between these measures.
The equation of the least-squares regression line was:
y = 10 + 0.9x
A. Suppose the standard deviation of the midterm scores was equal to the
standard deviation of the final exam scores. What is the correlation
coefficient between scores?
B. Suppose Joe scores a 90 on the midterm exam. What would be the predicted
value of his score on the final exam?
C. Suppose the teacher wants the lowest final exam score to be around 73. What
is the midterm score that would yield a prediction of 73 for the final score?
D. Sally scores 5 points higher than Megan on the midterm. How much of an
increase do we predict on Sally’s final exam score compared to Megan’s?
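The calculations in question 6 all come from the fitted line y = 10 + 0.9x and the identity slope = r × (SD of y / SD of x). A minimal sketch follows; the function names are hypothetical, and the common standard deviation of 1.0 in the first calculation is an arbitrary placeholder, since only the ratio of the two standard deviations matters.

```python
# Fitted least-squares line from question 6: y = 10 + 0.9x.
INTERCEPT = 10.0
SLOPE = 0.9


def predict_final(midterm):
    """Predicted final exam score for a given midterm score."""
    return INTERCEPT + SLOPE * midterm


def midterm_for(final):
    """Midterm score whose predicted final score equals `final`."""
    return (final - INTERCEPT) / SLOPE


def correlation_from_slope(slope, sd_x, sd_y):
    """Solve slope = r * sd_y / sd_x for r."""
    return slope * sd_x / sd_y


# Part A: equal standard deviations, so r equals the slope.
r_equal_sd = correlation_from_slope(SLOPE, 1.0, 1.0)

# Part B: Joe's midterm score of 90 lies inside the observed range (60 to 99).
joe_predicted = predict_final(90)

# Part C: midterm score that predicts a final score of 73.
midterm_needed = midterm_for(73)

# Part D: a 5-point midterm gap changes the prediction by slope * 5.
predicted_gap = SLOPE * 5

print(round(r_equal_sd, 2), round(joe_predicted, 2),
      round(midterm_needed, 2), round(predicted_gap, 2))
```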
7. The goal of the Multiple Risk Factor Intervention Trial was to examine whether a
randomized intervention would reduce incidence of coronary heart disease over six years.
After the first year of the trial, researchers examined changes in systolic blood pressure (SBP):
outcome variable Y = SBP(visit 1) minus SBP(baseline)
and whether or not those blood pressure changes were related to baseline cholesterol readings.
Thus a negative value is obtained when SBP improves.
They fit a regression of change in SBP on baseline cholesterol:
proc reg data=new;
model sbpchange = chol / clb cli clm r influence;
title "Regression of change in SBP on cholesterol";
run;
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1    -11.05869             3.04102           -3.64      0.0003
chol          1      0.02391             0.01123            2.129     0.0335
This regression model can be written as: $\text{SBPchange} = \beta_0 + \beta_1\,\text{Chol} + \varepsilon$
a. Looking at only the scatterplot above (not the regression results), describe the
relation between change in SBP and cholesterol. Specifically, comment on form,
strength, and direction of the relation.
b. How do we interpret the estimated b1 = 0.02391?
c. Compute a 95% confidence interval for the treatment effect ($\beta_1$). (Use n=1002.)
d. Using your interval from c., and α=0.05, test $H_0: \beta_1 = 0$. State your conclusion
clearly in the context of the study.
Now consider the diagnostic plots for this regression shown on the next two pages.
e. What assumption(s) of the model are we checking with the histogram and quantile plot? Do
these two plots indicate that the assumption(s) is (are) satisfied?
f. In the scatterplot above, and in the plots below, there is a data point with a circle around it.
From the plots below, we can see that this data point has an unusually large positive studentized
residual. Looking back at the scatterplot, explain why this point has such a large positive
studentized residual.
g. In the scatterplot above, and in the plots below, there is a data point with a triangle around it.
From the plots below, we can see that this data point has a much higher Cook’s distance value
than the data point with the circle. Looking back at the scatterplot, explain why the data point
with the triangle has such a large value for Cook’s distance compared to the data point with the
circle.
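For parts c and d of this question, the interval is the slope estimate plus or minus a t critical value times its standard error, with n - 2 degrees of freedom. The sketch below uses the estimate and standard error from the regression output. The critical value 1.962 (two-sided 95%, 1000 degrees of freedom) is a table look-up and an assumption; a table that stops at smaller degrees of freedom would give a slightly wider interval.

```python
# Slope estimate and standard error for chol, from the regression output.
b1 = 0.02391
se_b1 = 0.01123
n = 1002
df = n - 2  # simple linear regression estimates two coefficients

# Assumption: two-sided 95% critical value for t with 1000 df, from a t table.
T_CRIT_DF1000 = 1.962

margin = T_CRIT_DF1000 * se_b1
ci_low = b1 - margin
ci_high = b1 + margin

# Part d: reject H0: beta1 = 0 at alpha = 0.05 when the interval excludes 0.
reject_h0 = not (ci_low <= 0.0 <= ci_high)

print(f"95% CI for beta1: ({ci_low:.5f}, {ci_high:.5f}) on {df} df")
print("reject H0" if reject_h0 else "fail to reject H0")
```

A decision made from the interval should agree with the two-sided p-value for chol in the output (0.0335), which is a useful consistency check.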