1. Define the following terms (1 point each):
alternative hypothesis
One of three hypotheses indicating that the parameter is not zero; one states the
parameter is not equal to zero, one states the parameter is larger than zero, and one
states the parameter is smaller than zero.
probability value
An area under the sampling distribution of t under the assumption that the null
hypothesis is true: the area beyond calculated t for a two-tailed test; the area above
calculated t for an upper-tailed test; and the area below calculated t for a lower-tailed
test.
Type I error rate
The probability of rejecting the null hypothesis when it is true.
2. Using the data, construct a scatter plot with GPA as the dependent variable.
[Figure: Plot of GPA Versus SAT. Vertical axis: GPA, 2.0 to 4.0; horizontal axis: SAT, 1000 to 1600.]
3. For these data, would it be appropriate to report $S_{Y \cdot X}$? If not, why not?
No. Calculating $S_{Y \cdot X}$ is based on the assumption of equal conditional variances. The plot
suggests this assumption is violated: there is relatively little conditional variance when
SAT is low (e.g., 1000) and relatively large conditional variance when SAT is high (e.g.,
1600). So $S_{Y \cdot X}$ would be misleading.
4. Using the data, calculate $\sum XY$, where X denotes SAT and Y denotes GPA.

SAT (X)    GPA (Y)    SAT × GPA (XY)
1050       2.50       2625
1100       2.40       2640
1150       2.50       2875
1200       2.80       3360
1250       3.10       3875
1300       2.60       3380
1350       2.70       3645
1400       3.40       4760
1550       3.60       5580
1600       2.70       4320

$\sum XY = 37060$
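The sum of cross-products can be checked with a few lines of Python. This is only a minimal sketch; the variable names `sat`, `gpa`, and `sum_xy` are illustrative choices, not part of the original answer key.

```python
# SAT (X) and GPA (Y) pairs copied from the table in question 4.
sat = [1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1550, 1600]
gpa = [2.50, 2.40, 2.50, 2.80, 3.10, 2.60, 2.70, 3.40, 3.60, 2.70]

# Sum of cross-products: multiply each (X, Y) pair, then add the products.
sum_xy = sum(x * y for x, y in zip(sat, gpa))
print(round(sum_xy, 2))
```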
5. Calculate the slope using salary as the dependent variable (2 points).
$$b = \frac{S_{XY}}{S_{XX}} = \frac{588.6}{16.025} = 36.7$$
6. Calculate the standard error of estimate (2 points).
$$S_{Y \cdot X} = \sqrt{\frac{S_{YY} - b S_{XY}}{n - 2}} = \sqrt{\frac{35320 - 36.7(588.6)}{20 - 2}} = \sqrt{761.148} = 27.6$$
Note that $S^2_{Y \cdot X} = 761.148$.
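As a rough check on questions 5 and 6, the slope and the standard error of estimate can be recomputed from the quoted sums. The sketch below assumes the unrounded slope is carried into the second step, which appears to be how 761.148 was obtained; the variable names are illustrative.

```python
import math

# Summary quantities quoted in the answers to questions 5 and 6.
s_xy = 588.6    # sum of cross-products of deviations
s_xx = 16.025   # sum of squared deviations of X (ratings)
s_yy = 35320.0  # sum of squared deviations of Y (salaries)
n = 20          # sample size

b = s_xy / s_xx                          # slope, about 36.7
cond_var = (s_yy - b * s_xy) / (n - 2)   # conditional variance S^2_{Y.X}
se_est = math.sqrt(cond_var)             # standard error of estimate S_{Y.X}

print(round(b, 1), round(cond_var, 3), round(se_est, 1))
```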
7. Set up and test hypotheses relevant to the question of whether ratings are predictive of
salaries; use the F statistic approach and   .01 (7 points).
$H_0: \beta = 0$
$H_A: \beta \neq 0$
$$F = \frac{b^2 S_{XX}}{S^2_{Y \cdot X}} = \frac{(36.7)^2 (16.025)}{761.148} = 28.4$$
$$F_{\alpha, 1, n-2} = F_{.05, 1, 20-2} = 4.4139$$
[Figure: F distribution with 1 and 18 df. The critical value 4.4139 marks an upper-tail rejection region with area .05.]
We reject the null hypothesis since calculated F is in the region of rejection.
We conclude that ratings are predictive of salaries.
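The F test in question 7 can be sketched as follows. The critical value 4.4139 is taken as quoted in the answer rather than computed, and the variable names are illustrative.

```python
# Summary quantities quoted in the answers to questions 5 through 7.
s_xy, s_xx, s_yy, n = 588.6, 16.025, 35320.0, 20

b = s_xy / s_xx                          # slope
cond_var = (s_yy - b * s_xy) / (n - 2)   # S^2_{Y.X}, the mean square for error

# F statistic for H0: beta = 0, with 1 and n - 2 degrees of freedom.
f_calc = b ** 2 * s_xx / cond_var

f_crit = 4.4139            # critical F as quoted in the answer (1 and 18 df)
reject = f_calc > f_crit   # True means calculated F is in the rejection region

print(round(f_calc, 1), reject)
```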
8. The intercept is calculated to be 625.83. Can a legitimate substantive interpretation of
the intercept be made? Why?
No, because the range of the ratings is 1 to 5, and the intercept is the estimated
conditional mean corresponding to a value of zero for the independent variable.
9. Calculate and interpret a 90-percent confidence interval for $\beta$ (2 points).
$$b \pm t_{\alpha/2, n-2} S_b$$
$$S_b = \sqrt{\frac{S^2_{Y \cdot X}}{S_{XX}}} = \sqrt{\frac{761.148}{16.025}} = 6.89$$
$$t_{\alpha/2, n-2} = t_{.10/2, 20-2} = 1.7341$$
$$36.7 \pm 1.7341(6.892)$$
$$(24.8, 48.7)$$
We are 90% confident that the population slope is between 24.8 and 48.7.
(The preceding interpretation is satisfactory.)
We are 90% confident that the mean difference in salary for secretaries who are one rating point apart is between 24.8 and 48.7.
(The preceding interpretation is stated in terms of the subject-matter of the problem and is
also satisfactory.)
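The interval arithmetic in question 9 can be reproduced with a short sketch. The critical t of 1.7341 is taken as quoted rather than computed, and the unrounded slope is assumed; the variable names are illustrative.

```python
import math

# Quantities quoted in the answers to questions 5, 6, and 9.
b = 588.6 / 16.025     # slope, about 36.7
cond_var = 761.148     # S^2_{Y.X}
s_xx = 16.025          # sum of squared deviations of X
t_crit = 1.7341        # t for alpha/2 = .05 with 18 df, as quoted

s_b = math.sqrt(cond_var / s_xx)   # standard error of the slope
half_width = t_crit * s_b
lower, upper = b - half_width, b + half_width

print(round(s_b, 2), round(lower, 1), round(upper, 1))
```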
10. Calculate the residual for a secretary who had a rating of 3.1 and a salary of 750.
$$e = Y - \hat{Y}$$
$$\hat{Y} = a + bX = 625.8 + 36.7(3.1) = 739.7$$
$$e = 750 - 739.7 = 10.3$$
11. The correlation between performance ratings and biweekly salaries is .78. Suppose
the reliability of performance ratings is .81. What is the correlation between true
performance ratings and salary? (Assume salary is measured without error.) (2 points.)
If salary is measured without error, then $r_{YY'} = 1.00$.
$$r_{T_X T_Y} = \frac{r_{XY}}{\sqrt{r_{XX'} \, r_{YY'}}} = \frac{.78}{\sqrt{.81(1.00)}} = .87$$
12. In the sample what happens to the average competence rating as more press
conferences are viewed?
$b = 8.192$
Since b is positive, in the sample competence ratings increase as more conferences are
viewed.
13. What is the numeric value of the standard error of estimate?
The conditional variance ($S^2_{Y \cdot X}$) is equal to the mean square for error. Therefore
$$S_{Y \cdot X} = \sqrt{1163.020} = 34.1$$
14. Which of the following hypotheses
$H_0: \rho = 0$
$H_A: \rho > 0$
is supported by the results? Use α = .05 and justify your answer with appropriate
statistical evidence (3 points).
Although the p value is for the t statistic calculated from the slope (i.e., $t = b / S_b$), we can
use it because the t statistic calculated from the slope equals the t statistic calculated from
the correlation:
$$\frac{b}{S_b} = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}.$$
$\text{prob} > |t| = .0568$
We need an upper-tailed p value because $H_A: \rho > 0$.
The sign of r is positive, since its sign is the same as the sign of b.
Since the sign of r and the hypothesized sign of the population correlation coefficient $\rho$
are the same, we compute the upper-tailed p value by
$$\frac{1}{2}\left(\text{prob} > |t|\right) = \frac{1}{2}(.0568) = .0284$$
Since $p < \alpha$, we can reject the null hypothesis; the alternative hypothesis is supported.
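The conversion from a two-tailed to a one-tailed p value can be sketched as below. The function name `one_tailed_p` and its parameters are hypothetical. The branch for a statistic whose sign disagrees with the alternative hypothesis is not needed in this problem; it is included only for completeness.

```python
def one_tailed_p(two_tailed_p, sign_matches_alternative):
    """Convert a two-tailed p value for a t statistic to a one-tailed p value.

    If the sample statistic has the sign named in the directional
    alternative, the one-tailed p is half the two-tailed p; otherwise
    it is one minus that half.
    """
    half = two_tailed_p / 2
    return half if sign_matches_alternative else 1 - half

alpha = 0.05
p_upper = one_tailed_p(0.0568, sign_matches_alternative=True)
print(round(p_upper, 4), p_upper < alpha)
```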
We can also do this problem by using the calculated t and critical t. Although the
calculated t (2.01) is for the slope (i.e., $t = b / S_b$), we can apply it to the
correlation hypothesis because the t statistic calculated from the slope equals the t
statistic calculated from the correlation:
$$\frac{b}{S_b} = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}.$$
From the printout, calculated t equals 2.01. The critical t is $t_{\alpha, n-2} = t_{.05, 24-2} = 1.7171$.
Since calculated t exceeds critical t, we reject the null hypothesis.
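The equality of the two t statistics can be verified numerically. The printout data for question 14 are not reproduced in this document, so the sketch below uses the SAT and GPA values from question 4 purely as an illustration; the variable names are assumptions.

```python
import math

# SAT (X) and GPA (Y) values from question 4, used here only as example data.
sat = [1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1550, 1600]
gpa = [2.50, 2.40, 2.50, 2.80, 3.10, 2.60, 2.70, 3.40, 3.60, 2.70]

n = len(sat)
mean_x, mean_y = sum(sat) / n, sum(gpa) / n
s_xx = sum((x - mean_x) ** 2 for x in sat)
s_yy = sum((y - mean_y) ** 2 for y in gpa)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sat, gpa))

# t statistic calculated from the slope: b / S_b.
b = s_xy / s_xx
cond_var = (s_yy - b * s_xy) / (n - 2)
s_b = math.sqrt(cond_var / s_xx)
t_slope = b / s_b

# t statistic calculated from the correlation: r * sqrt(n - 2) / sqrt(1 - r^2).
r = s_xy / math.sqrt(s_xx * s_yy)
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(math.isclose(t_slope, t_corr))
```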
The situation described in the introduction to 15 and 16 is depicted in the following:
[Figure 1. Hypothetical scatterplot that would have occurred had all applicants been
admitted. Vertical axis: GPA; horizontal axis: GRE, with the values 950, 1200, and 1520 marked; regions labeled Rejected and Admitted.]
15. a
The un-standardized slope is not systematically affected by direct selection, so (a) is
correct.
16. b
Simple explanation
The scatter plot for the selected sample is rounder than the scatter plot for the entire
sample, and therefore the correlation is smaller for the selected sample. Because the
standardized slope is equal to the correlation coefficient, the standardized slope must also
be smaller for the selected sample.
Complex explanation
We know that
$$r^2 = 1 - \frac{(n - 2) S^2_{Y \cdot X}}{(n - 1) S^2_Y}.$$
$S^2_{Y \cdot X}$ is not systematically affected by direct selection, but $S^2_Y$ is reduced by direct
selection. Therefore $r^2$ must be reduced by direct selection and $b_Z = r$ must be smaller
in the selected sample. Therefore (b) is correct.
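The argument can be illustrated by plugging numbers into the formula for $r^2$. All of the values below (sample sizes, conditional variance, and the two variances of Y) are hypothetical and chosen only to show the direction of the effect; the function name `r_squared` is likewise an assumption.

```python
def r_squared(n, cond_var, var_y):
    """r^2 = 1 - (n - 2) * S^2_{Y.X} / ((n - 1) * S^2_Y)."""
    return 1 - ((n - 2) * cond_var) / ((n - 1) * var_y)

# Hypothetical values: the conditional variance is the same in both samples,
# but direct selection shrinks the variance of Y.
full_sample = r_squared(n=100, cond_var=0.20, var_y=0.50)
selected_sample = r_squared(n=60, cond_var=0.20, var_y=0.30)

print(selected_sample < full_sample)
```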
17. b
Option (b) is correct because power is lower when the Type I error rate is smaller. In
regard to the other options: power increases when (a) the sample size increases and (b)
when a correct directional alternative hypothesis is used. Power is not affected by the
choice between the t test on the regression slope and the t test on the correlation
coefficient because both t statistics are equal.
18. d
The plot shows a fan-out shape in the residuals, so the assumption of equal conditional variances is
violated. In regard to the other options: the most obvious feature in the plot is the
fanning out of the residuals, so we should not conclude that the residuals are non-normal.
A residual plot cannot tell us whether independence is violated, so (b) is not correct.
There is no relationship between the residuals and the independent variable, so (c) is not
correct.
19. a
This follows from the fact that measurement error in either variable attenuates the
correlation coefficient.
20. a
The width of confidence intervals declines as sample size increases.
21. d
Only the correlation coefficient is a scale-free statistic. The size of each of the other
statistics changes when the scale of measurement for the variables changes.
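The scale-free property can be checked numerically. The answer options for question 21 are not reproduced here, so this sketch only contrasts the correlation with one scale-dependent statistic, the slope of the regression in the opposite direction (SAT as predictor), using the SAT and GPA values from question 4 as example data. The helper name `slope_and_r` is hypothetical.

```python
import math

def slope_and_r(xs, ys):
    """Return the least-squares slope of Y on X and the correlation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xy / s_xx, s_xy / math.sqrt(s_xx * s_yy)

sat = [1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1550, 1600]
gpa = [2.50, 2.40, 2.50, 2.80, 3.10, 2.60, 2.70, 3.40, 3.60, 2.70]

b_raw, r_raw = slope_and_r(sat, gpa)

# Re-express SAT in hundreds of points: the slope is multiplied by 100,
# while the correlation is unchanged.
sat_hundreds = [x / 100 for x in sat]
b_scaled, r_scaled = slope_and_r(sat_hundreds, gpa)

print(math.isclose(r_raw, r_scaled), math.isclose(b_scaled, 100 * b_raw))
```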
22. c
By definition, the odds ratio tells us the factor by which the odds are multiplied when the
independent variable is changed by one unit (in a descriptive sense).