1. Define the following terms (1 point each):

alternative hypothesis: One of three hypotheses indicating that the parameter is not zero; one states that the parameter is not equal to zero, one states that the parameter is larger than zero, and one states that the parameter is smaller than zero.

probability value: An area under the sampling distribution of t under the assumption that the null hypothesis is true: the area beyond calculated t for a two-tailed test; the area above calculated t for an upper-tailed test; and the area below calculated t for a lower-tailed test.

Type I error rate: The probability of rejecting the null hypothesis when it is true.

2. Using the data, construct a scatter plot with GPA as the dependent variable.

[Figure: Plot of GPA Versus SAT. Vertical axis: GPA (2.0 to 4.0); horizontal axis: SAT (1000 to 1600).]

3. For these data, would it be appropriate to report $S_{Y \cdot X}$? If not, why not?

No. Calculating $S_{Y \cdot X}$ is based on the assumption of equal conditional variances. The plot suggests this assumption is violated. There is relatively little conditional variance when SAT is low (e.g., 1000) and relatively large conditional variance when SAT is high (e.g., 1600). So $S_{Y \cdot X}$ would be misleading.

4. Using the data, calculate $\sum XY$, where X denotes SAT and Y denotes GPA.

SAT (X)    GPA (Y)    SAT x GPA (XY)
1050       2.50       2625
1100       2.40       2640
1150       2.50       2875
1200       2.80       3360
1250       3.10       3875
1300       2.60       3380
1350       2.70       3645
1400       3.40       4760
1550       3.60       5580
1600       2.70       4320

$\sum XY = 37060$

5. Calculate the slope using salary as the dependent variable (2 points).

$b = S_{XY} / S_{XX} = 588.6 / 16.025 = 36.7$

6. Calculate the standard error of estimate (2 points).

$S_{Y \cdot X} = \sqrt{(S_{YY} - b S_{XY}) / (n - 2)} = \sqrt{(35320 - 36.7 \times 588.6) / (20 - 2)} = \sqrt{761.148} = 27.6$

Note that $S^2_{Y \cdot X} = 761.148$. (This value is obtained by carrying the unrounded slope, 588.6 / 16.025, through the calculation.)

7. Set up and test hypotheses relevant to the question of whether ratings are predictive of salaries; use the F statistic approach and $\alpha = .01$ (7 points).
$H_0: \beta = 0$
$H_A: \beta \neq 0$

$F = b^2 S_{XX} / S^2_{Y \cdot X} = (36.7^2 \times 16.025) / 761.148 = 28.4$

$F_{\alpha, 1, n-2} = F_{.05, 1, 20-2} = 4.4139$

[Figure: F distribution with 1 and 18 df. The region of rejection, with area .05, lies above the critical value 4.4139.]

We reject the null hypothesis since calculated F is in the region of rejection. We conclude that ratings are predictive of salaries.

8. The intercept is calculated to be 625.83. Can a legitimate substantive interpretation of the intercept be made? Why?

No, because the range of the ratings is 1 to 5, and the intercept is the estimated conditional mean corresponding to a value of zero for the independent variable, which lies outside that range.

9. Calculate and interpret a 90-percent confidence interval for $\beta$ (2 points).

$b \pm t_{\alpha/2, n-2} S_b$

$S_b = \sqrt{S^2_{Y \cdot X} / S_{XX}} = \sqrt{761.148 / 16.025} = 6.89$

$t_{\alpha/2, n-2} = t_{.10/2, 20-2} = 1.7341$

$36.7 \pm 1.7341 \times 6.892 = (24.8, 48.7)$

We are 90% confident that the population slope is between 24.8 and 48.7. (The preceding interpretation is satisfactory.)

We are 90% confident that the mean difference in salary for secretaries who are one rating point apart is between 24.8 and 48.7. (The preceding interpretation is stated in terms of the subject matter of the problem and is also satisfactory.)

10. Calculate the residual for a secretary who had a rating of 3.1 and a salary of 750.

$e = Y - \hat{Y}$

$\hat{Y} = a + bX = 625.8 + 36.7 \times 3.1 = 739.7$ (computed with the unrounded coefficients)

$e = Y - \hat{Y} = 750 - 739.7 = 10.3$

11. The correlation between performance ratings and biweekly salaries is .78. Suppose the reliability of performance ratings is .81. What is the correlation between true performance ratings and salary? (Assume salary is measured without error.) (2 points.)

If salary is measured without error, then $r_{YY} = 1.00$ and

$r_{T_X T_Y} = r_{XY} / \sqrt{r_{XX} r_{YY}} = .78 / \sqrt{.81 \times 1.00} = .87$

12. In the sample, what happens to the average competence rating as more press conferences are viewed?

$b = 8.192$

Since b is positive, in the sample competence ratings increase as more conferences are viewed.

13. What is the numeric value of the standard error of estimate?

The conditional variance $S^2_{Y \cdot X}$ is equal to the mean square for error. Therefore

$S_{Y \cdot X} = \sqrt{1163.020} = 34.1$

14. Which of the following hypotheses

$H_0: \rho = 0$
$H_A: \rho > 0$

is supported by the results? Use $\alpha = .05$ and justify your answer with appropriate statistical evidence (3 points).

Although the p value is for the t statistic calculated from the slope (i.e., $t = b / S_b$), we can use it because the t statistic calculated from the slope equals the t statistic calculated from the correlation:

$b / S_b = r \sqrt{n - 2} / \sqrt{1 - r^2}$

Prob > |t| = .0568

We need an upper-tailed p value because $H_A: \rho > 0$. The sign of r is positive, since its sign is the same as the sign of b. Since the sign of r and the hypothesized sign of the population correlation coefficient are the same, we compute the upper-tailed p value by

$\frac{1}{2} (\text{Prob} > |t|) = \frac{1}{2} (.0568) = .0284$

Since $p < \alpha$, we can reject the null hypothesis; the alternative hypothesis is supported.

We can also do this problem by using the calculated t and critical t. Although the calculated t (2.01) is for the slope (i.e., $t = b / S_b$), we can apply it to the correlation hypothesis because the t statistic calculated from the slope equals the t statistic calculated from the correlation:

$b / S_b = r \sqrt{n - 2} / \sqrt{1 - r^2}$

From the printout, calculated t equals 2.01. The critical t is

$t_{\alpha, n-2} = t_{.05, 24-2} = 1.7171$

Since 2.01 exceeds 1.7171, we reject the null hypothesis.

The situation described in the introduction to 15 and 16 is depicted in the following:

[Figure 1. Hypothetical scatterplot that would have occurred had all applicants been admitted. Vertical axis: GPA; horizontal axis: GRE, marked at 950, 1200, and 1520; regions labeled Rejected and Admitted.]

15. a
The unstandardized slope is not systematically affected by direct selection, so (a) is correct.

16. b
Simple explanation: The scatter plot for the selected sample is rounder than the scatter plot for the entire sample, and therefore the correlation is smaller for the selected sample. Because the standardized slope is equal to the correlation coefficient, the standardized slope must also be smaller for the selected sample.
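The simple explanation for 15 and 16 can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the applicant pool, the GRE cutoff of 1200, the population slope of 0.002, and the noise level are all hypothetical values chosen for illustration, not data from the exam. It compares the unstandardized slope and the correlation in the full pool with those among applicants at or above the cutoff.

```python
import random
import statistics


def slope_and_r(xs, ys):
    """Return (unstandardized slope of y on x, correlation r).

    In simple regression the standardized slope equals r.
    """
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


def simulate(n=20000, cutoff=1200.0, seed=1):
    """Simulate direct selection on GRE; all numbers are hypothetical."""
    rng = random.Random(seed)
    # Hypothetical applicant pool: GRE roughly normal around 1200.
    gre = [rng.gauss(1200.0, 100.0) for _ in range(n)]
    # GPA is linear in GRE (slope 0.002) with equal conditional variance.
    gpa = [3.0 + 0.002 * (x - 1200.0) + rng.gauss(0.0, 0.3) for x in gre]
    full = slope_and_r(gre, gpa)
    # Direct selection: keep only applicants at or above the cutoff.
    admitted = [(x, y) for x, y in zip(gre, gpa) if x >= cutoff]
    selected = slope_and_r([x for x, _ in admitted], [y for _, y in admitted])
    return full, selected


if __name__ == "__main__":
    (b_full, r_full), (b_sel, r_sel) = simulate()
    print(f"all applicants: b = {b_full:.5f}, r = {r_full:.3f}")
    print(f"admitted only:  b = {b_sel:.5f}, r = {r_sel:.3f}")
```

Under these assumptions the two slopes should be close to each other, while the correlation (and hence the standardized slope) should be noticeably smaller among the admitted applicants.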
Complex explanation: We know that

$r^2 = 1 - \frac{(n - 2) S^2_{Y \cdot X}}{(n - 1) S^2_Y}$

$S^2_{Y \cdot X}$ is not systematically affected by direct selection, but $S^2_Y$ is reduced by direct selection. Therefore $r^2$ must be reduced by direct selection, and $b_Z = r$ must be smaller in the selected sample. Therefore (b) is correct.

17. b
Option (b) is correct because power is lower when the Type I error rate is smaller. In regard to the other options: power increases (a) when the sample size increases and (b) when a correct directional alternative hypothesis is used. Power is not affected by the choice between the t test on the regression slope and the t test on the correlation coefficient because both t statistics are equal.

18. d
The residuals in the plot fan out, so the assumption of equal conditional variances is violated. In regard to the other options: The most obvious feature in the plot is the fanning out of the residuals, so we should not conclude that the residuals are non-normal. A residual plot cannot tell us whether independence is violated, so (b) is not correct. There is no relationship between the residuals and the independent variable, so (c) is not correct.

19. a
This follows from the fact that measurement error in either variable attenuates the correlation coefficient.

20. a
The width of confidence intervals declines as sample size increases.

21. d
Only the correlation coefficient is a scale-free statistic. The size of each of the other statistics changes when the scale of measurement for the variables changes.

22. c
By definition, the odds ratio tells us how much the odds are multiplied by when the independent variable is changed by one unit (in a descriptive sense).
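The statement in 22 can be checked numerically. The sketch below assumes a logistic regression model, which the answer does not name explicitly, and uses hypothetical coefficients (b0 = -1.5, b1 = 0.4) chosen only for illustration. Under that model the odds at x are exp(b0 + b1 * x), so a one-unit change in x multiplies the odds by exp(b1) regardless of the starting value of x.

```python
import math


def odds(b0, b1, x):
    """Odds of the outcome at x under a logistic model: exp(b0 + b1 * x)."""
    return math.exp(b0 + b1 * x)


# Hypothetical coefficients, not taken from the exam.
b0, b1 = -1.5, 0.4

for x in (0.0, 1.0, 2.5):
    # Ratio of the odds one unit apart; the same for every starting x.
    ratio = odds(b0, b1, x + 1.0) / odds(b0, b1, x)
    print(f"x = {x}: odds multiplied by {ratio:.4f}")

print(f"exp(b1) = {math.exp(b1):.4f}")
```

Each printed ratio should match exp(b1), which is the odds ratio for the independent variable.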