Practice Exam Spring 09
PubH 6415 Practice Exam I

NAME:______________________

Directions: This is an open-book, open-notes exam. You may use a calculator of your choosing, but laptop computers are not permitted. For true/false questions, circle the word corresponding to your answer. For short answer problems, please show all relevant work necessary to arrive at a solution; partial credit will be awarded. You will have from 10:10 am to 12:05 pm to complete the exam. Do not spend too much time on any one problem. Good Luck!!

1. To examine effects of trace amounts of DDT on nerve activity, researchers took a random sample of 6 male rats and fed them small amounts of DDT. They measured the time it took for electrical responses in certain leg nerves. They then took a random sample of 6 healthy male rats and measured the time it took for electrical responses in certain leg nerves, with the results given below. The measurements are given in milliseconds. Use this to answer questions A-F. (Assume all relevant distributions have approximate normal distributions.)

Rat                   DDT rat    Control rat
1                     12.2       11.1
2                     16.9        9.7
3                     25.1       12.1
4                     22.4        9.4
5                      8.5        8.2
6                     20.6        6.6
Mean                  17.62       9.52
Standard deviation     6.34       1.97

A. Write the null and alternative hypotheses for the following research question: Does giving DDT to the rats increase the mean time to electrical responses in leg nerves?

B. The correct t-test for testing the hypothesis in part A is (circle one):
i. One-sample t-test
ii. Two-sample t-test assuming equal variances
iii. Two-sample t-test not assuming equal variances
iv. Matched pairs t-test

C. Calculate the test statistic.

D. How many degrees of freedom are there to determine the shape of the t-distribution for comparison of the test statistic?

E. Calculate the range of the p-value or indicate the cutoff value you compare to the t-statistic. What is your conclusion at α = 0.05?

F. If you reject the null hypothesis at α = 0.05, what is the probability of a type I error?

2. Researchers were interested to see whether men or women receive faster service at restaurants. Sixteen people, eight women and eight men, were matched on age and race. All participants were given nice clothing. Each of the eight matched pairs (one man and one woman) was randomly assigned a restaurant. The order of who arrived first at the restaurant (man or woman) was determined by the flip of a coin. Each person ordered a similar drink and similar meal. The time in minutes until the food arrived at the table is recorded below. Use this to answer questions A-G. (Assume all relevant distributions have approximate normal distributions.)

Restaurant            Man       Woman     Difference
1                     22        25        -3
2                     14        12         2
3                     16        13         3
4                     26        21         5
5                     18        21        -3
6                     13        14        -1
7                      9         9         0
8                     27        16        11
Mean                  18.125    16.375     1.75
Standard Deviation     6.40      5.45      4.68

A. What are the null and alternative hypotheses that would answer the research question above?

B. The correct t-test for testing the hypothesis in part A is (circle one):
i. One-sample t-test
ii. Two-sample t-test assuming equal variances
iii. Two-sample t-test not assuming equal variances
iv. Matched pairs t-test

C. Calculate the test statistic.

D. How many degrees of freedom are there to determine the shape of the t-distribution for comparison of the test statistic?

E. Calculate the range of the p-value or indicate the cutoff value you compare to the t-statistic (at α = 0.05).

F. What is your conclusion (at α = 0.05)?

G. What potential error might we have made given our conclusion above?

3. Researchers are interested in studying the relationships between two of the major pollutants in vehicle exhaust: carbon monoxide (CO) and nitrogen oxides (NOX). The amount of the pollutants emitted by 46 light-duty engines of the same type is measured. The units of measurement are grams per mile. Researchers calculate the correlation between these measures. The value is -0.7192.

A. Determine whether each of the statements regarding the correlation coefficient is true or false.
i. The correlation coefficient equals the proportion of times that two variables lie on a straight line. True or False
ii. The correlation coefficient is always +1 if all the data points lie on a perfectly horizontal straight line. True or False
iii. The correlation coefficient measures the strength of any relationship that may be present between two quantitative variables. True or False
iv. The correlation coefficient does not have a unit of measure. True or False
v. The correlation coefficient lies between -1.0 and +1.0 inclusive. True or False

B. Determine whether each of the statements is true or false based on the correlation coefficient given in the problem above.
vi. An arithmetic mistake was made. Correlation must be positive. True or False
vii. There is more carbon monoxide than nitrogen oxides. True or False
viii. If the correlation between CO and NOX is -0.7192 then the correlation between NOX and CO is +0.7192. True or False
ix. Based on the correlation we can conclude that a negative linear relationship best describes the relationship between NOX and CO. True or False

4. A researcher is studying treatments for agoraphobia with panic disorder. The three treatments under study are two levels of Imipramine and placebo. Thirty patients were randomly divided into three groups of 10 each. One group was assigned to placebo and the other two groups were assigned to one of the two levels of Imipramine. After 24 weeks on treatment a measure of symptoms was evaluated (a high score indicating fewer symptoms). Assume the data from the three groups are independent and the responses are approximately normal. Below are the means and standard deviations of the test scores for the three groups.

                      Placebo    Imipramine Level 1    Imipramine Level 2
Mean                  75.7       84.1                  102.4
Standard Deviation    12.61      18.44                 20.82

The researchers did an ANOVA test of the data and obtained the following results.

Source    DF      SS        MS        F
Groups    ____    3727.8    ____      ____
Error     ____    ____      310.87
Total     ____    ____

A. Fill in the missing pieces of the ANOVA table.
B. Is the assumption of equal variances reasonable? Show your work.

C. What is the value of the pooled standard deviation?

D. What is the p-value for the test of equal population means vs. at least two differ?

E. What is your conclusion at α = 0.05?

F. If we wish to make all pairwise comparisons using the Bonferroni technique, what new cutoff must each p-value be compared with in order to keep the overall type I error rate at 0.05?

5. Twelve (6 men, 6 women) School of Public Health recent graduates are randomly selected from each of three different programs from which they matriculated: Maternal and Child Health (m), Community Health Education (c) and Public Health Administration and Policy (a). The total sample size is 36 alumni. Each alumnus is asked his or her current salary. Investigators would like to use ANOVA to investigate the relationship between starting salary, gender and program.

A. What is the correct ANOVA test to be used?
i.) Matched pairs ANOVA for gender
ii.) Three-Way ANOVA for salary
iii.) 3 X 2 One-Way ANOVA
iv.) 2 X 3 Two-Way ANOVA
v.) 2 X 1 Two-Way ANOVA

B. Fill in the missing ANOVA table below.

TWO-WAY ANOVA Program and Gender Analysis of Salary (in thousands of dollars)
The ANOVA Procedure
Dependent Variable: salary

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               5       6563.666667    1312.733333      50.15    <.0001
Error              30        785.333333      26.177778
Corrected Total    35       7349.000000

R-Square    Coeff Var    Root MSE    salary Mean
0.893137     11.58435    5.116422       44.16667

Source             DF          Anova SS    Mean Square    F Value    Pr > F
program             2       5702.166667    2851.083333     108.91    <.0001
gender              1        113.777778     113.777778       4.35    0.0457
program*gender      2        747.722222     373.861111      14.28    <.0001

C. Based on the output, does there appear to be a significant interaction between program and gender?

D. Use the plot below to describe the main effect for program.

E. Use the plot below to describe the interaction of gender and program.

6. In a statistics course, a linear regression equation was computed to predict the final exam score from the score on the midterm exam. The midterm scores ranged from 60 points to 99 points. Assume there is a reasonable linear relationship between these measures. The equation of the least-squares regression line was:

y = 10 + 0.9x

A. Suppose the standard deviation of the midterm scores was equal to the standard deviation of the final exam scores. What is the correlation coefficient between scores?

B. Suppose Joe scores a 90 on the midterm exam. What would be the predicted value of his score on the final exam?

C. Suppose the teacher wants the lowest final exam score to be around 73. What is the midterm score that would yield a prediction of 73 for the final score?

D. Sally scores 5 points higher than Megan on the midterm. How much of an increase do we predict on Sally's final exam score compared to Megan's?

7. The goal of the Multiple Risk Factor Intervention Trial was to examine whether a randomized intervention would reduce incidence of coronary heart disease over six years. After the first year of the trial, researchers examined changes in systolic blood pressure (SBP), with outcome variable Y = SBP(visit 1) minus SBP(baseline), and whether or not those blood pressure changes were related to baseline cholesterol readings. Thus a negative value is obtained when SBP improves. They fit a regression of change in SBP on baseline cholesterol:

proc reg data=new;
  model sbpchange = chol / clb cli clm r influence;
  title "Regression of change in SBP on cholesterol";
run;

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1             -11.05869           3.04102      -3.64      0.0003
chol          1               0.02391           0.01123      2.129      0.0335

This regression model can be written as:

SBPchange = β0 + β1·Chol

a. Looking at only the scatterplot above (not the regression results), describe the relation between change in SBP and cholesterol. Specifically, comment on form, strength, and direction of the relation.

b. How do we interpret the estimated b1 = 0.02391?

c. Compute a 95% confidence interval for the treatment effect (β1). (Use n = 1002.)

d. Using your interval from c. and α = 0.05, test H0: β1 = 0. State your conclusion clearly in the context of the study.

Now consider the diagnostic plots for this regression shown on the next two pages.

e. What assumption(s) of the model are we checking with the histogram and quantile plot? Do these two plots indicate that the assumption(s) is (are) satisfied?

f. In the scatterplot above, and in the plots below, there is a data point with a circle around it. From the plots below, we can see that this data point has an unusually large positive studentized residual. Looking back at the scatterplot, explain why this point has such a large positive studentized residual.

g. In the scatterplot above, and in the plots below, there is a data point with a triangle around it. From the plots below, we can see that this data point has a much higher Cook's distance value than the data point with the circle. Looking back at the scatterplot, explain why the data point with the triangle has such a large value for Cook's distance compared to the data point with the circle.
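
For checking hand calculations in problems 1 and 2, the test statistics can be computed directly from the summary statistics printed in the tables. The following is a minimal Python sketch, not part of the original exam. It assumes the unequal-variance two-sample statistic is the one chosen for problem 1 and the matched pairs statistic for problem 2; the function names `welch_t` and `paired_t` are illustrative.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic that does not assume equal variances."""
    # Standard error of the difference between two independent sample means.
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se


def paired_t(mean_diff, sd_diff, n_pairs):
    """Matched pairs t statistic computed from the within-pair differences."""
    # Standard error of the mean difference.
    se = sd_diff / math.sqrt(n_pairs)
    return mean_diff / se


# Problem 1 (assumed test choice): DDT rats vs. control rats, 6 rats each.
t1 = welch_t(17.62, 6.34, 6, 9.52, 1.97, 6)

# Problem 2 (assumed test choice): man minus woman differences, 8 pairs.
t2 = paired_t(1.75, 4.68, 8)

print(f"Problem 1 test statistic: {t1:.2f}")
print(f"Problem 2 test statistic: {t2:.2f}")
```

Degrees of freedom and p-value ranges are left to the t table, as the exam asks; the sketch only reproduces the arithmetic for the test statistics.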