NAME (Please Print):
HONOR PLEDGE (Please Sign):

Statistics 101 Homework 9

You are allowed to discuss problems with other students, but the final answers must be your own work. For all problems that require calculation, YOU MUST ATTACH SEPARATE PAGES, NEATLY WRITTEN, THAT SHOW YOUR WORK. You may attach JMP-IN output. Please mark your answer in the space provided. As a general rule, each blank counts for one point. If necessary work is not shown, or if that work is substantially wrong, then you will not get credit even if the answer is correct. (The obvious purpose of this seemingly draconian policy is to prevent people from mindlessly copying each other’s answers.) Report all numerical answers to at least two correct decimal places.

DUE DATE: IN CLASS ON THURSDAY, April 19.

1. A study in the New England Journal of Medicine reported that children were less likely to be born on weekends (presumably because doctors want the day off, and induce or delay labor so as to avoid working on Saturdays and Sundays). Suppose the study found that 5% were born on Saturdays and Sundays, compared to 24% on Fridays and Mondays, with the other weekdays each having 14% of the births. The midwives at the Quadrilateral Clinic worry that their numbers might show similarly unprofessional patterns. They check their records and report that for 700 random births, 102 were on Monday, 98 on Tuesday, 101 on Wednesday, 99 on Thursday, 101 on Friday, 99 on Saturday, and 100 on Sunday, and they test whether they show the same pattern as doctors.

In words, what is your null hypothesis?
Midwives have the same pattern as doctors.

In words, what is your alternative hypothesis?
Midwives have a different pattern than doctors.

What is the value of your test statistic?
Goodness-of-fit test, test statistic = 290.46

What distribution does this follow? (Include df if appropriate.)
χ² with 6 df

What is your significance probability (or P-value)?
< 0.01

In words, what is your conclusion?
Midwives are not like doctors.

Comment on the midwives’ situation.
The data are too good to be true. They have almost equal numbers.

2. Use the data that is available at: http://lib.stat.cmu.edu/DASL/Datafiles/PopularKids.html
Read the story. If your last name begins with a letter in A-G, determine whether gender is related to the child’s goal. If your last name begins with a letter in H-S, determine whether goal is related to grade. If your last name begins with a letter in T-Z, determine whether goal is related to race.

In words, what is your null hypothesis?
The variables are not related.

In words, what is your alternative hypothesis?
The variables are related.

What is the value of your test statistic?
A-G: 21.46; H-S: 26.31; T-Z: 6.903.

What distribution does this follow? (Include df if appropriate.)
χ². A-G: df = 2; H-S: df = 4; T-Z: df = 2.

What is your significance probability (or P-value)?
A-G: < 0.0001; H-S: < 0.0001; T-Z: between 1% and 5%.

In words, what is your conclusion?
For every group, we reject the null hypothesis that the variables are not related.

3. Use the data that is available at: http://lib.stat.cmu.edu/DASL/Datafiles/agecondat.html
Read the description. If last name begins with a letter in A-G, do a linear regression that predicts the price of beef from the consumption of beef, the price of pork, the retail food price index, and disposable income. If your first name begins with a letter in H-S, do a linear regression that predicts the price of pork from the price of beef, the consumption of pork, the retail food price index, and disposable income. If your first name begins with a letter in T-Z, do a linear regression that predicts the retail food price index from the consumption of beef, the consumption of pork, the price of pork, and disposable income.

What is your regression equation?
A-G: PBE = 122.51 - 1.16 CBE + 0.55 PPO - 0.43 PFO + 0.008 DINC
H-S: PPO = 84.95 + 0.12 PBE - 0.95 CPO + 0.85 PFO - 0.49 DINC
T-Z: PFO = -117.57 + 0.21 CBE + 1.12 CPO + 1.08 PPO + 0.63 DINC

What is your prediction for a year in which the price of beef is 60 cents/lb, the price of pork is 50 cents/lb, the consumption of beef per capita is 55 pounds, the consumption of pork per capita is 58 pounds, the PFO is 850, and the disposable income per capita index is 35?

What is the significance probability for deciding whether disposable income is a useful variable in your regression?
A-G: 0.048; H-S: -4.64; T-Z: 9.51.

What distribution is used for determining the significance probability? (Include the df if appropriate.)
t with 13 df

What is the strongest correlation among two of the explanatory variables in your model? Which variables are these?
A-G: the correlation between CBE and DINC is not high enough to cause concern.
H-S: the correlation between PBE and DINC may be high enough to cause concern.
T-Z: the correlation between CBE and DINC seems high enough to cause concern.

Do you feel their correlation is sufficiently strong that there is a problem of multicollinearity?
See above.

Attach a residual plot for your analysis with respect to disposable income (1 point). Comment upon any issues that you see in it below.
With the plot, they should say whether there is a pattern or not.

Interpret the sign of your most significant coefficient. Identify the variable, indicate the significance probability associated with it, and try to explain whether it makes sense. Sometimes it is not reasonable to try to interpret the sign of the coefficient; if this is the case, please indicate that and don’t bother with an explanation.
Holding all other variables constant, increasing the explanatory variable increases (decreases) the predicted y-variable when its slope is positive (negative).

4. Do problem 7 on page 544.

In words, what is your null hypothesis?
The data are consistent with the model.

In words, what is your alternative hypothesis?
The data are not consistent with the model.

What is the value of your test statistic?
0.2

What distribution does this follow? (Include df if appropriate.)
Chi-square with df = 2

What is your significance probability (or P-value)?
90%

In words, what is your conclusion?
This is a good fit.

5. Comment on your answer for problem 8 on page 569.
y = 0.533x + 1.667

6. Give your explanation for problem 5 on page 568.
False. The data are cross-sectional, not longitudinal. An alternative explanation: death rates are higher among people who drink, smoke, and do not eat breakfast. So fewer of these people survive past 65 and get interviewed.

7. Do problem 9 on page 566.
(a) The question makes sense, and the difference in attitudes is important. (That is a practical judgement, not a statistical one.)
(b) The question makes sense, because the data are based on probability samples; but to answer it, you need to use the half-sample method.
(c) Now this is like example 3 on page 507. The SE for the 1970 percentage is 1.6%; for 1990, the SE is 1.4%. The difference is 23%, and the SE for the difference is 2.1%, so z = 23/2.1 ≈ 11 and P ≈ 0.

8. Explain your answer for problem 6 on page 569.
The difference between the 90th and 50th percentiles is bigger: the distribution has a long right-hand tail.
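
As a cross-check on the chi-square answers in problems 1 and 4, the sketch below recomputes the goodness-of-fit statistic for the midwives' counts and the upper-tail probabilities. It rests on assumptions that are not spelled out in the key: the study's percentages are read as 5% on each weekend day, 24% on each of Friday and Monday, and 14% on each of Tuesday through Thursday, and the function names are illustrative only. Under that reading the statistic comes out at about 290.5, close to the value reported in problem 1, and the P-value for problem 4 comes out near 90%.

```python
import math

# ASSUMPTION: 5% on EACH weekend day, 24% on EACH of Friday and Monday,
# 14% on each of Tuesday-Thursday, so that the shares sum to 100%.
DOCTOR_SHARE = {
    "Mon": 0.24, "Tue": 0.14, "Wed": 0.14, "Thu": 0.14,
    "Fri": 0.24, "Sat": 0.05, "Sun": 0.05,
}

# Observed counts for the 700 midwife-attended births in problem 1.
MIDWIFE_COUNTS = {
    "Mon": 102, "Tue": 98, "Wed": 101, "Thu": 99,
    "Fri": 101, "Sat": 99, "Sun": 100,
}


def chi_square_statistic(observed, shares):
    """Sum of (observed - expected)^2 / expected over all categories."""
    total = sum(observed.values())
    stat = 0.0
    for category, count in observed.items():
        expected = total * shares[category]
        stat += (count - expected) ** 2 / expected
    return stat


def chi_square_upper_tail(x, df):
    """P(chi-square with df degrees of freedom > x).

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!,
    which is exact only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


# Problem 1: seven categories, so df = 7 - 1 = 6.
stat = chi_square_statistic(MIDWIFE_COUNTS, DOCTOR_SHARE)
p_value = chi_square_upper_tail(stat, df=6)
print(f"problem 1: chi-square = {stat:.2f}, P-value = {p_value:.3g}")

# Problem 4: statistic 0.2 on 2 df, taken from the key.
p_fit = chi_square_upper_tail(0.2, df=2)
print(f"problem 4: P-value = {p_fit:.3f}")
```

The closed form is used only to keep the example free of third-party dependencies; for odd degrees of freedom a statistics library or a chi-square table would be needed instead.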