NAME (Please Print):
HONOR PLEDGE (Please Sign):
Statistics 101
Homework 9
You are allowed to discuss problems with other students, but the final answers must be
your own work.
For all problems that require calculation, YOU MUST ATTACH SEPARATE PAGES,
NEATLY WRITTEN, THAT SHOW YOUR WORK. You may attach JMP-IN output.
Please mark your answer in the space provided. As a general rule, each blank counts
for one point. If necessary work is not shown, or if that work is substantially wrong, then
you will not get credit even if the answer is correct. (The obvious purpose of this seemingly
draconian policy is to prevent people from mindlessly copying each other’s answers.)
Report all numerical answers to at least two correct decimal places.
DUE DATE: IN CLASS ON THURSDAY, April 19.
1. A study in the New England Journal of Medicine reported that children were less
likely to be born on weekends (presumably because doctors want the day off, and
induce or delay labor so as to avoid working on Saturdays and Sundays). Suppose
the study found that 5% were born on Saturdays and Sundays, compared to 24% on
Fridays and Mondays, with the other weekdays each having 14% of the births.
The midwives at the Quadrilateral Clinic worry that their numbers might show similarly unprofessional patterns. They check their records and report that for 700 random
births, 102 were on Monday, 98 on Tuesday, 101 on Wednesday, 99 on Thursday, 101
on Friday, 99 on Saturday, and 100 on Sunday, and they test whether they show the
same pattern as doctors.
In words, what is your null hypothesis?
Midwives have the same pattern as doctors
In words, what is your alternative hypothesis?
Midwives have a different pattern than doctors
What is the value of your test statistic?
Goodness-of-fit test; test statistic = 290.49
What distribution does this follow? (Include df if appropriate.)
Chi-square with df=6
What is your significance probability (or P-value)?
< 0.01
In words, what is your conclusion?
Midwives are not like doctors
Comment on the midwives’ situation.
The data are too good to be true. They have almost equal numbers.
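The goodness-of-fit statistic for this problem can be checked with a short sketch. It assumes the doctors' pattern means 5% on each weekend day and 24% on each of Monday and Friday, which is the reading under which the percentages sum to 100%. The variable names are illustrative only.

```python
# Observed births at the clinic, Monday through Sunday (from the problem).
observed = [102, 98, 101, 99, 101, 99, 100]

# Doctors' pattern, read as: 24% each on Mon and Fri, 14% each on Tue-Thu,
# 5% each on Sat and Sun (an assumption: the reading that sums to 100%).
proportions = [0.24, 0.14, 0.14, 0.14, 0.24, 0.05, 0.05]

n = sum(observed)
expected = [n * p for p in proportions]

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Seven categories with no estimated parameters gives 7 - 1 = 6 df.
df = len(observed) - 1

print(round(chi_square, 2), df)
```

Under this reading the statistic comes out at roughly 290.5 on 6 degrees of freedom, driven almost entirely by the Monday, Friday, Saturday, and Sunday terms.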
2. Use the data that is available at:
http://lib.stat.cmu.edu/DASL/Datafiles/PopularKids.html
Read the story. If your last name begins with a letter in A-G, determine whether gender
is related to the child’s goal. If your last name begins with a letter H-S, determine
whether goal is related to grade. If your last name begins with T-Z, determine whether
goal is related to race.
In words, what is your null hypothesis?
The variables are not related
In words, what is your alternative hypothesis?
The variables are related.
What is the value of your test statistic?
A-G, 21.46; H-S, 26.31; T-Z, 6.903.
What distribution does this follow? (Include df if appropriate.)
Chi-square. A-G, df=2; H-S, df=4; T-Z, df=2.
What is your significance probability (or P-value)?
A-G, < 0.0001; H-S, < 0.0001; T-Z, between 1% and 5%
In words, what is your conclusion?
For every group, we reject the null hypothesis that the variables are not related.
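The test of independence used in this problem can be sketched as follows. The function name and the counts are hypothetical: the toy table is made up for illustration and is NOT the PopularKids data. The degrees of freedom are (rows - 1) x (columns - 1), so a 2-by-3 table gives df=2 and a 3-by-3 table gives df=4.

```python
def chi_square_independence(table):
    """Return (statistic, df) for a table of counts given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables are unrelated.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Made-up counts for illustration only (NOT the PopularKids data).
toy_table = [
    [30, 20, 10],
    [20, 20, 20],
]
statistic, df = chi_square_independence(toy_table)
print(round(statistic, 2), df)
```

To use it on the real data, replace the toy table with the cross-tabulated counts from the data file.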
3. Use the data that is available at:
http://lib.stat.cmu.edu/DASL/Datafiles/agecondat.html
Read the description. If your last name begins with a letter in A-G, do a linear regression
that predicts the price of beef from the consumption of beef, the price of pork, the
retail food price index, and disposable income. If your first name begins with a letter
in H-S, do a linear regression that predicts the price of pork from the price of beef,
the consumption of pork, the retail food price index, and disposable income. If your
first name begins with a letter in T-Z, do a linear regression that predicts the retail
food price index from the consumption of beef, the consumption of pork, the price of
pork, and disposable income.
What is your regression equation?
A-G: PBE = 122.51 - 1.16 CBE + 0.55 PPO - 0.43 PFO + 0.008 DINC
H-S: PPO = 84.95 + 0.12 PBE - 0.95 CPO + 0.85 PFO - 0.49 DINC
T-Z: PFO = -117.57 + 0.21 CBE + 1.12 CPO + 1.08 PPO + 0.63 DINC
What is your prediction for a year in which the price of beef is 60 cents/lb, the
price of pork is 50 cents/lb, the consumption of beef per capita is 55 pounds, the
consumption of pork per capita is 58 pounds, the PFO is 850, and the disposable
income per capita index is 35?
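A prediction is obtained by plugging the stated values into the fitted equation. The sketch below does this for the three regression equations listed above, using their rounded coefficients, so the results are approximate. The helper name `predict` and the dictionary layout are illustrative choices, not part of the assignment.

```python
# Values given in the question.
year = {"PBE": 60, "PPO": 50, "CBE": 55, "CPO": 58, "PFO": 850, "DINC": 35}

# Rounded coefficients copied from the regression equations above:
# each entry is (intercept, {explanatory variable: slope}).
models = {
    "A-G (PBE)": (122.51, {"CBE": -1.16, "PPO": 0.55, "PFO": -0.43, "DINC": 0.008}),
    "H-S (PPO)": (84.95, {"PBE": 0.12, "CPO": -0.95, "PFO": 0.85, "DINC": -0.49}),
    "T-Z (PFO)": (-117.57, {"CBE": 0.21, "CPO": 1.12, "PPO": 1.08, "DINC": 0.63}),
}


def predict(intercept, slopes, values):
    """Evaluate intercept + sum(slope * value) for one set of inputs."""
    return intercept + sum(b * values[name] for name, b in slopes.items())


predictions = {label: predict(a, b, year) for label, (a, b) in models.items()}
for label, value in predictions.items():
    print(label, round(value, 2))
```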
What is the significance probability for deciding whether disposable income is a
useful variable in your regression?
A-G: 0.048; H-S: t = -4.64 (P < 0.001); T-Z: t = 9.51 (P < 0.001).
What distribution is used for determining the significance probability? (Include
the df if appropriate.)
t distribution with df=13
What is the strongest correlation among two of the explanatory variables in your
model? Which variables are these?
A-G: the correlation between CBE and DINC is not high enough to cause concern.
H-S: the correlation between PBE and DINC may be high enough to cause concern.
T-Z: the correlation between CBE and DINC seems high enough to cause concern.
Do you feel their correlation is sufficiently strong that there is a problem of
multicollinearity?
See above.
Attach a residual plot for your analysis with respect to disposable income (1
point). Comment upon any issues that you see in it below.
Along with the plot, students should say whether the residuals show a pattern.
Interpret the sign of your most significant coefficient. Identify the variable, indicate the significance probability associated with it, and try to explain whether
it makes sense. Sometimes it is not reasonable to try to interpret the sign of
the coefficient—if this is the case, please indicate that and don’t bother with an
explanation.
Holding all other variables constant, increasing an explanatory variable with a
positive (negative) slope increases (decreases) the predicted y-variable.
4. Do problem 7 on page 544.
In words, what is your null hypothesis?
The data are consistent with the model.
In words, what is your alternative hypothesis?
The data are not consistent with the model.
What is the value of your test statistic?
0.2
What distribution does this follow? (Include df if appropriate.)
Chi-square with df=2
What is your significance probability (or P-value)?
90%
In words, what is your conclusion?
This is a good fit.
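The P-value for this problem can be checked without a table, because the upper tail of a chi-square distribution has a closed form when df is even. The sketch below uses that closed form; the function name is hypothetical, and it deliberately refuses odd df, where the formula does not apply.

```python
import math


def chi_square_tail_even_df(x, df):
    """P(chi-square with `df` degrees of freedom > x), valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    # Upper tail = exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
    terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * terms


p_value = chi_square_tail_even_df(0.2, 2)
print(round(p_value, 4))
```

With a statistic of 0.2 and df=2 the tail is exp(-0.1), about 0.90, in line with the 90% reported above.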
5. Comment on your answer for problem 8 on page 569.
y = 0.533x + 1.667
6. Give your explanation for problem 5 on page 568.
False. The data are cross-sectional not longitudinal. An alternative explanation:
death rates are higher among people who drink, smoke, and do not eat breakfast.
So, fewer of these people survive past 65 and get interviewed.
7. Do problem 9 on page 566.
(a) The question makes sense, and the difference in attitudes is important. (That is
a practical judgement, not a statistical one.)
(b) The question makes sense, because the data are based on probability samples;
but to answer it, you need to use the half-sample method.
(c) Now this is like example 3 on page 507. The SE for the 1970 percentage is 1.6%;
for 1990, the SE is 1.4%. The difference is 23%, and the SE for the difference is
2.1%, so z = 23/2.1 ≈ 11 and P ≈ 0.
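The arithmetic in part (c) can be sketched as follows, assuming the 1970 and 1990 samples are independent, so that the SE for the difference is the square root of the sum of the squared SEs. The variable names are illustrative.

```python
import math

se_1970 = 1.6      # SE of the 1970 percentage, in percentage points
se_1990 = 1.4      # SE of the 1990 percentage, in percentage points
difference = 23.0  # observed difference, in percentage points

# SE for the difference of two independent quantities.
se_difference = math.sqrt(se_1970 ** 2 + se_1990 ** 2)

# z-statistic for the null hypothesis of no change between the two years.
z = difference / se_difference

print(round(se_difference, 1), round(z, 1))
```

A z-statistic this large leaves essentially no area in the tail of the normal curve, which is why the P-value is reported as zero.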
8. Explain your answer for problem 6 on page 569.
The difference between the 90th and 50th percentiles is bigger: the distribution has a long right-hand tail.