Week Two - Answers to Practice Exercises
Practice Exercises 1
1. The law of averages explains this funnel shape. The law of averages says the variability in a
sample percentage goes down as the sample size gets larger. Thus the more patients you treat
annually, the less variable the percentage treated correctly will be.
2. Graph A is averaged over a month, and graph B is averaged over a week. If you average over
a week, there should be more variability than if you average over a month. The law of averages
explains this.
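As a rough numerical illustration of the law of averages used in answers 1 and 2, the sketch below computes the standard error of a sample percentage for a few sample sizes. The function name `se_of_percentage`, the 50% figure, and the sample sizes are illustrative assumptions, not values from the exercises.

```python
import math

def se_of_percentage(pct, n):
    """Standard error of a sample percentage, in percentage points."""
    return math.sqrt(pct * (100 - pct) / n)

# Hypothetical sample sizes, each four times the previous one.
for n in (100, 400, 1600):
    print(n, se_of_percentage(50, n))
# prints:
# 100 5.0
# 400 2.5
# 1600 1.25
```

Each fourfold increase in sample size cuts the variability in half, which is why averages over more patients or longer periods look steadier.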
5. 20 people. Since less than half the people in the population have red hair, in a very large
sample it is very unlikely for you to see more than half the people with red hair.
Practice Exercises 2
8. You need a sample 16 times larger than the current sample. In order to cut the margin of error
in half, you need to quadruple the size of the sample. To go from a margin of error of 12 down to
a margin of error of 3, you first need to cut it in half to go from 12 to 6, and then again to go from 6
to 3. This means you need to quadruple the sample size, and then quadruple it again. So this
means a sample 4x4=16 times larger.
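The arithmetic in answer 8 can be checked with a short sketch. It assumes only that the margin of error is proportional to 1/square root of the sample size; the function name `sample_size_multiplier` is made up for illustration.

```python
def sample_size_multiplier(current_moe, target_moe):
    """Factor by which the sample must grow to shrink the margin of error.

    Assumes margin of error is proportional to 1 / sqrt(sample size).
    """
    return (current_moe / target_moe) ** 2

print(sample_size_multiplier(12, 6))  # 4.0  (halving the margin of error)
print(sample_size_multiplier(12, 3))  # 16.0 (halving it twice)
```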
9. Since our survey has a margin of error of 10%, they should not have written that headline.
You can't confidently say that job satisfaction really is down, because the 10% drop in satisfaction is
still within the margin of error. The drop in satisfaction from 65% to 55% is a drop of 10
percentage points, and in order to claim that this is a real drop, we need a survey with
a margin of error much less than 10%.
Practice Exercises 3
10. (a) 10 million minutes = 10,000,000/60/24/365 = about 19 years of labor working around-the-clock.
(b) The margin of error = 2 = 2×square root of (50×50/sample size), so we can solve this to get
sample size = 50 squared = 2,500. This means 25,000 minutes or, equivalently, 25,000/60/24 = 17
days of labor. This is a much more manageable project, and a much better idea.
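A minimal sketch of the calculation in answer 10 follows. The figure of 10 minutes per response is an assumption inferred from the answer's step from 2,500 responses to 25,000 minutes; the function name `required_sample_size` is hypothetical.

```python
def required_sample_size(margin_of_error, pct=50.0):
    """Solve margin of error = 2 * sqrt(pct * (100 - pct) / n) for n."""
    return 4 * pct * (100 - pct) / margin_of_error ** 2

# (a) Labor for the full count, in years of around-the-clock work.
years_full = 10_000_000 / 60 / 24 / 365

# (b) Labor for a sample with a margin of error of 2 percentage points.
n = required_sample_size(2)
minutes_per_response = 10  # assumption inferred from the answer above
days_sample = n * minutes_per_response / 60 / 24

print(round(years_full), n, round(days_sample))  # 19 2500.0 17
```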
11. The fund that invested in 50 randomly chosen stocks. With a very large sample you are
likely to be very close to the true overall average return. Since overall the stocks went up, a very
large sample is most likely to give you this.
14. a) No. This is 2 ounces above average and therefore only one standard deviation above
average -- it is not so unusually high. b) Yes. Here we are talking about the variability of a sample
average. We should look at the standard error, which in this case equals 2/square root of 100 = .2
ounces. Now, these 2 ounces above average correspond to 10 standard errors above average. This
is extremely unusually high.
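The contrast in answer 14 between a single value and a sample average can be sketched as follows. The variable names are illustrative; the numbers (2 ounces above average, a standard deviation of 2 ounces, a sample of 100) come from the answer.

```python
import math

sd = 2.0       # standard deviation of individual values, in ounces
excess = 2.0   # how far above average the observation is, in ounces
n = 100        # sample size in part (b)

# (a) One individual value: compare against the standard deviation.
z_individual = excess / sd

# (b) A sample average: compare against the standard error.
se = sd / math.sqrt(n)
z_average = excess / se

print(z_individual, round(z_average, 1))
```

The same 2 ounces is about one standard deviation for an individual but about ten standard errors for an average of 100.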
Practice Exercises 4
1. a) 10% plus or minus roughly 2%, or the range from 8% to 12%. To get a 95 percent confidence
interval you take the average and go plus or minus two standard errors. Here the standard error
for the sample percentage equals the square root of (10*90/1000) = .949%, and so two standard
errors equals 2*.949 = 1.9%, or approximately 2%. (b) 4 hours plus or minus roughly .32 hours, or the
range from 3.68 hours to 4.32 hours. Here the standard error for the sample average equals
5/square root of 1000 = .16, and so twice the standard error equals .32. (c) no -- this would only
be true if the distribution of hours watched followed a normal distribution. Since the standard
deviation is so high (and going even one standard deviation below average takes you outside of
the range of the data: 5 hours below the mean of 4 hours equals –1 hours and is obviously outside
the range of the data), it is pretty clear that the histogram for the number of hours watched does
not follow a normal distribution.
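The two intervals in answer 1 can be reproduced with a small sketch. The helper names `ci_for_percentage` and `ci_for_mean` are hypothetical, and the "plus or minus two standard errors" rule is the approximation used in the answer, not an exact 95 percent multiplier.

```python
import math

def ci_for_percentage(pct, n):
    """Approximate 95% interval for a percentage: pct +/- 2 standard errors."""
    se = math.sqrt(pct * (100 - pct) / n)
    return pct - 2 * se, pct + 2 * se

def ci_for_mean(mean, sd, n):
    """Approximate 95% interval for an average: mean +/- 2 standard errors."""
    se = sd / math.sqrt(n)
    return mean - 2 * se, mean + 2 * se

low_pct, high_pct = ci_for_percentage(10, 1000)   # part (a)
low_hrs, high_hrs = ci_for_mean(4, 5, 1000)       # part (b)

print(round(low_pct, 1), round(high_pct, 1))   # 8.1 11.9
print(round(low_hrs, 2), round(high_hrs, 2))   # 3.68 4.32
```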
3. (a) no, all we can say is that we are 95 percent confident the true average of the population is in
the range from $85,000 to $95,000. This does not mean 95 percent of the people are in this range.
(b) False (c) True (d) $25,000. This confidence interval uses a margin of error of $5000, since we
have $90,000 plus or minus $5000. This means the standard error should be equal to half of this,
or $2,500. Plugging everything into the formula for the standard error of a sample average and
solving for the standard deviation we get 2500 = SD/square root of 100, and thus SD = $25,000.
This is an estimate of the standard deviation. The standard deviation is different from the standard
error of the sample mean -- the standard deviation tells you how variable individual data values
are from the average and the standard error for the sample mean tells you how variable the
sample average is from the overall population average.
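Part (d) of answer 3 works backward from the interval to the standard deviation. A minimal sketch of that reasoning, assuming the interval was built as the average plus or minus two standard errors (the function name `sd_from_interval` is illustrative):

```python
import math

def sd_from_interval(low, high, n):
    """Recover the estimated SD from a mean +/- 2 SE confidence interval."""
    margin_of_error = (high - low) / 2
    standard_error = margin_of_error / 2
    return standard_error * math.sqrt(n)

print(sd_from_interval(85_000, 95_000, 100))  # 25000.0
```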
7. (a) the standard error here would equal $10,000/square root of 400 = $500. This means the
margin of error would equal $1000. You could say that the average income in the population is
estimated to be $30,000 -- and we are 95 percent confident it is off by less than $1000. (b) false --
a 95 percent confidence interval does not tell you the range where 95 percent of the data lies. It
tells you we are 95 percent confident the average of the population lies in that range.
Practice Exercises 5
2. a) yes. The standard error for the average weight would equal .42/square root of 36 = .07. That
means this sample is off by (14.8-15)/.07 = -2.9, or almost three standard errors below what it is
supposed to be. Since this is more than two standard errors, this is statistically significant. b) 576
boxes. Currently the standard error equals .07, and thus the margin of error is about .14. In order
to cut the margin of error down to .035 we want to cut it down by a factor of .14/.035
= 4. And in order to cut the margin of error down by a factor of 4 we need to multiply the sample
size by 4 squared, which equals 16. So this means we need a sample 16 times larger, which
means we must sample 36* 16 = 576 boxes.
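Both parts of answer 2 can be checked with a short sketch. Variable names are illustrative; the numbers are those given in the answer, and rounding is used because the intermediate values are not exact in floating point.

```python
import math

sd = 0.42        # standard deviation of box weights
n = 36           # current sample size
target = 15.0    # what the average weight is supposed to be
observed = 14.8  # sample average

# (a) How many standard errors is the sample average from the target?
se = sd / math.sqrt(n)
z = (observed - target) / se
significant = abs(z) > 2  # the "more than two standard errors" rule

# (b) Sample size needed to bring the margin of error down to .035.
margin_of_error = 2 * se
shrink_factor = margin_of_error / 0.035
new_n = n * shrink_factor ** 2

print(round(z, 1), significant, round(new_n))
```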
4. If they repeated the study and gave placebo to both the treatment and control groups,
there would be a 0.4 chance of seeing at least as much difference between the two groups
in terms of the number of people having the side effect.
16. The main issue in this problem is statistical significance versus practical significance. The
sample size is very large, so SE = SD/square root of 10000 will be small (equal to 0.5). As a
result, even a small decrease of 2 in the average will be statistically significant (z = -4). For a
process that has a SD of 50, a decrease of 2 in the average is not practically significant.
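The distinction in answer 16 can be made concrete with a brief sketch. Expressing the change as a fraction of the SD is one possible way to gauge practical size, added here as an assumption for illustration; it is not part of the original answer.

```python
import math

sd = 50.0      # standard deviation of the process
n = 10_000     # sample size
change = -2.0  # observed decrease in the average

se = sd / math.sqrt(n)       # small because n is very large
z = change / se              # statistical significance
change_in_sds = change / sd  # rough gauge of practical size (assumption)

print(se, z, change_in_sds)  # 0.5 -4.0 -0.04
```

A z of -4 is far beyond two standard errors, yet the shift is only 4 percent of one standard deviation.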