STAT301 Solutions 4
(1) A 95% confidence interval for the mean is [10, 18]. How much do I need to increase the
sample size by such that the margin of error is one?
Suppose this confidence interval was constructed using a sample of size
n_old. We know that the margin of error is 4 (half the width of the interval), and that the margin of error
satisfies 4 = 1.96 × σ/√n_old. Note that we do not need to know the standard deviation
σ. We want to know how much we need to increase the sample size
so that the margin of error is one. This means the new sample size
n_new should satisfy 1 = 1.96 × σ/√n_new. To see how n_old and n_new are related
when the margin of error goes from 4 to 1, we divide the two equations:
4/1 = (1.96 × σ/√n_old)/(1.96 × σ/√n_new). This gives 4 = √(n_new/n_old). Therefore,
16 = n_new/n_old, and n_new = 16 n_old. In other words, we need to increase the
sample size by a factor of 16 in order to decrease the margin of error from
4 to one.
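The scaling argument above can be checked numerically. The following is a minimal sketch, not part of the original solutions: the function name `sample_size_factor` is hypothetical, and it assumes σ and the confidence level are the same for both samples, so that the margin of error is proportional to 1/√n.

```python
def sample_size_factor(old_margin: float, new_margin: float) -> float:
    """Factor by which n must grow to shrink the margin of error.

    Assumes the margin of error is z * sigma / sqrt(n) with z and sigma
    unchanged, so n_new / n_old = (old_margin / new_margin) ** 2.
    """
    return (old_margin / new_margin) ** 2


# The interval [10, 18] has margin of error (18 - 10) / 2 = 4.
old_margin = (18 - 10) / 2
factor = sample_size_factor(old_margin, 1.0)
print(factor)  # 16.0
```

Because the factor depends only on the ratio of the two margins, neither σ nor the original sample size is needed.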
(2) What is wrong with these statements?
(i) A researcher tests the following hypothesis H0 : x̄ = 23.
It makes no sense to state the sample mean in the hypothesis, as this
does not give us any information about the population. The hypotheses
should be about the population. When we do the test we only observe
the sample, and see whether the data is consistent with the hypothesis.
We will never know for sure whether the hypothesis is true or not.
(ii) A company wants to test that the mean mileage of their cars is greater than 40
miles per gallon. They state their null hypothesis as H0 : µ > 40.
The null hypothesis should always have an equal sign in it, i.e. H0 : µ =
40. The reason for this is slightly technical. We need to evaluate the
chance of the data given the null is true. It is not possible to make
such a calculation for µ > 40: which mean would we choose to make the
calculation, 40.1, 42, etc.? This is why we always state the null with an
equal sign. The claim the company wants to support, µ > 40, belongs in the alternative.
(iii) The z-statistic is equal to −1.6. Because −1.6 is less than 5%, the null hypothesis
is rejected at the 5% level.
You can never compare a z-statistic with a probability. They are different quantities. The z-statistic allows you to obtain the probability. So in this
case (taking the alternative to point left), you need to look up −1.6 in the tables, which gives 5.5%, and compare
5.5% with the significance level. Since the p-value of 5.5% is greater than
5% we cannot reject the null. Alternatively, look up the z-value
corresponding to 5%, which is −1.64. Since −1.6 is closer to zero
than −1.64, we cannot reject the null. In summary, z-statistics can
only be compared with z-values and p-values with probabilities.
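The two valid comparisons in part (iii) can be illustrated with Python's standard library. This is a sketch under the assumption that the test is left-tailed (the statement does not say so explicitly); the variable names are illustrative only.

```python
from statistics import NormalDist

z_stat = -1.6   # observed z-statistic from the statement
alpha = 0.05    # significance level

std_normal = NormalDist()  # mean 0, standard deviation 1

# Route 1: turn the z-statistic into a probability (left-tail p-value).
p_value = std_normal.cdf(z_stat)

# Route 2: turn the significance level into a critical z-value.
z_crit = std_normal.inv_cdf(alpha)

reject_by_p = p_value < alpha   # probability compared with probability
reject_by_z = z_stat < z_crit   # z-value compared with z-value

print(round(p_value, 4), round(z_crit, 3), reject_by_p, reject_by_z)
```

Both routes give the same decision, which is the point of the answer: the comparison is only meaningful when both sides are on the same scale.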
(3) Select the appropriate null and alternative hypothesis in each of the following cases:
(i) The Battalion recently changed the format of their opinion page, you want to see
what the students think of this change. You take a random sample of students
and select those who regularly read the Battalion. They are asked to indicate
their opinions on the changes using a 5 point scale. -2 if the new format is much
worse than the old, -1 if the new format is worse than the old, 0 if the new format
is the same as the old, 1 if the new format is better than the old and 2 if the new
format is much better than the old.
Here we are trying to investigate whether there is a difference in opinion. It could go either way: people in general may think
the opinion page got worse, in which case µ < 0, or that it got better, in which case µ > 0.
Thus we are testing H0 : µ = 0 (people are indifferent to the changes in
format) against HA : µ ≠ 0.
(ii) The average square footage of one bedroom apartments in a new student housing
development is advertised to be 880 square feet. A student group thinks that
the apartments are smaller than advertised. They hire an engineer to measure a
sample of apartments to test their suspicion.
Here we are interested in investigating whether the developer has ‘cheated’
us, i.e. the size is smaller than claimed. This means testing H0 : µ = 880
against HA : µ < 880.
(4) A student claims that his scantron sheet was incorrectly graded by the scantron machine. Instead of getting 4 out of 15, he claims that his score should be 11 out of 15.
The probability that the scantron machine reads one grade wrong is less than 1 in 10^5.
Based on this information, the professor calculates the probability that Scantron reads
7 grades incorrectly as less than 10^−35. In other words, the probability of scantron
grading the sheet incorrectly is less than 10^−35. You want to use this information to
see if there is any evidence to suggest that the student’s claims are incorrect.
State the null and alternative for this test and give the p-value (explaining exactly
what this p-value means for someone who has never studied statistics). What are the
conclusions of the test?
Here we want to see whether there is evidence that scantron was
functioning correctly and the student changed the grade. Based on this our
null hypothesis is that the student did not change the grade (and scantron
made the mistake) and the alternative hypothesis is that he did change
the grade, so scantron did not make the mistake (remember innocent
until proven guilty). We articulate this as H0 : the student did not change his
scantron sheet (thus the scantron machine made mistakes) against HA : the
student did change his scantron sheet (thus scantron did not make mistakes
in grading).
We then evaluate the chance of observing a grade change from 4 out of 15
to 11 out of 15, given that the null is true. This is the same as saying that
scantron made a mistake, and marked only 4 of his questions correct rather than
11. This probability is the p-value and is given to us in the question: it is less than
10^−35. As this probability is so small, smaller than any significance level (we
usually use 5%), there is sufficient evidence to reject the null and decide
that the student changed his scantron. However, for every 10^35 innocent
students, about one student will have their scantron sheet graded incorrectly in this
way.
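The bound quoted in the question can be reproduced with exact arithmetic. This is a sketch only: it assumes the 7 misreads are independent, so that their probabilities multiply, which the question implies but does not state.

```python
from fractions import Fraction

# Upper bound on the chance of misreading a single answer (from the question).
p_single = Fraction(1, 10**5)

# Number of answers that would have to be misread: 11 claimed minus 4 recorded.
misread = 11 - 4

# ASSUMPTION: misreads are independent, so the probabilities multiply.
p_bound = p_single ** misread

alpha = Fraction(5, 100)  # the usual 5% significance level
print(p_bound == Fraction(1, 10**35), p_bound < alpha)  # True True
```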
(5) A test statistic of the null H0 : µ = µ0 gives the z-transform z = 2.
(a) What is the p-value if the alternative is HA : µ > µ0? Since the alternative
is pointing right we look at the area to the right of 2, which is 2.28%. The
p-value is 2.28%.
(b) What is the p-value if the alternative is HA : µ < µ0? Since the alternative
is pointing left we look at the area to the left of 2, which is 97.72%. The
p-value is 97.72%. The p-value is large because the data point in the opposite direction to
the alternative. There is no evidence to reject the null.
(c) What is the p-value if the alternative is HA : µ ≠ µ0?
Since the alternative is two-sided, we calculate the smaller area and
double it. This gives the p-value = 2.28 × 2 = 4.56%. Observe that it is usually
harder to reject the null for a 2-sided test than a one-sided test. I
use the word usually because for this to be true we require that the sample
mean is on the side of the alternative for a one-sided test.
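The three p-values for z = 2 can be computed directly from the standard normal CDF. This is a minimal sketch using Python's standard library; values computed this way may differ slightly from rounded table look-ups.

```python
from statistics import NormalDist

z = 2.0
cdf = NormalDist().cdf(z)  # area to the left of z

p_right = 1 - cdf                        # HA: mu > mu0
p_left = cdf                             # HA: mu < mu0
p_two_sided = 2 * min(p_left, p_right)   # HA: mu != mu0

for label, p in [("HA: mu > mu0", p_right),
                 ("HA: mu < mu0", p_left),
                 ("HA: mu != mu0", p_two_sided)]:
    print(f"{label}: p-value = {p:.4f}")
```

The one-sided p-values always sum to one, which is why a statistic that favours one alternative gives no evidence for the opposite one.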
(6) Next to this homework you should see the Fuel consumption data. The spreadsheet
gives the number of miles per Gallon on 20 separate occasions for one car. Load the
Fuel consumption data into Statcrunch (there are 20 observations in this data set,
remember to use the whitespace option when loading the data). Obtain the summary
statistics by going to Stat, Summary Stat, selecting Column and putting miles in the
column. By pressing calculate you should get all of the summary statistics, such as
sample mean, standard deviation, median etc. Using this information answer the
following questions.
(a) What is the standard error of the sample mean x̄?
Mean = 43.17, standard error = 4.41/√20 ≈ 0.98, Median = 43.4.
(b) Examine the data for skewness and other signs of non-normality by using the
relative frequency histogram and boxplot options in Statcrunch, and by comparing
the sample mean and median (found in the summary option). Give your plots
and numerical summaries.
Based on these plots and summaries do you think the sample mean based on 20
observations will be almost normally distributed?
From Figure 1 we observe that the mean and median are quite
close and there is some suggestion of bi-modality but no real skew.
We cannot say that the mileage is very close to normal, but there
aren’t any features to suggest it deviates hugely from normality. By
the central limit theorem, the sample mean becomes more normal as
the sample size grows. Based on this it is safe to say the sample mean
based on 20 observations will be close to normal.
[Figure 1]
(c) Assuming that the standard deviation is actually known use the normal distribution to construct a 95% confidence interval for the mean.
The confidence interval is [43.17 ± 1.96 × 0.98] = [41.2, 45.1]
(d) Using that the standard deviation has actually been estimated from the data
use the t-distribution with 19 degrees of freedom (since the sample size is 20) to
construct a 95% CI for the mean.
Looking up the tables we see that the t-value corresponding to
2.5% at 19 degrees of freedom is 2.093. Thus the confidence interval is
[43.17 ± 2.093 × 0.98] = [41.1, 45.2].
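The intervals in parts (c) and (d) can be reproduced with a few lines of Python. This sketch takes the sample mean, standard error and critical values exactly as quoted in the solutions; the helper name `confidence_interval` is hypothetical.

```python
def confidence_interval(mean, std_error, critical_value):
    """Return (lower, upper) for mean +/- critical_value * std_error."""
    margin = critical_value * std_error
    return mean - margin, mean + margin


mean, std_error = 43.17, 0.98  # summary values quoted in part (a)

z_interval = confidence_interval(mean, std_error, 1.96)   # sigma treated as known
t_interval = confidence_interval(mean, std_error, 2.093)  # sigma estimated, 19 df

print([round(x, 1) for x in z_interval])  # [41.2, 45.1]
print([round(x, 1) for x in t_interval])  # [41.1, 45.2]
```

Only the critical value changes between the two calls, so the t-interval is wider by the ratio 2.093/1.96.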
(e) We note that in part (c) the confidence interval was constructed using that the
population standard deviation is known, and in part (d) the confidence interval
was constructed using that the standard deviation was estimated from the data.
The interval in part (d) is longer than the interval in part (c).
Explain why the interval in (d) is longer (don’t just say that 2.093 is bigger than
1.96).
The confidence interval using the t-distribution is slightly longer because it reflects the extra variability and error caused by estimating the standard deviation rather than using the true standard deviation.
This is reflected in the t-transform taking more extreme values, and
thus the 2.5% critical value for the t-distribution is larger than the 2.5% critical value for
the normal distribution.
(f) Using the confidence interval in part (d), does the data suggest that the mean
consumption is different from 42 miles per Gallon?
Since 42 lies in the interval [41.1, 45.2], the data do not suggest that the mean
mileage is different from 42; we cannot say whether the mean mileage is 42 or not.
(g) Using what we have learnt in class, statistically test the hypothesis H0 : µ = 42
against the alternative HA : µ ≠ 42 (use the fact that the standard deviation
was estimated from the data, i.e. use the t-distribution rather than the normal
distribution).
This means calculating the chance of observing a sample mean at least as far from 42 as 43.17
given the mean is 42. Remember this is based on the reliability of the
estimator X̄, which is measured in terms of the standard error = 0.98.
We make a t-transform t = (43.17 − 42)/0.98 = 1.19. If we look up the tables we
see that the t-value corresponding to 2.5% (at 19 degrees of freedom) is 2.093. Since 1.19 < 2.093,
the area to the right of 1.19 is greater than 2.5%. Thus the p-value is
greater than 5% and we cannot reject the null at the 5% level. If you use Statcrunch you will get the p-value 2 × 0.12
= 24%.
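The p-value reported by Statcrunch can be approximated without a statistics package. The sketch below integrates the t density with Simpson's rule using only the standard library; the helper names `t_pdf` and `t_upper_tail` are hypothetical, and the numerical integration is a stand-in for whatever method Statcrunch actually uses.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t), from Simpson's rule on [0, |t|] plus symmetry about zero."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3   # P(0 < T < |t|)
    tail = 0.5 - central      # P(T > |t|)
    return tail if t >= 0 else 1 - tail


mean, mu0, std_error, df = 43.17, 42.0, 0.98, 19
t_stat = (mean - mu0) / std_error
p_two_sided = 2 * t_upper_tail(abs(t_stat), df)

print(round(t_stat, 2), round(p_two_sided, 2))
```

The result should land close to the 24% quoted above; small differences come from rounding the one-sided area to 0.12 before doubling.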
(h) How are the conclusions in parts (f) and (g) related?
Since 42 lies in the 95% confidence interval, the p-value for the two-sided test of µ = 42 is greater than
5%. This agrees with the p-value that we calculated in part (g).
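The link between parts (f) and (g) holds for any hypothesised mean, not just 42. The sketch below checks this for a few illustrative values of µ0 (chosen here, not taken from the solutions), using the mean, standard error and t critical value quoted earlier; both function names are hypothetical.

```python
def ci_contains(mu0, mean, std_error, t_crit):
    """True if mu0 lies inside mean +/- t_crit * std_error."""
    margin = t_crit * std_error
    return mean - margin <= mu0 <= mean + margin


def fails_to_reject(mu0, mean, std_error, t_crit):
    """True if the two-sided t-test of H0: mu = mu0 does not reject."""
    t_stat = (mean - mu0) / std_error
    return abs(t_stat) <= t_crit


mean, std_error, t_crit = 43.17, 0.98, 2.093  # values from parts (a) and (d)

for mu0 in (41.0, 42.0, 45.0, 46.0):  # illustrative hypothesised means
    inside = ci_contains(mu0, mean, std_error, t_crit)
    kept = fails_to_reject(mu0, mean, std_error, t_crit)
    print(mu0, inside, kept)
```

The two columns always agree because both conditions rearrange to |x̄ − µ0| ≤ 2.093 × 0.98.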
(i) You want to test the hypothesis that the mileage per gallon is greater than your
previous car, which did 41 miles per Gallon. By doing the appropriate test at
the 5% significance level is there any evidence to suggest that the current car has
better mileage than your old car?
Clearly state your null and alternative.
H0 : µ = 41 against HA : µ > 41. Again we calculate the t-transform
under the null. This is t = (43.17 − 41)/0.98 = 2.21. The alternative is pointing
to the right, thus we need to calculate the area to the right of 2.21. As
the t-value at 2.5% is 2.093, and 2.093 < 2.21, the p-value is less than
2.5%. Thus we can reject the null at the 5% level.
(j) You want to test the hypothesis that the mileage per gallon is greater than your friend’s
car, which does 46 miles per Gallon. By doing the appropriate test at the 5%
significance level is there any evidence to suggest that the current car has better
mileage than your friend’s car?
Clearly state your null and alternative.
H0 : µ = 46 against HA : µ > 46. With an average of x̄ = 43.17 it is
easy to see there is no evidence to support the alternative. If we do
the calculation we see that t = (43.17 − 46)/0.98 = −2.89. The alternative is
pointing to the right, thus we need to calculate the area to the right
of −2.89, which is clearly greater than 50%. Thus there is no evidence to
reject the null at any reasonable level.
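The one-sided tests in parts (i) and (j) can also be carried out by comparing the t-statistic with a one-sided critical value. This is a sketch under one assumption not found in the solutions: it uses 1.729, the standard table value for 5% in the upper tail with 19 degrees of freedom, rather than the 2.093 quoted above. The function name is hypothetical.

```python
def right_tailed_t_test(mean, mu0, std_error, t_crit):
    """Return (t_stat, reject) for H0: mu = mu0 against HA: mu > mu0."""
    t_stat = (mean - mu0) / std_error
    return t_stat, t_stat > t_crit


mean, std_error = 43.17, 0.98

# ASSUMPTION: standard table value for upper-tail 5%, 19 df (not in the solutions).
T_CRIT_5PCT_DF19 = 1.729

t_old, reject_old = right_tailed_t_test(mean, 41.0, std_error, T_CRIT_5PCT_DF19)
t_friend, reject_friend = right_tailed_t_test(mean, 46.0, std_error, T_CRIT_5PCT_DF19)

print(round(t_old, 2), reject_old)        # 2.21 True
print(round(t_friend, 2), reject_friend)  # -2.89 False
```

A negative t-statistic can never exceed a positive critical value, which matches the observation that a sample mean below 46 gives no support to HA : µ > 46.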