Answers to Homework set #4

Ch. 21 #11
a) One-tailed, because I would like to show that the population proportion is too low, rather than either too low or too high.
b) This (a Type I, or alpha, error) is the probability of rejecting a true null. The null hypothesis is that the proportion is .27 and the alternative is that it is lower (smaller) than .27. So this would be the probability of deciding, based on (unusual/misleading) data, that the proportion of minorities is too low (i.e. below .27) when it truly is .27.
c) This (a Type II, or beta, error) is the probability of accepting a false null. So this would be the probability of deciding that .27 (or 27%) are minorities when something else is true, like perhaps only 22% are minorities, or 18%, or ... there are many possible beta errors, one for each true proportion.
d) Power is 1-beta, so it is doing the right thing (the opposite of a beta error). It is the probability of rejecting a false null. So it is the probability of stating that less than .27 (or 27%) are minorities when some smaller proportion (in the population) is the truth.
e) When the p-value is less than alpha we reject the null hypothesis, so if alpha goes from .01 to .05 it is easier to reject the null hypothesis. If you can reject it more often you are more likely to reject a false null hypothesis, which is good, so power must increase.
f) Power is a good thing and more data gives more of it, so using only 37 rather than 87 employees should result in a loss of power for the test.
g) With n=37, p-hat=.19 and 95% confidence (using 2 as the critical value):
.19 +- 2*sqrt((.19*.81)/37) = .19 +- .13 = (.06, .32)
To test, use a z-score of (.19-.27)/sqrt((.27*.73)/37) = -1.09
Using the A-51 normal tables we get a p-value of .1379. Since this is larger than alpha=.05 we fail to reject the null hypothesis; the data are consistent with p=.27.
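If you want to check parts e) through g) by computer, here is a minimal Python sketch. It uses the normal approximation throughout. The function names are mine, and the true proportion of .19 used for the power comparison is only an assumed value for illustration (power depends on whichever true proportion you pick). Carrying full precision gives a z-score of about -1.10 and a lower-tail p-value of about .137, in line with the table lookup above.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def proportion_interval(p_hat, n, crit=2.0):
    """Interval p_hat +- crit * SE, with SE computed from p_hat."""
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - crit * se, p_hat + crit * se


def lower_tail_z_test(p_hat, p0, n):
    """z-score and lower-tail p-value; SE is computed from the null value p0."""
    se0 = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    return z, STD_NORMAL.cdf(z)


def lower_tail_power(p0, p_true, n, alpha):
    """Approximate chance of rejecting p = p0 when the truth is p_true < p0."""
    # We reject when the sample proportion falls below this cutoff.
    cutoff = p0 + STD_NORMAL.inv_cdf(alpha) * sqrt(p0 * (1 - p0) / n)
    se_true = sqrt(p_true * (1 - p_true) / n)
    return STD_NORMAL.cdf((cutoff - p_true) / se_true)


low, high = proportion_interval(0.19, 37)
z, p_value = lower_tail_z_test(0.19, 0.27, 37)
print(f"interval: ({low:.2f}, {high:.2f})")
print(f"z = {z:.2f}, p-value = {p_value:.3f}")

# Assumed true proportion of .19, for illustration only.
for n, alpha in [(37, 0.01), (37, 0.05), (87, 0.05)]:
    power = lower_tail_power(0.27, 0.19, n, alpha)
    print(f"n={n}, alpha={alpha}: power is about {power:.2f}")
```

The loop shows the two effects described in e) and f): power goes up when alpha goes from .01 to .05, and it goes up again when n goes from 37 to 87.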
Ch. 23 #9 I said to change to 99% but 98% is there, so I will do both.
a) The original distribution should be normal (check by histogram), and independence requires n/N<.10.
b) 98.28 +- t(df=51)*(.68/sqrt(52)) = 98.28 +- (2.403 or 2.678)*.094 = 98.28 +- (.226 or .252), or approximately (98.05, 98.51) for 98% and (98.03, 98.53) for 99%.
c) We are 98% (or 99%) sure that the mean body temperature for all folks is within that interval.
d) If we picked 52 people again and again, say 100 times, and made an interval each time, then we would expect about 98 (or 99) of those intervals to have the true mean inside them.
e) A test of the null that the mean is 98.6 against the alternative that it is not 98.6 would produce a t-score of (98.28-98.6)/.094 = -3.4. Look for df=52-1=51 (closest is 50) and find a number near 3.4 in A-53. 3.4 is larger than 2.678, so we know the p-value (looking up to the top of A-53 under two tail) is less than .01. We reject the idea that the mean is 98.6 since the p-value is below .01, which is below .05 (the alpha I suggested). So this data makes us conclude that the normal temperature is not 98.6.
#11
a) The less sure we need to be, the narrower we can make the interval. 90% is less sure, so I could build a narrower interval; it would be 98.28 +- 1.676*.094 = (98.1, 98.4).
b) The more sure interval makes it more likely that we have the true (all people’s) mean body temperature identified, but at the sacrifice of having to allow more ‘wiggle room’ about where that value really lies.
c) More data means more information, meaning we can be more sure and/or narrow our interval. If we make a new 99 (or 98)% interval from 500 people it will be narrower than the last one based on only 52 people.
d) Using the approximate formula I derived in class, (2s/MOE)*(2s/MOE), we get ((2*.68)/.1)*((2*.68)/.1)=184.96, which you should round up to 185 people. De Veaux uses a more exact formula to get 252.
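The body temperature arithmetic above can be checked with a short Python sketch. The critical values 2.403, 2.678 and 1.676 are the table values quoted in the answers (df=50 row), passed in by hand rather than computed, and the function names are mine. Because the code keeps the standard error at full precision instead of rounding to .094, the t-score comes out near -3.39 rather than exactly -3.4.

```python
from math import ceil, sqrt


def t_interval(mean, sd, n, t_crit):
    """Interval mean +- t_crit * sd / sqrt(n); t_crit is read from a t-table."""
    moe = t_crit * sd / sqrt(n)
    return mean - moe, mean + moe


def t_score(mean, mu0, sd, n):
    """One-sample t-score for testing a hypothesized mean mu0."""
    return (mean - mu0) / (sd / sqrt(n))


def rough_sample_size(sd, moe, crit=2.0):
    """Class approximation (crit * s / MOE)^2, rounded up to a whole person."""
    return ceil((crit * sd / moe) ** 2)


# Table values quoted in the answers (df=50 row of the t-table).
for label, t_crit in [("90%", 1.676), ("98%", 2.403), ("99%", 2.678)]:
    low, high = t_interval(98.28, 0.68, 52, t_crit)
    print(f"{label}: ({low:.2f}, {high:.2f})")

print(f"t-score against 98.6: {t_score(98.28, 98.6, 0.68, 52):.2f}")
print(f"people needed for MOE of .1: {rough_sample_size(0.68, 0.1)}")
```

Printing the three intervals side by side also shows the point of #11 a) and b): the 90% interval is the narrowest and the 99% interval is the widest.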
Ch. 24 #7 From the authors’ website (via our class website or directly at
http://media.pearsoncmg.com/aw/aw_deveaux_introstats_1/data/data_index.html) I downloaded the cereal data and used DDXL, highlighting the 2 columns and choosing ‘confidence intervals’ and then 2-sample, with 95% confidence.
The sample mean for the children’s is 46.85 and for the adult’s is 10.367, a difference of 36.483. The stdev of that difference is gotten by using sqrt(((7.67*7.67)/27)+((6.6*6.6)/18)) = sqrt(2.18+2.42) = 2.14 (note the 7.67 and 6.6 were gotten by using chart/histogram in DDXL). The df=40 are shown by DDXL and come from the formula in the footnote on p. 454. So t with df=40 and confidence of 95% is 2.021 (check this in A-53). So
36.483 +- 2.021*2.14 = (32.15, 40.82)
Although not shown, the histogram for the adult cereal looks somewhat triangular rather than bell-shaped and may pose a concern about the truth of the assumptions.
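Here is a small Python sketch for checking the standard error, the df and the interval. I am assuming the footnote formula on p. 454 is the usual unequal-variance (Welch-Satterthwaite) approximation, which does give about 40 with these numbers. The 2.021 is the table value quoted above, and the function name is mine.

```python
from math import sqrt


def welch_se_and_df(s1, n1, s2, n2):
    """Standard error of a difference in means and the approximate df.

    The df formula is the Welch-Satterthwaite approximation (assumed here
    to be the one the textbook footnote refers to).
    """
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    se = sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, df


se, df = welch_se_and_df(7.67, 27, 6.6, 18)
diff = 46.85 - 10.367   # children's mean minus adult mean
t_crit = 2.021          # table value for df=40, 95% confidence

print(f"SE = {se:.2f}, df = {df:.1f}")
print(f"95% interval: ({diff - t_crit * se:.2f}, {diff + t_crit * se:.2f})")
```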
To test the null hypothesis that there is no difference in the mean sugar content of children’s and adult cereals versus the alternative that the mean for the children’s is larger (greater, higher), we need a t-score = (36.483-0)/2.14 = 17.05. Looking at A-53 with df=40 we see that the closest number is 2.704, and 17.05 is far larger, so the p-value is less than .01. With the p-value below .01 and .01<.05=alpha, we definitely reject the idea that the mean sugar contents are the same and accept/decide that the children’s mean content is higher than the adult’s.
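The test can be checked the same way. This Python sketch recomputes the t-score from the summary numbers and compares it with the 2.704 table value quoted above. The function name is mine. Since the standard error is not rounded to 2.14 first, the t-score comes out near 17.0 instead of 17.05, which makes no difference to the decision.

```python
from math import sqrt


def two_sample_t(mean1, s1, n1, mean2, s2, n2, null_diff=0.0):
    """t-score for a difference in means, without pooling the variances."""
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (mean1 - mean2 - null_diff) / se


t = two_sample_t(46.85, 7.67, 27, 10.367, 6.6, 18)
T_TABLE_VALUE = 2.704  # df=40 entry from A-53 quoted in the answer

print(f"t = {t:.2f}")
print("reject the null" if t > T_TABLE_VALUE else "fail to reject the null")
```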