Homework #5 --- Solutions
Section 6.1
1. 6.2 Ans: 540 ± 1.96(80/√10) = (490.42, 589.58)
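The arithmetic in 6.2 can be checked with a short Python sketch. The function name z_interval is ours (hypothetical, not from the textbook), and the inputs (sample mean 540, known standard deviation 80, n = 10, z* = 1.96) are read off the answer line above.

```python
import math

def z_interval(xbar, sigma, n, z_star=1.96):
    """Return (lower, upper) for xbar +/- z* * sigma / sqrt(n).

    Assumes the population standard deviation sigma is known.
    """
    margin = z_star * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

low, high = z_interval(540, 80, 10)
print(round(low, 2), round(high, 2))
```

Rounded to two decimals, the endpoints agree with the interval given in the answer.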
6.2 Extra “?”: Under what conditions do you think it is reasonable to assume you know the standard deviation? Can
you construct a 100% confidence interval? If “Yes”, how would you do it?
Ans: It is reasonable to assume that you know σ when it is determined from a population (or process) which is both
very familiar to you AND has exhibited very stable characteristics over a relatively “long” period of time.
A 100% C. I. would, in effect, require an infinitely large sample size, virtually no variability, or a margin of error
allowed to be infinitely “large”, and therefore a 100% C. I. is not possible.
2. 6.6 (a) 0.9186 kg
(b) (59.99, 63.59 kg) Yes, the upper limit in the interval is quite a bit less than 65.
Extra “?”: a) What is the standard deviation of X? Ans: by hand or with SPSS, x̄ = 61.79 and s = 4.8. Is it the same
as σ? Why or why not? No, the sample standard deviation will vary with every sample.
b) How sure are you that the average wt. < 65kg? Ans: 95% confident because the interval is a 95% C.I.
$3. Go to http://www.whfreeman.com/ips4e, select the ‘Confidence Intervals’ applet.
a) Change the Confidence Level (C) (in the left-hand column) to 99% and then click Sample.
Where is your sample mean relative to the true mean, μ?
It’s as likely to be above μ as below it. Most of the time it will be close, and only rarely will it be far away.
Are they the same value? How likely are they to be the same value, i.e., what is P(X̄ = μ)?
No. P(X̄ = μ) = 0, since the probability that a continuous variable such as z equals any one constant is always zero.
How often will X̄ be smaller than the true mean, i.e., what is P(X̄ < μ)?
P(X̄ < μ) = P((X̄ − μ)/(σ/√n) < (μ − μ)/(σ/√n)) = P(z < 0) = 0.5.
How often will X̄ be larger than the true mean, i.e., what is P(X̄ > μ)?
P(X̄ > μ) = P((X̄ − μ)/(σ/√n) > (μ − μ)/(σ/√n)) = P(z > 0) = 0.5.
b) Now change the Confidence Level to 95%, then 90% and finally 80%.
What is happening to x̄? Why?
The sample mean did not change because we use the same set of data, so x̄ stays the same.
What is happening to the width of the interval? Why?
The interval gets shorter (narrower) as the confidence level goes down. This is because the z-score gets smaller.
What part of the margin of error is changing: the z-score, the standard deviation of the (sample) data or the sample
size?
The z-score changes with the different confidence levels. The subscript, α/2, on the z says the area above the
100(1−α)% interval is α/2.
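To see how much the z-score moves with the confidence level, here is a small sketch using Python's standard-library NormalDist. The helper name z_star is ours (hypothetical); it returns the value with area α/2 above it, as described above.

```python
from statistics import NormalDist

def z_star(confidence):
    """Critical value z_{alpha/2}: the standard Normal point with area alpha/2 above it."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

# The four confidence levels tried in the applet exercise
for c in (0.99, 0.95, 0.90, 0.80):
    print(f"{c:.0%}: z* = {z_star(c):.3f}")
```

The printed values shrink as the confidence level drops (roughly 2.576, 1.960, 1.645, 1.282), which is why the intervals narrow while the center stays put.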
c) Now click Sample nineteen more times (so you’ll have 20 total intervals).
What’s the difference in the intervals: the widths or the centers? Why?
The centers changed because we drew different samples each time but did not change the confidence level so the
length of the intervals stays the same.
What causes the interval to include the true mean, μ, or not?
We can consider the interval as a randomly selected interval, and we have a 100(1−α)% chance that the interval will cover
the true mean IN THE LONG RUN. Remember! To have a % you MUST have multiple samples!
What proportion (or percent) of these 20 intervals actually covered (included) the true mean, μ?
In my sample of 20 intervals, it is 85% (confidence level = 0.80). The answers will vary.
What is the probability that any one of your intervals contains μ? Is it always the same?
Once you’ve gotten a sample, there is NO probability left: the interval calculated from the sample either contains μ or
not (0 or 1 rather than a %), so no, it’s not the same always.
d) Repeat this 20 times and record how many actually covered μ. (click the Reset button at the bottom)
18 16 17 15 16 16 17 16 16 15 14 19 17 17 17 15 17 16 16 17 (confidence level = 0.8)
Why are these numbers not all the same?
They are not all the same because the confidence level is the long-run proportion of intervals that contain the true
mean; it does not mean we will get the same count every time. We will get approximately 100(1−α)% of the
intervals to cover μ, but this will vary from sample (of 20 intervals) to sample.
If you looked at the distribution of these numbers, how would you describe it: shape, center, spread?
The distribution of these numbers is approximately Normal (see the histogram), with a true mean of 16 (80% of 20).
The sample mean is 16.35 and the sample standard deviation is 1.1367 for this data.
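The summary figures for the 20 recorded counts can be reproduced with the standard library. This is only a check of the numbers listed in part d); the variable names are ours.

```python
from statistics import mean, stdev

# Number of intervals (out of 20) that covered the true mean, one entry per repetition
counts = [18, 16, 17, 15, 16, 16, 17, 16, 16, 15,
          14, 19, 17, 17, 17, 15, 17, 16, 16, 17]

expected = 0.80 * 20           # 80% of 20 intervals
print(mean(counts))            # sample mean of the counts
print(round(stdev(counts), 4)) # sample standard deviation (n - 1 divisor)
print(expected)
```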
Ok, now we know that our estimate, e.g., x̄, is never exactly equal to the parameter we want, e.g., μ, so we use
interval estimates, called confidence intervals, instead. The basic formula for a confidence interval is: estimate ±
z* × sd(estimate).
e) Explain each of the following including its function in the interval and how it affects the interval width.
i. estimate
ii. z* = z_{α/2}
iii. sd(estimate)
(x̄ is normally distributed (when the Central Limit Theorem holds) and therefore continuous, so P(x̄ = any #) = 0!)
We can use z* when our data is normally distributed AND we know the true population variance, σ², OR when our
sample size is large enough (at least 30, but more is better) so that the CLT holds. The large sample size also lets us
use s, the sample standard deviation, in place of σ. Using t* in place of z* is more conservative (wider intervals), so
statistical software generally uses t*.
a. estimate: our best guess for the parameter, so we make it the center of the interval. Changing the value of our
estimate (getting a different x̄) just shifts the interval. It doesn’t change the width at all.
b. z* = z_{α/2}: the number from the distribution (table) that gives us the desired confidence level. The more confidence
we want, the larger this value and thus the wider the interval.
c. sd(estimate): the standard deviation (or standard error when estimated from the data) of our estimate. If our data
is highly variable (large σ), then we are more uncertain of our estimate and so the interval is wider. If we collect
more data (larger n), we are more sure of our estimate so we get a narrower interval.
Uncertainty is conveyed by the width of the interval: the wider the interval the less sure we are; the narrower the
interval the more sure we are about the accuracy of our estimate.
$4. We usually refer to confidence intervals as “(1−α)*100% confidence intervals”, so C = 1 − α.
What role does α play in a confidence interval?
The confidence level, 1 − α, provides information on how much confidence we can have in the method used to
construct the interval estimate. The probability is in which sample we will get, and so which x̄ we will have. Once
we have a sample there is no longer any probability: either we got a good (close to μ) x̄ or we didn’t.
How does increasing (decreasing) α affect the interval?
Increasing α lowers the confidence level, 1 − α, so z_{α/2} gets smaller and the interval narrower; decreasing α does the
reverse. The higher the confidence level, the wider the interval, due to the larger z_{α/2}.
Interpret the confidence level, (1−α). Think about what you saw in 1c).
In the long run we can expect 100(1−α)% of the confidence intervals to contain the true value of the population
parameter.
For any set of intervals, we expect about 100α% will NOT contain the true parameter.
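The long-run interpretation can be illustrated with a small simulation, a rough stand-in for the applet rather than its actual code. Everything here is an assumption chosen for illustration: a Normal population with mean 0 and standard deviation 1, samples of size 25, z* = 1.282 for 80% confidence, and a fixed seed so the run is repeatable.

```python
import math
import random

def coverage(mu, sigma, n, z_star, trials, seed=0):
    """Fraction of simulated z-intervals (sigma known) that contain the true mean mu."""
    rng = random.Random(seed)
    margin = z_star * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample and compute its mean
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # The interval either covers mu or it does not
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

print(coverage(mu=0.0, sigma=1.0, n=25, z_star=1.282, trials=10_000))
```

Each individual interval is a hit or a miss; only the proportion over many intervals settles near 1 − α (here about 0.80), with about α of them missing.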
$5. 6.15 As an example: 40, 42, 42, 39, 37, 41, 42, 44, 42, 41, 45, 35, 37, 37, 35,
39, 40, 46, 38, 41, 39, 44, 37, 44, 42, 41, 35, 43, 46, 43
Frequency    Stem &  Leaf
     3.00        3 .  555
     4.00        3 .  7777
     4.00        3 .  8999
     6.00        4 .  001111
     7.00        4 .  2222233
     4.00        4 .  4445
     2.00        4 .  66
Stem width:  10.00
Each leaf:   1 case(s)
The mean is 40.6, very close to 40, the expected number of ‘hits’ out of 50
80% confidence intervals. If we continued to sample, we would see the
mean average of hits getting closer and closer to 40.
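The stem-and-leaf display and the mean can be rebuilt from the 30 example values listed for 6.15. The sketch below assumes the SPSS-style layout with five lines per stem (leaves 0-1, 2-3, 4-5, 6-7, 8-9); the function name split_stem_and_leaf is ours.

```python
from collections import defaultdict

hits = [40, 42, 42, 39, 37, 41, 42, 44, 42, 41, 45, 35, 37, 37, 35,
        39, 40, 46, 38, 41, 39, 44, 37, 44, 42, 41, 35, 43, 46, 43]

def split_stem_and_leaf(values):
    """Group leaves in pairs (0-1, 2-3, 4-5, 6-7, 8-9) under each tens digit."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[(stem, leaf // 2)].append(str(leaf))
    return [(stem, "".join(leaves)) for (stem, _), leaves in sorted(rows.items())]

for stem, leaves in split_stem_and_leaf(hits):
    # frequency, stem, leaves
    print(f"{len(leaves):2d}  {stem} . {leaves}")

print(round(sum(hits) / len(hits), 1))  # mean number of hits
```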
Extra “?”: What happens to a confidence interval when you reduce the confidence level from 95% to 80%? You do
NOT have to do 6.14 to answer this question! Ans: The interval becomes narrower.
$6. 6.16 Extra “?”: What else affects the sample size in addition to the desired margin of error, m?
Ans: Referring to the sample size formula on page 425, we see that z* and σ, “desired confidence level” and
variability (in the population measure), respectively, also affect the size of the sample.
Solve n = (z*σ/m)² = (1.96 × 8000/500)² = (31.36)² = 983.45. Use 984.
Alternatively: Solve 1.96(8000/√n) = 500, which is the formula for the margin of error. Result: n = 983.45, so use
984, since we want a whole number and choose to “round up”.
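The same sample-size calculation as a sketch in Python, using the values from this problem (z* = 1.96, σ = 8000, m = 500). The function name is ours (hypothetical).

```python
import math

def sample_size_for_mean(sigma, margin, z_star=1.96):
    """Return (unrounded n, n rounded up) from n = (z* * sigma / m)^2."""
    raw = (z_star * sigma / margin) ** 2
    return raw, math.ceil(raw)

raw, n = sample_size_for_mean(sigma=8000, margin=500)
print(round(raw, 2), n)
```

Rounding up rather than to the nearest integer guarantees the margin of error is no larger than m.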
Section 8.1
7. 8.1 Using p̂, the sample proportion: p̂ = 11/40 = 0.275, with SE = 0.0706. The 95% interval is 0.275 ± 1.96(0.0706)
= 0.1366 to 0.4134. Using the Wilson estimate, p̃ = 13/44 = 0.2955, with SE = √(0.2955(1 − 0.2955)/44) = 0.0688.
The 95% interval is 0.2955 ± 1.96(0.0688) = 0.1607 to 0.4303.
$8. 8.4 (a) Using the Wilson estimate, p̃ = 17/88 = 0.1932, SE = √((0.1932)(1 − 0.1932)/88) = 0.0421. Using p̂, the
sample proportion, p̂ = 15/84 = 0.1786, with a SE of 0.0418. (b) Using this, our 95% interval is
0.1932 ± 1.96(0.0421) = (0.1107, 0.2757) using the Wilson estimate, and 0.1786 ± 1.96(0.0418) = (0.0967, 0.2605)
using just p̂.
Extra “?”: What would happen to the confidence interval if you required your confidence interval to have 98%
confidence? No computations required. Ans: The size (width) of the interval would become larger, since we are
asking for greater confidence that we have actually “captured” the true value of p.
9. 8.10 Using p̂, the sample proportion, p̂ = 41/216 = 0.1898, with SE = √(0.1898(1 − 0.1898)/216) = 0.02668. The
95% interval is (0.1898 − 1.96(0.02668), 0.1898 + 1.96(0.02668)) = (0.1375, 0.2421). Using the Wilson estimate, p̃ =
(X+2)/(n+4) = 43/220 = 0.1955, with SE = √(0.1955(1 − 0.1955)/220) = 0.02674. The 95% interval is
0.1955 ± 1.96(0.02674) = (0.1431, 0.2479).
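Both versions of the proportion interval follow the same recipe, so one function can cover them. This is a sketch with names of our own choosing; the plus_four flag applies the Wilson adjustment (add 2 successes and 4 observations), and the example uses the 41 out of 216 from 8.10.

```python
import math

def proportion_interval(successes, n, z_star=1.96, plus_four=False):
    """Return (estimate, SE, (lower, upper)) for a large-sample proportion interval.

    With plus_four=True the Wilson estimate (X + 2)/(n + 4) is used instead of X/n.
    """
    if plus_four:
        successes, n = successes + 2, n + 4
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se, (p - z_star * se, p + z_star * se)

for label, flag in (("sample proportion", False), ("Wilson estimate", True)):
    p, se, (low, high) = proportion_interval(41, 216, plus_four=flag)
    print(f"{label}: {p:.4f}, SE = {se:.5f}, 95% interval = ({low:.4f}, {high:.4f})")
```

Note that the margin is z* times the SE, not the SE alone; it is easy to drop the 1.96 when working by hand.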
10. 8.16 n + 4 = (z*/m)² p*(1 − p*) = (1.96/0.03)² (0.44)(0.56) = 1051.7 ⇒ n = 1047.7, but we MUST ALWAYS
ROUND UP, so we would need 1048 in our sample.
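A sketch of this sample-size calculation in Python, using the guessed proportion p* = 0.44 and margin m = 0.03 from the solution. The function name and the plus_four flag are ours; the flag subtracts the 4 extra observations that the Wilson estimate adds.

```python
import math

def sample_size_for_proportion(p_guess, margin, z_star=1.96, plus_four=True):
    """Smallest whole n satisfying n (+ 4) = (z*/m)^2 * p*(1 - p*), always rounded up."""
    raw = (z_star / margin) ** 2 * p_guess * (1 - p_guess)
    if plus_four:
        raw -= 4  # the Wilson estimate works with n + 4 observations
    return math.ceil(raw)

print(sample_size_for_proportion(0.44, 0.03))
```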
Extra “?”: What would happen to the confidence interval if the margin of error was allowed to increase?
Ans: The size (width) of the interval would increase and we would need fewer observations in our sample.