Homework #5 --- Solutions

Section 6.1

1. 6.2 Ans: 540 ± 1.96(80/√10) = (490.42, 589.58)

6.2 Extra “?”: Under what conditions do you think it is reasonable to assume you know the standard deviation? Can you construct a 100% confidence interval? If “Yes”, how would you do it?
Ans: It is reasonable to assume that you know σ when it is determined from a population (or process) that is both very familiar to you AND has exhibited very stable characteristics over a relatively “long” period of time. A 100% C.I. would, in effect, require an infinitely large sample size, virtually no variability, or a margin of error allowed to be infinitely “large”, and therefore a 100% C.I. is not possible.

2. 6.6 (a) 0.9186 kg (b) (59.99, 63.59) kg. Yes, the upper limit of the interval is quite a bit less than 65.

Extra “?”:
a) What is the standard deviation of X? Ans: By hand or with SPSS, x̄ = 61.79 and s = 4.8. Is it the same as σ? Why or why not? Ans: No, the sample standard deviation will vary with every sample.
b) How sure are you that the average wt. < 65 kg? Ans: 95% confident, because the interval is a 95% C.I.

3. Go to http://www.whfreeman.com/ips4e, select the ‘Confidence Intervals’ applet.

a) Change the Confidence Level (C) (in the left-hand column) to 99% and then click Sample.
Where is your sample mean relative to the true mean, μ? Ans: It’s as likely to be above as below it. Most of the time it will be close, and only rarely will it be far away.
Are they the same value? How likely are they to be the same value, i.e., what is P(X̄ = μ)? Ans: No. P(X̄ = μ) = 0, since the probability that a continuous random variable equals any single constant is always zero.
How often will X̄ be smaller than the true mean, i.e., what is P(X̄ < μ)? Ans: P(X̄ < μ) = P((X̄ − μ)/(σ/√n) < (μ − μ)/(σ/√n)) = P(z < 0) = 0.5.
How often will X̄ be larger than the true mean, i.e., what is P(X̄ > μ)? Ans: P(X̄ > μ) = P((X̄ − μ)/(σ/√n) > (μ − μ)/(σ/√n)) = P(z > 0) = 0.5.

b) Now change the Confidence Level to 95%, then 90% and finally 80%.
What is happening to x̄? Why?
Ans: The sample mean does not change, because we are using the same set of data to compute x̄.
What is happening to the width of the interval? Why? Ans: The interval gets shorter (narrower) as the confidence level goes down. This is because the z-score changes.
What part of the margin of error is changing: the z-score, the standard deviation of the (sample) data or the sample size? Ans: The z-score changes with the different confidence levels. The subscript, α/2, on the z says the area above the (1−α)100% interval is α/2.

c) Now click Sample nineteen more times (so you’ll have 20 total intervals).
What’s the difference in the intervals: the widths or the centers? Why? Ans: The centers change because we draw a different sample each time. We did not change the confidence level, so the length of the intervals stays the same.
What causes the interval to include the true mean, μ, or not? Ans: We can consider the interval as a randomly selected interval, and we have a (1−α)100% chance that the interval will cover the true mean IN THE LONG RUN. Remember! To have a % you MUST have multiple samples!
What proportion (or percent) of these 20 intervals actually covered (included) the true mean, μ? Ans: In my set of 20 intervals, it is 85% (confidence level = 0.80). The answers will vary.
What is the probability that any one of your intervals contains μ? Is it always the same? Ans: Once you’ve gotten a sample, there is NO probability left. The interval calculated from the sample either contains μ or it does not (0 or 1 rather than a %), so no, it’s not always the same.

d) Repeat this 20 times and record how many actually covered μ. (Click the Reset button at the bottom.)
18 16 17 15 16 16 17 16 16 15 14 19 17 17 17 15 17 16 16 17 (confidence level = 0.8)
Why are these numbers not all the same? Ans: The confidence level is the long-run proportion of intervals that contain the true mean, but that does not mean we will get exactly the same number of covering intervals each time.
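The run-to-run variation recorded in part d) can be reproduced with a short simulation. This is only a sketch and not the applet's own code: the function name count_covering, the population N(0, 1), the sample size n = 25 and the seed are all assumptions chosen for illustration, and σ is treated as known so that every interval has the same width.

```python
import random
from statistics import NormalDist, mean

def count_covering(n_intervals=20, level=0.80, mu=0.0, sigma=1.0, n=25, rng=random):
    """Count how many of n_intervals z-intervals (sigma known) contain mu.

    All default values are illustrative assumptions, not the applet's settings.
    """
    # Critical value z* = z_{alpha/2}, with alpha = 1 - level.
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Margin of error; identical for every interval because sigma and n are fixed.
    margin = z_star * sigma / n ** 0.5
    hits = 0
    for _ in range(n_intervals):
        # A new sample gives a new center x-bar; only the center moves.
        xbar = mean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits

rng = random.Random(0)  # arbitrary seed so the run is repeatable
counts = [count_covering(rng=rng) for _ in range(20)]
print(counts)        # 20 counts, each out of 20 intervals
print(mean(counts))  # should land near 0.80 * 20 = 16, but not exactly on it
```

Changing the seed gives a different list of counts, which is the same effect as clicking Reset and sampling again.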
We will get approximately (1−α)100% of the intervals to cover μ, but this will vary from sample (of 20 intervals) to sample.
If you looked at the distribution of these numbers, how would you describe its shape, center and spread? Ans: The distribution of these numbers is approximately Normal (see the histogram), with a true mean of 16 (80% of 20). The sample mean is 16.35 and the sample standard deviation is 1.1367 for this data.

Ok, now we know that our estimate, e.g., x̄, is never exactly equal to the parameter we want, e.g., μ, so we use interval estimates, called confidence intervals, instead. The basic formula for a confidence interval is: estimate ± z*·sd(estimate).

e) Explain each of the following, including its function in the interval and how it affects the interval width.
i. estimate
ii. z* = z_{α/2}
iii. sd(estimate)

(x̄ is normally distributed (when the Central Limit Theorem holds) and therefore continuous, so P(x̄ = any #) = 0!) We can use z* when our data is normally distributed AND we know the true population variance, σ², OR when our sample size is large enough (at least 30, but more is better) so that the CLT holds. The large sample size also lets us use s, the sample standard deviation, in place of σ. Using t* in place of z* is more conservative (wider intervals), so statistical software generally uses t*.

a. estimate: our best guess for the parameter, so we make it the center of the interval. Changing the value of our estimate (getting a different x̄) just shifts the interval. It doesn’t change the width at all.
b. z* = z_{α/2}: the number from the distribution (table) that gives us the desired confidence level. The more confidence we want, the larger this value and thus the wider the interval.
c. sd(estimate): the standard deviation (or standard error when estimated from the data) of our estimate. If our data is highly variable (large σ), then we are more uncertain of our estimate and so the interval is wider.
If we take more data (larger n), we are more sure of our estimate, so we get a narrower interval. Uncertainty is reflected in the width of the interval: the wider the interval, the less sure we are; the narrower the interval, the more sure we are about the accuracy of our estimate.

4. We usually refer to confidence intervals as “(1−α)·100% confidence intervals”, so C = 1 − α.
What role does α play in a confidence interval? Ans: The confidence level, 1 − α, provides information on how much confidence we can have in the method used to construct the interval estimate. The probability is in which sample we will get, and so which x̄ we will have. Once we have a sample there is no longer any probability: either we got a good x̄ (close to μ) or we didn’t.
How does increasing (decreasing) α affect the interval? Ans: Decreasing α raises the confidence level, and the higher the confidence level, the wider the interval, due to the larger z_{α/2}. Increasing α does the opposite.
Interpret the confidence level, (1−α). Think about what you saw in 1c). Ans: In the long run we can expect 100(1−α)% of the confidence intervals to contain the true value of the population parameter. For any set of intervals, we expect about 100α% will NOT contain the true parameter.

5. 6.15 As an example:
40, 42, 42, 39, 37, 41, 42, 44, 42, 41, 45, 35, 37, 37, 35, 39, 40, 46, 38, 41, 39, 44, 37, 44, 42, 41, 35, 43, 46, 43

Frequency    Stem &  Leaf
    3.00        3 .  555
    4.00        3 .  7777
    4.00        3 .  8999
    6.00        4 .  001111
    7.00        4 .  2222233
    4.00        4 .  4445
    2.00        4 .  66
Stem width:  10.00
Each leaf:   1 case(s)

The mean is 40.6, very close to 40, the expected number of ‘hits’ out of 50 intervals at 80% confidence. If we continued to sample, we would see the average number of hits getting closer and closer to 40.

Extra “?”: What happens to a confidence interval when you reduce the confidence level from 95% to 80%? You do NOT have to do 6.14 to answer this question!
Ans: The interval becomes narrower.

6. 6.16 Extra “?”: What else affects the sample size in addition to the desired margin of error, m?
Ans: Referring to the sample size formula on page 425, we see that z* and σ, the “desired confidence level” and the variability (in the population measure), respectively, also affect the size of the sample.
Solve n = (z*σ/m)² = (1.96·8000/500)² = (31.36)² = 983.45. Use 984.
Alternatively: Solve 1.96(8000/√n) = 500, which is the formula for the margin of error. Result: n = 983.45, so use 984, since we want a whole number and choose to “round up”.

Section 8.1

7. 8.1 Using p̂, the sample proportion, p̂ = 11/40 = 0.275, with SE = 0.0706. The 95% interval is 0.275 ± 1.96(0.0706) = 0.1366 to 0.4134.
Using the Wilson estimate, p̃ = 13/44 = 0.2955, with SE = √(0.2955(1 − 0.2955)/44) = 0.0688. The 95% interval is 0.2955 ± 1.96(0.0688) = 0.1607 to 0.4303.

8. 8.4 (a) Using the Wilson estimate, p̃ = 17/88 = 0.1932, SE = √((0.1932)(1 − 0.1932)/88) = 0.0421. Using p̂, the sample proportion, p̂ = 15/84 = 0.1786, with a SE of 0.0418.
(b) Using estimate ± 1.96·SE, our 95% interval is 0.1932 ± 1.96(0.0421) = (0.1107, 0.2757) using the Wilson estimate, and 0.1786 ± 1.96(0.0418) = (0.0967, 0.2605) using just p̂.

Extra “?”: What would happen to the confidence interval if you required your confidence interval to have 98% confidence? No computations required.
Ans: The size (width) of the interval would become larger, since we are asking for greater confidence that we have actually “captured” the true value of p.

9. 8.10 Using p̂, the sample proportion, p̂ = 41/216 = 0.1898, with SE = √(0.1898(1 − 0.1898)/216) = 0.02668. The 95% interval is (0.1898 − 1.96(0.02668), 0.1898 + 1.96(0.02668)) = (0.1375, 0.2421).
Using the Wilson estimate, p̃ = (X + 2)/(n + 4) = 43/220 = 0.1955, with SE = √(0.1955(1 − 0.1955)/220) = 0.02674. The 95% interval is 0.1955 ± 1.96(0.02674) = (0.1431, 0.2479).

10. 8.16 n + 4 = (z*/m)² p*(1 − p*) = (1.96/0.03)² (0.44)(0.56) = 1051.7, so n = 1047.7, but we MUST ALWAYS ROUND UP, so we would need 1048 in our sample.

Extra “?”: What would happen to the confidence interval if the margin of error was allowed to increase?
Ans: The size (width) of the interval would increase and we would need fewer observations in our sample.
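
The sample-size and proportion-interval arithmetic in 6.16, 8.1 and 8.16 can be checked with a few lines of Python. This is a minimal sketch: the function names are made up for illustration, z* = 1.96 is assumed for 95% confidence as in the solutions, and the plus_four option applies the Wilson estimate by adding 2 successes and 2 failures.

```python
import math

Z95 = 1.96  # z* for 95% confidence, the value used in the solutions

def sample_size_for_mean(sigma, m, z=Z95):
    """Smallest whole n with z*sigma/sqrt(n) <= m (always rounds up)."""
    return math.ceil((z * sigma / m) ** 2)

def sample_size_for_proportion(p_star, m, z=Z95):
    """Smallest whole n satisfying n + 4 = (z/m)^2 * p*(1 - p*), rounded up."""
    return math.ceil((z / m) ** 2 * p_star * (1 - p_star) - 4)

def proportion_interval(successes, n, z=Z95, plus_four=False):
    """Interval estimate +/- z*SE for a proportion.

    plus_four=True uses the Wilson estimate (X + 2)/(n + 4).
    """
    if plus_four:
        successes, n = successes + 2, n + 4
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

print(sample_size_for_mean(8000, 500))              # 6.16: 984
print(sample_size_for_proportion(0.44, 0.03))       # 8.16: 1048
print(proportion_interval(11, 40))                  # 8.1, using the sample proportion
print(proportion_interval(11, 40, plus_four=True))  # 8.1, using the Wilson estimate
```

Passing a larger z (for example, a 98% critical value) to proportion_interval widens the interval, and passing a larger m to either sample-size function lowers the required n, matching the answers to the Extra “?” questions.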