Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Lesson Notes: Shape, Center, and Spread for Sample Proportions Deriving the Formulas Letting X represent a random variable that can take on the value 1 with probability p and the value 0 with probability 1 p, you can derive the mean and standard deviation of X symbolically from the definitions of the mean and standard deviation of a probability distribution. X xi pi 0(1 p) 1(p) p ___________ X xi X 2 pi __________________________ (0 p)2 (1 p) (1 p)2 p ______________________ p2 (1 p) (1 p)2 p __________________ (p)(1 p)(p 1 p) ________ p(1 p) Because p̂ is the mean of n independent x’s, the mean________ of the sampling ______ p(1 p) __ or _____ . See also distribution of p̂ is p and the standard deviation is _______ n n E45 and E46. p(1 p) Notes for AP Teachers Modeling Good Answers Once students have completed E35 or E43, show them the model answer as an example of what is expected on the AP Exam. The answer to E35 shows standard use of a normal approximation to estimate probabilities. E43 has a discussion of the effect of sample size on the shape of the sampling distribution. A PDF file containing each question and its model answer is available at www.keypress.com/keyonline. Solutions Discussion D11. a. The distribution for n 40 is the most normal, and the distribution for n 10 is the least normal. Notice that for n 10, np 10 0.6 6 10, and n(1 p) 10 0.4 4 10. The guideline does not hold and the distribution for n 10 is quite skewed. For n 20, np 12 10, but n(1 p) 8 10. The guideline does not hold. This distribution is less skewed, but the skew is still noticeable. When n 40, both np 24 and n(1 p) 16 are more than 10. The guideline does hold and the distribution looks 86 Section 7.3 Solutions very symmetric and fairly smooth. It appears that the guideline works quite well. b. We would expect 60% of any size sample to be wearing seat belts. c. The distribution for n 10 has the largest spread, and the distribution for n 40 has the smallest spread. d. Getting a sample with 75% or more Mississippians would be more likely with a sample size of 10. The area of the bars in the histogram to the right of 0.75 gets smaller as the sample size increases because with larger sample sizes, the sample proportions tend to cluster closer to 0.6. Statistics in Action Instructor’s Guide, Volume 2 © 2008 Key Curriculum Press ______ D12. a. For n 10, p̂ 0.6, p̂ _____ n p(1 p) _____ 0.6 0.4 0.155 10 _____ ______ For n 20, p̂ 0.6, p̂ _____ n p(1 p) _____ 0.6 0.4 0.110 20 _____ ______ For n 40, p̂ 0.6, p̂ _____ n p(1 p) _____ 0.6 0.4 0.077 40 _____ b. Estimates will vary, but the answers from part a should match quite well. In particular, the sampling distributions of the sample proportion for all sample sizes have a center that is close to 0.6. For the sampling distribution of the sample proportion for samples of size 10, the spread is large, with reasonably likely sample proportions ranging from about 0.3 to 0.9. For n 20, the spread is smaller, with reasonably likely sample proportions ranging from about 0.4 to 0.8. For n 40, the spread is small, with reasonably likely sample proportions ranging from about 0.45 to 0.75. c. The mean using these formulas is 0.6000006, the difference due to rounding error. The standard deviation is 0.155, which matches the result from the previous formula. Practice P16. First check the guideline: Both np 100 0.63 63 and n(1 p) 100 0.37 37 are greater than 10, so the shape of the distribution will be approximately normal. So______________ sum np 63, and _________ sum np(1 p) 100 0.63 0.37 4.83. Find z-scores and probabilities for both 56 and 70 freshmen and subtract: 56 63 1.45, P(sum ⱕ 56) 0.074 z _______ 4.83 70 63 1.45, P(sum ⱕ 70) 0.926 z _______ 4.83 Then 0.926 0.074 0.852. P ⫽ 0.852 56 63 70 Number of Freshmen Alternatively, normalcdf56,70,63,4.83 0.853. There is about an 85% probability that a sample of 100 freshmen will contain between 56 and 70 freshmen who believe that dissent Statistics in Action Instructor’s Guide, Volume 2 © 2008 Key Curriculum Press is a critical component of the political process. P17. a. The distribution for n 100 is the most normal, and the distribution for n 10 is the least normal. Notice that for n 10, np 10 0.1 1 10, and n(1 p) 10 0.9 9 10. The guideline does not hold and the distribution for n 10 is quite skewed. For n 20, np 2 10, but n(1 p) 18 10. The guideline does not hold and this distribution is less skewed, but the skew is still very noticeable. When n 40, np 4 10 but n(1 p) 36 10. The guideline does not hold and this distribution is less skewed, but the skew is still noticeable. For n 100, both np 10 and n(1 p) 90 are at least 10. The guideline does hold and the distribution looks very symmetric and fairly smooth. The skew is visible for all but n 100, where both np and n(1 p) are at least 10. It appears that the guideline works quite well. b. For each sample size, the expected proportion of drivers is 0.10. c. The distribution for n 10 has the largest spread, and the distribution for n 100 has the smallest spread. d. Finding 20% or more drivers failing a driving test would be more likely in a sample of size 10. The area of the bars in the histogram to the right of 0.20 gets smaller as the sample size increases because with larger sample sizes, the sample proportions tend to cluster closer to 0.1. P18. a. The mean for each distribution is p 0.10. ______ _____ 0.1 0.9 For n 10, p̂ _____ _____ 0.095 n 10 p(1 p) _____ 0.1 0.9 For n 20, p̂ _____ 0.067 20 _____ 0.1 0.9 For n 40, p̂ _____ 0.047 40 _____ 0.1 0.9 For n 100, p̂ _____ 100 0.03 b. Estimates will vary, but the answers from part a should match quite well. In particular, the sampling distributions of the sample proportion for all sample sizes have a center that is close to 0.1. For the sampling distribution of the sample proportion for samples of size 10, the spread is large, with reasonably likely sample proportions ranging from about 0 to 0.3. For n 20, the spread is smaller, with reasonably likely sample proportions ranging from about 0 to 0.23. For n 40, the spread is small, with reasonably likely sample proportions ranging from about 0.01 to 0.2. For n 100, the spread is smallest, with reasonably likely sample proportions ranging from about 0.04 to 0.16. Section 7.3 Solutions 87 c. The mean using the table is 0.0999999, different from 0.1 only because of rounding error. The standard error is about 0.095, which is equal to the result in part a. P19. a. This sampling distribution can be considered approximately normal because both np 53 and n(1 p) 47 are 10 or greater. The sampling distribution of p̂ has a mean of 0.53 and a standard error of ________ p̂ p(1 p) ________ n _____________ 0.53(1 0.53) ____________ 0.05 100 fall within approximately two standard errors of the mean. This interval is _________ p 1.96 p(1 p) ________ 0.6 1.96 n ___________ 0.6(1 0.6) n __________ The intervals produced for the various sample sizes are shown below. a. n 40: 0.6 0.152 or (0.448, 0.752) b. n 100: 0.6 0.096 or (0.504, 0.696) c. n 400: 0.6 0.048 or (0.552, 0.648) Note again that the sample size must be increased fourfold to cut the width of the interval in half. Exercises 0.43 0.48 0.53 0.58 0.63 Sample Proportion (p ⫽ 0.53, n ⫽ 100) b. No. As you can see from the sampling distribution in part a, the probability of getting 9 (or 9%) or fewer women just by chance is almost 0. That there are so few women in the U.S. Senate cannot reasonably be attributed to chance alone, and so we should look for other explanations (for example, that fewer women go into politics, that women are not able to raise as much money for their campaigns, or that voters are reluctant to elect a woman senator). P20. a. The sampling distribution of p̂ is approximately normal because both np 134.85 and n(1 p) 300.15 are 10 or greater. With p 0.31 and n 435, the sampling distribution for the sample proportion has a mean of 0.31 and a standard error of 0.022. A sample proportion of 0.53 is about 10 standard deviations above the mean of the sampling distribution, and therefore it would be very unusual to see such a result. The probability is close to 0. E35. The sampling distributions for the sample proportion and for the number of successes can be considered approximately normal because both np 920 and n(1 p) 80 are 10 or greater. a. This sampling distribution is approximately normal with mean and standard error p̂ 0.92 ________ p̂ p(1 p) ________ n _____________ 0.92(1 0.92) ____________ 0.00858 1000 The sketch, with scale on the x-axis, is shown in part c. b. This sampling distribution is approximately normal with mean and standard error sum 1000(0.92) 920 _________ __________________ sum np(1 p) 1000(0.92)(1 0.92) 8.58 The sketch is shown in part d. 0.9 0.92 2.33. __________ c. The z-score is z _________ 0.92(1 0.92) _________ 1000 P ⫽ 0.0099 p̂ ⫽ 0.90 0.903 0.911 0.92 0.929 0.937 Sample Proportion (n ⫽ 1000) 0.266 0.288 0.310 0.332 0.354 Sample Proportion (p ⫽ 0.31, n ⫽ 435) b. Yes, p̂ 231/435 0.53. As you saw in part a, it is almost impossible for this many representatives to be Republican if they were chosen at random from the general population. Possible reasons include the fact that people do not always vote for their party’s candidate. P21. Because the sampling distributions are approximately normal, in each case about 95% of the potential values of the sample proportion will 88 Section 7.3 Solutions The probability is about 0.0099. 925 920 __________________ d. The z-score is z ______________ 0.5828. 1000(0.92)(1 0.92) P ⫽ 0.2800 sum ⫽ 925 902.8 911.4 920 928.6 937.2 Number of Han Chinese (n ⫽ 1000) The probability is about 0.2800. Statistics in Action Instructor’s Guide, Volume 2 © 2008 Key Curriculum Press e. Rare events would be the totals that are outside the interval 920 1.96(8.58); that is, larger than 937 or smaller than 903. Rare events would be the proportions that are outside the interval 0.92 1.96(0.00858); that is, larger than 0.937 or smaller than 0.903. E36. a. Because np 500 0.223 111.5 and n(1 p) 500 0.777 388.5 are both 10 or more, the shape of the sampling distribution will be approximately normal. p̂ p 0.223 ________ p̂ p(1 p) ________ n ___________ 0.223 0.777 0.0186 500 ___________ The sketch, with scale on the x-axis, is shown in part c. b. The shape was verified as approximately normal in part a. sum np 500 0.223 111.5 _________ _______________ sum np(1 p) 500 0.223 0.777 9.31 The sketch, with scale on the x-axis, is shown in part d. c. Using the sampling distribution for the sample proportion, the z-score for a sample proportion of 0.2 is (0.2 0.223) z ___________ 1.24 0.0186 The probability of getting a sample proportion of 0.20 or fewer with one of these surnames is 0.108. P ⫽ 0.108 0.1858 0.2044 0.223 0.2416 0.2602 Sample Proportion d. Using the sampling distribution for the sample sum, the z-score for 105 successes is (105 111.5) z ____________ 0.698 9.31 The probability of getting 105 people or more with one of these surnames is 0.757. sum ⫽ 105 P ⫽ 0.757 e. Rare events would be the totals that are outside the interval 111.5 1.96(9.31); that is, larger than 129.75 or smaller than 93.25. Rare events would be the proportions that are outside the interval 0.223 1.96(0.0186); that is, larger than 0.259 or smaller than 0.187. E37. a. The shape would be slightly skewed to the left because np 100 0.92 92 and n(1 p) 100 0.08 8 are not both at least 10. The mean of the sampling distribution would still be 0.92 and the standard deviation would be greater because of the smaller sample size. b. As stated in part a, the shape will now be slightly skewed to the left. The mean will be onetenth that of the mean for a sample of 1000, and the standard___deviation will be smaller as well, by a factor of 10 . c. With n 100, the distribution is more spread out and skewed left than the distribution for n 1000. This means a larger percentage of the sample proportions would be farther from p and, in this case, in the direction of the skew or in the left tail. Because the sample proportion of 0.9 is in the left portion of the distribution or the part that is with the skew, the probability of getting 90% or fewer in the sample will be greater with a sample size of 100 than with a sample size of 1000. Note: You cannot accurately calculate this probability using this procedure because the distribution is not approximately normal. E38. a. Because np 100 0.223 22.3 and n(1 p) 100 0.777 77.7 are both at least 10, the shape will still be approximately normal. The mean of the sampling distribution of the sample proportion will still be 0.223, but the standard __ deviation will be larger by a factor of 5 . b. As shown in part a, the shape will still be approximately normal. The mean of the sampling distribution of the sample sum will be one-fifth as large as that in E36. The standard __ deviation will be smaller, as well, by a factor of 5 . c. The probability of getting 20% or fewer in the sample will be greater with a sample size of 100 because the distribution is more spread out. This means a larger percentage of the sample proportions would be farther from p. E39. a. 0.5 b. The sampling distribution can be considered approximately normal because both np 25 and n(1 p) 25 are 10 or greater. The sampling distribution has mean and standard error ___________ 92.88 102.19 111.5 120.81 130.12 Number of People Statistics in Action Instructor’s Guide, Volume 2 © 2008 Key Curriculum Press p̂ 0.5 and p̂ 0.5(1 0.5) __________ 0.0707 50 Section 7.3 Solutions 89 The z-score for a sample proportion of 0.2 is d. The z-score for a sample proportion of 0.54 is 0.2 0.5 4.2433 z ________ 0.0707 The probability of getting 10 or fewer under the median age is 0.000011. c. These results are not at all what we would expect from a random sample of 50 people. This group is special in that they are all old enough to have a job. Children are included in computing the 33.1 median age, but they do not hold jobs. Thus, we would expect the median age of job holders to be greater than the median age of all people in the United States, as they are at Westvaco. E40. For parts a and b of this problem, the sampling distribution for the sample total can be considered approximately normal because both np 50.3 and n(1 p) 85.7 are 10 or greater. For parts c to e of this problem, the sampling distribution for the sample total can also be considered approximately normal because both np 1054.5 and n(1 p) 1795.5 are 10 or greater. a. The sampling distribution is approximately normal with mean and standard error 0.54 0.37 18.80 __________ z ____________ 0.37 0.63 _________ 2850 The probability is very close to 0. e. The z-score for a sample proportion of 0.54 is 18.80 from part d. The probability of getting 54% or larger by chance alone is very close to 0. You should conclude that this group is special in some way. (In fact, Cal State Northridge is in Los Angeles and many of its students come from urban schools.) E41. You are assuming that the 75 married women were selected randomly from a population in which 60% are employed. Both np 75 0.6 45 and n(1 p) 75 0.4 30 are at least 10, so the sampling distribution of the sum will be approximately normal. The sampling distribution of the sum has sum np 45 _________ ___________ sum np(1 p) 75 0.6 0.4 4.24 For 30 employed women in the sample, z sum 136 0.37 50.32 _____________ sum 136 0.37 0.63 5.63 (30 45) ______ 3.536, and P(sum ⱕ 30) 0.0002. 4.24 For 40 employed women in the sample, z (40 45) ______ 1.179, and P(sum ⱕ 40) 0.1192. 4.24 The sketch is shown in part b. b. The z-score for 68 is P(30 ⱕ sum ⱕ 40) 0.1192 0.0002 0.1190. (68 50.32) z ___________ 3.14 5.63 P ⫽ 0.1190 P ⫽ 0.9992 total ⫽ 68 30 39.06 44.69 50.32 55.95 61.58 Number of Students (n ⫽ 136) The probability of getting 68 or fewer who need remediation is about 0.9992. c. This distribution is approximately normal with mean and standard error p̂ 0.37 ________ p̂ 0.37 0.63 0.009 2850 ________ 40 45 Number of Employed Women E42. Getting 15 or fewer disks with bad sectors is the same thing as getting 85 or more without. The sampling distribution for the number of disks that have no bad sectors can be considered approximately normal because both np 80 and n(1 p) 20 are 10 or greater. The sampling distribution has mean and standard error sum 100(0.8) 80 _______________ sum 100(0.8)(1 0.8) 4 The z-score for 85 disks with no bad sectors is 85 80 1.25 z _______ 4 0.352 0.361 0.370 0.379 0.388 Sample Proportion (n ⫽ 2850) 90 Section 7.3 Solutions Statistics in Action Instructor’s Guide, Volume 2 © 2008 Key Curriculum Press Frequency P ⫽ 0.1056 sum ⫽ 85 72 76 80 84 88 Number of Disks (n ⫽ 100) 0.80 0.90 1.00 Sample Proportion (n ⫽ 50, p ⫽ 0.90) Frequency 40 20 10 0.82 0.88 0.94 1.00 Sample Proportion (n ⫽ 100, p ⫽ 0.90) Frequency 60 50 40 30 20 10 0 0.86 0.88 0.90 0.92 0.94 Sample Proportion (n ⫽ 500, p ⫽ 0.90) Frequency 80 60 40 20 0 0.90 0.92 0.94 0.96 0.98 1.00 1.02 Sample Proportion (n ⫽ 50, p ⫽ 0.98) Frequency 80 60 40 20 0 0.90 0.92 0.94 0.96 0.98 1.00 1.02 Sample Proportion (n ⫽ 100, p ⫽ 0.98) 100 80 60 40 20 0 0.900 Statistics in Action Instructor’s Guide, Volume 2 © 2008 Key Curriculum Press 30 0 Frequency The probability of getting 85 or more with no bad sectors is about 0.1056. The assumption that underlies this answer is that the 100 sampled disks are a random sample from a population in which 80% of the disks have no bad sectors. E43. a. The means of the sampling distributions definitely depend upon p, as the first three center close to p 0.2 and the second three center close to p 0.4. The sampling distributions have means equal to p regardless of the sample size; the means do not depend upon the sample size. b. The spreads of the sampling distributions decrease as n increases for both values of p. The spreads also depend upon the value of p, however. For each sample size, the spread for p 0.4 has a larger standard error than the one for p 0.2. c. For p 0.2, the shape is quite skewed for n 5 and some slight skewness remains at n 25. For n 100, the shape is basically symmetric. For p 0.4, the shape shows a slight skewness at n 5 but is fairly symmetric at n 25 and beyond. The farther p is from 0.5, the more skewness in the sampling distribution of the sample proportion. d. The rule does not work well for samples of size 5 for either value of p but works well for samples of size 25 or more for both values of p. E44. The first three histograms are simulated sampling distributions for p 0.90, and the last three are for p 0.98. The sample sizes for each proportion are 50, 100, and 500. For p 0.90, the shape is skewed toward the smaller values for n 50 but is approximately normal when n 100 and n 500. For p 0.98, the shape is highly skewed for n 50 and remains highly skewed for n 100. It becomes approximately normal at n 500. The rule that np and n(1 p) must both be at least 10 seems to work quite well. The important message: Whether the normal model works well as an approximation depends upon both n and p. If p is close to 0 or 1, the sample size must be quite large to produce an approximately normal sampling distribution for the sample proportion. 70 60 50 40 30 20 10 0 0.940 0.920 0.960 0.980 Sample Proportion (n ⫽ 500, p ⫽ 0.98) 1.000 Section 7.3 Solutions 91 xf ____________ _ 0 14 + 1 26 0.65 E45. a. x ____ n 40 26 This sample mean is the same as p̂ __ 40 0.65 because each success adds 1 and each failure adds 0, so both formulas count the number of successes, then divide by the sample size. b. ∑ xi pi 0 0.4 1 0.6 0.6. The box on page 450 says p 0.6, which is the same. c. The sample proportion is a type of sample mean, if the successes are given the value 1 and failures the value 0. 92 Section 7.3 Solutions ___________ E46. a. xi X 2pi ___________________________ (0 0.6)2 0.4 (1 0.6)2 0.6 _________________ 0.62 0.4 0.42 0.6 ________________ 0.6 0.4(0.6 0.4) _______ 0.6 0.4 ____ 0.24 ________ _______ ____ b. p(1 p) 0.6 0.4 0.24 . Both formulas give the same value for the standard error. Statistics in Action Instructor’s Guide, Volume 2 © 2008 Key Curriculum Press