
Chapter 7 Sampling and sampling distributions

One of the reasons for taking a sample is to try to understand how the population is distributed without taking a census. Most people would agree that a large sample gives a better understanding than a small one (more information is better than less). So you might expect to learn more with n=10000 than with n=100, and more with n=100 than with n=1.

In this chapter we will study sampling from a known population. Nobody would do this in practice (why sample when you already know what the population looks like?), but it lets us compare the results of a sample with something that we already know. This will allow us to determine how much better off we would be if we increased the sample size. It will also allow us to develop some rules to use when we consider sampling from an unknown population.

What we want to do in this chapter is to
• determine whether large samples are better than small samples,
• determine how much better large samples are than small samples,
• construct some rules that we can use when sampling from an unknown population.

Sampling from a known normal distribution

Suppose we took a sample of size n from a normal distribution with \(\mu = 100\) and \(\sigma = 10\) and then computed the sample mean \(\bar{X}\). Now repeat this a large number of times and plot the histogram of the resulting sample means. Figures 1, 2, and 3 show histograms of sample means for sample sizes n=1, n=10, and n=100.

Figure 1. The histogram of the means of 1000 samples of size n=1 taken from a normal distribution

Here, even with samples of size n=1, the distribution of sample means appears to be normally distributed. The bins represent values of the sample means \(\bar{X}\).

Figure 2. The histogram of the means of 1000 samples of size n=10 taken from a normal distribution

Figure 2 also suggests that the distribution of sample means for samples of size n=10 is normally distributed.
However, the range of values of \(\bar{X}\) is much smaller than it was for samples of size n=1.

Figure 3. The histogram of the means of 1000 samples of size n=100 taken from a normal distribution

The results for Figure 3 also indicate a normal distribution. The range of values of \(\bar{X}\) is smaller still than for samples of size n=10. Note that the spread (standard deviation) of the distribution of sample means decreases as n gets larger.

The sampling distribution when sampling from a normal distribution

• The distribution of sample means, \(\bar{X}\), will be normally distributed.
• This distribution will have a mean \(\mu_{\bar{X}} = \mu\).
• This distribution will have a standard deviation \(\sigma_{\bar{X}} = \sigma / \sqrt{n}\).

Suppose that you have a normal distribution with \(\mu = 100\) and \(\sigma = 10\). Find the probability that a single observation taken from this distribution will be between 99 and 101. That is, find \(P(99 < X < 101)\).

\[Z = \frac{X - \mu}{\sigma} = \frac{99 - 100}{10} = -0.1\]
\[Z = \frac{X - \mu}{\sigma} = \frac{101 - 100}{10} = 0.1\]
\[P(99 < X < 101) = P(-0.1 < Z < 0.1) = 0.5398 - 0.4602 = 0.0796\]

Suppose now that we take a sample of size n=100 from this distribution. Find the probability that the sample mean will be in [99,101]. That is, find \(P(99 < \bar{X} < 101)\).

\[\mu_{\bar{X}} = \mu = 100\]
\[\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{100}} = 1\]
\[Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{99 - 100}{1} = -1\]
\[Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{101 - 100}{1} = 1\]
\[P(99 < \bar{X} < 101) = P(-1 < Z < 1) = 0.8413 - 0.1587 = 0.6826\]

Suppose now that we take a sample of size n=1000 from this distribution. Find the probability that the sample mean will be in [99,101]. That is, find \(P(99 < \bar{X} < 101)\).

\[\mu_{\bar{X}} = \mu = 100\]
\[\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{1000}} = 0.316\]
\[Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{99 - 100}{0.316} = -3.16\]
\[Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{101 - 100}{0.316} = 3.16\]
\[P(99 < \bar{X} < 101) = P(-3.16 < Z < 3.16) = 0.9992 - 0.0008 = 0.9984\]

Of course, these probabilities can also be computed in Excel.

Example 7.1

Suppose that an auditing team examines accounts receivable for a certain firm.
Unknown to the auditors, the mean and standard deviation of these accounts are \(\mu = 1332.52\) and \(\sigma = 237.55\) (these are population values). The auditing team takes a sample of n=36 accounts. Assume that accounts receivable can be described by a normal distribution. Find the probability that the resulting sample mean will
a) exceed $1350,
b) be less than $1300,
c) be between $1310 and $1360.

We have

\[\mu_{\bar{X}} = \mu = 1332.52\]
\[\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{237.55}{\sqrt{36}} = 39.592\]

a) Find \(P(\bar{X} > 1350)\).

\[Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{1350 - 1332.52}{39.592} = 0.44\]
\[P(\bar{X} > 1350) = P(Z > 0.44) = 1 - P(Z < 0.44) = 1 - 0.6700 = 0.3300\]

b) Find \(P(\bar{X} < 1300)\).

\[Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{1300 - 1332.52}{39.592} = -0.82\]
\[P(\bar{X} < 1300) = P(Z < -0.82) = 0.2061\]

c) Find \(P(1310 < \bar{X} < 1360)\).

\[Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{1310 - 1332.52}{39.592} = -0.57\]
\[Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{1360 - 1332.52}{39.592} = 0.69\]
\[P(1310 < \bar{X} < 1360) = P(-0.57 < Z < 0.69) = 0.7549 - 0.2843 = 0.4706\]

The same results can be obtained in Excel.

Example 7.2

Suppose that the time it takes to fabricate a central processor chip for a computer can be described by a normal distribution with a mean of 35 minutes and a standard deviation of 5 minutes. A time management team is studying the process in hopes of improving it. The management team does not know what the mean fabrication time is, so they take a sample of n=100 time histories to try to get an estimate of the true, but unknown, mean. Find the probability that the sample mean time is
a) less than 34 minutes,
b) greater than 36.3 minutes,
c) between 34 and 35.7 minutes.
We have

\[\mu_{\bar{X}} = \mu = 35\]
\[\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{100}} = 0.5\]

a) Find \(P(\bar{X} < 34)\).

\[Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{34 - 35}{0.5} = -2.0\]
\[P(\bar{X} < 34) = P(Z < -2.0) = 0.0228\]

b) Find \(P(\bar{X} > 36.3)\).

\[Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{36.3 - 35}{0.5} = 2.6\]
\[P(\bar{X} > 36.3) = P(Z > 2.6) = 1 - P(Z < 2.6) = 1 - 0.9953 = 0.0047\]

c) Find \(P(34 < \bar{X} < 35.7)\).

\[Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{34 - 35}{0.5} = -2.0\]
\[Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{35.7 - 35}{0.5} = 1.4\]
\[P(34 < \bar{X} < 35.7) = P(-2.0 < Z < 1.4) = 0.9192 - 0.0228 = 0.8964\]

The same results can be obtained in Excel.

The central limit theorem

Figure 4. The normal distribution between 90 and 110

It might not be too surprising that the sample means taken from a normal distribution would be normal, but let's consider sampling from a uniform distribution where the samples must be in the range [90,110]. The results of such samples are shown in Figures 5, 6, and 7.

Figure 5. The histogram of 1000 sample means of size n=1 taken from a uniform distribution on [90,110]

Note that for a sample of size n=1, we are just sampling the distribution, and so the distribution of sample means just reproduces the distribution from which the sample was taken. The results here are not normal, but are those of the uniform distribution.

Figure 6. The histogram of 1000 sample means of size n=10 taken from a uniform distribution on [90,110]

Figure 7. The histogram of 1000 sample means of size n=100 taken from a uniform distribution on [90,110]

If the sample size is increased to n=10, the distribution of sample means starts to look like a normal distribution (see the graph in Fig. 6). Fig. 7 shows the histogram of 1000 sample means of size n=100. The histogram of these 1000 means looks quite normal. It appears here that even with a sample size as small as n=10 the resulting sampling distribution is approximately normally distributed. In fact, if we make the sample size large enough, the distribution of sample means will be approximately normally distributed.
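The experiment behind Figures 5 to 7 can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used to produce the figures: the function name `sample_means`, the seed, and the use of Python's standard `random` and `statistics` modules are assumptions made here. It draws 1000 samples of each size from a uniform distribution on [90,110] and compares the standard deviation of the sample means with \(\sigma / \sqrt{n}\), where \(\sigma = (110 - 90)/\sqrt{12}\) is the standard deviation of that uniform distribution.

```python
import random
import statistics


def sample_means(n, reps=1000, low=90.0, high=110.0, seed=0):
    """Return `reps` sample means, each computed from n Uniform[low, high] draws."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    return [
        statistics.fmean(rng.uniform(low, high) for _ in range(n))
        for _ in range(reps)
    ]


# Standard deviation of a uniform distribution on [90, 110]
sigma = (110.0 - 90.0) / 12 ** 0.5

for n in (1, 10, 100):
    means = sample_means(n)
    print(
        f"n={n:3d}  mean of sample means={statistics.fmean(means):7.2f}  "
        f"sd of sample means={statistics.stdev(means):6.3f}  "
        f"sigma/sqrt(n)={sigma / n ** 0.5:6.3f}"
    )
```

Plotting a histogram of each list returned by `sample_means` should give pictures similar in shape to Figures 5, 6, and 7, with the spread shrinking as n grows.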
As a rule of thumb, if the sample size is on the order of n=30 or more, then the sampling distribution will be approximately normally distributed. These results are among the most important in statistics and are called the Central Limit Theorem (CLT).

The Central Limit Theorem (CLT)

Regardless of the nature of the distribution from which a sample is taken, if the sample size is large enough (rule of thumb: n=30 is large enough), then
• the distribution of sample means, \(\bar{X}\), will be approximately normally distributed,
• this distribution will have a mean \(\mu_{\bar{X}} = \mu\),
• this distribution will have a standard deviation \(\sigma_{\bar{X}} = \sigma / \sqrt{n}\).

Note: if the sample comes from a population that is normally distributed, the CLT will hold for a sample of size n=1 or larger. In most practical applications where the population is unknown, people will say the CLT holds for samples of size n=30 or larger.

The sampling distribution of the binomial distribution (\(\hat{p}\))

Here we will vary a little from our rule of thumb that n=30 is large enough for the CLT to hold. We know a good deal more about the binomial; it is not just an unknown distribution. The CLT will hold for the binomial when

\[np \ge 5 \quad \text{and} \quad nq \ge 5,\]

where \(q = 1 - p\). So the CLT will hold for the binomial when the normal approximation to the binomial distribution is good.

We do make one change here. We will find it convenient to look at binomial problems in terms of the proportion of successes out of n trials rather than the number of successes. For the binomial distribution,

\[\mu = E(X) = np\]

where E(X) indicates the "expected value" of the distribution. It is another term for the mean of a distribution. The expected value is the value you would expect to get for the average result of performing an experiment a large number of times.

Suppose that you flipped a coin ten times, where p=0.5. Call getting a head a success (S) and record X, the number of S's. Repeat this a large number of times and average the values of X. You would expect this average to be five.
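The coin-flipping experiment just described can be simulated directly. The sketch below is an illustration only; the function name `average_heads`, the number of repetitions, and the seed are assumptions made here, not part of the chapter. It flips a fair coin ten times, counts the heads, repeats this many times, and averages the counts.

```python
import random


def average_heads(n_flips=10, p=0.5, reps=10000, seed=1):
    """Average number of successes over `reps` repetitions of n_flips trials."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    total = 0
    for _ in range(reps):
        # X = number of heads (successes) in one run of n_flips flips
        total += sum(1 for _ in range(n_flips) if rng.random() < p)
    return total / reps


print(average_heads())  # should be close to n_flips * p
```

Changing `n_flips` or `p` lets you compare the simulated average with np for other binomial settings.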
So for n=10, p=0.5,

\[E(X) = np = 5\]

Now define the proportion of successes in n trials to be

\[\hat{p} = \frac{X}{n}\]

so that the mean or expected value of the proportion of successes in n trials for the binomial is

\[E(\hat{p}) = E\left(\frac{X}{n}\right) = \frac{np}{n} = p.\]

The mean is just p, the probability of a success in any trial. The standard deviation of the distribution in terms of \(\hat{p}\) is

\[\sigma_{\hat{p}} = \sqrt{pq/n}\]

For the binomial distribution, the sampling distribution of \(\hat{p}\) will be normally distributed with

\[\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{pq/n}\]

if \(np \ge 5\) and \(nq \ge 5\), and the Z score is

\[Z = \frac{\hat{p} - p}{\sigma_{\hat{p}}}\]

Example 7.3

Polls are almost always reported in terms of proportions (the percentage of respondents that favor something) rather than in terms of X (the number of respondents that favor it). Suppose that a poll has been commissioned in an election contest between A and B. Consider a response for A to be a success. Suppose that 55% of all voters actually favor A. The size of the poll is n=1200 voters. What is the probability that the response for A will be in the range [52%,58%]?

In this problem p=0.55 and n=1200. This gives

\[\sigma_{\hat{p}} = \sqrt{pq/n} = \sqrt{(0.55)(0.45)/1200} = 0.0144\]
\[Z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{0.52 - 0.55}{0.0144} = -2.08\]
\[Z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{0.58 - 0.55}{0.0144} = 2.08\]
\[P(0.52 < \hat{p} < 0.58) = P(-2.08 < Z < 2.08) = 0.9812 - 0.0188 = 0.9624\]

In Excel:

Quantity              Value    Excel formula
n                     1200
p                     0.55
q                     0.45
sigma-p               0.0144   =SQRT(0.55*0.45/1200)
P(0.52<p-hat<0.58)    0.9628   =NORMDIST(0.58,0.55,0.0144,TRUE)-NORMDIST(0.52,0.55,0.0144,TRUE)

Excel gives 0.9628 rather than 0.9624 because it does not round Z to two decimal places.

Example 7.4

A market survey is taken of n=1000 potential buyers to see how they like a test product. Suppose that 10% of the population likes the product. What is the probability that 12% or more of the test group will indicate that they like the product?

Does the CLT hold for the problem?

\[np = 1000 \times 0.1 = 100\]
\[nq = 1000 \times 0.9 = 900\]

so the CLT holds for this problem.
The standard deviation of the sampling distribution is

\[\sigma_{\hat{p}} = \sqrt{pq/n} = \sqrt{(0.1)(0.9)/1000} = 0.0095\]

so

\[Z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{0.12 - 0.10}{0.0095} = 2.10\]

and

\[P(\hat{p} > 0.12) = P(Z > 2.10) = 1 - P(Z < 2.10) = 1 - 0.9821 = 0.0179\]

So there is less than a 2% chance of getting a sample proportion greater than 12% if the true population proportion is 10%.

Problems

7.1 Suppose a sample of size n=10 is taken from a normal distribution with \(\mu = 150\) and \(\sigma = 12\). Find
a. \(P(\bar{X} > 153)\)
b. \(P(148 < \bar{X} < 151)\)
c. \(P(\bar{X} < 148)\)

7.2 Repeat problem 7.1 using a sample size of n=100.

7.3 A sample of size n=10 is taken from a population which is not normal, but has \(\mu = 100\) and \(\sigma = 10\). Does the Central Limit Theorem hold? Can you find \(P(\bar{x} > 101)\) using the Central Limit Theorem?

7.4 Suppose household incomes in Flagstaff are normally distributed with \(\mu = 22,000\) and \(\sigma = 2,000\). A sample of n=10 households is taken. Find
a. \(P(\bar{x} > 21,000)\)
b. \(P(21,599 < \bar{x} < 22,500)\)
c. \(P(\bar{x} < 21,500)\)

7.5 Suppose a machine is producing defective items at a 10% rate. One thousand items of the machine's output are inspected. What is the probability that between 9% and 11% of the inspected items will be defective?

7.6 Repeat problem 7.5 to find the probability that between 8% and 12% of the inspected items will be defective.

7.7 Suppose that 52% of the registered voters are in favor of a certain proposition placed on an upcoming Arizona election ballot. A sample of n=1200 voters is selected at random. What is the probability that
a. between 49% and 53% of the sampled voters will favor the proposition?
b. a majority of the voters in the sample will favor the proposition?
c. more than 55% of the voters in the sample will favor the proposition?

7.8 A machine is producing ball bearings with an average diameter of 101 cm and with a standard deviation of 8 cm. A sample of n=49 ball bearings is taken. What is the probability that the sample mean will be between 98 cm and 100 cm?
Answers

7.1 a) 0.2148, b) 0.3045, c) 0.2981
7.2 a) 0.0062, b) 0.7492, c) 0.0475
7.3
7.4 a) 0.9429, b) 0.5209, c) 0.2148
7.5 \(P(0.09 < \hat{p} < 0.11) = P(-1.05 < Z < 1.05) = 0.7062\)
7.6 0.9652
7.7 a) 0.7361, b) 0.9177, c) 0.0188
7.8 0.1878
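The normal-probability calculations used throughout this chapter can also be checked outside Excel. The following is a minimal sketch that assumes Python 3.8 or later, whose standard library provides `statistics.NormalDist`; the helper names `prob_mean_between` and `prob_proportion_between` are made up here for illustration. Each helper builds the sampling distribution (mean \(\mu\) and standard deviation \(\sigma/\sqrt{n}\), or mean p and standard deviation \(\sqrt{pq/n}\)) and subtracts two cumulative probabilities.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_between(lo, hi, mu, sigma, n):
    """P(lo < sample mean < hi) when the sample mean is normal(mu, sigma/sqrt(n))."""
    dist = NormalDist(mu, sigma / sqrt(n))
    return dist.cdf(hi) - dist.cdf(lo)


def prob_proportion_between(lo, hi, p, n):
    """P(lo < p-hat < hi) using the normal approximation with sd sqrt(pq/n)."""
    dist = NormalDist(p, sqrt(p * (1 - p) / n))
    return dist.cdf(hi) - dist.cdf(lo)


# P(99 < sample mean < 101) for mu = 100, sigma = 10 and n = 1, 100, 1000
for n in (1, 100, 1000):
    print(n, round(prob_mean_between(99, 101, 100, 10, n), 4))

# Example 7.3: P(0.52 < p-hat < 0.58) for p = 0.55, n = 1200
print(round(prob_proportion_between(0.52, 0.58, 0.55, 1200), 4))
```

Because this sketch does not round Z or the standard deviation, its results can differ in the third or fourth decimal place from values read off a Z table.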
