Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Chapter 8 Sampling Variability & Sampling Distributions Section 8.1 Statistics and Sampling Variability Suppose we are interested in finding the true mean (m) fat content of quarter-pound hamburgers marketed by a national fast food chain. To learn something about m, we could obtain a sample of To answer and these questions, n = 50 hamburgers determine thewe fatwill content examine the sampling distribution, which of each one. describes the long-run behavior of sample statistics. Recall that the sample mean is a statistic Statistic The campus of Wolf City College has a fish pond. Suppose there are 20 fish in the pond. The lengths of the fish (in inches) are given below: 4.5 5.4 10.3 7.9 6.3 4.3 8.5 6.6 11.7 8.9 2.2 9.8 9.6 8.7 13.3 4.6 10.7 13.4 7.7 5.6 This is aThe statistic! true mean m = 8. We caught fish with lengths 6.3 inches, Notice that some Let’sare catch two 2.2 inches, and 13.3 inches. sample means Suppose we randomly catch a sample ofan This is more samples closer and some farther x = 7.273inches example of fish from this pond and measure theirat the and look away; some above and 2nd sample 8.5, 4.6, and 5.6 inches. sampling length. What would thebelow mean of some thelength mean. sample means. x = 6.23 inches the sample be? rd 3 sample – 10.3, 8.9, and 13.4 inches. x = 10.87 inches variability The distribution that would be formed by considering the value of a sample statistic for every possible different sample of a given size from a population. In this case, the sample statistic is the sample mean x. Fish Pond Continued . . . 4.5 5.4 10.3 7.9 6.3 4.3 9.6 8.5 6.6 11.7 8.9 2.2 9.8 8.7 13.3 4.6 10.7 13.4 7.7 5.6 Section 8.2 The Sampling Distribution of a Sample Mean Fish Pond Revisited . . . Suppose there are only 5 fish in the pond. The lengths of the fish (in inches) are given below: 6.6 11.7 8.9 2.2 What is the mean mand 7.84 x =standard deviation of this sxpopulation? = 3.262 9.8 We will keep the population size small so that we can find ALL the possible samples. Fish Pond Revisited . . . 6.6 11.7 8.9 2.2 9.8 mx = 7.84 and sx = 3.262 Pairs 6.6 & 11.7 6.6 & 8.9 6.6 & 2.2 6.6 & 9.8 x 9.15 7.75 4.4 8.2 mx = 7.84 sx = 1.998 11.7 & 8.9 11.7 & 2.2 11.7 & 9.8 8.9 & 2.2 8.9 & 9.8 2.2 & 9.8 find all the 10.3 Let’s 6.95 10.75 5.55 9.35 6 How many samples samplesofofsize size22. are possible? How doisthese What the mean valuesand compare to standard the population deviation of these meansample and standard means? deviation? Fish Pond Revisited . . . 6.6 11.7 8.9 2.2 9.8 mx = 7.84 and sx = 3.262 Triples x 6.6, 11.7, 8.9 6.6, 11.7, 2.2 6.6, 11.7, 9.8 6.6, 8.9, 2.2 9.067 6.833 9.367 5.9 mx = 7.84 sx = 1.332 6.6, let’s 11.7, Now find 11.7, all the11.7, 2.2, 9.8 many 8.9, 2.2 samples 8.9, 9.8 2.2, 9.8 How samples of size 3. of size 310.133 are 7.9 8.433 6.2 7.6 possible? 6.6, 8.9, 9.8 What is the mean How do values andthese standard compare deviation to of the these population and samplemean means? standard deviation? 8.9, 2.2, 9.8 6.967 What do you notice? • The mean of the sampling distribution EQUALS the mean of the population. mx = m • As the sample size increases, the standard deviation of the sampling distribution decreases. as n sx General Properties of Sampling Distributions of x Rule 1: Rule 2: mx m sx s n Note that in the previous fish pond examples this standard deviation formula was not correct because the sample sizes were more than 10% of the population. This rule is exact if the population is infinite, and is approximately correct if the population is finite and no more than 10% of the population is included in the sample The paper “Mean Platelet Volume in Patients with Metabolic Syndrome and Its Relationship with Coronary Artery Disease” (Thrombosis Research, 2007) includes data that suggests that the distribution of platelet volume of patients We can use Minitab to generate random samples from whothis dopopulation. not have metabolic syndrome500 is approximately We will generate random samples normalofwith 8.25 and s =the 0.75. n = m5 =and compute sample mean for each. Platelets Continued . . . Similarly, we will generate 500 random samples of n = 10, n = 20, and n = 30. The density histograms below display the resulting 500 x for each of the given sample sizes. What do doyou you notice notice What do you notice What aboutthe themeans standard about the shape of about deviation of these these histograms? these histograms? histograms? General Properties Continued . . . Rule 3: When the population distribution is normal, the sampling distribution of x is also normal for any sample size n. The paper “Is the Overtime Period in an NHL Game Long Enough?” (American Statistician, 2008) gave data on the time (in minutes) from the start of the game to the first goal scored for the 281 regular season games from the 20052006 season that went into overtime. The density histogram for the data is shown below. Let’s consider these 281 values as a population. The distribution is strongly positively skewed with mean m = 13 minutes and with a median of 10Using minutes. Minitab, we will generate 500 samples of the following sample sizes from this distribution: n = 5, n = 10, n = 20, n = 30. What do you notice These Are these are histograms What dothe youdensity notice about the standard histograms centered the at 500 about thefor shape of deviations of these approximately samples m = 13? these histograms? histograms? General Properties Continued . . . How large is “sufficiently large” CLT can safely be applied if n exceeds 30. anyway? A soft-drink bottler claims that, on average, cans contain 12 oz of soda. Let x denote the actual volume of soda in a randomly selected can. Suppose that x is normally distributed with s = .16 oz. Sixteen cans are randomly selected, and the soda volume is determined for each one. Let x = the resulting sample mean soda. If the bottler’s claim is correct, then the sampling distribution of x is normally distributed with: m x m 12 s .16 sx .04 n 16 Soda Problem Continued . . . m x m 12 s .16 sx .04 n 16 To standardized these endpoints, use in mthe xmean m soda What is the probability that x the sample Look these up table x z volume is between 11.96 andounces subtract the12.08 s sx and ounces? n probabilities. P(11.96 < x < 12.08) = .9772 - .1587 = .8185 11.96 12 a* 1 .04 12.08 12 b* 2 .04 A hot dog manufacturer asserts that one of its brands of hot dogs has an average fat content of 18 grams per hot dog with standard deviation of 1 gram. Consumers of this brand would probably not be disturbed if the mean was less than 18 grams, but would be unhappy if it exceeded 18 grams. An independent testing organization is asked to analyze a random sample of 36 hot dogs. Suppose the resulting sample mean is 18.4 grams. Does this Since the sample size is result suggest that the manufacturer’s claim is greater than 30, the incorrect? Central Limit Theorem applies. So the distribution of x is approximately normal with mx 18 and sx 1 .1667 36 Hot Dogs Continued . . . mx 18 and sx 1 .1667 36 Suppose the resulting sample mean is 18.4 grams. Does this result suggest that the manufacturer’s claim is incorrect? P(x > 18.4) = 1 - .9918 = .0082 Values of x at least as large as 18.4 would be observed 18 .4 18 only about .82% of the time. z 2.40 The sample .1667 mean of 18.4 is large enough to cause us to doubt that the manufacturer’s claim is correct. Section 8.3 The Sampling Distribution of a Sample Proportion Let’s explore what happens within distributions of sample proportions (p). Have students perform the following experiment. This is a statistic! •Toss a penny 20 times and record the number of heads. The is a partial graph ofmark the •Calculate the proportion oftoheads and Whatdotplot would happen the dotplot if weit on sampling distribution of all the dot plot on board. flipped thethe penny 50 times andsample recorded proportions of sample size 20. the proportion of heads? What shape do you think the dot plot will have? Sampling Distribution of p The distribution that would be formed by considering the value of a sample statistic for We will use: every possible different sample of a given size p for the population proportion from a population. and p for the sample proportion In this case, we will use number of successes in the sample pˆ n Suppose we have a population of six students: Alice, Ben, Charles, Denise, Edward, & Frank will keep so We areWe interested in the the population proportion small of females. that we the can parameter find ALL theof possible interest This is called samples of a given size. What is the proportion of females? 1/3 Let’s select samples of two from this population. How many different samples are possible? 6C2 =15 Find the 15 different samples that are possible and find the sample proportion of the number of females in each sample. Ben & Frank Alice & Ben .5 Charles & Denise Alice & Charles .5 Charles & Edward Alice & Denise 1 Charles & Frank Alice & Edward .5 Denise of & Edward the mean the Alice & Frank How does .5 Denise & Frank Ben & Charles 0 sampling distribution & Frank Ben & Denise compare .5 to theEdward population Ben & Edward 0 parameter (p)? 0 .5 0 0 .5 .5 0 Find the mean and standard deviation of these sample proportions. 1 m pˆ and s pˆ 0.29814 3 General Properties for Sampling Distributions of p Rule 1: m pˆ p Rule 2: s pˆ p (1 p ) n Note that in the previous student example this standard deviation formula was not correct because the sample size was more than 10% of the population. This rule is exact if the population is infinite, and is approximately correct if the population is finite and no more than 10% of the population is included in the sample In the fall of 2008, there were 18,516 students enrolled at California Polytechnic State University, San Luis Obispo. Of these students, 8091 (43.7%) were female. We will use a statistical software package to simulate sampling from this Cal Poly population. We will generate 500 samples of each of the following sample sizes: n = 10, n = 25, n = 50, n = 100 and compute the proportion of females for each sample. The following histograms display the distributions of the sample proportions for the 500 samples of each sample size. What do you notice about about What do you notice Are these histograms thethe standard deviation shape of the these centered around trueof these distributions? distributions? proportion p = .437? The development of viral hepatitis after a blood transfusion can cause serious complications for a patient. The article “Lack of Awareness Results in Poor Autologous Blood Transfusions” (Health Care Management, May 15, 2003) reported that hepatitis occurs in 7% of patients who receive blood transfusions during heart surgery. We will simulate sampling from a population of blood recipients. We will generate 500 samples of each of the following sample sizes: n = 10, n = 25, n = 50, n = 100 and compute the proportion of people who contract hepatitis for each sample. The following histograms display the distributions of the sample proportions for the 500 samples of each sample size. Are these histograms centered around the true proportion p = .07? What happens to the shape of these histograms as the sample size increases? General Properties Continued . . . Rule 3: When n is large and p is not too near 0 or 1, the sampling distribution of p is approximately normal. The farther the value of p is from 0.5, the larger n must be for the sampling distribution of p to be approximately normal. A conservative rule of thumb: If np > 10 and n (1 – p) > 10, then a normal distribution provides a reasonable approximation to the sampling distribution of p. Blood Transfusions Revisited . . . Let p = proportion of patients who contract hepatitis after a blood transfusion p = .07 Suppose a new bloodToscreening procedure is we answer this question, believed to reduce themust incident rate of consider thehepatitis. sampling distributionisofgiven p. to Blood screened using this procedure n = 200 blood recipients. Only 6 of the 200 patients contract hepatitis. Does this result indicate that the true proportion of patients who contract hepatitis when the new screening is used is less than 7%? Blood Transfusions Revisited . . . Let p = .07 p = 6/200 = .03 Is the sampling distribution approximately normal? np = 200(.07) = 14 > 10 Yes, we can n(1-p) = 200(.93) = 186 > 10 use a normal approximation. What is the mean and standard deviation of the sampling distribution? m pˆ .07 .07(.93) s pˆ .018 200 Blood Transfusions Revisited . . . m pˆ .07 Let p = .07 p = 6/200 = .03 .07(.93) s pˆ .018 200 Does this result indicate that the true proportion of patients who contract hepatitis when the new screening is used is less than 7%? This small probability tells us thatscreening it is unlikely This new P(p < .03) = .0132 that a sample procedure proportion appears of .03 or yield awould smaller .03 .07 Assume thetoscreening smaller be z 2.22 incidence rate observed if thefor procedure is not effective .07(.93) screening procedure hepatitis. and p = .07. 200 was ineffective. Defective Example If the true proportion of defectives produced by a certain manufacturing process is 0.08 and a sample of 400 is chosen, what is the probability that the proportion of defectives in the sample is greater than 0.10? Would a normal distribution be reasonable? Since np 400(0.08) 32 > 10 and n(1- p) = 400(0.92) = 368 > 10, it’s reasonable to use the normal approximation. Defective Example--Continued mp 0.08 (1 ) 0.08(1 0.08) sp 0.013565 n 400 p mp 0.10 0.08 z 1.47 sp 0.013565 P(p > 0.1) P(z > 1.47) 1 0.9292 0.0708 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Sales Example Suppose 3% of the people contacted by phone are receptive to a certain sales pitch and buy your product. If your sales staff contacts 2000 people, what is the probability that more than 100 of the people contacted will purchase your product? Clearly p = 0.03 and p = 100/2000 = 0.05 so 0.05 0.03 P(p > 0.05) P z > (0.03)(0.97) 2000 0.05 0.03 P z > P(z > 5.24) 0 0.0038145 Sales Example - continued If your sales staff contacts 2000 people, what is the probability that less than 50 of the people contacted will purchase your product? Now p = 0.03 and p = 50/2000 = 0.025 so 0.025 0.03 P(p 0.025) P z (0.03)(0.97) 2000 0.025 0.03 P z P(z 1.31) 0.0951 0.0038145