Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Chapter 8 Sampling Distributions Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chap 2 2 Section 8.1 Distribution of the Sample Mean Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Statistics such as the “mean” ( x ) are random variables. Their value varies from sample to sample, so they have probability distributions associated with them. In this chapter we focus on the shape (normal?), center (peak) and spread(deviation) of distributions of x. Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 8-4 The sampling distribution of the sample mean is the probability distribution of all possible values of the random variable “ ” x Take many samples of size “n” from a population whose mean is μ and standard deviation is σ. Then plot just the means ( ) ofxeach sample. Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 8-5 Illustrating Sampling Distributions Step 1: Obtain a simple random sample of size “n” Step 2: Compute the sample mean. Step 3: Repeat Steps 1 and 2 until all possible simple random samples of size n have been obtained (in theory). In practice, take as many n-sized samples as practicable (time & money…). 8-6 Sampling Distribution of the Sample Mean: Normal Population The weights of pennies minted after 1982 are approx normally distributed with mean = 2.46g (grams) and std dev = 0.02g. (28g = 1 oz) Approximate the sampling distribution of the sample mean by taking 200 simple random samples of size n = 5 pennies from this population (in other words, find the mean weight of each 5-penny sample and then plot just the 200 x ‘s, not the weight of all 1000 pennies.) Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 8-7 The data on the following slide represent the sample means for the 200 simple random samples, each of size n = 5. For example, the first sample of n = 5 pennies had the following weight data (in g): 2.493 x 2.466 2.473 2.492 Note: 2.471 = 2.479g for this sample 8-8 The data on the following slide represent the sample means for the 200 simple random samples, each of size n = 5. For example, the first sample of n = 5 pennies had the following weight data (in g): 2.493 x 2.466 2.473 2.492 Note: 2.471 = 2.479g for this sample 8-9 The data on the following slide represent the sample means for the 200 simple random samples, each of size n = 5. For example, the first sample of n = 5 pennies had the following weight data (in g): 2.493 x 2.466 2.473 2.492 Note: 2.471 = 2.479g for this sample 8-10 Sample Means for Samples of Size n = 5 8-11 The mean of the 200 sample means is 2.46g, the same as the mean of the population. The standard deviation of the sample means (“standard error”) is 0.0086g, which is smaller than the standard deviation of the population. The next slide shows the histogram of all 200 sample means. 8-12 8-13 What role does “n”, the size of each sample, play in the value of the standard deviation of the sample means? As the size “n” of each sample increases, the standard deviation of the distribution of the sample mean decreases. Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 8-14 The Impact of Sample Size “n” on Sampling Variation (Variance, Std Dev) Approximate the distribution of the sample means by obtaining 200 simple random samples of size n = 20 pennies (not 5)…. from the same population of pennies minted after 1982 (μ = 2.46 grams and σ = 0.02 grams) Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 8-15 The mean of the 200 sample means for n = 20 pennies is still 2.46g, but the standard deviation is now 0.0045g (0.0086g for n = 5). There is less variation in the distribution of the sample means when n =20p than with n = 5p and the curve is more “normal”.8-16 The Mean & Standard Deviation of the Sampling Distribution of x If a random sample of size “n” is drawn from a large population with mean “μ” and std dev “σ”. Then the distribution of the x ‘s will have: x std dev x n . mean x is the “standard error of the mean” or just “standard error” 8-18 Theorem If the parent population of the random variable X is normally distributed, the distribution of the sample means from x that population will also be normally distributed. Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 8-19 Distribution of the Sample Mean The weights of pennies minted after 1982 (pop) are normally distributed with: mean = 2.46g and std dev = 0.02g. What is the probability that, in a random sample of 10 of these pennies, the mean weight of the sample is at least 2.465 grams? Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 8-20 x normally distributed . 2.465 2.46 Z 0.79 0.0063 . P(Z > 0.79) = 1 – 0.7852 x = 2.46g 0.02 x 0.0063 10 normcdf (-1E99, 0.79) = 0.7852 Or: normcdf (0.79,1E99) = 0.2148 Or: normcdf(2.465,1E99,2.46,0.0063) = 0.2137 = 0.2148 . Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Now… Lets describe the Distribution of the Sample Means if the parent population is non-normal, Or (same thing) we don’t know whether the parent is normal or not. 22 Sampling from a Population that is Not Normal (or Unk) The following table and histogram give the probability distribution for rolling a fair die: Result Face on Die Relative Frequency or Probability 1 0.1667 2 0.1667 3 0.1667 4 0.1667 5 0.1667 6 0.1667 μ = 3.5, σ = 1.708 Note that the population distribution is Uniform, NOT normal 8-23 Probability Experiment Roll a dice 4 times (n = 4) and find the mean value x Repeat this 200 times. Calculating the sample mean for each of the 200 samples. Repeat for n = 10 and 30. Estimate the sampling distribution of simple random samples of size and by obtaining 200 Histograms of the sampling distribution of the sample mean for each sample size are given on the next slide. Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 8-24 Probability Experiment Roll a dice 4 times (n = 4) and find the mean value x Repeat this 200 times. Calculating the sample mean for each of the 200 samples. Repeat for n = 10 and 30. Estimate the sampling distribution of simple random samples of size and by obtaining 200 Histograms of the sampling distribution of the sample mean for each sample size are given on the next slide. Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 8-25 Probability Experiment 1. Roll 4 dice (n = 4). Find the mean value of result x 2. Repeat this 200 times. Calculate the sample mean for each of the 200 rolls. Repeat for n = 10 and n = 30. 3. Estimate the sampling distribution of x for each “n” Histograms of the distribution of the sample means for each sample size are given on the next slide. Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 8-26 8-27 8-28 8-29 Key Points from Dice Experiment The mean of the sampling distribution is equal to the mean of the parent population. The std error of the distribution of the sample means is regardless of “n” n The Central Limit Theorem: the shape of the distribution of the sample means becomes approx normal as the sample size n ≥ 30 increases, regardless of the shape of the parent population (means can use z-scores). Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 8-30 Using the Central Limit Theorem Suppose that the mean time for an auto oil change at Karlos’ Speedy Oil Change is 11.4 minutes with a standard deviation of 3.2 min. (a) If a random sample of n = 35 oil changes is selected, describe the sampling distribution of the sample mean. (b) If a sample of n = 35 oil changes at Karlos’ is selected, what is the probability that the mean of these 35 oil change times is less than 11 minutes? 8-31 Using the Central Limit Theorem S”what is the probability the mean of 35 oil change times is less than 11 minutes? that the mean timen for an oil at a “10-minute oil change x change Solution: since > 30, is approximately normally distributed joint” is 11.4=minutes with standard of 3.2 min minutes. with mean 11.4 min anda std. dev. deviation = 3.2 0.5409 35 (a) If a random sample of n = 35 oil changes is 11 11.4 Z 0.74 P(Z < –0.74) = 0.2296 0.5409 Or…. 2:Distr:2: normcdf(-1E99,11.0,11.4,0.5409) = 0.2298 IOW, if you did this experiment (n=35) 100 times, about 23 of those times (1/4) the mean oil change would take less than 11 mins. 8-32 Section 8.2 Distribution of the Sample Proportion (not Mean) Copyright © 2013, 2010 and 2007 Pearson Education, Inc. POINT ESTIMATE OF A POPULATION PROPORTION If a random sample of size “n” is obtained from a population in which each outcome is binomial (Yes/No; True/False; Success/Fail) The sample proportion, pˆ (“p-hat”) is x pˆ where x is the number of “successes”. n The sample proportion is a pˆstatistic that estimates the population proportion, ρ (Gr: rho). Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 8-34 Computing a Sample Proportion In a 2003 national Harris Poll, 1745 registered voters were asked whether they approved of the way President Bush was handling the economy (wthtm). 349 responded “yes”, and the other 1396 said various other unprintable things. Obtain a point estimate (%) for the proportion of registered voters who approved of the way President Bush was handling the economy. x 349 pˆ 0.2000 n 1745 Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 8-35 Using Simulation to Describe the Distribution of the Sample Proportion According to a Time poll in 2008, 42% of registered voters believed that gay/lesbian couples should be allowed to (civil) marry. Describe the sampling distribution of the sample proportion for samples of size: n = 10, 50, 100. Note: We are using simulations to create the histograms on the following slides. Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 8-36 8-37 8-38 8-39 Key Points from Time Poll Shape: As the size of the sample “n” increases, the shape of the sampling proportion distribution becomes approximately normal. Center: The mean of the sampling proportion distribution pˆ equals the population proportion, ρ. Spread: The std dev of the sampling proportion distribution decreases as the sample size, n increases. Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 8-40 Sampling Distribution of p̂ For a simple random sample of size n with point estimate pˆ and population proportion ρ: 1. The shape of the sampling distribution of approximately normal provided npq ≥ 10. 2. The mean of the sampling distribution of is p̂ is pˆ 3. The standard deviation of the sampling distribution of p ˆ is pq p̂ n Note: p + q=1 or q = 1 - p Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 8-41 Normality for Qualitative Variables (Proportions)… In order for us to use z-scores for probability of proportions, the distribution of p ˆ must be approx normal. Your text uses npq ≥ 10 for this requirement, but most books use both np ≥ 5 and also nq ≥ 5 to assume normality 42 Normality Requirement In this civil marriage problem: p = 0 .42 (42% approve) , so q = 1-p = 0.58 (58% disapprove) If n = 50, then: npq = 12.18 > 10 np = 21 >5 and nq = 29 >5 But, if n = 20, npq = 4.872 < 10 np = 8.4 > 5 and nq = 11.6 > 5 43 Sampling Distribution of the Sample Proportion According to a Time poll conducted in 2008, 42% of registered voters believed that gay/lesbian couples should be allowed to civil marry. Suppose that we obtain a random sample of 50 voters and determine which of those voters believe that gay/lesbian couples should be allowed to marry (“Success” = x). Describe the sampling distribution of the sample proportion for registered voters who believe that gay and lesbian couples should be allowed to marry. Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 8-44 Proportion Marriage Survey np(1 – p) = npq = 50(0.42)(0.58) = 12.18 ≥ 10. Thus the sampling distribution of the sample proportion is therefore approximately normal with: mean = 0.42 (42% of the 50 = Success, 58% of the 50 = Fail) and standard deviation = (0.42)(0.58) 0.0698 6.98% 50 Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 8-45 Compute Probabilities of a Sample Proportion According to the Centers for Disease Control and Prevention (CDC) in 2004, 18.8% of school-aged children (aged 611 yrs) were overweight. (a) In a random sample of 90 school-aged children, what is the probability that at least 19% are overweight? (b) Suppose in a random sample of 90 school-aged children you find 25 overweight children. What might you conclude? Note: we are dealing with % of children, not quantity of children, because this variable is proportion. Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 8-46 Porky Pigs: a) In sample of 90 children, what is prob at least 19% are overweight? 1. npq = 90(0.188)(0.812) ≈ 13.739 ≥ 10 So, pˆ is approximately normal with mean = 0.188 and standard deviation = (0.188)(0.812) 0.0412 90 0.19 0.188 Z 0.0485 0.0412 P( Z 0.0485) 0.5193 Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 8-47 Porky Pigs: b) In a sample of 90, you find 25 overweight children. What might you conclude? 1. As before, pˆ is approximately normal with mean = 0.188 and standard deviation = 0.0412 0.2778 0.188 25 Z 2.179 pˆ 0.2778 0.0412 90 2.normcdf (2.179, 1E99) = 0.0147 = prob of finding 25/90 O/W 3.If the true population proportion is 0.188 (18.8% O/W), then the prob of finding 25/90 or 27.8% o/w children in a sample is “unusual” (any data point outside 2σ). 4.If we repeated this experiment 100 times, we would only expect to get this many O/W children (25/90) approx 1 or 2 times. Copyright © 2013, 2010 and 2007 Pearson Education, Inc. 8-48 Chap 2 49