Chapter 7 Sampling Distribution Summary Statistics

Chapter 7 concerns summary statistics (a one-number summary of a random sample) and sampling distributions (the distribution of a summary statistic computed from many samples of a population). The two summary statistics analyzed in this chapter, along with their sampling distributions, are the sample mean (Section 7.2) and the proportion of successes in each sample from a binomial distribution (Section 7.3).

In Section 7.1, sampling distributions are generated by simulation for samples of size n. The four steps are:
1. Take a random sample of fixed size n from the population.
2. Compute a summary statistic (often either the mean or the proportion of successes).
3. Repeat steps 1 and 2 many times.
4. Display the distribution of the summary statistics (the sampling distribution).

In analyzing the sampling distribution we typically compute its mean ($\mu_{\bar{x}}$ if the summary statistic is the mean, $\mu_{\hat{p}}$ if it is the proportion of successes) and its standard deviation, which is called the standard error (SE) of the sampling distribution. The standard error is denoted $\sigma_{\bar{x}}$ if the summary statistic is the mean and $\sigma_{\hat{p}}$ if it is the proportion of successes.

In general, if the original population is normal, the sampling distribution will also be normal. If the population is skewed, the sampling distribution becomes more and more nearly normal as the sample size, n, increases.

Reasonably likely values for a summary statistic are the middle 95% of the values in a sampling distribution. Rare events are those in the lower 2.5% or upper 2.5% of the sampling distribution. If the sampling distribution is normal and it is standardized (transformed to have a mean of 0 and a standard deviation of 1), the middle 95% corresponds to z-scores between -1.96 and +1.96 (approximately ±2 standard deviations).
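The four simulation steps can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's own procedure: the function names are hypothetical, the population is an assumed right-skewed (exponential) population generated for the example, and samples are drawn with replacement, which approximates sampling from a large population. The simulated standard error can then be compared with $\sigma/\sqrt{n}$.

```python
import random
import statistics

def sampling_distribution_of_mean(population, n, repetitions=5000, seed=1):
    """Steps 1-3: draw a random sample of size n, record its mean, repeat."""
    rng = random.Random(seed)
    means = []
    for _ in range(repetitions):
        sample = rng.choices(population, k=n)      # step 1 (with replacement)
        means.append(statistics.fmean(sample))     # step 2: summary statistic
    return means                                   # step 3: many repetitions

def middle_95(values):
    """Bounds of the reasonably likely values (2.5th and 97.5th percentiles)."""
    cuts = statistics.quantiles(values, n=40)      # 39 cut points in 2.5% steps
    return cuts[0], cuts[-1]

def text_histogram(values, bins=10, width=40):
    """Step 4: a crude text display of the sampling distribution."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / step), bins - 1)  # put the maximum in the last bin
        counts[idx] += 1
    peak = max(counts)
    for i, c in enumerate(counts):
        print(f"{lo + i * step:7.3f} | " + "#" * round(width * c / peak))

if __name__ == "__main__":
    # Hypothetical right-skewed population (exponential, mean about 1).
    pop_rng = random.Random(0)
    population = [pop_rng.expovariate(1.0) for _ in range(20_000)]
    mu = statistics.fmean(population)
    sigma = statistics.pstdev(population)

    n = 30
    means = sampling_distribution_of_mean(population, n)
    print(f"population mean {mu:.3f}, mean of sample means {statistics.fmean(means):.3f}")
    print(f"sigma/sqrt(n) {sigma / n ** 0.5:.3f}, simulated SE {statistics.stdev(means):.3f}")
    lo95, hi95 = middle_95(means)
    print(f"middle 95%: {lo95:.3f} to {hi95:.3f}")
    text_histogram(means)
```

Even though the population is skewed, the histogram of sample means for n = 30 should look roughly bell-shaped, and its spread should be close to $\sigma/\sqrt{n}$.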
Section 7.2 Sampling Distribution of the Sample Mean

Section 7.2 considers sampling distributions in which the summary statistic is the sample mean. The notation used for the population, individual samples, and the sampling distribution is:

Mean: population parameter $\mu$; sample statistic $\bar{x}$; sampling distribution $\mu_{\bar{x}}$
Standard deviation: population parameter $\sigma$; sample statistic $s$; sampling distribution $\sigma_{\bar{x}}$ (or SE)
Size: population $N$; sample $n$

The relationships between the population parameters and the sampling distribution are:

$$\mu_{\bar{x}} = \mu$$
(the mean of the sampling distribution equals the mean of the population)

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
(as the sample size, n, increases, the standard error decreases)

The Central Limit Theorem states that as the sample size, n, increases, the sampling distribution of the mean becomes more nearly normal.

The property $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ is very useful because it makes it possible to determine the standard error for a sample of size n without performing a simulation.

As an example of how this might be used, say you own a catering company and are deciding how much of a certain beverage to have available for a job catering 50 people. From previous experience, you find that each person drinks an average of 0.25 liters with a standard deviation of 0.12 liters. What is the probability that the average amount of beverage consumed per person will be 0.28 liters or more?

$$\mu_{\bar{x}} = \mu = 0.25 \text{ L}$$
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.12}{\sqrt{50}} = 0.01697 \text{ L}$$
$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{0.28 - 0.25}{0.01697} = 1.77$$

A z-score of 1.77 corresponds to a cumulative probability of 0.9616, so there is a (1 - 0.9616) = 0.0384, or 3.84%, chance that the average consumption will be 0.28 L or more per person.

What are the likely values (middle 95%) of the average beverage amount you might expect to provide for each person? Remembering that the middle 95% corresponds to z-scores of ±1.96, we start from

$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$$

Rearranging this to solve for $\bar{x}$ and substituting ±1.96 for z gives

$$\bar{x} = \mu_{\bar{x}} \pm 1.96\,\sigma_{\bar{x}}$$

Substituting, we have

$$\bar{x} = 0.25 \pm 1.96(0.01697) = 0.217 \text{ to } 0.283 \text{ liters.}$$
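The catering calculation can be reproduced numerically. This is a minimal sketch that uses Python's standard-library `statistics.NormalDist` in place of a printed z-table; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 0.25, 0.12, 50          # liters per person; 50 guests

se = sigma / sqrt(n)                   # standard error of the mean
z = (0.28 - mu) / se                   # standardize a sample mean of 0.28 L
p_at_least = 1 - NormalDist().cdf(z)   # upper-tail probability P(mean >= 0.28)

z95 = NormalDist().inv_cdf(0.975)      # about 1.96
low, high = mu - z95 * se, mu + z95 * se

print(f"SE = {se:.5f} L, z = {z:.2f}, P(mean >= 0.28) = {p_at_least:.4f}")
print(f"middle 95%: {low:.3f} to {high:.3f} L")
```

Because `NormalDist` uses the unrounded z (about 1.768) rather than 1.77, the tail probability comes out slightly above the 0.0384 read from the table; the 95% range rounds to the same 0.217 to 0.283 L.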
In other words, it is likely that the average amount of beverage consumed per person will be between 0.217 and 0.283 liters.

Finding probabilities involving sample totals

Some problems are stated in terms of a total value rather than an average value. For example, instead of asking our previous question as

What is the probability that the average amount of beverage consumed by each of the 50 people will be 0.28 liters or more?

we could have asked

What is the probability that the total amount of beverage consumed by the 50 people will be 14 liters or more? (because 50 × 0.28 = 14 liters)

There are two ways to handle this "total value" problem. The first is simply to divide the total by the sample size, n, to get an average per person, and then proceed as in the problem above:

$$\bar{x} = \frac{14}{50} = 0.28$$

The second is to transform our equations for average values into equations for total values. The mean of the total, denoted $\mu_{SUM}$ in the text, is

$$\mu_{SUM} = n\mu$$

(in our example, 50 × 0.25 = 12.50 liters). The standard error of the total, denoted $\sigma_{SUM}$, is

$$\sigma_{SUM} = n\,\sigma_{\bar{x}} = n\frac{\sigma}{\sqrt{n}} = \sqrt{n}\,\sigma$$

The shape of the distribution will still be normal.

For our example problem, then, we have: You own a catering company and are analyzing how much beverage will be consumed. Based on previous experience, you have found that the average amount of beverage consumed is 0.25 liters per person with a standard deviation of 0.12 liters. What is the probability that the total amount of beverage consumed at a catering event for 50 people will be 14 liters or more?

$$\mu_{SUM} = n\mu = 50(0.25) = 12.50 \text{ liters}$$
$$\sigma_{SUM} = \sqrt{n}\,\sigma = \sqrt{50}(0.12) = 0.849 \text{ liters}$$
$$z = \frac{\text{sample sum} - \mu_{SUM}}{\sigma_{SUM}} = \frac{14 - 12.50}{0.849} = 1.77$$

Once again, a z-score of 1.77 corresponds to 0.9616, so there is a (1 - 0.9616) = 0.0384, or 3.84%, chance that the total consumption will be 14 liters or more.

The catering company owner could also determine what range of total beverage consumption is likely (that is, the middle 95%).
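The two methods for the total-value problem, and the middle-95% range of totals, can be checked side by side. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 0.25, 0.12, 50      # liters per person; 50 guests
total = 14.0                       # total liters in question

# Method 1: convert the total to a per-person average, use the mean's SE.
z_avg = (total / n - mu) / (sigma / sqrt(n))

# Method 2: work with the total directly.
mu_sum = n * mu                    # 12.50 L
sigma_sum = sqrt(n) * sigma        # about 0.849 L
z_sum = (total - mu_sum) / sigma_sum

p_at_least = 1 - NormalDist().cdf(z_sum)          # P(total >= 14 L)
z95 = NormalDist().inv_cdf(0.975)                 # about 1.96
low, high = mu_sum - z95 * sigma_sum, mu_sum + z95 * sigma_sum

print(f"z (average) = {z_avg:.4f}, z (total) = {z_sum:.4f}")
print(f"P(total >= 14 L) = {p_at_least:.4f}")
print(f"middle 95% of totals: {low:.2f} to {high:.2f} L")
```

Both methods give the same z-score, since dividing the numerator and denominator of the total-value z-score by n turns it into the average-value z-score.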
Setting z equal to ±1.96 and solving for the sample sum gives

$$\pm 1.96 = \frac{\text{sample sum} - \mu_{SUM}}{\sigma_{SUM}} \quad\Rightarrow\quad \text{sample sum} = \mu_{SUM} \pm 1.96\,\sigma_{SUM} = 12.50 \pm 1.96(0.849) = 10.84 \text{ to } 14.16 \text{ liters}$$

The owner of the catering company, then, can reasonably expect to need between 10.84 and 14.16 liters of the beverage.

Section 7.3 Sampling Distribution of the Sample Proportion

From Section 6.2, for a binomial distribution with proportion of successes p, the number of successes X in a sample of size n has mean

$$\mu_X = np$$

and standard deviation

$$\sigma_X = \sqrt{np(1-p)}$$

If both np and n(1 - p) are ≥ 10, this distribution has approximately a normal shape.

If the summary statistic for each sample is the sample proportion $\hat{p}$ (called "p-hat"), defined as

$$\hat{p} = \frac{\text{number of successes}}{\text{sample size}} = \frac{X}{n}$$

then the sampling distribution of the sample proportion has

$$\mu_{\hat{p}} = \frac{\mu_X}{n} = \frac{np}{n} = p$$

(the mean of the sampling distribution of the sample proportion is always equal to p) and

$$\sigma_{\hat{p}} = \frac{\sigma_X}{n} = \frac{\sqrt{np(1-p)}}{n} = \sqrt{\frac{p(1-p)}{n}}$$

(the spread decreases as the sample size, n, increases).

As the sample size increases, the sampling distribution becomes more nearly normal. As a general rule of thumb, if both np and n(1 - p) are at least 10, the shape can be treated as normal.

Example from book: About 60% of Mississippians use seat belts. Suppose your class conducts a survey of 40 randomly selected Mississippians.

a.) What is the chance that 75% or more of those selected wear seat belts?

np = 40(0.6) = 24 and n(1 - p) = 40(1 - 0.6) = 16, which are both ≥ 10, so we can treat the sampling distribution as normal.

$$\mu_{\hat{p}} = p = 0.60$$
$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.6(1-0.6)}{40}} = 0.0775$$

The z-score for this is:

$$z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \frac{0.75 - 0.60}{0.0775} = 1.94$$

which corresponds to 0.9738. That means there is a (1 - 0.9738) = 0.0262, or 2.62%, chance that 75% or more of a sample of 40 Mississippians will wear seat belts.

b.) Would it be quite unusual to find that fewer than 25% of the Mississippians selected wear seat belts?
For this case, $\hat{p} = 0.25$, so the z-score will be:

$$z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \frac{0.25 - 0.60}{0.0775} = -4.52$$

Since any z-score outside the range -1.96 to +1.96 is unusual, it would be very unusual to find that only 25% of Mississippians in a sample of size 40 wear seat belts.
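Both parts of the seat-belt example can be worked through in code. This is a minimal sketch; the helper names (`phat_distribution`, `z_score`, `is_unusual`) are hypothetical, and raising an error when np or n(1 - p) is below 10 is an assumed design choice that enforces the rule of thumb above rather than a textbook requirement.

```python
from math import sqrt
from statistics import NormalDist

def phat_distribution(p, n):
    """Mean and standard error of the sample proportion p-hat."""
    if n * p < 10 or n * (1 - p) < 10:
        raise ValueError("np and n(1 - p) should both be at least 10")
    return p, sqrt(p * (1 - p) / n)

def z_score(phat, p, n):
    """Standardize an observed sample proportion."""
    mean, se = phat_distribution(p, n)
    return (phat - mean) / se

def is_unusual(z, cutoff=1.96):
    """True if z falls outside the middle 95% of a normal sampling distribution."""
    return abs(z) > cutoff

p, n = 0.60, 40
z_a = z_score(0.75, p, n)               # part a
p_a = 1 - NormalDist().cdf(z_a)         # P(p-hat >= 0.75)
z_b = z_score(0.25, p, n)               # part b

print(f"a) z = {z_a:.2f}, P(p-hat >= 0.75) = {p_a:.4f}")
print(f"b) z = {z_b:.2f}, unusual: {is_unusual(z_b)}")
```

As in the catering example, the probability computed from the unrounded z differs slightly from the table value based on z = 1.94.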