Chapter 8: Sampling Variability & Sampling Distributions

8.1: Basic Terms

Any quantity computed from values in a sample is called a statistic. The observed value of a statistic depends on the particular sample selected from the population; typically, it varies from sample to sample. This variability is called sampling variability.

Sampling Distribution

The distribution of a statistic is called its sampling distribution. So you could have a sampling distribution of a mean, median, max, min, etc.

Example

Consider a population that consists of the numbers 1, 2, 3, 4 and 5, generated in a manner that the probability of each of those values is 0.2 no matter what the previous selections were. This population could be described as the outcome associated with a spinner.

[Figure: spinner]

The distribution is:

x      1     2     3     4     5
p(x)   0.2   0.2   0.2   0.2   0.2

Example

If the sampling distribution for the means of samples of size two is analyzed, there are 25 equally likely samples. Each entry below is the mean x̄ of the sample (first value, second value):

                 second value
first value    1     2     3     4     5
     1         1     1.5   2     2.5   3
     2         1.5   2     2.5   3     3.5
     3         2     2.5   3     3.5   4
     4         2.5   3     3.5   4     4.5
     5         3     3.5   4     4.5   5

Counting how often each mean occurs gives the sampling distribution:

x̄           1      1.5    2      2.5    3      3.5    4      4.5    5      Total
frequency   1      2      3      4      5      4      3      2      1      25
p(x̄)        0.04   0.08   0.12   0.16   0.20   0.16   0.12   0.08   0.04

Example

The original distribution and the sampling distribution of means of samples with n = 2 are given below.

[Figure: probability histograms of the original distribution and of the sampling distribution for n = 2]

Example

Sampling distributions for n = 3 and n = 4 were calculated and are illustrated below.

[Figure: probability histograms of the sampling distributions for n = 3 and n = 4]

Simulations

To illustrate the general behavior of samples of fixed size n, 10000 samples each of size 30, 60 and 120 were generated from this uniform distribution and the means calculated. Probability histograms were created for each of these (simulated) sampling distributions. Notice all three of these look to be essentially normally distributed.
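The enumeration for n = 2 and the simulation for n = 30, 60 and 120 described above can be reproduced with a short script. What follows is a minimal sketch, not the procedure used to produce the slides: the function names `exact_sampling_distribution` and `simulated_means`, the fixed seed, and the use of Python's standard library are all assumptions made for illustration.

```python
import itertools
import random
import statistics
from collections import Counter
from fractions import Fraction

# Population from the example: each value has probability 0.2.
POPULATION = [1, 2, 3, 4, 5]


def exact_sampling_distribution(n):
    """Exact distribution of the sample mean for n independent draws.

    Enumerates all 5**n equally likely samples, so it is only
    practical for small n.
    """
    counts = Counter(
        Fraction(sum(sample), n)
        for sample in itertools.product(POPULATION, repeat=n)
    )
    total = len(POPULATION) ** n
    return {float(mean): Fraction(count, total)
            for mean, count in sorted(counts.items())}


def simulated_means(n, reps=10000, seed=0):
    """Means of `reps` simulated samples of size n (seed is an assumption)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(POPULATION, k=n))
            for _ in range(reps)]


# Exact sampling distribution for n = 2.
for mean, prob in exact_sampling_distribution(2).items():
    print(f"mean = {mean:<4} p = {float(prob):.2f}")

# Simulated sampling distributions for the larger sample sizes.
for n in (30, 60, 120):
    means = simulated_means(n)
    print(f"n = {n:>3}: average of means = {statistics.fmean(means):.3f}, "
          f"std. dev. of means = {statistics.stdev(means):.3f}")
```

For n = 2 the enumeration returns the nine possible means 1, 1.5, ..., 5 with probabilities 1/25, 2/25, ..., 5/25, ..., 1/25, which is the table given earlier. The simulated figures depend on the seed, so no specific values are quoted here.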
Further, note that the variability decreases as the sample size increases.

[Figure: probability histograms of the simulated sample means for n = 30, n = 60 and n = 120]

Simulations

To further illustrate the general behavior of samples of fixed size n, 10000 samples each of size 4, 16 and 32 were generated from the positively skewed distribution pictured below.

[Figure: skewed distribution]

Notice that these sampling distributions are all skewed, but as n increases, the sampling distributions became more symmetric and eventually appeared to be almost normally distributed.

8.2: Terminology

Let $\bar{x}$ denote the mean of the observations in a random sample of size n from a population having mean μ and standard deviation σ. Denote the mean value of the $\bar{x}$ distribution by $\mu_{\bar{x}}$ and the standard deviation of the $\bar{x}$ distribution by $\sigma_{\bar{x}}$ (called the standard error of the mean). Then the following rules hold.

Properties of the Sampling Distribution of the Sample Mean

Rule 1: $\mu_{\bar{x}} = \mu$

Rule 2: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$. This rule is approximately correct as long as no more than 10% of the population is included in the sample.

Rule 3: When the population distribution is normal, the sampling distribution of $\bar{x}$ is also normal for any sample size n.

Central Limit Theorem

Rule 4: When n is sufficiently large, the sampling distribution of $\bar{x}$ is approximately normally distributed, even when the population distribution is not itself normal.

Illustrations of Sampling Distributions

[Figure: symmetric, normal-like population and the sampling distributions of $\bar{x}$ for n = 4, n = 9 and n = 16]

[Figure: skewed population and the sampling distributions of $\bar{x}$ for n = 4, n = 10 and n = 30]

More about the Central Limit Theorem

The Central Limit Theorem can safely be applied when n exceeds 30. If n is large or the population distribution is normal, the standardized variable

$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

has (approximately) a standard normal (z) distribution.

Example

A food company sells “18 ounce” boxes of cereal. Let x denote the actual amount of cereal in a box.
Suppose that x is normally distributed with μ = 18.03 ounces and σ = 0.05 ounces.

a) What proportion of the boxes will contain less than 18 ounces?

$$P(x < 18) = P\left(z < \frac{18 - 18.03}{0.05}\right) = P(z < -0.60) = 0.2743$$

Example - continued

b) A case consists of 24 boxes of cereal. What is the probability that the mean amount of cereal (per box in a case) is less than 18 ounces?

Because the population distribution is normal, Rule 3 says that the sampling distribution of $\bar{x}$ is normal for any sample size, so the Central Limit Theorem is not needed here even though n = 24 is below 30. Therefore

$$P(\bar{x} < 18) = P\left(z < \frac{18 - 18.03}{0.05 / \sqrt{24}}\right) = P(z < -2.94) = 0.0016$$

8.3: Some proportion distributions where π = 0.2

Let p be the proportion of successes in a random sample of size n from a population whose proportion of S’s (successes) is π.

[Figure: sampling distributions of p with π = 0.2 for n = 10, n = 20, n = 50 and n = 100]

Properties of the Sampling Distribution of p

Denote the mean of p by $\mu_p$ and the standard deviation by $\sigma_p$ (which is the standard error of the proportion). Then the following rules hold.

Rule 1: $\mu_p = \pi$

Rule 2: $\sigma_p = \sqrt{\dfrac{\pi(1 - \pi)}{n}}$

Rule 3: When n is large and π is not too near 0 or 1, the sampling distribution of p is approximately normal.

With these rules we can calculate a z score for a sample proportion:

$$z = \frac{p - \mu_p}{\sigma_p} = \frac{p - \pi}{\sqrt{\pi(1 - \pi)/n}}$$

Condition for Use

The further the value of π is from 0.5, the larger n must be for the normal approximation to the sampling distribution of p to be accurate.

Rule of Thumb: If both nπ ≥ 10 and n(1 − π) ≥ 10, then it is safe to use a normal approximation. Put another way, we need at least 10 expected successes and at least 10 expected failures to say the distribution is approximately normal.

Example

If the true proportion of defectives produced by a certain manufacturing process is 0.08 and a sample of 400 is chosen, what is the probability that the proportion of defectives in the sample is greater than 0.10?
Since nπ = 400(0.08) = 32 > 10 and n(1 − π) = 400(0.92) = 368 > 10, it’s reasonable to use the normal approximation.

Example (continued)

$$\mu_p = \pi = 0.08$$

$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.08(1 - 0.08)}{400}} = 0.013565$$

$$z = \frac{p - \mu_p}{\sigma_p} = \frac{0.10 - 0.08}{0.013565} = 1.47$$

$$P(p > 0.10) = P(z > 1.47) = 1 - 0.9292 = 0.0708$$

Example

Suppose 3% of the people contacted by phone are receptive to a certain sales pitch and buy your product. If your sales staff contacts 2000 people, what is the probability that more than 100 of the people contacted will purchase your product?

Here π = 0.03 and p = 100/2000 = 0.05, so

$$P(p > 0.05) = P\left(z > \frac{0.05 - 0.03}{\sqrt{\dfrac{(0.03)(0.97)}{2000}}}\right) = P\left(z > \frac{0.05 - 0.03}{0.0038145}\right) = P(z > 5.24) \approx 0$$

Example - continued

If your sales staff contacts 2000 people, what is the probability that fewer than 50 of the people contacted will purchase your product?

Now π = 0.03 and p = 50/2000 = 0.025, so

$$P(p < 0.025) = P\left(z < \frac{0.025 - 0.03}{\sqrt{\dfrac{(0.03)(0.97)}{2000}}}\right) = P\left(z < \frac{0.025 - 0.03}{0.0038145}\right) = P(z < -1.31) = 0.0951$$
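The z-score calculations in the worked examples of this chapter can be checked numerically. The sketch below is illustrative only: the helper names `z_for_mean`, `z_for_proportion` and `normal_approximation_ok` are hypothetical, and the normal probabilities come from Python's `statistics.NormalDist` rather than a printed z table. Because the slides round z to two decimals before the table lookup, the computed probabilities can differ from the slide values in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_for_mean(x_bar, mu, sigma, n):
    """z score for a sample mean: (x_bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / (sigma / sqrt(n))


def z_for_proportion(p, pi, n):
    """z score for a sample proportion: (p - pi) / sqrt(pi(1 - pi) / n)."""
    return (p - pi) / sqrt(pi * (1 - pi) / n)


def normal_approximation_ok(pi, n, threshold=10):
    """Rule of thumb: n*pi and n*(1 - pi) should both be at least 10."""
    return n * pi >= threshold and n * (1 - pi) >= threshold


# Cereal example: a single box (n = 1) and the mean of a 24-box case.
z_box = z_for_mean(18, 18.03, 0.05, 1)
z_case = z_for_mean(18, 18.03, 0.05, 24)
print("P(x < 18)     =", round(STANDARD_NORMAL.cdf(z_box), 4))
print("P(x_bar < 18) =", round(STANDARD_NORMAL.cdf(z_case), 4))

# Defectives example: pi = 0.08, n = 400, P(p > 0.10).
assert normal_approximation_ok(0.08, 400)
z_defect = z_for_proportion(0.10, 0.08, 400)
print("P(p > 0.10)   =", round(1 - STANDARD_NORMAL.cdf(z_defect), 4))

# Sales example: pi = 0.03, n = 2000, more than 100 and fewer than 50 buyers.
assert normal_approximation_ok(0.03, 2000)
z_more = z_for_proportion(100 / 2000, 0.03, 2000)
z_fewer = z_for_proportion(50 / 2000, 0.03, 2000)
print("P(p > 0.05)   =", 1 - STANDARD_NORMAL.cdf(z_more))
print("P(p < 0.025)  =", round(STANDARD_NORMAL.cdf(z_fewer), 4))
```

Keeping the rule-of-thumb check as a separate function makes it easy to confirm that the normal approximation is reasonable before trusting a computed probability.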