Lecture 9: Sampling Distributions

Background

* We want to learn about a feature of a population (a parameter).
* In many situations it is impossible to examine every element of a population, because elements are physically inaccessible, examination is too costly, or the examination may destroy the item.
* A sample is a relatively small subset of the total population.
* We study a random sample to draw conclusions about a population; this is where statistics comes into the picture.
* Statistics such as the sample mean and sample variance are computed from sample measurements and vary from sample to sample. Therefore, they are random variables.
* The probability distribution of a statistic is called a sampling distribution.

Sampling Distributions

A sampling distribution is the distribution of all possible values of a statistic for samples of a given size selected from a population.

* Sampling Distribution of the Mean
* Sampling Distribution of the Proportion

Developing a Sampling Distribution

Assume there is a population of size $N = 4$ with members A, B, C, D. The random variable $X$ is the age of an individual. Values of $X$: 18, 20, 22, 24 (years).

Summary measures for the population distribution:

$$\mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$$

$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236$$

[Figure: population distribution, $P(x)$ against $x$ = 18, 20, 22, 24 for A, B, C, D]

Sampling with replacement (samples of size $n = 2$):

| Sample | Ages | Sample mean |
| A, A | 18, 18 | 18 |
| A, B | 18, 20 | 19 |
| A, C | 18, 22 | 20 |
| A, D | 18, 24 | 21 |
| B, A | 20, 18 | 19 |
| B, B | 20, 20 | 20 |
| B, C | 20, 22 | 21 |
| B, D | 20, 24 | 22 |
| C, A | 22, 18 | 20 |
| C, B | 22, 20 | 21 |
| C, C | 22, 22 | 22 |
| C, D | 22, 24 | 23 |
| D, A | 24, 18 | 21 |
| D, B | 24, 20 | 22 |
| D, C | 24, 22 | 23 |
| D, D | 24, 24 | 24 |

Sampling Distribution of All Sample Means

The 16 sample means, by first and second observation:

| 1st observation | 2nd = 18 | 2nd = 20 | 2nd = 22 | 2nd = 24 |
| 18 | 18 | 19 | 20 | 21 |
| 20 | 19 | 20 | 21 | 22 |
| 22 | 20 | 21 | 22 | 23 |
| 24 | 21 | 22 | 23 | 24 |

[Figure: sample means distribution, $P(\bar{X})$ against $\bar{X}$ = 18, 19, ..., 24]

Summary measures of this sampling distribution (note that $N = 16$ for the population of sample means):

$$\mu_{\bar{X}} = \frac{\sum \bar{X}_i}{N} = \frac{18 + 19 + \cdots + 24}{16} = 21$$

$$\sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$$

Comparing the Population with its Sampling Distribution (with replacement)

* Population ($N = 4$): $\mu = 21$, $\sigma = 2.236$
* Sample means distribution ($n = 2$): $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = 1.58$

[Figure: population distribution $P(X)$ over 18, 20, 22, 24 next to the sample means distribution $P(\bar{X})$ over 18, 19, ..., 24]

Mean and standard error of the sample mean (sampling with replacement)

The mean of the distribution of the sample mean is

$$\mu_{\bar{X}} = \mu$$

A measure of the variability in the mean from sample to sample is given by the standard error of the mean:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

This assumes that sampling is with replacement, or without replacement from an infinite population. Note that the standard error of the mean decreases as the sample size increases.

If the Population is Normal

If a population is normal with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of $\bar{X}$ is also normally distributed with

$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

Or, equivalently, the sampling distribution of $\sum_{i=1}^{n} X_i$ is normally distributed with

$$\mu_{\sum X_i} = n\mu \quad \text{and} \quad \sigma_{\sum X_i} = \sqrt{n}\,\sigma$$

Sampling Distribution Properties (continued)

As $n$ increases, $\sigma_{\bar{x}}$ decreases.

[Figure: sampling distributions of $\bar{x}$ centered at $\mu$; the curve for the larger sample size is narrower than the curve for the smaller sample size]

If the Population is not normal

The central limit theorem states that when the number of observations in each sample (called the sample size) gets large enough, the sampling distribution of $\bar{X}$ is approximately normally distributed with

$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

Or, equivalently, the sampling distribution of $\sum_{i=1}^{n} X_i$ is also approximately normally distributed with

$$\mu_{\sum X_i} = n\mu \quad \text{and} \quad \sigma_{\sum X_i} = \sqrt{n}\,\sigma$$

Z value for means

Standardize the sample mean:

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

Visualizing the Central Limit Theorem

Sampling distribution properties:

* Central tendency: $\mu_{\bar{x}} = \mu$
* Variation: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$

[Figure: a population distribution and the sampling distribution of $\bar{x}$, which becomes normal as $n$ increases; curves shown for a larger and a smaller sample size]

How Large is Large Enough?
* For most distributions, $n > 30$ will give a sampling distribution that is nearly normal.
* For fairly symmetric distributions, $n > 15$ is enough.
* Recall that, for normal population distributions, the sampling distribution of the mean is always normally distributed, regardless of the sample size $n$.

Calculating probabilities

Suppose we want to find $P(a \le \bar{X} \le b)$.

If the population is normal, then regardless of the value of $n$:

$$P(a \le \bar{X} \le b) = P\left(\frac{a - \mu}{\sigma / \sqrt{n}} \le Z \le \frac{b - \mu}{\sigma / \sqrt{n}}\right)$$

If the population is not normal, then, when $n$ is large enough ($n > 30$):

$$P(a \le \bar{X} \le b) \approx P\left(\frac{a - \mu}{\sigma / \sqrt{n}} \le Z \le \frac{b - \mu}{\sigma / \sqrt{n}}\right)$$

Example

Suppose a population has mean $\mu = 10$ and standard deviation $\sigma = 3$. Suppose a random sample of size $n = 36$ is selected. What is the probability that the sample mean is between 9.7 and 10.3?

Example (continued)

Solution: Even if the population is not normally distributed, the central limit theorem can be used ($n > 30$), so the sampling distribution of $\bar{x}$ is approximately normal with mean $\mu_{\bar{x}} = 10$ and standard deviation

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$$

Solution (continued):

$$P(9.7 \le \bar{X} \le 10.3) = P\left(\frac{9.7 - 10}{3 / \sqrt{36}} \le \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \le \frac{10.3 - 10}{3 / \sqrt{36}}\right) = P(-0.6 \le Z \le 0.6) = 0.4515$$

[Figure: a population distribution of unknown shape with $\mu = 10$; the sampling distribution of $\bar{X}$ with $\mu_{\bar{X}} = 10$ and the interval from 9.7 to 10.3 marked]

One more example

Time spent using e-mail per session is normally distributed with $\mu = 8$ minutes and $\sigma = 2$ minutes.

1. If a random sample of 25 sessions were selected, what proportion of the sample means would be between 7.8 and 8.2 minutes?
2. If a random sample of 100 sessions were selected, what proportion of the sample means would be between 7.8 and 8.2 minutes?
3. What sample size would you suggest if it is desired to have at least 0.90 probability that the sample mean is within 0.2 of the population mean?
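The probability calculations above can be checked numerically. The following is a minimal sketch, not part of the original lecture, that standardizes the interval endpoints with the standard error of the mean and uses `statistics.NormalDist` from the Python standard library. The helper names `prob_mean_between` and `min_sample_size` are made up for illustration, and the e-mail example is read as having population mean 8 minutes and standard deviation 2 minutes.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def prob_mean_between(a, b, mu, sigma, n):
    """P(a <= sample mean <= b) under the normal sampling distribution."""
    se = sigma / sqrt(n)  # standard error of the mean
    return STD_NORMAL.cdf((b - mu) / se) - STD_NORMAL.cdf((a - mu) / se)


def min_sample_size(margin, sigma, prob):
    """Smallest n with P(|sample mean - mu| <= margin) >= prob."""
    z = STD_NORMAL.inv_cdf((1 + prob) / 2)  # two-sided critical value
    return ceil((z * sigma / margin) ** 2)


# Example with mu = 10, sigma = 3, n = 36
p_n36 = prob_mean_between(9.7, 10.3, mu=10, sigma=3, n=36)

# E-mail example (assumed mu = 8, sigma = 2), questions 1 to 3
p_n25 = prob_mean_between(7.8, 8.2, mu=8, sigma=2, n=25)
p_n100 = prob_mean_between(7.8, 8.2, mu=8, sigma=2, n=100)
n_needed = min_sample_size(margin=0.2, sigma=2, prob=0.90)

print(f"n = 36:  {p_n36:.4f}")
print(f"n = 25:  {p_n25:.4f}")   # about 0.3829
print(f"n = 100: {p_n100:.4f}")  # about 0.6827
print(f"minimum n for question 3: {n_needed}")  # 271
```

Quadrupling the sample size from 25 to 100 halves the standard error (0.4 to 0.2 minutes), which is why the same interval captures a much larger share of sample means.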
Sampling Distribution of the Proportion

Population Proportions

In Bernoulli trials, let $\pi$ = the proportion of successes. Recall that $Y$ = the number of successes in $n$ Bernoulli trials follows $\mathrm{Bin}(n, \pi)$.

For the $i$th Bernoulli trial, define

$$X_i = \begin{cases} 1 & \text{if the } i\text{th outcome is a "success"} \\ 0 & \text{if the } i\text{th outcome is a "failure"} \end{cases}$$

Then

$$E(X_i) = \pi \quad \text{and} \quad \sigma(X_i) = \sqrt{\pi(1 - \pi)}$$

Population proportions (Cont'd)

For large $n$, apply the CLT to the sample mean and the sum (the second argument of $N$ is the variance):

$$p = \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \text{ is approximately distributed as } N\left(\pi, \frac{\pi(1 - \pi)}{n}\right)$$

$$Y = \sum_{i=1}^{n} X_i \text{ is approximately distributed as } N\left(n\pi, n\pi(1 - \pi)\right)$$

How large is large? $n\pi \ge 5$ and $n(1 - \pi) \ge 5$, or $np \ge 5$ and $n(1 - p) \ge 5$.

Z-Value for Proportions

Standardize $p$ to a Z value with the formula:

$$Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$$

Example

If the true proportion of voters who support Proposition A is $\pi = 0.4$, what is the probability that a sample of size 200 yields a sample proportion between 0.40 and 0.45? That is, if $\pi = 0.4$ and $n = 200$, what is $P(0.40 \le p \le 0.45)$?

Example (continued)

Find $\sigma_p$:

$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.4(1 - 0.4)}{200}} = 0.03464$$

Convert to standard normal:

$$P(0.40 \le p \le 0.45) = P\left(\frac{0.40 - 0.40}{0.03464} \le Z \le \frac{0.45 - 0.40}{0.03464}\right) = P(0 \le Z \le 1.44) = 0.4251$$

Review example

The number of claims received by an automobile insurance company on collision insurance on one day follows the following probability distribution:

| x | 0 | 1 | 2 | 3 | 4 |
| p(x) | 0.65 | 0.2 | 0.1 | 0.03 | 0.02 |

with

$$\mu = \sum_{x=0}^{4} x\,p(x) = 0.57, \qquad \sigma = \sqrt{\sum_{x=0}^{4} x^2 p(x) - \mu^2} = 0.93$$

Suppose the number of claims received is independent from day to day.

Review example (cont'd)

For a 50-day period, find the probability of the following events:

1) The total number of claims exceeds 20
2) On more than 20 days, at least one claim is received

Sampling distribution of difference of two independent populations

An important estimation problem involves the comparison of the means of two populations.
For example, you may want to make comparisons like these:

* The average scores on the GRE for students who majored in mathematics versus chemistry
* The average income for male and female college graduates
* The proportion of patients receiving different medications who recovered from a certain disease

Sampling distributions of difference of two independent sample means

Suppose there are two populations:

| Population | Mean | S.d. |
| I | $\mu_1$ | $\sigma_1$ |
| II | $\mu_2$ | $\sigma_2$ |

Independent random samples of $n_1$ and $n_2$ observations have been selected from the two populations, with sample means $\bar{X}_1$ and $\bar{X}_2$ respectively.

Recall that when $n_1$ and $n_2$ are large, $\bar{X}_1$ and $\bar{X}_2$ are approximately normally distributed with

$$E(\bar{X}_1) = \mu_1, \quad \sigma_{\bar{X}_1} = \frac{\sigma_1}{\sqrt{n_1}}$$

$$E(\bar{X}_2) = \mu_2, \quad \sigma_{\bar{X}_2} = \frac{\sigma_2}{\sqrt{n_2}}$$

Since the two samples are independent:

$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2, \qquad \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Standardize:

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

Example

A light bulb factory operates two different types of machines. The mean life expectancy is 385 hours for bulbs from machine I and 365 hours for bulbs from machine II. The process standard deviation of life expectancy is 110 hours for machine I and 120 hours for machine II. What is the probability that the average life expectancy of a random sample of 100 light bulbs from machine I is shorter than the average life expectancy of 100 light bulbs from machine II?

Example (Cont'd)

Note that

$$n_1 = 100, \; n_2 = 100, \quad \mu_1 = 385, \; \mu_2 = 365, \quad \sigma_1 = 110, \; \sigma_2 = 120$$

Therefore

$$P(\bar{X}_1 - \bar{X}_2 < 0) = P\left(Z < \frac{0 - (385 - 365)}{\sqrt{\dfrac{110^2}{100} + \dfrac{120^2}{100}}}\right) = P\left(Z < \frac{-20}{16.28}\right) = P(Z < -1.23) = 0.1093$$

Sampling distribution of difference of two independent sample proportions

Assume that independent random samples of $n_1$ and $n_2$ observations have been selected from binomial populations with parameters $\pi_1$ and $\pi_2$, respectively.
The sampling distribution of the difference in sample proportions $(p_1 - p_2)$ can be approximated by a normal distribution with mean and standard deviation

$$\mu_{p_1 - p_2} = \pi_1 - \pi_2, \qquad \sigma_{p_1 - p_2} = \sqrt{\frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}}$$

The Z statistic is

$$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\dfrac{\pi_1(1 - \pi_1)}{n_1} + \dfrac{\pi_2(1 - \pi_2)}{n_2}}}$$

Example

From a study by the Charles Schwab Corporation, 74% of African Americans and 84% of Whites with an annual income above $50,000 owned stocks. For a random sample of 500 African Americans and a random sample of 500 Whites with income above $50,000, what is the probability that more Whites own stocks?

Example (Cont'd)

Summary data:

$$n_1 = 500, \; n_2 = 500, \quad \pi_1 = 0.74, \; \pi_2 = 0.84$$

It follows that

$$P(p_2 - p_1 > 0) = P\left(Z > \frac{0 - (0.84 - 0.74)}{\sqrt{\dfrac{0.74 \times 0.26}{500} + \dfrac{0.84 \times 0.16}{500}}}\right) = P(Z > -3.91) = 0.99995$$

Important: Summary of sampling distributions

(The second argument of $N$ is the variance.)

| Parameter | Point estimate | Sampling distribution | Standardized Z |
| $\mu$ | $\bar{X}$ | $N\left(\mu, \frac{\sigma^2}{n}\right)$ | $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ |
| $\pi$ | $p$ | $N\left(\pi, \frac{\pi(1 - \pi)}{n}\right)$ | $Z = \frac{p - \pi}{\sqrt{\pi(1 - \pi) / n}}$ |
| $\mu_1 - \mu_2$ | $\bar{X}_1 - \bar{X}_2$ | $N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$ | $Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}}$ |
| $\pi_1 - \pi_2$ | $p_1 - p_2$ | $N\left(\pi_1 - \pi_2, \frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}\right)$ | $Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\pi_1(1 - \pi_1) / n_1 + \pi_2(1 - \pi_2) / n_2}}$ |

Sampling methods

* Simple random samples
* Stratified samples

Simple Random Samples

* Every individual or item from the frame has an equal chance of being selected
* Selection may be with replacement or without replacement
* Samples are obtained from a table of random numbers or from computer random number generators
* Simple to use
* May not be a good representation of the population's underlying characteristics

Stratified Samples

* Divide the population into two or more subgroups (called strata) according to some common characteristic
* A simple random sample is selected from each subgroup, with sample sizes proportional to strata sizes
* Samples from the subgroups are combined into one
* Ensures representation of individuals across the entire population

[Figure: a population divided into 4 strata, with a sample drawn from each]

Types of Survey Errors (continued)

* Coverage error: excluded from frame
* Nonresponse error: follow up on nonresponses
* Sampling error: random differences from sample to sample
* Measurement error: bad or leading question
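As a small illustration of the two sampling methods described earlier in this lecture, the sketch below draws a simple random sample and a proportionally allocated stratified sample using Python's standard `random` module. It is only one possible way to do this; the function names, the three made-up strata "A", "B", "C", and the seed are assumptions for illustration, not part of the lecture.

```python
import random


def simple_random_sample(frame, n, replace=False, rng=random):
    """Draw n items from the frame; every item is equally likely to be chosen."""
    if replace:
        return [rng.choice(frame) for _ in range(n)]
    return rng.sample(frame, n)  # without replacement


def stratified_sample(strata, n, rng=random):
    """Sample each stratum in proportion to its size, then combine the results."""
    total = sum(len(items) for items in strata.values())
    combined = []
    for name, items in strata.items():
        # Proportional allocation; rounding means the sizes may not add up
        # to exactly n when the proportions are not whole numbers.
        n_h = round(n * len(items) / total)
        combined.extend((name, item) for item in rng.sample(items, n_h))
    return combined


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical frame of 100 items split into strata of sizes 50, 30 and 20
strata = {
    "A": list(range(0, 50)),
    "B": list(range(50, 80)),
    "C": list(range(80, 100)),
}
frame = [item for items in strata.values() for item in items]

srs = simple_random_sample(frame, 5, rng=rng)
srs_with_replacement = simple_random_sample(frame, 5, replace=True, rng=rng)
strat = stratified_sample(strata, 10, rng=rng)

counts = {name: sum(1 for s, _ in strat if s == name) for name in strata}
print(counts)  # {'A': 5, 'B': 3, 'C': 2}
```

The stratified sample always contains 5, 3 and 2 items from the three strata, whereas a simple random sample of 10 from the same frame could by chance miss the smallest stratum entirely.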