Sampling and sampling distribution
September 11, 2016
STAT 151 Class 5

Outline of Topics
1 Sampling
2 Sampling distribution of a mean
3 Sampling distribution of a proportion

Statistical Inference
- Many economic and social decisions are based on figures for the entire population, e.g., how many homeless people are there? What is household income?
- A census, in which every unit in the population is studied, is the gold standard but very costly
- Statisticians instead use a representative portion of the population, a sample, to solve the problem
- The method of using a sample to study a population is called statistical inference

Population and sample
- Population: the set of all units of interest
  - Finite: the population size $N$ is enumerable
  - Infinite: $N$ is not finite (note that infinite $\neq$ "continuous" as in the definition of random variables)
- Sample: any subset of a population.
  The sample size $n$ can be as small as one unit of the population
- A finite population can be analysed as an infinite population if
  (1) $N$ is very big,
  (2) $n/N < 0.05$, or
  (3) $N$ is small but sampling is carried out with replacement
- We assume an infinite population, or a finite population satisfying (1), (2) or (3)

Parameters and statistics
- Every problem about a population can be characterised by some summaries called parameters, e.g.,
  - the proportion of homeless people
  - the mean income
- A statistic is the equivalent of a parameter, calculated from a sample, e.g.,
  - the proportion of homeless people in the sample
  - the sample mean income
- Parameters are usually unknown whereas statistics are known
- Inferential statistics uses a statistic to infer about a parameter

Common population quantities and sample counterparts
Parameter                                   Statistic
Probability distribution                    Histogram
(Population) mean, $\mu$                    (Sample) mean, $\bar{X}$
(Population) variance, $\sigma^2$           (Sample) variance, $s^2$
(Population) standard deviation, $\sigma$   (Sample) standard deviation, $s$
(Population) proportion, $p$                (Sample) proportion, $\hat{p}$

Simple random sample
- A simple random sample (SRS) is chosen in such a way that every member of the population has the same probability of being selected
- An SRS allows valid inference to be drawn because sampling is based on the principle of randomization, instead of leaving such decisions to human judgement
- We assume members of our sample are drawn independently from the population, so that each unit in the sample contributes a separate piece of information about the parameter of interest
- There are other sampling schemes but we focus on SRS here
- Hereafter, we refer to an SRS of independent observations as a "sample"

Sampling from a population
Population: $X_1, X_2, \cdots, X_N$
  $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$,  $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Sample: $X_1, X_2, \cdots, X_n$
  $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$,  $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}$

Sampling error
Example: sampling with replacement from a finite population
         Population                                                      Sample*
Units    $X_1, ..., X_7 = 1, 2, 3, 4, 5, 6, 7$                           $X_1, ..., X_5 = 3, 6, 5, 1, 6$
Size     $N = 7$                                                         $n = 5$
Mean     $\mu = \frac{X_1 + ... + X_N}{N} = \frac{1 + ... + 7}{7} = 4$   $\bar{X} = \frac{X_1 + ... + X_n}{n} = \frac{3 + 6 + 5 + 1 + 6}{5} = 4.2$
- $\bar{X} - \mu = 4.2 - 4 = 0.2 \equiv \times$ is called a sampling error
- Every sample of size $n$ is subject to sampling error because only a subset of the population is used to infer about the whole
- In practice, $\mu$ is unknown, hence $\times$ is also unknown and cannot be estimated
* $X_1, ..., X_5$ are generic symbols for five units randomly selected with replacement from the population; they are not necessarily the first five units in the population

Sampling distribution
Population: 1, 2, 3, 4, 5, 6, 7
Sample 1: 3, 6, 5, 1, 6    $\bar{X} = 4.2$    $4.2 - \mu = \times$
Sample 2: 1, 4, 6, 2, 2    $\bar{X} = 3$      $3 - \mu = \times$
Sample k: 4, 5, 6, 1, 7    $\bar{X} = 4.6$    $4.6 - \mu = \times$
Sampling distribution = distribution of $\times$ = distribution of $\bar{X}$
The distribution of sampling errors can be studied; it tells us the likely values of the sampling error when $\bar{X}$ is used to estimate $\mu$.
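The toy example above (population 1, ..., 7, samples of size 5 drawn with replacement) is small enough to enumerate completely. The following is a minimal Python sketch, not part of the original slides, that reproduces the sampling error of Sample 1 and builds the exact distribution of the sample mean over all possible ordered samples. The variable and function names (`sample_mean`, `xbar_counts`, `error_counts`) are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6, 7]
n = 5  # sample size used in the slides

# Population mean (known here only because the population is a toy example)
mu = Fraction(sum(population), len(population))


def sample_mean(sample):
    """Return the sample mean as an exact fraction."""
    return Fraction(sum(sample), len(sample))


# Sample 1 from the slides and its sampling error
sample_1 = (3, 6, 5, 1, 6)
error_1 = sample_mean(sample_1) - mu

# Enumerate all 7**5 equally likely ordered samples drawn with replacement
xbar_counts = Counter(sample_mean(s) for s in product(population, repeat=n))
total = sum(xbar_counts.values())

# Sampling errors have the same distribution, shifted by mu
error_counts = {xbar - mu: count for xbar, count in xbar_counts.items()}

# Average of the sample mean over all possible samples
mean_of_xbar = sum(xbar * count for xbar, count in xbar_counts.items()) / total

print("mu =", mu)
print("sample 1 mean =", float(sample_mean(sample_1)), "error =", float(error_1))
print("P(X-bar = 1) =", float(Fraction(xbar_counts[Fraction(1)], total)))
print("P(X-bar = 4) =", float(Fraction(xbar_counts[Fraction(4)], total)))
print("mean of X-bar over all samples =", mean_of_xbar)
```

Exact fractions are used instead of floats so that equal sample means are counted together without rounding issues. In a real problem the population is not available for enumeration, so this kind of check is only possible for a constructed example.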
The sampling error distribution is sometimes called a sampling distribution.

Sampling distributions of a statistic and its sampling error
- Different samples give an empirical sampling distribution of $\bar{X}$
[Figure: histograms of the empirical distribution of $\bar{X}$ (horizontal axis from 0 to 8, centred at $\mu$) and of the sampling error $\times$ (horizontal axis from -4 to 4, centred at 0)]
- Few samples have $\bar{X}$ near 1 or 7: these appear only if sampling gives nearly all 1s or all 7s, a rare outcome
- The highest frequencies of $\bar{X}$ are near the population mean $\mu = 4$: there are many ways of drawing $n = 5$ values from 1, 2, 3, 4, 5, 6, 7 that give $\bar{X} \approx 4$
- The distribution looks "symmetric" about $\mu$: we are equally likely to obtain a sample with values higher or lower than average
- Each time $\bar{X}$ is used to estimate $\mu$, a sampling error $\times$ may result. The distributions of $\bar{X}$ and $\times$ are identical except that the values are translated

Sampling distribution and Central Limit Theorem (CLT)
[Figure: possible sampling errors (possible values of $\bar{X}$), shown on a sampling error axis centred at 0 and an $\bar{X}$ axis centred at $\mu$]
- The Central Limit Theorem (CLT) says that when $\bar{X}$ from a reasonably big sample of $n$ independent observations is used to estimate $\mu$, the sampling distribution of $\bar{X}$ (and of its sampling error) is approximately normal:
$$\underbrace{\bar{X}}_{\text{statistic}} \sim \underbrace{\text{Normal}}_{\text{sampling distribution}}(\mu, \underbrace{\operatorname{var}(\bar{X})}_{\text{sampling variation}})$$
and
$$\underbrace{\times = \bar{X} - \mu}_{\text{sampling error}} \sim \underbrace{\text{Normal}}_{\text{sampling distribution}}(0, \underbrace{\operatorname{var}(\times)}_{\text{sampling variation}})$$
- We do not know exactly where our $\times$ is among the possible $\times$'s. However, using the empirical rule, we can be about 95% certain that $\times$ lies within $0 \pm 2\sqrt{\operatorname{var}(\times)}$

Sampling variation
Sample   $X_1, ..., X_5$                  $\bar{X}$                      Sampling error $\times$
1        3, 6, 5, 1, 6                    4.2                            $4.2 - \mu$
2        1, 4, 6, 2, 2                    3                              $3 - \mu$
...      ...                              ...                            ...
k        4, 5, 6, 1, 7                    4.6                            $4.6 - \mu$
Any      $X_1, X_2, X_3, X_4, X_5$        $\frac{X_1 + ... + X_5}{5}$    $\frac{X_1 + ... + X_5}{5} - \mu$
1 Sampling variation measures the changes in $\bar{X}$ [$\operatorname{var}(\bar{X})$] and in its sampling error $\times$ [$\operatorname{var}(\bar{X} - \mu)$] under random sampling
2 $\bar{X} = \frac{X_1 + ... + X_5}{5}$, so $\operatorname{var}(\bar{X})$ is due to $\operatorname{var}(X_1), ..., \operatorname{var}(X_5)$
3 $\operatorname{var}(X_1)$ measures how much the observed $X_1$ differs from sample to sample under random sampling, e.g., $X_1 = 3$ in sample 1, $X_1 = 1$ in sample 2, etc.
4 $\operatorname{var}(X_1) = \operatorname{var}(X_2) = ...$ because random sampling affects $X_1, X_2, ...$ equally
5 $\operatorname{var}(X_1) \equiv \operatorname{var}(X)$ since different values of $X_1$ are observed because of the inherent variance of $X$ in the population

Sampling variation (2)
$$
\begin{aligned}
\operatorname{var}(\bar{X}) &= \operatorname{var}\left(\frac{X_1 + ... + X_n}{n}\right) \\
&= \frac{1}{n^2} \operatorname{var}(X_1 + ... + X_n) \\
&= \frac{1}{n^2} \underbrace{[\operatorname{var}(X_1) + ... + \operatorname{var}(X_n)]}_{X_1, ..., X_n \text{ are independent}} \\
&= \frac{1}{n^2} \underbrace{n \cdot \operatorname{var}(X)}_{\operatorname{var}(X_1) = ... = \operatorname{var}(X_n) \equiv \operatorname{var}(X)} \\
&= \underbrace{\frac{\operatorname{var}(X)}{n}}_{\text{depends on } \operatorname{var}(X) \text{ and } n}
\end{aligned}
$$
$$\operatorname{var}(\text{sampling error}) = \operatorname{var}(\bar{X} - \mu) = \underbrace{\operatorname{var}(\bar{X})}_{\mu \text{ is a constant}}$$
Sampling variation depends on
(1) $\operatorname{var}(X)$, the variation of $X$ in the population
(2) $n$, the sample size

Why does sampling variation matter?
[Figure: two sampling distributions of the sampling error, each centred at 0, one with large and one with small sampling variation]
- Large sampling variation: our sampling error $\times$ is among the $\times$'s and so may be large
- Small sampling variation: our sampling error $\times$ is among the $\times$'s and so is never too large

What is a proportion?
Example
We wish to estimate the proportion, $p$, of homeless people in a population of $N$ individuals. Let $X$ indicate whether someone is homeless:
$$X = \begin{cases} 1 & \text{homeless} \\ 0 & \text{not homeless} \end{cases}$$
Suppose the values of $X$ in the population are $X_1 = 1$ (homeless), $X_2 = 0$ (not homeless), $X_3 = 0$, ..., $X_N = 1$, which is a collection of 1's and 0's. Then
$$p = \frac{\#1\text{'s}}{N} = \frac{1 + 0 + 0 + ... + 1}{N} = \frac{X_1 + X_2 + X_3 + ... + X_N}{N} = \mu$$
Hence a proportion is a special case of $\mu$ with only 1's and 0's

Sampling to estimate a proportion
Example (cont'd)
We take a sample $X_1, ..., X_n$ and estimate $p \equiv \mu$ using
$$\bar{X} \equiv \hat{p} = \frac{X_1 + ... + X_n}{n}$$
Each of $X_1, ..., X_n$ is
$$\begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases}$$
We use the CLT for $\bar{X}$, i.e.,
$$\bar{X} \sim N\Big(\mu, \underbrace{\frac{\operatorname{var}(X)}{n}}_{\operatorname{var}(\bar{X})}\Big)$$
where
$$\operatorname{var}(X) = E(X^2) - E(X)^2 = (1)^2 p + (0)^2 (1 - p) - \underbrace{\mu^2}_{p^2} = p - p^2 = p(1 - p)$$
Hence the CLT for $\hat{p}$ is
$$\hat{p} \sim N\Big(p, \frac{p(1 - p)}{n}\Big)$$
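The two variance results, $\operatorname{var}(\bar{X}) = \operatorname{var}(X)/n$ and $\operatorname{var}(\hat{p}) = p(1 - p)/n$, can be checked exactly on small cases. The sketch below is an illustration in Python rather than part of the original slides. It reuses the toy population 1, ..., 7 with $n = 5$, and for the proportion it assumes hypothetical values $p = 0.3$ and a sample size of 100 (named `p` and `n_prop` here); the helper `variance` is likewise an illustrative name.

```python
from fractions import Fraction
from itertools import product
from math import comb, sqrt


def variance(pairs):
    """Variance of a discrete distribution given as (value, probability) pairs."""
    mean = sum(v * pr for v, pr in pairs)
    return sum((v - mean) ** 2 * pr for v, pr in pairs)


# Part 1: var(X-bar) = var(X) / n for the toy population 1..7, n = 5
population = [1, 2, 3, 4, 5, 6, 7]
n = 5
N = len(population)
var_x = variance([(Fraction(x), Fraction(1, N)) for x in population])

# Each ordered sample drawn with replacement has probability (1/N)**n
sample_prob = Fraction(1, N ** n)
var_xbar = variance(
    [(Fraction(sum(s), n), sample_prob) for s in product(population, repeat=n)]
)

# Part 2: var(p-hat) = p(1 - p) / n for 0/1 data
p = Fraction(3, 10)  # hypothetical population proportion (assumption)
n_prop = 100         # hypothetical sample size (assumption)

# The number of 1's in the sample is Binomial(n_prop, p); p-hat = count / n_prop
phat_dist = [
    (Fraction(k, n_prop), comb(n_prop, k) * p ** k * (1 - p) ** (n_prop - k))
    for k in range(n_prop + 1)
]
var_phat = variance(phat_dist)

# Probability that the sampling error is within 2 standard deviations of 0
half_width = 2 * sqrt(p * (1 - p) / n_prop)
coverage = float(sum(pr for phat, pr in phat_dist if abs(phat - p) <= half_width))

print("var(X) =", var_x, " var(X-bar) =", var_xbar, " var(X)/n =", var_x / n)
print("var(p-hat) =", var_phat, " p(1-p)/n =", p * (1 - p) / n_prop)
print("P(|p-hat - p| <= 2 sd) =", round(coverage, 3))
```

The variance identities hold exactly, while the two-standard-deviation coverage is only approximately 95% because the normal distribution from the CLT is an approximation to the discrete distribution of $\hat{p}$.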