Download Chapter 18 sampling distribution models

Chapter 18 Sampling distribution models
math2200

Sample proportion
• Kerry vs. Bush in 2004
  – A Gallup Poll: 49% for Kerry, 1016 respondents
  – A Rasmussen Poll: 45.9% for Kerry, 1000 respondents
  – Why are the answers different?

Model
• Let $Y$ be the number of people favoring Kerry in a sample of size $n = 1000$.
• $Y \sim \text{Binomial}(n, p)$
  – $p$: the proportion of people for Kerry in the entire population
• When $n$ is large, $Y$ can be approximated by a Normal model with mean $np$ and variance $npq$, where $q = 1 - p$.

Modeling the sample proportion
• The sample proportion $\hat{p} = Y/n$ can be approximated by a Normal model with mean $p$ and variance $pq/n$, i.e., standard deviation $\sqrt{pq/n}$:
  $\hat{p} \approx N\left(p, \sqrt{\frac{pq}{n}}\right)$, writing $N(\text{mean}, \text{standard deviation})$

Kerry vs. Bush (cont.)
• Assume the true population proportion voting for Kerry is 49%.
• Then the sample proportion $\hat{p} = Y/n$ has a Normal model with mean 0.49 and standard deviation $\sqrt{0.49 \times 0.51 / 1000} \approx 0.0158$ ($n = 1000$).
• The Rasmussen result lies about 1.96 standard deviations below the mean:
  $(0.459 - 0.49)/0.0158 \approx -1.962$
• So both 49% and 45.9% are reasonable values to observe: 45.9% sits at the edge of the middle 95% of this model.

Sampling Distribution Model
• Consider the sample proportion as a random variable instead of a fixed number. Its distribution over all possible samples is called the sampling distribution model for the proportion.

Central limit theorem (CLT)
• If the observations are drawn
  – independently
  – from the same population (equivalently, distribution),
  then the sampling distribution of the sample mean approaches a Normal model as the sample size increases.
• This holds even when the population distribution is unknown.

CLT
• Suppose the population distribution has mean $\mu$ and standard deviation $\sigma$.
• Then the sample mean has mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
• Formally, let $Y_1, \ldots, Y_n$ be $n$ independent and identically distributed random variables with
  – $E(Y_1) = \mu$
  – $\text{Var}(Y_1) = \sigma^2$
• Then as $n$ increases, the distribution of $\bar{Y} = (Y_1 + \cdots + Y_n)/n$ tends to a Normal model with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.

Standard Error
• If we don't know $p$ or $\sigma$, the population parameters, we use sample statistics to estimate them.
• The estimated standard deviation of a sampling distribution is called a standard error.
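The Kerry vs. Bush calculation and the idea of a sampling distribution can be checked with a short simulation. This is a minimal sketch, assuming (as the slides do) a true proportion of 0.49 and that each of the $n = 1000$ respondents independently favors Kerry with that probability. The function names `sd_sample_proportion`, `z_score` and `simulate_p_hats` are hypothetical helpers written for this illustration. Each simulated poll produces one value of $\hat{p}$; the collection of those values approximates the sampling distribution, whose spread should be close to $\sqrt{pq/n} \approx 0.0158$.

```python
import math
import random
import statistics


def sd_sample_proportion(p: float, n: int) -> float:
    """Standard deviation sqrt(p*q/n) of the sampling distribution of p-hat."""
    return math.sqrt(p * (1 - p) / n)


def z_score(p_hat: float, p: float, n: int) -> float:
    """Number of standard deviations p_hat lies from the model mean p."""
    return (p_hat - p) / sd_sample_proportion(p, n)


def simulate_p_hats(p: float, n: int, num_samples: int, seed: int = 0) -> list[float]:
    """Draw num_samples independent polls of size n; return each sample proportion."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(num_samples):
        # Assumption: each respondent independently favors Kerry with probability p.
        y = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(y / n)
    return p_hats


if __name__ == "__main__":
    p, n = 0.49, 1000  # assumed true proportion and sample size, as on the slides
    print(f"SD of p-hat: {sd_sample_proportion(p, n):.4f}")  # 0.0158
    # -1.961 here; the slide's -1.962 uses the rounded SD 0.0158
    print(f"z for 45.9%: {z_score(0.459, p, n):.3f}")
    p_hats = simulate_p_hats(p, n, num_samples=2000)
    print(f"simulated mean {statistics.mean(p_hats):.4f}, "
          f"simulated SD {statistics.stdev(p_hats):.4f}")
```

The simulated mean and SD of the 2000 values of $\hat{p}$ should land near 0.49 and 0.0158, and roughly 5% of simulated polls should fall as far from 0.49 as the Rasmussen result did.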
• For a sample proportion, the standard error is
  $SE(\hat{p}) = \sqrt{\frac{\hat{p}\hat{q}}{n}}$, where $\hat{q} = 1 - \hat{p}$
• For the sample mean, the standard error is
  $SE(\bar{y}) = \frac{s}{\sqrt{n}}$, where $s$ is the sample standard deviation

The Process Going Into the Sampling Distribution Model

What Can Go Wrong?
• Don't confuse the sampling distribution with the distribution of the sample.
  – When you take a sample, you look at the distribution of its values, usually with a histogram, and you may calculate summary statistics.
  – The sampling distribution is an imaginary collection of the values that a statistic might have taken for all random samples: the one you got and the ones you didn't get.

What Can Go Wrong? (cont.)
• Beware of observations that are not independent.
  – The CLT depends crucially on the assumption of independence.
  – You can't check this with your data; you have to think about how the data were gathered.
• Watch out for small samples from skewed populations.
  – The more skewed the population distribution, the larger the sample size we need for the CLT to work.

Summary
• Sample proportions and sample means are statistics.
  – They are random because samples vary.
  – Their distribution can be approximated by a Normal model using the CLT.
• Be aware of when the CLT can be used:
  – $n$ is large.
  – If the population distribution is not symmetric, a much larger $n$ is needed.
• The CLT is about the distribution of the sample mean, not the distribution of the sample itself.
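The CLT, the standard error formulas, and the warning about skewed populations can be illustrated together in one sketch. It assumes an Exponential(1) population (mean $\mu = 1$, standard deviation $\sigma = 1$, strongly right-skewed), which is an example chosen here and not taken from the slides; the helpers `se_proportion`, `se_mean`, `skewness` and `sample_means` are hypothetical names. For each sample size it builds many sample means, then compares their SD with $\sigma/\sqrt{n}$ and reports their skewness, which for this population shrinks like $2/\sqrt{n}$ as the sampling distribution becomes more Normal. It also computes the two standard errors from single samples, where the population parameters are treated as unknown.

```python
import math
import random
import statistics


def se_proportion(p_hat: float, n: int) -> float:
    """SE(p-hat) = sqrt(p-hat * q-hat / n), with q-hat = 1 - p-hat."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


def se_mean(sample: list[float]) -> float:
    """SE(y-bar) = s / sqrt(n), using the sample standard deviation s."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def skewness(xs: list[float]) -> float:
    """Moment-based skewness; close to 0 for a symmetric distribution."""
    m = statistics.fmean(xs)
    m2 = statistics.fmean([(x - m) ** 2 for x in xs])
    m3 = statistics.fmean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5


def sample_means(n: int, num_samples: int, rng: random.Random) -> list[float]:
    """Approximate sampling distribution of y-bar for samples of size n
    drawn independently from an (assumed) Exponential(1) population."""
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


if __name__ == "__main__":
    rng = random.Random(0)
    sigma = 1.0  # Exponential(1) has mean 1 and standard deviation 1
    for n in (2, 10, 50):
        means = sample_means(n, 5000, rng)
        print(f"n={n:3d}  mean={statistics.fmean(means):.3f}  "
              f"sd={statistics.stdev(means):.3f} "
              f"(sigma/sqrt(n)={sigma / math.sqrt(n):.3f})  "
              f"skew={skewness(means):.2f}")
    # With only one sample in hand, estimate the spread with standard errors.
    one_sample = [rng.expovariate(1.0) for _ in range(50)]
    print(f"SE(y-bar) from one sample of 50: {se_mean(one_sample):.3f}")
    print(f"SE(p-hat) for p-hat=0.459, n=1000: {se_proportion(0.459, 1000):.4f}")
```

Note that the histogram of `one_sample` would still look skewed; only the distribution of the means across many samples becomes close to Normal, which is the distinction the "What Can Go Wrong?" slide draws.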
