Chapter 18: Sampling distribution models
math2200

Sample proportion
• Kerry vs. Bush in 2004
  – A Gallup Poll
    • 49% for Kerry
    • 1016 respondents
  – A Rasmussen Poll
    • 45.9% for Kerry
    • 1000 respondents
  – Why are the answers different?

Model
• Let Y be the number of people favoring Kerry in a sample of size n = 1000
• Y ~ Binomial(n, p)
  – p: the proportion of people for Kerry in the entire population
  – q = 1 - p
• When n is large, Y can be approximated by a Normal model with mean np and variance npq.

Modeling sample proportion
• The sample proportion $\hat{p} = Y/n$
  – Normal model with mean p and variance $pq/n$, that is, $N\left(p, \sqrt{pq/n}\right)$, where the second argument is the standard deviation

Kerry vs. Bush (cont'd)
• Assume the true population proportion voting for Kerry is 49%.
• The sample proportion $\hat{p} = Y/n$ has a Normal model with mean 0.49 and standard deviation $\sqrt{0.49 \times 0.51 / 1000} \approx 0.0158$ (n = 1000).
• Both 49% and 45.9% are then plausible sample results: 45.9% lies about 1.96 standard deviations below the mean, since (0.459 - 0.49)/0.0158 = -1.962.

Sampling Distribution Model
• Consider the sample proportion as a random variable instead of a number. The distribution of the sample proportion is called the sampling distribution model for the proportion.

Central limit theorem (CLT)
• If the observations are drawn
  – independently
  – from the same population (equivalently, distribution)
  then the sampling distribution of the sample mean approaches a Normal model as the sample size increases.
• The population distribution could be unknown.

CLT
• Suppose the population distribution has mean μ and standard deviation σ.
• The sample mean has mean μ and standard deviation $\sigma/\sqrt{n}$.
• Let $Y_1, \dots, Y_n$ be n independent and identically distributed random variables
  – $E(Y_1) = \mu$
  – $\mathrm{Var}(Y_1) = \sigma^2$
• Then as n increases, the distribution of $(Y_1 + \dots + Y_n)/n$ tends to a Normal model with mean μ and standard deviation $\sigma/\sqrt{n}$.

Standard Error
• If we don't know p or σ, the population parameters, we use sample statistics to estimate them.
• The estimated standard deviation of a sampling distribution is called a standard error.

Standard Error (cont.)
• For a sample proportion, the standard error is $SE(\hat{p}) = \sqrt{\hat{p}\hat{q}/n}$
• For the sample mean, the standard error is $SE(\bar{y}) = s/\sqrt{n}$

The Process Going Into the Sampling Distribution Model

What Can Go Wrong?
• Don't confuse the sampling distribution with the distribution of the sample.
  – When you take a sample, you look at the distribution of the values, usually with a histogram, and you may calculate summary statistics.
  – The sampling distribution is an imaginary collection of the values that a statistic might have taken for all random samples: the one you got and the ones you didn't get.

What Can Go Wrong? (cont.)
• Beware of observations that are not independent.
  – The CLT depends crucially on the assumption of independence.
  – You can't check this with your data; you have to think about how the data were gathered.
• Watch out for small samples from skewed populations.
  – The more skewed the distribution, the larger the sample size we need for the CLT to work.

Summary
• Sample proportions and sample means are statistics
  – They are random because samples vary
  – Their distributions can be approximated by a Normal model using the CLT
• Be aware of when the CLT can be used
  – n is large
  – If the population distribution is not symmetric, a much larger n is needed
• The CLT is about the distribution of the sample mean, not the sample itself
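
Worked example (a sketch, not part of the original slides)
The poll arithmetic and the CLT claim above can be checked numerically. The short Python sketch below recomputes the standard deviation and z-score from the Kerry example, then simulates sample means drawn from a skewed population. The function names, the choice of an Exponential(1) population, the sample size of 50, and the random seed are all assumptions made for illustration only.

```python
import math
import random
import statistics


def sd_proportion(p, n):
    """Standard deviation of the sample proportion: sqrt(p * q / n)."""
    return math.sqrt(p * (1 - p) / n)


def se_mean(sample):
    """Standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def simulate_sample_means(n, reps, rng):
    """Draw `reps` samples of size n from a skewed Exponential(1)
    population (mean 1, standard deviation 1); return the sample means."""
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(sum(sample) / n)
    return means


# Kerry example: true proportion assumed to be 0.49, n = 1000.
sd = sd_proportion(0.49, 1000)   # about 0.0158
z = (0.459 - 0.49) / sd          # Rasmussen result in standard deviations
print("SD of sample proportion:", sd)
print("z-score of 45.9%:", z)

# CLT check: the sample means should centre on mu = 1 with
# spread close to sigma / sqrt(n) = 1 / sqrt(50).
rng = random.Random(0)           # fixed seed, chosen arbitrarily
means = simulate_sample_means(n=50, reps=2000, rng=rng)
print("mean of sample means:", statistics.fmean(means))
print("SD of sample means:", statistics.stdev(means))
print("sigma / sqrt(n):", 1 / math.sqrt(50))
```

The z-score printed here differs slightly in the third decimal from the slide's -1.962 because the slide rounds the standard deviation to 0.0158 before dividing. Plotting a histogram of `means` for several values of n would show the skew fading as n grows.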