Chapter 18
Sampling distribution models
math2200
Sample proportion
• Kerry vs. Bush in 2004
– A Gallup Poll
• 49% for Kerry
• 1016 respondents
– A Rasmussen Poll
• 45.9% for Kerry
• 1000 respondents
– Why are the answers different?
Model
• Let Y be the number of people favoring
Kerry in a sample of size n=1000
• Y ~ Binomial(n,p)
– p: the proportion of people for Kerry in the
entire population
• When n is large, Y can be approximated
by a Normal model with mean np and
variance npq, where q = 1 − p.
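To see how close the Normal model is for a poll of this size, the sketch below compares the exact Binomial probability P(Y ≤ 459) with the Normal approximation, assuming p = 0.49 as on the later slide. The helper names `binom_cdf` and `normal_approx_cdf` are illustrative. The exact sum is computed in log space to avoid overflow, and the +0.5 continuity correction is an addition that the slides do not use.

```python
import math
from statistics import NormalDist

def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(Y <= k) for Y ~ Binomial(n, p); each pmf term is computed in log space."""
    q = 1 - p
    total = 0.0
    for j in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * math.log(p) + (n - j) * math.log(q))
        total += math.exp(log_pmf)
    return total

def normal_approx_cdf(k: int, n: int, p: float) -> float:
    """Normal model with mean np and variance npq, using a +0.5 continuity correction."""
    q = 1 - p
    model = NormalDist(mu=n * p, sigma=math.sqrt(n * p * q))
    return model.cdf(k + 0.5)

n, p = 1000, 0.49   # assumed values: n from the poll, p = 0.49 as on the later slide
k = 459             # 45.9% of 1000 respondents
exact = binom_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

With n = 1000 the two probabilities agree to about three decimal places, which is why the Normal model is used in place of the Binomial from here on.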
Modeling sample proportion
• The sample proportion p̂ = Y/n
– approximately follows a Normal model with mean p and variance pq/n:
$$\hat{p} \sim N\left(p, \frac{pq}{n}\right)$$
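The mean and variance of p̂ follow from the Binomial model for Y: dividing Y by the constant n divides its mean by n and its variance by n².

$$E(\hat{p}) = E\left(\frac{Y}{n}\right) = \frac{np}{n} = p, \qquad \operatorname{Var}(\hat{p}) = \frac{\operatorname{Var}(Y)}{n^2} = \frac{npq}{n^2} = \frac{pq}{n}$$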
Kerry vs. Bush (cont.)
– Assume the true population proportion voting for
Kerry is 49%.
– The sample proportion p̂ = Y/n then approximately follows
a Normal model with mean 0.49 and standard deviation
√(0.49 × 0.51 / 1000) ≈ 0.0158 (n = 1000).
– Both 49% and 45.9% are therefore plausible poll
results: 45.9% lies only about 1.96 standard deviations
below the mean,
$$z = \frac{0.459 - 0.49}{0.0158} \approx -1.962$$
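The arithmetic on this slide can be reproduced in a few lines. This is a minimal sketch; the variable names are illustrative and p = 0.49 is the assumed true proportion from the slide.

```python
import math

p = 0.49   # assumed true proportion for Kerry (from the slide)
n = 1000   # sample size

sd = math.sqrt(p * (1 - p) / n)   # SD of p-hat: sqrt(pq/n)
z = (0.459 - p) / sd              # standardized Rasmussen result
print(f"SD = {sd:.4f}, z = {z:.3f}")
```

Running it prints `SD = 0.0158, z = -1.961`; the slide's −1.962 comes from dividing by the rounded value 0.0158 rather than the unrounded SD.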
Sampling Distribution Model
• Consider the sample proportion as a random
variable instead of a number. The distribution
of the sample proportion is called the
sampling distribution model for the
proportion.
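A short simulation makes the idea concrete: treat each poll of n = 1000 as a random draw, compute p̂ for many independent polls, and look at the resulting collection of p̂ values. The true proportion 0.49, the 2000 repetitions, the seed, and the helper name `simulate_p_hats` are assumptions chosen for illustration.

```python
import random
import statistics

def simulate_p_hats(p: float, n: int, reps: int, seed: int = 1) -> list[float]:
    """Return the sample proportion from each of `reps` simulated samples of size n."""
    rng = random.Random(seed)
    # Each respondent favors Kerry with probability p, independently of the others
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p_hats = simulate_p_hats(p=0.49, n=1000, reps=2000)
# The collection of p-hats approximates the sampling distribution of the proportion
print(f"mean of p-hats = {statistics.fmean(p_hats):.4f}")   # close to p = 0.49
print(f"SD of p-hats   = {statistics.stdev(p_hats):.4f}")   # close to sqrt(pq/n) = 0.0158
```

A histogram of `p_hats` would show the roughly Normal shape described on the previous slides.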
Central limit theorem (CLT)
• If the observations are drawn
– independently
– from the same population (equivalently,
distribution)
the sampling distribution of the sample
mean approaches a Normal model as the sample
size increases.
• The population distribution can be unknown;
it does not need to be Normal.
CLT
• Suppose the population distribution has mean μ and
standard deviation σ.
• The sample mean has mean μ and standard
deviation $\sigma/\sqrt{n}$.
• Let Y1, …, Yn be n independent and identically
distributed random variables
– E(Y1) = μ
– Var(Y1) = σ²
• Then as n increases, the distribution of (Y1+…+Yn)/n
tends to a Normal model with mean μ and standard
deviation $\sigma/\sqrt{n}$.
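The claim that the population need not be Normal can be checked by simulation. This sketch assumes a strongly right-skewed Exponential population with μ = σ = 1 (an illustrative choice, not from the slides), draws 4000 samples of size n = 50 with an arbitrary seed, and compares the sample means with a Normal model with mean μ and standard deviation σ/√n.

```python
import math
import random
import statistics

def sample_means(n: int, reps: int, rng: random.Random) -> list[float]:
    """Means of `reps` iid samples of size n from an Exponential(rate=1) population."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

mu, sigma = 1.0, 1.0        # mean and SD of the Exponential(rate=1) population
n, reps = 50, 4000
rng = random.Random(2024)   # fixed seed so the run is reproducible
means = sample_means(n, reps, rng)

print(f"mean of sample means = {statistics.fmean(means):.3f}  (CLT: {mu})")
print(f"SD of sample means   = {statistics.stdev(means):.3f}  (CLT: {sigma / math.sqrt(n):.3f})")

# Under a Normal model about 95% of standardized means fall within +/-1.96
z_scores = [(m - mu) / (sigma / math.sqrt(n)) for m in means]
coverage = sum(abs(z) <= 1.96 for z in z_scores) / reps
print(f"fraction within +/-1.96: {coverage:.3f}")
```

Even though individual observations are far from Normal, the standardized means land inside ±1.96 close to 95% of the time.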
Standard Error
• If we don’t know p or σ, the population
parameters, we estimate them with the sample
statistics p̂ and s.
• The estimated standard deviation of a
sampling distribution is called a standard error.
Standard Error (cont.)
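• For a sample proportion, the standard
error is
$$SE(\hat{p}) = \sqrt{\frac{\hat{p}\hat{q}}{n}}, \qquad \hat{q} = 1 - \hat{p}$$
• For the sample mean, the standard error is
$$SE(\bar{y}) = \frac{s}{\sqrt{n}}$$
where s is the sample standard deviation.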
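Both standard-error formulas translate directly into code. In this sketch the function names `se_proportion` and `se_mean` are illustrative, the poll figures come from the opening slide, and the list `data` is hypothetical, included only to exercise `se_mean`.

```python
import math
import statistics

def se_proportion(p_hat: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p_hat * q_hat / n), q_hat = 1 - p_hat."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

def se_mean(data: list[float]) -> float:
    """Standard error of a sample mean: s / sqrt(n), s = sample SD with divisor n - 1."""
    return statistics.stdev(data) / math.sqrt(len(data))

# The two polls from the start of the chapter
print(f"Gallup:    SE = {se_proportion(0.49, 1016):.4f}")
print(f"Rasmussen: SE = {se_proportion(0.459, 1000):.4f}")

# Hypothetical measurements, for illustration only
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(f"sample mean = {statistics.fmean(data):.2f}, SE = {se_mean(data):.3f}")
```

Note that `statistics.stdev` uses the n − 1 divisor, which matches the usual definition of the sample standard deviation s.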
The Process Going Into the Sampling Distribution Model
What Can Go Wrong?
• Don’t confuse the sampling distribution
with the distribution of the sample.
– When you take a sample, you look at the
distribution of the values, usually with a
histogram, and you may calculate summary
statistics.
– The sampling distribution is an imaginary
collection of the values that a statistic might
have taken for all random samples—the one
you got and the ones you didn’t get.
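To keep the two ideas apart, this sketch takes one sample and also approximates the sampling distribution by drawing many samples. The Normal population with mean 0 and SD 1, the sample size n = 25, the 5000 repetitions, and the seed are assumptions for illustration; the many simulated samples stand in for "all random samples".

```python
import random
import statistics

rng = random.Random(7)
mu, sigma, n = 0.0, 1.0, 25   # hypothetical Normal population and sample size

# Distribution of the sample: the n values in the ONE sample we actually took
one_sample = [rng.gauss(mu, sigma) for _ in range(n)]
print(f"SD of the values in one sample: {statistics.stdev(one_sample):.3f}")  # near sigma = 1

# Sampling distribution: the mean from each of many samples
means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(5000)]
print(f"SD of the sample means: {statistics.stdev(means):.3f}")  # near sigma / sqrt(n) = 0.2
```

The spread of values within one sample stays near σ, while the spread of the sample means shrinks to about σ/√n.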
What Can Go Wrong? (cont.)
• Beware of observations that are not
independent.
– The CLT depends crucially on the assumption
of independence.
– You can’t check this with your data—you have
to think about how the data were gathered.
• Watch out for small samples from skewed
populations.
– The more skewed the distribution, the larger
the sample size we need for the CLT to work.
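How much larger n must be depends on how skewed the population is. For an Exponential population (an assumed example with skewness 2), the mean of n observations has skewness 2/√n, so the sampling distribution becomes symmetric only slowly. The simulation below, with an arbitrary seed, 3000 repetitions, and a hypothetical helper `skewness`, estimates that skewness for a few sample sizes.

```python
import random
import statistics

def skewness(xs: list[float]) -> float:
    """Moment skewness: mean cubed deviation divided by the cubed population SD."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return statistics.fmean((x - m) ** 3 for x in xs) / sd ** 3

rng = random.Random(11)
results = {}
for n in (5, 30, 200):
    # 3000 sample means from an Exponential(rate=1) population (skewness 2)
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(3000)]
    results[n] = skewness(means)
    print(f"n = {n:3d}: skewness of sample means = {results[n]:.2f}  (theory: {2 / n ** 0.5:.2f})")
```

A skewness near 0 indicates the sampling distribution is close to symmetric; for this population that takes noticeably more than 30 observations.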
Summary
• Sample proportions or sample means are
statistics
– They are random because samples vary
– Their sampling distribution can be approximated by a
Normal model using the CLT
• Know when the CLT can be used:
– the observations are independent
– n is large
– if the population distribution is skewed, a
much larger n is needed
• The CLT is about the distribution of the sample
mean, not the sample itself