Download sampling distribution

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Sampling (statistics) wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Taylor's law wikipedia , lookup

Gibbs sampling wikipedia , lookup

Student's t-test wikipedia , lookup

Transcript
CHAPTER 11:
Sampling Distributions
ESSENTIAL STATISTICS
Second Edition
David S. Moore, William I. Notz, and Michael A. Fligner
Lecture Presentation
2
Chapter 11 Concepts
Parameters and Statistics
Statistical Estimation and the Law of Large
Numbers
Sampling Distributions
The Sampling Distribution of x
The Central Limit Theorem
3
Connection Between Sample Mean &
Population Mean
Population data:
millions of test
score
Sample: N=100
4
Connection Between Sample Mean &
Population Mean
◙ Why sample mean to be trusted to get population mean?
◙ Sample means vary around the population mean
◙ They don’t vary much and not far from the population mean
Parameters and Statistics
As we begin to use sample data to draw conclusions about a wider
population, we must be clear about whether a number describes a
sample or a population.
A parameter is a number that describes some characteristic of the
population. In statistical practice, the value of a parameter is not known
because we cannot examine the entire population.
A statistic is a number that describes some characteristic of a sample.
The value of a statistic can be computed directly from the sample data.
We often use a statistic to estimate an unknown parameter.
Remember s and p: statistics come from samples and
parameters come from populations
We write µ (the Greek letter mu) for the population mean and σ for the
population standard deviation. We write x (x-bar) for the sample mean and s
for the sample standard deviation.
5
Sampling Variability
6
This basic fact is called sampling variability: the value of a statistic
varies in repeated random sampling.
To make sense of sampling variability, we ask, “What would happen if we
took many samples?”
Population
Sample
Sample
Sample
Sample
Sample
Sample
Sample
Sample
?
The Law of Large Numbers
How can x be an accurate estimate of ? After all, different
random samples would produce different values of x.
If we keep on taking larger and larger samples, the statistic
x is guaranteed to get closer and closer to the parameter m.
Draw observations at random from
any population with finite mean µ.
The law of large numbers says that
as the number of observations drawn
increases, the sample mean of the
observed values gets closer and
closer to the mean µ of the
population.
7
Sampling Distributions
8
The law of large numbers assures us that if we measure enough
subjects, the statistic x-bar will eventually get very close to the unknown
parameter µ.
If we took every one of the possible samples of a certain size, calculated
the sample mean for each, and graphed all of those values, we’d have a
sampling distribution.
The population distribution of a variable is the distribution of
values of the variable among all individuals in the population.
The sampling distribution of a statistic is the distribution of
values taken by the statistic in all possible samples of the same
size from the same population.
In practice, it’s difficult to take all possible samples of size n to obtain
the actual sampling distribution of a statistic. Instead, we can use
simulation to imitate the process of taking many, many samples.
Population Distributions vs.
Sampling Distributions
There are actually three distinct distributions involved when we
sample repeatedly and measure a variable of interest.
1)The population distribution gives the values of the variable
for all the individuals in the population.
2)The distribution of sample data shows the values of the
variable for all the individuals in the sample.
3)The sampling distribution shows the statistic values from
all the possible samples of the same size from the population.
9
The Sampling Distribution of
x
10
When we choose many SRSs from a population, the sampling distribution
of the sample mean is centered at the population mean µ and is less
spread out than the population distribution. Here are the facts.
The Sampling Distribution of Sample Means
Suppose that x is the mean of an SRS of size n drawn from a large population
with mean m and standard deviation s . Then :
The mean of the sampling distribution of x is mx = m
The standard deviation of the sampling distribution of x is
sx =
s
n
Note : These facts about the mean and standard deviation of x are true
no matter what shape the population distribution has.
If individual observations have the N(µ,σ) distribution, then the sample mean
of an SRS of size n has the N(µ, σ/√n) distribution regardless of the sample
size n.
Mean and Standard Deviation of
Sample Means
If numerous samples of size n are taken from
a population with mean µ and standard
deviation  , then the mean of the sampling
distribution of x is µ (the population mean)
and the standard deviation is: 
n
( is the population s.d.)
1
1
The Central Limit Theorem
12
Most population distributions are not Normal. What is the shape of the
sampling distribution of sample means when the population distribution
isn’t Normal?
It is a remarkable fact that as the sample size increases, the distribution
of sample means changes its shape: it looks less like that of the
population and more like a Normal distribution!
When the sample is large enough, the distribution of sample means is
very close to Normal, no matter what shape the population distribution
has, as long as the population has a finite standard deviation.
Draw an SRS of size n from any population with mean m and finite
standard deviation s . The central limit theorem (CLT) says that when n
is large, the sampling distribution of the sample mean x is approximately
Normal:
æ s ö
x is approximately N ç m,
÷
è
nø
The Central Limit Theorem
13
a)
b)
c)
d)
n=1
n=2
n = 10
n = 25
Means of random samples are less variable than individual observations.
Means of random samples are more Normal than individual observations.
The Central Limit Theorem
14
Consider the strange population
distribution from the Rice University
sampling distribution applet.
Describe the shape of the sampling
distributions as n increases. What do you
notice?
Normal Condition for Sample Means
If the population distribution is Normal, then so is the
sampling distribution of x. This is true no matter what
the sample size n is.
If the population distribution is not Normal, the central
limit theorem tells us that the sampling distribution
of x will be approximately Normal in most cases if
n ³ 30.