Download Sample

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Sufficient statistic wikipedia , lookup

Taylor's law wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Sampling (statistics) wikipedia , lookup

Gibbs sampling wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Student's t-test wikipedia , lookup

Transcript
Section 7.1
Sampling Distributions
Introduction
Statistical inference: use information from a sample to draw
conclusions about a wider population.
Different random samples yield different statistics.
We can think of a statistic as a random variable because it takes
numerical values that describe the outcomes of the random
sampling process.
Population
Collect data from a
representative Sample...
Sample
Make an Inference
about the Population.
Parameter
– a number that describes the
population
– a fixed number
– in practice, we do not know its
value because we cannot examine
the entire population
Statistic
– a number that describes a sample
– the value of a statistic is known when we
have taken a sample, but it can change
from sample to sample
– we often use a statistic to estimate an
unknown parameter
Compare
Parameter
Statistic
mean: μ
standard deviation: σ
proportion: p
mean: x-bar
standard deviation: s
proportion: 𝑝 (p-hat)
• Sometimes we call the
parameters “true”;
true mean, true
proportion, etc.
• Sometimes we call the
statistics “sample”;
sample mean, sample
proportion, etc.
Identify the population, the parameter, the sample, and the
statistic in each of the following settings.
1) A pediatrician wants to know the 75th percentile for the
distribution of heights of 10-year-old boys, so she takes
a sample of 50 patients and calculates that the 75th
percentile in the sample is 56 inches.
Population is all 10-year-old boys
The parameter of interest is the 75th percentile for all
10-year-old boys.
The sample is the 50 10-year-old boys included in the
sample; the statistic is 56 inches, the 75th percentile of
the heights in the sample.
The statistic is 56 inches, the 75th percentile of the
heights in the sample.
2) A Pew Research Center Poll asked 1102 12- to 17-yearolds in the United States if they have a cell phone. Of the
respondents, 71% said Yes.
<http://www.pewinternet.org/Reports/2009/14--Teensand-Mobile-Phones-Data-Memo.aspx>
Population is all 12- to 17-year olds in the U.S.
The parameter of interest is p, the proportion of all
12- to 17-year olds with cell phones.
The sample is the 1102 12- to 17-year-olds in the
sample.
The statistic is the sample proportion with a cell phone,
𝑝 = 0.71
Sampling Variability
Sampling variability: the value of a statistic varies in repeated
random sampling.
To make sense of sampling variability, we ask, “What would
happen if we took many samples?”
Sample
Population
Sample
Sample
Sample
Sample
Sample
Sample
Sample
?
Sampling variability
Why is sampling variability not fatal?
– Take a large number of samples from the same
population
– Calculate the sample mean 𝑥 or sample
proportion 𝑝 for each sample
– Make a histogram of the values of 𝑥 or 𝑝
– Examine the distribution displayed in the
histogram for shape, center, and spread, as well
as outliers or other deviations
p̂
Sampling Distribution
The sampling distribution of a statistic is the
distribution of values taken by the statistic in all
possible samples of the same size from the same
population.
In practice, it’s difficult to take all possible
samples of size n to obtain the actual sampling
distribution of a statistic. Instead, we can use
simulation to imitate the process of taking many,
many samples.
Sampling Distribution vs. Population Distribution
There are actually three distinct distributions involved when we sample
repeatedly and measure a variable of interest.
1. The population distribution gives the values of the variable
for all the individuals in the population.
2. The distribution of sample data shows the values of the
variable for all the individuals in the sample.
3. The sampling distribution shows the statistic values from all
the possible samples of the same size from the population.
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
Choosing cards
We used Fathom to simulate choosing 100 SRSs of size 5 from a deck of cards.
The graph below shows the distribution of the sample median for these 100
samples.
Measures from Collection 1 Dot Plot
2
3
4 5 6 7
8
Sample M e dian
9
10
Problem:
a) There is one dot on the graph at 2. Explain what this value represents
b) Describe the distribution. Are there any obvious outliers?
c) Would it be surprising to get a sample median of 4 or smaller in an SRS of
size 5 from this deck? Justify your answer.
d) Suppose that another student prepared a different deck of cards and
claimed that it was exactly the same as the one used in the Activity. When you
take an SRS of size 5, the median is 4. Does this provide convincing evidence
12
that the student’s deck is different? Explain.
Solution:
a) In one SRS of size 5, the median of the sample was 2.
b) Shape: The graph is roughly symmetric with a single peak at
6. Center: The mean of the distribution is about 6. Spread: The
values fall mostly between 4 and 8, although there are values
as low as 2 and as high as 10. Outliers: There don’t seem to be
any outliers.
c) No. Getting a sample median of 4 or smaller occurs fairly
often just by chance when taking random samples of size 5
from a deck of cards with the aces and face cards removed. A
median of 4 or smaller occurred in 22 of the 100 trials of the
simulation.
d) Getting a sample median of 4 does not provide convincing
evidence that the student’s deck is different. Because getting a
median of 4 or smaller is somewhat likely to happen by chance
alone, it is plausible that the deck is the same.
13
Describing Sampling Distributions
“How trustworthy is a statistic as an estimator of the parameter?”
Center: Biased and unbiased estimators
In the chips example, we collected many samples of size 20 and calculated
the sample proportion of red chips. How well does the sample proportion
estimate the true proportion of red chips, p = 0.5?
Note that the center of the approximate sampling
distribution is close to 0.5. In fact, if we took ALL
possible samples of size 20 and found the mean of
those sample proportions, we’d get exactly 0.5.
A statistic used to estimate a parameter is an
unbiased estimator if the mean of its sampling
distribution is equal to the true value of the
parameter being estimated.
Describing Sampling Distributions
Spread: Low variability is better!
Unfortunately, using an unbiased estimator doesn’t guarantee that the value
of your statistic will be close to the actual parameter value.
n=100
n=1000
Larger samples have a clear advantage over smaller samples. They are
much more likely to produce an estimate close to the true value of the
parameter.
Describing Sampling Distributions
One important and surprising fact is that the variability of a
statistic in repeated sampling does not depend very much on the
size of the population.
Variability of a Statistic
The variability of a statistic is described by the spread of
its sampling distribution. This spread is determined mainly
by the size of the random sample. Larger samples give
smaller spreads. The spread of the sampling distribution
does not depend much on the size of the population, as long
as the population is at least 10 times larger than the sample.
Bias, Variability, and Shape
We can think of the true value of the population parameter as the
bull’s-eye on a target and of the sample statistic as an arrow fired
at the target.
Both bias and variability describe
what happens when we take many
shots at the target.
Bias means that our aim is off and we
consistently miss the bull’s-eye in the
same direction. Our sample values do
not center on the population value.
High variability means that repeated
shots are widely scattered on the target.
Repeated samples do not give very
similar results.