IV. Inferential Statistics
A. Sampling Distributions
In this section
• Unbiased Estimator with a Small Amount of Error
• Sampling Distribution of x̄
• Sampling Distribution of p̂
1. Unbiased Estimator with a Small Amount of Error
In order to understand the idea of a sampling distribution you have to be comfortable
with some terminology that was introduced toward the beginning of the semester. The
probability of error in inferential statistics comes from what we call the sampling
distribution.
Inferential Statistics – making inference about a population based on a sample
Population – the group of interest or the set of all possible measurements
Sample – a subset of the population
Parameter – a numerical characteristic of the population
Statistic – a numerical characteristic of the sample
Since the population is often not available, we use statistics to estimate parameters. In order to discuss this idea, we will look at the sample mean, x̄. When dealing with measures other than the sample mean, the equations change, but the overall idea does not. It is really the process we use to estimate parameters based on statistics that is important to understand.
• In statistical application, we take a random sample from the population and compute a statistic, like x̄.
• The value of the statistic x̄ depends on which items are selected for the sample.
By putting the above two statements together, you should understand that the statistic x̄ is a random variable. Taking a random sample and calculating x̄ is equivalent to randomly selecting a value of x̄ out of all its possible values.
Sampling Distribution – the probability distribution of a statistic over all possible samples
We want the sampling distribution to be centered at the value of the parameter and to have little variation. Consider the graph below. In the graph there are three statistics that could be used to estimate the parameter. Which one is best?
Statistic 3 is the best choice in the graph above. The worst choice is statistic 1, which tends to overestimate the parameter (the center of its distribution is to the right of the parameter). This is what a statistic looks like when it is biased. If you take a random sample, you will not end up with a biased statistic, meaning your sampling distribution will be centered over the parameter of interest. Looking at statistic 2, you see there is a lot of variability. This is not good because the spread means there is a reasonable chance of getting a value far from the parameter. Notice that with statistic 3 most of the outcomes are very close to the parameter we are estimating. This means that when using the statistic to estimate the parameter, you can expect a small amount of error in the estimate. Hopefully, you remember that we can decrease this error by increasing the sample size. Mathematically, increasing the sample size decreases the variability in the sampling distribution of the statistic.
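The claim above can be checked with a quick simulation. The sketch below (made-up population values and Python/NumPy, not part of the notes) draws many random samples of each size, computes x̄ for each, and shows that the sample means stay centered near the population mean while their spread shrinks as n grows.

```python
# A minimal simulation sketch (made-up numbers, not part of the notes):
# draw many random samples from one population and watch the sampling
# distribution of x-bar stay centered while its spread shrinks as n grows.
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(loc=50, scale=10, size=100_000)  # hypothetical population

for n in (5, 30, 100):
    # 5,000 samples of size n; record the mean of each one
    sample_means = [rng.choice(population, size=n).mean() for _ in range(5_000)]
    print(f"n = {n:3d}: mean of x-bar = {np.mean(sample_means):5.2f}, "
          f"SD of x-bar = {np.std(sample_means):.2f}")

# The printed means stay near 50 (unbiased), while the SD of x-bar falls
# roughly like 10 / sqrt(n), exactly as described above.
```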
2. Sampling Distribution of x̄
The statistic x̄ (sample mean) estimates μ (the population mean). You should understand that in this class we will only look at good statistics, meaning that with proper techniques the sampling distribution will be centered with a small amount of variability. This is the case for the sampling distribution of x̄. Let us look at a couple of facts about the sampling distribution of x̄.
• μ_x̄ = μ
The average value of x̄ across all possible samples is μ, the population mean. This should make sense because we just said the sampling distribution of x̄ is centered about the parameter it estimates, which is μ.
• σ_x̄ = σ/√n
The standard deviation of the sampling distribution of x̄ is the population standard deviation σ divided by √n. Notice that as n increases, the sample-to-sample variability in x̄ decreases: if you increase the denominator, the overall quantity decreases. Therefore, as has been stated previously, increasing the sample size decreases the error when using a statistic to estimate a parameter.
Standard Error – the standard deviation of the sampling distribution (σ_x̄ = σ/√n is the standard error of x̄)
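To make the formula concrete (illustrative numbers, not from the notes): if σ = 12 and n = 9, the standard error is 12/√9 = 4; quadrupling the sample size to n = 36 cuts it in half, to 12/√36 = 2.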
Remember from section III. C. that if we have a normal distribution and know the mean and standard deviation, then we can find the corresponding probability using the transformation Z = (x − μ)/σ. The important part here is that we take an observation, subtract the mean, and divide by the standard deviation. We can use the same format for the sampling distribution.
If our sample comes from a normal distribution with mean μ and standard deviation σ, then Z = (x̄ − μ)/(σ/√n) has a standard normal distribution.
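As a quick illustration with made-up numbers (and a normal population, as required here): suppose μ = 100, σ = 15, and a sample of n = 25 gives x̄ = 106. Then Z = (106 − 100)/(15/√25) = 6/3 = 2, and a sample mean that far above μ or farther happens only about 2.3% of the time.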
Notice that in the transformation above we know the population is normally distributed. In reality, when collecting data, it is rare that this would ever be the case. So what can be done when the population is not normally distributed? This has been thoroughly researched by statisticians, and the research has led to one of the most important theorems in statistics.
Central Limit Theorem – if we sample from a population with mean μ and standard deviation σ, then Z = (x̄ − μ)/(σ/√n) is approximately standard normal for large n.
Notice that the central limit theorem works even when we do not have a normal distribution. Of course, the question becomes: how large does n have to be in order for the central limit theorem to work? The answer is that if n is 30 or larger, the central limit theorem will apply in almost all cases. So for class purposes, this is the general rule we will use.
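A short simulation can make the theorem concrete. The sketch below (illustrative Python/NumPy, not from the notes) samples from a clearly non-normal exponential population with mean 2 and standard deviation 2, standardizes each sample mean, and checks that the resulting Z values behave like a standard normal once n is around 30 or more.

```python
# A sketch of the central limit theorem in action (illustrative, not from the
# notes): the population is exponential (strongly skewed, not normal at all),
# yet the standardized sample means look standard normal for n around 30+.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 2.0, 2.0, 36        # an exponential with mean 2 also has sigma = 2

z_values = np.empty(10_000)
for i in range(10_000):
    sample = rng.exponential(scale=2.0, size=n)
    z_values[i] = (sample.mean() - mu) / (sigma / np.sqrt(n))

# A standard normal has mean 0, SD 1, and about 95% of values in (-1.96, 1.96).
print("mean of Z:", round(z_values.mean(), 3))
print("SD of Z:  ", round(z_values.std(), 3))
print("P(-1.96 < Z < 1.96):", np.mean((z_values > -1.96) & (z_values < 1.96)))
```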
Example 1
A population of soft drink cans has amounts of liquid following a normal distribution with μ = 12 oz. and σ = 0.2 oz.
1. What is the probability that a single can is between 11.9 and 12.1 oz.?
2. What is the probability that x̄ is between 11.9 and 12.1 oz. for n = 16 cans?
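A worked sketch of Example 1, assuming SciPy's normal CDF is available: part 1 standardizes a single can with σ = 0.2 (z = ±0.5), and part 2 uses the standard error 0.2/√16 = 0.05 (z = ±2).

```python
# A worked sketch of Example 1, assuming SciPy is available for the normal CDF.
from scipy.stats import norm

mu, sigma, n = 12, 0.2, 16

# 1. A single can: X is normal with mean 12 and SD 0.2, so z = +/-0.5.
p_single = norm.cdf(12.1, loc=mu, scale=sigma) - norm.cdf(11.9, loc=mu, scale=sigma)

# 2. The mean of 16 cans: x-bar is normal with SD 0.2 / sqrt(16) = 0.05, so z = +/-2.
se = sigma / n ** 0.5
p_mean = norm.cdf(12.1, loc=mu, scale=se) - norm.cdf(11.9, loc=mu, scale=se)

print(round(p_single, 4))  # about 0.3829
print(round(p_mean, 4))    # about 0.9545
```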
Example 2
A population of trees has heights with a mean of 110 feet and a standard deviation of 20 feet. Suppose a sample of 100 trees is selected. Find the following.
1. μ_x̄
2. σ_x̄
3. P(x̄ < 108 feet)
4. What about P(X < 108)?
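One way to work Example 2 (a sketch, leaning on the central limit theorem since n = 100 is large): μ_x̄ = 110 feet, σ_x̄ = 20/√100 = 2 feet, and P(x̄ < 108) corresponds to Z = (108 − 110)/2 = −1, which is roughly 0.16. P(X < 108) for a single tree cannot be computed from the information given, because we are not told that individual tree heights are normally distributed; the central limit theorem only helps with the sample mean.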
3. Sampling Distribution of p̂
Population Proportion (p) = (# in population with characteristic) / (# in population)
Sample Proportion (p̂) = (# in sample with characteristic) / n
μ_p̂ = p
σ_p̂ = √(p(1 − p)/n)
If we sample from a population with proportion p, then Z = (p̂ − p)/√(p(1 − p)/n) is approximately standard normal for large n.
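As a small illustrative check (numbers not from the notes): with p = 0.5 and n = 100, σ_p̂ = √(0.5 · 0.5/100) = 0.05, so roughly 95% of samples would give a p̂ within about 0.10 of the true proportion.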
Example 3
Suppose the president’s approval rating is 56% and we look at samples of size 100.
1. Find μ_p̂
2. Find σ_p̂
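A sketch of the computation: μ_p̂ = p = 0.56, and σ_p̂ = √(0.56 · 0.44/100) ≈ 0.0496, so samples of 100 people typically give an approval rating within about 5 percentage points of 56%.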
Example 4
A survey of 1024 registered voters finds that 512 plan to vote for the Republican candidate.
p = proportion of all voters who plan to vote for the Republican candidate
1. Calculate p̂
2. Calculate the margin of error (remember this is 1/√n from section I. B.)
3. Calculate an estimate for the standard deviation of the sampling distribution
4. The empirical rule says that 95% of data should be within 2 standard deviations. Do
you see where the margin of error comes from?
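A short sketch tying the pieces of Example 4 together (plain Python, using only the numbers in the problem): p̂ = 512/1024 = 0.5, the section I. B. margin of error is 1/√1024 = 0.03125, the estimated standard error is √(0.5 · 0.5/1024) = 0.015625, and two standard errors equals the margin of error here because √(p̂(1 − p̂)) = 0.5.

```python
# A sketch tying Example 4 together: the 1/sqrt(n) margin of error from
# section I. B. is about two standard errors of p-hat when p-hat is near 0.5.
n, successes = 1024, 512

p_hat = successes / n                          # 1. 512/1024 = 0.5
margin_of_error = 1 / n ** 0.5                 # 2. 1/sqrt(1024) = 0.03125
std_error = (p_hat * (1 - p_hat) / n) ** 0.5   # 3. sqrt(0.25/1024) = 0.015625

# 4. Two standard errors = 0.03125, exactly the margin of error here,
#    because sqrt(p_hat * (1 - p_hat)) = 0.5 when p_hat = 0.5.
print(p_hat, margin_of_error, std_error, 2 * std_error)
```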