Download Sampling Distributions - CSUSB Math Department

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Statistics wikipedia , lookup

Transcript
Learning Objectives
Sampling Distributions
1.  Statistic vs. Parameter
2.  Sampling Distributions
3.  Mean and Standard Deviation of the
Sampling Distribution of a Proportion
4.  Standard Error
Section 1
How Likely Are the Possible Values of a
Statistic? The Sampling Distribution
5.  Sampling Distribution Example
6.  Population, Data, and Sampling Distributions
Learning Objective 1:
Statistic and Parameter
Learning Objective 2:
Sampling Distributions
!  A statistic is a numerical summary of sample
Example:
data such as a sample proportion or sample
mean
!  A parameter is a numerical summary of a
population such as a population proportion or
population mean.
!  In practice, we seldom know the values of
parameters.
!  Parameters are estimated using sample data.
!  We use statistics to estimate parameters.
! 
Prior to counting the votes, the proportion in favor of
recalling Governor Gray Davis was an unknown
parameter.
! 
An exit poll of 3160 voters reported that the sample
proportion in favor of a recall was 0.54.
If a different random sample of about 3000 voters were
selected, a different sample proportion would occur.
! 
The sampling distribution of the sample proportion
shows all possible values and the probabilities for
those values.
1
Learning Objective 2:
Sampling Distributions
!  The sampling distribution of a statistic is the
probability distribution that specifies
probabilities for the possible values the
statistic can take.
!  Sampling distributions describe the variability
that occurs from study to study using sample
statistics to estimate population parameters
!  Sampling distributions help to predict how
close a statistic falls to the parameter it
estimates
Learning Objective 3:
Mean and SD of the Sampling Distribution of a
Proportion
!  For a random sample of size n from a
population with proportion p of outcomes in a
particular category, the sampling distribution
of the proportion of the sample in that
category has
Mean = p
standard deviation =
p(1 - p)
n
Learning Objective 4:
The Standard Error
Learning Objective 5:
Example: 2006 California Election
!  To distinguish the standard deviation of a
!  If the population proportion supporting the
sampling distribution from the standard
deviation of an ordinary probability
distribution, we refer to it as a standard
error.
reelection of Schwarzenegger was 0.50, would
it have been unlikely to observe the exit-poll
sample proportion of 0.565?
!  Based on your answer, would you be willing to
predict that Schwarzenegger would win the
election?
2
Learning Objective 5:
Example: 2006 California
Learning Objective 5:
Example: 2006 California Election
!  Given that the exit poll had 2705 people and
assuming 50% support the re-election of
Schwarzenegger,
! 
Find the estimate of the population proportion and
the standard error:
p = .5
σ=
.5 * (1 − .5)
= .0096
2705
Learning Objective 5:
Example: 2006 California Election
z=
(0.565 - 0.50)
= 6.8
0.0096
Learning Objective 6:
Population Distribution
!  Population distribution: This is the
probability distribution from which we take the
sample.
!  Values
!  The sample proportion of 0.565 is more than six
standard errors from the expected value of 0.50.
of its parameters are usually unknown.
They’re what we’d like to learn about.
!  The sample proportion of 0.565 voting for re-election
of Schwarzenegger would be very unlikely if the
population proportion were p = 0.50 or p < 0.50
3
Learning Objective 6:
Data distribution
Learning Objective 6:
Sampling Distribution
!  This is the distribution of the sample data.
!  This is the probability distribution of a sample
It’s the distribution we actually see in
practice.
!  It’s described by statistics
!  With random sampling, the larger the sample
size n, the more closely the data distribution
resembles the population distribution
statistic.
!  With random sampling, the sampling
distribution provides probabilities for all the
possible values of the statistic.
!  The sampling distribution provides the key for
telling us how close a sample statistic falls to
the corresponding unknown parameter.
!  Its standard deviation is called the standard
error.
Learning Objective 6:
Example
Learning Objective 6:
Example
!  In the 2006 U.S. Senate election in NY
!  An exit poll of 1336 voters showed
!  The population distribution is the 4.1 million values
! 
! 
67% (895) voted for Clinton
33% (441) voted for Spencer
!  When
all 4.1 million votes were tallied
68% voted for Clinton
!  32% voted for Spencer
! 
!  Let X= vote outcome with x=1 for Clinton and
x=0 for Spencer
of the x vote variable, 32% of which are 0 and 68% of
which are 1.
!  The data distribution is the 1336 values of the x vote
for the exit poll, 33% of which are 0 and 67% of which
are 1.
!  The sampling distribution of the sample proportion is
approximately a normal distribution with p=0.68 and
σ = 0.68(1− 0.68) /1336 = 0.013
!  Only the sampling distribution is bell-shaped; the
others are discrete and concentrated at the two
values 0 and 1
4
Learning Objectives
Sampling Distributions
1.  The Sampling Distribution of the Sample
Mean
2.  Effect of n on the Standard Error
3.  Central Limit Theorem (CLT)
4.  Calculating Probabilities of Sample Means
Section 2
How Close Are Sample Means to
Population Means?
Learning Objective 1:
The Sampling Distribution of the Sample Mean
!  The sample mean, x, is a random variable.
!  The sample mean varies from sample to
sample.
!  By contrast, the population mean, µ, is a
single fixed number.
Learning Objective 1:
The Sampling Distribution of the Sample Mean
!  For a random sample of size n from a population having mean
µ and standard deviation σ, the sampling distribution of the
sample mean has:
! 
Center described by the mean µ (the same as the mean
of the population).
! 
Spread described by the standard error, which equals
the population standard deviation divided by the square
root of the sample size:
n
!  standard error of x = σ
5
Learning Objective 1:
Example 1: Pizza Sales
Learning Objective 2:
Effect of n on the Standard Error
!  Knowing how to find a standard error gives us a
!  Daily sales at a pizza restaurant vary from
day to day.
!  The sales figures fluctuate around a mean µ
= $900 with a standard deviation σ = $300.
!  What are the center and spread of the
sampling distribution?
mechanism for understanding how much variability to
expect in sample statistics “just by chance.”
σ
!  The standard error of the sample mean =
n
!  As the sample size n increases, the denominator
increases, so the standard error decreases.
!  With larger samples, the sample mean is more likely
to fall closer to the population mean.
µ = $900
standard error =
300
= 113
7
Learning Objective 3:
Central Limit Theorem
Learning Objective 3:
Central Limit Theorem (CLT)
!  Question: How does the sampling distribution
!  For random sampling with a large sample
of the sample mean relate with respect to
shape, center, and spread to the probability
distribution from which the samples were
taken?
size n, the sampling distribution of the sample
mean is approximately a normal distribution.
!  This result applies no matter what the shape
of the probability distribution from which the
samples are taken (with a few notable
exceptions).
6
Learning Objective 3:
CLT: How Large a Sample?
Learning Objective 3:
CLT: Impact of increasing n
!  The sampling distribution of the sample mean takes
more of a bell shape as the random sample size n
increases.
!  The more skewed the population distribution, the
larger n must be before the shape of the sampling
distribution is close to normal.
!  In practice, the sampling distribution is usually close
to normal when the sample size n is at least about
30.
!  If the population distribution is approximately normal,
then the sampling distribution is approximately
normal for all sample sizes.
Learning Objective 3:
CLT: Making Inferences
Learning Objective 4:
Calculating Probabilities of Sample Means
!  For large n, the sampling distribution is
!  The distribution of weights of milk bottles is normally
approximately normal even if the population
distribution is not.
!  This enables us to make inferences about
population means regardless of the shape of
the population distribution.
distributed with a mean of 1.1 lbs and a standard
deviation (σ)=0.20.
!  What is the probability that the mean of a random sample
of 5 bottles will be greater than 0.99 lbs?
! 
Calculate the mean and standard error for the sampling
distribution of a random sample of 5 milk bottles
!  By the CLT, x
is approximately normal with mean=1.1
and standard error = 0.2 5 =0.0894
! 
P( x >0.99)=
normcdf (.99,1E 99,1.1,
.20
) = .89
5
7
Learning Objective 4:
Calculating Probabilities of Sample Means
Learning Objective 4:
Calculating Probabilities of Sample Means
!  Closing prices of stocks have a right skewed
!  An automobile insurer has found that repair claims
distribution with a mean (µ) of $25 and σ= $20.
!  What is the probability that the mean of a random sample
of 40 stocks will be less than $20?
have a mean of $920 and a standard deviation of
$870. Suppose that the next 100 claims can be
regarded as a random sample from the long-run
claims process.
!  What is the probability that the average of the 100
claims is larger than $900?
! 
Calculate the mean and standard error for the sampling
distribution of a random sample of 40 stocks
x is approximately normal with mean=25
!  By the CLT,
and standard error = 20 40=3.1623
! 
P( x <20)=
normcdf (−1E99, 20, 25,
20
) = .06
40
normcdf (900,1E99, 920,
870
) = .59
100
Learning Objective 4:
Calculating Probabilities of Sample Means
Learning Objective 4:
Calculating Probabilities of Sample Means
Example: the distribution of actual weights of 8
oz. wedges of cheddar cheese produced by a
certain company is normal with mean =8.1 oz.
and standard deviation =0.1 oz.
!  Find the value x such that there is only a 10%
chance that the average weight of a sample
of five wedges will be above x.
Example: the distribution of actual weights of 8
oz. wedges of cheddar cheese produced by
a certain company is normal with mean =8.1
oz. and standard deviation =0.1 oz.
!  Find the value x such that there is only a 5%
chance that the average weight of a sample
of five wedges will be below x.
invnorm(.9,8.1,
.1
) = 8.16
5
invnorm(.05,8.1,
.1
) = 8.03
5
8
Learning Objectives
1.  Using the CLT to Make Inferences
2.  Standard Errors in Practice
3.  Sampling Distribution for a Proportion
Section 3
How Can We Make Inferences About a
Population?
Learning Objective 1:
Using the CLT to Make Inferences
Learning Objective 2:
Standard Errors in Practice
Implications of the CLT
In practice, standard errors are estimated
!  When
the sampling distribution of the sample
mean is approximately normal, x falls within 2
standard errors of µ with probability close to
0.95 and almost certainly falls within 3
standard errors of µ.
!  For large n, the sampling distribution of x is
approximately normal no matter what the
shape of the underlying population distribution.
!  Standard
errors have exact values depending
on parameter values, e.g.,
p(1− p) n for a sample proportion
! 
! 
σ
n
for a sample mean
!  In
practice, these parameter values are
unknown. Inference methods use standard
errors that substitute sample values for the
parameters in the exact formulas above
These estimated standard errors are the
numbers we use in practice.
9
Learning Objective 3:
Sampling Distribution for a Proportion
!  The binomial probability distribution is the
sampling distribution for the number of successes
in n independent trials
!  In practice, the sample proportion of successes is
the statistic usually reported
!  Since the sample proportion is simply the number
of successes divided by the number of trials, the
formulas for the mean and standard deviation of
the sampling distribution of the proportion of
successes are the formulas for the mean and
standard deviation of the number of successes
divided by n.
Learning Objective 3:
Sampling Distribution for a Proportion
!  For a binomial random variable with n trials and
probability p of success for each, the sampling
distribution of the proportion of successes has
! 
Mean = p
Standard error =
p(1− p)
n
!  These values can be found by taking the mean np
and the standard deviation np(1− p) for the
binomial distribution of the number of successes
and dividing by n
! 
10