Additional notes on sampling

The Concept of a Sampling Distribution
(Predicting the behavior of a statistic)
Parameter & Statistic
A parameter is a numerical descriptive measure
of a population. Because it is based on all the
observations in the population, its value is
almost always unknown.
A sample statistic is a numerical descriptive
measure of a sample. It is calculated from the
observations in the sample.
Common Statistics & Parameters
                       Sample Statistic    Population Parameter
Mean                   x̄                   µ
Standard Deviation     s                   σ
Variance               s²                  σ²
Binomial Proportion    p̂                   p
• The median household income in the US is roughly $51,900.
• The mean household income is $70,900 (people like Bill Gates pull the mean to the right).
• Now, suppose we take a random sample of 2,000 U.S. households and gather information on their annual income.
• Assume we obtained a representative sample.
• So the drawn sample should, on average, look like the U.S. population.
• It should include all types of people: homeless people, academics, very rich people, etc.
• So we expect the sample mean household income to be about $70,900.
• Will it be exactly $70,900?
• If we draw different samples of 2,000, we would expect some means to be higher and some to be lower.
• Might we get a sample of 2,000 with a mean household income of $500,000?
• That is possible but highly unlikely, unless our sample comes only from the very rich part of the population.
• It is also highly unlikely to get a mean annual income of $7,000.
• We cannot compare two statistics on the basis of their performance for a single sample.
• Sample statistics are themselves random variables, because different samples can lead to different values of the sample statistics.
• As random variables, sample statistics must be judged and compared on the basis of their probability distributions (i.e., the collection of values and associated probabilities of each statistic that would be obtained if the sampling experiment were repeated a very large number of times).
Sampling Distribution
The sampling distribution of a sample statistic
calculated from a sample of n measurements is
the probability distribution of the statistic.
Developing Sampling Distributions
Suppose There’s a Population ...
• Population size, N = 4
• Random variable, x
• Values of x: 1, 2, 3, 4
• Uniform distribution
Population Characteristics
Summary Measure
$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = 2.5$
Population Distribution
[Figure: P(x) plotted against x = 1, 2, 3, 4]
All Possible Samples of Size n = 2
(Sample with replacement)

16 Samples
1st Obs    2nd Observation
           1      2      3      4
1          1,1    1,2    1,3    1,4
2          2,1    2,2    2,3    2,4
3          3,1    3,2    3,3    3,4
4          4,1    4,2    4,3    4,4

16 Sample Means
1st Obs    2nd Observation
           1      2      3      4
1          1.0    1.5    2.0    2.5
2          1.5    2.0    2.5    3.0
3          2.0    2.5    3.0    3.5
4          2.5    3.0    3.5    4.0
Sampling Distribution of All Sample Means

16 Sample Means
1st Obs    2nd Observation
           1      2      3      4
1          1.0    1.5    2.0    2.5
2          1.5    2.0    2.5    3.0
3          2.0    2.5    3.0    3.5
4          2.5    3.0    3.5    4.0

Sampling Distribution of the Sample Mean
[Figure: P(x̄) plotted against x̄ = 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
Summary Measure of All Sample Means
$\mu_{\bar{x}} = \frac{\sum_{i=1}^{N} \bar{x}_i}{N} = \frac{1.0 + 1.5 + \cdots + 4.0}{16} = 2.5$
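The enumeration above can be reproduced with a short script. This is a minimal sketch in Python (the variable names are illustrative and not from the slides): it lists all 16 ordered samples of size 2 drawn with replacement from {1, 2, 3, 4}, tabulates the sampling distribution of the sample mean, and computes the mean of that distribution.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]  # N = 4, each value equally likely
n = 2                      # sample size, sampling with replacement

# All 4**2 = 16 ordered samples and their means, kept as exact fractions.
samples = list(product(population, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution: every ordered sample has probability 1/16.
counts = Counter(means)
sampling_dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

# Mean of the sampling distribution of the sample mean.
mu_xbar = sum(m * p for m, p in sampling_dist.items())

for m, p in sampling_dist.items():
    print(f"x-bar = {float(m):.1f}  P = {p}")
print("mean of the sample means =", float(mu_xbar))
```

The most likely sample mean is 2.5 (4 of the 16 samples), and the extreme values 1.0 and 4.0 each arise from a single sample.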
Comparison
[Figure: Population Distribution (P(x) against x = 1, 2, 3, 4) beside the Sampling Distribution of the Sample Mean (P(x̄) against x̄ = 1.0 to 4.0), with µ_x̄ = 2.5]
Example 5.1
Consider the popular casino game of craps, in which a player throws two dice and bets on the outcome (the sum of the dots showing on the upper faces of the two dice). If the sum of the dice is 7 or 11, the roller wins $5; if the total is 2, 3, or 12, the roller loses $5; and for any other total (4, 5, 6, 8, 9, 10) no money is won or lost.
Let x represent the result of the come-out roll wager.
Outcome of wager    -5     0      5
p(x)                1/9    6/9    2/9
Now consider a random sample of n = 3 come-out rolls.
Find the sampling distribution of the sample mean.
Find the sampling distribution of the sample median.
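One way to work Example 5.1 is to enumerate all 27 ordered samples of three wagers. The sketch below is an illustration, not the slides' own solution. It assumes the wager probabilities implied by two fair dice (lose on 2, 3, 12 with probability 4/36; no change on 4, 5, 6, 8, 9, 10 with probability 24/36; win on 7, 11 with probability 8/36), and the names `p_x`, `mean_dist`, and `median_dist` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from statistics import median

# Distribution of one come-out wager x (dollars), assuming two fair dice.
p_x = {-5: Fraction(1, 9), 0: Fraction(6, 9), 5: Fraction(2, 9)}
n = 3  # number of come-out rolls in each sample

mean_dist = defaultdict(Fraction)    # sampling distribution of the mean
median_dist = defaultdict(Fraction)  # sampling distribution of the median

for sample in product(p_x, repeat=n):
    # Rolls are independent, so the sample probability is a product.
    prob = Fraction(1)
    for x in sample:
        prob *= p_x[x]
    mean_dist[Fraction(sum(sample), n)] += prob
    median_dist[median(sample)] += prob

print("Sample mean:")
for value in sorted(mean_dist):
    print(f"  {float(value):6.2f}  {mean_dist[value]}")
print("Sample median:")
for value in sorted(median_dist):
    print(f"  {value:6d}  {median_dist[value]}")
```

The sample mean can take seven values between -5 and 5, while the sample median can only be -5, 0, or 5.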
Another example
• Let's consider this class as the population.
• We are interested in your grades on the first midterm exam.
• The mean of your first exam grades is 76.34.
• Since we consider this class to be the population, this value is a parameter, µ.
• We will take 20 samples of size 15.
• We will calculate the mean and the median for each sample.
Sampling distributions for mean and median
• Don’t confuse the sampling distribution with
the distribution of the sample.
– When you take a sample, you look at the
distribution of the values, usually with a
histogram, and you may calculate summary
statistics.
– The sampling distribution is an imaginary
collection of the values that a statistic might have
taken for all random samples—the one you got
and the ones you didn’t get.
5.2 Properties of Sampling Distributions: Unbiasedness and Minimum Variance
Point Estimator
A point estimator of a population parameter is a
rule or formula that tells us how to use the
sample data to calculate a single number that
can be used as an estimate of the population
parameter.
Estimates
• If the sampling distribution of a sample
statistic has a mean equal to the population
parameter the statistic is intended to estimate,
the statistic is said to be an unbiased
estimate of the parameter.
• If the mean of the sampling distribution is not
equal to the parameter, the statistic is said to
be a biased estimate of the parameter.
Comparison
[Figure: Population Distribution beside the Sampling Distribution of the Sample Mean (µ_x̄ = 2.5), followed by sampling distributions labeled Unbiased and Biased]
stt315
• With only 20 samples, the average of the 20 sample means will generally not equal µ exactly.
• But statistical theory tells us that over a very large number of repeated samples the sample means average out to µ, so the sample mean is an unbiased estimate of µ.
Standard Error
The standard deviation of a sampling distribution
measures another important property of statistics: the
spread of these estimates generated by repeated
sampling.
• Even though both statistics, A and B, have sampling distributions centered at the parameter, the probability that A is closer to the parameter value is higher than the probability that B is closer to the parameter value.
• It is better to use a statistic that is centered at the parameter and has smaller variation, i.e., a smaller standard error.
Standard Error (cont.)
• To make an inference about a population parameter, we use the sample statistic whose sampling distribution is unbiased and has a smaller standard deviation than that of any other unbiased statistic.
• The standard deviation of the sampling distribution of a statistic is also called the standard error of the statistic.
Back to Example 5.1
With its smaller standard error, the sample mean appears to be the better estimator of the population mean.
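The comparison can be checked by computing the center and standard error of both statistics exactly. The following is a hedged sketch rather than the slides' calculation: it assumes the fair-dice wager probabilities (1/9, 6/9, 2/9 for -5, 0, 5) and samples of n = 3, and `summarize` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product
from math import sqrt
from statistics import median

# Wager distribution for one come-out roll, assuming two fair dice.
p_x = {-5: Fraction(1, 9), 0: Fraction(6, 9), 5: Fraction(2, 9)}
n = 3
mu = sum(x * p for x, p in p_x.items())  # population mean

def summarize(statistic):
    """Return (mean, standard error) of a statistic over all samples of size n."""
    first = Fraction(0)   # accumulates E[T]
    second = Fraction(0)  # accumulates E[T^2]
    for sample in product(p_x, repeat=n):
        prob = Fraction(1)
        for x in sample:
            prob *= p_x[x]
        t = Fraction(statistic(sample))
        first += prob * t
        second += prob * t * t
    return first, sqrt(second - first ** 2)

mean_center, mean_se = summarize(lambda s: Fraction(sum(s), n))
median_center, median_se = summarize(median)

print(f"population mean      = {float(mu):.4f}")
print(f"sample mean:   center = {float(mean_center):.4f}, SE = {mean_se:.4f}")
print(f"sample median: center = {float(median_center):.4f}, SE = {median_se:.4f}")
```

Under these assumptions the sampling distribution of the sample mean is centered exactly at µ = 5/9, the sample median is centered slightly below it, and the mean has the smaller standard error.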
Thinking Challenge
5.3 The Sampling Distribution of a Sample Mean and the Central Limit Theorem
Properties of the Sampling Distribution of x̄
1. Mean of the sampling distribution equals mean of sampled population*, that is,
$\mu_{\bar{x}} = E(\bar{x}) = \mu$.
2. Standard deviation of the sampling distribution equals
(Standard deviation of sampled population) / (Square root of sample size)
That is, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.
$\sigma_{\bar{x}}$ is called the standard error of the sample mean.
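Both properties can be derived in a few lines. The sketch below assumes that the n observations $x_1, \dots, x_n$ are independent draws from a population with mean $\mu$ and variance $\sigma^2$ (as in sampling with replacement); the independence assumption is stated here explicitly and is what makes the variances add.

```latex
\begin{aligned}
E(\bar{x}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(x_i)
            = \frac{1}{n}\, n\mu = \mu, \\
\operatorname{Var}(\bar{x}) &= \frac{1}{n^{2}}\sum_{i=1}^{n} \operatorname{Var}(x_i)
            = \frac{1}{n^{2}}\, n\sigma^{2} = \frac{\sigma^{2}}{n},
\qquad
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}.
\end{aligned}
```

As a check against the earlier population of values 1, 2, 3, 4: there $\sigma^2 = 1.25$, and with n = 2 the 16 sample means have variance $1.25 / 2 = 0.625$.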
Theorem 5.1
If a random sample of n observations is selected from a population with a normal distribution, the sampling distribution of x̄ will be a normal distribution.
Sampling from Normal Populations
• Central Tendency
$\mu_{\bar{x}} = \mu$
• Dispersion
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
– Sampling with replacement
[Figure: Population Distribution with µ = 50 and σ = 10; Sampling Distribution with µ_x̄ = 50, σ_x̄ = 5 for n = 4 and σ_x̄ = 2.5 for n = 16]
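Standardizing the Sampling Distribution of x̄
$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
[Figure: Sampling Distribution of x̄ (mean µ_x̄, standard deviation σ_x̄) mapped to the Standardized Normal Distribution of z (µ = 0, σ = 1)]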
Thinking Challenge
You’re an operations analyst
for AT&T. Long-distance
telephone calls are normally
distributed with µ = 8 min.
and σ = 2 min. If you select
random samples of 25 calls,
what percentage of the
sample means would be
between 7.8 & 8.2 minutes?
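Sampling Distribution Solution*
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{7.8 - 8}{2 / \sqrt{25}} = -.50$
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{8.2 - 8}{2 / \sqrt{25}} = .50$
[Figure: Sampling Distribution with σ_x̄ = .4, shaded between x̄ = 7.8 and 8.2 around 8; Standardized Normal Distribution with σ = 1, shaded between z = –.50 and .50; areas .1915 + .1915 = .3830]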
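The table lookup can be cross-checked numerically. This is a small sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative, and the inputs are the values stated in the problem.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8.0, 2.0, 25       # minutes; values from the problem statement
se = sigma / sqrt(n)              # standard error of the sample mean

# By Theorem 5.1 the sample mean is normal with mean mu and sd se.
xbar_dist = NormalDist(mu, se)
prob = xbar_dist.cdf(8.2) - xbar_dist.cdf(7.8)

z_low = (7.8 - mu) / se
z_high = (8.2 - mu) / se
print(f"se = {se:.2f}, z from {z_low:.2f} to {z_high:.2f}, P = {prob:.4f}")
```

The computed probability agrees with the table-based .3830 up to the rounding of the table entries, so roughly 38% of sample means fall between 7.8 and 8.2 minutes.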
Sampling from Non-Normal Populations
• Central Tendency
$\mu_{\bar{x}} = \mu$
• Dispersion
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
– Sampling with replacement
[Figure: Population Distribution with µ = 50 and σ = 10; Sampling Distribution with µ_x̄ = 50, σ_x̄ = 5 for n = 4 and σ_x̄ = 1.8 for n = 30]
Central Limit Theorem
Consider a random sample of n observations selected from a population (any probability distribution) with mean µ and standard deviation σ. Then, when n is sufficiently large, the sampling distribution of x̄ will be approximately a normal distribution with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma / \sqrt{n}$. The larger the sample size, the better will be the normal approximation to the sampling distribution of x̄.
Central Limit Theorem
As sample size gets large enough (n ≥ 30), the sampling distribution of x̄ becomes almost normal, with
$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.
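A simulation can make the theorem concrete. The sketch below is an illustration under stated assumptions, not part of the slides: it assumes an exponential population with mean 1 and standard deviation 1 (a clearly non-normal, right-skewed choice), draws 5,000 samples of size n = 30 with a fixed seed, and compares the center and spread of the sample means with µ and σ/√n.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(0)   # fixed seed so the run is repeatable

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0
n = 30                   # sample size
reps = 5000              # number of repeated samples

# One sample mean per repeated sample.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(reps)
]

center = mean(sample_means)    # should be close to mu
spread = pstdev(sample_means)  # should be close to sigma / sqrt(n)
se = sigma / sqrt(n)

# For a normal distribution about 68% of values lie within one sd of the mean.
within_one_se = sum(abs(m - mu) <= se for m in sample_means) / reps

print(f"mean of sample means = {center:.3f} (mu = {mu})")
print(f"sd of sample means   = {spread:.3f} (sigma/sqrt(n) = {se:.3f})")
print(f"share within one standard error = {within_one_se:.3f}")
```

Rerunning with a smaller n (for example n = 4) should show a visibly more skewed set of sample means, which is the reason for the n ≥ 30 rule of thumb.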
Central Limit Theorem Example
The amount of soda in cans of a
particular brand has a mean of
12 oz and a standard deviation
of .2 oz. If you select random
samples of 50 cans, what
percentage of the sample means
would be less than 11.95 oz?
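Central Limit Theorem Solution*
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{11.95 - 12}{.2 / \sqrt{50}} = -1.77$
[Figure: Sampling Distribution with σ_x̄ = .03, shaded below x̄ = 11.95 (mean 12); Standardized Normal Distribution with σ = 1, shaded below z = –1.77; area .5 − .4616 = .0384. Shaded area exaggerated.]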
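The same calculation without rounding the intermediate values is sketched below. It relies on the Central Limit Theorem, since the problem does not say the fill amounts are normal and n = 50 is large; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 12.0, 0.2, 50    # ounces; values from the problem statement
se = sigma / sqrt(n)            # standard error, about .028 before rounding
z = (11.95 - mu) / se           # z-score of a sample mean of 11.95 oz

# Approximate P(sample mean < 11.95) with the standard normal cdf.
prob = NormalDist().cdf(z)
print(f"se = {se:.4f}, z = {z:.2f}, P = {prob:.4f}")
```

The unrounded result differs from the table-based .0384 only in the fourth decimal place, because the slide rounds z to –1.77 before the lookup.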
• When the population standard deviation is unknown:
Thinking Challenge
• Assume that the systolic blood pressure of 30-year-old males is normally distributed, with a mean of 122 mmHg and a standard deviation of 10 mmHg. A random sample of 16 men from this age group is selected.
• Calculate the probability that the average blood pressure of the sample will be greater than 125 mmHg.
• Calculate the probability that the average blood pressure of this sample will be between 118 and 124 mmHg.
• Calculate the probability that the blood pressure of an individual male from this population will be between 118 and 124 mmHg.
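One possible way to work this challenge is sketched below; it is not a solution taken from the slides. Because the population is normal, Theorem 5.1 gives a normal sampling distribution for the mean of 16 men with standard error 10/√16 = 2.5, while a single individual follows the population distribution itself. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 122.0, 10.0, 16   # mmHg; values from the problem statement

individual = NormalDist(mu, sigma)             # one man's blood pressure
sample_mean = NormalDist(mu, sigma / sqrt(n))  # mean of 16 men, sd = 2.5

p_mean_above_125 = 1 - sample_mean.cdf(125)
p_mean_118_124 = sample_mean.cdf(124) - sample_mean.cdf(118)
p_individual_118_124 = individual.cdf(124) - individual.cdf(118)

print(f"P(sample mean > 125)       = {p_mean_above_125:.4f}")
print(f"P(118 < sample mean < 124) = {p_mean_118_124:.4f}")
print(f"P(118 < individual < 124)  = {p_individual_118_124:.4f}")
```

The interval 118 to 124 is far more likely to contain the sample mean than a single man's reading, because the sample mean has one quarter of the individual's standard deviation.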
Thinking Challenge
• Assume that the average weight of an NFL player is 245.7 pounds with a standard deviation of 34.5 pounds, but the probability distribution of the population is unknown. A random sample of 32 players is selected.
• What is the probability that the average weight of the sample will be less than 234 pounds?
• What is the probability that the average weight of the sample is between 248 and 254 pounds?
The Sampling Distribution of the Sample Proportion
(Predicting the behavior of discrete random variables)
Sample Proportion
Just as the sample mean is a good estimator of the population mean, the sample proportion, denoted p̂, is a good estimator of the population proportion p. How good the estimator p̂ is will depend on the sampling distribution of the statistic. This sampling distribution has properties similar to those of the sampling distribution of x̄.
Z-score for the sampling distribution of proportion
$z = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}}$
• When we do not know the population
proportions;
Thinking Challenge
• A report claims that 15% of women are left-handed.
a) Calculate the probability that more than 12% of a random sample of 100 women are left-handed.
np = 100 × 0.15 = 15 ≥ 15
n(1 − p) = 100 × 0.85 = 85 ≥ 15
So we can use the normal approximation to the binomial distribution.
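Thinking Challenge (cont.)
b) Calculate the probability that between 11% and 16% of the women in a random sample of 100 are left-handed.
$P(0.11 < \hat{p} < 0.16) = P\left(\frac{0.11 - 0.15}{\sqrt{0.15 \times 0.85 / 100}} < z < \frac{0.16 - 0.15}{\sqrt{0.15 \times 0.85 / 100}}\right)$
$= P(-1.12 < z < 0.28)$
$= P(-1.12 < z < 0) + P(0 < z < 0.28)$
$= 0.3686 + 0.1103 = 0.4789$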
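Both parts of this challenge can be checked numerically with the normal approximation. The sketch below assumes the z-score $z = (\hat{p} - p)/\sqrt{p(1-p)/n}$ used in part (b); the result for part (a) is computed here and does not appear in the transcript, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.15, 100             # claimed proportion and sample size
se = sqrt(p * (1 - p) / n)   # standard error of the sample proportion

# Normal approximation to the sampling distribution of p-hat.
p_hat_dist = NormalDist(p, se)

# a) P(p-hat > 0.12)
prob_a = 1 - p_hat_dist.cdf(0.12)

# b) P(0.11 < p-hat < 0.16)
prob_b = p_hat_dist.cdf(0.16) - p_hat_dist.cdf(0.11)

print(f"se = {se:.4f}")
print(f"a) P(p-hat > 0.12)        = {prob_a:.4f}")
print(f"b) P(0.11 < p-hat < 0.16) = {prob_b:.4f}")
```

Part (b) matches the table-based 0.4789 to within rounding of the z-scores.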