10-1
TOPIC (10) – SAMPLING VARIABILITY AND
SAMPLING DISTRIBUTIONS
Recall that we typically cannot census the entire
population of interest so we take a sample from that
population in order to make estimates and draw
conclusions about the population.
The sample mean x̄ is the estimator of the unknown
population mean µ. Similarly, the sample standard
deviation s is the estimator of the unknown population
standard deviation σ.
10-2
1) SAMPLING DISTRIBUTION of the Sample
Mean x̄
Important Point: The value of x̄ will vary with
each sample taken from the population.
10-3
EXAMPLE Suppose we had a very small population
of 5 units with X-values {2, 4, 8, 10, 14}. What is the
frequency distribution of the sample mean x̄ based
on a random sample of 2 units?
Here, µ = 7.6 and σ = 4.77 (using the n − 1 divisor).
Let’s take samples of size 2 with replacement. Ignoring
the order of the draws, the total number of possible
samples is 15.
Sample       x̄
(2, 4)       3
(2, 8)       5
(2, 10)      6
(2, 14)      8
(4, 8)       6
(4, 10)      7
(4, 14)      9
(8, 10)      9
(8, 14)     11
(10, 14)    12
(2, 2)       2
(4, 4)       4
(8, 8)       8
(10, 10)    10
(14, 14)    14
[Figure: histogram of the 15 sample means with the expected Normal curve overlaid; x-axis: upper boundaries (x <= boundary), y-axis: no. of obs]
Mean of x̄: µ_x̄ = (1/15)(3 + 5 + … + 10 + 14) = 7.6
10-4
Std. Deviation of x̄: σ_x̄ = σ/√n = 4.77/√2 = 3.376
We can think of the list of samples (and their x̄
values) as a population of samples, each sample with
a value for the variable of interest!
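The enumeration above can be checked with a short script. This is a minimal sketch, assuming (as the table does) that the 15 unordered samples are treated as equally likely and that standard deviations use the n − 1 divisor; the variable names are illustrative only.

```python
from itertools import combinations_with_replacement
from statistics import mean, stdev

population = [2, 4, 8, 10, 14]

# All unordered samples of size 2 drawn with replacement (15 of them).
samples = list(combinations_with_replacement(population, 2))
sample_means = [mean(s) for s in samples]

mu = mean(population)              # population mean
sigma = stdev(population)          # n - 1 divisor, about 4.77
mu_xbar = mean(sample_means)       # mean of the 15 sample means
sigma_xbar = stdev(sample_means)   # n - 1 divisor, about 3.376

print(len(samples), mu, round(sigma, 2))
print(mu_xbar, round(sigma_xbar, 3), round(sigma / 2 ** 0.5, 3))
```

The second line of output puts the spread of the listed sample means next to σ/√n so the two can be compared directly.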
Some Things To Note About The Behavior Of
Sample Means:
1) x̄ varies from sample to sample (called
SAMPLING VARIABILITY)
2) the average of the sample means = the average of the
population sampled
µ_x̄ = µ
The sample mean x̄ is said to be UNBIASED
for the population mean µ
3) The frequency distribution of the sample means
does not match the distribution of the original
population: it is centered in the same place but the
shape and variability (range) are different
10-5
4) Knowing the frequency distribution for the
sample means allows us to calculate probabilities
about the mean.
5) the variability of the sample means < the variability
of the X-values in the population sampled
σ_x̄ < σ
6) The frequency distribution of the sample means
is called the SAMPLING DISTRIBUTION of
x̄.
Its shape and its variability, σ_x̄, depend on the
sample size.
Its center, µ_x̄, depends on whether the sampling is
unbiased or not.
All three characteristics depend on the sampling
method (i.e. all can change if the method changes)
10-6
Effects Of Sample Size And Sampling Method
Let’s take samples of size 3 with replacement. Ignoring
the order of the draws, the total number of possible
samples is 35.
Samples:
(2, 4, 8)    (2, 4, 10)   (2, 4, 14)   (2, 8, 10)   (2, 8, 14)
(2, 10, 14)  (4, 8, 10)   (4, 8, 14)   (4, 10, 14)  (8, 10, 14)
(2, 2, 4)    (2, 4, 4)    (2, 2, 8)    (2, 8, 8)    (2, 2, 10)
(2, 10, 10)  (2, 2, 14)   (2, 14, 14)  (4, 4, 8)    (4, 8, 8)
(4, 4, 10)   (4, 10, 10)  (4, 4, 14)   (4, 14, 14)  (8, 8, 10)
(8, 10, 10)  (8, 8, 14)   (8, 14, 14)  (10, 10, 14) (10, 14, 14)
(2, 2, 2)    (4, 4, 4)    (8, 8, 8)    (10, 10, 10) (14, 14, 14)
[Figure: Frequency Distribution of Sample Means, n=3; histogram with the expected Normal curve overlaid; x-axis: upper boundaries (x <= boundary), y-axis: no. of obs]
Mean of x̄: µ_x̄ = 7.6
Std. Deviation of x̄: σ_x̄ = σ/√n = 4.77/√3 = 2.754
Increasing the sample size made the shape even more
normal and decreased the variability as well.
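The n = 3 case can be enumerated the same way. A minimal sketch, assuming unordered samples with replacement treated as equally likely and taking σ = 4.77 as quoted in the text; the helper name `sample_means` is illustrative.

```python
from itertools import combinations_with_replacement
from math import sqrt
from statistics import mean

population = [2, 4, 8, 10, 14]
sigma = 4.77  # value quoted in the text

def sample_means(values, n):
    """Means of all unordered size-n samples drawn with replacement."""
    return [mean(s) for s in combinations_with_replacement(values, n)]

means = sample_means(population, 3)

# Number of samples, mean of the sample means, and sigma / sqrt(n).
print(len(means), round(mean(means), 3), round(sigma / sqrt(3), 3))
```

Changing the second argument of `sample_means` shows how the number of possible samples grows with n while the center stays at 7.6.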
10-7
What is the probability Pr(6.6 < x̄ < 8.6)?
We can get an approximate answer using the fact that
it looks like x̄ is normally distributed with a mean of
7.6 and a standard deviation of 2.75.
Pr(6.6 < x̄ < 8.6)
= Pr((6.6 − 7.6)/2.75 < Z < (8.6 − 7.6)/2.75)
= Pr(−0.36 < Z < +0.36)
= Pr(Z < +0.36) − Pr(Z < −0.36)
= 0.6406 − 0.3594
= 0.2812
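The table lookup above can be reproduced with Python's standard library. This sketch assumes the Normal approximation with mean 7.6 and standard deviation 2.75 used in the text; it computes the probability once with z-scores rounded to two decimals (as a Z table requires) and once without rounding.

```python
from statistics import NormalDist

mu_xbar, sigma_xbar = 7.6, 2.75
lo, hi = 6.6, 8.6

# Hand calculation: round the z-scores to two decimals, as with a Z table.
z_lo = round((lo - mu_xbar) / sigma_xbar, 2)
z_hi = round((hi - mu_xbar) / sigma_xbar, 2)
std_normal = NormalDist()
p_table = std_normal.cdf(z_hi) - std_normal.cdf(z_lo)

# Same probability without rounding the z-scores.
xbar_dist = NormalDist(mu_xbar, sigma_xbar)
p_direct = xbar_dist.cdf(hi) - xbar_dist.cdf(lo)

print(round(p_table, 4), round(p_direct, 4))
```

The two results differ slightly, which is only the effect of rounding z to two decimals.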
10-8
SAMPLING DISTRIBUTION of x̄:
Suppose we have a population with a mean µ and a
standard deviation σ and we take a sample of size n.
As long as the sample is random and either we keep
the sample size to less than 5% of the population or
otherwise we sample with replacement, the frequency
distribution of the sample mean has the following
characteristics:
1. µ_x̄ = µ
2. σ_x̄ = σ/√n
3. The shape of the distribution is
a) a bell-curve (Normal), if the original population
that we sampled has a bell-curve distribution.
b) (CENTRAL LIMIT THEOREM) a bell-curve if
the sample size is relatively large regardless of the
shape of the frequency distribution of the
original population.
“relatively large” = 30 or more
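A small simulation can illustrate these three characteristics. This is only a sketch under assumptions that are not in the notes: the population is taken to be exponential with mean 1 and standard deviation 1 (strongly right-skewed), the seed is fixed, and 2000 replicate samples of size 30 are drawn.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

n = 30             # "relatively large" sample size
replicates = 2000  # number of simulated samples (arbitrary choice)

# Assumed population: exponential with mean 1 and standard deviation 1,
# which is right-skewed rather than bell-shaped.
mu, sigma = 1.0, 1.0

sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(replicates)
]

# Share of sample means within one sigma/sqrt(n) of mu;
# a Normal distribution would give about 0.68.
sigma_xbar = sigma / sqrt(n)
within_one_sd = mean([abs(m - mu) <= sigma_xbar for m in sample_means])

print(round(mean(sample_means), 3), mu)
print(round(stdev(sample_means), 3), round(sigma_xbar, 3))
print(round(within_one_sd, 3))
```

The first two lines compare the simulated center and spread with µ and σ/√n; the last gives a rough check on the bell shape.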
10-9
EXAMPLE In a study of the evolutionary history of
the amphipod Gammarus minus, one of the variables
used to distinguish subspecies is the length of the first
antennae. If the population found in caves only recently
separated from the subspecies found in springs, the
length of the antennae should be similar in the two
groups. Spring animals have an average first antennal
length of 2.6 mm and a population standard deviation of
0.7 mm.
What is the probability that your sample of 10 cave
animals would yield a mean length of 3.1 mm or larger if the
two subspecies split off recently?
First we note that the sample size is relatively small
so we need to assume that antennal length is normally
distributed (which seems reasonable). Then the
sampling distribution of x̄ is Normal with mean
µ_x̄ = 2.6 and standard deviation
σ_x̄ = σ/√n = 0.7/√10 = 0.221.
10-10
Then
Pr(x̄ > 3.1) = 1 − Pr(x̄ ≤ 3.1), where
Pr(x̄ ≤ 3.1) = Pr((x̄ − 2.6)/0.221 < (3.1 − 2.6)/0.221)
= Pr(Z < 2.26) = 0.9881
So, Pr(x̄ > 3.1) = 1 − 0.9881 = 0.0119
Hence, this event is very unlikely if the two subspecies
separated recently. Should your sample actually yield
a mean of 3.1 mm or more, it would be strong evidence
against the hypothesis that they split recently.
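The tail probability in this example can be checked numerically. A minimal sketch, assuming antennal length is Normal and using the values from the calculation above (mean 2.6 mm, σ = 0.7 mm, n = 10); no rounding of z is done, so the result may differ from the table value in the last digit.

```python
from math import sqrt
from statistics import NormalDist

mu = 2.6     # mean used in the calculation above (mm)
sigma = 0.7  # population standard deviation (mm)
n = 10       # number of cave animals sampled

sigma_xbar = sigma / sqrt(n)
z = (3.1 - mu) / sigma_xbar

# Upper-tail probability Pr(x-bar > 3.1) under the Normal model.
p_upper = 1 - NormalDist(mu, sigma_xbar).cdf(3.1)

print(round(sigma_xbar, 3), round(z, 2), round(p_upper, 4))
```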
10-11
2) SAMPLING DISTRIBUTION of the Sample
Proportion p
If we want to estimate what proportion of the
population (π) are in the category we have defined as
a success, we take a random sample from that
population and calculate the sample proportion in that
category (p).
The shape of the sampling distribution for p depends
very heavily on the sample size n and the population
proportion π.
EXAMPLE Suppose we had repeatedly tossed n=5
dice where π = 0.5 for Pr(1). The frequency
distribution for the sample proportion is:
[Figure: histogram of the repeated tosses with the expected Normal curve overlaid; x-axis: upper boundaries (x <= boundary), from −1 to 5; y-axis: no. of obs, from 0 to 800]
10-12
The mean of this sampling distribution is 0.5 and the
standard deviation is 0.2236.
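These two values can be recovered exactly rather than by repeated tossing. The sketch below assumes the five tosses are independent, each a success with probability π = 0.5, so that the number of successes follows a binomial distribution; the variable names are illustrative.

```python
from math import comb, sqrt

n, pi = 5, 0.5

# Exact probability of each possible sample proportion k/n, k = 0..n.
probs = {k / n: comb(n, k) * pi**k * (1 - pi)**(n - k) for k in range(n + 1)}

mu_p = sum(p * pr for p, pr in probs.items())
sigma_p = sqrt(sum((p - mu_p) ** 2 * pr for p, pr in probs.items()))

for p, pr in probs.items():
    print(p, round(pr, 5))
print(round(mu_p, 4), round(sigma_p, 4))
```

With only six possible values of p (0, 0.2, ..., 1), the distribution is symmetric but still far from a smooth bell curve.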
Important Points: For any given sample size, the
closer π, the population proportion, is to 1/2,
A) the more symmetric the shape of the frequency
distribution of the sample proportion p
B) the larger the variability of values of p
Important Points: For any given value of π, the
population proportion, a larger sample size from that
population has
A) a more symmetric shape for the frequency
distribution of the sample proportion p
B) a smaller variability in the values of p
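Both sets of points can be seen in numbers. This sketch uses the standard deviation of a sample proportion, sqrt(π(1 − π)/n), which is stated in the summary that follows, together with the standard binomial skewness (1 − 2π)/sqrt(nπ(1 − π)) as a rough measure of asymmetry; the skewness formula and the grid of n and π values are additions for illustration, not part of the notes.

```python
from math import sqrt

def sigma_p(pi, n):
    """Standard deviation of the sample proportion: sqrt(pi * (1 - pi) / n)."""
    return sqrt(pi * (1 - pi) / n)

def skewness_p(pi, n):
    """Skewness of the sample proportion (same as the binomial skewness)."""
    return (1 - 2 * pi) / sqrt(n * pi * (1 - pi))

for n in (5, 20, 80):
    for pi in (0.1, 0.3, 0.5):
        print(n, pi, round(sigma_p(pi, n), 4), round(skewness_p(pi, n), 3))
```

Reading down the output for a fixed n, the spread grows and the skewness shrinks as π approaches 1/2; for a fixed π, both shrink as n grows.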
Let’s put what we’ve learned about sample
proportions into one statement:
10-13
SAMPLING DISTRIBUTION of p
Suppose we have a population with a binary variable.
The proportion of successes in the population is π
and we take a random sample of n.
As long as the sample is random so that each sampled
unit is independent of any other sampled unit, the
frequency distribution of the sample proportion has
the following characteristics:
1. µ_p = π
2. σ_p = √(π(1 − π)/n)
3. (CENTRAL LIMIT THEOREM) The shape of
the distribution is approximately normal when n is
large and π is not too close to 0 or 1.
The further π is from 1/2, the larger n has to be in
order for the shape to be a bell-curve. A rule-of-thumb
is that the CLT holds if both
nπ ≥ 10 and n(1 − π) ≥ 10.
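The rule-of-thumb is easy to wrap in a small check. A minimal sketch; the function name `normal_approx_ok` and the example values of n and π (other than n = 5, π = 0.5 from the earlier example) are hypothetical.

```python
def normal_approx_ok(n, pi, threshold=10):
    """Rule of thumb: both n*pi and n*(1 - pi) must be at least `threshold`."""
    return n * pi >= threshold and n * (1 - pi) >= threshold

# n = 5, pi = 0.5 is the earlier tossing example; the others are made up.
for n, pi in [(5, 0.5), (100, 0.5), (100, 0.05)]:
    print(n, pi, normal_approx_ok(n, pi))
```

The last case shows the point about π far from 1/2: a sample of 100 passes when π = 0.5 but not when π = 0.05.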
10-14
EXAMPLE Suppose that the rate of a
specific form of birth defect was 1 in 1000 live births
around the early 1900s. A researcher claims that
better hygiene and health care have decreased the rate
to something much smaller (say 1 in 10,000 now). To
test this hypothesis the scientist collects birth records
at random for 25,000 children born in 1999. There
were 17 children with the birth defect. What is the
probability of observing so few defects or even fewer
if the 1 in 1000 rate is still true?
If π = 1/1000 is true then the mean proportion of
successes in random samples of 25,000 is
µ_p = π = 0.001 and the standard deviation for a sample
proportion is
σ_p = √(π(1 − π)/n) = √(0.001(0.999)/25000) = 0.0002.
A random sample of 25,000 is sufficiently large for normality
but let’s check to make sure:
nπ = 25000(0.001) = 25 and of course
n(1 − π) = 25000(0.999) = 24975. Both are bigger than 10
so we can proceed.
Pr(p ≤ 17/25000) = Pr(Z ≤ (0.00068 − 0.001)/0.0002)
= Pr(Z ≤ −1.60) = 0.0548
There is evidence to suggest that the rate has gone
down but it isn’t very strong.
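The whole calculation for this example fits in a few lines. This sketch follows the Normal approximation used above with π = 1/1000 and n = 25,000; it keeps σ_p unrounded, so the probability may differ from the table value in the last digit.

```python
from math import sqrt
from statistics import NormalDist

pi0 = 1 / 1000  # rate assumed still true
n = 25_000      # birth records sampled
observed = 17   # children with the defect

# Rule-of-thumb check before using the Normal approximation.
assert n * pi0 >= 10 and n * (1 - pi0) >= 10

p_hat = observed / n
sigma_p = sqrt(pi0 * (1 - pi0) / n)
z = (p_hat - pi0) / sigma_p

# Lower-tail probability Pr(p <= 17/25000) under the Normal model.
p_lower = NormalDist().cdf(z)

print(round(p_hat, 5), round(sigma_p, 6), round(z, 2), round(p_lower, 4))
```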