Chapter 8
Sampling Variability & Sampling Distributions
8.1: Basic Terms
Any quantity computed from values in a sample is
called a statistic.
The observed value of a statistic depends on the
particular sample selected from the population;
typically, it varies from sample to sample. This
variability is called sampling variability.
Sampling Distribution
The distribution of a statistic is called its
sampling distribution.
So you could have a sampling distribution
of a mean, median, max, min, etc.
Example
Consider a population that consists of the numbers 1, 2, 3,
4 and 5, generated in such a manner that the probability of each
of those values is 0.2 no matter what the previous
selections were. This population could be described as the
outcome associated with a spinner that has five equally likely outcomes.
[Figure: spinner]
Its distribution is:

x      1    2    3    4    5
p(x)   0.2  0.2  0.2  0.2  0.2
Example
If the sampling distribution for the means of
samples of size two is analyzed, it looks like
Sample   Mean      Sample   Mean
1, 1     1         3, 4     3.5
1, 2     1.5       3, 5     4
1, 3     2         4, 1     2.5
1, 4     2.5       4, 2     3
1, 5     3         4, 3     3.5
2, 1     1.5       4, 4     4
2, 2     2         4, 5     4.5
2, 3     2.5       5, 1     3
2, 4     3         5, 2     3.5
2, 5     3.5       5, 3     4
3, 1     2         5, 4     4.5
3, 2     2.5       5, 5     5
3, 3     3

Mean     frequency   p(mean)
1        1           0.04
1.5      2           0.08
2        3           0.12
2.5      4           0.16
3        5           0.20
3.5      4           0.16
4        3           0.12
4.5      2           0.08
5        1           0.04
Total    25
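The enumeration of all 25 equally likely samples of size two can be reproduced with a short script. This is a minimal sketch, assuming sampling with replacement from the spinner population; the function name `sampling_distribution_of_mean` is illustrative and not part of the original slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sampling_distribution_of_mean(values, n):
    """Exact sampling distribution of the mean for samples of size n,
    drawn with replacement from equally likely population values."""
    # Enumerate every ordered sample and tally its mean as an exact fraction.
    counts = Counter(Fraction(sum(s), n) for s in product(values, repeat=n))
    total = len(values) ** n
    return {mean: Fraction(c, total) for mean, c in sorted(counts.items())}

dist = sampling_distribution_of_mean([1, 2, 3, 4, 5], 2)
for mean, prob in dist.items():
    print(f"{float(mean):>4}  {float(prob):.2f}")
```

Calling the same function with n = 3 or n = 4 gives the exact distributions for the larger sample sizes.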
Example
The original distribution and the sampling
distribution of means of samples with n=2
are given below.
[Figure: probability histograms of the original distribution and of the sampling distribution of the mean for n = 2, each over the values 1 to 5]
Example
Sampling distributions for n=3 and n=4 were
calculated and are illustrated below.
[Figure: sampling distributions of the mean for n = 3 and n = 4, each over the values 1 to 5]
Simulations
To illustrate the general behavior of samples of fixed size n,
10000 samples each of size 30, 60 and 120 were generated from this
uniform distribution and the means calculated. Probability histograms
were created for each of these (simulated) sampling distributions.
Notice that all three appear to be approximately normally distributed.
Further, note that the variability decreases as the sample size
increases.
[Figure: probability histograms of the simulated means for n = 30, n = 60 and n = 120, each with a horizontal axis from 2 to 4]
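A simulation of this kind can be sketched in a few lines. The sketch assumes the uniform population is the spinner population 1 to 5 from the earlier example; the seed and the names `POPULATION`, `simulate_means` and `results` are illustrative choices, not taken from the slides.

```python
import random
import statistics

random.seed(8)  # arbitrary seed, used only so the run is reproducible

POPULATION = [1, 2, 3, 4, 5]  # assumed: each value has probability 0.2
NUM_SAMPLES = 10_000

def simulate_means(n, num_samples=NUM_SAMPLES):
    """Draw num_samples samples of size n (with replacement); return their means."""
    return [statistics.fmean(random.choices(POPULATION, k=n))
            for _ in range(num_samples)]

results = {}
for n in (30, 60, 120):
    means = simulate_means(n)
    # Store the center and the spread of the simulated sampling distribution.
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n={n:>3}  mean of means={results[n][0]:.3f}  sd of means={results[n][1]:.3f}")
```

The printed standard deviations should shrink as n grows, which is the decrease in variability noted above.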
Simulations
To further illustrate the general behavior of
samples of fixed size n, 10000 samples each of
size 4, 16 and 32 were generated from the
positively skewed distribution pictured below.
[Figure: positively skewed population distribution]
Notice that these sampling distributions are all
skewed, but as n increases, the sampling distributions
become more symmetric and eventually appear to
be almost normally distributed.
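The shrinking skewness can be checked numerically. The slides do not say which skewed population was used, so this sketch assumes an exponential population with mean 1 purely for illustration; the `skewness` helper and the seed are also assumptions.

```python
import random
import statistics

random.seed(8)  # arbitrary seed, used only so the run is reproducible

def skewness(data):
    """Sample skewness: the average cubed standardized deviation."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return statistics.fmean([((x - m) / s) ** 3 for x in data])

def simulate_means(n, num_samples=10_000):
    """Means of num_samples samples of size n from an (assumed) exponential population."""
    return [statistics.fmean([random.expovariate(1.0) for _ in range(n)])
            for _ in range(num_samples)]

skew = {n: skewness(simulate_means(n)) for n in (4, 16, 32)}
for n, value in skew.items():
    print(f"n={n:>2}  skewness of sample means={value:.2f}")
```

A skewness near 0 indicates a symmetric distribution, so values that fall toward 0 as n increases match the pattern described above.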
8.2: Terminology
Let $\bar{x}$ denote the mean of the observations
in a random sample of size n from a
population having mean μ and standard
deviation σ. Denote the mean value of the
$\bar{x}$ distribution by $\mu_{\bar{x}}$ and the standard deviation
of the $\bar{x}$ distribution by $\sigma_{\bar{x}}$ (called the standard
error of the mean). Then the rules on the
next two slides hold.
Properties of the Sampling
Distribution of the Sample Mean.
Rule 1: $\mu_{\bar{x}} = \mu$
Rule 2: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
This rule is approximately correct as
long as no more than 10% of the
population is included in the sample.
Rule 3: When the population distribution is
normal, the sampling distribution of $\bar{x}$
is also normal for any sample size n.
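Rules 1 and 2 can be verified exactly on a small population by listing every possible sample. This is a sketch under stated assumptions: it uses the spinner population 1 to 5 and sampling with replacement (so Rule 2 holds exactly rather than approximately), and the helper name `mean_and_sd_of_sample_mean` is illustrative.

```python
from itertools import product
from math import isclose, sqrt

values = [1, 2, 3, 4, 5]  # assumed population: the spinner example
mu = sum(values) / len(values)
sigma = sqrt(sum((v - mu) ** 2 for v in values) / len(values))

def mean_and_sd_of_sample_mean(n):
    """Mean and standard deviation of the sample mean over all ordered samples of size n."""
    means = [sum(s) / n for s in product(values, repeat=n)]
    m = sum(means) / len(means)
    sd = sqrt(sum((x - m) ** 2 for x in means) / len(means))
    return m, sd

for n in (2, 3, 4):
    m, sd = mean_and_sd_of_sample_mean(n)
    # Compare against Rule 1 (mu) and Rule 2 (sigma / sqrt(n)).
    print(f"n={n}  mean={m:.4f} (mu={mu})  sd={sd:.4f} (sigma/sqrt(n)={sigma / sqrt(n):.4f})")
```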
Central Limit Theorem.
Rule 4: When n is sufficiently large, the
sampling distribution of $\bar{x}$ is
approximately normally
distributed, even when the
population distribution is not
itself normal.
Illustrations of Sampling
Distributions
[Figure: a symmetric, normal-like population and the sampling distributions of the mean for n = 4, n = 9 and n = 16]
Illustrations of Sampling
Distributions
[Figure: a skewed population and the sampling distributions of the mean for n = 4, n = 10 and n = 30]
More about the Central Limit
Theorem.
The Central Limit Theorem can safely
be applied when n exceeds 30.
If n is large or the population distribution
is normal, the standardized variable
$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
has (approximately) a standard normal
(z) distribution.
Example
A food company sells “18 ounce” boxes
of cereal. Let x denote the actual amount
of cereal in a box of cereal. Suppose that
x is normally distributed with µ = 18.03
ounces and σ = 0.05.
a) What proportion of the boxes will
contain less than 18 ounces?
$$P(x < 18) = P\left(z < \frac{18 - 18.03}{0.05}\right) = P(z < -0.60) = 0.2743$$
Example - continued
b) A case consists of 24 boxes of cereal.
What is the probability that the mean
amount of cereal (per box in a case)
is less than 18 ounces?
Because x is normally distributed, the sampling
distribution of $\bar{x}$ is also normal (Rule 3), so
$$P(\bar{x} < 18) = P\left(z < \frac{18 - 18.03}{0.05/\sqrt{24}}\right) = P(z < -2.94) = 0.0016$$
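Both parts of the cereal example can be checked with Python's standard library. This is a minimal sketch; the variable names are illustrative, and the results differ slightly from table values because z is not rounded to two decimals here.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 18.03, 0.05  # population mean and standard deviation (ounces)

# a) A single box: x is normal with mean mu and standard deviation sigma.
p_box = NormalDist(mu, sigma).cdf(18)

# b) Mean of a case of n = 24 boxes: normal with standard error sigma / sqrt(n).
n = 24
p_case = NormalDist(mu, sigma / sqrt(n)).cdf(18)

print(f"P(x < 18)      = {p_box:.4f}")
print(f"P(x-bar < 18)  = {p_case:.4f}")
```

A single box falls below 18 ounces fairly often, but the mean of 24 boxes almost never does, because the standard error is much smaller than σ.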
8.3: Some proportion
distributions where π = 0.2
Let p be the proportion of successes in a
random sample of size n from a population
whose proportion of S’s (successes) is π.
[Figure: sampling distributions of p for n = 10, n = 20, n = 50 and n = 100, each centered at 0.2]
Properties of the Sampling
Distribution of p
Let p be the proportion of successes in a
random sample of size n from a population
whose proportion of S’s (successes) is π.
Denote the mean of p by $\mu_p$ and the
standard deviation by $\sigma_p$ (which is the
standard error of the proportion). Then the
following rules hold.
Properties of the Sampling
Distribution of p
Rule 1: $\mu_p = \pi$
Rule 2: $\sigma_p = \sqrt{\dfrac{\pi(1 - \pi)}{n}}$
Rule 3: When n is large and π is not too near
0 or 1, the sampling distribution of p is
approximately normal.
And now we can use these to calculate a z score.
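Rules 1 and 2 for p can be confirmed from the exact distribution of the number of successes. The sketch assumes that count is binomial (independent trials, or a population much larger than the sample); the function name `mean_and_sd_of_p` is illustrative.

```python
from math import comb, isclose, sqrt

def mean_and_sd_of_p(n, pi):
    """Exact mean and standard deviation of the sample proportion p = k / n,
    assuming the success count k is binomial(n, pi)."""
    pmf = [comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(n + 1)]
    mean = sum((k / n) * w for k, w in enumerate(pmf))
    var = sum(((k / n) - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, sqrt(var)

pi = 0.2
for n in (10, 20, 50, 100):
    mean, sd = mean_and_sd_of_p(n, pi)
    # Compare against Rule 1 (pi) and Rule 2 (sqrt(pi * (1 - pi) / n)).
    print(f"n={n:>3}  mean={mean:.4f}  sd={sd:.4f}  rule 2 gives {sqrt(pi * (1 - pi) / n):.4f}")
```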
Condition for Use
The further the value of π is from 0.5, the larger n
must be for the normal approximation to the
sampling distribution of p to be accurate.
Rule of Thumb
If both nπ ≥ 10 and n(1 − π) ≥ 10, then it is safe to
use a normal approximation.
Or put another way, we need at least 10 expected successes and
at least 10 expected failures to say it's approximately normal.
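The rule of thumb is easy to express as a check. This is a small sketch; the function name `normal_approx_ok` and the `threshold` parameter are illustrative, not from the slides.

```python
def normal_approx_ok(n, pi, threshold=10):
    """Rule of thumb: expected successes and expected failures both reach the threshold."""
    return n * pi >= threshold and n * (1 - pi) >= threshold

# With pi = 0.2, small samples fail the check and larger ones pass.
for n in (10, 20, 100):
    print(n, normal_approx_ok(n, 0.2))
```

For example, `normal_approx_ok(400, 0.08)` is true because the expected counts are 32 and 368.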
Example
If the true proportion of defectives
produced by a certain manufacturing
process is 0.08 and a sample of 400 is
chosen, what is the probability that the
proportion of defectives in the sample is
greater than 0.10?
Since nπ = 400(0.08) = 32 > 10 and
n(1-π) = 400(0.92) = 368 > 10,
it’s reasonable to use the normal
approximation.
Example
(continued)
$$\mu_p = \pi = 0.08$$
$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.08(1 - 0.08)}{400}} = 0.013565$$
$$z = \frac{p - \mu_p}{\sigma_p} = \frac{0.10 - 0.08}{0.013565} = 1.47$$
$$P(p > 0.1) = P(z > 1.47) = 1 - 0.9292 = 0.0708$$
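The same calculation can be scripted with the standard library. This is a minimal sketch; the function name `prob_p_greater` is illustrative, and the probability differs slightly from the table value because z is not rounded to 1.47.

```python
from math import sqrt
from statistics import NormalDist

def prob_p_greater(threshold, pi, n):
    """Normal approximation to P(p > threshold) for a sample proportion p."""
    se = sqrt(pi * (1 - pi) / n)   # standard error of the proportion
    z = (threshold - pi) / se      # standardized threshold
    return z, 1 - NormalDist().cdf(z)

z, prob = prob_p_greater(0.10, 0.08, 400)
print(f"z = {z:.2f}, P(p > 0.10) = {prob:.4f}")
```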
Example
Suppose 3% of the people contacted by phone
are receptive to a certain sales pitch and buy
your product. If your sales staff contacts 2000
people, what is the probability that more than
100 of the people contacted will purchase your
product?
Clearly π = 0.03 and p = 100/2000 = 0.05, so
$$P(p > 0.05) = P\left(z > \frac{0.05 - 0.03}{\sqrt{\dfrac{(0.03)(0.97)}{2000}}}\right) = P\left(z > \frac{0.05 - 0.03}{0.0038145}\right) = P(z > 5.24) \approx 0$$
Example - continued
If your sales staff contacts 2000 people, what
is the probability that fewer than 50 of the
people contacted will purchase your product?
Now π = 0.03 and p = 50/2000 = 0.025, so
$$P(p < 0.025) = P\left(z < \frac{0.025 - 0.03}{\sqrt{\dfrac{(0.03)(0.97)}{2000}}}\right) = P\left(z < \frac{0.025 - 0.03}{0.0038145}\right) = P(z < -1.31) = 0.0951$$
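Both sales questions start from a count of buyers, so a sketch that converts counts to proportions may be convenient. The helper name `z_for_count` is illustrative; the lower-tail result differs slightly from the table value because z is not rounded to -1.31.

```python
from math import sqrt
from statistics import NormalDist

def z_for_count(count, n, pi):
    """z score of the sample proportion count / n under the normal approximation."""
    se = sqrt(pi * (1 - pi) / n)
    return (count / n - pi) / se

n, pi = 2000, 0.03
std_normal = NormalDist()

# More than 100 buyers: upper tail beyond p = 0.05.
p_more_than_100 = 1 - std_normal.cdf(z_for_count(100, n, pi))
# Fewer than 50 buyers: lower tail below p = 0.025.
p_fewer_than_50 = std_normal.cdf(z_for_count(50, n, pi))

print(f"P(more than 100 buyers) = {p_more_than_100:.2e}")
print(f"P(fewer than 50 buyers) = {p_fewer_than_50:.4f}")
```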