Sampling Distributions Central Limit Theorem - RIT

Sampling Distributions
Central Limit Theorem

Sampling Distribution

Distribution of a statistic

Recall
• Population: Collection of all data on a characteristic
• Parameter: Measure of a population
    e.g. mean, s.d., proportion, median, etc., of a population
    notation: µ, σ, π, etc.
• Sample: Any subset of a population
• Statistic: Measure of a sample
    e.g. mean, s.d., proportion, median of sample data
    notation: $\bar{x}$, s, p
clt

Objective:
• Learn properties of statistics (sample measures)
• In particular, properties of the sample mean and the sample proportion
    What are the average and s.d. of the sample mean?
    What is the distribution of the sample mean $\bar{x}$?
    What are the average and s.d. of the sample proportion?
    What is the distribution of the sample proportion, p?

POPULATION (X)
MEAN: µ
S.D.: σ

Choose random, independent samples, each of size n.
• Sample 1: x11, x12, x13, ..., x1n    Sample mean = $\bar{x}_1$
• Sample 2: x21, x22, x23, ..., x2n    Sample mean = $\bar{x}_2$
• Sample 3: x31, x32, x33, ..., x3n    Sample mean = $\bar{x}_3$, etc.

• So, we could have many sample means.
• The sample mean is a random variable (a statistic).
• What can we say about this?
    1. On average, the sample means must equal the population mean.
    2. The sample means are closer together than the population values themselves.

Meaning:
• Average of sample means = population mean
    $\mu_{\bar{x}} = \mu$
• Variance of the sample means is smaller than the variance of the population ($\sigma^2$)
• How much smaller?
    Variance of sample means = population variance / sample size
    $\sigma_{\bar{x}}^2 = \dfrac{\sigma^2}{n}$; so, $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
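
Where the two identities come from: a short derivation, assuming the n observations $x_1, \dots, x_n$ in a sample are independent draws from a population with mean $\mu$ and variance $\sigma^2$ (the index i and the operators E and Var are notation introduced here, not taken from the slides).

```latex
\begin{align*}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
\mu_{\bar{x}} = E(\bar{x}) &= \frac{1}{n}\sum_{i=1}^{n} E(x_i) = \frac{1}{n}\, n\mu = \mu \\
\sigma_{\bar{x}}^2 = \operatorname{Var}(\bar{x}) &= \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(x_i) = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n} \\
\sigma_{\bar{x}} &= \frac{\sigma}{\sqrt{n}}
\end{align*}
```

Independence is what lets the variance of the sum split into a sum of variances in the third line.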

Important Observations
• There are three "averages" we are talking about:
    1. Population mean = µ ← This is a constant
    2. Sample mean = $\bar{x}$ ← This is a random variable
    3. Mean of sample means = $\mu_{\bar{x}}$ ← A constant
• Similarly, there are two variances:
    1. Variance of the population = $\sigma^2$ ← A constant
    2. Variance of sample means = $\sigma_{\bar{x}}^2$ ← A constant
• You need to use the right notation and interpret each quantity appropriately.

• So, we have the important relationship:
    $\mu_{\bar{x}} = \mu$;  $\sigma_{\bar{x}}^2 = \dfrac{\sigma^2}{n}$
• Note that the bigger the sample size, the less variable the values of the sample means are.
• The above are true for means of samples from any population and for samples of any size.
• What about the distribution of the sample means, $\bar{x}$? Recall that $\bar{x}$ is a r.v.
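
A quick numerical check of the mean and variance relationship, as a minimal sketch rather than part of the original slides. It assumes a Uniform(0, 1) population (mean 1/2, variance 1/12) and a deliberately small sample size of 5; the population, the seed, and the function name `simulate_sample_means` are all illustrative choices.

```python
import random
import statistics

# Hypothetical population for illustration: Uniform(0, 1),
# which has mean 1/2 and variance 1/12.
POP_MEAN = 0.5
POP_VAR = 1.0 / 12.0


def simulate_sample_means(n, reps, seed=0):
    """Return `reps` sample means, each from an independent sample of size n."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.random() for _ in range(n)]) for _ in range(reps)]


n = 5  # small on purpose: the mean and variance relations do not need large n
means = simulate_sample_means(n, reps=20_000)

mean_of_means = statistics.fmean(means)
var_of_means = statistics.pvariance(means)

print(f"mean of sample means     = {mean_of_means:.4f} (theory {POP_MEAN:.4f})")
print(f"variance of sample means = {var_of_means:.5f} (theory {POP_VAR / n:.5f})")
```

The simulated values should land close to µ and σ²/n even though n is only 5; only the shape of the distribution of the sample mean depends on n being large.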

• If population is normal, then
    $\bar{x}$ ~ Normal
• If population is not normal or unknown, then
    $\bar{x}$ ≈ Normal (approximately normal) for large samples
• How large?
    Sample size at least 30.
• CENTRAL LIMIT THEOREM
    Applies when population is non-normal or when the population distribution is unknown.
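
One way to see the "approximately normal for n of at least 30" claim in action is to check how often a sample mean falls within 1.96 standard errors of µ; under a normal distribution that happens about 95% of the time. The sketch below assumes a skewed Exponential population with rate 1 (so µ = σ = 1); the population, the seed, and the name `fraction_within` are illustrative assumptions, not from the slides.

```python
import math
import random
import statistics


def fraction_within(n, reps, z=1.96, seed=1):
    """Fraction of sample means within z standard errors of the population mean.

    Population (an assumption for illustration): Exponential with rate 1,
    so the population mean and s.d. are both 1.
    """
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        if abs(xbar - mu) <= half_width:
            hits += 1
    return hits / reps


coverage = fraction_within(n=30, reps=5_000)
print(f"fraction within 1.96 standard errors: {coverage:.3f} (normal theory: 0.950)")
```

Rerunning with a much smaller n (say 2 or 3) is a simple way to see how a skewed population shows through when the sample is small.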

Central Limit Theorem

The random variable X is the number of units of a component sold by a company. The probability distribution of X is given as:

Units sold     Probability
 1000          0.1
 3000          0.3
 5000          0.4
10000          0.2

We can compute the mean µ = 5000 units and the standard deviation σ = 2792.8 units.

Suppose we look at a sample of 16 sales and do this, say, 100 times. We can easily mimic (simulate) this experiment in Minitab.
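
The mean and standard deviation quoted for this distribution can be checked directly from the table. A minimal sketch in Python (the slides themselves use Minitab; the variable names here are illustrative):

```python
import math

units = [1000, 3000, 5000, 10000]
probs = [0.1, 0.3, 0.4, 0.2]

# Mean of a discrete distribution: sum of value * probability.
mu = sum(x * p for x, p in zip(units, probs))

# Variance: probability-weighted squared deviation from the mean.
variance = sum((x - mu) ** 2 * p for x, p in zip(units, probs))
sigma = math.sqrt(variance)

print(f"mu = {mu:.1f}, sigma = {sigma:.1f}")  # prints: mu = 5000.0, sigma = 2792.8
```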

• First we have to enter the probability distribution in our worksheet.
    Column 1 has the values 1000, 3000, 5000, 10000
    Column 2 has the probabilities
• Now we tell Minitab to generate 100 samples, each of size 16, from our discrete distribution.
• Take a look at some of the samples…

Row           C3     C4     C5     C6     C7     C8     C9     C10
1 Sample 1    10000  5000   5000   3000   3000   5000   1000   10000  …
2 Sample 2    3000   1000   3000   3000   3000   10000  5000   10000  …
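
For readers without Minitab, the same sampling step can be mimicked with Python's standard library. This is a sketch of an equivalent experiment, not the Minitab procedure itself; the seed and the function name `draw_samples` are illustrative, so the values drawn will differ from the rows shown in the worksheet.

```python
import random

units = [1000, 3000, 5000, 10000]   # "Column 1": the possible values
probs = [0.1, 0.3, 0.4, 0.2]        # "Column 2": their probabilities


def draw_samples(num_samples=100, sample_size=16, seed=2):
    """Draw independent samples from the discrete distribution above."""
    rng = random.Random(seed)
    return [rng.choices(units, weights=probs, k=sample_size)
            for _ in range(num_samples)]


samples = draw_samples()

# Show the first eight values of the first two samples, like the worksheet rows.
for i, row in enumerate(samples[:2], start=1):
    print(f"Sample {i}:", *row[:8], "...")
```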

• Now, compute the means of each of the 100 samples and store them in column C19.
• Look at the histogram of sample means in C19 and compare it with the histogram of the "population distribution".

[Figure: histogram of the population distribution (prob vs. units, 1000 to 10000) beside a histogram of the 100 sample means for n = 16 (Frequency vs. sam16, 3500 to 7000)]
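
The "compute the means, then look at the histogram" step can also be sketched in Python. The block below redraws 100 samples of size 16 so that it runs on its own, computes one mean per sample, and prints a crude text histogram; the seed and the bin width of 500 are arbitrary illustrative choices.

```python
import random
import statistics

units = [1000, 3000, 5000, 10000]
probs = [0.1, 0.3, 0.4, 0.2]

rng = random.Random(3)
samples = [rng.choices(units, weights=probs, k=16) for _ in range(100)]

# One mean per sample (the analogue of column C19).
sample_means = [statistics.fmean(row) for row in samples]

# Crude text histogram: count the sample means falling in bins of width 500.
bin_width = 500
counts = {}
for m in sample_means:
    left = int(m // bin_width) * bin_width
    counts[left] = counts.get(left, 0) + 1

for left in sorted(counts):
    print(f"{left:>5}-{left + bin_width:<5} {'*' * counts[left]}")
```

The population takes only four values, while the sample means spread over many values clustered around 5000.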

• Let us repeat the process by taking samples of size 64 and 81. Below you see the histograms of the sample averages for all three cases.

[Figure: histogram of sample means, n = 16 (Frequency vs. sam16, axis 3500 to 7000)]
[Figure: histogram of sample means, n = 64 (Frequency vs. sam64, axis 4250 to 6000)]
[Figure: histogram of sample means, n = 81 (Frequency vs. sam81, axis 4400 to 5800)]
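
The narrowing of the three histograms can be put in numbers by comparing the observed s.d. of the sample means with σ/√n for each sample size. A minimal sketch, assuming 100 simulated samples per size as in the slides; the seed and the name `sd_of_sample_means` are illustrative.

```python
import math
import random
import statistics

units = [1000, 3000, 5000, 10000]
probs = [0.1, 0.3, 0.4, 0.2]

# Population mean and s.d. computed from the distribution (5000 and about 2792.8).
MU = sum(x * p for x, p in zip(units, probs))
SIGMA = math.sqrt(sum((x - MU) ** 2 * p for x, p in zip(units, probs)))


def sd_of_sample_means(n, reps=100, seed=4):
    """Observed s.d. of `reps` sample means, each from a sample of size n."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.choices(units, weights=probs, k=n))
             for _ in range(reps)]
    return statistics.stdev(means)


results = {n: sd_of_sample_means(n) for n in (16, 64, 81)}
for n, sd in results.items():
    theory = SIGMA / math.sqrt(n)
    print(f"n = {n:>2}: observed s.d. = {sd:7.1f}, sigma/sqrt(n) = {theory:7.1f}")
```

With only 100 sample means per size, the observed values will wobble around the theoretical ones rather than match them exactly.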

Note that:
• In all three examples, the middle value (the mean of the sample means) is about 5000, the mean of the population.
    This is what the property $\mu_{\bar{x}} = \mu$ says.
• As the sample size increases, the values of the sample mean get closer together.
    This is what the property $\sigma_{\bar{x}} = \sigma / \sqrt{n}$ says.
• The distribution gets more symmetrical. That is, as the sample size gets larger, the distribution of the sample averages looks more like a normal distribution.
    This is what the CLT says.

Central Limit Theorem for Proportions
• There is nothing special about averages

Binomial Population
    Success / Failure
    π = P(Success)
    π = Proportion of successes

Sample of size n: x1, x2, ..., xn
    Success / Failure

Examples
• Shipment of components
    Each component is either good or defective
    π = Proportion of good items in the population (shipment)
      process dependent
      may or may not be known
• Random sample of 250 components
    Each component is either good or bad
    p = proportion of good components in the sample
    Changes from one sample to another sample
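
To see p change from one sample to another, the sketch below draws five random samples of 250 components and prints the proportion of good ones in each. The population proportion π = 0.9 is a made-up value for illustration (the slides do not give one), and treating each component as an independent draw is also an assumption; `sample_proportion` is an illustrative name.

```python
import random

PI_GOOD = 0.9   # hypothetical population proportion of good components (not from the slides)
N = 250         # sample size from the example


def sample_proportion(rng, n=N, pi=PI_GOOD):
    """Proportion of good components in one random sample of size n."""
    good = sum(1 for _ in range(n) if rng.random() < pi)
    return good / n


rng = random.Random(5)
proportions = [sample_proportion(rng) for _ in range(5)]
for i, p in enumerate(proportions, start=1):
    print(f"sample {i}: p = {p:.3f}")
```

π stays fixed at 0.9 throughout; only the sample proportion p moves.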

• Population:
    Favorable or unfavorable opinion about a candidate, say in our county.
    π = proportion of voters who favor the candidate
• Random sample of 1057 voters
    p = proportion (fraction) of voters in the sample who favor the candidate
    p is random: its value will change from one sample to another

• p = proportion of successes in the sample
• Sample proportion
• Statistic; Random
• What are the properties of the statistic p?
• (1) Average of sample proportion is the population proportion: $\mu_p = \pi$
• (2) S.d. of sample proportion p: $\sigma_p = \sqrt{\dfrac{\pi(1-\pi)}{n}}$
• CLT => p ≈ Normal
• Combine with (1) and (2) to get
    $p \approx N\!\left(\pi,\ \sqrt{\dfrac{\pi(1-\pi)}{n}}\right)$ for large samples.
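
As an illustration of how the normal approximation for p gets used, the sketch below takes the voter example's sample size n = 1057 together with a hypothetical π = 0.5 (the slides do not state a value) and estimates the chance that p lands within 0.03 of π. The 0.03 margin and the function name `proportion_sd` are also illustrative choices.

```python
import math
from statistics import NormalDist


def proportion_sd(pi, n):
    """S.d. of the sample proportion: sqrt(pi * (1 - pi) / n)."""
    return math.sqrt(pi * (1 - pi) / n)


pi = 0.5    # hypothetical population proportion (an assumption, not from the slides)
n = 1057    # sample size from the voter example

sigma_p = proportion_sd(pi, n)

# Normal approximation to the distribution of p: mean pi, s.d. sigma_p.
approx = NormalDist(mu=pi, sigma=sigma_p)
prob_within = approx.cdf(pi + 0.03) - approx.cdf(pi - 0.03)

print(f"sigma_p = {sigma_p:.4f}")
print(f"P(|p - pi| <= 0.03) is approximately {prob_within:.3f}")
```

Keeping π (population) and p (sample) as separate names in the code mirrors the advice to keep population information and sample information apart.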

Things to remember...
• Use appropriate notation to identify the given information
• Define the random variables
• Write the question in terms of the r.v.
• Keep the information about the population and the information about the sample separate.
• Connect your steps.