Download Statistics Sampling Distribution Note

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Statistics wikipedia , lookup

Transcript
9/30/2015
Statistics
• A statistic is any quantity whose value
can be calculated from sample data.
CH5: Statistics and their
distributions
MATH/STAT360
CH5
• A statistic can be thought of as a random
variable.
1
MATH/STAT360
Sampling Distribution
CH5
Note
• Any statistic, being a random variable, has
a probability distribution.
• The probability distribution of a statistic is
sometimes referred to as its sampling
distribution.
• In this group of notes we will look at
examples where we know the population
and it’s parameters.
• This is to give us insight into how to
proceed when we have large populations
with unknown parameters (which is the
more typical scenario).
MATH/STAT360
MATH/STAT360
CH5
2
3
CH5
4
9/30/2015
The “Meta-Experiment”
Sample Statistics
• The “Meta-Experiment” consists of indefinitely
many repetitions of the same experiment.
• If the experiment is taking a sample of 100 items
from a population, the meta-experiment is to
repeatedly take samples of 100 items from the
population.
• This is a theoretical construct to help us
understand the probabilities involved in our
experiment.
MATH/STAT360
CH5
5
Distribution of the Sample Mean
Meta-Experiment
Experiment
Population
Sample
Population
Statistic
Sample
of n
Sample
Statistic
Sample of n
Sample of n
Sample
Statistic
Sample of n
Sample
. Statistic
.
Etc.
MATH/STAT360
CH5
6
Example: Random Rectangles
100 Rectangles with µ=7.42 and σ=5.26.
Let X1, X2,…,Xn be a random sample from
a distribution with mean value µ and
standard deviation σ. Then
frequency
1. E ( X )   X  
2. V ( X )   X2   2 / n
0
5
SD ( X )   X   / n
10
15
Histogram of Areas
MATH/STAT360
CH5
7
MATH/STAT360
0
5
CH5 10
Areas
15
20
8
9/30/2015
Based on 68 random samples of size 5:
Mean of the sample means=7.33
SD of the sample means=1.88.
So, the distribution of the sample mean
based on samples of size 5, should have
1. E ( X )  7.42
Histogram of Sample Means
10
0
5
frequency
15
2. SD ( X )  5.26 / 5  2.35
MATH/STAT360
CH5
9
MATH/STAT360
0
5
CH5
10
15
20
10
Means
Normal Distributions
Example: Women’s Heights
• Let X1, X2,…,Xn be a random sample from
a normal distribution with mean value µ
and standard deviation σ.
• Then for any n, X is normally distributed
(with mean µ and standard deviation  / n ).
MATH/STAT360
CH5
11
• It is known that women’s heights are normally
distributed with population mean 64.5 inches
and population standard deviation 2.5 inches.
• We will look at the distribution of sample
means for various sample sizes.
• Since the population follows a normal
distribution, the sampling distribution of X is
also normal regardless of sample size.
MATH/STAT360
CH5
12
9/30/2015
For n=9, the sample means will be normally distributed with
mean=64.5 and standard deviation= 2.5 / 9  0.83.
For n=25, the sample means will be normally distributed with
mean=64.5 and standard deviation= 2.5 / 25  0.5.
Distribution of Sample Means (n=25)
0.2
0.4
Height of Curve
0.3
0.2
0.0
0.0
0.1
Height of Curve
0.6
0.4
0.8
Distribution of Sample Means (n=9)
62
63
64
65
MATH/STAT360
66
67
62
68
Sample
CH5 Mean
13
For n=100, the sample means will be normally distributed with
mean=64.5 and standard deviation = 2.5 / 100  0.25.
MATH/STAT360
63
64
65
66
67
68
Sample Mean
CH5
14
The Central Limit Theorem
Distribution of Sample Means (n=100)
1.0
E ( X )   and V ( X )   2 / n.
0.0
0.5
Height of Curve
1.5
• Let X1, X2,…,Xn be a random sample from
a distribution with mean value µ and
standard deviation σ.
• Then if n is sufficiently large, X has
approximately a normal distribution with
62
MATH/STAT360
63
64
65
Sample Mean
CH5
66
67
68
15
MATH/STAT360
CH5
16
9/30/2015
Rule of Thumb
Sampling Distribution Simulation
• If n>30, the Central Limit Theorem can be
used.
http://onlinestatbook.com/stat_sim/index.html
• For highly skewed data or data with
extreme outliers, it may take sample of
40+ before the CLT starts “working”.
MATH/STAT360
CH5
17
The sampling distribution of X depends on the
sample size (n) in two ways:
1. The standard deviation is  X   / n
which is inversely proportional to n
2. If the population distribution is not normal, then
the shape of the sampling distribution of
X depends on n, being more nearly normal
for larger n.
CH5
CH5
18
Why is the Central Limit Theorem
Important?
Dependence on Sample Size
MATH/STAT360
MATH/STAT360
19
• Every different type of population needs a
different set of procedures (i.e. probability
tables) to answer questions about probability.
• The CLT shows us that we can use the same
procedure (normal probability procedures) for
questions about probability and the sample
mean, regardless of the shape of the original
population.
• The only requirement is a “large” sample.
MATH/STAT360
CH5
20
9/30/2015
Proportions as Means
Distribution of Sample Proportion
• Recall that a binomial RV X is the number
of successes in an experiment consisting
of n independent success/failure trials.
• Let
1 if the ith trial results in a success
• The CLT implies that if n is sufficiently
large, then p̂ has approximately a normal
distribution with
p (1  p )
n
where p is the true population proportion.
E ( pˆ )  p and V ( pˆ ) 
Xi  
0 if the ith trial results in a failure
• Then the sample proportion can be
expressed as
pˆ 
MATH/STAT360
# successes

n CH5
X
i
n
• In order for the approximation to hold we need
np≥10 and n(1-p)≥10.
X
21
Example: Suppose we took samples of size 20
from a population where p=0.5.
MATH/STAT360
0.10
0.00
0.05
Probability
0.15
The sampling distribution of p̂ is approximately
Normal with mean = 0.5 and
variance = p(1  p ) / n  0.5  0.5 / 20  0.0125
CH5
0.0
0.2
0.4
23
0.6
p-hat
0.8
1.0
MATH/STAT360
CH5
22