Download 7.samplingdist - Illinois State University Department of Psychology


Transcript
Statistics for the Social Sciences
Psychology 340
Spring 2005
Sampling distribution
Outline
• Review 138 stuff:
– What are sample distributions
– Central limit theorem
– Standard error (and estimates of)
– Test statistic distributions as transformations
Flipping a coin example
$2^n = 2^3 = 8$ total outcomes
Outcome  Number of heads
HHH  3
HHT  2
HTH  2
HTT  1
THH  2
THT  1
TTH  1
TTT  0
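The eight outcomes can also be enumerated programmatically. The following is a minimal Python sketch, assuming a fair coin and using the standard library's `itertools.product`; the variable names are illustrative and not from the slides.

```python
from itertools import product

n = 3  # number of flips

# Every sequence of H/T of length n; there are 2**n of them
outcomes = ["".join(seq) for seq in product("HT", repeat=n)]

# Number of heads in each outcome
heads = {outcome: outcome.count("H") for outcome in outcomes}

print(len(outcomes))  # 8
for outcome, count in heads.items():
    print(outcome, count)
```

The listing order matches the table: HHH first, TTT last.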
Flipping a coin example
Distribution of possible outcomes (n = 3 flips)
[Figure: bar chart of probability (y-axis) against number of heads (x-axis); the bars at 0, 1, 2, 3 heads have heights .125, .375, .375, .125]
Number of heads (X), frequency (f), probability (p):
X f p
3 1 .125
2 3 .375
1 3 .375
0 1 .125
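The frequency and probability columns can be recovered by counting heads across all equally likely outcomes. This is a small sketch under the assumption of a fair coin; `freq` and `probs` are names chosen here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

n = 3  # number of flips
outcomes = list(product("HT", repeat=n))

# f: how many of the 2**n outcomes contain each number of heads
freq = Counter(seq.count("H") for seq in outcomes)

# p: frequency divided by the total number of outcomes
probs = {x: Fraction(f, len(outcomes)) for x, f in freq.items()}

for x in sorted(probs, reverse=True):
    print(x, freq[x], float(probs[x]))
```

Exact fractions are used so that the probabilities sum to exactly 1.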
Hypothesis testing
Distribution of possible outcomes (of a particular sample size, n)
Can make predictions about likelihood of outcomes based on this distribution.
• In hypothesis testing, we compare our observed samples with the distribution of possible samples (transformed into standardized distributions)
• This distribution of possible outcomes is often Normally Distributed
Distribution of sample means
• Comparison distributions considered so far were distributions of individual scores
• Mean of a group of scores
– Comparison distribution is distribution of means
Distribution of sample means
• A simple case
– Population: 2, 4, 6, 8
– All possible samples of size n = 2
– Assumption: sampling with replacement
Distribution of sample means
• All possible samples of size n = 2 from the population 2, 4, 6, 8 (there are 16 of them)
Sample  mean
2, 2  2
2, 4  3
2, 6  4
2, 8  5
4, 2  3
4, 4  4
4, 6  5
4, 8  6
6, 2  4
6, 4  5
6, 6  6
6, 8  7
8, 2  5
8, 4  6
8, 6  7
8, 8  8
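The 16 samples and their means can be generated rather than written out by hand. A minimal sketch, assuming ordered sampling with replacement as stated on the earlier slide (names are illustrative):

```python
from itertools import product

population = [2, 4, 6, 8]
n = 2  # sample size

# Sampling with replacement, order matters: 4**2 = 16 samples
samples = list(product(population, repeat=n))
means = [sum(sample) / n for sample in samples]

print(len(samples))  # 16
for sample, m in zip(samples, means):
    print(sample, m)
```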
Distribution of sample means
[Figure: frequency histogram of the 16 sample means; x-axis: means (2 to 8), y-axis: frequency (1 to 5)]
In the long run, the random selection of tiles leads to a predictable pattern
Distribution of sample means
• Sample problem:
– What’s the probability of getting a sample with a mean of 6 or more?
$\bar{X}$ f p
8 1 0.0625
7 2 0.1250
6 3 0.1875
5 4 0.2500
4 3 0.1875
3 2 0.1250
2 1 0.0625
$P(\bar{X} \geq 6) = .1875 + .1250 + .0625 = 0.375$
• Same as before, except now we’re asking about sample means rather than single scores
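The probability of a sample mean of 6 or more can be checked by tallying the distribution of sample means directly. This sketch assumes the population 2, 4, 6, 8 and samples of size 2 drawn with replacement; exact fractions avoid rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]

# Mean of every ordered sample of size 2 (with replacement)
sample_means = [Fraction(a + b, 2) for a, b in product(population, repeat=2)]
freq = Counter(sample_means)
total = len(sample_means)  # 16

# Add up the probabilities of the means 6, 7 and 8
p_six_or_more = sum(Fraction(f, total) for m, f in freq.items() if m >= 6)

print(float(p_six_or_more))  # 0.375
```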
Distribution of sample means
• Distribution of sample means is a “virtual” distribution between the sample and population
[Diagram: Population, Distribution of sample means, Sample]
Properties of the distribution of sample means
• Shape
– If population is Normal, then the dist of sample means will be Normal
– If the sample size is large (n > 30), the dist of sample means will be approximately Normal regardless of the shape of the population
[Figure: Population and its Distribution of sample means, n > 30]
• Center
– The mean of the dist of sample means is equal to the mean of the population
Population: $\mu$
Distribution of sample means: $\mu_{\bar{X}}$
$\mu = \mu_{\bar{X}}$: same numeric value, different conceptual values
– Consider our earlier example
Population: 2, 4, 6, 8
$\mu = \frac{2+4+6+8}{4} = 5$
Distribution of sample means:
$\mu_{\bar{X}} = \frac{2+3+4+5+3+4+5+6+4+5+6+7+5+6+7+8}{16} = 5$
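The equality of the two means in this example can be confirmed numerically. A minimal sketch, again assuming samples of size 2 drawn with replacement from 2, 4, 6, 8 (variable names are illustrative):

```python
from itertools import product

population = [2, 4, 6, 8]
n = 2

# Mean of the population
mu = sum(population) / len(population)

# Mean of the distribution of sample means (all 16 samples)
sample_means = [sum(s) / n for s in product(population, repeat=n)]
mu_of_means = sum(sample_means) / len(sample_means)

print(mu, mu_of_means)  # 5.0 5.0
```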
• Spread
– The standard deviation of the distribution of sample means depends on two things
• Standard deviation of the population
• Sample size
• Spread
• Standard deviation of the population
• The smaller the population variability, the closer the sample means are to the population mean
[Figure: sample means $\bar{X}_1$, $\bar{X}_2$, $\bar{X}_3$ shown for a more variable and a less variable population]
• Spread
• Sample size
[Figures: distribution of sample means $\bar{X}$ for n = 1, n = 10, and n = 100]
• The larger the sample size the smaller the spread
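The effect of sample size on spread can be illustrated with a small simulation. This is only a sketch: the Normal population with mean 60 and standard deviation 8, the number of repetitions, and the random seed are all assumptions chosen here for illustration.

```python
import random
from statistics import pstdev

rng = random.Random(0)   # fixed seed so the run is repeatable
MU, SIGMA = 60, 8        # hypothetical Normal population
REPS = 2000              # samples drawn for each sample size

spread = {}
for n in (1, 10, 100):
    # Draw REPS samples of size n and record each sample mean
    sample_means = [
        sum(rng.gauss(MU, SIGMA) for _ in range(n)) / n
        for _ in range(REPS)
    ]
    # Standard deviation of the simulated sample means
    spread[n] = pstdev(sample_means)

for n, sd in spread.items():
    print(n, round(sd, 2))
```

The printed spreads should shrink as n grows from 1 to 10 to 100.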
• Spread
• Standard deviation of the population
• Sample size
– Putting them together we get the standard deviation of the distribution of sample means
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
– Commonly called the standard error
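For the earlier population 2, 4, 6, 8 with n = 2, the standard error formula can be compared against the standard deviation computed directly from all 16 sample means. A minimal sketch, assuming sampling with replacement and using the population (not sample) standard deviation:

```python
from itertools import product
from math import isclose, sqrt
from statistics import pstdev

population = [2, 4, 6, 8]
n = 2

# Population standard deviation (sigma)
sigma = pstdev(population)

# Standard error from the formula sigma / sqrt(n)
se_formula = sigma / sqrt(n)

# Standard deviation of the 16 sample means, computed directly
sample_means = [sum(s) / n for s in product(population, repeat=n)]
se_direct = pstdev(sample_means)

print(round(se_formula, 4), round(se_direct, 4))
print(isclose(se_formula, se_direct))  # True
```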

Standard error
• The standard error is the average amount that you’d expect a sample mean (for samples of size n) to deviate from the population mean
– In other words, it is an estimate of the error that you’d expect by chance (or by sampling)
Distribution of sample means
• Keep your distributions straight by taking care with your notation
Population: $\mu$, $\sigma$
Distribution of sample means: $\mu_{\bar{X}}$, $\sigma_{\bar{X}}$
Sample: $\bar{X}$, $s$
Properties of the distribution of sample means
• All three of these properties are combined to form the Central Limit Theorem
– For any population with mean $\mu$ and standard deviation $\sigma$, the distribution of sample means for sample size n will approach a normal distribution with a mean of $\mu$ and a standard deviation of $\sigma / \sqrt{n}$ as n approaches infinity (good approximation if n > 30).
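The center and spread claims of the theorem can be checked by simulation on a clearly non-Normal population. The sketch below assumes an exponential population with mean 1 and standard deviation 1 (a choice made here, not taken from the slides), samples of size n = 40, and a fixed seed. It checks only the mean and standard deviation of the sample means; the Normal shape would need a histogram or similar.

```python
import random
from math import sqrt
from statistics import pstdev

rng = random.Random(1)   # fixed seed so the run is repeatable
n = 40                   # sample size (larger than 30)
reps = 5000              # number of samples drawn

# Exponential population with rate 1: mean 1, standard deviation 1
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(reps)
]

mean_of_means = sum(sample_means) / reps
sd_of_means = pstdev(sample_means)
predicted_se = 1.0 / sqrt(n)  # sigma / sqrt(n)

print(round(mean_of_means, 3))
print(round(sd_of_means, 3), round(predicted_se, 3))
```

The simulated mean should land near 1 and the simulated spread near the predicted standard error.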

Performing your statistical test
• What are we doing when we test the hypotheses?
– Computing a test statistic: Generic test
$\text{test statistic} = \frac{\text{observed difference}}{\text{difference expected by chance}}$
– Observed difference: could be difference between a sample and a population, or between different samples
– Difference expected by chance: based on standard error or an estimate of the standard error
Hypothesis Testing With a Distribution of Means
• It is the comparison distribution when a sample has more than one individual
• Find a Z score of your sample’s mean on a distribution of means
$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$
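The Z score of a sample mean can be written as a small helper. In this sketch the function name `z_score` and the numbers in the usage line are hypothetical, chosen only to make the arithmetic easy to follow; it assumes the population standard deviation is known.

```python
from math import sqrt

def z_score(sample_mean, mu, sigma, n):
    """Z score of a sample mean on the distribution of means."""
    standard_error = sigma / sqrt(n)             # sigma of the sample means
    return (sample_mean - mu) / standard_error   # mean of sample means equals mu

# Hypothetical values: standard error is 15 / 3 = 5, so z = 5 / 5
z = z_score(sample_mean=105, mu=100, sigma=15, n=9)
print(z)  # 1.0
```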
“Generic” statistical test
An example: One sample z-test
Memory example experiment:
• We give n = 16 memory patients a memory improvement treatment.
• After the treatment they have an average score of $\bar{X} = 55$ memory errors.
• How do they compare to the general population of memory patients who have a distribution of memory errors that is Normal, $\mu = 60$, $\sigma = 8$?
• Step 1: State your hypotheses
H0: the memory treatment sample is the same as (or worse than) the population of memory patients.
$\mu_{Treatment} \geq \mu_{pop} = 60$
HA: Their memory is better than the population of memory patients
$\mu_{Treatment} < \mu_{pop} = 60$
• Step 2: Set your decision criteria
$\alpha = 0.05$
One-tailed
• Step 3: Collect your data
n = 16, $\bar{X} = 55$ memory errors
• Step 4: Compute your test statistics
$z_{\bar{X}} = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{55 - 60}{8 / \sqrt{16}} = -2.5$
• Step 5: Make a decision about your null hypothesis
$z_{\bar{X}} = -2.5$
[Figure: Normal comparison distribution with z-axis ticks at -2, -1, 1, 2 and a 5% "Reject H0" region in the lower tail]
- Reject H0
- Support for our HA: the evidence suggests that the treatment decreases the number of memory errors
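The full memory example can be reproduced in a few lines. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) for the one-tailed cutoff and p-value; the variable names are illustrative, and the p-value is an addition not shown on the slides.

```python
from math import sqrt
from statistics import NormalDist

# Values from the memory example
n, sample_mean = 16, 55
mu, sigma = 60, 8
alpha = 0.05  # one-tailed, lower tail

# Step 4: test statistic
standard_error = sigma / sqrt(n)         # 8 / 4 = 2.0
z = (sample_mean - mu) / standard_error  # -5 / 2 = -2.5

# Step 5: compare with the critical value in the lower tail
std_normal = NormalDist()                # mean 0, standard deviation 1
z_critical = std_normal.inv_cdf(alpha)   # about -1.645
p_value = std_normal.cdf(z)              # about 0.0062
reject_h0 = z < z_critical

print(z, round(z_critical, 3), round(p_value, 4), reject_h0)
```

Because z = -2.5 lies below the cutoff of about -1.645, the sketch reaches the same decision as the slide: reject H0.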