Significance Tests
…and their significance
Significance Tests
Remember how a sampling distribution of means is created?
Take a sample of size 500 from the US. Record the mean self-esteem. If the mean should be 25, you might get this.
[Figure: sample means plotted on a self-esteem axis marked 15, 20, 25, 30, 35, 40.]
The sample means would stack up in a normal curve: a normal sampling distribution.
[Figure: normal curve of sample means over the self-esteem axis (15 to 40), with a matching z axis from -3 to 3 and z = 0 at 25.]
[Figure: the same normal sampling distribution, with 2.5% of sample means marked in each tail.]
Significance Tests
The sample size affects the sampling distribution:
Standard error = population standard deviation / square root of sample size
$\text{s.e.} = \sigma_{\bar{Y}} = \sigma / \sqrt{n}$
But in fact we use our sample’s standard deviation, s, as an estimate of the population’s:
$s = \sqrt{\sum (Y - \bar{Y})^2 / (n - 1)}$
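A minimal sketch of the two formulas on this slide, assuming a small list of made-up self-esteem scores (the data and the function names `sample_sd` and `standard_error` are illustrative, not from the lecture):

```python
import math

def sample_sd(values):
    """Sample standard deviation: sqrt( sum((Y - Y-bar)^2) / (n - 1) )."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((y - mean) ** 2 for y in values) / (n - 1))

def standard_error(values):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return sample_sd(values) / math.sqrt(len(values))

# Hypothetical scores, invented for illustration only.
scores = [22, 27, 25, 31, 19, 24, 28, 26, 23]
s = sample_sd(scores)
se = standard_error(scores)
print(round(s, 3), round(se, 3))
```

Dividing by n - 1 rather than n is what makes s an estimate of the population's standard deviation from a sample.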
Significance Tests
And if we increase our sample size (n)…
Our repeated sample means will be closer to the true mean:
[Figure: two sampling distributions, each with its own z axis from -3 to 3 and 2.5% marked in each tail.]
Significance Tests
Means will be closer to the true mean, and our standard error of the sampling distribution is smaller:
[Figure: two sampling distributions, each with its own z axis from -3 to 3 and 2.5% marked in each tail.]
Significance Tests
The range of particular middle percentages gets smaller:
[Figure: self-esteem axis (15 to 40) with two z axes from -3 to 3 and the 95% range marked.]
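To see the shrinking range in numbers, here is a short sketch. It assumes the mean of 25 from the earlier slide and a hypothetical population standard deviation of 10 (the slides give no value here); `middle_95_range` is an illustrative name.

```python
import math

def middle_95_range(mu, sd, n):
    """Range holding the middle 95% of sample means: mu +/- 1.96 standard errors."""
    se = sd / math.sqrt(n)
    return (mu - 1.96 * se, mu + 1.96 * se)

mu, sd = 25, 10  # sd = 10 is an assumption for illustration
for n in (25, 100, 500):
    low, high = middle_95_range(mu, sd, n)
    print(n, round(low, 2), round(high, 2))
```

Because n sits under a square root, quadrupling the sample size only halves the width of the range.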
Significance Tests
We use that measuring stick to say two things:
1. If my sample is in the middle specified percent, the population’s mean is within this range. (Confidence Interval)
2. Besides constructing a confidence interval, we can also do a significance test: if the population mean is the same as a guess of mine, then my sample’s mean would have to fall within this range to have been drawn from the middle specified percent. (Significance Test)
[Figure: normal curve over a z axis marked -3, -1.96, -1, 0, 1, 1.96, 3. The middle 68% lies within 1z of the center, 95% within 1.96z, and about 99.7% within 3z.]
Significance Tests
We know that if you have your sampling distribution centered on the population mean, µ:
• 16% of samples’ means would be larger than µ + 1z and 16% would be smaller than µ - 1z, for a total of 32% outside that range.
[Figure: normal curve over a z axis marked -3, -1.96, -1, 0, 1, 1.96, 3, with the middle 68% marked between -1z and +1z.]
Significance Tests
We know that if you have your sampling distribution centered on the population mean, µ:
• 2.5% of samples’ means would be larger than µ + 1.96z and 2.5% would be smaller than µ - 1.96z, for a total of 5% outside that range.
[Figure: normal curve over a z axis marked -3, -1.96, -1, 0, 1, 1.96, 3, with the middle 95% marked between -1.96z and +1.96z.]
Significance Tests
We know that if you have your sampling distribution centered on the population mean, µ:
• About 0.135% of samples’ means would be larger than µ + 3z and about 0.135% would be smaller than µ - 3z, for a total of about 0.27% outside that range.
[Figure: normal curve over a z axis marked -3, -1.96, -1, 0, 1, 1.96, 3, with the middle 99.7% marked between -3z and +3z.]
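The shares outside 1z, 1.96z, and 3z can be checked numerically. This sketch uses Python's standard-library `statistics.NormalDist`; the helper name `outside_share` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal curve: mean 0, standard deviation 1

def outside_share(z):
    """Share of sample means more than z standard errors from the center, both tails combined."""
    return 2 * (1 - std_normal.cdf(z))

for z in (1, 1.96, 3):
    print(z, round(100 * outside_share(z), 2), "% outside")
```

Run as written, the shares come out near 32%, 5%, and 0.27%, so the middle ranges hold roughly 68%, 95%, and 99.7% of sample means.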
Significance Tests
But you remember that we don’t normally know the actual mean for the population.
But what if we guessed?
What if we specified a value that might be the population mean?
Significance Tests
If we guessed a mean…
If our guess is correct, our sample’s mean should be among the common samples that would have been drawn from a population with that guessed mean.
If it is not, it is likely that the sample did not come from such a population.
[Figure: z axis marked -3, -1.96, -1, 0, 1, 1.96, 3, centered on the guess. “What if my sample’s mean were here?”]
Significance Tests
One way to tell whether our sample’s mean was generated by such a population is to place our sampling distribution over the guessed mean to see if the sample mean is among the middle 99% or 95% of samples that would be generated by such a mean.
[Figure: normal curve centered on the guess, with the middle 95% marked between -1.96z and +1.96z. “What if my sample’s mean were here?” It is among the rare 5% of possible means.]
Significance Tests
Essentially, a significance test for a mean tells you how likely it is that a sample mean as far out as yours would come from a population whose mean equals your guess.
[Figure: normal curve centered on the guess, with the middle 95% marked between -1.96z and +1.96z. “What if my sample’s mean were here?” It is among the rare 5% of possible means.]
Significance Tests
What you do is figure out what your sample’s z-score is relative to your guessed mean.
If z is larger than 1.96 or smaller than -1.96, a sample like yours would come from such a “guess population” less than 5% of the time: reject the guess!
Essentially, a significance test for a mean tells you how likely it is that a sample mean as far out as yours would come from a population with a particular mean.
[Figure: normal curve centered on the guess, z axis marked -3, -1.96, -1, 0, 1, 1.96, 3, middle 95% marked, with the sample mean marked.]
Significance Tests
For example:
If our guess was that self-doubt scores in the population averaged 20 on a scale from 1 – 50, we’d place a guess as below.
[Figure: self-doubt axis marked 16, 18, 20, 22, 24, 26, 28, with the guess at 20.]
Significance Tests
We guess 20, but our sample of size 100 has a mean of 25 and a standard deviation of 10.
[Figure: self-doubt axis marked 16, 18, 20, 22, 24, 26, 28, with the guess, µ, at 20 and the sample, Y-bar, at 25.]
Significance Tests
Let’s build a sampling distribution around our guess, 20: sample of size 100; s.d. = 10.
s.e. = 10/√100 = 10/10 = 1
[Figure: sampling distribution centered on 20 over the self-doubt axis (16 to 28), with a z axis from -3 to 5 and the sample, Y-bar, marked.]
Significance Tests
Our sample appears to be larger than a critical value of 1.96 (outer 5% of samples) or even 2.58 (outer 1% of samples).
s.e. = 10/√100 = 10/10 = 1
[Figure: sampling distribution centered on 20 over the self-doubt axis (16 to 28), with a z axis from -3 to 5 and the sample, Y-bar, marked.]
Significance Tests
How many z’s is our sample mean away from our guess?
s.e. = 10/√100 = 10/10 = 1
z = (Y-bar – µ) / s.e.
z = (25 – 20) / 1
z = 5
[Figure: sampling distribution centered on 20 over the self-doubt axis (16 to 28), with a z axis from -3 to 5 and the sample, Y-bar, at z = 5.]
Significance Tests
Indeed, our sample z-score is 5, well above 1.96 or 2.58. Reject the guess!
s.e. = 10/√100 = 10/10 = 1
Looking in Appendix B…
If the population’s mean were 20, a sample mean this far above it would turn up only about .0000287% of the time (the area in the tail beyond z = 5)!
[Figure: sampling distribution centered on 20 over the self-doubt axis (16 to 28), with a z axis from -3 to 5 and the sample, Y-bar, at z = 5.]
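The self-doubt example can be reproduced in a few lines. This is a sketch, not the lecture's own procedure: it replaces the Appendix B lookup with the standard normal tail area computed from `math.erfc`, and the function names are illustrative.

```python
import math

def z_score(sample_mean, guess, sd, n):
    """How many standard errors the sample mean lies from the guessed mean."""
    se = sd / math.sqrt(n)
    return (sample_mean - guess) / se

def upper_tail(z):
    """Area under the standard normal curve above z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Values from the slides: guess 20, sample mean 25, s.d. 10, n = 100.
z = z_score(25, 20, 10, 100)
p_one_tail = upper_tail(z)
print(z, p_one_tail)
```

The tail area is about 0.000000287 as a proportion, which is the .0000287% quoted on the slide.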
Significance Tests
Conducting a Test of Significance for the Mean
By slapping the sampling distribution for the mean over a guess of the mean, Ho, we can find out whether our sample could have been drawn from a population where the mean is equal to our guess.
1. Decide α-level (α = .05) and nature of test (two-tailed vs. one-tailed)
2. Set critical z (z = +/- 1.96) or t
3. Make guess or null hypothesis, Ho: µ = 0; Ha: µ ≠ 0
4. Collect and analyze data
5. Calculate z or t: z = (Y-bar - µo) / s.e.
6. Make a decision about the null hypothesis (reject or fail to reject)
7. Find the p-value
Significance Tests
1. Decide α-level (α = .05) and nature of test (two-tailed vs. one-tailed).
α-level refers to how unlikely a sample’s mean would have to be before you’d reject your guess. The scientific standard is typically .05 probability: a sample mean as extreme as yours would turn up in only 5% of samples from a population whose mean is what you guessed.
If a sample mean like yours has less than a 5% chance of turning up in samples from a population with your guessed mean, you’d reject the guess (the null hypothesis). α-level could be set at .10, .01, etc.
[Figure: sampling distribution of sample means, s.e. calculated by s/√n, centered on the guess, µo. “What if my sample mean were here?”]
Significance Tests
1. Decide α-level (α = .05) and nature of test (two-tailed vs. one-tailed).
One- or two-tailed test refers to the rejection region in your sampling distribution.
If your α-level were .05, in a two-tailed test your rejection region would be the outer 2.5% of each tail.
A two-tailed test implies a directionless null hypothesis such as Ho: µ = 0.
[Figure: sampling distribution of sample means, s.e. calculated by s/√n, centered on the guess, µo, with 2.5% of the sampling distribution marked in each tail. “What if my sample mean were here?”]
Significance Tests
1. Decide α-level (α = .05) and nature of test (two-tailed vs. one-tailed).
One- or two-tailed test refers to the rejection region in your sampling distribution.
If your α-level were .05, in a one-tailed test your rejection region would be the outer 5% of one of the tails.
A one-tailed test implies a directional null hypothesis such as Ho: µ ≤ 0 or Ho: µ ≥ 0.
The idea: If I have good reason to think a parameter would be above a particular value, then I only need to set the guess at that value or less (Ho: µ ≤ 0) and look to see if the sample statistic is in the rare 5% of possible samples above the null. If it is in the extreme low end, I won’t reject the null!
[Figure: sampling distribution of sample means, s.e. calculated by s/√n, centered on the guess, µo, with 5% of the sampling distribution marked in one tail. “What if my sample mean were here or there?”]
Significance Tests
2. Set critical z (z = +/- 1.96) or t
α-level refers to how unlikely a sample’s mean would have to be before you’d reject your guess. There is a z- or t-score that corresponds with that α proportion of the area in the tails of the curve (area in the tails of the sampling distribution).
For example, each area in the right tail corresponds with a z:
Area in right tail    z
.10                   1.28
.05                   1.65
.025                  1.96
.01                   2.33
.005                  2.58
[Figure: sampling distribution of sample means, s.e. calculated by s/√n, centered on the guess, µo. “What if my sample mean were here?”]
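The z column of the table can be regenerated from the tail areas. This sketch assumes Python's standard-library `statistics.NormalDist`; `critical_z` is an illustrative helper, not something named in the lecture.

```python
from statistics import NormalDist

def critical_z(alpha, two_tailed=True):
    """z that cuts off alpha in one tail, or alpha / 2 in each tail for a two-tailed test."""
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)

# One-tailed lookups reproduce the right-tail areas in the table.
for tail_area in (.10, .05, .025, .01, .005):
    print(tail_area, round(critical_z(tail_area, two_tailed=False), 3))

# Two-tailed test at alpha = .05 puts .025 in each tail.
print(round(critical_z(.05), 2))
```

The value for a right-tail area of .05 is about 1.645, which tables commonly round to 1.65.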
Significance Tests
Tea Tests?
• We use t instead of z to be more accurate.
• t curves are symmetric and bell-shaped like the normal distribution. However, the spread is more than that of the standard normal distribution: the tails are fatter.
[Figure: t curves for df = 1, 2, 3, and so on, approaching normal as df exceeds 120.]
Significance Tests
Tea Tests?
• We use t because we use the sample standard deviation (s) rather than the population standard deviation (σ) to calculate standard error. Since s will vary from sample to sample, the variability in the sampling distribution ought to be greater than in the normal curve. t has a larger spread, more accurately reflecting the likelihood of extreme samples, especially when sample size is small.
• The larger the degrees of freedom (n – 1), the closer the t curve is to the normal curve. This reflects the fact that the standard deviation s approaches σ for large sample size n.
• Even though z-scores based on the normal curve will work for larger samples (n > 120), SPSS uses t for all tests because it works for small samples and large samples alike.
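A small simulation can illustrate why fatter tails are needed. This is a sketch under stated assumptions: samples are drawn from a normal population with mean 0 and standard deviation 1, the seed and trial count are arbitrary, and `share_beyond_1_96` is an illustrative name.

```python
import math
import random

def share_beyond_1_96(n, trials=5000, seed=1):
    """Share of simulated samples whose (Y-bar - mu) / (s / sqrt(n)) lands beyond +/- 1.96."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]  # true mean 0, true sd 1
        mean = sum(sample) / n
        s = math.sqrt(sum((y - mean) ** 2 for y in sample) / (n - 1))
        statistic = mean / (s / math.sqrt(n))  # uses s, not sigma
        if abs(statistic) > 1.96:
            hits += 1
    return hits / trials

small_n_share = share_beyond_1_96(5)
large_n_share = share_beyond_1_96(100)
print(small_n_share, large_n_share)
```

With n = 5 the statistic falls beyond 1.96 noticeably more often than 5% of the time, while with n = 100 the share is close to 5%. That gap is what the t curve's wider spread accounts for.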
Significance Tests
3. Make guess or null hypothesis:
Ho: µ = 0
Ha: µ ≠ 0
The guess refers to the value that you will feel comfortable declaring true for the population unless your sample evidence suggests otherwise.
In science, we wouldn’t want to assert something based on a sample unless we had extremely good evidence. The null is a default assumption, such as saying previous research says the mean is µo. In more advanced statistics, we will typically use null hypotheses that declare “no difference between groups” or “no relationship between variables.”
The alternative is typically consistent with your research hypothesis or expectations.
[Figure: sampling distribution of sample means, s.e. calculated by s/√n, centered on the guess, µo. “What if my sample mean were here?”]
Significance Tests
3. Make guess or null hypothesis:
Ho: µ = 0 (or some other value)
Ha: µ ≠ 0
The hypotheses above refer to a two-tailed test.
Hypotheses for one-tailed tests would be like this:
Ho: µ ≤ 0 (or some other value)
Ha: µ > 0
or
Ho: µ ≥ 0
Ha: µ < 0
[Figure: sampling distribution of sample means, s.e. calculated by s/√n, centered on the guess, µo. “What if my sample mean were here?”]
Significance Tests
4. Collect and analyze data.
Once you’ve established your assumptions and what you are testing for, you can get into data analysis.
Note that this ordering of steps helps prevent you from “peeking into” the data to establish your assumptions and tests. Basing tests on sample information sets up predetermined outcomes, which is bad!
If calculating inferential statistics by hand, you would need to find your mean and standard deviation for each variable.
[Figure: sampling distribution of sample means, s.e. calculated by s/√n, centered on the guess, µo. “What if my sample mean were here?”]
Significance Tests
5. Calculate z or t: z or t = (Y-bar - µo) / s.e.
s.e. for z = σ/√n
s.e. for t = s/√n
Calculating the test statistic will tell you how many standard errors away from the null hypothesis your sample statistic is.
Corresponding with the z or t value is an area under the curve that tells what proportion or percentage of sample means would have been that far away if your null hypothesis were correct.
[Figure: sampling distribution of sample means, s.e. calculated by s/√n, centered on the guess, µo. “What if my sample mean were here?”]
Significance Tests
6. Make a decision about the null hypothesis.
Is your sample statistic more standard errors away from your guess or null hypothesized value than your critical z or t?
If it is farther out:
• It meets the criteria for implausibly rare that you established from the outset.
• You would reject the null, saying it is unlikely your sample could have come from a population where that null value is true.
If it is not more extreme than your critical z or t:
• It is not an unlikely occurrence as established from the outset.
• You would fail to reject the null, saying that your guess remains plausible: your sample could reasonably have come from a population with that null value.
[Figure: sampling distribution of sample means, s.e. calculated by s/√n, centered on the guess, µo, with the critical value, -z, marked. “What if my sample mean were here?”]
Significance Tests
7. Find the p-value.
The p-value will tell you the actual likelihood that you’d get a sample with a statistic as far away from the null value or the guess as yours is, if your null or guess were true for your population.
To find p, look in a z or t table to find the proportion of the area in the tails of the curve that corresponds with the z or t that you calculated for your sample statistic.
Remember to be sure you keep track of whether you are doing a two-tailed (p * 2) or one-tailed (p) test.
[Figure: sampling distribution of sample means, s.e. calculated by s/√n, centered on the guess, µo, with the tail area, p, marked. “What if my sample mean were here?”]
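As a sketch of the table lookup for z (not t), the tail area can be computed directly from the standard normal curve. The helper name `p_value` and the example z of 2.1 are hypothetical.

```python
import math

def p_value(z, two_tailed=True):
    """Tail area beyond the calculated z under the standard normal curve.

    The one-tailed branch assumes z falls on the side the alternative
    hypothesis points to; otherwise the one-tailed p would be 1 minus this area.
    """
    one_tail = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return 2 * one_tail if two_tailed else one_tail

z = 2.1  # hypothetical test statistic, for illustration only
print(round(p_value(z, two_tailed=False), 4))  # area in one tail
print(round(p_value(z), 4))                    # both tails: one tail times 2
```

A z of exactly 1.96 gives about .025 in one tail and .05 in two, matching the critical values used earlier.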
Significance Tests
Another Example of a Significance Test of the mean or proportion.
An administrator read that “snob universities” have over 50% of student GPAs over 3.5. He wants to determine whether SJSU is a “snob university.”
He decides:
1. To use an alpha-level of .05 with a one-tailed test.
2. That the critical z or t will be 1.65.
3. Thinking SJSU is a “snob university,” he sets his null as: Ho: Π ≤ .5; Ha: Π > .5
4. He collects data from 500 randomly selected SJSU students and finds that .40 have GPAs above 3.5.
5. He calculates z:
$\text{s.e.} = \sqrt{p(1 - p)/N} = \sqrt{.4(.6)/500} = \sqrt{.24/500} = .022$
$z = (p - \Pi_0)/\text{s.e.} = (.4 - .5)/.022 = -4.55$
6. Making a decision about the null is easy. He sees that his sample proportion, .40, is lower than .5, which is consistent with the null of Π ≤ .5, and his z of -4.55 is nowhere near the upper-tail critical value of 1.65. He fails to reject the null.
7. In finding the p-value, he sees that if the population value were .5, he’d have less than .0001 chance of getting a sample proportion that low. He has good evidence that SJSU is not a “snob university.”
[Figure: sampling distribution of sample means, s.e. calculated by s/√n, centered on the guess, .50, with the upper 5% rejection region marked and our sample marked.]
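The administrator's arithmetic can be checked with a short sketch. It follows the slide's choice of computing the standard error from the sample proportion; the function name `proportion_z` and the variable names are illustrative.

```python
import math

def proportion_z(p_hat, pi_0, n):
    """z for a sample proportion, with s.e. = sqrt(p(1 - p) / N) as on the slide."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # uses the sample proportion, per the slide
    return (p_hat - pi_0) / se

# Values from the example: .40 of 500 students, null value .50.
z = proportion_z(0.40, 0.50, 500)
critical = 1.65                 # one-tailed, alpha = .05, upper tail
reject = z > critical           # Ha: proportion > .5, so only a large positive z rejects
p_lower = 0.5 * math.erfc(-z / math.sqrt(2))  # area below z under the normal curve
print(round(z, 2), reject, p_lower)
```

Without rounding the standard error to .022, z comes out near -4.56 rather than -4.55; the decision is unchanged, and the area below z is well under .0001.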
Significance Tests
• One final note:
– The tests we typically use in sociology have assumptions of large sample sizes.
– When conducting tests with small sample sizes, some restrictions apply:
• When working with means, we typically have to assume the population values are normally distributed.
• When working with proportions, we must use a binomial probability distribution.