Effect size
On-line resources
• http://wise.cgu.edu/powermod/index.asp
• http://wise.cgu.edu/regression_applet.asp
• http://wise.cgu.edu/hypomod/appinstruct.asp
• http://psych.hanover.edu/JavaTest/NeuroAnim/stats/StatDec.html
• http://psych.hanover.edu/JavaTest/NeuroAnim/stats/t.html
• http://psych.hanover.edu/JavaTest/NeuroAnim/stats/CLT.html
• Note demo page
Effect sizes
              Large   Medium   Small
R-squared      .15      .06     .01
r              .39      .24     .10
Cohen’s D      .80      .50     .20
For a small effect size (.01), the change in success rate is from 46% to 54%.
For a medium effect size (.06), the change in success rate is from 38% to 62%.
For a large effect size (.15), the change in success rate is from 30% to 70%.
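One common way to arrive at success-rate figures like these is Rosenthal and Rubin’s binomial effect size display (BESD), which splits an overall 50/50 success rate into .50 ± r/2. The sketch below is an assumption about how such figures are derived (the slide does not say), using the r values from the table above:

```python
# Binomial effect size display (BESD): the pair of "success rates" implied by a
# correlation r is (.50 - r/2, .50 + r/2). Assumed interpretation, not from the slide.

def besd(r):
    return 0.50 - r / 2, 0.50 + r / 2

for label, r in [("small", 0.10), ("medium", 0.24), ("large", 0.39)]:
    lo, hi = besd(r)
    print(f"{label:6s} (r = {r:.2f}): success rate changes from {lo:.0%} to {hi:.0%}")
# -> small 45% to 55%, medium 38% to 62%, large 30% to 70%
# (the slide lists 46%/54% for the small effect, a slightly different rounding)
```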
But what does .10 really mean?
Predictor                 Outcome                 R2     r
Vietnam veteran status    Alcohol abuse           .00    .03
Testosterone              Juvenile delinquency    .01    .10
AZT                       Death                   .05    .33
Psychotherapy             Improvement             .10    .32
Is psychotherapy effective?
(after Shapiro & Shapiro, 1983)
Therapy target                 Number of studies   Cohen’s D    r      R2
Anxiety & depression                  30              .67      .31    9.6%
Phobias                               76              .88      .54    29%
Physical and habit problems          106              .85      .52    27%
Social and sexual problems            76              .75      .43    18%
Performance anxieties                126              .71      .37    14%
Calculating Cohen’s D
Effect size = the difference between the predicted mean and the mean of the known
population, divided by the population standard deviation (this assumes that you
know the population and the sample size).
(Imagine that one population receives the treatment and the other does not.)
d = (m1 - m2) / s
m1 = mean of population 1 (the hypothesized mean for the population that is
subjected to the experimental manipulation)
m2 = mean of population 2 (which is also the mean of the comparison distribution)
s = standard deviation of population 2 (assumed to be the standard deviation of
both populations)
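A minimal Python sketch of this formula, assuming the treated group’s mean is estimated from sample scores while the comparison population’s mean and standard deviation are known; the names and numbers below are made up for illustration:

```python
import statistics

def cohens_d(treated_scores, comparison_mean, comparison_sd):
    """d = (m1 - m2) / s, using the comparison population's SD for both populations."""
    m1 = statistics.mean(treated_scores)   # mean for the treated (experimental) group
    return (m1 - comparison_mean) / comparison_sd

# Hypothetical example: the comparison population has mean 100 and SD 15.
treated = [104, 112, 108, 99, 117, 106, 110, 103]
print(round(cohens_d(treated, comparison_mean=100, comparison_sd=15), 2))  # 0.49, a medium effect
```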
One other way to think about D
• D = .20, overlap 85%: 15 vs. 16 year old girls’ distribution of heights
• D = .50, overlap 67%: 14 vs. 18 year old girls’ distribution of heights
• D = .80, overlap 53%: 13 vs. 18 year old girls’ distribution of heights
Effect sizes are interchangeable
• http://www.amstat.org/publications/jse/v10n3/aberson/power_applet.html
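As an example of that interchangeability, d and r can be converted into one another with the standard formulas r = d / sqrt(d^2 + 4) (for two equal-sized groups) and d = 2r / sqrt(1 - r^2); a quick sketch:

```python
import math

def d_to_r(d):
    """Convert Cohen's d to r (assumes two equal-sized groups)."""
    return d / math.sqrt(d ** 2 + 4)

def r_to_d(r):
    """Convert r to Cohen's d."""
    return 2 * r / math.sqrt(1 - r ** 2)

for d in (0.20, 0.50, 0.80):
    print(f"d = {d:.2f}  ->  r = {d_to_r(d):.2f}")
# d = 0.20 -> r = 0.10; d = 0.50 -> r = 0.24; d = 0.80 -> r = 0.37
# (close to the small/medium/large r values in the table above)
```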
Statistical significance vs. effect size
• p < .05
• r = .10
– For N = 100,000, p < .05
– For N = 10, p > .05
– Larger sample: closer to the population, less chance of sampling error
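A sketch of this point, using the usual t test of a correlation coefficient, t = r·sqrt(n - 2) / sqrt(1 - r^2), with the two sample sizes from the slide:

```python
import math
from scipy import stats

def p_value_for_r(r, n):
    """Two-tailed p-value for a sample correlation r based on n observations."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return 2 * stats.t.sf(abs(t), df=n - 2)

for n in (10, 100_000):
    print(f"r = .10, N = {n:>6}: p = {p_value_for_r(0.10, n):.4f}")
# N = 10      -> p ≈ 0.78   (not significant)
# N = 100,000 -> p < .0001  (significant: the same r, but far more evidence)
```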
Brief digression
• Research hypotheses and statistical
hypotheses
• Is psychoanalysis effective?
– Null?
– Alternate?
– Handout
• Why test the null?
Statistical significance and decision levels.
(Z scores, t values and F values)
Sampling distributions for the null hypothesis:
http://statsdirect.com/help/distributions/pf.htm
One way to think about it…
Two ways to guess wrong
Truth for population     Do not reject null hypothesis   Reject null hypothesis
Null is true             Correct!                        Type 1 error
Null is not true         Type 2 error                    Correct!
Type 1 error: think something is there when there is nothing
Type 2 error: think nothing is there when there is something
An example
                                Null hypothesis is false        Null hypothesis is true
Reject null hypothesis          Merit pay works and we          We decided merit pay
                                know it.                        worked, but it doesn't.
Do not reject null hypothesis   We decided merit pay does       Merit pay does not work
                                not work, but it does.          and we know it.
An example
Imagine the following research looking at the effects of the drug AZT, if any, on
HIV-positive patients. In other words, does a group of AIDS patients given AZT live
longer than another group given a placebo? If we conduct the experiment correctly -
everything is held constant (or randomly distributed) except for the independent
measure - and we do find a difference between the two groups, there are only two
reasonable explanations available to us:
                                Null hypothesis is false       Null hypothesis is true
Reject null hypothesis
Do not reject null hypothesis
From Dave Schultz:
Statistical power is how “sensitive” a study is at detecting various associations
(magnification metaphor).
If you think that the effect is small (.01), medium (.06), or large (.15), and you want
to find a statistically significant difference defined as p < .05, this table shows you
how many participants you need for different levels of “sensitivity” or power.
Power ->       .10   .20   .30   .40   .50   .60   .70   .80   .90
Effect size
  .01           21    53    83   113   144   179   219   271   354
  .06            5    10    14    19    24    30    36    44    57
  .15            3     5     6     8    10    12    14    17    22
If you think that the effect is small (.01), medium (.06), or large (.15), and you want
to find a statistically significant difference defined as p < .01, this table shows you
how many participants you need for different levels of “sensitivity” or power.
Power ->       .10   .20   .30   .40   .50   .60   .70   .80   .90
Effect size
  .01           70   116   156   194   232   274   323   385   478
  .06           13    20    26    32    38    45    53    62    77
  .15            6     8    11    13    15    18    20    24    29
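Tables like these can also be sanity-checked by simulation (though not reproduced exactly here, since the required n depends on the particular test and effect-size convention the table’s author assumed). A rough sketch for an independent-samples t test with Cohen’s d, assuming normal populations:

```python
import numpy as np
from scipy import stats

def simulated_power(d, n_per_group, alpha=0.05, reps=5000, seed=0):
    """Estimate power of a two-tailed independent-samples t test by simulation."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        control = rng.normal(0.0, 1.0, n_per_group)   # comparison population
        treated = rng.normal(d, 1.0, n_per_group)     # shifted up by d standard deviations
        hits += stats.ttest_ind(treated, control).pvalue < alpha
    return hits / reps

# A classic benchmark: d = .50 with about 64 participants per group gives ~.80 power.
print(simulated_power(d=0.50, n_per_group=64))   # approximately 0.80
```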
What determines power?
1. Number of subjects
2. Effect size
3. Alpha level
Power = the probability that your experiment will detect an effect (reject the null
hypothesis) if your research hypothesis is true
How to increase power? (see the sketch after this list)
1. Increase the region of rejection to p < .10
2. Increase sample size
3. Increase treatment effects
4. Decrease within-group variability
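Each of these levers can be checked numerically. The sketch below uses the power routines in statsmodels (assuming that package is available) for an independent-samples t test, with Cohen’s d as the effect size; the specific values are illustrative only:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Levers 1 and 2: a more lenient alpha and a larger sample both raise power (d held at .50).
for alpha in (0.05, 0.10):
    for n in (25, 50, 100):
        p = analysis.power(effect_size=0.50, nobs1=n, alpha=alpha)
        print(f"d = .50, alpha = {alpha:.2f}, n per group = {n:>3}: power = {p:.2f}")

# Levers 3 and 4: a larger treatment effect or less within-group variability means a larger d.
for d in (0.20, 0.50, 0.80):
    p = analysis.power(effect_size=d, nobs1=50, alpha=0.05)
    print(f"d = {d:.2f}, alpha = 0.05, n per group =  50: power = {p:.2f}")
```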
Study feature           Practical way of raising power       Disadvantages
Predicted difference    Increase intensity of                May not be practical or may
                        experimental procedures              distort study’s meaning
Standard deviation      Use a less diverse population        May not be available;
                                                             decreases generalizability
Standard deviation      Use standardized, controlled         Not always practical
                        circumstances of testing or
                        more precise measurement
Sample size             Use a larger sample size             Not practical, can be costly
Significance level      Use a more lenient level of          Raises alpha, the probability
                        significance                         of type 1 error
One-tailed vs.          Use a one-tailed test                May not be appropriate to the
two-tailed test                                              logic of the study
What is adequate power?
.50 (most current research)
.80 (recommended)
How do you know how much power you have?
Guesswork
Two ways to use power:
1. Post hoc, to establish what you could find
2. In advance, to determine how many participants you need
Outcome statistically significant   Sample size   Conclusion
Yes                                 Small         Important results
Yes                                 Large         Might or might not have practical importance
No                                  Small         Inconclusive
No                                  Large         Research hypothesis probably false
Statistical power (for p < .05), by effect size (r = .10, .30, .50) and by one-tailed
vs. two-tailed test
Power:
Power = 1 - type 2 error
Power = 1 - beta