On-line resources
• http://wise.cgu.edu/powermod/index.asp
• http://wise.cgu.edu/regression_applet.asp
• http://wise.cgu.edu/hypomod/appinstruct.asp
• http://psych.hanover.edu/JavaTest/NeuroAnim/stats/StatDec.html
• http://psych.hanover.edu/JavaTest/NeuroAnim/stats/t.html
• http://psych.hanover.edu/JavaTest/NeuroAnim/stats/CLT.html
• Note demo page

Effect sizes

          R-squared     r     Cohen's D
Large       .15        .39      .80
Medium      .06        .24      .50
Small       .01        .10      .20

For a small effect size (.01), the change in success rate is from 46% to 54%.
For a medium effect size (.06), the change in success rate is from 38% to 62%.
For a large effect size (.15), the change in success rate is from 30% to 70%.

But what does .10 really mean?

Predictor                 Outcome                  R2     r
Vietnam veteran status    Alcohol abuse            .00   .03
Testosterone              Juvenile delinquency     .01   .10
AZT                       Death                    .05   .33
Psychotherapy             Improvement              .10   .32

Is psychotherapy effective? (after Shapiro & Shapiro, 1983)

Therapy target                 Number of studies   Cohen's D    r      R2
Anxiety & depression                  30              .67      .31    9.6%
Phobias                               76              .88      .54    29%
Physical and habit problems          106              .85      .52    27%
Social and sexual problems            76              .75      .43    18%
Performance anxieties                126              .71      .37    14%

Calculating Cohen's D

Effect size = the difference between the predicted mean and the mean of the known population, divided by the population standard deviation (assumes that you know the population and the sample size; imagine one population receives the treatment and the other does not).

d = (m1 - m2) / s

m1 = mean of population 1 (the hypothesized mean for the population that is subjected to the experimental manipulation)
m2 = mean of population 2 (which is also the mean of the comparison distribution)
s = standard deviation of population 2 (assumed to be the standard deviation of both populations)

One other way to think about D
• d = .20: 85% overlap (15- vs. 16-year-old girls' distribution of heights)
• d = .50: 67% overlap (14- vs. 18-year-old girls' distribution of heights)
• d = .80: 53% overlap (13- vs. 18-year-old girls' distribution of heights)

Effect sizes are interchangeable
• http://www.amstat.org/publications/jse/v10n3/aberson/power_applet.html

Statistical significance vs. effect size
• p < .05
• r = .10
  – For N = 100,000, p < .05
  – For N = 10, p > .05
  – A large sample is closer to the population, so there is less chance of sampling error

Brief digression
• Research hypotheses and statistical hypotheses
• Is psychoanalysis effective?
  – Null?
  – Alternate?
  – Handout
• Why test the null?

Statistical significance and decision levels (Z scores, t values, and F values)
Sampling distributions for the null hypothesis: http://statsdirect.com/help/distributions/pf.htm

One way to think about it: two ways to guess wrong

Truth for population   Do not reject null hypothesis   Reject null hypothesis
Null is true           Correct!                        Type 1 error
Null is not true       Type 2 error                    Correct!

Type 1 error: you think something is there, and there is nothing.
Type 2 error: you think nothing is there, and there is something.

An example

                                Null hypothesis is false                            Null hypothesis is true
Reject null hypothesis          Merit pay works and we know it.                     We decided merit pay worked, but it doesn't.
Do not reject null hypothesis   We decided merit pay does not work, but it does.    Merit pay does not work and we know it.

An example

Imagine the following research looking at the effects, if any, of the drug AZT on HIV-positive patients. In other words, does a group of AIDS patients given AZT live longer than another group given a placebo?
If we conduct the experiment correctly - everything is held constant (or randomly distributed) except for the independent measure - and we do find a difference between the two groups, there are only two reasonable explanations available to us:

                                Null hypothesis is false   Null hypothesis is true
Reject null hypothesis
Do not reject null hypothesis

From Dave Schultz: Statistical power is how "sensitive" a study is at detecting various associations (magnification metaphor).

If you think that the effect is small (.01), medium (.06), or large (.15), and you want to find a statistically significant difference defined as p < .05, this table shows you how many participants you need for different levels of "sensitivity," or power.

Power ->       .10   .20   .30   .40   .50   .60   .70   .80   .90
Effect size
.01             21    53    83   113   144   179   219   271   354
.06              5    10    14    19    24    30    36    44    57
.15              3     5     6     8    10    12    14    17    22

If you think that the effect is small (.01), medium (.06), or large (.15), and you want to find a statistically significant difference defined as p < .01, this table shows you how many participants you need for different levels of "sensitivity," or power.

Power ->       .10   .20   .30   .40   .50   .60   .70   .80   .90
Effect size
.01             70   116   156   194   232   274   323   385   478
.06             13    20    26    32    38    45    53    62    77
.15              6     8    11    13    15    18    20    24    29

What determines power?
1. Number of subjects
2. Effect size
3. Alpha level

Power = the probability that your experiment will reveal whether your research hypothesis is true.

How do you increase power?
1. Increase the region of rejection to p < .10
2. Increase sample size
3. Increase treatment effects
4. Decrease within-group variability

Study feature           Practical way of raising power                  Disadvantages
Predicted difference    Increase intensity of experimental              May not be practical or may distort
                        procedures                                      the study's meaning
Standard deviation      Use a less diverse population                   May not be available; decreases
                                                                        generalizability
Standard deviation      Use standardized, controlled circumstances      Not always practical
                        of testing or more precise measurement
Sample size             Use a larger sample size                        Not practical, can be costly
Significance level      Use a more lenient level of significance        Raises alpha, the probability of a
                                                                        Type 1 error
One-tailed vs.          Use a one-tailed test                           May not be appropriate to the logic
two-tailed test                                                         of the study

What is adequate power?
• .50 (most current research)
• .80 (recommended)

How do you know how much power you have? Guesswork.

Two ways to use power:
1. Post hoc, to establish what you could have found
2. In advance, to determine how many participants you need

Outcome statistically significant   Sample size   Conclusion
Yes                                 Small         Important results
Yes                                 Large         Might or might not have practical importance
No                                  Small         Inconclusive
No                                  Large         Research hypothesis probably false

Statistical power (for p < .05): [table of power by effect size, r = .10, .30, .50, for two-tailed vs. one-tailed tests]

Power:
Power = 1 - Type 2 error
Power = 1 - beta
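The "effect sizes are interchangeable" idea can be made concrete in a few lines of code. Below is a minimal Python sketch of the d formula from the "Calculating Cohen's D" slide, plus the standard equal-group-size approximation r = d / sqrt(d^2 + 4) for converting d to r (and r^2 to R-squared). The example means and standard deviation are made up for illustration.

```python
import math

def cohens_d(m1, m2, s):
    # d = (m1 - m2) / s: difference between the two population means
    # divided by the shared population standard deviation.
    return (m1 - m2) / s

def d_to_r(d):
    # Equal-n approximation for converting d to a correlation r;
    # square the result to get R-squared.
    return d / math.sqrt(d ** 2 + 4)

# Hypothetical example: treated mean 105, comparison mean 100, s = 10.
d = cohens_d(105, 100, 10)
r = d_to_r(d)
print(round(d, 2), round(r, 2), round(r ** 2, 2))  # 0.5 0.24 0.06
```

This reproduces the "Medium" and "Small" rows of the effect-sizes table (d = .50 gives r = .24, R-squared = .06; d = .20 gives r = .10, R-squared = .01). The "Large" row's r = .39 is sqrt(.15) rather than the .37 this approximation gives for d = .80; published conversion tables round these slightly differently.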
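Power = 1 - beta can also be estimated by brute-force simulation: generate many samples under a true effect of size d, test each one, and count how often the null is rejected. A minimal sketch (my own illustration, not from the slides) for a two-tailed one-sample z test with known sigma = 1; the function name and defaults are assumptions for this example.

```python
import random
import statistics

def simulated_power(d, n, alpha=0.05, reps=4000, seed=42):
    """Monte Carlo estimate of power for a two-tailed one-sample z test.

    d: true standardized effect (population mean, with sigma = 1)
    n: sample size per simulated experiment
    """
    rng = random.Random(seed)
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = .05
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(d, 1) for _ in range(n)]
        z = statistics.fmean(sample) * (n ** 0.5)  # (mean - 0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps

# Power rises with effect size, sample size, and a more lenient alpha:
print(simulated_power(0.2, 50))              # small effect: low power
print(simulated_power(0.5, 50))              # medium effect: much higher
print(simulated_power(0.5, 50, alpha=0.10))  # p < .10 rejection region: higher still
```

The three calls mirror items on the "How do you increase power?" list: bigger treatment effects, larger samples, and a larger region of rejection all raise power, while within-group variability corresponds to sigma in the denominator.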