Repeated Measures ANOVA
PSYCHOMETRICS
Michael J. Kalsher
Department of Cognitive Science
MGMT 6971 PSYCHOMETRICS
© 2014, Michael Kalsher

Outline
• Review
• Descriptive vs. Inferential Statistics
• Parametric vs. Non-parametric Statistics
• The role of equivalence in research
• The t test
  – One-group t test
  – Independent-groups t test
  – Dependent-groups t test
• Analysis of Variance
  – One-way Independent-groups ANOVA
  – One-way Dependent-groups ANOVA

Descriptive Statistics
• Describing or characterizing the obtained sample data
• Use of summary measures, typically:
  – measures of central tendency (mean, median, mode)
  – measures of dispersion (range, variance, standard deviation)

Inferential Statistics
• Used to make inferences about populations based on the behavior of a sample.
• Concerned with how likely it is that a result based on a particular sample, or set of samples, is the same as the result that would be obtained from the entire population.

Parametric vs. Non-parametric tests
Parametric tests
• Term used for inferential tests based on the normal distribution.
  – Accuracy depends on the data meeting the tests' basic assumptions.
• Most outcomes closely follow a known probability distribution.
• Many parametric tests are robust to violations of distributional assumptions, so the assigned p-value will be fairly accurate in many situations.
Non-parametric tests
• A family of statistical procedures that do not rely on the restrictive assumptions of parametric tests. In particular, they do not assume that data come from a normal distribution.

Assumptions of Parametric Data
Most parametric statistics based on the normal distribution have four basic assumptions that must be met for the test to be accurate:
1. Normally distributed data.
2. Homogeneity of variance (i.e., the variance of each group should be equivalent).
3. Data must be interval or ratio.
4. Independence (i.e., the behavior of one participant doesn’t influence the behavior of another participant).

Achieving Equivalence: Random Assignment versus Correlated-Groups Design
Random assignment: helps to ensure statistical equivalence of groups at the beginning of a study.
Correlated-groups design: assures equivalence by either using the same participants in all groups or using participants who have been closely matched. These designs are usually more sensitive than between-subjects designs to the effects of the IV(s).
- Within-subjects (or repeated-measures) design: tests each subject under every condition.
- Matched-subjects design: matches participants on relevant variables prior to the study and then randomly assigns the members of each matched set, one member to each group.

Characteristics of Within-Subjects Designs
1. Each participant is exposed to all conditions of the experiment and therefore serves as his/her own control.
2. Susceptible to sequence effects, so the order of the conditions should be “counterbalanced”. In complete counterbalancing:
   a. Each participant is exposed to all conditions of the experiment.
   b. Each condition is presented an equal number of times.
   c. Each condition is presented an equal number of times in each position.
   d. Each condition precedes and follows each other condition an equal number of times.
3. The critical comparison is the difference between the correlated groups on the dependent variable.

Dependent or Matched-Pairs t-test
The dependent, or matched-pairs, t-test is designed for situations in which the same participants are used in both experimental conditions. Thus, each participant contributes two scores.
Calculation of t:
\[ t = \frac{\text{Mean Difference}}{\text{Standard Error (of the mean difference)}} \]

Assumptions
The dependent-groups t test requires the following statistical assumptions:
1. Data are from normally distributed populations.
   Note: The independent-samples t-test is robust against violation of this assumption if n > 30 for both groups.
2. Data are measured at least at the interval level.

Dependent t-test: An Overview
A researcher wonders whether self-help books actually work and asks a group of participants to read two books: one written for the purpose of increasing relationship happiness and another that is (hopefully!) irrelevant for this purpose. After reading each book, she asks participants to fill out a survey that measures relationship happiness.

Books Read                            Mean    N    Std. Deviation  Std. Error Mean
Marital Bliss: A Practical Approach   20.018  500  9.98123         .44637
Statistics 101                        18.490  500  8.99153         .40211

Paired Differences (Pair 1)
Mean Diff.  Std. Deviation  Std. Error Mean  95% Confidence Int. (Lower, Upper)  t      df   Sig. (2-tailed)
1.5280      12.62807        .56474           .4184, 2.6376                       2.706  499  .007

Critical Values: Dependent Groups t test
Note: Degrees of Freedom = N_D - 1 (where N_D = the number of difference scores)
Our value: 2.706

Effect Sizes
The statistical test tells us whether it is safe to conclude that the means come from the same, or different, populations. It doesn’t tell us how strong these differences are.
r² (r-squared), or the coefficient of determination, is one metric for gauging effect size. It represents the proportion of the total sum of squares (SST) that is accounted for by the treatment (SSM).
Rules of thumb regarding effect sizes:
Small effect: 1-3% of the total variance
Medium effect: 10% of the total variance
Large effect: 25% of the total variance
\[ R^2 = \frac{SS_M}{SS_T} \]

Dependent t-test: Calculating the Effect Size
Statistical vs. practical significance.
Formula:
\[ r^2 = \frac{t^2}{t^2 + df} \]
Degrees of freedom: df = N - 1 = 499
Relationship happiness sample computation:
\[ r^2 = \frac{(2.706)^2}{(2.706)^2 + 499} = \frac{7.32}{506.32} = .01 \]
(small effect size)

Dependent t-test: Reporting the Results
On average, the reported relationship happiness after reading Marital Bliss: A Practical Approach (M = 20.02, SE = .45) was significantly higher than after reading the introductory statistics book (M = 18.49, SE = .40), t(499) = 2.71, p < .01, r² = .01. However, the small effect size estimate indicates that this difference was not practically significant.

Sample Problem
A psychologist believes that children of parents who use positive verbal statements (polite requests and suggestions) are more socially accepted and more positive in interactions with their peers. Although children acquire behavioral information from sources other than parents (TV, peers, and so on), more induction (coaching children by introducing consequences for behaviors and supplying rationales that support them) on the part of parents, as opposed to more power-assertive and permissive types of discipline, facilitates a pro-social behavioral orientation in children that, in turn, leads to greater competence and greater acceptance by peers.
Twenty first-grade children who were rated by teachers and peers as aggressive, along with their parents, are asked to participate in a study to determine whether a seminar on inductive parenting techniques improves social competency in children. The parents attend the seminar for one month. The children are tested for social competency before the course and then retested six months after their parents’ completion of the course. The results of the social competency test are shown on the following page, with higher scores indicating a higher level of social competency.
In this problem, we are testing the null hypothesis that there is no difference between the means of pre- and post-seminar social competency scores.
What is the IV?
What is the DV?
Was the seminar effective?

Social Competency Problem Data Set
Child  Pre  Post      Child  Pre  Post
1      31   34        11     31   28
2      26   25        12     27   32
3      32   38        13     25   25
4      38   36        14     28   30
5      29   29        15     32   41
6      34   41        16     27   37
7      24   26        17     37   39
8      35   42        18     29   33
9      30   36        19     31   40
10     36   44        20     27   28

T-TEST PAIRS=PreSeminar WITH PostSeminar (PAIRED)
  /CRITERIA=CI(.9500)
  /MISSING=ANALYSIS.

SPSS Output
Paired Samples Descriptive Statistics
       Mean   N   Std. Deviation  Std. Error Mean
Pre    30.45  20  4.019           .899
Post   34.20  20  6.066           1.356

Paired Samples Correlations
             N   Correlation  Sig.
Pre & Post   20  .771         .000

Paired Samples Test (Paired Differences)
          Mean    Std. Dev.  Std. Error Mean  95% Confidence Interval of the Difference (Lower, Upper)  t       df  Sig. (2-tail)
Pre-Post  -3.750  3.919      .876             -5.584, -1.916                                            -4.280  19  .000

One-way Repeated-measures ANOVA

What is it?
• Used when testing more than 2 experimental groups.
• In dependent-groups ANOVA, all groups are dependent: each score in one group is associated with a score in every other group. This may be because the same subjects served in every group or because subjects have been matched.
• A within-subjects design equates conditions prior to the experiment by using the same participants in each condition or participants matched on some variable(s) of interest. This removes the single largest contributing factor to variance: individual differences.

Examining Sources of Variance in Repeated-Measures ANOVA: An Example
Suppose we want to test whether participants can identify a target faster if there are fewer distractor items. Participants must find the single target (T or F) in a letter array and then push either the T or F button on the keyboard. Each person is tested under three conditions: 10, 15, and 20 distractor items. There are 10 trials at each of the three levels of distraction, and the 10 trials are summed to give the total search time for each distraction level.
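Returning briefly to the social competency problem: the paired-samples figures in the SPSS output above can be reproduced by hand from the calculation of t given earlier (mean difference divided by its standard error). The following is a minimal sketch in Python rather than SPSS, which is not what the original slides use; the function names `paired_t` and `r_squared` are hypothetical and chosen only for illustration. It computes t and df but not the p-value, which would require a t distribution table or a statistics library.

```python
import math
from statistics import mean, stdev

# Pre- and post-seminar scores for children 1-20, copied from the data set above.
pre = [31, 26, 32, 38, 29, 34, 24, 35, 30, 36,
       31, 27, 25, 28, 32, 27, 37, 29, 31, 27]
post = [34, 25, 38, 36, 29, 41, 26, 42, 36, 44,
        28, 32, 25, 30, 41, 37, 39, 33, 40, 28]


def paired_t(x, y):
    """Dependent-groups t: mean difference / standard error of the mean difference."""
    diffs = [a - b for a, b in zip(x, y)]
    n_d = len(diffs)                       # number of difference scores
    mean_diff = mean(diffs)
    se = stdev(diffs) / math.sqrt(n_d)     # sample SD of differences / sqrt(N_D)
    return mean_diff, se, mean_diff / se, n_d - 1


def r_squared(t, df):
    """Effect size from the slides: r^2 = t^2 / (t^2 + df)."""
    return t ** 2 / (t ** 2 + df)


mean_diff, se, t, df = paired_t(pre, post)
# The SPSS output above reports Mean = -3.750, Std. Error Mean = .876, t = -4.280, df = 19.
print(f"mean diff = {mean_diff:.3f}, SE = {se:.3f}, t({df}) = {t:.3f}")
print(f"r^2 = {r_squared(t, df):.2f}")
```

The same `r_squared` helper can be applied to the self-help book example by passing t = 2.706 and df = 499.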
One-Way Repeated Measures ANOVA
• Used when testing 3 or more experimental groups.
• Each person contributes more than one score (i.e., every participant is exposed to every treatment).
• Within-subjects designs:
  - equate conditions by using the same participants in each condition.
  - variance is partitioned into SST, SSM and SSR
  - in repeated-measures ANOVA, the model and residual sums of squares are both part of the within-participant variance.
[Diagram: SST is partitioned into SSBG and SSWG; SSWG is further partitioned into SSModel and SSR.]

Condition (time estimation in seconds)
Participant  Order of Presentation  A (10)  B (15)  C (20)
1            ABC                    18.33   22.39   24.97
2            ACB                    15.96   20.72   21.79
3            BAC                    19.02   22.78   25.46
4            BCA                    25.36   27.48   27.91
5            CAB                    19.52   24.64   26.75
6            CBA                    23.27   24.96   25.49
Mean Scores                         20.24   23.83   25.40

Source    df  SS     MS     F      p
Between   2   83.69  41.85  32.25  <.001
Subjects  5   95.85  19.17
Error     10  12.97  1.30

Independent vs. Dependent Groups ANOVA: A few points worth noting …
• In independent-groups ANOVA, accuracy of the F-test depends upon the assumption that the groups tested are independent.
• The relationship between treatments in a repeated-measures design causes the conventional F-test to lack accuracy, which leads to an additional assumption.
• Sphericity: refers to the equality of variances of the differences between treatment levels.
• Mauchly’s test statistic
• Corrections applied to produce a valid F-ratio:
  – When sphericity estimates are < .75, use the Greenhouse-Geisser estimate
  – When sphericity estimates are > .75, use the Huynh-Feldt estimate

The Sphericity Assumption: Equality of variances of the differences between treatment levels
Accuracy of the F test in independent-groups ANOVA depends upon the assumption that the groups tested are independent. The relationship between treatments in a repeated-measures design causes the F test to lack accuracy. This requires an additional assumption termed sphericity.
If we were to take each pair of treatment levels and calculate the differences between each pair of scores, then it is necessary that these differences have equal variances. Mauchly’s test examines the hypothesis that the variances of the differences between conditions are equal. The effect of violating sphericity is a loss of power (increased Type II error).

One-Way Dependent-Groups ANOVA: An Example
Suppose we were testing the idea that consuming an increasing amount of alcohol will make it more likely people will “eye-up” members of the opposite sex.
IV: Over 4 nights, people are given either 1, 2, 3 or 4 pints of beer to drink.
DV: How many people the drinkers “eye-ball” (as measured by specialized eye-tracking goggles).

[Figure: plot of the condition means against Number of Pints (1 to 4). Annotations: “Ogling” increases after 3 pints! “Ogling” after 1 and 2 pints seems similar. Mean Difference = 3.5]

Descriptive Statistics
           1 Pint   2 Pints  3 Pints  4 Pints
Mean       11.7500  11.7000  15.2000  14.9500
Std. Dev.  4.31491  4.65776  5.80018  4.67327
N          20       20       20       20

Steps in the Analysis
-- Compute SSM (the variability explained by the experimental effect)
-- Compute SSR (amount of unexplained variation across the conditions of the repeated measures variable)
-- Divide by the appropriate df: (1) df for SSM = levels of the IV minus 1 (or k - 1); (2) df for SSR = (k - 1) x (n - 1) [n = number of participants in each group]
-- F = MSM/MSR; the associated p-value is the probability of getting an F at least this large by chance alone (i.e., if the null hypothesis were true).
-- Check to see whether Mauchly’s test is significant!

SPSS Output
Mauchly’s Test of Sphericity
Within-subjects Effect  Mauchly’s W  Chi-Square  df  Sig.  Epsilon: GG  HF    LB
Alcohol                 .477         13.122      5   .022  .745         .849  .333

Source                        Type III SS  df      Mean Square  F      Sig.
ALCOHOL  Sphericity Assumed   225.100      3       75.033       4.729  .005
         Greenhouse-Geisser   225.100      2.235   100.706      4.729  .011
         Huynh-Feldt          225.100      2.547   88.370       4.729  .008
         Lower-bound          225.100      1.000   225.100      4.729  .042
Error    Sphericity Assumed   904.400      57      15.867
         Greenhouse-Geisser   904.400      42.469  21.296
         Huynh-Feldt          904.400      48.398  18.687
         Lower-bound          904.400      19.000  47.600

Note: If the epsilon estimates are < .75, use Greenhouse-Geisser (GG); if the epsilon estimates are > .75, use Huynh-Feldt (HF).

SPSS Output: Post hoc tests
(I) Alcohol  (J) Alcohol  Mean Difference (I - J)  Std. Error  Sig.   Lower Bound  Upper Bound
1            2            5.000E-02                .742        1.000  -2.133       2.233
1            3            -3.450                   1.391       .136   -7.544       .644
1            4            -3.200                   1.454       .242   -7.480       1.080
2            1            -5.000E-02               .742        1.000  -2.233       2.133
2            3            -3.500                   1.139       .038   -6.853       -.147
2            4            -3.250                   1.420       .202   -7.429       .929
3            1            3.450                    1.391       .136   -.644        7.544
3            2            3.500                    1.139       .038   .147         6.853
3            4            .250                     1.269       1.000  -3.485       3.985
4            1            3.200                    1.454       .242   -1.080       7.480
4            2            3.250                    1.420       .202   -.929        7.429
4            3            -.250                    1.269       1.000  -3.985       3.485

Calculating Effect Size
\[ \omega^2 = \frac{MS_M - MS_R}{MS_M + ((n - 1) \times MS_R)} \]
\[ \omega^2 = \frac{75.03 - 15.87}{75.03 + ((20 - 1) \times 15.87)} = \frac{59.16}{75.03 + 301.53} = \frac{59.16}{376.56} = .16 \]

Reporting the Results
The results show that the number of people eyed-up was significantly affected by the amount of alcohol consumed, F(2.55, 48.40) = 4.73, p < .05. Mauchly’s test indicated that the assumption of sphericity had been violated, χ²(5) = 13.12, p < .05; therefore, degrees of freedom were corrected using Huynh-Feldt estimates of sphericity (ε = .85). The effect size indicated that the effect of alcohol consumption on ogling was substantial. Bonferroni post hoc tests revealed a significant difference in the number of people eyed-up only between 2 and 3 pints, p < .05. No other comparisons were significant (all ps > .05).

Sample Problem: One-way Repeated-measures ANOVA
A stress management therapy group instructor conducts a study to determine the most effective relaxation technique(s) for stress reduction.
Twenty members of his stress management group participate in the study. The heart rate of each participant is monitored during each of five conditions. Each participant experienced all five conditions during the same session to control for variations in the amount of stress experienced from day to day. The five conditions are as follows:
(1) baseline (subjects sat quietly for 15 minutes);
(2) guided meditation (subjects listened to a tape instructing them to close their eyes, breathe deeply, and relax their muscles for 15 minutes while concentrating on a single word or phrase);
(3) comedy (subjects listened to the act of a stand-up comedian on a tape cassette for 15 minutes);
(4) nature (subjects listened to a tape for 15 minutes of various sounds of nature, including the sounds of the ocean, wind, rain, leaves rustling, and birds chirping); and
(5) music (subjects listened to a tape of a collection of easy-listening music for 15 minutes).
Every subject experienced the baseline condition first; however, the four treatment conditions were counterbalanced to reduce the possibility of order effects. Each subject’s heart rate was monitored continuously during each of the 15-minute periods. The mean heart rate (beats per minute) for each subject during each condition is presented on the following page.
In this problem, we are testing the null hypothesis that, on average, the heart rates of subjects remain the same during each of the five conditions, or, equivalently, that these conditions do not influence heart rate differentially.
What is the IV?
What is the DV?
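Before working a problem like this in SPSS, it can help to see the sums-of-squares partition from "Steps in the Analysis" written out directly. The following is a minimal sketch in Python, which is not what the original slides use; the function names `rm_anova` and `omega_squared` are hypothetical. It is checked against the visual search example from earlier (6 participants, 3 distractor levels), it assumes sphericity holds, and it applies no Greenhouse-Geisser or Huynh-Feldt correction.

```python
# Search times from the visual search example: one row per participant,
# columns = conditions A (10), B (15), C (20) distractors.
search_times = [
    (18.33, 22.39, 24.97),
    (15.96, 20.72, 21.79),
    (19.02, 22.78, 25.46),
    (25.36, 27.48, 27.91),
    (19.52, 24.64, 26.75),
    (23.27, 24.96, 25.49),
]


def rm_anova(rows):
    """One-way repeated-measures ANOVA (sphericity assumed, no correction)."""
    n, k = len(rows), len(rows[0])           # participants, conditions
    grand = sum(sum(r) for r in rows) / (n * k)
    cond_means = [sum(r[j] for r in rows) / n for j in range(k)]
    subj_means = [sum(r) / k for r in rows]

    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_model = n * sum((m - grand) ** 2 for m in cond_means)      # SSM
    ss_subjects = k * sum((m - grand) ** 2 for m in subj_means)   # between participants
    ss_resid = ss_total - ss_model - ss_subjects                  # SSR

    df_model, df_resid = k - 1, (k - 1) * (n - 1)
    ms_model, ms_resid = ss_model / df_model, ss_resid / df_resid
    return {
        "ss_model": ss_model, "ss_subjects": ss_subjects, "ss_resid": ss_resid,
        "df_model": df_model, "df_resid": df_resid,
        "ms_model": ms_model, "ms_resid": ms_resid,
        "F": ms_model / ms_resid,
    }


def omega_squared(ms_model, ms_resid, n):
    """Effect size as given on the slides: (MSM - MSR) / (MSM + (n - 1) * MSR)."""
    return (ms_model - ms_resid) / (ms_model + (n - 1) * ms_resid)


result = rm_anova(search_times)
# The source table for this example reports SS Between = 83.69,
# SS Subjects = 95.85 and F = 32.25.
print({name: round(value, 2) for name, value in result.items()})
```

Passing the heart-rate data as 20 rows of 5 values to `rm_anova` would give the sphericity-assumed F for this sample problem. Passing the alcohol example's mean squares (75.033 and 15.867, with n = 20) to `omega_squared` reproduces the hand calculation on the effect size slide.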
Sample Problem Data Set
Subject  Baseline  Meditate  Comedy  Nature  Music
1        85        70        75      71      74
2        79        69        73      70      72
3        91        82        87      83      86
4        93        80        85      79      84
5        92        80        86      81      87
6        87        79        83      80      81
7        84        72        77      73      76
8        78        69        74      71      73
9        79        69        73      70      72
10       80        71        74      72      73
11       80        72        76      74      75
12       97        80        89      82      87
13       88        78        82      80      82
14       94        79        84      80      84
15       75        60        68      62      66
16       76        67        72      69      70
17       90        77        83      76      83
18       86        75        80      77      80
19       94        84        88      85      87
20       70        59        64      58      62

Move each of the within-subjects variables over to the “Within-Subjects Variables” box.

Which estimate do you use?
Remember!: If the epsilon estimates are < .75, use Greenhouse-Geisser (GG); if the epsilon estimates are > .75, use Huynh-Feldt (HF).

SPSS Output
Within-Subjects Effects
Source                          SS        df      Mean Square  F        Sig.
Exp. Cond  Sphericity Assumed   1573.100  4       393.275      235.531  .000
           Greenhouse-Geisser   1573.100  1.654   951.015      235.531  .000
           Huynh-Feldt          1573.100  1.792   877.881      235.531  .000
           Lower-bound          1573.100  1.000   1573.100     235.531  .000
Error      Sphericity Assumed   126.900   76      1.670
           Greenhouse-Geisser   126.900   31.428  4.038
           Huynh-Feldt          126.900   34.047  3.727
           Lower-bound          126.900   19.000  6.679

Descriptive Statistics
Condition  Mean   Std. Dev  N
Baseline   84.90  7.511     20
Meditate   73.60  6.969     20
Comedy     78.65  7.036     20
Nature     74.65  7.006     20
Music      77.70  7.420     20

Sample Problem
Western people can become obsessed with body weight and diets, and because the media continues to glamorize stick-thin celebrities, we end up depressed that we’re not perfect. This gives corporate moguls in the fashion industry the opportunity to jump on our vulnerability by making loads of money on diets that will apparently help us to attain beautiful bodies. A European company bursts onto the scene with a diet called the Mediterranean-Monaco diet.
The basic principle is that you eat no meat, drink lots of green tea, eat lots of bread dipped in extra virgin olive oil, eat chocolate at least once per day, and drink red wine (for the health benefits, of course) at the rate of at least one 1.5 oz. glass per day. Ten people in need of losing weight agree to try the diet for two months. Their weight was measured in kilograms at the start of the diet and then after 1 month and 2 months. Did the diet work?

Before Diet  After 1 Month  After 2 Months
63.75        65.38          81.34
62.98        66.24          69.31
65.98        67.70          77.89
107.27       102.72         91.33
66.58        69.45          72.87
120.46       119.96         114.26
62.01        66.09          68.01
71.87        73.62          55.43
83.01        75.81          71.63
76.62        67.66          68.60
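Because this problem has three repeated measurements, sphericity matters before any F-ratio is interpreted. The following is a minimal sketch in Python (not part of the original slides, which use SPSS) of the idea described earlier: take each pair of conditions, compute the difference scores, and compare their variances. It also computes a Greenhouse-Geisser epsilon using the standard Box formula on the sample covariance matrix, which is brought in here as an assumption since the slides do not give the formula. The function names are hypothetical, and reading the data as one row per person is an assumption about the flattened table. The sketch does not carry out Mauchly's test.

```python
from itertools import combinations
from statistics import mean, variance

# Weights in kg: one row per person,
# columns = (before diet, after 1 month, after 2 months).
weights = [
    (63.75, 65.38, 81.34),
    (62.98, 66.24, 69.31),
    (65.98, 67.70, 77.89),
    (107.27, 102.72, 91.33),
    (66.58, 69.45, 72.87),
    (120.46, 119.96, 114.26),
    (62.01, 66.09, 68.01),
    (71.87, 73.62, 55.43),
    (83.01, 75.81, 71.63),
    (76.62, 67.66, 68.60),
]


def difference_variances(rows):
    """Variance of the difference scores for every pair of conditions."""
    k = len(rows[0])
    return {
        (i, j): variance([r[i] - r[j] for r in rows])
        for i, j in combinations(range(k), 2)
    }


def greenhouse_geisser_epsilon(rows):
    """Box / Greenhouse-Geisser epsilon from the k x k sample covariance matrix."""
    n, k = len(rows), len(rows[0])
    col_means = [mean(r[j] for r in rows) for j in range(k)]
    cov = [
        [sum((r[i] - col_means[i]) * (r[j] - col_means[j]) for r in rows) / (n - 1)
         for j in range(k)]
        for i in range(k)
    ]
    mean_diag = sum(cov[i][i] for i in range(k)) / k
    grand = sum(sum(row) for row in cov) / k ** 2
    row_means = [sum(row) / k for row in cov]
    numerator = (k * (mean_diag - grand)) ** 2
    denominator = (k - 1) * (
        sum(c * c for row in cov for c in row)
        - 2 * k * sum(m * m for m in row_means)
        + k ** 2 * grand ** 2
    )
    return numerator / denominator


# Sphericity asks these three variances to be (roughly) equal.
for pair, var in difference_variances(weights).items():
    print(f"conditions {pair}: variance of differences = {var:.2f}")

# Epsilon ranges from 1 / (k - 1) (maximal violation) to 1 (sphericity holds).
print(f"Greenhouse-Geisser epsilon = {greenhouse_geisser_epsilon(weights):.3f}")
```

Following the rule of thumb on the slides, an epsilon estimate below .75 would point to the Greenhouse-Geisser correction and one above .75 to Huynh-Feldt.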