The paired t-test, non-parametric tests, and ANOVA
July 13, 2004

Review: the experiment (note: exact numbers have been altered)
Grade 3 students at Oak School were given an IQ test at the beginning of the academic year (n=90). Classroom teachers were given a list of names of students in their classes who had supposedly scored in the top 20 percent; these students were identified as "academic bloomers" (n=18). BUT: the children on the teachers' lists had actually been randomly assigned to the list. At the end of the year, the same IQ test was re-administered.

The results
Children who had been randomly assigned to the "top-20 percent" list had a mean IQ increase of 12.2 points (sd=2.0), vs. an increase of only 8.2 points (sd=2.5) for children in the control group.

Confidence interval (more information!)
95% CI for the difference: 4.0 ± 1.99(.64) = (2.7, 5.3)
A t-curve with 88 df has slightly wider cutoffs for 95% area (t=1.99) than a normal curve (Z=1.96).

The paired t-test
Paired data means you have measured the same person at different time points, or measured pairs of people who are related (husbands and wives, siblings, controls pair-matched to cases, etc.). For example, to evaluate whether an observed change in mean (before vs. after) represents a true improvement (or decrease):
Null hypothesis: difference (after - before) = 0
The differences are treated like a single random variable:

  Pair   X_i    Y_i    D_i = X_i - Y_i
  1      X_1    Y_1    D_1
  2      X_2    Y_2    D_2
  3      X_3    Y_3    D_3
  4      X_4    Y_4    D_4
  ...    ...    ...    ...
  n      X_n    Y_n    D_n

  \bar{D}_n = \frac{\sum_{i=1}^{n} D_i}{n}, \qquad
  SD^2 = \frac{\sum_{i=1}^{n} (D_i - \bar{D}_n)^2}{n-1}, \qquad
  T = \frac{\bar{D}_n - 0}{SD/\sqrt{n}}

Example data

  baseline:     10  10   9   8  12  11  11   7   6   9   9  10   9
  Test 2:        9  12  13   8  11  12  13  11   8   9   8   9   9
  improvement:  -1  +2  +4   0  -1  +1  +2  +4  +2   0  -1  -1   0

Is there a significant increase in scores in this group?
Average of the differences = +1
Sample variance = 3.3; sample SD = 1.82

  T_{12} = \frac{1}{1.82/\sqrt{13}} = \frac{1}{1.82/3.6} = 1.98

data _null_;
  pval = 1 - probt(1.98, 12);
  put pval;
run;

Output: 0.0355517436
Significant for a one-sided test; borderline for a two-sided test.

Example 2: Did the control group in the Oak School experiment improve at all during the year?

  t_{71} = \frac{8.2 - 0}{2.5/\sqrt{72}} = \frac{8.2}{.29} = 28

p-value < .0001

Confidence interval for annual change in IQ test score
95% CI for the increase: 8.2 ± 2.0(.29) = (7.6, 8.8)
A t-curve with 71 df has slightly wider cutoffs for 95% area (t=2.0) than a normal curve (Z=1.96).

Summary: parametric tests
- True standard deviation is known: one sample (or paired sample) uses the one-sample Z-test; two samples use the two-sample Z-test.
- Standard deviation is estimated from the sample: one sample (or paired sample) uses the one-sample t-test; two samples use the two-sample t-test (equal variances are pooled; unequal variances are unpooled).
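As a bridge from the summary above to SAS, here is a minimal sketch of the estimated-standard-deviation cases using PROC TTEST (the Z-tests, which assume a known standard deviation, are usually computed by hand). The dataset name mydata and the variable names score, group, before, and after are placeholders, not from the slides.

/* One-sample (or paired-difference) t-test of H0: mean = 0 */
proc ttest data=mydata h0=0;
  var score;
run;

/* Two-sample t-test; the output reports both the pooled (equal-variance)
   and Satterthwaite (unequal-variance) versions */
proc ttest data=mydata;
  class group;
  var score;
run;

/* Paired t-test on before/after measurements */
proc ttest data=mydata;
  paired after*before;
run;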
Non-parametric tests
t-tests require your outcome variable to be normally distributed (or close enough). Non-parametric tests are based on RANKS instead of means and standard deviations (= "population parameters").

Example: non-parametric tests
10 dieters following Atkin's diet vs. 10 dieters following Jenny Craig.
Hypothetical RESULTS: the Atkin's group loses an average of 34.5 lbs; the J. Craig group loses an average of 18.5 lbs.
Conclusion: Atkin's is better?

BUT, take a closer look at the individual data…
Atkin's, change in weight (lbs): +4, +3, 0, -3, -4, -5, -11, -14, -15, -300
J. Craig, change in weight (lbs): -8, -10, -12, -16, -18, -20, -21, -24, -26, -30

Enter data in SAS…

data nonparametric;
  input loss diet $;
  datalines;
+4 atkins
+3 atkins
0 atkins
-3 atkins
-4 atkins
-5 atkins
-11 atkins
-14 atkins
-15 atkins
-300 atkins
-8 jenny
-10 jenny
-12 jenny
-16 jenny
-18 jenny
-20 jenny
-21 jenny
-24 jenny
-26 jenny
-30 jenny
;
run;

[Figure: histograms of weight change (percent of dieters) for the Jenny Craig group (values between about -30 and +20 lbs) and the Atkin's group (values between about -300 and +20 lbs, dominated by the single -300 outlier).]

The t-test doesn't work…
Comparing the mean weight loss of the two groups is not appropriate here: the distributions do not appear to be normally distributed, and there is an extreme outlier that influences the mean a great deal.

Statistical tests to compare ranks:
- The Wilcoxon rank-sum test (equivalent to the Mann-Whitney U test) is the analogue of the two-sample t-test.
- The Wilcoxon signed-rank test is the analogue of the one-sample t-test, usually used for paired data.

Wilcoxon "rank-sum" test
RANK the values, 1 being the least weight loss and 20 being the most weight loss.
Atkin's: +4, +3, 0, -3, -4, -5, -11, -14, -15, -300 receive ranks 1, 2, 3, 4, 5, 6, 9, 11, 12, 20
J. Craig: -8, -10, -12, -16, -18, -20, -21, -24, -26, -30 receive ranks 7, 8, 10, 13, 14, 15, 16, 17, 18, 19
Sum of Atkin's ranks: 1 + 2 + 3 + 4 + 5 + 6 + 9 + 11 + 12 + 20 = 73
Sum of Jenny Craig's ranks: 7 + 8 + 10 + 13 + 14 + 15 + 16 + 17 + 18 + 19 = 137
Jenny Craig clearly ranked higher! P-value* (from computer) = .017; from the t-test, p-value = .60. (The PROC RANK sketch at the end of this section shows one way to check these rank sums.)

*Tests in SAS…

/* to get the Wilcoxon rank-sum test */
proc npar1way wilcoxon data=nonparametric;
  class diet;
  var loss;
run;

/* to get the t-test */
proc ttest data=nonparametric;
  class diet;
  var loss;
run;

Wilcoxon "signed-rank" test
H0: median weight loss in the Atkin's group = 0
Ha: median weight loss in the Atkin's group is not 0
Atkin's: +4, +3, 0, -3, -4, -5, -11, -14, -15, -300
Rank the absolute values of the changes, largest first, ignoring zeroes (ties get the average rank):

  Ordered absolute values:  300   15   14   11    5     4     4     3     3
  Ranks:                      1    2    3    4    5   6.5   6.5   8.5   8.5

Sum of the negative ranks: 1 + 2 + 3 + 4 + 5 + 6.5 + 8.5 = 30
Sum of the positive ranks: 6.5 + 8.5 = 15
P-value* (from computer) = .043; from the paired (one-sample) t-test = .27

*Tests in SAS…

/* to get the one-sample tests (both Student's t and the signed-rank test) */
proc univariate data=nonparametric;
  var loss;
  where diet="atkins";
run;

What if the data were paired?
e.g., one-to-one matching: find pairs of study participants who have the same age, gender, socioeconomic status, degree of overweight, etc.
Atkin's: +4, +3, 0, -3, -4, -5, -11, -14, -15, -300
J. Craig: -8, -10, -12, -16, -18, -20, -21, -24, -26, -30

Enter the data differently in SAS: 10 pairs, rather than 20 individual observations.

data paired;
  input lossa lossj;
  diff = lossa - lossj;
  datalines;
+4 -8
+3 -10
0 -12
-3 -16
-4 -18
-5 -20
-11 -21
-14 -24
-15 -26
-300 -30
;
run;

*Tests in SAS…

/* to get all the paired tests */
proc univariate data=paired;
  var diff;
run;

/* to get just the paired t-test */
proc ttest data=paired;
  var diff;
run;

/* to get the paired t-test, alternatively */
proc ttest data=paired;
  paired lossa*lossj;
run;
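As a check on the hand-ranking in the rank-sum example above, here is a minimal sketch that reproduces the rank sums (73 and 137) in SAS, assuming the nonparametric dataset created earlier; the names ranked and loss_rank are placeholders.

/* Rank the weight changes so that the largest value (least weight lost)
   gets rank 1 and the smallest (most weight lost) gets rank 20 */
proc rank data=nonparametric descending out=ranked;
  var loss;
  ranks loss_rank;
run;

/* Sum the ranks within each diet; this should reproduce 73 (atkins) and 137 (jenny) */
proc means data=ranked sum;
  class diet;
  var loss_rank;
run;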
ANOVA: for comparing means between more than 2 groups

ANOVA (ANalysis Of VAriance)
Idea: for two or more groups, test the difference between means, for quantitative, normally distributed variables. ANOVA is just an extension of the t-test (an ANOVA with only two groups is mathematically equivalent to a t-test). Like the t-test, ANOVA is a "parametric" test: it assumes that the outcome variable is roughly normally distributed.

The "F-test"
Is the difference in the means of the groups more than background noise (= variability within groups)?

  F = \frac{\text{variability between groups}}{\text{variability within groups}}

[Figure: spine bone density vs. menstrual regularity. Spine BMD (roughly 0.7 to 1.2 g/cm2) is plotted for the amenorrheic, oligomenorrheic, and eumenorrheic groups, with annotations marking the within-group variability of each group and the between-group variation in the group means.]

Group means and standard deviations
- Amenorrheic group (n=11): mean spine BMD = .92 g/cm2, standard deviation = .10 g/cm2
- Oligomenorrheic group (n=11): mean spine BMD = .94 g/cm2, standard deviation = .08 g/cm2
- Eumenorrheic group (n=11): mean spine BMD = 1.06 g/cm2, standard deviation = .11 g/cm2

The F-test
The between-group variance is the size of the groups times the variance of the group means around the overall mean (.97 g/cm2), i.e., the squared differences of each group's mean from the overall mean:

  s^2_{between} = n \, s^2_{\bar{x}} = 11 \times \frac{(.92 - .97)^2 + (.94 - .97)^2 + (1.06 - .97)^2}{3 - 1} = .063

The within-group variance is the average amount of variation within groups, i.e., the average of each group's variance:

  s^2_{within} = \overline{s^2} = \frac{1}{3}(.10^2 + .08^2 + .11^2) = .0095

  F_{2,30} = \frac{s^2_{between}}{s^2_{within}} = \frac{.063}{.0095} = 6.6

A large F value indicates that the between-group variation exceeds the within-group variation (= the background noise).

The F-distribution
The F-distribution is a continuous probability distribution that depends on two parameters, n and m (the numerator and denominator degrees of freedom, respectively). A ratio of sample variances follows an F-distribution:

  \frac{s^2_{between}}{s^2_{within}} \sim F_{n,m}

The F-test tests the hypothesis that the two sample variances are equal; F will be close to 1 if the sample variances are equal.

  H_0: \sigma^2_{between} = \sigma^2_{within} \qquad H_a: \sigma^2_{between} > \sigma^2_{within}

The ANOVA table (k groups, n individuals per group)
- Between groups: d.f. = k - 1; sum of squares = SSB (the sum of squared deviations of the group means from the grand mean); mean sum of squares = SSB/(k - 1); F-statistic = [SSB/(k - 1)] / [SSW/(nk - k)]; p-value from the F_{k-1, nk-k} chart.
- Within groups: d.f. = nk - k; sum of squares = SSW (the sum of squared deviations of the observations from their group means); mean sum of squares = s^2 = SSW/(nk - k).
- Total variation: d.f. = nk - 1; sum of squares = TSS (the sum of squared deviations of the observations from the grand mean); TSS = SSB + SSW.

ANOVA = t-test (two groups of n each)
- Between groups: d.f. = 1; sum of squares = SSB (based on the squared difference in the means); mean sum of squares = SSB/1; F-statistic = SSB / s_p^2 = (t_{2n-2})^2; p-value from the F_{1, 2n-2} chart (notice that the value is just (t_{2n-2})^2).
- Within groups: d.f. = 2n - 2; sum of squares = SSW (equivalent to the numerator of the pooled variance); mean sum of squares = pooled variance s_p^2.
- Total variation: d.f. = 2n - 1; sum of squares = TSS.
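The slides give SAS code for the t-tests and rank tests but not for the ANOVA itself, so here is a minimal sketch of how the one-way ANOVA for the spine BMD example might be run; the dataset name bmd and the variable names group and spine_bmd are hypothetical placeholders.

/* One-way ANOVA: F-test comparing mean spine BMD across the three groups */
proc glm data=bmd;
  class group;
  model spine_bmd = group;
  /* Tukey-adjusted pairwise comparisons (one of the corrections listed below) */
  means group / tukey;
run;
quit;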
ANOVA summary
A statistically significant ANOVA (F-test) only tells you that at least two of the groups differ, but not which ones. Determining which groups differ (when it's unclear) requires more sophisticated analyses to correct for the problem of multiple comparisons.
Question: Why not just do 3 pairwise t-tests?
Answer: because, at an error rate of 5% per test, you have an overall chance of up to 1 - (.95)^3 = 14% of making a type-I error (if all 3 comparisons were independent). If you wanted to compare 6 groups, you'd have to do 6C2 = 15 pairwise t-tests, which would give you a high chance of finding something significant just by chance (if all tests were independent with a type-I error rate of 5% each): the probability of at least one type-I error is 1 - (.95)^15 = 54%.

Multiple comparisons
With 18 independent comparisons, we have a 60% chance of at least 1 false positive (1 - (.95)^18 ≈ .60), and we expect about 1 false positive (18 × .05 = 0.9).

Correction for multiple comparisons
How to correct for multiple comparisons post hoc:
- Bonferroni's correction (adjusts p by the most conservative amount, assuming all tests are independent)
- Holm/Hochberg (gives a p-cutoff beyond which results are not significant)
- Tukey's method (adjusts p)
- Scheffé's method (adjusts p)

Non-parametric ANOVA
Kruskal-Wallis one-way ANOVA: the extension of the 2-group Wilcoxon rank-sum test to more than 2 groups; based on ranks. Use PROC NPAR1WAY in SAS (a short sketch follows the reading list below).

Reading for this week
Chapters 4-5, 12-13 (last week)
Chapters 6-8, 10, 14 (this week)
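Finally, a minimal sketch of the Kruskal-Wallis test in SAS, as referenced from the Non-parametric ANOVA notes above; it assumes the same hypothetical bmd dataset and variable names used in the ANOVA sketch. When the CLASS variable has more than two levels, the Wilcoxon analysis in PROC NPAR1WAY reports the Kruskal-Wallis test.

/* Kruskal-Wallis one-way ANOVA: rank-based comparison of the three groups */
proc npar1way wilcoxon data=bmd;
  class group;
  var spine_bmd;
run;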