Nonparametric tests I
Back to basics

Lecture Outline
• What is a nonparametric test?
• Rank tests, distribution free tests and nonparametric tests
• Which type of test to use

MTB > dotplot 'Male' 'Female';
SUBC> same.

[Dotplots of MALE and FEMALE on a common scale from 0.32 to 1.12]

MTB > desc 'Male' 'Female'

Variable     N     Mean   Median   TrMean    StDev   SEMean
MALE        50   0.5908   0.5600   0.5770   0.1979   0.0280
FEMALE      50   0.5180   0.4950   0.5102   0.1315   0.0186

Variable      Min      Max       Q1       Q3
MALE       0.2900   1.1300   0.4275   0.7150
FEMALE     0.3200   0.8500   0.4100   0.6125

Lecture Outline
• What is a nonparametric test?
  – What is a parameter?
  – What are examples of non-parametric tests?
• Rank tests, distribution free tests and nonparametric tests
• Which type of test to use

Parameters
• are central to inference in GLM and ANOVA
• and represent assumptions about the underlying processes

LET K1=4.7     # Group 1 mean minus grand mean
LET K2=-2.5    # Group 2 mean minus grand mean
LET K3=10.4    # The grand mean
LET K4=1.9     # Standard deviation of the error
RANDOM 30 'Error'
LET 'Y'=K3+K1*'DUM1'+K2*'DUM2'+K4*'Error'

Fitted value = μ + α1          (Group 1)
               μ + α2          (Group 2)
               μ − α1 − α2     (Group 3)
Error has Normal distribution with zero mean and standard deviation σ

Parameters
• are central to inference in GLM and ANOVA
• but represent assumptions about the underlying processes
• can be done without in some simple situations
  – BUT HOW?
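The Minitab LET commands above build data from a parametric model. A minimal Python sketch of the same idea follows. It assumes sum-to-zero coding of DUM1 and DUM2 (inferred from the Group 3 effect being minus the sum of the other two) and 10 observations per group; neither detail is stated on the slides, and the names `CODING`, `fitted` and `simulate` are illustrative only.

```python
import random

K1 = 4.7    # Group 1 mean minus grand mean
K2 = -2.5   # Group 2 mean minus grand mean
K3 = 10.4   # The grand mean
K4 = 1.9    # Standard deviation of the error

# Assumed sum-to-zero coding: group -> (DUM1, DUM2).
CODING = {1: (1, 0), 2: (0, 1), 3: (-1, -1)}


def fitted(group):
    """Fitted value for a group: grand mean plus that group's effect."""
    dum1, dum2 = CODING[group]
    return K3 + K1 * dum1 + K2 * dum2


def simulate(groups, rng):
    """Y = fitted value + K4 * standard Normal error, one value per entry."""
    return [fitted(g) + K4 * rng.gauss(0.0, 1.0) for g in groups]


rng = random.Random(1)                       # fixed seed so the run is repeatable
groups = [1] * 10 + [2] * 10 + [3] * 10      # assumed layout of the 30 observations
y = simulate(groups, rng)

for g in (1, 2, 3):
    print("Group", g, "fitted value", round(fitted(g), 1))
```

Every number that drives the simulated data (the three means and the error standard deviation) is a parameter, which is the sense in which parametric inference rests on assumptions about the underlying process.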
Rnk   Wt  Sex     Rnk   Wt  Sex     Rnk   Wt  Sex     Rnk   Wt  Sex
  1  0.29  1       26  0.41  2       51  0.52  1       76  0.65  1
  2  0.32  2       27  0.42  1       52  0.52  2       77  0.66  1
  3  0.34  1       28  0.43  1       53  0.52  2       78  0.67  1
  4  0.34  2       29  0.43  2       54  0.53  2       79  0.67  2
  5  0.34  2       30  0.43  2       55  0.53  2       80  0.67  2
  6  0.36  1       31  0.45  1       56  0.55  2       81  0.67  2
  7  0.36  1       32  0.45  2       57  0.56  1       82  0.68  1
  8  0.37  1       33  0.45  2       58  0.56  1       83  0.71  1
  9  0.37  1       34  0.45  2       59  0.56  1       84  0.72  2
 10  0.37  1       35  0.46  2       60  0.57  1       85  0.73  1
 11  0.37  2       36  0.47  1       61  0.58  2       86  0.75  1
 12  0.37  2       37  0.47  1       62  0.58  2       87  0.75  1
 13  0.38  1       38  0.48  1       63  0.59  1       88  0.77  1
 14  0.38  1       39  0.48  1       64  0.59  2       89  0.78  1
 15  0.38  2       40  0.48  2       65  0.59  2       90  0.78  2
 16  0.38  2       41  0.48  2       66  0.60  1       91  0.78  2
 17  0.39  2       42  0.49  2       67  0.61  1       92  0.82  2
 18  0.40  2       43  0.49  2       68  0.61  2       93  0.83  1
 19  0.40  2       44  0.50  1       69  0.62  1       94  0.85  1
 20  0.40  2       45  0.50  1       70  0.62  1       95  0.85  2
 21  0.41  1       46  0.50  1       71  0.62  2       96  0.88  1
 22  0.41  1       47  0.50  2       72  0.62  2       97  0.98  1
 23  0.41  2       48  0.50  2       73  0.62  2       98  0.98  1
 24  0.41  2       49  0.51  1       74  0.63  1       99  1.05  1
 25  0.41  2       50  0.51  2       75  0.63  2      100  1.13  1

Remember ties

[Figure: horizontal axis "Mean Rank", 0 to 100]
The ‘Male’ mean rank = 55.26
The ‘Female’ mean rank = 45.74

MTB > mann-whitney male female

Mann-Whitney Test and CI: MALE, FEMALE

MALE      N =  50     Median =     0.5600
FEMALE    N =  50     Median =     0.4950
Point estimate for ETA1-ETA2 is    0.0500
95.0 Percent CI for ETA1-ETA2 is (-0.0100,0.1200)
W = 2763.0
Test of ETA1 = ETA2  vs  ETA1 not = ETA2 is significant at 0.1016
The test is significant at 0.1014 (adjusted for ties)

Cannot reject at alpha = 0.05

Sum of ranks of 2763 corresponds to a mean rank of 2763/50 = 55.26

The null hypothesis is better expressed as “the distributions of male and female weights are the same”.
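The rank-sum arithmetic behind the Minitab output can be sketched in Python as follows. The weights are transcribed from the ranked table, taking Sex code 1 as male and 2 as female (an inference, though it is the coding that reproduces W). The Normal approximation with a continuity correction and the usual tie correction are assumptions about how the p-values are obtained, and `midranks` is an illustrative helper rather than anything from Minitab.

```python
import math
from collections import Counter

# Weights transcribed from the ranked table (Sex 1 taken as male, 2 as female).
male = [0.29, 0.34, 0.36, 0.36, 0.37, 0.37, 0.37, 0.38, 0.38, 0.41,
        0.41, 0.42, 0.43, 0.45, 0.47, 0.47, 0.48, 0.48, 0.50, 0.50,
        0.50, 0.51, 0.52, 0.56, 0.56, 0.56, 0.57, 0.59, 0.60, 0.61,
        0.62, 0.62, 0.63, 0.65, 0.66, 0.67, 0.68, 0.71, 0.73, 0.75,
        0.75, 0.77, 0.78, 0.83, 0.85, 0.88, 0.98, 0.98, 1.05, 1.13]
female = [0.32, 0.34, 0.34, 0.37, 0.37, 0.38, 0.38, 0.39, 0.40, 0.40,
          0.40, 0.41, 0.41, 0.41, 0.41, 0.43, 0.43, 0.45, 0.45, 0.45,
          0.46, 0.48, 0.48, 0.49, 0.49, 0.50, 0.50, 0.51, 0.52, 0.52,
          0.53, 0.53, 0.55, 0.58, 0.58, 0.59, 0.59, 0.61, 0.62, 0.62,
          0.62, 0.63, 0.67, 0.67, 0.67, 0.72, 0.78, 0.78, 0.82, 0.85]


def midranks(values):
    """Rank the values from 1 upwards; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1            # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


n1, n2 = len(male), len(female)
n = n1 + n2
ranks = midranks(male + female)

w = sum(ranks[:n1])                              # rank sum of the male sample
mean_rank_male = w / n1
mean_rank_female = (n * (n + 1) / 2 - w) / n2    # all ranks sum to n(n+1)/2

# Normal approximation to W under the null, with continuity correction.
mu = n1 * (n + 1) / 2
var = n1 * n2 * (n + 1) / 12
z = (abs(w - mu) - 0.5) / math.sqrt(var)
p = math.erfc(z / math.sqrt(2))                  # two-sided p-value

# Same test with the variance reduced to allow for tied values.
tie_term = sum(t ** 3 - t for t in Counter(male + female).values())
var_ties = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
z_ties = (abs(w - mu) - 0.5) / math.sqrt(var_ties)
p_ties = math.erfc(z_ties / math.sqrt(2))

print("W =", w)
print("mean ranks:", mean_rank_male, mean_rank_female)
print("p =", round(p, 4), " p (adjusted for ties) =", round(p_ties, 4))
```

Under these assumptions the sketch should agree with the Minitab output: the same W, the same two mean ranks, and p-values close to the reported 0.1016 and 0.1014. Note that only the ranks enter the calculation, never the weights themselves.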
Parameters
• are central to inference in GLM and ANOVA
• but represent assumptions about the underlying processes
• can be done without in some simple situations

Nonparametric vs Parametric
• Sign Test               • One-sample t-test
• Mann-Whitney Test       • Two-sample t-test
• Spearman Rank Test      • Correlation/Regression
• Kruskal-Wallis Test     • One-way ANOVA
• Friedman Test           • One-way blocked ANOVA

A rose by any other name...
• Non-parametric tests lack parameters
• Rank tests start by ranking the data
• Distribution-free tests don’t assume a Normal distribution (or any other)
These are mainly but not completely overlapping sets of tests (and some are scale-invariant too).

Fewer assumptions but...
• still some assumptions (including independence)
• limited range of situations
  – no more than 2 x-variables
  – can’t mix continuous and categorical x-variables
• provide p-values but estimation is dodgy
• loss of efficiency if parametric assumptions are upheld
• there is a grand scheme for parametric statistics (GLM) but a lot of separate strange names for nonparametrics

When is there a choice?
• when there is a non-parametric test
  – fewer than two or three variables altogether
• and prediction is not required

How to choose:
• If the assumptions of the parametric test are upheld, use it
  – on grounds of efficiency
• If they are not upheld, consider fixing the assumptions (e.g. by transforming the data, as in the practical)
• If the assumptions are not fixable, use a nonparametric test

MTB > dotplot 'LogM' 'LogF';
SUBC> same.

[Dotplots of LogM and LogF on a common scale from -1.25 to 0.00]

MTB > desc 'LogM' 'LogF'

Variable     N      Mean    Median    TrMean    StDev   SEMean
LogM        50   -0.5786   -0.5798   -0.5850   0.3248   0.0459
LogF        50   -0.6878   -0.7032   -0.6928   0.2453   0.0347

Variable       Min       Max        Q1        Q3
LogM       -1.2379    0.1222   -0.8499   -0.3355
LogF       -1.1394   -0.1625   -0.8916   -0.4902

Last remarks
• Nonparametric tests are an opportunity to revise the basic ideas of statistical inference
• They are sometimes useful in biology
• They are often used in biology
• NEXT WEEK: more nonparametrics, including confidence intervals and randomisation tests. READ the handout
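Returning to the "How to choose" advice on transforming the data: the LogM and LogF summaries are consistent with a natural-log transform of the raw weights. That is an inference from the printed numbers, not something the slides state. The short Python check below uses only values copied from the two `desc` outputs, plus a handful of weights to illustrate that a monotone transform leaves the rank order untouched.

```python
import math

# Values copied from the 'desc' output for the raw weights.
raw = {
    "MALE": {"Min": 0.29, "Median": 0.56, "Max": 1.13},
    "FEMALE": {"Min": 0.32, "Median": 0.495, "Max": 0.85},
}
# Matching values copied from the 'desc' output for LogM and LogF.
logged = {
    "MALE": {"Min": -1.2379, "Median": -0.5798, "Max": 0.1222},
    "FEMALE": {"Min": -1.1394, "Median": -0.7032, "Max": -0.1625},
}

# Largest gap between ln(raw value) and the printed logged value.
# Assumption being checked: the transform was the natural logarithm.
max_gap = max(
    abs(math.log(raw[sex][stat]) - logged[sex][stat])
    for sex in raw
    for stat in raw[sex]
)
print("largest discrepancy:", max_gap)


def rank_order(values):
    """Indices of the values, sorted from smallest to largest."""
    return sorted(range(len(values)), key=lambda i: values[i])


# A few weights from the data set: taking logs does not change their ordering.
sample = [0.29, 1.13, 0.56, 0.85, 0.32]
same_order = rank_order(sample) == rank_order([math.log(v) for v in sample])
print("rank order preserved:", same_order)
```

Because the ordering is preserved, a rank test such as Mann-Whitney gives the same answer on the raw and the logged weights, whereas a t-test generally does not. This is one sense in which some of these tests are scale-invariant.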