Nonparametric tests I
Back to basics
Lecture Outline
• What is a nonparametric test?
• Rank tests, distribution-free tests and nonparametric tests
• Which type of test to use
MTB > dotplot 'Male' 'Female';
SUBC> same.

[Character dotplots of MALE and FEMALE on a common axis, with ticks from 0.32 to 1.12]
MTB > desc 'Male' 'Female'

Variable   N    Mean     Median   TrMean   StDev    SEMean
MALE       50   0.5908   0.5600   0.5770   0.1979   0.0280
FEMALE     50   0.5180   0.4950   0.5102   0.1315   0.0186

Variable   Min      Max      Q1       Q3
MALE       0.2900   1.1300   0.4275   0.7150
FEMALE     0.3200   0.8500   0.4100   0.6125
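As a quick check on the table, the SEMean column is the standard deviation divided by the square root of N. The short Python sketch below is an illustration only (the lecture itself uses Minitab); the numbers are copied from the output above and the function name is made up for the example.

```python
import math

# (N, StDev) for each group, copied from the Minitab output above.
summary = {"MALE": (50, 0.1979), "FEMALE": (50, 0.1315)}

def se_mean(n, stdev):
    """Standard error of the mean: StDev / sqrt(N)."""
    return stdev / math.sqrt(n)

# Round to the four decimals that Minitab prints.
se = {name: round(se_mean(n, sd), 4) for name, (n, sd) in summary.items()}
print(se)
```

The rounded results agree with the SEMean values printed by Minitab.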
Lecture Outline
• What is a nonparametric test?
– What is a parameter?
– What are examples of nonparametric tests?
• Rank tests, distribution-free tests and nonparametric tests
• Which type of test to use
Parameters
• are central to inference in GLM and ANOVA
• and represent assumptions about the underlying processes
LET K1=4.7     # Group 1 mean minus grand mean
LET K2=-2.5    # Group 2 mean minus grand mean
LET K3=10.4    # The grand mean
LET K4=1.9     # Standard deviation of the error
RANDOM 30 'Error'
LET 'Y'=K3+K1*'DUM1'+K2*'DUM2'+K4*'Error'
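Group   Fitted value
1       $\mu + \alpha_1$
2       $\mu + \alpha_2$
3       $\mu - \alpha_1 - \alpha_2$
Here $\mu$ is the grand mean (K3), and $\alpha_1$ and $\alpha_2$ are the group 1 and group 2 means minus the grand mean (K1 and K2).
Error has a Normal distribution with zero mean and standard deviation $\sigma$ (K4).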
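The same simulation can be sketched in Python. This is an illustration under stated assumptions, not the lecture's code: it assumes three groups of ten observations each and that DUM1 and DUM2 are sum-to-zero dummy variables (1, 0, -1 and 0, 1, -1 for groups 1, 2 and 3), which is what the comments on K1 and K2 imply. The group sizes, the seed and the function names are made up for the example.

```python
import random

K1 = 4.7    # group 1 mean minus grand mean
K2 = -2.5   # group 2 mean minus grand mean
K3 = 10.4   # the grand mean
K4 = 1.9    # standard deviation of the error

# Assumed sum-to-zero coding of (DUM1, DUM2) for each group; not shown in the lecture.
DUMMIES = {1: (1, 0), 2: (0, 1), 3: (-1, -1)}

def fitted(group):
    """Fitted value for a group: grand mean plus that group's deviation."""
    dum1, dum2 = DUMMIES[group]
    return K3 + K1 * dum1 + K2 * dum2

def simulate(n_per_group=10, seed=1):
    """Return (group, y) pairs, where y = fitted value + K4 * standard Normal error."""
    rng = random.Random(seed)
    data = []
    for group in (1, 2, 3):
        for _ in range(n_per_group):
            error = rng.gauss(0.0, 1.0)   # plays the role of the 'Error' column
            data.append((group, fitted(group) + K4 * error))
    return data

data = simulate()
for group in (1, 2, 3):
    print(group, round(fitted(group), 1))
```

Every number in the simulated data comes from the four parameters plus Normal noise, which is the sense in which the parameters encode assumptions about the underlying process.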
Parameters
• are central to inference in GLM and ANOVA
• but represent assumptions about the underlying processes
• can be done without in some simple situations. BUT HOW?
Rnk  Wt    Sex    Rnk  Wt    Sex    Rnk  Wt    Sex    Rnk  Wt    Sex
1    0.29  1      26   0.41  2      51   0.52  1      76   0.65  1
2    0.32  2      27   0.42  1      52   0.52  2      77   0.66  1
3    0.34  1      28   0.43  1      53   0.52  2      78   0.67  1
4    0.34  2      29   0.43  2      54   0.53  2      79   0.67  2
5    0.34  2      30   0.43  2      55   0.53  2      80   0.67  2
6    0.36  1      31   0.45  1      56   0.55  2      81   0.67  2
7    0.36  1      32   0.45  2      57   0.56  1      82   0.68  1
8    0.37  1      33   0.45  2      58   0.56  1      83   0.71  1
9    0.37  1      34   0.45  2      59   0.56  1      84   0.72  2
10   0.37  1      35   0.46  2      60   0.57  1      85   0.73  1
11   0.37  2      36   0.47  1      61   0.58  2      86   0.75  1
12   0.37  2      37   0.47  1      62   0.58  2      87   0.75  1
13   0.38  1      38   0.48  1      63   0.59  1      88   0.77  1
14   0.38  1      39   0.48  1      64   0.59  2      89   0.78  1
15   0.38  2      40   0.48  2      65   0.59  2      90   0.78  2
16   0.38  2      41   0.48  2      66   0.60  1      91   0.78  2
17   0.39  2      42   0.49  2      67   0.61  1      92   0.82  2
18   0.40  2      43   0.49  2      68   0.61  2      93   0.83  1
19   0.40  2      44   0.50  1      69   0.62  1      94   0.85  1
20   0.40  2      45   0.50  1      70   0.62  1      95   0.85  2
21   0.41  1      46   0.50  1      71   0.62  2      96   0.88  1
22   0.41  1      47   0.50  2      72   0.62  2      97   0.98  1
23   0.41  2      48   0.50  2      73   0.62  2      98   0.98  1
24   0.41  2      49   0.51  1      74   0.63  1      99   1.05  1
25   0.41  2      50   0.51  2      75   0.63  2      100  1.13  1
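Remember ties: tied weights share the average of the positions they occupy (e.g. the three weights of 0.34 at positions 3 to 5 each get rank 4).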
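Ranking with ties can be sketched as follows. This is a minimal Python illustration (not the lecture's Minitab code) of the usual convention, in which tied values share the average of the positions they occupy. The function name is made up for the example, and the input is the first twelve weights from the ranked table.

```python
def midranks(values):
    """Rank values from 1 upwards, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1   # positions are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks

# First twelve weights in the ranked table (positions 1 to 12).
weights = [0.29, 0.32, 0.34, 0.34, 0.34, 0.36, 0.36, 0.37, 0.37, 0.37, 0.37, 0.37]
print(midranks(weights))
```

The three weights of 0.34 each receive rank 4, the two weights of 0.36 each receive 6.5, and the five weights of 0.37 each receive 10.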
[Figure: plot with horizontal axis "Mean Rank" (0 to 100) and vertical axis running from 0 to 140]
The ‘Male’ mean rank = 55.26
The ‘Female’ mean rank = 45.74
MTB > mann-whitney male female

Mann-Whitney Test and CI: MALE, FEMALE

MALE     N = 50   Median = 0.5600
FEMALE   N = 50   Median = 0.4950
Point estimate for ETA1-ETA2 is 0.0500
95.0 Percent CI for ETA1-ETA2 is (-0.0100,0.1200)
W = 2763.0

A sum of ranks of 2763 corresponds to a mean rank of 2763/50 = 55.26.
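The rank sum and the mean ranks can be recomputed from the ranked table. The Python sketch below is illustrative only: the two lists are transcribed from the table, taking Sex 1 as male and Sex 2 as female (an assumption, though it agrees with the Min and Max in the descriptive table), and tied weights share the average of their positions. The function name is made up for the example.

```python
from collections import defaultdict

# Weights transcribed from the ranked table (assumed coding: Sex 1 = male, 2 = female).
MALE = [
    0.29, 0.34, 0.36, 0.36, 0.37, 0.37, 0.37, 0.38, 0.38, 0.41,
    0.41, 0.42, 0.43, 0.45, 0.47, 0.47, 0.48, 0.48, 0.50, 0.50,
    0.50, 0.51, 0.52, 0.56, 0.56, 0.56, 0.57, 0.59, 0.60, 0.61,
    0.62, 0.62, 0.63, 0.65, 0.66, 0.67, 0.68, 0.71, 0.73, 0.75,
    0.75, 0.77, 0.78, 0.83, 0.85, 0.88, 0.98, 0.98, 1.05, 1.13,
]
FEMALE = [
    0.32, 0.34, 0.34, 0.37, 0.37, 0.38, 0.38, 0.39, 0.40, 0.40,
    0.40, 0.41, 0.41, 0.41, 0.41, 0.43, 0.43, 0.45, 0.45, 0.45,
    0.46, 0.48, 0.48, 0.49, 0.49, 0.50, 0.50, 0.51, 0.52, 0.52,
    0.53, 0.53, 0.55, 0.58, 0.58, 0.59, 0.59, 0.61, 0.62, 0.62,
    0.62, 0.63, 0.67, 0.67, 0.67, 0.72, 0.78, 0.78, 0.82, 0.85,
]

def rank_sum(sample, other):
    """Sum of the ranks of `sample` within the pooled data, averaging ranks over ties."""
    pooled = sorted(sample + other)
    positions = defaultdict(list)
    for position, value in enumerate(pooled, start=1):
        positions[value].append(position)
    midrank = {value: sum(p) / len(p) for value, p in positions.items()}
    return sum(midrank[value] for value in sample)

w_male = rank_sum(MALE, FEMALE)
w_female = rank_sum(FEMALE, MALE)
print(w_male, round(w_male / len(MALE), 2))        # sum and mean of the male ranks
print(w_female, round(w_female / len(FEMALE), 2))  # sum and mean of the female ranks
```

With these data the male rank sum matches the W printed by Minitab, and the two mean ranks match the 55.26 and 45.74 quoted on the slides.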
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.1016
The test is significant at 0.1014 (adjusted for ties)
Cannot reject at alpha = 0.05
The null hypothesis is better expressed as “the distributions of male and female weights are the same”.
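The two significance levels can be approximated from W alone. The Python sketch below assumes a Normal approximation to the rank sum with a continuity correction of 0.5, plus the standard variance adjustment for ties. The lecture does not say that this is how Minitab computes them, so treat it as an assumption that happens to agree with the printed values to about four decimal places. The function name is made up, and the tie-group sizes are counted from the ranked table.

```python
import math

def two_sided_p(w, n1, n2, tie_sizes=()):
    """Two-sided Normal-approximation p-value for a rank sum W (continuity corrected).

    tie_sizes lists the size of each group of tied values; leave it empty
    for the p-value that is not adjusted for ties.
    """
    n = n1 + n2
    mean_w = n1 * (n + 1) / 2
    var_w = n1 * n2 * (n + 1) / 12
    # Tie adjustment: shrink the variance by 1 - sum(t^3 - t) / (n^3 - n).
    var_w *= 1 - sum(t**3 - t for t in tie_sizes) / (n**3 - n)
    z = (abs(w - mean_w) - 0.5) / math.sqrt(var_w)
    return math.erfc(z / math.sqrt(2))   # equals 2 * (1 - Phi(z))

# Sizes of the tied groups, counted from the ranked table
# (for example, six weights of 0.41 and five weights of 0.37).
TIES = [3, 2, 5, 4, 3, 6, 3, 4, 2, 4, 2, 5, 2, 3, 2, 3, 2, 3, 2, 5, 2, 4, 2, 3, 2, 2]

p_plain = two_sided_p(2763.0, 50, 50)
p_ties = two_sided_p(2763.0, 50, 50, TIES)
print(round(p_plain, 4), round(p_ties, 4))
```

The tie adjustment only shrinks the variance slightly, which is why the two p-values differ so little here.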
Nonparametric vs Parametric

Nonparametric             Parametric
• Sign Test               • One-sample t-test
• Mann-Whitney Test       • Two-sample t-test
• Spearman Rank Test      • Correlation/Regression
• Kruskal-Wallis Test     • One-way ANOVA
• Friedman Test           • One-way blocked ANOVA
A rose by any other name...
• Nonparametric tests lack parameters
• Rank tests start by ranking the data
• Distribution-free tests don’t assume a Normal distribution (or any other)
These are mainly but not completely overlapping sets of tests (and some are scale-invariant too).
Fewer assumptions but...
• still some assumptions (including independence)
• limited range of situations
– no more than 2 x-variables
– can’t mix continuous and categorical x-variables
• provide p-values but estimation is dodgy
• loss of efficiency if parametric assumptions are upheld
• there is a grand scheme for parametric statistics (GLM) but a lot of separate strange names for nonparametrics
When is there a choice?
• when there is a nonparametric test
– fewer than two or three variables altogether
• and prediction is not required
How to choose:
• If the assumptions of the parametric test are upheld, use it, on grounds of efficiency
• If they are not upheld, consider fixing the assumptions (e.g. by transforming the data, as in the practical)
• If the assumptions are not fixable, use a nonparametric test
MTB > dotplot 'LogM' 'LogF';
SUBC> same.

[Character dotplots of LogM and LogF on a common axis, with ticks from -1.25 to 0.00]
MTB > desc 'LogM' 'LogF'

Variable   N    Mean      Median    TrMean    StDev    SEMean
LogM       50   -0.5786   -0.5798   -0.5850   0.3248   0.0459
LogF       50   -0.6878   -0.7032   -0.6928   0.2453   0.0347

Variable   Min       Max       Q1        Q3
LogM       -1.2379   0.1222    -0.8499   -0.3355
LogF       -1.1394   -0.1625   -0.8916   -0.4902
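The transformed columns appear to be natural logarithms of the weights: the natural logs of the Min and Max in the first descriptive table match the Min and Max printed here. The Python sketch below (illustrative, not the lecture's code; the names are made up) checks this. It also shows that a monotone transformation such as the log leaves the sort order, and hence the ranks, unchanged, so a rank test gives the same answer on either scale.

```python
import math

# Min and Max of the untransformed weights, copied from the first descriptive table.
raw_min_max = {"MALE": (0.29, 1.13), "FEMALE": (0.32, 0.85)}

# Natural logs, rounded to the four decimals that Minitab prints.
log_min_max = {
    name: tuple(round(math.log(v), 4) for v in pair)
    for name, pair in raw_min_max.items()
}
print(log_min_max)

def sort_order(values):
    """Indices of the values from smallest to largest."""
    return sorted(range(len(values)), key=lambda i: values[i])

# A handful of weights taken from the data, in arbitrary order.
sample = [0.56, 0.29, 1.13, 0.41, 0.85, 0.32]
same_order = sort_order(sample) == sort_order([math.log(w) for w in sample])
print(same_order)
```

Because the ordering is preserved, transforming the data can help a parametric test meet its assumptions while leaving a rank test such as Mann-Whitney unaffected.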
Last remarks
• Nonparametric tests are an opportunity to revise the basic ideas of statistical inference
• They are sometimes useful in biology
• They are often used in biology
• NEXT WEEK: more nonparametrics, including confidence intervals and randomisation tests. READ the handout