Lecture 9: Nonparametric Methods
Xiaojin YU
Department of Epidemiology and Biostatistics,
School of Public Health, Southeast University
Review: Types of data
• Qualitative data (categorical data)
  (1) binary (dichotomous, binomial)
  (2) multinomial (polytomous)
  (3) ordinal
• Quantitative data
Measures of central tendency: quantitative data
• Mean: for normally distributed data.
• Geometric mean: for positively skewed data that can be transformed to a normal distribution on the log scale.
• Median: can be used for all data; in general it is used for non-normal data.
Measures of dispersion: quantitative data
• Range
• Interquartile range
• Variance and standard deviation
• Coefficient of variation
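Compare means by t-test

Single-sample t-test
• Data: $\bar{x}$, $s$, $n$, $\mu_0$
• H0: $\mu = \mu_0$
• Statistic: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$, with $\nu = n - 1$

Paired t-test
• Data: $\bar{d}$, $s_d$, $n$ (number of pairs)
• H0: $\mu_d = 0$
• Statistic: $t = \dfrac{\bar{d} - 0}{s_d/\sqrt{n}}$, with $\nu = n - 1$

Two-group t-test
• Data: $\bar{x}_1$, $s_1$, $n_1$; $\bar{x}_2$, $s_2$, $n_2$
• H0: $\mu_1 = \mu_2$
• Statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, with $\nu = n_1 + n_2 - 2$

Assumptions:
• Normality
• Equality of variance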
Comparison of means between two groups

Variable      Summary      Group A         Group B         Method     Statistic   P
N (missing)                72 (0)          70 (0)
Age           mean ± std   40.11 ± 14.87   37.26 ± 14.45   t-test     1.160       0.2482
              median       41.00           36.00
              (min-max)    (17-65)         (17-67)
Duration      mean ± std   23.74 ± 70.35   22.41 ± 70.75   rank sum   0.112       0.9113
              median       5.00            4.00
              (min-max)    (1-365)         (1-365)
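As a rough check of the two-group formula, the sketch below recomputes the age comparison from the summary statistics in the table above. The function name `pooled_t` is an illustrative choice, not something from the lecture, and the inputs are the rounded table values, so the result only approximately matches the reported 1.160.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-group t statistic with pooled variance, computed from summary statistics."""
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / ((n1 - 1) + (n2 - 1))
    # Standard error of the difference between the two means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, n1 + n2 - 2

# Age in Group A versus Group B (rounded values from the table).
t_age, df_age = pooled_t(40.11, 14.87, 72, 37.26, 14.45, 70)
print(round(t_age, 2), df_age)
```

The statistic comes out at about 1.16 with 140 degrees of freedom, in line with the t-test row of the table.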
Compare proportions by chi-square test

Effect of drug
Drug     Effective   Not effective   Total   Sample rate (%)
Drug A   41          4               45      91.1
Drug B   24          11              35      68.6
Total    65          15              80

• Are the 2 population proportions equal or not?
• How is the categorical variable distributed in the 2 populations?
Solution
• H0: πA = πB
• H1: πA ≠ πB, α = 0.05
• Calculate the theoretical frequencies T and the test statistic, chi-square

Observed frequencies A with theoretical frequencies T in parentheses:
Drug      Positive             Negative             R total
A         41 (T11 = 36.56)     4 (T12 = 8.44)       n1
B         24 (T21 = 28.44)     11 (T22 = 6.56)      n2
C total   m1                   m2                   n
Test statistic
$$\chi^2 = \sum \frac{(A - T)^2}{T}$$
$$\chi^2 = \frac{(41 - 36.56)^2}{36.56} + \frac{(4 - 8.44)^2}{8.44} + \frac{(24 - 28.44)^2}{28.44} + \frac{(11 - 6.56)^2}{6.56} = 6.573$$
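The calculation above can be sketched in a few lines of Python. The function name `pearson_chi_square` is hypothetical; the theoretical frequencies are computed as row total × column total / n, which is how the values 36.56, 8.44, 28.44 and 6.56 arise.

```python
def pearson_chi_square(table):
    """Pearson chi-square for a table of observed counts given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    expected = []
    for i, row in enumerate(table):
        exp_row = []
        for j, observed in enumerate(row):
            # Theoretical frequency T = row total * column total / n.
            t = row_totals[i] * col_totals[j] / n
            exp_row.append(t)
            chi2 += (observed - t) ** 2 / t
        expected.append(exp_row)
    return chi2, expected

# Rows: Drug A, Drug B; columns: effective, not effective.
chi2, expected = pearson_chi_square([[41, 4], [24, 11]])
print(round(chi2, 3), [[round(t, 2) for t in row] for row in expected])
```

With unrounded theoretical frequencies the statistic is about 6.565; the slide's 6.573 uses T rounded to two decimals. Either way it exceeds 3.84.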
Conclusion
• Since 6.573 > 3.84, P < 0.05. We reject H0 and accept H1 at the 0.05 level, and conclude that the two populations are not homogeneous with respect to the effect of the drug. The effects of drug A and drug B are not equivalent.
Compare ordinal data

Result (score)                Treatment (1), n (%)   Control (2), n (%)
Healed (4)                    65 (51.59)             42 (51.22)
Significantly effective (3)   18 (14.29)             6 (7.32)
Effective (2)                 30 (23.81)             23 (28.05)
Ineffective (1)               13 (10.32)             11 (13.41)
Total                         126                    82
OUTLINE
• Basic logic of rank-based methods
• Rank sum test for 2 independent groups (completely random design)
• Signed rank test for paired design
• Rank sum test for 3 or more independent groups (completely random design)
• Multiple comparison
Rank & Rank Sum
• Review of median
• Example of duration in hospital:

month   3.1   5.5   6.0   10.2   11.9
rank    1     2     3     4      5
Task for you
• How can we tell whether boys or girls are taller if no measuring is allowed?
Solution to the task
• [Figure: blue = male, red = female]
• Small values are located at the front (small ranks), and large values at the back (large ranks).
Part I: Wilcoxon Rank Sum Test
• Rank sum test for comparing the locations of two populations
• Mann-Whitney test
• Review: the t-test for comparing 2 population means requires normality and homogeneity of variance
Rank: 1, 2, 3.5, 3.5, 5, 6
EXAMPLE 1: Table 9.1 Survival times (minutes) of cats and rabbits without oxygen

Cats                    Rabbits
minutes   rank          minutes   rank
25        9.5           14        1
34        13            15        2
44        15            16        3
46        16            17        4
46        17            19        5
48        18            21        6.5
49        19            21        6.5
50        20            23        8
                        25        9.5
                        28        11
                        30        12
                        35        14
n1 = 8    T1 = 127.5    n2 = 12   T2 = 82.5
STEP I: Test hypothesis and significance level
• H0: M1 = M2, the population locations of survival time of cats and rabbits are equal.
• H1: M1 ≠ M2, the population locations of survival time of cats and rabbits are not equal.
• α = 0.05
STEP II: Statistic
Assign ranks
• Pool the n1 + n2 observations to form a single sample.
• Rank all observations of the pooled sample from smallest to largest (columns 2 and 4 of Table 9.1).
• Tied values receive mid-ranks. Ties within the same group, such as the two cats at 46 minutes, may keep consecutive ranks (16 and 17), because the rank sum is the same as with the mid-rank 16.5 for each.

Pooled sample
time   rank      time   rank
14     1         28     11
15     2         30     12
16     3         34     13
17     4         35     14
19     5         44     15
21     6.5       46     16
21     6.5       46     17
23     8         48     18
25     9.5       49     19
25     9.5       50     20
n1 = 8, T1 = 127.5; n2 = 12, T2 = 82.5
STEP II: Statistic
Test statistic T
• Calculate the rank sums of the two samples, denoted by T1 and T2.
• Take the rank sum of the sample with the smaller n as T.
• n1 = 8 < n2 = 12, so T = T1 = 127.5.
• Check: T1 + T2 = N(N+1)/2 = 210.
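The ranking step for Example 1 can be sketched as follows. The helper `midranks` is a hypothetical name; it gives every group of tied values the average of the ranks they occupy, which leaves the rank sums unchanged when the tie sits inside one group.

```python
def midranks(values):
    """Return the 1-based rank of each value, giving tied values their mid-rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend `end` over the run of observations tied with the one at `pos`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mid = (pos + 1 + end + 1) / 2  # average of the 1-based ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mid
        pos = end + 1
    return ranks

cats = [25, 34, 44, 46, 46, 48, 49, 50]
rabbits = [14, 15, 16, 17, 19, 21, 21, 23, 25, 28, 30, 35]

ranks = midranks(cats + rabbits)          # rank the pooled sample
t1 = sum(ranks[:len(cats)])               # rank sum of the cats
t2 = sum(ranks[len(cats):])               # rank sum of the rabbits
n_total = len(cats) + len(rabbits)
print(t1, t2, n_total * (n_total + 1) / 2)  # 127.5 82.5 210.0
```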
STEP III: Determine the P value and draw a conclusion
• From the table in Appendix E, with n1 = 8 and n2 − n1 = 4, the critical interval Tα is (58-110).
• Since T = 127.5 lies outside Tα, P ≤ α. Given α = 0.05, P < 0.05.
• H0 is rejected. We conclude that the survival times of cats and rabbits in an environment without oxygen differ.
• Cats survive longer without oxygen.
BASIC LOGIC
• N = n1 + n2
• Given N, the total rank sum is fixed and can be calculated.
• If H0 is true, the total rank sum should be divided between the 2 groups in proportion to ni:
$$\frac{N(N+1)}{2} = \frac{n_1(N+1)}{2} + \frac{n_2(N+1)}{2}$$
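For Example 1 this split can be worked out directly. The short sketch below (the function name `expected_rank_sums` is an assumption made for illustration) computes the rank sums expected under H0 and compares them with the observed T1 = 127.5 and T2 = 82.5.

```python
def expected_rank_sums(n1, n2):
    """Rank sums expected under H0 for two groups of sizes n1 and n2."""
    n_total = n1 + n2
    return n1 * (n_total + 1) / 2, n2 * (n_total + 1) / 2

e1, e2 = expected_rank_sums(8, 12)   # cats, rabbits
print(e1, e2, e1 + e2)               # 84.0 126.0 210.0
print(127.5 - e1, 82.5 - e2)         # observed minus expected: 43.5 -43.5
```

The cats' observed rank sum lies 43.5 above its expected value, which is why T falls outside the critical interval.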
Normal approximation
• Used when n1 > 10 or n2 − n1 > 10.
$$Z = \frac{\left| T - n_1(N+1)/2 \right| - 0.5}{\sqrt{n_1 n_2 (N+1)/12}}, \qquad N = n_1 + n_2$$
Correction for ties
$$Z_c = Z / \sqrt{C}, \qquad C = 1 - \frac{\sum (t_j^3 - t_j)}{N^3 - N}$$
EXAMPLE 2: Table 9-2 Results from a clinical trial for hypertension

Effect            Drug A   Drug B   Total   Range of rank   Average rank   Rank sum, Drug A   Rank sum, Drug B
0 (ineffective)   17       70       87      1-87            44             748                3080
1 (effective)     25       13       38      88-125          106.5          2662.5             1384.5
2 (healed)        27       37       64      126-189         157.5          4252.5             5827.5
Total             69       120      189     ~               ~              7663               10292

TA = 7663, n = 69; TB = 10292, n = 120
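The slides do not show the Z value for Example 2, so the following is only a sketch of how the normal approximation with tie correction could be applied to it. The function name `rank_sum_z` is hypothetical; T is taken as TA because drug A is the smaller group, and the tie sizes are the three row totals 87, 38 and 64.

```python
import math

def rank_sum_z(t, n1, n2, tie_sizes=()):
    """Normal approximation for the rank sum test; returns (Z, tie-corrected Zc)."""
    n_total = n1 + n2
    mean_t = n1 * (n_total + 1) / 2                 # expected rank sum under H0
    sd_t = math.sqrt(n1 * n2 * (n_total + 1) / 12)  # standard deviation without ties
    z = (abs(t - mean_t) - 0.5) / sd_t              # 0.5 is the continuity correction
    c = 1 - sum(tj**3 - tj for tj in tie_sizes) / (n_total**3 - n_total)
    return z, z / math.sqrt(c)

z, zc = rank_sum_z(7663, 69, 120, tie_sizes=(87, 38, 64))
print(round(z, 2), round(zc, 2))
```

Under these assumptions Z is about 3.06 and Zc about 3.31, both larger than 1.96. With so many tied observations the correction changes the statistic noticeably.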
Part II: Wilcoxon’s Signed Rank Test
• Wilcoxon (1945); H0: Md = 0
• Example: to illustrate the test procedure, data on 28 patients (14 pairs) from a sequential-analysis, double-blind clinical trial for cancer of the head and neck will be used (Bakowski MT, et al. Int. J. Radiation Oncology Biology Physics 1978, 4:115-119).
When Wilcoxon’s signed rank test is used
(1) Quantitative data, paired design: the paired t-test requires the differences of the pairs to be normally distributed; if their distribution is skewed, the signed rank test should be used.
(2) Qualitative data, paired design: ordinal outcomes.
Example 9.3
• 2 treatment groups: radiotherapy + drug (B); radiotherapy + placebo (A)
• The tumor response within three months of completion of treatment was assessed for each patient in terms of complete regression (CR), partial regression (PR), no change (NC) and progression of the disease (P).
• Scored from 1 to 5 as follows:
  5 = CR with no recurrence subsequently up to 6 months or more,
  4 = CR initially but with a subsequent recurrence within 6 months,
  3 = PR,
  2 = NC,
  1 = P.
STEP I: Test hypothesis
• H0: Md = 0, the population median of the differences is equal to zero.
• H1: Md ≠ 0, the population median of the differences is not equal to zero.
• α = 0.05
STEP II: Statistic
Assigning ranks
• 1) Calculate the differences di = xi − yi and ignore all pairs with zero differences.
• 2) Rank the absolute values of the non-zero di from smallest to largest, so that each di gets a rank. Tied values share the mid-rank.
  Ties: six patients all have differences of 1, so the ranks 1, 2, 3, 4, 5 and 6 must be divided among them. That is, they all have a rank of (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5.
• 3) Assign the original signs of the di to their ranks.
Test statistic T
• Valid number of pairs: n = 10.
• Find the sum of the ranks with positive signs, denoted by T+.
• Find the sum of the ranks with negative signs, denoted by T−.
• Check: T+ + T− = n(n+1)/2 = 55.
• Let T = min(T+, T−); either of the two may be used.
• T− = 48, T+ = 7.
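The paired scores of Example 9.3 are not reproduced in this transcript, so the sketch below uses made-up scores (clearly not the trial data) to show the three ranking steps and the check T+ + T− = n(n+1)/2. The function name `signed_rank_sums` is hypothetical.

```python
def signed_rank_sums(x, y):
    """Return (T_plus, T_minus, n) for paired samples x and y."""
    # Step 1: differences, ignoring pairs with a zero difference.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    abs_sorted = sorted(abs(d) for d in diffs)
    # Step 2: mid-rank for each distinct absolute difference.
    rank_of = {}
    for v in set(abs_sorted):
        first = abs_sorted.index(v) + 1        # 1-based rank of the first occurrence
        count = abs_sorted.count(v)
        rank_of[v] = first + (count - 1) / 2   # average of first .. first + count - 1
    # Step 3: give each rank the sign of its difference and sum by sign.
    t_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
    t_minus = sum(rank_of[abs(d)] for d in diffs if d < 0)
    return t_plus, t_minus, len(diffs)

# Hypothetical response scores for 8 pairs (NOT the data of Example 9.3).
treatment_a = [2, 3, 1, 4, 2, 5, 3, 2]
treatment_b = [3, 3, 2, 2, 4, 4, 5, 3]

t_plus, t_minus, n = signed_rank_sums(treatment_a, treatment_b)
print(t_plus, t_minus, n, n * (n + 1) / 2)  # 8.5 19.5 7 28.0
```

One pair has a zero difference and is dropped, leaving n = 7; the four differences of absolute value 1 share the mid-rank 2.5 and the three of absolute value 2 share the rank 6.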
STEP III: Determine the P value and draw a conclusion
• When n < 25, find the critical value range Tα in Table 10.3 (p. 184).
• In this example n = 10 and T = 48 or 7. Given α = 0.05, the critical range T0.05 is (8-47). T is not within the interval, so P < 0.05 and H0 is rejected. We conclude that the results of the 2 treatments differ.
Normal approximation
• When n > 25, Table 10.3 cannot help, so we turn to the normal approximation.
• It can be proved that if H0 is true and n is large enough, the distribution of the statistic T is close to a normal distribution with
$$\mu_T = \frac{n(n+1)}{4}, \qquad \sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$
Correction for continuity
• With the continuity correction, the statistic is
$$Z = \frac{\left| T - n(n+1)/4 \right| - 0.5}{\sqrt{n(n+1)(2n+1)/24}}$$
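Purely to illustrate the arithmetic, the sketch below plugs the values of the earlier example (n = 10, T = 7 or 48) into this formula. With n this small the table lookup is the appropriate method, so the Z value is an illustration, not the lecture's result; `signed_rank_z` is a hypothetical name.

```python
import math

def signed_rank_z(t, n):
    """Continuity-corrected normal approximation for the signed rank statistic T."""
    mean_t = n * (n + 1) / 4
    sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (abs(t - mean_t) - 0.5) / sd_t

z_from_t_plus = signed_rank_z(7, 10)
z_from_t_minus = signed_rank_z(48, 10)
print(round(z_from_t_plus, 3), round(z_from_t_minus, 3))
```

Both choices of T give the same Z, about 2.04, because T+ and T− are equally far from n(n+1)/4 = 27.5. This is why either one may be used as the test statistic.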
Part III: Kruskal-Wallis Test
• Similar to one-way ANOVA / chi-square test
• Used to compare the locations of more than 2 populations
Example 9.4
Allocate 24 persons randomly to 1 of 3 groups: no exercise; 20 minutes of jogging per day; or 60 minutes of jogging per day.
At the end of a month, ask each participant to rate how depressed they now feel, on a Likert scale that runs from 1 ("totally miserable") through to 100 ("ecstatically happy").
Question: Does physical exercise alleviate depression?
Depression scores and ranks for the 3 groups

No exercise        Jogging for 20 minutes   Jogging for 60 minutes
score   rank       score   rank             score   rank
23      2          22      1                59      20
26      3          27      4                66      24
51      16         39      9                38      8
49      14         29      5.5              49      14
58      19         46      11               56      17.5
37      7          48      12               60      21
29      5.5        49      14               56      17.5
44      10         65      23               62      22
Ri      76.5               79.5                     144
Test hypothesis
• H0: M1 = M2 = … = Mk. The 3 populations have the same location.
• H1: M1, M2, …, Mk are not all equal. At least one of the populations has a location (median) different from the others.
• α = 0.05
Test statistic H
$$H = \frac{12}{N(N+1)} \sum_{i=1}^{K} n_i \left( \frac{R_i}{n_i} - \frac{N+1}{2} \right)^2$$
• Let N = n1 + n2 + n3.
• Ri is the sum of the ranks associated with the i-th sample, e.g. 76.5, 79.5, 144.
• The overall average rank is (N+1)/2.
• The sample average rank for the i-th sample is Ri/ni.
• The factor 12/{N(N+1)} standardizes the test statistic in terms of the overall sample size N.
Solution to the example
• K = 3; R1 = 76.5, n1 = 8; R2 = 79.5, n2 = 8; R3 = 144, n3 = 8
• There are k − 1 = 2 degrees of freedom in this example.
$$H = \frac{12}{N(N+1)} \sum \frac{R_i^2}{n_i} - 3(N+1)$$
$$H = \frac{12}{24(24+1)} \left( \frac{76.5^2}{8} + \frac{79.5^2}{8} + \frac{144^2}{8} \right) - 3(24+1) = 7.27125$$
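The two ways of writing H (the deviation form from the definition and the computational form used in the solution) can be checked against each other with a small sketch. The function names are illustrative only.

```python
def kruskal_wallis_h(rank_sums, sizes):
    """Computational form: 12 / (N(N+1)) * sum(Ri^2 / ni) - 3(N+1)."""
    n_total = sum(sizes)
    s = sum(r**2 / n for r, n in zip(rank_sums, sizes))
    return 12 / (n_total * (n_total + 1)) * s - 3 * (n_total + 1)

def kruskal_wallis_h_deviation(rank_sums, sizes):
    """Deviation form: 12 / (N(N+1)) * sum(ni * (Ri/ni - (N+1)/2)^2)."""
    n_total = sum(sizes)
    mean_rank = (n_total + 1) / 2
    s = sum(n * (r / n - mean_rank) ** 2 for r, n in zip(rank_sums, sizes))
    return 12 / (n_total * (n_total + 1)) * s

rank_sums = [76.5, 79.5, 144]
sizes = [8, 8, 8]
h = kruskal_wallis_h(rank_sums, sizes)
h_dev = kruskal_wallis_h_deviation(rank_sums, sizes)
print(round(h, 5), round(h_dev, 5))  # both 7.27125
```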
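Adjusted formula for ties
• t_j is the number of individuals within the j-th tied subgroup.
$$H_C = H / C, \qquad C = 1 - \frac{\sum (t_j^3 - t_j)}{N^3 - N}$$
In this example there are three tied subgroups: score 29 (t = 2), score 49 (t = 3) and score 56 (t = 2), so
$$H_C = \frac{7.27125}{1 - \dfrac{(2^3 - 2) + (3^3 - 3) + (2^3 - 2)}{24^3 - 24}} = 7.290$$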
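A minimal sketch of the tie correction, counting the tied subgroups directly from the 24 scores of Example 9.4. The function name `tie_correction` is hypothetical. Counting every tied subgroup in the data (29 twice, 49 three times, 56 twice) gives a sum of 36 for the terms t³ − t.

```python
from collections import Counter

def tie_correction(scores):
    """C = 1 - sum(t^3 - t) / (N^3 - N), with t the size of each tied subgroup."""
    n_total = len(scores)
    # Untied scores have t = 1 and contribute 1 - 1 = 0 to the sum.
    tie_term = sum(t**3 - t for t in Counter(scores).values())
    return 1 - tie_term / (n_total**3 - n_total)

no_exercise = [23, 26, 51, 49, 58, 37, 29, 44]
jog_20 = [22, 27, 39, 29, 46, 48, 49, 65]
jog_60 = [59, 66, 38, 49, 56, 60, 56, 62]

c = tie_correction(no_exercise + jog_20 + jog_60)
h = 7.27125          # uncorrected H from the solution above
hc = h / c
print(round(c, 5), round(hc, 3))
```

The correction factor is close to 1, so the corrected statistic is only slightly larger than H (about 7.29) and the comparison with the critical value is unaffected.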
CRITICAL VALUE
• Table 11: H critical values
• χ² critical values: when n is big enough, H is approximately distributed as a χ² distribution with ν = k − 1
Conclusion
• k = 3, ν = k − 1 = 2, so the critical value is 5.99.
• Since 7.27 > 5.99, P < 0.05 and we reject H0. That is, there is evidence that at least one of the groups differs from the others.
NONPARAMETRIC test
• Nonparametric: methods that are not focused on testing hypotheses about the parameters of the population.
• Distribution-free: methods that make no assumptions about the distribution of the data, and are suitable for small samples or for large samples where parametric assumptions are violated.
  - Use ranks of the data values rather than the actual data values themselves
  - Loss of power when a parametric test is appropriate
Parametric and non-parametric equivalents
NONPARAMETRIC test
Advantages
• Applicable to more types of data
• Numerical data with an unknown or skewed distribution
• Ordinal variables, or measurement data that are given with ranks only
Disadvantages
• A waste of data
• Loss of power when a parametric test is appropriate
Learning Objectives
• 1. Understand when nonparametric statistical methods are appropriate.
• 2. Know how to perform the Wilcoxon Signed-Rank Test and when it should be used.
• 3. Know how to perform the Wilcoxon Rank-Sum Test and when it should be used.
THANK YOU FOR YOUR ATTENTION!