Lecture 9: Nonparametric Methods
Xiaojin YU
Department of Epidemiology and Biostatistics, School of Public Health, Southeast University

Review: Types of data
Qualitative data (categorical data):
(1) binary (dichotomous, binomial)
(2) multinomial (polytomous)
(3) ordinal
Quantitative data

Measures of central tendency: quantitative data
Mean: for normally distributed data.
Geometric mean: for positively skewed data that can be transformed to a normal distribution on the log scale.
Median: applicable to any data; typically used when the data are not normally distributed.

Measures of dispersion: quantitative data
Range, interquartile range, variance and standard deviation, coefficient of variation.

Compare means by t-test
Assumptions: normality; equality of variance (two-group t-test).

Single-sample t-test
  Conditions: $\bar{x}, s, n, \mu_0$
  H0: $\mu = \mu_0$
  $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$, $\nu = n - 1$

Paired t-test
  Conditions: $\bar{d}, s_d, n$ (n = number of pairs)
  H0: $\mu_d = 0$
  $t = \dfrac{\bar{d} - 0}{s_d/\sqrt{n}}$, $\nu = n - 1$

Two-group t-test
  Conditions: $\bar{x}_1, s_1, n_1$; $\bar{x}_2, s_2, n_2$
  H0: $\mu_1 = \mu_2$
  $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{(n_1-1) + (n_2-1)}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, $\nu = n_1 + n_2 - 2$

Comparison of means between two groups

Variable    Summary             Group A          Group B          Method     Statistic   P
N (missing)                     72 (0)           70 (0)
Age         mean (std)          40.11 ± 14.87    37.26 ± 14.45    t-test     1.160       0.2482
            median (min-max)    41.00 (17-65)    36.00 (17-67)
Duration    mean (std)          23.74 ± 70.35    22.41 ± 70.75    rank sum   0.112       0.9113
            median (min-max)    5.00 (1-365)     4.00 (1-365)

Compare proportions by chi-square test

Effect of drug
Drug      Effective   Not effective   Total   Sample rate (%)
Drug A    41          4               45      91.1
Drug B    24          11              35      68.6
Total     65          15              80

Are the two population proportions equal or not? How is the categorical outcome distributed in the two populations?
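The question above can be checked numerically. The following is a minimal sketch in plain Python, assuming the 2×2 counts from the drug table; the names `observed`, `expected_counts`, and `chi_square` are illustrative and do not come from the lecture. It computes each expected count T as row total × column total / n, then the Pearson statistic as the sum of (A − T)²/T.

```python
def expected_counts(observed):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Pearson chi-square statistic: sum over cells of (A - T)^2 / T."""
    expected = expected_counts(observed)
    return sum(
        (a - t) ** 2 / t
        for obs_row, exp_row in zip(observed, expected)
        for a, t in zip(obs_row, exp_row)
    )


# Rows: Drug A, Drug B; columns: effective, not effective.
observed = [[41, 4], [24, 11]]
expected = expected_counts(observed)
statistic = chi_square(observed)

print(expected)
print(round(statistic, 3))  # 6.565
```

With unrounded expected counts the statistic is about 6.565; a hand calculation that first rounds the expected counts to two decimals gives about 6.57. Either way the value exceeds 3.84, the 0.05 critical value of the chi-square distribution with 1 degree of freedom.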
Solution
H0: πA = πB
H1: πA ≠ πB
α = 0.05

Observed counts A, with expected counts T in parentheses:

Drug      Effective     Not effective
Drug A    41 (36.56)    4 (8.44)
Drug B    24 (28.44)    11 (6.56)

Calculate T and the test statistic, chi-square

Drug           Positive       Negative      Row total
A              T11 = 36.56    T12 = 8.44    n1
B              T21 = 28.44    T22 = 6.56    n2
Column total   m1             m2            n

Test statistic

$$\chi^2 = \sum \frac{(A - T)^2}{T} = \frac{(41 - 36.56)^2}{36.56} + \frac{(4 - 8.44)^2}{8.44} + \frac{(24 - 28.44)^2}{28.44} + \frac{(11 - 6.56)^2}{6.56} = 6.573$$

Conclusion
Since 6.573 > 3.84, P < 0.05. We reject H0 and accept H1 at the 0.05 level, and conclude that the two populations are not homogeneous with respect to the effect of the drug. The effects of drug A and drug B are not equivalent.

Compare ordinal data
Counts, with percentages in parentheses:

Result                         Treatment (1)   Control (2)
Healed (4)                     65 (51.59)      42 (51.22)
Significantly effective (3)    18 (14.29)      6 (7.32)
Effective (2)                  30 (23.81)      23 (28.05)
Ineffective (1)                13 (10.32)      11 (13.41)
Total                          126             82

OUTLINE
Basic logic of rank-based methods
Rank sum test for 2 independent groups (completely randomized design)
Signed rank test for paired design
Rank sum test for 3 or more independent groups (completely randomized design)
Multiple comparison

Rank & Rank Sum
Review of median
Example: duration in hospital (months)

Month   3.1   5.5   6.0   10.2   11.9
Rank    1     2     3     4      5

Task for you
How can we tell whether the boys or the girls are taller when no measuring is allowed?

Solution to the task
[Figure: students lined up by height; blue = male, red = female]
Small values sit at the front (small ranks) and large values sit at the back (large ranks).

Part I: Wilcoxon Rank Sum Test
Rank sum test for comparing the locations of two populations (Mann-Whitney test)
Review: the t-test for comparing 2 population means requires normality and homogeneity of variance.
Rank: 1, 2, 3.5, 3.5, 5, 6

EXAMPLE 1: Table 9.1 Survival times of cats and rabbits without oxygen

Cats (n1 = 8)         Rabbits (n2 = 12)
Minutes   Rank        Minutes   Rank
25        9.5         14        1
34        13          15        2
44        15          16        3
46        16          17        4
46        17          19        5
48        18          21        6.5
49        19          21        6.5
50        20          23        8
                      25        9.5
                      28        11
                      30        12
                      35        14
T1 = 127.5            T2 = 82.5

STEP I: Test hypothesis and sig.
level
H0: M1 = M2, the population locations of survival time for cats and rabbits are equal.
H1: M1 ≠ M2, the population locations of survival time for cats and rabbits are not equal.
α = 0.05

STEP II: Statistic: assign ranks
Pool the n1 + n2 observations to form a single sample.
Rank all observations of the pooled sample from smallest to largest (columns 2 and 4).
Tied values receive mid-ranks.

Pooled sample
Time   Rank     Time   Rank
14     1        28     11
15     2        30     12
16     3        34     13
17     4        35     14
19     5        44     15
21     6.5      46     16
21     6.5      46     17
23     8        48     18
25     9.5      49     19
25     9.5      50     20
n1 = 8, T1 = 127.5; n2 = 12, T2 = 82.5

STEP II: Statistic: test statistic T
Calculate the rank sums for the two samples, denoted T1 and T2.
Take the Ti of the sample with the smaller n as T. Here n1 = 8 < n2 = 12, so T = T1 = 127.5.
Check: T1 + T2 = N(N+1)/2 = 210.

STEP III: Determine P value, conclusion
From the table in Appendix E, with n1 = 8 and n2 - n1 = 4, the critical interval Tα is 58-110. Since T = 127.5 lies outside this interval, P ≤ α. Given α = 0.05, P < 0.05 and H0 is rejected. We conclude that the survival times of cats and rabbits in an environment without oxygen differ: cats survive longer without oxygen.

BASIC LOGIC
N = n1 + n2. Given N, the total rank sum is fixed and can be calculated. If H0 is true, the total rank sum should be divided between the 2 groups in proportion to ni:

$$\frac{N(N+1)}{2} = \frac{n_1(N+1)}{2} + \frac{n_2(N+1)}{2}$$

Normal approximation
Used when n1 > 10 or n2 - n1 > 10:

$$Z = \frac{\left|T - n_1(N+1)/2\right| - 0.5}{\sqrt{n_1 n_2 (N+1)/12}}, \qquad N = n_1 + n_2$$

Correction for ties, where $t_j$ is the number of observations in the j-th group of tied values:

$$Z_c = Z/\sqrt{C}, \qquad C = 1 - \sum (t_j^3 - t_j)/(N^3 - N)$$

EXAMPLE 2: Table 9-2 Results from a clinical trial for hypertension

Effect            Drug A   Drug B   Total   Range of ranks   Average rank   Rank sum, Drug A   Rank sum, Drug B
0 (ineffective)   17       70       87      1-87             44             748                3080
1 (effective)     25       13       38      88-125           106.5          2662.5             1384.5
2 (healed)        27       37       64      126-189          157.5          4252.5             5827.5
Total             69       120      189                                     7663               10292

TA = 7663, nA = 69; TB = 10292, nB = 120

Part II: Wilcoxon's Signed Rank Test
Wilcoxon (1945)
H0: Md = 0
Example: the test procedure is illustrated with data on 28 patients (14 pairs) from a sequential-analysis, double-blind clinical trial for cancer of the head and neck (Bakowski MT, et al. Int. J. Radiation Oncology Biology Physics 1978; 4: 115-119).

When Wilcoxon's Signed Rank Test is used
(1) Quantitative data, paired design: the paired t-test requires the differences to be normally distributed; if their distribution is skewed, the signed rank test must be used.
(2) Qualitative data: paired design with an ordinal outcome.

Example 9.3
2 treatment groups:
radiotherapy + drug (B)
radiotherapy + placebo (A)
The tumor response within three months of completion of treatment was assessed for each patient in terms of complete regression (CR), partial regression (PR), no change (NC) and progression of the disease (P).
Scored from 1 to 5 as follows:
5 = CR with no recurrence subsequently up to 6 months or more,
4 = CR initially but with a subsequent recurrence within 6 months,
3 = PR,
2 = NC,
1 = P.

STEP I: Test hypothesis
H0: Md = 0, the population median of the differences is equal to zero.
H1: Md ≠ 0, the population median of the differences is not equal to zero.
α = 0.05

STEP II: Statistic: assigning ranks
1) Calculate the differences di = xi - yi, and ignore all pairs with zero difference.
2) Rank the absolute values of the non-zero di from smallest to largest so that each di gets a rank. If there is a tie, what do we do?
Ties: These six patients all have differences of 1 and therefore the rank numbers 1, 2, 3, 4, 5 and 6 must be divided amongst them.
That is, they all have a rank of (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5.
3) Assign the original signs of the di to their ranks.

Test statistic T
Valid number of pairs: n = 10.
Find the sum of the ranks with positive signs and denote it T+.
Find the sum of the ranks with negative signs and denote it T-.
Check: T+ + T- = n(n+1)/2 = 55.
Let T = min(T+, T-), or either one. Here T- = 48 and T+ = 7.

STEP III: Determine the P value and conclude
For n < 25, find the critical range Tα in Table 10.3 (p. 184). In this example n = 10 and T = 7 (or 48). Given α = 0.05, the critical range T0.05 is 8-47. T is not in this interval, so P < 0.05 and H0 is rejected. We conclude that the responses differ between the 2 treatments.

Normal approximation
When n > 25, Table 10.3 cannot be used and we turn to the normal approximation. It can be shown that if H0 is true and n is large enough, the distribution of the statistic T is close to a normal distribution with

$$\mu_T = \frac{n(n+1)}{4}, \qquad \sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$

Correction of continuity
With the continuity correction of 0.5, the statistic is

$$Z = \frac{\left|T - n(n+1)/4\right| - 0.5}{\sqrt{n(n+1)(2n+1)/24}}$$

Part III: Kruskal-Wallis Test
Similar in purpose to one-way ANOVA or the chi-square test.
Used to test the locations of more than 2 populations.

Example 9.4
Allocate 24 people randomly to 1 of 3 groups: no exercise; 20 minutes of jogging per day; or 60 minutes of jogging per day. At the end of a month, ask each participant to rate how depressed they now feel, on a Likert scale that runs from 1 ("totally miserable") through to 100 ("ecstatically happy").
Question: Does physical exercise alleviate depression?
Report on depression from 3 groups, with ranks

No exercise       Jogging for 20 minutes   Jogging for 60 minutes
Score   Rank      Score   Rank             Score   Rank
23      2         22      1                59      20
26      3         27      4                66      24
51      16        39      9                38      8
49      14        29      5.5              49      14
58      19        46      11               56      17.5
37      7         48      12               60      21
29      5.5       49      14               56      17.5
44      10        65      23               62      22
Ri      76.5      Ri      79.5             Ri      144

Test hypothesis
H0: M1 = M2 = … = Mk, the 3 populations have the same location.
H1: M1, M2, …, Mk are not all equal, that is, at least one of the populations has a median different from the others.
α = 0.05

Test statistic H

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} n_i \left(\frac{R_i}{n_i} - \frac{N+1}{2}\right)^2$$

Let N = n1 + n2 + n3.
Ri is the sum of the ranks associated with the i-th sample, here 76.5, 79.5 and 144.
The overall average rank is (N+1)/2.
The sample average rank for the i-th sample is Ri/ni.
The factor 12/{N(N+1)} standardizes the test statistic in terms of the overall sample size N.

Solution to the example
k = 3
R1 = 76.5, n1 = 8
R2 = 79.5, n2 = 8
R3 = 144, n3 = 8
There are k - 1 = 2 degrees of freedom in this example.

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$$

$$H = \frac{12}{24(24+1)} \left(\frac{76.5^2}{8} + \frac{79.5^2}{8} + \frac{144^2}{8}\right) - 3(24+1) = 7.27125$$

Adjusted formula for ties
$t_j$ is the number of individuals within the j-th tied subgroup.

$$H_C = H/C, \qquad C = 1 - \sum (t_j^3 - t_j)/(N^3 - N)$$

In this example the tied scores are 29 (t = 2), 49 (t = 3) and 56 (t = 2), so

$$H_C = \frac{7.27125}{1 - \dfrac{(2^3 - 2) + (3^3 - 3) + (2^3 - 2)}{24^3 - 24}} = 7.2903$$

CRITICAL VALUE
Table 11: H critical values; χ² critical values.
When n is big enough, H approximately follows a χ² distribution with ν = k - 1 degrees of freedom.

Conclusion
With k = 3, the critical value is 5.99. Since 7.27 > 5.99, P < 0.05 and we reject H0. That is, there is evidence that at least one of the groups differs from the others.

NONPARAMETRIC test
o Nonparametric: methods that are not focused on testing hypotheses about the parameters of the population.
o Distribution-free: methods that make no assumptions about the distribution of the data, and are suitable for small samples or for large samples where parametric assumptions are violated.
  – Use the ranks of the data values rather than the actual data values themselves.
  – Lose power when a parametric test is appropriate.

Parametric and non-parametric equivalents

NONPARAMETRIC test
Advantages
Apply to more types of data:
  Numerical data with an unknown or skewed distribution
  Ordinal variables, or measurement data that are given as ranks only
Disadvantages
  Waste information in the data
  Lose power when a parametric test is appropriate

Learning Objectives
1. Understand when nonparametric statistical methods are appropriate.
2. Know how to perform the Wilcoxon Signed-Rank Test and when it should be used.
3. Know how to perform the Wilcoxon Rank-Sum Test and when it should be used.

THANK YOU FOR YOUR ATTENTION!
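Supplementary sketch: the ranking steps used in Example 1 and Example 9.4 can be reproduced in a few lines of plain Python. This is an illustration under stated assumptions, not the lecture's own code; the helper names `midranks`, `rank_sums`, and `kruskal_wallis_h` are hypothetical. Tied values receive the average of the ranks they occupy, and H is computed without the tie correction.

```python
def midranks(values):
    """Rank values from smallest to largest; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the last position holding the same value.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sums(groups):
    """Pool all groups, rank the pooled sample, return one rank sum per group."""
    pooled = [v for group in groups for v in group]
    ranks = midranks(pooled)
    sums, start = [], 0
    for group in groups:
        sums.append(sum(ranks[start:start + len(group)]))
        start += len(group)
    return sums


def kruskal_wallis_h(groups):
    """H = 12 / (N(N+1)) * sum(Ri^2 / ni) - 3(N+1), without tie correction."""
    n_total = sum(len(group) for group in groups)
    sums = rank_sums(groups)
    weighted = sum(r * r / len(g) for r, g in zip(sums, groups))
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Example 1: survival times (minutes) without oxygen.
cats = [25, 34, 44, 46, 46, 48, 49, 50]
rabbits = [14, 15, 16, 17, 19, 21, 21, 23, 25, 28, 30, 35]
t1, t2 = rank_sums([cats, rabbits])
print(t1, t2)  # 127.5 82.5

# Example 9.4: depression scores in the three exercise groups.
no_exercise = [23, 26, 51, 49, 58, 37, 29, 44]
jog_20 = [22, 27, 39, 29, 46, 48, 49, 65]
jog_60 = [59, 66, 38, 49, 56, 60, 56, 62]
h = kruskal_wallis_h([no_exercise, jog_20, jog_60])
print(round(h, 3))  # 7.271
```

The two tied cat values of 46 each receive rank 16.5 here rather than 16 and 17; because both belong to the same group, the rank sum T1 is unchanged.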