Non-parametric Statistics
An Introduction for Experimentalists
Sebastian Strasser
University of Munich
July 13, 2011

Overview
- The role of non-parametric statistics in experimental economics
- Tests
  - Mann-Whitney U Test
  - Kolmogorov-Smirnov Test
  - Wilcoxon Test
  - Binomial Test
  - χ² Test
  - Kruskal-Wallis Test

The role of non-parametric statistics in experimental economics
- small sample sizes (often between n = 6 and n = 30 (cf. independent observations))
- no assumption about the underlying distribution of the data-generating process
- possibility to analyze ordinal and categorical data
- close relation to methods used in medicine, biology, chemistry
In practice, both non-parametric and parametric analyses (OLS, IV, MLE, GMM, t-test, etc.) are conducted at the same time.

Mann-Whitney U Test: Definition
Test whether two statistically independent groups have been drawn from the same population with respect to central tendency (location).
H0: No difference in location
H1a: Difference in location: X ≠ Y (two-sided)
H1b: Difference in location: X < Y or X > Y (one-sided)

Mann-Whitney U Test: Example
Ultimatum Game with students from economics (ECON) and management science (MGMT). Variable of interest: offered amount.

Offered amounts (ECON)   Offered amounts (MGMT)
2                        3
4                        2.5
1                        5
0.5                      5
0.5

Step 1: Bring all observations in ascending order and assign ascending ranks (tied observations share the average of their ranks):

offer   group   rank
0.5     ECON    1.5
0.5     ECON    1.5
1       ECON    3
2       ECON    4
2.5     MGMT    5
3       MGMT    6
4       ECON    7
5       MGMT    8.5
5       MGMT    8.5

Step 2: Sum the ranks of the smaller group to obtain W.
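The ranking and rank sum from Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustrative check of the hand calculation, not the STATA routine; the helper name `midranks` and the variable names are assumptions introduced here.

```python
def midranks(values):
    """Return 1-based ranks; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2  # average of positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


# Offers from the Ultimatum Game example.
econ = [2, 4, 1, 0.5, 0.5]
mgmt = [3, 2.5, 5, 5]  # the smaller group

ranks = midranks(econ + mgmt)       # Step 1: rank the pooled sample
w_mgmt = sum(ranks[len(econ):])     # Step 2: rank sum of the smaller group

# Largest possible rank sum for a group of this size (top ranks, no ties).
total, n_small = len(econ) + len(mgmt), len(mgmt)
w_max = sum(range(total - n_small + 1, total + 1))

print(w_mgmt, w_max)
```

The p-value itself still has to come from a table or from a statistics package; the sketch only reproduces the rank sum.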
- In our example: W(N) = 28 (W_max = 30)
- p = 0.063 (two-sided) (table J from Siegel/Castellan)
- p = 0.048 (two-sided) (from STATA)
Approximation of W(N) for n → ∞ through normal distribution
STATA: ranksum offer, by(study)

Kolmogorov-Smirnov Test: Definition
Test whether two statistically independent groups have been drawn from the same population with respect to the distribution (mean, skewness, kurtosis).
H0: Same distribution
H1: Difference in distributions (two-sided)

Kolmogorov-Smirnov Test: Example
Ultimatum Game with students from economics (ECON) and management science (MGMT). Variable of interest: offered amount.

Offered amounts (ECON)   Offered amounts (MGMT)
2                        3
4                        2.5
1                        5
0.5                      5
0.5

Step 1: Determine the cumulative frequencies of the observations:

offer   ECON    MGMT
0.5     40 %    0 %
1       60 %    0 %
2       80 %    0 %
2.5     80 %    25 %
3       80 %    50 %
4       100 %   50 %
5       100 %   100 %

Step 2: Look for the biggest absolute difference between the cumulative frequencies by calculating the following values:
- $D_{m,n} = \max |S_n(X) - S_m(X)|$, where m and n are the numbers of observations in the two samples
- $S_m(X) = K/m$, where K is the number of observations smaller than or equal to X

offer   ECON    MGMT    S_n(X) − S_m(X)
0.5     40 %    0 %     40 %
1       60 %    0 %     60 %
2       80 %    0 %     80 %
2.5     80 %    25 %    55 %
3       80 %    50 %    30 %
4       100 %   50 %    50 %
5       100 %   100 %   0 %

The test statistic is then given by $m \cdot n \cdot D_{m,n} = 5 \cdot 4 \cdot 0.8 = 16$.
- p = 0.10 (two-sided) (table LII from Siegel/Castellan)
- p = 0.116 (two-sided) (from STATA)
Approximation for n → ∞ through χ² distribution
STATA: ksmirnov offer, by(study)
Possibility to test against theoretical distribution

Wilcoxon Signed-Ranks Test: Definition
Test whether there are differences between two statistically dependent observations (X1 and X2).
H0: No differences between the observations (X1 = X2)
H1a: Difference between the observations: X1 ≠ X2 (two-sided)
H1b: Difference between the observations: X1 < X2 or X1 > X2 (one-sided)

Wilcoxon Signed-Ranks Test: Example
Repeated Ultimatum Game. Variable of interest: offered amount in rounds 1 and 2.

Step 1: Determine the difference between the paired observations (round 2 minus round 1) and assign ranks according to the absolute difference, keeping the sign of the difference. Pairs with a difference of zero are dropped.

subject   round 1   round 2   difference   rank
1         0.5       1.5       1            +2.5
2         0.5       1.5       1            +2.5
3         1         1         0            drop
4         2         1.5       -0.5         -1
5         2.5       1         -1.5         -4
6         3         1         -2           -5
7         4         1         -3           -7.5
8         5         2         -3           -7.5
9         5         2.5       -2.5         -6

- T+ = sum of all ranks with positive sign (T+ = 5)
- T− = sum of all ranks with negative sign (T− = 31)
- p = 0.078 (two-sided with N = 8 (!)) (table H from Siegel/Castellan)
- p = 0.0745 (two-sided) (from STATA)
Approximation for n → ∞ through normal distribution
STATA: signrank offer1 = offer2
Sign test as an alternative (neglecting the size of the deviations)

Binomial Test: Definition
Test whether the sampling distribution of a dichotomous random variable differs from that of a population with p = p0.
Two possible events (X = 1 or X = 0): heads or tail, budget surplus or deficit, etc.
- Probability of X = 1: p
- Probability of X = 0: q = 1 − p
H0: p = p0
H1: p ≠ p0

Binomial Test: Example

throw   result   X
1       heads    0
2       tail     1
3       heads    0
4       heads    0
5       heads    0
6       heads    0
7       tail     1
8       heads    0
9       heads    0
10      heads    0

- Is the coin a fair coin, i.e. p = q = 0.5?
- $Y = \sum X = 2$

$$P[Y = k] = \binom{N}{k} p^k q^{N-k}, \quad \text{where} \quad \binom{N}{k} = \frac{N!}{k!\,(N-k)!}$$

The probability that Y = 2 is given by

$$P[Y = 2] = \binom{10}{2} p^2 q^8 = \frac{10!}{2!\,8!} \cdot 0.5^2 \cdot 0.5^8 \approx 0.044$$

We are interested in the cumulative probability that Y ≤ r or Y ≥ s:

$$P[Y \le k] = \sum_{i=0}^{k} \binom{N}{i} p^i q^{N-i}$$

$$P[Y \le 2] = \sum_{i=0}^{2} \binom{N}{i} p^i q^{N-i} = 0.055$$

Binomial Test: Another Example
Systematic deviations of actual expenses from budget for R&D:
- 12 out of 32 negative
- p = 0.2153
- hence not able to reject H0
STATA: bitest deviation==0.5

χ² Test: Definition
Test whether there are differences in distributions in two (or more) categories A and B.
H0: No differences between the categories.
H1: Differences between the categories.
Possibility to test between two observed distributions or to compare an observed sample to a theoretical distribution.
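Returning to the binomial test examples, the probabilities can be checked directly from the binomial formula. The following is a minimal sketch using only the Python standard library; the function names `binom_pmf` and `binom_cdf` are hypothetical helpers introduced here. The two-sided p-value is obtained by doubling the lower tail, which is an assumption that only holds because the null distribution is symmetric at p0 = 0.5.

```python
from math import comb


def binom_pmf(k, n, p):
    """P[Y = k] for Y ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P[Y <= k] for Y ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


# Coin example: 2 tails in 10 throws, fair coin under H0.
p_exactly_2 = binom_pmf(2, 10, 0.5)
p_at_most_2 = binom_cdf(2, 10, 0.5)

# R&D budget example: 12 of 32 deviations negative, H0: p = 0.5.
# Doubling the lower tail is valid here only because p0 = 0.5 is symmetric.
p_two_sided = min(1.0, 2 * binom_cdf(12, 32, 0.5))

print(round(p_exactly_2, 4), round(p_at_most_2, 4), round(p_two_sided, 4))
```

For null values other than 0.5 the two tails are no longer mirror images, so a statistics package should be used for the two-sided p-value instead of this shortcut.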
Minimum number of observations per cell for the χ² test: ≈ 5

χ² Test
Easiest application: 2 x 2 tables, but possibility for n x k extension:

A   B
C   D

Test statistic (general):

$$\chi^2 = \sum_{j=1}^{k} \sum_{i=1}^{n} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where $O_{ij}$ is the observed count, $E_{ij}$ the expected count under independence, and n and k are the numbers of rows and columns.
Degrees of freedom: df = (n − 1)(k − 1)

χ² Test: Example
Application for 2 x 2 tables:

                        Econ   Mgmt
Offers under 5          8      13
Offers of 5 and more    14     12

Approximation of test statistic for 2 x 2 tables (with continuity correction):

$$\chi^2 = \frac{N \left( |AD - BC| - \frac{N}{2} \right)^2}{(A + B)(C + D)(A + C)(B + D)}$$

- χ² = 1.16 (p = 0.282, df = 1)
- χ² = 0.61 (with 2 x 2 approximation)
STATA: tab study offerdummy, chi
For smaller cells (< 5), use the Fisher exact test (hypergeometric test statistic)
STATA: tab study offerdummy, exact

Kruskal-Wallis Test (or H-Test): Definition
Test whether k statistically independent samples have been drawn from the same population with respect to central tendency (location).
H0: k samples are from the same population
H1: k samples are from different populations
Closely related to Mann-Whitney, but applicable to k > 2 groups
Test statistic:

$$H = \frac{12}{n(n+1)} \sum_{h} \frac{RS_h^2}{n_h} - 3(n+1)$$

where $RS_h$ is the sum of ranks in group h, $n_h$ the number of observations in group h, and n the total number of observations.
STATA: kwallis offer, by(age)

[Slide: Overview of tests used]
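As a check on the 2 x 2 example from the χ² test, the sketch below recomputes the general statistic from observed and expected counts and the continuity-corrected 2 x 2 version. It is an illustration in plain Python, not the STATA implementation; the function names are assumptions introduced here, and the p-value uses the identity P[χ² > x] = erfc(√(x/2)), which holds for df = 1 only.

```python
from math import erfc, sqrt


def pearson_chi2(table):
    """General chi-square statistic: sum of (O - E)^2 / E over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # under independence
            stat += (observed - expected) ** 2 / expected
    return stat


def corrected_chi2_2x2(a, b, c, d):
    """2 x 2 statistic with continuity correction: N(|AD - BC| - N/2)^2 / (...)."""
    n = a + b + c + d
    numerator = n * max(abs(a * d - b * c) - n / 2, 0.0) ** 2
    return numerator / ((a + b) * (c + d) * (a + c) * (b + d))


def chi2_pvalue_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(x / 2))


# Rows: offers under 5, offers of 5 and more; columns: Econ, Mgmt.
offers = [[8, 13], [14, 12]]

chi2 = pearson_chi2(offers)
chi2_corrected = corrected_chi2_2x2(8, 13, 14, 12)
p_value = chi2_pvalue_df1(chi2)

print(round(chi2, 2), round(chi2_corrected, 2), round(p_value, 3))
```

For tables larger than 2 x 2, `pearson_chi2` still applies, but the p-value helper does not, since it is specific to one degree of freedom.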