Mobile Computing Group
A quick-and-dirty tutorial on the chi2 test for goodness-of-fit testing

Outline
– The presentation follows a pyramid schema:
  – Background concepts
  – Goodness-of-fit (GoF)
  – Chi2 tests for GoF

Background
– Descriptive vs. inferential statistics
  – Descriptive: data are used for descriptive purposes only (tables, graphs, measures of variability, etc.)
  – Inferential: data are used for drawing inferences, making predictions, etc.
– Sample vs. population
  – A sample is drawn from a population that is assumed to have certain characteristics.
  – The sample is often used to make inferences about the population (inferential statistics): hypothesis testing, estimation of population parameters.

Background
– Statistic vs. parameter
  – A statistic is estimated from a sample. It can be used for both descriptive and inferential purposes.
  – A parameter refers to the whole population. A sample statistic is often used to infer a population parameter.
  – Example: the sample mean may be used to infer the population mean (expected value).
– Hypothesis testing
  – A procedure in which sample data are used to evaluate a hypothesis about the population.
  – A hypothesis may refer to several things: properties of a single population, the relation between two populations, etc.
  – Two statistical hypotheses are defined: a null hypothesis H0 and an alternative H1.
  – H0 is often a statement of no effect or no difference.
It is the hypothesis the researcher seeks to reject.

Background
– Inferential statistical test
  – Hypothesis testing is carried out via an inferential statistical test:
    – Sample data are manipulated to yield a test statistic.
    – The obtained value of the test statistic is evaluated against a sampling distribution, i.e., a theoretical probability distribution of the possible values of the test statistic.
    – The theoretical values of the statistic are usually tabulated, letting one assess the statistical significance of the test result.
– Goodness-of-fit is a type of hypothesis testing
  – Devise inferential statistical tests, apply them to the sample, and infer whether a theoretical distribution matches the population distribution.

GoF as hypothesis testing
– Hypothesis H0: the sample is drawn from a theoretical distribution F()
– The sample data are manipulated to derive a test statistic
  – In the case of the chi2 statistic, this includes aggregating the data into bins and some computations.
– The statistic, as computed from the data, is checked against the sampling distribution
  – For the chi2 test, the sampling distribution is the chi2 distribution, hence the name.

Goodness-of-fit
– Statistical tests and statistics: the big picture
  – EDF-based tests, e.g., KS test, Anderson-Darling test
  – Specialized tests, e.g., Shapiro-Wilk test for normality
  – Chi2-type tests
    – Classical chi2 statistics: Pearson chi2 statistic, log-likelihood ratio statistic, modified chi2 statistic
    – Generalized chi2 statistics

Pearson chi2 statistic
– If X1, X2, ..., Xn is the random sample and F() the theoretical distribution under test, the Pearson chi2 statistic is computed as:

    X^2 = Σ_{i=1}^{M} (Oi − Ei)^2 / Ei = Σ_{i=1}^{M} (Ni − n·pi)^2 / (n·pi)

  where
  – M: number of bins
  – Oi (Ni): observed frequency in bin i
  – Ei (= n·pi): expected frequency in bin i according to the theoretical distribution F()
  – n: sample size
  – pi = P(Xj falls in bin i) = ∫_{bin i} dF(x)

Interpretation of chi2 statistic
– Theory says that the Pearson chi2 statistic follows a chi2 distribution, whose degrees of freedom (df) are:
  – M − 1, when the parameters of the fitted distribution are given a priori (case 0 test)
  – Somewhere between M − 1 and M − 1 − q, when the q parameters of the distribution are estimated from the sample data; usually the df for this case are taken to be M − 1 − q
– Having computed the value X^2 of the chi2 statistic, I check the chi2 distribution with M − 1 (or M − 1 − q) df to find:
  – The probability of obtaining a value equal to or greater than the computed value X^2, called the p-value
  – If p < a, where a is the significance level of my test, the hypothesis is rejected; otherwise it is retained.
  – Standard values for a are 0.1, 0.05, 0.01; the smaller a is, the more conservative I am in rejecting the hypothesis H0.

Example
– A die is rolled 120 times: 1 comes up 20 times, 2 comes up 14, 3 comes up 18, 4 comes up 17, 5 comes up 22, and 6 comes up 29 times.
– The question is: "Is the die biased?", or better: "Do these data suggest that the die is biased?"
– Hypothesis H0: the die is not biased
  – According to the null hypothesis, these counts should be distributed uniformly.
  – F(): the discrete uniform distribution

Example – cont.
– Computations:

    Bin              1      2      3      4      5      6     Sum
    Oi              20     14     18     17     22     29     120
    Ei              20     20     20     20     20     20     120
    Oi − Ei          0     −6     −2     −3      2      9       0
    (Oi − Ei)^2      0     36      4      9      4     81
    (Oi − Ei)^2/Ei   0    1.8    0.2   0.45    0.2   4.05    X^2 = 6.7

– Interpretation
  – The distribution of the test statistic has 5 df.
  – The probability of obtaining a value equal to or greater than 6.7 under a chi2 distribution with 5 df (the p-value) is about 0.24, which is greater than a for all a in {0.01, ..., 0.1}.
  – Therefore the hypothesis that the die is not biased cannot be rejected.

Interpretation of Pearson chi2
– Graphical illustration
  [Figure: density f(z) of the chi2 distribution with 5 df. At the 10% significance level, I would reject the hypothesis if the computed X^2 > 9.24, since 10% of the area under the curve lies to the right of 9.24. The computed X^2 = 6.7 corresponds to a p-value of about 0.24. Critical values: 9.24 (a = 0.1), 11.07 (a = 0.05), 15.09 (a = 0.01).]

Properties of Pearson chi2 statistic
– It can be estimated for both discrete and continuous variables
  – Holds for all chi2 statistics.
  – Maximum flexibility, but it fails to make use of all available information for continuous variables.
– It is perhaps the simplest one from a computational point of view.
– As with all chi2 statistics, one needs to define the number and borders of the bins
  – These are generally a function of the sample size and the theoretical distribution under test.

Bin selection
– How many bins, and which ones?
  – Different opinions exist in the literature; there is no rigid proof of optimality.
– There seems to be convergence on the following aspects:
  – Probability of bins: the bins should be chosen equiprobable with respect to the theoretical distribution under test.
  – Minimum expected frequencies npi:
    – (Cramér, 46): npi > 10 for all bins
    – (Cochran, 54): npi > 1 for all bins, npi >= 5 for 80% of the bins
    – (Roscoe and Byars, 71)

Bin selection
– Relevance of the number of bins M to the sample size n
  – (Mann and Wald, 42), (Schorr, 74): for large sample sizes, 1.88·n^(2/5) < M < 3.76·n^(2/5)
  – (Koehler and Larntz, 80): for small sample sizes, M >= 3, n >= 10 and n^2/M >= 10
  – (Roscoe and Byars, 71):
    – Equiprobable bins: n > M when a = 0.01 and a = 0.05
    – Non-equiprobable bins: n > 2M (a = 0.05) and n > 4M (a = 0.01)

Bin selection
– [Table: number of bins vs. sample size according to Mann and Wald]

Bin selection: continuous vs. discrete
– [Figure: for a continuous distribution, the CDF F(x) can be inverted, so equiprobable bins are easy to select; for a discrete distribution (step-function CDF over x = 1, 2, ..., 7), equiprobable bins are less straightforward to define.]

References
– Textbooks
  – D.J. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures
    – Introduction (descriptive vs. inferential statistics, hypothesis testing, concepts and terminology)
    – Test 8 (chapter 8), "The Chi-Square Goodness-of-Fit Test": high-level description with examples and discussion of several aspects
  – R. D'Agostino, M. Stephens, Goodness-of-Fit Techniques
    – Chapter 3, "Tests of Chi-Square Type": reviews the theoretical background and looks more generally at chi2 tests, not only the Pearson test
– Papers
  – S. Horn, "Goodness-of-Fit Tests for Discrete Data: A Review and an Application to a Health Impairment Scale"
    – Good discussion of the properties and pros/cons of most goodness-of-fit tests for discrete data; accessible and tutorial-like
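
Appendix: worked example in code
– As a closing illustration, the die example from the slides can be reproduced numerically. The sketch below assumes Python with NumPy and SciPy installed; it computes the Pearson statistic directly from the definition and cross-checks the result against SciPy's built-in chisquare test.

```python
# Pearson chi2 goodness-of-fit test for the die example (120 rolls).
# Assumes numpy and scipy are available; numbers match the slide computations.
import numpy as np
from scipy import stats

observed = np.array([20, 14, 18, 17, 22, 29])  # counts of faces 1..6
n = observed.sum()                             # sample size: 120
p = np.full(6, 1 / 6)                          # H0 (fair die): equiprobable bins
expected = n * p                               # 20 expected per bin

# Pearson statistic: X^2 = sum_i (O_i - E_i)^2 / E_i
x2 = ((observed - expected) ** 2 / expected).sum()

# df = M - 1: no parameters are estimated from the data (case 0 test)
df = len(observed) - 1

# p-value: P(chi2_5 >= X^2), the upper-tail (survival) probability
p_value = stats.chi2.sf(x2, df)

print(f"X^2 = {x2:.2f}, df = {df}, p-value = {p_value:.3f}")
# prints: X^2 = 6.70, df = 5, p-value = 0.244
# p-value > a for a in {0.01, 0.05, 0.1}, so H0 (die is not biased)
# cannot be rejected, as in the slides.

# Cross-check against SciPy's built-in test (same statistic and p-value)
stat, pval = stats.chisquare(observed, expected)
assert abs(stat - x2) < 1e-8 and abs(pval - p_value) < 1e-8
```

– Note that equiprobable bins are trivial here because the hypothesized distribution is already uniform; for a continuous F(), one would pick bin borders at the quantiles F^{-1}(i/M), as discussed in the bin-selection slides.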