Mobile Computing Group
A quick-and-dirty tutorial on the chi2 test for goodness-of-fit testing

Outline
– The presentation follows a pyramid schema:
  – Background concepts
  – Goodness-of-fit (GoF)
  – Chi2 tests for GoF

Background
– Descriptive vs. inferential statistics
  – Descriptive: data are used for descriptive purposes only (tables, graphs, measures of variability, etc.)
  – Inferential: data are used for drawing inferences, making predictions, etc.
– Sample vs. population
  – A sample is drawn from a population that is assumed to have certain characteristics.
  – The sample is often used to make inferences about the population (inferential statistics): hypothesis testing, estimation of population parameters.

Background
– Statistic vs. parameter
  – A statistic is estimated from a sample. It can be used for both descriptive and inferential purposes.
  – A parameter refers to the whole population. A sample statistic is often used to infer a population parameter.
  – Example: the sample mean may be used to infer the population mean (expected value).
– Hypothesis testing
  – A procedure in which sample data are used to evaluate a hypothesis about the population.
  – A hypothesis may refer to several things: properties of a single population, the relation between two populations, etc.
  – Two statistical hypotheses are defined: a null hypothesis H0 and an alternative H1.
  – H0 is often a statement of no effect or no difference.
It is the hypothesis the researcher seeks to reject.

Background
– Inferential statistical test
  – Hypothesis testing is carried out via an inferential statistical test:
    – Sample data are manipulated to yield a test statistic.
    – The obtained value of the test statistic is evaluated against a sampling distribution, i.e., a theoretical probability distribution of the possible values of the test statistic.
    – The theoretical values of the statistic are usually tabulated, letting one assess the statistical significance of the test result.
– Goodness-of-fit is a type of hypothesis testing
  – Devise inferential statistical tests, apply them to the sample, and infer whether a theoretical distribution matches the population distribution.

GoF as hypothesis testing
– Hypothesis H0: the sample is drawn from a theoretical distribution F()
– The sample data are manipulated to derive a test statistic
  – In the case of the chi2 statistic, this includes aggregating the data into bins and some computations.
– The statistic, as computed from the data, is checked against the sampling distribution
  – For the chi2 test, the sampling distribution is the chi2 distribution, hence the name.

Goodness-of-fit
– Statistical tests and statistics: the big picture
  – EDF-based tests, e.g., KS test, Anderson-Darling test
  – Specialized tests, e.g., Shapiro-Wilk test for normality
  – Chi2-type tests
    – Classical chi2 statistics: Pearson chi2 statistic, log-likelihood ratio statistic, modified chi2 statistic
    – Generalized chi2 statistics

Pearson chi2 statistic
– If X1, X2, ..., Xn is the random sample and F() the theoretical distribution under test, the Pearson chi2 statistic is computed as:

    X^2 = Σ_{i=1}^{M} (Oi − Ei)^2 / Ei = Σ_{i=1}^{M} (Ni − n·pi)^2 / (n·pi)

  where
  – M: number of bins
  – Oi (Ni): observed frequency in bin i
  – Ei (= n·pi): expected frequency in bin i according to the theoretical distribution F()
  – n: sample size
  – pi = P(Xj falls in bin i) = ∫_{bin i} dF(x)

Interpretation of chi2 statistic
– Theory says that the Pearson chi2 statistic follows a chi2 distribution, whose degrees of freedom (df) are:
  – M − 1, when the parameters of the fitted distribution are given a priori (case 0 test)
  – Somewhere between M − 1 and M − 1 − q, when the q parameters of the distribution are estimated from the sample data; usually the df for this case are taken to be M − 1 − q
– Having computed the value X^2 of the chi2 statistic, I check the chi2 distribution with M − 1 (or M − 1 − q) df to find:
  – The probability of obtaining a value equal to or greater than the computed value X^2, called the p-value
  – If p < a, where a is the significance level of my test, the hypothesis is rejected; otherwise it is retained.
  – Standard values for a are 0.1, 0.05, 0.01; the smaller a is, the more conservative I am in rejecting the hypothesis H0.

Example
– A die is rolled 120 times: 1 comes up 20 times, 2 comes up 14, 3 comes up 18, 4 comes up 17, 5 comes up 22, and 6 comes up 29 times.
– The question is: "Is the die biased?", or better: "Do these data suggest that the die is biased?"
– Hypothesis H0: the die is not biased
  – According to the null hypothesis, these counts should be distributed uniformly.
  – F(): the discrete uniform distribution

Example – cont.
– Computations:

    Bin              1      2      3      4      5      6     Sum
    Oi              20     14     18     17     22     29     120
    Ei              20     20     20     20     20     20     120
    Oi − Ei          0     −6     −2     −3      2      9       0
    (Oi − Ei)^2      0     36      4      9      4     81
    (Oi − Ei)^2/Ei   0    1.8    0.2   0.45    0.2   4.05    X^2 = 6.7

– Interpretation
  – The distribution of the test statistic has 5 df.
  – The probability of obtaining a value equal to or greater than 6.7 under a chi2 distribution with 5 df (the p-value) is about 0.24, which is greater than a for all a in {0.01, ..., 0.1}.
  – Therefore the hypothesis that the die is not biased cannot be rejected.

Interpretation of Pearson chi2
– Graphical illustration
  [Figure: density f(z) of the chi2 distribution with 5 df. At the 10% significance level, I would reject the hypothesis if the computed X^2 > 9.24, since 10% of the area under the curve lies to the right of 9.24. The computed X^2 = 6.7 corresponds to a p-value of about 0.24. Critical values: 9.24 (a = 0.1), 11.07 (a = 0.05), 15.09 (a = 0.01).]

Properties of Pearson chi2 statistic
– It can be estimated for both discrete and continuous variables
  – Holds for all chi2 statistics.
  – Maximum flexibility, but it fails to make use of all available information for continuous variables.
– It is perhaps the simplest one from a computational point of view.
– As with all chi2 statistics, one needs to define the number and borders of the bins
  – These are generally a function of the sample size and the theoretical distribution under test.

Bin selection
– How many bins, and which ones?
  – Different opinions exist in the literature; there is no rigid proof of optimality.
– There seems to be convergence on the following aspects:
  – Probability of bins: the bins should be chosen equiprobable with respect to the theoretical distribution under test.
  – Minimum expected frequencies npi:
    – (Cramér, 46): npi > 10 for all bins
    – (Cochran, 54): npi > 1 for all bins, npi >= 5 for 80% of the bins
    – (Roscoe and Byars, 71)

Bin selection
– Relevance of the number of bins M to the sample size n
  – (Mann and Wald, 42), (Schorr, 74): for large sample sizes, 1.88·n^(2/5) < M < 3.76·n^(2/5)
  – (Koehler and Larntz, 80): for small sample sizes, M >= 3, n >= 10 and n^2/M >= 10
  – (Roscoe and Byars, 71):
    – Equiprobable bins: n > M when a = 0.01 and a = 0.05
    – Non-equiprobable bins: n > 2M (a = 0.05) and n > 4M (a = 0.01)

Bin selection
– [Table: number of bins vs. sample size according to Mann and Wald]

Bin selection: continuous vs. discrete
– [Figure: for a continuous distribution, the CDF F(x) can be inverted, so equiprobable bins are easy to select; for a discrete distribution (step-function CDF over x = 1, 2, ..., 7), equiprobable bins are less straightforward to define.]

References
– Textbooks
  – D.J. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures
    – Introduction (descriptive vs. inferential statistics, hypothesis testing, concepts and terminology)
    – Test 8 (chapter 8), "The Chi-Square Goodness-of-Fit Test": high-level description with examples and discussion of several aspects
  – R. D'Agostino, M. Stephens, Goodness-of-Fit Techniques
    – Chapter 3, "Tests of Chi-Square Type": reviews the theoretical background and looks more generally at chi2 tests, not only the Pearson test
– Papers
  – S. Horn, "Goodness-of-Fit Tests for Discrete Data: A Review and an Application to a Health Impairment Scale"
    – Good discussion of the properties and pros/cons of most goodness-of-fit tests for discrete data; accessible and tutorial-like
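
Appendix: worked example in code
– As a closing illustration, the die example from the slides can be reproduced numerically. The sketch below assumes Python with NumPy and SciPy installed; it computes the Pearson statistic directly from the definition and cross-checks the result against SciPy's built-in chisquare test.

```python
# Pearson chi2 goodness-of-fit test for the die example (120 rolls).
# Assumes numpy and scipy are available; numbers match the slide computations.
import numpy as np
from scipy import stats

observed = np.array([20, 14, 18, 17, 22, 29])  # counts of faces 1..6
n = observed.sum()                             # sample size: 120
p = np.full(6, 1 / 6)                          # H0 (fair die): equiprobable bins
expected = n * p                               # 20 expected per bin

# Pearson statistic: X^2 = sum_i (O_i - E_i)^2 / E_i
x2 = ((observed - expected) ** 2 / expected).sum()

# df = M - 1: no parameters are estimated from the data (case 0 test)
df = len(observed) - 1

# p-value: P(chi2_5 >= X^2), the upper-tail (survival) probability
p_value = stats.chi2.sf(x2, df)

print(f"X^2 = {x2:.2f}, df = {df}, p-value = {p_value:.3f}")
# prints: X^2 = 6.70, df = 5, p-value = 0.244
# p-value > a for a in {0.01, 0.05, 0.1}, so H0 (die is not biased)
# cannot be rejected, as in the slides.

# Cross-check against SciPy's built-in test (same statistic and p-value)
stat, pval = stats.chisquare(observed, expected)
assert abs(stat - x2) < 1e-8 and abs(pval - p_value) < 1e-8
```

– Note that equiprobable bins are trivial here because the hypothesized distribution is already uniform; for a continuous F(), one would pick bin borders at the quantiles F^{-1}(i/M), as discussed in the bin-selection slides.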