Choosing the Appropriate Statistical Test with your Computer
Volume One: Parametric/Non-Parametric Tests
P.Y. Cheng

Preface

Volumes 1, 2 and 3 of 'Choosing the Appropriate Statistical Test with your Computer' are the newest books I have written so far, and the characteristics of these new books are:

1. They try to help readers choose the appropriate statistical test for their data, which is a very important starting point for their success!
2. As in previous books, we try to skip difficult theories and demonstrate the use of the different tests with clear and typical examples! We hope readers can, from the first reading, follow the examples in this book to choose the appropriate test for their data, even if they don't want to learn too much about the underlying theories immediately!
3. For ordinal and interval data, we compare the parametric tests with their corresponding non-parametric ones one by one, and show when we should use which of them.
4. For categorical data, we try to clarify the very similar and rather confusing types of testing. For example, we remind readers not to be confused by the number of criteria versus the number of samples. We also remind users that the calculators do not differentiate between a test for independence and a test for homogeneity; only we humans know which one we are running!
5. As in other books published previously, we introduce free web calculators corresponding to large statistical packages such as SPSS, so readers can still get the same results even without expensive software!

Acknowledgments

First of all, I would like to thank Prof. C.M. Wong and Dr L.M. Ho, who gave me the courage to start writing. They have been consultants for many staff of the University of Hong Kong and are always helpful to any HKU staff who approach them with statistical problems! I would also like to thank everybody who has contributed to the publishing of this book!
This includes the publishing company (which is not yet known), and the authors of the reference books and internet material that helped me a lot during the writing and checking process! I would like to thank my friends and relatives who have been encouraging me during the publishing of my books, especially my son (Andy) and my wife (Betty), who have shown their patience and understanding while I concentrated on the production of this new book! Lastly, I would like to thank, in advance, any future audience of this book, and hope they can find some useful material to help them solve statistical problems. I also hope they can enjoy reading this book in full color, with hundreds of brilliant pictures and many 'cook book' examples!

Medical Faculty, The University of Hong Kong
Cheng Ping Yuen (Senior Technician)
Bachelor of Life Science (BSc), Napier University, UK
Master of Public Health (MPH), Hong Kong University
Certificate of the Hong Kong Statistical Society (HKSS)
Fellow of the Royal Statistical Society (RSS)
Microsoft Certified Professional (MCP)
Hong Kong Registered Medical Technologist (Class I)
Phone: (852) 9800 7023 / 3917 9417
Email: [email protected]

Contents

1.1 Some basic concepts
  1.1.a Definition of Parametric and Non-Parametric Tests
  1.1.b Type of measurement (Data Type)
  1.1.c Independence of samples
  1.1.d Numbers of samples
  1.1.e Summary of Parametric & Non-Parametric Tests
1.2 The most important distributions for using Parametric Tests
  1.2.a The Normal Distribution
    1.2.a.i Distribution of Sample Means
    1.2.a.ii The standard normal distribution from sample means
    1.2.a.iii The t Distribution from sample means
    1.2.a.iv Testing of Hypothesis (Significance)
  1.2.b The Binomial Distribution
  1.2.c The Poisson Distribution
  1.2.d Before giving up parametric tests
    1.2.d.i The Central Limit Theorem
    1.2.d.ii The Normal approximation to other distributions
    1.2.d.iii Robustness to deviation from distribution assumptions
1.3 Running Parametric Tests vs corresponding Non-Parametric Tests
  1.3.a.i One sample t-Test
  1.3.a.ii Wilcoxon signed rank test (corresponding to the one sample t test)
  1.3.b.i Two Samples t Test
  1.3.b.ii Mann-Whitney Test for two independent samples (corresponding to the parametric two independent samples t test)
  1.3.c.i Paired t-Test
  1.3.c.ii Wilcoxon matched-pairs signed-rank test for paired samples (corresponding to the parametric paired t test)
  1.3.d.i One-way Anova: Completely Randomised Design, Equal Sample Size
  1.3.d.ii Non-parametric independent k samples: Kruskal-Wallis one-way analysis of variance
  1.3.e.i Two Factors Anova (a x b factorial)
  1.3.e.ii Non-parametric paired k samples: Friedman two-way analysis of variance
1.4 Table Form Non-parametric Tests for Categorical Data
  1.4.1 One sample Chi-square Test
    1.4.1.a A Goodness of fit test with one sample
    1.4.1.b A Test of Independence with one sample
      1.4.1.b.i 2 x 2 contingency table for one sample
      1.4.1.b.ii Independence test for a k x k contingency table for one sample
  1.4.2 Two sample Chi-square Test
    1.4.2.a Independent Samples
      1.4.2.a.i Chi-square test for two samples
      1.4.2.a.ii Chi-square test for k > 2 samples
    1.4.2.b Dependent Samples
      1.4.2.b.i McNemar Test for two dependent (paired) samples
      1.4.2.b.ii Cochran Q Test for k dependent samples
Appendix – Installation of free statistical software – Tables

1.1 Some basic concepts

1.1.a Definition of Parametric vs Non-Parametric Tests

Parametric Tests

When applying a Parametric Test in statistics, we assume that the samples come from a population with a well known distribution such as the Normal, Binomial or Poisson Distribution, especially the Normal Distribution. These assumptions support the theories behind the Parametric Tests. If the population is too far away from the assumptions (e.g. Normality, equal variance), the test results might not be valid anymore! They are called Parametric Tests since they are used for detecting population parameters such as the mean, proportion or variance, with the null hypothesis involving them! The most famous parametric tests include the t Test, the Anova Test and linear regression.

Non-Parametric Tests

On the other hand, non-parametric tests do not need the population to have a well known distribution. They might only require the population to, for example, have a continuous distribution, have a median, or be symmetric. Unlike parametric tests, they are usually not used to detect parameters of populations. However, they can be used to detect, for example, whether 2 samples come from the same population, or whether a sample agrees with some hypothesised frequencies. Famous non-parametric tests include table tests such as the Chi-square test and Contingency test, and rank comparing tests such as the Wilcoxon rank sum test, the Mann-Whitney test etc.

** Remark – If the conditions for running parametric tests are fulfilled (e.g. Normality, equal variance, data type…), please always prefer parametric tests to non-parametric tests! Parametric tests utilize more information from the data, so the results are more reliable and convincing, i.e. they are more powerful than the non-parametric ones!
1.1.b Type of measurement (Data Type)

What type of distribution the population could have is, on the other hand, often determined by the type of measurement being taken, and this in turn affects the choice between the parametric and non-parametric tests to be run!

Categorical:
  Nominal – e.g. Red / Blue / Green
  Ordinal – e.g. tastefulness (1 – 10)
Continuous:
  Interval – e.g. height (cm)

1. For nominal data, e.g. color, gender etc., NO parametric test is available! Table form non-parametric tests such as the Chi-square test and Contingency test are used instead!
2. For ordinal data, or for interval data for which a well known distribution cannot be assumed (especially the normal distribution), rank comparing tests such as the Wilcoxon signed rank test, Mann-Whitney test and Kruskal-Wallis test are used for detecting, e.g., whether the samples come from the same population!
3. For interval data fulfilling the population assumptions, e.g. Normality, equal variance etc., the more powerful parametric tests such as the t Test, Anova Test and linear regression should be used.

1.1.c Independence of samples

For BOTH parametric and non-parametric tests, independence determines the choice of test to be used for hypothesis inference.

Independent samples – Observations relate to independent groups of individuals, such as weights from boys and girls.
Dependent samples – Each set of observations is made on the same group of individuals, e.g. blood pressure before and after a certain treatment. The observations are made on the same individuals and usually represent change over time.

1.1.d Numbers of samples

The number of samples and variables also determines which test should be used, BOTH in the parametric and in the non-parametric groups of tests! This is because for > 2 samples we cannot just repeat the 2 sample tests, otherwise the risk of committing a type I error would increase rapidly!
1.1.e Summary of Parametric and Non-Parametric Tests

Tests for independent samples:

  Parametric Tests           Corresponding Non-parametric      Nominal
  (Interval)                 Tests (Ordinal/Interval)          (parametric tests n/a)
  One sample t test          One sample Wilcoxon signed        One sample Chi-square test
                             rank test
  Two samples t test         Two samples Wilcoxon rank sum     Two sample Chi-square test,
                             test, Mann-Whitney test           relative risk, odds ratio
  k samples: One way Anova   k samples: Kruskal-Wallis         k sample Chi-square test
                             one way Anova
  Linear Regression          Non-Parametric Regression

Tests for dependent (paired) samples:

  Parametric Tests           Corresponding Non-parametric      Nominal
  (Interval)                 Tests (Ordinal/Interval)          (parametric tests n/a)
  Two sample Paired t test   2 sample Wilcoxon matched-pairs   Two samples: McNemar Test
                             signed-rank test (many people
                             just call it the Wilcoxon
                             signed-rank test, careful!)
  k samples: Two Way Anova   k samples: Friedman two way       k samples: Cochran Q Test
                             Anova

1.2 The most important distributions for using parametric tests

1.2.a The Normal Distribution

The normal distribution is the most important distribution in Statistics, not only because so many natural phenomena (e.g. weight, height, class marks, IQ scores…) follow this distribution, but also because of the possibility of making use of it to solve problems involving many other statistical distributions! The probability density function of a Normal distribution is:

  f(x) = (1 / (σ√(2π))) e^(−(x − µ)² / (2σ²))

The curve is thus determined by two parameters: 1. the population mean µ, and 2. the population standard deviation σ (or σ², the variance).

1.2.a.i Distribution of Sample Means

If we could always measure every individual of a population, e.g. the heights of all children born in 1995 in the UK, then we might not need to run statistical tests to draw conclusions about them! However, it is usually impossible, or too costly, to make such measurements!
We usually take a sample from the population, run a statistical test with the sample data, and rely on distribution and probability theories to reach a conclusion of whether to accept a hypothesis or not! This is also called inference about the population using a sample. The following is a sample of heights from the population (e.g. children born in the UK in 1995). Imagine we could measure an infinite number of sample means (although, practically, we usually would not do so) and plot the frequency histogram; we would get a distribution of sample means.

1.2.a.ii The standard normal distribution from sample means

If the population is normal and the variance is known, then the random variable

  z = (x̄ − µ) / (σ/√n)

is exactly standard normal (Mean = 0, S.D. = 1), no matter how small the sample size is. All four distributions in the figure are normal distributions, but only the GREEN one is the Standard Normal one, with µ = 0 and σ² = 1 (σ = 1)!

Where:
  x̄ is the mean of the sample
  µ is the mean of the population
  σ is the known standard deviation of the population
  n is the sample size
  s is the standard deviation calculated from the sample

1.2.a.iii The t Distribution from sample means

If the population is normal and the variance is unknown, the random variable

  t = (x̄ − µ) / (s/√n)

has exactly a t-distribution (Mean = 0, S.D. approaching 1 as n increases) with n − 1 degrees of freedom, no matter how small the sample size is. Here s, the sample standard deviation, is used instead of the population standard deviation! Please notice that when the underlying population is normal, we can apply t-distributions for statistical tests no matter how small the sample size (degrees of freedom) is!! A t-distribution is similar to the z-distribution in that they are both symmetric and bell shaped, but the central peak is lower and the two tails are higher! As the df (n − 1) increases, it becomes more and more like a z-distribution!
At df = 120, we might say that there is almost no difference at all.

1.2.a.iv Testing of Hypothesis (Significance)

With the z distribution (Standard Normal Distribution) and the t distribution, under which the areas are well known (either from tables or by using a computer), we can then carry out testing of hypotheses, using samples to make inferences about the underlying population! A probability of 0.05 is usually used as the critical probability for testing of hypotheses!!

Z Distribution (z test):

[Figure: standard normal curve with a rejection area of 2.5% in each tail]

If the mean and the variance of a population are known, then we can run a normal test with a sample using the z distribution (standard normal distribution). For example, an education department wants to know whether the average mark of students in Mathematics this year is the same as in past years (mean = 80 and S.D. = 5). A random sample of 25 students is taken, and their marks are measured, with mean = 83.

Null hypothesis H0: Mean of this year = Mean of past years
Alternative hypothesis Ha: Mean of this year ≠ Mean of past years

  z = (83 − 80) / (5/√25) = 3/1 = 3

z = 3 >> 1.96. The probability of getting z > 1.96 or z < −1.96 by chance only (sampling error) is 0.05. Thus the probability of getting such a high z value of 3 by chance only is << 0.05!! The sample mean is significantly different from the population mean used for the z test! So we reject the Null Hypothesis that the average mark is the same as in past years!

T Distribution (t test):

[Figure: t-distribution curves for df = 12, 24 and ∞, with critical values ±2.179 (df = 12) and ±2.064 (df = 24), and the observed t = 2.5 marked]

Suppose the S.D. of the underlying population of student marks in the section above is unknown, and the sample standard deviation calculated from the sample data is 6 instead of 5; then:

  t = (83 − 80) / (6/√25) = 3/1.2 = 2.5

t = 2.5 >> 2.064 (from the table: df = 24, 5% probability of the same population mean). The probability of getting t > 2.064 or t < −2.064 by chance (sampling error) is 0.05. Thus the probability of getting such a high t value of 2.5 by chance only is << 0.05!!
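The two worked examples above can be checked with a few lines of Python. This is a sketch of my own (the book itself uses tables and SPSS); the function names are mine, and the formulas are exactly the z and t statistics defined in the preceding sections.

```python
import math

def z_statistic(sample_mean, pop_mean, pop_sd, n):
    """z = (x_bar - mu) / (sigma / sqrt(n)), valid when the population SD is known."""
    return (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))

def t_statistic(sample_mean, pop_mean, sample_sd, n):
    """t = (x_bar - mu) / (s / sqrt(n)), used when sigma is unknown; df = n - 1."""
    return (sample_mean - pop_mean) / (sample_sd / math.sqrt(n))

# Worked example from the text: this year's mean mark 83, past years' mean 80
z = z_statistic(83, 80, 5, 25)   # known sigma = 5  -> z = 3.0
t = t_statistic(83, 80, 6, 25)   # sample s = 6     -> t = 2.5, df = 24

# Two-tailed 5% decision: reject H0 if |z| > 1.96, or |t| > 2.064 for df = 24
print(z, abs(z) > 1.96)    # 3.0 True
print(t, abs(t) > 2.064)   # 2.5 True
```

The critical values 1.96 and 2.064 are the same tabled values quoted in the text; a statistical package would report the exact p-values instead.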
The sample mean is significantly different from the population mean used for the t test! So we reject the Null Hypothesis that the average mark is the same as in past years! (Please remember that, to make use of the t-distribution for the calculation of the probability above, we assume that the underlying population is normal; the t-distribution curves are then useful no matter how small the sample size being used!)

One-Tailed Test and Two-Tailed Test

[Figure: (a) a two-tailed test with critical values ±1.96 for a total 5% chance; (b) a one-tailed test with critical value 1.645 for a 5% chance; an observed z = 1.8 is not significant in (a) but significant in (b)]

In (a), we test whether the sample mean is greater than or smaller than the population mean, such that the total probability of getting a z value beyond the critical values is 0.05, i.e. ±1.96, one on each side! The probability on each side is 0.025 only! In (b), we just test whether the sample mean is greater than the population mean, such that the probability of getting a z value beyond the critical value is 0.05, i.e. 1.645 on the right hand extreme only! This also implies that the rejection of the Null Hypothesis (that there is no real difference in means) is easier to achieve, with a double fold of chance!! The one tailed significance = the two tailed significance divided by 2! As for the case in the graph above, the test for the difference between the sample mean and the population mean is not significant in the two-tailed test (z = 1.8 < 1.96), but is significant in the one-tailed test (z = 1.8 > 1.645)!!

Type I Error and Type II Error

We might say that, by default, the type I error is the error we try to avoid first! This is the error of saying that there is a difference between two groups while there is, in fact, none! (Suppose having a difference is a crime; the type I error is the error of sentencing a person for having committed the crime while he, in fact, has not!) If we accept the null hypothesis that there is no difference, then we would not run the risk of committing the type I error.
However, we would then immediately be at risk of committing the type II error, i.e. saying that there is no difference while, in fact, there is! (Saying that a person has not committed a crime while he, in fact, has.)

[Figure: two overlapping curves; the critical t-value (at α/2) for rejecting the null hypothesis that there is no real difference between the two groups (only one curve); the range of t-values that would make us commit the Type II error, with area β; the power 1 − β]

In the graph above, using α/2 as the critical point, we would not reject the null hypothesis that there is no real difference between the 2 populations while t is less than the critical value! We accept the null hypothesis since we don't want to commit the Type I error (sentencing a person for a crime while he is innocent)! We think there is only ONE curve (the red one) existing!! However, if there is, in fact, a real difference between the two populations (two curves existing), then we would have committed the Type II error (letting the accused person go while he has committed the crime)!! The range of t values that would make us commit such a mistake is shown in the graph above! The probability that we would commit such an error is the area represented by β! This probability depends on 'how different' the populations must be before we would say there is a real difference. β is important for the calculation of 'Power' (1 − β) and the sample size N. We will talk about the calculation of Sample Size and Power in later sections!

1.2.b The Binomial Distribution

The binomial distribution describes the behavior of a count variable X if the following conditions apply:

1. The number of observations n is fixed.
2. Each observation is independent.
3. Each observation represents one of two outcomes ("success" or "failure").
4. The probability of "success", p, is the same for each observation.

If these conditions are met, then X has a binomial distribution with parameters n and p, abbreviated B(n, p).
Example: Suppose individuals with a certain gene have a 0.70 probability of eventually contracting a certain disease. If 100 individuals with the gene participate in a lifetime study, then the distribution of the random variable describing the number of individuals who will contract the disease is B(100, 0.7).

Note: The sampling distribution of a count variable is only well described by the binomial distribution in cases where the population size is significantly larger than the sample size. As a general rule, the binomial distribution should not be applied to observations from a simple random sample (SRS) unless the population size is at least 10 times larger than the sample size.

To find probabilities from a binomial distribution, one may either calculate them directly, use a binomial table, or use a computer. The number of sixes rolled by a single die in 20 rolls has a B(20, 1/6) distribution. The probability of rolling more than 2 sixes in 20 rolls, P(X > 2), is equal to 1 − P(X ≤ 2) = 1 − (P(X = 0) + P(X = 1) + P(X = 2)). Using the MINITAB command "cdf" with subcommand "binomial n=20 p=0.166667" gives the cumulative distribution function as follows:

Binomial with n = 20 and p = 0.166667

  x    P(X <= x)
  0    0.0261
  1    0.1304
  2    0.3287
  3    0.5665
  4    0.7687
  5    0.8982
  6    0.9629
  7    0.9887
  8    0.9972
  9    0.9994

The corresponding graphs of the probability density function and cumulative distribution function for the B(20, 1/6) distribution are shown below. Since the probability of 2 or fewer sixes is equal to 0.3287, the probability of rolling more than 2 sixes = 1 − 0.3287 = 0.6713.

Mean and Variance of the Binomial Distribution

The binomial distribution for a random variable X with parameters n and p represents the sum of n independent variables Z, each of which may assume the value 0 or 1. If the probability that each Z variable assumes the value 1 is equal to p, then the mean of each variable is equal to 1·p + 0·(1 − p) = p, and the variance is equal to p(1 − p).
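The dice probabilities above can also be computed directly from the binomial formula, without MINITAB. The following is a small sketch of my own (the function name is mine), reproducing the P(X ≤ 2) = 0.3287 figure from the table:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 1/6                                     # number of sixes in 20 rolls
cdf2 = sum(binom_pmf(k, n, p) for k in range(3))   # P(X <= 2)
print(round(cdf2, 4))        # 0.3287, matching the MINITAB table
print(round(1 - cdf2, 4))    # 0.6713 = P(X > 2)
```

Computing the cumulative probability as a sum of pmf terms mirrors exactly the hand calculation 1 − (P(X = 0) + P(X = 1) + P(X = 2)) described in the text.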
By the addition properties for independent random variables, the mean and variance of the binomial distribution are equal to the sum of the means and variances of the n independent Z variables, so

  µ = np and σ² = np(1 − p)

These definitions are intuitively logical. Imagine, for example, 8 flips of a coin. If the coin is fair, then p = 0.5. One would expect the mean number of heads to be half the flips, or np = 8 × 0.5 = 4. The variance is equal to np(1 − p) = 8 × 0.5 × 0.5 = 2.

1.2.c The Poisson Distribution

The Poisson distribution arises when you count a number of events across time or over an area. You should think about the Poisson distribution for any situation that involves counting events. Some examples are: the number of Emergency Department visits by an infant during the first year of life; the number of pollen spores that impact on a slide in a pollen counting machine; the number of incidents of apnea and bradycardia in a pre-term infant; the number of white blood cells found in a cubic centimeter of blood. Sometimes, you will see the count represented as a rate, such as the number of deaths per year due to horse kicks, or the number of defects per square yard.

The Poisson distribution depends on a single parameter λ. The probability that the Poisson random variable equals k is

  P(X = k) = e^(−λ) λ^k / k!

for any value of k from 0 all the way up to infinity. Although there is no theoretical upper bound for the Poisson distribution, in practice these probabilities get small enough to be negligible when k is very large. Exactly how large k needs to be before the probabilities become negligible depends entirely on the value of λ. Here are some tables of probabilities for small values of λ:

  λ = 0.1        λ = 0.5        λ = 1.5
  k  P(X = k)    k  P(X = k)    k  P(X = k)
  0  0.905       0  0.607       0  0.223
  1  0.090       1  0.303       1  0.335
  2  0.005       2  0.076       2  0.251
  3  0.000       3  0.013       3  0.126
                 4  0.002       4  0.047
                 5  0.000       5  0.014
                                6  0.004
                                7  0.001
                                8  0.000

For larger values of λ it is easier to display the probabilities in a graph. The plot shown above illustrates Poisson probabilities for λ = 2.5.
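The λ tables above can be reproduced directly from the pmf formula P(X = k) = e^(−λ) λ^k / k!. A minimal sketch of my own (the function name is mine), checked against the λ = 0.5 column:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) = e^(-lam) * lam^k / k! for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

# Reproduce the lambda = 0.5 column of the table above
for k in range(6):
    print(k, round(poisson_pmf(k, 0.5), 3))
# 0 0.607, 1 0.303, 2 0.076, 3 0.013, 4 0.002, 5 0.0
```

Summing k · P(X = k) over all k recovers the mean λ, consistent with the statement below that the mean and variance of a Poisson distribution are both λ.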
The above plot illustrates Poisson probabilities for λ = 7.5, and this plot illustrates Poisson probabilities for λ = 15. The mean of the Poisson distribution is λ. For the Poisson distribution, the variance, λ, is the same as the mean, so the standard deviation is √λ.

Binomial Distribution vs Poisson Distribution

The Poisson Distribution can often be regarded as a Binomial Distribution with n very large and p very small! In fact, sometimes we can approximate such a Binomial Distribution with a Poisson one to save much labor of computation!

1.2.d Before giving up parametric tests

Although choosing parametric or non-parametric tests depends mainly on whether the population fulfills the assumptions of the underlying distributions, especially the Normal or t distribution, we should try our best to stick to using parametric tests even if the population might deviate from the assumptions! This is because they are more powerful and the results are more reliable and convincing.

1.2.d.i The Central Limit Theorem

When the population is not normal and the variance may or may not be known, the random variable

  z = (x̄ − µ) / (σ/√n)    or    z = (x̄ − µ) / (s/√n)

(the one used depending on whether the variance is known or unknown) is approximately standard normal if the sample size is sufficiently large (at least thirty). This is also called the Central Limit Theorem! Please notice that the sample size n must be equal to or greater than 30 for applying this Central Limit Theorem; with that, we can apply the z distribution approximation for running statistical tests!
Where:
  x̄ is the mean of the sample
  µ is the mean of the population
  σ is the known standard deviation of the population
  n is the sample size
  s is the standard deviation calculated from the sample

1.2.d.ii The Normal approximation to other distributions

(Strictly speaking, this is just an outcome of the Central Limit Theorem.)

Approximation of a Binomial distribution by a Normal distribution

As stated previously, the normal distribution is so important not only because it describes so many natural phenomena in the natural world, but also because it can be used for solving many other problems by superimposing it on other distributions! Let's look at the following binomial distributions: before any medicine is available for a disease and patients just recover by bed rest, the chance of recovery is 0.4 (and of failing to recover, 0.6).

As you can see, for these binomial distributions, as N increases, the histogram looks more and more like a normal distribution! In fact, the closer p is to 0.5, the smaller N needs to be for a normal distribution to superimpose well on the histogram of the binomial distribution! Generally, we can carry out the approximation when Np and Nq are both greater than 5! Put another way, take N = 5/p rounded up or N = 5/q rounded up, whichever is larger!! For example, for p = 0.4 and q = 0.6, we have 5/0.4 = 12.5, rounded up to N = 13, for carrying out the approximation! If the superimposition of the normal distribution on the binomial one is acceptable, the latter can be treated as a normal distribution with mean = Np (= 20 × 0.4 = 8 in this case) and Std. Dev. = √(Npq) (= √(20 × 0.4 × 0.6) = 2.19 in this case)!
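The Np > 5 and Nq > 5 rule of thumb above, and the resulting normal parameters Np and √(Npq), can be sketched in a few lines. This is an illustration of my own (function names are mine), using the recovery example with p = 0.4:

```python
import math

def normal_approx_ok(n, p):
    """Rule of thumb from the text: both np and n(1-p) should exceed 5."""
    return n * p > 5 and n * (1 - p) > 5

def binom_as_normal(n, p):
    """Parameters of the approximating normal: mean = np, sd = sqrt(np(1-p))."""
    return n * p, math.sqrt(n * p * (1 - p))

# Recovery example: p = 0.4, so N must be at least ceil(5/0.4) = 13
print(normal_approx_ok(12, 0.4))     # False (np = 4.8, just short of 5)
print(normal_approx_ok(13, 0.4))     # True  (np = 5.2, nq = 7.8)
mean, sd = binom_as_normal(20, 0.4)  # mean = 8.0, sd ~ 2.19, as in the text
```

The binding constraint is always the rarer outcome: whichever of p and q is smaller forces the larger minimum N, which is why the text's example uses 5/0.4 = 12.5 rather than 5/0.6.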
Approximation of a Poisson distribution by a Normal distribution

A Poisson(λ) distribution has mean λ and standard deviation √λ. From the Central Limit Theorem, as λ gets large, the Poisson distribution can be approximated by a Normal distribution with mean λ and standard deviation √λ.

Figure 1: Example of the normal estimate of a Poisson where λ = 40. Figure 2: Example of the normal estimate of a Poisson where λ = 2. The normal approximation works nicely in Figure 1 for large λ (40). Figure 2 shows that the normal approximation is much less useful for smaller λ (2): it is now quite inaccurate, assigning considerable probability to negative values, and it fails to reflect the asymmetric nature of the distribution.

1.2.d.iii Robustness to deviation from distribution assumptions

T test

Overall, the two sample t-test is reasonably robust to symmetric non-normality (the true type I error rate is affected somewhat by kurtosis, and the power is impacted mostly by that). When the two samples are mildly skewed in the same direction, the one-tailed t-test is no longer unbiased. The t-statistic is skewed oppositely to the distribution, and has much more power if the test is in one direction than if it is in the other. If they are skewed in opposite directions, the type I error rate can be heavily affected. Heavy skewness can have bigger impacts, but generally speaking, moderate skewness with a two-tailed test isn't too bad if you don't mind your test, in essence, allocating more of its power to one direction than the other. In short, the two-tailed, two-sample t-test is reasonably robust to these kinds of things if you can tolerate some impact on the significance level and some mild bias.

Anova and linear regression

Anova is fairly robust to non-normality, but is probably more sensitive to inequality of variance!
However, if the sample sizes of the different groups are equal, or nearly equal, you can choose Tukey's Post Hoc Test for finding inter-group differences, since it is robust to deviation from equal variance when sample sizes are equal! Please notice that Anova is just a case of linear regression, and a t test is just an Anova of 2 groups! As with the t test, it will probably be fine in most cases if your data are somewhat symmetrical and skewness does not occur in opposite directions for your study groups.

1.3 Running Parametric Tests vs corresponding Non-Parametric Tests

1.3.a.i One sample t-Test

The vendor of a new medicine claimed that it can yield a depression score below 70 after being taken for 2 weeks by patients! A sample of 25 patients has been chosen to take the new medicine, and the depression score is taken after two weeks. (The underlying population is the unlimited collection of such samples of 25 patients!)

1) The resulting scores are entered in SPSS.
2) Analyze, Compare Means, One Sample T Test…
3) In the window that appears, move Dep_Score to Test Variable(s) and input 70 as the Test Value.
4) Click 'OK'.

Results for the One Sample T Test in SPSS:

4a) One-Sample Statistics results: the sample size N = 25, the sample mean x̄ = 66.36, the sample Std. Dev. s = 4.748, the standard error of the mean sm = s/√N = 4.748/√25, and the value (70) to be compared.

4b) One-Sample Test results: the calculated t value is −3.833, with degrees of freedom = N − 1 = 24. The probability of getting t > 3.833 or t < −3.833 by chance is < 0.05, so we reject H0 that the means are equal! The difference between the sample mean (66.36) and the hypothesised mean (70) is −3.64, and the 95% confidence interval for the mean lies between 70 − 5.60 = 64.4 and 70 − 1.68 = 68.3.

Conclusion: The 25 patients taking the new medicine have depression scores with mean 66.36 and SD 4.748. A t value of −3.833 is obtained, which is significant even for a 2 tailed test! We can reject the Null Hypothesis H0 that the sample mean is the same as the comparison value of 70!
The vendor might be right that their new medicine can bring patients to a depression score different from 70!

One-Tailed Test for the example above:

The computer output above is a 2 tailed test output! For a 2 tailed test:

  H0: The population mean = 70
  Ha: The population mean ≠ 70

In a 1 tailed test:

  H0: The population mean >= 70
  Ha: The population mean < 70

We just want to decide whether the population mean of the depression score is less than 70, without considering whether it might be greater than 70 on average. NOTHING NEEDS TO BE CHANGED IN THE RUNNING OF THE 2 TAILED TEST ABOVE! What you need to do is know how to interpret the same computer output. For rejecting the Null Hypothesis:

Step 1: t must be negative in this case, where the negative tail is being tested, and must be positive if the positive tail is being tested!
Step 2: divide the significance by 2, as the 1 tailed test produces a probability 2 times smaller than the 2 tailed test for the same t value!

One Sample t-Test using Excel (with the PHStat4 Add-In)

(For installation of the PHStat4 Add-In, please refer to the Appendix: Installation of Free Software.)

1) PHStat, One-Sample Tests, t-test for the Mean, sigma unknown…
2) Input the information for running a 2 tailed test.
3) The results are nearly the same as when using SPSS above; differences might be due to rounding off (t = 3.833 when using SPSS above). The p-value is 2 times the p-value of the 1 tailed test.
4) Input the information for running a 1 tailed test.
5) The results are nearly the same as when using SPSS above; differences might be due to rounding off (t = 3.833 when using SPSS above). Note the shifting of the critical value towards the central axis!
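The SPSS and PHStat results above can also be checked by hand from the reported summary statistics alone. A minimal sketch of my own (the function name is mine; the raw scores are not reproduced in this excerpt, so only the summary figures are used):

```python
import math

def one_sample_t(sample_mean, sample_sd, n, test_value):
    """t = (x_bar - test value) / (s / sqrt(n)), df = n - 1, as in the SPSS output."""
    se = sample_sd / math.sqrt(n)   # standard error of the mean
    return (sample_mean - test_value) / se

# Summary statistics from the SPSS output above
t = one_sample_t(66.36, 4.748, 25, 70)
print(round(t, 3))   # -3.833, matching SPSS
```

Since |t| = 3.833 exceeds the two-tailed 5% critical value for df = 24 (2.064, from the tables quoted earlier), the null hypothesis is rejected, as in the SPSS conclusion.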
1.3.a.ii Wilcoxon signed rank test (corresponding to the one sample t test)

The only assumptions for running the Wilcoxon signed rank test are:
1) The population is continuous
2) The population has a median
3) The population is symmetric about its median

Running the Wilcoxon signed rank test in SPSS. For example, we have the following data set:
9, 11, 18, 16, 17, 21, 12, 10, 11, 11, 19, 16, 12, 13, 20, 14, 15, 13

We want to test the hypotheses:
H0: Median = 16
Ha: Median =\= 16

1) Enter the data in SPSS.
2) Analyze, Nonparametric Tests, One Sample…
3) Click 'Assign Manually'.
4) Move Data to 'Continuous'.
5) Click 'OK'.
6) You will return to the previous window; select the test again.
7) Select 'Automatically compare…' and click 'Settings'.
8) Select 'Choose Tests', 'Customize tests', 'Compare median… (Wilcoxon signed-rank test)', and input 16.
9) Choose 'Test Options' and set the 'Significance level' and 'Confidence interval'; the defaults need no change.
10) Results: P = 0.066 > 0.05, so we cannot reject the hypothesis that the population median equals 16. But 0.066/2 = 0.033 < 0.05, which is significant for a 1-tailed test.

Calculation by hand if SPSS is not available:
i) Subtracting 16 from each observation, we get -7, -5, 2, 0, 1, 5, -4, -6, -5, -5, 3, 0, -4, -3, 4, -2, -1, -3
ii) Discarding the zeros and ranking the rest in order of increasing absolute magnitude, we have 1, -1, 2, -2, 3, -3, -3, -4, -4, 4, -5, 5, -5, -5, -6, -7
iii) The '1's occupy ranks 1 and 2; the mean of these ranks is 1.5, so each '1' is given a rank of 1.5.
iv) The '2's occupy ranks 3 and 4; the mean of these ranks is 3.5, so each '2' is given a rank of 3.5.
v) In a similar manner, each '3' receives a rank of 6; each '4' a rank of 9; each '5' a rank of 12.5; the '6' is assigned rank 15; and the '-7' rank 16.
vi) The sequence of signed ranks is now 1.5, -1.5, 3.5, -3.5, 6, -6, -6, -9, -9, 9, -12.5, 12.5, -12.5, -12.5, -15, -16 ('-' indicates a negative rank).
vii) The positive rank sum = 32.5; the negative rank sum = 103.5. The smaller rank sum is taken as T = 32.5.
viii) In the table for the Wilcoxon signed-rank test, under α = 0.05 (two-tailed) with n = number of non-zero ranks = 16 (18 - 2), the critical value is 29. Since 32.5 is not less than or equal to 29, we cannot reject the null hypothesis that the median is 16. For a one-tailed test at α = 0.05 the critical value is 35, and since 32.5 <= 35 we can reject the null hypothesis, agreeing with the halved p-value above.

1.3.b.i Two Samples T Test

This is also called the 'independent t test', meaning that the two samples do not affect each other's measured values. Two samples, one from each of two populations, are compared. The basic question is: how different must the two means be for the calculated t to exceed the critical value, i.e. for such a difference to arise by chance with probability less than 5% (one-tailed) or 2.5% in each tail (two-tailed)?

For example, 25 patients are chosen to take a traditional medicine for treating depression (Group 1, control), and another 25 patients are chosen to take the new medicine (Group 2). The Depression Score is taken after 2 weeks and entered into SPSS.

1) In SPSS, Analyze, Compare Means, Independent Samples T Test…
2) Move 'Dep_Score' into 'Test Variable(s)' and Group into 'Grouping Variable'.
3) Click 'Define Groups'.
4) Input the values 1 and 2 for the definition of the Groups.
5) Click 'OK'.
6) SPSS output:
7a) Group Statistics: Group 2 has a lower mean and Std. Deviation than Group 1.
7b) T Test results, Part A - test of the 'equal variance' assumption and the t values obtained:

Levene's Test for Equality of Variances: F = 7.441, Sig. = .009
t-test for Equality of Means: equal variances assumed, t = .674, df = 48; equal variances not assumed, t = .674, df = 38.928

Levene's test, an analysis-of-variance-type test, has been run to test the hypothesis that the variances of the two groups are equal.
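Because Levene's test rejects equal variances here, the 'equal variances not assumed' (Welch) row is the one to read, and that row can be reproduced from the group summary statistics alone. A sketch with SciPy, assuming SciPy is available and taking the group means and SDs from the PHStat output later in this section:

```python
from scipy import stats

# Group summary statistics: traditional medicine (Group 1) vs new medicine (Group 2)
t, p = stats.ttest_ind_from_stats(
    mean1=67.6, std1=8.0052, nobs1=25,   # Group 1
    mean2=66.36, std2=4.7511, nobs2=25,  # Group 2
    equal_var=False,                     # Welch's t test: separate variances
)

print(round(t, 3))  # 0.666
print(round(p, 3))  # 0.509: cannot reject equal population means
```

Setting `equal_var=True` instead would reproduce the pooled-variance ('equal variances assumed') row.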
The larger the value of F, the higher the chance that the variances differ. Sig. = 0.009 < 0.05 means the variances of the two groups are significantly different. This implies that equal variances cannot be assumed, and we should use the 'equal variances not assumed' figures (t, df, …) instead: there the t value and df are calculated under the assumption that the variances of the 2 groups differ, using separate variances instead of a pooled variance.

Part B - Significance, Mean Difference, and Std. Error Difference:

Equal variances assumed: Sig. (2-tailed) = .503, Mean Difference = 1.260, Std. Error Difference = 1.868
Equal variances not assumed: Sig. (2-tailed) = .504, Mean Difference = 1.260, Std. Error Difference = 1.868

The 2-tailed probability 0.504 > 0.05, so we cannot reject the null hypothesis that the two population means are equal. The Mean Difference is the sample mean of Group 1 minus the sample mean of Group 2 = 67.62 - 66.36.

Part C - 95% Confidence Interval of the Difference:

Equal variances assumed: Lower = -2.497, Upper = 5.017
Equal variances not assumed: Lower = -2.520, Upper = 5.039

We have 95% confidence that the difference between the two groups (mean of Group 1 minus mean of Group 2) falls between -2.520 and 5.039.

One-Tailed Test: as stated previously, there is no need to change anything in the running of the test. Just make sure which tail, positive or negative, you are testing, and check whether the result becomes significant once the probability is halved. For example, if we just want to test whether the new medicine produces a lower depression score in Group 2, which is the same as asking whether Group 1 produces a higher score than Group 2, then we are testing the positive tail of (mean of Group 1 - mean of Group 2) on the same output (Levene's F = 7.441, Sig. = .009; t = .674 with df = 48 assuming equal variances, or df = 38.928 not assuming them):

Step 1: the t value must be positive, since the positive tail is being tested; here it is.
Step 2: 0.504/2 = 0.252, still > 0.05, so even the 1-tailed test finds no significant difference between the 2 groups.

Running the 2 samples test with Excel
(For installing the 'Data Analysis Tools' Add-In of Excel, please refer to the Appendix: Installation of Free Software.)

1) DATA, Data Analysis.
2) Test the assumption of equal variance (F-Test Two-Sample for Variances).
3) Input the required information.
4) The results show that the equal variance assumption does not hold: 0.006 << 0.05, so the variances of the two groups are significantly different.
5) Choose 't-test: Two-Sample Assuming Unequal Variances'.
6) Input the required information. The 'Hypothesized Mean Difference' is a very useful item: if we just want to test whether there is any difference between the two group means, leave it blank (meaning zero); if we want to test whether the 2 groups differ by a particular value, fill in that value.
7) The results are similar to those from SPSS; the small differences are due to rounding. Two-tailed and one-tailed test results from the PHStat4 Excel Add-In:

Population 1 Sample: Size 25, Mean 67.6, Standard Deviation 8.0052
Population 2 Sample: Size 25, Mean 66.36, Standard Deviation 4.7511

Intermediate Calculations: Numerator of Degrees of Freedom 12.0150; Denominator of Degrees of Freedom 0.3077; Total Degrees of Freedom 39.0416; Degrees of Freedom 39; Standard Error 1.8618; Difference in Sample Means 1.2400; Separate-Variance t Test Statistic 0.6660

Two-Tail Test: Lower Critical Value -2.0227; Upper Critical Value 2.0227; p-Value 0.5093; do not reject the null hypothesis (same conclusion as the two-tailed test in SPSS).

Upper-Tail Test (same sample statistics and intermediate calculations): Upper Critical Value 1.6849; p-Value 0.2547; do not reject the null hypothesis.

1.3.b.ii Mann-Whitney Test for two independent samples (corresponding to the parametric two independent samples t test; you should get identical results from the Wilcoxon rank sum test for 2 samples)

The assumptions for running the Mann-Whitney test are:
1) The populations are continuous
2) The populations have medians
3) The populations must have the same form

For example, we have 2 samples with scores that are heavily skewed, and we don't want to use a t test for comparing them. We prefer a non-parametric test, the Mann-Whitney Test.
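The idea can be sketched in Python with SciPy before walking through SPSS. The book's scores appear only in the screenshots, so the two skewed samples below are hypothetical; the sketch also shows why Mann-Whitney and the Wilcoxon rank sum test always agree — the two possible U statistics are tied together by a fixed identity:

```python
from scipy import stats

# Hypothetical heavily skewed samples (the book's data are in screenshots only)
x = [1, 2, 2, 3, 4, 10, 12, 15]
y = [3, 5, 6, 6, 8, 9, 20, 25]

res = stats.mannwhitneyu(x, y, alternative="two-sided")
print(res.statistic, round(res.pvalue, 3))

# U1 + U2 = n1 * n2 always holds, which is why Mann-Whitney and the
# Wilcoxon rank sum test are the same test in disguise.
u_other = stats.mannwhitneyu(y, x, alternative="two-sided").statistic
print(res.statistic + u_other == len(x) * len(y))  # True
```

With ties present, SciPy switches to the tie-corrected normal approximation, which is also what SPSS reports as the asymptotic significance.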
Solving the problem with SPSS:

1) Input the data.
2) Analyze, Nonparametric Tests, Independent Samples…
3) Choose the defaults.
4) Choose Tests, check 'Mann-Whitney U (2 samples)'.
5) In Test Options, keep the defaults if no change is wanted, and click 'Run'.
6) Results: 0.349 > 0.05, so we cannot reject the hypothesis that the two samples come from populations with the same median, i.e. that they come from two identical populations.

Wilcoxon Rank Sum Test for two independent samples with PHStat4 (expected to give the same results as the Mann-Whitney Test above)

It is interesting to know that the Mann-Whitney test above and the Wilcoxon Rank Sum Test for two independent samples give the same p value; it makes no difference which test you run. So, if you don't have SPSS, you can use the free Excel Add-In PHStat4 to run a Wilcoxon test and reach the same conclusion.

1) Run PHStat by clicking the icon on the Desktop, and Enable Macros.
2) Input the data.
3) ADD-IN, PHStat, Two-Sample Tests…, Wilcoxon Rank Sum Test…
4) Input the required information and click 'OK'.
5) Results: very close to the p-value of 0.349 found in SPSS with the Mann-Whitney Test; the difference is due to rounding.

1.3.c.i Paired T-Test

A Paired T-Test is used when, for example, the same subjects are measured at two time points, or pairs of twins are studied in an experiment, etc. The key point is that we assume a particular relationship exists such that the measured data values are not independent of each other. Simply speaking, it is an analysis of the differences within each pair of data:

t = (d̄ - 0) / (s_d / √n_d)

where d̄ is the mean of the paired differences, s_d their standard deviation, and n_d the number of pairs. For example, a company running two shops wants to know whether there is a real difference in income between them.

Using SPSS:
1) Analyze, Compare Means, Paired-Samples T Test…
2) Put Shop_1 under Variable1 and Shop_2 under Variable2.
3) Click 'OK'.
4) Output, Paired Samples Test, Paired Differences (Shop_1 - Shop_2):

Mean = -190.900; Std. Deviation = 265.334; Std. Error Mean = 83.906; 95% Confidence Interval of the Difference: Lower = -380.709, Upper = -1.091
t = -2.275, df = 9, Sig. (2-tailed) = .049

Here the Mean is the mean of Shop 1 minus the mean of Shop 2, and the Std. Deviation and Std. Error are those of the paired differences; we have 95% confidence that the true difference falls within the interval shown. Since t = -2.275 lies beyond the critical values ±2.262 (df = 9), Sig. < 0.05 and the result is significant: reject the hypothesis that the difference between the incomes of the two shops is 0. Shop 2 has an income different from Shop 1.

One-Tailed Test:
Step 1: make sure the +/- sign of t agrees with the hypothesis you want to test, i.e. Shop_1 > Shop_2 or Shop_2 > Shop_1. If it is opposite, there is no need to test any further; if it agrees, go to Step 2.
Step 2: divide the 2-tailed probability by two and see whether it is < 0.05. 0.049/2 = 0.0245 < 0.05, so reject the null hypothesis and accept the alternative hypothesis that Shop 1 has a lower income than Shop 2.

Solved by Excel:
1) Data, Data Analysis.
2) Choose 't-test: Paired Two Sample for Means'.
3) Input the required fields. Besides zero, we can test the hypothesis that variable 1 differs from variable 2 by a certain value.
4) The results are almost the same as in SPSS, and the output of the Excel Add-In PHStat4 agrees as well.

1.3.c.ii Wilcoxon matched-pairs signed-rank test for paired samples (corresponding to the parametric paired t test)

Running the Wilcoxon matched-pairs signed-rank test with SPSS: suppose ten students have taken a mathematics training course. Tests are given before and after the course and the scores recorded.
The marks were previously found to be heavily skewed and not suitable for testing with a t test.

1) Input the data.
2) Analyze, Nonparametric Tests, Related Samples.
3) Choose 'Assign Manually'.
4) Review the fields shown.
5) Click OK.
6) Go back to the previous window.
7) Click 'Fields'.
8) Review the assignment.
9) Click 'Settings'.
10) Under Choose Tests, check 'Wilcoxon matched-pairs…'.
11) In Test Options, use the defaults if no change is wanted.
12) Results: 0.404 > 0.05, so the result is NOT significant for this two-tailed test. Accept the null hypothesis that the medians, and thus the two populations, are identical.

IF YOU DON'T HAVE SPSS: please find a free calculator on the internet for this Wilcoxon matched-pairs signed-rank test (by using a Google search etc.), e.g.:
http://www.socscistatistics.com/tests/signedranks/

1) Check that the test is the Wilcoxon matched-pairs test, and click 'Take me to the calculator'.
2) Enter the data.
3) Submit the data.
4) Read the result.
5) Ignore 'Result 1 - Z-value', because N is too small (9). Refer to 'Result 2 - W-value': the W-value is 15.5, and the result is NOT significant at p <= 0.05 for a two-tailed test.

1.3.d.i One-way Anova: Completely Randomised Design - Equal Sample Size

• This is one of the most common and basic experimental designs.
• The equal sample sizes make the Anova test very robust to 'not too serious' violations of the assumption of NORMALITY.
• Moreover, the Tukey Post Hoc test is appropriate for this equal-size case, since it is robust to violations of the assumption of EQUAL VARIANCE. (Post Hoc means 'unplanned before the experiment'.)

- 5 cages each with 4 rats were used for a 'Completely Randomized Design' experiment.
- The 20 rats were assigned to the treatments A (control), B, C and D completely at random, e.g. by a blinded researcher drawing animal numbers, without regard to cage boundaries.
- The response was a 'score' after the 4 'treatments', e.g. the growth in body weight over a certain period of time.
- Please find any significant differences in this score caused by the four treatments.

Solved by using SPSS ver. 20.0

1. Input or 'cut and paste' the data.
2. Click 'Variable View'.
3. Edit the variable names, decimal places, etc.
4. 'Analyze', 'Compare Means', 'One-way Anova'.
5. Move Score to 'Dependent List' and 'Treatment' to 'Factor'.
6. Enter 'Options' and 'Post Hoc' to choose the related options and tests. 'Contrasts' is left alone for discussion in Book 2.
7. In 'Options', choose e.g. 'Descriptive' and 'Homogeneity of variance test'. ('Welch' is useful when the equal-variance assumption is in doubt or the population distribution is unknown.)
8. In 'Post Hoc', choose e.g. LSD, Tukey and Scheffe, listed in increasing order of 'toughness' for detecting significant differences among groups.
9. Click 'OK' to run.
10. SPSS result output:
A - The command lines for all tests are shown first.
B - The 'Descriptives' section gives the basic statistics, such as the mean and Std. Deviation of each sample.
C - The 'Sig.' of 0.969 > 0.05 in the Test of Homogeneity of Variance indicates that the assumption of equal variance is acceptable. (However, with so few subjects, the chance of failing this test is very low. We had better either trust the population to be normal, or use a variance-robust alternative such as Welch's test.)
D - For the overall Anova, a 'Sig.' of 0.000 < 0.05 implies that the null hypothesis of equal means is rejected. At least one group difference exists among the groups, and multiple comparisons should be used to find out where it is.

With df (3, 16), F = MS(between groups)/MS(within groups) = 13.594. The critical F value for df (3, 16) from the table is 3.239 at α = 0.05. Since F > critical F, at least two groups differ in their group means. (Although the computer can do everything for you, you should also understand this important step used in various Anova analyses.)

11.
See the results of the Tukey Test as an example, particularly suitable in this equal sample size case. A 'Sig.' < 0.05 indicates a group difference; the results show that Groups 1 and 2 differ from Groups 3 and 4.
12. Imagined picture: Groups 1 and 2 sit apart from Groups 3 and 4.

Solved by using Excel (CRD - Equal Sample Size)

a) Tukey's Test
1. Click 'Data', 'Data Analysis'.
2. Choose 'Anova: Single Factor' (= One-way Anova).
3. Input the required fields and data area, and click 'OK'.
4. Overall Anova results: the p-value of 0.00015 < 0.05 means we can reject the null hypothesis; go ahead with finding the inter-group differences.
5. Find the value of 'q': for k = 4 and df = 16, q = 4.05 (α = 0.05).
6. Find the 'Critical Difference' and the significant differences.
7. Counter-check with the SPSS figures: although the same pattern of significant differences is found, that alone is not enough to say the results are exactly the same; we can also counter-check the 95% Confidence Intervals from SPSS.
8. Compare the Tukey's Test results in Excel and in SPSS.

1.3.d.ii Non-parametric independent k samples - Kruskal-Wallis one-way analysis of variance

Suppose the government wants to compare the numbers of traffic accidents in 3 districts on 10 Sundays. Experience shows that such data can be quite skewed, not normally distributed at all.

1) Enter the data.
2) Analyze, Nonparametric Tests, Independent Samples…
3) Click 'Assign Manually…'.
4) Move Traff_Acc to Continuous and District to Nominal.
5) Choose the default: 'Automatically compare…'.
6) In Fields, move Traff_Acc to Test Fields and District to Groups.
7) In Settings, Choose Tests, choose 'Kruskal-Wallis 1-way ANOVA (k samples)'.
8) In Test Options, choose the significance level and confidence interval.
9) Results: Sig. = 0.047 < 0.05. Reject the null hypothesis that the frequency of traffic accidents has been the same in the 3 districts over the 10 Sundays sampled.

Solved by the PHStat4 Excel Add-In:
1) Click the icon on the desktop etc.
2) Click 'Enable Macros'.
3) Add-In, PHStat, Multiple-Sample Tests, Kruskal-Wallis Rank Test…
4) Input the information required.
5) Results: significance the same as in SPSS above.

1.3.e.i Two Factors Anova (a x b factorial)

- Each subject is randomly assigned to one 'combination' of the two factors under investigation; both factors are of interest.
- The calculation is the same as in the previous block design; no Post Hoc test is run, and all interaction is assumed to be due to chance only.
- 5 cages each with 4 rats were used for a 'Completely Randomized Two-Factor (a x b factorial) Without Replication' experiment.
- Each of the 20 rats was randomly assigned to one of the 'combinations' of factor one and factor two (Diet A, B, C, D x Lighting 1, 2, 3, 4, 5 = 20). The response is a 'score' after the twenty 'treatments', e.g. the growth in weight within a certain period of time.
- Please find any significant differences caused by the two factors (Diet A is the control diet; Lighting 1 is the control lighting).

Solved by SPSS
1. Input the data into SPSS.
2. Click 'Variable View'.
3. Give the variable names, decimal places, etc.
4. 'Analyze', 'General Linear Model', 'Univariate'.
5. Put 'Weight_Increase' under 'Dependent Variable', 'Diet' under 'Fixed Factor(s)' and 'Light' under 'Random Factor(s)'. Post Hoc tests will not run even if chosen.
6. The 'Sig.' for Diet is 0.006 < 0.05, indicating that Diet is a factor causing a significant difference between at least two of the treatment groups. The 'Sig.' for Light is 0.369 > 0.05, so we cannot reject the null hypothesis that the group means under this factor are equal. The Post Hoc Test won't run, and there is no way to test whether the interaction is significant.
7. You might ask: why not put 'Light' into 'Fixed Factor(s)', since we are interested in it also?
8.
Although 'Light' seems to be a fixed factor whose effect the researcher is interested in, assigning it as a 'fixed factor' in SPSS would not give meaningful values of F. This is because, in a situation without any replication, where the interaction is attributed to chance only, 'Light' should be treated more like a 'blocking factor' in a Blocked Design than a 'fixed factor' in an a x b factorial design with replication. Please refer to Part 4 for more details about 'fixed and random factors'.

Solved by Excel:
1. Click 'Data', 'Data Analysis'.
2. Choose 'Anova: Two-Factor Without Replication'.
3. Input the requested fields and select the data area.
4. Anova results: although laid out upside down, the overall Anova results are the same as from SPSS, indicating that 'Diet' is a significant factor while 'Lighting' is not; at least two groups differ in their means. 'Lighting x Diet' corresponds to the interaction in SPSS, but with no replication its 'Error' and 'df' are 0 and are not shown, so again there is no way to run Post Hoc tests.

1.3.e.ii Non-parametric paired k samples - Friedman two-way analysis of variance

Suppose six patients have been handled by three different doctors over the past 9 years, each doctor for about 3 years, and the average frequency of admission to hospital has been recorded. Please compare the admission frequencies and see whether they are the same for all 3 doctors.

1) Input the data.
2) Analyze, Nonparametric Tests, Related Samples…
3) Click 'Assign Manually…'.
4) Move the fields as required.
5) Again, Nonparametric Tests, Related Samples.
6) Use the defaults.
7) Move the fields as required.
8) Under Choose Tests, choose 'Friedman's 2-way ANOVA by ranks (k samples)'.
9) Under Test Options, choose the significance level and confidence interval.
10) Results: Sig. = 0.016 < 0.05. Reject the null hypothesis that the numbers of admissions to hospital under the 3 doctors are the same.
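The same test runs in one line with SciPy, one argument per doctor (matched across patients). The book's admission counts appear only in the screenshots, so the data below are hypothetical:

```python
from scipy import stats

# Hypothetical yearly admission counts for 6 patients under each of 3 doctors
# (the book's actual data appear only in the screenshots)
doctor_a = [2, 4, 3, 5, 2, 4]
doctor_b = [3, 5, 4, 6, 4, 5]
doctor_c = [1, 2, 2, 3, 1, 2]

stat, p = stats.friedmanchisquare(doctor_a, doctor_b, doctor_c)
print(round(stat, 3), round(p, 4))  # 12.0 0.0025: reject equal admission levels
```

In this made-up data set every patient ranks the doctors in the same order, so the Friedman chi-square reaches its maximum of 12 for n = 6 blocks and k = 3 treatments.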
Solved by XLStat:
1) Run XLStat2015.
2) Proceed with the trial version.
3) Input the data.
4) Choose the Friedman test.
5) Review the dialog.
6) Input the required information.
7) In 'Options', input the significance level etc.
8) Continue.
9) Results: same as when using SPSS above.

1.4 Table Form Non-parametric Tests for Categorical Data

For categorical data there are no parametric tests, nor rank-comparing non-parametric tests, to be used. There is mostly no choice but to use table-form non-parametric tests on the data obtained. The most important table-form test is the Chi-square (χ2) test, which uses the χ2 statistic and the Chi-square distribution for interpreting categorical data. Commonly used Chi-square tests include the Goodness-of-fit test, which tests whether a sample follows some theoretical probability distribution, the test of homogeneity, and the test of independence. The number of samples taken, and whether the data are paired or not, are important for choosing which version of the test to use.

1.4.1 One sample Chi-square Test
1.4.1.a A Goodness-of-fit test with one sample

A salesperson contacts 5 potential customers every day, and over 100 days she keeps a record of the sales made:

Number of sales: 0, 1, 2, 3, 4, 5
Frequency: 15, 21, 40, 14, 6, 4

Her boss feels that the chance of making a sale with a call is about 35%, and a binomial distribution b(y; 5, 0.35) gives the following probabilities:

y: 0, 1, 2, 3, 4, 5
p(y): 0.1160, 0.3124, 0.3364, 0.1812, 0.0487, 0.0053
e = 100 p(y): 11.60, 31.24, 33.64, 18.12, 4.87, 0.53

Solved by SPSS:
1) Data input: 15 cases of '0', 21 cases of '1', 40 cases of '2', and so on.
2) Analyze, Nonparametric Tests, Legacy Dialogs, Chi-square.
3) Input the Expected Values, since the categories are not 'All categories equal'.
4) Input the Expected Values until finished.
5) Run the test: χ2 = 10.4108, matching the hand calculation, with significance < 0.05, so the null hypothesis that the selling follows the binomial distribution b(y; 5, 0.35) is rejected.
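The hand value of 10.4108 can be reproduced with SciPy, provided the two sparsest categories (expected counts 4.87 and 0.53) are pooled, the standard practice when expected counts fall below 5. A sketch assuming that pooling:

```python
from scipy import stats

# Observed sales counts for y = 0..5, with y = 4 and y = 5 pooled
observed = [15, 21, 40, 14, 6 + 4]
# Expected counts 100 * p(y) under b(y; 5, 0.35), last two pooled
expected = [11.60, 31.24, 33.64, 18.12, 4.87 + 0.53]

stat, p = stats.chisquare(observed, f_exp=expected)  # df = 5 - 1 = 4
print(round(stat, 4))  # 10.4108
print(p < 0.05)        # True: reject b(y; 5, 0.35)
```

Note that `scipy.stats.chisquare` requires the observed and expected totals to match (here both are 100), and uses df = number of categories minus 1 by default, which is correct here because p = 0.35 was specified in advance rather than estimated from the data.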
1.4.1.b A Test of Independence with one sample
1.4.1.b.i 2 x 2 contingency table for one sample

This might be one of the most commonly used tests for categorical data in daily life. The expected values are determined from the marginal totals under the assumption of independence. (Please don't mix up the 'number of samples' with the 'number of criteria' in a sample! One sample can have more than one criterion, forming a 2 x 2, 2 x k or k x k contingency table. With more than one sample, we take different samples from different populations or groups, which can also be handled with a single contingency table.)

A football coach wants to know whether winning or losing a game is independent of whether the game is played at home or away, so he runs a test of independence with:
H0: Winning is independent of where the game is played
Ha: Winning is dependent on where the game is played

Checking the results of the past 30 years, the observed values are:

Won: Home 97, Away 69, Total 166
Lost: Home 42, Away 83, Total 125
Total: Home 139, Away 152, Total 291

From the marginal totals, the expected values are:

Won: Home 79.3, Away 86.7
Lost: Home 59.7, Away 65.3

Since the calculated χ2 exceeds χ2(0.05, 1) = 3.841, the null hypothesis is rejected, and there is evidence that winning and losing depend on where the game is played.

Solved by Free Web Tools: there are many free Chi-square calculators on the internet that can produce the answer above, e.g.
http://www.socscistatistics.com/tests (enter the Home/Away x Won/Lost counts).

1.4.1.b.ii Independence test for a k x k contingency table with one sample

For example, we have a sample of people with different hair and eye colours, and we want to test whether the two criteria are independent. The observed and (bracketed) expected values are:

Eye \ Hair: Red, Brown, Black, Total
Blue: 12 (8.58), 24 (27.82), 16 (15.6), 52
Golden: 18 (13.70), 39 (44.41), 26 (24.9), 83
Black: 36 (43.73), 151 (141.78), 88 (79.5), 265
Total: 66, 214, 120, 400

H0: Eye colour and hair colour are independent of each other
Ha: Eye colour and hair colour are dependent on each other

Again this can be run at http://www.socscistatistics.com/tests. Please notice that in a test for independence these marginal totals are known only after taking the single sample, whereas in a test for homogeneity the sample sizes are fixed by design before the samples are taken!

Result: we cannot reject the null hypothesis that eye colour and hair colour are independent of each other.

1.4.2 Two sample Chi-square Test
1.4.2.a Independent Samples
1.4.2.a.i Chi-square test for two samples

Sometimes we want to know whether 2 or more samples come from the same population, and we may not even know the distribution of that population. For example, a doctor wants to know whether the male/female proportions in disease A and disease B are the same:

Male: Disease A 32, Disease B 28, Total 60
Female: Disease A 18, Disease B 22, Total 40
Total: Disease A 50, Disease B 50, Total 100

Solved by Free Web Tools (http://www.socscistatistics.com/tests). ** Please notice that the web calculator cannot differentiate whether you are calculating a one-sample 2 x 2 table for independence or a two-sample test for homogeneity; only we humans know that!

Fisher exact probability test: the Fisher exact test is more accurate than the Chi-square test when the sample size is small, especially when one of the expected values is less than 5. We will not discuss the underlying theory of this test here, which involves concepts such as the
hypergeometric distribution; we simply show how to get the correct probability (i.e. the chance that the observed table arises by chance) using the following free web tool:
http://www.danielsoper.com/statcalc3/calc.aspx?id=29

Suppose we have the following 2 x 2 table (with marginal totals):

2, 5 | 7
3, 1 | 4
Totals: 5, 6 | 11

1.4.2.a.ii Chi-square test for k > 2 samples

** Please notice, again, that the web calculator cannot differentiate whether you are calculating a one-sample k x k table for independence or a k-sample test for homogeneity; only we humans know that!

For example, a wine company wants to learn how a new brand of wine is favoured by drinkers of different countries. 100 Chinese, 100 Americans and 100 Europeans are invited to a tasting study. Tastefulness is ranked 1 to 4, with 1 the most favoured:

Tastefulness: 1, 2, 3, 4, Total
Chinese: 42, 26, 19, 13, 100
American: 55, 21, 14, 10, 100
Europeans: 38, 30, 22, 10, 100
Total: 135, 77, 55, 33, 300

Solved by Free Web Tools, using http://www.socscistatistics.com/tests

If you are confused about why the ordinal data above are tested by a chi-square test instead of a rank-comparing test such as Kruskal-Wallis, we can try both on a reduced data set (for convenience we have lessened the data):

Tastefulness: 1, 2, 3, 4, Total
Chinese: 4, 3, 3, 10, 20
American: 3, 5, 8, 4, 20
Europeans: 5, 4, 5, 6, 20
Total: 12, 12, 16, 20, 60

If we run the Kruskal-Wallis one-way test in SPSS, a similar, but lower, significance is found compared with the 0.404 from the chi-square test. This is reasonable, since achieving significance with this test is stricter than with the chi-square test for homogeneity above.

1.4.2.b Dependent Samples
1.4.2.b.i McNemar Test for two dependent (paired) samples

For example, in a study, a test is performed before treatment and after treatment in 20 patients. The results of the test are coded 0 and 1. Is there a significant change in the test results before and after treatment?
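The McNemar test depends only on the two discordant counts, i.e. the patients whose code changed between the two occasions. A sketch of both the exact and the continuity-corrected versions, with hypothetical counts since the book's raw 0/1 codes appear only in the screenshots:

```python
from scipy import stats

# Hypothetical discordant pairs: b patients changed 0 -> 1, c changed 1 -> 0
b, c = 8, 2

# Exact McNemar test: under H0 the discordant pairs split 50/50
p_exact = min(1.0, 2 * stats.binom.cdf(min(b, c), b + c, 0.5))
print(p_exact)  # 0.109375

# Chi-square version with continuity correction
mcnemar_chi2 = (abs(b - c) - 1) ** 2 / (b + c)
p_approx = stats.chi2.sf(mcnemar_chi2, df=1)
print(mcnemar_chi2, round(p_approx, 3))  # 2.5 0.114
```

Here both versions agree: with only 10 discordant pairs the change before and after treatment is not significant at the 0.05 level.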
Solved by Free Web Tools, using: http://vassarstats.net/propcorr.html

1.4.2.b.ii Cochran Q Test for k dependent samples

Cochran's Q test is an extension of the McNemar test for related samples that provides a method for testing for differences between three or more matched sets of frequencies or proportions.

Example: 12 subjects are asked to perform 3 tasks. The outcome of each task is a dichotomous value, success or failure, coded 1 for success and 0 for failure. In the example, subject 1 was successful in task 2 but failed tasks 1 and 3. Please run a Cochran Q test to check whether the success rates of the 3 tasks are the same.

Solved by SPSS:
1) Input the data.
2) Check that the data type is Numeric.
3) Analyze, Nonparametric Tests, Related Samples.
4) Click 'Fields'.
5) Choose the fields.
6) In Settings, choose 'Cochran's Q (k samples)' etc., and click 'Run'.
7) Results: the significance is 0.013 < 0.05, so we can reject the null hypothesis that the 3 tasks have the same success proportion.

Appendix - Installation of free software - Tables

Activation of the Excel 'Analysis ToolPak' Add-In (Office 2013):
1. Click 'File'.
2. Click 'Options'.
3. Click 'Add-Ins'.
4. Click 'Analysis ToolPak', 'Go'.
5. Check 'Analysis ToolPak', click 'OK'.
6. 'Data', 'Data Analysis'.
7. The Analysis Tools are ready.
8. Try any of the tests; all seems fine.

Installation of the PHStat4 Excel Add-In:
1) Go to: http://users.business.uconn.edu/rjantzen/phstatinstall.htm
2) Save as PHStat_4.0 on e.g. the Desktop.
3) Open the saved folder and check the contents.
4) Create a PHStat shortcut on the Desktop by copying its icon.
5) Double-click the icon on the Desktop and click 'Enable Macros'.
6) Click Add-in, PHStat; the installation of PHStat4 is successful.
Installation of G Power:
1) Download G Power from: http://www.psycho.uni‐duesseldorf.de/abteilungen/aap/gpower3/
2) Open the downloaded file.
3) Click 'GPowerSetup'.
4) Click 'Next' through the installation steps.
5) Click 'Next' after finishing.
6) Use the 'Shortcut' on the Desktop.

Installation of XLStat2015:
1) Go to http://www.xlstat.com/en to download the software.
2) Select Windows, Mac, etc.
3) Save the installation file.
4) Select where to save it, e.g. the Desktop.
5) The installation file is downloaded.
6) Click 'Run'.
7) Select the language.
8) Agree to the licence.
9) Select 'Complete', 'Next'…
10) The files are installed.
11) Finish.
12) The XLStat2015 icon appears.
13) Click the icon.
14) Choose 'Trial version'.
15) The program is ready.