Non-Parametric Statistics
A Presentation by Rob McMullen for AP Statistics

What are Non-Parametric Statistics?

Non-parametric statistics are a branch of statistics that helps statisticians with a problem occurring in parametric statistics. In order to understand what non-parametric statistics are, it is first necessary to know what parametric statistics are.

What are Parametric Statistics?

In AP Statistics, when we refer to a distribution we often make certain assumptions about it that enable us to work with it. One thing that helps us with this is the Central Limit Theorem (CLT), which allows us to assume that many sampling distributions are approximately normal. The theorem tells us that for any population distribution with a finite mean and variance, the sampling distribution of the sample mean is approximately normal, provided the sample size is large enough.

When are Parametric Statistics not useful?

When we do significance tests, we rely on the assumption that the test statistic follows the t-distribution or the z-distribution, depending on the situation. When this assumption is not true, none of these tests, which are called “parametric statistical inference tests,” are reliable. Everything we have done in AP Stats has been in the field of “parametric statistics.”

Why does lack of normality cause problems?

When we calculate the p-value for an inference test, we find the probability of getting a result at least as extreme as the one observed if sampling variability alone were at work. Basically, we are trying to see if a recorded value could have occurred by chance and chance alone. When we look for a p-value, we are assuming that the means of all samples of the given sample size are normally distributed around the population mean. This is why the test statistic, which is the number of standard errors the sample mean lies from the hypothesized population mean, can be used. Therefore, without normality, these tests cannot give a trustworthy p-value.

What are Non-Parametric Statistics?
The way in which statisticians deal with this problem of parametric statistics is the field of non-parametric statistics. These are tests that can be done without the assumption of normality, approximate normality, or symmetry. These tests do not require a mean and standard deviation. Since a standard deviation describes spread well only for roughly symmetric distributions, it is not useful for many distributions anyway.

What is different about Non-Parametric Statistics?

- Sometimes statisticians use what is called “ordinal” data. This data is obtained by taking the raw data and giving each observation a rank. These ranks are then used to create test statistics.
- In non-parametric statistics, one deals with the median rather than the mean. Since a mean can be easily influenced by outliers or skewness, and we are not assuming normality, a mean no longer makes sense. The median is another measure of location, which makes more sense in a non-parametric test. The median is considered the center of a distribution.

Tests for non-parametric statistics are similar to the tests covered in AP Stats, but each is slightly different. The following table shows how some of the tests match up.

| Parametric Test | Goal for Parametric Test | Non-Parametric Test | Goal for Non-Parametric Test |
|---|---|---|---|
| Two Sample T-Test | To see if two samples have identical population means | Wilcoxon Rank-Sum Test | To see if two samples have identical population medians |
| One Sample T-Test | To test a hypothesis about the mean of the population a sample was taken from | Wilcoxon Signed Ranks Test | To test a hypothesis about the median of the population a sample was taken from |
| Chi-Squared Test for Goodness of Fit | To see if a sample fits a theoretical distribution, such as the normal curve | Kolmogorov-Smirnov Test | To see if a sample could have come from a certain distribution |
| ANOVA | To see if two or more sample means are significantly different | Kruskal-Wallis Test | To test if two or more sample medians are significantly different |

ANOVA

- What is an ANOVA?
- When are ANOVAs useful?
- How does one carry out an ANOVA?

ANOVA: What is an ANOVA?

Since ANOVAs were not covered in AP Stats, I will now explain them. An ANOVA is a way to compare multiple sample means to see if they are significantly different. The name is an abbreviation of what the test does: ANalysis Of VAriance = ANOVA. An ANOVA looks at the variation between the sample means and decides whether it is large enough to be significant. This can be done to compare two or more samples.

ANOVA: When are ANOVAs useful?

An ANOVA can be used when one wants to compare any number of samples. This test can be done to see if many samples could have come from the same population. This test can also tell you about the differences between two or more areas. For example, if a survey is conducted in many different towns, you can see if their average responses differ significantly. Similarly, you can take samples of plant growth in different climates, in different soils, or with different treatments. In all cases, an ANOVA can be used to see if the means vary significantly.

ANOVA: How does one carry out an ANOVA?
An ANOVA is conducted by first putting all the samples into one large sample. The standard deviation of this combined sample is then found, and called σ. Next, the value for the range of variation in sample means is found. It is equal to σ / √N, where N is the number of observations in each sample. The difference between each pair of sample means is then found, which is the variation of the means. If any one of these differences is greater than the range of variation, then those two means are significantly different from each other. Depending on your goal, this may cause you to reject your null hypothesis. This is a simplified version of the procedure: a formal ANOVA makes the same kind of comparison with an F statistic, the ratio of the variation between the sample means to the variation within the samples.

EXAMPLE

Now that I have explained the background principles of non-parametric statistics, I will carry out an example of one of the tests. I have chosen the Wilcoxon Rank-Sum Test (also called the Wilcoxon-Mann-Whitney Test) because it is the most commonly used test.

The Wilcoxon Rank-Sum Test

The Wilcoxon Rank-Sum Test is used in place of the two-sample t-test when the populations being compared cannot be assumed to be normal. This test requires two samples, of sample sizes n1 and n2. The test is carried out as follows. Each step is followed by its application to an example data set.

1: The first step in this procedure is to collect two samples.

Sample 1: {3,2,12,9,13,7,9,11,4,5,6} n1=11
Sample 2: {1,8,4,15,12,6,10,14,3,3} n2=10

2: The second step is to combine the two samples into one large sample. Simply take all the data values from each sample and make one large group. Make sure to keep track of which sample each value came from, as the data will have to be separated back into the original samples later.

Combined sample size: n1+n2 = 11+10 = 21

{3,2,12,9,13,7,9,11,4,5,6} and {1,8,4,15,12,6,10,14,3,3} becomes:
{3,2,12,9,13,7,9,11,4,5,6,1,8,4,15,12,6,10,14,3,3}

3: Once all the data is in one sample, the data must be put into order by size. The data should go from smallest to largest.

{3,2,12,9,13,7,9,11,4,5,6,1,8,4,15,12,6,10,14,3,3}
In order is:
{1,2,3,3,3,4,4,5,6,6,7,8,9,9,10,11,12,12,13,14,15}

4: Each data value is given a rank based on size. If two or more data values are equal, each receives the average of the ranks they occupy. This step is when the raw data becomes ordinal data, or ranked data.

Combined sample in order (sample size 21), with each data value ranked 1-21:

RANK:     1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20   21
RAW DATA: 1    2    3    3    3    4    4    5    6    6    7    8    9    9    10   11   12   12   13   14   15

When two or more data values are tied, their ranks are averaged. For example, the three 3s occupy ranks 3, 4, and 5, so each receives rank 4. Therefore, the data becomes:

RANK:     1    2    4    4    4    6.5  6.5  8    9.5  9.5  11   12   13.5 13.5 15   16   17.5 17.5 19   20   21
RAW DATA: 1    2    3    3    3    4    4    5    6    6    7    8    9    9    10   11   12   12   13   14   15

5: The data are then put back into their original sampling groups as ranked data.

Original Sample 1: {3,2,12,9,13,7,9,11,4,5,6}
Ranked Sample 1: {4,2,17.5,13.5,19,11,13.5,16,6.5,8,9.5}

Original Sample 2: {1,8,4,15,12,6,10,14,3,3}
Ranked Sample 2: {1,12,6.5,21,17.5,9.5,15,20,4,4}

6: The sum of the ranks is taken for each sample. This is the test statistic.

Ranked Sample 1: {4,2,17.5,13.5,19,11,13.5,16,6.5,8,9.5}
Sum of sample 1: 120.5

Ranked Sample 2: {1,12,6.5,21,17.5,9.5,15,20,4,4}
Sum of sample 2: 110.5

As a check, the two sums add up to 231, which is the sum of the ranks 1 through 21.

The Wilcoxon Rank-Sum Test

SUMMARY:
1: Two samples are taken.
2: The samples are combined to make one distribution of sample size (n1+n2).
3: The data are put into order, based on size.
4: Each data value is given a rank based on size. If two or more data values are equal, each receives the average of the ranks they occupy.
5: The data are then put back into their original sampling groups as ranked data.
6: The sum of the ranks is taken for each sample. This is the test statistic.

Non-Parametric Statistics

This concludes my presentation. Are there any topics which have been covered that are not clear, which you would like to see again?

- Wilcoxon Rank-Sum Test explanation/example
- Explanation of an ANOVA
- Introduction to Non-Parametric Statistics
- Chart comparing Significance Tests

THANK YOU

I would like to thank you for taking the time to view this presentation. If you have any questions regarding this topic, you may email me at [email protected]. I hope that this has been informative and that you now clearly understand what non-parametric statistics are.
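
For readers who want to check the Wilcoxon Rank-Sum example by computer, the six steps can be sketched in Python roughly as follows. This is a minimal sketch using only the standard library, not part of the original presentation; the function names `average_ranks` and `rank_sums` are made up for illustration. It stops at the rank sums, as the presentation does, and does not compute a p-value.

```python
from collections import defaultdict


def average_ranks(values):
    """Map each distinct value to its rank in the sorted data.

    Tied values share the average of the positions they occupy
    (steps 3 and 4 of the procedure).
    """
    positions = defaultdict(list)
    for position, value in enumerate(sorted(values), start=1):
        positions[value].append(position)
    return {value: sum(p) / len(p) for value, p in positions.items()}


def rank_sums(sample_1, sample_2):
    """Return the sum of ranks for each sample (steps 2, 5 and 6)."""
    # Step 2: pool the samples; the original lists are kept untouched,
    # so the values can be separated again afterwards.
    rank_of = average_ranks(sample_1 + sample_2)
    # Steps 5 and 6: look up each value's rank and add them up per sample.
    sum_1 = sum(rank_of[x] for x in sample_1)
    sum_2 = sum(rank_of[x] for x in sample_2)
    return sum_1, sum_2


# Step 1: the two samples from the example.
sample_1 = [3, 2, 12, 9, 13, 7, 9, 11, 4, 5, 6]
sample_2 = [1, 8, 4, 15, 12, 6, 10, 14, 3, 3]

w1, w2 = rank_sums(sample_1, sample_2)
print(w1, w2)  # 120.5 110.5
```

The two printed sums match the hand calculation in step 6, and together they equal 21 × 22 / 2 = 231, the total of the ranks 1 through 21.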
