Diversity and Distribution of Species

Calculating the index of diversity for species in a sample and comparing the distribution of species between samples from two sites

Core Quantitative concepts and skills
Statistics: Simpson’s index of diversity and the chi-squared test

Prepared for SSAC by *David McAvity – The Evergreen State College*
© The Washington Center for Improving the Quality of Undergraduate Education. All rights reserved. *2007*

Measure of Diversity

There are a number of different methods for determining the diversity of species. The key idea is to balance two important components of diversity. One is the richness of the sample and the other is the evenness of the sample. The richness of a sample is the number of different types of organisms (species, or other category) present in the sample. The evenness refers to the balance between the numbers of each type. The following example illustrates these concepts.

Consider the following data giving the number of flowers of each type found in a 10 square meter plot from each of three different lawns, A, B and C.

| lawn flower | daisy | dandelion | clover | buttercup | thistle | total |
|---|---|---|---|---|---|---|
| plot A | 10 | 20 | 19 | 8 | 3 | 60 |
| plot B | 35 | 42 | 23 | 0 | 0 | 100 |
| plot C | 47 | 4 | 6 | 1 | 2 | 60 |

Plots A and C are both richer than plot B because they have five species present and plot B has only three. However, A is more even than C, because the numbers of organisms are more evenly distributed between the species in A. Plot C has a much larger proportion of daisies and very few of the others. Notice that plot B actually has more organisms in total.

Diversity Index

One way to quantify the diversity in a sample is to think about the likelihood of getting two of the same species if you removed two organisms at random from the sample. If the sample is very diverse you would be unlikely to get two of the same type of organism. In plot C from the previous example you would be very likely to get two daisies if you randomly chose two, whereas in plot A this would be less likely.
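To put numbers on the daisy comparison between plots A and C, here is a minimal Python sketch. It assumes the two flowers are drawn without replacement; the function name `prob_two_same` is a hypothetical helper chosen for illustration, and the counts are the daisy counts and plot totals from the lawn data.

```python
from fractions import Fraction


def prob_two_same(n, total):
    """Probability that two organisms drawn at random, without replacement,
    both belong to a type that has n members out of `total` organisms."""
    return Fraction(n, total) * Fraction(n - 1, total - 1)


# Daisy counts and plot totals taken from the lawn data.
p_daisies_a = prob_two_same(10, 60)  # plot A: 10 daisies out of 60 flowers
p_daisies_c = prob_two_same(47, 60)  # plot C: 47 daisies out of 60 flowers

print(f"Plot A: {float(p_daisies_a):.3f}")  # Plot A: 0.025
print(f"Plot C: {float(p_daisies_c):.3f}")  # Plot C: 0.611
```

Under these assumptions, drawing two daisies is roughly 24 times more likely in plot C than in plot A, which matches the intuition that plot C is less diverse.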
Simpson’s index of diversity is the probability of getting two different species when picking two organisms from your sample. This is one minus the probability of getting two of the same kind. For example, suppose you have 4 red and 3 blue balls in a bag. The probability of getting two reds when drawing out two is $\frac{4}{7}\cdot\frac{3}{6}$ and the probability of getting two blues is $\frac{3}{7}\cdot\frac{2}{6}$. Notice that each of these has the form

$$\frac{n}{N}\cdot\frac{n-1}{N-1} = \frac{n(n-1)}{N(N-1)}$$

where $n$ is the number of balls of a particular color and $N$ is the total number of balls. The probability of getting two of a kind is the sum of these terms for each type of ball. This can be expressed as

$$\sum \frac{n(n-1)}{N(N-1)}.$$

So we define Simpson’s index of diversity to be

$$D = 1 - \sum \frac{n(n-1)}{N(N-1)}.$$

Calculating Simpson’s Index of Diversity

Simpson’s index of diversity is

$$D = 1 - \sum \frac{n(n-1)}{N(N-1)}$$

where $n$ is the number of organisms of each type and $N$ is the total number of organisms. Since this is a probability, the values should range between 0 and 1, with 1 being the most diverse and 0 being the least.

Use a spreadsheet to calculate Simpson’s index of diversity using the data from Plot A, shown in yellow cells below. You should enter formulas to duplicate the results in the peach cells.

| lawn flower | daisy | dandelion | clover | buttercup | thistle | total | Index of Diversity |
|---|---|---|---|---|---|---|---|
| n | 10 | 20 | 19 | 8 | 3 | 60 | 0.75 |
| n(n-1) | 90 | 380 | 342 | 56 | 6 | 874 | |

Legend: yellow = cell with a number in it; peach = cell with a formula in it.

Repeat this calculation for the other two samples of data in the previous example to see if this measure of diversity fits our intuition about what diversity is.

Chi-squared test for the distribution of species

Even when two sites have a similar diversity, the species may have different relative abundances. For example, if we have two bags, one with 3 red balls and 8 blue balls and another with 8 red balls and 3 blue balls, the diversity of the two bags will be the same, but the distribution is different.
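The Plot A calculation, and the claim that the two bags have the same diversity, can be checked outside a spreadsheet. The following Python sketch is one way to do it; the function name `simpson_index` is an illustrative choice rather than part of the original module, and the counts are the ones given in the text.

```python
def simpson_index(counts):
    """Simpson's index of diversity: D = 1 - sum(n(n-1)) / (N(N-1)),
    where each n is the count of one type and N is the overall total."""
    total = sum(counts)
    if total < 2:
        raise ValueError("need at least two organisms to draw a pair")
    same_pairs = sum(n * (n - 1) for n in counts)  # sum of n(n-1) over all types
    return 1 - same_pairs / (total * (total - 1))


# Plot A: daisy, dandelion, clover, buttercup, thistle.
d_a = simpson_index([10, 20, 19, 8, 3])
print(round(d_a, 2))  # 0.75, matching the spreadsheet result

# The two bags: swapping the red and blue counts leaves the index unchanged.
bag_1 = simpson_index([3, 8])
bag_2 = simpson_index([8, 3])
print(round(bag_1, 3), round(bag_2, 3))  # 0.436 0.436
```

Because the index only sums $n(n-1)$ over the types, it cannot tell which type is the common one. That is why a separate test is needed to compare distributions.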
To test if two sites have a significantly different distribution of species we do a chi-squared test. For this example we will compare plots A and B. The first thing to do is to group the data into categories so that the frequency of each type is greater than or equal to 5.

| lawn flower | plot A | plot B |
|---|---|---|
| daisy | 10 | 35 |
| dandelion | 20 | 42 |
| clover | 19 | 23 |
| buttercup | 8 | 0 |
| thistle | 3 | 0 |
| total | 60 | 100 |

Grouping clover, buttercup and thistle into a single "other" category gives:

| lawn flower | plot A | plot B |
|---|---|---|
| daisy | 10 | 35 |
| dandelion | 20 | 42 |
| other | 30 | 23 |
| total | 60 | 100 |

Chi-squared test for the distribution of species

Now we form the row totals to find out how many of each species are present in both plots together. To see if there is a difference in the distribution of species between the two plots we calculate the chi-squared statistic with the null hypothesis that there is no difference. If there were no difference we would expect the distribution in each plot to be the same as the overall distribution. In the example below the proportion of daisies in the overall total is 45/160, or about 28%. This means we would expect about 28% of the flowers in Plot A to be daisies and about 28% of the flowers in Plot B to be daisies. Using the unrounded proportion, 45/160 of 60 is about 16.9 and 45/160 of 100 is about 28.1. In general the entry in any cell in the table of expected frequencies is (row total)(column total)/(grand total).

Complete the expected frequency table shown below by finding formulas for each of the peach cells.

Observed

| lawn flower | plot A | plot B | Total |
|---|---|---|---|
| daisy | 10 | 35 | 45 |
| dandelion | 20 | 42 | 62 |
| other | 30 | 23 | 53 |
| total | 60 | 100 | 160 |

Expected

| lawn flower | plot A | plot B | Total |
|---|---|---|---|
| daisy | 16.875 | 28.125 | 45 |
| dandelion | 23.25 | 38.75 | 62 |
| other | 19.875 | 33.125 | 53 |
| total | 60 | 100 | 160 |

Calculating chi-squared

Once you have your expected frequencies you calculate chi-squared using the formula

$$\chi^2 = \sum \frac{(O-E)^2}{E}$$

where $O$ is the observed frequency of each cell and $E$ is the expected frequency. Once you have your chi-squared value you need to calculate the degrees of freedom.
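The expected-frequency rule and the chi-squared sum can also be sketched in Python as a check on the spreadsheet formulas. This is only an illustration: the names `expected_frequencies` and `chi_squared` are hypothetical helpers, and the data are the grouped counts for plots A and B.

```python
# Grouped observed counts as (plot A, plot B) for each category.
observed = {
    "daisy": (10, 35),
    "dandelion": (20, 42),
    "other": (30, 23),
}


def expected_frequencies(table):
    """Expected count for each cell: (row total)(column total)/(grand total)."""
    col_totals = [sum(col) for col in zip(*table.values())]
    grand_total = sum(col_totals)
    return {
        name: tuple(sum(row) * col / grand_total for col in col_totals)
        for name, row in table.items()
    }


def chi_squared(table):
    """Sum of (O - E)^2 / E over every cell of the table."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for name in table
        for o, e in zip(table[name], expected[name])
    )


expected = expected_frequencies(observed)
chi2 = chi_squared(observed)
print(expected["daisy"])  # (16.875, 28.125)
print(round(chi2, 2))     # 13.46
```

The same two functions can be reused for the comparison of plots A and C by replacing the counts in `observed`, after grouping categories so that every frequency is at least 5.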
In a contingency table the degrees of freedom are given by $(N-1)(M-1)$, where $N$ is the number of rows and $M$ is the number of columns. In our example there are 3 rows and 2 columns, so the degrees of freedom are 2. Finally we compare our value of chi-squared to the critical value at the 0.05 level of significance. If our value is less than the critical value we cannot reject the null hypothesis, i.e. the two sites have a similar distribution. However, if our value of chi-squared is greater than the critical value we can reject the null hypothesis, i.e. we can say the sites have a different distribution. Critical values of chi-squared are given in the table at the end of this section.

In each of the cells below calculate $(O-E)^2/E$, then find the grand total, which is the value of chi-squared.

Chi-squared

| lawn flower | plot A | plot B | Total |
|---|---|---|---|
| daisy | 2.8009 | 1.6806 | 4.4815 |
| dandelion | 0.4543 | 0.2726 | 0.7269 |
| other | 5.158 | 3.0948 | 8.2528 |
| total | 8.4132 | 5.0479 | 13.461 |

Here the grand total is 13.46. The critical value of chi-squared with two degrees of freedom at the 0.05 level of significance is 5.99. Since our value is greater, we can say that the two plots have a significantly different distribution at that level of significance.

Now compare plot A and plot C in the same way.

Critical Values of Chi-squared

Chi-Square Table

| df | 0.050 | 0.010 | 0.001 |
|---|---|---|---|
| 1 | 3.84146 | 6.63490 | 10.828 |
| 2 | 5.99147 | 9.21034 | 13.816 |
| 3 | 7.81473 | 11.3449 | 16.266 |
| 4 | 9.48773 | 13.2767 | 18.467 |
| 5 | 11.0705 | 15.0863 | 20.515 |
| 6 | 12.5916 | 16.8119 | 22.458 |
| 7 | 14.0671 | 18.4753 | 24.322 |
| 8 | 15.5073 | 20.0902 | 26.125 |
| 9 | 16.9190 | 21.6660 | 27.877 |
| 10 | 18.3070 | 23.2093 | 29.588 |
| 11 | 19.6751 | 24.7250 | 31.264 |
| 12 | 21.0261 | 26.2170 | 32.909 |
| 13 | 22.3621 | 27.6883 | 34.528 |
| 14 | 23.6848 | 29.1413 | 36.123 |
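The final decision step can be written out as a short Python sketch. It assumes the 0.05 level of significance and copies only the first four critical values from the table above; the names `CRITICAL_0_05`, `degrees_of_freedom` and `differs_significantly` are hypothetical and chosen for illustration.

```python
# Critical values at the 0.05 level for df = 1..4, copied from the table above.
CRITICAL_0_05 = {1: 3.84146, 2: 5.99147, 3: 7.81473, 4: 9.48773}


def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom of a contingency table: (rows - 1)(columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def differs_significantly(chi2, n_rows, n_cols):
    """True if chi2 exceeds the 0.05 critical value, so the null hypothesis
    of no difference in distribution can be rejected."""
    df = degrees_of_freedom(n_rows, n_cols)
    return chi2 > CRITICAL_0_05[df]


# Plots A and B: 3 categories (rows), 2 plots (columns), chi-squared = 13.461.
df = degrees_of_freedom(3, 2)
result = differs_significantly(13.461, 3, 2)
print(df, result)  # 2 True
```

A lookup dictionary is used here instead of a statistics library so that the sketch depends only on the values printed in the table. A table with more than 4 degrees of freedom would need the remaining rows added.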