BIOLOGY
Introduction to Statistics

In science experiments we often have to compare measurements from two different treatments and decide whether the independent variable has a real effect on what we are measuring (the dependent variable). In other words, is the difference between the results significant?

How can we do that? In the scientific community we are not allowed to just say, “Well, clearly A is bigger than B!” Even when the results of an experiment seem obvious, we need some kind of separate measure to show anyone who looks at our results that there really is a difference.

How do we get a handle on the difference between measurements in an experiment? We use STATISTICS. There are many statistical tools we can use when comparing data. In this worksheet we will look at:

mean
median
standard deviation of a sample population
t Test / p value (comparing two means to see if they are significantly different)

ACORN STUDY

Let’s say we are ecologists and we are studying oak trees. Let’s say you have a hypothesis (a testable guess) that the acorns from Red Oak trees (Quercus rubra) are heavier than the acorns that come from the White Oak species (Quercus alba).

How will you test your hypothesis? You know you are going to weigh acorns. Will one acorn of each species suffice? It’s pretty clear that weighing one acorn of each species is not going to be a strong enough test of our hypothesis.

OK, so let’s say you collect 21 acorns of each species, mass them (weigh them on a scale) and compare the data. They might look like this. (Note: the first thing we did was order the measurements from lowest to highest mass for each species.)

Figure 1. Data Set A: Acorn Masses (grams)

Acorn #   White Oak   Red Oak
1         2.67        2.86
2         3.26        2.86
3         3.40        2.87
4         3.54        3.00
5         3.59        3.17
6         3.59        3.56
7         3.66        3.68
8         3.67        3.71
9         3.71        4.00
10        3.75        4.03
11        3.81        4.05
12        3.92        4.11
13        3.99        4.59
14        4.03        4.97
15        4.10        4.98
16        4.13        4.99
17        4.15        5.01
18        4.29        5.06
19        4.31        5.34
20        5.21        5.42
21        6.01        5.59

AVERAGE (ARITHMETIC MEAN)

OK, so are Red Oak or White Oak acorns heavier? We have to get a handle on these data. The first tool we might use, one with which you are familiar, is the average, otherwise known as the arithmetic mean. To calculate the mean mass for each kind of acorn, divide the sum of all the values by the number of values.

Calculating Average/Mean using Excel:
1. Put your data in a column.
2. Click in the box at the bottom of the column.
3. Click Insert Function (fx).
4. From the list, click on AVERAGE.
5. Select the column of data for which you want to calculate the mean, either by highlighting the desired data or by typing in the “coordinates” (e.g. B2:B22), and then press Enter.
6. The mean will appear in the box you selected.
7. To apply the formula to the next column (Red Oak in this case):
   a. Bring your cursor to the bottom-right corner of the box and drag it over to the desired box(es).
   b. You will notice that the formula is carried over and applied to the appropriate column. The White Oak mean uses B2:B22; the Red Oak mean is automatically applied to C2:C22.

           White Oak   Red Oak
Mean (g)   3.94        4.18

MEDIAN

Now we know that the mean mass of Red Oak acorns in our sample, 4.18 grams, is greater than the mean mass of White Oak acorns in our sample, 3.94 grams. For many of us in our relatively short scientific careers, calculating the mean would be enough.
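The two means above can be double-checked outside Excel. The sketch below is a minimal Python version, assuming the Figure 1 masses have been typed in as two lists; the names white_oak, red_oak and mean are only illustrative and are not part of the worksheet.

```python
# Data Set A acorn masses in grams, copied from Figure 1.
white_oak = [2.67, 3.26, 3.40, 3.54, 3.59, 3.59, 3.66, 3.67, 3.71, 3.75, 3.81,
             3.92, 3.99, 4.03, 4.10, 4.13, 4.15, 4.29, 4.31, 5.21, 6.01]
red_oak = [2.86, 2.86, 2.87, 3.00, 3.17, 3.56, 3.68, 3.71, 4.00, 4.03, 4.05,
           4.11, 4.59, 4.97, 4.98, 4.99, 5.01, 5.06, 5.34, 5.42, 5.59]


def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


white_mean = mean(white_oak)
red_mean = mean(red_oak)

print(f"White Oak mean: {white_mean:.2f} g")  # White Oak mean: 3.94 g
print(f"Red Oak mean:   {red_mean:.2f} g")    # Red Oak mean:   4.18 g
```

This does the same arithmetic as Excel's AVERAGE over B2:B22 and C2:C22.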
We might conclude that Red Oak acorns are heavier than White Oak acorns, but look at how much heavier: only 4.18 - 3.94 = 0.24 grams. Let’s take a closer look. First of all, let’s look at the range of values for each kind of acorn in Figure 1 (Data Set A).

The White Oak acorns range from 2.67 to 6.01 grams. The Red Oak acorns range from 2.86 to 5.59 grams. When there’s quite a range of values in a sample, it’s helpful to use two more statistical tools: the median and the standard deviation.

The median is a simple number to come up with: it’s just the middle number in a sample that has been arranged from least to greatest in value. If the sample has an even number of values, just take the two middle numbers and calculate their average.

Look at Fig. 1, find the number in the middle of each column, and write down the median mass for each species:

White Oak = ______ grams
Red Oak = ______ grams

Answers:

White Oak = 3.81 grams
Red Oak = 4.05 grams

Calculating Median using Excel:
1. Click Insert Function (fx).
2. Click on MEDIAN from the list.
3. Select all the values in the column by highlighting them or typing the range in the appropriate box.

**Be careful to include ONLY the boxes that hold data. In this case, make sure you highlight B2:B22 instead of B2:B23.

STANDARD DEVIATION (STDEV)

The median is a valuable quick check against the mean because two sets of numbers can have the same mean but a very different spread of values. For example, the following two sets of test scores have the same mean:

Fig. 2: Test scores

Student   Scores                     mean     median
A         86   86   85   84   84     ______   ______
B         96   94   92   73   70     ______   ______

But notice that the medians of these two sets of numbers are different.

What is the mean?     Student A: ______   Student B: ______
What is the median?   Student A: ______   Student B: ______

Compare the mean and median for both students. What do you notice? What can you conclude?

The median for the first student, 85, is right on the mean and confirms that the spread around the mean is not very great. The median for the second student, 92, is seven points off from the mean of 85. If nothing else, the difference between the mean and the median indicates that the test scores for the second student are far less consistent than those of the first student. A more direct measure of the “spread” of the values in a sample is the standard deviation.

Calculating Standard Deviation (STDEV) using Excel:

Standard deviation tells you how widely the numbers in a sample are spread around the mean. Take a look at the test scores again. To calculate the STD using Excel:
1. Select a box at the bottom of a column of numbers.
2. Click on Insert Function (fx) and select STDEV from the list (similar to Mean and Median).
3. Select the range of values in the column.

Student   Scores                     mean   STD
A         86   86   85   84   84     85     1
B         96   94   92   73   70     85     12.4499

The STD of student A’s grades is only 1. The STD of student B’s grades is nearly 12.5. Though both students have an average of 85, the STD clearly shows how spread out student B’s grades are.

Let’s go back to the acorn data. Find the mean, median and STD using Excel.
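Before returning to the acorns, the test-score numbers above can be double-checked outside Excel. This is a minimal sketch using Python's standard statistics module; the variable names are illustrative. It assumes, as is the case for Excel's STDEV, that the sample standard deviation (dividing by n - 1) is the one wanted, which is what statistics.stdev computes.

```python
import statistics

# Test scores from Fig. 2.
student_a = [86, 86, 85, 84, 84]
student_b = [96, 94, 92, 73, 70]

summary = {}
for name, scores in (("A", student_a), ("B", student_b)):
    summary[name] = {
        "mean": statistics.mean(scores),      # sum of scores / number of scores
        "median": statistics.median(scores),  # middle value after sorting
        "stdev": statistics.stdev(scores),    # sample standard deviation (n - 1)
    }
    print(f"Student {name}: "
          f"mean={summary[name]['mean']}, "
          f"median={summary[name]['median']}, "
          f"stdev={summary[name]['stdev']:.4f}")
```

Both students come out with the same mean, but the median and the standard deviation separate them, which is the point of the comparison above.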
White Oak: 2.67, 3.26, 3.40, 3.54, 3.59, 3.59, 3.66, 3.67, 3.71, 3.75, 3.81, 3.92, 3.99, 4.03, 4.10, 4.13, 4.15, 4.29, 4.31, 5.21, 6.01

Red Oak: 2.86, 2.86, 2.87, 3.00, 3.17, 3.56, 3.68, 3.71, 4.00, 4.03, 4.05, 4.11, 4.59, 4.97, 4.98, 4.99, 5.01, 5.06, 5.34, 5.42, 5.59

          White Oak   Red Oak
Mean      ______      ______
Median    ______      ______
Std       ______      ______

Answers:

          White Oak   Red Oak
Mean      3.94        4.18
Median    3.81        4.05
Std       0.68        0.92

Which has a greater spread around the mean? Calculate both the absolute value of the spread and what percentage the spread is of the mean.

In this case the means, medians and STDs are comparable to each other for both kinds of acorns. There is a little more spread in the Red Oak data: its STD is 0.92, or about 22% of the mean, whereas the White Oak STD, 0.68, is about 17% of the mean. But the means are different; can we say that the average mass of Red Oak acorns in this case is greater than the mass of White Oak acorns?

DETERMINING SIGNIFICANCE: T test, p value

The T test:

As seen earlier, the means of the two kinds of acorns are different; can we say that the average mass of Red Oak acorns in this case is greater than the mass of White Oak acorns? I know you want to say “yes,” but the statisticians say “hold on there!” There is a specific test that tells you the probability that two means differ from each other just by chance.

Read the next paragraph carefully and make sure you get it.

If two sets of numbers have the same mean, you could imagine that they were actually measured, or “pulled,” from only one big set of numbers. The T test is a statistical test that gives you the probability (p-value) that two samples are pulled from the same big set of numbers. Another way to look at it is that the T test tells you the probability that the difference between two means is due to chance.
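The idea of two samples being “pulled” from one big set of numbers can be tried out directly. The sketch below is not the T test that Excel runs; it is a shuffle (permutation) test, swapped in here only because it makes the idea visible. It pools all 42 Data Set A masses, deals them at random into two groups of 21 many times, and counts how often the two group means differ by at least as much as the real means do. The function name shuffle_test, the number of shuffles and the random seed are all assumptions made for illustration.

```python
import random

# Data Set A acorn masses in grams.
white_oak = [2.67, 3.26, 3.40, 3.54, 3.59, 3.59, 3.66, 3.67, 3.71, 3.75, 3.81,
             3.92, 3.99, 4.03, 4.10, 4.13, 4.15, 4.29, 4.31, 5.21, 6.01]
red_oak = [2.86, 2.86, 2.87, 3.00, 3.17, 3.56, 3.68, 3.71, 4.00, 4.03, 4.05,
           4.11, 4.59, 4.97, 4.98, 4.99, 5.01, 5.06, 5.34, 5.42, 5.59]


def mean(values):
    return sum(values) / len(values)


def shuffle_test(sample_1, sample_2, n_shuffles=10_000, seed=0):
    """Estimate how often chance alone gives a difference in means
    at least as large as the one actually observed."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = abs(mean(sample_1) - mean(sample_2))
    pooled = list(sample_1) + list(sample_2)  # the "one big set of numbers"
    n1 = len(sample_1)
    at_least_as_big = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # deal the acorns into two random groups
        diff = abs(mean(pooled[:n1]) - mean(pooled[n1:]))
        if diff >= observed:
            at_least_as_big += 1
    return at_least_as_big / n_shuffles


p_estimate = shuffle_test(white_oak, red_oak)
print(f"Estimated chance of a difference this large: {p_estimate:.2f}")
```

The estimate depends on the random shuffles, so it will not match a T test p-value digit for digit, but it answers the same kind of question: how easily could chance alone produce a difference this big?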
Rewrite your understanding of p value in your own words:

Calculating p value for two sets of numbers using Excel:
1. Click on a box at the bottom of one of your columns of numbers.
2. Click on Insert Function (fx) and select the T test (TTEST). You will see a dialogue box with four different rows.
3. For the first row, Array1, select the White Oak data (the first column of numbers).
4. For the second row, Array2, select the Red Oak data (the second column of numbers).
5. In the third row, Tails, type “2” for a two-tailed test.
6. In the fourth row, Type, type “3” (a two-sample test that does not assume the two samples have equal variance).

When you calculate the p-value for the acorn data, what do you get? What does this mean?

You should get approximately 0.34. This means that if both samples really were pulled from the same big set of numbers, a difference between the two means at least this large would turn up by chance about 34% of the time. Put loosely, there is roughly a 34% chance that the difference is due to chance. For a scientist, that is far too high to accept that the two means are really different. As ecologists, we have not supported the hypothesis that Red Oak acorns weigh more than White Oak acorns.

Now let’s look at a different set of data for these acorns:

Figure 2. Data Set B: Acorn Masses (grams)

           White Oak   Red Oak
           2.67        2.86
           3.26        3.00
           3.40        3.17
           3.54        3.56
           3.59        3.68
           3.59        3.71
           3.66        4.00
           3.67        4.03
           3.71        4.05
           3.75        4.11
           3.81        4.59
           3.92        4.97
           3.99        4.98
           4.03        4.99
           4.10        5.01
           4.13        5.06
           4.15        5.34
           4.29        5.42
           4.31        5.59
           5.21        5.95
           6.01        6.08
Mean       ______      ______
Median     ______      ______
Std        ______      ______
t test     P = ______

Using Excel, find the mean, median and STD, and run a t Test for Data Set B.

In this case the mean mass of Red Oak acorns, 4.48 grams, is again greater than the mean mass of White Oak acorns, 3.94 grams. The medians and STDs are again comparable. So what can we conclude?

Comparing two sets of Acorn Data:

Data Set A.

           White Oak   Red Oak
Mean       3.94        4.18
Median     3.81        4.05
STDV       0.682018    0.921088
t Test     0.341613

Data Set B.

           White Oak   Red Oak
Mean       3.94        4.48
Median     3.81        4.59
STDV       0.682018    0.957091
t Test     0.041907

Before running the T test, you might think that these two data sets are very similar in number. However, we can’t conclude anything until we perform a T test. When we do so for Data Set B, we get a p-value of 0.04. What does this mean? It means that a difference this large between the two means would arise by chance only about 4% of the time. This is very different from the first case. In general, when scientists see a p-value of less than 0.05, they accept that the difference between the two samples is real. Because in this case p < 0.05, we ecologists would say that the Red Oak acorns in Data Set B really are heavier than the White Oak acorns.

When you obtain absorbance measurements for your Euglena, you will want to compare the mean absorbance values from different treatments in a similar way. Your teacher will guide you through this process. Good luck!

Be sure you can define the following terms in your own words:

Mean / Average
Median
Standard Deviation
T Test
P value
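If you would like to check the Excel p-values another way, here is a minimal sketch in Python that uses only the standard library. It assumes that typing “3” for Type corresponds to the unequal-variance (Welch) form of the two-sample t test, and it finds the two-tailed p-value by numerically adding up the area under the t distribution curve with Simpson's rule, which is not necessarily how Excel computes it internally. The function names welch_t_test, t_pdf and two_tailed_p are made up for this example.

```python
import math
import statistics

white_oak = [2.67, 3.26, 3.40, 3.54, 3.59, 3.59, 3.66, 3.67, 3.71, 3.75, 3.81,
             3.92, 3.99, 4.03, 4.10, 4.13, 4.15, 4.29, 4.31, 5.21, 6.01]
red_oak_a = [2.86, 2.86, 2.87, 3.00, 3.17, 3.56, 3.68, 3.71, 4.00, 4.03, 4.05,
             4.11, 4.59, 4.97, 4.98, 4.99, 5.01, 5.06, 5.34, 5.42, 5.59]
red_oak_b = [2.86, 3.00, 3.17, 3.56, 3.68, 3.71, 4.00, 4.03, 4.05, 4.11, 4.59,
             4.97, 4.98, 4.99, 5.01, 5.06, 5.34, 5.42, 5.59, 5.95, 6.08]


def welch_t_test(sample_1, sample_2):
    """Return the t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(sample_1), len(sample_2)
    v1 = statistics.variance(sample_1) / n1  # squared standard error, sample 1
    v2 = statistics.variance(sample_2) / n2  # squared standard error, sample 2
    t = (statistics.mean(sample_1) - statistics.mean(sample_2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def t_pdf(x, df):
    """Height of the Student t distribution curve at x."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: the area in both tails beyond |t|.
    Simpson's rule (steps must be even) gives the area between 0 and |t|."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t


p_a = two_tailed_p(*welch_t_test(white_oak, red_oak_a))
p_b = two_tailed_p(*welch_t_test(white_oak, red_oak_b))
print(f"Data Set A: p = {p_a:.3f}")
print(f"Data Set B: p = {p_b:.3f}")
```

The two printed values should land close to the 0.34 and 0.04 reported above; small differences in the last digits would come from the numerical method rather than from the data.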