Biology
Introduction to Statistics
BIOLOGY
Introduction to Statistics
In science experiments we often have to compare measurements from two different treatments
and decide if the independent variable has a real effect on what we are measuring (the dependent
variable). In other words, are the results/difference significant?
How can we do that? In the scientific community we are not allowed to just say, “Well clearly A
is bigger than B!” Even though the results of an experiment are sometimes obvious, we have to
have some kind of separate measure to indicate to anyone who looks at our results that there
really is a difference.
How do we get a handle on the difference between measurements in an experiment? We use
STATISTICS. There are many statistical tools we can use when comparing data. In this
worksheet we will look at:
• mean
• median
• standard deviations of a sample population
• t Test / p value (comparing two means to see if they are significantly different)
ACORN STUDY
Let’s say we are ecologists and we are studying oak trees. Let’s say you have a hypothesis
(testable guess) that the acorns from Red Oak trees (Quercus rubra) are heavier than the acorns
that come from the White Oak species (Quercus alba). How will you test your hypothesis? You
know you are going to weigh acorns. Will one acorn of each species suffice?
It’s pretty clear to us that weighing one acorn of each species is not going to be a strong enough
test of our hypothesis. OK so let’s say you collect 21 acorns of each species, mass them (weigh
them on a scale) and compare the data. They might look like this: (Note: the first thing we did
was order the measurements from lowest to highest mass for each species)
Figure 1. Data Set A: Acorn Masses (grams)

Acorn #   White Oak   Red Oak
1         2.67        2.86
2         3.26        2.86
3         3.4         2.87
4         3.54        3.00
5         3.59        3.17
6         3.59        3.56
7         3.66        3.68
8         3.67        3.71
9         3.71        4.00
10        3.75        4.03
11        3.81        4.05
12        3.92        4.11
13        3.99        4.59
14        4.03        4.97
15        4.1         4.98
16        4.13        4.99
17        4.15        5.01
18        4.29        5.06
19        4.31        5.34
20        5.21        5.42
21        6.01        5.59
AVERAGE (ARITHMETIC MEAN)
OK, so are Red Oak or White Oak acorns heavier? We have to get a handle on these data. The
first tool we might use, one with which you are familiar, is the average, otherwise known as the
arithmetic mean. To calculate the mean mass for each kind of acorn, divide the sum of all the
values by the number of values.
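If you would like to check the arithmetic outside Excel, the same calculation can be sketched in Python. Python is our assumption here (the worksheet itself only uses Excel), and the names `white_oak`, `red_oak` and `arithmetic_mean` are made up for this illustration. The masses are the Data Set A values from Figure 1.

```python
# Data Set A from Figure 1 (masses in grams)
white_oak = [2.67, 3.26, 3.4, 3.54, 3.59, 3.59, 3.66, 3.67, 3.71, 3.75, 3.81,
             3.92, 3.99, 4.03, 4.1, 4.13, 4.15, 4.29, 4.31, 5.21, 6.01]
red_oak = [2.86, 2.86, 2.87, 3.00, 3.17, 3.56, 3.68, 3.71, 4.00, 4.03, 4.05,
           4.11, 4.59, 4.97, 4.98, 4.99, 5.01, 5.06, 5.34, 5.42, 5.59]


def arithmetic_mean(values):
    """Sum of all the values divided by the number of values."""
    return sum(values) / len(values)


white_mean = arithmetic_mean(white_oak)
red_mean = arithmetic_mean(red_oak)
print(f"White Oak mean: {white_mean:.2f} g")  # 3.94
print(f"Red Oak mean:   {red_mean:.2f} g")    # 4.18
```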
Calculating Average/Mean using Excel:
1. Put your data in a column.
2. Click in the box at the bottom of the column.
3. Click Insert function: fx.
4. From the list, click on AVERAGE.
5. Select the column of data for which you want to calculate the mean by highlighting the
desired data or typing in the “coordinates” (e.g., B2:B22), and then press Enter.
6. The mean will appear in the box you selected.
7. To apply the formula to the next column (Red Oak in this case):
a. Bring your cursor to the bottom right corner of the box holding the mean and drag it
over to the desired box(es).
b. You will notice that the formula is carried over and applied to the appropriate
column. The White Oak mean uses B2:B22; the Red Oak mean automatically uses C2:C22.

           White Oak   Red Oak
Mean (g)   3.94        4.18
MEDIAN
Now we know that the mean mass of Red Oak acorns in our sample, 4.18 grams, is greater than
the mean mass of White Oak acorns in our sample, 3.94 grams. For many of us in our relatively
short scientific careers, calculating the mean would be enough. We might conclude that Red Oak
acorns are heavier than White Oak acorns, but look at how much heavier, only 4.18-3.94=0.24
grams.
Let’s take a closer look. First of all, let’s look at the range of values for each kind of acorn:
Fig. 1: Acorn Masses (grams)

Acorn #   White Oak   Red Oak
1         2.67        2.86
2         3.26        2.86
3         3.4         2.87
4         3.54        3.00
5         3.59        3.17
6         3.59        3.56
7         3.66        3.68
8         3.67        3.71
9         3.71        4.00
10        3.75        4.03
11        3.81        4.05
12        3.92        4.11
13        3.99        4.59
14        4.03        4.97
15        4.1         4.98
16        4.13        4.99
17        4.15        5.01
18        4.29        5.06
19        4.31        5.34
20        5.21        5.42
21        6.01        5.59
The White Oak acorns range from 2.67 to 6.01 grams. The Red Oak acorns range from 2.86 to
5.59 grams. When there’s quite a range of values in a sample it’s helpful to use two more
statistical tools: the median and the standard deviation.
The median is a simple number to come up with: it’s just the middle number in a sample that has
been arranged from least to greatest in value. If the sample has an even number of values, just
take the two middle numbers in the sample and calculate their average. Look at Fig. 1, find the
number in the middle of each column, and write down the median mass for
White Oak = ______ grams
Red Oak = _______ grams
White Oak = 3.81 grams
Red Oak= 4.05 grams
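The "middle number" rule, including the even-sized case, can be written out as a short Python sketch. Python and the function name `median_of` are assumptions made for illustration only; the worksheet itself uses Excel's MEDIAN. The data are the Fig. 1 acorn masses.

```python
# Data Set A from Fig. 1 (masses in grams)
white_oak = [2.67, 3.26, 3.4, 3.54, 3.59, 3.59, 3.66, 3.67, 3.71, 3.75, 3.81,
             3.92, 3.99, 4.03, 4.1, 4.13, 4.15, 4.29, 4.31, 5.21, 6.01]
red_oak = [2.86, 2.86, 2.87, 3.00, 3.17, 3.56, 3.68, 3.71, 4.00, 4.03, 4.05,
           4.11, 4.59, 4.97, 4.98, 4.99, 5.01, 5.06, 5.34, 5.42, 5.59]


def median_of(values):
    """Middle value of the sorted sample; average of the two middle values if n is even."""
    ordered = sorted(values)          # arrange from least to greatest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: one middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the two middle values


print("White Oak median:", median_of(white_oak))   # 3.81
print("Red Oak median:  ", median_of(red_oak))     # 4.05
print("Even-sized example:", median_of([1, 2, 3, 4]))  # 2.5
```

With 21 acorns per species the middle value is the 11th one in each sorted column.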
Calculating Median using Excel:
1. Click Insert Function (fx).
2. Click on MEDIAN from the list.
3. Select all the values in the column by highlighting them or typing the range in the
appropriate box. Be careful to include ONLY the boxes that hold data. In this case, make
sure you highlight B2:B22 instead of B2:B23.
STANDARD DEVIATION (STDEV)
The median is a valuable quick check against the mean because two sets of numbers can have the
same mean but very different spread / range in value.
For example, the following two sets of test scores have the same means:
Fig. 2: Test scores

         Student A   Student B
         86          96
         86          94
         85          92
         84          73
         84          70
mean     ____        ____
median   ____        ____
But notice that the medians of these two sets of numbers are different.
What is the mean?     Student A: _____________   Student B: _______________
What is the median?   Student A: _____________   Student B: _______________
Compare mean and median for both students. What do you notice? What can you
conclude?
The median for the first student, 85, is right on the mean and confirms that the spread around the mean is not very great. The
median for the second student, 92, is seven points off from the mean of 85. If nothing else, the difference between the mean and
median indicates that the test scores for the second student are far less consistent than those of the first student.
A more direct measure of the “spread” of the values in a sample is the standard deviation.
Calculating Standard Deviation (STDEV) using Excel:
Standard deviation tells you the spread (range) of the numbers around the mean of a sample.
Take a look at the test scores again. In order to calculate the STD using Excel:
1. Select a box at the bottom of a column of numbers.
2. Click on Insert Function and select STDEV from the list. (similar to Mean and Median).
3. Select the range of values in the column.
Student   A     B
          86    96
          86    94
          85    92
          84    73
          84    70
mean      85    85
STD       1     12.4499
The STD of student A’s grade is only 1. The STD of student B’s grade is nearly 12.5. Though
both students have 85 averages, the STD clearly demonstrates how spread out student B’s grades
are.
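To see where those two STD values come from, here is a minimal Python sketch of the sample standard deviation (the version that divides by n - 1, which is what Excel's STDEV computes). Python and the helper name `sample_std` are assumptions for illustration; the scores are the ones from the test score table.

```python
import math
from statistics import mean, median, stdev

student_a = [86, 86, 85, 84, 84]
student_b = [96, 94, 92, 73, 70]


def sample_std(values):
    """Sample standard deviation: square root of the summed squared deviations over n - 1."""
    m = mean(values)
    squared_deviations = [(v - m) ** 2 for v in values]
    return math.sqrt(sum(squared_deviations) / (len(values) - 1))


for name, scores in [("A", student_a), ("B", student_b)]:
    print(f"Student {name}: mean={mean(scores)}, median={median(scores)}, "
          f"STD={sample_std(scores):.4f}")
    # The hand-written version should agree with the standard library's stdev.
    assert math.isclose(sample_std(scores), stdev(scores))
```

Both students have the same mean, but the squared deviations for student B are far larger, which is exactly what the STD picks up.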
Let’s go back to the acorn data.
Find Mean, Median and STD using Excel.
          White Oak   Red Oak
          2.67        2.86
          3.26        2.86
          3.4         2.87
          3.54        3.00
          3.59        3.17
          3.59        3.56
          3.66        3.68
          3.67        3.71
          3.71        4.00
          3.75        4.03
          3.81        4.05
          3.92        4.11
          3.99        4.59
          4.03        4.97
          4.1         4.98
          4.13        4.99
          4.15        5.01
          4.29        5.06
          4.31        5.34
          5.21        5.42
          6.01        5.59
Mean      ____        ____
Median    ____        ____
Std       ____        ____
          White Oak   Red Oak
Mean      3.94        4.18
Median    3.81        4.05
Std       0.68        0.92
Which has a greater spread around the mean? Calculate both the absolute value of the
spread and what percentage the spread is of the mean.
In this case the means, medians and STDs are comparable to each other for both kinds of
acorns. There is a little more spread in the Red Oak data: the STD is 0.92, or about 22% of
the mean, whereas for the White Oak data the STD, 0.68, is about 17% of the mean. But the
means are different; can we say that the average mass of Red Oak acorns in
this case is greater than the mass of White Oak acorns?
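The "spread as a percentage of the mean" calculation can be checked with a short Python sketch. Python and the function name `spread_as_percent_of_mean` are assumptions for illustration; `stdev` from the standard library is the sample standard deviation, the same quantity as Excel's STDEV.

```python
from statistics import mean, stdev

# Data Set A acorn masses (grams)
white_oak = [2.67, 3.26, 3.4, 3.54, 3.59, 3.59, 3.66, 3.67, 3.71, 3.75, 3.81,
             3.92, 3.99, 4.03, 4.1, 4.13, 4.15, 4.29, 4.31, 5.21, 6.01]
red_oak = [2.86, 2.86, 2.87, 3.00, 3.17, 3.56, 3.68, 3.71, 4.00, 4.03, 4.05,
           4.11, 4.59, 4.97, 4.98, 4.99, 5.01, 5.06, 5.34, 5.42, 5.59]


def spread_as_percent_of_mean(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return 100 * stdev(values) / mean(values)


white_pct = spread_as_percent_of_mean(white_oak)
red_pct = spread_as_percent_of_mean(red_oak)
print(f"White Oak: STD = {stdev(white_oak):.2f} g, {white_pct:.1f}% of the mean")
print(f"Red Oak:   STD = {stdev(red_oak):.2f} g, {red_pct:.1f}% of the mean")
```

Expressing the STD relative to the mean makes the two species easier to compare, because their means are not the same.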
DETERMINING SIGNIFICANCE:
• T test
• p value
The T test:
As seen earlier, the means of the two acorn samples are different; can we say that the average
mass of Red Oak acorns in this case is greater than the mass of White Oak acorns? I know you want to
say “yes” but the statisticians say “hold on there!” There is a specific test that tells you how
likely it is that a difference this big between two means could show up just by chance. Read the
next paragraph carefully and make sure you get it.
If two sets of numbers have the same mean, you could imagine that actually they are measured,
or “pulled”, from only one big set of numbers. The T test is a statistical test that asks: if the
two samples really were pulled from the same big set of numbers, what is the probability
(p-value) of getting means at least as far apart as the ones we measured? Another way to look at
it is that the p-value tells you how easily chance alone could produce the difference between
the two means.
Rewrite your understanding of p value in your own words:
Calculating p value for two sets of numbers using Excel:
1. Click on a box at the bottom of one of your columns of numbers.
2. Click on Insert Function (fx) and select T test (TTEST). You will see a dialogue box with
four different rows.
3. For the first row, Array1, select White Oak data (the first column of numbers).
4. For the second row, Array2, select the Red Oaks data (second column of numbers).
5. In the third row, Tails, type “2” for a two-tailed test.
6. In the fourth row, Type, type “3” (a two-sample test that does not assume the two samples
have equal variance).
When you calculate the p-value for the acorn data, what do you get?
You should get approximately 0.34. What does this mean? It means that if the two kinds of acorns
really had the same mean mass, there would be about a 34% chance of seeing a difference between
the two sample means at least this large just by chance.
For a scientist that is far too high a chance to accept that the two means are different. As
ecologists we have not supported the hypothesis that Red Oak acorns weigh more than White Oak acorns.
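For the curious, here is a rough Python sketch of the calculation behind that p-value: a two-tailed, two-sample t test that does not assume equal variances (often called Welch's t test), which is what the Tails = 2, Type = 3 settings ask Excel for. Python, the function names, and the choice to get the tail probability by numerically integrating the t density with Simpson's rule are all our assumptions for illustration; Excel's internal method may differ.

```python
import math
from statistics import mean, variance

# Data Set A acorn masses (grams)
white_oak = [2.67, 3.26, 3.4, 3.54, 3.59, 3.59, 3.66, 3.67, 3.71, 3.75, 3.81,
             3.92, 3.99, 4.03, 4.1, 4.13, 4.15, 4.29, 4.31, 5.21, 6.01]
red_oak = [2.86, 2.86, 2.87, 3.00, 3.17, 3.56, 3.68, 3.71, 4.00, 4.03, 4.05,
           4.11, 4.59, 4.97, 4.98, 4.99, 5.01, 5.06, 5.34, 5.42, 5.59]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on the density between 0 and |t|."""
    upper = abs(t)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3                # probability between 0 and |t|
    return 1 - 2 * central                 # what is left in the two tails


def welch_t_test(a, b):
    """Return (t statistic, degrees of freedom, two-tailed p-value)."""
    va = variance(a) / len(a)              # squared standard error of mean(a)
    vb = variance(b) / len(b)              # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, two_tailed_p(t, df)


t_stat, df, p_value = welch_t_test(white_oak, red_oak)
print(f"t = {t_stat:.3f}, df = {df:.1f}, p = {p_value:.3f}")
```

The t statistic is simply the difference between the two means divided by a measure of how uncertain that difference is; the smaller it is relative to the spread, the larger the p-value.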
Now let’s look at a different set of data for these acorns:
Figure 2. Data Set B: Acorn Masses (grams)

            White Oak   Red Oak
            2.67        2.86
            3.26        3.00
            3.4         3.17
            3.54        3.56
            3.59        3.68
            3.59        3.71
            3.66        4.00
            3.67        4.03
            3.71        4.05
            3.75        4.11
            3.81        4.59
            3.92        4.97
            3.99        4.98
            4.03        4.99
            4.1         5.01
            4.13        5.06
            4.15        5.34
            4.29        5.42
            4.31        5.59
            5.21        5.95
            6.01        6.08
Mean        ____        ____
Median      ____        ____
Std         ____        ____
t test P=   ____
Using Excel, find Mean, Median, STD, and run a t Test for Set #2.
In this case the mean mass of Red Oak acorns, 4.48 grams, is again greater than the mean mass of
White Oak acorns, 3.94 grams. The medians and STDs are again comparable. So what can we
conclude?
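As a cross-check on the Excel results for Data Set B, the three summary numbers can be sketched with Python's standard `statistics` module. Python and the variable names are assumptions for illustration; `stdev` is the sample standard deviation, the same quantity as Excel's STDEV.

```python
from statistics import mean, median, stdev

# Data Set B acorn masses (grams)
white_oak = [2.67, 3.26, 3.4, 3.54, 3.59, 3.59, 3.66, 3.67, 3.71, 3.75, 3.81,
             3.92, 3.99, 4.03, 4.1, 4.13, 4.15, 4.29, 4.31, 5.21, 6.01]
red_oak_b = [2.86, 3.00, 3.17, 3.56, 3.68, 3.71, 4.00, 4.03, 4.05, 4.11, 4.59,
             4.97, 4.98, 4.99, 5.01, 5.06, 5.34, 5.42, 5.59, 5.95, 6.08]

# (mean, median, sample standard deviation) for each species
summary = {
    name: (mean(data), median(data), stdev(data))
    for name, data in [("White Oak", white_oak), ("Red Oak", red_oak_b)]
}

for name, (m, med, sd) in summary.items():
    print(f"{name}: mean={m:.2f} g, median={med:.2f} g, STD={sd:.2f} g")
```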
Comparing two sets of Acorn Data:

Data Set A
          White Oak   Red Oak
Mean      3.94        4.18
Median    3.81        4.05
STDV      0.682018    0.921088
t Test    0.341613

Data Set B
          White Oak   Red Oak
Mean      3.94        4.48
Median    3.81        4.59
STDV      0.682018    0.957091
t Test    0.041907
Before running the T test, you might think that these two data sets look very similar.
However, we can’t conclude anything until we perform a T test. When we do for Data Set B, we
get a p-value of 0.04. What does this mean?
It means that if the two kinds of acorns really had the same mean mass, there would be only
about a 4% chance of seeing a difference between the two means at least this large just by
chance. This is very different from the first case.
In general, when scientists see a p-value of less than 0.05, they accept that the
difference between the two samples is real.
Because in this case p < 0.05, we ecologists would say that the Red Oak acorns in Data Set B
really are heavier than the White Oak acorns.
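The 0.05 rule of thumb amounts to a single comparison. The small Python sketch below applies it to the two p-values listed in the comparison tables; Python and the names `ALPHA` and `is_significant` are assumptions for illustration.

```python
ALPHA = 0.05  # conventional cut-off used in this worksheet

# p-values taken from the t Test rows of the two comparison tables
p_values = {"Data Set A": 0.341613, "Data Set B": 0.041907}


def is_significant(p, alpha=ALPHA):
    """True when the p-value falls below the chosen cut-off."""
    return p < alpha


for name, p in p_values.items():
    verdict = "significant" if is_significant(p) else "not significant"
    print(f"{name}: p = {p:.3f} -> difference is {verdict}")
```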
When you obtain Absorbance measurements for your Euglena, you will want to compare the
mean absorbance values from different treatments in a similar way. Your teacher will guide you
through this process. Good Luck!
Be sure you can define the following terms in your own words:
• Mean / Average
• Median
• Standard Deviation
• T Test
• P value