Download Statistics in Biology

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Misuse of statistics wikipedia , lookup

Time series wikipedia , lookup

Transcript
A-level statistics page 1
Statistics for Frequency Data
Sometimes the data from an experiment are not measurements but counts (or frequencies) of
things, such as counts of different phenotypes in a genetics cross, or counts of species in different
habitats. With frequency data you can’t usually calculate averages or do a t test, but instead you do a
chi-squared (2) test. This compares observed counts with some expected counts and tells you the
probability (P) that there is no difference between them. In Excel the 2 test is performed using the
formula: =CHITEST (observed range, expected range) . There are three different uses of the test
depending on how the expected data are calculated.
Sometimes the expected data can be calculated from a quantitative theory, in which case you
are testing whether your observed data agree with the theory (if P < 5% then the data do not
agree with the theory, and if P > 5% then the data do agree with the theory). A good example is a
genetic cross, where Mendel’s laws can be used to
predict frequencies of different phenotypes. In this
example Excel formulae are used to calculate the
expected values using a 3:1 ratio of the total
number of observations. The 2 P is 53%, which
is much greater than 5%, so the results do indeed
support Mendel’s law. Incidentally a very high P
(>80%) is suspicious, as it means that the results
are just too good to be true.
1.
2.
Other times the expected data are calculated by assuming that the counts in all the categories
should be the same, in which case you are testing whether there is a difference between the
counts (if P < 5% then the counts are significantly different from each other, and if P > 5% then
there is no significant difference between the
counts). In the example above the sex of children
born in a hospital over a period of time is compared.
The expected values are calculated by assuming
there should be equal numbers of boys and girls, and
the 2 P of 6.4% is greater than 5%, so there is no
significant difference between the sexes.
If the count data are for categories in two groups, then the expected data can be calculated by
assuming that the two groups are independent (if P < 5% then there is a significant association
between the two groups, and if P > 5% then the two groups are independent). Each group can have
counts in two or more categories, and the observed frequency data are set out in a table, called a
contingency table. A copy of this table is then made for the expected data, which are calculated for
each cell from the corresponding totals of the observed data, using the formula
E = column total x row total / grand total . In this example the flow rate of a stream (the two categories
fast / slow) is compared to the type of stream bed (the four categories weed-choked / some weeds /
shingle / silt) at 50
different sites to see if
there is an association
between them. The 2
P of 1.1% is less than
5%, so there is an
association between
flow rate and stream
bed.
3.
HGS Biology
NCM/6/01