Exercise 1 Extension: Chi-Square Analysis of Fast Plants Results

Exercise 6: Extension Activity for Fast Plant Dihybrid Cross
Chi-Square Analysis
The observed phenotypic ratios among progeny of a cross rarely match the expected
ratios perfectly, even when the hypothesis on which the expected ratios are based is
correct. For example, none of the F2 progeny from Mendel's dihybrid crosses showed
an exact 9:3:3:1 phenotypic ratio. However, a large difference between the data and the
predicted results could be telling you that the original hypothesis is wrong. In your case,
the null hypothesis is that your results do NOT deviate from the expected (Mendelian)
inheritance pattern for a dihybrid cross (e.g., a 9:3:3:1 ratio of phenotypes).
How do you decide if the difference between your observed results and your expected
results can be accounted for by mere chance deviation, or if there is a difference
significant enough to suggest that the null hypothesis is NOT supported by your data?
A simple statistical test, called the chi-square test (χ²), can be used. This is essentially
a test of goodness-of-fit. This test takes into account the observed deviation in each
component of an expected ratio as well as the sample size, and reduces them to a
single numerical value (χ²). This value can then be used to estimate the probability (p)
of the observed deviation occurring strictly as a result of chance.
Formula:
$$\chi^2 = \sum \frac{(o - e)^2}{e}$$
where:
o = observed value for a given category
e = expected value for a given category
d = deviation = (o - e)
Σ (sigma) = sum of the calculated values for each category of the ratio.
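The formula can be sketched in a few lines of Python. This is only an illustration: the function name `chi_square` and the two-category numbers are made up for this example and are not part of the exercise.

```python
def chi_square(observed, expected):
    """Return the chi-square statistic: the sum of (o - e)^2 / e over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    total = 0.0
    for o, e in zip(observed, expected):
        d = o - e            # deviation for this category
        total += d * d / e   # squared deviation divided by the expected value
    return total


# Hypothetical example: 60 and 40 observed where 50 and 50 were expected.
# Each category contributes 10^2 / 50 = 2.0.
print(chi_square([60, 40], [50, 50]))
```

Dividing each squared deviation by its expected value keeps a deviation of 10 in a category of 50 from counting the same as a deviation of 10 in a category of 500.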
One more thing needs to be taken into account: the number of categories into which a
data point may fall. In general, a larger number of categories means there is more
room for deviation to occur. Thus, if we are counting individuals in four different
phenotypic categories, the χ² value will be calculated from four different squared
deviations; whereas if there were only two phenotypic categories, then only two squared
deviations would be used to calculate the χ² value. It is reasonable to expect a higher
χ² value in the case of the experiment with four phenotypic categories. Thus, in
interpreting the χ² value, the number of independent categories is taken into account as
the degrees of freedom (df) of the experiment, where:
df = k - 1
k = number of categories
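One way to see why df is k - 1 rather than k: once the total number of progeny is fixed, the counts in the first k - 1 categories determine the count in the last one, so only k - 1 categories are free to vary. A minimal sketch follows; the function name `degrees_of_freedom` is just an illustrative choice.

```python
def degrees_of_freedom(num_categories):
    """Return df = k - 1 for a goodness-of-fit test with k categories."""
    if num_categories < 2:
        raise ValueError("at least two categories are needed")
    return num_categories - 1


print(degrees_of_freedom(4))  # four phenotypic categories
print(degrees_of_freedom(2))  # two phenotypic categories
```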
A χ² value can then be interpreted in terms of a probability value (p). The calculation
relating χ² and p is complicated; we will simply use a table of calculated probabilities for
given χ² values.
The p value represents the probability of observing a deviation at least as large as yours
by chance alone, assuming the null hypothesis is true.
Small random deviations from expected values are likely to occur frequently, that is,
they have a high probability of occurrence, and hence are indicated by high p values.
Large deviations, however, should be rare if the null hypothesis is correct, that
is, they have a low probability of occurrence (as indicated by low p values). Typically, for
p values ≤ 0.05, scientists reject the null hypothesis and conclude that the observed
and the expected values DO differ significantly (i.e., the
differences are due to something other than random events). That is, we generally
consider a result statistically significant if there is less than or equal to a 5% probability
of getting a deviation that large by chance alone.
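The exercise relies on a printed table for the step from χ² to p. For readers curious about what sits behind such a table, here is a minimal sketch that uses only Python's standard library. It is not part of the exercise. The function name `chi_square_p_value` is hypothetical, and the method (a closed form for df = 1 or 2, extended by a standard recurrence in steps of 2) is one of several ways to do this.

```python
import math


def chi_square_p_value(chi2, df):
    """Probability of a chi-square value at least this large, for a whole-number df."""
    if df < 1 or chi2 < 0:
        raise ValueError("df must be a positive integer and chi2 must be non-negative")
    half = chi2 / 2.0
    if df % 2 == 1:
        p = math.erfc(math.sqrt(half))  # closed form for df = 1
        k = 1
    else:
        p = math.exp(-half)             # closed form for df = 2
        k = 2
    while k < df:
        # Going from k to k + 2 degrees of freedom adds one more term.
        p += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return p


# 7.815 is the usual table entry for p = 0.05 at df = 3, so this prints a value close to 0.05.
print(chi_square_p_value(7.815, 3))
```

A larger χ² gives a smaller p, and the same χ² gives a larger p as df increases, which is why the table has one row per df.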
To summarize, if the p value is greater than 0.05 (due to a small χ² value), then the null
hypothesis cannot be rejected and we conclude that the observed results are not
significantly different from the expected values (i.e., your results do NOT deviate
significantly from the predicted Mendelian ratio). On the other hand, if the p value is less
than or equal to 0.05 (due to a large χ² value), then you must reject the null hypothesis
and offer an alternative explanation for your results (i.e., a new hypothesis is in order!).
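The decision rule in this summary is simple enough to write down as code. In the sketch below, the function name `interpret` and the parameter name `alpha` are illustrative; the 0.05 cutoff is the one used in this exercise.

```python
def interpret(p_value, alpha=0.05):
    """Apply the rule from the text: reject the null hypothesis when p <= alpha."""
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "cannot reject the null hypothesis"


print(interpret(0.30))  # p well above 0.05
print(interpret(0.01))  # p below 0.05
```

Note that a p value of exactly 0.05 falls on the "reject" side, matching the "less than or equal to" wording above.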
An Example: Analysis of the F2 results of a dihybrid cross - the null hypothesis here is
that the F2 progeny do not deviate from the simple phenotypic 9:3:3:1 ratio,
characteristic of the Mendelian inheritance pattern of two unlinked recessive traits.
expected ratio   o      e                   d     d²    d²/e
9/16             587    9/16(1008) = 567    +20   400   0.71
3/16             197    3/16(1008) = 189    +8    64    0.34
3/16             168    3/16(1008) = 189    -21   441   2.33
1/16             56     1/16(1008) = 63     -7    49    0.78
TOTAL:           1008                                   χ² = 4.16
df = 4 - 1 = 3
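The same table can be rebuilt step by step in Python as a check on the arithmetic. This is only a sketch: the observed counts and the 9:3:3:1 ratio come from the example, while the variable names are illustrative.

```python
observed = [587, 197, 168, 56]   # F2 counts from the example
ratio = [9, 3, 3, 1]             # expected phenotypic ratio

total = sum(observed)
expected = [total * r / sum(ratio) for r in ratio]

chi2 = 0.0
for o, e in zip(observed, expected):
    d = o - e                    # deviation
    term = d * d / e             # this category's contribution to chi-square
    chi2 += term
    print(f"o={o:4d}  e={e:6.1f}  d={d:+6.1f}  d2={d * d:6.1f}  d2/e={term:.2f}")

df = len(observed) - 1
print(f"total={total}  chi-square={chi2:.2f}  df={df}")
```

The total is computed from the unrounded contributions, so it can differ slightly in the last digit from adding up the rounded d²/e column by hand. In this example both routes give 4.16.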
We then use the χ² table above: first, find the line corresponding to df = 3, then look
across this line for the numbers that are closest to the calculated χ² value. 4.16 falls
between 2.366 and 4.642. The corresponding p values (on the top of each column) are
0.5 and 0.2, respectively. This means that by chance alone there is a 20% - 50% probability of
observing a deviation at least as great as the one we observed. Thus, if we repeat this
experiment 100 times, 20-50 trials would be expected to show a deviation at least as great as
the one observed on this first experiment (where χ² = 4.16). Thus, it is very probable that
the observed deviations can be attributed to chance alone (p is much greater than 0.05),
and these data support the null hypothesis, i.e., the results do NOT differ significantly
from the predicted 9:3:3:1 ratio of phenotypes. Yippee! Mendel is supported once
again!
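The idea of repeating the experiment many times can be tried directly with a small simulation. The sketch below is not part of the exercise. It assumes the null hypothesis is true, draws 1008 progeny at random in a 9:3:3:1 ratio, and counts how often the resulting χ² is at least 4.16. The function name `simulate_fraction`, the number of trials, and the random seed are all arbitrary choices for illustration.

```python
import random


def simulate_fraction(chi2_observed, ratio, sample_size, trials, seed=0):
    """Fraction of simulated experiments whose chi-square is >= chi2_observed."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    parts = sum(ratio)
    expected = [sample_size * r / parts for r in ratio]
    categories = list(range(len(ratio)))
    hits = 0
    for _ in range(trials):
        counts = [0] * len(ratio)
        # Assign each individual to a phenotype with probability given by the ratio.
        for c in rng.choices(categories, weights=ratio, k=sample_size):
            counts[c] += 1
        chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
        if chi2 >= chi2_observed:
            hits += 1
    return hits / trials


fraction = simulate_fraction(4.16, [9, 3, 3, 1], 1008, trials=2000)
print(f"Fraction of simulated crosses with chi-square >= 4.16: {fraction:.3f}")
```

The printed fraction should land inside the 0.2 to 0.5 range read from the table, although the exact value will change with the seed and the number of trials.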