Mitosis Data Analysis:
Testing Statistical Hypotheses
By Dana Krempels, Ph.D. and Steven Green, Ph.D.
The raw data (singular = datum) you have collected for the past two lab sessions are
counts of the number of cells in various stages of mitosis. This chapter will guide you through
the process of data analysis so that you can determine whether there is a difference between
your treatment and control onion root tips.
I. Data, Parameters, and Statistics: Quick Review
Recall that data can be of three basic types:
1. Attribute data. These are descriptive, "either-or" measurements, and usually describe the
presence or absence of a particular attribute. Because such data have no specific sequence,
they are considered unordered.
2. Discrete numerical data. These correspond to biological observations counted as
integers (whole numbers). These data are ordered, but do not describe physical attributes of
the things being counted.
3. Continuous numerical data. These are data that fall along a numerical continuum. The
limit of resolution of such data is the accuracy of the methods and instruments used to collect
them. Continuous numerical data often approximate a normal (Gaussian) distribution, a
probability density function in which the area under the curve between any two real numbers
gives the probability that a data point will fall between them.
Usually, data measurements are distributed over a range of values. Measures of central
tendency (the tendency of measurements to occur near the center of the range) include the mean
(the average measurement), the median (the middle measurement when all measurements are
placed in order) and the mode (the most common measurement). Measures of dispersion
around the mean include the range, variance and standard deviation.
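To make these definitions concrete, here is a minimal sketch using Python's standard `statistics` module. It uses the ten treatment Mitotic Indices from the example in Table 1 (Section III). The variable names are our own. Note that `statistics.variance` and `statistics.stdev` compute the sample versions, which divide by n - 1.

```python
import statistics

# Example data: the ten treatment Mitotic Indices listed in Table 1.
m_treatment = [0.20, 0.25, 0.45, 0.35, 0.15, 0.10, 0.55, 0.40, 0.30, 0.45]

mean = statistics.mean(m_treatment)          # average measurement
median = statistics.median(m_treatment)      # middle value once the data are sorted
modes = statistics.multimode(m_treatment)    # most common value(s); a list, since several may tie
value_range = max(m_treatment) - min(m_treatment)
variance = statistics.variance(m_treatment)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(m_treatment)      # sample standard deviation (square root of the above)

print(f"mean={mean:.3f} median={median:.3f} mode={modes} range={value_range:.2f}")
print(f"variance={variance:.4f} standard deviation={std_dev:.4f}")
```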
Parameters and Statistics
If you were able to measure the height of every adult male Homo sapiens who ever existed,
and then calculate a mean, median, mode, range, variance and standard deviation from your
measurements, those values would be known as parameters. They represent the actual
values as calculated from measuring every member of a population of interest. Obviously, it is
very difficult to obtain data from every member of a population of interest, and impossible if
that population is theoretically infinite in size. However, one can estimate parameters by
randomly sampling members of the population. Such an estimate, calculated from
measurements of a subset of the entire population, is known as a statistic.
Mitosis Data Analysis - 1
In general, parameters are written as Greek symbols equivalent to the Roman symbols
used to represent statistics. For example, the standard deviation for a subset of an entire
population is written as "s", whereas the true population parameter is written as σ (sigma).
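The distinction between s and σ also shows up in how the standard deviation is calculated. The population parameter σ divides the sum of squared deviations by N. The sample statistic s divides by n - 1, which compensates for estimating the mean from the same sample. Python's `statistics.pstdev` and `statistics.stdev` implement these two formulas. The numbers below are hypothetical and were chosen only because the arithmetic is easy to check by hand.

```python
import statistics

# Hypothetical numbers, chosen so the arithmetic is easy to follow by hand.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Treat the values as an ENTIRE population: sigma divides the squared deviations by N.
sigma = statistics.pstdev(values)

# Treat the same values as a SAMPLE: s divides by n - 1 to estimate sigma.
s = statistics.stdev(values)

print(f"sigma (parameter) = {sigma:.3f}")
print(f"s (statistic)     = {s:.3f}")
```

Here s comes out slightly larger than σ because the same sum of squared deviations (32) is divided by 7 instead of 8.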
II. From Raw Data to Mitotic Index
Now that you’ve had a chance to review a bit of statistical information, it’s time to apply it to
your team’s project. In this section, you will be guided through the process of calculating
indices from your raw data collected over the past two weeks, and then using those indices to
compare the two populations of dividing cells, treatment and control.
A. Ordinal Data Points: Mitotic Index (M)
When you counted mitotic cells in your samples, you were taking a survey of the number
of different stages of mitosis present in each of your two populations (treatment and control).
You counted the number of mitotic cells in 10 samples (remember: all the root tips from a
single individual onion comprise one sample) in each of the treatment and control populations.
You then calculated a Mitotic Index (M) for each sample. Depending on the index your team
chose, this might have been based on all the mitotic cells in a sample (M), or on the cells in a
particular phase of mitosis (Mx, with x being the phase of mitosis). Be sure to specify the
nature of your index in all your reports.
At the end of your preliminary calculations, you should have ten M values for each of the
two populations you are comparing. You will use these M values in a Mann-Whitney U test to
determine whether your two populations differ significantly in their states of mitosis.
Recall the formula for a Mitotic Index, which represents the proportion/frequency of mitotic
cells in your total cell population.
$M = n_m / N$
where $n_m$ = the number of mitotic cells in the sample
and $N$ = the total number of cells counted in the sample.
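Here is a minimal sketch of the Mitotic Index calculation for one sample. It assumes you have already tallied $n_m$ and $N$ for that sample. The function name `mitotic_index` and the counts in the usage line are hypothetical and are given for illustration only.

```python
def mitotic_index(n_mitotic: int, n_total: int) -> float:
    """Return M = n_m / N, the proportion of counted cells that are in mitosis."""
    if n_total <= 0:
        raise ValueError("N, the total number of cells counted, must be positive")
    if not 0 <= n_mitotic <= n_total:
        raise ValueError("n_m must be between 0 and N")
    return n_mitotic / n_total


# Hypothetical counts for one sample (one onion): 37 mitotic cells out of 200 counted.
print(mitotic_index(37, 200))
```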
Your team should have counted at least 10 samples from each of your two root tip cell
populations, and should have Mitotic Indices for both. If you have not yet done so,
calculate the indices and enter them in the table below. Provide an appropriate table legend.
Table ___. (Provide an appropriate legend.)
Treatment                              Control
Sample #    Mitotic Index (M)          Sample #    Mitotic Index (M)
[NOTE: If your team is calculating indices for one or more specific stages of mitosis,
you will subject each of those paired sets of indices to the Mann-Whitney test, as well.
This will be for you to decide.]
So what do we do with these indices? You may have an intuitive sense of whether or not
your treatment and control overlap in the number of mitotic cells. But that’s not enough.
Statistics and statistical tests are used to test whether the results of an experiment are
significantly different from the null hypothesis prediction. What is meant by "significant?" For
that matter, what is meant by "expected" results? To answer these questions, we must
consider the matter of probability.
B. Probability
The probability (P value) of obtaining a result at least as extreme as the one observed, if chance
alone were at work (that is, if the null hypothesis were true), is compared with a preset cutoff
known as alpha (α). By convention, α is usually set at 0.05, or 5%, which means that the
investigator accepts a 5% risk of rejecting a null hypothesis that is actually true. In essence, α is
a “cut off value” that defines the area(s) in a probability distribution where a particular value is
unlikely to fall if the null hypothesis is true.
In some studies, a more rigorous α of 0.01 (1%) is required to reject the null hypothesis,
and in some others, a more lenient α of 0.1 (10%) is allowed for rejection of the null
hypothesis. For our study of mitosis, you will use an α level of 0.05.
The term "significant" is often used in everyday conversation, yet few people know the
statistical meaning of the word. In scientific endeavors, significance has a highly specific and
important definition. Every time you read the word "significant" in this lab manual, know that
we refer to the following scientifically accepted standard:
The difference between an observed and expected result is said to be statistically
significant if and only if:
Under the assumption that there is no true difference, the probability that
the observed difference would be at least as large as that actually seen is
less than or equal to α (5%; 0.05).
Conversely, under the assumption that there is no true difference, the
probability that the observed difference would be smaller than that
actually seen is at least 95% (0.95).
(Go ahead and read that as many times as it takes for it to make (1) sense, or (2) you fall
asleep. Whichever comes first.)
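In symbols, the criterion above can be sketched as follows. The notation is introduced here only for illustration: $D$ is the difference that would arise by chance alone under the null hypothesis $H_0$ (no true difference), and $d_{\text{obs}}$ is the difference actually observed.

```latex
\text{significant} \iff
P\left(|D| \ge |d_{\text{obs}}| \;\middle|\; H_0\right) \le \alpha = 0.05
\iff
P\left(|D| < |d_{\text{obs}}| \;\middle|\; H_0\right) \ge 1 - \alpha = 0.95
```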
Once an investigator has calculated a statistic from collected data, s/he must be able to
draw conclusions from it. How does one determine whether deviations from the expected (null
hypothesis) are significant?
A probability distribution assigns a relative probability to any possible outcome (e.g., a
particular Mitotic Index). The mitotic indices you calculated for each sample, while expressed
as numbers, cannot be assumed to follow a normal curve; in this study we treat them as ordinal,
rather than continuous, data. For this reason, a non-parametric statistical test, the Mann-Whitney U
test, will be employed for your analysis.
C. Statistical Hypotheses
A non-parametric test does not assume that the data follow a normal (or any other particular)
distribution, so it can be used to test the significance of ordinal or ranked data such as the
Mitotic Indices you have been collecting for this project. In the following sections, you will learn
how to apply such a test to your data.
Your team should already have devised two statistical hypotheses stated in terms of
opposing statements, the null hypothesis (Ho) and the alternative hypothesis (Ha). The null
hypothesis states that there is no difference between the two populations being
compared. The alternative hypothesis may be either directional (one-tailed), stating the
precise way in which the two populations will differ (“Control Group will have more mitotic cells
than Treatment Group.”), or non-directional (two-tailed), not specifying the way in which two
populations will differ (“Control and Treatment will differ in the number of mitotic cells.”).
To determine whether or not there is a difference in mitosis between your two populations
(Treatment and Control), you must perform a statistical test on your data.
III. Applying a Statistical Test to Your Mitotic Indices
Once your team has calculated a Mitotic Index (M) for each of your 10 samples from each
of the two onion cell populations (Treatment and Control), you are ready to employ a statistical
test to determine how much the two sets of indices overlap. If there is a great deal of overlap,
the difference between them is not significant, and you will fail to reject your null hypothesis.
However, if there is so little overlap that a separation this great would occur by chance 5% of the
time or less (α = 0.05), you can confidently conclude that the two cell populations differ
significantly; you will reject your null hypothesis.
Non-parametric test for two samples: Mann-Whitney U
The Mann-Whitney test allows the investigator (you) to compare your two cell
populations without assuming that your Mitotic Index values are normally distributed.
The Mann-Whitney U does have its rules. For this test to be appropriate:
• You must be comparing two random, independent samples (Treatment & Control).
• The measurements (Mitotic Indices, in our case) should be ordinal.
• No two measurements should have exactly the same value (though we can deal
with “ties” in a way that will be explained shortly).
The Mann-Whitney U test allows the investigator to determine whether there is a significant
difference between two sets of ordered/ranked data, such as those your team has collected in
its mitosis study.
Here is a stepwise explanation and example of how to apply this test to your data.
1. State your null and alternative hypotheses.
Ho :
HA :
Example:
Ho: There is no difference in the ranks of Mitotic Indices (M) between meristematic cells in
an onion treated with aqueous trifluralin and an onion treated with plain water.
HA: There is a difference in the ranks of Mitotic Indices (M) between meristematic cells in
an onion treated with aqueous trifluralin and an onion treated with plain water.
2. State the significance level (alpha, α) necessary to reject Ho. This is typically α = 0.05 (reject Ho when P ≤ 0.05).
3. Rank your Mitotic Indices from smallest to largest in a table, noting which index came from
which population of cells (Treatment or Control).
Example: Table 1 shows 20 (imaginary) values for Mitotic Indices from the two onion root tip
cell populations mentioned before, treated with trifluralin (T) and treated with plain water (C).
Table 2 shows the values ranked and labeled by population.
(Notice in the ranked table that if two values are the same, then each receives the average of the two
ranks. For example, value 0.35 appears twice (ranks 6 and 7). The sum of the rank values divided by
two is their mean: 13/2 = 6.5. The two equal values thus “share” ranks 6 and 7 equally. By the same
rule, the two 0.45 values share ranks 9 and 10 (9.5 each), and the two 0.55 values share ranks 12 and
13 (12.5 each).)
Table 1. Example: Mitotic Indices for treatment and control root tips (not ranked)

Sample #    M treatment    M control
1           0.20           0.55
2           0.25           0.60
3           0.45           0.65
4           0.35           0.80
5           0.15           0.35
6           0.10           0.75
7           0.55           0.70
8           0.40           0.85
9           0.30           0.90
10          0.45           0.50

Table 2. Example: Ranked Mitotic Indices

Rank    Ranked M value    Cell population
1       0.10              T
2       0.15              T
3       0.20              T
4       0.25              T
5       0.30              T
6.5     0.35              T
6.5     0.35              C
8       0.40              T
9.5     0.45              T
9.5     0.45              T
11      0.50              C
12.5    0.55              C
12.5    0.55              T
14      0.60              C
15      0.65              C
16      0.70              C
17      0.75              C
18      0.80              C
19      0.85              C
20      0.90              C
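The midrank rule used in Table 2 can be sketched in Python as follows, using the example values from Table 1. The helper name `midranks` is hypothetical and is not part of any library. Tied values receive the mean of the positions they jointly occupy, so both 0.35 values get 6.5, both 0.45 values get 9.5, and both 0.55 values get 12.5.

```python
def midranks(values):
    """Rank values from smallest (rank 1) to largest; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # indices, smallest value first
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of identical values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Positions start..end (0-based) hold ranks start+1 .. end+1; each gets their mean.
        shared_rank = (start + 1 + end + 1) / 2
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


# Example data from Table 1.
m_treatment = [0.20, 0.25, 0.45, 0.35, 0.15, 0.10, 0.55, 0.40, 0.30, 0.45]
m_control = [0.55, 0.60, 0.65, 0.80, 0.35, 0.75, 0.70, 0.85, 0.90, 0.50]

ranks = midranks(m_treatment + m_control)
rank_treatment = ranks[:len(m_treatment)]
rank_control = ranks[len(m_treatment):]
print("Treatment ranks:", rank_treatment)
print("Control ranks:  ", rank_control)
```

As a quick check, the twenty ranks always sum to 20 × 21 / 2 = 210, whatever the ties.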
4. Assign points to each ranked value. Each “treatment” value gets one point for every
“control” value that appears below it in the ranked list. Every “control” value gets one point for
every “treatment” value that appears below it. For example, the first ranked value, 0.10 (T), has
all 10 “control” values below it, so it gets 10 points. The value 0.35 (C) has 4 “treatment” values
below it (0.40, 0.45, 0.45 and 0.55), so it gets 4 points. (Table 3)
Table 3. Points assigned to ranked M values in Treatment and Control onion cell populations (example).

Rank    Ranked M value    Cell population    Points
1       0.10              T                  10
2       0.15              T                  10
3       0.20              T                  10
4       0.25              T                  10
5       0.30              T                  10
6.5     0.35              T                  10
6.5     0.35              C                  4
8       0.40              T                  9
9.5     0.45              T                  9
9.5     0.45              T                  9
11      0.50              C                  1
12.5    0.55              C                  1
12.5    0.55              T                  7
14      0.60              C                  0
15      0.65              C                  0
16      0.70              C                  0
17      0.75              C                  0
18      0.80              C                  0
19      0.85              C                  0
20      0.90              C                  0
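The point-counting rule of step 4 can be sketched as follows. The sketch uses the population labels exactly as listed in Table 3, and `assign_points` is a hypothetical helper name. Each value scores only for opposite-population values listed below it. As a result, the points given to tied values (0.35 and 0.55 here) depend on which member of the tie is listed first. A common alternative convention gives half a point to each member of a tied pair.

```python
# Population labels in the order they appear in Table 3 (smallest M value first).
ranked_labels = ["T", "T", "T", "T", "T", "T", "C", "T", "T", "T",
                 "C", "C", "T", "C", "C", "C", "C", "C", "C", "C"]


def assign_points(labels):
    """Give each value one point per value from the OTHER population listed below it."""
    points = []
    for i, label in enumerate(labels):
        other = "C" if label == "T" else "T"
        points.append(sum(1 for below in labels[i + 1:] if below == other))
    return points


points = assign_points(ranked_labels)
u_treatment = sum(p for p, lab in zip(points, ranked_labels) if lab == "T")
u_control = sum(p for p, lab in zip(points, ranked_labels) if lab == "C")
print(points)
print(u_treatment, u_control)
```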
5. Calculate a U statistic for each category by adding the points for each cell population.
$U_{\text{treatment}} = 10 + 10 + 10 + 10 + 10 + 10 + 9 + 9 + 9 + 7 = 94$
$U_{\text{control}} = 4 + 1 + 1 + 0 + 0 + 0 + 0 + 0 + 0 + 0 = 6$
Your final U value is the smaller of these two values. In this example our U value is 6. In
general, the lower the U value, the greater the difference between the two groups being tested.
(For example, if none of the M values overlapped, the U value would be zero. That means
there is a large difference between the two groups: they do not overlap at all.)
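The same totals can also be obtained without building the ranked table, by comparing every treatment value with every control value. This is a minimal sketch on the Table 1 example data, and `u_statistics` is a hypothetical name. One assumption is not stated in the manual: a tie between a treatment value and a control value gives half a point to each side. This common convention makes the result independent of how tied values are listed. The check U_treatment + U_control = n1 × n2 holds because every pair contributes exactly one point in total.

```python
def u_statistics(treatment, control):
    """Return (U_treatment, U_control) by comparing every treatment/control pair.

    As in the manual's scoring, a treatment value earns a point for each larger
    control value, and a control value earns a point for each larger treatment value.
    Assumption: a tie across populations gives half a point to each side.
    """
    u_t = u_c = 0.0
    for t in treatment:
        for c in control:
            if c > t:
                u_t += 1.0
            elif t > c:
                u_c += 1.0
            else:
                u_t += 0.5
                u_c += 0.5
    return u_t, u_c


# Example data from Table 1.
m_treatment = [0.20, 0.25, 0.45, 0.35, 0.15, 0.10, 0.55, 0.40, 0.30, 0.45]
m_control = [0.55, 0.60, 0.65, 0.80, 0.35, 0.75, 0.70, 0.85, 0.90, 0.50]

u_t, u_c = u_statistics(m_treatment, m_control)
# Every one of the n1 * n2 pairs contributes exactly one point in total.
assert u_t + u_c == len(m_treatment) * len(m_control)
u = min(u_t, u_c)  # the final U is the smaller of the two
print(u_t, u_c, u)
```

For this example the half-point convention happens to give the same totals as Table 3 (94 and 6), so U = 6 either way.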
6. You are now ready to move to the final step, determining whether to reject or fail to reject
your null hypothesis. (Proceed to Section IV.)
IV. Critical values for non-parametric statistics
As you already know, there is a specific probability linked to every possible value of any
statistic, including the Mann-Whitney U statistic you just calculated.
Remember that we have defined our significance level (α) as 0.05. This implies that a
true null hypothesis will be (wrongly) rejected only 5% of the time, and correctly not rejected
95% of the time. A critical value of a statistic (e.g., your Mann-Whitney U statistic) is the cutoff
value at which the probability of obtaining a result at least that extreme, if the null hypothesis
is true, is 0.05 or lower. The critical values for the Mann-Whitney U statistic are listed in Table 4.
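Critical values such as those in Table 4 come from the exact distribution of U when the null hypothesis is true. Under that hypothesis, every ordering of the combined values is equally likely. The sketch below assumes no tied values and a two-tailed test, which matches a non-directional HA. It counts orderings with a standard recurrence on whichever value is largest. The names `u_counts` and `critical_u` are hypothetical helpers. Published tables may differ in convention, so use Table 5 for your report.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def u_counts(n1: int, n2: int) -> tuple:
    """Count orderings of n1 treatment and n2 control values (no ties) giving each U.

    Entry u of the returned tuple is the number of orderings with U == u, u = 0 .. n1 * n2.
    Here U counts (treatment, control) pairs in which the treatment value is larger; by
    symmetry the distribution is the same for the other population's U.
    """
    if n1 == 0 or n2 == 0:
        return (1,)  # a single possible ordering, with U = 0
    counts = [0] * (n1 * n2 + 1)
    # Case 1: the largest value belongs to Control, so it adds nothing to U.
    for u, ways in enumerate(u_counts(n1, n2 - 1)):
        counts[u] += ways
    # Case 2: the largest value belongs to Treatment; it beats all n2 control values.
    for u, ways in enumerate(u_counts(n1 - 1, n2)):
        counts[u + n2] += ways
    return tuple(counts)


def critical_u(n1: int, n2: int, alpha: float = 0.05):
    """Largest U whose exact two-tailed probability, 2 * P(U <= u), is at most alpha.

    Returns None when even U = 0 is not rare enough (very small samples).
    """
    counts = u_counts(n1, n2)
    total = comb(n1 + n2, n1)  # number of equally likely orderings under Ho
    cumulative = 0
    critical = None
    for u, ways in enumerate(counts):
        cumulative += ways
        if 2 * cumulative / total <= alpha:
            critical = u
        else:
            break
    return critical


print(critical_u(10, 10))
```

For two samples of 10, this sketch gives 23, the critical value used in the worked example of this section.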
Compare your U value to those shown in the Table of Critical Values for the Mann-Whitney
U (Table 4). Find the sample size (i.e., the number of Mitotic Indices (M) you calculated) for
each of your two cell populations, and use the matrix to find the critical value for U at those two
sample sizes. For example, if you calculated 19 M values for one cell population and 17 for
the other, then the critical value of the U statistic would be 99. This means that a U value of 99
or lower would lead you to reject the null hypothesis.
If your U value is equal to or lower than the critical value at the appropriate spot in the table,
reject your null hypothesis. If your U value is greater than that in the table, fail to reject.
In our example of treatment and control groups with 10 samples each, we obtained a
Mann-Whitney statistic of 6. This is far lower than the critical value of 23 required for rejection of the
null hypothesis. This means that there is very little overlap between the two populations: they
are significantly different. A complete table of Mann-Whitney U critical values can be found in
Table 5.
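The decision rule can be written as a small function. This sketch hard-codes only the two critical values quoted in this section: 23 for 10 and 10 samples, and 99 for 17 and 19. For any other sample sizes, look up the value in Table 5. `decide` and `CRITICAL_U` are hypothetical names.

```python
# Critical values of U quoted in this section (alpha = 0.05), keyed by the two sample
# sizes with the smaller one first. Other sample sizes must be looked up in Table 5.
CRITICAL_U = {(10, 10): 23, (17, 19): 99}


def decide(u, n1, n2):
    """Apply the rule: reject Ho when U is at or below the critical value."""
    critical = CRITICAL_U[(min(n1, n2), max(n1, n2))]  # KeyError if the sizes are not listed
    if u <= critical:
        return "reject Ho"
    return "fail to reject Ho"


print(decide(6, 10, 10))
```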
Table 4. Small section of a table of critical values for the Mann-Whitney U test. Example:
If both your treatment and control groups consist of ten values, then the critical value for
the Mann-Whitney U is shown in the square marked with the red arrow.
Table 5. Critical values for the Mann-Whitney U statistic. Find the value that
corresponds to the sample sizes (10 and 10 in our example) of your two cell populations. If your
U value is equal to or smaller than the value shown in the table for your two sample sizes, then,
if there were no real difference between your two cell populations, a difference at least this
large would arise by chance alone 5% of the time or less; reject your null hypothesis. If your
U value is larger than that shown in the table, fail to reject your null hypothesis. (From The Open
Door Web Site, http://www.saburchill.com/)
V. Graphic Representation of your Data
Tables of numerical data are important, but they are not always the best way to
present your data to an audience. As the old saying goes, “A picture is worth a
thousand words.” The most effective way to present your experimental results,
whenever possible, is with a figure.
A. Mitosis Raw Data
A simple bar graph can be used to represent the proportion of cells in your sample
that you found in each stage of mitosis. An example can be seen in Figure 1.
Figure 1. A bar graph showing a hypothetical distribution of cells in
each stage of mitosis in a study population of cells. Note that the
categories could be placed in any order, and do not necessarily
represent a continuum.
Don’t confuse a bar graph, which depicts categories of data that are not necessarily
continuous, with a histogram, which depicts continuous data. An example of a
histogram is shown in Figure 2.
Figure 2. A histogram showing a hypothetical distribution of cells of
different diameter in a population of cells. Note that each bar on the
histogram represents a specific subset of a range of continuous
numerical data that occur in a set order.
Notice that these figures, unlike tables, have their legends underneath. Be sure to use
the proper format for all figures and tables in all your work.
B. Visualizing Mann-Whitney U results
Because the Mann-Whitney U provides a measure of how great the overlap is
between two groups being compared, a box plot is a good way to represent your
Mann-Whitney U results. A box plot can be drawn to show the median of each
group, the range of values, and their overlap. An example of a box plot is shown in
Figure 3, with a key and explanation in Figure 4.
Figure 3. Sample box plot showing overlap of mitotic index values
for two populations of cells.
Figure 4. The black bar in the center of each population’s values
represents the median. The Interquartile Range (IQR) includes 50% of
the values, and is bordered on the bottom by the 25th percentile and
on the top by the 75th percentile. The range is the region between the
minimum and maximum values. The star represents a data point that
is an outlier.
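You can compute the quantities drawn in a box plot before plotting. Here is a minimal sketch with Python's `statistics.quantiles`, applied to the example data from Table 1. It makes two assumptions, neither of which comes from the manual. First, quartiles use the "inclusive" interpolation method; spreadsheets and plotting packages differ slightly in how they compute quartiles. Second, a point is flagged as an outlier if it lies more than 1.5 × IQR beyond the box, a widely used convention known as Tukey's rule. `box_plot_summary` is a hypothetical name.

```python
import statistics


def box_plot_summary(values, k=1.5):
    """Summarize `values` as drawn in a box plot (quartiles, whisker ends, outliers)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # height of the box: the middle 50% of the values
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr  # assumed 1.5 x IQR outlier rule
    outliers = [v for v in values if v < low_fence or v > high_fence]
    inliers = [v for v in values if low_fence <= v <= high_fence]
    return {
        "whisker_low": min(inliers),
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": max(inliers),
        "iqr": iqr,
        "outliers": outliers,
    }


# Example data from Table 1.
m_treatment = [0.20, 0.25, 0.45, 0.35, 0.15, 0.10, 0.55, 0.40, 0.30, 0.45]
m_control = [0.55, 0.60, 0.65, 0.80, 0.35, 0.75, 0.70, 0.85, 0.90, 0.50]

for name, data in (("Treatment", m_treatment), ("Control", m_control)):
    summary = box_plot_summary(data)
    print(name, summary)
```

For the example data, neither population has an outlier under this rule.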
VI. Experimental Error vs. Human Error
Your team made sure that all factors except one—the chemical used on one
population of onion root tips—were exactly the same for Treatment and Control groups.
But did you get exactly the same number of mitotic cells in each sample? Probably not.
What might account for slightly different results among samples?
Slight variation in results in carefully run trials is known as experimental variability
or experimental error. In this experiment, it could be due to genetic differences
between individual onions or to other biological factors. Note that this natural variability
is NOT the same as variability caused by actual mistakes in experimental technique
(human error).
DO NOT CITE HUMAN ERROR AS A REASON FOR UNEXPECTED RESULTS IN
YOUR EXPERIMENT! THAT IS UNPROFESSIONAL. If you make accidental
mistakes that could affect your results, you should re-do the experiment, not simply
explain away those mistakes as “human error.” Citing human error as a good reason
for your results is about as good as saying, “Oops! We are terrible at science. But we
don’t really care enough to do it right.”
NEVER include human error in this or any future discussions of experimental
variability. Experimental error ≠ mistakes! When contemplating your results, your
fellow scientists will assume you have done your experiments as carefully as possible,
and have minimized inaccuracies due to human error.
In statistics, an outlier is a data point that is very different from the majority of the
other data points. A data point’s outlier value may indicate experimental error or true
variability. If the investigators suspect an outlier is due to experimental error, it may be
excluded from the statistical analysis, but it should still be reported and included when you
present your data. Real data should not be ignored.
VII. Project Completed. Is This the End?
The study you are now completing is only the beginning of what could be a long-term research project to discover the various factors that direct and affect mitosis. The
only thing you are determining now is whether or not there is a statistically significant
difference between your treatment and control cell populations. In other words, the
research project you are now completing is a pilot study. It establishes an observable
fact (i.e., that there is or is not a difference in mitosis between cells treated with a
particular chemical and those treated with a placebo (plain water)). That fact should be
subject to further investigation beyond what you have accomplished here.
Although you may have established that there is or is not a difference in mitosis
between your treated and untreated roots, you still may not be able to definitively state
why there is or is not a difference. To do that, you must move to the next step, which
is to list as many competing hypotheses as possible as to why there is a difference (or
even—if your team has obtained negative results—why there is not a difference, despite
obvious differences in your two populations). Each of these multiple hypotheses could
form the basis for a research project that would take your team one step further towards
discovering the reasons for your pilot study’s observed result. You should be able to
give a brief description of an experiment that could be designed to test each of your
competing hypotheses.
In your presentation, be sure to include a list of hypotheses that could explain your
observed results. What factors differed between the cell populations that might cause
differences in mitosis? Consider what is happening on a cellular and molecular level.
When you analyze your results, think about every aspect of your findings, and report
anything you find intriguing enough to warrant further study.
Science is not a one-project endeavor. Every new piece of valid information can be
seen as opening a new doorway to discovery of the most intimate mechanisms of life.