The t Test
In biology you often want to compare two sets of
replicated measurements to see if they are the same or
different. For example, are plants treated with fertilizer
taller than those without?
If the means of the two sets are very different, then it is
easy to decide, but often the means are quite close and
it is difficult to judge whether the two sets are the same
or are significantly different.
The t test compares two sets of data and tells you the
probability (P) that the two sets are basically the same.
1.1.5 Deduce the significance of the difference between
two sets of data using calculated values for t and the
appropriate tables.(3)
If you carry out a statistical significance test, such as the
t-test, the result is a P value, where P is the probability
that there is no difference between the two samples.
A. When there is no difference between the two samples:
A small difference in the results gives a higher P value,
which suggests that there is no true difference between
the two samples
By convention, if P > 0.05 you can conclude that the
result is not significant (the two samples are not
significantly different).
B. When there is a difference between the two samples:
A larger difference in results gives a lower P value, which
makes you suspect there is a true difference (assuming
you have a good sample size).
By convention, if P < 0.05 you say the result is
statistically significant.
If P < 0.01 you say the result is highly significant and
you can be more confident you have found a true effect.
As always with statistical conclusions, you could be
wrong! It is possible there really is no effect, and you had
the bad luck to get sets of results that suggest a
difference where there is none.
Of course, even if results are statistically highly significant,
it does not mean they are necessarily biologically
important. Remember this when drawing conclusions.
Correlation does not imply causation!
Causation and correlation?
1.1.6 Explain that the existence of a correlation does not
establish that there is a causal relationship between two
variables.
Typically in Biology your experiment may involve a
continuous independent variable and a continuously
variable dependent variable, e.g. the effect of enzyme
concentration on the rate of an enzyme-catalyzed reaction.
The statistical analysis would set out to test the strength of
the relationship (correlation). Once a correlation between
two factors has been established from experimental data it
would be necessary to advance the research to determine
what the causal relationship might be.
Causation
It is important to realize that if the statistical analysis of
data indicates a correlation between the independent and
dependent variable this does not prove any causation.
Only further investigation will reveal the causal effect
between the two variables.
Correlation does not imply causation!
Skirt lengths and stock prices are highly correlated
(as stock prices go up, skirt lengths get shorter).
The number of cavities in elementary school children and
vocabulary size have a strong positive correlation.
Clearly there is no real interaction between the factors
involved; the correlation is simply a coincidence in the data.
Correlation vs. Causation: We have been discussing
correlation. We have looked at situations where there exists
a strong positive relationship between our variables x and y.
However, just because we see a strong relationship between
two variables, this does not imply that a change in one
variable causes a change in the other variable. Correlation
does not imply causation! Consider the following:
In the 1990s, researchers found a strong positive relationship
between the number of television sets per person x and the
life expectancy y of the citizens in different countries. That is,
countries with many TV sets had higher life expectancies.
Does this imply causation? By increasing the number of TVs
in a country, can we increase the life expectancy of their
citizens? Are there any hidden variables that may explain
this strong positive correlation?
There is a strong positive correlation between ice cream
sales and shark attacks. That is, as ice cream sales
increase, the number of shark attacks increase.
Is it reasonable to conclude the following?
Ice cream consumption causes shark attacks.
All of the previous examples show a strong positive
correlation between the variables. However, in each example
it is not the case that one variable causes a change in the
other variable. For example, increasing the number of ice
cream sales does not increase the number of shark attacks.
There are outside factors, also known as lurking variables,
which cause the correlation between these variables.
Correlation does not always mean that one thing
causes the other thing (causation), because
something else might have caused both.
For example, on hot days people buy ice cream, and
people also go to the beach where some are eaten
by sharks. There is a correlation between ice cream
sales and shark attacks (they both go up as the
temperature goes up in this case). But just because
ice cream sales go up does not cause (causation)
more shark attacks.
Correlation does not imply causation!
You may be interested to know that global warming, earthquakes, hurricanes, and
other natural disasters are a direct effect of the shrinking numbers of Pirates since the
1800s. For your interest, I have included a graph of the approximate number of pirates
versus the average global temperature over the last 200 years. As you can see, there is
a statistically significant inverse relationship between pirates and global temperature.
What is a t-test?
A t-test is any statistical hypothesis test in which the test
statistic follows a Student's t distribution if the null
hypothesis is supported.
What is a t-test used for?
It can be used to determine if two sets of data are
significantly different from each other, and is most
commonly applied when the test statistic would
follow a normal distribution.
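As a quick illustration (not part of the original slides), here is a minimal Python sketch of such a comparison using SciPy's ttest_ind; the plant-height numbers are made up for illustration.

# Minimal sketch (hypothetical data): compare two samples with an independent t-test.
from scipy import stats

fertilized   = [21.3, 24.1, 22.8, 25.0, 23.5]   # hypothetical plant heights (cm)
unfertilized = [19.8, 20.5, 22.1, 18.9, 21.0]   # hypothetical plant heights (cm)

t_stat, p_value = stats.ttest_ind(fertilized, unfertilized)   # two-tailed P by default
print(f"t = {t_stat:.3f}, P = {p_value:.4f}")
# By the convention used in these slides, P < 0.05 would be called significant.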
Why is it called a
Student's t-test?
The t-statistic was introduced in 1908 by William Sealy
Gosset, a chemist working for the Guinness brewery in
Dublin, Ireland ("Student" was his pen name).
Gosset had been hired due to Claude Guinness's policy
of recruiting the best graduates from Oxford and
Cambridge to apply biochemistry and statistics to
Guinness's industrial processes. Gosset devised the
t-test as a cheap way to monitor the quality of stout.
The t-test work was submitted to and accepted by the
journal Biometrika, which Karl Pearson had co-founded
and edited; the article was published in 1908. Since
Guinness had a company policy that chemists were not
allowed to publish their findings, the company allowed
Gosset to publish his mathematical work but only if he
used a pseudonym, which was "Student".
Time for Student's t-test…
Statistics makes the finest stout!
How do you do a t-test?
T-test values can be calculated with equations
but we will calculate them using EXCEL.
Type: 1 or 2
Type 1: matched pairs
Type 2: unpaired
Number of tails: 1 or 2
df: degrees of freedom
Significance level (α): usually P = 0.05
t-test to Compare Two Sample Means
Student’s t-Test
Student’s t-test is the most common (and simple) way of testing to
see if there is a significant difference between two independent
groups. The t-test statistic is calculated from the means, the number
of samples in each group (n1 and n2), and the variance of each group
(s1 and s2), according to the following equation.
The variance is simply the standard deviation squared.
Equation for t value for 2 means:

t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}
although you can use
equations we will use EXCEL
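To make the equation concrete, here is a short Python sketch (not from the slides) that evaluates it directly on the tumor-mass data used in the Excel example that follows.

# Sketch: evaluate the two-sample t equation directly (data from the Excel example below).
from math import sqrt
from statistics import mean, variance   # variance() is the sample variance, s squared

group_a = [0.72, 0.68, 0.69, 0.66, 0.57, 0.66, 0.70, 0.63, 0.71, 0.73]
group_b = [0.71, 0.83, 0.89, 0.57, 0.68, 0.74, 0.75, 0.67, 0.80, 0.78]

t = (mean(group_a) - mean(group_b)) / sqrt(
    variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
)
print(round(t, 2))   # about -2.06; with equal group sizes this matches the pooled t as well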
T-Test using EXCEL
(1) Make data table in EXCEL
Note: This only gives the P value.
(2) Add cell P = ; Insert / Function / TTEST
Type 1: paired
Type 2: unpaired
Group A tumor mass (g): 0.72, 0.68, 0.69, 0.66, 0.57, 0.66, 0.70, 0.63, 0.71, 0.73   Mean = 0.675
Group B tumor mass (g): 0.71, 0.83, 0.89, 0.57, 0.68, 0.74, 0.75, 0.67, 0.80, 0.78   Mean = 0.742

P = 0.0269
You need to activate Add-ins: Data Analysis Toolpak.
(1) Label a cell TTEST  (2) Click on the adjacent cell
(3) Tools | Data Analysis | t-Test: Two-Sample Assuming Equal Variances
Group 1 mass (g): 12.5, 13, 12, 12, 13, 14, 13, 10.5, 9.5, 11   Mean = 12.05
Group 2 mass (g): 12, 8.5, 10, 8, 8, 13.5, 9, 8.5, 6.5, 9   Mean = 9.3

t-Test: Two-Sample Assuming Equal Variances
                              Variable 1   Variable 2
Mean                          12.05        9.3
Variance                      1.858        4.233
Observations                  10           10
Pooled Variance               3.0458
Hypothesized Mean Difference  0
df                            18
t Stat                        3.5234
P(T<=t) one-tail              0.0012
t Critical one-tail           1.7341
P(T<=t) two-tail              0.0024
t Critical two-tail           2.1009
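For readers who prefer Python to Excel, a sketch (not part of the slides) that reproduces the equal-variance output above with SciPy:

# Sketch: reproduce the "Two-Sample Assuming Equal Variances" result above with SciPy.
from scipy import stats

group1 = [12.5, 13, 12, 12, 13, 14, 13, 10.5, 9.5, 11]   # mean 12.05
group2 = [12, 8.5, 10, 8, 8, 13.5, 9, 8.5, 6.5, 9]       # mean 9.3

res = stats.ttest_ind(group1, group2, equal_var=True)    # pooled-variance t-test
print(round(res.statistic, 4))                           # about 3.5234 (t Stat)
print(round(res.pvalue, 4))                              # about 0.0024 (two-tailed P)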
Hypothesis Testing
Using
t-tests
What is Hypothesis Testing?
Hypothesis testing is used to obtain information about a
population parameter. A hypothesis is created about the
population parameter, and then a sample from the
population is collected and analyzed. The data found will
either support or not support the hypothesis.
A statistic is any value that is computed from the data in
the sample. A test statistic is a statistic that can be used
to find evidence in a hypothesis test.
If a hypothesis test is conducted to find information
about the population mean, the sample mean would be a
logical choice of a statistic that would be useful.
Steps for Hypothesis Testing
Statistical test of difference using the t-Test.
There are a few steps for evaluating a dataset or
comparing multiple sets of data (the statistical inference
process). These steps are summarized in the following list:
1. State the null hypothesis and the alternative hypothesis
based on your research question. Define the hypothesis
as to whether your means or standard deviations are
significantly different.
Null Hypothesis: 'There is no significant difference
between the height of shells in sample A and sample B.'
H0: μ = μ0
Alternative Hypothesis: 'There is a significant difference
between the height of shells in sample A and sample B'.
HA: μ ≠ μ0
Hypothesis Testing
• The intent of hypothesis testing is to formally
examine two opposing conjectures
(hypotheses), H0 and HA
• These two hypotheses are mutually exclusive
and exhaustive so that one is true to the
exclusion of the other
• We accumulate evidence - collect and analyze
sample information - for the purpose of
determining which of the two hypotheses is true
and which of the two hypotheses is false
The Null and Alternative Hypothesis
The null hypothesis, H0:
• States the assumption (numerical) to be tested
• Begin with the assumption that the null
hypothesis is TRUE
• Always contains the ‘=’ sign
The alternative hypothesis, Ha:
• Is the opposite of the null hypothesis
• Challenges the status quo
• Never contains just the ‘=’ sign
• Is generally the hypothesis that is believed to
be true by the researcher
Null and Alternative Hypotheses
The null hypothesis, denoted H0, is the statement
that is being tested. Usually the null hypothesis is
the “status quo” or “no change” hypothesis.
The hypothesis test looks for evidence against
the null hypothesis.
The alternative hypothesis, denoted HA or H1, is the
statement that we are hoping is true or what we
wish to prove. It is the “opposite” of the null
hypothesis. Since we wish to prove the alternative
hypothesis, we usually write the alternative
hypothesis first and then the null hypothesis.
Statistical test of difference using the t-Test.
2. Set the critical P level (also called the alpha (α) level);
usually it will be P = 0.05 (5%).
The p-value is the probability of observing an outcome
as extreme or more extreme as the observed sample
outcome if the null hypothesis is true.
decide if the test should be 1- or 2-tailed
determine the number of degrees of freedom.
df = n1 + n2 - 2
3. Calculate the value of the appropriate statistic.
Use the t-test for comparing means
Level of Significance
Most hypothesis tests fall in the category of significance
tests. Before the test is started (before the sample is
chosen and anything is computed), a significance level, α
is chosen. The most commonly used significance levels
are α = 0.10, 0.05, or 0.01. If a significance level isn’t
specified, α = 0.05 is the most common choice.
The significance level is how much evidence is needed to
reject the null hypothesis. For example, if α = 0.05 is
chosen, the evidence is considered strong enough to
reject the null hypothesis if the data in the sample would
only happen 5% of the time, or less, when the null
hypothesis is true. That means that the null hypothesis
will only be rejected when the data in the sample isn’t
very likely if the null hypothesis is true.
4. Write the decision rule for rejecting the null hypothesis.
In biology the critical probability is usually taken as
0.05 (or 5%). This may seem very low, but it reflects the
fact that biology experiments are expected to produce
quite varied results.
If P > 5% then the two sets are the same (i.e. accept the
null hypothesis).
If P < 5% then the two sets are different (i.e. reject the
null hypothesis).
For the t test to work, the number of repeats should
be as large as possible, and certainly > 5.
5. Write a summary statement based on the decision.
Example: The null hypothesis is rejected since the calculated
P = 0.003 < 0.05 (two-tailed test).
Depending on whether the calculated value is greater than or
less than the tabulated value, you accept or reject your
hypothesis, and can thereby conclude whether your data is
significantly different or not.
6. Write a statement of results in standard English.
There is a significant difference between the height of
shells in sample A and sample B.
What are degrees of freedom?
The “df” in the t-distribution means “degrees of
freedom”, in comparing 2 means
df = n1 + n2 - 2
The t-distribution is a measure of the area under a curve.
The normal distribution
The central region on this graph is the acceptance area
and the tail is the rejection region, or regions. In this
particular graph of a two-tailed test, the rejection region
is shaded blue. The tail area is referred to as "alpha" or the
p-value (probability value). The area in the tails can be
described with z-scores; for example, the area of the
tails might be 5% (2.5% on each side).
The t-distribution looks almost identical to the normal
distribution curve, only it’s a bit shorter and fatter.
The t-distribution can be used for small samples.
The larger the sample size, the more the t-distribution
looks like the normal distribution. In fact, for sample sizes
larger than 20, the t-distribution is almost exactly like the
normal distribution. The “df” in the t-distribution means
“degrees of freedom” and is just the sample size minus
one (n-1).
This graph shows what three different t-distributions look
like. With a larger sample size (black line, infinite degrees
of freedom), the t-distribution looks identical to the normal
curve. But with a smaller sample size of four (df = 3), the
t-distribution curve is shorter and fatter.
How to Calculate a t-Distribution
Step 1: Calculate the df (degrees of freedom), e.g. DF = 8.
Step 2: Look up the df in the left-hand column of the
t-distribution table.
Step 3: Locate the column under your alpha level
(the alpha level is usually given to you in the question).
At the 0.05 level with DF = 8,
t crit = 2.306
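Instead of a printed table, the critical value can also be looked up in Python; a sketch (not from the slides) for the df = 8, alpha = 0.05 case above:

# Sketch: critical t values for df = 8 and alpha = 0.05.
from scipy.stats import t

df, alpha = 8, 0.05
print(round(t.ppf(1 - alpha / 2, df), 3))   # two-tailed critical value, about 2.306
print(round(t.ppf(1 - alpha, df), 3))       # one-tailed critical value, about 1.860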
In general, statistical tests are used for comparing two
means or two standard deviations to see if they are
significantly different.
You can also compare a mean from measured data to an
accepted value to see if your sample measurements
match the literature values.
There are two main types of t-tests we will use:
The usual form of the t test is for "unmatched pairs"
(type = 2), where the two sets of data are from
different individuals.
For example, leaves grown in the sun
and leaves grown in the shade.
The other form of the t test is
for "matched pairs" (type = 1),
where the two sets of data are
from the same individuals.
A good example of this is a
"before and after" test.
For example, the pulse rate of 8
individuals was measured before
and after eating a large meal,
with the results shown below.
The mean pulse rate is certainly
higher after eating, but is it
significantly higher?

Pulse rate (bpm)
Before eating: 105, 79, 79, 103, 87, 74, 73, 83   Mean = 85.4
After eating: 109, 87, 86, 109, 100, 82, 80, 90   Mean = 92.9
Hint: type 1 has 1 group
1. Set up the null and alternative hypothesis
Ho: there is no difference in the heart rate before and after eating a meal
HA: the heart rate is higher after eating
2. Set the critical P level (also called the alpha (α) level)
P = 0.05
3. Calculate the value of the appropriate statistic.
Which kind of t-test should be used… paired or unpaired?
Type 1
Calculate the degrees of freedom (DF):
DF = # of pairs of data − 1 = n − 1 = 8 − 1 = 7
TTEST
t-Test: Paired Two Sample for Means
                              Variable 1   Variable 2
Mean                          85.4         92.9
Variance                      152.6        135.0
Observations                  8            8
Pearson Correlation           0.9790
Hypothesized Mean Difference  0
df                            7
t Stat                        -8.275
P(T<=t) one-tail              0.000
t Critical one-tail           1.895
P(T<=t) two-tail              0.000
t Critical two-tail           2.365
Find the critical value: t critical = 1.895 (one-tailed, df = n − 1 = 7).
Determine if there is a difference or not.
|t| > t critical (8.275 > 1.895)
So, the null hypothesis is rejected and
the alternative hypothesis is accepted.
Conclusion: Eating a meal increases the heart rate
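A Python sketch (not part of the slides) of the same paired analysis on the pulse data above:

# Sketch: matched-pairs t-test on the pulse data above.
from scipy import stats

before = [105, 79, 79, 103, 87, 74, 73, 83]    # mean 85.4
after  = [109, 87, 86, 109, 100, 82, 80, 90]   # mean 92.9

res = stats.ttest_rel(before, after)           # paired test; P is two-tailed
print(round(res.statistic, 3))                 # about -8.275 (use the absolute value)
print(res.pvalue / 2)                          # halved for the one-tailed test used here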
Hint: type 2 has 2 independent groups
Evaluation of Means for small samples - The t-test
The t-test, and any statistical test of this sort, consists of
three steps.
1. Define the null and alternate hypotheses,
2. Calculate the t-statistic for the data,
3. Compare tcalc to the tabulated t-value for the
appropriate significance level and degrees of freedom.
If tcalc > ttab, we reject the null hypothesis and accept the
alternate hypothesis. Otherwise, we accept the null
hypothesis.
The t-test can be used to compare a sample mean to an
accepted value (a population mean), or it can be used to
compare the means of two sample sets.
Rejecting or Failing to
Reject the Null Hypothesis
If the p-value is less than the significance level,
reject the null hypothesis. For example, if α = 0.05
and the p-value is 0.03, reject the null hypothesis
because we expect to see the observed outcome
only 3% of the time if the null hypothesis is true.
So the observed outcome isn’t very likely.
More specifically the probability of the observed
outcome happening was less than 5% if the null
hypothesis is true. So reject the null hypothesis in
favor of the alternative hypothesis and say, “there
is sufficient evidence to reject the null hypothesis”.
To summarize in non-technical language: if
something is not very likely, reject it.
Rejecting or Failing to
Reject the Null Hypothesis
If the p-value is greater than the significance
level, fail to reject the null hypothesis.
For example, if α = 0.05 and the p-value is 0.15, fail to
reject the null hypothesis. The observed outcome is
expected 15% of the time if the null hypothesis is
true.
This may not seem very likely, but it is more likely
than 5% so the conclusion is to fail to reject the null
hypothesis, and we say, “there is not sufficient
evidence to reject the null hypothesis.”
Rejecting or Failing to
Reject the Null Hypothesis
Why shouldn’t the conclusion be, “there is sufficient
evidence to accept the null hypothesis”? It is a
convention based on the fact that in mathematics,
statements are not proved with examples. A claim
can be disproven with one example, but even one
million examples in favor of the claim can’t prove it.
To borrow a common phrase, “Absence of evidence
is not evidence of absence”. However, hypothesis
tests don’t actually prove anything anyways. They
are just a method of judging the evidence for or
against a hypothesis. Yet, the tradition is strong
enough, that a conclusion should never be, “there is
sufficient evidence to accept the null hypothesis”.
Analogy
Until the 17th century
Europeans thought
every swan was white
because for centuries,
every swan they saw
was white.
Then, a black swan was
discovered in Australia,
instantly disproving the
hypothesis that all swans
are white.
Analogy
Suppose a person thinks that there might have been a
skunk in his yard the previous night. A null hypothesis is
that there was no skunk in the yard (status quo).
The alternative hypothesis would then
be that there was a skunk in the yard.
H0: There was no skunk in the yard.
HA: There was a skunk in the yard.
He could go outside the next day and look for evidence
that there was a skunk. If he finds skunk fur or smells a
skunk, then he would have evidence to reject the null
hypothesis in favor of the alternative hypothesis
(that there was a skunk).
On the other hand, if he doesn’t find evidence that a
skunk was there, that does not mean that the null
hypothesis is true. A skunk could have been there
without leaving evidence. That is why he shouldn’t
say he accepts the null hypothesis. He doesn’t know
for sure that there wasn’t a skunk. He just doesn’t
have evidence to support the claim that there was a
skunk. So he says there is not sufficient evidence to
reject the null hypothesis or that he fails to reject the
null hypothesis.
If he rejects the null hypothesis, he could technically
say that he accepts the alternative hypothesis.
However, tradition dictates that conclusions are
always stated as rejecting or failing to reject
hypothesis rather than accepting hypothesis.
If he finds skunk fur in the yard, he would reject the
null hypothesis. Yet he still hasn’t proved that there
was a skunk. The dog could have brought the fur into
the yard. Because he hasn’t proved the alternative
hypothesis, (he might have strong evidence that there
was a skunk, but he hasn’t proven it) he shouldn’t say
that he accepts the alternative hypothesis.
P values
• Calculate a test statistic from the sample data that is
relevant to the hypothesis being tested
• After calculating a test statistic we convert this
to a P value by comparing its value to the
distribution of the test statistic under the null hypothesis
• It is a measure of how likely the test statistic value
is under the null hypothesis
P-value ≤ α ⇒ Reject H0 at level α
P-value > α ⇒ Do not reject H0 at level α
1- vs 2-Tailed Tests
EXP: This drug makes
tumors smaller
Exp: This drug makes rats grow bigger
This drug changes blood pressure
Type 1 (hint: type 1 has 1 group): DF = n − 1
Type 2 (hint: type 2 has 2 independent groups): DF = n1 + n2 − 2
Example #1: In an investigation to
determine the effectiveness of sequencing
fingerprint enhancement treatments, 10 prints are taken
and enhanced with DFO and then with ninhydrin.
The points of detail (minutiae) are
recorded. Is there a difference at the
95% confidence level?

DFO: 8, 12, 11, 6, 9, 11, 7, 8, 10, 9
DFO + Ninhydrin: 10, 15, 12, 6, 13, 14, 9, 9, 15, 12
In biometrics and forensic science,
minutiae are major features of a
fingerprint, using which comparisons
of one print with another can be made.
t-test for matched pairs
1. Set up the null and alternative hypothesis
Ho: there is no difference in the number of minutiae when using ninhydrin
HA: there are more minutiae after enhancement with ninhydrin
Is this 1 or 2 tailed?
1 tailed
2. Set the critical P level (also called the alpha (α) level).
95% confidence level → P = 0.05
3. Calculate the value of the appropriate statistic.
Which kind of t-test should be used… paired or unpaired?
Type 1
Calculate the degrees of freedom (DF):
DF = # of pairs of data − 1 = n − 1 = 10 − 1 = 9
(1) Label a cell TTEST  (2) Click on the adjacent cell
(3) Tools | Data Analysis | t-Test: Paired Two Sample for Means

DFO: 8, 12, 11, 6, 9, 11, 7, 8, 10, 9   Average = 9.1
DFO + Ninhydrin: 10, 15, 12, 6, 13, 14, 9, 9, 15, 12   Average = 11.5

t-Test: Paired Two Sample for Means
                              Variable 1   Variable 2
Mean                          9.1          11.5
Variance                      3.6555556    8.7222222
Observations                  10           10
Pearson Correlation           0.8953207
Hypothesized Mean Difference  0
df                            9
t Stat                        -5.0410083   (use the absolute value: |t| = 5.04)
P(T<=t) one-tail              0.0003494
t Critical one-tail           1.8331129
P(T<=t) two-tail              0.0006988
t Critical two-tail           2.2621572
Find the critical value: t critical = 1.833 (one-tailed, df = 9).
Determine if there is a difference or not.
|t| > t critical (5.04 > 1.833)
So, the null hypothesis is rejected and
the alternative hypothesis is accepted.
Conclusion: The ninhydrin does make a positive difference.
You can also use EXCEL to solve directly for the P value:
(1) Make data table in EXCEL
(2) Add cell P = ; Insert / Function / TTEST (Type 1: paired, Type 2: unpaired)

DFO: 8, 12, 11, 6, 9, 11, 7, 8, 10, 9   Average = 9.1
DFO + Ninhydrin: 10, 15, 12, 6, 13, 14, 9, 9, 15, 12   Average = 11.5

P = 0.0003494

The mean of DFO only is significantly less than the mean of
DFO + Ninhydrin because the value of P < 0.05.
The null hypothesis is rejected.
Conclusion: there are more minutiae after enhancement with ninhydrin.
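The same P value can be checked in Python; a sketch, not from the slides:

# Sketch: paired t-test on the DFO / DFO + ninhydrin minutiae counts above.
from scipy import stats

dfo     = [8, 12, 11, 6, 9, 11, 7, 8, 10, 9]      # mean 9.1
dfo_nin = [10, 15, 12, 6, 13, 14, 9, 9, 15, 12]   # mean 11.5

res = stats.ttest_rel(dfo, dfo_nin)
print(round(res.pvalue / 2, 7))   # one-tailed P, about 0.0003494, as in the Excel output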
t-test for unmatched (independent) pairs
If there is no before and after relationship between the
sample then the independent samples test is used.
t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}
t-test for unmatched
(independent) pairs
Example 2: Some brown dog
hairs were found on the
clothing of a victim at a
crime scene involving a dog.
The diameters of the five hairs were measured:
46, 57, 54, 51, 38 μm
A suspect is the owner of the dog with similar brown
hairs. A sample of the hairs has been taken and their
widths measured: 31, 35, 50, 35, 36 m
Is it possible that the hairs found on the victim were
left by the suspect’s dog? Test at the 5% level.
t-test for unmatched (independent) pairs
1. Set up the null and alternative hypothesis
Null Hypothesis (Ho): 'There is no significant difference
between the hairs from Dog A and Dog B.'
Alternative Hypothesis (HA): 'There is a significant
difference between the hairs from Dog A and Dog B.'
Is this 1 or 2 tailed?  2 tailed
Is this a Type 1 or Type 2 study?  Type 2… 2 independent groups
t-test for unmatched (independent) pairs
2. Calculate the mean and standard deviation for the data sets.

Dog A hair (μm): 46, 57, 54, 51, 38   Total = 246   Mean = 49.2   Std Dev = 7.463
Dog B hair (μm): 31, 35, 50, 35, 36   Total = 187   Mean = 37.4   Std Dev = 7.301
3. Calculate the magnitude of the difference between the two means.
X̄1 − X̄2 = 49.2 − 37.4 = 11.8

4. Calculate the standard error in the difference.
√(s1²/n1 + s2²/n2) = √(7.463²/5 + 7.301²/5) = √(11.14 + 10.66) = 4.669 ≈ 4.67

5. Calculate the value of t.
t = (X̄1 − X̄2) / √(s1²/n1 + s2²/n2) = 11.8 / 4.669 = 2.527 ≈ 2.53

6. Calculate the degrees of freedom (DF).
DF = n1 + n2 − 2 = 5 + 5 − 2 = 8
t-test for unmatched (independent) pairs
7. Choose your level of significance and find the critical value using the table.
At the 0.05 level (two-tailed, df = 8), t crit = 2.306.
8. Determine if there is a difference or not.
If t < critical value then there is no significant difference between the two data sets.
If t > critical value then there is a significant difference between the two data sets.
t > t critical (2.53 > 2.306)
So, at the 0.05 level there is a significant difference
between the two data sets… we reject the null
hypothesis that the hairs came from the same dog.
Conclusion: The hairs are from different dogs (HA)
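A Python sketch (not from the slides) retracing the hand calculation above:

# Sketch: retrace the dog-hair calculation (unpaired t-test, small samples).
from math import sqrt
from statistics import mean, stdev   # stdev() is the sample standard deviation

dog_a = [46, 57, 54, 51, 38]   # hairs found on the victim (um)
dog_b = [31, 35, 50, 35, 36]   # hairs from the suspect's dog (um)

diff = mean(dog_a) - mean(dog_b)                                        # 49.2 - 37.4 = 11.8
se = sqrt(stdev(dog_a)**2 / len(dog_a) + stdev(dog_b)**2 / len(dog_b))  # about 4.67
t = diff / se                                                           # about 2.53
df = len(dog_a) + len(dog_b) - 2                                        # 8
print(round(t, 2), df)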
I told you I
wuz framed!
Example #3: A researcher wishes to test if a certain kind of
growth hormone will produce faster growth in mice.
She injects 10 mice with the hormone and uses another
10 as a control.
Three weeks later, she weighs the
mice and discovers that the mean
weight of mice that have received
the injections is 12.05 g and the
mean weight of control mice is 9.3 g.

Group 1 – Hormone, mass (g): 12.5, 13, 12, 12, 13, 14, 13, 10.5, 9.5, 11   Mean = 12.05
Group 2 – No Hormone, mass (g): 12, 8.5, 10, 8, 8, 13.5, 9, 8.5, 6.5, 9   Mean = 9.3
These values indicate that the mice receiving the hormone
are heavier. Is her value of 12.05 significantly different
than 9.3?
Is it possible that the hormone has no effect, that the weight
difference between the two groups is due to chance?
This is like flipping a coin 10 times. You expect 5 heads and
5 tails but you might get 6 heads or 7 heads or perhaps 8
heads. Similarly, if the hormone does not work, you expect
the mean for the two groups to be similar but it may not be
exactly the same.
Group 1 – Hormone, mass (g): 12.5, 13, 12, 12, 13, 14, 13, 10.5, 9.5, 11   Mean = 12.05
Group 2 – No Hormone, mass (g): 12, 8.5, 10, 8, 8, 13.5, 9, 8.5, 6.5, 9   Mean = 9.3
Is this a Type 1 or
Type 2 study?
Two independent
groups… Type 2
How many degrees of
freedom are there?
DF = n1 + n2 − 2 = 10 + 10 − 2 = 18
What is the chance that the two means would be as different
as 12.05g and 9.3g if the hormone really did not work?
Statistical tests test whether differences in the data are real
differences or whether they are due to chance. In the example
above, we test if the mean of group 1 is significantly different
than the mean of group 2.
The alternative is that the difference is due to chance or
random fluctuations and the hormone did not cause additional
weight gain. The test gives the probability that difference could
be due to chance.
If the probability (P) that the difference is due to chance is less
than 1 out of 20 (P < 0.05), then we conclude that the difference is
real. If the probability is greater than 0.05, we conclude that the
difference is not significant; it could be due to chance.
There are several tests available for testing means.
A commonly used test for data that are normally
distributed is the t-test.
Null hypothesis (Ho): There is no difference in growth.
The researcher's hypothesis (HA) is that newborn mice injected
with the hormone will be heavier after 3 weeks of growth than
mice without the hormone.
Is this a one or two tailed test?
One tailed
Number of Tails
The number of tails in a test refers to the number of
ways that the two groups can differ. The following
hypothesis would lead us to perform a two-tailed test:
The mean weight of mice injected with the hormone will
be different than the mean weight of the control mice.
This is two-tailed because the hypothesis proposes two
possible outcomes. The hypothesis is true if the weight of
hormone mice is greater than the weight of control mice.
The hypothesis is also true if the weight of hormone mice
is less than the weight of control mice.
The following hypothesis would
lead us to perform a one- tailed test:
The mean weight of mice injected with the hormone will be
greater than the mean weight of the control mice.
The following hypothesis would also lead us to perform a
one-tailed test.
The mean weight of mice injected with the hormone will be
less than the mean weight of the control mice.
This is a one-tailed test because the hypothesis proposes
that there is only one possible outcome: the weight of the
hormone mice will be less than the weight of the control
mice.
The calculations for the test can be performed by hand
but computer software can do them very quickly.
To perform the test, the weight data for the two groups
of mice above are entered into a t-test program.
Using P = 0.05, what is the critical value? t critical = 1.734 (one-tailed, df = 18).
(1) Label a cell TTEST  (2) Click on the adjacent cell
(3) Tools | Data Analysis | t-Test: Two-Sample Assuming Equal Variances

Group 1 mass (g), with Hormone: 12.5, 13, 12, 12, 13, 14, 13, 10.5, 9.5, 11   Mean = 12.05
Group 2 mass (g), Control: 12, 8.5, 10, 8, 8, 13.5, 9, 8.5, 6.5, 9   Mean = 9.3

t-Test: Two-Sample Assuming Equal Variances
                              Variable 1   Variable 2
Mean                          12.05        9.3
Variance                      1.858        4.233
Observations                  10           10
Pooled Variance               3.0458
Hypothesized Mean Difference  0
df                            18
t Stat                        3.523
P(T<=t) one-tail              0.001214
t Critical one-tail           1.734
P(T<=t) two-tail              0.002427
t Critical two-tail           2.101
Determine if there is a difference or not.
If t < critical value then there is no significant
difference between the two data sets
If t > critical value then there is a significant
difference between the two data sets
t > t critical (3.523 > 1.734)
So, at 0.05 level there is a significant difference
between the two data sets… we reject the null
hypothesis… that the mice had the same mass gain.
Alternate hypothesis: The mice that received the
hormone injection gained more weight.
You could also come to this conclusion using the P value:
INSERT \ FUNCTION \ TTEST

Group 1 mass (g): 12.5, 13, 12, 12, 13, 14, 13, 10.5, 9.5, 11   Mean = 12.05
Group 2 mass (g): 12, 8.5, 10, 8, 8, 13.5, 9, 8.5, 6.5, 9   Mean = 9.3

P = 0.0012
The software reveals that p = 0.0012. The probability that
the difference between the two means (12.05 and 9.3) is due
to chance (random effects) is 0.0012 (or 12 out of 10,000).
Because p < 0.05 ( 5 out of 100), we conclude that the two
means are really different and that the difference is not due
to chance.
Conclusion: The researcher accepts her hypothesis
that the hormone produces faster growth.
If p had been greater than 0.05, we would reject her
hypothesis (accept Ho) and conclude that the two
means are not significantly different ; the hormone
did not cause one group to be heavier.
The word "significant" has a slightly different meaning in
statistics than it does in general usage.
In a statistical test of two means, if the difference is not due
to chance, we conclude that the two means are significantly
different.
In this example, the mean weight of group 1 is significantly
heavier than the mean weight of group 2.
More Examples of
Statistical Analysis
Using a t-test
Example #4: A researcher wishes to learn if a certain drug slows
the growth of tumors. She obtained mice with tumors and
randomly divided them into two groups. She then injected one
group of mice with the drug and used the second group as a
control. After 2 weeks, she sacrificed the mice and weighed the
tumors. The weight of tumors for each group of mice is below.
The researcher is interested in learning if the drug reduces the
growth of tumors.
Her hypothesis is: The mean weight of tumors from mice in
group A will be less than the mean weight of tumors from mice in group B.
Group A – Treated with Drug, tumor mass (g): 0.72, 0.68, 0.69, 0.66, 0.57, 0.66, 0.70, 0.63, 0.71, 0.73   Mean = 0.675
Group B – Control, Not Treated, tumor mass (g): 0.71, 0.83, 0.89, 0.57, 0.68, 0.74, 0.75, 0.67, 0.80, 0.78   Mean = 0.742
A t-test can be used to test the probability that the
two means do not differ.
1. Set up the null and alternative hypothesis
Null hypothesis Ho: the two means of the tumor masses
do not differ.
Alternative hypothesis HA: The tumors from the
group treated with the drug will weigh less than
tumors from the control group.
Is this a one or two tailed test?
This is a one-tailed test because the researcher is
interested in whether the drug decreased tumor size,
not simply in whether the drug changed tumor size.
The values from the table above are entered into the
spreadsheet as shown below.
You need to activate Add-ins: Data Analysis Toolpak.
(1) Label a cell TTEST  (2) Click on the adjacent cell
(3) Tools | Data Analysis | t-Test: Two-Sample Assuming Equal Variances

Group A, with drug, tumor mass (g): 0.72, 0.68, 0.69, 0.66, 0.57, 0.66, 0.7, 0.63, 0.71, 0.73   Mean = 0.675
Group B, control, tumor mass (g): 0.71, 0.83, 0.89, 0.57, 0.68, 0.74, 0.75, 0.67, 0.8, 0.78   Mean = 0.742
t-Test: Two-Sample Assuming Equal Variances
                              Variable 1   Variable 2
Mean                          0.675        0.742
Variance                      0.0022944    0.00824
Observations                  10           10
Pooled Variance               0.0052672
Hypothesized Mean Difference  0
df                            18
t Stat                        -2.0642818
P(T<=t) one-tail              0.0268544
t Critical one-tail           1.7340636
P(T<=t) two-tail              0.0537089
t Critical two-tail           2.100922
Using P = 0.05, what is the critical value? t critical = 1.734 (one-tailed, df = 18).
Determine if there is a difference or not.
If t < critical value then there is no significant
difference between the two data sets.
If t > critical value then there is a significant
difference between the two data sets.
|t| > t critical (2.06 > 1.734)
Decision: So, at 0.05 level there is a significant
difference between the two data sets… we reject the
null hypothesis… that both groups would have the
same mass tumors.
Conclusion: The drug reduced the size of the tumors.
Decision: The t-test shows that tumors from the drug group
were significantly smaller than the tumors from the control
group because p < 0.05. Reject the null hypothesis!
Conclusion: The researcher therefore accepts her hypothesis
(HA) that the drug reduces the growth of tumors.
T-Test using EXCEL
Note: This only gives the P value.
(1) Make data table in EXCEL
(2) Add cell P = ; Insert / Function / TTEST (Type 1: paired, Type 2: unpaired)

Group A, with drug, tumor mass (g): 0.72, 0.68, 0.69, 0.66, 0.57, 0.66, 0.7, 0.63, 0.71, 0.73   Mean = 0.675
Group B, control, tumor mass (g): 0.71, 0.83, 0.89, 0.57, 0.68, 0.74, 0.75, 0.67, 0.8, 0.78   Mean = 0.742

P = 0.0269
Example #5: A researcher
wishes to learn whether the pH
of soil affects seed germination
of a particular herb found in
forests near her home.
She filled 10 flower pots with acid soil (pH 5.5) and ten flower
pots with neutral soil (pH 7.0) and planted 100 seeds in each pot.
The mean number of seeds that germinated in each type of soil
is given on the table…
Table 1: Mean % germination of geranium
seeds at different pH

Acid Soil, pH 5.5, % germination: 42, 45, 40, 37, 41, 41, 48, 50, 45, 46   Mean = 43.5
Neutral Soil, pH 7.0, % germination: 43, 51, 56, 40, 32, 54, 51, 55, 50, 48   Mean = 48
The researcher is testing whether soil pH affects
germination of the herb.
A t-test can be used to test the probability that the
two means do not differ.
Her hypothesis is: The mean germination at pH 5.5 is
different than the mean germination at pH 7.0.
What is the null hypothesis?
Null hypothesis (Ho) : The mean germination at pH 5.5 is
the same as the mean germination at pH 7.0.
Alternative hypothesis (HA) : The mean germination at pH
5.5 is different than the mean germination at pH 7.0.
Is this a one or two tailed test?
This is a two-tailed test because the researcher is interested
in if soil acidity changes germination percentage. She does
not specify if it increases or decreases germination.
Table 1: Mean % germination of geranium seeds at different pH
Acid Soil, pH 5.5, % germination: 42, 45, 40, 37, 41, 41, 48, 50, 45, 46   Mean = 43.5
Neutral Soil, pH 7.0, % germination: 43, 51, 56, 40, 32, 54, 51, 55, 50, 48   Mean = 48

Is this a Type 1 or Type 2 study?
Two independent groups… Type 2
How many degrees of freedom are there?
DF = n1 + n2 − 2 = 10 + 10 − 2 = 18
Choose your level of significance and find
the critical value using the table:
at the 0.05 level (two-tailed, df = 18), t crit = 2.101.
Calculate the value of t (use EXCEL):
Tools | Data Analysis | t-Test: Two-Sample Assuming Equal Variances

Acid Soil, pH 5.5, % germination: 42, 45, 40, 37, 41, 41, 48, 50, 45, 46   Mean = 43.5
Neutral Soil, pH 7.0, % germination: 43, 51, 56, 40, 32, 54, 51, 55, 50, 48   Mean = 48

t-Test: Two-Sample Assuming Equal Variances
                              Variable 1    Variable 2
Mean                          43.5          48
Variance                      15.83333333   57.333333
Observations                  10            10
Pooled Variance               36.58333333
Hypothesized Mean Difference  0
df                            18
t Stat                        -1.66362669   (use the absolute value: |t| = 1.66)
P(T<=t) one-tail              0.056749718
t Critical one-tail           1.734063592
P(T<=t) two-tail              0.113499436
t Critical two-tail           2.100922037
Determine if there is a difference or not.
If t < critical value then there is no significant
difference between the two data sets
If t > critical value then there is a significant
difference between the two data sets
|t| < t critical (1.66 < 2.101)
Decision: So, at the 0.05 level there is not a significant
difference between the two data sets… we accept the
null hypothesis… that both groups would have the
same % germination.
Conclusion: The mean germination at pH 5.5 is not
different than the mean germination at pH 7.0.
The t-test shows that the mean germination of the two groups
does not differ significantly because p > 0.05. The researcher
concludes that pH does not affect germination of the herb.
Example #6: Suppose a
researcher wished to
learn if a particular
chemical is toxic to a
certain species of beetle.
She believes the chemical might interfere with the
beetle’s reproduction. She obtained beetles and divided
them into two groups. She then fed one group of beetles
with the chemical and used the second group as a
control. After 2 weeks, she counted the number of eggs
produced by each beetle in each group.
The mean egg count for each group of beetles is below.
Group 1, fed chemical: 33, 31, 34, 38, 32, 28   Mean = 32.7
Group 2, not fed chemical (control): 35, 42, 43, 41   Mean = 40.3
1. Set up the null and alternative hypothesis
The researcher believes the chemical
interferes with beetle reproduction.
She suspects that the chemical reduces
egg production.
Her hypothesis is: The mean number of eggs in group 1 is
less than the mean number in group 2.
A t-test can be used to test the probability that the two
means do not differ (null hypothesis).
Ho: The mean number of eggs
is the same for both groups.
HA: The mean number of eggs is less
for group 1 than for group 2.
Is this a one or two tailed test?
This is a 1-tailed test because her hypothesis proposes that
group 2 will have greater reproduction than group 1.
If she had proposed that the two groups would have
different reproduction but was not sure which group
would be greater, then it would be a 2-tailed test.
Group 1, fed chemical: 33, 31, 34, 38, 32, 28   Mean = 32.7
Group 2, not fed chemical (control): 35, 42, 43, 41   Mean = 40.3

Is this a Type 1 or Type 2 study?
Type 2 … there are 2 independent groups.
Note: group populations can be different sizes.

Calculate the degrees of freedom:
DF = n1 + n2 − 2 = 6 + 4 − 2 = 8
Choose your level of significance and find
the critical value using the table:
at the 0.05 level (one-tailed, df = 8), t crit = 1.86.
Calculate the value of t (use EXCEL):
Tools | Data Analysis | t-Test: Two-Sample Assuming Equal Variances
t-Test: Two-Sample Assuming Equal Variances
                              Variable 1   Variable 2
Mean                          32.67        40.25
Variance                      11.07        12.92
Observations                  6            4
Pooled Variance               11.76
Hypothesized Mean Difference  0
df                            8
t Stat                        -3.426
P(T<=t) one-tail              0.005
t Critical one-tail           1.860
P(T<=t) two-tail              0.009
t Critical two-tail           2.306
Determine if there is a difference or not.
If t < critical value then there is no significant
difference between the two data sets
If t > critical value then there is a significant
difference between the two data sets
|t| > t critical (3.43 > 1.86)
So, at the 0.05 level there is a significant difference between
the two data sets… we reject the null hypothesis… that
both beetle groups had the same # of eggs.
HA: The mean number of eggs is less for group 1 than
group 2.
Conclusion: The chemical reduces
the number of beetle eggs produced.
You can also determine the answer by finding the P value:
Insert / Function / TTEST

Group 1: 33, 31, 34, 38, 32, 28
Group 2: 35, 42, 43, 41

P = 0.0045 (0.00450564)
The results of her t-test are copied below.
t-Test: Two-Sample Assuming Equal Variances
                              Variable 1   Variable 2
Mean                          32.67        40.25
Variance                      11.07        12.92
Observations                  6            4
Pooled Variance               11.76
Hypothesized Mean Difference  0
df                            8
t Stat                        -3.426
P(T<=t) one-tail              0.005
t Critical one-tail           1.860
P(T<=t) two-tail              0.009
t Critical two-tail           2.306
The researcher concludes that the mean of group 1 is
significantly less than the mean for group 2 because the
value of P < 0.05. She accepts her hypothesis that the
chemical reduces egg production because group 1 had
significantly fewer eggs than the control.
Hypothesis Testing Errors
Type I Error: A type I error happens when a true null
hypothesis is rejected. The probability of a type I
error is denoted α.
Type II Error: A type II error occurs when a false null
hypothesis is not rejected. The probability of a type II
error is denoted β.
Decision             H0 is true         H0 is false
Reject H0            Type I Error       Correct Decision
Fail to Reject H0    Correct Decision   Type II Error
The goal is to make both α and β as small as
possible. Unfortunately for a fixed sample size,
decreasing α will increase β and vice versa.
It is necessary to choose which error is more
important to decrease based on the scenario.
In most cases, β is difficult to calculate so α is
set between 0.01 and 0.10.
The probability of a type I error, α, is also the level of
significance. If α = 0.05 is chosen, then a null
hypothesis is rejected if the sample would only
happen 5% of the time if the null hypothesis is true.
That means that there is a probability of 0.05 that
the null hypothesis will be rejected when it is true.
Analogy
Trials are like hypothesis
tests. Since a person is
innocent until proven
guilty, innocence is the
status quo.
H0 = the defendant is innocent
HA = the defendant is guilty
If a jury convicts an innocent man, the jury has made a
type I error.
If the jury comes to the conclusion that a man is
innocent, but he was actually guilty, they have made a
type II error.
In this case, a decision has to be made about whether it is
better to minimize the probability of a type I error or a type
II error. Is it better to send an innocent man to jail or to
release a guilty man?
In criminal trials the
precedent is that a man
is only convicted if the
evidence is beyond a
reasonable doubt.
We don’t want to convict an innocent man, so the courts
try to minimize the probability of a type I error. Of course
this means that more guilty people are not convicted so
the probability of a type II error is higher.
Species–area curve.
Figure EMC.sp-12. Number of wildlife species or
groups using different tree/snag species in Eastside
Mixed Conifer Forest Wildlife Habitat Type.
Data Evaluation and Comparisons
http://www.chem.utoronto.ca/coursenotes/analsci/StatsTutorial/AdvStats.html
T-Test Calculation: Excel 2007 (calculating P)
In Excel 2007 the TTEST function to calculate P is accessed by
following the routine provided to the left.
Note that this directly calculates P and not the t Stat.
After step 5 a dialog box opens (see below).
T-Test Calculation: Excel 2007 (calculating P)
Enter the settings as provided.
In Excel 2003 the t test is performed using the formula:
=TTEST(range1, range2, tails, type).
For the examples you'll use in biology, tails is always
2, and type can be:
1, paired
2, two-sample equal variance
3, two-sample unequal variance
The cell with the t test P can
be formatted as a
percentage (Format menu >
cell > number tab >
percentage).
This automatically
multiplies the value by 100
and adds the % sign.
This can make P values
easier to read and
understand. It's also a good
idea to plot the means as a
bar chart with error bars of
standard error (or SD) to
show the variability in the
data.
Sample Excel Data
t-Test: Two-Sample Assuming Unequal Variances
                              Variable 1   Variable 2
Mean                          31.4         41.6
Variance                      32.0         18.5
Observations                  10           10
Hypothesized Mean Difference  0
df                            17
t Stat                        -4.54
P(T<=t) one-tail              0.00
t Critical one-tail           1.74
P(T<=t) two-tail              0.00
t Critical two-tail           2.11
t-test to Compare One Sample Mean to an Accepted Value
We have already seen how to do the first step, and have
null and alternate hypotheses. The second step involves
the calculation of the t-statistic for one mean, using the
formula:
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}

where s is the standard deviation of the sample, not the
population standard deviation.
t Test (non-matched pairs)

t = \frac{\bar{X}_A - \bar{X}_B}{\sqrt{s_A^2/n + s_B^2/n}}

where n is the number of points
\bar{X}_A is the mean of set A
s_A is the standard deviation of set A
\bar{X}_B is the mean of set B
s_B is the standard deviation of set B
The probability is then found from a table of t values.
t Test (matched pairs)

t = \frac{\bar{X}}{s / \sqrt{n}}

where n is the number of points
\bar{X} is the mean of the differences
and s is the standard deviation of the differences
t-test to Compare Two Sample Means
In this case, we require two separate sample means,
standard deviations and sample sizes. The number of
degrees of freedom is computed using the formula
d.o.f. = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{s_1^4}{n_1^2(n_1 - 1)} + \frac{s_2^4}{n_2^2(n_2 - 1)}}
and the result is rounded to the nearest whole number.
Once these quantities are determined, the same three
steps for determining the validity of a hypothesis are
used for two sample means.
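In Python, the unequal-variance (Welch) version is a one-line change; SciPy applies the degrees-of-freedom formula above internally (without rounding it to a whole number). A sketch with hypothetical data:

# Sketch: Welch's t-test (two samples, unequal variances), hypothetical data.
from scipy import stats

sample1 = [6.4, 6.6, 6.5, 6.3, 6.7, 6.5]   # hypothetical measurements
sample2 = [6.9, 6.8, 6.7, 6.8, 6.9, 6.7]   # hypothetical measurements

res = stats.ttest_ind(sample1, sample2, equal_var=False)   # Welch's t-test
print(round(res.statistic, 3), round(res.pvalue, 4))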
Table of Critical t-Values for 95% Confidence Level
(critical values for a 2-tailed t-test at the 95% confidence
level, generated from Excel using the TINV function)

ν = n − 1 : tcrit
1: 12.706    2: 4.303     3: 3.182     4: 2.776
5: 2.571     6: 2.447     7: 2.365     8: 2.306
9: 2.262     10: 2.228    12: 2.179    14: 2.145
15: 2.131    16: 2.120    18: 2.101    20: 2.086
25: 2.060    30: 2.042
t-test to Compare One Sample Mean to an Accepted Value
In the example, the mean of the arsenic concentration
measurements was x̄ = 4 ppm, with n = 7 and
sample standard deviation s = 0.9 ppm. We established
suitable null and alternative hypotheses:
Null Hypothesis
H0: μ = μ0
Alternate Hypothesis HA: μ > μ0
where μ0 = 2 ppm is the allowable limit and μ is the
population mean of the measured soil (refresher on the
difference between sample and population means).
t-test to Compare One Sample Mean to an Accepted Value
For the third step, we need a table of tabulated t-values
for significance level and degrees of freedom, such as
the one found in your lab manual or most statistics
textbooks. Referring to a table for a 95% confidence limit
for a 1-tailed test, we find t(ν = 6, 95%) = 1.94.
(The difference between 1- and 2-tailed distributions was
covered in a previous section.)
t-test to Compare One Sample Mean to an Accepted Value
We are now ready to accept or reject the null hypothesis.
If tcalc > ttab, we reject the null hypothesis. In our case,
tcalc = 5.88 > ttab = 1.94, so we reject the null hypothesis,
and say that our sample mean is indeed larger than the
accepted limit, and not due to random chance, so we can
say that the soil is indeed contaminated.
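A Python sketch (not from the slides) of the same kind of one-sample comparison; the seven individual arsenic measurements are not given in the slides, so hypothetical values with roughly the stated mean and spread are used.

# Sketch: one-sample t-test against an accepted limit.
# Hypothetical raw data (mean near 4 ppm, s near 0.9 ppm); the slides give only summaries.
from scipy import stats

arsenic_ppm = [3.1, 4.2, 4.8, 3.6, 5.0, 4.4, 2.9]   # hypothetical n = 7 measurements
limit = 2.0                                          # accepted limit, mu0 = 2 ppm

res = stats.ttest_1samp(arsenic_ppm, popmean=limit)
one_tailed_p = res.pvalue / 2                        # HA is one-sided: mu > mu0
print(round(res.statistic, 2), one_tailed_p < 0.05)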
One and Two Sided
Tests
• Hypothesis tests can be one or two sided
(tailed)
• One tailed tests are directional:
H0: μ1 - μ2 ≤ 0
HA: μ1 - μ2 > 0
• Two tailed tests are not directional:
H0: μ1 - μ2 = 0
HA: μ1 - μ2 ≠ 0
Some Notation
• In general, critical values for an α level test are
denoted as:
One sided test: X_α
Two sided test: X_α/2
where X depends on the distribution of the test
statistic
• For example, if X ~ N(0,1):
One sided test: z_α (i.e., z_0.05 = 1.64)
Two sided test: z_α/2 (i.e., z_0.05/2 = z_0.025 = ±1.96)
1- vs 2-Tailed Tests
One-tailed tests
In the previous pages, you learned how to define the
hypothesis for a statistical test, then to
perform a t-test to compare means. In the example t-test
we performed, we defined an alternate hypothesis to test
whether one mean was greater than the other: μ > μ0.
In this situation, we tested whether one mean was higher
than the other. We were not interested in whether the
first mean was lower than the other, only if it was higher.
So we were only interested in one side of the probability
distribution, which is shown in the image below:
One-tailed tests
In this distribution, the shaded region shows the area
represented by the null hypothesis, H0: μ = μ0. This
actually implies μ ≤ μ0, since the only unshaded region in
the image shows μ > μ0. Because we were only
interested in one side of the distribution, or one "tail",
this type of test is called a one-sided or a one-tailed test.
When you are using tables for probability distributions,
you should make sure whether they are for one-tailed or
two-tailed tests. Depending on which they are for, you
need to know how to switch to the one you need.
One-tailed tests
A one-tailed test uses an alternate hypothesis that states
either H1: μ > μ0 OR H1: μ < μ0, but not both. If you want
to test both, using the alternate hypothesis H1: μ ≠ μ0,
then you need to use a two-tailed test.
Two-tailed tests
We would use a two-tailed test to see if two means are
different from each other (ie from different populations),
or from the same population. As an example, let's assume
that we want to check if the pH of a stream has changed
significantly in the past year. A water sample from the
stream was analyzed using a pH electrode, where six
samples were taken. It was found that the mean pH
reading was 6.5 with standard deviation s_old = 0.2.
A year later, six more samples were analyzed, and the
mean pH of these readings was 6.8 with standard
deviation s_new = 0.1.
Example 1
We could use a one-tailed test, to see if the stream has a
higher pH than one year ago, for which we would use the
alternate hypothesis HA: μprev < μcurrent. However, we may
want a more rigorous test, for the hypothesis that HA: μprev
≠ μcurrent. This covers both possibilities, μprev < μcurrent and
μprev > μcurrent, and if we reject H0 we can be sure
that there is a significant difference between the means.
The probability distribution for a 90% confidence level,
two-tailed test looks like this:
Continuing the example, we define the null hypothesis
Ho: μprev = μcurrent, and the alternate hypothesis HA:
μprev ≠ μcurrent. The d.o.f. for a two-sample mean t-test is
ν = 7.35 ≈ 7, since the d.o.f. must be a whole number.
The t-value for the two-sample test is
t = (6.8 − 6.5) / √((0.1)²/6 + (0.2)²/6) = 3.29
If we consult a two-tailed t-test table, for a 95%
confidence limit, we find that t7,95% = 2.36.
Since tcalc > t7,95%, we reject the null hypothesis, accept
the alternate hypothesis that μprev ≠ μcurrent, and can say
that the means are significantly different.
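The same calculation can be reproduced from the summary statistics alone. Here is a minimal sketch, assuming SciPy (the slides themselves work from printed tables):

# Sketch (assuming SciPy) reproducing the stream-pH example from its
# summary statistics: an unequal-variance (Welch) two-sample t-test.
from scipy import stats

res = stats.ttest_ind_from_stats(mean1=6.8, std1=0.1, nobs1=6,
                                 mean2=6.5, std2=0.2, nobs2=6,
                                 equal_var=False)   # Welch's test
print(f"t = {res.statistic:.2f}, two-tailed P = {res.pvalue:.4f}")
# t ~ 3.29 and P < 0.05, so the null hypothesis is rejected, as in the text.

# The tabulated critical value can be checked the same way:
t_crit = stats.t.ppf(1 - 0.05 / 2, df=7)            # ~2.36 for nu = 7
print(f"t(7, 95%, two-tailed) = {t_crit:.2f}")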
Using Tables for One- and Two-Tailed Tests
Some tables of critical t-values only give you the values
for either a one- or two-tailed test, but not both.
Because of this, you will need to know how to use one-tailed tables for two-tailed tests, and vice versa.
The conversion is actually quite simple:
Table you have    Operation                       To get ...
One-tailed        Look up P/2 (divide P by 2)     Critical value for a two-tailed test at P
Two-tailed        Look up 2P (multiply P by 2)    Critical value for a one-tailed test at P

(Here P is the significance level required for the test you want to perform.)
For example, assume you have a one-tailed table and want
to perform a two-tailed test at the 98% confidence level.
For the 98% confidence level, P = 0.02. Divide P by 2 to
get 0.01, which corresponds to the 99% one-tailed column.
So you would compare tcalc to the value in the 99% column
of the one-tailed table, and this is equivalent to a
two-tailed test at the 98% confidence level.
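A quick numerical check of this equivalence, as a sketch assuming SciPy, with 7 degrees of freedom chosen only for illustration:

# Sketch (assuming SciPy, df = 7 chosen arbitrarily): the one-tailed entry
# at P = 0.01 is the same number as the two-tailed critical value at P = 0.02.
from scipy import stats

df = 7
t_one_tailed = stats.t.ppf(1 - 0.01, df)      # 99% one-tailed column
t_two_tailed = stats.t.ppf(1 - 0.02 / 2, df)  # 98% confidence, two-tailed
print(t_one_tailed, t_two_tailed)             # both ~3.00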
Cell B12 holds the matched-pairs t-test probability, which
is a tiny 0.000065 and indicates that the difference between
the before and after results is very highly significant. If
we had used the normal unmatched-pairs t-test instead, we
would have obtained P = 0.225, which is higher than 0.05 and
so indicates that the apparent increase in pulse rate with
eating is not significant! This shows the importance of
choosing the right test.
Pulse / bpm   Before eating   After eating
              105             109
              79              87
              79              86
              103             109
              87              100
              74              82
              73              80
              83              90
mean          85.4            92.9
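The same comparison can be reproduced outside the spreadsheet. Here is a minimal sketch, assuming SciPy, of the paired and unpaired tests on the pulse data above:

# Sketch (assuming SciPy) of the pulse-rate comparison:
# matched-pairs (paired) versus unmatched (unpaired) t-test on the same data.
from scipy import stats

before = [105, 79, 79, 103, 87, 74, 73, 83]
after  = [109, 87, 86, 109, 100, 82, 80, 90]

paired   = stats.ttest_rel(before, after)   # matched pairs: same subject before/after
unpaired = stats.ttest_ind(before, after)   # ignores the pairing

print(f"paired:   P = {paired.pvalue:.6f}")    # ~0.00007, very highly significant
print(f"unpaired: P = {unpaired.pvalue:.3f}")  # ~0.23, not significant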
Writing a conclusion:
The results of the dependent (paired) t-test can be seen in
the results table. The value of t (t Stat) is -4.53744,
which we can round off to -4.54.
The probability of this result being due to chance can
be read from the table as 0.000291 (two-tail), which
means that this result is significant at the .0003 level.
We will set our alpha level at .05, so we will report
p < .05 rather than p = .0003.
We could also compare |t| with the critical (cut-off) value
of t from a printed table rather than the spreadsheet; the
output gives this as t Critical two-tail = 2.110, and since
4.54 exceeds it, we reach the same conclusion.
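These figures can be checked independently. Here is a minimal sketch, assuming SciPy and 17 degrees of freedom (inferred from the quoted critical value of 2.110; the slide does not state the sample size):

# Sketch (assuming SciPy; df = 17 is inferred from the quoted critical
# value of 2.110, as the slide does not give the sample size).
from scipy import stats

df = 17
t_stat = -4.53744

p_two_tail = 2 * stats.t.sf(abs(t_stat), df)   # two-tailed probability, ~0.0003
t_crit_two = stats.t.ppf(1 - 0.05 / 2, df)     # two-tailed cut-off at alpha = 0.05, ~2.11

print(f"two-tailed P = {p_two_tail:.6f}")
print(f"t critical   = {t_crit_two:.3f}")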
P varies from 0 (not likely) to 1 (certain). The higher
the probability, the more likely it is that the two
sets are the same, and that any differences are just
due to random chance. The lower the probability, the
more likely it is that the two sets are significantly
different, and that the differences are real.
Where do you draw the line between these two
conclusions? In biology the critical probability is
usually taken as 0.05 (or 5%). This may seem very
low, but it reflects the fact that biology experiments
are expected to produce quite varied results. So if P
> 0.05 we conclude that the two sets are not significantly
different, and if P < 0.05 we conclude that they are. For
the t test to work, the number of repeats should be as
large as possible, and certainly > 5.
In Excel the t test is performed using the formula:
=TTEST(range1, range2, tails, type). For the
examples you'll use in biology, tails is always 2
(for a "two-tailed" test), and type is 1 for matched
(paired) data or 2 for unmatched data, depending on
the circumstances.
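For instance, if the before and after pulse readings above were placed in cells A2:A9 and B2:B9 (hypothetical ranges), the matched-pairs test would be entered as =TTEST(A2:A9, B2:B9, 2, 1).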
Statistical hypotheses form the basis of statements
and conclusions that we can make about sets of data.
A hypothesis is a statement designed to be tested and
either accepted or rejected, such as "The sample means of two sets of
data are statistically the same and the samples come
from the same overall population."
Taking the example above comparing sample means, we
would define the hypothesis H:
"Two sets of data (1 and 2), with sample means m1
and m2, are both part of the same population, so that
their population means are equal, μ1 = μ2."
If we accept this hypothesis, we are saying that despite
the fact that the samples came from two different
measurements, they are part of the same overall
population or that the measurement is being made on
the same general system. If we reject the hypothesis, we
are saying the population means are different, and that
we are dealing with two separate systems.
We use statistical tests of significance to determine
whether to accept the hypothesis or not, and we choose
the test depending on whether we are comparing two or more
means, standard deviations, or variances. Two tests are
covered in this tutorial: the t-test and the F-test.
Other tests, such as Z-tests, χ2-tests, and Analysis of
Variance (ANOVA), are described in most statistics
textbooks.
First, we will see how to construct a hypothesis.
Referring to the above example of a comparison
between means, assume we want to analyze some soil
to determine its arsenic content and to see if it exceeds
the allowable amount. We have run a series of n = 7
tests on various soil samples and find that the mean
arsenic concentration is 4 ppm, with a standard
deviation of s = 0.9 ppm. If the allowable limit is 2 ppm
arsenic, we wish to construct a hypothesis to determine
whether the soil is indeed contaminated, or whether the
difference between the sample mean and the allowable
limit is due to random error.
There are two possibilities:
1. The true mean of the soil arsenic concentration μ is
greater than the allowable limit: μ > 2 ppm = μ0
2. The true mean of the soil arsenic concentration μ is
the same as or less than the allowable limit and any
deviation is due to random error: μ ≤ 2 ppm = μ0
To set up the hypothesis, we make what is called the
null hypothesis, which says there is no difference
between the means. We also set up an alternate
hypothesis, which is the hypothesis we adopt if the null
hypothesis is disproved.
Null Hypothesis:      H0: μ = μ0
Alternate Hypothesis: HA: μ > μ0
where μ0 = 2 ppm is the allowable limit.
If our statistical test gives us no reason to reject the
null hypothesis, we conclude that the means are equal and
the arsenic concentration in the soil is not above the
allowable limit.
If the test shows that the null hypothesis is false, then
we accept the alternate hypothesis, and can conclude
that the arsenic concentration in the soil is indeed above
the allowable limit.
The statistical test we would use in this case is the t-test,
which we will explore on the next page.
It is important to note that the hypothesis we just
established gives a one-tailed test, since we are asking
whether the true mean is "greater than" or "less than or
equal to" 2 ppm. It is also
possible to have a two-tailed test, where we would try to
establish "equal to" or "not equal to." This concept is
covered in the pages on confidence levels and one- and
two-tailed tests.
Evaluation of Means for small samples - The t-test
In the previous example, we set up a hypothesis to test
whether a sample mean was close to a population mean
or desired value for some soil samples containing
arsenic. On this page, we establish the statistical test to
determine whether the difference between the sample
mean and the population mean is significant. It is called
the t-test, and it is used for comparing sample means
when only the sample standard deviation is known (rather
than the population standard deviation).
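For a single sample compared against a reference value μ0, the statistic takes the familiar one-sample form t = (x̄ - μ0)/(s/√n), with n - 1 degrees of freedom. Here is a minimal sketch, assuming SciPy, applying it to the arsenic example above:

# Sketch (assuming SciPy) of the one-sample t-test for the arsenic example:
# n = 7, sample mean 4 ppm, s = 0.9 ppm, allowable limit mu0 = 2 ppm.
from math import sqrt
from scipy import stats

n, xbar, s, mu0 = 7, 4.0, 0.9, 2.0

t_calc = (xbar - mu0) / (s / sqrt(n))       # ~5.88
t_crit = stats.t.ppf(1 - 0.05, df=n - 1)    # one-tailed cut-off, ~1.94
p_one_tail = stats.t.sf(t_calc, df=n - 1)   # one-tailed probability

print(f"t = {t_calc:.2f}, critical = {t_crit:.2f}, P = {p_one_tail:.4f}")
# t exceeds the critical value (and P < 0.05), so the null hypothesis is
# rejected: the soil arsenic level is above the allowable limit.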
Parametric and Non-Parametric Tests
• Parametric Tests: Rely on theoretical
distributions of the test statistic under the null
hypothesis and on assumptions about the
distribution of the sample data (e.g., normality)
• Non-Parametric Tests: Referred to as
“Distribution Free” as they do not assume that
data are drawn from any particular distribution
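As an illustration, here is a minimal sketch, assuming SciPy and two invented samples, comparing a parametric test (the unpaired t-test) with a distribution-free counterpart (the Mann-Whitney U test):

# Sketch (assuming SciPy): a parametric test and a non-parametric,
# distribution-free counterpart applied to the same two invented samples.
from scipy import stats

group_a = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3]   # hypothetical measurements
group_b = [5.9, 6.1, 5.7, 6.0, 5.8, 6.2]

t_res = stats.ttest_ind(group_a, group_b)                              # assumes normality
u_res = stats.mannwhitneyu(group_a, group_b, alternative='two-sided')  # rank-based

print(f"t-test:       P = {t_res.pvalue:.4f}")
print(f"Mann-Whitney: P = {u_res.pvalue:.4f}")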