t-test

Essential Question:

How do scientists use statistical
analyses to draw meaningful
conclusions from experimental results?
Standard

Design, conduct and ANALYZE an
experimental study
Purpose of Statistical Analysis
 Determines whether the results found in an experiment are meaningful.
 Answers the question:
 Does the data gathered in an experiment support the research hypothesis or cause us to reject it?
Two Types of Statistics
 Descriptive statistics is the use of statistical and graphical techniques to present information about the data set being studied.
 Inferential statistics is the practice of drawing conclusions about a population from a sample considered to be representative of that population.

Example Study
A researcher performs a clinical trial (randomized,
controlled, double-blind experimental study) to
determine whether a new medication is effective
in lowering LDL cholesterol levels in patients with
hypercholesterolemia.
○ The researcher believes that the medication
will be more effective in lowering LDL
cholesterol levels in patients with
hypercholesterolemia than a placebo.
Step 1
Where does a researcher
begin analysis of the data
collected?
Create a Hypothesis
First: Identify the Hypothesis

When statistically analyzing experiments, it is
necessary to set up two hypotheses.
 The first hypothesis is called the null hypothesis.
 The second hypothesis is called the alternative
hypothesis.
○ Note: The alternative hypothesis should always be
set up before beginning the experiment.
Null Hypothesis

Null Hypothesis (H0): The starting point in
scientific research where the experimenter
assumes there is no effect of the treatment or no
relationship between two variables.
Example: The mean of group one’s data set is equal to the
mean of group two’s data set.

LDL experiment Null Hypothesis:
 There is no difference in LDL cholesterol levels between patients given the medication and patients given the placebo.
Alternative Hypothesis

Alternative Hypothesis (Ha): (Also called
research hypothesis) What the
experimenter thinks may be true (what will
happen) before beginning the experiment.

Two types of alternative hypotheses:
 Directional
 Nondirectional
Alternative Hypothesis

Directional Alternative Hypothesis: Hypothesis predicting that the mean (x̄) of one group will be more or less than the mean of the other group.
○ Example: The mean of group one’s data set is less than the mean of group two’s data set.

Non-directional Alternative Hypothesis: Hypothesis predicting that the mean of one group is not equal to the mean of the other group (without specifying whether it will be more or less).
○ Example: The mean of group one’s data set is not equal to the mean of group two’s data set.
Alternative Hypotheses for the LDL
Medication Experiment
 Directional Alternative Hypothesis:
 The LDL cholesterol levels will be lower in the patients given
the medication than the patients given the placebo.
 Non-Directional Alternative Hypothesis:
 The LDL cholesterol levels for the patients receiving the
medication will not equal the LDL cholesterol levels for the
patients receiving the placebo.
Researcher Gathers Data
Patients given the medication
Patient    LDL levels
A          159 mg/dL
B          169 mg/dL
C          133 mg/dL
D          126 mg/dL

Patients given the placebo
Patient    LDL levels
A          174 mg/dL
B          151 mg/dL
C          148 mg/dL
D          156 mg/dL
What does the researcher do now?
 Does this data support/reject the
alternative hypothesis?
 How can he/she be sure?

Descriptive Statistics

Begin with the descriptive statistics
 Measure the mean, median and mode
○ Know when to use which one
 Measure range, standard deviation and
variance
○ Understand degrees of freedom
Determine Typical Data Values
Determine what values are typical, or
normal, for the data collected (for the
sample).
○ Calculate the mean, variance, and
standard deviation.
Mean
Mean (x̄): The arithmetic average.
Example Calculation for the LDL
Medication Experiment
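The formula and worked calculation for this slide are not reproduced in the transcript. A reconstruction, assuming the usual definition of the sample mean and the LDL values listed for each group (the symbols are chosen here for illustration):

```latex
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}

\bar{x}_{\text{medication}} = \frac{159 + 169 + 133 + 126}{4} = \frac{587}{4} = 146.75

\bar{x}_{\text{placebo}} = \frac{174 + 151 + 148 + 156}{4} = \frac{629}{4} = 157.25
```

Both results agree with the group means quoted in the summary at the end of the presentation.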
Next Step?
Now that the mean has been calculated,
it is important to determine how spread
out the data is from the mean.
 Two measures of spread are variance and
standard deviation.
Variance
Variance: The measure of the spread of the data
about the mean. Variance is the average of the squared
deviations of the data points from the mean (for a sample,
the sum is divided by n - 1 rather than n).
The more spread out the data points are, the larger
the variance will be.
○ Step One: Calculate the deviation (or difference of the data
point from the mean) for each data point.
○ Step Two: Square these deviations to ensure that all values
are positive.
○ Step Three: Calculate the sum of all of these squared
deviations.
○ Step Four: Divide by the number of data points in the data set
minus one.
Calculating the Variance
Sample size (number of data points) = n
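The formula itself is missing from the transcript. Assuming the standard sample variance, which matches the four steps listed earlier (x_i are the data points and x̄ is the mean; the notation is an assumption):

```latex
s^2 = \frac{\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2}{n - 1}
```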
Variance Calculation for the LDL
Medication Experiment
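The worked calculation is not reproduced in the transcript. The four steps above can be sketched in Python as follows; the function name and the list names are illustrative, and the values are the LDL levels listed for each group.

```python
# LDL values per group, as listed in the data table (units omitted).
medication = [159, 169, 133, 126]
placebo = [174, 151, 148, 156]


def sample_variance(values):
    """Sample variance computed with the four steps described above."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]   # Step One: deviation of each point
    squared = [d ** 2 for d in deviations]    # Step Two: square the deviations
    total = sum(squared)                      # Step Three: sum the squares
    return total / (n - 1)                    # Step Four: divide by n - 1


print(round(sample_variance(medication), 2))  # 421.58
print(round(sample_variance(placebo), 2))     # 135.58
```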
Standard Deviation
Standard Deviation: The measure of the spread of
the data about the mean
When calculating the variance, you must square each
difference in order to obtain all positive numbers. This
results in large numbers.
 The standard deviation is the square root of the variance,
resulting in a number representative of the data set
because it is in the same scale and same unit of measure
as the original data points. This makes the spread easier to
interpret than the variance.

 Step One: Calculate the variance for the data set.
 Step Two: Take the square root of the variance.
Calculating the Standard Deviation
Example Calculation for the LDL
Medication Experiment
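The worked calculation is not reproduced in the transcript. As a minimal sketch, Python's standard statistics module gives the same two steps: statistics.variance divides by n - 1, and statistics.stdev is its square root. The list names are illustrative.

```python
import statistics

# LDL values per group, as listed in the data table (units omitted).
medication = [159, 169, 133, 126]
placebo = [174, 151, 148, 156]

for name, values in (("medication", medication), ("placebo", placebo)):
    variance = statistics.variance(values)  # Step One: sample variance (n - 1)
    std_dev = statistics.stdev(values)      # Step Two: square root of the variance
    print(name, round(variance, 2), round(std_dev, 2))
# medication 421.58 20.53
# placebo 135.58 11.64
```

The two standard deviations match the values quoted in the summary at the end of the presentation.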
Importance of Calculating the
Standard Deviation

The standard deviation summarizes how close the
data points are to the mean.

Helps check whether the data set is approximately normally distributed.
 The data set is consistent with a normal distribution if:
○ Approximately 68% of the data falls within one standard
deviation of the mean;
○ Approximately 95% of the data falls within two standard
deviations of the mean; and
○ About 99.7% of the data falls within three standard
deviations of the mean.
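A rough way to apply this check is to count how much of the data lies within one, two, and three standard deviations of the mean. The sketch below does this for the medication group; fraction_within is a hypothetical helper name, and with only four data points the result is far too coarse to judge normality, so it illustrates the mechanics only.

```python
import statistics

# LDL values for the medication group (units omitted).
medication = [159, 169, 133, 126]


def fraction_within(values, k):
    """Fraction of values lying within k sample standard deviations of the mean."""
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)
    inside = [x for x in values if abs(x - mean) <= k * std_dev]
    return len(inside) / len(values)


for k in (1, 2, 3):
    print(k, fraction_within(medication, k))
# 1 0.5
# 2 1.0
# 3 1.0
```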
If normally distributed:

Can confidently represent the data set
with the mean
If NOT normally distributed:

Must use either the median or the mode
to represent the data set due to outliers
and/or a confounding variable
Determining Statistical Significance
The results show that the mean LDL level of the subjects given the
experimental medication was lower than the mean LDL level of the
subjects given the placebo.
 How can the researcher determine whether
the difference between the experimental
group and control group in lowering blood
LDL levels was actually due to the
medication or due to chance?

 i.e., Were the results statistically significant?
Statistical Significance

A result is statistically significant when the test statistic
would be rare enough, if the null hypothesis were true, that the
null hypothesis can be comfortably rejected.

0.05 (5%) is the widely accepted
statistical significance level in biology
○ Meaning: if the null hypothesis were true, there would be a
probability of 5% or less of obtaining a test statistic at least
this extreme by chance.
How do we determine significant
differences between the means of
two sets of data?
t-test
t-tests
t-test: Type of statistical calculation
used to determine whether the
differences between the means of two
samples are statistically significant.
 Two main types of t-tests are commonly
used to analyze biomedical data:
○ Student’s t-test (i.e., independent t-test)
○ Paired t-test (i.e., dependent t-test)
Student’s t-test
Student’s t-test: Used to determine
whether the difference between the
means of two independent groups (both of
which are being tested for the same
dependent variable) is statistically
significant.
Example: Study set up where one group is
given the experimental treatment and
another group is given the placebo.
There are three variations of the same
formula for the Student’s t-test:
 The first variation should be used when the sample sizes are
unequal AND either one or both samples are small (n<30).
 The second variation should be used when the sample sizes are
equal (regardless of size).
 The third variation should be used when sample sizes are
unequal AND both sample sizes are large (n>30).
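The three formulas are not reproduced in the transcript. The forms below are the ones commonly taught for these three cases and are offered as an assumption; the originals may use different notation. Here x̄_1 and x̄_2 are the sample means, s_1^2 and s_2^2 the sample variances, and n_1 and n_2 the sample sizes (n when the sizes are equal).

```latex
% First variation: unequal sample sizes, one or both small (pooled variance)
t = \frac{\bar{x}_1 - \bar{x}_2}
         {\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}
                \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}

% Second variation: equal sample sizes
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2 + s_2^2}{n}}}

% Third variation: unequal sample sizes, both large
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
```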
Paired (dependent) t-test
Paired t-test: Used to determine
whether the difference between the
means of two sets of measurements (taken
from the same participants at two
different points in time) is statistically
significant.
Example: Study set up where the same
group of participants is followed before and
after an experimental treatment.
Formula for Paired t-test
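The formula is missing from the transcript. Assuming the standard paired t-test, where d_i is the difference between the two measurements for participant i, d̄ is the mean of those differences, s_d is their sample standard deviation, and n is the number of participants (notation chosen here for illustration):

```latex
t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad d_i = x_{i,\text{after}} - x_{i,\text{before}}
```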
Now Back to the LDL Medication
Experiment

Determine which type of t-test (Student’s
t-test or paired t-test) is most
appropriate for our LDL levels
experiment.
 Because the two groups being tested are
independent of each other (participants in
the experimental group and the control
group are different), the Student’s t-test is
the appropriate test to use.
Which variation of the formula
should we use?
 Both the experimental group and control
group contain four participants. Because the
sample sizes are equal, the second variation
should be used.
Calculations:
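The worked calculations are not reproduced in the transcript. A minimal sketch, assuming the equal-sample-size form t = (x̄_1 - x̄_2) / sqrt((s_1^2 + s_2^2) / n) and the LDL values listed for each group; the function name t_equal_n is illustrative.

```python
import math
import statistics

# LDL values per group, as listed in the data table (units omitted).
medication = [159, 169, 133, 126]
placebo = [174, 151, 148, 156]


def t_equal_n(group_a, group_b):
    """Independent-samples t value for two groups of equal size n."""
    if len(group_a) != len(group_b):
        raise ValueError("this form assumes equal sample sizes")
    n = len(group_a)
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    # Standard error of the difference, using the two sample variances.
    standard_error = math.sqrt(
        (statistics.variance(group_a) + statistics.variance(group_b)) / n
    )
    return mean_diff / standard_error


# Placebo minus medication, so a positive t means lower LDL on the medication.
t_value = t_equal_n(placebo, medication)
print(round(t_value, 2))  # 0.89
```

The result agrees with the t value of 0.89 reported in the summary.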
You’re Not Done Yet
What you just calculated is called the t
value.
 Next you will use a t-table in order to
determine whether your t value is statistically
significant.

T-Table
How to Use a T-Table
Step 1
Calculate the degrees of freedom for the
experiment.
Degrees of Freedom
Degrees of Freedom: The number of values in a
sample that are free to vary once the sample mean is
fixed. More degrees of freedom give more certainty that
the samples are representative of the population being
studied.
 To calculate the degrees of freedom for a
sample:
Total # data points in sample(s) - # populations being sampled
(Typically n-1)
Formulas for DF:
Calculating the Degrees of Freedom
for the LDL Medication Experiment
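The formulas and the worked calculation are missing from the transcript. Applying the rule stated above (total data points minus the number of populations sampled), the standard results are:

```latex
% Student's (independent) t-test: two samples
df = n_1 + n_2 - 2

% Paired t-test: one set of n paired differences
df = n - 1

% LDL medication experiment: two independent groups of four patients
df = 4 + 4 - 2 = 6
```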
How to Use a T-Table
Step 2
Find the row
corresponding with
the appropriate
degrees of freedom
for your experiment.
How to Use a T-Table
Step 3 - Determine whether the t value
exceeds any of the critical values in the
corresponding row.
 If the t value exceeds any of the critical values in
the row, the null hypothesis can be rejected in favor of
the alternative hypothesis. This means that the results ARE
STATISTICALLY SIGNIFICANT.
 If the t value is smaller than all of the critical
values in the row, the null hypothesis cannot be rejected,
and the data do not support the alternative hypothesis.
This means that the results are NOT
STATISTICALLY SIGNIFICANT.
How to Use a T-Table Step 4
If the t value exceeds any of the critical values
in the row, you need to:
1) Follow the corresponding row over to the right until
you locate the column with the critical value that is
just slightly smaller than the t value.
2) Follow this column to the top of the table and
determine its corresponding p value.
3) Determine which p value to use, the p value
corresponding with a one-tailed test for significance
versus the p value corresponding with a two-tailed
test for significance.
One-Tailed vs. Two-Tailed Test for
Significance

In order to determine whether you
completed a one-tailed or two-tailed test
for significance, look back to your
alternative hypothesis.
 If the alternative hypothesis was directional,
you have completed a one-tailed test for
significance.
 If the alternative hypothesis was
nondirectional, you have completed a two-tailed test for significance.
One-Tailed or Two-Tailed Test for
Significance?
The Alternative Hypothesis was as follows:
 The researcher hypothesizes that the
medication will be more effective in lowering
LDL cholesterol levels in patients with
hypercholesterolemia than the placebo.
 Since this is a directional alternative
hypothesis, you have completed a one-tailed test for significance.
What Does the p Value Mean?
The p value is the probability of obtaining a difference
between the means of the two samples at least as large as the
one observed if the null hypothesis were true, i.e., if only
chance were at work.
 If the p value ≤ 0.01, the results are VERY
SIGNIFICANT.
○ If only chance were at work, a difference this large
would occur 1% of the time or less.
 If the p value ≤ 0.05, the results are SIGNIFICANT.
○ If only chance were at work, a difference this large
would occur 5% of the time or less.
 If the p value > 0.05, the results are NOT
SIGNIFICANT.
○ If only chance were at work, a difference this large
would occur more than 5% of the time.
Using the T-Table for the LDL
Medication Experiment
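The table lookup for this slide is not reproduced in the transcript. The sketch below assumes 6 degrees of freedom (4 + 4 - 2) and a one-tailed test, and uses standard one-tailed critical values for that row. The table in the original slides may list different columns, and smallest_p_reached is a hypothetical helper name.

```python
# One-tailed critical t values for 6 degrees of freedom, keyed by p value.
# These are standard t-table entries, assumed here because the original
# table is not available.
CRITICAL_T_DF6_ONE_TAILED = {0.05: 1.943, 0.025: 2.447, 0.01: 3.143, 0.005: 3.707}


def smallest_p_reached(t_value, critical_values):
    """Return the smallest tabulated p whose critical value t exceeds, or None.

    t_value is assumed to be positive in the direction the alternative
    hypothesis predicts.
    """
    reached = [p for p, crit in critical_values.items() if t_value > crit]
    return min(reached) if reached else None


t_value = 0.89  # t value reported for the LDL experiment
p_bound = smallest_p_reached(t_value, CRITICAL_T_DF6_ONE_TAILED)
if p_bound is None:
    print("t is below every listed critical value: p > 0.05, not significant")
else:
    print(f"p <= {p_bound}: significant")
```

Because 0.89 is smaller than 1.943, the smallest critical value in the row, the p value is greater than 0.05.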
Putting It All Together
The group given the new medication had a
mean of 146.75 and a standard deviation of
20.53.
 The group given the placebo had a mean of
157.25 and a standard deviation of 11.64.
 The t value for this study was 0.89, with a p
value greater than 0.05. This means that the
results are NOT statistically significant.

 The researcher therefore cannot reject the null hypothesis:
the data do not show that the new medication lowered patients’
LDL levels more than the placebo.