Statistical Analysis
Purpose of Statistical Analysis
• Determines whether the results found
in an experiment are meaningful.
• Answers the question:
– Do the data provide enough evidence to
support the research hypothesis, or could the
results plausibly be due to chance?
Example Study
A researcher performs a clinical trial
(randomized, controlled, double-blind
experimental study) to determine whether a
new medication is effective in lowering LDL
cholesterol levels in patients with
hypercholesterolemia.
• The researcher hypothesizes that the
medication will be more effective in
lowering LDL cholesterol levels in patients
with hypercholesterolemia than a placebo.
First Step?
Where should this researcher
begin analysis of the data
collected?
First: Identify the Hypothesis
When statistically analyzing experiments,
it is necessary to set up two hypotheses.
The first hypothesis is called the null
hypothesis. The second hypothesis is
called the alternative hypothesis.
Note: The alternative hypothesis should be
set up before beginning the experiment.
Hypotheses
Null Hypothesis (H0): The starting point in
scientific research where the experimenter
assumes there is no effect of the treatment or no
relationship between two variables.
Example: The mean of group one is equal to the
mean of group two.
Alternative Hypothesis (Ha): (Also called
the research hypothesis) What the experimenter
thinks may be true, stated before beginning the
experiment.
Two types of alternative hypotheses:
Directional and Nondirectional
Alternative Hypothesis
Directional Alternative Hypothesis:
Hypothesis predicting that the mean ($\mu$) of one
group will be more or less than the mean of the
other group.
Example: $H_a: \mu_1 > \mu_2$ (or $H_a: \mu_1 < \mu_2$)
Nondirectional Alternative Hypothesis:
Hypothesis predicting that the mean of one
group is not equal to the mean of the other
group (without specifying whether it will be more
or less).
Example: $H_a: \mu_1 \neq \mu_2$
Identify the Hypotheses for the
LDL Medication Experiment
• Null Hypothesis: There is no difference in LDL
cholesterol levels between patients given the
medication and patients given the placebo.
• Alternative Hypothesis: The LDL cholesterol
levels will be lower in the patients given the
medication than the patients given the placebo.
– This is a Directional Alternative Hypothesis
Second: Determine Typical
Data Values
Determine what values are typical, or
normal, for the data collected.
• Calculate the mean, variance, and
standard deviation.
Mean
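Mean ($\bar{x}$): The arithmetic average, the sum of the data points divided by the number of data points: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.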
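A minimal Python sketch of the arithmetic mean described above. The readings are hypothetical values chosen for easy arithmetic; they are not the study's data, which the transcript does not include.

```python
def mean(values):
    """Arithmetic mean: the sum of the data points divided by how many there are."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)

# Hypothetical LDL readings, for illustration only (not from the study).
readings = [120.0, 135.0, 150.0, 155.0]
print(mean(readings))  # 140.0
```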
Example Calculation for the LDL
Medication Experiment
Next Step?
Now that the mean has been calculated, it
is important to determine how spread out
the data is from the mean.
– Two measures of spread are variance and
standard deviation.
Variance
Variance: A measure of the spread of the data
about the mean. Variance is essentially the average
squared deviation of the data points from the mean.
The more spread out the data points are, the larger
the variance will be.
• Step One: Calculate the deviation (the difference of the data
point from the mean) for each data point.
• Step Two: Square these deviations so that positive and negative
deviations do not cancel each other out.
• Step Three: Calculate the sum of all of these squared
deviations.
• Step Four: Divide by the number of data points in the data
set minus one. This gives the sample variance,
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$.
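The four steps can be sketched in Python as follows, one line per step. This is an illustrative implementation of the sample variance (dividing by n - 1), run on hypothetical readings rather than the study's data.

```python
def sample_variance(values):
    """Sample variance computed with the four steps above (divides by n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    m = sum(values) / n
    deviations = [x - m for x in values]   # Step One: deviation of each point from the mean
    squared = [d * d for d in deviations]  # Step Two: square each deviation
    total = sum(squared)                   # Step Three: sum the squared deviations
    return total / (n - 1)                 # Step Four: divide by n - 1

# Hypothetical readings (not from the study): mean 140, squared deviations sum to 750.
print(sample_variance([120.0, 135.0, 150.0, 155.0]))  # 250.0
```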
Calculating the Variance
Example Calculation for the LDL
Medication Experiment
Standard Deviation
Standard Deviation: The measure of the
spread of the data about the mean, which is the
square root of the variance.
– When calculating the variance, you must square each difference
so that positive and negative deviations do not cancel. As a result,
the variance is expressed in squared units, which makes it hard to
interpret directly.
– The standard deviation is the square root of the variance,
so it is on the same scale and in the same unit of measure as the original data
points. This makes it an easier measure of the spread to interpret
than the variance.
• Step One: Calculate the variance for the data set.
• Step Two: Take the square root of the variance.
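The two steps map directly onto Python's standard library: `statistics.variance` computes the sample variance (n - 1 denominator) and `math.sqrt` takes its square root. The readings are hypothetical, not the study's data.

```python
import math
import statistics

def sample_std_dev(values):
    """Standard deviation as the square root of the sample variance."""
    variance = statistics.variance(values)  # Step One: sample variance (n - 1 denominator)
    return math.sqrt(variance)              # Step Two: take the square root

# Hypothetical readings (not from the study); their sample variance is 250.
readings = [120.0, 135.0, 150.0, 155.0]
print(round(sample_std_dev(readings), 2))  # 15.81
```

The result matches `statistics.stdev(readings)`, which performs both steps at once.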
Calculating the Standard Deviation
Example Calculation for the LDL
Medication Experiment
Importance of Calculating the
Standard Deviation
• The standard deviation summarizes how close
the data points are to the mean. If the data set is
approximately normally distributed, then:
– Approximately 68% of the data falls within one
standard deviation of the mean;
– Approximately 95% of the data falls within two
standard deviations of the mean; and
– Approximately 99.7% of the data falls within three
standard deviations of the mean.
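The 68/95/99.7 percentages come from the normal distribution itself. This sketch recovers them with `statistics.NormalDist` from Python's standard library by measuring how much of a standard normal curve lies within k standard deviations of the mean.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def fraction_within(k):
    """Share of a normal distribution lying within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(fraction_within(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

These percentages describe the normal curve; a real data set only follows them to the extent that it is approximately normal.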
What Do The Results Show?
• The results show that the mean LDL level
of the subjects given the experimental
medication was lower than that of the
subjects given the placebo.
• How can the researcher determine whether the
difference between the experimental group and
control group in lowering blood LDL levels was
actually due to the medication or due to chance?
– That is, were the results statistically significant?
A type of statistical analysis
called a:
t-test
t-tests
t-test: Type of statistical calculation used
to determine whether the differences
between the means of two samples are
statistically significant.
– Two main types of t-tests are commonly used
to analyze biomedical data:
• Student’s t-test (i.e., independent t-test)
• Paired t-test (i.e., dependent t-test)
Student’s t-test
Student’s t-test: Used to determine
whether the difference between the means
of two independent groups (both of which are
being tested for the same dependent
variable) is statistically significant.
Example: Study set up where one group is
given the experimental treatment and another
group is given the placebo.
There are three variations of the same
formula for Student’s t-test:
– The first variation should be used when the sample sizes are
unequal AND either one or both samples are small (n < 30).
– The second variation should be used when the sample sizes are
equal (regardless of size).
– The third variation should be used when the sample sizes are
unequal AND both samples are large (n ≥ 30).
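The formulas for the three variations are not preserved in the slide text. The standard textbook forms that match these descriptions are sketched below; treating them as the presentation's "first", "second", and "third" variations is an assumption. Here $\bar{x}_i$, $s_i$, and $n_i$ are the mean, standard deviation, and size of group $i$.

```latex
% Variation 1: unequal sample sizes, at least one small (pooled variance)
t = \frac{\bar{x}_1 - \bar{x}_2}
         {\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}
                \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}

% Variation 2: equal sample sizes, n_1 = n_2 = n
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2 + s_2^2}{n}}}

% Variation 3: unequal sample sizes, both large
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
```

Setting $n_1 = n_2 = n$ in Variation 1 reduces it exactly to Variation 2, which is why equal group sizes need no pooling weights.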
Paired t-test
Paired t-test: Used to determine whether
the difference between the means of two
sets of measurements (taken from the same
participants at two different points in
time) is statistically significant.
Example: Study set up where the same group
of participants is followed before and after an
experimental treatment.
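Formula for Paired t-test
$t = \dfrac{\bar{d}}{s_d / \sqrt{n}}$, where $\bar{d}$ is the mean of the paired differences, $s_d$ is the standard deviation of those differences, and $n$ is the number of pairs.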
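A minimal Python sketch of the paired t-test: compute one difference per participant, then divide the mean difference by its standard error ($s_d/\sqrt{n}$). The before/after readings are hypothetical, and the function name `paired_t` is illustrative only.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic and its degrees of freedom (number of pairs minus one)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for b, a in zip(before, after)]  # positive = reading went down
    n = len(diffs)
    d_bar = statistics.mean(diffs)                   # mean of the differences
    s_d = statistics.stdev(diffs)                    # sample SD of the differences
    t = d_bar / (s_d / math.sqrt(n))
    return t, n - 1

# Hypothetical readings for four participants (not data from this study).
before = [160.0, 150.0, 170.0, 155.0]
after = [150.0, 145.0, 160.0, 150.0]
t, df = paired_t(before, after)
print(round(t, 2), df)  # 5.2 3
```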
Now Back to the LDL
Medication Experiment
• Determine which type of t-test (Student’s t-test or paired t-test) is most appropriate for
our LDL levels experiment.
– Because the two groups being tested are
independent of each other (participants in the
experimental group and the control group are
different), Student’s t-test is the
appropriate test to use.
Which variation of the formula
should we use?
– Both the experimental group and control
group contain four participants. Because the
sample sizes are equal, the second variation
should be used.
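Calculations:
$t = \dfrac{\bar{x}_{\text{placebo}} - \bar{x}_{\text{medication}}}{\sqrt{\dfrac{s_{\text{placebo}}^2 + s_{\text{medication}}^2}{n}}} = \dfrac{157.25 - 146.75}{\sqrt{\dfrac{11.64^2 + 20.53^2}{4}}} = \dfrac{10.50}{11.80} \approx 0.89$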
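The same calculation in Python, using the group means and standard deviations reported at the end of the presentation (n = 4 per group). The sketch assumes the second variation divides the difference in means by $\sqrt{(s_1^2 + s_2^2)/n}$; the function name `t_equal_n` is hypothetical.

```python
import math

def t_equal_n(mean_1, sd_1, mean_2, sd_2, n):
    """Student's t for two independent groups of equal size n (second variation)."""
    standard_error = math.sqrt((sd_1 ** 2 + sd_2 ** 2) / n)
    return (mean_1 - mean_2) / standard_error

# Summary statistics reported for the LDL medication experiment.
t = t_equal_n(mean_1=157.25, sd_1=11.64,   # placebo group
              mean_2=146.75, sd_2=20.53,   # medication group
              n=4)
print(round(t, 2))  # 0.89
```

Placing the placebo group first makes t positive when the medication group's mean is lower, which matches the direction of the alternative hypothesis.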
You’re Not Done Yet
• What you just calculated is called the t
value.
• Next you will use a t-table in order to
determine whether your t value is
statistically significant.
T-Table
How to Use a T-Table Step 1
Calculate the degrees of freedom for the
experiment.
Degrees of Freedom
Degrees of Freedom: The number of independent
values in the data that are free to vary once
sample statistics (such as the mean) have been
estimated.
– To calculate the degrees of freedom for a
sample:
Total # data points in sample(s) - # populations being sampled
Formulas:
– One sample, or paired samples: $df = n - 1$
– Two independent samples: $df = n_1 + n_2 - 2$
Calculating the Degrees of
Freedom for the LDL Medication
Experiment
$df = n_1 + n_2 - 2 = 4 + 4 - 2 = 6$
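The counting rule (total data points minus the number of populations sampled) can be sketched as two small Python helpers, one per study design. The function names are illustrative.

```python
def df_independent(n_1, n_2):
    """Two independent samples: all data points minus the two populations sampled."""
    return n_1 + n_2 - 2

def df_paired(n_pairs):
    """Paired design: the differences form one sample, so df = pairs minus one."""
    return n_pairs - 1

print(df_independent(4, 4))  # 6, as in the LDL medication experiment
print(df_paired(4))          # 3
```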
How to Use a T-Table Step 2
Find the row
corresponding with
the appropriate
degrees of freedom
for your experiment.
How to Use a T-Table Step 3
Determine whether the t value exceeds
any of the critical values in the
corresponding row.
– If the absolute t value exceeds the critical value for the
chosen significance level (commonly p = 0.05), the null
hypothesis can be rejected in favor of the alternative
hypothesis. This means that the results ARE
STATISTICALLY SIGNIFICANT.
– If the t value is smaller than that critical value, the
null hypothesis cannot be rejected: the data do not
provide enough evidence for the alternative hypothesis.
This means that the results are
NOT STATISTICALLY SIGNIFICANT.
How to Use a T-Table Step 4
If the t value exceeds any of the critical values in
the row, you need to:
1) Follow the corresponding row to the right until
you locate the column with the largest critical value
that is still smaller than the t value.
2) Follow this column to the top of the table and
determine its corresponding p value.
3) Determine which p value to use, the p value
corresponding with a one-tailed test for significance
versus the p value corresponding with a two-tailed
test for significance.
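Steps 3 and 4 amount to a lookup along one row of the table. The sketch below hard-codes the standard one-tailed critical values for 6 degrees of freedom (the row used in the LDL experiment); real t-tables differ in which p columns they print, so the row contents and the name `p_bound` are assumptions for illustration.

```python
# One row of a standard t-table (df = 6), keyed by one-tailed p value.
DF6_ONE_TAILED = {0.10: 1.440, 0.05: 1.943, 0.025: 2.447, 0.01: 3.143, 0.005: 3.707}

def p_bound(t_value, row, two_tailed=False):
    """Smallest tabled p whose critical value |t| reaches, or None if |t| reaches none."""
    reached = [p for p, critical in row.items() if abs(t_value) >= critical]
    if not reached:
        return None  # |t| is below every critical value: p exceeds the largest tabled p
    p = min(reached)  # column of the largest critical value that |t| still exceeds
    return 2 * p if two_tailed else p  # a two-tailed p is double the one-tailed p

print(p_bound(2.0, DF6_ONE_TAILED))        # 0.05
print(p_bound(2.0, DF6_ONE_TAILED, True))  # 0.1
print(p_bound(0.89, DF6_ONE_TAILED))       # None
```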
One-Tailed vs. Two-Tailed Test
for Significance
• In order to determine whether you
completed a one-tailed or two-tailed test
for significance, look back to your
alternative hypothesis.
– If the alternative hypothesis was directional,
you have completed a one-tailed test for
significance.
– If the alternative hypothesis was
nondirectional, you have completed a two-tailed test for significance.
One-Tailed or Two-Tailed Test
for Significance?
The Alternative Hypothesis was as follows:
– The researcher hypothesizes that the
medication will be more effective in lowering
LDL cholesterol levels in patients with
hypercholesterolemia than the placebo.
• Since this is a directional alternative
hypothesis, you have completed a one-tailed test for significance.
What Does the p Value Mean?
The p value is the probability of obtaining a
difference between the two sample means at least
as large as the one observed if the null hypothesis
were true (that is, if only chance were at work).
– If the p value ≤ 0.01, the results are VERY
SIGNIFICANT.
• If there were no real effect, a difference this large
would arise by chance 1% of the time or less.
– If the p value ≤ 0.05, the results are SIGNIFICANT.
• If there were no real effect, a difference this large
would arise by chance 5% of the time or less.
– If the p value > 0.05, the results are NOT
SIGNIFICANT.
• If there were no real effect, a difference this large
would arise by chance more than 5% of the time.
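The three thresholds translate into a small classification function. This is a sketch of the presentation's rule of thumb, not a universal standard; some fields use other cut-offs.

```python
def significance(p_value):
    """Label a p value using the 0.01 and 0.05 thresholds listed above."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p value must lie between 0 and 1")
    if p_value <= 0.01:
        return "very significant"
    if p_value <= 0.05:
        return "significant"
    return "not significant"

print(significance(0.03))  # significant
```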
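Using the T-Table for the LDL
Medication Experiment
• With 6 degrees of freedom, the one-tailed critical
value for p = 0.05 is 1.943. Because the t value of
0.89 is smaller than this critical value, p > 0.05.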
Putting It All Together
• The group given the new medication had a
mean of 146.75 and a standard deviation of
20.53.
• The group given the placebo had a mean of
157.25 and a standard deviation of 11.64.
• The t value for this study was 0.89, with a p
value greater than 0.05. This means that the
results are NOT statistically significant.
– Consequently, the researcher fails to reject the
null hypothesis: the study does not provide sufficient
evidence that the new medication lowered patients’ LDL
levels more than the placebo.