20. Introduction to Biostatistics – Part One
[Start of recorded material]
Introduction to Biostatistics
This module is an introduction to biostatistics. My name is Dr Melanie Bell and I am
the senior biostatistician for the Psycho-Oncology Cooperative Research Group
(PoCoG). PoCoG is one of many cooperative trials groups in Australia.
Outline
In this module on Introduction to Statistics, we will be covering types of data,
populations and samples, estimation, comparing groups, hypothesis tests and P values,
and power. Our objective is for you to learn about some basic principles of statistics so
that you can better understand cancer research.
Types of Data and Measures of Effect
Recall that the reason we do research is to answer a question. And the way to answer
the question is to gather data that either support or refute our hypotheses. There are
different types of data and therefore different ways that they are summarised and
compared. This comparison is the measure of effect: essentially, does the treatment
work? Is the exposure related to disease?
Additive Scale
Differences in means and risk differences are just that - differences in summary
values. They are on an additive scale because we are looking at differences. Our usual
hypothesis in these cases is whether these differences are zero or not.
Examples of Differences
Here are some examples of differences. The mean difference in anxiety between the
intervention and control groups was 3.6 points, on a 21-point scale. The risk of
clinical levels of anxiety for patients in the intervention group was 4 percentage
points lower than for patients in the control group.
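
As a minimal sketch of how these two additive measures are calculated, the Python below uses made-up numbers. The scores and counts are hypothetical and are not taken from any study mentioned in this module.

```python
from statistics import mean

# Hypothetical anxiety scores on a 0-21 scale (illustration only).
intervention_scores = [5, 7, 6, 9, 4, 8]
control_scores = [10, 11, 9, 12, 8, 13]

# Mean difference: control mean minus intervention mean.
mean_difference = mean(control_scores) - mean(intervention_scores)

# Hypothetical counts of patients with clinical levels of anxiety.
intervention_cases, intervention_total = 16, 100
control_cases, control_total = 20, 100

# Risk difference: the difference between the two proportions.
risk_difference = (control_cases / control_total
                   - intervention_cases / intervention_total)

print(f"Mean difference: {mean_difference:.1f} points")
print(f"Risk difference: {risk_difference:.0%}")
```

Both results are differences, so a value of zero would mean no effect.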
Multiplicative Scale
The odds ratio, relative risk and hazard ratio are on the multiplicative scale. For
example, the relative risk is the ratio of the risk of disease amongst the exposed as
compared to the risk of disease in the unexposed. The hazard ratio is used to compare
survival between intervention and control groups. Our usual hypothesis
in this case is whether the ratio is one or not.
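
The relative risk and odds ratio can be sketched from the four counts of a two-by-two table. The function names and the counts below are hypothetical, chosen only to illustrate the arithmetic.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk of disease in the exposed divided by risk in the unexposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


def odds_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Odds of disease in the exposed divided by odds in the unexposed."""
    odds_exposed = exposed_cases / (exposed_total - exposed_cases)
    odds_unexposed = unexposed_cases / (unexposed_total - unexposed_cases)
    return odds_exposed / odds_unexposed


# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed develop disease.
rr = relative_risk(30, 100, 10, 100)
odds = odds_ratio(30, 100, 10, 100)

print(f"Relative risk: {rr:.2f}")
print(f"Odds ratio:    {odds:.2f}")
```

A ratio of one means the two groups have the same risk (or odds), which is why one, rather than zero, is the usual null value on this scale.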
Multiplicative Scale: A Word of Caution
Be careful about multiplicative measures. It pays to know the base rate of risk. In the
above study, the odds of having a fatal pulmonary embolism, the so-called economy class
syndrome, are eight times greater after you’ve flown more than eight hours. But the
risk of having a pulmonary embolism, even after having flown for at least three hours, is
one in two million. The absolute risk is low.
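
To see why the base rate matters, here is a rough sketch that assumes, purely for illustration, a baseline risk of one in two million and a ratio of eight. For events this rare the odds ratio and the risk ratio are nearly the same, so the code treats the ratio as a risk ratio. That is an assumption of the sketch and not a statement about the study.

```python
# Assumed values for illustration only.
baseline_risk = 1 / 2_000_000   # assumed baseline risk of the event
risk_ratio = 8                  # assumed ratio (rare event: odds ratio ~ risk ratio)

elevated_risk = baseline_risk * risk_ratio
absolute_increase = elevated_risk - baseline_risk

per_million = 1_000_000
print(f"Baseline risk:     {baseline_risk * per_million:.1f} per million")
print(f"Elevated risk:     {elevated_risk * per_million:.1f} per million")
print(f"Absolute increase: {absolute_increase * per_million:.1f} per million")
```

Under these assumptions an eightfold increase still leaves the risk at only a few events per million people.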
Populations and Samples
Because we can’t sample everyone in the population, or put everyone with a disease
into an intervention study, we use samples. Statistical inference is the process of using
information from a sample to infer something about the population from which it was
drawn. We can use statistical inference for estimation and for comparing groups.
Example of Estimation
Here is an example of estimation. Suppose we wanted to know the quality of
life of Australian testicular cancer survivors. How would we answer this? What is the
population? What is the sample? Because we can’t ask every Australian man who has
had testicular cancer, we select a sample and ask them. The population is Australian
testicular cancer survivors and the sample is the men we actually ask about quality of
life.
From Population to Sample
If we did the study again with a different sample, we would not get exactly the same
results. We never know the true population value, Mu. But if our sample is
large enough and we have picked a representative sample (i.e. the sample is unbiased),
we will come pretty close with the sample mean, x-bar.
Variability
The population is variable, so the sample estimate (x-bar) will not be the same as the
true value (Mu). For example, we may sample 150 men and find the sample mean
quality of life (x-bar) equals 65 out of 100, while the true mean (Mu) may be 72. We
may do another study with 150 men and find the sample mean is 80, and another that gives
mean quality of life as 74. How do we quantify this variability?
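
A small simulation can illustrate this sample-to-sample variability. The sketch below assumes the true mean is 72, as in the example, and also assumes a population standard deviation of 15 and normally distributed scores. Those last two are hypothetical choices made only so that the code can run.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is repeatable

TRUE_MEAN = 72     # population mean (Mu) from the example above
TRUE_SD = 15       # assumed population standard deviation (hypothetical)
SAMPLE_SIZE = 150  # men per study, as in the example


def sample_mean(n):
    """Draw n quality-of-life scores and return their mean (x-bar)."""
    scores = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    return mean(scores)


# Repeat the "study" five times with a fresh sample each time.
sample_means = [sample_mean(SAMPLE_SIZE) for _ in range(5)]

for study, x_bar in enumerate(sample_means, start=1):
    print(f"Study {study}: sample mean = {x_bar:.1f}")
```

Each run of the study gives a different x-bar scattered around Mu. How widely the means scatter depends on the assumed standard deviation and on the sample size.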
Error
The sample mean quality of life in testicular cancer patients is an estimate of the true,
but unknowable, population mean. How far off we are is called the error and is made
up of systematic error, or bias, and random variation.
Systematic Error
Systematic error is minimised through good study design, including choosing a
representative sample. This helps to avoid selection bias.
Random Variation
Random variation is the realm of statistics.
Probability
There is always uncertainty in assessing a population characteristic using information
from a sample. This uncertainty is made up of between-participant variability,
within-participant variability, measurement error and other sources. We measure uncertainty
using probability.
Normal Distribution
The normal distribution is also referred to as the bell curve or the Gaussian distribution. It
is used extensively in statistics. The normal distribution is used to calculate the
probability of results. It is used for quantifying our uncertainty about the mean quality
of life. We use it to make 95% confidence intervals. A confidence interval is an
interval that we can be reasonably sure contains the true population parameter, Mu.
Remember that no one can know what the true population value, Mu, is, but we can
estimate it from a sample.
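
One common way to build a 95% confidence interval for a mean is x-bar plus or minus 1.96 standard errors, where 1.96 comes from the normal distribution. The sketch below uses that normal approximation with hypothetical scores. The function name is invented for illustration, and for small samples like this one a slightly larger multiplier from the t distribution is normally used instead.

```python
import math
from statistics import mean, stdev


def confidence_interval_95(values):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    n = len(values)
    x_bar = mean(values)
    standard_error = stdev(values) / math.sqrt(n)  # sample SD / sqrt(n)
    margin = 1.96 * standard_error
    return x_bar - margin, x_bar + margin


# Hypothetical quality-of-life scores out of 100 (illustration only).
scores = [62, 70, 75, 58, 81, 66, 73, 69, 77, 64]

lower, upper = confidence_interval_95(scores)
print(f"Sample mean: {mean(scores):.1f}")
print(f"Approximate 95% CI: {lower:.1f} to {upper:.1f}")
```

A larger sample gives a smaller standard error and therefore a narrower interval.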
End of Part One
[End of recorded material]