20. Introduction to Biostatistics – Part One

[Start of recorded material]

Introduction to Biostatistics
This module is an introduction to biostatistics. My name is Dr Melanie Bell and I am the senior biostatistician for the Psycho-Oncology Cooperative Research Group (PoCoG). PoCoG is one of many cooperative trials groups in Australia.

Outline
In this module on Introduction to Statistics, we will be covering types of data, populations and samples, estimation, comparing groups, hypothesis tests and P values, and power. Our objective is for you to learn about some basic principles of statistics so that you can better understand cancer research.

Types of Data and Measures of Effect
Recall that the reason we do research is to answer a question. And the way to answer the question is to gather data that either support or refute our hypotheses. There are different types of data and therefore different ways that they are summarised and compared. This comparison is the measure of effect: essentially, does the treatment work? Is the exposure related to disease?

Additive Scale
Differences in means and risk differences are just that: differences in summary values. They are on an additive scale because we are looking at differences. Our usual hypothesis in these cases is whether these differences are zero or not.

Examples of Differences
Here are some examples of differences. The mean difference in anxiety between the intervention and control groups was 3.6 points on a 21-point scale. The risk of clinical levels of anxiety for patients in the intervention group was 4% lower than for patients in the control group.

Multiplicative Scale
The odds ratio, relative risk and hazard ratio are on the multiplicative scale. For example, the relative risk is the ratio of the risk of disease amongst the exposed to the risk of disease amongst the unexposed. The hazard ratio is used to compare relative survival time between intervention and control groups.
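As a rough illustration of the additive and multiplicative scales described above, the sketch below computes a risk difference, a relative risk and an odds ratio from a two-by-two table. The counts and the function name `effect_measures` are hypothetical, chosen only for illustration; they are not data from any study mentioned in this module.

```python
def effect_measures(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Return the risk difference, relative risk and odds ratio for a 2x2 table."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    odds_exposed = exposed_cases / (exposed_total - exposed_cases)
    odds_unexposed = unexposed_cases / (unexposed_total - unexposed_cases)
    return {
        "risk_difference": risk_exposed - risk_unexposed,  # additive scale
        "relative_risk": risk_exposed / risk_unexposed,    # multiplicative scale
        "odds_ratio": odds_exposed / odds_unexposed,       # multiplicative scale
    }


# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed people develop the disease.
measures = effect_measures(30, 100, 10, 100)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

With these made-up counts the risk difference is 0.20, the relative risk is 3.00 and the odds ratio is about 3.86, so the same data look quite different depending on the scale used.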
For measures on the multiplicative scale, our usual hypothesis is whether the ratio is one or not.

Multiplicative Scale: A Word of Caution
Be careful about multiplicative measures. It pays to know the base rate of risk. In the above study, the odds of having a fatal pulmonary embolism, the so-called economy class syndrome, are eight times greater after you've flown more than eight hours. But the risk of having a pulmonary embolism, even after having flown for at least three hours, is one in two million. There is a low absolute risk.

Populations and Samples
Because we can't sample everyone in the population, or put everyone with a disease into an intervention study, we use samples. Statistical inference is the process of using information from a sample to infer something about the population from which it was drawn. We can use statistical inference for estimation and for comparing groups.

Example of Estimation
Here is an example of estimation. Suppose we wanted to know: what is the quality of life in Australian testicular cancer survivors? How would we answer this? What is the population? What is the sample? Because we can't ask every Australian man who has had testicular cancer, we select a sample and ask them. The population is Australian testicular cancer survivors and the sample is the men we actually ask about quality of life.

From Population to Sample
If we did the study again with a different sample, we would not get exactly the same results. We never know what the true population value, Mu, is. But if our sample is large enough and we have picked a representative sample, i.e. the sample is unbiased, we will come pretty close with the sample mean, x-bar.

Variability
The population is variable, so the sample estimate (x-bar) will not be the same as the true value (Mu). For example, we may sample 150 men and find the sample mean quality of life (x-bar) equals 65 out of 100, while the true mean (Mu) may be 72.
We may do another study with 150 men and find the sample mean is 80, and another that gives a mean quality of life of 74. How do we quantify this variability?

Error
The sample mean quality of life in testicular cancer patients is an estimate of the true, but unknowable, population mean. How far off we are is called the error, and it is made up of systematic error, or bias, and random variation.

Systematic Error
Systematic error is minimised through good study design, including choosing a representative sample. This helps to avoid selection bias.

Random Variation
Random variation is the realm of statistics.

Probability
There is always uncertainty in assessing a population characteristic using information from a sample. This uncertainty is made up of between-participant variability, within-participant variability, measurement error and other sources. We measure uncertainty using probability.

Normal Distribution
The normal distribution is also referred to as the bell curve or the Gaussian distribution. It is used extensively in statistics. The normal distribution is used to calculate the probability of results. It is used for quantifying our uncertainty about the mean quality of life. We use it to make 95% confidence intervals. A confidence interval is an interval that we can be reasonably sure contains the true population parameter, Mu. Remember that no-one can know what the true population value, Mu, is, but we can estimate it from a sample.

End of Part One

[End of recorded material]
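The ideas of sampling variability and 95% confidence intervals covered in this part can be sketched with a small simulation. This is only an illustration under stated assumptions: the population is simulated as normal with a true mean of 72 (the Mu used in the example above) and a standard deviation of 15, which is an invented value. Scores are not restricted to the 0 to 100 range, and the interval uses the common normal-approximation multiplier of 1.96. The function name `sample_mean_and_ci` is hypothetical.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

TRUE_MEAN = 72.0   # "Mu" in the lecture's example; unknown in a real study
TRUE_SD = 15.0     # assumed spread, chosen only for illustration
SAMPLE_SIZE = 150  # sample size used in the lecture's example


def sample_mean_and_ci(n):
    """Draw one sample and return (x_bar, lower, upper) for a 95% interval."""
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    x_bar = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / n ** 0.5
    margin = 1.96 * standard_error  # normal-approximation multiplier for 95%
    return x_bar, x_bar - margin, x_bar + margin


# Repeat the "study" a few times: each sample gives a different estimate.
for study in range(1, 4):
    x_bar, lower, upper = sample_mean_and_ci(SAMPLE_SIZE)
    print(f"study {study}: x-bar = {x_bar:.1f}, 95% CI = ({lower:.1f}, {upper:.1f})")

# Over many repeated studies, about 95% of the intervals should contain Mu.
results = [sample_mean_and_ci(SAMPLE_SIZE) for _ in range(2000)]
coverage = sum(lower <= TRUE_MEAN <= upper for _, lower, upper in results) / len(results)
print(f"coverage over {len(results)} simulated studies: {coverage:.3f}")
```

In a real study only one sample is available and Mu is never observed, so the coverage check is possible here only because the population is simulated.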