Statistics, Trial Design, Epidemiological Techniques
Performance Status and Co-morbidity
I Dukic, B Zelhof
North Western Urology Teaching, 8th July 2014

Overview
• Basic statistics and epidemiological statistics
• Measurement of performance status and co-morbidity
• Trial design

Vivas
1. What is evidence-based medicine? What are the problems with randomised studies?
2. How would you set up a phase III trial?
3. What do you understand by statistical significance and confidence intervals?

Basic Statistics
• 2x2 table - sensitivity, specificity
• Prevalence / incidence
• ROC
• Hypothesis testing
• P value
• Confidence interval
• RRR, ARR, CER, EER, NNT

Data types
Qualitative - categorical measurement, i.e. not a number
▪ Nominal: e.g. yes/no
▪ Ordinal: rank, e.g. most useful to least useful
Quantitative - numerical measurement
▪ Interval: e.g. time interval
▪ Ratio: e.g. height

Parametric vs Non-Parametric
Non-parametric tests are less powerful, so a parametric test should be used where possible, provided the following rules are fulfilled. The basic distinction between parametric and non-parametric is:
1. If the measurement scale is nominal or ordinal (qualitative) - non-parametric tests
2. If the measurement scale is interval or ratio (quantitative) - parametric tests
3. Other considerations include whether the data are normally distributed (required for parametric tests) and the relationship between the groups or variables being tested

Statistics
• Average of data points: mean, median, mode, weighted mean
• Measure of spread: centile, standard deviation, range
• How sure of the answer: P value, power calculation, CI etc.
• Skew

Mean = the average value, calculated by adding all the observations and dividing by the number of observations (parametric)
Mode = the most common (frequent) value (non-parametric)
Median = the middle value of the ordered list, often used when data are skewed (non-parametric)

2x2 tables
                 Patient HAS the disease        Patient does NOT have the disease
Positive test    Correct: True Positive (TP)    Wrong: False Positive (FP)
Negative test    Wrong: False Negative (FN)     Correct: True Negative (TN)

Sensitivity
Proportion of patients with the disease who are correctly identified by the test = TP / (TP + FN)

Specificity
Proportion of patients without the disease who are correctly identified by the test = TN / (TN + FP)

Positive Predictive Value - PPV
Proportion of patients with a positive test who have the disease = TP / (TP + FP)

Negative Predictive Value - NPV
Proportion of patients with a negative test who do not have the disease = TN / (TN + FN)

Relevance of Sensitivity + Specificity
• A highly specific test is unlikely to give a false positive: a +ve result should be regarded as a true +ve - SPIN (SPecific test, Positive result rules IN)
• A highly sensitive test rarely misses a condition: a -ve test is reassuring - SNOUT (SeNsitive test, Negative result rules OUT)

Type I + Type II Error
• Type I: rejecting the null hypothesis when it is in fact true - false positive
• Type II: not rejecting the null hypothesis when the alternative hypothesis is in fact true - false negative

Likelihood Ratio
The chance that a specified test result would be expected in a patient with the condition of interest, versus a patient without the condition.
Likelihood ratio for a positive test = Sensitivity / (1 - Specificity)

Receiver Operating Characteristics (ROC)
• Graph of the pairs of true positive rates (sensitivity) and false positive rates (100% - specificity)
• Assess if a test is useful
• Can compare two different tests
• Select the optimal cut-off value for a test

ROC
• It shows the trade-off between sensitivity and specificity (as the cut-off is moved, any increase in sensitivity will be accompanied by a decrease in specificity).
• The closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test.
• The closer the curve comes to the 45-degree diagonal of the ROC space, the less accurate the test.
• The area under the curve is a measure of test accuracy.

Importance of cut-off value on test performance
[Figure: ROC curves for optimal PSA range in patients aged 50 - 80; panels for 50 - 60 years, 60 - 70 years and 70 - 80 years.]
El-Gallery et al, Urology, 46:2000. 1995

Incidence + Prevalence
• Incidence - the proportion of new cases of a disease in the population at risk during a specified time interval. It is usual to define the disorder, the population and the time, and to report the incidence as a rate.
• Prevalence - the proportion of people in a population who have a disease at a point in time, or over some period of time.
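The 2x2 table measures above (sensitivity, specificity, PPV, NPV and the positive likelihood ratio) can be sketched in a few lines of Python. This is only an illustration: the class name TwoByTwo and the counts are hypothetical, invented for the example and not taken from any study in these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a 2x2 diagnostic table (hypothetical helper, not from the slides)."""
    tp: int  # positive test, patient HAS the disease
    fp: int  # positive test, patient does NOT have the disease
    fn: int  # negative test, patient HAS the disease
    tn: int  # negative test, patient does NOT have the disease

    def sensitivity(self) -> float:
        # Proportion of diseased patients correctly identified: TP / (TP + FN)
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of disease-free patients correctly identified: TN / (TN + FP)
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Proportion of positive tests that are true positives: TP / (TP + FP)
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Proportion of negative tests that are true negatives: TN / (TN + FN)
        return self.tn / (self.tn + self.fn)

    def positive_likelihood_ratio(self) -> float:
        # Sensitivity / (1 - Specificity); undefined when specificity is exactly 1
        return self.sensitivity() / (1 - self.specificity())


# Made-up counts for illustration only.
table = TwoByTwo(tp=90, fp=30, fn=10, tn=170)
print(f"Sensitivity: {table.sensitivity():.2f}")
print(f"Specificity: {table.specificity():.2f}")
print(f"PPV:         {table.ppv():.2f}")
print(f"NPV:         {table.npv():.2f}")
print(f"LR+:         {table.positive_likelihood_ratio():.1f}")
```

Sensitivity and specificity depend only on the columns of the table, whereas PPV and NPV depend on the rows, so the predictive values change with the prevalence of disease in the tested population.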
Hypothesis (significance) Testing
• Null hypothesis (H0) - the exposure / intervention being studied is not associated with the outcome of interest, e.g. the difference in means = 0
• Alternative hypothesis (H1) - holds if the null hypothesis is not true
• Two-tailed test - allows for a difference in means in either direction, e.g. smoking rates differ between men and women (men > women or women > men)
• One-tailed test - the direction of effect is specified in H1, e.g. a new drug cannot make things worse

P value
Viva question 3
• Hypothesis testing produces a p-value
• The p-value is the probability of obtaining results at least as extreme as ours by chance if the null hypothesis is true. It is not the probability that the null hypothesis is true or correct
• Allows assessment of whether findings are statistically significant or not statistically significant relative to a reference value
• P < 0.05: reject the null hypothesis (the smaller the p-value, the greater the evidence against the null hypothesis)
• P > 0.05: we do not reject the null hypothesis - this does not mean the null hypothesis is true

Confidence Interval
Viva question 3
• The range of plausible values for the "true" effect
• Generally 95% confidence is used
• It can be used to make a decision without providing an exact p-value: if the null value (e.g. a difference of 0) lies outside the 95% CI, then reject H0 (p < 0.05, no exact value)

What determines the width of the confidence interval?
1. Sample size - a larger sample size will give more precise results with a narrower CI.
2. Variability of the characteristic being studied - the less variable it is (between subjects, within subjects, measurement error etc.), the more precise the estimate.
3. The degree of confidence required (95%, 90%, 65%) - the more confidence required, the wider the interval.

Relative Risk / Risk Ratio
• Risk of an event / developing a disease relative to exposure.
It is the ratio of the probability of the event occurring in the exposed group to that in the non-exposed group.

               Outcome    No Outcome
Exposed        A          B
Not Exposed    C          D

RR = [A / (A + B)] / [C / (C + D)]
Risk ratio - ratio of risk in exposed / risk in unexposed

Relative Risk 2
Experimental Event Rate (EER) = A / (A + B)
Control Event Rate (CER) = C / (C + D)
Relative Risk = EER / CER

Relative Risk 3
• Suited to clinical trials
• A relative risk of 1 means there is no difference in risk between the two groups
• An RR of < 1 means the event is less likely to occur in the experimental group than in the control group
• An RR of > 1 means the event is more likely to occur in the experimental group than in the control group

Relative Risk Reduction
Absolute risk reduction = risk difference = CER - EER
Relative risk reduction = risk difference / baseline risk = (CER - EER) / CER

Worked Example Relative Risk
• 'PROSCAR more than halves the risk of developing acute urinary retention and the need for surgery'
• PLESS study

               Retention    No Retention    Total
Finasteride    42           1471            1513
Placebo        99           1404            1503

• EER = A / (A + B) = 42 / 1513 = 2.8%
• CER = C / (C + D) = 99 / 1503 = 6.6%
• RR = EER / CER = 2.8 / 6.6 = 0.42

Worked Example RRR
RRR = risk difference / baseline risk = (6.6 - 2.8) / 6.6 = 58%

Worked Example ARR
Absolute Risk Reduction = Control event rate - Experimental event rate = CER - EER = 6.6% - 2.8% = 3.8%

Worked Example NNT
NNT = 1 / ARR = 1 / 0.038 = 26

Statistics in PLESS study
• 'PROSCAR more than halves the risk of developing acute urinary retention and the need for surgery'
• Retention in control = 6.6%
• Retention in treatment = 2.8%
• Relative risk reduction = 58%
• Absolute risk reduction = 3.8%
• NNT = 26

Combination therapy NNT
                       MTOPS    CombAT
Progress clinically    37       18
Prevent AUR            147      22
Prevent surgery        52       18

McConnell, J.D. et al., 2003. The Long-Term Effect of Doxazosin, Finasteride, and Combination Therapy on the Clinical Progression of Benign Prostatic Hyperplasia. New England Journal of Medicine, 349(25), pp.2387-2398.
Roehrborn, C.G. et al., 2010. The Effects of Combination Therapy with Dutasteride and Tamsulosin on Clinical Outcomes in Men with Symptomatic Benign Prostatic Hyperplasia: 4-Year Results from the CombAT Study. European Urology, 57(1), pp.123-131.

Why is NNT important?
• It takes into account the underlying frequency of the outcome (which RRR does not)
• The ideal NNT is 1, where everyone improves with treatment and no-one improves with control
• The higher the NNT, the less effective the treatment
• NNTs are only one element of decision making and need to be integrated with:
▪ the patient's underlying risk
▪ patient preferences
▪ caregiver experience and judgment
▪ local constraints and conditions

Common NNTs in Urology
Appropriate antibiotic in UTI                    1
Treatment for stone passage in ureteric colic    4
Intraurethral alprostadil for ED                 2.3
Compression stockings for post-op DVT            9
Aspirin/streptokinase after MI                   20
Finasteride to prevent retention                 26
Aspirin after MI                                 40

Odds Ratio
• Way of comparing whether the probability of an event is the same for 2 groups
• OR = 1 - equally likely in both groups
• OR > 1 - more likely in first group
• OR < 1 - less likely in first group

               Outcome    No Outcome
Exposed        A          B
Not Exposed    C          D

OR = (A / B) / (C / D) = AD / BC

Summary of effect measures
Each entry gives: measure of effect (abbreviation), description, value with no effect, value with total success.
• Absolute Risk Reduction (ARR) - absolute change in risk: risk of event in control group minus risk of event in treated group. No effect: ARR = 0%. Total success: ARR = initial risk.
• Relative Risk Reduction (RRR) - proportion of risk removed by treatment: ARR divided by initial risk in control group. No effect: RRR = 0%. Total success: RRR = 100%.
• Relative Risk (RR) - risk of event in treated group divided by risk of event in control group. No effect: RR = 1 or 100%. Total success: RR = 0.
• Odds Ratio (OR) - odds of event in treated group divided by odds of event in control group. No effect: OR = 1. Total success: OR = 0.
• Number Needed to Treat (NNT) - number of patients needing treatment to prevent one event: reciprocal of ARR. No effect: NNT = ∞. Total success: NNT = 1/initial risk.

Other Random Statistical Stuff
• Population - should be clearly defined
• Sample - every individual in the population from which it is drawn must have an equal chance of being included
• Chi-square test (also chi-squared test or χ2 test) - any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true, or in which this is asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be made to approximate a chi-squared distribution as closely as desired by making the sample size large enough.
• Mann-Whitney U test (also called the Mann-Whitney-Wilcoxon (MWW) or Wilcoxon rank-sum test) - a non-parametric statistical hypothesis test for assessing whether two independent samples of observations have equally large values

Levels of evidence
1a - Evidence obtained from meta-analysis of randomized studies
1b - Evidence obtained from at least one randomized trial
2a - Evidence obtained from a well-designed controlled study without randomization
2b - Evidence obtained from at least one other type of well-designed cohort or case-control study
3 - Evidence obtained from well-designed non-experimental studies such as comparative studies, correlation studies and case reports
4 - Evidence obtained from expert opinion

Controlled trials
Randomized controlled trial (RCT): a specific type of scientific experiment in which participants are randomly allocated to one intervention or another. It is the gold standard for a clinical trial. RCTs are often used to test the efficacy of various types of intervention within a patient population.
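Results from trials of this kind are usually reported with the effect measures defined earlier (EER, CER, RR, ARR, RRR, NNT, OR). As a minimal sketch, assuming Python and a hypothetical helper named effect_measures (not part of the original slides), the PLESS counts from the worked example can be run through the same formulas:

```python
def effect_measures(a: int, b: int, c: int, d: int) -> dict:
    """Effect measures from a 2x2 outcome table (hypothetical helper).

    a, b: experimental group with / without the outcome
    c, d: control group with / without the outcome
    Assumes b and c are non-zero so the odds ratio is defined.
    """
    eer = a / (a + b)   # experimental event rate
    cer = c / (c + d)   # control event rate
    arr = cer - eer     # absolute risk reduction (risk difference)
    return {
        "EER": eer,
        "CER": cer,
        "RR": eer / cer,                               # relative risk
        "ARR": arr,
        "RRR": arr / cer,                              # relative risk reduction
        "NNT": float("inf") if arr == 0 else 1 / arr,  # number needed to treat
        "OR": (a * d) / (b * c),                       # odds ratio AD / BC
    }


# Counts quoted in the worked example: 42 of 1513 (EER) and 99 of 1503 (CER).
pless = effect_measures(a=42, b=1471, c=99, d=1404)
for name, value in pless.items():
    print(f"{name}: {value:.3f}")
```

Computed from the unrounded rates, the NNT comes out slightly above 26. The slides' figure of 26 follows from using the rounded ARR of 3.8%.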
• Double blinded
• Single blinded
Non-randomized trial

Observational studies
Cohort study (longitudinal study)
• One way of getting around the problem of the small proportion of people with the disease of interest is the cohort study
• Following a group of people (i.e. the cohort) over time and observing whether they develop disease
• Generally concerned with the aetiology of disease rather than treatment

Case-control study
• Another solution to the problem of the small number of people with the disease of interest
• Patients with a particular condition are matched with controls
• Generally concerned with the aetiology of disease rather than treatment

Cross-sectional study
• Cross-sectional studies involve data collected at a defined time.
• They are often used to assess the prevalence of acute or chronic conditions, or to answer questions about the causes of disease or the results of medical intervention.
• They may also be described as censuses.

Phases of studies
Consists of 4 phases. If drugs pass phases 1-3, they are usually approved by regulatory bodies.
• Phase 1: Screening for safety - the experimental drug or treatment is given to a small group of subjects (20-80) for the first time to evaluate its safety, determine a safe dosage range and identify side effects.
• Phase 2: Establishing the testing protocol - the experimental treatment is given to a larger group (100-300) to see if it is effective and to further evaluate its safety.
• Phase 3: Final testing - the treatment is given to a large group of subjects (1000-3000) to confirm its effectiveness, monitor side effects, compare it to commonly used treatments and collect information that will allow it to be used safely.
• Phase 4: Post-approval (post-marketing) studies, including the treatment's risks, benefits and optimal use.

Parametric statistical tests
Compare the difference between normally distributed data sets
• Analysis of variance (ANOVA) - used to compare the means of two or more samples to see whether they come from the same population, testing the null hypothesis
• t-test - compares two samples
• χ2 (chi-squared) - a measure of the difference between actual and expected frequencies (usually based on a null hypothesis); alternatives include Fisher's exact test (small numbers) and the Mantel-Haenszel test for comparing multiple two-way tables

Non-parametric tests
Data not normally distributed; examples include:
• Mann-Whitney U
• Wilcoxon rank test
• Kruskal-Wallis
• Friedman

Resources
• Medical stats made easy (2003)
• Notes on statistics for medical students
• Statistics at Square One - BMJ publishing (free online)
• How to read a paper
• How to read a paper: Statistics for the non-statistician
• Clinicians guide to statistics for medical practice and research

Measurement of Performance Status and Comorbidity
K Moore
Presented by I Dukic 2014

Performance Status
Definition: scales and criteria used by doctors and researchers to assess how a patient's disease is progressing, assess how the disease affects the daily living abilities of the patient, and determine appropriate treatment and prognosis.

Assessment of performance status
Various scoring systems:
• Zubrod / WHO / ECOG
• Karnofsky
• Lansky - children

Eastern Cooperative Oncology Group
• ECOG was established in 1955 as one of the first cooperative groups launched to perform multi-center cancer clinical trials.
• ECOG has evolved from a five-member consortium of institutions on the East Coast to one of the largest clinical cancer research organizations in the U.S.
ECOG PERFORMANCE STATUS
0 - Fully active, able to carry on all pre-disease performance without restriction
1 - Restricted in physically strenuous activity but ambulatory and able to carry out work of a light or sedentary nature, e.g. light house work, office work
2 - Ambulatory and capable of all self-care but unable to carry out any work activities. Up and about more than 50% of waking hours
3 - Capable of only limited self-care, confined to bed or chair more than 50% of waking hours
4 - Completely disabled. Cannot carry on any self-care. Totally confined to bed or chair
5 - Dead
Oken et al. Toxicity and Response Criteria of the Eastern Cooperative Oncology Group. Am J Clin Oncol 5:649-655, 1982

WHO PERFORMANCE STATUS
0 - you are fully active and more or less as you were before your illness
1 - you cannot carry out heavy physical work, but can do anything else
2 - you are up and about more than half the day; you can look after yourself, but are not well enough to work
3 - you are in bed or sitting in a chair for more than half the day; you need some help in looking after yourself
4 - you are in bed or a chair all the time and need a lot of looking after

KARNOFSKY PERFORMANCE STATUS
100 - you don't have any evidence of disease and feel well
90 - you only have minor signs or symptoms but are able to carry on as normal
80 - you have some signs or symptoms and it takes a bit of effort to carry on as normal
70 - you are able to care for yourself but unable to carry on with normal activities/active work
60 - you need help from time to time but can mostly care for yourself
50 - you need quite a lot of help to care for yourself
40 - you always need help to care for yourself
30 - you are disabled and may need to stay in hospital
20 - you are sick, in hospital and need a lot of treatment
10 - you are very sick and unlikely to recover

Comparing ECOG with KARNOFSKY
ECOG    Performance                                                          KARNOFSKY
0       Fully active                                                         90-100
1       Not able to do strenuous work but otherwise OK                       70-80
2       Capable of self-care only. Up and about > 50% of waking hours        50-60
3       Limited self-care only. In bed or chair > 50% of waking hours        30-40
4       Completely disabled. No self-care possible. Confined to bed/chair    10-20

Comorbidity
• Either the presence of one or more diseases in addition to a primary disease, or the effect of such additional disorders or diseases
• Many tests attempt to standardise the weight or value of comorbid conditions
• They attempt to consolidate each individual comorbid condition into a single, predictive variable that measures mortality or other outcomes
• Researchers have validated such tests because of their predictive value, but no one test is as yet recognized as a standard

Comorbidity Index
13 different methods identified and critically reviewed - 1 disease count, 12 indexes
• Charlson Index - most extensively studied
• Cumulative Illness Rating Scale (CIRS) - addresses body systems without specific diagnoses
• Index of Coexisting Disease (ICED) - 2D, measures disease severity and disability
• Kaplan - for use in diabetes
• Insufficient data on others
De Groot V et al. How to measure comorbidity: a critical review of available methods. J Clin Epid 2003; 56: 221-229.

Charlson Comorbidity Index
• The Charlson index is the most extensively studied and seems to be the method of choice in urology.
• 22 diseases in the index, selected and weighted on the basis of the strength of their association with mortality
• Validated as a predictor of short-term and long-term mortality
• Studied in Prostate, Renal, Bladder
• Charlson score divided into 3 levels: Low (0), Medium (1-2), High (3 or more)
• Each 1 point increase in CCI score leads to a 2.3-fold increase in the relative risk of death at 12 months
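The two banding rules in this part of the talk, the ECOG versus Karnofsky comparison and the three Charlson levels, are simple lookups. The sketch below is only illustrative: the function names are hypothetical, the Karnofsky bands follow the comparison table (taking the lowest band as 10-20, since the scale moves in steps of 10), and the Charlson medium band is read as 1-2 because a score of 0 is already assigned to low.

```python
def karnofsky_to_ecog(karnofsky: int) -> int:
    """Map a Karnofsky score (10-100, in steps of 10) to the ECOG grade
    shown alongside it in the comparison table. Hypothetical helper."""
    if karnofsky not in range(10, 101, 10):
        raise ValueError("Karnofsky score must be 10, 20, ..., 100")
    if karnofsky >= 90:
        return 0  # fully active
    if karnofsky >= 70:
        return 1  # not able to do strenuous work but otherwise OK
    if karnofsky >= 50:
        return 2  # capable of self-care only
    if karnofsky >= 30:
        return 3  # limited self-care only
    return 4      # completely disabled


def charlson_level(score: int) -> str:
    """Band a Charlson Comorbidity Index score into low / medium / high.
    Assumes the medium band is 1-2 (see lead-in)."""
    if score < 0:
        raise ValueError("Charlson score cannot be negative")
    if score == 0:
        return "low"
    if score <= 2:
        return "medium"
    return "high"


# Example: a patient with Karnofsky 70 and a Charlson score of 3.
print(karnofsky_to_ecog(70), charlson_level(3))
```

The mapping only restates the comparison table; it is not a validated conversion between the two scales.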
Data from 1987
[Figure: Survival for the same level of comorbidity over time (CaP)]
[Figure: Survival for different levels of comorbidity (RCC)]
[Diagram lettered D-O-C-T-O-R, with labels: Tumour Factors, Patient Factors, Performance Status, Comorbidity Index, Life Tables, Treatment, Patient Preference, Observation]