Back to Basics, 2017
POPULATION HEALTH
Session 1: Critical Appraisal, Epidemiology Methods, Biostatistics
Ian McDowell, School of Public Health
[email protected]
(Based on slides by Dr. N. Birkett)
March 2017 1

THE PLAN (2)
• 4-hour session. Aim to spend about 3 hours on lecture
– Review MCQs in the remaining time
• A 10-minute break about half-way through
• Please interrupt for questions, etc. if things aren’t clear.
– The goal is to help you, not to cover a fixed curriculum.
March 2017 2

LMCC Objective 78.2 (abbreviated)
78.2: ASSESSING AND MEASURING HEALTH STATUS AT THE POPULATION LEVEL
• Describe the health status of a defined population.
• Measure and record the factors that affect the health status of a population with respect to the principles of causation
– Critically evaluate possible sources of data …
– Know how to access and collect health information to describe the health of a population …
– Analyze population health data using appropriate measures
– Interpret and present the analysis of health status indicators
– Apply the principles of epidemiology by accurately discussing the implications of the measures …
http://mcc.ca/wp-content/uploads/Qualifying-examination-objectives.pdf (see page 208)
March 2017 3

Structure
• Epidemiology & biostatistics methods are the basic science that underpins medical & health services research
• The MCC does not test basic sciences, but questions cover their application in practice. The emphasis is on critical appraisal of information.
• Your daily clinical application is via Evidence-Based Medicine, so this offers one way to structure this presentation.
March 2017 4

Content Map
Study Designs • Sampling • EBM: Causal criteria • Measurements • Critical appraisal, PICO, UpToDate • Biostatistics • Data analyses • Errors & validity
March 2017 5

EBM judgment criteria
ARE THE STUDY RESULTS VALID?
Based on:
• Choice of study design
• Sampling (representativeness, chance errors)
• Quality of study execution
• Validity of measures (biases)
• Strength of associations found
• Control for confounding factors
March 2017 6

EBM Domains
• Etiology, or identifying harm
– Highlights causal criteria, strength of association, observational study designs
• Therapy
– Focus on experimental designs & execution
• Diagnosis
– Highlights validity of measures
• Prognosis
– Sampling, adjustment, likelihood estimates
March 2017 7

1. Descriptive Studies
The ED seems to be seeing more patients every year. Is this a true trend, or just a chance fluctuation?
March 2017 8

Descriptive & Analytic studies
• Observational research;
• Count things (prevalence, incidence);
• Perhaps test associations (i.e., an analytic study)
– E.g., a different prevalence among men & women = an association between sex & the disease.
March 2017 9

Basic terms
• Prevalence:
– The probability that a person has the outcome of interest today. Counts existing cases of disease. Useful for measuring the burden of illness: how big is the problem? How many hospital beds will we require?
• Incidence:
– The probability (chance) that someone without the outcome will develop it over a fixed period of time. Relates to new cases of disease. Useful for studying causes of illness.
March 2017 10

Prevalence
• Back in July 2010, 140 graduates from the U. of Ottawa medical school started working as R1’s.
• Of this group, 100 had insomnia on June 30.
• Therefore, the prevalence of insomnia was: 100/140 = 0.71 = 71%
March 2017 11

Incidence Proportion (risk)
• In July 2010, 140 graduates from the U. of Ottawa medical school start working as R1’s.
• Over the next year, 30 develop a stomach ulcer.
• Therefore, the incidence proportion (risk) of an ulcer in the first year post-graduation is: 30/140 = 0.21 = 21%
March 2017 12

Incidence Rate (1)
• The incidence rate is the ‘speed’ with which people get ill. Rates involve time.
• Everyone dies (eventually).
Many consider it desirable to delay dying: the death rate is lower.
• Compute with a person-time denominator: PT = # people × duration of follow-up
• Incidence Rate = # new cases / person-time of follow-up
March 2017 13

Incidence rate (2)
• 140 U. of Ottawa medical students were followed during their residency
– 50 did 2 years of residency
– 90 did 4 years of residency
– Person-time = 50 × 2 + 90 × 4 = 460 PY’s
• During follow-up, 30 developed stress ulcers.
• The incidence rate of stress ulcers is: IR = 30/460 = 0.065 cases/PY = 65 cases/1,000 PYs
March 2017 14

Prevalence & incidence
• As long as conditions are ‘stable’ and the disease is fairly rare, we have this relationship: P ≈ I × d
That is, Prevalence ≈ Incidence rate × average disease duration
March 2017 15

Sampling – Glossary
• Target population: the population we would like to describe or generalise results to
– E.g., all patients with lung cancer in Ontario, not just those you studied at the Civic hospital in Ottawa.
• Study population: the group from which we draw the sample (Civic hospital)
– The study population may not be equivalent to the target population, so a random sample that is representative of the study population may not be representative of the target population
• Sampling frame: the list of elements in the study population (all lung cancer patients at the Civic)
• Probability samples: the probability of being included in the sample is known for each unit
– Important for adjusting estimates for different parts of the population.
March 2017 16

Diagram: inferences are drawn from the Sample to the Study population (lung cancer patients in the Civic hospital), and from there to the Target population (lung cancer patients in Ottawa).
March 2017 17

Glossary – continued
• Simple random sampling (SRS): each person has the same (& non-zero) probability of being selected
• Systematic sample: every nth individual
• Convenience sample: e.g., patients attending the out-patient clinic this morning
• Stratified sample: reduce sampling error by sampling within sub-groups on your list (e.g., by age-group).
• Cluster sample: for efficiency, instead of randomly sampling from all over the city, first choose certain study populations only (e.g., schools), then sample more intensively from them. Sampling unit = the school.
• Multistage sample: e.g., randomly select cities, then hospitals in those cities, then patients in the selected hospitals.
March 2017 18

Sampling error
• Random samples are the most likely to give unbiased estimates of the study population parameters. But:
• Random samples may differ from the study population due to chance, so results based on them may be ‘off-target’ (i.e., random error: variation in a measurement due to chance)
• Random sampling error, or standard error = square root (variance / sample size)
• Standard errors are used in the calculation of 95% confidence intervals.
March 2017 19

Advantages of a Simple Random Sample
• Many statistical tests assume that you are using SRS
• Random samples with good response rates may be less susceptible to selection bias
• Random sampling may (hopefully!) produce a representative sample.
March 2017 20

Disadvantages of a Simple Random Sample
• Requires a complete list of the population
• Random samples may have low response rates, resulting in selection bias
– E.g., people may not understand why they were chosen: “You were chosen by chance” is not very appealing!
• May still result in a non-representative sample, particularly with small sample sizes or low response rates.
March 2017 21

Biostatistics (1): Descriptive Statistics
These describe data results: average values & spread of values
March 2017 22

MCQ
Your colleague shows you a study result quoting a mean value of “4.6 +/- 2.1”. You discuss the meaning of this.
– It shows the standard deviation
– It shows the variance around the mean
– It shows the standard error of the mean
– It shows the confidence level
– GOK
March 2017 23

‘Central Tendency’ & Dispersion
• Mean:
– The average value. Measures the ‘centre’ of the data. Will be roughly in the middle.
• Median:
– The middle value: 50% above and 50% below. Used when data are skewed.
• Variance:
– A measure of how spread out the data are.
– Defined by subtracting the mean from each observation, squaring, adding them all up and dividing by the number of observations.
March 2017 24

Variance & Standard Deviation
• Standard deviation:
– The square root of the variance.
March 2017 25

March 2017 26

Multiple Studies
• EBM is not just based on the results of one study of some individuals;
• It is based (ideally) on results accumulated over many studies
• There may be some variation in these results
• So we need to summarize several studies …
March 2017 27

Standard Error
• The standard deviation looks at the variation of data in individuals
• But we often repeat studies. Each produces a mean value, and these mean values may vary somewhat.
– What is the distribution of these means?
– It will be ‘normal’ (‘Gaussian’, the ‘Bell curve’)
– The mean of the means will be the same as the population mean
– But the variance of the means will be smaller than the population variance
• This is the Standard Error (of the mean): SE = sd / √n, where n is the sample size
March 2017 28

Estimation
• Usually we study a sample of people to estimate ‘parameters’ for a broader population.
– The ‘population’ could refer to all Canadians, or everyone of a certain group, or to all patients with a particular disease, etc.
• The sample used in estimation ideally should be randomly selected.
• The accuracy of the resulting estimates is described by ‘inferential statistics’.
March 2017 29

Confidence Intervals
– A range of numbers which tells us where the best estimate of the correct answer (or parameter) lies.
• For a 95% confidence interval, we are 95% sure that the true value lies inside the interval.
– Usually computed as: mean ± twice the standard error
March 2017 30

Example of Confidence Interval
• If the sample mean is 80 with a standard deviation of 20, and the sample size is 25, then the standard error is 20/√25 = 4, and:
– We can be 95% confident that the true mean lies within the range: 80 ± (2 × 4) = (72, 88).
March 2017 31

Example of Confidence Interval
• If the sample size were 100, then:
– the 95% confidence interval is: 80 ± (2 × 2) = (76, 84).
– Increasing the sample size makes our estimate of the parameter more precise.
March 2017 32

Analytic Study Designs
March 2017 33

MCQ
A patient (who regrettably did not attend B2B) asks you to explain why a randomised trial is considered so superior to other designs. Which of the following is the best answer?
– The randomization ensures a representative sample
– The RCT approaches a true experimental design
– It achieves better control of confounders
– The prospective design allows for more complete follow-up
– The results permit both relative and absolute analyses.
March 2017 34

Back to EBM
• The Etiology, Harm & Therapy domains of EBM all involve studies designed to find causal relationships.
• This requires analytic studies – examining relationships between presumed causal factors and health states.
• Study designs can be observational or (ideally) experimental.
– The designs vary in terms of how well they can address the causal criteria.
March 2017 35

Causal Criteria
• Temporal sequence – A strong criterion, but when did the disease begin?
• Strength of association – Depends on how other factors are controlled in the analysis
• Biological gradient (dose-response) – OK, but there may be a threshold
• Specificity of association – An OK criterion for ID, but not for obesity, smoking, etc.
• Consistency across studies – Good, unless the relationship applies only to a minority of people
• Biological rationale? – Good if we have a theory.
• Cessation of exposure – Great, if the pathology is reversible
March 2017 36

Cohort studies (1)
1.
Select non-diseased subjects based on their exposure status
• Main method used:
• Select a group of people with the exposure of interest
• Select a group of people without the exposure
• Can also simply select a group of people without the disease and study a range of exposures.
2. Follow the group to determine what happens to them.
March 2017 37

Cohort studies (2)
3. Compare the incidence of the disease in exposed and unexposed people
• If exposure increases risk, incidence will be higher in exposed subjects than unexposed subjects
• Compute a relative risk (risk ratio).
4. The Framingham Study is the standard example.
March 2017 38

Diagram: the study begins with an exposed group and an unexposed group; over time, each group divides into the outcomes ‘disease’ and ‘no disease’.
March 2017 39

Cohort studies (4)

RISK RATIO           Disease
                   YES      NO
Exp   YES           42      80     122
      NO            43     302     345
                    85     382     467

Risk in exposed = 42/122 = 0.344; risk in non-exposed = 43/345 = 0.125.
If exposure increases risk, you would expect 42/122 to be larger than 43/345. How much larger can be assessed by the ratio of one to the other:
Risk Ratio (RR) = Risk in exposed / Risk in unexposed = (42/122) / (43/345) = 0.344 / 0.125 = 2.76
March 2017 40

Cohort studies (5)
• To avoid long follow-up, can use a historical cohort study design.
• Recruit subjects sometime in the past
• Usually use administrative records to record exposures
• Follow up to the present & record outcomes
• Can continue to follow into the future.
March 2017 41

Cohort studies (6)
• Example: cancer in Gulf War Veterans
• Exposure took place in 1991-2
• The study is conducted in 2016
• Identify soldiers deployed to the Persian Gulf in 1991
• Identify soldiers not deployed to the Persian Gulf in 1991
• Compare the incidence of cancer in each group from 1991 to 2010
March 2017 42

Case-control studies (1)
• Select subjects based on their final outcome.
– Select a group of people with the outcome or disease (cases)
– Select a group of people without the outcome (controls)
– Ask them about past exposures (or get these from records)
– Compare the frequency of exposure in the two groups
• If exposure increases risk, the odds of exposure in the cases should be higher than the odds in the controls
– Compute an Odds Ratio
– Under many conditions, OR ≈ RR
March 2017 43

Diagram: the study begins by selecting subjects based on outcome – a disease group (cases) and a no-disease group (controls) – then reviewing records to classify each subject as exposed or unexposed.
March 2017 44

Case-control studies (3)

ODDS RATIO           Disease?
                   YES      NO
Exp?  YES           42      18
      NO            43      67
                    85      85     170

Odds of exposure in cases = 42/43; odds of exposure in controls = 18/67.
If exposure increases risk, cases should be more likely to have been exposed than controls. The odds of exposure for cases would be higher (42/43 > 18/67). This can be assessed by the ratio of one to the other:
Odds Ratio (OR) = Exposure odds in cases / Exposure odds in controls = (42/43) / (18/67) = (42 × 67) / (43 × 18) = 3.64
The OR is often called the ‘cross-product ratio’.
March 2017 45

Randomized Controlled Trials
Basically a cohort study where the researcher decides which exposure (e.g., treatment) the subjects get.
– Recruit a group of people meeting pre-specified eligibility criteria.
– Randomly assign some subjects (usually 50% of them) to get the control treatment and the rest to get the experimental treatment.
– Follow up the subjects to determine the risk of the outcome in both groups.
– Compute a relative risk or otherwise compare the groups.
March 2017 47

RCTs (2)
Some key design features:
– Intention-to-treat vs. per-protocol analysis
– Allocation concealment
• The person randomizing should not know the next treatment allocation
– Blinding (masking)
• Of the Patient, Treatment team, Outcome assessor, Statistician
– Monitoring committee
• Early termination rules
March 2017 48

RCTs (3)
Some technical & ethical challenges:
• Equipoise
– There must be no clear advantage of the treatment, or it’s unethical to withhold it from controls
– But, if you’re not confident of superiority, why do a trial?
• Often highly selective study samples
– Generalizable?
• Contamination
– Control group members get the new treatment
• Co-intervention
– Some people get treatments other than those under study.
March 2017 49

RCT: analysis of outcomes
• Relative Risk: RR = incidence in treatment group / incidence in control group
• Absolute Risk Reduction: ARR = incidence in control group − incidence in treatment group (= attributable risk)
• Relative Risk Reduction: RRR = ARR / incidence in control group = 1 − RR
March 2017 50

RCTs – Options for Analysis

                  Treatment   Control
Asthma attack         15         25
No attack             35         25
Total                 50         50
Incidence            0.30       0.50

Relative Risk = 0.3 / 0.5 = 0.6
Absolute Risk Reduction = 0.5 − 0.3 = 0.2
Relative Risk Reduction = 0.2 / 0.5 = 0.4 = 40%
Number Needed to Treat = 1/ARR = 1 / 0.2 = 5
March 2017 51

78.2: CRITICAL APPRAISAL
Hierarchy of evidence (highest to lowest quality, approximately):
• Meta-analyses & Systematic reviews
• Experimental (Randomized trial)
• Prospective Cohort
• Historical Cohort
• Case-Control
• Cross-sectional
• Ecological (for individual-level exposures)
• Case report/series
• Expert opinion
March 2017 52

Confounding
Interpreting associations and distinguishing causal influences
(Note: Standardization is covered in the Assessing & Measuring Health session)
March 2017 53

MCQ
An anxious patient brings you an article that
studied an association between alcohol consumption and cancer of the mouth. The authors stated that the causal link was confounded by smoking. Which of the following represents the best explanation of what this may mean?
– It was a mistaken result due to a flawed analysis.
– Key criteria for a causal link were not met.
– Smokers gave unreliable information on their drinking.
– Adjusting for smoking removes the association.
– Smoking and drinking mutually interact in causing cancer.
March 2017 54

Confounding: an example
• Does drinking alcohol cause oral cancer?
– A case-control study found an OR of 3.4 (95% CI: 2.1 – 4.8)
• BUT, the effect of alcohol may be ‘mixed up’ with the effect of smoking (‘confusion’ en français).
• A confounder is an extraneous factor which is associated with both exposure and outcome, and is not an intermediate step in the causal pathway.
– Smoking causes mouth cancer;
– Heavy drinkers tend to be heavy smokers;
– Smoking is not part of the causal pathway for alcohol.
March 2017 55

The Confounding Triangle: is there a causal association between alcohol and oral cancer, or does smoking (linked to both) explain it?
March 2017 56

Confounding
So, does alcohol drinking cause oral cancer?
• Run analyses for smokers separately from non-smokers:
– Among smokers, we find: OR = 1.3 (95% CI: 0.92 – 1.83)
– Among non-smokers, we find: OR = 1.1 (95% CI: 0.8 – 1.7).
– Neither is significant: the crude association was likely confounded by smoking!
• Logistic regression is commonly used to adjust for multiple confounders.
March 2017 57

TIME FOR A BREAK!
March 2017 58

Biostatistics (2): Inferential Statistics & significance testing
How likely is it that the result in this study accurately reflects what is going on in the broader population?
March 2017 59

MCQ
A colleague is reading an article and asks you to clarify the meaning of ‘inferential statistics’.
1. Statistics that show which conclusion is most likely to be correct
2. Mathematics that estimates the likelihood of a chance finding
3. An analysis that demonstrates a significant correlation
4.
Results that meet criteria for causation
5. Analyses that involve more than two variables.
March 2017 60

MCQ
A patient shows you a study evaluating the medication you recently prescribed for her. It shows a p-value of 0.05. She asks you to explain this statistic.
1. The p-value summarizes the magnitude of the benefit of the treatment.
2. It demonstrates a significant benefit of the therapy.
3. The study was small; a larger one would have provided a better p-value.
4. It shows the probability that there is no benefit of the treatment.
5. It shows that one can never be certain that a given treatment will work.
March 2017 61

BIOSTATISTICS: Inferential Statistics
• Describing things is fine, but limited.
• We want to compare different groups to see if they differ more than you might expect by chance alone:
– New drug treatments compared to old ones
– Exposure to pollutants and risk of cancer.
• Inferential statistics makes this possible – it requires a good study design to avoid bias.
March 2017 62

Experimental logic: “I cured the patient”
• Start with a theory: ‘A magnetic personality cures people’.
• We cannot prove a theory, but we can disprove predictions it makes, which casts doubt on the theory.
• So, set up a Null Hypothesis (hoping secretly to disprove this): “The patients are no better after seeing Dr Gauss.”
• Generate some data.
• Check to see if the results are consistent with the null hypothesis.
– If the result is ‘unlikely’, then reject the null hypothesis.
• Statistics just puts a mathematical overlay on this approach.
March 2017 63

Hypothesis Testing (1)
Used to compare two or more groups.
• The general process of hypothesis testing:
1. We first assume that the two groups have the same outcome results = the null hypothesis (H0)
2. Generate some data
March 2017 64

Hypothesis Testing (2)
3. From the data, compute some number (a ‘statistic’) that reflects any difference between the groups
4. Under the null hypothesis (H0), this value should be zero.
4.
Compare the value you get to ‘0’.
• If the difference is ‘too large’, we can conclude that our assumption (the null hypothesis) is unlikely to be true
• So, reject the null hypothesis.
March 2017 65

Hypothesis Testing (3)
5. We quantify the extent of our discomfort with the null hypothesis through the significance level or p-value.
– ‘p’ = the probability that the difference we found could occur if the null hypothesis really is true.
– Reject H0 if the p-value is ‘too small’
• What is ‘too small’?
– Arbitrary – tradition sets it at < 0.05 (a 5% chance).
March 2017 66

Hypothesis Testing (4)
• Defining the p-value
– The probability that we reject the null hypothesis (conclude it is wrong) when it is really right (a false positive).
• Calculation of the p-value
– Assuming that the null hypothesis is true,
– what is the probability that our statistic would be at least as big as what we actually got?
• It is not the probability that the groups are different
• We can never prove that the null hypothesis is true
• We either reject or fail to reject the null hypothesis
March 2017 67

Example of significance test (1)
• Is there an association between sex and smoking?
– 35 of 100 men smoke, but only 20 of 100 women smoke
• Usually present the data in a 2x2 table:

           Smoke   Don’t smoke
Men          35        65        100
Women        20        80        100
             55       145        200

• Compare the observed numbers to what we would have expected under the null hypothesis.
March 2017 68

Example of significance test (2)
• Null Hypothesis
– There is no effect of sex on the probability that a person is a smoker.
• Calculate a chi-square value (the statistic): χ² = 5.64
March 2017 69

Example of significance test (3)
– If there is no effect of sex on smoking (the null hypothesis), a chi-square value as large as 5.64 would occur only 1.8% of the time: p = 0.018
– This gives moderately strong evidence to reject the null hypothesis
– We would conclude that smoking prevalence differs by sex.
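The smoking example can be reproduced in a few lines of Python. This is a sketch under my own naming (`chi2_2x2` is not from the slides); it uses the standard shortcut formula for the Pearson chi-square of a 2x2 table, and since there is one degree of freedom, it gets the p-value from the complementary error function.

```python
from math import erfc, sqrt

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]], with the df = 1 p-value."""
    n = a + b + c + d
    # Shortcut formula, equivalent to summing (observed - expected)^2 / expected
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p = erfc(sqrt(chi2 / 2))
    return chi2, p

# Men: 35 smokers, 65 non-smokers; women: 20 smokers, 80 non-smokers
chi2, p = chi2_2x2(35, 65, 20, 80)
print(round(chi2, 2), round(p, 3))  # 5.64 0.018
```

Comparing 5.64 to the critical value 3.84 (the chi-square cut-off for p = 0.05 with 1 df) leads to the same conclusion without computing p exactly.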
March 2017 70

Example of significance test (4)
• Instead of computing the p-value, could compare your statistic to the ‘critical value’
– The value of the chi-square which gives p = 0.05 is 3.84
– Since 5.64 > 3.84, we conclude that p < 0.05
• This doesn’t tell us what the probability actually is
– just that the observed statistic is rarer than our cut-off value
March 2017 71

Examples of significance tests (5)
• Common methods used are:
– T-test
– Z-test
– Chi-square test
– ANOVA
• The approach can be extended through the use of regression models to relate several independent variables to one dependent variable:
– Linear regression
– Logistic regression
– Cox models
March 2017 72

Back to Hypothesis Testing … (5)
• p-values are key for interpreting hypothesis tests.
• But they are being down-played
– The modern approach is to present the 95% confidence interval of the treatment effect rather than a p-value
– This gives an estimate of the range of potential benefits.
• Now, we need to get to statistical power.
• So, a bit more stuff and some more terms (sorry).
March 2017 73

Hypothesis Testing (6)
• Hypothesis tests can get things right or wrong
• Two types of errors can occur:
– Type 1 error (aka alpha)
– Type 2 error (aka beta)
• p-value
– Essentially the alpha value
• Power
– Related to the type 2 error (beta)
March 2017 74

Hypothesis Testing (7)
• p-value:
• The chance you will say there is a difference between groups (e.g., the new drug is better) when there really is NO difference = the risk of an alpha error
March 2017 75

Hypothesis testing (8)

                             Actual situation
                        No effect          Effect
Stats analysis:
  No effect            (No error)       Type 2 error (β)
  Effect            Type 1 error (α)      (No error)
March 2017 76

Hypothesis Testing (9)
• Statistical Power:
– It’s easy to show that a drug reduces BP by 40 mmHg
– Hard to show that it reduces BP by 1 mmHg
– A study is more likely to ‘miss’ the small effect than the large effect.
– Statistical Power is:
• The chance the study will show a difference between groups when there really is a difference of a given amount.
• Basically, this is 1 − β
– Power depends on how big a difference you consider to be important
March 2017 77

How to improve your power?
• Increase the sample size
• Improve the precision of the measurement tools used (reduces the standard deviation)
• Use better statistical methods
• Use better designs
• Reduce bias
• Set a bigger difference you wish to find.
March 2017 78

Study Measurements
(See EBM Diagnosis and Prognosis studies)
March 2017 79

Measurement Core Concepts (1)
• Random Variation (chance):
– Every time we measure something, errors will occur.
• Any sample will include people with values different from the real value, just by chance.
• These are random factors which affect the precision (SD) of our data but not necessarily the validity when studying a large group of people.
– Statistics and bigger sample sizes can help here.
March 2017 80

Measurement Core Concepts (2)
• Bias:
– A systematic factor which causes two groups to differ.
• A study uses a two-section scale to measure height
• The scale was incorrectly assembled (with a gap between the upper and lower sections).
• It over-estimates height for everyone (a bias).
– Bigger numbers and statistics don’t help much; you need good measurements & a good design instead.
March 2017 81

Reliability
• = reproducibility. Does it produce the same result repeatedly? (A reliable car starts every time.)
• Related to chance error
• Random errors average out in the long run
• But in patient care you hope to do a test only once
– Therefore, you need a reliable test
March 2017 82

Validity
• Whether a test measures what it purports to measure
– Is a disease present (or absent)?
– How often is the test result correct?
– What interpretation can you place on the test result?
• Normally use criterion validity
– Compare the test result to a gold standard
• See the SIM web page on validity
March 2017 83

Reliability and Validity
Target shooting as a metaphor: tightly grouped shots = high reliability; shots centred on the bull’s-eye = high validity. A test can be reliable without being valid, and vice versa.
March 2017 84

Test Properties – Validity (1)

              Diseased     Not diseased
Test +ve       90 (TP)        5 (FP)        95
Test -ve       10 (FN)       95 (TN)       105
              100            100           200
March 2017 85

Test Properties (2)
Same table: Sensitivity = 90/100 = 0.90; Specificity = 95/100 = 0.95
March 2017 86

2x2 Table for Testing a Test

                    Gold standard
                Disease present   Disease absent
Test Positive       a (TP)            b (FP)
Test Negative       c (FN)            d (TN)

Sensitivity = a/(a+c); Specificity = d/(b+d)
March 2017 87

Test Properties (3)
• Sensitivity = Pr(test positive in a person with the disease)
• Specificity = Pr(test negative in a person without the disease)
• Range: 0 to 1
– > 0.9: Excellent
– 0.8–0.9: Not bad
– 0.7–0.8: So-so
– < 0.7: Poor
March 2017 88

Test Properties (4)
• Sensitivity and Specificity
– Values depend on the cutoff point between normal/abnormal
– Generally, high sensitivity is associated with lower specificity and vice-versa.
– Not affected by prevalence, if the ‘case-mix’ is constant
• Do you want a test to have high sensitivity or high specificity?
– Depends on the cost of ‘false positive’ and ‘false negative’ errors
– PKU: one false negative is a disaster
– Ottawa Ankle Rules: insisted on a sensitivity of 1.00
– Incarceration: high specificity
March 2017 89

Overall test performance: the ROC
• Test scores for sick and well populations almost always overlap.
• Change the cut-off point:
• As sensitivity goes up, specificity will go down
• Graph the sensitivity (true positive rate) against 1 − specificity (false positive rate) for each cut-point
• = the Receiver Operating Characteristic curve
• Area under the curve = a measure of overall test quality: 0.9 = excellent, 0.8 = good, 0.5 = useless.
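As a quick check on the definitions above, here is a minimal Python sketch (the function name is mine, not from the slides) that computes sensitivity and specificity from the 2x2 counts used in the slides:

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Column-wise test properties from a 2x2 table against a gold standard."""
    sensitivity = tp / (tp + fn)  # Pr(test positive | diseased)
    specificity = tn / (tn + fp)  # Pr(test negative | not diseased)
    return sensitivity, specificity

# The slides' table: 90 TP, 5 FP, 10 FN, 95 TN
sens, spec = sensitivity_specificity(tp=90, fp=5, fn=10, tn=95)
print(sens, spec)  # 0.9 0.95
```

Moving the cut-off point shifts counts between these cells, which is exactly the sensitivity/specificity trade-off that the ROC curve traces.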
March 2017 90

Ruling in & out
• “SpPIn”
– A Positive result on a specific test rules the Dx in
– High specificity means that the test identifies only this particular disease (it’s very choosy), so a positive score rules in.
• “SnNOut”
– A Negative result on a sensitive test rules it out
– High sensitivity means it would most likely find the disease if present, so you can rely on a negative result to exclude the Dx.
March 2017 91

Test Properties (5)
• Sens & spec are not directly useful to the clinician: you see the test result, but don’t know if it’s a true or false result.
• Patients don’t ask:
– “If I’ve got the disease, how likely is it that the test will be positive?”
• They ask:
– “My test is positive. Does that mean I have the disease?” → Predictive values.
March 2017 92

Predictive Values
• Based on the rows (TP FP / FN TN) of the table, not the columns
– PPV interprets a positive test (% of positives that are TP)
– NPV interprets a negative test (% of negatives that are TN)
• Shows the probability that the patient has (or does not have) the disease, based on the test result
– Immediately useful to clinician & patient.
March 2017 93

Test Properties (9)

              Diseased     Not diseased
Test +ve       90 (TP)        5 (FP)        95
Test -ve       10 (FN)       95 (TN)       105
              100            100           200

PPV = 90/95 = 0.95; NPV = 95/105 = 0.90
March 2017 94

2x2 Table for Testing a Test

                    Gold standard
                Disease present   Disease absent
Test +              a (TP)            b (FP)
Test -              c (FN)            d (TN)

PPV = a/(a+b); NPV = d/(c+d)
March 2017 95

Test Properties (10)

              Diseased     Not diseased
Test +ve        90              5            95
Test -ve        10             95           105
               100            100           200

PPV = 0.95. What happens if the disease is less common?
March 2017 96

Test Properties (11)

              Diseased     Not diseased
Test +ve         9              5            14
Test -ve         1             95            96
                10            100           110

PPV = 9/14 = 0.64. What happens if the disease is less common?
Increasing numbers of false positives.
March 2017 97

Prevalence and Predictive Values
• When the disease gets rarer, more people end up in the ‘no disease’ group than in the ‘disease’ group
• Therefore, for a given sensitivity & specificity:
– the # of true positives goes down
– the # of false positives goes up
• PPV goes down
• Conversely, NPV goes up.
March 2017 98

Prevalence and Predictive Values: A Dilemma
• The predictive values of a test depend on the pre-test prevalence of the disease
• Prevalence is lower in non-tertiary care settings.
• Most tests are developed and evaluated in tertiary care settings.
• PPV will therefore be lower in non-tertiary care settings.
March 2017 99

Prevalence and Predictive Values
• So, how do you determine how useful a test will be in a different patient setting?
• The process is often called ‘calibrating’ a test
– It relies on the stability of sensitivity & specificity across populations.
– You need to know (or guess) the prevalence in the new setting.
– It allows you to estimate what the PPV and NPV would be in the new setting.
March 2017 100

Methods for Calibrating a Test
Four methods:
1. Apply the test + gold standard to a consecutive series of patients from the new population
• Rarely feasible (especially during an exam)
2. Hypothetical table [simple & feasible]
• Assume the new population has (e.g.) 10,000 people
• Fill in the cells based on the prevalence, sensitivity and specificity [next slide]
3. Likelihood Ratios & a Nomogram
• Only useful if you have access to the nomogram
4.
Bayes’ Theorem (calculates Likelihood Ratios) March 2017 101 Calibration by hypothetical table • Pretend you can do a new study in your patient population • Assume a practice size • 10,000 makes the numbers nice • Figure out how many patients with disease there would be (prevalence) • Figure what test results you would expect to see • Compute PPV March 2017 102 Calibration by hypothetical table Fill cells in following order: “Truth” Test +ve Test -ve Disease Present Disease Absent Total PV 4th 7th 8th 10th 6th 9th 11th 5th Sensitivity Specificity Total March 2017 2nd 3rd Pre-test Prevalence 10,000 103 Test Properties (11) Tertiary care: Prev=0.050 Diseased Test +ve Test -ve March 2017 Not diseased 450 25 50 475 500 500 Sens = 0.90 Spec = 0.95 475 PPV = 0.89 525 1,000 104 Test Properties (12) Primary care: Prev=0.01 Diseased Test +ve Test -ve 585 90 495 10 9,405 9,415 100 9,900 10,000 Sens = 0.90 March 2017 Not diseased PPV = 0.1538 Spec = 0.95 105 Calibration using Likelihood Ratios • The Likelihood Ratio is the probability of a given test result in a patient with the disease, divided by the probability of the same finding in a patient without the disease. • Consider the following table (from a research study) – How do the ‘odds’ of having the disease change once you get a positive test result? March 2017 106 Likelihood Ratios Diseased Not diseased 90 5 95 Test +ve Test - 10 ve 100 95 105 100 200 Post-test odds (test +ve) = 18.0 Pre-test odds = 1.00 Odds (after +ve test) are 18 times higher than the odds before you had the test. This is the LIKELIHOOD RATIO. March 2017 https://www.youtube.com/watch?v=nFZs6eMvZFY 107 Likelihood Ratios • Likelihood ratios are related to sens & spec LR(+) = • Sometimes given as the definition of the LR(+) • LR(+) is fixed across populations. – Bigger is better. 
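The hypothetical-table method (method 2 above) is straightforward to script. A minimal sketch using the slide’s figures (sensitivity 0.90, specificity 0.95, notional practice of 10,000); the function name `calibrate` is mine, not from the lecture.

```python
def calibrate(sens, spec, prev, n=10_000):
    """Fill a hypothetical 2x2 table for a new prevalence; return (PPV, NPV)."""
    diseased = prev * n          # 2nd: 'Disease present' column total
    healthy = n - diseased       # 3rd: 'Disease absent' column total
    tp = sens * diseased         # cell a
    fn = diseased - tp           # cell c
    tn = spec * healthy          # cell d
    fp = healthy - tn            # cell b
    return tp / (tp + fp), tn / (tn + fn)

# Primary-care prevalence of 1%, as on the slide:
ppv, npv = calibrate(0.90, 0.95, 0.01)
print(round(ppv, 4))   # → 0.1538, matching Test Properties (12)
```

The same call with `prev=0.50` reproduces the tertiary-care table, which is the point of the method: sensitivity and specificity stay fixed while the predictive values shift with prevalence.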
Examples of LRs
LRs change the pre-test probability (pp) roughly as follows:

  LR+   Increases pp by        LR-   Reduces pp by
   2    15% (of the pp)        0.5   15%
   5    30%                    0.2   30%
  10    45%                    0.1   45%

CAGE questions for alcoholism:

  CAGE         LR+
  1 positive    NS
  2 positive    4.5
  3 positive   13.3
  4 positive    101

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1495095/
Claire Lee’s Video!
• Claire produced a video on LRs.
– Guys – learn how to tell if an egg is rotten before you open it!
• https://www.youtube.com/watch?v=ohohrv6peYk
Calibration with Nomogram
• A graphical approach that avoids arithmetic.
• Scaled to work directly with probabilities – no need to convert to odds.
• Draw a line from the pre-test probability (= prevalence) through the likelihood ratio; extend it to read off the post-test probability.
• Only useful if someone gives you the nomogram!
Example of Nomogram (pre-test probability 1%; LR+ = 18, LR− = 0.105)
• Through LR+ = 18 → post-test probability ≈ 15%.
• Through LR− = 0.105 → post-test probability ≈ 0.1%.
• (An LR of 1 leaves the probability unchanged.)
MCQ
Several authorities recommend routine screening for depression in primary care settings. You find a brief screening test that has a positive likelihood ratio (LR+) of 2. If the pre-test probability is 5%, what is the main conclusion you should draw from that figure?
1. A patient with a positive score will be twice as likely to have a depression.
2. The interpretation will depend on the negative likelihood ratio (LR-).
3. Patients with depression will be twice as likely to score positively.
4. The test result will not be very informative.
5. A negative result will help you to rule out the disease.
TIME FOR A BREAK!
Measures of Benefit (1)
• Consider a new, potentially life-saving drug. How many people do we need to treat to prevent one death?
• This is the ‘Number Needed to Treat’ (NNT).
– Treat 5 people & follow 5 controls for one year:
• Incidence rate in the control group: 2 deaths per 5 person-years.
• Incidence rate in the experimental group: 1 death per 5 person-years.
Number Needed to Treat (2)
• Treat 5 people for one year:
– Control therapy: 2 deaths
– Experimental therapy: 1 death
– Deaths PREVENTED: 1
– So the NNT was 5 to prevent 1 death.
• Calculation: the risk difference (aka Absolute Risk Reduction) is 2/5 − 1/5 = 1/5, and
NNT = 1 / RD
Number Needed to Treat (3)
For preventing rare outcomes you will need to treat many people to prevent one event, even if the relative reduction in risk is large:
Relative risk = 0.1 (i.e., a 90% relative risk reduction)
IR (old Rx) = 10/1,000
IR (new Rx) = 1/1,000
RD = 9/1,000
NNT = 1,000/9 ≈ 111
THE END
43) The classical “epidemiological triad” of disease causation consists of factors which fall into which of the following categories:
a) host, reservoir, environment
b) host, vector, environment
c) reservoir, agent, vector
d) host, agent, environment
e) host, age, environment
For Mathematicians:
• Some folks may wish to understand the references to Bayes.
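The NNT arithmetic above can be checked in a few lines. A small sketch, not from the lecture (the function name `nnt` is mine):

```python
def nnt(risk_control, risk_treated):
    """Number needed to treat = 1 / absolute risk reduction (risk difference)."""
    rd = risk_control - risk_treated
    return 1 / rd

print(nnt(2 / 5, 1 / 5))         # trial example: treat 5 to prevent 1 death
print(nnt(10 / 1000, 1 / 1000))  # rare outcome: ≈ 111 despite a 90% relative risk reduction
```

In practice the result is usually rounded up to the next whole patient, but the slide quotes the raw quotient 1,000/9 ≈ 111.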
• Calibration by Bayes’ theorem.
• Remember:
– Post-test odds(+) = pre-test odds × LR(+)
– And LR(+) is ‘fixed’ across populations.
• First, a short aside about odds.
Converting Between Odds & Probabilities
• odds = p / (1 − p), and p = odds / (1 + odds)
• If prevalence = 0.20, then pre-test odds = 0.20/0.80 = 0.25 (1 to 4).
• If post-test odds = 0.25, then PPV = 0.25/1.25 = 0.20.
Calibration by Bayes’ Theorem
• For diagnostic tests:
– Prevalence is your best guess about the probability that the patient has the disease, before you do the test.
• Also known as the pre-test probability of disease: (a+c) / N in the 2x2 table.
• It is closely related to the pre-test odds of disease, (a+c) / (b+d) in the 2x2 table; the two are nearly equal when the disease is rare.
Test Properties (13)

             Diseased   Not diseased   Total
  Test +ve      a            b          a+b
  Test -ve      c            d          c+d
  Total        a+c          b+d          N

  Prevalence proportion = (a+c)/N    Prevalence odds = (a+c)/(b+d)

Calibration by Bayes’ Theorem
• To ‘calibrate’ your test for a new population:
– Get the LR(+) value (via sensitivity & specificity) from the reference source.
– Estimate the pre-test odds for your population.
– Compute the post-test odds.
– Convert to a post-test probability: this is the PPV.
Example (sens 90%, spec 95%, new prevalence 1%)
• LR(+) = 0.90 / (1 − 0.95) = 18
• Pre-test odds = 0.01/0.99 ≈ 0.0101
• Post-test odds = 0.0101 × 18 ≈ 0.182
• PPV = 0.182/1.182 ≈ 0.1538
• Compare to the ‘hypothetical table’ method (PPV = 15.38%): the two agree.
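The odds conversions and the Bayes/LR calibration route above can be expressed as a few helper functions. A sketch under the slide’s assumptions (sens 0.90, spec 0.95, prevalence 1%); the helper names are mine, not from the lecture.

```python
def prob_to_odds(p):
    """Convert a probability to odds: p / (1 - p)."""
    return p / (1 - p)

def odds_to_prob(o):
    """Convert odds back to a probability: o / (1 + o)."""
    return o / (1 + o)

def ppv_by_bayes(sens, spec, prev):
    """Calibrate PPV for a new prevalence via post-test odds = pre-test odds x LR(+)."""
    lr_pos = sens / (1 - spec)            # LR(+) = 18 for sens 0.90, spec 0.95
    post_odds = prob_to_odds(prev) * lr_pos
    return odds_to_prob(post_odds)

print(round(ppv_by_bayes(0.90, 0.95, 0.01), 4))  # → 0.1538, matching the hypothetical-table method
```

Because LR(+) depends only on sensitivity and specificity, this route carries the test to any prevalence, which is exactly the stability assumption the calibration slides rely on.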