Lecture 9: p-value functions and intro to Bayesian thinking
Matthew Fox
Advanced Epidemiology

Warm-up questions
– If you are a hypothesis tester, which p-value is more likely to avoid a type I error: P = 0.049 or P = 0.005?
– What is the p-value fallacy?
– If you go to the doctor with a set of symptoms, does the doctor develop a hypothesis and test it?
– Has anyone heard of Bayesian statistics?
– After completing a study, would you rather know the probability of the data given the null hypothesis, or the probability of the null hypothesis given the data?

Last Session
Randomization
– Leads to zero confounding on average, which gives meaning to p-values
– Provides a known distribution for all possible observed results
– Observational data does not have average zero confounding
P-values
– The probability under the null that a test statistic would be greater than or equal to its observed value, assuming no bias
– Not the probability of the data, not the probability of the null, and not a significance level
Confidence intervals
– Calculated assuming infinite repetitions of the data
– Don't give a probability of containing the true value

Today
The p-value fallacy
p-value functions
– Show p-values for the data at a range of hypotheses
Bayesian statistics
– The difference between frequentists and Bayesians
– Bayesian theory
– How to apply Bayes theory in practice

P-value fallacy
The p-value was developed by R.A. Fisher as an informal measure of the compatibility of the data with the null
– It provides no guidance on significance
– It should be interpreted in light of what we know
Hypothesis testing was developed by Jerzy Neyman and Egon Pearson (not Karl, his father) to minimize errors in the long run
– Under this view, p = 0.04 is no more evidence than p = 0.00001
The fallacy is believing that the p-value can do both

Different goals
The goal of epidemiology: to measure precisely and accurately the effect of an exposure on a disease
The goal of policy: to make decisions
Given our goal:
– Why hypothesis testing?
– Why compare to the null?
p-value functions (1)
Recall that a p-value is:
– "The probability under the test hypothesis (usually the null) that a test statistic would be ≥ its observed value, assuming no bias."
We can calculate p-values under test hypotheses other than the null
– Particularly easy if we use a normal approximation
– If we assume we can fix the margins with observational data

p-value functions (1): example data

                Exposed   Unexposed
  Disease          6          3
  No disease      14         17
  Total           20         20
  Risk            0.30       0.15

  Observed RR = 2.00; SE(ln(RR)) = 0.63

p-value functions (2a)
To calculate a test statistic (Z score) for a p-value, usually against the null:

  z = ln(RR) / SE(ln(RR)) = ln(2.0) / 0.63 ≈ 1.1
  p(|Z| ≥ 1.1) = 0.27

(Note that ln(1) = 0, so testing the null drops the hypothesis term.)

p-value functions (2b)
For any test hypothesis RR_H:

  z = [ln(RR_o) − ln(RR_H)] / SE(ln(RR_o))

  SE(ln(RR)) = sqrt(1/a − 1/N_E + 1/b − 1/N_U)

where a and b are the exposed and unexposed cases and N_E and N_U are the exposed and unexposed totals. The two-sided p-value is twice the upper tail of the standard normal distribution:

  p = 2 × [1 − Φ(|z|)],  where Φ(|z|) = ∫ from −∞ to |z| of (1/√(2π)) e^(−z²/2) dz

p-value functions (3)

  Hypothesis      RR      z-value    p-value
  a               0.1      4.737     0.0000
  b               0.20     3.641     0.0000
  c               0.33     2.849     0.0040
  d               0.5      2.192     0.0280
  e (null)        1        1.096     0.2730
  f (observed)    2.00     0.000     1.0000
  g               3       −0.641     0.5210
  h               5       −1.449     0.1470
  i               10      −2.545     0.0110

p-value functions (4)
[Figure: p-value function for the example. Two-sided p-values (0–100%) plotted against RR hypotheses from 0.1 to 10 on a log scale. The curve peaks at 100% at the point estimate (RR = 2.0); the null p-value is 0.27; the 95% confidence limits, where p = 0.05, are 0.58 (LCL) and 6.9 (UCL).]

[Figure: p-value function for a case-control study of spermicides and Down syndrome.]
Interpretation

Introduction to Bayesian Thinking

What is the best estimate of the true effect of E on D?
Given a disease D and an exposure E+ vs. unexposed E−:
– The relative risk associating D with E+ (vs. E−) = 2.0, with 95% confidence interval 1.5 to 2.7
OK, but why did you say what you said?
– We have no other information to go on

What is the best estimate of the true effect of E on D?
Given a disease D and an exposure E+ vs.
unexposed E−:
– The relative risk associating D with E+ (vs. E−) = 2.0, with 95% confidence interval 0.5 to 8.0
– Note that the width of the interval doesn't affect the best estimate in this case

What is the best estimate of the true effect of E on D?
Given a disease D (breast cancer) and an exposure E+ (ever-smoking) vs. unexposed E− (never-smoking):
– The relative risk associating D with E+ (vs. E−) = 2.0, with 95% confidence interval 1.0 to 4.0
– Most previous studies of smoking and breast cancer have shown no association

What is the best estimate of the true effect of E on D?
Given a disease D (lung cancer) and an exposure E+ (ever-smoking) vs. unexposed E− (never-smoking):
– The relative risk associating D with E+ (vs. E−) = 2.0, with 95% confidence interval 1.0 to 4.0
– Most previous studies of smoking and lung cancer have shown much larger effects

What is the best estimate of the true probability of heads?
Given 100 flips of a fair coin flipped in a fair way, and an observed number of heads = 40:
– The probability of heads equals 0.40, with 95% confidence interval 0.304 to 0.496
– Given what we know about a fair coin, why should these data override what we know?
So why would we interpret our study data as if they existed in a vacuum?
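Returning to the p-value function example: the calculation in slides (2a)–(2b) takes only a few lines. This is a minimal sketch (variable and function names are mine); the standard normal CDF is built from `math.erf`, and the numbers reproduce the hypothesis table and the null p-value of 0.27.

```python
import math

# 2x2 table from the example: 6/20 exposed cases, 3/20 unexposed cases
a, n_exposed = 6, 20      # exposed: cases, total
b, n_unexposed = 3, 20    # unexposed: cases, total

rr_obs = (a / n_exposed) / (b / n_unexposed)                   # 2.0
se_ln_rr = math.sqrt(1/a - 1/n_exposed + 1/b - 1/n_unexposed)  # ~0.63

def two_sided_p(rr_hypothesis):
    """Two-sided p-value testing RR = rr_hypothesis, normal approximation."""
    z = (math.log(rr_obs) - math.log(rr_hypothesis)) / se_ln_rr
    # Phi(|z|) via the error function; p = 2 * (1 - Phi(|z|))
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * (1 - phi)
```

Evaluating `two_sided_p` over a log-spaced grid of RR hypotheses traces out the whole p-value function; `two_sided_p(1)` gives the familiar null p-value, and `two_sided_p(2.0)` equals 1 at the point estimate.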
The Monty Hall Problem

An alternative to frequentist statistics
Something important about the data is not being captured by the p-value and confidence interval, or at least not by the way they are used
What is missing is a measure of the evidence provided by the data
– Evidence is a property of data that makes us alter our beliefs

Frequentist statistics fail as measures of evidence
The logical underpinning of frequentist statistics is that
– "if an observation is rare under a hypothesis, then the observation can be used as evidence against the hypothesis."
But life is full of rare events we accord little attention
– What makes us react is a plausible competing hypothesis under which the data are more probable

Frequentist statistics fail as measures of evidence
The null p-value provides no information about the probability of alternatives to the null
Measurement of evidence requires 3 things:
– The observations (data)
– 2 competing hypotheses (often null and alternative)
The p-value incorporates the data and only 1 hypothesis (usually the null)

The likelihood as a measure of evidence
Likelihood = c × P(data | H)
The data are fixed and the hypotheses variable
– p-values are calculated with a fixed (null) hypothesis, assuming the data are randomly variable
The evidence supporting one hypothesis versus another = the ratio of their likelihoods
– The log of the ratio is an additive measure of support

Evidence versus belief
The hypothesis with the higher likelihood is better supported by the evidence
– But that does not make it more likely to be true
Belief also depends on prior knowledge, which can be incorporated with Bayes theorem
– It is the likelihood ratio that represents the data
Priors can be subjective or empirical
– But not arbitrary

Bayesian analysis (1)
Given (1) the prior odds that a hypothesis is true, and (2) data to measure the effect:
– Update the prior odds using the data to calculate the posterior odds that the hypothesis is true.
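In odds form, the update just described is repeated multiplication by likelihood ratios. A minimal sketch (function name is mine), using the likelihood ratios from the environmental-tobacco-smoke example that follows:

```python
# Studies with their likelihood ratios for H1: OR = 2 vs H0: OR = 1
studies = [("Sandler", 1.78), ("Hirayama", 0.38), ("Smith", 2.12)]

def update_odds(prior_odds, likelihood_ratios):
    """Apply Bayes' rule in odds form, one study at a time:
    posterior odds = prior odds * likelihood ratio.
    Returns the running posterior odds after each study."""
    odds = prior_odds
    trail = []
    for name, lr in likelihood_ratios:
        odds *= lr
        trail.append((name, round(odds, 2)))
    return trail

trail = update_odds(1.0, studies)   # analyst starts with no preference
```

Starting from prior odds of 1.0, the running posterior odds are 1.78, 0.68, and 1.43, matching the table values (0.7 and 1.4 after rounding): each study's posterior becomes the next study's prior.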
A formal algorithm to accomplish what many do informally

Bayesian analysis (2)

  p(H1 | D) / p(H0 | D) = [p(H1) / p(H0)] × [p(D | H1) / p(D | H0)]

Prior odds times the likelihood ratio equals the posterior odds
– Only for people with an ignorant (uniform) prior distribution can we say that the frequentist 95% CI covers the true value with 95% certainty

Bayesian analysis (3): Environmental tobacco smoke and breast cancer

  study      observation      likelihood ratio   prior odds (A)   posterior odds
  Sandler    1.6 (0.8–3.4)    1.78               1.0              1.78
  Hirayama   1.3 (0.8–2.1)    0.38               1.8              0.7
  Smith      1.6 (0.8–3.1)    2.12               0.7              1.4

H1 = [OR = 2]; H0 = [OR = 1]
Initially, the analyst has no preference (prior odds = 1); each study's posterior odds become the prior odds for the next study.

Bayesian analysis (4): concepts
Keep these concepts separate:
– The hypotheses under comparison (e.g., RR = 2 vs. RR = 1)
– The prior odds (>1 favors the 1st hypothesis (RR = 2), <1 favors the 2nd hypothesis (RR = 1))
– The estimate of effect for a study (this is the data used to modify the prior odds)

Bayesian analysis (5): concepts
Keep these concepts separate:
– The likelihood ratio (probability of the data under the 1st hypothesis versus under the 2nd. >1 favors the 1st hypothesis, <1 favors the 2nd)
– The posterior odds (compares the hypotheses after observing the data. >1 favors the 1st hypothesis, <1 favors the 2nd)

Bayesian analysis: Part V: Problem 3 (20 points total)
The following table shows odds ratios, 95% confidence intervals, and the standard error of the ln(OR) from three studies of the association between passive exposure to tobacco smoke and the occurrence of breast cancer. The fourth column shows the likelihood ratio, calculated as the likelihood under the hypothesis that the true relative risk equals 2.0 divided by the likelihood under the hypothesis that the true relative risk equals 1.0.
H1: RR = 2.0; H0: RR = 1.0

  study      observation 95% CI   SE(ln(OR))   likelihood ratio   prior odds   posterior odds
  Sandler    1.6 (0.8–3.4)        0.38         1.8                0.75         1.34
  Hirayama   1.3 (0.8–2.1)        0.24         0.4                1.34         0.50
  Smith      1.6 (0.8–3.1)        0.34         2.1                0.50         1.07

A. (7 points) Assume that someone favors the hypothesis that the true odds ratio equals 1.0 over the hypothesis that the true odds ratio equals 2.0. The person quantifies this preference by stating that their prior odds for the hypothesis of a relative risk equal to 2.0 versus the hypothesis of a relative risk equal to 1.0 is 0.75 (see the first row of the table). Complete the shaded cells of the table using Bayesian analysis. After seeing these three studies, should the person favor (circle one): the hypothesis of 2.0 / the hypothesis of 1.0

http://statpages.org/bayes.html

Connection between p-values and the likelihood ratio

Bayesian intervals
Bayesian intervals require specification of prior odds for an entire distribution of hypotheses, not just two hypotheses
– Update the distribution with the data
– The prior distribution will look like a p-value function, but incorporates only prior knowledge.
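The likelihood ratios in the table above can be reproduced with a normal approximation: treat the observed ln(OR) as normally distributed with the tabulated SE, evaluate its density at each hypothesis's log relative risk, and take the ratio (the normalizing constant of the normal density cancels). A sketch under that assumption (function name is mine):

```python
import math

def likelihood_ratio(or_hat, se_ln_or, rr1=2.0, rr0=1.0):
    """Normal-approximation likelihood ratio for H1: RR = rr1 vs H0: RR = rr0.
    The observed ln(OR) is treated as Normal(ln RR_H, se_ln_or) under each
    hypothesis; constants cancel, leaving only the exponential kernels."""
    x = math.log(or_hat)
    def log_kernel(mu):
        return -((x - mu) ** 2) / (2 * se_ln_or ** 2)
    return math.exp(log_kernel(math.log(rr1)) - log_kernel(math.log(rr0)))
```

For example, `likelihood_ratio(1.6, 0.38)` gives about 1.8 (Sandler), `likelihood_ratio(1.3, 0.24)` about 0.4 (Hirayama), and `likelihood_ratio(1.6, 0.34)` about 2.1 (Smith), matching the fourth column.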
Posterior distribution
– Choose interval limits

[Figure: six panels of probability density (0–1) against RR hypotheses (0.1 to 10, log scale), each showing four prior distributions (advocate, skeptic, ignorant, bimodal). The first panel shows the priors alone; the remaining panels show the distributions sequentially updated with the Sandler, Hirayama, Smith, Morabia, and Johnson studies.]

Conclusion
P-value fallacy
– The p-value cannot serve both the long-run perspective and the individual-study perspective
P-value functions
– Can help us see the entire distribution of probabilities
Bayesian analysis
– Allows us to change our beliefs with new information and measure the probability of a hypothesis
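The posterior-distribution panels update a whole prior distribution rather than two hypotheses. A minimal grid sketch of that idea, assuming a skeptic-style prior centered at RR = 1 and the Sandler estimate from the earlier table (the grid, prior shape, and names are illustrative assumptions, not the lecture's exact curves):

```python
import math

# Grid of RR hypotheses from ~0.1 to ~10, evenly spaced on the log scale
grid = [math.exp(x / 20) for x in range(-46, 47)]

def normal_pdf(x, mu, sd):
    """Density of Normal(mu, sd) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

# Assumed skeptic prior: ln(RR) ~ Normal(0, 0.2), normalized over the grid
prior = [normal_pdf(math.log(rr), 0.0, 0.2) for rr in grid]
total = sum(prior)
prior = [p / total for p in prior]

def update(prior, ln_or, se):
    """One Bayes update: posterior ~ prior * likelihood of the study estimate."""
    post = [p * normal_pdf(ln_or, math.log(rr), se) for p, rr in zip(prior, grid)]
    total = sum(post)
    return [p / total for p in post]

# Update the skeptic prior with the Sandler study: OR 1.6, SE(ln OR) 0.38
posterior = update(prior, math.log(1.6), 0.38)
```

Because the skeptic prior is tight, the posterior mode moves only slightly above RR = 1; repeating `update` with each subsequent study reproduces the sequential reshaping shown in the panels.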