Gene hunting from a Bayesian viewpoint

Tuan V. Nguyen
Bone and Mineral Research Program
Garvan Institute of Medical Research
Sydney, Australia


Is the gene search justified?

• Exploration of disease pathway
• Public health implications
• Pharmacological applications
• Treatment?

[Figure: prevalence (%) and 10-year risk of fracture (Fx) by femoral neck BMD category, from <0.4 to 1.05 and above]


Complex traits

BMD: genetics and environments

Variation in complex trait = G + E + GxE
(G = genetics; E = environment; x = interaction)

[Figure: scatter plots of twin 1 vs twin 2. MZ twins: r = 0.73; DZ twins: r = 0.47]


Genetics, BMD and fracture

[Diagram: BMD and fracture]

Does familial risk of BMD affect fracture risk?

                  Intraclass correlation in BMD
RR of BMD/Fx      r = 0.8      r = 0.9
_________________________________________________
5                 1.14         1.16
6                 1.17         1.20
7                 1.21         1.24
8                 1.24         1.28
_________________________________________________

Genes that affect BMD explain only a small part of the variation in fracture risk.


Fracture risk: genetics and environments

[Figure: twin study. Two panels, pairwise concordance and relative risk, by zygosity (DZ, MZ)]

P Kannus et al, BMJ 1999; 319:1334-7


Current strategies

• Linkage analysis
• Genome-wide screen
• Association analysis
• "Candidate gene"


Linkage analysis: identical by descent (IBD)

Parents       Siblings      Alleles shared IBD
AB x AC       AB, AC        0
AB x CD       AC, AD        1
AB x CD       BC, BC        2


Linkage analysis: basic model

[Figure: squared difference in BMD among siblings plotted against the number of alleles shared IBD (0, 1, 2)]


Population-based association analysis

[Figure: marker genotypes (AA, AB, AC, BB, BC, CC) in fracture cases and in controls]


Family-based association analysis

[Figure: pedigree with marker genotypes (AA, AB, AC, BC)]


Genome-wide vs candidate gene approach

Genome-wide screen                  Candidate gene analysis
Complex                             Simple
No prior knowledge of mechanism     Prior knowledge of mechanism
Expensive                           Inexpensive
No specific genes                   Specific genes


Linkage vs association phenomena

                            Linkage     Association
Magnitude of "effect"       No          Yes
Transmission                Yes         No/Yes
Study design complexity     Complex     Simple
Power                       Low         High
False +ve                   High        High


Test statistic

Test statistic = signal / noise = effect size / random error

Result: significant (+ve) or not significant (-ve)

Criterion: P-value
P < 0.05: +ve
P > 0.05: -ve


Consider an example

Genotype      Fracture      No fx
BB            300           300
Bb            600           650
bb            100           50

OR (bb vs other genotypes) = (0.1 / 0.9) / (0.05 / 0.95) = 2.11
lnOR = 0.75; SE(lnOR) = 0.18
P-value < 0.0001


Diagnostic analogy

Diagnosis                                Genetic research
Has cancer, test +ve: OK                 Association, significant (power)
Has cancer, test -ve: ! (false -ve)      Association, NS
No cancer, test +ve: ! (false +ve)       No association, significant (P-value)
No cancer, test -ve: OK                  No association, NS


The meaning of P-value

• P-value: probability of getting a significant statistical test given that there is no association (or no linkage)
• P-value = P(significant stat | H0 is true)


The logic of P-value

Clinical reasoning:
• If Tuan has hypertension, he is unlikely to have pheochromocytoma
• Tuan has pheochromocytoma
• Tuan is unlikely to have hypertension

Statistical reasoning:
• If there were truly no association, then the observation would be unlikely
• The observation occurred
• The no-association hypothesis is unlikely


P-value

"Thinking about P values seems quite counterintuitive at first, as you must use backwards, awkward logic. Unless you are a lawyer or a Talmudic scholar … you will probably find this sort of reasoning uncomfortable" (Intuitive Statistics)


What do we want to know?

Clinical: P(+ve | cancer), or P(cancer | +ve)?
Research: P(Significant test | Association), or P(Association | Significant test)?


Breast Cancer Screening

Prevalence = 1%; Sensitivity = 90%; Specificity = 91%

Population (n=1000)       +ve       -ve
Cancer (n=10)             9         1
No cancer (n=990)         90        900

P(Cancer | +ve result) = 9/(9+90) = 9%


Genetic association

Prior prob. of association = 0.05; Power = 90%; P-value threshold = 5%

1000 SNPs                 +ve       -ve
True (n=50)               45        5
False (n=950)             48        902

P(True association | +ve result) = 45/(45+48) = 48%


Problem with p-value

No. of patients       No. preferring       % preferring A      Two-sided
receiving A and B     A:B                                      P-value
20                    15:5                 75.0                0.04
200                   115:85               57.5                0.04
2,000                 1046:954             52.3                0.04
2,000,000             1001445:998555       50.07               0.04


P value is

• NOT the likelihood that findings are due to chance
• NOT the probability that the null hypothesis is true given the data
• A p-value = 0.05 does not mean that there is a 95% chance that a real difference exists
• The lower the p-value, the stronger the evidence for an effect


Bayes factor

$$\mathrm{BF} = \frac{P(\text{data} \mid H_0)}{P(\text{data} \mid H_1)}$$

$$\mathrm{BF} = \frac{(\sigma^2/n)^{-0.5}\,\exp\{-0.5\,(\bar{x}-\theta_0)^2/(\sigma^2/n)\}}{(\tau^2+\sigma^2/n)^{-0.5}\,\exp\{-0.5\,(\bar{x}-\theta_0)^2/(\tau^2+\sigma^2/n)\}} = \left(1+\frac{n\tau^2}{\sigma^2}\right)^{1/2}\exp\left\{-0.5\,z^2\left[1+\frac{\sigma^2}{n\tau^2}\right]^{-1}\right\}$$

where $\bar{x}$ is the mean of $n$ observations with variance $\sigma^2$, $\theta_0$ is the value under $H_0$, $\tau^2$ is the prior variance of the effect under $H_1$, and $z = (\bar{x}-\theta_0)/(\sigma/\sqrt{n})$.


Minimum Bayes factor

$$\mathrm{BF}_{\min} = z\,\exp\left\{\frac{1-z^2}{2}\right\}$$


Prob of null hypothesis

$$P(H_0 \mid \text{Data}) = \left[1+\frac{1-P(H_0)}{P(H_0)}\,\mathrm{BF}^{-1}\right]^{-1}$$

where $P(H_0)$ is the prior probability of the null hypothesis.


Re-evaluation of some "positive" studies

Study    OR      95% CI         P-value    Z       MinBF    Min P(H0|data)
1        2.10    1.10, 3.90     0.021      2.30    0.27     0.21
2        1.50    1.10, 2.10     0.010      2.46    0.20     0.16
3        2.67    1.37, 6.02     0.009      2.60    0.15     0.13
4        2.59    1.23, 4.45     0.004      2.90    0.07     0.07
5        2.26    1.09, 4.69     0.026      2.19    0.33     0.25
6        2.60    1.40, 5.00     0.003      2.94    0.06     0.06
7        2.58    1.36, 4.91     0.004      2.89    0.07     0.07
8        2.79    1.02, 7.65     0.043      2.00    0.45     0.31


Bayes Factor for genetic association study

Let p be the allelic frequency of the genetic marker; then BF can be shown to be:

[Equation: 1/BF expressed in terms of the sample size n, the allele frequency through p^3(1 - p)^3, and the F statistic; the full formula is not legible in this copy]


Bayes Factor and posterior probability of an association

Let $\pi_0$ be the prior probability of a true association; the posterior probability of the association is:

$$P(\text{association} \mid \text{data}) = \left[1+\frac{(1-\pi_0)/\pi_0}{\mathrm{BF}}\right]^{-1}$$

(BF here measures the evidence in favour of the association, so a larger BF gives a larger posterior probability.)


Bayes factor vs p-value: t/z test

[Figure: LODA, i.e. Log10(Bayes factor), plotted against Log10(P-value). From uppermost to lowermost lines: n = 10, 15, 50, 100, 500, 1000, 10000]


P-value and Bayes factor: LD test

[Figure: Log10(Bayes factor) plotted against Log10(p-value). From uppermost to lowermost lines: n = 10, 50, 100, 300, 500, 1000, 10000]


Bayes factor and p-value

P-value                                       Bayes factor
Non-comparative                               Comparative
Observed + hypothetical data                  Only observed data
Evidence only negative                        Evidence +ve or -ve
Sensitive                                     Insensitive
No formal justification or interpretation     Formal justification and interpretation


Summary

• The criterion of p<0.05 is not an adequate measure of a genetic association
• Bayes factor is potentially a relevant measure of association


Distribution of sample sizes

[Figure: number of studies by sample size]

Ioannidis et al, Trends Mol Med 2003


Distribution of effect sizes

[Figure: number of studies by effect size (OR)]

Ioannidis et al, Trends Mol Med 2003


Correlation between the odds ratio in the first studies and in subsequent studies

[Figure]

Ioannidis et al, Nat Genet 2001


Evolution of the strength of an association as more information is accumulated

[Figure]

Ioannidis et al, Nat Genet 2001


Predictors of statistically significant discrepancies between the first and subsequent studies of the same genetic association

Predictor                                       Odds ratio:             Odds ratio:
                                                univariate analysis     multivariate analysis
Total no. of studies (per association)          1.17 (1.03, 1.33)       1.18 (1.02, 1.37)
Sample size of the first study                  0.42 (0.17, 0.98)       0.44 (0.19, 0.99)
Single first study with clear genetic effect    9.33 (1.01, 86.3)       NS

Ioannidis et al, Nat Genet 2001


Diagnosis and statistical reasoning

Diagnosis                                       Research
Prior probability of disease (prevalence)       Prior probability of research hypothesis
Positive test result (+ve)                      Statistical significance (S)
Sensitivity: P(+ve | diseased)                  Power (1-β): P(S | association)
False positive rate: P(+ve | not diseased)      P-value: P(S | no association)
Positive predictive value: P(diseased | +ve)    Bayesian probability: P(association | S)


Risk factors for fracture

• Blonde hair
• Being tall
• Wearing trousers (women)
• High heels (women)
• Drinking coffee
• Drinking tea
• Coca cola
• High protein intake


Cancer risk

• Electric razors
• Broken arms (women)
• Fluorescent lights
• Allergies
• Breeding reindeer
• Being a waiter
• Owning a pet bird
• Being short
• Being tall
• Hot dogs
• Having a refrigerator!

Altman and Simon, JNCI 1992


"Half of what doctors know is wrong. Unfortunately we don't know which half."

Quoted from the Dean of Yale Medical School, in "Medicine and Its Myths", New York Times Magazine, 16/3/2003
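

Worked check of the calculations

The arithmetic behind the screening example, the genetic association example, the odds-ratio example and the re-evaluation table can be reproduced with a few lines of Python. This is a minimal sketch, not the author's code. All function and variable names are illustrative. It assumes the minimum Bayes factor has the form z * exp((1 - z^2)/2), that the tabulated Min P(H0|data) values use a prior P(H0) of 0.5 (inferred from the numbers, not stated on the slide), and that SE(lnOR) is the Woolf standard error for bb versus the other genotypes.

```python
import math


def positive_predictive_value(prior, power, false_positive_rate):
    """P(true | positive result) from Bayes' theorem."""
    true_positives = prior * power
    false_positives = (1.0 - prior) * false_positive_rate
    return true_positives / (true_positives + false_positives)


def odds_ratio_with_se(a, b, c, d):
    """Odds ratio (a/b)/(c/d) and the Woolf standard error of its log."""
    odds_ratio = (a / b) / (c / d)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, se_log_or


def min_bayes_factor(z):
    """Minimum Bayes factor z * exp((1 - z^2) / 2), intended for z > 1."""
    return z * math.exp((1.0 - z * z) / 2.0)


def posterior_null(bayes_factor, prior_null=0.5):
    """P(H0 | data) when BF = P(data | H0) / P(data | H1)."""
    prior_odds_against_null = (1.0 - prior_null) / prior_null
    return 1.0 / (1.0 + prior_odds_against_null / bayes_factor)


# Breast cancer screening: prevalence 1%, sensitivity 90%, specificity 91%.
ppv_screening = positive_predictive_value(0.01, 0.90, 1 - 0.91)

# Genetic association: prior 0.05, power 90%, significance threshold 5%.
ppv_association = positive_predictive_value(0.05, 0.90, 0.05)

# Odds-ratio example: bb vs other genotypes, fracture (100 vs 900)
# and no fracture (50 vs 950).
odds_ratio, se_log_or = odds_ratio_with_se(100, 900, 50, 950)
z_example = math.log(odds_ratio) / se_log_or

# Z values of the eight "positive" studies in the re-evaluation table.
z_values = [2.30, 2.46, 2.60, 2.90, 2.19, 2.94, 2.89, 2.00]
min_bfs = [min_bayes_factor(z) for z in z_values]
posteriors = [posterior_null(bf) for bf in min_bfs]

print(f"PPV, screening:   {ppv_screening:.2f}")
print(f"PPV, association: {ppv_association:.2f}")
print(f"OR = {odds_ratio:.2f}, SE(lnOR) = {se_log_or:.2f}, z = {z_example:.2f}")
for study, (z, bf, post) in enumerate(zip(z_values, min_bfs, posteriors), 1):
    print(f"Study {study}: z = {z:.2f}, MinBF = {bf:.2f}, P(H0|data) = {post:.2f}")
```

Under these assumptions the MinBF and Min P(H0|data) columns agree with the table to two decimals. The association example comes out near 0.49 rather than 48% because the slide rounds the expected 47.5 false positives up to 48.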