False Discovery or Missed Discovery
Bernie Devlin
Dept of Psychiatry, University of Pittsburgh, SOM
Collaborators: K. Roeder, L. Wasserman, S.A. Bacanu, C.R. Genovese

Introduction
Topic: multiple tests, and the tradeoff of true / false discoveries
• Properties of single & multiple tests
• FDR: when some hypotheses are true
• How to increase the power of FDR: weighted FDR
• Conclusions

False or Missed Discoveries
Statistical decision: choose a rejection threshold to control the Type I error rate (false discoveries) while achieving desirable power for relevant alternatives (avoiding missed discoveries).
Now common: many, many tests.
[Figure: two panels labeled Power, the second marked "?"]

How to Choose Threshold α?
• Uncorrected testing
  • Ignores multiple testing
  • Reject if P < α
  • WRONG!
• Control the experiment-wise Type I error (FWER)
  • Bonferroni correction
  • Reject if P < α/m

The Multiple Testing Problem
• Reject if p-value Pi ≤ threshold
• D = # discoveries
• FD = # false discoveries

            H0 retained   H0 rejected   Total
H0 true     TN            FD            m0
H1 true     FN            TD            m1
Total       N             D             m

More Power To You
• FWER is principled but ... right goal?
• Aim for power and principle.
• False discovery control: bound the ratio FD/D.
• Proposed by Benjamini and Hochberg (1995), the False Discovery Rate:
$$\mathrm{FDR} = E(FD/D) \le \alpha \quad (\text{taking } FD/D = 0 \text{ when } D = 0)$$
[Figure: power vs. rejection threshold (reject P < threshold), with thresholds α/m, t, tw, and α marked]

The Benjamini-Hochberg Procedure (1995)
• Order the p-values: P(1) ≤ P(2) ≤ … ≤ P(m).
• The BH threshold is
$$T = \max\{P_{(i)} : P_{(i)} \le i\alpha/m\}$$
  and every test with P ≤ T is rejected.
• The FDR is then controlled at level α (for independent tests).
[Figure: B-H, in action]

FDR Control for Dependent Tests
• Contiguous regions of the brain.
  • Dependent FDR: Benjamini and Yekutieli
• Association with gene × gene interaction.
  • Devlin et al. (2003) Gen Epi 25: 36-47
  • For large m, FDR control holds provided the empirical distribution of the p-values is a consistent estimator of the distribution of the p-values.

What Kind of Dependent Tests?
Search for liability loci in a large number of genes, allowing for gene-gene interaction.
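The three rejection rules described in these slides (uncorrected testing, Bonferroni, and the Benjamini-Hochberg step-up) can be compared in a short Python sketch. This is an illustration written for this summary, not the authors' code; the function names are hypothetical, and only the standard library is used.

```python
from typing import List, Optional


def uncorrected_reject(pvals: List[float], alpha: float) -> List[bool]:
    """Reject each test with P < alpha, ignoring multiplicity."""
    return [p < alpha for p in pvals]


def bonferroni_reject(pvals: List[float], alpha: float) -> List[bool]:
    """Control the FWER: reject each test with P < alpha / m."""
    m = len(pvals)
    return [p < alpha / m for p in pvals]


def bh_threshold(pvals: List[float], alpha: float) -> Optional[float]:
    """Benjamini-Hochberg step-up threshold.

    Sort the p-values P(1) <= ... <= P(m) and return the largest P(i)
    satisfying P(i) <= i * alpha / m, or None if no P(i) qualifies.
    """
    m = len(pvals)
    threshold: Optional[float] = None
    for i, p in enumerate(sorted(pvals), start=1):
        if p <= i * alpha / m:
            threshold = p  # keep the largest qualifying P(i) (step-up)
    return threshold


def bh_reject(pvals: List[float], alpha: float) -> List[bool]:
    """Reject every hypothesis whose p-value is at or below the BH threshold."""
    t = bh_threshold(pvals, alpha)
    if t is None:
        return [False] * len(pvals)
    return [p <= t for p in pvals]
```

For example, with p-values [0.001, 0.012, 0.014, 0.045, 0.3] and α = 0.05 (so m = 5), uncorrected testing rejects 4 hypotheses, Bonferroni (cutoff 0.01) rejects 1, and BH rejects 3: 0.014 ≤ 3(0.05)/5 = 0.03, but 0.045 > 4(0.05)/5 = 0.04 and 0.3 > 0.05. BH sits between the other two rules in power.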
GLM for Phenotype with Pairwise Interactions
$$g(\mu) = \sum_{r=1}^{k} \beta_r X_{ir} + \sum_{1 \le r < s \le k} \beta_{rs} X_{ir} X_{is} + \cdots$$

A Way to View the Model
[Figure: predictors X1, X2, …, Xk and interaction terms X1*X2, X2*X3, … linked to phenotype Y]
• Full model: all terms
• Reduced model: drop "terms"

Target Set Approach
[Figure: main effects 1, 2, 3, 4; pairwise interactions 12, 13, 14, 23, 24, 34; sets A, B and their interaction A*B]

Prior Information
• FDR does not incorporate scientific priors into the analysis.
• Consider a formal procedure for using prior information:
  • Genes targeted due to linkage
  • Genes in biological pathways
• Try weights.
[Figure: power vs. rejection threshold, with thresholds α/m, t, tw, and α marked]

p-Value Weighting
• Control the overall FDR, but favor some hypotheses with weights.
• Think betting:
  • Up-weight candidates (wi > 1)
  • Down-weight others (wk < 1)
  • Budget: the average weight equals 1
  • Placing bets: test Pi/wi

Choosing Good Weights
• Mean(w1, …, wm) = 1
• True model: "a" = fraction of alternatives
• Betting model: "ε" = fraction up-weighted
• Binary weights:
$$w \propto \begin{cases} 1 & \text{Bet} = 0, \text{ a fraction } 1-\varepsilon \text{ of tests} \\ B & \text{Bet} = 1, \text{ a fraction } \varepsilon \text{ of tests} \end{cases}$$

Power for a Given Test
[Figures: two slides showing power for a given test]

Genome-wide Association
• A decade ago, the idea of genome-wide association was envisioned (Risch and Merikangas).
• ASHG: Risch suggested focusing testing on SNPs under linkage peaks.
• Technology favors pre-selected SNPs on a fixed platform?

Linkage Trace for Body Mass Index
[Figure: linkage trace for body mass index]

Continuous Weights
• Use the linkage trace Y to form weights.
• Exponential weights: $w \propto e^{BY}$
• Cumulative weights: $w \propto P(Z < Y - B)$

Exponential weights exaggerate extreme values in the linkage trace.
[Figure: linkage, cumulative, and exponential curves over x = 0 to 100]
Cumulative weights give broader peaks.
[Figure: linkage, cumulative, and exponential curves over x = 0 to 100]
Bigger B creates concentrated, dramatic up-weighting.
[Figure: linkage, cumulative (small B), and cumulative (large B) curves over x = 0 to 100]

Simulation Experiment
• Generate linkage traces
• Generate association tests
  • 500K SNPs
  • 10 causal SNPs
• Power = # causal SNPs discovered / 10

Impact of Weights on Power
• Improvements for either choice: finding 1-2 more signals out of 10.
[Figure: power (0.0 to 1.0) over an x-axis from 4.0 to 6.0, with curves for Bonferroni, No Weight, Weight, and Noise]
• With an uninformative linkage trace, losses are small.

Brain Imaging
Increase power? Standard FDR versus FDR using weights based on regions.
[Figure panels: Standard B-H (red + blue); Results: False Region Control Threshold]
Pacifico et al. CMU Tech Report #771

Conclusions
• In the tradeoff of true / false discoveries, FDR has good properties when some tests are true.
• To increase FDR power, "focus" tests:
  • Prior knowledge: expected G × G interactions
  • P-value weighting: 'prior' knowledge
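The p-value weighting idea from these slides (bets placed as Pi/wi, with weights averaging 1) can be sketched in Python under two stated assumptions: the weighted procedure is taken to be the ordinary BH step-up rule applied to Pi/wi, and Z in the cumulative weights is taken to be standard normal. All function names are hypothetical; B is the slides' tuning constant. This is an illustration, not the authors' implementation.

```python
import math
from typing import List, Optional

# Hypothetical helper names; B is the tuning constant from the slides.


def normalize_mean_one(raw: List[float]) -> List[float]:
    """Rescale positive raw weights so they average exactly 1 (the 'budget')."""
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


def binary_weights(bet: List[bool], B: float) -> List[float]:
    """w proportional to B for tests we bet on, 1 otherwise; mean 1."""
    return normalize_mean_one([B if b else 1.0 for b in bet])


def exponential_weights(trace: List[float], B: float) -> List[float]:
    """w proportional to exp(B * Y), where Y is the linkage-trace value."""
    return normalize_mean_one([math.exp(B * y) for y in trace])


def std_normal_cdf(x: float) -> float:
    """P(Z < x) for Z standard normal (an assumption about Z)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cumulative_weights(trace: List[float], B: float) -> List[float]:
    """w proportional to P(Z < Y - B), with Z assumed standard normal."""
    return normalize_mean_one([std_normal_cdf(y - B) for y in trace])


def weighted_bh_reject(pvals: List[float], weights: List[float],
                       alpha: float) -> List[bool]:
    """Run the BH step-up rule on the weighted p-values P_i / w_i.

    Assumes every weight is positive and the weights average 1.
    """
    m = len(pvals)
    q = [p / w for p, w in zip(pvals, weights)]
    threshold: Optional[float] = None
    for i, qi in enumerate(sorted(q), start=1):
        if qi <= i * alpha / m:
            threshold = qi  # largest qualifying weighted p-value (step-up)
    if threshold is None:
        return [False] * m
    return [qi <= threshold for qi in q]
```

With p-values [0.001, 0.012, 0.014, 0.045, 0.3], α = 0.05, and a bet of B = 2 on the fourth test only, the weights become 1/1.2 ≈ 0.833 for the other tests and 2/1.2 ≈ 1.667 for the favored one. The favored p-value 0.045 is divided down to 0.027, which passes 4(0.05)/5 = 0.04, so the weighted procedure rejects 4 hypotheses instead of the 3 rejected by unweighted BH. With all weights equal to 1, it reduces to ordinary BH.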