Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Controlling false discovery rate via knockoffs Rina Foygel Barber & Emmanuel Candès Jan 21 2015 Code & demos available at http://web.stanford.edu/~candes/Knockoffs/ Paper available at http://arxiv.org/abs/1404.5609 Jan 21 2015 Controlling false discovery rate via knockoffs 1/36 Setting An example: Which mutations in the reverse transcriptase (RT) of HIV-1 determine susceptibility to reverse transcriptase inhibitors (RTIs)? I yi ∈ R = resistance of virus in sample i to a RTI-type drug I Xij ∈ {0, 1} indicates if mutation j is present in virus sample i How can we select mutations that determine drug resistance, in such a way that our answer will replicate in further trials? Jan 21 2015 Controlling false discovery rate via knockoffs 2/36 Setting Sparse linear model: iid y = X · β + z, where zi ∼ N (0, σ 2 ) I n observations, p features I β is sparse Jan 21 2015 Controlling false discovery rate via knockoffs 3/36 Setting Goal: select a set of features Xj that are likely to be relevant to the response y, without too many false positives. One way to measure performance: # false positives |S ∩ H0 | FDR = E =E . total # of features selected |S| ↑ False discovery rate | {z False discovery proportion } S = set of selected features H0 = “null hypotheses” = {j : βj? = 0} Jan 21 2015 Controlling false discovery rate via knockoffs 4/36 Sparse regression Lasso: βλ = arg min β∈Rp 1 ky − X · βk22 + λ kβk1 2 Asymptotically, Lasso will select the correct model (at a good λ). In practice for a finite sample, I True positives & false positives intermixed along the Lasso path I How to pick λ to balance FDR vs power? I Need to account for correlations between Xj & weak signals that may have been missed on the Lasso path. Jan 21 2015 Controlling false discovery rate via knockoffs 5/36 Sparse regression Simulated data with n = 1500, p = 500. ●● 2 ● ● ● 1 ● ● ● ● ● −1 0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● −2 Fitted coefficient βj Lasso fitted model for λ = 1.75: ● ● ● ● 0 100 200 300 400 500 Index j Jan 21 2015 Controlling false discovery rate via knockoffs 6/36 Sparse regression Simulated data with n = 1500, p = 500. ●● 2 ● ● ● 1 ● ● ● ● ● −1 0 ● ● ● ● ● ● ● ● o ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● −2 Fitted coefficient βj Lasso fitted model for λ = 1.75: ● ● o ● ● 0 100 200 False positive? 300 400 True positive? 500 Index j Jan 21 2015 Controlling false discovery rate via knockoffs 7/36 Sparse regression Simulated data with n = 1500, p = 500. ●● 2 ● ● ● 1 ● ● ● ● ● −1 0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● −2 Fitted coefficient βj Lasso fitted model for λ = 1.75: FDP = 26 55 = 47% ● ● ● ● 0 100 200 300 400 500 Index j Jan 21 2015 Controlling false discovery rate via knockoffs 8/36 Sparse regression Simulated data with n = 1500, p = 500. ●● 2 ● ● ● 1 ● ● ● ● ● −1 0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● −2 Fitted coefficient βj Lasso fitted model for λ = 1.75: FDP = 26 55 = 47% ● ● ● ● 0 100 200 300 400 500 Index j To estimate FDP, would need to calculate distribution of βjλ for null j (would need to know σ 2 , β ? , . . . ). (Donoho et al 2009) Jan 21 2015 Controlling false discovery rate via knockoffs 8/36 Construct knockoffs Main idea: ej . For each feature Xj , construct a knockoff version X The knockoffs serve as a “control group” ⇒ can estimate FDP. Setting: I Require n > p (ongoing work for high-dim. setting) I Don’t need to know σ 2 I Don’t need any information about β ? I Will get an exact, finite-sample guarantee for FDR Jan 21 2015 Controlling false discovery rate via knockoffs 9/36 Construct knockoffs Construction: I The knockoffs replicate the correlation structure of X: ej> X ek = Xj> Xk for all j, k X I Also preserve correlations between knockoffs & originals: ej> Xk = Xj> Xk for all j 6= k X Augmented design matrix e = (X1 X2 . . . Xp X e1 X e2 . . . X ep ) ∈ Rn×2p X X Jan 21 2015 Controlling false discovery rate via knockoffs 10/36 Construct knockoffs How? e = X · (Ip − 2ξΣ−1 ) + U · C, Define X where: Σ = X > X ξIp U = n × p orthonormal matrix orthogonal to X > C C = 4(ξIp − ξ 2 Σ−1 ) (Cholesky decomposition) =⇒ Jan 21 2015 e > X X e = X X Σ Σ − 2ξIp Σ − 2ξIp Σ Controlling false discovery rate via knockoffs 11/36 Construct knockoffs Why? For a null feature Xj , D e> ? e> e> Xj> y = Xj> Xβ ? + Xj> z = X j Xβ + Xj z = Xj y Jan 21 2015 Controlling false discovery rate via knockoffs 12/36 Construct knockoffs Why? For a null feature Xj , D e> ? e> e> Xj> y = Xj> Xβ ? + Xj> z = X j Xβ + Xj z = Xj y Jan 21 2015 Controlling false discovery rate via knockoffs 12/36 Construct knockoffs Lemma 1: Pairwise exchangeability property. For any N ⊂ H0 , h i e X X > swap(N) h i> D e y y = X X =⇒ the knockoffs are a “control group” for the nulls h Jan 21 2015 e X X i swap(N) = Controlling false discovery rate via knockoffs 13/36 Knockoff method Steps: 1. Construct knockoffs 2. Compute Lasso with augmented matrix: 1 2 e βλ = arg min y − X X · β + λ kβk 1 2 2 β∈R2p ej as a “control group” for Xj 3. Use X Jan 21 2015 Controlling false discovery rate via knockoffs 14/36 Knockoff method Fitted model for λ = 1.75 on the simulated dataset: ● Fitted coefficient βj ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 500 original features I Jan 21 2015 500 knockoff features Lasso selects 49 original features & 24 knockoff features Controlling false discovery rate via knockoffs 15/36 Knockoff method Fitted model for λ = 1.75 on the simulated dataset: ● Fitted coefficient βj ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 500 original features 500 knockoff features I Lasso selects 49 original features & 24 knockoff features I Pairwise exchangeability of the nulls =⇒ probably ≈ 24 false positives among the 49 original features Jan 21 2015 Controlling false discovery rate via knockoffs 15/36 Knockoff method Compute Lasso on the entire path λ ∈ [0, ∞). n o λj = sup λ : βjλ 6= 0 = first time Xj enters Lasso path n o ej = sup λ : βeλ 6= 0 = first time X ej enters Lasso path λ j Then define statistics ej } · sign(λj − λ ej ) Wj = max{λj , λ Jan 21 2015 Controlling false discovery rate via knockoffs 16/36 Knockoff method ● null signal + + ● ● + + ● − | {z ● + ● ● − } λ=0 variables enter late (probably not significant) Jan 21 2015 + ● + ● + |W| ● − − − | {z } λ→∞ variables enter early (likely significant) Controlling false discovery rate via knockoffs 17/36 Knockoff method ● λ when Xj enters 4 3 ● ● ● ● ● ● ● ● 2 1 0 ● ● ●● ● ● ● ● ● ● ●● ● ● ●● ● ●● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● ●● ●● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●● ● ● ●● ●●● ● ● ● ● ●● ● ● ● ●● ●● ●● ● ● ● ● ● ●● ● ●●●●● ●● ● ● ● ●● ● ● ● ● ●● ● ●●●● ●● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ●●● ● ● ●● ●●● ● ● ● ●●●● ● ● ● ● ● ● ●● ● ●● ●● ● ●●●●● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ●● ● ●● ● ● ● ● ●●●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ●●● ●● ● ● ● ● ● ● ●●●● ●● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ●●● ● ● ● ● ●● ●● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ●● ● ●●● ● ●● ● ● ● ●● 0 Jan 21 2015 Null Signal 1 ● 2 3 λ when Xj enters 4 Controlling false discovery rate via knockoffs 18/36 Knockoff method Lemma 2: Pairwise exchangeability of the nulls =⇒ D (W1 , W2 , . . . , Wp ) = (|W1 | · 1 , |W2 | · 2 , . . . , |Wp | · p ) iid where j = sign(Wj ) for non-nulls and j ∼ {±1} for nulls. ● null signal + + ● ● + + ● − Jan 21 2015 ● + ● ● − + ● − − ● + ● + |W| − Controlling false discovery rate via knockoffs 19/36 Knockoff method Selected variables: Control group: Jan 21 2015 Sλ = {j : Wj ≥ +λ} e Sλ = {j : Wj ≤ −λ} Controlling false discovery rate via knockoffs e d λ ) := Sλ FDP(S Sλ 20/36 Knockoff method ● 4 λ when Xj enters e d λ ) := Sλ FDP(S Sλ Sλ = {j : Wj ≥ +λ} e Sλ = {j : Wj ≤ −λ} Selected variables: Control group: 3 ● ● ● ● selected variables Sλ control group Sλ ● ● ● ● 2 1 0 Null Signal ● ● ●● ● ● ● ● ● ● ●● ● ● ●● ● ●● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● ●● ●● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ● ● ● ● ●●● ● ●● ●●● ● ● ● ●●● ● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ●● ● ●●●●● ●● ● ● ● ●● ● ● ● ● ●● ● ●●●● ●● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ●●● ● ● ●● ●●● ● ●● ●●●● ●● ●● ● ● ●● ● ●● ●● ● ●●●●● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ● ● ●● ●● ●● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ●● ●● ●●●●● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ●●●● ● ●●● ● ● ● ●● ● ● ● ●● ●● ● ● ● ● ● ● ●●●● ●● ● ● ●● ●● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●●● ● ●● ● ● ● ● ● ● ● ●● ● ● ●●● ● ●● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ●●● ● ●● ● ● ● ●● ● ● ●● 0 1 ● 2 3 λ when Xj enters 4 Sλ ∩ H0 e Sλ ∩ H0 d λ) FDP(Sλ ) = ≈ ≤ FDP(S Sλ Sλ Jan 21 2015 Controlling false discovery rate via knockoffs 20/36 Knockoff method The knockoff filter: define e d λ ) := Sλ = #{j : Wj ≤ −λ} , FDP(S Sλ #{j : Wj ≥ +λ} then choose n o d λ ) ≤ q (or λ = ∞ if empty set) Λ = min λ : FDP(S and select the variable set SΛ = {j : Wj ≥ Λ} . Jan 21 2015 Controlling false discovery rate via knockoffs 21/36 Knockoff method 1.0 ● 3 ● ● ● ● ● ● 2 0 Actual FDP Estimated FDP 0.8 ● ● ● 1 ● ●● ● ● ● ● ● ● ●● ● ● ●● ● ●● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● ●● ●● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ● ● ● ●●● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ●● ● ●●●●● ●● ● ● ● ●● ● ● ● ● ●● ● ●●●● ●● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ●●● ● ● ●● ●●● ●● ●●●● ●● ● ● ●● ● ●● ● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ●● ● ● ●● ●● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ●●●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ●● ● ●● ●● ● ● ● ● ●●● ● ● ● ●● ● ● ● ●● ●● ●● ● ● ● ● ● ● ●●●● ●● ● ● ●● ●● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ●●● ● ● ● ●● ● ●● ● ● ● ● ● ●●● ● ●● ● ● ●● ●● ● ● ●● 0 Jan 21 2015 Null Signal FDP λ when Xj enters 4 1 0.6 0.4 0.2 0.0 ● 2 3 λ when Xj enters 4 0 1 2 3 4 λ Controlling false discovery rate via knockoffs 22/36 Theoretical guarantees Theorem 1: For SΛ chosen by the knockoff filter, E [mFDP(SΛ )] ≤ q where the modified FDP is given by S ∩ H0 mFDP(S) = . S + q−1 Jan 21 2015 Controlling false discovery rate via knockoffs 23/36 Theoretical guarantees The knockoff+ filter: define e d + (Sλ ) := Sλ + 1 = #{j : Wj ≤ −λ} + 1 , FDP Sλ #{j : Wj ≥ +λ} then choose n o d + (Sλ ) ≤ q (or λ = ∞ if empty set) Λ+ = min λ : FDP and select the variable set SΛ+ = {j : Wj ≥ Λ+ } . Jan 21 2015 Controlling false discovery rate via knockoffs 24/36 Theoretical guarantees Theorem 2: For SΛ+ chosen by the knockoff+ filter, E FDP(SΛ+ ) ≤ q . Jan 21 2015 Controlling false discovery rate via knockoffs 25/36 Theoretical guarantees Theorem 2: For SΛ+ chosen by the knockoff+ filter, E FDP(SΛ+ ) ≤ q . Proof sketch: SΛ ∩ H0 SΛ ∩ H0 e SΛ+ ∩ H0 + 1 + + FDP(SΛ+ ) = · = SΛ SΛ e SΛ+ ∩ H0 + 1 + + {z } | | {z } d + (SΛ )≤q ≤FDP + Jan 21 2015 Controlling false discovery rate via knockoffs martingale 25/36 Theoretical guarantees Proof sketch cont’d: Sλ ∩ H0 M(λ) = is a supermartingale w.r.t. increasing λ, e Sλ ∩ H0 + 1 and Λ+ is a stopping time. + + ● ● + + ● − Jan 21 2015 ● + ● ● − + ● − − ● + ● + |W| − Controlling false discovery rate via knockoffs 26/36 Theoretical guarantees Proof sketch cont’d: Sλ ∩ H0 M(λ) = is a supermartingale w.r.t. increasing λ, e Sλ ∩ H0 + 1 and Λ+ is a stopping time. + + ● ● + + ● − Jan 21 2015 ● + ● ● − ● − ● ● |W| − Controlling false discovery rate via knockoffs 26/36 Theoretical guarantees Proof sketch cont’d: Sλ ∩ H0 M(λ) = is a supermartingale w.r.t. increasing λ, e Sλ ∩ H0 + 1 and Λ+ is a stopping time. + + ● ● + + ● − Jan 21 2015 ● + ● ● − ● − ● ● |W| − Controlling false discovery rate via knockoffs 26/36 Theoretical guarantees Proof sketch cont’d: Sλ ∩ H0 M(λ) = is a supermartingale w.r.t. increasing λ, e Sλ ∩ H0 + 1 and Λ+ is a stopping time. + + ● ● + + ● − Jan 21 2015 ● + ● ● − ● − ● ● |W| − Controlling false discovery rate via knockoffs 26/36 Theoretical guarantees Proof sketch cont’d: Sλ ∩ H0 M(λ) = is a supermartingale w.r.t. increasing λ, e Sλ ∩ H0 + 1 and Λ+ is a stopping time. + + ● ● + + ● − Jan 21 2015 ● + ● ● − ● − ● ● |W| − Controlling false discovery rate via knockoffs 26/36 Theoretical guarantees Proof sketch cont’d: Sλ ∩ H0 M(λ) = is a supermartingale w.r.t. increasing λ, e Sλ ∩ H0 + 1 and Λ+ is a stopping time. + + ● ● + + ● − ● + ● ● − ● − ● ● |W| − C E [M(Λ+ )] ≤ E [M(0)] = E ≤ 1, |H0 | − C + 1 for C = # of + coin flips ∼ Bin(|H0 |, 0.5) Jan 21 2015 Controlling false discovery rate via knockoffs 26/36 Simulations Setup: I n = 3000, p = 1000, sparsity level k I Features Xj are random unit vectors with correlation level ρ I For signals j, βj? ∼ {±A} for amplitude level A I y = Xβ + N(0, In ) iid Compare knockoff, knockoff+, & Benjamini-Hochberg (BH). Jan 21 2015 Controlling false discovery rate via knockoffs 27/36 Simulations I Fix amplitude A = 3.5 & sparsity level k = 30 I Vary feature correlation ρ from 0 to 0.9 (set E[Xj> Xk ] = ρ|j−k| ) 30 100 ● 25 Nominal level Knockoff Knockoff+ BHq ● Knockoff Knockoff+ BHq 80 15 Power (%) FDR (%) 20 ● ● ● ● ● 60 ● ● ● ● ● 40 10 ● ● ● ● 20 ● 5 ● ● ● ● 0 0.0 0.2 0.4 0.6 Feature correlation ρ Jan 21 2015 ● 0 0.8 0.0 0.2 0.4 0.6 0.8 Feature correlation ρ Controlling false discovery rate via knockoffs 28/36 HIV data Which mutations in the RT or protease of HIV-1 determine susceptibility to RT inhibitors or protease inhibitors? Data: Genotypic predictors of HIV type 1 drug resistance, Rhee et al (2006) Available at hivdb.stanford.edu (Stanford HIV Drug Resistance Database) I Each drug analysed separately I Response y = resistance to the drug I Features X = which mutations are present in the RT or in the protease Jan 21 2015 Controlling false discovery rate via knockoffs 29/36 HIV data The data set: Drug type # drugs Sample size PI NRTI NNRTI 6 6 3 848 639 747 Jan 21 2015 # protease or RT positions genotyped 99 240 240 Controlling false discovery rate via knockoffs # mutations appearing ≥ 3 times in sample 209 294 319 30/36 HIV data The data set: Drug type # drugs Sample size PI NRTI NNRTI 6 6 3 848 639 747 # protease or RT positions genotyped 99 240 240 # mutations appearing ≥ 3 times in sample 209 294 319 To validate results: I Jan 21 2015 Treatment-selected mutation (TSM) panel: A separate study identifies mutations frequently present in patients who have been treated with each type of drug Controlling false discovery rate via knockoffs 30/36 HIV data Results for PI type drugs # HIV−1 protease positions selected Resistance to APV 35 30 Appear in TSM list Not in TSM list Resistance to ATV 35 30 30 25 25 25 20 20 20 15 15 15 10 10 10 5 5 0 0 Knockoff BHq 5 0 Knockoff Data set size: n=768, p=201 BHq Knockoff Data set size: n=329, p=147 Resistance to LPV Resistance to NFV Resistance to SQV 35 35 30 30 30 25 25 25 20 20 20 15 15 15 10 10 10 5 5 0 0 BHq Data set size: n=516, p=184 BHq Data set size: n=826, p=208 35 Knockoff Jan 21 2015 Resistance to IDV 35 5 0 Knockoff BHq Data set size: n=843, p=209 Controlling false discovery rate via knockoffs Knockoff BHq Data set size: n=825, p=208 31/36 Resistance to X3TC # HIV−1 RT positions selected HIV data 30 Appear in TSM list Not in TSM list Resistance to ABC 30 25 25 25 20 20 20 15 15 15 10 10 10 5 5 0 0 Knockoff BHq Knockoff Data set size: n=628, p=294 Resistance to D4T Resistance to DDI Resistance to TDF 30 25 25 25 20 20 20 15 15 15 10 10 10 5 5 0 0 BHq 30 Appear in TSM list Not in TSM list 5 0 Knockoff BHq Knockoff Data set size: n=632, p=292 Resistance to DLV 35 Resistance to EFV Resistance to NVP 35 35 30 30 25 25 20 20 20 15 15 15 10 10 10 5 5 0 0 BHq Data set size: n=732, p=311 BHq Data set size: n=353, p=218 25 Knockoff BHq Data set size: n=630, p=292 30 Data set size: n=630, p=293 # HIV−1 RT positions selected 0 BHq 30 Knockoff Jan 21 2015 5 Knockoff Data set size: n=633, p=292 Results for NRTI type drugs Results for NNRTI type drugs Resistance to AZT 30 5 0 Knockoff BHq Data set size: n=734, p=318 Controlling false discovery rate via knockoffs Knockoff BHq Data set size: n=746, p=319 32/36 Can knockoffs be replaced by permutations? Let X π = X with rows randomly permuted. Then Σ 0 π > π X X X X ≈ 0 Σ Jan 21 2015 Controlling false discovery rate via knockoffs 33/36 Can knockoffs be replaced by permutations? λj Value of λ when variable enters model Let X π = X with rows randomly permuted. Then Σ 0 π > π X X X X ≈ 0 Σ ● 25 ● 20 Original non−null features Original null features Permuted features ● 10 ●● ● 5 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●●●●●●●●●●●● ●● ● ●●● ● ●●● ● ●● ●●●●●●●●●● ● ●●●● ●● ●● ● 0 |0 {z } | signals 50 {z }100| 150 Index of column in the augmented matrix [X Xπ] nulls Knockoff method Permutation method Jan 21 2015 ● ● 15 200 {z permuted features } FDR (target level q = 20%) 12.29% 45.61% Controlling false discovery rate via knockoffs 33/36 Summary The knockoff filter for inference in a sparse linear model: • Creates a “control group” for any type of statistic • Handles any type of feature correlation • Unknown noise level & sparsity level • Finite-sample FDR guarantees Jan 21 2015 Controlling false discovery rate via knockoffs 34/36 Summary Future work: 1. How to move to high-dimensional setting? 2. Extend to GLMs or other regression models? 3. Similar principles for other problems, e.g. graphical models? Jan 21 2015 Controlling false discovery rate via knockoffs 35/36 Summary Thank you! I I Code & demos available at http://web.stanford.edu/~candes/Knockoffs/ Paper available at http://arxiv.org/abs/1404.5609 I Joint work with Emmanuel Candès @ Stanford I R. F. B. was partially supported by NSF award DMS-1203762. E. C. is partially supported by AFOSR under grant FA9550-09-1-0643, by NSF via grant CCF-0963835 and by the Math + X Award from the Simons Foundation. Jan 21 2015 Controlling false discovery rate via knockoffs 36/36