Download Slides.

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Data assimilation wikipedia , lookup

Time series wikipedia , lookup

Transcript
Controlling false discovery rate
via knockoffs
Rina Foygel Barber & Emmanuel Candès
Jan 21 2015
Code & demos available at http://web.stanford.edu/~candes/Knockoffs/
Paper available at http://arxiv.org/abs/1404.5609
Jan 21 2015
Controlling false discovery rate via knockoffs
1/36
Setting
An example:
Which mutations in the reverse transcriptase (RT) of HIV-1
determine susceptibility to reverse transcriptase inhibitors (RTIs)?
I
yi ∈ R = resistance of virus in sample i to a RTI-type drug
I
Xij ∈ {0, 1} indicates if mutation j is present in virus sample i
How can we select mutations that determine drug resistance,
in such a way that our answer will replicate in further trials?
Jan 21 2015
Controlling false discovery rate via knockoffs
2/36
Setting
Sparse linear model:
iid
y = X · β + z, where zi ∼ N (0, σ 2 )
I
n observations, p features
I
β is sparse
Jan 21 2015
Controlling false discovery rate via knockoffs
3/36
Setting
Goal: select a set of features Xj
that are likely to be relevant to the response y,
without too many false positives.
One way to measure performance:
# false positives
|S ∩ H0 |
FDR = E
=E
.
total # of features selected
|S|
↑
False discovery rate
|
{z
False discovery proportion
}
S = set of selected features
H0 = “null hypotheses” = {j : βj? = 0}
Jan 21 2015
Controlling false discovery rate via knockoffs
4/36
Sparse regression
Lasso: βλ = arg min
β∈Rp
1
ky − X · βk22 + λ kβk1
2
Asymptotically, Lasso will select the correct model (at a good λ).
In practice for a finite sample,
I
True positives & false positives intermixed along the Lasso path
I
How to pick λ to balance FDR vs power?
I
Need to account for correlations between Xj & weak signals that
may have been missed on the Lasso path.
Jan 21 2015
Controlling false discovery rate via knockoffs
5/36
Sparse regression
Simulated data with n = 1500, p = 500.
●●
2
●
●
●
1
●
●
●
●
●
−1
0
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
−2
Fitted coefficient βj
Lasso fitted model for λ = 1.75:
●
●
●
●
0
100
200
300
400
500
Index j
Jan 21 2015
Controlling false discovery rate via knockoffs
6/36
Sparse regression
Simulated data with n = 1500, p = 500.
●●
2
●
●
●
1
●
●
●
●
●
−1
0
●
●
●
●
●
● ●
●
o
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
−2
Fitted coefficient βj
Lasso fitted model for λ = 1.75:
●
●
o
●
●
0
100
200
False positive?
300
400
True positive?
500
Index j
Jan 21 2015
Controlling false discovery rate via knockoffs
7/36
Sparse regression
Simulated data with n = 1500, p = 500.
●●
2
●
●
●
1
●
●
●
●
●
−1
0
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
−2
Fitted coefficient βj
Lasso fitted model for λ = 1.75:
FDP =
26
55
= 47%
●
●
●
●
0
100
200
300
400
500
Index j
Jan 21 2015
Controlling false discovery rate via knockoffs
8/36
Sparse regression
Simulated data with n = 1500, p = 500.
●●
2
●
●
●
1
●
●
●
●
●
−1
0
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
−2
Fitted coefficient βj
Lasso fitted model for λ = 1.75:
FDP =
26
55
= 47%
●
●
●
●
0
100
200
300
400
500
Index j
To estimate FDP, would need to calculate distribution of βjλ for null j
(would need to know σ 2 , β ? , . . . ). (Donoho et al 2009)
Jan 21 2015
Controlling false discovery rate via knockoffs
8/36
Construct knockoffs
Main idea:
ej .
For each feature Xj , construct a knockoff version X
The knockoffs serve as a “control group” ⇒ can estimate FDP.
Setting:
I
Require n > p (ongoing work for high-dim. setting)
I
Don’t need to know σ 2
I
Don’t need any information about β ?
I
Will get an exact, finite-sample guarantee for FDR
Jan 21 2015
Controlling false discovery rate via knockoffs
9/36
Construct knockoffs
Construction:
I
The knockoffs replicate the correlation structure of X:
ej> X
ek = Xj> Xk for all j, k
X
I
Also preserve correlations between knockoffs & originals:
ej> Xk = Xj> Xk for all j 6= k
X
Augmented design matrix
e = (X1 X2 . . . Xp X
e1 X
e2 . . . X
ep ) ∈ Rn×2p
X X
Jan 21 2015
Controlling false discovery rate via knockoffs
10/36
Construct knockoffs
How?
e = X · (Ip − 2ξΣ−1 ) + U · C,
Define X
where:
Σ = X > X ξIp
U = n × p orthonormal matrix orthogonal to X
>
C C = 4(ξIp − ξ 2 Σ−1 ) (Cholesky decomposition)
=⇒
Jan 21 2015
e > X X
e =
X X
Σ
Σ − 2ξIp
Σ − 2ξIp
Σ
Controlling false discovery rate via knockoffs
11/36
Construct knockoffs
Why?
For a null feature Xj ,
D e>
?
e>
e>
Xj> y = Xj> Xβ ? + Xj> z = X
j Xβ + Xj z = Xj y
Jan 21 2015
Controlling false discovery rate via knockoffs
12/36
Construct knockoffs
Why?
For a null feature Xj ,
D e>
?
e>
e>
Xj> y = Xj> Xβ ? + Xj> z = X
j Xβ + Xj z = Xj y
Jan 21 2015
Controlling false discovery rate via knockoffs
12/36
Construct knockoffs
Lemma 1: Pairwise exchangeability property.
For any N ⊂ H0 ,
h
i
e
X X
>
swap(N)
h
i>
D
e y
y = X X
=⇒ the knockoffs are a “control group” for the nulls
h
Jan 21 2015
e
X X
i
swap(N)
=
Controlling false discovery rate via knockoffs
13/36
Knockoff method
Steps:
1. Construct knockoffs
2. Compute Lasso with augmented matrix:
1
2
e
βλ = arg min
y
−
X
X
·
β
+
λ
kβk
1
2
2
β∈R2p
ej as a “control group” for Xj
3. Use X
Jan 21 2015
Controlling false discovery rate via knockoffs
14/36
Knockoff method
Fitted model for λ = 1.75 on the simulated dataset:
●
Fitted coefficient βj
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
500 original features
I
Jan 21 2015
500 knockoff features
Lasso selects 49 original features & 24 knockoff features
Controlling false discovery rate via knockoffs
15/36
Knockoff method
Fitted model for λ = 1.75 on the simulated dataset:
●
Fitted coefficient βj
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
500 original features
500 knockoff features
I
Lasso selects 49 original features & 24 knockoff features
I
Pairwise exchangeability of the nulls
=⇒ probably ≈ 24 false positives among the 49 original features
Jan 21 2015
Controlling false discovery rate via knockoffs
15/36
Knockoff method
Compute Lasso on the entire path λ ∈ [0, ∞).
n
o
λj = sup λ : βjλ 6= 0 = first time Xj enters Lasso path
n
o
ej = sup λ : βeλ 6= 0 = first time X
ej enters Lasso path
λ
j
Then define statistics
ej } · sign(λj − λ
ej )
Wj = max{λj , λ
Jan 21 2015
Controlling false discovery rate via knockoffs
16/36
Knockoff method
●
null
signal
+ +
●
●
+ +
●
−
|
{z
●
+
● ●
−
}
λ=0
variables enter late
(probably not significant)
Jan 21 2015
+
●
+
●
+
|W|
●
− −
−
|
{z
}
λ→∞
variables enter early
(likely significant)
Controlling false discovery rate via knockoffs
17/36
Knockoff method
●
λ when Xj enters
4
3
●
●
●
●
●
●
●
●
2
1
0
●
●
●●
●
●
●
●
● ●
●●
●
●
●●
● ●●
●
●
●
●
●● ● ● ● ● ●
●●
●
●
●●
●
●
●●
●
●
● ●
●
●●
● ●
●●
●●
●
●
●
●●
●
●● ●
●
●
●
●
● ● ●
● ●●
●
●
●
●
●
●
●
●
●
●
● ● ●●
●
●●●
● ●
●●
●●● ● ●
●
●
●● ●
●
●
●●
●● ●●
●
●
● ●
●
●●
●
●●●●● ●● ● ●
●
●●
●
●
●
● ●● ● ●●●●
●● ●
●
●
●●●
● ● ● ●
● ●
● ●
●
● ●
● ●●● ● ●
●
● ● ● ●●●
●
●
●●
●●●
●
●
●
●●●●
●
●
●
●
● ●
●●
● ●●
●●
● ●●●●● ●● ● ●
●
●
●
● ● ●
●
●● ● ● ●●●
●
●
●
●
●
●
●
●● ● ●
●
●●
●
●●
● ●●
●
●
●
●
●●●●
● ●●
●
● ●
● ●
●
●
●
●
● ●
●●
●●
●
●
●●●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
● ●
● ●
●●●
●●
●
●
●
●
●
●
●●●● ●● ● ●
●●
●●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●● ●●●
●
●
●
●
●● ●● ●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●● ● ●
●
●
●
● ● ● ●●
●
●●●
● ●●
●
●
●
●●
0
Jan 21 2015
Null
Signal
1
●
2
3
λ when Xj enters
4
Controlling false discovery rate via knockoffs
18/36
Knockoff method
Lemma 2: Pairwise exchangeability of the nulls =⇒
D
(W1 , W2 , . . . , Wp ) = (|W1 | · 1 , |W2 | · 2 , . . . , |Wp | · p )
iid
where j = sign(Wj ) for non-nulls and j ∼ {±1} for nulls.
●
null
signal
+ +
●
●
+ +
●
−
Jan 21 2015
●
+
● ●
−
+
●
− −
●
+
●
+
|W|
−
Controlling false discovery rate via knockoffs
19/36
Knockoff method
Selected variables:
Control group:
Jan 21 2015
Sλ = {j : Wj ≥ +λ}
e
Sλ = {j : Wj ≤ −λ}
Controlling false discovery rate via knockoffs
e d λ ) := Sλ FDP(S
Sλ 20/36
Knockoff method
●
4
λ when Xj enters
e d λ ) := Sλ FDP(S
Sλ Sλ = {j : Wj ≥ +λ}
e
Sλ = {j : Wj ≤ −λ}
Selected variables:
Control group:
3
●
●
●
●
selected variables Sλ
control group Sλ
●
●
●
●
2
1
0
Null
Signal
●
●
●●
●
●
●
●
● ●
●●
●
●
●●
● ●●
●
●
●
●
●● ● ● ● ● ●
●●
●
●
●●
●
●
●●
●
●
● ●
●
●●
● ●
●●
●●
●
●
●
●●
●
●
●●
●
●
●
●
● ● ●
● ●●
●●
●● ●
●
●
●
● ●
●
●
●
●●●
●
●●
●●● ● ● ●
●●● ● ●
●
●
●
●
●●
●● ●●
●
●
● ●
●
●●
●
●●●●● ●● ● ●
●
●●
●
●
●
● ●● ● ●●●●
●● ●
●
●
●●●
● ● ● ●
● ●
● ●
●
● ●
● ●●● ● ●
●
● ● ● ●●●
●
●
●●
●●●
●
●●
●●●●
●●
●● ● ●
●●
● ●●
●●
● ●●●●● ●● ● ●
●
●
●
● ● ●
●
●● ● ● ●●●
●
●
●
● ●●
●● ●● ● ● ●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●●
●
●
●
● ●●
●●
●●
●●●●●
●
●
●
●
●
●
● ●
●●
●
●
●●
●
●
● ●
●
●●●●
●
●●● ● ● ●
●●
●
● ●
●●
●●
●
●
●
●
●
●
●●●● ●● ● ●
●●
●●
●
●● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●●● ●
●●
●
●
●
●
●
●
●
●●
● ● ●●● ● ●● ●
●
●
●
●
●
●●
●● ● ●
●
●
●
●
●
●●●
● ●●
●
●
●
●●
● ● ●●
0
1
●
2
3
λ when Xj enters
4
Sλ ∩ H0 e
Sλ ∩ H0 d λ)
FDP(Sλ ) =
≈
≤ FDP(S
Sλ Sλ Jan 21 2015
Controlling false discovery rate via knockoffs
20/36
Knockoff method
The knockoff filter: define
e d λ ) := Sλ = #{j : Wj ≤ −λ} ,
FDP(S
Sλ #{j : Wj ≥ +λ}
then choose
n
o
d λ ) ≤ q (or λ = ∞ if empty set)
Λ = min λ : FDP(S
and select the variable set
SΛ = {j : Wj ≥ Λ} .
Jan 21 2015
Controlling false discovery rate via knockoffs
21/36
Knockoff method
1.0
●
3
●
●
●
●
●
●
2
0
Actual FDP
Estimated FDP
0.8
●
●
●
1
●
●●
●
●
●
●
● ●
●●
●
●
●●
● ●●
●
●
●
●
●● ● ● ● ● ●
●●
●
●
●●
●
●
●●
●
●
● ●
●
●●
● ●
●●
●●
●
●
●
●●
●
●● ●
●
●
●
●
● ● ●
● ●●
●●
●● ●
●
●
●
● ●
●
●
●●●
● ●●
●
● ●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●● ●
●
●
● ●
●●
●
●●●●● ●● ● ●
●
●●
●
●
●
● ●● ● ●●●●
●● ●
●
●
●●●
● ● ● ●
● ●
● ●
●
● ●
● ●●● ● ●
●
● ● ● ●●●
●
●
●●
●●●
●●
●●●●
●● ● ●
●●
●
●●
●
●
●●
●
●
●
●
● ●●●
● ● ●
●
●
● ● ●
●
●● ● ● ●●●
●
●●
●
● ●●
●● ●● ● ● ●●
●
●
●
●
●
●
●
●
●
●●●●
● ●●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●● ●
●
●
●
● ● ●●
● ●
●
●●
●
●●
●●
●
● ●
●
●●● ● ● ●
●●
●
● ●
●●
●●
●●
●
●
●
●
●
●
●●●● ●● ● ●
●●
●●
●
●● ●
●
●
●
●
●
●
●●
●
●
●
●● ●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●●
● ●
●
●● ● ●●
●
●
●
●
●
●●●
● ●●
●
●
●●
●●
● ● ●●
0
Jan 21 2015
Null
Signal
FDP
λ when Xj enters
4
1
0.6
0.4
0.2
0.0
●
2
3
λ when Xj enters
4
0
1
2
3
4
λ
Controlling false discovery rate via knockoffs
22/36
Theoretical guarantees
Theorem 1: For SΛ chosen by the knockoff filter,
E [mFDP(SΛ )] ≤ q
where the modified FDP is given by
S ∩ H0 mFDP(S) = .
S + q−1
Jan 21 2015
Controlling false discovery rate via knockoffs
23/36
Theoretical guarantees
The knockoff+ filter: define
e d + (Sλ ) := Sλ + 1 = #{j : Wj ≤ −λ} + 1 ,
FDP
Sλ #{j : Wj ≥ +λ}
then choose
n
o
d + (Sλ ) ≤ q (or λ = ∞ if empty set)
Λ+ = min λ : FDP
and select the variable set
SΛ+ = {j : Wj ≥ Λ+ } .
Jan 21 2015
Controlling false discovery rate via knockoffs
24/36
Theoretical guarantees
Theorem 2: For SΛ+ chosen by the knockoff+ filter,
E FDP(SΛ+ ) ≤ q .
Jan 21 2015
Controlling false discovery rate via knockoffs
25/36
Theoretical guarantees
Theorem 2: For SΛ+ chosen by the knockoff+ filter,
E FDP(SΛ+ ) ≤ q .
Proof sketch:
SΛ ∩ H0 SΛ ∩ H0 e
SΛ+ ∩ H0 + 1
+
+
FDP(SΛ+ ) =
·
=
SΛ SΛ e
SΛ+ ∩ H0 + 1
+
+
{z
} |
|
{z
}
d + (SΛ )≤q
≤FDP
+
Jan 21 2015
Controlling false discovery rate via knockoffs
martingale
25/36
Theoretical guarantees
Proof sketch cont’d:
Sλ ∩ H0 M(λ) = is a supermartingale w.r.t. increasing λ,
e
Sλ ∩ H0 + 1
and Λ+ is a stopping time.
+ +
●
●
+ +
●
−
Jan 21 2015
●
+
● ●
−
+
●
− −
●
+
●
+
|W|
−
Controlling false discovery rate via knockoffs
26/36
Theoretical guarantees
Proof sketch cont’d:
Sλ ∩ H0 M(λ) = is a supermartingale w.r.t. increasing λ,
e
Sλ ∩ H0 + 1
and Λ+ is a stopping time.
+ +
●
●
+ +
●
−
Jan 21 2015
●
+
● ●
−
●
−
●
●
|W|
−
Controlling false discovery rate via knockoffs
26/36
Theoretical guarantees
Proof sketch cont’d:
Sλ ∩ H0 M(λ) = is a supermartingale w.r.t. increasing λ,
e
Sλ ∩ H0 + 1
and Λ+ is a stopping time.
+ +
●
●
+ +
●
−
Jan 21 2015
●
+
● ●
−
●
−
●
●
|W|
−
Controlling false discovery rate via knockoffs
26/36
Theoretical guarantees
Proof sketch cont’d:
Sλ ∩ H0 M(λ) = is a supermartingale w.r.t. increasing λ,
e
Sλ ∩ H0 + 1
and Λ+ is a stopping time.
+ +
●
●
+ +
●
−
Jan 21 2015
●
+
● ●
−
●
−
●
●
|W|
−
Controlling false discovery rate via knockoffs
26/36
Theoretical guarantees
Proof sketch cont’d:
Sλ ∩ H0 M(λ) = is a supermartingale w.r.t. increasing λ,
e
Sλ ∩ H0 + 1
and Λ+ is a stopping time.
+ +
●
●
+ +
●
−
Jan 21 2015
●
+
● ●
−
●
−
●
●
|W|
−
Controlling false discovery rate via knockoffs
26/36
Theoretical guarantees
Proof sketch cont’d:
Sλ ∩ H0 M(λ) = is a supermartingale w.r.t. increasing λ,
e
Sλ ∩ H0 + 1
and Λ+ is a stopping time.
+ +
●
●
+ +
●
−
●
+
● ●
−
●
−
●
●
|W|
−
C
E [M(Λ+ )] ≤ E [M(0)] = E
≤ 1,
|H0 | − C + 1
for C = # of + coin flips ∼ Bin(|H0 |, 0.5)
Jan 21 2015
Controlling false discovery rate via knockoffs
26/36
Simulations
Setup:
I
n = 3000, p = 1000, sparsity level k
I
Features Xj are random unit vectors with correlation level ρ
I
For signals j, βj? ∼ {±A} for amplitude level A
I
y = Xβ + N(0, In )
iid
Compare knockoff, knockoff+, & Benjamini-Hochberg (BH).
Jan 21 2015
Controlling false discovery rate via knockoffs
27/36
Simulations
I
Fix amplitude A = 3.5 & sparsity level k = 30
I
Vary feature correlation ρ from 0 to 0.9 (set E[Xj> Xk ] = ρ|j−k| )
30
100
●
25
Nominal level
Knockoff
Knockoff+
BHq
●
Knockoff
Knockoff+
BHq
80
15
Power (%)
FDR (%)
20
●
●
●
●
●
60
●
●
●
●
●
40
10
●
●
●
●
20
●
5
●
●
●
●
0
0.0
0.2
0.4
0.6
Feature correlation ρ
Jan 21 2015
●
0
0.8
0.0
0.2
0.4
0.6
0.8
Feature correlation ρ
Controlling false discovery rate via knockoffs
28/36
HIV data
Which mutations in the RT or protease of HIV-1
determine susceptibility to RT inhibitors or protease inhibitors?
Data:
Genotypic predictors of HIV type 1 drug resistance, Rhee et al (2006)
Available at hivdb.stanford.edu (Stanford HIV Drug Resistance Database)
I
Each drug analysed separately
I
Response y = resistance to the drug
I
Features X = which mutations are present in the RT or in the
protease
Jan 21 2015
Controlling false discovery rate via knockoffs
29/36
HIV data
The data set:
Drug type
# drugs
Sample size
PI
NRTI
NNRTI
6
6
3
848
639
747
Jan 21 2015
# protease or
RT positions
genotyped
99
240
240
Controlling false discovery rate via knockoffs
# mutations
appearing ≥ 3 times
in sample
209
294
319
30/36
HIV data
The data set:
Drug type
# drugs
Sample size
PI
NRTI
NNRTI
6
6
3
848
639
747
# protease or
RT positions
genotyped
99
240
240
# mutations
appearing ≥ 3 times
in sample
209
294
319
To validate results:
I
Jan 21 2015
Treatment-selected mutation (TSM) panel:
A separate study identifies mutations frequently present in
patients who have been treated with each type of drug
Controlling false discovery rate via knockoffs
30/36
HIV data
Results for
PI type drugs
# HIV−1 protease positions selected
Resistance to APV
35
30
Appear in TSM list
Not in TSM list
Resistance to ATV
35
30
30
25
25
25
20
20
20
15
15
15
10
10
10
5
5
0
0
Knockoff
BHq
5
0
Knockoff
Data set size: n=768, p=201
BHq
Knockoff
Data set size: n=329, p=147
Resistance to LPV
Resistance to NFV
Resistance to SQV
35
35
30
30
30
25
25
25
20
20
20
15
15
15
10
10
10
5
5
0
0
BHq
Data set size: n=516, p=184
BHq
Data set size: n=826, p=208
35
Knockoff
Jan 21 2015
Resistance to IDV
35
5
0
Knockoff
BHq
Data set size: n=843, p=209
Controlling false discovery rate via knockoffs
Knockoff
BHq
Data set size: n=825, p=208
31/36
Resistance to X3TC
# HIV−1 RT positions selected
HIV data
30
Appear in TSM list
Not in TSM list
Resistance to ABC
30
25
25
25
20
20
20
15
15
15
10
10
10
5
5
0
0
Knockoff
BHq
Knockoff
Data set size: n=628, p=294
Resistance to D4T
Resistance to DDI
Resistance to TDF
30
25
25
25
20
20
20
15
15
15
10
10
10
5
5
0
0
BHq
30
Appear in TSM list
Not in TSM list
5
0
Knockoff
BHq
Knockoff
Data set size: n=632, p=292
Resistance to DLV
35
Resistance to EFV
Resistance to NVP
35
35
30
30
25
25
20
20
20
15
15
15
10
10
10
5
5
0
0
BHq
Data set size: n=732, p=311
BHq
Data set size: n=353, p=218
25
Knockoff
BHq
Data set size: n=630, p=292
30
Data set size: n=630, p=293
# HIV−1 RT positions selected
0
BHq
30
Knockoff
Jan 21 2015
5
Knockoff
Data set size: n=633, p=292
Results for
NRTI type drugs
Results for
NNRTI type drugs
Resistance to AZT
30
5
0
Knockoff
BHq
Data set size: n=734, p=318
Controlling false discovery rate via knockoffs
Knockoff
BHq
Data set size: n=746, p=319
32/36
Can knockoffs be replaced by permutations?
Let X π = X with rows randomly permuted. Then
Σ 0
π >
π
X X
X X ≈
0 Σ
Jan 21 2015
Controlling false discovery rate via knockoffs
33/36
Can knockoffs be replaced by permutations?
λj
Value of λ when variable enters model
Let X π = X with rows randomly permuted. Then
Σ 0
π >
π
X X
X X ≈
0 Σ
●
25
●
20
Original non−null features
Original null features
Permuted features
●
10
●●
●
5
●
●
●
●
● ●
● ●
●
●
●
● ●
●
●
● ●● ●●●●●●●●●●●● ●● ● ●●● ● ●●● ● ●● ●●●●●●●●●● ● ●●●● ●● ●● ●
0
|0 {z } |
signals
50
{z
}100|
150
Index of column in the augmented matrix [X Xπ]
nulls
Knockoff method
Permutation method
Jan 21 2015
●
●
15
200
{z
permuted features
}
FDR (target level q = 20%)
12.29%
45.61%
Controlling false discovery rate via knockoffs
33/36
Summary
The knockoff filter for inference in a sparse linear model:
• Creates a “control group” for any type of statistic
• Handles any type of feature correlation
• Unknown noise level & sparsity level
• Finite-sample FDR guarantees
Jan 21 2015
Controlling false discovery rate via knockoffs
34/36
Summary
Future work:
1. How to move to high-dimensional setting?
2. Extend to GLMs or other regression models?
3. Similar principles for other problems, e.g. graphical models?
Jan 21 2015
Controlling false discovery rate via knockoffs
35/36
Summary
Thank you!
I
I
Code & demos available at
http://web.stanford.edu/~candes/Knockoffs/
Paper available at http://arxiv.org/abs/1404.5609
I Joint work with Emmanuel Candès @ Stanford
I R. F. B. was partially supported by NSF award DMS-1203762. E. C. is partially
supported by AFOSR under grant FA9550-09-1-0643, by NSF via grant CCF-0963835
and by the Math + X Award from the Simons Foundation.
Jan 21 2015
Controlling false discovery rate via knockoffs
36/36