q-value
Tiffany Chao
Beth Johnson
Steven Lee
Hypothesis testing
● Test for each gene
○ null hypothesis: no differential expression
● Two kinds of errors
○ type I error (false positive)
say that a gene is differentially expressed when it
actually isn't; wrongly reject a true null hypothesis
○ type II error (false negative)
say that a gene isn't differentially expressed when it
actually is; fail to reject a false null hypothesis
Thinking about p-values
● Probability of obtaining a test statistic at
least as extreme as the one that was actually
observed, assuming the null hypothesis is
true
● Minimum false positive rate at which an
observed statistic can be called significant
● If the null hypothesis is simple, then a null
p-value is uniformly distributed
Multiple comparison problem
● Even if we have useful approximations for
our p-values, we still face the multiple
comparison problem
● When performing many independent tests,
p-values no longer have the same
interpretation
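A minimal sketch of why the interpretation changes, assuming m independent tests in which every null is true and each test is run at level alpha (the function name is illustrative, not from the slides):

```python
def prob_at_least_one_false_positive(m, alpha=0.05):
    """Pr(F >= 1) for m independent tests of true nulls, each at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


# The chance of at least one false positive grows quickly with m.
for m in (1, 10, 100):
    print(m, round(prob_at_least_one_false_positive(m), 3))
```

With a single test the chance of a false positive is alpha; with many tests at the same per-test level it approaches 1.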
Not only in genomics!
● "Statistical Comparisons of Classifiers over
Multiple Datasets", Demsar, JMLR 2006
● "Permutation Tests for Studying Classifier
Performance", Ojala, JMLR 2010
● "On Comparing Classifiers: Pitfalls to avoid
and a recommended approach", Salzberg,
1997, Data Mining and Knowledge Discovery
Multiple hypothesis testing
                   Called        Called not
                   significant   significant   Total
Null true          F             m0 – F        m0
Alternative true   T             m1 – T        m1
Total              S             m – S         m
Suppose we care about p-values ≤ 0.05?
Error rates (more on this later)
● Per comparison error rate (PCER)
○ E[F] / m
● Per family error rate (PFER)
○ E[F]
● Family-wise error rate (FWER)
○ Pr(F ≥ 1)
● False discovery rate (FDR)*
○ E[F/S] (and set F/S = 0 when S = 0)
= E[F/S | S > 0] Pr(S > 0)
● Positive false discovery rate (pFDR)*
○ E[F/S | S > 0]
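The definitions above can be sketched as empirical averages over repeated experiments. This assumes we somehow know F and S for each repetition (only possible in simulation); the function name and the (F, S) input format are illustrative.

```python
def error_rates(outcomes, m):
    """Empirical error rates from repeated experiments.

    outcomes: list of (F, S) pairs, one per experiment with m tests each,
              where F = false positives and S = total called significant.
    """
    n = len(outcomes)
    mean_F = sum(F for F, _ in outcomes) / n
    # F/S is set to 0 when S = 0 (FDR convention)
    fdp = [F / S if S > 0 else 0.0 for F, S in outcomes]
    # pFDR conditions on S > 0, so experiments with S = 0 are dropped
    positive = [F / S for F, S in outcomes if S > 0]
    return {
        "PCER": mean_F / m,
        "PFER": mean_F,
        "FWER": sum(1 for F, _ in outcomes if F >= 1) / n,
        "FDR": sum(fdp) / n,
        "pFDR": sum(positive) / len(positive) if positive else float("nan"),
    }


rates = error_rates([(0, 0), (1, 4), (2, 10), (0, 5)], m=100)
print(rates)
```

In this toy input, FDR equals pFDR times the fraction of experiments with S > 0, matching the identity on the slide.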
MHT error controlling procedure
● Suppose you test m hypotheses and get m
p-values: p1, p2, p3, ..., pm
● A multiple hypothesis test error controlling
procedure is a function T(p; α) such that
rejecting all nulls with pi ≤ T(p; α) implies that
Error ≤ α
● Error is a population quantity (not random)
Weak and strong control
● Weak: T(p; α) is such that Error ≤ α only when
m0 = m
● Strong: T(p; α) is such that Error ≤ α for any
value of m0
○ note that m0 is not an argument for T(p; α)!
Bonferroni correction
● reject all nulls with pi ≤ α/m
● provides strong control of the FWER:
Pr(F ≥ 1) ≤ α
● but too restrictive
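A minimal sketch of the Bonferroni rule, which rejects every null whose p-value is at most alpha / m (the function name is illustrative):

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Return one boolean per p-value: True if the null is rejected."""
    m = len(pvalues)
    threshold = alpha / m  # same threshold for every test
    return [p <= threshold for p in pvalues]


print(bonferroni_reject([0.001, 0.02, 0.04, 0.5]))
```

With m in the thousands the threshold becomes tiny, which is why the rule is too restrictive for genomics-scale testing.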
Why FDR and q-value?
● To help us interpret these values, two pieces of
information would be useful
○ Estimate of the overall proportion of features that are
truly alternative (even if they cannot be precisely
identified)
○ Measure of significance that can be associated with
each feature so that thresholding the numbers at a
particular value has an easy interpretation
FDR
● Would like an error measure that provides a
balance between
○ Number of false positive features (F)
○ Number of true positive features (T)
FDR
● The false discovery rate is the expected value of
the proportion of false positive features among
all those called significant: FDR = E[F/S]
*There is some possibility that S = 0, so an adjustment has to be made to the definition of FDR (set F/S = 0 when S = 0)
Estimating FDR
● Features are called significant when p ≤ t, so
FDR(t) = E[F(t)/S(t)]
● Therefore, the FDR depends on what threshold
(t) we are using to determine significance
Estimating FDR
● Because we are considering many features (m is
very large), we can approximate
FDR(t) = E[F(t)/S(t)] ≈ E[F(t)] / E[S(t)]
Estimating FDR
● We now need to approximate E[S(t)] and E[F(t)]
● To illustrate how FDR is determined, for m genes
we have m p-values, denoted p1, p2, …, pm
● Define F(t) = #{null pi ≤ t} and S(t) = #{pi ≤ t}
○ S(t) can be counted directly for a given t
Estimating FDR
● Approximating F(t) is more difficult because we
do not know how many of the values called
significant were truly null
● Assuming null p-values are uniformly distributed,
Pr(null p ≤ t) = t
● So E[F(t)] = m0 · t
(# of null features × probability that a null feature is called significant)
Estimating FDR
● We do not know the true value of m0 (# of null
features), so we must estimate it
● Equivalently, we can estimate the proportion of
features that are truly null (denoted by π0 = m0/m)
● Assuming a uniform distribution for null
p-values, we can estimate this quantity using a
histogram
Estimating π0
● Find where the p-values look like a uniform
distribution and set λ
Estimating π0
● π0(λ) = #{pi > λ} / (m(1 – λ))
● Note π0 does not depend on t
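A minimal sketch of the histogram-style estimate, assuming a fixed λ and the estimator #{p > λ} / (m(1 − λ)); the cubic fit over many λ values is not shown, and the function name and default λ = 0.5 are illustrative choices.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Estimate the proportion of truly null features.

    Above lam the p-values are assumed to come (almost) only from nulls,
    which are uniform, so the count above lam is about pi0 * m * (1 - lam).
    """
    m = len(pvalues)
    count_above = sum(1 for p in pvalues if p > lam)
    # Cap at 1 because a proportion cannot exceed 1
    return min(1.0, count_above / (m * (1.0 - lam)))


pvals = [0.001, 0.002, 0.003, 0.004, 0.2, 0.4, 0.6, 0.8]
print(estimate_pi0(pvals, lam=0.5))
```

Choosing λ trades bias against variance: a larger λ uses fewer p-values but is less contaminated by truly alternative features.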
Estimating π0
● Can also fit a cubic function to the π0 vs λ
data to determine π0(1) (because "most" of the
p-values near 1 would be expected to be null)
FDR
● Estimate for False Discovery Rate is
FDR(t) = π0 · m · t / S(t) = π0 · m · t / #{pi ≤ t}
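A minimal sketch of the FDR estimate at a threshold t, assuming the form π0 · m · t / #{p ≤ t} with π0 supplied by the caller (the function name is illustrative):

```python
def estimate_fdr(pvalues, t, pi0):
    """Estimated FDR when every feature with p <= t is called significant."""
    m = len(pvalues)
    S = sum(1 for p in pvalues if p <= t)  # observed number called significant
    if S == 0:
        return 0.0  # nothing called significant, so F/S is taken as 0
    expected_false = pi0 * m * t  # estimate of E[F(t)] = m0 * t
    return min(1.0, expected_false / S)


pvals = [0.001, 0.002, 0.003, 0.004, 0.2, 0.4, 0.6, 0.8]
print(estimate_fdr(pvals, t=0.005, pi0=0.5))
```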
Graphical Interpretation
q-value definition
● for a given feature i, the q-value is the
expected FDR incurred if it is called
significant (minimized over thresholds t ≥ pi)
○ (every other feature with p_j <= p_i is also called significant)
● in practical terms: a q-value threshold is the
"proportion of significant features that turn
out to be false leads"
Graphical Interpretation
q-value
a measure of each feature's significance
● p-value is in terms of the false positive rate
vs
q-value is in terms of the FDR
○ this takes into account that thousands of features
are simultaneously being tested (via FDR)
■ uses a better model of where the significant features
are likely to be
p vs q
● Example:
○ m = 10000
● p-values:
○ cutoff at .01 means you expect about 100 false
positives if all nulls are true (10000 × .01)
○ cutoff of .0001 means you expect only about 1
false positive, but at what cost?
● q-values:
○ set q-value cutoff at .05, and expect that about 5%
of the genes called significant are false
positives
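The arithmetic behind the p-value cutoffs can be sketched as follows, assuming the worst case in which all m features are null so that the expected number of false positives is m × cutoff (the function name is illustrative):

```python
def expected_false_positives(m0, cutoff):
    """Expected number of null features with p <= cutoff (uniform null p-values)."""
    return m0 * cutoff


m = 10000
for cutoff in (0.01, 0.0001):
    print(cutoff, expected_false_positives(m, cutoff))
```

The p-value cutoff fixes the expected count of false positives regardless of how many features are called significant; a q-value cutoff instead fixes their expected proportion among the features called.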
Algorithm for Determining q-Values
● Compute test statistic (p-value) for m genes
● Estimate π0
○ Using histogram
■ Find region where p-values are uniform + set λ
■ Count p-values > λ and divide by (1-λ)m, where m is
the number of values
○ Using cubic spline
● For each p-value
○ calculate FDR for each threshold t >= p
■ only choose t values for each unique p in the gene set
○ choose minimum FDR as q-value
● Apply q-value (cutoff)
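The steps above can be sketched end to end. This is a simplified illustration, not the reference implementation: it assumes a single fixed λ for π0 (no cubic spline), and the function name and default λ = 0.5 are hypothetical.

```python
def qvalues(pvalues, lam=0.5):
    """Return one q-value per p-value, in the input order."""
    m = len(pvalues)
    # Step 1: estimate pi0 from the p-values above lam
    pi0 = min(1.0, sum(1 for p in pvalues if p > lam) / (m * (1.0 - lam)))
    # Step 2: visit p-values from largest to smallest
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        # At threshold t = p_(rank), at least `rank` features are called
        # significant; ties are resolved by the running minimum below.
        fdr = pi0 * m * pvalues[i] / rank
        # q-value = minimum estimated FDR over all thresholds t >= p
        running_min = min(running_min, fdr)
        q[i] = running_min
    return q


pvals = [0.001, 0.002, 0.003, 0.004, 0.2, 0.4, 0.6, 0.8]
for p, q in zip(pvals, qvalues(pvals)):
    print(p, round(q, 4))
```

Only thresholds equal to observed p-values are evaluated, because the estimated FDR can only change there; taking the running minimum makes the q-values non-decreasing in p.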
q-value accuracy
● assumes that the dependence between features will
generally be weak
○ genes are actually dependent in pathways, which
can be modeled as blocks
● if so, when m is large, calling all features significant
with q <= alpha implies the FDR <= alpha
● the estimated q-value of each feature is greater than
or equal to its true q-value
○ conservative is desirable
q-value summary
● A standard measure of significance that can
be universally interpreted between studies
● better than using just p-values
○ arbitrary selection of alpha, where it is selected so
the expected number of false positives is < 1,
throws away too many likely truly significant
features
Questions?
FDR plug-in
● Create K permutations of the data, producing
statistics t_j^k for features j=1,...,M and permutations
k=1,...,K.
● For a range of cutoffs C, let
R_obs = Σ_j I(|t_j| > C)
E[V] ≈ (1/K) Σ_j Σ_k I(|t_j^k| > C)
● Estimate the FDR by
FDR ≈ E[V] / R_obs
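A minimal sketch of a plug-in estimate, under the assumption that the missing formulas are the usual ones: R_obs counts observed statistics with |t_j| > C, E[V] is the average number of permuted statistics with |t_j^k| > C per permutation, and the FDR estimate is their ratio. The permutation statistics are taken as precomputed input; all names are illustrative.

```python
def plugin_fdr(t_obs, t_perm, C):
    """Plug-in FDR estimate at cutoff C.

    t_obs:  list of M observed statistics, one per feature
    t_perm: list of K lists, each holding M statistics from one permutation
    """
    K = len(t_perm)
    # Number of features called significant on the real data
    r_obs = sum(1 for t in t_obs if abs(t) > C)
    # Average number called significant under permutation (null) data
    e_v = sum(1 for perm in t_perm for t in perm if abs(t) > C) / K
    return e_v / r_obs if r_obs > 0 else 0.0


t_obs = [3.1, -2.8, 0.4, 1.0]
t_perm = [[0.5, -2.5, 0.1, 1.2], [2.2, 0.3, -0.7, 0.9]]
print(plugin_fdr(t_obs, t_perm, C=2.0))
```

Because the permutations treat every feature as null, this estimate corresponds to taking π0 = 1 and tends to be conservative.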