Notes and statistics on base level expression
Don Gilbert
May 2009
Biology Dept., Indiana University
[email protected]
wfleabase.org/docs/tileMEseq0905.pdf
2007: Tile expression
DrosMel tiled by Affymetrix, finds new genes (blue) and known (orange).
Precision improves ’06-’09
Measuring expression over gene structures:
Nimblegen (08) has higher precision than Affy (06/07).
RNA-Seq (09) has higher precision than Nimblegen.
…
microarray statistics for base level expression?
Gene or Base expression?
• Base-level expression (tiles, rna-seq): calculate like gene differential expression (DE)
  • Per tile, per RNA-seq contig or per base: treatment - control
• Combine for tiles over gene
  • Independent (technically) observations, but biologically related
  • Increase DF, Power with longer gene
• How to combine?
  • As independent replicates: gene > (tiles, technical, bio replicates)?
  • As nested block: gene > tiles > replicates?
  • As gene average: gene = mean(tiles) > replicates?
• Compare with gene-level stats …
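Two of the combining schemes listed above can be compared on toy numbers. The sketch below is plain Python with made-up log2 scores (not data from this talk); the helper name `t_stat` and the choice of a one-sample t statistic on treatment - control differences are illustrative assumptions, not necessarily the statistic behind the results on the following slides.

```python
from math import sqrt
from statistics import mean, stdev

def t_stat(diffs):
    """One-sample t statistic and degrees of freedom for differences vs. 0."""
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1

# Hypothetical log2 scores for one gene: rows = 3 bio replicates, columns = 5 tiles.
treatment = [[8.1, 8.4, 7.9, 8.6, 9.0],
             [8.3, 8.2, 8.0, 8.8, 9.1],
             [8.0, 8.5, 8.1, 8.5, 8.9]]
control = [[7.6, 7.8, 7.5, 8.0, 8.4],
           [7.7, 7.9, 7.4, 8.3, 8.6],
           [7.5, 8.0, 7.6, 8.1, 8.3]]

# Per tile, per replicate: treatment - control.
diff = [[t - c for t, c in zip(tr, cr)] for tr, cr in zip(treatment, control)]

# "As independent replicates": every tile x replicate difference is one observation.
t_indep, df_indep = t_stat([d for row in diff for d in row])

# "As gene average": average the tiles first, then test across replicates only.
t_gene, df_gene = t_stat([mean(row) for row in diff])

print(f"independent tiles: t={t_indep:.2f}, df={df_indep}")
print(f"gene average:      t={t_gene:.2f}, df={df_gene}")
```

With 5 tiles and 3 replicates the first scheme has 14 degrees of freedom and the second only 2, which is the "Increase DF, Power with longer gene" point. The nested-block scheme would need a mixed or hierarchical model and is not sketched here.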
Gene or Base expression?
Base-level tests find expression better than the gene average.
Base-level sensitivity = 42%; gene-level sensitivity = 38%. Both have specificity = 37%.
Sensitivity = 1 - false rejection; Specificity = 1 - false discovery
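The two definitions above can be written out from counts of calls against a known set. This is a minimal sketch that reads "false rejection" as FN / (TP + FN) and "false discovery" as FP / (TP + FP); that reading, the function names, and the counts are assumptions for illustration, not values from the talk.

```python
def sensitivity(true_pos, false_neg):
    """1 - false rejection rate: share of truly expressed features that are called."""
    return true_pos / (true_pos + false_neg)

def specificity_as_defined_here(true_pos, false_pos):
    """1 - false discovery rate, following the slide's definition.
    Elsewhere this quantity is usually called precision."""
    return true_pos / (true_pos + false_pos)

# Hypothetical counts, chosen only to show the arithmetic.
tp, fn, fp = 30, 70, 20
print(sensitivity(tp, fn))
print(specificity_as_defined_here(tp, fp))
```

Note that the conventional specificity, TN / (TN + FP), is a different quantity and would also need a count of true negatives.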
Gene or Base expression?
DE is consistent over the gene span even though average expression changes;
a gene-level measure can miss this.
Expression over gene span, treatment(red) vs control(green) with 3 replicates
…
gene structures & expression
Sequence normalizing?
The idea is to remove sequence (GC) effects on the probe hybridization score.
TileScope; Royce TE, Rozowsky JS, and Gerstein, MB. (2007). Assessing the need for
sequence-based normalization in tiling microarray experiments. Bioinformatics, 23, 988-997.
Sequence normalizing?
Sequence-normalizing also removes Exon/Intron signal!
Don’t use it (TileScope’s quantilenorm), or other sequence adjustments of expression, unless gene structure signals are included.
Intron-Exon Detection
Nimblegen and Solexa tile/base expression detects gene structure, on average, fairly well.
Intron-Exon Update
Newest RNA-Seq finds intron/exon very well
(Stranded RNASeq, modEncode Gingeras lab, March 2009)
Differential expression
Gene end (3’) has more expression, but constant differential over gene span, on average.
[Figure: example genes, exons, introns]
Green is treatment, red control. Line style shows 3 replicates of Daphnia tiled expression.
Diff. Expr. distributions
Introns show a null DE distribution; genes and TAR regions are wider.
Use introns as baseline for statistics?
[Figure: DE distributions; labels Pred, Metal, Sex and Genes, Introns, TARs]
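One way to read "use introns as baseline" is as an empirical null: score each gene's DE against the observed spread of intron DE. The sketch below is a hedged illustration in plain Python; the function name `empirical_p`, the two-sided rule, the +1 smoothing, and the intron values are all assumptions, not a method or data described in the talk.

```python
def empirical_p(de_value, null_de):
    """Two-sided empirical p-value: share of null (intron) |DE| at least as large.
    The +1 terms keep the estimate above zero for a finite null sample."""
    hits = sum(1 for x in null_de if abs(x) >= abs(de_value))
    return (hits + 1) / (len(null_de) + 1)

# Hypothetical intron DE values (treatment - control), standing in for the null.
intron_de = [-0.30, -0.12, -0.05, 0.00, 0.04, 0.10, 0.15, 0.22, -0.20]

print(empirical_p(0.9, intron_de))   # gene DE beyond every intron value
print(empirical_p(0.1, intron_de))   # gene DE well inside the intron spread
```

The smallest p-value this can report is 1 / (number of introns + 1), so the resolution depends on how many intron measurements are available.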
…
multiple testing corrections
Multiple statistic tests
• Problem: perform 20,000 tests and p-values hit the laws of chance. P < 0.05 can happen 1,000 times by chance (false discovery, FDR).
• DrosMel Affy line t-tests: 2,284,383 / 5,395,023 = 0.42 Sig
• Bonferroni: conservative = 0.03 Sig
• Benjamini & Hochberg: p.adjust(p, 'BH') = 0.35 Sig
• qvalue(p): distribution based = 0.41 Sig
  Storey, JD and R Tibshirani, 2003. Statistical significance for genomewide studies. PNAS 100:9440-9445
• SAM permutation qvalue
• However, p.adjust is meant for 100s of tests, not millions
• DrosMel modEncode case: 1900 pairwise Affy cell line (62 cells) DE comparisons x 14,000 genes = 26,600,000 t-tests
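The counts and corrections in the list above can be checked with a few lines of arithmetic. This is a plain-Python sketch, not the R code behind the slide: `bonferroni` and `benjamini_hochberg` are hypothetical helper names that follow the standard definitions of those adjustments (the same quantities R's `p.adjust` reports for "bonferroni" and "BH"); the q-value and SAM methods are not reproduced.

```python
from math import comb

def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # largest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = m - pos                      # 1-based rank among ascending p-values
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Chance hits expected at p < 0.05 if all 20,000 null hypotheses are true.
expected_chance_hits = 20_000 * 0.05

# All pairs among 62 cell lines, close to the ~1900 comparisons quoted.
pairs_62 = comb(62, 2)

# Size of the modEncode case as stated on the slide.
total_tests = 1900 * 14_000

print(expected_chance_hits, pairs_62, total_tests)
print(bonferroni([0.5, 0.01]), benjamini_hochberg([0.5, 0.01]))
```

Bonferroni scales every p-value by the full number of tests, which is why it becomes so conservative at millions of tests; BH scales by the number of tests divided by the rank, so it is less severe for the smallest p-values.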
Multiple DE tests : Daphnia
            P<0.05   %P   %BH   %Qvalue   max P|Q
Sex           6733   28    19        21      1e-2
Predate        832    3     0         0      1e-4
Metals        2502   10     0         0      1e-4
• Corrections differ greatly among experiments on the same genes
• Daphnia DE: 3 experiments (trt - con), 25,000 genes, 3 replicates
• Predate and Metal genes have low expression but are important to detect
Multiple statistic tests
• “Statisticians have turned p-value corrections into an industry, but they are really more of a band-aid than a solution”*
• What about false rejection (FRR; type II error)?
• Balance errors; false rejection may be more important
• Solution #1: test fewer, directed hypotheses
• Solution #2: measure error rate on knowns, e.g. prediction of “known” genes
• Solution #3: known null hypothesis, e.g. introns
*http://www.bioconductor.org/workshops/2009/SeattleApr09/DiffExpr/
1900 pairwise Affy cell line DE comparisons x 14,000 genes = 26,600,000 t-tests
Hypotheses of interest are fewer: ~100s cells x 14,000 genes ~ 2 Million tests
Summary
1. Base-level expression (tiles, rna-seq) measures gene expression better
   • Balances sensitivity (false rejection) with specificity (false discovery)
2. Base-level expression measures gene structures well
   • On average, and precision is improving for individual genes.
3. Multiple test corrections are needed but problematic
   • False discovery corrections for millions of tests lead to false rejections.
   • Determine empirical error rates where possible
End note
Summary pages
wfleabase.org/genome-summaries/tile-expression/
insects.eugenes.org/species/data/dmel5/modencode/
Genome expression maps
insects.eugenes.org:8091/gbrowse/cgi-bin/gbrowse/drosmelme/
• expression in 52 cell lines (affy) and more precise solexa & nimblegen for a few cell lines
insects.eugenes.org:8091/gbrowse/cgi-bin/gbrowse/daphnia_pulex8/
• expression among 4 treatment groups (sex, metal stress, biotic predator); nimblegen