OHRI Bioinformatics
http://www.ottawagenomecenter.ca/research/bioinformatics/

Introduction to the Significance Analysis of Microarrays application
Stem Cell Network Online Microarray Analysis Course
Unit Two
Fall 2006

Introduction to SAM
• SAM “assigns a score to each gene on the basis of change in gene expression relative to the standard deviation of repeated measurements.”
• SAM uses permutations of repeated measurements to estimate the False Discovery Rate (FDR)
• Paper available online: http://www-stat.stanford.edu/~tibs/SAM/pnassam.pdf

Overview
• Calculate the “relative difference”: a value that incorporates the change in expression between conditions and the variation of measurements in each condition
• Calculate the “expected relative difference”: derived from controls generated by permutations of the data
• Plot the two against each other and set a cutoff to identify deviating genes
• Calculate the FDR for the chosen cutoff from the control permutations

Relative Difference
$$ d(i) = \frac{\bar{x}_I(i) - \bar{x}_U(i)}{s(i) + s_0} $$
• $\bar{x}_I(i)$, $\bar{x}_U(i)$: mean expression of gene i in condition I or U
• $s(i)$: gene-specific scatter
• $s_0$: small positive constant, chosen to minimize the coefficient of variation of d(i)

Gene-specific scatter
$$ s(i) = \sqrt{ a \left\{ \sum_m \left[ x_m(i) - \bar{x}_I(i) \right]^2 + \sum_n \left[ x_n(i) - \bar{x}_U(i) \right]^2 \right\} } $$
$$ a = \frac{1/n_1 + 1/n_2}{n_1 + n_2 - 2} $$
• The sums over m and n run over the measurements in conditions I and U; $n_1$ and $n_2$ are the numbers of measurements in each condition
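
The relative difference and gene-specific scatter defined above can be computed for a single gene in a few lines. The following is a minimal sketch, not the SAM implementation itself: the function name `relative_difference`, the example measurements, and the value `s0=0.1` are all illustrative assumptions (SAM derives s0 from the data rather than taking it as a fixed input).

```python
import math

def relative_difference(cond_i, cond_u, s0=0.0):
    """Return (d, s) for one gene from its measurements in conditions I and U.

    Hypothetical helper for illustration; s0 is passed in directly here,
    whereas SAM chooses it from the data.
    """
    n1, n2 = len(cond_i), len(cond_u)
    mean_i = sum(cond_i) / n1
    mean_u = sum(cond_u) / n2
    # Sums of squared deviations from each condition's own mean.
    ss_i = sum((x - mean_i) ** 2 for x in cond_i)
    ss_u = sum((x - mean_u) ** 2 for x in cond_u)
    a = (1.0 / n1 + 1.0 / n2) / (n1 + n2 - 2)
    s = math.sqrt(a * (ss_i + ss_u))   # gene-specific scatter s(i)
    d = (mean_i - mean_u) / (s + s0)   # relative difference d(i)
    return d, s

# Made-up log-expression values for one gene (illustrative only).
induced = [8.1, 7.9, 8.3, 8.0]
uninduced = [7.0, 7.2, 6.9, 7.1]

d, s = relative_difference(induced, uninduced, s0=0.1)
print(f"s(i) = {s:.4f}, d(i) = {d:.4f}")
```

With `s0=0.0` the returned score is the ordinary pooled-variance t statistic, and the function divides by zero if every measurement within each condition is identical. A positive s0 avoids that case and damps scores for genes whose scatter is very small.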

T-test
$$ t = \frac{\bar{x} - \bar{y}}{s_p \sqrt{\frac{1}{n_x} + \frac{1}{n_y}}} $$
$$ s_p^2 = \frac{df_x\, s_x^2 + df_y\, s_y^2}{df_x + df_y} = \frac{\sum_i (x_i - \bar{x})^2 + \sum_i (y_i - \bar{y})^2}{df_x + df_y} = \frac{SS_x + SS_y}{df_x + df_y} $$
$$ t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{1/n_x + 1/n_y}{df_x + df_y} \left( SS_x + SS_y \right)}} $$
$$ s(i) = \sqrt{ a \left\{ \sum_m \left[ x_m(i) - \bar{x}_I(i) \right]^2 + \sum_n \left[ x_n(i) - \bar{x}_U(i) \right]^2 \right\} } = \sqrt{ a \left( SS_I + SS_U \right) } = \sqrt{ \frac{1/n_1 + 1/n_2}{n_1 + n_2 - 2} \left( SS_I + SS_U \right) } $$
• Since $df_x + df_y = n_1 + n_2 - 2$, s(i) is the denominator of t
• t vs. SAM:
$$ t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{1/n_x + 1/n_y}{df_x + df_y} \left( SS_x + SS_y \right)}} \quad \text{vs.} \quad d(i) = \frac{\bar{x}_I(i) - \bar{x}_U(i)}{\sqrt{\frac{1/n_1 + 1/n_2}{n_1 + n_2 - 2} \left( SS_I + SS_U \right)} + s_0} $$

Relative difference vs. Gene scatter
• Plotting d(i) vs. s(i), where
$$ d(i) = \frac{\bar{x}_I(i) - \bar{x}_U(i)}{s(i) + s_0} $$
• Comparing 4 shaded vs. 4 non-shaded samples
• A: Relative differences between irradiated and unirradiated states
• B: Relative differences between cell lines
• C: Relative differences between hybridizations (technical replicates)
• D: Relative differences between ‘balanced’ permutations (extra control)

SAM creates controls via permutation
• Consider permutations of the samples used
• The original paper used 36 balanced permutations, in which each cell line was represented equally
• Calculate $d_p(i)$ for each permutation p
• Average all $d_p(i)$ to get the ‘expected relative difference’, $d_E(i)$

Finding significant genes
• Plot d(i) vs. $d_E(i)$
• Identify genes which deviate from the line $d(i) = d_E(i)$ by more than a threshold Δ
• These do not necessarily have the largest change in expression
• The threshold can be tuned using the estimated false positive rate

False Discovery Rate
• Take the observed d(i) values that define the upper and lower cutoffs
• Find the mean number of genes exceeding these cutoffs in the permuted data: this estimates the number of falsely significant genes, and dividing it by the number of genes called significant gives the estimated FDR

SAM Output
• List of significantly changing genes
  – Fold changes may be asymmetric
• Estimated false positive rate for the list

SAM Implementations
• In Bioconductor
  – R package: ‘siggenes’
• From Stanford
  – Excel plugin
  – R package: ‘samr’
  – http://www-stat.stanford.edu/~tibs/SAM/

Unit 2 Exercises
• Analysis of array data with SAM in R for Windows
• Exploration of SAM results
• Identification of significantly changing genes
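
To make the permutation controls, the threshold step, and the FDR estimate from the earlier slides concrete, here is a rough sketch in Python. It is not the code used by ‘siggenes’ or ‘samr’. Several things are assumptions made for illustration: the function names `d_stat` and `sam_sketch`, the made-up data, the fixed `s0`, and the use of every relabeling with the same group sizes in place of the balanced permutations of the original paper. Scores are ranked within each permutation before averaging, which is my reading of how the expected relative difference is built.

```python
import itertools
import math

def d_stat(values, in_group_i, s0):
    """Relative difference for one gene; in_group_i flags condition-I samples."""
    xi = [v for v, g in zip(values, in_group_i) if g]
    xu = [v for v, g in zip(values, in_group_i) if not g]
    n1, n2 = len(xi), len(xu)
    mi, mu = sum(xi) / n1, sum(xu) / n2
    ss = sum((v - mi) ** 2 for v in xi) + sum((v - mu) ** 2 for v in xu)
    a = (1.0 / n1 + 1.0 / n2) / (n1 + n2 - 2)
    return (mi - mu) / (math.sqrt(a * ss) + s0)

def sam_sketch(data, in_group_i, s0, delta):
    """Return (significant gene indices, expected false positives, estimated FDR)."""
    n_genes, n_samples, n1 = len(data), len(in_group_i), sum(in_group_i)
    observed = [d_stat(gene, in_group_i, s0) for gene in data]
    obs_sorted = sorted(observed)

    # Controls: every relabeling with the same group sizes (this includes the
    # original labeling; it is NOT the paper's balanced-permutation design).
    perm_sorted = []
    for chosen in itertools.combinations(range(n_samples), n1):
        flags = [k in chosen for k in range(n_samples)]
        perm_sorted.append(sorted(d_stat(gene, flags, s0) for gene in data))

    # Expected relative difference: average over permutations at each rank.
    expected = [sum(p[r] for p in perm_sorted) / len(perm_sorted)
                for r in range(n_genes)]

    # Moving away from the origin, find the first rank on each side where the
    # observed score departs from the expected score by more than delta.
    up = next((r for r in range(n_genes)
               if expected[r] >= 0 and obs_sorted[r] - expected[r] > delta), None)
    lo = next((r for r in reversed(range(n_genes))
               if expected[r] <= 0 and expected[r] - obs_sorted[r] > delta), None)
    cut_up = obs_sorted[up] if up is not None else math.inf
    cut_lo = obs_sorted[lo] if lo is not None else -math.inf

    significant = [g for g in range(n_genes)
                   if observed[g] >= cut_up or observed[g] <= cut_lo]
    # Mean number of permuted scores beyond the cutoffs.
    false_pos = sum(sum(1 for d in p if d >= cut_up or d <= cut_lo)
                    for p in perm_sorted) / len(perm_sorted)
    fdr = false_pos / len(significant) if significant else 0.0
    return significant, false_pos, fdr

# Made-up expression values: 6 genes x 6 samples, first 3 samples in condition I.
data = [
    [9.0, 9.2, 8.9, 6.0, 6.1, 5.9],  # built to be higher in condition I
    [5.0, 5.3, 4.8, 5.1, 4.9, 5.2],
    [7.1, 6.8, 7.0, 6.9, 7.2, 7.0],
    [4.0, 4.4, 4.1, 4.3, 4.0, 4.2],
    [6.5, 6.4, 6.7, 6.6, 6.5, 6.3],
    [3.0, 3.1, 2.9, 6.0, 6.2, 5.8],  # built to be lower in condition I
]
labels = [True, True, True, False, False, False]

significant, false_pos, fdr = sam_sketch(data, labels, s0=0.1, delta=1.0)
print("significant genes:", significant)
print("expected false positives:", false_pos, "estimated FDR:", fdr)
```

Raising `delta` shrinks the gene list and usually the estimated FDR along with it, which is the trade-off the threshold is tuned against. With only 20 relabelings of six samples the estimate is very coarse, so real analyses use more samples and more permutations.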