Microarray analysis with SAM

OHRI Bioinformatics
Introduction to the Significance Analysis of Microarrays application
Stem Cell Network
Online Microarray Analysis Course
Unit Two
Fall 2006
http://www.ottawagenomecenter.ca/research/bioinformatics/
Introduction to SAM
• SAM “assigns a score to each gene on the basis of change in gene expression relative to the standard deviation of repeated measurements.”
• SAM uses permutations of repeated measurements to estimate the False Discovery Rate
• Paper available online: http://www-stat.stanford.edu/~tibs/SAM/pnassam.pdf
Overview
• Calculate “relative difference” – a value that incorporates the change in expression between conditions and the variation of measurements in each condition
• Calculate “expected relative difference” – derived from controls generated by permutations of data
• Plot against each other, set cutoff to identify deviating genes
• Calculate FDR for chosen cutoff from the control permutations
Relative Difference
$$d(i) = \frac{\bar{x}_I(i) - \bar{x}_U(i)}{s(i) + s_0}$$
• $\bar{x}_I(i)$, $\bar{x}_U(i)$: mean expression of gene i in condition I or U
• $s(i)$: gene-specific scatter
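Gene-specific scatter
$$s(i) = \sqrt{a \left\{ \sum_m [x_m(i) - \bar{x}_I(i)]^2 + \sum_n [x_n(i) - \bar{x}_U(i)]^2 \right\}}$$
$$a = \frac{1/n_1 + 1/n_2}{n_1 + n_2 - 2}$$
The sums run over the $n_1$ measurements in condition I and the $n_2$ measurements in condition U.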
Relative Difference
$$d(i) = \frac{\bar{x}_I(i) - \bar{x}_U(i)}{s(i) + s_0}$$
• $s_0$: small positive constant, calculated to minimize the coefficient of variation of d(i).
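The relative difference and gene-specific scatter can be sketched for a single gene as follows. This is a minimal illustration using only the Python standard library; the expression values and the value of s0 are made up for the example, and the function names are hypothetical. SAM itself calculates s0 from the data, which this sketch does not attempt.

```python
from math import sqrt
from statistics import mean


def gene_scatter(induced, uninduced):
    """s(i): square root of a * (sum of squares in I + sum of squares in U)."""
    n1, n2 = len(induced), len(uninduced)
    a = (1 / n1 + 1 / n2) / (n1 + n2 - 2)
    mean_i, mean_u = mean(induced), mean(uninduced)
    ss_i = sum((x - mean_i) ** 2 for x in induced)
    ss_u = sum((x - mean_u) ** 2 for x in uninduced)
    return sqrt(a * (ss_i + ss_u))


def relative_difference(induced, uninduced, s0):
    """d(i): difference of condition means divided by s(i) + s0."""
    return (mean(induced) - mean(uninduced)) / (gene_scatter(induced, uninduced) + s0)


# Hypothetical log-expression values for one gene, four samples per condition.
induced = [8.1, 8.4, 7.9, 8.3]
uninduced = [6.9, 7.2, 7.0, 7.1]

# s0 = 0.1 is an arbitrary illustrative value, not one computed by SAM.
print(relative_difference(induced, uninduced, s0=0.1))
```

A larger s0 shrinks d(i) most for genes whose s(i) is small, which is the situation in which a tiny denominator would otherwise inflate the score.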
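T-test
Two-sample t-test with pooled variance:
$$t = \frac{\bar{x} - \bar{y}}{s_p \sqrt{\frac{1}{n_x} + \frac{1}{n_y}}} \tag{1}$$
$$s_p^2 = \frac{df_x \, s_x^2 + df_y \, s_y^2}{df_x + df_y} \tag{2}$$
$$s_p^2 = \frac{df_x \left( \frac{\sum_i (x_i - \bar{x})^2}{df_x} \right) + df_y \left( \frac{\sum_i (y_i - \bar{y})^2}{df_y} \right)}{df_x + df_y} \tag{3}$$
$$s_p^2 = \frac{SS_x + SS_y}{df_x + df_y} \tag{4}$$
$$t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{SS_x + SS_y}{df_x + df_y}} \sqrt{\frac{1}{n_x} + \frac{1}{n_y}}} \tag{5}$$
$$t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{1/n_x + 1/n_y}{df_x + df_y} \, (SS_x + SS_y)}} \tag{6}$$
SAM:
$$s(i) = \sqrt{a \left\{ \sum_m [x_m(i) - \bar{x}_I(i)]^2 + \sum_n [x_n(i) - \bar{x}_U(i)]^2 \right\}} \tag{7}$$
$$s(i) = \sqrt{a \, (SS_I + SS_U)} \tag{8}$$
$$s(i) = \sqrt{\frac{1/n_1 + 1/n_2}{n_1 + n_2 - 2} \, (SS_I + SS_U)} \tag{9}$$
$$d(i) = \frac{\bar{x}_I(i) - \bar{x}_U(i)}{\sqrt{\frac{1/n_1 + 1/n_2}{n_1 + n_2 - 2} \, (SS_I + SS_U)} + s_0} \tag{10}$$
t (6) vs. d(i) (10): since $df_x + df_y = n_x + n_y - 2$, the two statistics differ only by the constant $s_0$ added to the denominator of d(i).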
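The relationship between the t statistic and d(i) can be checked numerically. The sketch below, with made-up data and hypothetical function names, computes the pooled-variance two-sample t statistic from the sample variances and d(i) from the sums of squares, and confirms that they agree when s0 is zero.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample t statistic using the pooled variance of x and y."""
    nx, ny = len(x), len(y)
    df_x, df_y = nx - 1, ny - 1
    # statistics.variance is the sample variance (divides by n - 1).
    sp2 = (df_x * variance(x) + df_y * variance(y)) / (df_x + df_y)
    return (mean(x) - mean(y)) / (sqrt(sp2) * sqrt(1 / nx + 1 / ny))


def sam_d(x, y, s0):
    """SAM relative difference: same numerator, s0 added to the denominator."""
    nx, ny = len(x), len(y)
    mx, my = mean(x), mean(y)
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    s = sqrt((1 / nx + 1 / ny) / (nx + ny - 2) * ss)
    return (mx - my) / (s + s0)


# Hypothetical measurements for one gene in two conditions.
x = [8.1, 8.4, 7.9, 8.3]
y = [6.9, 7.2, 7.0, 7.1]

print(pooled_t(x, y), sam_d(x, y, s0=0.0))  # the two values agree
print(sam_d(x, y, s0=0.1))                  # smaller in magnitude once s0 > 0
```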
Relative difference vs. Gene scatter
• Plotting d(i) vs s(i)
• Comparing 4 shaded vs 4 non-shaded samples
• A: Relative differences between irradiated and unirradiated states
• B: Relative differences between cell lines
• C: Relative differences between hybridizations (technical replicates)
• D: Relative differences between ‘balanced’ permutations (extra control)
SAM creates controls via permutation
• Consider permutations of the samples used.
• The original paper looked at 36 balanced permutations, in which each cell line was represented equally
• Calculate dp(i) for each permutation p and rank the values by magnitude
• Average the ranked dp(i) over all permutations, rank by rank, to get the ‘expected relative difference’: dE(i)
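One way to arrive at 36 balanced permutations is sketched below, assuming a design of eight samples with four from each of two cell lines, split into two groups of four that each hold two samples per cell line. The sample labels and function names are hypothetical. The second function averages ranked d values across permutations, using toy numbers rather than real data.

```python
from itertools import combinations

# Assumed design: 8 samples, 4 from each of two cell lines (labels are illustrative).
line1 = ["U1A", "U1B", "I1A", "I1B"]
line2 = ["U2A", "U2B", "I2A", "I2B"]


def balanced_permutations(line1, line2):
    """Yield (group_one, group_two) splits with two samples per cell line in each group."""
    for pick1 in combinations(line1, 2):
        for pick2 in combinations(line2, 2):
            group_one = list(pick1) + list(pick2)
            group_two = [s for s in line1 + line2 if s not in group_one]
            yield group_one, group_two


def expected_relative_difference(d_per_permutation):
    """Rank each permutation's d values (largest first), then average rank by rank."""
    ranked = [sorted(d_p, reverse=True) for d_p in d_per_permutation]
    return [sum(column) / len(column) for column in zip(*ranked)]


splits = list(balanced_permutations(line1, line2))
print(len(splits))  # 36, i.e. C(4,2) * C(4,2)

# Toy d_p values for two permutations of three genes.
print(expected_relative_difference([[0.5, -1.0, 2.0], [1.0, 0.0, -2.0]]))
# [1.5, 0.25, -1.5]
```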
Finding significant genes
• Plot d(i) vs dE(i)
• Identify genes which deviate from the line d(i) = dE(i) by more than a threshold, Δ
• These do not necessarily have the largest change in expression.
• Δ can be tuned using the estimated false positive rate
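The thresholding step can be sketched as a simple comparison of ranked observed and expected values. This is a simplified illustration with made-up numbers and a hypothetical function name; it flags every rank whose deviation exceeds the threshold and does not reproduce the exact calling rule of any SAM implementation.

```python
def significant_genes(d_observed, d_expected, delta):
    """Return (induced, repressed) rank positions deviating from d = dE by more than delta.

    Both lists are assumed to be sorted the same way (largest first), so
    position i pairs the i-th ranked observed value with its expected value.
    """
    induced = [i for i, (d, de) in enumerate(zip(d_observed, d_expected)) if d - de > delta]
    repressed = [i for i, (d, de) in enumerate(zip(d_observed, d_expected)) if de - d > delta]
    return induced, repressed


# Toy ranked values for six genes (illustrative only).
d_observed = [6.0, 2.1, 0.3, -0.2, -1.9, -5.5]
d_expected = [2.5, 1.4, 0.4, -0.4, -1.5, -2.6]

print(significant_genes(d_observed, d_expected, delta=1.2))  # ([0], [5])
```

Raising the threshold calls fewer genes, and lowering it calls more, so the choice is a trade-off against the estimated false positive rate.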
False Discovery Rate
• Take the observed d(i) values at the upper and lower cutoffs
• Find the mean number of genes exceeding these cutoffs in the permuted data; dividing this by the number of genes called significant gives an estimate of the FDR
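The FDR estimate can be sketched as follows, assuming the cutoffs are taken as the smallest positive and least negative d(i) among the genes called significant. The data and the function name are hypothetical, and the sketch leaves out refinements that real implementations may apply.

```python
def estimate_fdr(d_observed, called, d_permuted):
    """Estimate the FDR for a set of called genes.

    d_observed: observed d(i) values
    called:     indices of genes called significant
    d_permuted: one list of d_p(i) values per permutation
    """
    if not called:
        raise ValueError("no genes were called significant")
    positive = [d_observed[i] for i in called if d_observed[i] > 0]
    negative = [d_observed[i] for i in called if d_observed[i] < 0]
    upper = min(positive) if positive else float("inf")
    lower = max(negative) if negative else float("-inf")
    # Count, in each permutation, the genes falling beyond either cutoff.
    false_counts = [sum(1 for d in d_p if d > upper or d < lower) for d_p in d_permuted]
    mean_false = sum(false_counts) / len(false_counts)
    return mean_false / len(called)


# Toy values: five genes, three called significant, three permutations.
d_observed = [6.0, 4.2, 0.3, -0.2, -5.5]
called = [0, 1, 4]
d_permuted = [
    [4.5, 1.0, 0.2, -0.8, -2.0],
    [2.2, 0.9, 0.1, -1.1, -6.0],
    [3.0, 1.2, 0.0, -0.5, -2.4],
]

print(estimate_fdr(d_observed, called, d_permuted))  # (2/3) / 3, about 0.22
```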
SAM Output
• List of significantly changing genes
– Fold changes may be asymmetric
• Estimated false positive rate for the list
SAM Implementations
• In Bioconductor
– R package – ‘siggenes’
• From Stanford
– Excel plugin
– R package – ‘samr’
– http://www-stat.stanford.edu/~tibs/SAM/
Unit 2 Exercises
• Analysis of array data with SAM in R for Windows
• Exploration of SAM results
• Identification of significantly changing genes