Some views on microarray
experimental design
Rainer Breitling
Molecular Plant Science Group &
Bioinformatics Research Centre
University of Glasgow, Scotland, UK
Personal Background
• University of Glasgow, Scotland, UK
• Molecular Plant Sciences Group
• Bioinformatics Research Centre
• Functional Genomics Facility
Some common questions in
microarray experimental design
• How many arrays will I need?
• Should I pool my samples?
• Which arrays should I choose?
• Which samples should I put together on
one array?
Why are microarrays special?
• produce large amounts of data
instantaneously
• can look for unexpected effects
• are still quite expensive
→ almost never repeated
→ careful design necessary before you start
How many replicates?
• as many as possible
Statistics says: The more replicates, the
better your estimate of expression (that’s
an asymptotic process, so if you add at
least a few replicates, the effect will be
really strong)
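The diminishing return from extra replicates can be illustrated with the standard error of a mean, which shrinks with the square root of the number of replicates. This is a minimal sketch, assuming independent replicates and an illustrative standard deviation of 0.5 for the log-ratios; the function name and the numbers are not from the slides.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean of n independent replicates."""
    return sigma / math.sqrt(n)


sigma = 0.5  # assumed standard deviation of the log-ratios (illustrative only)
for n in (1, 2, 3, 5, 10, 20):
    print(f"{n:2d} replicates: standard error = {standard_error(sigma, n):.3f}")
```

Going from 1 to 3 replicates reduces the standard error far more than going from 10 to 20, which is why the first few replicates matter most.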
How many replicates?
$$n = \frac{4\,(z_{1-\alpha/2} + z_{1-\beta})^2}{(\delta/\sigma)^2}$$
• α significance level (probability of detecting FP)
• 1-β power to detect differences (probability of detecting TP)
• σ standard deviation of the log-ratios
• δ detectable difference between class mean log-ratios
• z percentile of standard normal distribution
• n required number of arrays (reference design)
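As a worked example, the sample-size formula can be evaluated directly, reading it as n = 4 (z_{1-α/2} + z_{1-β})² / (δ/σ)². This is a sketch only: the function name `required_arrays` and the input values are assumptions chosen for illustration, and the result is rounded up to a whole number of arrays.

```python
import math
from statistics import NormalDist


def required_arrays(alpha: float, power: float, sigma: float, delta: float) -> int:
    """Number of arrays n for a reference design, rounded up.

    alpha: significance level
    power: 1 - beta, the power to detect a difference
    sigma: standard deviation of the log-ratios
    delta: detectable difference between class mean log-ratios
    """
    z = NormalDist().inv_cdf  # percentile of the standard normal distribution
    n = 4 * (z(1 - alpha / 2) + z(power)) ** 2 / (delta / sigma) ** 2
    return math.ceil(n)


# Illustrative inputs (assumed, not from the slides): a two-fold change on a
# log2 scale (delta = 1) with sigma = 0.5.
print(required_arrays(alpha=0.001, power=0.95, sigma=0.5, delta=1.0))
```

Halving the detectable difference δ quadruples the required number of arrays, because δ/σ enters the formula squared.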
How many replicates?
• Five
Experience shows: For most common
experiments you get a reasonable list of
differentially expressed genes with 5
replicates
How many replicates?
• Three
One to convince yourself, one to convince
your boss, one just in case...
How many replicates?
• It depends on
– the quality of the sample
– the magnitude of the expected effect
– the experimental design
– the method of analysis
The quality of the sample
• smaller samples (single cells) are more
noisy than large samples (tissue
homogenates)
• cell cultures are less noisy than patient
biopsies
• sample pooling can decrease noise – if
individual variation is not of interest
The magnitude of the effect
• Microarrays are very sensitive
• To keep effects small:
– use early time points, gentle stimuli
– never compare dogs and donuts
• if you get a list of 2000 genes that are
significantly changed, your experiment
failed!
The magnitude of the effect
• some problematic cases
– stably transfected cell lines (are they still the
same cells?)
– knock-out organisms (even the same tissue
can be different)
– local changes may be diluted, but cell
isolation will increase noise
The experimental design
• Three major options:
– reference design (flexible)
– balanced block design (efficient)
– loop design (elegant)
The experimental design
• loop designs can save samples...
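[Figure: reference design (samples A, B, C and D, each hybridized against a reference R) next to a loop design (samples A, B, C and D hybridized against each other in a loop)]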
• ...but they can cause interpretation
nightmares in less simple cases (use for
large studies, if you have a full-time
statistician in the team)
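The difference between the two layouts can be sketched by listing which pair of samples goes on each two-colour array. The helper names `reference_design` and `loop_design` are hypothetical, and the order of samples around the loop is an assumption made for illustration.

```python
from collections import Counter


def reference_design(samples, reference="R"):
    """Each sample is hybridized against the common reference."""
    return [(s, reference) for s in samples]


def loop_design(samples):
    """Each sample is hybridized against the next one, closing the loop."""
    k = len(samples)
    return [(samples[i], samples[(i + 1) % k]) for i in range(k)]


samples = ["A", "B", "C", "D"]
for name, design in (("reference", reference_design(samples)),
                     ("loop", loop_design(samples))):
    # Count how often each sample (or the reference) is hybridized.
    usage = Counter(s for pair in design for s in pair)
    print(name, design, dict(usage))
```

With four samples both layouts use four arrays, but the reference design spends half of all hybridizations on R, while the loop measures every sample of interest twice.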
The method of analysis
• Golub et al. (1999) data
set
• 38 leukemia patient bone
marrow samples,
hybridized individually to
Affymetrix microarrays
• Differential expression
between two leukemia
types was examined,
using random subsets of
the complete dataset
The method of analysis
[Figure: iterative GroupAnalysis (iGA) of a time course (0h, 9.5h, 11.5h, 13.5h, 15.5h, 18.5h, 20.5h), listing Gene Ontology classes for each time point; recurring classes include 6099 - tricarboxylic acid cycle, 3773 - heat shock protein activity, 6950 - response to stress, 5749 - respiratory chain complex II (sensu Eukarya), 6121 - oxidative phosphorylation, succinate to ubiquinone, 8177 - succinate dehydrogenase (ubiquinone) activity, 4373 - glycogen (starch) synthase activity and 4129 - cytochrome c oxidase activity]
[Figure: Graph-based iterative GroupAnalysis (GiGA), with subgraphs labelled respiratory chain complex II, glyoxylate cycle, citrate (TCA) cycle, oxidative phosphorylation, respiratory chain complex III and (complex V)]
What is a good replicate?
The experiment your competitor on the other
side of the globe would do to see if your
results are reproducible
Vary “all” parameters – challenge your
results
Prepare new samples, from new cultures,
using new buffers and new graduate
students
Remember to produce matched controls
What is a “bad” replicate?
• technical replicates (i.e. hybridizing the
same sample repeatedly)
• dye-swapping experiments (usually gene-specific
dye bias is not a big issue, and
dye balancing is more efficient anyway)
• pooled samples, hybridized repeatedly
• the same preparation, only labelled twice
Should samples be pooled?
• most samples are already pooled – they
come from multiple cells
• pool to increase amount of mRNA, but
only as much as necessary
• prepare independent pools to assess
variation
• problems: bias, “contamination”, outliers,
information loss...
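Why pooling reduces noise, and why it only helps up to a point, can be sketched with a simple additive variance model. This model is an assumption, not something taken from the slides: it supposes that each individual contributes equally to the pool, that individuals are independent, and that technical variation is unaffected by pooling. All names and numbers are illustrative.

```python
def pooled_variance(bio_var: float, tech_var: float, pool_size: int) -> float:
    """Variance of one measurement on a pool of `pool_size` individuals.

    Assumed model: biological variance averages out across the pool,
    technical variance stays the same.
    """
    return bio_var / pool_size + tech_var


bio_var, tech_var = 0.4, 0.1  # assumed variances on the log-ratio scale
for k in (1, 2, 4, 8, 16):
    print(f"pool of {k:2d}: variance = {pooled_variance(bio_var, tech_var, k):.3f}")
```

Under these assumptions the variance never drops below the technical component, so large pools add little, and a single pool gives no estimate of the variation between pools. That matches the advice to pool only as much as necessary and to prepare independent pools.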
Which arrays are the best?
• Standard arrays
compare and exchange data easily
• Whole-genome arrays
detect unexpected effects, increase confidence
• Single-color arrays (Affymetrix GeneChip)
for more complex comparisons
• Annotated arrays
Further reading
• Dobbin, Shih & Simon (2003) J. Natl.
Cancer Inst. 95: 1362.
• Yang & Speed (2002) Nature Rev. Genet.
3: 579.
• Breitling (2004)
http://www.brc.dcs.gla.ac.uk/~rb106x/microarray_tips.htm
Contact
Rainer Breitling
Bioinformatics Research Centre
Davidson Building A416
http://www.brc.dcs.gla.ac.uk/~rb106x