Download supplementary information

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Genomic imprinting wikipedia , lookup

Ridge (biology) wikipedia , lookup

Gene regulatory network wikipedia , lookup

Silencer (genetics) wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Community fingerprinting wikipedia , lookup

RNA-Seq wikipedia , lookup

Gene expression profiling wikipedia , lookup

Transcript
S U P P L E M E N TA R Y I N FO R M AT I O N
JENNER & YOUNG (2005)
1
Chang 2000
Dietz 2000
Bigger 2001
Feezor 2003
Jenner 2003
Van t Wout 2003
Warke 2003
2
Belcher 2000
Simmen 2001
Kramer 2003
Radhakrishnan 2003
Roberts 2003
Rodriguez 2004
Log2
Detweiler 2001
Boldrick 2002
Cuadras 2002
Guellimin 2002
Jones 2003
Mueller 2003
Pathan 2004
Remove
flagged probes
in SMD
3
Fig. S5
Cuadras 2002
Jones 2003
Mueller 2003
Median center
arrays
Subtract T=0
Median center
genes & arrays
Iizuka 2003
Wells 2003
4
Wells 2003
Wang 2004
5
Zhang 2001
Iizuka 2003
Zhang 2003
6
Tian 2002
Hong 2004
Iizuka 2003
Divide signals
by average
across all arrays
Zhang 2001
Zhang 2003
Divide signals
by geomean of
control samples
Remove probes
always called
absent
Average
duplicate probes
(one value
per HUGO ID)
Lookup value
for unique
Affymetrix
Hu6800 genes
Log2 and
median center
genes & arrays
Wang 2004
Floor signals
to 50
Retrieve HUGO
gene ID
Log2
Remove probes
called absent in
>80% arrays
Remove probes
always called
absent
7
Tian 2002
Corbeil 2001
Huang 2001
Nau 2002
Izmailova 2003
Nau 2003
Hu6800 studies
S5 | Microarray data processing. The steps required for processing data from each paper into a format allowing comparison between them. The datasets were acquired from various websites (addresses for which can
be found in supplementary information S2 (table). Each dataset was processed separately and then combined at the end. The datasets can be divided into 9 groups according to the format of the available data and the
processing involved. Data from groups 1 and 2 required little data processing, other than converting to log base 2 for group 2. Data from group 3 were taken from the Stanford microarray database (SMD). These datasets
were filtered in SMD to remove probes flagged as missing by the array scanner with subsequent processing steps carried out locally in Excel. Median centering the arrays sets the median log2 expression ratio for each
array to 0. This is performed so that the data from each array is scaled equivalently. Median centering genes does the same thing for genes and is performed for studies utilizing a common reference so that expression
changes become ratios of the median value for that gene across all samples. Data in groups 4-6 were generated with Affymetrix arrays and were provided as absolute signal intensities. The data processing for these
datasets Reflects the need to create log2 expression ratios. The data for many of these studies were also filtered to remove genes that were often or always called absent by the Affymetrix software. In all cases, data were
processed in a manner that best matches the data processing pipeline originally used by the authors. From those studies where the information was not included, HUGO gene IDs were retrieved using the ID provided, for
example GenBank accession number, LocusLink or Affymetrix IDs. Data from multiple probes representing a single gene (single HUGO ID) were averaged to provide one expression value per HUGO ID per experiment. For
each dataset, those genes that are also present on the Affymetrix Hu6800 array were isolated. This array was used to provide the final gene set because it has been used extensively by our lab and others to produce
expression data – the group 7 studies). When a gene present on the Affymetrix Hu6800 array was not found in another dataset a gap appears in the data (grey in Figs. 1 and supplementary information S6). Data in each
dataset were then aligned so that each row represents a single HUGO ID. The genes were then ordered with a 1-dimensional self-organising map and grouped using average-linkage hierarchical clustering.
www.nature.com/reviews/micro
NATURE REVIEWS | MICROBIOLOGY