Download msb4100030-sup

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Heritability of IQ wikipedia , lookup

X-inactivation wikipedia , lookup

Transposable element wikipedia , lookup

Epigenetics in learning and memory wikipedia , lookup

Cancer epigenetics wikipedia , lookup

Short interspersed nuclear elements (SINEs) wikipedia , lookup

Therapeutic gene modulation wikipedia , lookup

Human genome wikipedia , lookup

Oncogenomics wikipedia , lookup

NEDD9 wikipedia , lookup

Public health genomics wikipedia , lookup

Epigenetics of diabetes Type 2 wikipedia , lookup

Epigenetics of neurodegenerative diseases wikipedia , lookup

Pathogenomics wikipedia , lookup

Microevolution wikipedia , lookup

History of genetic engineering wikipedia , lookup

Essential gene wikipedia , lookup

Quantitative trait locus wikipedia , lookup

Long non-coding RNA wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Designer baby wikipedia , lookup

Site-specific recombinase technology wikipedia , lookup

Nutriepigenomics wikipedia , lookup

Gene wikipedia , lookup

Gene expression programming wikipedia , lookup

Genome evolution wikipedia , lookup

Genome (book) wikipedia , lookup

RNA-Seq wikipedia , lookup

Polycomb Group Proteins and Cancer wikipedia , lookup

Genomic imprinting wikipedia , lookup

Biology and consumer behaviour wikipedia , lookup

Epigenetics of human development wikipedia , lookup

Minimal genome wikipedia , lookup

Ridge (biology) wikipedia , lookup

Gene expression profiling wikipedia , lookup

Mir-92 microRNA precursor family wikipedia , lookup

Transcript
Figure S1 A
Temperatures
Proliferation
cluster
(G12)
The proliferation cluster is a stable one.
A dendrogram depicting results of cluster analysis of all varying genes on the array using the
Super paramagnetic Clustering (SPC) algorithm. The proliferation cluster is shown with red
arrow, it constitute a well-defined, stable cluster that is intact across a range of “temperatures”
Figure S1 B
The proliferation cluster is a well defined entity that is clearly separable from the rest of the genes
Right – Expression matrix of 200 varying genes that are sorted according to expression using the
SPIN algorithm (Ref) such that genes in adjacent rows have similar expression profiles; Left – a
distance matrix between the genes using the same order as in the right expression matrix on the
right. The proliferation cluster genes are marked by the red box and are shown to constitute once
cluster that does not break down naturally into sub-clusters
Figure S1 C
Expression
Distance matrix
The proliferation cluster has an elongated shape
Left – Expression matrix of the proliferation cluster genes sorted according to SPIN (Ref);
Right – a distance matrix between the genes using the same order as in the right expression
matrix on the right. The proliferation cluster is shown to have an “elongated shape”, i.e. each
gene is close to its neighbor on the sorted cluster, yet the genes at the two ends are relatively
remote. (see Bioinformatics. 2005 21(10):2301-8 for typical distance matrices of elongated
clouds.)
Figure S1 D
Principal Component Scatter Plot
0.8
Second Principal Component
0.6
0.4
0.2
0
-0.2
-0.4
-0.6
-0.8
-0.8
-0.6
-0.4
-0.2
0
0.2
First Principal Component
0.4
0.6
0.8
The proliferation cluster has an elongated shape and it does not break-down naturally into
sub-clusters
Principal component analysis of expression profiles of the genes in the proliferation cluster.
This analysis too demonstrates that the proliferation cluster is an elongated (rather than
spherical for example) entity that does not break down naturally into sub-clusters
Figure S2
4
3
Gene Expression
2
1
0
Disrtribution of cell cycle periodicity
Proportion
-1
-2
-3
G2/M
G1/S
CHR
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
All genes
Proliferation genes
1
2
3
4
-4
5
10
15
The proliferation cluster genes are cell cycle periodic
20
25
Samples
30
35
5
6
CCP score
40
7
8
9
10
45
Expression profiles of 53 proliferation cluster genes with cell cycle periodicity (CCP) index > 3 during three cell cycles of HeLa
cells (Whitfield et al. 2002) (another 45 proliferation cluster genes for which expression data exists in the cell cycle experiment
had CCP<3 (not shown). In black are colored G2/M-phase peaking genes, in blue are G1-S peaking genes, in green are genes
that contain the CHR motif in their promoters (see below). The inset shows the distribution of the CCP index (Tavazoie et al.
1999) calculated using Fast Fourier Transform (fft) function of Matlab, for the genes in the proliferation cluster (black) and for the
rest of the genome (red), shaded is the area below the black curve that correspond to CCP>3. Only 6.7% of the genes in the
genome have such CCP value or higher.
Figure S3 A
45
40
Real orthologs
Random pairs
35
30
25
20
15
10
5
0
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Orthologous human and mouse proliferation cluster genes preserve the composition of the
regulatory motifs in their promoters.
Distribution of motif compositional similarity of orthologous human – mouse promoters from
the proliferation cluster, and random human – mouse gene pairs. Given the five motifs in the
present analysis (E2F, NFY, CHR, CDE and ELK1), each promoter was characterized by a 5long binary signature indicating the presence or absence of the motifs in it. The similarity
between a pair of promoters was defined as the number of motifs that are in the two promoters
divided by the number of motifs in the promoter with the maximal number of motifs. The
(blue) distribution for real orthologous pairs is shifted relative to the high similarity values
compared to the (red) random pairs distribution.
V$E2F.03: distance distrebution
60
B
C
40
ELK1: distance distribution
30
NFY.01: distance distribution
100
E
D
20
50
0
1000
1000
0
-1000
-500
0
0
-1000
500
500
0
500
human location
1000
500
0
0
500
1000
human location
500
0
0
-500
1000
1000
mouse location
mouse location
1000
0
mouse location
0
-1000
mouse location
2
10
20
0
CHR: distance distribution
4
1000
500
500
1000
human location
F
Orthologous human and mouse proliferation cluster genes
preserve the location of the regulatory motifs relative to the
TSS.
0
500
1000
human location
CDE.01: distance distribution
80
60
40
20
0
-1000
0
1000
1000
mouse location
For the motifs E2F, ELK1, NFY, CHR, and CDE (B through F
respectively) we show in the upper panel a distribution of
distances (in bps) between the distance of the motif from the
TSS of human and its mouse ortholog. The lower panel shows
for each motif a scatter plot in which each gene that contains the
motif in both species is a dot whose x-axis depicts the distance of
the motif from the human TSS, while the y-axis depicts the
500
1000
0
0
0
500
0
0
500
1000
human location
0.4
Figure S4
points
point 7
0.3
0.2
p16 + p21 sum gate
p16+p21 sum gate
0.1
0
-0.1
-0.2
-0.3
-0.4
-0.5
-0.6
200
300
400
500
mean chip expression
600
700
800
proliferation cluster genes mean chip expression
The negative correlation between p21 + p16 and the proliferation cluster is obeyed by all time
points
Scatter plot depicting the negative correlation between the summated profile of (normalized) p16
and p21 levels and the averaged expression profile of the proliferation cluster. Each time point is
shown as a dot (two duplicates are shown); time point 7, which had among the proliferation
cluster genes a lower expression profiles compared to its neighboring time points, is show in red.
Clearly time point 7 (and all the other points) obey the same trend of negative correlation
Figure S5
ELK1
-Log(p-value)
-Log(p-value)
NFY.01
-score
-score
An annotation-based procedure to determine a threshold score in assigning genes to a PSSM
Alternative assignments of genes to a PSSM, each corresponding to a different threshold score
of the basic assignment algorithm (Elkon et al., 2003), are examined. At each assignment, that
corresponds to a (typically different) set of genes, we calculate a hypergeometric p-value on
the hypothesis that the genes assigned to the motif are enriched in the cluster. The threshold
score that is chosen is that one that minimizes such p-value. A Bonferroni test is then applied to
ensure that p-value is corrected for multiple hypotheses tested (number of motifs x number of
thresholds tested per motif). Show here for example are the case of ELK1 and NYF.
0.15
Figure S6
Normalized expression
0.1
0.05
0
-0.05
-0.1
-0.15
-0.2
1
2
3
4
5
6
7
8
9
10
11
12
Samples
Only genes that contain the CHR motif in its preferred strand and location display the
proliferation signature
The expression profiles (same sample order as in Fig. 1B) of all CHR-containing genes in the
genome, partitioned into 20 categories that correspond to the distance of CHR from the TSS
Samples
and the strand in which the motif occurs. The
20 categories correspond to either coding or noncoding strand and to one of 100 bp-long non-overlapping sequence windows that span the entire
1000 bp-long promoters. Genes belonging to each category were grouped together and their
mean expression levels are represented as blue lines. The genes that contain the CHR motif on
the coding strand in the first 100-bp window are shown in red.