Supplementary information
Data and Methods for Gene Correlation Analysis
The constitutive gene expression measurements for the NCI60 originate from three publicly available data sets, each generated independently on a different experimental platform. The Z-score normalized differences in constitutive gene expression across the NCI60 are treated in the same manner as GI50 values. Expression data from all three microarray experiments were merged by collecting and averaging the data for sequences belonging to the same UniGene annotation (version 156). This resulted in a total of 9961 unique genes for data analysis. Gene correlation maps were constructed for this set of genes from the Pearson correlation coefficient (PCC) between the gene expression values and the representative node vectors of the GI50 self-organizing map (SOM). The PCC between cytotoxicity ($d$) and gene expression ($g$) is defined as:

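$$\mathrm{PCC} = \frac{\sum_{i=1}^{n_{\mathrm{cell}}} (d_i - \bar{d})(g_i - \bar{g})}{\sqrt{\sum_{i=1}^{n_{\mathrm{cell}}} (d_i - \bar{d})^2 \; \sum_{i=1}^{n_{\mathrm{cell}}} (g_i - \bar{g})^2}} \qquad (1)$$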
where $\bar{g}$ and $\bar{d}$ denote averages, and each summation runs over the $n_{\mathrm{cell}}$ cell lines. This procedure creates one data point for each of the 1066 node vectors on the GI50 map and provides a visual means to identify correlated gene responses in specific map regions. Each gene correlation map therefore yields information on the entire range of potential anticancer drugs used in the cell-based GI50 screen, rather than only on the single most correlated cytotoxic response. The correlation of a gene with a map region spanning more than one node was estimated as the average of the node/gene correlations that achieved a p-value less than 0.05. Correlations between genes were assessed by comparing their correlation maps over the 1066 node vectors of the GI50 map. As a note of caution, positive or negative correlations do not establish causality, nor does the absence of correlation imply the absence of a causal relationship.
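The correlation-map construction of Eq. (1) and the averaging over significant nodes in a region can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' code: the function names are hypothetical, the inputs are assumed to be one gene's expression vector and a matrix of node vectors over the same ordered cell lines, and because the text does not state how the node/gene p-values were obtained, the sketch swaps in the Fisher z-transform normal approximation for that step.

```python
import math
import numpy as np

def correlation_map(gene_expr, node_vectors):
    """Eq. (1) for one gene against every SOM node vector at once.

    gene_expr    : shape (n_cell,)          expression of one gene per cell line
    node_vectors : shape (n_nodes, n_cell)  representative GI50 node vectors (d)
    returns      : shape (n_nodes,)         one PCC per node, i.e. the gene's map
    """
    g = gene_expr - gene_expr.mean()                             # g_i - g_bar
    d = node_vectors - node_vectors.mean(axis=1, keepdims=True)  # d_i - d_bar, per node
    numerator = d @ g                                            # sum_i (d_i - d_bar)(g_i - g_bar)
    denominator = np.sqrt((d ** 2).sum(axis=1) * (g ** 2).sum())
    return numerator / denominator

def pcc_p_value(r, n_cell):
    """Two-sided p-value for a PCC via the Fisher z-transform normal approximation.
    Assumption: the source does not say how node/gene p-values were computed."""
    r = min(max(r, -0.999999), 0.999999)        # keep atanh finite at |r| = 1
    z = math.atanh(r) * math.sqrt(n_cell - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))

def region_correlation(corr_map, region_nodes, n_cell, alpha=0.05):
    """Average of the node/gene correlations in a region that reach p < alpha
    (nan if none of the region's nodes is significant)."""
    significant = [float(corr_map[k]) for k in region_nodes
                   if pcc_p_value(float(corr_map[k]), n_cell) < alpha]
    return float(np.mean(significant)) if significant else float("nan")

# Synthetic usage: 1066 random node vectors over 60 cell lines, and a gene
# whose profile is a linear function of node 5.
rng = np.random.default_rng(0)
nodes = rng.normal(size=(1066, 60))
gene = 2.0 * nodes[5] + 1.0
cmap = correlation_map(gene, nodes)
print(cmap.shape, region_correlation(cmap, [5], n_cell=60))
```

Computing all nodes with one matrix-vector product keeps a full 1066-node map cheap enough to repeat for each of the 9961 genes.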
Randomization procedures were used to establish the statistical significance of drug-gene correlations. For all map nodes associated with a region, a Euclidean distance was calculated between the gene's map correlations at these nodes and a unit vector. A random sample of distances was obtained by repeating the same calculation on shuffled gene measurement vectors. The mean and standard deviation of the randomized distances were used to construct a Z-score for each gene's expression over the map nodes. Large positive or negative Z-values indicate statistically significant deviations from the randomized distributions, corresponding to positive or negative correlations between gene measurements and map nodes. Genes with Z-scores above +1.80 or below -1.80 (p < 0.05) were carried forward for further analysis.
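The randomization step can be sketched as follows, under stated assumptions: the unit vector is taken to be all +1 (perfect positive correlation at every node of the region), the null distribution comes from permuting one gene's values across cell lines, and the sign of Z is arranged so that positive values mean positive correlation. The function names are hypothetical, and `correlation_map` repeats the Eq. (1) computation so the block runs on its own.

```python
import numpy as np

def correlation_map(gene_expr, node_vectors):
    """PCC (Eq. 1) between one gene profile and each node vector."""
    g = gene_expr - gene_expr.mean()
    d = node_vectors - node_vectors.mean(axis=1, keepdims=True)
    return (d @ g) / np.sqrt((d ** 2).sum(axis=1) * (g ** 2).sum())

def region_distance(corr_map, region_nodes):
    """Euclidean distance between the region's node correlations and a unit
    vector (assumed here to be all +1, i.e. perfect positive correlation)."""
    c = corr_map[region_nodes]
    return float(np.linalg.norm(c - 1.0))

def randomized_z_score(gene_expr, node_vectors, region_nodes, n_shuffles=1000, seed=0):
    """Z-score of the observed distance against distances from shuffled gene vectors.
    Sign convention (an assumption): Z > 0 means the region is closer to +1 than
    chance (positive correlation); Z < 0 means farther away (negative correlation)."""
    rng = np.random.default_rng(seed)
    observed = region_distance(correlation_map(gene_expr, node_vectors), region_nodes)
    null = np.empty(n_shuffles)
    for b in range(n_shuffles):
        shuffled = rng.permutation(gene_expr)  # break the gene/cell-line pairing
        null[b] = region_distance(correlation_map(shuffled, node_vectors), region_nodes)
    return (null.mean() - observed) / null.std(ddof=1)

# Synthetic usage: a 3-node "region" whose node vectors track one profile.
rng = np.random.default_rng(1)
base = rng.normal(size=60)
nodes = rng.normal(size=(200, 60))
nodes[:3] = base + 0.1 * rng.normal(size=(3, 60))
z_pos = randomized_z_score(base, nodes, [0, 1, 2], n_shuffles=200)
z_neg = randomized_z_score(-base, nodes, [0, 1, 2], n_shuffles=200)
selected = [z for z in (z_pos, z_neg) if abs(z) > 1.80]  # the +/-1.80 cutoff
```

Shuffling the gene vector rather than the node vectors keeps the map itself fixed, so each randomized distance is computed against exactly the same region.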
Annotations of the most significant Z-scoring genes were based on the controlled vocabulary of the Gene Ontology (GO) Consortium. These descriptors do not encompass pathway data per se but characterize genes according to cellular component, biological process, and molecular function. The hierarchical GO vocabulary was assigned to different levels, with level 1 being the most encompassing and containing terms such as "cell" as a cellular component descriptor, "binding activity" as a molecular function descriptor, and so on. We stratified the GO descriptors down to the fifth level, with all children inheriting their parents' descriptors. All genes with at least one measurement in the NCI60 microarray data were assigned both primary and inherited GO descriptors at each level. The UniGene-associated GO terms were derived from the publicly available LocusLink data set.
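One way to implement the level assignment with inheritance is sketched below in plain Python. It is a sketch under assumptions: the toy terms and the `parents` mapping are hypothetical stand-ins for the LocusLink-derived annotations, and a term's level is taken as 1 plus its shortest distance to a root, since the text does not say how terms reachable by paths of different lengths were leveled.

```python
from collections import defaultdict, deque

def term_levels(parents):
    """Level of every GO term: roots are level 1, and each other term is
    1 + its shortest distance to a root (an assumption for multi-parent terms).
    parents maps term -> list of parent terms."""
    children = defaultdict(list)
    terms = set(parents)
    for term, ps in parents.items():
        terms.update(ps)
        for p in ps:
            children[p].append(term)
    roots = [t for t in terms if not parents.get(t)]
    level = {r: 1 for r in roots}
    queue = deque(roots)
    while queue:                      # breadth-first search from all roots
        term = queue.popleft()
        for child in children[term]:
            if child not in level:    # first visit = shortest distance
                level[child] = level[term] + 1
                queue.append(child)
    return level

def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    seen, stack = set(), [term]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def stratify(gene_terms, parents, max_level=5):
    """gene -> {level: primary + inherited terms}, for levels 1..max_level.
    Every annotated term is assumed to appear in `parents`."""
    level = term_levels(parents)
    result = {}
    for gene, terms in gene_terms.items():
        by_level = defaultdict(set)
        for t in terms:
            for a in {t} | ancestors(t, parents):
                if level[a] <= max_level:
                    by_level[level[a]].add(a)
        result[gene] = dict(by_level)
    return result

# Toy, hypothetical hierarchy and annotations
parents = {
    "binding activity": [],
    "protein binding": ["binding activity"],
    "kinase binding": ["protein binding"],
}
annotations = {"GENE_A": {"kinase binding"}, "GENE_B": {"binding activity"}}
levels = stratify(annotations, parents)
```

Here GENE_A, annotated only with the level-3 term, also inherits the level-2 and level-1 descriptors above it; calling `stratify` with `max_level=2` drops the level-3 term.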
To determine the significance of these descriptors within a specified map region, p-values from Fisher's exact two-tailed test were computed for the genes selected by the Z-score analysis. For each GO term at each level of the GO hierarchy, the number of these selected genes associated with the term was counted in the region of interest. A running tally of the total number of genes at this level was also kept. The corresponding counts were then computed over all regions of the map except the region of interest. These counts were used to construct a 2 × 2 contingency table for each GO term, from which the p-value of Fisher's exact two-tailed test was calculated. Using this procedure, 524 unique GO terms were found to be associated with genes significantly correlated within the Q-region. These terms encompass cellular components, biological processes, and molecular functions defined down to the fifth level of the GO hierarchy.
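The per-term test can be sketched with an exact two-tailed Fisher test written directly from the hypergeometric distribution, so no statistics library is needed. The helper names are hypothetical, and the two-tailed p-value uses the common convention of summing the probabilities of all tables with the same margins that are no more likely than the observed one, which the text does not spell out. The table is laid out as [[term in region, other genes in region], [term outside region, other genes outside region]].

```python
from math import comb

def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    With row and column totals fixed, the top-left count follows a
    hypergeometric distribution; the p-value sums the probabilities of all
    tables that are no more likely than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # probability of a table whose top-left cell is x, margins unchanged
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # small relative tolerance so equally likely tables are counted despite rounding
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)

def go_term_p_value(term_in, total_in, term_out, total_out):
    """Hypothetical helper: p-value for one GO term at one level.

    term_in   : selected genes in the region of interest annotated with the term
    total_in  : running tally of selected genes in the region at this level
    term_out, total_out : the same counts over all other map regions
    """
    return fisher_exact_two_tailed(term_in, total_in - term_in,
                                   term_out, total_out - term_out)
```

For example, `go_term_p_value(3, 4, 1, 4)` builds the table [[3, 1], [1, 3]] and returns 34/70 ≈ 0.486. If SciPy is available, `scipy.stats.fisher_exact` with its default two-sided alternative should give the same value.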
Supplementary table captions:
Table I. Examples of naturally occurring electrophilic compounds found in the Q-region. Comments on these compounds can be found in Part 1 of the text.
Table II. Known drugs in the Q-region classified as antineoplastic or immunosuppressive agents. Compound classes are included where the drugs match motifs described in Table 2. Common drug names are used. The most likely drug targets, obtained from a literature search (SciFinder®) or the PDB, are included where possible.
Table I.
[Chemical structures not reproduced. Compound classes shown: Chalcone derivatives; Mannich bases of chalcone derivatives; Kaur-16-en-15-one derivatives; Eupatoriopicrine derivatives; Chlorohydrin derivatives; 1-Aryl-2-dimethylaminomethyl-2-propen-1-ones; Ergolide derivatives; Helenalin derivatives.]
Table II.
| Drug NSC | Drug CAS | Compound class | Drug name | Drug target |
| NSC365798 | 10083-24-6 | N/A | Piceatannol | Protein-tyrosine kinase |
| NSC284356 | 10403-51-7 | N/A | Mitindomide | DNA topoisomerase II |
| NSC400978 | 1146-04-9 | N/A | Illudin M | GSH metabolism |
| NSC79037 | 13010-47-4 | N/A | CCNU | Alkylating agent |
| NSC95441 | 13909-09-6 | N/A | Semustine | Many |
| NSC8120 | 149-29-1 | 1 | Patulin | GSH metabolism; protein cross-linking; RNA polymerase II |
| NSC143648 | 21090-35-7 | N/A | Sangivamycin | |
| NSC104801 | 2126-70-7 | 4 | Acrylic acid, 3-p-anisoyl-3-bromo-, sodium salt | |
| NSC29228 | 2179-57-9 | 12 | Garlicin | Chemopreventive agents; modulation of phase II enzymes |
| NSC87868 | 2257-09-2 | 13 | Phenylethyl isothiocyanate | Chemopreventive agents; cytochrome P-450 enzyme system; aldehyde dehydrogenase; GSH metabolism; DNA topoisomerase II |
| NSC141537 | 2270-40-8 | N/A | Anguidin | Mycotoxin |
| NSC118994 | 23590-99-0 | N/A | Ribox | Ribonucleotide reductase |
| NSC141633 | 26833-87-4 | N/A | Homoharringtonine | Many |
| NSC250682 | 28957-04-2 | N/A | Oridonin | DNA polymerase II |
| NSC135037 | 29444-03-9 | N/A | Jatrophone | GSH synthesis; protein kinase C |
| NSC122750 | 30562-34-6 | 19 | Geldanamycin | Heat-shock protein 90 |
| NSC11926 | 313-67-7 | N/A | Aristolochic acid | Phospholipase A2 |
| NSC32982 | 458-37-7 | 4 | Curcumin | Cyclooxygenase-2; GSTP1; protein tyrosine kinase |
| NSC26647 | 475-38-7 | 6 | Naphthazarin | GSH metabolism; redox cycling; topoisomerase I |
| NSC236613 | 481-42-5 | 6 | Plumbagin | Oxidative stress |
| NSC85235 | 509-93-3 | 1 | Ambrosin | Inducible nitric oxide synthase |
| NSC101088 | 5119-48-2 | 1 | Withaferin A | Cyclooxygenase-2 |
| NSC252844 | 517-89-5 | 6 | Shikonin | Many |
| NSC306951 | 55303-98-5 | N/A | Avarol | Reverse transcriptase; RNA-dependent and DNA-dependent DNA polymerase |
| NSC208734 | 57576-44-0 | 6 | Aclarubicin | DNA topoisomerase I and II |
| NSC85237 | 5938-03-4 | 1 | Ivalin | |
| NSC286193 | 60084-10-8 | N/A | Tiazofurin | IMP dehydrogenase |

| Drug NSC | Drug CAS | Compound class | Drug name | Drug target |
| NSC63701 | 606-58-6 | N/A | Toyocamycin | Adenosine kinase; cyclin-dependent protein kinases |
| NSC283162 | 64124-21-6 | N/A | Trimelamol | DNA cross-linking |
| NSC324368 | 65492-82-2 | N/A | Edelfosine | Protein kinase C; phospholipase C |
| NSC325663 | 66082-27-7 | N/A | Saframycin A | DNA binding |
| NSC2604 | 683-18-1 | Organometal | Dichlorodibutylstannane | Phospholipase A2; GSH metabolism |
| NSC320951 | 68322-91-8 | 1 | Bis(helenalinyl) malonate | Nucleic acid synthesis; DNA topoisomerase II |
| NSC324360 | 69839-83-4 | N/A | Didox | Ribonucleotide reductase |
| NSC63984 | 73-03-0 | N/A | Cordycepin | Adenosine deaminase |
| NSC330499 | 73341-72-7 | 19 | Macbecin I | Heat-shock protein 90 |
| NSC352890 | 77691-03-3 | N/A | 9-Deazaadenosine | Adenosine kinase; adenosine deaminase |
| NSC253272 | 81424-67-1 | N/A | Caracemide | Ribonucleotide reductase |
| NSC349644 | 82423-05-0 | N/A | Cyanocycline A | DNA binding |
| NSC340847 | 83705-13-9 | N/A | Selenazofurin | IMP dehydrogenase |
| NSC364372 | 87081-35-4 | 1 | Leptomycin B | Protein nuclear export |
| NSC347512 | 87626-55-9 | N/A | Mitoflaxone | Aminopeptidase N/CD13 |
| NSC372208 | 98474-21-6 | 6 | Urdamycin A | |
| NSC32946 | 7059-23-6 | N/A | Mitoguazone | Adenosylmethionine decarboxylase; ornithine decarboxylase |
| NSC616960 | 644-69-9 | 1 | Ranunculin | DNA polymerase |
| NSC659936 | 6856-01-5 | 1 | Eupatoriopicrin | Glutathione depletion |
| NSC605986 | 70000-22-5 | N/A | Caulerpenyne | Na+/K+-ATPase; phospholipase A2 |
| NSC345842 | 88859-04-5 | N/A | Mafosfamide | Alkylating agent |