Supplementary information
Data and Methods for Gene Correlation Analysis
The constitutive gene expression measurements for the NCI60 come from three publicly available data sets, each generated independently on a different experimental platform. The Z-score-normalized differential in constitutive gene expression across the NCI60 is treated in the same manner as the GI50 values. Expression data from all three microarray experiments were merged by collecting and averaging the data for sequences belonging to the same Unigene annotation (version 156), which yielded 9961 unique genes for analysis. Gene correlation maps were constructed for this set of genes from the Pearson correlation coefficient (PCC) between each gene's expression values and each representative node vector in the GI50 self-organizing map (SOM). The PCC between cytotoxicity ($d$) and gene expression ($g$) is defined as:
$$\mathrm{PCC} = \frac{\sum_{i=1}^{n_{\mathrm{cell}}} (d_i - \bar{d})(g_i - \bar{g})}{\sqrt{\sum_{i=1}^{n_{\mathrm{cell}}} (d_i - \bar{d})^2 \sum_{i=1}^{n_{\mathrm{cell}}} (g_i - \bar{g})^2}} \qquad (1)$$
where $\bar{d}$ and $\bar{g}$ denote averages and the sums run over the $n_{\mathrm{cell}}$ cell lines. This procedure creates one data point for each of the 1066 node vectors on the GI50 map and provides a visual means of identifying correlated gene responses in specific map regions. Each gene correlation map therefore yields information on the entire range of potential anticancer drugs tested in the cell-based GI50 screen, rather than only on the single most highly correlated cytotoxic response. The correlation of a gene with a map region spanning more than one node was estimated as the average of the node/gene correlations that achieved a p-value below 0.05. Correlations between genes were assessed on the basis of their correlation maps over the 1066 node vectors of the GI50 map. As a note of caution, a positive or negative correlation does not establish causality, nor does the absence of correlation imply the absence of a causal relationship.
Randomization procedures were used to establish measures of statistical significance for drug-gene correlations.
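As a minimal sketch, assuming complete expression vectors (no missing cell lines) held in plain Python lists, the Unigene merging step and the correlation map of Eq. (1) could look like the following. The helper names `merge_by_unigene`, `pearson`, and `correlation_map`, the Unigene IDs, and the toy node vectors are all hypothetical; the analysis described here covers 9961 genes and 1066 node vectors, and the p < 0.05 filtering used for region averages is not shown.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def merge_by_unigene(rows: List[Tuple[str, List[float]]]) -> Dict[str, List[float]]:
    """Average, cell line by cell line, all probe rows sharing a Unigene ID.

    Hypothetical helper; assumes every row covers the same cell lines in the
    same order, with no missing values.
    """
    grouped: Dict[str, List[List[float]]] = defaultdict(list)
    for unigene_id, values in rows:
        grouped[unigene_id].append(values)
    # zip(*vecs) yields one tuple per cell line across all rows of the cluster.
    return {uid: [sum(col) / len(col) for col in zip(*vecs)] for uid, vecs in grouped.items()}


def pearson(d: Sequence[float], g: Sequence[float]) -> float:
    """Eq. (1): PCC between a GI50 node vector d and a gene expression vector g."""
    n = len(d)
    if n != len(g) or n < 2:
        raise ValueError("d and g must have the same length (at least 2 cell lines)")
    d_bar, g_bar = sum(d) / n, sum(g) / n
    num = sum((di - d_bar) * (gi - g_bar) for di, gi in zip(d, g))
    den = math.sqrt(sum((di - d_bar) ** 2 for di in d) * sum((gi - g_bar) ** 2 for gi in g))
    if den == 0:
        raise ValueError("PCC is undefined for a constant vector")
    return num / den


def correlation_map(gene: Sequence[float], nodes: Dict[int, Sequence[float]]) -> Dict[int, float]:
    """One PCC per SOM node vector: the gene's correlation map (hypothetical helper)."""
    return {node_id: pearson(vec, gene) for node_id, vec in nodes.items()}


# Hypothetical toy data: 4 cell lines, 2 node vectors, 2 Unigene clusters.
nodes = {0: [1.0, 2.0, 3.0, 4.0], 1: [4.0, 3.0, 2.0, 1.0]}
rows = [
    ("Hs.1", [0.5, 1.0, 1.5, 2.0]),
    ("Hs.1", [1.5, 2.0, 2.5, 3.0]),
    ("Hs.2", [2.0, 2.0, 1.0, 0.0]),
]
genes = merge_by_unigene(rows)
print(genes["Hs.1"])                          # [1.0, 1.5, 2.0, 2.5]
print(correlation_map(genes["Hs.1"], nodes))  # {0: 1.0, 1: -1.0}
```

Each gene yields one dictionary keyed by node, i.e. one value per node of the GI50 map; a region score would then average the entries for that region's nodes.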
For all map nodes associated with a region, a Euclidean distance was calculated between the gene's correlations at those nodes and a unit vector. A random sample of distances was obtained by repeating this calculation for shuffled gene measurement vectors. The mean and standard deviation of the randomized distances were used to construct a Z-score for each gene's expression over all map nodes. Large positive or negative Z-values indicate statistically significant deviations from the randomized distribution, corresponding to positive or negative correlations between gene measurements and map nodes. Genes with Z-scores above +1.80 or below -1.80 (p < 0.05) were carried forward for further analysis.
Annotations of the most significant Z-scoring genes were based on the controlled vocabulary of the Gene Ontology (GO) consortium. These descriptors do not encompass pathway data per se, but characterize genes according to cellular component, biological process, and molecular function. The hierarchical GO vocabulary was divided into levels, with level 1 the most general, containing terms such as "cell" for cellular component and "binding activity" for molecular function. We stratified the GO descriptors down to the fifth level, with all child terms inheriting their parents' descriptors. All genes with at least one measurement in the NCI60 microarray data were assigned both primary and inherited GO descriptors at each level. The Unigene-associated GO terms were derived from the publicly available LocusLink data set. To determine the significance of these descriptors within a specified map region, p-values from Fisher's exact two-tailed test were computed for the genes selected by Z-score. For each GO term at each level of the hierarchy, the number of Z-score-selected genes associated with that term was counted in the region of interest.
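The GO inheritance, counting, and Fisher's exact two-tailed test described above might be sketched as follows, assuming each GO term's counts are arranged as a 2×2 table (selected genes with versus without the term, inside the region versus in the rest of the map). The helpers `with_ancestors`, `term_table`, and `fisher_exact_two_tailed` and the GO identifiers are hypothetical, and the restriction to the fifth GO level is omitted. The two-tailed p-value sums the probabilities of all tables with the observed margins that are no more probable than the observed one, the convention also used by R's `fisher.test` and SciPy's `scipy.stats.fisher_exact`.

```python
import math
from typing import Dict, Iterable, Set, Tuple


def with_ancestors(terms: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the terms plus every ancestor, so children inherit parent descriptors."""
    seen: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, ()))
    return seen


def term_table(term: str, region: Set[str], rest: Set[str],
               annotations: Dict[str, Set[str]]) -> Tuple[int, int, int, int]:
    """Assumed 2x2 layout: (region with, region without, rest with, rest without)."""
    in_with = sum(term in annotations.get(g, set()) for g in region)
    out_with = sum(term in annotations.get(g, set()) for g in rest)
    return in_with, len(region) - in_with, out_with, len(rest) - out_with


def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = math.comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return math.comb(row1, x) * math.comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Relative tolerance guards against floating-point ties with the observed table.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical toy data: "GO:child" is a child of "GO:parent".
parents = {"GO:child": {"GO:parent"}}
raw = {"A": {"GO:child"}, "B": {"GO:child"}, "C": {"GO:other"}, "D": {"GO:parent"}}
annotations = {gene: with_ancestors(terms, parents) for gene, terms in raw.items()}
region, rest = {"A", "B", "C"}, {"D", "E", "F", "G"}

table = term_table("GO:parent", region, rest, annotations)
print(table, fisher_exact_two_tailed(*table))  # (2, 1, 1, 3), p ≈ 0.486
```

Genes A and B count toward "GO:parent" only because of inheritance from "GO:child", which is the effect of assigning inherited descriptors at each level.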
A running tally of the total number of genes at this level of the GO description was also kept. The same counts were then made over all regions of the map except the region of interest. These counts were used to construct a contingency table for each GO term, from which the corresponding p-value was calculated with Fisher's exact two-tailed test. Using this procedure, 524 unique GO terms were found to be associated with genes significantly correlated within the Q-region. These terms encompass cellular components, biological processes, and molecular functions defined down to the fifth level of the GO hierarchy.
Supplementary table captions:
Table I. Examples of naturally occurring electrophilic compounds found in the Q-region. Comments on these compounds can be found in Part 1 of the text.
Table II. Known drugs in the Q-region classified as antineoplastic or immunosuppressive agents. Compound classes are given where the drugs match the motifs described in Table 2. Common drug names are used. The most likely drug targets, obtained from a literature search (SciFinder®) or the PDB, are included where possible.
Table I.
[Chemical structure drawings] Chalcone derivatives; Mannich bases of chalcone derivatives; Kaur-16-en-15-one derivatives; Eupatoriopicrine derivatives; Chlorohydrin derivatives; 1-Aryl-2-dimethylaminomethyl-2-propen-1-ones; Ergolide derivatives; Helenalin derivatives.
Table II.
NSC | CAS | Compound class | Drug name | Most likely target(s)
NSC365798 | 10083-24-6 | N/A | Piceatannol | Protein-tyrosine kinase
NSC284356 | 10403-51-7 | N/A | Mitindomide | DNA topoisomerase II
NSC400978 | 1146-04-9 | N/A | Illudin M | GSH metabolism
NSC79037 | 13010-47-4 | N/A | CCNU | Alkylating agent
NSC95441 | 13909-09-6 | N/A | Semustine | Many
NSC8120 | 149-29-1 | 1 | Patulin | GSH metabolism; protein cross-linking; RNA polymerase II
NSC143648 | 21090-35-7 | N/A | Sangivamycin | N/A
NSC104801 | 2126-70-7 | 4 | Acrylic acid, 3-p-anisoyl-3-bromo-, sodium salt | N/A
NSC29228 | 2179-57-9 | 12 | Garlicin | Chemopreventive agents; modulation of phase II enzymes
NSC87868 | 2257-09-2 | 13 | Phenylethyl isothiocyanate | Chemopreventive agents; cytochrome P-450 enzyme system; aldehyde dehydrogenase; GSH metabolism; DNA topoisomerase II
NSC141537 | 2270-40-8 | N/A | Anguidin | Mycotoxin
NSC118994 | 23590-99-0 | N/A | Ribox | Ribonucleotide reductase
NSC141633 | 26833-87-4 | N/A | Homoharringtonine | Many
NSC250682 | 28957-04-2 | N/A | Oridonin | DNA polymerase II
NSC135037 | 29444-03-9 | N/A | Jatrophone | GSH synthesis; protein kinase C
NSC122750 | 30562-34-6 | 19 | Geldanamycin | Heat-shock protein 90
NSC11926 | 313-67-7 | N/A | Aristolochic acid | Phospholipase A2
NSC32982 | 458-37-7 | 4 | Curcumin | Cyclooxygenase-2; GSTP1; protein tyrosine kinase
NSC26647 | 475-38-7 | 6 | Naphthazarin | GSH metabolism; redox cycling; topoisomerase I
NSC236613 | 481-42-5 | 6 | Plumbagin | Oxidative stress
NSC85235 | 509-93-3 | 1 | Ambrosin | Inducible nitric oxide synthase
NSC101088 | 5119-48-2 | 1 | Withaferin A | Cyclooxygenase-2
NSC252844 | 517-89-5 | 6 | Shikonin | Many
NSC306951 | 55303-98-5 | N/A | Avarol | Reverse transcriptase; RNA-dependent and DNA-dependent DNA polymerase
NSC208734 | 57576-44-0 | 6 | Aclarubicin | DNA topoisomerase I and II
NSC85237 | 5938-03-4 | 1 | Ivalin | N/A
NSC286193 | 60084-10-8 | N/A | Tiazofurin | IMP dehydrogenase
NSC63701 | 606-58-6 | N/A | Toyocamycin | Adenosine kinase; cyclin-dependent protein kinases
NSC283162 | 64124-21-6 | N/A | Trimelamol | DNA cross-linking
NSC324368 | 65492-82-2 | N/A | Edelfosine | Protein kinase C; phospholipase C
NSC325663 | 66082-27-7 | N/A | Saframycin A | DNA binding
NSC2604 | 683-18-1 | Organometal | Dichlorodibutylstannane | Phospholipase A2; GSH metabolism
NSC320951 | 68322-91-8 | 1 | Bis(helenalinyl) malonate | Nucleic acid synthesis; DNA topoisomerase II
NSC324360 | 69839-83-4 | N/A | Didox | Ribonucleotide reductase
NSC63984 | 73-03-0 | N/A | Cordycepin | Adenosine deaminase
NSC330499 | 73341-72-7 | 19 | Macbecin I | Heat-shock protein 90
NSC352890 | 77691-03-3 | N/A | 9-Deazaadenosine | Adenosine kinase; adenosine deaminase
NSC253272 | 81424-67-1 | N/A | Caracemide | Ribonucleotide reductase
NSC349644 | 82423-05-0 | N/A | Cyanocycline A | DNA binding
NSC340847 | 83705-13-9 | N/A | Selenazofurin | IMP dehydrogenase
NSC364372 | 87081-35-4 | 1 | Leptomycin B | Protein nuclear export
NSC347512 | 87626-55-9 | N/A | Mitoflaxone | Aminopeptidase N/CD13
NSC372208 | 98474-21-6 | 6 | Urdamycin A | N/A
NSC32946 | 7059-23-6 | N/A | Mitoguazone | Adenosylmethionine decarboxylase; ornithine decarboxylase
NSC616960 | 644-69-9 | 1 | Ranunculin | DNA polymerase
NSC659936 | 6856-01-5 | 1 | Eupatoriopicrin | Glutathione depletion
NSC605986 | 70000-22-5 | N/A | Caulerpenyne | Na+/K+-ATPase; phospholipase A2
NSC345842 | 88859-04-5 | N/A | Mafosfamide | Alkylating agent