Ab initio genotype-phenotype association reveals intrinsic modularity in genetic networks (in bacteria)
Olivier Elemento, Tavazoie lab

Some bacterial phenotypes:
• Motility
• Gram-staining
• Spore formation
• Hyperthermophily
Can we find the genes underlying these phenotypes?
http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi

Motility in bacteria
• Some (but not all) bacteria are motile
• Motile bacteria may share genes involved in motility
• These genes may be absent from nonmotile bacteria

Phylogenetic profiles across ~200 bacterial genomes (Levesque et al, 2003; Jim, Parmar, Singh and Tavazoie, 2004)
• Motility is recorded as present or absent in each genome, and each gene (e.g. E. coli gene X) gets a present/absent profile across the same genomes.
• If the profile of E. coli gene Y is highly correlated with motility, gene Y is likely involved in motility.
• The same comparison applies to genes from other species, e.g. B. subtilis gene Z (CheV).

• Calculate a phylogenetic profile for all 600,000 bacterial genes (600,000 genes × ~200 genomes ≈ 1.2 × 10^8 BLAST searches)
• Collect the genes most correlated with the phenotype in all bacteria that have the phenotype (~3,000 for motility)
• Merge homologous genes (based on sequence similarity)

Merging homologous (orthologous/paralogous) genes
The ~3,000 motility genes collapse into 75 groups of homologs (Generic Genes). For example, E. coli gene Y, B. subtilis gene Y, B. anthracis gene Y and C. jejuni gene Y are merged into Generic Gene Y.

Can we recover such modules?
The Generic Genes associated with motility (e.g. V, W, Y and Z) may fall into distinct modules, e.g. Module 1 (Generic Genes V and Z) and Module 2 (Generic Genes W and Y).
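The profiling and merging steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the BLAST e-value cutoff that defines "present" is a hypothetical value, Pearson correlation (the phi coefficient on 0/1 data) stands in for whatever association score was actually used, the per-genome selection is simplified to one top-k cut, and homolog merging is done by single-linkage grouping (union-find) over sequence-similarity pairs that are assumed to be given. The helper names (`phylogenetic_profile`, `top_correlated`, `merge_homologs`) are illustrative only.

```python
from math import sqrt

# Hypothetical cutoff: the slides do not state which BLAST threshold defines "present".
EVALUE_CUTOFF = 1e-10


def phylogenetic_profile(best_evalues, genomes, cutoff=EVALUE_CUTOFF):
    """Presence/absence vector for one gene across genomes.

    best_evalues maps genome name -> e-value of the gene's best BLAST hit in
    that genome; genomes with no hit are simply missing from the dict.
    """
    return [1 if best_evalues.get(g, float("inf")) <= cutoff else 0 for g in genomes]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (phi coefficient for 0/1 data)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no information about the phenotype
    return cov / sqrt(vx * vy)


def top_correlated(profiles, phenotype, k):
    """The k genes whose profiles correlate best with the phenotype profile."""
    ranked = sorted(profiles, key=lambda gene: pearson(profiles[gene], phenotype), reverse=True)
    return ranked[:k]


def merge_homologs(genes, similar_pairs):
    """Union-find over similarity links: each connected group is one 'generic gene'."""
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving keeps the trees shallow
            g = parent[g]
        return g

    for a, b in similar_pairs:
        if a in parent and b in parent:
            parent[find(a)] = find(b)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), set()).add(g)
    return list(groups.values())


# Toy data: 4 genomes, the first two motile.
genomes = ["g1", "g2", "g3", "g4"]
motility = [1, 1, 0, 0]
profiles = {
    "ecoli_Y": [1, 1, 0, 0],
    "bsub_Y": [1, 1, 0, 1],
    "housekeeping": [1, 1, 1, 1],
}
candidates = top_correlated(profiles, motility, k=2)
print(candidates)  # ['ecoli_Y', 'bsub_Y']
print(merge_homologs(candidates, [("ecoli_Y", "bsub_Y")]))  # a single generic gene
```

In the toy run, the gene present in every genome scores 0 and is dropped, while the two motility-correlated genes are kept and, being linked by similarity, merge into one generic gene. Single linkage is the simplest grouping choice; it can chain distant paralogs together, so a stricter rule may be preferable on real data.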
• Cluster Generic Gene profiles 1,000 times using Iclust with different random initializations (this yields slightly different clusters)
• Group together genes that almost always end up in the same cluster
Iclust: Slonim et al, 2006

Motility (GG index: annotation)
GG-3: flagellar biosynthetic protein flhB
GG-4: flagellar biosynthetic protein flhA
GG-5: flagellar biosynthetic protein fliP
GG-22: flagellar biosynthetic protein fliR
GG-56: flagellar biosynthetic protein fliQ
GG-6: flagellar hook flgE/F/G
GG-7: flagellar motor switch fliG
GG-10: flagellar basal-body rod flgC
GG-12: flagellar MS-ring fliF
GG-13: flagellar hook-associated protein 1 flgK
GG-18: flagellar motor switch fliN
GG-21: flagellar motor switch fliM
GG-27: flagellar hook-associated protein 3 flgL
GG-29: flagellar hook-associated protein 2 fliD
GG-8: flagellin fliC
GG-17: motility protein A motA
GG-74: flagellar protein fliS
GG-20: motility protein B motB
GG-1: methyl-accepting chemotaxis protein
GG-11: chemotaxis protein cheA
GG-45: methyl-accepting chemotaxis protein
GG-73: methyl-accepting chemotaxis protein
GG-38: chemotaxis protein cheV
GG-15: chemotaxis protein cheW
GG-2: chemotaxis methyltransferase cheR
GG-30: glutamate methylesterase cheB
GG-32: flagellar L-ring protein precursor flgH
GG-36: flagellar P-ring protein precursor flgI
GG-9: RNA-polymerase sigma-54 factor
GG-14: transcription factor, sigma-54-dependent

These results are based on no prior knowledge, apart from genome sequences and their phenotypic annotations.

Phylogenetic profiles / modules for motility: E. coli chemotaxis and flagellum modules
Some E. coli genes are not recovered. Why?
Motility vs. fliI, cheY and fliO, cheZ (examples of E. coli genes that are not recovered)

Phylogenetic profiles / modules for Gram-staining
GG-2: 3-deoxy-manno-octulosonate cytidylyltransferase
GG-3: UDP-3-O glucosamine N-acyltransferase
GG-4: lipid-A-disaccharide synthase
GG-5: polysialic acid capsule expression protein
GG-7: UDP-3-O N-acetylglucosamine deacetylase
GG-8: 3-deoxy-D-manno-octulosonic-acid transferase
GG-11: tetraacyldisaccharide 4'-kinase
GG-1: outer membrane protein yaeT
GG-20: HlyD family secretion protein
GG-96: HlyD family secretion protein
GG-53: HlyD family secretion protein
GG-111: membrane fusion protein (MFP)
GG-15: pyridoxal phosphate biosynthetic protein
GG-52: pyridoxal phosphate biosynthetic protein
GG-35: ABC transporter, permease
GG-9: PAL peptidoglycan-associated lipoprotein
GG-10: tolQ/exbB protein
GG-12: tolB protein
GG-72: lipid A biosynthesis lauroyl acyltransferase
GG-68: glutaredoxin 3
GG-29: 2-octaprenyl-6-methoxyphenol hydroxylase
GG-31: glutathione synthetase
GG-18: glutaredoxin-related protein
GG-73: coproporphyrinogen III oxidase, aerobic
GG-107: hydroxyacylglutathione hydrolase

Phylogenetic profiles / modules for spore formation
GG-63: spore-cortex-lytic enzyme
GG-87: spore germination protein
GG-104: spore protease
GG-136: spore protease related
GG-71: stage III sporulation protein AB
GG-103: stage III sporulation protein AE
GG-132: stage III sporulation protein AG
GG-95: stage II sporulation protein E
GG-137: stage II sporulation protein M
GG-11: stage II sporulation protein P
GG-134: stage II sporulation protein R
GG-135: stage IV sporulation protein
GG-76: stage IV sporulation protein A
GG-46: stage IV sporulation protein B
GG-40: stage V sporulation protein AC
GG-34: stage V sporulation protein AD
GG-15: stage V sporulation protein AF
GG-37: translocation-enhancing protein
GG-94: hypothetical membrane protein
GG-127: hypothetical membrane protein

Focused hypotheses for experimental validation
GG-8: sporulation-blocking protein yabP
GG-130: sporulation sigma-E factor processing peptidase
GG-58: stage III sporulation protein AC
GG-6: stage III sporulation protein AD
GG-3: stage III sporulation protein D
GG-49: small acid-soluble spore protein I sspI
GG-69: spoVID-dependent spore coat assembly factor
GG-101: spore coat protein
GG-52: spore coat protein E
GG-99: spore coat related, putative
GG-97: spore cortex biosynthesis, putative
GG-84: spore germination protein
GG-90: spore germination protein
GG-55: spore germination protein C1
GG-62: sporulation initiation phosphotransferase
GG-113: stage III sporulation protein AF
GG-64: stage IV sporulation protein FA
GG-91: stage VI sporulation protein D
GG-54: abi, CAAX amino terminal protease
GG-42: cytochrome C-550/C-551
GG-53: cytochrome C oxidase subunit IV
GG-36: menaquinol-cytochrome C reductase qcrC
GG-50: lipoprotein, putative
GG-18: prespore-specific transcriptional regulator
GG-66: putative lipoprotein
GG-56: putative ribonuclease H
GG-26: reductase ribT / acetyltransferase gnaT
GG-124: hypothetical membrane protein
GG-118: hypothetical membrane protein
GG-29: hypothetical cytosolic protein
GG-38: hypothetical cytosolic protein
GG-120: hypothetical cytosolic protein
GG-24, GG-27, GG-28, GG-30, GG-31, GG-32, GG-33, GG-41, GG-43, GG-47, GG-60, GG-61, GG-65, GG-67, GG-68, GG-70, GG-72, GG-73, GG-83: hypothetical protein
GG-88: hypothetical protein, HD domain
GG-100: hypothetical protein (ecsc)
GG-114, GG-116, GG-117: hypothetical protein

• Community sequencing

Conclusion
• Systematic association of genotype and phenotype for several phenotypes
• Clustering reveals robust modules that correspond to protein complexes, signal transduction pathways and enzymatic pathways
• Many predictions that can be verified experimentally
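The module-recovery step described earlier (cluster the Generic Gene profiles 1,000 times with different random initializations, then group genes that almost always share a cluster) can be sketched as a consensus co-clustering computation. This is a minimal sketch under assumptions: Iclust itself is not reimplemented here, so the per-run cluster assignments are taken as input; the 0.9 co-clustering threshold is a hypothetical reading of "almost always"; and modules are taken as connected components of the resulting gene graph. The function names are illustrative only.

```python
from itertools import combinations


def co_clustering_frequency(labelings):
    """Fraction of runs in which each pair of genes lands in the same cluster.

    labelings: one dict per clustering run, mapping gene -> cluster label.
    Labels are arbitrary per run, so only "same vs. different" is compared.
    """
    genes = sorted(labelings[0])
    counts = dict.fromkeys(combinations(genes, 2), 0)
    for labels in labelings:
        for a, b in counts:
            if labels[a] == labels[b]:
                counts[(a, b)] += 1
    return {pair: c / len(labelings) for pair, c in counts.items()}


def robust_modules(labelings, threshold=0.9):
    """Connected components of the graph linking genes that co-cluster in >= threshold of runs."""
    genes = sorted(labelings[0])
    neighbors = {g: set() for g in genes}
    for (a, b), freq in co_clustering_frequency(labelings).items():
        if freq >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    seen, modules = set(), []
    for start in genes:
        if start in seen:
            continue
        stack, module = [start], []
        seen.add(start)
        while stack:  # depth-first walk over strongly co-clustered genes
            gene = stack.pop()
            module.append(gene)
            for nb in neighbors[gene]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        modules.append(sorted(module))
    return modules


# Four toy runs (the slides use 1,000); cluster labels differ from run to run.
runs = [
    {"GG-3": 0, "GG-4": 0, "GG-1": 1, "GG-11": 1},
    {"GG-3": 5, "GG-4": 5, "GG-1": 2, "GG-11": 2},
    {"GG-3": 0, "GG-4": 0, "GG-1": 1, "GG-11": 3},
    {"GG-3": 1, "GG-4": 1, "GG-1": 0, "GG-11": 0},
]
print(robust_modules(runs, threshold=0.9))  # [['GG-1'], ['GG-11'], ['GG-3', 'GG-4']]
print(robust_modules(runs, threshold=0.7))  # [['GG-1', 'GG-11'], ['GG-3', 'GG-4']]
```

Comparing genes pairwise, rather than matching cluster labels across runs, sidesteps the fact that cluster numbering is arbitrary in each run. In the toy data GG-3 and GG-4 co-cluster in every run and form a robust module at either threshold, while GG-1 and GG-11 (together in 3 of 4 runs) only join at the looser threshold.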
Acknowledgements
• Saeed Tavazoie
• Noam Slonim
• Tavazoie lab members