Ab initio genotype-phenotype association reveals intrinsic modularity in genetic networks (in bacteria)
Olivier Elemento, Tavazoie lab
Some bacterial phenotypes …
Motility
Gram-staining
Spore formation
Hyper-thermophily
Can we find the genes underlying these phenotypes?
http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi
Motility in bacteria
• Some (but not all) bacteria are motile
• Motile bacteria may share genes involved
in motility
• These genes may be absent from nonmotile bacteria
[Figure: presence/absence profile of motility across ~200 bacterial genomes]
(Levesque et al, 2003; Jim, Parmar, Singh and Tavazoie, 2004)
[Figure: presence/absence profiles of motility and of E. coli Gene X across ~200 bacterial genomes]
[Figure: presence/absence profiles of motility, E. coli Gene X and E. coli Gene Y across ~200 bacterial genomes]
High correlation between the Gene Y and motility profiles: Gene Y is likely involved in motility
[Figure: presence/absence profiles of motility and of B. subtilis gene Z (e.g. CheV) across ~200 bacterial genomes]
• Calculate a phylogenetic profile for all 600,000
genes in bacteria (~1.2x10^8 BLASTs)
• Collect the genes most correlated to the
phenotype in all bacteria that have the
phenotype (~3,000 for motility)
• Merge homologous genes (based on sequence
similarity)
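The first two steps above can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the talk: it assumes the presence/absence profiles have already been derived (for example from BLAST hits; the ~1.2x10^8 figure is consistent with comparing each of 600,000 genes against ~200 genomes), and it uses Pearson correlation with a cutoff of 0.7, both of which are assumptions since the slides only say "most correlated". The gene names and toy data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length 0/1 vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no information
    return cov / sqrt(vx * vy)

def correlated_genes(profiles, phenotype, threshold=0.7):
    """Return (gene, r) pairs with r >= threshold, best first.

    The threshold is an assumed cutoff, not a value from the talk.
    """
    scored = [(gene, pearson(p, phenotype)) for gene, p in profiles.items()]
    hits = [(g, r) for g, r in scored if r >= threshold]
    return sorted(hits, key=lambda t: -t[1])

# Toy data: 8 genomes instead of ~200; 1 = present, 0 = absent.
motility = [1, 1, 1, 0, 0, 1, 0, 1]
profiles = {
    "gene_X": [1, 0, 1, 1, 0, 0, 1, 1],  # unrelated to the phenotype
    "gene_Y": [1, 1, 1, 0, 0, 1, 0, 1],  # identical to the phenotype
    "gene_Z": [1, 1, 1, 0, 0, 1, 0, 0],  # missing from one motile genome
}
hits = correlated_genes(profiles, motility)
for gene, r in hits:
    print(f"{gene}\t{r:.2f}")
```

In this toy case gene_Y and gene_Z pass the cutoff while gene_X does not. Any other association score (for instance mutual information) could be swapped in for `pearson` without changing the rest.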
Merging homologous (orthologous/paralogous) genes
~3,000 motility genes → 75 groups of homologs (Generic Genes)
[Figure: the motility-correlated Gene Y homologs from E. coli, B. subtilis, B. anthracis and C. jejuni are merged into Generic Gene Y]
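One simple way to merge homologs into Generic Genes is to treat sequence similarity as a graph and take its connected components. The sketch below does this with a union-find structure. It is an assumption about how the merging could be done, not the method from the talk: the list of similar pairs (for example BLAST hits below some e-value cutoff) is taken as given, and all gene identifiers are hypothetical.

```python
def merge_homologs(genes, similar_pairs):
    """Group genes into connected components of the similarity graph."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the root, shortening the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in similar_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # union the two groups

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())

# Hypothetical identifiers: four species' copies of Gene Y plus one other gene.
genes = ["Ecoli_Y", "Bsubtilis_Y", "Banthracis_Y", "Cjejuni_Y", "Ecoli_V"]
pairs = [
    ("Ecoli_Y", "Bsubtilis_Y"),
    ("Bsubtilis_Y", "Banthracis_Y"),
    ("Cjejuni_Y", "Ecoli_Y"),
]
generic_genes = merge_homologs(genes, pairs)
print(generic_genes)
```

Single linkage of this kind is permissive: one spurious similarity hit is enough to fuse two otherwise distinct groups, so a stricter criterion may be preferable on real data.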
Can we recover such modules?
[Figure: Generic Genes V, W, Y and Z are all associated with motility; V and Z form Module 1, W and Y form Module 2]
• Cluster Generic Gene profiles 1,000 times
using Iclust with different random
initializations (obtain slightly different
clusters)
• Group together genes which almost
always end up in the same cluster
Iclust: Slonim et al, 2006
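The consensus idea above can be sketched as follows. Iclust itself is not implemented here: a deliberately crude one-pass clusterer (random centers, Hamming distance) stands in for it, purely to produce clusterings that vary with the random initialization. The co-clustering threshold of 0.7, the value k=2 and the toy profiles are all assumptions made for illustration.

```python
import random
from itertools import combinations

def hamming(p, q):
    """Number of genomes in which two presence/absence profiles differ."""
    return sum(a != b for a, b in zip(p, q))

def crude_cluster(profiles, k, rng):
    """Stand-in for Iclust: pick k random genes as centers and assign
    every gene to its nearest center (ties go to the first center)."""
    names = list(profiles)
    centers = rng.sample(names, k)
    return {
        g: min(range(k), key=lambda i: hamming(profiles[g], profiles[centers[i]]))
        for g in names
    }

def consensus_modules(profiles, k=2, runs=1000, threshold=0.7, seed=0):
    """Group genes that share a cluster in at least `threshold` of the runs."""
    rng = random.Random(seed)
    names = sorted(profiles)
    together = {pair: 0 for pair in combinations(names, 2)}
    for _ in range(runs):
        labels = crude_cluster(profiles, k, rng)
        for a, b in together:
            if labels[a] == labels[b]:
                together[(a, b)] += 1
    # Link pairs that almost always co-cluster, then read the modules
    # off as connected components of that graph.
    module_of = {g: {g} for g in names}
    for (a, b), count in together.items():
        if count / runs >= threshold and module_of[a] is not module_of[b]:
            merged = module_of[a] | module_of[b]
            for g in merged:
                module_of[g] = merged
    unique = {id(m): m for m in module_of.values()}
    return sorted(sorted(m) for m in unique.values())

# Toy profiles over 8 genomes (hypothetical data).
profiles = {
    "GG-V": [1, 1, 1, 1, 0, 0, 0, 0],
    "GG-Z": [1, 1, 1, 0, 0, 0, 0, 0],
    "GG-W": [0, 0, 0, 0, 1, 1, 1, 1],
    "GG-Y": [0, 0, 0, 0, 0, 1, 1, 1],
}
modules = consensus_modules(profiles)
print(modules)
```

Individual runs of the stand-in clusterer sometimes split a module or lump genes from both modules together, depending on which centers are drawn. Counting co-occurrence over many runs filters out those unstable assignments, which is the point of the procedure described on the slide.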
Motility GG index
GG-3 flagellar biosynthetic protein flhB
GG-4 flagellar biosynthetic protein flhA
GG-5 flagellar biosynthetic protein fliP
GG-22 flagellar biosynthetic protein fliR
GG-56 flagellar biosynthetic protein fliQ
GG-6 flagellar hook flgE/F/G
GG-7 flagellar motor switch fliG
GG-10 flagellar basal-body rod flgC
GG-12 flagellar MS-ring fliF
GG-13 flagellar hook-associated protein 1 flgK
GG-18 flagellar motor switch fliN
GG-21 flagellar motor switch fliM
GG-27 flagellar hook-associated protein 3 flgL
GG-29 flagellar hook-associated protein 2 fliD
GG-8 flagellin fliC
GG-17 motility protein A motA
GG-74 flagellar protein fliS
GG-20 motility protein B motB
GG-1 methyl-accepting chemotaxis protein
GG-11 chemotaxis protein cheA
GG-45 methyl-accepting chemotaxis protein
GG-73 methyl-accepting chemotaxis protein
GG-38 chemotaxis protein cheV
GG-15 chemotaxis protein cheW
GG-2 chemotaxis methyltransferase cheR
GG-30 glutamate methylesterase cheB
GG-32 flagellar L-ring protein precursor flgH
GG-36 flagellar P-ring protein precursor flgI
Motility GG index
These results are based on no prior
knowledge, apart from genome sequences
along with their phenotypic annotations
GG-9 RNA-polymerase sigma-54 factor
GG-14 transcription factor, sigma-54-dependent
Phylogenetic profiles / modules for motility
E. coli chemotaxis and flagellum modules
Some E. coli genes are not recovered. Why?
Motility
fliI, cheY
fliO, cheZ
Phylogenetic profiles / modules for Gram-staining
GG-2 3-deoxy-manno-octulosonate cytidylyltransferase
GG-3 UDP-3-O glucosamine N-acyltransferase
GG-4 lipid-A-disaccharide synthase
GG-5 polysialic acid capsule expression protein
GG-7 UDP-3-O N-acetylglucosamine deacetylase
GG-8 3-deoxy-D-manno-octulosonic-acid transferase
GG-11 tetraacyldisaccharide 4'-kinase
GG-1 outer membrane protein yaeT
GG-20 HlyD family secretion protein
GG-96 HlyD family secretion protein
GG-53 HlyD family secretion protein
GG-111 membrane fusion protein (MFP)
GG-15 pyridoxal phosphate biosynthetic protein
GG-52 pyridoxal phosphate biosynthetic protein
GG-35 ABC transporter, permease
GG-9 PAL peptidoglycan-associated lipoprotein
GG-10 tolQ/exbB protein
GG-12 tolB protein
GG-72 lipid A biosynthesis lauroyl acyltransferase
GG-68 glutaredoxin 3
GG-29 2-octaprenyl-6-methoxyphenol hydroxylase
GG-31 glutathione synthetase
GG-18 glutaredoxin-related protein
GG-73 coproporphyrinogen III oxidase, aerobic
GG-107 hydroxyacylglutathione hydrolase
GG-63 spore-cortex-lytic enzyme
GG-87 spore germination protein
GG-104 spore protease
GG-136 spore protease related
GG-71 stage III sporulation protein AB
GG-103 stage III sporulation protein AE
GG-132 stage III sporulation protein AG
GG-95 stage II sporulation protein E
GG-137 stage II sporulation protein M
GG-11 stage II sporulation protein P
GG-134 stage II sporulation protein R
GG-135 stage IV sporulation protein
GG-76 stage IV sporulation protein A
GG-46 stage IV sporulation protein B
GG-40 stage V sporulation protein AC
GG-34 stage V sporulation protein AD
GG-15 stage V sporulation protein AF
GG-37 translocation-enhancing protein
GG-94 hypothetical membrane protein
GG-127 hypothetical membrane protein
Focused hypotheses for
experimental validation
GG-8 sporulation-blocking protein yabP
GG-130 sporulation sigma-E factor processing peptidase
GG-58 stage III sporulation protein AC
GG-6 stage III sporulation protein AD
GG-3 stage III sporulation protein D
GG-49 small acid-soluble spore protein I sspI
GG-69 spoVID-dependent spore coat assembly factor
GG-101 spore coat protein
GG-52 spore coat protein E
GG-99 spore coat related, putative
GG-97 spore cortex biosynthesis, putative
GG-84 spore germination protein
GG-90 spore germination protein
GG-55 spore germination protein C1
GG-62 sporulation initiation phosphotransferase
GG-113 stage III sporulation protein AF
GG-64 stage IV sporulation protein FA
GG-91 stage VI sporulation protein D
GG-54 abi, CAAX amino terminal protease
GG-42 cytochrome C-550/C-551
GG-53 cytochrome C oxidase subunit IV
GG-36 menaquinol-cytochrome C reductase qcrC
GG-50 lipoprotein, putative
GG-18 prespore-specific transcriptional regulator
GG-66 putative lipoprotein
GG-56 putative ribonuclease H
GG-26 reductase ribT / acetyltransferase gnaT
GG-124 hypothetical membrane protein
GG-118 hypothetical membrane protein
GG-29 hypothetical cytosolic protein
GG-38 hypothetical cytosolic protein
GG-120 hypothetical cytosolic protein
GG-24 hypothetical protein
GG-27 hypothetical protein
GG-28 hypothetical protein
GG-30 hypothetical protein
GG-31 hypothetical protein
GG-32 hypothetical protein
GG-33 hypothetical protein
GG-41 hypothetical protein
GG-43 hypothetical protein
GG-47 hypothetical protein
GG-60 hypothetical protein
GG-61 hypothetical protein
GG-65 hypothetical protein
GG-67 hypothetical protein
GG-68 hypothetical protein
GG-70 hypothetical protein
GG-72 hypothetical protein
GG-73 hypothetical protein
GG-83 hypothetical protein
GG-88 hypothetical protein, HD domain
GG-100 hypothetical protein (ecsc)
GG-114 hypothetical protein
GG-116 hypothetical protein
GG-117 hypothetical protein
• Community sequencing
Conclusion
• Systematic association of genotype / phenotype
for several phenotypes
• Clustering reveals robust modules that
correspond to protein complexes, signal
transduction pathways, enzymatic pathways
• Many predictions that can be verified
experimentally
Acknowledgements
• Saeed Tavazoie
• Noam Slonim
• Tavazoie lab members