Evolution of regulatory interactions in bacteria
Mikhail Gelfand
Institute for Information Transmission Problems, RAS
4th Bertinoro Computational Biology (BCB) Meeting “Evolution of and Comparative Approaches to Gene Regulation”
24-30 June 2006

Это – ряд наблюдений. В углу – тепло.
Взгляд оставляет на вещи след.
Вода представляет собой стекло.
Человек страшней, чем его скелет.
Иосиф Бродский

A list of some observations. In a corner, it’s warm.
A glance leaves an imprint on anything it’s dwelt on.
Water is glass’s most public form.
Man is more frightening than its skeleton.
Joseph Brodsky

Plan
• Evolution of individual sites
• Coevolution of transcription factors and their binding signals
• Distribution of transcription factor families in various genomes
• Evolution of simple and complex regulatory systems

Birth and death of sites is a very dynamic process
NadR-binding sites upstream of pnuB seem absent in Klebsiella pneumoniae and Serratia marcescens …
… but there are candidate sites further upstream …
… and they are clearly different (not simply misaligned).

Loss of regulators and cryptic sites
Loss of RbsR in Y. pestis (the ABC transporter is also lost)
Figure labels: RbsR binding site; start codon of rbsD

Unexpected conservation of non-consensus positions in orthologous sites
Regulatory site of LexA upstream of lexA (consensus nucleotides are in caps)

Escherichia coli        TgCTGTATATActcACAGcA
Salmonella typhi        aACTGTATATActcACAGcA
Yersinia pestis         agCTGTATATActcACAGcA
Haemophilus influenzae  atCTGTATAcAatacCAGTt
Pasteurella multocida   TtCTGTATATAataACAGTt
Vibrio cholerae         cACTGgATATActcACAGTc

Wrong consensus?
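The observation on the LexA alignment can be checked mechanically. The sketch below is only an illustration, not the method used in the talk: it takes the six sites exactly as listed above, treats lowercase letters as non-consensus (as the slide states), and reports the columns where all sites share the same non-consensus nucleotide. The function names and the uncorrected information-content formula (2 bits minus the Shannon entropy of the column, with no small-sample correction) are assumptions made for this example.

```python
from collections import Counter
from math import log2

# LexA sites upstream of lexA, copied from the alignment above.
# Uppercase = consensus nucleotide, lowercase = non-consensus.
SITES = {
    "Escherichia coli":       "TgCTGTATATActcACAGcA",
    "Salmonella typhi":       "aACTGTATATActcACAGcA",
    "Yersinia pestis":        "agCTGTATATActcACAGcA",
    "Haemophilus influenzae": "atCTGTATAcAatacCAGTt",
    "Pasteurella multocida":  "TtCTGTATATAataACAGTt",
    "Vibrio cholerae":        "cACTGgATATActcACAGTc",
}


def information_content(sites):
    """Per-column information content in bits (2 - entropy), case ignored.

    Assumption: uniform background and no small-sample correction.
    """
    n = len(sites)
    result = []
    for column in zip(*(s.upper() for s in sites)):
        counts = Counter(column)
        entropy = -sum((c / n) * log2(c / n) for c in counts.values())
        result.append(2.0 - entropy)
    return result


def conserved_non_consensus(sites):
    """1-based columns where every site has the same lowercase nucleotide."""
    hits = []
    for position, column in enumerate(zip(*sites), start=1):
        if len(set(column)) == 1 and column[0].islower():
            hits.append(position)
    return hits


if __name__ == "__main__":
    all_sites = list(SITES.values())
    # Column conserved as non-consensus in all six genomes.
    print(conserved_non_consensus(all_sites))      # [13]
    # Columns conserved as non-consensus in the three enterobacteria.
    print(conserved_non_consensus(all_sites[:3]))  # [12, 13, 14, 19]
    print([round(x, 2) for x in information_content(all_sites)])
```

Position 13 is a non-consensus "t" in all six genomes, which is the kind of column that prompts the question of whether the consensus itself is wrong.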
TF PurR, gene purL

Escherichia coli        ACGCAAACGgTTtCGT
Salmonella typhi        ACGCAAACGgTTtCGT
Yersinia pestis         ACGCAAACGgTTtCGT
Haemophilus influenzae  AtGCAAACGTTTGCtT
Pasteurella multocida   ACGCAAACGTTTtCGT
Vibrio cholerae         ACGCAAACGgTTGCtT

TF PurR, gene purM

Escherichia coli        tCGCAAACGTTTGCtT
Salmonella typhi        tCGCAAACGTTTGCtT
Yersinia pestis         tCGCAAACGTTTGCcT
Haemophilus influenzae  tCGCAAACGTTTGCtT
Pasteurella multocida   tCGCAAACGTTTGCtT
Vibrio cholerae         ACGCAAACGTTTtCcT

Non-consensus positions are more conserved than synonymous codon positions

Relative conservation of non-consensus nucleotides may be higher than conservation of consensus nucleotides

Regulators and their signals
• Subtle changes at close evolutionary distances
• Cases of signal conservation at surprisingly large distances
• Changes in spacing / geometry of dimers
• Correlation between contacting nucleotides and amino acid residues

The LacI family: subtle changes in signals at close distances
Figure: sequence logos

NrdR (regulator of ribonucleotide reductases and some other replication-related genes): conservation at large distances

BirA (biotin regulator in eubacteria and archaea): conserved signal, changed spacing
Profile 1: Gram-positive bacteria, Archaea
Profile 2: Gram-negative bacteria

DNA signals and protein-DNA interactions
Entropy at aligned sites and the number of contacts (heavy atoms in a base pair at a distance < cutoff from a protein atom)
Figure panels: CRP, PurR, IHF, TrpR

Specificity-determining positions in the LacI family
• Training set: 459 sequences, average length: 338 amino acids, 85 specificity groups
– 44 SDPs
LacI from E. coli:
– 10 residues contact NPF (analog of the effector)
– 7 residues in the effector contact zone (5 Å < dmin < 10 Å)
– 6 residues in the intersubunit contacts
– 5 residues in the intersubunit contact zone (5 Å < dmin < 10 Å)
– 7 residues contact the operator sequence
– 6 residues in the operator contact zone (5 Å < dmin < 10 Å)

CRP/FNR family of regulators

CooA in Desulfovibrio spp.    TGTCGGCnnGCCGACA
CRP in Gamma-proteobacteria   TTGTGAnnnnnnTCACAA
HcpR in Desulfovibrio spp.    TTGTgAnnnnnnTcACAA
FNR in Gamma-proteobacteria   TTGATnnnnATCAA

Correlation between contacting nucleotides and amino acid residues

DD  COOA  ALTTEQLSLHMGATRQTVSTLLNNLVR
DV  COOA  ELTMEQLAGLVGTTRQTASTLLNDMIR
EC  CRP   KITRQEIGQIVGCSRETVGRILKMLED
YP  CRP   KXTRQEIGQIVGCSRETVGRILKMLED
VC  CRP   KITRQEIGQIVGCSRETVGRILKMLEE
DD  HCPR  DVSKSLLAGVLGTARETLSRALAKLVE
DV  HCPR  DVTKGLLAGLLGTARETLSRCLSRMVE
EC  FNR   TMTRGDIGNYLGLTVETISRLLGRFQK
YP  FNR   TMTRGDIGNYLGLTVETISRLLGRFQK
VC  FNR   TMTRGDIGNYLGLTVETISRLLGRFQK

Contacting residues: REnnnR
• TG: 1st arginine
• GA: glutamate and 2nd arginine

The correlation holds for other factors in the family

Distribution of TF families in bacterial genomes
Figure: TF family content of Pseudomonas aeruginosa, Streptomyces coelicolor, Agrobacterium tumefaciens, Escherichia coli, and Bacillus subtilis; labeled families: TetR, LysR, LuxR, LacI, GntR, AraC (ExtraTrain database)

Strategies of successful TF families
• One ortholog per genome:
– LexA, NrdR, HrcA, ArgR
– present even in archaea: BirA (also an enzyme), ModE
• Several (2-3) orthologs per genome
– CRP/FNR, FUR
• Local explosions
– LacI in alpha- and gamma-proteobacteria
– 2CS systems in delta-proteobacteria
– sigma-factors in Streptomyces
• Because TFs in a family tend to have related functions, and these might depend on the lifestyle?

LacI family regulons in closely related strains (top: TFs, bottom: regulated genes)
Figure panels: seven Escherichia and Shigella spp.; four Bacillus cereus and B. anthracis strains; five Salmonella spp.

What are the driving forces for the present-day state?
• Expansion and contraction of regulons
• Duplications of regulators with or without regulated loci
• Loss of regulators with or without regulated loci
• Re-assortment of regulators and structural genes
• … especially in complex systems
• Horizontal transfer

Regulon expansion: how FruR has become CRA
Figure (three panels showing the same pathway map): sugars Mannose, Glucose, Mannitol, Fructose; genes manXYZ, ptsHI-crr, edd, epd, eda, adhE, aceEF, mtlA, gapA, fbp, pykF, mtlD, fruBA, fruK, pfkA, pgk, gpmA, icdA, ppsA, pckA, aceA, tpiA, aceB
Panel 1: Gamma-proteobacteria; common ancestor of Enterobacteriales
Panel 2: Gamma-proteobacteria; Enterobacteriales; common ancestor of Escherichia and Salmonella
Panel 3: Gamma-proteobacteria; Enterobacteriales; E. coli and Salmonella spp.

Trehalose/maltose catabolism in alpha-proteobacteria
Duplicated LacI-family regulators: lineage-specific post-duplication loss
The binding signals are very similar (the blue branch is somewhat different: to avoid cross-recognition?)

Utilization of an unknown galactoside in gamma-proteobacteria
Yersinia and Klebsiella: two regulons, GalR (not shown, includes genes galK and galT) and LacI-X
Erwinia: one regulon, GalR
Loss of regulator and merger of regulons: it seems that lacI-X was present in the common ancestor (Klebsiella is an outgroup)

Utilization of maltose/maltodextrin in Firmicutes
• Two different ABC transporters (shades of red)
• PTS (pink)
• Glucoside hydrolases (shades of green)
• Two regulators (black and grey)

Modularity of the functional subsystem
• Two different ABC systems
• Three hydrolases in one operon (E. faecalis) or separately

Changes of regulation
• Two different ABC systems
• Displacement: invasion of a regulator from a different subfamily (horizontal transfer from a related species?) – blue sites

Orthologous TFs with completely different regulons

Utilization of xylose in alpha-proteobacteria
• xylBA
• Three different ABC transporters
• Three regulators: two from the LacI family and one from the ROK family

Changes in operon structure

Changes in regulation
• Displacement: operon regulation changed from XylR-1 to XylR-2 (different subfamily)
• Duplication and displacement: duplicated XylR-1a assumed the role of the ROK-family regulator

Catabolism of gluconate in proteobacteria
Extreme variability of regulation of “marginal” regulon members
Figure labels: β; Pseudomonas spp.; γ

Regulation of amino acid biosynthesis in Firmicutes
• Interplay between regulatory RNA elements and transcription factors
• Expansion of T-box systems (normally RNA structures regulating aminoacyl-tRNA synthetases)

Aromatic amino acid regulons

Five regulatory systems for methionine biosynthesis
A. SAM-dependent RNA riboswitch
B. Met-tRNA-dependent T-box (RNA)
C, D, E. Repressors of transcription

Methionine regulatory systems: loss of S-box regulons
• S-boxes (SAM-I riboswitch)
– Bacillales
– Clostridiales
– the Zoo:
  • Petrotoga
  • actinobacteria (Streptomyces, Thermobifida)
  • Chlorobium, Chloroflexus, Cytophaga
  • Fusobacterium
  • Deinococcus
  • proteobacteria (Xanthomonas, Geobacter)
• Met-T-boxes (Met-tRNA-dependent attenuator) + SAM-2 riboswitch for metK
– Lactobacillales
• MET-boxes (candidate transcription signal)
– Streptococcales
Figure: tree with branches labeled ZOO, Lact., Strep., Bac., Clostr.

Mapping the events to the phylogenetic tree
Figure: tree with leaves Bacillus subtilis and related species, Bacillus cereus and related species, Lactobacillus spp., Streptococcus spp., Clostridium spp.; mapped events and labels: loss of S-boxes (SAM-I riboswitches); expansion of Met-T-boxes, emergence of SAM-2 riboswitches; emergence of MtaR; Trp-T-boxes; TRAP; Tyr-T-boxes; PCE; ARO

Combined regulatory network for iron homeostasis genes in α-proteobacteria
Figure: network connecting the transcription factors RirA, Irr, Fur, and IscR ([-Fe] and [+Fe] states) with siderophore uptake, Fe2+/Fe3+ uptake, iron uptake systems, iron storage ferritins, FeS synthesis, heme synthesis, and iron-requiring enzymes (iron cofactor); further labels: FeS, heme, degraded, FeS status of cell
The connecting lines denote regulatory interactions, with the thickness reflecting the frequency of the interaction in the analyzed genomes. The suggested negative or positive mode of operation is shown by a dead end or an arrow end of the line, respectively.

Fe and Mn regulons
Distribution of Irr, Fur/Mur, MntR, RirA, and IscR regulons in α-proteobacteria
Table: columns Organism, Abb., Irr, MUR / FUR, MntR, RirA, IscR (entries + / -); rows are α-proteobacterial genomes in groups A, B, C, and D, covering Rhizobiales (Rhizobiaceae, Bartonella, Bradyrhizobiaceae), Rhodobacterales (Rhodobacteraceae), Hyphomonadaceae, Sphingomonadales, Caulobacterales, Parvularculales, Rhodospirillales, the SAR11 cluster, and Rickettsiales
'#?' in the RirA column denotes the absence of the rirA gene in an unfinished genomic sequence and the presence of candidate RirA-binding sites upstream of the iron uptake genes.

Distribution of the conserved members of the Fe- and Mn-responsive regulons and the predicted RirA, Fur/Mur, Irr, and DtxR binding sites in α-proteobacteria
Figure: genes grouped by function: iron uptake, iron storage, FeS synthesis, iron usage, heme biosynthesis, regulatory genes, manganese uptake

Phylogenetic tree of the Fur family of transcription factors in α-proteobacteria - I
Figure: tree with the following clades:
• Fur in γ- and β-proteobacteria
• Fur in ε-proteobacteria
• Fur in Firmicutes
• Mur in α-proteobacteria: regulator of manganese uptake genes (sit, mntH)
• Fur in α-proteobacteria: regulator of iron uptake and metabolism genes
• Irr in α-proteobacteria

Identified Mur-binding sites: the A, B, and C groups of α-proteobacteria
Sequence logos for the identified Fur-binding sites in the D group of α-proteobacteria (species labels: Erythrobacter litoralis, Caulobacter crescentus, Zymomonas mobilis, Novosphingobium aromaticivorans, Oceanicaulis alexandrii, Sphingopyxis alaskensis, Gluconobacter oxydans, Rhodospirillum rubrum, Parvularcula bermudensis, Magnetospirillum magneticum)
Sequence logos for the known Fur-binding sites in Escherichia coli and Bacillus subtilis

Phylogenetic tree of the Fur family of transcription factors in α-proteobacteria - II
Figure: tree with the following clades:
• Fur in γ- and β-proteobacteria
• Fur in ε-proteobacteria
• Fur in Firmicutes
• Mur / Fur in α-proteobacteria
• Irr in α-proteobacteria: regulator of iron homeostasis

Sequence logos for the identified Irr-binding sites in α-proteobacteria
• The A group (8 species) - Irr
• The B group (4 species) - Irr
• The C group (12 species) - Irr

Phylogenetic tree of the Rrf2 family of transcription factors in α-proteobacteria
Figure: tree with clades NsrR, RirA, IscR, and IscR-II; reference proteins:
• Nitrite/NO-sensing regulator NsrR (Nitrosomonas europaea, Escherichia coli)
• Iron repressor RirA (Rhizobium leguminosarum)
• Cysteine metabolism repressor CymR (Bacillus subtilis)
• Cytochrome complex regulator Rrf2 (Desulfovibrio vulgaris)
• Iron-sulfur cluster synthesis repressor IscR (Escherichia coli)
Positional clustering of rrf2-like genes with: iron uptake and storage genes; Fe-S cluster synthesis operons; genes involved in nitrosative stress protection; sulfate uptake/assimilation genes; thioredoxin reductase; carboxymuconolactone decarboxylase-family genes; hmc cytochrome operon
Legend: proteins with the conserved C-X(6-9)-C(4-6)-C motif within the effector-responsive domain; proteins without a cysteine triad motif

Sequence logos for the identified RirA-binding sites in α-proteobacteria
• The A group - RirA (8 species)
• The C group - RirA (12 species)

An attempt to reconstruct the history

Open problems
• Model the evolution of regulatory systems (a catalog of elementary events, estimates of probabilities)
– Birth of a binding site; what are the mechanisms?
– Loss of a binding site
– Duplication of a regulated gene and/or a regulator
– Horizontal transfer of a regulated gene and/or a regulator
– Loss of a structural gene and/or a regulator
• Develop an evolutionary model that would converge to the present state (that is, have the same properties)
– Distribution of TF family sizes
– Distribution of regulon sizes
– Other graph-theoretical properties (node degrees etc.)
– General properties? E.g. stable cores and flexible margins of functional systems (in terms of gene presence and regulation)
• “Microevolution” (strains):
– “metagenomic” regulatory systems?
• Co-evolution of TFs and DNA sites:
– “Neutral” model for the evolution of binding sites (with invariant functional pressure from the bound protein)
– How do the signals evolve? What is the driving force – changes in TFs?
– TF-family, position-specific protein-DNA recognition code?
All of this must take into account the incompleteness and noise in the data

Acknowledgements
• Andrei A. Mironov
• Dmitry Rodionov (now at Burnham Institute)
• Olga Laikova
• Alexei Vitreschak
• Anna Gerasimova
• Ekaterina Kotelnikova (now at Ariadne Genomics)
• Ekaterina Panina (now at UCLA)
• Leonid Mirny (MIT)

• Howard Hughes Medical Institute
• Russian Fund of Basic Research
• Russian Academy of Sciences, program “Molecular and Cellular Biology”
• INTAS