Evolution of regulatory interactions in bacteria
Mikhail Gelfand
Research and Training Center “Bioinformatics”, Institute for Information Transmission Problems, RAS, Moscow, Russia
Singapore, 17-18 July 2006

Comparative genomics of regulation
• Why
  – Functional annotation of genes
  – Metabolic modeling
  – Practical applications in genetic engineering, drug targeting etc.
• How
  – Close genomes: phylogenetic footprinting. Regulatory sites are seen as conservation islands in alignments of gene upstream regions
  – Distant genomes: consistency filtering. Candidate sites in one genome may be unreliable, but independent occurrence upstream of orthologous genes in many genomes yields reliable predictions
• Caveats
  – Presence of (predicted) binding sites does not immediately imply functional regulation
  – Operon structure
  – Need to verify presence of orthologous transcription factors in the studied genomes
  – Orthologous factors may have different binding motifs
  – One functional system may be regulated by different factors within and between genomes
• Many genomes
  – Taxon-specific regulation
  – Evolution
    • individual sites
    • transcription-factor families
    • transcription factors and their binding motifs
    • simple and complex regulatory systems

How it works: Two simple examples
• Biotin regulator of alpha-proteobacteria
• Universal regulator of ribonucleotide reductases: reconstruction of the regulatory system and the mechanism of regulation

BirA (biotin regulator in eubacteria and archaea): conserved signal, changed spacing
Profile 1: Gram-positive bacteria, Archaea
Profile 2: Gram-negative bacteria

BirA of alpha-proteobacteria: no DNA-binding domain

Identification of the candidate regulator (BioR) in alpha-proteobacteria
1. Candidate binding sites: similar palindromes upstream of biotin biosynthesis and transport genes in different genomes
   TTATAGATAA TTATCTATAA TTATAGATAg TTATCTATAA TTATCTATAA
   TTATAGATAg TTATCTATAA TcATATATtA TcATAGATAg TTATCTATAA
   TTATCTATAA TTATCTATtA TTATCTAcAA TTATCTATAA TTATCTATAA
   TTATCTATAA TcATAGATtA cTATAGATAA TTATCTAcAA
2. Positional clustering: a candidate transcription factor from the GntR family is often found in the same loci (black arrows)
3. Phyletic patterns: the phyletic distribution of candidate sites (red circles) exactly coincides with the phyletic distribution of the candidate regulator
4. Autoregulation: in many cases there are candidate sites upstream of the bioR gene itself

Conserved signal upstream of nrd genes

Identification of the candidate regulator by the analysis of phyletic patterns
• COG1327: the only COG with exactly the same phylogenetic pattern as the signal
  – “large scale”: on the level of major taxa
  – “small scale”: within major taxa
    • absent in small parasites among alpha- and gamma-proteobacteria
    • absent in Desulfovibrio spp. among delta-proteobacteria
    • absent in Nostoc sp. among cyanobacteria
    • absent in Oenococcus and Leuconostoc among Firmicutes
    • present only in Treponema denticola among four spirochetes

COG1327 “Predicted transcriptional regulator, consists of a Zn-ribbon and ATP-cone domains”: regulator of the riboflavin pathway?
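
The phyletic-pattern argument above (the candidate regulator should occur in exactly the genomes that carry the signal) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used for the slides: the genome names, COG identifiers, and presence sets are invented toy data, and `phyletic_pattern` and `matching_cogs` are hypothetical helper names.

```python
def phyletic_pattern(present_in, genomes):
    """Return a presence/absence vector over a fixed, ordered genome list."""
    return tuple(genome in present_in for genome in genomes)


def matching_cogs(signal_genomes, cog_genomes, genomes):
    """List the COGs whose phyletic pattern is identical to that of the signal."""
    target = phyletic_pattern(signal_genomes, genomes)
    return sorted(
        cog
        for cog, present_in in cog_genomes.items()
        if phyletic_pattern(present_in, genomes) == target
    )


# Toy data, invented for illustration only (not taken from the slides).
genomes = ["G1", "G2", "G3", "G4", "G5"]
signal = {"G1", "G3", "G4"}                      # genomes with candidate sites
cogs = {
    "COG_A": {"G1", "G3", "G4"},                 # same pattern as the signal
    "COG_B": {"G1", "G2", "G3", "G4", "G5"},     # universal, hence uninformative
    "COG_C": {"G1", "G3"},                       # missing where the signal is present
}

hits = matching_cogs(signal, cogs, genomes)
print(hits)
```

An exact match is a strict criterion. With incomplete or unfinished genomes one would probably tolerate a small number of mismatches, which is a design choice not shown here.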
Additional evidence – 1
• nrdR is sometimes clustered with nrd genes or with replication genes dnaB, dnaI, polA

Additional evidence – 2
• In some genomes, candidate NrdR-binding sites are found upstream of other replication-related genes
  – dNTP salvage
  – topoisomerase I, replication initiator dnaA, chromosome partitioning, DNA helicase II

Multiple sites (nrd genes): FNR, DnaA, NrdR

Mode of regulation
• Repressor (overlaps with promoters)
• Co-operative binding:
  – most sites occur in tandem (> 90% cases)
  – the distance between the copies (centers of palindromes) equals an integer number of DNA turns:
    • mainly (94%) 30-33 bp, in 84% 31-32 bp – 3 turns
    • 21 bp (2 turns) in Vibrio spp.
    • 41-42 bp (4 turns) in some Firmicutes
• experimental confirmation in Streptomyces (Borovok et al., 2004)

Evolutionary processes that shape regulatory systems
• Expansion and contraction of regulons
• Duplications of regulators with or without regulated loci
• Loss of regulators with or without regulated loci
• Re-assortment of regulators and structural genes
• … especially in complex systems
• Horizontal transfer

Loss of regulators, and cryptic sites
Loss of the RbsR in Y. pestis (ABC-transporter also is lost)
[Figure labels: RbsR binding site; start codon of rbsD]

Regulon expansion: how FruR has become CRA
[Figure, three panels showing the same gene map. Substrates: mannose, glucose, mannitol, fructose. Genes: manXYZ, ptsHI-crr, edd, epd, eda, adhE, aceEF, mtlA, gapA, fbp, pykF, mtlD, fruBA, fruK, pfkA, pgk, gpmA, icdA, ppsA, pckA, aceA, tpiA, aceB.
Panel 1 labels: Gamma-proteobacteria; common ancestor of Enterobacteriales.
Panel 2 labels: Gamma-proteobacteria; Enterobacteriales; common ancestor of Escherichia and Salmonella.
Panel 3 labels: Gamma-proteobacteria; Enterobacteriales; E. coli and Salmonella spp.]
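
The helical-phase argument on the “Mode of regulation” slide (tandem NrdR sites spaced by a whole number of DNA turns) can be checked numerically. The sketch below assumes a helical repeat of 10.5 bp per turn for B-DNA, a value that does not appear in the slides, and uses hypothetical helper names `turns` and `phase_offset`.

```python
HELICAL_REPEAT_BP = 10.5  # assumed B-DNA repeat; not stated in the slides


def turns(distance_bp, repeat=HELICAL_REPEAT_BP):
    """Convert a centre-to-centre distance in bp into helical turns."""
    return distance_bp / repeat


def phase_offset(distance_bp, repeat=HELICAL_REPEAT_BP):
    """Distance (in turns) from the nearest whole number of turns.

    0.0 means both sites face the same side of the helix;
    0.5 means they sit on opposite faces.
    """
    t = turns(distance_bp, repeat)
    return abs(t - round(t))


# Spacings quoted on the slide (21, 31, 32, 41, 42 bp) plus 26 bp as a
# made-up out-of-phase control.
for d in (21, 31, 32, 41, 42, 26):
    print(d, round(turns(d)), round(phase_offset(d), 3))
```

Under this assumption the spacings reported on the slide all fall within about a tenth of a turn of an integer, while the control lands close to half a turn, which is the pattern expected for co-operatively binding dimers on one face of the DNA.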
Trehalose/maltose catabolism, alpha-proteobacteria
Duplicated LacI-family regulators: lineage-specific post-duplication loss
The binding signals are very similar (the blue branch is somewhat different: to avoid cross-recognition?)

Utilization of an unknown galactoside, gamma-proteobacteria
Yersinia and Klebsiella: two regulons, GalR (not shown, includes genes galK and galT) and LacI-X
Erwinia: one regulon, GalR
Loss of regulator and merger of regulons: it seems that lacI-X was present in the common ancestor (Klebsiella is an outgroup)

Utilization of maltose/maltodextrin, Firmicutes
Two different ABC transporters (shades of red)
PTS (pink)
Glucoside hydrolases (shades of green)
Two regulators (black and grey)
Modularity of the functional subsystem
• Two different ABC systems
• Three hydrolases in one operon (E. faecalis) or separately
Changes of regulation
• Displacement: invasion of a regulator from a different subfamily (horizontal transfer from a related species?) – blue sites

Orthologous TFs with completely different regulons (alpha-proteobacteria and Xanthomonadales)

Catabolism of gluconate, proteobacteria
Extreme variability of regulation of “marginal” regulon members
[Figure labels: β; Pseudomonas spp.; γ]

Combined regulatory network for iron homeostasis genes in α-proteobacteria
[Figure: network linking the transcription factors RirA, Irr, Fur, and IscR, shown under [-Fe] and [+Fe] conditions, to their target groups: siderophore uptake, Fe2+/Fe3+ uptake (iron uptake systems), iron storage ferritins, FeS synthesis, heme synthesis, and iron-requiring enzymes [iron cofactor]. Other labels: Fe, FeS, heme, degraded, FeS status of cell.]
The connecting lines denote regulatory interactions, with the thickness reflecting the frequency of the interaction in the analyzed genomes. The suggested negative or positive mode of operation is shown by a dead end or an arrow end of the line.

Fe and Mn regulons
Distribution of Irr, Fur/Mur, MntR, RirA, and IscR regulons in α-proteobacteria
[Table: one row per organism, with columns Organism, Abb., Irr, Mur/Fur, MntR, RirA, IscR and entries “+” or “-”. Rows are arranged in groups A, B, C, and D under the taxon labels Rhizobiales, Rhizobiaceae, Bradyrhizobiaceae, Rhodobacterales, Rhodobacteraceae, Hyphomonadaceae, Caulobacterales, Parvularculales, Sphingomonadales, Rhodospirillales, SAR11 cluster, and Rickettsiales.]
Organisms (abbreviations): Sinorhizobium meliloti (SM), Rhizobium leguminosarum (RL), Rhizobium etli (RHE), Agrobacterium tumefaciens (AGR), Mesorhizobium loti (ML), Mesorhizobium sp. BNC1 (MBNC), Brucella melitensis (BME), Bartonella quintana and spp. (BQ), Bradyrhizobium japonicum (BJ), Rhodopseudomonas palustris (RPA), Nitrobacter hamburgensis (Nham), Nitrobacter winogradskyi (Nwi), Rhodobacter capsulatus (RC), Rhodobacter sphaeroides (Rsph), Silicibacter sp. TM1040 (STM), Silicibacter pomeroyi (SPO), Jannaschia sp. CC51 (Jann), Rhodobacterales bacterium HTCC2654 (RB2654), Roseobacter sp. MED193 (MED193), Roseovarius nubinhibens ISM (ISM), Roseovarius sp. 217 (ROS217), Loktanella vestfoldensis SKA53 (SKA53), Sulfitobacter sp. EE-36 (EE36), Oceanicola batsensis HTCC2597 (OB2597), Oceanicaulis alexandrii HTCC2633 (OA2633), Caulobacter crescentus (CC), Parvularcula bermudensis HTCC2503 (PB2503), Erythrobacter litoralis (ELI), Novosphingobium aromaticivorans (Saro), Sphingopyxis alaskensis RB2256 (Sala), Zymomonas mobilis (ZM), Gluconobacter oxydans (GOX), Rhodospirillum rubrum (Rrub), Magnetospirillum magneticum (Amb), Pelagibacter ubique HTCC1002 (PU1002), Rickettsia and Ehrlichia species.
“#?” in the RirA column denotes the absence of the rirA gene in an unfinished genomic sequence and the presence of candidate RirA-binding sites upstream of the iron uptake genes.
Phylogenetic tree of the Fur family of transcription factors in α-proteobacteria - I
[Figure: phylogenetic tree. Clade labels: Fur in γ- and β-proteobacteria (Escherichia coli sp|P0A9A9, Pseudomonas aeruginosa sp|Q03456, Neisseria meningitidis sp|P0A0S7); Fur in ε-proteobacteria (Helicobacter pylori sp|O25671); Fur in Firmicutes (Bacillus subtilis sp|P54574); Mur in α-proteobacteria, regulator of manganese uptake genes (sit, mntH); Fur in α-proteobacteria, regulator of iron uptake and metabolism genes; Irr, α-proteobacteria. Leaves are labeled with species names and locus tags.]

Sequence logos for Mur- and Fur-binding sites
[Figure: sequence logos. Panels: identified Mur-binding sites in the A, B, and C groups of α-proteobacteria; identified Fur-binding sites in the D group of α-proteobacteria; known Fur-binding sites in Escherichia coli and Bacillus subtilis. Species listed beside the logos: Erythrobacter litoralis, Caulobacter crescentus, Zymomonas mobilis, Novosphingobium aromaticivorans, Oceanicaulis alexandrii, Sphingopyxis alaskensis, Gluconobacter oxydans, Rhodospirillum rubrum, Parvularcula bermudensis, Magnetospirillum magneticum.]

Phylogenetic tree of the Fur family of transcription factors in α-proteobacteria - II
[Figure: phylogenetic tree. Clade labels: Fur in γ- and β-proteobacteria; Fur in ε-proteobacteria; Fur in Firmicutes; Mur / Fur in α-proteobacteria; Irr in α-proteobacteria, regulator of iron homeostasis. Leaves are labeled with species names and locus tags; some Irr leaves are marked with an asterisk.]

Sequence logos for the identified Irr binding sites in α-proteobacteria
The A group (8 species) - Irr
The B group (4 species) - Irr
The C group (12 species) - Irr

Phylogenetic tree of the Rrf2 family of transcription factors in α-proteobacteria
[Figure: phylogenetic tree. Clade labels: NsrR, RirA, IscR, IscR-II. Reference proteins: nitrite/NO-sensing regulator NsrR (Nitrosomonas europaea, Escherichia coli); iron repressor RirA (Rhizobium leguminosarum); cysteine metabolism repressor CymR (Bacillus subtilis). Leaves are labeled with locus tags.]
Positional clustering of rrf2-like genes with:
• iron uptake and storage genes;
• Fe-S cluster synthesis operons;
• genes involved in nitrosative stress protection;
• sulfate uptake/assimilation genes;
• thioredoxin reductase;
• carboxymuconolactone decarboxylase-family genes;
• hmc cytochrome operon
[Rrf2 tree, further labels: cytochrome complex regulator Rrf2 (Desulfovibrio vulgaris); iron-sulfur cluster synthesis repressor IscR (Escherichia coli). Leaf markers: proteins with the conserved C-X(6-9)-C(4-6)-C motif within the effector-responsive domain; proteins without a cysteine triad motif.]

Sequence logos for the identified RirA-binding sites in α-proteobacteria
The A group - RirA (8 species)
The C group - RirA (12 species)

Distribution of the conserved members of the Fe- and Mn-responsive regulons and the predicted RirA, Fur/Mur, Irr, and DtxR binding sites in α-proteobacteria
[Table columns: Genes; Functions. Functional groups: iron uptake, iron storage, FeS synthesis, iron usage, heme biosynthesis, regulatory genes, manganese uptake.]

An attempt to reconstruct the history

Regulators and their signals
• Subtle changes at close evolutionary distances
• Cases of motif conservation at surprisingly large distances
• Correlation between contacting nucleotides and amino acid residues

DNA signals and protein-DNA interactions
Entropy at aligned sites and the number of contacts (heavy atoms in a base pair at a distance < cutoff from a protein atom)
[Figure panels: CRP, PurR, IHF, TrpR]

Specificity-determining positions in the LacI family
• Training set: 459 sequences, average length: 338 amino acids, 85 specificity groups
  – 44 SDPs
• LacI from E. coli:
  – 10 residues contact NPF (analog of the effector)
  – 7 residues in the effector contact zone (5 Å < dmin < 10 Å)
  – 6 residues in the intersubunit contacts
  – 5 residues in the intersubunit contact zone (5 Å < dmin < 10 Å)
  – 7 residues contact the operator sequence
  – 6 residues in the operator contact zone (5 Å < dmin < 10 Å)

The LacI family: subtle changes in signals at close distances
[Figure: sequence logos]

CRP/FNR family of regulators
CooA, Desulfovibrio: TGTCGGCnnGCCGACA
CRP, Gamma: TTGTGAnnnnnnTCACAA
HcpR, Desulfovibrio: TTGTgAnnnnnnTcACAA
FNR, Gamma: TTGATnnnnATCAA

Correlation between contacting nucleotides and amino acid residues
• CooA in Desulfovibrio spp.
• CRP in Gamma-proteobacteria
• HcpR in Desulfovibrio spp.
• FNR in Gamma-proteobacteria

Species  Factor  Aligned region                Binding motif
DD       COOA    ALTTEQLSLHMGATRQTVSTLLNNLVR   TGTCGGCnnGCCGACA
DV       COOA    ELTMEQLAGLVGTTRQTASTLLNDMIR
EC       CRP     KITRQEIGQIVGCSRETVGRILKMLED   TTGTGAnnnnnnTCACAA
YP       CRP     KXTRQEIGQIVGCSRETVGRILKMLED
VC       CRP     KITRQEIGQIVGCSRETVGRILKMLEE
DD       HCPR    DVSKSLLAGVLGTARETLSRALAKLVE   TTGTgAnnnnnnTcACAA
DV       HCPR    DVTKGLLAGLLGTARETLSRCLSRMVE
EC       FNR     TMTRGDIGNYLGLTVETISRLLGRFQK   TTGATnnnnATCAA
YP       FNR     TMTRGDIGNYLGLTVETISRLLGRFQK
VC       FNR     TMTRGDIGNYLGLTVETISRLLGRFQK

Contacting residues: REnnnR
TG: 1st arginine
GA: glutamate and 2nd arginine
The correlation holds for other factors in the family

Open problems
• Model the evolution of regulatory systems (a catalog of elementary events, estimates of probabilities)
  – Birth of a binding site; what are the mechanisms?
  – Loss of a binding site
  – Duplication of a regulated gene and/or a regulator
  – Horizontal transfer of a regulated gene and/or a regulator
  – Loss of a structural gene and/or a regulator
  – General properties?
    • Distribution of TF family and regulon sizes
    • Stable cores and flexible margins of functional systems (in terms of gene presence and regulation)
• Co-evolution of TFs and DNA sites:
  – “Neutral” model for the evolution of binding sites (with invariant functional pressure from the bound protein)
  – How do the signals evolve? What is the driving force: changes in TFs?
  – TF-family, position-specific protein-DNA recognition code?
All of this needs to take into account the incompleteness and noise in the data

Acknowledgements
• Andrei A. Mironov (algorithms and software)
• Alexandra B. Rakhmaninova (SDPs)
• Dmitry Rodionov (now at Burnham Institute) (BioR, NrdR, iron)
• Olga Laikova (LacI, sugars)
• Dmitry Ravcheev (FruR)
• Olga Kalinina (SDPs/LacI)
• Leonid Mirny, MIT (protein/DNA contacts, SDPs)
• Andy Johnston, University of East Anglia (iron)

• Howard Hughes Medical Institute
• Russian Fund of Basic Research
• Russian Academy of Sciences, program “Molecular and Cellular Biology”
• INTAS