Evolution of bacterial regulatory systems

Mikhail Gelfand
Research and Training Center “Bioinformatics”
Institute for Information Transmission Problems
Moscow, Russia

Cologne, July 2008

Plan
• Individual sites
• Transcription factors and their binding motifs
• Regulatory systems and regulons

Birth and death of sites is a very dynamic process
NadR-binding sites upstream of pnuB seem absent in Klebsiella pneumoniae and Serratia marcescens …
… but there are candidate sites further upstream …
… and they are clearly different (not simply misaligned).

Cryptic sites and loss of regulators
Loss of RbsR in Y. pestis (ABC-transporter also is lost)
[Figure labels: RbsR binding site; start codon of rbsD]

Unexpected conservation of non-consensus positions in orthologous sites
Regulatory site of LexA upstream of lexA (consensus nucleotides are in caps)

Escherichia coli        TgCTGTATATActcACAGcA
Salmonella typhi        aACTGTATATActcACAGcA
Yersinia pestis         agCTGTATATActcACAGcA
Haemophilus influenzae  atCTGTATAcAatacCAGTt
Pasteurella multocida   TtCTGTATATAataACAGTt
Vibrio cholerae         cACTGgATATActcACAGTc

Wrong consensus?
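The observation on the LexA slide can be checked directly on the six aligned sites. The sketch below is only an illustration: the function names are hypothetical, and Shannon entropy is used here as an assumed conservation measure (the slide itself does not name one). It reports alignment columns in which every genome carries the same nucleotide although that nucleotide is written in lower case, that is, marked as non-consensus.

```python
from collections import Counter
from math import log2

# LexA sites upstream of lexA, as listed on the slide
# (upper case = consensus nucleotide, lower case = non-consensus).
LEXA_SITES = {
    "Escherichia coli":       "TgCTGTATATActcACAGcA",
    "Salmonella typhi":       "aACTGTATATActcACAGcA",
    "Yersinia pestis":        "agCTGTATATActcACAGcA",
    "Haemophilus influenzae": "atCTGTATAcAatacCAGTt",
    "Pasteurella multocida":  "TtCTGTATATAataACAGTt",
    "Vibrio cholerae":        "cACTGgATATActcACAGTc",
}


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring letter case."""
    counts = Counter(base.upper() for base in column)
    total = len(column)
    return sum((n / total) * log2(total / n) for n in counts.values())


def conserved_nonconsensus_columns(sites):
    """0-based columns where all sites share the same lower-case nucleotide."""
    hits = []
    for index, column in enumerate(zip(*sites)):
        if len(set(column)) == 1 and column[0].islower():
            hits.append(index)
    return hits


sites = list(LEXA_SITES.values())
entropies = [column_entropy(column) for column in zip(*sites)]
hits = conserved_nonconsensus_columns(sites)

for index in hits:
    print(f"column {index + 1}: '{sites[0][index]}' in all genomes, "
          f"entropy {entropies[index]:.2f} bits")
```

In this alignment the only such column is the lower-case "t" inside the "ctc" / "ata" spacer, which is what motivates the question "Wrong consensus?".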
TF PurR, gene purL
Escherichia coli        ACGCAAACGgTTtCGT
Salmonella typhi        ACGCAAACGgTTtCGT
Yersinia pestis         ACGCAAACGgTTtCGT
Haemophilus influenzae  AtGCAAACGTTTGCtT
Pasteurella multocida   ACGCAAACGTTTtCGT
Vibrio cholerae         ACGCAAACGgTTGCtT

TF PurR, gene purM
Escherichia coli        tCGCAAACGTTTGCtT
Salmonella typhi        tCGCAAACGTTTGCtT
Yersinia pestis         tCGCAAACGTTTGCcT
Haemophilus influenzae  tCGCAAACGTTTGCtT
Pasteurella multocida   tCGCAAACGTTTGCtT
Vibrio cholerae         ACGCAAACGTTTtCcT

Non-consensus positions are more conserved than synonymous codon positions

Regulators and their motifs
• Cases of motif conservation at surprisingly large distances
• Subtle changes at close evolutionary distances
• Correlation between contacting nucleotides and amino acid residues
• Changes in symmetry patterns

NrdR (regulator of ribonucleotide reductases and some other replication-related genes): conservation at large distances

DNA motifs and protein-DNA interactions
Entropy at aligned sites and the number of contacts (heavy atoms in a base pair at a distance < cutoff from a protein atom)
[Figure panels: CRP, PurR, IHF, TrpR]

The LacI family: subtle changes in motifs at close distances
[Figure: motif logos]

Specificity-determining positions in the LacI family
Training set: 459 sequences, average length 338 amino acids, 85 specificity groups – 44 SDPs
• 10 residues contact NPF (analog of the effector)
• 7 residues in the effector contact zone (5 Å < dmin < 10 Å)
• 6 residues in the intersubunit contacts
• 5 residues in the intersubunit contact zone (5 Å < dmin < 10 Å)
• 7 residues contact the operator sequence
• 6 residues in the operator contact zone (5 Å < dmin < 10 Å)
[Figure: LacI from E. coli]

The CRP/FNR family of regulators
CooA, Desulfovibrio   TGTCGGCnnGCCGACA
CRP, Gamma            TTGTGAnnnnnnTCACAA
FNR, Gamma            TTGATnnnnATCAA
HcpR, Desulfovibrio   TTGTgAnnnnnnTcACAA

Correlation between contacting nucleotides and amino acid residues
• CooA in Desulfovibrio spp.     TGTCGGCnnGCCGACA
• CRP in Gamma-proteobacteria    TTGTGAnnnnnnTCACAA
• HcpR in Desulfovibrio spp.     TTGTgAnnnnnnTcACAA
• FNR in Gamma-proteobacteria    TTGATnnnnATCAA

COOA  DD  ALTTEQLSLHMGATRQTVSTLLNNLVR
COOA  DV  ELTMEQLAGLVGTTRQTASTLLNDMIR
CRP   EC  KITRQEIGQIVGCSRETVGRILKMLED
CRP   YP  KXTRQEIGQIVGCSRETVGRILKMLED
CRP   VC  KITRQEIGQIVGCSRETVGRILKMLEE
HCPR  DD  DVSKSLLAGVLGTARETLSRALAKLVE
HCPR  DV  DVTKGLLAGLLGTARETLSRCLSRMVE
FNR   EC  TMTRGDIGNYLGLTVETISRLLGRFQK
FNR   YP  TMTRGDIGNYLGLTVETISRLLGRFQK
FNR   VC  TMTRGDIGNYLGLTVETISRLLGRFQK

Contacting residues: REnnnR
• TG: 1st arginine
• GA: glutamate and 2nd arginine

The correlation holds for other factors in the family

NrtR (regulator of NAD metabolism): systematic search for correlated positions
• analysis of correlated positions in proteins and sites
• analysis of specificity determining positions
• the same positions in one alpha-helix identified
• plans for experimental verification

Comparison with the recently solved structure: correlated positions indeed bind the DNA (more exactly, form a hydrophobic cluster)

NiaR: changed dimer structure?

The GalR family and C-proteins of RM-systems: direct and inverted repeats

BirA: changed spacing

What are the events leading to the present-day state?
• Expansion and contraction of regulons
• New regulators (where from?)
• Duplications of regulators with or without regulated loci
• Loss of regulators with or without regulated loci
• Re-assortment of regulators and structural genes
• … especially in complex systems
• Horizontal transfer

Trehalose/maltose catabolism in alpha-proteobacteria
Duplicated LacI-family regulators: lineage-specific post-duplication loss
The binding motifs are very similar (the blue branch is somewhat different: to avoid cross-recognition?)
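Returning to the CRP/FNR family alignment shown earlier: the residues at the three REnnnR positions can be tabulated per regulator. This is a minimal sketch with hypothetical helper names. It assumes that the ten sequence fragments are aligned without gaps (they all have the same length in the slide) and uses the E. coli CRP fragment as the reference for locating the pattern.

```python
import re

# (regulator, genome code, aligned fragment) as listed on the slide.
FRAGMENTS = [
    ("COOA", "DD", "ALTTEQLSLHMGATRQTVSTLLNNLVR"),
    ("COOA", "DV", "ELTMEQLAGLVGTTRQTASTLLNDMIR"),
    ("CRP",  "EC", "KITRQEIGQIVGCSRETVGRILKMLED"),
    ("CRP",  "YP", "KXTRQEIGQIVGCSRETVGRILKMLED"),
    ("CRP",  "VC", "KITRQEIGQIVGCSRETVGRILKMLEE"),
    ("HCPR", "DD", "DVSKSLLAGVLGTARETLSRALAKLVE"),
    ("HCPR", "DV", "DVTKGLLAGLLGTARETLSRCLSRMVE"),
    ("FNR",  "EC", "TMTRGDIGNYLGLTVETISRLLGRFQK"),
    ("FNR",  "YP", "TMTRGDIGNYLGLTVETISRLLGRFQK"),
    ("FNR",  "VC", "TMTRGDIGNYLGLTVETISRLLGRFQK"),
]


def contact_columns(reference):
    """Return the 0-based columns of R, E and the second R in REnnnR."""
    match = re.search(r"RE...R", reference)
    if match is None:
        raise ValueError("reference fragment has no REnnnR pattern")
    start = match.start()
    return (start, start + 1, start + 5)


def contact_residues(fragments, columns):
    """Map each regulator to the set of residue triplets at the given columns."""
    table = {}
    for regulator, _genome, fragment in fragments:
        triplet = "".join(fragment[c] for c in columns)
        table.setdefault(regulator, set()).add(triplet)
    return table


crp_reference = next(f for r, g, f in FRAGMENTS if r == "CRP" and g == "EC")
columns = contact_columns(crp_reference)
table = contact_residues(FRAGMENTS, columns)

for regulator, triplets in table.items():
    print(regulator, sorted(triplets))
```

Under these assumptions CRP and HcpR both carry R, E, R at the three positions, FNR has V in place of the first arginine, and CooA has Q and T in place of the glutamate and the second arginine.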
Utilization of an unknown galactoside in gamma-proteobacteria
Yersinia and Klebsiella: two regulons, GalR and LacI-X
Erwinia: one regulon, GalR
Loss of regulator and merger of regulons: it seems that lacI-X was present in the common ancestor (Klebsiella is an outgroup)

Utilization of maltose/maltodextrin in Firmicutes
Displacement: invasion of a regulator from a different subfamily (horizontal transfer from a related species?) – blue sites

Orthologous TFs with completely different regulons (alpha-proteobacteria and Xanthomonadales)

Catabolism of gluconate in proteobacteria
Extreme variability of the regulation of “marginal” regulon members
[Figure labels: β; Pseudomonas spp.; γ]

Regulon expansion, or how FruR has become CRA
• CRA (a.k.a. FruR) in Escherichia coli:
  – global regulator
  – well-studied in experiment (many regulated genes known)
• Going back in time: looking for candidate CRA/FruR sites upstream of (orthologs of) genes known to be regulated in E. coli

Common ancestor of gamma-proteobacteria
[Pathway diagram. Sugars: mannose, glucose, mannitol, fructose. Genes: manXYZ, ptsHI-crr, edd, epd, eda, adhE, aceEF, mtlA, gapA, fbp, pykF, mtlD, fruBA, fruK, pfkA, pgk, gpmA, icdA, ppsA, pckA, aceA, tpiA, aceB. Tree label: Gamma-proteobacteria]

Common ancestor of the Enterobacteriales
[Same pathway diagram. Tree labels: Gamma-proteobacteria; Enterobacteriales]

Common ancestor of Escherichia and Salmonella
[Same pathway diagram. Tree labels: Gamma-proteobacteria; Enterobacteriales; E. coli and Salmonella spp.]

Regulation of amino acid biosynthesis in Firmicutes
• Interplay between regulatory RNA elements and transcription factors
• Expansion of T-box systems (normally – RNA structures regulating aminoacyl-tRNA-synthetases)

Three regulatory systems for the methionine biosynthesis
A. B. C.
SAM-dependent riboswitch; Met-T-box; MtaR: repressor of transcription

Methionine regulatory systems: loss of S-box regulons
• S-boxes (SAM-1 riboswitch)
  – Bacillales
  – Clostridiales
  – the Zoo:
    • Petrotoga
    • actinobacteria (Streptomyces, Thermobifida)
    • Chlorobium, Chloroflexus, Cytophaga
    • Fusobacterium
    • Deinococcus
    • proteobacteria (Xanthomonas, Geobacter)
• Met-T-boxes (Met-tRNA-dependent attenuator) + SAM-2 riboswitch for metK
  – Lactobacillales
• candidate TF-binding motif: MtaR
  – Streptococcales
[Tree labels: ZOO; Lact.; Strep.; Bac.; Clostr.]

Recent duplications and bursts: ARG-T-box in Clostridium difficile
[Figure: tree of ARG-specific T-box regulatory sites. Leaf labels: LR_ARGS, CPE_ARGS, CAC_ARGS, CB_ARGS, CBE_ARGS, CTC_ARGS, LP_ARGS, LME_ARGS, LJ_ARGS, LGA_ARGS, PPE_ARGS, LSA_ARGS, BC_ARGS2, EF_ARGS, BH_ARGS, CDF_YQIXYZ, RDF02391, CDF_ARGC, CDF_ARGH. Clade labels: Lactobacillales, Clostridiales, Bacillales (argS); Clostridium difficile (yqiXYZ, RDF02391, argCJBDF, argH, argG, others; two branches marked NEW). Legend: ARG-specific T-box regulatory site; aminoacyl-tRNA synthetase; amino acid biosynthetic genes; predicted amino acid transporters]

… caused by loss of transcription factor AhrC
Gram+ bacteria: AhrC regulatory protein (negative regulation of arginine metabolism, positive regulation of arginine catabolism); binding to the 5’ UTR gene region, regulation of gene expression
[Diagram label: 5’ ... AhrC site]
Clostridium difficile: AhrC is lost; expansion of the T-box regulon; regulation of expression of arginine biosynthetic and transport genes by T-box antitermination
Other clostridia spp.
(CA, CTC, CTH, CPE, CB, CPE)
[Diagram labels: yqiXYZ, argC, argH (each shown twice), argG. Legend: AhrC binding site; ARG-specific T-box regulatory site]

Duplications and changes in specificity: ASN/ASP/HIS T-boxes
[Figure: tree of ASN, ASP and HIS T-boxes; leaves are labelled by genome and gene, e.g. BS_HISS, CDF_ASNA, LRE_ASPS. Clade labels: Bacillales, Lactobacillales, Clostridiales, other Gram+. Gene labels: hisS, aspS, asnS, asnA, his operon, hisXYZ, glnQHMP. Regulatory codons: ASN AAC, HIS CAC, ASP GAC. Annotations: rapid mutation of regulatory codons; NEW]

Blow-up 1
[Figure: enlarged Lactobacillales subtree. Species labels: P. pentosaceus, L. reuteri, L. johnsonii. Gene labels: asnS, asnA, hisS, aspS, hisXYZ, glnQHMP. Regulatory codons: ASN AAC, HIS CAC, ASP GAC. Annotations: disruption of hisS-aspS operon; mutation of regulatory codon]

Blow-up 2. Prediction
Regulators lost in lineages with expanded HIS-T-box regulon??
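A side note on the regulatory codons labelled in the ASN/ASP/HIS T-box figures (ASN AAC, HIS CAC, ASP GAC). The small sketch below, with a hypothetical helper name, checks how far apart these three codons are; it only compares the codon strings and makes no claim about how the figures were produced.

```python
from itertools import combinations

# Regulatory codons as labelled in the T-box figures.
REGULATORY_CODONS = {"ASN": "AAC", "HIS": "CAC", "ASP": "GAC"}


def differing_positions(codon_a, codon_b):
    """Return the 0-based codon positions at which two codons differ."""
    return [i for i, (a, b) in enumerate(zip(codon_a, codon_b)) if a != b]


pairwise = {
    (aa_1, aa_2): differing_positions(codon_1, codon_2)
    for (aa_1, codon_1), (aa_2, codon_2)
    in combinations(REGULATORY_CODONS.items(), 2)
}

for (aa_1, aa_2), positions in pairwise.items():
    print(f"{aa_1} -> {aa_2}: differs at codon position(s) "
          f"{[p + 1 for p in positions]}")
```

Every pair differs by a single substitution in the first codon position, so one point mutation is enough to switch a T-box between ASN, HIS and ASP specificity, in line with the "mutation of regulatory codon" annotation.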
… and validation
• conserved motifs upstream of HIS biosynthesis genes
• candidate transcription factor yerC co-localized with the his genes
• present only in genomes with the motifs upstream of the his genes
• genomes with neither YerC motif nor HIS-T-boxes: attenuators
[Figure labels: Bacillales (his operon); Clostridiales; Thermoanaerobacteriales; Halanaerobiales; Bacillales]

The evolutionary history of the his genes regulation in the Firmicutes

T-boxes: Summary / History

Life without Fur

Regulation of iron homeostasis (the Escherichia coli paradigm)
Iron:
• essential cofactor (limiting in many environments)
• dangerous at large concentrations
FUR (responds to iron):
• synthesis of siderophores
• transport (siderophores, heme, Fe2+, Fe3+)
• storage
• iron-dependent enzymes
• synthesis of heme
• synthesis of Fe-S clusters
Similar in Bacillus subtilis

Regulation of iron homeostasis in α-proteobacteria
[Diagram. Transcription factors: RirA, Irr, Fur, IscR. Conditions: [- Fe], [+Fe]. Signals: Fe, FeS (FeS status of cell), heme; label “degraded”. Regulated functions: siderophore uptake, Fe2+/Fe3+ uptake, iron uptake systems, iron storage ferritins, FeS synthesis, heme synthesis, iron-requiring enzymes [iron cofactor]]
Experimental studies:
• FUR/MUR: Bradyrhizobium, Rhizobium and Sinorhizobium
• RirA (Rrf2 family): Rhizobium and Sinorhizobium
• Irr (FUR family): Bradyrhizobium, Rhizobium and Brucella

Distribution of transcription factors in genomes
Search for candidate motifs and binding sites using standard comparative genomic techniques

FUR/MUR branch of the FUR family
[Tree figure; leaf labels as extracted]
Fur:
  Escherichia coli          sp|P0A9A9 (ECOLI)
  Pseudomonas aeruginosa    sp|Q03456 (PSEAE)
  Neisseria meningitidis    sp|P0A0S7 (NEIMA)
  Helicobacter pylori       sp|O25671 (HELPY)
  Bacillus subtilis         sp|P54574 (BACSU)
α-proteobacteria:
  SM mur          Sinorhizobium meliloti
  MBNC03003179    Mesorhizobium sp. BNC1 (I)
  BQ fur2         Bartonella quintana
  BMEI0375        Brucella melitensis
  EE36 12413      Sulfitobacter sp. EE-36
  MBNC03003593    Mesorhizobium sp. BNC1 (II)
  RB2654 19538    Rhodobacterales bacterium HTCC2654
  AGR C 620       Agrobacterium tumefaciens
  RHE_CH00378     Rhizobium etli
  RL mur          Rhizobium leguminosarum
  Nham 0990       Nitrobacter hamburgensis X14
  Nwi 0013        Nitrobacter winogradskyi
  RPA0450         Rhodopseudomonas palustris
  BJ fur          Bradyrhizobium japonicum
  ROS217 18337    Roseovarius sp. 217
  Jann 1799       Jannaschia sp. CC51
  SPO2477         Silicibacter pomeroyi
  STM1w01000993   Silicibacter sp. TM1040
  MED193 22541    Roseobacter sp. MED193
  OB2597 02997    Oceanicola batsensis HTCC2597
  SKA53 03101     Loktanella vestfoldensis SKA53
  Rsph03000505    Rhodobacter sphaeroides
  ISM 15430       Roseovarius nubinhibens ISM
  PU1002 04436    Pelagibacter ubique HTCC1002
  GOX0771         Gluconobacter oxydans
  ZM01411         Zymomonas mobilis
  Saro02001148    Novosphingobium aromaticivorans
  Sala 1452       Sphingopyxis alaskensis RB2256
  ELI1325         Erythrobacter litoralis
  OA2633 10204    Oceanicaulis alexandrii HTCC2633
  PB2503 04877    Parvularcula bermudensis HTCC2503
  CC0057          Caulobacter crescentus
  Rrub02001143    Rhodospirillum rubrum
  Amb1009         Magnetospirillum magneticum (I)
  Amb4460         Magnetospirillum magneticum (II)
Clade labels:
• Fur in γ- and β-proteobacteria
• Fur in ε-proteobacteria
• Fur in Firmicutes
• Mur in α-proteobacteria: regulator of manganese uptake genes (sit, mntH)
• Fur in α-proteobacteria: regulator of iron uptake and metabolism genes
• Irr in α-proteobacteria

FUR and MUR boxes
[Sequence logos. Panel labels: Erythrobacter litoralis, Caulobacter crescentus, Zymomonas mobilis, Novosphingobium aromaticivorans, Oceanicaulis alexandrii, Sphingopyxis alaskensis, Gluconobacter oxydans, Rhodospirillum rubrum, Parvularcula bermudensis, Magnetospirillum magneticum; identified Mur-binding sites of α-proteobacteria (Mur); Bacillus subtilis; Escherichia coli]
Sequence logos for the known Fur-binding sites in Escherichia coli and Bacillus subtilis

Irr branch of the FUR family
[Tree figure; leaf labels as extracted]
Fur:
  Escherichia coli          sp|P0A9A9 (ECOLI)
  Pseudomonas aeruginosa    sp|Q03456 (PSEAE)
  Neisseria meningitidis    sp|P0A0S7 (NEIMA)
  Helicobacter pylori       sp|O25671 (HELPY)
  Bacillus subtilis         sp|P54574 (BACSU)
Fur in γ- and β-proteobacteria
Fur in ε-proteobacteria
Fur in Firmicutes
Mur / Fur in α-proteobacteria
Irr in α-proteobacteria: regulator of iron homeostasis
  AGR C 249       Agrobacterium tumefaciens
  SM irr          Sinorhizobium meliloti
  RHE CH00106     Rhizobium etli
  RL irr1         Rhizobium leguminosarum (I)
  RL irr2         Rhizobium leguminosarum (II)
  MLr5570         Mesorhizobium loti
  MBNC03003186    Mesorhizobium sp. BNC1
  BQ fur1         Bartonella quintana
  BMEI1955        Brucella melitensis (I)
  BMEI1563        Brucella melitensis (II)
  BJ blr1216      Bradyrhizobium japonicum (II)
  RB2654 182      Rhodobacterales bacterium HTCC2654
  SKA53 01126     Loktanella vestfoldensis SKA53
  ROS217 15500    Roseovarius sp. 217
  ISM 00785       Roseovarius nubinhibens ISM
  OB2597 14726    Oceanicola batsensis HTCC2597
  Jann 1652       Jannaschia sp. CC51
  Rsph03001693    Rhodobacter sphaeroides
  EE36 03493      Sulfitobacter sp. EE-36
  STM1w01001534   Silicibacter sp. TM1040
  MED193 17849    Roseobacter sp. MED193
  SPOA0445        Silicibacter pomeroyi
  RC irr          Rhodobacter capsulatus
  RPA2339         Rhodopseudomonas palustris (I)
  RPA0424*        Rhodopseudomonas palustris (II)
  BJ irr*         Bradyrhizobium japonicum (I)
  Nwi 0035*       Nitrobacter winogradskyi
  Nham 1013*      Nitrobacter hamburgensis X14
  PU1002 04361    Pelagibacter ubique HTCC1002

Irr boxes
[Sequence logos: Rhizobiaceae plus Bradyrhizobiaceae; Rhodobacteriaceae; Rhodospirillales]

RirA/NsrR family (Rhizobiales)

IscR family

Regulation of genes in functional subsystems
[Panels: Rhizobiales; Bradyrhizobiaceae; Rhodobacteriales; the Zoo (likely ancestral state)]

Reconstruction of history
Frequent co-regulation with Irr
Strict division of function with Irr
Appearance of the iron-Rhodo motif

All logos and Some Very Tempting Hypotheses:
1. Cross-recognition of FUR and IscR motifs in the ancestor.
2. When FUR had become MUR, and IscR had been lost in Rhizobiales, emerging RirA (from the Rrf2 family, with a rather different general consensus) took over their sites.
3. Iron-Rhodo boxes are recognized by IscR: directly testable
Summary and open problems
• Regulatory systems are very flexible
  – easily lost
  – easily expanded (in particular, by duplication)
  – may change specificity
  – rapid turnover of regulatory sites
• With more stories like these, we can start thinking about a general theory
  – catalog of elementary events; how frequent?
  – mechanisms (duplication, birth e.g. from enzymes, horizontal transfer)
  – conserved (regulon cores) and non-conserved (marginal regulon members) genes in relation to metabolic and functional subsystems/roles
  – (TF family-specific) protein-DNA recognition code
  – distribution of TF families in genomes; distribution of regulon sizes; etc.

People
• Andrei A. Mironov – software, algorithms
• Alexandra Rakhmaninova – SDP, protein-DNA correlations

• Anna Gerasimova (now at U. Michigan) – NadR
• Olga Kalinina (on loan to EMBL) – SDP
• Yuri Korostelev – protein-DNA correlations
• Ekaterina Kotelnikova (now at Ariadne Genomics) – evolution of sites
• Olga Laikova – LacI
• Dmitry Ravcheev – CRA/FruR
• Dmitry Rodionov (on loan to Burnham Institute) – iron etc.
• Alexei Vitreschak – T-boxes and riboswitches

• Andy Johnston (U. of East Anglia) – experimental validation (iron)
• Leonid Mirny (MIT) – protein-DNA, SDP
• Andrei Osterman (Burnham Institute) – experimental validation

Howard Hughes Medical Institute
Russian Foundation of Basic Research
Russian Academy of Sciences, program “Molecular and Cellular Biology”