Evolution of bacterial
regulatory systems
Mikhail Gelfand
Research and Training Center “Bioinformatics”
Institute for Information Transmission Problems
Moscow, Russia
Cologne, July 2008
Plan
• Individual sites
• Transcription factors and their binding motifs
• Regulatory systems and regulons
Birth and death of sites
is a very dynamic process
NadR-binding sites upstream of pnuB seem absent in
Klebsiella pneumoniae and Serratia marcescens
… but there are candidate sites further upstream …
… and they are clearly different (not simply misaligned).
Cryptic sites and loss of regulators
Loss of RbsR in Y. pestis
(ABC-transporter also is lost)
RbsR binding site
Start codon of rbsD
Unexpected conservation of non-consensus
positions in orthologous sites
regulatory site of LexA upstream of lexA
consensus nucleotides are in caps
Escherichia coli         TgCTGTATATActcACAGcA
Salmonella typhi         aACTGTATATActcACAGcA
Yersinia pestis          agCTGTATATActcACAGcA
Haemophilus influenzae   atCTGTATAcAatacCAGTt
Pasteurella multocida    TtCTGTATATAataACAGTt
Vibrio cholerae          cACTGgATATActcACAGTc
wrong consensus?
TF PurR, gene purL
Escherichia coli         ACGCAAACGgTTtCGT
Salmonella typhi         ACGCAAACGgTTtCGT
Yersinia pestis          ACGCAAACGgTTtCGT
Haemophilus influenzae   AtGCAAACGTTTGCtT
Pasteurella multocida    ACGCAAACGTTTtCGT
Vibrio cholerae          ACGCAAACGgTTGCtT
TF PurR, gene purM
Escherichia coli         tCGCAAACGTTTGCtT
Salmonella typhi         tCGCAAACGTTTGCtT
Yersinia pestis          tCGCAAACGTTTGCcT
Haemophilus influenzae   tCGCAAACGTTTGCtT
Pasteurella multocida    tCGCAAACGTTTGCtT
Vibrio cholerae          ACGCAAACGTTTtCcT
Non-consensus positions are more conserved than
synonymous codon positions
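The observation above can be checked directly on the LexA sites from the slide. The following is a minimal sketch, not the authors' method: it flags alignment columns in which the same non-consensus (lowercase) nucleotide is shared by a given fraction of orthologous sites. The function name and the 0.5 threshold are assumptions made for illustration, and the species-to-sequence pairing follows the slide order.

```python
# Orthologous LexA sites upstream of lexA, copied from the slide.
# Lowercase letters mark non-consensus nucleotides.
sites = {
    "Escherichia coli":       "TgCTGTATATActcACAGcA",
    "Salmonella typhi":       "aACTGTATATActcACAGcA",
    "Yersinia pestis":        "agCTGTATATActcACAGcA",
    "Haemophilus influenzae": "atCTGTATAcAatacCAGTt",
    "Pasteurella multocida":  "TtCTGTATATAataACAGTt",
    "Vibrio cholerae":        "cACTGgATATActcACAGTc",
}


def conserved_nonconsensus_columns(sites, min_fraction=0.5):
    """Return (column, nucleotide, fraction) for every column in which one
    non-consensus (lowercase) nucleotide occurs in at least min_fraction
    of the sites. min_fraction is an illustrative threshold."""
    seqs = list(sites.values())
    n = len(seqs)
    hits = []
    for col in range(len(seqs[0])):
        counts = {}
        for s in seqs:
            ch = s[col]
            if ch.islower():
                counts[ch] = counts.get(ch, 0) + 1
        if counts:
            nt, k = max(counts.items(), key=lambda kv: kv[1])
            if k / n >= min_fraction:
                hits.append((col, nt, k / n))
    return hits


hits = conserved_nonconsensus_columns(sites)
for col, nt, frac in hits:
    print(f"column {col}: '{nt}' in {frac:.0%} of sites")
```

The sketch only locates conserved non-consensus positions; comparing their conservation with synonymous codon positions would additionally need aligned coding sequences, which the slides do not provide.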
Regulators and their motifs
• Cases of motif conservation at
surprisingly large distances
• Subtle changes at close evolutionary
distances
• Correlation between contacting
nucleotides and amino acid residues
• Changes in symmetry patterns
NrdR (regulator of ribonucleotide reductases
and some other replication-related genes):
conservation at large distances
DNA motifs and protein-DNA interactions
Entropy at aligned sites and the number of contacts
(heavy atoms in a base pair at a distance <cutoff from a protein atom)
CRP
PurR
IHF
TrpR
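The entropy side of this comparison is easy to compute from an ungapped alignment of sites. The sketch below is an illustration under stated assumptions: the function name is hypothetical, entropy is the plain Shannon entropy in bits without small-sample correction, and the contact counts, which require a solved protein-DNA structure, are not modelled.

```python
import math
from collections import Counter


def column_entropies(sites):
    """Shannon entropy (in bits) of each column of an ungapped alignment
    of binding sites. Case is ignored."""
    sites = [s.upper() for s in sites]
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    entropies = []
    for col in range(length):
        counts = Counter(s[col] for s in sites)
        total = sum(counts.values())
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


# Toy alignment (hypothetical data): the first column is invariant,
# the second is split evenly between two nucleotides.
toy_entropies = column_entropies(["AA", "AC"])
```

Low-entropy columns are the strongly conserved ones; the slide relates them to positions with many protein contacts.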
The LacI family:
subtle changes in motifs at close distances
Specificity-determining positions
in the LacI family
Training set: 459 sequences, average length 338 amino acids, 85 specificity groups
– 44 SDPs
10 residues contact NPF (analog of the effector)
7 residues in the effector contact zone (5 Å < dmin < 10 Å)
6 residues in the intersubunit contacts
5 residues in the intersubunit contact zone (5 Å < dmin < 10 Å)
7 residues contact the operator sequence
6 residues in the operator contact zone (5 Å < dmin < 10 Å)
LacI from E. coli
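One common way to score a candidate specificity-determining position is the mutual information between the residues in an alignment column and the specificity-group labels of the sequences. The sketch below shows only that basic statistic; it is an assumption that this matches the scoring behind the numbers on the slide, and it omits the significance assessment a real SDP prediction needs. The function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(column, groups):
    """Mutual information (in bits) between the residues of one alignment
    column and the specificity-group labels of the same sequences."""
    if len(column) != len(groups):
        raise ValueError("column and groups must have the same length")
    n = len(column)
    joint = Counter(zip(column, groups))
    residue_counts = Counter(column)
    group_counts = Counter(groups)
    mi = 0.0
    for (res, grp), c in joint.items():
        p_joint = c / n
        p_indep = (residue_counts[res] / n) * (group_counts[grp] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


# Toy example (hypothetical data): four sequences in two specificity groups.
groups = [1, 1, 2, 2]
mi_sdp_like = mutual_information("AAGG", groups)   # residue tracks the group
mi_unrelated = mutual_information("AGAG", groups)  # residue ignores the group
```

A column whose residues follow the grouping scores high, and a column that varies independently of the grouping scores zero.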
The CRP/FNR family of regulators
CooA (Desulfovibrio): TGTCGGCnnGCCGACA
CRP (Gamma): TTGTGAnnnnnnTCACAA
HcpR (Desulfovibrio): TTGTgAnnnnnnTcACAA
FNR (Gamma): TTGATnnnnATCAA
Correlation between contacting
nucleotides and amino acid residues
• CooA in Desulfovibrio spp.: TGTCGGCnnGCCGACA
• CRP in Gamma-proteobacteria: TTGTGAnnnnnnTCACAA
• HcpR in Desulfovibrio spp.: TTGTgAnnnnnnTcACAA
• FNR in Gamma-proteobacteria: TTGATnnnnATCAA

Contacting residues: REnnnR
TG: 1st arginine
GA: glutamate and 2nd arginine

Regulator  Genome  Aligned protein segment
COOA       DD      ALTTEQLSLHMGATRQTVSTLLNNLVR
COOA       DV      ELTMEQLAGLVGTTRQTASTLLNDMIR
CRP        EC      KITRQEIGQIVGCSRETVGRILKMLED
CRP        YP      KXTRQEIGQIVGCSRETVGRILKMLED
CRP        VC      KITRQEIGQIVGCSRETVGRILKMLEE
HCPR       DD      DVSKSLLAGVLGTARETLSRALAKLVE
HCPR       DV      DVTKGLLAGLLGTARETLSRCLSRMVE
FNR        EC      TMTRGDIGNYLGLTVETISRLLGRFQK
FNR        YP      TMTRGDIGNYLGLTVETISRLLGRFQK
FNR        VC      TMTRGDIGNYLGLTVETISRLLGRFQK
The correlation holds for other factors in the family
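The correlation can be made concrete with the sequences shown on the slide. The sketch below is a minimal illustration: it takes one representative segment per regulator, reads out the three columns where CRP carries the REnnnR residues, and compares the resulting signatures with the motifs. The column indices are an assumption derived here by locating "RETVGR" in the E. coli CRP segment, and the regulator-to-motif pairing follows the slide order.

```python
# One aligned segment per regulator (first genome listed on the slide)
# paired with the motif given for that regulator.
regulators = {
    "CooA": ("ALTTEQLSLHMGATRQTVSTLLNNLVR", "TGTCGGCnnGCCGACA"),
    "CRP":  ("KITRQEIGQIVGCSRETVGRILKMLED", "TTGTGAnnnnnnTCACAA"),
    "HcpR": ("DVSKSLLAGVLGTARETLSRALAKLVE", "TTGTgAnnnnnnTcACAA"),
    "FNR":  ("TMTRGDIGNYLGLTVETISRLLGRFQK", "TTGATnnnnATCAA"),
}

# Assumption: 0-based columns of the REnnnR positions (1st arginine,
# glutamate, 2nd arginine), found by locating "RETVGR" in the CRP segment.
CONTACT_COLUMNS = (14, 15, 19)


def contact_signature(segment, columns=CONTACT_COLUMNS):
    """Concatenate the residues found at the assumed contact columns."""
    return "".join(segment[i] for i in columns)


signatures = {name: contact_signature(seg) for name, (seg, _) in regulators.items()}
for name, (_, motif) in regulators.items():
    print(f"{name:5s} {signatures[name]}  {motif}")
```

CRP and HcpR share the same residues at these columns and have nearly identical motifs, whereas CooA and FNR differ at the columns and in the motif.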
NrtR (regulator of NAD metabolism):
systematic search for correlated positions
• analysis of correlated positions in proteins and sites
• analysis of specificity determining positions
• the same positions in one alpha-helix identified
• plans for experimental verification
Comparison with the recently solved structure:
correlated positions indeed bind the DNA
(more exactly, form a hydrophobic cluster)
NiaR: changed dimer structure?
The GalR family and C-proteins of RM-systems: direct and inverted repeats
BirA: changed spacing
What are the events leading
to the present-day state?
• Expansion and contraction of regulons
• New regulators (where from?)
• Duplications of regulators with or without
regulated loci
• Loss of regulators with or without regulated
loci
• Re-assortment of regulators and structural
genes
• … especially in complex systems
• Horizontal transfer
Trehalose/maltose catabolism
in alpha-proteobacteria
Duplicated LacI-family regulators: lineage-specific post-duplication loss
The binding motifs are very similar (the blue branch is
somewhat different: to avoid cross-recognition?)
Utilization of an unknown galactoside
in gamma-proteobacteria
Yersinia and Klebsiella: two regulons, GalR and LacI-X
Erwinia: one regulon, GalR
Loss of regulator and merger of regulons: it seems that LacI-X was
present in the common ancestor (Klebsiella is an outgroup)
Utilization of maltose/maltodextrin
in Firmicutes
Displacement: invasion of a regulator from a
different subfamily (horizontal transfer from a
related species?) – blue sites
Orthologous TFs with completely different regulons
(alpha-proteobacteria and Xanthomonadales)
Catabolism of gluconate in proteobacteria
Extreme variability of the regulation
of “marginal” regulon members
β
Pseudomonas spp.
γ
Regulon expansion, or
how FruR has become CRA
• CRA (a.k.a. FruR) in Escherichia coli:
– global regulator
– well-studied in experiment
(many regulated genes known)
• Going back in time: looking for candidate
CRA/FruR sites upstream of (orthologs of)
genes known to be regulated in E. coli
Common ancestor of gamma-proteobacteria
[Figure: pathway map. Sugars: mannose, glucose, mannitol, fructose. Genes: manXYZ, ptsHI-crr, edd, epd, eda, adhE, aceEF, mtlA, gapA, fbp, pykF, mtlD, fruBA, fruK, pfkA, pgk, gpmA, icdA, ppsA, pckA, aceA, tpiA, aceB. Lineage label: Gamma-proteobacteria.]
Common ancestor of the Enterobacteriales
[Figure: pathway map with the same sugars and genes as on the previous slide. Lineage labels: Gamma-proteobacteria, Enterobacteriales.]
Common ancestor of Escherichia and Salmonella
[Figure: pathway map with the same sugars and genes as on the previous slides. Lineage labels: Gamma-proteobacteria, Enterobacteriales, E. coli and Salmonella spp.]
Regulation of amino acid biosynthesis
in Firmicutes
• Interplay between regulatory RNA
elements and transcription factors
• Expansion of T-box systems (normally RNA structures regulating
aminoacyl-tRNA synthetases)
Three regulatory systems for methionine biosynthesis
A. SAM-dependent riboswitch
B. Met-T-box
C. MtaR: repressor of transcription
Methionine regulatory systems:
loss of S-box regulons
• S-boxes (SAM-1 riboswitch)
– Bacillales
– Clostridiales
– the Zoo:
   • Petrotoga
   • actinobacteria (Streptomyces, Thermobifida)
   • Chlorobium, Chloroflexus, Cytophaga
   • Fusobacterium
   • Deinococcus
   • proteobacteria (Xanthomonas, Geobacter)
• Met-T-boxes (Met-tRNA-dependent attenuator) + SAM-2 riboswitch for metK
– Lactobacillales
• candidate TF-binding motif: MtaR
– Streptococcales
[Figure labels: Lact., Strep., Bac., Clostr.]
Recent duplications and bursts:
ARG-T-box in Clostridium difficile
[Figure: phylogenetic tree of ARG-specific T-box regulatory sites.
Clade labels: Lactobacillales (argS), Clostridiales (argS), Bacillales (argS).
Leaves: LR_ARGS, CPE_ARGS, CAC_ARGS, CB_ARGS, CBE_ARGS, CTC_ARGS, LP_ARGS, LME_ARGS, LJ_ARGS, CDF_YQIXYZ, LGA_ARGS, RDF02391, PPE_ARGS, LSA_ARGS, CDF_ARGC, BC_ARGS2, EF_ARGS, BH_ARGS, CDF_ARGH.
Branch labels: yqiXYZ, NEW (twice).
Legend: aminoacyl-tRNA synthetase; biosynthetic genes; amino acid transporters.
Clostridium difficile panel: RDF02391, argCJBDF, argH, argG, others; predicted amino acid transporters; amino acid biosynthetic genes.]
… caused by loss of transcription factor AhrC
Gram+ bacteria:
• AhrC regulatory protein (negative regulation of arginine metabolism, positive regulation of arginine catabolism)
• Binding to the 5’ UTR region of a gene: regulation of gene expression
Clostridium difficile:
• AhrC is lost
• Expansion of the T-box regulon: regulation of expression of arginine biosynthetic and transport genes by T-box antitermination
[Figure: yqiXYZ, argC, argH and argG loci in other clostridia spp. (CA, CTC, CTH, CPE, CB, CPE) and in Clostridium difficile. Symbols: AhrC binding site; ARG-specific T-box regulatory site.]
Duplications and changes in specificity: ASN/ASP/HIS T-boxes
[Figure: phylogenetic tree of ASN-, ASP- and HIS-specific T-boxes.
Clade labels: Bacillales, Clostridiales, Lactobacillales, other Gram+.
Gene labels: hisS aspS, his operon, hisXYZ, asnS, asnA, aspS, hisS, glnQHMP.
Specificity labels: HIS, ASP (GAC), ASN (AAC), ASP\ASN.
Species labels: P. pentosaceus, L. reuteri, L. johnsonii.
Annotations: rapid mutation of regulatory codons; NEW.
Leaves: CH_HISS, CTH_HISS, DRE_HISS, TTE_HISS, PL_HISS, BE_HISS, BL_HISS, BS_HISS, BC_HISS, LRE_HISXYZ, LSA_HISXYZ, OOE_HISXYZ, SGO_HISC, SMU_HISC, LP_HISXYZ, EF_HISXYZ, OB_HISS, BCL_HISS, BH_HISS, EX_HISS, LME_HISXYZ, CDF_HISZX, EF_HISS, LMO_HISXYZ, LME_HIS(Z\G), LL_HISC, LP_HISZ, CPE_ASNS2, CDF_ASNA, CB_ASNS2, CDF_ASNS2, CTC_ASNA, LCA_HISZ, CB_ASNS3, CAC_ASNS32, BC_ASNS2, BC_ASNA, CBE_ASNS2, CTC_ASNS2, CPE_ASNA, PPE_HISXYZ, PPE_ASNS, EX_ASNA, LCA_HISS, LB_ASNA, LB_ASNS2, LJ_HISS, LP_ASNA, PPE_ASNA, LB_HISS, LRE_ASPS, LP_HISS, PPE_HISS, LRE_HISS, LJ_ASNA, LJ_glnQHMP, LD_ASNA, SG_ASPS2, SMU_ASPS2.]
Blow-up 1
[Figure: enlarged Lactobacillales subtree.
Leaves: LCA_HISS, LJ_HISS, PPE_HISXYZ, PPE_ASNS2, LB_HISS, LRE_ASPS, LB_ASNA, LP_HISS, PPE_HISS, PPE_ASNA, LP_ASNA, LRE_HISS, LJ_ASNA, LJ_GLNQHMP, LD_ASNA.
Regulatory codons: ASN = AAC, HIS = CAC, ASP = GAC.
Gene labels: asnS, hisXYZ, asnA, hisS aspS, aspS, hisS, glnQHMP.
Species labels: P. pentosaceus, L. reuteri, L. johnsonii.
Events: disruption of the hisS-aspS operon; mutation of the regulatory codon.]
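The three regulatory codons named on these slides differ from one another by a single nucleotide, which makes a change of T-box specificity by point mutation plausible. The short sketch below only verifies that arithmetic; the dictionary and function names are illustrative.

```python
from itertools import combinations

# Regulatory (specifier) codons as given on the slides.
SPECIFIER_CODONS = {"ASN": "AAC", "ASP": "GAC", "HIS": "CAC"}


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


distances = {
    (p, q): hamming(SPECIFIER_CODONS[p], SPECIFIER_CODONS[q])
    for p, q in combinations(SPECIFIER_CODONS, 2)
}
for (p, q), d in distances.items():
    print(f"{p} ({SPECIFIER_CODONS[p]}) vs {q} ({SPECIFIER_CODONS[q]}): {d} substitution(s)")
```

All three codons share the last two nucleotides (AC) and differ only in the first position.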
Blow-up 2. Prediction
Regulators lost in lineages with expanded HIS-T-box regulon??
… and validation
• conserved motifs upstream of HIS biosynthesis genes
[Figure: motif panels labelled Bacillales (his operon), Clostridiales, Thermoanaerobacteriales, Halanaerobiales, Bacillales]
• candidate transcription factor yerC co-localized with the his genes
• present only in genomes with the motifs upstream of the his genes
• genomes with neither YerC motif nor HIS-T-boxes: attenuators
The evolutionary history of the his genes
regulation in the Firmicutes
T-boxes: Summary / History
Life without Fur
Regulation of iron homeostasis
(the Escherichia coli paradigm)
Iron:
• essential cofactor (limiting in many environments)
• dangerous at large concentrations
FUR (responds to iron):
• synthesis of siderophores
• transport (siderophores, heme, Fe2+, Fe3+)
• storage
• iron-dependent enzymes
• synthesis of heme
• synthesis of Fe-S clusters
Similar in Bacillus subtilis
Regulation of iron homeostasis in α-proteobacteria
[Figure: regulatory network diagram.
Transcription factors: RirA, Irr, Fur, IscR.
Signals and states: [−Fe], [+Fe], Fe, FeS, heme, degraded, FeS status of cell.
Regulated functions: siderophore uptake; Fe2+/Fe3+ uptake (iron uptake systems); iron storage (ferritins); FeS synthesis; heme synthesis; iron-requiring enzymes [iron cofactor]; transcription factors.]
Experimental studies:
• FUR/MUR: Bradyrhizobium, Rhizobium and Sinorhizobium
• RirA (Rrf2 family): Rhizobium and Sinorhizobium
• Irr (FUR family): Bradyrhizobium, Rhizobium and Brucella
Distribution of transcription factors in genomes
Search for candidate motifs and binding sites using standard comparative genomic techniques
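The slides do not spell out the search procedure. A common building block of such comparative searches is a positional weight matrix (PWM) built from known sites and used to scan upstream regions on both strands. The sketch below shows that step only, under stated assumptions: a uniform background, a pseudocount of 1, sequences restricted to A, C, G and T, and a hypothetical test sequence and threshold. The training sites are the PurR sites upstream of purM shown on an earlier slide.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_pwm(sites, pseudocount=1.0):
    """Log-odds weights (bits) per column against a uniform background."""
    sites = [s.upper() for s in sites]
    pwm = []
    for col in range(len(sites[0])):
        column = [s[col] for s in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            nt: math.log2(((column.count(nt) + pseudocount) / total) / 0.25)
            for nt in "ACGT"
        })
    return pwm


def score(pwm, window):
    """Sum of column weights for one window of the motif length."""
    return sum(weights[nt] for weights, nt in zip(pwm, window))


def scan(pwm, sequence, threshold):
    """Return (position, strand, score) for windows scoring >= threshold.
    Assumes the sequence contains only A, C, G and T."""
    sequence = sequence.upper()
    k = len(pwm)
    hits = []
    for pos in range(len(sequence) - k + 1):
        window = sequence[pos:pos + k]
        revcomp = window.translate(COMPLEMENT)[::-1]
        for strand, w in (("+", window), ("-", revcomp)):
            s = score(pwm, w)
            if s >= threshold:
                hits.append((pos, strand, s))
    return hits


# PurR sites upstream of purM, copied from the slide.
purm_sites = [
    "tCGCAAACGTTTGCtT", "tCGCAAACGTTTGCtT", "tCGCAAACGTTTGCcT",
    "tCGCAAACGTTTGCtT", "tCGCAAACGTTTGCtT", "ACGCAAACGTTTtCcT",
]
pwm = build_pwm(purm_sites)
# Hypothetical upstream fragment with one site embedded at position 4.
fragment = "GGGG" + "TCGCAAACGTTTGCTT" + "CCCC"
hits = scan(pwm, fragment, threshold=15.0)  # threshold chosen for illustration
```

In a comparative setting, a candidate site is usually kept only if similar sites are found upstream of orthologous genes in several related genomes; that consistency check is not part of this sketch.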
FUR/MUR branch of the FUR family
[Figure: phylogenetic tree. Leaves are listed as label, then organism.]
ECOLI            Escherichia coli (sp|P0A9A9)
PSEAE            Pseudomonas aeruginosa (sp|Q03456)
NEIMA            Neisseria meningitidis (sp|P0A0S7)
HELPY            Helicobacter pylori (sp|O25671)
BACSU            Bacillus subtilis (sp|P54574)
SM mur           Sinorhizobium meliloti
MBNC03003179     Mesorhizobium sp. BNC1 (I)
BQ fur2          Bartonella quintana
BMEI0375         Brucella melitensis
EE36 12413       Sulfitobacter sp. EE-36
MBNC03003593     Mesorhizobium sp. BNC1 (II)
RB2654 19538     Rhodobacterales bacterium HTCC2654
AGR C 620        Agrobacterium tumefaciens
RHE_CH00378      Rhizobium etli
RL mur           Rhizobium leguminosarum
Nham 0990        Nitrobacter hamburgensis X14
Nwi 0013         Nitrobacter winogradskyi
RPA0450          Rhodopseudomonas palustris
BJ fur           Bradyrhizobium japonicum
ROS217 18337     Roseovarius sp. 217
Jann 1799        Jannaschia sp. CCS1
SPO2477          Silicibacter pomeroyi
STM1w01000993    Silicibacter sp. TM1040
MED193 22541     Roseobacter sp. MED193
OB2597 02997     Oceanicola batsensis HTCC2597
SKA53 03101      Loktanella vestfoldensis SKA53
Rsph03000505     Rhodobacter sphaeroides
ISM 15430        Roseovarius nubinhibens ISM
PU1002 04436     Pelagibacter ubique HTCC1002
GOX0771          Gluconobacter oxydans
ZM01411          Zymomonas mobilis
Saro02001148     Novosphingobium aromaticivorans
Sala 1452        Sphingopyxis alaskensis RB2256
ELI1325          Erythrobacter litoralis
OA2633 10204     Oceanicaulis alexandrii HTCC2633
PB2503 04877     Parvularcula bermudensis HTCC2503
CC0057           Caulobacter crescentus
Rrub02001143     Rhodospirillum rubrum
Amb1009          Magnetospirillum magneticum (I)
Amb4460          Magnetospirillum magneticum (II)
Branch labels: Fur in γ- and β-proteobacteria; Fur in ε-proteobacteria; Fur in Firmicutes
Mur in α-proteobacteria: regulator of manganese uptake genes (sit, mntH)
Fur in α-proteobacteria: regulator of iron uptake and metabolism genes
Irr
α-proteobacteria: Erythrobacter litoralis, Caulobacter crescentus, Zymomonas mobilis, Novosphingobium aromaticivorans, Oceanicaulis alexandrii, Sphingopyxis alaskensis, Gluconobacter oxydans, Rhodospirillum rubrum, Parvularcula bermudensis, Magnetospirillum magneticum
Identified Mur-binding sites of α-proteobacteria
FUR and MUR boxes
[Figure: sequence logos labelled Bacillus subtilis, Mur, Escherichia coli]
Sequence logos for the known Fur-binding sites in Escherichia coli and Bacillus subtilis
Irr branch of the FUR family
[Figure: phylogenetic tree. Leaves are listed as label, then organism.]
ECOLI            Escherichia coli (sp|P0A9A9)
PSEAE            Pseudomonas aeruginosa (sp|Q03456)
NEIMA            Neisseria meningitidis (sp|P0A0S7)
HELPY            Helicobacter pylori (sp|O25671)
BACSU            Bacillus subtilis (sp|P54574)
AGR C 249        Agrobacterium tumefaciens
SM irr           Sinorhizobium meliloti
RHE CH00106      Rhizobium etli
RL irr1          Rhizobium leguminosarum (I)
RL irr2          Rhizobium leguminosarum (II)
MLr5570          Mesorhizobium loti
MBNC03003186     Mesorhizobium sp. BNC1
BQ fur1          Bartonella quintana
BMEI1955         Brucella melitensis (I)
BMEI1563         Brucella melitensis (II)
BJ blr1216       Bradyrhizobium japonicum (II)
RB2654 182       Rhodobacterales bacterium HTCC2654
SKA53 01126      Loktanella vestfoldensis SKA53
ROS217 15500     Roseovarius sp. 217
ISM 00785        Roseovarius nubinhibens ISM
OB2597 14726     Oceanicola batsensis HTCC2597
Jann 1652        Jannaschia sp. CCS1
Rsph03001693     Rhodobacter sphaeroides
EE36 03493       Sulfitobacter sp. EE-36
STM1w01001534    Silicibacter sp. TM1040
MED193 17849     Roseobacter sp. MED193
SPOA0445         Silicibacter pomeroyi
RC irr           Rhodobacter capsulatus
RPA2339          Rhodopseudomonas palustris (I)
RPA0424*         Rhodopseudomonas palustris (II)
BJ irr*          Bradyrhizobium japonicum (I)
Nwi 0035*        Nitrobacter winogradskyi
Nham 1013*       Nitrobacter hamburgensis X14
PU1002 04361     Pelagibacter ubique HTCC1002
Branch labels: Fur in γ- and β-proteobacteria; Fur in ε-proteobacteria; Fur in Firmicutes; α-proteobacteria: Mur / Fur
Irr in α-proteobacteria: regulator of iron homeostasis
Irr boxes
[Figure: sequence logos for Rhizobiaceae plus Bradyrhizobiaceae, Rhodobacteriaceae, Rhodospirillales]
RirA/NsrR family (Rhizobiales)
IscR family
Regulation of genes in functional subsystems
[Table columns: Rhizobiales; Bradyrhizobiaceae; Rhodobacteriales; the Zoo (likely ancestral state)]
Reconstruction of history
Frequent co-regulation with Irr
Strict division of function with Irr
Appearance of the iron-Rhodo motif
All logos and Some Very Tempting Hypotheses:
1. Cross-recognition of FUR and IscR motifs in the ancestor.
2. When FUR had become MUR, and IscR had been lost in Rhizobiales, emerging RirA (from the Rrf2 family, with a rather different general consensus) took over their sites.
3. Iron-Rhodo boxes are recognized by IscR: directly testable
Summary and open problems
• Regulatory systems are very flexible
– easily lost
– easily expanded (in particular, by duplication)
– may change specificity
– rapid turnover of regulatory sites
• With more stories like these, we can start thinking about
a general theory
– catalog of elementary events; how frequent?
– mechanisms (duplication, birth e.g. from enzymes, horizontal
transfer)
– conserved (regulon cores) and non-conserved (marginal regulon
members) genes in relation to metabolic and functional
subsystems/roles
– (TF family-specific) protein-DNA recognition code
– distribution of TF families in genomes; distribution of regulon
sizes; etc.
People
• Andrei A. Mironov – software, algorithms
• Alexandra Rakhmaninova – SDP, protein-DNA correlations
• Anna Gerasimova (now at U. Michigan) – NadR
• Olga Kalinina (on loan to EMBL) – SDP
• Yuri Korostelev – protein-DNA correlations
• Ekaterina Kotelnikova (now at Ariadne Genomics) – evolution of sites
• Olga Laikova – LacI
• Dmitry Ravcheev – CRA/FruR
• Dmitry Rodionov (on loan to Burnham Institute) – iron etc.
• Alexei Vitreschak – T-boxes and riboswitches
• Andy Johnston (U. of East Anglia) – experimental validation (iron)
• Leonid Mirny (MIT) – protein-DNA, SDP
• Andrei Osterman (Burnham Institute) – experimental validation
Howard Hughes Medical Institute
Russian Foundation of Basic Research
Russian Academy of Sciences, program “Molecular and Cellular Biology”