Evolution of biocatalysis
structural and genomic trends
L. Aravind
Computational Biology Branch
National Center for Biotechnology Information
Enzymes achieve incredible feats with relative ease
How do they do that?
Haber process for NH3 synthesis (Fritz Haber)
N2(g) + 3H2(g) ↔ 2NH3(g) + ΔH
200 to 900 atmospheres pressure
450°C
Iron catalyst (molybdenum promoter)

Nitrogenase (bacterial nitrogen fixation)
N2 + 8H+ + 8e- + 16 ATP → 2NH3 + H2 + 16 ADP + 16 Pi
Ambient pressure
20 to 30°C
Catalytic metal cluster with active molybdenum or vanadium
Industrial and biological catalysts
Some metals are common to both industrial and biological catalysts: zinc, iron, molybdenum, nickel, vanadium, magnesium.
Yet the biological systems bypass the draconian temperature and pressure regimes.
So the protein scaffold and small-molecule cofactors matter in some way.
[Figure: evolutionary diagram relating other Rossmannoids to the PP-ATPases, photolyase, USPA, ETFP A and B, the HIGH nucleotidyltransferases and the aminoacyl-tRNA synthetases (translation, tRNA biogenesis). Marked events: LUCA, origin of the KMSK signature, origin of the bi-helical C-terminal module. Activity labels: nucleotidyltransferase; pyrophosphatase (ATP → AMP + PPi); ATPase, AMP binding.]
Where did the specificity come from?
At the junction between the protein and RNA worlds: RNase PH
[Figure: the S5 domain in RNase P (ribozyme), in ribosomal protein S5 (RNA-binding protein) and in the RNase PH active site region (enzyme).]
Both RNase P and RNase PH share a common S5 domain that is found in several nucleic acid-binding contexts.
RNase P has an active ribozyme, while RNase PH is entirely a protein enzyme.
RNase P: the active site is on the ribozyme.
RNase PH: the active site was probably built onto the protein core with an additional protein motif.
It is possible that the early nucleic acid-binding domains were in place even when ribozymes were still active. Catalytic activities appeared in these proteins, slowly displacing the ribozymes.
The Echoes of a lost world
Analysis of phyletic patterns and higher-order evolutionary relationships shows that many ancient protein folds had paralogous representatives that can be traced back to LUCA.
Most of these ancient folds contain RNA-binding versions, and the representatives traced back to LUCA are often associated with RNA binding.
We have evidence for protein synthesis before the translation apparatus was in place.
A ribozyme makes all extant proteins.
This suggests a possible role for RNA, and the emergence of enzymes through the displacement of ribozymes by protein enzymes.
The continuing story of enzymes
General tendencies in enzyme evolution
Are there different temporal phases in which different catalytic activities were acquired by different folds?
Are there differences in terms of the number of different catalytic activities accommodated by different folds?
What are their obvious structural determinants?
Invention of enzymes in the later phases of evolution
How do non-enzymatic domains become enzymes?
Similar active sites, different catalytic mechanisms
Example of diversification of the acidic active site in the Rossmannoid class of domains and the HAD superfamily
The tale of flaps, caps and squiggles in the HAD superfamily and solvent exclusion
Convergent evolution of active sites
The tale of two hydroxylases and the discovery of the deoxyhypusine hydroxylase
The case of moving fingers: convergent evolution of arginine fingers in different phosphohydrolases
Structural convergence: the knotted active site of SPOUT and SET domain methyltransferases
Does the scaffold matter in catalysis?
The mysterious case of the ISOCOT fold
Are there engines of innovation in the biosphere?
The bacterial engine of enzyme innovation
Some fundamental concepts
Biocatalysis relies on a constellation of a few amino acid residues that are embedded in a distinct globular domain of the enzyme (the catalytic domain).
Generally, the set of catalytic residues is highly conserved during evolution, but variations occur in the substrate-binding and cofactor-binding sites. This is exploration of “substrate space”: essentially the same biochemical activity is adapted to a range of different substrates.
E.g. various Rossmann-fold methyltransferases transfer methyl groups from AdoMet to various substrates.
In contrast, enzymes with similar catalytic residues may explore considerable diversity in “reaction space” (different activities).
E.g. DnaG-type primases and most topoisomerases share a catalytic domain (the TOPRIM domain) with an identical core set of catalytic residues, yet catalyze two distinct reactions: nucleotidyltransferase (transferase; class 2 according to the EC classification) and topoisomerase (isomerase; class 5 according to the EC classification).
A more dramatic case of exploration of reaction space is seen in the stem glycolytic (Embden-Meyerhof) pathway: four of the glycolytic enzymes, fructose-1,6-bisphosphate aldolase, triose phosphate isomerase, enolase, and pyruvate kinase, which between them catalyze three distinct classes of reaction, share the same structural scaffold, the TIM barrel.
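The distinction between substrate space and reaction space can be made concrete with a small tally. The sketch below groups the examples named above by scaffold and counts the top-level EC classes each scaffold hosts. The EC numbers for the four glycolytic enzymes are the standard assignments; for the TOPRIM enzymes only the class digit given in the text is used, and the function and variable names are illustrative assumptions.

```python
from collections import defaultdict

# Top-level EC classes (first digit of an EC number).
EC_CLASS = {1: "oxidoreductase", 2: "transferase", 3: "hydrolase",
            4: "lyase", 5: "isomerase", 6: "ligase"}

# (scaffold, enzyme, EC number). "-" marks levels the text does not specify.
ENZYMES = [
    ("TOPRIM", "DnaG-type primase", "2.-.-.-"),
    ("TOPRIM", "topoisomerase", "5.-.-.-"),
    ("TIM barrel", "fructose-1,6-bisphosphate aldolase", "4.1.2.13"),
    ("TIM barrel", "triose phosphate isomerase", "5.3.1.1"),
    ("TIM barrel", "enolase", "4.2.1.11"),
    ("TIM barrel", "pyruvate kinase", "2.7.1.40"),
]


def reaction_classes(enzymes):
    """Map each scaffold to the sorted list of top-level EC classes it hosts."""
    classes = defaultdict(set)
    for scaffold, _name, ec in enzymes:
        classes[scaffold].add(int(ec.split(".")[0]))
    return {scaffold: sorted(found) for scaffold, found in classes.items()}


if __name__ == "__main__":
    for scaffold, found in reaction_classes(ENZYMES).items():
        print(scaffold, [EC_CLASS[c] for c in found])
```

A scaffold that only explores substrate space, such as the Rossmann-fold methyltransferases (all EC 2.1.1.-), would yield a single class in this tally.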
[Figure: Activities by superkingdom. Bar chart with folds on the x-axis and number of activities (0 to 18) on the y-axis. Series: all 3 superkingdoms; at least 1 superkingdom; few terminal branches.]
[Figure: scatter plot of the number of distinct enzymatic activities vs. the number of representatives of common folds in the proteomes of the proteobacterium Pseudomonas aeruginosa and the yeast Saccharomyces cerevisiae. One panel per organism; x-axis: Activities; y-axis: Number of Proteins (per 1000). Folds plotted: P-loop NTPases, classic Rossmann fold, TIM barrel, RRM-like, HUP, DSBH, beta-propeller, RNase H, metallo-beta-lactamase, P-loop ATPase, double-psi barrel, T-fold, rhodanese, PFL-like, HAD-like, HIT-like, Rossmann-fold.]
The number of activities in most common folds scales linearly with their prevalence in the proteome.
However, there are some striking exceptions, such as the P-loop NTPases and classical Rossmann fold enzymes, which show extensive proliferation but apparently very little exploration of reaction space.
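One way to picture this trend is to estimate a typical activities-per-protein ratio and flag folds that fall well below it. The following is a minimal sketch under stated assumptions: the fold names and numbers are invented placeholders (they are not read from the plot), the median ratio stands in for a fitted line through the origin, and the 0.5 tolerance is arbitrary.

```python
from statistics import median

# Hypothetical (proteins per 1000, distinct activities) pairs.
# These values are illustrative only and are NOT taken from the scatter plot.
FOLDS = {
    "fold_A": (5, 4),
    "fold_B": (10, 8),
    "fold_C": (15, 13),
    "fold_D": (20, 15),
    "fold_E": (32, 5),
    "fold_F": (25, 4),
}


def typical_ratio(folds):
    """Median activities-per-protein ratio; robust to a few outlying folds."""
    return median(acts / prots for prots, acts in folds.values())


def under_explored(folds, tolerance=0.5):
    """Folds whose activity count is below `tolerance` times the expected value."""
    ratio = typical_ratio(folds)
    return sorted(name for name, (prots, acts) in folds.items()
                  if acts < tolerance * ratio * prots)


if __name__ == "__main__":
    print(under_explored(FOLDS))
```

The median is used rather than a least-squares slope so that the abundant but low-activity folds do not drag the trend line toward themselves.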
Folds with few and many activities
[Figure: topology diagrams of folds with many activities (RRM-like fold, beta-propeller) and with few activities (P-loop NTPase). Labels in the figure include PSUS, cyclase, polymerase, MSOR, primase (polymerase), ACP, NDPK, dehydrogenase, phytase, arabinase, Walker A loop and Walker B aspartate.]
For each fold, the positions of the catalytic residues from several representative examples are shown in red.
Of shapes and functions
The TIM barrel, the beta-propellers, and the DSBH domain contain a central pocket that
binds their substrates and/or cofactors, with an approximate cyclic symmetry.
The pocket that is inherent to these structures allows easy accommodation of diverse substrate
molecules through low-specificity interactions. Subsequently, natural selection could act on
these proteins to fix residues that impart interaction specificity and catalytic capacity. The
intrinsic symmetry of the central pockets of these folds creates the potential for different
catalytic residues to emerge on the surface of the substrate-binding site, providing for the
evolution of a wide range of activities.
The two-layered RRM-like fold that consists of two helices packed against a four-stranded
anti-parallel sheet represents another structural principle in the evolution of multicatalytic
folds.
The main theme in this case appears to be the large exposed surface area of an entire sheet that
is provided by the two-layered structure.
The P-loop and Rossmann folds are 3-layered sandwiches where a central sheet is
protected on both sides by helices. This only leaves the loops for interactions and this
configuration has been less favorable to explore reaction space.
The UTRA domain: non-enzymatic domain to enzyme
[Figure: an ancestral dimerization and ligand-binding unit gives rise to the small-molecule-binding domain of HutC/FarR transcription factors and, through the emergence of key polar residues, to chorismate lyase.]
Chorismate lyase: chorismate → pyruvate + 4-hydroxybenzoate
[Figure: the acidic active site in Rossmannoid folds. Topology diagrams of the DHH, receiver, VWA, TOPRIM and classic HAD domains, showing strands S1 to S6 and the conserved active-site residues and motifs (D, DxD, DD/DxxxD, E, K, S, T/S). The C0/C1 and C2 cap insertion points are marked.]
Anatomy of the HAD catalytic domain
[Figure: the HAD catalytic domain with the flap and squiggle elements marked.]
Elaboration of C1 caps
1N9K: acid phosphatase
1SU4: P-type ATPase
1O08: SDT1
1TA0/1U7O: CTD/MDP-1
1K1E: 8KDO phosphatase
1F5S: phosphoserine phosphatase
1MH9: deoxyribonucleotidase
1QYI: Zr25
Elaboration of C1 caps
Groups shown: Cof clade, NagD clade, HisB family (histidinol phosphate phosphatase family; Zn and C residues marked), PSP family
1NF2, Tm0651: Cof family
1NRW, Ywpj: Cof family
1L6R, Apc0014: Cof family
1U02, otsB: trehalose phosphate phosphatase family
1XVI, Yedp: mannosyl-3-phosphoglycerate phosphatase family
1FS5, serB: SerB subfamily
1VJR, Tm1742: CIN/AraL subfamily
1Y8A, Af1437: Af1437 subfamily
Enzymes may emerge from non-enzymatic ligand-binding domains by acquisition of key catalytic residues.
The development of special structures around the active site plays a major role in influencing catalytic mechanisms.
Active site convergence in hydroxylases: the elusive deoxyhypusine hydroxylase
eIF5A is a translation initiation factor found in archaea and eukaryotes and is required for the formation of the first peptide bond (it is the ortholog of bacterial EF-P).
Critical for its function is the modification of a highly conserved lysine: in archaea it is modified into deoxyhypusine, while in eukaryotes it is modified further into hypusine. This additional modification is critical for the fitness of all eukaryotic models studied to date.
eIF5A is the only protein in the whole cell with the unusual amino acid hypusine. This amino acid and the hydroxylase activity appear to be unique to eukaryotes.
While the enzyme catalyzing the first step is well known, deoxyhypusine hydroxylase (DOHH) eluded identification for 20 years.
All that was known about it was that it might be metal-dependent. This is not surprising, given that most other hydroxylases show this dependence.
The majority of hydroxylases can be unified into the double-stranded beta helix (DSBH) fold
2OG-Fe dioxygenases
Several new members were discovered in this class, including protein lysyl and prolyl hydroxylases and alkylated-base demethylases like AlkB, the DNA repair enzyme.
In addition to sharing a common fold, they share a peculiar HX[HQD] signature at the N-terminus and a conserved H at the C-terminus. They bind ligands in the interior and catalyze a range of monooxygenase or dioxygenase reactions.
However, there are two major divisions within them that differ in terms of their dependence on 2-ketoacid cofactors.
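A signature such as HX[HQD] translates directly into a regular expression, which is a common first step when scanning candidate sequences. The sketch below is illustrative only: the example sequence is made up, the function name is an assumption, and a real search would also check for the conserved C-terminal histidine and the fold context rather than relying on a three-residue pattern.

```python
import re

# HX[HQD]: histidine, any residue, then histidine, glutamine or aspartate.
# The lookahead reports overlapping occurrences as well.
SIGNATURE = re.compile(r"(?=(H.[HQD]))")


def find_signature(sequence):
    """Return (0-based position, matched tripeptide) for every HX[HQD] hit."""
    return [(m.start(), m.group(1)) for m in SIGNATURE.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical peptide, not a real protein sequence.
    print(find_signature("MKHAHLLDGSHTQ"))
```

Because the pattern is so short, hits are expected by chance in most proteins, so matches are only meaningful within an aligned family.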
AraC-like DSBH
This class includes non-catalytic sugar-binding domains of prokaryotic transcription factors and the JOR domain, which was predicted to demethylate chromatin proteins.
Despite these enzymes having all the necessary credentials to function as a deoxyhypusine hydroxylase, none of them had a phyletic pattern that mirrored that of DOHH. Experimental tests of enzymes of this family for DOHH activity also failed to uncover any activity.
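The phyletic-pattern argument can be sketched as a comparison of presence/absence profiles: a candidate for DOHH should be present wherever hypusine is made and absent elsewhere. In the sketch below the lineage list, the candidate names and their distributions are all hypothetical; only the expectation that the activity is restricted to eukaryotes comes from the text.

```python
# Hypothetical set of lineages used as columns of the phyletic profile.
LINEAGES = ["bacteria", "archaea", "fungi", "plants", "animals", "protists"]


def profile(present_in):
    """Presence/absence vector over LINEAGES."""
    return tuple(lineage in present_in for lineage in LINEAGES)


def mismatches(a, b):
    """Number of lineages in which two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


# Expected DOHH profile: eukaryotes only (the activity is unique to eukaryotes).
TARGET = profile({"fungi", "plants", "animals", "protists"})

# Invented candidate families and distributions, for illustration only.
CANDIDATES = {
    "candidate_DSBH_1": {"bacteria", "fungi", "animals"},
    "candidate_DSBH_2": {"bacteria", "archaea", "plants"},
    "candidate_HEAT": {"fungi", "plants", "animals", "protists"},
}


def rank_candidates(candidates, target):
    """Candidate names ordered from best to worst match to the target profile."""
    return sorted(candidates, key=lambda n: mismatches(profile(candidates[n]), target))


if __name__ == "__main__":
    print(rank_candidates(CANDIDATES, TARGET))
```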
Deoxyhypusine hydroxylase turns out to be a HEAT repeat protein with a symmetric dyad of 4 repeats
High-throughput protein interaction studies recovered a HEAT repeat protein as the principal eIF5A interactor outside of the translation apparatus
[Figure: multiple sequence alignment of the HEAT repeats of DOHH from S. cerevisiae, S. pombe, A. nidulans, N. crassa, L. edodes, E. cuniculi, T. annulata, C. parvum, A. thaliana, D. discoideum, D. melanogaster, D. rerio, H. sapiens and L. major, with the predicted helical secondary structure and the 95% consensus. The conserved HE motifs are marked.]
The metal-chelating sites are boxed.
Typically, HEAT repeat proteins are involved in protein-protein interactions.
But DOHH and a few related phycocyanobilin synthases are rare all-alpha-helical enzymes that use the HEAT repeats as their catalytic scaffold.
Completely different scaffolds but similar active sites
The arginine finger in catalysis of phosphohydrolysis: how general is it?
The nucleophilic attack by water on an NTP results in a hypercharged pentavalent intermediate, which needs to be stabilized for the reaction to proceed.
In GTPases this was found to be mediated by the GTPase-activating protein (GAP), which provides an arginine finger that stabilizes the intermediate.
[Figure: location of the arginine finger (R) in PilT, SFI/II, AAA+ ATPase, STAND, DnaB and HerA/FtsK.]
The tale of moving fingers in P-loop NTPases
Arginine fingers are widely utilized but are not conserved even within the P-loop NTPases.
They have evolved in at least 5 distinct families of enzymes and on at least 14 independent occasions in the P-loop NTPase fold. On at least one occasion, the P-loop NTPases have innovated a potassium finger, where a potassium ion is coordinated by an acidic residue.
Combining the spatial locations of R-fingers with the classification scheme for the P-loop NTPases suggests that the R-finger:
was probably absent in the ancestral version of the fold;
may have been acquired from a ribozyme, because arginine has been found to be a cofactor of phosphotransfer-catalyzing ribozymes;
has shifted position in the course of evolution.
This differential positioning of the R-finger allowed several different ways of coupling the free energy of
NTP hydrolysis to different downstream motor functions. Thus, it seems to have been a major factor
in the occupation of diverse sub-cellular niches by the P-loop NTPase fold.
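Mapping a feature onto a classification to infer its ancestral state and the number of independent origins can be illustrated with Fitch's small-parsimony procedure. This is a generic technique swapped in for illustration, not necessarily the analysis behind the slide. The tree, the family names and the finger positions below are hypothetical.

```python
def fitch(tree, states):
    """Fitch small parsimony on a binary tree given as nested 2-tuples of leaf names.

    Returns (candidate ancestral states at this node, minimum number of changes).
    """
    if isinstance(tree, str):  # leaf: its observed state, no changes yet
        return {states[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch(left, states)
    right_states, right_changes = fitch(right, states)
    shared = left_states & right_states
    if shared:  # children agree on at least one state: no extra change
        return shared, left_changes + right_changes
    # children disagree: keep both options and count one change
    return left_states | right_states, left_changes + right_changes + 1


# Hypothetical classification of six families and their R-finger status.
TREE = ((("fam_A", "fam_B"), ("fam_C", "fam_D")), ("fam_E", "fam_F"))
FINGER = {
    "fam_A": "none", "fam_B": "position_1",
    "fam_C": "none", "fam_D": "position_2",
    "fam_E": "none", "fam_F": "none",
}

if __name__ == "__main__":
    root_states, changes = fitch(TREE, FINGER)
    print(root_states, changes)
```

In this toy case the root is reconstructed as lacking a finger and two changes are required, which corresponds to two independent gains at different positions.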
Arginine finger (R): methenyltetrahydrofolate synthetase
Lysine finger (K): P-type ATPase (HAD superfamily)
Tale of two knots
The SPOUT superfamily of methyltransferases includes a vast group of RNA methylases that are prototyped by SpoU and TrmD. They mediate the transfer of -CH3 from AdoMet to various bases.
They differ from all classic methyltransferases in having a unique active site constellation.
The N-terminal motif involved in AdoMet binding is a glycine-rich loop similar to that of other methylases.
But they have an additional C-terminal motif that is associated with a structural knot.
[Figure: a regular Rossmannoid fold compared with the SPOUT fold, in which rotation of the C-terminal unit is associated with the knot; the AdoMet-binding site is marked.]
SET domain methylases have a knotted active site
The SET domain is a methyltransferase that is prevalent in eukaryotic chromosomal proteins.
Members of this superfamily methylate histones, other chromosomal proteins and cytoplasmic proteins such as RuBisCO and cytochromes.
Crystal structures suggested that it has a unique complex fold that is different from the classic methylases with Rossmann domains and from the SPOUT domains.
Phylogenetic and phyletic analysis suggests that this domain originated de novo in the eukaryotic lineage.
How did this happen?
Origin of the SET domain through duplication of a simple unit
[Figure: proposed steps. (1) An ancestral simple 3-strand unit exists as an obligate ligand-binding dimer. (2) Duplication favors knot formation. (3) Insertion or further duplication in the loop leads to differentiation of the two dimers. The knot and the AdoMet-binding site are marked.]
The ISOCOT domain is shared by sugar isomerases, eIF2B, DeoR transcription
factors, acyl-CoA transferases and methenyltetrahydrofolate synthetase
[Figure: topology diagrams of ISOCOT domains: the ISOCOT core structure, RpiA, NagB and Sol1, eIF2B (1t9k), MTHFS, MdcA (with its metal-chelating flap), and the N-terminal and C-terminal domains of CoA transferases (all CoA transferase ISOCOT domains). Strands S1 to S6 (including S4a, S4b and S6a) and helices H0 to H4 are labeled.]
Unusual features of the ISOCOT fold
The ISOCOT fold is a derived version of the Rossmannoid fold with a specialized flap tucked under
an archway
Despite considerable catalytic diversity, the general location of the substrate binding sites is similar
in this fold
There are unique extensions and inserts that form caps, which control access to the active site
Although these common positions are involved in substrate interactions throughout the superfamily,
there is considerable variety in terms of the actual residues in these positions
Interestingly, even ISOCOT fold enzymes catalyzing similar reactions may not share specific catalytic residues beyond the generic features of the fold, e.g.:
Ribose phosphate isomerase and methylthioribose-1-phosphate isomerase
Oxacid:acetyl-CoA transferase, malonate:acetyl-S-ACP transferase and citrate:acetyl-S-ACP transferase
In most other large enzymatic superfamilies, members catalyzing mechanistically similar reactions generally preserve a fixed set of highly conserved active site residues
Thus, the ISOCOT fold indicates that certain substrate-binding scaffolds may, by themselves, play a
major role in allowing particular catalytic activities, and show a lower dependence on strictly
conserved residues
Bacterial versus archaeal inheritances of eukaryotic enzymes
HAD superfamily
Vertical inheritance from archaea (total: 1)
MDP-1/RNA polymerase phosphatase
Early transfers from bacteria (total: 5)
Deoxyribonucleotidase family
YniC subfamily (BPGM family)
Dehr subfamily I (HAD family)
CUT1/CECR5 subfamily (NagD family)
Phosphomannomutase (PMM) family (Cof clade)
Late transfers from bacteria (total: 21)
8KDO phosphatase clade: vertebrates
cN-I nucleotidase clade: vertebrates
sEHCT/Acad10 subfamily: animals
Phosphohistidine/phospholysine phosphatase subfamily: animals
VSP subfamily: plants
Sucrose phosphate synthase C-terminal domain (SPSC) family: plants
Sucrose phosphate phosphatase (SPP) family: plants
CbbY subfamily: plants
HerA-associated family: plants
DOG subfamily (BPGM family): fungi
NapD subfamily (PSP family): fungi
PHM8-SDT1 subfamily (Sdt1p family): fungi, plants, microsporidians
Dehr subfamily II (HAD family): fungi, C. elegans, Giardia
YihX subfamily (Sdt1p family): some fungi, plants
and 7 others
The currently available data suggest that:
In most major superfamilies of enzymes, the direction of flow of laterally transferred proteins is from bacteria to eukaryotes and from bacteria to archaea.
In eukaryotes, most major biochemical innovations related to neurotransmitter biosynthesis, poly- and oligosaccharide chains for glycoproteins, and novel substrate utilization are due to enzymes acquired relatively late in eukaryotic evolution.
Thus, the bacteria not only contributed to the fundamental aspects of eukaryogenesis, but also appear to be the chief providers of new biochemical activities.
Acknowledgements
Aravind group: Vivek Anantharaman, Max Burroughs, Lakshminarayan Iyer
Collaborators: Eugene Koonin group (NCBI), Detlef Leipe group (NCBI), MH Park group (NICFD), Karen Allen group (Boston U)