Evolution of biocatalysis: structural and genomic trends
L. Aravind
Computational Biology Branch, National Center for Biotechnology Information

Enzymes achieve incredible feats with relative ease. How do they do that?

Haber process for NH3 synthesis (Fritz Haber)
N2(g) + 3H2(g) ↔ 2NH3(g) + ΔH
200-900 atmospheres pressure; 450°C; iron catalyst (molybdenum promoter).

Nitrogenase (bacterial nitrogen fixation)
N2 + 8H+ + 8e- ↔ 2NH3 + H2
Ambient pressure; 20-30°C; catalytic metal cluster with active molybdenum or vanadium; 16 ATP -> 16 ADP + 16 Pi.

Industrial and biological catalysts
Some metals are common to both industrial and biological catalysts: zinc, iron, molybdenum, nickel, vanadium, magnesium.
Yet the biological systems bypass the draconian temperature and pressure regimes.
So the protein scaffold and small-molecule cofactors matter in some way.

[Figure: evolutionary tree rooted at LUCA. Branch labels: other Rossmannoids; PP-ATPases; photolyase; USPA; ETFP A and B; aminoacyl-tRNA synthetases; HIGH nucleotidyltransferases. Annotations: translation; tRNA biogenesis; origin of the KMSK signature; origin of the bi-helical C-terminal module; nucleotidyl transferase; pyrophosphatase (ATP -> AMP + PP); ATPase, AMP binding.]
Where did the specificity come from?

At the junction between the protein and RNA worlds: RNase PH and RNase P
[Figure panel labels: RNase P, S5 domain (ribozyme); ribosomal protein S5, S5 domain (RNA-binding protein); RNase PH, active site region (enzyme).]
Both RNase P and RNase PH share a common S5 domain that is found in several nucleic acid-binding contexts.
RNase P has an active ribozyme, while RNase PH is entirely a protein enzyme.
RNase P: the active site is on the ribozyme.
RNase PH: the active site was probably built onto the protein core with an additional protein motif.
It is possible that the early nucleic acid-binding domains were in place even when ribozymes were still active.
Catalytic activities appeared in these proteins, slowly displacing the ribozymes.

The echoes of a lost world
Analysis of phyletic patterns and higher-order evolutionary relationships shows that many ancient protein folds had paralogous representatives that can be traced back to LUCA.
Most of these ancient folds contain RNA-binding versions, and the particular representatives are often associated with RNA binding.
We have evidence for protein synthesis before the translation apparatus was in place.
A ribozyme makes all extant proteins.
This suggests a possible role for RNA, and the emergence of enzymes through displacement of ribozymes by protein enzymes.

The continuing story of enzymes
General tendencies in enzyme evolution
  Are there different temporal phases in which different catalytic activities were acquired by different folds?
  Are there differences in the number of different catalytic activities accommodated by different folds? What are their obvious structural determinants?
Invention of enzymes in the later phases of evolution
  How do non-enzymatic domains become enzymes?
Similar active sites, different catalytic mechanisms
  Example of diversification of the acidic active site in the Rossmannoid class of domains and the HAD superfamily
  The tale of flaps, caps and squiggles in the HAD superfamily and solvent exclusion
Convergent evolution of active sites
  The tale of two hydroxylases and the discovery of the deoxyhypusine hydroxylase
  The case of moving fingers: convergent evolution of arginine fingers in different phosphohydrolases
  Structural convergence: the knotted active site of SPOUT and SET domain methyltransferases
Does the scaffold matter in catalysis?
  The mysterious case of the ISOCOT fold
Are there engines of innovation in the biosphere?
  The bacterial engine of enzyme innovation

Some fundamental concepts
Biocatalysis relies on a constellation of a few amino acid residues that are embedded in a distinct globular domain of the enzyme (the catalytic domain).
Generally, the set of catalytic residues is highly conserved during evolution, but variations occur in the substrate-binding and cofactor-binding sites.
This is exploration of "substrate space": essentially the same biochemical activity is adapted to a range of different substrates (same basic biochemical activity).
E.g. various Rossmann-fold methyltransferases transferring methyl groups from AdoMet to various substrates.
In contrast, enzymes with similar catalytic residues may explore considerable diversity in "reaction space" (different activities).
E.g. DnaG-type primases and most topoisomerases share a catalytic domain (the TOPRIM domain) with an identical core set of catalytic residues, yet catalyze two distinct reactions: nucleotidyltransferase (transferase; class 2 according to the EC classification) and topoisomerase (isomerase; class 5 according to the EC classification).
A more dramatic case of exploration of reaction space is seen in the stem glycolytic (Embden-Meyerhof) pathway: four of the glycolytic enzymes (1,6-fructose bisphosphate aldolase, triose phosphate isomerase, enolase and pyruvate kinase), which catalyze three distinct types of reaction, have the same structural scaffold, the TIM barrel.

[Figure: bar chart "Activities by superkingdom". x-axis: folds; y-axis: activities (0-18). Bars distinguish activities found in all 3 superkingdoms from those found in at least 1 superkingdom or in a few terminal branches. The individual fold labels are not recoverable from the extraction.]

[Figure: scatter plots of the number of distinct enzymatic activities (x-axis, "Activities") vs. the number of representatives of common folds (y-axis, "Number of Proteins (per 1000)") in the proteomes of the proteobacterium Pseudomonas aeruginosa and the yeast Saccharomyces cerevisiae. Legend: TIM barrel, RRM-like, HUP, DSBH, beta-propeller, RNaseH, metallo-beta-lactamase, P-loop ATPase, double-psi barrel, T-fold, rhodanese, PFL-like, HAD-like, HIT-like, Rossmann fold. Labeled points: P-loop NTPases, classic Rossmann fold.]
The number of activities in most common folds scales linearly with their prevalence in the proteome.
However, there are some striking exceptions, such as the P-loop NTPases and classical Rossmann fold enzymes, which show extensive proliferation but apparently very little exploration of reaction space.

Folds with few and many activities
[Figure: topology diagrams grouped as "Many" and "Few". Fold labels: RRM-like fold, beta-propeller, P-loop NTPase. Example labels: PSUS; MSOR; ACP; NDPK; cyclase; polymerase; primase (polymerase); dehydrogenase; phytase; arabinase; Walker A loop; Walker B aspartate. N-termini (NH2) and C-termini (COOH) are marked.]
For each fold, the positions of the catalytic residues from several representative examples are shown in red.

Of shapes and functions
The TIM barrel, the beta-propellers and the DSBH domain contain a central pocket, with an approximate cyclic symmetry, that binds their substrates and/or cofactors.
The pocket that is inherent to these structures allows easy accommodation of diverse substrate molecules through low-specificity interactions.
Subsequently, natural selection could act on these proteins to fix residues that impart interaction specificity and catalytic capacity.
The intrinsic symmetry of the central pockets of these folds creates the potential for different catalytic residues to emerge on the surface of the substrate-binding site, providing for the evolution of a wide range of activities.
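The scaling observation from the scatter plots can be made concrete with a small sketch. It is an illustration only: the numbers below are invented for the example and are not read off the slide's plots, and comparing each fold's proteins-per-activity ratio against the median ratio is an assumed, simple stand-in for "scales linearly, with exceptions", not the analysis used in the talk.

```python
from statistics import median

# Hypothetical (activities, proteins per 1000) pairs, invented for illustration.
# They are NOT the values plotted on the slide.
FOLDS = {
    "TIM barrel": (18, 20.0),
    "RRM-like": (12, 13.0),
    "HAD-like": (4, 5.0),
    "DSBH": (6, 6.5),
    "beta-propeller": (5, 5.0),
    "P-loop NTPases": (4, 32.0),
    "Classic Rossmann fold": (5, 24.0),
}


def proliferation_outliers(folds, factor=2.5):
    """Return folds whose proteins-per-activity ratio exceeds factor x the median ratio.

    Under linear scaling every fold has roughly the same ratio, so a fold far
    above the median has proliferated without exploring much reaction space.
    Every fold is assumed to have at least one activity.
    """
    ratios = {
        name: proteins / activities
        for name, (activities, proteins) in folds.items()
    }
    typical = median(ratios.values())
    return sorted(name for name, ratio in ratios.items() if ratio > factor * typical)


print(proliferation_outliers(FOLDS))
```

With these made-up numbers the two flagged folds are the classic Rossmann fold and the P-loop NTPases; the `factor` threshold is arbitrary and would need to be chosen against real data.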
The two-layered RRM-like fold, which consists of two helices packed against a four-stranded antiparallel sheet, represents another structural principle in the evolution of multicatalytic folds.
The main theme in this case appears to be the large exposed surface area of an entire sheet that is provided by the two-layered structure.
The P-loop and Rossmann folds are three-layered sandwiches in which a central sheet is protected on both sides by helices.
This leaves only the loops for interactions, and this configuration has been less favorable for exploring reaction space.

The UTRA domain: non-enzymatic domain to enzyme
[Figure labels: ancestral unit (dimerization and ligand binding); small-molecule-binding domain of HutC/FarR transcription factors; emergence of key polar residues; chorismate lyase.]
Chorismate lyase: chorismate -> pyruvate + 4-hydroxybenzoate

Acidic active site Rossmannoid folds
[Figure: topology diagrams of the DHH, receiver, VWA, TOPRIM and classic HAD domains, showing the core strands (S1-S6 and their equivalents S4eq, S5eq, S6eq) and the positions of conserved residues (D, DxD, DD/DxxxD, E, K, S, T/S). The C0/C1 and C2 cap insertion points are marked on the classic HAD domain.]

Anatomy of the HAD catalytic domain
[Figure labels: strands S1-S6 with S2.1, S3.1 and S3.2; flap; squiggle.]

Elaboration of C1 caps
[Figure: structures labeled 1N9K, acid phosphatase; 1SU4, P-type ATPase; 1O08, SDT1; 1TA0/1U7O, CTD/MDP-1; 1K1E, 8KDO phosphatase; 1F5S, phosphoserine phosphatase; 1MH9, deoxyribonucleotidase; 1QYI, Zr25.]

Elaboration of C1 caps (continued)
[Figure: group labels: Cof clade; HisB family; NagD clade; PSP family. Structures labeled 1NF2, Tm0651, Cof family; 1NRW, YwpJ, Cof family; 1L6R, Apc0014, Cof family; 1U02, OtsB, trehalose phosphate phosphatase family; 1XVI, YedP, mannosyl-3-phosphoglycerate phosphatase family; histidinol phosphate phosphatase family (Zn); 1FS5, SerB, SerB subfamily; 1VJR, Tm1742, CIN/AraL subfamily; 1Y8A, Af1437, Af1437 subfamily.]

Enzymes may emerge from non-enzymatic ligand-binding domains by acquisition of key catalytic residues.
The development of special structures around the active site plays a major role in influencing catalytic mechanisms.

Active site convergence in hydroxylases: the elusive deoxyhypusine hydroxylase
eIF5A is a translation initiation factor found in archaea and eukaryotes and is required for the formation of the first peptide bond (it is the ortholog of bacterial EF-P).
Critical for its function is the modification of a highly conserved lysine: in archaea it is modified into deoxyhypusine, while in eukaryotes it is modified further into hypusine.
This additional modification is critical for the fitness of all eukaryotic models studied to date.
eIF5A is the only protein in the whole cell with this unusual amino acid, hypusine.
This amino acid and the activity appear to be unique to eukaryotes.
While the enzyme catalyzing the first step is well known, deoxyhypusine hydroxylase (DOHH) eluded identification for 20 years.
All that was known about it was that it might be metal-dependent.
This is not surprising, given that most other hydroxylases show this dependence.

The majority of hydroxylases can be unified into the double-stranded beta-helix (DSBH) fold
2OG-Fe dioxygenases: several new members were discovered in this class, including protein lysyl and prolyl hydroxylases and alkylated-base demethylases like AlkB, the DNA repair enzyme.
In addition to sharing a common fold, they share a peculiar HX[HQD] signature at the N-terminus and a conserved H at the C-terminus.
They bind ligands in the interior and catalyze a range of monooxygenase or dioxygenase reactions.
However, there are two major divisions within them that differ in their dependence on 2-ketoacid cofactors.
AraC-like DSBH: this class includes non-catalytic sugar-binding domains of prokaryotic transcription factors and the JOR domain, which was predicted to demethylate chromatin proteins.
Although these enzymes had all the necessary credentials to function as a deoxyhypusine hydroxylase, none of them had a phyletic pattern that mirrored that of DOHH.
Experimental tests of enzymes of this family for DOHH activity also failed to uncover any activity.

Deoxyhypusine hydroxylase turns out to be a HEAT-repeat protein with a symmetric dyad of 4 repeats
High-throughput protein interaction studies recovered a HEAT-repeat protein as the principal eIF5A interactor outside of the translation apparatus.
[Figure: multiple sequence alignment in two blocks, each labeled "HEAT REPEAT", with predicted secondary structure (H, helix; E, strand) above each block and a 95% consensus line below it. Species: S. cerevisiae, S. pombe, A. nidulans, N. crassa, L. edodes, E. cuniculi, T. annulata, C. parvum, A. thaliana, D. discoideum, D. melanogaster, D. rerio, H. sapiens, L. major. "HE" labels mark positions along the alignment.]
The metal-chelating sites are boxed.
Typically, HEAT-repeat proteins are involved in protein-protein interactions.
But DOHH and a few related phycocyanobilin synthases are rare all-alpha-helical enzymes and use the HEAT repeats as their catalytic scaffold.
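The HX[HQD] signature described above for the DSBH hydroxylases can be read as a simple sequence pattern: a histidine, any residue, then a histidine, glutamine or aspartate. The sketch below only illustrates that notation. The function names and the toy sequence are hypothetical, and a three-residue pattern is far too short to identify a family on its own; real assignments rest on fold, alignment and phyletic evidence, as the slides describe.

```python
import re

# H-x-[HQD]: histidine, any residue, then histidine, glutamine or aspartate.
# The lookahead lets overlapping occurrences be reported as well.
HX_HQD = re.compile(r"(?=(H.[HQD]))")


def find_signature(sequence):
    """Return (1-based position, tripeptide) for every HX[HQD] occurrence."""
    return [(m.start() + 1, m.group(1)) for m in HX_HQD.finditer(sequence.upper())]


def has_downstream_histidine(sequence):
    """True if some HX[HQD] hit is followed, further along, by another histidine.

    This mimics the "signature near the N-terminus plus a conserved H near the
    C-terminus" arrangement in the crudest possible way.
    """
    seq = sequence.upper()
    # pos is 1-based, so the residues after the tripeptide start at index pos + 2.
    return any("H" in seq[pos + 2:] for pos, _ in find_signature(seq))


# Toy sequence invented for the example; it is not a real protein.
toy = "MKAHIDGGSLLHAEK"
print(find_signature(toy))
print(has_downstream_histidine(toy))
```

In the toy sequence the only hit is the tripeptide HID starting at residue 4; the later histidine at residue 12 is followed by A and E, so it does not start a second hit but does count as the downstream histidine.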
Completely different scaffolds but similar active sites.

The arginine finger in catalysis of phosphohydrolysis: how general is it?
The nucleophilic attack by water on an NTP results in a hypercharged pentavalent intermediate, which needs to be stabilized for the reaction to proceed.
In GTPases this was found to be mediated by the GTPase-activating protein (GAP), which provides an arginine finger that stabilizes the intermediate.
[Figure: positions of the arginine finger (R) in PilT, SFI/II, AAA+ ATPase, STAND, DnaB and HerA/FtsK.]

The tale of moving fingers in P-loop NTPases
Arginine fingers are widely utilized but are not conserved, even within the P-loop NTPases.
They have evolved in at least 5 distinct families of enzymes and on at least 14 independent occasions in the P-loop NTPase fold.
On at least one occasion, the P-loop NTPases have innovated a potassium finger, in which a potassium ion is coordinated by an acidic residue.
Combining the spatial locations of R-fingers with the classification scheme for the P-loop NTPases suggests that the R-finger:
was probably absent in the ancestral version of the fold;
was received from a ribozyme, because arginine has been found to be a cofactor of phosphotransfer-catalyzing ribozymes;
has shifted position in the course of evolution.
This differential positioning of the R-finger allowed several different ways of coupling the free energy of NTP hydrolysis to different downstream motor functions.
Thus, it seems to have been a major factor in the occupation of diverse sub-cellular niches by the P-loop NTPase fold.
[Figure: arginine finger (R) in methenyltetrahydrofolate synthetase; lysine finger (K) in P-type ATPase (HAD superfamily).]

Tale of two knots
The SPOUT superfamily of methyltransferases includes a vast group of RNA methylases that are prototyped by SpoU and TrmD.
They mediate the transfer of -CH3 from AdoMet to various bases.
They differ from all classic methyltransferases in having a unique active site constellation.
The N-terminal motif involved in SAM binding is a glycine-rich loop similar to that of other methylases.
But they have an additional C-terminal motif that is associated with a structural knot.
[Figure labels: AdoMet; regular Rossmannoid fold; SPOUT; knot; rotation of C-terminal unit.]

SET domain methylases have a knotted active site
The SET domain is a methyltransferase that is prevalent in eukaryotic chromosomal proteins.
Members of this superfamily methylate histones, other chromosomal proteins, and cytoplasmic proteins such as RuBisCO and cytochromes.
Crystal structures suggested that it has a unique, complex fold that is different from the classic methylases with the Rossmann domains and from the SPOUT domains.
Phylogenetic and phyletic analysis suggests that this domain originated de novo in the eukaryotic lineage.
How did this happen?

Origin of the SET domain through duplication of a simple unit
[Figure labels: ancestral simple 3-strand unit; existence as obligate ligand-binding dimer; duplication favoring knot formation; knot; insertion/further duplication in loop: differentiation of two dimers; AdoMet.]

The ISOCOT domain is shared by sugar isomerases, eIF2B, DeoR transcription factors, acyl-CoA transferases and methenyltetrahydrofolate synthetase
[Figure: topology diagrams of the ISOCOT core structure and of the ISOCOT domains of RpiA, eIF2B (1t9k), MTHFS, NagB and Sol1, MdcA, and the CoA transferase N-terminal domain (all CoA transferase ISOCOT domains). Labels: strands S1-S6 with S4a, S4b and S6a; helices H0-H4; metal-chelating flap; metal; CoA; N- and C-termini.]

Unusual features of the ISOCOT fold
The ISOCOT fold is a derived version of the Rossmannoid fold with a specialized flap tucked under an archway.
Despite considerable catalytic diversity, the general location of the substrate-binding sites is similar in this fold.
There are unique extensions and inserts that form caps, which control access to the active site.
Although these common positions are involved in substrate interactions throughout the superfamily, there is considerable variety in the actual residues at these positions.
Interestingly, even ISOCOT fold enzymes catalyzing similar reactions may not share specific catalytic residues beyond the generic features of the fold. Examples:
Ribose phosphate isomerase and methylthioribose-1-phosphate isomerase
Oxacid:acetyl-CoA transferase, malonate:acetyl-S-ACP transferase and citrate:acetyl-S-ACP transferase
In most of the other large enzymatic superfamilies, members catalyzing mechanistically similar reactions generally preserve a fixed set of highly conserved active-site residues.
Thus, the ISOCOT fold indicates that certain substrate-binding scaffolds may, by themselves, play a major role in allowing particular catalytic activities, and show a lower dependence on strictly conserved residues.

The bacterial engine of enzyme innovation

Bacterial versus archaeal inheritances of eukaryotic enzymes: HAD superfamily
[Figure: HAD superfamily groups in eukaryotes, arranged by source (Archaea or Bacteria) and by mode of inheritance (vertical inheritance, early transfers, late transfers).]

Archaea, vertical inheritance (total: 1)
MDP-1/RNA polymerase phosphatase

Bacteria, early transfers and late transfers
Lineage-restricted groups (total: 21)
8KDO phosphatase clade: vertebrates
cN-I nucleotidase clade: vertebrates
sEHCT/Acad10 subfamily: animals
Phosphohistidine/phospholysine phosphatase subfamily: animals
VSP subfamily: plants
Sucrose phosphate synthase C-terminal domain (SPSC) family: plants
Sucrose phosphate phosphatase (SPP) family: plants
CbbY subfamily: plants
HerA-associated family: plants
DOG subfamily (BPGM family): fungi
NapD subfamily (PSP family): fungi
PHM8-SDT1 subfamily (Sdt1p family): fungi, plants, microsporidians
Dehr subfamily II (HAD family): fungi, C. elegans, Giardia
YihX subfamily (Sdt1p family): some fungi, plants
and 7 others
Groups listed without a lineage restriction (total: 5)
Deoxyribonucleotidase family
YniC subfamily (BPGM family)
Dehr subfamily I (HAD family)
CUT1/CECR5 subfamily (NagD family)
Phosphomannomutase (PMM) family (Cof clade)

The currently available data suggest that:
In most major superfamilies of enzymes, the direction of flow of laterally transferred proteins is from bacteria to eukaryotes and from bacteria to archaea.
In eukaryotes, most major biochemical innovations related to neurotransmitter biosynthesis, poly- and oligosaccharide chains for glycoproteins, and novel substrate utilization are due to enzymes acquired relatively late in eukaryotic evolution.
Thus, the bacteria not only contributed to the fundamental aspects of eukaryogenesis but also appear to be the chief providers of new biochemical activities.

Acknowledgements
Aravind group: Vivek Anantharaman, Max Burroughs, Lakshminarayan Iyer
Collaborators: Eugene Koonin group (NCBI), Detlef Leipe group (NCBI), MH Park group (NICFD), Karen Allen group (Boston U)