* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Interdependence, Reflexivity, Fidelity, Impedance Matching
Non-coding DNA wikipedia , lookup
Magnesium transporter wikipedia , lookup
Bottromycin wikipedia , lookup
Gene regulatory network wikipedia , lookup
Western blot wikipedia , lookup
RNA silencing wikipedia , lookup
Protein moonlighting wikipedia , lookup
Nucleic acid analogue wikipedia , lookup
Protein (nutrient) wikipedia , lookup
Silencer (genetics) wikipedia , lookup
Ancestral sequence reconstruction wikipedia , lookup
Protein adsorption wikipedia , lookup
Epitranscriptome wikipedia , lookup
List of types of proteins wikipedia , lookup
Two-hybrid screening wikipedia , lookup
History of molecular evolution wikipedia , lookup
Deoxyribozyme wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Amino acid synthesis wikipedia , lookup
Point mutation wikipedia , lookup
Non-coding RNA wikipedia , lookup
Protein structure prediction wikipedia , lookup
Gene expression wikipedia , lookup
Molecular evolution wikipedia , lookup
Biochemistry wikipedia , lookup
bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. SubmittedtoMolecularBiologyandEvolutionasanArticle Bestfit:Discoveriessection Interdependence,Reflexivity,Fidelity,ImpedanceMatching,andtheEvolutionofGeneticCoding CharlesW.Carter,Jr1andPeterWills2 1.DepartmentofBiochemistryandBiophysics,UniversityofNorthCarolinaatChapelHill,ChapelHill,NC 27599-7260 ORCIDID:/0000-0002-2653-4452 2.DepartmentofPhysics,UniversityofAuckland,PB92109,Auckland1042,NewZealand ORCIDID:/0000-0002-2670-7624 CorrespondingAuthor:CharlesW.Carter,Jr.Email:[email protected] ABSTRACT Genetic coding is generally thought to have required ribozymes whose functions were taken over by polypeptideaminoacyl-tRNAsynthetases(aaRS).TwodiscoveriesaboutaaRSandtheirtRNAsubstrates nowfurnishaunifyingrationalefortheoppositeconclusion:thatthekeyprocessesoftheCentralDogma of molecular biology emerged simultaneously and naturally from simple origins in a peptide•RNA partnership,eliminatingtheepistemologicalneedforapriorRNAworld.First,thetwoaaRSclasseslikely arosefromoppositestrandsofthesameancestralgene,implyingasimplegeneticalphabet.Inversion symmetriesinaaRSstructuralbiologyarisingfromgeneticcomplementaritywouldhavestabilizedthe initialandsubsequentdifferentiationofcodingspecificitiesandhencerapidlypromoteddiversityinthe proteome. Second, amino acid physical chemistry maps onto tRNA identity elements, establishing reflexivityinproteinaaRS.Bootstrappingofincreasinglydetailedcodingisthusintrinsictopolypeptide aaRS,butimpossibleinanRNAworld.Thesenotionsunderlinethefollowingconceptsthatcontradict gradualreplacementofribozymalaaRSbypolypeptideaaRS:(i)anysetofaaRSmustbeinterdependent; (ii)reflexivityintrinsictopolypeptideaaRSproductiondynamicspromotesbootstrapping;(iii)takeover of RNA-catalyzed aminoacylation by enzymes will necessarily degrade specificity; (iv) the Central Dogma’semergenceismostprobablewhenreplicationandtranslationerrorratesremaincomparable. These characteristics are necessary and sufficient for the essentially de novo emergence of a coupled gene-replicase-translatase system of genetic coding that would have continuously preserved the functionalmeaningofgeneticallyencodedproteingeneswhosephylogeneticrelationshipsmatchthose observedtoday. RunningTitle:EvolutionofGeneticCoding Keywords:Aminoacyl-tRNAsynthetases,Bootstrapping,Evolutionoftranslation,MolecularPhylogeny 1 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. Introduction:whencemoleculargenetics? Gene expression consists of interpreting symbolic information stored in nucleic acid sequences. This irreversible computational process creates intrinsically novel meaning, and is thus fundamentally differentfromthephysicalchemistryunderlyingothernaturalprocesses,distinguishingitevenfromthe molecular biological processes of replication and transcription. Our goal here is to provide a new conceptual basis for understanding how informational readout and the synthesis of peptide catalysts frominstructionsingenesmightfirsthaveemergedandthenevolvedcompatiblywithinheritance. A. TheCentralDogmaandtheadaptorhypothesisimplyaminoacyl-tRNAsynthetases(aaRS) Itishelpfultothinkabouttheoriginofgeneticswithintheconceptualframeworkfirstarticulatedby Crick with recent modifications (Fig. 1). Crick recognized that protein synthesis must be directed by information archived in DNA sequences and that information flow proceeds unidirectionally via an intermediate RNA “message” to ribosomes. He proposed separately that linking gene sequences to proteinsequencesrequiredtheinterventionofathirdRNAcomponent(Crick1955)to“adapt”individual aminoacidsto“codon”unitsinthemessage(Fig.1A),accountingfortheinitiallyobscurerelationship betweenwhatturnedouttobecollinearsequencesofgenesandproteins. Participation of the adaptor, transfer RNA (tRNA), involves creating a covalent bond between its 3’ terminusandthecarboxylategroupofanappropriateaminoacid.Creationofthatbond,inturnrequires activation of the amino acid’s α-carboxyl group by reaction with ATP. In cells, activation and aminoacylation require a separate enzyme for each amino acid. These assignment catalysts, called aminoacyl-tRNAsynthetases(aaRS),werefirstclearlyidentifiedbyBergandOfengand(1958). Executing genetic coding rules requires that aaRSs recognize both amino acids and tRNAs with high specificitysothattheformercanbeescortedtotheribosomebythelatterforproteinsynthesis.However, specificrecognitionbyfoldedproteinsdependsonacomplex“ecology”basedonthechemicalbehavior andinteractionsofindividualaminoacids(Fig.1B).Thatbehaviorcanbeaccuratelyparameterizedby two experimental Gibbs phase transfer free energies—from vapor to cyclohexane and from water to cyclohexane—relatedtothesizeandpolarity,respectively,ofeachaminoacid’ssidechain(Carterand Wolfenden2015;Wolfenden,etal.2015;CarterandWolfenden2016).Correlationsbetweenthesefree energiesandtRNAidentityelementsrecognizedbyaaRSandthedistributionofaminoacidsbetween surfaces and cores after protein folding established these parameters as the main axes of a kind of “periodictable”ofaminoacids(CarterandWolfenden2016)concatenatedinchainsthatfoldtogenerate proteinsofvirtuallyunlimitedfunctionaldiversity,inanalogytojoiningatomstoformmolecules. ImplementingtheCentralDogma—theirreversibleattachmentsofaminoacidstocodon-specifictRNAs by aaRSs—thus exploits the ecology of the amino acids within those enzymes. Proteins folded in accordancewithsuchecologiesthat,inturn,executecomputationallycontrolledproductionfromgenes of specialized amino acid ecologies (including their own! ) compose a reflexive property known as a 2 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. “strangeloop”(Fig.1C;(Hofstadter1979).Recognizingthatloopopensfundamentallynewwaystothink aboutwhatenabledtheaaRStoemergeastheonlyproteinscodedbyprogramswrittenasmRNAthat can, once folded, collectively interpret the programming language in tRNA. We propose that this reflexivityinfunctionalchemistryandencodedinformationplayedacrucialroleincreatinggenetics. B. TheRNAWorldhypothesisfailstoaddresskeyquestionsaboutgeneexpression. Thedefaultframeworkforthinkingabouthowgeneticsemergedhasbeenafacilesolutiontotheproblem thatlifesimultaneouslyrequiresthatgeneticinformationmustbepassedfromgenerationtogeneration, and that catalysts must synchronize rates of chemical reactions underlying the accuracy in gene replication, expression, and metabolism. Base pairing between complementary nucleic acid strands answeredtheformerproblemimmediatelyanddecisively,oncethehelicalstructureofdouble-stranded DNAwaselucidated(WatsonandCrick1953),andpointedlyhighlightedthesecondproblem. ThecrystalstructureoftRNAPhe(Kim,etal.1973)revealedthat,unlikeDNA,RNAcanassumetertiary structures,consistentwithproposals(Woese1967;Crick1968;Orgel1968)thattheearliestcatalysts alsomighthavebeenRNAsthatcould“dothejobofaprotein”(Crick1968).Thathypothesishasbeen sustainedalmostexclusivelybytheobservationthat,whereasproteinscannotreadilystoreortransmit digitalinformation,RNAdoeshaverudimentarycatalyticproperties(Cech1986;Guerrier-Takada1989). The expedient conclusion that RNAs functioned as both genes and catalysts in a life form devoid of proteinswasrapidlyembracedas“theRNAWorld”(Gilbert1986). Theclaritywithwhichbase-pairingsolvedtheinheritanceproblemandthediscoveryof,andfascination with,catalyticRNAshort-circuitedthequesttounderstandandanswerdeeperquestions: CatalyticRNAitselffallsfarshortoffulfillingthetasksnowcarriedoutbyproteins.Theterm“catalytic RNA” overlooks three fundamental problems: (i) it vastly overestimates the potential catalytic proficiency of ribozymes (Wills 2016); and fails to address either (ii) the computational essence of translation, or (iii) the requirement that, throughout the evolution of translation and intermediary metabolism,catalystsnotonlyaccelerate,butmoreimportantly,synchronizechemicalreactionswhose spontaneousratesdifferbythan1020–fold(WolfendenandSnider2001). Thenexusconnectingpre-bioticchemistrytobiologyisnotreplicationbutthetranslationtablethatmaps amino acid sequences of functional proteins onto nucleotide triplet codons. The quintessential problem posed by life’s diversity (Carter and Wolfenden 2016; Wills 2016) is how that critical transformation becameembedded,inparallel,intotRNAandgenesequences,togetherwiththeribosomalread-write mechanism(Bowman,etal.2015;PetrovandWilliams2015).SpontaneousfoldingofRNAaptamersand thedynamicsofanRNAworlddonotrequireencodingintogeneticinformationandhencefallwellshort ofwhat“conversiontoafunctionalmolecule”(HorningandJoyce2016)impliesforDarwinianevolution byselectionactingonphenotypes(Wills2016). 3 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. By the time protein folding organizes amino acid side chains into a functional active site, genetic informationhasbeenirreversiblytransformed.Anymolecularmachinechargedwithreversingtranslation byunfolding,then“reading”thesequenceofaproteinwouldrequireshuttlingeachsuccessiveamino acidthrough~20activesitesuntilonefitted,andthenovercomingtheredundancyofthegeneticcode. By enabling the inheritance of genetically encoded characteristics this one-way flow of genetic informationenshrinedintheCentralDogma(Koonin2015)ensuresthatbiologicalevolutiontranscends thesimplepopulationdynamicsofnaturalselectioninanyRNAworld. RNAresearchhasneverprovided,eitherexperimentallyorconceptually,evenanapproximatemodelfor howanearlyrandomcatalyticnetwork,withoutencodedproteins,mighthaveprogressivelybootstrapped thespecificityandselectivitycharacteristicofenzymicsystems(Hordijk,etal.2014).Thus,evolutionof synchronizedcatalysisrequiredsimultaneousevolutionofgeneticcoding. C. Many nevertheless embrace the RNA World with little reservation (Wolf and Koonin 2007; VanNoorden2009;Yarus2011b,a;Bernhardt2012;Breaker2012;RobertsonandJoyce2012). Itisimportant,therefore,toassesstheexperimentaldataonwhichthehypothesisrestsandtoseparate datathatgenuinelysupportthehypothesisfromthosethatonlyappeartodoso. SelectingevermoreproficientRNAaptamersfromlargecombinatoriallibrariesbasedoriginallyonselfsplicing introns only appears to support the existence of ancestral ribozymal polymerases. Despite the technical elegance and practical value of Selex experiments (Tuerck and Gold 1990), even fullydevelopedaptamerreplicases(Wochner,etal.2011;Attwater,etal.2013;SczepanskiandJoyce2014; Taylor, et al. 2015; Horning and Joyce 2016) would support only a limited version of the RNA World hypothesiswithoutphylogeneticrelationshipsconnectingthemtobiologicalancestry.Sofarasweknow, allnucleicacidsincontemporarybiologyaresynthesizedbyproteinenzymes,muchas,reciprocally,the synthesis of proteins from activated amino acids is catalyzed by an RNA template at the peptidyl transferasecenteroftheribosome(Noller,etal.1992;Noller2004;Petrov,etal.2014;Bowman,etal. 2015).Thusnophylogeneticbasisexistsforancestralribozymalpolymerases. The most compelling evidence that proteins were first coded by ribozymes is the extensive phylogenetic analysisofcontemporaryproteinfamilies.Kooninandcolleagues(Aravind,etal.2002;Leipe,etal.2002; KooninandNovozhilov2009;Koonin2011),andothers(Caetano-Anolles,etal.2007;Caetano-Anollés, et al. 2013; Caetano-Anollés and Caetano-Anollés 2016) argue that protein domains speciated substantially before the advent of protein-based aminoacyl-tRNA synthetases and translation factors. Consequently, they argue, a fully developed ribozyme-based version of the contemporary universal geneticcodemusthavefirstmappedRNAsequencestotheaminoacidsequencesofpeptides.Wewill call this fully-blown RNA World scenario the “RNA Coding World” (RCW; see also (Rodin and Rodin 2006b,a;RodinandRodin2008;Rodin,etal.2011)).Thecontemporary“ProteinCodingWorld”(PCW), whichusesaaRSenzymestoattachaminoacidstocognatetRNAs,isenvisagedtohaveevolvedbyaseries 4 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. of“takeovers”,wherebythecodingfunctionsofaaRSribozymeswereprogressivelyreplaced,without disruption, by enzymic counterparts. Our analysis articulates the improbability that such a takeover couldeverhavetakenplace. D. ContemporaryaaRSsfurnishcluesabouthowtheybecamemolecularinterpreters. Understanding the evolutionary basis for the Central Dogma (Fig. 1) requires asking how selforganization and selection might have produced, from nearly random origins, finely tuned ecological nichesofaminoacidsarrangedtoprovidethecatalyticandpattern-matchingcapabilitiesnecessaryfor theoperationofacodeusinga20-letteralphabet.Weenvisionthatthisprocessbeganwithareduced alphabetadministeredbyasmall“bootblock”thatgrewbystepwiseincreasesinalphabetsize,inwhich the information that survived (i.e. was selected) at each stage was such that it could be used by the existinginterpreterstomakethemselves,inspiteoftheerrorsthattheymade. Accumulatingsucherrorsleadspotentiallytotwotypesof“catastrophes”.Replicativeerrorseventually limit the survival of progressively longer “genes”, and can produce what has been called an “Eigen catastrophe” (Eigen and Schuster 1977). Similarly, translation errors eventually limit the functional specificity available to maintain a cell’s biochemical network, and can lead to an “Orgel catastrophe” (Orgel1968).Thus,replicationandtranslationerrorsrepresentthemostsignificantresistancetothe emergenceandgradualenhancementofbiologicalcomplexity. Eigen (Eigen 1971; Eigen and Schuster 1977) articulated a strategy for integrated survival of both informationandfunctionalspecificityinanerror-pronenetworkinvolving“informationcarriers”and “functional catalysts”. He noted that the survival of separate molecular species may be enhanced in systems containing either codependent or multiply interdependent components, whose cooperation with other members of the set might assist survival of sets of molecules that would otherwise be eliminated by competition. He called the symmetrical arrangement of components within these sets “hypercycles”,aconceptthatcanbegeneralizedtoincludeotherinterdependentarrangements. Although early aaRS phylogenies should record the order in which enzymic aaRS appeared, either ab initioorduringtheirtakeoverofribozymalaaRSs,earlierauthors(Woese,etal.2000)doubtedthatsuch information would be relevant to the emergence of genetic coding. §I, however, summarizes a new interpretationofevidence,fromexperimentaldeconstructionofbothaaRSclasses(Chandrasekaran,et al. 2013; Carter 2014; Carter, et al. 2014; Carter 2015, 2016, 2017), that all contemporary aaRS descended in modular fashion from a single bi-directional gene, whose strands coded for functional ancestors,respectively,ofClassIandIIsynthetases.Productsofthatgeneappeartohavebeenoptimally differentiatedandcraftedalmostideallytoestablishhypercycle-likeinterdependence,implementinga minimal amino acid alphabet—all characteristics of the “boot block” envisioned to have first enabled genetic coding. This bi-directional coding ancestry necessarily coupled the evolutionary descent of 5 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. contemporaryClassIandIIaaRSsphylogenies(O’DonoghueandLuthey-Schulten2003;WolfandKoonin 2007;Caetano-Anollés,etal.2013)discussedin§Babove. Wethereforefaceastarkchoice:eitherthesequentialaaRSdecompositionsintoincreasinglyconserved fundamentalmodules—urzymes(Ur=primitive;(Li,etal.2013;Carter2014;Martinez,etal.2015))and protozymes (Proto = before; (Martinez, et al. 2015)—and their bi-directional coding ancestry (Chandrasekaran,etal.2013;Carter,etal.2014;Carter2015),ortherelativeoriginsofmultipleprotein superfamiliesderivedfromcurrentphylogeneticanalysesmustbewrong.Weoutlinearesolutionin§III. Phylogenetic and biochemical evidence has been supplemented by developing statistically significant (CarterandWolfenden2015,2016)relationshipsbetweenidentityelementsthatdictaterecognitionof tRNAbydifferentsynthetasesandthesizeandpolarityofaminoacidsidechains.Codingrelationships implementedintRNArecognitionarethereforenotarbitrary,butreflectthedeeplyrelevantinnerlogic of protein folding rules (Carter and Wolfenden 2015; Wolfenden, et al. 2015; Carter and Wolfenden 2016).Weconsiderthispoint,reflexivity,andotherrelevantconceptsingreaterdetailin§II. CorrelationsofthetRNAidentityelementsalsorevealedthatsignalsforthetwophysicalpropertiesare distributed differently between the anticodon (specifying polarity) and the acceptor stem (specifying size).This,inturn,furnisheddetailsofan“operationalRNAcode”inthetRNAacceptorstem(Schimmel, et al. 1993) that likely preceded development of coding by the tRNA anticodon, perhaps first implementingonlythebinarydiscriminationbetweenlarge(ClassI)andsmall(ClassII)sidechains.This newevidencemakesitdifficulttoimaginehowasophisticatedproteomesynthesizedbyanadvanced ribozyme-basedtranslationsystemcouldhavesurvivedanytransitionviaaprotein-basedexpression systemwithfidelityaslowasthatofprimordialbinarycoding. RESULTS I. AARSclassdualitieswouldhavehelpedtostabilizequasispeciesbifurcations. ThreefunctionalitiesgivetheaaRStheiruniquestatusastheearliestenzymes:(i)Theyaccelerateamino acidactivationattheexpenseoftwoATPphosphates~1014-fold,resultinginirreversiblesynthesisof aminoacyl 5’AMP. The uncatalyzed rates of all other reactions in protein synthesis are orders of magnitude faster than thatreaction, whichthuslimitstherateofprebioticproteinsynthesis.(ii)The adenosineinATPservesasanaffinitytagthatincreasesactive-sitebinding1000-fold,enhancingcoding assignmentspecificity,especiallywhereeditingisrequired.(iii)TheyacylatetRNA,covalentlylinkinga specifiedaminoacidtoatRNAmoleculebearingacode-cognateanticodon. Notably,twodistinctsetsofhomologousaaRSstructures,ClassIandClassII(Cusack,etal.1990;Eriani, etal.1990;Ruff,etal.1991),implementthesethreefunctionsindisparateways.Thetwoclassesactivate symmetricalsetsof10aminoacids.Bothclasseshaveonemajor(A),andtwodifferentminorsubclasses (BandC)(Cusack1994).ThecommonoriginofthetwoaaRSclassesonoppositestrandsofthesame 6 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. ancestralgene(RodinandOhno1995)remainedobscureuntilquiterecently(Martinez,etal.2015).This sectionoutlineshowconsequencesofthisdualityatmultiplestructuralandfunctionallevelsmayhave servedtodifferentiateandstabilizeearlystagesofgeneticcodinginthefaceofhigherrorrates. Anancestralbi-directionalgeneproducedaClassIancestorwithamodestspecificityforlargeraminoacids, and a Class II ancestor with similar specificity for smaller amino acids. No useful information can be encodedusingonlyonekindorequivalentclassofactivatedaminoacid.Thesimplestimaginablecode wouldhaverequireddiscriminatingbetweenatleasttwokindsofaminoacids.Theinterestingscenarios (Wills2004)thusentailgeneratingthefullcodefromsimple2-or4-letteralphabetsviatransitions,each ofwhichincreasestheeffectivesizeneffoftheaminoacidandcodonalphabets.Nestedinstabilities(Wills 2004)allowforsuchaseriesofcode-expandingtransitionstoattractorstateswithprogressivelylarger valuesofneff.Thesetransitionsconnectdynamicstateswithsignificanterrorratesandthusentailbroad distributionsoffunctionalproteinsequencesandtheirencodinggenesthatarecalled“quasispecies”,so wecallthecorrespondingtransitions“quasispeciesbifurcations”(Fig.2). TheTrpRSandHisRSurzymes(Li,etal.2013)andthedesignedClassI/IIprotozymegene(Martinez,et al. 2015) furnish substantive experimental representations of the ancestral assignment catalysts envisionedbyWills(2004).Geneproductscreatedfromoppositestrandsutilizingthefullgeneticcode bothaccelerateaminoacidactivation~106-fold.Bothwild-typeprotozymesexhibithighATPaffinityand the Class I protozyme possesses a consensus phosphate binding site composed entirely of oriented backboneNHgroups(Hol,etal.1978).Thus,itseemsplausibleandofobviousinterestthatprotozymes codedusingfewerthanthecanonical20aminoacidsmightretainsubstantialcatalyticactivity. Acodingsystemassigningdualclassesofaminoacids{a,b}thatarefunctionallydifferentiatedinacrude binaryfashiontotRNAswithanticodonscomplementarytocodons{A,B}bysuchspeciescouldbifurcate intotwoversionstoproducefour-membercodonandaminoacidalphabets,{A,B,C,D}and{a,b,c,d}, increasingthecodingcapacityfrom2lettersto4letters,andexpandingthe2´2translationtableintoa 4´4table.Insimulationsofsuchaprocess(Wills2004,2009),thehierarchicallynestedembeddingof assignment activities in the protein sequence space geometrically mirrored the decomposition of the alphabets. The system showed stepwise coding self-organization, first from a non-coding state to the executionofabinarycode{A®a,B®b}andthenfromthebinarycodetotheexpandedfour-dimensional code{A®a,B®b,C®c,D®d},anticipatingexperimentalstudiesofthetwosynthetaseClasses(Fig.2). Ancestralbi-directionalcodingwouldhaveimpactedtheemergenceandevolutionofgeneticcodingin severalimportantways.First,theancestralgenewouldhavepartitionedthesequencespacedecisively, dividing it between those sequences related most closely to each of the two strands. Second, the translated products of each strand would have differentiated the functional specificities retained by sequences surrounding the centroids of the two populations. Third, the bidirectional coding 7 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. complementarity constraint steepens the fitness landscape, decisively enforcing coding cooperation compared to the corresponding possibilities for genes that could mutate independently, thereby increasing selection pressure for coding. Cooperation is therefore more robust than it would be with independentgenesfortheClassIandIIurzymes.Finally,thereducedvolumesofsequencespaceand enhanced functional specialization of the two bi-directionally coded quasispecies suggest that fewer mutations were necessary for neofunctionalization of subsequent duplications, successively easing subsequentbifurcationsasneffincreasedduringthebi-directionalcodingregime. A. ExperimentaldeconstructionsofClassIandIIaaRSrevealparallelstructuralhierarchies. Apuzzlinghierarchyofinversionsymmetriesinthestructural,functional,andevolutionarybiologyof contemporaryaaRSsnowappearstomakesenseiftheaaRSsareremnantsofsuchbifurcations. Superimposing Class I and II aaRS catalytic domains reveals small invariant cores, distinct from idiosyncraticelementsuniquetoeachaminoacid.LikeRussianMatryoshkadolls,paralleldeconstruction ofbothClassIandIIaaRSfamiliesrevealsnested,increasinglyconservedmodularcatalystsofnearly equalmolecularmass(Carter2014):catalyticdomains(200-350residues),urzymes(120-130residues; (Pham, et al. 2007; Pham, et al. 2010; Li, et al. 2011; Li, et al. 2013)), and protozymes (46 residues; (Martinez,etal.2015)),eachretainingconservedportionsfromitsprecedingconstruct. Urzymesretainallnecessaryfunctionsoffull-lengthaaRS,albeitwithlowerproficiencyandspecificity, andareanalogoustousing“molecule”todefinethesmallestunitofmatterthatretainsallpropertiesof achemicalsubstance.Protozymes,ontheotherhand,approachthesmallestpolypeptidecatalystsand henceareperhapsmoreanalogousto“atoms”. Publishedevidencethatexperimentalurzymecatalyticactivitiesariseneitherfromtinyamountsofwildtype enzyme nor from unrelated, but highly active contaminants includes the following (Pham, et al. 2010;Li,etal.2011):(i)emptyvectorcontrolshavenoactivity.(ii) proteasecleavagetaggedfusion proteinsreleasescrypticactivity.(iii)Mutationsalteractivity.(iv)AminoacidKMvaluesdifferfromWT values, and, most importantly, (v) single turnover active-site titration experiments show pre-steadystate burst sizes demonstrating that 35−75% of molecules transiently form tight transition-state complexes.Experimentalassaysofprotozymeswerevalidatedbyshowingthatactive-sitemutantsH18A (ClassI)andR113A(ClassII)eliminatedactivityoftherespectivecatalyst(Martinez,etal.2015). Modular accretions in the structurally unrelated Class I and II protein superfamilies exhibit parallel accelerationsoftherate-limitingstepofproteinsynthesisovera108-foldrange.Experimentaltransitionstatestabilizationfreeenergiestracklinearlywithnumberofresiduesindeconstructedconstructsfrom bothClasses,reinforcingthejustificationoftheseconstructsassnapshotsintheparallelevolutionofboth synthetase classes (Martinez, et al. 2015). Urzymes retain ~60% of the full-length transition state stabilization free energy observed in modern synthetases. Protozymes are only 46 amino-acids-long. AlthoughtheyretainonlytheATPbindingsites,aaRSprotozymesfrombothClassIandIIaaRSexhibit 8 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. ~40%ofthefull-lengthtransition-statestabilization,buthavenotyetbeenshowneithertoacylatetRNA ortodiscriminatesignificantlybetweendifferentaminoacids. Theseaccelerationsdocumentthatmultiplefamiliesofproteinsarecapableofsynchronizingchemical reactions over a very broad range from the uncatalyzed rate to that observed in contemporary organisms.RNAhasnotbeenshowncapableofparallelrateaccelerationsoversuchadynamicrange either in parallel families or with similar increases in mass, underscoring the superior ability of polypeptidecatalyststosynchronizecellularchemistry. B. Ancestralbi-directionalgeneticcodingunderliestheaaRSclassdistinction RodinandOhno(RodinandOhno1995)alignedcodingsequencesofthetwoaaRSClassesinopposite directions revealed highly significant bi-directional coding of the class-defining active-site sequence motifs. Subsequently, it became increasingly apparent that protein-based aaRSs all descended from a singleancestralgenewhosecomplementarystrandsencodedprecursorstotheClassIandClassIIaaRS superfamilies (Carter, et al. 2014; Carter 2015; Martinez, et al. 2015). Bi-directional coding ancestry impliesthatproteinaaRSgeneevolutionbeganwithanearlystageinwhichtheuniqueinformationin onestrandofagenecouldbeinterpretedasadifferentproteinwithasimilarfunctionontheopposite strand.ThreetypesofresultshaveconfirmedpredictionsoftheRodin-Ohnohypothesis: 1) The most highly conserved portions of contemporary aaRSs should correspond to modules in the contemporary enzymes capable of bi-directional alignment, and should retain catalytic activity when extractedfromthefull-lengthgenes.Twosuccessivelevelsofexperimentaldeconstructionconfirmthis prediction.Urzymes(Pham,etal.2007;Pham,etal.2010;Li,etal.2011;Li,etal.2013)have~120-130 aminoacidsandretainallofthetranslationfunctionsofcontemporarysynthetasesandaccelerateamino acidactivationby109-fold,withsignificantspecificity.Adesignedbi-directionalgeneencodes~46amino acidClassIandIIProtozymesthatcontaintheATPbindingsitesoftherespectiveaaRS,bindATPtightly, andaccelerateaminoacidactivation106-fold(Martinez,etal.2015). 2)Codingsequencesshouldretainahigherfrequencyofbase-pairingbetweenmiddlecodonbasesin antiparallel, in-frame alignments of Class I and II aaRS. This middle-base pairing frequency, ~0.34, is significantlynon-randomandincreasesto0.42incomparisonsbetweencodingsequencesreconstructed independentlyforancestralnodesofbothClassIandIIaaRS(Chandrasekaran,etal.2013). 3)Itshouldbepossibletore-constructabonafidebi-directionalgenesuchthateachstrandcodesfora functionalaminoacidactivatingenzymehomologoustoonecontemporaryaaRSsclass.Weconfigured Rosetta to both constrain tertiary structures and impose genetic complementarity to give “designed” ClassIandIIprotozymes(Martinez,etal.2015).Remarkably,allfourwild-typeanddesignedpeptides fromClassIandClassIIhavethesamekcat/KMandaccelerateaminoacidactivationby~106-fold.Wildtypesequenceshave100-foldlowerkcatand100foldhigherKMvaluesthandothedesignedprotozymes fromthecomplementarygene,inkeepingwiththepossibilitythattheirwild-typesequencesmayinclude 9 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. amino acid binding determinants lost in the designed protozymes. The protozymes extend a linear relationship between transition state stabilization free energy and the number of residues of the constructs. Notably, Class I and II constructs exhibit the same slopes and intercepts relating rate accelerationtonumberofresidues(Martinez,etal.2015). Bi-directional,in-framecodingisastrangeidea.Base-pairingispartofaninversionsymmetryoperator thatgeneratesthesequenceand(usinghelicalsymmetryoperators)thestructureoftheoppositestrand. Becausetheoppositestrandsequencecanberetrievedusingthisinversionoperator,agene’sunique informationiscontainedineachstrand.Thatuniqueinformation,however,hastwodifferentfunctional interpretations. Validating (1)-(3) of the Rodin-Ohno hypothesis revealed higher-order symmetries relatingClassIandIIgeneproducts(Carter,etal.2014;Carter2015),asdiscussedin§II.C-§II.E. C. Anancienthypercycle-likeinterdependencerelatescatalyticresiduesineachClass. Active-site amino acids in aaRS occur in three sets of signature sequences (Eriani, et al. 1990; Carter 1993). Class I HIGH and KMSKS sequences and Class II Motifs 1 and 2 are present in the respective urzymes.TheHIGH/Motif2signatureispresentintheprotozymes.Asthesemotifsprovidedtheoriginal evidenceforbi-directionalcoding(RodinandOhno1995),andcontainactive-siteresidues,itcomesas no surprise that the respective active-sites utilize different catalytic residues. In fact, all residues contributingtocatalysisbyClassIactivesitesmustbeactivatedbyClassaaRSII,andconversely,residues neededforClassIIactivitymustbeactivatedbyClassIaaRS(Carter,etal.2014;Carter2015,2017).This functional “anti-homology” dates from the earliest Class I and II catalysts. Interdependence induces a hypercycle-likecouplingbetweenthetwobi-directionalgeneproductssimilartothatproposedbyEigen (Eigen1971;EigenandSchuster1977)tomitigatecompetition,inducecooperationandtherebyincrease theoverallsemi-randomgeneticcontentthatcouldsurvivedeteriorationatanygivencopy-errorrate. D. FoldedClassIandIIAARStertiarystructuresare“insideout”. Binarypatternscodingforproteinsecondarystructures(Kamtekar,etal.1993;Patel,etal.2009)are reflected across complementary coding strands. They are determined by positions of hydrophobic residues (Muñoz and Serrano 1994); the heptapeptide repeat, a-g, with hydrophobic amino acids in positionsa,e,f,isdiagnosticforalphahelix.Alternationofhydrophobicsidechains,especiallywhenthey includesidechainswithbranchedβ-carbonatoms,isalmostaseffectiveapredictorofβ-structure. Solubleglobularproteinshavehydrophobiccoresandwater-solublesurfaces.Thedistributionofamino acids in folded proteins between these two extreme environments is spanned by a two-dimensional “basisset”furnishedbytheexperimentalfreeenergiesoftransferbetweenvaporandcyclohexaneand between water and cyclohexane (Carter and Wolfenden 2015; Wolfenden, et al. 2015). The contemporarygeneticcoderespectsthisdichotomytoanextraordinarydegree,ascodonsforvirtually all core side chains are anticodons for distinctly surface side chains (Zull and Smith 1990). Complementary codons for proline and glycine, most often associated with turns, mean that such 10 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. sequence-directedturnformationalsoreflectsacrosscodesfromantiparallelstrands.Thus,thefolded productsfromabi-directionalgenewilltendtohavecomparablesecondarystructures,withopposite polarities.Bythesecriteria,ClassIandIIaaRSurzymesarebothantiparalleland“insideout”. E. ClassIandIIaaRSaminoacidsubstratespecificities,especiallythosefromancestralcodes, arerelatedbyinversionwithrespecttosidechainsize(Carter,etal.2014;Carter2015). ModernaaRSsprefertheircognateaminoacidsby~5.5kcal/mole,~80%ofwhichcomesfromallosteric influencesofmorerecentlyacquiredmodulesontheurzymeactivities.Lackinginsertion-andanticodonbindingdomains,ClassILeuRSandClassIIHisRSurzymesarerelativelynon-specific(Carter,etal.2014; Carter2015).ExperimentalΔGkcat/KMvaluesshowthattheyhavesimilarandcomplementaryspecificities. LeuRSurzymeprefersClassIsubstrates;HisRSurzymeprefersClassIIsubstrates,bothby~1kcal/mole. TheyarethereforecapableofmakingthecorrectchoicebetweenClassIandIIaminoacidsroughlyfour timesinfive.Thatfidelityistoopromiscuoustosupportmorethan“statisticalensembles”ofpeptides, ashypothesizedbyCarlWoese(Woese1965a,b;Woese,etal.1966).Thus,urzymeswouldhavebeen thepredominantassignmentcatalystswithinamuchbroaderpopulationofmoleculartypes,withthe properties of a “quasispecies-like” cloud as defined by Eigen (Eigen and Schuster 1977) that likely includedmanyspecieswithsimilarspecificity,butlowercatalyticefficiency. TheonlystatisticallysignificantdistinctionbetweenaminoacidsactivatedbyClassIandClassIIaaRSis theirsizes(CarterandWolfenden2015;Wolfenden,etal.2015):ClassIaminoacidsareuniformlylarger thanthosefromClassII.Accountingforthesolventexposureofaminoacidsinfoldedproteinsentails both size and polarity and is therefore two-dimensional (Carter and Wolfenden 2015, 2016). Class II aminoacidsmigratesignificantlytowardwaterinterfacesduringproteinfolding,whereasClassIamino acids migrate toward cores. This differential localization in folded proteins gains significance in the contextofthedistributionofidentityelementsintRNA(§II.F). F. tRNA acceptor stem identity elements represent a code for amino acid side-chain size and otherdescriptorsincludingsidechaincarboxylationandβ-branching. Evidence that the much smaller aaRS urzymic cores could have accelerated the rates of tRNA aminoacylation(Li,etal.2013)nowmakesitincreasinglylikelythatanearly“operationalgeneticcode” (Schimmel, et al. 1993; Schimmel 1996) functioned entirely on the basis of acceptor stem bases that specifiedthemostsignificantdifferencebetweenClassIandClassIIaminoacids.AncestraltRNAsmay havebeenonlyabouthalfthesizeandconsistedofonlytheacceptorandTΨCloopsofmoderntRNAs. Doublingofthisancestralstructurehasbeenproposedtohavecreatedtheanticodonanddihydrouridine loopswiththeanticodoninitiallyservingasaproxyfortheidentityelementsintheacceptorstem(Di Giulio1992;Rodin,etal.1996;DiGiulio2004,2008;RodinandRodin2008).Anysuccessfulmodelfor theemergenceofgeneticcodingfromanRNA-basedsystemofmolecularinformationprocessingshould thusbeconsistentwiththesetwoobservationsaswellaswiththephylogeniesofthetwoaaRSClasses. 11 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. G. ClassI,IIgenes,geneproducts,mechanisms,andspecificitiesaremaximallydifferentiated. An important barrier to the emergence of diversity from quasi-random reproductive processes is the strongtendencyofmutantdaughterspeciestoregresstothecentroidofthedistributionsfromwhich they originate (Eigen, et al. 1988). The centroids behave as “strong attractors”. Inversion symmetries relatingClassIandIIaaRS,describedin§II.B-§IIEsuggestthattheirgenes,geneproducts,functions,and substrates are inherently differentiated to survive successive quasispecies bifurcations necessary for enhancedgeneticcodingtoemergefrompopulationsoflowsequenceidentityandmodestspecificity: 1) Bi-directional coding complementarity means that individual ancestral Class I and II gene sequencesareasdifficultaspossibletointerconvertfromonetotheotherbyserialmutation. 2) DescentoftheClassIandIIaaRSfromabi-directionalgenestabilizestwoquasispeciesthatcan presumablybegintointerpretbinarysequencepatterns,decisivelyovercomingthebarrierposed bythestrongattractionofasinglequasispecies. 3) Reducedpopulationsizeandenhancedfunctionalspecializationofthetwobi-directionallycoded quasispeciessuggestthatfewermutationsarenecessaryforneofunctionalization,successively easingsubsequentbifurcationsduringthebi-directionalcodingregime. 4) Distinct properties of protozymes and urzymes point to successive emergence during the bidirectional coding era of their ATP-, amino-acid- and pyrophosphate-binding sites, consistent withmodularconstructionofbasicaaRSfunctions. 5) Invertedfoldinginstructionsgiveriseto“insideout”ClassIandIItertiarystructuresthatareas differentaspossiblefromoneanother,andthusminimallyvulnerabletomutationsthatmight fusethetwoquasispeciesbyregressiontothecommoncentroid. 6) Catalytic residues in Class I and II aaRS are entirely segregated. Thus, throughout their early evolution,thetwoClassesformedahypercycle-likenetwork(Fig.3).ByargumentsfromEigen andSchuster(Eigen,etal.1988)andWills(Wills2009),theirinterdependencedefendedthem againstcorruptionbymolecularparasitesduringgrowthofcatalyticnetworks. 7) ClassIandIIaminoacidsarethemselvesoptimallyseparatedonthebasisof(i)size,(ii)polarity, andhence(iii)theirultimatedestinationinfoldedproteins. II. Bi-directionalityfurnishesfourpropertiesindispensableforself-organizationofcoding Avoidingmultiplestopcodonsonbothstrandsofabi-directionallycodedancestralgenewouldmandate that each of the four bases have a functionally coded meaning when it occurs as an (internal) codon middlebase(seeforexample(Delarue2007)).Thiswouldimplya(possiblyredundant)alphabetoffour letters.SuchareducedrepertoireisconsistentwiththatexpectedforanancestraltRNAacceptorstem, inkeepingwiththefactthatthecontemporaryacceptorstemcodedistinguishesbestbetween(i)large andsmall,(ii)β-branchedvsunbranched,and(iii)carboxylatevsnon-carboxylatesidechains(Carter andWolfenden2015).Presumably,selectionsubsequentlydroveboththecodeandprimordialcoding 12 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. sequences to capture and employ additional symbolic information for precisely those chemical properties—size and polarity—that determine how the 20 amino acids direct proteins into unique configurations(CarterandWolfenden2016). Bi-directionalcodingofenzymicaaRSimpactsfourpropertiesthatfavormuchmorerapidandefficient evolutionofgeneexpressionthanwouldhavebeenpossibleforribozymalaaRS.Thesepropertiesare developedmoreextensivelyandwithgreatermathematicalrigorinaseparatepaper(WillsandCarter 2017). A. AnysetofaaRSsformsaninterdependentcatalyticnetwork BecausecontemporaryaaRSareproteins,theirownfunctionalstructuresalldependintimatelyonall aaRSfunctionalitiesandsoformhypercycle-likenetworks.Interdependenceimpliesfurtherthatboth programminglanguageintRNAandtheirmRNAprogramsco-evolvedfromsimplerancestorswithfewer distinctionsbetweenthem,andhencelesscomplexinterdependencies.Therefore,theyareexpectedto have something approaching discrete ancestries, and hence successively simpler levels of interdependenceasweapproachtheroot.StructuralvariantsinanyfunctionalaaRSpopulationmust haverespondedcoordinatelythroughouttheirevolutionaryhistorytotwodifferentchemicalsignals— aminoacidandtRNA.Bi-directionalcodingancestrydeeplyanchorssuchinterdependenceintheearliest ancestralquasispecies,asactive-sitecatalyticresiduesinClassIandIIaaRSmustbeactivatedbythe oppositeclass(Carter,etal.2014;Carter2015,2017). B. Reflexivityofprotein-basedassignmentcatalysisofferssuperiorpathstocodebootstrapping andoptimalgenesequences. The aaRS molecular biological interpreters are the first and, probably the only, products of mRNA blueprintsthatcanimplementthetranslationtableembodiedintRNA.Accumulatingreflexivegenetic information—genes whose expression by rules can, in turn, execute those expression rules—is an intrinsicarchitecturalfeatureofthePCWthatisabsentfromanyRCW.Rapidself-organizationofcoding inthePCWisdrivenbyreflexive,in-parallelsensing(Fig.3)oftheaminoacidphasetransferequilibria that drive folding and thus enable aaRS to recognize both the symbolic information in tRNA (i.e., the syntax)andthechemistryofenzymes(i.e.,thesemantics)builtasinterpretationsofmRNAsequence informationwritteninthecodinglanguage(Fig.1C). Theuniversalgeneticcodeisanearlyuniqueselectionfromaninconceivablylargenumberofpossiblecodes and must have been discovered by bootstrapping. It efficiently maps the chemical properties of amino acidsontothesequencespaceoftripletcodons(CarterandWolfenden2016)andisalmostideallyrobust to mutation. Bi-directional ancestry restricted the tiny fraction of the possible codes that share this optimality (Freeland and Hurst 1998; Koonin and Novozhilov 2009) to an even smaller subset by requiring anti-correlated coding of amino acid physical properties (Zull and Smith 1990; Chandrasekaran, et al. 2013). Discovery of such a rare and highly optimized code through random- 13 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. samplingnaturalselectionhasavanishinglysmallprobabilityreminiscentofLevinthal’sparadoxabout proteinfolding(DillandChan1997).Farmorelikelytoproducesucharesultisaseriesoffeedbackaccelerated symmetry-breaking processes, like phase transitions, that could bootstrap the earliest prebiotictranslationsystemintoexistencefromsomeprevious,lesswell-organizedchemistry. Themechanisticimplementation(Fig.3)ofreflexivitymakesitclearthattherequisitesforaccelerating abootstrappeddiscoveryofcodingarebuiltintothePCW,butabsentintheRCW.Weenvisionaminimal, low fidelity instruction set or “boot block” whose realization has been substantially demonstrated ((Martinez,etal.2015);Fig.3),andwhosefeedback-sensitivitycouldimproveitselfbyelaboratingits ownresources,muchlikeinstallinganoperatingsysteminacomputeratstartup.Increasinglyspecific codingassignmentsduringsuccessivetransitionstepscouldtakeholdonlybyconferringnewselective advantage(s)totheevolvinggenes,i.e.mRNAsequences,inwhichtheybecameencoded.Thisway,such asystemcouldexpressnewmeaninginakindofsnowballeffectbeyondthespecificleveloffidelityand complexityalreadyachieved. Thebootstrappingmetaphorcreatessubstantiallynewperspectivesontheemergenceofgeneticcodingby integratinglocalenvironmentalsensingintogeneratingfunction(Fig.4).Codingrulesultimatelyresult fromtheRNAand/orproteinfoldingrulesthatgeneratefunctionalassignmentcatalystsfromsequence. Ribozymal and enzymatic functions are coupled to quite different nano-environmental effects. Specificity,RNAfoldingdependslargelyonbasepairingbecausethefournucleotidebasesareotherwise almostundifferentiated,havingonlytwosizesandsolventphasetransferequilibriathatdifferbyatmost –3.7kcal/moleintheirtransferfreeenergiesfromchloroformtowater(CullisandWolfenden1981). Thecorrespondingphasetransferequilibriaofthe20canonicalaminoacids(RadzickaandWolfenden 1988)exhibitapproximatelyfive-foldgreatervariationsinpolarityand26-foldgreatervariationinsize. These differences together with the dominance of backbone-backbone hydrogen bonding result in profoundlydifferentproteinfoldingrules. Thedifferentialequationsgoverningexpressiondynamics(Fig.4A;(WillsandCarter2017))augment the transcendent difference between coding rules derived in an RCW and in the PCW because the synthesis of protein translatases is autocatalytic (horizontal arrows) in the PCW, but not in an RCW. Coding rules in an RCW must be executed by ribozymes (Fig. 4B). An RCW thus cannot provide the intrinsic self-organization necessary to rapidly refine nanospace protein engineering, the essential advantagetobegainedfromenhancedcodingspecificity. Emergenceofhigher-functioningencodedproteinscannotbetriggeredbyreflexivefeedbackinanRCW whoseexpressionsystemitselfcontainsnoproteins.Moreover,codingrulesbasedonRNAfoldingrules areintrinsicallyinsensitivetoproteinfoldingrulesand/orfunctionality.Bootstrappingofcodedproteins in an RCW would require selecting ever-better ribozymal aaRSs through a slow, indirect Darwinian evolution process that could discover protein folding rules only from non-aaRS protein performance. 14 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. Moreover, ribozymal aaRS variants capable of improved assignments would have to be selected for robustness against mutation in protein-coding genes. The extrinsic self-organization resulting from mutationandhigher-levelselectioninanRCWprovidesnodirectfeedbackprocedurefordiscoveringa translation table that embodies an ordered symbolic encoding of amino acid sidechain chemistry in foldedproteins.Suchorderlinesswouldhavetoprogressivelyproveitsadvantagefortherelevantunit ofselection,presumablyaprotocell. InthePCW(Fig.4C),codingrulesaredeterminedbythecatalyticpropertiesoftheextantpopulationof proteinaaRSs,whosesequencesare,themselves,producedbyrulesactingonthesetofgeneticblueprints (mRNAs) that encode the aaRS rule executors. This reflexivity in the PCW enables stages of selforganizationingeneticcodingtooccurrapidly,essentiallyasdynamicphasetransitions,becausenanoenvironmental sensing (Fig. 3) can couple the coding rules naturally to protein folding rules. AARS tertiarystructures—positioningdistantaminoacidsinprimarystructureclosetooneanotherinspace— asdeterminedbyaminoacidphasetransferequilibria(Fig.3),furnishtheaaRSspecificityrequiredto determine the coding rules. Sensitivity of the code to the phase transfer equilibria of amino acid side chains allows those equilibria to feed directly back onto protein aaRS folding and function, naturally producingarefinedmapofthephaseequilibriathatgovernproteinfoldingandfunctionintheexisting code,viathetRNAidentityelements(Wolfenden,etal.1979;RadzickaandWolfenden1988;Wolfenden, et al. 2015; Carter and Wolfenden 2016). Thus, in the PCW the mechanism for nanoscale control of chemistry,i.e.coding,isdetermineddirectlybyitsoutcome. APCWalsocoordinatesandoptimizesdiscoveryofgenesequencesbyplacingaminoacidswithdifferent properties in different positions in accordance with their effects on a folded protein. To consider aminoacylation functionalities as “assignment catalysis” relevant to coding, the specificity for the relevantaminoacidmustalsohavebecomeassociatedwithaparallelspecificityinchoosingprimitive “codons”inprecursormRNA.Enhancementsthatincorporatednewaminoacidsintotheprogramming languagehadtoco-evolvewithmessagesabletoexploitthem. Thisspecialrelationshipbetweenaminoacidsidechainenergetics,thecodingrules,aaRSsequences,and theirgenes,establishesreflexivityatanevenmorefundamentallevel,acquiringadditionalbootstrapping fromthechemicaleffectsofaminoacidsoccurringinparticularrelationshipstooneanotherinthethree dimensionalarchitectureoffoldedsynthetases(Fig.4C).Inotherwords,aPCWautomaticallypressures an evolving code to discover and refine the partition between amino acids that gives the genetic representation of functional properties best adapted for survival: an error-minimized code in which aminoacidswithsimilarchemicalpropertiesareassignedtosimilarcodons.Thisargumentextendsto everystageofcodeexpansion.Thus,codeevolutioninaPCWwillinevitablytargetbothnear-optimal foldedproteinfunctionalityandanencodingthatrepresentssurvivalfitnessaspreciselyaspossible. 15 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. For these reasons de novo emergence of genetic coding into a peptide/RNA world appears to have introducedsuchoverwhelminginfluenceonthechoiceofcodonsbestabletorepresenttheeffectofan aminoacidenteringthedevelopingecologyinsideafoldingproteinthatitmustbeseenasenormously moreprobablethancodingemerginginanRNAWorld. C. Fidelity:AnysimplePCWtakingoveramoresophisticatedribozymalcodingwillincreasethe overallerrorrate,degradefitness,andhencebeeliminatedbypurifyingselection. The PCW is rooted in phylogenetically-based ancestors capable only of the simplest coding assignments—perhapsoneoratmosttwobits—andconsequentlyalsoinacodingsystemnecessarily operatingathigherrorrates.Reducingerrorratesinbothreplicationandtranslationmustcertainlyhave requiredlargeralphabets.Tobeselected,thefunctionalityofsuchprimordialcodingmustalreadyhave exceeded that of whatever preceded it. Its simplicity appears to rule out scenarios involving proteins “takingover”catalyticfunctionsfromanypre-existingworldofsophisticatedRNAcatalysts.Forclarity, wehenceforthrefertoexecutorsofassignmentcatalysisasRNAorprotein“translatases”,todistinguish themfromcontemporaryaaRS. Any coding system depends on the maintenance of a population of templates that either specify the sequencesofribozymalaaRSsorencodethesequencesofproteinaaRSs.InanRCWallsuchtemplates arerequiredsomehowtosurvive,essentiallyasparasites,inaworldofRNAreplicators.Aribozymal coding system, consisting of only ribozymal translatase species, could be functionally autonomous. However, the attractor state of a hybrid ribozymal/protein aaRS system is one in which the protein population also contributes to the overall rate of translation of any genetic template, and more importantly,toitsoverallerrorrate. Aseparatepaper(WillsandCarter2017)treatsthisprobleminanextensionofearliermathematical modelsofcodingself-organization(Bedian1982;Wills1993;Wills1994;Bedian2001;Wills2004)by consideringthedynamicstabilityofco-existingribozyme-andprotein-operatedassignmentcatalysts. Weconfirmanalyticallytheintuitiveconclusionthattranslationerrorswouldinevitablybehigherfor anyhybridcodingsituationdrivensimultaneouslybyseparateribozymalandproteintranslatasesthan theywouldbeforanoptimizedsystemwithonlyonetypeofaaRS.Ifbothtypesoftranslataseseffect codon-to-aminoacidassignmentsatdifferentcharacteristicratesandaccuraciesthehybridsystemwill necessarilyoperateatintermediateerrorrates.AsEq.(25)of(WillsandCarter2017)makesabundantly clear,introducinganysignificantpopulationofproteintranslatasesintrinsicallylessaccuratethanan extant ribozymal coding apparatus will undermine the role of the ribozymal translatases, possibly threatening the protein domain with extinction as the selective advantage of ribozymal translatases, indirectlyconferredbyproteinfunctionality,isdiminished.Theproblemwillbeextremeinthepresence ofrudimentaryancestralproteinaaRSsthatoperatealowdimensionaltranslationtable.Theaccuracy 16 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. oftheproteintranslataseswouldthenbenecessarilybeverymuchlessthanthatoftheextantribozymal population,makingsurvivalofproteinsdependentontheeliminationoftheproteintranslatases. Eitherway,theonlypathtocurrentmolecularbiologythusappearstorequireproteinaaRSgenesto emergeinconcertwithotheressentialencodedproteingenes.Thatrequirementhighlightstheproblems arising from coordinating inheritance with gene expression. We therefore turn our attention to the dynamicsoftemplatereplicationanditseffectontheevolutionoftranslation. Mixed ribozymal and enzymatic protein replicases. Copying of genetic information lies at the heart of Darwinianevolution.ConsidertheadventofaproteinreplicaseinafunctionalRCWinwhichrelatively sophisticated and accurate information copying has evolved through selection of a general ribozymal replicase.IntroducingaproteinreplicaseinanRCWgeneratesthesameproblemjustdescribedforthe adventofproteintranslatases.Anyproteinreplicaselessaccuratethantheribozymalreplicase,whichis tobeexpectedforthefirstsuchproteinstoemergeinanRCW,woulddiminishtheprobabilityofcorrectly copyingallgenes,includingthatcodingfortheribozymalreplicase.Sincethesystemevolutionhasbeen optimizedundertheconstraintoftheribozymalreplicase’sperformance,thesystemwillbeatriskofan errorcatastropheunlessselectionpurgesitoftheemergentproteinreplicase. TheevidentsimplicityoftheearliestcodingapparatusinthePCWposesaninsuperablebarriertotakeover of any more sophisticated coding apparatus in an RCW. Newly emerging protein-based assignment catalysts must have been far less specific than the pre-existing ribozymal assignment catalysts envisioned,forexample,byWolfandKoonin(2007),andcannothavebeenselectedwithinanadvanced RCWbecausetheirveryrudimentaryfunctionalitywouldcorruptanypre-existingribozymaltranslation systemofhigherspecificityanddiversity.NohybridsetofproteinandribozymalaaRSand/orreplicases canhavesuperiorfitnesstothoseofapre-existingRCW,asreinforcedbythefollowingconsiderations: (i)Themoresophisticatedthepre-existingRCW,theharderitwouldhavebeenforearlystagesofPCW code development to compete. Conversely, the detailed inversion symmetries arising from bidirectionallycodedgene(§I)allpointtotheirkeyroleinenforcingdifferentiationearlyintheevolution ofthegeneticcode,whenitwasmostvulnerabletoparasiteswithincorrectspecificities. (ii)ThedramaticrateaccelerationbyaaRSprotozymesontheotherhand,representsadecisiveselective advantageinapepide•RNAworld,firstbyharnessingthechemicalfreeenergytransferofNTPutilization andthenbyprovidingaflowofactivatedaminoacids. (iii) RNA sequences destined to evolve into genes once an accurate translation system had evolved wouldhavehadnoobviousselectiveadvantageunlesstheemergentPCWcodewaspracticallyidentical tothatoperatingintheRCW. Thus,evenwereanRCWtohaveexisted,itwouldbeirrelevanttocontemporarybiologyifthePCWhad torecapitulatetheentiregenesisofthecode.Nor,ofcourse,doesanyevidenceremainofsuchribozymal aminoacidactivatingcatalysts,or,indeedofribozymalpolymerases.Finally,ifthebranchingphylogenies 17 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. of protein aaRS provided opportunity for self-organising quasispecies bifurcations, and their evident reflexivity greatly accelerated the search for an optimal code, then, an extensive phase of ribozymal proteinsynthesisnolongerfillsanytheoreticaldeficiencyinaccountingforthegeneticcode.Thus,itis ourviewthatnaturedidnotre-inventits“operatingsystem”(Bowman,etal.2015). D. Efficiency:avoidingdissipationsleadingtoEigenandOrgelerrorcatastrophes. The bootstrapping requirement (§II.B) and the instability of hybrid coding assignment systems with substantiallydifferenterrorrates(§II.C)mayreflectinherentlycomplementaryargumentsforefficient coupling between self-organization of information storage (replication) and readout (translation). Progressivemutationallossofreflexivityleadstoprogressiveincreasesinthecodingerrorrate(Wills 1994),resultinginthedissipationoffreeenergyflowsandultimatelyinwhathavebeencalled“error catastrophes”. Error rates impede self-organization at multiple levels. We examine here the possible couplingbetweenerrorgenerationduringreplicationandtranslation. Studies of gene-replicase-translatase (GRT) systems reveal gene replication and coded expression are interdependent. Are both self-organizational processes so strongly coupled that they emerged simultaneously? Such coupling is not only possible (Smith, et al. 2014) but it occurs spontaneously (Fü chslinandMcCaskill2001;Markowitz,etal.2006;Wills,etal.2015).GRTsystemsareintrinsically spatiallyself-organizing,andunlikethehypotheticalRCWnoextrinsic,higherlevelunitsofselection— i.e.compartmentation—arerequiredtoassuretheirsurvival. LivingsystemsnowproduceproteinsfrominformationencodedingenesusingaaRStranslataseswhose genesarecopiedusingproteinpolymerases.ThedynamicsoftheRNAdomainsofthePCWandRCW (Fig.4A;(WillsandCarter2017))makeitevidentthatgeneandproteinproductioninthePCWaretightly coupledthroughthepopulationvariablesthatrepresentthegenesandreplicaseenzyme.Furthermore, translatasesintheproteindomainarecooperativelyautocatalytic. Incontrast,eventsintheRCWproteindomainhavenoeffectonthevalueofanyRNAdomainvariable, sothedynamicsofreplicationandcatalyzedcodingassignmentsarecompletelyautonomousintheRNA domainoftheRCW.Moreover,theproteindomainisutterlydependentontheRNAdomainthroughthe variablesthatrepresentthepopulationsofencodinggenesandtheaccuracyoftheribozymaltranslatase population.Itishardtoenvisagehowanyselectionpressurethatproteinsmightexertonthesequences ofnucleicacidsinanRNAWorldcouldmoldarefined,chemicallyorderedsystemofgeneticcoding. Ontheotherhand,overcomingthedoubleriskofEigen-andOrgel-likeerrorcatastrophesininformation storageandreadoutimplicitinhighlycoupledmolecularbiologicalsystemsseemsequallyimpossible, until one takes account of the fact that natural selection is a self-organizing force that staves off the potential error catastrophe that threatens information storage (Eigen 1971). Likewise, coding selforganization(Wills1993)stavesoffthepotentialerrorcatastropheintranslation(Orgel1963).Neither systemcanbeexpectedtooperateunlesseachlimitsdeleteriouseffectsoftheerrorrateoftheother. 18 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. Just as power transfer in dissipative electronic structures is optimal if input and output impedances match,somolecularbiologicalorganizationobservedinlife’sinformationalsystemsmayhaveevolved most efficiently by matching improvements in error rates for nucleic acid replication and protein synthesisatsuccessivedevelopmentstages.Ifso,geneexpressionandreplicationbyfunctionalprotein replicases could not have emerged efficiently from a world in which either function was already performedatahigherlevelbyribozymes. Directbootstrappingofgeneticinformationandencodedfunctionalproteinsisfarmoreplausiblethan anyscenarioinwhichtherewasaninitialRNAWorldbythreecriteria—reflexivefeedback,degraded specificityinhybridsystems,andtheneedtomatchthecomplexityofcodingtothatofproteinfunction. Significanttracesinthestructuralbiologyofthecontemporaryaminoacyl-tRNAsynthetasestherefore suggestthattheevolutionofgeneticcodingproceededbypreciselysuchasequenceofphasetransitions, eachentailingbifurcationsoftheinformation-processingcharacteristicofthepreviousstage. Impedancematchingarguesforcoevolutionofreplicationandtranslation.Fig.5illustratesamechanism couplingbiologicalinformationstorageandreadout(dashedlinesinFig.4A;(WillsandCarter2017).We conjecturethatprogressiveincreasesinthedimensionofthecodontable,neff,enhancecodingevolution efficiencybymatchingnoiseingeneticinformationmaintenance(replicationerrorsandquasi-neutral driftinsequencespace)tothatfromthetranslationerrorrate.Toparaphrasefromarecentdefinitionof “information impedance matching” of information sources to receivers in a different context (Martin 2005),readingoutgeneticinformationwithaslittledissipationaspossiblerequiresreadoutmachinery withapproximatelythelevelofnoisepresentintheinformationsources.Iferrorsineitherprocessare eithertoohighortoolow,thesystemwilldissipateenergyunnecessarily,reducingthereadoutefficiency. Inotherwords,atanyevolutionarystageofdevelopmentsinmolecularbiology,theselectiveeffectof the“replicases”andthefidelityofthe“translatases”(andanyassociatedaccessories)needtolimitnoise tocomparablelevelsinordertooptimizetheefficiencyofinformationtransferatthatstage. The notion of impedance-matching is well-established physics. Our heuristic reference to it here is supportedbythefollowingobservations.Errorratesappeartobeavalidmetricforemergingbiological complexity over quite large timescales (Lewis, et al. 2016). Specific aspects of that work support this view:(i)MichaelisMentenparametersfortheLeuRSandHisRS2urzymes(Carter,etal.2014;Carter 2015) suggest that, whereas they are quite impressive catalysts, their specificities for cognate amino acidsarewellbelowthosenecessarytostabilizepopulationsoffull-lengthaaRS,whichhavemuchhigher fidelities.(ii)StructuralstudiesoftheTrpRSurzymeshowthatitshighrateaccelerationarisesfromwhat appearstobeamoltenglobularensemble(Sapienza,etal.2016).Inotherwords,itisalesscomplex molecule—in a higher entropy state—than a properly folded protein. (iii) The million-fold rate accelerationsofbothwildtypeanddesignedClassIandIIprotozymes(Martinez,etal.2015)suggest thatthemanifoldofcatalyticallycompetentpolypeptidesisfarlargerthanpreviouslythoughtpossible. 19 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. (iv) Presumptive error rates for the aaRS constructs therefore exhibit a monotonic decline with increasingmass,andbyimplication,increasingcomplexity. Moreover,errorsquiteliterally(Gladstone2016)slowtheaccumulationofinformationandhencethe growth of complexity in many situations. Thus, although the notion of impedance-matching requires furtherdevelopmentinevolutionarytheory,includingincorporatingmetricsof“efficiency”,itappears that natural selection and self-organization provide efficient coupling between information storage (replication)andinformationreadout(translation),asifthetwoprocesseswereimpedancematched. III. ScenariosforearlyaaRSspeciationandco-evolutionofreplicationandreadout PhylogeneticancestriesofcontemporaryClassIandIIaaRSprojectconvincinglybacktoasinglegene. The simplicity of such a gene furnishes a conceptually consistent “boot block” (Fig. 3) substantially reducingthechallengeofunderstandinghowgeneticcodingmighthaveemergedfromapeptide/RNA partnership.Moreover,thedetailedinversionsymmetrieshelptoexplainhowsuchagenewouldenforce theinitialdifferentiationnecessarytobreakthepowerfulforcesthatmakequasispeciescentroidsstrong attractors,substantiallystrengtheningargumentsthatnogeneticcodecouldhaveprecededtheearliest codedproteinaaRS.Dual-codinggeneticquasispeciesexemplifiedexperimentallybytheprotozymegene described by Martinez, et. al. (2015) and the urzyme gene proposed by Pham, et al (2007) are thus presumptiveancestorstobothClassIandIIaaRSsuperfamiliesandtheuniversalgeneticcodeitself. A. WhydoestablishedproteinphylogeniessuggestlateaaRSspeciation? The strongest argument that an RCW preceded the emergence of proteins is that multiple sequence alignments of contemporary protein families (Aravind, et al. 2002; Leipe, et al. 2002; Koonin and Novozhilov2009;Koonin2011),suggestthataaRSdivergedlateinthesuccessionofproteinfunctions. However,takeoverofaribozyme-basedcomputationaltranslationprocessmustleadinaplausibleway to,theobservedphylogenyofcontemporaryaaRSsuperfamilies. WebelievetheconclusionthataaRSsdevelopedaftertheadventoffullyfunctionalproteinsbasedonan alphabetof20aminoacidsrestsontwoquestionablephylogenicassumptions:(i)thatdomains(~250 aminoacids)arethebasicunitofremoteproteinevolutionaryhistory,and(ii)thattheevolutionofClass I and II aaRS proceeded from independent ancestries. The former assumption fails to account appropriatelyforthehighlymosaicnatureofcontemporaryproteins(Pham,etal.2010;Li,etal.2011). Thelatterignoresthebi-directionalcodingancestryofClassIandIIaaRSurzymesandprotozymes,for whichexperimentalevidenceisnowexceptionallystrong(Pham,etal.2010;Li,etal.2011;Li,etal.2013; Carter2014;Carter,etal.2014;Carter2015;Martinez,etal.2015;Carter2016,2017). The low fidelity of aaRS urzymes implies that they represent an important, but early stage in the evolutionofcomplexityandhencethatdeepphylogeniesbasedonaligningintactcontemporaryaaRS sequences(Aravind,etal.1998;Wolf,etal.1999;Leipe,etal.2002;WolfandKoonin2007)areprobably misleading,especiallyinthecaseofthepre-LUCAheritageoftheaaRSsthemselves(Wolf,etal.1999; 20 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. WolfandKoonin2007).Notably,neitherdomaindatabase(SCOP(Murzin,etal.1995;Andreeva,etal. 2008);CATH(Pearl,etal.2003))hasbeencompiledatsufficientlyhighresolutiontoidentifytheClassI and II urzymes as ancestral forms. Large insertions within the catalytic domains of aaRS were likely accumulatedsegmentally,fromexogenousgeneticmoduleswiththeirownpreviousancestry(Pham,et al.2007),subsequenttotheirinitialevolutionaryspeciation.Suchmosaicityinthemultiplesequence alignments, akin to horizontal gene transfer (Leipe, et al. 2004; Soucy, et al. 2015) albeit in shorter segments than those considered by Wolf, et al. (1999), could obscure deeper ancestral evolutionary trajectoriesinvolvingtheurzymes. Fig. 6 illustratesan alternative phylogeny of Class I aaRS that resolves what we feel to be a mistaken conclusionthattheClassIaaRSdivergedlateintheevolutionoftheproteome.TheschemetracesClass IandIIaaRSancestriesfromasinglegenebytwodistinctprocesses—speciationofthebi-directional gene(I)andstrandspecializationtotranscenditslimitations(II).Itfurnishesasatisfactoryaccountof the increase in structural multiplexing and independent parallel evolution of insertion elements and anticodon-binding domains during a period in which protein synthesis operated with a gradually increasing alphabet size that ultimately required development of editing domains (III) to achieve the requisitefidelityofthecontemporaryproteome. B. Aplausiblescenarioforco-evolutionofinformationstorageandgeneexpression. Wehighlighthowconclusionsfrom§IIchangehowwethinktranslationmighthaveemerged,andoutline aplausiblescenariofortheco-emergenceofinformationstorageandreadout.Ourscenarioiscompatible witharough“impedancematching”inwhichhighnoiseinitiallypermitsco-optionofquiteunrefined functionalities that comprise groupings of related effects averaged over large but separate regions of sequencespace.Noiseisgraduallybroughtundercontrolbyrefiningfunctionalitieswithdistinguishable specificities and selection of genes encoding them, enabling structural diversity and complexity to developsimultaneouslywithincreasesinthedimensionofthecodontable(Fig.5).Althoughaspectsof thisscenarioresemblepreviouslyoutlinedmarginalscenarios(Martin2005), itsscope,continuity,and itslogical,experimental,andphylogeneticsupportareassembledhereforthefirsttime. Theoriginofcontemporarytranslationwasmostlikelyanintimateco-evolutionaryprocessinvolving both polymer classes (Carter and Kraut 1974a; Carter 1975). The chief arguments expressed in the following two sections must remain hypotheses until experimental investigation, perhaps guided by ideaswehaveexpressed,canconvincinglyestablishorrulethemout.Ourrecentuseofproteindesign andmodularengineeringintheexperimentalcolonizationofthevoidthatpreviouslyexistedbetween pre-biotic organic chemistry (Patel, et al. 2015; Sutherland 2015) and the Last Universal Common Ancestor (Forterre, et al. 2005; Wong 2005; Xue, et al. 2005; Fournier, et al. 2011; Fournier and Alm 2015;Wong,etal.2016)arguesthatsuchexperimentationcannowbefruitfulonalargerscale. 21 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. Argumentsdevelopedin§II.Dimplythatreplicationandtranslationarenecessarilymoretightlycoupled thanisenvisionedintheRNAWorldhypothesisbytheneedforinformationalimpedance-matching.The overridingchallengeassociatedwiththeemergenceofthegeneticcodeistodevelopascenarioinwhich prebioticchemistryproducesbiologythroughcooperationbetweennucleicacidsandproteins(ortheir precursors),reflexively,inimprovingbothinheritanceandfunction.Detailsofsuchascenarioliebeyond currentresources;itisneverthelessappropriatetooutlineaspectsthatpointtowardfurtherwork. Soon after discovery of catalytic RNA, structural complementarities were identified between extended polypeptidesecondarystructuresandnucleicacids(CarterandKraut1974b;Carter1975;Church,etal. 1977; Warrant and Kim 1978). The short lengths of polymers required to form such complexes—6-8 aminoacidsandlessthanhalfaturnofRNAdoublehelix,suggestedtheymighthavebeenmorestableif their polypeptide and polynucleotide components formed hairpins (Berezovsky, et al. 2000). Their stabilityascomplexesappearedtodependlargelyontheircomplementaryvanderWaalssurfaces. Stereochemicallytemplatedcross-catalysisplausiblyaccountedforthesimultaneousappearanceofcoding andcatalysis.HelixradiiofRNAanddouble-strandedextendedpeptidesproducedoptimalvanderWaals contactsbetweenthetwocomponentsatpreciselytheintegral,indefinitelyrepeatingstoichiometryof two amino acids per base (Carter and Kraut 1974b; Carter 1975). This, coincidental, integral stoichiometry enabled a putative rudimentary stereochemical coding. Moreover, specific hydrogen bondingbetweencarboxylgroupsofantiparallelβ-polypeptidedoublehelicesandRNA2’OHgroups orientedthe3’OHgroupasalikelynucleophilesuggestingthat,inaddition,thepolarinteractionsmight exhibittemplatedcross-catalysis,onepolymeracceleratingtheelongationoftheother. Successive inverted repeats of complementary polypeptide•polynucleotide complexes increased their lengthsfrom~12to~23to~46aminoacidsandfrom~3to~6to~12basepairs.Peptidesequencescapable ofaminoacidactivationwouldhavebeenaplausibleconsequence,forwhichwehaveonlycircumstantial evidence.Peptidesatleast46aminoacidsproducedbystereochemicalcodingbasedoncomplementary van der Waals surfaces of peptide and RNA backbones might already have begun to exhibit ATP dependent carboxyl group activation, potentiating assembly of peptides. Ligation might then have assembledthefirstprotogenesandaproto-ribosome.Partialcomplementarityofthe5’and3’-terminal halvesoftheClassIprotozymegene(Carter2015)suggestscodingofthebi-directionalprotozymegene by an ancestral RNA hairpin. This reasoning suggests that polypeptide catalytic activities could have precededevenrudimentarygeneticcoding(Kamtekar,etal.1993;Moffet,etal.2003;Patel,etal.2009). The wobble effect (Crick 1966) implies that bi-directional coding would have required a triplet code to providemorethan4codons.Weencounterhereasubstantivebrokensymmetry.Theprotozymegene (138bases)is6-foldlongerthanwhateverputativeRNAhairpinmighthavebeenassociatedwiththe earliest~46-residuepeptidesarisingviastereochemicalcoding.Assumingthatsuchasystemcouldhave sustainedreproductionneverthelessleavesuswithasix-foldgapbetweentherelativestoichiometries 22 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. oftemplatedcrosscatalysisandthefirsttruegeneexpression.Transitionsfromaninitialstateinwhich proteinsynthesisisinitiatedwithoutinformation-bearinggenetictemplatesisenvisionedinthetheory ofcodingself-organizationandGRTsystems(Bedian1982;Eigen,etal.1988;Wills1993).Whatsortof continuitymighthaveconnectedanearlierdirect,stereochemicalcodingtoindirect,symboliccodingby introducingmessengerRNAandtheuseofadaptorstogivethemessagesmeaning? Theearliestindirectcodingemulatedthedirectstereochemicalcodingarisingfromcomplementaryvander WaalssurfacesofpeptideandRNAbackbones.AnalysisofaaRSrecognitionelementsintRNAacceptor stem and anticodon bases highlighted the capacity of tRNA acceptor stems to encode the size and bcarbonbranching,butnotthehydrophobicityofaminoacids(CarterandWolfenden2015,2016).These propertiesarenecessaryandsufficienttoencodepeptideswiththemostimportantcharacteristics—βbranchedsidechainsfavoringextendedβ-structureandalternatingsmall/largesidechainsallowingvan derWaalsaccessononeface—forassumingstructurescomplementarytotheRNAminorgroove(Carter and Kraut 1974b; Carter 1975). Symbolic coding by the tRNA acceptor stem could therefore have implementedpreciselythosefeaturesnecessarytopreservemolecularmechanismsthatsustaineddirect stereochemicalcoding.Aselectiveadvantageofthatsymbolicrepresentationintheproto-tRNAacceptor stemwouldhavebeenthatitsmoothedthetransitionbetweendifferentstoichiometries—twoamino acidsperbasevs.threebasesperaminoacid—necessarytoimplementsymboliccoding. The ancestral bi-directional gene produced two amino acid activating enzymes, Class I with a modest specificityforlargeraminoacids,ClassIIwithasimilarspecificityforsmalleraminoacids,inkeepingwith thecontemporaryspecificitiesofClassIandIIaaRSandurzymes.Itisobviouslyofinteresttodetermine howlimitedanaminoacidalphabetisconsistentwithcatalyticactivityofsuchprotozymegenes.Extant experimentalresults,however,showonlythatbyutilizingthefullgeneticcodethetwogeneproducts createdfromoppositestrandscanbothaccelerateaminoacidactivation~106-fold.TheClassIprotozyme possessesaconsensusphosphatebindingsitesite(Hol,etal.1978),suggestingthatitscatalyticactivity mayarisefrombackboneconfigurations,andnotdependentirelyon“catalyticresidues”. TheearliestcatalystsofaminoacylationmayhavecombinedancestralaaRSwithribozymes(Turk,etal. 2010; Turk, et al. 2011). It has not been established whether or not the protozymes might also have acceleratedtRNAchargingwiththeactivatedaminoacidproducts.tRNAacceptorstemIDelementslikely composedtheearliestconnectionbetweenaminoacylatedRNAsandagenesequence(Schimmel,etal. 1993; Rodin, et al. 1996; Henderson and Schimmel 1997; Rodin and Rodin 2008; Rodin, et al. 2009; Rodin, et al. 2011). Dependence of aaRS tRNA affinity on acquiring an additional, anticodon-binding domain suggests that, in contrast to amino acid activation, aminoacylation may have originated in polypeptide•RNAcollaborationandwaslatertakenoverbyurzymeswiththerudimentarycapabilityto recognizetRNAacceptorstems(Li,etal.2013). CONCLUSIONS 23 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The continuing search for ever better RNA replicases (Wochner, et al. 2011; Attwater, et al. 2013; SczepanskiandJoyce2014;Taylor,etal.2015;HorningandJoyce2016)hasbeennomeanachievement. However,wearguehereforamoreholisticandambitioussetofgoalsthanthosefuelingthatsearch, anticipating that data and theory in §I, §II, and (Wills and Carter 2017) will stimulate discussion and furtherresearchonquestionsrelevanttotheoriginsofgeneexpression,biology’sreadoutmechanism. Ahighdegreeofcoherenceconnectstheoriesofself-organizationtotheexperimental,structural,and phylogeneticaspectsoftheevolutionoftheaaRSenzymesthatimplementgeneexpressiontoday. 1) Pronouncedinversionsymmetriesintheaminoacidsubstrates,catalyticresidues,tertiary,and secondary structures are evident in phylogenetic, structural, and biochemical data for contemporaryClassIandIIaaRSandarisefromtheirbi-directionallycodingancestry. 2) Inversionsymmetriesassuremaximalstructuralandfunctionaldifferentiationbetweenthetwo classes,anecessarypreconditionfortheirsurvivalincompetitionwithparasiticmolecularforms. 3) tRNAidentityelementsthatimplementcodingefficientlycapturetheaminoacidphaseequilibria thatdriveproteinfoldingandareoptimalforbi-directionalcoding(ZullandSmith1990). 4) Bi-directionalcodingandtheuniquenessofthecodingtablecreateareflexivefeed-backcycleto guiderapidevolutionaryemergenceofproteinaaRSgenesbybootstrappingrapidlytoanoptimal codingtableandmRNAsequences.Ribozymalassignmentcatalystslacksuchreflexivity. 5) Hybrid system expression dynamics show that any emerging PCW with a lower-dimensional codingtablethanthatofapre-existingRCWwouldnecessarilyhavebeeneliminatedbypurifying selectionbeforeithadsufficienttimetoexpandthedimensionofitscodingtable. 6) Coupling of dynamic equations for gene-translatase-replicase (GRT) systems suggest that matchingoferrorratesmaximizedtheprobabilityoflaunchingreplicationandtranslation. 7) (1)-(6) imply that replication and readout emerged simultaneously from a peptide•RNA partnership.WeoutlineamoreprobablescenariothananRNAworldfortheoriginofbiology. 8) Molecularconstructs(§I)enhancetheabilitytotestspecificelementsofproposedscenarios. Acknowledgments ThisworkwassupportedbyTheNationalInstituteofGeneralMedicalSciences,(grantnumbersR0178227&R01-90406toC.W.C.,Jr.).Thispublicationwasmadepossiblealsothroughthesupportofagrant from the John Templeton Foundation. The opinions expressed in this publication are those of the author(s)anddonotnecessarilyreflecttheviewsoftheJohnTempletonFoundation.H. Fried (Cursor Scientific Editing and Writing, LLC)mademanyusefulsuggestionsonanearlierdraft. 24 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. REFERENCES AndreevaA,HoworthD,ChandoniaJ-M,BrennerSE,HubbardTJP,ChothiaC,MurzinAG.2008.Data growthanditsimpactontheSCOPdatabase:newdevelopments.Nucl.AcidsRes.36:D419-D425. AravindL,AnantharamanV,KooninEV.2002.MonophylyofClassIAminoacyltRNASynthetase, USPA,ETFP,Photolyase,andPP-ATPaseNucleotide-BindingDomains:ImplicationforProtein EvolutionintheRNAWorld.PROTEINS:Structure,Function,andGenetics48:1–14. AravindL,LeipeDD,KooninEV.1998.Toprim—aconservedcatalyticdomainintypeIAandII topoisomerases,DnaG-typeprimases,OLDfamilynucleasesandRecRproteins.NucleicAcids Research26:4205–4213. AttwaterJ,WochnerA,HolligerP.2013.In-iceevolutionofRNApolymeraseribozymeactivity. NatureChemistry5:1101-1018. BedianV.1982.Thepossibleroleofassignmentcatalystsintheoriginofthegeneticcode..Orig. Life12:181–204. BedianV.2001.Self-descriptionandtheoriginofthegeneticcode.Biosystems60:39–47. BerezovskyIN,GrosbergAY,TrifonovEN.2000.Closedloopsofnearlystandardsize:common basicelementofproteinstructure.FEBSLetters466:283-286. BergP,OfengandEJ.1958.AnEnzymaticMechanismforLinkingAminoAcidstoRNA. ProcNatAcadSciUSA44:78-85. BernhardtHS.2012.TheRNAworldhypothesis:theworsttheoryoftheearlyevolutionoflife (exceptforalltheothers).BiologyDirect7:23. BowmanJC,HudNV,WilliamsLD.2015.TheRibosomeChallengetotheRNAWorld.Journalof MolecularEvolution80:143-161. BreakerRR.2012.RiboswitchesandtheRNAWorld.ColdSpringHarbPerspectBiol4:a003566. Caetano-AnollésD,Caetano-AnollésG.2016.PiecemealBuildupoftheGeneticCode,Ribosomes, andGenomesfromPrimordialtRNABuildingBlocks.Life6:43. Caetano-AnollesG,KimHS,MittenthalJE.2007.Theoriginofmodernmetabolicnetworksinferred fromphylogenomicanalysisofproteinarchitecture.ProceedingsoftheNationalAcademyof Sciences,USA1049358–9363. Caetano-AnollésG,WangM,Caetano-AnollésD.2013.StructuralPhylogenomicsRetrodictsthe OriginoftheGeneticCodeandUncoverstheEvolutionaryImpactofProteinFlexibility.PLoSONE 8:e72225. CarterCW,Jr.2015.WhatRNAWorld?WhyaPeptide/RNAPartnershipMeritsRenewed ExperimentalAttention.Life5:294-320. CarterCW,Jr,KrautJ.1974a.AProposedModelforInteractionofPolypeptideswithRNA. ProcNatAcadSciUSA71:283-287. CarterCW,Jr.2016.AnAlternativetotheRNAWorld.NaturalHistory125:28-33. CarterCW,Jr.2017.CodingofClassIandIIaminoacyl-tRNAsynthetases.ProteinReviewsInPress. CarterCW,Jr.1993.CognitionMechanismandEvolutionaryRelationshipsinAminoacyl-tRNA Synthetases.AnnualReviewofBiochemistry62:715-748. CarterCW,Jr.1975.CradlesforMolecularEvolution.NewScientistMarch27:784-787. CarterCW,Jr.2014.Urzymology:ExperimentalAccesstoaKeyTransitionintheAppearanceof Enzymes.J.Biol.Chem.289:30213–30220. CarterCW,Jr.,LiL,WeinrebV,CollierM,Gonzales-RiveraK,Jimenez-RodriguezM,ErdoganO, ChandrasekharanSN.2014.TheRodin-OhnoHypothesisThatTwoEnzymeSuperfamilies DescendedfromOneAncestralGene:AnUnlikelyScenariofortheOriginsofTranslationThatWill NotBeDismissed.BiologyDirect9:11. 25 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. CarterCW,Jr.,WolfendenR.2016.Acceptor-stemandanticodonbasesembedaminoacidchemistry intotRNA.RNABiology13:145–151. CarterCW,Jr.,WolfendenR.2015.tRNAAcceptor-StemandAnticodonBasesFormIndependent CodesRelatedtoProteinFolding.Proc.Nat.Acad.Sci.USA1127489-7494. CarterCW,Jr.,,KrautJ.1974b.AProposedModelforInteractionofPolypeptideswithRNA. ProceedingsoftheNationalAcademyofSciences,USA71:283-287. CechTR.1986.TheInterveningSequenceRNAofTetrahymenaisanEnzyme.ScientificAmerican 255:64-75. ChandrasekaranSN,YardimciG,ErdoganO,RoachJM,CarterCW,Jr.2013.StatisticalEvaluationof theRodin-OhnoHypothesis:Sense/AntisenseCodingofAncestralClassIandIIAminoacyl-tRNA Synthetases.MolecularBiologyandEvolution30:1588-1604. ChurchGM,SussmanJL,KimSH.1977.SecondarystructuralcomplementaritybetweenDNAand proteins.ProceedingsoftheNationalAcademyofSciences,USA74:1458-1462. CrickFHC.1966.Codon-AnticodonPairing:TheWobbleHypothesis.JournalofMolecularBiology 19:548-555. CrickFHC.1955.OnDegenerateTemplatesandtheAdaptorHypothesis.Unpublished; https://profiles.nlm.nih.gov/ps/retrieve/Narrative/SC/p-nid/153. CrickFHC.1968.TheOriginoftheGeneticCode.JournalofMolecularBiology38:367-379. CullisPM,WolfendenR.1981.AffinitiesofNucleicAcidBasesforSolventWater?Biochemistry 20:3024-3028. CusackS.1994.EvolutionaryImplications.NatureStructuralandMolecularBiology1:760. CusackS,Berthet-ColominasC,HärtleinM,NassarN,LebermanR.1990.Asecondclassof synthetasestructurerevealedbyX-rayanalysisofEscherichiacoliseryl-tRNAsynthetaseat2.5Å. Nature347:249-255. DelarueM.2007.Anasymmetricunderlyingruleintheassignmentofcodons:Possiblecluetoa quickearlyevolutionofthegeneticcodeviasuccessivebinarychoices.RNA13:1-9. DiGiulioM.1992.OntheOriginoftheTransferRNAMolecule.J.Theor.Biol.159:199-214. DiGiulioM.2004.TheoriginofthetRNAmolecule:implicationsfortheoriginofproteinsynthesis. JournalofTheoreticalBiology226:89–93. DiGiulioM.2008.TransferRNAgenesinpiecesareanancestralcharacter.EMBOReports9:820. DillK,ChanHS.1997.FromLevinthaltopathwaystofunnels.NatureStructuralBiology4:10-19. EigenM.1971.SelforganizationofMatterandtheEvolutionofBiologicalMacromolecules. Naturwissenschaften58:465-523. EigenM,McCaskillJS,SchusterP.1988.MolecularQuasi-Species.J.Phys.Chem.92:6881-6891. EigenM,SchusterP.1977.TheHypercyde:APrincipleofNaturalSelf-OrganizationPartA: EmergenceoftheHypercycle.Naturwissenschaften64:541-565. ErianiG,DelarueM,PochO,GangloffJ,MorasD.1990.PartitionoftRNASynthetasesintoTwo ClassesBasedonMutuallyExclusiveSetsofSequenceMotifs.Nature347:203-206. ForterreP,GribaldoS,BrochierC.2005.Luca:àlarechercheduplusprocheancêtrecommun universel.MEDECINE/SCIENCES21:860-865. FournierGP,AlmEJ.2015.AncestralReconstructionofaPre-LUCAAminoacyl-tRNASynthetase AncestorSupportstheLateAdditionofTrptotheGeneticCode.JournalofMolecularEvolution 80:171-185. FournierGP,AndamCP,AlmEJ,GogartenJP.2011.MolecularEvolutionofAminoacyltRNA SynthetaseProteinsintheEarlyHistoryofLife.OrigLifeEvolBiosph41621–632 26 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. FreelandSJ,HurstLD.1998.TheGeneticCodeisOneinaMillion.JournalofMolecularEvolution 47:238-248. Fü chslinRM,McCaskillJS.2001.Evolutionaryself-organizationofcell-freegeneticcoding..Proc NatlAcadSciUSA98:9185–9190. GilbertW.1986.TheRNAWorld.Nature319:618. GladstoneE.2016.ErrorinInformationDiffusionProcesses.[[Ithaca,NY]:CornellUniversity. Guerrier-TakadaL,N.,andAltman,S.1989.SpecificInteractionsinRNAEnzymes-Substrate Complexes.Science246:1578-1584. HendersonBS,SchimmelP.1997.RNA-RNAInteractionsBetweenOligonucleotideSubstratesfor Aminoacylation.Bioorganic&MedicinalChemistry5:1071-1079. HofstadterDR.1979.Gödel,Escher,Bach:aneternalgoldenbraid.NewYork:BasicBooks,Inc. HolWJG,vanDuijnenPT,BerensenHJC.1978.Theα-helixdipoleandthepropertiesofproteins. Nature273:443-446. HordijkW,WillsPR,SteelM.2014.AutocatalyticSetsandBiologicalSpecificity.BullMathBiol 76:201–224. HorningDP,JoyceGF.2016.AmplificationofRNAbyanRNApolymeraseribozyme. ProcNatAcadSciUSA113:9786–9791. KamtekarS,SchifferJM,XiongH,BabikJM,HechtMH.1993.ProteinDesignbyBinaryPatterningof PolarandNon-polarAminoAcids.Science262:1680-1685. KimS-H,QuigleyGJ,SuddathFL,McPhersonA,SnedenD,KimJJ,WeinzierlJ,RichA.1973.ThreedimensionalstructureofyeastphenylalaninetransferRNA:Foldingofthepolynucleotidechain.. Science179:285-288. KooninEV.2011.TheLogicofChance:TheNatureandOriginofBiologicalEvolution.UpperSaddle River,NJ:PearsonEducation;FTPressScience. KooninEV.2015.WhytheCentralDogma:onthenatureofthegreatbiologicalexclusionprincipl. BiologyDirect10:52. KooninEV,NovozhilovAS.2009.OriginandEvolutionoftheGeneticCode:TheUniversalEnigma. IUBMBLife,61:99–111. LeipeDD,KooninEV,AravindL.2004.STAND,aClassofP-LoopNTPasesIncludingAnimaland PlantRegulatorsofProgrammedCellDeath:Multiple,ComplexDomainArchitectures,Unusual PhyleticPatterns,andEvolutionbyHorizontalGeneTransfer.J.Mol.Biol.343:1–28. LeipeDD,WolfYI,KooninEV,AravindL.2002.ClassificationandEvolutionofP-loopGTPasesand RelatedATPases.J.Mol.Biol.317:41-72. LewisCA,Jr.,CrayleJ,ZhouS,SwanstromR,WolfendenR.2016.Cytosinedeaminationandthe precipitousdeclineofspontaneousmutationduringEarth’shistory.ProcNatAcadSciUSA 113:8194-8199. LiL,FrancklynC,CarterCW,Jr.2013.AminoacylatingUrzymesChallengetheRNAWorld Hypothesis.J.Biol.Chem.288:26856-26863. LiL,WeinrebV,FrancklynC,CarterCW,Jr.2011.Histidyl-tRNASynthetaseUrzymes:ClassIandII Aminoacyl-tRNASynthetaseUrzymeshaveComparableCatalyticActivitiesforCognateAminoAcid Activation.J.Biol.Chem.286:10387-10395. MarkowitzS,DrummondA,NieseltK,WillsPR.2006.Simulationmodelofprebioticevolutionof geneticcoding.In:RochaLM,YaegerLS,BedauMA,FloreanoD,GoldstoneRL,VespignaniA, editors.ArtificialLife.Cambridge,MA:MITPress.p.152--157. MartinP.2005.SpatialInterpolationinOtherDimensions.[OregonStateUniversity. MartinezL,Jimenez-RodriguezM,Gonzalez-RiveraK,WilliamsT,LiL,WeinrebV,Niranj ChandrasekaranS,CollierM,AmbroggioX,KuhlmanB,etal.2015.FunctionalClassIandIIAmino 27 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. AcidActivatingEnzymesCanBeCodedbyOppositeStrandsoftheSameGene.J.Biol.Chem. 290:19710–19725. MoffetDA,FoleyJ,HechtMH.2003.Midpointreductionpotentialsandhemebinding stoichiometriesofdenovoproteinsfromdesignedcombinatoriallibraries.BiophysicalChemistry 105:231–239. MuñozV,SerranoL.1994.IntrinsicSecondaryStructurePropensitiesoftheAminoAcids,Using Statisticalf-ymatrices:ComparisonwithExperimentalScales.PROTEINS:Structure,Function,and Genetics20:301-311. MurzinAG,BrennerSE,HubbardTJP,ChothiaC.1995.SCOP:astructuralclassificationofproteins databasefortheinvestigationofsequencesandstructures.J.Mol.Biol.247:536-540. NollerH.2004.Thedrivingforceformolecularevolutionoftranslation.RNA10:1833-1837. NollerHF,HoffarthV,ZimniakL.1992.UnusualResistanceofPeptidylTransferasetoProtein ExtractionProceduresScience256:1416-1419. O’DonoghueP,Luthey-SchultenZ.2003.OntheEvolutionofStructureinAminoacyl-tRNA Synthetases.MicrobiologyandMolecularBiologyReviews67:550–573. OrgelLE.1968.EvolutionoftheGeneticApparatus.J.Mol.Biol.88:381-393. OrgelLE.1963.Themaintenanceoftheaccuracyofproteinsynthesisanditsrelevancetoageing.. ProcNat.Acad.Sci.USA49:517-521. PatelBH,PercivalleC,RitsonDJ,DuffyCD,SutherlandJD.2015.CommonoriginsofRNA,protein andlipidprecursorsinacyanosulfidicprotometabolism.NatureChemstry7:301-307. PatelSC,BradleyLH,JinadasaSP,HechtMH.2009.Cofactorbindingandenzymaticactivityinan unevolvedsuperfamilyofdenovodesigned4-helixbundleproteins.ProteinScience18:1388-1400. PearlFM,BennettCF,BrayJE,HarrisonAP,MartinN,ShepherdA,SillitoeI,ThorntonJ,OrengoCA. 2003.TheCATHdatabase:anextendedproteinfamilyresourceforstructuralandfunctional genomics.NucleicAcidsResearch31:452-455. PetrovAS,BernierCR,HsiaoC,NorrisAM,KovacsNA,WaterburyCC,StepanovVG,HarveySC,Fox GE,WartellRM,etal.2014EvolutionoftheRibosomeatAtomicResolution.PNAS11110251– 10256. PetrovAS,WilliamsLD.2015.TheAncientHeartoftheRibosomalLargeSubunit:AResponseto Caetano-Anolles.JMolEvol80:166-170. PhamY,KuhlmanB,ButterfossGL,HuH,WeinrebV,CarterCW,Jr.2010.Tryptophanyl-tRNA synthetaseUrzyme:amodeltorecapitulatemolecularevolutionandinvestigateintramolecular complementation.J.Biol.Chem.285:38590-38601. PhamY,LiL,KimA,ErdoganO,WeinrebV,ButterfossG,KuhlmanB,CarterCW,Jr.2007.AMinimal TrpRSCatalyticDomainSupportsSense/AntisenseAncestryofClassIandIIAminoacyl-tRNA Synthetases.MolCell25:851-862. RadzickaA,WolfendenR.1988.ComparingthePolaritiesoftheAminoAcids:Side-Chain DistributionCoefficientsbetweentheVaporPhase,Cyclohexane,1-0ctanol,andNeutralAqueous Solution.Biochemistry27:1664-1670. RobertsonMP,JoyceGF.2012.TheOriginsoftheRNAWorld.ColdSpringHarbPerspectBiol 4:a003608. RodinA,RodinSN,CarterCW,Jr.2009.OnPrimordialSense-AntisenseCoding.JournalofMolecular Evolution69:555-567. RodinAS,SzathmáryE,RodinSN.2011.OnoriginofgeneticcodeandtRNAbeforetranslation. BiologyDirect6:14. 28 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. RodinSN,OhnoS.1995.TwoTypesofAminoacyl-tRNASynthetasesCouldbeOriginallyEncodedby ComplementaryStrandsoftheSameNucleicAcid.OriginsofLifeandEvolutionoftheBiosphere 25:565-589. RodinSN,RodinA.2006a.OriginoftheGeneticCode:FirstAminoacyl-tRNASynthetasesCould ReplaceIsofunctionalRibozymesWhenOnlytheSecondBaseofCodonsWasEstablished.DNAand CellBiology25:365-375. RodinSN,RodinA.2006b.PartitioningofAminoacyl-tRNASynthetasesinTwoClassesCouldHave BeenEncodedinaStrand-SymmetricRNAWorld.DNAandCellBiology25:617-626. RodinSN,RodinA,OhnoS.1996.Thepresenceofcodon-anticodonpairsintheacceptorstemof tRNAs.Proc.Nat.Acad.Sci.USA93:4537-4542. RodinSN,RodinAS.2008.Ontheoriginofthegeneticcode:Signaturesofitsprimordial complementarityintRNAsandaminoacyl-tRNAsynthetases.Heredity100:341–355. RuffM,KrishnaswamyS,BoeglinM,PoterszmanA,MitschlerA,PodjarnyA,ReesB,ThierryJC, MorasD.1991.ClassIIAminoacylTransferRNASynthetases:CrystalStructureofYeastAspartyltRNASynthetaseComplexedwithtRNAAsp.Science252:1682-1689. SapienzaPJ,LiL,WilliamsT,LeeAL,CarterCW,Jr.2016.AnAncestralTryptophanyl-tRNA SynthetasePrecursorAchievesHighCatalyticRateEnhancementwithoutOrderedGround-State TertiaryStructures.ACSChemicalBiology11:1661−1668. SchimmelP.1996.Originofgeneticcode:AneedleinthehaystackoftRNAsequences.Proc.Nat. Acad.Sci.USA93:4521-4522. SchimmelP,GiegéR,MorasD,YokoyamaS.1993.AnoperationalRNAcodeforaminoacidsand possiblerelationshiptogeneticcode.ProceedingsoftheNationalAcademyofSciences,USA 90:8763-8768. SchneiderTD.2010.Abriefreviewofmolecularinformationtheory.NanoCommunication Networks1173–180. SczepanskiJT,JoyceGF.2014.Across-chiralRNApolymeraseribozyme.Nature515:440-442. SmithJI,SteelM,HordijkW.2014.Autocatalyticsetsinapartitionedbiochemicalnetwork.Journal ofSystemsChemistry5:2. SoucySM,HuangJ,GogartenJP.2015.Horizontalgenetransfer:buildingtheweboflife.Nature ReviewsGenetics16:472 SutherlandJD.2015.TheOriginofLife—OutoftheBlue.Angew.Chem.Int.Ed.54:InPress. TaylorAI,PinheiroVB,SmolaMJ,MorgunovAS,Peak-ChewS,CozensC,WeeksKM,HerdewijnP, HolligerP.2015.Catalystsfromsyntheticgeneticpolymers.Nature518:427-430. TuerckC,GoldL.1990.Systematicevolutionofligandsbyexponentialenrichment:RNAligandsto bacteriophageT7DNApolymerase.Science249:505-510. TurkRM,ChumachenkobNV,YarusM.2010.Multipletranslationalproductsfromafive-nucleotide ribozyme.ProceedingsoftheNationalAcademyofSciencesUSA107:4585–4589. TurkRM,IllangasekareM,YarusM.2011.CatalyzedandSpontaneousReactionsonRibozyme Ribose.J.AM.CHEM.SOC.133:6044–6050. VanNoordenR.2009.RNAWorldEasiertoMake.Naturepublishedonline. WarrantRW,KimS-H.1978.a-Helix-doublehelixinteractionshowninthestructureofa protamine-transferRNAcomplexandanucleoprotaminemodel.Nature271:130-135. WatsonJD,CrickFHC.1953.AStructureforDeoxyriboseNucleicAcid.Nature171:737-738. WillsPR.1994.Doesinformationacquiremeaningnaturally?.BerichtederBunsengesellschaftfür PhysikalicheChemie98:1129–1134. WillsPR.2016.Thegenerationofmeaningfulinformationinmolecularsystems.Phil.Trans.R.Soc. AA374:20150016. 29 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. WillsPR.2009.InformedGeneration:Physicaloriginandbiologicalevolutionofgeneticcodescript interpreters.JournalofTheoreticalBiology257:345–358. WillsPR.1993.Self-organizationofgeneticcoding.J.Theor.Biol.162:267-287. WillsPR.2004.Stepwiseevolutionofmolecularbiologicalcoding.In:PollackJ,BedauM,Husbands P,IkegamiT,WatsonRA,editors.ArtificiallifeIXCambridge::MITPress.p.51–56. WillsPR,CarterCW,Jr.2017.Insuperableproblemsofaninitialgeneticcodeemergingfroman RNAWorld.BiosystemsInPreparation. WillsPR,NieseltK,McCaskillJS.2015.EmergenceofCodinganditsSpecificityasaPhysicoInformaticProblem.OrigLifeEvolBiosphpublishedonline;paginationnotyetavailable. WochnerA,AttwaterJ,CoulsonA,HolligerP.2011.Ribozyme-CatalyzedTranscriptionofanActive Ribozyme.Science332209-212. WoeseC.1967.TheGeneticCode.NewYork:Harper&Row. WoeseCR.1965a.OntheOriginoftheGeneticCode.ProceedingsoftheNationalAcademyof SciencesUSA54:1546-1552. WoeseCR.1965b.OrderintheOriginoftheGeneticCode.ProceedingsoftheNationalAcademyof SciencesUSA54:71-75. WoeseCR,DugreDH,SaxingerWC,DugreSA.1966.Themolecularbasisforthegeneticcode.Proc. Natl.Acad.Sci.USA55:966–974. WoeseCR,OlsenGJ,IbbaM,SollD.2000.Aminoacyl-tRNASynthetases,theGeneticCode,andthe EvolutionaryProcess.MicrobiologyandMolecularBiologyReviews64:202–236. WolfYI,AravindL,GrishinNV,KooninEV.1999.EvolutionofAminoacyl-tRNASynthetases— AnalysisofUniqueDomainArchitecturesandPhylogeneticTreesRevealsaComplexHistoryof HorizontalGeneTransferEvents.GenomeResearch9:689–710. WolfYI,KooninEV.2007.OntheoriginofthetranslationsystemandthegeneticcodeintheRNA worldbymeansofnaturalselection,exaptation,andsubfunctionalization.BiologyDirect2:14. WolfendenR,CullisPM,SouthgateCCF.1979.Water,ProteinFolding,andtheGeneticCode.Science 206:575-577. WolfendenR,LewisCA,YuanY,CarterCW,Jr.2015.Temperaturedependenceofaminoacid hydrophobicities.Proc.Nat.Acad.Sci.USA1127484-7488. WolfendenR,SniderMJ.2001.TheDepthofChemicalTimeandthePowerofEnzymesasCatalysts. AccountsofChemicalResearch34:938-945. WongJT-F.2005.Coevolutiontheoryofthegeneticcodeatagethirty.BioEssays27:416–425. WongJT-F,NgS-K,MatW-K,HuT,XueH.2016.CoevolutionTheoryoftheGeneticCodeatAge Forty:PathwaytoTranslationandSyntheticLife.Life6:12. XueH,NgS-K,TongK-L,WongJT-F.2005.CongruenceofevidenceforaMethanopyrus-proximal rootoflifebasedontransferRNAandaminoacyl-tRNAsynthetasegenes.Gene360:120–130. YarusM.2011a.GettingPasttheRNAWorld:TheInitialDarwinianAncestor.ColdSpringHarb PerspectBiol3:a003590. YarusM.2011b.LifefromanRNAWorld:Theancestorwithin.Cambridge,MA:HarvardUniversity Press. ZullJE,SmithSK.1990.Isgeneticcoderedundancyrelatedtoretentionofstructuralinformationin bothDNAstrands?TrendsinBiochemicalSciences15:257-261. 30 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. FigureLegends Figure1.Informationflowinmolecularbiology.A.TheCentralDogmaissupplementedbythe“adaptor” hypothesis. The dashed triangle represents the crucial elements of Crick’s original insight, which necessarilyimplicatesbothtRNAandaaRS.B.Thephysico-chemicalpropertiesoftheaminoacidsdefine the nano-scale “ecologies” within folded proteins, creating the intersection between genome and proteome. These ecologies also drove the selection of tRNA identity elements, analogous to a programming language, as well as protein folding. As a consequence, they also drive the selection of aminoacidsequencesinmRNAgenesequences(mRNA),analogoustocomputerprograms.C.Network analysisoftheCentralDogmaconsistsofthenodesofatetrahedron.EmbeddingthetrianglefromAinto the ecology in B reveals a uni-directional feedback cycle or self-referential element as generator of complexity in the spirit of Gödel’s incompleteness theorem (Hofstadter 1979). Genetic instructions assembleaminoacidsaccordingtotheirphysicalpropertiesinwaysthat,whentranslatedaccordingto theprogramminglanguageintRNA,yieldfunctionalproteins(enzymes,switches,regulators).AARSwith cognate tRNAs furnish reflexive elements (orange arrow) connecting their gene sequences, via their foldedstructures,totheenzymesthatenforcethecodingrulesinthecodontable.Physicalpropertiesof aminoacidsandthecodonassignmenttableare“fixed”becausetheyaregovernedbychemicalequilibria. Thegenomeandproteomearedynamically-determinedbiologicalprocessesthatformthebasisforthe evolutionofdiversitythroughself-organizationandnaturalselectionofphenotypes. Figure2.QuasispeciesbifurcationsinaaRSgeneorproteinsequencespace.A.Asinglequasispeciesof undifferentiatedfunction makingrandomassignmentsX®yofcodons(X)toaminoacids(y)cannot transmitgeneticinformation.Norcaniteasilybifurcatetoapairofnarrowerquasispecies.Bi-directional codingancestryofthecontemporaryaaRScreatedsuitablequasispeciesdenovo{I,II;redandblue;bold italicsexplicitlyindicatingClassI,IIaaRS}eachseparatelysupportingbinarycodingassignmentsI®i andII®iofspecificsubsetsofcodons{I,II}tocorrespondingsubsetsofaminoacids{i,ii}.Thatdoublehelicalgenewithdualsingle-strandinterpretationsovercametheinitialandmostsubstantialbarrierto theemergenceofgeneticcodingbypartitioningproteinsequencespacedecisivelyintotwofunctionally distinctpopulations.TheplanebetweentheIandIIquasispeciesisalocalrepresentationoftheinversion operator that transforms a sequence into its complement read in the reverse direction. B. Daughter populationdistributionsderivedfromnearlysimultaneousbifurcationofthetwoancestralbinarycoding quasispecies(expandedview)intosmallerseparatesub-populationsofgenesandassignmentcatalysts operating a 4-letter code {Ia ® ia, Ib ® ib, IIa ® iia, IIb ® iib,}. Genetic coding bi-directionality is preserved through complementary gene pairs IaÛIIa and IbÛIIb. Recapitulation of the bifurcation process would further specialize related species, each step being progressively easier owing to the increasedcodingspecificity,buteventuallylosingtheabilitytouseinformationinbothstrandsofgenes. 31 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. Figure 3. Reflexivity is an exclusive property of protein aaRS that arises from nano-environmental sensing. A. The putative ancestral amino acid activating protozyme, substantiated experimentally in (Martinez,etal.2015)furnishestwoassignmentcatalysts,eachexecutingacomplementaryassignment, oneforlarge,theotherforsmallaminoacidsidechains.Eachalsocontributestothetranslationofthe other.B.AstheassignmentcatalystsinAareproteins,theirfoldingreactionsaregovernedbythephase transferequilibriaoftheaminoacids,sensingthenano-environmentinanecessarypreludetofunction. Figure4.Feedbackinriboyzymal(RCW)andprotein(PCW)GRTnetworks.A.CoupledReplicaseand Translataseproduction.DifferentialequationsforgeneexpressioninPCWandRCWarecomparedfor RNAsandProteins(WillsandCarter2017).Solidlinesindicateautocatalyticacceleration.Dashedarrows forma(hyper)cyclecouplingproductiondynamicsofReplicaseandtranslataseinthePCW,butnotin anyRCW.B.InanRCW,codingrules[C]areimplementedbyribozymalassignmentcatalysts{T[C]}that cannot sense the phase transfer equilibria accessible to protein assignment catalysts. Thus, natural selectionistheonlyfeedbackcycle.Non-aaRSfunctionalproteins{Pi}furnishtheonlysourceofselective advantage,andhavenodirectinfluenceonthecodingrules.C.InthePCW,codingrulesareexecutedby proteinsthatmustfirstfold.Atighterfeedbackloop(greenarrows)isastructuralfeatureofthereaction network.Proteinfoldingrulesdeterminethefunctionoftheassignmentcatalystsandthereforealsothe eventualchoiceofcodonassignments,substantiallyenhancingnaturalselection. Figure5.Impedance-matchingeaseselaborationofcodingfroma2-letteraminoacidalphabettoafull 20letteralphabet.Noise,N,inthegeneticsignal,S,ontheyaxis,servesastheprimaryobstacleopposing informationtransferintranslation.Increasedcomplexity,onthexaxis,mustbeaccompaniedbyreduced error rates. The error tolerance curve is a hyperbola in which the product of error frequency by complexity, N/S*Ψ, equals the minimum cost of an error, εmin, as estimated by Schneider (Schneider 2010). By analogy to the gears on a bicycle’s derailleur, enlarging the alphabet size increases coding capacity,providingaseriesofmatcheswiththehyperbolicboundingerrortolerancecurve(darkblue), easingthepathtoincreasedfidelitybyenablingsteppedincreasesincodingcapacityandcomplexity. Figure 6. Alternative evolution of Class I aaRS catalytic domains consistent both with phylogenetic analysisandthemoreancientancestryofthemosthighlyconservedmodulesinthetwoaaRSClasses. Thisscenariore-definestheClassICP1insertionbetweentheN-andC-terminalmodules,Ncore (blue) and Ccore (red), of the Class I Urzymes, both of which are portrayed as ancestral to all Rossmannoid superfamilies(AdaptedfromFig.4of(Aravind,etal.2002)).TheinitialCP1insertion(white)istheorigin of most subsequent elaborations of the Class I catalytic domains that appear to have provided the requisite increases in specific amino acid recognition (Carter 2015). Idiosyncratic Class I anticodonbinding domains are and not considered here. We distinguish three phases of aaRS evolution: I. BidirectionallycodedwithClassII;limiteddiversity;II.CP1enforcesstrandspecialization;III.Hydrolytic editingenhancesspecificity. 32 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. Fig.1 Fig.2 Fig.3 33 bioRxiv preprint first posted online May. 17, 2017; doi: http://dx.doi.org/10.1101/139139. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. Fig.4 Fig.5 Fig.6 34