bioRxiv preprint first posted online Mar. 16, 2017; doi: http://dx.doi.org/10.1101/117614. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC 4.0 International license.

Title: Vast population genetic diversity underlies the treatment dynamics of ETV6-RUNX1 acute lymphoblastic leukemia

One-Sentence Summary: APOBEC and replication-associated mutagenesis contribute to the development of ETV6-RUNX1 ALL, creating massive leukemic population genetic diversity that results in clonal differences in susceptibilities to chemotherapy.

Authors: Veronica Gonzalez-Pena1, Matthew MacKay2, Iwijn De Vlaminck2, John Easton3, Charles Gawad1,3

Affiliations:
1 Department of Oncology, St. Jude Children’s Research Hospital, Memphis, TN, 38105, USA
2 Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY, 14850, USA
3 Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN, 38105, USA

Correspondence: Charles Gawad [email protected]
Abstract

Ensemble-averaged genome profiling of diagnostic samples suggests that acute leukemias harbor few somatic genetic alterations. We used single-cell exome and error-corrected sequencing to survey the genetic diversity underlying ETV6-RUNX1 acute lymphoblastic leukemia (ALL) at high resolution. The survey uncovered a vast range of low-frequency genetic variants that were undetected in conventional bulk assays, including additional clone-specific “driver” RAS mutations. Single-cell exome sequencing revealed APOBEC mutagenesis to be important in disease initiation but not in progression and identified many more mutations per cell than previously found. Using these data, we created a branching model of ETV6-RUNX1 ALL development that recapitulates the genetic features of patients. Exposure of leukemic populations to chemotherapy selected for specific clones in a dose-dependent manner. Together, these data have important implications for understanding the development and treatment response of childhood leukemia, and they provide a framework for using population genetics to deeply interrogate cancer clonal evolution.
Main text:

Introduction

Analogous to organismal evolution in an ecosystem, the environmental selection pressures, population size, and rate of genome modification are core determinants of tumor evolutionary dynamics [1]. Contemporary bulk sequencing methods that interrogate the ensemble-averaged mutational profiles of the genomes of thousands of cells predominantly identify those mutations present in the most dominant tumor subclones at diagnosis [2]. However, bulk sequencing strategies are limited in their ability to capture the full genetic diversity of a population, as they do not identify lower-frequency clones and variants. A deeper understanding of the genetic diversity within tumors is key to understanding tumor evolution, including that of malignant cell populations that may increase in size as the relative fitness of clones changes in response to new selection pressures as patients undergo treatment.

Our knowledge of the evolutionary dynamics of acute leukemias is largely derived from bulk sequencing studies, which have concluded that acute leukemias harbor minimal genetic complexity when compared to other malignant neoplasms [3,4]. In utero acquisition of an ETV6-RUNX1 translocation has been identified as the most frequent initiating event of acute lymphoblastic leukemia (ALL) [5,6], the most common childhood leukemia. However, an ETV6-RUNX1 translocation is insufficient for leukemogenesis [7]; recent genomic studies have identified deletions of genes required for normal B-cell differentiation [8] that harbor signatures of aberrant RAG recombinase activity as frequent cooperating lesions [9]. In addition, ETV6-RUNX1 ALL cells
always harbor somatic single-nucleotide variants (SNVs) [9,10]. By using single-cell genomics, we determined that most of the deletions occur before the acquisition of SNVs, which result in the outgrowth of co-dominant clonal populations [9]. In addition, analyses of the SNV sequence motifs revealed an important mutagenic role for aberrant cytosine deaminase activity by an APOBEC (apolipoprotein B mRNA-editing enzyme, catalytic polypeptide-like) protein [9,10]. As APOBEC proteins mediate innate defense against viral infections [11], this finding supports the hypothesis that environmental exposure to viruses triggers the transforming mutagenesis [12]. ALL also undergoes clonal evolution between its initial diagnosis and relapse [13,14]. However, the magnitude of that evolution and the changes in clonal composition between diagnosis and relapse remain unknown.

Several fundamental questions regarding the genesis and treatment response of ETV6-RUNX1 ALL could be more precisely addressed by studying ALL genetics on a population scale. For example, why are there co-dominant clonal populations, especially when some clones harbor known “driver” mutations? What is the total mutation burden across the population of malignant cells, and do the underlying mutational processes change over time? How does this population genetic diversity influence treatment response?

To address these questions, we first used single-cell exome sequencing and error-corrected sequencing to further dissect the intraclonal genetic diversity and the shift in mutational processes that occurred during ETV6-RUNX1 ALL development. These analyses revealed evidence of massive population genetic diversity. Then, after patients began therapy, we examined the evolution of those leukemic clones by exposing samples to standard chemotherapy drugs. Together, our findings enhance our understanding of the development and treatment response of ETV6-RUNX1 ALL at the single-cell level and have significant clinical implications.
Results

Clone-specific “driver” mutations identified by single-cell exome sequencing

To further characterize the genomic diversity of clones previously defined by segregating variants identified in the bulk sample to single cells [10], we performed single-cell exome sequencing on 3 cells from each clone and on 3 normal cells from the same patient (Fig. 1A). We achieved a mean saturating coverage breadth of 82% of the target exome with 60 million reads, as compared to 95% coverage of the target exome in bulk samples (Supplementary Fig. 1). Using only the single cells, we called mutations by requiring at least 2 cells to have the same base change at the same genomic position. The initial clonal structure had 5 high-frequency clones, with one of the 2 largest clones harboring an E63K KRAS mutation, whereas we could not clearly identify transforming alterations in the other clones (Fig. 1A). We then identified a further 10 to 29 mutations per clone (Fig. 1B). Using the normal cells as a control, we identified a low false-variant call rate at 7 sites, possibly resulting from clonal mutations acquired in nonmalignant cells, amplification artifacts, or sequencing errors. We then compared the mutagenic base-change pattern of the earlier, higher-frequency mutations that were detected in bulk and were shared between clones to the later, clone-specific changes that were detected only with single-cell exome sequencing. Interestingly, the early mutations were strongly enriched for C-to-T and C-to-G changes with an associated APOBEC motif (T preceding C), whereas the most common later mutations were A-to-G, A-to-C, and C-to-T, with enrichment for G following C. That signature is most consistent with replication errors (Fig. 1C, D) [15]. We also found a KRAS G12S mutation that was confined to a less abundant clone, as well as another clone-specific NRAS G12D variant; both of these mutations were C-to-T changes (Fig. 1A).
Because the 3 activating RAS mutations (G12R, G12S, E63K) were acquired in distinct clones and not sequentially, we hypothesized that they occurred over a relatively short time. We then proposed that more oncogenic mutations would be acquired slightly later than the dominant clones, resulting in their being present at even lower frequencies.

Deeper interrogation identifies massive population genetic diversity

To more deeply characterize oncogenic mutations in the bulk sample from the same patient, we performed error-corrected targeted sequencing of 50 mutational hotspots in ALL (Supplementary Table 1). We identified the same 3 RAS mutations (G12R, G12S, E63K) that were detected through bulk and single-cell exome sequencing, along with 2 additional known activating KRAS mutations (G12D, D119N) (Fig. 2A) [16–19], all of which were C-to-T substitutions. We found no evidence of mutations in the other 37 genes that we examined using a Fisher’s exact test cutoff of 0.01. The only benign lesion found in a RAS gene was a G12G mutation that was part of the dinucleotide G12D variant.

To evaluate whether these findings were applicable to a wider range of ETV6-RUNX1 leukemias, we performed error-corrected sequencing on samples from 6 patients with a single RAS mutation identified by bulk sequencing and on 6 samples from patients with no known RAS mutations. We detected no RAS mutations in the latter 6 samples; however, we found a median of 5 activating RAS mutations in the 6 patients for whom a mutation was detected in the bulk samples (Fig. 2B). On examining the additional RAS variants more closely, we found the mutations to be clustered at known RAS mutational hotspots, namely KRAS codons 12, 13, 119, and 146 and NRAS codons 12 and 13. There was no clear propensity for variants at specific codons to be subclonal, supporting the assertion that the timing of the acquisition determined whether a mutation became dominant, as opposed to the change in RAS signaling activity due to the specific amino acid change (Fig. 3C).
We hypothesized that even more sensitive measurements would identify additional RAS mutations. We performed limited sample dilutions followed by error-corrected sequencing of patient SJETV075, hypothesizing that some RAS mutations that were just below our detection threshold would randomly be present at slightly higher frequencies in the dilute samples. By this approach, we identified 3 additional low-frequency KRAS mutations (G12R, A146T, L19F) for a total of 9 known activating RAS mutations in patient SJETV075. Our examination of all RAS mutations in this cohort revealed that of the 33 mutations identified, 31 were C-to-T or C-to-G changes. The 5 C-to-G changes were enriched for the APOBEC motif, but no similar enrichment was found in the C-to-T mutations, suggesting that they resulted from near-target APOBEC activity or replication-associated mutagenesis. This contrasts with lung cancer, in which approximately 60% of KRAS mutations are G12C or G12V, resulting from C-to-A substitutions [20]. Together, these observations further support the hypothesis that APOBEC activity and replication-associated mutagenesis are the underlying processes driving the evolution of ETV6-RUNX1 ALL.
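The motif test used in this section (a mutated C immediately preceded by T) can be illustrated with a short sketch. This is not the authors' analysis code: the function names and the convention of passing the 5' and 3' flanking reference bases are assumptions made for the example. Changes reported at a purine (for example G-to-A) are reverse-complemented so that the mutated base is always read as a pyrimidine.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_pyrimidine(five_prime, ref, alt, three_prime):
    """Describe a substitution with a pyrimidine (C or T) as the reference base.

    If ref is a purine, the trinucleotide context is reverse-complemented so the
    same event is expressed on the opposite strand.
    """
    if ref in ("C", "T"):
        return five_prime, ref, alt, three_prime
    return (COMPLEMENT[three_prime], COMPLEMENT[ref],
            COMPLEMENT[alt], COMPLEMENT[five_prime])


def is_apobec_like(five_prime, ref, alt, three_prime):
    """True for C>T or C>G changes where T immediately precedes the mutated C."""
    five, r, a, _ = to_pyrimidine(five_prime, ref, alt, three_prime)
    return r == "C" and a in ("T", "G") and five == "T"


def motif_fraction(variants):
    """Fraction of C>T / C>G variants that sit in a T-preceding-C context."""
    normalized = [to_pyrimidine(*v) for v in variants]
    c_changes = [v for v in normalized if v[1] == "C" and v[2] in ("T", "G")]
    if not c_changes:
        return 0.0
    return sum(v[0] == "T" for v in c_changes) / len(c_changes)
```

A call such as `motif_fraction(variants)` on the early and late mutation sets separately would give the kind of comparison described above; a formal enrichment claim would still need a background expectation for the T-preceding-C context, which this sketch does not compute.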
We then considered the variables that would result in such a large intra-patient activating RAS mutation burden. In our first model, we proposed that a large population of cells was at risk for rapid RAS-mediated expansion and underwent a widespread mutagenesis process, resulting in the concurrent outgrowth of multiple clones containing mutations conferring similar fitness. In our second model, we proposed that a smaller population was at risk for transformation, with a lower global mutation rate, but that RAS was a mutational hotspot, resulting in the acquisition of multiple RAS mutations during the period of leukemic transformation. If the RAS genes were mutational hotspots for ETV6-RUNX1 ALL, we would expect to find additional benign mutations as passengers of the activating mutations. However, even our more sensitive approach to mutation detection, using both single-cell and error-corrected sequencing, found no additional somatic variants within KRAS, NRAS, or HRAS.

Taken together, these findings support the model in which a massive mutation burden develops within a relatively short time in a population at risk for transformation. To further support this model, we measured the mutation rate in single leukemia cells and used the normal cells from the same sample that had undergone whole-genome amplification in the same microfluidic chip to control for the background mutations and for amplification and sequencing errors. By this approach, we measured a mean of 208 coding mutations per cell that were above the background rate in the normal cells (Fig. 3A). Compared to our previous mutation estimation, this approach estimated greater genetic diversity, with each cell harboring a mean of 150 coding variants that were not detected by bulk sequencing or intraclonal exome sequencing (Fig. 3B).
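The background correction described here (per-cell counts in leukemic cells minus the mean count in the normal cells) amounts to a single subtraction per cell. The sketch below only illustrates that arithmetic; the function name and the counts in the usage note are hypothetical and are not taken from the study.

```python
from statistics import mean


def background_corrected(tumor_counts, normal_counts):
    """Subtract the mean per-cell call count of the normal cells from each tumor cell.

    tumor_counts:  coding variant calls per leukemic cell (after germline filtering)
    normal_counts: coding variant calls per normal cell processed on the same chip
    """
    background = mean(normal_counts)
    return [count - background for count in tumor_counts]
```

For example, with made-up counts, `background_corrected([250, 230, 260], [40, 50, 45])` returns `[205, 185, 215]`, and the mean of those corrected values would be the per-cell estimate reported for a sample.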
Simulation estimates the size and genetic diversity of leukemic populations

Given the large number of mutations per cell, we devised a model to estimate the population size in order to approximate the total genetic diversity across all the clonal populations at the time ALL is diagnosed. To accomplish this, we used those variables that could be estimated from the results of previous studies, along with our current measurements, to simulate ETV6-RUNX1 ALL development. We know that the disease is initiated from a single cell by an ETV6-RUNX1 translocation, as all cells harbor the same breakpoint [5]. Furthermore, each population acquires a mean of 12 deletions [9]. We also know that the disease is initiated in utero and develops over a mean of 4.7 years [21]; that lymphocyte precursors divide approximately every 11.9 days [22], whereas primary leukemic cells divide much more frequently; and that replication errors occur in coding regions with a frequency of approximately once in every 300 human cell divisions [24]. From our results, we estimate that each cell had acquired approximately 200 coding SNVs, with many of them arising from APOBEC mutagenesis that occurred in a burst over a short time, randomly resulting in the 16 activating RAS mutations identified in our ETV6-RUNX1 ALL cohort. Cells harboring ETV6-RUNX1, a recurrent deletion, and a RAS mutation have significantly increased rates of replication, resulting in the clinical diagnosis when children reach a total leukemic cell burden of approximately 1 × 10^11 cells (Fig. 4A).
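A back-of-the-envelope calculation with the figures quoted above helps show the scale involved. The sketch below is only illustrative arithmetic, not part of the authors' model: it assumes a single lineage dividing at the precursor rate for the whole latency (a simplification, since leukemic cells divide faster), a 365.25-day year, and synchronous doubling from one cell. The constant and function names are invented for the example.

```python
import math

YEARS_TO_DIAGNOSIS = 4.7          # mean latency quoted in the text
DAYS_PER_DIVISION = 11.9          # division interval of lymphocyte precursors
DIVISIONS_PER_CODING_ERROR = 300  # about one coding replication error per 300 divisions
CELLS_AT_DIAGNOSIS = 1e11         # approximate leukemic burden at diagnosis


def precursor_divisions(years=YEARS_TO_DIAGNOSIS, days_per_division=DAYS_PER_DIVISION):
    """Divisions along one lineage if it divided at the precursor rate throughout."""
    return years * 365.25 / days_per_division


def expected_replication_errors(divisions, per=DIVISIONS_PER_CODING_ERROR):
    """Expected coding replication errors accumulated along that lineage."""
    return divisions / per


def doublings_to_reach(n_cells=CELLS_AT_DIAGNOSIS):
    """Whole doublings needed for a single cell to reach at least n_cells."""
    return math.ceil(math.log2(n_cells))
```

Under these assumptions `precursor_divisions()` is roughly 144, `expected_replication_errors(precursor_divisions())` is about 0.5, and `doublings_to_reach()` is 37. Replication errors at the precursor rate therefore contribute well under one coding mutation per lineage, which is in keeping with the text's attribution of many of the roughly 200 coding SNVs per cell to a burst of APOBEC mutagenesis.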
Using those parameters, we initiated 100 simulations in which the mutation rates resulted in each cell acquiring a mean of 13 deletions and 229 SNVs by a mean of 28 days after a burst of APOBEC mutagenesis (Supplementary Table 2, Fig. 4B). Defining clones based on cells with the same somatic mutation profile in coding regions, we estimate that there were 330 million clones in total (Supplementary Table 2). However, most clones were created when the overall population size was high, making them rare (Fig. 4C). The number of high-frequency clones was variable and depended on the time and frequency of RAS mutation acquisition (Fig. 4D). We also found evidence that APOBEC mutagenesis and not replication errors caused most of the distinct RAS mutations and that the number of unique RAS mutations varied considerably between simulations (Supplementary Fig. 3). Taken together, these simulation results are consistent with our experimental data and support the assertion that there is massive population genetic diversity at the time a patient is diagnosed with ALL.

Massive population diversity drives treatment dynamics in a patient sample

With such high population genetic diversity, we hypothesized that some clones already harbored mutations at diagnosis that altered their susceptibility to treatment. To test that hypothesis, we exposed leukemic cells to 5 standard ALL chemotherapy drugs (mercaptopurine, vincristine, prednisolone, daunorubicin, asparaginase) and used exome sequencing to evaluate the mutational composition after drug exposure. From this initial screen, we identified 537 putative mutations in at least one treatment. We then performed error-corrected sequencing of those sites in triplicate for each treatment to confirm that the mutations were pre-existing. We detected 224 specific base changes at the same location in more than one sample (Supplemental Table 3), approximately 5 times as many mutations as the 41 that we identified in the initial bulk sequencing.
To identify resistant clones, we treated the cells with an increasing dose of each chemotherapy drug and looked for mutations exhibiting a dose-dependent increase in frequency. The control cells treated with no drug or DMSO, as well as the cells treated with the lowest dose of asparaginase, showed an increase in a distinct cluster of mutations. This cluster included the highest-frequency A146V KRAS mutation, along with 2 nonsynonymous TP53 mutations (Fig. 5, cluster 7). Interestingly, 2 mutations (FOLH1 R281H and RGPD3 P816C) were strongly selected for in all control and treatment samples but were not detected in the diagnostic sample, suggesting they were selected for by the culture conditions. Treatment with low-dose mercaptopurine or higher doses of asparaginase resulted in reduced expansion of the KRAS A146V clone. This is consistent with the known pharmacokinetics of asparaginase, whereby increasing the dose after target saturation has no effect on cell killing [25]. Treatment with vincristine or dose level 2 of mercaptopurine further decreased the clone(s) in mutation cluster 7 and decreased the frequency of the highest-frequency mutations in cluster 1. Finally, exposure to prednisolone, the highest doses of mercaptopurine, or daunorubicin resulted in the greatest decrease in mutations in clusters 1 and 7, whereas mutations in clusters 2 through 4 increased in frequency as a result of these treatments. Treatment with the highest doses of daunorubicin killed all cellular populations. Taken together, these data show that the underlying genetic diversity when ALL is diagnosed does affect the treatment dynamics.

Discussion

We have presented the results of a combination of error-corrected and single-cell sequencing of ETV6-RUNX1 ALL cells collected at diagnosis in order to further resolve the temporal changes in clonal structures and mutational processes that occur during the development of the disease.
The shift in the relative frequency of different types of cytosine mutation revealed an APOBEC mutagenesis pattern that decreases over time, suggesting that this process is important for disease progression but is not required for persistence or ongoing expansion of leukemic clones. We detected an unexpectedly high population mutation burden, which revealed differences in the treatment response among the clones that arose from the population that had previously undergone APOBEC and replication-mediated mutagenesis. This finding emphasizes the dynamic nature of leukemic evolution, as the relative importance of mutations required for cell survival shifts under distinct selection pressures as patients undergo treatment. We have integrated these findings with previous knowledge to create a new model of ETV6-RUNX1 ALL, which is presented in Figure 6.

These new details of temporal shifts in the mutational processes of ALL in the period leading up to a patient's clinical presentation highlight how single-cell genomics can be used to trace back the mutational histories of tumors. The decrease in APOBEC-mediated mutagenesis during disease progression removes a major contributor to the global mutation rate in those patients. This change in the mutation rate may be one reason why ALL patients are cured at such high rates compared to patients with other tumor types in which the source of the mutagenesis, such as a mutation that decreases the fidelity of the DNA repair pathway, does not decrease over time [26]. From our results, it is unclear what effect ongoing exposure to mutagenic chemotherapy agents has on the induction of drug-resistant clones, but the effect could be significant in pediatric patients with ALL, most of whom undergo treatment for 2 to 3 years [27].
Mutations that are considered driving lesions, such as variants in KRAS or TP53, were not dominant at diagnosis or after undergoing selection during drug treatment, suggesting that combinations of mutations mediate the clinical behaviors of clones. We also observed the rise of clusters of mutations, but it is unclear how many distinct clonal populations those clusters represent. This is an area where higher-resolution studies using single-cell sequencing to determine the co-occurrence patterns of mutations will provide additional insights into drug resistance. The putative high number and differential drug sensitivity of leukemic clones also provide insights into the need for combination therapy to cure children with ALL. Although the sheer genetic diversity of the population presents new challenges for the design of targeted treatment strategies, the presence of massive population genetic diversity in ALL also provides an opportunity to probe those samples to learn why we can already overcome the pre-existent resistance in most patients with ALL. Focusing on specific mutations that are selected for by particular drugs could yield mechanistic insights into drug-specific resistance and provide a new rationale for choosing drug combinations.

In summary, we have provided new insights into the development of ETV6-RUNX1 ALL and its resistance to treatment by studying population genetics. Our findings underscore the importance of studying the mutation burden, size, and mutation rate of the population as a whole, not just the highest-frequency variants detected by standard bulk sequencing, when trying to understand and predict treatment response. Greater genetic diversity has also recently been reported in other premalignant and malignant states [28,29], suggesting that the study of cancer population genetics is important for understanding most tumor types.
Together, these studies have revealed new layers of complexity in leukemic evolution that need to be fully understood in order to more effectively eradicate premalignant and malignant ALL cell populations with less treatment-related toxicity, and they have provided a framework for studying intra-tumor evolution in a wide range of malignant neoplasms.

Online Methods

Single-cell exome sequencing and mutation calling

Amplified DNA from patient 4 that had undergone single-cell isolation and whole-genome amplification using the Fluidigm C1 System as previously described [10] was used for library construction and exome capture with the Nextera Rapid Capture Exome Kit (Illumina), used in accordance with the manufacturer's instructions. Exome-enriched libraries then underwent sequencing using 2 × 100 reads on 4 flow cells of a HiSeq 2000 or 2500 Sequencing System (Illumina). Adapters were trimmed from each of the cells by using Trimmomatic (ILLUMINACLIP:nextera_adapters.fa:2:30:10 TRAILING:25 LEADING:25 SLIDINGWINDOW:4:20 MINLEN:30), followed by alignment with BWA using default parameters. Duplicates were marked using Picard (https://broadinstitute.github.io/picard), and local realignment followed by base score recalibration was performed using GATK (https://software.broadinstitute.org/gatk). We then called variants by using GATK and followed this with filtering using the parameter “QD < 2.0 || FS > 60.0 || MQ < 40.0 || HaplotypeScore > 13.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0”. On-target coverage was calculated with Picard HsMetrics; this was repeated after subsampling for an increasing number of reads using custom bash scripts. Custom bash scripts were also used to identify locations that had the same mutation called in more than one cell. Germline SNP locations identified by bulk sequencing were then filtered out, after which locations that were identified in any of the normal single cells were removed.
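The recurrence filter described above (same mutation in more than one cell, then removal of germline sites and of sites seen in normal cells) was implemented by the authors with custom bash scripts that are not shown. As a rough sketch of the same logic, assuming each cell's calls have already been parsed into (chromosome, position, ref, alt) tuples, it could look like the following; all names are hypothetical.

```python
from collections import Counter


def recurrent_somatic_calls(tumor_cell_calls, normal_cell_calls, germline_sites, min_cells=2):
    """Keep variants called in at least `min_cells` tumor cells.

    tumor_cell_calls / normal_cell_calls: one iterable of (chrom, pos, ref, alt)
        tuples per single cell.
    germline_sites: set of (chrom, pos) locations found by bulk germline sequencing.
    Sites present in the germline set or in any normal cell are removed.
    """
    counts = Counter()
    for calls in tumor_cell_calls:
        counts.update(set(calls))  # count each variant once per cell

    normal_sites = {(chrom, pos)
                    for calls in normal_cell_calls
                    for (chrom, pos, _ref, _alt) in calls}

    kept = []
    for variant, n_cells in counts.items():
        site = variant[:2]
        if n_cells >= min_cells and site not in germline_sites and site not in normal_sites:
            kept.append(variant)
    return sorted(kept)
```

Requiring the same base change, not just the same position, in two or more cells is what guards against independent amplification errors, which are unlikely to recur identically in separate cells.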
Error-corrected sequencing

Adapters with unique identifiers were prepared as previously described. Aliquots of 250 or 500 ng of genomic DNA then underwent 30 min of chemical fragmentation and standard library preparation by using the KAPA HyperPlus Kit (Kapa Biosystems) with adapters that contained unique molecular identifiers as described [30], using 3 μg of adapter per reaction (a 10:1 molar ratio). PCR amplification and hybrid capture were performed as previously described [31]. Sequencing was performed using MiSeq V2 chemistry, using 2 × 150-bp PE reads. We then trimmed the sequences to 125 bp with Trimmomatic and placed the unique molecular identifiers into the header by using the script tag_to_header.py [30]. Reads were aligned using BWA ALN with standard parameters. We sorted and indexed using Picard, then performed consensus calling by using ConsensusMaker.py with parameters --minmem 3, --cutoff 0.8, and --Ncutoff 0.7. Unmapped reads were removed with SAMtools (http://samtools.sourceforge.net/), then local realignment was performed using GATK before creating an mpileup file. Normal and tumor mpileup files were then compared using VarScan Somatic, with somatic mutations requiring a P-value of less than 10^-4, as computed using Fisher’s exact test by VarScan (http://varscan.sourceforge.net). We also required the germline sample to have fewer than 5 reads and that no more than 90% of variant reads were on the same DNA strand. Variants underwent RefSeq annotation with ANNOVAR.
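The idea behind consensus calling with unique molecular identifiers can be sketched as follows. This is not ConsensusMaker.py: it is a simplified illustration that groups reads by identifier only (ignoring alignment position), and it assumes that the three thresholds mean a minimum family size, the fraction of reads that must agree to call a base, and the maximum fraction of N bases tolerated in a consensus. The function and argument names are invented for the example.

```python
from collections import Counter, defaultdict


def family_consensus(reads, min_members=3, cutoff=0.8, n_cutoff=0.7):
    """Collapse reads sharing one identifier into a consensus sequence.

    Returns None if the family is too small or the consensus is mostly N.
    """
    if len(reads) < min_members:
        return None
    length = min(len(r) for r in reads)
    if length == 0:
        return None
    bases = []
    for i in range(length):
        column = Counter(r[i] for r in reads)
        base, count = column.most_common(1)[0]
        # Call the base only if enough reads in the family agree on it.
        agreed = base != "N" and count / len(reads) >= cutoff
        bases.append(base if agreed else "N")
    consensus = "".join(bases)
    if consensus.count("N") / length > n_cutoff:
        return None
    return consensus


def call_consensus(tagged_reads, **thresholds):
    """tagged_reads: iterable of (identifier, sequence) pairs."""
    families = defaultdict(list)
    for tag, seq in tagged_reads:
        families[tag].append(seq)
    result = {}
    for tag, seqs in families.items():
        consensus = family_consensus(seqs, **thresholds)
        if consensus is not None:
            result[tag] = consensus
    return result
```

Because a polymerase or sequencing error usually affects only some reads within a family, positions where the family disagrees are masked as N instead of being reported as variants, which is what allows very low-frequency mutations to be distinguished from technical noise.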
Estimation of mutation rates

To estimate the mutation rate, we downsampled each of the files to 70 million reads. We then created marked, realigned, and base score-recalibrated BAM files as described above. This was followed by further variant calling and filtering using the GATK filtering parameters detailed above. We then subtracted those sites that were found in the bulk germline sequencing. To subtract the background error rate due to amplification errors, the somatic mosaicism rates in normal cells, and mutation miscalls, we subtracted the mean mutation rate in the 3 normal cells from that of each of the single tumor cells and plotted the distribution of the mutation rates.

Simulation

We designed a computational model to investigate when RAG-mediated deletions and RAS mutations occur, the clonal burden, and the timing of disease onset. The model is initiated with a single cell with an ETV6-RUNX1 translocation that is capable of differentiation and divides every 12 days, defined as cell type 0. At each time point, cells may gain a deletion or, if dividing or if APOBEC mutagenesis is active, an SNV. Type 1 cells are created when type 0 cells gain a specific RAG-mediated deletion, which occurs with probability pRAG_DA. Type 1 cells are considered to have differentiation arrest, but they divide at the same rate as type 0 cells. Type 2 cells are created when type 1 cells have a mutation within RAS, which occurs with probability pMut_IP. Type 2 cells have differentiation arrest as well as increased proliferation, duplicating every day. The number of cells within each cell type with a specific number of mutations and deletions is tracked. One time point is equivalent to 1 day, and each cell is stochastically sampled.
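The cell-type transitions described above can be illustrated with a toy simulation. This sketch is not the RepALL package and does not reproduce its parameters: it collapses each transition into a single per-cell, per-day probability, ignores mutation bookkeeping and APOBEC bursts, and uses deliberately exaggerated probabilities and a small target population so that it runs in a fraction of a second. All names and default values are hypothetical.

```python
import random


def simulate(p_arrest=0.01, p_ras=0.01, target=10_000, slow_cycle=12,
             max_days=1_000, max_slow_cells=1_000_000, seed=0):
    """Toy three-type branching model with one time step per day.

    cells[0]: type 0, divides every `slow_cycle` days
    cells[1]: type 1 (differentiation arrest), same division rate as type 0
    cells[2]: type 2 (arrest plus RAS mutation), doubles every day
    Returns (day the type 2 population reached `target`, final counts),
    or (None, final counts) if a safety limit is hit first.
    """
    rng = random.Random(seed)
    cells = [1, 0, 0]
    for day in range(1, max_days + 1):
        # Division: slow types double on every `slow_cycle`-th day, type 2 daily.
        if day % slow_cycle == 0:
            cells[0] *= 2
            cells[1] *= 2
        cells[2] *= 2

        # Each cell is sampled independently for a transition to the next type.
        to_type1 = sum(rng.random() < p_arrest for _ in range(cells[0]))
        to_type2 = sum(rng.random() < p_ras for _ in range(cells[1]))
        cells[0] -= to_type1
        cells[1] += to_type1 - to_type2
        cells[2] += to_type2

        if cells[2] >= target:
            return day, tuple(cells)
        if cells[0] + cells[1] > max_slow_cells:
            break  # safety limit for the per-cell sampling loop
    return None, tuple(cells)
```

Running `simulate(seed=s)` for several seeds shows the qualitative behavior the model relies on: the waiting time for the first fast-dividing cell varies between runs, but once it appears the type 2 population overtakes the slower types within a couple of weeks. Sampling every cell individually is only practical at toy scale; a population of 10^11 cells would require drawing the number of events per day from a binomial distribution instead.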
For each cell of a given type, a random number is generated from 1 to 1/pDel, where pDel is the probability of gaining a deletion. A deletion occurs within cells for which this random number equals 1. A second random number is generated from 1 to 1/pRAG_DA for each type 0 cell that acquires a deletion. Any cell in which this second number is 1 contains a deletion that causes it to transform into a type 1 cell. This same process is carried out when calculating the mutational burden as well as the transformation of type 2 cells into type 3 cells.

APOBEC mutagenesis starts on a fixed day, if cell type 3 has not been generated, and continues for 2 days after cell type 3 has formed. Mutations generated from APOBEC are calculated in the same way as previously described, with the exception that every cell within cell types 1 and 2 undergoes APOBEC mutagenesis and the number of mutations caused by APOBEC within a given cell is determined by randomly sampling from 1 to Burst. The simulation completes when cell type 3 has reached 1 × 10^11 cells. We estimate the number of cells per clone by the timing of the newly acquired mutation and replication rate, accounting for the branching of cells such that the sum of all cells equals the number of cells of a given type.

Simulations were run using our new R package, which is called RepALL and is available at https://github.com/mjdm/RepALL. We ran the simulation with the following default parameters:

Probability of gaining a RAG-mediated deletion, pRAG: 0.008
Probability of causing differentiation arrest, given a deletion (Type 0), pRAG_DA: 2e-7
Probability of gaining a mutation due to replication error, pBGMut: 0.003
APOBEC mutagenesis start day: 1290
APOBEC duration after leukemic cell-type initiation: 3 days
Maximum number of APOBEC mutations generated per cell per day, Burst: 75
Probability of gaining a RAS mutation, given a mutation (Type 1), pMut_IP: 16/(3e9*0.02) = 2.6e-7

Primary cell culture and drug treatment

Primary samples were from patients that had provided consent in studies approved by the St.
Jude IRB. At the time of sample collection, mononuclear cells were isolated using Ficoll-Paque (GE Life Sciences) followed by cryopreservation. One vial of cells from each patient was thawed slowly using the ThawSTAR system (MedCision), and the cells were placed in culture under the conditions previously described (32). For the limited dilution experiment, 750,000 cells were plated in each well of a 12-well plate and were grown in culture for 3 weeks. For drug treatments, the drugs and 350,000 cells were plated in each well of a 24-well plate. In both experiments, the medium was changed twice weekly by carefully removing half of it and replacing it with fresh medium. The replacement medium included a 2× drug concentration if the cell sample was undergoing chemotherapy exposure. All drugs were purchased from Sigma-Aldrich, and the concentration ranges were based on solubility limits and previously published data (32). The drugs used were mercaptopurine (500, 250, 125, and 62.5 μg/mL [...] and 90 μg/mL), vincristine (810, 162, 32.4, and 6.5 μg/mL), daunorubicin (31, 6.2, 1.2, and 0.2 μg/mL), and asparaginase (19, 9.5, 4.8, and 2.4 μg/mL). Live cells were isolated by using a dead cell removal kit (Miltenyi). DNA was extracted using a DNA Universal Kit (Zymo Research), and libraries were prepared using the HyperPlus Kit (Kapa Biosystems). Exome or custom capture was performed using oligonucleotides and the standard protocol from Integrated DNA Technologies. Quality trimming, alignment, and mutation calling were performed using the pipeline outlined above.

References

1. Merlo, L. M., Pepper, J. W., Reid, B. J. & Maley, C. C. Cancer as an evolutionary and ecological process. Nat. Rev. Cancer 6, 924–935 (2006).
2. Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
3. Kandoth, C. et al. Mutational landscape and significance across 12 major cancer types. Nature 502, 333–339 (2013).
4. Lawrence, M. S. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501 (2014).
5. Golub, T. R. et al. Fusion of the TEL gene on 12p13 to the AML1 gene on 21q22 in acute lymphoblastic leukemia. Proc. Natl. Acad. Sci. U.S.A. 92, 4917–4921 (1995).
6. Greaves, M. Pre-natal origins of childhood leukemia. Rev. Clin. Exp. Hematol. 7, 233–245 (2003).
7. Mori, H. et al. Chromosome translocations and covert leukemic clones are generated during normal fetal development. Proc. Natl. Acad. Sci. U.S.A. 99, 8242–8247 (2002).
8. Mullighan, C. G. et al. Genome-wide analysis of genetic alterations in acute lymphoblastic leukaemia. Nature 446, 758–764 (2007).
9. Papaemmanuil, E. et al. RAG-mediated recombination is the predominant driver of oncogenic rearrangement in ETV6-RUNX1 acute lymphoblastic leukemia. Nat. Genet. 46, 116–125 (2014).
10. Gawad, C., Koh, W. & Quake, S. R. Dissecting the clonal origins of childhood acute lymphoblastic leukemia by single-cell genomics. Proc. Natl. Acad. Sci. U.S.A. 111, 17947–17952 (2014).
11. Harris, R. S. & Liddament, M. T. Retroviral restriction by APOBEC proteins. Nat. Rev. Immunol. 4, 868–877 (2004).
12. Greaves, M. Infection, immune responses and the aetiology of childhood leukaemia. Nat. Rev. Cancer 6, 193–203 (2006).
13. Mullighan, C. G. et al. Genomic analysis of the clonal origins of relapsed acute lymphoblastic leukemia. Science 322, 1377–1380 (2008).
14. Ma, X. et al. Rise and fall of subclones from diagnosis to relapse in pediatric B-acute lymphoblastic leukaemia. Nat. Commun. 6, 6604 (2015).
15. Helleday, T., Eshtad, S. & Nik-Zainal, S. Mechanisms underlying mutational signatures in human cancers. Nat. Rev. Genet. 15, 585–598 (2014).
16. Schubbert, S. et al. Germline KRAS mutations cause Noonan syndrome. Nat. Genet. 38, 331–336 (2006).
17. Scholl, C. et al. Synthetic lethal interaction between oncogenic KRAS dependency and STK33 suppression in human cancer cells. Cell 137, 821–834 (2009).
18. Janakiraman, M. et al. Genomic and biological characterization of exon 4 KRAS mutations in human cancer. Cancer Res. 70, 5901–5911 (2010).
19. Smith, G. et al. Activating K-Ras mutations outwith 'hotspot' codons in sporadic colorectal tumours - implications for personalised cancer medicine. Br. J. Cancer 102, 693–703 (2010).
20. Kerner, G. S. et al. Common and rare EGFR and KRAS mutations in a Dutch non-small cell lung cancer population and their clinical outcome. PLoS One 8, e70346 (2013).
21. Rubnitz, J. E. et al. Prospective analysis of TEL gene rearrangements in childhood acute lymphoblastic leukemia: a Children's Oncology Group study. J. Clin. Oncol. 26, 2186–2191 (2008).
22. Harbott, J., Viehmann, S., Borkhardt, A., Henze, G. & Lampert, F. Incidence of TEL/AML1 fusion gene analyzed consecutively in children with acute lymphoblastic leukemia in relapse. Blood 90, 4933–4937 (1997).
23. Cooperman, J., Neely, R., Teachey, D. T., Grupp, S. & Choi, J. K. Cell division rates of primary human precursor B cells in culture reflect in vivo rates. Stem Cells 22, 1111–1120 (2004).
24. Drake, J. W., Charlesworth, B., Charlesworth, D. & Crow, J. F. Rates of spontaneous mutation. Genetics 148, 1667–1686 (1998).
25. Asselin, B. L. et al. In vitro and in vivo killing of acute lymphoblastic leukemia cells by L-asparaginase. Cancer Res. 49, 4363–4368 (1989).
26. Negrini, S., Gorgoulis, V. G. & Halazonetis, T. D. Genomic instability - an evolving hallmark of cancer. Nat. Rev. Mol. Cell Biol. 11, 220–228 (2010).
27. Pui, C. H. & Evans, W. E. Treatment of acute lymphoblastic leukemia. N. Engl. J. Med. 354, 166–178 (2006).
28. Sottoriva, A. et al. A Big Bang model of human colorectal tumor growth. Nat. Genet. 47, 209–216 (2015).
29. Martincorena, I. et al. Tumor evolution. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886 (2015).
30. Kennedy, S. R. et al. Detecting ultralow-frequency mutations by Duplex Sequencing. Nat. Protoc. 9, 2586–2606 (2014).
31. Schmitt, M. W. et al. Sequencing small genomic targets with high efficiency and extreme accuracy. Nat. Methods 12, 423–425 (2015).
32. Pieters, R. et al. In vitro drug sensitivity of cells from children with leukemia using the MTT assay with improved culture conditions. Blood 76, 2327–2336 (1990).

Acknowledgements

The authors would like to acknowledge Stephen Quake, Jinghui Zhang, and Jim Downing for their constructive advice on the project. In addition, the authors would like to thank the members of the Pediatric Cancer Genome Project, especially Charles Mullighan and Ching-Hon Pui, who lead the Hematological Malignancies Program. V.G., Y.I., J.E., and C.G. are supported by ALSAC. C.G. is also supported by the Burroughs Wellcome Fund, Leukemia and Lymphoma Society, Hyundai Hope on Wheels, and the American Society of Hematology.

The data are available in the short read archive accession ID .

Author contributions: C.G., V.G., and M.M. designed research; V.G., J.E., M.M., and C.G. performed research; C.G., M.M., and I.D. contributed new reagents/analytic tools; C.G. and M.M. analyzed data; and C.G., V.G., M.M., and I.D. wrote the paper.
Figure Legends:

Figure 1. Identification of clone-specific “driver” mutations by using single-cell exome sequencing. (A) The clonal structure of an ETV6-RUNX1 diagnostic patient sample that was identified by interrogating single cells for mutations first detected in the bulk sample was further resolved by calling mutations in the single cells alone. The clone-specific “driver” RAS mutations identified as possible causes of the clonal expansions are noted. (B) The number of new mutations identified in each clone using phasing of bulk mutations and 2-cell mutation calls. (C) Base substitution patterns seen in shared (early) and clone-specific (later) mutations. (D) The surrounding motifs in C-to-T mutations in early and late SNVs, showing that the strong APOBEC motif is only present in the early mutations.

Figure 2. Evidence for large population genetic diversity in ETV6-RUNX1 ALL. (A) Error-corrected sequencing confirmed 3 clone-specific activating RAS mutations and identified 2 additional lower-frequency activating mutations. (B) Subclonal RAS mutations were also common in a larger cohort in which each patient had one mutation identified in the bulk sample but had a median of 5 activating RAS mutations. (C) The allele frequency distributions of RAS mutations show no evidence of preferential selection of specific amino acid changes. (D) The increased sensitivity of mutation detection with limiting dilution identifies additional activating RAS mutations.

Figure 3. Estimating single-cell mutation rates. (A) Comparison of the number of exome mutation calls in leukemia cells to that in normal cells after removing germline SNVs. Subtracting the number of mutation calls in the normal cells provides an estimate of the mutations acquired by each leukemia cell. (B) Increasing the resolution of mutation calls identifies lower-frequency mutations. The increasing genetic diversity at higher-resolution measurements further supports the existence of much higher population genetic diversity.
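The background subtraction described for Figure 3A can be illustrated with a minimal Python sketch. It assumes that the background is taken as the mean call count across normal cells and that negative estimates are clipped at zero; neither detail is stated in the legend. The function name and the call counts are hypothetical and are not data from the study.

```python
from statistics import mean


def acquired_mutations(leukemia_calls, normal_calls):
    """Estimate acquired mutations per leukemia cell.

    Assumption: the background is the mean number of calls in normal
    cells (after germline SNV removal); estimates are clipped at zero.
    """
    background = mean(normal_calls)
    return [max(0.0, calls - background) for calls in leukemia_calls]


# Hypothetical call counts, for illustration only.
normal = [100, 120, 110]
leukemia = [310, 290, 400]
print(acquired_mutations(leukemia, normal))
```

Using the mean of the normal cells treats their calls as a shared false-positive background; a median or a per-batch background would be equally defensible choices.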
Figure 4. One hundred simulations of the development of ETV6-RUNX1 ALL. (A) Overview of ETV6-RUNX1 simulation in which a single cell with an ETV6-RUNX1 translocation evolves over the years. Cells that acquire deletions that cause differentiation arrest are at risk for transformation by either replication or APOBEC mutations that create an activating RAS mutation. The cells expand until the subject acquires a total leukemic cell burden of approximately 1 × 10^11. (B) The timing of differentiation arrest, APOBEC mutagenesis, appearance of the first activating RAS mutant clone, and clinical presentation of disease, using the described parameters, are similar to what is seen in patients. (C) When defining a clone based on the somatic coding mutation profile of each cell, there is an inverse correlation between the size and frequency of clones. The number of clones with the largest number of cells is more variable across simulations as a result of the timing and frequency of activating RAS mutations in cells that already harbor an ETV6-RUNX1 translocation and a deletion that causes differentiation arrest. (D) Tracking the clone size and frequency for each of the 100 simulations shows the variance in the number of high-frequency clones across simulations.
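The per-cell sampling rule used by the simulation (an integer drawn from 1 to 1/p, with the event occurring when the draw equals 1) can be illustrated with a minimal Python sketch. This is not the RepALL implementation, which is an R package; the function names, the rounding of 1/p to the nearest integer, and the example cell count are assumptions made for illustration. The parameter values are the defaults listed in the methods.

```python
import random

# Default parameters quoted in the methods; the Python names are illustrative.
P_RAG = 0.008                 # probability of gaining a RAG-mediated deletion
P_RAG_DA = 2e-7               # probability that a deletion causes differentiation arrest
P_MUT_IP = 16 / (3e9 * 0.02)  # probability that a mutation is an activating RAS mutation


def event_occurs(p, rng):
    """Draw an integer from 1 to round(1/p); the event occurs when the draw is 1.

    Assumption: 1/p is rounded to the nearest integer when it is not whole.
    """
    upper = max(1, round(1 / p))
    return rng.randint(1, upper) == 1


def step_type0(n_cells, p_del, p_da, rng):
    """One update of type 0 cells.

    Returns (deletions, transformed): the number of cells that gain a
    deletion, and how many of those deletions cause differentiation
    arrest, that is, transformation into type 1 cells.
    """
    deletions = sum(event_occurs(p_del, rng) for _ in range(n_cells))
    transformed = sum(event_occurs(p_da, rng) for _ in range(deletions))
    return deletions, transformed


rng = random.Random(0)  # fixed seed so that repeated runs give the same draws
deletions, transformed = step_type0(10_000, P_RAG, P_RAG_DA, rng)
print(deletions, transformed)
print(P_MUT_IP)
```

Looping over individual cells is only practical for small populations. A simulation that grows to 10^11 cells would need to draw event counts per clone, for example from a binomial distribution, rather than per cell; that substitution is a design choice of this note and is not described in the text.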
Figure 5. Differential sensitivity of leukemic populations to chemotherapy. Clusters of mutations showing patterns of response to drug treatment and dosage. Clones with cluster 7 mutations, which include KRAS A146V and two TP53 mutations, expanded without treatment or upon exposure to low-dose asparaginase when compared to the diagnostic sample that was not placed in culture. Mutations in clusters 8 and 9 were selected in all samples under the culture conditions used. Low-dose mercaptopurine or higher doses of asparaginase limited the expansion of mutation cluster 7. Exposure to vincristine or an increased dose of mercaptopurine further reduced mutation cluster 7. Higher doses of mercaptopurine, as well as exposure to prednisolone or daunorubicin, further decreased the frequency of mutations in clusters 1 and 7 while selecting for clones with mutations in clusters 2 through 4.

Figure 6. Model of ETV6-RUNX1 evolution. A massive number of clonal populations are present at diagnosis, with a subset containing variants that allow them to become the most abundant (red). After selection pressures change during treatment, some of the high-frequency clones contract, whereas other clones with different somatic mutation profiles undergo positive selection (dashed line).

SUPPLEMENTARY MATERIALS

Supplementary Table 1. List of ALL hotspot mutation locations in the error-corrected sequencing capture panel.
Supplementary Table 2. List of recurrent mutations detected in patient SJETV077.
Supplementary Table 3. Statistics from the simulation of the development of ETV6-RUNX1 ALL.

Supplementary Figure 1. Saturation of sequencing coverage at increasing depth. (A) Exome sequencing of bulk samples reached a saturating coverage breadth of 94% at 40 million reads. (B) Single cells reached a saturating coverage breadth of 82% at 60 million reads.

Supplementary Figure 2. Distribution of duplicate numbers in error-corrected sequencing.
Unique molecular identifier family size distributions for (A) germline and (B) leukemia samples.

Supplementary Figure 3. Estimating the relative contribution of APOBEC and replication mutagenesis to activating RAS mutations. There is significant variability in the number of activating RAS mutations produced by each simulation. Most of the activating RAS mutations in the simulations were produced by APOBEC (A) rather than by replication-mediated (B) mutations.

[Figure 1 panels. A) Bulk Mutation Calls Followed By Single-Cell Interrogation; Exome Sequencing of 3 Cells per Clone and 3 Normal Cells; Identification of Clone-Specific “Driver” Mutations (labeled KRAS E63K, KRAS G12S, NRAS G12R, MLL3 G838S). B) Cumulative Number of Mutations in Each Clone Identified by Bulk Variant Phasing and Two Cell Exome Variant Calls; y-axis: Number of Mutations; x-axis: Clones 1 to 5 and normal; series: Bulk Phasing Shared, Bulk Phasing Clone-Specific, Two Cell Exome Shared, Two Cell Exome Clone-Specific. C) Type of Base Changes in Early Shared Bulk Mutations or Late Clone-Specific Mutations; y-axis: Fraction of SNVs; x-axis: Early, Late; series: A->C:T->G, A->G:T->C, A->T:T->A, C->G:G->C, C->A:G->T, C->T:G->A. D) Sequence logos for Bulk Shared and Intra-clonal C to T Mutations (weblogo.berkeley.edu).]

[Figure 2 panels. A) Location and Variant Frequency of Six Identified Ras Mutations in a Single Patient; y-axis: Mutant Allele Frequency; x-axis: Position along Coding Region (bp). B) Number of Ras Mutations in Each Sample Detected by Error-Corrected Sequencing; y-axis: Number of Mutations; series: KRAS, NRAS. C) Locations and Error-Corrected Frequencies of Ras Mutations in 7 Samples with One Mutation Identified by Standard Sequencing; y-axis: Mutant Allele Frequency; x-axis: Amino Acid Position. D) Allele Frequency of Ras Mutations Identified in SJETV075; y-axis: Allele Frequency; x-axis: Bulk, Bulk Barcode, Limiting Dilution.]

[Figure 3 panels. A) Comparison of Normal and Leukemia Single Cell Mutation Call Numbers; y-axis: Number of Mutation Calls; x-axis: Normal Cell, Leukemia Cell. B) Comparison of Mutation Numbers at Increasing Resolution; y-axis: Number of Mutations; x-axis: Bulk, Bulk Phased, Two Cell, Single Cell. Annotated p-values: 0.041, 0.002, 0.006.]

[Figure 4 panels. A) Simulation overview, labeled APOBEC Mutagenesis and Clinical Presentation. B) Density Plot of Distribution of Timing of Differentiation Arrest, Acquisition of First RAS Mutation, and Time of Diagnosis After Acquisition of an ETV6-RUNX1 Translocation; y-axis: Density; x-axis: Days After ETV6-RUNX1 Translocation; series: Day First Cell Undergoes Differentiation Arrest, Day First RAS Mutation Acquired, Day of Diagnosis (100 Billion Cells). C) Change in Number of Leukemic Clones with Increasing Number of Leukemic Cells Per Clone after 100 Simulations; y-axis: Number of Clones (log10); x-axis: Number of Cells Per Clone (log10). D) Number of Leukemic Clones with Increasing Number of Leukemic Cells Per Clone For Each Simulation; y-axis: Number of Clones (log10); x-axis: Number of Cells Per Clone (log10).]

[Figure 5. Heat map; rows: Mutations, grouped into Mutation Clusters; columns: Drug Treatment (Dose), comprising Prednisolone 1 to 4, Daunorubicin 1 and 2, Mercaptopurine 1 to 4, Vincristine 1 to 4, Asparaginase 1 to 4, DMSO, No Drug or Vehicle, and Diagnosis; color scale: Allele Frequency (%).]

[Figure 6. Schematic; Clonal Expansion Rate: Low, Medium, High; timeline: Disease Initiation, Diagnosis, Treatment.]