Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. Title:ASIMPLEpipelineformappingpointmutations runningtitle:Pointmutationsmappingpipeline 1 2 1 1,3 GuyWachsman ,JenniferL.Modliszewski ,ManuelValdes andPhilipN.Benfey 1 2 DepartmentofBiology,DukeUniversity,Durham,NC27708,USA; UNC,GenomeSciencesBuilding4144,250Bell 3 TowerDrive,ChapelHill,NC,USA; HowardHughesMedicalInstitute,DukeUniversity,Durham,NC27708,USA Correspondingauthor:[email protected] Keywords:SNP,mapping Abstract Aforwardgeneticscreenisoneofthebestmethodsforrevealingtheregulatoryfunctionsof genes.Inplants,thistechniqueishighlyefficientsinceitisrelativelyeasytogrowandscreenthe phenotypes of hundreds or thousands of individuals. The cost-efficiency and ease of data productionaffordedbynext-generationsequencingtechniqueshavecreatednewopportunities formappinginducedmutations.Theprinciplesofgeneticmappingremainthesame.However, thedetailshavechanged,whichallowsforrapidmappingofcausalmutations.Currentmapping tools are often not user-friendly and complicated or require extensive preparation steps. To simplify the process of mapping new mutations, we developed a pipeline that takes next generationsequencingfastqfilesasinput,callsonseveralwell-establishedandfreelyavailable genome-analysistools,andoutputsthemostlikelycausalDNAchange(s).Thepipelinehasbeen validatedinArabidopsisandcanbereadilyappliedtootherspecies. 1 bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. Introduction Identifyinggeneticmutationsthatunderliephenotypicchangesisessentialforunderstandinga widevarietyofbiologicalprocesses.Aforwardgeneticscreenisoneofthemostpowerfultools forsearchingforsuchmutations.Spontaneousandinducedmutationshavebeenusedtoidentify genes underlying aberrant phenotypes for over one hundred years (Morgan 1910). In the commoncase,amutagenisusedtogenerateafewthousandrandommutationsinthegenome by physical (radiation; (Muller 1927)), chemical (EMS; (Lewis and Bacher 1968)) or biological (transposons;(McClintock1950))agentsfollowedbyascreenforthedesiredphenotypecaused byoneofthemutations.Onceaplantwiththedesiredphenotypeisisolated,theresearcher must identify the causal mutation. This is done by testing for an association between known geneticmarkersandthephenotype.Suchgeneticlinkage(assessedbyco-segregation)indicates thatthecausalmutationislocatedinthevicinityofthegeneticmarker.Theintroductionofnext- generation sequencing (NGS) for mapping purposes has proven to be very promising, as it is possible to quickly identify a small number of potential causal SNPs. However, the currently availabletoolse.g.,SNPtrack(Leshchineretal.2012),SHOREmap(Schneebergeretal.2009)and NGM(Austinetal.2011)areeitherinoperable(SNPtrack)orrequirecodingknowledge.Wehave developedtheSIMPLEtool(SImpleMappingPipeLinE),whichoperatesontheinputoftheNGS fastqfilesgeneratedfromWTandmutantbulkedDNApools,andproducestablesandaplot showing the most likely candidate genes and locations, respectively. The tool can be easily downloadedandexecutedwithnopriorbioinformaticsknowledgeandrequiresonlyafewsimple preparatorystepstoinitiate.Oncetheprogramruns,theuseraccessesatablewiththemost 2 bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. likely candidate genes and a figure file that marks the location(s) of these candidates. Our pipelinehasseveraladvantagesincomparisontoothermappingtools.First,theentireprocess is user-friendly. It does not require any prior programming knowledge or NGS analysis skills. Second,itisβall-inclusiveβ;besidesafewinitialstepssuchasdownloadingthefastqfilestothe right folder and naming them, the user only needs to paste three lines into the terminal applicationtoruntheprogram.ThesestepsaredescribedintheREADMEfile.Third,theprogram canacceptasinputanypaired-endorsingle-endfastqcombinationfromtheWTandmutant bulks. Fourth, this tool accepts, based on our experience, several types of segregating populationssuchasM2,M3,back-crossandmap-cross.TheprojectiscurrentlyhostedonGitHub andisavailablefordownloadathttps://github.com/wacguy/Simple. Results Theconceptbehindthemappingtool All methods that aim to map a causal DNA change implement a similar strategy. For bulk- segregant analysis, a segregating F2 population is divided into mutant phenotype and WT phenotype. DNA from each of the two bulked samples is sequenced by next-generation sequencing.Forrecessivemutations,thebasicprincipleistofindtheDNAchangesthathaveonly non-referencereads(i.e.,locationsthatdifferfromthereferencegenome)inthemutantbulk and~1:2mutanttoWTratioofreadsintheWTbulk(sincetheWTbulkiscomprisedof+/-and +/+ individuals in a 2:1 ratio). For dominant mutations, the concept is similar, but the nonsegregating(withareferencelinkedlocus)bulkwouldbetheWTbatch(+/+)whereasthemutant bulkiscomprisedofmixedgenotypesofmutantindividuals(+/-and-/-).Ourpipelineisbased 3 bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. on a short BASH script that calls BWA (Li and Durbin 2009), Samtools (Li et al. 2009), Picard (http://broadinstitute.github.io/picard),GATK(McKennaetal.2010;DePristoetal.2011;Van der Auwera et al. 2013), SnpEff (Cingolani et al. 2012) and R (R Core Team 2013) in order to generatetwoVCFs(variantcallfiles)andaplot.ThefirstVCFlistsallcandidategenesandthe second one contains the entire SNP population found by the GATK HaplotypCaller tool. Generation of the candidate list is based on several criteria. First, and most importantly, the mutationhastosegregateinthecorrectratio.Additionally,wechosetofocusonSNPsthatare consistent with a specific set of criteria to identify the most likely causal mutation. However, thesecriteriacanbereadilychangedbymanipulatingthesimple.shfile.Forexample,weselect only single nucleotide changes, since single nucleotide mutagens such as EMS are commonly used. We also select for mutations that have a significant effect on the protein rather than synonymous mutations or changes in intergenic regions. Figure 1a shows a flowchart of the pipelineconcept. 4 bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. Figure1.TheSIMPLEpipelineworkflow User-requiredactionsareinyellowandSIMPLEactionsareinblue.Thespecificprogramthatexecutestheaction(s) ofSIMPLEisindicatedatthebottomofeachbluebox. Totestourpipeline,wesequencedtenpopulationsthatweregeneratedinthreeindependent EMSscreens.WeusedfourdifferentpopulationtypesincludingM2(asegregatingpopulation generatedbyselfingaheterozygousplantfromamutagenizedpopulation),M3(asegregating population generated by selfing a heterozygous M2 plant), back-cross (an F2 segregating populationgeneratedbycrossingamutantplantwiththeoriginalparentalline)andmap-cross (anF2segregatingpopulationgeneratedbycrossingamutantplantwithanotheraccession,e.g., L.erecta).Someofthemutantsweresequencedmorethanonceorinsuccessivegenerations,if theyhadmorethanonemappingpopulationorifinitialmappingfailed(seeTable1fordetails). 5 bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. Table1.MappingpopulationsthatwereusedtotesttheSIMPLEpipeline line 1 474-3 300 300-4 300-7 300-7 633 M381 B381 EMS608 194 generation M3 M2 M3 M3 M3 M2 F2-mapcrossonLer F2-back-crossonCol-0;CASP1::GFP M3 M2 mappedgene SHR RHD3 SECA1 GLUTATHIONEREDUCTASE2 MYB36 BIN4 RHD3 At_num AT4G37650 AT3G13870 AT4G01800 AT3G54660 AT5G57620 AT5G24630 AT3G13870 mut/wt seedlings 23/23 19/19 50/100 50/100 55/50 30/30 50/100 50/100 47/150 23/49 CDS changes 664C>T 1751C>T 1751C>T 2254C>T 2254C>T 790G>A protein change2 Arg222* 174G>A 971G>A 1951C>T Trp58* Gly324Glu Gln651* Ser584Phe Arg752* Ala264Thr numberofreads (mut.ref/mut.alt ;wt.ref/wt.alt) 0/43;38/22 0/13;10/0 0/64;76/42 0/19;30/9 1/16;110/41 1/33;38/9 0/21;14/0 0/20;60/1 1/20;18/1 10/22;49/6 mean coverage (mut/wt) 73/75 21/25 89/156 23/61 90/198 61/57 32/21 31/83 46/39 23/49 validation3 allelictest&similarphenotypetoknownallele remarks allelictest&similarphenotypetoknownallele allelictest&similarphenotypetoknownallele similarphenotypetoknownallele(Yuetal.,2013) complementation&similarphenotypetoknownallele couldn'tbemappedwithM3population wtplantsfromLerparentalline wtplantsfromCol-0;CASP1::GFPparentalline similarphenotypetoknownallele(Breueretal.,2007) similarphenotypetoknownallele 1 Column1denotesthelinenumber.Line300-4wasgeneratedbyselfingaheterozygousplantfromline300. Each 2 3 colorrepresentadifferentEMSmutagenizingexperiment; asteriskindicatesastopcodon; seemethodsformore information. Thepipelineproducesthreefiles,cands4.txt,plot4.txtandRplot.pdf(seedetailsbelow)tohelp identify the causal mutation. In most cases, the candidate list output file (cands4.txt files) togetherwiththepositiontoSNP-ratioplotweresufficienttoidentifythecausalmutation(Fig. 2a). In line 194 no genes were found in the candidate list, most likely due to an erroneous inclusion of WT seedlings in the mutant bulk. Nevertheless, the program provides additional informationtohelpidentifystrongcandidates.Forexample,theplotindicatesalinkedlocusnear thecenterofchromosome3(Fig.2b).BrowsingtheSNPpopulationinthisregionidentifiesa prematurestopcodoninRHD3(AT3G13870).Indeed,thislinehasashortrootandwavyroot hairphenotypesimilartothemappedM2-300/M3-300-4lineandtorhd3-1(Wangetal.1997; 2015). There are two parameters that appear to strongly influence the success of the pipeline in identifyingthecausalmutationoratleastashortlistofpotentialSNPs.First,incorrectinclusion ofWTplantsinthemutantbulkleadstoreferencereadsinthemutantfastqfile.Asaresult,the causalSNPandlinkedmutationsareviewedasheterozygous.Therefore,itisessentialthatall individualsinthemutantbulkbecorrectlyphenotypedasmutant.Incaseswherethephenotype isdifficulttorecognize,forexample,asoftenhappenswithQTLs,itisrecommendedtoworkwith 6 bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. smallerbuthighconfidencepopulationsratherthanincludepotentiallywild-typeindividualsin themutantbulk.Alternatively,mutantplantscanbetestedbasedonthesegregationoftheir offspring.Asecondcrucialparameterissequencingdepth.Thisisimportantbecausedifferent genomic regions have variable coverage depth and even when mean coverage appears to be sufficient,someregionsmaystillhaveveryfewreads,whichrendersthemalmostimpossibleto genotype.Werecommendaminimumof30xcoverage. WerecommendworkingwithanF2/M2generationratherthananF3/M3generationfortwo reasons. First, a segregating F3/M3 population generated from a single F2/M2 heterozygous plantishomozygousforonequarterofthelociaswellasbeinghomozygousforthecausalSNP whereasanF2/M2populationishomozygousonlyforthecausalmutationandthegenetically linked region. Therefore, there is substantial information loss in the F3/M3 generation and mappingisheavilydependentonthesegregationoftheWTbulk.Thisbulkislessinformative becauseits2:1WTtomutantratiointhebinomialread-distributionhasahigherpotentialtovary fromtheexpectedvalues.Incontrast,themutantbulkisexpectedtohavestrictlynoreference readsandafewdozenalternatereads.AnotherreasontoprefertheF2/M2generationisthat the following generation goes through a second round of recombination that generates chromosomesandchromosomalregionswithmorecomplexhaplotypesthataremoredifficult tointerpret. Inputfiles Theuserprovidesthefastqfilesthataregeneratedbyanext-generationsequencingplatform suchasIllumina2000andplacestheminthefastqfolder.Eachbulk(mutantorWT)canhave 7 bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. either a single file in the case of single-end sequencing or two files for paired-end. All other dependenciessuchasreferencefilesandprogramsareeitherpresentordownloadedaspartof thepipeline,withtwoexceptions.TheuserhastoprovidetheGATKexecutableduetolicensing issues;JavaandRshouldalsobepre-installed.Ashortdescriptionofhowtorunthepipelineis providedintheshortREADMEfile(Supplementaryfile3). Outputfiles Theprogramgeneratesmorethan30files,althoughmostofthemarenotnecessaryfornonprogrammers.Therearethreefilesthatcanhelpidentifythecausalmutation.ThefileRplot.pdf (lookssimilartoFigure2aor2b)showsthechromosomallocationofeachSNP,plottedagainsta LOESS-fittedratiovariable.Thisvariablewasgeneratedusingthisequation: πππ‘ππ = π€π‘()* ππ’π‘()* β π€π‘()* + π€π‘,-. ππ’π‘()* + ππ’π‘,-. where, π€π‘()* is the number of reads in the WT bulk that are called with the reference genome nucleotide, π€π‘,-. is the number of reads in the WT bulk that are called with a non-reference genome nucleotide, ππ’π‘()* isthenumberofreadsinthemutantbulkthatarecalledwiththereferencegenome nucleotide 8 bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. and, ππ’π‘,-. isthenumberofreadsinthemutantbulkthatarecalledwithanon-referencegenome nucleotide. The ratio should be around zero for unlinked SNPs and approximately 0.66 for the causal mutationandgeneticallylinkedSNPs(Figure2).WeremovedallSNPswitharatiolowerthan0.1 andappliedLOESSsmoothingwithdegree=2andspan=0.3. 9 bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. Figure2.Exampleoftheoutputplots.x-axis,chromosomallocation;y-axis,ratiovariable(seeequation1fordetails). Thenumberofeachchromosomeislabeledontopofeachpanel.a)andb)Line300and194,respectively;seeTable 1andmaintextfordetails. Thesecondimportantfile,cands4.txtliststhestrongestcandidategenes.(Supplementaryfile1 asanexample).Thesecandidatesareselectedbasedonthefollowingcriteria.1)TheSNPhasto behomozygous-alternate(namely,onlyreadsthataredifferentfromthereferencegenome)in 10 bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. themutantbulkandwithapproximately2:1referencetoalternatereadrationintheWTbulk. Second,theSNPmustbeasinglenucleotidechangeratherthananindel.Third,themutation shouldhaveasignificanteffectontheprotein,i.e.,spliceacceptor,splicedonor,startlost,stop gainedormissensevariant. Thethirdimportantfileisplot4.txt(Supplementaryfile2asanexample).ThisfilelistsalltheSNPs withtheirlocations,thechangeinthecodingsequence,theeffectontheproteinandthenumber ofreadsforeachalleleineachbulk.Thisfileisimportantincasethecands4.txtdidnotyieldany candidategenes.Forexample,insomecases,thephenotypeofthesampledmutantsisdifficult todistinguishfromWT,suchasthecasewhenmappingQTLsormutantswithsubtlephenotypes. In these scenarios, WT plants might be included in the mutant bulk and the causal SNP is interpreted as heterozygous for both bulks. In such cases, the user can manually browse the plot4.txtfiletoidentify:1)SNPsthatarenearlyhomozygous-alternateinthemutantbulkand approximately2:1referencetoalternateallelicratiointheWTbulkand2)SNPswithasignificant effectonthecodingregion. Discussion Approximately half of the Arabidopsis genes have unknown molecular functions (http://www.arabidopsis.org/portals/genAnnotation/genome_snapshot.jsp) which creates a largeopportunityfordiscoveringnovelgenefunctionalitiesthroughrelativelysimpleforwardgenetic screens. The introduction of next-generation sequencing technologies offers new and rapidopportunitiesforidentificationofthegenesaffectedbysuchscreens.Themainadvantage ofgenomesequencingovertraditionalmap-basedcloningmethodsthatusemolecularmarkers 11 bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. suchasRFLPandAFLPisthatNGSprovidessingle-nucleotideresolution.Whilepre-NGSmethods could identify a genomic region, this had to be further mined for potential mutation(s) in candidategenes.Bycontrast,sequencingamutagenizedgenomerevealstheentirepopulation of genetic changes. The causal mutation is then precisely identified using bioinformatics/computationaltools.AnotherimportantbenefitofusingNGSformappingisthat asmallpopulation(afewdozenindividuals)isusuallysufficientformapping.Sincethemapping resolutionisashighasasinglenucleotide,theimportanceofeachandeveryrecombinantfor reducingtheregionofthecausalmutation(βchromosomewalkingβ)islessimportant.Inother words,thesizeoftheregionboundedbyrecombinationeventsinwhichthecausalmutationlies isnotimportantastheresearcherisabletovisualizeeachandeverySNP,evaluateit,andchoose theonethathasthehighestlikelihoodofbeingcausal.MappingbyNGSisespeciallyfruitfulwhen alargemappingpopulationcanbeeasilygeneratedandscreenedwhichisthecaseformany plantspecies,nematodesandyeast.Wehavedevelopedaneasytousetoolthatallowsmapping ofsingle-nucleotideinducedmutations,evenbyresearchersthathaveverylittleexperiencewith bioinformaticstools.OurpipelinewilltakethefasqfilesasinputandwillidentifycausalSNPs withnopre-processingstepsrequired.Theoutputtablesandplotscanbereadilyusedtoidentify themostlikelymutation.Eveninthecasewherenocandidategeneispresent,thelistofSNPs withtheirreadcallsnumberinthemutantandWTbulksandtheireffect,willmostlikelypoint towardsthecorrectgene.Intheory,theSIMPLEpipelinecanbeusedwithanydiploidspecies thathasbulkedmutantandWTmappingpopulations.Althoughnotmanyspeciesarecurrently covered,itiseasytoaddanyspeciesofinteresttothepipeline(seeREADMEfile).Theprogram runsonMacOSXversion10.11.6andLinuxrelease6.7(GNOME2.28.2)withJava1.7installed 12 bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. (seeREADMEfileforspecification)whichisacommonlyusedplatforminmanylabs.TheSIMPLE project is hosted on GitHub (https://github.com/wacguy/Simple) and includes a quick-start READMEfile. Methods Growthconditionandscreening ArabidopsisseedlingsweregrownonMSmedium((MurashigeandSkoog1962)containing1% sucrose. EMS screen was performed according to (Weigel and Glazebrook 2006). DR5::Luc seedlingwerescreenedusingtheLumazonesystemaccordingto(Moreno-Risuenoetal.2010) withlines474-3,300,300-4,300-7,633and194,withpCASP2::GFPexpressionforlinesB381and M381 (Liberman et al. 2015) and for the cortex marker (Brady et al. 2007) expression in the EMS608 line. The GFP screens for CASP2 and the cortex marker were performed using Leica fluorescencestereodissectingscope. Sequencing Pairedorsingle-endlibrarieswerepreparedusingNexteraXTorKAPAHyper-prepKitaccording tomanufacturerinstructionandsequencedonIllumina2000/2500instrumentwiththeHighor RapidthroughputmodeattheDukeCenterforGenomicandComputationalbiology. Mutantvalidation Line474-3hasasinglegroundtissuelayerandashortrootphenotype,similartoothershralleles (Helariuttaetal.2000).ThephenotypeofF1offspringinacrossbetween474-3asamaledonor 13 bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. andshr-2(Helariuttaetal.2000)hasthephenotypeoftheparentallinessuggestingthatline4743isallelictoSHR. Line300(and300-4,asegregatinglineof300)wascrossedwithrhd3-1(Wangetal.1997;2015) andtheF1progenieshaveshortrootandwavyphenotypesimilartotheparentallinessuggesting thattheselinesareallelic.Line300-7wascrossedwithSALKline063371(Liuetal.2010)andthe F1 progenies phenocopied both parental lines. myb36-1 (line 381 M and B) was previously described(Libermanetal.2015).Line194hasthesamephenotypeasline300-4andrhd3-1allele asdescribedabove. AllF1crossesweresequencedforthemaledonorallele(seeTable2forprimers). Table2 line primers name forword reverse remarks 474-3 cw49/cw50 TCTCCATACCTCAAACTCCTCC TTGCCTCTCCGTCTACTGC these primers were used to genotype the shr-2 allele 300/300-4 seq-RHD3-muts-f/r cagagctttctgattaaacaaacttc CAAGTGCTTGAGGCAAGTGA 300-7 M3-300-7-f/r CACTGATGAAGAAAGGAAGAAGg CATCTTGGATTCGATCGGTAA PrimersthatwereusedtoamplifyandsequencethemalealleleinF1seedlings. Dataaccess WholegenomesequencingdatahavebeendepositedintheSequenceReadArchive(SRA)with theaccessionnumberPRJNA353239. Acknowledgments WewouldliketothankpeopleintheBenfeylabforcommentsonthemanuscriptandhelpful experimentalsuggestionsandtoCarmenWilsonfortechnicalassistant.WealsothanktheDuke CenterforGenomicandComputationalBiologyforpreparingandsequencingtheDNAlibraries. 14 bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. ThisresearchwasfundedbyagranttoPNBfromtheNIH(R01-GM043778),andbytheHoward HughesMedicalInstituteandtheGordonandBettyMooreFoundation(GBMF3405). Disclosuredeclaration Theauthorsdeclarenoconflictofinterest. Supplementaryfile1 Anexampleofcands4.txtfilefromline633. Supplementaryfile2 Anexampleofplot4.txtfilefromline633. Supplementaryfile3 Instructionfileforrunningthepipeline.ThisfileisalsoavailableontheSIMPLEGitHubrepository. References AustinRS,VidaurreD,StamatiouG,BreitR,ProvartNJ,BonettaD,ZhangJ,FungP,GongY, WangPW,etal.2011.Next-generationmappingofArabidopsisgenes.ThePlantJournal67: 715β725. BradySM,OrlandoDA,LeeJY,WangJY,KochJ,DinnenyJR,MaceD,OhlerU,BenfeyPN.2007. AHigh-ResolutionRootSpatiotemporalMapRevealsDominantExpressionPatterns. Science318:801β806. CingolaniP,PlattsA,WangLL,CoonM,NguyenT,WangL,LandSJ,RudenDM,LuX.2012.A programforannotatingandpredictingtheeffectsofsinglenucleotidepolymorphisms, SnpEff:SNPsinthegenomeofDrosophilamelanogasterstrainw(1118);iso-2;iso-3.Fly (Austin)6:80β92. 15 bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. DePristoMA,BanksE,PoplinR,GarimellaKV,MaguireJR,HartlC,PhilippakisAA,delAngelG, RivasMA,HannaM,etal.2011.Aframeworkforvariationdiscoveryandgenotypingusing next-generationDNAsequencingdata.NatGenet43:491β498. FukakiH,TamedaS,MasudaH,TasakaM.2002.Lateralrootformationisblockedbyagain-offunctionmutationintheSOLITARY-ROOT/IAA14geneofArabidopsis.PlantJ29:153β168. HelariuttaY,FukakiH,Wysocka-DillerJ,NakajimaK,JungJ,SenaG,HauserMT,BenfeyPN. 2000.TheSHORT-ROOTgenecontrolsradialpatterningoftheArabidopsisrootthrough radialsignaling.Cell101:555β567. LeshchinerI,AlexaK,KelseyP,AdzhubeiI,Austin-TseCA,CooneyJD,AndersonH,KingMJ, StottmannRW,GarnaasMK,etal.2012.Mutationmappingandidentificationbywholegenomesequencing.GenomeResearch22:1541β1548. LewisEB,BacherF.1968.Methodoffeedingethylmethanesulfonate(EMS)toDrosophila males.Dros.Inf.Serv. LiH,DurbinR.2009.FastandaccurateshortreadalignmentwithBurrows-Wheelertransform. Bioinformatics25:1754β1760. LiH,HandsakerB,WysokerA,FennellT,RuanJ,HomerN,MarthG,AbecasisG,DurbinR.2009. Thesequencealignment/mapformatandSAMtools.Bioinformatics25:2078β2079. LibermanLM,SparksEE,Moreno-RisuenoMA,PetrickaJJ,BenfeyPN.2015.MYB36regulates thetransitionfromproliferationtodifferentiationintheArabidopsisroot.Proceedingsof theNationalAcademyofSciences112:12099β12104. LiuD,GongQ,MaY,LiP,LiJ,YangS,YuanL,YuY,PanD,XuF,etal.2010.cpSecA,athylakoid proteintranslocasesubunit,isessentialforphotosyntheticdevelopmentinArabidopsis. JournalofExperimentalBotany61:1655β1669. McClintockB.1950.TheOriginandBehaviorofMutableLociinMaize.ProcNatlAcadSciUSA 36:344β355. McKennaA,HannaM,BanksE,SivachenkoA,CibulskisK,KernytskyA,GarimellaK,AltshulerD, GabrielS,DalyM,etal.2010.TheGenomeAnalysisToolkit:AMapReduceframeworkfor analyzingnext-generationDNAsequencingdata.GenomeResearch20:1297β1303. Moreno-RisuenoMA,VanNormanJM,MorenoA,ZhangJ,AhnertSE,BenfeyPN.2010. OscillatingGeneExpressionDeterminesCompetenceforPeriodicArabidopsisRoot Branching.Science329:1306β1311. MorganTH.1910.SexlimitedinheritanceinDrosophila.Science32:120β122. MullerHJ.1927.ArtificialTransmutationoftheGene.Science66:84β87. 16 bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. MurashigeT,SkoogF.1962.ARevisedMediumforRapidGrowthandBioAssayswithTobacco TissueCultures.PhysiolPlant15:473β497. OkumuraK-I,GohT,ToyokuraK,KasaharaH,TakebayashiY,MimuraT,KamiyaY,FukakiH. 2013.GNOM/FEWERROOTSisrequiredfortheestablishmentofanauxinresponse maximumforarabidopsislateralrootinitiation.PlantandCellPhysiology54:406β417. SchneebergerK,OssowskiS,LanzC,JuulT,PetersenAH,NielsenKL,JørgensenJ-E,WeigelD, AndersenSU.2009.SHOREmap:simultaneousmappingandmutationidentificationby deepsequencing.NatMeth6:550β551. RCoreTeam.2013.R:Alanguageandenvironmentforstatisticalcomputing. VanderAuweraGA,CarneiroMO,HartlC,PoplinR,delAngelG,Levy-MoonshineA,JordanT, ShakirK,RoazenD,ThibaultJ,etal.2013.FromFastQdatatohighconfidencevariantcalls: theGenomeAnalysisToolkitbestpracticespipeline.CurrProtocBioinformatics43: 11.10.1β33. WangHY,LockwoodSK,HoeltzelMF,SchiefelbeinJW.1997.TheROOTHAIRDEFECTIVE3gene encodesanevolutionarilyconservedproteinwithGTP-bindingmotifsandisrequiredfor regulatedcellenlargementinArabidopsis.Genes&Development11:799β811. WangJ,WangY,YangJ,MaC,ZhangY,GeT,QiZ,KangY.2015.ArabidopsisROOTHAIR DEFECTIVE3isinvolvedinnitrogenstarvation-inducedanthocyaninaccumulation.JIntegr PlantBiol57:708β721. WeigelD,GlazebrookJ.2006.EMSMutagenesisofArabidopsisSeed.CSHProtoc2006: pdb.prot4621.http://cshprotocols.cshlp.org/content/2006/5/pdb.prot4621.abstract. 17