Download A SIMPLE pipeline for mapping point mutations

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Neurogenomics wikipedia , lookup

Transcript
bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
Title:ASIMPLEpipelineformappingpointmutations
runningtitle:Pointmutationsmappingpipeline
1
2
1
1,3
GuyWachsman ,JenniferL.Modliszewski ,ManuelValdes andPhilipN.Benfey 1
2
DepartmentofBiology,DukeUniversity,Durham,NC27708,USA; UNC,GenomeSciencesBuilding4144,250Bell
3
TowerDrive,ChapelHill,NC,USA; HowardHughesMedicalInstitute,DukeUniversity,Durham,NC27708,USA
Correspondingauthor:[email protected]
Keywords:SNP,mapping
Abstract
Aforwardgeneticscreenisoneofthebestmethodsforrevealingtheregulatoryfunctionsof
genes.Inplants,thistechniqueishighlyefficientsinceitisrelativelyeasytogrowandscreenthe
phenotypes of hundreds or thousands of individuals. The cost-efficiency and ease of data
productionaffordedbynext-generationsequencingtechniqueshavecreatednewopportunities
formappinginducedmutations.Theprinciplesofgeneticmappingremainthesame.However,
thedetailshavechanged,whichallowsforrapidmappingofcausalmutations.Currentmapping
tools are often not user-friendly and complicated or require extensive preparation steps. To
simplify the process of mapping new mutations, we developed a pipeline that takes next
generationsequencingfastqfilesasinput,callsonseveralwell-establishedandfreelyavailable
genome-analysistools,andoutputsthemostlikelycausalDNAchange(s).Thepipelinehasbeen
validatedinArabidopsisandcanbereadilyappliedtootherspecies.
1
bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
Introduction
Identifyinggeneticmutationsthatunderliephenotypicchangesisessentialforunderstandinga
widevarietyofbiologicalprocesses.Aforwardgeneticscreenisoneofthemostpowerfultools
forsearchingforsuchmutations.Spontaneousandinducedmutationshavebeenusedtoidentify
genes underlying aberrant phenotypes for over one hundred years (Morgan 1910). In the
commoncase,amutagenisusedtogenerateafewthousandrandommutationsinthegenome
by physical (radiation; (Muller 1927)), chemical (EMS; (Lewis and Bacher 1968)) or biological
(transposons;(McClintock1950))agentsfollowedbyascreenforthedesiredphenotypecaused
byoneofthemutations.Onceaplantwiththedesiredphenotypeisisolated,theresearcher
must identify the causal mutation. This is done by testing for an association between known
geneticmarkersandthephenotype.Suchgeneticlinkage(assessedbyco-segregation)indicates
thatthecausalmutationislocatedinthevicinityofthegeneticmarker.Theintroductionofnext-
generation sequencing (NGS) for mapping purposes has proven to be very promising, as it is
possible to quickly identify a small number of potential causal SNPs. However, the currently
availabletoolse.g.,SNPtrack(Leshchineretal.2012),SHOREmap(Schneebergeretal.2009)and
NGM(Austinetal.2011)areeitherinoperable(SNPtrack)orrequirecodingknowledge.Wehave
developedtheSIMPLEtool(SImpleMappingPipeLinE),whichoperatesontheinputoftheNGS
fastqfilesgeneratedfromWTandmutantbulkedDNApools,andproducestablesandaplot
showing the most likely candidate genes and locations, respectively. The tool can be easily
downloadedandexecutedwithnopriorbioinformaticsknowledgeandrequiresonlyafewsimple
preparatorystepstoinitiate.Oncetheprogramruns,theuseraccessesatablewiththemost
2
bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
likely candidate genes and a figure file that marks the location(s) of these candidates. Our
pipelinehasseveraladvantagesincomparisontoothermappingtools.First,theentireprocess
is user-friendly. It does not require any prior programming knowledge or NGS analysis skills.
Second,itisβ€œall-inclusive”;besidesafewinitialstepssuchasdownloadingthefastqfilestothe
right folder and naming them, the user only needs to paste three lines into the terminal
applicationtoruntheprogram.ThesestepsaredescribedintheREADMEfile.Third,theprogram
canacceptasinputanypaired-endorsingle-endfastqcombinationfromtheWTandmutant
bulks. Fourth, this tool accepts, based on our experience, several types of segregating
populationssuchasM2,M3,back-crossandmap-cross.TheprojectiscurrentlyhostedonGitHub
andisavailablefordownloadathttps://github.com/wacguy/Simple.
Results
Theconceptbehindthemappingtool
All methods that aim to map a causal DNA change implement a similar strategy. For bulk-
segregant analysis, a segregating F2 population is divided into mutant phenotype and WT
phenotype. DNA from each of the two bulked samples is sequenced by next-generation
sequencing.Forrecessivemutations,thebasicprincipleistofindtheDNAchangesthathaveonly
non-referencereads(i.e.,locationsthatdifferfromthereferencegenome)inthemutantbulk
and~1:2mutanttoWTratioofreadsintheWTbulk(sincetheWTbulkiscomprisedof+/-and
+/+ individuals in a 2:1 ratio). For dominant mutations, the concept is similar, but the nonsegregating(withareferencelinkedlocus)bulkwouldbetheWTbatch(+/+)whereasthemutant
bulkiscomprisedofmixedgenotypesofmutantindividuals(+/-and-/-).Ourpipelineisbased
3
bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
on a short BASH script that calls BWA (Li and Durbin 2009), Samtools (Li et al. 2009), Picard
(http://broadinstitute.github.io/picard),GATK(McKennaetal.2010;DePristoetal.2011;Van
der Auwera et al. 2013), SnpEff (Cingolani et al. 2012) and R (R Core Team 2013) in order to
generatetwoVCFs(variantcallfiles)andaplot.ThefirstVCFlistsallcandidategenesandthe
second one contains the entire SNP population found by the GATK HaplotypCaller tool.
Generation of the candidate list is based on several criteria. First, and most importantly, the
mutationhastosegregateinthecorrectratio.Additionally,wechosetofocusonSNPsthatare
consistent with a specific set of criteria to identify the most likely causal mutation. However,
thesecriteriacanbereadilychangedbymanipulatingthesimple.shfile.Forexample,weselect
only single nucleotide changes, since single nucleotide mutagens such as EMS are commonly
used. We also select for mutations that have a significant effect on the protein rather than
synonymous mutations or changes in intergenic regions. Figure 1a shows a flowchart of the
pipelineconcept.
4
bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
Figure1.TheSIMPLEpipelineworkflow
User-requiredactionsareinyellowandSIMPLEactionsareinblue.Thespecificprogramthatexecutestheaction(s)
ofSIMPLEisindicatedatthebottomofeachbluebox.
Totestourpipeline,wesequencedtenpopulationsthatweregeneratedinthreeindependent
EMSscreens.WeusedfourdifferentpopulationtypesincludingM2(asegregatingpopulation
generatedbyselfingaheterozygousplantfromamutagenizedpopulation),M3(asegregating
population generated by selfing a heterozygous M2 plant), back-cross (an F2 segregating
populationgeneratedbycrossingamutantplantwiththeoriginalparentalline)andmap-cross
(anF2segregatingpopulationgeneratedbycrossingamutantplantwithanotheraccession,e.g.,
L.erecta).Someofthemutantsweresequencedmorethanonceorinsuccessivegenerations,if
theyhadmorethanonemappingpopulationorifinitialmappingfailed(seeTable1fordetails).
5
bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
Table1.MappingpopulationsthatwereusedtotesttheSIMPLEpipeline
line 1
474-3
300
300-4
300-7
300-7
633
M381
B381
EMS608
194
generation
M3
M2
M3
M3
M3
M2
F2-mapcrossonLer
F2-back-crossonCol-0;CASP1::GFP
M3
M2
mappedgene
SHR
RHD3
SECA1
GLUTATHIONEREDUCTASE2
MYB36
BIN4
RHD3
At_num
AT4G37650
AT3G13870
AT4G01800
AT3G54660
AT5G57620
AT5G24630
AT3G13870
mut/wt
seedlings
23/23
19/19
50/100
50/100
55/50
30/30
50/100
50/100
47/150
23/49
CDS
changes
664C>T
1751C>T
1751C>T
2254C>T
2254C>T
790G>A
protein
change2
Arg222*
174G>A
971G>A
1951C>T
Trp58*
Gly324Glu
Gln651*
Ser584Phe
Arg752*
Ala264Thr
numberofreads
(mut.ref/mut.alt
;wt.ref/wt.alt)
0/43;38/22
0/13;10/0
0/64;76/42
0/19;30/9
1/16;110/41
1/33;38/9
0/21;14/0
0/20;60/1
1/20;18/1
10/22;49/6
mean
coverage
(mut/wt)
73/75
21/25
89/156
23/61
90/198
61/57
32/21
31/83
46/39
23/49
validation3
allelictest&similarphenotypetoknownallele
remarks
allelictest&similarphenotypetoknownallele
allelictest&similarphenotypetoknownallele
similarphenotypetoknownallele(Yuetal.,2013)
complementation&similarphenotypetoknownallele
couldn'tbemappedwithM3population
wtplantsfromLerparentalline
wtplantsfromCol-0;CASP1::GFPparentalline
similarphenotypetoknownallele(Breueretal.,2007)
similarphenotypetoknownallele
1
Column1denotesthelinenumber.Line300-4wasgeneratedbyselfingaheterozygousplantfromline300. Each
2
3
colorrepresentadifferentEMSmutagenizingexperiment; asteriskindicatesastopcodon; seemethodsformore
information.
Thepipelineproducesthreefiles,cands4.txt,plot4.txtandRplot.pdf(seedetailsbelow)tohelp
identify the causal mutation. In most cases, the candidate list output file (cands4.txt files)
togetherwiththepositiontoSNP-ratioplotweresufficienttoidentifythecausalmutation(Fig.
2a). In line 194 no genes were found in the candidate list, most likely due to an erroneous
inclusion of WT seedlings in the mutant bulk. Nevertheless, the program provides additional
informationtohelpidentifystrongcandidates.Forexample,theplotindicatesalinkedlocusnear
thecenterofchromosome3(Fig.2b).BrowsingtheSNPpopulationinthisregionidentifiesa
prematurestopcodoninRHD3(AT3G13870).Indeed,thislinehasashortrootandwavyroot
hairphenotypesimilartothemappedM2-300/M3-300-4lineandtorhd3-1(Wangetal.1997;
2015).
There are two parameters that appear to strongly influence the success of the pipeline in
identifyingthecausalmutationoratleastashortlistofpotentialSNPs.First,incorrectinclusion
ofWTplantsinthemutantbulkleadstoreferencereadsinthemutantfastqfile.Asaresult,the
causalSNPandlinkedmutationsareviewedasheterozygous.Therefore,itisessentialthatall
individualsinthemutantbulkbecorrectlyphenotypedasmutant.Incaseswherethephenotype
isdifficulttorecognize,forexample,asoftenhappenswithQTLs,itisrecommendedtoworkwith
6
bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
smallerbuthighconfidencepopulationsratherthanincludepotentiallywild-typeindividualsin
themutantbulk.Alternatively,mutantplantscanbetestedbasedonthesegregationoftheir
offspring.Asecondcrucialparameterissequencingdepth.Thisisimportantbecausedifferent
genomic regions have variable coverage depth and even when mean coverage appears to be
sufficient,someregionsmaystillhaveveryfewreads,whichrendersthemalmostimpossibleto
genotype.Werecommendaminimumof30xcoverage.
WerecommendworkingwithanF2/M2generationratherthananF3/M3generationfortwo
reasons. First, a segregating F3/M3 population generated from a single F2/M2 heterozygous
plantishomozygousforonequarterofthelociaswellasbeinghomozygousforthecausalSNP
whereasanF2/M2populationishomozygousonlyforthecausalmutationandthegenetically
linked region. Therefore, there is substantial information loss in the F3/M3 generation and
mappingisheavilydependentonthesegregationoftheWTbulk.Thisbulkislessinformative
becauseits2:1WTtomutantratiointhebinomialread-distributionhasahigherpotentialtovary
fromtheexpectedvalues.Incontrast,themutantbulkisexpectedtohavestrictlynoreference
readsandafewdozenalternatereads.AnotherreasontoprefertheF2/M2generationisthat
the following generation goes through a second round of recombination that generates
chromosomesandchromosomalregionswithmorecomplexhaplotypesthataremoredifficult
tointerpret.
Inputfiles
Theuserprovidesthefastqfilesthataregeneratedbyanext-generationsequencingplatform
suchasIllumina2000andplacestheminthefastqfolder.Eachbulk(mutantorWT)canhave
7
bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
either a single file in the case of single-end sequencing or two files for paired-end. All other
dependenciessuchasreferencefilesandprogramsareeitherpresentordownloadedaspartof
thepipeline,withtwoexceptions.TheuserhastoprovidetheGATKexecutableduetolicensing
issues;JavaandRshouldalsobepre-installed.Ashortdescriptionofhowtorunthepipelineis
providedintheshortREADMEfile(Supplementaryfile3).
Outputfiles
Theprogramgeneratesmorethan30files,althoughmostofthemarenotnecessaryfornonprogrammers.Therearethreefilesthatcanhelpidentifythecausalmutation.ThefileRplot.pdf
(lookssimilartoFigure2aor2b)showsthechromosomallocationofeachSNP,plottedagainsta
LOESS-fittedratiovariable.Thisvariablewasgeneratedusingthisequation:
π‘Ÿπ‘Žπ‘‘π‘–π‘œ =
𝑀𝑑()*
π‘šπ‘’π‘‘()*
βˆ’
𝑀𝑑()* + 𝑀𝑑,-. π‘šπ‘’π‘‘()* + π‘šπ‘’π‘‘,-.
where,
𝑀𝑑()* is the number of reads in the WT bulk that are called with the reference genome
nucleotide,
𝑀𝑑,-. is the number of reads in the WT bulk that are called with a non-reference genome
nucleotide,
π‘šπ‘’π‘‘()* isthenumberofreadsinthemutantbulkthatarecalledwiththereferencegenome
nucleotide
8
bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
and,
π‘šπ‘’π‘‘,-. isthenumberofreadsinthemutantbulkthatarecalledwithanon-referencegenome
nucleotide.
The ratio should be around zero for unlinked SNPs and approximately 0.66 for the causal
mutationandgeneticallylinkedSNPs(Figure2).WeremovedallSNPswitharatiolowerthan0.1
andappliedLOESSsmoothingwithdegree=2andspan=0.3.
9
bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
Figure2.Exampleoftheoutputplots.x-axis,chromosomallocation;y-axis,ratiovariable(seeequation1fordetails).
Thenumberofeachchromosomeislabeledontopofeachpanel.a)andb)Line300and194,respectively;seeTable
1andmaintextfordetails.
Thesecondimportantfile,cands4.txtliststhestrongestcandidategenes.(Supplementaryfile1
asanexample).Thesecandidatesareselectedbasedonthefollowingcriteria.1)TheSNPhasto
behomozygous-alternate(namely,onlyreadsthataredifferentfromthereferencegenome)in
10
bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
themutantbulkandwithapproximately2:1referencetoalternatereadrationintheWTbulk.
Second,theSNPmustbeasinglenucleotidechangeratherthananindel.Third,themutation
shouldhaveasignificanteffectontheprotein,i.e.,spliceacceptor,splicedonor,startlost,stop
gainedormissensevariant.
Thethirdimportantfileisplot4.txt(Supplementaryfile2asanexample).ThisfilelistsalltheSNPs
withtheirlocations,thechangeinthecodingsequence,theeffectontheproteinandthenumber
ofreadsforeachalleleineachbulk.Thisfileisimportantincasethecands4.txtdidnotyieldany
candidategenes.Forexample,insomecases,thephenotypeofthesampledmutantsisdifficult
todistinguishfromWT,suchasthecasewhenmappingQTLsormutantswithsubtlephenotypes.
In these scenarios, WT plants might be included in the mutant bulk and the causal SNP is
interpreted as heterozygous for both bulks. In such cases, the user can manually browse the
plot4.txtfiletoidentify:1)SNPsthatarenearlyhomozygous-alternateinthemutantbulkand
approximately2:1referencetoalternateallelicratiointheWTbulkand2)SNPswithasignificant
effectonthecodingregion.
Discussion
Approximately half of the Arabidopsis genes have unknown molecular functions
(http://www.arabidopsis.org/portals/genAnnotation/genome_snapshot.jsp) which creates a
largeopportunityfordiscoveringnovelgenefunctionalitiesthroughrelativelysimpleforwardgenetic screens. The introduction of next-generation sequencing technologies offers new and
rapidopportunitiesforidentificationofthegenesaffectedbysuchscreens.Themainadvantage
ofgenomesequencingovertraditionalmap-basedcloningmethodsthatusemolecularmarkers
11
bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
suchasRFLPandAFLPisthatNGSprovidessingle-nucleotideresolution.Whilepre-NGSmethods
could identify a genomic region, this had to be further mined for potential mutation(s) in
candidategenes.Bycontrast,sequencingamutagenizedgenomerevealstheentirepopulation
of genetic changes. The causal mutation is then precisely identified using
bioinformatics/computationaltools.AnotherimportantbenefitofusingNGSformappingisthat
asmallpopulation(afewdozenindividuals)isusuallysufficientformapping.Sincethemapping
resolutionisashighasasinglenucleotide,theimportanceofeachandeveryrecombinantfor
reducingtheregionofthecausalmutation(β€œchromosomewalking”)islessimportant.Inother
words,thesizeoftheregionboundedbyrecombinationeventsinwhichthecausalmutationlies
isnotimportantastheresearcherisabletovisualizeeachandeverySNP,evaluateit,andchoose
theonethathasthehighestlikelihoodofbeingcausal.MappingbyNGSisespeciallyfruitfulwhen
alargemappingpopulationcanbeeasilygeneratedandscreenedwhichisthecaseformany
plantspecies,nematodesandyeast.Wehavedevelopedaneasytousetoolthatallowsmapping
ofsingle-nucleotideinducedmutations,evenbyresearchersthathaveverylittleexperiencewith
bioinformaticstools.OurpipelinewilltakethefasqfilesasinputandwillidentifycausalSNPs
withnopre-processingstepsrequired.Theoutputtablesandplotscanbereadilyusedtoidentify
themostlikelymutation.Eveninthecasewherenocandidategeneispresent,thelistofSNPs
withtheirreadcallsnumberinthemutantandWTbulksandtheireffect,willmostlikelypoint
towardsthecorrectgene.Intheory,theSIMPLEpipelinecanbeusedwithanydiploidspecies
thathasbulkedmutantandWTmappingpopulations.Althoughnotmanyspeciesarecurrently
covered,itiseasytoaddanyspeciesofinteresttothepipeline(seeREADMEfile).Theprogram
runsonMacOSXversion10.11.6andLinuxrelease6.7(GNOME2.28.2)withJava1.7installed
12
bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
(seeREADMEfileforspecification)whichisacommonlyusedplatforminmanylabs.TheSIMPLE
project is hosted on GitHub (https://github.com/wacguy/Simple) and includes a quick-start
READMEfile.
Methods
Growthconditionandscreening
ArabidopsisseedlingsweregrownonMSmedium((MurashigeandSkoog1962)containing1%
sucrose. EMS screen was performed according to (Weigel and Glazebrook 2006). DR5::Luc
seedlingwerescreenedusingtheLumazonesystemaccordingto(Moreno-Risuenoetal.2010)
withlines474-3,300,300-4,300-7,633and194,withpCASP2::GFPexpressionforlinesB381and
M381 (Liberman et al. 2015) and for the cortex marker (Brady et al. 2007) expression in the
EMS608 line. The GFP screens for CASP2 and the cortex marker were performed using Leica
fluorescencestereodissectingscope.
Sequencing
Pairedorsingle-endlibrarieswerepreparedusingNexteraXTorKAPAHyper-prepKitaccording
tomanufacturerinstructionandsequencedonIllumina2000/2500instrumentwiththeHighor
RapidthroughputmodeattheDukeCenterforGenomicandComputationalbiology.
Mutantvalidation
Line474-3hasasinglegroundtissuelayerandashortrootphenotype,similartoothershralleles
(Helariuttaetal.2000).ThephenotypeofF1offspringinacrossbetween474-3asamaledonor
13
bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
andshr-2(Helariuttaetal.2000)hasthephenotypeoftheparentallinessuggestingthatline4743isallelictoSHR.
Line300(and300-4,asegregatinglineof300)wascrossedwithrhd3-1(Wangetal.1997;2015)
andtheF1progenieshaveshortrootandwavyphenotypesimilartotheparentallinessuggesting
thattheselinesareallelic.Line300-7wascrossedwithSALKline063371(Liuetal.2010)andthe
F1 progenies phenocopied both parental lines. myb36-1 (line 381 M and B) was previously
described(Libermanetal.2015).Line194hasthesamephenotypeasline300-4andrhd3-1allele
asdescribedabove.
AllF1crossesweresequencedforthemaledonorallele(seeTable2forprimers).
Table2
line
primers name
forword
reverse
remarks
474-3
cw49/cw50
TCTCCATACCTCAAACTCCTCC
TTGCCTCTCCGTCTACTGC
these primers were used to genotype the shr-2 allele
300/300-4
seq-RHD3-muts-f/r
cagagctttctgattaaacaaacttc
CAAGTGCTTGAGGCAAGTGA
300-7
M3-300-7-f/r
CACTGATGAAGAAAGGAAGAAGg
CATCTTGGATTCGATCGGTAA
PrimersthatwereusedtoamplifyandsequencethemalealleleinF1seedlings.
Dataaccess
WholegenomesequencingdatahavebeendepositedintheSequenceReadArchive(SRA)with
theaccessionnumberPRJNA353239.
Acknowledgments
WewouldliketothankpeopleintheBenfeylabforcommentsonthemanuscriptandhelpful
experimentalsuggestionsandtoCarmenWilsonfortechnicalassistant.WealsothanktheDuke
CenterforGenomicandComputationalBiologyforpreparingandsequencingtheDNAlibraries.
14
bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
ThisresearchwasfundedbyagranttoPNBfromtheNIH(R01-GM043778),andbytheHoward
HughesMedicalInstituteandtheGordonandBettyMooreFoundation(GBMF3405).
Disclosuredeclaration
Theauthorsdeclarenoconflictofinterest.
Supplementaryfile1
Anexampleofcands4.txtfilefromline633.
Supplementaryfile2
Anexampleofplot4.txtfilefromline633.
Supplementaryfile3
Instructionfileforrunningthepipeline.ThisfileisalsoavailableontheSIMPLEGitHubrepository.
References
AustinRS,VidaurreD,StamatiouG,BreitR,ProvartNJ,BonettaD,ZhangJ,FungP,GongY,
WangPW,etal.2011.Next-generationmappingofArabidopsisgenes.ThePlantJournal67:
715–725.
BradySM,OrlandoDA,LeeJY,WangJY,KochJ,DinnenyJR,MaceD,OhlerU,BenfeyPN.2007.
AHigh-ResolutionRootSpatiotemporalMapRevealsDominantExpressionPatterns.
Science318:801–806.
CingolaniP,PlattsA,WangLL,CoonM,NguyenT,WangL,LandSJ,RudenDM,LuX.2012.A
programforannotatingandpredictingtheeffectsofsinglenucleotidepolymorphisms,
SnpEff:SNPsinthegenomeofDrosophilamelanogasterstrainw(1118);iso-2;iso-3.Fly
(Austin)6:80–92.
15
bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
DePristoMA,BanksE,PoplinR,GarimellaKV,MaguireJR,HartlC,PhilippakisAA,delAngelG,
RivasMA,HannaM,etal.2011.Aframeworkforvariationdiscoveryandgenotypingusing
next-generationDNAsequencingdata.NatGenet43:491–498.
FukakiH,TamedaS,MasudaH,TasakaM.2002.Lateralrootformationisblockedbyagain-offunctionmutationintheSOLITARY-ROOT/IAA14geneofArabidopsis.PlantJ29:153–168.
HelariuttaY,FukakiH,Wysocka-DillerJ,NakajimaK,JungJ,SenaG,HauserMT,BenfeyPN.
2000.TheSHORT-ROOTgenecontrolsradialpatterningoftheArabidopsisrootthrough
radialsignaling.Cell101:555–567.
LeshchinerI,AlexaK,KelseyP,AdzhubeiI,Austin-TseCA,CooneyJD,AndersonH,KingMJ,
StottmannRW,GarnaasMK,etal.2012.Mutationmappingandidentificationbywholegenomesequencing.GenomeResearch22:1541–1548.
LewisEB,BacherF.1968.Methodoffeedingethylmethanesulfonate(EMS)toDrosophila
males.Dros.Inf.Serv.
LiH,DurbinR.2009.FastandaccurateshortreadalignmentwithBurrows-Wheelertransform.
Bioinformatics25:1754–1760.
LiH,HandsakerB,WysokerA,FennellT,RuanJ,HomerN,MarthG,AbecasisG,DurbinR.2009.
Thesequencealignment/mapformatandSAMtools.Bioinformatics25:2078–2079.
LibermanLM,SparksEE,Moreno-RisuenoMA,PetrickaJJ,BenfeyPN.2015.MYB36regulates
thetransitionfromproliferationtodifferentiationintheArabidopsisroot.Proceedingsof
theNationalAcademyofSciences112:12099–12104.
LiuD,GongQ,MaY,LiP,LiJ,YangS,YuanL,YuY,PanD,XuF,etal.2010.cpSecA,athylakoid
proteintranslocasesubunit,isessentialforphotosyntheticdevelopmentinArabidopsis.
JournalofExperimentalBotany61:1655–1669.
McClintockB.1950.TheOriginandBehaviorofMutableLociinMaize.ProcNatlAcadSciUSA
36:344–355.
McKennaA,HannaM,BanksE,SivachenkoA,CibulskisK,KernytskyA,GarimellaK,AltshulerD,
GabrielS,DalyM,etal.2010.TheGenomeAnalysisToolkit:AMapReduceframeworkfor
analyzingnext-generationDNAsequencingdata.GenomeResearch20:1297–1303.
Moreno-RisuenoMA,VanNormanJM,MorenoA,ZhangJ,AhnertSE,BenfeyPN.2010.
OscillatingGeneExpressionDeterminesCompetenceforPeriodicArabidopsisRoot
Branching.Science329:1306–1311.
MorganTH.1910.SexlimitedinheritanceinDrosophila.Science32:120–122.
MullerHJ.1927.ArtificialTransmutationoftheGene.Science66:84–87.
16
bioRxiv preprint first posted online Nov. 18, 2016; doi: http://dx.doi.org/10.1101/088591. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
MurashigeT,SkoogF.1962.ARevisedMediumforRapidGrowthandBioAssayswithTobacco
TissueCultures.PhysiolPlant15:473–497.
OkumuraK-I,GohT,ToyokuraK,KasaharaH,TakebayashiY,MimuraT,KamiyaY,FukakiH.
2013.GNOM/FEWERROOTSisrequiredfortheestablishmentofanauxinresponse
maximumforarabidopsislateralrootinitiation.PlantandCellPhysiology54:406–417.
SchneebergerK,OssowskiS,LanzC,JuulT,PetersenAH,NielsenKL,JørgensenJ-E,WeigelD,
AndersenSU.2009.SHOREmap:simultaneousmappingandmutationidentificationby
deepsequencing.NatMeth6:550–551.
RCoreTeam.2013.R:Alanguageandenvironmentforstatisticalcomputing.
VanderAuweraGA,CarneiroMO,HartlC,PoplinR,delAngelG,Levy-MoonshineA,JordanT,
ShakirK,RoazenD,ThibaultJ,etal.2013.FromFastQdatatohighconfidencevariantcalls:
theGenomeAnalysisToolkitbestpracticespipeline.CurrProtocBioinformatics43:
11.10.1–33.
WangHY,LockwoodSK,HoeltzelMF,SchiefelbeinJW.1997.TheROOTHAIRDEFECTIVE3gene
encodesanevolutionarilyconservedproteinwithGTP-bindingmotifsandisrequiredfor
regulatedcellenlargementinArabidopsis.Genes&Development11:799–811.
WangJ,WangY,YangJ,MaC,ZhangY,GeT,QiZ,KangY.2015.ArabidopsisROOTHAIR
DEFECTIVE3isinvolvedinnitrogenstarvation-inducedanthocyaninaccumulation.JIntegr
PlantBiol57:708–721.
WeigelD,GlazebrookJ.2006.EMSMutagenesisofArabidopsisSeed.CSHProtoc2006:
pdb.prot4621.http://cshprotocols.cshlp.org/content/2006/5/pdb.prot4621.abstract.
17