Download JGI - MaizeGDB

Advancing Science with DNA Sequence
Maize Missouri 17 “chromosome 10”
project update
Dan Rokhsar
3 October 2006
Aims: “Plan A”
• Generate and annotate “gene space” for the ~180
Mbp chromosome 10 of Mo17 using a random
shotgun approach from flow-sorted
chromosomes.
• This resource will complement the BAC-by-BAC
sequencing of B73, informing our understanding
of intra-species variation, from SNPs to
chromosomal organization.
• The project will serve as a pilot R&D study for
chromosome-scale random shotgun sequencing
of complex genomes
Challenges
• Produce high-quality shotgun library from a single
chromosome (year 1)
– Apply flow sorting methods to root tip preparations or oat-maize hybrid lines with maize Mo17-10
• Assemble shotgun sequences and relevant mapping
data to recover non-repetitive and ‘distinguishable
repetitive’ regions (years 1-2)
– DuPont Mo17 BAC library, BAC-end sequence
– Targeted mapping to link across complex repeats
• Targeted finishing of “gene space” from whole-chromosome-shotgun draft (year 2)
– Interplay of finishing with annotation
Project goals for researchers and breeders
• Unlimited markers for mapping
• Nearly complete gene set for Mo17-10
• Conserved synteny/chromosome dynamics with
sorghum
• Evolutionary approaches empowered
• Novel reagents begin to emerge
• Framework for understanding strain differences
Milestones
• Year 1
– Produce test libraries from mock flow sorted
material (JGI)
– Produce preliminary flow sorting data for discussion
at Advisory Committee meetings (NFCR)
– Produce 1-10 micrograms of flow sorted
chromosome 10 material (NFCR).
– Complete library production (JGI)
– Begin shotgun sequencing, with associated data
deposition (JGI)
Milestones
• Year 2
– Complete initial shotgun assembly, with associated data deposition
(JGI)
– Integrate with physical map data from DuPont (JGI)
– Complete two rounds of primer walking (SHGC)
– Annotate initial draft assembly, with data release (JGI)
– Complete subsequent rounds of targeted finishing reactions
(SHGC)
– Complete physical mapping of markers and release to public
repositories (PGML)
– Produce final assembly incorporating finishing data (JGI, SHGC)
– Publish detailed analyses of Maize Genome Project outcomes (all)
– Offer summer course on maize genome data (JGI)
Problems at first step
• First milestone from “plan A” not met
– Flow sorting system is going …
– But no significant progress toward chromosome flow sorting at
preparative scale
– Some small-scale root tip chromosome preps have been done, but
not ready to scale up
– Three months of chromosome preps (~10,000 root tips) would be
needed to obtain even a few tenths of a microgram of DNA for a first
chromosome-specific cloning attempt, with the outcome not guaranteed
– JGI library group would prefer more material for robust shotgun
library prep (minimum of several µg); previous chromosome-specific
lambda cloning (Arumuganathan) is more forgiving, but still
gave low coverage (2X)
– Attempted to contract to Dolezel’s group in the Czech Republic, but their
capacity is taken with wheat BAC preps. Willing to advise.
Arumuganathan is now doing human cell sorting, not working with
chromosome preps, and cannot take on the task.
Even in expert hands, purity of chromosome prep is 85-90%
• Li, Arumuganathan, et al. Flow cytometric sorting of maize chromosome 9
from an oat-maize chromosome addition line. TAG (2001).
Proposal for “Plan B”
• Continue development of flow sorting chromosome 10, but
decouple from sequencing plans in current project
• Produce ~3/4 X random whole genome shotgun sequence
of Mo17 in plasmid and fosmid paired ends (mix TBD)
– ~3 months to bulk prep DNA, make libraries, do quality control
testing/sampling (Jan 2007)
– <3 months to schedule and perform production sequencing run (Apr 2007)
• Note: JGI is not in a position to take on significant BAC-based shotgun from B73 project
– perhaps a few hundred clones, maybe ~1% of project
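The ~3/4 X target can be turned into a rough read count with simple arithmetic. The sketch below is illustrative only: the maize genome size (~2.3 Gbp) and the usable read length (~700 bp) are assumptions that do not come from these slides, and the covered-fraction estimate uses the idealized Poisson (Lander-Waterman) model, which ignores repeats and cloning bias.

```python
import math

def reads_for_coverage(coverage, genome_bp, read_bp):
    """Reads needed so that total sequenced bases equal coverage * genome size."""
    return round(coverage * genome_bp / read_bp)

def fraction_covered(coverage):
    """Expected fraction of bases hit by at least one read (idealized Poisson model)."""
    return 1.0 - math.exp(-coverage)

# Assumed values, not taken from the slides.
GENOME_BP = 2_300_000_000   # approximate maize genome size
READ_BP = 700               # approximate usable Sanger read length

n_reads = reads_for_coverage(0.75, GENOME_BP, READ_BP)
print(f"{n_reads:,} reads for 0.75X")
print(f"{fraction_covered(0.75):.0%} of bases expected to be sampled at least once")
```

Under these assumptions the run is on the order of 2.5 million reads; changing either assumed constant rescales the answer linearly.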
Alignment of Mo17 “gene space” with B73 allele: ~97% identity
Mo17       1  AACCAATTGGCAGCATTATTATTTTGAACAGATAAAAATCACGCCAGGGCGATGGATACT  60
B73    88023  ..............C.........C...................................  88082

Query     61  CAGCTCAATCACGGAATTCATCCATGAACTTCTCGTGGAACTCCTTGAGCCTGGATACTA  120
Sbjct  88083  ............................................................  88142

Query    121  TCGCAGGTATCTTGTCCTCCTGCGGCAGTATCGTGCACCTGAAGTGCCACGTTCCAGGGA  180
Sbjct  88143  ............................................................  88202

Query    181  CCTTCA--------CG--G-T--G-T-C-GC-AAAGCAACGTGTCAGTATCGTGTGCATC  223
Sbjct  88203  ......CGGTGTCG..AA.T.AA.A.C.A..A................G...........  88262

Query    224  TGAAGCTTAACGATGCTTTGAAACGGCAGGGACTTCCACaaaaaaaGG-CTTTTGAGATT  282
Sbjct  88263  .............................................G..G...........  88322

Query    283  ACCCACCTGTCCAAACCCAGAACCGGGGACGACGACGATTCCAGTGGCTTCCAGTAGGCG  342
Sbjct  88323  ............................................................  88382

Query    343  TTTTGCGTAGTATGCATCTGGCGCAGTGCCGACTGCTTGGGCAGCTCCAATTGCCTTCTG  402
Sbjct  88383  ..........................................T.................  88442

Query    403  GGGTAAATGAAGGCGTGGGAACAGATACATTGCACCTTCGGCTTTGTTGCATGTAATTCC  462
Sbjct  88443  ............................................................  88502

Query    463  TTCTAAACTGTTGAATGCTTCTTCCAAAGCCTGTGACAGAAGAACACGTAACAATAAGAA  522
Sbjct  88503  ............................................................  88562

Query    523  GGTGCTTATAAGATTCAGGaaaaaaaa--TCTTTTTTAAAGTTGTTTTGCATATGTTAAC  580
Sbjct  88563  ...........................GA...............................  88622

Query    581  GGACTACTCGACCAGGGGTATAGCTTTTATTCTTGTTTGATATTTCCATATTAGGACTCT  640
Sbjct  88623  ..........G.................................................  88682
• In unique “genic” regions (especially coding sequence), can easily align
Mo17 and B73 to detect polymorphism.
• Cf. comparable human-chimp alignments at ~98.5%
(putative aminotransferase, Morgante et al.)
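As a rough illustration of how an identity figure can be read off a dot-format alignment like the one on this slide, the sketch below counts matches, mismatches and gap columns for one pair of rows. The function name and the choice to count identity over all columns (gaps included) are assumptions made for illustration; the example rows encode the first 60-column block, with its two substitutions placed at columns 15 and 25 as read from the slide.

```python
def alignment_stats(query, subject_dots):
    """Count (matches, mismatches, gap columns) for one dot-format row pair.

    In the subject row, '.' means "same base as the query";
    '-' in either row marks a gap column.
    """
    if len(query) != len(subject_dots):
        raise ValueError("alignment rows must have equal length")
    matches = mismatches = gaps = 0
    for q, s in zip(query, subject_dots):
        if q == "-" or s == "-":
            gaps += 1
        elif s == "." or s.upper() == q.upper():
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps

# First block of the slide's alignment (Mo17 row, then B73 row in dot notation).
mo17 = "AACCAATTGGCAGCATTATTATTTTGAACAGATAAAAATCACGCCAGGGCGATGGATACT"
b73 = "." * 14 + "C" + "." * 9 + "C" + "." * 35

m, x, g = alignment_stats(mo17, b73)
identity = m / (m + x + g)
print(f"matches={m} mismatches={x} gaps={g} identity={identity:.1%}")
```

Summing the three counts over every block, rather than a single one, would give the identity for the whole aligned region.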
Likely outcomes of Plan B
• Align Mo17 shotgun to emerging B73 draft (at
quarterly intervals)
– Should be easy to recognize allelic variants in non-repetitive (i.e.,
genic) regions, based on Morgante et al. results. Expect unique
coverage of ~40% of B73 sequence. (alternative: MeF, C0t)
– In a typical genic locus of 5 kb, conservatively expect ~100
mismatches or indels. Such dense polymorphism allows rapid development of
multiple markers per gene. (Distribute via Gramene, NCBI)
– Repetitive regions within B73 are ~90-99% identical to one another, so
identifying “allelic” repeats will be difficult given ~97% identity between
alleles (Attempt to localize “sisters” of unique reads based on B73 map.)
– In places where both ends of a clone are alignable, can confirm
local colinearity of B73 and Mo17, or identify rearrangements and/or
deletions (à la human-chimp comparison, but expect worse)
– Mo17 fosmid clones with localized ends will be available for
distribution and/or targeted sequencing of loci-of-interest
– Potential start towards Mo17 WGS if desirable
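The paired-end check described above (both ends of a Mo17 clone placed on B73) could be sketched roughly as follows. Everything here is hypothetical: the `EndHit` record layout, the category names, and the 30-50 kb window assumed for fosmid inserts are illustrative choices, not part of the project plan.

```python
from dataclasses import dataclass

@dataclass
class EndHit:
    """Placement of one clone end on the reference (hypothetical record layout)."""
    seq_id: str   # reference sequence name, e.g. a B73 contig
    pos: int      # leftmost aligned reference coordinate
    strand: str   # '+' or '-'

def classify_pair(a, b, min_span=30_000, max_span=50_000):
    """Classify a clone whose two end reads both align uniquely to the reference.

    The default span window is an assumed fosmid insert range.
    """
    if a.seq_id != b.seq_id:
        return "different_sequence"
    left, right = sorted((a, b), key=lambda h: h.pos)
    # Ends of one clone should point toward each other: '+' on the left, '-' on the right.
    if not (left.strand == "+" and right.strand == "-"):
        return "orientation_conflict"
    span = right.pos - left.pos
    if span > max_span:
        return "stretched"    # reference has extra sequence: candidate deletion in Mo17
    if span < min_span:
        return "compressed"   # reference lacks sequence: candidate insertion in Mo17
    return "colinear"

print(classify_pair(EndHit("ctg1", 100_000, "+"), EndHit("ctg1", 138_000, "-")))
```

A single discordant clone is weak evidence; in practice one would look for several independent clones supporting the same category at a locus before calling a rearrangement.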
JGI Sorghum update
• Sorghum WGS currently at ~7X (in Trace Archive)
– mostly small insert plasmids sequenced to date
• BAC-end and fosmid-end sequences coming by end of 2006
– but uniformity of BAC library is in question, may limit assembly
• Quick and dirty assemblies look good using “skeleton” of
method proposed for maize
– ~13 kb contigs and ~300 kb scaffolds (N50 #’s) at ~5X
– considerable scaffolding even without much BAC/fosmid data
– recovering ~2/3 of genome is easy even setting aside “difficult”
repeats, as predicted for maize
– Expect full 8X assembly (with map integration) ready late Q1 2007.
• Quick and dirty annotation: ~42,000 genes in low copy
families
– plus >100K retrotransposon-ish genes even in easy-to-assemble
regions
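For readers unfamiliar with the N50 figures quoted above, the statistic can be computed as in the minimal sketch below. It uses the usual definition (the length L such that contigs or scaffolds of length at least L contain at least half of the assembled bases); the toy lengths in the example are invented and are not sorghum data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

# Toy example with invented lengths (in kb).
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because N50 is weighted by length, a few large scaffolds can raise it sharply even when most sequences are short, which is why contig and scaffold N50 are quoted separately.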
Early peek at Sorghum-rice comparison
shows syntenic segments
[Figure "4dtv Dist/RiceSorg": 4DTv distance (y-axis, 0 to 0.5) versus segment size, in loci per syntenic block (x-axis, 0 to 90)]
• Sorghum-Rice syntenic segments are of uniform molecular “age”
• Comparable to human-chicken divergence
• Younger than Rice-Rice paralogs (from cereal-specific duplication)
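The 4DTv distance plotted here is the fraction of fourfold-degenerate third-codon positions that differ by a transversion. The sketch below shows one simple way to compute the raw value for a pair of codon-aligned coding sequences; it is not the pipeline used for the figure, it applies no correction for multiple substitutions, and the function names are hypothetical.

```python
# Codon prefixes whose third position is fourfold degenerate in the standard code:
# Leu (CT), Val (GT), Ser (TC), Pro (CC), Thr (AC), Ala (GC), Arg (CG), Gly (GG).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transversion(x, y):
    """True when one base is a purine and the other a pyrimidine."""
    return (x in PURINES and y in PYRIMIDINES) or (x in PYRIMIDINES and y in PURINES)

def four_dtv(cds_a, cds_b):
    """Raw 4DTv for two in-frame, codon-aligned coding sequences of equal length."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be codon-aligned and of equal length")
    sites = transversions = 0
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        # Only codons sharing the same fourfold-degenerate prefix are comparable.
        if codon_a[:2] != codon_b[:2] or codon_a[:2] not in FOURFOLD_PREFIXES:
            continue
        if codon_a[2] not in "ACGT" or codon_b[2] not in "ACGT":
            continue  # skip gaps and ambiguity codes
        sites += 1
        if is_transversion(codon_a[2], codon_b[2]):
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable fourfold-degenerate sites")
    return transversions / sites

# Toy pair: three comparable sites, two of which differ by a transversion.
print(four_dtv("GCTGGACCCATG", "GCAGGACCGATG"))
```

For a syntenic block, one would typically pool sites across all gene pairs in the block before taking the ratio, so that short genes do not dominate the estimate.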
Maize divergences (transversions)
Maize: 7,960 complete/29,922 partial peptides
Sorghum: 5,927 complete/19,681 peptides
Sugarcane: 6,566 complete/21,850 peptides
~16,000 gene families at base of grasses
~12,000 families defined by rice/Arabidopsis/poplar
[Figure: tree; surviving tip labels are sugarcane, sorghum, rice, Arabidopsis]