Download Sea Urchin Genome

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Replisome wikipedia , lookup

Comparative genomic hybridization wikipedia , lookup

Molecular cloning wikipedia , lookup

Community fingerprinting wikipedia , lookup

Nucleic acid analogue wikipedia , lookup

Cre-Lox recombination wikipedia , lookup

Deoxyribozyme wikipedia , lookup

RNA-Seq wikipedia , lookup

DNA sequencing wikipedia , lookup

Bisulfite sequencing wikipedia , lookup

Endogenous retrovirus wikipedia , lookup

Non-coding DNA wikipedia , lookup

Exome sequencing wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Molecular evolution wikipedia , lookup

Genome evolution wikipedia , lookup

Whole genome sequencing wikipedia , lookup

Transcript
Sea Urchin Genome
Origin
• The sea urchins (Class
Echinoidea) are one of five
extant classes within the
phylum Echinodermata. S.
purpuratus belongs to a
complex of 10 species that
arose in the North Pacific
during a recent burst of
speciation and ecological
diversification about 15-20
Mya (1-3).
Deuterostome
• The sea urchin genome currently provides
the only outgroup within the
deuterostomes for comparison with
chordate genomes, and thus, for defining
deuterostome-specific, chordate-specific,
and pan-bilaterian genes.
Bilateria
• The Bilateria include the Deuterostomia
and other superclades: the Ecdysozoa
(insects, nematodes and many lesser
groups); the assemblage known as the
Lophotrochozoa (mollusks, annelids,
brachiopods and many other groups); and
the basal acoel flatworms
Phylogeny
Radially symmetric Adult: Body
Plan
• As in all deuterostomes, the adult sea
urchin has mesodermal sheets covering
both gut and inner body wall; it has a gut
divided into pharynx, esophagus, stomach
and intestine; it has a large population of
wandering immune effector cells, many
macrophage-like but of other forms as
well, within its coelom; it has an anus,
gonopores, and a mouth with (five) teeth.
Anterior-posterior Axis
• the expression patterns of posterior hox
genes, indicate that the anterior end of the
sea urchin is the oral, ventral surface on
which it crawls, and the anal or posterior
end of the body plan is the uppermost,
dorsal surface.
Bilateral Larvae
• The bilaterally symmetric embryo is
more easily understood; its
organization is literally transparent.
The S. purpuratus embryo/larva,
complete three days after fertilization,
consists of only about 15 cell types,
including neurons, skeletogenic cells,
a few muscle cells, gut cells, and
some mesenchymal mesoderm cells,
all arranged in structures of single-cell
thickness.
Metamorphosis
• On metamorphosis the larva settles down,
most larval tissues are resorbed, and the
juvenile emerges. Afterwards, the
digestive tract is re-grown in a 4 new
position defining the adult dorso-ventral
axis, mouth down. The pentameral oral
structures and the water vascular system
of the adult develop within the rudiment
before metamorphosis.
Intraspecific Variation
• Intraspecific variation in the unselected
regions of the DNA sequence of this
species was measured at 4-5% (compare
to human at about 0.5%, let alone inbred
strains of mice).
• Polymorphism posed enormous problems
for the sequence assembly, and required a
new strategy.
WGS (Whole Genome Shotgun
Sequencing)
• When the WGS reads reached 3x
coverage, an initial assembly was created
(v 0.1; 9/2003).
Assembling WGS
• produce enough WGS sequence for 6-fold
sequence coverage of the genome
• In the case of the sea urchin, it was
already known that single-copy DNA
varied as much as 4-5%
Atlas Genome Assembly
http://www.genome.org/content/vol14/issue4/images/large/74731-33f1_4t_rev1.jpeg
Atlas Genome Assembly
Steps in the Atlas Assembly System. (1) Trim off vector and low-quality portions of
reads. (2) Count k-mers in WGS reads, saving the overall distribution plus specific
counts for k-mers with copy number above a threshold. (3) Align BAC and WGS
reads sharing rare k-mers and save overlap edges for high-quality end-to-end
alignments. (4) Enrich each BAC read set with overlapping WGS reads and their
mates, assemble using Phrap, scaffold and check for consistent assembly. (5)
Arrange BACs into contiguous sets (bactigs) and flag problem BACs for closer
quality checking. (6) Assemble bactigs in waves designed to limit the number of
BACs that are input to Phrap. (7) Treating each bactig scaffold as a unit, rescaffold
to produce superbactigs, detecting problem joins and missed merges between
bactigs. (8) Link superbactigs together into ultrabactigs based on remaining
(single) mate-pair links, fingerprint contigs, markers and synteny with human and
mouse genomes. (9) Format chromosome files with contigs separated by strings
of Ns representing gaps. Quality-control feedback steps include (10) examining
coassembly scores of problem BACs and removing foreign trays of reads; (11)
resolving superbactig conflicts by modifying bactigs and possibly flagging BACs
for closer checking; and (12) resolving ultrabactig and mapping conflicts in
collaboration with research groups that generated FPC and marker information.
http://www.genome.org/content/vol14/issue4/images/large/74731-33f1_4t_rev1.jpeg
Assembling WGS
• Comparison of the assembled contigs and
scaffolds to high quality assemblies of 25
BAC clones allowed for characterization of
regions of the genome that appeared
difficult to assemble (e.g., regions of
dissimilarity between haplotypes) and to
develop tools to identify such regions and
merge them (e.g., Polyjoin).
– simple altering of mismatch parameters was
not sufficient
SNPs
• Once all the sequence was available and
assembled, a comparison of all WGS
reads within each region of the 3-fold
WGS Assembly revealed at least one
single nucleotide polymorphism (SNP) per
100 bases, and a comparable frequency of
insertion/deletion (indel) changes.
SNPs
• With an 800-base sequencing read the
assembler was faced with an average
minimum of 16 mismatches per read, or 1
mismatch every 50 bases, in a region that
was correctly assembled.
Final WGS
• The final WGS assembly including the full
6-fold sequence coverage was submitted
to GenBank (accession number
AAGJ01000000; 4/2005).
BAC fingerprint map generation
• Concurrent with the WGS sequencing
effort, the BAC library was fingerprinted
using Eco RI at the Genome Sciences
Centre in Vancouver, Canada
(www.bcgsc.ca/lab/mapping/).
BAC fingerprint map generation
• The 85,820 fingerprinted clones (16.8 fold
clone coverage of an 800Mb genome)
were analyzed for shared restriction
fragments and found to form 5,788 contigs
leaving 17,378 singletons.
• The generated minimal tiling path of 8,248
clones suggested a genome size of
approximately 1,015Mb
Clone-array pooled shotgun
sequencing (CAPSS) of BAC
clones
• First application
• Previously, individual DNA preparations
and shotgun libraries were made for each
BAC clone in the tiling path
Clone-array pooled shotgun
sequencing (CAPSS) of BAC
clones
• the CAPSS strategy, the minimal tiling
path clones were separated into groups of
576 non-overlapping clones.
• Sequenced and assembled with Atlas
Summary
Summary
CAPPS
Probability of a Base Not
Sequenced
Probability of a Base Not
Sequenced
Gap Length
High Density Picoliter Reactors
a, Genomic DNA is isolated, fragmented, ligated
to adapters and separated into single strands (top
left). Fragments are bound to beads under
conditions that favour one fragment per bead, the
beads are captured in the droplets of a PCRreaction-mixture-in-oil emulsion and PCR
amplification occurs within each droplet, resulting
in beads each carrying ten million copies of a
unique DNA template (top right). The emulsion is
broken, the DNA strands are denatured, and
beads carrying single-stranded DNA clones are
deposited into wells of a fibre-optic slide (bottom
right). Smaller beads carrying immobilized
enzymes required for pyrophosphate sequencing
are deposited into each well (bottom left). b,
Microscope photograph of emulsion showing
droplets containing a bead and empty droplets.
The thin arrow points to a 28-m bead; the thick
arrow points to an approximately 100-m droplet.
c, Scanning electron micrograph of a portion of a
fibre-optic slide, showing fibre-optic cladding and
wells before bead deposition.
Sequencing
The sequencing instrument consists of the
following major subsystems: a fluidic
assembly (a), a flow chamber that includes
the well-containing fibre-optic slide (b), a
CCD camera-based imaging assembly (c),
and a computer that provides the
necessary user interface and instrument
control.
Nucleotide incorporation is
detected by the associated
release of inorganic
pyrophosphate and the
generation of photons5, 12
Pyrosequencing
•
•
•
•
•
A PCR-amplified ssDNA template is hybridized to a sequencing primer and
incubated with the enzymes DNA polymerase, ATP sulfurylase, luciferase
and apyrase, and with the substrates adenosine 5´ phosphosulfate (APS)
and luciferin.
The addition of one of the four deoxynucleotide triphosphates (dNTPs)
initiates the second step. DNA polymerase incorporate complementary
dNTPs onto the template. This incorporation releases pyrophosphate (PPi)
stoichiometrically.
ATP sulfurylase quantitatively converts PPi to ATP in the presence of
adenosine 5´ phosphosulfate. This ATP drives the luciferase-mediated
conversion of luciferin to oxyluciferin that generates visible light in
amounts that are proportional to the amount of ATP. The light produced in
the luciferase-catalyzed reaction is detected by a charge coupled device
(CCD) camera and this can be analyzed in a program. Each light signal is
proportional to the number of nucleotides incorporated.
Nucleotide degradation by apyrase removes dNTPs from the solution.
New nucleotides can be added and a new cycle can start.
Pyrosequencing
reaction