Supplementary reading materials on Genome sequencing (optional)
The materials are from Mark Blaxter’s lecture notes on Sequencing strategies and Primary Analysis
1. A brief overview of sequencing biochemistry
Modern DNA sequencing uses primer-directed extension of a DNA strand from a single-stranded template
using a DNA polymerase. Primers are 18-25 bases in length. Either temperate or thermostable polymerases can
be used, but thermostable polymerases (Taq polymerase and related enzymes) are the norm.
Most sequencing now uses the dideoxy termination system. While the DNA polymerase will add a
dideoxynucleotide complementary to the template strand, it cannot further extend that product after the addition
of a dideoxynucleotide. This biochemistry is used to produce populations of products specifically terminated at
either A, G, C or T residues. These are labeled in some way and visualised after separation by electrophoresis.
One method for labelling is to use radioactive isotopes (32P, 33P or 35S) to label the oligonucleotide
primer. Four reactions are performed (one each for A, G, C and T) and electrophoresed side by side in a
denaturing polyacrylamide gel. The products are separated by size at single-base resolution and the sequence is read
from the pattern of bands on the gel. Labelling of primers is a time-consuming step.
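The four-lane readout described above can be illustrated with a toy model. This is a minimal sketch, not a description of any real instrument: it assumes every position of the newly synthesised strand yields one terminated product, ignores the primer's own length, and uses hypothetical function names.

```python
# Toy model of four-lane dideoxy sequencing (illustrative only).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def synthesised_strand(template):
    """Strand made by the polymerase, written 5'->3' (reverse complement of the template)."""
    return "".join(COMPLEMENT[base] for base in reversed(template))

def termination_lanes(template):
    """For each of the A, C, G and T reactions, list the lengths of the terminated products."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesised_strand(template), start=1):
        # A product of this length ends in a dideoxy version of `base`,
        # so it appears only in the reaction (lane) for that base.
        lanes[base].append(length)
    return lanes

def read_gel(lanes):
    """Read the sequence from the band pattern, shortest product first."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)

lanes = termination_lanes("ATGC")
print(lanes)
print(read_gel(lanes))
```

Reading the bands from shortest to longest across the four lanes recovers the synthesised strand, which is the complement of the template.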
Alternatively, a radiolabelled nucleotide triphosphate (usually 35S dCTP) can be added to reactions
performed with unlabelled primer, and the products run as before. This method was the one most widely used
before the introduction of automated sequencers. Sequence read lengths from radioactive gels were typically
200-450 bases from four or eight lanes (the same reactions are often electrophoresed twice, once for four hours
to resolve long fragments, and once for two hours to resolve short ones).
To allow automated, nonradioactive sequencing, dye-labelled sequencing was devised. This method uses a
set of rhodamine-based fluorescent dyes that are detected after excitation with a laser. The sequencing can
proceed as for radioactive labelling, with dye-labelled primer, or dye-labelled nucleotide in a dideoxyterminator
reaction. These are run (four lanes per sample) on an acrylamide gel, and the fluorescence read by scanning
with an infrared laser as the DNA products migrate past a particular point on the gel.
The availability of multiple dyes with different emission spectra led to the development of the four-dye
one-lane system. Four aliquots of primer end-labelled with the four different dyes are used to perform the
A, G, C and T reactions. These are pooled and run in a single lane of a gel. The sequenator reads the gel by using
a spectrophotometer to distinguish between the different dye primers, and thus the different bases.
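How a four-dye, one-lane readout turns into bases can be sketched as follows. This is an illustrative simplification, not the algorithm of any particular sequencer: it assumes the signal has already been resolved into one intensity per base-specific dye at each band, and the `min_ratio` threshold is a hypothetical parameter.

```python
def call_bases(peaks, min_ratio=1.5):
    """Call one base per band from four-channel fluorescence intensities.

    peaks: list of dicts mapping a base letter to the intensity of that
    base's dye at one band, in order of migration.
    A band whose strongest signal is not at least `min_ratio` times the
    runner-up is reported as 'N' (ambiguous).
    """
    called = []
    for peak in peaks:
        ranked = sorted(peak.items(), key=lambda item: item[1], reverse=True)
        (best_base, best), (_, second) = ranked[0], ranked[1]
        called.append(best_base if best >= min_ratio * second else "N")
    return "".join(called)

example = [
    {"A": 90, "C": 5, "G": 3, "T": 2},   # clear A
    {"A": 4, "C": 80, "G": 10, "T": 6},  # clear C
    {"A": 40, "C": 42, "G": 1, "T": 2},  # A and C too close to call
]
print(call_bases(example))
```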
This system has been further improved by the development of dye-labelled terminators
(dideoxynucleotides) that will simultaneously terminate and fluorescently tag a product. These reactions can be
performed in a single tube, and run in a single lane. Currently, the four-dye systems can routinely read >600
bases/lane, and the four-lane one-dye systems can read over 1 kb per reaction.
There is continuing effort to improve both the machines for running sequence and the chemistry of the
reactions. Brighter dyes that give better resolution between emission spectra and more even incorporation
have been developed. In terms of instrumentation, it is now possible to perform the electrophoresis and
detection in a capillary tube system, resulting in much improved throughput (current machines can each do 96 reads
of >400 bases every 3 hours, at a cost of about £300,000 per machine).
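The throughput figure quoted above can be turned into a rough daily yield. The sketch below uses only the numbers in the text (96 reads, at least 400 bases each, every 3 hours). The assumption that a machine runs around the clock with no downtime is mine, so the result is an optimistic lower bound on read length combined with an upper bound on run count.

```python
READS_PER_RUN = 96        # from the text
MIN_BASES_PER_READ = 400  # from the text (">400 bases")
HOURS_PER_RUN = 3         # from the text

def bases_per_day(hours_running=24):
    """Raw bases produced by one machine, assuming back-to-back runs."""
    runs = hours_running // HOURS_PER_RUN
    return runs * READS_PER_RUN * MIN_BASES_PER_READ

print(bases_per_day())  # 8 runs x 96 reads x 400 bases = 307200
```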
Some regions of DNA are difficult to sequence due to the intrinsic properties of the DNA. This can be
compositional bias (AT versus GC content), homopolymeric runs (long stretches of a single nucleotide) or the
presence of heat-stable hairpin-forming sequences that prevent or impede the passage of the polymerase. To
sequence such regions it is possible to try different methods (dye primer versus dye terminator), to use
nucleotide analogues (inosine instead of guanosine) and to add modifiers to the reaction mix (such as
dimethylsulphoxide).
2. Strategies for sequencing
For smaller pieces of DNA (individual clones, small viruses, plasmids) it is possible to sequence them to
completion by primer walking strategies. A start point is made using a primer to a region of known sequence. A
sequencing reaction is performed and the new sequence is used to design a new primer further along the
molecule. This primer is used for sequencing and the process repeated until the molecule has been completely
sequenced on both strands. The problems associated with this strategy are that
it is slow (400 bases at a time, and 2-6 days between sequencing events) and
it is expensive (as so many different new primers are needed).
Despite these limitations, this is the standard way used by most non-genome labs for sequencing fragments
longer than 400 bases.
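The walking procedure can be sketched as a small simulation. This is a simplified illustration under stated assumptions: only one strand is walked, each read yields exactly `read_length` bases starting just past the primer, and each new primer is taken to sit at the very end of the previous read. The function name and parameters are hypothetical.

```python
def primer_walk(molecule, first_primer, read_length=400):
    """Simulate primer walking along one strand.

    Returns (assembled_sequence, number_of_sequencing_reactions).
    """
    start = molecule.find(first_primer)
    if start < 0:
        raise ValueError("primer does not anneal to the molecule")
    position = start + len(first_primer)  # first base read lies just past the primer
    assembled = first_primer
    reactions = 0
    while position < len(molecule):
        read = molecule[position:position + read_length]
        reactions += 1
        assembled += read
        # The next primer is designed from the end of the sequence just read,
        # so the following read begins where this one stopped.
        position += len(read)
    return assembled, reactions

molecule = "ACGTTGCA" * 125  # a 1000-base stand-in for a cloned insert
sequence, reactions = primer_walk(molecule, molecule[:20])
print(len(sequence), reactions)
```

Because each reaction depends on the sequence from the previous one, the reactions cannot be run in parallel, which is why the delay between sequencing events dominates the total time.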
Alternative methods devised for sequencing such small pieces of DNA involve generating nested deletions
of the cloned DNA fragment, recloning these and sequencing the resultant population of clones from a
universal primer site in the cloning vector. If a set of nested deletions is made, each 400 bases shorter than the
previous one, a sequencing walk can be undertaken using only one primer. There are several ways of making
deletions, and restriction enzymes and non-specific exonucleases can be used. This method is slow because it
involves the step of making the deletion clones, which can be tricky.
Primer walking for sequencing:
1 use a known primer to get the first sequence
2 use the new sequence to design a new primer, and repeat
3 contiguate the sequences (from both strands)
Physical mapping
A physical map is a set of cloned DNA fragments whose position relative to each other in the genome is
known. The complete DNA sequence of a gene or genome is the ultimate physical map. However, it is useful to
construct intermediate level physical maps from cloned fragments: these cloned fragments can subsequently be
used for sequencing or other manipulations.
A large genome (say a bacterial genome of 3 million base pairs, or 3 Mb) can be subcloned into a lambda
phage vector, capable of carrying between 15 and 20 kb. Thus the minimum number of clones required to cover
the genome will be 150 to 200, if there is no overlap. The clones in such a library can be compared to each other, and
those that overlap aligned and placed in position on the chromosome relative to each other. The sets of clones,
called contigs or contiguated clones, can then be checked for stability, exact representation of the starting
genome, etc.
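The minimum clone count with no overlap is simply the genome size divided by the insert size, rounded up. A short check of that arithmetic for a 3 Mb genome and 15-20 kb inserts (the function name is hypothetical):

```python
import math

def minimum_clones(genome_size_bp, insert_size_bp):
    """Fewest clones that could tile the genome end to end with no overlap."""
    return math.ceil(genome_size_bp / insert_size_bp)

genome = 3_000_000  # 3 Mb bacterial genome, as in the text
for insert in (15_000, 20_000):
    print(insert, minimum_clones(genome, insert))
```

A real library needs many more clones than this, since randomly picked clones overlap and some regions are missed by chance.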
It is usual to use large-insert vectors for physical mapping. Thus lambda phage (up to 20 kb), cosmid (<35 kb),
bacterial artificial chromosome or BAC (<150 kb) and yeast artificial chromosome or YAC (<3 Mb) vectors
are routinely used. For smaller sequencing projects it may be viable to use plasmid vectors.
Figure: A genomic library. A genome is cut into pieces and cloned as a library in a vector (red).
Building a physical map
The clones are ordered by either hybridisation, fingerprinting or end-sequencing.
Hybridisation methods use labelled probes to detect clones that share sequence. Probes can be generated
from each end of the clone by "end rescue" or DNA fragments isolated in other ways can be used (for example,
cDNA clones). One problem with the hybridisation method is that in the presence of a significant repeat
content in the genome, some probes/clone ends will fail to provide a unique link to the next segment of the
genome. This method was used to generate a map of the Schizosaccharomyces pombe genome.
Hybridisation mapping:
1 pick clones into a grid
2 hybridise to probe 1
3 hybridise to probe 2
4 build contigs
In this case, two clones hybridised to both probes and thus they are predicted to overlap. Those hybridising to
only one probe are predicted to extend out to the left or right.
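The contig-building step can be sketched as grouping clones that are linked, directly or through other clones, by a shared probe. This is a minimal illustration with hypothetical clone and probe names. It only groups clones into contigs and does not order them, and it assumes every probe is unique in the genome (the repeat problem mentioned above would break that assumption).

```python
from collections import defaultdict

def build_contigs(hybridisation):
    """Group clones into contigs.

    hybridisation: dict mapping clone name -> set of probes it hybridised to.
    Two clones that share a probe are predicted to overlap; a contig is a set
    of clones connected through such overlaps.
    """
    clones_by_probe = defaultdict(set)
    for clone, probes in hybridisation.items():
        for probe in probes:
            clones_by_probe[probe].add(clone)

    neighbours = defaultdict(set)
    for clones in clones_by_probe.values():
        for clone in clones:
            neighbours[clone] |= clones - {clone}

    contigs, seen = [], set()
    for clone in sorted(hybridisation):
        if clone in seen:
            continue
        contig, stack = set(), [clone]
        while stack:  # depth-first walk over overlapping clones
            current = stack.pop()
            if current in contig:
                continue
            contig.add(current)
            stack.extend(neighbours[current] - contig)
        seen |= contig
        contigs.append(sorted(contig))
    return contigs

grid = {
    "clone1": {"probe1"},
    "clone2": {"probe1", "probe2"},
    "clone3": {"probe1", "probe2"},
    "clone4": {"probe2"},
    "clone5": set(),  # hybridised to neither probe
}
print(build_contigs(grid))
```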
Fingerprinting methods rely on the presence of unique restriction sites (based on unique sequence) in
segments of DNA shared by two overlapping clones. DNA is prepared from clones and digested with one or
two restriction enzymes to generate a set of subfragments. These are analysed on high-resolution gels, and the
"fingerprint" pattern of bands used to identify the clone. Detection of bands can be by radioactive labelling of
each one, or by staining with sensitive DNA-detection dyes and visualisation using fluorescence readers.
Computer algorithms are used to compare fingerprints from different clones and define overlaps. The C.
elegans genome was physically mapped in this way using a cosmid clone library of 17,000 clones and a
two-enzyme digest.
Fingerprint mapping:
1 digest clones with restriction enzyme and label, electrophorese on gel (V = vector bands present in all clones)
2 determine overlap by shared patterns of bands
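Comparing two fingerprints comes down to counting bands of matching size, after setting aside the vector bands that every clone shares. The sketch below is a deliberately simple greedy matcher, not the algorithm used in any published mapping project. The tolerance, the minimum number of shared bands, and all names are hypothetical choices.

```python
def shared_bands(fp_a, fp_b, tolerance=2):
    """Count bands in fp_a that match a distinct band in fp_b within `tolerance` bp."""
    remaining = sorted(fp_b)
    shared = 0
    for band in sorted(fp_a):
        for index, other in enumerate(remaining):
            if abs(band - other) <= tolerance:
                shared += 1
                del remaining[index]  # each band may be matched only once
                break
    return shared

def likely_overlap(fp_a, fp_b, vector_bands=(), min_shared=3, tolerance=2):
    """Predict overlap from shared insert bands, ignoring vector bands."""
    def insert_only(fingerprint):
        return [band for band in fingerprint
                if all(abs(band - v) > tolerance for v in vector_bands)]
    return shared_bands(insert_only(fp_a), insert_only(fp_b), tolerance) >= min_shared

clone_a = [300, 450, 800, 1200, 2000]  # fragment sizes in bp; 2000 is the vector band
clone_b = [301, 450, 799, 1500, 2000]
clone_c = [100, 200, 650, 2000]
print(likely_overlap(clone_a, clone_b, vector_bands=[2000]))
print(likely_overlap(clone_a, clone_c, vector_bands=[2000]))
```

Excluding the vector bands matters: without that step every pair of clones would appear to share at least one band.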
In completing a physical map, it is often essential to use more than one library, and more than one cloning
system. In random sampling from the library, it is possible that certain segments of the genome are not
represented and others overrepresented. This stochastic selection will result in a physical map with gaps. The
gaps can be crossed by using directed approaches using hybridization selection of "bridging" clones. However,
not all DNA is equally easily cloned. Bacteria, for example, tend to "dislike" highly repetitive sequences, and
thus repetitive DNA will be underrepresented in a bacterial clone library. To overcome this differential
representation problem, several solutions have been used. Vectors that have lower copy number per cell tend to
yield libraries with better representation (as it is less likely that a "poisonous" sequence will kill the cell, or that a
repetitive sequence will find a partner with which to recombine and be deleted). Alternatively, different cloning hosts (bacteria
versus yeast in general) have different properties, and it is often possible to recover "unclonable" DNA from an
alternative host. Yeast, for example, is able to maintain AT rich DNA more effectively than E. coli.
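The effect of random sampling on representation can be estimated with the standard Clarke and Carbon library-size formula, N = ln(1 - P) / ln(1 - f), where f is the fraction of the genome carried by one clone and P is the desired probability that a given segment is present. This formula is not part of the original notes and is offered only as an illustration. It assumes all DNA is cloned with equal efficiency, which the paragraph above explains is not true in practice.

```python
import math

def clones_for_coverage(genome_size_bp, insert_size_bp, probability):
    """Clones needed so a given segment is in the library with the given probability."""
    fraction = insert_size_bp / genome_size_bp
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))

def chance_segment_missing(genome_size_bp, insert_size_bp, clones):
    """Probability that a given segment is absent from a library of `clones` clones."""
    fraction = insert_size_bp / genome_size_bp
    return (1 - fraction) ** clones

needed = clones_for_coverage(3_000_000, 20_000, 0.99)
print(needed, chance_segment_missing(3_000_000, 20_000, needed))
```

For a 3 Mb genome and 20 kb inserts, the number of clones needed for 99% confidence is several times the no-overlap minimum, and even then some segments are expected to be missing, which is why gaps remain in the map.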
A portion of the C. elegans physical map.
The longer lines at the top are YAC clones (their names start with a Y). The shorter items below are
cosmid clones. The bold YACs are ones used in mapping cDNAs to the genome. The yellow boxed cosmids
are those sequenced. Cosmids with a following * are ones that are canonical for a set of smaller cosmids, that
are not displayed. The yellow bar at the bottom represents the sequenced DNA. The triangles indicate points of
transposon insertion in strains of C. elegans.