Recombinant DNA Technology
Isolating DNA
• Chemically, DNA is a very simple compound, with little variation between species. It is also chemically quite different from other macromolecules (proteins, carbohydrates, lipids).
• The basic steps:
1. Break the cells open.
2. Remove proteins and other macromolecules.
3. Concentrate the DNA by precipitating it and resuspending it in fresh buffer.
• Methods for breaking the cells open vary between tissues, but usually involve mechanical disruption in a buffer that inhibits DNases.
– Buccal swabs (cells from inside the cheek) are usually extracted by simply incubating them in a buffer containing detergent (to disrupt cell membranes) and a powerful protease (to destroy DNase enzymes).
– Mechanical homogenizers (including kitchen blenders) work well for fibrous tissues. Some homogenizers are very small, to extract DNA from a hair follicle, for example.
– Some tissues can be frozen in liquid nitrogen and ground up mechanically.
DNA Separation
• Separating DNA from other cell components relies on the fact that DNA is chemically different from proteins and other macromolecules.
– If the cell extract is mixed with a phenol/chloroform solution, most of the proteins and other cellular junk go into the phenol/chloroform layer while the DNA and RNA stay in the aqueous phase (phenol/chloroform extraction).
– Phenol is very nasty, and many methods have been invented to get around this step.
• DNA can be precipitated and then resuspended in a smaller volume using a high salt concentration plus ethanol (ethanol precipitation).
• DNA is usually resuspended in a buffer containing EDTA, which chelates (removes from solution) Mg2+ ions. This is useful because most DNA-degrading enzymes use Mg2+ ions as a co-factor.
Transfection
• How to get DNA back into the cell after
manipulation in vitro.
– Transfection means the cell takes up naked DNA
from the environment and incorporates it into its
genome.
– In non-mammalian cells, this process is called
transformation. “Transfection” is used for mammals
because transformation also means going from normal
to cancerous.
• Need to get the DNA through the cell
membrane and into the nucleus.
• Transfection can be transient or stable.
– Transient transfection introduces DNA that is NOT
incorporated into the genome. After 2-3 days it gets
degraded and lost.
– Stable transfection causes the transfected DNA to be
incorporated into the genome, where it remains
permanently. In most cases, the transfected DNA is
incorporated into a random chromosomal location.
More Transfection
• Several transient transfection methods are available:
– Lipofection: coating the DNA in lipid vesicles that fuse with the cell
membrane. This is very efficient and common with mammalian cells.
– Electroporation: if you subject a cell to a high voltage electrical field (say
20,000 volts per centimeter), temporary holes appear in the membrane that
DNA can go through. The holes disappear quickly once the voltage
disappears.
– Gene gun: Tiny gold particles are coated with DNA, then blasted through the
cell wall using high pressure gas. One method uses blank .22 rifle cartridges
to generate the pressure.
• Stable transfection with viral vectors. The virus carries the DNA
into the cell more efficiently than any other method.
– You need to select the rare cells that have incorporated the DNA using a drug
resistance gene as part of the transformation vector. The only cells that
survive treatment with the drug have incorporated the foreign DNA.
– Safety can be a problem: these vectors are derived from pathogenic viruses
and some generate strong immune responses.
– Also, random insertion of DNA into the genome can lead to mutations,
including the induction of cancer.
Viral Transfection
• The process:
– Genetic engineering,
done using E. coli. This
results in your
engineered DNA (the
transgene) inserted into
a plasmid vector
– Plasmid is transfected
into a special cell line
(packaging cell line) to
add the viral coat.
– The viruses produced
by the packaging line
can be used to transfect
other cell lines.
Electrophoresis
• Electrophoresis is the separation of charged molecules in an electric field.
• Nucleic acids have 1 charged phosphate (- charge) per nucleotide. This implies a constant charge to mass ratio. Thus, separation is based almost entirely on length: longer molecules move slower.
• Done in a gel matrix to stabilize: agarose or acrylamide.
• Average run: 100 volts across a 10 cm gel, run for 2 hours.
• Stain with ethidium bromide: intercalates between DNA bases and fluoresces orange with UV light.
• Run alongside standards of known sizes to get lengths.
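Reading a length off the standards can be sketched in code. This is only an illustration: it assumes that log10(length) falls roughly linearly with migration distance between neighboring standards, and the ladder values and the function name `estimate_size_bp` are made up for the example.

```python
import math

def estimate_size_bp(distance_cm, ladder):
    """Estimate fragment length (bp) from how far the band migrated.

    ladder: list of (size_bp, distance_cm) pairs for the size standards.
    Assumption: log10(size) changes linearly with distance between
    two neighboring standards.
    """
    points = sorted(ladder, key=lambda p: p[1])  # order by distance
    for (s1, d1), (s2, d2) in zip(points, points[1:]):
        if d1 <= distance_cm <= d2:
            frac = (distance_cm - d1) / (d2 - d1)
            log_size = math.log10(s1) + frac * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size
    raise ValueError("band lies outside the range of the standards")

# Hypothetical ladder: longer molecules move slower, so they travel less far.
LADDER = [(10000, 1.0), (1000, 4.0), (100, 7.0)]

print(round(estimate_size_bp(5.5, LADDER)))  # a band between the 1000 and 100 bp standards
```

A band outside the ladder raises an error rather than extrapolating, since the log-linear assumption is least reliable at the extremes.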
Restriction Enzymes
• One of the easiest ways to characterize DNA is to determine the positions of restriction sites: sequences that are cut by restriction enzymes (more formally, restriction endonucleases).
• Restriction enzymes are part of the defense systems used by bacteria against foreign DNA.
– Foreign DNA entering the cell is cut by the REs, but host DNA is modified so it can’t be cut.
• Restriction enzymes cut at 4-8 bp sequences that are usually inverted repeats: GTCGAC, for example.
• Hundreds of different REs, cutting at different sites, are available.
• Restriction sites are in fixed positions on the DNA. Digestion with single enzymes or with pairs of enzymes gives bands of fixed size on electrophoresis gels. These sizes can be put together to make a map of a DNA molecule.
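A small sketch of these ideas: checking that a site is an inverted repeat, locating sites, and listing the fragment sizes from digesting a linear molecule. The test sequence and function names are invented for illustration. The cut position (one base into GAATTC, as EcoRI does) is passed in as a parameter.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def is_inverted_repeat(site):
    """True if the site reads the same on both strands (5'->3')."""
    return site == site.translate(COMPLEMENT)[::-1]

def find_sites(seq, site):
    """0-based start positions of every occurrence of site in seq."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

def digest_lengths(seq, site, cut_offset):
    """Fragment lengths after complete digestion of a linear molecule.

    cut_offset: how many bases into the site the top strand is cut.
    """
    cuts = [p + cut_offset for p in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

dna = "AAAAGAATTCTTTTTTTTGAATTCCC"   # made-up 26 bp molecule with two sites
print(is_inverted_repeat("GTCGAC"))
print(find_sites(dna, "GAATTC"))
print(digest_lengths(dna, "GAATTC", 1))
```

The fragment lengths always add up to the length of the starting molecule, which is a handy sanity check when building a map from gel bands.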
Needles in Haystacks
• The primary purpose of molecular techniques in human genetics is to find and characterize genes responsible for genetic disease.
• How to find one gene in a large genome? A gene might be 1/1,000,000 of the genome. Three basic approaches:
1. Polymerase chain reaction (PCR): make many copies of a specific region of the DNA.
2. Cell-based molecular cloning: create and isolate a bacterial strain that replicates a copy of your gene.
3. Hybridization: make DNA single stranded, allow double strands to re-form using a labeled (e.g. radioactive) version of your gene to make it easy to detect.
DNA Polymerase
• DNA polymerase is the enzyme that replicates DNA. To do this, it needs:
– Single stranded DNA template molecule
– Primer: a short piece of DNA or RNA base-paired with a region of the template
– dNTPs: the 4 deoxy nucleoside triphosphates dATP, dCTP, dGTP and dTTP,
which are the raw materials for the new DNA strand.
– DNA polymerase attaches new nucleotides to the 3’ end of the primer, using
the template strand as a guide to picking the proper nucleotide to add.
Polymerase Chain Reaction
• Based on DNA polymerase, the enzyme that replicates DNA.
– Needs template DNA and two primers that flank the region to be
amplified. Primers are short (generally 18-30 bases) DNA
oligonucleotides complementary to the ends of the region being
amplified.
– Starting at each primer, DNA polymerase adds new bases to the 3'
ends to create the new second strand.
– PCR is a cyclical process. Each cycle doubles the number of DNA
molecules between the primers: exponential growth.
– A key element in PCR is a special form of DNA polymerase from
Thermus aquaticus, a bacterium that lives in nearly boiling water in the
Yellowstone National Park hot springs. This enzyme, Taq polymerase,
can withstand the temperature cycle of PCR, which would kill DNA
polymerase from E. coli.
• advantages:
– rapid, sensitive, lots of useful variations, robust (works even with partly
degraded DNA)
• disadvantages:
– Only short regions (up to 2 kbp) can be amplified.
– limited amount of product made
PCR Cycle
• PCR is based on a cycle of 3 steps that occur at different temperatures.
• Each cycle doubles the number of DNA molecules: 25-35 cycles produces enough DNA to see on an electrophoresis gel. Each step takes about 1 minute to complete.
1. Denaturation: make the DNA single stranded by heating to 94°C.
2. Annealing: hybridize the primers to the single strands. Temperature varies with primer, around 50°C.
3. Extension: build the second strands with DNA polymerase and dNTPs: 72°C.
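The doubling arithmetic is easy to check with a short sketch. It assumes ideal amplification (every molecule is copied in every cycle) unless a lower `efficiency` is passed in; the function names and the `efficiency` parameter are illustrative, not part of the slides.

```python
def copies_after(cycles, start_copies=1, efficiency=1.0):
    """Copies of the target region after a number of PCR cycles.

    efficiency=1.0 is perfect doubling per cycle, an idealization;
    real reactions fall short of this, especially in late cycles.
    """
    return start_copies * (1 + efficiency) ** cycles

def run_time_minutes(cycles, steps_per_cycle=3, minutes_per_step=1):
    """Rough run time: 3 steps per cycle at about 1 minute each."""
    return cycles * steps_per_cycle * minutes_per_step

for n in (25, 30, 35):
    print(n, "cycles:", f"{copies_after(n):.3g}", "copies,",
          run_time_minutes(n), "minutes")
```

With perfect doubling, 30 cycles turns one template molecule into about a billion copies in roughly an hour and a half.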
SSR Genetic Markers
• Microsatellites (Simple Sequence Repeats: SSRs) are short (2-5 bases) sequences that are repeated several times in tandem: TGTGTGTGTGTG is 6 tandem repeats of TG.
• SSRs are found in and near many genes throughout the genome; they are quite common and easy to find.
• During normal replication of the DNA in the nucleus, DNA polymerase sometimes slips and creates extra copies or deletes a few copies of the repeat.
• This happens rarely enough that most people inherit the same number of repeats that their parents had (i.e. SSRs are stable genetic markers), but often enough that numerous variant alleles exist in the population.
• Mapping SSRs is a matter of having PCR primers that flank the repeat region, then examining the PCR products on an electrophoresis gel and counting the number of repeats.
• SSRs are co-dominant markers: both alleles can be detected in a heterozygote.
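As a rough sketch, counting repeats and predicting the band sizes for a heterozygote might look like this. The flanking sequences and the two alleles are invented for the example.

```python
import re

def longest_repeat_count(seq, unit):
    """Largest number of back-to-back copies of unit found anywhere in seq."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)

# Two hypothetical PCR products that differ only in the number of TG repeats.
allele_a = "GATC" + "TG" * 6 + "CCAA"
allele_b = "GATC" + "TG" * 9 + "CCAA"

for allele in (allele_a, allele_b):
    print(longest_repeat_count(allele, "TG"), "repeats,", len(allele), "bp product")

# A heterozygote shows both product sizes as separate bands (co-dominance).
bands = sorted({len(allele_a), len(allele_b)})
print(bands)
```

Each extra repeat adds one repeat unit to the product length, so the size difference between bands gives the difference in repeat number directly.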
Allele-Specific PCR
• For base change
mutations (single
nucleotide
polymorphisms).
• Use a primer
whose 3’ base
matches the
mutation. Will
amplify one allele
but not the other
because the 3’ end
is not paired with
the template in the
wrong allele.
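The 3’ end rule can be sketched as a simple string comparison. This is a deliberate simplification: primer and allele are written 5’->3’ in the same sense, so "pairing with the opposite strand" becomes "agreeing with this string", and the sequences themselves are made up.

```python
def amplifies(primer, allele_seq, start):
    """Sketch of allele-specific priming.

    The primer is extended only if its body matches the allele at the
    binding position AND its 3' (last) base matches as well.
    """
    site = allele_seq[start:start + len(primer)]
    if len(site) != len(primer):
        return False
    body_paired = site[:-1] == primer[:-1]
    three_prime_paired = site[-1] == primer[-1]
    return body_paired and three_prime_paired

normal = "ATGGCATTCAGACCTG"   # hypothetical normal allele
mutant = "ATGGCATTCAGTCCTG"   # same region with an A -> T base change
mutant_primer = "GGCATTCAGT"  # 3' base matches the mutation

print(amplifies(mutant_primer, mutant, 2))
print(amplifies(mutant_primer, normal, 2))
```

Running a second reaction with a primer ending in the normal base would distinguish heterozygotes from homozygous mutants.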
Real Time PCR
• Used to quantitate gene expression:
• First, convert all mRNA in a sample to single stranded cDNA using reverse transcriptase.
• Then, amplify the region of interest using specific primers.
• Measure the amount of DNA made by PCR using a dye that only binds to double stranded DNA. SYBR Green is a commonly used dye.
• The more mRNA/cDNA you started with, the faster the fluorescence builds up.
In this RT-PCR experiment, 30, 300, or 3000 copies of the cDNA were subjected to PCR. The more copies of the cDNA, the sooner the fluorescence rises and saturates the detector.
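Why more starting copies means an earlier signal follows from the doubling arithmetic. The sketch below assumes perfect doubling and a made-up detection threshold of 1e10 copies; only the spacing between the curves matters, not the threshold itself.

```python
import math

def threshold_cycle(start_copies, threshold_copies=1e10):
    """Cycle number (fractional) at which ideal doubling reaches the threshold.

    threshold_copies is a hypothetical detection level, chosen for illustration.
    """
    return math.log2(threshold_copies / start_copies)

for copies in (30, 300, 3000):
    print(copies, "copies -> threshold at cycle", round(threshold_cycle(copies), 1))

# Each 10-fold increase in starting copies shifts the curve earlier
# by log2(10), or about 3.3 cycles.
print(round(threshold_cycle(30) - threshold_cycle(300), 2))
```

Under these assumptions the shift between curves is the same for any threshold, which is why the cycle at which fluorescence crosses a fixed level can be used to compare starting amounts.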
Cell-Based Molecular Cloning
• The original recombinant DNA technique: 1974 by Cohen and Boyer.
• Several key players:
1. Restriction enzymes: cut DNA at specific sequences, e.g. EcoRI cuts at GAATTC and BamHI cuts at GGATCC. Most of them leave sticky ends: short single stranded regions that will hybridize with complementary ends.
2. Plasmids: independently replicating DNA circles (only circles replicate in bacteria). Foreign DNA can be inserted into a plasmid and replicated.
– Plasmids for cloning carry drug resistance genes that are used for selection.
– Plasmids also spread antibiotic resistance genes between bacterial species.
3. DNA ligase: enzyme that attaches 2 pieces of DNA together.
4. Transformation: DNA manipulated in vitro can be put back into the living cells by a simple process.
– The transformed DNA replicates and expresses its genes.
Plasmid Vectors
• To replicate, a plasmid must be circular, and it must contain a replicon, a DNA sequence where DNA polymerase binds and initiates replication. Also called “ori” (origin of replication).
• Plasmid cloning vectors must also carry a selectable marker: drug resistance. Transformation is inefficient, so bacteria that aren’t transformed must be killed.
• Most cloning vectors have a multiple cloning site, a short region of DNA containing many restriction sites close together (also called a polylinker). This allows many different restriction enzymes to be used.
• Most cloning vectors use a system for detecting the presence of a recombinant insert, usually the blue/white beta-galactosidase system.
Basic Cloning Process
•Plasmid is cut open with a restriction
enzyme that leaves an overhang: a
sticky end
•Foreign DNA is cut with the same
enzyme.
•The two DNAs are mixed. The sticky
ends anneal together, and DNA ligase
joins them into one recombinant
molecule.
•The recombinant plasmids are
transformed into E. coli using heat
plus calcium chloride.
•Cells carrying the plasmid are
selected by adding an antibiotic: the
plasmid carries a gene for antibiotic
resistance.
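The cut-and-ligate steps can be sketched on top-strand sequences only. This is a toy model under stated assumptions: the vector is written as a linear string with one EcoRI site, the foreign DNA has two sites, only one insert orientation is considered, and all sequences and function names are invented.

```python
SITE = "GAATTC"  # EcoRI recognition site
CUT = 1          # EcoRI cuts after the first G, leaving an AATT overhang

def cut_once(seq):
    """Open a vector at its single EcoRI site; returns (left, right)."""
    assert seq.count(SITE) == 1
    i = seq.index(SITE) + CUT
    return seq[:i], seq[i:]

def excise_insert(seq):
    """Return the fragment between the two EcoRI cuts in the foreign DNA."""
    assert seq.count(SITE) == 2
    first = seq.index(SITE) + CUT
    second = seq.index(SITE, first) + CUT
    return seq[first:second]

def ligate(left, insert, right):
    """Join the pieces; matching sticky ends regenerate GAATTC at each junction."""
    return left + insert + right

vector = "CCCGAATTCGGG"                  # hypothetical plasmid, shown linear
foreign = "TTGAATTCATGAAATAGGAATTCTT"    # hypothetical gene flanked by two sites

left, right = cut_once(vector)
insert = excise_insert(foreign)
recombinant = ligate(left, insert, right)
print(insert)
print(recombinant)
```

In this model the recombinant molecule carries an EcoRI site on each side of the insert, so the insert can be cut back out with the same enzyme.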
DNA Ligase in Action!
I hope
Cloning Vector Types
• For different sizes of DNA:
– plasmids: up to 5 kb
– phage lambda (λ) vectors: up to 50 kb
– BAC (bacterial artificial chromosome): 300 kb
– YAC (yeast artificial chromosome): 2000 kb
• Expression vectors: make RNA and
protein from the inserted DNA
• shuttle vectors: can grow in two different
species
Bacterial Artificial Chromosomes
• BACs are the most common vector for large inserts such as eukaryotic genome projects.
• Based on the E. coli F plasmid that confers the ability to conjugate.
• Low copy number plasmids (usually 1 per cell), which prevents crossing over between repeated sequences in the insert DNA.
• But, low copy number also means low DNA yield.
• Transformed into E. coli using electroporation, subjecting the bacteria to a high voltage electrical field.
Expression Vectors
• Various types:
– RNA only: use a vector that has a phage T7 promoter in front of
the cloning site, and an inducible T7 polymerase gene.
Induction is controlled by the lac operon repressor and the synthetic
inducer IPTG (isopropyl thiogalactoside).
– polypeptide or fragments of polypeptides: can be produced in E.
coli using a ribosome binding site in addition to the promoter.
Need to use the correct reading frame.
• can also be done as a fusion protein (your protein fused to a marker
protein) for easy detection or purification
– post-translationally modified or intron-spliced protein: needs to
be expressed in eukaryotic cells. Needs eukaryotic promoter and
polyadenylation (poly-A addition) signals, plus a selectable
marker that works in eukaryotes (since most antibiotics are
specific for prokaryotes).
Example Expression Vector
• For eukaryotic expression, this vector (from Invitrogen) has a cytomegalovirus promoter (PCMV) and a bovine growth hormone polyadenylation site (BGHpA).
• The DNA inserted at “hORF” gets fused to a short peptide called an epitope, for which very specific antibodies exist. It also gets fused to 6 histidines, which allow easy purification on a column that has nickel ions bound to it (an affinity tag).
• For growth in mammalian cells, it has an SV40 viral origin of replication (SV40ori), and a zeocin resistance gene (Zeocin, with SV40 promoter/enhancer and SV40 poly A site).
• For growth in E. coli it has the ColE1 replicon. Zeocin works as a selectable marker in bacteria as well as in eukaryotic cells.
• There is also a T7 promoter for making RNA from the inserted gene, and an f1 origin of replication for making single stranded DNA (useful for sequencing).
Sources of DNA to Clone
• Genomic DNA: cut up whole genome and clone small pieces. Advantage is, you get everything. Disadvantage is, a lot of it is junk.
– Two general methods:
• 1. randomly shear DNA into small pieces, then ligate linkers to the ends: oligonucleotides that contain a useful restriction site.
• 2. partially digest the DNA with a restriction enzyme that has a 4 base recognition site. These sites will appear at random every 256 (4^4) base pairs on average. Take long pieces.
• cDNA: DNA copy of mRNA, made with reverse transcriptase. Advantage: you just get the expressed genes. Disadvantages: you don't get control sequences or introns, and frequency depends on level of expression.
• Synthetic DNA: synthesized de novo (for example multiple cloning sites or linkers), or made by PCR
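The site-frequency arithmetic for partial digestion can be written out as a quick check. It assumes a random sequence in which all 4 bases are equally likely, which real genomes only approximate; the function names are illustrative.

```python
def expected_spacing(site_length, alphabet_size=4):
    """Average distance (bp) between occurrences of one specific site,
    assuming random sequence with equally frequent bases."""
    return alphabet_size ** site_length

def expected_fragments(genome_bp, site_length):
    """Approximate number of fragments from a complete digest."""
    return genome_bp / expected_spacing(site_length)

for n in (4, 6, 8):
    print(f"{n} base site: one site every {expected_spacing(n)} bp on average")

# A 4 base cutter on a 3 billion bp genome, if digestion went to completion:
print(round(expected_fragments(3_000_000_000, 4)))
```

This is why the digest is only partial: a complete digest with a 4 base cutter would leave pieces far too short to be useful for a genomic library.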
cDNA Synthesis
•use oligo-dT
primer, which binds
to poly-A tail.
•make the first DNA
strand from the RNA
using reverse
transcriptase
More cDNA Synthesis
•Remove the RNA with
heat or alkali.
•The 3’ end
spontaneously forms a
small hairpin.
•Extend the hairpin with
DNA polymerase
•Cut the loop with S1
nuclease (which cuts at
unpaired bases)
•Attach synthetic
linkers with DNA ligase
and clone into a vector.
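The first-strand step can be sketched as string manipulation: the oligo-dT primer needs a poly-A tail to bind, and the new DNA strand is the reverse complement of the mRNA. The mRNA sequence, the `primer_len` parameter, and the function name are invented for this example.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna, primer_len=6):
    """First cDNA strand (written 5'->3') copied from an mRNA (5'->3').

    Raises ValueError if the poly-A tail is too short for the
    oligo-dT primer to bind (primer_len is a hypothetical minimum).
    """
    if not mrna.endswith("A" * primer_len):
        raise ValueError("no poly-A tail for the oligo-dT primer")
    # Reverse transcriptase copies from the 3' end of the mRNA,
    # so the new strand starts with the T's of the primer.
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna))

mrna = "AUGGCCUAA" + "A" * 8   # made-up message with a short poly-A tail
print(first_strand_cdna(mrna))
```

An RNA without a poly-A tail raises an error here, mirroring the fact that oligo-dT priming only captures polyadenylated messages.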
Hybridization
• The idea is that if DNA is
denatured (made single stranded,
also called melted), it will pair up
with another DNA (or RNA) with the
complementary sequence. If one of
the DNA molecules is labeled, you
can detect the hybridization.
• Basic applications:
– Southern blot: DNA digested by a
restriction enzyme then separated
on an electrophoresis gel
– Northern blot: uses RNA on the gel
instead of DNA
– in situ hybridization: probing a
tissue
– colony hybridization: detection of
clones
– microarrays
Labeling
• Several methods. One is
random primers labeling:
– use 32P-labeled dNTPs
– short random oligonucleotides
as primers (made
synthetically)
– single stranded DNA template
(made by melting double
stranded DNA by boiling it)
– DNA polymerase copies the
DNA template, making a new
strand that incorporates the
label.
• Can also label RNA
(sometimes called riboprobes),
use non-radioactive labels
(often a small molecule that
labeled antibodies bind to, or a
fluorescent tag), use other
labeling methods.
Hybridization Process
• All the DNA must be single stranded (melt at high temp or with NaOH). Occurs in a high salt solution at say 60°C. Complementary DNAs find each other and stick. Need to wash off non-specific binding.
• Stringency: how perfectly do the DNA strands have to match in order to stick together? Less than perfect matches will occur at lower stringency (e.g. between species). Increase stringency by increasing temp and decreasing salt concentration.
• Rate of hybridization depends on DNA concentration and time (Cot), as well as GC content and DNA strand length.
• Autoradiography. Put the labeled DNA next to X-ray film; the radiation fogs the film.
Southern Blot
• The Southern blot is used to detect a specific DNA sequence in a complex mixture, such as genomic DNA.
• Cut DNA with restriction enzyme, then run on an electrophoresis gel.
• Suck buffer through the gel into a nitrocellulose membrane. The buffer goes through but the DNA sticks to the membrane.
• Fix the DNA to the membrane permanently with UV or heat.
• Hybridize membrane to a radioactive probe, then detect specific bands with autoradiography.
• Northern blot uses RNA instead. RNA must be denatured so the distance it migrates on the gel depends only on its length: put formaldehyde in the gel.
Restriction Fragment Length Polymorphisms
• RFLPs: the first DNA-based genetic mapping technique. Advantage: every individual has many variations in their DNA, so you don’t need a special set of marker mutations. Also, the markers are co-dominant so you can accurately determine the genotype.
• Probe is a fragment of a cloned gene (labeled).
• Genomic DNA is cut with a restriction enzyme.
• Polymorphic sites: the restriction site is present in some individuals but not in others (due to mutation). But, even if one site is missing, there will be another restriction site a little further away (a restriction enzyme with a 6 base site cuts on the average every 4^6 = 4096 bp).
• Then do a Southern blot and autoradiography.
In Situ Hybridization
• Using tissues or tissue
sections.
• Often done with nonradioactive probes
because the high energy
of 32P emission gives an
imprecise view of where
the hybridization is.
• Counterstain the tissue
so non-hybridizing parts
are visible.
Microarrays
• A microarray is a set of short (20-60 bases) oligonucleotides bound to a glass slide. The microarray is hybridized with fluorescently labeled DNA.
• For gene expression analysis, messenger RNA is isolated from a tissue, then converted to cDNA. You see which genes are active in that tissue.
• Often 2 conditions are compared (control and experimental), using red and green fluorescent tags.
• Semi-quantitative.
Tiling Arrays
• Tiling arrays: Microarray chips with short probes that cover the entire
genome (sometimes overlapping, sometimes with gaps between)
– Transcriptome mapping. Use RNA converted to cDNA and labeled to detect
transcribed regions, even if the RNA is very short or not polyadenylated. A
surprising number of transcribed regions do not look like genes: short exons, RNA
only genes.
– Finding protein-binding regions: ChIP-chip (chromatin immunoprecipitation
chip). Isolate DNA with proteins bound (histones, transcription factors, etc.) Then
break up the DNA into small fragments by sonication, immunoprecipitate the protein
of interest with bound DNA, remove the proteins and hybridize the DNA with the
tiling array chip.
Transcriptome Mapping
Chromatin Immunoprecipitation chip
SNP Detection
• The problem with detecting single
nucleotide polymorphisms (SNPs) is
that you need to get good hybridization
with a perfect match, and little or no
hybridization with a 1 base pair
mismatch. It is hard to do this for many
different sequences simultaneously.
• A simple solution: for each SNP
location, have oligos on the chip for all
4 possible bases. The one that
hybridizes best should be the correct
one.
– Remember that many people are heterozygotes, so
hybridization to 2 alleles is common.
• There are many other applications for
microarrays.
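The "4 oligos per SNP" idea can be sketched as a simple genotype call from the four hybridization signals. The intensity values and the `het_ratio` cutoff are hypothetical; real chips use more careful statistics.

```python
def call_genotype(signal, het_ratio=0.5):
    """Call a genotype from the 4 oligo signals at one SNP position.

    signal: dict mapping each base to its hybridization intensity.
    het_ratio: hypothetical cutoff; a second allele is called when its
    signal is at least this fraction of the strongest signal.
    """
    ranked = sorted(signal, key=signal.get, reverse=True)
    best, second = ranked[0], ranked[1]
    if signal[second] >= het_ratio * signal[best]:
        return "".join(sorted((best, second)))  # heterozygote, e.g. "AG"
    return best + best                          # homozygote, e.g. "AA"

print(call_genotype({"A": 950, "C": 40, "G": 35, "T": 30}))
print(call_genotype({"A": 500, "C": 30, "G": 480, "T": 25}))
```

The first example has one strong signal and is called as a homozygote; the second has two comparable signals and is called as a heterozygote.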
DNA Sequencing
Determining DNA Sequence
• Originally 2 methods were invented around 1976, but only one is
widely used: invented by Fred Sanger.
– Sanger sequencing is currently thought to produce the most accurate
and longest sequences of any method. However, it is slow and
expensive.
• Uses DNA polymerase to synthesize a second DNA strand that is
labeled. DNA polymerase always adds new bases to the 3’ end of
a primer that is base-paired to the template DNA.
• Also uses chain terminator nucleotides: dideoxy nucleotides
(ddNTPs), which lack the -OH group on the 3' carbon of the
deoxyribose. When DNA polymerase inserts one of these ddNTPs
into the growing DNA chain, the chain terminates, as nothing can
be added to its 3' end.
Sequencing Reaction
• The template DNA is usually single stranded DNA, which can be produced from plasmid cloning vectors that contain the origin of replication from a single stranded bacteriophage such as M13 or fd. The primer is complementary to the region in the vector adjacent to the multiple cloning site.
• Sequencing is done by having 4 separate reactions, one for each DNA base.
• All 4 reactions contain the 4 normal dNTPs, but each reaction also contains one of the ddNTPs.
• In each reaction, DNA polymerase starts creating the second strand beginning at the primer.
• When DNA polymerase reaches a base for which some ddNTP is present, the chain will either:
– terminate if a ddNTP is added, or
– continue if the corresponding dNTP is added.
– Which one happens is random, based on the ratio of dNTP to ddNTP in the tube.
• However, all the second strands in, say, the A tube will end at some A base: you get a collection of DNAs that end at each of the A's in the region being sequenced.
Electrophoresis
• The newly synthesized DNA from the 4 reactions is then run (in separate lanes) on an electrophoresis gel.
• The DNA bands fall into a ladder-like sequence, spaced one base apart. The actual sequence can be read from the bottom of the gel up.
• Automated sequencers use 4 different fluorescent dyes as tags attached to the dideoxy nucleotides and run all 4 reactions in the same lane of the gel.
– Today’s sequencers use capillary electrophoresis instead of slab gels.
– Radioactive nucleotides (32P) are used for non-automated sequencing.
• Sequencing reactions usually produce about 500-1000 bp of good sequence.
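Reading a sequencing ladder can be sketched as follows. The sketch assumes an idealized experiment in which every possible termination product is present, and lengths are counted from the end of the primer; the example sequence and function names are invented.

```python
def termination_lengths(new_strand):
    """For each of the 4 reactions, the lengths of the chains it produces.

    The 'A' reaction holds every chain that ended in ddATP, and so on.
    Lengths count bases added after the primer (an assumption of this sketch).
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes

def read_gel(lanes):
    """Read bands from shortest (bottom of the gel) to longest (top)."""
    bands = sorted((length, base)
                   for base, lengths in lanes.items()
                   for length in lengths)
    return "".join(base for _, base in bands)

lanes = termination_lengths("GATTACA")
print(lanes)
print(read_gel(lanes))
```

Because each length occurs in exactly one lane, sorting the bands by size recovers the order of bases in the newly made strand.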
Sanger Sequencing Protocol
Next Generation Sequencing
• Recently a number of faster and cheaper sequencing methods have been developed.
– We are going to discuss the Illumina sequencing method, which is probably the most widely used at present. But, there are several other common methods that can be called “next generation”: Ion Torrent, 454, SOLiD, and more.
– Third generation sequencing, getting long sequences from single molecules, is getting started.
• Applications:
– sequencing of whole bacterial genomes in a single run
– sequencing genomes of individual people or tumors
– metagenomics: sequencing DNA extracted from environmental samples
– Deep sequencing: looking for rare variants in a single amplified region, in tumors or viral infections
– RNASeq: sequencing total cellular mRNA converted to cDNA.
Illumina Sequencing
• Many sequencing methods have been invented, and it’s still a very active area of research.
• Most use the concept of sequencing by synthesis: starting with a primer, use DNA polymerase to add new bases one at a time, paying attention to which base is added.
• In the Illumina method (current favorite), fluorescent tags attached to the 3’ OH group are used.
– Each of the 4 nucleotides has a different colored tag.
• The fluorescent tags block the 3’-OH of the new nucleotide, and so the next base can only be added when the tag is removed.
• A cycle: add one new base, then read its color, then remove the fluorescent tag to give a free 3’ OH group.
• Repeat the cycle up to 200 times.
• End up with 200 bp of sequence information.
More Sequencing
• To get enough signal from the DNA molecule being sequenced, each DNA molecule needs to be amplified using PCR.
• For the Illumina method, this is done by attaching individual DNA molecules to a solid surface, then PCR-amplifying them in place, giving tiny spots with about a million identical copies.
• The DNA polymerase sequencing reactions are then monitored with a high resolution video camera.
Sequence Assembly
• DNA is sequenced in very small fragments: 100-1000 bp. Compare this to the size of the human genome: 3,000,000,000 bp.
• In shotgun sequencing (the usual method), DNA is fragmented randomly. Enough data is collected so each base is read 10 times or more on average.
• In principle, assembling a sequence is just a matter of finding overlaps and combining them.
• In practice:
– most genomes contain multiple copies of many sequences,
– there are random mutations (either naturally occurring cell-to-cell variation or
generated by PCR or cloning),
– there are sequencing errors and misreadings,
– sometimes the cloning vector itself is sequenced
– sometimes miscellaneous junk DNA gets sequenced
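The amount of data shotgun sequencing requires follows from simple arithmetic: average coverage is total bases read divided by genome size. A minimal sketch, with illustrative function names:

```python
import math

def average_coverage(n_reads, read_bp, genome_bp):
    """Average number of times each base is read."""
    return n_reads * read_bp / genome_bp

def reads_needed(genome_bp, read_bp, coverage):
    """Number of reads required to reach a target average coverage."""
    return math.ceil(coverage * genome_bp / read_bp)

HUMAN_GENOME_BP = 3_000_000_000

# 10-fold coverage of the human genome with 200 bp reads:
print(reads_needed(HUMAN_GENOME_BP, 200, 10))
```

This is only an average: because fragments are random, some bases are read many more times than others and some may not be read at all.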
Sequence Assembly
• The big problem with all current sequencing methods: you only get
very short reads: 200 bp maximum for Illumina, up to 1000 bp for
the older (slower, much more expensive) Sanger method, etc.
– The human genome is 23 DNA molecules (chromosomes) that total 3 billion bp.
Human chromosomes are 50-250 million base pairs long.
– You need to assemble the tiny reads into much longer contigs (continuous
sequences). With a perfectly sequenced genome, the final contigs would be
identical to the DNA sequence of the chromosomes.
• How reads are assembled into contigs: overlapping sequences.
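Joining reads by their overlaps can be sketched with a simple greedy approach: repeatedly merge the pair of reads with the longest suffix/prefix overlap. This is a toy illustration, not how production assemblers work; it assumes error-free reads from one strand, and the reads and the `min_len` cutoff are made up.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b
    (0 if the match is shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge reads pairwise, best overlap first, until none overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # remaining reads stay as separate contigs
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

reads = ["ATGGCGT", "GCGTACC", "TACCTTA"]
print(greedy_assemble(reads))
```

Reads that share no overlap are left as separate contigs, which is what a gap in the data looks like in practice.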
Assembly Problems
• Chromosomes, especially eukaryotic chromosomes, are filled with sequences that are repeated many times. If you have a read from a repeated sequence, how do you know which copy it is?
– Some repeats are next to each other (tandem repeats) and some are scattered all over the genome (dispersed repeats).
• The main solution to this problem is to start with longer DNA template molecules and sequence both ends. You don’t know the sequence in between, but you do know how far apart the ends are. This often allows you to jump over repeated sequences.
– It’s not perfect, and even now there are no human chromosomes sequenced to 100% accuracy.
RNA Seq
• This is a new method, published in 2008. It is probably the method of choice today for analyzing RNA content. Also called whole transcriptome shotgun sequencing.
• Very simple: isolate messenger RNA, break it into 200-300 base fragments, reverse transcribe, then perform large scale sequencing using 454, Illumina, or other massively parallel sequencing technology.
– RNA sequences are then compared to genomic sequences to find which gene is expressed and also exon boundaries.
– Exon boundaries are a problem with very short reads: you might only have a few bases of overlap to one of the exons.
• As with all RNA methods, which RNAs are present depends on the tissue analyzed and external conditions like environmental stress or disease state.
• Get info on copy number over a much wider range than microarrays. Also detects SNPs.