Download Recombinant DNA Technology

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Whole genome sequencing wikipedia , lookup

History of RNA biology wikipedia , lookup

Mutagen wikipedia , lookup

DNA wikipedia , lookup

Mitochondrial DNA wikipedia , lookup

Genetic engineering wikipedia , lookup

Comparative genomic hybridization wikipedia , lookup

DNA repair wikipedia , lookup

Zinc finger nuclease wikipedia , lookup

Nutriepigenomics wikipedia , lookup

DNA profiling wikipedia , lookup

Human genome wikipedia , lookup

Designer baby wikipedia , lookup

Cancer epigenetics wikipedia , lookup

DNA sequencing wikipedia , lookup

Gene wikipedia , lookup

Nucleosome wikipedia , lookup

DNA polymerase wikipedia , lookup

RNA-Seq wikipedia , lookup

Point mutation wikipedia , lookup

SNP genotyping wikipedia , lookup

DNA damage theory of aging wikipedia , lookup

United Kingdom National DNA Database wikipedia , lookup

Genealogical DNA test wikipedia , lookup

Microevolution wikipedia , lookup

Site-specific recombinase technology wikipedia , lookup

Metagenomics wikipedia , lookup

Replisome wikipedia , lookup

DNA vaccination wikipedia , lookup

Gel electrophoresis of nucleic acids wikipedia , lookup

Genome editing wikipedia , lookup

No-SCAR (Scarless Cas9 Assisted Recombineering) Genome Editing wikipedia , lookup

Microsatellite wikipedia , lookup

DNA supercoil wikipedia , lookup

Non-coding DNA wikipedia , lookup

Nucleic acid double helix wikipedia , lookup

Epigenomics wikipedia , lookup

Molecular cloning wikipedia , lookup

Cell-free fetal DNA wikipedia , lookup

Bisulfite sequencing wikipedia , lookup

Genomic library wikipedia , lookup

Extrachromosomal DNA wikipedia , lookup

Primary transcript wikipedia , lookup

Vectors in gene therapy wikipedia , lookup

Therapeutic gene modulation wikipedia , lookup

Cre-Lox recombination wikipedia , lookup

Nucleic acid analogue wikipedia , lookup

History of genetic engineering wikipedia , lookup

Genomics wikipedia , lookup

Helitron (biology) wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Deoxyribozyme wikipedia , lookup

Transcript
Recombinant DNA
Technology
Isolating DNA
•
Chemically, DNA is a very simple compound, with little variation
•
between species.
The basic steps:
1.
2.
3.
4.
•
•
Methods for breaking the cells open vary between species, but
usually involve mechanical disruption in a buffer that inhibits
DNAases.
Separating DNA from other cell components relies on the fact that
DNA is chemically different from proteins and other
macromolecules.
–
–
•
Break the cells open
Disrupt cell membranes with a detergent
Remove proteins and other macromolecules
Concentrate the DNA by precipitating it and re-suspending it in fresh
buffer.
If the cell extract is mixed with a phenol/chloroform solution, most of the
proteins and other cellular junk goes into the phenol/chloroform layer
while the DNA and RNA stay in the aqueous phase.
Phenol is very nasty, and many methods have been invented to get
around this step.
DNA can be precipitated using ethanol, and then resuspended in a
buffer containing EDTA, which chelates (removes from solution)
Mg2+ ions. This is useful because all DNA-degrading enzymes use
Mg2+ ions as a co-factor.
Transformation
•
How to get DNA back into the cell after manipulation in vitro.
–
–
–
•
•
•
•
Transformation means the cell takes up naked DNA from the
environment and incorporates it into its genome.
In animal cells, this process is often called transfection, because
transformation also means going from normal to cancerous.
A cell that is in a state that allows transformation is called competent.
Need to get the DNA through the cell membrane, plus
possibly the cell wall (bacteria and plants) and get it into the
nucleus (eukaryotes).
Lots of methods, many are specific to a particular group of
organisms.
Natural competence: some bacteria take DNA up without
any special treatment. Good example: Streptococcus
pneumoniae, used by Griffith, and Avery, MacLeod and
McCarty in early experiments showing that DNA with the
hereditary material.
Chemically induced competence: E. coli is made
competent by treating the cells with Ca2+ ions, then giving
them a heat shock (2 minutes at 42oC).
More Transformation
• Electroporation: if you subject a cell to a
high voltage electrical field (say 20,000
volts per centimeter), holes appear in the
membrane that DNA can go through. The
holes disappear quickly once the voltage
disappears.
• Gene gun: Tiny gold particles are coated
with DNA, then blasted through the cell
wall using high pressure gas. One method
uses blank .22 rifle cartridges to generate
the pressure.
• Lipofection: coating the DNA in lipid
vesicles that fuse with the cell membrane.
This is very efficient and common with
mammalian cells.
Electrophoresis
•
•
•
•
•
•
Electrophoresis is the separation
of charged molecules in an
electric field.
Nucleic acids have 1 charged
phosphate (- charge) per
nucleotide. This implies a
constant charge to mass ratio.
Thus, separation is based almost
entirely on length: longer
molecules move slower.
Done in a gel matrix to stabilize:
agarose or acrylamide.
average run: 100 Volts across a
10 cm gel, run for 2 hours.
Stain with ethidium bromide:
intercalates between DNA bases
and fluoresces orange with UV
light.
Run alongside standards of known
sizes to get lengths
Restriction Enzymes
•
Restriction enzymes (or more accurately,
restriction endonucleases) are used by
bacteria to destroy invading DNA.
•
•
•
•
The enzymes recognize particular
sequences in DNA and then cut the DNA.
Their own DNA has been modified
(methylated) at the corresponding
sequences by a restriction methylase
(modification enzyme), so only foreign DNA
is cut.
There are several different types of
restriction enzyme: type I, type II, etc..
They differ on where the cut the DNA
relative to the recognition site: some cut
randomly up to 1000 bp from the
recognition site.
In molecular biology we use type II
restriction enzymes, which cut the DNA at
the recognition site
Restriction Sites
• Most restriction sites are palindromes:
they read the same forwards and
backwards (remembering that “reading
backwards” means reading on the other
strand).
– For example, Eco R1 cuts at GAATTC. TTC is
what is on the opposite strand of GAA, read 5’ to
3’.
• Many restriction enzymes give staggered
cuts: which leave a few bases at the ends
single stranded.
– This is very useful for cloning: staggered cut
sites are sticky: the unpaired bases pair with
unpaired bases on another DNA molecule,
holding the two molecules together long enough
for DNA ligase to attach them covalently.
– An enzyme that cuts both strands in the same
place (e.g. Alu1) produces blunt ends.
Restriction Enzymes for Mapping
•
Most restriction sites are 4 bp or 6
bp long.
–
•
The locations of the sites are fixed
points that are easy to detect, so
they are good for mapping.
–
•
In random DNA, a 4 bp site occurs
about every 44 = 256 bp, and a 6 bp
site appears about every 46 = 4096
bp.
Restriction sites near genes can be
used for cutting out a gene or splicing
something in.
Basic technique:
–
–
–
Digest DNA with each restriction
enzyme individually and in pairs
Measure band sizes using
electrophoresis
Assemble a map from the band
sizes. Single digests tell you how
many sites each enzyme has and
how far apart they are. Double
digests allow you to put the sites in
the proper order.
RNA
• Gene expression is studied using RNA.
However, RNA has two annoying
properties:
– it is very easily degraded. A desirable
property in the cell: allows rapid response to
environmental changes
– It usually has a lot of secondary structure.
This means that migration speed in
electrophoresis is not proportional to length.
The same problem occurs with proteins.
• For direct analysis of RNA, denaturing
agents such as formaldehyde or methyl
mercury hydroxide (very toxic) are
used.
• For sequence analysis, RNA is first
converted to DNA (called cDNA).
cDNA Synthesis
•use oligo-dT primer,
which binds to poly-A
tail.
•make the first DNA
strand from the RNA
using reverse
transcriptase, which
makes a DNA copy
of an RNA molecule.
More cDNA Synthesis
•Remove the RNA with
heat or alkali.
•The 3’ end
spontaneously forms a
small hairpin.
•Extend the hairpin with
DNA polymerase
•Cut the loop with S1
nuclease (which cuts at
unpaired bases)
•Attach synthetic
linkers with DNA ligase
and clone into a vector.
Expressed Sequence Tags
• ESTs are cDNA clones that have has a single round of sequencing
done from one end.
• First extract mRNA from a given tissue and/or environmental
condition. Then convert it to cDNA and clone.
• Sequence thousands of EST clones and save the results in a
database.
• A search can then show whether your sequence was expressed in
that tissue.
– quantitation issues: some mRNAs are present in much higher
concentration than others. Many EST libraries are “normalized” by
removing duplicate sequences.
• Also can get data on transcription start sites and exon/intron
boundaries by comparing to genomic DNA
– but sometimes need to obtain the clone and sequence the rest of it
yourself.
Finding Genes
• How to find one gene in large genome? A gene
might be 1/1,000,000 of the genome. Three
basic approaches:
•
1. Polymerase chain reaction (PCR). Make
many copies of a specific region of the DNA.
•
2. cell-based molecular cloning: create and
isolate a bacterial strain that replicates a copy of
your gene.
•
3. hybridization: make DNA single stranded,
allow double strands to re-form using a labeled
(e.g. radioactive) version of your gene to make it
easy to detect.
Polymerase Chain Reaction
•
Based on DNA polymerase creating a second strand of DNA.
– Needs template DNA and two primers that flank the region to be
amplified. Primers are short (generally 18-30 bases) DNA
oligonucleotides complementary to the ends of the region being
amplified.
– DNA polymerase adds new bases to the 3' ends of the primers to create
the new second strand.
– go from 1 DNA to 2, then 4, 8, etc: exponential growth of DNA from this
region
– A key element in PCR is a special form of DNA polymerase from
Thermus aquaticus, a bacterium that lives in nearly boiling water in the
Yellowstone National Park hot springs. This enzyme, Taq polymerase,
can withstand the temperature cycle of PCR, which would kill DNA
polymerase from E. coli.
• advantages:
– rapid, sensitive, lots of useful variations, robust (works even with partly
degraded DNA)
• disadvantages:
– Only short regions (up to 2 kbp) can be amplified.
– limited amount of product made
PCR Cycle
•
•
•
•
PCR is based on a cycle of 3
steps that occur at different
temperatures. Each cycle
doubles the number of DNA
molecules: 25-35 cycles
produces enough DNA to see
on an electrophoresis gel.
Each step takes about 1 minute
to complete.
1. Denaturation: make the
DNA single stranded by
heating to 94oC
2. Annealing: hybridize
the primers to the single
strands. Temperature varies
with primer, around 50oC
3. Extension: build the
second strands with DNA
polymerase and dNTPs: 72oC.
Other PCR Images
SSR Genetic Markers
•
. Microsatellites (also called Simple Sequence
Repeats (SSRs) or Short Tandem Repeats
(STRs).
–
–
•
SSRs are short (2-5 bases) sequences that are
repeated several times in tandem:
TGTGTGTGTGTG is 6 tandem repeats of TG.
–
–
–
•
•
Used for genome mapping in humans, plants, and other
organisms.
Also used for forensic DNA analysis (criminal identification)
SSRs are found in and near many genes throughout the
genome--they are quite common and easy to find.
During normal replication of the DNA in the nucleus, DNA
polymerase sometimes slips and creates extra copies or
deletes a few copies of the repeat.
This happens rarely enough that most people inherit the
same number of repeats that their parents had (i.e. SSRs
are stable genetic markers), but often enough that numerous
variant alleles exist in the population.
SSRs are detected using PCR. A pair of primers
binds to sites flanking the SSR locus. This region
is amplified by PCR, and the products are
examined on an electrophoresis gel.
The FBI uses the Combined DNA Index System
(CODIS) in crime investigations. CODIS is a set
of 13 microsatellite loci scattered throughout the
genome, with many alleles (repeat numbers) for
each locus.
SSR Example
Real Time PCR
•
Used to quantitate gene
expression:
•
First, convert all mRNA in
a sample to single
stranded cDNA using
reverse transcriptase,
Then, amplify the region of
interest using specific
primers.
Measure the amount of
DNA made by PCR using
a dye that only binds to
double stranded DNA.
SYBR Green is a
commonly used dye.
The more mRNA/cDNA
you started with, the faster
the fluorescence builds up
•
•
•
In this RT-PCR experiment, 30, 300, or 3000
copies of the cDNA were subjected to PCR.
The more copies of the cDNA, the sooner the
fluorescence rises and saturates the detector.
Cell-Based Molecular Cloning
• The original recombinant DNA technique: 1974 by Cohen and
Boyer.
• Several key players:
•
1. restriction enzymes. Cut DNA at specific sequences. e.g.
EcoR1 cuts at GAATTC and BamH1 cuts at GGATCC.
•
2. Plasmids: independently replicating DNA circles (only
circles replicate in bacteria). Foreign DNA can be inserted into a
plasmid and replicated.
– Plasmids for cloning carry drug resistance genes that are used for
selection.
– Spread antibiotic resistance genes between bacterial species
•
•
3. DNA ligase. Attaches 2 pieces of DNA together.
4. transformation: DNA manipulated in vitro can be put back
into the living cells by a simple process .
– The transformed DNA replicates and expresses its genes.
Plasmid Vectors
•
To replicate, a plasmid must be circular,
and it must contain a replicon, a DNA
sequence that DNA polymerase will bind
to and initiate replication. Also called “ori”
(origin of replication).
–
–
•
•
•
Replicons are usually species-specific.
Some replicons allow many copies of the
plasmid in a cell, while others limit the copy
number or one or two.
Plasmid cloning vectors must also carry a
selectable marker: drug resistance.
Transformation is inefficient, so bacteria
that aren’t transformed must be killed.
Most cloning vectors have a multiple
cloning site, a short region of DNA
containing many restriction sites close
together (also called a polylinker). This
allows many different restriction enzymes
to be used.
Most cloning vectors use a system for
detecting the presence of a recombinant
insert, usually the blue/white betagalactosidase system.
Basic Cloning Process
• Plasmid is cut open with a restriction
enzyme that leaves an overhang: a
sticky end
• Foreign DNA is cut with the same
enzyme.
• The two DNAs are mixed. The
sticky ends anneal together, and DNA
ligase joins them into one
recombinant molecule.
• The recombinant plasmids are
transformed into E. coli by treating the
cells with Ca2+ ions and a heat shock.
• Cells carrying the plasmid are
selected by adding an antibiotic: the
plasmid carries a gene for antibiotic
resistance.
DNA Ligase in Action!
I hope
Cloning Vector Types
• For different sizes of DNA:
– plasmids: up to 5 kb
– phage lambda (λ) vectors (also cosmids): up
to 50 kb
– BAC (bacterial artificial chromosome): 300 kb
– YAC (yeast artificial chromosome): 2000 kb
• Expression vectors: make RNA and
protein from the inserted DNA
– shuttle vectors: can grow in two different
species, usually E. coli and something else
Bacterial Artificial Chromosomes
•
•
•
•
•
Based on the F plasmid that
confers the ability to
conjugate.
Low copy number plasmids
(usually 1 per cell), which
prevents crossing over
between repeated
sequences in the insert DNA
But, low copy number also
means low DNA yield.
Transformed into E. coli
using electroporation,
subjecting the bacteria to a
high voltage electrical field.
BACs are currently the most
common vector for large
inserts such as eukaryotic
genome projects.
Expression Vectors
• Various types:
– RNA only: use a vector that has a phage T7 promoter in front of
the cloning site, and an inducible T7 polymerase gene.
Induction by the lac operon repressor gene and the synthetic
inducer IPTG (isopropyl thiogalactoside).
– polypeptide or fragments of polypeptides: can be produced in E.
coli using a ribosome binding site in addition to the promoter.
Need to use the correct reading frame.
• can also be done as a fusion protein (your protein fused to a
marker protein) for easy detection or purification
– post-translationally modified or intron-spliced protein: needs to
be expressed in eukaryotic cells. Needs eukaryotic promoter and
polyadenylation (poly-A addition) signals, plus a selectable
marker that works in eukaryotes (since most antibiotics are
specific for prokaryotes).
Example Expression Vector
•
•
•
•
•
For eukaryotic expression, this vector (from
Invitrogen) has a cauliflower mosaic virus
promoter (PCMV), a bovine growth hormone
polyadenlyation site (BGHpA).
The DNA inserted at “hORF” gets fused to a
short peptide called an epitope, for which
very specific anitbodies exist. It also gets
fused to 6 histidines, which allow easy
purification on a column that has nickel ions
bound to it (an affinity tag).
For growth in mammalian cells, it has an
SV40 viral origin of replication (SV40ori),
and a zeocin resistance gene (Zeocin, with
SV40 promoter/enhancer and SV40 poly A
site).
For growth in E. coli it has the ColE1
replicon. Zeocin works as a selectable
marker in bacteria as well as in eukaryotic
cells.
There is also a T7 promoter for making RNA
from the inserted gene, and an f1 origin of
replication for making single stranded DNA
(useful for sequencing).
Hybridization
• The idea is that if DNA is made
single stranded (melted), it will pair
up with another DNA (or RNA) with
the complementary sequence. If
one of the DNA molecules is
labeled, you can detect the
hybridization.
• Basic applications:
– Southern blot: DNA digested by a
restriction enzyme then separated
on an electrophoresis gel
– Northern blot: uses RNA on the
gel instead of DNA
– in situ hybridization: probing a
tissue
– colony hybridization: detection of the
clone you want from a library
– Microarrays: quantitation of all
mRNAs in a cell
Labeling
• Several methods. One is
random primers labeling:
– use 32P-labeled dNTPs
– short random oligonucleotides
as primers (made
synthetically)
– single stranded DNA template
(made by melting double
stranded DNA by boiling it)
– DNA polymerase copies the
DNA template, making a new
strand that incorporates the
label.
• Can also label RNA
(sometimes called riboprobes),
use non-radioactive labels
(often a small molecule that
labeled antibodies bind to, or a
fluorescent tag), use other
labeling methods.
Hybridization Process
•
•
•
•
All the DNA must be single stranded
(melt at high temp or with NaOH).
Occurs in a high salt solution at about
60oC. Complementary DNAs find each
other and stick. Need to wash off nonspecific binding.
Stringency: how perfectly do the DNA
strands have to match in order to stick
together? Less than perfect matches will
occur at lower stringency (e.g. between
species). Increase stringency by
increasing temp and decreasing salt
concentration.
Rate of hybridization depends on DNA
concentration and time (Cot), as well as
GC content and DNA strand length.
Autoradiography. Put the labeled DNA
next to X-ray film; the radiation fogs the
film.
Southern Blot
•
•
•
•
•
•
Used to detect a specific DNA
sequence in a complex mixture,
such as genomic DNA
Cut DNA with restriction enzyme,
then run on an electrophoresis
gel.
Suck buffer through the gel into a
nitrocellulose membrane. The
buffer goes through but the DNA
sticks to the membrane.
Fix the DNA to the membrane
permanently with UV or heat
Hybridize membrane to a
radioactive probe, then detect
specific bands with
autoradiography.
Northern blot uses RNA instead.
RNA must be denatured so the
distance it migrates on the gel is
proportional to its length: put
formaldehyde in the gel.
In Situ Hybridization
• Using tissues or tissue
sections.
• Often done with nonradioactive probes
because the high energy
of 32P emission gives an
imprecise view of where
the hybridization is.
• Counterstain the tissue
so non-hybridizing parts
are visible.
Microarrays
•
•
Microarrays are used to detect
messenger RNAs from many genes
simultaneously, to get a semiquantitative estimate of gene
expression in a given cell type or
growth condition.
Probes from many different genes
are placed in an array on a glass
microscope slide, then hybridized to
cDNA made from messenger RNA
isolated from a tissue. You see
which genes are active in that tissue.
– Mostly done with 60mers: 60 bases
long, synthetic oligonucleotides,
made using sequence information
from the genes.
– cDNA is fluorescently labeled
•
•
Often 2 conditions are compared
(control and experimental), using red
and green fluorescent tags.
Semi-quantitative
Determining DNA Sequence
•
We are going to discuss 2 DNA sequencing methods.
– The Sanger sequencing method is currently thought to produce the most accurate and
–
•
•
longest sequences of any method. However, it is slow and expensive. It was invented in
1976, and was the only practical sequencing method for many years.
Illumina sequencing is a currently popular type of Next-generation sequencing, a group
of methods invented starting around 1995. These methods are massively parallel and
generate huge amounts of data quickly and cheaply.
The Sanger method uses DNA polymerase to synthesize a second DNA
strand that is labeled. DNA polymerase always adds new bases to the 3’
end of a primer that is base-paired to the template DNA.
An essential part of Sanger sequencing is chain terminator nucleotides:
dideoxy nucleotides (ddNTPs), which lack the -OH group on the 3' carbon
of the deoxyribose. When DNA polymerase inserts one of these ddNTPs
into the growing DNA chain, the chain terminates, as nothing can be
added to its 3' end.
Sequencing Reaction
•
The template DNA is usually single stranded
DNA, which can be produced from plasmid
cloning vectors that contain the origin of
replication from a single stranded
bacteriophage such as M13 or fd. The primer is
complementary to the region in the vector
adjacent to the multiple cloning site.
•
Sequencing is done by having 4 separate
reactions, one for each DNA base.
All 4 reactions contain the 4 normal dNTPs, but
each reaction also contains one of the ddNTPs.
In each reaction, DNA polymerase starts
creating the second strand beginning at the
primer.
When DNA polymerase reaches a base for
which some ddNTP is present, the chain will
either:
– terminate if a ddNTP is added, or:
– continue if the corresponding dNTP is
added.
– which one happens is random, based on
ratio of dNTP to ddNTP in the tube.
However, all the second strands in, say, the A
tube will end at some A base: you get a
collection of DNAs that end at each of the A's in
the region being sequenced.
•
•
•
•
Electrophoresis
•
•
•
The newly synthesized DNA from
the 4 reactions is then run (in
separate lanes) on an
electrophoresis gel.
The DNA bands fall into a ladderlike sequence, spaced one base
apart. The actual sequence can
be read from the bottom of the gel
up.
Automated sequencers use 4
different fluorescent dyes as tags
attached to the dideoxy
nucleotides and run all 4 reactions
in the same lane of the gel.
– Today’s sequencers use capillary
electrophoresis instead of slab
gels.
– Radioactive nucleotides (32P) are
used for non-automated
sequencing.
•
Sequencing reactions usually
produce about 500-1000 bp of
good sequence.
Next Generation Sequencing
•
Recently a number of faster and cheaper sequencing methods have been
developed.
–
–
•
We are going to discuss the Illumina sequencing method, which is probably the most widely
used at present. But, there are several other common methods that can be called “next
generation”: Ion Torrent, 454, SOLiD, and more.
Third generation sequencing: getting long sequences from single molecules, is getting
started.
Applications:
– sequencing of whole bacterial genomes in a single run
– sequencing genomes of individuals
– metagenomics: sequencing DNA extracted from environmental samples. The
large majority of microorganisms can’t be grown in the lab, and have only been
detected by sequencing DNA from the environment.
– Deep sequencing: looking for rare variants in a single amplified region, in
tumors or viral infections
– RNASeq: sequencing total cellular mRNA converted to cDNA.
– ChIP-Seq: identifying the binding sites of transcription factor proteins by
immunoprecipitation of the proteins attached to the DNA, then sequencing the
DNA.
Illumina Sequencing
•
•
•
Many sequencing methods have been
invented, and it’s still a very active
area of research.
Most use the concept of sequencing
by synthesis: starting with a primer,
use DNA polymerase to add new
bases are added one at a time, paying
attention to which base is added.
In the Illumina method, fluorescent
tags attached to the 3’ OH group are
used.
–
•
•
•
•
Each of the 4 nucleotides has a different
colored tag.
The fluorescent tags block the 3’-OH
of the new nucleotide, and so the next
base can only be added when the tag
is removed.
A cycle: add one new base, then read
its color, then remove the fluorescent
tag to give a free 3’ OH group.
Repeat the cycle up to 200 times.
End up with 200 bp of sequence
information.
More Sequencing
•
•
•
To get enough signal from the DNA molecule being sequenced, each DNA
molecule needs to be amplified using PCR.
For the Illumina method, this is done by attaching individual DNA molecules to a
solid surface, then PCR-amplifying them in place, giving tiny spots with about a
million identical copies.
The DNA polymerase sequencing reactions are then monitored with a high
resolution video camera.
Sequence
Assembly
•
•
•
•
DNA is sequenced in very small fragments: 100-1000 bp. Compare this to
the size of the human genome: 3,000,000,000 bp.
In shotgun sequencing (the usual method), DNA is fragmented randomly.
Enough data is collected so each base is read 10 times or more on
average.
In principle, assembling a sequence is just a matter of finding overlaps and
combining them.
In practice:
– most genomes contain multiple copies of many sequences,
– there are random mutations (either naturally occurring cell-to-cell variation or
generated by PCR or cloning),
– there are sequencing errors and misreadings,
– sometimes the cloning vector itself is sequenced
– sometimes miscellaneous junk DNA gets sequenced
Sequence Assembly
• The big problem with all current sequencing methods: you only get
very short reads: 200 bp maximum for Illumina, up to 1000 bp for
the older (slower, much more expensive) Sanger method, etc.
– The human genome is 23 DNA molecules (chromosomes) that total 3 billion bp.
Human chromosomes are 50-250 million base pairs long.
– You need to assemble the tiny reads into much longer contigs (continuous
sequences). With a perfectly sequenced genome, the final contigs would be
identical to the DNA sequence of the chromosomes.
• How reads are assembled into contigs: overlapping sequences.
Assembly Problems
•
•
Chromosomes, especially eukaryotic chromosomes, are filled with sequences that
are repeated many times. If you have a read from a repeated sequence, how do you
know which copy it is?
– Some repeats are next to each other (tandem repeats) and some are scattered
all over the genome (dispersed repeats).
The main solution to this problem is to start with longer DNA template molecules and
sequence both ends. You don’t know the sequence in between, but you do know
how far apart the ends are. This often allows you to jump over repeated sequences.
– It’s not perfect, and even now there are no human chromosomes sequenced to 100%
accuracy.
RNA Seq
•
•
This is a new method, published in 2008. It is
probably the method of choice today for
analyzing RNA content. Also called whole
transcriptome shotgun sequencing.
Very simple: isolate messenger RNA, break it into
200-300 base fragments, reverse transcribe, then
perform large scale sequencing using Illumina or
other massively parallel sequencing technology.
– RNA sequences then compared to genomic
sequences to find which gene is expressed and
also exon boundaries
– Exon boundaries are a problem with very short
reads: you might only have a few bases of overlap
to one of the exons.
•
•
As with all RNA methods, which RNAs are
present depends on the tissue analyzed and
external conditions like environmental stress or
disease state.
Get info on copy number over a much wider
range than microarrays. Also detects SNPs.
Reporter Genes
•
•
•
We often wish to know the location of a gene
product, within an organism or within a cell.
One well-used method is to make an antibody
against your protein and treat tissues mounted
for microscopy with it. Your antibody is then
detected with a second antibody that is
conjugated with a fluorescent tag or an
enzyme with an easily visualized product.
It is often easier to use a DNA based method:
replace the protein-coding part of the gene
with an easily detected reporter gene.
–
–
•
This allows you to separate the effects of the gene
control sequences from the effects of the protein itself.
The reporter gene can be fused with just the promoter
region, or it can be fused to the protein itself, where it
acts as a separate domain.
Often done in conjunction with confocal
microscopy: examining the same image with
visible light and fluorescence.
Some Common Reporter Genes
• GFP: green fluorescent protein. A
small protein isolated from jellyfish. It
requires no substrates or co-factors.
Several variants give different colors: red
fluorescent protein, for example.
– It still works when it is fused to other proteins: it
acts as a separate protein domain.
• GUS: beta-glucuronidase. Produces an
insoluble dye with an artificial substrate.
Works well with plants and fungi because
they don’t have any naturally occurring
beta-glucuronidase.
• lacZ: E. coli beta-galactosidase. Used in
conjunction with the artificial substrate
Xgal, which produces an insoluble blue
dye.
• Luciferase: from fireflies. Requires a
substrate, but doesn’t require any UV
stimulation, which means no autofluorescence by other compounds in the
organism.
Silencing Genes with RNAi
•
•
•
•
•
RNA interference (RNAi) can silence a gene
by cleaving its mRNA. This is called a
knockdown, and usually decreases the
specific mRNA by 50-90%.
RNAi is accomplishing by introducing into the
cell a short double stranded RNA that
matches the mRNA sequence you wish to
attack.
The double stranded RNA is processed by
the Dicer enzyme, which inserts one strand
into the RISC complex. This RNA strand is
now called the guide RNA.
The RISC complex with its guide RNA
hybridizes to the mRNA, and then acts as an
endonuclease to cleave the mRNA.
Getting the double stranded RNA into the cell
can be done by various transformation
methods, or it can be made from a plasmid
introduced into the cell.