DNA Sequencing and Gene Analysis
Determining DNA Sequence
• Two methods were invented around 1976, but only one is widely
used: the chain-termination method invented by Fred Sanger.
• Uses DNA polymerase to synthesize a second DNA strand that is
labeled. Recall that DNA polymerase always adds new bases to the
3' end of a primer.
• Also uses chain terminator nucleotides: dideoxy nucleotides
(ddNTPs), which lack the -OH group on the 3' carbon of the
deoxyribose. When DNA polymerase inserts one of these ddNTPs
into the growing DNA chain, the chain terminates, as nothing can
be added to its 3' end.
Sequencing Reaction
• The template DNA is usually single-stranded DNA, which can be
produced from plasmid cloning vectors that contain the origin of
replication from a single-stranded bacteriophage such as M13 or fd.
Infecting bacteria containing this vector with a “helper phage”
causes single-stranded phage to be produced. The phage DNA contains
the cloned insert.
• The primer is complementary to the region in the vector adjacent
to the multiple cloning site.
• Sequencing is done by having 4 separate reactions, one for each
DNA base.
• All 4 reactions contain the 4 normal dNTPs, but each reaction also
contains one of the ddNTPs.
• In each reaction, DNA polymerase starts creating the second strand
beginning at the primer.
• When DNA polymerase reaches a base for which some ddNTP is
present, the chain will either:
– terminate if a ddNTP is added, or:
– continue if the corresponding dNTP is added.
– which one happens is random, based on the ratio of dNTP to ddNTP
in the tube.
• However, all the second strands in, say, the A tube will end at
some A base: you get a collection of DNAs that end at each of the
A's in the region being sequenced.
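The random choice between termination and extension can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the template is written 3' to 5' so the new strand grows left to right, fragment lengths are counted from the first base after the primer, and the names new_strand, simulate_tube and dd_fraction are made up for this sketch.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def new_strand(template_3to5):
    """Strand that DNA polymerase would synthesize, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def simulate_tube(template_3to5, dd_base, dd_fraction=0.2, molecules=500, seed=1):
    """Fragment lengths seen in the tube that contains one ddNTP (dd_base).

    dd_fraction is the assumed chance that the ddNTP, rather than the
    normal dNTP, is inserted each time dd_base is needed.
    """
    rng = random.Random(seed)
    strand = new_strand(template_3to5)
    lengths = set()
    for _ in range(molecules):
        for i, base in enumerate(strand):
            if base == dd_base and rng.random() < dd_fraction:
                lengths.add(i + 1)  # chain terminates here
                break
    return sorted(lengths)


# The new strand for this template is ATGCCTAG, so every fragment
# in the A tube has to end at position 1 or position 7.
print(simulate_tube("TACGGATC", "A"))
```

Whatever the ratio of dNTP to ddNTP, the A tube can only hold fragments that end at an A; the ratio changes how many molecules stop at each A, not where they can stop.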
Electrophoresis
• The newly synthesized DNA from the 4 reactions is then run (in
separate lanes) on an electrophoresis gel.
• The DNA bands fall into a ladder-like sequence, spaced one base
apart. The actual sequence can be read from the bottom of the gel
up.
• Automated sequencers use 4 different fluorescent dyes as tags and
run all 4 reactions in the same lane of the gel.
• Radioactive nucleotides (32P) are used for non-automated
sequencing.
• Sequencing reactions usually produce about 500 bp of good
sequence.
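Reading the gel from the bottom up amounts to sorting the bands by fragment length, because the shortest fragments run farthest. A minimal Python sketch, assuming each lane is given as a list of fragment lengths (the function name read_gel and the input layout are invented for this example):

```python
def read_gel(lanes):
    """Read a sequence from four lanes of fragment lengths.

    lanes maps each base ("A", "C", "G", "T") to the lengths of the
    fragments found in that lane. Shortest fragments are at the bottom
    of the gel, so sorting by length reads the gel from the bottom up.
    """
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = {"A": [1, 7], "T": [2, 6], "G": [3, 8], "C": [4, 5]}
print(read_gel(lanes))  # ATGCCTAG
```

The sequence read this way is the newly synthesized strand, 5' to 3'; the template is its complement.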
Single Nucleotide Polymorphisms
• Looking at many individuals, you can see that most bases in their
DNA are the same in everyone. However, some bases are different in
different individuals. These changes are single nucleotide
polymorphisms (SNPs).
• SNPs are found everywhere in the genome, and they are inherited in
a regular Mendelian fashion. These characteristics make them good
markers for finding disease genes and determining their inheritance.
• Lots of ways to detect SNPs, many of which are easy to automate.
• Primer extension: make a primer 1 base short of the SNP site, and
then extend the primer using DNA polymerase with nucleotides having
different fluorescent tags.
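The idea that most positions agree and a few differ can be shown with a short Python sketch. It assumes the sequences from different individuals are already aligned and equal in length; find_snps is a hypothetical helper, not a standard tool.

```python
def find_snps(sequences):
    """Return {position: set of bases} for positions where individuals differ.

    Assumes the sequences are already aligned and equal in length.
    Positions are 0-based.
    """
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must all have the same length")
    snps = {}
    for pos in range(length):
        bases = {seq[pos] for seq in sequences}
        if len(bases) > 1:  # more than one base seen at this position
            snps[pos] = bases
    return snps


individuals = ["ACGTAC", "ACGTAC", "ACATAC", "ACGTTC"]
print(find_snps(individuals))
```

For these four made-up sequences the variable positions are 2 (G or A) and 4 (A or T); every other position is identical in all individuals.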
Gene Detection
• It is surprisingly hard to be sure that a given genomic
sequence is a gene: that it is ever expressed as RNA.
• Protein-coding regions are open reading frames (ORFs):
long stretches that don’t contain in-frame stop codons.
• But, human genes often contain long introns and very
short exons, and some parts of genes are introns in one
cell type but exons in other cell types. So, finding all the
pieces of a gene can be a challenge.
• Three questions:
– is a given DNA sequence ever expressed?
– is the sequence expressed in a given cell type or set of
conditions?
– what is the intron/exon structure of the sequence?
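A rough Python sketch of an ORF scan follows. It only looks at the three forward reading frames and reports stretches with no in-frame stop codon; it ignores the reverse strand, start codons and introns, so it is an illustration of the idea, not a gene finder. The name open_reading_frames and the min_codons cutoff are assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_reading_frames(dna, min_codons=3):
    """List (frame, start, end) for stop-free stretches in the forward frames.

    start and end are 0-based positions, end exclusive; the stop codon
    itself is not included in the reported stretch.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = frame
        pos = frame
        while pos + 3 <= len(dna):
            if dna[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((frame, start, pos))
                start = pos + 3  # next stretch begins after the stop
            pos += 3
        if (pos - start) // 3 >= min_codons:
            orfs.append((frame, start, pos))
    return orfs


print(open_reading_frames("ATGAAACCCGGGTAAATG"))
# [(0, 0, 12), (1, 4, 16), (2, 2, 17)]
```

Even this tiny sequence has a stop-free stretch in every frame, which is one reason an ORF alone does not prove that a sequence is a gene.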
Evolutionarily Conserved Sequences
• When looking across different species, most DNA sequences are not conserved.
• However, the exons of genes are often highly conserved, because their function is necessary for life.
• Zoo blot: a Southern blot containing genomic DNA from many species. Probe it with the sequence in question:
exons will hybridize with other species’ DNA, while introns and non-gene DNA won’t.
• Computer-based homology search: BLAST search. Do similar sequences appear in the nucleotide databases?
Especially chimpanzee and mouse, which have complete genome sequences available.
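To give a feel for what a homology search measures, here is a deliberately crude Python sketch that slides a query along another sequence and reports the best fraction of identical bases. This is not how BLAST works (BLAST uses word seeding, gapped extension and statistical scoring); the function best_ungapped_identity is a made-up stand-in for illustration only.

```python
def best_ungapped_identity(query, subject):
    """Slide query along subject with no gaps.

    Returns (best fraction of identical bases, offset in subject).
    Returns (0.0, 0) if the query is longer than the subject.
    """
    best = (0.0, 0)
    for offset in range(len(subject) - len(query) + 1):
        window = subject[offset:offset + len(query)]
        matches = sum(q == s for q, s in zip(query, window))
        identity = matches / len(query)
        if identity > best[0]:
            best = (identity, offset)
    return best


print(best_ungapped_identity("ACGT", "TTACGATT"))  # (0.75, 2)
```

A conserved exon would be expected to score high against the corresponding region in another species, while intron or non-gene DNA would usually score near the background level.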
Detecting Gene Expression
• Northern blots: RNA is extracted from various tissues or
experimental conditions, run on an electrophoresis gel, then probed
with a specific DNA sequence.
Detecting Gene Expression
• Real-time PCR:
– first convert all mRNA in a sample to cDNA using reverse
transcriptase,
– then amplify the region of interest using specific primers.
– Use a fluorescent probe to detect and quantitate the specific
product as it is being made by the PCR reaction.
– The probe carries two components, a fluorescent reporter and a
quencher. While both are attached to the probe, the fluorescence is
quenched. When one part is removed by the Taq polymerase, the
quenching stops and fluorescence can be detected.
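One common way to turn the fluorescence signal into a quantity is to note the cycle at which the product first crosses a fixed threshold. The Python sketch below assumes ideal amplification (the product doubles every cycle when efficiency is 1.0) and an arbitrary threshold; the function threshold_cycle and all the numbers are illustrative assumptions, not values from the slides.

```python
def threshold_cycle(start_copies, threshold=1e9, efficiency=1.0, max_cycles=40):
    """Cycle at which the product first reaches the detection threshold.

    Each cycle multiplies the product by (1 + efficiency); efficiency=1.0
    means perfect doubling. Returns None if the threshold is never reached.
    """
    copies = float(start_copies)
    for cycle in range(1, max_cycles + 1):
        copies *= 1 + efficiency
        if copies >= threshold:
            return cycle
    return None


print(threshold_cycle(1000))  # 20
print(threshold_cycle(8000))  # 17
```

In this idealized model a sample with 8 times more starting cDNA crosses the threshold 3 cycles earlier, since 2 to the power 3 is 8.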
Expressed Sequence Tags
• ESTs are cDNA clones that have had a single round of sequencing
done from one end.
• First extract mRNA from a given tissue. Then convert it to cDNA and
clone.
• Sequence thousands of EST clones and save the results in a
database.
• A search can then show whether your sequence was expressed in
that tissue.
– quantitation issues: some mRNAs are present in much higher
concentration than others. Many EST libraries are “normalized” by
removing duplicate sequences.
• Also can get data on transcription start sites and exon/intron
boundaries by comparing to genomic DNA
– but sometimes need to obtain the clone and sequence the rest of it
yourself.
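Searching an EST database can be pictured as counting, per tissue, the reads that contain your sequence on either strand. The Python sketch below uses exact substring matching on a tiny made-up library; a real search would use an alignment tool that tolerates mismatches, and the names est_hits and reverse_complement are chosen just for this example.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def est_hits(query, est_library):
    """Count EST reads per tissue that contain the query on either strand.

    est_library maps a tissue name to a list of EST read sequences.
    """
    rc = reverse_complement(query)
    return {
        tissue: sum(query in read or rc in read for read in reads)
        for tissue, reads in est_library.items()
    }


library = {
    "liver": ["GGGACGTACCC", "TTTTTTTT"],
    "brain": ["AAAGGTACGTCC"],
    "muscle": ["CCCCCCCC"],
}
print(est_hits("ACGTAC", library))
# {'liver': 1, 'brain': 1, 'muscle': 0}
```

Because many libraries are normalized, a hit count like this says whether the sequence was seen in a tissue, not how abundant the mRNA was.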
Etc.
• New techniques in DNA/RNA technology
are being developed constantly. The main
goal is to increase reliability and decrease
cost. Primarily the aim is to automate as
much as possible.
• Just a few techniques we are not going to
discuss: RACE, SAGE, differential
display, S1 nuclease protection
Protein Methods
• It is important to be sure that the protein product of a gene is
made, and to know where in the tissue or cell it is made, and how
much is made.
• Most protein detection is based either on making antibodies to the
protein of interest, or on making a fusion protein: your protein
fused to a fluorescent protein.
– GFP: green fluorescent protein. Isolated from jellyfish. Several
variants give different colors. It still works when it is fused to
other proteins.
• Often done in conjunction with confocal microscopy: examining the
same image with visible light and fluorescence.
Antibodies
• If you inject rabbits (usually)
with your protein, the rabbit will
develop an immune response
against it. The antibodies can
be isolated from the blood.
• Antibodies bind very
specifically to the antigen. The
antibodies can be detected by
a labeled second antibody that
binds to the first antibody: fluorescein-labeled goat
anti-rabbit antibody, for instance.