LECTURE 2.
DNA Sequencing and
Structural Genomics
Sequencing with DNA Polymerases and Chain
Terminators (Sanger sequencing)
Synthesize new DNA using cloned DNA as template.
Depends on hybridization of a primer to the DNA template.
1980 Nobel Prize
Fred Sanger
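The chain-terminator idea can be sketched in a few lines of Python. This is an idealized toy model, not a description of any real protocol: it assumes every position past the primer yields exactly one terminated product, and the function names (`sanger_lanes`, `read_gel`) are made up for illustration. The newly synthesized strand is given 5'->3', with the primer occupying its first `primer_len` bases.

```python
def sanger_lanes(new_strand, primer_len):
    """Group terminated product lengths by the ddNTP that ended the chain.

    new_strand: the strand being synthesized, written 5'->3'; its first
    primer_len bases are the primer itself (assumption of this sketch).
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for i in range(primer_len, len(new_strand)):
        base = new_strand[i]
        # A chain terminated at position i has length i + 1.
        lanes[base].append(i + 1)
    return lanes


def read_gel(lanes):
    """Read the sequence off the 'gel': shortest fragment first."""
    base_by_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            base_by_length[length] = base
    return "".join(base_by_length[n] for n in sorted(base_by_length))


lanes = sanger_lanes("GATTACAGGCTA", primer_len=4)
print(lanes)
print(read_gel(lanes))
```

Reading the four lanes from the shortest to the longest fragment recovers the sequence downstream of the primer.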
Manual Sanger Sequencing
Properties of DNA Pols used for Sequencing
Enzyme                  3' exo   Processivity*   Rate of polymerase#
Klenow                  (+)      10-50           45
Reverse Transcriptase   (-)      10              5
T7 sequenase**          (-)      2000-3000       300
Taq                     (-)      7500            35-100
Major Problem with Sanger sequencing:
DNA secondary structures form in single-stranded (ss) DNA
through intramolecular Watson-Crick base pairs.
These cause stops and compressions = gel artifacts
(bases appear closer together than normal spacing)
This is especially a problem in GC rich regions
(which form stable "hairpins").
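As a rough illustration of how such hairpins could be spotted in a template before sequencing, the sketch below scans for short inverted repeats (a stem, a loop, then the stem's reverse complement) and reports the GC fraction of each stem. The stem and loop lengths are arbitrary assumptions, the function names are hypothetical, and no thermodynamic stability is computed.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpin_stems(seq, stem_len=6, min_loop=3, max_loop=10):
    """List (start, loop_len, stem, gc_fraction) for candidate hairpins.

    A candidate is a stem of stem_len bases followed, after a loop of
    min_loop..max_loop bases, by the stem's exact reverse complement.
    """
    hits = []
    n = len(seq)
    for i in range(n - 2 * stem_len - min_loop + 1):
        stem = seq[i:i + stem_len]
        partner = reverse_complement(stem)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop  # start of the would-be partner strand
            if j + stem_len > n:
                break
            if seq[j:j + stem_len] == partner:
                gc = sum(b in "GC" for b in stem) / stem_len
                hits.append((i, loop, stem, gc))
    return hits


# Toy template with a GC-rich stem (GCGCCG) around a 4-base loop.
print(find_hairpin_stems("AAGCGCCGTTTTCGGCGCAA"))
```

Stems with a high GC fraction are the ones most likely to give the stops and compressions described above.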
STRATEGIES for DNA SEQUENCING
-DIRECTED SEQUENCING
Start at ends of cloned DNA molecule using UNIVERSAL
PRIMER SITES present in the vector sequence. Design a
new sequencing primer based on the first round of
sequence to continue the job: PRIMER WALKING
USED FOR SMALLER DNAs: cDNAs: <10 KB
-RANDOM SEQUENCING
Fragment the cloned DNA randomly and subclone pieces
into vector. Sequence all clones using UNIVERSAL
PRIMER. Use a computer to align sequence overlaps and
determine the entire sequence of the starting DNA
USE FOR LONG DNAs: BACS, etc. (GENOMIC)
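The "use a computer to align sequence overlaps" step can be sketched with a toy greedy assembler: repeatedly merge the two reads with the longest suffix/prefix overlap. This is only a minimal sketch under strong assumptions (error-free reads, all from the same strand, no repeats); real assemblers are far more elaborate, and the function names here are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that is a prefix of b (>= min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=4):
    """Merge reads greedily by best overlap; returns the remaining contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the contigs stay separate (a gap)
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three overlapping "subclone" reads from a made-up 25-base insert.
insert = "ATGGCGTACGTTAGCCTAGGATCCA"
reads = [insert[12:25], insert[0:12], insert[6:18]]
print(greedy_assemble(reads))
```

If some region is not covered by any read, the loop stops with more than one contig, which is the gap problem that comes up again with whole-genome shotgun sequencing.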
PRIMER WALKING
RANDOM SEQUENCING
BAC clone
Genomes are LARGE and impractical to
sequence by manual methods
[Figure: gene counts of example genomes: 50; 4100; 6000; 18,000; 35-70,000?; 14,000]
BOTTLENECKS IN LARGE SCALE
AUTOMATED SEQUENCING:
-sub-cloning of target DNA into appropriate vectors
-preparation of DNA of quality suitable for sequencing
-setting up sequencing reactions
-pouring and loading sequencing gels
-GEL ELECTROPHORESIS ARTIFACTS (due to secondary
DNA structures).
ALTERNATIVES to gels for separating sequencing
products:
-sequencing by HYBRIDIZATION
-Mass Spectrometry
Matrix-Assisted Laser Desorption/Ionization Time of
Flight Mass Spectrometry (MALDI-TOFMS)
-capillary electrophoresis
[Figure: capillary, 40 cm long, 50-100 µm in diameter, with (-) and (+) electrodes at its ends]
1. Ultra-thin, long gels can be run at very high voltage,
2kV to 10kV: short runs, theoretically good separation
2. Samples can be directly loaded from 96-well plate
format by electrophoresis: easy to automate
3. Use non-polymerized gel media: can be
automatically removed and replaced in between runs;
don't have to take apart and make sequencing gels
4. Capillaries can be clustered: new automated model
has a 96-capillary array.
The ABI 3700
Automated Sequencer:
Quick, Cheap Genome
Sequencing
Emission Spectra of dyes used
with the ABI3700
Front View
Fully Automated System that
Requires 5 min of manpower per run:
Example: Let's say that a 9 kV run reliably gives us 600 bp
per capillary per run:
4 runs (10 hr day) X 96 X 600= 230,400 bp per day!
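The same arithmetic as a small Python sketch, so the numbers can be varied. The 4 runs, 96 capillaries and 600 bp come from the slide; the 150 kb BAC at 10-fold redundancy in the usage line is a hypothetical example, not a figure from the lecture.

```python
import math


def daily_throughput(runs_per_day, capillaries, read_length_bp):
    """Raw bases read per machine per day."""
    return runs_per_day * capillaries * read_length_bp


def days_to_sequence(total_bases, bp_per_day, machines=1):
    """Whole days needed to read total_bases of raw sequence."""
    return math.ceil(total_bases / (bp_per_day * machines))


per_day = daily_throughput(runs_per_day=4, capillaries=96, read_length_bp=600)
print(per_day)  # 230400

# Hypothetical: a 150 kb BAC read to 10-fold redundancy = 1.5 Mb of raw reads.
print(days_to_sequence(150_000 * 10, per_day))  # 7
```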
Human Genome Project Goals: Three Orderly
Steps to Complete the Genome Sequence
1) Complete Genetic Map
The 1999 map is based on 42,000 STSs and ESTs (representing 30,000
genes) and 1102 informative microsatellite markers
http://www.ncbi.nlm.nih.gov/genemap/
Currently, ~4.8 million Single
Nucleotide Polymorphisms
(SNPs) are mapped.
1 SNP every 1200 bp, on average
~25,000 associated with genes
2) Physical Map is largely assembled
BAC Contigs for the Human Genome
3) As of 25 May 1999, ~19% of the genome is
sequenced (+63% in “draft”)
http://www.ncbi.nlm.nih.gov/genome/seq/
Goal: to finish the entire sequence by 2003 (original goal was 2005)
Cost: $3 billion
Shotgun Sequencing the Human Genome:
>90% of the genome has been completed
since Spring 2000 by Celera
Venter JC, Adams MD, Sutton GG, Kerlavage AR, Smith HO, Hunkapiller M 1998.
Shotgun sequencing of the human genome. Science 280:1540-1542.
Human Genome Plan is ordered: genetic map, contig,
completely sequence the BACs that make up the contigs
Shotgun Approach: (already proven successful for many
bacterial genomes and in 2000 for Drosophila):
-just start sequencing random clones without bothering to
order them
-sequence them only from the ends (not completely)
-sequence enough random clones this way and you will
cover the entire genome
-use sophisticated computer programs to put the genome
back together
Shotgun Approach: Randomly sequence clones
from different types of libraries
Covering the genome. A 100-kbp portion of the genome
showing expected clone coverage needed for shotgun sequencing.
35 billion bases to be sequenced
Time: less than 1 year
Cost: ~$250 million
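A back-of-the-envelope check on these numbers: fold coverage is total bases read divided by genome size, and under an idealized model where reads land uniformly at random (Poisson / Lander-Waterman style), the expected fraction of the genome hit by no read is about e^(-coverage). The genome size of 3.2 billion bp used below is an assumption, not a number from the slide, and the model ignores repeats and cloning bias, which is where real gaps come from.

```python
import math


def fold_coverage(total_bases_sequenced, genome_size):
    """Average number of times each base is read."""
    return total_bases_sequenced / genome_size


def expected_uncovered_fraction(coverage):
    """Idealized Poisson estimate of the fraction of bases never sampled."""
    return math.exp(-coverage)


def reads_needed(genome_size, read_length, coverage):
    """Number of reads of a given length needed for a target coverage."""
    return math.ceil(coverage * genome_size / read_length)


ASSUMED_GENOME_SIZE = 3.2e9  # bp; assumption, not from the slide
c = fold_coverage(35e9, ASSUMED_GENOME_SIZE)
print(round(c, 1))
print(expected_uncovered_fraction(c))

# The 100-kbp example region at 10-fold coverage with 500 bp reads
# (read length and coverage chosen only for illustration):
print(reads_needed(100_000, 500, 10))
```

Under these assumptions, 35 billion bases works out to roughly 11-fold coverage, and random sampling alone would leave only a tiny fraction untouched.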
April 2000: Celera finishes sequencing phase of the project:
11X coverage of the genome of four-five individuals
September, 2000: Initial assembly of the human genome completed
(using sequences in public databases as well)
October 2000: Sequencing phase of mouse genome project
completed; ~9 billion base pairs.
Problems with this approach:
Whose DNA was sequenced?
Craig Venter (Celera)
-only 90-95% of genome can be sequenced:
many gaps for others to fill
-Sequence will not be annotated and may not
be released in a timely fashion: in fact, you
need to subscribe to Celera for this info
Cost: $450,000 minimum per University
-Are they doing this just to get a jump on
patenting genes? Ethical problems??
What about the Genome Consortium?
May 1999: ~19% sequenced (+63% in “draft”)
Sept 2000: ~24% sequenced (+66% in “draft”)
Oct 18, 2001: ~47% sequenced (+51% in “draft”)
Genome Watch, 23 Oct 2002
Draft:    5.8%
Finished: 92.8%
Total:    98.6%
Was Shotgun Sequencing of the Human Genome Successful?
Waterston RH, Lander ES, Sulston JE.
2002. On the sequencing of the human genome. PNAS USA 99:3712-371.
The Celera assembly depended
on BAC tiles in the public database;
gaps in the Celera sequence were
filled with sequence obtained from the
public database
NO!
SORE
LOSERS!
The Truth:
Both Approaches are Required
To Sequence Large Genomes!
Myers EW, Sutton GG, Smith HO, Adams MD, Venter JC.
2002. On the sequencing and assembly of the human genome.
Proc Natl Acad Sci U S A 99:4145-4146.
Where are we now?
Estimates are that 2-20% of the genome still remains to be sequenced
Completion of the genome is likely still 2-5 years away
Gaps in BACs to fill; “unclonable” sequences?
For example, there is still controversy over how many genes are encoded in
the human genome: 30,000 or 70,000?
Chr 21 BAC/gene map
Chr 15 BAC/gene map
See http://www.ncbi.nih.gov/cgi-bin/Entrez/hum_srch