LECTURE 2. DNA Sequencing and Structural Genomics

Sequencing with DNA Polymerases and Chain Terminators (Sanger sequencing)
Synthesize new DNA using cloned DNA as template.
Depends on hybridization of a primer to the DNA template.
1980 Nobel Prize: Fred Sanger

[Figure: Manual Sanger Sequencing]

Properties of DNA Pols used for Sequencing

Enzyme                  3' exo   Processivity*   Rate of polymerase#
Klenow                  (+)      10-50           45
Reverse Transcriptase   (-)      10              5
T7 Sequenase**          (-)      2000-3000       300
Taq                     (-)      7500            35-100

Major Problem with Sanger sequencing:
DNA secondary structures form with ss DNA (intramolecular Watson-Crick base pairs).
Causes Stops and Compressions = Gel Artifacts (bases are closer together than normal spacing).
This is especially a problem in GC rich regions (which form stable "hairpins").

STRATEGIES for DNA SEQUENCING

-DIRECTED SEQUENCING
Start at ends of cloned DNA molecule using UNIVERSAL PRIMER SITES present in the vector sequence.
Design a new sequencing primer based on the first round of sequence to continue the job: PRIMER WALKING
USED FOR SMALLER DNAs: cDNAs: <10 KB

-RANDOM SEQUENCING
Fragment the cloned DNA randomly and subclone pieces into vector.
Sequence all clones using UNIVERSAL PRIMER.
Use a computer to align sequence overlaps and determine the entire sequence of the starting DNA.
USE FOR LONG DNAs: BACS, etc. (GENOMIC)

[Figure: PRIMER WALKING]
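The "use a computer to align sequence overlaps" step of random sequencing can be illustrated with a toy greedy merge. This is only a minimal sketch under strong assumptions that the lecture does not make: reads are error-free, all come from the same strand, and no read is contained inside another. The function names `overlap` and `greedy_assemble` and the `min_len` parameter are hypothetical, chosen for this illustration; real assemblers are far more elaborate.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of a that is a prefix of b.

    Overlaps shorter than min_len are ignored and reported as 0.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list, min_len: int = 3) -> list:
    """Repeatedly merge the pair of sequences with the largest overlap.

    Returns the remaining contigs; more than one contig means that
    gaps were left between non-overlapping reads.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more: stop and report the gaps
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three made-up reads that tile a 21-base sequence with 5-base overlaps
reads = ["TTAGCCTAGGA", "ATGGCGTACG", "GTACGTTAGC"]
print(greedy_assemble(reads))
```

Reads that share no overlap simply remain as separate contigs, which is the toy analogue of the gaps that primer walking or further clones would have to fill.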
[Figure: RANDOM SEQUENCING of a BAC clone]

Genomes are LARGE and impractical to sequence by manual methods
[Figure: gene counts of example genomes: 50 genes; 4100 genes; 6000 genes; 18,000 genes; 35-70,000 genes?; 14,000 genes]

BOTTLENECKS IN LARGE SCALE AUTOMATED SEQUENCING:
-sub-cloning of target DNA into appropriate vectors
-preparation of DNA of quality suitable for sequencing
-setting up sequencing reactions
-pouring and loading sequencing gels
-GEL ELECTROPHORESIS ARTIFACTS (due to secondary DNA structures)

ALTERNATIVES to gels for separating sequencing products:
-sequencing by HYBRIDIZATION
-Mass Spectrometry: Matrix-Assisted Laser Desorption/Ionization Time of Flight Mass Spectrometry (MALDI-TOFMS)
-capillary electrophoresis

[Figure: capillary, 40 cm long, 50-100 µm in diameter, run between (-) and (+) electrodes]

1. Ultra-thin, long gels can be run at very high voltages, 2 kV to 10 kV: short runs, theoretically good separation
2. Samples can be directly loaded from 96-well plate format by electrophoresis: easy to automate
3. Use non-polymerized gel media: can be automatically removed and replaced in between runs; don't have to take apart and make sequencing gels
4. Capillaries can be clustered: new automated models have arrays of 96 capillaries

The ABI 3700 Automated Sequencer: Quick, Cheap Genome Sequencing
[Figure: Emission Spectra of dyes used with the ABI 3700]
[Figure: Front View]

Fully Automated System that Requires 5 min of manpower per run.
Example: say that a 9 kV run reliably gives 600 bp per capillary.
4 runs (10 hr day) X 96 capillaries X 600 bp = 230,400 bp per day!

Human Genome Project Goals: Three Orderly Steps to Complete the Genome Sequence

1) Complete Genetic Map
The 1999 map is based on 42,000 STSs and ESTs (representing 30,000 genes) and 1102 informative microsatellite markers.
http://www.ncbi.nlm.nih.gov/genemap/
Currently, ~4.8 million Single Nucleotide Polymorphisms (SNPs) are mapped.
1 SNP every 1200 bp, on average; ~25,000 associated with genes

2) Physical Map is largely assembled
[Figure: BAC Contigs for the Human Genome]

3) As of 25 May 1999, ~19% of the genome sequenced (+63% in “draft”)
http://www.ncbi.nlm.nih.gov/genome/seq/
Goal: to finish entire sequence by 2003 (original goal was 2005)
Cost: $3 billion

Shotgun Sequencing the Human Genome: >90% of the genome has been completed since Spring 2000 by Celera
Venter JC, Adams MD, Sutton GG, Kerlavage AR, Smith HO, Hunkapiller M. 1998. Shotgun sequencing of the human genome. Science 1 5:1540-1542.

Human Genome Plan is ordered: genetic map, contig, completely sequence the BACs that make up the contigs

Shotgun Approach (already proven successful for many bacterial genomes and, in 2000, for Drosophila):
-just start sequencing random clones without bothering to order them
-sequence them only from the ends (not completely)
-sequence enough random clones this way and you will cover the entire genome
-use sophisticated computer programs to put the genome back together

Shotgun Approach: Randomly sequence clones from different types of libraries
[Figure: Covering the genome. A 100-kbp portion of the genome showing expected clone coverage needed for shotgun sequencing.]

35 billion bases to be sequenced
Time: less than 1 year
Cost: ~$250 million

April 2000: Celera finishes sequencing phase of the project: 11X coverage of the genome of four-five individuals
September 2000: Initial assembly of the human genome completed (using sequences in public databases as well)
October 2000: Sequencing phase of mouse genome project completed; ~9 billion base pairs

Problems with this approach:
-Whose DNA was sequenced? Craig Venter (Celera)
-only 90-95% of genome can be sequenced: many gaps for others to fill
-Sequence will not be annotated and may not be released in a timely fashion: in fact, you need to subscribe to Celera for this info. Cost: $450,000 minimum per University
-Are they doing this just to get a jump on patenting genes?
Ethical problems??

What about the Genome Consortium?

Date            Sequenced   In “draft”
May 1999        ~19%        +63%
Sept 2000       ~24%        +66%
Oct 18, 2001    ~47%        +51%

Genome Watch, 23 Oct 2002: Draft 5.8%, Finished 92.8%, Total 98.6%

Was Shotgun Sequencing of the Human Genome Successful?
Waterston RH, Lander ES, Sulston JE. 2002. On the sequencing of the human genome. PNAS USA 99:3712-371.
The Celera assembly depended on BAC tiles in the public database; gaps in the Celera sequence were filled with sequence obtained from the public database.
NO!
SORE LOSERS!
The Truth: Both Approaches are Required To Sequence Large Genomes!
Myers EW, Sutton GG, Smith HO, Adams MD, Venter JC. 2002. On the sequencing and assembly of the human genome. Proc Natl Acad Sci U S A 99:4145-4146.

Where are we now?
Estimates of how much of the genome still remains to be sequenced range from 2% to 20%.
Completion of the genome is likely still 2-5 years away.
Gaps in BACs to fill; “unclonable” sequences?
For example, there is still controversy over how many genes are encoded in the human genome: 30,000 or 70,000?

[Figure: Chr 21 BAC/gene map]
[Figure: Chr 15 BAC/gene map]
See http://www.ncbi.nih.gov/cgi-bin/Entrez/hum_srch
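The throughput and shotgun figures quoted earlier in the lecture can be tied together with a back-of-envelope calculation. The sketch below reuses the lecture's numbers (4 runs per day, 96 capillaries, 600 bp reads, 35 billion bases for the shotgun project). Two things are assumptions rather than lecture content: a genome size of about 3 billion bp, and the idealized estimate that random reads at fold coverage c cover a fraction 1 - e^(-c) of the genome. All names in the code are hypothetical.

```python
import math

# Figures quoted in the lecture
RUNS_PER_DAY = 4           # 4 runs in a 10 hr day
CAPILLARIES = 96           # one sample per capillary
READ_LENGTH_BP = 600       # reliable read length from a 9 kV run
SHOTGUN_TARGET_BP = 35e9   # "35 billion bases to be sequenced"

# Assumption, not stated in the lecture: genome of ~3 billion bp
GENOME_SIZE_BP = 3.0e9


def bases_per_day(runs=RUNS_PER_DAY, capillaries=CAPILLARIES,
                  read_len=READ_LENGTH_BP):
    """Raw bases produced by one sequencer in one day."""
    return runs * capillaries * read_len


def machine_days(target_bp, per_day):
    """Days a single machine would need to produce target_bp bases."""
    return target_bp / per_day


def fold_coverage(total_bp, genome_bp):
    """Average number of times each base of the genome is sequenced."""
    return total_bp / genome_bp


def expected_fraction_covered(coverage):
    """Idealized fraction of the genome hit by at least one random read."""
    return 1.0 - math.exp(-coverage)


per_day = bases_per_day()
coverage = fold_coverage(SHOTGUN_TARGET_BP, GENOME_SIZE_BP)
print(f"{per_day:,} bp per machine per day")
print(f"{machine_days(SHOTGUN_TARGET_BP, per_day):,.0f} machine-days for the shotgun target")
print(f"{coverage:.1f}X coverage, idealized fraction covered "
      f"{expected_fraction_covered(coverage):.6f}")
```

With these inputs one machine yields 230,400 bp per day, so 35 billion bases is roughly 152,000 machine-days, which is only feasible in under a year with hundreds of machines running in parallel. The idealized coverage fraction comes out very close to 1, whereas the lecture notes that only 90-95% of the genome can actually be sequenced this way: the simple model ignores repeats and "unclonable" sequences, which is where the remaining gaps come from.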