Next Generation Sequencing
Molecular Methods
Sylvain Forêt
March 2010
http://dayhoff.anu.edu.au/~sf/next_gen_seq

1 Introduction
2 Sanger
3 Illumina
4 454
5 SOLiD
6 Summary

1 Introduction

The Genomic Age
Recent landmarks in genomics:
  1995  First bacterial genome (1.8 Mb)
  1996  First eukaryotic genome (12 Mb)
  1998  First animal genome (100 Mb)
  2000  First human genome (3 Gb)

The Post-Genomic Age
Two big questions:
  How can we continue sequencing ever faster?
  What can be done with all these sequences?

The Archon X Prize
To win the prize purse, the registered group must build a device and use it to sequence 100 human genomes in 10 days or less, with an accuracy of no more than one error in every 100,000 bases sequenced, at a cost of no more than $10,000 per genome.

Other challenges and projects
  The $1,000 human genome
  The 1000 Genomes Project (NHGRI, BGI, ...)

Course Outline
Next generation (massively parallel) sequencing:
  Molecular methods (course 1)
  Applications (course 2, course 3)

2 Sanger

Sanger Method
[Figure: template, primer and DNA polymerase (synthesis); electrophoresis; chromatogram; quality score for each base call]

Sanger Method: Summary
Main characteristics:
  Sequencing by synthesis
  Dye terminator method
Input material: any DNA in sufficient quantity, e.g.
  PCR products
  Molecular clones
  ...
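The Archon X Prize targets in the introduction can be turned into minimum throughput, error and cost figures. The sketch below is a back-of-the-envelope calculation, assuming a 3 Gb genome (the figure from the genomics landmarks) read at 1× coverage unless another coverage is given; the function name `xprize_requirements` and the coverage parameter are hypothetical illustrations, not part of the prize rules.

```python
# Back-of-the-envelope figures implied by the Archon X Prize rules.
# Assumption: a 3 Gb genome; `coverage` scales the raw bases that must be read.

GENOME_SIZE = 3e9             # bases per human genome
N_GENOMES = 100               # genomes to sequence
DAYS = 10                     # time limit in days
BASES_PER_ERROR = 100_000     # at most one error in every 100,000 bases
MAX_COST_PER_GENOME = 10_000  # US dollars


def xprize_requirements(coverage: float = 1.0) -> dict[str, float]:
    """Minimum throughput, error budget and cost implied by the prize rules."""
    raw_bases_per_genome = GENOME_SIZE * coverage
    total_raw_bases = N_GENOMES * raw_bases_per_genome
    return {
        "total_raw_bases": total_raw_bases,
        "raw_bases_per_day": total_raw_bases / DAYS,
        # errors tolerated in one finished 3 Gb genome sequence
        "max_errors_per_genome": GENOME_SIZE / BASES_PER_ERROR,
        # dollars available per megabase of raw sequence
        "max_cost_per_raw_megabase": MAX_COST_PER_GENOME / (raw_bases_per_genome / 1e6),
    }


if __name__ == "__main__":
    for name, value in xprize_requirements().items():
        print(f"{name}: {value:,.2f}")
```

At 1× this is 3 × 10^11 bases in total (3 × 10^10 per day) and at most 30,000 errors per genome; with `coverage=30` the raw throughput rises thirtyfold, to 9 × 10^12 bases over the 10 days.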
Error and Quality
Sources of error:
  Material (contamination, polymorphism, etc.)
  DNA polymerase
  Signal (errors more frequent towards the end of the sequences)

Error and Quality: Quality Scores
Each base sequenced is assigned a quality score Q. By definition:
  $Q = -10 \log_{10}(\text{probability of error})$
Thus:
  $\text{probability of error} = 10^{-Q/10}$
[Figure: quality score Q (0–40) versus position along the read (0–900)]

Error and Quality: Consequences
  Only relatively short sequences can be sequenced in a single read
  Long sequences must be sequenced in several steps
  For accuracy, each base should be covered more than once
[Figure: histogram of sequence sizes (density versus size, 200–1000); mean = 743.9, N50 = 762]

Shotgun Sequencing
  DNA extraction
  DNA fragmentation
  Cloning into vectors
  Grow vectors in bacteria
  Extract and sequence vectors (primers flank the insert)
  Map or assemble sequences
[Figure: vector, insert and primers; reads from both ends of an insert form mate pairs]

3 Illumina

Illumina: Sample Preparation
Biological sample
  DNA: extraction; fragmentation, size selection
  RNA: fragmentation, size selection; reverse transcription (random primers)

Illumina: Sequencing (1)–(4)
[Figures: Illumina sequencing steps. Source: http://www.illumina.com]

Illumina: Mate Pairs
[Figure]

Illumina: Multiplexing
  Flow cell: 8 lanes
  Each lane: up to 96 samples
Source: http://www.illumina.com

Illumina: Summary
Main characteristics:
  Sequencing by synthesis
  Reversible terminator method
  Current read length: 100 bp

4 454

Pyrosequencing Chemistry
  Template + nucleotide (DNA polymerase) → extended template + PPi (pyrophosphate)
  PPi + adenosine 5′-phosphosulfate (ATP sulfurylase) → ATP
  ATP + luciferin (luciferase) → oxyluciferin + light

454 Pyrosequencing
[Figure] From: Margulies et al., Nature 2005

454: Poly-A Tails
[Figure: poly-A/poly-T stretches (AAAAAAAAAAA / TTTTTTTTTTT) with RE labels; example sequence TTGTTTCTTTT]

454: Mate Pairs
  Insert (3 kb–20 kb) with an internal adapter at each end
  Circularize
  Cut (150–180 bp)
  Add sequencing adapters
  Sequence

454: Multiplexing
  Each plate: 1, 2, 4, 8 or 16 regions separated by gaskets
  Each region: up to 12 samples
  Library layout: adapter + template, or adapter + MID (Multiplex Identifier) + template
Source: http://www.454.com

454: Summary
Main characteristics:
  Sequencing by synthesis
  Pyrosequencing method
  Current read length: 400 bp

5 SOLiD

SOLiD: Sequencing (1)–(5)
[Figures: SOLiD sequencing steps. Source: http://solid.appliedbiosystems.com]

SOLiD: Mate Pairs
  Insert (600 bp–10 kb) with an internal adapter at each end
  Circularize
  Cut
  Add sequencing adapters
  Sequence

SOLiD: Multiplexing
  Each run: 2 slides
  Each slide: 1, 2, 4 or 8 regions
  Each region: up to 16 samples
Source: http://solid.appliedbiosystems.com

SOLiD: Summary
Main characteristics:
  Sequencing by ligation
  Current read length: 50 bp

6 Summary

Summary: Numbers, as of March 2010
                  454           Illumina                 SOLiD
                  (Titanium)    (Genome Analyzer IIx)    (SOLiD 3)
Mean read size    400 bp        100 bp                   50 bp
Reads per run     10^6          200 × 10^6               500 × 10^6
Run time          10 hours      4 days                   1 week
Insert size       3 kb–20 kb    200 bp–5 kb              600 bp–10 kb

Summary: Conclusions
  Fast-moving field
  Other players: Helicos, Pacific Biosciences, nanopores, ...
  A $1,000 human genome seems possible within a few years
  Many applications:
    Genome (re)sequencing
    Transcriptome sequencing
    ChIP-seq
    Metagenomics
    ...
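The two quality-score formulas on the Error and Quality slides are inverses of each other. A minimal Python sketch of the conversion follows; the function names are hypothetical, and `expected_errors` adds one standard step not on the slides: summing the per-base error probabilities of a read gives its expected number of wrong base calls.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Probability of error for a quality score Q: 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Quality score for an error probability p: Q = -10 * log10(p)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def expected_errors(qualities: list[float]) -> float:
    """Expected number of wrong base calls: sum of per-base error probabilities."""
    return sum(phred_to_error_prob(q) for q in qualities)


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(f"Q{q}: 1 error in {1 / phred_to_error_prob(q):,.0f} bases")
    # One error in 100,000 bases (the Archon X Prize accuracy target)
    print(f"Q for p = 1e-5: {error_prob_to_phred(1e-5):.1f}")
```

Read this way, the top of the Sanger quality plot (Q40) means about one error in 10,000 bases, while the X Prize target of one error in 100,000 bases corresponds to Q50.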
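The read-size histogram on the Error and Quality: Consequences slide reports both a mean (743.9) and an N50 (762). N50 is conventionally the length L such that sequences of length L or more contain at least half of all sequenced bases, so it weights long reads more than the mean does. A sketch under that definition (the function name is hypothetical; some tools break ties slightly differently):

```python
def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    covered = 0
    # Walk from the longest sequence down, accumulating bases until half is reached.
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    raise AssertionError("unreachable: the running total always reaches half")


if __name__ == "__main__":
    sizes = [2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(n50(sizes))               # 8
    print(sum(sizes) / len(sizes))  # 6.0 (the mean is lower than the N50)
```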
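The multiplexing slides (Illumina lanes with up to 96 samples, 454 regions with MIDs, SOLiD regions with up to 16 samples) imply a computational step after the run: sorting pooled reads back to their samples by barcode. A minimal sketch, assuming adapters are already trimmed so that each read begins with its barcode and that one mismatch is tolerated; the barcodes, function names and mismatch threshold are hypothetical, not taken from any vendor's kit or software.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads: list[str], barcodes: dict[str, str],
                max_mismatches: int = 1) -> dict[str, list[str]]:
    """Assign each read to the sample whose barcode matches its first bases.

    Assumes adapters are already trimmed, so each read starts with its barcode.
    Reads matching no barcode, or more than one, go to "unassigned".
    The barcode is stripped from assigned reads.
    """
    bins: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        hits = [
            sample
            for sample, bc in barcodes.items()
            if len(read) >= len(bc) and hamming(read[: len(bc)], bc) <= max_mismatches
        ]
        if len(hits) == 1:
            sample = hits[0]
            bins[sample].append(read[len(barcodes[sample]):])
        else:
            bins["unassigned"].append(read)
    return dict(bins)


if __name__ == "__main__":
    # Hypothetical 6-base barcodes, not real Illumina or 454 MID sequences.
    barcodes = {"sample1": "ACGTAC", "sample2": "TGCATG"}
    reads = ["ACGTACTTTT", "TGCATGGGGG", "ACGTTCAAAA", "GGGGGGCCCC"]
    print(demultiplex(reads, barcodes))
```

Sending ambiguous reads to "unassigned" rather than picking the closest barcode is a deliberately conservative choice; barcode sets are normally designed so that a single error cannot make one barcode look like another.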
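In the pyrosequencing chemistry, the light emitted in one nucleotide flow grows with the number of nucleotides incorporated, so a homopolymer such as the poly-A stretches on the 454 slides is read as a single signal whose height must be rounded to a run length. The sketch below models this with an ideal, noise-free flowgram and a rounding base caller; the cyclic flow order TACG is an assumption about the instrument, and both function names are hypothetical.

```python
def ideal_flowgram(read: str, flow_order: str = "TACG") -> list[int]:
    """Noise-free signal per nucleotide flow: length of the run incorporated.

    Assumption: nucleotides are flowed cyclically in `flow_order`.
    """
    if set(read) - set(flow_order):
        raise ValueError("read contains bases absent from the flow order")
    signals = []
    i = 0
    flow = 0
    while i < len(read):
        base = flow_order[flow % len(flow_order)]
        run = 0
        # The polymerase extends through the whole homopolymer in one flow.
        while i < len(read) and read[i] == base:
            run += 1
            i += 1
        signals.append(run)
        flow += 1
    return signals


def call_bases(signals: list[float], flow_order: str = "TACG") -> str:
    """Turn (possibly noisy) flow intensities back into bases by rounding."""
    return "".join(
        flow_order[k % len(flow_order)] * round(s) for k, s in enumerate(signals)
    )


if __name__ == "__main__":
    print(ideal_flowgram("TTAAAG"))          # [2, 3, 0, 1]
    print(call_bases([1.9, 3.2, 0.1, 0.8]))  # TTAAAG
```

Because every run length comes from rounding one intensity, a long homopolymer measured halfway between two integers is ambiguous; this is why 454 errors are generally reported as insertions and deletions in homopolymers rather than substitutions.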
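The SOLiD sequencing slides survive only as figure references. As background not recoverable from the text: SOLiD's ligation probes interrogate two adjacent bases at a time, and each read is reported as a string of four "colors", one per overlapping base pair, decoded from a known first base. A minimal sketch, assuming the commonly described encoding in which the color is the XOR of 2-bit base codes (A=0, C=1, G=2, T=3); the function names are hypothetical.

```python
# Two-base ("color space") encoding, assuming the 2-bit base codes below.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode_colors(seq: str) -> list[int]:
    """One color (0-3) per adjacent base pair: XOR of the two base codes."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base: str, colors: list[int]) -> str:
    """Rebuild bases from a known first base; each color fixes the next base."""
    bases = [first_base]
    for color in colors:
        bases.append(BASES[CODE[bases[-1]] ^ color])
    return "".join(bases)


if __name__ == "__main__":
    ref = "ATGGCA"
    colors = encode_colors(ref)
    print(colors)                               # [3, 1, 0, 3, 1]
    print(decode_colors("A", colors))           # ATGGCA
    # A single wrong color corrupts every base after it ...
    print(decode_colors("A", [3, 2, 0, 3, 1]))  # ATCCGT
    # ... whereas a true SNP (G -> A at the third base) changes two adjacent colors.
    print(encode_colors("ATAGCA"))              # [3, 3, 2, 3, 1]
```

The contrast in the last two lines is the practical point of the encoding: isolated color changes look like measurement errors, while paired adjacent changes are consistent with a real base difference.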
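The summary table gives mean read size and reads per run but not total output. Multiplying the two gives a rough yield per run. A minimal sketch, assuming one read per bead or cluster (paired reads are not counted twice), that "1 week" means 7 days, and that the mean read size stands in for every read; the `Platform` type and function names are hypothetical.

```python
from typing import NamedTuple


class Platform(NamedTuple):
    read_bp: int          # mean read size in bases
    reads_per_run: float  # reads per run
    run_hours: float      # run time in hours


# Figures from the "Numbers, as of March 2010" table.
PLATFORMS = {
    "454 (Titanium)": Platform(400, 1e6, 10),
    "Illumina (Genome Analyzer IIx)": Platform(100, 200e6, 4 * 24),
    "SOLiD (SOLiD 3)": Platform(50, 500e6, 7 * 24),
}
HUMAN_GENOME = 3e9  # bases


def bases_per_run(p: Platform) -> float:
    """Total bases in one run: mean read size times number of reads."""
    return p.read_bp * p.reads_per_run


def bases_per_hour(p: Platform) -> float:
    """Average sequencing rate over the whole run."""
    return bases_per_run(p) / p.run_hours


if __name__ == "__main__":
    for name, p in PLATFORMS.items():
        print(f"{name}: {bases_per_run(p) / 1e9:.1f} Gb/run, "
              f"{bases_per_hour(p) / 1e6:,.0f} Mb/hour, "
              f"{bases_per_run(p) / HUMAN_GENOME:.2f}x human genome per run")
```

Under these assumptions the three platforms yield roughly 0.4, 20 and 25 Gb per run, so a single Illumina or SOLiD run of the time produced several times the size of a human genome in raw sequence, while 454 traded yield for much longer reads.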