Next Generation Sequencing
Molecular Methods
Sylvain Forêt
March 2010
http://dayhoff.anu.edu.au/~sf/next_gen_seq
1. Introduction
2. Sanger
3. Illumina
4. 454
5. SOLiD
6. Summary
The Genomic Age
Recent landmarks in genomics
1995 First bacterial genome (1.8 Mb)
1996 First eukaryotic genome (12 Mb)
1998 First animal genome (100 Mb)
2000 First human genome (3 Gb)
The Post-Genomic Age
Two big questions
How can we continue sequencing ever faster?
What can be done with all these sequences?
The Archon X Prize
To win the prize purse, the registered group must build a
device and use it to sequence 100 human genomes within
10 days or less, with an accuracy of no more than one
error in every 100,000 bases sequenced for no more than
$10,000 per genome.
Other challenges and projects
The $1,000 human genome
The 1,000 genomes project (NHGRI, BGI, ...)
Course Outline
Next generation (massively parallel) sequencing
Molecular methods (course 1)
Applications (course 2, course 3)
Sanger Method
[Figure: Sanger sequencing workflow. Synthesis (template, primer, DNA polymerase), electrophoresis, and chromatogram with per-base quality scores]
Sanger Method Summary
Main characteristics
Sequencing by synthesis
Dye terminator method
Input Material: any DNA in sufficient quantity
PCR products
Molecular clones
...
Error and Quality
Sources of error
Material (contamination, polymorphism, etc.)
DNA polymerase
Signal (more prevalent at the end of the sequences)
Error and Quality
Quality scores
Each base sequenced is assigned a quality score Q
By definition: Q = −10 × log10(probability of error)
Thus: probability of error = 10^(−Q/10)
[Figure: quality score (0 to 40) plotted against position in the read (0 to 900)]
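The two formulas for Q can be checked numerically. The following is a minimal Python sketch; the function names are illustrative and not taken from the slides.

```python
import math


def phred_to_error_probability(q):
    """Probability of error for a quality score Q: 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Quality score Q for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


# Print the error probability for a few common quality scores.
for q in (10, 20, 30, 40):
    print(q, phred_to_error_probability(q))
```

For instance, Q = 20 corresponds to one error in 100 bases, and the Archon X Prize target of one error in every 100,000 bases corresponds to Q = 50.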
Error and Quality
Consequences
Only relatively small sequences can be sequenced
Long sequences must be sequenced in several steps
For accuracy, each base should be covered more than once
[Figure: histogram of read sizes (density against size, 200 to 1000); mean = 743.9, N50 = 762]
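The histogram reports a mean and an N50. As a rough sketch, assuming the usual definition of N50 (the length L such that reads of length at least L contain half of all sequenced bases), both can be computed as follows. The read lengths here are made up for illustration and are not the data behind the histogram.

```python
def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical read lengths, for illustration only.
read_lengths = [200, 400, 600, 800, 1000]
mean_length = sum(read_lengths) / len(read_lengths)
print(mean_length, n50(read_lengths))
```

The N50 is weighted by bases rather than by reads, which is why it is larger than the mean whenever read sizes vary.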
Shotgun sequencing
DNA extraction
DNA fragmentation
Cloning into vectors
Vector
Grow vector in bacteria
Primers
Insert
Extract and sequence vectors
Map or assemble sequences
Mate pairs
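Why each base ends up covered a variable number of times can be illustrated with a toy simulation. This is a sketch under simplifying assumptions that are not from the slides: reads have a fixed length and start at uniformly random positions, and the genome and read sizes are arbitrary.

```python
import random


def simulate_coverage(genome_length, read_length, n_reads, seed=0):
    """Place fixed-length reads uniformly at random; return per-base coverage."""
    rng = random.Random(seed)
    coverage = [0] * genome_length
    for _ in range(n_reads):
        # Every read lies entirely within the genome.
        start = rng.randrange(genome_length - read_length + 1)
        for pos in range(start, start + read_length):
            coverage[pos] += 1
    return coverage


coverage = simulate_coverage(genome_length=10_000, read_length=700, n_reads=100)
mean_coverage = sum(coverage) / len(coverage)
uncovered = sum(1 for c in coverage if c == 0)
print(mean_coverage, uncovered)
```

The mean coverage is n_reads × read_length / genome_length (7 with these values), but individual positions can be covered far less often, or not at all, which is one reason for sequencing to a depth well above 1.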
Illumina: Sample Preparation
Biological sample
DNA: extraction; fragmentation, size selection
RNA: extraction; fragmentation, size selection; reverse transcription (random primers)
Illumina: Sequencing (1)–(4)
[Figures: four slides illustrating the sequencing steps]
Source: http://www.illumina.com
Illumina: Mate Pairs
Illumina: Multiplexing
Multiplexing
Flow cell: 8 lanes
Each lane: up to 96 samples
Source: http://www.illumina.com
Illumina Summary
Main characteristics
Sequencing by synthesis
Reversible terminator method
Current read size: 100 bp
Pyrosequencing Chemistry
Template + nucleotide → extended template + PPi (pyrophosphate)
PPi + adenosine 5′-phosphosulfate → ATP (catalysed by ATP sulfurylase)
ATP + luciferin → oxyluciferin + light (catalysed by luciferase)
454 Pyrosequencing
From: Margulies et al., Nature 2005
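In pyrosequencing, nucleotides are supplied one type at a time, and the light emitted in each flow reflects how many bases were incorporated. The sketch below idealises this: it assumes a cyclic flow order of T, A, C, G and a signal exactly equal to the homopolymer length. Both are simplifying assumptions, and the function name is illustrative.

```python
def flowgram(read, flow_order="TACG"):
    """Return (nucleotide, incorporated_count) for each flow until the read is complete.

    `read` is the strand being synthesised, 5' to 3'.
    """
    if any(base not in flow_order for base in read):
        raise ValueError("read contains a base that is absent from the flow order")
    flows = []
    pos = 0
    flow_index = 0
    while pos < len(read):
        nucleotide = flow_order[flow_index % len(flow_order)]
        count = 0
        # A run of identical bases is incorporated within a single flow.
        while pos < len(read) and read[pos] == nucleotide:
            count += 1
            pos += 1
        flows.append((nucleotide, count))
        flow_index += 1
    return flows


print(flowgram("TTTAGC"))
```

A run of identical bases produces one large signal rather than several separate ones, so the length of a long homopolymer has to be inferred from signal intensity.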
454: Poly-A Tails
[Figure: poly-A tail (AAAAAAAAAAA) paired with poly-T (TTTTTTTTTTT), shown with flanking RE sites and with an interrupted poly-T sequence (TTGTTTCTTTT)]
454: Mate Pairs
Insert (3 kb−20 kb) with an internal adapter at each end
Circularize
Cut, keeping 150−180 bp on each side of the internal adapter
Add sequencing adapters
Sequence
454: Multiplexing
Multiplexing
Each plate: 1, 2, 4, 8 or 16 regions separated by gaskets
Each region: up to 12 samples
[Figure: standard library, Adapter + Template; multiplexed library, Adapter + MID (Multiplex Identifier) + Template]
Source: http://www.454.com
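After a multiplexed run, reads are assigned back to their samples from the identifier they carry. The following is a minimal sketch, assuming the MID sits at the very start of each read and is matched exactly. The tag sequences and sample names are made up for illustration and are not real 454 MIDs.

```python
def demultiplex(reads, mids):
    """Group reads by the MID found at their start, removing the MID from each read."""
    bins = {name: [] for name in mids}
    bins["unassigned"] = []
    for read in reads:
        for name, tag in mids.items():
            if read.startswith(tag):
                bins[name].append(read[len(tag):])
                break
        else:
            # No MID matched: keep the read aside, untrimmed.
            bins["unassigned"].append(read)
    return bins


# Hypothetical identifiers and reads.
mids = {"sample_1": "AACCGG", "sample_2": "TTGGCC"}
reads = ["AACCGGTTTT", "TTGGCCAAAA", "GGGGGGGG"]
print(demultiplex(reads, mids))
```

Exact matching is the simplest choice; tolerating sequencing errors in the identifier would require tags that differ from one another at several positions.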
454 Summary
Main characteristics
Sequencing by synthesis
Pyrosequencing method
Current read size: 400 bp
SOLiD: Sequencing (1)–(5)
[Figures: five slides illustrating the sequencing steps]
Source: http://solid.appliedbiosystems.com
SOLiD: Mate Pairs
Internal adapter
Insert (600bp−10kb)
Internal adapter
Circularize
Cut
Add sequencing adapters
Sequence
SOLiD: Multiplexing
Multiplexing
Each run: 2 slides
Each slide: 1, 2, 4 or 8 regions
Each region: up to 16 samples
Source: http://solid.appliedbiosystems.com
SOLiD Summary
Main characteristics
Sequencing by ligation
Current read size: 50 bp
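The slide text does not describe the read format, but SOLiD reads are commonly reported in "color space", where each color encodes a pair of adjacent bases. The sketch below assumes the standard two-base encoding (four colors, 0 to 3) and a known first base; the function names are illustrative.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def encode_colors(sequence):
    """Encode each pair of adjacent bases as a color (0-3) using XOR of 2-bit codes."""
    return [CODE[a] ^ CODE[b] for a, b in zip(sequence, sequence[1:])]


def decode_colors(first_base, colors):
    """Rebuild the base sequence from its first base and the list of colors."""
    bases = [first_base]
    for color in colors:
        bases.append(BASE[CODE[bases[-1]] ^ color])
    return "".join(bases)


colors = encode_colors("ATGGC")
print(colors, decode_colors("A", colors))
```

Because every base is decoded from the previous one, a single wrong color changes all downstream bases, which is why such reads are usually aligned in color space rather than decoded first.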
Summary
Numbers, as of March 2010
                 454           Illumina                 SOLiD
                 (Titanium)    (Genome Analyzer IIx)    (SOLiD 3)
Mean read size   400 bp        100 bp                   50 bp
Reads per run    10^6          200 × 10^6               500 × 10^6
Run time         10 hours      4 days                   1 week
Insert size      3 kb–20 kb    200 bp–5 kb              600 bp–10 kb
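The read sizes and read counts can be combined into an approximate yield per run. The sketch below multiplies the two and divides by the 3 Gb human genome size quoted earlier; it assumes every read has the mean length and ignores quality filtering, so the results are upper bounds.

```python
HUMAN_GENOME_BP = 3_000_000_000  # 3 Gb

# Mean read size (bp) and reads per run, as of March 2010.
platforms = {
    "454 (Titanium)": (400, 1_000_000),
    "Illumina (Genome Analyzer IIx)": (100, 200_000_000),
    "SOLiD (SOLiD 3)": (50, 500_000_000),
}


def bases_per_run(read_bp, reads):
    """Total bases produced by one run."""
    return read_bp * reads


def human_coverage(read_bp, reads):
    """Average coverage of a human genome from one run."""
    return bases_per_run(read_bp, reads) / HUMAN_GENOME_BP


for name, (read_bp, reads) in platforms.items():
    print(name, bases_per_run(read_bp, reads), round(human_coverage(read_bp, reads), 2))
```

This gives roughly 0.13×, 6.7× and 8.3× coverage of a human genome per run for 454, Illumina and SOLiD respectively.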
Summary
Conclusions
Fast moving field
Other players: Helicos, Pacific Biosciences, nanopore sequencing, ...
A $1,000 human genome seems possible within a few years
Many applications
Genome (re)sequencing
Transcriptome sequencing
ChIP-seq
Metagenomics
...