Sequencing
Definition
§  DNA sequencing is the determination of the order of the nucleotides that make up a DNA molecule.
§  It is nowadays a routine technique in biology labs.
§  This technique builds on the knowledge of DNA replication mechanisms acquired over the past 30 years.
Nucleic Acids
§  Nucleic acid = polymer of nucleotides (nucleic acid: Friedrich MIESCHER, 1871; nucleotides: Phoebus LEVENE, 1919).
§  Nucleotides: base + pentose (sugar) + phosphate.
§  Biological macromolecules: proteins, carbohydrates, nucleic acids.
§  DNA and RNA:
–  RNA: ribonucleic acid
–  DNA: deoxyribonucleic acid, more stable
–  nucleotides linked by 3’–5’ bonds
Course by G. BARTHOLE, ENS Cachan
Secondary structure of DNA
§  1947-1950 Erwin CHARGAFF [Nature 165, 756 (1950)]
Discovered that, in human DNA, the base proportions are:
Adenine ≈ Thymine (≈30%)
Cytosine ≈ Guanine (≈20%)
→ explained by A-T and C-G pairing
§  1953 Rosalind FRANKLIN, James WATSON & Francis CRICK
X-ray diffraction pattern (R. FRANKLIN) showing a cross characteristic of a helical structure
Pray, L., Nature Education 1, 100 (2008)
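Chargaff's observation follows directly from pairing: whatever the composition of one strand, the double-stranded molecule contains as many A as T and as many C as G. The minimal Python sketch below counts the bases of a duplex built from one strand; the function name `duplex_composition` and the toy strand are illustrative.

```python
from collections import Counter

# Watson-Crick pairing: A-T and C-G.
PAIR = str.maketrans("ACGT", "TGCA")


def duplex_composition(strand: str) -> dict[str, float]:
    """Percentage of each base in the double-stranded DNA built from `strand`."""
    partner = strand.translate(PAIR)  # base-by-base complementary strand
    counts = Counter(strand + partner)
    total = sum(counts.values())
    return {base: 100 * counts[base] / total for base in "ACGT"}


print(duplex_composition("AAAGC"))
# {'A': 30.0, 'C': 20.0, 'G': 20.0, 'T': 30.0}
```

The toy strand happens to reproduce the human proportions quoted above; any other strand still gives A = T and C = G in the duplex, because every A of one strand faces a T of the other.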
DNA double helix (figure):
§  5’ end (numbered after the carbon atoms of the pentose)
§  Base pairing by hydrogen bonds
§  Synthesis from 5’ to 3’
§  3’ end
Pray, L., Nature Education 1, 100 (2008)
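Because the two strands are antiparallel and paired base by base, the partner strand read 5’→3’ is the reverse complement of the first one. A minimal Python sketch (the function name is illustrative):

```python
PAIR = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Partner strand of a 5'->3' sequence, also written 5'->3'."""
    # Complement each base, then reverse to restore the 5'->3' orientation.
    return strand.translate(PAIR)[::-1]


top = "ATGCCTAC"
bottom = reverse_complement(top)
print(bottom)  # GTAGGCAT
# Position i of one strand pairs with position -1-i of the other.
assert all(top[i].translate(PAIR) == bottom[-1 - i] for i in range(len(top)))
```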
Protein synthesis
§  Transcription: DNA → mRNA (in the nucleus)
§  Translation: mRNA → protein (amino acids are assembled into proteins)
Figure labels: nucleus, cell membrane.
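The two steps can be mimicked on strings. The sketch below is a simplification: it assumes the mRNA has the same sequence as the coding (non-template) DNA strand with U in place of T, that reading starts at the first base, and it stores the standard genetic code as a 64-letter string (`*` marks stop codons). Function names are illustrative.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def transcribe(coding_strand: str) -> str:
    """mRNA = coding DNA strand with uracil instead of thymine."""
    return coding_strand.replace("T", "U")


def translate(mrna: str) -> str:
    """Translate codon by codon from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        b1, b2, b3 = (BASES.index(b) for b in mrna[i:i + 3])
        aa = AMINO_ACIDS[16 * b1 + 4 * b2 + b3]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna)             # AUGGCCUAA
print(translate(mrna))  # MA
```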
1977
1ST GENERATION SEQUENCING:
SANGER METHOD
Sanger Technique
§  Frederick SANGER (1918-2013), English biochemist who received 2 Nobel Prizes in Chemistry:
–  1958: structure of proteins (insulin)
–  1980: DNA sequencing
§  DNA polymerases synthesize a complementary DNA strand from a template strand.
§  For sequencing, slightly different nucleotides are used: dideoxyribonucleotide triphosphates (ddNTPs), in addition to the usual deoxyribonucleotide triphosphates (dNTPs).
dNTP vs ddNTP
§  The difference between a ddNTP and a dNTP is the absence of an OH group at the 3’ position.
§  The polymerase attaches each new nucleotide to this 3’ OH, so once a ddNTP has been incorporated no further nucleotide can be added: strand synthesis stops.
Structure (figure):
–  dNTP: phosphate on the 5’ CH2, base on the 1’ carbon, OH present at 3’
–  ddNTP: phosphate on the 5’ CH2, base on the 1’ carbon, H instead of OH at 3’
Protocol (Sanger)
§  4 solutions are prepared, each containing:
–  the fragment to be sequenced,
–  a short DNA whose sequence is complementary to the 3’ end of the fragment to be sequenced = primer,
–  the 4 dNTPs (dCTP, dATP, dGTP, dTTP),
–  DNA polymerase.
Figure: DNA with a known sequence; primer complementary to part of the known sequence; unknown DNA, to be sequenced; synthesis of the complementary DNA strand by a DNA polymerase.
Protocol
§  In each tube, small quantities of one fluorescent or radioactive (³²P) ddNTP are added.
http://wwwarpe.snv.jussieu.fr/coursvt/images_10/sangerprinc.gif
§  The random incorporation of a ddNTP stops the synthesis.
→ At the end of the reactions, a set of DNA strands of various sizes is obtained, depending on where a ddNTP was incorporated.
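The random character of termination can be illustrated with a small Monte Carlo sketch of one tube. It assumes, for simplicity, that at each position where the template calls for the labeled base, the polymerase picks the ddNTP with a fixed probability `dd_fraction` (roughly the ddNTP/(ddNTP + dNTP) ratio, ignoring any difference in incorporation efficiency). Names and numbers are illustrative.

```python
import random
from collections import Counter


def run_tube(new_strand: str, dd_base: str, dd_fraction: float,
             n_molecules: int, rng: random.Random) -> Counter:
    """Count the lengths (nt added after the primer) of the products of one
    tube containing a small amount of the ddNTP `dd_base`.

    Molecules that never incorporate a ddNTP are counted under None."""
    lengths = Counter()
    for _ in range(n_molecules):
        stop = None
        for i, base in enumerate(new_strand):
            # At every position calling for dd_base, the chain may be terminated.
            if base == dd_base and rng.random() < dd_fraction:
                stop = i + 1
                break
        lengths[stop] += 1
    return lengths


rng = random.Random(0)
print(run_tube("GTAGGCAT", "G", dd_fraction=0.3, n_molecules=10_000, rng=rng))
```

Only lengths 1, 4 and 5 (the positions of G in the new strand) appear, plus full-length products. Raising `dd_fraction` shifts the population toward short fragments, which is why only small quantities of ddNTP are added.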
Protocol
Synthesis of the complementary strand → if synthesis stopped because of a ddGTP, there is a cytosine at that position in the original sequence.
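Strand reading
§  Fragment length ↔ electrophoretic migration (4 wells, one per ddNTP).
§  The 20-nt primer (known sequence) is annealed to the known DNA at the 3’ end of the template; synthesis proceeds from 5’ to 3’ into the DNA to be sequenced.
§  Fragments obtained (total length: primer + bases added):
–  21 nt: G
–  22 nt: GT
–  23 nt: GTA
–  24 nt: GTAG
–  25 nt: GTAGG
–  26 nt: GTAGGC
–  27 nt: GTAGGCA
–  28 nt: GTAGGCAT
→ Reading from the shortest to the longest fragment gives the synthesized strand 5’-GTAGGCAT-3’, complementary to the DNA to be sequenced, 5’-ATGCCTAC-3’.
Example of autoradiography (³²P labeling) of an electrophoresis gel.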
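The reading principle of the figure can be sketched in Python under idealized assumptions (every termination product is present, bands are perfectly resolved, no sequencing errors). `termination_products` lists, for each ddNTP tube, the total lengths (20-nt primer + extension) of the fragments; `read_gel` sorts all bands by length. Function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a 5'->3' sequence, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def termination_products(unknown: str, primer_len: int) -> dict[str, list[int]]:
    """Fragment lengths found in each ddNTP tube.

    `unknown` is the region to be sequenced (5'->3'), lying just 5' of the
    primer binding site on the template; the polymerase copies it into its
    reverse complement, one base after the other."""
    new_strand = reverse_complement(unknown)  # synthesized 5'->3'
    tubes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for i, base in enumerate(new_strand):
        # A ddNTP incorporated at position i terminates the chain there.
        tubes[base].append(primer_len + i + 1)
    return tubes


def read_gel(tubes: dict[str, list[int]]) -> str:
    """Read the bands from the shortest to the longest fragment."""
    bands = sorted((length, base) for base, lengths in tubes.items()
                   for length in lengths)
    return "".join(base for _, base in bands)


tubes = termination_products("ATGCCTAC", primer_len=20)
print(tubes["G"])                      # [21, 24, 25]
new_strand = read_gel(tubes)
print(new_strand)                      # GTAGGCAT
print(reverse_complement(new_strand))  # ATGCCTAC
```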
Optical reading of the strands
§  Each ddNTP is labeled with a different fluorophore (spectrally separated).
Chromatogram
§  Capillary electrophoresis (modern machine)
Advantage: sequencing in a single reaction instead of 4.
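With one fluorophore per ddNTP, the sequencer records four intensity traces over time, and a base is called at each peak from the dominant color. The sketch below is a deliberately naive caller (local maxima of the summed signal above a hypothetical `threshold`, then the brightest channel) on a made-up trace; production base callers are considerably more elaborate.

```python
def call_bases(trace: dict[str, list[float]], threshold: float) -> str:
    """Naive base calling on a four-channel chromatogram.

    `trace` maps each base to its fluorescence intensities, sampled at the
    same time points. A base is called at every local maximum of the total
    signal that reaches `threshold`."""
    bases = "ACGT"
    n = len(trace["A"])
    total = [sum(trace[b][i] for b in bases) for i in range(n)]
    called = []
    for i in range(1, n - 1):
        is_peak = total[i] > total[i - 1] and total[i] >= total[i + 1]
        if is_peak and total[i] >= threshold:
            # The brightest channel at the peak gives the base.
            called.append(max(bases, key=lambda b: trace[b][i]))
    return "".join(called)


# Hypothetical trace with three peaks: A, then C, then G.
trace = {
    "A": [0, 10, 0, 0, 0, 0, 0],
    "C": [0, 0, 1, 9, 1, 0, 0],
    "G": [0, 0, 0, 0, 0, 8, 0],
    "T": [0, 0, 0, 0, 0, 0, 0],
}
print(call_bases(trace, threshold=5))  # ACG
```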
Automation
See also a Flash animation at http://www.yourgenome.org/teachers/sequencing.shtml
Example of application
Search for genetic markers of cancers
§  Non-small-cell lung cancer: look for genetic markers related to the ERBB2 gene.
§  Normal (wild type) vs. sick: in patients, a duplicated region [GCATACGTGATG] of ERBB2 appears in several sequences → association with the disease.
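Once reads are available, looking for the duplicated ERBB2 region is a string search. A minimal sketch: count the (possibly overlapping) occurrences of the motif quoted above and flag reads where it appears twice in tandem. The reads are made up for illustration; a real analysis would typically align reads to the reference genome first.

```python
MOTIF = "GCATACGTGATG"  # duplicated ERBB2 region quoted on the slide


def count_occurrences(read: str, motif: str) -> int:
    """Number of (possibly overlapping) occurrences of motif in read."""
    count, start = 0, read.find(motif)
    while start != -1:
        count += 1
        start = read.find(motif, start + 1)
    return count


def has_tandem_duplication(read: str, motif: str = MOTIF) -> bool:
    """True if the motif is immediately followed by a second copy."""
    return motif * 2 in read


normal = "TTA" + MOTIF + "CCG"        # hypothetical wild-type read
sick = "TTA" + MOTIF + MOTIF + "CCG"  # hypothetical read with the duplication
for read in (normal, sick):
    print(count_occurrences(read, MOTIF), has_tandem_duplication(read))
# 1 False
# 2 True
```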
Performance & Limitations
Performance of modern Sanger sequencers
§  Several hundred samples simultaneously, with about one sequencing run per hour.
§  Reads of 300-1000 nucleotides at most.
Limitations
§  If the DNA is amplified (PCR) before sequencing, short stretches of the amplification vector sequence are found in the Sanger read.
§  Errors at the beginning of the sequence: incorrect recognition of the primer.
§  Low resolution between fragments differing in length by only 1 nt.
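The first limitation is usually handled by trimming vector sequence from the start of each read. A minimal sketch, assuming the end of the vector next to the insert is known (`vector_end`, hypothetical) and the read is error-free, so an exact match suffices; real trimming tools tolerate mismatches.

```python
def trim_vector(read: str, vector_end: str, min_overlap: int = 4) -> str:
    """Remove the longest suffix of `vector_end` found at the start of `read`.

    Overlaps shorter than `min_overlap` are ignored, since a few bases can
    match by chance."""
    for k in range(min(len(vector_end), len(read)), min_overlap - 1, -1):
        if read.startswith(vector_end[-k:]):
            return read[k:]
    return read


print(trim_vector("CACGTATGCC", vector_end="GGATCCACGT"))  # ATGCC
print(trim_vector("ATGCC", vector_end="GGATCCACGT"))       # ATGCC (no vector)
```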
2nd generation sequencing:
pyrosequencing
Pyrosequencing
§  Based on a « sequencing by synthesis » principle, as opposed to the sequencing by « termination » of the Sanger method.
§  A single-stranded DNA is sequenced by synthesizing the complementary strand base by base, detecting the polymerase activity at each step with another, chemiluminescent enzyme: luciferase.
Pyrosequencing
§  Nucleotides (dNTPs) are added one at a time, sequentially (≠ Sanger sequencing).
§  If it is the right one (complementary to the next base of the template): incorporation and release of a pyrophosphate (PPi).
§  PPi → ATP by the action of ATP-sulfurylase.
§  The ATP provides the energy needed for the conversion of luciferin by luciferase. This reaction generates visible light whose intensity is proportional to the quantity of ATP.
§  An apyrase degrades the remaining nucleotides (dNTP → dNMP) before the next nucleotide is added.
Figure: the polymerase extends the strand GATCCT along the template ACCTTGAATTCGTCCTAGGA; incorporation of a dGTP gives GGATCCT and releases PPi → ATP (ATP-sulfurylase) → light (luciferin + luciferase).
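Because the light is proportional to the ATP produced, and thus to the number of identical nucleotides incorporated in one step, a homopolymer such as GG gives a signal about twice as high as a single G. The idealized, noise-free sketch below computes the expected signal for each nucleotide addition and decodes it back; the cyclic dispensation order `TACG` is an assumption for illustration.

```python
from itertools import cycle


def flowgram(synthesized: str, flow_order: str = "TACG") -> list[tuple[str, int]]:
    """Expected light signal (number of incorporated nucleotides) for each
    dNTP addition, the dNTPs being added cyclically in `flow_order`."""
    if not set(synthesized) <= set(flow_order):
        raise ValueError("every base must appear in flow_order")
    signals = []
    pos = 0
    for base in cycle(flow_order):
        if pos == len(synthesized):
            break
        run = 0
        # The polymerase keeps adding this dNTP as long as the template calls for it.
        while pos < len(synthesized) and synthesized[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))  # apyrase then removes the excess dNTP
    return signals


def decode(signals: list[tuple[str, int]]) -> str:
    """Rebuild the synthesized strand from the signal heights."""
    return "".join(base * height for base, height in signals)


signals = flowgram("GGATCC")
print([h for _, h in signals])  # [0, 0, 0, 2, 0, 1, 0, 0, 1, 0, 2]
print(decode(signals))          # GGATCC
```

Long homopolymers are known to be a weak point of this reading, since telling n from n + 1 identical bases relies only on the light intensity.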
Implementation: « 454 technology »
Pyrosequencing vs. Sanger
§  100 times faster and cheaper
§  Shorter sequenced fragments (but improving)
454 sequencing (Roche Diagnostics)
Integration of several high-tech methods:
–  pyrosequencing,
–  picotiter plates made of optical fibers (1.6 million wells),
–  emulsion PCR (emPCR) in microreactors (300,000 PCR reactions in parallel),
–  image analysis…
Emulsion-based clonal amplification
http://classes.soe.ucsc.edu/bme215/Spring09/PPT/BME%20215-5.pdf
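Clonal amplification relies on most microreactors (droplets) containing at most one template molecule. If templates are distributed at random with a mean of λ molecules per droplet, the number per droplet follows a Poisson law, P(k) = λ^k e^(-λ)/k!. The sketch below (the λ values are illustrative choices, not values from the slides) shows the trade-off between empty droplets and droplets with several templates.

```python
import math


def poisson(k: int, lam: float) -> float:
    """Probability of finding k templates in a droplet (mean lam)."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def droplet_fractions(lam: float) -> dict[str, float]:
    p0, p1 = poisson(0, lam), poisson(1, lam)
    return {
        "empty": p0,           # no template: nothing amplified
        "clonal": p1,          # exactly one template: usable bead
        "mixed": 1 - p0 - p1,  # several templates: mixed signal
        "clonal_among_nonempty": p1 / (1 - p0),
    }


for lam in (0.1, 0.5, 1.0):
    f = droplet_fractions(lam)
    print(lam, {k: round(v, 3) for k, v in f.items()})
```

A low λ keeps the non-empty droplets almost all clonal, at the cost of leaving most droplets empty.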
Loading of the beads in a multi-well plate
400,000 sequencing reactions in parallel
http://www.biopsci.com/2012/02/22/sequencage-de-ladn-la-revolution-est-de-nouveau-en-marche/
Example of a « 454 » machine
Roche « GS FLX System + »
Sequencing of 100 to 400 Mbases in 7 hours (per machine)
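The figures quoted for the GS FLX can be turned into a rough throughput and compared with the 2.9 Gbp of the human genome given in the next part. This is back-of-the-envelope arithmetic using only the numbers on the slides; the function names are illustrative.

```python
def throughput(bases_per_run: float, hours_per_run: float) -> float:
    """Bases sequenced per hour by one machine."""
    return bases_per_run / hours_per_run


def runs_for_coverage(genome_bp: float, coverage: float, bases_per_run: float) -> float:
    """Number of runs needed to sequence `coverage` times a genome."""
    return genome_bp * coverage / bases_per_run


HUMAN_GENOME = 2.9e9  # sequenceable part of the human genome, bp
for per_run in (100e6, 400e6):  # 100 to 400 Mbases per 7-hour run
    print(f"{throughput(per_run, 7) / 1e6:.0f} Mb/h, "
          f"{runs_for_coverage(HUMAN_GENOME, 1, per_run):.1f} runs for 1x")
```

Since shotgun assembly needs several-fold coverage (6.3× for H. influenzae below), the number of runs grows accordingly.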
Whole genome sequencing
Sequencing of an entire genome
§  Sequenceable part of the human genome: 2.9 Gbp!
–  Impossible to read in one go.
–  In any case, biologists do not know how to handle such long DNA molecules.
–  However, it can be sequenced « relatively fast » with the new technologies.
§  Basic principles for sequencing a genome:
1.  Random fragmentation into large pieces.
2.  Sequencing of the ends of the pieces.
3.  Reconstruction using overlapping fragments.
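Step 3 can be illustrated with a toy greedy assembler that repeatedly merges the two fragments with the longest suffix/prefix overlap. This is a classic teaching simplification, not the method used by genome projects: it ignores sequencing errors, reverse-complement reads and repeats. `min_len` and the reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`
    (0 if shorter than `min_len`)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads pairwise by best overlap; return the resulting contigs."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlap left: the contigs are separated by gaps
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
    return contigs


reads = ["CCTACGTT", "ATGCCTAC", "ACGTTGCA"]  # hypothetical reads
print(greedy_assemble(reads))  # ['ATGCCTACGTTGCA']
```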
Parallelization
§  « Factory » making a bacterial artificial chromosome (BAC) library at the Whitehead Institute (MIT, USA)
§  Sequencing « factory » at the Sanger Institute (UK)
Nature 409, 860 (2001)
Whole genome shotgun
§  Many overlapping reads are needed
→ Some regions are sequenced many times.
→ Despite this redundancy, gaps remain.
§  The genome is randomly broken up into many small segments.
§  The segments are sequenced using the chain-termination method to obtain reads.
§  Reconstruction
Example: Haemophilus influenzae (1st sequenced bacterium)
•  1.8 Mb broken up mechanically to give a library of ~2000 bp fragments.
•  20,000 segments sequenced (from one or both ends).
•  24,000 reads kept, with an average length of 470 bp.
•  → 11.6 Mbp sequenced, i.e. 6.3 times the genome, but the coverage was not perfect!
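Why gaps remain despite 6.3-fold redundancy can be estimated with the Lander–Waterman model, which assumes reads are placed uniformly at random: the probability that a given base is covered by no read is about e^(-c), where c is the coverage, and the expected number of gaps is about N e^(-c) for N reads (ignoring the minimum overlap needed to join reads). Applied to the H. influenzae numbers above:

```python
import math


def uncovered_fraction(coverage: float) -> float:
    """Lander-Waterman estimate of the fraction of the genome not covered."""
    return math.exp(-coverage)


def expected_gaps(n_reads: int, coverage: float) -> float:
    """Expected number of gaps (uncovered stretches) between contigs."""
    return n_reads * math.exp(-coverage)


genome_bp, n_reads, coverage = 1.8e6, 24_000, 6.3  # values from the slide
print(f"uncovered: {uncovered_fraction(coverage) * genome_bp:.0f} bp")
print(f"gaps: {expected_gaps(n_reads, coverage):.0f}")
```

The estimate is a few kilobases left unsequenced, spread over a few tens of gaps, in line with the imperfect coverage noted on the slide.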
Clone by clone sequencing
1.  Bacterial artificial chromosome (BAC): 100-200 kb clones
2.  Physical map to sort the clones
3.  Sanger sequencing of small fragments (100-1000 bp), assembled from their overlaps
Nature 409, 860 (2001)
Summary: http://www.snv.jussieu.fr/vie/dossiers/genomes/methodes_resume.htm
Chronology
§  1977: ϕX174, bacterial virus: 5,386 nt, 11 genes (circular single-stranded DNA)
§  1984: HIV, retrovirus
§  1995: Haemophilus influenzae, bacterium: 1.8 Mbp, 1,740 genes
§  1997: Saccharomyces cerevisiae, yeast: 13 Mbp, 6,275 genes
§  1997: Escherichia coli, bacterium: 4.6 Mbp
§  1998: Caenorhabditis elegans, animal: 97 Mbp
§  2000: Arabidopsis thaliana, plant: 157 Mbp
+ rice, food plant: 430 Mbp, the cereal with the smallest genome
§  2002: mouse: 2.5 Gbp, 90% of genes in common with us
§  2007: zebrafish: 1.7 Gbp
Human genome project
§  1988: Human Genome Organisation
§  1992: 1st map of the human genome
§  1999: sequencing of the 1st human chromosome (22)
§  2001: rough draft of the human genome completed
§  2005: Human metagenome project
Pacific Biosciences
ZERO-MODE WAVEGUIDES
Principle
§  Sequencing by detection of the incorporation of
fluorescent nucleotides.
§  Detection at the single molecule level.
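Real-time single-molecule detection means that the read is the time series of fluorescence pulses from one polymerase, one color per base. A minimal sketch, assuming the pulses have already been detected as (start time, duration, color) events and that pulses shorter than a hypothetical `min_duration_ms` come from labeled nucleotides merely diffusing through the volume rather than being incorporated; all values are made up.

```python
from typing import NamedTuple


class Pulse(NamedTuple):
    start_ms: float     # time at which the pulse begins
    duration_ms: float  # how long the fluorescence lasts
    base: str           # base identified from the fluorophore color


def read_from_pulses(pulses: list[Pulse], min_duration_ms: float) -> str:
    """Keep incorporation-like pulses, order them in time, return the bases."""
    kept = [p for p in pulses if p.duration_ms >= min_duration_ms]
    return "".join(p.base for p in sorted(kept, key=lambda p: p.start_ms))


pulses = [
    Pulse(12.0, 8.0, "C"),
    Pulse(3.0, 6.0, "G"),
    Pulse(20.0, 0.5, "T"),  # brief flash: diffusing nucleotide, not incorporated
    Pulse(30.0, 7.0, "A"),
]
print(read_from_pulses(pulses, min_duration_ms=5.0))  # GCA
```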
Single molecule fluorescence microscopy: confocal microscopy
Figure labels: scanner (x, y, z), objective, excitation laser, dichroic mirror, tube lens, photodetector.
Single molecule fluorescence microscopy: TIRFM
§  Total internal reflection fluorescence microscopy (TIRFM)
Figure labels: excitation beam, oil, objective (NA = 1.45), coverslip, evanescent wave, fluorophores, fluorescence signal.
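The thickness of the excited layer in TIRFM follows from the standard expression of the evanescent-wave penetration depth, $d = \frac{\lambda}{4\pi}\left(n_1^2\sin^2\theta - n_2^2\right)^{-1/2}$, valid above the critical angle $\theta_c = \arcsin(n_2/n_1)$; the intensity then decays as $I(z) = I_0 e^{-z/d}$. The values below (glass/oil $n_1 = 1.518$, water $n_2 = 1.33$, λ = 488 nm, steepest angle allowed by NA = 1.45) are illustrative assumptions.

```python
import math


def critical_angle_deg(n1: float, n2: float) -> float:
    """Angle of incidence above which light is totally reflected."""
    return math.degrees(math.asin(n2 / n1))


def penetration_depth_nm(wavelength_nm: float, n1: float, n2: float,
                         theta_deg: float) -> float:
    """1/e depth of the evanescent-wave intensity."""
    s = (n1 * math.sin(math.radians(theta_deg))) ** 2 - n2 ** 2
    if s <= 0:
        raise ValueError("below the critical angle: no total internal reflection")
    return wavelength_nm / (4 * math.pi * math.sqrt(s))


n1, n2, na = 1.518, 1.33, 1.45
theta_max = math.degrees(math.asin(na / n1))  # steepest angle the objective reaches
print(f"critical angle {critical_angle_deg(n1, n2):.1f} deg, max {theta_max:.1f} deg")
print(f"depth at max angle: {penetration_depth_nm(488, n1, n2, theta_max):.0f} nm")
```

The result is a few tens of nanometers, i.e. of the order of the ~100 nm depth used for the TIRFM excitation volume in the next slide.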
Single molecule fluorescence microscopy: elementary excitation volumes
§  Confocal: lateral size ≈ $1.22\,\lambda/\mathrm{NA}$, axial size ≈ $4n\lambda/\mathrm{NA}^2$ → $V_{elem} \approx 0.16$ fL
§  TIRFM: lateral size ≈ $1.22\,\lambda/\mathrm{NA}$, depth of the evanescent wave ≈ 100 nm → $V_{elem} \approx 0.02$ fL
(λ = 550 nm, NA = 1.45)
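The TIRFM value can be checked with the slide's own approximation: a square of side $1.22\,\lambda/\mathrm{NA}$ (the diameter of the diffraction spot) times the ~100 nm depth of the evanescent wave, converted to femtoliters (1 fL = 1 µm³). The function name is illustrative.

```python
def tirfm_volume_fl(wavelength_nm: float, na: float, depth_nm: float) -> float:
    """Elementary excitation volume (1.22 lambda/NA)^2 x depth, in fL."""
    lateral_nm = 1.22 * wavelength_nm / na  # diameter of the diffraction spot
    volume_nm3 = lateral_nm ** 2 * depth_nm
    return volume_nm3 / 1e9                 # 1 fL = 1 um^3 = 1e9 nm^3


print(f"{tirfm_volume_fl(550, 1.45, 100):.3f} fL")  # 0.021 fL
```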
Consequence on the concentration of fluorescently labeled molecules
§  If more than one fluorescent molecule is in one elementary excitation volume, single molecules cannot be detected.
§  Max. reachable concentrations: 100 nM for TIRFM and 10 nM for confocal.
§  Physiological concentrations (T. Traut, Mol. Cell. Biochem. 140, 1 (1994)):
–  dATP: 24 ± 22 µM
–  dGTP: 5.2 ± 4.5 µM
–  dCTP: 29 ± 19 µM
–  dTTP: 37 ± 30 µM
→ The excitation volume must be smaller!
[Slide background: excerpt from Levene, M. J. et al., Science 299, 682–686 (2003). Arrays of zero-mode waveguides are made as small holes in an 89-nm-thick aluminum film on fused-silica coverslips (Fig. 3), patterned by electron-beam lithography followed by reactive ion etching. Their zeptoliter-scale observation volumes, more than four orders of magnitude smaller than the diffraction limit, allow work at concentrations as high as 200 µM with less than one molecule per volume, increasing the usable concentration range by well over three orders of magnitude. The fluorescence quantum yield inside the waveguide is modeled as $Q(z) = \frac{k_r(z)}{k_r(z) + k_{nr}} \approx \frac{p(z)}{p(z) + C}$ (2), where $C$ is a constant such that $Q(0)$ equals the quantum yield at the entrance of the waveguide; a nonfluctuating background $B$ changes the FCS amplitude to $G(0) = \frac{N}{(N + B)^2}$.]
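The link between volume and maximum concentration is $c_{max} = 1/(N_A V)$, the concentration giving on average one molecule per volume $V$. A short check with the Avogadro constant $N_A = 6.022\times10^{23}$ mol⁻¹ reproduces the 100 nM and 10 nM limits from the 0.02 fL and 0.16 fL volumes, and gives the volume that would be needed at physiological dNTP levels. Function names are illustrative.

```python
AVOGADRO = 6.022e23  # molecules per mole


def max_concentration_molar(volume_liters: float) -> float:
    """Concentration giving on average one molecule per volume."""
    return 1 / (AVOGADRO * volume_liters)


def max_volume_liters(concentration_molar: float) -> float:
    """Largest volume holding on average one molecule at this concentration."""
    return 1 / (AVOGADRO * concentration_molar)


FL, ZL = 1e-15, 1e-21  # femtoliter and zeptoliter, in liters
print(f"TIRFM:    {max_concentration_molar(0.02 * FL) * 1e9:.0f} nM")
print(f"confocal: {max_concentration_molar(0.16 * FL) * 1e9:.0f} nM")
# dTTP at 37 uM (highest mean value on the slide)
print(f"needed volume: {max_volume_liters(37e-6) / ZL:.0f} zL")
```

These are order-of-magnitude estimates (about 83 nM and 10 nM, and a volume of a few tens of zeptoliters), which is the scale reached by zero-mode waveguides.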
Levene, M. J. et al., Science 299, 682–686 (2003): electron microscope image of an individual zero-mode waveguide (Fig. 3D).
Zero-mode waveguide
§  Holes of diameter d in an aluminium film deposited on fused silica, filled with the solution to be observed.
§  For $\lambda > \lambda_c = 1.7\,d$, there is no propagating TE11 mode: light only enters the hole as an evanescent wave, $I(z) = I_0\,e^{-z/\Lambda}$.
Fig. 2: (A) three-dimensional finite-element time-domain simulation of the intensity in a zero-mode waveguide 50 nm in diameter and 100 nm long; (B) S(z) curves for different waveguide diameters d.
From nanostructures to confine the excitation volume …
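The cutoff on the slide can be recovered by treating the hole as an ideal circular metal waveguide: its lowest mode, TE11, propagates only if the wavelength in the medium is shorter than $\lambda_c = \pi d / 1.841 \approx 1.7\,d$. Below cutoff the field decays along the hole, and the intensity falls as $I(z) = I_0 e^{-z/\Lambda}$ with $\Lambda = 1/\left(2\sqrt{k_c^2 - k^2}\right)$, $k_c = 2 \times 1.841/d$ and $k = 2\pi n/\lambda$. This perfect-conductor sketch (n = 1.33 for water, λ = 488 nm and d = 50 nm are illustrative) likely underestimates the penetration into a real aluminium film, so it should not be compared number for number with the simulated profiles.

```python
import math

P11 = 1.8412  # first zero of the derivative of J1, which sets the TE11 cutoff


def cutoff_wavelength_nm(d_nm: float) -> float:
    """Wavelength (in the medium) above which TE11 cannot propagate."""
    return math.pi * d_nm / P11


def decay_length_nm(d_nm: float, wavelength_nm: float, n: float) -> float:
    """1/e intensity decay length inside an ideal circular waveguide."""
    kc = 2 * P11 / d_nm                  # cutoff wavenumber
    k = 2 * math.pi * n / wavelength_nm  # wavenumber in the medium
    if k >= kc:
        raise ValueError("above cutoff: the mode propagates")
    kappa = math.sqrt(kc ** 2 - k ** 2)  # field decays as exp(-kappa z)
    return 1 / (2 * kappa)               # intensity decays as exp(-2 kappa z)


d = 50
print(f"cutoff: {cutoff_wavelength_nm(d):.0f} nm")
decay = decay_length_nm(d, 488, 1.33)
for z in (0, 10, 20, 50):
    print(z, f"{math.exp(-z / decay):.2e}")  # I(z)/I0
```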
Levene, M. J. et al., Science 299, 682–686 (2003).
… to the sequencing of a single DNA molecule
Summary
https://www.youtube.com/watch?v=WMZmG00uhwU&list=UU2y78sjVOumGc2da1tN629g
SMRT = single molecule real time
Sources
§  Thanks to François Treussart (ENS Cachan) for improving these slides.
§  Sanger sequencing animation: http://www.yourgenome.org/teachers/sequencing.shtml
§  Slides on Sanger sequencing: http://www.dil.univ-mrs.fr/~vancan/optionBio1/cours.html#htoc15 (course by Sophie Bleves)
§  Pyrosequencing: see http://gepv.univ-lille1.fr/
§  454 technology: http://classes.soe.ucsc.edu/bme215/Spring09/PPT/BME%20215-5.pdf and http://www.biopsci.com/2012/02/22/sequencage-de-ladn-la-revolution-est-de-nouveau-en-marche/
§  Les technologies de laboratoire n°5, July-August 2007, « Evolution des techniques de séquençage »
§  Slide 4: photo from http://en.wikipedia.org/wiki/Frederick_Sanger
§  Slide 5: diagram from http://www.mun.ca/biology/scarr/iGen3_02-07.html
§  Slide 7: diagram adapted from http://www.biology.arizona.edu/biochemistry/problem_sets/large_molecules/06t.html
§  Slides 9 & 11: diagrams from https://facmed.univ-rennes1.fr/wkf//stock/RENNES20080328110058vdavidSEquencage.pdf
§  Slide 12: http://dc202.4shared.com/doc/ICPFo0Ga/preview.html
§  Human genome: http://www.snv.jussieu.fr/vie/dossiers/genomes/methodes_intro.htm
§  Slides 29-30: see http://www.universalis.fr/encyclopedie/sequencage-d-adn-reperes-chronologiques/
§  Zebrafish: http://www.zmescience.com/medicine/mind-and-brain/zebrafish-locomotion-human-evolution-942333/ and Ferris State University