Chap 4 Synthesis, Sequencing and Amplification of DNA
Why chemical synthesis of DNA? [1]
1. Used as a probe for DNA detection (hybridization and Southern blotting)
2. Synthesized DNA for the assembly of genes or gene fragments (if the gene sequence is difficult to clone)
3. Introducing mutations (in vitro mutagenesis)
4. Single-stranded oligonucleotides (17-30-mer) as primers for PCR and DNA sequencing
5. Double-stranded linkers: two palindromic ssDNA molecules that base pair with each other to form a restriction site, which helps the cloning of DNA fragments.
6. Adapter sequences: allow cDNA to be cloned by blunt-end ligation and later excised from the vector by a different restriction enzyme (RE), so that the DNA can be transferred to and from vectors.
Chemical Synthesis of oligonucleotides (by phosphoramidite method)

The process has been automated in commercial DNA synthesizers (a synthesis can be finished in about 10 hr)

Oligonucleotides of specific sequence can also be ordered from suppliers at low cost

Skip the details
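Even without the chemistry, one consequence of the stepwise phosphoramidite method is worth seeing in numbers: each nucleotide is added in its own coupling step, and a chain is full length only if every step succeeded. The sketch below assumes a constant per-step coupling efficiency; the 99% and 99.5% values are illustrative assumptions, not figures from this chapter.

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float) -> float:
    """Fraction of chains that reach full length in stepwise oligo synthesis.

    The first (3') nucleotide is already attached to the solid support, so an
    n-mer needs n - 1 coupling steps, and every one of them must succeed.
    """
    if length_nt < 1:
        raise ValueError("length_nt must be at least 1")
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must be between 0 and 1")
    return coupling_efficiency ** (length_nt - 1)


for n in (20, 60, 100, 300):
    for eff in (0.99, 0.995):  # assumed per-step coupling efficiencies
        frac = full_length_fraction(n, eff)
        print(f"{n:>3} nt at {eff:.1%} per step: {frac:.1%} full length")
```

At 99% per step, a 300-mer comes out only about 5% full length, which is why long genes are assembled from shorter oligonucleotides rather than synthesized in one piece.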
I. Synthesis of genes
For short genes (60-80 bp)

Synthesize the two complementary strands and anneal them (heat to denature, then cool slowly to renature) to form dsDNA.
For larger genes (>300 bp)

The efficiency of correct synthesis of long oligonucleotides is low.

Synthesize a set of partially overlapping, complementary oligonucleotides (20-60 nt long).

The sequences of the oligonucleotides are designed to overlap partially, so that on annealing they assemble into a double-stranded array.
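The overlapping design can be sketched as a small tiling function. It is a simplified illustration, assuming fixed-length pieces and ignoring practical checks (melting temperature of each overlap, secondary structure, repeats): top-strand oligonucleotides start every `oligo_len` nt, and bottom-strand oligonucleotides are shifted by half a length, so each one bridges the junction between two top-strand pieces. The function names, parameters and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence, both written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def design_overlapping_oligos(gene: str, oligo_len: int = 40) -> tuple[list[str], list[str]]:
    """Split a gene into top- and bottom-strand oligos with staggered junctions.

    Returns (top, bottom); every oligo is written 5'->3'. Bottom-strand
    junctions sit halfway between top-strand junctions, so adjacent oligos
    on opposite strands overlap by about oligo_len // 2 nt.
    """
    if not gene:
        raise ValueError("gene must not be empty")
    if oligo_len < 2:
        raise ValueError("oligo_len must be at least 2")
    n, half = len(gene), oligo_len // 2
    top_cuts = list(range(0, n, oligo_len)) + [n]
    bottom_cuts = [0] + list(range(half, n, oligo_len)) + [n]
    top = [gene[a:b] for a, b in zip(top_cuts, top_cuts[1:])]
    # Bottom-strand pieces are reverse complements of their top-strand spans,
    # listed from the gene's right-hand end so the bottom strand reads 5'->3'.
    bottom = [reverse_complement(gene[a:b]) for a, b in zip(bottom_cuts, bottom_cuts[1:])]
    return top, bottom[::-1]


# Arbitrary made-up sequence for illustration.
gene = ("ATGAGCAAAGGCGAAGAACTGTTTACCGGCGTGGTGCCGATTCTGGTGGAACTGGATGGC"
        "GATGTGAACGGCCATAAATTTAGCGTGAGC")
top, bottom = design_overlapping_oligos(gene, oligo_len=40)
for i, oligo in enumerate(top, 1):
    print(f"top {i}:    {oligo}")
for i, oligo in enumerate(bottom, 1):
    print(f"bottom {i}: {oligo}")
```

Joining the top oligos reproduces the gene and joining the bottom oligos reproduces its reverse complement; the end pieces can be short, which a real design would adjust.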
II. DNA sequencing

To determine the DNA sequence

Can be used to compare the similarity of genes (conserved regions), or to ensure accurate
cloning (or synthesis of genes).

Essential in molecular cloning because the DNA sequence is needed to devise cloning strategies
Sanger Method [2]: an enzymatic procedure, currently the method of choice

ddNTP: dideoxynucleoside triphosphate, a man-made molecule that blocks continued synthesis of the DNA chain: because the 3'-OH group is replaced, no phosphodiester bond can form with the next incoming dNTP.

Note: dNTPs are used for DNA synthesis and NTPs for RNA synthesis (e.g. in vitro transcription).
A primer (17-24-mer) complementary to a predetermined segment of the cloning vector next to the cloned DNA is synthesized and annealed to the (denatured) cloned DNA.
The reactions are carried out in 4 separate tubes, each containing all 4 dNTPs plus a different ddNTP. In the tube w/ ddATP, for example, the ddATP concentration is carefully controlled so that ddATP is incorporated at every possible site (opposite every T in the template) in some of the chains. Each reaction therefore generates a set of DNA strands of different lengths, each ending with a specific ddNTP.
Formamide is added to stop the synthesis and to prevent the DNA strands from base pairing (denaturation is crucial; otherwise the mobility in the gel changes).
[Figure: polyacrylamide gel electrophoresis and autoradiography of the four reactions; the gel is read from bottom to top]
Note:

The polyacrylamide gel is run in the presence of urea to prevent renaturation; the gel is usually thin and long.

Because the terminated DNA chains of different sizes are 32P-labeled, the gel is exposed to X-ray film, and only the radiolabeled DNA fragments are visible. The label can be on the primer or on a dNTP. If the radiolabel is on the dNTP, each chain is labeled throughout its length rather than at one end, so the signal is stronger and less DNA is needed for sequencing.

Usually the primer is positioned about 10-20 nt away from the insertion site of the cloned DNA, so that the known vector sequence can be recognized at the start of the read.

The accuracy can be improved by sequencing both strands of the DNA.

The bottom of the gel holds the shortest strands, i.e. the bases nearest the primer.
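The logic of the four tubes and of reading the gel from bottom to top can be checked with a toy model. It assumes ideal behavior (every possible termination site is hit, and each band is exactly one nucleotide longer than the one below), and it takes the template as written 3'→5' starting right after the primer, so that position i of the new strand is the complement of template base i. The function names are illustrative.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def dideoxy_reactions(template_3to5: str) -> dict[str, list[int]]:
    """Fragment lengths (nt added after the primer) in each ddNTP tube."""
    new_strand = "".join(PAIR[base] for base in template_3to5.upper())
    lanes: dict[str, list[int]] = {dd: [] for dd in "ACGT"}
    for i, base in enumerate(new_strand):
        # A chain stops at length i + 1 when the ddNTP matching `base` is incorporated.
        lanes[base].append(i + 1)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read bands from the bottom (shortest) to the top (longest) of the gel.

    The result is the newly synthesized strand, written 5'->3'.
    """
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = dideoxy_reactions("TACGGA")
print(lanes)            # {'A': [1], 'C': [4, 5], 'G': [3], 'T': [2, 6]}
print(read_gel(lanes))  # ATGCCT
```

The read is the complementary copy of the template; its reverse complement (AGGCAT here) is the template written 5'→3'.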
Capillary Electrophoresis (CE) and DNA Analyzer:

CE uses long capillary tubes instead of slab gels. CE dissipates heat more efficiently than slab gels, which allows a higher run voltage, faster run times and higher sample throughput (new machines can run 96 capillaries at one time).

Each of the four terminators is labeled with a different fluorescent dye, so all four reactions can be run together in a single capillary. During electrophoresis the dyes are excited, the fluorescence is detected, and the data are processed and analyzed.

Automation (invented by Leroy Hood of Caltech): automated polymer gel filling, sample injection, detection and analysis eliminate manual operations and give higher run-to-run consistency and reliability (Craig Venter used this technique to revolutionize DNA sequencing).

Higher sensitivity, so less DNA is needed per sample
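With four dyes in one capillary, base calling reduces to two steps: order the peaks by arrival time (shorter fragments arrive first, like the bottom of a slab gel) and, for each peak, pick the base whose dye channel is brightest. The sketch assumes the peaks have already been detected and that each dye channel is labeled by the base of its terminator; the 2:1 ambiguity threshold is an arbitrary illustrative choice, and real analyzers use far more elaborate signal processing.

```python
Peak = tuple[float, dict[str, float]]  # (arrival time, intensity per base/dye channel)


def call_bases(peaks: list[Peak], min_ratio: float = 2.0) -> str:
    """One base call per peak; 'N' when the two brightest channels are too close."""
    calls = []
    for _, channels in sorted(peaks, key=lambda peak: peak[0]):
        ranked = sorted(channels.items(), key=lambda item: item[1], reverse=True)
        (best_base, best), (_, runner_up) = ranked[0], ranked[1]
        calls.append(best_base if best >= min_ratio * runner_up else "N")
    return "".join(calls)


peaks = [
    (12.0, {"A": 5, "C": 90, "G": 3, "T": 1}),
    (10.5, {"A": 80, "C": 2, "G": 5, "T": 4}),
    (13.4, {"A": 40, "C": 35, "G": 6, "T": 2}),  # ambiguous peak
]
print(call_bases(peaks))  # ACN
```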

Application: the Human Genome Project (HGP) was made possible by the automation of the sequencing method. In the HGP, up to 32 million fragments, each 500-600 bp, were sequenced.
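The HGP figures translate into sequencing depth ("fold coverage"): total bases read divided by genome size. The sketch assumes a haploid human genome of about 3.1 × 10^9 bp and, for the second number, the idealized Lander-Waterman model in which reads land uniformly at random, so the expected fraction of bases never read is e^(-coverage).

```python
import math

GENOME_BP = 3.1e9  # approximate haploid human genome size (assumption)


def fold_coverage(n_reads: float, read_len_bp: float, genome_bp: float = GENOME_BP) -> float:
    """Average number of times each base is read."""
    return n_reads * read_len_bp / genome_bp


def fraction_unread(coverage: float) -> float:
    """Lander-Waterman estimate of the fraction of bases never covered."""
    return math.exp(-coverage)


for read_len in (500, 600):
    c = fold_coverage(32e6, read_len)
    print(f"{read_len} bp reads: {c:.1f}x coverage, ~{fraction_unread(c):.2%} of bases unread")
```

Under these assumptions the HGP numbers correspond to roughly 5- to 6-fold coverage, with well under 1% of bases expected to be missed.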

Next-generation sequencing (see Appendix)
III. PCR (Polymerase Chain Reaction) [3]

Invented by Kary Mullis in the mid-1980s

A revolution: PCR enables DNA (genes) to be amplified to abundant quantities for analysis (no need to clone and amplify by cell culture).
Requirements:
1. Template
2. The 4 dNTPs
3. Two synthetic primers (10-30 nt) that are complementary to regions on opposite strands flanking the target sequence; the two primers specify the region to amplify
4. A thermostable DNA polymerase that can withstand heating to 95 °C (e.g. Taq DNA polymerase from the bacterium Thermus aquaticus, which lives in water at about 75 °C)

Three major steps: denaturation, annealing, extension

Normally 30-32 cycles are used amplify 230 ( 1 billion fold) the short desired
fragment is almost 100% of the entire population (in PCR, the percentages of original and long templates
will drop as the process progresses, so the end product is the DNA fragment specified by the primers) .

After n cycles 2n-2 (2.68108 fold for n=30) desired DNA fragments (1st and 2nd cycle
don’t generate the desired fragments), but in reality, usually 105-106 copies are obtained

Low efficiency is due to:

Mismatched priming, which generates wrong, shorter strands and uses up the dNTPs needed for correct synthesis

Repeated thermocycling, which reduces enzyme activity
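The cycle-by-cycle bookkeeping behind these numbers can be simulated by following three classes of strands: the two original template strands, "long" strands that start at a primer but run past the far end of the target, and target-length strands bounded by both primers. The sketch assumes one double-stranded starting molecule and a fixed per-cycle efficiency (the chance that any given strand is copied); with 100% efficiency the target strands number 2^(n+1) - 2n - 2 after n cycles. The function name and the 80% example are hypothetical.

```python
def pcr_strands(cycles: int, efficiency: float = 1.0) -> dict[str, float]:
    """Expected number of each strand class after `cycles` rounds of PCR.

    Starts from one double-stranded template. In every cycle each strand is
    copied with probability `efficiency`:
      original strand        -> new long strand (primer sequence at one end only)
      long or target strand  -> new target-length strand (primers at both ends)
    """
    original, long_, target = 2.0, 0.0, 0.0
    for _ in range(cycles):
        new_long = efficiency * original
        new_target = efficiency * (long_ + target)
        long_ += new_long
        target += new_target
    return {"original": original, "long": long_, "target": target}


for eff in (1.0, 0.8):  # 0.8 is an assumed, sub-ideal efficiency
    counts = pcr_strands(30, eff)
    total = sum(counts.values())
    print(f"efficiency {eff:.0%}: {counts['target']:.3g} target strands, "
          f"{counts['target'] / total:.6%} of all strands")
```

Even at 100% efficiency the original and long strands grow only linearly (2 and 2n), so the target fragment ends up as essentially the whole population; a lower efficiency shrinks the yield geometrically, because the growth factor per cycle becomes (1 + efficiency) instead of 2.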

There is a chance of misincorporation. Taq lacks proofreading (3'→5' exonuclease) activity, so other polymerases that possess proofreading activity, such as Pfu or PowerTaq, can be used.

Primers should be designed with minimal secondary (2°) structure (e.g. hairpins, primer dimers), which would prevent annealing to the template.

Thermocyclers are now common; they provide precise control of the reaction conditions, which vary from case to case.

The original template can be purified DNA or a crude cell lysate, as long as the primers are specific enough.
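Primer design, including the secondary-structure point above, can be partly screened with a few lines of code. The sketch uses the Wallace rule, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C), which is only a rough estimate for short oligonucleotides, and a crude test of whether a primer's 3' end can pair with itself or with the partner primer. The thresholds (40-60% GC, a 3' complementary stretch of 4 or more bases, a Tm difference over 5 °C) are common rules of thumb treated here as assumptions, and the primer sequences are made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(primer: str) -> float:
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in °C: 2 per A/T plus 4 per G/C."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def three_prime_pairing(primer: str, other: str) -> int:
    """Longest 3'-terminal stretch of `primer` that can pair antiparallel with `other`."""
    best = 0
    for k in range(1, len(primer) + 1):
        if reverse_complement(primer[-k:]) in other:
            best = k
    return best


def check_pair(fwd: str, rev: str) -> list[str]:
    warnings = []
    for name, p in (("forward", fwd), ("reverse", rev)):
        if not 0.40 <= gc_fraction(p) <= 0.60:
            warnings.append(f"{name}: GC content {gc_fraction(p):.0%}")
        if three_prime_pairing(p, p) >= 4:
            warnings.append(f"{name}: 3' end is self-complementary")
    if abs(wallace_tm(fwd) - wallace_tm(rev)) > 5:
        warnings.append("Tm values differ by more than 5 °C")
    if max(three_prime_pairing(fwd, rev), three_prime_pairing(rev, fwd)) >= 4:
        warnings.append("possible primer dimer")
    return warnings


for warning in check_pair("AGCGGATAACAATTTCACACAGG", "GAATTCGAATTCGAATTC"):
    print(warning)
```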
IV. Applications of PCR [2]
1. Amplifying cloned DNA from vectors

Purpose: to check the accuracy of cloning; the amplified DNA can also be introduced into other vectors.

PCR is run to amplify the cloned DNA using primers flanking the cloned insert.

The cloned DNA insert is amplified, and the products may have different lengths.

The products can be further verified by DNA sequencing or by a second PCR using one gene-specific primer (GSP) and one flanking primer.
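Checking a cloned insert by PCR comes down to predicting the product size from where the two flanking primers bind. Below is a minimal in-silico PCR, assuming exact primer matches on a sequence given as its top strand (5'→3'); real primer-binding searches tolerate mismatches. The sequences and names are made up for illustration.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def predicted_amplicon(template: str, fwd: str, rev: str) -> Optional[str]:
    """Product of an exact-match PCR on a double-stranded template.

    `fwd` matches the top strand; `rev` anneals to the top strand, so its
    reverse complement appears in the top-strand sequence downstream.
    """
    start = template.find(fwd)
    if start < 0:
        return None
    rev_site = reverse_complement(rev)
    end = template.find(rev_site, start + len(fwd))
    if end < 0:
        return None
    return template[start:end + len(rev_site)]


left_flank, right_flank = "CCCCACGTACGA", "GCATGCAACCCC"  # hypothetical vector sequence
fwd, rev = "ACGTACGA", "TTGCATGC"                          # hypothetical flanking primers
empty_vector = left_flank + right_flank
recombinant = left_flank + "ATG" + "GCT" * 30 + "TAA" + right_flank
for name, seq in (("empty vector", empty_vector), ("recombinant", recombinant)):
    product = predicted_amplicon(seq, fwd, rev)
    print(name, len(product) if product else "no product")
```

The empty vector gives a 16-bp product and the recombinant one a product 96 bp longer (the insert length), which is the size difference a gel would show.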

2. Introducing unique restriction enzyme sites [1]
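A common way to introduce a restriction site is to add it as a 5' tail on a primer: the tail does not pair with the template in the first cycle, but from then on it is copied into every product. The sketch assumes EcoRI (GAATTC) and BamHI (GGATCC) sites plus a few extra "clamp" bases at the very 5' end, since many enzymes cut poorly right at the end of a DNA molecule; the clamp, the gene and the helper names are hypothetical. It also checks that each site is unique in the product, counting on the top strand only, which suffices for palindromic sites like these.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
ECORI, BAMHI = "GAATTC", "GGATCC"  # both recognition sites are palindromic


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def tailed_primer(annealing_part: str, site: str, clamp: str = "GCGC") -> str:
    """Primer = clamp bases + restriction site + template-annealing sequence (5'->3')."""
    return clamp + site + annealing_part


def product_top_strand(template: str, fwd: str, rev: str, anneal_len: int) -> str:
    """Top strand of the product when each primer's 3'-terminal `anneal_len`
    bases pair exactly with one end of `template` (written 5'->3')."""
    return fwd[:-anneal_len] + template + reverse_complement(rev)[anneal_len:]


def count_site(seq: str, site: str) -> int:
    """Occurrences of `site` in `seq` (top strand only, overlaps counted)."""
    return sum(seq.startswith(site, i) for i in range(len(seq) - len(site) + 1))


gene = "ATG" + "GCTACC" * 6 + "TAA"  # hypothetical 42-bp coding sequence
fwd = tailed_primer(gene[:18], ECORI)
rev = tailed_primer(reverse_complement(gene[-18:]), BAMHI)
product = product_top_strand(gene, fwd, rev, 18)
print(fwd, rev, product, sep="\n")
print("EcoRI sites:", count_site(product, ECORI), "BamHI sites:", count_site(product, BAMHI))
```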
3. Creating a recombinant DNA molecule by sequential PCR amplifications [1]
4. Sequence and Ligation Independent Cloning (SLIC) [4]

LIC is a cloning method that relies on the annealing of complementary single-stranded overhangs of at least 12 bases on the target vector and on a PCR-generated insert. The commercial In-Fusion™ system (Clontech) is based on the same principle and requires a 15-base overlap region. Single-stranded overhangs can be generated by using T4 DNA polymerase with only one dNTP in the reaction mix, which leads to an equilibrium between the 3'->5' exonuclease and 5'->3' polymerase activities at the first occurrence of this nucleotide.

The incubation with T4 DNA polymerase is done for 30 min and is stopped by adding dCTP to the reaction mix. After annealing of vector and insert, the mixture is used to transform E. coli.

The linear vector fragment is generated by PCR with a single primer pair. Universal annealing sites (a 3C-protease sequence at the 5' end and a common 3' homology region in the ccdB gene) are added to the gene-specific primers, so that one insert can be used for all vectors.
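The single-dNTP trick can be modeled as a simple rule: the 3'→5' exonuclease keeps removing nucleotides from the 3' end until the terminal nucleotide is the one supplied, at which point the polymerase replaces anything it removes and the end stalls. The sketch below (hypothetical names, perfect enzyme behavior assumed) returns the trimmed strand and the length of the single-stranded 5' overhang left on the partner strand. It also shows why LIC tails are typically designed to lack the supplied nucleotide except at the intended stop position.

```python
def chew_back(strand: str, dntp: str) -> tuple[str, int]:
    """Trim `strand` (written 5'->3') from its 3' end until its 3'-terminal base
    equals `dntp`. Returns (trimmed strand, 5' overhang length on the partner)."""
    if len(dntp) != 1 or dntp not in "ACGT":
        raise ValueError("dntp must be one of A, C, G, T")
    end = len(strand)
    while end > 0 and strand[end - 1] != dntp:
        end -= 1  # exonuclease removes the 3'-terminal nucleotide
    return strand[:end], len(strand) - end


# Hypothetical strand end: sequence ending in C, then a 12-nt tail containing no C.
trimmed, overhang = chew_back("ATGGCC" + "TTAGTATGATAG", "C")
print(trimmed, overhang)  # ATGGCC 12
```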
5. Gibson Assembly

Gibson assembly is a molecular cloning method which allows for the joining of multiple
DNA fragments in a single, isothermal reaction.

The entire Gibson assembly reaction requires few components with minor manipulations.
The method can simultaneously combine more than ten DNA fragments based on
sequence identity. It requires that the DNA fragments contain ~20-40 base pair overlap
with adjacent DNA fragments. These DNA fragments are mixed with a cocktail of three
enzymes, along with other buffer components.

The three required enzyme activities are: exonuclease, DNA polymerase, and DNA ligase.

The exonuclease chews back DNA from the 5' end. The resulting single-stranded regions on adjacent DNA fragments can anneal.

The DNA polymerase incorporates nucleotides to fill in any gaps.

The DNA ligase covalently joins the DNA of adjacent segments, thereby removing any nicks in the DNA.

The entire mixture is incubated at 50 °C for up to one hour. The result is the different DNA fragments joined into one.
(see https://www.neb.com/protocols/2012/12/11/gibson-assembly-protocol-e5510?device=pdf for more details)
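What the three enzymes accomplish can be checked in silico before ordering primers: consecutive fragments must share an identical terminal overlap, and the assembled product contains each overlap once. A minimal sketch, assuming fragments are listed in order and on the same strand, and that the intended overlap is the longest exact suffix/prefix match of at least 20 bp (the lower end of the ~20-40 bp range above); names and sequences are illustrative.

```python
def gibson_product(fragments: list[str], min_overlap: int = 20) -> str:
    """Join ordered fragments through identical end overlaps (each overlap kept once)."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    product = fragments[0]
    for i, frag in enumerate(fragments[1:], start=2):
        # Try the longest possible overlap first, down to the minimum allowed.
        for k in range(min(len(product), len(frag)), min_overlap - 1, -1):
            if product.endswith(frag[:k]):
                product += frag[k:]
                break
        else:
            raise ValueError(f"fragment {i} has no overlap of at least {min_overlap} bp")
    return product


ov1 = "ACGTTGCAACGTTGCAACGT"  # hypothetical 20-bp overlaps
ov2 = "TTGACCTGAGGCATCCGATA"
fragments = ["C" * 10 + ov1, ov1 + "G" * 10 + ov2, ov2 + "A" * 10]
print(gibson_product(fragments))
```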
6. Identification of an organism associated with a disease (e.g. enterovirus)

Design the primers based on the known sequence of the target DNA (e.g. a DNA sequence of the virus).

Run PCR using the crude samples (only a small quantity is required) and the primers.

After PCR, a DNA fragment of a specific size, equal to the length of the target region, will be amplified only if the target DNA (i.e. the virus) is present in the sample.

Real-time PCR can be used to quantify the amount of the organism.

Ex: to detect flu, we need to know a certain conserved (or variant) region to design primers. After PCR, we compare the length of the fragment with the lengths expected from the known sequences → rapid (conventional methods, growing organisms in culture or using monoclonal antibodies (MAb), often take longer). The template DNA (or RNA) for PCR is often genomic DNA (or RNA) extracted from the cells or viruses and doesn't need further purification.
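For real-time PCR, quantification usually relies on a standard curve: the threshold cycle (Ct) falls linearly with log10 of the starting copy number, because each cycle multiplies the product by (1 + E), where E is the amplification efficiency. The sketch fits that line by least squares and inverts it; the standards below are synthetic numbers generated for perfect doubling (slope = -1/log10 2 ≈ -3.32), not measured data, and the function names are hypothetical.

```python
import math


def fit_standard_curve(copies: list[float], cts: list[float]) -> tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    mean_x, mean_y = sum(xs) / len(xs), sum(cts) / len(cts)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def efficiency(slope: float) -> float:
    """Amplification efficiency E from the slope (E = 1 means perfect doubling)."""
    return 10 ** (-1 / slope) - 1


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate the starting copy number."""
    return 10 ** ((ct - intercept) / slope)


ideal_slope = -1 / math.log10(2)
standards = [1e2, 1e3, 1e4, 1e5, 1e6]
standard_cts = [38.0 + ideal_slope * math.log10(n) for n in standards]  # synthetic
slope, intercept = fit_standard_curve(standards, standard_cts)
print(f"slope {slope:.2f}, efficiency {efficiency(slope):.0%}")
print(f"sample at Ct 25.0 ~ {copies_from_ct(25.0, slope, intercept):.0f} copies")
```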
V. Directed Mutagenesis and Protein Engineering [3]

Sometimes the expressed proteins are not well suited to a specific application because of limitations in their physical and chemical properties.
→ Obtain the protein (gene) from an organism that grows in an unusual environment.
e.g. if α-amylase is used at high T (in the industrial production of sugar), we may isolate the gene from a bacterium that grows naturally at 90 °C.
→ the protein expressed from the new gene is stable at high T.

Alternative: use mutagenesis and selection to specifically change the a.a. encoded, hoping to improve properties of the protein such as:
1. The Michaelis constant (Km, which reflects how tightly the substrate binds) and the maximum rate of conversion (Vmax) under defined conditions (enzyme kinetics: V = Vmax[S]/(Km + [S]), where [S] is the substrate concentration)
2. Thermal tolerance or pH stability of a protein → enables the protein to be used under special conditions
3. The requirement for a cofactor, which matters for certain continuous industrial production processes
4. Resistance to cellular proteases → simplifies purification, ↑ recovery
→ It is very difficult to create a new protein, but it is feasible to modify the existing properties of a known protein.
→ It may be necessary to change 2 or more a.a. that are far apart in the linear sequence but close together as a result of protein folding.
→ 3-D structures are important for prediction. Bioinformatics can help predict the structure on the basis of the deduced a.a. sequence, simplifying the task of producing a modified protein.
→ To date, directed mutagenesis is a trial-and-error process. Typically a library of proteins is generated and screened for the desired change.
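To see why Km and Vmax (item 1 above) are worth engineering, the Michaelis-Menten equation can be evaluated for a wild-type enzyme and a hypothetical variant. All parameter values are invented for illustration: the variant is assumed to bind substrate more tightly (lower Km) with the same Vmax.

```python
def mm_rate(s: float, vmax: float, km: float) -> float:
    """Michaelis-Menten rate V = Vmax[S] / (Km + [S])."""
    return vmax * s / (km + s)


wild_type = {"vmax": 100.0, "km": 2.0}  # hypothetical units: µmol/min, mM
variant = {"vmax": 100.0, "km": 0.5}    # hypothetical engineered enzyme

for s in (0.1, 0.5, 2.0, 20.0):  # substrate concentrations, mM
    v_wt, v_mut = mm_rate(s, **wild_type), mm_rate(s, **variant)
    print(f"[S] = {s:>4} mM: wild type {v_wt:5.1f}, variant {v_mut:5.1f}")
```

The gain is largest at low [S] (well below Km) and fades at saturating [S], where both rates approach Vmax; at [S] = Km the rate is exactly Vmax/2.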
PCR-Amplified Oligonucleotide-Directed Mutagenesis
1. The target gene is cloned into a plasmid vector, and the plasmid is dispensed into 2 tubes.
2. Two primers are added to each tube. One primer in each tube (e.g. primer 1 or primer 3) is completely complementary to a sequence within or adjacent to the cloned gene except for one nt. Primers 1 and 3, which carry the nt change, anneal to opposite strands, so both nt of a specific base pair are targeted.
3. PCR synthesizes linear DNA; how are the ends joined? The hybridization regions of the primers are positioned so that the amplified DNA from the two tubes has different ends. Because of this different positioning of the ends, a strand (e.g. 1) from one tube hybridizes w/ its complementary strand (e.g. 3) from the other tube to form circular DNA w/ two nicks.
→ This procedure introduces a specific point mutation into a cloned gene w/o the need to insert the cloned gene into M13.
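Designing the mutagenic primers amounts to copying the target region and changing one base near the middle, so that the mismatch is flanked by enough perfectly paired bases on both sides. A minimal sketch, assuming 21-nt primers centered on the change and assuming the two mutagenic primers (1 and 3 above) are exact complements of each other so that they anneal to opposite strands; names and sequences are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def mutagenic_primers(top: str, position: int, new_base: str, length: int = 21) -> tuple[str, str]:
    """Return (top-strand primer, bottom-strand primer) carrying one base change.

    `top` is the gene's top strand written 5'->3'; `position` is 0-based.
    """
    if top[position] == new_base:
        raise ValueError("new_base must differ from the template base")
    # Center the window on the mutation, shifting it if it runs off either end.
    start = max(0, min(position - length // 2, len(top) - length))
    window = top[start:start + length]
    offset = position - start
    mutant = window[:offset] + new_base + window[offset + 1:]
    return mutant, reverse_complement(mutant)


gene = "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTC"  # hypothetical sequence
primer_1, primer_3 = mutagenic_primers(gene, 18, "T")
print(primer_1)
print(primer_3)
```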
Random Mutagenesis with Degenerate Oligonucleotide Primers

In reality, it is usually unknown which a.a. needs to be modified → generate all the possible a.a. changes at one site.
Ex: chemical synthesis of oligo primers w/ any of the 4 nt at defined positions.
The solution containing G also contains a few % of the other 3 nt, ∴ the resultant primer is a heterogeneous set, which will generate a series of mutations clustered in a defined portion of the target gene.

Two advantages:
1. No need to know the exact role of a particular a.a. in the function of the protein.
2. Unexpected mutant proteins w/ novel and useful properties may be generated, ∵ the introduced changes are not limited to one a.a.
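The effect of "doped" nucleotide solutions can be simulated: at each position of the targeted region the reservoir delivers the intended base most of the time and one of the other three bases a few percent of the time. The sketch assumes a 3% total contamination split equally among the other three bases and a 15-nt targeted region (both illustrative choices); the number of changes per primer then follows a binomial distribution with mean 15 × 0.03 = 0.45.

```python
import random
from collections import Counter


def doped_primer(wild_type: str, region: range, dope: float, rng: random.Random) -> str:
    """Synthesize one primer: inside `region`, each position becomes a random
    other base with probability `dope`; outside it, the wild-type base is used."""
    bases = []
    for i, base in enumerate(wild_type):
        if i in region and rng.random() < dope:
            bases.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            bases.append(base)
    return "".join(bases)


wild_type = "GATCCGCTGGCAACGCTGAAAGCCGTTGAT"  # hypothetical 30-nt primer
region = range(8, 23)                         # 15 targeted positions
rng = random.Random(1)                        # fixed seed for reproducibility
library = [doped_primer(wild_type, region, 0.03, rng) for _ in range(10_000)]

changes = Counter(sum(a != b for a, b in zip(p, wild_type)) for p in library)
print(sorted(changes.items()))  # (number of changes, number of primers)
```

Most primers carry zero or one change, so a library built this way samples single and occasional double substitutions within the chosen region.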
* If the desired properties are not found, repeat the procedure w/ a set of primers that is complementary to a different region.
Protein Engineering

20 of the many thousands of known enzymes account for 90% of the enzymes currently used in industry.
Why are other enzymes not used?
→ Their activities evolved under natural conditions and are not well suited to in vitro functions in industry (e.g. high T, high P). Although thermotolerant enzymes can be isolated from thermophilic microorganisms, these organisms often lack the particular enzyme → protein engineering by directed mutagenesis and gene cloning.
Ex: adding S-S bonds (between cysteines, –CH2-SH)
→ ↑ stability (the protein may not unfold readily → ↑ resistance to organic solvents and extremes of pH…)
Q: Does an extra S-S bond perturb the normal function of a protein?
References:
[1] Ausubel F, Brent R, Kingston R, Moore D, Seidman J, Smith J, et al. Short protocols in molecular biology. New York: John Wiley & Sons, 1999.
[2] Watson J, Gilman M, Witkowski J, Zoller M. Recombinant DNA. New York: W.H. Freeman and Co., 1992.
[3] Glick B, Pasternak J. Molecular Biotechnology: Principles and Applications of Recombinant DNA. Washington, D.C.: ASM Press, 2003.
[4] Scholz J, Besir H, Strasser C, Suppmann S. A new method to customize protein expression vectors for fast, efficient and background free parallel cloning. BMC Biotechnology 2013; 13:1-11.
Appendix:
Next Generation Sequencing (NGS)

Massively parallel sequencing technique generating genome-scale data sets → throughput improved from kilobases per run (CE) to gigabases per run, and even to terabases in a single sequencing run (Table 1).

The basic principle of Illumina NGS:
1. Library preparation: random fragmentation of the DNA sample followed by 5' and 3' adapter ligation → the adapter sequences are designed to hybridize to the surface of the flow cell, to provide sequencing-primer annealing sites, and to contain the index sequence for multiplex sequencing
2. Cluster amplification: the library is annealed onto surface-bound oligos complementary to the adapters, followed by bridge amplification to amplify distinct, clonal clusters
[Figure: genomic DNA → fragmentation → adapter ligation → sequencing library → bridge amplification & clonal cluster formation]
3. Sequencing: sequencing by synthesis (SBS) technology using 4 fluorescently labeled dNTPs → in every synthesis cycle, the dNTPs are incorporated and their fluorescence is detected
4. Data analysis: sequencing reads are aligned to a reference sequence with bioinformatics software → differences between the reference genome and the newly sequenced reads are identified, such as single nucleotide polymorphisms (SNPs) or insertion-deletions (indels)
→ Detailed animation of SBS sequencing: http://www.illumina.com/SBSvideo
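Step 4 can be illustrated with a toy pipeline: place each read at the reference position with the fewest mismatches, pile up the bases at every position, and report positions where the majority base differs from the reference. This is a deliberately naive sketch (ungapped placement, so no indels; brute-force search; depth and allele-fraction thresholds chosen arbitrarily); real aligners and variant callers are far more sophisticated. All sequences and names are made up.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def align(read: str, reference: str) -> int:
    """0-based start with the fewest mismatches (ungapped, brute force)."""
    starts = range(len(reference) - len(read) + 1)
    return min(starts, key=lambda s: hamming(read, reference[s:s + len(read)]))


def call_snps(reference: str, placed_reads: list[tuple[int, str]],
              min_depth: int = 3, min_fraction: float = 0.8) -> list[tuple[int, str, str]]:
    """Return (position, reference base, read base) where reads disagree with the reference."""
    pileup = [Counter() for _ in reference]
    for start, read in placed_reads:
        for offset, base in enumerate(read):
            pileup[start + offset][base] += 1
    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_fraction:
            snps.append((pos, reference[pos], base))
    return snps


reference = "AACCGGTTACGTAGCTAGGA"
sample = reference[:8] + "G" + reference[9:]  # hypothetical A->G change at position 8
reads = [sample[i:i + 10] for i in (0, 4, 6, 8)]
placed = [(align(r, reference), r) for r in reads]
print(call_snps(reference, placed))
```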

Advances in NGS technology:
Paired-end sequencing: sequencing from both ends → after read 1 (P5 linearized) is finished, another round of bridge amplification is performed, the P7 site is cleaved this time, and read 2 (opposite to read 1) is sequenced by SBS.
→ more accurate alignment, especially across difficult-to-sequence, repetitive regions of the genome
Multiplexing: adapters contain unique index sequences → multiple DNA libraries can be pooled and sequenced simultaneously
→ dramatically reduces the time needed for multiple-sample studies
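After a multiplexed run, reads are sorted back into samples ("demultiplexed") by their index read. A minimal sketch, assuming equal-length indexes, tolerating one mismatch, and setting aside reads that match no index, or more than one, as undetermined; the sample names and index sequences are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads: list[tuple[str, str]], sample_indexes: dict[str, str],
                max_mismatches: int = 1) -> dict[str, list[str]]:
    """Sort (index_read, sequence) pairs into per-sample bins."""
    bins: dict[str, list[str]] = {name: [] for name in sample_indexes}
    bins["undetermined"] = []
    for index_read, sequence in reads:
        hits = [name for name, index in sample_indexes.items()
                if len(index) == len(index_read)
                and hamming(index, index_read) <= max_mismatches]
        bins[hits[0] if len(hits) == 1 else "undetermined"].append(sequence)
    return bins


indexes = {"patient_1": "ACGTAC", "patient_2": "TGCATG"}
reads = [("ACGTAC", "read1"), ("ACGTAA", "read2"), ("TGCATG", "read3"), ("GGGGGG", "read4")]
print(demultiplex(reads, indexes))
# {'patient_1': ['read1', 'read2'], 'patient_2': ['read3'], 'undetermined': ['read4']}
```

Indexes are usually chosen to differ at several positions, so that a single sequencing error cannot turn one sample's index into another's.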
Comparison of next-generation sequencing methods (Table 1)
| Method | Read length | Accuracy (single read, not consensus) | Reads per run | Time per run | Cost per 1 million bases (US$) | Advantages | Disadvantages |
|---|---|---|---|---|---|---|---|
| Single-molecule real-time sequencing (Pacific Biosciences) | 10,000 bp to 15,000 bp avg | 87% single-read accuracy | 50,000 per SMRT cell, or 500-1000 megabases | 30 minutes to 4 hours | $0.13-$0.60 | Longest read length. | Moderate throughput. Equipment can be very expensive. |
| Ion semiconductor (Ion Torrent sequencing) | up to 400 bp | 98% | up to 80 million | 2 hours | $1 | Less expensive equipment. Fast. | Homopolymer errors. |
| Pyrosequencing (454) | 700 bp | 99.90% | 1 million | 24 hours | $10 | Long read size. Fast. | Runs are expensive. Homopolymer errors. |
| Sequencing by synthesis (Illumina) | 50 to 300 bp | 99.90% | up to 6 billion (TruSeq paired-end) | 1 to 11 days, depending upon sequencer and specified read length | $0.05 to $0.15 | Potential for high sequence yield, depending upon sequencer model and desired application. | Equipment can be very expensive. Requires high concentrations of DNA. |
| Sequencing by ligation (SOLiD sequencing) | 50+35 or 50+50 bp | 99.90% | 1.2 to 1.4 billion | 1 to 2 weeks | $0.13 | Low cost per base. | Slower than other methods. Has issues sequencing palindromic sequences. |
| Chain termination (Sanger sequencing) | 400 to 900 bp | 99.90% | N/A | 20 minutes to 3 hours | $2,400 | Long individual reads. Useful for many applications. | More expensive and impractical for larger sequencing projects. This method also requires the time-consuming step of plasmid cloning or PCR. |
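The cost column can be turned into a rough project budget. The sketch estimates the sequencing cost of one human genome at 30-fold coverage, assuming a genome of about 3.1 × 10^9 bp and taking the per-megabase costs straight from Table 1 (low and high ends where a range is given); it ignores library preparation, instruments and labor, so it is an order-of-magnitude comparison only.

```python
GENOME_BP = 3.1e9  # approximate human genome size (assumption)
COVERAGE = 30      # assumed target depth

# Cost per 1 million bases (US$) from Table 1: (low, high)
COST_PER_MB = {
    "SMRT (Pacific Biosciences)": (0.13, 0.60),
    "Ion semiconductor": (1.0, 1.0),
    "Pyrosequencing (454)": (10.0, 10.0),
    "Sequencing by synthesis (Illumina)": (0.05, 0.15),
    "Sequencing by ligation (SOLiD)": (0.13, 0.13),
    "Chain termination (Sanger)": (2400.0, 2400.0),
}

megabases = GENOME_BP * COVERAGE / 1e6
for method, (low, high) in COST_PER_MB.items():
    print(f"{method}: ${low * megabases:,.0f} to ${high * megabases:,.0f}")
```

On these figures Sanger sequencing comes to about $220 million versus roughly $4,650 to $14,000 for Illumina, the cost gap that made NGS the default for genome-scale work.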

Next-generation sequencing is applied to genome sequencing, genome resequencing, transcriptome profiling (RNA-Seq), DNA-protein interactions (ChIP-sequencing), and epigenome characterization. Resequencing is necessary because the genome of a single individual of a species does not reveal all of the genome variation among other individuals of the same species.

Nanopore sequencing