Next Generation sequencing and Gene Annotation
Ms. Shivani Bhagwat
Lecturer,
School of Biotechnology
DAVV
DNA SEQUENCING
DNA sequencing includes several methods and technologies that are used for
determining the order of the nucleotide bases—adenine, guanine, cytosine, and
thymine—in a molecule of DNA.
The first DNA sequences were obtained in the early 1970s by academic researchers
using laborious methods based on two-dimensional chromatography.
Maxam–Gilbert sequencing
DNA sequencing method based on chemical modification of DNA and subsequent
cleavage at specific bases.
The method requires radioactive labeling at one 5' end of the DNA (typically by a kinase
reaction using gamma-32P ATP) and purification of the DNA fragment to be sequenced.
Chemical treatment generates breaks at a small proportion of one or two of the four
nucleotide bases in each of four reactions (G, A+G, C, C+T).
The purines (A+G) are depurinated using formic acid, the guanines (and to some extent
the adenines) are methylated by dimethyl sulfate, and the pyrimidines (C+T) are
hydrolysed using hydrazine.
The addition of salt (sodium chloride) to the hydrazine reaction inhibits the reaction
of thymine for the C-only reaction.
The modified DNAs are then cleaved by hot piperidine at the position of the modified
base.
Thus a series of labeled fragments is generated, from the radiolabeled end to the first
"cut" site in each molecule.
The fragments in the four reactions are electrophoresed side by side in denaturing
acrylamide gels for size separation.
To visualize the fragments, the gel is exposed to X-ray film for autoradiography, yielding a
series of dark bands each corresponding to a radiolabeled DNA fragment, from which the
sequence may be inferred.
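The lane logic above can be sketched in a few lines of Python. This is a simplified illustration, not a laboratory protocol: it assumes ideal, complete banding, and the names `simulate_lanes` and `read_gel` are hypothetical. A band present in both the G and A+G lanes is read as G, a band in A+G only as A, a band in both C and C+T as C, and a band in C+T only as T.

```python
LANES = ("G", "A+G", "C", "C+T")

def simulate_lanes(seq):
    """Map each lane to the set of base positions (1-based, counted from the
    labelled 5' end) at which cleavage would produce a band."""
    lanes = {name: set() for name in LANES}
    for pos, base in enumerate(seq.upper(), start=1):
        if base == "G":
            lanes["G"].add(pos)
            lanes["A+G"].add(pos)
        elif base == "A":
            lanes["A+G"].add(pos)
        elif base == "C":
            lanes["C"].add(pos)
            lanes["C+T"].add(pos)
        elif base == "T":
            lanes["C+T"].add(pos)
        else:
            raise ValueError(f"unexpected base {base!r}")
    return lanes

def read_gel(lanes):
    """Infer the sequence by walking the bands from shortest to longest."""
    positions = sorted(set().union(*lanes.values()))
    calls = []
    for pos in positions:
        if pos in lanes["G"]:
            calls.append("G")      # band in both G and A+G
        elif pos in lanes["A+G"]:
            calls.append("A")      # band in A+G only
        elif pos in lanes["C"]:
            calls.append("C")      # band in both C and C+T
        else:
            calls.append("T")      # band in C+T only
    return "".join(calls)
```

For example, `read_gel(simulate_lanes("GATTACA"))` recovers the input sequence.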
NOTE: Maxam-Gilbert sequencing has fallen out of favour due to its technical complexity
prohibiting its use in standard molecular biology kits, extensive use of hazardous
chemicals, and difficulties with scale-up.
Chain-termination methods
The key principle of the Sanger method was the use of dideoxynucleotide triphosphates
(ddNTPs) as DNA chain terminators.
The classical chain-termination method requires a single-stranded DNA template, a
labelled DNA primer, a DNA polymerase, normal deoxynucleotide triphosphates (dNTPs), and
modified nucleotides (dideoxyNTPs) that terminate DNA strand elongation.
The DNA sample is divided into four separate sequencing reactions, containing all four of
the standard deoxynucleotides (dATP, dGTP, dCTP and dTTP) and the DNA polymerase.
To each reaction is added only one of the four dideoxynucleotides (ddATP, ddGTP, ddCTP,
or ddTTP) which are the chain-terminating nucleotides, lacking a 3'-OH group required for
the formation of a phosphodiester bond between two nucleotides, thus terminating DNA
strand extension and resulting in DNA fragments of varying length.
The newly synthesized and labelled DNA fragments are heat denatured and
separated by size (with a resolution of just one nucleotide) by gel electrophoresis on a
denaturing polyacrylamide-urea gel, with each of the four reactions run in one of four
individual lanes (lanes A, T, G, C).
The DNA bands are then visualized by autoradiography or UV light, and the DNA
sequence can be directly read off the X-ray film or gel image.
NOTE: Limitations include non-specific binding of the primer to the DNA, affecting
accurate read-out of the DNA sequence, and DNA secondary structures affecting the
fidelity of the sequence.
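As a rough illustration of the four-lane readout, the sketch below simulates which fragment lengths would appear in each ddNTP lane and reads the bands from shortest to longest. It assumes idealized termination at every position; the function names and the `primer_len` parameter are hypothetical. Note that the sequence read from the gel is the newly synthesized strand, which is the reverse complement of the template.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def synthesized_strand(template):
    """Strand built by the polymerase, written 5'->3'
    (the reverse complement of the template)."""
    return "".join(COMPLEMENT[b] for b in reversed(template.upper()))

def chain_termination_lanes(template, primer_len=0):
    """Map each ddNTP lane (A, C, G, T) to the fragment lengths expected in it.
    A fragment of length primer_len + i ends at base i of the new strand."""
    new_strand = synthesized_strand(template)
    lanes = {base: [] for base in "ACGT"}
    for i, base in enumerate(new_strand, start=1):
        lanes[base].append(primer_len + i)
    return lanes

def read_lanes(lanes):
    """Read the gel from the shortest band to the longest band."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)
```

Applying `synthesized_strand` to the read gives back the template sequence.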
Dye-terminator sequencing
Dye-terminator sequencing utilizes labelling of the chain terminator ddNTPs, which
permits sequencing in a single reaction, rather than four reactions as in the
labelled-primer method. In dye-terminator sequencing, each of the four dideoxynucleotide chain
terminators is labelled with a fluorescent dye, each of which emits light at a different
wavelength.
Automated DNA-sequencing instruments (DNA sequencers) can sequence up to 384
DNA samples in a single batch (run) in up to 24 runs a day. DNA sequencers carry out
capillary electrophoresis for size separation, detection and recording of dye
fluorescence, and data output as fluorescent peak trace chromatograms.
Base calling software typically gives an estimate of quality to aid in quality trimming.
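A minimal sketch of quality trimming follows. It assumes the quality estimate is a Phred-style score, Q = -10 log10(P), where P is the estimated probability that the base call is wrong; this is a widely used convention, but the lecture does not name it. The threshold of 20 and the function names are illustrative choices only.

```python
import math

def phred_to_error_prob(q):
    """Convert a Phred-style quality score to an error probability."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Convert an error probability to a Phred-style quality score."""
    return -10 * math.log10(p)

def trim_3prime(seq, quals, min_q=20):
    """Drop trailing bases until a base with quality >= min_q is reached."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]
```

With this convention, Q20 corresponds to a 1 in 100 chance of a wrong call and Q30 to 1 in 1000.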
Massively parallel signature sequencing (MPSS)
Developed in the 1990s; the procedure is relatively complicated.
It is a sequence based approach that can be used to identify and quantify mRNA
transcripts present in a sample similar to serial analysis of gene expression (SAGE)
but the biochemical manipulation and sequencing approach differ substantially.
mRNA transcripts are identified through the generation of a 17-20 bp (base pair)
signature sequence adjacent to the 3’-end.
Each signature sequence is cloned onto one of a million microbeads. The technique
ensures that only one type of DNA sequence is on a microbead.
The microbeads are then arrayed in a flow cell for sequencing and quantification.
Fluorescently labeled encoders are used to decode the sequence.
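Once each bead has been decoded, quantification reduces to counting how often each signature occurs. The sketch below assumes one decoded signature string per bead; the per-million normalization and the function names are illustrative assumptions, not part of the MPSS description above.

```python
from collections import Counter

def signature_counts(bead_signatures):
    """Count how many beads carry each signature sequence."""
    return Counter(bead_signatures)

def transcripts_per_million(counts):
    """Scale raw signature counts to a per-million abundance estimate."""
    total = sum(counts.values())
    return {sig: n * 1_000_000 / total for sig, n in counts.items()}
```

For example, if three of four decoded beads carry the same 17 bp signature, that signature is reported at 750000 per million.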
Pyrosequencing Technology
Developed by 454 Life Sciences, which has since been acquired by Roche
Diagnostics.
Based on emulsion PCR technology and detection of pyrophosphate release
on nucleotide incorporation.
ssDNA template is hybridized to a sequencing primer and incubated with the
enzymes DNA polymerase, ATP sulfurylase, luciferase and apyrase, and with
the substrates adenosine 5´ phosphosulfate (APS) and luciferin.
The addition of one of the four deoxynucleotide triphosphates (dNTPs) initiates the
second step. DNA polymerase incorporates the correct, complementary dNTPs
onto the template. This incorporation releases pyrophosphate (PPi).
ATP sulfurylase quantitatively converts PPi to ATP in the presence of adenosine 5´
phosphosulfate. This ATP acts as fuel to the luciferase-mediated conversion of
luciferin to oxyluciferin that generates visible light in amounts that are proportional
to the amount of ATP.
Unincorporated nucleotides and ATP are degraded by the apyrase, and the
reaction can restart with another nucleotide.
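The cycle of nucleotide addition and light detection can be simulated as follows. This is an idealized sketch: it assumes the light signal is exactly proportional to the number of bases incorporated in a flow, and the dispensation order "TACG", the `max_flows` limit, and the function names are hypothetical choices.

```python
def pyrogram(strand_to_synthesize, dispensation_order="TACG", max_flows=100):
    """Return a list of (nucleotide, signal) pairs, one per flow.
    The signal is the number of bases incorporated during that flow,
    so a homopolymer run gives a proportionally higher peak."""
    flows = []
    pos = 0
    n = len(strand_to_synthesize)
    flow = 0
    while pos < n and flow < max_flows:
        nt = dispensation_order[flow % len(dispensation_order)]
        run = 0
        # The polymerase keeps incorporating while the next base matches.
        while pos < n and strand_to_synthesize[pos] == nt:
            run += 1
            pos += 1
        flows.append((nt, run))
        flow += 1
    return flows

def decode(flows):
    """Rebuild the synthesized sequence from the flow signals."""
    return "".join(nt * signal for nt, signal in flows)
```

Flows with a signal of zero correspond to nucleotides that were not incorporated and were degraded by apyrase before the next addition.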
[Figure: pyrosequencing workflow: emulsion PCR (ePCR), PCR amplification, sequential nucleotide addition, light reaction]
Sequencing by Synthesis technology (SBS)
Developed by Solexa, this sequencing technology is based on reversible dye-terminators and bridge PCR.
The combination of short inserts and longer reads increases the ability to fully
characterize any genome.
DNA molecules are first attached to primers on a slide and amplified so that
local clonal colonies are formed (bridge amplification). Four types of
reversible terminator bases (RT-bases) are added, and non-incorporated
nucleotides are washed away. Unlike pyrosequencing, the DNA can only be
extended one nucleotide at a time. A camera takes images of the
fluorescently labelled nucleotides, then the dye along with the terminal 3'
blocker is chemically removed from the DNA, allowing the next cycle.
Reversible dye terminators: the 3’-end has a protecting group that can be
converted back to a hydroxyl group once the nucleotide has been incorporated in the growing
DNA chain.
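Because exactly one base is added per cycle, a read can be assembled by calling one base per image. The sketch below assumes each cluster yields four fluorescence intensities per cycle, ordered A, C, G, T, and simply picks the brightest channel; the intensity values in the usage note and the function name are hypothetical, and real base callers also correct for effects such as channel cross-talk.

```python
CHANNELS = "ACGT"

def call_cluster(intensities):
    """Call one base per cycle for a single cluster.
    intensities: list of per-cycle 4-tuples ordered as A, C, G, T."""
    read = []
    for cycle in intensities:
        brightest = max(range(4), key=lambda i: cycle[i])
        read.append(CHANNELS[brightest])
    return "".join(read)
```

For instance, three cycles with intensities `(0.9, 0.1, 0.0, 0.1)`, `(0.1, 0.1, 0.8, 0.2)` and `(0.0, 0.1, 0.1, 0.7)` are called as `"AGT"`.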
Sequencing by ligation technology
Developed by Applied Biosystems (SOLiD).
Sequencing by ligation relies upon the sensitivity of DNA ligase to base-pairing
mismatches.
The target molecule to be sequenced is a single strand of unknown DNA
sequence, flanked on at least one end by a known sequence. A short "anchor"
strand is brought in to bind the known sequence.
A mixed pool of probe oligonucleotides is then brought in (8 or 9 bases long),
labeled (typically with fluorescent dyes) according to the position that will be
sequenced.
These molecules hybridize to the target DNA sequence, next to the anchor
sequence, and DNA ligase preferentially joins the molecule to the anchor when
its bases match the unknown DNA sequence. Based on the fluorescence
produced by the molecule, one can infer the identity of the nucleotide at this
position in the unknown sequence.
VisiGen Biotechnologies approach
VisiGen Biotechnologies introduced a specially engineered DNA polymerase
for use in their sequencing.
This polymerase acts as a sensor, having a donor fluorescent dye incorporated
near its active centre. This donor dye acts by FRET (fluorescence resonance energy
transfer), inducing fluorescence of differently labeled nucleotides.
This approach allows reads performed at the speed at which polymerase
incorporates nucleotides into the sequence (several hundred per second).
The nucleotide fluorochrome is released after the incorporation into the DNA
strand.
The expected read lengths in this approach should reach 1000 nucleotides;
however, this will have to be confirmed.
Nanopore sequencing technology
Developed by Oxford Nanopore Technologies.
This method is based on the readout of the electrical signal produced as
nucleotides pass through alpha-hemolysin pores covalently bound with
cyclodextrin.
The DNA passing through the nanopore changes the ion current through the pore. This change is
dependent on the shape, size and length of the DNA sequence. Each type of
nucleotide blocks the ion flow through the pore for a different period of time.
The method has potential for further development as it does not require modified
nucleotides; however, single-nucleotide resolution is not yet available.
Emulsion PCR
The single-stranded DNA fragments or templates are attached to the surface of
beads using adaptors or linkers, and one bead is attached to a single DNA
fragment from the DNA library.
The DNA library is generated through random fragmentation of the genomic DNA.
The surface of the beads contains oligonucleotide probes with sequences that are
complementary to the adaptors binding the DNA fragments.
After that, the beads are compartmentalized into separate water-in-oil emulsion
droplets.
In the water-in-oil emulsion, each droplet that captures one bead
serves as a PCR microreactor in which the amplification steps take place and produce
clonally amplified copies of the DNA fragment.
Bridge amplification on solid surface
High-density forward and reverse primers are covalently attached to the slide in a
flow cell. The ratio of the primers to the template on the support defines the surface
density of the amplified clusters.
The flowcell is exposed to reagents for polymerase-based extension, and priming
occurs as the free/distal end of a ligated fragment "bridges" to a complementary
oligo on the surface.
Repeated denaturation and extension results in localized amplification of DNA
fragments in millions of unique locations across the flow cell surface. Solid-phase
amplification can produce 100–200 million spatially separated template clusters
(Illumina/Solexa), providing free ends to which a universal sequencing primer can
be hybridized to initiate the NGS reaction.
Single-molecule templates
Some clonal amplification protocols are cumbersome to implement
and require a large amount of genomic DNA material (3–20 μg).
The preparation of single-molecule templates is more straightforward and
requires less starting material (<1 μg).
More importantly, these methods do not require PCR, which creates mutations
in clonally amplified templates that masquerade as sequence variants.
AT-rich and GC-rich target sequences may also show amplification bias in
product yield, which results in their under-representation in genome alignments
and assemblies.
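One simple way to look for regions at risk of this bias is to compute GC content along a sequence. The sketch below is a minimal example; the window and step sizes and the function names are arbitrary illustrative choices, and it does not model amplification itself.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)

def windowed_gc(seq, window=100, step=50):
    """GC fraction in sliding windows, as (start, fraction) pairs."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

Windows with a GC fraction far from 0.5 in either direction would be the ones to check for low coverage.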
Single molecule templates are usually immobilized on solid supports using one
of at least 3 different approaches:
1. Spatially distributed individual primer molecules are covalently attached to the
solid support. The template, which is prepared by randomly fragmenting the
starting material into small sizes (for example, ~200–250 bp) and adding
common adaptors to the fragment ends, is then hybridized to the immobilized
primer.
2. Spatially distributed single-molecule templates are covalently attached to the
solid support by priming and extending single-stranded, single-molecule templates
from immobilized primers. A common primer is then hybridized to the template. In
either approach, DNA polymerase can bind to the immobilized primed template
configuration to initiate the NGS reaction.
Both of the above approaches are used by Helicos BioSciences.
3. Spatially distributed single polymerase molecules are attached to the solid
support, to which a primed template molecule is bound. Larger DNA molecules
(up to 10,000 bp) can be used with this technique.
This approach is used by Pacific Biosciences.
GENE ANNOTATION
What is Annotation?
Extraction, definition, and interpretation of features on the genome sequence
derived by integrating computational tools and biological knowledge.
DNA Analysis
Find the genes
– Heuristic signals
– Inherent features
– Intelligent methods
Characterize each gene
– Compare with other genes
– Find functional components
– Predict features
Heuristic Signals
DNA contains various recognition sites for internal machinery like:
• Promoter signals
• Transcription start signals
• Start Codon
• Exon, Intron boundaries
• Transcription termination signals
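As a small example of using such signals, the sketch below scans for open reading frames that begin with the start codon ATG and end at the first in-frame stop codon. It is deliberately simplified: it searches the forward strand only, ignores introns and promoter signals, and the `min_codons` cut-off and function name are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for each ATG...stop stretch on the forward
    strand. 'end' is exclusive and includes the stop codon; min_codons counts
    codons before the stop."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first start codon in this frame
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ORF
    return orfs
```

For the sequence `"CCATGAAATGGTAACC"` this reports a single ORF spanning positions 2 to 14 in frame 2, i.e. `ATGAAATGGTAA`.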
Inherent Features
DNA exhibits certain biases that can be exploited to locate coding regions
• Uneven distribution of bases
• Codon bias
• CpG islands
• Encoded amino acid sequence
• Imperfect periodicity
• Other global patterns
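One of these features, CpG islands, is commonly scored with an observed/expected CpG ratio. The sketch below assumes the usual definition, (number of CG dinucleotides × sequence length) / (number of C × number of G); the function name is hypothetical, and any cut-off applied to the ratio would be a separate choice.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio for a sequence.
    Returns 0.0 when the sequence has no C or no G."""
    seq = seq.upper()
    c = seq.count("C")
    g = seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    cpg = seq.count("CG")   # CG cannot overlap itself, so count() is exact
    return cpg * len(seq) / (c * g)
```

A ratio well below 1 indicates CpG depletion, while ratios near or above 1 are typical of CpG-rich stretches.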
Intelligent Methods
Pattern recognition methods weigh inputs and predict gene location
– Content-based methods
– Site-based methods
– Comparative methods
• Neural Networks
• Hidden Markov Models
The term neural network was traditionally used to refer to a network or circuit of biological
neurons. The modern usage of the term often refers to artificial neural networks,
which are composed of artificial neurons or nodes.
A hidden Markov model (HMM) is a statistical Markov model in which the
system being modeled is assumed to be a Markov process with unobserved
(hidden) states. An HMM can be considered as the simplest dynamic Bayesian
network.
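To make the idea of hidden states concrete, the sketch below decodes the most probable state path with the Viterbi algorithm. The two-state model (N for non-coding, C for coding) and all of its probabilities are hypothetical values chosen for illustration; they are not taken from any real gene finder.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for the observed symbols."""
    if not obs:
        return []
    # scores[s] = best log-probability of any path ending in state s
    scores = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    backpointers = []
    for symbol in obs[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: scores[p] + math.log(trans_p[p][s]))
            new_scores[s] = (scores[prev] + math.log(trans_p[prev][s])
                             + math.log(emit_p[s][symbol]))
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state
    state = max(states, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical model: N = non-coding (AT-rich), C = coding (GC-rich)
STATES = ("N", "C")
START = {"N": 0.5, "C": 0.5}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {
    "N": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "C": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}
```

With these toy parameters, `viterbi("AAAAGGGG", STATES, START, TRANS, EMIT)` labels the first four bases N and the last four C.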
Looks at several structural features
– Splice donor/acceptor sites
– Putative coding regions
– Intronic regions
– Linear discriminant analysis to split exon / non-exon classes
– Dynamic programming to assemble best gene structure
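The dynamic programming step can be illustrated with a generic exon-chaining sketch: given scored candidate exons, choose the non-overlapping set with the highest total score. This is a textbook simplification and is not claimed to be the algorithm of any specific program; it ignores reading-frame and splice-site compatibility, and the function name and scores are hypothetical.

```python
from bisect import bisect_right

def best_exon_chain(exons):
    """exons: list of (start, end, score) with end exclusive.
    Returns (best_total_score, list_of_chosen_exons)."""
    exons = sorted(exons, key=lambda e: e[1])
    ends = [e[1] for e in exons]
    # best[i] = optimum over the first i exons (in order of end position)
    best = [(0.0, [])]
    for i, (start, end, score) in enumerate(exons):
        # j = number of earlier exons that end at or before this start
        j = bisect_right(ends, start, 0, i)
        take = (best[j][0] + score, best[j][1] + [(start, end, score)])
        skip = best[i]
        best.append(take if take[0] > skip[0] else skip)
    return best[-1]
```

For candidates `(0, 100, 5.0)`, `(50, 150, 6.0)`, `(120, 200, 4.0)` and `(160, 260, 2.0)`, the best chain combines the first and third exons for a total score of 9.0.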
Quadratic discriminant analysis
– Exon length
– Exon-intron transitions
– Splice sites
– Branch sites
– Exon, strand, frame scores
– Detects internal exons
Strategies
• Select by correlation coefficient
• Select by review paper
• Select by recommendation
• Use them all
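For the first strategy, a correlation coefficient can be computed by comparing predicted and annotated coding status at each nucleotide. The sketch below assumes the form commonly used for this purpose, CC = (TP·TN − FP·FN) / sqrt((TP+FN)(TN+FP)(TP+FP)(TN+FN)); the lecture does not state which definition it has in mind, and the function name is hypothetical.

```python
import math

def correlation_coefficient(predicted, actual):
    """predicted, actual: equal-length sequences of booleans
    (True = nucleotide is coding). Returns a value between -1 and 1,
    or NaN when the coefficient is undefined."""
    if len(predicted) != len(actual):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    if denom == 0:
        return float("nan")
    return (tp * tn - fp * fn) / denom
```

A perfect prediction scores 1.0, and a prediction that is wrong at every position scores -1.0.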
Internet Resources
Banbury Cross    http://igs-server.cnrs-mrs.fr/igs/banbury
FGENEH           http://genomic.sanger.ac.uk/gf/gf.shtml
GeneID           http://www1.imim.es/geneid.html
GeneMachine      http://genome.nhgri.nih.gov/genemachine
GENSCAN          http://genes.mit.edu/GENSCAN.html
Genotator        http://www.fruitfly.org/_nomi/genotator/
GRAIL            http://compbio.ornl.gov/tools/index.shtml
GRAIL-EXP        http://compbio.ornl.gov/grailexp
MZEF             http://www.cshl.org/genefinder
PROCRUSTES       http://www-hto.usc.edu/software/procrustes
RepeatMasker     http://ftp.genome.washington.edu/RM/RepeatMasker.html
HMMgene          http://www.cbs.dtu.dk/services/HMMgene
http://www.wiley.com/legacy/products/subject/life/bioinformatics/chapterlinks.html
Characterize a Gene
Collect clues for potential function
• Comparison with other known genes, proteins
• Predict secondary structure
• Fold classification
• Gene Expression
• Gene Regulatory Networks
• Phylogenetic comparisons
• Metabolic pathways