Section 3 - DNA Sequencing
Reading the blueprint of life
DNA sequencing
Introduction
• The blueprint of life is contained in DNA: in
the nuclei of eukaryotic cells, and free in the
cytoplasm of prokaryotic cells, which have no nucleus.
• Human Genome Project – aimed simply to obtain the list
of approximately 3 × 10^9 bases (As, Cs, Gs
and Ts) in the 23 chromosomes.
• Extraction of useful information from this list
and genome sequence of other organisms
relies on computer-intensive data handling –
Bioinformatics.
Sequencing
• The DNA from the genome is chopped into
bits- whole chromosomes are too large to
deal with, so the DNA is broken into
manageably-sized overlapping segments.
• The DNA is amplified by cloning into bacteria
(PCR, see later, doesn’t produce enough and
requires sequence information for the
primers).
• It is then denatured (i.e. melted), so that the
two strands split apart.
Sequencing- continued
• Denatured DNA is added to reaction mix with:
– a primer (to start complementary pairing),
– DNA polymerase
– nucleotides including special ones called
dideoxynucleotides. These special nucleotides
do not allow further nucleotides to be added to the
chain. So in a mix with dideoxy-A, every time a
dideoxy-A is added (small proportion of As), the
reaction ends. This results in different length
fragments. The dideoxynucleotides are
fluorescently tagged.
• Fragments can be separated out on a gel by
electrophoresis and their length calculated.
• Working out the DNA sequence from the fragments is like
solving a jigsaw puzzle.
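The chain-termination idea above can be sketched in a few lines. This is a minimal illustration, assuming an idealized reaction in which a terminated fragment exists for every position of the newly synthesized strand; the function names and the example sequence are made up for illustration.

```python
import random

def chain_termination_fragments(new_strand):
    """Idealized reaction: one terminated fragment for every position of the
    newly synthesized strand, labelled with its final (dideoxy) base."""
    return [(length, new_strand[length - 1])
            for length in range(1, len(new_strand) + 1)]

def read_gel(fragments):
    """Order fragments by length, as the gel does, and read off the
    fluorescent label of each terminal base."""
    return "".join(base for _length, base in sorted(fragments))

new_strand = "ATGCCCAAGCTG"          # made-up example sequence
fragments = chain_termination_fragments(new_strand)
random.shuffle(fragments)            # fragments are mixed together in the tube
sequence_read = read_gel(fragments)
print(sequence_read)
```

Sorting by length stands in for electrophoresis: the shortest fragment reports the first base, the next shortest the second base, and so on.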
DNA sequencing –
preparation
• In order to sequence a piece of DNA, it must
first be amplified. This is sometimes done by
a process called polymerase chain reaction
(PCR).
• PCR: The necessary ingredients for DNA
replication are 1) the DNA itself, 2) DNA
polymerase, 3) free nucleotides and 4)
primers - Place all these in a test tube.
DNA amplification
- PCR
Step 1 – heat to c. 95°C for 30s – this
denatures the DNA and unzips the two
strands
Step 2 – cool to c. 55°C for 20s, this causes the
primer to bind to the DNA
Step 3 – heat to c. 72°C for a minute per kb
(kilobase)– this allows the polymerase to
catalyse the addition of free nucleotides to
the primer, replicating the DNA.
• So in about two minutes a c. 1kb piece of DNA is
replicated. Each cycle doubles the number of copies, so
repeating for a few hours → more than a million copies.
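The doubling arithmetic behind PCR can be checked with a short calculation. This is a rough sketch, assuming perfect doubling in every cycle and using the step times quoted above for a c. 1kb fragment; real reactions are less than perfectly efficient and need time to change temperature between steps, so actual runs take longer.

```python
import math

def copies_after(cycles, start_copies=1):
    """Number of copies after a given number of ideal (perfect-doubling) cycles."""
    return start_copies * 2 ** cycles

def cycles_needed(target_copies, start_copies=1):
    """Smallest number of ideal cycles that reaches at least target_copies."""
    return math.ceil(math.log2(target_copies / start_copies))

# Step times from the slide: 30 s denature + 20 s anneal + 60 s extend (1 kb).
CYCLE_SECONDS = 30 + 20 + 60

n = cycles_needed(1_000_000)
minutes = n * CYCLE_SECONDS / 60
print(n, "cycles give", copies_after(n), "copies in about", round(minutes), "minutes")
```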
DNA Amplification
- cloning
• An alternative to PCR is to insert the
piece of DNA into the DNA of a
bacterium. Replicating the bacterium
thus replicates the DNA.
• Cf. recombinant DNA technology
Sequencing using gel
electrophoresis
• Here is a gel with 28 DNA samples: green
bands represent A, blue C, yellow G and red
T.
• Small molecules move faster.
Sequence assembly using
mapping
• Originally sequencing was performed by cutting the
chromosomes into large pieces which were cloned
into bacteria, creating a whole library of DNA
segments. The segments were cut open to look for
common sequence landmarks in overlapping
fragments. These were used to fingerprint the
fragments, so that it was known where in the
chromosome the fragment was- this is called
mapping. The fragments were cut into smaller pieces
and the process repeated and the small fragments
were sequenced. Finally the whole sequence is
known (in terms of short fragments and their
locations on the chromosome).
Shotgun sequencing
• Shotgun sequencing dispenses with the need for
mapping and so is much faster. It involves chopping
the DNA into fragments of size c. 2000 base pairs
(bps) and 10000 bps, sequencing the first and last
500 bps of each fragment. It then uses computer
algorithms to assemble the entire sequence from the
sequenced fragments.
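The assembly step can be illustrated with a toy greedy algorithm that repeatedly merges the two reads with the longest suffix-prefix overlap. This is only a sketch under simplifying assumptions (error-free reads, one strand, no repeats); it is not the algorithm used by any particular assembler, and the function names and short example reads are made up.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b,
    or 0 if that length is below min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair of reads until no overlaps remain."""
    reads = list(dict.fromkeys(reads))       # drop exact duplicates
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break                            # leftovers stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

genome = "atgcccaagctgaatagcgtagaggg"         # made-up 26-base "genome"
reads = [genome[12:22], genome[0:10], genome[18:26], genome[6:16]]
contigs = greedy_assemble(reads)
print(contigs)
```

With repetitive DNA, several different pairs of reads can overlap equally well, which is one reason real assemblies are much harder than this toy case.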
Speed and accuracy of
sequencing
• Shotgun sequencing is much faster: it took a matter
of months to obtain a draft sequence of the fruit fly,
Drosophila melanogaster (135 Mbp), whereas the
state-funded conventional sequencing effort had
taken several years to achieve a similar level of
completion.
• BUT assembly of the pieces, in e.g. the human genome (3 × 10^9
bp), requires very powerful computers
• AND repetitive DNA, which is common in eukaryotic
genomes, causes great difficulties in the assembly
process – may get it wrong.
Acquisition of sequence data
• Genomes must be sequenced several times over on
average, both to ensure complete coverage of the
genome is achieved, and because sequencing data
is somewhat error-prone.
• Increases in the efficiency of sequencing have led to
a year on year increase in the rate of new sequence
data acquisition:
[Figure: growth in the rate of sequence data acquisition. Source: http://www2.ebi.ac.uk/genomes/mot/index.html]
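Why a genome has to be sequenced "several times over" can be made concrete. Average coverage is (number of reads × read length) / genome size. The sketch below additionally assumes that reads land uniformly at random, in which case the expected fraction of bases never covered is about e^(-coverage); this model is a standard idealization (often attributed to Lander and Waterman) and is not taken from the slides. The read numbers are illustrative.

```python
import math

def coverage(num_reads, read_length, genome_size):
    """Average number of times each base is sequenced."""
    return num_reads * read_length / genome_size

def expected_uncovered_fraction(c):
    """Expected fraction of bases missed at average coverage c,
    assuming reads start at uniformly random positions."""
    return math.exp(-c)

GENOME_SIZE = 3_000_000_000      # roughly the human genome, in bases
READ_LENGTH = 500                # bases read from each fragment end

for fold in (1, 3, 5, 8, 10):
    num_reads = fold * GENOME_SIZE // READ_LENGTH
    c = coverage(num_reads, READ_LENGTH, GENOME_SIZE)
    missed = expected_uncovered_fraction(c) * GENOME_SIZE
    print(f"{c:4.0f}x coverage: about {missed:,.0f} bases expected to be missed")
```

Under this model, single coverage leaves more than a third of the genome unsequenced, which is why several-fold coverage is needed even before sequencing errors are considered.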
Statistics of genome
sequences
Statistics can be global or local:
• Base composition of genomes:
• Bacterium (E. coli): approx. 25% A, 25% C, 25% G,
25% T
• Malaria parasite (P. falciparum): 82% A+T
• Human: 59% A+T
• Translation initiation:
• ATG is the near universal motif (codon)
indicating the start of translation in DNA
coding sequence.
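Base composition statistics like those above are simple to compute. The sketch below gives a global statistic (A+T fraction of a whole sequence) and a local one (A+T fraction in successive windows); the function names, the window size and the short test sequence are illustrative choices.

```python
from collections import Counter

def base_composition(seq):
    """Fraction of each base (A, C, G, T) in seq, ignoring other characters."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: counts[b] / total for b in "ACGT"}

def at_content(seq):
    """Global statistic: combined fraction of A and T."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"]

def windowed_at_content(seq, window, step):
    """Local statistic: A+T fraction in successive windows along seq."""
    return [at_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]

seq = "atgcccaagctgaatagcgtagaggggttttcatcatttgaggacgatgtataa"
print(base_composition(seq))
print(round(at_content(seq), 3))
print(windowed_at_content(seq, window=18, step=18))
```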
Databases of sequence
information
• Internet has become a vital resource in
making sequence data generally available to
the biological community at large.
• Examples:
GenBank (www.ncbi.nlm.nih.gov/Genbank),
EMBL (www.ebi.ac.uk/embl),
DDBJ (www.ddbj.nig.ac.jp).
• Used for: gene prediction, protein structure/
function prediction, homology searching
Extracting important
information
• The most important parts of the genome are the
genes.
• Efforts have been made to identify genes out of
sequence data.
• Expressed sequence tags (ESTs) are short pieces
of sequence data that correspond to mRNAs found in
cells of the organism.
• ESTs are produced by purifying mRNA from cells and
then using an enzyme called reverse transcriptase
to convert these to copy DNA (cDNA). The DNA is
then cloned in bacteria and sequenced.
• The sequence obtained is usually only short (c. 700
base pairs) and may not be very accurate, but ESTs
still provide very useful information.
Gene prediction
• A weakness of ESTs is that they are very difficult
to obtain for genes which are expressed
at a low level or only under certain conditions.
Producing them is also slow.
• So people try to predict where in the sequence the
genes are.
• In prokaryotes, just look for long stretches of
DNA without a stop codon in any of the 6
reading frames.
Open reading frames
• There are 6 reading frames, 3 forwards:
5' atgcccaagctgaatagcgtagaggggttttcatcatttgaggacgatgtataa 3'
1 atg ccc aag ctg aat agc gta gag ggg ttt tca tca ttt gag gac gat gta taa
   M   P   K   L   N   S   V   E   G   F   S   S   F   E   D   D   V   *
2 tgc cca agc tga ata gcg tag agg ggt ttt cat cat ttg agg acg atg tat
   C   P   S   *   I   A   *   R   G   F   H   H   L   R   T   M   Y
3 gcc caa gct gaa tag cgt aga ggg gtt ttc atc att tga gga cga tgt ata
   A   Q   A   E   *   R   R   G   V   F   I   I   *   G   R   C   I
• And 3 backwards (on the other strand). A frame is
said to be open if it contains long stretches without a
stop codon.
• [Lower lines are single-letter amino acid codes,
*=stop.]
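The six-frame translation described above can be reproduced with a short script. This is a minimal sketch using the standard genetic code; the function names are illustrative, and the "longest open stretch" measure is a simplification of real ORF finders, which also consider start codons and minimum lengths.

```python
BASES = "tcag"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, built in the conventional t, c, a, g codon order.
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES),
                       AMINO))

def reverse_complement(seq):
    """Sequence of the other strand, read 5' to 3'."""
    return seq.translate(str.maketrans("acgt", "tgca"))[::-1]

def translate(seq, offset=0):
    """Translate complete codons of seq starting at offset; '*' marks a stop."""
    return "".join(CODON_TABLE[seq[i:i + 3]]
                   for i in range(offset, len(seq) - 2, 3))

def six_frames(seq):
    """Translations of the 3 forward and 3 reverse reading frames."""
    seq = seq.lower()
    rev = reverse_complement(seq)
    frames = {f"+{k + 1}": translate(seq, k) for k in range(3)}
    frames.update({f"-{k + 1}": translate(rev, k) for k in range(3)})
    return frames

def longest_open_stretch(protein):
    """Length, in codons, of the longest run without a stop codon."""
    return max(len(part) for part in protein.split("*"))

seq = "atgcccaagctgaatagcgtagaggggttttcatcatttgaggacgatgtataa"
frames = six_frames(seq)
for name, protein in frames.items():
    print(name, protein, longest_open_stretch(protein))
```

The three forward frames reproduce the amino acid lines shown on the slide; frame 1 is the only forward frame that runs from the first codon to the final stop without interruption.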
Gene prediction in eukaryotes
• In bacteria, open reading frames (ORFs)
are pretty much enough to indicate genes,
but in eukaryotes finding genes is more
complicated, because:
1. Eukaryotic DNA is roughly 97-98%
noncoding; in such a large amount, ORFs
may exist by chance.
2. Eukaryotic DNA contains introns, so finding
the start and the end of a gene is not
enough; one also has to find which bits
(introns) to edit out of the sequence. Introns
also break up open reading frames.
Introns - reminder
• Mentioned in “Introduction to Molecular
Biology”
• These are pieces of DNA within genes, which
are transcribed but then spliced out of the
RNA before it is translated.
• They make it much harder to find genes,
since finding open reading frames is not
enough, you also need to find where introns
and exons start and end.
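How an intron breaks up an open reading frame can be shown with a made-up example. In this sketch the exon coordinates are simply given, whereas finding them is the hard part in practice; the gene, the intron sequence and the function names are all invented for illustration.

```python
BASES = "tcag"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, built in the conventional t, c, a, g codon order.
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES),
                       AMINO))

def translate(seq):
    """Translate complete codons from the first base; '*' marks a stop."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

def splice(genomic, exons):
    """Join the exons, given as (start, end) pairs, dropping the introns."""
    return "".join(genomic[start:end] for start, end in exons)

# Invented gene: exon 1 + intron + exon 2 (coordinates assumed known).
exon1, intron, exon2 = "atgcccaag", "gtaagttgacag", "ctgaattaa"
genomic = exon1 + intron + exon2
exons = [(0, 9), (21, 30)]

spliced = splice(genomic, exons)
print(translate(genomic))   # reading straight through the intron
print(translate(spliced))   # reading the spliced message
```

Translating the genomic sequence directly runs into a stop codon inside the intron, so the open reading frame looks much shorter than the real coding sequence obtained after splicing.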
Conclusions
• Sequencing DNA involves:
– Amplifying it by PCR or cloning
– Chopping it up into manageable bits
– Replicating it with fluorescently-tagged
dideoxynucleotides
– Running the different length fragments on a gel
and reading this
– Assembling the pieces (sequences of manageable
bits).
• Shotgun sequencing is faster than mapping-based
assembly methods, but can have
accuracy problems.
Conclusions
• Sequence data is stored in online databases
• Extracting useful information and patterns
from such data is part of bioinformatics and
often employs intelligent systems techniques.
Next block of lectures
• History of genomics
• Introduction to bioinformatics
• More on gene prediction