Lecture of Principles of gene engineering 2008.
4/28
An overview of genomics:
From the basis of molecular cloning to the genome sequencing
projects (Human Genome Project).
Dr. Jin-Mei Lai
1
So far in this course you have learned how to clone and
identify the gene of your interest.
[Figure: cloning a PCR product. cDNA → PCR with Taq DNA polymerase (leaves single A overhangs) → ligation into vector → selection on Amp+ plate (+ X-gal & IPTG; ampR selection). Inserted DNA disrupts the lacZ’ gene. Blue colony: non-recombinant. White colony: recombinant.]
2
How can I get the sequence of a specific gene?
→ Design specific primers (search for information in a gene database)
→ PCR (using cDNA or a cDNA library as template)
→ Cloning into an expression vector
Nick translation
* Make a cDNA library
3
Generate cDNA from tissues or cell lines.
→ isolation of total RNA
→ purification of the mRNA on an oligo-dT column
→ reverse transcription
→ PCR amplification (RT-PCR)
4
Why should we use cDNA?
5
What is a gene?
- An open reading frame + its transcriptional control elements (promoter and terminator)
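The "open reading frame" part of this definition can be illustrated with a minimal Python sketch. It assumes a simplified model that is not from the lecture: only the given strand is scanned, an ORF starts at ATG and ends at the first in-frame stop codon, and the function name `find_orfs` and the example sequence are hypothetical.

```python
# Minimal ORF scan on one DNA strand (illustrative only; real gene finding
# also considers the reverse strand, introns and control elements).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for each ATG...stop ORF in the three forward frames."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence codon by codon in this reading frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None
    return orfs


example = "CCATGGCTTGATAAATGAAACCCGGGTAG"  # made-up sequence
for start, end, seq in find_orfs(example):
    print(start, end, seq)
```

The promoter and terminator are not modelled here; they would have to be located separately, upstream and downstream of each ORF.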
6
Fig. 1.22
Differences in gene expression between prokaryotes and eukaryotes.
7
How can we study the unknown genes?
A full understanding of the genome under study makes it easier to identify and investigate unknown genes.
* Genome sequencing projects!
1. Genomic mapping.
2. Genetic mapping.
3. Physical mapping.
4. Nucleotide or genome sequencing.
8
Genomic mapping
The chromosome content of an organism (its karyotype) can be visualized using a microscope.
~ Different chromosomes are usually different sizes (ranging in humans from 279 × 10^6 bp for chromosome 1 to 45 × 10^6 bp for chromosome 21).
~ Distinct chromosome banding patterns (Giemsa stain).
[Figure labels: shorter arm, longer arm]
Cytological map (low resolution)
9
Some chromosome abnormalities that cause inherited
genetic diseases can be observed by karyotype analysis.
e.g. Down’s syndrome (trisomy 21)
Klinefelter’s syndrome (47,XXY)
* Cystic fibrosis → chromosome 7q31; CFTR
10
* Fluorescence in situ hybridization (FISH)
~ a kind of in situ hybridization
in situ: in place
DNA probes: formerly radioactively labeled, now fluorescently labeled
Low resolution:
less than 3 Mbp
Yellow: satellite DNA in centromere
11
Genetic mapping
~ A genetic map is a representation of the distance between two DNA elements based upon the frequency at which recombination occurs between them.
* The first genetic map of a chromosome:
~ from Drosophila mating crosses data
The information gained from the experimental crosses could be used to plot out the location of genes.
→ Tightly linked genes are physically located close to each other, while those that are only weakly linked are physically further apart.
12
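A centimorgan (cM) is the unit of distance between two loci on a genetic map: 1 cM corresponds to a recombination frequency of 1% between the two loci. A cM is a measure of genetic distance and not physical distance. Look closely at the diagram. If two loci are far apart, a double cross-over between them can go undetected, so the observed recombination frequency underestimates the true distance.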
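As a worked illustration, the sketch below converts cross data into map distance. The naive conversion (1% recombinants = 1 cM) follows the definition above. The Haldane mapping function is an addition not named in the lecture; it is one standard correction for undetected double cross-overs and assumes cross-overs occur independently (no interference). The function names and counts are hypothetical.

```python
import math


def recombination_frequency(recombinants, total_offspring):
    """Fraction of offspring that carry a recombinant combination of alleles."""
    return recombinants / total_offspring


def naive_distance_cM(r):
    """Direct conversion: 1% recombination corresponds to 1 cM."""
    return 100.0 * r


def haldane_distance_cM(r):
    """Haldane mapping function (assumes no interference); valid for 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination frequency must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


# Hypothetical cross: 100 recombinant offspring out of 1000 scored.
r = recombination_frequency(100, 1000)
print(naive_distance_cM(r))    # about 10 cM
print(haldane_distance_cM(r))  # slightly larger, about 11.2 cM
```

The corrected distance is always at least as large as the naive one, and the gap widens as the loci get further apart, which is the effect of missed double cross-overs.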
13
* Major drawbacks of genetic mapping
~ A phenotype is required for the gene being mapped, and many crosses are needed to generate accurate mapping data.
~ A tacit assumption of mapping based on crosses is that the recombination frequency is equal for all parts of the chromosome. This does not hold at recombinational hot-spots and cold-spots.
~ In humans, a relatively low number of genes have been identified, so map distances are difficult to estimate.
14
* An alternative to genetic mapping using phenotypes is to follow the inheritance of DNA sequence variations between individuals.
~ More than 99% of human DNA sequence is the same across the population, yet there is still a huge number of sequence variations between individuals.
~ Several methods exploit the inheritance of these variations to map genomic locations.
Ex.
1. Single-nucleotide polymorphisms.
2. Variable number tandem repeats (VNTRs).
3. Microsatellites.
15
1. Single-nucleotide polymorphisms (SNPs).
~ the most common type of sequence variation between individuals.
~ occur as frequently as about once every 100-300 bp
What kinds of genome variations are there?
Genome variations include mutations and polymorphisms.
Technically, a polymorphism (a term that comes from the Greek
words "poly," or "many," and "morphe," or "form") is a DNA
variation in which each possible sequence is present in at least 1%
of people. For example, a place in the genome where 93 percent of
people have a T and the remaining 7 percent have an A is a
polymorphism. If one of the possible sequences is present in less
than 1 percent of people (99.9 percent of people have a G and 0.1
percent have a C), then the variation is called a mutation.
Informally, the term mutation is often used to refer to a harmful
genome variation that is associated with a specific human disease,
while the word polymorphism implies a variation that is neither
harmful nor beneficial. However, scientists are now learning that
many polymorphisms actually do affect a person's characteristics,
though in more complex and sometimes unexpected ways.
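The 1% rule described above can be written as a small check. This is only a sketch of the technical definition given in the text; the function name `classify_variation` and the dictionary layout are assumptions made for illustration.

```python
def classify_variation(allele_frequencies, threshold=0.01):
    """Apply the 1% rule: 'polymorphism' if every allele reaches the threshold, else 'mutation'."""
    frequencies = list(allele_frequencies.values())
    if abs(sum(frequencies) - 1.0) > 1e-6:
        raise ValueError("allele frequencies should sum to 1")
    return "polymorphism" if min(frequencies) >= threshold else "mutation"


# The two examples from the text:
print(classify_variation({"T": 0.93, "A": 0.07}))    # polymorphism
print(classify_variation({"G": 0.999, "C": 0.001}))  # mutation
```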
17
About 90 percent of human genome variation comes in the
form of single nucleotide polymorphisms, or SNPs
(pronounced "snips"). As their name implies, these are
variations that involve just one nucleotide, or base.
~ frequency: once every 100-300 bp
~ some may be “disease causing mutations”
~ may occur in non-coding regions of DNA
~ some alter restriction enzyme recognition sites, giving restriction fragment length polymorphisms (RFLPs), which are detected by Southern blotting using a radioactive DNA probe
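How a single-base change produces an RFLP can be sketched with an in-silico digest. The example assumes EcoRI (recognition site GAATTC, cut after the G) as the enzyme; the lecture does not name one here, and the two allele sequences are made up.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Return fragment lengths after cutting linear DNA at every occurrence of `site`."""
    dna = dna.upper()
    cuts = []
    pos = dna.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)  # the enzyme cuts inside its recognition site
        pos = dna.find(site, pos + 1)
    bounds = [0] + cuts + [len(dna)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Two hypothetical alleles that differ by one base in the second EcoRI site.
allele_1 = "A" * 10 + "GAATTC" + "C" * 10 + "GAATTC" + "T" * 10
allele_2 = "A" * 10 + "GAATTC" + "C" * 10 + "GAGTTC" + "T" * 10  # SNP destroys the site

print(digest(allele_1))  # [11, 16, 15]
print(digest(allele_2))  # [11, 31]
```

On a Southern blot, a probe that hybridizes across the lost site would light up different band sizes for the two alleles.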
18
* Restriction fragment length polymorphisms (RFLPs)
Southern blotting
19
20
RFLP (Restriction Fragment Length Polymorphisms)
21
Highly repeated DNA sequences.
--- short, arranged in tandem.
1. Satellite DNAs
~ consist of short sequences that form very large clusters.
ex. satellite DNA in centromere
2. Minisatellite DNAs
~ range from 12 to 100 base pairs in length and are found in
clusters containing as many as 3000 repeats.
* unstable, the copy number often changes from one
generation to the next.
(polymorphic → applied to DNA fingerprinting)
3. Microsatellite DNAs
~ the shortest repeats, present in small clusters of about 10-40 bp in length
22
2. VNTR stands for "variable number of tandem repeats"
A tandem repeat is a short sequence of DNA that is repeated in a
head-to-tail fashion at a specific chromosomal locus. Tandem
repeats are interspersed throughout the human genome.
Some sequences are found at only one site (a single locus) in the human genome. For many tandem repeats, the number of repeated units varies between individuals. Such loci are termed VNTRs.
23
Think …
One VNTR in humans is a 17 bp sequence of DNA repeated
between 70 and 450 times in the genome. The total number of
base pairs at this locus could vary from 1190 to 7650.
VNTRs are detected as RFLPs by Southern Hybridization.
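The arithmetic behind the "Think" example can be checked with a short sketch. The helper names are hypothetical, and the tandem-copy counter is a simple illustration on a made-up sequence, not a genotyping method.

```python
def vntr_locus_length(repeat_unit_bp, copies):
    """Total length in bp of a tandem array of `copies` repeat units."""
    return repeat_unit_bp * copies


def count_tandem_copies(sequence, unit):
    """Length (in copies) of the longest uninterrupted run of `unit` in `sequence`."""
    best = 0
    for i in range(len(sequence)):
        run, j = 0, i
        while sequence.startswith(unit, j):
            run += 1
            j += len(unit)
        best = max(best, run)
    return best


print(vntr_locus_length(17, 70))   # 1190
print(vntr_locus_length(17, 450))  # 7650
print(count_tandem_copies("TT" + "GATA" * 5 + "CC", "GATA"))  # 5
```

Because the fragment length scales with the copy number, two individuals with different copy numbers give different band sizes after digestion with an enzyme that cuts outside the repeat array.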
24
Minisatellite sequences are used to identify individuals in criminal
or paternity cases through the technique of DNA fingerprinting.
criminal case:
V: victim
D: defendant
25
3. Microsatellites.
~ are short (2-6 bp), tandemly repeated sequences found at sites distributed throughout the genome.
~ generated by polymerase “slippage” during replication.
The most common
type is 5’-AC-3’
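A minimal sketch of how microsatellites could be located in a sequence with a regular expression. The threshold of at least four tandem copies and the function name are assumptions chosen for illustration; real repeat finders use more careful criteria.

```python
import re

# A 2-6 bp unit followed by at least three more copies of itself (four or more in total).
MICROSATELLITE = re.compile(r"([ACGT]{2,6}?)\1{3,}")


def find_microsatellites(dna):
    """Return (start, unit, copies) for each tandem repeat of a 2-6 bp unit."""
    hits = []
    for match in MICROSATELLITE.finditer(dna.upper()):
        unit = match.group(1)
        if len(set(unit)) > 1:  # skip homopolymer runs such as AAAAAAAA
            hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


example = "GGCT" + "AC" * 6 + "TTGCA"  # made-up sequence with an (AC)6 repeat
print(find_microsatellites(example))   # [(4, 'AC', 6)]
```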
26
Physical mapping
~ the physical map of a genome is a map of genetic markers made
by analyzing a genomic DNA sequence directly, rather than
analysing recombination events.
1. Restriction maps
2. Radiation hybrid maps
3. STS maps
Ex 1. NotI recognition sequence (5’-GCGGCCGC-3’)
By chance, a NotI site would be expected to occur once every 4^8 = 65,536 bp.
However, NotI cleaves human DNA on average only once every 10 Mbp.
Why?
The DNA sequence within the genome is not random!
Restriction mapping does provide highly reliable fragment ordering and distance estimation.
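The expected cutting frequency is a one-line calculation. The sketch assumes what the "by chance" estimate assumes: all four bases are equally frequent and positions are independent. The function name is hypothetical, and the 10 Mbp figure is the one quoted in the lecture.

```python
def expected_site_spacing(site, alphabet_size=4):
    """Expected distance in bp between occurrences of `site` in random DNA."""
    return alphabet_size ** len(site)


expected = expected_site_spacing("GCGGCCGC")  # NotI, an 8-bp recognition site
observed = 10_000_000                         # ~10 Mbp, as quoted above

print(expected)                    # 65536
print(round(observed / expected))  # 153: NotI cuts about 150 times less often than expected
```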
27
Radiation hybrids
Whole-genome radiation hybrids
RH maps are constructed by typing a panel of hybrids with a set of human DNA markers.
Only a PROPORTION of the pieces of the broken human chromosomes will integrate into rodent chromosomes.
Ex 3. STS maps.
~ STSs (sequence tagged sites) are short DNA sequences (100-200 base pairs) that are generated by PCR using primers based on already known DNA sequences.
~ Each STS has been sequenced and assigned to a chromosomal location, so it defines a unique site in the genome.
* Aligning clones by STS mapping.
STS:
→ To order inserts from individual human chromosomes in a YAC library.
29
* Resolution ranges encountered in genome mapping.
30
The different types of map of a chromosome: cytological, genetic and physical.
cM: centiMorgan
Mbp: Megabase pairs
31
The sequencing projects are then used to determine the
individual base sequence of each clone.
Manual DNA sequencing
DNA sequencing methodologies: ca. 1977!

Maxam-Gilbert
* base modification by general and specific chemicals.
* depurination or depyrimidination.
* single-strand excision.
* not amenable to automation

Sanger
* DNA replication.
* substitution of substrate with chain-terminator chemical.
* more efficient
* automation??
32
DNA sequencing: Maxam & Gilbert sequencing
~ The method is reliable for
sequencing up to ~100
nucleotides at a time. The
technique requires that the
target DNA is end-labeled
(usually radioactively).
33
Either 4 or 5 separate chemical reactions are performed. The reactions are
carried out in two stages:
Stage 1: Specific chemical modification of bases in the DNA.
Stage 2: Chemical cleavage of sugar-phosphate backbone at modification site.
34
5’→3’ direction
DNA sequencing: Sanger’s method (“bio” based methods)
~ dideoxynucleotide
~ based upon the faithful replication of DNA using a DNA polymerase
35
Sanger method
- Can lead to clean and
unambiguous assignment
of about 300 bases per
reaction.
* 7 M urea gel
* High power level → 70 °C
Reduce secondary structure of DNA fragments.
36
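The logic of reading a Sanger gel can be sketched as follows. This is an idealized model, not the laboratory procedure: it assumes every possible chain-terminated product is present and cleanly resolved by length, and the function names and example sequence are hypothetical.

```python
def chain_terminated_fragments(new_strand):
    """For each ddNTP reaction, list the fragment lengths that end in that base."""
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand.upper(), start=1):
        lanes[base].append(length)  # a ddNTP of this base stops synthesis at this length
    return lanes


def read_gel(lanes):
    """Read the sequence 5'->3' by visiting fragments from shortest to longest."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[n] for n in sorted(base_at_length))


lanes = chain_terminated_fragments("GATTACA")  # made-up newly synthesized strand
print(lanes)            # {'A': [2, 5, 7], 'C': [6], 'G': [1], 'T': [3, 4]}
print(read_gel(lanes))  # GATTACA
```

Each lane corresponds to one of the four dideoxynucleotide reactions; the shortest fragment is closest to the primer, which is why the gel is read from the bottom up.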
Automated DNA sequencing
~ a set of dideoxynucleotides has been developed, each labelled with its own fluorescent dye.
BigDye™ terminator
37
Sophisticated base calling software is available to convert the fluorescent patterns obtained into a sequence of DNA bases.
→ faster and more reliable sequence interpretation.
~ as many as 1000 bases can be read automatically from a single reaction, although the sequence obtained from within 500 bp of the primer is generally more reliable.
38
ABI 377 envelope: 96 lanes
Capillary electrophoresis
39
How to sequence the genome?
How to reconstruct the original genome sequence
based on the small fragments that are cloned
into individual vectors?
Strategies
* Clone contigs
* Whole genome shotgun
* Hierarchical shotgun
40
Clone contigs
~ the simplest way to generate overlapping DNA sequence is to isolate and sequence one clone from a library, then identify (by hybridization) a second clone whose insert overlaps with the first. The second clone is then sequenced and the information used to identify a third clone, whose insert overlaps with the second clone, and so on.
41
* Contig: (the basis of chromosome walking)
~ contiguous sequence of DNA created by assembling
overlapping sequenced fragments of a chromosome.
42
Chromosome walking
~ This method is used to move systematically along a chromosome
from a known location and to clone overlapping genomic clones that
represent progressively longer parts of a particular chromosome.
43
Whole genome shotgun (WGS)
- was first used to sequence the genome of the bacterium
Haemophilus influenzae.
~ the fragments of the genome, which have been randomly generated, are cloned into a vector and each insert is sequenced.
→ the sequences are then examined for overlaps and the genome is reconstructed by assembling the overlapping sequences together.
44
Identifying additional clones that contained sequences close
to the gap-point.
* Advantage:
~ no prior knowledge of the sequence of the genome is required.
* Disadvantage:
~ may be limited by the ability to identify overlapping sequences.
~ Time-consuming (every sequence must be compared with every other sequence in order to identify the overlaps)
~ Repetitive DNA sequences in the genome may lead to the
incorrect assignment of contigs.
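The all-against-all comparison described above can be sketched as a toy greedy assembler. This is a simplified illustration, not the algorithm used by any real sequencing project: it assumes error-free reads from one strand, merges the pair with the longest exact suffix-prefix overlap at each step, and uses hypothetical function names and made-up reads.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the best-overlapping pair; returns the remaining contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        # Every sequence is compared with every other sequence.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining contigs are separated by gaps
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]  # made-up reads
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```

The nested loop shows why the comparison step is time-consuming, and a repeated sequence would create equally good but wrong overlaps, which is the repeat problem listed above.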
45
Hierarchical shotgun
-- preferred by the Human Genome Project
In this approach, genomic DNA is cut into pieces of about 150 kb and inserted into BAC vectors, which are transformed into E. coli where they are replicated and stored.
Each BAC insert is fragmented randomly into smaller pieces and each piece is cloned into a plasmid and sequenced on both strands.
These sequences are aligned so that identical sequences overlap.
46
Two general strategies for sequencing a complete genome.
47
What was the Human Genome Project?
The Human Genome Project (HGP) was the international,
collaborative research program whose goal was the complete mapping
and understanding of all the genes of human beings. All our genes
together are known as our "genome."
Goals of HGP:
1. Determine the DNA sequence of the entire human genome
2. Store this information in databases
3. Identify all of the genes in human DNA
4. Improve tools for data analysis
48
Brief review of genomics with regard to the HGP
Human Genome Project (HGP)
~ initiated in the late 1980s (formally launched in 1990) by 20 centers in six nations (coordinated by NIH/USA), led first by Watson and after 1992 by Collins.
~ The completed sequence of the human genome (3 × 10^9 bp) was published in April 2003 (an effort spanning 13 years).
~ The entry of Celera Co. (founded in 1998 by Venter) accelerated the process (two years ahead of schedule).
James D. Watson
49
The Beginning of the Project
Most of the first 10 years of the project were spent
improving the technology to sequence and analyze
DNA.
Scientists all around the world worked to make
detailed maps of our chromosomes and sequence
model organisms, like worm, fruit fly, and mouse.
50
The Human Genome Project Began in 1990
The Mission of the HGP: The quest to understand the
human genome and the role it plays in both health and
disease.
“The true payoff from the HGP
will be the ability to better
diagnose, treat, and prevent
disease.”
--- Francis Collins, Director of the HGP
and the National Human Genome
Research Institute (NHGRI)
51
The genome is our Genetic Blueprint
Nearly every human cell
contains 23 pairs of
chromosomes
1 - 22 and XY or XX
XY = Male
XX = Female
Length of chr 1-22, X, Y
together is ~3.2 billion
bases (about 2 meters
diploid)
52
5000 bases per page
CACACTTGCATGTGAGAGCTTCTAATATCTAAATTAATGTTGAATCATTATTCAGAAACAGAGAGCTAACTGTTATCCCATCCTGACTTTATTCTTTATG AGAAAAATACAGTGATTCC
AAGTTACCAAGTTAGTGCTGCTTGCTTTATAAATGAAGTAATATTTTAAAAGTTGTGCATAAGTTAAAATTCAGAAATAAAACTTCATCCTAAAACTCTGTGTGTTGCTTTAAATAATC
AGAGCATCTGC TACTTAATTTTTTGTGTGTGGGTGCACAATAGATGTTTAATGAGATCCTGTCATCTGTCTGCTTTTTTATTGTAAAACAGGAGGGGTTTTAATACTGGAGGAACAA
CTGATGTACCTCTGAAAAGAGA AGAGATTAGTTATTAATTGAATTGAGGGTTGTCTTGTCTTAGTAGCTTTTATTCTCTAGGTACTATTTGATTATGATTGTGAAAATAGAATTTATCC
CTCATTAAATGTAAAATCAACAGGAGAATAGCAAAAACTTATGAGATAGATGAACGTTGTGTGAGTGGCATGGTTTAATTTGTTTGGAAGAAGCACTTGCCCCAGAAGATACACAAT
GAAATTCATGTTATTGAGTAGAGTAGTAATACAGTGTGTTCCCTTGTGAAGTTCATAACCAAGAATTTTAGTAGTGGATAGGTAGGCTGAATAACTGACTTCCTATC ATTTTCAGGTT
CTGCGTTTGATTTTTTTTACATATTAATTTCTTTGATCCACATTAAGCTCAGTTATGTATTTCCATTTTATAAATGAAAAAAAATAGGCACTTGCAAATGTCAGATCACTTGCCTGTGGT
CATTCGGGTAGAGATTTGTGGAGCTAAGTTGGTCTTAATCAAATGTCAAGCTTTTTTTTTTCTTATAAAATATAGGTTTTAATATGAGTTTTAAAATAAAATTAATTAGAAAAAGGCAA
ATTACTCAATATATATAAGGTATTGCATTTGTAATAGGTAGGTATTTCATTTTCTAGTTATGGTGGGATATTATTCAGACTATAATTCCCAATGAAAAAACTTTAAAAAATGCTAGTGA
TTGCACACTTAAAACACCTTTTAAAAAGCATTGAGAGCTTATAAAATTTTAATGAGTGATAAAACCAAATTTGAAGAGAAAAGAAGAACCCAGAGAGGTAAGGATATAACCTTACC
AGTTGCAATTTGCCGATCTCTACAAATATTAATATTTATTTTGACAGTTTCAGGGTGAATGAGAAAGAAACCAAAACCCAAGACTAGCATATGTTGTCTTCTTAAGGAGCCCTCCCCT
AAAAGATTGAGATGACCAAATCTTATACTCTCAGCATAAGGTGAACCAGACAGACCTAAAGCAGTGGTAGCTTGGATCCACTACTTGGGTTTGTGTGTGGCGTGACTCAGGTAATCT
CAAGAATTGAACATTTTTTTAAGGTGGTCCTACTCATACACTGCCCAGGTATTAGGGAGAAGCAAATCTGAATGCTTTATAAAAATACCCTAAAGCTAAATCTTACAATATTCTCAAG
AACACAGTGAA ACAAGGCAAAATAAGTTAAAATCAACAAAAACAACATGAAACATAATTAGACACACAAAGACTTCAAACATTGGAAAATACCAGAGAAAGATAATAAATAT
TTTACTCTTTAAAAATTTAGTTAAAAGCTTAAACTAATTGTAGAGAAAA AACTATGTTAGTATTATATTGTAGATGAAATAAGCAAAACATTTAAAATACAAATGTGATTACTTAAAT
TAAATATAATAGATAATTTACCACCAGATTAGATACCATTGAAGGAATAATTAATATACTGAAATACAGGTCAGTAGAATTTTTTTCAATTCAGCATGGAGATGTAAAAAATGAAAA
TTAATGCAAAAAATAAGGGCACAAAAAGAAATGAGTAATTTTGATCAGAAATGTATTAAAATTAATAAACTGGAAATTTGACATTTAAAAAAAGCATTGTCATCCAAGTAGATGTG
TCTATTAAATAGTTGTTCTCATATCCAGTAATGTAATTATTATTCCCTCTCATGCAGTTCAGATTCTGGGGTAATCTTTAGACATCAGTTTTGTCTTTTATATTATTTATTCTGTTTACTAC
ATTTTATTTTGCTAATGATATTTTTAATTTCTGACATTCTGGAGTATTGCTTGTAAAAGGTATTTTTAAAAATACTTTATGGTTATTTTTGTGATTCCTATTCCTCTATGGACACCAAGGCT
ATTGACATTTTCTTTGGTTTCTTCTGTTACTTCTATTTTCTTAGTGTTTATATCATTTCATAGATAGGATATTCTTTATTTTTTATTTTTATTTAAATATTTGGTGATTCTTGGTTTTCTCAGCC
ATCTATTGTCAAGTGTTCTTATTAAGCATTATTATTAAATAAAGATTATTTCCTCTAATCACATGAGAATCTTTATTTCCCCCAAGTAATTGAAAATTGCAATGCCATGCTGCCATGTGG
TACAGCATGGGTTTGGGCTTGCTTTCTTCTTTTTTTTTTAACTTTTATTTTAGGTTTGGGAGTACCTGTGAAAGTTTGTTATATAGGTAAACTCGTGTCACCAGGGTTTGTTGTACAGATCA
TTTTGTCACCTAGGTACCAAGTACTCAACAATTATTTTTCCTGCTCCTCTGTCTCCTGTCACCCTCCACTCTCAAGTAGACTCCGGTGTCTGCTGTTCCATTCTTTGTGTCCATGTGTTCTC
ATAATTTAGTTCCCCACTTGTAAGTGAGAACATGCAGTATTTTCTAGTATTTGGTTTTTTGTTCCTGTGTTAATTTGCCCAGTATAATAGCCTCCAGCTCCATCCATGTTACTGCAAAGAA
CATGATCTCATTCTTTTTTATAGCTCCATGGTGTCTATATACCACATTTTCTTTATCTAAACTCTTATTGATGAGCATTGAGGTGGATTCTATGTCTTTGCTATTGTGCATATTGCTGCAAG
AACATTTGTGTGCATGTGTCTTTATGGTAGAATGATATATTTTCTTCTGGGTATATATGCAGTAATGCGATTGCTGGTTGGAATGGTAGTTCTGCTTTTATCTCTTTGAGGAATTGCCATG
CTGCTTTCCACAATAGTTGAACTAACTTACACTCCCACTAACAGTGTGTAAGTGTTTCCTTTTCTCCACAACCTGCCAGCATCTGTTATTTTTTGACATTTTAATAGTAGCCATTTTAACT
GGTATGAAATTATATTTCATTGTGGTTTTAATTTGCATTTCTCTAATGATCAGTGATATTGAGTTTGTTTTTTTTCACATGCTTGTTGGCTGCATGTATGTCTTCTTTTAAAAAGTGTCTGTT
CATGTACTTTGCCCACATTTTAATGGGGTTGTTTTTCTCTTGTAAATTTGTTTAAATTCCTTATAGGTGCTGGATTTTAGACATTTGTCAGACGCATAGTTTGCAAATAGTTTCTCCCATTC
TGTAGGTTGTCTGTTTATTTTGTTAATAGTTTCTTTTGCTATGCAGAAGCTCTTAATAAGTTTAATGAGATCCTGATATGTTAGGCTTTGTGTCCCCACCCAAATCTCATCTTGAATTATA
TCTCCATAATCACCACATGGAGAGACCAGGTGGAGGTAATTGAATCTGGGGGTGGTTTCACCCATGCTGTTCTTGTGATAGTGAATGAGTTCTCACGAGATCTAATGGTTTTATGAGG
GGCTCTTCCCAGCTTTGCCTGGTACTTCTCCTTCCTGCCGCTTTGTGAAAAAGGTGCATTGCGTCCCTTTCACCTTCTTCTATAATTGTAAGTTTCCTGAGGCCTTCCCAGCCATGCTGAA
CTTCAAGTCAATTAAACCTTTTTCTTTATAAATTACTCAGTCTCTGGTGGTTCTTTATAGCAGTGTGAAAATGGACTAATGAAGTTCCCATTTATGAATTTTTGCTTTTGTTGCAATTGCTT
TTGACATCTTAGTCATGAAATCCTTGCCTGTTCTAAGTACAGGACGGTATTGCCTAGGTTGTCTTCCAGGGTTTTTCTAATTTTGTGTTTTGCATTTAAGTGTTTAATCCATCTTGAGTTGA
TTTTTGTATATTGTGTAAGGAAGGGGTCCAGTTTCAATCTTTTGCATATGGCTAGTTAGTTATCCCAGTACCATTTATTGAAAAGACAGTCTTTTCCCCATCGCTCGTTTTTGTCAGTTTT
ATTGATGATCAGATAATCATAGCTGTGTGGCTTTATTTCTGGGTTCTTTATTCTGTTCTATTGGTTTATGTCCCTGTTTTTGTGCCAGTACCATGCTGTTTTGGTTAACATAGCCCTGTAGT
ATAGTTTGAGGTCAGATAGCCTGATGCTTCCAGCTTTGTTCTTTTTCTTAAGATTGCCTTGGCTATTTGGCCTCTTTTTTGGTTCCACATGAATTTTAAAACAGTTGTTTCTAGTTTTTGAA
GAATGTCATTGGTAGTTTGATAGAAATAGCATTTAATCTGTAAATTGATTTGTGCAGTATGGCCTTTTAATGATATTGATTCTTCCTATCCATGAGCATGATATGTTTTCCATTTTGTTTG
TATCCTCTCTGATTTCTTTGTGCAGTGTTTTGTAATTCTCAT TGTAGAGATTTTTCACCTCCCTGGTTAGTTGTATTTTACCCTAGATATTT TATTCTTTTTGTGAAAATTGTGAATGGGAT
TGCCTTCCTGATTTGACTGC CAGCTTGGTTACTGTTGGTTTATAGAAATGCTAGTGATTTTTGTACATTG ATTTTCTTTCTAAAACTTTGCTGAAGTTTTTTTTATTAGCAGAAGGAGCT
TTGGGGCTGAGACTATGGGGTTTTCTAGATATAGAATCATGTCAGCTTCAAATAGGGATAATTTTACTTCCTCTCTTCCTATTTGGATGCCCTTTATTTCTTTCTCTTGCCTGATTACTCTG
GCTGGGATTTCCTATGTTGAATAGGAGT CATGAGAGAGGGCATCAAATCTACACATATCAAATACTAACCTTGAATGTCTAGATATTT TATTCTTTTTGTGAAAATTGTGAATGGGAT
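To put the page above in perspective, the short calculation below estimates how many such pages the whole genome would fill. The genome length and bases-per-page figures are the ones quoted in the lecture; the 1000 pages per volume is an arbitrary assumption added for illustration.

```python
GENOME_BASES = 3_200_000_000  # ~3.2 billion bases, as quoted in the lecture
BASES_PER_PAGE = 5_000        # as on the slide
PAGES_PER_VOLUME = 1_000      # hypothetical book size, for illustration only

pages = -(-GENOME_BASES // BASES_PER_PAGE)  # ceiling division
volumes = -(-pages // PAGES_PER_VOLUME)

print(pages)    # 640000
print(volumes)  # 640
```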
53
The Completion of the Human Genome Sequence
• June 2000: White House announcement that the majority of the human genome (80%) had been sequenced (working draft).
• July 2000: working draft made available on the web at genome.ucsc.edu.
• February 2001: publication of 90 percent of the sequence in the journal Nature.
• April 2003: completion of the finished sequence, covering about 99% of the gene-containing regions at 99.99% accuracy.
54
“Fully sequenced” genomes are, in fact, not usually complete.
Higher-eukaryotic genomes have large regions of DNA that currently can’t be cloned or assembled.
Ex. telomeres, centromeres and “heterochromatic DNA” (gaps), which has few genes and many repeated regions
55
The Project is not Done…
Imagine the genome as a book written without capitalization or
punctuation, without breaks between words, sentences, or paragraphs,
and with strings of nonsense letters scattered between and even
within sentences. A passage from such a book in English might look like
this:
Even in a familiar language it is difficult to pick out the meaning of
the passage: The quick brown fox jumped over the lazy dog. The dog
lay quietly dreaming of dinner. And the genome is "written" in a far
less familiar language, multiplying the difficulties involved in reading it.
56
The Project is not Done…
Next there is the Annotation:
The sequence is like a topographical map; the annotation adds the cities, towns, schools, libraries and coffee shops!
So, where are the genes?
How do genes work?
And, how do scientists use this
information for scientific understanding
and to benefit us?
57
Next class, we will learn how to find the genes.
58
Genome projects use two general approaches:
a. The mapping approach divides the genome into
segments with genetic and physical mapping, refines
the map of each segment, and finally sequences the
DNA. (Genetic and physical maps are made first to
provide markers for sequencing.)
b. A “shotgun” approach breaks the genome into random,
overlapping fragments, and sequences each fragment.
Based on overlaps, the sequences are assembled by
computer. An advantage is that physical mapping is
not required.
59
Yeast artificial chromosome (YAC) vectors
allow the cloning, within yeast cells, of fragments of foreign genomic DNA that can approach 500 kbp in size.
A YAC vector contains:
* A yeast centromere (CEN4)
* A yeast autonomously replicating sequence (ARS1)
* Yeast telomeres (TEL)
* Genes for YAC selection in yeast.
* A bacterial replication origin and a bacterial selectable marker.
60