Download File - Molecular Biology 2

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Zinc finger nuclease wikipedia , lookup

DNA nanotechnology wikipedia , lookup

DNA polymerase wikipedia , lookup

DNA profiling wikipedia , lookup

Replisome wikipedia , lookup

United Kingdom National DNA Database wikipedia , lookup

Microsatellite wikipedia , lookup

Helitron (biology) wikipedia , lookup

Transcript
7
DNA Markers
7.1
DNA Markers Present in Genomic DNA
Genetic variation, in the form of multiple alleles of many genes, exists in most
natural populations of organisms. We have called such genetic differences between
individuals DNA markers; they are also called DNA polymorphisms.
[The term polymorphism literally means “multiple forms.”]
The methods of DNA manipulation can be used in a variety of combinations to detect
differences among individuals. The different approaches are in use because no single
method is ideal for all applications, each method has its own advantages and
limitations, and new methods are continually being developed. In this section we
examine some of the principal methods for detecting DNA polymorphisms among
individuals.
7.2
Single-Nucleotide Polymorphisms (SNPs)
A single-nucleotide polymorphism, or SNP (pronounced “snip”), is present at a
particular nucleotide site if the DNA molecules in the population frequently differ in
the identity of the nucleotide pair that occupies the site. For example, some DNA
molecules may have a TA base pair at a particular nucleotide site, whereas other
DNA molecules in the same population may have a CG base pair at the same site.
This difference constitutes a SNP.
The SNP defines two “alleles” for which there could be three genotypes among
individuals in the population: homozygous with TA at the corresponding site in both
homologous chromosomes, homozygous with CG at the corresponding site in both
homologous chromosomes, or heterozygous with TA in one chromosome and CG in
the homologous chromosome. The word allele is in quotation marks above because
the SNP need not be in a coding sequence, or even in a gene. In the human genome,
any two randomly chosen DNA molecules are likely to differ at one SNP site about
every 1,000–3,000 bp in protein-coding DNA and at about one SNP site every 500–
1,000 bp in noncoding DNA. Note, in the definition of a SNP, the stipulation that DNA
molecules must differ at the nucleotide site “frequently.” This provision excludes rare
genetic variation of the sort found in less than 1 percent of the DNA molecules in a
population. The reason for the exclusion is that genetic variants that are too rare are
1
not generally as useful in genetic analysis as the more common variants. A catalog of
SNPs is regarded as the ultimate compendium of DNA markers, because SNPs are
the most common form of genetic differences among people and because they are
distributed approximately uniformly along the chromosomes.
7.3
Restriction Fragment Length Polymorphisms
(RFLPs)
7.3.1
Principle of RFLPs
Although most SNPs require DNA sequencing to be studied, those that happen to be
located within a restriction site can be analyzed with a Southern blot. An example of
this situation is shown in Figure 7.1, where the SNP consists of a TA nucleotide
pair in some molecules and a CG pair in others. In this example, the polymorphic
nucleotide site is included in a cleavage site for the restriction enzyme EcoRI (5'GAATTC-3'). The two nearest flanking EcoRI sites are also shown. In this kind of
situation, DNA molecules with TA at the SNP will be cleaved at both flanking sites
and also at the middle site, yielding two EcoRI restriction fragments.
Figure 7.1: A minor difference in the DNA sequence of two molecules can be detected if the
difference eliminates a restriction site. (A) This molecule contains three restriction sites for
EcoRI, including one at each end. It is cleaved into two fragments by the enzyme. (B) This
molecule has an altered EcoRI site in the middle, in which 5'-GAATTC-3' becomes 5'-GAACTC3'. The altered site cannot be cleaved by EcoRI, so treatment of this molecule with EcoRI
results in one larger fragment.
Alternatively, DNA molecules with CG at the SNP will be cleaved at both flanking
sites but not at the middle site (because the presence of CG destroys the EcoRI
restriction site) and so will yield only one larger restriction fragment. A SNP that
eliminates
a
restriction
site
is
known
as
a
restriction
fragment
length
polymorphism, or RFLP (pronounced either as “riflip” or by spelling it out).
2
Restriction fragment length polymorphism (RFLP) refers to the differences in the
distance between adjacent restriction endonuclease cut sites of individuals in a
population.
7.3.2
RFLP for Small DNA
If two DNA molecules are very similar, but have a few small differences in their
nucleotide sequences, then the fact that they are not identical may be apparent from
a comparison of their restriction maps. Restriction endonucleases cleave DNA at
specific nucleotide recognition sequences, and if the sequences changes by just 1 bp,
the enzyme will no longer cut at that site. Many possibilities exit, each of which
results in a restriction fragment length polymorphism that would be detected if the
restriction fragments concerned could be visualized by agarose gel electrophoresis
(Figure 7.2).
1
2
3
4
5
Original map
1
2
4
5
1
Point mutation in restriction site 3
1
2
3
3a
4
2
3
4
5
Deletion in restriction fragment 2-3
5
1
Point mutation in restriction fragment 3-4
generates a new restriction site (3a)
2
4
5
Deletion including restriction site 3
Figure 7.2: Four mutations and the effects they will have on the restriction map of a DNA
molecule.
Thus, if a DNA fragment loses (or gain) a restriction endonuclease cleavage site, the
sizes of the fragments produced after digestion will change accordingly. In addition
to changes in nucleotide sequences at a restriction endonuclease cut site, changes in
distances between cut sites may reflect deletions, additions, or rearrangement of
DNA between cut sites and result in different restriction fragment patterns after
separation by agarose gel electrophoresis (Figure 7.3).
3
1
Segment of chromosomal DNA has several
Restriction sites, giving DNA fragments of
different sizes
1.5
2
A mutational change in the DNA sequence
Can occur, removing one RE cut site.
2.5
0.5
2.0
3.0
kb
3.0
kb
Mutational change
1.5 + 2.5
3
The total DNA is digested and separated
by agarose gel electrophoresis.
4
When hybridized to a DNA probe
Representing the particular fragment
of DNA, different patterns radioactive
bands will appear for the two
organisms. These patterns can be
used as physical markers on the
chromosome, as well as
genetic markers.
3.5
0.5
Original
DNA
3.5
Mutant
DNA
2.0
Standard
ladder
(-) ve
—
—
—
—
—
—
—
—
(+) ve
—
—
—
—
—
5.0 kb
4.5 kb
4.0 kb
—
—
—
—
—
—
—
3.5 kb
3.0 kb
2.5 kb
2.0 kb
1.5 kb
1.0 kb
0.5 kb
Figure 7.3: RFLP identification.
Note that these RFLP changes in DNA sequence are inherited in a codominant
fashion, unlike morphological markers. Thus, if one copy of DNA from a diploid
organism has an RFLP and the other does not, you will see both fragment patterns
on an electrophoresis.
7.3.3
RFLP for Large DNA
With large genome such as that for human (about 3.2 billion base pair), a large
number of fragments are generated after restriction endonuclease the recognizes a
6-bp sequence, about 180,000 different fragments would be produced. (Restriction
4
of human DNA with EcoR1, for instance, results in approximately 700,000 fragments,
producing a smear in an agarose gel from which it is impossible to distinguish just
one altered band.) In practice, a strategy based on Southern hybridization is
therefore used (Figure 7.4).
Markers
Human
DNA
(Test)
Human
DNA
Standard
(Control) ladder
(-) ve
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
(+) ve
—
100 kb
90 kb
80 kb
70 kb
60 kb
50 kb
40 kb
30 kb
20 kb
10 kb
Agarose gel
Markers
—
—
—
—
—
—
—
Human
DNA
(Test)
—
—
—
*
—
Human
DNA
Standard
(Control) ladder
—
—
—
—
—
*
—
—
—
—
—
—
—
—
—
—
—
Autoradiograph
*The test sample shows RFLPs
Figure 7.4: RFLP for large DNA.
The restriction digests is electrophoresed in the normal way but, rather than being
visualized by ethidium bromide staining, the smear of the fragments is transferred to
a nitrocellulose or nylon membrane by Southern blotting (Figure 5.4). Hybridization
analysis is then carried out, using a probe a clone that spans the region of interest.
The probe hybridizes to the relevant region, ‘lighting up’ the appropriate restriction
5
fragments on the resulting autoradiograph. If an RFLP is present then it will be
clearly visible when banding pattern is compared with that obtained with the
unmutated gene.
7.4
Random Amplified Polymorphic DNA (RAPD)
For studying DNA markers, one limitation of Southern blotting is that it requires
material (probes available in the form of cloned DNA) and one limitation of PCR is
that
it
requires
sequence
information
(so
primer
oligonucleotides
can
be
synthesized). These are not severe handicaps for organisms that are well studied (for
example, human beings, domesticated animals and cultivated plants, and model
genetic organisms such as yeast, fruit fly, nematode, or mouse), because research
materials and sequence information are readily available. But for the vast majority of
organisms that biologists study, there are neither research materials nor sequence
information. Genetic analysis can still be carried out in these organisms by using an
approach called random amplified polymorphic DNA or RAPD (pronounced
“rapid”), described in this section. RAPD analysis makes use of a set of PCR primers
of 8–10 nucleotides whose sequence is essentially random. The random primers are
tried individually or in pairs in PCR reactions to amplify fragments of genomic DNA
from the organism of interest.
Because the primers are so short, they often anneal to genomic DNA at multiple
sites. Some primers anneal in the proper orientation and at a suitable distance from
each other to support amplification of the unknown sequence between them. Among
the set of amplified fragments are ones that can be amplified from some genomic
DNA samples but not from others, which means that the presence or absence of the
amplified fragment is polymorphic in the population of organisms.
In most organisms it is usually straightforward to identify a large number of RAPDs
that can serve as genetic markers for many different kinds of genetic studies. An
example of RAPD gel analysis is illustrated in Figure 7.5, where three pairs of
primers (sets 1–3) are used to amplify genomic DNA from four individuals in a
population. The fragments that amplify are then separated on an electrophoresis gel
and visualized after straining with ethidium bromide. Many amplified bands are
typically observed for each primer set, but only some of these are polymorphic.
These are indicated in Figure 7.5 by the colored dots. The amplified bands that are
not polymorphic are said to be monomorphic in the sample, which means that they
6
are the same from one individual to the next. This example shows 17 RAPD
polymorphisms.
Figure 7.5: Random amplified polymorphic DNA (RAPD) is detected through the use of
relatively short primer sequences that, by chance, match genomic DNA at multiple sites that
are close enough together to support PCR amplification. Genomic DNA from a single individual
typically yields many bands, only some of which are polymorphic in the population. Different
sets of primers amplify different fragments of genomic DNA.
In modern genetics, the phenotypes that are studied are very often bands in a gel
rather than physical or physiological characteristics. Figure 7.5 offers a good
example. Each position at which a band is observed in one or more samples is a
phenotype, whether or not the band is polymorphic. For example, primer set 1 yields
a total of 19 bands, of which 5 are polymorphic and 14 are monomorphic in the
sample. The phenotypes could be named in any convenient way, such as by
indicating the primer set and the fragment length. For example, suppose that the
smallest amplified fragment for primer set 1 is 125 bp, which is the polymorphic
fragment at the bottom left in Figure 7.5. We could name this fragment
unambiguously as 1-125 because it is a fragment of 125 bp amplified by primer set
7
1. To understand why a DNA band is a phenotype, rather than a gene or a genotype,
it is useful to assign different names to the “alleles” that do or do not support
amplification. (The word allele is in quotation marks again, because the 1-125
fragment that is amplified need not be part of a gene.)
We are talking only about the 1-125 fragment, so we could call the allele capable of
supporting amplification the plus (+) allele and the allele not capable of supporting
amplification the minus () allele. Then there are three possible genotypes with
regard to the amplified fragment: +/+, +/, and /. Using genomic DNA from these
genotypes,
the
homozygous
+/+
and
heterozygous
+/
will
both
support
amplification of the 1-125 fragment, whereas the / genotype will not support
amplification. Hence the presence of the 1-125 fragment is the phenotype observed
in both +/+ and +/ genotypes. In other words, with regard to amplification, the (+)
allele is dominant to the () allele, because the phenotype (presence of the 1-125
band) is present in both homozygous +/+ and heterozygous +/ genotypes.
Therefore, on the basis of the phenotype for the 1-125 band in Figure 7.5, we could
say that individuals A and D could have either a +/+ or +/ genotype but that
individuals B and C must have genotype /.
7.5
Amplified Fragment Length Polymorphisms
AFLPs)
Because RAPD primers are small and may not match the template DNA perfectly, the
amplified DNA bands often differ a great deal in how dark or light they appear. This
variation creates a potential problem, because some exceptionally dark bands may
actually result from two amplified DNA fragments of the same size, and some
exceptionally light bands may be difficult to visualize consistently. To obtain
amplified fragments that yield more uniform band intensities, double-stranded
oligonucleotide sequences that match the primer sequences perfectly can be
attached to genomic restriction fragments enzymatically prior to amplification. This
method, which is outlined in Figure 7.6, yields a class of DNA polymorphisms known
as amplified fragment length polymorphisms, or AFLPs (usually pronounced by
spelling it out).
8
Figure 7.6: An amplified fragment length polymorphism (AFLP). (A and B) Genomic DNA is
digested with one or more restriction enzymes (in this case, EcoRI). (C) Oligonucleotide
adaptors are ligated onto the fragments; note that the single-stranded overhang of the
adaptors matches those of the genomic DNA fragments. (D) The resulting fragments are
subjected to PCR using primers complementary to the adaptors. The number of amplified
fragments can be adjusted by manipulating the number of nucleotides in the adaptors that are
also present in the primers.
(1) The first step (part A) is to digest genomic DNA with a restriction enzyme; this
example uses the enzyme EcoRI, whose restriction site is 5'-GAATTC-3'. Digestion
yields a large number of restriction fragments flanked by what remains of an EcoRI
site on each side. (2) In the next step (part B), double-stranded oligonucleotides
called primer adapters, with single-stranded overhangs complementary to those on
the restriction fragments, are ligated onto the restriction fragments using the
enzyme DNA ligase. (3) The resulting fragments (C) are ready for amplification by
means of PCR. Note that the same adapter is ligated onto each end, so a single
primer sequence will anneal to both ends and support amplification.
9
There are nevertheless a number of choices concerning the primer sequence. A
primer that matches the adapters perfectly will amplify all fragments, but this often
results in so many amplified fragments that they are not well separated in the gel. A
PCR primer must match perfectly at its 3' end to be elongated. Thus, additional
nucleotides added to the 3' end reduce the number of amplified fragments, because
these primers will
amplify only those fragments that, by chance, have a
complementary nucleotide immediately adjacent to the EcoRI site. For example, the
primer sequence in the second row has a single nucleotide 3' extension; because
only 1/4 x 1/4 of the fragments would be expected to have a complementary T
immediately adjacent to the EcoRI site on both sides, this primer is expected to
amplify 1/16 of all the restriction fragments. Similarly, the primer sequence in the
third row has a two-nucleotide 3' extension, so this primer is expected to amplify
1/16 x 1/16 = 1/256 of all the fragments. One application of AFLP analysis is to
organisms with large genomes, such as grasshoppers and crickets, for which RAPD
analysis would yield an excessive number of amplified bands.
How large is a “large” genome? Compared to the human genome, that of the brown
mountain grasshopper Podisma pedestris is 7 times larger, and those of North
American salamanders in the genus Amphiuma are 70 times larger!
7.6
Simple Tandem Repeat Polymorphisms (STRPs)
One more type of DNA polymorphism warrants consideration because it is useful in
DNA typing for individual identification and for assessing the degree of genetic
relatedness between individuals. This type of polymorphism is called a simple
tandem repeat polymorphism (STRP) because the genetic differences among
DNA molecules consist of the number of copies of a short DNA sequence that may be
repeated many times in tandem at a particular locus in the genome. STRPs that are
present at different loci may differ in the sequence and length of the repeating unit,
as well as in the minimum and maximum number of tandem copies that occur in
DNA molecules in the population. A STRP with a repeating unit of 2–9 bp is often
called a microsatellite or a simple sequence length polymorphism (SSLP),
whereas a STRP with a repeating unit of 10–60 bp is often called a minisatellite or
a variable number of tandem repeats (VNTR).
10
Figure 7.7 shows an example of a STRP with a copy number ranging from 1 through
10. Because the number of copies determines the size of any restriction fragment
that includes the STRP, each DNA molecule yields a single-size restriction fragment
depending on the number of copies it contains.
Figure 7.7: In a simple tandem repeat polymorphism (STRP), the alleles in a population differ
in the number of copies of a short sequence (typically 2–60 bp) that is repeated in tandem
along the DNA molecule. This example shows alleles in which the repeat number varies from 1
to 10. Cleavage at restriction sites flanking the STRP yields a unique fragment length for each
allele. The alleles can also be distinguished by the size of the fragment amplified by PCR using
primers that flank the STRP.
The STRP in Figure 7.7 has 10 different “alleles” (again, we use quotation marks
because the STRP may not be in a gene), which could be distinguished either by
Southern blotting using a probe to a unique (nonrepeating) sequence within the
restriction fragment or by PCR amplification using primers to a unique sequence on
11
either side of the tandem repeats. In this situation the locus is said to have multiple
alleles in the population. Even with multiple alleles, however, any one chromosome
can carry only one of the alleles, and any individual genotype can carry at most two
different alleles.
Nevertheless, a large number of alleles means an even larger
number of genotypes, which is the feature that gives STRPs their utility in individual
identification. For example, even with only 10 alleles in a population of organisms,
there could be 10 different homozygous genotypes and 45 different heterozygous
genotypes.
More generally, with n alleles there are n homozygous genotypes and n(n  1)/2
heterozygous genotypes, or n(n + 1)/2 different genotypes altogether. With STRPs,
not only are there a relatively large number of alleles, but no one allele is
exceptionally common, so each of the many genotypes in the population has a
relatively low frequency. If the genotypes at 6–8 STRP loci are considered
simultaneously, then each possible multiple-locus genotype is exceedingly rare.
Because of their high degree of variation among people, STRPs are widely used in
DNA typing (sometimes called DNA fingerprinting) to establish individual identity
for use in criminal investigations, parentage determinations, and so forth.
7.7
Applications of DNA Markers
Why are geneticists interested in DNA markers (DNA polymorphisms)? Their interest
can be justified on any number of grounds. In this section we consider the reasons
most often cited.
7.7.1
Genetic Markers, Genetic Mapping, and “Disease
Genes”
Perhaps the key goal in studying DNA polymorphisms in human genetics is to
identify the chromosomal location of mutant genes associated with hereditary
diseases. In the context of disorders caused by the interaction of multiple genetic
and environmental factors, such as heart disease, cancer, diabetes, depression, and
so forth, it is important to think of a harmful allele as a risk factor for the disease,
which increases the probability of occurrence of the disease, rather than as a sole
causative agent. This needs to be emphasized, especially because genetic risk
factors are often called disease genes. For example, the major “disease gene” for
breast cancer in women is the gene BRCA1. For women who carry a mutant allele of
12
BRCA1, the lifetime risk of breast cancer is about 36 percent, and hence most
women with this genetic risk factor do not develop breast cancer. On the other hand,
among women who are not carriers, the lifetime risk of breast cancer is about 12
percent, and hence many women without the genetic risk factor do develop breast
cancer. Indeed, BRCA1 mutations are found in only 16 percent of affected women
who have a family history of breast cancer. The importance of a genetic risk factor
can be expressed quantitatively as the relative risk, which equals the risk of the
disease in persons who carry the risk factor as compared to the risk in persons who
do not. The relative risk for BRCA1 equals 3.0 (calculated as 36 percent/12 percent).
The utility of DNA polymorphisms in locating and identifying disease genes results
from genetic linkage, the tendency for genes that are sufficiently close together in
a chromosome to be inherited together. The key concepts of genetic linkage are
summarized in Figure 7.8, which shows the location of many DNA polymorphisms
along a chromosome that also carries a genetic risk factor denoted D (for disease
gene).
Figure 7.8: Concepts in genetic localization of genetic risk factors for disease. Polymorphic
DNA markers (indicated by the vertical lines) that are close to a genetic risk factor (D) in the
chromosome tend to be inherited together with the disease itself. The genomic location of the
risk factor is determined by examining the known genomic locations of the DNA
polymorphisms that are linked with it.
Each DNA polymorphism serves as a genetic marker for its own location in the
chromosome. The importance of genetic linkage is that DNA markers that are
sufficiently close to the disease gene will tend to be inherited together with the
disease gene in pedigrees—and the closer the markers, the stronger this association.
Hence, the initial approach to the identification of a disease gene is to find DNA
markers that are genetically linked with the disease gene in order to identify its
chromosomal location, a procedure known as genetic mapping.
13
Once the chromosomal position is known, other methods can be used to pinpoint the
disease gene itself and to study its functions. If genetic linkage seems a roundabout
way to identify disease genes, consider the alternative. The human genome contains
approximately 80,000 genes. If genetic linkage did not exist, then we would have to
examine 80,000 DNA polymorphisms, one in each gene, in order to identify a disease
gene. But the human genome has only 23 pairs of chromosomes, and because of
genetic linkage and the power of genetic mapping, it actually requires only a few
hundred DNA polymorphisms to identify the chromosome and approximate location
of a genetic risk factor.
7.7.2
Other Uses for DNA Markers
DNA polymorphisms are widely used in all aspects of modern genetics because they
provide a large number of easily accessed genetic markers for genetic mapping and
other purposes. Among the other uses of DNA polymorphisms are the following.
7.7.2.1 Individual Identification
We have already mentioned that DNA polymorphisms have application as a means of
DNA typing (DNA fingerprinting) to identify different individuals in a population.
DNA typing in other organisms is used to determine individual animals in endangered
species and to identify the degree of genetic relatedness among individual organisms
that live in packs or herds.
7.7.2.2 Epidemiology and Food Safety Science
DNA typing also has important applications in tracking the spread of viral and
bacterial epidemic diseases, as well as in identifying the source of contamination in
contaminated foods.
7.7.2.3 Human Population History
DNA polymorphisms are widely used in anthropology to reconstruct the evolutionary
origin, global expansion, and diversification of the human population.
7.7.2.4 Improvement of Domesticated Plants and Animals
Plant and animal breeders have turned to DNA polymorphisms as genetic markers in
pedigree studies to identify, by genetic mapping, genes that are associated with
favorable traits in order to incorporate these genes into currently used varieties of
plants and breeds of animals.
14
7.7.2.5 History of Domestication
Plant and animal breeders also study genetic polymorphisms to identify the wild
ancestors of cultivated plants and domesticated animals, as well as to infer the
practices of artificial selection that led to genetic changes in these species during
domestication.
7.7.2.6 DNA Polymorphisms as Ecological Indicators
DNA polymorphisms are being evaluated as biological indicators of genetic diversity
in key indicator species present in biological communities exposed to chemical,
biological, or physical stress. They are also used to monitor genetic diversity in
endangered species and species bred in captivity.
7.7.2.7 Evolutionary Genetics
DNA polymorphisms are studied in an effort to describe the patterns in which
different types of genetic variation occur throughout the genome, to infer the
evolutionary mechanisms by which genetic variation is maintained, and to illuminate
the
processes
by
which
genetic
polymorphisms
within
map-species
become
transformed into genetic differences between species.
7.7.2.8 Population Studies
Population ecologists employ DNA polymorphisms to assess the level of genetic
variation in diverse populations of organisms that differ in genetic organization
(prokaryotes, eukaryotes, organelles), population size, breeding structure, or lifehistory characters, and they use genetic polymorphisms within subpopulations of a
species as indicators of population history, patterns of migration, and so forth.
7.7.2.9 Evolutionary Relationships among Species
Differences in homologous DNA sequences between species is the basis of molecular
systematics, in which the sequences are analyzed to determine the ancestral history
(phylogeny) of the species and to trace the origin of morphological, behavioral, and
other types of adaptations that have arisen in the course of evolution.
Reference
1.
Genetics: Analysis of Genes and Genomes, Sixth Edition. 2005. Daniel L Hartl and Elizabeth W
Jones. Jones & Bartlett Publishers Inc., Boston.
15