Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
7 DNA Markers 7.1 DNA Markers Present in Genomic DNA Genetic variation, in the form of multiple alleles of many genes, exists in most natural populations of organisms. We have called such genetic differences between individuals DNA markers; they are also called DNA polymorphisms. [The term polymorphism literally means “multiple forms.”] The methods of DNA manipulation can be used in a variety of combinations to detect differences among individuals. The different approaches are in use because no single method is ideal for all applications, each method has its own advantages and limitations, and new methods are continually being developed. In this section we examine some of the principal methods for detecting DNA polymorphisms among individuals. 7.2 Single-Nucleotide Polymorphisms (SNPs) A single-nucleotide polymorphism, or SNP (pronounced “snip”), is present at a particular nucleotide site if the DNA molecules in the population frequently differ in the identity of the nucleotide pair that occupies the site. For example, some DNA molecules may have a TA base pair at a particular nucleotide site, whereas other DNA molecules in the same population may have a CG base pair at the same site. This difference constitutes a SNP. The SNP defines two “alleles” for which there could be three genotypes among individuals in the population: homozygous with TA at the corresponding site in both homologous chromosomes, homozygous with CG at the corresponding site in both homologous chromosomes, or heterozygous with TA in one chromosome and CG in the homologous chromosome. The word allele is in quotation marks above because the SNP need not be in a coding sequence, or even in a gene. In the human genome, any two randomly chosen DNA molecules are likely to differ at one SNP site about every 1,000–3,000 bp in protein-coding DNA and at about one SNP site every 500– 1,000 bp in noncoding DNA. Note, in the definition of a SNP, the stipulation that DNA molecules must differ at the nucleotide site “frequently.” This provision excludes rare genetic variation of the sort found in less than 1 percent of the DNA molecules in a population. The reason for the exclusion is that genetic variants that are too rare are 1 not generally as useful in genetic analysis as the more common variants. A catalog of SNPs is regarded as the ultimate compendium of DNA markers, because SNPs are the most common form of genetic differences among people and because they are distributed approximately uniformly along the chromosomes. 7.3 Restriction Fragment Length Polymorphisms (RFLPs) 7.3.1 Principle of RFLPs Although most SNPs require DNA sequencing to be studied, those that happen to be located within a restriction site can be analyzed with a Southern blot. An example of this situation is shown in Figure 7.1, where the SNP consists of a TA nucleotide pair in some molecules and a CG pair in others. In this example, the polymorphic nucleotide site is included in a cleavage site for the restriction enzyme EcoRI (5'GAATTC-3'). The two nearest flanking EcoRI sites are also shown. In this kind of situation, DNA molecules with TA at the SNP will be cleaved at both flanking sites and also at the middle site, yielding two EcoRI restriction fragments. Figure 7.1: A minor difference in the DNA sequence of two molecules can be detected if the difference eliminates a restriction site. (A) This molecule contains three restriction sites for EcoRI, including one at each end. It is cleaved into two fragments by the enzyme. (B) This molecule has an altered EcoRI site in the middle, in which 5'-GAATTC-3' becomes 5'-GAACTC3'. The altered site cannot be cleaved by EcoRI, so treatment of this molecule with EcoRI results in one larger fragment. Alternatively, DNA molecules with CG at the SNP will be cleaved at both flanking sites but not at the middle site (because the presence of CG destroys the EcoRI restriction site) and so will yield only one larger restriction fragment. A SNP that eliminates a restriction site is known as a restriction fragment length polymorphism, or RFLP (pronounced either as “riflip” or by spelling it out). 2 Restriction fragment length polymorphism (RFLP) refers to the differences in the distance between adjacent restriction endonuclease cut sites of individuals in a population. 7.3.2 RFLP for Small DNA If two DNA molecules are very similar, but have a few small differences in their nucleotide sequences, then the fact that they are not identical may be apparent from a comparison of their restriction maps. Restriction endonucleases cleave DNA at specific nucleotide recognition sequences, and if the sequences changes by just 1 bp, the enzyme will no longer cut at that site. Many possibilities exit, each of which results in a restriction fragment length polymorphism that would be detected if the restriction fragments concerned could be visualized by agarose gel electrophoresis (Figure 7.2). 1 2 3 4 5 Original map 1 2 4 5 1 Point mutation in restriction site 3 1 2 3 3a 4 2 3 4 5 Deletion in restriction fragment 2-3 5 1 Point mutation in restriction fragment 3-4 generates a new restriction site (3a) 2 4 5 Deletion including restriction site 3 Figure 7.2: Four mutations and the effects they will have on the restriction map of a DNA molecule. Thus, if a DNA fragment loses (or gain) a restriction endonuclease cleavage site, the sizes of the fragments produced after digestion will change accordingly. In addition to changes in nucleotide sequences at a restriction endonuclease cut site, changes in distances between cut sites may reflect deletions, additions, or rearrangement of DNA between cut sites and result in different restriction fragment patterns after separation by agarose gel electrophoresis (Figure 7.3). 3 1 Segment of chromosomal DNA has several Restriction sites, giving DNA fragments of different sizes 1.5 2 A mutational change in the DNA sequence Can occur, removing one RE cut site. 2.5 0.5 2.0 3.0 kb 3.0 kb Mutational change 1.5 + 2.5 3 The total DNA is digested and separated by agarose gel electrophoresis. 4 When hybridized to a DNA probe Representing the particular fragment of DNA, different patterns radioactive bands will appear for the two organisms. These patterns can be used as physical markers on the chromosome, as well as genetic markers. 3.5 0.5 Original DNA 3.5 Mutant DNA 2.0 Standard ladder (-) ve — — — — — — — — (+) ve — — — — — 5.0 kb 4.5 kb 4.0 kb — — — — — — — 3.5 kb 3.0 kb 2.5 kb 2.0 kb 1.5 kb 1.0 kb 0.5 kb Figure 7.3: RFLP identification. Note that these RFLP changes in DNA sequence are inherited in a codominant fashion, unlike morphological markers. Thus, if one copy of DNA from a diploid organism has an RFLP and the other does not, you will see both fragment patterns on an electrophoresis. 7.3.3 RFLP for Large DNA With large genome such as that for human (about 3.2 billion base pair), a large number of fragments are generated after restriction endonuclease the recognizes a 6-bp sequence, about 180,000 different fragments would be produced. (Restriction 4 of human DNA with EcoR1, for instance, results in approximately 700,000 fragments, producing a smear in an agarose gel from which it is impossible to distinguish just one altered band.) In practice, a strategy based on Southern hybridization is therefore used (Figure 7.4). Markers Human DNA (Test) Human DNA Standard (Control) ladder (-) ve — — — — — — — — — — — — — — — — (+) ve — 100 kb 90 kb 80 kb 70 kb 60 kb 50 kb 40 kb 30 kb 20 kb 10 kb Agarose gel Markers — — — — — — — Human DNA (Test) — — — * — Human DNA Standard (Control) ladder — — — — — * — — — — — — — — — — — Autoradiograph *The test sample shows RFLPs Figure 7.4: RFLP for large DNA. The restriction digests is electrophoresed in the normal way but, rather than being visualized by ethidium bromide staining, the smear of the fragments is transferred to a nitrocellulose or nylon membrane by Southern blotting (Figure 5.4). Hybridization analysis is then carried out, using a probe a clone that spans the region of interest. The probe hybridizes to the relevant region, ‘lighting up’ the appropriate restriction 5 fragments on the resulting autoradiograph. If an RFLP is present then it will be clearly visible when banding pattern is compared with that obtained with the unmutated gene. 7.4 Random Amplified Polymorphic DNA (RAPD) For studying DNA markers, one limitation of Southern blotting is that it requires material (probes available in the form of cloned DNA) and one limitation of PCR is that it requires sequence information (so primer oligonucleotides can be synthesized). These are not severe handicaps for organisms that are well studied (for example, human beings, domesticated animals and cultivated plants, and model genetic organisms such as yeast, fruit fly, nematode, or mouse), because research materials and sequence information are readily available. But for the vast majority of organisms that biologists study, there are neither research materials nor sequence information. Genetic analysis can still be carried out in these organisms by using an approach called random amplified polymorphic DNA or RAPD (pronounced “rapid”), described in this section. RAPD analysis makes use of a set of PCR primers of 8–10 nucleotides whose sequence is essentially random. The random primers are tried individually or in pairs in PCR reactions to amplify fragments of genomic DNA from the organism of interest. Because the primers are so short, they often anneal to genomic DNA at multiple sites. Some primers anneal in the proper orientation and at a suitable distance from each other to support amplification of the unknown sequence between them. Among the set of amplified fragments are ones that can be amplified from some genomic DNA samples but not from others, which means that the presence or absence of the amplified fragment is polymorphic in the population of organisms. In most organisms it is usually straightforward to identify a large number of RAPDs that can serve as genetic markers for many different kinds of genetic studies. An example of RAPD gel analysis is illustrated in Figure 7.5, where three pairs of primers (sets 1–3) are used to amplify genomic DNA from four individuals in a population. The fragments that amplify are then separated on an electrophoresis gel and visualized after straining with ethidium bromide. Many amplified bands are typically observed for each primer set, but only some of these are polymorphic. These are indicated in Figure 7.5 by the colored dots. The amplified bands that are not polymorphic are said to be monomorphic in the sample, which means that they 6 are the same from one individual to the next. This example shows 17 RAPD polymorphisms. Figure 7.5: Random amplified polymorphic DNA (RAPD) is detected through the use of relatively short primer sequences that, by chance, match genomic DNA at multiple sites that are close enough together to support PCR amplification. Genomic DNA from a single individual typically yields many bands, only some of which are polymorphic in the population. Different sets of primers amplify different fragments of genomic DNA. In modern genetics, the phenotypes that are studied are very often bands in a gel rather than physical or physiological characteristics. Figure 7.5 offers a good example. Each position at which a band is observed in one or more samples is a phenotype, whether or not the band is polymorphic. For example, primer set 1 yields a total of 19 bands, of which 5 are polymorphic and 14 are monomorphic in the sample. The phenotypes could be named in any convenient way, such as by indicating the primer set and the fragment length. For example, suppose that the smallest amplified fragment for primer set 1 is 125 bp, which is the polymorphic fragment at the bottom left in Figure 7.5. We could name this fragment unambiguously as 1-125 because it is a fragment of 125 bp amplified by primer set 7 1. To understand why a DNA band is a phenotype, rather than a gene or a genotype, it is useful to assign different names to the “alleles” that do or do not support amplification. (The word allele is in quotation marks again, because the 1-125 fragment that is amplified need not be part of a gene.) We are talking only about the 1-125 fragment, so we could call the allele capable of supporting amplification the plus (+) allele and the allele not capable of supporting amplification the minus () allele. Then there are three possible genotypes with regard to the amplified fragment: +/+, +/, and /. Using genomic DNA from these genotypes, the homozygous +/+ and heterozygous +/ will both support amplification of the 1-125 fragment, whereas the / genotype will not support amplification. Hence the presence of the 1-125 fragment is the phenotype observed in both +/+ and +/ genotypes. In other words, with regard to amplification, the (+) allele is dominant to the () allele, because the phenotype (presence of the 1-125 band) is present in both homozygous +/+ and heterozygous +/ genotypes. Therefore, on the basis of the phenotype for the 1-125 band in Figure 7.5, we could say that individuals A and D could have either a +/+ or +/ genotype but that individuals B and C must have genotype /. 7.5 Amplified Fragment Length Polymorphisms AFLPs) Because RAPD primers are small and may not match the template DNA perfectly, the amplified DNA bands often differ a great deal in how dark or light they appear. This variation creates a potential problem, because some exceptionally dark bands may actually result from two amplified DNA fragments of the same size, and some exceptionally light bands may be difficult to visualize consistently. To obtain amplified fragments that yield more uniform band intensities, double-stranded oligonucleotide sequences that match the primer sequences perfectly can be attached to genomic restriction fragments enzymatically prior to amplification. This method, which is outlined in Figure 7.6, yields a class of DNA polymorphisms known as amplified fragment length polymorphisms, or AFLPs (usually pronounced by spelling it out). 8 Figure 7.6: An amplified fragment length polymorphism (AFLP). (A and B) Genomic DNA is digested with one or more restriction enzymes (in this case, EcoRI). (C) Oligonucleotide adaptors are ligated onto the fragments; note that the single-stranded overhang of the adaptors matches those of the genomic DNA fragments. (D) The resulting fragments are subjected to PCR using primers complementary to the adaptors. The number of amplified fragments can be adjusted by manipulating the number of nucleotides in the adaptors that are also present in the primers. (1) The first step (part A) is to digest genomic DNA with a restriction enzyme; this example uses the enzyme EcoRI, whose restriction site is 5'-GAATTC-3'. Digestion yields a large number of restriction fragments flanked by what remains of an EcoRI site on each side. (2) In the next step (part B), double-stranded oligonucleotides called primer adapters, with single-stranded overhangs complementary to those on the restriction fragments, are ligated onto the restriction fragments using the enzyme DNA ligase. (3) The resulting fragments (C) are ready for amplification by means of PCR. Note that the same adapter is ligated onto each end, so a single primer sequence will anneal to both ends and support amplification. 9 There are nevertheless a number of choices concerning the primer sequence. A primer that matches the adapters perfectly will amplify all fragments, but this often results in so many amplified fragments that they are not well separated in the gel. A PCR primer must match perfectly at its 3' end to be elongated. Thus, additional nucleotides added to the 3' end reduce the number of amplified fragments, because these primers will amplify only those fragments that, by chance, have a complementary nucleotide immediately adjacent to the EcoRI site. For example, the primer sequence in the second row has a single nucleotide 3' extension; because only 1/4 x 1/4 of the fragments would be expected to have a complementary T immediately adjacent to the EcoRI site on both sides, this primer is expected to amplify 1/16 of all the restriction fragments. Similarly, the primer sequence in the third row has a two-nucleotide 3' extension, so this primer is expected to amplify 1/16 x 1/16 = 1/256 of all the fragments. One application of AFLP analysis is to organisms with large genomes, such as grasshoppers and crickets, for which RAPD analysis would yield an excessive number of amplified bands. How large is a “large” genome? Compared to the human genome, that of the brown mountain grasshopper Podisma pedestris is 7 times larger, and those of North American salamanders in the genus Amphiuma are 70 times larger! 7.6 Simple Tandem Repeat Polymorphisms (STRPs) One more type of DNA polymorphism warrants consideration because it is useful in DNA typing for individual identification and for assessing the degree of genetic relatedness between individuals. This type of polymorphism is called a simple tandem repeat polymorphism (STRP) because the genetic differences among DNA molecules consist of the number of copies of a short DNA sequence that may be repeated many times in tandem at a particular locus in the genome. STRPs that are present at different loci may differ in the sequence and length of the repeating unit, as well as in the minimum and maximum number of tandem copies that occur in DNA molecules in the population. A STRP with a repeating unit of 2–9 bp is often called a microsatellite or a simple sequence length polymorphism (SSLP), whereas a STRP with a repeating unit of 10–60 bp is often called a minisatellite or a variable number of tandem repeats (VNTR). 10 Figure 7.7 shows an example of a STRP with a copy number ranging from 1 through 10. Because the number of copies determines the size of any restriction fragment that includes the STRP, each DNA molecule yields a single-size restriction fragment depending on the number of copies it contains. Figure 7.7: In a simple tandem repeat polymorphism (STRP), the alleles in a population differ in the number of copies of a short sequence (typically 2–60 bp) that is repeated in tandem along the DNA molecule. This example shows alleles in which the repeat number varies from 1 to 10. Cleavage at restriction sites flanking the STRP yields a unique fragment length for each allele. The alleles can also be distinguished by the size of the fragment amplified by PCR using primers that flank the STRP. The STRP in Figure 7.7 has 10 different “alleles” (again, we use quotation marks because the STRP may not be in a gene), which could be distinguished either by Southern blotting using a probe to a unique (nonrepeating) sequence within the restriction fragment or by PCR amplification using primers to a unique sequence on 11 either side of the tandem repeats. In this situation the locus is said to have multiple alleles in the population. Even with multiple alleles, however, any one chromosome can carry only one of the alleles, and any individual genotype can carry at most two different alleles. Nevertheless, a large number of alleles means an even larger number of genotypes, which is the feature that gives STRPs their utility in individual identification. For example, even with only 10 alleles in a population of organisms, there could be 10 different homozygous genotypes and 45 different heterozygous genotypes. More generally, with n alleles there are n homozygous genotypes and n(n 1)/2 heterozygous genotypes, or n(n + 1)/2 different genotypes altogether. With STRPs, not only are there a relatively large number of alleles, but no one allele is exceptionally common, so each of the many genotypes in the population has a relatively low frequency. If the genotypes at 6–8 STRP loci are considered simultaneously, then each possible multiple-locus genotype is exceedingly rare. Because of their high degree of variation among people, STRPs are widely used in DNA typing (sometimes called DNA fingerprinting) to establish individual identity for use in criminal investigations, parentage determinations, and so forth. 7.7 Applications of DNA Markers Why are geneticists interested in DNA markers (DNA polymorphisms)? Their interest can be justified on any number of grounds. In this section we consider the reasons most often cited. 7.7.1 Genetic Markers, Genetic Mapping, and “Disease Genes” Perhaps the key goal in studying DNA polymorphisms in human genetics is to identify the chromosomal location of mutant genes associated with hereditary diseases. In the context of disorders caused by the interaction of multiple genetic and environmental factors, such as heart disease, cancer, diabetes, depression, and so forth, it is important to think of a harmful allele as a risk factor for the disease, which increases the probability of occurrence of the disease, rather than as a sole causative agent. This needs to be emphasized, especially because genetic risk factors are often called disease genes. For example, the major “disease gene” for breast cancer in women is the gene BRCA1. For women who carry a mutant allele of 12 BRCA1, the lifetime risk of breast cancer is about 36 percent, and hence most women with this genetic risk factor do not develop breast cancer. On the other hand, among women who are not carriers, the lifetime risk of breast cancer is about 12 percent, and hence many women without the genetic risk factor do develop breast cancer. Indeed, BRCA1 mutations are found in only 16 percent of affected women who have a family history of breast cancer. The importance of a genetic risk factor can be expressed quantitatively as the relative risk, which equals the risk of the disease in persons who carry the risk factor as compared to the risk in persons who do not. The relative risk for BRCA1 equals 3.0 (calculated as 36 percent/12 percent). The utility of DNA polymorphisms in locating and identifying disease genes results from genetic linkage, the tendency for genes that are sufficiently close together in a chromosome to be inherited together. The key concepts of genetic linkage are summarized in Figure 7.8, which shows the location of many DNA polymorphisms along a chromosome that also carries a genetic risk factor denoted D (for disease gene). Figure 7.8: Concepts in genetic localization of genetic risk factors for disease. Polymorphic DNA markers (indicated by the vertical lines) that are close to a genetic risk factor (D) in the chromosome tend to be inherited together with the disease itself. The genomic location of the risk factor is determined by examining the known genomic locations of the DNA polymorphisms that are linked with it. Each DNA polymorphism serves as a genetic marker for its own location in the chromosome. The importance of genetic linkage is that DNA markers that are sufficiently close to the disease gene will tend to be inherited together with the disease gene in pedigrees—and the closer the markers, the stronger this association. Hence, the initial approach to the identification of a disease gene is to find DNA markers that are genetically linked with the disease gene in order to identify its chromosomal location, a procedure known as genetic mapping. 13 Once the chromosomal position is known, other methods can be used to pinpoint the disease gene itself and to study its functions. If genetic linkage seems a roundabout way to identify disease genes, consider the alternative. The human genome contains approximately 80,000 genes. If genetic linkage did not exist, then we would have to examine 80,000 DNA polymorphisms, one in each gene, in order to identify a disease gene. But the human genome has only 23 pairs of chromosomes, and because of genetic linkage and the power of genetic mapping, it actually requires only a few hundred DNA polymorphisms to identify the chromosome and approximate location of a genetic risk factor. 7.7.2 Other Uses for DNA Markers DNA polymorphisms are widely used in all aspects of modern genetics because they provide a large number of easily accessed genetic markers for genetic mapping and other purposes. Among the other uses of DNA polymorphisms are the following. 7.7.2.1 Individual Identification We have already mentioned that DNA polymorphisms have application as a means of DNA typing (DNA fingerprinting) to identify different individuals in a population. DNA typing in other organisms is used to determine individual animals in endangered species and to identify the degree of genetic relatedness among individual organisms that live in packs or herds. 7.7.2.2 Epidemiology and Food Safety Science DNA typing also has important applications in tracking the spread of viral and bacterial epidemic diseases, as well as in identifying the source of contamination in contaminated foods. 7.7.2.3 Human Population History DNA polymorphisms are widely used in anthropology to reconstruct the evolutionary origin, global expansion, and diversification of the human population. 7.7.2.4 Improvement of Domesticated Plants and Animals Plant and animal breeders have turned to DNA polymorphisms as genetic markers in pedigree studies to identify, by genetic mapping, genes that are associated with favorable traits in order to incorporate these genes into currently used varieties of plants and breeds of animals. 14 7.7.2.5 History of Domestication Plant and animal breeders also study genetic polymorphisms to identify the wild ancestors of cultivated plants and domesticated animals, as well as to infer the practices of artificial selection that led to genetic changes in these species during domestication. 7.7.2.6 DNA Polymorphisms as Ecological Indicators DNA polymorphisms are being evaluated as biological indicators of genetic diversity in key indicator species present in biological communities exposed to chemical, biological, or physical stress. They are also used to monitor genetic diversity in endangered species and species bred in captivity. 7.7.2.7 Evolutionary Genetics DNA polymorphisms are studied in an effort to describe the patterns in which different types of genetic variation occur throughout the genome, to infer the evolutionary mechanisms by which genetic variation is maintained, and to illuminate the processes by which genetic polymorphisms within map-species become transformed into genetic differences between species. 7.7.2.8 Population Studies Population ecologists employ DNA polymorphisms to assess the level of genetic variation in diverse populations of organisms that differ in genetic organization (prokaryotes, eukaryotes, organelles), population size, breeding structure, or lifehistory characters, and they use genetic polymorphisms within subpopulations of a species as indicators of population history, patterns of migration, and so forth. 7.7.2.9 Evolutionary Relationships among Species Differences in homologous DNA sequences between species is the basis of molecular systematics, in which the sequences are analyzed to determine the ancestral history (phylogeny) of the species and to trace the origin of morphological, behavioral, and other types of adaptations that have arisen in the course of evolution. Reference 1. Genetics: Analysis of Genes and Genomes, Sixth Edition. 2005. Daniel L Hartl and Elizabeth W Jones. Jones & Bartlett Publishers Inc., Boston. 15