BIOLOGY 220W - Lecture Notes Packet - Claude dePamphilis

Chapter 1
The central importance of variation

Objective

The stunning degree of matching of DNA sequences of different species makes it very difficult to imagine how they could attain this similarity without actually being related to one another. Fossil evidence makes it clear that species exist today that did not exist in the past, and that there are species that were abundant in the past which are totally gone today. The challenge for Charles Darwin was to piece these observations together with a proposed mechanism whereby one species could give rise to another. His argument was fairly simple, and as we will see it hinges on the observation that all species exhibit variation. In this chapter we will see the critical role this variation plays, and how an absence of variation is extremely hazardous to a species. We will then spend a good bit of time learning how genetic variation is measured in populations, and how that variation is organized.

The simplest statement of how evolution works

Charles Darwin recognized the central role of variation in his theory of evolution by Natural Selection. He came to this conclusion after many long hours of observation of many different species. He noted that some birds were more adept or stronger fliers than others. Some plants had deeper roots than others. He reasoned that these differences must result in different chances of survival and reproduction. If the differences among organisms are in some way passed on from parents to offspring, then there is a means for the differences to accumulate over generations. In his book Origin of Species, published in 1859, Darwin presents a coherent theory for evolution that is based on three principles:

1. There is VARIATION in populations.
2. To some degree, the variation is passed on from parents to offspring (the variation is INHERITED).
3. Some variants are more successful at surviving and rearing offspring (NATURAL SELECTION discriminates among the variants).

These three features are sufficient to result in adaptive changes in populations of organisms over time. These lectures will document the changes that have been brought about by this process of evolution. For now, the point that needs emphasis is that in the absence of variation, the whole process comes grinding to a stop. There must be variation in order for natural selection to choose the fit from the less fit.

Variation at the level of the phenotype and genotype

It is crucial to distinguish between two levels of variation. Variation among the phenotypes in a population refers to variation among individuals in their appearance or morphology (like height, weight) or their physiology (like running speed, jumping height). Natural selection acts on this level of variation, because it is differences in phenotypes that result in differences in survival and reproduction. The other type of variation is variation at the level of the genotype. Of course, some genotypic differences result in clear phenotypic differences, but in general, variation in the genotype is what is passed along from parents to offspring, and the phenotypes then result from those transmitted genotypes and the effect of environmental variation on their development and expression. Natural selection acts on genotypes only indirectly, through the effect of the genes on the phenotype. If a gene (or a variant of a gene) has no effect on the phenotype at all, then selection does not act on that gene or variant and it is said to be neutral.

Discrete vs. continuous variation

It is also useful to distinguish between variation in discrete and continuous traits. Discrete traits are those that fall into easily separable and countable classes, the simplest being presence or absence of a morphological structure. Gregor Mendel chose seven discrete traits to study, including smooth vs.
wrinkled peas, and yellow vs. green peas. In the case of Mendel’s peas, the discrete phenotypes were associated with discrete, single-locus genotypes. If natural selection were to act on the trait smooth-vs-wrinkled peas, then the effect of the natural selection would be seen immediately on the frequencies of the alleles for the gene that determines smooth vs. wrinkled.

Not all discrete traits, however, are determined by a single gene. One might think of disease as a discrete trait: one can be either healthy or diseased. Many common diseases in humans seem to have some degree of genetic basis because the incidence of the disease tends to cluster in families. Arthritis is a good example. When we try to find the genes for susceptibility to arthritis, it immediately becomes clear that many genes are involved, and generally there are variables in the environment that also affect the trait.

Continuous traits are those that do not fall into separate categories, but which lie along a continuous axis of measurement. Examples include height, weight, hair color, and shoe size. There is a statistical tendency for the offspring of very tall parents to be tall, and for offspring of very short parents to be short. Even though this correlation is not perfect, it suggests that there are genes involved in the familial resemblance.

The relationship between the discreteness of underlying genes and the continuousness of many characters is analogous to recorded music. Music is inherently a continuous flow of analog sound, but engineers have figured out that an efficient way to store, manipulate, and play back sound is to chop it up into discrete time slices which are represented digitally. Changes in some of those bits would make an obvious audible change to the music, but changes in other bits would make no difference that you could hear.
Similarly, many genetic changes have a major effect on the phenotype, whereas others have no effect.

How do we characterize and quantify genetic variation in populations?

Now let’s get away from the hypothetical thinking about variation, and go into the lab to actually see how DNA sequence variation can be quantified. We want to be able to measure the amount of variation in the DNA among a sample of individuals. It turns out there are many ways to do this, and most of them depend on first performing a reaction on the DNA known as the polymerase chain reaction (PCR). This reaction provides a way to take total genomic DNA and to make millions of copies of just one tiny part of that DNA, a process also known as amplification. The reaction depends on the use of a DNA polymerase, which you should recall is the enzyme that catalyzes the assembly of deoxynucleoside triphosphates (dNTPs) into a new DNA strand. The DNA polymerase from most organisms is denatured permanently when it is heated to the temperature that melts apart the two DNA strands. However, the DNA polymerase from bacteria that live in hot springs, like Thermus aquaticus, can stand such high temperatures, so that is the enzyme we’ll use.

Polymerase Chain Reaction

The basic ingredients of the reaction include:

1. Sample DNA,
2. Two short (about 20 nucleotides long) DNA primers whose sequences match the ends of the target DNA to be amplified,
3. Temperature-stable DNA polymerase,
4. Buffer.

There are three steps to the reaction, and they are repeated many times by an instrument called a thermocycler, or a PCR machine. The first step is to denature the sample DNA at 94 degrees C (almost boiling!). This typically takes 30 seconds or so. The second step is to anneal the primers to the sample DNA by lowering the temperature to 55-60 degrees C. The third step is to allow DNA polymerase to synthesize new DNA strands by raising the temperature to 72 degrees C. Each round theoretically doubles the amount of DNA between the primer sequences.
If the cycle is repeated 30 times, we should end up with 2^30, or about 1 billion, copies of the fragment of DNA that lies between the two primers. In summary, the steps Denature, Anneal, Synthesis are repeated simply by cycling the temperature between 94, 55, and 72 degrees. This clever process won its inventor, Kary Mullis, the Nobel Prize.

Agarose gel electrophoresis separates DNA fragments of different lengths

After the PCR reaction is done, we want to know whether we in fact made many copies of one small piece of DNA. This is generally done by running an agarose gel in a procedure called gel electrophoresis. To make an agarose gel, the investigator measures out the needed volume of buffer solution and adds anywhere from 0.8% to 1.4% agarose by weight. Agarose does not dissolve readily at room temperature, and the easiest way to get it to dissolve is to heat the mixture in a microwave oven. Once the agarose is thoroughly melted and mixed, the solution is poured into a gel mold. A half-hour or so later, once it has set, the gel is placed into a buffer chamber, where it is completely immersed in buffer. The gel has small slots in it, and we place some of the PCR reaction solution into these slots with a pipettor. An electric current is then run through the gel, and the negative charge of DNA makes it move toward the positive electrode. Smaller DNA fragments can work their way through the agarose gel matrix faster than can larger fragments, and after an hour or so we can see the location of the DNA fragments by staining the gel with ethidium bromide and looking under an ultraviolet light. The size of the DNA fragments (measured in base pairs) is estimated by also running a length standard on the same gel (in another lane). The length standard, also called a DNA “ladder”, will make a series of bands of known lengths.

How to score DNA polymorphism

After we run the PCR reaction, there are two possibilities. Either the fragments are of different sizes, or they are all the same size.
If they are different sizes, then we can detect the difference in size by separating the fragments on an electrophoretic gel. An example of a kind of polymorphism where there are many differences in length is called a microsatellite, also called a Short Tandem Repeat Polymorphism (STRP). Microsatellites are runs of simple repeats, like CACACACACACACA, and it happens that such runs have a high error rate when DNA polymerase copies them. This results in a high mutation rate, and the end result is that populations tend to be highly variable for these runs. Microsatellites are of enormous utility in human genetics, and over 5500 have been identified and mapped onto the human chromosomes.

If the DNA fragments are all the same size after the PCR reaction, then we need to do some more work. When the fragments are all the same size, they may still be different in sequence, so any method that can detect sequence differences should do the trick.

One possibility is to sequence the fragments. This is somewhat expensive, but the time, cost, and effort involved in sequencing are rapidly decreasing, so this method is becoming more acceptable. I will show an example of DNA sequencing and sequence data in class.

Another possibility is to run the fragments out on an electrophoretic gel that lets the DNA molecules fold up into their native conformation. This is generally done after making the DNA single stranded by heating it, so that single DNA strands fold up in a way that differs if the sequence differs. This method is called Single Strand Conformation Polymorphism (or SSCP).

A third approach is to cut the DNA with a restriction endonuclease, an enzyme that cuts DNA in a sequence-specific manner. If there is variation in the DNA sequence at the enzyme's recognition site, then some DNA molecules will be cut and others will not. The result is fragments of different sizes, which can then be identified by electrophoresis.
Variation at single positions in the DNA sequence is referred to as Single Nucleotide Polymorphisms (SNPs), and this is the kind of variation that has the human genome investigators very excited. SNPs may be very useful for mapping genes that affect the risk of diseases like cancer and heart disease, so they have important medical application. Let’s look at an example of how restriction endonucleases work.

Restriction endonucleases cut DNA at specific target sites

When DNA in solution is treated with the restriction endonuclease called EcoRI, it finds all locations of its particular recognition site GAATTC, and cuts the DNA:

    5’ - gatgctacgGAATTCcatgca - 3’
    3’ - ctacgatgcCTTAAGgtacgt - 5’

              ⇓ EcoRI digestion

    5’ - gatgctacgG       AATTCcatgca - 3’
    3’ - ctacgatgcCTTAA       Ggtacgt - 5’

EcoRI cuts each strand between the G and the first A of the site, which leaves short single-stranded AATT overhangs on the two fragments.

There are hundreds of different restriction endonucleases, and each cuts DNA at its own specific recognition site. The original experiments on cloning of DNA depended heavily on restriction enzymes to cut DNA in prescribed ways, and they are still extremely useful in molecular genetics. Daniel Nathans was awarded the Nobel Prize for his elucidation of restriction endonucleases. The amazing activity of restriction endonucleases may seem puzzling at first, until you realize that they are a defense mechanism that bacteria use to recognize and destroy foreign DNA, such as that of an invading bacteriophage. Provided the bacterium’s own genome does not contain the recognition site (or its own sites are protected by methylation), it is safe, but the enzyme will cut any invader’s DNA that has the recognition site. Recognition sites are typically 4 or 6 nucleotides in length.

Mutations are the original source of genetic variation

Although organisms are generally very good at replicating their DNA, and fixing most of the mistakes (mutations) they make along the way, mutations that are not repaired are the ultimate source of genetic variation. Without mutation, there would be no genetic variation, and without genetic variation, no evolution.
Mutations can alter the base sequence (A, C, T, or G) or may involve structural 'mistakes' such as a DNA insertion, deletion, inversion, or duplication. Mutations (at least to a good first approximation) occur at random, meaning that mutations at one location are independent of mutations at another location. Mutations occur without regard to whether they are beneficial or deleterious to the organism. All of these different types of mutations are the "stuff" of evolution and provide genetic variation for natural selection to work on.

Example of a survey of genetic variation

Suppose we extract DNA from 100 classmates. There are kits to do this, and it is probably harder work to get the blood than it is to extract the DNA! Samples of around 10 ng of the DNA are added to small reaction tubes along with the other needed ingredients for PCR. We use a pair of oligonucleotide primers to amplify a part of a chromosome that includes a microsatellite repeat, like CACACACACACA. After the PCR reaction, we take about 1/4 of the PCR product and load it onto a gel and perform electrophoresis to separate fragments based on their size.

Suppose there are only two fragment sizes, S and L. Each individual will have either a band at the S location, or a band at the L location, or they will have both the S and the L bands. In this case, we assume that the people with only the S band are actually homozygotes, SS, the people with only the L band are homozygous LL, and the people with both bands are heterozygous, SL. We count up the three types and see 16 SS, 48 SL, and 36 LL.

The next thing we want to know is the allele frequencies for the S and L alleles. First we count up the total number of S alleles in our sample. We do this by noting that the S homozygotes have 2 copies each, so these 16 individuals have a total of 32 copies of the S allele. Adding the 48 S alleles that the heterozygotes have, we get (16 x 2) + 48 = 80 copies of the S allele.
Similarly, for the count of the L alleles we get 48 + (36 x 2) = 120. In our sample of 100 people there are 200 gene copies, because everybody gets an allele from their mother and one from their father (except for the case of X chromosomes in males). To calculate the allele frequency, we divide the allele counts by the total of all alleles in the sample. Thus, the frequency of the S allele is 80/200 = 0.40, and the frequency of the L allele is 120/200 = 0.60. Note that 0.4 + 0.6 = 1, that is, the sum of the frequencies of all alleles is 1. It turns out that this simple process of calculating allele frequencies is of enormous importance in using DNA evidence in rape and murder cases. Let’s see why.

Forensic uses of DNA variation

Variation is not only important in evolution; it is also important in solving many violent crimes. The basic idea behind forensic DNA methods is that an individual’s DNA is as unique to that individual as a fingerprint, and matches of samples from a crime scene to the DNA of a suspect can be very incriminating. In practice, the way the method works is as follows. First, samples of blood or other fluids or tissues are taken from the crime scene and DNA is extracted. Then DNA is extracted from a blood sample of the suspect. These DNA samples are subjected to PCR to amplify a piece of the genome that is highly variable. One kind of variable marker frequently used for this purpose is the STRP (microsatellite). Repeats like this frequently produce errors when the DNA is replicated, so that the number of copies increases or decreases. The heterozygosity of regions of DNA like this can be as high as 95%. After the PCR is done, the resulting DNA fragments are separated by gel electrophoresis, as we described above.

If the electrophoretic gel patterns do not match, the suspect is innocent, or at least the suspect’s DNA does not match the DNA from the crime scene. But what if the banding patterns on the gel match? Does this guarantee that the suspect is guilty?
Does it even guarantee that the suspect was at the crime scene? Does it even guarantee that the DNA from the crime scene must have come from the suspect? The answer to all three questions is NO, but it might be possible to say that it is extremely likely that the DNA came from the suspect. In order to determine this, we need to calculate the probability that a random person’s DNA would match by chance. To do this, we need the Hardy-Weinberg principle!

The Hardy-Weinberg Principle

Independently in 1908, Godfrey Hardy and Wilhelm Weinberg determined that the expected frequencies of genotypes should settle to stable values in just one generation. This was of great importance for evolutionary biology, because it meant that genetic variation is not rapidly lost in a population by a simple blending process (as assumed in Darwin’s time), but rather that the discrete allelic types will continue to segregate in the population.

The assumptions that Hardy and Weinberg made were:

    Infinite population size
    No mutation
    No selection
    Mendelian segregation
    Random mating among the genotypes

The last assumption is, in this case, equivalent to assuming that the gametes come together at random. If we consider alleles A and a, whose frequencies are p and q (where p + q = 1), then the Hardy-Weinberg principle states that the offspring generation will have genotypes AA, Aa, and aa in frequencies p^2, 2pq, and q^2, and that these relative genotype frequencies will also be found in all subsequent generations.

To see where the Hardy-Weinberg principle comes from, consider a Punnett square of male and female gametes. Male gametes of type A have frequency p, and female gametes of type A also have frequency p. The only way to get an AA offspring is to have a union of an A gamete from the father and an A gamete from the mother. This happens with chance p x p = p^2. Similar reasoning is used for the other genotypes: aa arises with chance q x q = q^2, and Aa can arise in two ways (A from the father and a from the mother, or the reverse), giving 2pq. Although our discussion has focused on two alleles at a locus, many loci have more than two alleles.
The Hardy-Weinberg principle is easily extended to more alleles; e.g., expanding (p + q + r)^2 gives the genotype frequencies for 3 alleles under Hardy-Weinberg.

A convenient way to understand the Hardy-Weinberg principle is to plot a graph showing the frequencies of the genotypes as a function of the allele frequencies. It turns out that the frequencies of all 3 genotypes are parts of parabolas. This is easiest to see in the plot of the heterozygote frequency. Note that the frequency of the heterozygotes is maximal when p = q = 1/2. Also note that there is a point where there are equally many homozygotes AA as there are heterozygotes. This happens when p^2 = 2pq = 2p(1 - p) = 2p - 2p^2, or 3p^2 = 2p. Dividing by p we get 3p = 2, or p = 2/3.

Returning to our crime scene, we want to know the chance that a random individual drawn from the population has the same genotype as does the blood from the crime scene. To determine this, we must first calculate the population frequency of each allele. For the kind of markers used in forensics, there are almost always more than 2 alleles. Suppose the frequency of allele A_i is p_i, and the frequency of allele A_j is p_j. Then the Hardy-Weinberg principle tells us that the expected frequency of the heterozygous genotype A_iA_j is 2 p_i p_j. For example, suppose the accused has genotype A_10A_32, and the blood from the crime scene is also A_10A_32. The frequency of the A_10 allele in the population is 0.02, and the allele frequency of A_32 is 0.01. The expected frequency of the genotype A_10A_32 in the general population is 2 x 0.02 x 0.01 = 0.0004. This number, 0.0004, is the probability that a random person drawn from the population has the genotype A_10A_32 (assuming that the population is in Hardy-Weinberg equilibrium). The critical question is: Is this a low enough chance to convict the suspect? Typically several genes are examined at once, so that the probability of a random match can be vanishingly low.
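
The match-probability arithmetic above can be sketched in a few lines of Python. This is only an illustration, not a forensic procedure: the function names are made up for this example, the frequencies for the second and third loci are hypothetical values chosen for illustration, and multiplying across loci assumes that the loci are inherited independently and that the population is in Hardy-Weinberg equilibrium.

```python
from math import prod


def heterozygote_frequency(p_i, p_j):
    """Expected Hardy-Weinberg frequency of genotype A_iA_j (i != j): 2 * p_i * p_j."""
    return 2 * p_i * p_j


def homozygote_frequency(p_i):
    """Expected Hardy-Weinberg frequency of genotype A_iA_i: p_i squared."""
    return p_i ** 2


def random_match_probability(genotype_frequencies):
    """Chance that a random person matches at every locus.

    Assumes the loci are independent, so the per-locus genotype
    frequencies can simply be multiplied together.
    """
    return prod(genotype_frequencies)


# Locus from the text: alleles A_10 (0.02) and A_32 (0.01).
locus_1 = heterozygote_frequency(0.02, 0.01)

# Two further loci with HYPOTHETICAL allele frequencies, for illustration only.
locus_2 = heterozygote_frequency(0.05, 0.10)
locus_3 = homozygote_frequency(0.10)

combined = random_match_probability([locus_1, locus_2, locus_3])

print(f"single locus: {locus_1:.4f}")
print(f"three loci:   {combined:.2e}")
```

With these made-up numbers, adding just two more loci drops the chance of a random match from 4 in 10,000 to a few in 100 million, which is why several markers are typed together.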
DNA evidence is widely accepted in courts today, but until recently the issues of how one calculates population frequency, and what one does with the match probability, were a source of heated legal debate.

So how much DNA sequence variation is there?

Returning to the issue of evolution by natural selection (we did stray just a bit), the methods of PCR, restriction enzymes, and DNA sequencing have been used to determine the amount of DNA sequence variation in a wide variety of organisms. This variation is the fundamental material upon which natural selection works, so it would seem to be very important for the long-term evolutionary prospects for species. It turns out that the amount of variation in populations varies enormously. DNA sequence variation is quantified by a term called the nucleotide diversity, calculated as the chance that two copies of a gene will differ at a site (or you can think of it as the fraction of sites that differ). The fruit fly, Drosophila melanogaster, has an average nucleotide diversity of 0.005, meaning that about 1 base in every 200 differs from individual to individual. Humans have an average nucleotide diversity of 0.001, or a difference in about 1 base in every 1000. In both humans and flies, enough different genes have been studied to see that the level of nucleotide diversity actually varies by over ten-fold from one gene to another. Reasons for this relate to mutation rates, levels of selective constraint, and rate of recombination within the genes. The cheetah has a nucleotide diversity of less than 0.0001; the value is so low because this species underwent a severe bottleneck, reaching very low numbers. In contrast, the bacterium Escherichia coli has an astronomically large population size, and its average nucleotide diversity is about 0.03.
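
To make the definition of nucleotide diversity concrete, here is a minimal Python sketch that computes it as the average fraction of sites that differ between every pair of sequences in a sample. The function name and the four short sequences are invented for illustration; real data would be much longer, would need to be aligned first, and is usually analyzed with dedicated population-genetics software.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average fraction of sites that differ between all pairs of sequences.

    Assumes the sequences are already aligned and of equal length.
    """
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")

    pairs = list(combinations(sequences, 2))
    # Count mismatched sites for each pair, then sum over all pairs.
    total_differences = sum(
        sum(a != b for a, b in zip(seq_1, seq_2)) for seq_1, seq_2 in pairs
    )
    return total_differences / (len(pairs) * length)


# Four made-up copies of a 10-base region (illustrative only).
sample = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTTCGTAC",  # differs from the first copy at one site
    "ACGTACGTAA",  # differs from the first copy at a different site
]

print(nucleotide_diversity(sample))  # 6 differences over 6 pairs x 10 sites = 0.1
```

A value of 0.1 is far higher than anything in the text; it is that large only because the toy sequences are so short.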
Many things influence the amount of DNA sequence variation, including the rate of mutation, the population size, and the past effects of natural selection. We will be spending time in the next few lectures learning more about the inferences we can make about how evolution works by determining the role that these forces play in the maintenance of genetic variation.

Summary

1. Heritable variation is central to Darwin’s theory of evolution.
2. Variation occurs at the phenotypic and genotypic levels. Natural selection occurs at the phenotypic level, changing the underlying gene frequencies.
3. DNA sequence variation ultimately originates with mutation, which can cause base substitutions, or insertions, deletions, inversions, or duplications of DNA sequences. Many methods exist for scoring DNA sequence variation directly. The Polymerase Chain Reaction has proven remarkably versatile as a method for making many copies of a small segment of a genome. It is useful for scoring restriction site variation and for scoring short tandem repeat polymorphisms.
4. Genetic variation is quantified by allele frequencies.
5. The Hardy-Weinberg principle states that a population with allele frequencies p and q will have stable genotype frequencies p^2 of AA, 2pq of Aa, and q^2 of aa if there is random mating (and other assumptions are met). This principle can easily be extended to multiple alleles at a locus.
6. The Hardy-Weinberg principle is used in forensic DNA analysis to calculate the probability that a random individual’s DNA would match the forensic sample.
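
As a recap of summary points 4 and 5, the classmate survey from this chapter (16 SS, 48 SL, 36 LL) can be reworked in a short Python sketch. The function names are invented for this example; the code simply counts alleles as described in the text and then computes the genotype counts expected under Hardy-Weinberg for comparison.

```python
def allele_frequencies(n_ss, n_sl, n_ll):
    """Return (freq_S, freq_L) from counts of SS, SL, and LL individuals."""
    total_alleles = 2 * (n_ss + n_sl + n_ll)  # each person carries two gene copies
    count_s = 2 * n_ss + n_sl                 # homozygotes contribute 2, heterozygotes 1
    count_l = n_sl + 2 * n_ll
    return count_s / total_alleles, count_l / total_alleles


def hardy_weinberg_expected(freq_s, freq_l, n_individuals):
    """Expected numbers of SS, SL, and LL individuals under Hardy-Weinberg."""
    return (
        freq_s ** 2 * n_individuals,
        2 * freq_s * freq_l * n_individuals,
        freq_l ** 2 * n_individuals,
    )


# Genotype counts from the survey of 100 classmates in the text.
freq_s, freq_l = allele_frequencies(16, 48, 36)
expected = hardy_weinberg_expected(freq_s, freq_l, 100)

print(f"S = {freq_s:.2f}, L = {freq_l:.2f}")
print("expected SS, SL, LL:", [round(x, 1) for x in expected])
```

For this sample the expected counts come out to 16, 48, and 36, the same as the observed counts, so the classroom example happens to sit exactly at Hardy-Weinberg proportions.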