Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Human genetic variation wikipedia , lookup
Site-specific recombinase technology wikipedia , lookup
Polymorphism (biology) wikipedia , lookup
Cre-Lox recombination wikipedia , lookup
Human leukocyte antigen wikipedia , lookup
Quantitative trait locus wikipedia , lookup
Dominance (genetics) wikipedia , lookup
Genetic drift wikipedia , lookup
Microevolution wikipedia , lookup
MULTIPLE LOCI Instructor: Dr. Martha B Reiskind AEC 550: Conservation Genetics Spring 2017 Notes for Lecture 6 Multiple Loci Important: If you read the lecture notes before lecture it will help you get a deeper understanding during lecture!!!!! This is a bit complicated so it will likely help a lot to read this first. Follow-up from last couple of lectures. I can only guess that there are some questions about what FST really measures. We talk about it in a variety of ways and you’ll often see many equations to the right of FST = ? so maybe a little time clarifying what it measures would be helpful! The main way we have talked about this measure is in the context of inbreeding. Recall that we are interested in how much population subdivision has caused allele frequencies to change among the subpopulations. We want to measure within subpopulations and compare it to the total expected if there was no subdivision. As you calculate FST you’ll note that you are measuring 1 minus the average expected heterozygosity over the subpopulations and dividing it by that expected without subdivision. This is a summary statistics that is only relevant in this context. You are not calculating it for one subpopulation at a time. Another question that often comes up is the different evolutionary forces working together (drift, gene flow, and natural selection). I think the best way to think about this conceptually is to follow along with the simulations that we talked about in class and that are also covered in your book. Here there are some very important points: (1) gene flow homogenizes divided subpopulations and therefore counteracts divergent directional selection or the random change in frequencies of alleles among subpopulations, (2) drift is stronger in small populations than in larger populations and selection will be more effective in larger populations and/or if selection is strong, and (3) in the case of overdominance both gene flow and selection will minimize differences among subpopulations, make sure you understand why. I would look over the results of those simulations again and then we can talk about it more in class. Another question that comes up is how to select genes for analysis. This really depends on our question. As you can see from the paper we read on the green sea turtles, what marker you use may give you a very different story. First thing to consider, if we want to trace the evolutionary history, understand how populations are connected by gene flow we want to make sure we have genes that are neutral. That is to say they are not affected by adaptive evolution, natural selection. You can imagine that if selection is driving a certain allele to fixation in one population and the other allele in another population it would appear that these populations were highly diverged if we used this locus. This might give us a false impression that the two populations were more distantly related or were divided deeper in the past. 1 MULTIPLE LOCI Remember that both drift and gene flow affect all the loci, but selection only affects loci that are under selection or closely associated with loci that do. Here’s the rub, sometimes you think your loci are neutral, such as microsatellite loci, but they may not be. They may be closely linked to a locus that is not. That’s why it is important to have many markers, in case one marker is behaving differently than the others. To get a more precise measure of FST we really need to have as many random samples of the genome as possible. Using one locus does not give us sense of the background levels of structuring in the population. Think about the whale paper and note that a more accurate understanding of the pre-whaling effective population size was gained from multiple loci. This is why we talk about several loci being better than one. On the flip side, maybe you’re interested in the adaptive differences between two populations that may be in different environments in the range. Then you do want to those loci that behaving differently than the average loci in the population. Finally, students in the past have asked about the non-linear relationship between FST and Nm. First look at the graph and maybe even plug some numbers into the equation and you’ll see that the curve is not linear. This is evident if you dissect the equation as well. While this may seem counterintuitive, if you think about it conceptually, as you increase the number of migrants per generation, you are further homogenizing the population to a point; those last low frequency changes as things are becoming more equal among the subpopulations decelerate as you get closer to complete homogeneity (FST = 0). I will attach a paper that goes through a bit about this relationship as well. MULTIPLE LOCI So far we have looked at allele and genotype frequencies associated with one locus and 2 alleles. We have spent a little time talking about multiple alleles in microsatellite loci when we were looking at the effects of bottlenecks or some of the issues with measuring FST. Now we need to see how loci may interact within individuals and therefore within the population. Previous assumptions included all loci are transmitted independently of alleles at other loci and fitness of genotypes at one locus are not affected by fitness at other loci. Now we can look at how loci may be linked to each other physically or due to evolutionary forces and how this connection is or is not broken by recombination. Effectively we are addressing nonrandom association of alleles among loci! There are two terms used to name non-random associations of alleles among loci, gametic disequilibrium and linkage disequilibrium. Both terms refer to when genotypes are not independent. Linkage or gametic equilibrium means that the genotypes are independent. As stated above they do not need to be physically linked, that is why many researchers prefer to use the term gametic disequilibrium over linkage disequilibrium. Two genes on different chromosomes may be nonindependent and two loci physically close may be independent. To describe how these associations form and dissolve we need to describe the extent of non-random association and what it means biologically Here’s the basic model, there is a lot of 2 MULTIPLE LOCI math and manipulations but pay close attention to the equations that are bolded and two parameters r and D: Let’s walk through it. In the first step you have two parents that have two copies of the same alleles at both loci. For example parent on the left has AA and BB and parent on the right has aa and bb. In this case when these particular parents go through meiosis all their gametes will be identical and we can describe that gametic frequency in the gene pool as GAB and Gab. It follows that the parental genotype frequencies are (GAB)2 and (Gab)2. Parent on the left will have all AB gametes and parent on the right will have all ab gametes. When they form a zygote the new individual (soon to be parent) is heterozygous with the following genotype frequency equal to 2GAB Gab + 2GAb GaB. When this new parent produces gametes there are four options, two from a recombination event, and two that are not. The AB and ab gametes are produced without a recombination event and the gametes aB and Ab are from recombination during meiosis. We can calculate the allele frequencies of the alleles at these two loci in a population by using gamete frequencies and using the following logic. A allele frequency = pA = GAB + GAb a allele frequency = qa = GaB + Gab B allele frequency = pB = GAB + GaB b allele frequency = qb = Gab + GAb We could also find the frequency of the GAB gamete this way: GAB’ = PAB/AB + ½PAB/aB + ½PAB/Ab Here P is the probability of each of those genotypes Is there an association between these two loci? While the above diagram and the ones below show physical linkage remember we can still make these measurements when they are not linked but associated. With random mating eventually, given enough time gametic equilibrium will be achieved between two loci that are physically nearby, provided there are NO other forces! To help figure out recombination rate we want to mate double heterozygous parents and look at the gametes more closely. There are two ways a parent can be heterozygous, as above they can result from two parents that are homozygous for the opposite alleles or from parents that are heterozygous. For example: 3 MULTIPLE LOCI For each of these there are four possible gametes 2 that are produced by recombination and 2 that are not. That means 50% of the gamete are recombinants. The parameter r is the recombination rate, a measure of independence of the 2 loci. If two loci are on different chromosomes or at equilibrium, then r = 0.5. Therefore, r will range from 0 to 0.5, when r = 0 there is no recombination you would only get the parental types, if r = 0.5 the above pattern would be found. Now let’s focus on the gamete produced by the parent on the left. Here the frequency of recombining is r/2 and the frequency of not recombining is (1-r)/2. To help illustrate this we can use different numbers in place of r. First what happens when r is 0, that is when you have two loci that are completely linked. You’ll see that two gametes that resulted from recombination drop out, while the other two show ½ frequency. What about when there is complete recombination such that these two loci are segregating independently, then each of these gametes is at ¼ frequency. Now I’ll introduce a new parameter D. D is the linkage disequilibrium parameter for the population and is the difference between double heterozygotes. AND in the next generation: D = GABGab - GAbGaB OR as your book has it D = (G1G4) – (G2G3) D’ = (1-r)D 4 MULTIPLE LOCI In this case the linkage disequilibrium in generation t+1 is equal to the non recombination rate times linkage disequilibrium in generation t. If we want to know the gamete frequency with some rate of recombination then we can use all of the above to look at the change in the frequency of the gametes. First remember that GAB’ = PAB/AB + ½PAB/aB + ½PAB/Ab GAB’ = (GAB)2 + ½(2GABGAb) + ½(2GABGaB) + (1-r)GABGab + rGAbGaB You can resolve this equation down to: GAB’ = GAB –r(GABGab - GAbGaB) And if you’re super fancy, you’ll recall that D = GABGab - GAbGaB, therefore: GAB’ = GAB –rD What about the other gamete frequencies in the next generation: GAb’ = GAb + rD GaB’ = GaB + rD Gab’ = Gab – rD Now we can figure out how we get the D’ = (1-r)D, that is linkage disequilibrium in the next generation of the population. D = GABGab - GAbGaB For D’ we use the gamete frequencies of the double heterozygotes that we calculated for the next generation D’ = GAB’Gab’ - GAb’ GaB’ If we substitute from above then we have: D’ = (GAB –rD)( Gab – rD) – (GAb + rD)( GaB + rD) After some manipulation this will resolve to the D’ = (1-r)D Phew. Here’s the skinny: If you look at the basic D equation you’ll note that when GaBGAb is greater than GABGab D will be a negative number if GABGab is greater than GaBGAb then D will be a positive number. The sign, negative or positive, shows the direction of linkage. When D is equal to zero you’ve achieved linkage equilibrium or gametic equilibrium, the alleles at the two loci are segregating randomly. Therefore, the rate at which D decays to zero depends on the recombination rate. Remember from above that r ranges from 0 ≤ r ≤ 0.5, with 0.5 recombination rate indicating the two loci are in linkage equilibrium. 5 MULTIPLE LOCI Using this equation D’=(1-r)D if r = 0 then D’ = D there is no change generation to generation whereas if r = 0.5 D is halved each generation. The general equation for D over generations is this: Dt = (1-r)tD0 In this equation t = generations and D0 is the linkage disequilibrium of the population at time zero, or the founding population. Here too as r gets closer to zero there will be a low rate of recombination and a low rate of dissipation of linkage. This equation above is referred to as the geometric exponential rate of decay. Remember this could be two loci that are not on the same chromosome; I’ll talk more about this in class. After all that it turns out that D is a crappy measure of the relative amount of disequilibrium at different pairs of loci in the population because it is constrained by allele frequencies. Look back at how it is defined by allele frequencies. What happens when you have different allele frequencies for the same two loci in different populations? If you think about the gamete frequencies above D is determined by the frequency of the double heterozygotes, which is dependent on the frequencies of p1, p2, q1, q2 in the population. Alternatively we can use D’ which is D/Dmax. Dmax uses the allele frequencies to normalize D by using the smaller of pAqb or qapB. This means we can compare it to other locations. We will run through an example in class. Another measure is R2 = D2/(pAqapBqa) looks at the disequilibrium normalized by all possible combinations R2 = 0 to 1. The square root of R is the correlation coefficient in allelic state between alleles in the same gamete. Both D’ defined above and R2 are used and capture different aspects of associations. With R2 we gain information by looking at all allele frequencies that we may lose looking just at certain combinations of the max and min. On the other hand, D’ focuses on a measure of linkage disequilibrium and is mainly influenced by recombination. My hope is with this background you are ready to focus the relationships between linkage disequilibrium and evolutionary forces. 6