Download multiple loci - Burford Reiskind Lab

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Human genetic variation wikipedia , lookup

Site-specific recombinase technology wikipedia , lookup

Tag SNP wikipedia , lookup

Polymorphism (biology) wikipedia , lookup

Cre-Lox recombination wikipedia , lookup

Human leukocyte antigen wikipedia , lookup

Quantitative trait locus wikipedia , lookup

Dominance (genetics) wikipedia , lookup

Genetic drift wikipedia , lookup

Microevolution wikipedia , lookup

Population genetics wikipedia , lookup

Hardy–Weinberg principle wikipedia , lookup

Transcript
MULTIPLE LOCI
Instructor: Dr. Martha B Reiskind
AEC 550: Conservation Genetics
Spring 2017
Notes for Lecture 6 Multiple Loci
Important: If you read the lecture notes before lecture it will help you get a
deeper understanding during lecture!!!!! This is a bit complicated so it will
likely help a lot to read this first.
Follow-up from last couple of lectures. I can only guess that there are some
questions about what FST really measures. We talk about it in a variety of ways and
you’ll often see many equations to the right of FST = ? so maybe a little time clarifying
what it measures would be helpful! The main way we have talked about this
measure is in the context of inbreeding. Recall that we are interested in how much
population subdivision has caused allele frequencies to change among the
subpopulations. We want to measure within subpopulations and compare it to the
total expected if there was no subdivision. As you calculate FST you’ll note that you
are measuring 1 minus the average expected heterozygosity over the
subpopulations and dividing it by that expected without subdivision. This is a
summary statistics that is only relevant in this context. You are not calculating it for
one subpopulation at a time.
Another question that often comes up is the different evolutionary forces working
together (drift, gene flow, and natural selection). I think the best way to think about
this conceptually is to follow along with the simulations that we talked about in
class and that are also covered in your book. Here there are some very important
points: (1) gene flow homogenizes divided subpopulations and therefore
counteracts divergent directional selection or the random change in frequencies of
alleles among subpopulations, (2) drift is stronger in small populations than in
larger populations and selection will be more effective in larger populations and/or
if selection is strong, and (3) in the case of overdominance both gene flow and
selection will minimize differences among subpopulations, make sure you
understand why. I would look over the results of those simulations again and then
we can talk about it more in class.
Another question that comes up is how to select genes for analysis. This really
depends on our question. As you can see from the paper we read on the green sea
turtles, what marker you use may give you a very different story. First thing to
consider, if we want to trace the evolutionary history, understand how populations
are connected by gene flow we want to make sure we have genes that are neutral.
That is to say they are not affected by adaptive evolution, natural selection. You can
imagine that if selection is driving a certain allele to fixation in one population and
the other allele in another population it would appear that these populations were
highly diverged if we used this locus. This might give us a false impression that the
two populations were more distantly related or were divided deeper in the past.
1
MULTIPLE LOCI
Remember that both drift and gene flow affect all the loci, but selection only affects
loci that are under selection or closely associated with loci that do. Here’s the rub,
sometimes you think your loci are neutral, such as microsatellite loci, but they may
not be. They may be closely linked to a locus that is not. That’s why it is important
to have many markers, in case one marker is behaving differently than the others.
To get a more precise measure of FST we really need to have as many random
samples of the genome as possible. Using one locus does not give us sense of the
background levels of structuring in the population. Think about the whale paper and
note that a more accurate understanding of the pre-whaling effective population
size was gained from multiple loci. This is why we talk about several loci being
better than one. On the flip side, maybe you’re interested in the adaptive differences
between two populations that may be in different environments in the range. Then
you do want to those loci that behaving differently than the average loci in the
population.
Finally, students in the past have asked about the non-linear relationship between
FST and Nm. First look at the graph and maybe even plug some numbers into the
equation and you’ll see that the curve is not linear. This is evident if you dissect the
equation as well. While this may seem counterintuitive, if you think about it
conceptually, as you increase the number of migrants per generation, you are
further homogenizing the population to a point; those last low frequency changes as
things are becoming more equal among the subpopulations decelerate as you get
closer to complete homogeneity (FST = 0). I will attach a paper that goes through a
bit about this relationship as well.
MULTIPLE LOCI
So far we have looked at allele and genotype frequencies associated with one locus
and 2 alleles. We have spent a little time talking about multiple alleles in
microsatellite loci when we were looking at the effects of bottlenecks or some of the
issues with measuring FST. Now we need to see how loci may interact within
individuals and therefore within the population. Previous assumptions included all
loci are transmitted independently of alleles at other loci and fitness of genotypes at
one locus are not affected by fitness at other loci. Now we can look at how loci may
be linked to each other physically or due to evolutionary forces and how this
connection is or is not broken by recombination. Effectively we are addressing nonrandom association of alleles among loci!
There are two terms used to name non-random associations of alleles among loci,
gametic disequilibrium and linkage disequilibrium. Both terms refer to when
genotypes are not independent. Linkage or gametic equilibrium means that the
genotypes are independent. As stated above they do not need to be physically
linked, that is why many researchers prefer to use the term gametic disequilibrium
over linkage disequilibrium. Two genes on different chromosomes may be nonindependent and two loci physically close may be independent. To describe how
these associations form and dissolve we need to describe the extent of non-random
association and what it means biologically Here’s the basic model, there is a lot of
2
MULTIPLE LOCI
math and manipulations but pay close attention to the equations that are bolded
and two parameters r and D:
Let’s walk through it. In the first step you have two parents that have two copies of the same
alleles at both loci. For example parent on the left has AA and BB and parent on the right
has aa and bb. In this case when these particular parents go through meiosis all their
gametes will be identical and we can describe that gametic frequency in the gene pool as GAB
and Gab. It follows that the parental genotype frequencies are (GAB)2 and (Gab)2. Parent
on the left will have all AB gametes and parent on the right will have all ab gametes. When
they form a zygote the new individual (soon to be parent) is heterozygous with the
following genotype frequency equal to 2GAB Gab + 2GAb GaB. When this new parent
produces gametes there are four options, two from a recombination event, and two that are
not. The AB and ab gametes are produced without a recombination event and the gametes
aB and Ab are from recombination during meiosis. We can calculate the allele frequencies of
the alleles at these two loci in a population by using gamete frequencies and using the
following logic.
A allele frequency = pA = GAB + GAb
a allele frequency = qa = GaB + Gab
B allele frequency = pB = GAB + GaB
b allele frequency = qb = Gab + GAb
We could also find the frequency of the GAB gamete this way:
GAB’ = PAB/AB + ½PAB/aB + ½PAB/Ab
Here P is the probability of each of those genotypes
Is there an association between these two loci? While the above diagram and the ones
below show physical linkage remember we can still make these measurements when they
are not linked but associated. With random mating eventually, given enough time gametic
equilibrium will be achieved between two loci that are physically nearby, provided there
are NO other forces! To help figure out recombination rate we want to mate double
heterozygous parents and look at the gametes more closely. There are two ways a parent
can be heterozygous, as above they can result from two parents that are homozygous for
the opposite alleles or from parents that are heterozygous. For example:
3
MULTIPLE LOCI
For each of these there are four possible gametes 2 that are produced by recombination
and 2 that are not. That means 50% of the gamete are recombinants. The parameter r is the
recombination rate, a measure of independence of the 2 loci. If two loci are on different
chromosomes or at equilibrium, then r = 0.5. Therefore, r will range from 0 to 0.5, when r =
0 there is no recombination you would only get the parental types, if r = 0.5 the above
pattern would be found. Now let’s focus on the gamete produced by the parent on the left.
Here the frequency of recombining is r/2 and the frequency of not recombining is (1-r)/2.
To help illustrate this we can use different numbers in place of r. First what happens when
r is 0, that is when you have two loci that are completely linked. You’ll see that two gametes
that resulted from recombination drop out, while the other two show ½ frequency. What
about when there is complete recombination such that these two loci are segregating
independently, then each of these gametes is at ¼ frequency. Now I’ll introduce a new
parameter D. D is the linkage disequilibrium parameter for the population and is the
difference between double heterozygotes.
AND in the next generation:
D = GABGab - GAbGaB
OR as your book has it
D = (G1G4) – (G2G3)
D’ = (1-r)D
4
MULTIPLE LOCI
In this case the linkage disequilibrium in generation t+1 is equal to the non recombination
rate times linkage disequilibrium in generation t.
If we want to know the gamete frequency with some rate of recombination then we can use
all of the above to look at the change in the frequency of the gametes.
First remember that GAB’ = PAB/AB + ½PAB/aB + ½PAB/Ab
GAB’ = (GAB)2 + ½(2GABGAb) + ½(2GABGaB) + (1-r)GABGab + rGAbGaB
You can resolve this equation down to:
GAB’ = GAB –r(GABGab - GAbGaB)
And if you’re super fancy, you’ll recall that D = GABGab - GAbGaB, therefore:
GAB’ = GAB –rD
What about the other gamete frequencies in the next generation:
GAb’ = GAb + rD
GaB’ = GaB + rD
Gab’ = Gab – rD
Now we can figure out how we get the D’ = (1-r)D, that is linkage disequilibrium in the next
generation of the population.
D = GABGab - GAbGaB
For D’ we use the gamete frequencies of the double heterozygotes that we calculated for the
next generation
D’ = GAB’Gab’ - GAb’ GaB’
If we substitute from above then we have:
D’ = (GAB –rD)( Gab – rD) – (GAb + rD)( GaB + rD)
After some manipulation this will resolve to the D’ = (1-r)D
Phew.
Here’s the skinny:
If you look at the basic D equation you’ll note that when GaBGAb is greater than GABGab D will
be a negative number if GABGab is greater than GaBGAb then D will be a positive number. The
sign, negative or positive, shows the direction of linkage. When D is equal to zero you’ve
achieved linkage equilibrium or gametic equilibrium, the alleles at the two loci are
segregating randomly. Therefore, the rate at which D decays to zero depends on the
recombination rate. Remember from above that r ranges from 0 ≤ r ≤ 0.5, with 0.5
recombination rate indicating the two loci are in linkage equilibrium.
5
MULTIPLE LOCI
Using this equation D’=(1-r)D if r = 0 then D’ = D there is no change generation to
generation whereas if r = 0.5 D is halved each generation. The general equation for D over
generations is this:
Dt = (1-r)tD0
In this equation t = generations and D0 is the linkage disequilibrium of the population at
time zero, or the founding population. Here too as r gets closer to zero there will be a low
rate of recombination and a low rate of dissipation of linkage. This equation above is
referred to as the geometric exponential rate of decay. Remember this could be two loci
that are not on the same chromosome; I’ll talk more about this in class.
After all that it turns out that D is a crappy measure of the relative amount of disequilibrium
at different pairs of loci in the population because it is constrained by allele frequencies.
Look back at how it is defined by allele frequencies. What happens when you have different
allele frequencies for the same two loci in different populations? If you think about the
gamete frequencies above D is determined by the frequency of the double heterozygotes,
which is dependent on the frequencies of p1, p2, q1, q2 in the population. Alternatively we
can use D’ which is D/Dmax. Dmax uses the allele frequencies to normalize D by using the
smaller of pAqb or qapB. This means we can compare it to other locations. We will run
through an example in class.
Another measure is R2 = D2/(pAqapBqa) looks at the disequilibrium normalized by all
possible combinations R2 = 0 to 1. The square root of R is the correlation coefficient in allelic
state between alleles in the same gamete. Both D’ defined above and R2 are used and
capture different aspects of associations. With R2 we gain information by looking at all
allele frequencies that we may lose looking just at certain combinations of the max and min.
On the other hand, D’ focuses on a measure of linkage disequilibrium and is mainly
influenced by recombination.
My hope is with this background you are ready to focus the relationships between linkage
disequilibrium and evolutionary forces.
6