Lecture 3: Allele Frequencies and Hardy-Weinberg Equilibrium
August 24, 2015
Last Time
◆ Review of genetic variation and Mendelian Genetics
Ø  Sample calculations for Mendelian expectations: see solutions in Excel file on website
◆ Methods for detecting variation
Ø  Morphology
Ø  Allozymes
Ø  DNA Markers (deferred to Friday: Guest lecture)
➤ Anonymous
➤ Sequence-tagged
Today
◆ Introduction to statistical distributions
◆ Estimating allele frequencies
◆ Introduction to Hardy-Weinberg Equilibrium
◆ Using Hardy-Weinberg: Estimating allele frequencies for dominant loci
Statistical Distributions: Normal Distribution
◆  Many types of estimates follow normal distribution
Ø  Can be visualized as a frequency distribution (histogram)
Ø  Can interpret as a probability density function
[Figure: normal distribution curve with intervals of 1 sd and 2 sd marked]
Expected Value (Mean):
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
where n is the number of samples
Variance (Vx): A measure of the dispersion around the mean:
$$V_x = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
Standard Deviation (sd): A measure of dispersion around the mean that is on the same scale as the mean
$$sd = \sqrt{V_x}$$
Standard Error of Mean
◆  Standard Deviation is a measure of how individual points
differ from the mean estimates in a single sample
◆  Standard Error is a measure of how much the estimate
differs from the true parameter value (in the case of
means, µ)
Ø  If you repeated the experiment, how close would you expect
the mean estimate to be to your previous estimate?
Standard Error of the Mean (se):
$$se = \sqrt{\frac{V_x}{n}}$$
95% Confidence Interval:
$$\bar{x} \pm 1.96\,(se)$$
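The formulas above can be sketched in a few lines of Python. This is only an illustration: the function name `summarize` and the sample values in `heights` are made up for the example and do not come from the lecture.

```python
import math

def summarize(samples):
    """Return mean, variance, sd, se and a 95% CI for a list of numbers."""
    n = len(samples)
    mean = sum(samples) / n
    # Sample variance uses n - 1 in the denominator, as in the formula for Vx.
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    sd = math.sqrt(variance)
    # Standard error of the mean: sqrt(Vx / n).
    se = math.sqrt(variance / n)
    ci95 = (mean - 1.96 * se, mean + 1.96 * se)
    return {"mean": mean, "variance": variance, "sd": sd, "se": se, "ci95": ci95}

# Hypothetical measurements, used only to exercise the function.
heights = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stats = summarize(heights)
print(stats)
```

Note that sd describes the spread of individual points, while se shrinks as n grows because it describes the uncertainty of the mean itself.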
Estimating Allele Frequencies, Codominant Loci
◆  Measured allele frequency is maximum likelihood estimator
of the true frequency of the allele in the population (See
Hedrick, pp 82-83 for derivation)
$$p = \frac{N_{11} + \frac{1}{2} N_{12}}{N}$$
◆  Expected number of observations of allele A1: E(Y)=np
Ø  Where n is number of samples
Ø  For diploid organisms, n = 2N , where N is number of
individuals sampled
◆  Expected number of observations of allele A1 is analogous
to the mean of a sample from a normal distribution
◆  Allele frequency can also be interpreted as an estimate of
the mean
Allele Frequency Example
◆  Assume a population of Mountain Laurel
(Kalmia latifolia) at Cooper’s Rock, WV
Red buds: 5000 (A1A1)
Pink buds: 3000 (A1A2)
White buds: 2000 (A2A2)
◆  Phenotype is determined by a single,
codominant locus: Anthocyanin
◆  What is frequency of “red” alleles (A1), and “white”
alleles (A2)?
Frequency of A1 = p
$$p = \frac{N_{11} + \frac{1}{2} N_{12}}{N} = \frac{2N_{11} + N_{12}}{2N}$$
Frequency of A2 = q
$$q = \frac{N_{22} + \frac{1}{2} N_{12}}{N} = \frac{2N_{22} + N_{12}}{2N}$$
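As a quick check of the Mountain Laurel example, the calculation can be written out as a short Python sketch. The function name `allele_frequencies` is an illustrative choice, not something from the lecture. The counts are the bud-color counts given above.

```python
def allele_frequencies(n11, n12, n22):
    """Allele frequencies at a codominant, two-allele locus.

    n11, n12, n22 are the numbers of A1A1, A1A2 and A2A2 individuals.
    """
    n_individuals = n11 + n12 + n22
    # Each homozygote carries two copies of its allele, each heterozygote one.
    p = (2 * n11 + n12) / (2 * n_individuals)
    q = (2 * n22 + n12) / (2 * n_individuals)
    return p, q

# Red (A1A1) = 5000, pink (A1A2) = 3000, white (A2A2) = 2000
p, q = allele_frequencies(5000, 3000, 2000)
print(p, q)  # 0.65 0.35
```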
Allele Frequencies are Distributed as Binomials
◆  Based on samples from a population
Ø  For two-allele system, each sample is like a “trial”
Ø  Does the individual contain Allele A1?
Ø  Remember, q=1-p, so only one parameter is estimated
◆  Binomials are variables that can be interpreted as the
number of successes and failures in a series of trials
⎛ n ⎞ y n − y
P(Y = y ) = ⎜⎜ ⎟⎟ s f ,
⎝ y ⎠
Number of ways
of observing y
positive results
⎛ n ⎞
in n trials
⎜ ⎟
where s is the probability
of a success, and
f is the probability of a
failure
Probability of
observing y positive
results in n trials once
n!
⎜ y ⎟ = C = y!(n − y )!
⎝ ⎠
n
y
Given the allele frequencies that you
calculated earlier for Cooper’s Rock
Kalmia latifolia, what is the probability
of observing two “white” alleles in a
sample of two plants?
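One way to work this question is sketched below in Python. It rests on an interpretation that the slide does not spell out: two diploid plants are treated as n = 4 independent allele draws, and "two white alleles" is read as exactly two A2 alleles among those four. The function name `binomial_pmf` is illustrative, and q = 0.35 is the A2 frequency from the Cooper's Rock counts.

```python
from math import comb

def binomial_pmf(y, n, s):
    """Probability of exactly y successes in n trials with success probability s."""
    f = 1.0 - s  # probability of a failure
    return comb(n, y) * s**y * f ** (n - y)

q = 0.35           # frequency of the "white" allele A2
n_alleles = 2 * 2  # assumption: 2 diploid plants = 4 allele draws
prob_exactly_two = binomial_pmf(2, n_alleles, q)
print(round(prob_exactly_two, 4))
```

If the question is instead read as "both alleles of a single plant are white", the answer would be the simpler product q × q.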
Variation in Allele Frequencies, Codominant Loci
◆  Binomial variance is pq or p(1-p)
◆  Variance in number of observations of A1: V(Y) = np(1-p)
◆  Variance in allele frequency estimates (codominant, diploid):
$$V_p = \frac{p(1 - p)}{2N}$$
◆  Standard Error of allele frequency estimates:
$$SE_p = \sqrt{\frac{p(1 - p)}{2N}}$$
◆  Notice that estimates get better as sample size increases
◆  Notice also that variance is maximum at intermediate allele
frequencies
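Both observations can be checked numerically with a small Python sketch. The function names are illustrative, and the sample sizes of 20 and 10000 individuals are chosen only to show the effect of N.

```python
import math

def allele_freq_variance(p, n_individuals):
    """Variance of an allele frequency estimate at a codominant, diploid locus."""
    return p * (1 - p) / (2 * n_individuals)

def allele_freq_se(p, n_individuals):
    """Standard error of the allele frequency estimate."""
    return math.sqrt(allele_freq_variance(p, n_individuals))

# Larger samples give smaller standard errors for the same p.
se_small = allele_freq_se(0.65, 20)
se_large = allele_freq_se(0.65, 10000)
print(se_small, se_large)

# For a fixed N, the variance is largest at p = 0.5.
variances = {p: allele_freq_variance(p, 100) for p in (0.1, 0.3, 0.5, 0.7, 0.9)}
print(max(variances, key=variances.get))
```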
[Figure: Maximum variance as a function of allele frequency for a codominant locus. x-axis: p, from 0 to 1; y-axis: p(1-p), from 0 to 0.3]
Why is variance highest at intermediate
allele frequencies?
[Figure: two panels, p = 0.5 and p = 0.125]
If this were a target, how variable would your outcome
be in each case (red versus white hits)?
Variance is constrained when value approaches limits (0 or 1)
What if there are more than 2 alleles?
◆  General formula for calculating allele frequencies in
multiallelic system with codominant alleles:
$$p_i = \frac{N_{ii} + \frac{1}{2} \sum_{j=1}^{n} N_{ij}}{N}, \quad j \neq i$$
◆  Variance and Standard Error of allele frequency estimates
remain:
$$V_{p_i} = \frac{p_i (1 - p_i)}{2N}$$
$$SE_{p_i} = \sqrt{\frac{p_i (1 - p_i)}{2N}}$$
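A minimal Python sketch of the multiallelic calculation follows. It counts allele copies directly, which is equivalent to the formula because each homozygote contributes two copies and each heterozygote one. The function name and the three-allele genotype counts are hypothetical and serve only as an illustration.

```python
from collections import Counter

def multiallelic_frequencies(genotype_counts):
    """Allele frequencies from counts of codominant genotypes.

    genotype_counts maps a pair of allele names, e.g. ("A1", "A2"),
    to the number of individuals with that genotype.
    """
    allele_copies = Counter()
    n_individuals = 0
    for (first, second), count in genotype_counts.items():
        # A homozygote adds its allele twice; a heterozygote adds each allele once.
        allele_copies[first] += count
        allele_copies[second] += count
        n_individuals += count
    return {a: copies / (2 * n_individuals) for a, copies in allele_copies.items()}

# Hypothetical sample of 100 individuals at a three-allele locus.
counts = {
    ("A1", "A1"): 30, ("A1", "A2"): 20, ("A1", "A3"): 10,
    ("A2", "A2"): 15, ("A2", "A3"): 15, ("A3", "A3"): 10,
}
freqs = multiallelic_frequencies(counts)
print(freqs)
```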
How do we estimate allele frequencies for
dominant loci?
[Figure: patterns for genotypes A1A1, A1A2, and A2A2 at a codominant locus (axis from - to +) and at a dominant locus]
Hardy-Weinberg Law
◆ After one generation of random mating, single-locus genotype frequencies can be represented by a binomial (with 2 alleles) or a multinomial function of allele frequencies
$$(p + q)^2 = p^2 + 2pq + q^2$$
Frequency of A1A1 (P) = $p^2$
Frequency of A1A2 (H) = $2pq$
Frequency of A2A2 (Q) = $q^2$
Hardy-Weinberg Law
◆ Hardy and Weinberg came up with this
simultaneously in 1908
Hardy-Weinberg Equilibrium
◆ After one generation of random mating,
genotype frequencies remain constant, as
long as allele frequencies remain constant
◆ Provides a convenient Neutral Model to
test for departures from assumptions
◆ Allows genotype frequencies to be
represented by allele frequencies:
simplification of calculations
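The equilibrium property can be illustrated with a short Python sketch, assuming all Hardy-Weinberg assumptions hold. The function names are illustrative. The starting genotype frequencies (0.5, 0.3, 0.2) are the Mountain Laurel proportions from the earlier example, which are not themselves in Hardy-Weinberg proportions.

```python
def allele_freq(P, H, Q):
    """Frequency of allele A1 given genotype frequencies P, H, Q."""
    return P + H / 2

def hardy_weinberg(p):
    """Expected genotype frequencies (P, H, Q) after random mating."""
    q = 1 - p
    return p * p, 2 * p * q, q * q

# Starting genotype frequencies: 5000, 3000, 2000 out of 10000 plants.
P0, H0, Q0 = 0.5, 0.3, 0.2
p0 = allele_freq(P0, H0, Q0)

# One generation of random mating gives Hardy-Weinberg proportions.
gen1 = hardy_weinberg(p0)
# A second generation leaves them unchanged, because p has not changed.
gen2 = hardy_weinberg(allele_freq(*gen1))
print(gen1)
print(gen2)
```

The heterozygote frequency moves from 0.3 to 2pq after one generation and then stays put, which is what makes the law useful as a neutral model.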
New Notation
Genotype    Frequency
AA          P
Aa          H
aa          Q

Allele      Frequency
A           p
a           q
Hardy-Weinberg Assumptions
◆ Diploid
◆ Large population
◆ Random Mating: equal probability of
mating among genotypes
◆ No mutation
◆ No gene flow
◆ Equal allele frequencies between sexes
◆ Nonoverlapping generations
Graphical Representation of Hardy-Weinberg Law
$$(p + q)^2 = p^2 + 2pq + q^2 = 1$$
[Figure: Relationship Between Allele Frequencies and Genotype Frequencies under Hardy-Weinberg]
Hardy-Weinberg Law and Probability
            A (p)         a (q)
A (p)       AA (p^2)      Aa (pq)
a (q)       aA (qp)       aa (q^2)
$$p^2 + 2pq + q^2 = 1$$
How does Hardy-Weinberg Work?
◆  Reproduction is a sampling process
◆  Example: Mountain Laurel at Cooper’s Rock
Red Flowers: 5000 (A1A1)
Pink Flowers: 3000 (A1A2)
White Flowers: 2000 (A2A2)
Frequency of A1 = p = 0.65
Frequency of A2 = q = 0.35
What are expected numbers of phenotypes and genotypes in a sample of 20 trees?
Genotypes: A1A1: 10, A1A2: 6, A2A2: 4
Phenotypes: Red: 10, Pink: 6, White: 4
Alleles: A1 = 26, A2 = 14
What are expected frequencies of alleles in pollen and ovules?
What will be the genotype and
phenotype frequencies in the next
generation?
What assumptions must we make?
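The questions above can be worked through in a small Python sketch. It assumes that gametes carry alleles at the same frequencies as the adults and that all Hardy-Weinberg assumptions hold. The variable names are illustrative.

```python
# Population counts from the Cooper's Rock example.
population = {"A1A1": 5000, "A1A2": 3000, "A2A2": 2000}
sample_size = 20
total = sum(population.values())

# Expected genotype (and phenotype) counts in a sample of 20 trees.
expected_now = {g: sample_size * n / total for g, n in population.items()}

# Expected allele copies among the 40 alleles carried by those 20 trees.
a1_copies = 2 * expected_now["A1A1"] + expected_now["A1A2"]
a2_copies = 2 * expected_now["A2A2"] + expected_now["A1A2"]

# Assumption: pollen and ovules carry A1 and A2 at the adult frequencies.
p = a1_copies / (2 * sample_size)
q = a2_copies / (2 * sample_size)

# Expected counts among 20 offspring after one generation of random mating.
expected_next = {
    "A1A1": sample_size * p * p,
    "A1A2": sample_size * 2 * p * q,
    "A2A2": sample_size * q * q,
}
print(expected_now, a1_copies, a2_copies)
print(expected_next)
```

The allele frequencies stay the same from one generation to the next, while the expected genotype counts shift toward Hardy-Weinberg proportions.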
What about a 3-Allele System?
◆  Alleles occur in gamete pool at same frequency as in adults
◆  Probability of two alleles coming together to form a zygote is the joint probability P(A ∩ B)
Pollen Gametes: A1 (p), A2 (q), A3 (r)
Ovule Gametes: A1 (p), A2 (q), A3 (r)
Genotype frequencies:
A1A1 = p^2
A2A2 = q^2
A3A3 = r^2
A1A2 = 2pq
A1A3 = 2pr
A2A3 = 2qr
From Neal, D. 2004. Introduction to Population Biology.
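The three-allele expectations can be generated for any number of alleles with a short Python sketch. The function name and the allele frequencies (0.5, 0.3, 0.2) are hypothetical, chosen only so that the results are easy to check by hand.

```python
from itertools import combinations_with_replacement

def hw_genotype_frequencies(allele_freqs):
    """Expected genotype frequencies under random mating.

    allele_freqs maps allele names to frequencies that sum to 1.
    """
    result = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        if a == b:
            # Homozygote: the same allele arrives from pollen and from ovule.
            result[a + a] = allele_freqs[a] ** 2
        else:
            # Heterozygote: two orders (a from pollen, b from ovule, or the reverse).
            result[a + b] = 2 * allele_freqs[a] * allele_freqs[b]
    return result

# Hypothetical allele frequencies p, q, r.
genotypes = hw_genotype_frequencies({"A1": 0.5, "A2": 0.3, "A3": 0.2})
print(genotypes)
```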
◆  Equilibrium established with ONE GENERATION of random mating
◆  Genotype frequencies remain stable as long as allele frequencies remain stable
◆  Remember assumptions!