Lecture 23 (11/16/2007): Population Genetics

Population Genetics Basics
1
Terminology review
• Allele
• Locus
• Diploid
• SNP
2
Single Nucleotide Polymorphisms
Infinite Sites Assumption: each site mutates at most once.
00000101011
10001101001
01000101010
01000000011
00011110000
00101100110
3
What causes variation in a population?
• Mutations (may lead to SNPs)
• Recombinations
• Other genetic events (gene conversion)
• Structural Polymorphisms
4
Recombination
00000000
11111111
00011111
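One reading of the three strings above is two parental chromosomes followed by a recombinant that copies the first parent up to a breakpoint and the second parent afterwards. The sketch below reproduces that picture; the function name `crossover` and the single-breakpoint model are assumptions made for illustration.

```python
def crossover(parent_a: str, parent_b: str, cut: int) -> str:
    """Return a recombinant: parent_a before position `cut`, parent_b from `cut` onward."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    return parent_a[:cut] + parent_b[cut:]


# Two parental haplotypes and a single crossover after the third site.
child = crossover("00000000", "11111111", 3)
print(child)  # 00011111
```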
5
Gene Conversion
• Gene conversion versus crossover
  – Hard to distinguish in a population
6
Structural polymorphisms
• Large scale structural changes (deletions/insertions/inversions) may occur in a population.
7
Topic 1: Basic Principles
• In a ‘stable’ population, the distribution of alleles obeys certain laws
  – Not really, and the deviations are interesting
• HW Equilibrium
  – (due to mixing in a population)
• Linkage (dis)-equilibrium
  – Due to recombination
8
Hardy Weinberg equilibrium
• Consider a locus with 2 alleles, A and a.
• p (respectively, q) is the frequency of A (resp. a) in the population.
• 3 genotypes: AA, Aa, aa
• Q: What is the frequency of each genotype?
If various assumptions are satisfied (such as random mating, no natural selection), then
• $P_{AA} = p^2$
• $P_{Aa} = 2pq$
• $P_{aa} = q^2$
9
Hardy Weinberg: why?
• Assumptions:
  – Diploid
  – Sexual reproduction
  – Random mating
  – Bi-allelic sites
  – Large population size, …
• Why? Each individual randomly picks his two chromosomes. Therefore, Prob.(Aa) = pq + qp = 2pq, and so on.
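The counting argument above can be checked by enumerating both ordered draws of a chromosome. This is a minimal sketch assuming two alleles and independent draws; the function name `genotype_frequencies` and the value p = 3/10 are illustrative choices, not part of the lecture.

```python
from fractions import Fraction
from itertools import product


def genotype_frequencies(p):
    """Genotype frequencies when each individual draws two chromosomes independently."""
    allele_freq = {"A": p, "a": 1 - p}
    freqs = {}
    for first, second in product(allele_freq, repeat=2):
        # "Aa" and "aA" are the same genotype, so both ordered draws are pooled.
        genotype = "".join(sorted((first, second)))
        freqs[genotype] = freqs.get(genotype, 0) + allele_freq[first] * allele_freq[second]
    return freqs


freqs = genotype_frequencies(Fraction(3, 10))
print(freqs["AA"], freqs["Aa"], freqs["aa"])  # 9/100 21/50 49/100
```

The heterozygote entry is the sum of the two ordered draws, pq + qp, which is where the factor 2 comes from.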
10
Hardy Weinberg: Generalizations
• Multiple alleles with frequencies $p_1, p_2, \ldots, p_H$
  – By HW,
    $\Pr[\text{homozygous genotype } i] = p_i^2$
    $\Pr[\text{heterozygous genotype } i, j] = 2 p_i p_j$
• Multiple loci?
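As a consistency check on the multi-allele case, write $p_i$ for the frequency of allele $i$ among $H$ alleles (the symbol is a notational assumption here). The homozygous and heterozygous frequencies then sum to one, because together they are the expansion of a square:

```latex
\sum_{i=1}^{H} p_i^2 \;+\; \sum_{1 \le i < j \le H} 2\, p_i p_j
  \;=\; \Bigl( \sum_{i=1}^{H} p_i \Bigr)^{2} \;=\; 1
```

With $H = 2$, $p_1 = p$ and $p_2 = q$, this reduces to $p^2 + 2pq + q^2 = 1$.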

11
Hardy Weinberg: Implications
• The allele frequency does not change from generation to generation. Why?
• It is observed that 1 in 10,000 Caucasians have the disease phenylketonuria. The disease mutation(s) are all recessive. What fraction of the population carries the mutation?
• Males are 100 times more likely to have the “red” type of color blindness than females. Why?
• Conclusion: While the HW assumptions are rarely satisfied, the principle is still important as a baseline assumption, and significant deviations are interesting.
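The first two questions can be worked numerically under the HW assumptions. The sketch below treats affected individuals as aa, so that q² equals the prevalence, and counts heterozygotes as carriers; the function names are hypothetical.

```python
import math


def carrier_fraction(disease_prevalence: float) -> float:
    """Fraction of heterozygous carriers (2pq) for a recessive disease under HW."""
    q = math.sqrt(disease_prevalence)  # affected individuals are aa, so q^2 = prevalence
    p = 1.0 - q
    return 2.0 * p * q


def next_generation_allele_frequency(p: float) -> float:
    """Frequency of A among gametes of a HW population: all of AA plus half of Aa."""
    q = 1.0 - p
    return p * p + 0.5 * (2.0 * p * q)


print(carrier_fraction(1 / 10_000))           # approximately 0.0198, about 1 in 50
print(next_generation_allele_frequency(0.3))  # approximately 0.3, unchanged
```

The second function is the algebra p² + pq = p(p + q) = p. For the third question, a common explanation (not stated on the slide) is that the allele is recessive and X-linked, so males are affected with frequency q and females with frequency q²; a ratio of 100 then corresponds to q = 0.01.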
12
Recombination
00000000
11111111
00011111
13
What if there were no recombinations?
• Life would be simpler
• Each individual sequence would have a single parent (even for higher ploidy)
• The relationship is expressed as a tree.
14
The Infinite Sites Assumption
[Figure: tree rooted at 00000000; a mutation at site 3 gives 00100000, from which a mutation at site 5 gives 00101000 and a mutation at site 8 gives 00100001]
• The different sites are linked. A 1 in position 8 implies 0 in position 5, and vice versa.
• Some phenotypes could be linked to the polymorphisms
• Some of the linkage is “destroyed” by recombination
15
Infinite sites assumption and Perfect Phylogeny
• Each site is mutated at most once in the history.
• All descendants must carry the mutated value, and all others must carry the ancestral value.
[Figure: tree with a mutation at site i on one edge; sequences below that edge have 1 in position i, all others have 0 in position i]
16
Perfect Phylogeny
• Assume an evolutionary model in which no recombination takes place, only mutation.
• The evolutionary history is explained by a tree in which every mutation is on an edge of the tree. All the species in one sub-tree contain a 0, and all species in the other contain a 1. Such a tree is called a perfect phylogeny.
17
The 4-gamete condition
• A column i partitions the set of species into two sets, $i_0$ and $i_1$.
• A column is homogeneous w.r.t. a set of species if it has the same value for all species in the set. Otherwise, it is heterogeneous.
• EX: i is heterogeneous w.r.t. {A,D,E}

Species  i  Set
A        0  $i_0$
B        0  $i_0$
C        0  $i_0$
D        1  $i_1$
E        1  $i_1$
F        1  $i_1$
18
4 Gamete Condition
• 4 Gamete Condition
  – There exists a perfect phylogeny if and only if, for all pairs of columns (i,j), j is homogeneous w.r.t. at least one of $i_0$ and $i_1$.
  – Equivalent to
  – There exists a perfect phylogeny if and only if, for all pairs of columns (i,j), the following 4 rows do not all occur together:
    (0,0), (0,1), (1,0), (1,1)
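The pairwise test can be sketched directly: for every pair of columns, collect the distinct (row-i, row-j) value pairs and reject if all four appear. The function name is hypothetical, and the matrix is the five-species example used later in the lecture.

```python
from itertools import combinations


def satisfies_four_gamete_condition(rows):
    """Return True if no pair of columns shows all four of (0,0), (0,1), (1,0), (1,1)."""
    n_cols = len(rows[0])
    for i, j in combinations(range(n_cols), 2):
        gametes = {(row[i], row[j]) for row in rows}
        if len(gametes) == 4:  # binary entries, so 4 distinct pairs means all four gametes
            return False
    return True


example = [
    [1, 1, 0, 0, 0],  # A
    [0, 0, 1, 0, 0],  # B
    [1, 1, 0, 1, 0],  # C
    [0, 0, 1, 0, 1],  # D
    [1, 0, 0, 0, 0],  # E
]
conflict = [[0, 0], [0, 1], [1, 0], [1, 1]]

print(satisfies_four_gamete_condition(example))   # True
print(satisfies_four_gamete_condition(conflict))  # False
```

This checks all column pairs, so it takes time proportional to (number of columns)² times the number of rows; it only decides existence and does not build the tree.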
19
4-gamete condition: proof
• (only if) Every perfect phylogeny satisfies the 4-gamete condition.
• Depending on the edge on which the mutation j occurs, either $i_0$ or $i_1$ should be homogeneous.
• (if) If the 4-gamete condition is satisfied, does a perfect phylogeny exist?
[Figure: tree in which the edge carrying mutation i separates the species in $i_0$ from those in $i_1$]
20
An algorithm for constructing a perfect phylogeny
• We will consider the case where 0 is the ancestral state, and 1 is the mutated state. This assumption will be relaxed later.
• In any tree, each node (except the root) has a single parent.
  – It is sufficient to construct a parent for every node.
• In each step, we add a column and refine some of the nodes containing multiple children.
• Stop if all columns have been considered.
21
Inclusion Property
• For any pair of columns i, j
  – i < j if and only if $i_1 \supseteq j_1$
• Note that if i < j then the edge containing i is an ancestor of the edge containing j
[Figure: tree in which the edge for column i lies above the edge for column j]
22
Example
   1 2 3 4 5
A  1 1 0 0 0
B  0 0 1 0 0
C  1 1 0 1 0
D  0 0 1 0 1
E  1 0 0 0 0
[Figure: star tree with root r and leaves A, B, C, D, E]
Initially, there is a single clade r, and each node has r as its parent.
23
Sort columns
• Sort columns according to the inclusion property (note that the columns are already sorted here).
• This can be achieved by considering the columns as binary representations of numbers (most significant bit in row 1) and sorting in decreasing order.
   1 2 3 4 5
A  1 1 0 0 0
B  0 0 1 0 0
C  1 1 0 1 0
D  0 0 1 0 1
E  1 0 0 0 0
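The sorting step can be sketched as follows: read each column top to bottom as a binary number and order the columns by decreasing value. The function name `sort_columns` and the 0-based indices are choices made for this illustration.

```python
def sort_columns(rows):
    """Return 0-based column indices sorted by decreasing binary value (row 0 = most significant bit)."""
    n_cols = len(rows[0])

    def column_value(j):
        return int("".join(str(row[j]) for row in rows), 2)

    return sorted(range(n_cols), key=column_value, reverse=True)


example = [
    [1, 1, 0, 0, 0],  # A
    [0, 0, 1, 0, 0],  # B
    [1, 1, 0, 1, 0],  # C
    [0, 0, 1, 0, 1],  # D
    [1, 0, 0, 0, 0],  # E
]
# Column values are 21, 20, 10, 4, 2, so the example is already in sorted order.
print(sort_columns(example))  # [0, 1, 2, 3, 4]
```

A column whose set of 1s strictly contains another column's set of 1s always has the larger binary value, which is why this ordering respects the inclusion property.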
24
Add first column
• In adding column i
  – Check each edge and decide which side you belong.
  – Finally add a node if you can resolve a clade
   1 2 3 4 5
A  1 1 0 0 0
B  0 0 1 0 0
C  1 1 0 1 0
D  0 0 1 0 1
E  1 0 0 0 0
[Figure: root r with children u, B, and D; the new node u has children A, C, and E]
25
Adding other columns
• Add other columns on edges using the ordering property
   1 2 3 4 5
A  1 1 0 0 0
B  0 0 1 0 0
C  1 1 0 1 0
D  0 0 1 0 1
E  1 0 0 0 0
[Figure: final tree rooted at r, with edges labeled by columns 1 to 5 and leaves A, B, C, D, E]
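One way to sketch the whole rooted construction is with parent pointers. This is not necessarily the column-by-column refinement on the slides: it processes one species at a time, walking through that species' 1-columns in sorted order, and is a swapped-in formulation that should produce the same tree when one exists. The function name, the root label "r", and the use of the column number to name the lower end of its edge are assumptions.

```python
def build_perfect_phylogeny(rows, names):
    """Rooted perfect phylogeny (0 = ancestral) as a dict mapping each node to its parent."""
    n_cols = len(rows[0])
    # Sort columns as binary numbers (row 0 most significant), in decreasing order.
    order = sorted(range(n_cols),
                   key=lambda j: int("".join(str(r[j]) for r in rows), 2),
                   reverse=True)
    parent = {}  # column node c (1-based) stands for the lower end of the edge labeled c
    for name, row in zip(names, rows):
        current = "r"
        for j in order:
            if row[j] == 1:
                node = j + 1
                # Every species carrying this mutation must reach it from the same parent.
                if parent.setdefault(node, current) != current:
                    raise ValueError("no perfect phylogeny with 0 as the ancestral state")
                current = node
        parent[name] = current  # attach the species below its last mutation
    return parent


example = [
    [1, 1, 0, 0, 0],  # A
    [0, 0, 1, 0, 0],  # B
    [1, 1, 0, 1, 0],  # C
    [0, 0, 1, 0, 1],  # D
    [1, 0, 0, 0, 0],  # E
]
tree = build_perfect_phylogeny(example, ["A", "B", "C", "D", "E"])
print(tree)
```

For the example matrix, edges 1 and 3 hang off the root, edge 2 lies below edge 1, edge 4 below edge 2, and edge 5 below edge 3; E attaches below edge 1, A below edge 2, C below edge 4, B below edge 3, and D below edge 5.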
26
Unrooted case
• Switch the values in each column, so that 0 is the majority element.
• Apply the algorithm for the rooted case
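The relabeling step can be sketched as below: any column in which 1 is the majority value is complemented, after which the rooted procedure can be applied. The function name is hypothetical, and leaving tied columns unchanged is an arbitrary choice made here.

```python
def flip_to_majority_zero(rows):
    """Complement every column in which 1 is the strict majority, so that 0 becomes the majority."""
    n_rows, n_cols = len(rows), len(rows[0])
    flipped = [list(row) for row in rows]
    for j in range(n_cols):
        ones = sum(row[j] for row in rows)
        if 2 * ones > n_rows:  # ties are left unchanged (an assumption of this sketch)
            for row in flipped:
                row[j] = 1 - row[j]
    return flipped


matrix = [
    [1, 0],
    [1, 0],
    [0, 1],
]
print(flip_to_majority_zero(matrix))  # [[0, 0], [0, 0], [1, 1]]
```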
27
Handling recombination
• A tree is not sufficient as a sequence may have 2 parents
• Recombination leads to loss of correlation between columns
28
Linkage (Dis)-equilibrium (LD)
• Consider sites A & B
• Case 1: No recombination
  – Pr[A,B = 0,1] = 0.25
    • Linkage disequilibrium
• Case 2: Extensive recombination
  – Pr[A,B = 0,1] = 0.125
    • Linkage equilibrium

A  B
0  1
0  1
0  0
0  0
1  0
1  0
1  0
1  0
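The two numbers on this slide can be recovered from the table: the observed joint frequency of (A, B) = (0, 1), and the product of the marginal frequencies that would be expected if the sites were independent. The helper name `joint_and_product` is hypothetical.

```python
from fractions import Fraction

site_a = [0, 0, 0, 0, 1, 1, 1, 1]
site_b = [1, 1, 0, 0, 0, 0, 0, 0]


def joint_and_product(a_values, b_values, a, b):
    """Return (observed joint frequency of (a, b), product of the two marginal frequencies)."""
    n = len(a_values)
    joint = Fraction(sum(1 for x, y in zip(a_values, b_values) if (x, y) == (a, b)), n)
    freq_a = Fraction(a_values.count(a), n)
    freq_b = Fraction(b_values.count(b), n)
    return joint, freq_a * freq_b


joint, expected = joint_and_product(site_a, site_b, 0, 1)
print(joint, expected)  # 1/4 1/8
```

The gap between the two values (here 1/8) is a simple measure of how far the pair of sites is from linkage equilibrium.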
29
Recombination, and populations
• Think of a population of N individual chromosomes.
• The population remains stable from generation to generation.
• Without recombination, each individual has exactly one parent chromosome from the previous generation.
• With recombinations, each individual is derived from one or two parents.
• We will formalize this notion later in the context of coalescent theory.
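The parent-picking picture can be sketched as a toy simulation of one generation. Choosing parents uniformly at random and using a single per-child recombination probability are assumptions of this sketch, not something the slides specify, and the function name is hypothetical.

```python
import random


def next_generation_parents(n, recombination_prob, rng):
    """For each of n child chromosomes, pick one parent, or two with probability recombination_prob."""
    parents = []
    for _ in range(n):
        first = rng.randrange(n)
        if rng.random() < recombination_prob:
            # Recombinant child: a second parent is drawn (it may coincide with the first).
            parents.append((first, rng.randrange(n)))
        else:
            parents.append((first,))
    return parents


rng = random.Random(0)
no_recomb = next_generation_parents(10, 0.0, rng)    # every child has exactly one parent
with_recomb = next_generation_parents(10, 0.5, rng)  # children have one or two parents
```

The population size stays at N because exactly N children are produced from N possible parents.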
31