
Coalescent Theory in Biology
www.coalescent.dk

Fixed parameters: population structure, mutation, selection, recombination, ...
Reproductive structure
Genealogies of non-sequenced data
Genealogies of sequenced data (TGTTGT, CGTTAT, CATAGT)
Parameter estimation
Model testing

Wright-Fisher Model of Population Reproduction

Haploid Model
i. Individuals are made by sampling with replacement from the previous generation.
ii. The probability that 2 alleles have the same ancestor in the previous generation is 1/(2N).

Diploid Model
Individuals are made by sampling, with replacement, one chromosome from a female and one from a male of the previous generation.

Assumptions
1. Constant population size
2. No geography
3. No selection
4. No recombination

P(k) := P{k alleles had k distinct parents}

Ancestor choices:
k -> any: $(2N)^k$
k -> k: $2N(2N-1)\cdots(2N-(k-1)) =: (2N)_{[k]}$
k -> k-1: $\binom{k}{2}(2N)_{[k-1]}$
k -> j: $S_{k,j}\,(2N)_{[j]}$

$S_{k,j}$ is the number of ways to group k labelled objects into j groups (Stirling numbers of the second kind).

For k << 2N:
$P(k) = \frac{(2N)_{[k]}}{(2N)^k} \approx 1 - \binom{k}{2}\frac{1}{2N} \approx e^{-\binom{k}{2}/2N}$

Waiting for the Most Recent Common Ancestor (MRCA)

Distribution of the time $X_2$ until 2 alleles have a common ancestor:
$P(X_2 > 1) = (2N-1)/2N = 1 - 1/(2N)$
$P(X_2 = j) = (1 - 1/(2N))^{j-1}\,(1/(2N))$
$P(X_2 > j) = (1 - 1/(2N))^{j}$
Mean: $E(X_2) = 2N$ generations.
Example: 2N = 20,000 and a generation time of 30 years give $E(X_2)$ = 600,000 years.

[Figure: ancestry of 10 alleles over 15 generations.]

Multiple and Simultaneous Coalescents
1. Simultaneous events
2. Multifurcations
3. Underestimation of coalescent rates

Discrete -> Continuous Time
$t_c := t_d / 2N_e$
$X_k$ is $\mathrm{Exp}\left[\binom{k}{2}\right]$ distributed, with $E(X_k) = 1/\binom{k}{2}$.
1.0 in continuous time corresponds to 2N generations.
[Figure: a genealogy drawn on discrete and continuous time axes; for example, $t_d = 6$ generations corresponds to $t_c = 6/(2N_e)$.]

The Standard Coalescent
Two independent processes:
Continuous: exponential waiting times.
Discrete: choosing pairs to coalesce.
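The two processes of the standard coalescent can be sketched in Python. This is a minimal illustration under the assumptions stated on the slides (constant population size, no geography, no selection, no recombination), not code from the lecture. The function name `simulate_coalescent` and its return format are hypothetical. Time is measured in units of 2N generations.

```python
import random
from math import comb


def simulate_coalescent(k, rng=None):
    """Simulate one standard-coalescent genealogy for k sampled alleles.

    Returns (events, height, total_length), where events is a list of
    (time, merged_leaf_labels) pairs. Times are in units of 2N generations.
    """
    rng = rng or random.Random()
    # Each lineage is the tuple of sample labels that descend from it.
    lineages = [(i,) for i in range(1, k + 1)]
    t = 0.0
    total_length = 0.0
    events = []
    while len(lineages) > 1:
        n = len(lineages)
        # Continuous part: waiting time with n lineages is Exp[C(n, 2)].
        wait = rng.expovariate(comb(n, 2))
        t += wait
        total_length += n * wait
        # Discrete part: a uniformly chosen pair of lineages coalesces.
        i, j = sorted(rng.sample(range(n), 2))
        merged = lineages[i] + lineages[j]
        lineages.pop(j)  # remove the larger index first
        lineages.pop(i)
        lineages.append(merged)
        events.append((t, merged))
    return events, t, total_length


if __name__ == "__main__":
    events, height, total = simulate_coalescent(5, random.Random(0))
    for time, merged in events:
        print(f"t = {time:.3f}: new ancestor of {sorted(merged)}")
    print(f"tree height = {height:.3f}, total branch length = {total:.3f}")
```

Averaged over many runs, the tree height for k alleles should come close to 2(1 - 1/k) in these time units, which is a quick sanity check on the sketch.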
Example with five alleles. Rows run from the root (top) down to the sample (bottom). Each row gives the waiting time spent in a state and the coalescence that ends it when going back in time.

Waiting time                     State                Coalescing
                                 {1,2,3,4,5}
$\mathrm{Exp}[\binom{2}{2}]$     {1,2}{3,4,5}         (1,2)--(3,(4,5))
$\mathrm{Exp}[\binom{3}{2}]$     {1}{2}{3,4,5}        1--2
$\mathrm{Exp}[\binom{4}{2}]$     {1}{2}{3}{4,5}       3--(4,5)
$\mathrm{Exp}[\binom{5}{2}]$     {1}{2}{3}{4}{5}      4--5

Expected Height and Total Branch Length

Epoch (lineages)    Expected time                            Expected branch length
2                   1                                        2
3                   1/3                                      1
k                   $1/\binom{k}{2} = \frac{2}{k(k-1)}$      $\frac{2}{k-1}$

Expected total height of the tree: $H_k = 2(1 - 1/k)$.
i. Infinitely many alleles find 1 ancestor in finite time.
ii. It takes less than twice as long for k alleles to find 1 ancestor as it does for 2 alleles.

Expected total branch length in the tree: $L_k = 2(1 + 1/2 + 1/3 + \dots + 1/(k-1)) \approx 2\ln(k-1)$ to leading order for large k.

Effective Population Size, Ne

In an idealised Wright-Fisher model:
i. Variation is retained by a factor 1 - 1/(2N) per generation, i.e. a fraction 1/(2N) is lost.
ii. The expected waiting time for two random alleles to find a common ancestor is 2N generations.

Factors that influence Ne:
i. Variance in offspring number. In the Wright-Fisher model it is 1. If the variance is higher, the effective population size is smaller.
ii. Population size variation. Example, a cycle of length k: N1, N2, ..., Nk, with k/Ne = 1/N1 + ... + 1/Nk. For N1 = 10 and N2 = 1000, 2/Ne = 0.101, so Ne is about 19.8.
iii. Two sexes: Ne = 4 Nf Nm / (Nf + Nm). For Nf = 10 and Nm = 1000, Ne is about 40.

[Figure: 6 realisations with 25 leaves.]
Observations: variation is great close to the root. Trees are unbalanced.

Sampling more sequences

The probability that the ancestor of a sample of size n is in a sub-sample of size k is
$\frac{(n+1)(k-1)}{(n-1)(k+1)}$
Letting n go to infinity gives (k-1)/(k+1), i.e. the probability is quite large even for quite small samples.

Adding Mutations

m: mutations per nucleotide per generation.
L: sequence length.
µ = m·L: mutations per allele per generation.
2Ne: number of alleles.
Θ := 4Nµ: mutation intensity in the scaled process.

[Figure: two panels, "Discrete time, discrete sequence" (time steps of 1/(2Ne), sequence steps of 1/L) and "Continuous time, continuous sequence", each showing mutation and coalescence events on time and sequence axes.]

Probability for two genes being identical: each of the two lineages mutates at rate Θ/2 and the pair coalesces at rate 1, so
P(Coalescence < Mutation) = 1/(1+Θ).
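The identity probability for two genes can be read as a race between two exponential clocks: coalescence at rate 1, and the first mutation on either of the two lineages at a total rate equal to the scaled mutation intensity 4Nµ. The Python sketch below compares the closed form with a Monte Carlo estimate. The names `theta`, `prob_identical` and `estimate_prob_identical` are illustrative, not from the slides, and `theta` is assumed to be strictly positive.

```python
import random


def prob_identical(theta):
    """Closed form: probability that two genes coalesce before either mutates."""
    return 1.0 / (1.0 + theta)


def estimate_prob_identical(theta, reps=100_000, seed=1):
    """Monte Carlo estimate of the same probability (requires theta > 0)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        t_coal = rng.expovariate(1.0)   # coalescence of the pair: rate 1
        t_mut = rng.expovariate(theta)  # first mutation: two lineages at theta/2 each
        hits += t_coal < t_mut
    return hits / reps


if __name__ == "__main__":
    for theta in (0.5, 1.0, 2.0):
        exact = prob_identical(theta)
        approx = estimate_prob_identical(theta)
        print(f"theta = {theta}: exact {exact:.4f}, simulated {approx:.4f}")
```

The simulated and exact values should agree to about two decimal places at this number of replicates.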
Note: Mutation rate and population size usually appear together as a product, which makes it difficult to estimate them separately.

Three Models of Alleles and Mutations

Infinite Allele
i. Only identity or non-identity is determinable.
ii. A mutation creates a new type.

Infinite Site
i. An allele is represented by a line.
ii. A mutation always hits a new position.

Finite Site
i. An allele is represented by a sequence.
ii. A mutation changes the nucleotide at a chosen position.

[Figure: one genealogy per model, each with mutation intensity Θ. The sequence panels show acgtgctt, acgtgcgt, acctgcat, tcctgcat, tcctgcat and acgtgctt, acgtgcgt, acctgcat, tcctggct, tcctgcat.]

Infinite Allele Model

Partitions of the alleles (sample labelled 1 to 5) and the corresponding allele configurations, where $j^{a}$ denotes a classes of size j:
{(1)} -> $1^1$
{(1,2)} -> $2^1$
{(1), (2)} -> $1^2$
{(1), (2,3)} -> $1^1 2^1$
{(1,2), (3), (4,5)} -> $1^1 2^2$
{(1), (2), (3), (4,5)} -> $1^3 2^1$

Infinite Site Model

[Figure: final aligned data set for sequences 1 to 5. Labelling and unlabelling of positions and sequences: ignoring mutation position, ignoring sequence label.]

The forward-backward argument
[Figure: probabilities of the possible most recent events for the example data, with denominators 5(4 + Θ) and (4 + Θ). 4 classes of mutation events and 9 coalescence events are incompatible with the data.]

Infinite Site Model: An example
[Figure: example with Θ = 2.12, showing counts at the ancestral states and marking the impossible ancestral states.]

Finite Site Model

Final aligned data set:
acgtgctt
acgtgcgt
acctgcat
tcctgcat
tcctgcat

Diploid Model with Recombination

An individual is made by:
1. Picking a random father.
2. Letting that father's two chromosomes recombine to create the individual's paternal chromosome.
The maternal chromosome is made in the same way.

The Diploid Model Back in Time
A recombinant sequence will have two different ancestor sequences in the grandparent generation.

1-recombination histories I: Branch length change
1-recombination histories II: Topology change
1-recombination histories III: Same tree
[Figures: for each of cases I to III, four-leaf genealogies to the left and right of the recombination point.]
1-recombination histories IV: The coalescence time must be further back in time than the recombination time.
[Figure: a four-leaf genealogy with the coalescence time (c) and the recombination time (r) marked.]

Recombination-Coalescence Illustration
[Figure: copied from Hudson 1991, with a table of coalescence and recombination intensities at each stage.]

Age to oldest most recent common ancestor
[Figure: age to the oldest most recent common ancestor against the scaled recombination rate ρ, for sequence lengths from 0 kb to 250 kb.]

Number of genetic ancestors to the Human Genome
S: number of segments. E(S) = 1 + ρ.
[Figure: coalescence (C) and recombination (R) events plotted on time and sequence axes.]
Statements about the number of ancestors are much harder to make and rely on simulations.

Applications to Human Genome (Wiuf and Hein, 1997)
Parameters used: 4Ne = 20,000. Chromosome 1: 263 Mb, 263 cM.
Chromosome 1: 52,000 segments, 6,800 ancestors.
All chromosomes: 86,000 ancestors.
Physical population: 1.3-5.0 million.

A randomly picked ancestor (ancestral material comes in batteries!)
[Figure: ancestral material along chromosome 1 at three scales: 0 to 260 Mb, a 35-fold zoom to 0 to 7.5 Mb, and a further 250-fold zoom to 0 to 30 kb.]

Ignoring recombination in phylogenetic analysis
General practice in the analysis of viral evolution!
[Figure: genealogies of sequences 1 to 4 with recombination, compared with the single tree inferred when assuming no recombination.]
Ignoring recombination mimics decelerations and accelerations of evolutionary rates.
No recombination and infinite recombination both imply a molecular clock.
[Figure: simulated example.]

Genotype and Phenotype Covariation: Gene Mapping
Sampling genotypes and phenotypes.
Decay of local dependency over time (Reich et al., 2001).
Genotype -> Phenotype function. Result: the mapping function.
Dominant/recessive.
Penetrance.
A set of characters.
Binary decision (0,1).
Spurious occurrence.
Quantitative character.
Heterogeneity.
[Figure: genotype -> phenotype.]
