Lecture 21: Introduction to Phylogenetics
November 9, 2015

Last Time
• Sequence data and quantification of variation
• Infinite sites model
• Nucleotide diversity (π)
• Sequence-based tests of neutrality
  – Ewens-Watterson Test
  – Tajima's D

Today
• Signatures of selection
  – Hudson-Kreitman-Aguade Test
  – Synonymous versus nonsynonymous substitutions
  – McDonald-Kreitman Test
• Molecular clock
• Introduction to phylogenetics

Hudson-Kreitman-Aguade (HKA) Test
• Divergence between species should be proportional to variation within species (polymorphism)
• Divergence provides a correction factor for differences in mutation rate among loci
• Compare loci suspected to be under selection with putatively neutral loci
• Loci with less polymorphism than expected are candidates for selective sweeps within a species

[Figure: typical gene vs. selective sweep; purifying selection in both lineages (Hamilton 266)]

Polymorphism: variation within species
Divergence: variation between species

HKA test, test locus A
                 Polymorphism   Divergence
Neutral locus          8            20
Test locus A           3             8
8/20 ≈ 3/8

HKA test, test locus B
                 Polymorphism   Divergence
Neutral locus          8            20
Test locus B           3            19
8/20 >> 3/19
Conclusion: polymorphism is lower than expected at test locus B. Selective sweep?
Slide adapted from Yoav Gilad
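The ratio comparison on these slides can be reproduced in a few lines. This is a minimal sketch under a simplifying assumption: it only checks whether polymorphism scales with divergence the way it does at the neutral locus. The full HKA test instead fits coalescent expectations across loci and species and evaluates a chi-square-like goodness-of-fit statistic. The function names are hypothetical.

```python
def poly_div_ratio(polymorphism, divergence):
    """Polymorphism (within species) per unit of divergence (between species)."""
    return polymorphism / divergence

def expected_polymorphism(neutral_p, neutral_d, test_d):
    """Polymorphism expected at a test locus if it scales with divergence as at the neutral locus."""
    return test_d * poly_div_ratio(neutral_p, neutral_d)

# Counts from the slides, as (polymorphism, divergence)
neutral = (8, 20)
test_loci = {"A": (3, 8), "B": (3, 19)}

for name, (p, d) in test_loci.items():
    exp_p = expected_polymorphism(*neutral, d)
    print(f"Locus {name}: P/D = {poly_div_ratio(p, d):.3f} "
          f"(neutral {poly_div_ratio(*neutral):.3f}); "
          f"expected P = {exp_p:.1f}, observed P = {p}")
```

For locus B the sketch expects about 7.6 polymorphic sites (19 × 8/20) but only 3 are observed, which is the deficit the slide flags as a possible selective sweep.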
Sequence Evolution
• DNA or protein sequences in different taxa trace back to a common ancestral sequence
• Divergence at neutral loci results from mutation combined with fixation by genetic drift
• Sequence differences are therefore an index of time since divergence

Molecular Clock
• If neutrality prevails, nucleotide divergence between two sequences should be a function entirely of the mutation rate:
$k = 2N\mu \times \frac{1}{2N} = \mu$
where $2N\mu$ is the probability of creation of new alleles (new mutations per generation in a diploid population of size N) and $\frac{1}{2N}$ is the probability of fixation of each new neutral allele. The N terms cancel, so the substitution rate k equals the mutation rate μ.
• Expected waiting time between successive substitutions (new mutations that go to fixation):
$\bar{t} = \frac{1}{\mu}$
since μ is the number of substitutions per unit time

Variation in Molecular Clock
• If neutrality prevails, nucleotide divergence between two sequences should be a function entirely of the mutation rate
• So why are rates of substitution so different for different classes of genes?

Using Synonymous Substitutions to Control for Factors Other Than Selection
• dN/dS or Ka/Ks ratios

Types of Mutations (Polymorphisms): Synonymous versus Nonsynonymous SNPs
• SNPs at the first and second codon positions often change the amino acid
• SNPs at the third position are often synonymous: UCA, UCU, UCG, and UCC all code for serine
• The majority of positions are nonsynonymous
• Not all amino acid changes affect fitness: allozymes

Synonymous & Nonsynonymous Substitutions
• The synonymous substitution rate can be used to set the neutral expectation for the nonsynonymous rate
• dS is the number of synonymous substitutions per synonymous site
• dN is the number of nonsynonymous substitutions per nonsynonymous site
• ω = dN/dS
  – If ω = 1, neutral evolution
  – If ω < 1, purifying selection
  – If ω > 1, positive Darwinian selection
• For human genes, ω ≈ 0.1

Complications in Estimating dN/dS
• Multiple mutations in a codon give multiple possible paths, e.g. from CGT(Arg) to AGA(Arg):
  – CGT(Arg) → AGT(Ser) → AGA(Arg)
  – CGT(Arg) → CGA(Arg) → AGA(Arg)
• Nucleotide substitutions producing SNPs are of two types, transitions and transversions, and they are not equally likely (http://www.mun.ca/biology/scarr/Transitions_vs_Transversions.html)
• Back-mutations are invisible
• Complex evolutionary models using likelihood and Bayesian approaches must be used to estimate dN/dS (also called KA/KS or KN/KS depending on the method), e.g. the PAML package

dN/dS Ratios for 363 Mouse-Rat Comparisons (Hartl and Clark 2007)
• Most genes show purifying selection (dN/dS < 1)
• Some evidence of positive selection, especially in genes related to the immune system
  – e.g. interleukin-3, which acts on mast cells and bone marrow cells in the immune system

McDonald-Kreitman Test
• Conceptually similar to the HKA test
• Uses only one gene
• Contrasts the ratio of synonymous divergence to synonymous polymorphism with the ratio of nonsynonymous divergence to nonsynonymous polymorphism
• The gene provides an internal control for evolutionary rates and demography

Application of the McDonald-Kreitman Test
• Aligned 11,624 gene sequences between human and chimp
• Calculated synonymous and nonsynonymous substitutions between species (divergence) and within humans (SNPs)
• Identified 304 genes showing evidence of positive selection (blue) and 814 genes showing purifying selection (red) in humans
  – Positive selection: defense/immunity, apoptosis, sensory perception, and transcription factors
  – Purifying selection: structural and housekeeping genes
Bustamante et al. 2005. Nature 437, 1153-1157

Phylogenetics
• Study of the evolutionary relationships among individuals, groups, or species
• Relationships are often represented as a dichotomous branching tree
• An extremely common approach for detecting and displaying relationships among genotypes
• Important in evolution, systematics, and ecology (phylogeography)

[Figure: What is a phylogeny? Evolution]
Slide adapted from Marta Riutart
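The molecular clock argument (k = 2Nμ × 1/(2N) = μ) can be checked numerically. This is an illustrative sketch only: the mutation rate is a hypothetical value, the function names are invented for this example, and `divergence_time` uses the standard neutral relation d = 2μt (both lineages accumulate substitutions after the split) while ignoring multiple hits at the same site.

```python
import math

def substitution_rate(pop_size, mu):
    """k = (2N*mu new mutations per generation) x (1/(2N) fixation probability for each)."""
    new_mutations = 2 * pop_size * mu
    fixation_probability = 1 / (2 * pop_size)
    return new_mutations * fixation_probability

def divergence_time(d, mu):
    """Generations since two lineages split, from per-site divergence d = 2 * mu * t."""
    return d / (2 * mu)

mu = 1e-8  # hypothetical neutral mutation rate per site per generation
for pop_size in (100, 10_000, 1_000_000):
    # k equals mu (up to floating-point rounding) whatever the population size
    print(pop_size, math.isclose(substitution_rate(pop_size, mu), mu))

print(divergence_time(0.02, mu))  # 2% divergence -> about a million generations
```

Each population size prints `True`, which is the point of the slide: population size drops out, so neutral divergence ticks at the mutation rate.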
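The statement that most codon positions are nonsynonymous can be made concrete by counting, at each position of a codon, how many of the three possible single-base changes leave the amino acid unchanged. This is the site-counting idea behind Nei-Gojobori-style dS and dN estimates. The sketch builds the standard genetic code from a compact string; the helper names are hypothetical, RNA input (U) is converted to DNA (T), and changes to stop codons are simply counted as nonsynonymous.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def to_dna(seq):
    return seq.upper().replace("U", "T")

def translate(codon):
    return CODE[to_dna(codon)]

def is_synonymous(codon, position, new_base):
    """True if replacing the base at 0-based `position` leaves the amino acid unchanged."""
    codon = to_dna(codon)
    mutant = codon[:position] + to_dna(new_base) + codon[position + 1:]
    return CODE[codon] == CODE[mutant]

def synonymous_sites(codon):
    """Sum over positions of the fraction of the 3 possible changes that are synonymous."""
    codon = to_dna(codon)
    return sum(
        sum(is_synonymous(codon, pos, b) for b in BASES if b != codon[pos]) / 3
        for pos in range(3)
    )

for codon in ("UCA", "UCU", "UCG", "UCC", "AGA"):
    s = synonymous_sites(codon)
    print(codon, translate(codon),
          f"synonymous sites = {s:.2f}", f"nonsynonymous sites = {3 - s:.2f}")
```

All four serine codons from the slide have 1 synonymous site (every third-position change keeps serine) and 2 nonsynonymous sites, so two of the three positions are nonsynonymous.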
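Given tallies of synonymous and nonsynonymous sites and differences for a gene, ω = dN/dS can be sketched with simple counting. This applies the Jukes-Cantor correction for multiple hits to each proportion, in the spirit of the Nei-Gojobori counting method; it is a swapped-in simple approach, not the likelihood or Bayesian models (such as PAML) the slides recommend. The counts, function names, and the ±0.05 "neutral" band in `classify` are hypothetical; in practice ω = 1 is tested statistically.

```python
import math

def jukes_cantor(p):
    """Correct a proportion of differing sites p for multiple hits: d = -3/4 ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)

def omega(n_diffs, n_sites, s_diffs, s_sites):
    """dN/dS from nonsynonymous and synonymous differences and site counts."""
    d_n = jukes_cantor(n_diffs / n_sites)
    d_s = jukes_cantor(s_diffs / s_sites)
    return d_n / d_s

def classify(w, tol=0.05):
    """Apply the slide's rule; tol is an arbitrary band around 1 (hypothetical)."""
    if abs(w - 1) <= tol:
        return "neutral"
    return "purifying" if w < 1 else "positive"

# Hypothetical gene: 200 nonsynonymous sites, 100 synonymous sites
w = omega(n_diffs=4, n_sites=200, s_diffs=20, s_sites=100)
print(f"omega = {w:.3f} ({classify(w)})")
```

With these made-up counts ω comes out well below 1, the purifying-selection pattern the slides report for most genes.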
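The multiple-path complication for CGT(Arg) to AGA(Arg) can be enumerated directly: each ordering of the differing positions gives one mutational path, and counting methods such as Nei-Gojobori average the synonymous and nonsynonymous steps over those paths. In this sketch the function names are hypothetical, and paths passing through a stop codon are skipped.

```python
from itertools import permutations, product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def pathways(start, end):
    """Every order of single-base steps from start to end, skipping paths through stop codons."""
    diffs = [i for i in range(3) if start[i] != end[i]]
    paths = []
    for order in permutations(diffs):
        path, codon = [start], start
        for pos in order:
            codon = codon[:pos] + end[pos] + codon[pos + 1:]
            path.append(codon)
        if all(CODE[c] != "*" for c in path[1:-1]):
            paths.append(path)
    return paths

def count_changes(path):
    """(synonymous, nonsynonymous) steps along one path."""
    syn = sum(CODE[a] == CODE[b] for a, b in zip(path, path[1:]))
    return syn, len(path) - 1 - syn

paths = pathways("CGT", "AGA")
for path in paths:
    print(" -> ".join(f"{c}({CODE[c]})" for c in path), count_changes(path))

# Average over pathways, as in Nei-Gojobori counting
syn = sum(count_changes(p)[0] for p in paths) / len(paths)
nonsyn = sum(count_changes(p)[1] for p in paths) / len(paths)
print("synonymous:", syn, "nonsynonymous:", nonsyn)
```

One path is two nonsynonymous steps (via Ser) and the other two synonymous steps, so averaging credits the pair with 1 synonymous and 1 nonsynonymous difference. The observed endpoints alone cannot tell which path actually occurred.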
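The McDonald-Kreitman test reduces to a 2×2 table of divergence and polymorphism counts, each split into nonsynonymous and synonymous changes. The sketch below uses hypothetical counts and function names. It computes the neutrality index NI = (Pn/Ps)/(Dn/Ds) and α = 1 − NI, two commonly used summaries that are not on the slides, and tests the table with a hand-written two-sided Fisher's exact test so that no third-party package is needed.

```python
from math import comb

def neutrality_index(dn, ds, pn, ps):
    """NI = (Pn/Ps) / (Dn/Ds); values below 1 indicate excess nonsynonymous divergence."""
    return (pn / ps) / (dn / ds)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# Hypothetical counts for one gene
dn, ds = 10, 20   # fixed differences between species (nonsynonymous, synonymous)
pn, ps = 5, 40    # polymorphisms within species (nonsynonymous, synonymous)

ni = neutrality_index(dn, ds, pn, ps)
print("NI =", ni, "alpha =", 1 - ni)   # NI = 0.25 alpha = 0.75
print("p =", fisher_exact_two_sided(dn, ds, pn, ps))
```

NI < 1 (relatively more nonsynonymous divergence than polymorphism) points toward positive selection, and α = 1 − NI is often read as the fraction of nonsynonymous substitutions driven by it. NI > 1 (excess nonsynonymous polymorphism) instead suggests slightly deleterious variants held in check by purifying selection.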
Homology: similarity that is the result of inheritance from a common ancestor

Phylogenetic Tree Terms
• Group, cluster, clade
• Leaves, Operational Taxonomic Units (OTUs), terminal branches
• Nodes, interior branches
• Root
[Figure: tree with leaves A-J labeled with these terms]

Tree Topology
[Figure: trees of Bacteria 1-3 and Eukaryotes 1-4]
Newick notation: (Bacteria1,(Bacteria2,Bacteria3),(Eukaryote1,((Eukaryote2,Eukaryote3),Eukaryote4)))
Are these trees different? How about these? (http://helix.biology.mcmaster.ca)

Rooted versus Unrooted Trees
[Figure: an unrooted tree of archaea and eukaryotes, and the same tree rooted with bacteria as the outgroup, showing archaea and eukaryotes as monophyletic groups]

Rooting with D as outgroup; now with C as outgroup
[Figure: one unrooted tree of taxa A-G, rooted first with D and then with C as the outgroup]
Which of these four trees is different? (Baum et al.)

UPGMA Method
• Use all pairwise comparisons to build a dendrogram
• UPGMA: Unweighted Pair Group Method with Arithmetic Mean
• Hierarchically link the most closely related individuals
• Read the Lab 12 Introduction!
• Phenetics (distance) vs. cladistics (discrete character states) (Lowe, Harris, and Ashton 2004)

Parsimony Methods
• Based on underlying genealogical relationships among alleles
• Occam's Razor: the simplest scenario is the most likely
• Useful for depicting evolutionary relationships among taxa or populations
• Choose the tree that requires the smallest number of steps (mutations) to produce the observed relationships

Choosing Phylogenetic Trees
• MANY possible trees can be built for a given set of taxa
• Choosing among them is very computationally intensive (Lowe, Harris, and Ashton 2004)
• Numbers of possible unrooted ($U_n$) and rooted ($R_n$) bifurcating trees:
$U_n = \frac{(2n-5)!}{2^{n-3}(n-3)!}$
$R_n = \frac{(2n-3)!}{2^{n-2}(n-2)!} = (2n-3)\,U_n$
where n = number of taxa

Choosing Phylogenetic Trees (continued)
• Many algorithms exist for searching tree space
• Local optima are a problem: the search must traverse valleys to reach other peaks
• Heuristic search: cut trees up systematically and reassemble them
• Branch and bound: search for the optimal path through tree space
[Figure: scores of neighboring trees across tree space (Felsenstein 2004)]

Choosing Phylogenetic Trees (continued)
• If multiple trees are equally likely, select a majority-rule or strict consensus tree
• Strict consensus is the most conservative approach
• Bootstrap the data matrix (sample characters with replacement) to determine the robustness of nodes
[Figure: tree of taxa A-F with bootstrap support values of 60 at nodes (Lowe, Harris, and Ashton 2004; Felsenstein 2004)]
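One way to answer "Are these trees different?" is to compare the bipartitions (splits) of the taxa that each tree's internal branches induce. Two unrooted trees have the same topology exactly when their split sets match, regardless of how branches are rotated or where the root is placed, which also covers the rooting-with-different-outgroups slides. The Newick parser here is a minimal hypothetical helper that handles only leaf names and parentheses (no branch lengths, internal labels, or quoted names).

```python
def parse_newick(text):
    """Parse a simple Newick string (leaf names and parentheses only) into nested tuples."""
    s = "".join(text.split()).rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while s[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
            return tuple(children)
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]               # leaf name

    return node()

def splits(tree):
    """Nontrivial bipartitions, each stored as the side that lacks a fixed reference leaf."""
    def leaves(t):
        return frozenset([t]) if isinstance(t, str) else frozenset().union(*map(leaves, t))

    all_leaves = leaves(tree)
    ref = min(all_leaves)
    found = set()

    def walk(t):
        below = leaves(t)
        if 1 < len(below) < len(all_leaves) - 1:
            found.add(below if ref not in below else all_leaves - below)
        if not isinstance(t, str):
            for child in t:
                walk(child)

    walk(tree)
    return found

def same_unrooted_topology(newick_a, newick_b):
    return splits(parse_newick(newick_a)) == splits(parse_newick(newick_b))

slide = "(Bacteria1,(Bacteria2,Bacteria3),(Eukaryote1,((Eukaryote2,Eukaryote3),Eukaryote4)))"
rotated = "(((Eukaryote4,(Eukaryote3,Eukaryote2)),Eukaryote1),(Bacteria3,Bacteria2),Bacteria1)"
rerooted = "(Eukaryote4,((Eukaryote2,Eukaryote3),(Eukaryote1,(Bacteria1,(Bacteria2,Bacteria3)))));"
different = "(Bacteria1,(Bacteria2,Eukaryote1),(Bacteria3,((Eukaryote2,Eukaryote3),Eukaryote4)))"

print(same_unrooted_topology(slide, rotated))    # True
print(same_unrooted_topology(slide, rerooted))   # True
print(same_unrooted_topology(slide, different))  # False
```

The rotated and rerooted strings look different but describe the slide's tree; the last one moves Eukaryote1 next to Bacteria2 and so is a genuinely different topology.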
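The UPGMA steps can be sketched as follows: repeatedly merge the two closest clusters, place the new node at half their distance, and set the merged cluster's distance to every other cluster to the size-weighted arithmetic mean of its members' distances. The distance matrix and function name are hypothetical, and the output is a Newick string with branch lengths. Because each node sits at half the distance between the clusters it joins, UPGMA always produces an ultrametric tree, so it implicitly assumes a molecular clock.

```python
def upgma(dist, names):
    """Build a UPGMA tree from a symmetric distance matrix; returns Newick with branch lengths."""
    label = {i: name for i, name in enumerate(names)}    # cluster id -> Newick text
    size = {i: 1 for i in range(len(names))}             # cluster id -> number of taxa
    height = {i: 0.0 for i in range(len(names))}         # cluster id -> node height
    d = {frozenset((i, j)): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    new = len(names)
    while len(size) > 1:
        pair = min(d, key=d.get)                 # closest pair of current clusters
        i, j = sorted(pair)
        h = d[pair] / 2                          # new node sits halfway between them
        label[new] = f"({label[i]}:{h - height[i]:g},{label[j]}:{h - height[j]:g})"
        height[new], size[new] = h, size[i] + size[j]
        for k in size:
            if k not in (i, j, new):
                # Arithmetic mean of all pairwise taxon distances (size-weighted)
                d[frozenset((new, k))] = (size[i] * d[frozenset((i, k))]
                                          + size[j] * d[frozenset((j, k))]) / size[new]
        del size[i], size[j]
        d = {p: v for p, v in d.items() if i not in p and j not in p}
        new += 1
    (root,) = size
    return label[root] + ";"

names = ["A", "B", "C", "D"]
dist = [[0, 2, 6, 10],
        [2, 0, 6, 10],
        [6, 6, 0, 10],
        [10, 10, 10, 0]]
print(upgma(dist, names))  # (D:5,(C:3,(A:1,B:1):2):2);
```

A and B (distance 2) join first at height 1, C joins them at height 3, and D joins last at height 5.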
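The tree-count formulas can be evaluated exactly with integer arithmetic, which shows how quickly tree space grows: 10 taxa already allow 2,027,025 unrooted trees. The function names are hypothetical.

```python
from math import factorial

def unrooted_trees(n):
    """U_n = (2n-5)! / (2^(n-3) (n-3)!): unrooted bifurcating trees for n >= 3 taxa."""
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))

def rooted_trees(n):
    """R_n = (2n-3)! / (2^(n-2) (n-2)!) = (2n-3) U_n: rooted bifurcating trees for n >= 2 taxa."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

for n in (4, 10, 20, 50):
    print(f"{n:>3} taxa: {unrooted_trees(n):,} unrooted, {rooted_trees(n):,} rooted")
```

Exhaustive search is hopeless well before 50 taxa, which is why the heuristic and branch-and-bound strategies above are needed.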
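Parsimony scoring and bootstrapping can be combined in one small sketch. The number of steps on each candidate tree is counted with Fitch's algorithm (a standard method not named on the slides). The three candidates are all unrooted topologies for four taxa, written as rooted tuples; this is harmless because a tree's parsimony length does not depend on where it is rooted. Bootstrap support is the fraction of column-resampled alignments for which each tree is most parsimonious, with ties split evenly. The five-site alignment and function names are hypothetical.

```python
import random

def fitch(tree, states):
    """Fitch's algorithm on a rooted binary tree: (candidate root states, number of changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1   # one extra step needed

def parsimony_score(tree, alignment):
    """Total number of steps over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
               for i in range(length))

def bootstrap_support(trees, alignment, replicates=1000, seed=1):
    """Share of column-resampled alignments for which each tree is most parsimonious."""
    rng = random.Random(seed)
    length = len(next(iter(alignment.values())))
    support = {tree: 0.0 for tree in trees}
    for _ in range(replicates):
        cols = [rng.randrange(length) for _ in range(length)]   # sample with replacement
        resampled = {t: "".join(seq[c] for c in cols) for t, seq in alignment.items()}
        scores = {tree: parsimony_score(tree, resampled) for tree in trees}
        best = [tree for tree in trees if scores[tree] == min(scores.values())]
        for tree in best:
            support[tree] += 1 / len(best)                        # split ties evenly
    return {tree: s / replicates for tree, s in support.items()}

# Hypothetical alignment: sites 1-3 group A with B, sites 4-5 group A with C
alignment = {"A": "GACGA", "B": "GACTC", "C": "TCTGA", "D": "TCTTC"}
trees = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")), (("A", "D"), ("B", "C"))]

for tree in trees:
    print(tree, parsimony_score(tree, alignment))   # 7, 8, 10 steps
print(bootstrap_support(trees, alignment))
```

The A+B tree is most parsimonious (7 steps), but because two sites conflict with it, it wins only a share of the bootstrap replicates. That kind of partial support is what node values such as the 60s on the consensus-tree slide express.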