Download Molecular Evolution

Rates of Nucleotide Substitution
Dan Graur

r = rate of substitution per site per year
K = number of substitutions per site between two sequences
T = time since the two sequences diverged

$r = \frac{K}{2T}$

Mean Rate of Nucleotide Substitution in Mammalian Nuclear Genomes
Less than $10^{-9}$ substitutions/site/year
Evolution is a very slow process at the molecular level. Not much happens in evolution.

Substitution Rates in Protein-Coding Regions
The rate of synonymous substitution is much larger than the nonsynonymous rate.

[Figure labels: "A lot", "A little"]

Synonymous substitutions are more frequent than nonsynonymous ones.

Mean nonsynonymous rate = $0.75 \times 10^{-9}$ substitutions per site per year
Mean synonymous rate = $3.65 \times 10^{-9}$ substitutions per site per year
The synonymous substitution rate is about 5 times higher than the nonsynonymous substitution rate.
Coefficient of variation of nonsynonymous rate = 95%
Coefficient of variation of synonymous rate = 31%

[Figure: The distribution of KA to KS ratios in >13,000 orthologous protein-coding genes from human and chimpanzee]

58 nucleotide differences, 3 amino acid differences
In a comparison of human and yeast ubiquitin genes, the inferred number of synonymous substitutions per synonymous site is ~6 (almost certainly indicative of saturation). The inferred number of nonsynonymous substitutions per nonsynonymous site is 0.03. Thus, synonymous substitutions have accumulated at least 200 times faster than nonsynonymous substitutions.

[Figure: ratio values 1.5, 4.4, 1.1]

Substitution Rates in Noncoding Regions

Divergence between cow and goat β- and γ-globin genes and between cow and goat β-globin pseudogenes
______________________________________________
Region                        K
______________________________________________
5' Flanking region            5.3 ± 1.2
5' Untranslated region        4.0 ± 2.0
4-fold degenerate sites       8.6 ± 2.5
Introns                       8.1 ± 0.7
3' Untranslated region        8.8 ± 2.2
3' Flanking region            8.0 ± 1.5
Pseudogenes                   9.1 ± 0.9
______________________________________________

Coding regions evolve slower than noncoding regions.
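The arithmetic behind these slides can be checked with a short sketch. It assumes the standard relation r = K / (2T), where T is the time since the two sequences diverged (the slide text does not define T, so that reading is an assumption). The function name and the example values K = 0.1 and T = 50 million years are hypothetical and are not taken from the slides. The mean rates and the ubiquitin values are the ones quoted in the text.

```python
def substitution_rate(k, t_years):
    """Return r = K / (2T), in substitutions per site per year.

    The factor of 2 reflects that both lineages have been accumulating
    substitutions since they diverged T years ago.
    """
    if t_years <= 0:
        raise ValueError("divergence time must be positive")
    return k / (2.0 * t_years)


# Mean rates quoted in the text (substitutions per site per year).
MEAN_NONSYN_RATE = 0.75e-9
MEAN_SYN_RATE = 3.65e-9
syn_to_nonsyn = MEAN_SYN_RATE / MEAN_NONSYN_RATE  # a bit under 5

# Human-yeast ubiquitin values quoted in the text (substitutions per site).
UBIQUITIN_KS = 6.0
UBIQUITIN_KA = 0.03
ubiquitin_ratio = UBIQUITIN_KS / UBIQUITIN_KA  # about 200

# Hypothetical input, not from the slides: K = 0.1, T = 50 million years.
example_rate = substitution_rate(0.1, 50e6)

if __name__ == "__main__":
    print(f"synonymous / nonsynonymous mean rate: {syn_to_nonsyn:.2f}")
    print(f"ubiquitin KS / KA: {ubiquitin_ratio:.0f}")
    print(f"example r for K=0.1, T=50 My: {example_rate:.2e}")
```

Because the ubiquitin KS estimate is described as saturated, the ratio of 200 is best read as a lower bound, which is why the slide says "at least".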
Evolutionary Rate Profiles

Alignment: preproinsulin

    Xenopus  MALWMQCLP-LVLVLLFSTPNTEALANQHL
    Bos      MALWTRLRPLLALLALWPPPPARAFVNQHL

    Xenopus  CGSHLVEALYLVCGDRGFFYYPKIKRDIEQ
    Bos      CGSHLVEALYLVCGERGFFYTPKARREVEG

    Xenopus  AQVNGPQDNELDG-MQFQPQEYQKMKRGIV
    Bos      PQVG---ALELAGGPGAGGLEGPPQKRGIV

    Xenopus  EQCCHSTCSLFQLENYCN
    Bos      EQCCASVCSLYQLENYCN

[Conservation marks (*, :, .) under each block are not shown.]

Functional regions evolve slower than nonfunctional regions.

Rates of amino acid replacement in different proteins

Fibrinogen to Fibrin
• Fibrinogen consists of 6 chains: 2α, 2β, 2γ
• Fibrinopeptides are very negatively charged
• Fibrinopeptides A are cleaved first (to allow polymerization of fibrins)
• Fibrinopeptides B are cleaved second (to enhance crosslinking)

Important proteins evolve slower than unimportant ones.

Can we explain the different rates of substitution by the selectionist model?
1. Mutations can be either deleterious or advantageous.
2. If the fraction of advantageous mutations is large, the rate of evolution will be high. If the fraction of advantageous mutations is small, the rate of evolution will be low.
3. A mutation occurring at a functional site has a higher probability of being advantageous than a mutation occurring at a nonfunctional site.
Expectation: Important entities should evolve faster than less important ones.
Can we explain the different rates of substitution by the neutralist model?
1. Mutations can be either deleterious or neutral.
2. If the fraction of deleterious mutations is large, the rate of evolution will be low. If the fraction of deleterious mutations is small, the rate of evolution will be high.
3. A mutation occurring at a functional site has a higher probability of being deleterious than a mutation occurring at a nonfunctional site.
Expectation: Important entities should evolve slower than less important ones.

Kimura's First Law of Molecular Evolution
Functional entities evolve slower than entities devoid of function.

Functional constraint = Degree of intolerance towards mutations at a genomic location.
The functional constraint defines the range of alternative residues that are acceptable at a site without negatively affecting the fitness of the organism.

For neutral mutations: $K = v$, where $K$ is the rate of substitution and $v$ is the mutation rate.

Kimura's model of functional constraint
Suppose that a fraction, $f_0$, of all mutations are selectively neutral and the rest ($1 - f_0$) are deleterious. Advantageous mutations are assumed to occur only very rarely, such that their relative frequency is effectively zero.
If we denote by $v_T$ the total mutation rate per unit time, then the rate of neutral mutation, $v_0$, is

$v_0 = v_T f_0$

According to the neutral theory, the rate of substitution is:

$K = v_0$

Hence,

$K = v_T f_0$

The highest substitution rate is expected in sequences that do not have any function, such that all mutations are neutral ($f_0 = 1$).

An evolutionary experiment: Spalax ehrenbergi

αA-crystallin
In Spalax, αA-crystallin lost its functional role more than 25 million years ago, when the mole rat became subterranean and presumably lost use of its eyes. The αA-crystallin of Spalax evolves 20 times faster than the αA-crystallins in other rodents, such as rats, mice, hamsters, gerbils, and squirrels.

Additional Facts:
(1) The αA-crystallin of Spalax possesses all the prerequisites for normal function and expression, including the proper signals for alternative splicing.
(2) The αA-crystallin of Spalax evolves slower than pseudogenes.

Explanation 1: The αA-crystallin gene may not have lost all of its vision-related functions, such as photoperiod perception and adaptation to seasonal changes.
Contradicting evidence: The atrophied eye of Spalax does not respond to light.

Explanation 2: The blind mole rat lost its vision more recently than 25 million years ago. The rate of nonsynonymous substitution after nonfunctionalization has been underestimated.
Contradicting evidence: The αA-crystallin gene is still an intact gene as far as the essential molecular structures for its expression are concerned.

Explanation 3: The αA-crystallin gene product serves another function (unrelated to that of the eye). αA-crystallin is a multifunctional protein.
Supporting evidence:
1. αA-crystallin has been found in other tissues.
2. αA-crystallin also functions as a chaperonin that binds denaturing proteins and prevents their aggregation.
3. The regions within αA-crystallin responsible for chaperonin activity are conserved in the mole rat.
4. The protein has viable secondary and quaternary structures.

Genetic nonfunctionalization or partial nonfunctionalization accelerates evolution.
Most evolutionary "action" occurs after death.

The Concept of Functional Constraint
The intensity of purifying selection is determined by the degree of intolerance characteristic of a site or a genomic region towards mutations.
The functional or selective constraint defines the range of alternative nucleotides that is acceptable at a site without negatively affecting the function or structure of the gene or the gene product.
DNA regions in which a mutation is likely to affect function have a more stringent functional constraint than regions devoid of function.
The stronger the functional constraints on a macromolecule are, the slower its rate of substitution will be.

Functional density (Zuckerkandl 1976)
The functional density, $F$, of a gene is defined as $n_s/N$, where $n_s$ is the number of sites committed to specific functions and $N$ is the total number of sites. $F$, therefore, is the proportion of amino acids that are subject to stringent functional constraints.
The higher the functional density, the lower the rate of substitution is expected to be. Thus, a protein in which the active sites constitute only 1% of its sequence will be less constrained, and therefore will evolve more quickly, than a protein that devotes 50% of its sequence to performing specific biochemical or physiological tasks.

According to the neutral theory of evolution, the rate of substitution (as inferred from between-species comparisons) should positively correlate with the degree of genetic polymorphism (as inferred from comparisons among individuals within one species).
An interesting corollary of this hypothesis is that we should observe very little or no variation at the population level at evolutionarily conserved positions.
The variation observed at conserved positions should be mostly deleterious (i.e., associated with disease).

Substitution rates and disease: The case of Gaucher disease
Gaucher disease is an autosomal recessive lysosomal storage disorder due to deficient activity of an enzyme called acid β-glucosidase.
There are many subtypes of Gaucher disease, with fitness effects ranging from a slight reduction in fitness to perinatal lethality, in which death occurs between 154 days of gestation and seven days after birth.

β-glucosidase
We aligned the amino acid sequences of acid β-glucosidase from nine placental mammals (human, chimpanzee, Sumatran orangutan, bovine, pig, dog, horse, rat, and mouse). The length of the alignment (excluding one gap due to a codon deletion in the ancestor of mouse and rat) was 496 amino acids, of which 387 (78%) were identical in all nine species and 109 (22%) were variable.
Thirty-six single amino acid replacements (at 34 amino acid positions) resulting in Gaucher disease are described in the literature. Perinatal lethal mutations are shown in red.
All 36 deleterious mutations occur at completely conserved sites (below asterisks). The expectation under a random model is that only 36 × 0.78 ≈ 28 mutations should occur at completely conserved sites.
This statistically significant non-random association between disease and evolutionary conservation (p = 0.0002) indicates that invariable sites are conserved because they evolve under extremely stringent functional constraints and cannot tolerate change.

Q: What determines functional constraint?
A: Many factors.
Q: Example?
A: Interactions.

A network (or graph) is an abstract representation of a set of objects, where some objects are connected to one another. The objects are represented by vertices (or nodes), and the links that connect the vertices are called edges (or branches).
Edges can be polarized to indicate directionality and type of interaction (e.g., activation, inhibition). Edges can also be quantified to denote the extent of the effect.

Protein-protein interaction networks
(a) A simple example of a protein-protein interaction network consisting of five proteins (A-E), represented by the nodes, each of which interacts with at least one other protein. There are five interactions, denoted by the links.
In biological networks, three variables are usually studied:
(b) degree centrality or connectedness = the number of interactions for a protein.
(c) betweenness centrality = the number of times that a node appears on the shortest path between all pairs of nodes.
(d) closeness centrality = the mean number of links connecting a protein to all other proteins in the network.

Proteins with high connectedness evolve slowly. Proteins with low connectedness evolve fast.
Proteins with high betweenness evolve slowly. Proteins with low betweenness evolve fast.
Proteins with high closeness evolve slowly. Proteins with low closeness evolve fast.

Why do the rates of synonymous substitution vary from gene to gene?
(1) The variation represents stochastic fluctuations.
(2) The variation is due to deterministic factors on top of stochastic fluctuations.
(2.1) Variation in the rate of mutation among different regions of the genome.
(2.2) Selection operating on synonymous mutations.

Fact: There is a positive correlation between synonymous and nonsynonymous substitution rates in a gene.
Explanations:
(1) The rate of mutation varies along the genome and among genes (and hence some genes will have both high synonymous and high nonsynonymous rates of substitution).
(2) The extent of selection at synonymous sites is affected by the nucleotide composition at adjacent nonsynonymous positions.
(3) Both (1) and (2).
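Returning to the protein-protein interaction network measures defined earlier in this section, the three variables can be computed with a breadth-first search. This is a minimal sketch, not the method used for the slides. The edge list is hypothetical, because the slides state only that the network has five proteins (A-E) and five interactions. Betweenness is counted here in the usual fractional way (a pair with several equally short paths contributes the fraction that passes through the node), which is an assumption about how the slide's definition handles ties. The last measure follows the wording of the slide, the mean number of links to all other proteins. Many network libraries instead report the reciprocal of that mean under the name closeness.

```python
from collections import deque
from itertools import combinations

# Hypothetical five-protein network with five interactions.
EDGES = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]


def build_graph(edges):
    """Build an undirected adjacency map from a list of node pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bfs(graph, source):
    """Return (distance, number of shortest paths) from source to each node."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                paths[nb] = 0
                queue.append(nb)
            if dist[nb] == dist[node] + 1:
                paths[nb] += paths[node]
    return dist, paths


def centralities(graph):
    """Degree, betweenness, and mean distance; assumes a connected graph."""
    info = {n: bfs(graph, n) for n in graph}
    degree = {n: len(graph[n]) for n in graph}
    mean_distance = {
        n: sum(d for m, d in info[n][0].items() if m != n) / (len(graph) - 1)
        for n in graph
    }
    betweenness = {n: 0.0 for n in graph}
    for s, t in combinations(sorted(graph), 2):
        dist_s, paths_s = info[s]
        dist_t, paths_t = info[t]
        for v in graph:
            if v in (s, t):
                continue
            # v lies on a shortest s-t path when the two legs add up exactly.
            if dist_s[v] + dist_t[v] == dist_s[t]:
                betweenness[v] += paths_s[v] * paths_t[v] / paths_s[t]
    return degree, betweenness, mean_distance


degree, betweenness, mean_distance = centralities(build_graph(EDGES))

if __name__ == "__main__":
    for name in sorted(degree):
        print(name, degree[name], betweenness[name], mean_distance[name])
```

In this toy network, proteins B and D have the most interactions and sit on the most shortest paths, so under the correlations listed above they would be the ones expected to evolve most slowly.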
In the absence of positive Darwinian selection, the universal observation is that important sequences tend to evolve slower than less important ones.
The opposite, however, is not always true. That is, conserved regions in the genome may not always be important.
Defining "importance" is not a trivial undertaking.

Hurst and Smith (1999) tested the relationship between rate of substitution and dispensability (a proxy for importance).
Approximately two thirds of all knockouts of individual mouse genes give rise to viable fertile mice. These genes have been termed "non-essential," in contrast to "essential" genes, the knockouts of which result in death or infertility. It is predicted that non-essential genes will be subject to lesser intensities of purifying selection, and should therefore evolve faster than essential genes.
In a comparison of 74 non-essential genes with 64 essential ones, the rate of substitution was found not to correlate with the severity of the knockout phenotype.
To account for differences in function, Hurst and Smith (1999) restricted their analysis exclusively to neuron-specific genes, which have significantly lower rates of substitution than other genes. They could find no difference in the rate of substitution between 16 essential neuron-specific genes and 18 non-essential ones.

The functional role (if any) of ~98% of mammalian genomes remains undetermined.
Nóbrega et al. (2004) deleted two megabase-long sequences from the mouse genome: a 1,817,000-bp region mapping to mouse chromosome 3 and a 983,000-bp region mapping to chromosome 19. (Orthologous regions of about the same size are present on human chromosomes 1 and 10, respectively.)
Viable mice homozygous for the deletions were generated and were indistinguishable from wild-type littermates with regard to morphology, reproductive fitness, growth, longevity, and general homeostasis.
Further analysis of the expression of multiple genes bracketing the deletions revealed only minor expression differences between homozygous-deletion mice and wild-type mice.
The two deleted segments harbor 1,243 non-coding sequences conserved between humans and rodents (more than 100 base pairs, 70% identity). Yet, the deletion of so many sequences that have been conserved for such a long period of time (mouse-human divergence ≈ 100 million years) resulted in no reduction in fitness.
Conclusion I: There is potentially "disposable DNA" in the genomes of mammals.
Conclusion II: Sequence conservation may not necessarily indicate constraint.

Ahituv et al. (2007) removed from the mouse genome four ultraconserved elements (sequences of 200 base pairs or longer that are 100% identical among human, mouse, and rat). Remarkably, lines of mice homozygous for the four deletions were viable and fertile, and failed to reveal any developmental or phenotypic abnormalities.
These results indicate that extreme sequence conservation may not necessarily reflect extreme evolutionary constraint.
There must be forces other than selection that promote sequence conservation.
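The definition of an ultraconserved element given above (a run of at least 200 positions that is 100% identical among human, mouse, and rat) can be illustrated with a short scan over an alignment. This is a toy sketch only. The three sequences are invented, the function name is hypothetical, and the threshold is lowered to 5 so that the example fits on a few lines. It also assumes the sequences are already aligned to equal length, with "-" marking gaps.

```python
def conserved_runs(seqs, min_len):
    """Return (start, end) half-open intervals of alignment columns that are
    identical in every sequence, gap-free, and at least min_len long."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    runs = []
    start = None
    # Iterate one step past the end so a run reaching the last column closes.
    for i in range(length + 1):
        identical = (
            i < length
            and seqs[0][i] != "-"
            and all(s[i] == seqs[0][i] for s in seqs)
        )
        if identical:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


# Invented toy alignment; real ultraconserved elements use min_len = 200.
human = "ACGTACGTTTGACCA"
mouse = "ACGTACGATTGACCA"
rat = "ACGTACGTTTGACGA"

toy_runs = conserved_runs([human, mouse, rat], min_len=5)

if __name__ == "__main__":
    for start, end in toy_runs:
        print(start, end, human[start:end])
```

A scan like this only identifies conservation. As the deletion experiments above show, it cannot by itself establish that a conserved run is under functional constraint.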
