Peter J. Russell
CHAPTER 24 Molecular Evolution
Edited by Yue-Wen Wang, Ph.D., Dept. of Agronomy, National Taiwan University (NTU), Genetics 601 20000

1. Populations and genes change over evolutionary time. Molecular evolution examines DNA and proteins, addressing two types of questions:
   a. How do DNA and protein molecules evolve?
   b. How are genes and organisms evolutionarily related?
2. Population genetics focuses on changes between generations. Molecular evolution considers the hundreds or thousands of generations needed for speciation, over which small departures from Hardy-Weinberg equilibrium, random effects and slight differences in fitness can become very significant.
3. The development of techniques in molecular biology makes it possible to study molecular evolution, using genomes as historical records that can:
   a. Reveal the dynamics of evolutionary processes.
   b. Indicate the chronology of change.
   c. Identify phylogenetic relationships between organisms.

Patterns and Modes of Substitutions

Substitutions in Protein and DNA Sequences

1. Patterns of variation within homologous genes show that some amino acid substitutions are found more frequently than others. Substitutions often involve amino acids with similar chemical characteristics, supporting two evolutionary principles:
   a. Mutations are rare events.
   b. Most dramatic changes are removed by natural selection.
2. Chemically similar amino acids tend to have similar codons, so a change from one to another may result from a single mutation.
   a. Natural selection acting on this variation produces proteins optimized for role and environment.
   b. More substantial alterations of protein structure are likely to be deleterious and so be removed from the gene pool.

Sequence Alignments

1. Sequence comparison begins with alignment using computer algorithms based on the idea that the best alignments reflect true ancestral relationships.
   a.
      Matching nucleotides are interpreted as unchanged since a common ancestor.
   b. Substitutions, insertions and deletions can be identified.
   c. Gaps inserted to maximize the similarity between aligned sequences indicate the occurrence of insertions or deletions (indels).
2. Many alignments are possible between sequences, and algorithms typically maximize the number of matching amino acids or nucleotides while invoking the smallest possible number of indel events.

Substitutions and the Jukes-Cantor Model

1. When DNA sequences diverge, they begin to collect mutations. The number of substitutions (K) found in an alignment is widely used in molecular evolution analysis.
   a. If the alignment shows few substitutions, a simple count is used.
   b. If many substitutions have occurred, a simple count is likely to underestimate the number of substitution events, because multiple changes may have occurred at the same site (Figure 24.1).
2. Jukes and Cantor (1969) assumed that each nucleotide is equally likely to change into any other nucleotide, and created a mathematical model to describe multiple base substitutions.
   a. The rate of change to each of the other three nucleotides is designated α, so the overall rate of substitution for any given nucleotide is 3α.
   b. For example, if the beginning (t = 0) nucleotide was C, the probability (P) of the site still being C at the first time point (t = 1) is $P_C(1) = 1 - 3\alpha$.
   c. After more time has passed (t = 2), the probability is calculated from the equation $P_C(2) = (1 - 3\alpha)P_C(1) + \alpha[1 - P_C(1)]$. The first term covers a site that was still C at t = 1 and did not change; the second covers a site that was not C at t = 1 and changed back to C.
   d. The probability of that site containing C at any given time in the future is defined by the equation $P_C(t) = \tfrac{1}{4} + \tfrac{3}{4}e^{-4\alpha t}$.
3. As data became available a decade later, the observation that different mutations occur at different rates (e.g., transitions are more common than transversions) revealed oversimplifications in the Jukes-Cantor model.
4.
   The model provided a framework to estimate the actual number of substitutions (K) when multiple substitutions were possible.
   a. $K = -\tfrac{3}{4}\ln\left(1 - \tfrac{4}{3}p\right)$
   b. p is the fraction of nucleotides that differ in a simple count of sequence mismatches.
      i. When few mismatches are observed, p is small and the chance of multiple mutations at a given site is also small.
      ii. When many mismatches are counted, the actual number of substitutions is calculated to be even larger than the direct count.

Fig. 24.1 Two possible scenarios in which multiple substitutions at a single site would lead to underestimation of the number of substitutions that had occurred if a simple count was performed (t = time)
Peter J. Russell, iGenetics: Copyright © Pearson Education, Inc., publishing as Benjamin Cummings.

Rates of Nucleotide Substitutions

1. The number of substitutions in homologous sequences since divergence is central to molecular evolution analysis.
   a. The number of substitutions per site (K), coupled with the divergence time (T), is converted to a rate (r) of substitution by the equation $r = K/(2T)$.
   b. Substitutions are assumed to accumulate simultaneously and independently in both species, which is why T is doubled.
2. Substitution rate comparison provides insight into the mechanisms of molecular change and evolutionary events.

Variation in Evolutionary Rates Within Genes

1. Studies show that different regions of genes evolve at different rates.
2. Distinctions are seen between and within coding and noncoding regions. Examples of noncoding regions include introns, leaders and trailers, nontranscribed flanking regions and pseudogenes.
3. Even within the coding region, not all nucleotide substitutions create changes in the gene product (e.g., a substitution at the third position of a codon may produce a synonymous codon).

Synonymous and Nonsynonymous Sites

1.
   Different gene regions evolve at different rates (Table 24.1).
2. Synonymous changes, which do not alter the amino acids in the protein, are found five times more often than nonsynonymous changes.
   a. Both types of change are equally likely to occur, but nonsynonymous changes are usually detrimental to fitness and are eliminated by natural selection.
   b. This creates a distinction between mutations and substitutions:
      i. Mutations are changes in nucleotide sequences due to errors in replication or repair.
      ii. Substitutions are mutations that have passed through the filter of selection.
   c. Synonymous substitution rates probably reflect the actual mutation rate in the genome. Nonsynonymous substitution rates do not.

Flanking Regions

1. Changes in 3' flanking regions have no known effect on amino acid sequence and little effect on gene expression, so most are tolerated by natural selection.
2. Introns have rates of change higher than exons, but not as high as 3' flanking regions, because they need to retain:
   a. Sequences required at splice junctions and branch points.
   b. In some cases, alternative ORFs used by alternative splicing that takes place in some tissues but not others.
3. The 5' flanking regions have lower rates of change than 3' flanking regions, due to the presence of promoters and other gene regulatory elements. Small changes in these sequences may have a large effect on protein production, and so be subject to natural selection.
4. Leader and trailer regions have lower rates than the 5' flanking region, because they contain signals for processing and translation of mRNA.
5. Nonsynonymous sites in coding sequences have the lowest rate of change, because most protein-coding sequences produce products optimized for their role and environment. Most substitutions are eliminated by natural selection.

Pseudogenes

1. Nonfunctional pseudogenes have the highest rate of evolution seen.
   Because pseudogenes no longer encode proteins, changes in them do not affect fitness and are not eliminated by natural selection.
2. Between mice and humans, for example, pseudogenes show about five times as many changes as regions that encode proteins or regulate gene expression.
3. Natural selection evaluates the consequences of an enormous number of changes, on an evolutionary time scale.

Codon Usage Bias

1. Codon bias is an example of the effect of small changes over many generations.
   a. The slightly lower rate of evolutionary change at synonymous sites compared with pseudogenes suggests that some triplet codons are favored over others.
   b. Sequence data show that synonymous codons are not used equally throughout the coding sequences of an organism. Leucine codons are an example:
      i. There are six codons that specify leucine (UUA, UUG, CUU, CUC, CUA and CUG).
      ii. 60% of the leucine codons in a bacterium are CUG.
      iii. In yeast, 80% of the leucine codons are UUG.
2. Selection appears to favor some codons over others. Proposed reasons why one codon would be more successful include:
   a. Synonymous codons may be recognized by different tRNAs, and some may be favored because their cognate tRNA is more abundant or efficient.
   b. Bonding energy between the tRNA and codon may differ, due to differences in base pairs.
3. Selective pressure acting on translation efficiency and/or bonding energy appears to be especially significant in:
   a. Genes that are expressed at high levels.
   b. Organisms with short generation times and large populations (e.g., bacteria, yeast and fruit flies).

Variation in Evolutionary Rates Between Genes

1. Striking differences also occur in the rate of gene evolution within a species. The difference results from one or both of these factors:
   a. Differences in substitution frequency.
   b. The action of natural selection on a locus.
2.
   Distinguishing between adaptive and random changes in nucleotide sequences requires statistical analysis.
   a. An example is the McDonald-Kreitman test (1991), which compares within-species polymorphism with between-species divergence at synonymous and nonsynonymous sites in a gene.
      i. If the ratio of nonsynonymous to synonymous changes within a species is the same as that between species, the substitutions are likely to be neutral.
      ii. If the ratios are not the same, natural selection must be responsible.
   b. In this analysis applied to mammals:
      i. Synonymous substitution rates usually differ by a factor of less than two.
      ii. Nonsynonymous substitution rates show about a 1,000-fold difference between different classes of genes (Table 24.2).
   c. Variations in substitution rates between genes must also be largely due to differences in the intensity of natural selection at each locus.
   d. An example is two classes of genes, histones and apolipoproteins, which have different levels of functional constraint.
      i. Histones are essential DNA-binding proteins in eukaryotes, and most substitutions decrease their ability to bind DNA. Histones are thus very slow to evolve, and are highly conserved across species.
      ii. Apolipoproteins nonspecifically bind lipids in their hydrophobic domains and carry them in the blood. The hydrophobic domains work with any similar hydrophobic amino acids.
3. Amino acid substitutions are generally deleterious, but sometimes natural selection favors variability. Genes of the mammalian major histocompatibility complex (MHC) are an example.
   a. The MHC genes are under pressure to diversify, and show a greater rate of nonsynonymous substitutions than synonymous ones.
   b. MHC is a large multigene family involved in the immune system's ability to recognize foreign antigens.
      i. About 90% of humans receive different sets of MHC genes from each parent.
      ii.
         A sample of 200 humans will have 15–30 different alleles.
   c. MHC diversity reduces the risk of an entire population being vulnerable to infection with a single virus. It also drives viral evolution, which is much faster than that for mammalian genes, due to error-prone replication and diversifying selection.

Rates of Evolution in Mitochondrial DNA

1. Organelle genomes are distinct from nuclear genomes in replication, transmission and their increased rate of substitutions.
   a. Mammalian mitochondria have about 15 kb of circular dsDNA (mtDNA).
   b. Human mtDNA encodes two rRNAs, 22 tRNAs and 13 proteins.
2. Mammalian mitochondrial genes have an average synonymous substitution rate about 10 times the average for nuclear genes. Possible reasons for the increased rate include:
   a. Lack of proofreading during replication.
   b. Differences in DNA repair.
   c. Higher levels of mutagens (e.g., oxygen free radicals) due to metabolic processes.
   d. Lack of selective pressure, since most cells contain several dozen mitochondria.
3. Mammalian mtDNA is inherited almost exclusively from the mother, and does not undergo meiosis, so all offspring have the maternal mtDNA genotype. Tracing maternal lines via mtDNA is a valuable tool in population genetics.

Fig. 24.2 Lineage relationships among mtDNA types in pocket gophers

Molecular Clocks

1. Genes with similar functions can show very uniform rates of molecular evolution over long periods of time.
2. This led Zuckerkandl and Pauling (1960s) to suggest that amino acid changes accumulate at a constant rate over many tens of millions of years, functioning as a molecular clock that measures divergence from a common ancestor.
3. The molecular clock runs at different rates in different proteins.
   Comparison of the divergence between two homologous proteins correlates well with time since speciation. This allows calculation of:
   a. Phylogenetic relationships between species.
   b. The time of their divergence (in much the same way as radioactive decay is used to date geological times).
4. The molecular clock hypothesis has been challenged on the basis of:
   a. Inconsistencies with morphological (classical) evolution, based on a fossil record which has a more erratic tempo.
   b. Questions about the uniformity of evolutionary rates in all genes.

Fig. 24.3 The molecular clock runs at different rates in different proteins

Relative Rate Tests

1. Divergence dates from the fossil record are of questionable accuracy, and so a method to estimate the overall rate of substitution in different lineages without knowing their divergence date was devised by Sarich and Wilson (1973).
   a. To determine the relative rate of substitution for two species, a third species less closely related to either is used as an outgroup (e.g., if humans and gorillas are compared, the outgroup might be a baboon or other primate) (Figure 24.4).
   b. The number of substitutions between any two species is assumed to be the sum of the number of substitutions along the branches of the tree connecting them.
   c. Simple algebra is used to calculate the amount of divergence that has taken place since the two species last shared a common ancestor.
2. As DNA sequence data have become available, the molecular clock premise has been tested.
   a. Substitution rates are similar in rats and mice.
   b. Substitution rates in humans and apes are about half as rapid as those in rodents.
   c. The molecular clock clearly varies among taxonomic groups, complicating the use of molecular divergence to date the last common ancestor.
   d. In groups with a uniform clock (e.g., rodents) this model is useful.

Fig. 24.4 Phylogenetic tree used in a relative rate test

Causes of Variation in Rates

1. Some possible explanations for the observed differences in evolutionary rates:
   a. Generation time varies greatly between species. Substitution rates should be related more closely to the number of germ line replications than to simple divergence times.
   b. Other differences in the lines since the time of divergence may be involved. These include:
      i. Average repair efficiency.
      ii. Average exposure to mutagens.
      iii. Opportunities to adapt to new ecological niches and environments.

Molecular Phylogeny

1. Evolution is defined as genetic change in the face of selective dynamics, and so genetic relationships are key to understanding evolutionary relationships.
2. Organisms that are similar at the molecular level are expected to be more closely related than dissimilar organisms. Phylogenetic relationships among living things are inferred from molecular similarity.
   a. Before molecular biology, phenotype was used for evolutionary studies to infer genetic information.
   b. Original studies used gross anatomy. Later, behavioral, ultrastructural and biochemical traits were also used.
   c. Evolutionary trees were constructed for many groups of plants and animals, and these continue to provide a basis for evolutionary study.
   d. Phenotypes can be misleading, because they do not always reflect genetic relatedness.
      i. Sometimes similarities result from convergent evolution, complicating the study of divergence among organisms (e.g., wings alone would put birds, bats and insects in the same evolutionary group).
      ii.
         Not all organisms have easily studied phenotypic features (e.g., bacteria).
      iii. Among distant relatives (e.g., humans and bacteria), few phenotypic features are shared, and it is difficult to determine how such species should be compared.
3. Molecular evolution provides important information, because the effects of natural selection are generally less pronounced at the DNA sequence level.
4. Comparison of molecular and morphological phylogenies is valuable for examining the effect of natural selection on phenotypic differences at levels from molecular to gross anatomical.

Phylogenetic Trees

1. Phylogenetic trees are used in phylogenetic reconstructions to describe the relationship between species.
   a. All living things on earth shared a common ancestor about 4 billion years ago.
   b. Every phylogenetic tree uses branches that connect adjacent nodes.
      i. Terminal nodes indicate taxa for which molecular information is available.
      ii. Internal nodes represent common ancestors of the two (or more) groups.
   c. Branch length may be scaled to show the amount of divergence between taxa.
   d. If all nodes on the tree have a common ancestor, it is possible to make it a rooted tree, indicating an evolutionary path.
      i. Unrooted trees show a relationship between nodes, and do not indicate an evolutionary path.
      ii. Roots for unrooted trees can usually be determined by using an outgroup for comparison.
      iii. In a situation where only three taxa are considered, there are three possible rooted trees, and only one unrooted tree (Figure 24.5).

Fig. 24.5 The relationship between three taxa can be described by only one unrooted tree but three different rooted trees

Number of Possible Trees

1.
   As more taxa are considered, the number of possible trees quickly becomes enormous (Table 24.3).
2. The number of trees can be determined for any number of taxa (n).
   a. For rooted trees ($N_R$) the equation is: $N_R = \dfrac{(2n - 3)!}{2^{n-2}(n - 2)!}$
   b. For unrooted trees ($N_U$) the equation is: $N_U = \dfrac{(2n - 5)!}{2^{n-3}(n - 3)!}$
3. The value for n can be as large as every species, or even every individual.

Gene Versus Species Trees

1. A gene tree is a phylogenetic tree based on divergence within a single homologous gene.
   a. A gene tree represents the history of the gene, but not necessarily the history of the species.
   b. Species trees usually analyze data from multiple genes.
2. Divergence within genes typically occurs prior to speciation.
   a. This means that members of separate groups may be more similar to each other than they are to members of their own population.
   b. Divergence is especially high for loci where diversity is advantageous (e.g., MHC). On the basis of MHC alone, many humans would be grouped with gorillas rather than other humans, because the polymorphism predates the split in the two lineages.

Fig. 24.6 Transspecies or shared polymorphism may occur if the ancestor was polymorphic for two or more alleles and if alleles persist to the present in both species

Reconstruction Methods

1. Many possibilities exist for phylogenetic trees, and it is generally impossible to know which is the true tree that represents actual events in evolution. Most phylogenetic trees generated with molecular data are considered inferred trees.
2. Computer algorithms that generate these inferred trees use two types of approaches, distance matrix and parsimony-based methods.

Distance Matrix Approaches to Phylogenetic Reconstruction

1.
   Distance matrix approaches group organisms on the basis of overall similarity.
   a. The unweighted pair-group method with arithmetic mean (UPGMA) is based on statistics, and requires data that can be condensed to a measure of genetic distance between all pairs of taxa being considered.
      i. Pairwise distances are calculated for all of the taxa.
      ii. UPGMA begins by clustering the two taxa with the smallest distance, combining them into one composite group.
      iii. Then a new distance matrix is computed between the group and the remaining taxa, and the taxa separated by the smallest distance are clustered.
      iv. The process repeats until all taxa are grouped.
   b. If branch lengths represent evolutionary distance, branch points are positioned halfway between the taxa being grouped.
2. The distance matrix approach works well with either morphological or molecular data, as well as combinations of both, and takes all data into account.
3. The UPGMA approach assumes a constant rate of molecular evolution in all lineages, which is probably not accurate. Several alternative matrix-based approaches (e.g., transformed distance and neighbor-joining methods) incorporate different evolutionary rates for different lineages.

Parsimony-Based Approaches to Phylogenetic Reconstruction

1. Parsimony approaches group organisms to minimize the number of substitutions since the last shared ancestor.
   a. The underlying principle is that mutations are rare events, and so the tree that invokes the fewest mutations is considered best (the tree of maximum parsimony).
   b. This approach focuses only on sequence positions that favor one tree over another with regard to number of substitutions (informative sites), rather than on all sites equally (Figure 24.7).
   c. For a site to be informative, it has to have at least two different nucleotides, and each nucleotide has to be present at least twice in the array of sequences considered.
2.
   Maximum parsimony trees are constructed by identifying all informative sites within an alignment, and then determining which unrooted tree invokes the fewest mutations at these sites.
   a. This also produces inferred ancestral sequences at each node of the tree, filling in for “missing links” and providing insight into ancestral organisms.
   b. The approach assumes that all mutations are equally likely, although more complex algorithms consider the differing probabilities of transitions and transversions.

Fig. 24.7 Three different unrooted trees describe all possible relationships between four taxa

Bootstrapping and Tree Reliability

1. Large numbers (e.g., ≥ 30 species) of long sequences are difficult to analyze, even with fast computers and streamlined algorithms.
2. Neither distance matrix nor maximum parsimony methods can guarantee the correct tree, but generally, if a similar tree results from both of these fundamentally different methods, it is considered to be fairly reliable.
3. The confidence level for portions of inferred trees can be determined by bootstrap tests, in which a subset of the original data is drawn with replacement and a new tree inferred.
   a. When this is repeated hundreds or thousands of times, and the same groupings usually emerge, these parts of the tree are well supported.
   b. The fraction of similar groupings is placed next to the nodes in bootstrapped trees to convey the confidence in that part of the tree.

Phylogenetic Trees on a Grand Scale

1. Sequence data have provided new insights into the evolutionary relationships underlying the primary divisions of life.
   a. The simple dichotomy of plants and animals was revised as more organisms were discovered.
   b.
      The basic division became prokaryotes and eukaryotes, even though grouping by the absence of structures (e.g., internal membranes) is recognized as a poor way to construct taxonomic groupings.
2. More recently, Whittaker proposed five kingdoms:
   a. Prokaryotes.
   b. Protista.
   c. Plants.
   d. Fungi.
   e. Animals.

The Tree of Life

1. DNA and RNA sequences were first used for phylogenetic purposes in the mid-1980s.
   a. Woese and Pace constructed an evolutionary tree based upon 16S rRNA sequences, because homologs are found in all organisms, as well as in mitochondria and chloroplasts (Figure 24.8).
   b. The tree showed three major groups:
      i. Eubacteria, including traditional bacteria, mitochondria and chloroplasts.
      ii. Eukaryotes.
      iii. Archaebacteria, including thermophiles and other extremophiles.
   c. Archaebacteria and eubacteria, although both prokaryotes, were as different genetically as eubacteria are from eukaryotes.
2. Later work comparing other genes (e.g., 5S rRNAs, large rRNAs and genes for fundamental proteins) supports this phylogeny, and also shows that eukaryotic mitochondrial and chloroplast genes have different origins than their nuclear counterparts (Box 24.1).

Fig. 24.8 An evolutionary tree of life revealed by comparison of 16S rRNA sequences

Human Origins

1. DNA sequence analysis is also used to understand human evolution, because differences between human populations are relatively small.
   a. For example, mtDNA differs by only about 0.33% between two human populations, while other primates have much larger differences (e.g., subspecies of orangutans at 5%).
   b. The greatest differences are not separated by continents, but are found in Africa.
      i.
         The differences found between African populations are greater than those seen between Africans and humans on other continents.
      ii. This is generally believed to mean that humans arose in Africa, developed diverse populations, and then migrated to other continents (the “out of Africa” theory).
      iii. All humans alive today are believed to carry mtDNA derived from an African ancestor, and all males to have a Y chromosome from the same source.
      iv. These analyses set the date of divergence for humans at about 200,000 years ago, although the out-of-Africa theory is not universally accepted.
2. DNA sequence data are increasingly important in the study of evolution of humans and other organisms.

Acquisition/Origins of New Functions

1. Haldane (1932) suggested that spare copies of existing genes could give rise to new genes.
2. This appears to account for most new genes, although other mechanisms (e.g., transposons) also exist.

Multigene Families

1. Eukaryotes often have tandemly arrayed copies of genes with very similar sequences (multigene families) that appear to be the result of gene duplication.
2. The human globin genes are a classic example of a multigene family, with a general distribution of seven α-like genes on chromosome 16, and six β-like genes on chromosome 11.
   a. Globin-like genes are found in many animals and even plants, suggesting an ancient origin.
   b. Animal globin genes have the same general structure (three exons and two introns), but their number and order vary among species (Figure 24.9).
   c. Sequence and structure suggest duplication of an ancestral gene, which diverged to produce the α-like and β-like genes.
   d. Duplication and divergence would then produce the modern α-like and β-like gene groups.
3. Variation in globin gene number and distribution found in modern humans suggests that duplication and deletion of genes is an ongoing process still operating today.
   a.
      Duplications and deletions may result from unequal crossing-over.
   b. Duplications may also arise through transposition.

Fig. 24.9 Organization of the globin gene families in several mammalian species

Gene Duplication and Gene Conversion

1. Duplication frees a copy of the sequence to undergo changes, since a functional copy will still exist.
   a. Most changes would produce less functional products, or even nonfunctional pseudogenes.
   b. A few changes, however, might alter function and/or pattern of expression to something more advantageous for the organism. Selection would allow these genes to become widespread in the population.
2. Misalignment between a pseudogene and a functional copy can result in gene conversion through recombination events, giving the organism even more opportunities to create a gene with a new function.
3. Gene conversion continues to operate in modern humans. An example is the two genes for red-green color vision on the X chromosome, which undergo gene conversion in most of the known cases of spontaneous deficiencies in green color vision.

Arabidopsis Genome Results

1. More gene duplications are being found as genomic sequencing projects proceed. An example is Arabidopsis thaliana (thale cress), the first plant genome to be completely sequenced.
2. The Arabidopsis genome appears to have undergone significantly less duplication than other commercially important plants, but still, more than half of its genes are duplicates.

Domain (Exon) Shuffling

1. Often, less than an entire gene is duplicated, resulting in copies of protein domains.
   a. An example is human serum albumin, whose gene has three copies of a 195-amino-acid domain.
   b.
      Internal duplication is not a rapid method of producing proteins with new functions, however.
2. Most complex proteins arise from assemblages of several protein domains performing different functions (e.g., substrate binding or membrane spanning).
   a. The beginnings and ends of exons and protein domains often correspond.
   b. Gilbert (1978) proposed that most gene families today arose through domain shuffling involving duplication and rearrangement of domains (usually encoded by single exons).
   c. The domain shuffling hypothesis proposes that introns were a feature of early life on earth, even though they are now missing from prokaryotes.
   d. Numerous examples of complex genes made from segments of other genes are known, and clearly some novel functions have been created in this way.
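
Worked example: the quantitative formulas given in this chapter (the Jukes-Cantor probability and correction K, the substitution rate r = K/(2T), and the tree counts N_R and N_U) can be checked numerically. The following is a minimal Python sketch, not part of the original slides. The function names and the input values in the demonstration (p = 0.30, T = 10 million years) are hypothetical and chosen only for illustration.

```python
import math


def jukes_cantor_distance(p):
    """Estimate substitutions per site: K = -(3/4) ln(1 - (4/3) p).

    p is the observed fraction of mismatched sites in an alignment.
    """
    if not 0.0 <= p < 0.75:
        # At p >= 3/4 the logarithm is undefined under this model.
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def prob_same_nucleotide(alpha, t):
    """P_C(t) = 1/4 + (3/4) e^(-4 alpha t) for a site that started as C."""
    return 0.25 + 0.75 * math.exp(-4.0 * alpha * t)


def substitution_rate(k, divergence_time):
    """r = K / (2T); the factor 2 accounts for change in both lineages."""
    if divergence_time <= 0:
        raise ValueError("divergence_time must be positive")
    return k / (2.0 * divergence_time)


def rooted_trees(n):
    """N_R = (2n - 3)! / (2^(n-2) (n - 2)!), for n >= 2 taxa."""
    if n < 2:
        raise ValueError("rooted trees need at least 2 taxa")
    return math.factorial(2 * n - 3) // (2 ** (n - 2) * math.factorial(n - 2))


def unrooted_trees(n):
    """N_U = (2n - 5)! / (2^(n-3) (n - 3)!), for n >= 3 taxa."""
    if n < 3:
        raise ValueError("unrooted trees need at least 3 taxa")
    return math.factorial(2 * n - 5) // (2 ** (n - 3) * math.factorial(n - 3))


if __name__ == "__main__":
    p = 0.30   # hypothetical: 30% of aligned sites differ
    T = 10e6   # hypothetical divergence time in years
    k = jukes_cantor_distance(p)
    print(f"observed p = {p:.2f}, corrected K = {k:.4f}")
    print(f"rate r = {substitution_rate(k, T):.3e} substitutions per site per year")
    for n in (3, 4, 10):
        print(f"n = {n}: rooted = {rooted_trees(n)}, unrooted = {unrooted_trees(n)}")
```

For three taxa the functions return three rooted trees and one unrooted tree, in agreement with Figure 24.5, and for four taxa they return three unrooted trees, in agreement with Figure 24.7. The corrected K is always at least as large as the raw mismatch fraction p, which is the point of the Jukes-Cantor correction.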