Evolution by Gene Duplication

IB404 - 25 – Theories – April 23
Gene duplication is such an obvious and common event that ideas about it are seldom credited to single individuals; indeed, they started with the great evolutionary geneticists Haldane and Fisher in the 1930s. Susumu Ohno, however, championed the importance of gene duplication early on. Ohno moved from Japan to the US in 1950, worked at the City of Hope Hospital in Los Angeles, California, and published occasional reviews on the subject until his death in 2000. After early work on the cytogenetics of mammals and birds, in 1970 he published an influential book titled Evolution by Gene Duplication. In it he argued that after a gene duplication event, one of the two duplicates would in most cases be lost: it would become a pseudogene, inactivated by deleterious mutations such as nonsense mutations (premature stop codons), frameshifting indels, or changes to crucial amino acids. Occasionally, however, both duplicates would remain active long enough to start diverging, and one might acquire a slightly different function from the other; through selection for this new function it would eventually diverge significantly from the original. This is now known as the neofunctionalization model. Many gene duplicates seem to follow this model, such as the odorant receptors and the P450s. Perhaps the most obvious feature of such cases is that one duplicate diverges more rapidly than the other, and the rapidly diverging duplicate is thought to have acquired the new function. An example is the VKORC1 gene, which is the more rapidly evolving duplicate in vertebrates, while its paralog VKORC1L1, of unknown function, is far more conserved and hence might better reflect the original role of this gene/protein.
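One way to decide which duplicate is "diverging more rapidly" is to compare each paralog with a single-copy ortholog from an outgroup and count the changes unique to each lineage. The sketch below is a simplified, hypothetical illustration of that relative-rate idea, not a formal statistical test: the ten-residue sequences are made up (they are not VKORC1 data), and the function names are illustrative.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites that differ (sequences must be pre-aligned)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return diffs / len(seq1)


def unique_changes(outgroup: str, para1: str, para2: str) -> tuple[int, int]:
    """Count sites where only one paralog differs from the outgroup.

    Sites where both paralogs differ from the outgroup are ignored, because
    such changes cannot be assigned to a single lineage.
    """
    only1 = sum(1 for o, a, b in zip(outgroup, para1, para2) if a != o and b == o)
    only2 = sum(1 for o, a, b in zip(outgroup, para1, para2) if b != o and a == o)
    return only1, only2


# Hypothetical toy alignment, for illustration only.
outgroup = "MAVLSRLLAG"
conserved = "MAVLSRLIAG"   # stands in for a slowly evolving paralog
fast = "MSVLTRFIAN"        # stands in for a rapidly evolving paralog

print(p_distance(outgroup, conserved))            # 0.1
print(p_distance(outgroup, fast))                 # 0.5
print(unique_changes(outgroup, conserved, fast))  # (0, 4)
```

Real analyses would make the same kind of comparison on a proper multiple alignment, with a correction for multiple substitutions and a statistical test of whether the two counts differ.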
In 2000 an alternative model was proposed by Michael Lynch, who was an assistant professor here in the Ethology, Ecology, and Evolution department (EEE, now Animal Biology) before moving to the University of Oregon in Eugene; he is now at Indiana University in Bloomington. Lynch proposed that, instead of rapidly acquiring a new function, most surviving gene duplicates partition the original functions of the single ancestral gene, a model known as subfunctionalization. He thinks about this model very broadly; for example, the partitioning could simply involve expression in slightly different, and still overlapping, regions. The central idea is that once expression or function has been partitioned, each copy becomes important for the organism to survive and compete; that is, the copies complement each other, so both are retained and can diverge further as time passes. Notice that this model, at least initially, relies only on degenerative mutations, e.g. deletion or mutation of different enhancer elements in each duplicate, which are presumably far more likely to happen than mutations giving one copy a novel function.
Lynch worked with zebrafish colleagues at Oregon on one example involving the engrailed gene. Zebrafish have two copies (presumably resulting from their extra polyploidization event) that appear to partition the usual expression pattern of this gene in mammals: one copy is expressed in the developing pectoral fin bud, while the other is expressed in the hindbrain and spinal cord. The single mammalian gene is expressed in both places (most vertebrates have a single copy of this crucial gene, so the other three copies from the 2R event at the base of vertebrates must have been lost).
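The logic of complementary, degenerative loss can be sketched with sets. In this toy model, which is an assumption for illustration rather than Lynch's own formulation, each gene copy is represented by the set of expression domains its remaining enhancers still drive; the domain names are hypothetical labels loosely based on the engrailed example.

```python
from enum import Enum


class Fate(Enum):
    REDUNDANT = "one copy can be lost"      # one copy alone covers everything
    SUBFUNCTIONALIZED = "both copies kept"  # copies complement each other
    DEFICIENT = "ancestral function lost"   # some domain is no longer covered


def duplicate_fate(ancestral: set[str], copy_a: set[str], copy_b: set[str]) -> Fate:
    """Classify a duplicate pair by which ancestral expression domains each retains."""
    if ancestral <= copy_a or ancestral <= copy_b:
        # Either copy alone does the whole ancestral job, so the other is free
        # to accumulate degenerative mutations and become a pseudogene.
        return Fate.REDUNDANT
    if ancestral <= (copy_a | copy_b):
        # Neither copy suffices alone, but together they cover every domain,
        # so selection now maintains both.
        return Fate.SUBFUNCTIONALIZED
    return Fate.DEFICIENT


# Hypothetical domain labels inspired by the zebrafish engrailed example.
ancestral = {"pectoral_fin_bud", "hindbrain_spinal_cord"}
print(duplicate_fate(ancestral, {"pectoral_fin_bud"}, {"hindbrain_spinal_cord"}))
# Fate.SUBFUNCTIONALIZED
print(duplicate_fate(ancestral, set(ancestral), {"hindbrain_spinal_cord"}))
# Fate.REDUNDANT
```

The sketch only classifies end states; a fuller treatment would also model how likely each degenerative mutation is, which is what makes subfunctionalization a plausible route to retention.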
The HOX complex provides many possible examples of subfunctionalization: genes duplicated either in tandem along the complex, or in duplicated copies of the whole complex, have come to be expressed in different spatial patterns from anterior to posterior (for the tandem copies along the complex), or in different tissues, e.g. nerves versus muscles (in vertebrates, which have four HOX complexes).
Although it remains unclear how duplications of single genes initially occur, if they commonly occur as tandem repeats, which seems to be the case, then it is easy to see how tandem arrays of genes can be generated by unequal crossing over. A classic example is provided by the red/green opsins in our genome. These two recently duplicated genes (paralogs) are 96% identical in DNA sequence and lie in tandem on the X chromosome, and they commonly undergo unequal recombination, such that one resulting chromosome carries only a single copy, of either red or green, while the other chromosome has an extra copy of red or green, or a hybrid of the two genes. Males who inherit a chromosome lacking either red or green are red/green color-blind. It turns out that most of us carry various combinations of duplicated versions of these two genes, but as long as we have at least one copy of red and one of green we are not red/green color-blind.
We routinely see these kinds of tandem arrays in any large gene family, such as the immunoglobulin genes in vertebrates, the P450s in most eukaryotes, or the odorant receptors in any animal. Here's an example from our work with the odorant receptors of bees, where the family has expanded to around 170 genes, compared with about 60 in flies.
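How unequal crossing over produces both a deletion chromosome and a duplication chromosome can be sketched by treating each X chromosome as an ordered list of opsin genes. This is a deliberately simplified assumption: crossovers are placed only between whole genes, so the hybrid red/green genes described above are not modeled, and the function names are hypothetical.

```python
def unequal_crossover(chrom_a: list[str], chrom_b: list[str],
                      break_a: int, break_b: int) -> tuple[list[str], list[str]]:
    """Recombine two gene arrays that are misaligned during pairing.

    chrom_a is broken after its first `break_a` genes and chrom_b after its
    first `break_b` genes; the segments are then exchanged. When break_a !=
    break_b, one product loses genes and the other gains them.
    """
    product_1 = chrom_a[:break_a] + chrom_b[break_b:]
    product_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return product_1, product_2


def red_green_colorblind(chrom: list[str]) -> bool:
    """A male with this X lacks red/green vision unless both gene types are present."""
    return not ("R" in chrom and "G" in chrom)


# Two normal X chromosomes, each with one red (R) and one green (G) opsin gene,
# paired out of register so that R on one aligns with G on the other.
normal = ["R", "G"]
deleted, duplicated = unequal_crossover(normal, normal, break_a=1, break_b=2)
print(deleted, red_green_colorblind(deleted))        # ['R'] True
print(duplicated, red_green_colorblind(duplicated))  # ['R', 'G', 'G'] False
```

Repeating the same out-of-register exchange on an already duplicated chromosome lengthens the array further, which is one way tandem arrays can grow.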
One bee-specific odorant receptor subfamily alone consists of 157 genes, and within it there is a large tandem array of 60 genes, which remain today in a perfect tandem array, even though the genes at either end, which are the oldest genes according to a phylogenetic tree of their encoded proteins, barely resemble each other (<20% amino acid identity). Eventually such tandem arrays might be broken up by inversions and other chromosomal rearrangements. Indeed, in Drosophila flies there are almost no such tandem arrays of odorant receptors, the largest containing only three genes, presumably because Drosophila genomes have been subject to more genome "flux" than bee genomes.
Here's another example from our work with the tetraspanin family in Drosophila. This time about half of the genes are dispersed around the genome as single copies; however, 18 of them are in an array (bottom), mostly in tandem near the centromere of chromosome arm 2R. Nevertheless, when we compare their sequences, the genes in the array are almost as different from each other as the rest are, although we do believe that they all originated as an array from a single gene. Among many uncertainties is why these 18 genes have remained in this array for so long despite diverging enormously. For the HOX genes we know it has to do with their regulation.
With the coming of genome sequences, various groups started to ask about the frequency of gene duplication, and again Michael Lynch took the lead, with a Science paper in 2000 showing that the rates of gene duplication in various genomes are remarkably high. He also confirmed that, as expected, the numbers of duplicates fall off rapidly with age, revealing a half-life of around 4 Myr, although with a large range. Note that he could identify the polyploidization event in Arabidopsis as a peak of duplicated genes roughly 65 Myr old, estimated from Ks, the number of synonymous substitutions per synonymous site (X axis).
The origins of introns remain a controversial topic in molecular biology and genomics.
Some time ago Walter Gilbert at Harvard (developer of an independent, chemical degradation method of DNA sequencing, for which he received the 1980 Nobel Prize in Chemistry along with Fred Sanger) developed the idea that introns might be ancient features of genes that were originally present in bacteria but have since been lost from them. The idea is that proteins consist of modules, and these modules were originally encoded by separate exons, which were then put together into genes in which they are separated by introns. It would then be possible for exon shuffling to occur on a large scale, producing diverse proteins from a limited set of modules. This is known as the introns-early model. One essential requirement of such a theory is that the introns all be in the same phase with respect to codons, because otherwise shuffling exons would cause frameshifts. The simplest model has all introns in phase 0, that is, between codons. However, today we find introns in all three possible phases (phase 1 introns lie after the first nucleotide of a codon, phase 2 introns after the second). Nevertheless, Gilbert's lab has shown, and genome sequences and their gene annotations confirm on a grand scale, that intron phases are biased towards phase 0, with ratios of roughly 2:1:1 for phases 0, 1, and 2. Their modified model is therefore that roughly half the phase 0 introns represent the original module-separating introns, while the other half, and all the phase 1 and phase 2 introns, are subsequent acquisitions in the eukaryotic lineages. It is almost impossible to disprove this model, because you don't know which 50% of the phase 0 introns are supposed to be ancient.
Note that their model now allows that 3/4 of introns are eukaryotic acquisitions, which is essentially the introns-late model. Exactly how introns are gained remains obscure, however, and intron acquisition appears to have ceased in the mammalian lineage.
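The phase definitions and the 3/4 arithmetic can be checked with a short sketch. Here an intron's position is given as the number of coding nucleotides upstream of it in the spliced coding sequence (an assumed convention for this example), so its phase is that number modulo 3. The 2:1:1 ratio is the rough figure quoted in the text, not a measured value, and the function names are hypothetical.

```python
from fractions import Fraction


def intron_phase(coding_nt_upstream: int) -> int:
    """Phase of an intron from the number of coding nucleotides before it.

    0: between codons; 1: after the first base of a codon; 2: after the second.
    """
    return coding_nt_upstream % 3


def fraction_acquired(phase_ratio: tuple[int, int, int],
                      ancient_share_of_phase0: Fraction) -> Fraction:
    """Share of all introns that the modified introns-early model treats as gained.

    Only a share of the phase 0 introns is taken to be ancient; everything
    else (the rest of phase 0 plus all phase 1 and phase 2 introns) is gained.
    """
    total = sum(phase_ratio)
    ancient = Fraction(phase_ratio[0], total) * ancient_share_of_phase0
    return 1 - ancient


print([intron_phase(n) for n in (9, 10, 11)])        # [0, 1, 2]
print(fraction_acquired((2, 1, 1), Fraction(1, 2)))  # 3/4
```

Exact fractions keep the arithmetic transparent: phase 0 is 2/4 of all introns, half of that is 1/4, leaving 3/4 as later gains. An exon flanked by two phase 0 introns is a whole number of codons long, which is why phase 0 is the natural starting assumption for frame-preserving exon shuffling.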
Phylogenetic studies of various genes and gene families nevertheless make it clear that introns have been gained in all sorts of organismal lineages. Here's an example from the carbon dioxide receptor genes in insects. The introns are spread throughout the gene (above), in various phases. When they are mapped onto a phylogenetic tree of the three genes (Gr1-3) in various insects, there are instances of intron loss (lowercase letters on branches) and intron gain (uppercase letters). The best theory is that new introns derive from transposon insertions.
Another major controversial topic is how some genomes come to be so small. We've seen how genomes get big, primarily through the acquisition of transposable elements and other junk DNA like pseudogenes, but also through tetraploidization. The range of genome sizes is huge. For example, crickets, grasshoppers, and locusts have genomes up to 5 times the size of ours, that is, 15 Gbp, while lungfish and lilies can reach 100 Gbp. Presumably these organisms have lost control of transposons, which have flourished in their genomes. We currently think that RNAi is, at least in part, a genomic defense mechanism against transposons, so perhaps their RNA interference systems are compromised.
The removal of DNA by random deletions is also important. Several groups, including ours, have shown that organisms differ enormously in the sizes and frequencies of random deletions in the pseudogenes and transposons in their genomes, and that those that delete lots of DNA have smaller genomes (the histograms above are for transposon copies in human versus nematode; note the longer deletions in the nematode). For example, Drosophila flies are estimated to delete DNA roughly 75 times more rapidly than humans, which explains in large part why there are almost no pseudogenes and very few old transposon copies in the Drosophila genome. Even the mouse genome appears to be smaller than the human genome because of more deletions. The big question is why?
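To see why a 75-fold difference in deletion rate matters, one can treat the loss of nonfunctional DNA (old pseudogenes and transposon copies) as a simple exponential decay. This is a toy calculation under strong assumptions: loss is taken to be a constant-rate exponential process, and the human rate below is an arbitrary, hypothetical number chosen only to show how the 75-fold ratio plays out.

```python
import math


def fraction_remaining(rate_per_myr: float, time_myr: float) -> float:
    """Fraction of an initial block of nonfunctional DNA left after time_myr,
    assuming constant-rate exponential loss by deletion."""
    return math.exp(-rate_per_myr * time_myr)


def half_life(rate_per_myr: float) -> float:
    """Time (Myr) for half of the nonfunctional DNA to be deleted."""
    return math.log(2) / rate_per_myr


HUMAN_RATE = 0.001           # hypothetical loss rate per Myr, for illustration only
FLY_RATE = 75 * HUMAN_RATE   # the roughly 75-fold faster deletion rate quoted above

for name, rate in (("human", HUMAN_RATE), ("fly", FLY_RATE)):
    print(f"{name}: half-life {half_life(rate):.1f} Myr, "
          f"{fraction_remaining(rate, 50):.3f} left after 50 Myr")
```

Under this assumption a 75-fold higher rate shortens the half-life 75-fold, so pseudogenes that would persist for a long time in a human-like genome largely disappear from a fly-like genome over the same interval.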
Michael Lynch at Indiana University has taken a grand view of genome complexity across the entire range of organisms and genome sizes. While the basic notions had been proposed before, Lynch was the first to address these grand-scale questions systematically. Basically, he suggests that many features of genomes, namely large size, the presence and number of transposons, and even the number and length of introns, are mildly deleterious and should be selected against. However, selection only works effectively against such mildly deleterious traits in very large populations, which are typically those of small organisms at low trophic levels. In large organisms with relatively small population sizes, typically at higher trophic levels, drift becomes a much more significant factor, and selection frequently fails. A simple laboratory example of this effect of drift is shown for beetle populations of two marked strains: when the populations are small (10-20 individuals) drift can win (top), but when they are large, selection wins (bottom).
The basic observation is simply that smaller organisms with larger populations (top) have smaller genomes and fewer genes, all the way from the most abundant bacteria in the oceans (Prochlorococcus) to mammals (bottom). An important point that we've seen a few times before is that there is no great discontinuity in any of these measures between prokaryotes and eukaryotes; thus there are bacteria with large genomes and many genes.
Lynch correlated particular features with genome size, and found positive relationships for transposon numbers, intron size and number, and the half-life of gene duplicates. He explains them in terms of weak selection against these mildly deleterious features in large organisms with small population sizes and large genomes.
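The positive relationship between genome size and duplicate half-life has a simple quantitative consequence if duplicates are assumed to arise at a constant rate and to be lost by exponential decay (an assumption for this sketch, not Lynch's own model): the standing number of duplicates is proportional to their half-life. The birth rate below is hypothetical; the 4 Myr value is the rough average half-life quoted earlier, and 8 Myr is an arbitrary comparison.

```python
import math


def fraction_surviving(age_myr: float, half_life_myr: float) -> float:
    """Fraction of duplicate pairs of a given age still intact, assuming each is
    lost at a constant rate (exponential decay with the given half-life)."""
    return 0.5 ** (age_myr / half_life_myr)


def standing_duplicates(births_per_myr: float, half_life_myr: float) -> float:
    """Equilibrium number of duplicates when they arise at a constant rate b and
    decay exponentially: dN/dt = b - kN, so N = b / k with k = ln 2 / half-life."""
    return births_per_myr * half_life_myr / math.log(2)


BIRTHS = 100.0  # hypothetical number of new duplicates per Myr, for illustration
for half in (4.0, 8.0):  # ~4 Myr from the text; 8 Myr is a hypothetical comparison
    print(f"half-life {half} Myr: "
          f"{fraction_surviving(20, half):.3f} of 20-Myr-old pairs survive, "
          f"about {standing_duplicates(BIRTHS, half):.0f} duplicates at equilibrium")
```

Doubling the half-life doubles the equilibrium stock of duplicates, which is the direction of the correlation Lynch reports for organisms with large genomes and small populations.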
Lynch's explanation is unsettling, because it suggests that most genome complexity arises because selection at the organismal level cannot prevent it, rather than because it is selected for and adaptive. It mirrors earlier controversies about whether most molecular evolution, in the form of base and amino acid changes, is neutral or slightly deleterious versus advantageous; it's now clear that most is effectively neutral. Nevertheless, it remains unclear whether these genome complexity relationships will hold for finer-scale comparisons, e.g. related carnivores versus herbivores. Initial analyses within mammals suggest that genome size is not so simply related; for example, carnivores actually have slightly smaller genomes than rodents, whereas Lynch's theory would predict the opposite. Many still believe that adaptive explanations are important, e.g. that small genome size in birds makes them lighter.
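The claim that drift overwhelms weak selection in small populations can be made quantitative with Kimura's diffusion approximation for the fixation probability of a new mutation. This is a standard population-genetics formula swapped in for illustration, not Lynch's own analysis; it assumes a diploid population of effective size N, an additive selection coefficient s, and an initial frequency of 1/(2N). Comparing the result with the neutral value 1/(2N) shows when a mildly deleterious mutation behaves as effectively neutral.

```python
import math


def fixation_probability(n: int, s: float) -> float:
    """Kimura's approximation u = (1 - e^(-2s)) / (1 - e^(-4Ns)) for a new
    additive mutation (initial frequency 1/(2N)) in a diploid population."""
    if s == 0:
        return 1.0 / (2 * n)
    if s > 0:
        return math.expm1(-2 * s) / math.expm1(-4 * n * s)
    # For s < 0, multiply numerator and denominator by e^(4Ns) so that a large
    # |4Ns| underflows harmlessly to 0 instead of overflowing.
    return math.exp(4 * n * s) * math.expm1(-2 * s) / -math.expm1(4 * n * s)


def relative_to_neutral(n: int, s: float) -> float:
    """Fixation probability divided by the neutral expectation 1/(2N)."""
    return fixation_probability(n, s) * 2 * n


S = -0.001  # a mildly deleterious mutation (hypothetical value)
for n in (10, 100, 1_000, 10_000):
    print(f"N = {n:>6}: 4Ns = {4 * n * S:>6.2f}, "
          f"fixation relative to neutral = {relative_to_neutral(n, S):.3g}")
```

With s = -0.001 the mutation fixes almost as often as a neutral one when N = 10 (|4Ns| much less than 1), but essentially never when N = 10,000 (|4Ns| = 40), mirroring the contrast between the small and large beetle populations.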