Download human accelerated region - School of Life Sciences

IB404 - 20 - Other primates – April 4

1. Two competing groups, one from the US Department of Energy (recall the original home of the human genome project) and the other led by Svante Pääbo at the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, made several attempts to sequence nuclear genomic DNA from Neanderthal bones using 454 sequencing (they had already done mitochondrial DNA, which is relatively abundant). Homo neanderthalensis has only recently been recognized as a separate species, living from ~500,000 to ~30,000 years ago in Europe. Pääbo’s group finally succeeded in sequencing a draft genome from three bones 38,000-49,000 years old, published in 2010. This was a technically challenging task that required many improvements in DNA extraction, in avoiding contamination with modern human DNA, etc., and the result, generated with Illumina sequencing, is still a rather rough draft genome sequence. The human reference genome was needed to pick out the few Neanderthal sequences from the mess of contaminating bacterial DNA.

2. Other groups had already taken a different tack and specifically amplified and sequenced particular genes, for example the melanocortin-1 receptor exons from two separate fossils. This receptor partly determines skin and hair color in us, with various mutations that reduce function leading to blond or red hair and fair skin. Amazingly, both their and Pääbo’s Neanderthal sequences shared a novel mutation predicted to reduce receptor function, leading to the inference that they and we independently evolved this adaptation to cold northern climates, presumably to increase vitamin D synthesis. Pääbo’s group found two additional genes implicated in skin coloration with significant differences in Neanderthals.

3. The Pääbo group’s analysis of the whole genome sequence is complicated by two things. First, there are many errors in these draft Neanderthal sequences caused by conversion of C to T nucleotides.
They therefore restrict much of their analysis to the rarer changes called transversions, where purines are exchanged for pyrimidines and vice versa (e.g. A to T); C to T changes are transitions, so transversions are unaffected by this damage. Second, even then, the differences between the Neanderthals, and between them and the available individual human genome sequences, are small, only marginally higher than those between the humans. Nevertheless, they were able to discern that Neanderthals share a small portion of their genome, with estimates varying up to around 5%, with Europeans, to the exclusion of Africans. This implies that after the European human lineage left Africa, it interbred to some extent with Neanderthals living in Europe. More recently Pääbo’s group sequenced a partial genome from a single finger bone found in a cave in Siberia known as Denisova, and it appears these Denisovans shared a similar portion of their genome specifically with modern-day Asians, implying interbreeding in Asia too.

4. Chimpanzees (Pan troglodytes, or Pan paniscus for the bonobo or pygmy chimp) are our closest living relatives, and we last shared a common ancestor ~7 Myr ago. When this estimate was first derived from molecular comparisons in Allan Wilson’s lab at Berkeley about twenty-five years ago (figure below based on hemoglobin amino acid sequence divergence), it was controversial with paleontologists, who thought the split was much older, but it is now based on many gene comparisons and is well supported. Gorillas are our next closest relatives, then orangutans, among the great apes, followed by the lesser apes, represented by gibbons. In case you don’t know, next are Old World monkeys, then New World monkeys, then tarsiers, then prosimians.

5. Simple DNA sequence comparisons show that for matching sequences we differ by around 1.2%. But in addition there are many indel differences, ranging from microindels of 1-10 bp to differential transposon insertions up to 10 kbp, and some larger indels.
These are normally counted by molecular evolutionists as single events, no matter how long they are, and since they occur about 1/10 as often as simple base changes, they are a relatively small contributor to the overall difference. But if you count the actual numbers of bases involved in indels, which average 36 bp in a large-sample human/chimp comparison, then the difference goes up to 5%.

6. There are also large-scale chromosomal differences, although whether they are of any importance in explaining any of the phenotypic differences is unclear. Given the large number of chromosomal rearrangements between mouse and human (~300), even if most occurred in the mouse lineage, in 6 Myr there should be several between chimp and human, and indeed chromosomes 1, 4, 5, 9, 12, 15, 16, 17 and 18 have major inversions, and small inversions have probably been missed. In addition, our chromosome 2 is clearly a head-to-head fusion of two chromosomes that remain separate and acrocentric in the other great apes (it even retains pieces of telomeric sequence associated with the centromere, remnants of the ape telomeres at the fusion point).

Figure: Chromosome 2

7. Celera again got a jump on the public chimpanzee project (Venter seemed unable to resist tweaking the collective NIH-funded public projects, for example by also publishing in Science a 1.5X coverage sequence of his own poodle before the public project finished a boxer sequence). They decided to focus on the coding regions of all annotated human genes by amplifying each exon (using primers designed to the flanking intron sequences, which work most of the time in chimp because of the low DNA sequence divergence) and then sequencing each directly. Most human/chimp exons are short enough to do this. They sequenced about 200,000 exons, and then sorted out those genes for which there was clearly human/chimp/mouse orthology (shown by reciprocal best matches and microsynteny), yielding a conservative dataset of some 7,645 genes.

8.
Their analysis focused on models of evolution of non-synonymous and synonymous sites in the human and chimp lineages since the split, using mouse as an outgroup. This is somewhat more sophisticated than simple Ka/Ks comparisons, but is similar in principle. The basic idea is to look for genes/proteins whose evolution in either lineage since the split appears to have been accelerated. Roughly 1,500 genes showed such acceleration in either the human or the chimp lineage; however, most of the truly convincing examples are accelerated in the human lineage. These fall into several functional classes.

9. The largest group is the odorant receptors, perhaps suggesting important changes in how chemical signals are perceived. However, it is well known that many of the odorant receptor genes in our genome have become pseudogenized during primate evolution, and this may simply represent an acceleration of this process in our lineage. They say that most of these seem to be intact genes, but they could still be pseudogenes if crucial amino acid changes have inactivated them.

10. Another set of genes is involved in amino acid catabolism. Here their interpretation is that some of these genes/proteins might be important in metabolism of muscle proteins derived from a diet richer in meat than that of chimpanzees, and especially gorillas.

11. They list several other genes implicated in neurogenesis, skeletal development, etc., including, remarkably, several homeotic genes, which are normally involved in major developmental decisions about the timing and position of development of body regions and hence might be involved in the overall morphological differences between chimps and humans.

12. Another set of human-accelerated genes is involved in speech and hearing. One of the five they identify as involved in hearing is the alpha-tectorin gene, which is involved in making the tectorial membrane of the inner ear. Single amino acid changes in humans cause deafness, and a mouse knockout is deaf.
So they suggest that more in-depth studies of differences in human and chimp hearing are warranted. Others had already identified FOXP2 as interesting (next slide).

13. This Celera analysis was an attempt to skim the cream from the milk. The public project covered the entire genome, which reveals all the other differences in promoters and other regulatory regions, as well as all the junk and transposon differences. The hard part is sorting out which differences really matter, and this is where Celera’s approach of focusing on the coding regions worked well, because only here can you compare synonymous and non-synonymous changes and hence get a direct indication of the action of selection accelerating non-synonymous changes. If the simple DNA sequence divergence of alignable regions is just 1%, that means there are 30 million base changes in a genome of roughly 3 billion bp, so it will be hard to find those that matter, let alone figure out all the indel differences. In addition, not all important changes involve amino acid changes, as we will see.

14. The most famous single gene showing accelerated amino acid changes in humans is FOXP2, a gene first cloned as the locus involved in an inherited condition involving severe deficits in articulation and grammar. It encodes a transcription factor that is widely expressed in the brain throughout life. The amino acid sequence has barely changed in mammalian evolution, yet as shown by Pääbo’s lab and another lab, two amino acids have changed in the human protein (shown in bold).

15. A phylogenetic analysis of the synonymous and non-synonymous changes shows that this is a statistically significant acceleration of non-synonymous changes in the human lineage, indicating selection. This is shown in these two trees from the two groups, which have slightly different numbers of non-synonymous/synonymous changes on each branch.
More detailed analysis of single nucleotide polymorphisms around the gene suggests that these changes were relatively recent, perhaps as recent as 200,000 years ago, as modern Homo sapiens was evolving. It will be interesting to see further work on exactly what the FOXP2 gene is doing. Presumably it is one of many genes/proteins involved in the evolution of human speech.

16. Another example of an interesting change is that a particular myosin, known as number 16 or MYH16, is a pseudogene in humans while functional in chimps, gorillas and other primates. This is due to a frameshifting microdeletion of two base pairs in exon 18 of 42 (below), which produces a truncated protein (* is a stop codon). This particular myosin is exclusively expressed in the large masticatory muscles, and its loss appears to be a major reason that ours (h and i) are highly reduced in size compared with those of apes like gorillas (e and f), and indeed of Australopithecus species. Loss of this myosin is correlated with great reductions in the sizes of the individual muscle fibers as well as the overall muscle size, as shown in these figures.

17. In an interesting analysis, these authors also infer that the pseudogenization mutation occurred roughly 2.4 Myr ago and propose that it might have allowed the evolution of the far larger cranium of Homo sapiens. They do this by comparing the dN/dS ratios (similar to Ka/Ks) in the various primate lineages to that in the human lineage. Assuming that a pseudogene would have a ratio of 1, they estimate that the pseudogene formed roughly 2.4 Myr ago, because that would allow enough time evolving as a pseudogene to generate the observed ratio of 0.53 on the human lineage. While this is obviously a little sketchy, it is another interesting use of the pattern of nucleotide changes.

18. Many interesting observations have been made from the subsequently published public chimpanzee genome project (2005). Perhaps the most remarkable used the following logic.
It turns out that in comparisons of mammalian genomes, and indeed back to fish, there are a few hundred regions of the genome that are remarkably conserved, with something like more than 95% DNA identity over more than 200 bp. These ultraconserved regions are generally not parts of exons, because even for the most conserved, identical proteins, changes at third codon positions would reduce the identity below 95%. Indeed we have little idea what most of them are, although some are clearly non-coding RNAs. Katherine Pollard, a postdoc working in David Haussler’s laboratory at the University of California at Santa Cruz (his lab maintains the UCSC Genome Browser), wrote a computer program to identify regions of the mammalian genome that have been conserved for a long time, like these ultraconserved regions, but which have suddenly sped up in the human lineage since the split from chimpanzees. She found about 200 such regions, and the top one, called HAR1 for human accelerated region one, is just 118 bp long and is essentially identical across mammals, but has 18 changes in humans. It turns out that HAR1 is indeed a non-coding RNA, and it is expressed in the brain, but also in testes. It seems that it is somehow essential for proper formation of the folded structures of the cerebrum, but precisely how these 18 base changes, which clearly change the 2D and 3D shape of this non-coding RNA (image), might have led to HAR1 contributing to our larger brains remains unclear.

19. There are genome projects underway on representatives of all the other major lineages of primates (the gorilla genome was just published), including the rhesus macaque Macaca mulatta from Asia, which is a major biomedical experimental organism. It diverged from humans and chimps around 25 Myr ago. You can see the consequences of time in terms of the numbers of chromosomal rearrangements on these lineages.
A detailed view below shows that, much as in flies, the vast majority of these are intrachromosomal, with only a few translocations revealed by color combinations. The authors identify a number of genes showing signs of positive selection in these three primates, but these analyses really require more species.
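
The transversion filter described in point 3 can be sketched in a few lines of Python. This is a minimal illustration, not the Pääbo group's actual pipeline: the function names `classify_substitution` and `transversions_only` are made up for this example, and each change is assumed to be given as a (reference base, observed base) pair. It shows why the C to T damage is screened out: C and T are both pyrimidines, so that change is a transition.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref!r} -> {alt!r}")
    # Purine <-> purine or pyrimidine <-> pyrimidine is a transition;
    # a change between the two chemical classes is a transversion.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def transversions_only(changes):
    """Keep only the (ref, alt) pairs that are transversions."""
    return [
        (ref, alt)
        for ref, alt in changes
        if classify_substitution(ref, alt) == "transversion"
    ]
```

For example, `transversions_only([("C", "T"), ("A", "T")])` keeps only the A to T change, because the C to T change is a transition and so could be a damage artifact.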

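The dating argument in point 17 can be written as a small calculation. The sketch below assumes a simple time-weighted model that the notes do not spell out: the gene evolved with some dN/dS ratio while functional, then with a ratio of 1 as a pseudogene, and the ratio observed on the human lineage is the average of the two weighted by time. The function name `pseudogene_age`, the lineage length, and the functional ratio are illustrative assumptions; only the pseudogene ratio of 1 and the observed ratio of 0.53 come from the notes.

```python
def pseudogene_age(omega_observed, omega_functional, lineage_myr,
                   omega_pseudogene=1.0):
    """Estimate how long (in Myr) a gene has evolved as a pseudogene.

    Assumed model (not taken from the notes): the observed dN/dS ratio
    is a time-weighted average over a lineage of length T, of which the
    last t Myr were spent as a pseudogene:

        omega_observed = (omega_functional * (T - t)
                          + omega_pseudogene * t) / T

    Solving for t gives the expression returned below.
    """
    if not omega_functional < omega_pseudogene:
        raise ValueError("functional ratio must be below the pseudogene ratio")
    if not omega_functional <= omega_observed <= omega_pseudogene:
        raise ValueError("observed ratio must lie between the two ratios")
    fraction_as_pseudogene = (
        (omega_observed - omega_functional)
        / (omega_pseudogene - omega_functional)
    )
    return lineage_myr * fraction_as_pseudogene


# Hypothetical inputs: a 7 Myr human lineage and a functional ratio of 0.2.
# Only the observed ratio of 0.53 is taken from the notes.
age = pseudogene_age(omega_observed=0.53, omega_functional=0.2, lineage_myr=7.0)
print(f"Inferred pseudogene age: {age:.1f} Myr")
```

The result depends strongly on both assumed inputs, which fits the notes' caution that the estimate is a little sketchy.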