CHARACTERIZATION OF UNIQUE FEATURES OF THE DENISOVAN EXOME

A University Thesis Presented to the Faculty of California State University, East Bay

In Partial Fulfillment of the Requirements for the Degree Master of Science in Biological Science

By Alexandra Vivelo
September, 2013

ABSTRACT

The publicly available Denisovan genome sequence increases opportunities to learn what makes modern humans unique and to discover the distinguishing genetic features of an extinct sister lineage. This thesis explores the latter, with emphasis on male reproductive genes, neuronal genes, and a subset of metabolic genes, specifically those that code for enzymes involved in glycolysis and those that code for proteins that vary in modern human populations in connection with long-term dietary trends in those populations. Results include the identification of 34 neuronal genes with single-nucleotide changes that are derived in the Denisovan protein-coding sequence at loci that are nonpolymorphic in modern humans, the computation of the dN/dS ratio for a semen coagulation factor for which the degree of positive selection is known to be correlated with the females’ mean number of male mating partners per periovulatory period, and the determination of the Denisovan variants at a subset of known modern dietary and metabolism-related single-nucleotide polymorphic loci. Possible behavioral and functional correlates of those unique features are suggested, providing the foundation for further study on Denisovan male reproductive selective pressure, unique neuronal gene features, and metabolic genes.

ACKNOWLEDGMENTS

I would like to thank Dr. Chris Baysdorfer, who supported this project from the start. He encouraged me to explore new territory, was kind when I proposed implausible plans, and was always enthusiastic. Thanks to Dr. Baysdorfer, my small seed of an idea turned into a full-fledged project, and it wouldn’t have happened without his vision and support.
My immense gratitude also goes to Dr. Claudia Uhde-Stone, whose support has been indispensable throughout my time at CSUEB and without whom this thesis would not exist; Dr. Henry Gilbert, who has been tremendously generous with his time and expertise; and Dr. Kenneth Curr for his mentorship. I would also like to thank Dr. Kelly Decker, Dr. Maria Nieto, and Dr. Maria Gallegos for their encouragement. These CSUEB faculty members make up an exceptionally talented and caring group, and I am privileged to be acquainted with each of them.

I sincerely appreciate the time and expertise offered by Dr. Ed Green of UCSC in discussing this research. Thanks also to Dr. Bill Lu of SBI for a helpful discussion.

I would also like to express my deep gratitude to my husband, Terry Van Belle. Without the significant investment of time and programming skill he put into creating searchable alignment files, and without his programming instruction, I would not have been able to access or analyze the data used in these pages.

Finally, thanks to my dad, who has always believed in me; Val, for the Excel help; my aunt, for telling me it’s not too late; and my daughter, for cheerfully sacrificing some of our precious time together and encouraging me every step of the way.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
    Research Question
    Background
        What We Know of the Denisovans
        The Source of Ancient DNA Sequences
        Neanderthal Analyses Hint at What May Be Found in Denisovan DNA
        Locus-Specific Denisovan Genome Research Published to Date
        Challenges and Opportunities in the Characterization of Ancient DNA
        Summary of Aims
METHODS
RESULTS
    Male Reproductive Genes
    Premature Stop Codons and Loss-of-Stop Mutations
    Genes Related to Neuron Formation and Function
        L1CAM
        PCSK9
        HMCN1
        SETD2
        REST
        GDNF
        NYAP1
        CHAT
        NAV2
        CLN6
        HEXA
        CC2D1A
        NEFH
    Metabolic Genes
        Metabolic Genes with Known SNPs in Modern Humans
DISCUSSION
    Selective Pressures in Male Reproductive Genes
    Neuronal Genes
    Premature Stops and Loss-of-Stop Mutations
    Metabolic Genes
REFERENCES
APPENDIX A: MALE REPRODUCTIVE GENES
APPENDIX B: NEURONAL GENES

LIST OF TABLES

Table 1. SEMG2 dN/dS Ratios.
Table 2. Premature Stop Codons Found in the Denisovan Exome.
Table 3. Loss-of-Stop Mutations Found in the Denisovan Exome.
Table 4. SNCs in Genes Encoding Glycolytic Enzymes.

LIST OF FIGURES

Figure 1. SEMG2 dN/dS Ratios.

INTRODUCTION

This project is aimed at characterizing distinct features of the genome of the Denisovans, the species of extinct hominid discovered in Siberia in 2008. The project consists of a bioinformatics analysis of the Denisovan genome with special focus on specific subsets of protein-coding genes that are functional in modern humans and that include at least one amino acid change between Denisovans and modern humans. The present study focuses primarily on single nucleotide changes (SNCs), loci at which the nucleotide differs between the modern human and Denisovan genomes. The SNCs studied here are primarily those likely to be functionally significant because they are located in translated regions of exons. To locate the base pairs that show a functionally significant difference, the Denisovan genome sequence reads have been aligned to the human reference genome and all anomalies determined.
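As a rough illustration of the comparison described above, the following Python sketch scans two already-aligned sequences and reports the positions at which the nucleotides differ. The function name `find_sncs`, the choice to skip gaps and uncalled bases (`-` and `N`), and the toy sequences are assumptions made for illustration only; this is not the alignment pipeline used in this thesis.

```python
def find_sncs(reference, query, skip="-N"):
    """Return (position, ref_base, query_base) for each single-nucleotide difference.

    Both inputs are assumed to be aligned strings of equal length.
    Positions are 0-based. Gaps and uncalled bases are skipped.
    """
    if len(reference) != len(query):
        raise ValueError("aligned sequences must have equal length")
    sncs = []
    for pos, (ref_base, query_base) in enumerate(zip(reference.upper(), query.upper())):
        # Ignore positions where either sequence has no usable base call.
        if ref_base in skip or query_base in skip:
            continue
        if ref_base != query_base:
            sncs.append((pos, ref_base, query_base))
    return sncs


# Toy sequences (hypothetical, not real genomic data).
human = "ATGGCCTTAGAC"
denisovan = "ATGGCTTTAGNC"
print(find_sncs(human, denisovan))  # [(5, 'C', 'T')]
```

In this toy case only position 5 is reported; position 10 is skipped because the second sequence has an uncalled base there, which mirrors the caution needed when ancient reads do not cover a site.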
At loci where the modern human variant matches that of other extant primates (specifically, the gorilla and the orangutan have been used as outgroups) yet the Denisovan variant differs, the Denisovan variant is said to be derived. That is, the mutation arose in Denisovans after their reproductive split from early modern humans. Likewise, at loci where the Denisovan variant matches that of gorillas and orangutans but not that of modern humans, the modern human genome is said to be derived at that locus. The comparison of the Neanderthal variant to select subsets of coding-sequence SNCs between Denisovans and modern humans provides further clues as to the course of evolutionary change at the studied loci.

The characterization of the unique features of an extinct group of hominids necessitates willingness to accept that firm conclusions cannot be drawn about the exact nature of any functional significance of amino acid changes. However, educated speculations can be made based on the chemical properties of a substituted amino acid, whether it falls within any of the protein’s main functional domains, and to what degree it affects the protein’s chemical and physical properties.

Much of the research focus in paleogenomics is on characterizing adaptive events unique to modern humans. In contrast, the aim of this thesis is to characterize unique features of the Denisovan genome. Ultimately, what we learn about the Denisovan genome will teach us about ourselves. There is much to be learned about the course of hominid evolution and the role specific single-nucleotide polymorphisms play in adaptation. What we learn can not only help us to understand our extinct cousins but also reveal how some of the features of modern human genes came to be. Because so much of current paleogenomics research is dedicated to the latter goal, this thesis explores the former goal, that of understanding the characteristics that are unique to Denisovans.
In doing so, the hope is that this thesis will contribute to a knowledge pool that will allow future research to attain greater understanding of modern human genetics.

Throughout this thesis, modern humans’ two ancient hominid sister groups are referred to as Denisovans and Neanderthals, rather than by binomial taxonomic classification. In the case of the Denisovans, this is the only option, because no official species name has been assigned to them. In the case of the Neanderthals, the common term is the only one upon which there is consensus. They are sometimes called Homo neanderthalensis and sometimes Homo sapiens neanderthalensis. The question of whether Denisovans, Neanderthals, and modern humans are separate species is an academic one, because “species” has a variety of definitions, none of which provides final authority on the subject. On that basis, reference to Denisovans and Neanderthals as “species” has been avoided here.

Research Question

The research question addressed by this project is that of identifying coding-region SNCs that exist between the Denisovan and modern human genomes, particularly those that occur in translated regions at loci that are non-variant in modern humans, and determining the likely functional significance of selected SNCs among these. Between the Denisovan and modern human genomes there exist a number of SNCs that result in amino acid changes. The likely functional significance of these can be predicted in a variety of ways. The results of the prediction tools SIFT, PolyPhen, and Condel are included in the appendices of the present work, drawn from previous publications on the Denisovan genome (Meyer et al., 2012). Beyond this, in the present work, a variety of Web-based bioinformatics tools have been used to explore the nature of specific amino acid substitutions.

The study of paleogenomic variants adds to our understanding of the modern human genome, and especially to research on modern genetically linked diseases.
One reason is that understanding how disease-linked genes evolved helps in treatment development. Another is that the identification of modern-human SNPs in which some individuals have an ancestral allele may lead to an understanding of the evolutionary reason for fixation of the more frequently found, modern variant. This project focuses on characterizing the Denisovans rather than taking the more common tactic of using the Denisovan genome as a tool for more clearly understanding our own, but its results are intended to add to a knowledge base that may be used in the future for the purposes of modern genetic medicine as well.

Background

What We Know of the Denisovans

The people known as the Denisovans, to whom an official taxonomic name has not yet been given, are known so far only by dentition and a few other fragmentary remains from Denisova Cave near Altai Krai in Siberia. The complete genome was sequenced from the little finger of a female Denisovan (Gibbons, 2011a). Sequencing of mtDNA amplified from samples taken from the 40,000-50,000-year-old remains suggests that the Denisovans represent a branch of Homo that was distinct from the early modern Homo sapiens and the Neanderthals who lived at the same time (Krause et al., 2010). The age given for the remains is a rough estimate, because Denisova Cave’s layer 11, where the remains were found, may include strata that date both earlier and later, possibly as recent as 16,000 years ago (Gibbons, 2011b). The size of the third molar of the Denisovans has been found to lie outside the range of variation for Homo sapiens. However, arguments have been made against placing the specimens in a new taxon that is created primarily on the basis of DNA sequencing (Trinkaus, 2010). Additional finds and further DNA analysis may be expected to shed light on the Denisovans’ appropriate placement within human taxonomy.
The Denisovans’ geographic range may be better understood in the future once more fossils are located, but it is possible to deduce that they must have ranged through much of Southeast Asia based on the fact that a number of Southeast Asian populations share DNA with the Denisovans. Because other Southeast Asian groups share no DNA with Denisovans whatsoever, it can be concluded that the Denisovans passed their genetic material to certain subgroups within Southeast Asia after their dispersal into the region. Some of this gene flow had begun to take place no later than 44,000 years ago (Reich et al., 2011). Admixture between ancient strains of DNA and anatomically modern humans (AMH) may have continued until as recently as 35,000 years ago (Stewart & Stringer, 2012).

The Source of Ancient DNA Sequences

Although this thesis makes use of previously published sequences and does not entail any new sequencing of ancient DNA, it is relevant to note the process by which these sequences were originally obtained and to discuss how the process used to obtain deep coverage of the Denisovan genome differs from previous techniques.

In brief, the isolation of ancient DNA follows a similar general process to that of standard DNA isolation protocols. In the case of obtaining genetic material from bones or teeth, the usual source material for ancient DNA, the first step is the grinding of the sample of bone or tooth to powder. The next step is to use proteinase K to extract the DNA. This is followed by the use of silica to bind the DNA (Rohland & Hofreiter, 2007). Silica is used in DNA isolation from ancient material because it has such a high affinity for binding DNA. This is an advantage in ancient DNA extraction because it maximizes the yield from material that has a smaller amount of DNA than living or recently deceased tissue. However, silica’s disadvantage is that in addition to binding DNA, it adsorbs polymerase onto its surface (Besecker, Cornell, & Hampikian, 2012).
Extra care must be taken to eliminate all traces of silica pellets from the extract prior to DNA amplification to avoid loss of effective polymerase concentration due to inhibition by the silica (D. Y. Yang, Eng, Waye, Dudar, & Saunders, 1998). Possible solutions to problems of enzyme inhibition are the addition of BSA (Besecker et al., 2012) and the inclusion of a higher-than-usual concentration of enzyme. Following DNA binding, a wash step and an elution step are performed in much the same way as in other DNA isolation techniques (Rohland & Hofreiter, 2007).

The high-coverage Denisovan genome was obtained by preparing the DNA library from single-stranded, rather than the usual double-stranded, DNA. This results in a higher yield for more than one reason. For one, the method eliminates purification steps that result in loss of DNA. The DNA is first dephosphorylated. After heat denaturing, adaptor oligonucleotides are ligated to the 3’ ends of the single strands. These adaptors are biotinylated to allow the fragments to be bound to streptavidin-coated beads. Primers complementary to the adaptors are used for copying of the original single strands. Following the copying step, a double-stranded adaptor is ligated to the 3’ end of the new daughter strand. In addition to the prevention of DNA loss, this method minimizes damage to the DNA during the amplification process and therefore allows researchers to observe the exact patterns of ancient DNA breakage due purely to degradation over time. This method also improves the recovery of long molecules, as well as the recovery of short molecules smaller than 30 base pairs in length (Meyer et al., 2012). This method is what allowed for an approximately 30-fold-coverage whole-genome Denisovan sequence to be published online in February 2012 by the Max Planck Institute.
Neanderthal Analyses Hint at What May Be Found in Denisovan DNA

The existing research on the Neanderthals provides some examples of the type of information that may be gleaned from the study of the Denisovan genome. For example, Green et al. speculated on the functional significance of changes between Neanderthals and modern humans by examining the protein-coding function of genes with fixed, functional nucleotide substitutions. Many of the nucleotide substitutions between Neanderthals and modern humans are silent, and many of the affected loci are variable in modern humans. Only 78 substitutions identified by Green et al. are both fixed and lead to a change in the protein coded for by the affected gene. These loci indicate ways in which modern humans maintain a stable biochemical difference from the ancestral biochemical profile shared by chimpanzees and Neanderthals. Although 78 fixed, functional substitutions were identified, only a few are located on the same gene together. A mere five genes were identified that include multiple fixed, functional substitutions. These include SPAG17, a gene that codes for a protein found in the sperm flagellum; PCD16, a gene that codes for a substance important in adhesion between cells; TTF1, a gene that codes for a protein that functions in the termination of ribosomal transcription; CAN15, a gene with a product whose function is unknown; and RPTN, the gene that codes for repetin, a protein found in various regions of the epidermis throughout the body (R.E. Green et al., 2010).

One group of genes that have been analyzed in detail for differences between the Neanderthal and modern human is those that code for signal peptides.
Gralle and Pääbo, both of the Max Planck Institute, have looked for amino acid substitutions that are present in modern humans but not in the Neanderthal genome, which tells us these mutations occurred more recently than approximately 400,000 years ago, according to the most common general consensus for the approximate Homo sapiens/Neanderthal split date (Endicott, Ho, & Stringer, 2010). They believe that for all amino acid substitutions that occurred since the human/Neanderthal split, differences can eventually be identified and studied for functionality. Gralle and Pääbo have begun by studying differences in signal peptides between Neanderthals and Homo sapiens. Signal peptides cause newly formed proteins to be directed to the endoplasmic reticulum. This is usually the signal peptide’s only purpose; it undergoes cleavage from the nascent protein and has no further function after the protein reaches the endoplasmic reticulum. This means that any functionally significant changes in signal peptides will make the journey of a new protein to the endoplasmic reticulum either more or less efficient (Gralle & Pääbo, 2011).

In an analysis of a single Neanderthal individual dated to approximately 43,000 years ago, 10 differences between human and Neanderthal signal peptides were found. In all 10 cases, the Neanderthal individual carried the same amino acid chimpanzees carry at that locus. Present-day modern humans have a fixed derived allele at four of those locations. At six of the locations, some modern human individuals have the derived allele and others have the ancestral variant. Single amino acid substitutions on signal peptides can reduce cell survival rates by down-regulating protein transport, which means such substitutions are likely to be relevant in some diseases. Efficient protein transport to the cell membrane is necessary for cell survival.
Gralle and Pääbo analyzed their data to identify any functional significance between the modern human signal peptides and the ancestral signal peptides shared by Neanderthals and chimpanzees. Their results showed that “no modern human signal peptide differed significantly from its ancestral counterpart, an observation compatible with the neutral theory of molecular evolution” (Gralle & Pääbo, 2011).

The functionally significant differences between modern humans and our two extinct sister lineages are rare within the genome, which means that most regions investigated for differences will yield negative results. A more fruitful approach to locating and understanding differences between ancient and modern DNA is to compile a list of all the differences and then take note of the genes upon which these differences are found.

Not only SNCs that are fixed in the modern human population, as described above, are of interest. The locus of a modern human polymorphism may also be worthy of investigation. For example, Ovchinnikov and Kholina postulate that certain mitochondrial DNA sequence similarities between fossil samples and modern humans indicate sequences that were integrated into the genome of a common ancestor of Neanderthals and humans not long before the human/Neanderthal split. A particular pair of linked polymorphisms caught Ovchinnikov and Kholina’s attention because they exist in their modern variation in the two ancestral samples examined (Ovchinnikov & Kholina, 2010). The genes in question are ATP6 and ND3. The polymorphisms found both in the samples and in a majority of modern humans are 8701G and 10398G. Pan troglodytes has 8701A and 10398A at these loci, and this polymorphism is believed to have been present in the common ancestor we share with Pan as well.
The difference denoted by G versus A is that these ancient samples share a guanine in a specific location with most modern humans; yet chimpanzees, bonobos, and some modern humans whose genomes show a “reversion to the ancestral allele” all have an adenine in this location instead. An exploration of the functional difference between mitochondrial DNA with a guanine at these locations and DNA with an adenine at these locations may shed light on the relationship between large brains and certain diseases affecting the central nervous system.

Guanine is present at 8701 and 10398 in Neanderthals and the oldest modern human lineages. One possible reason 10398G underwent positive selection in early modern human populations is that this polymorphism increases the activity of the mitochondrial electron transport chain. Individuals who show the reversion to 10398A are more vulnerable to neurological disorders such as Parkinson’s and Alzheimer’s disease. In other words, electrons can flow more quickly through the mitochondrial membranes of an individual with the 10398G variation, so energy for cellular work can be released faster than in an individual with the 10398A allele. The role of the 8701 locus in specific diseases has not been studied as conclusively as that of 10398, but Ovchinnikov and Kholina suggest that studies will reveal the adenine allele to have a down-regulating effect on ATP synthesis and oxidative phosphorylation similar to that exerted by the adenine allele at 10398 on mitochondrial electron transport. They also suggest that the substitution of 8701G for 8701A and 10398G for 10398A preceded the encephalization of the common ancestor modern humans share with Neanderthals and allowed greater encephalization than that present in chimpanzees (Ovchinnikov & Kholina, 2010).

Green et al.
located recent modern human selective sweeps by identifying SNPs at which Neanderthals carry an ancestral allele rather than the derived allele that appears at that locus in modern humans. This method showed that 212 regions contain selective sweeps that occurred since modern humans split with Neanderthals. The relative age of these 212 could be determined because the more recent a selective sweep is, the more consistent the sequences of linked nucleotides on either side of the selected allele will be. The more time has passed since the selective sweep, the more mutations will have occurred in the affected region. Five of the 20 most recent selective sweeps contain noncoding DNA only, so they likely represent selection for regulatory regions whose functions have not yet been identified. The most recent of the remaining 15 regions contains a gene called THADA. Mutations in THADA have been shown to be associated with a propensity to develop type II diabetes. Several of the other genes included in the 15 most recent selective sweeps for coding regions also have been implicated in specific diseases. One of these genes is NRG3, which is linked to schizophrenia when certain mutations occur within it. Another is RUNX2, which when mutated causes a genetic disorder known as cleidocranial dysplasia, characterized in part by a bell-shaped rib cage and frontal bossing (R.E. Green et al., 2010).

Human Accelerated Regions (HARs) represent another region of the genome that has been studied in Neanderthals. Prior to the sequencing of the Neanderthal genome, the HARs were identified as portions of the human genome that evolved rapidly in humans during recent millennia. HARs are highly conserved in all other vertebrates, including chimpanzees. When Green et al.
examined the HARs for differences, they discovered that 91.4% of the studied HARs are shared between modern humans and Neanderthals, indicating that the HARs developed to their current derived state before the modern human-Neanderthal split (R.E. Green et al., 2010). Because Denisovans and Neanderthals are believed to have diverged more recently than the divergence of either lineage from modern humans, one might hypothesize that HARs must be similar in all three lineages. However, only an analysis of these regions could determine this for certain. One argument against placing particular emphasis on these regions is that the possibility exists that the HARs are accelerated in humans not due to adaptation but rather due to the relaxation of selective pressures. That is, the HARs may have little functional purpose in humans and may therefore be free to mutate with minimal consequence. One of the possible non-selective reasons for mutation in the HARs is GC bias, the conversion of A-T pairings to G-C pairings in the absence of selective pressure keeping the A-T base pair in place (Katzman, Kern, Pollard, Salama, & Haussler, 2010). According to the study done on the Denisovan HARs so far by Burbano et al., many of the human-specific derivations of the HARs are shared with modern humans by the Denisovans as well as the Neanderthals. However, 8% of the derived alleles are found in the ancestral state rather than the modern human state in both ancient hominids. These authors pinpointed a number of derived alleles that are more recent than the modern human split with the Neanderthal/Denisovan common ancestor and suggest that the HARs are worthy of further comparative study in the future (Burbano et al., 2012).

Locus-Specific Denisovan Genome Research Published to Date

To date, the number of studies published on the specifics of the Denisovan genome is limited but growing.
It has been determined that the Denisovans contributed genetic material to the modern humans of Southeast Asia, and this admixture has been determined to be independent of AMH-Neanderthal interbreeding (Skoglund & Jakobsson, 2011).

Abi-Rached et al. have proposed that the Denisovan contribution to the genomes of certain Asian populations is responsible for the presence of the HLA-B*73 allele in these populations. HLA is active in immune function by ligating T cells and the receptors for natural killer cells. The HLA-B*73 allele is rarely found outside of West Asian populations. Its presence, along with that of the HLA-C-C allele, in both modern Asians and Denisovans is most likely due to introgression (Abi-Rached et al., 2011).

Another study on the evolution of the human immune system identifies gene inactivations that occurred subsequent to the divergence of the Neanderthal/Denisovan common ancestor from early modern humans. SIGLEC13 and SIGLEC17P are expressed in chimpanzees but exist as inactivated pseudogenes in modern humans, with the exception that SIGLEC17P is expressed in human natural killer cells. SIGLEC13 is calculated to have been inactivated about 46,000 years ago, and SIGLEC17P is estimated to have been inactivated approximately 100,000 years ago. The Neanderthal and Denisovan genomes show the modern human variant of both these genes, indicating that either the estimates for the dates of inactivation are incorrect or the inactivations occurred independently in all three lineages (X. Wang et al., 2012).

Yet another immunity-related gene, BST2, has been determined to have reached its current form in the common ancestor of modern humans, Denisovans, and Neanderthals, at least 800,000 years ago. BST2 prevents replicated immunodeficiency viruses from being released from their infected host cells. This action confers protection against all known primate immunodeficiency viruses except the HIV-1 group M varieties that affect humans.
Therefore, Denisovans and Neanderthals must have shared modern humans’ immunity to most simian immunodeficiency viral strains (Sauter, Vogl, & Kirchhoff, 2011).

The genome region that includes the OAS1 gene has a haplotype found only among Melanesians yet closely matched by the Denisovan sequence. This may indicate introgression, or interspecies gene flow, from the Denisovans into an early human subpopulation, resulting in a haplotype that persists today (Mendez, Watkins, & Hammer, 2012).

The APOE gene codes for apolipoprotein E, which binds to lipids to form lipoproteins. APOE is polymorphic in humans but monomorphic in chimpanzees, and the human polymorphisms are implicated in variations in several aspects of human health. The Denisovan APOE sequence indicates that Denisovans shared the fixed, human-specific variations within APOE, with apolipoprotein E function more closely resembling that of modern humans than that of chimpanzees (McIntosh et al., 2012).

A cluster of genes related to dopaminergic neurotransmission, comprising the genes NCAM1, TTC12, ANKK1, and DRD2, shows several human-specific derived alleles. Denisovans were found to have some of the ancestral alleles and some of the derived alleles in these genes, whereas Neanderthals were found to share all the human-specific derivations (Mota, Araujo-Jnr, Paixão-Côrtes, Bortolini, & Bau, 2012).

Denisovans and Neanderthals share the two human-specific alleles in FOXP2. However, an intronic SNC in FOXP2, at which the Neanderthals and Denisovans differed from modern humans, may have affected transcription factor binding and may therefore have affected the expression of FOXP2 (Maricic et al., 2012).

Agoni et al. have identified retroviral insertions into the Denisovan and Neanderthal genomes subsequent to their divergence from early modern humans. These are the traces of germline cell infections by retroviruses.
The evidence supports the conclusion that the Denisovans and Neanderthals diverged more recently from each other than from early modern humans but as yet does not indicate the nature of any influence these genetic insertions may have had on these ancient genomes (Agoni, Golden, Guha, & Lenz, 2012).

One gene variant that causes disease in modern humans was found to be present in the Denisovan genome. The gene in question, CC2D1A, when carrying the mutation ValI736Met, can cause miscarriage and thromboembolism in modern humans. The fact that the disease-causing variant was found in the Denisovan genome may indicate that this variant was typical among Denisovans yet compensated by other mutations, or it may be indicative of disease in the sequenced Denisovan individual (G. Zhang et al., 2011).

Challenges and Opportunities in the Characterization of Ancient DNA

Reich et al. published a low-coverage sequence of the Denisovan genome, at about 1.9-fold coverage (Reich et al., 2010). Coverage refers to the number of reads in which each base is present, and therefore higher coverage equates to higher confidence in the accuracy of the final sequence. Low coverage leaves us unable to determine whether a given variant is derived in the ancient organism (Lalueza-Fox & Gilbert, 2011). Meyer et al. subsequently published a 30-fold-coverage sequence of the genome using a single-stranded DNA library preparation technique involving the immobilization of dephosphorylated, denatured ancient DNA on streptavidin-coated beads. This method eliminates the need for purification steps and therefore preserves products that would otherwise be lost during purification (Meyer et al., 2012). The high-coverage genome sequence is available to the public for download and is the sequence being used for this project.

Despite the advantages of the high-coverage sequence, caution is still warranted with regard to assumptions about Denisovan adaptive mutations.
Because the option to compare the genomes of multiple Denisovan individuals does not yet exist, the possibility remains significant that a given variant is due not to Denisovan adaptation but to degradation or error. It is also impossible to tell which sites were polymorphic and which were fixed in the Denisovan population. This project’s primary focus is on Denisovan-specific derived variants at loci that are non-polymorphic in the modern human population. It is tempting to suggest the possibility that some of these represent Denisovan-specific adaptations. However, any such inferences must be suggested tentatively, with the caveat that because we cannot observe living organisms, any such sites will be regarded as possible, not certain, sites of recent Denisovan-specific adaptive mutations.

This research also focuses primarily on SNCs located on exons. Most of the differences between ancient and modern-human DNA are located in non-coding regions. Differences in non-coding regions are worthy of study for possible effects on functions such as regulatory activity and the transcription of microRNAs (Lalueza-Fox & Gilbert, 2011). However, the non-coding changes will be largely outside the range of this project, because of the greater potential for identifying the significance of coding SNCs.

Previous studies on the Neanderthal genome give some indication of the extent and type of differences between humans and ancient sister populations. Unsurprisingly given the closely related nature of the lineages in question, when the modern human genome is compared to that of the Neanderthals, only a handful of chromosomal regions contain multiple SNCs predicted to have functional significance. Green et al.
identified the top 20 candidate regions for positive selection, dividing them into four categories: regions in which recent selective sweeps were likely in modern humans, regions in which selective sweeps were likely in apes, regions in which selective sweeps were likely to have taken place prior to the reproductive split between modern humans and the common ancestor of Neanderthals and Denisovans, and selective sweeps that approximately coincided with the time of the split (R.E. Green et al., 2010). Crisci et al. have further analyzed those regions for significance and narrowed the results to a list of 29 protein-coding genes that show significant changes between ancient and modern genomes. Among others, these genes include transcription regulators, genes that when mutated have known disease association in modern humans, a proteinase, and several genes in the HOX family (Crisci, Wong, Good, & Jensen, 2011).

The present research takes advantage of the opportunity to extract specific categories of data from the Denisovan genome sequences, to find patterns in the differences between the modern human and Denisovan exomes, and to analyze the potential biochemical implications of a few specific differences.

Summary of Aims

A. The first aim is to use bioinformatics to locate genes in which there are functional differences between Denisovans and modern humans.
B. The second aim is to suggest possible significance of select differences.
C. The third aim is to indicate directions for future research.

METHODS

Denisova genome sequence data were obtained in binary alignment/map (BAM) file format from the most recent, publicly available, approximately 30-fold coverage Denisova sequence reads aligned to the 20-fold coverage modern human genome sequence hg19. These reads are the result of a recently developed technique for sequencing ancient DNA with coverage rivaling that of modern genome sequences (Meyer et al., 2012).
These published sequences were downloaded from http://cdna.eva.mpg.de/denisova. The Neanderthal sequences, also aligned to hg19, were downloaded in BAM file format from http://cdna.eva.mpg.de/neandertal/altai/bam/. According to the source Web page belonging to the Max Planck Institute Department of Evolutionary Genetics, this genome sequence averages 50-fold coverage.

The sequence alignments were converted to variant call format (VCF) files using SAMtools, a Linux-based package designed to store, process, index, and sort large nucleotide sequence alignments (H. Li et al., 2009). The SAMtools utilities are freely available at http://samtools.sourceforge.net/. The SAMtools mpileup utility was used to extract all mismatches between the modern human and Denisovan sequences, and between the modern human and Neanderthal sequences.

For the Denisovan alignment to the modern human sequence, the exon-only mismatches were identified and placed in a separate file. To accomplish this, the loci of all human exons were obtained from Ensembl, and this exon information was used to extract exons from the VCF files using Linux shell tools. The result was a comprehensive, searchable file of the differences between the modern human and Denisovan exomes. For all genome regions and mismatch types studied in this project that had not been cataloged in previous research, Linux command line tools were used to obtain subsets of data that gave single-nucleotide changes (SNCs) and indels for the regions of interest.

Two published catalogs of SNCs between the modern human and Denisovan genomes were downloaded from the material available as a supplement to Meyer et al.’s 2012 paper on their high-coverage Denisovan genome sequence (Meyer et al., 2012).
One was the file of all SNCs at loci within the consensus coding sequences (CCDS), at which the Denisovan genome shows a derived allele, and at which the modern human locus has a single fixed allele, or at which the dominant human allele is present at greater than 99%. The other was the file of all SNCs within the CCDS, at which the modern human genome shows a derived allele, and at which the dominant human allele is fixed or present at greater than 99%. Both catalogs were converted to VCF files so they could be searched and analyzed in a Linux environment. The data identified genes by accession numbers only, meaning that specific gene data had to be added.

The VCF files of these two SNC catalogs were used to create Excel spreadsheets, to which the gene names, gene annotations, and Neanderthal variants were added. The Neanderthal variant column was filled in only at loci where the Neanderthal variant differed from the ancestral state. The records for which the Neanderthal sequence shares the ancestral state were left blank in the Neanderthal column for ease of visual scanning. Neanderthal variants for these sets of loci were obtained from the VCF file of exon-located differences between the modern human and Neanderthal genomes, described above. Gene names and known or putative functions were obtained from Entrez Gene and Ensembl. OMIM was used to identify the diseases associated with the genes from both the human-derived and Denisovan-derived genes for which the Neanderthal variant matches the derived rather than ancestral state.

For the study of subsets of the CCDS SNC data, a search of the gene names and annotations was used to create three comprehensive shorter catalogs of the genes related respectively to neurons, male reproductive genes, and metabolism that show amino acid differences between the Denisovan and modern human exomes.
A fourth shorter catalog was compiled showing the details on all premature stop and loss-of-stop changes in genes that are functional in the modern human genome.

In the construction of the neuronal and male reproductive gene catalogs, motifs and domains were analyzed using MotifScan and InterProScan. For those genes selected from these catalogs for in-depth investigation, secondary structures and domains were determined using GOR, available at http://npsapbil.ibcp.fr/cgi-bin/secpred_gor4.pl, and InterProScan, available at http://www.ebi.ac.uk/Tools/pfa/iprscan/.

The significance of the frequency of changes in sperm-related genes was determined using PAL2NAL, a calculator of dN/dS ratio. PAL2NAL was selected from among many available tools for its robustness to the inclusion of untranslated regions and other irregularities in sequence alignments such as alignment mismatches and frameshifts. It was also selected over other options because it is based on up-to-date calculation methods and employs the most widely used software. PAL2NAL is an online server that uses the codeml program from PAML to create alignments and perform phylogenetic analyses (Suyama, Torrents, & Bork, 2006). Protein alignments for input into PAL2NAL were obtained using ClustalW. Human nucleotide and amino acid sequences were obtained from NCBI’s CCDS database, and sequences for other species were obtained from NCBI’s Nucleotide database. Denisovan sequences were created by altering the human sequence to match all Denisovan variants in the CDS, using the original Denisovan/modern human SNC pileup created for this project.

RESULTS

Male Reproductive Genes

Little can be said with certainty about a group of hominids who have not lived for tens of thousands of years. All conclusions about any aspect of their existence that cannot be ascertained from their skeletal morphology must be made cautiously and must remain tentative.
This is true even of many of the conclusions drawn from DNA, because we have no way of determining for certain how gene products interacted with one another in vivo, nor can we see patterns of transcription and translation. However, parallels can be drawn between observed patterns of behavior and the genetic signatures that accompany them among modern species, and the genetic signatures found in an extinct line. This section proposes to draw one such connection between the Denisovan sequence and observed mating behaviors of several extant primate species and their accompanying genetic signatures of rates of positive and purifying selection.

Genes that code for proteins forming part of the sperm’s structure, genes that are expressed specifically in the testis, and genes that code for proteins in semen appear with noteworthy frequency in the catalog of changes unique to Denisovans and found at fixed loci. If all other factors were equal, the percentage of the catalog made up by such genes would be disproportionate. There are 411 different types of cells in the human body (Vickaryous & Hall, 2006). If all factors were equal and the distribution entirely random, one would expect genes related to each of the 411 cell types to make up approximately 0.24 percent of the 2060-gene catalog of mutations unique to Denisovans and found at fixed loci. Sperm-related proteins make up 1.94 percent of the catalog, appearing about 8-fold more often than expected by chance. However, gene evolution is known to proceed at uneven rates, with a non-random distribution. Given the fact that in mammals, sex-linked genes are known to undergo particularly rapid evolution (Good et al., 2013; Sackton et al., 2013), the percentage found is not unexpected. Sex-biased genes that are expressed preferentially in males undergo more rapid evolutionary change than either female-biased genes or non-sex-biased genes (Ellegren & Parsch, 2007).
Therefore the catalog’s abundance of changes in genes that code for proteins specific to spermatozoa and other male reproductive genes is in keeping with what we know about male reproductive gene evolution. Nonetheless, the frequency of the appearance of such genes here, and the relationship of selective pressures on male reproductive genes to mating systems, made these male reproductive genes an intriguing group to study. It has been previously noted that multiple fixed, functional substitutions exist in human SPAG17, a gene coding for a protein found in the sperm flagellum, distinguishing it from the same gene in Neanderthals (R.E. Green et al., 2010). This fact also points to the opportunity to discover differences in reproductive proteins.

Certain male reproductive genes have been shown to undergo more rapid positive selection in species in which the females are observed to mate with more males during the periovulatory period (Dorus, Evans, Wyckoff, Choi, & Lahn, 2004; Rooney & Zhang, 1999; Torgerson, Kulathinal, & Singh, 2002; Wong, 2010). Dorus et al. give the mean number of male partners per female per ovulatory period for several species, along with the type of mating system in which each species engages. At the lower end of the spectrum in terms of mean number of male partners are two polygynous species, in which one male has exclusive access to a group of females and the females rarely if ever mate with more than one male per ovulation. These two species are the gorilla and the colobus monkey, each with a mean number of approximately one male partner per periovulatory period. The gibbon, which is monogamous, also has approximately one male partner per female per periovulatory period. The orangutan has a dispersed mating pattern, with animals living in relative solitude rather than forming strong social bonds within mating pairs or groups. Orangutan females have a mean of one to two male partners per fertile period.
Humans, with a wide variety of mating patterns worldwide, also have a mean of one to two male partners per female ovulation. The two remaining species on the list have a multimale-multifemale mating pattern. Macaque females mate with a mean of about three partners in each periovulatory period, and chimpanzees are the most promiscuous, with a mean of eight male partners (Dorus et al., 2004).

A number of sperm proteins have been shown to evolve more rapidly than proteins expressed in other tissues. Further, this rapid evolution is demonstrably due to positive selection, because these genes exhibit a high rate of nonsynonymous substitution, yet their rate of synonymous substitution is no higher than that of genes expressed elsewhere. Many such proteins are diverse in function. Some of the expressed proteins bind the egg, others are involved in gene regulation, and others are involved in glycolysis. What they have in common is that they are specifically expressed in sperm (Torgerson et al., 2002). This effect can also be observed in the male reproductive gene SEMG2, which codes for a semen coagulation factor. SEMG2 evolves more rapidly in primate species in which the females mate with more males in a single periovulatory period. These species may be said to experience sperm competition. That is, for a male to pass on his genetic material to the next generation, his sperm must be better equipped to fertilize the egg than the sperm of the female’s other mates. As compared to species in which females mate with no more than one or two males, the species in which females mate with multiple males have “higher sperm counts, richer mitochondrial loading in sperm and more prominent semen coagulation” (Dorus et al., 2004). In general, not just in mammals, evolution of male reproductive genes is expected to proceed more rapidly in polyandrous species such as Drosophila than in monandrous species (Ellegren & Parsch, 2007).
The notion that sperm competition leads to rapid positive selection in sperm is one that passes logical scrutiny. However, the molecular evidence for this hypothesis has not gone without skeptical scrutiny. For example, protamine 1, a sperm nuclear protein given by Torgerson et al. in 2002 as an example of a positively selected male reproductive gene, was described by another team of researchers in 1999 as showing inconclusive evidence for positive selection and requiring further study (Rooney & Zhang, 1999). A study on humans and four other species of great apes (the bonobo, the chimpanzee, the gorilla, and the orangutan) contradicts the results of Torgerson et al.’s research by finding that male reproductive gene evolution was influenced by gene function but not by mating patterns (Good et al., 2013). A 2010 comparison of human, chimpanzee, squirrel monkey, owl monkey, macaque, and colobus rates of nonsynonymous mutation found that testis-specific genes have a more rapid rate of nonsynonymous mutation in chimpanzees than in humans. This study did not claim to show positive proof of correlation between positive selection and mating system (Wong, 2010).

In short, although there has been some dissent, a number of studies have supported the idea that sperm competition leads to rapid evolution and positive selection in male reproductive genes. In recent years, the correlation has been called “a well-documented proxy of sexual selection” (Grayson & Civetta, 2012). The overall indication of the literature consulted on this topic is that positive selection due to sperm competition remains a distinct possibility and can be used as the basis for informed conjecture, but it cannot be assumed as absolute fact until further studies are completed on the correlation between primate mating patterns and positive selection in sperm-, testis-, and semen-specific genes.

Approximately 25% of all possible mutations will be synonymous.
The formula used to calculate the ratio of nonsynonymous to synonymous changes must account for the fact that some changes are synonymous, a consequence of the degeneracy of the genetic code. Once this adjustment is made, the ratio of dN/dS will be equal to one if evolution acting on these particular homologous regions is neutral; that is, if neither purifying selection nor positive selection has been at work on these two sequences, relative to each other (Hurst, 2002). Two sequences may have a ratio very close to one even if strong selective forces have acted on them, because purifying selection and positive selection may both have been at work to a significant, yet approximately equal, degree. Nonetheless, given the fact that neutral evolution and genetic drift are taken as the default assumption for the source of most genetic change, a ratio of one is generally accepted as an indication of neutral evolution, and a ratio that differs from one is generally accepted as sufficient evidence of selection.

Each nucleotide, depending on whether it is first, second, or third in the codon and on which codon it is part of, differs in the likelihood that a mutation will result in an amino acid change. A number of different programs exist that use different methods to calculate this likelihood (Hurst, 2002). The likelihood referred to here is expressed as a fraction with a value between zero and one that signifies the probability that a change at that locus will be synonymous or nonsynonymous. For example, the A in ATT has a synonymous value of 0 and a nonsynonymous value of 1, because a mutation to any of the other three nucleotides will result in an amino acid change. However, the T in position three in ATT has a synonymous value of 2/3 and a nonsynonymous value of 1/3, because a shift to a C or an A in this position will result in a silent (synonymous) mutation. Only a mutation to G at this position will result in an amino acid change.
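As a rough illustration of the per-site values just described, the following Python sketch computes, for a given codon position, the fraction of the three possible single-nucleotide changes that leave the amino acid unchanged. The function names are hypothetical, the standard genetic code is assumed, and a change to a stop codon is simply counted as nonsynonymous, which is a simplification that dedicated tools may handle differently.

```python
from fractions import Fraction
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_fraction(codon, position):
    """Fraction of the 3 possible changes at `position` (0, 1, or 2) that are silent."""
    original = CODON_TABLE[codon]
    unchanged = 0
    for base in BASES:
        if base == codon[position]:
            continue  # not a mutation
        mutant = codon[:position] + base + codon[position + 1:]
        if CODON_TABLE[mutant] == original:
            unchanged += 1
    return Fraction(unchanged, 3)


def count_sites(coding_sequence):
    """Return (synonymous sites, nonsynonymous sites) summed over whole codons."""
    n_codons = len(coding_sequence) // 3
    synonymous = Fraction(0)
    for i in range(n_codons):
        codon = coding_sequence[3 * i:3 * i + 3]
        for position in range(3):
            synonymous += synonymous_fraction(codon, position)
    return synonymous, 3 * n_codons - synonymous


print(synonymous_fraction("ATT", 0))  # 0   (first position: every change alters the amino acid)
print(synonymous_fraction("ATT", 2))  # 2/3 (third position: only a change to G alters it)
```

Exact fractions are used rather than floating-point numbers so that the values match the 0, 1/3, and 2/3 quoted in the text.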
The numbers obtained for all sites in a sequence are added together, resulting in a complete number of synonymous sites and a complete number of nonsynonymous sites for that sequence. The comparison between two sequences is obtained by dividing the number of nonsynonymous differences between them by the total number of nonsynonymous sites. The result is designated dN (or sometimes Ka). Likewise, the number of synonymous differences is divided by the total number of synonymous sites, for a result designated dS (or, alternatively, Ks). The final result is obtained by dividing dN by dS, and dN/dS = ω, the nonsynonymous-to-synonymous ratio between the two sequences, and a commonly accepted measure of the degree of positive or purifying selection that has occurred along two lineages.

This measure is quite accurate when the two lineages are moderately closely related, but it becomes less reliable if the number of nonsynonymous changes between them is very high. In such cases, the mathematical models can no longer distinguish a measurable degree of divergence. As Hurst describes it, “the amount of information from the alignment decreases and we approach saturation” (Hurst, 2002). The modern human and Denisovan sequences are, of course, so similar that inaccuracy due to an overabundance of changes is far from an issue. Another limitation on the usefulness of the dN/dS metric is that the calculation was originally intended to assess selection when studying lineages that are only distantly related to one another. This is generally a problem if two samples from within the same population are compared to one another (Kryazhimskiy & Plotkin, 2008). The logical conclusion is that, as modern humans and Denisovans are neither so distantly divergent as to have an overly high number of nonsynonymous differences nor part of the same population, dN/dS is a useful calculation to apply.
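The arithmetic behind ω can be sketched in a few lines of Python. This is a minimal counting-style sketch in the spirit of the Nei-Gojobori approach, not the maximum-likelihood codeml calculation run through PAL2NAL in this project; the function names and the example counts are hypothetical. It assumes that the numbers of synonymous and nonsynonymous sites and differences have already been counted, and it applies the Jukes-Cantor correction for multiple substitutions at the same site.

```python
import math


def jukes_cantor(p):
    """Convert a proportion of differing sites into an estimated number of substitutions per site."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (dN, dS, omega) for one pairwise comparison.

    dN = corrected nonsynonymous differences per nonsynonymous site
    dS = corrected synonymous differences per synonymous site
    """
    if syn_diffs == 0:
        # dS would be zero, so the ratio has no defined value.
        raise ValueError("no synonymous differences: dN/dS is undefined")
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    return dn, ds, dn / ds


# Hypothetical counts: the same proportion of sites differs in both classes,
# so the ratio comes out at the neutral value of one.
dn, ds, omega = dn_ds(nonsyn_diffs=6, nonsyn_sites=300, syn_diffs=2, syn_sites=100)
print(round(omega, 4))  # 1.0
```

Values produced by a sketch like this should not be expected to match codeml output exactly, since the two approaches use different models.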
One crucial point to keep in mind when choosing sequences is that dN/dS can only be calculated if both synonymous and nonsynonymous changes exist between them. The ratio is useless if no nonsynonymous changes occur, because the numerator will always be zero in such a case. If no synonymous differences exist between the sequences, the calculation cannot be performed because the denominator will be zero.

Two of the male reproductive genes with Denisovan/modern human differences at fixed loci have been found to show a correlation between positive selection and mating system, discussed in greater detail below.

The foundation for most calculators of ω is the maximum-likelihood method, which, in brief, involves counting the numbers of synonymous and nonsynonymous sites and making adjustments for the fact that multiple substitutions may have occurred at the same locus through evolutionary time. Because transitions are more likely to be incorporated into a population as polymorphisms, and transversions are rarer due to less-favorable molecular kinetics, the more accurate methods also take into account the ratio of the rates of transitions to transversions in the compared sequences (Z. Yang & Nielsen, 2000).

It is important to keep in mind when doing pairwise comparisons such as these that the ratio obtained is not a measure of one of the two species as compared to the other. It is instead a measure of the degree of positive selection in that gene or region for both species as compared to each other. Although when we construct phylogenetic trees we may choose a species to serve as a representative of the ancestral state, no extant species is in point of fact ancestral to another extant species. Likewise, an extinct lineage cannot be assumed to be ancestral to a related living lineage, and in fact we know with certainty that Denisovans are not ancestral to modern humans, except for a tiny percentage of introgression into Melanesian DNA.
The various software tools that calculate dN/dS construct unrooted trees. A comparison of human SEMG2 to gorilla SEMG2 will give the same ω as a comparison of gorilla to human. Meaningful conclusions regarding the significance of ω, in the sense of distinguishing the degree of positive selection in the same gene in each of two species, can only be made by multiple interspecies comparisons, rather than calculating a single ratio based on the numbers of synonymous and nonsynonymous differences between two species.

It is also important to note that many tools exist for determining ω. All operate on the same principles but use different algorithms. This means that multiple ratios obtained using the same tool and identical methods are meaningful relative to each other, but two different software applications may not produce identical values for ω using the same data. As we have already established, ω is not an absolute measure but an indication. Therefore, if the results of two tools or calculation methods are to be compared, trends in ω should be expected to follow similar curves, but exact values of ω may be different. Another logical point follows this one. Any correlation between ω and another measure may be determined in terms of directionality, but not in terms of precise numerical equivalencies.

The goal of this section is to compare the degree of positive selection in Denisovan and modern human male reproductive genes. Out of all the genes found to have Denisovan-specific mutations at fixed loci and those found to have human-specific mutations at fixed loci, two, SEMG2 and ADAM18, have previously been shown to have a direct correlation between degree of positive selection undergone by the gene and level of promiscuity. In a multispecies comparison, SEMG2 shows progressively more positive selection in species in which the females mate with a larger number of males during a single fertile period (Dorus et al., 2004).
According to a 2010 study by Finn and Civetta, the ADAM proteins all show evidence of positive selection, particularly at codons found within their disintegrin domains. ADAM2, ADAM18, and ADAM23 (as well as SEMG2, which was used as a positive control in this study) show a correlation between positive selection and multimale-multifemale mating systems, in which sperm are subject to strong postcopulatory selective pressures (Finn & Civetta, 2010).

Although only the male reproductive genes with differences at fixed loci were selected for closer scrutiny in this study, all differences between the modern human and Denisovan exons of the relevant isoform of SEMG2 were taken into consideration in calculating ω values. The goal of this section was to determine whether a significant difference exists between the degree of positive selection found in modern human SEMG2 and Denisovan SEMG2. Six pairwise comparisons were performed: Denisovan to chimpanzee, human to chimpanzee, Denisovan to gorilla, human to gorilla, chimpanzee to gorilla, and human to Denisovan. The chimpanzee and gorilla sequences for this gene were used because the chimpanzee is the most promiscuous, and the gorilla the least, of the species for which data on mating systems is available. No complete, reviewed sequence for gorilla ADAM18 CDS was available, and therefore it was concluded that reliable dN/dS calculations could not be obtained. The results of the SEMG2 calculations are given in Table 1.

Species Pair            ω
Denisovan:Chimpanzee    1.0509
Human:Chimpanzee        1.2220
Denisovan:Gorilla       0.4431
Human:Gorilla           0.4852
Chimpanzee:Gorilla      1.9514
Human:Denisovan         0.9433

Table 1. SEMG2 dN/dS Ratios.

When compared directly with each other, the Denisovan and human sequences have a value only slightly below one, which is to be expected due to the close similarity of these sequences.
When the human and Denisovan SEMG2 sequences are compared to gorilla SEMG2, the value is significantly below one, indicating purifying selection. When compared to the chimpanzee SEMG2 sequence, Denisovan and human SEMG2 sequences both give a value above one, indicating positive, otherwise known as diversifying, selection. The implication of these ω values is best understood by contrasting the chimpanzee-to-gorilla comparison with the modern human and Denisovan calculations against each of these two other primate species. Chimpanzee matings involve the most males per female fertile cycle, and gorilla matings involve the fewest. This is reflected in an ω that is distant from the neutral value of one.

It was observed during the performance of these calculations that PAL2NAL does not always give precisely the same value for ω when given exactly the same sequences, particularly when the two sequences are quite different. This variability was used to calculate the standard deviation of the ω calculation method.

In both the comparisons to the fewest-males species and the most-males species, the modern human ω is higher than the Denisovan ω. At first glance, this might appear to indicate increased positive selection along the modern human lineage. However, this is misleading, because the informative aspect of the ω values is not in absolute number but in relative proximity to the species with the most and fewest male partners per fertile time. The difference between modern humans and Denisovans in this respect, despite being slight in comparison to the difference between more distantly related lineages, is significant in light of the small variability of the calculation method. The following figure provides a visual expression of the relationships from the table above.

Figure 1. SEMG2 dN/dS Ratios. (Chart title: SEMG2 dN/dS Ratio: Humans and Denisovans Superimposed; vertical axis from 0 to 1.5.)
Compared to Humans Compared to Denisovans 35 More nonsynonymous differences exist between the modern human and chimpanzee sequences than between the Denisovan and chimpanzee sequences (a difference in ω value of 0.1711). It is also true that more nonsynonymous differences exist between the modern human and gorilla sequences than between the Denisovan and gorilla sequences, but this difference is comparatively small (a difference in ω value of 0.0421), four-fold smaller than the difference between the two pairwise comparisons against the chimpanzee sequence. The two nonsynonymous differences between the modern human and Denisovan sequences, resulting in changes to amino acids 274 and 298, are both sites at which the Denisovan sequence has the derived allele and the human sequence has the ancestral allele, as compared to the gorilla, chimpanzee, macaque, colobus, and orangutan. One of these sites is fixed in modern humans, and the other is polymorphic in modern humans. The presence of the fixed locus that is derived in Denisovans indicates that modern human SEMG2 is under purifying selection that was relaxed in Denisovans at that locus. Based on the previous studies correlating ω with mating system, this suggests the possibility that Denisovans, on average through evolutionary time since the reproductive split with the modern human lineage, may have had more male partners per periovulatory period than anatomically modern human females. 36 Premature Stop Codons and Loss-of-Stop Mutations A total of 12 CCDS genes contain premature stop codons at loci fixed in modern humans. These premature stops are summarized in Table 2. 
Gene Name             Locus of Mutation  Modern Human Allele  Denisovan Allele  Amino Acid in Modern Humans  Amino Acid Position  Total Amino Acids in Modern Human Protein
ZCCHC5                X_77913818         G                    A                 Q                            34                   475
LRCH2                 X_114347796        G                    A                 Q                            761                  765
TTLL10                1_1133168          G                    T                 E                            655                  673
NADK                  1_1691211          G                    A                 R                            132                  446
CCDC30 (aka PFD6L)    1_43119670         C                    T                 Q                            775                  783
OR5AC2                3_97806515         C                    T                 R                            167                  309
GAP43                 3_115382704        C                    T                 Q                            27                   238
PROL1                 4_71275250         C                    T                 R                            69                   248
PON3                  7_94992090         C                    T                 W                            253                  354
PZP                   12_9303247         A                    T                 Y                            1459                 1482
EFCAB13               17_45452287        C                    T                 Q                            443                  973
TAC4                  17_47921435        G                    A                 R                            70                   113

Table 2. Premature Stop Codons Found in the Denisovan Exome.

It is not possible to predict the precise nature of the physiological effects of these premature stops, beyond the conclusion that most of these genes were non-functional in Denisovans. Some effects may have been minimal, because some of these genes, such as TTLL10, are members of gene families in which a number of other genes perform similar functions. As another example, GAP43 has two isoforms, and only one is affected by the premature stop codon. These factors most likely mitigated the effects of the loss of these genes. Another important observation based on the table above is that several of these stop codons, notably those in LRCH2, TTLL10, and CCDC30, are located very near the end of the modern protein’s coding sequence, meaning the protein may well have remained functional in Denisovans, albeit in an altered form. For those premature stops that did result in complete loss of function, some indication of the likely effects comes from the phenotypic effects of variations in these genes when mutations are found in modern humans. Although this does not offer any direct suggestions as to the nature of the differences between Denisovans and modern humans, it does provide insight into general body systems in which differences may have been found, as well as suggesting some possible, very general predilections Denisovans might hypothetically have had for certain types of disease.
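The observation that some stops fall very near the end of the coding sequence can be made concrete with a short Python sketch. The positions and protein lengths are taken from Table 2; the function names and the 95 percent cutoff are illustrative assumptions introduced here, not thresholds used in this study.

```python
# (position of the premature stop, total amino acids in the modern human
# protein), copied from Table 2.
PREMATURE_STOPS = {
    "ZCCHC5": (34, 475),
    "LRCH2": (761, 765),
    "TTLL10": (655, 673),
    "NADK": (132, 446),
    "CCDC30": (775, 783),
    "OR5AC2": (167, 309),
    "GAP43": (27, 238),
    "PROL1": (69, 248),
    "PON3": (253, 354),
    "PZP": (1459, 1482),
    "EFCAB13": (443, 973),
    "TAC4": (70, 113),
}


def fraction_retained(stop_position, protein_length):
    """Fraction of the modern human protein lying upstream of the stop."""
    return (stop_position - 1) / protein_length


def near_c_terminus(threshold=0.95):
    """Genes retaining at least `threshold` of the protein.

    The default cutoff is a hypothetical value chosen for illustration.
    """
    return sorted(
        gene
        for gene, (position, length) in PREMATURE_STOPS.items()
        if fraction_retained(position, length) >= threshold
    )


if __name__ == "__main__":
    for gene, (position, length) in PREMATURE_STOPS.items():
        print(f"{gene:8s} {fraction_retained(position, length):.3f}")
    print(near_c_terminus())
```

Under this assumed cutoff, the genes flagged include the three named in the text, while a gene such as NADK, truncated at residue 132 of 446, retains well under a third of its sequence.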
The genes with premature stop codons in Denisovans are discussed here in chromosomal order, beginning with X and ascending numerically. It is important to note that this is not a comprehensive list of all Denisovan premature stops in genes with a CCDS, but rather of all such changes at loci that are fixed in modern humans. ZCCHC5 does not have a known disease mutation in modern humans. However, a shared variant, interpreted as benign, was found in siblings with polymicrogyria (Murdock et al., 2011). The other premature stop on the X chromosome is found in LRCH2. This gene also lacks known disease associations due to mutations. LRCH2 expression is down-regulated during human embryogenesis in the presence of ethanol, suggesting that LRCH2 may be one of a number of genes that play some role in fetal alcohol syndrome (Halder et al., 2013). Chromosome 1 harbors three Denisovan-specific premature stops. The first is found in TTLL10, which functions as a slow-acting glycylase, ligating glycine to NAP1 (Ikegami et al., 2008). The second is found in NADK, which presents potentially the most puzzling exception to the conclusion drawn above, that the functions of most of the proteins encoded by these genes were fulfilled by other, similar proteins. NADK consists of 446 amino acids. Amino acid 132, an arginine in modern humans, is replaced by a stop codon in Denisovans, rendering this protein inactive. Until recently, only one mammalian NADK was known to exist (Pollak, Niere, & Ziegler, 2007; R. Zhang, 2013). NADK phosphorylates NAD(+) to yield NADP(+) (Ohashi, Kawai, Koshimizu, & Murata, 2011). NADP(+), through its reduced form NADPH, plays a key role in multiple biosynthetic pathways and functions as a “universal electron donor” (Agledal, Niere, & Ziegler, 2010). NADK also controls the levels of NADPH in humans, which is vital to resistance to oxidative stress (Pollak et al., 2007).
The explanation for the apparent inactivation of NADK in the Denisovan sequence, despite the vital nature of this gene’s function, may be found in the possibility that additional NAD kinases exist in mammals but have yet to be identified. In fact, a human gene that codes for a mitochondrial NAD kinase, MNADK, has recently been identified (R. Zhang, 2013). The third and final premature stop mutation on Chromosome 1 is found in CCDC30, also known as PFD6L. PFD6L is a cytoskeletal protein, only recently characterized and possibly functioning as part of the filament motor system. It is expressed in the pancreas, brain, and kidneys of adult humans (J. Zhang et al., 2006). The first of two premature stops on Chromosome 3 affects OR5AC2. In modern humans, large heterozygous deletions in this gene may be associated with epilepsy (Heinzen et al., 2010). The second Chromosome 3 gene affected by a premature stop, GAP43, is of interest because, based on observations so far, it may be the only neuronal gene to be rendered nonfunctional in Denisovans. Rare mutations in GAP43, possibly affecting the formation of synapses, may be implicated in schizophrenia (Shen et al., 2012). Only one gene on Chromosome 4 is affected, PROL1. The encoded protein, opiorphin, functions to suppress pain and regulate mood (Wisner et al., 2006). Two homologs exist, SMR3A and SMR3B, that also encode proteins of the opiorphin family (Koffler et al., 2012). PROL1 is down-regulated in erectile dysfunction (Tong, Tar, Melman, & Davies, 2008). On Chromosome 7, PON3 includes a premature stop in the Denisovan sequence. Absence or down-regulation of PON3 appears to reduce resistance to oxidative stress and may contribute to neonatal mortality (Kempster, Belteki, Licence, Charnock-Jones, & Smith, 2012). Elevated PON3 may protect against oxidative damage in HIV patients (Aragones et al., 2012). PON3 SNPs are associated with Alzheimer’s disease (Erlich et al., 2012).
Chromosome 12 includes one premature stop, found in PZP. In modern humans, a SNP in PZP may be associated with nonalcoholic fatty liver disease (Chalasani et al., 2010). PZP is highly expressed in Alzheimer’s patients prior to the disease becoming symptomatic (Ijsselstijn et al., 2011). EFCAB13, found on Chromosome 17, also includes a Denisovan-specific premature stop at a locus that is fixed in modern humans. EFCAB13 currently lacks extensive research on disease associations. The encoded protein may be differentially expressed in Graves’ disease, but this apparent difference may be due to experimental error (Matsumoto et al., 2013). A final premature stop, also on Chromosome 17, is located in TAC4, a gene that may be implicated in inflammatory bowel disease (L. Liu et al., 2011). This gene plays a role in early lymphocyte development; TAC4 knockout mice have greater numbers of pro-B cells in bone marrow than wild-type mice (Berger et al., 2010). In addition to these premature stops, the Denisovan sequence includes three loss-of-stop mutations, detailed in Table 3.

Gene Name   Locus of Mutation  Modern Human Allele  Denisovan Allele  Amino Acid Gained in Denisovan Gene
PRR15       7_29606333         T                    C                 Q
ZNF804B     7_88966344         T                    A                 K
KTN1        14_56147406        T                    A                 R

Table 3. Loss-of-Stop Mutations.

The first two loss-of-stop mutations are found on Chromosome 7. The first affects PRR15, in which the mutation of a T to a C in Denisovans results in the conversion of a modern-human stop codon to a glutamine in Denisovans. In modern humans, mutations in PRR15 are associated with Alzheimer’s disease (Olah et al., 2011). The second gene affected on Chromosome 7 is ZNF804B, in which the mutation of a T in modern humans to an A in Denisovans results in the gain of a lysine in place of a stop codon. Modern-human mutations in ZNF804B have been suggested to have associations with anorexia nervosa (K. Wang et al., 2010).
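The rows of Table 3 can be checked against the standard genetic code with a brief Python sketch. It assumes, for illustration only, that the alleles in the table are reported on the coding strand, in which case the substituted T must be the first base of the stop codon, since that is the only position at which TAA, TAG, and TGA contain a T. The function name is hypothetical.

```python
# The three stop codons of the standard genetic code.
STOP_CODONS = ("TAA", "TAG", "TGA")

# Only the sense codons reachable from a stop codon by replacing its first
# base with C or A (standard genetic code).
CODON_TO_AA = {
    "CAA": "Q", "CAG": "Q", "CGA": "R",
    "AAA": "K", "AAG": "K", "AGA": "R",
}


def stops_consistent_with(denisovan_base, gained_aa):
    """Stop codons whose first-position T -> `denisovan_base` change
    produces `gained_aa`.

    Assumes the allele is given on the coding strand.
    """
    return [
        stop
        for stop in STOP_CODONS
        if CODON_TO_AA.get(denisovan_base + stop[1:]) == gained_aa
    ]


if __name__ == "__main__":
    # (gene, Denisovan allele, amino acid gained), copied from Table 3.
    for gene, base, aa in [("PRR15", "C", "Q"), ("ZNF804B", "A", "K"), ("KTN1", "A", "R")]:
        print(gene, stops_consistent_with(base, aa))
```

Under that assumption, a T-to-C change can yield glutamine only from TAA or TAG, a T-to-A change yields lysine from TAA or TAG, and a T-to-A change yields arginine only from TGA.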
The third and final loss-of-stop mutation is found on Chromosome 14, in the gene KTN1, in which the mutation of a T to an A results in the gain of an arginine in Denisovans. KTN1 is differentially expressed in some tumor tissues (Babeto et al., 2011). Variations are found in patients with muscular dystrophy (Aurino et al., 2008).

Genes Related to Neuron Formation and Function

Almost two percent of the catalog of nonsynonymous Denisovan-derived changes at fixed loci consists of changes in genes related to the formation and function of neurons. The total number of neuron-related SNCs in this category is 41, with multiple changes on a few genes for a total of 34 neuron-related genes. This was chosen as an area for investigation for two reasons. The first is the relative frequency of these changes. The second is the fact that neurological differences between modern humans and ancient hominids are a persistent area of interest for researchers and the public. The locations of the mutations in neuronal genes were studied to determine whether the mutations were located in a key functional domain and whether the amino acid substitution represents a dramatic change in chemical properties. A table is included in Appendix B showing all 40 genes, the position and nature of each amino acid change, and the functional effect predictions for each change from three different prediction tools. Amino acid substitution positions and functional effect predictions are taken from the supplementary material of the work of Meyer et al. on the Denisovan genome (Meyer et al., 2012). In addition, the table includes information on the motif or domain where the change is located. After the motifs and domains of fixed derived changes in neuron-associated genes were determined, the annotations and literature sources on each gene were studied to determine which genes were of greatest interest for further follow-up.
Out of 34 genes that both appear in the Denisovan-derived catalog of fixed loci and are associated with neuronal growth, function, or differentiation, 13 were chosen for further investigation and discussion here. These genes are by no means the only neuron-associated genes in the catalog worthy of investigation; other genes among the 34 may well be of equal interest for future study. The 13 discussed below were chosen based on the fact that they are currently known to have disease associations in the modern human population. In many cases, these modern disease variants result in serious impairments unlikely to have been present as a frequent allele in Denisovans. The purpose of addressing the diseases associated with these genes is to point out the phenotypes affected by changes in these genes, in the hopes of shedding light on the aspects of physiology that may have been slightly modified in Denisovans as compared to modern humans. This set of genes is selected for the presence of a mutation that is derived in Denisovans at a locus that is fixed in modern humans. For this purpose, a locus is considered fixed if no known polymorphism exists at the site, or if a polymorphism is known to exist but has a frequency of less than one percent in the modern human population. The mutations in the genes described below do not occur at low-frequency polymorphism sites unless otherwise noted, which means that the loci of the changes are not currently known to have direct associations with the diseases discussed below. As in previous sections, the genes are listed according to chromosomal location.

L1CAM

L1CAM encodes a protein that functions in the cell adhesion of neurons. Mutations in L1CAM have been associated with X-linked mental retardation and hydrocephalus (Vits et al., 1994). The group of heritable diseases linked to L1CAM is collectively known as CRASH Syndrome.
CRASH stands for corpus callosum hypoplasia, retardation, adducted thumbs, spasticity, and hydrocephalus. Deletions that result in a shortening of the protein’s extracellular domain lead to the most severe effects (Yamasaki, Thompson, & Lemmon, 1997). L1CAM is also highly expressed in several types of cancer, including colon cancer, gliomas of the brain or spinal cord, breast cancer, and pancreatic cancer. Its overexpression is associated with metastasis (Gavert et al., 2007; Geismann et al., 2009; H. Zhang et al., 2011). The point mutation in the Denisovan sequence results in a change from leucine to valine, both aliphatic, nonpolar, hydrophobic amino acids. This mutation is predicted to be benign.

PCSK9

The effects of mutations in PCSK9 are more subtle than those in L1CAM. PCSK9 has been associated with hypercholesterolemia (Abifadel et al., 2003). However, nonsense mutations have also been associated with lower than average levels of low-density lipoprotein (LDL) (Cohen et al., 2005). According to mouse studies, PCSK9’s effect on cholesterol levels, via the destruction of LDL receptors in the brain, appears to take place during embryogenesis. This gene also appears to be involved in the apoptosis of neurons. Gain-of-function mutations in PCSK9 lead to an overabundance of LDL cholesterol in the blood, whereas loss-of-function mutations decrease the amount of LDL in circulation (Rousselet et al., 2011). This effect is due to the fact that during early development, PCSK9 degrades LDL receptors (Canuel et al., 2013). PCSK9 may continue to destroy LDL receptors in the liver in adult organisms (Horton, Cohen, & Hobbs, 2007). The change in the Denisovan version of PCSK9 is located in the primary functional domain of this protein, a proprotein convertase subtilisin/kexin domain. The mutation leads to a change from histidine in humans to leucine in Denisovans, a change from a polar to a nonpolar amino acid. Despite this, the change is predicted to be benign.
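The chemical comparisons made for each substitution in this section follow a simple pattern that can be sketched in Python. The side-chain groupings below follow one common textbook convention and are an assumption of this sketch (histidine and tyrosine in particular are classified differently by some sources); the function name is hypothetical.

```python
# Side-chain classes by one-letter amino acid code. The grouping is one
# common convention, assumed here for illustration.
SIDE_CHAIN_CLASS = {
    **dict.fromkeys("GAVLIMP", "nonpolar, aliphatic"),
    **dict.fromkeys("FW", "nonpolar, aromatic"),
    **dict.fromkeys("STCYNQ", "polar, uncharged"),
    **dict.fromkeys("KRH", "polar, basic"),
    **dict.fromkeys("DE", "polar, acidic"),
}


def describe(human_aa, denisovan_aa):
    """Summarize whether a substitution stays within one side-chain class."""
    before = SIDE_CHAIN_CLASS[human_aa]
    after = SIDE_CHAIN_CLASS[denisovan_aa]
    if before == after:
        return f"{human_aa}->{denisovan_aa}: conservative ({before})"
    return f"{human_aa}->{denisovan_aa}: {before} to {after}"


if __name__ == "__main__":
    print(describe("L", "V"))  # L1CAM: leucine to valine
    print(describe("H", "L"))  # PCSK9: histidine to leucine
```

A class change of this kind is only a rough indicator. As the PCSK9 example shows, a polar-to-nonpolar substitution can still be predicted benign by the functional effect prediction tools.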
HMCN1

The protein encoded by HMCN1 is an extracellular immunoglobulin. Based on studies of a C. elegans homolog to HMCN1, this gene most likely plays a role in anchoring mechanosensory neurons to the epidermis (R. W. Li, Li, & Wang, 2012). A few rare mutations in HMCN1 are associated with age-related macular degeneration, but many of the more common mutations in this gene do not appear to have an effect on tendency toward developing macular degeneration (Fisher et al., 2007). Mutations in this gene have also been shown to influence susceptibility to postpartum depression (Friedman, 2009). The Denisovan-specific allele in this gene is predicted to be deleterious. The mutation changes amino acid 509 from arginine to tryptophan, a shift from a polar to a nonpolar residue, and the change is located in an immunoglobulin domain, most likely affecting the protein’s function.

SETD2

This gene encodes a histone H3K36 trimethyltransferase that is initially recruited by RNA polymerase II and additionally recruited during the splicing process (De Almeida et al., 2011). SETD2 interacts with huntingtin, and this interaction may be mediated by an interaction with p53 (Xie et al., 2008). The pathogenesis of Huntington’s disease is known to be due to the elongation of a polyglutamine region of the huntingtin protein, but the mechanism of disease development is not known. The fact that huntingtin interacts with WW domain-containing proteins such as SETD2 (also known as HYPB) suggests that SETD2 and related genes may be involved with pathogenesis (Faber et al., 1998). The disease mutant form of huntingtin binds to WW domains at a higher rate than normal huntingtin, supporting the argument that SETD2 interaction is involved in the disease (Passani et al., 2000). The Denisovan variant in SETD2 is benign or neutral according to two prediction tools and deleterious according to a third.
It converts amino acid 1259 from a glycine in modern humans to a serine in Denisovans, a shift from nonpolar to polar. The change is not located within a major domain or motif.

REST

REST is crucial for the regulation of neuronal differentiation. In neuronal tissues, it activates the expression of numerous genes involved in neural development, and it suppresses these same genes in non-neuronal tissues. Reduced levels of REST expression are a consequence of trisomy 21, both during early development and later in life (Canzonetta et al., 2008). The lower expression of REST in Down syndrome leads to an overexpression of DYRK1A, which encodes a highly conserved protein kinase believed to be involved in memory and learning. This DYRK1A overexpression appears to be responsible for the deregulation of a group of several genes involved in neural development. Therefore, the interaction of REST and DYRK1A may be responsible for a significant part of the trisomy 21 phenotype (Lepagnol-Bestel et al., 2009). The Denisovan-specific allele in REST is not located within a major functional domain or motif. The mutation changes amino acid 885 from threonine to alanine, a change from a polar, hydrophilic amino acid to one that is nonpolar and hydrophobic. The change is predicted to be benign, however, probably due to being located outside the protein’s primary functional domains. The locus of this change is that of a known SNP, rs1442591. The dominant modern-human nucleotide variant at this locus is an A. Both the tiny percentage of modern humans with the minority variant and the Denisovan sequence have a G at this locus. According to dbSNP, the G allele at this locus is not currently known to have any clinical significance.

GDNF

GDNF is a neurotrophic factor particularly involved in the development of peripheral neurons (Trupp et al., 1995).
It is also a factor in the differentiation and survival of dopaminergic neurons in the midbrain (Lin, Doherty, Lile, Bektesh, & Collins, 1993). GDNF knockout mice lack kidneys and enteric neurons (Sánchez et al., 1996). Hirschsprung disease, a genetic abnormality of the colon resulting from a lack of enteric ganglion cells in the colon, is associated with several mutations in GDNF. None of these mutations are sole causative agents of the disease, but they are part of a complex of genetic factors that, in combination, can lead to Hirschsprung (Eketjäll & Ibáñez, 2002). GDNF modulates the excitability of dopamine neurons and may assist in synapse formation and axon branching (Airaksinen & Saarma, 2002). Because of GDNF’s role in the survival of dopaminergic neurons, in the late 1990s rodent and rhesus monkey trials were conducted to determine the efficacy of administering the encoded protein for dopaminergic neuron recovery in Parkinson’s disease. These studies indicated that GDNF may be a possible treatment in humans (Date, Aoi, Tomita, Collins, & Ohmoto, 1998; Gash et al., 1996). However, patient improvements in recent human clinical trials were slow, and GDNF administration as a therapy for Parkinson’s requires further research (Kordower & Bjorklund, 2013). The SNC present in the Denisovan GDNF sequence is most likely deleterious. It is located in a glial cell line-derived neurotrophic factor domain and causes a change in amino acid 79 from aspartic acid to valine, a change from a polar to a nonpolar, hydrophobic residue. This change is of interest because it suggests a possible distinction in the regulation of dopaminergic neurons between Denisovans and modern humans.

NYAP1

The NYAP proteins regulate PI3K signaling, which is a key part of the organization of neurons into their correct functional units. This signaling pathway is being studied for its potential role in schizophrenia, as well as in autism and epilepsy (Mack & Eickholt, 2011).
NYAP1 was first characterized in 2011. Its name stands for Neuronal tYrosine-phosphorylated Adaptor for the PI-3 kinase, and it is crucial for brain development and neurite elongation (Yokoyama et al., 2011). The Denisovan-specific allele in this gene is predicted to be neutral in its effect. Use of motif and domain prediction tools did not identify any predicted domains. This mutation changes a proline in modern humans to a leucine in Denisovans, replacing a cyclic, nonpolar residue with one that is aliphatic, nonpolar, and hydrophobic.

CHAT

CHAT stands for choline O-acetyltransferase. It encodes the enzyme that synthesizes the neurotransmitter acetylcholine. CHAT mutations are implicated in Alzheimer’s disease (Oda, 1999). Mutations that lead to a reduction in levels of choline O-acetyltransferase cause motor disorders (Cai et al., 2004). The CHAT SNC unique to Denisovans is judged to be damaging. It is located within a choline/carnitine O-acyltransferase functional domain. A neutral glutamine in humans is replaced by a basic arginine in Denisovans.

NAV2

NAV2 is expressed in the brain during development and in the kidney in the adult (Maes, Barceló, & Buesa, 2002). This gene’s product functions in the elongation of neurons, neurite growth, and the development of cranial nerves. It also regulates blood pressure in adult mammals. Mutations resulting in reduced protein expression produce a phenotype that includes malformation of the glossopharyngeal and vagus nerves and lower-than-normal nerve density in general (McNeill, Roos, Moechars, & Clagett-Dame, 2010). The Denisovan-specific mutation in NAV2 is located in a calponin homology domain. Calponin homology domains bind actin and are often involved in signaling (Castresana & Saraste, 1995). The change may be deleterious. Amino acid 151 is aspartic acid in human NAV2 and asparagine in Denisovans, a shift from an acidic, polar, charged amino acid to one that is neutral, polar, and uncharged.
This difference in NAV2 provides another indicator of possible ways in which the Denisovan nervous system may have differed, perhaps subtly, from that of modern humans.

CLN6

The protein encoded by CLN6 is expressed in the endoplasmic reticulum and aids the function of lysosomes (Mole et al., 2004). Via several possible missense mutations and deletions, CLN6 is altered in a neurodegenerative disease known as late infantile neuronal ceroid lipofuscinoses (Sharp et al., 2003). A set of point mutations in this gene is associated with adult-onset neuronal ceroid lipofuscinoses, or Kufs disease (Arsov et al., 2011). Mutation of CLN6 is also implicated in a form of progressive, neurodegenerative epilepsy (Andrade et al., 2012). The effect of the difference between Denisovans and modern humans is probably neutral. The SNC results in a shift from a valine at position 133 in humans to an isoleucine in Denisovans. No major motifs or domains were identified.

HEXA

This gene encodes the α-subunit of the β-hexosaminidase A enzyme. Mutations lead to low levels of the enzyme, which results in diseases related to problems with lysosomal storage. Mutations in this gene are also responsible for Tay-Sachs disease, which also involves deficient expression of β-hexosaminidase (Gort, de Olano, Macias-Vidal, & Coll, 2012). The diseases associated with low levels of the enzyme are known as the GM2 gangliosidoses, because without sufficient β-hexosaminidase A, the glycosphingolipid GM2 ganglioside accumulates in the nervous system (Yamanaka et al., 1994). The mutation in this gene, which leads to the substitution of a phenylalanine in Denisovans for a leucine in modern humans, is located within glycosyl-hydrolase family 20, domain 2. This change from an aliphatic residue to an aromatic one, both nonpolar and hydrophobic, is predicted to be benign.

CC2D1A

The protein encoded by this gene is a transcriptional repressor.
Mutations in CC2D1A are associated with a form of autosomal recessive non-syndromic mental retardation, which is mental retardation defined by low IQ without physical disabilities. The differences in intellectual function associated with this type of retardation are believed to be related to abnormalities in neurons and synapses (Basel-Vanagaite et al., 2006). CC2D1A’s encoded protein represses the expression of the gene for the serotonin-1A receptor (Rogaeva & Albert, 2007). CC2D1A is expressed at lower-than-normal levels in patients with depression (Szewczyk et al., 2010). Perhaps more central to CC2D1A’s role in mental retardation is the fact that it is crucial to synapse maturation. Abnormalities of the synapse response rate and synaptic vesicle trafficking of cortical neurons were detected in CC2D1A-knockout mice (Zhao, Raingo, & Kavalali, 2011). The difference between Denisovan and modern human CC2D1A is located in a domain of unknown function, and the SNC is predicted to be possibly damaging by PolyPhen but non-damaging according to SIFT and Condel. The mutation results in a change from the human glutamine at amino acid position 156 to a glutamic acid in Denisovans, a shift from a polar, uncharged residue to a polar, charged one.

NEFH

Mutations in the KS phosphorylation motif of NEFH are associated with the development of amyotrophic lateral sclerosis (ALS) (Figlewicz et al., 1994). The encoded protein, NF-H, is a neurofilament protein important for cross-linking. An excess accumulation of neurofilaments, as well as abnormal phosphorylation of the tail region of neurofilament proteins, is associated with both ALS and Alzheimer’s disease (Q. Liu et al., 2004). Two nonsynonymous single-nucleotide differences between modern human and Denisovan NEFH are noted. The first affects amino acid 314, resulting in a change from an alanine in modern humans to valine in Denisovans.
This change is located within an intermediate filament protein domain and is predicted to be deleterious. The second change results in a change from a lysine at position 428 in modern humans to an arginine in Denisovans. This change, not located within any major motif or domain, is predicted to be tolerated.

Metabolic Genes

Several genes related to metabolism, including glycolytic enzymes and genes that include SNPs related to dietary differences in the modern human population, were investigated in a search for clues as to the possible dietary habits of Denisovans. Most of the SNPs investigated showed that the Denisovan sequence did not have the modern human minor variant. In other words, these are all loci at which the Denisovan shared the ancestral state with a percentage of modern humans. The SNPs that fall into the above category include rs174570, associated with dwellers in the tropics, with a minor allele that affects cholesterol levels; rs2269426, associated with a high-protein, high-fat diet that includes milk, with a minor allele that affects plasma eosinophil count; rs7395662, associated with foraging subsistence practices, with a minor allele that affects HDL levels; rs10507380, associated with pastoral subsistence practices, with a minor allele that affects electrocardiographic traits; rs17779747 and rs2722425, both associated with populations for whom roots and tubers are the primary sources of calories, affecting QT interval and fasting glucose respectively; and rs2237892, associated with a high consumption of grains (Hancock et al., 2010). This is consistent with the expectation that Denisovans did not have some of the variations known to have arisen in recent millennia in conjunction with the domestication of milk-producing animals and the farming of cereal crops.
At one SNP investigated for its association with a pastoral lifestyle, rs9642880, which confers a susceptibility to bladder cancer (Hancock et al., 2010), the Denisovan sequence was found to have the minor allele. One other SNP, rs4751995 in pancreatic lipase-related protein 2 (PLRP2), is worthy of special mention here. A frequent human variant at this locus is believed to be associated with adaptation to a cereal-based diet (Hancock et al., 2010). In this case, Denisovans do have the minor allele, a G at locus 10:118397884, rather than the major modern human allele, an A. However, in this case, the minor modern variant is the ancestral state, so this is consistent with the hypothesis that the Denisovan sequence does not show hallmarks of any particular adaptations to a starchy diet. The genes related to glycolysis were investigated for changes that involved a derived Denisovan allele at a locus that is fixed in modern humans. In other words, the search was for respects in which Denisovan glycolysis may have evolved along its own lines and differed not only from that of modern humans but also from that of most other extant primates. Four glycolysis genes were found to differ in this respect, as shown in Table 4.

Gene name                                                             Locus of nonsynonymous SNC  Modern human nucleotide/Denisovan nucleotide  CDS position of change  Modern human amino acid/Denisovan amino acid  Protein position of change
H6PD hexose-6-phosphate dehydrogenase (glucose 1-dehydrogenase)       1: 9324159                  C/T                                           1607                    P/L                                           536
PHGDH phosphoglycerate dehydrogenase                                  1: 120266007                A/G                                           299                     N/S                                           100
PFKM phosphofructokinase, muscle                                      12: 48501965                G/C                                           193                     V/L                                           65
PGAM5 phosphoglycerate mutase family member 5                         12: 133291580               G/T                                           328                     V/L                                           110

Table 4. SNCs in Genes Encoding Glycolytic Enzymes.

In H6PD, the change is located in a random coil within a glucose-6-phosphate dehydrogenase domain. In PHGDH, the change is also located within a random coil. The PHGDH change is found in a D-3-phosphoglycerate dehydrogenase domain.
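The CDS and protein positions reported in Table 4 are related by simple codon arithmetic, which the following Python sketch illustrates. The function names are hypothetical, and the sketch assumes 1-based CDS coordinates that begin at the first base of the start codon.

```python
def protein_position(cds_position):
    """1-based codon (amino acid) index containing a 1-based CDS base."""
    return (cds_position - 1) // 3 + 1


def codon_offset(cds_position):
    """Position of the base within its codon (1, 2, or 3)."""
    return (cds_position - 1) % 3 + 1


# (gene, CDS position of change, protein position of change), from Table 4.
TABLE_4 = [
    ("H6PD", 1607, 536),
    ("PHGDH", 299, 100),
    ("PFKM", 193, 65),
    ("PGAM5", 328, 110),
]

if __name__ == "__main__":
    for gene, cds, reported in TABLE_4:
        print(gene, protein_position(cds), reported, codon_offset(cds))
```

Under these assumptions, each CDS position maps to the protein position listed in the table, and the within-codon offset indicates which codon base carries the change.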
In PFKM, the change is located in an extended strand outside of the main phosphofructokinase domain. The change in PGAM5 is located in a random coil, in a phosphoglycerate mutase domain. Further study is needed to determine the significance of the changes found in these enzymes.

Metabolic Genes with Known SNPs in Modern Humans

A small selection of genes was taken for the purpose of determining whether known modern variants that have been the focus of existing research are present in the Denisovan genome. This investigation, even for this limited set of genes, is preliminary. Since these genes are known to be polymorphic in the modern human population, they may well have been polymorphic in the Denisovan population also and will require further investigation once more Denisovan individuals have been sequenced. Hemochromatosis is a genetic disease that results in excess accumulation of iron in multiple tissues. It results from a mutation in the HFE gene, also known as HLA-H, found on chromosome 6 (Feder et al., 1997). A likely candidate mutation is the nonsynonymous 845G→A mutation in HFE (Beutler, Felitti, Koziol, Ho, & Gelbart, 2002). The Denisovan sequence lacks this mutation. The juvenile-onset form of hemochromatosis results from mutations in another gene, HFE2, which is found on chromosome 1 (Papanikolaou et al., 2003). The Denisovan sequence for HFE2 shows no variations from the majority human sequence, indicating that the sequenced Denisovan individual did not carry the mutations most commonly associated with hemochromatosis. CPT1A encodes carnitine palmitoyltransferase IA, which regulates the metabolism of fatty acids. Among the Inuit people of Greenland and Canada, a loss-of-function mutation, p.P479L, is associated with higher-than-average levels of plasma HDL and may be protective against atherosclerosis or other cardiovascular disease (Rajakumar et al., 2009).
This gene was investigated for the possibility that the Denisovans might have shared this mutation, which is a candidate for adaptation to a cold climate and a high-protein, high-fat diet. However, this common Inuit variant was not found in the Denisovan sequence.

TCF7L2, which encodes transcription factor 7-like 2, was investigated for its association with type 2 diabetes. Two polymorphisms in TCF7L2 in particular, rs12255372 and rs7903146, are associated with poor glucose tolerance (Florez et al., 2006). The Denisovan sequence was found to have the major human variant at rs12255372, indicating that, in this individual at least, that risk factor for type 2 diabetes was not present. The sequence does have the risk allele at rs7903146, however, indicating that at least one risk allele for poor glucose tolerance was present in the Denisovans.

In modern humans, the minor variant of SNP rs9939609, in the first intron of the FTO gene, is associated with an increased risk of obesity. In this case the derived state, a T at locus 16:53820527, is the major modern human variant. The minor allele, an A at this locus, is the ancestral state and is associated with a greater accumulation of fat mass (Frayling et al., 2007). The Denisovan sequence might be expected to match the ancestral state at this locus, given that the lineage's evolution ceased before the development of modern methods of obtaining surplus calories. This is in fact the case. The Denisovan sequence has the ancestral A at the locus of rs9939609, suggesting that the Denisovans shared a tendency toward higher body mass with a significant minority of modern humans. Interestingly, the Denisovan sequence also differs from the modern human reference sequence at a different, non-polymorphic site, 16:53738106, where the Denisovans have a G and modern humans, along with other extant primates, have an A.

FTO codes for an enzyme that appears to demethylate a minor type of DNA lesion, 3-methylthymine.
The protein product of this gene is highly expressed in the brain, and cycles of feeding and fasting affect its abundance. However, the exact mechanism by which FTO affects body mass is not known (Gerken et al., 2007). The Denisovan-derived change occurs outside the FTO catalytic domain, and its impact on protein function is predicted to be neutral.

DISCUSSION

Selective Pressures in Male Reproductive Genes

Sites located in male reproductive genes make up a relatively high percentage of the catalog of nonsynonymous single-nucleotide changes between modern humans and Denisovans, at loci that are fixed in modern humans. This is unsurprising since reproductive genes, and male reproductive genes in particular, are under greater selective pressures than genes expressed in tissues not related to reproduction. Two of the genes found to have fixed-locus differences between the two sequences, SEMG2 and ADAM18, have previously been shown to undergo greater positive selection in species in which greater post-mating sperm competition is a factor.

One of these genes, SEMG2, was evaluated for comparative degrees of positive selection in modern humans and Denisovans, as measured by dN/dS. When both the Denisovan and modern human sequences were compared to the chimpanzee and gorilla sequences, both hominid lineages were found to be more closely aligned to chimpanzees than to gorillas. This effect was more pronounced in the Denisovan lineage, indicating greater disparity between modern humans’ mating practices and the multimale-multifemale mating practices of chimpanzees than between Denisovans and chimpanzees. This indicates the possibility that Denisovans may have tended to have a greater degree of post-copulatory selective pressure on sperm, due to females mating with more males per periovulatory period.

Human females and, presumably, sister lineages such as the Denisovans and Neanderthals, tend to average just over one male partner per periovulatory period.
The tentative finding of this research suggests that, solely in terms of mating patterns, Denisovans may have been slightly more chimpanzee-like than modern humans; that is, in comparison to modern humans they may have tended more toward matings between one female and multiple males in the same fertility cycle. This might also be expressed conversely, in the statement that in comparison to Denisovans, modern humans may have slightly more tendency toward polygyny, in which a single male has access to multiple females who rarely mate with multiple males during a single fertility cycle. Useful future research on this topic would include an analysis of Denisovan ADAM2, ADAM18, and ADAM23.

Neuronal Genes

Changes in neuron-related genes represent almost two percent of the catalog of Denisovan-specific changes at fixed loci. This frequency warrants investigation. The nature of the changes in neuronal genes, overall, is not drastic, nor are most of the changes associated with known polymorphisms in modern humans. Instead, the nature of the sub-catalog of fixed derived changes suggests that Denisovans had subtle neurological differences from modern humans, not dramatic ones. This finding calls for further research on the specific consequences of the changes noted, particularly those predicted to be damaging and located within primary functional domains of the encoded proteins.

Several of the neuronal changes noted are of particular interest. One of these is the change in a glial cell line-derived neurotrophic factor domain of GDNF, which is involved in the regulation of dopaminergic neurons. Another is the change in NYAP1, which, while not currently predicted to be damaging, suggests the possibility of some effect on brain function due to this gene’s possible role in epilepsy, autism, and schizophrenia.
Yet another gene that may be the site of some of the features that made Denisovans unique is NAV2, which includes two Denisovan-derived changes at fixed loci and is involved in regulating nerve density as well as the formation of the glossopharyngeal and vagus nerves. Another of the neuronal genes studied, CC2D1A, is intriguing for its role in mental retardation and its function in proper synapse formation. Future study of the Denisovan-specific mutations in neuronal genes may include phylogenetic analysis of these genes and laboratory study of these specific mutations.

Premature Stops and Loss-of-Stop Mutations

Not all premature stops in the Denisovan genome have been described here, because this investigation is limited to premature stops that are found in genes with a CCDS and that are due to mutations at loci fixed in the modern human population. The majority of the premature stops discovered in this research are found in genes whose functions are most likely also performed by other genes, and therefore the impact of these loss-of-function mutations may not have been extensive. The sole exception is NADK, which is essential in modern humans and has been described as the only gene performing its function, which is to phosphorylate NAD(+). However, a mitochondrial gene that phosphorylates NAD(+), MNADK, has recently been described. Other genes may also exist that perform this function but have not yet been discovered. A significant opportunity for future study lies in investigating the implications of the inactivation of these genes.

Only three loss-of-stop mutations are found in the Denisovan sequence at fixed loci. These present the possibility of differences in function in PRR15, a gene associated with Alzheimer’s disease; ZNF804B, a gene that may be associated with anorexia nervosa; and KTN1, a gene associated with muscular dystrophy.
The precise consequences of these loss-of-stop mutations are unknown, but the mutation in PRR15 adds to the evidence for some differences in neuronal function. Like several of the genes described in the neuronal gene section, PRR15 mutations are related to neuronal function later in life. This suggests the need for future studies on the possibility that Denisovan neurons aged differently from those of modern humans. One tentative hypothesis is that Denisovans may, on average, have been more susceptible than modern humans to age-related neurological degeneration, including conditions similar to Alzheimer’s and Parkinson’s disease. The converse hypothesis is also possible: these changes could have been protective against damage due to aging. However, this is less likely, given that most of the changes are predicted to have deleterious effects. Future research could include single-gene knockout studies on the genes inactivated in the Denisovan sequence.

Metabolic Genes

Several SNPs known to be associated with adaptations to foods from cultivated or domesticated sources, primarily starches, cereals, and milk, were investigated. In all cases except one, the Denisovan sequence shares the ancestral allele and, as expected, lacks the allele associated with evolutionarily recent dietary adaptations. The finding in the Denisovan sequence of a minor allele associated with a pastoral lifestyle does not offer any further information about Denisovan subsistence practices, because the minor variant of this SNP is associated with bladder cancer susceptibility rather than having a direct metabolic effect.

While it is no surprise to learn that Denisovans most likely did not eat grains, the result is informative nonetheless, because the sequence also lacks variants associated with the frequent consumption of roots and tubers, which may conceivably have been part of a wild-foods diet among hominids long before the advent of horticulture or agriculture.
Although we may never be able to reconstruct exactly what the average Denisovan consumed in a typical day, we may hypothesize that he or she probably did not obtain a large percentage of calories from high-starch foods.

Unique Denisovan variants at fixed modern human loci were found in four of the genes that code for enzymes involved in glycolysis. These changes may provide the foundation for future research.

Additional genes investigated that are related to metabolism include HFE and HFE2, CPT1A, and TCF7L2. At most of the SNP loci in these genes, the Denisovan sequence shares the major human variant, indicating no evidence of hemochromatosis and no evidence of cold-climate adaptation resulting in elevation of plasma HDL. The one exception is that the Denisovan sequence has the risk allele at one of the two SNPs in transcription factor 7-like 2 that confer diabetes risk. FTO was also investigated, and the Denisovan sequence was found to share the ancestral human allele at rs9939609, as well as having a derived variant not located within the FTO catalytic domain and predicted to be neutral in effect. Future research on metabolic genes may include study of additional diet-related SNPs, a search for additional risk alleles for diabetes, and analysis of the derived exonic SNC in FTO.

REFERENCES

Abi-Rached, L., Jobin, M. J., Kulkarni, S., McWhinnie, A., Dalva, K., Gragert, L., . . . Plummer, F. A. (2011). The shaping of modern human immune systems by multiregional admixture with archaic humans. Science, 334(6052), 89-94.
Abifadel, M., Varret, M., Rabès, J., Allard, D., Ouguerram, K., Devillers, M., . . . Erlich, D. (2003). Mutations in PCSK9 cause autosomal dominant hypercholesterolemia. Nature Genetics, 34(2), 154-156.
Agledal, L., Niere, M., & Ziegler, M. (2010). The phosphate makes a difference: cellular functions of NADP. Redox Report, 15(1), 2-10.
Agoni, L., Golden, A., Guha, C., & Lenz, J. (2012). Neandertal and Denisovan retroviruses.
Current Biology, 22(11), R437-R438.
Airaksinen, M. S., & Saarma, M. (2002). The GDNF family: signalling, biological functions and therapeutic value. Nature Reviews Neuroscience, 3(5), 383-394.
Andrade, D. M., Paton, T., Turnbull, J., Marshall, C. R., Scherer, S. W., & Minassian, B. A. (2012). Mutation of the CLN6 gene in teenage-onset progressive myoclonus epilepsy. Pediatric Neurology, 47(3), 205-208. doi: 10.1016/j.pediatrneurol.2012.05.004
Aragones, G., Garcia-Heredia, A., Guardiola, M., Rull, A., Beltran-Debon, R., Marsillach, J., . . . Camps, J. (2012). Serum paraoxonase-3 concentration in HIV-infected patients. Evidence for a protective role against oxidation. Journal of Lipid Research, 53(1), 168-174. doi: 10.1194/jlr.P018457
Arsov, T., Smith, K. R., Damiano, J., Franceschetti, S., Canafoglia, L., Bromhead, C. J., . . . Rajagopalan, S. (2011). Kufs Disease, the major adult form of neuronal ceroid lipofuscinosis, caused by mutations in CLN6. The American Journal of Human Genetics, 88(5), 566-573.
Aurino, S., Piluso, G., Saccone, V., Cacciottolo, M., D'Amico, F., Dionisi, M., . . . Nigro, V. (2008). Candidate-gene testing for orphan limb-girdle muscular dystrophies. Acta Myologica, 27, 90-97.
Babeto, E., Conceicao, A. L., Valsechi, M. C., Peitl Junior, P., de Campos Zuccari, D. A., de Lima, L. G., . . . Rahal, P. (2011). Differentially expressed genes in giant cell tumor of bone. Virchows Archiv, 458(4), 467-476. doi: 10.1007/s00428-011-1047-4
Basel-Vanagaite, L., Attia, R., Yahav, M., Ferland, R. J., Anteki, L., Walsh, C. A., . . . Taub, E. (2006). The CC2D1A, a member of a new gene family with C2 domains, is involved in autosomal recessive non-syndromic mental retardation. Journal of Medical Genetics, 43(3), 203-210.
Berger, A., Benveniste, P., Corfe, S. A., Tran, A. H., Barbara, M., Wakeham, A., . . . Paige, C. J. (2010). Targeted deletion of the tachykinin 4 gene (TAC4−/−) influences the early stages of B lymphocyte development. Blood, 116(19), 3792-3801.
Besecker, J., Cornell, K. A., & Hampikian, G. (2012). Dynamic passivation with BSA overcomes LTCC mediated inhibition of PCR. Sensors and Actuators B: Chemical.
Beutler, E., Felitti, V. J., Koziol, J. A., Ho, N. J., & Gelbart, T. (2002). Penetrance of 845G→A (C282Y) HFE hereditary haemochromatosis mutation in the USA. The Lancet, 359(9302), 211-218.
Burbano, H. A., Green, R. E., Maricic, T., Lalueza-Fox, C., de La Rasilla, M., Rosas, A., . . . Pääbo, S. (2012). Analysis of human accelerated DNA regions using archaic hominin genomes. PLoS One, 7(3), e32877.
Cai, Y., Cronin, C. N., Engel, A. G., Ohno, K., Hersh, L. B., & Rodgers, D. W. (2004). Choline acetyltransferase structure reveals distribution of mutations that cause motor disorders. The EMBO Journal, 23(10), 2047-2058.
Canuel, M., Sun, X., Asselin, M., Paramithiotis, E., Prat, A., & Seidah, N. G. (2013). Proprotein convertase subtilisin/kexin type 9 (PCSK9) can mediate degradation of the low density lipoprotein receptor-related protein 1 (LRP-1). PLoS One, 8(5), e64145.
Canzonetta, C., Mulligan, C., Deutsch, S., Ruf, S., O'Doherty, A., Lyle, R., . . . Groet, J. (2008). DYRK1A-dosage imbalance perturbs NRSF/REST levels, deregulating pluripotency and embryonic stem cell fate in Down syndrome. American Journal of Human Genetics, 83(3), 388.
Castresana, J., & Saraste, M. (1995). Does Vav bind to F-actin through a CH domain? FEBS Letters, 374(2), 149-151.
Chalasani, N., Guo, X., Loomba, R., Goodarzi, M. O., Haritunians, T., Kwon, S., . . . Rotter, J. I. (2010). Genome-wide association study identifies variants associated with histologic features of nonalcoholic fatty liver disease. Gastroenterology, 139(5), 1567-1576, 1576 e1561-1566. doi: 10.1053/j.gastro.2010.07.057
Cohen, J., Pertsemlidis, A., Kotowski, I. K., Graham, R., Garcia, C. K., & Hobbs, H. H. (2005). Low LDL cholesterol in individuals of African descent resulting from frequent nonsense mutations in PCSK9. Nature Genetics, 37(2), 161-165.
Crisci, J. L., Wong, A., Good, J. M., & Jensen, J. D. (2011). On characterizing adaptive events unique to modern humans. Genome Biology and Evolution, 3, 791.
Date, I., Aoi, M., Tomita, S., Collins, F., & Ohmoto, T. (1998). GDNF administration induces recovery of the nigrostriatal dopaminergic system both in young and aged parkinsonian mice. Neuroreport, 9(10), 2365-2369.
De Almeida, S. F., Grosso, A. R., Koch, F., Fenouil, R., Carvalho, S., Andrade, J., . . . Gut, I. (2011). Splicing enhances recruitment of methyltransferase HYPB/Setd2 and methylation of histone H3 Lys36. Nature Structural & Molecular Biology, 18(9), 977-983.
Dorus, S., Evans, P. D., Wyckoff, G. J., Choi, S. S., & Lahn, B. T. (2004). Rate of molecular evolution of the seminal protein gene SEMG2 correlates with levels of female promiscuity. Nature Genetics, 36(12), 1326-1329.
Eketjäll, S., & Ibáñez, C. F. (2002). Functional characterization of mutations in the GDNF gene of patients with Hirschsprung disease. Human Molecular Genetics, 11(3), 325-329.
Ellegren, H., & Parsch, J. (2007). The evolution of sex-biased genes and sex-biased gene expression. Nature Reviews Genetics, 8(9), 689-698.
Endicott, P., Ho, S. Y. W., & Stringer, C. (2010). Using genetic evidence to evaluate four palaeoanthropological hypotheses for the timing of Neanderthal and modern human origins. Journal of Human Evolution, 59(1), 87-95.
Erlich, P. M., Lunetta, K. L., Cupples, L. A., Abraham, C. R., Green, R. C., Baldwin, C. T., & Farrer, L. A. (2012). Serum paraoxonase activity is associated with variants in the PON gene cluster and risk of Alzheimer disease. Neurobiology of Aging, 33(5), 1015 e1017-1023. doi: 10.1016/j.neurobiolaging.2010.08.003
Faber, P. W., Barnes, G. T., Srinidhi, J., Chen, J., Gusella, J. F., & MacDonald, M. E. (1998). Huntingtin interacts with a family of WW domain proteins. Human Molecular Genetics, 7(9), 1463-1474.
Feder, J. N., Tsuchihashi, Z., Irrinki, A., Lee, V. K., Mapa, F. A., Morikang, E., . . . Parkkila, S. (1997). The hemochromatosis founder mutation in HLA-H disrupts β2-microglobulin interaction and cell surface expression. Journal of Biological Chemistry, 272(22), 14025-14028.
Figlewicz, D. A., Krizus, A., Martinoli, M. G., Meininger, V., Dib, M., Rouleau, G. A., & Julien, J. (1994). Variants of the heavy neurofilament subunit are associated with the development of amyotrophic lateral sclerosis. Human Molecular Genetics, 3(10), 1757-1761.
Finn, S., & Civetta, A. (2010). Sexual selection and the molecular evolution of ADAM proteins. Journal of Molecular Evolution, 71(3), 231-240.
Fisher, S. A., Rivera, A., Fritsche, L. G., Keilhauer, C. N., Lichtner, P., Meitinger, T., . . . Weber, B. H. F. (2007). Case–control genetic association study of fibulin-6 (FBLN6 or HMCN1) variants in age-related macular degeneration (AMD). Human Mutation, 28(4), 406-413.
Florez, J. C., Jablonski, K. A., Bayley, N., Pollin, T. I., de Bakker, P. I. W., Shuldiner, A. R., . . . Altshuler, D. (2006). TCF7L2 polymorphisms and progression to diabetes in the Diabetes Prevention Program. New England Journal of Medicine, 355(3), 241-250.
Frayling, T. M., Timpson, N. J., Weedon, M. N., Zeggini, E., Freathy, R. M., Lindgren, C. M., . . . Rayner, N. W. (2007). A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science, 316(5826), 889-894.
Friedman, S. H. (2009). Postpartum mood disorders: genetic progress and treatment paradigms. American Journal of Psychiatry, 166(11), 1201-1204.
Gash, D. M., Zhang, Z., Ovadia, A., Cass, W. A., Yi, A., Simmerman, L., . . . Collins, F. (1996). Functional recovery in parkinsonian monkeys treated with GDNF. Nature, 380, 252-255.
Gavert, N., Sheffer, M., Raveh, S., Spaderna, S., Shtutman, M., Brabletz, T., . . . Domany, E. (2007). Expression of L1-CAM and ADAM10 in human colon cancer cells induces metastasis. Cancer Research, 67(16), 7703-7712.
Geismann, C., Morscheck, M., Koch, D., Bergmann, F., Ungefroren, H., Arlt, A., . . . Sipos, B. (2009). Up-regulation of L1CAM in pancreatic duct cells is transforming growth factor β1–and slug-dependent: role in malignant transformation of pancreatic cancer. Cancer Research, 69(10), 4517-4526.
Gerken, T., Girard, C. A., Tung, Y. L., Webby, C. J., Saudek, V., Hewitson, K. S., . . . McNeill, L. A. (2007). The obesity-associated FTO gene encodes a 2-oxoglutarate-dependent nucleic acid demethylase. Science, 318(5855), 1469-1472.
Gibbons, A. (2011a). A new view of the birth of Homo sapiens. Science, 331, 392-394.
Gibbons, A. (2011b). Who were the Denisovans? Science, 333, 1084-1087.
Good, J. M., Wiebe, V., Albert, F. W., Burbano, H. A., Kircher, M., Green, R. E., . . . Fischer, A. (2013). Comparative population genomics of the ejaculate in humans and the great apes. Molecular Biology and Evolution.
Gort, L., de Olano, N., Macias-Vidal, J., & Coll, M. A. (2012). GM2 gangliosidoses in Spain: analysis of the HEXA and HEXB genes in 34 Tay-Sachs and 14 Sandhoff patients. Gene, 506(1), 25-30. doi: 10.1016/j.gene.2012.06.080
Gralle, M., & Pääbo, S. (2011). A comprehensive functional analysis of ancestral human signal peptides. Molecular Biology and Evolution, 28(1), 25-28.
Grayson, P., & Civetta, A. (2012). Positive selection and the evolution of izumo genes in mammals. International Journal of Evolutionary Biology, 2012.
Green, R. E., Krause, J., Briggs, A. W., Maricic, T., Stenzel, U., Kircher, M., . . . Fritz, M. H. Y. (2010). A draft sequence of the Neandertal genome. Science, 328(5979), 710-722.
Halder, D., Park, J. H., Choi, M. R., Chai, J. C., Lee, Y. S., Mandal, C., . . . Chai, Y. G. (2013). Chronic ethanol exposure increases goosecoid (GSC) expression in human embryonic carcinoma cell differentiation. Journal of Applied Toxicology. doi: 10.1002/jat.2832
Hancock, A. M., Witonsky, D. B., Ehler, E., Alkorta-Aranburu, G., Beall, C., Gebremedhin, A., . . . Coop, G. (2010). Human adaptations to diet, subsistence, and ecoregion are due to subtle shifts in allele frequency. Proceedings of the National Academy of Sciences, 107(Supplement 2), 8924-8930.
Heinzen, E. L., Radtke, R. A., Urban, T. J., Cavalleri, G. L., Depondt, C., Need, A. C., . . . Catarino, C. B. (2010). Rare deletions at 16p13.11 predispose to a diverse spectrum of sporadic epilepsy syndromes. The American Journal of Human Genetics, 86(5), 707-718.
Horton, J. D., Cohen, J. C., & Hobbs, H. H. (2007). Molecular biology of PCSK9: its role in LDL metabolism. Trends in Biochemical Sciences, 32(2), 71-77.
Hurst, L. D. (2002). The Ka/Ks ratio: diagnosing the form of sequence evolution. Trends in Genetics: TIG, 18(9), 486.
Ijsselstijn, L., Dekker, L. J., Stingl, C., van der Weiden, M. M., Hofman, A., Kros, J. M., . . . Luider, T. M. (2011). Serum levels of pregnancy zone protein are elevated in presymptomatic Alzheimer's disease. Journal of Proteome Research, 10(11), 4902-4910. doi: 10.1021/pr200270z
Ikegami, K., Horigome, D., Mukai, M., Livnat, I., MacGregor, G. R., & Setou, M. (2008). TTLL10 is a protein polyglycylase that can modify nucleosome assembly protein 1. FEBS Letters, 582(7), 1129-1134. doi: 10.1016/j.febslet.2008.02.079
Katzman, S., Kern, A. D., Pollard, K. S., Salama, S. R., & Haussler, D. (2010). GC-biased evolution near human accelerated regions. PLoS Genetics, 6(5), e1000960.
Kempster, S. L., Belteki, G., Licence, D., Charnock-Jones, D. S., & Smith, G. C. (2012). Disruption of paraoxonase 3 impairs proliferation and antioxidant defenses in human A549 cells and causes embryonic lethality in mice. American Journal of Physiology, Endocrinology and Metabolism, 302(1), E103-107. doi: 10.1152/ajpendo.00357.2011
Koffler, J., Holzinger, D., Sanhueza, G. A., Flechtenmacher, C., Zaoui, K., Lahrmann, B., . . . Hess, J. (2012). Submaxillary gland androgen-regulated protein 3A expression is an unfavorable risk factor for the survival of oropharyngeal squamous cell carcinoma patients after surgery. European Archives of Oto-Rhino-Laryngology, 1-8.
Kordower, J. H., & Bjorklund, A. (2013). Trophic factor gene therapy for Parkinson's disease. Movement Disorders, 28(1), 96-109.
Krause, J., Fu, Q., Good, J. M., Viola, B., Shunkov, M. V., Derevianko, A. P., & Pääbo, S. (2010). The complete mitochondrial DNA genome of an unknown hominin from southern Siberia. Nature, 464(7290), 894-897.
Kryazhimskiy, S., & Plotkin, J. B. (2008). The population genetics of dN/dS. PLoS Genetics, 4(12), e1000304.
Lalueza-Fox, C., & Gilbert, M. T. P. (2011). Paleogenomics of archaic hominins. Current Biology, 21(24), R1002-R1009.
Lepagnol-Bestel, A., Zvara, A., Maussion, G., Quignon, F., Ngimbous, B., Ramoz, N., . . . Agier, N. (2009). DYRK1A interacts with the REST/NRSF-SWI/SNF chromatin remodelling complex to deregulate gene clusters involved in the neuronal phenotypic traits of Down syndrome. Human Molecular Genetics, 18(8), 1405-1414.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., . . . Durbin, R. (2009). The sequence alignment/map format and SAMtools. Bioinformatics, 25(16), 2078-2079.
Li, R. W., Li, C., & Wang, T. T. Y. (2012). Transcriptomic alterations in human prostate cancer cell LNCaP tumor xenograft modulated by dietary phenethyl isothiocyanate. Molecular Carcinogenesis.
Lin, L., Doherty, D. H., Lile, J. D., Bektesh, S., & Collins, F. (1993). GDNF: a glial cell line-derived neurotrophic factor for midbrain dopaminergic neurons. Science, 260(5111), 1130-1132.
Liu, L., Markus, I., Saghire, H. E., Perera, D. S., King, D. W., & Burcher, E. (2011). Distinct differences in tachykinin gene expression in ulcerative colitis, Crohn's disease and diverticular disease: a role for hemokinin-1? Neurogastroenterol Motil, 23(5), 475-483, e179-480. doi: 10.1111/j.1365-2982.2011.01685.x
Liu, Q., Xie, F., Siedlak, S. L., Nunomura, A., Honda, K., Moreira, P. I., . . . Perry, G. (2004). Neurofilament proteins in neurodegenerative diseases. Cellular and Molecular Life Sciences CMLS, 61(24), 3057-3075.
Mack, T. G. A., & Eickholt, B. J. (2011). New WAVEs in neuronal PI3K signalling. The EMBO Journal, 30(23), 4693-4695.
Maes, T., Barceló, A., & Buesa, C. (2002). Neuron navigator: a human gene family with homology to unc-53, a cell guidance gene from Caenorhabditis elegans. Genomics, 80(1), 21-30.
Maricic, T., Günther, V., Georgiev, O., Gehre, S., Ćurlin, M., Schreiweis, C., . . . Lalueza-Fox, C. (2012). A recent evolutionary change affects a regulatory element in the human FOXP2 gene. Molecular Biology and Evolution.
Matsumoto, C., Ito, M., Yamada, H., Yamakawa, N., Yoshida, H., Date, A., . . . Miyauchi, A. (2013). Genes that characterize T3-predominant Graves' thyroid tissues. European Journal of Endocrinology, 168(2), 137-144.
McIntosh, A. M., Bennett, C., Dickson, D., Anestis, S. F., Watts, D. P., Webster, T. H., . . . Bradley, B. J. (2012). The apolipoprotein E (APOE) gene appears functionally monomorphic in chimpanzees (Pan troglodytes). PLoS One, 7(10), e47760.
McNeill, E. M., Roos, K. P., Moechars, D., & Clagett-Dame, M. (2010). Nav2 is necessary for cranial nerve development and blood pressure regulation. Neural Development, 5(6).
Mendez, F. L., Watkins, J. C., & Hammer, M. F. (2012). Global genetic variation at OAS1 provides evidence of archaic admixture in Melanesian populations. Molecular Biology and Evolution, 29(6), 1513-1520.
Meyer, M., Kircher, M., Gansauge, M. T., Li, H., Racimo, F., Mallick, S., . . . de Filippo, C. (2012). A high-coverage genome sequence from an archaic Denisovan individual. Science, 338(6104), 222-226.
Mole, S. E., Michaux, G., Codlin, S., Wheeler, R. B., Sharp, J. D., & Cutler, D. F. (2004). CLN6, which is associated with a lysosomal storage disease, is an endoplasmic reticulum protein. Experimental Cell Research, 298(2), 399-406.
Mota, N. R., Araujo-Jnr, E. V., Paixão-Côrtes, V. R., Bortolini, M. C., & Bau, C. H. D. (2012). Linking dopamine neurotransmission and neurogenesis: The evolutionary history of the NTAD (NCAM1-TTC12-ANKK1-DRD2) gene cluster. Genetics and Molecular Biology, 35(4), 912-918.
Murdock, D. R., Clark, G. D., Bainbridge, M. N., Newsham, I., Wu, Y., Muzny, D. M., . . . Ramocki, M. B. (2011). Whole-exome sequencing identifies compound heterozygous mutations in WDR62 in siblings with recurrent polymicrogyria. American Journal of Medical Genetics Part A, 155(9), 2071-2077. doi: 10.1002/ajmg.a.34165
Oda, Y. (1999). Choline acetyltransferase: the structure, distribution and pathologic changes in the central nervous system. Pathology International, 49(11), 921-937.
Ohashi, K., Kawai, S., Koshimizu, M., & Murata, K. (2011). NADPH regulates human NAD kinase, a NADP(+)-biosynthetic enzyme. Molecular and Cellular Biochemistry, 355(1-2), 57-64. doi: 10.1007/s11010-011-0838-x
Olah, J., Vincze, O., Virok, D., Simon, D., Bozso, Z., Tokesi, N., . . . Ovadi, J. (2011). Interactions of pathological hallmark proteins: tubulin polymerization promoting protein/p25, beta-amyloid, and alpha-synuclein. The Journal of Biological Chemistry, 286(39), 34088-34100. doi: 10.1074/jbc.M111.243907
Ovchinnikov, I. V., & Kholina, O. I. (2010). Genome digging: insight into the mitochondrial genome of Homo. PLoS One, 5(12), e14278.
Papanikolaou, G., Samuels, M. E., Ludwig, E. H., MacDonald, M. L. E., Franchini, P. L., Dubé, M., . . . Politou, M. (2003). Mutations in HFE2 cause iron overload in chromosome 1q–linked juvenile hemochromatosis. Nature Genetics, 36(1), 77-82.
Passani, L. A., Bedford, M. T., Faber, P. W., McGinnis, K. M., Sharp, A. H., Gusella, J. F., . . . MacDonald, M. E. (2000). Huntingtin’s WW domain partners in Huntington’s disease post-mortem brain fulfill genetic criteria for direct involvement in Huntington’s disease pathogenesis. Human Molecular Genetics, 9(14), 2175-2182.
Pollak, N., Niere, M., & Ziegler, M. (2007). NAD kinase levels control the NADPH concentration in human cells. Journal of Biological Chemistry, 282(46), 33562-33571.
Rajakumar, C., Ban, M. R., Cao, H., Young, T. K., Bjerregaard, P., & Hegele, R. A. (2009). Carnitine palmitoyltransferase IA polymorphism P479L is common in Greenland Inuit and is associated with elevated plasma apolipoprotein AI. Journal of Lipid Research, 50(6), 1223-1228.
Reich, D., Green, R. E., Kircher, M., Krause, J., Patterson, N., Durand, E. Y., . . . Johnson, P. L. F. (2010). Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature, 468(7327), 1053-1060.
Reich, D., Patterson, N., Kircher, M., Delfin, F., Nandineni, M. R., Pugach, I., . . . Phipps, M. E. (2011). Denisova admixture and the first modern human dispersals into Southeast Asia and Oceania. The American Journal of Human Genetics, 89(4), 516-528.
Rogaeva, A., & Albert, P. R. (2007). The mental retardation gene CC2D1A/Freud-1 encodes a long isoform that binds conserved DNA elements to repress gene transcription. European Journal of Neuroscience, 26(4), 965-974.
Rohland, N., & Hofreiter, M. (2007). Ancient DNA extraction from bones and teeth. Nature Protocols, 2(7), 1756-1762.
Rooney, A. P., & Zhang, J. (1999). Rapid evolution of a primate sperm protein: relaxation of functional constraint or positive Darwinian selection? Molecular Biology and Evolution, 16(5), 706-710.
Rousselet, E., Marcinkiewicz, J., Kriz, J., Zhou, A., Hatten, M. E., Prat, A., & Seidah, N. G. (2011). PCSK9 reduces the protein levels of the LDL receptor in mouse brain during development and after ischemic stroke. Journal of Lipid Research, 52(7), 1383-1391.
Sackton, T. B., Corbett-Detig, R. B., Nagaraju, J., Vaishna, R. L., Arunkumar, K. P., & Hartl, D. L. (2013). Positive selection drives faster-Z evolution in silkmoths. arXiv preprint arXiv:1304.7670.
Sánchez, M. P., Silos-Santiago, I., Frisén, J., He, B., Lira, S. A., & Barbacid, M. (1996). Renal agenesis and the absence of enteric neurons in mice lacking GDNF. Nature, 382, 70-73.
Sauter, D., Vogl, M., & Kirchhoff, F. (2011). Ancient origin of a deletion in human BST2/Tetherin that confers protection against viral zoonoses. Human Mutation, 32(11), 1243-1245.
Sharp, J. D., Wheeler, R. B., Parker, K. A., Gardiner, R. M., Williams, R. E., & Mole, S. E. (2003). Spectrum of CLN6 mutations in variant late infantile neuronal ceroid lipofuscinosis. Human Mutation, 22(1), 35-42.
Shen, Y. C., Tsai, H. M., Cheng, M. C., Hsu, S. H., Chen, S. F., & Chen, C. H. (2012). Genetic and functional analysis of the gene encoding GAP-43 in schizophrenia. Schizophrenia Research, 134(2-3), 239-245. doi: 10.1016/j.schres.2011.11.016
Skoglund, P., & Jakobsson, M. (2011). Archaic human ancestry in East Asia. Proceedings of the National Academy of Sciences, 108(45), 18301-18306.
Stewart, J. R., & Stringer, C. B. (2012). Human evolution out of Africa: The role of refugia and climate change. Science, 335(6074), 1317-1321.
Suyama, M., Torrents, D., & Bork, P. (2006). PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Research, 34(suppl 2), W609-W612.
Szewczyk, B., Albert, P. R., Rogaeva, A., Fitzgibbon, H., May, W. L., Rajkowska, G., . . . Kyle, P. B. (2010). Decreased expression of Freud-1/CC2D1A, a transcriptional repressor of the 5-HT1A receptor, in the prefrontal cortex of subjects with major depression. Int J Neuropsychopharmacology, 13(8), 1089-1101.
Tong, Y., Tar, M., Melman, A., & Davies, K. (2008). The opiorphin gene (ProL1) and its homologues function in erectile physiology. BJU International, 102(6), 736-740.
Torgerson, D. G., Kulathinal, R. J., & Singh, R. S. (2002). Mammalian sperm proteins are rapidly evolving: evidence of positive selection in functionally diverse genes. Molecular Biology and Evolution, 19(11), 1973-1980.
Trinkaus, E. (2010). Denisova Cave, Peştera cu Oase, and Human Divergence in the Late Pleistocene. PaleoAnthropology, 2010, 196-200.
Trupp, M., Rydén, M., Jörnvall, H., Funakoshi, H., Timmusk, T., Arenas, E., & Ibáñez, C. F. (1995). Peripheral expression and biological activities of GDNF, a new neurotrophic factor for avian and mammalian peripheral neurons. The Journal of Cell Biology, 130(1), 137-148.
Vickaryous, M. K., & Hall, B. K. (2006). Human cell type diversity, evolution, development, and classification with special reference to cells derived from the neural crest. Biological Reviews, 81(3), 425-455.
Vits, L., Van Camp, G., Coucke, P., Fransen, E., De Boulle, K., Reyniers, E., . . . Schrander-Stumpel, C. (1994). MASA syndrome is due to mutations in the neural cell adhesion gene L1CAM. Nature Genetics, 7(3), 408-413.
Wang, K., Zhang, H., Bloss, C. S., Duvvuri, V., Kaye, W., Schork, N. J., . . . Hakonarson, H. (2010). A genome-wide association study on common SNPs and rare CNVs in anorexia nervosa. Molecular Psychiatry, 16(9), 949-959.
Wang, X., Mitra, N., Secundino, I., Banda, K., Cruz, P., Padler-Karavani, V., . . . Rizzi, E. (2012). Specific inactivation of two immunomodulatory SIGLEC genes during human evolution. Proceedings of the National Academy of Sciences, 109(25), 9935-9940.
Wisner, A., Dufour, E., Messaoudi, M., Nejdi, A., Marcel, A., Ungeheuer, M. N., & Rougeot, C. (2006). Human Opiorphin, a natural antinociceptive modulator of opioid-dependent pathways. Proceedings of the National Academy of Sciences U S A, 103(47), 17979-17984. doi: 10.1073/pnas.0605865103
Wong, A. (2010). Testing the effects of mating system variation on rates of molecular evolution in primates. Evolution, 64(9), 2779-2785.
Xie, P., Tian, C., An, L., Nie, J., Lu, K., Xing, G., . . . He, F. (2008). Histone methyltransferase protein SETD2 interacts with p53 and selectively regulates its downstream genes. Cellular Signalling, 20(9), 1671-1678.
Yamanaka, S., Johnson, M. D., Grinberg, A., Westphal, H., Crawley, J. N., Taniike, M., . . . Proia, R. L. (1994). Targeted disruption of the Hexa gene results in mice with biochemical and pathologic features of Tay-Sachs disease. Proceedings of the National Academy of Sciences, 91(21), 9975-9979.
Yamasaki, M., Thompson, P., & Lemmon, V. (1997). CRASH syndrome: mutations in L1CAM correlate with severity of the disease. Neuropediatrics, 28(3), 175.
Yang, D. Y., Eng, B., Waye, J. S., Dudar, J. C., & Saunders, S. R. (1998). Technical note: improved DNA extraction from ancient bones using silica-based spin columns. American Journal of Physical Anthropology, 105(4), 539-543.
Yang, Z., & Nielsen, R. (2000). Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Molecular Biology and Evolution, 17(1), 32-43.
Yokoyama, K., Tezuka, T., Kotani, M., Nakazawa, T., Hoshina, N., Shimoda, Y., . . . Iwakura, Y. (2011). NYAP: a phosphoprotein family that links PI3K to WAVE1 signalling in neurons. The EMBO Journal, 30(23), 4739-4754.
Zhang, G., Pei, Z., Ball, E. V., Mort, M., Kehrer-Sawatzki, H., & Cooper, D. N. (2011). Cross-comparison of the genome sequences from human, chimpanzee, Neanderthal and a Denisovan hominin identifies novel potentially compensated mutations. Human Genomics, 5(5), 453-484.
Zhang, H., Wong, C. C. L., Wei, H., Gilkes, D. M., Korangath, P., Chaturvedi, P., . . . Winnard, P. T. (2011). HIF-1-dependent expression of angiopoietin-like 4 and L1CAM mediates vascular metastasis of hypoxic breast cancer cells to the lungs. Oncogene, 31(14), 1757-1770.
Zhang, J., Liu, L., Zhang, X., Jin, F., Chen, J., Ji, C., . . . Mao, Y. (2006). Cloning and characterization of a novel human prefoldin and SPEC domain protein gene (PFD6L) from the fetal brain. Biochemical Genetics, 44(1-2), 69-74.
doi: 10.1007/s10528-006-9008-3 Zhang, R. (2013). MNADK, a novel liver-enriched mitochondrion-localized NAD kinase. Biology Open, 2(4), 432-438. Zhao, M., Raingo, J., & Kavalali, E. T. (2011). Cc2d1a, a C2 domain containing protein linked to nonsyndromic mental retardation, controls functional maturation of central synapses. Journal of Neurophysiology, 105(4), 1506-1515. 82 APPENDIX A: MALE REPRODUCTIVE GENES Gene Name and Short Description ZNF645 zinc finger protein 645 CYLC1 cylicin, basic protein of sperm head cytoskeleton 1 CYLC1 cylicin, basic protein of sperm head cytoskeleton 1 TSSK3 testisspecific serine kinase 3 SPATA6 spermatogenesis associated 6 33 I, aliphatic, nonpolar Denisovan Amino Acid Variant and Biochemical Characteristics N, neutral, polar (uncharged) 252 G, aliphatic, nonpolar, ambivalent R, basic, polar (positively charged) 235 E, acidic, polar, charged S, nonaromatic hydroxyl, polar (uncharged), hydrophilic R, basic, polar (positively charged) D, acidic, polar, charged T, nonaromatic hydroxyl, polar uncharged, hydrophilic Q, neutral, polar (uncharged) 248 V, aliphatic, nonpolar, hydrophobic 568 V, aliphatic, nonpolar, hydrophobic Protein Position 491 267 ODF2L outer dense fiber of sperm tails 2-like SPAG16 spermassociated antigen 16 PHF7 PHD finger protein 7 359 MORC1 MORC family CW-type zinc finger 1 SPATA16 spermatogenesis associated 16 STPG2 sperm-tail PG-rich repeat containing 2 SPATA5 spermatogenesis associated 5 [Source:HGNC Symbol;Acc:1811 9] 927 Human Amino Acid Variant and Biochemical Characteristics R, basic, polar (positively charged) L, aliphatic, nonpolar, hydrophobic 292 V, aliphatic, nonpolar, hydrophobic R, basic, polar (positively charged) 264 A, aliphatic, nonpolar, hydrophobic, ambivalent 722 C, sulfurcontaining, polar (uncharged) 423 SPEF2 sperm flagellar 2 D, acidic, polar, charged M, sulfurcontaining, nonpolar, hydrophobic C, sulfurcontaining, polar (uncharged) V, aliphatic, nonpolar, hydrophobic M, sulfurcontaining, 
nonpolar, hydrophobic Q, neutral, polar (uncharged) V, aliphatic, nonpolar, hydrophobic, interior S, nonaromatic hydroxyl, polar (uncharged), hydrophilic SIFT Prediction of Functional Effect PolyPhen Prediction of Functional Effect Condel Prediction of Functional Effect deleterious possibly damaging deleterious Secondary Structure of Locus Change not located within major motif. unknown Changes not located within major motifs. unknown Changes not located within major motifs. deleterious benign neutral Change located in a protein kinase-like domain. tolerated possibly damaging neutral deleterious possibly damaging deleterious No domains/motifs identified. Change located in a domain that shows homology to INCA. Locus moderately conserved. deleterious probably damaging deleterious Change located in a WD40 repeat region. tolerated tolerated neutral benign tolerated deleterious probably damaging deleterious deleterious benign neutral deleterious probably damaging deleterious tolerated benign neutral deleterious benign neutral Change not located within major motif. Zinc finger CW-type coiled-coil domain. No domains/motifs identified. Change not located within major motif. Change is not found within a major motif. Entire protein is an AAA-family ATPase. Change located in a P-loop containing nucleoside triphosphate hydrolase domain.
SPATA9 spermatogenesis associated 9 CATSPER3 cation channel, sperm associated 3 179 I, aliphatic, nonpolar 394 K, basic, polar (positively charged) ZNF165 zinc finger protein 165 61 NME8 NME/NM23 family member 8 SPAM1 sperm adhesion molecule 1 (PH20 hyaluronidase, zona pellucida binding) 336 R, basic, polar (positively charged) 105 I, aliphatic, nonpolar 465 N, neutral, polar (uncharged) ADAM18 ADAM metallopeptidase domain 18 SPATC1 spermatogenesis and centriole associated 1 ODF2 outer dense fiber of sperm tails 2 CATSPER1 cation channel, sperm associated 1 MTL5 metallothionein-like 5, testis-specific (tesmin) AKAP3 A kinase (PRKA) anchor protein 3 724 P, cyclic, nonpolar R, basic, polar (positively charged) 371 R, basic, polar (positively charged) 75 27 P, cyclic, nonpolar 87 N, neutral, polar (uncharged) CCDC65 coiled-coil domain containing 65 473 AKAP11 A kinase (PRKA) anchor protein 11 AKAP11 A kinase (PRKA) anchor protein 11 REC8 REC8 homolog (yeast) P, cyclic, nonpolar 903 1420 546 I, aliphatic, nonpolar R, basic, polar (positively charged) E, acidic, polar, charged H, basic, polar, positively charged V, aliphatic, nonpolar, hydrophobic T, nonaromatic hydroxyl, polar uncharged, hydrophilic S, nonaromatic hydroxyl, polar (uncharged), hydrophilic C, sulfur-containing, polar (uncharged) V, aliphatic, nonpolar, hydrophobic S, nonaromatic hydroxyl, polar (uncharged), hydrophilic H, basic, polar, positively charged K, basic, polar (positively charged) C, sulfur-containing, polar (uncharged) S, nonaromatic hydroxyl, polar (uncharged), hydrophilic T, nonaromatic hydroxyl, polar uncharged, hydrophilic T, nonaromatic hydroxyl, polar uncharged, hydrophilic H, basic, polar, positively charged K, basic, polar (positively charged) Q, neutral, polar (uncharged) deleterious benign neutral No domains/motifs identified.
deleterious benign deleterious tolerated benign neutral tolerated possibly damaging deleterious tolerated benign neutral tolerated benign neutral Change is found in a voltage-gated cation channel domain. Change located in a SCAN domain. Locus of change is not very conserved. Change located in a nucleoside diphosphate kinase domain. Change located in an aldolase-type TIM barrel. The domain belongs to the glycoside hydrolase superfamily. Change located in a disintegrin domain. The location is slightly conserved. deleterious probably damaging deleterious No domains/motifs identified. tolerated probably damaging neutral deleterious benign neutral No domains/motifs identified. Change is found in a voltage-gated cation channel domain. tolerated benign neutral Change not located within major motif. tolerated probably damaging neutral Entire protein is an A-kinase anchor. tolerated benign neutral No domains/motifs identified. tolerated benign neutral Entire protein is an A-kinase anchor. tolerated benign neutral Entire protein is an A-kinase anchor. deleterious possibly damaging deleterious Entire protein is an SCC1/RAD21 family member. ZP2 zona pellucida glycoprotein 2 (sperm receptor) SPEM1 spermatid maturation 1 SPATA32 spermatogenesis associated 32 Q, neutral, polar (uncharged) Q, neutral, polar (uncharged) R, basic, polar (positively charged) tolerated possibly damaging neutral E, acidic, polar, charged tolerated benign neutral Change located in a zona pellucida domain. Locus not very conserved. Change not located within major motif. D, acidic, polar, charged deleterious benign neutral No domains/motifs identified.
tolerated probably damaging deleterious 287 E, acidic, polar, charged T, nonaromatic hydroxyl, polar uncharged, hydrophilic R, basic, polar (positively charged) deleterious probably damaging deleterious 345 R, basic, polar (positively charged) deleterious probably damaging deleterious 140 P, cyclic, nonpolar W, aromatic, nonpolar C, sulfur-containing, polar (uncharged) S, nonaromatic hydroxyl, polar (uncharged), hydrophilic 133 G, aliphatic, nonpolar, ambivalent 412 86 383 SPATA20 spermatogenesis associated 20 716 TEX14 testis expressed 14 THEG theg spermatid protein ODF3L2 outer dense fiber of sperm tails 3-like 2 N, neutral, polar (uncharged) tolerated benign neutral D, acidic, polar, charged tolerated probably damaging deleterious F, aromatic, nonpolar, hydrophobic tolerated benign neutral Change located in a sperm-tail PG-rich repeat domain. This is a semenogelin/seminal vesicle secretory protein. The locus of the change is highly conserved, and the substituted amino acid does not match the profile for a semenogelin/seminal vesicle secretory protein. This is a semenogelin/seminal vesicle secretory protein. The locus of the change is slightly conserved, and the substituted amino acid does not match the profile for a semenogelin/seminal vesicle secretory protein. neutral Change not located within major motif. neutral Change not located within major motif. SEMG1 semenogelin I SEMG2 semenogelin II 298 SPATA2 spermatogenesis associated 2 WBP2NL WBP2 N-terminal like 25 276 L, aliphatic, nonpolar, hydrophobic S, nonaromatic hydroxyl, polar (uncharged), hydrophilic S, nonaromatic hydroxyl, polar (uncharged), hydrophilic Change not located within major motif. Change located in a protein kinase catalytic domain. Change located within a testicular haploid expressed repeat domain.
R, basic, polar (positively charged) L, aliphatic, nonpolar, hydrophobic tolerated tolerated benign benign

APPENDIX B: NEURONAL GENES
Gene Name and Short Description Protein Position L1CAM L1 cell adhesion molecule Denisovan Amino Acid Variant and Biochemical Characteristics SIFT Prediction of Functional Effect 412 V, aliphatic, nonpolar, hydrophobic 35 A, aliphatic, nonpolar, hydrophobic, ambivalent G, aliphatic, nonpolar, ambivalent 150 R, basic, polar (positively charged) H, basic, polar, positively charged 449 H, basic, polar, positively charged L, aliphatic, nonpolar, hydrophobic tolerated 509 R, basic, polar (positively charged) W, aromatic, nonpolar 554 A, aliphatic, nonpolar, hydrophobic, ambivalent S, non-aromatic hydroxyl, polar (uncharged), hydrophilic EPHA10 EPH receptor A10 PCSK9 proprotein convertase subtilisin/kexin type 9 HMCN1 hemicentin 1 NAV1 neuron navigator 1 12 SCN3A sodium channel, voltage-gated, type III, alpha subunit 1935 I, aliphatic, nonpolar V, aliphatic, nonpolar, hydrophobic R, basic, polar (positively charged) V, aliphatic, nonpolar, hydrophobic PolyPhen Prediction of Functional Effect Condel Prediction of Functional Effect Secondary Structure of Locus Change located within an immunoglobulin I-set domain. L, aliphatic, nonpolar, hydrophobic CHD5 chromodomain helicase DNA binding protein 5 SLC5A7 solute carrier family 5 (choline transporter), member 7 Human Amino Acid Variant and Biochemical Characteristics tolerated benign neutral tolerated unknown Change not located within a major motif/domain. tolerated probably damaging neutral Change located within an ephrin receptor ligand binding domain. benign neutral Change located in a proprotein convertase subtilisin/kexin domain. deleterious probably damaging deleterious Change located in an IG-like domain. deleterious possibly damaging deleterious Change not located within a major motif/domain. tolerated tolerated benign Change located in a signal-peptide domain.
Entire protein is a sodium/solute symporter. benign Entire protein is a sodium channel protein type 3 subunit alpha. neutral MAP2 microtubule-associated protein 2 374 A, aliphatic, nonpolar, hydrophobic, ambivalent V, aliphatic, nonpolar, hydrophobic 406 K, basic, polar (positively charged) V, aliphatic, nonpolar, hydrophobic 366 I, aliphatic, nonpolar V, aliphatic, nonpolar, hydrophobic P, cyclic, nonpolar V, aliphatic, nonpolar, hydrophobic 1048 S, non-aromatic hydroxyl, polar (uncharged), hydrophilic V, aliphatic, nonpolar, hydrophobic 1259 G, aliphatic, nonpolar, ambivalent V, aliphatic, nonpolar, hydrophobic 1090 T, non-aromatic hydroxyl, polar uncharged, hydrophilic V, aliphatic, nonpolar, hydrophobic 1645 E, acidic, polar, charged V, aliphatic, nonpolar, hydrophobic 27 Q, neutral, polar (uncharged) V, aliphatic, nonpolar, hydrophobic MAP2 microtubule-associated protein 2 CNTN6 contactin 6 KCNH8 potassium voltage-gated channel, subfamily H (eag-related), member 8 980 KCNH8 potassium voltage-gated channel, subfamily H (eag-related), member 8 SETD2 SET domain containing 2 NISCH nischarin ROBO1 roundabout, axon guidance receptor, homolog 1 (Drosophila) GAP43 growth-associated protein 43 deleterious deleterious benign benign neutral Change located in a MAP2/Tau projection domain. neutral Change located in a MAP2/Tau projection domain. benign Change located in an immunoglobulin, C-2 type/IG-like domain. tolerated benign neutral Change located within a potassium voltage-gated channel subfamily H domain. deleterious probably damaging deleterious Change not located within a major motif/domain. neutral Change not located within a major motif/domain. deleterious Change not located within a major motif/domain. neutral Change not located within a major motif/domain.
tolerated deleterious benign deleterious possibly damaging deleterious benign REST RE1-silencing transcription factor 885 GDNF glial cell derived neurotrophic factor 79 KCNMB1 potassium large conductance calcium-activated channel, subfamily M, beta member 1 10 SLC22A3 solute carrier family 22 (extraneuronal monoamine transporter), member 3 T, non-aromatic hydroxyl, polar uncharged, hydrophilic D, acidic, polar, charged K, basic, polar (positively charged) V, aliphatic, nonpolar, hydrophobic V, aliphatic, nonpolar, hydrophobic V, aliphatic, nonpolar, hydrophobic tolerated deleterious tolerated benign probably damaging benign neutral Change not located within a major motif/domain. Change located in a glial cell line-derived neurotrophic factor domain. deleterious neutral Change located within a signal-peptide domain. Protein is a calcium-activated potassium channel, beta subunit. 192 V, aliphatic, nonpolar, hydrophobic V, aliphatic, nonpolar, hydrophobic tolerated probably damaging deleterious Change located within a transmembrane region of a major facilitator superfamily domain. Also located within a sugar (and other) transporter domain. 296 P, cyclic, nonpolar V, aliphatic, nonpolar, hydrophobic tolerated benign neutral No motifs/domains identified. neutral Most of protein, including change locus, identified as a GDNF family receptor 2 domain. neutral Change found in a copper type II ascorbate-dependent monooxygenase domain, and/or a PHM/PNGase F domain. deleterious Change located within a choline/carnitine O-acyltransferase domain.
NYAP1 neuronal tyrosine-phosphorylated phosphoinositide-3-kinase adaptor 1 GFRA2 GDNF family receptor alpha 2 399 I, aliphatic, nonpolar DBH dopamine beta-hydroxylase (dopamine beta-monooxygenase) 508 D, acidic, polar, charged V, aliphatic, nonpolar, hydrophobic Q, neutral, polar (uncharged) V, aliphatic, nonpolar, hydrophobic CHAT choline O-acetyltransferase 158 V, aliphatic, nonpolar, hydrophobic tolerated tolerated deleterious benign benign possibly damaging PPFIBP2 PTPRF interacting protein, binding protein 2 (liprin beta 2) 345 T, non-aromatic hydroxyl, polar uncharged, hydrophilic V, aliphatic, nonpolar, hydrophobic 350 E, acidic, polar, charged V, aliphatic, nonpolar, hydrophobic 151 D, acidic, polar, charged V, aliphatic, nonpolar, hydrophobic 1399 A, aliphatic, nonpolar, hydrophobic, ambivalent V, aliphatic, nonpolar, hydrophobic 1389 G, aliphatic, nonpolar, ambivalent V, aliphatic, nonpolar, hydrophobic 1329 N, neutral, polar (uncharged) V, aliphatic, nonpolar, hydrophobic 905 M, sulfur-containing, nonpolar, hydrophobic V, aliphatic, nonpolar, hydrophobic 579 C, sulfur-containing, polar (uncharged) V, aliphatic, nonpolar, hydrophobic 133 V, aliphatic, nonpolar, hydrophobic V, aliphatic, nonpolar, hydrophobic 107 L, aliphatic, nonpolar, hydrophobic V, aliphatic, nonpolar, hydrophobic PPFIBP2 PTPRF interacting protein, binding protein 2 (liprin beta 2) NAV2 neuron navigator 2 ARHGAP32 Rho GTPase activating protein 32 ARHGAP32 Rho GTPase activating protein 32 ARHGAP32 Rho GTPase activating protein 32 ARHGAP32 Rho GTPase activating protein 32 NAV3 neuron navigator 3 CLN6 ceroid-lipofuscinosis, neuronal 6, late infantile, variant HEXA hexosaminidase A (alpha polypeptide) tolerated benign tolerated benign tolerated probably damaging deleterious possibly damaging tolerated tolerated deleterious deleterious tolerated tolerated neutral Entire protein is a LAR-interacting protein (LIP)-related protein.
neutral Entire protein is a LAR-interacting protein (LIP)-related protein. deleterious Change located in a calponin homology domain. deleterious Change not located within any major motifs. benign Change not located within any major motifs. benign Change not located within any major motifs. benign neutral Change not located within any major motifs. neutral Change not located within a major motif/domain. neutral No major motifs/domains identified. benign benign benign Change located in a glycosyl-hydrolase family 20, domain 2. RAI1 retinoic acid induced 1 762 P, cyclic, nonpolar V, aliphatic, nonpolar, hydrophobic 156 Q, neutral, polar (uncharged) V, aliphatic, nonpolar, hydrophobic CC2D1A coiled-coil and C2 domain containing 1A CDK5RAP1 CDK5 regulatory subunit associated protein 1 330 N, neutral, polar (uncharged) V, aliphatic, nonpolar, hydrophobic 314 A, aliphatic, nonpolar, hydrophobic, ambivalent V, aliphatic, nonpolar, hydrophobic 428 K, basic, polar (positively charged) V, aliphatic, nonpolar, hydrophobic NEFH neurofilament, heavy polypeptide NEFH neurofilament, heavy polypeptide tolerated benign tolerated possibly damaging tolerated deleterious tolerated benign neutral Change not located within a major motif/domain. neutral Change located in a domain of unknown function. neutral Change located within a (dimethylallyl) tRNA methylthiotransferase miaB domain. Region of locus also identified as a radical SAM domain. unknown Change located within an intermediate filament protein domain. unknown Change not located within a major motif/domain.