CHARACTERIZATION OF UNIQUE FEATURES OF THE DENISOVAN EXOME
A University Thesis Presented to the Faculty
of
California State University, East Bay
In Partial Fulfillment
of the Requirements for the Degree
Master of Science in Biological Science
By
Alexandra Vivelo
September, 2013
ABSTRACT
The publicly available Denisovan genome sequence increases opportunities to learn what
makes modern humans unique and to discover the distinguishing genetic features of an
extinct sister lineage. This thesis explores the latter, with emphasis on male reproductive
genes, neuronal genes, and a subset of metabolic genes, specifically those that code for
enzymes involved in glycolysis and those that code for proteins that vary in modern
human populations in connection with long-term dietary trends in those populations.
Results include the identification of 34 neuronal genes with single-nucleotide changes
that are derived in the Denisovan protein-coding sequence at loci that are nonpolymorphic in modern humans, the computation of the dN/dS ratio for a semen
coagulation factor for which the degree of positive selection is known to be correlated
with the females’ mean number of male mating partners per periovulatory period, and the
determination of the Denisovan variants at a subset of known modern dietary and
metabolism-related single-nucleotide polymorphic loci. Possible behavioral and
functional correlates of those unique features are suggested, providing the foundation for
further study on Denisovan male reproductive selective pressure, unique neuronal gene
features, and metabolic genes.
ACKNOWLEDGMENTS
I would like to thank Dr. Chris Baysdorfer, who supported this project from the
start. He encouraged me to explore new territory, was kind when I proposed implausible
plans, and was always enthusiastic. Thanks to Dr. Baysdorfer, my small seed of an idea
turned into a full-fledged project, and it wouldn’t have happened without his vision and
support. My immense gratitude also goes to Dr. Claudia Uhde-Stone, whose support has
been indispensable throughout my time at CSUEB and without whom this thesis would
not exist; Dr. Henry Gilbert, who has been tremendously generous with his time and
expertise; and Dr. Kenneth Curr for his mentorship. I would also like to thank Dr. Kelly
Decker, Dr. Maria Nieto, and Dr. Maria Gallegos for their encouragement. These CSUEB
faculty members make up an exceptionally talented and caring group, and I am privileged
to be acquainted with each of them.
I sincerely appreciate the time and expertise offered by Dr. Ed Green of UCSC in
discussing this research. Thanks also to Dr. Bill Lu of SBI for a helpful discussion.
I would also like to express my deep gratitude to my husband, Terry Van Belle.
Without the significant investment of time and programming skill he put into creating
searchable alignment files, and without his programming instruction, I would not have
been able to access or analyze the data used in these pages.
Finally, thanks to my dad, who has always believed in me; Val, for the Excel
help; my aunt, for telling me it’s not too late; and my daughter, for cheerfully sacrificing
some of our precious time together and encouraging me every step of the way.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
Research Question
Background
What We Know of the Denisovans
The Source of Ancient DNA Sequences
Neanderthal Analyses Hint at What May Be Found in Denisovan DNA
Locus-Specific Denisovan Genome Research Published to Date
Challenges and Opportunities in the Characterization of Ancient DNA
Summary of Aims
METHODS
RESULTS
Male Reproductive Genes
Premature Stop Codons and Loss-of-Stop Mutations
Genes Related to Neuron Formation and Function
L1CAM
PCSK9
HMCN1
SETD2
REST
GDNF
NYAP1
CHAT
NAV2
CLN6
HEXA
CC2D1A
NEFH
Metabolic Genes
Metabolic Genes with Known SNPs in Modern Humans
DISCUSSION
Selective Pressures in Male Reproductive Genes
Neuronal Genes
Premature Stops and Loss-of-Stop Mutations
Metabolic Genes
REFERENCES
APPENDIX A: MALE REPRODUCTIVE GENES
APPENDIX B: NEURONAL GENES
LIST OF TABLES
Table 1. SEMG2 dN/dS Ratios.
Table 2. Premature Stop Codons Found in the Denisovan Exome.
Table 3. Loss-of-Stop Mutations Found in the Denisovan Exome.
Table 4. SNCs in Genes Encoding Glycolytic Enzymes.
LIST OF FIGURES
Figure 1. SEMG2 dN/dS Ratios.
INTRODUCTION
This project is aimed at characterizing distinct features of the genome of the
Denisovans, the species of extinct hominid discovered in Siberia in 2008. The project
consists of a bioinformatics analysis of the Denisovan genome with special focus on
specific subsets of protein-coding genes that are functional in modern humans and that
include at least one amino acid change between Denisovans and modern humans.
The present study focuses primarily on single nucleotide changes (SNCs), loci at
which the nucleotide differs between the modern human and Denisovan genomes. The SNCs
studied here are primarily those likely to be functionally significant because they are
located in translated regions of exons. To locate candidate base pairs,
the Denisovan genome sequence reads have been aligned to the human reference genome
and all mismatches identified. At loci where the modern human variant matches that of
other extant primates (specifically, the gorilla and the orangutan have been used as
outgroups) yet the Denisovan variant differs, the Denisovan variant is said to be derived.
That is, the mutation arose in Denisovans after their reproductive split from early modern
humans. Likewise, at loci where the Denisovan variant matches that of gorillas and
orangutans but not that of modern humans, the modern human genome is said to be
derived at that locus. The comparison of the Neanderthal variant to select subsets of
coding-sequence SNCs between Denisovans and modern humans provides further clues
as to the course of evolutionary change at the studied loci.
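
The polarity rule described above (an allele is derived in whichever lineage departs from the outgroups) can be sketched in a few lines of Python. This is only an illustrative sketch, not the pipeline used in this thesis: the function name `classify_locus` and its return labels are hypothetical, and it assumes one confident base call per genome at the locus.

```python
from typing import Optional


def classify_locus(human: str, denisovan: str,
                   gorilla: str, orangutan: str) -> Optional[str]:
    """Return which lineage carries the derived allele, or None if unresolved.

    Hypothetical helper: gorilla and orangutan act as outgroups, as in the text.
    """
    alleles = {human, denisovan, gorilla, orangutan}
    if not alleles <= set("ACGT"):
        return None  # missing or ambiguous base call
    if human == denisovan:
        return None  # not a single nucleotide change between the two genomes
    if gorilla != orangutan:
        return None  # outgroups disagree, so the ancestral state is unclear
    ancestral = gorilla
    if human == ancestral:
        return "denisovan_derived"
    if denisovan == ancestral:
        return "human_derived"
    return None  # neither hominid matches the outgroups


print(classify_locus("A", "G", "A", "A"))  # denisovan_derived
print(classify_locus("A", "G", "G", "G"))  # human_derived
```

Requiring both outgroups to agree is a conservative design choice made for this sketch; loci where they disagree are simply left unclassified.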
The characterization of the unique features of an extinct group of hominids
necessitates willingness to accept that firm conclusions cannot be drawn about the exact
nature of any functional significance of amino acid changes. However, educated
speculations can be made based on the chemical properties of a substituted amino acid,
whether it falls within any of the protein’s main functional domains, and to what degree it
affects the protein’s chemical and physical properties.
Much of the research focus in paleogenomics is on characterizing adaptive events
unique to modern humans. In contrast, the aim of this thesis is to characterize unique
features of the Denisovan genome. Ultimately, what we learn about the Denisovan
genome will teach us about ourselves. There is much to be learned about the course of
hominid evolution and the role specific single-nucleotide polymorphisms play in
adaptation. What we learn can not only help us to understand our extinct cousins but also
reveal how some of the features of modern human genes came to be. Because so much of
current paleogenomics research is dedicated to the latter goal, this thesis explores the
former goal, that of understanding the characteristics that are unique to Denisovans. In
doing so, the hope is that this thesis will contribute to a knowledge pool that will allow
future research to attain greater understanding of modern human genetics.
Throughout this thesis, modern humans’ two ancient hominid sister groups are
referred to as Denisovans and Neanderthals, rather than by binomial taxonomic
classification. In the case of the Denisovans, this is the only option, because no official
species name has been assigned to them. In the case of the Neanderthals, the common
term is the only one upon which there is consensus. They are sometimes called Homo
neanderthalensis and sometimes Homo sapiens neanderthalensis. The question of
whether Denisovans, Neanderthals, and modern humans are separate species is an
academic one, because “species” has a variety of definitions, none of which provides
final authority on the subject. On that basis, reference to Denisovans and Neanderthals as
“species” has been avoided here.
Research Question
The research question addressed by this project is that of identifying coding-region SNCs that exist between the Denisovan and modern human genomes, particularly
those that occur in translated regions at loci that are non-variant in modern humans, and
determining the likely functional significance of selected SNCs among these.
Between the Denisovan and modern human genomes there exist a number of
SNCs that result in amino acid changes. The likely functional significance of these can be
predicted in a variety of ways. The results of the prediction tools SIFT, PolyPhen, and
Condel are included in the appendices of the present work, drawn from previous
publications on the Denisovan genome (Meyer et al., 2012). Beyond this, in the present
work, a variety of Web-based bioinformatics tools have been used to explore the nature
of specific amino acid substitutions.
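
As a rough illustration of how a coding-sequence SNC can be sorted by its effect on the protein, the following Python sketch translates the reference and variant codons with the standard genetic code and labels the change. The function name `classify_snc` and the category labels are assumptions made for this example; it is not a substitute for the SIFT, PolyPhen, or Condel predictions cited above, which estimate functional impact rather than merely detecting an amino acid change.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snc(ref_codon: str, alt_codon: str) -> str:
    """Label a single nucleotide change by its effect on the encoded residue."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if ref_codon not in CODON_TABLE or alt_codon not in CODON_TABLE:
        raise ValueError("codons must be three unambiguous DNA bases")
    if sum(a != b for a, b in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("codons must differ at exactly one position")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "premature_stop"
    if ref_aa == "*":
        return "loss_of_stop"
    return "missense"


print(classify_snc("GAA", "GAG"))  # synonymous (Glu to Glu)
print(classify_snc("GAA", "GTA"))  # missense (Glu to Val)
print(classify_snc("TGG", "TGA"))  # premature_stop (Trp to stop)
```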
The study of paleogenomic variants adds to our understanding of the modern
human genome, especially in research on modern genetically linked diseases. One
reason is that understanding how disease-linked genes evolved
helps in treatment development. Another is that the identification of modern-human
SNPs at which some individuals carry an ancestral allele may lead to an
understanding of the evolutionary reason for fixation of the more-frequently found,
modern variant. This project focuses on characterizing the Denisovans rather than taking
the more common tactic of using the Denisovan genome as a tool for more clearly
understanding our own, but its results are intended to add to a knowledge base that may
be used in the future for the purposes of modern genetic medicine as well.
Background
What We Know of the Denisovans
The people known as the Denisovans, to whom an official taxonomic name has
not yet been given, are known so far only by dentition and a few other fragmentary
remains from Denisova Cave in Altai Krai, Siberia. The complete genome was
sequenced from the little finger of a female Denisovan (Gibbons, 2011a). Sequencing of
mtDNA amplified from samples taken from the 40,000-50,000-year-old remains suggests
that the Denisovans represent a branch of Homo that was distinct from the early modern
Homo sapiens and the Neanderthals who lived at the same time (Krause et al., 2010). The
age given for the remains is a rough estimate, because Denisova Cave’s
layer 11, where the remains were found, may include strata that date both earlier and later,
possibly as recently as 16,000 years ago (Gibbons, 2011b). The size of the third molar of
the Denisovans has been found to lie outside the range of variation for Homo sapiens.
However, arguments have been made against placing the specimens in a new taxon that is
created primarily on the basis of DNA sequencing (Trinkaus, 2010). Additional finds and
further DNA analysis may be expected to shed light on the Denisovans’ appropriate
placement within human taxonomy.
The Denisovans’ geographic range may be better understood in the future once
more fossils are located, but it is possible to deduce that they must have ranged through
much of Southeast Asia based on the fact that a number of Southeast Asian populations
share DNA with the Denisovans. Because other Southeast Asian groups share no DNA
with Denisovans whatsoever, it can be concluded that the Denisovans passed their
genetic material to certain subgroups within Southeast Asia after their dispersal into the
region. Some of this gene flow had begun to take place no later than 44,000 years ago
(Reich et al., 2011). Admixture between archaic hominids and anatomically
modern humans (AMH) may have continued until as recently as 35,000 years ago
(Stewart & Stringer, 2012).
The Source of Ancient DNA Sequences
Although this thesis makes use of previously published sequences and does not
entail any new sequencing of ancient DNA, it is relevant to note the process by which
these sequences were originally obtained and to discuss how the process used to obtain
deep coverage of the Denisovan genome differs from previous techniques.
In brief, the isolation of ancient DNA follows a similar general process to that of
standard DNA isolation protocols. In the case of obtaining genetic material from bones or
teeth, the usual source material for ancient DNA, the first step is the grinding of the
sample of bone or tooth to powder. The next step is to use proteinase K to extract the
DNA. This is followed by the use of silica to bind the DNA (Rohland & Hofreiter, 2007).
Silica is used in DNA isolation from ancient material because it has such a high affinity
for binding DNA. This is an advantage in ancient DNA extraction because it maximizes
the yield from material that has a smaller amount of DNA than living or recently
deceased tissue. However, silica’s disadvantage is that in addition to binding DNA, it
adsorbs polymerase onto its surface (Besecker, Cornell, & Hampikian, 2012). Extra care
must be taken to eliminate all traces of silica pellets from the extract prior to DNA
amplification to avoid loss of effective polymerase concentration due to inhibition by the
silica (D. Y. Yang, Eng, Waye, Dudar, & Saunders, 1998). Possible solutions to problems
of enzyme inhibition are the addition of BSA (Besecker et al., 2012) and the inclusion of
a higher-than-usual concentration of enzyme. Following DNA binding, a wash step and
an elution step are performed in much the same way as in other DNA isolation techniques
(Rohland & Hofreiter, 2007).
The high-coverage Denisovan genome was obtained by preparing the DNA
library from single-stranded, rather than the usual double-stranded, DNA. This results in
a higher yield for more than one reason. For one, the method eliminates purification steps
that result in loss of DNA. The DNA is first dephosphorylated. After heat denaturing,
adaptor oligonucleotides are ligated to the 3’ ends of the single strands. These adaptors
are biotinylated to allow the fragments to be bound to streptavidin-coated beads. Primers
complementary to the adaptors are used for copying of the original single strands.
Following the copying step, a double-stranded adaptor is ligated to the 3’ end of the new
daughter strand. In addition to the prevention of DNA loss, this method minimizes
damage to the DNA during the amplification process and therefore allows researchers to
observe the exact patterns of ancient DNA breakage due purely to degradation over time.
This method also improves the recovery of long molecules, as well as the recovery of
molecules shorter than 30 base pairs (Meyer et al., 2012). This method is
what allowed an approximately 30-fold-coverage whole-genome Denisovan sequence
to be published online in February 2012 by the Max Planck Institute.
Neanderthal Analyses Hint at What May Be Found in Denisovan DNA
The existing research on the Neanderthals provides some examples of the type of
information that may be gleaned from the study of the Denisovan genome. For example,
Green et al. speculated on the functional significance of changes between Neanderthals
and modern humans by examining the protein-coding function of genes with fixed,
functional nucleotide substitutions. Many of the nucleotide substitutions between
Neanderthals and modern humans are silent, and many of the affected loci are variable in
modern humans. Only 78 substitutions identified by Green et al. are both fixed and lead
to a change in the protein coded for by the affected gene. These loci indicate ways in
which modern humans maintain a stable biochemical difference from the ancestral
biochemical profile shared by chimpanzees and Neanderthals. Although 78 fixed,
functional substitutions were identified, only a few are located on the same gene together.
A mere five genes were identified that include multiple fixed, functional substitutions.
These include SPAG17, a gene that codes for a protein found in the sperm flagellum;
PCD16, a gene that codes for a substance important in adhesion between cells; TTF1, a
gene that codes for a protein that functions in the termination of ribosomal gene transcription;
CAN15, a gene whose product has no known function; and RPTN, the gene that
codes for repetin, a protein found in various regions of the epidermis throughout the body
(R.E. Green et al., 2010).
One group of genes that has been analyzed in detail for differences between
Neanderthals and modern humans is the group that codes for signal peptides. Gralle and Pääbo,
both of the Max Planck Institute, have looked for amino acid substitutions that are
present in modern humans but not in the Neanderthal genome. Such substitutions must have
arisen after the Homo sapiens/Neanderthal split, which is most commonly placed at
approximately 400,000 years ago (Endicott, Ho, & Stringer, 2010). They believe that for all amino acid substitutions
that occurred since the human/Neanderthal split, differences can eventually be identified
and studied for functionality. Gralle and Pääbo have begun by studying differences in
signal peptides between Neanderthals and Homo sapiens. Signal peptides cause newly
formed proteins to be directed to the endoplasmic reticulum. This is usually the signal
peptide’s only purpose; it undergoes cleavage from the nascent protein and has no further
function after the protein reaches the endoplasmic reticulum. This means that any
functionally significant changes in signal peptides will make the journey of a new protein
to the endoplasmic reticulum either more or less efficient (Gralle & Pääbo, 2011).
In an analysis of a single Neanderthal individual dated to approximately 43,000
years ago, 10 differences between human and Neanderthal signal peptides were found. In
all 10 cases, the Neanderthal individual carried the same amino acid chimpanzees carry at
that locus. Present-day modern humans have a fixed derived allele at four of those
locations. At six of the locations, some modern human individuals have the derived allele
and others have the ancestral variant. Single amino acid substitutions in signal peptides
can reduce cell survival by down-regulating protein transport, because efficient protein
transport to the cell membrane is necessary for cell survival; such substitutions are
therefore likely to be relevant in some diseases. Gralle and Pääbo analyzed their data to
identify any functionally significant differences between the modern human signal peptides and the
ancestral signal peptides shared by Neanderthals and chimpanzees. Their results showed
that “no modern human signal peptide differed significantly from its ancestral
counterpart, an observation compatible with the neutral theory of molecular evolution”
(Gralle & Pääbo, 2011). The functionally significant differences between modern humans
and our two extinct sister lineages are rare within the genome, which means that most
regions investigated for differences will yield negative results. A more fruitful approach
to locating and understanding differences between ancient and modern DNA is to
compile a list of all the differences and then take note of the genes upon which these
differences are found.
Not only SNCs that are fixed in the modern human population, as described
above, are of interest. The locus of a modern human polymorphism may also be worthy
of investigation. For example, Ovchinnikov and Kholina postulate that certain
mitochondrial DNA sequence similarities between fossil samples and modern humans
indicate sequences that were integrated into the genome of a common ancestor of
Neanderthals and humans not long before the human/Neanderthal split. A particular pair
of linked polymorphisms caught Ovchinnikov and Kholina’s attention because they exist
in their modern variation in the two ancestral samples examined (Ovchinnikov &
Kholina, 2010).
The genes in question are ATP6 and ND3. The polymorphisms found both in the
samples and in a majority of modern humans are 8701G and 10398G. Pan troglodytes
has 8701A and 10398A at these loci, and these alleles are believed to have been
present in the common ancestor we share with Pan as well. The difference denoted by G
versus A is that these ancient samples share a guanine in a specific location with most
modern humans; yet chimpanzees, bonobos, and some modern humans whose genomes
show a “reversion to the ancestral allele” all have an adenine in this location instead.
An exploration of the functional difference between mitochondrial DNA with a
guanine at these locations and DNA with an adenine at these locations may shed light on
the relationship between large brains and certain diseases affecting the central nervous
system.
Guanine is present at 8701 and 10398 in Neanderthals and the oldest modern
human lineages. One possible reason 10398G underwent positive selection in early
modern human populations is that this variant increases the activity of the
mitochondrial electron transport chain. In other words, electrons can flow more quickly
through the mitochondrial membranes of an individual with the 10398G variant, so energy
for cellular work can be released faster than in an individual with the 10398A allele.
Individuals who show the reversion to 10398A are more vulnerable to neurological
disorders such as Parkinson’s and Alzheimer’s disease. The role of the 8701 locus
in specific diseases has not been studied as conclusively as that of 10398, but
Ovchinnikov and Kholina suggest that studies will reveal the adenine allele to have a
down-regulating effect on ATP synthesis and oxidative phosphorylation similar to that
exerted by the adenine allele at 10398 on mitochondrial electron transport. They also
suggest that the substitution of 8701G for 8701A and 10398G for 10398A preceded the
encephalization of the common ancestor modern humans share with Neanderthals and
allowed greater encephalization than that present in chimpanzees (Ovchinnikov &
Kholina, 2010).
Green et al. located recent modern human selective sweeps by identifying SNPs at which
Neanderthals carry an ancestral allele rather than the derived allele that appears at that
locus in modern humans. This method showed that 212 regions contain selective sweeps
that occurred since modern humans split from Neanderthals. The relative age of these 212
sweeps could be determined because the more recent a selective sweep is, the more consistent the
sequences of linked nucleotides on either side of the selected allele will be. The more
time has passed since the selective sweep, the more mutations will have occurred in the
affected region. Five of the 20 most recent selective sweeps contain noncoding DNA
only, so they likely represent selection for regulatory regions whose functions have not
yet been identified. The most recent of the remaining 15 regions contains a gene called
THADA. Mutations in THADA have been shown to be associated with a propensity to
develop type II diabetes. Several of the other genes included in the 15 most recent
selective sweeps for coding regions also have been implicated in specific diseases. One of
these genes is NRG3, which is linked to schizophrenia when certain mutations occur
within it. Another is RUNX2, which when mutated causes a genetic disorder known as
cleidocranial dysplasia, characterized in part by a bell-shaped rib cage and frontal bossing
(R.E. Green et al., 2010).
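
The intuition behind this kind of sweep screen can be shown with a deliberately simplified Python sketch: given SNPs at which modern humans carry a derived allele, it reports stretches of consecutive sites where the Neanderthal is ancestral. This is not the statistic Green et al. used, which models the expected number of Neanderthal derived alleles; the function name `ancestral_runs`, the `min_sites` threshold, and the input format are all hypothetical.

```python
from typing import List, Tuple

# (genomic position, True if the Neanderthal carries the derived allele)
Snp = Tuple[int, bool]


def ancestral_runs(snps: List[Snp], min_sites: int = 3) -> List[Tuple[int, int, int]]:
    """Return (start, end, n_sites) for runs of consecutive ancestral sites."""
    runs: List[Tuple[int, int, int]] = []
    current: List[int] = []
    for position, neanderthal_is_derived in sorted(snps):
        if not neanderthal_is_derived:
            current.append(position)  # extend the current ancestral run
            continue
        if len(current) >= min_sites:
            runs.append((current[0], current[-1], len(current)))
        current = []  # a derived site ends the run
    if len(current) >= min_sites:
        runs.append((current[0], current[-1], len(current)))
    return runs


example = [(100, True), (200, False), (300, False), (400, False),
           (500, True), (600, False), (700, False)]
print(ancestral_runs(example))  # [(200, 400, 3)]
```

Longer runs spanning more base pairs would be stronger candidates, which is why the sketch reports both the span and the number of sites.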
Human Accelerated Regions (HARs) represent another set of genomic regions that
have been studied in Neanderthals. Prior to the sequencing of the Neanderthal genome, the
HARs were identified as portions of the human genome that evolved rapidly on the human
lineage after its divergence from chimpanzees. HARs are highly conserved in all other vertebrates, including
chimpanzees. When Green et al. examined the HARs for differences, they discovered that
91.4% of the studied HARs are shared between modern humans and Neanderthals,
indicating that the HARs developed to their current derived state before the modern
human-Neanderthal split (R.E. Green et al., 2010). Because Denisovans and Neanderthals
are believed to have diverged from each other more recently than either lineage diverged from
modern humans, one might hypothesize that HARs must be similar in all three lineages.
However, only an analysis of these regions could determine this for certain. One
argument against placing particular emphasis on these regions is that the possibility exists
that the HARs are accelerated in humans not due to adaptation but rather due to the
relaxation of selective pressures. That is, the HARs may have little functional purpose in
humans and may therefore be free to mutate with minimal consequence. One of the
possible non-selective reasons for mutation in the HARs is GC bias, the conversion of A-T pairings to G-C pairings in the absence of selective pressure keeping the A-T base pair
in place (Katzman, Kern, Pollard, Salama, & Haussler, 2010). According to the study
done on the Denisovan HARs so far by Burbano et al., many of the human-specific
derivations of the HARs are shared with modern humans by the Denisovans as well as
the Neanderthals. However, 8% of the derived alleles are found in the ancestral state
rather than the modern human state in both ancient hominids. These authors pinpointed a
number of derived alleles that are more recent than the modern human split with the
Neanderthal/Denisovan common ancestor and suggest that the HARs are worthy of
further comparative study in the future (Burbano et al., 2012).
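
One simple way to look for the GC bias mentioned above is to tally the direction of each substitution in a region: an excess of A/T to G/C ("weak to strong") changes is consistent with biased conversion rather than adaptation. The Python sketch below is a minimal illustration under that assumption; the names `substitution_class` and `summarize` are hypothetical, and a real analysis would also need a statistical test and an appropriate neutral expectation.

```python
from collections import Counter
from typing import Iterable, Tuple

WEAK = set("AT")    # bases that pair with two hydrogen bonds
STRONG = set("GC")  # bases that pair with three hydrogen bonds


def substitution_class(ancestral: str, derived: str) -> str:
    """Classify one ancestral-to-derived substitution by GC direction."""
    if ancestral in WEAK and derived in STRONG:
        return "weak_to_strong"
    if ancestral in STRONG and derived in WEAK:
        return "strong_to_weak"
    return "other"  # A<->T or G<->C changes leave GC content unchanged


def summarize(substitutions: Iterable[Tuple[str, str]]) -> Counter:
    """Count substitutions in each GC-direction class."""
    return Counter(substitution_class(a, d) for a, d in substitutions)


counts = summarize([("A", "G"), ("T", "C"), ("G", "A"), ("A", "T"), ("C", "G")])
print(counts["weak_to_strong"], counts["strong_to_weak"], counts["other"])  # 2 1 2
```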
Locus-Specific Denisovan Genome Research Published to Date
To date, the number of studies published on the specifics of the Denisovan
genome is limited but growing. Denisovans have been shown to have contributed
genetic material to the modern humans of Southeast Asia, and this admixture has been
shown to be independent of AMH-Neanderthal interbreeding (Skoglund &
Jakobsson, 2011). Abi-Rached et al. have proposed that the Denisovan contribution to the
genomes of certain Asian populations is responsible for the presence of the HLA-B*73
allele in these populations. HLA molecules contribute to immune function by serving as
ligands for T-cell receptors and for the receptors of natural killer cells. The HLA-B*73
allele is rarely found outside of West Asian populations. Like the HLA-C-C allele, it is
shared by modern Asians and Denisovans, most likely as a result of introgression
(Abi-Rached et al., 2011).
Another study on the evolution of the human immune system identifies gene
inactivations that occurred subsequent to the divergence of the Neanderthal/Denisovan
common ancestor with early modern humans. SIGLEC13 and SIGLEC17P are expressed
in chimpanzees but exist as inactivated pseudogenes in modern humans, with the
exception that SIGLEC17P is expressed in human natural killer cells. SIGLEC13 is
calculated to have been inactivated about 46,000 years ago, and SIGLEC17P is estimated
to have been inactivated approximately 100,000 years ago. The Neanderthal and
Denisovan genomes show the modern human variant of both these genes, indicating that
either the estimates for the dates of inactivation are incorrect or the inactivations occurred
independently in all three lineages (X. Wang et al., 2012).
Yet another immunity-related gene, BST2, has been determined to have reached
its current form in the common ancestor of modern humans, Denisovans, and
Neanderthals, at least 800,000 years ago. BST2 prevents replicated immunodeficiency
viruses from being released from their infected host cells. This action confers protection
against all known primate immunodeficiency viruses except the HIV-1 group M varieties
that affect humans. Therefore, Denisovans and Neanderthals must have shared modern
humans’ immunity to most simian immunodeficiency viral strains (Sauter, Vogl, &
Kirchhoff, 2011).
The genome region that includes the OAS1 gene has a haplotype found only
among Melanesians yet closely matched by the Denisovan sequence. This may indicate
introgression, or interspecies gene flow, from the Denisovans into an early human subpopulation, resulting in a haplotype that persists today (Mendez, Watkins, & Hammer,
2012).
The APOE gene codes for apolipoprotein E, which binds to lipids to form
lipoproteins. APOE is polymorphic in humans but monomorphic in chimpanzees, and the
human polymorphisms are implicated in variations in several aspects of human health.
The Denisovan APOE sequence indicates that Denisovans shared the fixed, human-specific
variations within APOE, with apolipoprotein E function more closely resembling
that of modern humans than that of chimpanzees (McIntosh et al., 2012).
A cluster of genes related to dopaminergic neurotransmission, comprising the
genes NCAM1, TTC12, ANKK1 and DRD2, shows several human-specific derived
alleles. Denisovans were found to have some of the ancestral alleles and some of the
derived alleles in these genes, whereas Neanderthals were found to share all the
human-specific derivations (Mota, Araujo-Jnr, Paixão-Côrtes, Bortolini, & Bau, 2012).
Denisovans and Neanderthals share the two human-specific alleles in FOXP2.
However, an intronic SNC in FOXP2, at which the Neanderthals and Denisovans differed
from modern humans, may have affected transcription factor binding and may therefore
have affected the expression of FOXP2 (Maricic et al., 2012).
Agoni et al. have identified retroviral insertions into the Denisovan and
Neanderthal genomes subsequent to their divergence from early modern humans. These
are the traces of germline cell infections by retroviruses. The evidence supports the
conclusion that the Denisovans and Neanderthals diverged more recently from each other
than from early modern humans but as yet does not indicate the nature of any influence
these genetic insertions may have had on these ancient genomes (Agoni, Golden, Guha,
& Lenz, 2012). One gene variant that causes disease in modern
humans was found to be present in the Denisovan genome. The gene in question,
CC2D1A, when carrying the mutation ValI736Met, can cause miscarriage and
thromboembolism in modern humans. The fact that the disease-causing variant was found in
the Denisovan genome may indicate that this variant was typical among Denisovans yet
compensated by other mutations, or it may be indicative of disease in the sequenced
Denisovan individual (G. Zhang et al., 2011).
Challenges and Opportunities in the Characterization of Ancient DNA
Reich et al. published a low-coverage sequence of the Denisovan genome, at about
1.9-fold coverage (Reich et al., 2010). Coverage refers to the average number of reads in
which each base is present, and therefore higher coverage equates to higher confidence in
the accuracy of the final sequence. Low coverage leaves us unable to determine whether a
given variant is truly derived in the ancient organism or is instead the product of
degradation or error (Lalueza-Fox & Gilbert, 2011). Meyer et al. subsequently
published a 30-fold-coverage sequence of the genome using a single-stranded DNA
library preparation technique involving the immobilization of dephosphorylated,
denatured ancient DNA on streptavidin-coated beads. This method eliminates the need
for purification steps and therefore preserves products that would otherwise be lost during
purification (Meyer et al., 2012).
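As a rough illustration of what a fold-coverage figure means, mean coverage can be approximated as the total number of aligned bases divided by the length of the reference. The sketch below uses hypothetical read lengths and a hypothetical reference length; it is not the calculation performed by Reich et al. or Meyer et al.

```python
def mean_coverage(read_lengths, reference_length):
    """Approximate mean fold coverage: total aligned bases / reference length."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return sum(read_lengths) / reference_length

# Hypothetical toy data: 30 reads of 100 bases aligned to a 1,000-base reference.
toy_reads = [100] * 30
toy_coverage = mean_coverage(toy_reads, 1000)
print(toy_coverage)
```

This average hides the variation in depth from one site to the next, which is why individual low-depth sites still warrant caution even in a high-coverage sequence.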
The high-coverage genome sequence is available to the public for download and
is the sequence being used for this project. Despite the advantages of the high-coverage
sequence, caution is still warranted with regard to assumptions about Denisovan adaptive
mutations. Because the option to compare the genomes of multiple Denisovan individuals
does not yet exist, the possibility remains significant that a given variant is due not to
Denisovan adaptation but to degradation or error. It is also impossible to tell which sites
were polymorphic and which were fixed in the Denisovan population. This project’s
primary focus is on Denisovan-specific derived variants at loci that are non-polymorphic
in the modern human population. It is tempting to suggest that some of
these represent Denisovan-specific adaptations. However, because we cannot observe
living Denisovans and have only a single individual's genome, any such sites can be
regarded only as possible, not certain, sites of recent Denisovan-specific
adaptive mutations.
This research also focuses primarily on SNCs located on exons. Most of the
differences between ancient and modern-human DNA are located in non-coding regions.
Differences in non-coding regions are worthy of study for possible effects on functions
such as regulatory activity and the transcription of microRNAs (Lalueza-Fox & Gilbert,
2011). However, the non-coding changes will be largely outside the scope of this project,
because the functional significance of coding SNCs can be identified more readily.
Previous studies on the Neanderthal genome give some indication of the extent
and type of differences between humans and ancient sister populations. Unsurprisingly
given the closely related nature of the lineages in question, when the modern human
genome is compared to that of the Neanderthals, only a handful of chromosomal regions
contain multiple SNCs predicted to have functional significance. Green et al. identified
the top 20 candidate regions for positive selection, dividing them into four categories:
regions in which recent selective sweeps were likely in modern humans, regions in which
selective sweeps were likely in apes, regions in which selective sweeps were likely to
have taken place prior to the reproductive split between modern humans and the common
ancestor of Neanderthals and Denisovans, and selective sweeps that approximately
coincided with the time of the split (R.E. Green et al., 2010). Crisci et al. have further
analyzed those regions for significance and narrowed the results to a list of 29
protein-coding genes that show significant changes between ancient and modern genomes.
Among others, these genes include transcription regulators, genes that when mutated
have known disease association in modern humans, a proteinase, and several genes in the
HOX family (Crisci, Wong, Good, & Jensen, 2011).
The present research takes advantage of the opportunity to extract specific
categories of data from the Denisovan genome sequences, to find patterns in the
differences between the modern human and Denisovan exomes, and to analyze the
potential biochemical implications of a few specific differences.
Summary of Aims
A. The first aim is to use bioinformatics to locate genes in which there are
functional differences between Denisovans and modern humans.
B. The second aim is to suggest possible significance of select differences.
C. The third aim is to indicate directions for future research.
METHODS
Denisova genome sequence data were obtained in binary alignment/map (BAM)
file format from the most recent, publicly available, approximately 30-fold coverage
Denisova sequence reads aligned to the 20-fold coverage modern human genome
sequence hg19. These reads are the result of a recently developed technique for
sequencing ancient DNA with coverage rivaling that of modern genome sequences
(Meyer et al., 2012). These published sequences were downloaded from
http://cdna.eva.mpg.de/denisova.
The Neanderthal sequences, also aligned to hg19, were downloaded in BAM file
format from http://cdna.eva.mpg.de/neandertal/altai/bam/. According to the source Web
page belonging to the Max Planck Institute Department of Evolutionary Genetics, this
genome sequence averages 50-fold coverage.
The sequence alignments were converted to variant call format (VCF) files using
SAMtools, a Linux-based package designed to store, process, index, and sort large
nucleotide sequence alignments (H. Li et al., 2009). The SAMtools utilities are freely
available at http://samtools.sourceforge.net/. The SAMtools mpileup utility was used to
extract all mismatches between the modern human and Denisovan sequences, and
between the modern human and Neanderthal sequences.
For the Denisovan alignment to the modern human sequence, the exon-only
mismatches were identified and placed in a separate file. To accomplish this, the loci of
all human exons were obtained from Ensembl, and this exon information was used to
extract exons from the VCF files using Linux shell tools. The result was a
comprehensive, searchable file of the differences between the modern human and
Denisovan exomes. For all genome regions and mismatch types studied in this project
that had not been cataloged in previous research, Linux command line tools were
used to obtain subsets of data that gave single-nucleotide changes (SNCs) and indels for
the regions of interest.
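The exon-filtering step described above was carried out with Linux shell tools. Purely as an illustration of the underlying logic, the following Python sketch keeps only those VCF records whose position falls inside an exon interval. The three-column exon format (chromosome, start, end), the function names, and the toy records are all assumptions made for this example and are not taken from the project's actual files.

```python
def parse_exons(lines):
    """Read tab-separated exon records: chromosome, start, end (1-based, inclusive)."""
    exons = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end = line.split("\t")[:3]
        exons.setdefault(chrom, []).append((int(start), int(end)))
    return exons

def exonic_records(vcf_lines, exons):
    """Keep VCF data lines whose POS lies inside any exon on the same chromosome."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):  # skip meta-information and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        if any(start <= pos <= end for start, end in exons.get(chrom, [])):
            kept.append(line.rstrip("\n"))
    return kept

# Hypothetical toy inputs.
exon_lines = ["1\t100\t200", "1\t500\t600"]
vcf_lines = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t150\t.\tA\tG\t50\t.\t.",
    "1\t300\t.\tC\tT\t50\t.\t.",
    "1\t600\t.\tG\tA\t50\t.\t.",
]
exon_hits = exonic_records(vcf_lines, parse_exons(exon_lines))
print(exon_hits)
```

A linear scan over the exon list is adequate for a toy example; for a whole exome, a sorted interval lookup or a dedicated interval tool would be a more practical choice.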
Two published catalogs of SNCs between the modern human and Denisovan
genomes were downloaded from the material available as a supplement to Meyer et al.’s
2012 paper on their high-coverage Denisovan genome sequence (Meyer et al., 2012).
One was the file of all SNCs at loci within the consensus coding sequences (CCDS), at
which the Denisovan genome shows a derived allele, and at which the modern human
locus has a single fixed allele, or at which the dominant human allele is present at greater
than 99%. The other was the file of all SNCs within the CCDS, at which the modern
human genome shows a derived allele, and at which the dominant human allele is fixed
or present at greater than 99%. Both catalogs were converted to VCF files so they could
be searched and analyzed in a Linux environment. The data identified genes by accession
numbers only, meaning that specific gene data had to be added. The VCF files of these
two SNC catalogs were used to create Excel spreadsheets, to which the gene names, gene
annotations, and Neanderthal variants were added. The Neanderthal variant column was
filled in only at loci where the Neanderthal variant differed from the ancestral state. The
records for which the Neanderthal sequence shares the ancestral state were left blank in
the Neanderthal column for ease of visual scanning. Neanderthal variants for these sets of
loci were obtained from the VCF file of exon-located differences between the modern
human and Neanderthal genomes, described above. Gene names and known or putative
functions were obtained from Entrez Gene and Ensembl. OMIM was used to identify the
diseases associated with those genes, in both the human-derived and Denisovan-derived
catalogs, for which the Neanderthal variant matches the derived rather than the ancestral state.
For the study of subsets of the CCDS SNC data, a search of the gene names and
annotations was used to create three shorter catalogs of the genes related,
respectively, to neurons, male reproduction, and metabolism that show amino acid
differences between the Denisovan and modern human exomes. A fourth shorter catalog
was compiled showing the details on all premature stop and loss-of-stop changes in genes
that are functional in the modern human genome. In the construction of the neuronal and
male reproductive gene catalogs, motifs and domains were analyzed using MotifScan and
InterProScan. For those genes selected from these catalogs for in-depth investigation,
secondary structures and domains were determined using GOR, available at
http://npsa-pbil.ibcp.fr/cgi-bin/secpred_gor4.pl, and InterProScan, available at
http://www.ebi.ac.uk/Tools/pfa/iprscan/.
The significance of the frequency of changes in sperm-related genes was
determined using PAL2NAL, a calculator of dN/dS ratio. PAL2NAL was selected from
among many available tools for its robustness to the inclusion of untranslated regions and
other irregularities in sequence alignments such as alignment mismatches and
frameshifts. It was also selected over other options because it is based on
up-to-date calculation methods and employs the most widely used software. PAL2NAL is an
online server that converts a protein alignment and the corresponding nucleotide
sequences into a codon alignment and uses the codeml program from PAML to calculate
dN/dS (Suyama, Torrents, & Bork, 2006). Protein alignments for input
into PAL2NAL were obtained using ClustalW. Human nucleotide and amino acid
sequences were obtained from NCBI’s CCDS database, and sequences for other species
were obtained from NCBI’s Nucleotide database. Denisovan sequences were created by
altering the human sequence to match all Denisovan variants in the CDS, using the
original Denisovan/modern human SNC pileup created for this project.
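The construction of a Denisovan sequence from the human sequence can be sketched as a simple substitution step. The example below is a minimal illustration only: it assumes that variant positions have already been converted to 1-based coordinates within the coding sequence, and the function name, the toy sequence, and the toy variants are hypothetical.

```python
def apply_snvs(reference_cds, variants):
    """Return a copy of reference_cds with each (position, ref, alt) substituted.

    Positions are 1-based offsets into the coding sequence (an assumption of
    this sketch; genomic-to-CDS coordinate conversion is not shown).
    """
    bases = list(reference_cds)
    for pos, ref, alt in variants:
        if bases[pos - 1] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        bases[pos - 1] = alt
    return "".join(bases)

human_cds = "ATGATTGCCTAA"                            # hypothetical toy CDS
denisovan_variants = [(6, "T", "G"), (9, "C", "T")]   # hypothetical variants
denisovan_cds = apply_snvs(human_cds, denisovan_variants)
print(denisovan_cds)
```

Checking the reference base before substituting guards against coordinate errors, which would otherwise silently produce a sequence that matches neither genome.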
RESULTS
Male Reproductive Genes
Little can be said with certainty about a group of hominids who have not lived for
tens of thousands of years. All conclusions about any aspect of their existence that cannot
be ascertained from their skeletal morphology must be made cautiously and must remain
tentative. This is true even of many of the conclusions drawn from DNA, because we
have no way of determining for certain how gene products interacted with one another in
vivo, nor can we see patterns of transcription and translation. However, parallels can be
drawn between observed patterns of behavior and the genetic signatures that accompany
them among modern species, and the genetic signatures found in an extinct line. This
section proposes to draw one such connection between the Denisovan sequence and
observed mating behaviors of several extant primate species and their accompanying
genetic signatures of rates of positive and purifying selection.
Genes that code for proteins forming part of the sperm’s structure, genes that are
expressed specifically in the testis, and genes that code for proteins in semen appear with
noteworthy frequency in the catalog of changes unique to Denisovans and found at fixed
loci. If all other factors were equal, the percentage of the catalog made up by such genes
would be disproportionate. There are 411 different types of cells in the human body
(Vickaryous & Hall, 2006). If all factors were equal and the distribution entirely random,
one would expect genes related to each of the 411 cell types to make up approximately
0.24 percent of the 2060-gene catalog of mutations unique to Denisovans and found at
fixed loci. Sperm-related proteins make up 1.94 percent of the catalog, appearing 8-fold
more often than expected due to random chance. However, gene evolution is known to
proceed at uneven rates, with a non-random distribution. Given the fact that in mammals,
sex-linked genes are known to undergo particularly rapid evolution (Good et al., 2013;
Sackton et al., 2013), the percentage found is not unexpected. Sex-biased genes that are
expressed preferentially in males undergo more rapid evolutionary change than either
female-biased genes or non-sex-biased genes (Ellegren & Parsch, 2007). Therefore the
catalog’s abundance of changes in genes that code for proteins specific to spermatozoa
and other male reproductive genes is in keeping with what we know about male
reproductive gene evolution. Nonetheless, the frequency of the appearance of such genes
here, and the relationship of selective pressures on male reproductive genes to mating
systems, made these male reproductive genes an intriguing group to study. It has been
previously noted that multiple fixed, functional substitutions exist in human SPAG17, a
gene coding for a protein found in the sperm flagellum, distinguishing it from the same
gene in Neanderthals (R.E. Green et al., 2010). This fact also points to the
opportunity to discover differences in reproductive proteins.
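The expected and observed proportions quoted in this section can be reproduced with a few lines of arithmetic. The sketch below uses only the figures given in the text (411 cell types and a 1.94 percent sperm-related share of the catalog); the variable names are illustrative.

```python
cell_types = 411          # Vickaryous & Hall (2006), as cited in the text
observed_percent = 1.94   # sperm-related share of the catalog, from the text

# Under a uniform, random distribution each cell type would account for
# an equal share of the catalog.
expected_percent = 100 / cell_types
fold_enrichment = observed_percent / expected_percent

print(round(expected_percent, 2), round(fold_enrichment, 1))
```

This naive expectation treats every cell type as equally likely to be represented, which is exactly the assumption the surrounding discussion goes on to qualify.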
Certain male reproductive genes have been shown to undergo more rapid positive
selection in species in which the females are observed to mate with more males during
the periovulatory period (Dorus, Evans, Wyckoff, Choi, & Lahn, 2004; Rooney & Zhang,
1999; Torgerson, Kulathinal, & Singh, 2002; Wong, 2010). Dorus et al. give the mean
number of male partners per female per ovulatory period for several species, along with
the type of mating system in which each species engages. At the lower end of the
spectrum in terms of mean number of male partners are two polygynous species, in which
one male has exclusive access to a group of females and the females rarely if ever mate
with more than one male per ovulation. These two species are the gorilla and the colobus
monkey, each with a mean number of approximately one male partner per periovulatory
period. The gibbon, which is monogamous, also has approximately one male partner per
female per periovulatory period. The orangutan has a dispersed mating pattern, with
animals living in relative solitude rather than forming strong social bonds within mating
pairs or groups. Orangutan females have a mean of one to two male partners per
fertile period. Humans, with a wide variety of mating patterns worldwide, also have a
mean of one to two male partners per female ovulation. The two remaining species on the
list have a multimale-multifemale mating pattern. Macaque females mate with a mean of
about three partners in each periovulatory period, and chimpanzees are the most
promiscuous, with a mean of eight male partners (Dorus et al., 2004).
A number of sperm proteins have been shown to evolve more rapidly than
proteins expressed in other tissues. Further, this rapid evolution is demonstrably due to
positive selection, because these genes exhibit a high rate of nonsynonymous
substitution, yet their rate of synonymous substitution is no higher than that of genes
expressed elsewhere. Many such proteins are diverse in function. Some of the expressed
proteins bind the egg, others are involved in gene regulation, and others are involved in
glycolysis. What they have in common is that they are specifically expressed in sperm
(Torgerson et al., 2002). This effect can also be observed in the male reproductive gene
SEMG2, which codes for a semen coagulation factor. SEMG2 evolves more rapidly in
primate species in which the females mate with more males in a single periovulatory
period. These species may be said to experience sperm competition. That is, for a male to
pass on his genetic material to the next generation, his sperm must be better equipped to
fertilize the egg than the sperm of the female’s other mates. As compared to species in
which females mate with no more than one or two males, the species in which females
mate with multiple males have “higher sperm counts, richer mitochondrial loading in
sperm and more prominent semen coagulation” (Dorus et al., 2004).
In general, not just in mammals, evolution of male reproductive genes is expected
to proceed more rapidly in polyandrous species such as Drosophila than in monandrous
species (Ellegren & Parsch, 2007). The notion that sperm competition leads to rapid
positive selection in sperm is one that passes logical scrutiny. However, the molecular
evidence for this hypothesis has not gone without skeptical scrutiny. For example,
protamine 1, a sperm nuclear protein given by Torgerson et al. in 2002 as an example of a
positively selected male reproductive gene, was described by another team of researchers
in 1999 as showing inconclusive evidence for positive selection and requiring further
study (Rooney & Zhang, 1999). A study on humans and four other species of great
apes (the bonobo, the chimpanzee, the gorilla, and the orangutan) contradicts the results
of Torgerson et al.’s research by finding that male reproductive gene evolution was
influenced by gene function but not by mating patterns (Good et al., 2013).
A 2010 comparison of rates of nonsynonymous mutation in human, chimpanzee,
squirrel monkey, owl monkey, macaque, and colobus found that testis-specific genes
have a more rapid rate of nonsynonymous mutation in chimpanzees than in humans. This
study did not claim to show positive proof of correlation between positive selection and
mating system (Wong, 2010). In short, although there has been some dissent, a number of
studies have supported the idea that sperm competition leads to rapid evolution and
positive selection in male reproductive genes. In recent years, the correlation has been
called “a well-documented proxy of sexual selection” (Grayson & Civetta, 2012).
The overall indication of the literature consulted on this topic is that positive
selection due to sperm competition remains a distinct possibility and can be used as the
basis for informed conjecture, but it cannot be assumed as absolute fact until further
studies are completed on the correlation between primate mating patterns and positive
selection in sperm-, testis-, and semen-specific genes.
Approximately 25% of all possible mutations will be synonymous. The formula
used to calculate the ratio of nonsynonymous to synonymous changes must account for
this unequal opportunity for the two kinds of change, which arises from the degeneracy
of the genetic code.
Once this adjustment is made, the ratio of dN/dS will be equal to one if evolution acting
on these particular homologous regions is neutral; that is, if neither purifying selection
nor positive selection has been at work on these two sequences, relative to each other
(Hurst, 2002). Two sequences may have a ratio very close to one even if strong selective
forces have acted on them, because purifying selection and positive selection may both
have been at work to a significant, yet approximately equal, degree, resulting in a ratio
very close to one. Nonetheless, given the fact that neutral evolution and genetic drift are
taken as the default assumption for the source of most genetic change, a ratio of one is
generally accepted as an indication of neutral evolution, and a ratio that differs from one
is generally accepted as sufficient evidence of selection.
Each nucleotide, depending on whether it is first, second, or third in the codon
and on which codon it is part of, differs in the likelihood that a mutation will result in an
amino acid change. A number of different programs exist that use different methods to
calculate this likelihood (Hurst, 2002). The likelihood referred to here is expressed as a
fraction with a value between zero and one that signifies the probability that a change at
that locus will be synonymous or nonsynonymous. For example, the A in ATT has a
synonymous value of 0 and a nonsynonymous value of 1, because a mutation to any of
the other three nucleotides will result in an amino acid change. However, the T in
position three in ATT has a synonymous value of 2/3 and a nonsynonymous value of 1/3,
because a shift to a C or an A in this position will result in a silent (synonymous)
mutation. Only a mutation to G at this position will result in an amino acid change. The
numbers obtained for all sites in a sequence are added together, resulting in a total
number of synonymous sites and a total number of nonsynonymous sites for that sequence.
The comparison between two sequences is obtained by dividing the number of
nonsynonymous differences between them by the number of nonsynonymous sites.
The result is designated dN (or sometimes Ka). Likewise, the number of
synonymous differences between the sequences is divided by the number of synonymous
sites, for a result designated dS (or, alternatively, Ks). The final result
is obtained by dividing dN by dS, and dN/dS = ω, the nonsynonymous-to-synonymous
ratio between the two sequences, and a commonly accepted measure of the degree of
positive or purifying selection that has occurred along two lineages.
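The per-site values in the ATT example can be checked with a short script. The following is a minimal sketch of simple site counting under the standard genetic code, in which each of the three possible point mutations at a position is treated as equally likely. It is not the maximum-likelihood procedure implemented in codeml, and the function names are illustrative. Stop codons are treated like any other residue, which is a simplification.

```python
from fractions import Fraction
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def synonymous_fraction(codon, position):
    """Fraction of the three possible point mutations at `position` that are silent."""
    original = CODON_TABLE[codon]
    silent = 0
    for base in BASES:
        if base == codon[position]:
            continue
        mutant = codon[:position] + base + codon[position + 1:]
        if CODON_TABLE[mutant] == original:
            silent += 1
    return Fraction(silent, 3)

def site_counts(sequence):
    """Return (synonymous sites, nonsynonymous sites) summed over complete codons."""
    usable = len(sequence) - len(sequence) % 3
    syn = Fraction(0)
    for i in range(0, usable, 3):
        codon = sequence[i:i + 3]
        syn += sum(synonymous_fraction(codon, p) for p in range(3))
    return syn, usable - syn

print(synonymous_fraction("ATT", 0))  # first position of ATT
print(synonymous_fraction("ATT", 2))  # third position of ATT
print(site_counts("ATT"))
```

Exact fractions are used so that values such as 2/3 are not obscured by floating-point rounding.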
This measure is quite accurate when the two lineages are moderately closely
related, but it becomes less reliable if the number of nonsynonymous changes between
them is very high. In such cases, the mathematical models can no longer distinguish a
measurable degree of divergence. As Hurst describes it, “the amount of information from
the alignment decreases and we approach saturation” (Hurst, 2002). The modern human
and Denisovan sequences are, of course, so similar that inaccuracy due to an
overabundance of changes is far from an issue. Another limitation on the usefulness of
the dN/dS metric is that the calculation was originally intended to assess selection
between divergent lineages. It is therefore generally unreliable when two samples from
within the same population are compared to one another
(Kryazhimskiy & Plotkin, 2008). The logical conclusion is that, as modern humans and
Denisovans are neither so distantly divergent as to have an overly high number of
nonsynonymous differences nor part of the same population, dN/dS is a useful
calculation to apply. One thing crucial to keep in mind when choosing sequences is that
dN/dS can only be calculated if both synonymous and nonsynonymous changes exist
between them. The ratio is uninformative if no nonsynonymous changes occur, because the
numerator will always be zero in such a case. If no synonymous differences exist
between the sequences, the calculation cannot be performed because the denominator will
be zero.
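The two degenerate cases can be made explicit in code. The sketch below is illustrative only: the function name and the numeric inputs are hypothetical, and returning None for an undefined ratio is one possible convention rather than the behavior of any particular tool.

```python
def omega(dn, ds):
    """Return dN/dS, or None when dS is zero and the ratio is undefined."""
    if ds == 0:
        return None
    return dn / ds

# Hypothetical values chosen only to show the three cases.
print(omega(0.02, 0.04))  # both kinds of change present
print(omega(0.0, 0.04))   # no nonsynonymous change: ratio is zero
print(omega(0.02, 0.0))   # no synonymous change: ratio is undefined
```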
Two of the male reproductive genes with Denisovan/modern human differences at
fixed loci have been found to show a correlation between positive selection and mating
system, discussed in greater detail below.
The foundation for most calculators of ω is the maximum-likelihood method,
which, in brief, involves counting the numbers of synonymous and nonsynonymous sites
and making adjustments for the fact that multiple substitutions may have occurred at the
same locus through evolutionary time. Because transitions are more likely to be
incorporated into a population as polymorphisms, and transversions are rarer due to
less-favorable molecular kinetics, the more accurate methods also take into account the ratio
of the rates of transitions to transversions in the compared sequences (Z. Yang & Nielsen,
2000).
It is important to keep in mind when doing pairwise comparisons such as these
that the ratio obtained is not a measure of one of the two species as compared to the
other. It is instead a measure of the degree of positive selection in that gene or region for
both species as compared to each other. Although when we construct phylogenetic trees
we may choose a species to serve as a representative of the ancestral state, no extant
species is in point of fact ancestral to another extant species. Likewise, an extinct lineage
cannot be assumed to be ancestral to a related living lineage, and in fact we know with
certainty that Denisovans are not ancestral to modern humans, except for a tiny
percentage of introgression into Melanesian DNA. The various software tools that
calculate dN/dS construct unrooted trees. A comparison of human SEMG2 to gorilla
SEMG2 will give the same ω as a comparison of gorilla to human. Meaningful
conclusions regarding the significance of ω, in the sense of distinguishing the degree of
positive selection in the same gene in each of two species, can only be made by multiple
interspecies comparisons, rather than calculating a single ratio based on the numbers of
synonymous and nonsynonymous differences between two species.
It is also important to note that many tools exist for determining ω. All operate on
the same principles but use different algorithms. This means that multiple ratios obtained
using the same tool and identical methods are meaningful relative to each other, but two
different software applications may not produce identical values for ω using the same
data. As we have already established, ω is not an absolute measure but an indication.
Therefore, if the results of two tools or calculation methods are to be compared, trends in
ω should be expected to follow similar curves, but exact values of ω may be different.
Another logical point follows this one. Any correlation between ω and another measure
may be determined in terms of directionality, but not in terms of precise numerical
equivalencies.
The goal of this section is to compare the degree of positive selection in
Denisovan and modern human male reproductive genes. Out of all the genes found to
have Denisovan-specific mutations at fixed loci and those found to have human-specific
mutations at fixed loci, two, SEMG2 and ADAM18, have previously been shown to have
a direct correlation between degree of positive selection undergone by the gene and level
of promiscuity. In a multispecies comparison, SEMG2 shows progressively more positive
selection in species in which the females mate with a larger number of males during a
single fertile period (Dorus et al., 2004). According to a 2010 study by Finn and Civetta,
the ADAM proteins all show evidence of positive selection, particularly at codons found
within their disintegrin domains. ADAM2, ADAM18, and ADAM23 (as well as SEMG2,
which was used as a positive control in this study) show a correlation between positive
selection and multimale-multifemale mating systems, in which sperm are subject to
strong postcopulatory selective pressures (Finn & Civetta, 2010).
Although only the male reproductive genes with differences at fixed loci were
selected for closer scrutiny in this study, all differences between the modern human and
Denisovan exons of the relevant isoform of SEMG2 were taken into consideration in
calculating ω values.
The goal of this section was to determine whether a significant difference exists
between the degree of positive selection found in modern human SEMG2 and Denisovan
SEMG2. Six pairwise comparisons were performed: Denisovan to chimpanzee, human to
chimpanzee, Denisovan to gorilla, human to gorilla, chimpanzee to gorilla, and human to
Denisovan. The chimpanzee and gorilla sequences for this gene were used because the
chimpanzee is the most promiscuous, and the gorilla the least, of the species for which
data on mating systems are available. No complete, reviewed sequence for gorilla
ADAM18 CDS was available, and therefore it was concluded that reliable dN/dS
calculations could not be obtained for ADAM18.
The results of the SEMG2 calculations are given in Table 1.
Species Pair             ω
Denisovan:Chimpanzee     1.0509
Human:Chimpanzee         1.2220
Denisovan:Gorilla        0.4431
Human:Gorilla            0.4852
Chimpanzee:Gorilla       1.9514
Human:Denisovan          0.9433
Table 1. SEMG2 dN/dS Ratios.
When compared directly with each other, the Denisovan and human sequences
have a value only slightly below one, which is to be expected due to the close similarity
of these sequences. When the human and Denisovan SEMG2 sequences are compared to
gorilla SEMG2, the value is significantly below one, indicating purifying selection. When
compared to the chimpanzee SEMG2 sequence, Denisovan and human SEMG2
sequences both give a value above one, indicating positive, otherwise known as
diversifying, selection. The implication of these ω values is best understood by
contrasting the chimpanzee-to-gorilla comparison with the modern human and Denisovan
calculations against each of these two other primate species. Chimpanzee matings involve
the most males per female fertile cycle, and gorilla matings involve the fewest. This contrast is
reflected in a chimpanzee-to-gorilla ω that is distant from the neutral value of one.
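The reading of Table 1 given above can be sketched in a few lines of Python. This is an illustration only: the function name selection_regime is hypothetical, the hard cutoff at ω = 1 ignores the uncertainty in each estimate, and the values are simply copied from Table 1.

```python
# ω (dN/dS) values copied from Table 1.
SEMG2_OMEGA = {
    ("Denisovan", "Chimpanzee"): 1.0509,
    ("Human", "Chimpanzee"): 1.2220,
    ("Denisovan", "Gorilla"): 0.4431,
    ("Human", "Gorilla"): 0.4852,
    ("Chimpanzee", "Gorilla"): 1.9514,
    ("Human", "Denisovan"): 0.9433,
}


def selection_regime(omega: float) -> str:
    """Crude reading of ω: below one is purifying, above one is positive."""
    if omega > 1.0:
        return "positive"
    if omega < 1.0:
        return "purifying"
    return "neutral"


for (first, second), omega in SEMG2_OMEGA.items():
    print(f"{first}:{second}  ω = {omega:.4f}  {selection_regime(omega)}")
```

A value such as the Human:Denisovan ratio, which lies only slightly below one, is labeled "purifying" by this simple cutoff even though the text treats it as close to neutral, which is why the cutoff should be read as a rough guide rather than a test.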
It was observed during the performance of these calculations that PAL2NAL does
not always give precisely the same value for ω when given exactly the same sequences,
particularly when the two sequences are quite different. This variability was used to
calculate the standard deviation of the ω calculation method.
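As a minimal sketch of how such a standard deviation might be obtained, the following Python fragment summarizes repeated ω estimates for a single species pair. The run values are hypothetical placeholders (the individual run values are not listed here), and the two-standard-deviation criterion in exceeds_method_variability is an assumption made for illustration, not the criterion used in this study.

```python
import statistics

# Hypothetical repeated ω estimates for one species pair.
# These are placeholders only; the actual run values are not reported.
repeated_omega = [1.9514, 1.9498, 1.9530, 1.9507, 1.9521]

mean_omega = statistics.mean(repeated_omega)
sd_omega = statistics.stdev(repeated_omega)  # sample standard deviation (n - 1)


def exceeds_method_variability(difference: float, sd: float, multiple: float = 2.0) -> bool:
    """Return True if a difference in ω is larger than `multiple` standard deviations.

    The default multiple of 2.0 is an illustrative assumption.
    """
    return abs(difference) > multiple * sd


print(f"mean ω = {mean_omega:.4f}, SD = {sd_omega:.4f}")
```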
In both the comparisons to the fewest-males species and the most-males species,
the modern human ω is higher than the Denisovan ω. At first glance, this might appear to
indicate increased positive selection along the modern human lineage. However, this is
misleading, because the informative aspect of the ω values is not in absolute number but
in relative proximity to the species with the most and fewest male partners per fertile
time. The difference between modern humans and Denisovans in this respect, despite
being slight in comparison to the difference between more distantly related lineages, is
significant in light of the small variability of the calculation method. The following figure
provides a visual expression of the relationships from the table above.
[Figure: "SEMG2 dN/dS Ratio: Humans and Denisovans Superimposed"; vertical axis from 0 to 1.5; series: Compared to Humans, Compared to Denisovans.]
Figure 1. SEMG2 dN/dS Ratios.
More nonsynonymous differences exist between the modern human and
chimpanzee sequences than between the Denisovan and chimpanzee sequences (a
difference in ω value of 0.1711). It is also true that more nonsynonymous differences
exist between the modern human and gorilla sequences than between the Denisovan and
gorilla sequences, but this difference is comparatively small (a difference in ω value of
0.0421), four-fold smaller than the difference between the two pairwise comparisons
against the chimpanzee sequence.
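The arithmetic behind these two differences and the four-fold figure can be checked directly from the Table 1 values. The short Python sketch below does only that; the variable names are illustrative.

```python
# ω values copied from Table 1.
omega = {
    "Human:Chimpanzee": 1.2220,
    "Denisovan:Chimpanzee": 1.0509,
    "Human:Gorilla": 0.4852,
    "Denisovan:Gorilla": 0.4431,
}

# Human minus Denisovan ω, against each outgroup species.
chimp_gap = omega["Human:Chimpanzee"] - omega["Denisovan:Chimpanzee"]
gorilla_gap = omega["Human:Gorilla"] - omega["Denisovan:Gorilla"]

# How many times larger the chimpanzee-based gap is than the gorilla-based gap.
fold = chimp_gap / gorilla_gap

print(f"chimpanzee gap = {chimp_gap:.4f}")
print(f"gorilla gap    = {gorilla_gap:.4f}")
print(f"fold           = {fold:.2f}")
```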
The two nonsynonymous differences between the modern human and Denisovan
sequences, resulting in changes to amino acids 274 and 298, are both sites at which the
Denisovan sequence has the derived allele and the human sequence has the ancestral
allele, as compared to the gorilla, chimpanzee, macaque, colobus, and orangutan. One of
these sites is fixed in modern humans, and the other is polymorphic in modern humans.
The presence of the fixed locus that is derived in Denisovans indicates that modern
human SEMG2 is under purifying selection that was relaxed in Denisovans at that locus.
Based on the previous studies correlating ω with mating system, this suggests the
possibility that Denisovan females, on average through evolutionary time since the
reproductive split with the modern human lineage, may have had more male partners per
periovulatory period than anatomically modern human females.
Premature Stop Codons and Loss-of-Stop Mutations
A total of 12 CCDS genes contain Denisovan-specific premature stop codons at loci fixed
in modern humans. These premature stops are summarized in Table 2.
Gene Name           Locus of      Modern Human  Denisovan  Amino Acid in  Amino Acid  Total Amino Acids in
                    Mutation      Allele        Allele     Modern Humans  Position    Modern Human Protein
ZCCHC5              X_77913818    G             A          Q              34          475
LRCH2               X_114347796   G             A          Q              761         765
TTLL10              1_1133168     G             T          E              655         673
NADK                1_1691211     G             A          R              132         446
CCDC30 (aka PFD6L)  1_43119670    C             T          Q              775         783
OR5AC2              3_97806515    C             T          R              167         309
GAP43               3_115382704   C             T          Q              27          238
PROL1               4_71275250    C             T          R              69          248
PON3                7_94992090    C             T          W              253         354
PZP                 12_9303247    A             T          Y              1459        1482
EFCAB13             17_45452287   C             T          Q              443         973
TAC4                17_47921435   G             A          R              70          113
Table 2. Premature Stop Codons Found in the Denisovan Exome.
It is not possible to predict the precise nature of the physiological effects of these
premature stops, beyond the conclusion that most of these genes were non-functional in
Denisovans. Some effects may have been minimal, because some of these genes, such as
TTLL10, are members of gene families in which a number of other genes perform similar
functions. As another example, GAP43 has two isoforms, and only one is affected by the
premature stop codon. These factors most likely mitigated the effects of the loss of these
genes. Another important observation based on the table above is that several of these
stop codons, notably those in LRCH2, TTLL10, and CCDC30, are located very near the
end of the modern protein’s coding sequence, meaning the protein may well have
remained functional in Denisovans, albeit in an altered form. For those premature stops
that did result in complete loss of function, some indication of the possible effects comes
from the phenotypic effects of variations in these genes when mutations are found in modern
humans. Although this does not offer any direct suggestions as to the nature of the
differences between Denisovans and modern humans, it does provide insight into general
body systems in which differences may have been found, as well as suggesting some
possible, very general predilections Denisovans might hypothetically have had for certain
types of disease. The genes with premature stop codons in Denisovans are discussed here
in chromosomal order, beginning with X and ascending numerically. It is important to
note that this is not a comprehensive list of all Denisovan premature stops in genes with a
CCDS, but rather of all such changes at loci that are fixed in modern humans.
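How close each premature stop lies to the end of the modern human protein can be expressed as the fraction of the protein retained before the stop, using the positions and protein lengths in Table 2. The Python sketch below is illustrative only: the function name fraction_retained and the 0.95 cutoff are assumptions introduced here, not part of the analysis.

```python
# (amino acid position of the stop, total amino acids in modern human protein),
# copied from Table 2.
PREMATURE_STOPS = {
    "ZCCHC5": (34, 475),
    "LRCH2": (761, 765),
    "TTLL10": (655, 673),
    "NADK": (132, 446),
    "CCDC30": (775, 783),
    "OR5AC2": (167, 309),
    "GAP43": (27, 238),
    "PROL1": (69, 248),
    "PON3": (253, 354),
    "PZP": (1459, 1482),
    "EFCAB13": (443, 973),
    "TAC4": (70, 113),
}

NEAR_END_CUTOFF = 0.95  # illustrative assumption, not a value from the study


def fraction_retained(stop_position: int, total_length: int) -> float:
    """Fraction of the protein translated before the stop codon is reached."""
    return (stop_position - 1) / total_length


near_end = [
    gene
    for gene, (stop, total) in PREMATURE_STOPS.items()
    if fraction_retained(stop, total) >= NEAR_END_CUTOFF
]

for gene, (stop, total) in PREMATURE_STOPS.items():
    print(f"{gene:8s} {fraction_retained(stop, total):.3f}")
print("near the end:", near_end)
```

Under this illustrative cutoff, the genes retained are LRCH2, TTLL10, and CCDC30, along with PZP, whose stop at position 1459 of 1482 is similarly close to the end.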
ZCCHC5 does not have a known disease mutation in modern humans. However, a
shared variant, interpreted as benign, was found in siblings with polymicrogyria
(Murdock et al., 2011). The other premature stop on the X chromosome is found in
LRCH2. This gene also lacks known disease associations due to mutations. LRCH2
expression is down-regulated during human embryogenesis in the presence of EtOH,
suggesting that LRCH2 may be one of a number of genes that play some role in fetal
alcohol syndrome (Halder et al., 2013).
Chromosome 1 harbors three Denisovan-specific premature stops. The first is
found in TTLL10, which functions as a slow-acting glycylase, ligating glycine to NAP1
(Ikegami et al., 2008). The second is found in NADK, which presents the potentially
most puzzling exception to the conclusion drawn above, that the functions of most of the
proteins encoded by these genes were fulfilled by other, similar proteins. NADK consists
of 446 amino acids. Amino acid 132, an arginine in modern humans, is replaced by a stop
codon in Denisovans, rendering this protein inactive. Until recently, only one
mammalian NADK was known to exist (Pollak, Niere, & Ziegler,
2007; R. Zhang, 2013). NADK phosphorylates NAD(+) to yield NADP(+) (Ohashi,
Kawai, Koshimizu, & Murata, 2011). It plays a key role in multiple biosynthetic
pathways and functions as a “universal electron donor” (Agledal, Niere, & Ziegler,
2010). NADK also controls the levels of NADPH in humans, which is vital to resistance
to oxidative stress (Pollak et al., 2007). The explanation for the apparent inactivation of
NADK in the Denisovan sequence, despite the vital nature of this gene’s function, may
be found in the possibility that additional NAD kinases exist in mammals but have yet to
be identified. In fact, a human mitochondrial gene that codes for an NAD kinase,
MNADK, has recently been identified (R. Zhang, 2013).
The third and final premature stop mutation on Chromosome 1 is found in
CCDC30, also known as PFD6L. PFD6L is a cytoskeletal protein, only recently
characterized and possibly functioning as part of the filament motor system. It is
expressed in the pancreas, brain, and kidneys of adult humans (J. Zhang et al., 2006).
The first of two premature stops on Chromosome 3 affects OR5AC2. In modern
humans, large heterozygous deletions in this gene may be associated with epilepsy
(Heinzen et al., 2010). The second Chromosome 3 gene affected by a premature stop,
GAP43, is of interest because based on observations so far, it may be the only neuronal
gene to be rendered nonfunctional in Denisovans. Rare mutations in GAP43, possibly
affecting the formation of synapses, may be implicated in schizophrenia (Shen et al.,
2012).
Only one gene on Chromosome 4 is affected, PROL1. The encoded protein,
opiorphin, functions to suppress pain and regulate mood (Wisner et al., 2006). Two
homologs exist, SMR3A and SMR3B, that also encode proteins of the opiorphin family
(Koffler et al., 2012). PROL1 is down-regulated in erectile dysfunction (Tong, Tar,
Melman, & Davies, 2008).
We now jump to Chromosome 7, where PON3 includes a premature stop in the
Denisovan sequence. Absence or down-regulation of PON3 appears to reduce resistance
to oxidative stress and may contribute to neonatal mortality (Kempster, Belteki, Licence,
Charnock-Jones, & Smith, 2012). Elevated PON3 may protect against oxidative damage
in HIV patients (Aragones et al., 2012). PON3 SNPs are associated with Alzheimer’s
disease (Erlich et al., 2012).
Chromosome 12 includes one premature stop, found in PZP. In
modern humans, a SNP on PZP may be associated with nonalcoholic fatty liver disease
(Chalasani et al., 2010). PZP is highly expressed in Alzheimer’s patients prior to the
disease becoming symptomatic (Ijsselstijn et al., 2011).
EFCAB13, found on Chromosome 17, also includes a Denisovan-specific
premature stop at a locus that is fixed in modern humans. EFCAB13 currently lacks
extensive research on disease associations. The encoded protein may be differentially
expressed in Graves’ disease, but this apparent difference may be due to experimental
error (Matsumoto et al., 2013). A final premature stop, also on Chromosome 17, is
located on TAC4, a gene that may be implicated in inflammatory bowel disease (L. Liu et
al., 2011). This gene plays a role in early lymphocyte development; TAC4 knockout mice
have greater numbers of pro-B cells in bone marrow than wild-type mice (Berger et al.,
2010).
In addition to these premature stops, the Denisovan sequence includes three
loss-of-stop mutations, detailed in Table 3.
Gene Name   Locus of      Modern Human  Denisovan  Amino Acid Gained
            Mutation      Allele        Allele     in Denisovan Gene
PRR15       7_29606333    T             C          Q
ZNF804B     7_88966344    T             A          K
KTN1        14_56147406   T             A          R
Table 3. Loss-of-Stop Mutations.
The first two loss-of-stop mutations are found on Chromosome 7. The first affects
PRR15, in which the mutation of a T to a C in Denisovans results in the conversion of a
modern-human stop codon to a glutamine in Denisovans. In modern humans, mutations
in PRR15 are associated with Alzheimer’s disease (Olah et al., 2011). The second gene
affected on Chromosome 7 is ZNF804B, in which the mutation of a T in modern humans
to an A in Denisovans results in the gain of a lysine in place of a stop codon.
Modern-human mutations in ZNF804B have been suggested to have associations with anorexia
nervosa (K. Wang et al., 2010). The third and final loss-of-stop mutation is found on
Chromosome 14, on the gene KTN1, in which the mutation of a T to an A results in the
gain of an arginine in Denisovans. KTN1 is differentially expressed in some tumor
tissues (Babeto et al., 2011). Variations are found in patients with muscular dystrophy
(Aurino et al., 2008).
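The amino acids gained through loss-of-stop mutations follow from the standard genetic code. The Python sketch below enumerates what each stop codon becomes when its first base, a T, is replaced. It rests on two assumptions that are not stated in the text: that the alleles are reported on the coding strand, and that the change falls at the first position of the stop codon. The identity of the original stop codon in each gene is not given here, so all three stop codons are shown.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
STOP_CODONS = [codon for codon, amino_acid in CODON_TABLE.items() if amino_acid == "*"]


def first_base_substitution(codon: str, new_base: str) -> str:
    """Amino acid encoded after replacing the first base of `codon`."""
    return CODON_TABLE[new_base + codon[1:]]


for new_base in "CA":
    for stop in STOP_CODONS:
        mutated = new_base + stop[1:]
        print(f"{stop} -> {mutated}: {first_base_substitution(stop, new_base)}")
```

Under these assumptions, a T-to-C change yields glutamine (Q) from TAA or TAG, while a T-to-A change yields lysine (K) from TAA or TAG and arginine (R) from TGA.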
Genes Related to Neuron Formation and Function
Almost two percent of the catalog of nonsynonymous Denisovan-derived changes
at fixed loci consists of changes in genes related to the formation and function of neurons.
The total number of neuron-related SNCs in this category is 41, with multiple changes on
a few genes for a total of 34 neuron-related genes. This was chosen as an area for
investigation for two reasons. The first is the relative frequency of these changes. The
second reason is the fact that neurological differences between modern humans and
ancient hominids are an area of persistent interest for researchers and the
public.
The locations of the mutations in neuronal genes were studied to determine
whether the mutations were located in a key functional domain and whether the amino
acid substitution represents a dramatic change in chemical properties. A table is included
in Appendix B showing all 34 genes, the position and nature of each amino acid change,
and the functional effect predictions for each change from three different prediction tools.
Amino acid substitution positions and functional effect predictions are taken from
the supplementary material of Meyer et al.’s work on the Denisovan genome (Meyer et
al., 2012). In addition, the table includes information on the motif or domain where the
change is located.
After the motifs and domains of fixed derived changes in neuron-associated genes
were determined, the annotations and literature sources on each gene were studied to
determine which genes were of greatest interest for further follow-up. Out of 34 genes
that both appear in the Denisovan-derived catalog of fixed loci and are associated with
neuronal growth, function, or differentiation, 13 were chosen for further investigation and
discussion here. These genes are by no means the only neuron-associated genes in the
catalog worthy of investigation; other genes among the 34 may well be of equal interest
for future study. The 13 discussed below were chosen based on the fact that they are
currently known to have disease associations in the modern human population. In many
cases, these modern disease variants result in serious impairments unlikely to have been
present as a frequent allele in Denisovans. The purpose of addressing the diseases
associated with these genes is to point out the phenotypes affected by changes in these
genes, in the hopes of shedding light on the aspects of physiology that may have been
slightly modified in Denisovans as compared to modern humans.
This set of genes is selected for the presence of a mutation that is derived in
Denisovans at a locus that is fixed in modern humans. For this purpose, a locus is
considered fixed if no known polymorphism exists at the site, or if a polymorphism is
known to exist but has a frequency of less than one percent in the modern human
population. The mutations in the genes described below do not occur at low-frequency
polymorphism sites unless otherwise noted, which means that the loci of the changes are
not currently known to have direct associations with the diseases discussed below.
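The working definition of a fixed locus given above can be written as a simple rule. The Python sketch below is a hypothetical illustration: the function name is_fixed and the use of None to represent a site with no known polymorphism are choices made here, while the one percent cutoff comes from the definition in the text.

```python
from typing import Optional

FIXED_FREQUENCY_CUTOFF = 0.01  # one percent, as stated in the definition above


def is_fixed(minor_allele_frequency: Optional[float]) -> bool:
    """Apply the definition of a fixed locus used in this section.

    `None` stands for a site with no known polymorphism; otherwise the
    argument is the frequency of the known polymorphism in modern humans.
    """
    if minor_allele_frequency is None:
        return True
    return minor_allele_frequency < FIXED_FREQUENCY_CUTOFF


# Illustrative frequencies only; these are not values from the catalog.
for frequency in (None, 0.004, 0.01, 0.2):
    print(frequency, is_fixed(frequency))
```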
As in previous sections, the genes are listed according to chromosomal location.
L1CAM
L1CAM encodes a protein that functions in the cell adhesion of neurons.
Mutations in L1CAM have been associated with X-linked mental retardation and
hydrocephalus (Vits et al., 1994). The group of heritable diseases linked to L1CAM is
collectively known as CRASH Syndrome. CRASH stands for corpus callosum
hypoplasia, retardation, adducted thumbs, spasticity, and hydrocephalus. Deletions that
result in a shortening of the protein’s extracellular domain lead to the most severe effects
(Yamasaki, Thompson, & Lemmon, 1997). L1CAM is also highly expressed in several
types of cancer, including colon cancer, gliomas of the brain or spinal cord, breast cancer,
and pancreatic cancer. Its overexpression is associated with metastasis (Gavert et al.,
2007; Geismann et al., 2009; H. Zhang et al., 2011).
The point mutation in the Denisovan sequence results in a change from leucine
to valine, both aliphatic, nonpolar, hydrophobic amino acids. This mutation is predicted
to be benign.
PCSK9
The effects of mutations in PCSK9 are more subtle than those in L1CAM. PCSK9
has been associated with hypercholesterolemia (Abifadel et al., 2003). However,
nonsense mutations have also been associated with lower than average levels of
low-density lipoprotein (LDL) (Cohen et al., 2005). According to mouse studies, PCSK9’s
effect on cholesterol levels, via the destruction of LDL receptors in the brain, appears to
take place during embryogenesis. This gene also appears to be involved in the apoptosis
of neurons. Gain-of-function mutations in PCSK9 lead to an overabundance of LDL
cholesterol in the blood, whereas loss-of-function mutations decrease the amount of LDL
in circulation (Rousselet et al., 2011). This effect is due to the fact that during early
development, PCSK9 degrades LDL receptors (Canuel et al., 2013). PCSK9 may
continue to destroy LDL receptors in the liver in adult organisms (Horton, Cohen, &
Hobbs, 2007).
The change in the Denisovan version of PCSK9 is located in the primary
functional domain of this protein, a proprotein convertase subtilisin/kexin domain. The
mutation leads to a change from histidine in humans to leucine in Denisovans, a change
from a polar to a nonpolar amino acid. Despite this, the change is predicted to be benign.
HMCN1
The protein encoded by HMCN1 is an extracellular immunoglobulin. Based on
studies of a C. elegans homolog to HMCN1, this gene most likely plays a role in
anchoring mechanosensory neurons to the epidermis (R. W. Li, Li, & Wang, 2012). A
few rare mutations in HMCN1 are associated with age-related macular degeneration, but
many of the more common mutations in this gene do not appear to have an effect on
tendency toward developing macular degeneration (Fisher et al., 2007). Mutations in this
gene have also been shown to influence susceptibility to postpartum depression
(Friedman, 2009).
The Denisovan-specific allele in this gene is predicted to be deleterious. The
mutation changes amino acid 509 from arginine to tryptophan, a shift from a polar to a
nonpolar residue, and the change is located in an immunoglobulin domain, most likely
affecting the protein’s function.
SETD2
This gene encodes a histone H3K36 trimethyltransferase that is initially recruited
by RNA polymerase II and additionally recruited during the splicing process (De
Almeida et al., 2011). SETD2 interacts with huntingtin, and this interaction may be
mediated by an interaction with p53 (Xie et al., 2008). The pathogenesis of Huntington’s
disease is known to be due to the elongation of a polyglutamine region of the huntingtin
protein, but the mechanism of disease development is not known. The fact that huntingtin
interacts with WW domain-containing proteins such as SETD2 (also known as HYPB)
suggests that SETD2 and related genes may be involved with pathogenesis (Faber et al.,
1998). The disease mutant form of huntingtin binds to WW domains at a higher rate than
normal huntingtin, supporting the argument that SETD2 interaction is involved in the
disease (Passani et al., 2000).
The Denisovan variant in SETD2 is benign or neutral according to two prediction
tools and deleterious according to a third. It converts amino acid 1259 from a glycine in
modern humans to a serine in Denisovans, a shift from nonpolar to polar. The change is
not located within a major domain or motif.
REST
REST is crucial for the regulation of neuronal differentiation. In neuronal tissues,
it activates the expression of numerous genes involved in neural development, and it
suppresses these same genes in non-neuronal tissues. Reduced levels of REST expression
are a consequence of trisomy 21, both during early development and later in life
(Canzonetta et al., 2008). The lower expression of REST in Down syndrome leads to an
overexpression of DYRK1A, which encodes a highly conserved protein kinase believed
to be involved in memory and learning. This DYRK1A overexpression appears to be
responsible for the deregulation of a group of several genes involved in neural
development. Therefore, the interaction of REST and DYRK1A may be responsible for a
significant part of the trisomy 21 phenotype (Lepagnol-Bestel et al., 2009).
The Denisovan-specific allele in REST is not located within a major functional
domain or motif. The mutation changes amino acid 885 from threonine to alanine, a
change from a polar, hydrophilic amino acid to one that is nonpolar and hydrophobic.
The change is predicted to be benign, however, probably due to being located outside the
protein’s primary functional domains. The locus of this change is that of a known SNP,
rs1442591. The dominant modern-human nucleotide variant at this locus is an A. Both
the tiny percentage of modern humans with the minority variant and the Denisovan
sequence have a G at this locus. According to dbSNP, the G allele at this locus is not
currently known to have any clinical significance.
GDNF
GDNF is a neurotrophic factor particularly involved in the development of
peripheral neurons (Trupp et al., 1995). It is also a factor in the differentiation and
survival of dopaminergic neurons in the midbrain (Lin, Doherty, Lile, Bektesh, &
Collins, 1993). GDNF knockout mice lack kidneys and enteric neurons (Sánchez et al.,
1996). Hirschsprung disease, a genetic abnormality of the colon resulting from a lack of
enteric ganglion cells in the colon, is associated with several mutations in GDNF. None
of these mutations are sole causative agents of the disease, but they are part of a complex
of genetic factors that, in combination, can lead to Hirschsprung (Eketjäll & Ibáñez,
2002). GDNF modulates the excitability of dopamine neurons and may assist in synapse
formation and axon branching (Airaksinen & Saarma, 2002). Because of GDNF’s role in
the survival of dopaminergic neurons, in the late 1990s rodent and rhesus monkey trials
were conducted to determine the efficacy of administering the encoded protein for
dopaminergic neuron recovery in Parkinson’s disease. These studies indicated that GDNF
may be a possible treatment in humans (Date, Aoi, Tomita, Collins, & Ohmoto, 1998;
Gash et al., 1996). However, patient improvements in recent human clinical trials were
slow, and GDNF administration as a therapy for Parkinson’s requires further research
(Kordower & Bjorklund, 2013).
The SNC present in the Denisovan GDNF sequence is most likely deleterious. It
is located in a glial cell line-derived neurotrophic factor domain and causes a change in
amino acid 79 from aspartic acid to valine, a change from a polar to a nonpolar,
hydrophobic residue. This change is of interest because it suggests a possible distinction
in the regulation of dopaminergic neurons between Denisovans and modern humans.
NYAP1
The NYAP proteins regulate PI3K signaling, which is a key part of the
organization of neurons into their correct functional units. This signaling pathway is
being studied for its potential role in schizophrenia, as well as in autism and epilepsy
(Mack & Eickholt, 2011). NYAP1 was first characterized in 2011. Its name stands for
Neuronal tYrosine-phosphorylated Adaptor for the PI-3 kinase, and it is crucial for brain
development and neurite elongation (Yokoyama et al., 2011).
The Denisovan-specific allele in this gene is predicted to be neutral in its effect.
Use of motif and domain prediction tools did not identify any predicted domains. This
mutation changes a proline in modern humans to a leucine in Denisovans, replacing a
cyclic, nonpolar residue with one that is aliphatic, nonpolar, and hydrophobic.
CHAT
CHAT stands for choline O-acetyltransferase. It encodes the enzyme that
synthesizes the neurotransmitter acetylcholine. CHAT mutations are implicated in
Alzheimer’s disease (Oda, 1999). Mutations that lead to a reduction in levels of choline
O-acetyltransferase cause motor disorders (Cai et al., 2004).
The CHAT SNC unique to Denisovans is judged to be damaging. It is located
within a choline/carnitine o-acyltransferase functional domain. A neutral glutamine in
humans is replaced by a basic arginine in Denisovans.
NAV2
NAV2 is expressed in the brain during development and in the kidney in the adult
(Maes, Barceló, & Buesa, 2002). This gene’s product functions in the elongation of
neurons, neurite growth, and the development of cranial nerves. It also regulates blood
pressure in adult mammals. Mutations resulting in reduced protein expression produce a
phenotype that includes malformation of the glossopharyngeal and vagus nerves and
lower-than-normal nerve density in general (McNeill, Roos, Moechars, & Clagett-Dame,
2010).
The Denisovan-specific mutation in NAV2 is located in a calponin homology
domain. Calponin homology domains bind actin and are often involved in signaling
(Castresana & Saraste, 1995). The change may be deleterious. Amino acid 151 in human
NAV2 is aspartic acid, and in Denisovans 151 is asparagine, a shift from an acidic, polar,
charged amino acid to one that is neutral, polar, and uncharged. This difference in NAV2
provides another indicator of possible ways in which the Denisovan nervous system may
have differed, perhaps subtly, from that of modern humans.
CLN6
The protein encoded by CLN6 is expressed in the endoplasmic reticulum and aids
the function of lysosomes (Mole et al., 2004). Via several possible missense mutations
and deletions, CLN6 is altered in a neurodegenerative disease known as late infantile
neuronal ceroid lipofuscinoses (Sharp et al., 2003). A set of point mutations in this gene
is associated with adult-onset neuronal ceroid lipofuscinoses, or Kufs disease (Arsov et
al., 2011). Mutation of CLN6 is also implicated in a form of progressive,
neurodegenerative epilepsy (Andrade et al., 2012).
The effect of the difference between Denisovans and modern humans is probably
neutral. The SNC results in a shift from a valine at position 133 in humans to an
isoleucine in Denisovans. No major motifs or domains were identified.
HEXA
This gene encodes the α-subunit of the β-hexosaminidase A enzyme. Mutations
lead to low levels of the enzyme, which results in diseases related to problems with
lysosomal storage. Mutations in this gene are also responsible for Tay-Sachs disease,
which also involves deficient expression of β-hexosaminidase (Gort, de Olano,
Macias-Vidal, & Coll, 2012). The diseases associated with low levels of the enzyme are known
as the GM2 gangliosidoses, because without sufficient β-hexosaminidase A, the
glycosphingolipid GM2 ganglioside accumulates in the nervous system (Yamanaka et al.,
1994).
The mutation in this gene, which leads to the substitution of a phenylalanine in
Denisovans for a leucine in modern humans, is located within glycosyl-hydrolase family
20, domain 2. This change from an aliphatic to an aromatic amino acid, both nonpolar and
hydrophobic, is predicted to be benign.
CC2D1A
The protein encoded by this gene is a transcriptional repressor. Mutations in
CC2D1A are associated with a form of autosomal recessive non-syndromic mental
retardation, which is mental retardation defined by low IQ without physical disabilities.
The differences in intellectual function associated with this type of retardation are
believed to be related to abnormalities in neurons and synapses (Basel-Vanagaite et al.,
2006). CC2D1A’s encoded protein represses the expression of the gene for the
serotonin-1A receptor (Rogaeva & Albert, 2007). CC2D1A is expressed at lower-than-normal
levels in patients with depression (Szewczyk et al., 2010).
Perhaps more central to CC2D1A’s role in mental retardation is the fact that it is
crucial to synapse maturation. Abnormalities of the synapse response rate and synaptic
vesicle trafficking of cortical neurons were detected in CC2D1A-knockout mice (Zhao,
Raingo, & Kavalali, 2011).
The difference between Denisovan and modern human CC2D1A is located in a
domain of unknown function, and the SNC is predicted to be possibly damaging by
PolyPhen but is predicted to be non-damaging according to SIFT and Condel. The
mutation results in a change from the human glutamine at amino acid position 156 to a
glutamic acid in Denisovans, a shift from a polar, uncharged to a polar, charged residue.
NEFH
Mutations in the KS phosphorylation motif of NEFH are associated with the
development of amyotrophic lateral sclerosis (ALS) (Figlewicz et al., 1994). The encoded
protein, NF-H, is a neurofilament protein important for cross-linking. An excess
accumulation of neurofilaments, as well as abnormal phosphorylation of the tail region of
the genes encoding neurofilament proteins, is associated with both ALS and Alzheimer’s
disease (Q. Liu et al., 2004).
Two nonsynonymous single-nucleotide differences between modern human and
Denisovan NEFH are noted. The first affects amino acid 314, resulting in a change from
an alanine in modern humans to valine in Denisovans. This change is located within an
intermediate filament protein domain and is predicted to be deleterious. The second
change results in a change from a lysine at position 428 in modern humans to an arginine
in Denisovans. This change, not located within any major motif or domain, is predicted to
be tolerated.
Metabolic Genes
Several genes related to metabolism, including glycolytic enzymes and genes that
include SNPs related to dietary differences in the modern human population, were
investigated in a search for clues as to the possible dietary habits of Denisovans. Most of
the SNPs investigated showed that the Denisovan sequence did not have the modern
human minor variant. In other words, these are all loci at which the Denisovan shared the
ancestral state with a percentage of modern humans.
The SNPs that fall into the above category include rs174570, associated with
dwellers in the tropics, with a minor allele that affects cholesterol levels; rs2269426,
associated with a high-protein, high-fat diet that includes milk, with a minor allele that
affects plasma eosinophil count; rs7395662, associated with foraging subsistence
practices, with a minor allele that affects HDL levels; rs10507380, associated with
pastoral subsistence practices, with a minor allele that affects electrocardiographic traits;
rs17779747 and rs2722425, both associated with populations for whom roots and tubers
are the primary sources of calories, affecting QT interval and fasting glucose
respectively; and rs2237892, associated with a high consumption of grains (Hancock et
al., 2010). This confirms that, as expected, Denisovans did not have some of the
variations known to have arisen in recent millennia in conjunction with the domestication
of milk-producing animals and the farming of cereal crops.
At one SNP investigated for its association with a pastoral lifestyle, rs9642880,
which confers a susceptibility to bladder cancer (Hancock et al., 2010), the Denisovan
sequence was found to have the minor allele.
One other SNP, rs4751995 in pancreatic lipase-related protein 2 (PLRP2), is
worthy of special mention here. A frequent human variant at this locus is believed to be
associated with adaptation to a cereal-based diet (Hancock et al., 2010). In this case,
Denisovans do have the minor allele, a G at locus 10:118397884, rather than the major
modern human allele, an A. However, in this case, the minor modern variant is the
ancestral state, so this is consistent with the hypothesis that the Denisovan sequence does
not show hallmarks of any particular adaptations to a starchy diet.
The genes related to glycolysis were investigated for changes that involved a
derived Denisovan allele at a locus that is fixed in modern humans. In other words, the
search was for respects in which Denisovan glycolysis may have evolved along its own
lines and differed not only from that of modern humans but also from that of most other
extant primates. Four glycolysis genes were found to differ in this respect, as shown in
Table 4.
Gene name                                 Locus of        Modern human/   CDS         Modern human/   Protein
                                          nonsynonymous   Denisovan       position    Denisovan       position
                                          SNC             nucleotide      of change   amino acid      of change
H6PD, hexose-6-phosphate dehydrogenase    1:9324159       C/T             1607        P/L             536
  (glucose 1-dehydrogenase)
PHGDH, phosphoglycerate dehydrogenase     1:120266007     A/G             299         N/S             100
PFKM, phosphofructokinase, muscle         12:48501965     G/C             193         V/L             65
PGAM5, phosphoglycerate mutase family     12:133291580    G/T             328         V/L             110
  member 5
Table 4. SNCs in Genes Encoding Glycolytic Enzymes.
In H6PD, the change is located in a random coil within a glucose-6-phosphate
dehydrogenase domain. In PHGDH, the change is also located in a random coil, within a
D-3-phosphoglycerate dehydrogenase domain. In PFKM,
the change is located in an extended strand outside of the main phosphofructokinase
domain. The change in PGAM5 is located in a random coil, in a phosphoglycerate mutase
domain.
Further study is needed to determine the significance of the changes found in
these enzymes.
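The CDS and protein positions reported in Table 4 are related by simple codon arithmetic: a 1-based CDS position p falls in codon ceil(p / 3). The following Python sketch is an illustration rather than part of the analysis pipeline used in this thesis. It checks the four position pairs given in Table 4, and the function names are hypothetical.

```python
def codon_number(cds_position: int) -> int:
    """Return the 1-based codon (amino acid) number for a 1-based CDS position."""
    if cds_position < 1:
        raise ValueError("CDS positions are 1-based")
    # Integer form of ceil(cds_position / 3).
    return (cds_position + 2) // 3


def position_in_codon(cds_position: int) -> int:
    """Return 1, 2, or 3: which base of its codon the CDS position occupies."""
    if cds_position < 1:
        raise ValueError("CDS positions are 1-based")
    return (cds_position - 1) % 3 + 1


# (CDS position, reported protein position) pairs taken from Table 4.
TABLE4_POSITIONS = [(1607, 536), (299, 100), (193, 65), (328, 110)]

for cds_pos, protein_pos in TABLE4_POSITIONS:
    consistent = codon_number(cds_pos) == protein_pos
    print(cds_pos, protein_pos, position_in_codon(cds_pos), consistent)
```

Under this numbering convention, all four pairs in Table 4 are internally consistent.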
Metabolic Genes with Known SNPs in Modern Humans
A small selection of genes was taken for the purpose of determining whether
known modern variants that have been the focus of existing research are present in the
Denisovan genome. This investigation, even for this limited set of genes, is preliminary.
Since these genes are known to be polymorphic in the modern human population, they
may well have been polymorphic in the Denisovan population also and will require
further investigation once more Denisovan individuals have been sequenced.
Hemochromatosis is a genetic disease that results in excess accumulation of iron
in multiple tissues. It is caused by a mutation in the HFE gene, also known as HLA-H,
found on chromosome 6 (Feder et al., 1997). A likely candidate mutation is the
nonsynonymous 845G→A (C282Y) mutation in HFE (Beutler, Felitti, Koziol, Ho, & Gelbart,
2002). The Denisovan sequence lacks this mutation. The juvenile-onset form of
hemochromatosis results from mutations in another gene, HFE2, which is found on
chromosome 1 (Papanikolaou et al., 2003). The Denisovan sequence for HFE2 shows no
variations from the majority human sequence, indicating that the sequenced Denisovan
individual did not carry the mutations most commonly associated with hemochromatosis.
CPT1A encodes carnitine palmitoyltransferase IA, which regulates the
metabolism of fatty acids. Among the Inuit people of Greenland and Canada, a
loss-of-function mutation, p.P479L, is associated with higher-than-average levels of plasma HDL
and may be protective against atherosclerosis or other cardiovascular disease (Rajakumar
et al., 2009). This gene was investigated for the possibility that the Denisovans might
have shared this mutation, which is a candidate for adaptation to a cold climate and
high-protein, high-fat diet. However, this common Inuit variant was not found in the
Denisovan sequence.
TCF7L2, which encodes transcription factor 7-like 2, was investigated for its
association with type 2 diabetes. Two polymorphisms in TCF7L2 in particular,
rs12255372 and rs7903146, result in poor glucose tolerance (Florez et al., 2006). The
Denisovan sequence was found to have the major human variant at rs12255372,
indicating that, in this individual at least, that risk factor for type 2 diabetes was not
present. This sequence does have the risk allele at rs7903146, however, indicating that at
least one risk allele for poor glucose tolerance was present in the Denisovans.
In modern humans, the minor variant of SNP rs9939609, in the first intron of the
FTO gene, is associated with an increased risk of obesity. In this case the derived state, a
T at locus 16:53820527, is the major modern human variant. The minor allele, an A at
this locus, is the ancestral state and is associated with a greater accumulation of fat mass
(Frayling et al., 2007). The Denisovan sequence might be expected to match the ancestral
state at this locus, given the fact that its evolution ceased before the development of
modern methods of obtaining surplus calories. This is in fact the case. The Denisovan
sequence has the ancestral A at the locus of rs9939609, indicating that the Denisovans
shared a tendency toward higher body mass with a significant minority of modern
humans. Interestingly, the Denisovan sequence also differs from the modern human
reference sequence at a different, non-polymorphic site, 16:53738106, where the
Denisovans have a G and modern humans, along with other extant primates, have an A.
FTO codes for an enzyme that appears to demethylate a minor type of DNA
lesion, 3-methylthymine. The protein product of this gene is highly expressed in the
brain, and cycles of feeding and fasting affect its abundance. However, the exact
mechanism by which FTO affects body mass is not known (Gerken et al., 2007). The
Denisovan-derived change occurs outside the FTO catalytic domain, and its impact on
protein function is predicted to be neutral.
DISCUSSION
Selective Pressures in Male Reproductive Genes
Sites located in male reproductive genes make up a relatively high percentage of
the catalog of nonsynonymous single-nucleotide changes between modern humans and
Denisovans, at loci that are fixed in modern humans. This is unsurprising since
reproductive genes, and male reproductive genes in particular, are under greater selective
pressures than genes expressed in tissues not related to reproduction. Two of the genes
found to have fixed-locus differences between the two sequences, SEMG2 and
ADAM18, have previously been shown to undergo greater positive selection in species in
which greater post-mating sperm competition is a factor. One of these genes, SEMG2,
was evaluated for comparative degrees of positive selection in modern humans and
Denisovans, as measured by dN/dS. When both the Denisovan and modern human
sequences were compared to the chimpanzee and gorilla sequences, both hominid
lineages were found to be more closely aligned to chimpanzees than to gorillas. This
effect was more pronounced in the Denisovan lineage, indicating greater disparity
between modern humans’ mating practices and the multimale-multifemale mating
practices of chimpanzees than between Denisovans and chimpanzees. This indicates the
possibility that Denisovans may have tended to have a greater degree of post-copulatory
selective pressure on sperm, due to females mating with more males per periovulatory
period.
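For readers unfamiliar with the statistic, dN/dS compares the rate of nonsynonymous substitution per nonsynonymous site with the rate of synonymous substitution per synonymous site; values above 1 are conventionally read as evidence of positive selection. The sketch below shows one common way to form the ratio from counts, using a Jukes-Cantor correction for multiple substitutions. It is an illustration under stated assumptions, not the method used in this thesis: the function names are hypothetical, and the counts in the usage lines are invented solely to show the arithmetic.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differing sites for multiple hits."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Return dN/dS from counts of differences and of available sites."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        raise ZeroDivisionError("dS is zero, so the ratio is undefined")
    return dn / ds


# Invented counts, for illustration only (not SEMG2 data).
neutral_like = dn_ds(10, 1000, 5, 500)   # equal proportions of change
elevated = dn_ds(20, 1000, 5, 500)       # more nonsynonymous change
print(neutral_like, elevated)
```

With short sequences and few differences, as is typical for a single gene compared between closely related lineages, the ratio is sensitive to small changes in the counts, which is one reason the finding above is described as tentative.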
Human females and, presumably, sister lineages such as the Denisovans and
Neanderthals, tend to average just over one male partner per periovulatory period. The
tentative finding of this research suggests that, solely in terms of mating patterns,
Denisovans may have been slightly more chimpanzee-like than modern humans—that is,
in comparison to modern humans they may have tended more toward matings between
one female and multiple males in the same fertility cycle. This might also be expressed
conversely, in the statement that in comparison to Denisovans, modern humans may have
slightly more tendency toward polygyny, in which a single male has access to multiple
females who rarely mate with multiple males during a single fertility cycle. Useful future
research on this topic would include an analysis of Denisovan ADAM2, ADAM18, and
ADAM23.
Neuronal Genes
Changes in neuron-related genes represent almost two percent of the catalog of
Denisovan-specific changes at fixed loci. This frequency warrants investigation. The
nature of the changes in neuronal genes, overall, is not drastic, nor are most of the
changes associated with known polymorphisms in modern humans. Instead, the nature of
the sub-catalog of fixed derived changes suggests that Denisovans had subtle
neurological differences from modern humans, not dramatic ones. This finding calls for
further research on the specific consequences of the changes noted, particularly those
predicted to be damaging and located within primary functional domains of the encoded
proteins.
Several of the neuronal changes noted are of particular interest. One of these is
the change in a glial cell line-derived neurotrophic factor domain of GDNF, which is
involved in the regulation of dopaminergic neurons. Another is the change in NYAP1,
which, although not currently predicted to be damaging, suggests the possibility of some
effect on brain function due to this gene’s possible role in epilepsy, autism, and
schizophrenia. Yet another gene that may be the site of some of the features that made
Denisovans unique is NAV2, which includes two Denisovan-derived changes at fixed
loci and is involved in regulating nerve density as well as the formation of the
glossopharyngeal and vagus nerves. Another of the neuronal genes studied, CC2D1A, is
intriguing for its role in mental retardation and its function in proper synapse formation.
Future study of the Denisovan-specific mutations in neuronal genes may include
phylogenetic analysis of these genes and laboratory study of these specific mutations.
Premature Stops and Loss-of-Stop Mutations
Not all premature stops in the Denisovan genome have been described here,
because this investigation of premature stops is limited to those that occur in genes with a
CCDS and are due to mutations at loci that are fixed in the modern human
population. The majority of the premature stops discovered in this research are found in
genes whose functions are most likely also performed by other genes, and therefore the
impact of these loss-of-function mutations may not have been extensive. The sole
exception is NADK, which is essential in modern humans and has been described as the
only gene performing its function, which is to phosphorylate NAD(+). However, a
mitochondrial gene that phosphorylates NAD(+), MNADK, has recently been described.
Other genes may also exist that perform this function but have not yet been discovered.
Investigating the implications of the inactivation of these genes presents a
significant opportunity for future studies.
Only three loss-of-stop mutations are found in the Denisovan sequence at fixed
loci. These present the possibility of differences in function in PRR15, a gene associated
with Alzheimer’s disease; ZNF804B, a gene that may be associated with anorexia
nervosa; and KTN1, a gene associated with muscular dystrophy. The precise
consequences of these loss-of-stop mutations are unknown, but the mutation in PRR15
adds to the evidence for some differences in neuronal function. Like several of the genes
described in the neuronal gene section, PRR15 mutations are related to neuronal function
later in life. This suggests the need for future studies on the possibility that Denisovan
neurons aged differently from those of modern humans. One tentative hypothesis is
that Denisovans may, on average, have been more susceptible than modern humans to
age-related neurological degeneration, including conditions similar to Alzheimer’s and
Parkinson’s disease. The converse hypothesis is also possible: these
changes could have been protective against damage due to aging. However, this is less
likely, given the fact that most of the changes are predicted to have deleterious effects.
Future research could include single-gene knockout studies on the genes inactivated in
the Denisovan sequence.
Metabolic Genes
Several SNPs known to be associated with adaptations to foods from cultivated or
domesticated sources, primarily starches, cereals, and milk, were investigated. In all cases
except one, the Denisovan sequence shares the ancestral allele and, as expected, lacks the
allele associated with evolutionarily recent dietary adaptations. The finding in the
Denisovan sequence of a minor allele associated with a pastoral lifestyle does not offer
any further information about Denisovan subsistence practices, because the minor variant
of this SNP is associated with bladder cancer susceptibility rather than having a direct
metabolic effect. While it is no surprise to learn that Denisovans most likely did not eat
grains, the result is informative nonetheless, because the sequence also lacks variants
associated with the frequent consumption of roots and tubers, which may conceivably
have been part of a wild-foods diet among hominids long before the advent of
horticulture or agriculture. Although we may never be able to reconstruct exactly what
the average Denisovan consumed in a typical day, we may hypothesize that he or she
probably did not obtain a large percentage of calories from high-starch foods.
Unique Denisovan variants at fixed modern human loci were found in four of the
genes that code for enzymes involved in glycolysis. These changes may provide the
foundation for future research.
Additional genes investigated that are related to metabolism include HFE and
HFE2, CPT1A, and TCF7L2. At most of the SNP loci in these genes, the Denisovan
sequence shares the major human variant, indicating no evidence of hemochromatosis
and no evidence of cold-climate adaptation resulting in elevation of plasma HDL. The
one exception is that the Denisovan sequence has the risk allele at one of the two SNPs in
transcription factor 7-like 2 that confer diabetes risk. FTO was also investigated, and the
Denisovan sequence was found to share the ancestral human allele at rs9939609, as well
as having a derived variant not located within the FTO catalytic domain and predicted to
be neutral in effect.
Future research on metabolic genes may include study of additional diet-related
SNPs, a search for additional risk alleles for diabetes, and analysis of the derived exonic
SNC in FTO.
REFERENCES
Abi-Rached, L., Jobin, M. J., Kulkarni, S., McWhinnie, A., Dalva, K., Gragert, L., . . .
Plummer, F. A. (2011). The shaping of modern human immune systems by
multiregional admixture with archaic humans. Science, 334(6052), 89-94.
Abifadel, M., Varret, M., Rabès, J., Allard, D., Ouguerram, K., Devillers, M., . . . Erlich,
D. (2003). Mutations in PCSK9 cause autosomal dominant hypercholesterolemia.
Nature Genetics, 34(2), 154-156.
Agledal, L., Niere, M., & Ziegler, M. (2010). The phosphate makes a difference: cellular
functions of NADP. Redox Report, 15(1), 2-10.
Agoni, L., Golden, A., Guha, C., & Lenz, J. (2012). Neandertal and Denisovan
retroviruses. Current Biology, 22(11), R437-R438.
Airaksinen, M. S, & Saarma, M. (2002). The GDNF family: signalling, biological
functions and therapeutic value. Nature Reviews Neuroscience, 3(5), 383-394.
Andrade, D. M., Paton, T., Turnbull, J., Marshall, C. R., Scherer, S. W., & Minassian, B.
A. (2012). Mutation of the CLN6 gene in teenage-onset progressive myoclonus
epilepsy. Pediatric Neurology, 47(3), 205-208. doi:
10.1016/j.pediatrneurol.2012.05.004
Aragones, G., Garcia-Heredia, A., Guardiola, M., Rull, A., Beltran-Debon, R.,
Marsillach, J., . . . Camps, J. (2012). Serum paraoxonase-3 concentration in
HIV-infected patients. Evidence for a protective role against oxidation. Journal of
Lipid Research, 53(1), 168-174. doi: 10.1194/jlr.P018457
Arsov, T., Smith, K. R, Damiano, J., Franceschetti, S., Canafoglia, L., Bromhead, C. J, . .
. Rajagopalan, S. (2011). Kufs Disease, the major adult form of neuronal ceroid
lipofuscinosis, caused by mutations in CLN6. The American Journal of
Human Genetics, 88(5), 566-573.
Aurino, S., Piluso, G., Saccone, V., Cacciottolo, M., D'Amico, F., Dionisi, M., . . . Nigro,
V. (2008). Candidate-gene testing for orphan limb-girdle muscular dystrophies.
Acta Myologica, 27, 90-97.
Babeto, E., Conceicao, A. L., Valsechi, M. C., Peitl Junior, P., de Campos Zuccari, D. A.,
de Lima, L. G., . . . Rahal, P. (2011). Differentially expressed genes in giant cell
tumor of bone. Virchows Archiv, 458(4), 467-476. doi: 10.1007/s00428-011-1047-4
Basel-Vanagaite, L., Attia, R., Yahav, M., Ferland, R. J, Anteki, L., Walsh, C. A, . . .
Taub, E. (2006). The CC2D1A, a member of a new gene family with C2 domains,
is involved in autosomal recessive non-syndromic mental retardation. Journal of
Medical Genetics, 43(3), 203-210.
Berger, A., Benveniste, P., Corfe, S. A, Tran, A. H, Barbara, M., Wakeham, A., . . .
Paige, C. J. (2010). Targeted deletion of the tachykinin 4 gene (TAC4−/−)
influences the early stages of B lymphocyte development. Blood, 116(19), 3792-3801.
Besecker, J., Cornell, K. A, & Hampikian, G. (2012). Dynamic passivation with BSA
overcomes LTCC mediated inhibition of PCR. Sensors and Actuators B:
Chemical.
Beutler, E., Felitti, V. J, Koziol, J. A, Ho, N. J, & Gelbart, T. (2002). Penetrance of
845G→A (C282Y) HFE hereditary haemochromatosis mutation in the
USA. The Lancet, 359(9302), 211-218.
Burbano, H.A., Green, R.E., Maricic, T., Lalueza-Fox, C., de La Rasilla, M., Rosas, A., .
. . Pääbo, S. (2012). Analysis of human accelerated DNA regions using archaic
hominin genomes. PloS One, 7(3), e32877.
Cai, Y., Cronin, C. N., Engel, A. G., Ohno, K., Hersh, L. B., & Rodgers, D. W. (2004).
Choline acetyltransferase structure reveals distribution of mutations that cause
motor disorders. The EMBO Journal, 23(10), 2047-2058.
Canuel, M., Sun, X., Asselin, M., Paramithiotis, E., Prat, A., & Seidah, N. G. (2013).
Proprotein convertase subtilisin/kexin type 9 (PCSK9) can mediate degradation of
the low density lipoprotein receptor-related protein 1 (LRP-1). PloS One, 8(5),
e64145.
Canzonetta, C., Mulligan, C., Deutsch, S., Ruf, S., O'Doherty, A., Lyle, R., . . . Groet, Jü.
(2008). DYRK1A-dosage imbalance perturbs NRSF/REST levels, deregulating
pluripotency and embryonic stem cell fate in Down syndrome. American Journal
of Human Genetics, 83(3), 388.
Castresana, J., & Saraste, M. (1995). Does Vav bind to F-actin through a CH domain?
FEBS Letters, 374(2), 149-151.
Chalasani, N., Guo, X., Loomba, R., Goodarzi, M. O., Haritunians, T., Kwon, S., . . .
Rotter, J. I. (2010). Genome-wide association study identifies variants associated
with histologic features of nonalcoholic fatty liver disease. Gastroenterology,
139(5), 1567-1576, 1576 e1561-1566. doi: 10.1053/j.gastro.2010.07.057
Cohen, J., Pertsemlidis, A., Kotowski, I. K, Graham, R., Garcia, C. K., & Hobbs, H. H.
(2005). Low LDL cholesterol in individuals of African descent resulting from
frequent nonsense mutations in PCSK9. Nature Genetics, 37(2), 161-165.
Crisci, J. L., Wong, A., Good, J. M., & Jensen, J. D. (2011). On characterizing adaptive
events unique to modern humans. Genome Biology and Evolution, 3, 791.
Date, I., Aoi, M., Tomita, S., Collins, F., & Ohmoto, T. (1998). GDNF administration
induces recovery of the nigrostriatal dopaminergic system both in young and aged
parkinsonian mice. Neuroreport, 9(10), 2365-2369.
De Almeida, S. F., Grosso, A. R., Koch, F., Fenouil, R., Carvalho, S., Andrade, J., . . .
Gut, I. (2011). Splicing enhances recruitment of methyltransferase HYPB/Setd2
and methylation of histone H3 Lys36. Nature Structural & Molecular Biology,
18(9), 977-983.
Dorus, S., Evans, P. D., Wyckoff, G. J., Choi, S. S., & Lahn, B. T. (2004). Rate of
molecular evolution of the seminal protein gene SEMG2 correlates with levels of
female promiscuity. Nature Genetics, 36(12), 1326-1329.
Eketjäll, S., & Ibáñez, C. F. (2002). Functional characterization of mutations in the
GDNF gene of patients with Hirschsprung disease. Human Molecular Genetics,
11(3), 325-329.
Ellegren, H., & Parsch, J. (2007). The evolution of sex-biased genes and sex-biased gene
expression. Nature Reviews Genetics, 8(9), 689-698.
Endicott, P., Ho, S. Y. W., & Stringer, Ch. (2010). Using genetic evidence to evaluate
four palaeoanthropological hypotheses for the timing of Neanderthal and modern
human origins. Journal of Human Evolution, 59(1), 87-95.
Erlich, P. M., Lunetta, K. L., Cupples, L. A., Abraham, C. R., Green, R. C., Baldwin, C.
T., & Farrer, L. A. (2012). Serum paraoxonase activity is associated with variants
in the PON gene cluster and risk of Alzheimer disease. Neurobiology of Aging,
33(5), 1015 e1017-1023. doi: 10.1016/j.neurobiolaging.2010.08.003
Faber, P. W., Barnes, G. T., Srinidhi, J., Chen, J., Gusella, J. F., & MacDonald, M. E.
(1998). Huntingtin interacts with a family of WW domain proteins. Human
Molecular Genetics, 7(9), 1463-1474.
Feder, J. N., Tsuchihashi, Z., Irrinki, A., Lee, V. K., Mapa, F. A., Morikang, E., . . .
Parkkila, S. (1997). The hemochromatosis founder mutation in HLA-H disrupts
β2-microglobulin interaction and cell surface expression. Journal of Biological
Chemistry, 272(22), 14025-14028.
Figlewicz, D. A., Krizus, A., Martinoli, M. G., Meininger, V., Dib, M., Rouleau, G. A., &
Julien, J. (1994). Variants of the heavy neurofilament subunit are associated with
the development of amyotrophic lateral sclerosis. Human Molecular Genetics,
3(10), 1757-1761.
Finn, S., & Civetta, A. (2010). Sexual selection and the molecular evolution of ADAM
proteins. Journal of Molecular Evolution, 71(3), 231-240.
Fisher, S. A., Rivera, A., Fritsche, L. G., Keilhauer, C. N., Lichtner, P., Meitinger, T., . . .
Weber, B. H. F. (2007). Case–control genetic association study of fibulin‐6
(FBLN6 or HMCN1) variants in age‐related macular degeneration (AMD).
Human Mutation, 28(4), 406-413.
Florez, J. C., Jablonski, K. A., Bayley, N., Pollin, T. I., de Bakker, P. I. W., Shuldiner, A.
R., . . . Altshuler, D. (2006). TCF7L2 polymorphisms and progression to diabetes
in the Diabetes Prevention Program. New England Journal of Medicine, 355(3),
241-250.
Frayling, T. M., Timpson, N. J., Weedon, M. N., Zeggini, E., Freathy, R. M., Lindgren,
C. M., . . . Rayner, N. W. (2007). A common variant in the FTO gene is
associated with body mass index and predisposes to childhood and adult obesity.
Science, 316(5826), 889-894.
Friedman, S. H. (2009). Postpartum mood disorders: genetic progress and treatment
paradigms. American Journal of Psychiatry, 166(11), 1201-1204.
Gash, D. M., Zhang, Z., Ovadia, A., Cass, W. A., Yi, A., Simmerman, L., . . . Collins, F.
(1996). Functional recovery in parkinsonian monkeys treated with GDNF. Nature,
380, 252-255.
Gavert, N., Sheffer, M., Raveh, S., Spaderna, S., Shtutman, M., Brabletz, T., . . .
Domany, E. (2007). Expression of L1-CAM and ADAM10 in human colon
cancer cells induces metastasis. Cancer Research, 67(16), 7703-7712.
Geismann, C., Morscheck, M., Koch, D., Bergmann, F., Ungefroren, H., Arlt, A., . . .
Sipos, B. (2009). Up-regulation of L1CAM in pancreatic duct cells is
transforming growth factor β1–and slug-dependent: role in malignant
transformation of pancreatic cancer. Cancer Research, 69(10), 4517-4526.
Gerken, T., Girard, C. A., Tung, Y. L., Webby, C. J., Saudek, V., Hewitson, K. S., . . .
McNeill, L. A. (2007). The obesity-associated FTO gene encodes a
2-oxoglutarate-dependent nucleic acid demethylase. Science, 318(5855), 1469-1472.
Gibbons, A. (2011a). A new view of the birth of Homo sapiens. Science, 331, 392-394.
Gibbons, A. (2011b). Who were the Denisovans? Science, 333, 1084-1087.
Good, J. M., Wiebe, V., Albert, F. W., Burbano, H. A., Kircher, M., Green, R. E, . . .
Fischer, A. (2013). Comparative population genomics of the ejaculate in humans
and the great apes. Molecular Biology and Evolution.
Gort, L., de Olano, N., Macias-Vidal, J., & Coll, M. A. (2012). GM2 gangliosidoses in
Spain: analysis of the HEXA and HEXB genes in 34 Tay-Sachs and 14 Sandhoff
patients. Gene, 506(1), 25-30. doi: 10.1016/j.gene.2012.06.080
Gralle, M., & Pääbo, S. (2011). A comprehensive functional analysis of ancestral human
signal peptides. Molecular Biology and Evolution, 28(1), 25-28.
Grayson, P., & Civetta, A. (2012). Positive selection and the evolution of izumo genes in
mammals. International Journal of Evolutionary Biology, 2012.
Green, R. E., Krause, J., Briggs, A.W., Maricic, T., Stenzel, U., Kircher, M., . . . Fritz, M.
H. Y. (2010). A draft sequence of the Neandertal genome. Science, 328(5979),
710-722.
Halder, D., Park, J. H., Choi, M. R., Chai, J. C., Lee, Y. S., Mandal, C., . . . Chai, Y. G.
(2013). Chronic ethanol exposure increases goosecoid (GSC) expression in
human embryonic carcinoma cell differentiation. Journal of Applied Toxicology.
doi: 10.1002/jat.2832
Hancock, A. M., Witonsky, D. B., Ehler, E., Alkorta-Aranburu, G., Beall, C.,
Gebremedhin, A., . . . Coop, G. (2010). Human adaptations to diet, subsistence,
and ecoregion are due to subtle shifts in allele frequency. Proceedings of the
National Academy of Sciences, 107(Supplement 2), 8924-8930.
Heinzen, E. L., Radtke, R. A., Urban, T. J., Cavalleri, G. L., Depondt, C., Need, A. C., . .
. Catarino, C. B. (2010). Rare deletions at 16p13.11 predispose to a diverse
spectrum of sporadic epilepsy syndromes. The American Journal of Human
Genetics, 86(5), 707-718.
Horton, J. D, Cohen, J. C., & Hobbs, H. H. (2007). Molecular biology of PCSK9: its role
in LDL metabolism. Trends in Biochemical Sciences, 32(2), 71-77.
Hurst, L. D. (2002). The Ka/Ks ratio: diagnosing the form of sequence evolution. Trends
in Genetics: TIG, 18(9), 486.
Ijsselstijn, L., Dekker, L. J., Stingl, C., van der Weiden, M. M., Hofman, A., Kros, J. M.,
. . . Luider, T. M. (2011). Serum levels of pregnancy zone protein are elevated in
presymptomatic Alzheimer's disease. Journal of Proteome Research, 10(11),
4902-4910. doi: 10.1021/pr200270z
Ikegami, K., Horigome, D., Mukai, M., Livnat, I., MacGregor, G. R., & Setou, M.
(2008). TTLL10 is a protein polyglycylase that can modify nucleosome assembly
protein 1. FEBS Letters, 582(7), 1129-1134. doi: 10.1016/j.febslet.2008.02.079
Katzman, S., Kern, A. D., Pollard, K. S., Salama, S. R., & Haussler, D. (2010).
GC-biased evolution near human accelerated regions. PLoS Genetics, 6(5), e1000960.
Kempster, S. L., Belteki, G., Licence, D., Charnock-Jones, D. S., & Smith, G. C. (2012).
Disruption of paraoxonase 3 impairs proliferation and antioxidant defenses in
human A549 cells and causes embryonic lethality in mice. American Journal of
Physiology, Endocrinology and Metabolism, 302(1), E103-107. doi:
10.1152/ajpendo.00357.2011
Koffler, J., Holzinger, D., Sanhueza, G. A., Flechtenmacher, C., Zaoui, K., Lahrmann, B.,
. . . Hess, J. (2012). Submaxillary gland androgen-regulated protein 3A expression
is an unfavorable risk factor for the survival of oropharyngeal squamous cell
carcinoma patients after surgery. European Archives of Oto-Rhino-Laryngology,
1-8.
Kordower, J. H., & Bjorklund, A. (2013). Trophic factor gene therapy for Parkinson's
disease. Movement Disorders, 28(1), 96-109.
Krause, J., Fu, Q., Good, J. M., Viola, B., Shunkov, M. V., Derevianko, A. P., & Pääbo,
S. (2010). The complete mitochondrial DNA genome of an unknown hominin
from southern Siberia. Nature, 464(7290), 894-897.
Kryazhimskiy, S., & Plotkin, J. B. (2008). The population genetics of dN/dS. PLoS
Genetics, 4(12), e1000304.
Lalueza-Fox, C., & Gilbert, M. T. P. (2011). Paleogenomics of archaic hominins. Current
Biology, 21(24), R1002-R1009.
Lepagnol-Bestel, A., Zvara, A., Maussion, G., Quignon, F., Ngimbous, B., Ramoz, N., . .
. Agier, N. (2009). DYRK1A interacts with the REST/NRSF-SWI/SNF chromatin
remodelling complex to deregulate gene clusters involved in the neuronal
phenotypic traits of Down syndrome. Human Molecular Genetics, 18(8), 1405-1414.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., . . . Durbin, R.
(2009). The sequence alignment/map format and SAMtools. Bioinformatics,
25(16), 2078-2079.
Li, R. W., Li, C., & Wang, T. T. Y. (2012). Transcriptomic alterations in human prostate
cancer cell LNCaP tumor xenograft modulated by dietary phenethyl
isothiocyanate. Molecular Carcinogenesis.
Lin, L., Doherty, D. H., Lile, J. D., Bektesh, S., & Collins, F. (1993). GDNF: a glial cell
line-derived neurotrophic factor for midbrain dopaminergic neurons. Science,
260(5111), 1130-1132.
Liu, L., Markus, I., Saghire, H. E., Perera, D. S., King, D. W., & Burcher, E. (2011).
Distinct differences in tachykinin gene expression in ulcerative colitis, Crohn's
disease and diverticular disease: a role for hemokinin-1? Neurogastroenterol
Motil, 23(5), 475-483, e179-480. doi: 10.1111/j.1365-2982.2011.01685.x
Liu, Q., Xie, F., Siedlak, S. L., Nunomura, A., Honda, K., Moreira, P. I., . . . Perry, G.
(2004). Neurofilament proteins in neurodegenerative diseases. Cellular and
Molecular Life Sciences CMLS, 61(24), 3057-3075.
Mack, T. G. A., & Eickholt, B. J. (2011). New WAVEs in neuronal PI3K signalling. The
EMBO Journal, 30(23), 4693-4695.
Maes, T., Barceló, A., & Buesa, C. (2002). Neuron navigator: a human gene family with
homology to unc-53, a cell guidance gene from Caenorhabditis elegans.
Genomics, 80(1), 21-30.
Maricic, T., Günther, V., Georgiev, O., Gehre, S., Ćurlin, M., Schreiweis, C., . . .
Lalueza-Fox, C. (2012). A recent evolutionary change affects a regulatory
element in the human FOXP2 gene. Molecular Biology and Evolution.
Matsumoto, C., Ito, M., Yamada, H., Yamakawa, N., Yoshida, H., Date, A., . . .
Miyauchi, A. (2013). Genes that characterize T3-predominant Graves' thyroid
tissues. European Journal of Endocrinology, 168(2), 137-144.
McIntosh, A. M., Bennett, C., Dickson, D., Anestis, S. F., Watts, D. P., Webster, T. H., . .
. Bradley, B. J. (2012). The apolipoprotein E (APOE) gene appears functionally
monomorphic in chimpanzees (Pan troglodytes). PloS One, 7(10), e47760.
McNeill, E. M., Roos, K. P., Moechars, D., & Clagett-Dame, M. (2010). Nav2 is
necessary for cranial nerve development and blood pressure regulation. Neural
Development, 5(6).
Mendez, F. L., Watkins, J. C., & Hammer, M. F. (2012). Global genetic variation at
OAS1 provides evidence of archaic admixture in Melanesian populations.
Molecular Biology and Evolution, 29(6), 1513-1520.
Meyer, M., Kircher, M., Gansauge, M. T., Li, H., Racimo, F., Mallick, S., . . . de Filippo,
C. (2012). A high-coverage genome sequence from an archaic Denisovan
individual. Science, 338(6104), 222-226.
Mole, S. E., Michaux, G., Codlin, S., Wheeler, R. B., Sharp, J. D., & Cutler, D. F. (2004).
CLN6, which is associated with a lysosomal storage disease, is an endoplasmic
reticulum protein. Experimental Cell Research, 298(2), 399-406.
Mota, N. R., Araujo-Jnr, E. V., Paixão-Côrtes, V. R., Bortolini, M. C., & Bau, C. H. D.
(2012). Linking dopamine neurotransmission and neurogenesis: The evolutionary
history of the NTAD (NCAM1-TTC12-ANKK1-DRD2) gene cluster. Genetics
and Molecular Biology, 35(4), 912-918.
Murdock, D. R., Clark, G. D., Bainbridge, M. N., Newsham, I., Wu, Y., Muzny, D. M., . .
. Ramocki, M. B. (2011). Whole-exome sequencing identifies compound
heterozygous mutations in WDR62 in siblings with recurrent polymicrogyria.
American Journal of Medical Genetics Part A, 155(9), 2071-2077. doi:
10.1002/ajmg.a.34165
Oda, Y. (1999). Choline acetyltransferase: the structure, distribution and pathologic
changes in the central nervous system. Pathology International, 49(11), 921-937.
Ohashi, K., Kawai, S., Koshimizu, M., & Murata, K. (2011). NADPH regulates human
NAD kinase, a NADP(+)-biosynthetic enzyme. Molecular and Cellular
Biochemistry, 355(1-2), 57-64. doi: 10.1007/s11010-011-0838-x
Olah, J., Vincze, O., Virok, D., Simon, D., Bozso, Z., Tokesi, N., . . . Ovadi, J. (2011).
Interactions of pathological hallmark proteins: tubulin polymerization promoting
protein/p25, beta-amyloid, and alpha-synuclein. The Journal of Biological
Chemistry, 286(39), 34088-34100. doi: 10.1074/jbc.M111.243907
Ovchinnikov, I. V., & Kholina, O. I. (2010). Genome digging: insight into the
mitochondrial genome of Homo. PLoS One, 5(12), e14278.
Papanikolaou, G., Samuels, M. E., Ludwig, E. H., MacDonald, M. L. E., Franchini, P. L.,
Dubé, M., . . . Politou, M. (2003). Mutations in HFE2 cause iron overload in
chromosome 1q–linked juvenile hemochromatosis. Nature Genetics, 36(1), 77-82.
Passani, L. A., Bedford, M. T., Faber, P. W., McGinnis, K. M., Sharp, A. H., Gusella, J.
F., . . . MacDonald, M. E. (2000). Huntingtin’s WW domain partners in
Huntington’s disease post-mortem brain fulfill genetic criteria for direct
involvement in Huntington’s disease pathogenesis. Human Molecular Genetics,
9(14), 2175-2182.
Pollak, N., Niere, M., & Ziegler, M. (2007). NAD kinase levels control the NADPH
concentration in human cells. Journal of Biological Chemistry, 282(46), 33562-33571.
Rajakumar, C., Ban, M. R., Cao, H., Young, T. K., Bjerregaard, P., & Hegele, R. A.
(2009). Carnitine palmitoyltransferase IA polymorphism P479L is common in
Greenland Inuit and is associated with elevated plasma apolipoprotein AI. Journal
of Lipid Research, 50(6), 1223-1228.
Reich, D., Green, R. E., Kircher, M., Krause, J., Patterson, N., Durand, E. Y., . . . Johnson,
P. L. F. (2010). Genetic history of an archaic hominin group from Denisova Cave
in Siberia. Nature, 468(7327), 1053-1060.
Reich, D., Patterson, N., Kircher, M., Delfin, F., Nandineni, M. R., Pugach, I., . . .
Phipps, M. E. (2011). Denisova admixture and the first modern human dispersals
into Southeast Asia and Oceania. The American Journal of Human Genetics,
89(4), 516-528.
Rogaeva, A., & Albert, P. R. (2007). The mental retardation gene CC2D1A/Freud-1
encodes a long isoform that binds conserved DNA elements to repress gene
transcription. European Journal of Neuroscience, 26(4), 965-974.
Rohland, N., & Hofreiter, M. (2007). Ancient DNA extraction from bones and teeth.
Nature Protocols, 2(7), 1756-1762.
Rooney, A. P., & Zhang, J. (1999). Rapid evolution of a primate sperm protein:
relaxation of functional constraint or positive Darwinian selection? Molecular
Biology and Evolution, 16(5), 706-710.
Rousselet, E., Marcinkiewicz, J., Kriz, J., Zhou, A., Hatten, M. E., Prat, A., & Seidah, N.
G. (2011). PCSK9 reduces the protein levels of the LDL receptor in mouse brain
during development and after ischemic stroke. Journal of Lipid Research, 52(7),
1383-1391.
Sackton, T. B., Corbett-Detig, R. B., Nagaraju, J., Vaishna, R. L., Arunkumar, K. P., &
Hartl, D. L. (2013). Positive selection drives faster-Z evolution in silkmoths.
arXiv preprint arXiv:1304.7670.
Sánchez, M. P., Silos-Santiago, I., Frisén, J., He, B., Lira, S. A., & Barbacid, M. (1996).
Renal agenesis and the absence of enteric neurons in mice lacking GDNF. Nature,
382, 70-73.
Sauter, D., Vogl, M., & Kirchhoff, F. (2011). Ancient origin of a deletion in human
BST2/Tetherin that confers protection against viral zoonoses. Human Mutation,
32(11), 1243-1245.
Sharp, J. D., Wheeler, R. B., Parker, K. A., Gardiner, R. M., Williams, R. E., & Mole, S.
E. (2003). Spectrum of CLN6 mutations in variant late infantile neuronal ceroid
lipofuscinosis. Human Mutation, 22(1), 35-42.
Shen, Y. C., Tsai, H. M., Cheng, M. C., Hsu, S. H., Chen, S. F., & Chen, C. H. (2012).
Genetic and functional analysis of the gene encoding GAP-43 in schizophrenia.
Schizophrenia Research, 134(2-3), 239-245. doi: 10.1016/j.schres.2011.11.016
Skoglund, P., & Jakobsson, M. (2011). Archaic human ancestry in East Asia.
Proceedings of the National Academy of Sciences, 108(45), 18301-18306.
Stewart, J. R., & Stringer, C. B. (2012). Human evolution out of Africa: The role of
refugia and climate change. Science, 335(6074), 1317-1321.
Suyama, M., Torrents, D., & Bork, P. (2006). PAL2NAL: robust conversion of protein
sequence alignments into the corresponding codon alignments. Nucleic Acids
Research, 34(suppl 2), W609-W612.
Szewczyk, B., Albert, P. R., Rogaeva, A., Fitzgibbon, H., May, W. L., Rajkowska, G., . .
. Kyle, P. B. (2010). Decreased expression of Freud-1/CC2D1A, a transcriptional
repressor of the 5-HT1A receptor, in the prefrontal cortex of subjects with major
depression. The International Journal of Neuropsychopharmacology, 13(8), 1089-1101.
Tong, Y., Tar, M., Melman, A., & Davies, K. (2008). The opiorphin gene (ProL1) and its
homologues function in erectile physiology. BJU International, 102(6), 736-740.
Torgerson, D. G., Kulathinal, R. J., & Singh, R. S. (2002). Mammalian sperm proteins are
rapidly evolving: evidence of positive selection in functionally diverse genes.
Molecular Biology and Evolution, 19(11), 1973-1980.
Trinkaus, E. (2010). Denisova Cave, Peştera cu Oase, and Human Divergence in the Late
Pleistocene. PaleoAnthropology, 2010, 196-200.
Trupp, M., Rydén, M., Jörnvall, H., Funakoshi, H., Timmusk, T., Arenas, E., & Ibáñez,
C. F. (1995). Peripheral expression and biological activities of GDNF, a new
neurotrophic factor for avian and mammalian peripheral neurons. The Journal of
Cell Biology, 130(1), 137-148.
Vickaryous, M. K., & Hall, B. K. (2006). Human cell type diversity, evolution,
development, and classification with special reference to cells derived from the
neural crest. Biological Reviews, 81(3), 425-455.
Vits, L., Van Camp, G., Coucke, P., Fransen, E., De Boulle, K., Reyniers, E., . . .
Schrander-Stumpel, C. (1994). MASA syndrome is due to mutations in the neural
cell adhesion gene L1CAM. Nature Genetics, 7(3), 408-413.
Wang, K., Zhang, H., Bloss, C. S., Duvvuri, V., Kaye, W., Schork, N. J., . . . Hakonarson,
H. (2010). A genome-wide association study on common SNPs and rare CNVs in
anorexia nervosa. Molecular Psychiatry, 16(9), 949-959.
Wang, X., Mitra, N., Secundino, I., Banda, K., Cruz, P., Padler-Karavani, V., . . . Rizzi,
E. (2012). Specific inactivation of two immunomodulatory SIGLEC genes during
human evolution. Proceedings of the National Academy of Sciences, 109(25),
9935-9940.
Wisner, A., Dufour, E., Messaoudi, M., Nejdi, A., Marcel, A., Ungeheuer, M. N., &
Rougeot, C. (2006). Human Opiorphin, a natural antinociceptive modulator of
opioid-dependent pathways. Proceedings of the National Academy of Sciences,
103(47), 17979-17984. doi: 10.1073/pnas.0605865103
Wong, A. (2010). Testing the effects of mating system variation on rates of molecular
evolution in primates. Evolution, 64(9), 2779-2785.
Xie, P., Tian, C., An, L., Nie, J., Lu, K., Xing, G., . . . He, F. (2008). Histone
methyltransferase protein SETD2 interacts with p53 and selectively regulates its
downstream genes. Cellular Signalling, 20(9), 1671-1678.
Yamanaka, S., Johnson, M. D., Grinberg, A., Westphal, H., Crawley, J. N., Taniike, M., . .
. Proia, R. L. (1994). Targeted disruption of the Hexa gene results in mice with
biochemical and pathologic features of Tay-Sachs disease. Proceedings of the
National Academy of Sciences, 91(21), 9975-9979.
Yamasaki, M., Thompson, P., & Lemmon, V. (1997). CRASH syndrome: mutations in
L1CAM correlate with severity of the disease. Neuropediatrics, 28(3), 175.
Yang, D. Y., Eng, B., Waye, J. S., Dudar, J. C., & Saunders, S. R. (1998). Technical note:
improved DNA extraction from ancient bones using silica-based spin columns.
American Journal of Physical Anthropology, 105(4), 539-543.
Yang, Z., & Nielsen, R. (2000). Estimating synonymous and nonsynonymous substitution
rates under realistic evolutionary models. Molecular Biology and Evolution,
17(1), 32-43.
Yokoyama, K., Tezuka, T., Kotani, M., Nakazawa, T., Hoshina, N., Shimoda, Y., . . .
Iwakura, Y. (2011). NYAP: a phosphoprotein family that links PI3K to WAVE1
signalling in neurons. The EMBO Journal, 30(23), 4739-4754.
Zhang, G., Pei, Z., Ball, E. V., Mort, M., Kehrer-Sawatzki, H., & Cooper, D. N. (2011).
Cross-comparison of the genome sequences from human, chimpanzee,
Neanderthal and a Denisovan hominin identifies novel potentially compensated
mutations. Human Genomics, 5(5), 453-484.
Zhang, H., Wong, C. C. L., Wei, H., Gilkes, D. M., Korangath, P., Chaturvedi, P., . . .
Winnard, P. T. (2011). HIF-1-dependent expression of angiopoietin-like 4 and
L1CAM mediates vascular metastasis of hypoxic breast cancer cells to the lungs.
Oncogene, 31(14), 1757-1770.
Zhang, J., Liu, L., Zhang, X., Jin, F., Chen, J., Ji, C., . . . Mao, Y. (2006). Cloning and
characterization of a novel human prefoldin and SPEC domain protein gene
(PFD6L) from the fetal brain. Biochemical Genetics, 44(1-2), 69-74. doi:
10.1007/s10528-006-9008-3
Zhang, R. (2013). MNADK, a novel liver-enriched mitochondrion-localized NAD kinase.
Biology Open, 2(4), 432-438.
Zhao, M., Raingo, J., & Kavalali, E. T. (2011). Cc2d1a, a C2 domain containing protein
linked to nonsyndromic mental retardation, controls functional maturation of
central synapses. Journal of Neurophysiology, 105(4), 1506-1515.
APPENDIX A: MALE REPRODUCTIVE GENES
Gene Name and
Short
Description
ZNF645 zinc
finger protein 645
CYLC1 cylicin,
basic protein of
sperm head
cytoskeleton 1
CYLC1 cylicin,
basic protein of
sperm head
cytoskeleton 1
TSSK3 testis-specific serine
kinase 3
SPATA6
spermatogenesis
associated 6
33
I, aliphatic,
nonpolar
Denisovan
Amino Acid
Variant and
Biochemical
Characteristics
N, neutral,
polar
(uncharged)
252
G, aliphatic,
nonpolar,
ambivalent
R, basic, polar
(positively
charged)
235
E, acidic,
polar, charged
S, nonaromatic
hydroxyl, polar
(uncharged),
hydrophilic
R, basic, polar
(positively
charged)
D, acidic,
polar, charged
T, nonaromatic
hydroxyl, polar
uncharged,
hydrophilic
Q, neutral,
polar
(uncharged)
248
V, aliphatic,
nonpolar,
hydrophobic
568
V, aliphatic,
nonpolar,
hydrophobic
Protein
Position
491
267
ODF2L outer
dense fiber of
sperm tails 2-like
SPAG16 sperm-associated
antigen 16
PHF7 PHD finger
protein 7
359
MORC1 MORC
family CW-type
zinc finger 1
SPATA16
spermatogenesis
associated 16
STPG2 sperm-tail
PG-rich repeat
containing 2
SPATA5
spermatogenesis
associated 5
[Source:HGNC
Symbol;Acc:1811
9]
927
Human
Amino Acid
Variant and
Biochemical
Characteristics
R, basic, polar
(positively
charged)
L, aliphatic,
nonpolar,
hydrophobic
292
V, aliphatic,
nonpolar,
hydrophobic
R, basic, polar
(positively
charged)
264
A, aliphatic,
nonpolar,
hydrophobic,
ambivalent
722
C, sulfur-containing,
polar
(uncharged)
423
SPEF2 sperm
flagellar 2
D, acidic,
polar, charged
M, sulfur-containing,
nonpolar,
hydrophobic
C, sulfur-containing,
polar
(uncharged)
V, aliphatic,
nonpolar,
hydrophobic
M, sulfur-containing,
nonpolar,
hydrophobic
Q, neutral,
polar
(uncharged)
V, aliphatic,
nonpolar,
hydrophobic,
interior
S, nonaromatic
hydroxyl, polar
(uncharged),
hydrophilic
SIFT
Prediction
of
Functional
Effect
PolyPhen
Prediction
of
Functional
Effect
Condel
Prediction
of
Functional
Effect
deleterious
possibly
damaging
deleterious
Secondary
Structure of
Locus
Change not
located within
major motif.
unknown
Changes not
located within
major motifs.
unknown
Changes not
located within
major motifs.
deleterious
benign
neutral
Change located in
a protein kinase-like domain.
tolerated
possibly
damaging
neutral
deleterious
possibly
damaging
deleterious
No domains/motifs
identified.
Change located in
a domain that
shows homology to
INCA. Locus
moderately
conserved.
deleterious
probably
damaging
deleterious
Change located in
a WD40 repeat
region.
tolerated
tolerated
neutral
benign
tolerated
deleterious
probably
damaging
deleterious
deleterious
benign
neutral
deleterious
probably
damaging
deleterious
tolerated
benign
neutral
deleterious
benign
neutral
Change not
located within
major motif.
Zinc finger CW-type coiled-coil
domain.
No domains/motifs
identified.
Change not
located within
major motif.
Change is not
found within a
major motif. Entire
protein is an AAA-family ATPase.
Change located in a
P-loop containing
nucleoside
triphosphate
hydrolase domain.
SPATA9
spermatogenesis
associated 9
CATSPER3
cation channel,
sperm associated
3
179
I, aliphatic,
nonpolar
394
K, basic, polar
(positively
charged)
ZNF165 zinc
finger protein 165
61
NME8
NME/NM23 family
member 8
SPAM1 sperm
adhesion
molecule 1 (PH-20 hyaluronidase,
zona pellucida
binding)
336
R, basic, polar
(positively
charged)
105
I, aliphatic,
nonpolar
465
N, neutral,
polar
(uncharged)
ADAM18 ADAM
metallopeptidase
domain 18
SPATC1
spermatogenesis
and centriole
associated 1
ODF2 outer
dense fiber of
sperm tails 2
CATSPER1
cation channel,
sperm associated
1
MTL5
metallothionein-like 5, testis-specific (tesmin)
AKAP3 A kinase
(PRKA) anchor
protein 3
724
P, cyclic,
nonpolar
R, basic, polar
(positively
charged)
371
R, basic, polar
(positively
charged)
75
27
P, cyclic,
nonpolar
87
N, neutral,
polar
(uncharged)
CCDC65 coiled-coil domain
containing 65
473
AKAP11 A kinase
(PRKA) anchor
protein 11
AKAP11 A kinase
(PRKA) anchor
protein 11
REC8 REC8
homolog (yeast)
P, cyclic,
nonpolar
903
1420
546
I, aliphatic,
nonpolar
R, basic, polar
(positively
charged)
E, acidic,
polar, charged
H, basic,
polar,
positively
charged
V, aliphatic,
nonpolar,
hydrophobic
T, nonaromatic
hydroxyl, polar
uncharged,
hydrophilic
S, nonaromatic
hydroxyl, polar
(uncharged),
hydrophilic
C, sulfur-containing,
polar
(uncharged)
V, aliphatic,
nonpolar,
hydrophobic
S, nonaromatic
hydroxyl, polar
(uncharged),
hydrophilic
H, basic, polar,
positively
charged
K, basic, polar
(positively
charged)
C, sulfur-containing,
polar
(uncharged)
S, nonaromatic
hydroxyl, polar
(uncharged),
hydrophilic
T, nonaromatic
hydroxyl, polar
uncharged,
hydrophilic
T, nonaromatic
hydroxyl, polar
uncharged,
hydrophilic
H, basic, polar,
positively
charged
K, basic, polar
(positively
charged)
Q, neutral,
polar
(uncharged)
deleterious
benign
neutral
No domains/motifs
identified.
deleterious
benign
deleterious
tolerated
benign
neutral
tolerated
possibly
damaging
deleterious
tolerated
benign
neutral
tolerated
benign
neutral
Change is found in
a voltage-gated
cation channel
domain.
Change located in
a SCAN domain.
Locus of change is
not very
conserved.
Change located in
a nucleoside
diphosphate kinase
domain.
Change located in
an aldolase-type
TIM barrel. The
domain belongs to
the glycoside
hydrolase
superfamily.
Change located in
a disintegrin
domain. The
location is slightly
conserved.
deleterious
probably
damaging
deleterious
No domains/motifs
identified.
tolerated
probably
damaging
neutral
deleterious
benign
neutral
No domains/motifs
identified.
Change is found in
a voltage-gated
cation channel
domain.
tolerated
benign
neutral
Change not
located within
major motif.
tolerated
probably
damaging
neutral
Entire protein is an
A-kinase anchor.
tolerated
benign
neutral
No domains/motifs
identified.
tolerated
benign
neutral
Entire protein is an
A-kinase anchor.
tolerated
benign
neutral
Entire protein is an
A-kinase anchor.
deleterious
possibly
damaging
deleterious
Entire protein is an
SCC1/RAD21
family member.
ZP2 zona
pellucida
glycoprotein 2
(sperm receptor)
SPEM1 spermatid
maturation 1
SPATA32
spermatogenesis
associated 32
Q, neutral,
polar
(uncharged)
Q, neutral,
polar
(uncharged)
R, basic, polar
(positively
charged)
tolerated
possibly
damaging
neutral
E, acidic,
polar, charged
tolerated
benign
neutral
Change located in
a zona pellucida
domain. Locus not
very conserved.
Change not
located within
major motif.
D, acidic,
polar, charged
deleterious
benign
neutral
No domains/motifs
identified.
tolerated
probably
damaging
deleterious
287
E, acidic,
polar, charged
T, nonaromatic
hydroxyl, polar
uncharged,
hydrophilic
R, basic, polar
(positively
charged)
deleterious
probably
damaging
deleterious
345
R, basic, polar
(positively
charged)
deleterious
probably
damaging
deleterious
140
P, cyclic,
nonpolar
W, aromatic,
nonpolar
C, sulfur-containing,
polar
(uncharged)
S, nonaromatic
hydroxyl, polar
(uncharged),
hydrophilic
133
G, aliphatic,
nonpolar,
ambivalent
412
86
383
SPATA20
spermatogenesis
associated 20
716
TEX14 testis
expressed 14
THEG theg
spermatid protein
ODF3L2 outer
dense fiber of
sperm tails 3-like
2
N, neutral,
polar
(uncharged)
tolerated
benign
neutral
D, acidic,
polar, charged
tolerated
probably
damaging
deleterious
F, aromatic,
nonpolar,
hydrophobic
tolerated
benign
neutral
Change located in
a sperm-tail PG-rich repeat domain.
This is a
semenogelin/seminal
vesicle secretory
protein. The locus
of the change is
highly conserved,
and the substituted
amino acid does
not match the
profile for a
semenogelin/seminal
vesicle secretory
protein.
This is a
semenogelin/seminal
vesicle secretory
protein. The locus
of the change is
slightly conserved,
and the substituted
amino acid does
not match the
profile for a
semenogelin/seminal
vesicle secretory
protein.
neutral
Change not
located within
major motif.
neutral
Change not
located within
major motif.
SEMG1
semenogelin I
SEMG2
semenogelin II
298
SPATA2
spermatogenesis
associated 2
WBP2NL WBP2
N-terminal like
25
276
L, aliphatic,
nonpolar,
hydrophobic
S, nonaromatic
hydroxyl, polar
(uncharged),
hydrophilic
S, nonaromatic
hydroxyl, polar
(uncharged),
hydrophilic
Change not
located within
major motif.
Change located in
a protein kinase
catalytic domain.
Change located
within a testicular
haploid expressed
repeat domain.
R, basic, polar
(positively
charged)
L, aliphatic,
nonpolar,
hydrophobic
tolerated
tolerated
benign
benign
APPENDIX B: NEURONAL GENES
Gene Name and
Short
Description
Protein
Position
L1CAM L1 cell
adhesion
molecule
Denisovan
Amino Acid
Variant and
Biochemical
Characteristics
SIFT
Prediction
of
Functional
Effect
412
V, aliphatic,
nonpolar,
hydrophobic
35
A, aliphatic,
nonpolar,
hydrophobic,
ambivalent
G, aliphatic,
nonpolar,
ambivalent
150
R, basic, polar
(positively
charged)
H, basic, polar,
positively
charged
449
H, basic, polar,
positively
charged
L, aliphatic,
nonpolar,
hydrophobic
tolerated
509
R, basic, polar
(positively
charged)
W, aromatic,
nonpolar
554
A, aliphatic,
nonpolar,
hydrophobic,
ambivalent
S, non-aromatic
hydroxyl, polar
(uncharged),
hydrophilic
EPHA10 EPH
receptor A10
PCSK9 proprotein
convertase
subtilisin/kexin
type 9
HMCN1
hemicentin 1
NAV1 neuron
navigator 1
12
SCN3A sodium
channel, voltage-gated, type III,
alpha subunit
1935
I, aliphatic,
nonpolar
V, aliphatic,
nonpolar,
hydrophobic
R, basic, polar
(positively
charged)
V, aliphatic,
nonpolar,
hydrophobic
PolyPhen
Prediction
of
Functional
Effect
Condel
Prediction
of
Functional
Effect
Secondary Structure
of Locus
Change located within
an immunoglobulin I-set
domain.
L, aliphatic,
nonpolar,
hydrophobic
CHD5
chromodomain
helicase DNA
binding protein 5
SLC5A7 solute
carrier family 5
(choline
transporter),
member 7
Human Amino
Acid Variant
and
Biochemical
Characteristics
tolerated
benign
neutral
tolerated
unknown
Change not located
within a major
motif/domain.
tolerated
probably
damaging
neutral
Change located within
an ephrin receptor
ligand binding domain.
benign
neutral
Change located in a
proprotein convertase
subtilisin/kexin
domain.
deleterious
probably
damaging
deleterious
Change located in an
IG-like domain.
deleterious
possibly
damaging
deleterious
Change not located
within a major
motif/domain.
tolerated
tolerated
benign
Change located in a
signal-peptide domain.
Entire protein is a
sodium/solute
symporter.
benign
Entire protein is a
sodium channel
protein type 3 subunit
alpha.
neutral
MAP2
microtubule-associated protein
2
374
A, aliphatic,
nonpolar,
hydrophobic,
ambivalent
V, aliphatic,
nonpolar,
hydrophobic
406
K, basic, polar
(positively
charged)
V, aliphatic,
nonpolar,
hydrophobic
366
I, aliphatic,
nonpolar
V, aliphatic,
nonpolar,
hydrophobic
P, cyclic,
nonpolar
V, aliphatic,
nonpolar,
hydrophobic
1048
S, non-aromatic
hydroxyl, polar
(uncharged),
hydrophilic
V, aliphatic,
nonpolar,
hydrophobic
1259
G, aliphatic,
nonpolar,
ambivalent
V, aliphatic,
nonpolar,
hydrophobic
1090
T, non-aromatic
hydroxyl, polar
uncharged,
hydrophilic
V, aliphatic,
nonpolar,
hydrophobic
1645
E, acidic, polar,
charged
V, aliphatic,
nonpolar,
hydrophobic
27
Q, neutral, polar
(uncharged)
V, aliphatic,
nonpolar,
hydrophobic
MAP2
microtubule-associated protein
2
CNTN6 contactin
6
KCNH8 potassium
voltage-gated
channel,
subfamily H (eag-related), member
8
980
KCNH8 potassium
voltage-gated
channel,
subfamily H (eag-related), member
8
SETD2 SET
domain containing
2
NISCH nischarin
ROBO1
roundabout, axon
guidance
receptor, homolog
1 (Drosophila)
GAP43 growth-associated protein
43
deleterious
deleterious
benign
benign
neutral
Change located in a
MAP2/Tau projection
domain.
neutral
Change located in a
MAP2/Tau projection
domain.
benign
Change located in an
immunoglobulin, C-2
type/IG-like domain.
tolerated
benign
neutral
Change located within
a potassium voltage-gated channel
subfamily H domain.
deleterious
probably
damaging
deleterious
Change not located
within a major
motif/domain.
neutral
Change not located
within a major
motif/domain.
deleterious
Change not located
within a major
motif/domain.
neutral
Change not located
within a major
motif/domain.
tolerated
deleterious
benign
deleterious
possibly
damaging
deleterious
benign
REST RE1-silencing
transcription factor
885
GDNF glial cell
derived
neurotrophic
factor
79
KCNMB1
potassium large
conductance
calcium-activated
channel,
subfamily M, beta
member 1
10
SLC22A3 solute
carrier family 22
(extraneuronal
monoamine
transporter),
member 3
T, non-aromatic
hydroxyl, polar
uncharged,
hydrophilic
D, acidic, polar,
charged
K, basic, polar
(positively
charged)
V, aliphatic,
nonpolar,
hydrophobic
V, aliphatic,
nonpolar,
hydrophobic
V, aliphatic,
nonpolar,
hydrophobic
tolerated
deleterious
tolerated
benign
probably
damaging
benign
neutral
Change not located
within a major
motif/domain.
Change located in a
glial cell line-derived
neurotrophic factor
domain.
deleterious
neutral
Change located within
a signal-peptide
domain. Protein is a
calcium-activated
potassium channel,
beta subunit.
192
V, aliphatic,
nonpolar,
hydrophobic
V, aliphatic,
nonpolar,
hydrophobic
tolerated
probably
damaging
deleterious
Change located within
a transmembrane
region of a major
facilitator superfamily
domain. Also located
within a sugar (and
other) transporter
domain.
296
P, cyclic,
nonpolar
V, aliphatic,
nonpolar,
hydrophobic
tolerated
benign
neutral
No motifs/domains
identified.
neutral
Most of protein,
including change
locus, identified as a
GDNF family receptor
2 domain.
neutral
Change found in a
copper type II
ascorbate-dependent
monooxygenase
domain, and/or a
PHM/PNGase F
domain.
deleterious
Change located within
a choline/carnitine O-acyltransferase
domain.
NYAP1 neuronal
tyrosine-phosphorylated
phosphoinositide-3-kinase adaptor
1
GFRA2 GDNF
family receptor
alpha 2
399
I, aliphatic,
nonpolar
DBH dopamine
beta-hydroxylase
(dopamine beta-monooxygenase)
508
D, acidic, polar,
charged
V, aliphatic,
nonpolar,
hydrophobic
Q, neutral, polar
(uncharged)
V, aliphatic,
nonpolar,
hydrophobic
CHAT choline O-acetyltransferase
158
V, aliphatic,
nonpolar,
hydrophobic
tolerated
tolerated
deleterious
benign
benign
possibly
damaging
PPFIBP2 PTPRF
interacting protein,
binding protein 2
(liprin beta 2)
345
T, non-aromatic
hydroxyl, polar
uncharged,
hydrophilic
V, aliphatic,
nonpolar,
hydrophobic
350
E, acidic, polar,
charged
V, aliphatic,
nonpolar,
hydrophobic
151
D, acidic, polar,
charged
V, aliphatic,
nonpolar,
hydrophobic
1399
A, aliphatic,
nonpolar,
hydrophobic,
ambivalent
V, aliphatic,
nonpolar,
hydrophobic
1389
G, aliphatic,
nonpolar,
ambivalent
V, aliphatic,
nonpolar,
hydrophobic
1329
N, neutral, polar
(uncharged)
V, aliphatic,
nonpolar,
hydrophobic
905
M, sulfur-containing,
nonpolar,
hydrophobic
V, aliphatic,
nonpolar,
hydrophobic
579
C, sulfur-containing,
polar
(uncharged)
V, aliphatic,
nonpolar,
hydrophobic
133
V, aliphatic,
nonpolar,
hydrophobic
V, aliphatic,
nonpolar,
hydrophobic
107
L, aliphatic,
nonpolar,
hydrophobic
V, aliphatic,
nonpolar,
hydrophobic
PPFIBP2 PTPRF
interacting protein,
binding protein 2
(liprin beta 2)
NAV2 neuron
navigator 2
ARHGAP32 Rho
GTPase activating
protein 32
ARHGAP32 Rho
GTPase activating
protein 32
ARHGAP32 Rho
GTPase activating
protein 32
ARHGAP32 Rho
GTPase activating
protein 32
NAV3 neuron
navigator 3
CLN6 ceroid-lipofuscinosis,
neuronal 6, late
infantile, variant
HEXA
hexosaminidase A
(alpha
polypeptide)
tolerated
benign
tolerated
benign
tolerated
probably
damaging
deleterious
possibly
damaging
tolerated
tolerated
deleterious
deleterious
tolerated
tolerated
neutral
Entire protein is a Lar-Interacting protein
(LIP)-related protein.
neutral
Entire protein is a Lar-Interacting protein
(LIP)-related protein.
deleterious
Change located in a
calponin homology
domain.
deleterious
Change not located
within any major
motifs.
benign
Change not located
within any major
motifs.
benign
Change not located
within any major
motifs.
benign
neutral
Change not located
within any major
motifs.
neutral
Change not located
within a major
motif/domain.
neutral
No major
motifs/domains
identified.
benign
benign
benign
Change located in a
glycosyl-hydrolase
family 20, domain 2.
RAI1 retinoic acid
induced 1
762
P, cyclic,
nonpolar
V, aliphatic,
nonpolar,
hydrophobic
156
Q, neutral, polar
(uncharged)
V, aliphatic,
nonpolar,
hydrophobic
CC2D1A coiled-coil and C2
domain containing
1A
CDK5RAP1 CDK5
regulatory subunit
associated protein
1
330
N, neutral, polar
(uncharged)
V, aliphatic,
nonpolar,
hydrophobic
314
A, aliphatic,
nonpolar,
hydrophobic,
ambivalent
V, aliphatic,
nonpolar,
hydrophobic
428
K, basic, polar
(positively
charged)
V, aliphatic,
nonpolar,
hydrophobic
NEFH
neurofilament,
heavy polypeptide
NEFH
neurofilament,
heavy polypeptide
tolerated
benign
tolerated
possibly
damaging
tolerated
deleterious
tolerated
benign
neutral
Change not located
within a major
motif/domain.
neutral
Change located in a
domain of unknown
function.
neutral
Change located within
a (dimethylallyl) tRNA
methylthiotransferase
miaB domain. Region
of locus also identified
as a radical SAM
domain.
unknown
Change located within
an intermediate
filament protein
domain.
unknown
Change not located
within a major
motif/domain.