ESM 2 - Springer Static Content Server

Supplementary information: potential questions for students, with answers
Supplementary information Table 2 describes the published HVR1 sequence data that will be used in
these questions. For ancient DNA samples, the approximate age of the fossil is listed; for extant species,
an estimate of the time to the MRCA with humans is listed. The data will be analyzed in DnaSP [1] and are
available in nexus format in “Supplementary information data 2”. Using any available data, prepare narrative
responses to each of the following (one-page limit for each, including all figures and tables):
Question 1.
How diverse is our class in terms of mtDNA variation?
Hint: one of the easiest measures of DNA diversity to conceptualize is the average number of pairwise
differences between sequences, often abbreviated k. To give you an idea, k within populations often
ranges from over 20 to nearly zero. Most species of animals with large population sizes (not those near
extinction!) are more diverse than humans (this includes all other great apes!). Another measure, which
is independent of sequence length, is the average number of pairwise differences per site (π or Pi),
calculated by dividing k by the length of the sequence.
Answer: In our class data, k was 6.16 and Pi was 0.017.
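Both measures in the hint can be sketched in Python. The sketch assumes the sequences are already aligned to the same length and contain no gaps or ambiguous bases. DnaSP has its own rules for those cases, and this sketch ignores them. The function names and toy sequences are hypothetical and for illustration only.

```python
from itertools import combinations


def pairwise_differences(a: str, b: str) -> int:
    """Number of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def diversity(seqs: list[str]) -> tuple[float, float]:
    """Return (k, pi): mean pairwise differences, and the same value per site."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    k = sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)
    pi = k / len(seqs[0])  # dividing by sequence length gives differences per site
    return k, pi


# Toy alignment (made-up data, not the class sequences).
toy = ["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTAA"]
k, pi = diversity(toy)
print(f"k = {k:.3f}, pi = {pi:.4f}")
```

The three toy sequences form three pairs, which differ at 1, 2 and 1 sites. That gives k = 4/3, and π is that value divided by the 10 aligned sites.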
Question 2.
Based on our data, do you think it is likely that modern humans and Neanderthals are a single species?
Do our data prove that humans and Neanderthals did/did not interbreed? As a group, how divergent are
Neanderthals from modern humans? How does this compare to the maximum level of divergence
within Neanderthals and within modern humans?
Hint: Human and Neanderthal mtDNA are reciprocally monophyletic (i.e., each belongs to its own clade).
Answer: There is no “cut-off” to define species. In our dataset, humans and Neanderthals differed by 0.074
nucleotide substitutions per site (compared with 0.017 within humans and 0.18 between humans and chimps).
Neanderthal mtDNA falls outside the human mtDNA clade and is quite divergent; thus, on mtDNA data alone,
there is no evidence that they interbred. Across the genome, however, humans and Neanderthals are known
to be very similar and would probably fall closer to the subspecies category than to full species; they
were still capable of hybridizing and producing fertile offspring [2]. The lack of evidence for
interbreeding in mtDNA could be either a lineage-sorting phenomenon (Neanderthal mtDNA was lost due to
drift or selection) or the result of the hybridizing females tending to be human (unidirectional gene flow).
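The question compares divergence between groups with the maximum divergence within each group. That comparison can be sketched as follows, again assuming gap-free aligned sequences. The names and toy data are hypothetical. DnaSP's own between-population statistics may apply corrections that this sketch omits.

```python
from itertools import combinations, product


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites that differ (uncorrected)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def mean_between(group_x: list[str], group_y: list[str]) -> float:
    """Average per-site difference over all cross-group pairs (Nei's Dxy, uncorrected)."""
    dists = [p_distance(a, b) for a, b in product(group_x, group_y)]
    return sum(dists) / len(dists)


def max_within(group: list[str]) -> float:
    """Largest per-site difference between any two members of one group."""
    return max(p_distance(a, b) for a, b in combinations(group, 2))


# Made-up toy sequences standing in for the two groups.
humans = ["AAAA", "AAAT"]
neanderthals = ["TTAA", "TTAT"]
print(mean_between(humans, neanderthals), max_within(humans), max_within(neanderthals))
```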
Question 3.
It has been estimated that human mtDNA shares a common ancestor with Neanderthal mtDNA approximately
825,000 years ago [2]. Assuming this to be correct, what is the substitution rate (μ) of HVR1? (What is
the difference between substitution rate and mutation rate, and how are they related to each other?)
Consider that μ = π/(2t), where t is the time of divergence; for divergence between populations, K(JC) is
the equivalent of π, so μ = K(JC)/(2t). PS: Use only extant humans in this calculation (ancient humans will
have a shorter branch length)! What is the problem with this estimate? What are the assumptions about t and
branch lengths? Is your estimate likely to be an underestimate or an overestimate?
Answer: The problem with this estimate is that Neanderthals are extinct, so their t is not the same as t for
humans (it is shorter, since their lineage stopped “evolving”). The estimate is thus an underestimate (the
observed divergence, and hence µ, would be larger had the Neanderthal lineage had the full “t” to
accumulate mutations!). In our sample µ = 4.48×10⁻⁸ substitutions per site per year. However, because the
Neanderthal DNA is fossil DNA, its quality will be lower and sequence accuracy could be compromised.
Students should understand that while the mutation rate is the rate at which new mutations arise, the
substitution rate is the rate at which they become fixed in the population. Under the neutral theory, the
substitution rate equals the mutation rate, which is why a molecular clock is observed.
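The arithmetic behind the class estimate can be checked in a few lines. This sketch assumes that the 0.074 substitutions per site reported in the answer to Question 2 is the K(JC) value used in μ = K(JC)/(2t), with t = 825,000 years. Under that assumption it reproduces the 4.48×10⁻⁸ figure. The factor 2 reflects that divergence accumulates along both lineages since the common ancestor.

The same reasoning explains the neutral-theory link between the two rates. A population of N mtDNA copies receives Nμ new neutral mutations per generation, and each is fixed with probability 1/N. Substitutions therefore accrue at Nμ × 1/N = μ per generation.

```python
def substitution_rate(k_jc: float, t_years: float) -> float:
    """mu = K(JC) / (2t): the divergence is shared by two lineages, each of length t."""
    if t_years <= 0:
        raise ValueError("divergence time must be positive")
    return k_jc / (2 * t_years)


mu = substitution_rate(0.074, 825_000)
print(f"mu = {mu:.3g} substitutions per site per year")  # mu = 4.48e-08 substitutions per site per year
```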
Question 4.
The Lake Mungo 3 (LM3) fossil is the remains of an anatomically modern human (i.e., a member of the
species H. sapiens) that was discovered in a 60,000-year-old stratum in the New South Wales region of
south-eastern Australia [3]. This fossil represents the oldest anatomically modern human for which DNA has
been isolated. Despite being a member of our species, LM3 has an mtDNA haplotype that is no longer
present in the modern human gene pool (so far as we know). How similar is LM3’s mtDNA to that of
extant humans (in terms of π or K(JC))? Using HVR1 data as a molecular clock (and the substitution
rate we determined in class), estimate the time of mtDNA divergence between LM3 and extant modern
humans. Is there a discrepancy between the age of the LM3 fossil (60,000 years) and the HVR1
divergence time? If so, why would that be? Given that LM3 was a modern human who likely
exchanged genes with other modern humans, why is his mtDNA so different from that found in extant
humans? Develop a hypothetical evolutionary scenario (involving genetic drift, migration, natural
selection, or some combination of these factors) that can account for the failure of LM3’s mtDNA
haplotype to be found in extant humans. Draw trees! Think in terms of last common ancestor and
assumptions on the calculations!
Answer: The LM3 lineage is extinct (for whatever reason, nothing similar exists in current human
populations). The estimate is therefore the time to the MRCA of extant humans and LM3, not to the
fossil itself (i.e., fossil age is not an accurate description of the MRCA). In our samples t = 393,770
years, which predates the coalescence time of extant human mtDNA (the time to “Mitochondrial Eve”).
There is also a problem with “t”: because LM3 died long ago, its lineage does not have the same
branch length as those of extant humans.
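One hypothetical scenario for the loss of LM3's lineage is pure genetic drift. The sketch below simulates a neutral mtDNA lineage in a Wright–Fisher population of constant size N. It treats mtDNA as haploid and maternally inherited, with no selection or migration. These are simplifying assumptions, and the population size and starting count are arbitrary illustrative values, not estimates for ancient Australians. Under these assumptions, a lineage present in n of N copies is eventually lost with probability 1 − n/N. A rare lineage is therefore usually lost.

```python
import random


def lineage_lost(n_copies: int, pop_size: int, rng: random.Random) -> bool:
    """Run neutral Wright-Fisher drift until the lineage is lost or fixed."""
    count = n_copies
    while 0 < count < pop_size:
        freq = count / pop_size
        # Each offspring draws its mother at random from the previous generation.
        count = sum(1 for _ in range(pop_size) if rng.random() < freq)
    return count == 0


def loss_probability(n_copies: int, pop_size: int,
                     replicates: int = 2000, seed: int = 1) -> float:
    """Fraction of replicate simulations in which the lineage is lost."""
    rng = random.Random(seed)
    lost = sum(lineage_lost(n_copies, pop_size, rng) for _ in range(replicates))
    return lost / replicates


# Hypothetical: a lineage carried by 2 of 20 females.
print(loss_probability(2, 20))  # expected to be near the neutral value 1 - 2/20 = 0.9
```

Adding migration or selection to this loop is one way to explore the other scenarios the question mentions.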
Question 5.
Is the accuracy of a molecular clock scale-dependent? To answer this question, use available sequence
data, together with information in Table 2, to make several independent estimates of the substitution rate
of HVR1 (to avoid the problem above, use only other species for this purpose, not ancient humans). Is your
estimate uniform, regardless of the calibration point? What is the relationship between divergence time
and your estimate of HVR1 substitution rate (support your answer with an appropriate figure)? Keeping
in mind that HVR1 is one of the most quickly mutating portions of the genome, provide a hypothesis
that may explain the patterns you observe. What are the implications of your observations for the use of
molecular clocks in general?
Hint: if you use the “Analysis > Polymorphism and Divergence…” function in DnaSP, you will get two
estimates of divergence, K and K(JC), which are equivalent to Pi within species (that is, substitutions per
site). The first is uncorrected and the second is corrected for multiple substitutions (Jukes and Cantor).
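The Jukes–Cantor correction behind K(JC) converts an observed proportion of differing sites p into an estimate of substitutions per site, d = −(3/4) ln(1 − 4p/3). The function name below is hypothetical. DnaSP's implementation details, such as its handling of gaps, may differ.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance from an uncorrected p-distance."""
    if not 0.0 <= p < 0.75:
        # At p = 0.75 the sequences look random and the correction is undefined.
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


for p in (0.01, 0.05, 0.10, 0.20):
    print(p, round(jukes_cantor(p), 4))
```

The correction barely changes closely related sequences. It adds noticeably more to distant ones, which is where repeated hits at the same site are most likely.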
Answer: The Neanderthal–human comparison gives a very high substitution rate that cannot be explained by
the shorter Neanderthal branch length; it is possible that this is a consequence of the lower quality of
fossil DNA. However, even discounting the Neanderthal comparison, estimated substitution rates decrease as
divergence time increases. This is because more than one substitution may accumulate at a site (violating
the assumption of the infinite-sites model). Reversals then occur and older mutations are “erased” by more
recent ones, so the substitution rate is underestimated (counted as one change rather than two). Using the
corrected K(JC) reduces this problem but does not solve it. Finally, there is the complicating factor of
generation time (which is not really apparent in this dataset, but could be raised as a potential
problem). The clock here is calculated on a per-year basis, so an organism that reproduces faster than
another could be evolving faster per year. Although the molecular clock is only weakly dependent on
generation time, Ohta [4] found that synonymous sites follow a per-generation time scale while
nonsynonymous sites tend to follow a per-year time scale. One reason is that organisms with short
generation times tend to be more abundant (larger effective population sizes), so negative selection
against slightly deleterious nonsynonymous mutations is more effective in them (selection dominates drift
when Ns ≫ 1). This offsets their faster generation turnover. Because HVR1 is non-coding, and thus behaves
more like synonymous sites, it would be more affected by generation time.
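The saturation argument can be illustrated with a small model calculation. Assume that every site evolves under the Jukes–Cantor model at a constant hypothetical rate μ (made up, not the class estimate). The expected uncorrected divergence after time t is then p = (3/4)(1 − e^(−4d/3)), with d = 2μt. Dividing p by 2t, as a naive clock does, gives an apparent rate that falls as t grows, even though the true rate never changes. Real HVR1 data can deviate further, because the Jukes–Cantor model assumes that all sites evolve at the same rate.

```python
import math


def expected_p_distance(mu: float, t_years: float) -> float:
    """Expected proportion of differing sites under Jukes-Cantor after divergence time t."""
    d = 2.0 * mu * t_years  # true substitutions per site, summed over both lineages
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def apparent_rate(mu: float, t_years: float) -> float:
    """Rate a naive clock would infer from the uncorrected distance: p / (2t)."""
    return expected_p_distance(mu, t_years) / (2.0 * t_years)


TRUE_MU = 1e-7  # hypothetical constant rate, substitutions per site per year
for t in (1e4, 1e5, 1e6, 5e6, 1e7):
    print(f"t = {t:>12,.0f} years  apparent mu = {apparent_rate(TRUE_MU, t):.2e}")
```

Plotting the apparent rate against t gives the kind of figure Question 5 asks for, though here it is generated from the model rather than from data.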
Question 6.
An under-appreciated aspect of chimpanzees is that there are actually two distinct species. The
common chimpanzee (Pan troglodytes) was historically widespread across east, central and west Africa,
although its range has been severely limited by habitat destruction and poaching. The bonobo, or pygmy
chimpanzee (Pan paniscus), is morphologically and behaviorally distinct from the common chimpanzee.
Its historical geographic range is limited to a small area of the Congo basin in central Africa. Like the
common chimpanzee, the bonobo is endangered. Based on the HVR1 data, how long has it
been since common chimpanzees and bonobos shared a common ancestor? Based on the results of
Question 5, how much confidence do you have in this estimate? How could you change your
experimental approach to improve the confidence in your estimate of divergence time between these
species? In applying to chimpanzees a molecular clock calibrated from human data, what
additional assumptions are you making about the nature of the clock?
Answer: Using the Neanderthal–human substitution rate gives a divergence time of ~1.6 million years
(t = K(JC)/(2µ)). Using the human–chimpanzee substitution rate gives a divergence time of ~4.6 million years.
The first is an underestimate because the substitution rate is likely to be inflated (due to low-quality
fossil DNA), and the second is an overestimate because the substitution rate is likely underestimated (due
to the accumulation of multiple substitutions per site). Published estimates put the divergence time at
about 2 million years [5].
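Because t = K(JC)/(2µ), dividing the same K(JC) by different calibrated rates gives proportionally different dates. The estimates differ exactly by the ratio of the rates. In the sketch below, the K(JC) value and the second rate are hypothetical placeholders; replace them with your own DnaSP output. Only the 4.48×10⁻⁸ rate comes from this document.

```python
def divergence_time(k_jc: float, mu: float) -> float:
    """t = K(JC) / (2 * mu), in years if mu is per site per year."""
    if mu <= 0:
        raise ValueError("substitution rate must be positive")
    return k_jc / (2.0 * mu)


def times_under_calibrations(k_jc: float, rates: dict[str, float]) -> dict[str, float]:
    """Apply each calibrated rate to the same divergence value."""
    return {name: divergence_time(k_jc, mu) for name, mu in rates.items()}


K_PAN = 0.15  # hypothetical K(JC) between common chimpanzees and bonobos
RATES = {
    "Neanderthal-human": 4.48e-8,  # class estimate from Question 3
    "hypothetical slower rate": 1.5e-8,  # placeholder; replace with your own estimate
}
for name, years in times_under_calibrations(K_PAN, RATES).items():
    print(f"{name}: {years / 1e6:.2f} million years")
```

A slower calibrated rate always yields an older date. This is why the human–chimpanzee calibration in the answer gives the older estimate.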
Question 7.
Using any available sequence data (class data, extant human data, ancient DNA data, inter-specific data,
or other published data that you track down yourself) test an evolutionary hypothesis of your own
choosing.
Answer: Answers will vary. Population demography hypotheses can be evaluated with Tajima’s D or Fu
and Li’s tests. In our data these tests were not significant, but Tajima’s D was negative (−2.15),
suggesting an excess of low-frequency polymorphisms, as expected after a population expansion. Students can
also look at genetic diversity in different continents. Although this comparison is highly biased (and
technically not acceptable) because we do not have a random sample for each continent, Africa is still the
continent with the highest diversity.
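Tajima's D compares the π-based and segregating-site-based estimates of θ and scales the difference by its standard deviation. The sketch below implements the standard formula. It assumes at least four gap-free aligned sequences and does not compute a p-value, so use DnaSP, which also reports significance, for the actual analysis. The toy alignments are made up.

```python
import math
from itertools import combinations


def tajimas_d(seqs: list[str]) -> float:
    """Tajima's D for aligned, gap-free sequences (at least four)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")
    length = len(seqs[0])
    # S: number of segregating (polymorphic) sites.
    s = sum(1 for i in range(length) if len({seq[i] for seq in seqs}) > 1)
    if s == 0:
        return math.nan  # D is undefined without polymorphism
    # k: average number of pairwise differences (the pi-based estimate of theta).
    pairs = list(combinations(seqs, 2))
    k = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Made-up alignments: two singletons (excess rare variants) vs. two 2:2 splits.
rare = ["AAAA", "TAAA", "ATAA", "AAAA"]
common = ["AAAA", "AAAA", "TTAA", "TTAA"]
print(round(tajimas_d(rare), 3), round(tajimas_d(common), 3))
```

An excess of rare variants pushes D below zero, and intermediate-frequency variants push it above zero. This is the pattern the answer interprets as a signal of population expansion.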
Question 8.
You have received information on the haplogroup to which your analyzed DNA (and that of others in class)
belongs. Are human haplotypes fixed within populations? How about continents? How can you use
mtDNA information to trace migratory paths of ancient humans? How certain can you be about the
geographic origin of your haplotype? (PS: you can choose to talk about any haplotype analyzed in class,
not necessarily the one you analyzed.)
If you run a more complex, model-based tree-searching method (e.g., Bayesian analysis), it is likely
that many of the sequences will lose resolution (that is, they will collapse into polytomies).
Why is this happening?
Answer: Here the goal is to evaluate the students’ understanding of phylogenetic results and to make them
aware that haplotypes are not fixed within a population, although some (not all) may be restricted to
particular continents. They should understand how migration patterns can be inferred from
phylogenetic/phylogeographic data.
The loss of resolution in model-based tree searches, compared with distance-matrix methods, can
be explained by the high rate of change in HVR1. Distance-matrix methods cannot account for
homoplasies because they only measure the distance between each pair of sequences, leaving out all
information from higher-order combinations of character states. Model-based methods, in contrast, allow
for repeated changes at the same site. With a region that changes as rapidly as HVR1, many groupings
therefore receive little support and collapse into polytomies.
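Distance-matrix methods such as neighbor-joining work from a table of pairwise distances rather than from the alignment itself. The sketch below reduces an alignment to a matrix of pairwise Jukes–Cantor distances. Once the alignment is collapsed into these numbers, it is no longer possible to tell which sites differed, or whether the same site changed repeatedly in different lineages. The sequence names and data are hypothetical, and a gap-free alignment is assumed.

```python
import math


def jc_distance(a: str, b: str) -> float:
    """Jukes-Cantor distance between two aligned, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    p = sum(1 for x, y in zip(a, b) if x != y) / len(a)
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs: dict[str, str]) -> tuple[list[str], list[list[float]]]:
    """Return sequence names and the symmetric matrix of pairwise distances."""
    names = list(seqs)
    matrix = [[jc_distance(seqs[i], seqs[j]) for j in names] for i in names]
    return names, matrix


# Hypothetical haplotypes: X and Y each differ from Z at one site, but at different
# sites; the matrix records only that the two distances are equal.
haplotypes = {"X": "ACGTACGTAC", "Y": "ACGTACGTTT", "Z": "ACGTACGTTC"}
names, matrix = distance_matrix(haplotypes)
for name, row in zip(names, matrix):
    print(name, [round(d, 3) for d in row])
```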
References:
1. Librado, P. and J. Rozas, DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics, 2009. 25(11): p. 1451-2.
2. Green, R.E., et al., A Draft Sequence of the Neandertal Genome. Science, 2010. 328(5979): p. 710-722.
3. Adcock, G.J., et al., Mitochondrial DNA sequences in ancient Australians: Implications for modern human origins. Proceedings of the National Academy of Sciences of the United States of America, 2001. 98(2): p. 537-542.
4. Ohta, T., Synonymous and Nonsynonymous Substitutions in Mammalian Genes and the Nearly Neutral Theory. Journal of Molecular Evolution, 1995. 40(1): p. 56-63.
5. Yang, Z.H. and B. Rannala, Bayesian estimation of species divergence times under a molecular clock using multiple fossil calibrations with soft bounds. Molecular Biology and Evolution, 2006. 23(1): p. 212-226.