Bio 113/244 Problem Set #1
1)
Imagine that DNA contained 7 nucleotides instead of 4. Derive the
Jukes-Cantor correction for this situation. Draw a graph of Pr[X,X], the
probability of starting with one nucleotide and ending with the same one, as a
function of time t. Draw a similar graph for Pr[not-X,X].
2)
Substitutions between leucine and isoleucine are much more common in
proteins than substitutions between leucine and serine. How would you explain
this observation? Consider all possibilities and suggest how you would distinguish
among them.
3)
Noncoding DNA in a newly sequenced microbe has a GC content of 50% and
roughly the following proportions of dinucleotides of the form CpX: CpA 10%,
CpT 20%, CpG 50%, CpC 20%. Provide an explanation for this pattern.
(Hint: Compare it to the situation in the mammalian genome).
4)
Derive equation #3.27 from Graur and Li (Hint: instead of proceeding as
in the one-parameter model, first derive P (= probability of two sequences
differing by a transition) and Q (= probability of two sequences differing by a
transversion), decide what K (= number of substitutions per site since time of
divergence) should be and rearrange to get K in terms of P and Q).
5)
Suppose you have the following data on genotype frequencies for the
ABO blood groups:
Genotype    Frequency
OO          0.4
OA          0.3
AA          0.08
OB          0.12
BB          0.04
AB          0.06
Calculate the genotype frequencies for the next generation assuming random
mating.
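One way to set up this calculation is to count alleles first and then pair them at random. The sketch below assumes the second column of the table gives genotype frequencies; the function names are illustrative and are not taken from the course materials.

```python
from itertools import combinations_with_replacement

def allele_frequencies(genotype_freqs):
    """Allele frequencies from diploid genotype frequencies.

    Keys are two-character genotypes such as "OA". Each allele in a
    genotype contributes half of that genotype's frequency.
    """
    freqs = {}
    for genotype, f in genotype_freqs.items():
        for allele in genotype:
            freqs[allele] = freqs.get(allele, 0.0) + f / 2
    return freqs

def random_mating_genotypes(allele_freqs):
    """Expected genotype frequencies after one round of random mating."""
    out = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        p, q = allele_freqs[a], allele_freqs[b]
        # Homozygotes arise one way (p^2), heterozygotes two ways (2pq).
        out[a + b] = p * p if a == b else 2 * p * q
    return out

# Genotype frequencies from the table above.
observed = {"OO": 0.40, "OA": 0.30, "AA": 0.08,
            "OB": 0.12, "BB": 0.04, "AB": 0.06}
alleles = allele_frequencies(observed)
next_generation = random_mating_genotypes(alleles)
# Keys are built in alphabetical order, so "OA" in the table appears as "AO".
for genotype, freq in next_generation.items():
    print(genotype, round(freq, 4))
```
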
6)
In Margaret Atwood’s novel Oryx and Crake, the character Crake designs
a new species of human being, bereft of the flaws in Homo sapiens responsible
for hunger, disease, and war. Crake substitutes his creation for normal humans by
designing a Homo sapiens specific virus that he releases worldwide. Suppose this
new species starts with a population size of 100 that by design doubles every year
for ten years and then remains constant in size.
a)
Assuming the Wright-Fisher model, what is the variance effective
population size (Ne) during the tenth year?
b)
Suppose Crake designed the new species to have extremely high fidelity
DNA replication. If the observed total heterozygosity is 2.2 x 10^-9, did Crake
succeed based on the estimated mutation rate?
7)
The following DNA sequences come from an orthologous locus in two
cosmopolitan species of fly, Drosophila simulans and Drosophila melanogaster.
Only one sequence has been sampled from the melanogaster population, but six
sequences have been sampled from different individuals in the simulans
population.
mel   TGAGTTTTTGCCACGAGATAGCAAAGTGGTCATTATTCTT
sim1  TGAGTTTTTGCTACGAGATGGCAATGTGGTCATTATTCTT
sim2  TGAGTTTTTGCCACGAGATGGCAATGTGGTCATTATTCTT
sim3  TGAGTTTTTGCCACGAGATGGCAATGTGGTCATTATTCTT
sim4  TGAGTTTTTGCCACGAGATGGCAAAGTGGTCATTATTCTT
sim5  TGAGTTTTTGCCACGAGATGGCAAAGTGGTCATTATTCTT
sim6  TGAGTTTTTGCCACGAGATGGCAAAGTGGTCATTATTCTT
a) Assume that the left end of the sequences is the 5' end. Could these
sequences code for part of a protein?
b) How many segregating sites are there in the D. simulans sample?
c) Assume that the effective population size for D. simulans is 10^6. Estimate
the mutation rate based on the simulans sample.
d) How many sites diverge between mel and sim1?
e) Assume that the mutation rate you calculated in (c) has remained constant
since the two species split. Assume further that these species reproduce ten times
per year. Estimate how long ago the species split, using:
i) the raw divergence between mel and sim1,
ii) the Jukes-Cantor correction, and
iii) the Kimura correction?
f) How do your answers change if you repeat the calculations in (e), but
between mel and sim2? What does this mean?
g) According to the best evidence available, D. simulans and D. melanogaster
diverged around 3 million years ago. Is this consistent with your estimates for the
divergence time?
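Counting segregating sites and pairwise differences by eye is error-prone, so a small helper can be used to check the tallies. This is a minimal sketch that assumes the sequences are already aligned and of equal length; the function names and the toy alignment are hypothetical and are not the fly data above.

```python
def segregating_sites(sequences):
    """Count alignment columns where the sample carries more than one base."""
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for column in zip(*sequences) if len(set(column)) > 1)

def pairwise_differences(seq_a, seq_b):
    """Count positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)

# Hypothetical toy alignment of three sequences.
toy = ["ACGTAC", "ACGTAC", "ACCTAT"]
print(segregating_sites(toy))                 # columns 3 and 6 vary -> 2
print(pairwise_differences(toy[0], toy[2]))   # 2
```

The same two functions can be applied to the simulans sample and to the mel/sim pairs once the sequences are typed in as strings.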
8)
Polymorphism or Divergence? The following terms are associated with
polymorphism, divergence, or both. In each case, argue for one of the three
options. (Example: The term 'between-species' is probably more strongly
associated with divergence than polymorphism. After all, divergence measures
the number of changes in a locus between species.)
between-species
segregating site
fixation
nucleotide diversity
natural selection
random genetic drift
within-population
heterozygosity
mutation
genetic diversity
substitution
effective population size
9)
Write a computer simulation of genetic drift. Start with a population of N
asexual types, all different. Produce a new population by random sampling, and
continue until all are identical. Use the program to check the statement that the
expected time to fixation of one of the types is 2N generations, with a standard
deviation a little greater than N. Then extend your simulation to include diploid
types, and check the statement again.
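As a starting point for the haploid part of this problem, the sketch below resamples a population of N labelled asexual types each generation until a single type remains. It is one possible layout under the assumptions stated in the problem (constant N, non-overlapping generations, sampling with replacement); the function names, seed, and replicate count are illustrative choices, and the diploid extension is left open.

```python
import random
import statistics

def generations_to_fixation(n, rng):
    """Resample n asexual individuals until only one type is left."""
    population = list(range(n))  # every individual starts as a distinct type
    generations = 0
    while len(set(population)) > 1:
        # Each offspring picks its parent uniformly at random, with replacement.
        population = [rng.choice(population) for _ in range(n)]
        generations += 1
    return generations

def summarize(n, replicates, seed=0):
    """Mean and standard deviation of fixation time over many replicates."""
    rng = random.Random(seed)
    times = [generations_to_fixation(n, rng) for _ in range(replicates)]
    return statistics.mean(times), statistics.stdev(times)

mean_time, sd_time = summarize(n=20, replicates=200, seed=42)
print(f"N=20: mean {mean_time:.1f} generations, sd {sd_time:.1f}")
```

Running `summarize` for several values of N and comparing the mean and standard deviation against 2N and N is one way to check the statement.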
10)
Middle-Earth has two distinct populations of elves: those of Lothlorien,
and those of Mirkwood. The woods of Lothlorien are much better suited to
elf-habitation than those of Mirkwood, so there are twice as many elves in the former
as in the latter. These communities have been isolated from each other for a long
enough time that allele frequencies at many loci have changed, but they are still
fully capable of interbreeding, and therefore must be considered a single species.
Each population has only 2 alleles at the disposition locus (congenial or arrogant),
but the frequencies of the alleles are not the same in the two populations. Each
population is initially in Hardy-Weinberg equilibrium with itself. Now that
the Adversary has been defeated, the communities merge on their way back to the
Elf Havens across the Sea.
a)
Calculate the change in the proportion of homozygotes between the P1
generation where the populations merge, and the F1 generation, assuming that
mating is now random within the entire Elven community.
b)
Does the sign of the change in the proportion of homozygotes depend on
whether the larger or smaller population initially had a higher incidence of
arrogance? Explain your answer.
11)
The peppered moth Biston betularia can be one of two colors, white or
dark brown. A single locus with two alleles is responsible for determining the
body color phenotype. Allele ‘M’ is dominant to ‘m’, and its presence leads to a
greater production of melanin that darkens the moth’s body color.
An extremely large population of the peppered moth has thrived in a forest of
dark brown and white barked trees for many centuries. 55% of the moths in this
population are white in color. Both white and dark brown moths can survive in
the forest because they are camouflaged against trees that match their body color.
When a fire sweeps through the forest, the benefits enjoyed by the white moths are
eliminated. While many of the moths and birds in the forest survive the blaze, the
dark colored moths are now better camouflaged against the dark burnt bark of the
trees. The white moths are so poorly camouflaged that they are immediately
eaten by predatory birds. None of the white moths are ever able to reproduce.
NOTE: Assume Hardy-Weinberg assumptions for the following questions
including an infinite population size.
a)
What are P (MM), Q (Mm), R (mm), p (frequency of M) and q (frequency
of m) for the population of moths before the forest fire?
b)
Assuming both colors of moths survive the forest fire with equal
probability, what are P, Q, R, p and q for the first generation of moths after the
fire (at the time when they are born)?
c) If white moths are destroyed generation after generation, find an equation for q_t
in terms of q_0.
d) After how many generations will q = 0.05? How long will it take until q is
exactly 0?
12)
A great catastrophe befalls Whoville; when Horton falls asleep, some of
the other residents of the Jungle of Nool (claiming that they are acting in the best
interests of the poor deranged elephant) boil the dust speck that contains the
Whos' world. Only 1 male and 1 female Who survive the horrible calamity.
Horton recovers the dust speck, and with the aid of caffeine and increased
vigilance, he protects the tiny dust speck long enough for the survivors to produce
an F1 generation of 10 Whos. The ability for these Whos to produce substantial
noise is controlled by a simple 2-allele locus, and before the catastrophe the
incidence of the loud allele was .25, while the incidence of the quiet allele was
.75. Assuming that the Whos were in Hardy-Weinberg equilibrium before the
catastrophe,
a) Use a Wright-Fisher model to predict the probability of the quiet allele being
extinct in the F1 generation.
b) Use a Wright-Fisher model to predict the probability of the loud allele being
extinct in the F1 generation.
c) Identify the shortcomings of the Wright-Fisher model in this example (i.e., what
might actually happen in reality that wouldn't be indicated in the Wright-Fisher
model). Give an example in which these shortcomings could substantially change
the probabilities calculated in parts a and b.
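Under the Wright-Fisher model, the number of copies of an allele in the next generation is a binomial draw. The sketch below evaluates that binomial probability. Which frequency p to plug in, and how many gene copies the F1 generation contains, are modeling decisions left to the reader; the function names are illustrative.

```python
from math import comb

def wright_fisher_probability(k, copies, p):
    """Probability of exactly k copies of an allele among `copies` gene
    copies in the next generation, given current allele frequency p."""
    return comb(copies, k) * p**k * (1 - p) ** (copies - k)

def loss_probability(copies, p):
    """Probability that the allele is absent from the next generation."""
    return wright_fisher_probability(0, copies, p)

# Hypothetical example: 4 gene copies, allele at frequency 0.5.
print(loss_probability(4, 0.5))  # 0.5**4 = 0.0625
```
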
13)
Scientists discover a very basic form of life on Mars. The genetic system
is based on only four amino acids - X, Y, Z, and W - and two nucleotides – A and
B. Codons in this new system are only two nucleotides long and code for the
amino acids according to the following table.
Nucleotides   Amino acid
AA            X
AB            Y
BA            Z
BB            W
Assuming that a nucleotide changes to any other in a given time step t with
probability b, answer the following questions about the substitutions in a
particular “neutral” amino acid sequence.
HINTS:
• For simplicity, assume that only a single substitution can occur in a given
time step.
• Compare the problem to Kimura’s 2-parameter correction.
a) What is the equation K(t) for the expected number of amino acid changes that
actually do occur after time t?
b) Find a correction for the number of changes that have occurred (K) in terms of
the number of observed amino acid substitutions (Hint: there are two types of
substitutions. Include both in the formula for K.)
14)
Elephant seals possess an interesting system of mating. One alpha-male
lies in the center of a harem of females and attempts to mate with as many of
these females as possible throughout the course of the mating season. The harem
is also surrounded by 5 beta-males, usually younger and smaller, who lie around
the harem and protect it from invasion by other males. When the alpha-male
starts to mate, the beta-males use his distraction to do some mating of their own
with the females at the outer edges of the harem. Other males, not alphas or
betas, stay in the water and are not allowed to mate at all.
One would think that the alpha-male sires the most offspring in this situation;
however, recent genetic studies have found that beta-males as a group actually
have at least as much mating success as the alpha-male. In a typical harem,
50% of the offspring are sired by beta-males and 50% by the alpha-male.
Given that there are 40 alpha-males and 2000 females in the population, and
assuming that each of the harems is exactly the same size, that all beta-males
have exactly the same probability of having an offspring, and that all females
are fertile and available for mating, calculate the effective population
size for this population of elephant seals.
NOTE: assume non-overlapping generations for simplicity.
15)
You find out that your favorite protein is translated from the following
mRNA:
5’ CUAUGGCAACAUCAUCAGCGGCA 3’
a) Write down the amino acid sequence of the protein if translation starts at the
first Met codon encountered in the mRNA sequence.
b) Now assume that translation starts at the first Tyr codon. Write down the amino
acid sequence of the new protein.
c) Which protein should experience more amino acid changes per unit time if all
cytosines in your organism are methylated? Explain.
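Translating by hand in two different reading frames is easy to get wrong, so a short helper can be used to check the result. The sketch below builds the standard genetic code from the conventional UCAG ordering and translates from the first occurrence of a chosen start codon. The function name and the toy mRNA are hypothetical, and the scan assumes the start codon may sit at any offset.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate_from(mrna, start_codons):
    """Translate from the first start codon found; stop at a stop codon
    or when fewer than three nucleotides remain."""
    for start in range(len(mrna) - 2):
        if mrna[start:start + 3] in start_codons:
            break
    else:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

# Hypothetical toy mRNA: AUG UUU UAA.
print(translate_from("GGAUGUUUUAA", {"AUG"}))  # MF
```

Passing `{"UAU", "UAC"}` as `start_codons` starts translation at the first Tyr codon instead.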
16)
Imagine that you have cloned two homologous genes in two species of
Drosophila. You sequence and align them. Below are the short parts of the
alignments of the expected mRNA sequences. The sequences are parsed into
expected codons (for instance, the first codon in the Drosophila simulans
sequence of Gene 1 is AUC)
Gene 1:
Drosophila sim:  AUC-ACC-CAC-CAA-CAG-UUC-UGU-GCU
Drosophila mel:  AUG-ACA-CAC-CAA-CGG-UUC-UGC-GAU
Gene 2:
Drosophila sim:  ACA-GAU-GGU-CCU-CGC-GUG
Drosophila mel:  ACA-CUU-AGU-AUU-CAC-GCA
a)
Above are coding sequences from two genes in two closely related
species, D. simulans and D. melanogaster. Assuming that the path with the fewest
nonsynonymous substitutions represents the true path between the two codons,
calculate Ka and Ks for both genes.
b) Would you be surprised to learn that both genes have no function at the level of
the protein? Explain.
c) Would you be surprised to learn that Gene 2 has been under strong selection to
change its protein sequence? Explain.
17)
You are given two sequences (50 bp each) of the homologous pseudogene
in two species of yeast. The alignment is shown below:
CCTCGACGGCTTAGATCTGATCTGACCTAATGCTGCAATCGTTACAAAGT
CCTCCACGAGTAAGAGTTGATCCGACTTAGTCCTGCGATCGTTAGATAAT
You know that these species last shared a common ancestor 10 MYA and that
both species go through 50 generations a year.
a) Using the Jukes-Cantor model of nucleotide substitution, estimate the mutation
rate per nucleotide per year in these two species of yeast. Assume that the
mutation rate is the same in both species.
b) Do you believe that the Jukes-Cantor model is appropriate in this case? Do you
see any evidence that you need to use the Kimura 2-parameter model instead?
Explain.
c) You sample 10 alleles of this pseudogene different by origin in the population
of one of these species of yeast. You sequence the same 50 bp region in all 10
cases and find that there are 5 distinct alleles with the following counts:
Allele     Count
allele 1   6
allele 2   1
allele 3   1
allele 4   1
allele 5   1
Estimate the effective population size in this species of yeast.
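For part a, the standard four-nucleotide Jukes-Cantor correction converts the observed proportion of differing sites p into an estimated number of substitutions per site, K = -(3/4) ln(1 - 4p/3). The sketch below implements only that conversion; the function names are illustrative, and turning K into a per-year rate is left to the reader.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)

def jukes_cantor_distance(p):
    """Jukes-Cantor estimate of substitutions per site from proportion p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to exist")
    return -0.75 * math.log(1 - 4 * p / 3)

# Hypothetical example: 30% of sites differ.
print(round(jukes_cantor_distance(0.30), 4))  # 0.3831
```

Under neutrality, substitutions accumulate along both lineages since the split, which matters when converting K into a rate.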
18)
In Drosophila the rates of all transitions and transversions are all very
similar to each other except for the transition from C to T (or equivalently from G
to A). You know that the average GC content of neutral, entirely unconstrained
DNA in Drosophila is 34%. Estimate the relative probability of a C to T
transition versus a T to C transition.
19)
Your friend just sequenced the full genome of a new species of bacteria.
The GC content in this species is 50%. Your friend decides to calculate the
proportion of different kinds of nucleotide pairs (dinucleotides) and finds among
other things that C’s are followed by A’s 40% of the time while 60% of the time
they are followed by the other three nucleotides. This reminds you of a similar
pattern in the human genome. From what you know about methylation-dependent
deamination, advise your friend which other dinucleotide frequencies he needs to
look at and give qualitative predictions of what he should see.
20)
A molecular biologist discovers that human cells that express a particular
allele of a particular membrane protein do not get infected by HIV.
Excitedly he measures the frequency of this allele in the human population and
discovers that it is present in approximately 10% of the population. He finds this
result surprising because only ~1% of the human population shows natural
immunity to HIV and not 10%. Knowing that you are taking a class in molecular
evolution, he asks you for help. What would you tell your friend? Is his result
consistent with the observations of natural immunity to HIV? Tell us in a few
sentences what you can surmise from this information.
21)
A neutral region of a Drosophila genome is 60% AT rich (60% of the
nucleotides are AT pairs and 40% are GC pairs). Assuming that the mutational
pressure is solely responsible for maintenance of this AT content, and that
mutation operates the same way on both DNA strands, fill in the missing parts of
the mutation frequency table.
From
To
A
T
G
C
A
x
T
G
x
0.15
x
C
0.06
0.08
0.06
x
22)
In the fly Drosophila mauritiana, a transposable element called mariner
exists at several loci in the genome and segregates neutrally. Each copy is
deleted from the genome with a frequency of 0.5 percent per generation. In a
population in which mariner is fixed at a specific locus, how many generations
will it take until the frequency of individuals that are homozygous for the deletion
is 5 percent?
23)
Imagine that you discover life on Mars and find that Martian DNA
contains 5 different nucleotides.
a) Assuming that amino acids are still coded with triplets on Mars and that the
Mars universal genetic code contains 3 stop codons, what is the maximum
number of amino acids potentially encoded by Martian genes?
b) Derive the Jukes-Cantor correction applicable on Mars.
24)
You study two anonymous regions in the maize and rice genomes. You
find that in the first region (region A) 25% of nucleotide positions are different in
the two species and in the other one (region B) only 5% are different.
a) This result is entirely consistent with the neutral theory. Explain why. What
would Kimura infer about the biological difference between the two regions?
b) You proceed to study nucleotide polymorphism in these two regions. In a
sample of 10 maize alleles from locus A (2000 bp in length) you find 20
segregating sites. On the assumptions of the neutral theory, how many segregating
sites do you expect to observe in a sample of 5 alleles of the same length (2000
bp) in the region B?
25)
Use the coalescent approach to find the expected time to the common
ancestor of a very large sample (size n) in a much larger diploid population
of size N. The following fact should prove helpful: the sum of 1/(i(i-1)) over
i = 2, ..., n approaches 1 as n becomes very large.
26)
You sample 5 DNA sequence alleles from a population and discover 10
segregating sites. On the assumptions of the neutral theory, how many more
segregating sites do you think you will observe if you sample 5 more alleles?
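Under the standard neutral model, the expected number of segregating sites in a sample of n sequences is theta times the harmonic sum a_n = 1 + 1/2 + ... + 1/(n-1), which is the basis of Watterson's estimator. The sketch below encodes that relationship. The function names are illustrative, and theta here is the per-locus value.

```python
def harmonic_sum(n):
    """a_n = 1 + 1/2 + ... + 1/(n - 1) for a sample of n sequences."""
    if n < 2:
        raise ValueError("need at least two sequences")
    return sum(1 / i for i in range(1, n))

def watterson_theta(segregating_sites, n):
    """Watterson's estimate of theta from S segregating sites in n sequences."""
    return segregating_sites / harmonic_sum(n)

def expected_segregating_sites(theta, n):
    """Expected number of segregating sites for a given theta and sample size."""
    return theta * harmonic_sum(n)

# Hypothetical example: how a_n grows with sample size.
for n in (2, 5, 10):
    print(n, round(harmonic_sum(n), 4))
```

Because a_n grows only logarithmically with n, doubling the sample adds far fewer new segregating sites than the first few sequences did.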
27)
The population of unicorns has an effective size that is 10 times greater
than the effective size of the leprechaun population. The generation time of
leprechauns is 10 times longer than the generation time of unicorns. You observe that
there are 500 nucleotide differences between the sequences of a particular gene in
leprechauns and unicorns. Assuming that the neutral theory is correct, and that the
mutation rate per generation is the same in both unicorns and leprechauns,
determine:
a) How many mutations were fixed in this gene since the common ancestor of
leprechauns and unicorns.
b) How many mutations in this gene were fixed in the leprechaun population
since the common ancestor, and how many were fixed in the unicorn population.
c) What the expected ratio of heterozygosities is in the unicorn and leprechaun
populations.
28)
You manage a population of an endangered species that is in danger of
inbreeding depression (loss of genetic diversity). You monitor the population for
five years and find that the sex ratio and overall population size fluctuate as
shown below:
Year   Males/Female   Population Size
1      .6             200
2      .7             400
3      .5             100
4      .8             600
5      .6             200
A survey of genetic diversity at two loci reveals the following genotype
frequencies:
Locus 1
Genotype   Freq
A1A1       .01
A1A2       .1
A2A2       .25
A1A3       .08
A2A3       .40
A3A3       .16
Locus 2
Genotype   Freq
B1B1       .04
B1B2       .72
B2B2       .24
a) Do any of these loci appear to be under selection? (Explain how you can tell.)
b) Assuming your observation of the population's demographics over the last five
years is representative of its future, how long will it take for the genetic diversity
at these loci to decay to 90% of their current levels?
c) If you wanted to slow this process, would you be better served by stabilizing
the population size at 500 (with sex ratio continuing to fluctuate), or stabilizing
the sex ratio at 1?
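Two textbook formulas are useful for parts b and c: the effective size with unequal numbers of breeding males and females, Ne = 4 Nm Nf / (Nm + Nf), and the harmonic mean of per-generation sizes when population size fluctuates. The sketch below implements both. The function names are illustrative, and how to read the sex-ratio column of the table (and whether each year is one generation) is left as an assumption for the reader to state.

```python
def ne_unequal_sexes(n_males, n_females):
    """Effective size with unequal numbers of breeding males and females."""
    return 4 * n_males * n_females / (n_males + n_females)

def ne_fluctuating(sizes):
    """Long-term effective size as the harmonic mean of per-generation sizes."""
    if not sizes:
        raise ValueError("need at least one generation")
    return len(sizes) / sum(1 / n for n in sizes)

# Hypothetical numbers, not the table above.
print(ne_unequal_sexes(50, 50))    # equal sexes: Ne equals the census size, 100.0
print(ne_unequal_sexes(1, 99))     # one breeding male: 3.96
print(ne_fluctuating([100, 400]))  # pulled toward the smaller size, about 160
```

Combining the two (computing a per-year Ne from the sex ratio, then taking the harmonic mean across years) is one reasonable way to approach the comparison in part c.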
29)
You sequence part of a pseudogene in five randomly sampled
chromosomes:
AAGCTGGACT GAAGCTAAGC TATTACGACG GCCATTACGA AAGCTCCGTT
AAGCTGGACT GAAGCTAAGC TATTGCGACG GCCATTACGA AAGCTCCGTA
AAGCTAGACT GAAGCTAAGC TATTGCGACG GCCATTGCGA AAGCTCCGTA
AAGCTAGACC GAAGCTAAGC TATTGCGACG ACCATTGCGA AAGCTCCGTT
AAGCTAGACC GAAGCTAAGC TATTGCGACG ACCATTGCGA AAGCTCCGTT
a)
You know from other studies of this species that the mutation rate is
approximately 10^-8 mutations/(site*generation). How large do you think the
population is?
b)* Say your sample looked like this:
AAGCTGGACT GAAGCTAAGC TATTACGACG GCCATTACGA AAGCTCCGTT
AAGCTGGACT GAAGCTAAGC TATTGCGACG GCCATTGCGA AAGCTCCGTA
AAGCTAGACT GAAGCTAAGC TATTGCGACG GCCATTGCGA AAGCTCCGTT
AAGCTGGACT GAAGCTAAGC TATTGCGACG ACCATTGCGA AAGCTCCGTT
AAGCTGGACC GAAGCTAAGC TATTGCGACG GCCATTGCGA AAGCTCCGTT
What would be the topology (shape) of this sample's coalescent tree? What kind
of population history would result in this pattern? (Hint: Think about what
determines the probability of a coalescent event at any given time in the
population's history.)
30)* You discover a peculiar mating system in a plant species where 30% of
the individuals only self-fertilize and 70% of the individuals only cross-fertilize.
You also observe that both mating types produce equal numbers of offspring
plants and that a particular plant’s phenotype is completely independent of its
parent’s phenotype. Given this information,
a) Determine the effective population size for a population of 10,000 plants.
b) Can you find frequencies of self-fertilizers and cross-fertilizers that would
make Ne = 10,000? (That is, change the 30% and 70%.)
c) Instead of the mating system described above, assume all individuals self to
make 30% of their progeny and cross to make the other 70%. What would Ne be
in this case?
31)
Deleterious mutations can be maintained in a population even as selection
purges them from the population if they recur with some frequency. Equation
#3.9 in Gillespie shows the equilibrium frequency of such a deleterious allele
given incomplete dominance.
Derive the equilibrium frequency of a deleterious allele that is completely
recessive (i.e. h = 0) (hint: refer to Gillespie’s method on p. 70 but calculate Δq
instead).