Download Homework 1 - Berkeley MCB

Quantitative Biology Module
Fall 2016
Homework 1: Biological Numeracy And A Feeling for
the Organism
Hernan G. Garcia
August 26, 2016
Homeworks in the Quantitative Biology Module
Over the course of the module, we will post a few homeworks. These will include several
problems aimed at cementing the skills and ideas developed during the course, as well as
expanding them. They will not be graded. However, we will post solutions on a password-protected website and set up office hours to discuss these problems or any
other topic from the lectures.
The objective of this homework is to get a feeling for the numbers in whatever problem you’re considering in biology. Just like you always need to check the units in your
calculations, a more subtle sanity check of your theoretical results stems from having some
expectation about the order of magnitude you will obtain.
A Feeling for the Numbers in Biology
In this section, we flex our quantitative muscles a little bit more to develop various estimates
regarding biological systems.
1 Protein Sequences: The Frances Arnold Estimate Problem
In a 2001 Bioengineering seminar at Caltech, Professor Frances Arnold made a startling
remark that it is the aim of the present problem to examine. The basic point is to try and
generate some intuition for the HUGE, ASTRONOMICAL number of ways of choosing
amino acid sequences. To drive home the point, she noted that if we consider a protein with
300 amino acids, there will be a huge number of different possible sequences.
(a) How many different sequences are there for a 300 amino acid protein?
But that wasn’t the provocative remark. The provocative remark was that if we took only
one molecule of each of these different possible proteins, it would take a volume equal to five
of our universes to contain all of these different distinct molecules.
(b) Estimate the size of a protein with 300 amino acids. Justify your result, but remember
it is an estimate. Next, find an estimate of the size of the universe and figure out whether
Frances was guilty of hyperbole or if her statement was on the money.
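As a rough sketch of the arithmetic behind parts (a) and (b), the Python snippet below counts the sequences and compares the total protein volume with the volume of the universe on a log scale. It assumes the 20 standard amino acids. The two volumes in the example call are placeholder values, not results given in the problem, and should be replaced with your own estimates.

```python
import math

AMINO_ACIDS = 20   # assumption: the 20 standard amino acids
LENGTH = 300       # residues, from the problem statement

# Part (a): each position can hold any amino acid, so the count is 20**300.
n_sequences = AMINO_ACIDS ** LENGTH
log10_sequences = LENGTH * math.log10(AMINO_ACIDS)


def log10_universes_needed(protein_volume_m3, universe_volume_m3):
    """log10 of (number of sequences * protein volume / universe volume)."""
    return (log10_sequences
            + math.log10(protein_volume_m3)
            - math.log10(universe_volume_m3))


print(f"20^300 has {len(str(n_sequences))} digits (log10 = {log10_sequences:.1f})")

# Placeholder estimates (hypothetical): 1e-25 m^3 per protein, 1e80 m^3 universe.
print(f"log10(universes needed) = {log10_universes_needed(1e-25, 1e80):.1f}")
```

Working with logarithms keeps the comparison readable even though the raw numbers differ by hundreds of orders of magnitude.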
2 DNA Synthesis Over Your Lifetime
Estimate the total length of DNA your body will produce over your lifetime. Hint: Figure
out a way to estimate how often some of the cells in your body are replaced.
3 ATP Synthesis
Estimate the daily ATP synthesis by ATP synthases in a human. To do this estimate, imagine a typical human diet and hypothesize that roughly half of the caloric input is converted
into ATPs. What is the mean rate of ATP synthesis per cell in the human body resulting
from this estimate? If each ATP synthase synthesizes roughly 100 ATPs per second (see
Alberts, MBoC, chap. 14), what does this imply about the number of such ATP synthases
per cell? Note: the whole point of this question is to give us a feeling for the numbers; the
actual numbers might differ substantially depending on cell type and physiology.
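One possible way to chain the steps of this estimate is sketched below in Python. Only the factor of one half and the 100 ATPs per second per synthase come from the problem. The daily caloric intake, the energy per mole of ATP, and the number of cells in the body are assumptions chosen for illustration and should be replaced with your own estimates.

```python
AVOGADRO = 6.022e23
SECONDS_PER_DAY = 86_400
FRACTION_TO_ATP = 0.5            # from the problem statement
ATP_PER_SYNTHASE_PER_S = 100     # from the problem statement

# Assumptions (not given in the problem); substitute your own estimates.
DIET_KCAL_PER_DAY = 2000.0       # assumed typical daily caloric intake
KCAL_PER_MOL_ATP = 12.0          # assumed energy stored per mole of ATP
CELLS_IN_BODY = 3e13             # assumed number of cells in a human

# Energy routed to ATP -> moles of ATP -> molecules of ATP per day.
atp_per_day = DIET_KCAL_PER_DAY * FRACTION_TO_ATP / KCAL_PER_MOL_ATP * AVOGADRO

# Mean synthesis rate per cell, then the number of synthases needed to sustain it.
atp_per_cell_per_s = atp_per_day / SECONDS_PER_DAY / CELLS_IN_BODY
synthases_per_cell = atp_per_cell_per_s / ATP_PER_SYNTHASE_PER_S

print(f"ATP per day:          {atp_per_day:.1e}")
print(f"ATP per cell per s:   {atp_per_cell_per_s:.1e}")
print(f"ATP synthases / cell: {synthases_per_cell:.1e}")
```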
E. coli by the Numbers
4 Is There Enough Time to Replicate the Genome?
E. coli’s circular genome is about 5 million base pairs long. A special sequence called the
origin of replication recruits two replisomes that replicate DNA as they move in opposite
directions. Each replisome moves at a rate of about 200-1000 bp/s.
(a) How long does it take for these two replisomes to make a copy of the E. coli genome?
(b) How can E. coli manage to replicate as fast as every 20 minutes?
(Problem adapted from Moran et al., Cell 141:1262-1262 e1 (2010).)
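The arithmetic for part (a) can be sketched in a few lines of Python. The snippet assumes that the two replisomes each copy half of the genome at a constant rate, and evaluates both ends of the quoted rate range.

```python
GENOME_BP = 5_000_000   # genome length, from the problem statement
N_REPLISOMES = 2        # two replisomes moving in opposite directions


def replication_time_min(rate_bp_per_s):
    """Minutes needed for the replisomes to copy the whole genome."""
    return GENOME_BP / (N_REPLISOMES * rate_bp_per_s) / 60


for rate in (200, 1000):
    print(f"{rate:5d} bp/s -> {replication_time_min(rate):6.1f} min")
```

Comparing these times with the 20-minute division time is the starting point for part (b).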
5 E. coli in culture
(a) A saturated E. coli culture contains approximately 10^9 cells/mL. What’s the mass density
of such a culture? What’s the mean spacing between cells?
(b) DNA replication in E. coli introduces 10^-9 mutations/bp on each DNA strand. Estimate
the total number of mutations introduced in a saturated culture. Hint: Estimate the number
of cell divisions in the last round of replication before reaching saturation.
(Problem adapted from Moran et al., Cell 141:1262-1262 e1 (2010).)
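A minimal Python sketch of part (a) follows. The mean spacing uses only the cell density from the problem. The mass density additionally needs the mass of a single cell, which is not given; the value of 1 pg used here is an assumption to be replaced with your own estimate.

```python
CELLS_PER_ML = 1e9     # saturated culture, from the problem statement
UM3_PER_ML = 1e12      # 1 mL = 1 cm^3 = (1e4 um)^3
CELL_MASS_PG = 1.0     # ASSUMPTION: mass of one E. coli cell in picograms

# Each cell "owns" a cube of medium; the cube's side is the mean spacing.
volume_per_cell_um3 = UM3_PER_ML / CELLS_PER_ML
mean_spacing_um = volume_per_cell_um3 ** (1 / 3)

# Mass of cells per mL of culture (1 pg = 1e-9 mg).
mass_density_mg_per_ml = CELLS_PER_ML * CELL_MASS_PG * 1e-9

print(f"mean spacing  ~ {mean_spacing_um:.0f} um")
print(f"mass density  ~ {mass_density_mg_per_ml:.1f} mg/mL")
```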
6 Mind your media: carbon, nitrogen, and phosphate content of cells
Minimal growth medium for bacteria such as E. coli includes various salts with characteristic
concentrations in the mM range and a carbon source. The carbon source is typically glucose
and it is used at 0.5% (a concentration of 0.5 g/100 mL). For nitrogen, minimal medium
contains ammonium chloride (NH4Cl) at a concentration of 0.1 g/100 mL.
(a) Make an estimate of the number of carbon atoms it takes to make up the macromolecular
contents of a bacterium such as E. coli. Similarly, make an estimate of the number of
nitrogens it takes to make up the macromolecular contents of a bacterium. What about
phosphate? Hint: You can use Table 1.
(b) How many cells can be grown in a 5 mL culture using minimal medium before the medium
exhausts the carbon? How many cells can be grown in a 5 mL culture using minimal medium
before the medium exhausts the nitrogen? Note that this estimate will be flawed because it
neglects the energy cost of synthesizing the macromolecules of the cell.
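The supply side of part (b) can be sketched as follows in Python: the snippet converts the medium concentrations into the number of carbon and nitrogen atoms available in a 5 mL culture. The molar masses are standard values; the number of atoms needed per cell is the result of part (a), so the value used in the example call is only a placeholder.

```python
AVOGADRO = 6.022e23
CULTURE_ML = 5.0

GLUCOSE_G_PER_100ML = 0.5     # from the problem statement
NH4CL_G_PER_100ML = 0.1       # from the problem statement
GLUCOSE_MOLAR_MASS = 180.16   # g/mol, C6H12O6
NH4CL_MOLAR_MASS = 53.49      # g/mol


def atoms_in_culture(g_per_100ml, molar_mass, atoms_per_molecule,
                     volume_ml=CULTURE_ML):
    """Number of atoms of one element supplied by a medium component."""
    grams = g_per_100ml * volume_ml / 100
    return grams / molar_mass * AVOGADRO * atoms_per_molecule


def max_cells(atoms_available, atoms_per_cell):
    """Cells that can be built if every atom ends up in biomass."""
    return atoms_available / atoms_per_cell


carbon_atoms = atoms_in_culture(GLUCOSE_G_PER_100ML, GLUCOSE_MOLAR_MASS, 6)
nitrogen_atoms = atoms_in_culture(NH4CL_G_PER_100ML, NH4CL_MOLAR_MASS, 1)
print(f"carbon atoms available:   {carbon_atoms:.1e}")
print(f"nitrogen atoms available: {nitrogen_atoms:.1e}")

# Placeholder (hypothetical) carbon content per cell; use your part (a) result.
print(f"cells from carbon: {max_cells(carbon_atoms, 1e10):.1e}")
```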
The Central Dogma by the Numbers
7 Transcription by the Numbers
If we consider a characteristic transcription rate of roughly 50 nucleotides/s (the range of
experimental values is, say, 10-70 nucleotides/s - see Bionumbers), and we note that the
footprint of the polymerase on the DNA is of order 50 nucleotides (the actual value is closer
to 60 nucleotides), compute the maximum rate at which the transcription apparatus can
produce new transcripts.
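One way to set up this estimate, sketched in Python, is to assume that polymerases are packed end to end on the gene, so that a new one can start only after the previous one has moved forward by one footprint. Under that assumption the maximum rate is the elongation speed divided by the footprint.

```python
def max_initiation_rate(speed_nt_per_s, footprint_nt):
    """Transcripts per second when polymerases are packed end to end."""
    return speed_nt_per_s / footprint_nt


# Values quoted in the problem: 50 nt/s, footprint of order 50 nt (closer to 60 nt).
for footprint in (50, 60):
    rate = max_initiation_rate(50, footprint)
    print(f"footprint {footprint} nt -> {rate:.2f} transcripts/s")
```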
Table 1: Observed macromolecular census of an E. coli cell. (Data from F. C. Neidhardt et al., Physiology of the Bacterial Cell, Sunderland, Sinauer Associates Inc., 1990
and M. Schaechter et al., Microbe, Washington DC, ASM Press, 2006.)
Substance                                 % of total dry weight    Number of molecules
Macromolecule
  Protein                                 55.0                     2.4 × 10^6
  RNA                                     20.4
    23S RNA                               10.6                     19,000
    16S RNA                               5.5                      19,000
    5S RNA                                0.4                      19,000
    Transfer RNA (4S)                     2.9                      200,000
    Messenger RNA                         0.8                      1,400
  Phospholipid                            9.1                      22 × 10^6
  Lipopolysaccharide (outer membrane)     3.4                      1.2 × 10^6
  DNA                                     3.1                      2
  Murein (cell wall)                      2.5                      1
  Glycogen (sugar storage)                2.5                      4,360
  Total macromolecules                    96.1
Small molecules
  Metabolites, building blocks, etc.      2.9
  Inorganic ions                          1.0
  Total small molecules                   3.9
8 Translation by the numbers
(a) Ribosomes can synthesize new proteins at a rate of roughly 20 aa/s. Following logic
similar to that we used for transcription, use the fact that the footprint of the ribosome on
the mRNA is roughly 35 nucleotides wide and determine the maximal rate at which new
proteins could be synthesized.
(b) Given a lifetime of an mRNA molecule in E. coli of about 5 min, how many proteins can
be produced off of this transcript during its lifetime?
(c) If you use this as the mean number of proteins per gene, how many total proteins would
you estimate in E. coli?
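The three parts can be chained in a short Python sketch. It assumes ribosomes packed end to end on the mRNA, as in the transcription estimate, and converts the footprint from nucleotides to codons. The number of genes used for part (c) is an assumption that is not stated in the problem.

```python
ELONGATION_AA_PER_S = 20    # from the problem statement
FOOTPRINT_NT = 35           # from the problem statement
NT_PER_CODON = 3
MRNA_LIFETIME_S = 5 * 60    # 5 min, from the problem statement
N_GENES = 4400              # ASSUMPTION: rough number of E. coli genes

# (a) A new ribosome can load once the previous one has cleared its footprint.
footprint_codons = FOOTPRINT_NT / NT_PER_CODON
max_proteins_per_s = ELONGATION_AA_PER_S / footprint_codons

# (b) Proteins made from one mRNA before it is degraded.
proteins_per_mrna = max_proteins_per_s * MRNA_LIFETIME_S

# (c) Treat (b) as the mean number of proteins per gene.
total_proteins = proteins_per_mrna * N_GENES

print(f"(a) {max_proteins_per_s:.1f} proteins/s per mRNA")
print(f"(b) {proteins_per_mrna:.0f} proteins per mRNA lifetime")
print(f"(c) {total_proteins:.1e} proteins per cell")
```

The result of part (c) can be compared with the protein count listed in Table 1.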
Flies by the numbers
9 Making the Fly
In this problem, we carry out estimates about the number of cells that make different structures in the fly. Describe your calculations in detail.
(a) Estimate the number of cells in the fly wing. At three hours into development, there are
about 6 cells that will make the future wing imaginal disc. Calculate the number of divisions
to get from this stage to an adult wing and the time between mitoses. To learn more about
this, you might want to check out Abouchar et al. (2014), J R Soc Int 11:20140443 and
Garcia-Bellido et al. (1979), Scientific American 241:102.
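The counting in part (a) can be sketched in Python under the simplifying assumption that every cell divides at every round, so that the population doubles each time. The final cell number and the total growth time in the example are hypothetical placeholders; your own estimates from the references go there.

```python
import math


def divisions_needed(initial_cells, final_cells):
    """Rounds of doubling needed to go from initial_cells to final_cells."""
    return math.log2(final_cells / initial_cells)


def time_between_mitoses_h(total_time_h, n_divisions):
    """Mean time between divisions if they are evenly spaced."""
    return total_time_h / n_divisions


INITIAL_CELLS = 6          # from the problem statement
FINAL_CELLS = 50_000       # placeholder (hypothetical), replace with your estimate
GROWTH_TIME_H = 100.0      # placeholder (hypothetical), replace with your estimate

n_div = divisions_needed(INITIAL_CELLS, FINAL_CELLS)
print(f"divisions needed:      {n_div:.1f}")
print(f"time between mitoses:  {time_between_mitoses_h(GROWTH_TIME_H, n_div):.1f} h")
```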
(b) How many cells are there in a fly eye? Use this chance to learn about ommatidia, and
the number of cells and of photoreceptors.
10 Transcription and translation in development
(a) The average length of a gene in Drosophila melanogaster is about 11 kb and the average
elongation rate of a transcript is about 1.2 kb/min. How does the time to produce an
average mRNA compare to the nuclear cycle times in the initial stages of fly development?
For example, nuclear cycles 9 through 11 last no more than 6 minutes, while nuclear cycle
12 lasts about 10 minutes and cycle 13 on the order of 12 minutes. How do the genes that
are actually expressed in this stage compare to the average gene in terms of their lengths?
You can search for the size of these genes by going to flybase.org and searching for hb
(hunchback), gt (giant), Kr (Krüppel) and kni (knirps).
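The comparison in part (a) amounts to one division, sketched below in Python together with the inverse question of how long a gene can be and still be fully transcribed within one cycle. The sketch assumes, as a simplification, that transcription can proceed during the entire nuclear cycle.

```python
GENE_LENGTH_KB = 11.0          # average gene length, from the problem statement
ELONGATION_KB_PER_MIN = 1.2    # from the problem statement

# Upper bounds on nuclear cycle durations quoted in the problem (minutes).
CYCLE_MIN = {"9-11": 6.0, "12": 10.0, "13": 12.0}

transcription_time_min = GENE_LENGTH_KB / ELONGATION_KB_PER_MIN


def max_gene_length_kb(cycle_min):
    """Longest gene that can be transcribed end to end within one cycle."""
    return cycle_min * ELONGATION_KB_PER_MIN


print(f"average gene: {transcription_time_min:.1f} min to transcribe")
for cycle, minutes in CYCLE_MIN.items():
    fits = "fits" if transcription_time_min <= minutes else "does not fit"
    print(f"cycle {cycle:>4}: {minutes:4.0f} min, "
          f"max length {max_gene_length_kb(minutes):4.1f} kb, average gene {fits}")
```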
(b) When Drosophila eggs are laid they already contain mRNA for several “maternal factors”. Bicoid is an example of such a factor. Its mRNA is localized at the anterior end of
the embryo, serving as a source of Bicoid protein. It is essentially stable up until the end
of nuclear cycle 14 when it gets actively degraded. In this problem we want to estimate the
number of mRNA molecules deposited in the embryo by its mother from measurements of
the number of Bicoid proteins at nuclear cycle 14. Assume that all Bicoid is localized to the
nuclei, which at cycle 14, approximately 120 minutes after the egg is laid, have a radius of
about 3.3 µm. Use the data shown in Figure 1(B) in order to estimate the total number
of Bicoid molecules in the whole embryo at this point. Assuming that translation of bicoid
mRNA is constant, estimate the number of mRNA molecules that led to your calculated
number of Bicoid proteins. You might find it useful to estimate the number of ribosomes
per kb on a transcript from Figure 2 and to use the translation rate discussed in Problem 8.
[Figure 1, panels (A) to (C). Panel (B) plots [Bcd]nuc (nM) against x/L; panel (C) plots raw anti-Hb intensity against raw anti-Bcd intensity.]
Figure 1: Spatial patterning of Bicoid and Hunchback. (A) Fluorescent image showing
Bicoid (green) and Hunchback (red) which are both localized to the anterior half of the
embryo. DNA is stained with a different color (blue) allowing for individual nuclei to be
identified. (B) Quantification of the bicoid gradient as a function of relative position with
respect to the egg length on the dorsal (red) and ventral (blue) sides of the embryo. The
black points denote the background fluorescence intensity of an embryo that does not harbor
fluorescent proteins. (C) Scatter plot of Hunchback versus Bicoid fluorescence intensity from
1299 identified nuclei in a single embryo shows a sharp onset of Hunchback with increasing
Bicoid concentration. (Adapted from T. Gregor et al., Cell 130:153, 2007.)
[Figure 2, image labels: ribosomes, RNA, DNA, transcription start.]
Figure 2: Electron microscopy image of simultaneous transcription and translation. The
image shows bacterial DNA and its associated mRNA transcripts, each of which is occupied
by ribosomes. (Adapted from O. L. Miller et al., Science 169:392, 1970.)
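A useful first step for part (b) is converting a nuclear concentration in nM into a number of molecules per nucleus. The Python sketch below does this for a spherical nucleus of radius 3.3 µm, as given in the problem. The list of concentrations in the example call is a hypothetical placeholder; the actual values should be read off Figure 1(B).

```python
import math

AVOGADRO = 6.022e23
NUCLEAR_RADIUS_UM = 3.3    # from the problem statement
LITERS_PER_UM3 = 1e-15     # 1 um^3 = 1e-15 L

# Volume of one spherical nucleus, in liters.
nuclear_volume_l = 4 / 3 * math.pi * NUCLEAR_RADIUS_UM ** 3 * LITERS_PER_UM3


def molecules_per_nucleus(conc_nm):
    """Molecules in one nucleus at a nuclear concentration given in nM."""
    return conc_nm * 1e-9 * AVOGADRO * nuclear_volume_l


def total_molecules(concs_nm):
    """Sum over nuclei, given one concentration (nM) per nucleus."""
    return sum(molecules_per_nucleus(c) for c in concs_nm)


print(f"1 nM corresponds to {molecules_per_nucleus(1.0):.0f} molecules per nucleus")

# Hypothetical placeholder concentrations for three nuclei (nM).
print(f"example total: {total_molecules([10.0, 5.0, 1.0]):.0f} molecules")
```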
11 Mutation correlation and physical proximity on the gene
In Section 4.6.1 of Physical Biology of the Cell, Sturtevant’s analysis of mutant flies that
culminated in the generation of the first chromosome map is briefly described. For a more
detailed explanation refer to Sturtevant, Journal of Experimental Zoology, 14:43 (1913) (a
version of this paper with a modern introduction can be found on the course website).
In Table 2, we show the crossover data associated with the different mutations that he
used to draw the map. A crossover refers to a chromosomal rearrangement in which parts
of two chromosomes exchange DNA. An illustration of the process is shown in Figure 3.
The six factors looked at by Sturtevant are B, C, O, P, R, and M. Flies recessive in B, the
black factor, have a yellow body color. Factors C and O are completely linked: they always
go together and flies recessive in both of these factors have white eyes. A fly recessive in
factor P has vermilion eyes instead of the ordinary red eyes. Finally, flies recessive in R have
rudimentary wings and those recessive in M have miniature wings. For example, the fraction
of flies that presented a crossover of the B and P factors is denoted as BP. Assume that the
frequency of recombination is proportional to the distance between loci on the chromosome.
Reproduce Sturtevant’s conclusions by drawing your own map using the first seven data
points from Table 2.
Keep in mind that shorter “distances” are more reliable than longer ones because the latter
are more prone to double crossings. Are distances additive? For example, can you predict
the distance between B and P from looking at the distances B(C,O) and (C,O)P? What is
the interpretation of the last two data points from Table 2?
Table 2: Fraction of crossovers of six sex-linked factors in Drosophila. (Adapted from A. H.
Sturtevant, J. Exp. Zool. 14:43, 1913.)
Factors      Fraction of crossovers
BR           115/324
B(C,O)       214/21736
(C,O)P       471/1584
(C,O)R       2062/6116
(C,O)M       406/898
PR           17/573
PM           109/458
BP           1464/4551
BM           260/693
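The additivity check can be set up as in the Python sketch below, which converts the crossover fractions from Table 2 into map distances (percent crossovers) and compares the sum of B(C,O) and (C,O)P with the measured BP value. The function and variable names are illustrative choices, not part of the problem.

```python
from fractions import Fraction

# Crossover fractions from Table 2.
CROSSOVERS = {
    "BR": Fraction(115, 324),
    "B(C,O)": Fraction(214, 21736),
    "(C,O)P": Fraction(471, 1584),
    "(C,O)R": Fraction(2062, 6116),
    "(C,O)M": Fraction(406, 898),
    "PR": Fraction(17, 573),
    "PM": Fraction(109, 458),
    "BP": Fraction(1464, 4551),
    "BM": Fraction(260, 693),
}


def map_units(pair):
    """Map distance in percent crossovers for a pair of factors."""
    return float(CROSSOVERS[pair]) * 100


for pair in CROSSOVERS:
    print(f"{pair:8s} {map_units(pair):6.2f}")

# Additivity check: does B-(C,O) plus (C,O)-P predict B-P?
predicted_bp = map_units("B(C,O)") + map_units("(C,O)P")
observed_bp = map_units("BP")
print(f"predicted BP = {predicted_bp:.2f}, observed BP = {observed_bp:.2f}")
```

The same comparison can be repeated for other triplets of factors, for instance (C,O), P and M, to see how additivity holds up at longer distances.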
Figure 3: Crossing over of chromosomes. (A) Chromosomes before crossing over showing
two loci labeled P and M. (B) Illustration of the crossing over event. (C) Chromosomes after
crossover.