Download microbial genetics

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts

List of types of proteins wikipedia , lookup

Nucleic acid analogue wikipedia , lookup

Biosynthesis wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Transcript
MICROBIAL GENETICS
DNA Replication
Prof. Rupinder Tewari
Professor
Department of Biotechnology
Panjab University
Sector 14, Chandigarh - 160 014
E-mail: [email protected]
26- Oct-2006 (Revised 12-Dec-2007)
CONTENTS
DNA Replication
Models for DNA Replication
Bidirectional Nature of DNA Replication
5′-3′ Semi-discontinuous Synthesis of DNA
DNA Replication Requires Many Enzymes and Proteins
Mechanism of DNA Replication
Eukaryotic DNA Replication
DNA Replication – Enzymes, Accessary Proteins and Inhibitors
DNA Helicases
Topoisomerases
Primases
DNA Polymerases
Ligases
Accessory Proteins
Inhibitors of DNA Replication
Concepts of Gene Expression
Introduction
Transcription
Transcription in Prokaryotes
Transcription in Eukaryotic Cells
Processing of Primary mRNA Transcript
Translation
Ribosomes
tRNAs
Aminoacyl RNA Synthetases
Protein Factors
Mechanism of Translation
Translation in Eukaryotes
Regulation of Gene Expression
Prokaryotic and Eukaryotic Gene Expression: a Comparison
Eukaryotic Gene Regulation
Key words
DNA Replication; Transcription; Translation; Gene regulation: Gene expression.
DNA Replication
Deoxyribonucleic acid (DNA) contains the genetic instructions for the biological
development of a cellular form of life including some viruses. DNA is an antiparallel double
helix molecule with sugar-phosphate backbone on the outer side and nitrogen bases in the
inner side. The bases are paired specifically, also known as complementary pairing, Adenine
(A) with Thymine (T), and Guanine (G) with Cytosine (C) by two and three hydrogen bonds,
respectively. DNA is a long polymer of nucleotides (a polynucleotide) that transfers the
information to RNA (mRNA) that in turn controls the sequence of amino acid residues in
proteins, using the genetic code. James Watson and Francis Crick proposed the DNA double
helix structure in 1953 and shared the the Noble Prize in 1962 with Maurice Wilkins.
Models for DNA Replication
While proposing their double helix model, Watson and Crick were particularly excited
because the complementary nature of the DNA molecule suggested a self-replicating
mechanism, a property which would be crucial for a genetic material. Three hypothesis were
in contention regarding the mode of self-replication of DNA and these were (Fig. 1):
i)
Conservative replication according to which the original strand is unchanged and a
completely new DNA molecule is synthesized.
ii)
Semi-conservative replication describes a way in which each DNA strand of a
double helical molecule serves as a template for the synthesis of two new DNA
molecules each with one new and one old strand.
iii) Dispersive replication suggests more-or-less random interspersion of parental and
new segments in daughter DNA molecules.
2
In 1957, Matthew Meselson and Franklin Stahl, proved that DNA is replicated by semiconservative mode of replication. Their experiment involved growing a culture of
Escherichia coli in a medium containing 15NH4Cl (ammonium chloride labeled with the
heavy isotope of nitrogen) for several generations. A small aliquot of culture was taken out,
it’s DNA extracted and was run on CsCl density gradient centrifugation. All the DNA was
seen to settle at a point as a single band. From this culture, some cells were transferred to a
normal growth medium (containing non- labeled 14NH4Cl). Samples were taken after 20
minutes (i.e. after one cycle of cell division) and 40 minutes (i.e. after two cell divisions) of
incubation. DNA was extracted from each sample and subjected to density gradient
centrifugation. The results are depicted in Fig.2. After 20 minutes all the DNA contained
similar amounts of 14N and 15N, but after 40 minutes two bands were seen, one corresponding
to hybrid 14N-15N-DNA, and the other to DNA molecules made entirely of 14N. This was
inferred from the different patterns of DNA band(s) after density gradient centrifugation.
The banding pattern seen after 20 minutes enables conservative replication to be discounted
because this scheme predicts that after one round of replication there will be two different
types of double helix, one containing just 15N and the other containing just 14N. The single
14
N-15N-DNA band that was actually seen after 20 minutes is compatible with both dispersive
and semi-conservative replication, but the two bands seen after 40 minutes are consistent
only, with semi-conservative replication. Dispersive replication should continue to give rise
to hybrid 14N-15N molecules after two rounds of replication, whereas the granddaughter
molecules produced at this stage by semiconservative replication include two that are made
entirely of 14N-DNA and 14N /15N-DNA.
3
Bidirectional Nature of DNA Replication
Once the semi-conservative mode of replication of DNA was confirmed, scientists started
asking new questions, e.g. a) is there a discreet region on DNA for initiating DNA
replication?, b) is replication unidirectional or bidirectional? and c) Does both DNA strands
unwind partially or completely during replication? Answer to these problems were provided
by J. Cairns in 1963. He was conducting experiments with the bacterium, Escherichia coli to
prove semi-conservative mode of replication using autoradiography technique. This
experiment was based on the fact that radioactive atoms exposed to photographic film will
form visible grains on the film which provide an estimate of quantity of radioactive material
present. Cairns grew E.coli cells in media containing radioactive thymidine (with 3H), a
component of DNA. He then extracted the DNA, examined it under electron microscope and
coupled it with autoradiography by placing a photographic film on the same. His
interpretation of autoradiograph revealed several points. First, that E.coli DNA was circular.
Second, The radioactive impression on film formed a structure which was similar to Greek
letter theta (θ) thereby suggesting that DNA replicated while maintaining its integrity that is
it didn’t appear to break during the process of DNA replication (Fig.3). Third, replication
appeared to be occurring at one or two Y-junctions or replication forks in the loop so formed.
The DNA is unwound and replication proceeded in one or both directions. In some samples,
radioactivity was not applied till the replication fork (dynamic ends of loops where parental
strands are unwound and separated strands are replicated) had begun. So the radioactivity
label appeared after the theta structure had already begun forming. By comparing the
distribution of silver grains on autoradiograph, Cairns found the replication to be
bidirectional. Also the replication loops always originate at a particular point called as origin
of replication which in E.coli is known as OriC. The length of DNA that is replicated
following one initiation event at a single origin is a unit referred as a replicon. Thus, bacterial
chromosome, such as in E.coli can be considered as a replicon.
4
5′-3′ Semi-discontinuous Synthesis of DNA
The work of several scientists in the subsequent years provided the critical inputs on the
mechanism of DNA replication. A new strand of DNA is synthesized, by DNA polymerase,
always in 5′-3′ direction and thus the incoming nucleotide is added on to the 3′-OH group of
the last nucleotide in the growing daughter DNA strand. Till date, no DNA polymerase has
been discovered that can synthesize DNA in 3′-5′ direction. As the two DNA strands are
antiparallel the strand acting as template is read from its 3′ end towards 5′ end. In a
replication fork, as observed by Cairns, if both of the strands are to be synthesized
simultaneously then one of the strands would have to grow in 3′-5′ direction. But as
mentioned earlier, DNA polymerase with 3′-5′ biosynthetic activity does not occur. The other
possibility is that the two strands completely unwind which was also negated by the work of
Cairns. So the question is, how both DNA daughter strands are synthesized at the same time?
This problem was solved by Reiji Okazaki and his colleagues in 1960s. They showed that
while one of the strands is synthesized in short segments called Okazaki fragments (ranging
in length from few hundred to few thousand nucleotides), the other follows a continuous
replication. They thus proposed that one strand is synthesized continuously, called the leading
strand, in which the 5′-3′ synthesis of new strand moves in the same direction as that of the
replication fork. The other undergoes discontinuous synthesis in the form of Okazaki
fragments called lagging strand. While in the latter, the 5′-3′ synthesis is maintained it
proceeds in direction opposite to the direction of fork movement.
DNA Replication Requires Many Enzymes and Proteins
Arthur Kornberg and his colleagues (1955) purified and crystallized DNA polymerase from
E.coli cells and called it as DNA polymerase I. This enzyme had the ability to polymerize
deoxynucleotides to synthesize new daughter DNA strands. However, subsequent studies
revealed more DNA polymerases, and it turned out that DNA polymerase III, and not DNA
polymerase I, was the key enzyme involved in DNA replication.
The process of DNA replication is very complex and requires the involvement of nearly 30
proteins even in a prokaryotic organism like E.coli. The number is much more in case of
eukaryotic DNA replication. Briefly, enzymes/proteins are needed for unwinding of DNA,
removal of topological tension in DNA strands due to unwinding, synthesis of DNA,
proteins/enzymes for correcting errors in newly synthesized DNA strands, prevention of
ssDNA from digestion by nucleases etc. These enzymes/proteins have been more or less
characterized in this organism and are described later in this chapter.
Mechanism of DNA Replication
The mechanism of DNA replication in all forms of life i.e. prokaryotes and eukaryotes is
essentially the same though it is more complex in eukaryotes. The basic steps involved in
DNA synthesis are: Initiation, Elongation and Termination.
Initiation
In E.coli, DNA replication gets started at a discreet region of the chromosome called origin
of replication (oriC). This region consists of 245bp and has many unique features which are
highly conserved in bacterial kingdom. This locus contains two series of short repeats: a) a
5
tandem array of three 13bp sequence that are rich in A=T base pairs and b) four repeats of a
9bp sequence (Fig. 4).
The initiation phase of DNA replication requires at least nine enzymes/proteins as listed in
Table 1.
Table 1: Proteins/enzymes involved in initiation step of DNA replication in Escherichia
coli
Protein/enzyme
Size
No. of Function
(KDa) subunits
DnaA protein
52.00
1
Recognition of oriC region in DNA;
denaturation of oriC region
Helicase (DnaB protein)
300.00 6
Unwinding of dsDNA
DnaC protein
29.00
1
Assists helicase to bind to DNA
HU
19.00
2
Helps DnaB protein in denaturation of
DNA
Primase (DnaG protein)
60.00
1
Synthesis of RNA primers
Single strand DNA binding 75.60
4
Binding to ssDNA
proteins (SSB)
RNA polymerase
454.00 5
Promotes DnaA protein activity
DNA
gyrase
(DNA 400.00 4
Relieves topological strain generated
topoisomerase II)
by DNA unwinding
Dam methylase
32.00
1
Methylates (5′) GATC palindromic
sequences present in oriC locus
6
To begin with, nearly 20 DnaA protein molecules bind to four 9 bp repeats of OriC. These
Dna molecules in conjunction with histone like protein molecules, called HU proteins, and
ATP molecules denatures the DNA in the A=T rich region of conserved sequence. This is
followed by the binding of DnaB proteins to the A=T rich regions of 13 bp repeats, which in
turn is assisted by another protein, DnaC and ATP molecules. DnaB protein in fact is an
enzyme called helicase which unwinds the double stranded DNA to create a localized region
of single stranded form. Another set of protein called single stranded DNA binding proteins
(SSB) attach to the unwound DNA strands. Their function is to prevent binding of ssDNA
forms into their native state i.e. dsDNA form and to prevent its digestion by nucleases. The
unwound part of dsDNA bound to SSB, called prepriming complex, is the region where
DNA polymerase will initiate synthesis of new DNA. The unwinding of short stretch of
DNA results in putting tension, also called topological stress, on the dsDNA molecule. This
tension is relieved by the involvement of another enzyme, DNA gyrase or topoisomerase II.
Initiation of DNA replication is regulated, such that replication occurs only once during a cell
cycle. This control is mediated by DNA methylation. An enzyme Dam (DNA Adenine
Methylation) methylase, methylates the N-6 position of adenine located within the
palindromic sequence (5′) GATC in the oriC. The 245 bp oriC has 11 such sequences as
compared to one per 256bp in rest of the E.coli chromosome. Newly synthesized DNA strand
is not methylated. Therefore, immediately after replication of DNA, the hemimethylated
DNA (one strand methylated and other not methylated) gets bound to cell membrane of
bacteria. In this situation, DNA cannot be replicated. After a while, oriC region of hemimethylated DNA is released from cell membrane, gets methylated at specific places (N-6 of
adenines in GATC region) and is ready for next round of DNA replication. As per Cairns
model of DNA replication, both parental strands act as templates simultaneouly for synthesis
of two daughter complementary DNA strands. This type of synthesis generates what is called
a replication fork as it resembles a two-pronged fork. With the local unwinding of dsDNA,
two template strands now become available for the initiation of replication. All known DNA
polymerases have a common property i.e. they cannot initiate DNA replication or in other
words all of them require the presence of a short stretch of nucleotides, called as primer, on to
which incoming deoxynucleotides are added. During DNA synthesis, an RNA polymerase
primase or primase adds a short stretch (10-60 nucleotides long) of RNA nucleotides called
as primer, complementary to one of the parental DNA strands.
Elongation
Once the primer has been formed, DNA polymerase starts adding new nucleotides and thus
the DNA daughter strands start to grow. The addition of nucleotides is very fast . Nearly one
thousand nucleotides per second are incorporated in both leading and lagging strands. As
described earlier, the new DNA strand is always synthesized, by DNA polymerase, in 5′-3′
direction and incoming nucleotide attaches to the 3′-OH group of the last nucleotide in the
growing daughter DNA strand. This process continues as the longer template stretch is made
availably by progressive unwinding of the dsDNA. In order to maintain the 5′- 3′ direction of
DNA synthesis on two antiparallel template strands, Reiji Okazaki and his colleagues in
1960s proposed that one of the DNA strands is synthesized in small stretches called Okazaki
fragments, ranging in length from few hundred to few thousand nucleotides (Fig.5).
This strand, whose DNA synthesis is discontinuous, is termed as lagging strand The other
DNA daughter strand is synthesized continuously, called the leading strand, in which the 5′-3′
synthesis of new strand moves in the same direction as that of the replication fork. The
7
direction of synthesis of lagging strand is opposite to the direction of fork movement. The
discontinuous synthesis of lagging strand in 5′-3′ DNA direction gives rise to overall growth
in 3′-5′ direction (Fig.5). It is interesting to note that the DNA Pol III molecule , which
synthesizes leading strand also synthesizes lagging strand. This can be understood by
examining the structure of this enzyme. DNA Pol III is a dimeric enzyme consisting of 10
different subunits. Each subunit is responsible for the synthesis of both leading and lagging
strands. A small region of the template strand, responsible for the synthesis of lagging strand,
makes a loop. The orientation of the loop is such that the direction of the DNA fragment i.e.
Okazaki fragment to be synthesized is the same as that of leading strand. After synthesis of
nearly 1000 nucleotides in the lagging strand, DNA Pol III enzyme let go the lagging strand.
A new loop is then formed and a new cycle of DNA synthesis starts.
8
Later on, the primer sequence from leading as well as individual Okazaki fragments is
excised and replaced by deoxynucleotides by DNA polymerase I. The nicks are sealed by
DNA ligase. The ligation is carried out in the presence of NAD+ in bacteria and ATP in case
of viruses and eukaryotes. Table 2 lists the role of individual proteins in the elongation of
DNA strands during DNA replication.
Table 2: Role of individual proteins in the elongation of DNA strands during DNA
replication
Protein
SSB
Size
(KDa)
75.6
DnaB (helicase)
DnaG (primase)
DNA polymerase III
DNA polymerase I
300.00
60.00
791.50
103.00
DNA ligase
DNA gyrase
(Topoisomerase II)
74.00
400.00
No. of Function
subunits
4
SSB proteins coat ssDNA and prevents
its degradation by nucleases
6
Unwinding of dsDNA
1
Synthesis of RNA primer
17
Synthesis of DNA strands
1
Filling of gaps in DNA strands;
excision of RNA primers
1
Ligation
4
Supercoiling of DNA
Termination
Near the completion of the replication, the two replication forks meet at the terminus (Ter)
region containing multiple copies of Ter sequence. In this process, the Tus (terminus
utilization substance) protein binds to the Ter sequence and forms a Tus-Ter complex. This
complex halts the replication fork which ever enters first at the termination site. Ter sequence
prevents the over replication in case when one of the replication fork gets delayed due to
DNA damage or some other reason by stopping the first replication fork. There are six ter
sequences in a chromosome, three stop the fork coming from one side and the other three
stop the one coming from other side. The fork reaching later stops by colliding with the first
one. At the end of theta mode of replication, it generates a topologically intertwined two
circular molecules of chromosomal DNA, which are released by topoisomerase IV later on
and the ligase closes them up.
Eukaryotic DNA Replication
Although the mechanism of DNA synthesis in eukaryotes and prokaryotes is similar, DNA
replication in eukaryotes is much more complicated. Eukaryotic cells have more than a dozen
DNA polymerases. Two of these (sigma and delta) are important for the replication of
eukaryotic chromosomes. The rate of synthesis of DNA in eukaryotic cells is about one tenth
the rate of bacterial DNA synthesis. E.coli replication fork moves at a speed of 25000 base
pairs per minute as compared to only 2000 base pairs per minute in case of eukaryotes. At
this speed, the cells would take as much as 500 hours to replicate it’s DNA. To overcome this
hurdle, DNA of eukaryotes have many replicons. In yeast like Saccharomyces cerevisae,
which can be considered to be the eukaryotic equivalent of E. coli, these replicons are known
9
as Autonomously Replicating Sequences (ARS). There are
elements in 12 chromosomes of Saccharomyces cerevisae.
approximately 400 ARS
The mode of DNA replication in eukaryotes is similar to prokaryote’s i.e. semi-conservative,
in which one strand serves as the template for the second strand. Furthermore, DNA
replication occurs only at a specific step in the cell cycle. The first step in the eukaryotic
DNA replication is the formation of the pre-initiation replication complex (the pre-RC) which
occurs in G1 phase. The formation of the pre-RC is known as licensing, but a licensed preRC cannot initiate replication. Initiation of replication occurs during the S-phase. Thus, the
separation of licensing and activation ensures that the origin can only fire once per cell cycle.
DNA replication in eukaryotes is not very well characterized. However, researchers believe
that it begins with the binding of the origin recognition complex (ORC) to the origin of
replication sites. This complex is a hexamer of related proteins and remains bound to the
origin, even after DNA replication occurs. Once the complex is properly positioned around
origin of replication, DNA replication sets in motion. At each of the multiple origin sites, the
replication is bidirectional. DNA helicase unwinds parental DNA strand and SSB called
replication protein A molecules attach to ssDNA template strands. Formation of RNA
primers and daughter DNA strand, up to 20 nucleotides is carried out by DNA polymerase
alpha. Another protein called protein replication factor C (RFC) displaces DNA polymerase
alpha. RFC sequentially binds a protein called proliferative cell nuclear antigen (PCNA),
DNA polymerase delta. This complex then synthesizes long stretches of DNA strands. DNA
polymerase-δ has a 3′-5′ proofreading activity and has structural similarity to E.coli DNA
polymerase III. DNA polymerase € is used in DNA repair where it replaces the DNA
polymerase-δ or may be in the removal of primers of okazaki fragments. The short Okazaki
fragments are joined to one another, end to end, by DNA ligase. The replication continues in
both the directions until adjacent replicons meet and fuse.
In general, prokaryotic genome is single and circular, where as eukaryotic chromosomes are
more than one and linear. DNA polymerase finds it difficult to replicate DNA ends because
of topological constraints. DNA sequence analysis of the chromosome ends revealed that
they are made up of tandem repeats of GC rich hexanucleotide sequence called telomeres. In
human each telomere comprises of over hundred repeat sequences (GGGTTC). Telomeres
are synthesized by a special class of enzymes called telomerase. Finally, telomeres must be
capped by a protein to prevent chromosomal instability.
Each replicon is replicated only once per single cell division. This fact is controlled by a
group of proteins, which are involved in formation of ORC required for initiation of DNA
replication. Once replication has started, these factors are broken down by cellular proteases.
DNA Replication – Enzymes, Accessary Proteins and Inhibitors
At the very basic, we know that DNA is replicated by DNA-directed DNA polymerases,
which utilize the single stranded DNA as template to synthesize the daughter strand.
However, one cannot imagine the process of replication to be single handedly carried out by a
polymerase alone. Firstly, as mentioned above, the two strands have to be separated so as to
provide the needed single stranded template – a process which involves the action of a class
of enzymes known as the helicases. Moreover, none of the already discovered polymerases
can initiate replication, that is they can only elongate pre-existing strand. Hence, there arises
the need for another class of enzymes, namely, the primases. Thirdly, the lagging strand
synthesis is semi-discontinuous, thereby facilitating the requirement of ligases.
10
The above paragraph strives to justify the fact that the process of DNA replication involves
the interplay of a plethora of enzymes, as well as accessory proteins, many of which have
been understood in good amount of detail. What follows is a description of each of the
components of the replicative machinery.
DNA Helicases
Helicases constitute a class of enzymes that can move along a DNA duplex utilizing the
energy of ATP hydrolysis to separate the strands. In most prokaryotes, the separated strands
are inhibited from subsequently reannealing by a single-strand-binding protein (SSB
protein), which binds to both separated strands. Since a single stranded template is imperative
for the action of polymerases, almost all DNA polymerases acting on duplex DNA are
dependent on helicase activity.
In E.coli, over ten different helicases are present. These have varying polarities as well as
processivities. Helicases may traverse along the DNA strand in either the 5′-3′ or the 3′-5′
direction, while one helicase is known to move in either direction. Processivity is measured
by the number of nucleotides separated during each association event.
Among the different known helicases, the one that is best characterized in E.coli and is in
fact, central to its DNA replication is the DnaB protein. It is a hexamer of 50 kDa subunits.
The subunits are arranged so that there is a hole down the center into which the ssDNA fits.
The three-dimensional structure of the amino-terminal 141-residues of DnaB helicase has
been determined by both high-resolution NMR and x-ray crystallography. This is the domain
that is essential for the interaction with proteins such as primase and DnaC. It is a six-helix
bundle with a folding pattern that is quite unique to it. It processively unwinds the duplex
with a 5′-3′polarity. This enzyme possesses some rather distinctive features. Firstly, it is the
only enzyme present which can utilize not only ATP but also GTP and CTP, although ATP is
preferred. Moreover, melting can be independent of most of the replicative machinery. It does
not require the action of primase or DNA pol III though these when present do stabilize the
replication fork. Apart from its role in strand separation, the DnaB protein also interacts with
the primase, and is imperative for its activity. In keeping with its multiple roles, the DnaB
helicase has sites for, activation of primase, DnaC protein (which holds it in an inactive
state), single stranded DNA, double stranded DNA and bringing about the hydrolysis of ATP.
Topoisomerases
Enzymes belonging to this class solve the topological problems associated with DNA
replication by introducing temporary single- or double-strand breaks in the DNA.What
essentially happens is that for every 10.5 base pairs melted by the helicase, one link between
the two strands must be removed by introducing a strand breakage. In fact, in prokaryotes, a
negatively supercoiled template is a prerequisite for initiation of replication. However this is
a high energy state and hence topoisomerases are imperative to maintaining this energy state.
Topoisomerases are classified based on the type of strand break that they introduce.
Accordingly, the Type I enzymes, introduce a break into one strand of DNA while Type II
enzymes generate a break in both strands. Type I enzymes can act on single stranded circular
DNA or duplexes only if they contain a nick or a gap, while Type II enzymes can act on
even two covalently closed duplex circles.
11
In E.coli, the Type I enzymes include the Topoisomerase I as well as Topoisomerase III.
Topo I is a zinc dependent enzyme, which has a molecular weight of approx. 100 kDa. It is
essentially involved in relaxing of the negatively supercoiled DNA, in the course of
replication. In eukaryotes, Topo I, is a monomeric protein of approx 95 kD. In contrast to its
prokaryotic counterpart, the eukaryotic enzyme relaxes positive supercoils as well and favors
duplex regions over ssDNA. Topo III is encoded by the topB gene in E.coli and is a monomer
of 74 kDa. It possesses a significantly higher activity than Topo I as well as DNA gyrase,
when it comes to separation of nicked or interlocked circles. It is present in only one to ten
copies per cell.
The E.coli DNA gyrase, also know as Topoisomerase II, is a type II enzyme. Although
gyrase is evolutionarily related to other topoisomerases, it is the only enzyme of this group
that can introduce supercoils into DNA. ATP is hydrolysed in this process. It consists of two
subunits that form an A2B2 complex, with a molecular weight close to 400 kDa. The GyrA
subunit is encoded by the gyrA (nalA) gene and has a molecular weight of 96,756 Da. The
GyrB subunit, is encoded by the gyrB (cou) gene and has a molecular weight of 89762 Da.
The A protein consists of an amino-terminal domain (59-64 kDa), which is involved in DNA
breakage and reunion, and a carboxy-terminal domain (33 kDa), which is involved in DNAprotein interactions. The B protein consists of an amino-terminal domain (43 kDa), which
contains the ATPase activity, and a carboxyterminal domain (47 kDa), which is involved in
interactions with the A protein and DNA. The eukaryotic Topo II enzyme is a homodimer,
and is involved in relaxation of negative as well as positive supercoils, but in contrast to the
bacterial gyrase does not introduce negative supercoils.
Primases
The fact that a free 3′–OH is imperative for the DNA polymerase to initiate replication, calls
for the need of an enzyme to catalyze the synthesis of primers. Moreover, while the leading
strand synthesis is continuous and hence needs to be primed just once, the discontinuous
synthesis of the lagging strand in the form of Okazaki fragments, involves synthesis of primer
for each one of these fragments.
Most prokaryotic systems have two enzymes. One is the RNA polymerase which also
happens to be the enzyme which mediates transcription. The other is the Primase which bears
great similarity to the former. However subtle differences do exist. Primase, the product of
the dnaG gene is the smaller of the two and is a single-stranded DNA dependent polymerase
as against RNA polymerase which is double stranded DNA-dependent polymerase. Moreover
inhibition studies using the RNA polymerase inhibitor rifampicin have shown that primase,
which is insensitive to rifampicin, initiates lagging strand synthesis, since the presence of this
inhibitor only stopped leading strand synthesis. Thus, it may be inferred that it is the lagging
strand synthesis, where the primase assumes prime importance. Moreover, by regulating the
synthesis of the primers for the lagging strand, primase plays a regulatory role in the DNA
replication process.
E. coli primase protein has 582 amino acids and is present in around 50 -100 copies per cell.
It has several characteristic motifs, among which motif1 is a zinc finger motif which apart
from being involved in the chain initiation step, may also play a role in conferring stability on
the primase. No particular function has been ascribed to motifs 2 and 3 while motifs 4, 5 and
6 have been postulated to be magnesium binding sites. One of the magnesium binding motifs
12
resembles the “active magnesium” motif found in all DNA and RNA polymerases. This motif
can be considered to be involved in phosphor-diester bond formation.
However, primase function is not independent and is a function of its interactions with
multiple accessory proteins as well as other components of the replicative machinery. Based
on its function at the replication fork, primase must minimally bind to 2 magnesium ions, 2
dNTPs, DnaB helicase and one or more of the ten subunits of DNA polymerase III
holoenzyme among other cofactors.
In eukaryotes, primase activity is generally a part of DNA polymerase α, which is composed
of four subunits including a large DNA polymerase subunit (approx 180 kDa), two small
subunits ( approx. 60 ad 50 kDA respectively) associated with primase activity and a 70 kDa
protein. Pol α is localized in the nucleus and has been implicated in chromosomal replication.
The primase activity of the two small subunits remains intact even when separated from the
larger subunits.
DNA Polymerases
Discovered by Arthur Kornberg in 1957, DNA Pol I was the first polymerase identified. The
principal functions carried out by this enzyme may be listed as polymerization,
pyrophosphorolysis, pyrophosphate exchange and two independent exonucleolytic
hydrolyses, one mediated from the 3′-5′ end and the other from the 5′-3′ end. Except for the
5′–3′ exonucleolytic activity which is present in the relatively small N- terminal fragment, all
the other activities are contained in the large C- terminal fragment (Klenow fragment). The
5′-3′ exonuclease activity is unique to DNA Pol I. The small N-terminal fragment can be
separated from rest of the structure by mild protease treatment. The enzyme without 5′-3′
activity domain is called large or Klenow fragment, and has the polymerization and proof
reading (3′-5′) activity. Klenow fragment comprises two domains: smaller domain (residues
324- 517) containing the 3′-5′ exonuclease activity, and the larger domain (residues 521-928)
containing the polymerase active site at the bottom of a cleft which is the binding site for the
B-DNA molecule.
The observation that E.coli mutants with a limited Pol I activity, could grow quite normally
stimulated the hunt for other enzymes, and this led to the subsequent identification of other
DNA polymerases which are listed in Table 3.
Table 3: Important DNA polymerases present in cells
Gene
No. of subunits
3′-5′
exonuclease
activity
(proof reading)
5′-3′ exonuclease activity
Rate of polymerization
Processivityx
DNA Pol I
DNA Pol II
DNA Pol III
polA
1
Yes
polB
polC
Equal/more than 4 Equal/more than 10
Yes
Yes
Yes
16-20
3-200
No
40
1,500
No
250-1000
Equal/more than 500,000
13
The major enzyme involved in DNA synthesis is Pol III also known as replicase. Its core
subunit has the composition αεθ and its catalytic properties resemble those of Pol I.
It is in fact, a part of the multisubunit enzyme, the Pol III holoenzyme, which contains atleast
10 types of subunits, as listed in the table 4. However processive DNA synthesis, a property
of DNA polymerase III holoenzyme of Escherichia coli, is not achieved by combining the
pol III core components. This is because processivity requires the association of the core with
the β subunit in the presence of the γ complex – γδδ’χψ. The β subunit confers almost
unlimited processivity to the enzyme, owing to its role as a clamp. Its 366 residue monomeric
subunits associate to form a dimer, which is doughnut shaped, having a diameter of 80 A°,
through which the DNA strand can thread out.
Table 4: Components of DNA Pol III holoenzyme
Subunit
Mass (KDa) Structural gene
α
ε
θ
130
27.5
10
polC
dnaQ
holE
τ
71
dnaX
γ
45.5
dnaX
δ
35
holA
δ'
33
holB
χ
15
holC
ψ
β
12
40.6
holD
dnaN
Eukaryotes contain at least 5 different types of DNA polymerases, named according to the
order in which they were discovered. Pol α (> 250 kDa) , a multi subunit protein, occurs only
in the cell nucleus, and plays a role in the replication of chromosomal DNA. It has an
associated primase activity. Pol δ(170 kDa), on the other hand exhibits a 3′- 5′ proofreading
activity. Moreover, considering that in association with a protein known as the proliferating
cell nuclear antigen (PCNA), it has almost unlimited processivity. It is assumed that the Pol
δ-PCNA complex is the leading strand replicase while, Pol α with its limited processivity is
the lagging strand replicase. Pol ε (256 kDa) is highly processive even in absence of PCNA,
thus indicating that it may play arole in DNA replication. However there is compelling
evidence to prove that it plays a major role in DNA repair. Pol β (36-38 kDa) has a relatively
small size and its biological function has not yet been ascertained. All these polymerases are
localized in the nucleus. Pol γ (160-300 kDa) occurs in mitochondria, wherein it probably
replicates the mitochondrial genome.
Mechanism of DNA Polymerase
The DNA polymerase catalyze the DNA synthesis at the 3′-OH of the existing primer (a short
RNA molecule, synthesized by primases). Unlike other enzyme’s active site dedicated to only
14
one substrate, DNA polymerase can incorporate any of the four types of deoxynucleotides,
due to the nearly identical geometry of these all. The polymerase incorporates a correct dNTP
in the growing strand by their in-built capacity to form hydrogen bonds with complementary
bases in the template strand and promote the catalysis to form phosphodiester bond in
between the 3′-OH and α-PO4 of the incoming nucleotide, rather than detecting each new
dNTP added. Incorrect base pairing dramatically lowers rates of nucleotide addition due to
catalytically unfavorable alignment of the substrates. This is called kinetic selectivity. The
DNA polymerases structure resembles a right hand shape with three domains being thumb,
fingers and palm and the DNA strand sits in the cleft formed between the fingers and thumb.
The palm domain is composed of β-sheets and form the primary elements of catalytic sites.
At this region, the two divalent ions (Mg++ and Mn++) bind, and alters the environment
around the 3′-OH of the primer and the incoming nucleotide. The affinity of 3′-OH for its H
is decreased by one of the ions and forms O3- which begins the nucleophillic attack to α-PO4
of the incoming nucleotide. The β-PO4- and γ-PO4- are stabilized by the other divalent ion for
the release of pyrophosphate for the formation of phosphodiester bond in between the primer
and the incoming nucleotide. Palm domain also monitors the accuracy of the dNTP addition.
This palm region makes hydrogen bonds with the nucleotides recently added in minor groove
of the DNA which is specific and any mismatch between the base pairs results in the slowed
catalysis of the polymerase and eventual release of DNA template from the cleft. Finger
domain also has some residues which bind to the incoming nucleotides and when correct a
base pair is formed between incoming nucleotide and template. The thumb domain helps in
strong association between DNA polymerase and its template. DNA replication is very
accurate. In E.coli the possibility of a mistake being made is only once every 109 -10 10
nucleotides added. For the E.coli chromosome of 4.6 X 106 it means that an error occur every
1000-10,000 replications. Discrimination between the correct and incorrect nucleotides
during polymerization doesn’t only rely on the hydrogen bonds between the nucleotides that
specify correct base pairing but it also depends on the correct geometry of the base pair which
is only able to fit into the active site of the enzyme. If there is incorrect base pair then it may
not fit into the active site and the nucleotide is excluded before the formation of the
phosphor-diester bond. The 3′-5′ exonuclease activity present in all DNA polymerases also
double checks every nucleotide after it has been incorporated. If a wrong nucleotide is added
the translocation of enzyme to the site where a new nucleotide is to be added is inhibited so it
removes any mispaired nucleotide and polymerase begins again. This activity is the Proof
reading activity of the polymerase.
Ligases
The requirement of an enzyme for sealing broken ends of DNA stems from the discontinuous
mode of synthesis of the lagging strand. Essentially, ligases catalyze the formation of a
phosphodiester bond between adjacent 3′-OH and 5′ –P termini of polynucleotides hydrogenbonded to a complementary strand. The E.coli ligase is a 75 kDa polypeptide and is present in
around 300 copies per cell. Eukaryotic systems possess two forms of DNA ligase. Ligase I is
a 125 kDa monomer, predominantly present in proliferating cells. Its C- terminal appears to
have the major portion of the enzyme activity. Ligase II, a 68 kDa heat-labile protein is more
involved in repair rather than replication. The reaction catalyzed by ligases involves three
basic steps:
1. Transfer of the adenyl group of NAD or ATP to the ε – NH2 of a lysine residue in the
enzyme resulting in formation of an enzyme – nucleotide intermediate.
2. Adenylyl activation of the 5′ P terminus of DNA.
3. Phosphodiester bond formation with release of AMP.
15
All prokaryotic ligases utilize NAD+ as the cofactor , while all eukaryotic and phage ligases
prefer ATP as the energy source for synthesis of the diester bond.
Accessory Proteins
While the major enzymes have been already described, what remains is a brief look at some
of the accessory proteins which have been implicated at various steps of DNA replication.
Single Strand Binding (SSB) Proteins
These may be defined as binding proteins with a strong preference for DNA over RNA and
for ssDNA over duplex DNA. E.coli SSB is present in around 270 copies per replication fork
and has a mass of 75.6 kDa. It is a tetaramer and contributes to the opening and unwinding of
a duplex at an origin of replication. Moreover it also sustains the unwinding action carried out
by helicases at replication forks and improves the fidelity of strand elongation by the DNA
polymerases.
DNA A Protein
Initiation of E.coli chromosomal replication requires the binding of approximately
20 molecules of DnaA protein to the DnaA boxes in oriC, promoting strand opening of the
AT-rich 13-mers, and facilitating the loading of DnaB helicase. DnaA protein is not
cytosolic, as might have been expected, but instead is a membrane-associated protein. DnaA
protein is structurally subdivided to four domains. The N-terminal domains I–II (amino acids
1–129) contain regions for associating with DnaB and DnaA itself. Domain III (amino acids
130–350) that contains an ATP-binding/ hydrolysis module also plays a role in associating
with DnaB and DnaA. The C-terminal domain IV (amino acids 351–467) is the DNA-binding
region that contains a helix-turn-helix motif.
Eukaryotic Replication Protein A (RP-A)
This is a single strand binding protein, which acts as an auxiliary factor for Pol playing a
dual role: (i) It renders the pol /primer complex more stable, thus acting as a Pol clamp; and
(ii) it significantly increases the fidelity of Pol .
Inhibitors of DNA Replication
Molecules which can thwart the duplication of the genome of an organism, are of utmost
significance, primarily bcause of the following reasons:
1. These serve as effective anti-bacterial agents
2. Inhibition studies have contributed in a major way to our understanding of the
replicative mechanisms, especially in eukaryotic systems where mutant generation as
well as selection is cumbersome.
With systematic screening experiments, a lot of natural inhibitors have been detected.
Moreover, chemical synthesis of carefully designed inhibitors has also added to the extensive
list of known inhibitors of DNA replication. While some directly target replication enzymes,
others exert an indirect effect by modifying the DNA or causing its denaturation or
16
preventing synthesis of the nucleotides. What follows is a brief look at the various classes of
inhibitors. These have been classified based on the specific component of the replicative
machinery targeted by them as well as their mode of action.
Inhibitors of Nucleotide Biosynthesis
A reduction in the quantity of precursors can eventually limit the rate of synthesis though
such inhibitors may not be very effective since cells may have salvage pathways in addition
to the de novo pathways. However these assist as anti-metabolites in blocking the enzymatic
use of a substrate. These include inhibitors of synthesis of purines, pyrimidines, folate,
deoxunucleotides as well as nucleoside triphosphate synthesis (Table 5).
Table 5: Inhibitors of nucleotide biosynthesis
Action
Inhibitors
Purine Biosynthesis
Azaserine,
6-Mercaptopurine,
Ribavirin,
Hadacidin, Thiadiazole, 8-Azaguanine
6-Azauridine, 5-Azacytidine, 6-Azacytidine,
5-Azaorotate
Sulfonamides, Trimethoprim, Methotrexate,
Aminopterin
Hydroxyurea, 5- Fluorouracil
Pyrimidine biosynthesis
Folate synthesis
Deoxynucleotide synthesis
Nucleoside triphosphate synthesis Cyclamidomycin (desdanine)
Inhibitors That Modify DNA
Compounds belonging to this class associate either covalently or non-covalently with the
DNA, thus modifying it and hence preventing its replication or decreasing the fidelity of
replication. There are varying mechanisms by means of which this is achieved. These include
crosslinking, chain breakage, intercalation, or alkalyation.
Non-covalent binders such as netropsin and distamycin A form stable associations with
duplex DNA, especially with the base pairs of poly d(AT) and poly d(IC) by hydrogen
bonding and electrostatic interactions and hence inhibit chain growth. Compounds such as the
acridine dyes, ethidium bromide as well as propidium bromide may affect the fidelity of
replication by intercalation. Moreover since the acridines inhibit plasmid replication, they
result in bacterial cells losing their plasmids (curing). Echinomycin, an antibiotic produced by
Streptomyces is highly effective against Gram positive bacteria as well as viruses. It binds to
GC rich regions.
Covalent binders such as the alkylating agents - nitrogen mustards, alkyl sulfonates and
nitrosoureas, form reactive intermediates that link them to a variety of nucleophilic targets on
DNA, prime among them being the N7 of guanine. Some of these have been effectively used
as chemotherapeutic agents in neoplasms as well as in immunosuppression. Bleomycins,
isolated from Streptomyces, can preferentially break DNA sequences in chromatin through
17
their nuclease like action. Psoralens, are heterocyclic compounds that can intercalate in
duplex DNA and subsequently, can induce cross links within pyrimidines located on opposite
strands.
Inhibitors of Topoisomerases
In the course of replication, the topological state of the DNA molecule holds a lot of
significance. Hence, topoisomerases have been developed as important drug targets in
prokaryotes. Morever, eukaryotic topoisomerases are targets for antitumour agents, exploiting
the fact that cancer cells divide more rapidly than other cell types. Camptothecin and
etoposide are examples of antitumour drugs targeted to eukaryotic topoisomerases.
Coumarin antibiotics such as novobiocin, have been found to inhibit DNA gyrase, by
competing with ATP for binding to the enzyme and hence inhibiting the ATPase reaction
catalysed by this enzyme. In contrast to coumarins, Quinolone drugs such as nalidixic acid
and oxalinic acid are entirely synthetic and exert their effect on gyrase by disrupting the DNA
breakage-reunion reaction, thus inhibiting DNA supercoiling. Another compound, Microcin
B17 is a 3.1-kDa bactericidal peptide. It has no detectable effect on gyrase-catalysed DNA
supercoiling or relaxation activities in vitro and is unable to stabilise DNA cleavage in the
absence of nucleotides. However, in the presence of ATP, or the non-hydrolysable analogue
5′-adenylyl beta, gamma-imidodiphosphate, microcin B17 stabilises a gyrase-dependent
DNA cleavage complex in a manner similar to quinolones (Table 6 ).
Table 6: Inhibitors of the enzyme topoisomerase
Class of drug
Topoisomerases inhibited
Coumarins
Bacterial gyrase (B subunit), eukaryotic
topo II
Bacterial gyrase( A subunit)
Eukaryotic Topo II
Eukaryotic Topo II
Eukaryotic Topo I
Bacterial gyrase (B subunit)
Eukaryotic Topo I
Quinolones
Acridines
Anthracyclines
Alkaloids
Peptides
Amilorides
Inhibitors of DNA Polymerases
There are a host of compounds which have been seen to be potent inhibitors of the DNA
polymerases. These include the arylhydrazinopyrimidines, which are purine dNTP analogs
and hence produce a significantly strong inhibition of Bacillus subtilis DNA polymerase III.
Effective anti-viral compounds have also been identified which selectively inhibit the
polymerases of the Herpes simplex and vaccinia viruses. Other inhibitors are listed in the
Table 7.
18
Table 7: Inhibitors of DNA polymerase activity
Inhibitor
Protein bound
Arylhydrazinopyrimidines
B.subtilis pol III
Phosphonoacetic acid
Phosphonoformic acid
Aphidicolin
Herpes and
polymerase
vaccinia
DNA
Eukaryotic pol α, pol δ
Concepts of Gene Expression
Introduction
DNA stores genetic information of an organism in a stable form that can be replicated
readily. However the expression of genetic information generally proceeds from DNA to
RNA to protein. One of the strands of DNA acts as the template for the synthesis of an RNA
molecule. This RNA, called mRNA, then in turn directs the synthesis of a particular protein
by regulating the sequence of amino acids in a protein. This principle of direction flow of
genetic information is more commonly know as the Central Dogma of Biology. Thus, this
flow of information involves replication of DNA, as a autocatalytic function and also the
flow of information, comprising transcription of information carried by DNA to the RNA,
and consequently the translation of this information from RNA to protein.
The RNA synthesis or transcription is the process of transcribing DNA nucleotide sequence
into RNA sequence information. This term is used when referring to RNA synthesis using
DNA as a template to emphasize that this phase of gene expression is simply a transfer of
information from one nucleic acid to another. On the other hand, protein synthesis is known
as translation because the four letter alphabet of nucleic acids is translated completely into an
entirely different twenty letter alphabet of proteins, i.e. it involves the change from a
nucleotide sequence of an RNA molecule to the amino acid sequence of a polypeptide chain.
Befitting its position, linking the nucleic acid and proteins, the process of protein synthesis
depends critically on both nucleic acid and protein factors. Protein synthesis takes place on
ribosomes—enormous complexes containing three large RNA molecules and more than 50
proteins. RNA that is translated into protein is called messenger RNA (mRNA) because it
carries a genetic message from DNA to the ribosome. In addition to the mRNA, two other
types of RNA involved in protein synthesis are ribosomal RNA (rRNA), which are integral
part of the ribosomes and the transfer RNA (tRNA) molecules which act as the intermediaries
in translating the coded base sequence of mRNA by bringing the appropriate amino acids to
the ribosome.
Transcription
Template Strand of DNA is Transcribed
The sequence of nucleotides in mRNA is complementary to one of the two DNA strands.
DNA strand involved in the process of transcription is called template (-) strand and the other
strand is called coding (+) strand because its nucleotide sequence matches 100% with that of
RNA transcript except that thymine (in DNA) has been replaced with uracil (in mRNA).
19
Coding and template strands are also called sense or antisense strands respectively. The
genetic information in the template strand is read out in 3′-5′ direction. Both strands of DNA
contain genes and hence act as sense strand or coding strand.
Key enzyme-DNA dependant RNA Polymerase: In case of prokaryotes, RNA synthesis is
mediated by a single type of enzyme, called DNA dependent RNA polymerase, whereas in
eukaryotes, three different types of DNA dependent- RNA polymerase are involved in the
synthesis of four types of RNA molecules (Table 8).
Table 8: Classes of eukaryotic DNA dependent RNA polymerases
Class of enzyme Location
RNA product
I (A)
II (B)
III (C)
rRNA (5.8S, 18S, 28s)
mRNA, snRNAs
tRNA, 5S rRNA, U6
snRNA, 7S RNA
Nucleolus
Nucleolus
Nucleoplasm
The initial RNA transcript formed is called primary transcript. In prokaryotes, as genes may
be organized in an operon, it may comprise information for that many genes. But in
eukaryotes, it usually contains a single gene.
RNA synthesis takes place in three stages, namely initiation, elongation and termination. Like
DNA replication, transcription proceeds in the 5′ → 3′ direction. The fundamental reaction of
RNA synthesis is the formation of a phosphodiester bond. With the concomitant release of
the pyrophosphate to orthophosphate, the α-phosphate group of the incoming nucleoside
triphosphate is attacked by the 3′- hydroxyl group of the last nucleotide in the chain. This
reaction is thermodynamically favorable, and the subsequent degradation of the
pyrophosphate to orthophosphate locks the reaction in the direction of RNA synthesis.
Transcription in Prokaryotes
The fundamentals of RNA synthesis were first elucidated in bacteria, where we come across
relatively simpler mechanisms and molecules, so it seems fair enough to start with the
transcription in prokaryotes. The first stage in this process is the transcription of a nucleotide
sequence in DNA into a sequence of nucleotides in RNA. RNA is quite similar to DNA
differing only in that it contains ribose instead of deoxyribose as its sugar, and the thymine of
DNA is replaced by the uracil of RNA, and it is usually single stranded.
Transcription is Catalyzed by RNA Polymerase
Transcription of DNA is carried out by the enzyme RNA polymerase, which catalyzes the
synthesis of RNA using DNA as template. In case of bacterial cells, there is just one kind of
RNA polymerase which can synthesize all the three major classes of RNA, namely mRNA,
rRNA, tRNA. Let us take into consideration, the RNA polymerase from Escherichia coli. It is
fairly large in size, approximately 400 KDa and has been well characterized. It consists of
four kinds of subunits and the subunit composition of the entire enzyme describes the
holoenzyme. The holoenzyme may be denoted as α2ββ’σ. It comprises two α subunits, two β
20
subunits that differ enough to be identified as β and β’ and a dissociable subunit called the
sigma (σ) factor. The sigma factor helps in recognizing a promoter site where the
transcription gets started, it then plays a role in the initiation of RNA synthesis but later
dissociates from the rest of the enzyme. RNA polymerase without this subunit (α2ββ’) is
known as the core enzyme which contains the catalytic site. Although the core enzyme is
competent enough to carry out RNA synthesis it does require the complete holoenzyme to
ensure initiation at the proper sites within a DNA molecule. Bacteria contain a variety of
different sigma factors that selectively initiate the transcription of specific categories of
genes. Most common sigma factor is called σ70 factor.
Transcription Involves Four Stages
Binding, Initiation, Elongation, and Termination:
Binding of RNA polymerase to a Promoter Sequence: The first step in RNA synthesis is
the binding of RNA polymerase to a particular site on DNA called promoter site. This site
consists of a specific sequence of several dozen base pairs and direct the RNA polymerase to
the proper initiation site for transcription. Bacterial promoter sequences, also termed as
promoter elements, are 40 nucleotides (approx.) long and have two conserved regions called 35 and -10 regions (Fig.6). The numbers denote the position of these regions downstream
(left) of the first nucleotide (i.e. +1) of the primary transcript. The -35 region has a consensus
sequence of 8 nucleotide base pairs (5′- TGTTGACA-3′) and -10 region has a consensus
sequence of 6 base pairs (5′- TATAAT-3′) often called TATA box, or Pribnow box. The
sequence between -10 and -35 regions is not conserved. But the distance between the two
sites is very important for the efficient functioning of promoters. This site is important
recognition site that interacts with a σ factor of RNA polymerase. In eukaryotes, somewhat
similar sequences are also found. However, there is involvement of additional proteins and
factors as well as DNA regions called enhancers or repressors (of transcription). These
regions are located far away from the promoter regions, maybe upstream or down stream of
promoter sites.
21
By convention, promoter sequences are transcribed in the 5′ to 3′ direction on the coding
strand, which is the strand that lies opposite to the template strand. The first nucleotide to be
transcribed is denoted as +1 and the second as +2, the nucleotide preceding the start site is
denoted as -1. These designations refer to the coding strand of DNA. The coding strand is
also known as the sense (+) strand while the template strand is known as the (-) strand.
Initiation of RNA Synthesis: Binding of RNA polymerase leads to unwinding of the DNA
double helix in the promoter region and hence initiation of RNA synthesis can take place. A
region of duplex DNA must be unpaired so that nucleotides on one of its strands become
accessible for base pairing with incoming ribonucleoside triphosphates (Fig.7). The promoter
carrying DNA strand determines which way the RNA polymerase faces and the orientation of
the enzyme in turn determines which DNA strand it transcribes. After the hydrogen bonding
of the first two incoming NTP’s to the complementary bases of the DNA strand, RNA
polymerase catalyzes the formation of a phosphodiester bond between 3′ hydroxyl group of
the first ribonucleotide and 5′ phosphate group of the second, accompanied by release of
pyrophosphate. This continues till the chain becomes nine nucleotides long. At this point , the
sigma factor detaches from the RNA polymerase molecule and the initiation stage is
complete.
22
Elongation of RNA Chain: The elongation phase of RNA synthesis begins after the
formation of the first phosphodiester bond. Here detachment of the σ takes place and the core
enzyme without this σ factor binds more strongly to the DNA template. Chain elongation
hence continues as RNA polymerase moves along the DNA molecule, untwisting the helix
and adding one complementary nucleotide to the growing RNA chain. The region containing
the RNA polymerase, DNA, and nascent RNA is called a transcription bubble because it
contains a locally melted bubble of DNA. This RNA-DNA hybrid helix is about 8 bp long,
which corresponds to nearly one turn of a double helix. This bubble moves a distance of 17
nm in a second, which corresponds to a rate of elongation of about 50 nucleotides per second.
The polymerase continues to move on, rewinding the DNA behind and unwinding the DNA
ahead as it goes. Hence the lengths of the RNA-DNA hybrid and of the unwound region of
DNA stay rather constant as RNA polymerase moves on. This hybrid must also rotate every
time a nucleotide is added so that the 3′-OH end of the RNA stays at the catalytic site. The
RNA transcript forms a transient RNA-DNA hybrid complex with DNA template strand.
However, as the transcription bubble moves on, the RNA transcript moves away from DNA.
The RNA polymerase lacks nuclease activity that would allow it to correct mistakes by
removing mismatched base pairs as RNA elongation proceeds. Consequently the fidelity of
transcription is much lower than that of replication. However this does not pose much of a
problem because multiple RNA molecules are usually transcribed from each gene, and so a
small number of inaccurate copies can be tolerated.
Termination of RNA Synthesis: The termination of transcription is as precisely controlled
as its initiation. In the termination phase of transcription, the formation of phosphodiester
bond ceases, the RNA-DNA hybrid dissociates, the melted region of DNA rewinds, and RNA
polymerase releases the DNA. Elongation of the growing RNA chain continues until RNA
polymerase reaches a special sequence, called a termination signal, which triggers the end of
transcription. In prokaryotes two classes of termination signals can be distinguished that
differ in whether or not they require the participation of a protein called the Rho (ρ) factor.
The most common termination signal is ρ independent and comprises a palindrome rich in
GC, followed by an AT-rich sequence. This palindromic region of RNA transcript forms a
hairpin structure followed by a sequence of four or more uracil residues (Fig.8).
23
The second type of transcription termination signal is ρ-dependent. It is a protein which
binds to a termination site on template strand and stops transcription. It does so by
hydrolyzing ATP in the presence of single stranded RNA but not in the presence of DNA or
duplex RNA. The ρ is brought into action by sequences located in the nascent RNA that are
rich in cystein and poor in guanine. The ATPase activity of ρ enables the protein to pull the
nascent RNA. When ρ catches RNA polymerase at the transcription bubble, it breaks the
RNA-DNA hybrid helix by functioning as RNA-DNA helicase.
RNA processing: Prokaryotic mRNA transcripts are hardly processed i.e. modified. In fact,
protein synthesis starts even before the complete mRNA transcript has been synthesized.
However, other RNAs i.e. tRNA and rRNA need post-transcriptional modifications.
Transcription in Eukaryotic Cells
We now turn to transcription in eukaryotes, which happens to be a much more complicated
process as compared to that in prokaryotes. In eukaryotes, the transcription takes place in the
membrane bounded nucleus, whereas translation takes place outside the nucleus, in the
cytoplasm. This separation of transcription and translation enables eukaryotes to regulate
gene expression in much more intricate ways. The main differences between the prokaryotic
and eukaryotic transcription may be described as follows:
1. Three different RNA polymerases transcribe the nuclear DNA of eukaryotes.
2. Eukaryotic promoters are more varied than prokaryotic promoters.
3. Binding of eukaryotic RNA polymerases to DNA requires the participation of
additional proteins, called transcription factors, which in the eukaryotic system are not
an inherent part of the RNA polymerase molecule unlike the bacterial sigma factor.
4. At the first stage of eukaryotic transcription, protein-protein interactions are of great
importance.
5. Nascent eukaryotic RNA undergo extensive RNA processing, i.e various
modifications.
Three Types of RNA Polymerases Regulate the Transcription in Eukaryotes
In the nucleus of a eukaryotic cell, lie three types of RNA polymerases differing in their
template specificity, location in the nucleus, and susceptibility to inhibitors. These
polymerases are large proteins, containing from 8 to 14 subunits and having a total molecular
mass greater than 500 kDa. RNA polymerase I is present in the nucleolus and it carries out
the synthesis of a RNA molecule that serves as a precursor for three or four types of rRNA
found in eukaryotic ribosomes. This enzyme is not sensitive to α-amanitin. RNA polymerase
II is found in the nucleoplasm and synthesizes mRNA precursors. It also synthesizes most of
the snRNAs involved in the post transcriptional RNA processing. This enzyme is sensitive to
low levels of α-amanitin. RNA polymerase III is also a nucleoplasmic enzyme, but it
synthesizes a variety of small RNAs, including tRNA precursors and the smallest type of
ribosomal RNA. Mammalian RNA polymerase III is sensitive to higher levels of α-amanitin.
The promoters to which eukaryotic RNA polymerases bind are even more varied than
prokaryotic promoters. These are grouped into three categories. RNA polymerase I promoter
or the promoter for the transcription unit that produces precursor for the largest rRNAs
comprises the core promoter that extends into the nucleotide sequence to be transcribed and
24
an upstream control element which makes the transcription more efficient. A typical
promoter used by RNA polymerase II has a short initiator sequence surrounding the start
point and usually, at around -25, a short sequence known as the TATA box, which has a
consensus sequence of TATA followed by two or three more AA’s. In contrast to RNA
polymerase I and II, the promoters used by RNA polymerase III are entirely downstream of
the transcription unit’s start point when transcribing genes for tRNAs and 5S RNA. The
promoters used by all the eukaryotic RNA polymerases must be recognized and bound by
transcription factors before the RNA polymerase can bind to the DNA.
Gene Organization
Unlike prokaryotic genes, eukaryotic protein coding DNA sequences are discontinuous.
Protein coding regions (called exons) are interspersed with non-coding regions (called
introns). Number as well as the size of introns in a gene varies greatly. So far no introns have
been observed in histone mRNAs.
Initiation of Transcription: As mentioned earlier, mRNA transcription is mediated by RNA
Pol II. This enzyme binds to a specific region located 25-35 bp upstream the start site (of
mRNA transcript) and has a sequence similar to TATA box i.e. TATA(AT)A(A/T). For
proper binding of RNA Pol II to the promoter region, there is need of many proteins called as
General (or basal) transcription factors (TFII) e.g. TFIIA, B, D, E, F, J, H. In the first
instance, TFIID binds to TATA region via one of its subunit called TATA box binding
protein (TBP). The combination of RNA pol II and large number of TFII factors is called
transcription initiation complex. Once this complex has bound to promoter region, synthesis
of mRNA transcript starts. In some cases, where TATA boxes are not preset upstream protein
coding regions, there are initiator elements. These elements bind the pol II enzyme and other
transcription factors and initiate the process of transcription.
Elongation and Termination: The elongation of mRNA is quite similar in prokaryotes and
eukaryotes.
However, it is believed that transcription signals exist far away from 3′ end of mRNA
transcript. Though involvement of transcription factors is expected, but experimental data has
yet to be generated. It is believed that RNA endonucleases cleave the primary mRNA
transcript 15 bases (approx.) downstream a consensus sequence AAUAAA. Finally the 3′ end
of mRNA transcript is polyadenylated in the nucleoplasm.
Processing of Primary mRNA Transcript
The primary mRNA transcript (pre-mRNA) undergoes various modifications (capping,
cleavage, polyadenylation, splicing and editing) to yield a functional mRNA molecule
5′ Processing (capping)
Immediately after pre-mRNA synthesis, a single phosphate group of the first ribonucleotide
located at 5′ end is removed by a phosphatase. With the help of another enzyme, guanosyl
transferase, a guanosine residue is added to the ribonucleotide via unusual 5′5′ triphosphate
link. A methyl group is added to N-7 position of guanosine. The addition of methylated
25
guanosine to pre-mRNA transcript is called capping. The cap protects mRNA from
degradation by ribonucleases and also assists in initiation of translation process.
3′ Processing (cleavage and polyadenylation)
The 3′ region of the pre-mRNA transcript carries a polyadenylation signal sequence (5′AAUAAA-3′). A short distance from this sequence is a 5′-YA-3′ sequence in which Y stands
for any pyrimidine. Normally, a short distance away from this region is a GC-rich sequence.
Several specific proteins, called cleavage proteins, bind to these 3′ end sequences and
ultimately cleave the pre-mRNA between polyadenyaltion signal and GU-rich sequence. A
ploy (A) tail comprising around 250 A residues bind to the 3′ end of processed mRNA with
the help of an enzyme, ploy(A) polymerase (PAP). The poly (A) tail is immediately get
coated with poly (A) binding proteins so as to ward off attack by nucleases.
RNA Splicing
It stands for removal of intron sequences and joining of exon sequences to form mRNA. In
majority of pre-mRNAs, introns start with GU and end with AG. In between these two ends
(of introns) lie a polypyrimidine tract and a conserved branch point sequence. RNA splicing
reaction involves two transesterification steps. First step is excision of introns and second
step is ligation of exons. The reactions are caused by the complex (spliceosome) formed of
snRNPs with accessory proteins.
RNA Editing
The process by which the original mRNA sequence gets changed by mechanisms other than
RNA splicing is called RNA editing. RNA editing involves either changes in the nucleotide/s
or addition/deletion of nucleotide/s which ultimately results in the synthesis of new proteins
having structural and functional properties different from the original protein. The
processing of pre-mRNA transcript, involving capping, cleavage, polyadenylation, splicing
and editing results in the formation of a functional mRNA ready to be translated by protein
synthesis machinery.
Translation
The process by which genetic information residing in mRNA is utilized for the synthesis of
protein is called translation. The genetic code of mRNA is read in groups of three nucleotides
called codons. Each codon is specific for a particular amino acid (Table). As there are four
types of nucleotides in mRNA, maximum number of triplet codons can be 43=64. We also
know that there are 20 types of amino acids present in proteins of prokaryotes and
eukaryotes. It implies that many codons can code for a single amino acid (Table 9). In
scientific term, we say that code is degenerate. 61 codons code for 20 different amino acids.
Three codons (UAA,UAG,UGA) do not code for any amino acid. They have important
function to perform in protein synthesis i.e termination of the synthesis of a protein. Hence
these codons are called STOP codons. The first amino acid incorporated in the protein is by
and large formyl methionine in prokaryotes and methionine in eukaryotes coded by the same
codon, AUG. This codon is thus termed as initiation or start codon.
Earlier, it was thought that in all forms of life each codon specifies a particular amino acids
i.e. genetic code is universal. But now a few exceptions have been observed. A few codons
26
of mitochondrial DNA, code for different amino acid as compared to chromosomal DNA
(Table 10 ).
Table 9: The genetic code
Codon sequence
1st base
2nd base
U
Phe
Phe
Leu
Leu
Leu
Lue
Leu
Leu
Ile
Ile
Ile
Met
Val
Val
Val
Val
U
C
A
G
C
Ser
Ser
Ser
Ser
Pro
Pro
Pro
Pro
Thr
Thr
Thr
Thr
Ala
Ala
Ala
Ala
A
Tyr
Tyr
Stop
Stop
His
His
Gln
Gln
Asn
Asn
Lys
Lys
Asp
Asp
Glu
Glu
3rd base
G
Cys
Cys
Stop
Trp
Arg
Arg
Arg
Arg
Ser
Ser
Arg
Arg
Gly
Gly
Gly
Gly
U
C
A
G
U
C
A
G
U
A
C
G
U
A
C
G
Table 10: Exceptions to universality of genetic code
Codon
Amino acid coded
AUA
By chromosomal DNA---Met
By mitochondrial DNA--Ile
By chromosomal DNA---Stop
By mitochondrial DNA--Trp
By chromosomal DNA---Arg
By mitochondrial DNA--Stop
By chromosomal DNA---Arg
By mitochondrial DNA--Trp
UGA
AGA, AGG (in some
animal mitochondria)
CGG (in plant
mitochondria)
The sequence of codons starting with initiation codon and terminating with stop codon is
called open reading frame (ORF) and codes for a single polypeptide. The major
components involved in protein synthesis are ribosomes, mRNA, tRNA, enzymes and protein
factors.
27
Ribosomes
Ribosomes are cytoplasmic organelles and play a vital role in protein synthesis. They are
referred to as work benches of protein synthesis. Their role is to orient the mRNA and amino
acid-carrying tRNAs in the proper relation with respect to each other so as to facilitate proper
reading of the genetic code, hence catalyzing formation of the peptide bonds that link the
amino acids in a specific sequence into a polypeptide. Prokaryotic and eukaryotic ribosomes
resemble each other structurally but are not identical, due to the fact that prokaryotic
ribosomes are smaller in size, carry lesser protein, have smaller RNA molecules and are
sensitive to different inhibitors of protein synthesis. A prokaryotic ribosome is 70S and
comprises two subunits (small and large; 30S & 50S, respectively) Each subunit is made up
of rRNA and many proteins. An eukaryotic ribosome is 80S and also comprises two subunits
(small and large; 40S & 60S, respectively). These subunits are also made up of rRNA and
many proteins.
Three main sites located on ribosomes are highly important in the process of protein
synthesis: an mRNA binding site and two sites where tRNA can bind: an A (aminoacyl) site
that binds each new tRNA with its attached amino acid, and a P (peptidyl) site, where the
tRNA carrying the growing polypeptide chain resides. An additional E or exit site has also
been proposed from where the tRNAs leave the ribosome after they have discharged their
amino acids.
tRNAs
The fidelity of protein synthesis requires the accurate recognition of three base codons on
mRNA. An amino acid cannot itself recognize a codon, therefore, it is attached to a specific
transfer RNA (tRNA) which can do so. Thus a tRNA serves as the adapter molecule that
binds to a specific codon and brings with it an amino acid for the incorporation into the
polypeptide chain. This adapter must therefore mediate the interaction between amino acids
and mRNA. A tRNA is a clover leaf shaped structure (Fig.9) made up of a single chain of 76
ribonucleotides. The 5′ terminus is phosphorylated (pG), whereas the 3′ terminus has a free
hydroxyl group. The amino acid attachment site is the 3′-hydroxyl group of adenosine residue
at the 3′ terminus of the molecule. Once the amino acid is attached it is known as the
aminoacyl tRNA. This tRNA is said to be charged, as it carries an activated amino acid. The
tRNA molecules can recognize codons in mRNA because each tRNA possesses an anticodon,
a special trinucleotide sequence located within one of the loops of the tRNA molecule; these
anticodons permit tRNA molecules to recognize codons in mRNA by complementary base
pairing. However, these mRNA and tRNA line up on the ribosome in a way which allows
considerable flexibility between the third base of the codon and corresponding base in the
anticodon.
Aminoacyl RNA Synthetases
The enzymes, aminoacyl-tRNA synthetase carry out two important functions: firstly they are
responsible for linking the tRNA molecules to the corresponding amino acids at the 3′
acceptor end and also activate an amino acid by reacting with ATP and forming aminoacyl
adenylic acid. These enzymes catalyze the joining of an amino acid to its proper tRNAs via
an ester bond, accompanied by the hydrolysis of ATP to AMP and pyrophosphate.
Aminoacyl-tRNA Synthetases recognize nucleotides located in at least two different regions
of tRNA molecules when they pick out the tRNA that is to become linked to a particular
28
amino acid. Once the correct amino acid has been joined to its tRNA, it is the tRNA itself
that recognizes the appropriate codon in mRNA. These results prove that codons in mRNA
recognize tRNA molecules rather than their bound amino acids.
Protein Factors
In addition to aminoacyl-tRNA synthetases and the protein components of the ribosome,
translation also requires the participation of several other kinds of protein molecules. Though
rRNA is of paramount importance, protein factors play important roles in efficient synthesis
of proteins.
29
Mechanism of Translation
This mechanism of translation of mRNAs into polypeptides is an ordered, step-wise process
that proceeds in the 5′to 3′ direction leading to the synthesis of polypeptides in the N-terminal
to C-terminal direction. This process can be divided into three stages (Fig.10). These are:
(a) Initiation: in which mRNA is bound to the ribosome and positioned for proper
translation.
(b) Elongation: in which amino acids are sequentially joined together via peptide bonds in
as order specified by the arrangement of codons in mRNA
(c) Termination: in which the mRNA and the newly formed polypeptide chain are
released from the ribosome.
30
Initiation of Translation
As mentioned earlier, first amino acid to be translated is formyl methionine or methionine
coded by initiation codon, AUG. Since a protein has methinone amino acids present
internally also, question arises as to how the protein synthesis machinery finds out which
AUG codon is responsible for initiating protein synthesis. A cell has two types of aminoacyltRNAmethionine-tRNA (tRNA Met) and N-formylmethionine-tRNA (fMet-tRNAfMet). The
latter aminoacyl tRNA is used for initiating protein synthesis. In mRNA,upstream of the 5′
end of initiating AUG codon lies a purine rich sequence of nucleotides (5′-AGGAGGU-3′)
called Shine-Dalgarno sequence, which is complementary to 16S rRNA sequence of 30S
ribosomal sub unit. After the pairing of 16S rRNA of 30S subunit to Shine-Dalgarno
sequence of mRNA, thus stabilizing the association, and subunit (30S) now migrates in 3′
direction till it encounters AUG initiation codon. In short, one can say that Shine-Delgarno
sequence determines which AUG codon (on mRNA) will serve as initiating codon. The
initiation of protein synthesis also requires additional factors called initiation factors (IF).
In prokaryotes, Initiation factors IF1 and IF3 bind to the small (30S) ribosomal subunit. 30S
sub unit then migrates along mRNA till it finds Shine-Dalgarno sequence and then initiating
codon, AUG. IF2 binds GTP and the concomitant conformational change enables IF2 to
associate with formylmethionyl-tRNA. The IF2-GTP- formylmethionyl-tRNA complex binds
with mRNA at AUG triplet and 30S subunit to form the 30S initiation complex. IF3 is
released. 50S sub unit then binds to 30S initiation complex with concomitant hydrolysis of
GTP and release of IF1 and IF2. Resulting complex is called 70S initiation complex.. The
hydrolysis of GTP bound to IF2 on entry of the 50S subunit leads to the release of the
initiation factors. Once both subunits of the ribosomes are assembled with mRNA binding
sites for two charged tRNA molecules are created: the P or peptidyl site, and the A, or
aminoacyl site. At this stage, all three initiation factors have been released and the ribosome
is ready for elongation phase of protein synthesis. It is to be noted that fMet-tRNAfMet binds P
site of ribosomes with the corresponding AUG triplet of mRNA.
In eukaryotes, the initiation process in eukaryotes involves a different set of initiation factors,
a somewhat different pathway for assembling the initiation complex and a special initiator
tRNAMet that carries methionine and does not get formylated. There are mainly two types of
initiation processes: cap-dependent and cap-independent and they maybe explained as
follows. Initiation of translation involves an interaction of some proteins with a special tag
(cap) bound to 5′-end of the mRNA molecules as described earlier. The protein factors bind
the small ribosomal 40S subunit. The subunit accompanied by some of those protein factors
moves along the mRNA chain towards its 3′-end and scans for the 'start' codon (mostly AUG)
on the mRNA, which indicates where the mRNA starts coding for the protein. The sequence
downstream between the 'start' and 'stop' codons is then translated by the ribosome into the
amino acid sequence -- thus a protein is synthesized. In eukaryotes and archaea, the amino
acid encoded by the start codon is methionine. The initiator tRNA charged with Met forms
part of the ribosomal complex and thus all proteins start with this amino acid (unless it is
cleaved away by a protease in some subsequent steps). What differentiates cap-independent
translation from cap-dependent translation is that cap-independent translation does not
require the ribosome to start scanning from the 5′ end of the mRNA cap until the start codon.
The ribosome can be trafficked to the start site by ITAFs (IRES trans-acting factors)
bypassing the need to scan from the 5′ end of the untranslated region of the mRNA. This
method of translation has been recently discovered, and has been found to be important in
31
conditions that require the translation of specific mRNAs, despite cellular stress or the
inability to translate most mRNAs.
Chain Elongation
The second phase of protein synthesis is the chain elongation consisting of the growth of
polypeptide chain by one amino acid. This involves sequential cycles of aminoacyl tRNA
binding, peptide bond formation and translocation. Aminoacyl-tRNA binding: This phase
begins with the insertion of an aminoacyl-tRNA, corresponding to second codon of mRNA,
into the empty A site on the ribosome, thus bringing a new amino acid into position to be
joined to the polypeptide chain. The aminoacyl tRNA is delivered to the A site in association
with a protein known as the elongation factor. There are two elongation factors FT-Tu and
EF-Ts, each of which plays a specific role. The function of EF-Tu is to convey the new
aminoacyl tRNA to the A site of the ribosome and to bind to GTP. This solves two functions,
i.e. protecting the delicate ester linkage in aminoacyl-tRNA from hydrolysis and to promote
the binding of all aminoacyl tRNAs to the ribosome except the initiator tRNA making sure
that AUG codons located downstream from the stop codon do not mistakenly recruit an
initiator tRNA to the ribosome. The role of EF-Ts is to regenerate EF-Tu from EF-Tu-2GTP
for the next round of the elongation cycle. This is followed by the formation of a peptide
bond between the amino group of the amino acid bound at the A site and the carboxyl group
of the initiating amino acid attached to the RNA of the P site. The peptide bond formation is
catalyzed by peptidyl tranferase, a function of rRNA of large subunit. However, protein
synthesis cannot continue without the translocation of the mRNA and the tRNAs within the
ribosome. The formation of the peptide bond causes the growing polypeptide chain to be
transferred from the tRNA located at the P site to the tRNA at the A site. At this stage, the
covalent bond between the amino acid and tRNA at P site is hydrolyzed. This reaction does
not require any outside source of energy and the necessary energy is supplied by cleavage of
the high-energy bond that joins the amino acid or peptide chain to the tRNA located at the P
site. The P site now has an empty or uncharged tRNA and the A site contains a peptidyl
tRNA. The entire mRNA-tRNA-aa2-aa1 now shifts to P site and the mRNA advances a
distance of three nucleotides relative to the small subunit bringing the next codon into proper
position for translocation. Translocation is stimulated by EF-G and energy required for
translocation is supplied by the hydrolysis of GTP. During this process of translocation, the
uncharged tRNA moves from the P site to the E site where it is released from the ribosome.
With the availability of a vacant A site and the corresponding mRNA codon, the process will
be repeated by another charged tRNA bringing in a new amino acid.
Termination of Polypeptide Synthesis
Termination occurs when one of the three termination codons moves into the A site. These
codons are not recognized by any tRNAs. Instead, they are recognized by proteins called release
factors namely RF1 (recognizing the UAA and UAG stop codons) or RF2 (recognizing the UAA and
UGA stop codons). A third release factor RF-3 catalyzes the release of RF-1 and RF-2 at the end of
the termination process. These factors trigger the hydrolysis of the ester bond in peptidyl-tRNA
and the release of the newly synthesized protein from the ribosome. As the polypeptide is
released, the other components of the ribosomal complex (mRNA, tRNA and ribosomes)
come apart and ready to translate afresh. The ribosome also dissociates into its subunits aided
by IF3.
32
Translation in Eukaryotes
By and large, protein synthesis is more or less similar in prokaryotes and eukaryotes and
involves three stages i.e. Initiation, Elongation and Termination. However, there are a few
differences which are mentioned below.
• Prokaryotic ribosomes are 70S (50S+30S) and eukaryotic ribosomes are 80S (60S +
40S).
• Prokaryotic mRNA is polycistronic and eukaryotic mRNA is monocistronic.
• Shine-Dalgarno sequences are not present in eukaryotes. 40S subunit of eukaryotic
ribosome binds to 5′ end of mRNA and starts moving in 3′ direction till it finds AUG,
the initiating codon. Usually, initiation codon is recognizable as it is present in a
sequence called Kozak consensus (5′-ACCAUGG-3′).
• The initiation codon, in eukaryotes, codes for methionine in contrast to Nformylmethionine used by prokaryotes.
• For initiation of protein synthesis, prokaryotes require three initiating factors (IFs)
whereas eukaryotes require nine initiating factors (eIFs).
• Elongation step involves the use of three elongation factors (EF for prokaryotes and
eEF for eukaryotes). These elongation factors are distinct from one another.
• Although the genetic code is universal, but in a few exceptions, eukaryotic
mitochondrial codons code for amino acids different from the usual. Note that
mitochondria are absent in prokaryotes.
• In prokaryotes, protein termination is carried out by three release factors (RF). In
eukaryotes, there is single eukaryotic release factor (eRF).
Regulation of Gene Expression
Gene expression is the process by which a gene's DNA sequence is converted into the
structures and functions of a cell. Non-protein coding genes (e.g. rRNA genes, tRNA genes)
are not translated into protein. The amount of protein that a cell expresses depends on the
tissue, the developmental stage of the organism and the metabolic or physiologic state of
the cell as well as the environmental signal. Regulation of gene expression is the cellular
control of the amount and timing of appearance of the functional product of a gene. As
described earlier gene expression is a multi-step process and any of the step could be
regulated. For example, this could happen at transcriptional or post transcriptional stage as
well as during translation followed by folding, post-translational modification and
targeting. Gene regulation gives the cell control over structure and function, and is the basis
for cellular differentiation, morphogenesis and the versatility and adaptability of any
organism. Gene activity is controlled first and foremost at the level of transcription. Much of
this control is achieved through the interactions between proteins that bind to specific DNA
sequences and their DNA binding sites. This aspect of regulation has been covered in the
chapter on Gene Regulations.
Prokaryotic and Eukaryotic Gene Expression: a Comparison
In spite of all the fundamental differences between prokaryotes and eukaryotes, a few basic
similarities between their gene expressions have been observed. The most important among
these is the transcription initiation as a key control point and the factors affecting this control
point. Also in both these systems, many genes are controlled by regulatory proteins that
interact with specific DNA sequences to turn transcription on or off. However, on the other
hand, a few contrasting differences have also been observed. The differences may arise in the
33
prokaryotes and eukaryotes on the basis of genome organization and expression. The greater
diversity of regulatory mechanisms in eukaryotic cells reflects basic differences between
prokaryotes and eukaryotes in the structure, organization, and compartmentalization of the
genome, as well some fundamental features of the multicellular way of life. Beginning with
the genome size and complexity, the eukaryotic genomes are almost always larger than those
of prokaryotes. Incase of most eukaryotes, however, genes constitute only a small portion of
the genome. In addition to the non- coding DNA located between the genes, the non-coding
DNA introns located within most eukaryotic genes are another important genomic feature
that may be involved in regulation. Another basic difference between the two is that most of
the genetic information of eukaryotes is located in the nucleus, separated from the
cytoplasmic sites of protein synthesis by nuclear envelope. An additional difference lies in
the structural organization of the genome. The chromosomes of eukaryotic cells differ from
bacterial chromosomes in both chemical composition and structure. Bacterial DNA is
complexed with small basic proteins that may interact with DNA in much the same way as
the histones do in eukaryotic chromatin. However, unlike in eukaryotes, prokaryotic DNA
folding is not based on nucleosomes and the successive levels of DNA compaction seen in
eukaryotes are not observed in prokaryotes. Now considering post translational modification
of proteins, the proteins may be extensively modified after protein synthesis, especially in
eukaryotes. Also signal sequences that target proteins to specific cellular compartments must
often be removed. Other important protein modifications include the addition of prosthetic
groups, the control of protein activity by phosphorylation and dephosphorylation, and the
glycosylation of proteins destined for secretion or integration into the plasma membrane.
Another difference that arises from the contrasting lifestyles of prokaryotes and eukaryotes is
the relative importance of protein degradation as means of eliminating proteins no longer
needed by the cell. Prokaryotes have enzymes that can degrade defective or unneeded
proteins. As a result, prokaryotic cells are not generally dependent on protein degradation as a
means of eliminating proteins. Eukaryotic cells, on the other hand, are more likely to stop
dividing and to persist as non-dividing differentiated cells for long periods of time thereafter.
They possess specific mechanisms for degrading proteins selectively. The regulated
degradation and replacement of proteins, called protein turnover, is a more prominent feature
of eukaryotic cells than of prokaryotes.
Eukaryotic Gene Regulation
Gene expression in eukaryotic cells, ultimately measured in terms of activities of gene
products, is the culmination controls acting at several different levels. Hence, there are five
main levels at which control might be practiced:
(i)
The genome
(ii)
Transcription
(iii)
RNA processing and export from nucleus to cytoplasm
(iv)
Translation
(v)
Post translational events
Gene regulation in eukaryotes is significantly more complex as compared to that in
prokaryotes due to a variety of reasons. Firstly, the genome being regulated in the former is
considerably larger in size. Hence it would be extremely difficult for a DNA-binding protein
to recognize a unique site in this vast array of DNA sequences. Another source of complexity
in eukaryotic gene regulation is the many different cell types present in most eukaryotes.
Also, eukaryotic genes are not generally organized into operons. On the other hand, genes
that encode proteins for steps within a given pathway often are spread widely across the
34
genome. Having covered the complexities, we now take up each aspect of eukaryotic gene
regulation control. Eukaryotic gene regulation almost always involves the selective
expression of a specific subset of genes from the same complete genome present in every
cell, rather than any cell specific changes in the genome itself.
Genomic Control
In spite of the fact that previous studies show that the genome tends to be identical in all cells
of an adult eukaryotic organism, a few types of gene regulation create slight deviations to this
rule. One example is gene amplification or the selective replication of certain genes. Gene
amplification is one of the ways of exerting genomic control that is a regulatory change in the
makeup or structural organization of the genome. In addition to amplifying gene sequences
whose products are required in greater numbers, some cells also delete genes whose products
are not required. This is known as gene deletion. Another aspect is that DNA rearrangements
can alter the genome. A few cases are known in which gene regulation is based on the
movement of DNA segments from one location to another within the genome known as DNA
rearrangement. Mainly two types of DNA arrangements have been observed. The first is the
yeast mating type rearrangements: all haploid cells carry one allele for mating type; and a
cell’s actual mating phenotype depends on which of the two alleles, α or a, is present at a
special site in the genome known as the MAT locus. Cells frequently switch mating type,
presumably as a means of maximizing opportunities for mating. They do so by transposing
the information for the alternative allele at the MAT locus. This process of rearrangement is
called the cassette mechanism as it consists of one active and two silent loci. The second type
is the antibody gene rearrangements: antibodies are proteins composed of two kinds to
polypeptide subunits, called heavy chains and light chains. Any one of the thousands of
different kinds of heavy chains can be assembled with any one of the thousand different kinds
of light chains, creating the possibility of millions of different types of antibodies. The DNA
rearrangement process that creates antibody genes also activates transcription via a
mechanism involving special DNA sequences called enhancers, which are located near DNA
sequences coding for specific segments, but a promoter is not present in this area and hence
transcription does not occur normally. Another, means of regulating the availability of
various regions of the genome is DNA methylation, the addition of methyl groups to select
cytosine groups in DNA. The DNA of most vertebrates contains small amounts of methylated
cytosine, which tends to cluster in non coding regions at the 5′ends of genes. Moreover,
methylation patterns are inherited after DNA replication. In other words, if the old strand of a
newly formed double helix has a methylated 5′-CG-3′ sequence, then the complementary 3′GC-5′ sequence in the newly formed strand will become a target for the DNA-methylating
enzyme. Changes in histone acetylation, HMG proteins, and connections to the nuclear
matrix are associated with active regions of the genome. Hence they also play a vital role in
genomic control of eukaryotic gene regulation.
Transcriptional Control
The different proteins produced by the two cell types are a reflection of differential gene
transcription. i.e. transcription of different genes produces different sets of RNA. Certain
statements can be made regarding these differences between one cell type and another that
many protein processes are common to all cells and any two cells in a single organism.
Therefore, they have many proteins in common. Also some proteins are abundant in
specialized cells in which they function and cannot be detected elsewhere. Studies of the
number of different mRNAs suggest that at any one time, a typical human cell expresses
35
approximately 10,000-20,000 of its approximately 30,000 genes. When the patterns of
mRNAs in a series of different human cells lines are compared, it is found that the level of
expression of almost every active gene varies from one cell type to another. And lastly
although the differences in mRNAs among specialized cell types are striking, they
nonetheless underestimate the full range of differences in the pattern of protein production.
Another point to remember is that a cell can change its expression of genes in response to
external signals. Most of the specialized cells in a multicellular organism are capable of
altering their patterns of gene expression in response to extracellular cues. Different cell types
often respond in different ways to the same extracellular signals. Also the realization that
multiple DNA control elements and their regulatory transcription factors has led to a
combinatorial model for gene regulation. This proposes that a relatively small number of
different DNA control elements and transcriptional factors, acting in different combinations,
can establish highly specific and precisely controlled patterns of gene expression in different
cell types. Hence, a gene is expressed at a maximum level only when the set of transcription
factors produced by a given cell type includes all the regulatory transcription factors that bind
to that gene’s positive DNA control elements. Another aspect for consideration under the
transcriptional control is the common structural motifs that allow regulatory transcription
factors to bind to DNA and activate transcription. Regulatory transcription factors possess
two distinct activities, the ability to bind to a specific DNA sequence and the ability to
regulate transcription. Both the activities reside in separate domains. The domain that
recognizes and binds to a specific DNA sequence, is called the transcription factor’s DNAbinding domain, whereas the protein region required regulating the transcription is known as
the transcription regulation domain. A gene regulatory protein recognizes specific DNA
sequences because the surface of the protein is extensively complementary to the special
surface features of the double helix of that region. In most cases the protein makes a large no.
of contacts with the DNA, involving hydrogen bonds, ionic bonds, and hydrophobic
interactions. DNA-protein interactions include some of the highest and most specific
molecular interactions. These form motifs and they in turn use either α helices or β sheets to
bind to the major groove of the DNA. A few of these structural motifs are:
ƒ
ƒ
ƒ
The helix-turn-helix motif which is one of the simplest and most common DNAbinding motif. This is constructed from two α helices connected by a short extended
chain of amino acids, which constitute the turn. The two helices are held at a fixed
angle, primarily through interactions between the two helices. The C-terminal helix is
called the recognition helix because it fits into the major groove of DNA; its amino
acid side chains play an important part in recognizing the specific DNA sequence to
which the protein binds. Each protein presents its helix-turn-helix motif in a unique
way which enhances the versatility of the helix-turn-helix motif by increasing the
number of sequences that the motif can be used to recognize.
Another type of motif is the zinc-finger motif. This consists of an α helix and two
segments of β sheets, held in place by the interaction of precisely positioned cystein
or histidine residues with a zinc atom. Zinc fingers protrude from the protein surface
and serve as the points of contact with specific base sequences in the major groove of
the DNA.
Another motif, the leucine zipper motif mediates both DNA binding and protein
dimerization. The portion of the protein that is responsible for the dimerization is
distinct from the portion that is responsible for DNA binding. This motif combines
these two functions and is named leucine zipper motif because of the way the two α
helices, one from each monomer are joined together to form a short coiled coil. The
helices are held together by interactions between hydrophobic amino acids side chains
that extend from one side of each helix.
36
ƒ
The helix-loop-helix motif is composed of a short α helix connected by a loop to
another, longer α helix. This motif contains hydrophobic regions that usually connect
two polypeptides, which maybe either similar or different. The formation of the four
helix bundle results in juxtaposition of a recognition helix derived from one
polypeptide with a recognition helix derived from the other polypeptide, creating a
two part DNA-binding domain.
Post-transcriptional Control
After the first two levels of regulation, namely the genomic controls and the transcriptional
controls, we now move on to the next level. After transcription has taken place, the flow of
genetic information involves a complex series of posttranscriptional events, any or all of
which can turn out to be regulatory points. In eukaryotes, posttranscriptional events begin
with the processing of primary RNA transcripts, which provides many opportunities for the
regulation of gene expression. Of all the RNA transcripts in eukaryotic nuclei, RNA splicing
is an especially important way of regulation because the phenomenon of alternative RNA
splicing allows cells to create a variety of mRNA molecules from the same primary
transcript, thereby permitting a given gene to generate more than one protein product. After
RNA splicing the next important post-transcriptional process subject to potential control is
the export of mRNA through nuclear pore complexes and into the cytoplasm. This transport
maybe closely associated with RNA splicing. Once mRNA has been exported from the
nucleus to the cytoplasm, several translational control mechanisms are available to regulate
the rate at which each mRNA is translated into its polypeptide product. Some translational
control mechanisms work by altering ribosomes or protein synthesis factors; others regulate
the activity and/or stability of mRNA itself, in other words, the more rapidly an mRNA
molecule is degraded, the less time available for it to be translated.
After translation of an mRNA molecule has produced a polypeptide chain, there are still
many ways of regulating the activity of the polypeptide product. With this we come to the
final level of controlling gene expression, namely, the posttranslational control mechanism,
that modify the protein structure and function. The regulatory mechanisms that are included
in this category are reversible structural alterations that influence protein function, such as
protein phosphorylation and dephosphorylation, as well as permanent alterations, such as
proteolytic cleavage. Other posttranslational events subject to regulation include the guiding
of protein folding by chaperone proteins, the targeting of proteins to intracellular or
extracellular locations, and the interaction of proteins with regulatory molecules or ions, such
as cAMP or Ca2+.
Thus, a particular alteration in cell function or behavior is based on a change in gene
expression and the underlying explanation might involve mechanisms operating at any one or
more of these levels of control.
Suggested Reading
1.
2.
Lehninger’s Principles of Biochemistry, 4th edn. D. L. Nelson & N. M. Cox. W. H. Freeman & Coy.,
New York, USA.
Biochemistry. 3rd edn. D. Voet and J. G. Voet. J. Wiley & Sons, Inc. New Jersey, USA.
37