Concept of DNA & RNA
Dr. Satish Kumar
Anthropological Survey of India
Manav Bhavan, Bogadi, Mysore
CONTENTS
Nucleic Acids - Introduction
Nucleic Acids and Heredity
DNA is the Genetic Material of Bacteria
DNA is the Genetic Material of Viruses
DNA is the Genetic Material of Eukaryotic Cells
Composition of Nucleic Acids
The DNA, RNA difference:
The structure of DNA
Physical Properties of DNA
Denaturation and Renaturation of DNA; Hybridization
Circular DNA
Great length versus tiny breadth
Entropic stretching behavior
Different helix geometries
Supercoiled DNA
Sugar pucker
DNA replication
Mutation – the sequence change in DNA
Single-base substitutions
Missense mutations
Nonsense mutations
Silent mutations
Splice-site mutations
Insertions and Deletions (Indels)
Duplications
Translocations
Inversion
The structure and function of RNA
Messenger RNA (mRNA)
Ribosomal RNA (rRNA)
Transfer RNA (tRNA)
Noncoding RNA (ncRNA)
Protein synthesis
The genetic code
Transcription – the mRNA synthesis
Initiation of Transcription
Termination of Transcription
Fate of synthesized mRNA
Translation
Nucleic Acids - Introduction
The heredity of all cellular life forms and viruses is defined by the genome, which is a long
sequence of nucleic acids that contains the genetic instructions specifying the biological
development of the organism. Nucleic acids are macromolecules in the form of
deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). DNA is the molecule of heredity of
all cellular forms of life (and most viruses); however, some viruses, such as tobacco mosaic virus
and poliomyelitis virus, contain only RNA, which acts as the hereditary material
of such viruses.
Nucleic acids are linear polymers composed of four different building blocks, the
nucleotides. The genetic information resides in the sequence of the nucleotides in the
polymer. This information is transmitted by transcription from DNA to RNA
molecules, which are utilized in the synthesis of proteins. In fact, the central dogma of modern
biology is:
DNA → RNA → protein
In complex cells (eukaryotes), such as those from plants, animals, fungi and protists, most of the
DNA is located in the cell nucleus. By contrast, in simpler cells called prokaryotes (the
eubacteria and archaea), DNA is not separated from the cytoplasm by a nuclear envelope. The
cellular organelles known as chloroplasts and mitochondria also carry DNA. RNA is found both
in the nucleus, where it is synthesized, and in the cytoplasm, where the synthesis of proteins
occurs.
Background
DNA, chromosomes and genes: how do these terms relate to one another? Aren't these just
different terms for the same thing? Well, yes and no. Having listened to this discussion amongst
my younger colleagues, I felt it necessary to give in this chapter the background of these
terms, and how it was discovered that DNA is, in fact, the genetic material.
Once it had been accepted that there was genetic transmission of traits, the search began for the
factor that carried the information. It was established that the following characteristics were
required of genetic material:
• It must contain information for replicating itself, in order to be in each cell of a growing
organism.
• It must be able to control expression of traits. As it was known that the enzymes and
proteins that act within us determine traits, and that it is the unique sequence of a protein
that makes it specific to its function, the genetic material must be able to encode
the sequence of proteins.
• It must be capable of mutational change in a controlled way, in order to ensure evolution
and survival of a species in a changing environment.
Hereditary Material is Bound on Chromosomes
The identity of Mendel's "factors" remained unsubstantiated until the turn of the century, some
forty years after Mendel's painstaking experiments. At that time, two exciting methodological
developments - the construction of increasingly powerful microscopes and the discovery of dyes
or stains that selectively colored the various components of the cell, made it possible to examine
cellular nuclei, which led to the discovery of long, thin, rod-like structures. These nuclear
structures were termed chromosomes. Many more microscopic observations confirmed the
role of chromosomes:
1. A variety of chromosome types, as defined by relative size and shape, were found to be
present in the nucleus of each cell. Furthermore, there usually were two copies of each
type of chromosome. Such a cell is called a diploid cell.
2. All of the cells of an organism, excluding sperm cells, egg cells, and red blood cells, and
all organisms of the same species, were observed to have the same number of
chromosomes.
3. The number of chromosomes in any cell appeared to double immediately prior to the cell
division processes of mitosis and cytokinesis, in which a single cell splits to form two
identical offspring cells.
4. The sex or germ cells appeared to have exactly half the number of chromosomes
found in the somatic cells of any organism (i.e. just one copy of each chromosome
type). Such cells are called haploid cells.
5. The fertilization of an egg with a sperm cell produces a diploid cell called a zygote,
which has the same number of chromosomes as the somatic cells of that organism.
Suddenly, the implications of Mendel's work became obvious: chromosomes behaved
like the particles or factors that Mendel described. Mendel's hereditary factors were located on
the newly discovered chromosomes or were the chromosomes themselves.
Proof that the chromosomes were Mendel's hereditary factors did not come until 1905, when the
first physical trait was shown to be the result of the presence of specific chromosomal material
and, conversely, that the absence of that specific chromosome meant the absence of the particular
physical trait. Microscopic observations had discovered the presence of what have come to be
called the sex chromosomes. These chromosomes, distinguished from other chromosomes and
from each other by their size, were named "X" and "Y." Researchers in 1905 were surprised to
observe that somatic cells taken from human female donors always contained two copies of the
X chromosome, while somatic cells taken from human male donors always contained one copy
of the X chromosome and one copy of the Y chromosome. All of the other chromosomes in the
nucleated cells of both male and female donors appeared identical. Though the mechanism was not
known, it seemed quite clear that the sex of an individual was directly related to the identity of
the chromosomes in that individual's cells. Thus, sex was shown to be the direct result of a
specific combination of chromosomal material, and sex became the first phenotype (physical
characteristic) to be assigned a chromosomal location - specifically the X and Y-chromosomes.
Chromosomal Subunit that Carries Hereditary Information
Quantitatively, DNA forms 40 per cent of a chromosome, whereas protein accounts for 60
per cent. At first, it seemed that protein must be responsible for carrying hereditary information,
since not only is protein present in larger quantities than DNA, but protein molecules are
composed of twenty different subunits while DNA molecules are composed of only four. It
seemed clear that a protein molecule could encode not only more information, but also a greater
variety of information, because it possessed a substantially larger collection of ingredients with
which to work.
Now it had to be determined which component of the chromosome, DNA or protein, was the
genetic material. Many scientists were sure that it was protein. After all, there were so many
subunits (20 amino acids) that it seemed obvious that there existed within protein the possibility
for much more diversity in expressing the genetic code than in DNA, which only has 4 subunits.
DNA was considered a boring molecule.
NUCLEIC ACIDS AND HEREDITY
DNA was first identified in 1868 by Friedrich Miescher, a Swiss biologist, in the nuclei of pus
cells obtained from discarded surgical bandages. He called the substance nuclein, noted the
presence of phosphorus, and separated the substance into a basic part and an acidic part.
DNA is the Genetic Material of Bacteria
In 1928, Frederick Griffith performed an experiment using pneumonia bacteria and mice. This
was one of the first experiments that hinted that DNA was the genetic material. He used
two strains of Streptococcus pneumoniae: the S strain, which has a polysaccharide coating around it
that makes it look smooth when viewed with a microscope, and the R strain, which does not have the
coating and thus looks rough under the microscope. When he injected live S strain into mice, the
mice contracted pneumonia and died. When he injected live R strain, which typically
does not cause illness, into mice, as predicted they did not get sick, but lived. Thinking that
perhaps the polysaccharide coating on the bacteria somehow caused the illness and knowing that
polysaccharides are not affected by heat, Griffith then used heat to kill some of the S strain
bacteria and injected those dead bacteria into mice. This failed to infect/kill the mice, indicating
that the polysaccharide coating was not what caused the disease, but rather, something within the
living cell. Since Griffith had used heat to kill the bacteria and heat denatures protein, he next
hypothesized that perhaps some protein within the living cells, denatured by the heat,
caused the disease. He then injected another group of mice with a mixture of heat-killed S and
live R, and the mice died! When he did a necropsy on the dead mice, he isolated live S strain
bacteria from the corpses. Griffith concluded that the live R strain bacteria must have absorbed
genetic material from the dead S strain bacteria, and since heat denatures protein, the protein in
the bacterial chromosomes was not the genetic material. This evidence pointed to DNA as being
the genetic material. Transformation is the process whereby one strain of a bacterium absorbs
genetic material from another strain of bacteria and turns into the type of bacterium whose
genetic material it absorbed. Because DNA was so poorly understood, scientists remained
skeptical up through the 1940s.
In 1944, Oswald Avery, Colin MacLeod, and Maclyn McCarty revisited Griffith's experiment
and concluded the transforming factor was DNA (Fig. 1). Their evidence was strong but not
totally conclusive. The then-current favorite for the hereditary material was protein; DNA was
not considered by many scientists to be a strong candidate.
Fig. 1: Transforming Principle - DNA might be the genetic material, after the experiments of
Griffith (1928) and Oswald Avery, Colin MacLeod, and Maclyn McCarty (1944) (Modified from
http://www.accessexcellence.org/)
The breakthrough in the quest to determine the hereditary material came from the work of Max
Delbruck and Salvador Luria in the 1940s. Bacteriophages are a type of virus that attacks
bacteria; the viruses that Delbruck and Luria worked with were those attacking Escherichia coli,
a bacterium found in human intestines. Bacteriophages consist of protein coats covering DNA.
Bacteriophages infect a cell by injecting DNA into the host cell. This viral DNA then
"disappears" while taking over the bacterial machinery and beginning to make new virus instead
of new bacteria. After 25 minutes the host cell bursts, releasing hundreds of new bacteriophage.
Phages have DNA and protein, making them ideal to resolve the nature of the hereditary
material.
DNA is the Genetic Material of Viruses
In 1952, Alfred Hershey and Martha Chase performed an experiment so significant that it has
been nicknamed the Hershey-Chase Experiment. At that time, people knew that viruses were
composed of DNA (or RNA) inside a protein coat/shell called a capsid. It was also known that
viruses replicate by taking over the host cell's metabolic functions to make more virus. We are
used to thinking and talking about viruses that invade our bodies and make us sick, but there
are other kinds of viruses that infect other kinds of animals, still other viruses that
infect plants, and even some viruses that infect bacteria. A virus that infects a bacterium is
called a bacteriophage because the host bacterial cell is killed as the new virus particles leave
it. In order to do all this, the virus must inject its genetic material
into the host cell. Thus, people realized that the viral genetic material had to be either its
DNA or its protein capsid. Hershey and Chase sought an answer to the question: is it the viral
DNA or the viral protein coat (capsid) that is the genetic material injected into
a host bacterial cell? To try to answer this question, Hershey and Chase performed an
experiment using a bacterium named Escherichia coli, or E. coli for short (named after the
scientist Theodor Escherich), and a virus called T2, a bacteriophage that infects E. coli.
Isolated T2, like other viruses, is just a crystal of DNA and protein, so it must live inside E. coli
in order to make more viruses like itself. When the new T2 viruses are ready to leave the host E.
coli cell (and go infect others), they burst the E. coli cell open, killing it (hence the name
bacteriophage). The results that Hershey and Chase obtained indicated that the viral DNA, not
the protein, is its genetic material.
Hershey and Chase used radioactive chemicals to distinguish between (label) the protein capsid
and the DNA in T2 virus so they could tell which of those molecules entered the E. coli cells.
Since some amino acids contain sulfur in their side chains, if T2 is grown in E. coli with a source
of radioactive sulfur, the sulfur will be incorporated into the T2 protein coat making it
radioactive. Since DNA has lots of phosphorus in its phosphate (PO4) groups, if T2 is grown in
E. coli with a source of radioactive phosphorus, the phosphorus will be incorporated into the
viral DNA, making that radioactive. Hershey and Chase grew two batches of T2 and E. coli: one
with radioactive sulfur and one with radioactive phosphorus to get batches of T2 labeled with
either radioactive S or radioactive P. Then, these radioactive T2 were placed in separate, new
batches of E. coli, but were left there only 10 minutes. This was to give the T2 time to inject their
genetic material into the bacteria, but not reproduce. In the next step, still in separate batches, the
mixtures were agitated in a kitchen blender to knock loose any viral parts not inside the E. coli
but perhaps stuck on the outer surface. Hopefully, this would differentiate between the protein
and DNA portions of the virus. Then, each mixture was spun in a centrifuge to separate the
heavy bacteria (with any viral parts that had gone into them) from the liquid solution they were
in (including any viral parts that had not entered the bacteria). The centrifuge causes the heavier
bacteria to be pulled to the bottom of the tube where they form a pellet, while the lightweight
viral leftovers stay suspended in the liquid portion called the supernatant. In the subsequent step,
the pellet and supernatant from each tube were separated and tested for the presence of
radioactivity. Radioactive sulfur was found in the supernatant, indicating that the viral protein
did not go into the bacteria. Radioactive phosphorus was found in the bacterial pellet, indicating
that viral DNA did go into the bacteria (Fig. 2).
Fig. 2: Showing DNA is the Genetic Material of Viruses, after Hershey-Chase Experiment
1952 (Modified from http://www.accessexcellence.org/). Panels: batch of T2 bacteriophage
grown in radioactive sulfur; batch of T2 bacteriophage grown in radioactive phosphorus.
Based on these results, Hershey and Chase concluded that DNA must be the genetic
material, not protein as many people believed. When their experiment was published and people
finally acknowledged that DNA was the genetic material, there was a lot of competition to be the
first to discover its chemical structure.
DNA is the Genetic Material of Eukaryotic Cells
When nucleic acid is added to populations of cells growing in culture, the nucleic acid enters the
cells, and in some of them results in the production of new proteins. When a purified DNA is
used, its incorporation leads to the production of a particular protein.
Although for historical reasons these experiments are described as transfection when performed
with eukaryotic cells, they are the direct counterpart of bacterial transformation. The DNA that is
introduced into the recipient cell becomes part of its genetic material, and is inherited in the same
way as any other part. Its expression confers a new trait upon the cells. At first, these
experiments were successful only with individual cells adapted to grow in a culture medium.
Since then, however, transfection has become a powerful tool used to study and control gene
expression. Cloned genes can be transfected into cells for biochemical characterization,
mutational analyses, investigation of the effects of gene expression on cell growth, investigation
of gene regulatory elements, and to produce a specific protein for purification. Transfection of
RNA can be used either to induce protein expression, or to repress it using antisense or RNA
interference (RNAi) procedures.
Such experiments show directly not only that DNA is the genetic material in eukaryotes, but also
that it can be transferred between different species and yet remain functional.
The genetic material of all known organisms and many viruses is DNA. However, some viruses
use an alternative type of nucleic acid, ribonucleic acid (RNA), as the genetic material. The
general principle of the nature of the genetic material, then, is that it is always nucleic acid; in
fact, it is DNA except in the RNA viruses.
COMPOSITION OF NUCLEIC ACIDS
During the 1920s, biochemist P.A. Levene analyzed the components of the DNA molecule. He
found it contained four nitrogenous bases: cytosine, thymine, adenine, and guanine; deoxyribose
sugar; and a phosphate group. He concluded that the basic unit (nucleotide) was composed of a
base attached to a sugar and that the phosphate also attached to the sugar. He (unfortunately) also
erroneously concluded that the proportions of bases were equal and that there was a
tetranucleotide that was the repeating structure of the molecule. Erwin Chargaff analyzed the
nitrogenous bases in many different forms of life, concluding that the four bases are not
always present in equal proportions (as proposed by Levene); instead, the amount of adenine
equals that of thymine, and the amount of guanine equals that of cytosine.
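Chargaff's base-composition result for double-stranded DNA (A pairs with T, G pairs with C, while the A+T and G+C fractions are free to differ) can be illustrated with a minimal sketch. The sequence below is hypothetical and chosen only for illustration; it is not measured data, and the helper name `duplex_base_counts` is an assumption of this example.

```python
from collections import Counter

# Watson-Crick partners in DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def duplex_base_counts(strand: str) -> Counter:
    """Count bases over both strands of a duplex built from one strand."""
    partner = "".join(COMPLEMENT[base] for base in strand)
    return Counter(strand + partner)

# Hypothetical sequence, for illustration only (not real data).
counts = duplex_base_counts("ATGCGGATTA")
print(counts["A"], counts["T"], counts["G"], counts["C"])
```

In the duplex the counts of A and T match, as do those of G and C, but A and G need not be equal, which is what contradicts the tetranucleotide idea of four bases in equal amounts.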
It is now clear that nucleic acids are formed of a sugar moiety, the pentose (Fig. 3);
nitrogenous bases, the purines and pyrimidines (Fig. 4); and phosphoric acid. The nucleotides are the
monomeric units of the nucleic acid. They result from the covalent bonding of a phosphate and a
heterocyclic base to the pentose (Fig. 5). Within the nucleotide, the combination of a base with
the pentose constitutes a nucleoside. For example, adenine is a purine base; adenosine (adenine +
ribose) is the corresponding nucleoside, and adenosine monophosphate (AMP), the nucleotide.
Nucleic acids are linear polymers in which the nucleotides are linked together by means of
phosphodiester bridges with the pentose moiety. These bonds link the 3' carbon in one
nucleotide to the 5' carbon in the pentose of the adjacent nucleotide. The backbone of nucleic
acids consists, therefore, of alternating phosphates and pentoses (Fig. 6). The nitrogenous bases
are attached to the sugars of this backbone.
Fig 3: DNA and RNA sugars (ribose, RNA backbone; deoxyribose, DNA backbone)
Fig 4: DNA and RNA bases (pyrimidines: thymine, DNA only; uracil, RNA only; cytosine. Purines: adenine; guanine)
Fig 5: DNA and RNA nucleotides (labels: CG, AT, AU)
Fig 6: Sugar backbone of nucleic acids (labels: 5', 3')
The phosphoric acid uses two of its three acid groups in the 3', 5' diester links. The remaining
negative group confers to the polynucleotide its acid properties and enables the molecule to form
ionic bonds with basic proteins. In eukaryotic cells, DNA is associated with histones (i.e., basic
proteins rich in arginine or lysine), forming a nucleoprotein. This anionic group also causes
nucleic acids to be highly basophilic, i.e., they stain readily with basic dyes.
The DNA, RNA difference:
Pentoses are of two types: ribose in RNA, and deoxyribose in DNA. The only difference
between these two sugars is that the oxygen on the 2' carbon is lacking in deoxyribose. The bases
found in nucleic acids are either pyrimidines or purines. Pyrimidines have a single heterocyclic
ring, whereas purines have two fused rings (Fig. 4). In DNA the pyrimidines are thymine (T) and
cytosine (C); the purines are adenine (A) and guanine (G) (Fig. 4). RNA contains uracil (U)
instead of thymine. Therefore, between RNA and DNA there are two main differences: in the
pentose moiety (ribose and deoxyribose, respectively) and in the pyrimidine base (uracil instead
of thymine). This explains why radioactive thymidine (i.e., the nucleoside) is used to label DNA
and radioactive uridine for RNA in various experiments.
All the genetic information of a living organism is stored in the linear sequence of the four bases.
Therefore, a four-letter alphabet (A, T, C, G) must code for the primary structure of all proteins
(i.e., composed of 20 amino acids). All the excitement in molecular biology, leading to the
unraveling of the genetic code, began when the structure of DNA was understood.
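A simple counting argument shows what a four-letter alphabet needs in order to specify 20 amino acids. The sketch below assumes fixed-length "words" of bases, which is an assumption of this example; the function name `smallest_word_length` is hypothetical.

```python
def smallest_word_length(alphabet_size: int, meanings: int) -> int:
    """Shortest fixed word length whose word count covers all meanings."""
    length = 1
    while alphabet_size ** length < meanings:
        length += 1
    return length

# Number of distinct words of 1, 2 and 3 bases from a 4-letter alphabet.
for n in (1, 2, 3):
    print(n, 4 ** n)

# Words of one or two bases give only 4 or 16 combinations, fewer than 20.
print(smallest_word_length(4, 20))
```

Under this assumption, three bases per word is the shortest length that can distinguish all 20 amino acids, with combinations to spare.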
THE STRUCTURE OF DNA
DNA had been proven as the genetic material by the Hershey-Chase experiments, but how DNA
served as genes was not yet certain. DNA must carry information from parent cell to daughter
cell. It must contain information for replicating itself. It must be chemically stable, relatively
unchanging. However, it must be capable of mutational change. Without mutations there would
be no process of evolution.
Many scientists were interested in deciphering the structure of DNA; among them were Francis
Crick, James Watson, Rosalind Franklin, and Maurice Wilkins. Watson and Crick gathered all
available data in an attempt to develop a model of DNA structure. Franklin took X-ray
diffraction photographs of crystalline DNA extract, the key to the puzzle. The data available
at the time were that DNA is a long molecule, that proteins are helically coiled (as determined by
the work of Linus Pauling), Chargaff's base data, and the X-ray diffraction data of Franklin and
Wilkins.
In 1953, based on the available data, Watson and Crick proposed a model for the DNA structure
that provided an explanation for Chargaff's base composition data and for the biological
properties of DNA, particularly its duplication in the cell. In the Watson and Crick model there
are two right-handed helical polynucleotide chains that form a double helix around a central axis.
The two strands are antiparallel, i.e., their 3'- 5' phosphodiester links are in opposite directions.
Furthermore, the bases are stacked inside the helix in a plane perpendicular to the helical axis
(Fig. 7).
The two strands are held together by hydrogen bonds established between the pairs of bases.
Since there is a fixed distance (i.e., 1.08 nm) between the two sugar moieties in the opposite
strands, only certain base pairs can fit into the structure. As may be seen in Figure 7, the only
two pairs that are possible are AT and CG. Two hydrogen bonds are formed between A and T,
and three are formed between C and G. In addition to hydrogen bonds, hydrophobic interactions,
established between the stacked bases, are also important in maintaining the double helical
structure.
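The pairing rule above (two hydrogen bonds per A-T pair, three per C-G pair) lends itself to a small worked example. The sketch below counts the inter-strand hydrogen bonds of a duplex from the sequence of one strand; the function name and the sequences are hypothetical, and stacking interactions are deliberately ignored.

```python
# Hydrogen bonds contributed by the base pair that each base forms
# (A-T: 2, C-G: 3), as stated in the text.
H_BONDS_PER_PAIR = {"A": 2, "T": 2, "G": 3, "C": 3}

def hydrogen_bonds(strand: str) -> int:
    """Total inter-strand hydrogen bonds for a duplex, given one strand."""
    return sum(H_BONDS_PER_PAIR[base] for base in strand.upper())

# Hypothetical sequences of equal length, for illustration only.
print(hydrogen_bonds("AATT"))
print(hydrogen_bonds("GGCC"))
```

For duplexes of equal length, the one richer in G and C carries more hydrogen bonds between its strands.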
In the Watson and Crick model, the distance between the stacked bases is 3.4 Å (0.34 nm),
which corresponds to the primary period demonstrated by x-ray diffraction. Furthermore, a turn
of the double helix is completed in 34 Å (3.4 nm), a length that corresponds to 10 nucleotide
residues. This distance corresponds to a secondary period along the axis. The double helix has a
mean diameter of ~20 Å (2.0 nm); furthermore, two grooves (a major or deep groove, and a
minor or more shallow one) are observed (Fig. 8).
Fig 7: Structure of DNA after Watson and Crick 1953 (labels: A-T and G-C base pairs,
phosphate (P) backbone, helical turn 34 Å, diameter 20 Å)
Fig 8: Space filling model of a segment of DNA showing major and minor grooves on the
surface after Feughelman et al., 1955 (taken from internet)
The axial sequence of bases along one polynucleotide chain may vary considerably, but on the
other chain the sequence must be complementary as in the following example:
First chain:   sugar - phosphate - sugar - phosphate - sugar - phosphate - sugar ...
            3'   T                   A                   C                   G   5'
                 ¦                   ¦                   ¦                   ¦
            5'   A                   T                   G                   C   3'
Second chain:  sugar - phosphate - sugar - phosphate - sugar - phosphate - sugar ...
Because of this property, given an order of bases on one chain, the other chain is exactly
complementary. During DNA duplication, the two chains dissociate, and each one serves as a
template for the synthesis of a complementary chain. In this way two DNA molecules are
produced, each having exactly the same molecular constitution. The varying sequence of the four
bases along the DNA chains forms the basis for genetic information. Four bases can produce
thousands of different hereditary characters, because DNA molecules are long polymers along
which an immense number of combinations may be produced.
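The complementarity rule can be sketched in a few lines: given one chain, the other is obtained by swapping each base for its partner and reversing the direction, since the chains are antiparallel. The function name `complementary_strand` is an assumption of this example, and both chains are written 5' to 3'.

```python
# Translation table for Watson-Crick partners: A<->T, G<->C.
PARTNER = str.maketrans("ATGC", "TACG")

def complementary_strand(strand_5_to_3: str) -> str:
    """Return the partner chain, also written in the 5'->3' direction."""
    return strand_5_to_3.translate(PARTNER)[::-1]

# The second chain of the example, 5'-ATGC-3', pairs with 3'-TACG-5',
# which reads GCAT when written 5'->3'.
print(complementary_strand("ATGC"))

# Number of distinct sequences possible for a chain of only 10 bases.
print(4 ** 10)
```

Applying the function twice returns the original chain, which mirrors how each strand can serve as the template for the other. The second print shows how quickly the number of possible sequences grows with length.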
Physical Properties of DNA
Denaturation and Renaturation of DNA; Hybridization
The hydrogen bonds between the strands of the double helix are weak enough that the strands can be
easily separated by enzymes. Enzymes known as helicases unwind the strands to facilitate the
advance of sequence-reading enzymes such as DNA polymerase. The unwinding builds up torsional
strain, which is relieved by topoisomerases that transiently cleave the phosphate backbone of one
of the strands so that it can swivel around the other. The strands can also be separated by gentle
heating, as used in PCR, provided they have fewer than about 10,000 base pairs (10 kilobase pairs,
or 10 kbp). The intertwining of the DNA strands makes long segments difficult to separate.
Circular DNA
When the ends of a piece of double-helical DNA are joined so that it forms a circle, as in plasmid
DNA, the strands are topologically knotted. This means they cannot be separated by gentle
heating or by any process that does not involve breaking a strand. The task of unknotting
topologically linked strands of DNA falls to enzymes known as topoisomerases. Some of these
enzymes unknot circular DNA by cleaving two strands so that another double-stranded segment
can pass through. Unknotting is required for the replication of circular DNA as well as for
various types of recombination in linear DNA.
Great length versus tiny breadth
The narrow breadth of the double helix makes it impossible to detect by conventional electron
microscopy, except by heavy staining. At the same time, the DNA found in many cells can be
macroscopic in length: approximately 2 meters in total for the DNA of a single human diploid cell.
Consequently, cells must compact or "package" DNA to carry it within them. This is one of the
functions of the chromosomes, which contain spool-like proteins known as histones, around
which DNA winds.
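The contrast between length and breadth can be checked with back-of-the-envelope arithmetic, using the 0.34 nm rise per base pair given for the Watson and Crick model. The haploid human genome size used below (about 3.2 billion base pairs) is an assumption of this sketch and does not come from the text.

```python
RISE_PER_BP_NM = 0.34   # axial rise per base pair (from the text)
HAPLOID_HUMAN_BP = 3.2e9  # assumed approximate genome size, not from the text

def contour_length_m(n_bp: float) -> float:
    """End-to-end length of fully extended B-form DNA, in meters."""
    return n_bp * RISE_PER_BP_NM * 1e-9

# One helical turn of 10 bp, then the DNA of one diploid cell.
print(contour_length_m(10))
print(contour_length_m(2 * HAPLOID_HUMAN_BP))
```

Under these assumptions the diploid total comes out at roughly two meters, against a helix diameter of about 2 nm, which is why packaging around histones is needed.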
Entropic stretching behavior
When DNA is in solution, it undergoes conformational fluctuations due to the energy available
in the thermal bath. For entropic reasons, more floppy states are thermally accessible than
stretched out states; for this reason, a single molecule of DNA stretches similarly to a rubber
band. Using optical tweezers, the entropic stretching behavior of DNA has been studied and
analyzed from a polymer physics perspective, and it has been found that DNA behaves like the
Kratky-Porod worm-like chain model with a persistence length of about 53 nm.
Furthermore, DNA undergoes a stretching phase transition at a force of 65 pN; above this force,
DNA is thought to take the form that Linus Pauling originally hypothesized, with the phosphates
in the middle and bases splayed outward. This proposed structure for overstretched DNA has
been called "P-form DNA," in honor of Pauling.
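The entropic, rubber-band-like stretching can be sketched numerically. The text names the worm-like chain model but gives no formula; the code below uses the widely quoted Marko-Siggia interpolation for the worm-like chain, which is an assumption of this example, along with the 53 nm persistence length from the text and an assumed temperature of 298 K. It applies only well below the 65 pN overstretching transition.

```python
K_B = 1.380649e-23  # Boltzmann constant, J/K

def wlc_force_pn(rel_extension: float,
                 persistence_nm: float = 53.0,
                 temp_k: float = 298.0) -> float:
    """Force (pN) to hold a worm-like chain at extension x/L.

    Marko-Siggia interpolation (assumed here, not given in the text):
    F = (kT/P) * [1 / (4 (1 - x/L)^2) - 1/4 + x/L]
    """
    if not 0.0 <= rel_extension < 1.0:
        raise ValueError("relative extension must be in [0, 1)")
    kt_pn_nm = K_B * temp_k / 1e-21  # 1 pN*nm = 1e-21 J
    bracket = 1.0 / (4.0 * (1.0 - rel_extension) ** 2) - 0.25 + rel_extension
    return kt_pn_nm / persistence_nm * bracket

for x in (0.1, 0.5, 0.9, 0.95):
    print(x, wlc_force_pn(x))
```

The force stays small while the molecule is still coiled and rises steeply as the extension approaches the full contour length, which is the entropic behavior described above.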
Different helix geometries
The DNA helix can assume one of three slightly different geometries, of which the "B" form
described by James D. Watson and Francis Crick is believed to predominate in cells. It is 2
nanometres wide and extends 3.4 nanometres per 10 bp of sequence. This is also the approximate
length of sequence in which the double helix makes one complete turn about its axis. This
frequency of twist (known as the helical pitch) depends largely on stacking forces that each base
exerts on its neighbors in the chain.
Supercoiled DNA
The B form of the DNA helix twists 360° per 10.6 bp in the absence of strain. But many
molecular biological processes can induce strain. A DNA segment with excess or insufficient
helical twisting is referred to, respectively, as positively or negatively "supercoiled". DNA in
vivo is typically negatively supercoiled, which facilitates the unwinding of the double-helix
required for RNA transcription.
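The definition above can be turned into a small worked example: compare the number of helical turns a segment actually has with the number it would have when relaxed at 10.6 bp per turn. This is a simplified sketch; the function name, the tolerance, and the example values are hypothetical, and the full topological treatment of closed circular DNA is not attempted.

```python
import math

BP_PER_TURN_RELAXED = 10.6  # B-form twist in the absence of strain (from the text)

def supercoil_state(n_bp: int, helical_turns: float) -> str:
    """Classify a segment by comparing its turns with the relaxed value."""
    relaxed_turns = n_bp / BP_PER_TURN_RELAXED
    if math.isclose(helical_turns, relaxed_turns, rel_tol=1e-9):
        return "relaxed"
    if helical_turns > relaxed_turns:
        return "positively supercoiled"
    return "negatively supercoiled"

# Hypothetical 5300 bp segment: 500 turns when relaxed.
print(supercoil_state(5300, 470))
print(supercoil_state(5300, 530))
```

A segment with fewer turns than the relaxed value is underwound, the state the text describes as typical in vivo.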
Sugar pucker
There are four conformations that the ribofuranose rings in nucleotides can acquire:
1. C-2' endo
2. C-2' exo
3. C-3' endo
4. C-3' exo
Ribose is usually in C-3'endo, while deoxyribose is usually in the C-2' endo sugar pucker
conformation. The A and B forms differ mainly in their sugar pucker. In the A form, the C3'
configuration is above the sugar ring, whilst the C2' configuration is below it. Thus, the A form
is described as "C3'-endo." Likewise, in the B form, the C2' configuration is above the sugar
ring, whilst C3' is below; this is called "C2'-endo." Altered sugar puckering in A-DNA results in
shortening the distance between adjacent phosphates by around one angstrom. This gives 11 to
12 base pairs per helical turn, instead of 10.5 in B-DNA. This sugar pucker gives A-DNA a
uniform ribbon shape, a cylindrical open core, and a deep major groove that is narrower and
more pronounced than the grooves found in B-DNA.
Conditions for formation of A and Z helices
The two other known double-helical forms of DNA, called A and Z, differ modestly in their
geometry and dimensions. The A form appears likely to occur only in dehydrated samples of
DNA, such as those used in crystallographic experiments, and possibly in hybrid pairings of
DNA and RNA strands. Segments of DNA that cells have methylated for regulatory purposes
may adopt the Z geometry, in which the strands turn about the helical axis like a mirror image of
the B form.
Non-helical forms
Other, including non-helical, forms of DNA have been described, for example a side-by-side
(SBS) configuration. Indeed, it is far from certain that the B-form double helix is the dominant
form in living cells.
DNA REPLICATION
Watson and Crick were particularly excited about their model because the complementary nature
of the DNA molecule suggested a way in which it might self-replicate. The two strands could
separate from one another, each still containing the complete information, and synthesize a new
strand. It was not until 1957, however, that Matthew Meselson and Franklin Stahl performed an
experiment to determine whether the two strands unwind and each act as a template for a new
strand (semiconservative replication, as proposed by Watson and Crick) or whether the strands
do not unwind but somehow generate a new double-stranded DNA (conservative replication). To
decide between the two, they labeled DNA with the heavy isotope of nitrogen (N-15) and then
allowed it to go through one round of replication in N-14 medium. The mixture was then
centrifuged so that heavy DNA (both strands N-15) would form a band lower in the tube, while
intermediate DNA (one N-15 strand and one N-14 strand) and light DNA (all N-14) would form
bands higher in the tube. From the result of this experiment Meselson and Stahl could show that
DNA replication is semiconservative: each old strand acts as the template for the synthesis of
a new one (Fig. 9).
Fig 9: Showing semiconservative and conservative models of DNA replications.
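The banding pattern expected under the semiconservative model can be worked through with a small simulation. This is only a sketch: the strand labels 'H' (N-15) and 'L' (N-14) and the function names are illustrative choices, not part of the original experiment.

```python
from collections import Counter

def replicate(population):
    """One round of semiconservative replication in light (N-14) medium.

    Each duplex is a pair of strand labels, 'H' (N-15) or 'L' (N-14).
    The two parental strands separate and each pairs with a new light strand.
    """
    daughters = []
    for strand_a, strand_b in population:
        daughters.append((strand_a, "L"))
        daughters.append((strand_b, "L"))
    return daughters

def band_fractions(population):
    """Fraction of duplexes in the heavy, intermediate and light bands."""
    names = {0: "light", 1: "intermediate", 2: "heavy"}
    counts = Counter(names[duplex.count("H")] for duplex in population)
    total = len(population)
    return {band: counts.get(band, 0) / total
            for band in ("heavy", "intermediate", "light")}

population = [("H", "H")]          # fully N-15 labeled starting DNA
for generation in range(3):
    print(generation, band_fractions(population))
    population = replicate(population)
```

After one round every duplex is intermediate, and after a second round half are intermediate and half are light. A conservative model would instead predict one heavy and one light band after the first round, which is what lets the experiment tell the two models apart.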
DNA replication is not a passive and spontaneous process; it requires two strands of parental
duplex to separate. However the disruption of the structure is only transient and is reversed as the
daughter duplex is formed. The process of DNA replication is catalyzed by a number of
enzymes. DNA replication begins with the activity of the topoisomerase enzyme, which is
responsible for initiation of the unwinding of the DNA by nicking a single strand of DNA and
releasing tension holding the helix in its coiled and supercoiled structure. Then an enzyme
known as DNA Helicase accomplishes unwinding of the original double strand, once
supercoiling has been eliminated by the topoisomerase. The two strands very much want to bind
together because of their hydrogen bonding affinity for each other, so the helicase activity
requires energy (in the form of ATP) to break the strands apart. The region where the DNA
double helix is partially unwound is known as the replication fork (Fig. 10). This unwound section appears
under electron microscopes as a "bubble" and is thus also known as a replication bubble. As the
two DNA strands separate and the bases are exposed, the enzyme DNA polymerase (III) moves
into position at the point where synthesis will begin. The start point for DNA polymerase is a
short segment of RNA known as an RNA primer. The very term "primer" is indicative of its
role, which is to "prime" or start DNA synthesis at certain points. The primer is "laid down"
complementary to the DNA template by an enzyme known as RNA polymerase or Primase.
The DNA polymerase (once it has reached its starting point as indicated by the primer) then adds
nucleotides one by one in an exactly complementary manner, A to T and G to C. DNA
polymerase is described as being "template dependent" in that it will "read" the sequence of
bases on the template strand and then "synthesize" the complementary strand. The template
strand is always read in the 3' to 5' direction. The new DNA strand (since it is complementary)
must be synthesized in the 5' to 3' direction (as both strands of a DNA molecule are described as
being antiparallel). Each arriving nucleotide is selected by hydrogen bonding with its
complementary base on the template strand. DNA polymerase then catalyzes the reaction between
the 5' phosphate on the incoming nucleotide and the free 3' OH on the growing polynucleotide,
forming a phosphodiester bond. As a result, the new DNA strands can grow only in the 5' to 3'
direction, and strand growth must begin at the 3' end of the template. Because the original DNA
strands are complementary and run antiparallel, only one new strand can begin at the 3' end of
the template DNA and grow continuously as the point of replication (the replication fork) moves
along the template DNA. The other strand must grow in the opposite direction because it is
complementary, not identical to the template strand. The result of this side's discontinuous
replication is the production of a series of short sections of new DNA called Okazaki fragments
(after their discoverer). Each fragment starts from its own RNA primer. The primers are
eventually removed by RNase H and the gaps are filled in with DNA by DNA polymerase I. To make
sure that this new strand of short segments is made into a continuous strand, the sections are
then joined by the action of an enzyme called DNA ligase, which ligates the pieces together by
forming the missing phosphodiester bonds.
Since each new strand is complementary to its old template strand, two identical new copies of
the DNA double helix are produced during replication. In each new helix, one strand is the old
template and the other is newly synthesized, a result described by saying that the replication is
semi-conservative.
Fig 10: The DNA replication fork (From http://www.accessexcellence.org/)
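The base-pairing and direction rules described above (A with T, G with C, template read 3' to 5', new strand made 5' to 3') can be illustrated with a short sketch. The function names are hypothetical and the sequences are made-up examples.

```python
# Watson-Crick pairing rules for DNA
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def new_strand(template_3_to_5):
    """Given a template written 3'->5', return the new strand written 5'->3'.

    Because the strands are antiparallel, reading the template 3'->5' and
    pairing base by base yields the new strand already in 5'->3' order.
    """
    return "".join(COMPLEMENT[base] for base in template_3_to_5)

def reverse_complement(seq_5_to_3):
    """Same operation for a strand given in the usual 5'->3' notation."""
    return new_strand(seq_5_to_3[::-1])

print(new_strand("TACGGA"))          # template written 3'->5'
print(reverse_complement("TCCGTA"))  # template written 5'->3'
```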
Prokaryotes
The single molecule of DNA that is the E. coli genome contains 4.7 x 10^6 nucleotide pairs. DNA
replication begins at a single, fixed location in this molecule, the replication origin, proceeds at
about 1000 nucleotides per second, and thus is done in no more than 40 minutes. And thanks to
the precision of the process (which includes a "proof-reading" function), the job is done with
only about one incorrect nucleotide for every 10^9 nucleotides inserted. In other words, more
often than not, the E. coli genome (4.7 x 10^6) is copied without error!
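The figures quoted above can be checked with a few lines of arithmetic. One assumption is added here that the text does not state: replication proceeds in both directions from the origin, so two forks work at once. With a single fork the same numbers give roughly 78 minutes, so the 40-minute figure only follows under that assumption.

```python
GENOME_BP = 4.7e6        # E. coli genome size quoted in the text
FORK_RATE_BP_S = 1000    # nucleotides per second quoted in the text
ERROR_RATE = 1e-9        # one incorrect nucleotide per 10^9 inserted

def replication_minutes(genome_bp, rate_bp_per_s, forks):
    """Minutes needed to copy the genome with the given number of forks."""
    return genome_bp / (rate_bp_per_s * forks) / 60

def expected_errors(genome_bp, error_rate):
    """Average number of wrong nucleotides per genome copy."""
    return genome_bp * error_rate

print(round(replication_minutes(GENOME_BP, FORK_RATE_BP_S, forks=1), 1))
print(round(replication_minutes(GENOME_BP, FORK_RATE_BP_S, forks=2), 1))  # assumed bidirectional
print(expected_errors(GENOME_BP, ERROR_RATE))
```

The expected number of errors per copy comes out at about 0.005, which is why most copies of the genome contain no error at all.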
Eukaryotes
The average human chromosome contains 150 x 10^6 nucleotide pairs, which are copied at about
50 base pairs per second. The process would take a month (rather than the hour it actually does)
but for the fact that there are many places on the eukaryotic chromosome where replication can
begin. Replication begins at some replication origins earlier in S phase than at others, but the
process is completed for all by the end of S phase. As replication nears completion, "bubbles" of
newly replicated DNA meet and fuse, finally forming two new molecules.
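The "month versus hour" comparison can be reproduced as a rough calculation. The sketch below assumes, purely for illustration, that every origin fires at the same time and sends out two forks. The text notes that origins actually fire at different times in S phase, so the result is best read as a lower bound on the number of origins.

```python
import math

CHROMOSOME_BP = 150e6    # average human chromosome, from the text
FORK_RATE_BP_S = 50      # base pairs per second, from the text

def single_fork_days(chromosome_bp, rate_bp_per_s):
    """Days needed if one fork had to copy the whole chromosome."""
    return chromosome_bp / rate_bp_per_s / 86400

def origins_needed(chromosome_bp, rate_bp_per_s, target_seconds, forks_per_origin=2):
    """Smallest number of simultaneously firing origins that meets the target time."""
    return math.ceil(chromosome_bp / (rate_bp_per_s * forks_per_origin * target_seconds))

print(round(single_fork_days(CHROMOSOME_BP, FORK_RATE_BP_S), 1))   # about a month
print(origins_needed(CHROMOSOME_BP, FORK_RATE_BP_S, target_seconds=3600))
```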
MUTATION – THE SEQUENCE CHANGE IN DNA
All organisms suffer a certain number of structural changes in their DNA as the result of normal
cellular operations (changes result when the DNA polymerase makes a mistake, which happens
about once every 100,000,000 bases) or random interactions with environmental factors like
ultraviolet light, nuclear radiation, and certain chemicals. The actual number of such changes
that remain incorporated into the DNA is far lower than their rate of occurrence, as cells
contain special DNA repair proteins that fix many of the changes in the DNA. The changes that
escape the repair mechanism and become incorporated into the DNA are called
spontaneous mutations; the rate at which they occur is characteristic for any particular organism
and is sometimes called the background level. Mutations are rare events, and of course those that
damage a gene are selected against during evolution. It is therefore difficult to obtain large
numbers of spontaneous mutants to study from natural populations.
Some of these changes occur in cells of the body - such as in skin cells as a result of sun
exposure - and are not passed on to children; these are called somatic mutations. But other errors can
occur in the DNA of cells that produce the eggs and sperm. These are called germline
mutations and can be passed from parent to child. If a child inherits a germline mutation from
a parent, every cell in the child's body will have this error in its DNA. Germline mutations are
what cause diseases to run in families, and are responsible for the kind of hereditary diseases
covered by Genetic Health.
A gene is essentially a sequence of the bases A, T, G, C, and it is in the sequence of these bases
that the information describing how to make a protein lies. Any change in the sequence can
alter the gene's meaning and change the protein that is made, or how or when a cell makes that
protein. There are many different ways to alter a gene. The following are examples of some types
of mutations:
Single-base substitutions
A single base, say an A, becomes replaced by another. Single base substitutions are also called
point mutations. If one purine (A or G) or pyrimidine (C or T) is replaced by the other, the
substitution is called a transition. If a purine is replaced by a pyrimidine or vice-versa, the
substitution is called a transversion.
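The transition/transversion rule can be written down directly. The following is a minimal sketch with an illustrative function name.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(old_base, new_base):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    bases = {old_base, new_base}
    if old_base == new_base or not bases <= (PURINES | PYRIMIDINES):
        raise ValueError("expected two different DNA bases")
    # Same chemical class (purine<->purine or pyrimidine<->pyrimidine): transition
    if bases <= PURINES or bases <= PYRIMIDINES:
        return "transition"
    return "transversion"

print(classify_substitution("A", "G"))  # purine to purine
print(classify_substitution("A", "T"))  # purine to pyrimidine
```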
Missense mutations
With a missense mutation, the new nucleotide alters the codon so as to produce an altered amino
acid in the protein product, e.g. in sickle-cell disease. The replacement of A by T at the 17th
nucleotide of the gene for the beta chain of hemoglobin changes the codon GAG (for glutamic
acid) to GTG (which encodes valine). Thus the 6th amino acid in the chain becomes valine
instead of glutamic acid.
Nonsense mutations
With a nonsense mutation, the new nucleotide changes a codon that specified an amino acid to
one of the STOP codons (TAA, TAG, or TGA). Therefore, translation of the messenger RNA
transcribed from this mutant gene will stop prematurely. The earlier in the gene that this occurs,
the more truncated the protein product and the more likely that it will be unable to function.
Silent mutations
Most amino acids are encoded by several different codons. For example, if the third base in the TCT
codon for serine is changed to any one of the other three bases, serine will still be encoded. Such
mutations are said to be silent because they cause no change in their product and cannot be
detected without sequencing the gene (or its mRNA).
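The three outcomes of a single-base substitution described above (missense, nonsense, silent) can be told apart by translating the codon before and after the change. The sketch below builds the standard genetic code in DNA letters. The function name is illustrative, and it assumes the original codon encodes an amino acid rather than a STOP.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = STOP)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_point_mutation(codon, position, new_base):
    """Return (mutant codon, kind) for a substitution at position 0, 1 or 2."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if after == before:
        kind = "silent"
    elif after == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return mutant, kind

print(classify_point_mutation("GAG", 1, "T"))  # sickle-cell change, Glu -> Val
print(classify_point_mutation("TCT", 2, "C"))  # third-base change in a serine codon
print(classify_point_mutation("TAT", 2, "A"))  # Tyr codon turned into a STOP
```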
Splice-site mutations
The removal of intron sequences, as pre-mRNA is being processed to form mRNA, must be done
with great precision. Nucleotide signals at the splice sites guide the enzymatic machinery. If a
mutation alters one of these signals, then the intron is not removed and remains as part of the
final RNA molecule. The translation of its sequence alters the sequence of the protein product.
Insertions and Deletions (Indels)
Extra base pairs may be added (insertions) or removed (deletions) from the DNA of a gene. The
number can range from one to thousands. Collectively, these mutations are called indels. Indels
involving one or two base pairs (or any number that is not a multiple of three) can have devastating consequences for the
gene because translation of the gene is "frameshifted". When the reading frame is shifted
one nucleotide to the right, the same sequence of nucleotides encodes a different
sequence of amino acids. The mRNA is translated in new groups of three nucleotides and the
protein specified by these new codons will be worthless. Frameshifts often create new STOP
codons and thus generate nonsense mutations. Perhaps that is just as well as the protein would
probably be too garbled anyway to be useful to the cell.
Indels of three nucleotides or multiples of three may be less serious because they preserve the
reading frame. However, a number of inherited human disorders are caused by the insertion of
many copies of the same triplet of nucleotides. Huntington's disease and the fragile X
syndrome are examples of such trinucleotide repeat diseases.
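The difference between a frameshifting indel and an in-frame indel can be seen by translating a short sequence before and after the change. The sequence below is invented for illustration, and the helper names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code in DNA letters ('*' = STOP)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(coding_dna):
    """Translate from the first base, codon by codon, stopping at a STOP codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

original = "ATGGCATGGAACTAA"                    # ATG GCA TGG AAC TAA
deletion = original[:3] + original[4:]          # one base lost: frame shifts
insertion = original[:3] + "GGG" + original[3:] # three bases added: frame kept

print(translate(original))
print(translate(deletion))
print(translate(insertion))
```

The one-base deletion changes every codon after the deletion point, whereas the three-base insertion only adds one amino acid and leaves the rest of the protein unchanged.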
Duplications
Duplications are a doubling of a section of the genome. During meiosis, crossing over between
sister chromatids that are out of alignment can produce one chromatid with a duplicated gene and
the other having a gene with deletions.
Translocations
Translocations are the transfer of a piece of one chromosome to a nonhomologous
chromosome. Translocations are often reciprocal; that is, the two nonhomologues swap
segments. Translocations can alter the phenotype in several ways:
• the break may occur within a gene destroying its function
• translocated genes may come under the influence of different promoters and enhancers so
that their expression is altered. The translocations in Burkitt's lymphoma are an example.
• the breakpoint may occur within a gene creating a hybrid gene. This may be transcribed
and translated into a protein with an N-terminal of one normal cell protein coupled to the
C-terminal of another. The Philadelphia chromosome found so often in the leukemic cells
of patients with chronic myelogenous leukemia (CML) is the result of a translocation
which produces a compound gene (bcr-abl).
Inversion
In an inversion mutation, an entire section of DNA is reversed. A small inversion may involve
only a few bases within a gene, while longer inversions involve large regions of a chromosome
containing several genes.
THE STRUCTURE AND FUNCTION OF RNA
Ribonucleic acid, or RNA, gets its name from the sugar group in the molecule's backbone,
ribose. The primary structure of RNA is similar to that of DNA, but several important similarities
and differences exist between RNA and DNA. Like DNA, RNA has a sugar-phosphate backbone
with nucleotide bases attached to it. Like DNA, RNA contains the bases adenine (A), cytosine
(C) and guanine (G); however, RNA does not contain thymine and instead contains the base uracil (U).
Unlike the double-stranded DNA molecule, RNA is a single-stranded molecule, so its base
composition does not follow Chargaff’s rule. Nevertheless, there is some degree of secondary
structure in the different RNA types, because the molecule can form hairpin loops of hydrogen
bonded A-U or G-C pairs. The actual sequence of ribonucleotides in RNA is sometimes called its
primary structure. With the loops included it is said to have a secondary structure. It can also fold
into a three dimensional shape referred to as its tertiary structure.
RNA is the genetic material of many viruses, and RNA is also
important in the production of proteins in other living organisms. RNA can move around the
cells of living organisms and thus serves as a sort of genetic messenger, relaying the information
stored in the cell's DNA out from the nucleus to other parts of the cell where it is used to help
make proteins. There are three main types of RNA molecules:
• messenger RNA (mRNA)
• ribosomal RNA (rRNA)
• transfer RNA (tRNA)
There are also many other types of RNA molecules that are not directly involved in protein
synthesis. They are sometimes called noncoding RNA.
Messenger RNA (mRNA)
Messenger RNA is the type of RNA most familiar to people; it carries the information from
DNA to the site of protein synthesis. The term messenger RNA (mRNA), proposed by Jacob and
Monod in 1961, refers to the fact that this is a template molecule copied from DNA and has a
rapid turnover. The information stored in mRNA is used to make proteins. When mRNA is first
created in eucaryotes it is called precursor mRNA because it needs to be modified before it can
pass on the information it has for the formation of protein. The first two modifications are
capping and the addition of a poly A tail. The third type of modification involves the removal of
introns and the splicing together of exons. Segments of DNA that contain information for the
formation of proteins are called exons. Exons typically have other segments of DNA separating
them from each other. These segments are called introns. The precursor mRNA contains both the
exons and introns. The introns need to be cut out and the exons need to be connected back
together. Some human genes for proteins are split up into as many as 79 different exons. A
spliceosome is a complex of proteins and small RNA molecules, and is where the removal of
introns and the splicing together of exons take place. Messenger RNA makes up only about 5%
of all RNA in a typical cell and is made up of small amounts of thousands of different mRNA
molecules. In bacteria mRNA is modified very little if at all. Since bacteria do not have a
nucleus, translation starts before transcription even ends so there is no time for RNA splicing, or
a need as prokaryotic genes are not split into separate exons.
Ribosomal RNA (rRNA)
Ribosomes are made of protein and ribosomal RNA (rRNA) and are where translation of RNA to
protein takes place. In E. coli ribosomes contain three kinds of rRNA - 23S, 16S and 5S. In
eucaryotes, there are four kinds of rRNA - 18S, 28S, 5.8S, and 5S. One 18S molecule is used to
make the small subunit of the ribosome, with the help of several proteins. The 28S, 5.8S, and 5S
rRNA molecules are involved with the construction of the large subunit of the ribosome. The
28S, 18S, and 5.8S molecules are made from the processing of a single precursor RNA.
Transfer RNA (tRNA)
There are at least 32 different kinds of tRNA in a eucaryotic cell. They are relatively small
molecules; each one is made up of only 73-93 ribonucleotides. Although tRNA is a single strand
of RNA, it bends around in certain places resulting in some ribonucleotides pairing up with
others in the same chain, forming three loops (Fig. 11). Each tRNA molecule has one amino acid
attached to its 3' end. Since there are only 20 amino acids and around 32 different kinds of
tRNAs, some amino acids are carried by more than one type of tRNA. On one of the three loops
is what is called an anticodon. Anticodons are made up of three bases and are involved in
translation. The particular amino acid attached to a tRNA molecule is determined by its
anticodon sequence.
Fig. 11: Diagram of tRNA showing the aminoacyl binding site (acceptor); the anticodon loop,
which binds to mRNA at a specific codon; the ribosomal recognition site (TψC loop); and the D loop
(From http://motif.stanford.edu/thesis/tRNA.html).
Noncoding RNA (ncRNA)
Noncoding RNA is not involved (at least not directly) in protein synthesis. Instead it is involved
in many other cell processes including the regulation of transcription, DNA replication and RNA
processing and modification. Noncoding RNAs range in size from 21
ribonucleotides to 10,000 or more ribonucleotides in length. In bacteria ncRNA
is sometimes referred to as small RNA (sRNA). Some examples of ncRNA are:
• XIST RNA - inactivates one of the two X chromosomes in females
• SnRNA - involved with the processing of larger precursor RNA molecules
• SnoRNA - is involved in making ribosomes and telomeres
• miRNA - is involved with the regulation of the expression of mRNA
• siRNA - small, bind to complementary RNA sequences targeting them for destruction
PROTEIN SYNTHESIS
The genetic code
Living organisms are complex systems. Hundreds of thousands of proteins exist inside each one
of us to help carry out our daily functions. These proteins are produced locally, assembled
piece-by-piece to exact specifications. An enormous amount of information is required to manage this
complex system correctly. This information, detailing the specific structure of the proteins inside
of our bodies, is stored in the DNA molecule. At the molecular level it has been found that the
codons, i.e., the hereditary units that contain the information to code for a single amino acid, are
made of three nucleotides (a triplet). This information is first transcribed into the messenger
RNA (mRNA), which has a sequence of bases complementary with DNA, from which it is
copied. In fact, mRNA, like DNA, has only four bases, whereas proteins may contain up to 20
different amino acids. Permutation of the 4 bases yields 4^3 or 64 triplets - more than enough to code for 20
amino acids. If the genetic code consisted of doublets, the number of codons would be
insufficient (i.e., 4^2 = 16). The mRNA in turn serves as an intermediary that contains the same
genetic information and translates this information into the amino acid sequence of the protein.
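The counting argument (4^2 = 16 doublets are too few for 20 amino acids, 4^3 = 64 triplets are enough) can be checked by enumerating the code words. This is a small sketch with an illustrative function name.

```python
from itertools import product

RNA_BASES = "UCAG"
AMINO_ACIDS = 20

def code_words(length):
    """All possible base sequences of the given length."""
    return ["".join(combo) for combo in product(RNA_BASES, repeat=length)]

for length in (1, 2, 3):
    n = len(code_words(length))
    verdict = "enough" if n >= AMINO_ACIDS else "too few"
    print(length, n, verdict)
```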
It is important to remember some of the fundamental experiments that facilitated the discovery of
the genetic code. In 1961 Nirenberg and Matthaei made the basic observation that synthetic
polyribonucleotides could act as artificial mRNAs and could stimulate the incorporation of
amino acids into polypeptides. The first one used was polyuridylic acid (poly U) and the result
was the coding of polyphenylalanine (a peptide chain made of phenylalanine). Thus, it was
deduced that the codon for phenylalanine was UUU. Other homopolymers, such as poly A,
stimulated the uptake of lysine and poly C of proline. The use of synthetic RNAs of known
composition was made possible by a previous discovery by Ochoa that the enzyme polynucleotide phosphorylase can link the specific nucleotides added to the medium. By 1963, the
experiments with synthetic RNAs done in the laboratories of Nirenberg and Ochoa had
established most of the codon sequences. The recognition of codons was later made possible by
the use of trinucleotide templates of known base composition. When ribosomes are incubated
with 14C-labeled aminoacyl-tRNA (14C-AA-tRNA) and such trinucleotides, complexes are formed that can easily be detected
by filtration. In the laboratory of Khorana, polyribonucleotides with alternating doublets or
triplets of known sequences were synthesized and used in cell free systems.
As shown in Table 1, several RNA codons may code for a single amino acid, a fact that is also
called degeneracy of the genetic code. Leucine, for example, may be coded by CUU, CUC, CUA,
and CUG (as well as UUA and UUG). In most cases the synonymous codons differ only in the base occupying the third position
of the triplet. The first two bases of the codon are apparently more important in coding. Since the
same amino acid is coded by synonymous codons, it is logical to assume that mutations due to
replacement of the third base may go unnoticed.
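Degeneracy can be made concrete by grouping the 64 codons of the standard code by the amino acid they specify. The sketch below uses one-letter amino acid symbols ('L' = leucine, 'M' = methionine, 'W' = tryptophan, '*' = nonsense/stop). The variable names are illustrative.

```python
from collections import defaultdict

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

synonyms = defaultdict(list)
for i, first in enumerate(BASES):
    for j, second in enumerate(BASES):
        for k, third in enumerate(BASES):
            synonyms[AMINO[16 * i + 4 * j + k]].append(first + second + third)

print(synonyms["L"])   # all codons for leucine
print(synonyms["M"])   # methionine has a single codon
print(synonyms["*"])   # the nonsense (stop) codons

# Amino acids whose synonymous codons all share the same first two bases
same_prefix = [aa for aa, codons in synonyms.items()
               if len(codons) > 1 and len({c[:2] for c in codons}) == 1]
print(len(same_prefix))
```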
The initiation signal for the synthesis of a protein is the AUG codon. When the AUG codon is at
the beginning of the message (starting codon), in bacteria, it will code for N-formylmethionine.
If the AUG codon is in another position, it will code for methionine. The termination signal is
provided by the so-called nonsense codons UAG, UAA, and UGA (Table 1).
Table 1: The genetic code

1st                          2nd Base                              3rd
Base  U              C              A              G              Base

U     UUU Phe        UCU Ser        UAU Tyr        UGU Cys        U
      UUC Phe        UCC Ser        UAC Tyr        UGC Cys        C
      UUA Leu        UCA Ser        UAA Nonsense   UGA Nonsense   A
      UUG Leu        UCG Ser        UAG Nonsense   UGG Trp        G

C     CUU Leu        CCU Pro        CAU His        CGU Arg        U
      CUC Leu        CCC Pro        CAC His        CGC Arg        C
      CUA Leu        CCA Pro        CAA Gln        CGA Arg        A
      CUG Leu        CCG Pro        CAG Gln        CGG Arg        G

A     AUU Ile        ACU Thr        AAU Asn        AGU Ser        U
      AUC Ile        ACC Thr        AAC Asn        AGC Ser        C
      AUA Ile        ACA Thr        AAA Lys        AGA Arg        A
      AUG Met/F-Met  ACG Thr        AAG Lys        AGG Arg        G

G     GUU Val        GCU Ala        GAU Asp        GGU Gly        U
      GUC Val        GCC Ala        GAC Asp        GGC Gly        C
      GUA Val        GCA Ala        GAA Glu        GGA Gly        A
      GUG Val        GCG Ala        GAG Glu        GGG Gly        G
Although most of our knowledge about the genetic code comes from experiments with E. coli,
essentially similar results have been obtained with other systems, such as amphibian, mammalian
liver, and plant tissue. It may be said that the genetic code is largely universal, i.e., there is a
single code for all living organisms. As Nirenberg has pointed out, the genetic code may have
developed at the same time as the first bacteria, some three billion years ago, and since then it
has changed relatively little throughout evolution of living organisms.
Transcription – the mRNA synthesis
The process of converting the information contained in a DNA segment into proteins begins with
the synthesis of mRNA molecules containing anywhere from several hundred to several
thousand ribonucleotides, depending on the size of the protein to be made. Each of the 100,000
or so proteins in the human body is synthesized from a different mRNA that has been
synthesized in the cell nucleus by transcription of DNA (gene), a process highly analogous to
DNA replication. Of course, there are different effectors, or proteins, that direct transcription.
Primary among these is the RNA polymerase holoenzyme, an agglomeration of many different
factors that together direct the synthesis of mRNA on a DNA template.
Initiation of Transcription
RNA polymerase must be able to recognize the beginning of a gene so that it knows where to
start synthesizing an mRNA. It is directed to the start site of transcription by one of its subunits'
affinity to a particular DNA sequence that appears at the beginning of genes. Such a unidirectional
sequence on one strand of DNA is called a promoter site. These sites are recognized by a factor
called "SIGMA". It is sigma's job to recognize the promoter sites and "tell" the DNA dependent
RNA polymerase both where to start and in which direction (that is, on which strand) to continue
synthesis. Once the RNA polymerase has been directed to the start point of the gene by sigma,
the sigma factor is released and the RNA polymerase carries out the process of transcription. The
bacterial promoter almost always contains some version of the elements shown in figure 12. The
RNA polymerase then stretches open the double helix at that point in the DNA and begins
synthesis of an RNA strand complementary to one of the strands of DNA. The DNA strand from
which it copies RNA is called the antisense or template strand, and the other strand, to which the
RNA is identical in sequence (with U in place of T), is called the sense or coding strand. The RNA
polymerase recruits rNTPs (ribonucleoside triphosphates) in the same way that DNA polymerase recruits dNTPs. However,
since synthesis is single stranded and only proceeds in the 5' to 3' direction, there is no need for
Okazaki fragments.
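The relationship between the template (antisense) strand, the coding (sense) strand and the mRNA can be sketched in a few lines. The function names and the short sequence are illustrative only.

```python
# Pairing rules used when copying DNA into RNA (A on the template pairs with U)
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe_template(template_3_to_5):
    """mRNA (5'->3') copied from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)

def transcribe_coding(coding_5_to_3):
    """Shortcut: the mRNA matches the coding strand with U in place of T."""
    return coding_5_to_3.replace("T", "U")

coding = "ATGGAGTAA"      # sense strand, 5'->3'
template = "TACCTCATT"    # antisense strand, written 3'->5'

print(transcribe_template(template))
print(transcribe_coding(coding))
```

Both routes give the same mRNA, which is the sense in which the coding strand is "identical" to the transcript.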
Fig. 12: Transcription initiation site showing promoter sequences.
Termination of Transcription
How does RNA polymerase know when to stop transcribing a gene? Like the promoter
sequence, there are other base sequences at the end of a gene that signal a stop to mRNA
synthesis. Just as there is a sigma factor to help signal the beginning of a gene, another factor
called "Rho" aids in terminating the process of transcription. When the end of the gene is near,
the “Rho” factor binds to the mRNA and interacts with the RNA polymerase. The interaction of
Rho with the RNA polymerase causes the enzyme to "fall off" the DNA template strand, thus
stopping transcription.
Fate of synthesized mRNA
The average life span of some of the mRNAs in E. coli is about two minutes, after which the
molecules are broken down by ribonucleases. In fact, in bacteria mRNA may be read at one end
while the other end is still being transcribed. It may also disintegrate at the starting end while the
reading is terminating at the other. In contrast, the origin and fate of mRNA in eukaryotic cells is
much more complex. In eukaryotes the formation of a functionally active mRNA is the
consequence of a complex series of steps that comprises (1) the actual transcription of DNA into
mRNA precursors, (2) the intracellular processing of these precursors, and (3) the transport of
the mRNAs into the cytoplasm and their association with the ribosomes to initiate the process of
translation or protein synthesis.
Translation
The cellular factory responsible for synthesizing proteins is the ribosome. The ribosome consists
of structural RNA and about 80 different proteins. In its inactive state, it exists as two subunits; a
large subunit and a small subunit. When the small subunit encounters an mRNA, the process of
translation of the mRNA to protein begins. There are two sites in the large subunit. The first
is the site where the growing peptide (the nascent protein chain) will reside; it is known as the P
site. The second site lies just in the 3' direction from the P site; it is known as the A site. This is
where the incoming tRNA will attach itself.
As discussed previously, the adaptor molecule that acts as a translator between mRNA and
protein is a specific RNA molecule, tRNA (transfer RNA). Each tRNA has a specific anticodon
and acceptor site. Each tRNA also has a specific charger protein; this protein can only bind to
that particular tRNA and attach the correct amino acid to the acceptor site. The energy to make
this bond comes from ATP. These charger proteins are called aminoacyl tRNA synthetases.
The first AUG codon on the 5' end of the mRNA acts as a "start" signal for the translation
machinery and codes for the introduction of a methionine amino acid. Initiation is complete
when the methionine tRNA occupies one of the two binding sites on the ribosome. The next
incoming tRNA will bind to the A site (next to where the tRNA with the methionine attached is
on the P site). ALL available tRNAs will approach the site and try to attach, but the only tRNA
which will successfully attach is the one whose anticodon is complementary to the codon of the
A site on the mRNA. In order for a protein chain to form, the amino acids must be
linked together. The link between amino acids is called a peptide bond, and amino acids continue to
be linked until the protein is finished. This special type of bond is formed by the peptidyl
transferase activity of the ribosome. As the bond forms between the two amino acids, the tRNA on the
P site passes its amino acid on to the tRNA on the A site and then leaves. The ribosome slides down
three bases (1 codon on the mRNA) by the action of a translocase, so that the tRNA with the two
amino acids on it now sits on the P site (because it is holding the growing protein) and a new A site is exposed. The
next appropriate tRNA molecule "lands" bringing its amino acid right next to the tRNA holding
the two amino acids. At this point, the process repeats itself: a peptide bond forms between the
two amino acid molecules already joined together and the newly brought in amino acid; the
tRNA on the P site leaves and the chain of amino acids is passed to the tRNA on the A site by
the action of translocase (now this site is called the P site because this tRNA now has the
growing protein chains). The ribosome slides down another codon and the procedure repeats
itself until the termination event occurs. A "stop" codon (UAA, UGA, or UAG) signals the end
of the process (Fig. 13). There is no tRNA that is complementary to the Stop Codon, so the
process of building the protein stops. A protein called a release factor then frees the newly
made polypeptide chain, also known as the protein, from the last tRNA. The mRNA molecule is
released from the ribosome as the small and large subunits fall apart. The mRNA can then be
re-translated or it may be degraded, depending on how much of that particular protein is needed. All
mRNA messages are eventually degraded when the protein no longer needs to be made. Even
though every protein begins with the amino acid methionine, not all proteins will ultimately have
methionine at one end. If the "start" methionine is not needed, it is removed before the new
protein goes to work (either inside the cell or outside the cell, depending on the type of protein
synthesized).
Fig. 13: Showing different stages of protein synthesis (from
http://web.mit.edu/esgbio/www/7001main.html).
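The codon-by-codon elongation and stop-codon termination described above can be sketched in a few lines of Python. This is a minimal illustration, not a model of the ribosome: the function name `translate` and the example mRNA are hypothetical, the table is the standard genetic code written with one-letter amino acid symbols, and initiation is reduced to finding the first AUG.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read an mRNA three bases at a time from the first AUG until a stop codon."""
    start = mrna.find("AUG")          # simplified initiation: first start codon
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):   # the ribosome slides one codon per step
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":         # UAA, UAG or UGA: no matching tRNA, so stop
            break
        peptide.append(amino_acid)    # a new peptide bond extends the chain
    return "".join(peptide)


print(translate("GGAUGGCUUUCUAAGG"))  # codons read: AUG GCU UUC UAA
```

The example prints the chain Met-Ala-Phe in one-letter code; removal of the start methionine, mentioned above, is not modelled.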
THE GENETIC REGULATION
All the cells of an organism have the same DNA content, and the DNA of a cell specifies its
activities and characteristics. Yet there are different cell types in our bodies, and the
activities of these cells change with time. The hormone-producing cells in the pituitary gland
produce growth hormone mainly during childhood and adolescence. These same cells remain in the
pituitary in adulthood, but they produce far less growth hormone. How do they know when they are
needed or not needed? This question, as it applied to large, complex organisms like humans, was
very daunting for scientists in the first half of the 20th century.
The Lac Operon: An Inducer
Francois Jacob and Jacques Monod were the first scientists to postulate the existence of a
transcriptionally regulated system, the Operon. While working on lactose metabolism in E. coli,
they showed that the Lac Operon comprises three structural genes (LacZ, LacY, and LacA) and
produces a polycistronic mRNA, which codes for the following enzymes:
• Beta-galactosidase: This enzyme hydrolyzes the bond between the two sugars of lactose,
glucose and galactose. It is coded for by the gene LacZ.
• Lactose permease: This protein spans the cell membrane and brings lactose
into the cell from the outside environment. The membrane is otherwise
essentially impermeable to lactose. It is coded for by the gene LacY.
• Thiogalactoside transacetylase: The function of this enzyme is not known. It is
coded for by the gene LacA.
The structural genes responsible for these three enzymes lie adjacent to each other on the E.
coli genome. A region that precedes them is responsible for the regulation of the lactose
metabolic genes. It contains the regulatory gene LacI and the promoter and operator regions of
the Lac Operon. The LacI gene codes for a repressor, a soluble protein that binds specifically to
the lac operator region. Each subunit of the repressor has one binding site for the inducer. The
promoter is the region to which RNA polymerase first attaches to initiate transcription of the
structural genes (Fig. 14a).
When lactose is present, it acts as an inducer of the operon. It enters the cell and binds to the Lac
repressor, inducing a conformational change that allows the repressor to fall off the operator
segment. Now the RNA polymerase is free to move along the DNA and RNA can be made from
the three genes. Lactose can now be metabolized (Fig. 14b).
When the inducer (lactose) is removed, the repressor returns to its original conformation and
binds to the operator. No RNA and no protein are made. Note that RNA polymerase can still bind to
the promoter, but it cannot move past the operator region because the repressor is already in
position and blocks the transcriptional path (Fig. 14c). This means that when the cell is ready
to use the operon, RNA polymerase is already there and waiting to begin transcription; the
promoter does not have to wait for the holoenzyme to bind.
When levels of glucose (a catabolite) in the cell are high, formation of a molecule called cyclic
AMP (cAMP) is inhibited. So when glucose levels drop, more cAMP forms. cAMP binds to a protein
called CAP (catabolite activator protein), which is then activated to bind to the CAP binding
site. This activates transcription, perhaps by increasing the affinity of the promoter for RNA
polymerase. This phenomenon is called catabolite repression, a misnomer since it involves
activation, but understandable since it seemed that the presence of glucose repressed all the
other sugar metabolism operons (Fig. 14d).
Fig. 14a: Diagram representing the Lac Operon (modified from
http://web.mit.edu/esgbio/www/7001main.html).
Key: region coding for protein; regulatory region; diffusible regulatory proteins.
• Operator (LacO): Binding site for repressor
• Promoter (LacP): Binding site for RNA polymerase
• Repressor (LacI): Gene encoding lac repressor protein, which binds to DNA at the operator and
blocks binding of RNA polymerase at the promoter
• Pi: Promoter for LacI
• CAP: Binding site for cAMP/CAP complex
Fig. 14b: Diagram representing the regulation of Lac Operon in the presence of inducer
(modified from http://web.mit.edu/esgbio/www/7001main.html).
Fig. 14c: Diagram representing the regulation of Lac Operon in the absence of inducer
(modified from http://web.mit.edu/esgbio/www/7001main.html).
Fig. 14d: Diagram representing the mechanism by which cAMP regulates the Lac Operon
(modified from http://web.mit.edu/esgbio/www/7001main.html).
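The two controls on the Lac Operon described above, release of the repressor by the inducer and activation by the cAMP/CAP complex when glucose is low, can be summarised as a small truth table. The sketch below is a qualitative illustration only: the function name and the labels "off", "low" and "high" are assumptions made for the example, and real expression levels vary continuously.

```python
def lac_operon_transcription(lactose_present: bool, glucose_present: bool) -> str:
    """Return a qualitative transcription level for the Lac Operon structural genes."""
    # Without the inducer, the repressor stays on the operator and blocks RNA polymerase.
    repressor_on_operator = not lactose_present
    # High glucose inhibits cAMP formation, so the cAMP/CAP complex forms only when glucose is low.
    cap_bound = not glucose_present

    if repressor_on_operator:
        return "off"
    if cap_bound:
        return "high"   # repressor released and transcription activated by cAMP/CAP
    return "low"        # repressor released, but no activation by cAMP/CAP (assumed label)


for lactose in (False, True):
    for glucose in (False, True):
        level = lac_operon_transcription(lactose, glucose)
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> {level}")
```

Only the combination of lactose present and glucose absent gives the "high" state, which matches the idea that the cell metabolizes lactose mainly when its preferred sugar has run out.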
The Tryptophan Operon: A Repressor
When should bacteria be transcribing genes for the synthesis of the amino acid tryptophan?
When levels of tryptophan in the cell are low, the bacteria must make their own. However, if
tryptophan is abundant in the cell or is being provided in the medium, it is a waste of energy for
the bacteria to be synthesizing it.
The Trp repressor protein can bind to the operator of the Trp operon, which contains the
tryptophan biosynthetic genes. When tryptophan is abundant, it binds to the repressor and
induces a conformational change so that the repressor can bind to the operator DNA segment. When
tryptophan levels are low, the tryptophan falls off the repressor, and the repressor goes back to
its original conformation, losing its ability to bind to the DNA. The operator is now free, so
RNA polymerase can transcribe the tryptophan biosynthetic genes, replenishing the cell's supply
of tryptophan. This kind of feedback repression of transcription is very common.
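The feedback loop described above can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a quantitative model: the function name, the threshold, and the synthesis and consumption amounts are hypothetical numbers chosen only to show the behaviour, and repression is treated as an all-or-nothing switch.

```python
def simulate_trp_feedback(initial, threshold=5, synthesis=3, consumption=1, steps=10):
    """Track a tryptophan level that switches the Trp operon off when it is high."""
    level = initial
    history = []
    for _ in range(steps):
        # Enough tryptophan keeps the repressor bound to the operator (assumed threshold).
        repressed = level >= threshold
        if not repressed:
            level += synthesis                 # operon transcribed, tryptophan is made
        level = max(0, level - consumption)    # the cell uses tryptophan every step
        history.append((level, not repressed))
    return history


for step, (level, transcribing) in enumerate(simulate_trp_feedback(0), start=1):
    print(f"step {step:2}: tryptophan={level} operon {'on' if transcribing else 'off'}")
```

Starting from an empty pool, the operon stays on until the level passes the threshold, after which it switches on and off so that the level stays close to the threshold.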
The Lambda Phage Cycle: Decision Control
A bacteriophage can choose between lytic and lysogenic phage cycles. When there are many
bacteria around to infect, and they are growing well, the phage wants to take advantage and
replicate itself as much as possible. However, when there are few bacteria around and little
growth potential, the phage is better off integrating into the bacterial genome and waiting until
the pickings are good again so that its progeny will have another bacterium to infect. How does
the phage make these decisions?
There are two competing proteins in the lambda bacteriophage. One protein, cI, promotes the
lysogenic cycle. The other protein, Cro, promotes the lytic phase. These two proteins are in
direct competition with each other for sites on the "right" promoter of lambda. Because cI is
synthesized continuously at low levels, its concentration builds up when few bacteria are
available. It then binds at the promoter and inhibits the lytic phase (Fig. 15).
Fig. 15: Diagram representing the Lambda Phage Cycle: Decision Control (modified from
http://web.mit.edu/esgbio/www/7001main.html).
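The competition between the two proteins can be illustrated with the textbook formula for two ligands competing for a single binding site at equilibrium. The sketch below is a deliberate oversimplification: it treats the promoter as one site, the function names and the dissociation constants `kd_ci` and `kd_cro` are hypothetical, and it ignores the multiple operator sites and cooperative binding of the real lambda switch.

```python
def promoter_occupancy(ci: float, cro: float, kd_ci: float = 1.0, kd_cro: float = 1.0) -> dict:
    """Equilibrium fractions of a single site that is free, bound by cI, or bound by Cro."""
    ci_term = ci / kd_ci        # concentration relative to its dissociation constant
    cro_term = cro / kd_cro
    total = 1.0 + ci_term + cro_term
    return {"free": 1.0 / total, "cI": ci_term / total, "Cro": cro_term / total}


def favoured_cycle(ci: float, cro: float) -> str:
    """Name the cycle promoted by whichever protein occupies more of the site."""
    occupancy = promoter_occupancy(ci, cro)
    return "lysogenic" if occupancy["cI"] > occupancy["Cro"] else "lytic"


print(promoter_occupancy(10.0, 1.0))
print(favoured_cycle(10.0, 1.0), favoured_cycle(1.0, 10.0))
```

In this toy picture, a build-up of the lysogeny-promoting protein shifts occupancy of the site in its favour, which is the qualitative point made above.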
The promoter and enhancer and transcription control in eukaryotes
Initiation of transcription requires the enzyme RNA polymerase and transcription factors. Any
protein that is needed for the initiation of transcription, but which is not itself part of RNA
polymerase, is defined as a transcription factor. Many transcription factors act by recognizing
cis-acting sites on DNA. However, binding to DNA is not the only means of action for a
transcription factor. A factor may recognize another factor, or may recognize RNA polymerase,
or may be incorporated into an initiation complex only in the presence of several other proteins.
The ultimate test for membership of the transcription apparatus is functional: a protein must be
needed for transcription to occur at a specific promoter or set of promoters.
A significant difference between the transcription of eukaryotic and prokaryotic mRNAs is that
initiation at a eukaryotic promoter involves a large number of factors that bind to a variety of cis-
acting elements. The promoter is defined as the region containing all these binding sites, that is,
which can support transcription at the normal efficiency and with the proper control. So the
major feature defining the promoter for a eukaryotic mRNA is the location of binding sites for
transcription factors. RNA polymerase itself binds around the start point, but does not directly
contact the extended upstream region of the promoter. By contrast, the bacterial promoters
discussed earlier in this section are largely defined in terms of the binding site for RNA
polymerase in the immediate vicinity of the start point. Transcription in eukaryotic cells is
divided into three classes. A different RNA polymerase transcribes each class:
RNA polymerase I transcribes rRNA
RNA polymerase II transcribes mRNA
RNA polymerase III transcribes tRNA and other small RNAs.
Transcription factors are needed for initiation, but are not required subsequently. For the three
eukaryotic enzymes, the factors, rather than the RNA polymerases themselves, are principally
responsible for recognizing the promoter. This is different from bacterial RNA polymerase,
where it is the enzyme that recognizes the promoter sequences. For all eukaryotic RNA
polymerases, the factors create a structure at the promoter to provide the target that is recognized
by the enzyme. For RNA polymerases I and III, these factors are relatively simple, but for RNA
polymerase II they form a sizeable group collectively known as the basal factors. The basal
factors join with RNA polymerase II to form a complex surrounding the startpoint, and they
determine the site of initiation. The basal factors together with RNA polymerase constitute the
basal transcription apparatus.
The promoters for RNA polymerases I and II are (mostly) upstream of the startpoint, but some
promoters for RNA polymerase III lie downstream of the startpoint. Each promoter contains
characteristic sets of short conserved sequences that are recognized by the appropriate class of
factors. RNA polymerases I and III each recognize a relatively restricted set of promoters, and
rely upon a small number of accessory factors.
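The division of labour among the three eukaryotic RNA polymerases, as listed in the text, can be collected into a small lookup structure. This is only a summary sketch: the class and field names are hypothetical, and the entries restate the statements above rather than a complete catalogue of what each enzyme transcribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolymeraseClass:
    name: str
    transcripts: tuple        # RNA classes listed for this enzyme in the text
    promoter_position: str    # where the promoter lies relative to the startpoint
    accessory_factors: str    # how many factors the enzyme relies on


POLYMERASES = (
    PolymeraseClass("RNA polymerase I", ("rRNA",),
                    "mostly upstream", "small number"),
    PolymeraseClass("RNA polymerase II", ("mRNA",),
                    "mostly upstream", "sizeable group (basal factors)"),
    PolymeraseClass("RNA polymerase III", ("tRNA", "other small RNAs"),
                    "some promoters downstream", "small number"),
)


def polymerase_for(transcript: str) -> str:
    """Return the name of the polymerase listed as transcribing the given RNA class."""
    for polymerase in POLYMERASES:
        if transcript in polymerase.transcripts:
            return polymerase.name
    raise KeyError(transcript)


print(polymerase_for("mRNA"))
```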
Promoters utilized by RNA polymerase II show more variation in sequence, and have a modular
organization. Short sequence elements that are recognized by transcription factors lie upstream
of the startpoint. These cis-acting sites usually are spread out over a region of >200 bp. Some of
these elements and the factors that recognize them are common: they are found in a variety of
promoters and are used constitutively. Others are specific: they identify particular classes of
genes and their use is regulated. The elements occur in different combinations in individual
promoters.
All RNA polymerase II promoters have sequence elements close to the startpoint that are bound
by the basal apparatus and that establish the site of initiation. The sequences farther upstream
determine whether the promoter is expressed in all cell types or is specifically regulated.
Promoters that are constitutively expressed (their genes are sometimes called housekeeping
genes) have upstream sequence elements that are recognized by ubiquitous activators. No
element/factor combination is an essential component of the promoter, which suggests that
initiation by RNA polymerase II may be sponsored in many different ways. Promoters that are
expressed only in certain times or places have sequence elements that require activators that are
available only at those times or places.
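The difference between constitutively expressed and regulated promoters can be sketched as a set comparison: a promoter fires in a cell only if the activators its upstream elements need are available there. The example is a simplification with assumed names: the activator and cell-type labels are hypothetical, and it assumes every listed activator is required, which real modular promoters need not obey.

```python
def promoter_is_active(required_activators: set, activators_in_cell: set) -> bool:
    """A promoter is active when all activators its elements need are present."""
    return required_activators <= activators_in_cell


# Hypothetical activator names used only for illustration.
UBIQUITOUS = {"ubiquitous_activator"}
LIVER_CELL = UBIQUITOUS | {"liver_activator"}
SKIN_CELL = UBIQUITOUS | {"skin_activator"}

housekeeping_promoter = {"ubiquitous_activator"}
liver_specific_promoter = {"ubiquitous_activator", "liver_activator"}

for cell_name, cell in (("liver", LIVER_CELL), ("skin", SKIN_CELL)):
    print(cell_name,
          "housekeeping:", promoter_is_active(housekeeping_promoter, cell),
          "liver-specific:", promoter_is_active(liver_specific_promoter, cell))
```

The housekeeping promoter is active in both cell types, while the regulated promoter is active only where its specific activator is present.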
Sequence components of the promoter are defined operationally by the demand that they must be
located in the general vicinity of the startpoint and are required for initiation. The enhancer is
another type of site involved in initiation. It is identified by sequences that stimulate initiation,
but that are located a considerable distance from the startpoint. Enhancer elements are often
targets for tissue-specific or temporal regulation.
The components of an enhancer resemble those of the promoter; they consist of a variety of
modular elements. However, the elements are organized in a closely packed array. The elements
in an enhancer function like those in the promoter, but the enhancer does not need to be near the
startpoint. However, proteins bound at enhancer elements interact with proteins bound at
promoter elements. The distinction between promoters and enhancers is operational, rather than
implying a fundamental difference in mechanism. This view is strengthened by the fact that
some types of element are found in both promoters and enhancers.
Eukaryotic transcription is most often under positive regulation: a transcription factor is provided
under tissue-specific control to activate a promoter or set of promoters that contain a common
target sequence. Regulation by specific repression of a target promoter is less common.
Literature cited
Avery, O. T., MacLeod, C. M., and McCarty, M. (1944). Studies on the chemical nature of the
substance inducing transformation of pneumococcal types. J. Exp. Med. 98, 451-460.
Benzer, S., and Champe, S. P. (1961). Ambivalent rII mutants of phage T4. Proc. Nat. Acad. Sci.
USA 47, 403-416.
Cairns, J., Stent, G., and Watson, J. D. (1966). Phage and the Origins of Molecular Biology.
Cold Spring Harbor Symp. Quant. Biol.
Chomet, S. (1994): DNA Genesis of a Discovery, 1994, Newman-Hemisphere Press, London.
Coulondre, C. et al. (1978). Molecular basis of base substitution hotspots in E. coli. Nature 274,
775-780.
Crick, F. H. C., Barnett, L., Brenner, S., and Watts-Tobin, R. J. (1961). General nature of the
genetic code for proteins. Nature 192, 1227-1232.
Delmonte, C. S. and Mann, L. R. B.: a recent research paper summarises some key experimental
data which are better explained by SBS models than by the double helix.
http://www.ias.ac.in/currsci/dec102003/1564.pdf
Delmonte, C. S., A detailed study of the experimental results remaining to be explained by the
double helix model. http://www.notahelix.com/delmonte/new_struct_mol_biol.pdf
Drake, J. W. (1991). A constant rate of spontaneous mutation in DNA-based microbes. Proc.
Nat. Acad. Sci. USA 88, 7160-7164.
Drake, J. W., and Baltz, R. H. (1976). The biochemistry of mutagenesis. Ann. Rev. Biochem. 45,
11-37.
Drake, J. W., Charlesworth, B., Charlesworth, D., and Crow, J. F. (1998). Rates of spontaneous
mutation. Genetics 148, 1667-1686.
Griffith, F. (1928). The significance of pneumococcal types. J. Hyg. 27, 113-159.
Grogan, D. W., Carver, G. T., and Drake, J. W. (2001). Genetic fidelity under harsh conditions:
analysis of spontaneous mutation in the thermoacidophilic archaeon Sulfolobus
acidocaldarius. Proc. Nat. Acad. Sci. USA 98, 7928-7933.
Hershey, A. D., and Chase, M. (1952). Independent functions of viral protein and nucleic acid in
growth of bacteriophage. J. Gen. Physiol. 36, 39-56.
Holmes, F. (2001). Meselson, Stahl, and the Replication of DNA: A History of The Most
Beautiful Experiment in Biology. Yale University Press.
http://biocrs.biomed.brown.edu/Books/Chapters/Ch%208/DH-Paper.html: Text of the original
paper that Watson and Crick published in 1953.
http://darwin.cshl.org/: (Cold Spring Harbor Laboratory) This site has several excellent
animations (Shockwave enhanced) as well as information about their favorite molecule,
DNA.
http://web.mit.edu/esgbio/www/7001main.html
http://www.accessexcellence.org/
http://www.gdb.org/Dan/DOE/prim6.html: (DOE) Terms peculiar to molecular genetics.
Judson, H. (1978). The Eighth Day of Creation. Knopf, New York.
Maki, H. (2002). Origins of Spontaneous Mutations: Specificity and Directionality of
Base-Substitution, Frameshift, and Sequence-Substitution Mutageneses. Ann. Rev. Genet. 36,
279-303.
Meselson, M. and Stahl, F. W. (1958). The replication of DNA in E. coli. Proc. Nat. Acad. Sci. USA 44, 671-682.
Millar, C. B., Guy, J., Sansom, O. J., Selfridge, J., MacDougall, E., Hendrich, B., Keightley, P.
D., Bishop, S. M., Clarke, A. R., and Bird, A. (2002). Enhanced CpG mutability and
tumorigenesis in MBD4-deficient mice. Science 297, 403-405.
Olby, R. (1974). The Path to the Double Helix. MacMillan, London.
Pamela Peters, from "Biotechnology: A Guide To Genetic Engineering." Wm. C. Brown
Publishers, Inc., 1993.
Pellicer, A., Wigler, M., Axel, R., and Silverstein, S. (1978). The transfer and stable integration
of the HSV thymidine kinase gene into mouse cells. Cell 14, 133-141.
Richard Dawkins (1990). The Selfish Gene, Oxford University Press.
Roth, J. R. (1974). Frameshift mutations. Ann. Rev. Genet. 8, 319-346.
Watson, J. D., and Crick, F. H. C. (1953): A structure for DNA. Nature 171, 737-738.
Watson, J. D., and Crick, F. H. C. (1953): Genetic implications of the structure of DNA. Nature
171, 964-967.
Wilkins, M. F. H., Stokes, A. R., and Wilson, H. R. (1953): Molecular structure of DNA. Nature
171, 738-740.
Suggested Readings:
1. Lewin, B. (2004): Genes VIII. Pearson Prentice Hall, Pearson Edu. Inc., NJ
Updated Internet version of the book is maintained at www.ergito.com.
2. Alvin S., Laura S., Virginia B. S. (2002): DNA. Twenty-First Century Books, A
division of Lerner Publishing Group, USA
3. Kumar A. and Srivastava A.K. (2001): Advanced Topic in Molecular Biology.
Horizon Scientific Press
4. Brown T. A. (1992): Genetics: A Molecular Approach. Chapman and Hall, London.
5. Hartl D.L. and Jones E.W. (2001): Genetics: Analysis of Gene and Genomes. Jones
and Bartlett Publishers.
6. Strachan T. and Read A.P. (1996): Human Molecular Genetics. John Wiley & Sons
Ltd. NY.