RNA Transcription

Your goal for today’s lecture is to understand how
genetic information is transmitted from the genome to the
ribosome. The objectives are to explain the differences
between RNA and DNA, the similarities and differences
between replication and transcription, the main features of
the transcription machinery and how transcripts are
processed into mature mRNAs.
DNA is the information carrier, and having thymine
instead of uracil allows the cell to repair the mutagenic
consequence of cytosine deamination (which converts C
to U), as per last week’s lecture!
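A minimal Python sketch of this logic, assuming DNA is represented as a string of base letters. Because normal DNA never contains U, any U produced by deamination of C stands out as damage. The function names are hypothetical, and `repair` simply writes C back. In the cell, the correct base is restored using the intact complementary strand.

```python
def deaminate(dna: str, position: int) -> str:
    """Simulate spontaneous deamination: the C at `position` becomes U."""
    if dna[position] != "C":
        raise ValueError("this sketch only deaminates cytosine")
    return dna[:position] + "U" + dna[position + 1:]


def damaged_positions(dna: str) -> list[int]:
    """Normal DNA uses T, never U, so every U must be damage."""
    return [i for i, base in enumerate(dna) if base == "U"]


def repair(dna: str) -> str:
    """Restore each U to the C it came from (simplified: the cell uses
    the G on the complementary strand to choose the correct base)."""
    return "".join("C" if base == "U" else base for base in dna)


original = "ATCGTC"
damaged = deaminate(original, 2)
print(damaged)                      # ATUGTC
print(damaged_positions(damaged))   # [2]
print(repair(damaged) == original)  # True
```

If DNA used U instead of T, a U created by deamination would look exactly like a genuine U, and `damaged_positions` would have no way to tell damage from normal sequence.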
The overarching tenet of molecular biology is that
information in the form of the order of nucleotides and the
order of amino acids flows from nucleic acid to nucleic acid
and from nucleic acid to protein but not back again. This
tenet was enunciated by Francis Crick as the “Central
Dogma.”
Now that we have discussed the structure of RNA,
it is time to discuss one of its most important functions. The
central dogma describes the way information is transferred
in the cell.
The information present in genomes is arranged
in the form of a linear code based on the sequence of
nucleotides in DNA. We have already talked about one part
of the central dogma: how DNA is replicated to create an
identical copy of DNA. DNA replication is very important
when cells divide, because each daughter cell must receive
an exact copy of the same DNA.
The transfer of information from DNA into other
forms is important for the cell’s day-to-day functions. The
DNA in genomes does not participate directly in protein
synthesis, but instead uses RNA as an intermediary
molecule. When the cell needs a particular protein, the
nucleotide sequence of the appropriate portion of the
immensely long DNA molecule in a chromosome is first
copied into RNA (transcription). The resulting RNA copies
are used as templates to direct the synthesis of the protein
(translation). The flow of genetic information in cells is
therefore from DNA to RNA to protein. All cells, from
bacteria to humans, express their genetic information in this
way. This principle is so fundamental that it is termed the
central dogma of molecular biology.
During transcription, the Watson and Crick strands are
temporarily unwound (by an enzyme known as RNA
polymerase, as we shall see) to create a transcription
bubble. The bubble has two strands, known as the template
and the non-template strands. RNA is copied from the
template strand. The region of strand separation moves
along the DNA with continual unwinding and rewinding of
the two strands, creating a moving bubble.
Note that the non-template strand has the same
sequence as the RNA transcript (with T in place of U). Note
too that the direction of transcription (left to right as shown)
is determined by the strand that is being copied, as dictated
by the 5’ to 3’ rule and the antiparallel rule. That is, if the
lower strand, which has its 3’ end on the left, is being copied,
then the RNA must be synthesized from left to right. The
product of transcription, the RNA, is extruded from the
template. Thus, transcription takes place in a moving bubble,
with the template and non-template strands re-annealing as
the growing transcript is extruded.
Practice drawing a transcription bubble, labeling
the 5’ and 3’ ends of the two DNA strands and the growing
RNA transcript.
Convince yourself that the direction of transcription
indicates which strand is being copied!
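To check these rules, the relationship between the two strands and the transcript can be sketched in Python. The sketch assumes each sequence is a string written 5’ to 3’, and the function names are illustrative. The RNA is complementary and antiparallel to the template strand, so it ends up with the same sequence as the non-template strand, with U in place of T.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}  # template DNA base -> RNA base


def reverse_complement(dna: str) -> str:
    """Return the partner DNA strand, also written 5'->3' (antiparallel rule)."""
    return "".join(COMPLEMENT[b] for b in reversed(dna))


def transcribe(template_5to3: str) -> str:
    """RNA grows 5'->3' while the template is read 3'->5', so the RNA is the
    reverse complement of the template, with U placed opposite A."""
    return "".join(RNA_PAIR[b] for b in reversed(template_5to3))


non_template = "ATGGCATTC"                     # hypothetical sequence
template = reverse_complement(non_template)
rna = transcribe(template)
print(template)                                # GAATGCCAT
print(rna)                                     # AUGGCAUUC
print(rna == non_template.replace("T", "U"))   # True
```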
Transcription is carried out by the enzyme RNA
polymerase, which, like DNA polymerase, catalyzes
the formation of the phosphodiester bonds that link
nucleotides together into a linear nucleic acid chain. Its
structure resembles a crab claw: the active site is at the
base of the opening, and the claws clamp down on the DNA,
as we shall see.
For the sake of simplicity we will begin our
discussion of the transcription process in bacteria.
The initiation of bacterial transcription is the
crucial point at which the bacterial cell regulates which
proteins are synthesized and at what rate. Bacterial RNA
polymerase is able to recognize specific sequences in the
DNA that mark the start of the gene to be
transcribed into RNA. This sequence is called the
promoter. RNA polymerase binds weakly to the bacterial
DNA and typically slides along it until it encounters a
promoter.
Bacterial promoters are characterized by two
conserved sequences centered at positions -10 and -35
upstream from the start of transcription of the gene. By
convention, nucleotide positions upstream (5’) of the start
site are given negative numbers, while those downstream
(3’) of the start site are given positive numbers. The first
transcribed nucleotide, which defines the start of
transcription, is therefore termed the +1 position.
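The numbering convention can be sketched in Python, assuming a promoter region stored as a string and, as is the usual convention, no position 0: the numbering jumps from -1 directly to +1. The helper names and the example layout are hypothetical.

```python
def promoter_coordinate(index: int, start_index: int) -> int:
    """Map a 0-based string index to a promoter coordinate.

    `start_index` is the string index of the +1 nucleotide. Upstream
    nucleotides get negative numbers; there is no position 0.
    """
    offset = index - start_index
    return offset + 1 if offset >= 0 else offset


def string_index(position: int, start_index: int) -> int:
    """Inverse of promoter_coordinate."""
    if position == 0:
        raise ValueError("promoter coordinates skip 0")
    return start_index + position - 1 if position > 0 else start_index + position


# Hypothetical layout: the +1 nucleotide sits at index 35 of the string.
START = 35
print(promoter_coordinate(35, START))  # 1
print(promoter_coordinate(34, START))  # -1
print(string_index(-10, START))        # 25
print(string_index(-35, START))        # 0
```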
DNA-recognizing proteins bind to a range of
sequences that conform to a greater or lesser extent to a
particular consensus, a kind of Platonic ideal. Usually any
given sequence is not a perfect match to the consensus. In
the case of the -10 and -35 sequences, the consensuses
are TATAAT (notice in the figure that the -10 is actually
TATATT!) and TTGACA, respectively. A strong promoter, at
which RNA polymerase initiates efficiently, closely
approximates this ideal, and a weak promoter less so.
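One crude way to express "closeness to the Platonic ideal" is to count mismatches against the consensus. This Python sketch does only that, and the function name is hypothetical. It is an illustrative simplification: it ignores everything else that affects promoter strength.

```python
CONSENSUS = {"-35": "TTGACA", "-10": "TATAAT"}


def mismatches(site: str, consensus: str) -> int:
    """Count positions where an observed element differs from the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(a != b for a, b in zip(site, consensus))


print(mismatches("TATATT", CONSENSUS["-10"]))  # 1 (the -10 element in the figure)
print(mismatches("TTGACA", CONSENSUS["-35"]))  # 0 (a perfect match)
```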
Inspection of the -10 and -35 sequences in
bacterial promoters reveals that they are asymmetrical in
orientation. This asymmetry has important consequences
for their arrangement in genomes. Since DNA is double-
stranded, two different RNA molecules could in principle be
transcribed from any gene, using each of the two DNA
strands as a template. However, a gene typically has only a
single promoter, and because the nucleotide sequences of
bacterial (as well as eukaryotic) promoters are asymmetric,
the polymerase can bind in only one orientation, in which the
-10 element points in the direction of transcription.
Therefore, the polymerase must transcribe that one DNA
strand, since it can synthesize RNA only in the 5’ to 3’
direction. The choice of template strand for each gene is
therefore determined by the location and orientation of the
promoter. Genome sequences reveal that the DNA strand
used as the template for RNA synthesis varies from gene to
gene.
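Because the elements are asymmetric, a program can tell which way a promoter faces by searching both strands. In this Python sketch (hypothetical function names, toy sequences, exact matches only), a hit on "+" implies transcription proceeding rightward along the top strand and a hit on "-" implies leftward. This assumes the consensus is written 5’ to 3’ in the direction of transcription, as is conventional.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """The opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[b] for b in reversed(dna))


def find_element(top_strand: str, element: str) -> list[tuple[int, str]]:
    """Find exact copies of a promoter element on either strand.

    Returns (index on the top strand, strand) pairs: "+" if the element
    reads 5'->3' on the top strand, "-" if it reads 5'->3' on the bottom one.
    """
    k = len(element)
    rc = reverse_complement(element)
    hits = []
    for i in range(len(top_strand) - k + 1):
        window = top_strand[i:i + k]
        if window == element:
            hits.append((i, "+"))
        if window == rc:
            hits.append((i, "-"))
    return hits


print(reverse_complement("TATAAT"))          # ATTATA, not TATAAT: asymmetric
print(find_element("GGTATAATCC", "TATAAT"))  # [(2, '+')]
print(find_element("GGATTATACC", "TATAAT"))  # [(2, '-')]
```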
Are TTGACA and TATAAT located on the same
strand that will serve as a template for transcription or the
non-template strand?
If the -35 and -10 elements are a Platonic ideal
with few if any promoters conforming exactly to the ideal,
then how were the two sequences identified?
The graphs show the frequency of each of the
four bases at each position of the two elements
among a large number of promoters.
Promoters are rarely a perfect match to the
TTGACA and TATAAT sequences. Instead, they are
approximations, and the closer they are to the consensus,
the stronger the promoter (all other things being equal).
This is a general feature of sequence elements in the
genome; they are rarely a perfect match to the Platonic
ideal, making it a challenge for bioinformaticians to ferret
out binding sites in the DNA.
As you can see, at each position the base found in TTGACA
or TATAAT is the most frequently occurring nucleotide.
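The way a consensus is read off such frequency graphs can be sketched in Python. The sites below are invented toy sequences, not real promoter data, and they are assumed to be already aligned and of equal length. Note that the consensus emerges even though no individual site matches it exactly.

```python
from collections import Counter

# Toy, hypothetical -10 elements (not real promoter data); none is a perfect TATAAT.
SITES = ["TATATT", "TAGAAT", "TATGAT", "CATAAT", "TACAAT"]


def base_frequencies(sites: list[str]) -> list[dict[str, float]]:
    """Fraction of each base at each aligned position (what the graphs plot)."""
    return [{b: col.count(b) / len(sites) for b in "ACGT"} for col in zip(*sites)]


def consensus(sites: list[str]) -> str:
    """Most frequent base at each aligned position."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*sites))


print(consensus(SITES))            # TATAAT
print(base_frequencies(SITES)[0])  # {'A': 0.0, 'C': 0.2, 'G': 0.0, 'T': 0.8}
```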
Now let’s consider the steps that take place during
the process of transcription of a gene.
RNA polymerase initially binds to DNA at a
promoter site to create a closed complex. A particular
subunit of the RNA polymerase called sigma mediates
recognition of the -10 and -35 sequences by directly
contacting them.
Next, the RNA polymerase unwinds the double
helix to expose a short stretch of nucleotides on each
strand. This is known as the open complex.
With the DNA unwound, one of the two exposed
DNA strands acts as a template for complementary base-
pairing with incoming ribonucleotides, two of which are
joined together by the polymerase to begin the synthesis of
an RNA chain. This is known as initiation.
The RNA polymerase moves stepwise along the
DNA, unwinding the DNA helix just ahead of the active site
for polymerization to expose a new region of the template
strand for complementary base-pairing. In this way, the
growing RNA chain is extended by one nucleotide at a time
in the 5’-to-3’ direction, proceeding at a rate of about 50
nucleotides per second.
The substrates are nucleoside triphosphates
(ATP, CTP, UTP, and GTP), but unlike the situation in DNA
replication, transcription pairs uracil, rather than thymine,
with adenine. The unwound stretch of DNA is usually about
13 bases long and is referred to as a transcription bubble.
Once the short stretch of unwound DNA has been
transcribed, the DNA double helix rewinds behind the
moving RNA polymerase.
This phase of transcription is known as elongation.
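The quoted elongation rate supports a quick back-of-the-envelope estimate. This Python sketch assumes a constant rate of about 50 nucleotides per second and ignores initiation, pausing, and termination. The transcript lengths are hypothetical examples.

```python
ELONGATION_RATE = 50  # nucleotides per second, the approximate figure from the lecture


def elongation_time_s(transcript_length_nt: int, rate: float = ELONGATION_RATE) -> float:
    """Seconds needed to polymerize a transcript at a constant elongation rate."""
    return transcript_length_nt / rate


print(elongation_time_s(1500))  # 30.0 (hypothetical 1,500-nt transcript)
print(elongation_time_s(50))    # 1.0
```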
Finally, the elongating RNA polymerase
encounters a second punctuation mark in the DNA known
as a terminator that triggers the dissociation of the RNA
polymerase and the newly synthesized transcript from the
DNA, terminating transcription.
After the polymerase has been released at a
termination sequence, it is free to bind to a new promoter,
where it can begin the process of transcription again.
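The bacterial steps described above form a repeating cycle, which can be sketched as a simple state machine in Python. This is a deliberately simplified, strictly linear model, and the stage names are hypothetical labels for the steps in the text.

```python
from enum import Enum, auto


class Stage(Enum):
    FREE = auto()            # polymerase not yet at a promoter
    CLOSED_COMPLEX = auto()  # sigma contacts the -35 and -10 elements
    OPEN_COMPLEX = auto()    # a short stretch of DNA is unwound
    INITIATION = auto()      # the first two ribonucleotides are joined
    ELONGATION = auto()      # RNA grows 5'->3' as the bubble moves
    TERMINATION = auto()     # a terminator releases polymerase and RNA


NEXT = {
    Stage.FREE: Stage.CLOSED_COMPLEX,
    Stage.CLOSED_COMPLEX: Stage.OPEN_COMPLEX,
    Stage.OPEN_COMPLEX: Stage.INITIATION,
    Stage.INITIATION: Stage.ELONGATION,
    Stage.ELONGATION: Stage.TERMINATION,
    Stage.TERMINATION: Stage.FREE,  # free to bind a new promoter
}


def one_cycle() -> list[Stage]:
    """Walk from FREE around the cycle until the polymerase is free again."""
    stages = [Stage.FREE]
    while True:
        stages.append(NEXT[stages[-1]])
        if stages[-1] is Stage.FREE:
            return stages


print(" -> ".join(s.name for s in one_cycle()))
# FREE -> CLOSED_COMPLEX -> OPEN_COMPLEX -> INITIATION -> ELONGATION -> TERMINATION -> FREE
```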
The transcription machinery in eukaryotes is
remarkably different from, and more complex than, that in
bacteria! Eukaryotic RNA polymerase does not have a
sigma factor. Instead, promoters are recognized by proteins
that assemble on the DNA and in turn recruit RNA
polymerase. The most important of these is TATA-binding
protein, which recognizes a promoter element
called the TATA box.
The protein that recognizes the TATA box is called
TATA-binding protein. Interestingly, it binds in the minor
groove, in contrast to most sequence-specific DNA-binding
proteins, which, as we discussed in an earlier lecture, bind
in the major groove. In binding to DNA, TATA-binding
protein induces a sharp kink in the helix. TATA-binding
protein is one of the most distinctive features of the
eukaryotic transcription machinery, just as sigma factor is
for the bacterial machinery.
Although eukaryotic RNA polymerase has many
structural similarities to bacterial RNA polymerase, there are
several important differences in the way in which the
bacterial and eukaryotic enzymes function. For example,
bacterial RNA polymerase is able to initiate transcription on
a DNA template without the help of additional proteins. In
contrast, eukaryotic RNA polymerases require the help of a
large set of proteins called general transcription factors,
which must assemble at the promoter with the polymerase
before the polymerase can begin transcription.
The assembly process starts with the binding of a
general transcription factor to a short double-helical DNA
sequence composed primarily of T and A nucleotides. For
this reason, this sequence is known as the TATA sequence,
or TATA box. The TATA box is typically located roughly 30
nucleotides upstream from the transcription start site. It is
not the only DNA sequence that signals the start of
transcription, but for most RNA polymerase II promoters, it
is the most important.
TATA-binding protein (TBP) in turn recruits additional
protein factors to the promoter.
This complex of promoter-bound proteins, in turn,
recruits RNA polymerase. “Recruits” simply means that
RNA polymerase bumps into the assemblage by diffusion
and is then held there by binding to it.
Finally, yet other factors are recruited that trigger
DNA melting, open complex formation and the initiation of
transcription.
This slide summarizes the main points on
transcription.
Now we turn to what happens to the transcript
before it is ready to be translated by the ribosome. In
bacteria, the production of messenger RNA molecules,
which serve as the template for protein synthesis, is
relatively simple. The 5’ end of an mRNA molecule is
produced by the initiation of transcription by RNA
polymerase at a promoter, and the 3’ end is produced by the
termination of transcription. Since bacterial genes consist
entirely of contiguous coding sequence, bacterial proteins
are translated from the unprocessed RNA transcript
(sometimes referred to as the primary transcript).
Since bacteria lack a nucleus, transcription and subsequent
translation into protein take place in a common
compartment.
In eukaryotes, transcription takes place in the
nucleus and the translation of mRNAs into proteins takes
place in the cytoplasm. But before a newly synthesized
transcript is ready to be translated, it undergoes three
critical maturation events, which we discuss next: it acquires
a CAP at the 5’ end and a poly-A tail at the 3’ end, and
sequences in between are spliced out. All of these
modification reactions take place in the nucleus and are
catalyzed by a variety of enzymes.
The first of these maturation events is acquisition
of a CAP. Newly synthesized transcripts acquire a CAP in
the nucleus. The CAP is an unusual structure in which a
guanine nucleotide is attached at its 5’ end to the 5’ terminus
of the transcript via three phosphates! (Another distinctive
feature of the CAP, which you need not learn, is the presence
of a methyl group at the 7 position of the guanine; it was
removed from the figure for simplicity.) The CAP will
become important when we consider the translation of
eukaryotic mRNAs.
A second modification takes place at the 3’ end of
the transcript. An enzyme called poly-A polymerase adds,
one at a time, approximately 200 adenine nucleotides to the
3′ end of the RNA. The nucleotide precursor for these
additions is ATP, and 5′-to-3′ phosphodiester bonds are
formed as in conventional RNA synthesis. Unlike the usual
RNA polymerases, poly-A polymerase does not require a
template; hence the poly-A tail of eukaryotic mRNAs is not
directly encoded in the genome.
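The two end modifications can be sketched as string operations in Python. `CAP` is a hypothetical text marker standing in for the capping guanosine and its three phosphates, the transcript is invented, and the default tail length follows the lecture's "approximately 200" adenines. The tail is added without consulting any template, mirroring the fact that it is not encoded in the genome.

```python
CAP = "m7Gppp"  # hypothetical text stand-in for the 5' cap (methylated G + three phosphates)


def add_cap(transcript: str) -> str:
    """Attach the cap to the 5' end (the left end of the string)."""
    return CAP + transcript


def add_poly_a(transcript: str, tail_length: int = 200) -> str:
    """Append a poly-A tail to the 3' end. No template is consulted:
    the tail length is just a parameter, not read from any DNA."""
    return transcript + "A" * tail_length


primary = "AUGGCCUUCUAA"  # hypothetical transcript
mrna = add_poly_a(add_cap(primary))
print(mrna[:20])  # m7GpppAUGGCCUUCUAAAA
print(len(mrna))  # 218
```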
The third and most spectacular modification is
splicing. Coding sequences (“exons”) in the primary
transcripts of higher cells are frequently interrupted by
non-coding sequences known as “introns”, which must be
removed by splicing before the RNA is ready to be
translated.
The organization of eukaryotic genes is more
complex than that of their bacterial counterparts. The
majority of eukaryotic genes are made up of sequences that
encode protein and thus are expressed (so-called exons),
interspersed with intervening sequences (so-called introns)
that do not code for protein. In other words, the protein-
coding segments of eukaryotic genes (but rarely of
prokaryotic genes) are interrupted by non-protein-coding
introns. Often these introns make up the large majority of
the gene. Therefore, in eukaryotic cells the primary RNA
transcript (sometimes referred to as the pre-mRNA) contains
both coding (exon) and noncoding (intron) sequences.
Before the transcript can be translated into protein, the
introns must be spliced out. Like the addition of the CAP and
the poly-A tail, splicing takes place in the nucleus before the
resulting mature mRNA is transported to the
cytoplasm, where translation takes place.
Eukaryotic cells are able to recognize and splice
out intron sequences with high fidelity. The process of intron
sequence removal involves three positions on the RNA
known as the 5’ splice site, the 3’ splice site, and the branch
point adenosine in the intron sequence. Each of these
three sites has a consensus nucleotide sequence that is
similar from intron to intron, providing the cell with cues on
where splicing is to take place. The 5’ splice site sits at the
boundary between exon 1 and the 5’ end of the intron.
Likewise, the 3’ splice site sits at the boundary of exon 2
and the 3’ end of the intron. Finally, an adenosine internal
to the intron known as the branch point participates in the
splicing process as we discuss.
Most of the bases important for
splicing lie within the intron. The most
important (though you need not learn them
for LS 1a!) are: GU at the 5’ splice site, AG at
the 3’ splice site, and the “branch point” A
internal to the intron.
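The GU...AG rule can be sketched as a boundary check in Python. Function names and sequences are hypothetical. The check looks only at the two terminal dinucleotides, which is far from sufficient in a real cell, where the spliceosome also relies on longer consensus sequences and the branch point.

```python
def follows_gu_ag_rule(intron: str) -> bool:
    """True if a candidate intron starts with GU (5' splice site)
    and ends with AG (3' splice site)."""
    return intron.startswith("GU") and intron.endswith("AG")


def splice_out(pre_mrna: str, intron_start: int, intron_end: int) -> str:
    """Remove pre_mrna[intron_start:intron_end] and join the flanking exons."""
    intron = pre_mrna[intron_start:intron_end]
    if not follows_gu_ag_rule(intron):
        raise ValueError(f"not a GU...AG intron: {intron}")
    return pre_mrna[:intron_start] + pre_mrna[intron_end:]


exon1, intron, exon2 = "AUGGCC", "GUAAGUCUAACAG", "UUCAAAUAA"  # hypothetical sequences
pre_mrna = exon1 + intron + exon2
print(splice_out(pre_mrna, len(exon1), len(exon1) + len(intron)))  # AUGGCCUUCAAAUAA
```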
How does splicing occur? It is
catalyzed by a large complex of proteins and
RNA molecules known as the spliceosome. The
spliceosome catalyzes splicing by two sequential
phosphoryl-transfer (transesterification)
reactions (meaning simply that one ester
linkage is replaced by another). These reactions
involve the 2’ hydroxyl of the branch point
adenosine (highlighted in red) located within the
intron. The 5’ and 3’ positions of the sugar of
the branch point are, as you know, esterified to
adjacent nucleotides in the polynucleotide
backbone, but its 2’ hydroxyl is free to
participate in the splicing reaction.
In the first transesterification reaction,
the 2’ hydroxyl of the branch point adenosine
attacks the phosphorus at the 5’ splice site, the
boundary between the 5’ end of the intron and
the adjacent exon 1. As a consequence, the
intron forms a lariat-like loop and the sugar-
phosphate backbone is broken between the
intron and exon 1.
Notice that the lariat is a most unusual 2’-5’
branched structure.
Practice drawing it!
In the second transesterification reaction, the
released free 3’-OH end of exon 1 attacks at the 3’ splice
site, the boundary between the 3’ end of the intron and the 5’
end of exon 2. This reaction joins the two exons together,
releasing the intron lariat. The two exon sequences thereby
become joined into a continuous coding sequence, and the
released intron lariat is degraded.
Thus, a single splicing event removes one intron
by proceeding through two sequential phosphoryl-transfer
reactions. This pair of reactions joins two exons while
removing the intron as a lariat.
It is vitally important that the spliceosome mediate
splicing with nucleotide precision. If it did not, the resulting
mRNA would have one or more nucleotides added to or
deleted from the composite coding sequence.
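Why a single-nucleotide error matters can be sketched in Python by translating a correctly spliced message and one that lost a nucleotide at the exon junction. The exon sequences are hypothetical, translation is assumed to start at the first base, and the codon table is the standard genetic code in a common compact 64-letter encoding ("*" = stop).

```python
from itertools import product

# Standard genetic code ("*" = stop), codons ordered UUU, UUC, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Read codons 5'->3' from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


exon1, exon2 = "AUGGCC", "UUCAAAUAA"  # hypothetical exons
correct = exon1 + exon2               # precise splicing
off_by_one = exon1 + exon2[1:]        # one nucleotide lost at the junction

print(translate(correct))     # MAFK
print(translate(off_by_one))  # MASN
```

Every codon downstream of the error is read in the wrong frame, so the protein sequence changes from that point on and the original stop codon is no longer read.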
You should be able to explain the above from this lecture.
As we discussed earlier, RNA is made up of ribonucleotides of adenine,
cytosine, guanine, and uracil, while DNA contains deoxyribonucleotides of adenine,
cytosine, guanine, and thymine. Since four nucleotides, taken individually, could
specify only 4 of the 20 amino acids that make up proteins, a group of nucleotides is
required to represent each amino acid. The code employed must be capable of
specifying at least 20 amino acids.
If two nucleotides were used to code for one amino acid, then only 16 ($4^2$)
different code units could be formed, and there would not be enough unique
codes to account for 20 amino acids. However, if a group of three nucleotides is used
for each amino acid, then 64 ($4^3$) code units are available. Therefore, any
code using groups of three or more nucleotides will have more than enough units to
encode 20 amino acids, and many such arrangements are mathematically possible.
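The counting argument can be checked with a few lines of Python. The function name is hypothetical: it finds the shortest code word length whose number of combinations, $4^n$, covers all 20 amino acids.

```python
def min_codon_length(n_amino_acids: int = 20, n_bases: int = 4) -> int:
    """Smallest word length whose number of combinations covers all amino acids."""
    length = 1
    while n_bases ** length < n_amino_acids:
        length += 1
    return length


print([4 ** n for n in (1, 2, 3)])  # [4, 16, 64]
print(min_codon_length())           # 3
```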
The genetic code is a triplet code, with every three nucleotides being decoded from a
specified starting point in the mRNA, exclusively in the 5’ to 3’ direction. Each triplet is
called a codon. Of the 64 possible codons in the genetic code, 61 specify individual
amino acids; the remaining three are stop signals.
Since there are 61 codons for 20 amino acids, it follows that many amino acids
are specified by more than one codon. Indeed, only two, methionine and
tryptophan, have a single codon; at the other extreme, leucine, serine, and arginine
are each specified by six different codons. The different codons for a given amino acid
are said to be synonymous. The code itself can be termed degenerate, since it contains
redundancies.
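These counts can be verified in Python from a common compact encoding of the standard genetic code: a 64-letter string of one-letter amino acid codes ("*" = stop), with codons ordered so that the first base varies slowest and bases are taken in the order U, C, A, G. The variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code; codons ordered UUU, UUC, UUA, UUG, UCU, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)
}

sense = {codon: aa for codon, aa in CODON_TABLE.items() if aa != "*"}
codons_per_aa = Counter(sense.values())

print(len(CODON_TABLE), len(sense), len(codons_per_aa))         # 64 61 20
print(sorted(aa for aa, n in codons_per_aa.items() if n == 1))  # ['M', 'W']
print(sorted(aa for aa, n in codons_per_aa.items() if n == 6))  # ['L', 'R', 'S']
```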
Just as proteins have primary (sequence),
secondary (local folding), and tertiary (higher-order, three-
dimensional folding) structure, so too do RNA molecules.
tRNA molecules exhibit a characteristic secondary structure
that resembles a cloverleaf (upside down in the above) with
three base-paired stem-loops. The 3’ and 5’ ends of the
RNA pair with each other in the stem of the cloverleaf, with
the protruding 3’ terminus being the site of amino acid
attachment. The “anticodon”, which pairs with the codon in
mRNA, is in a loop at the opposite end of the tRNA.
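Because codon and anticodon pair antiparallel, the anticodon is the reverse complement of the codon when both are written 5’ to 3’. This Python sketch assumes strict Watson-Crick pairing, and the function name is hypothetical.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str) -> str:
    """Anticodon (written 5'->3') that pairs antiparallel with a codon
    (written 5'->3'), assuming strict Watson-Crick pairing."""
    return "".join(PAIR[base] for base in reversed(codon))


print(anticodon("AUG"))  # CAU
print(anticodon("UGG"))  # CCA
```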
Shown is the three-dimensional (tertiary)
structure of the tRNA molecule, which has an L-shaped
configuration (upside down in the above). As in the
cloverleaf, the amino acid attachment site and the anticodon
are located at opposite ends of the molecule. On the
ribosome, the anticodon pairs with the codon in mRNA while
the 3’ terminus protrudes into the catalytic center where
peptide bond formation takes place, as we shall see.