Transcription
RNA
• Genetic information is stored as DNA. DNA is a very
stable molecule.
• Information is expressed by first transcribing regions of
the DNA (genes) into RNA. In contrast to DNA, RNA is
quite unstable. This is appropriate for rapid responses
to changes in the cell’s condition and environment: RNA
represents what is needed now, not some time in the
distant past.
• RNA has a few key differences from DNA:
• RNA uses ribose instead of deoxyribose as its sugar. Two main
effects: the 2’ –OH group is subject to chemical attack, making RNA
less stable than DNA. Also, the bulky 2’ –OH group keeps
double stranded RNA from adopting the regular B-form helix of DNA.
• RNA uses uracil instead of thymine as a base. The difference is a
–CH3 (methyl) group on thymine.
• RNA is usually single stranded.
• RNA is short compared to DNA: just 1 gene long in eukaryotes,
and only a few genes long in prokaryotes.
RNA Secondary Structure
• The 2’ –OH group prevents double stranded RNA from
forming the regular B-form double helix that DNA forms
• RNA is usually single stranded, but the bases are
capable of pairing with each other. This causes RNA
molecules to fold up into characteristic shapes, which is
called the secondary structure of RNA.
• In addition to the usual base pairings (G-C and A-U),
several non-conventional base pairings occur: U-G, for
example.
• The basic RNA structures are stem-loops, caused by
sequences that are reverse-complements of each other.
• A stem-loop with a very small loop is sometimes called a
hairpin.
• Another important structure is the pseudoknot, which
consists of pairing between bases in a loop and a
sequence outside the stem-loop structure.
[Figure: stem-loop and associated pseudoknot]
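The stem-loop idea above can be sketched in a few lines of Python. This is only an illustration, not a real folding algorithm: it looks for the longest run of bases that can pair with a downstream reverse-complementary run, allowing G-U wobble pairs. The function name `find_stem_loop` and the thresholds `min_stem` and `min_loop` are assumptions made for this example.

```python
# Allowed pairs: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def find_stem_loop(rna, min_stem=3, min_loop=3):
    """Return the longest stem-loop as a dict, or None if none is found.

    min_stem and min_loop are illustrative thresholds, not measured values.
    """
    best = None
    n = len(rna)
    for i in range(n):
        # j is the 3' end of the candidate stem; the span must leave room for
        # min_stem pairs on each side plus a loop of min_loop bases.
        for j in range(i + 2 * min_stem + min_loop - 1, n):
            k = 0  # number of base pairs formed so far, working inward
            while (j - i - 2 * k - 1 >= min_loop
                   and (rna[i + k], rna[j - k]) in PAIRS):
                k += 1
            if k >= min_stem and (best is None or k > best["pairs"]):
                best = {
                    "pairs": k,
                    "stem5": rna[i:i + k],           # 5' side of the stem
                    "loop": rna[i + k:j - k + 1],    # unpaired loop bases
                    "stem3": rna[j - k + 1:j + 1],   # 3' side of the stem
                }
    return best

print(find_stem_loop("GGGCAAAAGCCC"))
```

For the toy sequence GGGCAAAAGCCC, the 5’ side GGGC pairs with the 3’ side GCCC around an AAAA loop. Real structure prediction also weighs pairing energies, which this sketch ignores.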
RNA Catalytic Activity
• RNA secondary structure allows RNA molecules to fold up in ways
that create microenvironments capable of catalyzing reactions: RNA
can act as an enzyme. RNAs with catalytic activity are called
ribozymes.
• There are only a few naturally occurring ribozymes, but there are
several fundamental processes carried out by RNA/protein hybrids, in
which RNA performs the actual catalytic function.
• A few examples of RNA/protein hybrids in which RNA is catalytic:
• Ribosomes: attaching an amino acid to the growing chain is catalyzed by
ribosomal RNA.
• Telomerase: adding telomere repeats is performed by RNA
• Signal Recognition particle: guides messenger RNA for membrane
proteins to the endoplasmic reticulum
• Splicing out introns in messenger RNA
• Several ribozymes have been developed in vitro as potential
pharmaceuticals. Many of them recognize specific RNA sequences
(such as those produced by viruses) and cleave them.
RNA World Hypothesis
• RNA molecules are capable of both storing information and performing
metabolic activities. In present day cells, DNA stores information and proteins
perform catalysis, with RNA as the intermediate between DNA and protein.
One can imagine a time when there was no DNA or protein, just RNA
performing both functions: this is the RNA World hypothesis.
• This would have been very long ago, at least 3.5-4 billion years ago. (Recall that the Earth is about 4.6 billion
years old, and didn’t have a stable solid surface until about 4 billion years ago.)
• Presumably there was once a self-replicating RNA molecule. However, no such
RNA has been found or made artificially so far.
• The RNA World hypothesis is an intriguing concept, but there is very little real
evidence for it.
• DNA seems like a molecule that was derived from RNA later in time.
• It is easy to synthesize ribose from very simple compounds like formaldehyde
(HCHO), but deoxyribose requires more and harder steps
• Without the 2’ –OH group, DNA forms a very nice B-form double helix.
• Using thymine instead of uracil makes error correction easier: deamination of
cytosine produces uracil, which can be recognized as not a normal DNA base.
• None of these arguments constitutes anything like proof
Transcription Basics
• There are 3 stages in transcription: initiation,
elongation, and termination.
• Transcription involves unwinding a short portion
of the DNA double helix, then using one strand as
a template to make a complementary RNA copy.
• The RNA transcript does not stay paired with the
template DNA strand (unlike DNA replication). This
allows several RNAs to be made from a single
template.
• Only a short region of DNA is transcribed: there is no
attempt to transcribe the entire length of the DNA
molecule
• The enzyme involved is RNA polymerase. Like
DNA polymerase, it adds new nucleotides to the
3’ OH group of a growing chain, but it uses ribonucleoside
triphosphates (rNTPs or just NTPs).
• Unlike DNA polymerase, RNA polymerase does not
need a primer: it does not need a double stranded
region to build off. Instead, RNA polymerase starts
transcription at a promoter site, where it opens up a
short single stranded region.
• RNA polymerase is more error-prone than DNA
polymerase: 1 mistake in 10^4 bases, as opposed to 1
mistake in 10^9 bases for DNA polymerase
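To get a feel for these error rates, here is a small arithmetic sketch. It assumes errors are independent at each base and uses a hypothetical 1,000-base gene; both are assumptions made for illustration, not values from the slides.

```python
RNA_POL_ERROR = 1e-4   # about 1 mistake per 10^4 bases (RNA polymerase)
DNA_POL_ERROR = 1e-9   # about 1 mistake per 10^9 bases (DNA polymerase)
GENE_LENGTH = 1000     # hypothetical gene length, chosen for illustration

def expected_errors(length, error_rate):
    """Average number of mistakes in one copy of the given length."""
    return length * error_rate

def prob_error_free(length, error_rate):
    """Chance that one copy has no mistakes, assuming independent errors."""
    return (1.0 - error_rate) ** length

for name, rate in [("RNA polymerase", RNA_POL_ERROR),
                   ("DNA polymerase", DNA_POL_ERROR)]:
    print(name,
          "expected errors:", expected_errors(GENE_LENGTH, rate),
          "P(error-free):", round(prob_error_free(GENE_LENGTH, rate), 6))
```

Under these assumptions, roughly one transcript in ten of a 1,000-base gene carries a mistake. That is tolerable because each RNA is short-lived and many copies are made.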
Transcription in Bacteria
• There are enough differences in transcription between bacteria and
eukaryotes that we will treat them separately. Archaea are more similar to
eukaryotes in this regard.
• All transcription in bacterial cells is done by the same RNA polymerase
enzyme.
• Transcription starts when RNA polymerase binds to the promoter region
just upstream (that is, 5’ to) from the gene.
• There isn’t a single DNA sequence that is used as a promoter. Instead,
promoters have a consensus sequence: all promoters are similar to but not
necessarily identical to the consensus.
• Bacterial promoters consist of 2 regions of about 6 bases, located about 10
and 35 bases upstream from the first base that is transcribed.
• The -10 sequence is often called the Pribnow box. It has a consensus sequence
of TATAAT.
• The -35 element also has a consensus sequence (TTGACA), but it is usually just
called the -35 box.
[Figure: consensus sequences of E. coli promoters. The numbers give the percentage of E. coli promoters with each base shown.]
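Because promoters only resemble the consensus, a search has to tolerate mismatches. The sketch below scans a DNA string for a -35 box followed, after a spacer, by a -10 box. It uses the 6-base forms TTGACA and TATAAT. The spacer range of 16 to 18 bases, the mismatch limit, and the name `find_promoters` are assumptions for this example.

```python
MINUS_35 = "TTGACA"   # -35 box consensus (6-base form)
MINUS_10 = "TATAAT"   # -10 box (Pribnow box) consensus

def mismatches(a, b):
    """Count positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def find_promoters(dna, max_mismatch=1, spacers=range(16, 19)):
    """Return (pos_35, pos_10, total_mismatches) for each candidate promoter.

    max_mismatch applies to each box separately; spacers is the allowed
    gap between the two boxes (an assumed range, for illustration).
    """
    hits = []
    for i in range(len(dna) - len(MINUS_35) + 1):
        m35 = mismatches(dna[i:i + 6], MINUS_35)
        if m35 > max_mismatch:
            continue
        for gap in spacers:
            j = i + 6 + gap
            box10 = dna[j:j + 6]
            if len(box10) < 6:
                continue  # ran off the end of the sequence
            m10 = mismatches(box10, MINUS_10)
            if m10 <= max_mismatch:
                hits.append((i, j, m35 + m10))
    return hits

toy = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GCGCGCA"
print(find_promoters(toy))
```

Raising `max_mismatch` finds weaker promoters but also more false positives. Promoters in real genomes vary in how closely they match the consensus.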
Transcription Initiation
• The first step in transcription is called initiation. In this phase, RNA
polymerase binds to the DNA and unwinds a short stretch of it.
• In bacteria, RNA polymerase has a special subunit, called the sigma
factor, which is responsible for recognizing the promoter sequence.
• Bacterial cells contain several different sigma factors, which recognize
several different types of promoter. For instance, E. coli has 7
different sigma factors.
• There is a primary sigma factor used by most genes during normal
growth
• Also special sigma factors for heat shock conditions, nitrogen limitation,
general starvation, etc.
• Once RNA polymerase has bound to the promoter, it unwinds a
short stretch of DNA to make a single stranded region.
• At this point, the sigma factor is released, and transcription
enters the elongation phase.
Elongation
• RNA polymerase adds nucleotides to the 3’ OH group of
the growing chain. We say that the RNA chain grows
from 5’ to 3’. The beginning of an RNA molecule has a
free phosphate attached to the 5’ carbon, and the end
has a free OH group on the 3’ carbon.
• The RNA polymerase is using the template strand
(sometimes called the antisense strand) of the DNA to
make a complementary copy. Note that RNA polymerase
is moving down the template DNA strand from 3’ to 5’.
• The other DNA strand is called the coding strand or sense
strand, because it has the same base sequence as the RNA
(and not the complementary sequence).
• RNA polymerase unwinds a short region of the DNA, then
rewinds it after it has transcribed that region. RNA
polymerase forms a transcription bubble that passes
down the DNA as it is being transcribed.
• This unwinding and re-winding causes topological
problems that require topoisomerases to relieve the stress
in the DNA.
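The strand relationships above are easy to mix up, so here is a minimal sketch. The template strand is written 3’ to 5’, so reading it left to right gives the RNA 5’ to 3’. The function names are made up for this example.

```python
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}  # template base -> RNA base
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}  # template base -> coding base

def transcribe(template_3to5):
    """RNA (written 5'->3') made from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5)

def coding_strand(template_3to5):
    """Coding (sense) strand, written 5'->3', for the same template."""
    return "".join(DNA_TO_DNA[base] for base in template_3to5)

template = "TACGGCATT"             # 3'->5'
rna = transcribe(template)         # 5'->3'
coding = coding_strand(template)   # 5'->3'
print(rna)      # AUGCCGUAA
print(coding)   # ATGCCGTAA
# The RNA matches the coding strand, with U in place of T.
print(rna == coding.replace("T", "U"))   # True
```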
Termination
• In bacteria, transcription ends at a terminator sequence.
• There are 2 types of terminators: rho-dependent and rho-independent (also called
intrinsic terminators). Rho is a protein.
• Both mechanisms require the formation of a stem-loop structure in the RNA at the
site of termination. These are caused by sequences that are inverted repeats
(sometimes called palindromes). When RNA polymerase transcribes the inverted
repeat region, a stem-loop forms in the exit channel of the polymerase, causing it to
temporarily stall.
• Rho-independent terminators are regions of RNA that can fold into a G-C rich stem-loop. Immediately following the stem-loop is a sequence of U’s.
• Formation of the stem-loop causes RNA polymerase to stall temporarily just
when it is transcribing the U’s. Since A-U base pairs are weak (2 H bonds instead
of 3 in G-C pairs), the template strand DNA and the newly formed RNA dissociate
from each other, ending transcription.
• For rho-dependent terminators, the rho protein binds to a site on the RNA, then uses
ATP-derived energy to pull itself towards the RNA polymerase. When a stem-loop
forms in the newly synthesized RNA, the polymerase stalls. This allows rho to catch
up to the polymerase, and when that happens, rho pulls the polymerase off the DNA
template, ending transcription.
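The description of a rho-independent terminator (a G-C rich stem-loop followed directly by a run of U’s) can be turned into a rough pattern search. This is a sketch under assumed thresholds: the stem length, loop sizes, G-C fraction, and U-run length are all chosen for illustration, and `find_intrinsic_terminators` is a made-up name.

```python
WC = {"A": "U", "U": "A", "G": "C", "C": "G"}  # Watson-Crick partners in RNA

def revcomp(rna):
    """Reverse complement of an RNA string."""
    return "".join(WC[base] for base in reversed(rna))

def find_intrinsic_terminators(rna, stem=5, loops=range(3, 9),
                               min_gc=0.6, u_run=4):
    """Return (stem_start, stem_end) for each candidate terminator.

    A candidate is a stem of `stem` bases that is at least `min_gc` G/C,
    a loop of an allowed size, the reverse-complementary stem, and then
    `u_run` U's immediately downstream. All thresholds are assumptions.
    """
    hits = []
    for i in range(len(rna) - stem + 1):
        left = rna[i:i + stem]
        gc_fraction = sum(base in "GC" for base in left) / stem
        if gc_fraction < min_gc:
            continue
        for loop in loops:
            r0 = i + stem + loop               # start of the 3' side of the stem
            right = rna[r0:r0 + stem]
            tail = rna[r0 + stem:r0 + stem + u_run]
            if right == revcomp(left) and tail == "U" * u_run:
                hits.append((i, r0 + stem))
    return hits

toy = "AA" + "GCCGC" + "AAAA" + "GCGGC" + "UUUUUU" + "GA"
print(find_intrinsic_terminators(toy))
```

Overlapping candidates are reported separately, so one hairpin can produce several hits. A real tool would also score pairing strength.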
Transcription in Eukaryotes
• The main differences between bacterial transcription and eukaryotic
transcription:
• Eukaryotes have 3 different RNA polymerases, with different functions
• Eukaryotes have a more complicated method for initiation: several
general transcription factors (proteins) are needed
• In eukaryotes, transcription occurs in the nucleus, but translation occurs
in the cytoplasm: there is a time lag between the processes.
• Many eukaryotic genes are interrupted by introns that have to be
removed before the RNA can be translated
• Eukaryotes have a very different method of termination from bacteria
• RNA polymerases in eukaryotes:
• Polymerase I: transcribes the main ribosomal RNA genes
• Polymerase II (pol2 for short): transcribes protein-coding genes. We will
mainly be discussing this enzyme
• Polymerase III: transcribes the 5S ribosomal RNA genes, transfer RNA
genes, and small non-coding RNA genes.
• Mitochondria and chloroplasts also have their own separate RNA
polymerases
RNA Polymerase II Initiation
• The main eukaryotic promoter is called a TATA box, because
its consensus sequence is TATAAA, about 25 bases upstream
from the transcription start site.
• Note its similarity to the bacterial promoter, the Pribnow (-10) box:
TATAAT. The main point is that A-T base pairs are weaker than G-C, thus
easier to pull apart.
• The initiation process is started when the TATA binding
protein (a subunit of initiation factor TFIID) binds to the
promoter.
• After TFIID binds, other transcription factors also bind,
eventually including RNA polymerase. This forms the preinitiation complex. At this point it is a closed complex,
meaning that the DNA is still wound into a double helix, with
proteins bound to it.
• We will study control of transcription later, but for now, realize
that various gene-specific transcription factors are also
involved.
More Eukaryotic Initiation
• One of the last transcription factors to bind to the preinitiation complex is TFIIH. This factor is a helicase: it
uses energy from ATP to unwind the DNA at the
promoter, forming a transcription bubble. The complex
of proteins and DNA is now an open complex.
• At this point, NTPs enter the RNA polymerase active site
and start the messenger RNA chain.
• Often, RNA polymerase stays stuck to the promoter and
repeatedly transcribes the first few bases of the gene.
This process is called abortive initiation. These tiny (3-8
bases) RNAs have no known function.
• It isn’t clear what causes the abortive initiation process
to end. But, at some point, a protein kinase (one of the
transcription factors) phosphorylates RNA polymerase.
This causes it to release most of the initiation factors
and escape from the promoter.
• Transcription then enters the elongation phase as RNA
polymerase starts moving down the DNA template strand.
Elongation
• Once it escapes from the promoter, RNA
polymerase moves down the DNA molecule.
• About 1 turn (10 bp) is unwound in the transcription
bubble. As the polymerase moves, the DNA is rewound
behind it.
• The RNA exits the polymerase using a different channel
than the DNA.
• Several protein elongation factors are involved.
• RNA polymerase often pauses. Pausing near
the promoter before transcribing the rest of the
gene seems to be a point of gene regulation
used by many genes.
Termination
• Transcription termination in eukaryotes is not as well understood as in bacteria. There are no obvious
terminators in eukaryotes
• Instead, when RNA polymerase transcribes the polyadenylation sequence (consensus is AAUAAA)
an enzyme bound to the RNA polymerase recognizes it and cleaves the RNA about 30 bases
downstream. Another enzyme, polyadenylate polymerase, adds about 200 A’s to the end, using
ATP as the source.
• Note that after this, transcription of the gene is complete: the RNA is no longer bound to RNA polymerase or to the
DNA.
• Surprisingly, RNA polymerase continues transcribing the DNA after the polyadenylation sequence.
What causes it to finally stop is a bit mysterious.
• One theory: an exonuclease latches onto the transcript and starts chewing it up. When the nuclease reaches the
RNA polymerase, it ends transcription (the torpedo theory). Similar to rho-dependent termination in prokaryotes.
• Another theory: RNA polymerase spontaneously falls off the DNA at termination signals downstream from the
polyadenylation site.
RNA Processing in Eukaryotes
• In bacteria, transcription and translation occur
simultaneously: the ribosome moves down the messenger
RNA while it is being synthesized.
• In eukaryotes, transcription occurs in the nucleus, and the
RNA is then transported to the cytoplasm for translation
• RNA is easily degraded, so it must be protected during this
time. The ends of the molecule are especially vulnerable.
• Add a cap to the 5’ end, and a poly(A) tail to the 3’ end.
• The enzymes required for these processes are bound to the RNA
polymerase.
• Many eukaryotic genes are interrupted by introns, which are
spliced out of the RNA before it leaves the nucleus.
• The initial RNA copy of a gene is called the primary
transcript (or pre-mRNA). It is an exact RNA copy of the
gene’s DNA sequence. After processing, the RNA that leaves
the nucleus is called messenger RNA (mRNA).
• In bacteria, the primary transcript is messenger RNA immediately:
there is no RNA processing.
5’ Capping and 3’ Polyadenylation
• The 5’ cap is a guanine nucleotide that has been methylated (7-methyl guanine, m7G) and attached by a 5’-5’ linkage to the
first nucleotide of the transcript. There are 3 phosphate groups
between the two nucleotides.
• The 3’ end of newly transcribed RNA is protected by adding 100-200 adenine nucleotides to the end. An enzyme that tries to
degrade the RNA from the 3’ end first has to remove all the A’s
before it can hurt the RNA itself.
• At the 3’ end of eukaryotic genes there is a polyadenylation
sequence, whose consensus is AAUAAA. When this sequence is
transcribed, an enzyme bound to the RNA polymerase
recognizes it and cleaves the RNA about 30 bases downstream.
Another enzyme, polyadenylate polymerase, adds A’s to the
end, using ATP as the source.
• Note that after this, transcription of the gene is complete: the RNA
is no longer bound to RNA polymerase or to the DNA.
• RNA polymerase continues transcription after this point.
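The cleavage and polyadenylation steps can be sketched as simple string operations. The code finds the first AAUAAA signal, cuts a fixed distance downstream, and appends a run of A’s. Measuring the 30 bases from the end of the signal and using exactly 200 A’s are simplifying assumptions, and the function name is made up for this example.

```python
def cleave_and_polyadenylate(pre_mrna, signal="AAUAAA", offset=30, tail_len=200):
    """Cut `offset` bases past the first poly(A) signal and add a poly(A) tail.

    Returns None if the signal is absent. The offset is counted from the
    end of the signal, which is a simplifying assumption.
    """
    pos = pre_mrna.find(signal)
    if pos == -1:
        return None
    cut = pos + len(signal) + offset
    # Slicing tolerates a cut beyond the end of the string.
    return pre_mrna[:cut] + "A" * tail_len

toy = "G" * 10 + "AAUAAA" + "C" * 50   # hypothetical transcript
processed = cleave_and_polyadenylate(toy)
print(len(toy), len(processed))        # 66 246
```

Everything transcribed past the cut site is discarded from the finished RNA, even though the polymerase keeps going.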
Intron Splicing
• Most eukaryotic genes are interrupted by introns, sequences that do not appear in the
final messenger RNA and are not translated into protein.
• Gene sequences that do appear in the messenger RNA are called exons. A gene consists of
alternating exons and introns.
• Introns are found in genes for ribosomal RNA and transfer RNA as well as protein-coding
genes.
• Introns start with a GU… and end with …AG. They have a few common
sequence elements in the middle as well: a polypyrimidine tract (U’s and C’s),
with a conserved A just upstream. However, most of the intron sequence is
not evolutionarily conserved.
• Introns in protein-coding genes are removed by spliceosomes, which are RNA/protein
hybrids.
• Spliceosomes contain 5 RNA molecules, called snRNAs (small nuclear RNA) and individually
named U1, U2, U4, U5, and U6. Each snRNA is assembled with some proteins into a snRNP
(pronounced “snurp”), a small nuclear ribonucleoprotein. Together, the snRNPs make up the
spliceosome.
• The snRNPs bind to different sequence elements of the intron, then join to form the
spliceosome. The RNA of one of the snRNPs catalyzes the breaking and joining of the
phosphodiester bonds to remove the intron and join the exons together.
Basic Splicing Mechanism
• First, the snRNPs recognize the intron
ends
• Then the conserved A near the 3’ end
is joined to the 5’ end of the intron,
leaving a free 3’ end on the preceding
exon.
• Finally, the 3’ end of the exon is joined
to the 5’ end of the following exon.
• This releases the intron in the form of
a lariat RNA. The lariat RNA is then
degraded by RNase enzymes.
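The net result of splicing (introns out, exons joined) can be sketched as follows. The intron positions are supplied by hand, and the code only checks the GU…AG boundary rule. It does not model snRNPs, the branch-point A, or the lariat. The name `splice` and the coordinate convention are assumptions for this example.

```python
def splice(pre_mrna, introns):
    """Remove introns and join the exons.

    introns: list of (start, end) index pairs, end exclusive, non-overlapping.
    Raises ValueError if an intron does not start with GU and end with AG.
    """
    pieces = []
    previous_end = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron at {start}-{end} lacks GU...AG boundaries")
        pieces.append(pre_mrna[previous_end:start])  # exon before this intron
        previous_end = end
    pieces.append(pre_mrna[previous_end:])           # final exon
    return "".join(pieces)

exon1, intron, exon2 = "AUGGCC", "GUAAGUCCUUUCAG", "UUCUAA"
pre_mrna = exon1 + intron + exon2
print(splice(pre_mrna, [(6, 20)]))   # AUGGCCUUCUAA
```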
Alternative Splicing
• Many genes contain sequences that are introns (i.e. spliced out) in
some cell types, but exons (i.e., not spliced out) in other cell types.
This arrangement allows one gene to produce many different variant
proteins.
• The number of genes in humans is about 20,000, but the number of
different polypeptides in humans is closer to 100,000.
• The different proteins produced by the same gene are called isoforms.
• In addition, some genes use several different transcription initiation
sites and polyadenylation sites to generate alternate protein
isoforms.
• The process is regulated by a set of proteins that bind to different
sequences within the intron.
[Figure: various types of alternative splicing]
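To see how one gene can yield several isoforms, the sketch below enumerates every combination of including or skipping a set of optional exons. It covers only simple exon skipping, and the exon sequences are hypothetical.

```python
from itertools import product

def isoforms(exons, optional):
    """List every mRNA obtainable by including or skipping optional exons.

    exons: exon sequences in gene order.
    optional: indices of exons that may be skipped; all others are always kept.
    """
    optional = sorted(optional)
    results = []
    for choice in product([True, False], repeat=len(optional)):
        keep = dict(zip(optional, choice))
        mrna = "".join(seq for i, seq in enumerate(exons) if keep.get(i, True))
        results.append(mrna)
    return results

toy_exons = ["AUG", "GCC", "UUU", "UAA"]   # hypothetical exons
for mrna in isoforms(toy_exons, optional={1, 2}):
    print(mrna)
```

With k optional exons there are 2^k possible combinations, so here two optional exons give four isoforms. Cells use only the combinations their splicing regulators permit.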
Why Introns
• There is no general agreement about the origin or purpose of
introns.
• In general, the more complex an organism is, the more introns it has.
• Introns are rare in bacteria and archaea, and these domains do not
have spliceosomes.
• Self-splicing introns, where the RNA of the intron catalyzes its own
removal, are found in all domains of life. All bacterial introns are
self-splicing (which implies that they are ribozymes: catalytically active RNAs).
• Intron positions are mostly constant across large evolutionary
distances
• Introns are more frequently found between protein domains (but
not always). This makes it easier for a random chromosomal break
and rejoin to put different domains into the same gene, possibly
creating a novel function: this is called exon shuffling.
• Two main theories: introns-early and introns-late.
• Introns-early theory: the spliceosome is a relic of the (hypothesized)
RNA World, and bacteria lack introns because they were lost as a
way of making transcription more efficient. Or, spliceosomal introns
are derived from prokaryotic self-splicing introns.
• Introns-late theory: introns have been added at various points during
evolution. This is certainly true, but the origin of introns might still
be very ancient.
RNA Editing
• RNA editing is a fairly rare form of RNA processing, in which
nucleotides are added, removed, or altered in the RNA
sequence after transcription has occurred.
• RNA editing has been seen in all domains of life, and in many
different types of RNA, including protein-coding messenger RNA.
• One simple mechanism: an enzyme deaminates a specific C in a
mRNA, converting it to U. In DNA this would be repaired, but U
is a legitimate RNA base.
• In apolipoprotein B, this change converts a CAA (glutamine) codon
into a UAA stop codon, which allows translation of a much shorter
protein.
• Another deamination, from adenine to inosine, is common in
mammals. Inosine acts like guanine in translation and in base
pairing.
• Adding or removing nucleotides also occurs. This process
requires a guide RNA, which is part of another
RNA/protein complex, the editosome.
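The C-to-U editing example can be sketched with a short translation routine. The mRNA below is a hypothetical toy sequence, not the real apolipoprotein message. The codon table is the standard genetic code, built from a compact string in U, C, A, G order.

```python
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid, '*' marks a stop codon.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(mrna):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

def edit_c_to_u(mrna, position):
    """Deaminate the C at `position` (0-based) to U."""
    if mrna[position] != "C":
        raise ValueError("base at this position is not C")
    return mrna[:position] + "U" + mrna[position + 1:]

unedited = "AUGGGCCAAUUUUAA"          # AUG GGC CAA UUU UAA (toy sequence)
edited = edit_c_to_u(unedited, 6)     # CAA -> UAA
print(translate(unedited))            # MGQF
print(translate(edited))              # MG
```

A single base change in the RNA shortens the protein, while the gene’s DNA is left unchanged.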
Transport Out of the Nucleus and Degradation
• There is a lot of RNA in the nucleus that
is not needed in the cytoplasm: intron
lariats, unspliced (or mis-spliced) RNAs,
etc.
• However, after being processed,
messenger RNA molecules need to
move to the cytoplasm for translation.
• The mRNA molecules must pass
through nuclear pores, which are
protein complexes that only let well-formed mRNAs out.
• The processing features of mRNA: 5’
cap, 3’ poly(A) tail, and spliced out
introns, are all marked by specific
proteins that bind to these structures.
The nuclear pore complex recognizes
each of these features, and only
releases mRNAs that have all of them.
• Once the mRNA reaches the cytoplasm, ribosomes bind to it
and translation occurs.
• Messenger RNA molecules have a finite lifespan, from
minutes to days. Degradation is mostly the job of the
exosome complex, which is a 3’→5’ exonuclease (it starts at
the 3’ end). The length of the poly(A) tail is critical here.
Also, while the mRNA is being translated, other proteins block the
exosome’s access to the 3’ end. (Note: there is also an
unrelated secreted vesicle, also called an exosome, involved in
secretion.)
Ribosomal RNA Genes
• Eukaryotic ribosomes contain 4 different RNA molecules, which
constitute 60% of the total weight of a ribosome.
• The RNAs are named by their sedimentation rate in a centrifuge:
28S, 18S, 5.8S, and 5S.
• Of these, the 28S, 18S, and 5.8S genes are transcribed as a single
unit that is then cleaved to produce the individual RNAs
• The cleavage is performed by other small RNA/protein hybrids:
snoRNPs (small nucleolar ribonucleoproteins)
• RNA polymerase I does the transcription of ribosomal RNA genes
• The ribosomal RNA genes are found in long tandem arrays
located on several different chromosomes (5 chromosomes in
humans)
• The nucleolus, an organelle within the nucleus, is the site of
ribosome production. The nucleolus sits on the ribosomal RNA
genes, which cluster together in the nucleus. Sometimes this
chromosomal region is called the nucleolus organizer.
RNA Polymerase III
• RNA polymerase III (pol3) transcribes the fourth ribosomal
RNA (5S) as well as the transfer RNA (tRNA) genes and
several other functional RNAs.
• Pol3 also transcribes the Alu sequences (mobile DNA present
in high copy number in primates, derived from the 7SL RNA
present in the signal recognition particle)
• Unlike pol1 and pol2, the promoter for many pol3-transcribed genes lies within the transcribed region
• Some pol3 genes (especially tRNA genes) contain introns
that are spliced out by proteins (and not by spliceosomes).
• Transfer RNA molecules are heavily processed, altering
many of the bases.
• The CCA sequence at the 3’ end is added enzymatically (i.e. it is not
coded in the DNA)
• Altered bases include pseudouridine (Ψ), 7-methyl guanine, 5-methyl
cytosine, dihydrouridine (D), and others.