Transcription

RNA
• Genetic information is stored as DNA. DNA is a very stable molecule.
• Information is expressed by first transcribing regions of the DNA (genes) into RNA. In contrast to DNA, RNA is quite unstable. This is appropriate for rapid responses to changes in the cell’s condition and environment: RNA represents what is needed now, not some time in the distant past.
• RNA has a few key differences from DNA:
• RNA uses ribose instead of deoxyribose as its sugar. Two main effects: the 2’ –OH group is subject to chemical attack, making RNA less stable than DNA. Also, the bulky –OH group keeps double stranded RNA from adopting the long, regular B-form double helix that DNA forms.
• RNA uses uracil instead of thymine as a base. The difference is a –CH3 (methyl) group on thymine.
• RNA is usually single stranded.
• RNA is short compared to DNA: just 1 gene long in eukaryotes, and only a few genes long in prokaryotes.

RNA Secondary Structure
• The 2’ –OH group prevents double stranded RNA from forming the regular B-form double helix that DNA forms.
• RNA is usually single stranded, but the bases are capable of pairing with each other. This causes RNA molecules to fold up into characteristic shapes, which is called the secondary structure of RNA.
• In addition to the usual base pairings (G-C and A-U), several non-conventional base pairings occur: U-G, for example.
• The basic RNA structures are stem-loops, caused by sequences that are reverse-complements of each other.
• A stem-loop with a very small loop is sometimes called a hairpin.
• Another important structure is the pseudoknot, which consists of pairing between bases in a loop and a sequence outside the stem-loop structure.
Figure: stem-loop and associated pseudoknot.

RNA Catalytic Activity
• RNA secondary structure allows RNA molecules to fold up in ways that create microenvironments capable of catalyzing reactions: RNA can act as an enzyme. RNAs with catalytic activity are called ribozymes.
• There are only a few naturally occurring ribozymes, but there are several fundamental processes carried out by RNA/protein hybrids, in which RNA performs the actual catalytic function.
• A few examples of RNA/protein hybrids in which RNA is catalytic:
• Ribosomes: attaching an amino acid to the growing chain is catalyzed by ribosomal RNA.
• Telomerase: adding telomere repeats is performed by RNA.
• Signal recognition particle: guides messenger RNA for membrane proteins to the endoplasmic reticulum.
• Splicing out introns in messenger RNA.
• Several ribozymes have been developed in vitro as potential pharmaceuticals. Many of them recognize specific RNA sequences (such as those produced by viruses) and cleave them.

RNA World Hypothesis
• RNA molecules are capable of both storing information and performing metabolic activities. In present day cells, DNA stores information and proteins perform catalysis, with RNA as the intermediate between DNA and protein. One can imagine a time when there was no DNA or protein, just RNA performing both functions: this is the RNA World hypothesis.
• This would have been very long ago, at least 3.5-4 billion years ago. (Recall the Earth is 4.6 billion years old, and didn’t have a stable solid surface until about 4 billion years ago.)
• Presumably there was once a self-replicating RNA molecule. However, no such RNA has been found or made artificially so far.
• The RNA World hypothesis is an intriguing concept, but there is very little real evidence for it.
• DNA seems like a molecule that was derived from RNA later in time.
• It is easy to synthesize ribose from very simple compounds like formaldehyde (HCHO), but deoxyribose requires more, and more difficult, steps.
• Without the 2’ –OH group, DNA forms a very nice B-form double helix.
• Using thymine instead of uracil makes error correction easier: deamination of cytosine produces uracil, which can be recognized as not a normal DNA base.
• None of these arguments constitutes anything like proof.

Transcription Basics
• There are 3 stages in transcription: initiation, elongation, and termination.
• Transcription involves unwinding a short portion of the DNA double helix, then using one strand as a template to make a complementary RNA copy.
• The RNA transcript does not stay paired with the template DNA strand (unlike DNA replication). This allows several RNAs to be made from a single template.
• Only a short region of DNA is transcribed: there is no attempt to transcribe the entire length of the DNA molecule.
• The enzyme involved is RNA polymerase. Similar to DNA polymerase, it uses nucleoside triphosphates (here ribonucleoside triphosphates: rNTPs or just NTPs) and adds new nucleotides to the 3’ OH group of a growing chain.
• Unlike DNA polymerase, RNA polymerase does not need a primer: it does not need a double stranded region to build off. Instead, RNA polymerase starts transcription at a promoter site, where it opens up a single stranded region.
• RNA polymerase is more error-prone than DNA polymerase: 1 mistake in 10^4 bases, as opposed to 1 mistake in 10^9 bases for DNA polymerase.

Transcription in Bacteria
• There are enough differences in transcription between bacteria and eukaryotes that we will treat them separately. Archaea are more similar to eukaryotes in this regard.
• All transcription in bacterial cells is done by the same RNA polymerase enzyme.
• Transcription starts when RNA polymerase binds to the promoter region just upstream (that is, 5’ to) from the gene.
• There isn’t a single DNA sequence that is used as a promoter. Instead, promoters have a consensus sequence: all promoters are similar to but not necessarily identical to the consensus.
• Bacterial promoters consist of 2 regions of about 6 bases, located about 10 and 35 bases upstream from the first base that is transcribed.
• The -10 sequence is often called the Pribnow box. It has a consensus sequence of TATAAT.
• The -35 element also has a consensus sequence (TTGACA), but it is usually just called the -35 box.
Figure: the consensus sequences, with the percentage of E. coli promoters that have each base shown.

Transcription Initiation
• The first step in transcription is called initiation. In this phase, RNA polymerase binds to the DNA and unwinds a short stretch of it.
• In bacteria, RNA polymerase has a special subunit, called the sigma factor, which is responsible for recognizing the promoter sequence.
• Bacterial cells contain several different sigma factors, which recognize several different types of promoter. For instance, E. coli has 7 different sigma factors.
• There is a primary sigma factor used by most genes during normal growth.
• There are also special sigma factors for heat shock conditions, nitrogen limitation, general starvation, etc.
• Once RNA polymerase has bound to the promoter, it unwinds a short stretch of DNA to make a single stranded region.
• At this point, the sigma factor is released, and transcription enters the elongation phase.

Elongation
• RNA polymerase adds nucleotides to the 3’ OH group of the growing chain. We say that the RNA chain grows from 5’ to 3’. The beginning of an RNA molecule has a free phosphate attached to the 5’ carbon, and the end has a free OH group on the 3’ carbon.
• The RNA polymerase uses the template strand (sometimes called the antisense strand) of the DNA to make a complementary copy. Note that RNA polymerase moves down the template DNA strand from 3’ to 5’.
• The other DNA strand is called the coding strand or sense strand, because it has the same base sequence as the RNA (and not the complementary sequence), except that the RNA has U where the DNA has T.
• RNA polymerase unwinds a short region of the DNA, then rewinds it after it has transcribed that region. RNA polymerase forms a transcription bubble that passes down the DNA as it is being transcribed.
• This unwinding and re-winding causes topological problems that require topoisomerases to relieve the stress in the DNA.

Termination
• In bacteria, transcription ends at a terminator sequence.
• There are 2 types of terminators: rho-dependent and rho-independent (also called intrinsic terminators). Rho is a protein.
• Both mechanisms require the formation of a stem-loop structure in the RNA at the site of termination. These are caused by sequences that are inverted repeats (sometimes called palindromes). When RNA polymerase transcribes the inverted repeat region, a stem-loop forms in the exit channel of the polymerase, causing it to temporarily stall.
• Rho-independent terminators are regions of RNA that can fold into a G-C rich stem-loop. Immediately following the stem-loop is a sequence of U’s.
• Formation of the stem-loop causes RNA polymerase to stall temporarily just when it is transcribing the U’s. Since A-U base pairs are weak (2 H bonds instead of 3 in G-C pairs), the template strand DNA and the newly formed RNA dissociate from each other, ending transcription.
• For rho-dependent terminators, the rho protein binds to a site on the RNA, then uses ATP-derived energy to pull itself towards the RNA polymerase. When a stem-loop forms in the newly synthesized RNA, the polymerase stalls. This allows rho to catch up to the polymerase, and when that happens, rho pulls the polymerase off the DNA template, ending transcription.

Transcription in Eukaryotes
• The main differences between bacterial transcription and eukaryotic transcription:
• Eukaryotes have 3 different RNA polymerases, with different functions.
• Eukaryotes have a more complicated method for initiation: several general transcription factors (proteins) are needed.
• In eukaryotes, transcription occurs in the nucleus, but translation occurs in the cytoplasm: there is a time lag between the processes.
• Many eukaryotic genes are interrupted by introns that have to be removed before the RNA can be translated.
• Eukaryotes have a very different method of termination from bacteria.
• RNA polymerases in eukaryotes:
• Polymerase I: transcribes the main ribosomal RNA genes.
• Polymerase II (pol2 for short): transcribes protein-coding genes. We will mainly be discussing this enzyme.
• Polymerase III: transcribes the 5S ribosomal RNA genes, transfer RNA genes, and small non-coding RNA genes.
• Mitochondria and chloroplasts also have their own separate RNA polymerases.

RNA Polymerase II Initiation
• The main eukaryotic promoter is called a TATA box, because its consensus sequence is TATAAA, about 25 bases upstream from the transcription start site.
• Note its similarity to the bacterial promoter, the Pribnow (-10) box: TATAAT. The main point is that A-T base pairs are weaker than G-C, thus easier to pull apart.
• The initiation process is started when the TATA binding protein (a subunit of initiation factor TFIID) binds to the promoter.
• After TFIID binds, other transcription factors also bind, eventually including RNA polymerase. This forms the preinitiation complex. At this point it is a closed complex, meaning that the DNA is still wound into a double helix, with proteins bound to it.
• We will study control of transcription later, but for now, realize that various gene-specific transcription factors are also involved.

More Eukaryotic Initiation
• One of the last transcription factors to bind to the preinitiation complex is TFIIH. This factor is a helicase: it uses energy from ATP to unwind the DNA at the promoter, forming a transcription bubble. The complex of proteins and DNA is now an open complex.
• At this point, NTPs enter the RNA polymerase active site and start the messenger RNA chain.
• Often, RNA polymerase stays stuck to the promoter and repeatedly transcribes the first few bases of the gene. This process is called abortive initiation.
These tiny (3-8 bases) RNAs have no known function.
• It isn’t clear what causes the abortive initiation process to end. But, at some point, a protein kinase (one of the transcription factors) phosphorylates RNA polymerase. This causes it to release most of the initiation factors and escape from the promoter.
• Transcription then enters the elongation phase as RNA polymerase starts moving down the DNA template strand.

Elongation
• Once it escapes from the promoter, RNA polymerase moves down the DNA molecule.
• About 1 turn (10 bp) is unwound in the transcription bubble. As the polymerase moves, the DNA is rewound behind it.
• The RNA exits the polymerase using a different channel than the DNA.
• Several protein elongation factors are involved.
• RNA polymerase often pauses. Pausing near the promoter before transcribing the rest of the gene seems to be a point of gene regulation used by many genes.

Termination
• Transcription termination in eukaryotes is not as well understood as in bacteria. There are no obvious terminators in eukaryotes.
• Instead, when RNA polymerase transcribes the polyadenylation sequence (consensus is AAUAAA), an enzyme bound to the RNA polymerase recognizes it and cleaves the RNA about 30 bases downstream. Another enzyme, polyadenylate polymerase, adds about 200 A’s to the end, using ATP as the source.
• Note that after this, transcription of the gene is complete: the RNA is no longer bound to RNA polymerase or to the DNA.
• Surprisingly, RNA polymerase continues transcribing the DNA after the polyadenylation sequence. What causes it to finally stop is a bit mysterious.
• One theory: an exonuclease latches onto the transcript and starts chewing it up. When the nuclease reaches the RNA polymerase, it ends transcription (the torpedo theory). This is similar to rho-dependent termination in prokaryotes.
• Another theory: RNA polymerase spontaneously falls off the DNA at termination signals downstream from the polyadenylation site.
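
The base-pairing rule behind elongation (the RNA is complementary to the template strand, so it matches the coding strand with U in place of T) can be illustrated with a minimal Python sketch. The function names `coding_to_template` and `transcribe` and the example sequence are hypothetical, chosen only for illustration. The sketch deliberately ignores promoters, pausing, and termination.

```python
# Toy model of the elongation base-pairing rule (not a model of the enzyme).

# DNA-DNA pairing, used to derive the template strand from the coding strand.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Template DNA base -> RNA base added opposite it (A pairs with U in RNA).
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def coding_to_template(coding_5to3: str) -> str:
    """Return the template strand, written 3'->5', for a coding strand written 5'->3'."""
    return "".join(DNA_PAIR[base] for base in coding_5to3.upper())


def transcribe(template_3to5: str) -> str:
    """Return the RNA, written 5'->3', made by reading the template 3'->5'."""
    return "".join(RNA_PAIR[base] for base in template_3to5.upper())


coding = "ATGGCATTAG"                  # hypothetical coding (sense) strand, 5'->3'
template = coding_to_template(coding)  # antisense strand, 3'->5'
rna = transcribe(template)

print(template)  # TACCGTAATC
print(rna)       # AUGGCAUUAG, the coding strand with U instead of T
```

Writing the template 3’→5’ keeps the two strands aligned position by position, so no reversal step is needed in the sketch.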

RNA Processing in Eukaryotes
• In bacteria, transcription and translation occur simultaneously: the ribosome moves down the messenger RNA while it is being synthesized.
• In eukaryotes, transcription occurs in the nucleus, and the RNA is then transported to the cytoplasm for translation.
• RNA is easily degraded, so it must be protected during this time. The ends of the molecule are especially vulnerable.
• To protect them, a cap is added to the 5’ end, and a poly(A) tail to the 3’ end.
• The enzymes required for these processes are bound to the RNA polymerase.
• Many eukaryotic genes are interrupted by introns, which are spliced out of the RNA before it leaves the nucleus.
• The initial RNA copy of a gene is called the primary transcript (or pre-mRNA). It is an exact RNA copy of the gene’s DNA sequence. After processing, the RNA that leaves the nucleus is called messenger RNA (mRNA).
• In bacteria, the primary transcript is messenger RNA immediately: there is no RNA processing.

5’ Capping and 3’ Polyadenylation
• The 5’ cap is a guanine nucleotide that has been methylated (7-methylguanine, m7G) and attached by a 5’-5’ linkage to the first nucleotide of the transcript. There are 3 phosphate groups between the two nucleotides.
• The 3’ end of newly transcribed RNA is protected by adding 100-200 adenine nucleotides to the end. An enzyme that tries to degrade the RNA from the 3’ end first has to remove all the A’s before it can hurt the RNA itself.
• At the 3’ end of eukaryotic genes there is a polyadenylation sequence, whose consensus is AAUAAA. When this sequence is transcribed, an enzyme bound to the RNA polymerase recognizes it and cleaves the RNA about 30 bases downstream. Another enzyme, polyadenylate polymerase, adds A’s to the end, using ATP as the source.
• Note that after this, transcription of the gene is complete: the RNA is no longer bound to RNA polymerase or to the DNA.
• RNA polymerase continues transcription after this point.
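
The cleavage and polyadenylation step described above can be sketched in Python as a toy string operation. The function name `cleave_and_polyadenylate`, its parameters, and the example transcript are hypothetical. The defaults of 30 bases and 200 A’s simply reuse the approximate figures quoted in these notes. The sketch matches only the exact consensus AAUAAA and does not model the 5’ cap.

```python
from typing import Optional

POLY_A_SIGNAL = "AAUAAA"  # consensus polyadenylation sequence from the notes


def cleave_and_polyadenylate(
    transcript: str, offset: int = 30, tail_length: int = 200
) -> Optional[str]:
    """Cut `offset` bases downstream of the first AAUAAA and append a poly(A) tail.

    Returns None if the transcript contains no polyadenylation signal.
    If fewer than `offset` bases follow the signal, the whole transcript is kept.
    """
    signal_start = transcript.find(POLY_A_SIGNAL)
    if signal_start == -1:
        return None
    cut_site = signal_start + len(POLY_A_SIGNAL) + offset
    # Everything past the cut site stays with the polymerase and is discarded here.
    return transcript[:cut_site] + "A" * tail_length


# Hypothetical pre-mRNA 3' region: 3 bases, the signal, then 40 more bases.
pre_mrna = "GGC" + "AAUAAA" + "C" * 40
processed = cleave_and_polyadenylate(pre_mrna)

print(len(processed))        # 239 = 3 + 6 + 30 + 200
print(processed.count("C"))  # 31: one C in GGC plus 30 retained downstream bases
```

Returning `None` for a transcript without the signal keeps the "no signal, no cleavage" case explicit instead of silently adding a tail.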

Intron Splicing
• Most eukaryotic genes are interrupted by introns, sequences that do not appear in the final messenger RNA and are not translated into protein.
• Gene sequences that do appear in the messenger RNA are called exons. A gene consists of alternating exons and introns.
• Introns are found in genes for ribosomal RNA and transfer RNA as well as protein-coding genes.
• Introns start with a GU… and end with …AG. They have a few common sequence elements in the middle as well: a polypyrimidine tract (U’s and C’s), with a conserved A just upstream. However, most of the intron sequence is not evolutionarily conserved.
• Introns in protein-coding genes are removed by spliceosomes, which are RNA/protein hybrids.
• Spliceosomes contain 5 RNA molecules, called snRNAs (small nuclear RNA) and individually named U1, U2, U4, U5, and U6. Each snRNA is assembled with some proteins into a snRNP (pronounced “snurp”), a small nuclear ribonucleoprotein. Together, the snRNPs make up the spliceosome.
• The snRNPs bind to different sequence elements of the intron, then join to form the spliceosome. The RNA of one of the snRNPs catalyzes the breaking and joining of the phosphodiester bonds to remove the intron and join the exons together.

Basic Splicing Mechanism
• First, the snRNPs recognize the intron ends.
• Then the conserved A near the 3’ end is joined to the 5’ end of the intron, leaving a free 3’ end on the preceding exon.
• Finally, the 3’ end of the exon is joined to the 5’ end of the following exon.
• This releases the intron in the form of a lariat RNA. The lariat RNA is then degraded by RNase enzymes.

Alternative Splicing
• Many genes contain sequences that are introns (i.e., spliced out) in some cell types, but exons (i.e., not spliced out) in other cell types. This arrangement allows one gene to produce many different variant proteins.
• The number of genes in humans is about 20,000, but the number of different polypeptides in humans is closer to 100,000.
• The different proteins produced by the same gene are called isoforms.
• In addition, some genes use several different transcription initiation sites and polyadenylation sites to generate alternate protein isoforms.
• The process is regulated by a set of proteins that bind to different sequences within the intron.
Figure: various types of alternative splicing.

Why Introns
• There is no general agreement about the origin or purpose of introns.
• In general, the more complex an organism is, the more introns it has.
• Introns are rare in bacteria and archaea, and these domains do not have spliceosomes.
• Self-splicing introns, where the RNA of the intron catalyzes its own removal, are found in all domains of life. All bacterial introns are self-splicing (which implies that they are ribozymes: catalytically active RNAs).
• Intron positions are mostly constant across large evolutionary distances.
• Introns are more frequently found between protein domains (but not always). This makes it easier for a random chromosomal break and rejoin to put different domains into the same gene, possibly creating a novel function: this is called exon shuffling.
• Two main theories: introns-early and introns-late.
• Introns-early theory: the spliceosome is a relic of the (hypothesized) RNA World, and bacteria lack introns because they were lost as a way of making transcription more efficient. Or, spliceosomal introns are derived from prokaryotic self-splicing introns.
• Introns-late theory: introns have been added at various points during evolution. This is certainly true, but the origin of introns might still be very ancient.

RNA Editing
• RNA editing is a fairly rare form of RNA processing, in which nucleotides are added, removed, or altered in the RNA sequence after transcription has occurred.
• RNA editing has been seen in all domains of life, and in many different types of RNA, including protein-coding messenger RNA.
• One simple mechanism: an enzyme deaminates a specific C in a mRNA, converting it to U. In DNA this would be repaired, but U is a legitimate RNA base.
• In apolipoprotein B mRNA, this change converts a CAA (glutamine) codon into a UAA stop codon, which allows translation of a much shorter protein.
• Another deamination, from adenine to inosine, is common in mammals. Inosine acts like guanine in translation and in base pairing.
• Adding or removing nucleotides also occurs. This process requires a guide RNA, which is part of another RNA/protein complex, the editosome.

Transport Out of the Nucleus and Degradation
• There is a lot of RNA in the nucleus that is not needed in the cytoplasm: intron lariats, unspliced (or mis-spliced) RNAs, etc.
• However, after being processed, messenger RNA molecules need to move to the cytoplasm for translation.
• The mRNA molecules must pass through nuclear pores, which are protein complexes that only let well-formed mRNAs out.
• The processing features of mRNA (5’ cap, 3’ poly(A) tail, and spliced out introns) are all marked by specific proteins that bind to these structures. The nuclear pore complex recognizes each of these features, and only releases mRNAs that have all of them.
• Once the mRNA reaches the cytoplasm, ribosomes bind to it and translation occurs.
• Messenger RNA molecules have a finite lifespan, from minutes to days.
• Degradation is mostly the job of the exosome complex, which is a 3’-to-5’ exonuclease (it starts at the 3’ end). The length of the poly(A) tail is critical here. Also, while the mRNA is being translated, other proteins block the exosome’s access to the 3’ end. (Note: there is also an unrelated structure called the exosome, a vesicle involved in secretion.)

Ribosomal RNA Genes
• Eukaryotic ribosomes contain 4 different RNA molecules, which constitute 60% of the total weight of a ribosome.
• The RNAs are named by their sedimentation rate in a centrifuge: 28S, 18S, 5.8S, and 5S.
• Of these, the 28S, 18S, and 5.8S genes are transcribed as a single unit that is then cleaved to produce the individual RNAs.
• The cleavage is performed by other small RNA/protein hybrids: snoRNPs (small nucleolar ribonucleoproteins).
• RNA polymerase I does the transcription of ribosomal RNA genes.
• The ribosomal RNA genes are found in long tandem arrays located on several different chromosomes (5 chromosomes in humans).
• The nucleolus, an organelle within the nucleus, is the site of ribosome production. The nucleolus sits on the ribosomal RNA genes, which cluster together in the nucleus. Sometimes this chromosomal region is called the nucleolus organizer.

RNA Polymerase III
• RNA polymerase III (pol3) transcribes the fourth ribosomal RNA (5S) as well as the transfer RNA (tRNA) genes and several other functional RNAs.
• Pol3 also transcribes the Alu sequences (mobile DNA present in high copy number in primates, derived from the 7SL RNA present in the signal recognition particle).
• Unlike pol1 and pol2, the promoter for many pol3-transcribed genes lies within the transcribed region.
• Some pol3 genes (especially tRNA genes) contain introns that are spliced out by proteins (and not by spliceosomes).
• Transfer RNA molecules are heavily processed, altering many of the bases.
• The CCA sequence at the 3’ end is added enzymatically (i.e., it is not coded in the DNA).
• Altered bases include pseudouridine (Ψ), 7-methylguanine, 5-methylcytosine, dihydrouridine (D), and others.