RNA Transcription
Your goal for today’s lecture is to understand how genetic information is transmitted from the genome to the ribosome. The objectives are to explain the differences between RNA and DNA, the similarities and differences between replication and transcription, the main features of the transcription machinery, and how transcripts are processed into mature mRNAs.
DNA is the information carrier, and having thymine instead of uracil allows DNA repair to correct the mutagenic consequence of deamination of cytosine, as per last week’s lecture!
The overarching tenet of molecular biology is that information, in the form of the order of nucleotides and the order of amino acids, flows from nucleic acid to nucleic acid and from nucleic acid to protein but not back again. This tenet was enunciated by Francis Crick as the “Central Dogma.”
Now that we have discussed the structure of RNA, it is time to discuss one of its most important functions. The central dogma describes the way information is transferred in the cell. The information present in genomes is arranged in the form of a linear code based on the sequence of nucleotides in DNA. We have already talked about one part of the central dogma: how DNA is replicated to create an identical copy of DNA. DNA replication is very important when cells divide, because the cell must create an exact copy with the same DNA. The transfer of information from DNA into other forms is important for the cell’s day-to-day functions. The DNA in genomes does not directly participate in protein synthesis itself, but instead uses RNA as an intermediary molecule. When the cell needs a particular protein, the nucleotide sequence of the appropriate portion of the immensely long DNA molecule in a chromosome is first copied into RNA (transcription). The resulting RNA copies are used as templates to direct the synthesis of the protein (translation). The flow of genetic information in cells is therefore from DNA to RNA to protein.
All cells, from bacteria to humans, express their genetic information in this way. This principle is so fundamental that it is termed the central dogma of molecular biology.
During transcription, the two DNA strands (“Watson” and “Crick”) are temporarily unwound (by an enzyme known as RNA polymerase, as we shall see) to create a transcription bubble. The bubble has two strands known as the template and the non-template strand. RNA is copied from the template strand. The region of strand separation moves down the DNA with continual unwinding and rewinding of the two strands to create a moving bubble. Note that the non-template strand has the same sequence as the RNA transcript (with U in place of T). Note too that the direction of transcription (left to right as shown) is determined by the strand that is being copied, as dictated by the 5’ to 3’ rule and the antiparallel rule. That is, if the lower strand, which has its 3’ end on the left, is being copied, then the RNA must be synthesized from left to right. The product of transcription, the RNA, is extruded from the template. Thus, transcription takes place in a moving bubble, with the template and non-template strands re-annealing as the growing transcript is extruded. Practice drawing a transcription bubble, labeling the 5’ and 3’ ends of the two DNA strands and the growing RNA transcript.
Convince yourself that the direction of transcription indicates which strand is being copied!
Transcription is carried out by the enzyme RNA polymerase, which, like DNA polymerase, is able to catalyze the formation of the phosphodiester bonds that link the nucleotides together to form a linear nucleic acid chain. Its structure resembles a crab claw. The active site is at the base of the opening, and the claws clamp down on the DNA, as we shall see. For the sake of simplicity, we will begin our discussion of the transcription process in bacteria.
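The relationship between the template strand, the non-template strand, and the transcript can be checked with a short Python sketch. The sequence and the function names (`transcribe`, `non_template`) are hypothetical and chosen only for illustration; the template is written 3’ to 5’ so that the RNA reads 5’ to 3’ from left to right.

```python
# Base-pairing rules: RNA uses U (not T) opposite A in the template.
RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (5'->3') copied from a template strand written 3'->5'."""
    return "".join(RNA_COMPLEMENT[base] for base in template_3_to_5)


def non_template(template_3_to_5: str) -> str:
    """Return the non-template DNA strand (5'->3') paired with the template."""
    return "".join(DNA_COMPLEMENT[base] for base in template_3_to_5)


template = "TACGGCATT"            # hypothetical template strand, 3'->5'
rna = transcribe(template)        # AUGCCGUAA
coding = non_template(template)   # ATGCCGTAA

# The transcript matches the non-template strand, with U in place of T.
assert rna == coding.replace("T", "U")
print(rna, coding)
```

Writing the template 3’ to 5’ is a convenience of this sketch; if the template were given 5’ to 3’, it would have to be reversed first to respect the antiparallel rule.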
The initiation of bacterial transcription is the crucial point at which the bacterial cell regulates which proteins are synthesized and at what rate. Bacterial RNA polymerase is able to recognize specific sequences in the DNA that mark the start of the gene that should be transcribed into RNA. Such a sequence is called the promoter. RNA polymerase binds weakly to the bacterial DNA and typically slides along until it encounters a promoter. Bacterial promoters are characterized by two conserved sequences centered at positions -10 and -35 upstream from the start of transcription of the gene. By convention, the nucleotide positions that are upstream from the start site are given negative numbers, while those downstream or 3’ to the start site are given positive numbers. Therefore the first nucleotide position that defines the start of transcription is termed the +1 position. DNA-recognizing proteins bind to a range of sequences that conform to a greater or lesser extent to a particular consensus, a kind of Platonic ideal. Usually any given sequence is not a perfect match to the consensus. In the case of the -10 and -35 sequences, the consensuses are TATAAT (notice that in the figure the -10 is actually TATATT!) and TTGACA, respectively. A strong promoter, at which RNA polymerase initiates efficiently, closely approximates this ideal and a weak promoter less so. Inspection of the -10 and -35 sequences in bacterial promoters reveals that they are asymmetrical in orientation. This asymmetry has important consequences for their arrangement in genomes. Since DNA is double-stranded, two different RNA molecules could in principle be transcribed from any gene, using each of the two DNA strands as a template.
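As a rough illustration of what “closeness to the consensus” means, the sketch below counts how many positions of a promoter’s -35 and -10 elements match TTGACA and TATAAT. Using a simple match count as a stand-in for promoter strength is an assumption made for this example only, and the function names are hypothetical; real promoter strength depends on more than these twelve positions.

```python
CONSENSUS_35 = "TTGACA"  # -35 consensus from the notes
CONSENSUS_10 = "TATAAT"  # -10 consensus from the notes


def matches(element: str, consensus: str) -> int:
    """Count positions at which an element agrees with its consensus."""
    if len(element) != len(consensus):
        raise ValueError("element and consensus must have the same length")
    return sum(a == b for a, b in zip(element.upper(), consensus))


def promoter_score(minus35: str, minus10: str) -> int:
    """Crude score (0-12): total matches to both consensus elements."""
    return matches(minus35, CONSENSUS_35) + matches(minus10, CONSENSUS_10)


# The -10 element shown in the figure, TATATT, differs from TATAAT at one position.
print(matches("TATATT", CONSENSUS_10))      # 5
# Hypothetical promoter: perfect -35 element plus the figure's -10 element.
print(promoter_score("TTGACA", "TATATT"))   # 11
```

Under this toy measure a perfect promoter scores 12, and each departure from the consensus lowers the score by one.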
However, a gene typically has only a single promoter, and because the nucleotide sequences of bacterial (as well as eukaryotic) promoters are asymmetric, the polymerase can bind in only one orientation, in which the -10 element lies toward the direction of transcription. Therefore, the polymerase must transcribe only one DNA strand, since it can synthesize RNA only in the 5’ to 3’ direction. The choice of template strand for each gene is therefore determined by the location and orientation of the promoter. Genome sequences reveal that the DNA strand used as the template for RNA synthesis varies from gene to gene.
Are TTGACA and TATAAT located on the same strand that will serve as a template for transcription or on the non-template strand?
If the -35 and -10 elements are a Platonic ideal, with few if any promoters conforming exactly to the ideal, then how were the two sequences identified? The graph shows the frequency of each of the four bases at each of the positions of the two elements among a large number of promoters. Promoters are rarely a perfect match to the TTGACA and TATAAT sequences. Instead, they are an approximation in which the closer they are to the consensus, the stronger the promoter (all other things being equal). This is a general feature of sequence elements in the genome; they are rarely a perfect match to the Platonic ideal, making it a challenge for bioinformaticians to ferret out binding sites in the DNA. As you can see, the nucleotide shown at each position of TTGACA and TATAAT is the one that occurs most frequently at that position.
Now let’s consider the steps that take place during the process of transcription of a gene. RNA polymerase initially binds to DNA at a promoter site to create a closed complex. A particular subunit of the RNA polymerase called sigma mediates recognition of the -10 and -35 sequences by directly contacting them.
Next, the RNA polymerase unwinds the double helix to expose a short stretch of nucleotides on each strand.
This is known as the open complex.
With the DNA unwound, one of the two exposed DNA strands acts as a template for complementary base-pairing with incoming ribonucleotides, two of which are joined together by the polymerase to begin the synthesis of an RNA chain. This is known as initiation.
The RNA polymerase moves stepwise along the DNA, unwinding the DNA helix just ahead of the active site for polymerization to expose a new region of the template strand for complementary base-pairing. In this way, the growing RNA chain is extended by one nucleotide at a time in the 5’-to-3’ direction, proceeding at a rate of about 50 nucleotides per second. The substrates are nucleoside triphosphates (ATP, CTP, UTP, and GTP), but unlike the situation in DNA replication, transcription pairs the base uracil, rather than thymine, with adenine. The unwound stretch of DNA is usually about 13 bases long and is referred to as a transcription bubble. Once the short stretch of unwound DNA has been transcribed, the DNA double helix rewinds behind the moving RNA polymerase. This phase of transcription is known as elongation.
Finally, the elongating RNA polymerase encounters a second punctuation mark in the DNA, known as a terminator, that triggers the dissociation of the RNA polymerase and the newly synthesized transcript from the DNA, terminating transcription. After the polymerase has been released at a termination sequence, it is free to bind to a new promoter, where it can begin the process of transcription again.
The transcription machinery in eukaryotes is remarkably different from, and more complex than, that in bacteria! Eukaryotic RNA polymerase does not have a sigma factor. Instead, promoters are recognized by proteins that assemble on the DNA and in turn recruit RNA polymerase. The most important of these is the TATA-binding protein, which recognizes a promoter element called the TATA box.
The protein that recognizes the TATA box is called TATA-binding protein.
Interestingly, it binds in the minor groove, in contrast to most sequence-specific DNA-binding proteins, as we discussed in an earlier lecture. In binding to DNA, the TATA-binding protein (TBP) induces a sharp kink in the helix. TATA-binding protein is one of the most distinctive features of the eukaryotic transcription machinery, just as sigma factor is for the prokaryotic machinery.
Although eukaryotic RNA polymerase has many structural similarities to bacterial RNA polymerase, there are several important differences in the way in which the bacterial and eukaryotic enzymes function. For example, bacterial RNA polymerase is able to initiate transcription on a DNA template without the help of additional proteins. In contrast, eukaryotic RNA polymerases require the help of a large set of proteins called general transcription factors, which must assemble at the promoter with the polymerase before the polymerase can begin transcription. The assembly process starts with the binding of a general transcription factor to a short double-helical DNA sequence primarily composed of T and A nucleotides. For this reason, this sequence is known as the TATA sequence, or TATA box. The TATA box is typically located roughly 30 nucleotides upstream from the transcription start site. It is not the only DNA sequence that signals the start of transcription, but for most polymerase promoters, it is the most important.
TBP in turn recruits additional protein factors to the promoter.
This complex of promoter-bound proteins, in turn, recruits RNA polymerase. “Recruits” simply means that, by diffusion, RNA polymerase bumps into the assemblage and is then held there by binding to it.
Finally, yet other factors are recruited that trigger DNA melting, open complex formation, and the initiation of transcription.
This slide summarizes the main points on transcription.
Now we turn to what happens to the transcript before it is ready to be translated by the ribosome.
In bacteria, the production of messenger RNA molecules, which serve as the template for protein synthesis, is relatively simple. The 5’ end of an mRNA molecule is produced by the initiation of transcription by RNA polymerase at a promoter, and the 3’ end is produced by the termination of transcription. Since bacterial genes are composed entirely of contiguous coding sequence, the bacterial protein is translated from the unprocessed RNA transcript (sometimes referred to as the primary transcript). Since bacteria lack a nucleus, transcription and subsequent translation into protein take place in a common compartment. In eukaryotes, transcription takes place in the nucleus and the translation of mRNAs into proteins takes place in the cytoplasm. But before a newly synthesized transcript is ready to be translated, it undergoes three critical maturation events, as we will discuss: it acquires a CAP at the 5’ end, it acquires a poly-A tail at the 3’ end, and sequences in between are spliced out. All of these modification reactions take place in the nucleus and are catalyzed by a variety of enzymes.
The first of these maturation events is acquisition of a CAP. Newly synthesized transcripts acquire a CAP in the nucleus. The CAP is an unusual structure in which a guanine nucleotide is attached at its 5’ end to the 5’ terminus of the transcript via three phosphates! (Another distinctive feature of the CAP that you need not learn is the presence of a methyl group at the 7 position of the guanine, which was removed from the figure for simplicity.) The CAP will become important when we consider the translation of eukaryotic mRNAs.
A second modification takes place at the 3’ end of the transcript. An enzyme called poly-A polymerase adds, one at a time, approximately 200 adenine nucleotides to the 3’ end of the RNA. The nucleotide precursor for these additions is ATP, and 5’-to-3’ phosphodiester bonds are formed as in conventional RNA synthesis.
Unlike the usual RNA polymerases, poly-A polymerase does not require a template; hence the poly-A tail of eukaryotic mRNAs is not directly encoded in the genome. The third and most spectacular modification is splicing. Coding sequences (“exons”) in the primary transcripts of higher cells are frequently interrupted by non-coding sequences known as “introns”, which must be removed by splicing before the RNA is ready to be translated.
The organization of eukaryotic genes is more complex than that of their bacterial counterparts. The majority of eukaryotic genes are made up of sequences that encode protein and thus are expressed (so-called exons) interspersed with intervening sequences (so-called introns) that do not code for protein. In other words, the protein-coding segments of eukaryotic genes (but rarely prokaryotic genes) are interrupted by non-protein-coding introns. Often these introns compose the large majority of the gene. Therefore, in eukaryotic cells the primary RNA transcript (sometimes referred to as the pre-mRNA) contains both coding (exon) and noncoding (intron) sequences. Before the transcripts can be translated into protein, the introns must be spliced out. Like the addition of the CAP and the poly-A tail, splicing takes place in the nucleus before the resulting mature mRNA is transported to the cytoplasm, where translation takes place.
Eukaryotic cells are able to recognize and splice out intron sequences with high fidelity. The process of intron sequence removal involves three positions on the RNA known as the 5’ splice site, the 3’ splice site, and the branch point adenosine in the intron sequence. Each of these three sites has a consensus nucleotide sequence that is similar from intron to intron, providing the cell with cues on where splicing is to take place. The 5’ splice site sits at the boundary between exon 1 and the 5’ end of the intron. Likewise, the 3’ splice site sits at the boundary of exon 2 and the 3’ end of the intron.
Finally, an adenosine internal to the intron, known as the branch point, participates in the splicing process, as we will discuss. Most of the bases important for splicing lie within the intron. The most important (but not important for LS 1a!) are: GU at the 5’ splice site; AG at the 3’ splice site; and the “branch point” A internal to the intron. How does splicing occur? It is catalyzed by a large complex of proteins and RNA molecules known as the spliceosome. The spliceosome catalyzes splicing by two sequential phosphoryl-transfer (transesterification) reactions (meaning simply that one ester linkage is replaced by another). These reactions involve the 2’ hydroxyl of the branch point adenosine (highlighted in red) located within the intron. The 5’ and 3’ positions of the sugar of the branch point are, as you know, esterified to adjacent nucleotides in the polynucleotide backbone, but its 2’ hydroxyl is free to participate in the splicing reaction.
In the first transesterification reaction, the 2’ hydroxyl of the branch point adenosine attacks the phosphorus at the 5’ splice site, the boundary between the 5’ end of the intron and the adjacent exon 1. As a consequence, the intron forms a lariat-like loop and the sugar-phosphate backbone is broken between the intron and exon 1.
Notice that the lariat is a most unusual 2’-5’ branch structure. Practice drawing it!
In the second transesterification reaction, the released free 3’-OH end of exon 1 attacks at the 3’ splice site, the boundary between the 3’ end of the intron and the 5’ end of exon 2. This reaction joins the two exons together, releasing the intron lariat. The two exon sequences thereby become joined into a continuous coding sequence, and the released intron lariat is degraded. Thus, a single splicing event removes one intron by proceeding through two sequential phosphoryl-transfer reactions. This pair of reactions joins two exons while removing the intron as a lariat.
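The net outcome of the two reactions (exons joined, intron released) can be sketched at the sequence level in Python. This is a simplified illustration under stated assumptions, not a model of the spliceosome: the intron coordinates are supplied by hand, the sequences are hypothetical, and the function `splice` checks only for GU at the 5’ splice site, AG at the 3’ splice site, and the presence of at least one internal A that could serve as the branch point.

```python
from typing import Tuple


def splice(pre_mrna: str, intron_start: int, intron_end: int) -> Tuple[str, str]:
    """Remove pre_mrna[intron_start:intron_end]; return (mature mRNA, intron)."""
    intron = pre_mrna[intron_start:intron_end]
    if not intron.startswith("GU"):
        raise ValueError("intron must begin with GU (5' splice site)")
    if not intron.endswith("AG"):
        raise ValueError("intron must end with AG (3' splice site)")
    if "A" not in intron[2:-2]:
        raise ValueError("intron has no internal A to act as the branch point")
    exon1 = pre_mrna[:intron_start]   # freed by the first reaction
    exon2 = pre_mrna[intron_end:]     # joined to exon 1 by the second reaction
    return exon1 + exon2, intron


# Hypothetical pre-mRNA: exon 1, intron (GU...A...AG), exon 2.
exon1, intron, exon2 = "AUGGCC", "GUAAGUCUAACUUUCAG", "UUCUAA"
pre_mrna = exon1 + intron + exon2

mature, removed = splice(pre_mrna, len(exon1), len(exon1) + len(intron))
print(mature)   # AUGGCCUUCUAA
print(removed)  # GUAAGUCUAACUUUCAG
```

In this sketch, shifting the intron coordinates by a single nucleotide makes the GU check fail, which hints at how precisely the splice sites must be chosen.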
It is vitally important that the spliceosome mediate splicing with nucleotide precision. If it did not, the resulting mRNA would have one or more nucleotides added to or deleted from the composite sequence.
You should be able to explain the above from this lecture.
As we discussed earlier, RNA is made up of ribonucleotides (containing adenine, cytosine, guanine, and uracil), while DNA contains deoxyribonucleotides of adenine, cytosine, guanine, and thymine. Since four nucleotides, taken individually, could represent only 4 of the 20 possible amino acids in coding the linear arrangement in proteins, a group of nucleotides is required to represent each amino acid. The code employed must be capable of specifying at least 20 amino acids. If two nucleotides were used to code for one amino acid, then only 16 (or 4²) different code units could be formed, and there would not be sufficient unique codes to account for 20 amino acids. However, if a group of three nucleotides is used for each amino acid, then 64 (or 4³) code units are available for use. Therefore, any code using groups of three or more nucleotides will have more than enough units to encode 20 amino acids, and many such arrangements are mathematically possible.
The genetic code is a triplet code, with every three nucleotides being decoded from a specified starting point in the mRNA and in the 5’ to 3’ direction. Each triplet is called a codon. Since there are 61 codons for 20 amino acids, it follows that many amino acids are specified by more than one codon. Indeed, only two (methionine and tryptophan) have a single codon; at the other extreme, leucine, serine, and arginine are each specified by six different codons. The different codons for a given amino acid are said to be synonymous. The code itself can be termed degenerate since it contains redundancies.
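The counting argument and the degeneracy figures above can be verified with a few lines of Python. The sketch assumes the standard genetic code, which is not reproduced in these notes; it is entered here as a 64-letter string of one-letter amino acid codes in the conventional T, C, A, G ordering, with `*` marking stop codons. The variable names are illustrative only.

```python
from collections import Counter
from itertools import product

# Number of distinct code units for words of 1, 2, and 3 nucleotides.
for n in (1, 2, 3):
    print(n, 4 ** n)   # 4, 16, 64

# Standard genetic code (an assumption of this sketch), ordered TTT, TTC, TTA, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each RNA codon (U in place of T) to its amino acid or stop signal.
CODON_TABLE = {
    "".join(codon).replace("T", "U"): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Count codons per amino acid, ignoring the stop codons.
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(CODON_TABLE))           # 64 codons in total
print(sum(degeneracy.values()))   # 61 codons specify amino acids
print(len(degeneracy))            # 20 amino acids
print(sorted(aa for aa, k in degeneracy.items() if k == 1))  # ['M', 'W']
print(sorted(aa for aa, k in degeneracy.items() if k == 6))  # ['L', 'R', 'S']
```

M and W are the one-letter codes for methionine and tryptophan, and L, R, and S stand for leucine, arginine, and serine, in agreement with the text.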
The genetic code is a triplet code, with every three nucleotides being decoded from a specified starting point in the mRNA and exclusively in the 5’ to 3’ direction. Each triplet is called a codon. Of the 64 possible codons in the genetic code, 61 specify individual amino acids.
Just as proteins have primary (sequence), secondary (local folding), and tertiary (higher-order, three-dimensional folding) structure, so too do RNA molecules. tRNA molecules exhibit a characteristic secondary structure that resembles a cloverleaf (upside down in the above) with three regions of double-stranded base pairing. The 3’ and 5’ ends of the RNA pair with each other in the stem of the cloverleaf, with the protruding 3’ terminus being the site of amino acid attachment. The “anticodon”, which pairs with the codon in mRNA, is in a loop at the opposite end of the tRNA.
Shown is the three-dimensional (tertiary) structure of a tRNA molecule, which has an L-like configuration (upside down in the above). As in the cloverleaf, the amino acid attachment site and the anticodon are located at opposite ends of the molecule. On the ribosome, the anticodon pairs with the codon in mRNA while the 3’ terminus protrudes into the catalytic center where peptide bond formation takes place, as we shall see.
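Codon-anticodon pairing can be illustrated with a minimal Python sketch. It assumes strict Watson-Crick pairing between antiparallel strands, so the anticodon written 5’ to 3’ is the reverse complement of the codon; relaxed pairing at the third codon position and modified tRNA bases are deliberately ignored. The function name `anticodon` is hypothetical.

```python
# Watson-Crick pairs in RNA.
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str) -> str:
    """Return the anticodon (5'->3') that pairs with an mRNA codon (5'->3')."""
    # The strands are antiparallel, so read the codon backwards while complementing.
    return "".join(RNA_PAIR[base] for base in reversed(codon))


print(anticodon("AUG"))  # CAU
print(anticodon("UGG"))  # CCA
```

Applying the function twice returns the original codon, as expected for a reverse complement.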