Chapter 29 Transcription and the Regulation of Gene Expression
Reginald H. Garrett & Charles M. Grisham • University of Virginia
www.cengage.com/chemistry/garrett

Essential Question
• How are the genes of prokaryotes and eukaryotes transcribed to form RNA products that can be translated into proteins?

All Cells Contain Three Major Classes of RNA: mRNA, rRNA, and tRNA
• All three forms participate in protein synthesis
• All RNAs are synthesized from DNA templates by DNA-dependent RNA polymerases
• This process is called transcription
• Only mRNAs direct the synthesis of proteins
• Transcription is tightly regulated in all cells
• Only 3% of genes in a typical eukaryotic cell are undergoing transcription at any given moment
• The metabolic conditions and growth status of the cell dictate which gene products are needed at any moment

29.1 How Are Genes Transcribed in Prokaryotes?
• In prokaryotes, virtually all RNA is synthesized by a single species of DNA-dependent RNA polymerase
• RNA polymerases link NTPs (ATP, GTP, CTP, and UTP) in the order specified by base pairing with a DNA template
• The polymerase moves along the DNA template strand in the 3'-5' direction
• Thus, the RNA chain grows 5'-3' during transcription
• Subsequent hydrolysis of PPi to inorganic phosphate by pyrophosphatases makes the polymerase reaction thermodynamically favorable

Sigma Subunits of Prokaryotic RNA Polymerases Identify Transcription Start Sites
• Transcription is initiated in prokaryotes by RNA polymerase holoenzyme, with the subunit composition α2ββ'σ
• The core polymerase is α2ββ' (see Figure 29.1)
• Binding of the σ subunit allows the polymerase to recognize different DNA sequences that act as promoters
• Promoters are nucleotide sequences that identify the location of transcription start sites, where transcription begins
• Without σ bound, the core polymerase can transcribe DNA into RNA, but cannot initiate transcription

Conventions Used in Expressing the Sequences of Nucleic Acids and Proteins
• Certain conventions are used in describing information transfer from DNA to protein:
• The strand of duplex DNA that is read by RNA polymerase is termed the template strand
• The strand not read is the nontemplate strand
• The template is read by the RNA polymerase moving 3'-5' along the template, so the RNA product, the transcript, grows in the 5'-3' direction
• By convention, when the order of nucleotides in DNA is shown as a single strand, it is the 5'-3' sequence of nucleotides in the nontemplate strand that is shown

Structure of the Core RNA Polymerase from Thermus thermophilus
The template DNA strand is green, the nontemplate strand is blue, and the RNA transcript is hot pink. The 2 α chains are orange, the β chain is cyan, the β' chain is yellow.

Bacteriophage T7 Expresses a Simpler, Monomeric RNA Polymerase
Figure 29.2 Bacteriophage T7 RNA polymerase in the act of transcription. The DNA is shown entering the enzyme from the upper right. The template is green, the nontemplate is blue, and the RNA transcript is hot pink.

The Process of Transcription Has Four Stages
• Transcription can be divided into four stages:
1) Binding of RNA polymerase holoenzyme to template DNA at promoter sites
2) Initiation of polymerization
3) Chain elongation
4) Chain termination

Binding of Polymerase to Template DNA
Figure 29.3 Sequence of events in the initiation and elongation phases of transcription as it occurs in prokaryotes. Nucleotides in this region are numbered with reference to the base at the transcription start site, which is designated +1.
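The strand conventions listed earlier can be checked with a small sketch. It assumes a made-up nine-base nontemplate sequence (not taken from any real gene) and the hypothetical helper names `template_strand` and `transcribe`. The point it illustrates is that the transcript, built 5'-3' by pairing against the template read 3'-5', has the same sequence as the nontemplate strand with U in place of T.

```python
# Base-pairing partners used to build each product strand.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_strand(nontemplate_5to3: str) -> str:
    """Template strand written 3'->5', aligned under the 5'->3' nontemplate strand."""
    return "".join(DNA_PAIR[base] for base in nontemplate_5to3)


def transcribe(template_3to5: str) -> str:
    """Read the template 3'->5' and return the RNA transcript written 5'->3'."""
    return "".join(RNA_PAIR[base] for base in template_3to5)


# Hypothetical nontemplate sequence, written 5'->3' as the convention requires.
nontemplate = "ATGGCATTC"
template = template_strand(nontemplate)
rna = transcribe(template)

print("nontemplate 5'-", nontemplate, "-3'")
print("template    3'-", template, "-5'")
print("transcript  5'-", rna, "-3'")
```

Because the transcript matches the nontemplate strand apart from U replacing T, showing only the nontemplate strand is enough to read off the RNA sequence.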
• Polymerase binds nonspecifically to DNA with low affinity and migrates along it, looking for a promoter
• The sigma subunit recognizes the promoter sequence
• RNA polymerase holoenzyme and promoter form a closed promoter complex (in which the DNA is not unwound)
• Polymerase then unwinds about 12 base pairs to form an "open promoter complex"
• RNA polymerase binding protects a nucleotide sequence spanning the region from -70 to +20, where +1 is defined as the transcription start site

Properties of Prokaryotic Promoters
• Promoters recognized by the σ factor typically consist of a 40 bp region on the 5'-side of the transcription start site
• Within the promoter are two consensus sequence elements:
• The Pribnow box near -10, with consensus TATAAT. Why is this region ideal for unwinding? (It is rich in As and Ts, which form only two H bonds per base pair)
• The -35 region, with consensus TTGACA, where the σ subunit binds. The more closely the -35 region sequence corresponds to the consensus sequence, the better the σ subunit binds, and the greater the efficiency of gene transcription

The Nucleotide Sequences of Representative E. coli Promoters
Figure 29.4 Consensus sequences for the -35 region, the Pribnow box, and the initiation site are shown at the bottom. The numbers represent the percent occurrence of the indicated base. In this figure, sequences are aligned relative to the Pribnow box.
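The idea that a promoter is stronger the more closely it matches the consensus can be illustrated with a simple match count. This is only a sketch: the DNA string below is hypothetical, the helper names are made up, and counting identical positions is a crude stand-in for real σ-factor binding affinity.

```python
PRIBNOW_BOX = "TATAAT"   # consensus near -10, from the slide
MINUS_35 = "TTGACA"      # consensus near -35, from the slide


def count_matches(site: str, consensus: str) -> int:
    """Number of positions at which a candidate site agrees with the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(a == b for a, b in zip(site.upper(), consensus))


def best_match(sequence: str, consensus: str) -> tuple:
    """Return (start index, matches) of the best-matching window; leftmost wins ties."""
    width = len(consensus)
    best = (0, -1)
    for start in range(len(sequence) - width + 1):
        score = count_matches(sequence[start:start + width], consensus)
        if score > best[1]:
            best = (start, score)
    return best


# Hypothetical nontemplate-strand sequence containing both consensus elements.
promoter = "GG" + MINUS_35 + "GCTAGCTAGCTAGCTAG" + PRIBNOW_BOX + "GCA"

print("-35 element:", best_match(promoter, MINUS_35))
print("-10 element:", best_match(promoter, PRIBNOW_BOX))
print("weak -10 candidate TACGAT:", count_matches("TACGAT", PRIBNOW_BOX), "of 6")
```

A site such as the made-up TACGAT, which agrees with TATAAT at only four of six positions, would be expected to support less efficient initiation than a perfect match.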
DNA Footprinting: Identifying the Nucleotide Sequence in DNA Where a Protein Binds
• DNA footprinting is a widely used technique to identify the nucleotide sequence within DNA where a specific protein binds (such as a promoter sequence bound to RNA polymerase holoenzyme)
• The protein is incubated with a labeled DNA fragment containing the sequence where the protein is thought to bind
• Digestion with DNase cleaves the DNA backbone in exposed regions, but not where the DNA-binding protein is bound
• Analysis of the DNase digests reveals the location of the protein-binding site on the DNA

Initiation of Polymerization
• RNA polymerase has two binding sites for NTPs
• The initiation site prefers to bind ATP and GTP (most RNAs begin with a purine at the 5'-end)
• The elongation site binds the second incoming NTP
• The 3'-OH of the first nucleotide bound attacks the α-P of the second to form a new phosphoester bond (eliminating PPi)
• When a 6-10 unit oligonucleotide has been made, the σ subunit dissociates, signaling the completion of "initiation"

Chain Elongation
• The core polymerase (without σ) is the elongation enzyme
• RNA polymerase is accurate: only about 1 error in 10,000 bases
• Even this error rate is tolerable, since many transcripts are made from each gene
• Elongation rate is 20-50 bases per second: slower in G/C-rich regions (why?*) and faster elsewhere
• Topoisomerases precede and follow polymerase to relieve supercoiling
*G-C base pairs share 3 H bonds, whereas A-T base pairs, with 2 H bonds, are less stable

Supercoiling Versus Transcription
(a) If the RNA polymerase followed the template strand around the axis of the DNA duplex, no supercoiling of the DNA would occur but the RNA chain would be wrapped around the double helix once every 10 bp. This possibility seems unlikely because it would be difficult to untangle the transcript from the DNA duplex.
(b) Instead, gyrases and topoisomerases act to remove the torsional stresses induced by transcription.

Chain Termination
• Two types of transcription termination mechanisms operate in bacteria. One depends on the rho termination factor
• rho is an ATP-dependent helicase
• It moves along the RNA transcript, finds the "transcription bubble", unwinds the DNA:RNA hybrid, and releases the RNA chain
• It is likely that the RNA polymerase stalls in a G:C-rich termination region, allowing rho factor to overtake it
Figure 29.7 Transcription termination by rho factor.

Intrinsic Termination
• The second termination mechanism is termed intrinsic termination
• Here termination is determined by specific sequences in the DNA, called termination sites
• Termination sites consist of 3 structural features:
• Inverted repeats, rich in G:C, which form a stable stem-loop structure in the RNA transcript
• A nonrepeating segment that punctuates the inverted repeats
• A run of 6-8 As in the DNA template, coding for Us in the transcript
Figure 29.6 The intrinsic termination site for the E. coli trp operon. The inverted repeats give rise to a stem-loop, or "hairpin," structure ending in a series of U residues.

29.2 How Is Transcription Regulated in Prokaryotes?
• Genes for the enzymes of a pathway are grouped in clusters on the chromosome, called operons
• This allows coordinated expression through transcription into a single polycistronic mRNA
• Regulatory sequences adjacent to such a unit determine whether it is transcribed; these regulatory sequences are the promoter and the operator
• Regulatory proteins work with operators to control transcription of the genes

The General Organization of Operons
Figure 29.8 Operons consist of transcriptional control regions and a set of related structural genes, all organized in a contiguous linear array along the chromosome. The transcriptional control regions are the promoter and the operator, which lie next to, or overlap, each other, upstream from the structural genes they control.
Operators may lie at various positions relative to the promoter, either upstream or downstream. Expression of the operon is determined by access of RNA polymerase to the promoter, and occupancy of the operator by regulatory proteins influences this access. Induction activates transcription from the promoter; repression prevents it.

Lactose is an Inducer of the lac Operon
Figure 29.9 The structure of lactose, a β-galactoside. Metabolism of lactose depends on hydrolysis into its component sugars, glucose and galactose, by the enzyme β-galactosidase. Lactose availability induces the synthesis of this enzyme by activating transcription of the lac operon.

Transcription of Operons is Controlled by Induction and Repression
• Increased synthesis of enzymes in response to the presence of a metabolite is induction
• Decreased synthesis in response to a metabolite is repression
• Some substrate analogs induce enzyme synthesis even though the enzymes cannot metabolize them; these are gratuitous inducers, such as IPTG (isopropyl β-thiogalactoside)

IPTG is a Gratuitous Inducer
Figure 29.10 The structure of IPTG (isopropyl β-thiogalactoside), a gratuitous inducer.

The lac Operon Serves as a Paradigm of Operons
• lacI mutants express the genes needed for lactose metabolism whether or not lactose is present
• The structural genes of the lac operon are controlled by negative regulation
• The lacI gene product is the lac repressor, a tetrameric protein
• The lac operator is a palindromic DNA segment
• The lac repressor has a DNA-binding domain at its N-terminus; the C-terminal domain binds inducer and forms the tetramer

The lac Operon
Figure 29.11 The operon consists of two transcription units. In one unit, there are three structural genes, lacZ, lacY, and lacA, under control of the promoter, plac, and the operator O. In the other unit, there is a regulator gene, lacI, with its own promoter, placI.

The Mode of Action of lac Repressor
Figure 29.12 The structure of the lac repressor tetramer with bound IPTG (purple) is also shown.
The Nucleotide Sequence of the lac Operator
Figure 29.13 This sequence comprises 36 bp showing nearly palindromic symmetry. The inverted repeats that constitute this approximate twofold symmetry are shaded in rose. The bases are numbered relative to the +1 start site for transcription. The G:C base pair at position +11 represents the axis of symmetry. In vitro studies show that bound lac repressor protects a 26-bp region from -5 to +21 against nuclease digestion. Bases that interact with bound lac repressor are indicated below the operator.

Lac Repressor Is a Negative Regulator of the lac Operon

Catabolite Activator Protein Provides Positive Control of the lac Operon
• Some promoters require an accessory protein to speed transcription
• Catabolite activator protein, or CAP, is one such protein
• CAP is a dimer of 22.5-kD polypeptides
• The N-terminus binds cAMP; the C-terminus binds DNA
• Binding of CAP-(cAMP)2 to DNA assists formation of the closed promoter complex
• Catabolite repression ensures that the operons necessary for metabolism of alternative energy sources (the lac and gal operons) remain repressed until the supply of glucose is exhausted

The Mechanism of Catabolite Repression and CAP Action
Figure 29.14 The mechanism of catabolite repression and CAP action. Glucose instigates catabolite repression by lowering cAMP levels. cAMP is necessary for CAP binding near promoters of operons whose gene products are involved in the metabolism of alternative energy sources such as lactose, galactose, and arabinose. The binding sites for the CAP-(cAMP)2 complex are consensus DNA sequences containing the conserved pentamer TGTGA and a less well-conserved inverted repeat, TCANA (where N is any nucleotide).

Binding of CAP-(cAMP)2 Induces a Severe Bend in DNA
Figure 29.15 The CAP dimer with two molecules of cAMP bound interacts with 27 to 30 base pairs of duplex DNA.
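The combined effect of the lac repressor (negative control) and CAP-(cAMP)2 (positive control) can be summarized as a small truth table. The sketch below is a deliberately qualitative simplification: the function name and the three output labels are assumptions made for illustration, and the "low" state stands for the weak transcription expected when the repressor is released but CAP is not bound.

```python
def lac_operon_state(lactose_present: bool, glucose_present: bool) -> str:
    """Qualitative transcription level of the lac structural genes.

    Assumed simplifications: an inducer is available only when lactose is
    present, and cAMP is high enough for CAP binding only when glucose is absent.
    """
    repressor_on_operator = not lactose_present   # inducer releases the repressor
    cap_camp_bound = not glucose_present          # glucose lowers cAMP levels

    if repressor_on_operator:
        return "off"
    return "high" if cap_camp_bound else "low"


for lactose in (False, True):
    for glucose in (False, True):
        state = lac_operon_state(lactose, glucose)
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> {state}")
```

Only the combination of lactose present and glucose absent gives strong expression, which matches the description of catabolite repression: the operon stays effectively shut down until glucose is exhausted.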
Negative and Positive Control Systems are Fundamentally Different
• Negative and positive control systems operate in fundamentally different ways
• Genes under negative control are transcribed unless they are turned off by the presence of a repressor protein
• Often, transcription activation is merely the release from negative control
• In contrast, genes under positive control are expressed only if an active regulator protein is present
Figure 29.16 Control circuits governing the expression of genes.

The araBAD Operon is Both Positively and Negatively Controlled by AraC
• E. coli can use the plant pentose L-arabinose as sole source of carbon and energy
• Arabinose is metabolized by three enzymes encoded in the araBAD operon
• Transcription of this operon is regulated by both catabolite repression and arabinose-mediated induction
• araBAD is regulated both positively and negatively by AraC
• Positive control of araBAD occurs in the presence of L-arabinose and cAMP
Figure 29.17 Regulation of the araBAD operon by the combined action of CAP and AraC protein.

The trp Operon is Regulated Through a Co-Repressor-Mediated Negative Control Circuit
• The trp operon encodes a leader sequence and 5 proteins (trpE through trpA) that synthesize tryptophan
• Trp repressor controls the operon
• Trp repressor binding excludes RNA polymerase from the promoter
• Trp repressor also regulates the trpR and aroH operons and is itself encoded by the trpR operon. This is autogenous regulation (autoregulation).
Figure 29.18 The trp operon of E. coli.
Attenuation is a Prokaryotic Mechanism for Post-Transcriptional Regulation of Expression
• In addition to repression, expression of the trp operon is controlled by transcription attenuation
• Unlike the mechanisms discussed thus far, attenuation regulates transcription after it has begun
• Attenuation is any regulatory mechanism that manipulates transcription termination or transcription pausing to regulate gene transcription downstream
• In prokaryotes, transcription and translation are coupled, and the translating ribosome is affected by the formation and persistence of secondary structure in the mRNA
• In many operons encoding enzymes of amino acid biosynthesis, a transcribed leader region lies between the promoter and the first major structural gene
• These regions encode a short leader peptide containing multiple codons for the pertinent amino acid
• For example, the leader peptide of the leu operon has four leucine codons, the trp operon has two tandem tryptophan codons, and so on (Fig. 29.19)
• Translation of these codons depends on availability of the amino acid

Sequences of Leader Peptides in Various Amino Acid Biosynthetic Operons
Figure 29.19 Amino acid sequences of leader peptides in various amino acid biosynthetic operons regulated by attenuation. Color indicates amino acids synthesized in the pathway catalyzed by the operon's gene products. (The ilv operon encodes enzymes of isoleucine, leucine, and valine biosynthesis.)
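The statement that leader peptides carry clustered codons for the regulating amino acid can be checked by scanning a peptide for its longest run of one residue. The sketch below uses the 14-residue trp leader peptide sequence as it is commonly reported (MKAIFVLKGWWRTS); that sequence is not given in these slides, so treat it as an assumption, and the helper name `longest_run` is hypothetical.

```python
def longest_run(peptide: str, residue: str) -> int:
    """Length of the longest stretch of consecutive `residue` in a one-letter peptide."""
    best = run = 0
    for amino_acid in peptide:
        run = run + 1 if amino_acid == residue else 0
        best = max(best, run)
    return best


# Assumed sequence of the trp leader peptide (one-letter code); not from the slides.
TRP_LEADER = "MKAIFVLKGWWRTS"

print("Trp residues in leader:", TRP_LEADER.count("W"))
print("Longest tandem Trp run:", longest_run(TRP_LEADER, "W"))
```

Under that assumption the scan finds two adjacent Trp residues, consistent with the two tandem tryptophan codons mentioned above; a ribosome translating this stretch would stall there when charged tRNA-Trp is scarce.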
• When tryptophan is scarce, the entire trp operon is transcribed to give a polycistronic mRNA
• But as [Trp] increases, more and more of the trp transcripts consist of only a 140-nucleotide fragment corresponding to the 5'-end of trpL
• Tryptophan availability is causing premature termination of trp transcripts
• This is transcription attenuation
• The secondary structure of the 160 bp leader region transcript is the principal control element in transcription attenuation (Figure 29.20)

The Secondary Structure of the Leader Transcript is the Control Element in Transcription Attenuation
Figure 29.20 Alternative secondary structures for the leader region of the trp operon transcript.
Figure 29.21 The mechanism of attenuation in the trp operon.

DNA:Protein and Protein:Protein Interactions are Essential to Transcription Regulation
• DNA:protein interactions are a central feature in transcriptional control
• The DNA sites where regulatory proteins bind commonly display at least partial dyad symmetry or inverted repeats
• DNA-binding proteins themselves are generally even-numbered oligomers (dimers, tetramers, etc.) that have innate twofold rotational symmetry
• Protein:protein interactions are an essential component of transcriptional activation
• Proteins that activate transcription work through protein:protein contacts with RNA polymerase

DNA Looping Allows Multiple DNA-Binding Proteins to Interact With One Another
• Because transcription must respond to a variety of regulatory signals, multiple proteins are essential for appropriate regulation of gene expression
• These regulatory proteins are the sensors of cellular circumstances
• They communicate this information to the genome by binding at specific nucleotide sequences
• But DNA is a one-dimensional polymer, with limited space for proteins to bind
• DNA looping permits additional proteins to convene at the initiation site and to exert their influence on creating and activating the initiation complex
Figure 29.22 Formation of a DNA loop delivers DNA-bound transcriptional activator to RNA polymerase positioned at the promoter. Protein:protein interactions between the transcriptional activator and RNA polymerase activate transcription.

29.3 How Are Genes Transcribed in Eukaryotes?
• Three classes of RNA polymerases (I, II and III) transcribe rRNA, mRNA and tRNA genes, respectively
• Pol III transcribes a few other RNAs as well
• All 3 are big, multimeric proteins (500-700 kD)
• All have 2 large subunits with sequences similar to β and β' in E. coli RNA polymerase, so the catalytic site is evolutionarily conserved
• Pol II is most sensitive to α-amanitin, an octapeptide from Amanita phalloides ("destroying angel mushroom")
• Pol III is less so, and Pol I is insensitive

Sensitivity to α-Amanitin Distinguishes the Three Classes of RNA Polymerase
Figure 29.23 The structure of α-amanitin, one of a series of toxic compounds known as amatoxins that are found in the mushroom Amanita phalloides.
• With three categories of polymerases acting on three sets of genes, there are also at least three categories of promoters that maintain specificity
• Eukaryotic promoters are very different from prokaryotic promoters
• All three eukaryotic RNA polymerases interact with their promoters via transcription factors
• Transcription factors are DNA-binding proteins that recognize and accurately initiate transcription at specific promoter sequences

RNA Polymerase II Transcribes Protein-Coding Genes
• RNA Pol II must be capable of transcribing a great diversity of genes, but must also function at any moment only on the genes whose products are appropriate to the needs of the cell
• The RNA Pol II enzymes from yeast and humans are homologous
• The structure of RNA Pol II from yeast is known (Figure 29.24) and consists of 12 polypeptides
• RNA polymerases adopt a claw-like structure, to grasp the DNA duplex
Figure 29.24 Structure of RNA Pol II. Template DNA is green, nontemplate DNA is blue, RNA transcript is pink, emerging from the bottom of the structure.
• Yeast Pol II consists of 12 different polypeptides (RPB1 through RPB12)
• RPB1 and RPB2 are homologous to E. coli RNA polymerase β′ and β
• RPB1 has the DNA-binding site; RPB2 binds NTP
• RPB1 has a C-terminal domain (CTD) consisting of multiple YSPTSPS repeats
• 5 of the 7 residues in the heptad repeat have an -OH group, which is both hydrophilic and a potential phosphorylation site
• The CTD of RPB1 is essential; this domain projects away from the globular portion of the enzyme
• Only RNA Pol II whose CTD is NOT phosphorylated can initiate transcription
• The TATA box (TATAAA) is a consensus promoter element
• Several general transcription factors are required
• See TBP bound to TATA (Fig. 29.28)

The Regulation of Gene Expression is More Complex in Eukaryotes
• Pol II promoters consist of two separate sequence features:
• The core element near the start site, where general transcription factors bind, and
• More distantly located regulatory elements (known as enhancers and silencers)
• Promoters of protein-coding genes typically contain modules of short conserved sequences
• In addition to promoters, eukaryotic genes have enhancers, also known as upstream activation sequences, which may lie far from the promoter
• DNA looping allows multiple proteins bound to different DNA sequences to convene

The Site of Transcription Initiation Includes an Initiator (Inr) and a TATA Box
Figure 29.25 The Inr and TATA box in selected eukaryotic genes. The consensus sequence of a number of such promoters is presented in the lower part of the figure, the numbers giving the percent occurrence of various bases at the positions indicated.

Promoter Regions of Several Representative Eukaryotic Genes
Figure 29.26

Response Elements are Promoter Modules Responsive to Common Regulation
• Promoter modules in genes responsive to common regulation are termed response elements
• Examples include
• the heat shock element (HSE)
• the glucocorticoid response element (GRE), and
• the metal response element (MRE)
• Many genes are subject to multiple regulatory influences
• Regulation of such genes is achieved through the presence of an array of different regulatory elements
• The metallothionein gene is a good example (Figure 29.27)
Figure 29.27 The metallothionein gene possesses several constitutive elements in its promoter (the TATA and GC boxes) as well as specific response elements such as MREs and a GRE.
The BLEs are elements involved in basal level expression (constitutive expression). TRE is a tumor response element activated in the presence of tumor-promoting phorbol esters such as TPA (tetradecanoyl phorbol acetate).

Transcription Initiation by RNA Polymerase II Requires TBP and the GTFs
• The eukaryotic transcription initiation complex consists of:
• RNA polymerase II
• Five general transcription factors (GTFs)
• A complex called Mediator (Srb/Med)
• The CTD of Pol II anchors Mediator
• Mediator allows Pol II to communicate with transcriptional activators bound at sites distant from the promoter
Figure 29.28 Transcription initiation. (a) Model of the TATA-binding protein (TBP, gold) in complex with a DNA TATA sequence. (b) RNA Pol II:Mediator:TFIIF complex.

The Role of Mediator in Transcription Activation and Repression
• Transcription activation requires Mediator
• Mediator is a bridge between gene-specific transcription co-activators bound to enhancers and the RNA polymerase II/GTF transcription machinery bound at the promoter
• Once DNA is accessible (through chromatin remodeling), a transcription co-activator binds to an enhancer and recruits Mediator to the gene
• Mediator promotes the binding of GTFs and RNA polymerase II at the promoter
• Mediator is 1 million daltons in mass, with a core composed of about 20 distinct subunits in yeast and 30 subunits in humans
Figure 29.29 Simple models of Mediator in the regulation of eukaryotic gene transcription. (a) Mediator as a transcription activator. Mediator regions are highlighted in color: green for the tail, yellow for the middle, and red for the head. RNA polymerase II and the GTFs are blue. The transcription co-activator is orange. DNA is shown as a black line. (b) Mediator as a repressor.
Chromatin-Remodeling Complexes Alleviate Repression Due to Nucleosomes
• Chromatin-remodeling complexes are enormous (MW = 1 megadalton)
• These assemblies serve to loosen the DNA:protein interactions in nucleosomes by sliding, ejecting, inserting, or otherwise restructuring core octamers
• Chromatin-remodeling complexes are nucleic-acid-stimulated multisubunit ATPases
• The central structural unit of nucleosomes, the histone "core octamer", is constructed from the eight histone-fold protein domains of the eight histone monomers that make up the octamer
• Interactions between histone tails contributed by core histones in adjacent nucleosomes are an important influence in establishing higher orders of chromatin organization
• Activation of eukaryotic transcription depends on:
• Relief from repression imposed by chromatin structure
• Interaction of RNA polymerase II with promoter and transcription regulatory proteins
• Two sets of factors are important to eukaryotic transcription:
• Chromatin-remodeling complexes that mediate ATP-dependent conformational changes in nucleosome structure
• Histone-modifying enzymes that introduce covalent modifications into the N-terminal tails of the histone core octamer
• Chromatin remodeling and histone modification are closely linked processes

Covalent Modification of Histones Forms the Basis of the Histone Code
• Chromatin is remodeled through the actions of enzymes that covalently modify side chains on histones within the core octamer
• Initial events in transcriptional activation include acetyl-CoA-dependent acetylation of ε-amino groups of lysine residues in histone tails by histone acetyltransferases (HATs)
• Phosphorylation of Ser residues and methylation of Lys residues in histone tails also contribute to transcription regulation
• Attachment of small proteins to histone C-terminal Lys residues through ubiquitination and sumoylation are two other forms of covalent modification
• A code based on histone-tail covalent modifications determines gene expression through selective recruitment of proteins
• Proteins that cause chromatin compaction (heterochromatin formation) lead to repression
• Proteins giving easier access to DNA through relaxation of histone:DNA interactions favor the possibility of gene expression
• Prominent forms of histone covalent modification are lysine acetylation, lysine methylation, serine phosphorylation, lysine ubiquitination, and lysine sumoylation

Methylation and Phosphorylation Act as a Binary Switch in the Histone Code
• As cells enter mitosis, the chromatin becomes condensed and histone H3 is not only methylated at K9 but also phosphorylated at the adjacent S10
• S10 phosphorylation triggers dissociation of HP1 from the heterochromatin
• Thus phosphorylation next to K9 trumps HP1 binding
• Similarly, phosphorylation of Thr (T3) neighboring K4 in the histone H3 tail evicts CHD1 from its site on the methylated K4.
• Lysine methylation is the "on" position for the binary switch that recruits proteins to the histone tail, and phosphorylation at a neighboring residue turns the switch to "off" by ejecting the bound proteins

Nucleosome Alteration and Interaction of RNA Polymerase II are Essential
• Gene activation (initiation of transcription) requires two principal steps:
(1) Alterations in nucleosomes (and thus chromatin) that relieve the general repressed state imposed by chromatin structure, followed by
(2) The interaction of RNA polymerase II and the GTFs with the promoter
• Transcription activators initiate the process by recruiting chromatin-altering proteins (the chromatin-remodeling complexes and histone-modifying enzymes)
• Once these alterations have occurred, promoter DNA is accessible to TBP:TFIID, other GTFs, and RNA Pol II

Figure 29.30 Diagram of the nucleosome.
• The figure is a schematic diagram of the nucleosome, illustrating the various covalent modifications on the N-terminal tails of histones:
• AcK = acetylated lysine residue
• meK = methylated lysine residue
• meR = methylated arginine residue
• PS = phosphorylated serine residue
• The numbers indicate the positions of the amino acids in the amino acid sequences. Note the prevalence of modifiable sites, particularly acetylatable lysine, on the N-terminal tails of histones H2B, H3, and H4.

A Model for the Transcriptional Regulation of Eukaryotic Genes
Figure 29.31 The DNA is a green ribbon wrapped around disclike nucleosomes. A specific transcription factor (TF, pink) is bound to a regulatory element (either an enhancer or silencer). RNA polymerase II and its associated GTF (blue) are bound at the promoter. The N-terminal tails of histones are shown as wavy lines (blue) emanating from the nucleosome discs.
A specific transcription factor that is a transcription activator stimulates transcription through interactions with a co-activator whose HAT activity renders DNA more accessible.

29.4 How Do Gene Regulatory Proteins Recognize Specific DNA Sequences?
• Proteins that recognize nucleic acids do so by the basic rule of macromolecular recognition: they present a three-dimensional shape that is structurally and chemically complementary to the surface of a DNA sequence
• Protein contacts with the bases of DNA usually (but not always) occur within the major groove of the DNA
• Protein contacts with the DNA backbone involve H bonding and salt bridges with electronegative oxygen atoms of the phosphodiester linkages
• 80% of DNA-binding proteins belong to one of three principal classes based on their structures:
• The helix-turn-helix (HTH) motif
• The zinc-finger (or Zn-finger) motif
• The leucine zipper-basic region (or bZIP) motif
• Alpha helices fit into the major groove of B-DNA
• α-helix diameter (including side chains) is 1.2 nm
• DNA major groove: 1.2 nm wide x 0.6 to 0.8 nm deep
• The α-helix and B-form DNA are the predominant structures involved in protein:DNA interactions

Proteins With the Helix-Turn-Helix Motif Use One Helix to Recognize DNA
• The HTH motif is a protein structural domain consisting of two successive α-helices separated by a loop with a sharp β-turn (Figure 29.32)
• The C-terminal helix (denoted helix 3) fits in the major groove of DNA; the N-terminal helix (helix 2) locks helix 3 into its DNA interface
• Recognition of DNA sequence involves the sides of base pairs that face the major groove

Alpha Helices and DNA: A Perfect Fit
• A recurring feature of DNA-binding proteins is the presence of α-helical segments that fit directly into the major groove of B-form DNA
• Diameter of the helix is 1.2 nm
• Major groove of DNA is about 1.2 nm wide and 0.6 to 0.8 nm deep
• Proteins can recognize specific sites in DNA

Proteins With the Helix-Turn-Helix Motif Use One Helix to Recognize DNA
• An HTH motif example: Antp is a member of a family of eukaryotic proteins involved in the regulation of early embryonic development that have in common an amino acid sequence element known as the homeobox domain
• The homeobox is a DNA motif that encodes a related 60-residue sequence (the homeobox domain) found among proteins of virtually every eukaryote
• Embedded in the homeobox domain is an HTH motif
• Homeobox domain proteins are sequence-specific transcription factors
• Other DNA-binding proteins with HTH motifs are lac repressor, trp repressor, and the C-terminal domain of CAP
Figure 29.32 An HTH motif protein: Antp monomer bound to DNA. Helix 3 (yellow) is locked into the major groove of the DNA by helix 2 (magenta).

Some Proteins Bind to DNA via Zn-Finger Motifs
• First discovered in TFIIIA from Xenopus laevis, the African clawed toad
• Now known to exist in nearly all organisms
• Two main classes: C2H2 and Cx
• C2H2 domains consist of Cys-x2-Cys and His-x3-His domains separated by at least 7-8 amino acids
• This motif can be repeated as many as 13 times over the primary structure of a Zn-finger protein
• Cx domains consist of 4, 5 or 6 Cys residues separated by various numbers of other residues
• The Cx proteins have a variable number of Cys residues available for Zn chelation
Figure 29.33 The Zn-finger motif of the C2H2 type showing (a) the coordination of Cys and His residues to Zn and (b) the secondary structure. (c) Structure of a classic C2H2 zinc finger protein with three zinc fingers bound to DNA.
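The C2H2 spacing rule described above (Cys-x2-Cys, a spacer of at least 7 residues, then His-x3-His) can be sketched as a simple pattern search. This is only an illustration of the rule as stated on the slide, not a validated Zn-finger predictor: the function name, the toy sequence, and the upper spacer limit of 20 residues are assumptions introduced here.

```python
import re

# Simplified C2H2 pattern from the slide text: Cys-x2-Cys, a spacer of at
# least 7 residues, then His-x3-His. The upper spacer bound of 20 is an
# assumption added here so that matches stay local.
C2H2_PATTERN = re.compile(r"C.{2}C.{7,20}?H.{3}H")


def find_c2h2_candidates(protein_seq):
    """Return (start_index, matched_text) for each candidate C2H2 motif."""
    seq = protein_seq.upper()
    return [(m.start(), m.group()) for m in C2H2_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real protein.
    toy = "MAACPECGKSFSQKSNLQKHQRTHGG"
    for start, hit in find_c2h2_candidates(toy):
        print(start, hit)
```

A hit only says that the Cys and His residues are spaced in a way compatible with Zn coordination; whether the segment actually folds into a finger and binds DNA cannot be read from the pattern alone.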
Some Proteins Bind to DNA via Zn-Finger Motifs
• Comparison of secondary and tertiary structures:
• C2H2-type Zn fingers form a folded beta strand and an alpha helix that fits into the DNA major groove
• Cx-type Zn fingers consist of two minidomains, each with four Cys ligands to Zn followed by an alpha helix: the first helix is the DNA recognition helix, and the second helix packs against the first

Some DNA-Binding Proteins Use a Basic Region Leucine Zipper (bZIP) Motif
• First found in C/EBP, a DNA-binding protein in rat liver nuclei
• Now found in nearly all organisms
• Characteristic features: a 28-residue sequence with Leu at every 7th position and a "basic region"
• (What do you know by now about 7-residue repeats?)
• This suggests amphipathic α-helices and a coiled-coil dimer (see Chapter 6, page 155)

The Structure of the Leucine Zipper
• Leucine zipper proteins (aka bZIP proteins) dimerize, either as homo- or heterodimers
• The basic region is the DNA-recognition site
• The basic region is often modeled as a pair of helices that can wrap around the major groove
• Homodimers recognize dyad-symmetric DNA
• Heterodimers recognize non-symmetric DNA
• Fos and Jun are classic bZIPs

Model for a Dimeric bZIP Protein
Figure 29.34 BR-A and BR-B are basic regions A and B (shown in complex with DNA).

Structure of a Leucine Zipper:DNA Complex
Figure 29.35 Model for the heterodimeric bZIP transcription factor c-Fos:c-Jun bound to a DNA oligomer containing the AP-1 consensus target sequence TGACTCA.

Eukaryotic Genes are Split Genes
• Introns (non-coding regions) intervene between exons (protein-coding regions)
• Examples: the actin gene has a 309-bp intron between the codons for the first three amino acids and those for the other 350 or so
• But the chicken pro α-2 collagen gene is 40 kbp long, with 51 exons of only 5 kbp total.
• In these cases, the exons range in size from 45 to 249 bases
• The mechanism by which introns are excised and exons are spliced together is complex and must be precise

29.5 How Are Eukaryotic Transcripts Processed and Delivered to the Ribosomes for Translation?
• In prokaryotes, transcription and translation are concomitant processes
• In eukaryotes, the two processes are spatially separated: transcription occurs on DNA in the nucleus, and translation occurs on ribosomes in the cytoplasm
• Thus, transcripts must be transported from the nucleus to the cytosol to be translated
• On the way, these transcripts undergo processing: alterations that convert the newly synthesized RNAs (primary transcripts) into mature mRNAs
• And unlike prokaryotic mRNAs, eukaryotic mRNAs encode only one polypeptide; i.e., they are monocistronic

Eukaryotic Genes are Split Genes
Figure 29.36 The organization of split eukaryotic genes.
Figure 29.37 The organization of the mammalian DHFR gene in three representative species. Note that the exons are much shorter than the introns. Note also that the exon pattern is more highly conserved than the intron pattern.

mRNA Processing Involves Capping, Methylation, Polyadenylylation, & Splicing
• Primary transcripts (aka pre-mRNAs or heterogeneous nuclear RNA, hnRNA) are usually capped by addition of a guanylyl group
• The reaction is catalyzed by guanylyl transferase
• The cap G residue is methylated at the 7-position
• Additional methylations occur at the 2'-O positions of the next two residues and at the 6-amino group of the first adenine

The Capping of Eukaryotic pre-mRNAs
Figure 29.38 Guanylyl transferase catalyzes the addition of a guanylyl residue derived from GTP to the 5'-end of the growing transcript, which already carries a 5'-triphosphate group. In the process, pyrophosphate (pp) is liberated from GTP and the terminal phosphate (p) is removed from the transcript:
Gppp + pppApNpNpNp… → GpppApNpNpNp… + pp + p
(A is often the initial nucleotide in the primary transcript.)

Methylation at Several Sites is Essential to mRNA Maturation
Figure 29.39 A cap bearing only a single –CH3 on the guanyl is termed cap 0. This methylation occurs in all eukaryotic mRNAs. If a methyl is also added to the 2'-O position of the first nucleotide after the cap, a cap 1 structure is generated. This is the predominant cap form in RNA from all multicellular eukaryotes.

3'-Polyadenylylation of Eukaryotic mRNAs
• Termination of transcription occurs only after RNA polymerase has transcribed past a consensus AAUAAA sequence, the poly(A) addition signal
• 10-35 nucleotides past this signal, a string of 100 to 200 adenine residues is added to the mRNA transcript: the poly(A) tail
• Poly(A) polymerase adds these A residues
• The poly(A) tail enhances mRNA stability

Nuclear Pre-mRNA Splicing
• Within the nucleus, hnRNA forms ribonucleoprotein particles (RNPs) through association with a characteristic set of nuclear proteins
• These proteins maintain the hnRNA in an untangled and accessible conformation
• The substrate for splicing, that is, intron excision and exon ligation, is the capped primary transcript emerging from the RNA polymerase II transcriptional apparatus
• Splicing occurs exclusively in the nucleus
• Consensus sequences define the exon/intron junctions in eukaryotic mRNA precursors

Figure 29.40 Poly(A) addition to the 3'-ends of transcripts occurs 10 to 35 nucleotides downstream from a consensus AAUAAA sequence, defined as the polyadenylylation signal. CPSF (cleavage and polyadenylylation specificity factor) binds to this signal sequence and mediates looping of the 3'-end of the transcript through interactions with a G/U-rich sequence even further downstream.
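The positional rule for poly(A) addition (a tail added 10 to 35 nucleotides downstream of AAUAAA) lends itself to a short sketch. The code below is a toy illustration, not a model of CPSF or poly(A) polymerase: the function names are hypothetical, measuring the offset from the end of the AAUAAA signal is an assumption, and the default tail length of 150 is simply a value inside the 100 to 200 range quoted above.

```python
POLY_A_SIGNAL = "AAUAAA"


def poly_a_windows(rna, min_offset=10, max_offset=35):
    """Locate each AAUAAA signal and the window where the tail could be added.

    Returns a list of (signal_start, window_start, window_end) index tuples.
    Offsets are counted from the end of the signal (an assumption), and the
    window is clipped to the end of the transcript.
    """
    rna = rna.upper().replace("T", "U")
    results = []
    pos = rna.find(POLY_A_SIGNAL)
    while pos != -1:
        signal_end = pos + len(POLY_A_SIGNAL)
        window_start = signal_end + min_offset
        window_end = min(signal_end + max_offset, len(rna))
        if window_start <= len(rna):
            results.append((pos, window_start, window_end))
        pos = rna.find(POLY_A_SIGNAL, pos + 1)
    return results


def add_poly_a_tail(rna, cleavage_index, tail_length=150):
    """Cut the transcript at cleavage_index and append a run of A residues."""
    return rna[:cleavage_index] + "A" * tail_length


if __name__ == "__main__":
    # Hypothetical toy transcript: 3 nt, the signal, then 40 nt downstream.
    transcript = "GGG" + POLY_A_SIGNAL + "C" * 40
    for signal_start, start, end in poly_a_windows(transcript):
        print(signal_start, start, end)
        print(len(add_poly_a_tail(transcript, start)))
```

Any index inside the reported window is treated as an equally plausible cleavage point here; the G/U-rich downstream element mentioned in Figure 29.40 is deliberately left out of the sketch.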
Splicing of Pre-mRNA
• Capped, polyadenylated RNA, in the form of an RNP complex, is the substrate for splicing
• In "splicing", the introns are excised and the exons are joined together to form mature mRNA
• The 5'-end of an intron in higher eukaryotes is always GU and the 3'-end is always AG
• All introns have a "branch site" 18 to 40 nucleotides upstream from the 3'-splice site
• The branch site is essential to splicing
Figure 29.41 Consensus Sequences at the Splice Sites in Vertebrate Genes

The Splicing Reaction Proceeds via Formation of a Lariat Intermediate
• Figure 29.42 shows the splicing mechanism
• The branch site is usually YNYRAY, where Y = pyrimidine, R = purine, and N is any nucleotide
• The lariat, a covalently closed loop of RNA, is formed by attachment of the 5'-P of the intron's invariant 5'-G to the 2'-OH of the branch-site A
• The exons then join, excising the lariat
• The lariat is unstable; the 2'-5' phosphodiester is quickly cleaved and the intron is degraded in the nucleus
Figure 29.42 Splicing of mRNA precursors. A representative precursor mRNA is depicted. Exon 1 and Exon 2 indicate two exons separated by an intervening sequence (an intron) with consensus 5', 3', and branch sites.
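The three sequence rules stated above (GU at the 5'-end, AG at the 3'-end, and a YNYRAY branch site 18 to 40 nucleotides upstream of the 3'-splice site) can be checked mechanically. The following is a minimal sketch under stated assumptions: the function name and the toy intron are hypothetical, and the distance is counted as the number of nucleotides that follow the branch A within the intron, which is one of several reasonable conventions.

```python
import re

# YNYRAY: Y = pyrimidine (C or U), R = purine (A or G), N = any base.
# A lookahead is used so that overlapping candidate sites are all reported.
BRANCH_SITE = re.compile(r"(?=([CU][ACGU][CU][AG]A[CU]))")


def check_intron(intron, min_dist=18, max_dist=40):
    """Check an intron sequence against the consensus rules on the slide.

    Returns a dict with the GU/AG checks and a list of branch-site hits as
    (motif_start, motif, distance_to_3prime_end) tuples.
    """
    seq = intron.upper().replace("T", "U")
    hits = []
    for m in BRANCH_SITE.finditer(seq):
        branch_a = m.start() + 4           # the A is the 5th base of YNYRAY
        dist = len(seq) - 1 - branch_a     # bases after the branch A (assumed convention)
        if min_dist <= dist <= max_dist:
            hits.append((m.start(), m.group(1), dist))
    return {
        "starts_with_GU": seq.startswith("GU"),
        "ends_with_AG": seq.endswith("AG"),
        "branch_sites": hits,
    }


if __name__ == "__main__":
    # Hypothetical toy intron: GU, filler, one YNYRAY match, filler, AG.
    toy_intron = "GU" + "G" * 12 + "CUCGAC" + "G" * 20 + "AG"
    print(check_intron(toy_intron))
```

Passing all three checks makes a sequence consistent with the consensus, nothing more; real splice-site choice also depends on the snRNP interactions covered in the following slides.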
Splicing Depends on snRNPs
• Splicing depends on a unique set of small nuclear ribonucleoprotein particles - snRNPs, pronounced "snurps"
• A snRNP consists of a small RNA (100-200 bases long) and about 10 different proteins
• Some of the 10 proteins are general, some are specific (see Table 29.6)
• Major snRNP species are abundant, with more than 100,000 copies per nucleus
• snRNPs and pre-mRNA form the spliceosome

snRNPs Form the Spliceosome
• Splicing occurs when the various snRNPs come together with the pre-mRNA to form a multicomponent complex called the spliceosome
• The spliceosome is a large complex, about the size of a ribosome; its assembly requires ATP
• snRNPs U1 and U5 bind at the 5'- and 3'-splice sites, and U2 snRNP binds at the branch site
• Interaction between the snRNPs brings the 5'- and 3'-splice sites together so the lariat can form and exon ligation can occur
• Spliceosome assembly requires ATP-dependent RNA rearrangements catalyzed by spliceosomal DEAD-box ATPases/helicases
Figure 29.43 Structure of the core domain of the U4 snRNP. The U4 snRNA (orange, with bases in light blue stick) passes through the central hole in the heteroheptad Sm protein complex, SmG-SmD3-SmB-SmD1-SmD2-SmF. Each of the seven Sm proteins is a different color.
Figure 29.44 The mammalian U1 snRNA can be arranged in a secondary structure where its 5'-end is single-stranded and can base-pair with the consensus 5'-splice site of the intron.
Figure 29.45 Events in spliceosome assembly. U1 snRNP binds at the 5'-splice site, followed by the association of U2 snRNP with the UACUAA*C branch-point sequence. The triple U4/U6·U5 snRNP complex replaces U1 at the 5'-splice site and directs the juxtaposition of the branch-point sequence with the 5'-splice site, whereupon U4 snRNP is released.

Alternative RNA Splicing Creates Protein Isoforms
• In constitutive splicing, every intron is removed and every exon is incorporated into the mature RNA
• This produces a single form of mature mRNA from the primary transcript
• However, many eukaryotic genes can give rise to multiple forms of mature RNA transcripts
• This may occur by:
• Use of different promoters
• Selection of different polyadenylylation sites
• Alternative splicing of the primary transcript, or
• A combination of these three mechanisms
• Different transcripts from a single gene make possible a set of related polypeptides, termed protein isoforms, each with a slightly altered function
• The isoforms of fast skeletal muscle troponin T are an example of alternative splicing
• This gene consists of 18 exons, 11 of which are found in all mature mRNAs and are constitutive
• Five of the exons (4 through 8) are combinatorial, in that they may be included or excluded
• Two (16 and 17) are mutually exclusive: one is always present but never both
• 64 different mature mRNAs can be formed from this gene by alternative splicing (2^5 = 32 combinations of the five combinatorial exons × 2 choices of mutually exclusive exon)
Figure 29.46 Organization of the fast skeletal muscle troponin T gene and the 64 possible mRNAs that can be generated from it. Exons are constitutive (yellow), combinatorial (green), or mutually exclusive (blue or orange).

29.6 Can Gene Expression Be Regulated Once the Transcript Has Been Synthesized?
RNA Editing: Another Way To Increase the Diversity of Genetic Information
• RNA editing is a process that changes one or more nucleotides in an RNA transcript by deaminating a base, either A→I or C→U
• These changes alter the coding possibilities in a transcript, because I will pair with C (not U as A does) and U will pair with A (not G as C does)
• RNA editing can increase protein diversity by (1) altering amino acid coding possibilities, (2) introducing premature stop codons, or (3) changing a splice site in a transcript

• miRNAs are key regulators in post-transcriptional gene regulation
• miRNAs are a large family of small, noncoding RNAs found in animals, plants, and protists
• At least 800 are found in mammals and they are predicted to target the expression of about 60% of all protein-coding genes
• Mature miRNAs are incorporated into a miRNA-induced silencing complex (miRNA-RISC) through interaction with AGO2
• In most cases, miRNAs target the 3'-untranslated region of the mRNAs they regulate
• miRNA-RISC blocks gene expression in two ways: miRNA-RISC binding can interfere with recruitment of ribosomes; in addition, miRNA-RISC complexes destabilize mRNAs through deadenylation at their 3'-ends
Figure 29.47 Domain organization of human Argonaute 2 (AGO2)

29.7 Can We Propose a Unified Theory of Gene Expression?
• Traditionally, the stages of eukaryotic gene expression, from transcriptional activation through mRNA translation, have been viewed as discrete steps
• We now know that each stage is part of a continuous process with physical and functional connections, running from transcription through processing to protein synthesis as DNA → RNA → protein
• This continuous process is achieved by an interacting network of macromolecular machines - nucleosomes, HATs, chromatin remodeling complexes, RNA pol II, capping, splicing and poly(A) enzymes, mRNA export proteins, and ribosomes
Figure 29.48 A unified theory of gene expression.

RNA Degradation
• The amount of specific mRNAs or proteins in a cell at any time represents a balance between rates of macromolecular synthesis and degradation
• Regulated degradation of mRNAs and proteins (see Chapter 31) is a rapid and effective way to control the levels of these macromolecules
• Targeted degradation of RNAs and proteins takes place within ringlike or cylindrical macromolecular complexes: the exosome for RNA and the proteasome for proteins (Chapter 31)
• The exosome consists of a ring of six subunits surrounding a central cavity, with one or more having RNase PH activity
Figure 29.49 Structure of the human exosome core, composed of nine different polypeptides. A hexameric ring of subunits surrounds a central cavity that is capped by a set of three other proteins (the ones colored pink here).