Electrophoresis 2007, 28, 3862–3867

Short Communication

Using DNA sequencing electrophoresis compression artifacts as reporters of stable mRNA structures affecting gene expression

Divya Kapoor, Sanjeev Kumar Chandrayan, Shubbir Ahmed, Purnananda Guptasarma

Division of Protein Science and Engineering, Institute of Microbial Technology (IMTECH), Chandigarh, India

Received May 16, 2007; Revised June 18, 2007; Accepted June 19, 2007

The formation of secondary structure in oligonucleotide DNA is known to lead to “compression” artifacts in electropherograms produced through DNA sequencing. Separately, the formation of secondary structure in mRNA is known to suppress translation; in particular, when such structures form in a region covered by the ribosome either during, or shortly after, initiation of translation. Here, we demonstrate how a DNA sequencing compression artifact provides important clues to the location(s) of translation-suppressing secondary structural elements in mRNA. Our study involves an engineered version of a gene sourced from Rhodothermus marinus encoding an enzyme called Cel12A. We introduced this gene into Escherichia coli with the intention of overexpressing it, but found that it expressed extremely poorly. Intriguingly, the gene displayed a remarkable compression artifact during DNA sequencing electrophoresis. Selected “designer” silent mutations destroyed the artifact. They also simultaneously greatly enhanced the expression of the cel12A gene, presumably by destroying stable mRNA structures that otherwise suppress translation. We propose that this method of finding problem mRNA sequences is superior to software-based analyses, especially if combined with low-temperature CE.
Keywords: Compression artifact / DNA sequencing electrophoresis / Nucleic acid secondary structure

DOI 10.1002/elps.200700359

Correspondence: Dr. Purnananda Guptasarma, Division of Protein Science and Engineering, Institute of Microbial Technology (IMTECH), Sector 39-A, Chandigarh 160036, India
E-mail: [email protected]
Fax: +91-172-2690585

Abbreviation: oligo, oligonucleotide

© 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

It is widely appreciated that the electrophoretic separation of oligonucleotide (oligo) populations with a resolution of one nucleotide base length is critical for DNA sequencing. It is also known that compression artifacts owing to secondary structure formation in DNA (e.g., hairpins or stem-loops) present striking anomalies in oligo separations that frustrate DNA sequencing [1–3]. Figure 1A shows such an anomaly, excerpted from the middle of a sequence readout of the engineered cel12A gene. Because the dye-blobs that are sometimes seen in DNA sequences at read lengths of about 70 nt happened to overlap with the latter parts of the anomalous sequence, we read the sequence manually by examining the colored electropherograms corresponding to the four DNA bases. The manually read sequence beginning with, and including, the BamHI site (GGATCC) is: “GGATCCACTGTTGAGTCGGGTGGGACACGAGAACGG”. The sequence should, however, actually have been: “GGATCCACTGTCGAGCTGTTCGGACAATGGGACACGAGAACGG”. In the above sequences, the region suffering from the anomaly is flanked by 11 or 12 correctly sequenced nucleotides on either side. It is clear that the anomalous sequence is both compressed in relation to the correct sequence, and wrong in its detail. We were extremely intrigued by this anomaly, which was reproducibly observed in a number of different sequencing reactions involving different clones of the gene.
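The comparison between the manually read and the expected sequence can be reproduced with a few lines of code. The following is a minimal sketch for illustration only: the function name `flanking_agreement` is hypothetical, and the two strings are the sequences quoted above with whitespace removed. It reports how many bases agree at each end of the two reads and how many bases shorter the anomalous read is.

```python
def flanking_agreement(observed: str, expected: str) -> tuple[int, int]:
    """Return the lengths of the longest shared prefix and shared suffix."""
    limit = min(len(observed), len(expected))

    # Count matching bases from the 5' end.
    prefix = 0
    while prefix < limit and observed[prefix] == expected[prefix]:
        prefix += 1

    # Count matching bases from the 3' end, never overlapping the prefix.
    suffix = 0
    while (suffix < limit - prefix
           and observed[-1 - suffix] == expected[-1 - suffix]):
        suffix += 1
    return prefix, suffix


# Sequences as quoted in the text (manually read vs. expected).
observed = "GGATCCACTGTTGAGTCGGGTGGGACACGAGAACGG"
expected = "GGATCCACTGTCGAGCTGTTCGGACAATGGGACACGAGAACGG"

prefix, suffix = flanking_agreement(observed, expected)
missing = len(expected) - len(observed)
print(f"shared 5' flank: {prefix} nt, shared 3' flank: {suffix} nt")
print(f"anomalous read is shorter by {missing} nt")
```

Because the matching is greedy, the shared 3′ flank counted this way can be longer than the 12 nucleotides mentioned in the text: a few bases at the edge of the anomalous region happen to coincide with the expected sequence.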
What interested us especially was the fact that the anomaly was located very close to the 5′-end of the gene, which had been inserted between the BamHI and HindIII restriction sites of the expression vector, pQE-30 (Qiagen). The BamHI site itself consists of two codons (GGA and TCC) which encode the eleventh and twelfth residues, respectively, of a 12-residue affinity tag with the sequence “MRGSHHHHHHGS”, whose coding sequence is separated from the 3′-end of the ribosome binding site (RBS, also known as the Shine–Dalgarno sequence) by nine bases. This generates a 50 bases-long separation between the RBS and the compression artifact.

Nucleic Acids, www.electrophoresis-journal.com

Figure 1. (A) DNA sequencing electropherogram showing a run of “N” assignments interrupting a normal sequence “read”. Manual reading of the electrophoretogram showed that the sequence in this region was wrong, although the sequences preceding and following this “compression artifact” are correct. (B) Electropherogram showing the correct sequence of the same clone after the introduction of three silent mutations (details in Fig. 3) which destroyed the secondary structure responsible for the compression artifact in (A).

To examine whether the compression anomaly could reflect the potential for formation of a secondary structural element in mRNA that would halt, or adversely affect, the motion of the ribosome along the mRNA template, we analyzed the sequence of the region and found that it can form very stable hairpin structures. In Fig. 2A, we show the likely behavior of oligo populations with lengths varying over a stretch of sequence spanning 46 bases, starting from roughly the middle of the BamHI site (GGATCC). As oligos produced through primer extension and termination increase in length up until step n + 16 in Fig. 2A, the sequencing readout remains normal.
Upon further elongation, a tight loop is formed by the closure of two successive GC base pairs. Further elongation leads to the formation of a very long hairpin containing eight very closely spaced base pairs and a tight loop, with the antiparallel strand advancing backwards along the hairpin all the way to the region of the BamHI site (by the time step n + 26 is reached). This long double-stranded hairpin creates the compression artifact, owing to hydrodynamic volume changes associated with secondary structure formation, which affect oligo mobility by making oligos travel faster through the gel or polymer matrix. Generally, DNA sequencing readouts return to normal once a compression artifact has been passed in the sequence, since all longer oligos containing the secondary structure are subject to the same compression effect. In keeping with this expectation, the DNA sequence towards the 3′-end of the artifact can be seen to return to the expected sequence in Fig. 1.

Figure 2. (A) A schematic diagram, showing the products of successive step-wise additions of nucleotide bases to a growing oligo chain during cycle thermosequencing through primer extension. The likely folding (secondary structure-forming) behavior of these products is also shown. Oligos spanning a range of sequence from n to n + 43 are shown. A tight loop involving a G–C base pair can form in the product of step n + 16, after release of the oligo from the template. By step n + 26, oligo folding is seen to be capable of generating a long, more-or-less double helical stem involving eight base pairs and 26 nucleotides, with the end of the stem approaching its neck. Further addition of bases leads to the formation of another stem starting at step n + 36, which also returns to its neck region (shared with the first stem) by step n + 43. The second stem and the first compete for one or two common bases, resulting in the destabilization of both. This destabilization in the DNA–DNA oligo results in a rearrangement. However, the rearranged RNA–RNA secondary structure in the transcript (starting at the 5′-end of the gene) reduces translational efficiency of the ribosome. The three silent mutations made to restore DNA sequence correctness and enhance expression are shown with arrows in the products corresponding to steps n + 5, n + 8, and n + 23. (B) An expanded version of the long stem formed through DNA–DNA base pairing in the oligo which is postulated to reduce expression. (C) An expanded version of the secondary structure formed through RNA–RNA base pairing in the transcript, following rearrangement of the RNA fold forced by competition between the RNA homologs of the two stems postulated for the DNA structure.

Intriguingly, however, we discovered that further chain elongation opens up the possibility of forming one more secondary structure (this time, a stem-loop structure rather than a hairpin), which advances backwards to return to the neck of the first secondary structure (i.e., to the neck of the hairpin) and destabilizes it partially by competing for some of the same nucleotide bases, thereby forcing a rearrangement of the structure of the whole region. The topology of the first hairpin structure is shown in the schematic diagram in Fig. 2B, including both canonical and some non-Watson–Crick base pairs. The likely rearranged structure of the whole region, following extension of oligos into the region of the second structure proposed to be formed (between steps n + 36 and n + 43) in Fig. 2A, is shown in Fig. 2C. Note, however, that while it is the DNA form of the hairpin structure that is shown in Fig.
2B, it is the RNA form of the rearranged structure of the entire region (including both secondary structures) that is shown in Fig. 2C, for reasons made clear below. The destabilization of the hairpin and the proposed forced rearrangement of the region’s structure were supported by the observation that when sequencing was done with the reverse primer, the location of the compression artifact was shifted by about 13 bases (data not shown). This shift can be attributed to the different structure-forming possibilities that apply to two different oligos (covering the same stretch of DNA on two complementary strands) growing in length in mutually opposite directions.

To disrupt the formation of secondary structure in the region, we made three specific base alterations in the gene’s sequence, using silent mutations that would not alter the encoded amino acid residues. We took the assumed correct sequence, which was: “GGATCCACTGTCGAGCTGTTCGGACAATGGGACACGAGAACGG”, and incorporated three mutations (shown in lowercase letters) to create the sequence shown below, which encodes the same amino acid sequence: “GGATCCACgGTCGAGCTGTgCGGACAgTGGGACACGAGAACGG”. To our satisfaction, we found that these three mutations, which were designed to destroy secondary structure, also ended up destroying the compression artifact and allowing the sequencing readout to correspond to the correct sequence (Fig. 1B).

More importantly, the introduced silent mutations also led to a profound enhancement of the heterologous expression of the 25 kDa Rhodothermus marinus protein (engineered Cel12A) encoded by the clone. Figure 3A shows gel lanes corresponding to uninduced and induced forms of the original clones that failed to express well. Figure 3B shows the corresponding lanes for clones incorporating the three silent mutations described above (and also shown in Fig. 2A). It is clear from a comparison of Figs. 3A and B that a very remarkable upshift in expression levels results from these silent mutations.
The observed enhancement of expression suggests that the three silent mutations indeed lead to more efficient translation of the transcript.

Structures formed in RNA which have a free energy of stabilization exceeding about −5.5 kcal/mol are known to disrupt translation [4]. Using the web interface to the software RNAfold (URL: http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi) [5], we determined the stability of the DNA and RNA forms of the secondary structural element shown in Fig. 2C, and found these to be −6.36 and −11.97 kcal/mol, respectively [6, 7]. Both of these structures are stable; however, the RNA–RNA structure is the more stable of the two, as expected. Certainly, the RNA structure appears to be stable enough to interfere with translation, explaining the enhancement of expression seen upon its destruction, whereas the DNA structure appears to be not quite as stable. In comparison, the DNA structure of the hairpin shown in Fig. 2B that causes the artifact was assessed to have a stability of only −3.24 kcal/mol.

Although this is the very first time in the literature that gene expression problems have been shown to be reflected in compression artifacts in DNA sequencing electrophoretograms, it may be argued that there are other ways of examining the possibilities for secondary structure formation in mRNA that could interfere with translation, e.g., through software. We subjected the sequence of the cel12A gene to analysis by the RNAfold software. The possibilities for intramolecular interactions occurring within different regions of a large stretch of mRNA sequence are so immense in number that any software can only be expected to examine a limited set of them. Programs like RNAfold thus generally try first to find interactions between the 5′- and 3′-ends of the sequence used as input, before examining the remaining possibilities.
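The stability argument made earlier in this passage amounts to a simple threshold comparison. The sketch below assumes that the quoted stabilities are negative free energies in kcal/mol, so that "exceeding" the threshold means being more negative than it. The numbers are those quoted in the text; the dictionary labels and the function name `exceeds_stability_threshold` are hypothetical and serve only this illustration.

```python
# Threshold from ref. [4]; the negative sign is an assumption about the
# convention used in the text (more negative = more stable).
THRESHOLD_KCAL_PER_MOL = -5.5

# Stabilities quoted in the text, in kcal/mol.
reported_dg = {
    "Fig. 2C structure, DNA form": -6.36,
    "Fig. 2C structure, RNA form": -11.97,
    "Fig. 2B hairpin, DNA form": -3.24,
}


def exceeds_stability_threshold(dg: float,
                                threshold: float = THRESHOLD_KCAL_PER_MOL) -> bool:
    """True if a structure is more stable (more negative) than the threshold."""
    return dg < threshold


for label, dg in reported_dg.items():
    verdict = "beyond" if exceeds_stability_threshold(dg) else "within"
    print(f"{label}: {dg:.2f} kcal/mol ({verdict} threshold)")
```

Only the RNA form is relevant to translation; the DNA values are included because they are the ones that bear on the sequencing artifact.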
Even so, as can be clearly seen in Fig. 3C, RNAfold shows a very large number of structure-forming possibilities when the entire cel12A gene sequence is used as input. Notably, the secondary structure that we destroyed through mutations does not even show up in the structure shown in Fig. 3C. Given the high density of stems and loops in the predicted structure, it is worth reflecting upon the argument that if, in real life, the mRNA chain were indeed to look like it does in Fig. 3C, it would probably never be successfully translated by a ribosome.

Figure 3. (A) Expression of the genetic construct corresponding to Fig. 1A. Molecular weight markers are shown in lane 5. The 14.4 kDa marker which is also seen in the right panel has run out of the gel. Lanes 1, 3, 6, and 8 show total cell lysates for uninduced cultures of four different clones. Lanes 2, 4, 7, and 9 show the corresponding respective total lysates for induced cultures. (B) Expression of the genetic construct corresponding to Fig. 1B. Molecular weight markers are shown in lane 9. Lanes 1, 3, 5, and 7 show total cell lysates for uninduced cultures of four different clones. Lanes 2, 4, 6, and 8 show the corresponding respective total lysates for induced cultures. Note that the three silent mutations introduced in the construct (which led to the clearing up of the sequence anomaly shown in Figs. 1A and B) also resulted in a profound increase in expression (except in the clone shown in lanes 7 and 8 in the right panel). (C) Secondary structural analysis of the entire cel12A gene by RNAfold software. The sequence that creates the compression artifact is in the upper line of the encircled region, and is shown stretched out over a long length towards the left of the figure, rather than as a hairpin, showing that the identified structure is not predicted through whole-gene analysis.

Clearly, therefore, the results from software analyses are only as useful as the input sequence is short, since short sequences have a reduced set of possibilities to be examined. For the molecular biologist, the question that then arises is the following: which stretch of sequence should one use as input when working with RNA fold-prediction software? Our finding that the compression artifact is separated from the RBS by 50 bases shows that secondary structures do not necessarily have to interfere directly with ribosome binding in order to affect translation. In effect, any secondary structure that is actually formed in a stable manner could potentially interfere with translation. Surely, therefore, instead of examining a large set of theoretical structure-forming possibilities and eliminating them one by one to examine effects on expression, it would be much more profitable to search for and identify the DNA/RNA sequences that definitively, and naturally, form secondary structures stable enough to give rise to compression artifacts, and to destroy these, since one would at least be sure that these secondary structures do form. Thus, given our observation that secondary structure prediction software could not predict the location of the critical secondary structure-forming region when the entire cel12A gene’s sequence was analyzed (although, of course, once the region had been located, its potential to form secondary structure was confirmed through such software), we propose that searching for compression artifacts in DNA sequences may be a better approach than software-based searches for the identification of mRNA secondary structural elements which suppress translation.
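Once a suspect region has been located, its capacity to form stem-loops can be checked quickly even without a folding program. The sketch below is a deliberately naive scan for perfect Watson–Crick inverted repeats in a short DNA sequence. It ignores the non-canonical base pairs and the thermodynamics that a tool such as RNAfold takes into account, and it is not the analysis used in this study. The function name `find_stem_loops` and its default parameters are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_stem_loops(seq: str, stem_len: int = 4,
                    min_loop: int = 3, max_loop: int = 12) -> list[tuple[int, int]]:
    """Return (stem_start, loop_length) for each perfect inverted repeat.

    stem_start is the 0-based index of the 5' arm of the stem.
    """
    seq = seq.upper()  # accept lower-case letters marking mutations
    hits = []
    for i in range(len(seq) - 2 * stem_len - min_loop + 1):
        # The 3' arm must equal the reverse complement of the 5' arm.
        target = reverse_complement(seq[i:i + stem_len])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop
            if j + stem_len > len(seq):
                break
            if seq[j:j + stem_len] == target:
                hits.append((i, loop))
    return hits


# Sequences as quoted in the text: original and with the three mutations.
original = "GGATCCACTGTCGAGCTGTTCGGACAATGGGACACGAGAACGG"
mutated = "GGATCCACgGTCGAGCTGTgCGGACAgTGGGACACGAGAACGG"

for label, seq in (("original", original), ("mutated", mutated)):
    print(label, find_stem_loops(seq))
```

As one example of what such a scan picks up, the 4-base repeat TGTC...GACA, separated by 10 bases in the original sequence, is no longer reported for the mutated sequence, because the first of the three mutations falls within it. Whether that particular repeat corresponds to any stem drawn in Fig. 2 cannot be judged from this scan alone.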
It is known that compression artifacts can be destroyed by performing electrophoresis with the product of a DNA sequencing reaction at temperatures higher than room temperature, since heating destroys the hydrogen bonds stabilizing DNA secondary structure [1–3]. Therefore, we propose that in order to identify all potential regions that can actually form secondary structural elements in an mRNA chain, electrophoretic analyses of DNA sequencing reactions be performed at temperatures lower than room temperature, to favor the hydrogen bonding necessary to generate all possible compression artifacts. In most automated sequencers, capillary oven temperatures tend to be set at 60 °C to prevent the occurrence of any complementarity-based annealing interactions either within, or between, pieces of ssDNA. Available instrument control software, however, allows the temperature to be set at much lower values; e.g., in the instrument used by us, the software allows the temperature to be set anywhere between 18 and 65 °C. We propose that an oven temperature setting of 37 °C would be optimal for searching for secondary structures within mRNA being translated in the expression host Escherichia coli. Of course, for thermophile- or hyperthermophile-derived genes, particularly robust secondary structure can survive and show up even at higher temperatures, as turned out to be the case, e.g., in the sequencing experiment that led to the observation reported here, for which an oven temperature of 60 °C was used. Less robust structures would be likely to be revealed by electrophoresis at 37 °C.

Below, we provide some details of the sequencing conditions used by us for the engineered cel12A gene. The sequencing reaction itself used the Big-Dye terminator cycle sequencing chemistry Version 3.1.
Reactions were loaded onto an Applied Biosystems ABI 3130 XL Genetic Analyzer, using an oven temperature of 60 °C, a prerun voltage of 15 kV, an injection voltage of 1.6 kV with an injection time of 18 s, a run voltage of 13.4 kV with a run time of 1800 s, and a current stability of 5 µA. The length of the capillary used for electrophoresis was 50 cm, optimized for a read length of 850 bases with a quality value setting of 20. The medium used for electrophoresis was the ABI performance optimized polymer-7 (POP-7). POP-7 was present at between 4 and 5 µL of volume per capillary in the 16-capillary sequencer. Data were collected with ABI’s Data Collection Software Version 3.0, and the sequence analyzed with the Sequence Analysis Software Version 5.2.

We thank Mr. Deepak Bhatt for technical assistance with electrophoresis.

References

[1] Sanger, F., Coulson, A. R., J. Mol. Biol. 1975, 94, 441–446.
[2] Maniatis, T., Jeffrey, A., van de Sande, H., Biochemistry 1975, 14, 3787–3794.
[3] Sanger, F., Nicklen, S., Coulson, A. R., Proc. Natl. Acad. Sci. USA 1977, 74, 5463–5467.
[4] Mukund, M. A., Bannerjee, T., Ghosh, I., Datta, S., Curr. Sci. 1999, 76, 1486–1489.
[5] Hofacker, I. L., Nucleic Acids Res. 2003, 31, 3429–3431.
[6] SantaLucia, J., Proc. Natl. Acad. Sci. USA 1998, 95, 1460–1465.
[7] Mathews, D. H., Sabina, J., Zuker, M., Turner, D. H., J. Mol. Biol. 1999, 288, 911–940.