Divya Kapoor
Sanjeev Kumar Chandrayan
Shubbir Ahmed
Purnananda Guptasarma
Division of Protein Science and Engineering, Institute of Microbial Technology (IMTECH), Chandigarh, India
Received May 16, 2007
Revised June 18, 2007
Accepted June 19, 2007
Electrophoresis 2007, 28, 3862–3867
Short Communication
Using DNA sequencing electrophoresis compression artifacts as reporters of stable mRNA structures affecting gene expression
The formation of secondary structure in oligonucleotide DNA is known to lead to “compression” artifacts in electropherograms produced through DNA sequencing. Separately,
the formation of secondary structure in mRNA is known to suppress translation; in particular, when such structures form in a region covered by the ribosome either during, or
shortly after, initiation of translation. Here, we demonstrate how a DNA sequencing compression artifact provides important clues to the location(s) of translation-suppressing secondary structural elements in mRNA. Our study involves an engineered version of a gene
sourced from Rhodothermus marinus encoding an enzyme called Cel12A. We introduced
this gene into Escherichia coli with the intention of overexpressing it, but found that it
expressed extremely poorly. Intriguingly, the gene displayed a remarkable compression
artifact during DNA sequencing electrophoresis. Selected “designer” silent mutations
destroyed the artifact. They also simultaneously greatly enhanced the expression of the
cel12A gene, presumably by destroying stable mRNA structures that otherwise suppress
translation. We propose that this method of finding problem mRNA sequences is superior
to software-based analyses, especially if combined with low-temperature CE.
Keywords: Compression artifact / DNA sequencing electrophoresis / Nucleic acid secondary structure
DOI 10.1002/elps.200700359
It is widely appreciated that the electrophoretic separation of oligonucleotide (oligo) populations with a resolution
of one nucleotide base length is critical for DNA sequencing.
It is also known that compression artifacts owing to secondary structure formation in DNA (e.g., hairpins or stem-loops)
present striking anomalies in oligo separations that frustrate
DNA sequencing [1–3]. Figure 1A shows such an anomaly,
excerpted from the middle of a sequence readout
of engineered cel12A. Because the dye blobs that are sometimes seen in DNA sequences at read lengths of
about 70 nt happened to overlap with the latter part of the
anomalous sequence, we read the sequence manually by
examining the colored electropherograms corresponding to
the four DNA bases. The manually read sequence, beginning
with and including the BamHI site (GGATCC), is:
“GGATCCACTGTTGAGTCGGGTGGGACACGAGAACGG”.
Correspondence: Dr. Purnananda Guptasarma, Division of Protein Science and Engineering, Institute of Microbial Technology
(IMTECH), Sector 39-A, Chandigarh 160036, India
E-mail: [email protected]
Fax: +91-172-2690585
Abbreviation: oligo, oligonucleotide
© 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
The sequence should, however, actually have been:
“GGATCCACTGTCGAGCTGTTCGGACAATGGGACACGAGAACGG”.
In the above sequences, the region suffering from the
anomaly is flanked on either side by 11 or 12 correctly sequenced
nucleotides. It is clear
that the anomalous sequence is both compressed in relation
to the correct sequence, and wrong in its detail.
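The mismatch between the two reads can be checked mechanically. The following Python sketch compares the manually read sequence (written without internal spaces) with the expected one and reports how many bases agree at each end and how many bases the anomalous read is short by. The helper names `common_prefix_len` and `common_suffix_len` are illustrative and do not come from the paper.

```python
# Sequences as quoted in the text; both begin with the BamHI site GGATCC.
ANOMALOUS = "GGATCCACTGTTGAGTCGGGTGGGACACGAGAACGG"
EXPECTED = "GGATCCACTGTCGAGCTGTTCGGACAATGGGACACGAGAACGG"


def common_prefix_len(a, b):
    """Number of leading positions at which a and b carry the same base."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def common_suffix_len(a, b):
    """Number of trailing positions at which a and b carry the same base."""
    return common_prefix_len(a[::-1], b[::-1])


prefix = common_prefix_len(ANOMALOUS, EXPECTED)
suffix = common_suffix_len(ANOMALOUS, EXPECTED)
missing = len(EXPECTED) - len(ANOMALOUS)

print(f"identical bases at the 5' side: {prefix}")
print(f"identical bases at the 3' side: {suffix}")
print(f"bases lost to compression:      {missing}")
```

The exact-match run at the 3′ side can come out longer than the correctly sequenced flank quoted in the text, because a few bases inside the anomalous stretch may coincide with the expected sequence by chance.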
We were intrigued by this anomaly, which was
reproducibly observed in a number of different sequencing
reactions involving different clones of the gene. What interested us especially was the fact that the anomaly was located
very close to the 5′-end of the gene, which had been inserted
between the BamHI and HindIII restriction sites of the
expression vector, pQE-30 (Qiagen). The BamHI site itself
consists of two codons (GGA and TCC) which encode
the eleventh and twelfth residues, respectively, of a
12-residue affinity tag with the sequence
“MRGSHHHHHHGS”, which is separated from the 3′-end
of the ribosome binding site (RBS, also known as the Shine–
Dalgarno sequence) by nine bases. This generates a 50-base separation between the ribosome-binding site and the
compression artifact.
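The 50-base figure is consistent with one simple accounting, sketched below in Python. The breakdown is an assumption made here for illustration rather than something the text spells out: nine spacer bases, the ten tag codons that precede the BamHI site, and the 11 bases (BamHI site included) that are read correctly before the anomaly begins.

```python
TAG = "MRGSHHHHHHGS"   # affinity tag sequence quoted in the text
SPACER_BASES = 9        # bases between the 3'-end of the RBS and the tag's first codon

# Assumption: the anomaly starts after the 11 correctly read bases GGATCCACTGT,
# counted from the first base of the BamHI site.
CORRECT_BASES_BEFORE_ANOMALY = 11

# GGA and TCC (the BamHI site) encode the last two tag residues, G and S,
# so the remaining residues lie upstream of the site.
codons_before_bamhi = len(TAG) - 2

separation = SPACER_BASES + 3 * codons_before_bamhi + CORRECT_BASES_BEFORE_ANOMALY
print(separation)
```

Other ways of counting (for example, measuring to a different point within the anomaly) would shift the result by a few bases, so this should be read as a plausibility check only.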
www.electrophoresis-journal.com
Nucleic Acids
Figure 1. (A) DNA sequencing electropherogram showing a run of “N” assignments interrupting a normal sequence “read”. Manual
reading of the electropherogram showed that the sequence in this region was wrong, although the sequences preceding and following this “compression artifact” are correct. (B) Electropherogram showing the correct sequence of the same clone after the introduction
of three silent mutations (details in Fig. 3) which destroyed the secondary structure responsible for the compression artifact in (A).
To examine whether the compression anomaly could
reflect the potential for formation of a secondary structural
element in mRNA that would halt, or adversely affect, the
motion of the ribosome along the mRNA template, we analyzed the sequence of the region and found that it can form
very stable hairpin structures. In Fig. 2A, we show the likely
folding behavior of oligo populations of lengths varying over a stretch of
sequence spanning 46 bases, starting from roughly the middle of the BamHI site (GGATCC).
As oligos produced through primer extension and termination increase in length up until step n + 16 in Fig. 2A,
the sequencing readout remains normal. Upon further
elongation, a tight loop is formed by the closure of two successive GC base pairs. Further elongation leads to the formation of a very long hairpin containing eight very closely spaced base pairs and a tight loop, with the antiparallel
strand advancing backwards along the hairpin all the way to
the region of the BamHI site (by the time step n + 26 is
reached). This long double-stranded hairpin creates the
compression artifact, owing to hydrodynamic volume changes associated with secondary structure formation, which
affect oligo mobility by making oligos travel faster through
the gel or polymer matrix.
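To get a feel for how such a fold-back can be spotted in a sequence, the following Python sketch searches a single strand for the longest run of consecutive Watson–Crick pairs closing a loop. It is a deliberately naive illustration and not the authors' procedure: the function name `longest_perfect_stem` and the `min_loop` parameter are hypothetical, and the search ignores the non-Watson–Crick pairs, bulges and energetics that real structures involve.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def longest_perfect_stem(seq, min_loop=3):
    """Return (pairs, i, j) for the longest uninterrupted Watson-Crick stem.

    Base i+k pairs with base j-k for k = 0 .. pairs-1 (0-based indices),
    and at least `min_loop` unpaired bases remain between the two arms.
    Returns (0, -1, -1) when no pair can form.
    """
    seq = seq.upper()
    n = len(seq)
    best = (0, -1, -1)
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            k = 0
            # Extend the stem inward while bases pair and the loop stays long enough.
            while (i + k) + min_loop < (j - k) and COMPLEMENT.get(seq[i + k]) == seq[j - k]:
                k += 1
            if k > best[0]:
                best = (k, i, j)
    return best


# Toy hairpin: four G-C pairs closing a four-base loop.
print(longest_perfect_stem("GGGGAAAACCCC"))

# The stretch quoted in the text, starting at the BamHI site.
REGION = "GGATCCACTGTCGAGCTGTTCGGACAATGGGACACGAGAACGG"
print(longest_perfect_stem(REGION))
```

A perfect-stem search of this kind will generally report shorter stems than the hairpin drawn in Fig. 2, since it cannot step over mismatches or count non-canonical pairs.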
D. Kapoor et al.
Figure 2. (A) A schematic diagram, showing the products of successive step-wise additions of nucleotide bases to a growing oligo chain
during cycle thermosequencing through primer extension. The likely folding (secondary structure-forming) behavior of these products is
also shown. Oligos spanning a range of sequence from n to n + 43 are shown. A tight loop involving a G–C base pair can form in the
product of step n + 16, after release of the oligo from the template. By step n + 26, oligo folding is seen to be capable of generating a long
more-or-less double helical stem involving eight base pairs and 26 nucleotides, with the end of the stem approaching its neck. Further
addition of bases leads to the formation of another stem starting at step n + 36 which also returns to its neck region (shared with the first
stem) by step n + 43. The second stem and the first compete for one or two common bases, resulting in the destabilization of both. This
destabilization in the DNA–DNA oligo results in a rearrangement. However, the rearranged RNA–RNA secondary structure in the transcript
(starting at the 5′ end of the gene) reduces translational efficiency of the ribosome. The three silent mutations made to restore DNA
sequence correctness and enhance expression are shown with arrows in the products corresponding to steps n + 5, n + 8, and n + 23. (B)
An expanded version of the long stem formed through DNA–DNA base pairing in the oligo which is postulated to reduce expression. (C) An
expanded version of the secondary structure formed through RNA–RNA base pairing in the transcript, following rearrangement of the RNA
fold forced by competition between the RNA homologs of the two stems postulated for the DNA structure.
Generally, DNA sequencing readouts return to normal
once a compression artifact has been passed in the sequence,
since all longer oligos containing the secondary structure are
subject to the same compression effect. In keeping with this
expectation, the DNA sequence towards the 3′-end of the
artifact can be seen to return to the expected sequence in
Fig. 1. Intriguingly, however, we discovered that further
chain elongation allows the formation of one
more secondary structure (this time, a stem-loop structure
rather than a hairpin) which advances backwards to return to
the neck of the first secondary structure (i.e., to the neck of
the hairpin) and destabilizes it partially by competing for some
of the same nucleotide bases, thereby forcing a rearrangement of the structure of the whole region. The topology of
the first hairpin structure is shown in the schematic diagram
in Fig. 2B, including both canonical and some non-Watson–
Crick base pairs.
The likely rearranged structure of the whole region, following extension of oligos into the region of the second
structure proposed to be formed (between steps n + 36 and
n + 43) in Fig. 2A, is shown in Fig. 2C. Note, however, that
while it is the DNA form of the hairpin structure that is shown
in Fig. 2B, it is the RNA form of the rearranged structure of the
entire region (including both secondary structures) that is
shown in Fig. 2C for reasons made clear below. The destabilization of the hairpin and the proposed forced rearrangement
of the region’s structure were supported by the observation that
when sequencing was done with the reverse primer, the location of the compression artifact was shifted by about 13 bases
(data not shown). This shift can be attributed to the
different structure-forming possibilities that apply to two
different oligos (covering the same stretch of DNA on two
complementary strands) growing in length in mutually
opposite directions. To disrupt the formation of secondary
structure in the region, we made three specific base alterations in the gene’s sequence, using silent mutations that
would not alter the encoded amino acid residues.
We therefore took the assumed correct sequence, which was:
“GGATCCACTGTCGAGCTGTTCGGACAATGGGACACGAGAACGG”
and incorporated three mutations (shown in lowercase letters) to create the sequence shown below, which
encodes the same amino acid sequence:
“GGATCCACgGTCGAGCTGTgCGGACAgTGGGACACGAGAACGG”.
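The positions of the three substitutions can be read off by comparing the two sequences base by base. The short Python sketch below does this. Positions are 1-based and counted from the first G of the BamHI site, a numbering convention chosen here for illustration (it is not the n + k step numbering of Fig. 2A), and the function name `substitutions` is hypothetical.

```python
ORIGINAL = "GGATCCACTGTCGAGCTGTTCGGACAATGGGACACGAGAACGG"
MUTATED = "GGATCCACgGTCGAGCTGTgCGGACAgTGGGACACGAGAACGG"


def substitutions(ref, alt):
    """List (position, ref_base, alt_base) for every differing position (1-based)."""
    if len(ref) != len(alt):
        raise ValueError("sequences must have the same length")
    return [
        (pos, r.upper(), a.upper())
        for pos, (r, a) in enumerate(zip(ref, alt), start=1)
        if r.upper() != a.upper()
    ]


changes = substitutions(ORIGINAL, MUTATED)
for pos, old, new in changes:
    print(f"{old}{pos}{new}")
```

Comparing case-insensitively means the lowercase marking of the mutated bases does not itself register as a difference; only genuine base changes are listed.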
To our satisfaction, we found that these three mutations
which were designed to destroy secondary structure also
ended up destroying the compression artifact and allowing
the sequencing readout to correspond to the correct
sequence (Fig. 1B). More importantly, the introduced silent
mutations also led to a profound enhancement of the heterologous expression of the 25 kDa Rhodothermus marinus
protein (engineered Cel12A) encoded by the clone. Figure 3A
shows gel lanes corresponding to uninduced and induced
forms of the original clones that failed to express well. Figure
3B shows the corresponding lanes for clones incorporating
the three silent mutations described above (and also shown
in Fig. 2A). A comparison of Figs. 3A and 3B makes clear that a remarkable upshift in expression levels results from these silent mutations.
The observed enhancement of expression suggests that the
three silent mutations indeed lead to more efficient translation of the transcript.
Structures formed in RNA which have a free energy of
stabilization exceeding about −5.5 kcal/mol are known to
disrupt translation [4]. Using the web interface to the software RNAfold (URL: http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi) [5] we determined the stability of the DNA
and RNA forms of the secondary structural element
shown in Fig. 2C, and found these to be −6.36 and
−11.97 kcal/mol, respectively [6, 7]. Both of these structures are stable; however, the RNA–RNA structure is the
more stable of the two, as expected. Certainly,
the RNA structure appears to be stable enough to interfere
with translation, explaining the enhancement of expression
seen upon its destruction, whereas the DNA structure
appears to be not quite as stable. In comparison, the DNA
structure of the hairpin shown in Fig. 2B that causes the
artifact was assessed to have a stability of only −3.24 kcal/mol.
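The comparison with the threshold can be laid out explicitly. In the Python sketch below, the reported stabilities are taken as negative free energies in kcal/mol (−5.5 for the threshold of [4]; −6.36, −11.97 and −3.24 for the three structures), and a structure is flagged when its free energy is at or below the threshold. The dictionary keys and the function name `at_least_as_stable_as` are illustrative labels, not terms from the paper.

```python
THRESHOLD_KCAL_PER_MOL = -5.5  # stability beyond which translation is reported to be disrupted [4]

# Values as reported in the text, read as negative free energies (kcal/mol).
REPORTED_DG = {
    "Fig. 2C, DNA form": -6.36,
    "Fig. 2C, RNA form": -11.97,
    "Fig. 2B, DNA hairpin": -3.24,
}


def at_least_as_stable_as(dg, threshold=THRESHOLD_KCAL_PER_MOL):
    """A more negative free energy means a more stable structure."""
    return dg <= threshold


flags = {name: at_least_as_stable_as(dg) for name, dg in REPORTED_DG.items()}
ranked = sorted(REPORTED_DG, key=REPORTED_DG.get)  # most stable first

for name in ranked:
    print(f"{name}: {REPORTED_DG[name]:.2f} kcal/mol, beyond threshold: {flags[name]}")
```

Only the RNA form is relevant to translation; the DNA values are included because they are what the sequencing electrophoresis actually reports on.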
Although this is the first time in the literature that
gene expression problems have been shown to be reflected
in compression artifacts in DNA sequencing electropherograms, it may be held that there are other ways of
examining the possibilities for secondary structure formation in mRNA that could interfere with translation, e.g.,
through software. We subjected the sequence of the cel12A
gene to analysis by the RNAfold software. The possibilities
for intramolecular interactions occurring within different
regions of a large stretch of mRNA sequence are so
immense in number that any software can only be expected to examine a limited set of them. Programs like
RNAfold thus generally try first to find interactions between the 5′- and 3′-ends of the sequence used as input,
before examining the remaining possibilities. Even so, as
can be clearly seen in Fig. 3C, RNAfold shows a very large
number of structure-forming possibilities when the entire
cel12A gene sequence is used as input. Notably, the secondary structure that we destroyed through mutations does
not even show up in the structure shown in Fig. 3C. Given
the high density of stems and loops in the predicted
structure, it is worth reflecting upon the argument that if,
in real life, the mRNA chain were indeed to look like it
does in Fig. 3C, it would probably never be successfully
translated by a ribosome. Clearly, therefore, the results
from software analyses are only as useful as the (shortness
of the) length of sequence used as input, since short
sequences have a reduced set of possibilities to be examined.
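One practical reading of this point is to feed fold-prediction software short, overlapping stretches rather than the whole gene. The Python sketch below only does the slicing. The function name `sliding_windows`, the window length and the step are arbitrary illustrative choices, and each window would still have to be submitted to a program such as RNAfold separately.

```python
def sliding_windows(seq, window, step):
    """Yield (start, subsequence) pairs, where start is a 0-based offset into seq.

    Windows advance by `step` bases and are `window` bases long. If seq is
    shorter than `window`, a single window holding the whole sequence is yielded.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, seq[start:start + window]


# Example: the 43-nt stretch quoted in the text, cut into 30-nt windows every 5 nt.
REGION = "GGATCCACTGTCGAGCTGTTCGGACAATGGGACACGAGAACGG"
for start, sub in sliding_windows(REGION, window=30, step=5):
    print(start, sub)
```

Which window length is appropriate remains an open choice; the compression artifact offers an experimental way of deciding where to look.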
For the molecular biologist, the question that then arises
is the following: which stretch of sequence should one use as
input when working with RNA fold-prediction software? Our
finding that the compression artifact is separated from the
RBS by 50 bases shows that secondary structures do not
necessarily have to interfere directly with ribosome binding
in order to affect translation. In effect, any secondary structure that is actually formed in a stable manner could potentially interfere with translation. Surely, therefore, instead of
examining a large set of theoretical structure-forming possibilities and eliminating them one by one to examine effects
on expression, it would be much more profitable for one to
search for and identify the DNA/RNA sequences that definitively, and naturally, form stable enough secondary structures to give rise to compression artifacts, and destroy these,
since one would at least be sure that these secondary structures do form.
Figure 3. (A) Expression of the genetic construct corresponding
to Fig. 1A. Molecular weight markers are shown in lane 5. The
14.4 kDa marker which is also seen in the right panel has run out
of the gel. Lanes 1, 3, 6, and 8 show total cell lysates for uninduced cultures of four different clones. Lanes 2, 4, 7, and 9 show
the corresponding respective total lysates for induced cultures.
(B) Expression of the genetic construct corresponding to Fig. 1B.
Molecular weight markers are shown in lane 9. Lanes 1, 3, 5, and
7 show total cell lysates for uninduced cultures of four different
clones. Lanes 2, 4, 6, and 8 show the corresponding respective
total lysates for induced cultures. Note that the three silent
mutations introduced in the construct (which led to the clearing
up of the sequence anomaly shown in Figs. 1A and B) also
resulted in a profound increase in expression (except in the clone
shown in lanes 7 and 8 in the right panel). (C) Secondary structural
analysis of the entire cel12A gene by RNAfold software. The
sequence that creates the compression artifact is in the upper line
of the encircled region, and shown stretched out over a long
length towards the left of the figure, rather than as a hairpin,
showing that the identified structure is not predicted through
whole gene analysis.
Thus, given our observation that secondary structure
prediction software could not predict the location of the critical secondary structure-forming region when the entire
cel12A gene’s sequence was analyzed (although, of course,
once the region had been located, its potential to form secondary structure was confirmed through such software), we
propose that searching for compression artifacts in DNA
sequences may be a better approach for the identification of
mRNA secondary structural elements which suppress translation, than software-based searches.
It is known that compression artifacts can be destroyed
by performing electrophoresis with the product of a DNA
sequencing reaction at temperatures higher than room temperature, since heating destroys hydrogen bonds stabilizing
DNA secondary structure [1–3]. Therefore, we propose that
in order to identify all potential regions that can actually
form secondary structural elements in an mRNA chain,
electrophoretic analyses of DNA sequencing reactions be
performed at temperatures lower than room temperature, to
favor the hydrogen bonding necessary to generate all possible compression artifacts.
In most automated sequencers, capillary oven temperatures tend to be set at 60 °C to prevent the occurrence of any
complementarity-based annealing interactions either
within, or between, pieces of ssDNA. Available instrument
control software, however, allows for the setting of temperatures at much lower values; e.g., in the instrument used
by us, the software allows for the temperature to be set
anywhere between 18 and 65 °C. We propose that an oven
temperature setting of 37 °C would be optimal for searching
for secondary structures within mRNA being translated in
the expression host Escherichia coli. Of course, for thermophile- or hyperthermophile-derived genes, particularly
robust secondary structure can survive and show up even at
higher temperatures, as turned out to be the case, e.g., in
the sequencing experiment that led to the observation
reported here, for which an oven temperature of 60 °C was
used. Less robust structures would be likely to be revealed
by electrophoresis at 37 °C.
Below, we provide some details of the sequencing conditions used by us for the engineered cel12A gene. The sequencing reaction itself used the Big-Dye terminator cycle
sequencing chemistry Version 3.1. Reactions were loaded
onto an Applied Biosystems ABI 3130 XL Genetic Analyzer,
using an oven temperature of 60 °C, a prerun voltage of
15 kV, an injection voltage of 1.6 kV using an injection time
of 18 s, a run voltage of 13.4 kV using a run time of 1800 s
and a current stability of 5 mA. The length of the capillary
used for electrophoresis was 50 cm, optimized for a read
length of 850 bases with a quality value setting of 20. The
medium used for electrophoresis was the ABI performance
optimized polymer-7 (POP-7). POP-7 was present at between
4 and 5 mL of volume per capillary in the 16-capillary
sequencer. Data were collected with ABI’s Data Collection
Software Version 3.0, and the sequence analyzed with the
Sequence Analysis Software Version 5.2.
We thank Mr. Deepak Bhatt for technical assistance with electrophoresis.
References
[1] Sanger, F., Coulson, A. R., J. Mol. Biol. 1975, 94, 441–446.
[2] Maniatis, T., Jeffrey, A., van de Sande, H., Biochemistry 1975, 14, 3787–3794.
[3] Sanger, F., Nicklen, S., Coulson, A. R., Proc. Natl. Acad. Sci. USA 1977, 74, 5463–5467.
[4] Mukund, M. A., Bannerjee, T., Ghosh, I., Datta, S., Curr. Sci. 1999, 76, 1486–1489.
[5] Hofacker, I. L., Nucleic Acids Res. 2003, 31, 3429–3431.
[6] SantaLucia, J., Proc. Natl. Acad. Sci. USA 1998, 95, 1460–1465.
[7] Mathews, D. H., Sabina, J., Zucker, M., Turner, H., J. Mol. Biol. 1999, 288, 911–940.