Download Sequence Architecture Downstream of the

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Protein–protein interaction wikipedia , lookup

Amino acid synthesis wikipedia , lookup

Protein wikipedia , lookup

Epitranscriptome wikipedia , lookup

RNA interference wikipedia , lookup

Real-time polymerase chain reaction wikipedia , lookup

Magnesium transporter wikipedia , lookup

Transcriptional regulation wikipedia , lookup

Biochemistry wikipedia , lookup

Secreted frizzled-related protein 1 wikipedia , lookup

Biosynthesis wikipedia , lookup

Plant breeding wikipedia , lookup

Community fingerprinting wikipedia , lookup

Ridge (biology) wikipedia , lookup

Point mutation wikipedia , lookup

Genomic imprinting wikipedia , lookup

Promoter (genetics) wikipedia , lookup

Two-hybrid screening wikipedia , lookup

Proteolysis wikipedia , lookup

Gene wikipedia , lookup

Gene regulatory network wikipedia , lookup

Endogenous retrovirus wikipedia , lookup

Gene expression wikipedia , lookup

RNA-Seq wikipedia , lookup

Silencer (genetics) wikipedia , lookup

Genetic code wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Expression vector wikipedia , lookup

Transcript
Sequence Architecture Downstream of the Initiator Codon
Enhances Gene Expression and Protein Stability in Plants1
Samir V. Sawant, Kanti Kiran, Pradhyumna K. Singh, and Rakesh Tuli*
National Botanical Research Institute, Rana Pratap Marg, Lucknow-226001, India
Nucleotide positions conserved on the 3⬘ side of the initiator codon ATG and the corresponding N-terminal amino acid
residues in a number of highly abundant plant proteins were identified by computational analysis of a dataset of highly
expressed plant genes. The reporter genes uidA and gfp were modified to introduce these features. Insertion of GCT TCC
TCC after the initiator codon ATG augmented expression for both the reporter genes. The insertion of each successive codon
improved the expression of ␤-glucuronidase (GUS) in an incremental fashion in transient transformation of tobacco
(Nicotiana tabacum) leaves. The insertion of alanine-serine (Ser)-Ser resulted in about a 2-fold increase in the stability of GUS.
However, this did not account for the 30- to 40-fold increase in GUS activity between the constructs coding for methioninealanine-Ser-Ser-GUS and the native enzyme. Substitution of the codon for Ser at the third amino acid residue with
synonymous codons reduced GUS expression. The results suggest a role for the conserved nucleotides in the ⫹4 to ⫹11
region in augmenting posttranscriptional events in the expression of genes in plants.
Sequences flanking the translation initiator codon
ATG have been surveyed in vertebrate (Cavener,
1987; Kozak, 1987) and plant (Joshi et al., 1997;
Sawant et al., 1999) gene databases to predict context
requirements for translational initiation. Some of the
predictions made by computational analysis have
been substantiated experimentally by using in vitro
translation systems. The current model (Kozak, 1989)
for the initiation of eukaryotic translation broadly
postulates that the first ATG codon is recognized by
scanning of the mRNA sequence from the 5⬘ end by
the 40S ribosomal initiation complex. The ⫺3 and ⫹4
positions (where the A of ATG is ⫹1) are considered
as the most important in determining a favorable
context of initiator ATG (Sullivan and Green, 1993).
The nucleotides upstream of the ATG are believed to
comprise the context required for efficient initiation
of translation. The modulation of gene expression by
nucleotides upstream of the ATG has been studied in
plant cells (Taylor et al., 1987; McElroy et al., 1991;
Dinesh-Kumar and Miller, 1993; Johnston and
Rochon, 1996; Marcin et al., 2000), though the functional roles of the preferred nucleotides at the ⫺1,
⫺2, and ⫺3 positions are not known. Specific, “most
preferred” nucleotides at these positions were reported to give a two-fold variation in gene expression, depending upon the plant species and the cell
type (Marcin et al., 2000). Using in vitro translation,
Lütcke et al. (1987) reported differences in the nucleotide requirement at the ⫺3 position between mammalian (rabbit reticulocyte) and plant (wheat germ)
1
This work was supported by fellowships (to S.V.S. and P.K.S.)
and by a research grant from The Council of Scientific and Industrial Research, Government of India.
* Corresponding author; e-mail [email protected]; fax
0522–205836.
1630
systems. The role of the sequence context downstream of the ATG, especially beyond the ⫹5 position, has not been examined in plants. The G at ⫹4
and A at ⫹5 positions were suggested to determine
the efficient utilization of ATG initiation sites in in
vitro translation experiments using the rabbit reticulocyte system (Grunert and Jackson, 1994). However,
in disagreement with this study and that of Boeck
and Kolakofsky (1994), Kozak (1997) reported a role
of only up to the ⫹4 position in correct recognition of
the initiator codon. The extent of conserved positions
further downstream of ⫹4 and their function in in
vivo gene expression have not been studied in detail.
In none of the earlier studies was the context analysis
done after classifying the gene database with respect
to their level of expression.
The nucleotides following the initiator ATG also
determine the type of amino acid residues at the N
terminus of encoded proteins. This can additionally
influence gene expression indirectly by altering the
stability of proteins. A variety of mechanisms that
affect the intracellular stability of proteins in yeast
(Saccharomyces cerevisiae; Bachmair et al., 1986) and
mammalian cells (Lèvy et al., 1999) have been reported to operate by recognizing the type of amino
acids at the N-terminal end. We examined a dataset
of highly expressed plant genes to see if they had
features conserved downstream of the ATG initiator
codon and if such features were in contrast to the
genes expressed at low levels. The functional validity
of the features noticed in computational analysis was
established by determining their role on the level of
in vivo gene expression. Keeping the ATG upstream
context constant, this study reports the influence of
nucleotide sequences conserved on the 3⬘ side of the
initiator ATG on in vivo expression of uidA and gfp
Plant Physiology, August
2001, Vol.
pp. 1630–1636,
www.plantphysiol.org
© 2001 American Society of Plant Biologists
Downloaded
from126,
on June
18, 2017 - Published
by www.plantphysiol.org
Copyright © 2001 American Society of Plant Biologists. All rights reserved.
ATG Initiator Codon and Gene Expression
genes, using a transient assay involving biolistic
transformation of tobacco (Nicotiana tabacum) leaves.
RESULTS
Features Conserved on the 3ⴕ Side of the Initiator
ATG in Highly Expressed Plant Genes
The underlying assumption of this study was that
the features characteristic of the 3⬘ side of the initiator ATG codon found in a number of genes expressed
at high levels in plants may reveal specialized architecture that facilitates the high level of their expression in vivo. Two features were noticed. First, there
was a significant bias for specific nucleotides at positions ⫹4, ⫹5, ⫹6, ⫹8, ⫹9, ⫹10, and ⫹11 in the
dataset of highly expressed genes (Table I). This results in a second feature that characterizes the abundantly expressed plant proteins. The predominance
of GCT at the ⫹4 to ⫹6 positions gives an Ala next to
N-terminal Met in 95% of such proteins. Other conserved nucleotides in the second and third codon
positions similarly result in the predominance of Ser
as the third and the fourth residues in the highly
expressed plant proteins (Table I). Features of the
canonical ATG downstream sequence identified as
conserved in the dataset of highly expressed plant
genes remained the same, irrespective of whether,
in the case of gene families, multiple members
or orthologous representatives were considered in
the analyses. When all members of the gene families were considered, the dataset comprised 236 entries. This dataset resulted in the nucleotide sequence
on 3⬘ of the initiator ATG and the corresponding
N-terminal amino acid sequence shown in Table I.
Table I. Modification of initiator ATG-downstream sequence in
reporter genes
Conserved ATG downstream features in highly expressed plant
genes: (a) nucleotides to 3⬘ of initiator ATG, A100T100G100G98C94T75
NC62C40(T37/A35)C48, and (b) N-terminal amino acids, Met100
Ala95Ser32Ser31. uidA Constructs with modified ATG downstream sequence: upstream primers, 5⬘-AATTACATCTAGATAAACAATGⴱTTACGTCCTGTAGAAACCCCAA-3⬘. gfp Constructs with modified ATG
downstream sequence: upstream primers, 5⬘-ATGCATTCTAGATCAACAATGⴱGGTAAAGGAGAAGAACTTTTCACTGGAGTT3⬘.
Name of Construct
uidA Constructs
(i) M-GUS
(ii) MA-GUS
(iii) MAS1-GUS
(iv) MAS1S-GUS
(v) MDS1S-GUS
(vi) MACS-GUS
(vii) MAS2S-GUS
(viii) MAS3S-GUS
(ix) MAS4S-GUS
gfp Constructs
(i) M-GFP
(ii) MASS-GFP
Insertion at ⴱ
Name of
Upstream Primer
None
GCT
GCT TCC
GCT TCC TCC
GAT TCC TCC
GCT TGC TCC
GCT TCG TCC
GCT AGC TCC
GCT AGT TCC
Primer
Primer
Primer
Primer
Primer
Primer
Primer
Primer
Primer
None
GCT TCC TCC
Primer 10
Primer 11
Plant Physiol. Vol. 126, 2001
1
2
3
4
5
6
7
8
9
Reconstruction of the database to include only one
representative sequence per gene family gave a dataset comprising of 74 entries. This smaller dataset
resulted in a similar conserved ATG 3⬘ nucleotide
sequence, that is to say A100 T100 G100 G98 C92 T75 T40
C60 C40 (T37/A35) C41, and corresponding N-terminal
amino acid sequence Met100 Ala92 Ser36 Ser30. Results
from the complete dataset shown in Table I were
used for detailed analyses.
Conservation of Codons Following Initiator ATG in
Highly Expressed Plant Genes
A high frequency of G at ⫹4 and C at ⫹5 noticed in
our dataset of highly expressed plant genes has been
documented earlier in vertebrate genes (Grunert and
Jackson, 1994). It was also correlated with a high
frequency of Ala as the corresponding amino acid at
the second position in vertebrate (Grunert and Jackson, 1994) and plant (Luehrsen and Walbot, 1994)
proteins. However, the advantage, if any, of an Ala
next to the initiator Met has not been reported. The
preferred occurrence of T at ⫹6, which is the wobble
position in the codon for Ala, is characteristic of the
penultimate N-terminal codon in the dataset developed by us. In an earlier study (Murray et al., 1989),
the frequency of GCT employed for Ala in plant gene
database was calculated at 37%. Quite comparably,
the dataset of highly expressed nuclear genes analyzed by us also showed GCT for Ala at 39% positions in the reading frame. However, the frequency
of GCT usage at ⫹4 to ⫹6 position in the same
dataset was 78%. This points to a possible role of U at
⫹6 position over and above a possible advantage in
having an Ala as the second amino acid in abundantly expressed proteins. Likewise, a preferred deployment of CC at the second and third codon positions, with Ser as the third amino acid residue, could
be indicative of an architecture at ⫹8 and ⫹9 that
facilitates a high level of gene expression. The UCC is
used in only 20% of the cases for Ser elsewhere in the
dataset of highly expressed nuclear genes in plants.
However, Ser at the third amino acid position in the
highly expressed genes is coded by UCC in 46% of
the cases. The ⫹10 and ⫹11 positions show a preference for T/A and C, respectively, and a preference
for Ser as the fourth N-end residue. Of the highly
expressed proteins, 46% had Met-Ala, 18% had MetAla-Ser, 17% had Met-Ala-X-Ser, and 14% had MetAla-Ser-Ser as the N-terminal amino acids. The last
class comprised members of the RuBP carboxylase
small subunit family from different plants and forms
part of the most abundant proteins in plants.
Effect of Conserved Features on Expression of the
Reporter Genes
The functional validity of the conserved features
was established by comparing the suitably mutated
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2001 American Society of Plant Biologists. All rights reserved.
1631
Sawant et al.
uidA genes in transient expression, following their
bombardment on tobacco leaves. The uidA gene modified (Table I) to resemble the ATG downstream architecture of the highly expressed plant genes up to
⫹11 position, i.e. MAS1S-␤-glucuronidase (GUS),
showed a 30- to 40-fold increase in GUS expression in
tobacco leaves. The activity increased progressively
(Fig. 1) as the nucleotide context in Met-Leu-Leu-ProGUS (M-GUS) was progressively extended to include
the conserved motif up to ⫹11 in MAS1S-GUS. In
order to establish that the increase in GUS activity
was the result of the specific ATG downstream nucleotide sequence and/or the corresponding amino
acid residues and not merely the consequence of
providing a 5⬘ extension to the mRNA, the expression of GUS was examined from constructs carrying
point mutations of MAS1S-GUS. As shown in Table
II, the substitution of C at ⫹5 with A (MDS1S-GUS,
results in Ala23-Asp) and C at ⫹8 with G (MACSGUS, results in Ser33-Cys) resulted in a significant
decline in GUS activity. The decline was most severe
in the case of the ⫹5 mutation (MDS1S-GUS) and that
corresponds with this position being conserved in
94% of the highly expressed genes (Table I). The
importance of a specific nucleotide per se at a given
position was further examined by making synonymous substitutions in TCC, the third N-terminal
codon. The C3G mutation at ⫹9 (MAS2S-GUS) gave
a 5.5-fold reduction in expression (Table II), although
both MAS1S- and MAS2S-GUS have Ser as the third
N-terminal residue. A double mutation in the third
codon, i.e. T3A and C3G at positions ⫹7 and ⫹8,
respectively, in MAS3S, resulted in a six-fold reduction in expression level. When the TCC was replaced
by the synonymous codon AGT (MAS4S), thereby
mutating all three positions without altering the
amino acid, the expression was reduced by
seven-fold.
Validation of the results on the augmentation of
GUS activity by the ATG 3⬘ motif in MAS1S-GUS was
further substantiated by inserting the same motif
after the ATG of a second reporter gene, in this case
gfp. Insertion of the canonical ⫹4 to ⫹11 motif en-
Figure 1. Effect of ATG downstream modifications on in vivo GUS
activity in cell-free extracts prepared at different time points after
bombardment of tobacco leaves with gene constructs expressing
M-GUS (f), MA-GUS (䡺), MAS1-GUS (F), and MAS1S-GUS (E).
1632
Table II. Expression of reporter proteins with N-terminal modifications
Name of Construct
GUS constructs
(i) M-GUS
(ii) MA-GUS
(iii) MAS1-GUS
(iv) MAS1S-GUS
(v) MDS1S-GUS
(vi) MACS-GUS
(vii) MAS2S-GUS
(viii) MAS3S-GUS
(ix) MAS4S-GUS
GFP constructs
(i) M-GFP
(ii) MAS1S-GFP
Activity
Fold Activation
14.6a (⫾1.2)
59.9 (⫾4.1)
181.6 (3.9)
603.0 (⫾13.8)
4.2 (⫾0.3)
44.4 (⫾2.6)
110.2 (⫾2.8)
100.6 (⫾1.5)
86.3 (⫾3.6)
1.0
4.1
12.4
41.2
0.3
3.1
7.6
6.9
5.9
16.3b (⫾1.1)
345.5 (⫾9.7)
1.0
21.2
a
GUS ⫾ SD (glucuronidase activity expressed as ⫻103 pmol/h/mg
b
protein).
GFP ⫾ SD (GFP activity expressed as ⫻102 relative
fluorescence/mg protein).
hanced the expression of MAS1S-green fluorescent
protein (GFP) also by 21-fold (Table II) as compared
with the wild-type M-GFP.
Given the fact that the gene architecture upstream
of the initiator codon is exactly similar in all the
constructs, the increase in expression in the MAS1Sfusion constructs is likely to be due to the enhancement of one or more posttranscriptional events. It
could be due to improved transcript stability, higher
translational efficiency, an increase in activity of the
proteins due to altered N-terminal conformation, decrease in their rate of intracellular degradation, or a
combination of one or more of these aspects. The
different GUS constructs were examined in detail to
resolve these possibilities.
Substrate saturation curves (Fig. 2) showed that all
four forms of GUS, that is to say M-GUS, MA-GUS,
MAS1-GUS, and MAS1S-GUS, exhibited exactly similar reaction kinetics. Similar K0.5 values and saturation curves of the four forms suggested that the
N-terminal modifications made in this study did not
enhance the enzyme activity by imparting a catalytically favorable functional conformation to GUS.
Similarly, the decline in GUS activities in MDS1SGUS and MACS-GUS was not due to an effect on the
catalytic site of the enzyme.
Figure 3 shows the results of reverse transcriptase
(RT)-PCR, to compare relative steady state levels of
GUS transcripts formed after bombardment with the
constructs coding for M-GUS and MAS1S-GUS. From
the 26th to 30th cycle, the amplification products
showed a linear increase. The level of amplification
achieved in different cases was comparable, suggesting that the different constructs transcribed uidA at
similar levels. This excluded any substantial contribution of transcription and transcript stability on the
30- to 40-fold higher level of GUS expression from
MAS1S-GUS as compared to M-GUS.
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2001 American Society of Plant Biologists. All rights reserved.
Plant Physiol. Vol. 126, 2001
ATG Initiator Codon and Gene Expression
Figure 2. Substrate saturation curves of GUS in cell-free extracts
prepared from tobacco leaves, 36 h after bombardment with different
ATG constructs. Using 4-methylumbelliferyl ␤-D-glucuronide as the
variable substrate, the raw data on GUS-specific activities were
plotted (shown in inset, with y axis as in Fig. 1) and then normalized
to allow visual comparison of the shape of the curves. MDS1S-GUS
(‚) and MACS-GUS (Œ) are also included, besides the constructs
given in Figure 1.
Effect of Conserved N-Terminal Amino Acids on
Protein Stability
The cycloheximide treatment arrested the increase
in GUS activity (Fig. 4). The half-life of GUS was
calculated by using model y ⫽ ␣ ⫹ ␤␳x as the best fit.
The activity of M-GUS declined with an average
half-life of 4.26 h. Insertion of an Ala after the initiator Met gave a distinct advantage in reducing turnover of MA-GUS by 1.5-fold. The half life of GUS
increased progressively in the constructs MA-GUS,
MAS1-GUS, and MAS1S-GUS to 6.8, 7.6, and 9.3 h.
However, the gain in intracellular stability was not
sufficient to account for the nearly four-, eight-, and
30-fold higher in vivo activity seen, respectively (Fig.
1), in the case of MA-GUS, MAS1-GUS, and MAS1SGUS, as compared with M-GUS.
lowing the initiator ATG does not enhance stability
of the transcripts. A two-fold increase in intracellular
stability of GUS protein suggests a modest role of the
N-terminal amino acids in determining the half-life
of proteins. However, a majority of the 30- to 40-fold
increase in the expression of GUS following insertion
of the ATG downstream conserved motif seems to be
due to posttranscriptional events, although we can
rule out protein and transcript stability. Though detailed steps in translational initiation and elongation
are not understood, it is possible that nucleotides
downstream of the ATG from ⫹4 to ⫹11 influence
these events. Translation initiation factors (Raught et
al., 2000) and ATG upstream nucleotides (Lütcke et
al., 1987) specific to plants have been reported and
may have distinct roles in determining translational
efficiency. It is possible that efficient ribosomal recruitment at the initiator ATG involves an interaction
between the ⫹4 to ⫹11 positions and the 48S preinitiation complex in plants. The experiments with
synonymous substitutions suggest that the TC at ⫹7
to ⫹8 and the C at ⫹9 possibly influence translational
efficiency favorably. Similarly, a C at ⫹5 appears to
be critical. In vitro studies on mRNAs with several
substitutions in the ATG downstream region are required to understand the mechanism of such activation. The role of ATG downstream features in translational control is not known, though enhancer
elements downstream of the ATG initiation codon
have also been postulated in certain bacterial genes
(O’Connor et al., 1999).
The results reported here show that the intracellular stability of GUS expressed in tobacco leaves was
DISCUSSION
Our study shows that computational analysis of
highly expressed plant genes can be used to identify
features in gene architecture that possibly contribute
toward determining their high level of expression.
Though context sequences around the initiator ATG
have been recognized earlier, such analyses have not
been done after classifying the genes with respect to
their level of expression or function. Following classification of genes into a dataset of highly expressed
plant genes, novel features were identified downstream of the initiator ATG in this study. The validity
of their function in augmenting the level of expression was supported by introducing such features in
two reporter genes, uidA and gfp.
The results on the steady state of GUS transcripts in
various constructs suggest that insertion of the conserved nucleotide sequence GCTNCC(T/A) CN folPlant Physiol. Vol. 126, 2001
Figure 3. Quantification of GUS transcripts by RT-PCR. One microgram of total RNA prepared from tobacco leaves bombarded with
different constructs was subjected to RT-PCR. The amplification
products in aliquots drawn at different stages were compared after
electrophoresis. The agarose gel shows HaeIII ØX174 molecular
weight standards (lane 1), PCR of standard GUS plasmid (lane 2),
RT-PCR of control (non-bombarded; lane 3), and that of M-GUS
(lanes 4, 6, 8, and 10) and MAS1S-GUS (lanes 5, 7, 9, and 11) on
samples drawn after 26, 28, 30, and 40 cycles of amplification.
Another control, comprising total RNA from MAS1S-GUS leaf but
without adding the RT during the RT-PCR, showed no fragment after
40 cycles of amplification (lane not included).
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2001 American Society of Plant Biologists. All rights reserved.
1633
Sawant et al.
Ser-Ser, using the appropriate codons, may be a useful N-terminal tag for designing vectors for
enhancing the stability of proteins in plants, provided their catalytic function is not affected by such
modification.
MATERIALS AND METHODS
Computational Analysis
Figure 4. Relative activity of GUS in cell-free extracts prepared at
different time points after transfer of leaves to medium containing
cycloheximide. One set of leaf discs in each case was transferred to
medium containing cycloheximide, 60 h after the bombardment.
Each data point gives the average of six independent bombardment
events. The symbols are the same as in Figure 1.
influenced by the specific nucleotide sequence and
the corresponding amino acids at the N-terminal end.
The stabilizing effect increased when the Met exposed at the N terminal of GUS was followed by
Ala-Ser-Ser as the next three residues, with the maximum effect being observed when all three were
present. Applicability of the result to GFP, used as a
second reporter protein, further supports the validity
of a purely computational analysis done by classifying genes by their level of expression. The results
suggest that the specific nucleotide sequence and the
corresponding amino acids at the N-terminal end of
highly expressed proteins may have been evolutionarily selected to obtain high level of expression of
these proteins. Interestingly, in Escherichia coli the
presence of amino acids with smaller side chain
lengths, i.e. Ala or Ser at the second position, was
reported as particularly unfavorable to N-terminal
stability of ␤-galactosidase (Hirel et al., 1989). The
excision of N-terminal Met by Met aminopeptidase
was reported to decrease in E. coli when amino acids
with a longer side chain were inserted at the penultimate position of ␤-galactosidase. Thus, the length
parameter rule for E. coli proteins beginning with
Met does not apply to GUS and GFP stability in
tobacco leaves.
In the case of processed proteins, different sets of
amino acids at the non-Met N terminal impart stability in plants (Hondred et al., 1999), yeast, and rabbit
reticulocytes (Tobias et al., 1991). Intracellular stability in such cases is determined by the ubiquitindependent N-end rule pathway. For instance, Met,
Pro, Val, or Gly at the amino terminus stabilizes
proteins in reticulocytes (Tobias et al., 1991). In addition to these, Thr, Ser, Ala, or Cys stabilizes proteins in yeast. In E. coli, any of the above with the
exception of Pro and, in addition, Gln, Asn, Glu, Asp,
Ile, or His, stabilize proteins. The principles of intracellular stability of ubiquitinylated proteins have recently been employed to enhance the accumulation
of heterologous proteins in transgenic tobacco (Hondred et al., 1999). Our results suggest that Met-Ala1634
The EMBL gene database was screened manually to
create a subset of angiosperm genes, potentially expressed
at high level in plants, irrespective of their tissue or environmental specificities. The genes were classified as highly
expressed, based on published information (Ruan et al.,
1998; several other references cited in Sawant et al., 1999),
on their expression level, various expressed sequence tags,
and microarray analyses.
The selected dataset comprised highly expressed plant
genes, several of which occur as gene families, that is to say
chlorophyll a/b binding proteins, late embryogenesis
abundant proteins, RuBP carboxylase small subunit, seed
storage proteins (2S albumins, 7S albumins, 12S globulins,
prolamins, napins, oleosins, 19-kD globulins, phaseolins,
kafirins, zeins, and citrins), lectins, histones, photosystem
related proteins (photosystem I proteins, photosystem II
proteins, and cytochromes), nucleus coded mitochondrial
proteins (RP15, citrate synthases, ATPases, and cytochromes), ribosomal proteins (L36, S10, S28, CL15, CL9,
CS1, and CS70) Phe ammonia lyases, acyl carrier proteins,
calmodulins, peroxidases, catalases, Pro- and Gly-rich proteins, extensins, polygalacturonases, cyclins (delta, 2A, 2B,
3A, and 3B), nodulins, translational elongation factors (TS,
TU, and 1␤), ␣-amylases, ␤-amylases, and amylase inhibitors. Other entries represented unique genes, that is to say
actinidin protease from Actinidia delicosa, cinnamyl alchohol dehydrogenase, calcium binding protein, flavone
isomerase, ferrodoxin, RNA binding protein from Arabidopsis, microspore specific protein from Brassica napus,
fructokinase, phosphoglycerate kinase, ascorbate oxidase,
nitrilase from tobacco (Nicotiana tabacum), allergen protein,
glutathione reductase, lipoamide dehydrogenase from Pisum sativum, latex protein from Papaver somniferum, arabinogalactan, lipid transfer protein, Suc transfer protein from
Pinus taeda, etc.
The sequences were manually aligned with respect to
translational initiator ATG, as identified by the authors
against individual entry in the database. Information about
the ATG region of the 236 entries was employed in this
analysis. A complete list of genes classified broadly with
respect to their level of expression and used in this study is
available on the Internet at www.geocities.com/rakeshtuli/PLNATG.htm, along with respective database references. The dataset of 236 entries was translated into nascent protein sequences to survey amino acid residues at the
first 10 positions from the N-terminal region. The conserved nucleotide and the amino acid positions as determined by statistical tests of significance are given in Table
I. A detailed analysis showing that several nucleotide sequence features in the highly expressed plant genes are in
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2001 American Society of Plant Biologists. All rights reserved.
Plant Physiol. Vol. 126, 2001
ATG Initiator Codon and Gene Expression
contrast to the features in genes expressed ubiquitously at
low level has been published (Sawant et al., 1999). Differences in the levels of expression within gene families were
ignored while creating the dataset since the objective was
to identify features conserved in a majority of the members.
As explained in “Results,” the conclusions did not change
significantly if only one ortholog sequence representing a
given gene family was included in the dataset instead of all
the members reported from multiple plant species.
Construction of the Reporter Genes with Modified
ATG 3ⴕ Contexts
The uidA gene was amplified from pBI101.1, using nine
different upstream primers designed to obtain the desired
variants of the wild-type GUS mRNA. As given in Table I,
the standard upstream primer (Primer 1) was used to amplify the wild-type uidA gene that comprises the native
open reading frame that codes for M-GUS. The primers 2,
3, 4, 5, and 6 amplified GUS with Ala, Ala-Ser, Ala-Ser-Ser,
Asp-Ser-Ser, and Ala-Cys-Ser, respectively, inserted between the first Met and the second Leu in the wild-type
protein. The primers 7, 8, and 9 amplified variants of the
Ala-Ser-Ser-coding mRNA in which the codon for the
former Ser was substituted by synonymous codons. The
nine constructs are referred to as M-, MA-, MAS1-,
MAS1S-, MDS1S-, MACS-, MAS2S-, MAS3S-, and MAS4SGUS, respectively. These were cloned into pUC19 by PCRbased ligation, in front of a highly expressed plant promoter (Sawant et al., 2001). The PCR was conducted using
VentR DNA polymerase, which is known for a low error
rate (5.7 ⫻ 10⫺5, which is very close to 4 ⫻ 10⫺5 for
Klenow). The cloned fragments were sequenced in the
ATG context region and found to be free of any errors. The
different constructs, therefore, had exactly the same
ATG-5⬘ translational context, untranslated leader, and
promoter-regulatory region. They differed only with respect to the presence of additional codons inserted after the
initiator Met as per the architecture identified in highly
expressed plant genes. The primers 10 and 11 (Table I)
were used (as upstream primers) to clone gfp (Reichel et al.,
1996) to validate the effect of the conserved features on the
expression of a second gene.
Analysis of Transient Expression of the Reporter
Genes in Tobacco Leaves
The transient expression studies were carried out by
microprojectile-mediated delivery of DNA in tobacco leaf
discs using an improved method (Sawant et al., 2000). The
modified protocol gives highly reproducible shot-to-shot
results. Stability of the GUS in the bombarded leaf was
measured by blocking de novo protein synthesis by placing
the leaf discs on medium containing 300 ␮g mL⫺1 cycloheximide. Following bombardment with DNA, the leaf was
cut into approximately 1-cm2 pieces and incubated on Murashige and Skoog-agar medium (in light). After 60 h of
incubation, the leaf pieces were removed to determine GUS
activity fluorometrically (Jefferson and Wilson, 1991). EnPlant Physiol. Vol. 126, 2001
dogenous GUS-like activity was suppressed by incorporating 20% (v/v) methanol in the reaction and treating the
samples at 55°C for 10 min. At least six independent bombardments were done in each case to estimate GUS activity
at each time point.
The single-point GUS and GFP activities were measured
after incubating six replicates of the tissue on agarMurashige and Skoog medium (in light) for 72 h after
bombardment. The cell-free extracts for GFP measurements
were prepared in 50 mm sodium phosphate buffer, pH 7.
Emission was measured at 510 nm, following excitation at
475 nm (Reichel et al., 1996).
Analysis of GUS Transcript by RT-PCR
The relative amounts of GUS transcripts present in tobacco leaf discs were estimated in terms of the product
formed in RT-PCR after bombardment of the discs with
different gene constructs. One microgram of total RNA
extracted in TRIZOL LS Reagent (Gibco-BRL, Cleveland)
was treated with RNase-free DNase (1 unit of enzyme per
microgram RNA; Amersham-Pharmacia, Uppsala), reverse
transcribed by Superscript II (Gibco-BRL), and subjected to
40 cycles of PCR using primers that amplified a 175-bp
fragment internal to the uidA gene. The phase of linear
increase in PCR products was determined by drawing
aliquots at different stages. The amount of amplified DNA
was estimated both with Hoechst dye (H 33258, Sigma, St.
Louis) and by scanning with the Flour-S (Bio-Rad Laboratories, Hercules, CA) documentation system after agarose
gel electrophoresis.
ACKNOWLEDGMENTS
We thank Dr. S.K. Mandal of Central Drug Research
Institute (Lucknow, India) for advice on statistical analysis
of the data and Dr. Jaideep Mathur for providing the gfp
construct.
Received January 18, 2001; returned for revision February
15, 2001; accepted April 18, 2001.
LITERATURE CITED
Bachmair A, Finley D, Varshavsky A (1986) In vivo halflife of a protein is a function of its amino-termianl residue. Science 234: 179–186
Boeck R, Kolakofsky D (1994) D Positions ⫹5 and ⫹6 can
be major determinants of the efficiency of non-AUG
initiation for protein synthesis. EMBO J 13: 3608–3617
Cavener DR (1987) Comparison of the consensus sequence
flanking translational start site in Drosophila and vertebrates. Nucleic Acids Res 15: 1353–1361
Dinesh-Kumar SP, Miller WA (1993) Control of start
codon choice on a plant viral RNA encoding overlapping
genes. Plant Cell 5: 679–692
Grunert S, Jackson RJ (1994) The immediate downstream
codon strongly influences the efficiency of utilization of
eukaryotic translation initiation codons. EMBO J 13:
3618–3630
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2001 American Society of Plant Biologists. All rights reserved.
1635
Sawant et al.
Hirel Ph-H, Schmitter J-M, Dessen P, Fayat G, Blanquet S
(1989) Extent of N-terminal methionine excision from
Escherichia coli proteins is governed by the side- chain
length of the penultimate amino acid. Proc Natl Acad Sci
USA 86: 8247–8251
Hondred D, Walker JM, Mathews DE, Vierstra RD (1999)
Use of ubiquitin fusions to augment protein expression
in transgenic plants. Plant Physiol 119: 713–723
Jefferson RA, Wilson KJ (1991) The GUS gene fusion
system. In S Gelvin, R Schilperoort, eds, Plant Molecular
Biology Manual, Vol B14. Kluwer Academic Publishers,
Dordrecht, The Netherlands, pp 1–33
Johnston JC, Rochon DM (1996) Both codon context and
leader length contribute to efficient expression of two
overlapping open reading frames of a cucumber necrosis
virus bifunctional subgenomic mRNA. Virology 221:
232–239
Joshi CP, Zhou H, Huang X, Chiang VL (1997) Context
sequences of translation initiation codon in plants. Plant
Mol Biol 35: 993–1001
Kozak M (1987) At least six nucleotides preceding the
AUG initiator codon enhance translation in mammalian
cells. J Mol Biol 196: 947–950
Kozak M (1989) The scanning model for translation: an
update. J Cell Biol 108: 229–241
Kozak M (1997) Recognition of AUG and alternative initiator codons is augmented by G in position ⫹4 but is not
generally affected by the nucleotides in positions ⫹5 and
⫹6. EMBO J 16: 2482–2492
Lèvy F, Johnston JA, Varshavsky A (1999) Analysis of a
conditional degradation signal in yeast and mammalian
cells. Eur J Biochem 259: 244–252
Luehrsen KR, Walbot V (1994) The impact of AUG start
codon context on maize gene expression in vivo. Plant
Cell Rep 13: 454–458
Lütcke HA, Chow KC, Mickel FS, Moss KA, Kern HF,
Scheele GA (1987) Selection of AUG codons differs in
plants and animals. EMBO J 6: 43–48
Marcin L, Marc F, Bènèdicte J, Arnaud S, Marc B (2000) In
vivo evaluation of the context sequence of the translation
initiation codon in plants. Plant Science 154: 89–98
McElroy D, Blowers AD, Jenes B, Wu R (1991) Construction of expression vectors based on the rice actin 1 (Act1)
1636
5⬘ region for use in monocot transformation. Mol Gen
Genet 231: 150–160
Murray EE, Lotzer J, Eberle M (1989) Codon usage in plant
genes. Nucleic Acids Res 17: 477–498
O’Connor M, Asai T, Squires CL, Dahlberg AE (1999)
Enhancement of translation by the downstream box does
not involve base pairing of mRNA with the penultimate
stem sequence of 16S rRNA. Proc Natl Acad Sci USA 96:
8973–8978
Raught B, Gingras A-C, Sonenberg N (2000) Regulation
of ribosomal recruitment in eukaryotes. In N Sonenberg, JWB Hershey, MB Mathews, eds, Translational
Control of Gene Expression, C.S.H. Press, New York, pp
245–293
Reichel C, Mathur J, Eckes P, Langenkemper K, Koncz C,
Schell J, Reiss B, Maas C (1996) Enhanced green fluorescence by the expression of an Aequoreu victoria green
fluorescent protein mutant in mono- and dicotyledonous
plant cells. Proc Natl Acad Sci USA 93: 5888–5893
Ruan Y, Gilmore J, Conner T (1998) Towards Arabidopsis
genome analysis: monitoring expression profiles of 1400
genes using DNA microarays. Plant J 15: 821–833
Sawant SV, Singh PK, Gupta SK, Madanala R, Tuli R
(1999) Conserved nucleotide sequence in highly expressed genes in plants. J Genet 78: 1–8
Sawant SV, Singh PK, Madanala R, Tuli R (2001) Designing of an artificial expression cassette for high level
expression of transgenes in plants. Theor Appl Genet
102: 635–644
Sawant SV, Singh PK, Tuli R (2000) Pretreatment of microprojecticles to improve the delivery of DNA in plant
transformation. BioTechniques 29: 246–248
Sullivan ML, Green PJ (1993) Post transcriptional regulation of nuclear-encoded genes in higher plants: the roles
of mRNA stability and translation. Plant Mol Biol 23:
1091–1104
Taylor JL, Jones JDG, Sandler S, Mueller GM, Bedbrook
J, Dunsmuir P (1987) Optimizing the expression of chimeric genes in plant cells. Mol Gen Genet 210: 572–577
Tobias JW, Shrader TE, Rocap G, Varshavsky A (1991)
The N-end rule in bacteria. Science 254: 1374–1377
Tuli R (2000) List of highly expressed genes used for
ATG region analysis. www.geocities.com/rakeshtuli/
PLNATG.htm (December 28, 2000)
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2001 American Society of Plant Biologists. All rights reserved.
Plant Physiol. Vol. 126, 2001