Microbiology (2004), 150, 2313–2325    DOI 10.1099/mic.0.27097-0

Relationship between codon biased genes, microarray expression values and physiological characteristics of Streptococcus pneumoniae

Antonio J. Martín-Galiano (1, 3), Jerry M. Wells (2) and Adela G. de la Campa (1)

(1) Unidad de Genética Bacteriana (CSIC), Centro Nacional de Microbiología, Instituto de Salud Carlos III, 28220, Majadahonda, Madrid, Spain
(2) Bacterial Infection and Immunity Group, Institute of Food Research, Norwich Research Park, Norwich NR4 7UA, UK

Correspondence: Antonio J. Martín-Galiano

Received 13 February 2004; revised 26 April 2004; accepted 28 April 2004

A codon-profile strategy was used to predict gene expression levels in Streptococcus pneumoniae. Predicted highly expressed (PHE) genes included those encoding glycolytic and fermentative enzymes, sugar-conversion systems and carbohydrate transporters. Additionally, some genes required for infection that are involved in oxidative metabolism and hydrogen peroxide production were PHE. Low expression levels were predicted for genes encoding specific regulatory proteins, such as two-component systems, and for competence genes. Correspondence analysis located 484 ORFs sharing a distinctive codon profile in the right horn of the plot. These genes had a mean G+C content (33.4 %) lower than that of the bulk of the genome coding sequences (39.7 %), suggesting that many of them were acquired by horizontal transfer. Half of these genes (242) were pseudogenes, ORFs shorter than 80 codons, or ORFs without an assigned function. The remaining genes included several virulence factors, such as capsular genes, iga, lytB, nanB, pspA and choline-binding proteins, and functions related to DNA acquisition, such as restriction-modification systems and comDE.
To compare the predicted translation rate with the relative amount of mRNA for each gene, codon adaptation index (CAI) values were compared with microarray fluorescence intensity values following hybridization of labelled RNA from laboratory-grown cultures. High mRNA amounts were observed for 32.5 % of PHE genes and for 64 % of the 25 genes with the highest CAI values. However, high relative amounts of RNA were also detected for 10.4 % of non-PHE genes, such as those encoding fatty acid metabolism enzymes and proteases, suggesting that their expression might also be regulated at the level of transcription or mRNA stability under the conditions tested. The effects of codon bias and mRNA amount on different gene groups in S. pneumoniae are discussed.

(3) Present address: Lehrstuhl für Genomorientierte Bioinformatik, Wissenschaftszentrum Weihenstephan, Am Forum 1, 85354 Freising, Germany.

Abbreviations: CAI, codon adaptation index; COA, correspondence analysis; FU, fluorescence units; Nc, effective number of codons; PHE, predicted highly expressed; RP, ribosomal protein; RSCU, relative synonymous codon usage.

CAI values for all genes of strain TIGR4 are available as supplementary data with the online version of this paper at http://mic.sgmjournals.org.

0002-7097 © 2004 SGM    Printed in Great Britain

INTRODUCTION

Streptococcus pneumoniae, commonly known as the pneumococcus, is one of the most important human pathogens worldwide, causing a number of diseases including pneumonia, meningitis, otitis media and sinusitis. The increasing number of clinical isolates found to be antibiotic-resistant (and multidrug-resistant) highlights the importance of research on the molecular biology of this organism. The availability of genome sequence data for S. pneumoniae strain JNR7/87 (TIGR4) of serotype 4 (Tettelin et al., 2001), strain R6 (an unencapsulated laboratory derivative of a serotype 2 strain) (Hoskins et al., 2001) and the serotype 19F strain G54 (Dopazo et al., 2001) provides a wealth of untapped information with which to analyse codon usage and its relationship to gene expression and mutational bias.

Among other mechanisms, codon bias can influence gene expression by optimizing the translation rate (Chavancy & Garel, 1981). This optimization relies on selection at the third codon position, which adapts coding sequences to the most abundant tRNAs in the cell (Ikemura, 1981) or to those with more efficient codon–anticodon interaction kinetics (Grosjean et al., 1978). Although this gene adaptation is species-specific, close similarities can be found among organisms of the same genus (Sharp, 1991). Highly restrictive codon patterns exist in genes encoding abundant polypeptides, probably owing to a low tolerance of synonymous substitutions that slow down translation elongation (Sharp & Li, 1987b). The approximate expression level of a gene can be predicted by comparing its codon bias with the profile of universally highly expressed genes, such as the ribosomal protein (RP) genes, which are commonly used as a reference set. Algorithms developed for this purpose (Sharp & Li, 1987a; Karlin & Mrazek, 2000) are adequate for deciphering the general pattern of gene expression in the cell and for detecting special enhanced functions in some micro-organisms, such as DNA and protein repair in Deinococcus radiodurans and flagellar motility in Treponema pallidum (Karlin & Mrazek, 2000).
There is a good correlation between predicted highly expressed (PHE) genes and high two-dimensional gel abundances in Bacillus subtilis and Escherichia coli (Karlin et al., 2001). However, these algorithms do not allow the detection of genes encoding proteins that are abundant because of their high stability rather than a high translation rate (Karlin et al., 2001) and, given the large translation capacity of ribosomes, codon usage restrictions of highly expressed genes should operate only at critical stages of rapid growth (Kurland, 1991). In accordance with these ideas, the slow-growing Mycobacterium tuberculosis (24–36 h doubling time) exhibits almost no distinctive codon bias in the genes that are PHE in other, fast-growing eubacteria (Andersson & Sharp, 1996; Karlin & Mrazek, 2000). Codon bias could be an important factor in S. pneumoniae, since its cell-division time under laboratory growth conditions is typically less than 45 min. However, to the best of our knowledge, systematic studies of the effect of codon usage on gene expression levels and gene function have not been reported for the lactic acid group of bacteria, and there is only one report on the correlation between codon usage bias and microarray data, for E. coli (dos Reis et al., 2003). Given the medical significance of S. pneumoniae, Streptococcus pyogenes and the viridans group streptococci, and the industrial importance of food lactic acid bacteria such as Lactococcus lactis and Lactobacillus acidophilus, a study of the relationship between codon usage, gene expression and gene function is required. The objective of this study was to analyse the relationships between the predicted level of gene expression based on codon usage, actual microarray expression values and gene function at the genomic level in S. pneumoniae.

METHODS

Synonymous codon usage and statistical analysis. The genomic sequences of S. pneumoniae strain JNR7/87 (TIGR4; Tettelin et al., 2001) and strain R6 (Hoskins et al., 2001) were obtained from The Institute for Genomic Research (TIGR, http://www.tigr.org). Three parameters were calculated, essentially according to the method of Sharp & Li (1987a): RSCU (relative synonymous codon usage), w (relative adaptiveness of a codon) and CAI (codon adaptation index). The RSCU value of a codon is its observed frequency divided by the frequency expected if all synonymous codons for that amino acid were used equally; RSCU values close to 1.0 therefore indicate a lack of bias for that codon. w is a normalized version of RSCU, calculated as the quotient of the RSCU value of a specific codon and the highest RSCU value among the codons encoding the same amino acid. The CAI value of a gene is the geometric mean of the w values of all its codons. A w value of 0.001 was assigned to codons never used in the reference set, to avoid CAI values of 0 for genes containing those codons. CAI values for all genes of strain TIGR4 are available as supplementary data with the online version of this paper (http://mic.sgmjournals.org). Programs for calculating CAI and effective number of codons (Nc) values were written in Visual Basic. Correspondence analysis (COA) of RSCU values was performed using the GCUA program (available at http://bioinf.may.ie/gcua/download.html; McInerney, 1998). Briefly, this method plots genes according to their codon usage in a 59-dimensional space (excluding the five non-variant codons) and then identifies the major trends in codon usage as the axes through this multidimensional hyperspace that account for the largest fractions of the variation among genes.

Culture conditions, RNA extraction and microarray experiments. S. pneumoniae R6 was grown in Todd–Hewitt medium (Difco) with 0.5 % yeast extract, adjusted to pH 7.8 (THYE medium).
Cells from 50 ml cultures were collected at mid-exponential phase (OD620=0.25), washed with cold 0.9 % NaCl and stored at −80 °C. Pellets were thawed and cells were lysed for 15 min at 37 °C in 10 mM Tris, 1 mM EDTA (pH 8.0), 0.1 % sodium deoxycholate. RNA was extracted with the RNeasy midi kit (QIAGEN), including a DNase treatment according to the manufacturer's instructions, precipitated with ethanol, washed, and suspended in 40 µl H2O. The concentration and purity of the RNA samples were measured using the 2100 Bioanalyser (Agilent). Details of the construction of the microarrays used in this study have been described previously (Dagkessamanskaia et al., 2004). The microarrays included probes for all annotated genes of strain TIGR4 (2236) and probes for 117 R6-specific genes (i.e. genes with less than 90 % similarity, as deduced by BLAST analysis). To obtain labelled cDNA, a 25 µl mixture was made with 15 µg RNA, 5 µg random primers (obtained with the Bioprime DNA labelling kit, Invitrogen), 12 mM DTT, 500 µM each dNTP (except for CTP, which was 240 µM), 2 nM Cy3- or Cy5-labelled CTP and 200 units Stratascript reverse transcriptase (Stratagene), in the buffer supplied by the manufacturer. The mixture was incubated overnight at 37 °C and the reaction was stopped by adding 1.5 µl 20 mM EDTA plus 15 µl 0.1 N NaOH. After 15 min incubation at 70 °C, 15 µl 0.1 N HCl was added. Labelled cDNA was purified with the QIAquick PCR purification kit (QIAGEN), the volume was reduced to 10 µl by lyophilization, and then 6.1 µg human Cot1 DNA was added, together with 3× SSC, 0.2 % SDS, 0.02 M HEPES and 4× Denhardt's solution, to a final volume of 90 µl. Samples were heated for 2 min at 100 °C and left for 10 min at room temperature, centrifuged twice, and 40 µl of the supernatant was applied to a microarray slide. After overnight incubation at 63 °C, microarrays were washed and scanned with an Axon 4000A apparatus, using GenePix Pro 3.0 software.
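A minimal Python sketch of the per-gene summarization described in the next sentence may help make the arithmetic explicit. It assumes, hypothetically, that each spot is available as lists of foreground and surrounding-background pixel intensities (GenePix exports its own per-spot columns instead), and that "subtracting the surrounding background" means subtracting the median background pixel. The 6000 FU and 2000 FU cut-offs are the ones applied later in Results.

```python
from statistics import mean, median


def spot_value(foreground_pixels, background_pixels):
    """Median spot intensity minus the local background.

    The background is taken here as the median of the surrounding pixels
    (an assumption; the text only says it was subtracted).
    """
    return median(foreground_pixels) - median(background_pixels)


def gene_fluorescence(samples):
    """Mean background-corrected value over all samples and replicate spots.

    samples: one list per independent sample, holding one
    (foreground_pixels, background_pixels) pair per replicate spot.
    """
    return mean(spot_value(fg, bg) for sample in samples for fg, bg in sample)


def expression_class(fu, high=6000.0, low=2000.0):
    """Relative mRNA level using the FU cut-offs applied in Results."""
    if fu > high:
        return "high"
    if fu >= low:
        return "medium"
    return "low"


# Hypothetical gene: 3 samples x 4 replicate spots with identical toy pixel values.
spot = ([6100, 6400, 6900], [250, 300, 350])
gene = [[spot] * 4 for _ in range(3)]
value = gene_fluorescence(gene)
print(value, expression_class(value))
```

Averaging over all 12 spots at once gives the same result as first averaging replicates within each sample when every sample has the same number of replicates, as here.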
Fluorescence values, taken as the median intensity of all the pixels after subtraction of the surrounding background, corresponded to the mean of three independent samples, each with four replicates for each gene.

RESULTS AND DISCUSSION

Multivariate statistics: correspondence analysis

To examine the heterogeneity of codon usage among S. pneumoniae genes, COA of the RSCU values of all ORFs in strain TIGR4 (2236) was performed. Scatter plots revealed a core region and two ascending horns, as reported previously for other eubacteria such as E. coli (Médigue et al., 1991). The left horn was less dispersed than the right one (Fig. 1A). A total of 484 ORFs were located in the right horn (axis 1 values >0 and axis 2 values >0.02). Half of these genes (242 of 484) were pseudogenes (usually transposases), were shorter than 80 codons, or encoded unassigned hypothetical proteins (Fig. 1B). The remaining 242 functional genes in the right horn (Fig. 1C) included several genes encoding phosphotransferase systems, restriction-modification systems, choline-binding proteins and competence proteins, as well as most genes of the blp operon (related to toxin production).

Fig. 1. (A) Plot of the first two axes generated by COA of RSCU values for 2236 ORFs of S. pneumoniae TIGR4. (B) Symbols mark RP genes, genes with fewer than 80 codons, and degenerate or truncated genes. (C) Symbols mark PHE genes, pathogenicity genes, restriction-modification enzyme genes, two-component systems and DNA transformation genes. (D) Symbols mark lagging-strand genes; dots mark leading-strand genes.
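The correspondence analysis above was run with GCUA. As an illustration only, and not GCUA's implementation, standard correspondence analysis of a genes × codons table of non-negative RSCU values can be sketched with NumPy's SVD, followed by the right-horn filter using the axis thresholds quoted above. Two assumptions matter: SVD axis signs are arbitrary, so which horn plots on the "right" may need a sign flip, and GCUA may scale coordinates differently, so the 0.02 cut-off is only meaningful on comparably scaled axes.

```python
import numpy as np


def correspondence_analysis(table, n_axes=2):
    """Correspondence analysis of a non-negative genes x codons table.

    Rows are genes (each must have a non-zero total); columns are codons,
    e.g. RSCU values for the 59 variable codons. Returns the gene (row)
    principal coordinates on the first n_axes axes and the principal
    inertia of each of those axes.
    """
    X = np.asarray(table, dtype=float)
    X = X[:, X.sum(axis=0) > 0]              # drop codons absent from every gene
    P = X / X.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses (genes)
    c = P.sum(axis=0)                        # column masses (codons)
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sing, _ = np.linalg.svd(S, full_matrices=False)
    coords = (U * sing) / np.sqrt(r)[:, None]
    return coords[:, :n_axes], sing[:n_axes] ** 2


def right_horn(coords, axis1_min=0.0, axis2_min=0.02):
    """Indices of genes with axis 1 > axis1_min and axis 2 > axis2_min."""
    return np.flatnonzero((coords[:, 0] > axis1_min) & (coords[:, 1] > axis2_min))
```

Typical use would be `coords, inertia = correspondence_analysis(rscu_matrix)` followed by `right_horn(coords)`, where `rscu_matrix` is a hypothetical 2236 × 59 array of per-gene RSCU values. Genes with identical codon profiles receive identical coordinates, which is why horizontally acquired genes sharing an atypical profile cluster together.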
The genes associated with the right horn are potentially foreign genes acquired by horizontal transfer, which have not yet evolved a codon profile matched to the translation machinery of S. pneumoniae. They had a mean G+C content (33.4 %) lower than that of the coding sequences of the whole genome (39.7 %). Most of the PHE (see below) and RP genes were located in the left horn (Fig. 1B, C), indicating that they share a similar codon bias that is rather different from that of the rest of the ORFs. At least in the first two COA axes, no significant differences in codon usage were observed between genes whose coding sequence was complementary to the leading strand (79 % of the genes) and those complementary to the lagging strand (21 % of the genes) (Fig. 1D).

Synonymous codon usage: CAI-value calculations

Given the similarity of the correspondence analysis plots for E. coli (Médigue et al., 1991) and S. pneumoniae (Fig. 1A), we assumed that highly expressed genes in S. pneumoniae would have a codon-usage bias positively correlated with the abundance of isoacceptor tRNAs, as occurs in E. coli (Ikemura, 1981). To construct a reference table of w values, 52 of the 56 RP genes were chosen; the four excluded RP genes (prmA, rpsN, sp0555 and sp0973) had a different codon profile (CAI<0.5). In accordance with the high A+T content (60 %) of the genome (Tettelin et al., 2001), A- and U-ending codons were favoured in both the whole-genome and the RP gene sets. There was also a selection for A- and C-ending codons in amino acids encoded by two codons (Phe, Tyr, His, Gln, Asn, Lys, Asp, Glu) or three codons (Ile). Considering the only type of tRNA detected for these nine amino acids (or the most represented one in the case of Lys; Table 1), the results suggested a selection for codon–anticodon interaction without wobble.
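The RSCU, w and CAI definitions given in Methods can be sketched in Python as follows; the authors' own programs were written in Visual Basic, so this is an illustration under stated assumptions rather than their code. It assumes the standard genetic code and in-frame sequences and, following the usual Sharp & Li (1987a) convention, skips stop codons and the single-codon amino acids Met and Trp, which leaves the same 59 codons used in the COA. Whether the authors' program also omitted AUG and UGG is not stated. Codons absent from the reference set receive w = 0.001, as in the text.

```python
from collections import Counter
from math import exp, log

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (DNA alphabet): codon -> amino acid, '*' = stop.
GENETIC_CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Synonymous families with more than one codon: stops, Met (ATG) and
# Trp (TGG) drop out, leaving 59 codons in 18 families.
FAMILIES = {}
for codon, aa in GENETIC_CODE.items():
    if aa != "*":
        FAMILIES.setdefault(aa, []).append(codon)
FAMILIES = {aa: cods for aa, cods in FAMILIES.items() if len(cods) > 1}


def codons(seq):
    """Yield the in-frame codons of a coding sequence, skipping ambiguous ones."""
    seq = seq.upper().replace("U", "T")
    for pos in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[pos:pos + 3]
        if codon in GENETIC_CODE:
            yield codon


def rscu(counts):
    """Observed codon count divided by the count expected under equal synonymous use."""
    values = {}
    for cods in FAMILIES.values():
        total = sum(counts[c] for c in cods)
        for c in cods:
            values[c] = counts[c] * len(cods) / total if total else 0.0
    return values


def relative_adaptiveness(reference_seqs, floor=0.001):
    """w = RSCU / highest RSCU in the codon's family, from pooled reference genes."""
    r = rscu(Counter(c for s in reference_seqs for c in codons(s)))
    w = {}
    for cods in FAMILIES.values():
        best = max(r[c] for c in cods)
        for c in cods:
            # Codons never seen in the reference set get w = floor (0.001 in the text).
            w[c] = r[c] / best if r[c] > 0 else floor
    return w


def cai(seq, w):
    """Geometric mean of w over a gene's codons (stops, Met and Trp are skipped)."""
    logs = [log(w[c]) for c in codons(seq) if c in w]
    return exp(sum(logs) / len(logs)) if logs else float("nan")


# Hypothetical toy reference set (not real RP genes): AAA used twice as often as AAG.
w = relative_adaptiveness(["AAAAAAAAG"])
print(round(cai("ATGAAAAAG", w), 3))  # prints 0.707
```

In the study itself the reference set would be the 52 selected RP genes, and `cai` would then be applied to each of the 1802 non-RP genes. Working in log space makes the geometric mean a simple average, and the 0.001 floor keeps a single unseen codon from forcing a gene's CAI to 0.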
While 21 of the 61 codons in the RP set of highly expressed genes had w values below 0.1 (i.e. at least 10-fold less than the preferred synonymous codon), only one codon in the whole-genome set had a w value below 0.1 (Table 1). The CAI algorithm was applied to 1802 non-RP full-length gene sequences from the TIGR4 strain, all with between 80 and 1500 codons (not including the stop codon). The distribution of CAI values (0.156–0.866) was unimodal, with the majority of genes (78.4 %) having CAI values between 0.200 and 0.400 (Fig. 2A); the mean and median CAI values were 0.338 and 0.312, respectively. The CAI value was independent of gene length (r²=0.0003; Fig. 2B), suggesting that codon bias is not a major mechanism directed towards the efficient translation of long genes. Genes were classified into three groups with high (CAI>0.500), medium (0.250<CAI<0.500) and low (CAI<0.250) levels of predicted expression. The PHE genes represented 7.3 % (131 genes) of the total, a figure compatible with those found (4–10 %) in other eubacteria (Karlin & Mrazek, 2000). Predicted medium- and low-expressed genes represented 78.7 % (1419 genes) and 14.0 % (252 genes) of the total, respectively. The 131 PHE genes were grouped into functional classes and subclasses (Table 2). As in other fast-growing bacteria (Karlin et al., 2001), genes of glycolytic enzymes and translation elongation factors (Table 2, Fig. 2B) were among the 25 genes with the highest CAI values. Of the 10 most abundantly expressed proteins in Streptococcus mutans, which is phylogenetically close to S. pneumoniae (Wilkins et al., 2002), homologues of eight are among the 25 genes with the highest CAI values in S. pneumoniae (Table 2), and the remaining two are encoded by RP genes. PHE genes are expected to use a small number of different codons.
This value, known as the effective number of codons, Nc (Wright, 1990), ranges from 20 (when one codon is used exclusively for each amino acid) to 61 (when all synonymous codons are used equally). Analysis of the 1802 TIGR4 genes revealed Nc values ranging from 26 to 61. On average, genes with CAI>0.6 had Nc values 13 units lower than genes with CAI<0.210, and 9.5 units lower than genes of the same length in the whole-genome set (data not shown).

Comparison of CAI and microarray fluorescence values

As 85 % of the genes of strains R6 and TIGR4 share more than 90 % similarity, and the CAI values of their homologous genes were well correlated (r²=0.99; data not shown), Cy3- (two replicates) and Cy5- (one replicate) labelled cDNA obtained from R6 grown to mid-exponential phase (OD620=0.25) was hybridized to the microarrays as described in Methods, and the mean fluorescence measurement for each gene was used to estimate relative mRNA transcript levels. Fluorescence was detected for 1513 homologues of R6 and TIGR4. Given the median of the fluorescence distribution (1675 fluorescence units, FU) and the proportion of genes with values higher than 6000 FU (12.56 %, 190 of 1513; Fig. 3A), 6000 FU was chosen as the cut-off for assigning highly expressed genes. Among the 114 PHE genes (CAI>0.5), 32.5 % showed high (>6000 FU), 33.3 % medium (2000–6000 FU) and 34.2 % low (<2000 FU) relative levels of expression (Fig. 3B). Among the 25 genes with the highest CAI values (CAI>0.680), the majority (16 of 25, 64 %) gave high fluorescence values on the microarray, revealing a correlation between the levels of transcription and translation among a substantial proportion of highly expressed genes. A similar relationship has recently been observed in E. coli (dos Reis et al., 2003).
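For reference, Wright's (1990) effective number of codons mentioned above can be computed from the codon homozygosity of each amino acid, F = (n·Σp² − 1)/(n − 1), where n is the number of codons observed for that amino acid and p are their frequencies. F is averaged within each degeneracy class of the standard code and combined as Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6. The sketch below is a hypothetical re-implementation, not the authors' Visual Basic program. It estimates a missing F3 (Ile) as the mean of F2 and F4, caps Nc at 61, and raises an error when a gene is too short to estimate every class.

```python
from collections import Counter

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (DNA alphabet): codon -> amino acid, '*' = stop.
GENETIC_CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Number of amino acids in each degeneracy class of the standard code.
CLASS_SIZES = {2: 9, 3: 1, 4: 5, 6: 3}


def effective_number_of_codons(seq):
    """Wright's (1990) Nc for one in-frame coding sequence."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    families = {}
    for codon, aa in GENETIC_CODE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    f_by_class = {}
    for cods in families.values():
        k, n = len(cods), sum(counts[c] for c in cods)
        if k == 1 or n < 2:
            continue  # Met/Trp, or too few codons to estimate homozygosity
        sum_sq = sum((counts[c] / n) ** 2 for c in cods)
        f_by_class.setdefault(k, []).append((n * sum_sq - 1) / (n - 1))
    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2  # estimate for an absent Ile
    if any(mean_f.get(k, 0.0) <= 0 for k in CLASS_SIZES):
        raise ValueError("sequence too short to estimate every degeneracy class")
    nc = 2.0 + sum(size / mean_f[k] for k, size in CLASS_SIZES.items())
    return min(nc, 61.0)
```

The two limits quoted in the text fall out directly: a gene that uses a single codon per amino acid has F = 1 in every class and Nc = 2 + 9 + 1 + 5 + 3 = 20, while a gene using all synonymous codons evenly approaches F = 1/k and is capped at 61.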
An increase in the proportion of genes with fluorescence values above 6000 FU was observed in groups of genes with CAI values of 0?4 to 0?6 (21–25 %) compared to the genes with CAI values lower than 0?4 (4–10 %). The lower median (949 FU) and lowest percentage of genes over 6000 units (4 %) corresponded to the group of genes with CAI values lower than 0?2. Therefore, despite the fact that it is widely accepted that low-abundance polypeptides do not necessarily have low CAI values, in our experiments there was also a relationship between CAI and FU in genes with low CAI values. On the other hand, 10?4 % of non-PHE genes had high fluorescence values (>6000 FU), possibly reflecting the fact that these genes are upregulated under laboratory culture conditions. For instance, 55 % of the fatty-acidmetabolism genes (with medium or low CAI values) had values higher than 6000 units. Although a general relationship was observed between the CAI and microarray fluorescence value when all genes were considered (Fig. 4A), a low value of r2 (0?09) was obtained when both variables were compared. This low r2 Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Sat, 17 Jun 2017 03:10:32 Microbiology 150 http://mic.sgmjournals.org Table 1. w values for S. pneumoniae codons in the whole genome (GEN) and for ribosomal protein (RP) genes, together with the number of tRNA genes Codon usage for the whole genome and tRNA gene data were downloaded from http://www.tigr.org. 
Amino acid Phe Leu Leu Ile GEN RP tRNA UUU UUC UUA UUG CUU CUC CUA CUG AUU AUC AUA AUG GUU GUC GUA GUG 1?000 0?471 0?459 1?000 0?487 0?271 0?244 0?244 1?000 0?526 0?115 1?000 1?000 0?579 0?521 0?552 0?612 1?000 0?044 0?625 1?000 0?040 0?044 0?001 0?386 1?000 0?011 1?000 1?000 0?097 0?559 0?126 0 2 2 1 1 0 2 0 0 2 0 4 0 0 3 0 Amino acid Ser Pro Thr Ala Codon GEN RP tRNA UCU UCC UCA UCG CCU CCC CCA CCG ACU ACC ACA ACG GCU GCC GCA GCG 1?000 0?260 0?851 0?260 0?889 0?160 1?000 0?228 1?000 0?719 1?000 0?408 1?000 0?498 0?571 0?308 0?594 0?001 1?000 0?012 0?300 0?001 1?000 0?035 1?000 0?018 0?649 0?031 1?000 0?069 0?714 0?106 0 1 2 0 0 0 2 0 0 1 2 0 0 0 4 0 Amino acid Tyr Stop Stop His Gln Asn Lys Asp Glu Codon GEN RP tRNA UAU UAC UAA UAG CAU CAC CAA CAG AAU AAC AAA AAG GAU GAC GAA GAG 1?000 0?450 2 2 1?000 0?496 1?000 0?612 1?000 0?493 1?000 0?753 1?000 0?515 1?000 0?612 0?263 1?000 2 2 0?228 1?000 1?000 0?016 0?236 1?000 1?000 0?082 1?000 0?773 1?000 0?111 0 2 2 2 0 1 2 0 0 2 2 1 0 2 5 0 2317 Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Sat, 17 Jun 2017 03:10:32 Amino acid Cys Stop Trp Arg Ser Arg Gly Codon GEN RP tRNA UGU UGC UGA UGG CGU CGC CGA CGG AGU AGC AGA AGG GGU GGC GGA GGG 1?000 0?562 2 1?000 1?000 0?312 0?212 0?185 0?851 0?481 0?312 0?063 1?000 0?341 0?732 0?368 1?000 1?000 2 1?000 1?000 0?233 0?005 0?002 0?085 0?224 0?009 0?001 1?000 0?119 0?514 0?030 0 1 2 1 2 0 0 1 0 1 1 1 0 2 2 0 Codon bias and relative mRNA levels in S. pneumoniae Met Val Codon A. J. Martı́n-Galiano, J. M. Wells and A. G. de la Campa Fig. 2. Distribution of CAI values (A), and relationship between CAI and gene length (B). Regression line in B illustrates the lack of association between CAI and gene length. Gene function symbols: n, glycolysis; 6, elongation factors; e, initiation factors/ aminoacyl tRNA synthetases/RNA polymerase subunits; +, chaperones; ., twocomponent systems; &, DNA transformation. All other genes are indicated by dots. 
value could be explained by the stability of the CAI value (due to a long-term optimization to the fluctuating environment in vivo) and the dynamic nature of the amount of mRNA (taken from laboratory cultures growing under defined conditions). However, there was a significant relationship for genes of glycolysis (r2=0?46), of fatty acid metabolism (r2=0?37), and proteases (r2=0?27). Genes were classified in four categories: genes with CAI and fluorescence values higher than 0?5 and 6000, respectively; genes with CAI higher than 0?5; genes with fluorescence values higher than 6000; and genes with CAI and fluorescence values lower than the cut-off points. Most genes (80?6 %, 1220 out of 1513) corresponded to the last category. In order to rule out any possible effects of variations in probe length on the selection of genes with a relatively high amount of mRNA transcripts, the data shown in Fig. 3 were recalculated using FU values corrected 2318 for probe length. This did not appreciably affect the profile of PHE genes, except in the case of the ribosomal genes, due to their very short probe length (data not shown). Furthermore, the use of these corrected fluorescence values did not generate any perceptible changes to Fig. 4. Energetic metabolism S. pneumoniae, which has an anaerobic metabolism, lacks the genes that encode functions of the tricarboxylic acid cycle. Therefore, energetic metabolism relies on glycolysis and fermentation. Accordingly, most genes of glycolytic enzymes and two enzymes of fermentative metabolism (ldh and pfl) were among the 25 genes with the highest CAI values (Fig. 2B, Table 2), and also had high fluorescence values (>5700 units). Likewise, genes for alternative Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Sat, 17 Jun 2017 03:10:32 Microbiology 150 Codon bias and relative mRNA levels in S. pneumoniae Table 2. S. 
pneumoniae PHE genes listed by role and subrole ABCT, ATP-binding cassette transporter; B/D, biosynthesis and degradation; DH, dehydrogenase; DP, diphosphate; MT, methyl-transferase; MTHPTG, methyltetrahydropteroyltriglutamate; NT, nucleotidyltransferase; P, phosphate; PEP, phosphoenol pyruvate; PMF, proton motive force; PTS, phosphotransferase system; TF, transferase. The 25 genes with the highest CAI values are asterisked. Role or subrole Amino acid metabolism Aspartate family Glutamate family Pyruvate family Serine family Cell envelope B/D surface lipo/polysaccharides Unknown Cell processes Adaptation to atypical conditions Cell division Detoxification Central intermediary metabolism Phosphorus compounds DNA metabolism DNA binding proteins Energetic metabolism Aerobic metabolism Amino acids and amines ATP-PMF interconversion B/D of polysaccharides Electron transport Fermentation Glycolysis/gluconeogenesis http://mic.sgmjournals.org Product Gene CAI 5-MTHPTG-homocysteine MT Aspartate-semialdehyde DH NADP-specific glutamate DH Glutamine synthetase Ketol-acid reductoisomerase Cysteine synthase metE asd gdhA glnA ilvC* cysM 0?623 0?573 0?663 0?507 0?748 0?509 LysM domain protein Lipoprotein Lipoprotein Pneumococcal surface protein A sp0107 sp0149* sp0845 pspA 0?651 0?687 0?641 0?505 General stress protein 24 kDa Cell division protein FtsZ Mn-superoxide dismutase sp1804* ftsZ sodA 0?727 0?516 0?639 Mn-inorganic pyrophosphatase ppaC 0?580 Chromosome binding protein HU Single-strand binding protein hup* ssb 0?794 0?514 Pyruvate oxidase Ornithine carbamoyl TF Arginine deaminase ATPase F0F1 b subunit Glycogen phosphorylase Galactose-6-P isomerase A Tagatose-1,6-DP aldolase Galactose-6-P isomerase B Phosphoglucomutase 4-a-Glucanotransferase N-acetyl-neuraminate lyase 6-Phospho-b-galactosidase Thioredoxin NADH oxidase Flavodoxin Thioredoxin reductase Lactate DH Formate acetyl transferase Fe-containing alcohol DH Zn-containing alcohol DH Acetoin DH complex E3 
Glyceraldehyde-3-P DH Triosephosphate isomerase Fructose-bis-P aldolase Phosphoglycerate mutase Enolase Pyruvate kinase Phosphoglycerate kinase spxB* argF arcA atpD sp2106* lacA lacD lacB pgm malQ sp1329 lacG trx* nox fld trxB ldh* pfl* sp2026 sp0285 sp1161 gap* tpi* fba* gpmA* eno* pyk* pgk* 0?738 0?639 0?518 0?547 0?706 0?671 0?665 0?665 0?648 0?597 0?561 0?500 0?700 0?669 0?544 0?510 0?782 0?748 0?616 0?563 0?529 0?866 0?830 0?824 0?819 0?815 0?749 0?737 Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Sat, 17 Jun 2017 03:10:32 2319 A. J. Martı́n-Galiano, J. M. Wells and A. G. de la Campa Table 2. cont. Role or subrole Pentose phosphate pathway Sugars Hypothetical proteins Unknown Nucleotide metabolism 29-Deoxyribonucleotide metabolism Nucleotide/nucleoside conversion Purine ribonucleotide synthesis Salvage nucleotides/nucleosides Protein fate Degradation of polypeptides Protein folding and stabilization Protein synthesis Translation factors tRNA aminoacylation Transcription Transcription factors DNA-directed RNA polymerase RNA degradation 2320 Product Gene CAI Glucose-6-P isomerase 6-Phosphofructokinase Glucokinase 6-Phosphogluconate DH Transketolase Fructokinase pgi* pfk gki gnd recP scrK 0?702 0?634 0?576 0?640 0?506 0?519 Conserved hypothetical Conserved hypothetical Conserved hypothetical Conserved hypothetical Conserved hypothetical Unassigned hypothetical Conserved hypothetical Conserved hypothetical Conserved hypothetical Hypothetical with conserved domain sp1197 sp0194 sp0095 sp2031 sp1882 sp2093 sp1922 sp1473 sp1102 sp1546 0?667 0?662 0?571 0?547 0?541 0?535 0?529 0?520 0?515 0?502 Ribonucleoside-DP reductase 2 a Adenylate kinase Uridylate kinase GMP synthase Inosine-5-mono-P DH Adenylosuccinate synthetase Uracil phosphoribosyl TF Adenine phosphoribosyl TF nrdE adk pyrH guaA guaB purA upp apt 0?517 0?612 0?505 0?583 0?552 0?517 0?625 0?536 ATP-dependent Clp protease, proteolytic subunit Trigger factor DnaK protein Heat-shock 
protein GrpE clpP 0?543 tig* dnaK* grpE 0?743 0?680 0?518 Elongation factor Tu Elongation factor G Elongation factor Ts Elongation factor P Ribosome recycling factor Thr-tRNA synthetase Ile-tRNA synthetase Lys-tRNA synthetase Gln-tRNA synthetase Asn-tRNA synthetase Ser-tRNA synthetase Ala-tRNA synthetase Pro-tRNA synthetase tuf* fusA* tsf* efp frr thrS ileS lysS gltX asnS serS alaS proS 0?793 0?746 0?720 0?639 0?567 0?591 0?584 0?556 0?551 0?549 0?542 0?517 0?513 RNA polymerase d subunit N utilization substance protein A RNA polymerase v subunit RNA polymerase b subunit RNA polymerase b subunit Polyribonucleotide NT sp0493 nusA sp1737 rpoB rpoC pnp 0?590 0?565 0?562 0?561 0?530 0?507 Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Sat, 17 Jun 2017 03:10:32 Microbiology 150 Codon bias and relative mRNA levels in S. pneumoniae Table 2. cont. Role or subrole Transport Amino acids, peptides and amines Sugars, organic alcohols and acids Cations Nucleotides PTS Unknown substrate Other Unknown functions Enzymes of unknown specificity General Product Gene CAI Branched-chain amino acid ABCT Amino acid ABCT Sugar ABCT Putative sugar ABCT Maltose/maltodextrin ABCT Sugar ABCT Non-haem iron-containing ferritin Manganese ABCT Iron ABCT Uracil permease Phosphocarrier protein HPr PTS IIABC components PTS IIB component PEP-protein phosphotransferase PTS IID component PTS IIC component PTS IIABC components PTS IIC component Mannose PTS IID component Fructose PTS IIABC components Mannose PTS IIC component Mannose PTS IIAB components Lactose PTS IIBC components ABCT ABCT ABCT ABCT ABCT ABCT Bacteriocin transport accessory protein Aquaporin MATE efflux family protein DinF Glycerol uptake facilitator protein livJ sp1241 msmK* sp0092* malX sp1683 sp1572 psaA sp0243 uraA ptsH* sp0758 sp0646 ptsI sp0063 sp0647 sp1722 sp0062 sp0282 sp0877 manM manL lacE sp2197 sp0867 sp1796 sp1690 sp0148 sp2230 bta sp1778 dinF sp1491 0?588 0?505 0?712 0?689 0?665 0?598 0?665 0?580 
0?515 0?524 0?783 0?677 0?660 0?643 0?635 0?633 0?602 0?593 0?586 0?570 0?555 0?531 0?518 0?550 0?534 0?527 0?519 0?517 0?511 0?629 0?556 0?556 0?523 Oxidoreductase Oxidoreductase Oxidoreductase Oxidoreductase Elongation protein Tu family Secreted 45 kDa protein GTP-binding protein sp1472 sp1471 sp1325 sp1588 sp0681 usp45 sp0004 0?640 0?621 0?584 0?579 0?595 0?529 0?504 fermentation pathways, such as sp0285 and sp2026, encoding alcohol dehydrogenases, and sp1161, encoding a subunit of the enzyme that converts pyruvate into acetyl-CoA, were also PHE (Table 2). Some of the genes involved in the complex pneumococcal network of sugar conversions were also PHE, such as the gene of the enzyme that cleaves lactose (lacG), genes of enzymes that convert galactose into glycolytic intermediates (lacA, lacB and lacD), and malQ, which encodes an enzyme involved in the degradation of maltodextrins, the first http://mic.sgmjournals.org digestion product of starch. S. pneumoniae would be able to obtain energy easily under starvation conditions from glycogen, since the genes of glycogen phosphorylase (sp2106) and phosphoglucomutase (pgm) were PHE (Table 2). Additionally, the PHE gene sp1804 (Table 2) shows high similarity (>70 %) with the Enterococcus hirae gls24 gene that encodes a stress protein playing an important role during glucose starvation (Giard et al., 2000). In addition to the glycolytic and the two fermentation enzymes described above, malQ, sp2106 and pgm also showed high mRNA amounts. Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Sat, 17 Jun 2017 03:10:32 2321 A. J. Martı́n-Galiano, J. M. Wells and A. G. de la Campa Fig. 3. (A) Distribution of microarray fluorescence values. (B) Median microarray fluorescence values (hatched bars, left-hand axis) and percentage of genes over 6000 FU (black bars, righthand axis) in CAI groups. Transcription and protein synthesis As expected, genes of translation elongation factors were PHE genes (Fig. 
2B, Table 2), as were other genes involved in translation and transcription, such as those encoding aminoacyl-tRNA synthetases and RNA polymerase subunits (Fig. 2B, Table 2). Most of these PHE genes also had high median fluorescence values in the microarray experiments: 4762 FU for aminoacyl-tRNA synthetase genes and 12 506 FU for RNA polymerase subunit genes. In contrast, the genes of proteolytic enzymes, although they generally had high fluorescence values (median 3853 FU), were not PHE (Fig. 4B), suggesting that proteolysis is enhanced under the laboratory culture conditions. Among the genes encoding chaperones, only dnaK and tig showed both high CAI and high fluorescence values, and they were the only chaperone genes among the 25 genes with the highest CAI values. On the other hand, most RP genes had fluorescence values higher than the genome median but lower than 6000 FU (median 3210 FU), plotting in the lower-right quadrant of Fig. 4C; this suggests that codon bias may be a more important determinant than the amount of mRNA for the overall abundance of ribosomal proteins.

Fig. 4. Global comparison between CAI and fluorescence microarray values in the whole genome (A), low- and middle-expressed gene groups (B), and PHE gene groups (C). The line corresponds to the linear regression analysis for the whole genome and is also shown in (B) and (C). Symbols distinguish the following gene functions: two-component systems, DNA transformation, fatty acid metabolism, proteases, glycolysis, elongation factors, aminoacyl-tRNA synthetases and RP genes.

Genes involved in amino-acid biosynthesis had fairly homogeneous CAI values (generally <0.400), consistent with the genome-wide trend.
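The CAI values compared throughout this section are codon adaptation indices in the sense of Sharp & Li (1987a): each codon receives a relative adaptiveness w, its count in a reference set of highly expressed genes divided by the count of the most used synonymous codon, and a gene's CAI is the geometric mean of w over its codons, so it lies between 0 and 1. The Python sketch below illustrates that definition only; it is not the authors' pipeline, and the standard codon table, the exclusion of Met, Trp and stop codons, the toy reference set and the substitute count of 0.5 for codons absent from the reference set are assumptions made for illustration.

```python
import math
from collections import Counter

# Synonymous codon families of the standard genetic code (stop codons omitted).
SYNONYMOUS_FAMILIES = {
    "Phe": ["TTT", "TTC"], "Leu": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "Ile": ["ATT", "ATC", "ATA"], "Met": ["ATG"],
    "Val": ["GTT", "GTC", "GTA", "GTG"],
    "Ser": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "Pro": ["CCT", "CCC", "CCA", "CCG"], "Thr": ["ACT", "ACC", "ACA", "ACG"],
    "Ala": ["GCT", "GCC", "GCA", "GCG"], "Tyr": ["TAT", "TAC"],
    "His": ["CAT", "CAC"], "Gln": ["CAA", "CAG"], "Asn": ["AAT", "AAC"],
    "Lys": ["AAA", "AAG"], "Asp": ["GAT", "GAC"], "Glu": ["GAA", "GAG"],
    "Cys": ["TGT", "TGC"], "Trp": ["TGG"],
    "Arg": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}


def split_codons(seq):
    """Split an in-frame coding sequence into codons, dropping a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference_seqs, absent_count=0.5):
    """Return w = X_ij / X_i,max for every codon in a multi-codon family."""
    counts = Counter(c for s in reference_seqs for c in split_codons(s))
    weights = {}
    for family in SYNONYMOUS_FAMILIES.values():
        if len(family) == 1:
            continue  # Met (ATG) and Trp (TGG) offer no synonymous choice
        # Assumption: codons unseen in the reference set get a small count so w > 0.
        observed = {c: counts[c] if counts[c] > 0 else absent_count for c in family}
        best = max(observed.values())
        for codon in family:
            weights[codon] = observed[codon] / best
    return weights


def cai(seq, weights):
    """Geometric mean of w over the informative codons of one gene."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        raise ValueError("sequence has no informative codons")
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    # Hypothetical toy reference set, not pneumococcal data.
    w = relative_adaptiveness(["AAAAAAAAGGCTGCT"])
    print(round(cai("AAAGCT", w), 3))  # -> 1.0 (only preferred codons)
    print(round(cai("AAGGCC", w), 3))  # -> 0.354 (only non-preferred codons)
```

Working in log space keeps the geometric mean numerically stable for long genes. On real genomes the choice of reference genes dominates the result, so that choice is a modelling decision outside this sketch.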
Much higher CAI values, however, were found in specific amino-acid biosynthesis genes (ilvC, gdhA, metE, asd, cysM and glnA; Table 2), a feature that has been associated with control-pathway enzyme genes (Karlin & Mrazek, 2000). None of these genes showed high fluorescence values, which may be related to the abundance of casein-derived amino acids in THYE medium.

Transporters
S. pneumoniae has one of the highest proportions (30 %) of sugar-transporter genes among prokaryotic genomes (Tettelin et al., 2001), suggesting that it is highly adapted to compete with other respiratory-tract micro-organisms for sugar nutrients. Several genes of sugar transporters and phosphotransferase systems were PHE (Table 2). However, in the rich and stable sugar environment of THYE medium, only a few of these genes showed high mRNA levels: ptsH, ptsI and sp0758 of the phosphotransferase system, and the maltosaccharide transporter gene malX. In addition, some genes for Fe and Mn transporters were also PHE (Table 2), possibly reflecting an adaptation to pathogenicity, given the vital importance of acquiring these elements inside the host (Jakubovics & Jenkinson, 2001). Among them, the psaABC operon, which encodes the Mn transporter, also had relatively high transcript levels.

Oxidative metabolism
Genes involved in the detoxification of oxidant species and in other redox reactions were PHE (trx, nox, sodA, fld and trxB; Table 2). Likewise, four of the genes classified as oxidoreductases in the unknown-specificity enzyme group (sp1325, sp1471, sp1472 and sp1588), and psaA, part of an Mn transporter involved in anti-oxidative defence, were also strongly PHE (Table 2). Taken together, these data suggest that defence against oxidative species is highly developed in S.
pneumoniae, possibly as a consequence of its ability to colonize and persist in the nasopharynx, where the oxygen partial pressure is high. Consistent with this hypothesis, nox, sodA and psaA, which are essential for infection (Auzat et al., 1999; Yesilkaya et al., 2000; Tseng et al., 2002), also appeared to be transcribed at high levels (11 200, 5630 and 12 987 FU, respectively). Despite the anaerobic metabolism of S. pneumoniae, one of the highest combinations of CAI and fluorescence values (0.738 and 11 683 FU, respectively) corresponded to the pyruvate oxidase gene, spxB, whose product is one of the more abundant polypeptides in transparent variants of S. pneumoniae (Overweg et al., 2000). This enzyme is also essential for infection (Spellerberg et al., 1996) and, in the presence of oxygen, produces acetyl phosphate and hydrogen peroxide. The latter is an important pneumococcal virulence factor (Duane et al., 1993) that may also inhibit the growth of competing microbes in the upper respiratory tract (Pericone et al., 2000).

Genes expressed at low levels
Low CAI values were calculated for genes with a putative regulatory function, which included 27 genes of two-component systems (TCS) (Fig. 2B) and 62 general regulators, with mean CAI values of 0.247 and 0.281, respectively. Low CAI values were also calculated for the 35 genes involved in prosthetic group/cofactor biosynthesis and the 19 genes of aromatic amino-acid biosynthesis, with mean CAI values of 0.292 and 0.294, respectively. Some of these gene groups also had low median fluorescence values: regulators (914 FU, n=50), TCS (1399 FU, n=26) (Fig. 4B) and cofactor/vitamin biosynthesis (1542 FU, n=34). Low CAI values were also calculated for 24 competence genes (mean CAI of 0.269; Fig. 2B), most of which also had low fluorescence values (median 868 FU, n=21) (Fig. 4B). These genes localized in the central part of the COA plot, with the exception of comD, comE and comF, which localized in the right horn and had G+C contents of 32.0 %, 30.7 % and 36.2 %, respectively; they could therefore be recently acquired genes. It is worth emphasizing that S. pneumoniae becomes naturally competent for only a few minutes, resulting in rapid changes in its protein profile (Morrison & Baker, 1979), and that constitutive activation of the competence regulon could be deleterious for the cell (Martin et al., 2000). Thus, the presence of rare codons in competence genes could be a mechanism that limits their translation, thereby minimizing adverse physiological stresses before competence-gene expression is induced, as suggested for some E. coli regulatory genes (Konigsberg & Godson, 1983). Consistent with this hypothesis, other mechanisms that negatively control competence include the cleavage of competence factors by the ClpP protease (Chastanet et al., 2001) and the action of an inhibitor of the competence-stimulating peptide (Berge et al., 2001). In contrast, the recA gene had a moderately high CAI (0.489) and a high fluorescence value (8286 FU), and it was the only competence gene located in the left horn of the COA plot, probably because it is involved in multiple cellular processes.

Virulence factors
Virulence factors include capsule and cell-wall biosynthesis enzymes, pneumolysin, autolysin, neuraminidase, IgA1 protease and some surface proteins (Paton et al., 1993). Nearly all these genes had CAI values of 0.250 to 0.350 and could be considered medium-expressed genes; psaA, however, was PHE. Most genes of capsule biosynthesis, as well as nanB, pspA, iga, the genes of choline-binding proteins (cbpC and cbpF) and lytB, appear in the right horn of the COA plot (Fig. 1C), suggesting recent acquisition by horizontal transfer. In agreement with this idea, the G+C contents of the cps4EFGH capsular genes, nanB and pspA were 27.8 % to 33.5 %, 33.4 % and 35.0 %, respectively, all lower than that of the bulk of the genome coding sequences (39.7 %).

There appear to be two mechanisms, operating on different time scales, that determine the persistence/virulence of S. pneumoniae. One is the optimization of codon usage, as detected by CAI analysis for sugar-transporter and oxidative-metabolism genes, possibly reflecting a long-term progressive adaptation to persistence in carrier hosts. The other is the recent acquisition of new virulence factors by horizontal transfer, as detected by COA and G+C content.

ACKNOWLEDGEMENTS
A. J. M.-G. gratefully acknowledges receipt of a fellowship from the Comunidad Autónoma de Madrid, Spain. This study was supported by grant BIO2002-01398 from the Ministerio de Ciencia y Tecnología. J. M. W. acknowledges financial support for the microarray construction from EC contract QLK2-CT-2000-00543. We wish to thank Karin Overweg and Mark Reuter for advice and discussion concerning the microarray work.

REFERENCES
Andersson, S. G. E. & Sharp, P. M. (1996). Codon usage in the Mycobacterium tuberculosis complex. Microbiology 142, 915–925.
Auzat, I., Chapuy-Regaud, S., Le Bras, G., Dos Santos, D., Ogunniyi, A. D., Le Thomas, I., Garel, J. R., Paton, J. C. & Trombe, M. C. (1999). The NADH oxidase of Streptococcus pneumoniae: its involvement in competence and virulence. Mol Microbiol 34, 1018–1028.
Berge, M., Garcia, P., Iannelli, F., Prere, M. F., Granadel, C., Polissi, A. & Claverys, J. P. (2001). The puzzle of zmpB and extensive chain formation, autolysis defect and non-translocation of choline-binding proteins in Streptococcus pneumoniae. Mol Microbiol 39, 1651–1660.
Chastanet, A., Prudhomme, M., Claverys, J. P. & Msadek, T. (2001). Regulation of Streptococcus pneumoniae clp genes and their role in competence development and stress survival. J Bacteriol 183, 7295–7307.
Chavancy, G. & Garel, J. P. (1981). Does quantitative tRNA adaptation to codon content in mRNA optimize the ribosomal translation efficiency? Proposal for a translation system model. Biochimie 63, 187–195.
Dagkessamanskaia, A., Moscoso, M., Hénard, V., Guiral, S., Overweg, K., Reuter, M., Wells, J. M. & Claverys, J. P. (2004). Interconnection of competence, stress and CiaR regulons in Streptococcus pneumoniae: competence triggers stationary phase autolysis of ciaR mutant cells. Mol Microbiol 51, 1071–1086.
Dopazo, J., Mendoza, A., Herrero, J. & 13 other authors (2001). Annotated draft genomic sequence from a Streptococcus pneumoniae type 19F clinical isolate. Microb Drug Resist 7, 99–125.
Dos Reis, M., Wernisch, L. & Savva, R. (2003). Unexpected correlations between gene expression and codon usage bias from microarray data for the whole Escherichia coli K-12 genome. Nucleic Acids Res 31, 6976–6985.
Duane, P. G., Rubins, J. B., Weisel, H. R. & Janoff, E. N. (1993). Identification of hydrogen peroxide as a Streptococcus pneumoniae toxin for rat alveolar epithelial cells. Infect Immun 61, 4392–4397.
Giard, J. C., Rince, A., Capiaux, H., Auffray, Y. & Hartke, A. (2000). Inactivation of the stress- and starvation-inducible gls24 operon has a pleiotropic effect on cell morphology, stress sensitivity, and gene expression in Enterococcus faecalis. J Bacteriol 182, 4512–4520.
Grosjean, H., Sankoff, D., Jou, W. M., Fiers, W. & Cedergren, R. J. (1978). Bacteriophage MS2 RNA: a correlation between the stability of the codon : anticodon interaction and the choice of code words. J Mol Evol 12, 113–119.
Hoskins, J., Alborn, W. E., Arnold, J. & 37 other authors (2001). Genome of the bacterium Streptococcus pneumoniae strain R6. J Bacteriol 183, 5709–5717.
Ikemura, T. (1981). Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes. J Mol Biol 146, 1–21.
Jakubovics, N. S. & Jenkinson, H. F. (2001). Out of the iron age: new insights into the critical role of manganese homeostasis in bacteria. Microbiology 147, 1709–1718.
Karlin, S. & Mrazek, J. (2000). Predicted highly expressed genes of diverse prokaryotic genomes. J Bacteriol 182, 5238–5250.
Karlin, S., Mrazek, J., Campbell, A. & Kaiser, D. (2001). Characterization of highly expressed genes of four fast-growing bacteria. J Bacteriol 183, 5025–5040.
Konigsberg, W. & Godson, G. N. (1983). Evidence for use of rare codons in the dnaG gene and other regulatory genes of Escherichia coli. Proc Natl Acad Sci U S A 80, 687–691.
Kurland, C. G. (1991). Codon bias and gene expression. FEBS Lett 285, 165–169.
Martin, B., Prudhomme, M., Alloing, G., Granadel, C. & Claverys, J. P. (2000). Cross-regulation of competence pheromone production and export in the early control of transformation in Streptococcus pneumoniae. Mol Microbiol 38, 867–878.
McInerney, J. O. (1998). GCUA (General Codon Usage Analysis). Bioinformatics 14, 372–373.
Médigue, C., Rouxel, T., Vigier, P., Hénaut, A. & Danchin, A. (1991). Evidence for horizontal gene transfer in Escherichia coli speciation. J Mol Biol 222, 851–856.
Morrison, D. A. & Baker, M. F. (1979). Competence for genetic transformation in pneumococcus depends on synthesis of a small set of proteins. Nature 282, 215–217.
Overweg, K., Pericone, C. D., Verhoef, G. G., Weiser, J. N., Meiring, H. D., De Jong, A. P., De Groot, R. & Hermans, P. W. (2000). Differential protein expression in phenotypic variants of Streptococcus pneumoniae. Infect Immun 68, 4604–4610.
Paton, J. C., Andrew, P. W., Boulnois, G. J. & Mitchell, T. J. (1993). Molecular analysis of the pathogenicity of Streptococcus pneumoniae: the role of pneumococcal proteins. Annu Rev Microbiol 47, 89–115.
Pericone, C. D., Overweg, K., Hermans, P. W. M. & Weiser, J. N. (2000). Inhibitory and bactericidal effects of hydrogen peroxide production by Streptococcus pneumoniae on other inhabitants of the upper respiratory tract. Infect Immun 68, 3990–3997.
Sharp, P. M. (1991). Determinants of DNA sequence divergence between Escherichia coli and Salmonella typhimurium: codon usage, map position, and concerted evolution. J Mol Evol 33, 23–33.
Sharp, P. M. & Li, W. H. (1987a). The codon adaptation index - a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 15, 1281–1295.
Sharp, P. M. & Li, W. H. (1987b). The rate of synonymous substitution in enterobacterial genes is inversely related to codon usage bias. Mol Biol Evol 4, 222–230.
Spellerberg, B., Cundell, D. R., Sandros, J., Pearce, B. J., Idanpaan-Heikkila, I., Rosenow, C. & Masure, H. R. (1996). Pyruvate oxidase, as a determinant of virulence in Streptococcus pneumoniae. Mol Microbiol 19, 803–813.
Tettelin, H., Nelson, K. E., Paulsen, I. T. & 36 other authors (2001). Complete genome sequence of a virulent isolate of Streptococcus pneumoniae. Science 293, 498–506.
Tseng, H. J., McEwan, A. G., Paton, J. C. & Jennings, M. P. (2002). Virulence of Streptococcus pneumoniae: PsaA mutants are hypersensitive to oxidative stress. Infect Immun 70, 1635–1639.
Wilkins, J. C., Homer, K. A. & Beighton, D. (2002). Analysis of Streptococcus mutans proteins modulated by culture under acidic conditions. Appl Environ Microbiol 68, 2382–2390.
Wright, F. (1990). The 'effective number of codons' used in a gene. Gene 87, 23–29.
Yesilkaya, H., Kadioglu, A., Gingles, N., Alexander, J. E., Mitchell, T. J. & Andrew, P. W. (2000). Role of manganese-containing superoxide dismutase in oxidative stress and virulence of Streptococcus pneumoniae. Infect Immun 68, 2819–2826.