Note 1: The tAI and the justification for using tAI as a predictor of
the co-adaptation between codon bias and tRNA pool
As codon-anticodon coupling is not unique due to wobble interactions, several anticodons can recognize the same codon with different efficiency weights (see dos Reis
et al. for all the codon-anticodon relations).
Let ni be the number of tRNA isoacceptors recognizing codon i. Let tGCNij be
the genomic copy number of the j-th tRNA that recognizes the i-th codon, and let Sij be the
selective constraint on the efficiency of the codon-anticodon coupling. We define the
absolute adaptiveness, Wi, for each codon i as:
$$W_i = \sum_{j=1}^{n_i} (1 - S_{ij}) \, tGCN_{ij}$$
From Wi we obtain wi, the relative adaptiveness value of codon i, by
normalizing the Wi values (dividing them by the maximum of all 61 Wi).
The tAI of codon i is defined as wi. The similarity in tRNA pools between two
organisms is defined as the non-parametric Spearman correlation between the two
vectors (of length 61) of their codons' tAI (denoted by tRs).
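The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the dictionary layout, and the toy copy numbers are all assumptions made for the example, and the Spearman correlation is computed as the Pearson correlation of average ranks.

```python
import math


def relative_adaptiveness(tgcn, s):
    """Return w_i for every codon (hypothetical helper).

    tgcn[codon] -> list of gene copy numbers, one per recognizing tRNA j
    s[codon]    -> list of selective constraints S_ij, in the same order
    """
    # Absolute adaptiveness: W_i = sum_j (1 - S_ij) * tGCN_ij
    W = {
        codon: sum((1.0 - s_ij) * n_ij for s_ij, n_ij in zip(s[codon], copies))
        for codon, copies in tgcn.items()
    }
    w_max = max(W.values())
    # Relative adaptiveness: divide every W_i by the largest W_i
    return {codon: value / w_max for codon, value in W.items()}


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for idx in order[pos:end + 1]:
            result[idx] = mean_rank
        pos = end + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Toy input (made-up numbers, three codons instead of 61)
tgcn = {"AAA": [6], "AAG": [6, 1], "GAA": [2]}
s = {"AAA": [0.0], "AAG": [0.5, 0.0], "GAA": [0.0]}
w_org1 = relative_adaptiveness(tgcn, s)
```

For two organisms, tRs would then be `spearman` applied to their two w vectors, listed in the same codon order.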
In the case of a gene (vector of codons), the final tAI of a gene, g, is the geometric
mean of the wi values of all its codons:
$$tAI_g = \left( \prod_{k=1}^{l_g} w_{i_{kg}} \right)^{1/l_g}$$
where ikg is the codon defined by the k-th triplet of gene g, and lg is the length of the
gene in codons (excluding stop codons). tRNA copy numbers of all organisms analyzed in this
study appear in Supplementary Table 2. The tAI of a COG with more than one gene
in a certain organism is the mean tAI of all the corresponding genes.
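As a small sketch of the gene-level score, the geometric mean can be computed in log space to avoid underflow on long genes. The helper names are hypothetical, and the code assumes a DNA coding sequence in frame, the three standard stop codons, and wi > 0 for every codon that occurs.

```python
import math

STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def gene_tai(cds, w):
    """Geometric mean of w over the codons of a coding sequence (stops excluded)."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[k:k + 3] for k in range(0, usable, 3)]
    codons = [c for c in codons if c not in STOP_CODONS]
    # exp(mean(log w)) equals the l_g-th root of the product of the w values
    log_sum = sum(math.log(w[c]) for c in codons)
    return math.exp(log_sum / len(codons))


def cog_tai(cds_list, w):
    """Mean tAI over all genes of one COG in one organism."""
    scores = [gene_tai(cds, w) for cds in cds_list]
    return sum(scores) / len(scores)
```

For example, with `w = {"AAA": 1.0, "AAG": 0.25}` the sequence `"AAAAAGTAA"` has two scored codons, so its tAI is the square root of 1.0 × 0.25.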
The Sij values can be organized in a vector (S-vector) as described in (13); each
component in this vector is related to one wobble nucleoside-nucleoside pairing: I:U,
G:U, G:C, I:C, U:A, I:A, etc. The wi values for all codons (except stop codons) of all
organisms analyzed in this study appear in Supplementary Table 2.
The tAI uses the genomic tRNA copy number (tGCN) as a surrogate measure
for the cellular abundances of tRNAs; this choice is justified by several observations.
First, in many organisms the in vivo concentration of a tRNA bearing a certain
anticodon has been observed to be highly correlated with the
number of gene copies coding for this tRNA type. Specifically, in S. cerevisiae a
correlation of r=0.91 was reported (1). In B. subtilis, a correlation of 0.86 between
tRNA copy number and tRNA abundance was reported (2). Similarly, previous papers
reported a significant correlation between genomic tRNA copy number and tRNA
abundance in E. coli (3, 4). A related result comes from (5), who
measured the translation rates of the two glutamate codons, GAA and GAG, and found
roughly a threefold difference (21.6 and 6.4 codons per
second, respectively). Remarkably, the wi of these codons, which is based on the
tRNA pool and the affinity of codon-anticodon coupling and is the basis for the tAI
calculation, captures this ratio. Calculating
wi values for E. coli, we found that the ratio between the wi of GAA and GAG is 3.125
(0.5/0.16), compared with the ratio of about 3.4 (21.6/6.4) measured in the experiments. This result
suggests a direct relation between the adaptation of a codon to the tRNA
pool, as estimated from the genomic tRNA copy number, and the time it takes to translate it.
Second, a recent study showed that in S. cerevisiae the promoters of many of the
tRNA genes have a low predicted affinity to the nucleosome, suggesting a constitutive
expression with little transcriptional regulation capacity (6). Thus, for fully sequenced
genomes, the relative concentrations of the various tRNAs in the cell, and therefore
the optimality of the various codons in terms of translation, can be approximated
using the respective tRNA gene copy numbers in the genome. Additionally, the tAI
has been shown to be highly correlated (r=0.63 for S. cerevisiae) to protein expression
levels (7, 8). It was found that even among genes with similar transcript levels, higher
tAI often corresponds to higher protein abundance (7).
This definition stems from an early observation of a trend of increasing codon usage
bias with increasing gene expression levels in a sample of E. coli genes (9), and from the finding that
tRNA concentrations are rate limiting in the elongation of nascent peptides (10). The
translation efficiency, as defined above, has also been shown to be correlated with
translation rate and accuracy (11), phenotypic divergence of yeast species (7), and
evolutionary rate (12), and to play a part in protein functionality (13).
Note 2: The evolutionary tree used in the analysis
((((((((Wsuccinogenes: 0.030586, (Hpylori: 0.011149, Hhepaticus: 0.013950):
0.002479): 0.002836, Cjejuni: 0.016420): 0.034409, ((Dvulgaris: 0.069430,
(Gsulfurreducens: 0.078311, (Dpsychrophila: 0.078909, Bbacteriovorus: 0.071062):
0.000000): 0.000421): 0.013175, (((Mcapsulatus: 0.086560, (((Xfastidiosa: 0.009861,
(Xcitri: 0.016158, (Xoryzae: 0.044481, Xcampestris: 0.025537): 0.002055):
0.132758): 0.053734, Cburnetii: 0.033290): 0.002253, ((Lpneumophila: 0.053660,
Ftularensis: 0.028762): 0.005770, (((Pputida: 0.062120, (Psyringae: 0.095675,
Pfluorescens: 0.095896): 0.021094): 0.022466, Paeruginosa: 0.095695): 0.155208,
((Parcticum: 0.027147, Acinetobacter: 0.073012): 0.023812, ((Iloihiensis: 0.037096,
Cpsychrerythraea: 0.095506): 0.028625, ((Soneidensis: 0.076146, (Pprofundum:
0.075595, (((Vcholerae: 0.053065, Vvulnificus: 0.033163): 0.012388,
Vparahaemolyticus: 0.036982): 0.031231, Vfischeri: 0.038869): 0.016545):
0.058093): 0.042984, ((Hducreyi: 0.012415, (Msucciniciproducens: 0.032385,
(Pmultocida: 0.022254, Hinfluenzae: 0.017301): 0.001579): 0.009447): 0.018857,
(((Wbrevipalpis: 0.008819, Bfloridanus: 0.005839): 0.002618, (Buchnera: 0.000004,
Baphidicola: 0.001220): 0.002794): 0.031075, ((((Styphimurium: 0.005134, (Styphi:
0.036775, Senterica: 0.020241): 0.004276): 0.070544, (Sflexneri: 0.027910, Ecoli:
0.040651): 0.024059): 0.052160, ((Ypseudotuberculosis: 0.005792, Ypestis:
0.011419): 0.078435, Ecarotovora: 0.086397): 0.008092): 0.027947, Pluminescens:
0.071417): 0.056291): 0.000000): 0.007399): 0.003100): 0.012914): 0.000000):
0.009863): 0.000000): 0.000000): 0.000592, (((Nmeningitidis: 0.005382,
Ngonorrhoeae: 0.003589): 0.045176, Cviolaceum: 0.131736): 0.004795,
((((Bpertussis: 0.000367, (Bparapertussis: 0.013541, Bbronchiseptica: 0.013472):
0.074202): 0.116872, ((Rsolanacearum: 0.070685, Reutropha: 0.136131): 0.040674,
(Bpseudomallei: 0.051771, Bmallei: 0.000009): 0.133756): 0.045889): 0.033048,
Neuropaea: 0.061117): 0.000000, (Daromatica: 0.077185, Azoarcus: 0.080537):
0.042089): 0.008270): 0.011144): 0.015162, (((CPelagibacter: 0.033449, (((Rtyphi:
0.000485, Rprowazekii: 0.000004): 0.000021, (Rconorii: 0.001678, Rfelis:
0.002791): 0.005886): 0.022874, (Wendosymbiont: 0.000826, (Eruminantium:
0.001741, Amarginale: 0.000592): 0.002516): 0.008060): 0.006976): 0.002251,
((Goxydans: 0.052032, Ccrescentus: 0.098234): 0.004696, ((Spomeroyi: 0.099204,
(Rpalustris: 0.042518, Bjaponicum: 0.137031): 0.113653): 0.018250, ((Mloti:
0.130010, (Smeliloti: 0.091923, Atumefaciens: 0.082334): 0.044258): 0.094429,
((Bquintana: 0.000282, Bhenselae: 0.002224): 0.048182, (Bsuis: 0.000004, (Babortus:
0.006683, Bmelitensis: 0.017646): 0.007124): 0.108137): 0.000000): 0.004707):
0.022758): 0.017742): 0.000000, Zmobilis: 0.054477): 0.004746): 0.002222):
0.001821): 0.004035, ((Linterrogans: 0.065301, ((Tpallidum: 0.002042, Tdenticola:
0.042206): 0.021458, (Bburgdorferi: 0.000004, Bgarinii: 0.000004): 0.041841):
0.005455): 0.000740, ((Parachlamydia: 0.017061, ((Ctrachomatis: 0.009987,
Cmuridarum: 0.000004): 0.002236, ((Cabortus: 0.000004, Ccaviae: 0.000547):
0.002493, Cpneumoniae: 0.000004): 0.001753): 0.020130): 0.015801, (Pgingivalis:
0.004975, (Bfragilis: 0.008918, Bthetaiotaomicron: 0.017344): 0.098079): 0.053616):
0.000000): 0.002693): 0.000006, ((Oyellows: 0.003329, ((Mmobile: 0.003518,
((Mpulmonis: 0.002461, Msynoviae: 0.003062): 0.000780, Mhyopneumoniae:
0.002847): 0.000808): 0.003583, ((Mmycoides: 0.003119, Mflorum: 0.001220):
0.008233, ((Uurealyticum: 0.003014, Mpenetrans: 0.007727): 0.000736,
((Mpneumoniae: 0.000793, Mgenitalium: 0.000004): 0.008208, Mgallisepticum:
0.002941): 0.002650): 0.002290): 0.000000): 0.001820): 0.007562, ((Ttengcongensis:
0.051160, ((Ctetani: 0.040848, Cacetobutylicum: 0.077084): 0.012407, Cperfringens:
0.045819): 0.026744): 0.006563, (Bclausii: 0.091014, (Bhalodurans: 0.077181,
(Gkaustophilus: 0.070652, ((Bsubtilis: 0.029008, Blicheniformis: 0.036042):
0.066228, ((Bthuringiensis: 0.003902, (Bcereus: 0.040505, Banthracis: 0.018615):
0.017715): 0.191381, (Oiheyensis: 0.065922, (((Shaemolyticus: 0.019192,
Sepidermidis: 0.011055): 0.004896, Saureus: 0.010684): 0.063709,
((Lmonocytogenes: 0.002777, Linnocua: 0.005995): 0.085395, (Efaecalis: 0.055043,
(Lplantarum: 0.048736, (((Smutans: 0.023486, (Spneumoniae: 0.027130,
(Sthermophilus: 0.015744, (Spyogenes: 0.013835, Sagalactiae: 0.024347): 0.012111):
0.000327): 0.001743): 0.015946, Llactis: 0.026312): 0.012937, (Ljohnsonii:
0.008114, Lacidophilus: 0.004918): 0.045346): 0.002890): 0.005714): 0.012571):
0.010676): 0.038292): 0.013825): 0.000385): 0.004083): 0.009737): 0.001851):
0.077537): 0.024696): 0.000000): 0.000273, ((((Pacnes: 0.049270, Blongum:
0.035484): 0.002140, (((Nfarcinica: 0.112702, ((Mleprae: 0.048713, (Mtuberculosis:
0.000960, Mbovis: 0.000671): 0.103300): 0.002127, Mavium: 0.074525): 0.019277):
0.049895, ((Cglutamicum: 0.014310, Cefficiens: 0.011165): 0.048994, (Cjeikeium:
0.019719, Cdiphtheriae: 0.031462): 0.001922): 0.019157): 0.020860, ((Scoelicolor:
0.059601, Savermitilis: 0.047197): 0.240202, (Twhipplei: 0.016545, Lxyli:
0.030340): 0.005671): 0.000000): 0.001394): 0.000361, Tfusca: 0.094725): 0.020169,
Sthermophilum: 0.098641): 0.002211): 0.000000, (Gviolaceus: 0.084784,
(Telongatus: 0.028168, ((Selongatus: 0.028875, (Synechococcus: 0.008547,
Pmarinus: 0.010767): 0.036943): 0.005246, (Synechocystis: 0.022803, Nostoc:
0.077702): 0.027566): 0.001440): 0.020757): 0.064815): 0.000000, ((Mkandleri:
0.018659, ((Tkodakaraensis: 0.008351, (Pfuriosus: 0.006324, (Phorikoshii: 0.009433,
Pabyssi: 0.003742): 0.005454): 0.008392): 0.066312, ((Mthermoautotrophicum:
0.019769, (Mmaripaludis: 0.014085, Mjannaschii: 0.009818): 0.023101): 0.012704,
(Afulgidus: 0.047593, (((Mmazei: 0.006479, Macetivorans: 0.019625): 0.122794,
(Hmarismortui: 0.028169, Halobacterium: 0.004998): 0.073132): 0.001469,
((Tvolcanium: 0.002136, Ptorridus: 0.036580): 0.000699, Tacidophilum: 0.002626):
0.076412): 0.001076): 0.003544): 0.000466): 0.000567): 0.007687, (Paerophilum:
0.022637, (Apernix: 0.008760, ((Ssolfataricus: 0.012994, Sacidocaldarius: 0.019433):
0.007346, Stokodaii: 0.003272): 0.094831): 0.003025): 0.012283): 0.033254);
Note 3: The data of Beiko et al.
We also checked the relation between the number of shared genes and the similarity of
tRNA pools using the data from Beiko et al. (22), who identified highways of gene
sharing in prokaryotes based on phylogenetic reconstruction. In agreement with our
protein similarity-based findings, we also find a significant correlation in this dataset
between the number of shared genes and the mean tRs of the corresponding groups
of prokaryotes (r = 0.36, p = 4.7 × 10^-4; comparison to correlations of permutations of
the values gives an empirical p-value of 0.018).
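An empirical p-value of this kind can be obtained with a permutation test, sketched below. The details are assumptions, since the note does not specify them: the sketch uses the Pearson correlation, a one-sided comparison, 1000 permutations, and the common "+1" correction so that the p-value is never exactly zero. The function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Return (observed r, empirical one-sided p-value)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = pearson(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any pairing between x and y
        if pearson(x, shuffled) >= observed:
            hits += 1
    # Fraction of permutations at least as correlated as the real data
    return observed, (hits + 1) / (n_perm + 1)
```

Here `x` would hold the number of shared genes per pair of groups and `y` the corresponding mean tRs values.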
Note 4: Supplementary Methods
Measures for the variability and robustness of gene tAI:
We considered two measures for the robustness of the tAI of particular genes:
The first measure, VtAI, is the standard deviation (STD) of the tAI of a COG across
organisms. COGs with a higher STD of tAI are those whose codon bias and/or the
tRNA pool that recognizes their codons is more variable between organisms,
suggesting different levels of translation efficiency of the gene in different organisms.
The second measure, tRNA robustness, RtAI, was computed as follows: First, we
computed for each codon the STD of the tAI score of that codon across all the
analyzed organisms;
let tSTDi denote the STD of the tAI of codon i. Next, we computed for each COG
(Ci) in each organism Oj the mean STD of the tAI of its codons:
$$E_{tSTD}(C_i, O_j) = \frac{1}{l_{c_i,o_j}} \sum_{k=1}^{l_{c_i,o_j}} tSTD_k$$
where tSTDk is the STD of the tAI of the codon defined by the k-th triplet of the
gene, and lci,oj is the length of COG Ci in organism Oj.
The final RtAI score for COG Ci is the mean EtSTD(Ci, Oj) over all the organisms in
the database:
$$E_{tSTD}(C_i) = \frac{1}{N} \sum_{j=1}^{N} E_{tSTD}(C_i, O_j)$$
where N is the number of organisms in the database. A COG with higher RtAI has a
codon bias that is less robust to changes in the tRNA pool (e.g. due to horizontal gene
transfer). To control for the fact that some of the COGs appear in only a few
organisms, we considered only COGs that appear in at least 47 organisms (higher cutoffs gave very similar results).
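The two measures can be sketched as follows. The data layout and function names are assumptions for illustration, and the sketch uses the population standard deviation (`statistics.pstdev`); the note does not say whether the population or sample STD was used.

```python
from statistics import mean, pstdev


def v_tai(cog_tai_by_org):
    """VtAI: STD of one COG's tAI across organisms.

    cog_tai_by_org[organism] -> tAI of the COG in that organism
    """
    return pstdev(list(cog_tai_by_org.values()))


def codon_tstd(w_by_org):
    """tSTD_i: STD of each codon's tAI (w_i) across organisms.

    w_by_org[organism][codon] -> w of that codon in that organism
    """
    codons = list(next(iter(w_by_org.values())).keys())
    return {c: pstdev([w[c] for w in w_by_org.values()]) for c in codons}


def r_tai(cog_codons_by_org, tstd):
    """RtAI: mean over organisms of the mean tSTD of the COG's codons.

    cog_codons_by_org[organism] -> list of codons of the COG in that organism
    """
    per_org = [
        mean([tstd[c] for c in codons])  # E_tSTD(C_i, O_j)
        for codons in cog_codons_by_org.values()
    ]
    return mean(per_org)  # E_tSTD(C_i)


# Toy input with two organisms and two codons (made-up numbers)
w_by_org = {
    "orgA": {"AAA": 1.0, "AAG": 0.2},
    "orgB": {"AAA": 1.0, "AAG": 0.6},
}
tstd = codon_tstd(w_by_org)
score = r_tai({"orgA": ["AAA", "AAG"], "orgB": ["AAG", "AAG"]}, tstd)
```

In the toy input only AAG varies between organisms, so a COG that uses AAG more often receives a higher RtAI.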
Comparison between the number of HGT events and the variability of the tAI in
COGs
To compare the variability in tAI vs. the number of HGT events that are related to
each COG, we counted the HGT predictions that are related to each COG (6), and the
mean VtAI and RtAI corresponding to each COG, and subsequently correlated these
measures.
Organism comparison: For each organism we computed the G+C (GC) content (the
percentage of G+C in its genome) and the amino acid usage (the vector of frequencies of all 20
amino acids in its coding sequences). To control for the possibility that the correlation
between tRs and the number of HGT events is related to GC content, we computed for
each pair of organisms the absolute value of the difference in their GC content; we
then computed the partial correlation between tRs and number of HGT events given
the difference in GC content. To control for amino acid usage, we computed for each
pair of organisms the correlation between their vectors of amino acid usage, ARs; we
then computed the partial correlation between tRs and number of HGT given the ARs.
To control for phylogenetic proximity, we computed for each pair of organisms their
distance in the evolutionary tree (number of internal nodes); next, we computed the
partial correlation between tRs and number of HGT events given the distance in the
evolutionary tree.
Roughly speaking, each of these controls shows that the correlation between tRs and the
number of HGT events remains significant even when considering organisms with
very similar (or dissimilar) GC content, amino acid usage, and phylogenetic
proximity.
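For a single control variable, a partial correlation can be computed from the three pairwise correlations, as in the sketch below. This is an illustration under assumptions: it uses the Pearson correlation and the standard first-order formula, whereas the note does not state which correlation coefficient was used, and controlling for several factors at once requires the general (regression-based or recursive) form. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z.

    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
```

In this setting `x` would be the tRs of each organism pair, `y` the number of HGT events for the pair, and `z` the control, for example the absolute difference in GC content. When `z` is uncorrelated with both `x` and `y`, the partial correlation reduces to the plain correlation between `x` and `y`.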
COGs comparison: GC content control: We computed the GC content of each COG in each
organism. Next, to control for the
possibility that the correlation between RtAI/VtAI and the number of HGT events may be
explained by variability in GC content, we computed the partial correlation when
controlling for the STD of the GC content of the COG.
Amino acid control: We computed for each amino acid, in each organism, the
expected tAI, EtAI, which is the weighted average of the tAI of the codons that code
for this amino acid (weighted by the codon bias of the organism). The EtAI of a
COG in a certain organism is the geometric mean of the EtAI of all the amino acids in
the COG. To control for the possibility that the correlation between RtAI/VtAI and the
number of HGT events may be explained by variability in amino acid usage, we
computed the partial correlation when controlling for the STD of the EtAI of the
COG.
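The EtAI computation can be sketched as below. The names and the data layout are assumptions, and "codon bias" is interpreted here as the organism's codon usage counts (or frequencies) used as weights within each amino acid; the note does not give the exact weighting.

```python
import math


def expected_tai(w, codon_usage, codons_of_aa):
    """EtAI per amino acid: codon-usage-weighted mean of w.

    w[codon]           -> tAI (w) of the codon in this organism
    codon_usage[codon] -> usage count or frequency of the codon in this organism
    codons_of_aa[aa]   -> list of synonymous codons of amino acid aa
    """
    etai = {}
    for aa, codons in codons_of_aa.items():
        total = sum(codon_usage[c] for c in codons)
        etai[aa] = sum(codon_usage[c] * w[c] for c in codons) / total
    return etai


def cog_etai(protein, etai):
    """Geometric mean of EtAI over the amino acids of one COG's protein."""
    log_sum = sum(math.log(etai[aa]) for aa in protein)
    return math.exp(log_sum / len(protein))


# Toy input for two amino acids (made-up numbers)
codons_of_aa = {"K": ["AAA", "AAG"], "E": ["GAA", "GAG"]}
w = {"AAA": 1.0, "AAG": 0.5, "GAA": 0.4, "GAG": 0.1}
usage = {"AAA": 3, "AAG": 1, "GAA": 1, "GAG": 1}
etai = expected_tai(w, usage, codons_of_aa)
```

Because EtAI depends only on the amino acid sequence and the organism-wide codon usage, its STD across organisms captures variability that is due to amino acid composition rather than to the gene's own codon choice.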
Controls for growth rate
The generation times of 214 organisms were downloaded from (22). To control for
the possibility that the correlation between tRs and the number of HGT events is
related to growth rate, we computed for each pair of organisms the absolute value of
the difference in their generation times. In addition, we computed for each organism
the correlation between the codon bias and the tAI of codons (assuming that this value,
which is related to the translation efficiency of the organism, correlates positively with
growth rate); next, we computed for each pair of organisms the absolute value
of the difference in their CB-tAI correlations. We then computed the partial
correlation between tRs and the number of HGT events given the difference in generation
times and the difference in their CB-tAI correlations (and all other factors).
The phylogenetic tree for reconstructing the ancestral tRNA pool
The phylogenetic tree was reconstructed using the maximum likelihood approach
(40). The final phylogenetic tree includes 190 organisms (see Supplementary Figure
1). The list of organisms and their taxonomy appears in Supplementary Table 1.
We used Neyman's two-state model (46), a version of the Jukes-Cantor (JC) model (47),
to infer the edge lengths (the probability of gain/loss of a gene family) of the tree
by maximum likelihood, as implemented in PAML (48). The edge lengths correspond to
the probabilities that a protein family will appear/disappear along the corresponding
lineage.
Reconstruction of ancestral tRNA copy number
Let p_{u,v} denote the probability of gain/loss of a gene family along the tree
edge (u, v). Let C_x denote the copy number of tRNA C in node x (a genome or an
ancestral genome) of the evolutionary tree. We inferred the ancestral tRNA copy
numbers using a generalized maximum parsimony method, whose penalty for a change
in the number of genes corresponding to tRNA C along the tree edge (u, v) is
$$-\log(p_{u,v}) \cdot |C_u - C_v|$$
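One way to realize such a generalized parsimony reconstruction is Sankoff-style dynamic programming over the tree, sketched below. This is an illustration, not the authors' code. It assumes the penalty on an edge is the negative log of that edge's gain/loss probability times the absolute change in copy number, that ancestral copy numbers range from 0 to the largest observed leaf value, and that ties are broken by taking the smallest copy number. The tree encoding and all names are hypothetical.

```python
import math


def reconstruct_copy_numbers(children, leaf_copies, root):
    """Minimum-penalty ancestral copy numbers for one tRNA.

    children[node]    -> list of (child, p) pairs, p = gain/loss probability of the edge
    leaf_copies[leaf] -> observed copy number in an extant genome
    Returns (assignment for every node, total penalty).
    """
    states = range(max(leaf_copies.values()) + 1)
    cost = {}    # cost[node][a]: best penalty of the subtree if node has a copies
    choice = {}  # choice[child][a]: best child state given the parent has a copies

    def up(node):
        if node in leaf_copies:
            # A leaf is fixed to its observed value
            cost[node] = [0.0 if a == leaf_copies[node] else math.inf for a in states]
            return
        cost[node] = [0.0] * len(states)
        for child, p in children[node]:
            up(child)
            weight = -math.log(p)  # per-copy penalty on this edge
            best = []
            for a in states:
                b = min(states, key=lambda st: cost[child][st] + weight * abs(a - st))
                best.append(b)
                cost[node][a] += cost[child][b] + weight * abs(a - b)
            choice[child] = best

    def down(node, assigned):
        for child, _ in children.get(node, []):
            assigned[child] = choice[child][assigned[node]]
            down(child, assigned)

    up(root)
    assigned = {root: min(states, key=lambda a: cost[root][a])}
    down(root, assigned)
    return assigned, cost[root][assigned[root]]


# Toy tree: root -> A, X ; X -> B, C (made-up probabilities)
tree = {"root": [("A", 0.5), ("X", 0.5)], "X": [("B", 0.5), ("C", 0.5)]}
result, penalty = reconstruct_copy_numbers(tree, {"A": 1, "B": 3, "C": 3}, "root")
```

Edges with a low gain/loss probability carry a large penalty per copy, so changes in copy number are preferentially placed on edges where gain or loss is more probable.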
References
1. Percudani R, Pavesi A, & Ottonello S (1997) Transfer RNA gene redundancy and translational selection in Saccharomyces cerevisiae J Mol Biol 268, 322-330.
2. Kanaya S, Yamada Y, Kudo Y, & Ikemura T (1999) Studies of codon usage and tRNA genes of 18 unicellular organisms and quantification of Bacillus subtilis tRNAs: gene expression level and species-specific diversity of codon usage based on multivariate analysis Gene 238, 143-155.
3. Ikemura T (1981) Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system J Mol Biol 151, 389-409.
4. Dong H, Nilsson L, & Kurland CG (1996) Co-variation of tRNA abundance and codon usage in Escherichia coli at different growth rates J Mol Biol 260, 649-663.
5. Sorensen MA & Pedersen S (1991) Absolute in vivo translation rates of individual codons in Escherichia coli. The two glutamic acid codons GAA and GAG are translated with a threefold difference in rate J Mol Biol 222, 265-280.
6. Segal E, Fondufe-Mittendorf Y, Chen L, Thastrom A, Field Y, et al. (2006) A genomic code for nucleosome positioning Nature.
7. Man O & Pilpel Y (2007) Differential translation efficiency of orthologous genes is involved in phenotypic divergence of yeast species Nat Genet 39, 415-421.
8. Tuller T, Kupiec M, & Ruppin E (2007) Determinants of protein abundance and translation efficiency in S. cerevisiae PLoS Comput Biol 3, e248.
9. Sharp PM & Li WH (1986) An evolutionary perspective on synonymous codon usage in unicellular organisms J Mol Evol 24, 28-38.
10. Varenne S, Buc J, Lloubes R, & Lazdunski C (1984) Translation is a non-uniform process: Effect of tRNA availability on the rate of elongation of nascent polypeptide chains J Mol Biol 180, 549-576.
11. Akashi H (2003) Translational Selection and Yeast Proteome Evolution Genetics 164, 1291-1303.
12. Wall DP, Hirsh AE, Fraser HB, Kumm J, Giaever G, et al. (2005) Functional genomic analysis of the rates of protein evolution Proc Natl Acad Sci U S A 102, 5483-5488.
13. Kimchi-Sarfaty C, Oh JM, Kim I-W, Sauna ZE, Calcagno AM, et al. (2007) A "Silent" Polymorphism in the MDR1 Gene Changes Substrate Specificity Science 315, 525-528.