[Figure S1: one histogram per species (A. fumigatus, C. albicans, C. glabrata, D. hansenii, E. gossypii, K. lactis, S. bayanus, S. cerevisiae, S. mikatae, S. paradoxus, S. pombe). x-axis: codon-triplet occurrence in bins of width 5, from [0..5[ to [115..120[, plus a final bin for 120 or more; y-axis: percentage scale, 0% to 20%.]

Figure S1 Three-codon contexts are species-specific. Codon triplets were divided into 25 groups according to their usage in the ORFeomes. Observed and expected values are plotted as bars and lines, respectively. The relative size of each bar is a measure of the strength of the codon-triplet bias. The data show that codon triplets were not evenly distributed in the ORFeomes: the majority occurred between 5 and 30 times (middle bars of all histograms), while few appeared more than 50 times in the same species (right half of all histograms). The C. albicans ORFeome produced the most divergent histogram, with a very high number of frequent triplets (red bars: triplets that appeared more than 120 times in one ORFeome).

[Figure S2: plot titled "tRNA adaptation to codon usage". y-axis: DAQ, 0.6 to 1.4; x-axis: A. fumigatus, C. albicans, C. glabrata, D. hansenii, E. gossypii, K. lactis, S. bayanus, S. cerevisiae, S. mikatae, S. paradoxus, S. pombe.]

Figure S2 C. albicans tRNAs and codon usage are unbalanced. To test whether tRNAs and codon usage were unbalanced in C. albicans, Relative Synonymous Codon Usage (RSCU) values for all codons [24] and Relative Isoacceptor Usage (RIU) values, which measure tRNA availability, were calculated.
The latter was calculated as for RSCU but using tRNA gene copy numbers, thus determining the fraction of isoacceptors that are utilized, i.e. the gene copy number of each tRNA divided by the number expected assuming equilibrium among all isoacceptors for an amino acid (see Methods). The relationship between tRNA availability and codon usage is expressed in the figure as the Decoding Adaptation Quotient (DAQ = RSCU/RIU). DAQ values closer to 1 indicate high adaptation between tRNAs and codon usage. The C. albicans ORFeome was the least adapted of the 11 analyzed, since it produced the lowest DAQ value. This indicates that this species prefers codons that are decoded by near-cognate tRNAs (see main text for a more detailed explanation).

[Figure S3A: codon repeat diagram.
Codons shown: Gln (CAA, CAG); Asn (AAC, AAT); Ser (AGC, AGT, TCA, TCC, TCG, TCT); His (CAC, CAT); Thr (ACA, ACC, ACG, ACT); Leu (CTA, CTC, CTG, CTT, TTA, TTG).
Species shown: A.fum, C.alb, C.gla, D.han, E.gos, K.lac, S.bay, S.cer, S.mik, S.par, S.pom.]

Figure S3A Codon repeat biases are species and amino acid specific. When individual codon repeat bias was studied using the amino acid repeats methodology (Figure 6), C. albicans and the other species behaved differently. Either both codons for one amino acid (Asn and His) or specific codons within a group of synonymous codons (Gln, Ser or Thr) were highly represented in long strings in the C. albicans ORFeome (codons highlighted in grey at the top of the diagram). Interestingly, the serine CTG codon in C. albicans was also preferred in long strings, as was the case for other serine codons, but leucine codons did not form long repetitions (amino acids highlighted in red). The preference for codon repeats in C. albicans was mainly related to a subset of codons, namely: Gln-CAA; Asn-AAC; Asn-AAT; Ser-AGT; Ser-TCA; Ser-TCT; His-CAC; His-CAT; Thr-ACA; Thr-ACT; and Ser/Leu-CTG.
[Figure S3B: codon repeat diagram.
Codons shown: Lys (AAA, AAG); Pro (CCA, CCC, CCG, CCT); Gly (GGA, GGC, GGG, GGT); Phe (TTC, TTT); Arg (AGA, AGG, CGA, CGC, CGG, CGT); Ile (ATA, ATC, ATT).
Species shown: A.fum, C.alb, C.gla, D.han, E.gos, K.lac, S.bay, S.cer, S.mik, S.par, S.pom.]

Figure S3B Codon repeat biases are species and amino acid specific. When individual codon repeat bias was studied using the amino acid repeats methodology (Figure 6), C. albicans and the other species behaved differently. Amino acids whose codon repeats were repressed are shown here. These codons (highlighted in yellow at the top of the diagram) either belong to families of synonymous codons coding for amino acids that are also repressed in amino acid strings (Phe, Ile, Leu; see Figure 6) or form homopolymeric nucleotide runs (AAA, CCC, GGG, TTT), suggesting that they are related to increased decoding error, namely frameshifting and ribosome drop-off.

[Figure S4: two panels. Panel A: "CTN in absent triplets" (y-axis 0 to 0.035). Panel B: "CTN in unique absent triplets" (y-axis 0 to 0.07). Series: CTG, CTC, CTA, CTT. x-axis: A. fumigatus, C. albicans, C. glabrata, D. hansenii, E. gossypii, K. lactis, S. bayanus, S. cerevisiae, S. mikatae, S. paradoxus, S. pombe.]

Figure S4 CTG codons in absent triplets of C. albicans and D. hansenii. Since CTG codons are decoded in two different ways among the 11 fungal species studied, we investigated whether their unusual decoding as serine altered their presence in triplets. To this end, CTG usage in codon triplets that vanished from the ORFeomes (panel A) was calculated and compared with the results obtained for the other CTN codons. Remarkably, the non-standard decoders C. albicans and D. hansenii were the only species that showed strong rejection of CTG relative to the other CTN codons. The importance of CTG was maximal in triplets that vanished from the D. hansenii ORFeome only (panel B).
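
The RSCU, RIU and DAQ quantities defined in the Figure S2 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the codon-to-anticodon mapping `decoder`, and the counts in the example are hypothetical and chosen only for illustration. The captions do not say how per-codon values were combined into the single DAQ value plotted for each species, so the sketch stops at per-codon ratios.

```python
from typing import Dict, List


def relative_usage(counts: Dict[str, float],
                   families: Dict[str, List[str]]) -> Dict[str, float]:
    """Observed count divided by the count expected if all members of a
    family (synonymous codons, or isoacceptor tRNAs) were used equally.

    With codon counts this gives RSCU; with tRNA gene copy numbers it
    gives RIU as described in the Figure S2 caption.
    """
    result: Dict[str, float] = {}
    for members in families.values():
        total = sum(counts.get(m, 0) for m in members)
        if total == 0:
            continue  # family absent from the data; nothing to normalise
        expected = total / len(members)
        for m in members:
            result[m] = counts.get(m, 0) / expected
    return result


def daq_per_codon(rscu: Dict[str, float],
                  riu: Dict[str, float],
                  decoder: Dict[str, str]) -> Dict[str, float]:
    """DAQ = RSCU / RIU for each codon with a listed decoding tRNA.

    `decoder` (codon -> anticodon) is an assumption of this sketch;
    codons whose tRNA has RIU of zero are skipped to avoid division by zero.
    """
    out: Dict[str, float] = {}
    for codon, anticodon in decoder.items():
        if codon in rscu and riu.get(anticodon, 0) > 0:
            out[codon] = rscu[codon] / riu[anticodon]
    return out


# Illustrative numbers only (not data from the paper): glutamine codons
# and the tRNA genes whose anticodons pair with them.
codon_counts = {"CAA": 300, "CAG": 100}
trna_gene_copies = {"TTG": 3, "CTG": 1}

rscu = relative_usage(codon_counts, {"Gln": ["CAA", "CAG"]})
riu = relative_usage(trna_gene_copies, {"Gln": ["TTG", "CTG"]})
daq = daq_per_codon(rscu, riu, {"CAA": "TTG", "CAG": "CTG"})
```

In this toy case codon usage and tRNA gene copy number are in the same 3:1 proportion, so both DAQ values equal 1, the value the caption associates with high adaptation between tRNAs and codon usage.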