SUPPORTING INFORMATION LEGENDS
Figure S1a-l. Contact predictions (plmDCA method) for all tandem pairs of PPR motifs observed
in PLS-class PPR proteins. Panels S1a-f show all the classical tandem motif pairs: P1-L1, L1-S1, S1-P1, S1-P2, P2-L2 and L2-S2; panels S1g-l show all the newly identified tandem motif pairs: S1-SS, SS-SS, SS-P1, SS-P2, S2-E1 and E1-E2. The characteristic signature of an anti-parallel α-hairpin is
observed in all cases.
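The anti-parallel α-hairpin signature can be stated in index terms: when two helices pack anti-parallel, residue i of the first helix contacts a residue j of the second such that j falls as i rises (i + j stays roughly constant), whereas parallel packing makes both indices rise together. The Python sketch below checks this using the sign of the Pearson correlation between i and j over a set of predicted contacts. It is an illustrative assumption, not the analysis used for Figure S1: the input format (a list of (i, j) residue-index pairs, for example the top-ranked plmDCA couplings between two motifs), the function names and the -0.5 threshold are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def looks_antiparallel(contacts, threshold=-0.5):
    """Hypothetical check: do contact pairs (i, j) run along an anti-diagonal?

    `contacts` lists residue-index pairs, i in the first motif's helix and
    j in the second. A strongly negative i/j correlation means j decreases
    as i increases, the pattern expected for an anti-parallel hairpin.
    Raises ZeroDivisionError if either index list is constant.
    """
    if len(contacts) < 3:
        raise ValueError("need at least three contact pairs")
    i_idx = [i for i, _ in contacts]
    j_idx = [j for _, j in contacts]
    return pearson(i_idx, j_idx) <= threshold


# Toy contact lists (not real plmDCA output):
print(looks_antiparallel([(3, 30), (7, 26), (10, 23), (14, 19)]))  # True: i + j is constant
print(looks_antiparallel([(3, 20), (7, 24), (10, 27), (14, 31)]))  # False: parallel packing
```

A correlation test is only a coarse summary; it ignores contact strength and assumes the contacts come from a single helix pair.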
Figure S2. Consensus sequence logos generated by WebLogo (http://weblogo.berkeley.edu) for
each defined PPR motif. The motif sequences were collected from 41 representative land plant
species and combined to show the amino acid conservation at each position along the motif.
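The stack height at each position of a logo such as those in Figure S2 reflects the information content of that column. A minimal Python sketch of that quantity is R = log2(20) - H, where H is the Shannon entropy of the residue frequencies in the column. It assumes equal-length aligned motif sequences and omits the small-sample correction that WebLogo can apply; the function name and the toy sequences are illustrative only.

```python
from collections import Counter
from math import log2


def column_information(sequences, alphabet_size=20):
    """Information content (bits) at each column of aligned, equal-length sequences.

    R = log2(alphabet_size) - H, with H the Shannon entropy of the observed
    residue frequencies. Gaps ('-') are ignored; all-gap columns score 0.
    No small-sample correction is applied.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    info = []
    for pos in range(length):
        counts = Counter(s[pos] for s in sequences if s[pos] != "-")
        total = sum(counts.values())
        if total == 0:
            info.append(0.0)
            continue
        entropy = -sum((c / total) * log2(c / total) for c in counts.values())
        info.append(log2(alphabet_size) - entropy)
    return info


# Toy alignment (not real motif data): a fully conserved, a two-way and a 3:1 column.
print([round(r, 2) for r in column_information(["NLD", "NVD", "NLN", "NVD"])])
# [4.32, 3.32, 3.51]
```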
Figure S3. Rosetta models of various motif combinations for PLS-class PPR proteins. Each panel
shows, from the top down: (i) the top 20 models in ribbon format; (ii) the model closest to the mean
of the top 20 models, in ribbon format; (iii) a surface model with basic amino acid residues in blue,
acidic residues in red, polar uncharged residues in pink, cysteine residues in yellow and other nonpolar residues in white; (iv) the contact surfaces (coloured) between the two central motifs, defined
as residues within 5 Å of the opposing motif. a: S1-P1-L1-S1, b: P1-L1-S1-P1, c: L1-S1-P1-L1, d: L1-S1-P2-L2, e: S1-P2-L2-S2, f: P2-L2-S2-E1, g: SS-SS, h: SS-P1, i: L2-S2-E1-E2, j: E1-E2.
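The contact-surface definition in panel (iv) reduces to a distance test between atoms. The Python sketch below assumes each motif is given as a list of (residue label, [atom coordinates in Å]) pairs, for instance parsed from one Rosetta model; this input format, the function names and the treatment of the 5 Å boundary as inclusive are assumptions, not details taken from the analysis.

```python
from math import dist  # Python 3.8+


def contact_surface(motif_a, motif_b, cutoff=5.0):
    """Labels of residues in motif_a with any atom within `cutoff` Å of any atom in motif_b.

    Each motif is a list of (residue_label, [(x, y, z), ...]) pairs in Å.
    """
    atoms_b = [xyz for _, coords in motif_b for xyz in coords]
    return [
        label
        for label, coords in motif_a
        if any(dist(p, q) <= cutoff for p in coords for q in atoms_b)
    ]


def interface(motif_a, motif_b, cutoff=5.0):
    """Contact residues on both sides of the interface between two motifs."""
    return contact_surface(motif_a, motif_b, cutoff), contact_surface(motif_b, motif_a, cutoff)


# Toy coordinates (one atom per residue, not from a real model):
motif_a = [("K12", [(0.0, 0.0, 0.0)]), ("E15", [(20.0, 0.0, 0.0)])]
motif_b = [("D40", [(2.0, 1.0, 0.0)]), ("L44", [(30.0, 0.0, 0.0)])]
print(interface(motif_a, motif_b))  # (['K12'], ['D40'])
```

The all-pairs loop is quadratic in atom count, which is acceptable for two motifs of roughly 30 to 40 residues each; larger structures would call for a spatial index.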
Figure S4. Length distribution of the sequences extending beyond the E2 motif. The frequency of
extensions of each length beyond the E2 motif is plotted as three distributions: extensions that
match the full DYW motif are shown in red, those that partially match the DYW motif in green, and
those whose match to the DYW motif does not pass the E-value cut-off in blue.
Figure S5. Proportions of gene models from the public annotations for each species for which
better gene models could be constructed by our de novo genome-based annotation pipeline.
Figure S6. Comparison of the frequency distributions of the number of motifs per PPR protein
across several representative species: A. thaliana, O. sativa, P. abies, S. moellendorffii, P.
patens and C. reinhardtii. PLS-class sequences are shown in black, P-class sequences in grey.
Figure S7. Comparison of the frequency distributions of the different 5th/last amino acid combinations
in PPR motifs across several representative species: A. thaliana, O. sativa, P. abies, S. moellendorffii
and P. patens. The stacked bars are colour-coded to indicate motif class.
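Assuming that a 5th/last combination means the pair of amino acids at position 5 and at the final position of each motif, the tallies behind such a plot can be sketched in Python as below. The input format (a list of (motif class, motif sequence) pairs), the function name and the toy sequences are hypothetical.

```python
from collections import Counter


def fifth_last_counts(motifs):
    """Count (motif_class, residue5 + last_residue) combinations.

    `motifs` is a list of (motif_class, sequence) pairs. Position 5 is
    1-based, i.e. index 4 of the sequence string.
    """
    counts = Counter()
    for motif_class, seq in motifs:
        if len(seq) < 5:
            continue  # too short to have a 5th residue
        counts[(motif_class, seq[4] + seq[-1])] += 1
    return counts


# Toy sequences, not real PPR motifs:
toy = [("P1", "AAAATAAN"), ("P1", "GGGGTGGN"), ("S1", "AAAANAAD")]
print(fifth_last_counts(toy))
# Counter({('P1', 'TN'): 2, ('S1', 'ND'): 1})
```

Keying on (class, combination) keeps the counts ready for a bar chart stacked by motif class.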
Table S1. Genome and proteome resources used in this analysis. In total, 116 genomes were
collected, of which 111 encode putative PPR sequences (the remaining 5 species are prokaryotes
lacking PPR genes). For each species, the table gives the data release version and the source
from which the data were accessed. The species marked with an asterisk are the 41 species
used to derive the PPR motif definitions.
Table S2. (a) Frequency of each motif class as the first motif of PLS-class PPR proteins in
Arabidopsis thaliana. (b) Frequency of each motif class as the first motif of PLS-class PPR
proteins across all plant species.
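Given one motif-structure string per protein, frequencies of this kind amount to tallying the first motif of each string. The Python sketch below assumes a hyphen-separated structure format (for example 'P1-L1-S1-P2-L2-S2-E1-E2'), which is a hypothetical layout rather than a documented one; it does not filter on protein class, so the input is assumed to contain PLS-class structures only, and the function name is illustrative.

```python
from collections import Counter


def first_motif_frequencies(structures, sep="-"):
    """Relative frequency of each motif type as the first motif of a protein.

    `structures` holds one motif-structure string per PLS-class protein.
    A trailing lower-case 'i' (inferred motif) is stripped before counting.
    """
    counts = Counter()
    for s in structures:
        tokens = [t.strip() for t in s.split(sep) if t.strip()]
        if tokens:
            first = tokens[0]
            counts[first[:-1] if first.endswith("i") else first] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {motif: n / total for motif, n in counts.items()}


toy = ["P1-L1-S1-P2-L2-S2-E1-E2", "S1-P1-L1-S1-P2-L2-S2", "P1i-L1-S1-P2-L2-S2-E1-E2"]
print(first_motif_frequencies(toy))  # {'P1': 0.6666666666666666, 'S1': 0.3333333333333333}
```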
Table S3. Examples of probable PPR annotation errors in public gene models. Here we show
typical examples of gene fusions, deletions (gene loss) and truncations (motif loss).
Table S4. PPR annotations for 109 species (one spreadsheet per species). For each PPR protein, the
gene ID, chromosome/scaffold (if available), protein length, exon-intron structure, PPR
classification and motif structure are given. Motifs not detected by hmmsearch but inferred from their
size and context are indicated by a lower-case ‘i’ appended to the motif type in the structure. These
motifs are flagged so that they can easily be included in or excluded from future analyses as desired.
‘NA’ indicates that the information is not available.
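Because inferred motifs carry a trailing lower-case ‘i’, they can be kept or dropped when a structure string is parsed. A minimal Python sketch, assuming hyphen-separated motif types (a hypothetical layout; the actual delimiter in Table S4 may differ), a hypothetical function name, and that no genuine motif name ends in a lower-case ‘i’:

```python
def parse_structure(structure, include_inferred=True, sep="-"):
    """Split a motif-structure string into motif types.

    Motifs inferred from size and context end in a lower-case 'i' (e.g. 'L1i').
    With include_inferred=True they are kept with the flag removed;
    with include_inferred=False they are dropped.
    """
    motifs = []
    for token in structure.split(sep):
        token = token.strip()
        if not token:
            continue
        inferred = token.endswith("i")
        if inferred and not include_inferred:
            continue
        motifs.append(token[:-1] if inferred else token)
    return motifs


print(parse_structure("P1-L1i-S1-P2"))                          # ['P1', 'L1', 'S1', 'P2']
print(parse_structure("P1-L1i-S1-P2", include_inferred=False))  # ['P1', 'S1', 'P2']
```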
Table S5. Summary statistics of PPR frequencies for each species.