Slide 1 - LBGI Bioinformatique et Génomique Intégratives

ARPAnno: a dedicated web tool for Annotation of Actin Related Proteins

Jean Muller1,3, Yukako Oma2, Laurent Vallar3, Evelyne Friederich3, Olivier Poch1 and Barbara Winsor2

1 Laboratoire de Biologie et Génomique Structurales, IGBMC, CNRS/INSERM/ULP, BP 163, 67404 Illkirch cedex, France.
2 Laboratoire Modèles Levure de Pathologies Humaines, FRE2375, IPCB, CNRS, 21 rue Descartes, 67084 Strasbourg, France.
3 Laboratoire de Biologie Moléculaire, d'Analyse Génique et de Modélisation, CRP-Santé, 42, rue du Laboratoire, L-1911, Luxembourg.
[email protected]

Introduction
Actin Related Proteins (ARPs) are key players in major biological processes important for cell life. In cytoskeletal activities, the ARP2/3 complex is essential for actin dynamics, and ARP1 and ARP11 are involved in microtubule-based vesicle trafficking. In nuclear functions (transcriptional activation, tumor suppression…), ARP4 to ARP9 are components of many chromatin-modulating complexes (SWI2/SNF2, SWR1, HAT). Conventional actins and ARPs together form a large family of homologous proteins, the actin superfamily, which share a tertiary structure known as the “actin fold”. Since 1997 (Poch and Winsor), the unified ARP classification has comprised 11 families, ordered primarily by decreasing sequence similarity to conventional actin: ARP1 is the most similar and ARP11 the least similar.
Because ARP and actin sequences are so closely related, ARP sequences are often difficult to annotate unambiguously with classical database searches. Discriminative tools that distinguish ARPs from actin are therefore of high interest for understanding the mechanisms in which these proteins are involved. An initial dataset was defined to form the basis of a multiple alignment of all ARP sequences.
This set allows us to characterise each ARP family (sequence identity, specific residues and insertions, phylogenetic distribution) and to implement ARPAnno (http://bips.u-strasbg.fr/ARPAnno), a web server dedicated to ARP sequence annotation.

Initial set
In-depth searches of the protein database (Uniprot) were run to retrieve as many distinct ARP sequences as possible. For each family, these searches used separate queries from distantly related organisms (i.e. H. sapiens, D. melanogaster and S. cerevisiae) and the PipeAlign program (http://bips.u-strasbg.fr/PipeAlign: blastp, ballast, DbClustal, Rascal, DPC). In total, 73340 proteins were detected, representing 4200 non-redundant, “non-fragment” sequences. Proteins with ≤ 15% amino acid identity, and unrelated sequences, were not included in the final alignment.
The number of ARP sequences in the protein database (Uniprot) increased from 29 (1997) to 146 (July 2004). These ARPs fall into 3 groups: > 19 sequences for ARP1-4, > 10 for ARP5, ARP6 and ARP8, and ≤ 10 for ARP7, ARP9, ARP10 and ARP11.

ARP families characterisation
1 Basic sequence analysis
ID(S_i, S_j) denotes the percent identity between sequences S_i and S_j, and n the number of sequences in a family.
IniID: initial percent identity used to classify ARP families (% identity to a group of 29 actins).
RefID: mean percent identity of an ARP family to the reference actin S_REF:
$$\mathrm{RefID} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{ID}(S_i, S_{\mathrm{REF}})$$
FamID: mean percent identity inside a family:
$$\mathrm{FamID} = \frac{2}{n(n-1)}\sum_{1 \le i < j \le n} \mathrm{ID}(S_i, S_j)$$
Assessment of the 11-family ARP classification: percent identity to the reference actin (RefID) decreases from ARP1 to ARP11. Family conservation (FamID) is high for ARP1-3, the main cytoplasmic ARPs. By contrast, it is lower for the nuclear ARPs and for the most divergent families, ARP10 and ARP11.

2 Definition of ARP family features
A high-quality ARP Multiple Alignment of Complete Sequences (MACS) containing 692 sequences and 146 ARPs highlights family-specific features, such as conserved residues or motifs and insertions, for ARP1-9. No specific features have been defined for the divergent ARP10 and ARP11. Four insertion hot spots (A, B, C, D) lie at positions peripheral to the core fold.
[Figure: actin sequence with actin subdomains 1, 2 and 3, 4, marking deletions, insertions, specific insertions, specific residues or motifs, and insertion/deletion hot spots.]
These features were used to create an ARP family Knowledge Filter, a cornerstone of the ARP annotation process.

3 Distribution of ARP families among eukaryotes
The eukaryotic presence and absence distribution was cross-validated with proteome searches (blastp in Uniprot) and genome exploration (tblastn) in 19 different organisms, ranging from T. pseudonana (algae) to H. sapiens (mammals).
Actin is present in all eukaryotic organisms explored, and ARP4 and ARP6 are present in all organisms tested. Nuclear ARPs are therefore the minimum package for eukaryotic organisms.
Presence and absence patterns reveal pairs of ARPs (ARP2 with ARP3, ARP4 with ARP6, and ARP5 with ARP8). These pairs correlate strongly with the biological data available for ARP-containing complexes.
S. pombe and Y. lipolytica have no ARP7, but they are the only yeasts out of 31 to own a second ARP4 (ARP4*).

ARPAnno web server
http://bips.u-strasbg.fr/ARPAnno
Validation: all 146 available ARP sequences were correctly annotated. A further 68 new sequences from a recent version of Uniprot were annotated: 36 conventional actins, 3 orphans, 6 ARP1, 7 ARP2, 6 ARP3, 8 ARP4, 1 ARP9 and 1 ARP10. They come from diverse organisms such as Y. lipolytica, D. hansenii, P. tetraurelia, X. tropicalis and G. gallus.
Annotation is a multi-step process. The example input below was submitted through the web interface; it is an unknown potential actin-like protein in FASTA format:
>Q5ZM58_CHICK Hypothetical protein.
MESYDVIANQPVVIDNGSGVIKAGFAGDQIPKYCFPNYVGRPKH
VRVMAGALEGDIFIGPKAEEHRGLLSIRYPMEHGIVKDWNDMER
IWQYVYSKDQLQTFSEEHPVLLTEAPLNPRKNRERAAEVFFETF
NVPALFISMQAVLSLYATGRTTGVVLDSGDGVTHAVPIYEGFAM
PHSMRIDIAGRDVSRFLRLYLRKEGYDFHTTSEFEIVKTIKERACY
LSINPQKDETLETEKAQYYLPDGSTIEIGSARFRAPELLFRPDLIG
EECEGLHEVLVFAIQKSDMDLRRTLFSNIVLSGGSTLFKGFGDRL
LSEVKKLAPKDVKIRISAPQERLYSTWIGGSILASLDTFKKMWVS
KKEYEEDGARAIHRKTF
Step 1: Local alignment with blastp, then selection of the families eligible for the next step using GID (global percent identity) and pCover (percent sequence coverage).
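Pairwise percent identity underlies both the RefID and FamID averages from the basic sequence analysis and the identity-based GID of step 1. The minimal Python sketch below assumes the sequences are already aligned, as equal-length strings with '-' for gaps. It defines ID as the number of identical residues divided by the number of positions where neither sequence has a gap. That convention and the helper names percent_identity, ref_id and fam_id are our assumptions, not necessarily how the published figures were computed. The toy sequences are invented.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """ID(S_a, S_b) for two aligned sequences.

    Assumed convention: identical residues divided by the number of
    positions where neither sequence has a gap ('-').
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    identical = sum(1 for x, y in compared if x == y)
    return 100.0 * identical / len(compared)


def ref_id(family: list[str], reference: str) -> float:
    """RefID = (1/n) * sum over i of ID(S_i, S_REF)."""
    return sum(percent_identity(s, reference) for s in family) / len(family)


def fam_id(family: list[str]) -> float:
    """FamID = 2 / (n(n-1)) * sum over pairs i < j of ID(S_i, S_j)."""
    n = len(family)
    if n < 2:
        raise ValueError("FamID needs at least two sequences")
    total = sum(percent_identity(a, b) for a, b in combinations(family, 2))
    return 2.0 * total / (n * (n - 1))


# Toy aligned sequences (invented for illustration, not real ARPs).
reference = "MDDEIAALV"
family = ["MDEEIAALV", "MDD-IAGLV", "MEDEIAALI"]

print(round(ref_id(family, reference), 2))  # 84.72
print(round(fam_id(family), 2))             # 68.06
```

There are n(n-1)/2 unordered pairs, so the factor 2/(n(n-1)) in FamID is just one over the number of pairs. FamID is therefore the plain mean of all pairwise identities within a family.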
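Step 1 can be pictured as a filter over local hits. The poster gives neither the exact definitions of GID and pCover nor the eligibility cut-offs, so everything in this sketch is hypothetical:

- the LocalHit record;
- GID, taken as identical residues over the full query length;
- pCover, taken as the share of the query spanned by the local alignment;
- the min_gid and min_cover thresholds.

It illustrates the idea under those assumptions and is not ARPAnno's implementation.

```python
from dataclasses import dataclass


@dataclass
class LocalHit:
    """A local (blastp-style) hit of the query on one family member.

    Hypothetical record: these fields are not blastp's output format.
    """
    family: str
    q_start: int     # 1-based start of the aligned region on the query
    q_end: int       # 1-based inclusive end of the aligned region
    identities: int  # identical residues inside the aligned region


def gid(hit: LocalHit, query_len: int) -> float:
    # Assumed definition: identities over the FULL query length, so a short
    # local match cannot reach a high "global" identity.
    return 100.0 * hit.identities / query_len


def p_cover(hit: LocalHit, query_len: int) -> float:
    # Assumed definition: percentage of the query spanned by the local alignment.
    return 100.0 * (hit.q_end - hit.q_start + 1) / query_len


def eligible_families(hits, query_len, min_gid=20.0, min_cover=50.0):
    """Return {family: (GID, pCover)} for families whose best hit passes
    both thresholds (placeholder threshold values, not ARPAnno's)."""
    best = {}
    for h in hits:
        g = gid(h, query_len)
        # Keep the hit with the highest GID for each family.
        if h.family not in best or g > best[h.family][0]:
            best[h.family] = (g, p_cover(h, query_len))
    return {fam: (g, c) for fam, (g, c) in best.items()
            if g >= min_gid and c >= min_cover}


# Invented hits for a 380-residue query.
hits = [
    LocalHit("ARP1", 1, 370, 150),
    LocalHit("Actin", 5, 360, 120),
    LocalHit("ARP10", 200, 300, 30),
]
print(sorted(eligible_families(hits, query_len=380)))  # ['ARP1', 'Actin']
```

Keeping only the best hit per family means one strong match is enough to carry a family forward to the global-alignment step. Requiring both measures to pass screens out short, high-identity fragments.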
Step 2: Global alignment of the query with the reference alignments of the eligible families, using clustalw.
Step 3: Filtering with the Knowledge Filter for specific residues and motifs (pDR) and for specific insertions (pDI).
Step 4: Calculation of one score for each eligible family and determination of the most suitable ARP family:
$$S_{\mathrm{ARP}_i} = 0.2\,\mathrm{GID}_{\mathrm{ARP}_i} + 0.1\,\mathrm{pCover}_{\mathrm{ARP}_i} + 0.4\,\mathrm{pDR}_{\mathrm{ARP}_i} + 0.3\,\mathrm{pDI}_{\mathrm{ARP}_i}$$
where GID is the global percent identity, pCover the percent sequence coverage, pDR the percent of specific residues and pDI the percent of specific insertions.
The web interface reports the results for actin and ARP1 to ARP11 as a table, and a coloured multiple alignment is also available.

Conclusions and perspectives
• The development of a high-quality multiple alignment of ARP sequences allows the ARP classification to be validated and family features (residues and insertions) to be defined.
• The major ARP families are the nuclear ARP4 and ARP6.
• The correlation between the distribution of ARPs across organisms and functional data is a benchmark case for phylogenetic profiling methods.
• ARPAnno, a new web server for the unambiguous identification of ARP sequences, is available.
• In future: keep the ARP MACS up to date and add structural features to ARPAnno.
• Extend the genome exploration.

References
Poch, O., and Winsor, B. (1997). Who's who among the Saccharomyces cerevisiae actin-related proteins? A classification and nomenclature proposal for a large family. Yeast 13, 1053-1058.
Plewniak, F., et al. (2003). PipeAlign: A new toolkit for protein family analysis. Nucleic Acids Res 31, 3829-3832.
Altschul, S.F., et al. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389-3402.
Thompson, J.D., et al. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673-4680.

Acknowledgments: Ministère de la Culture, de l’Enseignement Supérieur et de la Recherche du Luxembourg; Fonds National de la Recherche du Luxembourg; CNRS; INSERM, France.
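The step 4 score can be reproduced directly from the weights given for S_ARPi (0.2 GID, 0.1 pCover, 0.4 pDR, 0.3 pDI). This sketch assumes all four measures are percentages on a 0-100 scale. The function names score_arp and best_family and the candidate values are hypothetical, invented for illustration.

```python
# Weights of S_ARPi as given on the poster; they sum to 1.0.
WEIGHTS = {"GID": 0.2, "pCover": 0.1, "pDR": 0.4, "pDI": 0.3}


def score_arp(measures: dict[str, float]) -> float:
    """S_ARPi = 0.2 GID + 0.1 pCover + 0.4 pDR + 0.3 pDI (percent scale assumed)."""
    return sum(weight * measures[name] for name, weight in WEIGHTS.items())


def best_family(per_family: dict[str, dict[str, float]]) -> tuple[str, float]:
    """Score every eligible family and return the highest-scoring one."""
    scores = {fam: score_arp(m) for fam, m in per_family.items()}
    top = max(scores, key=scores.get)
    return top, scores[top]


# Invented measures for two eligible families of one query.
candidates = {
    "ARP2": {"GID": 40.0, "pCover": 95.0, "pDR": 80.0, "pDI": 100.0},
    "ARP3": {"GID": 45.0, "pCover": 98.0, "pDR": 30.0, "pDI": 50.0},
}
family, score = best_family(candidates)
print(family, round(score, 1))  # ARP2 79.5
```

Because the weights sum to 1.0, the score stays on the same 0-100 scale as its inputs. The two Knowledge Filter terms (pDR and pDI) together carry 70% of the weight. In the toy example, ARP3 has the higher identity and coverage but still loses to ARP2, which matches more family-specific residues and insertions.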
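A simple profile comparison illustrates two points from the poster: the presence/absence pairs reported for the distribution of ARP families, and the conclusion about phylogenetic profiling. In this sketch, each family is a 0/1 vector over the same ordered list of organisms. Two families are paired when their vectors differ in at most max_mismatch organisms (Hamming distance). The profiles are toy data chosen to mimic the ARP2/ARP3 and ARP5/ARP8 pairs. They are not the poster's 19-organism data, and the function names are hypothetical.

```python
from itertools import combinations


def hamming(p: tuple[int, ...], q: tuple[int, ...]) -> int:
    """Number of organisms where two presence(1)/absence(0) profiles differ."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same organisms")
    return sum(a != b for a, b in zip(p, q))


def co_occurring_pairs(profiles: dict[str, tuple[int, ...]],
                       max_mismatch: int = 0) -> list[tuple[str, str]]:
    """Pairs of families whose profiles differ in at most max_mismatch organisms."""
    return [(f, g) for f, g in combinations(sorted(profiles), 2)
            if hamming(profiles[f], profiles[g]) <= max_mismatch]


# Toy profiles over five hypothetical organisms (not the poster's data).
profiles = {
    "ARP2": (1, 1, 0, 1, 1),
    "ARP3": (1, 1, 0, 1, 1),
    "ARP5": (1, 0, 1, 1, 0),
    "ARP7": (0, 1, 0, 0, 1),
    "ARP8": (1, 0, 1, 1, 0),
}
print(co_occurring_pairs(profiles))  # [('ARP2', 'ARP3'), ('ARP5', 'ARP8')]
```

This simplest form has one limitation. Families present everywhere share the all-present profile: in the organisms tested, these are actin, ARP4 and ARP6. Identical profiles alone therefore cannot separate the ARP4/ARP6 pair from actin.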
