EMBL-EBI
Structural Proteomics Automatic Target Selection
Gordon Whamond

Project Overview
Aim:
• Provide a resource that facilitates the automatic selection of potential targets for protein structure determination while minimising human interaction with the software (if required).
Input:
• Raw amino acid sequence
• UniProt accession number
• UniProt accession number and a sequence range
Output:
• Query sequence showing possible domains
• All candidates for structure determination
• Recommendation for which sequence to use

Considerations
• Is there a known structure?
• Are there Classified Structural (CATH, SCOP) Domains?
• Are there Known Sequence (Pfam) Domains?
• Are there Predicted Structural (Gene3D, Superfamily) Domains?
• Do Domain Boundaries Conform to Secondary Structure Restrictions?
• Which Species has a Representative Domain that is the Most Compactly Folded?
• The core implementation needs to be extensible and easily maintainable.

Taverna
The software is to be implemented using the Taverna workbench, a tool for formulating the workflow and implementing each of its processes as distributed web services.
Advantages:
• Distributed computing reduces resource requirements.
• Easily extensible system
• Maintenance issues shifted to external providers
Disadvantages:
• Learning curve
• Convincing service providers to adopt a standard format
• Maintenance issues shifted to external providers
Tom Oinn - http://taverna.sourceforge.net/

Taverna
When expanded to show all of the incorporated sub-workflows, the prototype workflow is quite complex. Luckily, Taverna can provide a top-level view.
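The three input types listed under Project Overview, and the check that domain boundaries conform to secondary structure, can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's implementation: the "ACCESSION:START-END" range syntax, the per-residue secondary-structure string (H = helix, E = strand, anything else = coil) and the names classify_query, boundary_respects_ss and Query are all hypothetical. The accession check uses the accession-number regular expression published by UniProt; no web services are called.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Accession-number pattern published by UniProt (6- or 10-character accessions).
UNIPROT_AC = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# One-letter residue codes, including the ambiguity codes B, Z, X and the rare U, O.
RESIDUES = re.compile(r"[ACDEFGHIKLMNPQRSTVWYBZXUO]+")


@dataclass
class Query:
    kind: str                                    # "sequence", "accession" or "accession_range"
    value: str                                   # raw sequence or UniProt accession
    seq_range: Optional[Tuple[int, int]] = None  # 1-based, inclusive


def classify_query(text: str) -> Query:
    """Sort user input into one of the three input types.

    Hypothetical range syntax: "ACCESSION:START-END", e.g. "P69905:10-120".
    """
    text = text.strip().upper()
    acc, sep, rng = text.partition(":")
    if sep:
        m = re.fullmatch(r"(\d+)-(\d+)", rng)
        if UNIPROT_AC.fullmatch(acc) and m:
            start, end = int(m.group(1)), int(m.group(2))
            if 1 <= start <= end:
                return Query("accession_range", acc, (start, end))
        raise ValueError(f"not a valid accession and range: {text!r}")
    if UNIPROT_AC.fullmatch(text):
        return Query("accession", text)
    seq = "".join(text.split())  # drop spaces/newlines from pasted sequences
    if RESIDUES.fullmatch(seq):
        return Query("sequence", seq)
    raise ValueError(f"unrecognised query: {text!r}")


def boundary_respects_ss(ss: str, start: int, end: int) -> bool:
    """Return True if neither end of the domain [start, end] (1-based,
    inclusive) cuts through a helix ('H') or strand ('E') in ss."""

    def splits_element(left: int, right: int) -> bool:
        # A cut between 0-based positions left and right splits an element
        # when both residues carry the same H or E label.
        if left < 0 or right >= len(ss):
            return False  # cutting at a chain terminus is always allowed
        return ss[left] == ss[right] and ss[left] in "HE"

    # N-terminal cut: before residue `start`; C-terminal cut: after residue `end`.
    return not splits_element(start - 2, start - 1) and not splits_element(end - 1, end)
```

For example, classify_query("P69905:10-120") yields an accession_range query with seq_range (10, 120). With ss = "CCHHHHCCCEEEECC", the candidate 3-6 is accepted, while 4-12 is rejected because both of its boundaries fall inside secondary-structure elements. Treating a boundary as invalid only when it splits a run of identical H or E labels is a deliberate simplification; real restrictions (such as a minimum loop length around a boundary) would need extra rules.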
[Figures: Taverna; Dealing With DAS; Taverna]

Process Data
Secondary Structure Elements: (method not yet chosen)
Sequence Domains: Pfam, Gene3D, Superfamily, etc.
Protein Folding: RONN, FoldIndex, DisEMBL
Rank Target Selection: based on loop lengths, folding predictions, etc.

[Figures: Starting the Process; Monitoring Progress; Assess Data; Review Results]

Extensibility
Java Services
• Straightforward to provide as a web service using Tomcat and Axis
• WSDL (describing the service) can be generated automatically
Legacy Software
• Any command-line tool can be wrapped as a web service using Soaplab
• For example, the EMBOSS tools are already available

Extensibility
Output Format:
To ensure generic service compatibility, it helps to define a common results format. We are therefore using the e-Family service schema (http://www.efamily.org.uk/).
Current collaborators include:
• The Weizmann Institute - FoldIndex
• University of Oxford - RONN

Results Viewers
http://www.efamily.org.uk/software/dasclients/spice/

Conclusions
Taverna and Web Services:
• Taverna facilitates building complex distributed systems that use web services.
• This reduces maintenance overheads and keeps technology requirements at a reasonable level.
• It is also easily extensible to accommodate new services.
Availability:
• Hopefully the core system will be ready by the end of the year.
• This will provide the basic workflow for users to customise according to their needs.

Acknowledgments
Thanks to: Tom Oinn, Andreas Prlic, the RONN and FoldIndex teams, and the MSD Group