Mapping of Scientific Workflow within the e-Protein project to Distributed Resources
London e-Science Centre
Department of Computing, Imperial College London

Introduction
• Proteins
• Background to e-Protein project
• Project Workflow
• ICENI
• Demo

What are Proteins?

Proteins
• Proteins are necklaces of smaller subunits called amino acids.
• They are the basis of how biology gets things done:
  - Give structure to our hair, skin, bones
  - Act as hormones and enzymes
  - Act as antibodies in support of the immune system
  - Example: gastrin (a hormone) -> stomach -> causes HCl production
• For this reason, scientists have sequenced the human genome: the DNA code which specifies the sequence of amino acids along the protein “necklace”.

Proteins
• Knowing the sequence alone tells us little about what the protein does.
• To carry out their function, proteins must “fold” and form the correct structure.
• Incorrect folding can lead to diseases such as Alzheimer's and cancer.
• By studying protein structure we can better understand the nature of disease and design more effective drugs.

Proteins
• Genome projects have yielded huge numbers of protein sequences.
• Understanding protein structure requires huge amounts of computing power and expertise.
• No single UK group can achieve this single-handed.
• Sharing computing resources across different sites provides an obvious solution.

e-Protein
• Funded by the BBSRC through its e-Science programme
• The objective is to provide a structure-based annotation of the proteins in the major genomes by linking resources at three sites: EBI, Imperial, UCL.
• At each of the three sites there is a local database providing the local contribution to the protein annotation.
• Different strategies are used at each site to assign protein structures to the sequences.
• Comparison of the results to identify problems

[Figure slide: e-Protein]

e-Protein
• At Imperial there is a MySQL relational database called 3D-GENOMICS
• The pipeline focuses on proteins, for which several steps of analysis are performed using various applications:
  - Identification of transmembrane (TM) regions
  - Coiled coils
  - Prosite patterns
  - Secondary structure prediction
• Structural information is assigned via homology (using BLAST, PSI-BLAST)

[Figure slide: 3D-Genomics]

ICENI
Capture of this workflow and mapping of the components to distributed resources is the priority within the Imperial side of the project
• ICENI provides a mechanism for creating and managing computational grids
• Uses a component programming model to describe grid applications
• Allows scientists to define the required workflow within a graphical environment by dragging and dropping components

ICENI Middleware
• Has a rich meta-data structure
• Allows the current state of the resources to be captured
• Allows the different library versions and application programs to be represented
• Defined in an XML schema
• Two main services provided by ICENI:
  - Launching Framework
  - Scheduling Framework

Launching/Scheduling
• A scheduler is responsible for deciding where components will run
• A launcher is responsible for starting components on resources
• Grid container: responsible for starting each of the components
• Applications within the Imperial College e-Protein pipeline are wrapped as binary components
• The wrapping provides the metadata required to schedule and launch the application

Binary Component
• Binary executable the component represents
• JDML file describes:
  - Application execution
  - Arguments taken
• Capable of exchanging a number of input/output data items with other components.
• NB in the e-protein workflow

Acknowledgements
• Director: Professor John Darlington
• Technical Director: Dr Steven Newhouse
• Research Staff:
  - Anthony Mayer, Nathalie Furmento
  - Stephen McGough, William Lee
  - Kieran Flemming, Oliver Jevons
• Contact:
  - http://www.lesc.ic.ac.uk/
  - e-mail: [email protected]
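
Illustrative sketch
The scheduler/launcher split described under Launching/Scheduling can be illustrated with a minimal Python sketch. It is not ICENI's API: the class names, the least-loaded placement rule, the executable names and the hosts are all hypothetical, chosen only to show a scheduler deciding where wrapped binary components run and a launch order that respects the data dependencies between components.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass
class BinaryComponent:
    """Hypothetical stand-in for a wrapped executable plus its metadata."""
    name: str
    executable: str                               # binary the component represents
    arguments: list = field(default_factory=list)
    inputs: list = field(default_factory=list)    # names of upstream components


@dataclass
class Resource:
    """Hypothetical description of a grid resource and its current state."""
    host: str
    installed: set    # executables available on this resource
    load: float       # current load, lower is better


def schedule(components, resources):
    """Scheduler: decide where each component will run.

    Assumed rule: the least-loaded resource that offers the executable.
    """
    placement = {}
    for comp in components:
        candidates = [r for r in resources if comp.executable in r.installed]
        if not candidates:
            raise LookupError(f"no resource offers {comp.executable}")
        placement[comp.name] = min(candidates, key=lambda r: r.load).host
    return placement


def launch_order(components):
    """Launcher side: order components so producers start before consumers."""
    graph = {c.name: set(c.inputs) for c in components}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical four-step annotation workflow (executable names are made up).
workflow = [
    BinaryComponent("sequence_source", "fetch_sequences"),
    BinaryComponent("tm_regions", "tm_predict", inputs=["sequence_source"]),
    BinaryComponent("secondary_structure", "ss_predict", inputs=["sequence_source"]),
    BinaryComponent("annotate", "store_annotation",
                    inputs=["tm_regions", "secondary_structure"]),
]

resources = [
    Resource("node-a.example.org",
             {"fetch_sequences", "tm_predict", "store_annotation"}, load=0.7),
    Resource("node-b.example.org", {"tm_predict", "ss_predict"}, load=0.2),
]

placement = schedule(workflow, resources)
order = launch_order(workflow)

for name in order:
    print(f"launch {name} on {placement[name]}")
```

Keeping placement (scheduling) separate from start-up order (launching) mirrors the two services named on the slides; a real scheduler would draw on the resource metadata rather than a single load figure.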