Mapping of Scientific Workflow within the
e-Protein project to Distributed Resources
London e-Science Centre
Department of Computing, Imperial College London
Introduction
• Proteins
• Background to e-Protein project
• Project Workflow
• ICENI
• Demo
What are Proteins?
Proteins
• Proteins are necklaces of smaller subunits
called amino acids.
• The basis of how biology gets things done:
-Give structure to our hair, skin, bones
-Act as hormones and enzymes
-Act as antibodies in support of the
immune system.
e.g. Gastrin -> stomach -> causes HCl production
• For this reason, scientists have sequenced the human
genome: the DNA code which specifies the sequence
of amino acids along the protein “necklace”.
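How a DNA code specifies an amino acid sequence can be illustrated with a minimal sketch. It assumes the standard genetic code and includes only the handful of codons needed for the made-up example sequence; the function name `translate` and the sequence itself are illustrative, not part of the e-Protein project.

```python
# Partial codon table from the standard genetic code.
# Only the codons used in the example are listed (assumption: a real
# table would contain all 64 codons).
CODON_TABLE = {
    "ATG": "M",              # methionine, also the usual start codon
    "GAA": "E", "GAG": "E",  # glutamate
    "TGG": "W",              # tryptophan
    "CTG": "L",              # leucine
    "AAA": "K",              # lysine
    "TAA": "*", "TAG": "*", "TGA": "*",  # stop codons
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence into one-letter amino acid codes.

    Reads the sequence three bases at a time, stops at the first stop
    codon, and writes 'X' for any codon missing from the table.
    """
    protein = []
    usable_length = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    for i in range(0, usable_length, 3):
        codon = dna[i:i + 3].upper()
        residue = CODON_TABLE.get(codon, "X")
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Made-up example: ATG GAA TGG CTG AAA TAA
print(translate("ATGGAATGGCTGAAATAA"))  # prints MEWLK
```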
Proteins
• Knowing the sequence tells us little about
what the protein does.
• To carry out their function proteins must
“fold” and form the correct structure.
• Incorrect folding can lead to diseases such
as Alzheimer's and cancer.
• By studying protein structure we can
better understand the nature of disease and
design more effective drugs.
Proteins
• Genome projects have yielded huge numbers of
protein sequences.
• Understanding protein structure requires huge amounts of
computing power and expertise.
• No single UK group can achieve this single-handedly.
• Sharing computing resources across different
sites provides an obvious solution.
e-Protein
• Funded by the BBSRC through their e-Science
programme
• Objectives are to provide a structure-based
annotation of the proteins in the major genomes by
linking resources at three sites: EBI, Imperial and UCL.
• At each of the three sites there is a local database
providing the local contribution to the protein
annotation.
• Different strategies are used at each site to assign
protein structures to the sequences.
• Comparison of the results to identify problems
e-Protein
• At Imperial there is a MySQL relational database
called 3D-GENOMICS
• The pipeline focuses on proteins, for which several steps
of analysis are performed using various applications:
- Identification of transmembrane (TM) regions
- Coiled coils
- PROSITE patterns
- Secondary structure prediction
• Structural information is assigned via homology
(using BLAST, PSI-BLAST)
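One of the analysis steps listed above, matching PROSITE patterns, can be sketched in a few lines. This is a simplified illustration and not the code used in the 3D-GENOMICS pipeline: it converts a PROSITE-style pattern into a Python regular expression and scans a sequence with it. The pattern shown is the well-known N-glycosylation site pattern; the function names and the test sequence are made up for the example.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern into a Python regular expression.

    Handles the common syntax only: '-' separators, 'x' for any residue,
    [..] allowed residues, {..} excluded residues, (n) or (n,m) repeats,
    and '<' / '>' terminus anchors outside brackets.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # exclusions
        element = element.replace("(", "{").replace(")", "}")   # repeats
        element = element.replace("x", ".")                     # any residue
        element = element.replace("<", "^").replace(">", "$")   # anchors
        parts.append(element)
    return "".join(parts)


def scan_sequence(sequence: str, pattern: str):
    """Return (1-based position, matched text) for every match.

    A lookahead is used so that overlapping matches are also reported.
    """
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# N-glycosylation site pattern; the sequence below is invented.
N_GLYCOSYLATION = "N-{P}-[ST]-{P}"
print(prosite_to_regex(N_GLYCOSYLATION))                   # N[^P][ST][^P]
print(scan_sequence("MKNGSAANPTQNVTW", N_GLYCOSYLATION))   # [(3, 'NGSA'), (12, 'NVTW')]
```

The candidate site at position 8 (NPTQ) is rejected because proline is excluded at the second position of the pattern.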
3D-Genomics
ICENI
• Capturing this workflow and mapping its
components to distributed resources are the priority
on the Imperial side of the project
• ICENI provides a mechanism for creating and managing
computational grids
• Uses a component programming model to describe grid
applications
• Allows scientists to define the required workflow within a
graphical environment by dragging and dropping
components
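A component-based workflow of this kind can be thought of as a directed graph in which each component consumes the output of others. The sketch below is not ICENI code; it only illustrates the idea using Python's standard `graphlib` module. The component names and their wiring are assumptions loosely based on the analysis steps named in this talk.

```python
from graphlib import TopologicalSorter

# Hypothetical wiring: each component maps to the set of components
# whose output it consumes.
workflow = {
    "load_sequence": set(),
    "tm_regions": {"load_sequence"},
    "coiled_coils": {"load_sequence"},
    "secondary_structure": {"load_sequence"},
    "homology_search": {"load_sequence"},
    "store_annotation": {
        "tm_regions",
        "coiled_coils",
        "secondary_structure",
        "homology_search",
    },
}


def execution_stages(graph):
    """Group components into stages that respect the data dependencies.

    Components in the same stage do not depend on each other, so they
    could run at the same time on different resources.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the workflow has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


for number, stage in enumerate(execution_stages(workflow), start=1):
    print(number, stage)
```

With this wiring, the four analysis components land in the same stage, which is what makes mapping them onto separate distributed resources worthwhile.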
ICENI Middleware
• Has a rich meta-data structure
• Allows current state of the resources to be
captured
• Allows the different library versions and
application programs to be represented
• Defined in an XML schema
• Two main services provided by ICENI:
- Launching Framework
- Scheduling Framework
Launching/Scheduling
• A scheduler is responsible for deciding where
components will run
• A launcher is responsible for starting components on
resources
• A grid container is responsible for starting each of the
components
• Applications within the Imperial College e-Protein
pipeline are wrapped as binary components
• The wrapper provides the metadata required to schedule
and launch the application
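The division of labour between scheduler and launcher can be sketched as follows. This is a toy illustration under stated assumptions, not the ICENI scheduling or launching framework: the classes, the greedy placement rule, the resource names and the `ss_predict` executable are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical resource metadata: free CPUs and installed software."""
    name: str
    free_cpus: int
    software: set = field(default_factory=set)


@dataclass
class BinaryComponent:
    """Hypothetical component metadata: the executable and CPUs it needs."""
    name: str
    executable: str
    cpus: int = 1


def schedule(components, resources):
    """Decide where each component runs (greedy rule, an assumption).

    A component goes to the resource that has its executable installed
    and currently has the most free CPUs.
    """
    free = {r.name: r.free_cpus for r in resources}
    placement = {}
    for comp in components:
        candidates = [
            r for r in resources
            if comp.executable in r.software and free[r.name] >= comp.cpus
        ]
        if not candidates:
            raise RuntimeError(f"no resource can run {comp.name}")
        best = max(candidates, key=lambda r: free[r.name])
        placement[comp.name] = best.name
        free[best.name] -= comp.cpus
    return placement


def launch(placement):
    """Stand-in for a launcher: report what would be started where."""
    return [f"start {comp} on {res}" for comp, res in placement.items()]


resources = [
    Resource("site-a", free_cpus=2, software={"blastpgp", "ss_predict"}),
    Resource("site-b", free_cpus=4, software={"blastpgp"}),
]
components = [
    BinaryComponent("homology_search", "blastpgp", cpus=2),
    BinaryComponent("secondary_structure", "ss_predict", cpus=1),
]

placement = schedule(components, resources)
for line in launch(placement):
    print(line)
```

Keeping `schedule` and `launch` separate mirrors the split described above: the placement decision uses only metadata, so it can be made before anything is started.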
Binary Component
• The binary executable
that the component represents
• JDML file describes:
- Application execution
- Arguments taken
• Capable of taking a number
of data inputs from, and sending outputs to, other
components.
• NB in the e-protein workflow
Acknowledgements
• Director: Professor John Darlington
• Technical Director: Dr Steven Newhouse
• Research Staff:
– Anthony Mayer, Nathalie Furmento
– Stephen McGough, William Lee
– Kieran Flemming, Oliver Jevons
• Contact:
– http://www.lesc.ic.ac.uk/
– e-mail: [email protected]