Bioinformatics Workflows
Chris Wroe
(based on material from the myGrid team &
May Tassabehji / Hannah Tipney
Medical Genetics, St Mary's)
Bioinformatics pipelines on the web
[Figure: RepeatMasker, BLASTn, Twinscan]
• Copying and pasting from one web-based application to
the next, with annotation by hand
• Advantages : quick, easy access to distributed
resources
• Disadvantages: time consuming, error prone, tacit
procedure so difficult to share both protocol and results
Automating pipelines
• Using Perl/Matlab scripts to implement a
pipeline
• Advantages : automation, quick to write,
significant community resources (e.g. BioPerl)
• Disadvantages: hard to explain, hard to
relocate, hard to tinker with.
Workflows
[Figure: Sequence in -> RepeatMasker web service -> BLASTn web service -> Twinscan web service -> Predicted genes out]
• Simple scripting language aims to specify how steps of a
pipeline link together
• High level picture of the pipeline separated from any low
level fiddling
• Application logic and low level fiddling encapsulated in
remote web services
• Advantages : automation, quick to write, easier to
explain, share, relocate, and record provenance of
results in a standard way
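The separation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Scufl or Taverna: the three functions are toy local stand-ins for the remote web services (their behaviour is invented for the example), and `SERVICES`, `WORKFLOW`, and `run_workflow` are hypothetical names. The point is that the high-level picture is just an ordered list of step names, while the runner records provenance for every step in one uniform way.

```python
from typing import Callable, Dict, List, Tuple

Service = Callable[[str], str]
ProvenanceRecord = Tuple[str, str, str]  # (step name, input, output)


# Toy stand-ins for remote web services (assumptions, not the real tools).
def mask_repeats(sequence: str) -> str:
    # Pretend lowercase bases are repeats and mask them with 'N'.
    return "".join("N" if base.islower() else base for base in sequence)


def similarity_search(sequence: str) -> str:
    # Pretend search step: passes the sequence through unchanged.
    return sequence


def predict_genes(sequence: str) -> str:
    # Pretend gene predictor: reports the longest unmasked run.
    return max(sequence.split("N"), key=len)


# Low-level detail lives behind service names...
SERVICES: Dict[str, Service] = {
    "RepeatMasker": mask_repeats,
    "BLASTn": similarity_search,
    "Twinscan": predict_genes,
}

# ...while the workflow itself is only the high-level picture.
WORKFLOW: List[str] = ["RepeatMasker", "BLASTn", "Twinscan"]


def run_workflow(
    steps: List[str], services: Dict[str, Service], data: str
) -> Tuple[str, List[ProvenanceRecord]]:
    """Run each named step in order, recording provenance for every step."""
    provenance: List[ProvenanceRecord] = []
    for name in steps:
        result = services[name](data)
        provenance.append((name, data, result))
        data = result
    return data, provenance


if __name__ == "__main__":
    genes, log = run_workflow(WORKFLOW, SERVICES, "ACGTacgtGGCCA")
    print(genes)
    for step, step_input, step_output in log:
        print(f"{step}: {step_input} -> {step_output}")
```

Relocating or sharing such a workflow means passing on the list of step names; swapping a service only touches the registry.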
Workflow components in myGrid
• Scufl – Simple Conceptual Unified Flow Language
– Developed by myGrid members at EBI.
– Designed to be as simple as possible, just enough features to
support bioinformatics workflows
• Taverna – a tool for writing and running
workflows and examining results
(http://taverna.sourceforge.net)
• FreeFluo – workflow engine to run
workflows
(http://freefluo.sourceforge.net)
Workflow use
• Newcastle University (Anil Wipat, Peter Li)
– Affymetrix Microarray Analysis Workflow
– Gene annotation workflow
• Manchester University
May Tassabehji, PhD student Hannah Tipney, Medical Genetics,
St Mary's (Wellcome Trust funded)
– Gene alerting service workflow (GAS)
– Gene and protein annotation workflow
• And others
Workflow experience: positives
• Easy to get started with Taverna (1-2 hour
tutorial)
• Sharing does happen
• Cuts down the time taken to perform one
pipeline from 2 weeks to 2 hours
Workflow experience: outstanding issues
• Early days: web services are rare; significant time is
taken to wrap applications as web services
(licensing, installation, maintenance)
– Soaplab and Gowlab try to help
(http://industry.ebi.ac.uk/soaplab)
• Fiddly bits don’t go away: Many ‘shim’ services
needed to ensure the output of one step fits the
expected input of another
• Automation produces many results in a short
amount of time. Issues of result management
and display
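To make the 'shim' point concrete, here is a minimal sketch of one, assuming a case where one step emits a FASTA record but the next expects a bare sequence string (and the reverse). The function names `fasta_to_raw` and `raw_to_fasta` are hypothetical and chosen for illustration; real shims depend on the formats of the specific services being joined.

```python
def fasta_to_raw(fasta_text: str) -> str:
    """Strip FASTA header lines and join the sequence lines into one string."""
    lines = fasta_text.strip().splitlines()
    return "".join(line.strip() for line in lines if not line.startswith(">"))


def raw_to_fasta(sequence: str, identifier: str, width: int = 60) -> str:
    """Wrap a bare sequence as a single FASTA record with fixed-width lines."""
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{identifier}\n" + "\n".join(chunks) + "\n"


if __name__ == "__main__":
    record = ">seq1 demo\nACGT\nGGCC\n"
    print(fasta_to_raw(record))
    print(raw_to_fasta("ACGTGGCC", "seq1", width=4), end="")
```

Each shim is trivial on its own; the cost noted above comes from needing many of them, one per mismatched pair of steps.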
Other workflow systems
• Commercial bioinformatics – drug
discovery
– Incogen VIBE
– TurboWorx Pipeline Pilot
• eScience
– DiscoveryNet (bioinformatics – proprietary)
– Kepler (US ecology)
– Triana (UK physics, astronomy, signal
processing)
Workflow standards
• Can’t have enough of them! All currently come
from e-Business rather than science community
• BPEL – Business Process Execution Language
• WS – Orchestration
• XML Process Definition Language (XPDL)
• Business Process Markup Language (BPML)