Bioinformatics Workflows
Chris Wroe (based on material from the myGrid team and May Tassabehji / Hannah Tipney, Medical Genetics, St Mary's)

Bioinformatics pipelines on the web
[Diagram: RepeatMasker → BLASTn → Twinscan]
• Copying and pasting results from one web-based application to the next, and annotating by hand
• Advantages: quick, easy access to distributed resources
• Disadvantages: time-consuming and error-prone; the procedure is tacit, so both the protocol and the results are difficult to share

Automating pipelines
• Using Perl/Matlab scripts to implement a pipeline
• Advantages: automation, quick to write, significant community resources (e.g. BioPerl)
• Disadvantages: hard to explain, hard to relocate, hard to tinker with

Workflows
[Diagram: sequence in → RepeatMasker web service → BLASTn web service → Twinscan web service → predicted genes out]
• A simple scripting language specifies how the steps of a pipeline link together
• The high-level picture of the pipeline is separated from any low-level fiddling
• Application logic and low-level fiddling are encapsulated in remote web services
• Advantages: automation, quick to write, and easier to explain, share and relocate; the provenance of results can also be recorded in a standard way

Workflow components in myGrid
• Scufl (Simple Conceptual Unified Flow Language)
  – Developed by myGrid members at the EBI
  – Designed to be as simple as possible, with just enough features to support bioinformatics workflows
• Taverna: a tool for writing and running workflows and examining their results
  (http://taverna.sourceforge.net)
• FreeFluo: a workflow engine to run workflows (http://freefluo.sourceforge.net)

Workflow use
• Newcastle University (Anil Wipat, Peter Li)
  – Affymetrix microarray analysis workflow
  – Gene annotation workflow
• Manchester University (May Tassabehji and PhD student Hannah Tipney, Medical Genetics, St Mary's; Wellcome Trust funded)
  – Gene alerting service (GAS) workflow
  – Gene and protein annotation workflow
• And others

Workflow experience: benefits
• Easy to get started with Taverna (a 1-2 hour tutorial)
• Sharing does happen
• Cuts the time taken to run one pipeline from 2 weeks to 2 hours

Workflow experience: outstanding issues
• Early days: web services are rare, and wrapping applications as web services takes significant time (licensing, installation, maintenance)
  – Soaplab and Gowlab try to help (http://industry.ebi.ac.uk/soaplab)
• Fiddly bits don't go away: many "shim" services are needed to make the output of one step fit the expected input of the next
• Automation produces many results in a short time, raising issues of result management and display

Other workflow systems
• Commercial bioinformatics (drug discovery)
  – Incogen VIBE
  – TurboWorx
  – Pipeline Pilot
• e-Science
  – DiscoveryNet (bioinformatics, proprietary)
  – Kepler (US, ecology)
  – Triana (UK; physics, astronomy, signal processing)

Workflow standards
• Can't have enough of them! All current ones come from the e-business rather than the scientific community:
  – BPEL (Business Process Execution Language)
  – WS – Orchestration
  – XML Process Definition Language (XPDL)
  – Business Process Markup Language (BPML)
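The workflow idea from these slides (steps joined by data links, a "shim" between steps whose formats do not match, and provenance recorded as a side effect of running) can be illustrated with a small Python sketch. This is not Scufl, Taverna or FreeFluo. The services are hypothetical local stubs standing in for remote RepeatMasker and Twinscan web services (BLASTn is left out for brevity), and the names `Step`, `Workflow`, `mask_repeats_stub`, `fasta_shim` and `predict_region_stub` are illustrative assumptions only.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List


# Hypothetical stand-ins for remote web services; these are NOT real
# RepeatMasker or Twinscan APIs, just local toys with the same shape.
def mask_repeats_stub(seq: str) -> str:
    """Toy repeat masker: treats lowercase bases as repeats and replaces them with N."""
    return "".join("N" if base.islower() else base for base in seq)


def fasta_shim(seq: str) -> str:
    """Shim step: wraps a bare sequence in FASTA format for the next service."""
    return ">query\n" + seq


def predict_region_stub(fasta: str) -> str:
    """Toy gene predictor: reports the longest unmasked stretch as 'start-end' (0-based, end-exclusive)."""
    if not fasta.startswith(">"):
        raise ValueError("expected FASTA input")  # this is why the shim is needed
    seq = "".join(fasta.splitlines()[1:])
    best = max(re.finditer(r"[^N]+", seq), key=lambda m: m.end() - m.start(), default=None)
    return "none" if best is None else f"{best.start()}-{best.end()}"


@dataclass
class Step:
    """One processor: a named service plus the step whose output it consumes."""
    name: str
    service: Callable[[str], str]
    input_from: str  # an upstream step name, or "input" for the workflow input


@dataclass
class Workflow:
    steps: List[Step]  # assumed to be listed in dependency order
    provenance: List[dict] = field(default_factory=list)

    def run(self, sequence: str) -> Dict[str, str]:
        results = {"input": sequence}
        for step in self.steps:
            data = results[step.input_from]
            output = step.service(data)
            results[step.name] = output
            # Record which step ran, what it consumed and when, for every result.
            self.provenance.append({
                "step": step.name,
                "input_from": step.input_from,
                "input_length": len(data),
                "output_length": len(output),
                "finished_at": datetime.now(timezone.utc).isoformat(),
            })
        return results


# The high-level picture: three steps and the data links between them.
workflow = Workflow([
    Step("masked", mask_repeats_stub, "input"),
    Step("fasta", fasta_shim, "masked"),
    Step("prediction", predict_region_stub, "fasta"),
])
results = workflow.run("ACGTacgtacgtGGCCAATT")
print(results["prediction"])  # prints "12-20"
```

The list of `Step`s plays the role of the simple workflow description, while the "low-level fiddling" lives inside the service functions; in a real system each stub would be a call to a remote web service, and removing `fasta_shim` would make the prediction step reject its input.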