http://taverna.org.uk/
Stian Soiland-Reyes
myGrid, School of Computer Science, University of Manchester, UK
UKOLN DevSci: Workflow Tools, Bath, 2010-11-30

What is myGrid?
• An e-Science collaboration since 2001
• Not a grid!
• Numerous partners involved: University of Manchester, University of Southampton, University of Oxford, EMBL-EBI
• Provides sustainable and production quality software
• Supported by OMII-UK, EPSRC and BBSRC
• Mixture of developers, bioinformaticians and researchers
• Software | Services | Content | Skills | Community

myGrid: http://mygrid.org.uk/ http://taverna.org.uk/

Motivation
• Challenge: bioinformatics
• Large amounts of data
• Many open questions
• Numerous freely available public datasets and analysis tools

Huge amounts of data
• Microarray: 1000+ genes
• QTL regions: 100+ genes
• Next-gen sequencing: 10,000+ genes
How do I look at all the genes systematically?

Manual approach
• Search using public web sites and databases: PubMed, UniProt, EBI, BioMart
• Copy and paste to web tools for analysis: NCBI BLAST, EBI InterPro
• Further processing locally: R, Perl, Python

Manual: disadvantages
• Scale of analysis task overwhelms researchers: lots of data
• User bias and premature filtering of datasets: cherry picking
• Hypothesis-driven approach to data analysis
• Constant changes in data: problems with reanalysis of data
• Implicit methodologies (hyper-linking through web pages)
• Error proliferation from any of the listed issues, notably human error

Web services and workflows
Web services
• Technology and standards for exposing code and data resources that can be programmatically consumed by a remote third party
• Description on how to interact with the service, parameters, documentation
Workflows
• General technique for describing and executing a process
• Describe what you
want to do, and which services to run

Taverna workflows
• A set of (local and remote) services to analyze or manage data
• Nested workflows are also services
• Data links connect services, i.e. output from service A is input to service B and C
• Describes the desired dataflow instead of process coordination
• Automatic iterations
• Can customize list handling and control links

[Figure: example Taverna workflow diagram. Workflow inputs: start_position, chromosome_name, end_position. Nested workflow inputs: regex, gene_ids. Services shown include genes_in_qtl, mmusculus_gene_ensembl, Kegg_gene_ids, Get_pathways, get_pathways_by_genes1, merge_reports and create_report. Workflow outputs: gene_descriptions, genes_pathways, merged_pathways, pathway_descriptions, pathway_ids, kegg_external_gene_reference.]

What types of services?
• Public/private/secured WSDL/SOAP web services
• RESTful web services
• Spreadsheet import
• Command line tools (local/ssh)
• Inline scripts (Beanshell, R)
• Java APIs
• Customizations: BioMart, BioMoby / SADI, Soaplab
• Grid services (Globus, EGEE gLite, caGrid)
• … your tool (plugin tutorial on wiki)

Which services?
• Taverna is general and can connect to standard web services for any domain
• Bioinformatics: from professional third-party organisations providing robust and open data/analysis services, to under-the-desk web services for one particular purpose, run by PhD students
• http://biocatalogue.org/ lists 1730 services from 130 providers, crowd-sourced and quality monitored

Taverna workbench
• Graphical desktop tool
• No server installation required
• Drag-and-drop services into diagram
• Connect services, run, reconnect, rerun
• Integrates diverse set of tools

Sharing workflows
• myExperiment.org allows users to share, find, download and rate workflows
• “Facebook for the scientist”
• 3000 members, 1100 workflows

Extensible UI and engine
• Plugins can provide new “perspectives”, e.g. BioCatalogue, myExperiment
• Provide service-specific customization: the BioMart interface replicates the web site
• Adding new functionality: looping, branching, dynamic service resolution
• New service types
• Design helpers, “Find matching service”

Taverna 3 “Next-gen”
• Under development for 2011
• Interactive, component-centric and data-centric workflow design
• Pre-packaged workflow components
• Searching for workflow components from BioCatalogue and myExperiment
• New myGrid
workflow components library

Taverna command line
• Executes from Windows, Linux and OS X shells
• Takes a predefined workflow with files as inputs and outputs
• Quick way to “productionize” a workflow

Taverna Server
• REST/SOAP interface to execute workflows
• Client libraries for Ruby and Java
• Two demonstration web interfaces: Ruby, Java portlets
• Future: detailed execution support and control, security delegation

Taverna portlet
• Example portlet implementation
• Executes workflows using Taverna Server

Ruby web interface
• Example customized web interface
• Uses Ruby gem t2-server

Taverna on the cloud
• Use case: SNP analysis and annotation of genomes sequenced from breeds of cows in Africa. Why are some of them resistant to X?
• Amazon EC2 with Taverna Server and local services
• Custom (built-in-a-week) Ruby on Rails web interface
• Runs through 31 chromosomes in 6.5 hours using 10 instances, for $26

Open source, open development
• The Taverna suite of tools is all open source and free to use
• Large user community, active mailing lists
• Lead developers: myGrid in Manchester
• Contributors from across the world
• PAL programme
• myGrid provides training, tutorials and documentation

More information
• http://www.mygrid.org.uk/
• http://www.taverna.org.uk/
• http://www.myexperiment.org/
• http://www.biocatalogue.org/
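
The dataflow idea from the "Taverna workflows" slide (output from service A feeds services B and C, with automatic iteration over lists) can be illustrated with a minimal Python sketch. This is not Taverna code and does not use any Taverna API; the names iterate, service_a, service_b, service_c and run_workflow are hypothetical and exist only for this illustration.

```python
from typing import Any, Callable, Dict


def iterate(service: Callable[[str], str], value: Any) -> Any:
    """Apply a single-item service to a value.

    If the value is a list, the service is applied to each element in turn
    (recursively for nested lists), mimicking automatic iteration.
    """
    if isinstance(value, list):
        return [iterate(service, item) for item in value]
    return service(value)


# Hypothetical stand-ins for remote or local services.
def service_a(gene_id: str) -> str:
    """Normalise a gene identifier."""
    return gene_id.strip().upper()


def service_b(gene_id: str) -> str:
    """Pretend to look up pathways for one gene."""
    return f"pathways-for-{gene_id}"


def service_c(gene_id: str) -> str:
    """Pretend to look up a description for one gene."""
    return f"description-of-{gene_id}"


def run_workflow(gene_ids: Any) -> Dict[str, Any]:
    """Wire the services together: A's output is the input to both B and C."""
    a_out = iterate(service_a, gene_ids)
    return {
        "pathways": iterate(service_b, a_out),
        "descriptions": iterate(service_c, a_out),
    }


if __name__ == "__main__":
    print(run_workflow([" brca1", "tp53 "]))
```

Each service is written for a single item, and only the wiring in run_workflow says what data flows where. Passing a list instead of a single value needs no change to the services, which is the point of describing the dataflow rather than the process coordination.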