e-LICO
An e-Laboratory for Interdisciplinary Collaborative research in data mining and data-intensive sciences
Delivering data mining to the Life Science Community
Simon Jupp
School of Computer Science, University of Manchester, United Kingdom
October 12th, 2010

e-LICO project overview
• Infrastructure to support collaborative, data-mining-enabled experimental research
• Knowledge-driven planning of DM workflows
  – Improve planning by meta-mining
• Support research in data-intensive, knowledge-rich domains
  – Systems biology use case

European Project
• European Project, 9 partners (Month 20/36)
  – Specialists from Data Mining, Semantic Web, Grid computing and Systems Biology
    • University of Manchester, UK
    • University of Geneva, Switzerland
    • Inserm, France
    • Josef Stefan Institute, Slovenia
    • NHRF, Greece
    • Poznan University, Poland
    • Rapid-I GmbH, Germany
    • Ruder Boskovic Institute, Croatia
    • University of Zurich, Switzerland
An EU-FP7 Collaborative Project (2009-2012)
Theme ICT-4.4: Intelligent Content and Semantics

Problems…
• Steep learning curve
  – Many operators to choose from
  – Best combination of operators
  – Hard for non-data-miners
• Capturing the workflow
  – Explanation
  – Error detection / repair
  – Reproducibility
  – Provenance

Problems… and solutions (e-LICO planned workflows)
• Problem: steep learning curve (many operators to choose from; best combination of operators; hard for non-data-miners)
  Solution: develop an “Intelligent Discovery Assistant” (IDA) for data analysis
  – Automatically generate workflows by planning
  – Assist the user in solving a DM task
  – Structure workflows in workflow templates
  – Self-improvement through meta-mining
• Problem: capturing the workflow (explanation; error detection / repair; reproducibility; provenance)
  Solution: ontology-based data model
  – Adds semantics
  – OWL/RDF based
  – Data Mining Experiment Repository

The e-LICO workflow
[Figure: the e-LICO workflow, linking input data, the ontology-based AI planner, meta-mining, the workflow execution engine, and publishing and sharing, with output data, provenance and models.]
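To make the “ontology-based data model” solution a little more concrete, here is a toy in-memory triple store that records a workflow run as subject-predicate-object statements, the shape of data an OWL/RDF-based experiment repository holds. It is a sketch under stated assumptions: the `ex:` vocabulary, the run and dataset identifiers, and the helpers `record_run` and `match` are hypothetical, and no real RDF library or e-LICO ontology term is used.

```python
# Toy in-memory triple store illustrating an ontology-based record of a
# workflow run. The "ex:" vocabulary is hypothetical, not the e-LICO ontology.

Triple = tuple[str, str, str]


def record_run(run_id: str, dataset: str, operators: list[str], accuracy: float) -> list[Triple]:
    """Describe one workflow run as subject-predicate-object triples."""
    triples = [
        (run_id, "rdf:type", "ex:WorkflowRun"),
        (run_id, "ex:inputData", dataset),
        (run_id, "ex:accuracy", f"{accuracy:.2f}"),
    ]
    # One node per executed step, so the operator order is recoverable later.
    for i, op in enumerate(operators):
        step = f"{run_id}/step{i}"
        triples += [
            (run_id, "ex:hasStep", step),
            (step, "ex:operator", op),
            (step, "ex:position", str(i)),
        ]
    return triples


def match(store: list[Triple], s=None, p=None, o=None) -> list[Triple]:
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


store = (record_run("ex:run1", "ex:kupData", ["ReplaceMissingValues", "SVM"], 0.87)
         + record_run("ex:run2", "ex:kupData", ["DecisionTree"], 0.71))

# Provenance query: which steps used SVM, and which run did each belong to?
for step, _, _ in match(store, p="ex:operator", o="SVM"):
    print([t[0] for t in match(store, p="ex:hasStep", o=step)])
```

Because every run is stored as plain statements, questions about reproducibility and provenance become pattern queries rather than log parsing; a real repository would use a proper RDF store and query language instead of list scans.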
Ontology based AI planner

Workflow planning
• Hierarchical Task Network (HTN) planning
• Set of tasks to achieve possible data mining goals
• Tasks have an I/O specification and a set of associated methods to achieve that task
• Methods are composed of simpler tasks/methods
• Some methods are operators with conditions and effects
Example: my task is ‘Data Mining With Evaluation’; my goal is a workflow that performs this evaluation via cross-validation.

The Data Mining Workflow Ontology (DMWF)
• IOObject: input and output used by operators (examples: Data, Model, Report)
• MetaData: characteristics of the IOObjects (examples: Attribute, AttributeType, DataColumn, DataFormat)
• Operator: DM operators (examples: DataTableProcessing, ModelProcessing, Modeling, MethodEvaluation)
• Goal: a DM goal that the user could solve (examples: DescriptiveModelling, PatternDiscovery, PredictiveModelling, RetrievalByContent)
• Task: used to achieve a goal (examples: CleanMV, CategorialToScalar, DiscretizeAll, PredictTarget)
• Method: used to solve a task (examples: CategorialToScalarRecursive, CleanMVRecursive, DiscretizeAllRecursive, DoPrediction)

Workflow Planning
• AI Planner
• Brute force planning
• Probabilistic planning
  – What will likely produce better results?
• Case-based planning
  – How did we solve that previously?
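The HTN idea on the workflow planning slide (tasks decomposed by methods into simpler tasks, bottoming out in operators with conditions and effects) can be sketched as a small depth-first decomposer for the ‘Data Mining With Evaluation’ example. This is a toy, not the e-LICO planner: the operator names, preconditions, effects and the `plan` function are hypothetical, and state facts are plain strings rather than DMWF ontology classes.

```python
# Toy HTN planner loosely mirroring the DMWF vocabulary (tasks, methods,
# operators). All names, conditions and effects here are hypothetical.

# Operators: name -> (preconditions, effects), both sets of state facts.
# "no_missing" means the data is known to contain no missing values.
OPERATORS = {
    "ReplaceMissingValues": ({"data"}, {"no_missing"}),
    "NominalToNumerical": ({"data", "no_missing"}, {"scalar"}),
    "SplitFolds": ({"data"}, {"folds"}),
    # Simplification: per-fold training and application collapsed into one step each.
    "TrainModel": ({"folds", "scalar"}, {"model"}),
    "ApplyModel": ({"model", "folds"}, {"predictions"}),
    "Performance": ({"predictions"}, {"report"}),
}

# Methods: task name -> alternative decompositions, tried in order.
METHODS = {
    "DataMiningWithEvaluation": [["Preprocess", "CrossValidation"]],
    "Preprocess": [
        ["NominalToNumerical"],                          # enough if no missing values
        ["ReplaceMissingValues", "NominalToNumerical"],  # otherwise clean first
    ],
    "CrossValidation": [["SplitFolds", "TrainModel", "ApplyModel", "Performance"]],
}


def plan(tasks, state):
    """Decompose `tasks` depth-first; return a list of operator names or None."""
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    if head in OPERATORS:
        pre, eff = OPERATORS[head]
        if not pre <= state:                # precondition not satisfied
            return None
        tail = plan(rest, state | eff)      # apply effects, then plan the rest
        return None if tail is None else [head] + tail
    for subtasks in METHODS.get(head, []):
        result = plan(subtasks + rest, state)  # try the next method if this fails
        if result is not None:
            return result
    return None


print(plan(["DataMiningWithEvaluation"], {"data"}))
# -> ['ReplaceMissingValues', 'NominalToNumerical', 'SplitFolds', 'TrainModel', 'ApplyModel', 'Performance']
```

Starting from `{"data"}`, the first `Preprocess` method fails its precondition check, so the planner backtracks to the second one; with `{"data", "no_missing"}` the cleaning step would be skipped.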

DMOP (Workflow optimization ontology)
  – Algorithm and model selection given a particular task
  – Meta-mining by abstraction and generalisation

Meta-Mining
• Initially, the AI planner recommends applicable DM workflows, not necessarily good ones
• It self-improves with experience through meta-mining
• The meta-miner
  – Applies DM techniques to meta-data from past DM experiments
  – Extracts workflow patterns that are signatures of high predictive performance
• The planner uses these workflow patterns to design and recommend promising workflows

Workflow Execution
e-LICO Kick-Off, Geneva, 5/7/2017
• All operators in the ontology (200+) are exposed as SOAP- or REST-based Web Services
• Plans are converted to a workflow execution language (SCUFL 2)
• Provenance capture
  – Execution times and intermediate models are returned to the planner
Taverna

Workflow Publishing and Sharing
• Workflows and data can be shared via myExperiment
• Build a community of data miners
• Set of reusable workflows, data and workflow templates (packs)

Use case – Obstructive nephropathy
• Demonstrated with a Systems Biology use case
  – Biomarker discovery and pathway modelling in the study of chronic kidney disease
  – KUP challenge initiated (August 2010)
[Figure: KUP use case, connecting expression data, text mining / image mining and further wet-lab experiments with the KUP KB (RDF store), leading to new models and hypotheses.]

Research Questions
• How and when does a planner-based “Intelligent Discovery Assistant” help the end user?
• Can we improve planning and suggest better workflows through meta-mining?
• Can we plan complex workflows with scientific goals that answer biological questions?
  – The KUP goal is to construct diagnostic models that accurately connect the biological views to the severity of this pathology

Where are we now

Availability
• http://www.e-lico.eu
• 1st year demo
  – http://www.youtube.com/watch?v=JtmqZfzyEKs
• eProPlan plugin for Protégé 4.0
• Ontologies available
• Taverna: http://www.taverna.org.uk
• RapidMiner: http://rapid-i.com

Summary
• e-LICO: virtual laboratory for interdisciplinary collaborative research in data mining
• Ontology-based AI planning of KDD workflows
• Generic e-Science platform for DM
• Application layer for Systems Biology

Acknowledgments
Robert Stevens (Manchester)
Alan Williams (Manchester)
Rishi Ramgolam (Manchester)
Jorg-Uwe Kietz (Zurich)
Melanie Hilario (Geneva)
e-LICO consortium
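The meta-mining loop from the Meta-Mining slide (mine meta-data from past experiments for workflow patterns that signal high performance, then let the planner prefer workflows containing them) can be sketched in a few lines. The technique here is a deliberately simple stand-in, not the e-LICO meta-miner: “patterns” are contiguous operator pairs, a pattern counts as good if it appears in at least `min_support` runs and its mean score beats the overall mean, and the run data and helper names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Toy meta-miner: patterns are contiguous operator pairs (bigrams), a
# hypothetical simplification of mining workflow meta-data.


def patterns(workflow):
    """Return the set of contiguous operator pairs in a workflow."""
    return set(zip(workflow, workflow[1:]))


def mine_patterns(experiments, min_support=2):
    """Keep patterns seen in >= min_support runs whose mean score beats the overall mean."""
    overall = mean(score for _, score in experiments)
    scores = defaultdict(list)
    for workflow, score in experiments:
        for p in patterns(workflow):
            scores[p].append(score)
    return {p: mean(s) for p, s in scores.items()
            if len(s) >= min_support and mean(s) > overall}


def rank(candidates, good_patterns):
    """Order candidate workflows by how many high-performance patterns they contain."""
    good = set(good_patterns)
    return sorted(candidates, key=lambda wf: len(patterns(wf) & good), reverse=True)


# Hypothetical meta-data from past runs: (operator sequence, accuracy).
past_runs = [
    (["ReplaceMissingValues", "Normalize", "SVM"], 0.90),
    (["ReplaceMissingValues", "Normalize", "SVM"], 0.88),
    (["ReplaceMissingValues", "DecisionTree"], 0.70),
    (["Normalize", "DecisionTree"], 0.72),
]
good_pats = mine_patterns(past_runs)
print(sorted(good_pats))
print(rank([["DecisionTree"], ["Normalize", "SVM"],
            ["ReplaceMissingValues", "Normalize", "SVM"]], good_pats))
```

The single-run patterns are dropped by the support threshold, so only the two pairs from the repeated high-scoring workflow survive and the candidate containing both is ranked first.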