e-LICO
An e-Laboratory for Interdisciplinary Collaborative research in data mining and data intensive sciences

Delivering data mining to the Life Science Community
Simon Jupp
School of Computer Science, University of Manchester, United Kingdom
October 12th, 2010

e-LICO project overview
Infrastructure to support collaborative, data mining enabled experimental research
Knowledge-driven planning of DM workflows
– Improve planning by meta-mining
Support research in data-intensive, knowledge-rich domains
– Systems biology use case

European Project
European Project, 9 partners (month 20 of 36)
– Specialists from Data Mining, Semantic Web, Grid computing and Systems Biology
• University of Manchester, UK
• University of Geneva, Switzerland
• Inserm, France
• Jožef Stefan Institute, Slovenia
• NHRF, Greece
• Poznan University, Poland
• Rapid-I GmbH, Germany
• Ruđer Bošković Institute, Croatia
• University of Zurich, Switzerland
An EU-FP7 Collaborative Project (2009-2012)
Theme ICT-4.4: Intelligent Content and Semantics

Problems…
Steep learning curve
– Many operators to choose from
– Best combination of operators
– Hard for non Data Miners
Capturing the workflow
– Explanation
– Error detection / Repair
– Reproducibility
– Provenance

Problems… and solutions (e-LICO planned workflows)
Problem: Steep learning curve
– Many operators to choose from
– Best combination of operators
– Hard for non Data Miners
Solution: Develop “Intelligent Discovery Assistant” (IDA) for Data Analysis
– Automatically generate workflows by planning
– Assist the user in solving DM task
– Structure workflows in workflow templates
– Self improvement through Meta-Mining
Problem: Capturing the workflow
– Explanation
– Error Detection / Repair
– Reproducibility
– Provenance
Solution: Ontology based data model
– Adds semantics
– OWL/RDF based
– Data Mining Experiment Repository

The e-LICO workflow
[Diagram, steps numbered 1 to 4: Input Data; Ontology based AI planner; Meta-mining; Workflow execution engine; Publish and share. Output: Data, provenance and models]

Workflow planning
Hierarchical Task Network (HTN) planning
Set of Tasks to achieve possible Data Mining Goals
Tasks have an I/O specification and a set of associated Methods to achieve that task
Methods are composed of simpler Tasks/Methods
Some methods are Operators with Conditions and Effects
Example: my task is ‘Data Mining With Evaluation’, my Goal is to get a workflow that does this Evaluation via Cross-Validation

The Data Mining Workflow Ontology (DMWF)
Class | Description | Examples
IOObject | Input and output used by operators | Data, Model, Report
MetaData | Characteristics of the IOObjects | Attribute, AttributeType, DataColumn, DataFormat
Operator | DM operators | DataTableProcessing, ModelProcessing, Modeling, MethodEvaluation
Goal | A DM goal that the user could solve | DescriptiveModelling, PatternDiscovery, PredictiveModelling, RetrievalByContent
Task | A task is used to achieve a goal | CleanMV, CategorialToScalar, DiscretizeAll, PredictTarget
Method | A method is used to solve a task | CategorialToScalarRecursive, CleanMVRecursive, DiscretizeAllRecursive, DoPrediction

Workflow Planning
AI Planner
Brute force planning
Probabilistic Planning
– What will likely produce better results?
Case-based Planning
– How did we solve that previously?
DMOP (Workflow optimization ontology)
– Algorithm and Model selection given a particular task
– Meta-mining by abstraction and generalisation

Meta-Mining
Initially, the AI planner recommends applicable DM workflows, not necessarily good ones
Self-improves with experience through meta-mining
The meta-miner
– Applies DM techniques to meta-data from past DM experiments
– Extracts workflow patterns that are signatures of high predictive performance
The planner uses these workflow patterns to design and recommend promising workflows

Workflow Execution
All operators in the ontology (200+) are exposed as SOAP or REST based Web Services
Plans are converted to a workflow execution language (SCUFL 2)
Provenance capture
– Execution times, intermediate model returned to planner
Taverna
[Slide footer: e-LICO Kick-Off, Geneva, 5/7/2017]

Workflow Publishing and Sharing
Workflows and data can be shared via myExperiment
Build a community of data miners
Set of re-usable workflows, data and workflow templates (packs)

Use case – Obstructive nephropathy
Demonstrated with Systems Biology Use Case
– Biomarker discovery and pathway modelling in the study of chronic kidney disease
– KUP challenge initiated (August 2010)
[Diagram: Expression data; Text-mining / Image mining; Further wet lab experiments; KUP KB (RDF store); New models and hypotheses]

Research Questions
How and when does a planner based “Intelligent Discovery Assistant” help the end user?
Can we improve planning and suggest better workflows through meta-mining?
Can we plan complex workflows with Scientific Goals that answer biological questions?
– KUP goal is to construct diagnostic models that accurately connect the biological views to the severity of this pathology

Where are we now
Availability
http://www.e-lico.eu
1st year demo – http://www.youtube.com/watch?v=JtmqZfzyEKs
eProPlan plugin for Protégé 4.0
Ontologies available
Taverna http://www.taverna.org.uk
RapidMiner http://rapid-i.com

Summary
e-LICO: virtual laboratory for interdisciplinary collaborative research in data-mining
Ontology based AI planning of KDD workflows
Generic E-Science platform for DM
Application layer for Systems Biology

Acknowledgments
Robert Stevens (Manchester)
Alan Williams (Manchester)
Rishi Ramgolam (Manchester)
Jorg-Uwe Kietz (Zurich)
Melanie Hilario (Geneva)
e-LICO consortium
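
Illustration: HTN decomposition (sketch)
The HTN planning idea from the "Workflow planning" slide (tasks are solved by methods, methods decompose into simpler tasks, and some steps are operators with conditions and effects) can be illustrated with a minimal Python sketch. This is not the e-LICO planner or the eProPlan API. The task and method names CleanMV, CategorialToScalar, PredictTarget, CleanMVRecursive, CategorialToScalarRecursive and DoPrediction are taken from the DMWF examples; the operator names, the method PreprocessThenEvaluate, and the state facts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    conditions: frozenset  # facts that must hold before the operator applies
    effects: frozenset     # facts that hold after the operator has run


@dataclass(frozen=True)
class Method:
    name: str
    subtasks: tuple  # ordered names of simpler tasks or operators


# Hypothetical operators (names and facts are assumptions, not DMWF content).
OPERATORS = {
    "ReplaceMissingValues": Operator(
        "ReplaceMissingValues",
        frozenset({"data"}),
        frozenset({"no_missing_values"}),
    ),
    "NominalToNumerical": Operator(
        "NominalToNumerical",
        frozenset({"data"}),
        frozenset({"scalar_attributes"}),
    ),
    "CrossValidation": Operator(
        "CrossValidation",
        frozenset({"no_missing_values", "scalar_attributes"}),
        frozenset({"model", "report"}),
    ),
}

# Each task maps to alternative methods; "PreprocessThenEvaluate" is hypothetical.
METHODS = {
    "DataMiningWithEvaluation": [
        Method("PreprocessThenEvaluate",
               ("CleanMV", "CategorialToScalar", "PredictTarget")),
    ],
    "CleanMV": [Method("CleanMVRecursive", ("ReplaceMissingValues",))],
    "CategorialToScalar": [
        Method("CategorialToScalarRecursive", ("NominalToNumerical",)),
    ],
    "PredictTarget": [Method("DoPrediction", ("CrossValidation",))],
}


def plan(tasks, state):
    """Depth-first HTN decomposition.

    Returns an ordered list of operator names, or None if no plan exists.
    """
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    if head in OPERATORS:
        op = OPERATORS[head]
        if not op.conditions <= state:
            return None  # conditions not met: this branch fails
        tail = plan(rest, state | op.effects)
        return None if tail is None else [op.name] + tail
    # Compound task: try each method in turn, backtracking on failure.
    for method in METHODS.get(head, []):
        result = plan(method.subtasks + rest, state)
        if result is not None:
            return result
    return None


workflow = plan(("DataMiningWithEvaluation",), frozenset({"data"}))
print(workflow)
```

With input data available, the sketch returns a three-step workflow ending in cross-validation. Asking for PredictTarget alone fails, because the conditions of the hypothetical CrossValidation operator are not met without the preprocessing steps. A planner of this brute-force kind only finds applicable workflows; ranking them is where the deck brings in meta-mining.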