The design and implementation of a workflow analysis tool
Vasa Curcin
Department of Computing
Imperial College London

Scientific workflow field
• Scientific workflows: a high-level programming language with explicit graphical representation of flow of data and/or control
• Research into automation of processes supporting scientific research
• Significant role in providing middleware for UK eScience programme: Taverna, Discovery Net, Triana
• Lingua franca of service-oriented computing

Deluge of workflows
Kepler, Meandre, GenePattern, Triana, Taverna, Pegasus, Discovery Net, Orange, Wildfire, Pentaho, BPEL, VisTrails, KNIME, Galaxy, UGENE, Trident, LONI, YAWL
Bioinformatics, Business Intelligence, Sensor informatics, Cheminformatics, Environmental Science, Astronomy, …

Workflow analysis
• There is a need for formal models to capitalize on the benefits of this infrastructure
  o Work evaluated on Discovery Net workflows
  o Concepts applicable to other workflow systems
• Some aims
  o Minimise cost of data movement and processing
  o Provide technology for workflow clients and warehouses (indexing, guided construction…)
• Tasks
  o Safeness
  o Instance bounds
  o Static workflow optimization
  o Establishing polymorphic type profiles of workflows

Underlying models
• Control flow model
  o Process calculus definitions
  o Communication along named channels
• Fixed for atomic execution, dynamic for streaming
  o New instance of the process launched as soon as the node receives a token
  o Computation tree logic (CTL) modelling execution states
• Data flow model
  o Nodes associated with lambda calculus formulas and term graphs
  o Polymorphic type transformations
  o Rewrite rules defined for sets of nodes as term graph transformations
• Embedding
  o Way of combining the control and data semantics

Workflow analysis tool
• Similarity checker
  o Bisimilarity of processes
• Process profiler
  o Deadlock/livelock detection
  o Reachability
  o Task bounds
• Composability checker
  o Design-time tests
  o Type requirements
  o Polymorphic properties
• Equivalence checker
  o Functional equivalence
• Optimizer
  o Rewrite rules for transformations

Similarity checker
[Diagram: Workflow, Process model, Model checker]
• Based purely on the pi-calculus process model
  o Workflows translated into the process model
  o Parallel composition of independent node processes with named channels
  o Compared in terms of:
    • Internal executions (node actions)
    • Set of observable outputs - define only relevant outputs
• Model checker used to test different types of bisimilarity
  o Node executions conveniently represented as silent actions
  o Strong bisimulation becomes strict one-to-one workflow action mapping
  o Weak bisimulation ignores internal actions and communications and focuses on visible outputs

Similarity checker: example
• ABC (Another Bisimilarity Checker) used

Process profiling
[Diagram: Workflow, Process model, Kripke frame]
• The process algebra representation translated into a Kripke frame
  o Enumerated states denoting the number of instances of each workflow node
  o Transitions of the frame are the node executions
  o Use CTL formulas to query
  o NuSMV model checker employed
• Allows questions such as:
  o Reachability of a particular state
  o Detection of deadlocks and livelocks
  o Safety - some state always executing
  o Bounds on the number of instances of a node

Process profiling: example
• Reachability
  o EF F_τ^1 – Is there an execution that achieves one instance of F
  o AF F_τ^1 – Do all executions always achieve one instance of F
• Livelocks
  o AG (C_τ -> AG AF C_τ) – Is there always a livelock with C
  o EF (C_τ -> AG AF C_τ) – Can there be a livelock with C
• Instance bounds
  o max x . EF A_τ^x – What is the maximum number of instances of A

Composability checker
[Diagram: Workflow, Data model, Type formulas]
• Polymorphic type formulas for the workflow components/fragments
• When composing:
  o The output and input of each fragment compared in terms of free and bound type variables
  o If no clashes, free variables resolved to form the type formula of the composition
  o Inference engine developed specifically for the tool
• Determines:
  o If a workflow fragment can be reused on a new input
  o Which services in the warehouse are compatible

Composability checker: example
• Fragment of three nodes LMN
  o Input q, with required attributes A, B, D
  o Two outputs u, v
  o A present in both. B in u. D in neither.
• Two outputs can be joined with O

Equivalence tester / optimizer
[Diagram: Workflow, Data model, Node equivalences]
• Uses a set of node equivalence rules
  o Defined for each workflow system or node subset
  o Algorithm applies allowed transformations to reduce two workflows to the same expression
• Combined with rewrite heuristics
  o Node-specific again
  o Simple example: relational model again

Equivalence tester / optimizer: example
• Relational workflow searching for Adverse Drug Reactions in GPRD database
• Rewrite rules
  o Set of relational equivalences
• Heuristics
  o Early projections/selections
  o Late joins
  o Easy scenario – brute force algorithm works

Related and future work
• Data typing
  o COMAD for Kepler
• Workflow process analysis
  o GWorkflowDL
  o YAWL
• New workflow tools with relational structures
  o KNIME
  o Orange
  o Pentaho
• Extensions:
  o Streaming – blocking and batching
  o Improved state reduction algorithms for CTL model
  o Adding more type constructs for polymorphism

Summary
• Workflow analysis needed to improve takeup and exploitation of workflows
  o Enterprise environments
  o Profile resource usage, risk of failure, execution time
  o Support reuse and repurposing
• Separation of control and data aspects allows use of existing model checkers and familiar techniques
  o Process algebras, temporal logics, type polymorphisms, term graphs
• Current version works on Discovery Net/InforSense
  o KNIME, Pentaho very similar – only require extra parsers
  o Full streaming process model for Taverna in the works
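
The process profiling queries (reachability with EF/AF and instance bounds) can be illustrated with a small explicit-state sketch. This is not NuSMV and not the tool's own encoding: it is a hand-rolled fixpoint evaluation over a hypothetical four-state Kripke frame whose states are assumed to be pairs (instances of A, instances of F). The state space, the function names and the `upper` bound are all illustrative assumptions.

```python
# Toy Kripke frame (hypothetical): each state is (instances of A, instances of F).
# Every state appears as a key; each value is the set of successor states.
TRANS = {
    (0, 0): {(1, 0)},
    (1, 0): {(1, 1), (2, 0)},
    (2, 0): {(2, 0)},  # loops forever without ever executing F
    (1, 1): {(1, 1)},
}
INIT = (0, 0)


def EF(trans, target):
    """States from which SOME path eventually reaches `target` (least fixpoint)."""
    result = set(target)
    while True:
        pre = {s for s, succs in trans.items() if succs & result}
        if pre <= result:
            return result
        result |= pre


def AF(trans, target):
    """States from which ALL paths eventually reach `target` (least fixpoint)."""
    result = set(target)
    while True:
        pre = {s for s, succs in trans.items() if succs and succs <= result}
        if pre <= result:
            return result
        result |= pre


def AG(trans, target):
    """States where `target` holds on all paths forever: AG p = not EF not p."""
    everything = set(trans)
    return everything - EF(trans, everything - set(target))


def max_instances(trans, init, node_index, upper):
    """Largest x <= upper such that EF (count of the node == x) holds at init."""
    best = None
    for x in range(upper + 1):
        target = {s for s in trans if s[node_index] == x}
        if init in EF(trans, target):
            best = x
    return best


# "F has one instance" corresponds to the second component being 1.
F_ONCE = {s for s in TRANS if s[1] == 1}

print("EF F^1 at init:", INIT in EF(TRANS, F_ONCE))
print("AF F^1 at init:", INIT in AF(TRANS, F_ONCE))
print("max instances of A:", max_instances(TRANS, INIT, 0, upper=5))
```

In this toy frame some execution reaches one instance of F, but not every execution does, because the path through (2, 0) never runs F. A real model checker evaluates the same fixpoints symbolically rather than by enumerating states.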
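
The composability example (fragment LMN with input q and outputs u, v) can be approximated in a few lines. The sketch below deliberately swaps the polymorphic type inference described in the slides for plain set inclusion over attribute names, which is a much weaker check. The `Fragment` class, its field names and the reading that O joins u and v on their shared attribute are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass
class Fragment:
    """Hypothetical type profile of a workflow fragment."""
    name: str
    requires: FrozenSet[str]            # attributes the input must carry
    outputs: Dict[str, FrozenSet[str]]  # required attributes surviving on each output


def accepts(fragment: Fragment, available: FrozenSet[str]) -> bool:
    """Can the fragment be reused on an input carrying `available` attributes?"""
    return fragment.requires <= available


def compatible_outputs(upstream: Fragment, needed: FrozenSet[str]) -> list:
    """Output ports of `upstream` that satisfy a downstream requirement."""
    return sorted(port for port, attrs in upstream.outputs.items() if needed <= attrs)


def shared_attributes(upstream: Fragment, left: str, right: str) -> FrozenSet[str]:
    """Attributes present on both ports, i.e. candidate keys for a join node."""
    return upstream.outputs[left] & upstream.outputs[right]


# The fragment from the slide: q requires A, B, D; A is on both outputs,
# B only on u, D on neither.
LMN = Fragment(
    name="LMN",
    requires=frozenset({"A", "B", "D"}),
    outputs={"u": frozenset({"A", "B"}), "v": frozenset({"A"})},
)

print(accepts(LMN, frozenset({"A", "B", "C", "D"})))  # extra attribute C is harmless
print(compatible_outputs(LMN, frozenset({"B"})))
print(shared_attributes(LMN, "u", "v"))
```

Under this reading, a service that needs D cannot be attached to either output, while a join such as O remains possible because u and v still share A.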
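
The equivalence tester idea, reducing two workflows to the same expression with rewrite rules and an "early selections" heuristic, can be sketched for the relational case. The code below applies a single standard relational equivalence (pushing a selection below a natural join when its predicate uses one attribute of one side) and compares normal forms. The tuple encoding, the table names and the attribute names are hypothetical and are not taken from the GPRD workflow.

```python
# Expressions are nested tuples (hypothetical encoding):
#   ("table", name, frozenset_of_attributes)
#   ("select", predicate_label, attribute_used, child)
#   ("join", left, right)            # natural join


def attrs(expr):
    """Attributes produced by an expression."""
    kind = expr[0]
    if kind == "table":
        return expr[2]
    if kind == "select":
        return attrs(expr[3])
    if kind == "join":
        return attrs(expr[1]) | attrs(expr[2])
    raise ValueError(f"unknown node kind: {kind}")


def push_selections(expr):
    """Rewrite select(join(l, r)) into join(select(l), r) or join(l, select(r))."""
    kind = expr[0]
    if kind == "table":
        return expr
    if kind == "join":
        return ("join", push_selections(expr[1]), push_selections(expr[2]))
    if kind == "select":
        _, pred, attr, child = expr
        child = push_selections(child)
        if child[0] == "join":
            _, left, right = child
            if attr in attrs(left):
                return ("join", push_selections(("select", pred, attr, left)), right)
            if attr in attrs(right):
                return ("join", left, push_selections(("select", pred, attr, right)))
        return ("select", pred, attr, child)
    raise ValueError(f"unknown node kind: {kind}")


def equivalent(w1, w2):
    """Two workflows are judged equivalent if they share a normal form."""
    return push_selections(w1) == push_selections(w2)


PATIENTS = ("table", "patients", frozenset({"patient_id", "age"}))
PRESCRIPTIONS = ("table", "prescriptions", frozenset({"patient_id", "drug"}))

# Late selection (filter after the join) versus early selection (filter first).
LATE = ("select", "drug = 'X'", "drug", ("join", PATIENTS, PRESCRIPTIONS))
EARLY = ("join", PATIENTS, ("select", "drug = 'X'", "drug", PRESCRIPTIONS))

print(equivalent(LATE, EARLY))
```

This check is sound but incomplete: with only one rule, workflows that differ by, say, join order are reported as different. A fuller rule set would add commutativity and associativity of joins and projection pushdown.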