The design and implementation of a workflow analysis tool
Vasa Curcin
Department of Computing
Imperial College London
Scientific workflow field
• Scientific workflows: a high-level programming language with explicit graphical representation of flow of data and/or control
• Research into automation of processes supporting scientific research
• Significant role in providing middleware for UK eScience programme: Taverna, Discovery Net, Triana
• Lingua franca of service-oriented computing
Deluge of workflows
Kepler
Meandre
GenePattern
Triana
Taverna
Pegasus
Discovery Net
Orange
Wildfire
Pentaho
BPEL
VisTrails
KNIME
Galaxy
UGENE
Trident
LONI
YAWL
Bioinformatics
Business Intelligence
Sensor informatics
Cheminformatics
Environmental Science
Astronomy
…
Workflow analysis
• There is a need for formal models to capitalize on the benefits of this infrastructure
  o Work evaluated on Discovery Net workflow
  o Concepts applicable to other workflow systems
• Some aims
  o Minimise cost of data movement and processing
  o Provide technology for workflow clients and warehouses (indexing, guided construction…)
• Tasks
  o Safeness
  o Instance bounds
  o Static workflow optimization
  o Establishing polymorphic type profiles of workflows
Underlying models
• Control flow model
  o Process calculus definitions
  o Communication along named channels
• Fixed for atomic execution, dynamic for streaming
  o New instance of the process launched as soon as the node receives a token
  o Computation tree logic (CTL) modelling execution states
• Data flow model
  o Nodes associated with lambda calculus formulas and term graphs
  o Polymorphic type transformations
  o Rewrite rules defined for sets of nodes as term graph transformations
• Embedding
  o Way of combining the control and data semantics
Workflow analysis tool
• Similarity checker
  o Bisimilarity of processes
• Process profiler
  o Deadlock/livelock detection
  o Reachability
  o Task bounds
• Composability checker
  o Design-time tests
  o Type requirements
  o Polymorphic properties
• Equivalence checker
  o Functional equivalence
• Optimizer
  o Rewrite rules for transformations
Similarity checker
[Diagram: Workflow → Process model → Model checker]
• Based purely on the pi-calculus process model
  o Workflows translated into the process model
  o Parallel composition of independent node processes with named channels
  o Compared in terms of:
    • Internal executions (node actions)
    • Set of observable outputs - define only relevant outputs
• Model checker used to test different types of bisimilarity
  o Node executions conveniently represented as silent actions
  o Strong bisimulation becomes strict one-to-one workflow action mapping
  o Weak bisimulation ignores internal actions and communications and focuses on visible outputs
Similarity checker: example
• ABC (Another Bisimilarity Checker) used
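The difference between strong and weak bisimilarity can be illustrated with a small explicit-state sketch. This is not the ABC tool or its input format; it is a naive greatest-fixpoint check over a hand-written labelled transition system, and the state names (`p0`, `q0`, ...) and the `out` action are hypothetical. Weak bisimilarity is checked here by first saturating the system with silent (`tau`) steps and then running the same strong check.

```python
from itertools import product

TAU = "tau"  # silent action: stands in for an internal node execution


def all_states(trans):
    """Collect every state mentioned as a source or a target."""
    states = set(trans)
    for moves in trans.values():
        states.update(nxt for _, nxt in moves)
    return states


def tau_closure(trans, start):
    """States reachable from start using only silent actions (start included)."""
    seen, stack = {start}, [start]
    while stack:
        cur = stack.pop()
        for action, nxt in trans.get(cur, ()):
            if action == TAU and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def saturate(trans):
    """Weak transitions: tau* for silent moves, tau* a tau* for visible ones."""
    weak = {}
    for s in all_states(trans):
        moves = set()
        for mid in tau_closure(trans, s):
            moves.add((TAU, mid))
            for action, nxt in trans.get(mid, ()):
                if action != TAU:
                    moves.update((action, end) for end in tau_closure(trans, nxt))
        weak[s] = moves
    return weak


def bisimilar(trans, s0, t0):
    """Start from all pairs and drop pairs whose moves cannot be matched."""
    rel = set(product(all_states(trans), repeat=2))

    def matched(p, q):
        # Every move of p must be answered by an equally labelled move of q
        # that leads to a pair still in the relation.
        return all(
            any(a == b and (p2, q2) in rel for b, q2 in trans.get(q, ()))
            for a, p2 in trans.get(p, ())
        )

    changed = True
    while changed:
        changed = False
        for p, q in list(rel):
            if (p, q) in rel and not (matched(p, q) and matched(q, p)):
                rel.discard((p, q))
                rel.discard((q, p))  # keep the relation symmetric
                changed = True
    return (s0, t0) in rel


# Hypothetical workflows: one runs an internal node before producing its
# output, the other produces the output directly.
workflows = {
    "p0": {(TAU, "p1")},
    "p1": {("out", "p2")},
    "q0": {("out", "q1")},
}

strong = bisimilar(workflows, "p0", "q0")
weak = bisimilar(saturate(workflows), "p0", "q0")
```

Under these assumptions the two workflows are not strongly bisimilar, because the internal step has no counterpart, but they are weakly bisimilar, because only the visible output is compared.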
Process profiling
[Diagram: Workflow → Process model → Kripke frame]
• The process algebra representation translated into a Kripke frame
  o Enumerated states denoting the number of instances of each workflow node
  o Transitions of the frame are the node executions
  o Use CTL formulas to query
  o NuSMV model checker employed
• Allows questions such as:
  o Reachability of a particular state
  o Detection of deadlocks and livelocks
  o Safety - some state always executing
  o Bounds on the number of instances of a node
Process profiling: example
• Reachability
  o EF Fτ1 – Is there an execution that achieves one instance of F?
  o AF Fτ1 – Do all executions always achieve one instance of F?
• Livelocks
  o AG (Cτ -> AG AF Cτ) – Is there always a livelock with C?
  o EF (Cτ -> AG AF Cτ) – Can there be a livelock with C?
• Instance bounds
  o maxX .EF Aτx – What is the maximum number of instances of A?
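The reachability and instance-bound queries above can be sketched with explicit-state fixpoint computations. This is only an illustration of what EF and AF mean on a Kripke frame, not how NuSMV evaluates them (NuSMV works symbolically and has its own input language). The frame below is hypothetical: each state is a pair (instances of A, instances of F), and terminal states carry a self-loop so that the transition relation is total, as CTL assumes.

```python
def ef(succ, target):
    """States from which SOME path reaches a target state (least fixpoint)."""
    result = set(target)
    while True:
        grown = result | {s for s, nxt in succ.items() if nxt & result}
        if grown == result:
            return result
        result = grown


def af(succ, target):
    """States from which EVERY path eventually reaches a target state."""
    result = set(target)
    while True:
        grown = result | {s for s, nxt in succ.items() if nxt and nxt <= result}
        if grown == result:
            return result
        result = grown


def reachable(succ, start):
    """All states reachable from start, start included."""
    seen, stack = {start}, [start]
    while stack:
        cur = stack.pop()
        for nxt in succ.get(cur, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Hypothetical frame: state = (instances of A, instances of F).
INIT = (0, 0)
frame = {
    (0, 0): {(1, 0)},
    (1, 0): {(1, 1), (2, 0)},
    (1, 1): {(1, 1)},  # F has run once; self-loop keeps the relation total
    (2, 0): {(2, 0)},  # a second A instance was launched; F never runs
}

f_once = {s for s in frame if s[1] == 1}

some_run_reaches_f = INIT in ef(frame, f_once)   # analogue of the EF query
all_runs_reach_f = INIT in af(frame, f_once)     # analogue of the AF query
max_instances_of_a = max(s[0] for s in reachable(frame, INIT))
```

In this toy frame some execution reaches one instance of F, but not every execution does, and the instance bound for A is the largest A count among the reachable states.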
Composability checker
[Diagram: Workflow → Data model → Type formulas]
• Polymorphic type formulas for the workflow components/fragments
• When composing:
  o The output and input of each fragment compared in terms of free and bound type variables
  o If no clashes, free variables resolved to form the type formula of the composition
  o Inference engine developed specifically for the tool
• Determines:
  o If a workflow fragment can be reused on a new input
  o Which services in the warehouse are compatible
Composability checker: example
• Fragment of three nodes LMN
  o Input q, with required attributes A, B, D
  o Two outputs u, v
  o A present in both. B in u. D in neither.
• Two outputs can be joined with O
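The LMN example can be approximated with a small attribute-set sketch. This is not the tool's inference engine. The `Fragment` class, the `apply` and `join_outputs` helpers, and the extra attribute `E` are all hypothetical, and two simplifying assumptions are made: any attributes beyond the required ones play the role of a free type variable and are passed through to every output, and the join node O behaves like a natural join that needs at least one shared attribute.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    name: str
    requires: frozenset  # attributes the input must carry (the bound part)
    keeps: dict          # output name -> required attributes it retains

    def apply(self, input_attrs):
        """Check the input type and compute the type of each output."""
        input_attrs = frozenset(input_attrs)
        missing = self.requires - input_attrs
        if missing:
            raise TypeError(f"{self.name}: input lacks {sorted(missing)}")
        # Assumption: everything not explicitly required is the free part
        # of the input type and flows unchanged to every output.
        rest = input_attrs - self.requires
        return {out: frozenset(kept) | rest for out, kept in self.keeps.items()}


def join_outputs(left, right):
    """Assumed behaviour of node O: natural join on the shared attributes."""
    if not (left & right):
        raise TypeError("no shared attribute to join on")
    return left | right


# Fragment LMN from the slide: input needs A, B, D; A survives in both
# outputs, B only in u, D in neither.
lmn = Fragment(
    name="LMN",
    requires=frozenset({"A", "B", "D"}),
    keeps={"u": {"A", "B"}, "v": {"A"}},
)

outputs = lmn.apply({"A", "B", "D", "E"})  # E is a hypothetical extra attribute
joined = join_outputs(outputs["u"], outputs["v"])
```

Reusing the fragment on an input that lacks D, for example `lmn.apply({"A", "B"})`, raises a `TypeError` in this sketch, which corresponds to a design-time composability failure.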
Equivalence tester / optimizer
[Diagram: Workflow → Data model → Node equivalences]
• Uses a set of node equivalence rules
  o Defined for each workflow system or node subset
  o Algorithm applies allowed transformations to reduce two workflows to the same expression
• Combined with rewrite heuristics
  o Node-specific again
  o Simple example: relational model again
Equivalence tester/optimizer: example
• Relational workflow searching for Adverse Drug Reactions in GPRD database
• Rewrite rules
  o Set of relational equivalences
• Heuristics
  o Early projections/selections
  o Late joins
  o Easy scenario – brute force algorithm works
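A minimal sketch of one such rewrite, the early-selection heuristic, is shown here. It relies on the standard relational equivalence that a selection over a join can be pushed into the side that supplies all the attributes the predicate reads. The node classes, the `patients` and `prescriptions` tables, and the predicate are hypothetical stand-ins and are not taken from the GPRD workflow. Two workflows count as equivalent in this sketch if the rewrite reduces them to the same expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    name: str
    attrs: frozenset


@dataclass(frozen=True)
class Select:
    pred: str          # opaque predicate label
    uses: frozenset    # attributes the predicate reads
    child: object


@dataclass(frozen=True)
class Join:
    left: object
    right: object


def attrs(node):
    """Attributes available at the output of a node."""
    if isinstance(node, Table):
        return node.attrs
    if isinstance(node, Select):
        return attrs(node.child)
    return attrs(node.left) | attrs(node.right)


def push_selections(node):
    """Move each selection below a join when one side has all its attributes."""
    if isinstance(node, Table):
        return node
    if isinstance(node, Join):
        return Join(push_selections(node.left), push_selections(node.right))
    child = push_selections(node.child)
    if isinstance(child, Join):
        if node.uses <= attrs(child.left):
            pushed = push_selections(Select(node.pred, node.uses, child.left))
            return Join(pushed, child.right)
        if node.uses <= attrs(child.right):
            pushed = push_selections(Select(node.pred, node.uses, child.right))
            return Join(child.left, pushed)
    return Select(node.pred, node.uses, child)


# Hypothetical tables and predicate.
patients = Table("patients", frozenset({"patient_id", "age"}))
prescriptions = Table("prescriptions", frozenset({"patient_id", "drug"}))
uses_drug = frozenset({"drug"})

late = Select("drug = 'X'", uses_drug, Join(patients, prescriptions))
early = Join(patients, Select("drug = 'X'", uses_drug, prescriptions))

equivalent = push_selections(late) == push_selections(early)
```

Because the node classes are frozen dataclasses, structural equality comes for free, so comparing the two normalised expressions with `==` is enough for this brute-force style of check.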
Related and future work
• Data typing
  o COMAD for Kepler
• Workflow process analysis
  o GWorkflowDL
  o YAWL
• New workflow tools with relational structures
  o KNIME
  o Orange
  o Pentaho
• Extensions:
  o Streaming – blocking and batching
  o Improved state reduction algorithms for CTL model
  o Adding more type constructs for polymorphism
Summary
• Workflow analysis needed to improve takeup and exploitation of workflows
  o Enterprise environments
  o Profile resource usage, risk of failure, execution time
  o Support reuse and repurposing
• Separation of control and data aspects allows use of existing model checkers and familiar techniques
  o Process algebras, temporal logics, type polymorphisms, term graphs
• Current version works on Discovery Net/InforSense
  o KNIME, Pentaho very similar – only require extra parsers
  o Full streaming process model for Taverna in the works