Modeling Provenance through User Views
Sarah Cohen-Boulakia, Shirley Cohen, Susan Davidson, Thunyarat (Bam) Amornpetchkul, Olivier Biton
Database group, University of Pennsylvania
Provenance Challenge, Sept. 2006

Our approach
Model of provenance
  - Based on study of user requirements (CIPRES)
  - Based on careful studies of workflow systems (Kepler, MyGrid, Chimera)
  - Minimal information to reason about provenance
  - No workflow system is proposed
User views
  - Capability of workflow systems to group steps (forming boxes) and to zoom into boxes
  - Multi-granularity levels of provenance
Implemented in Oracle 10g and Java
  - Relational framework augmented with transitive closure
  - Java/Spring/JDBC: object layer and user interface

Workflow Representation
[Figure: workflow diagram with the labels "input data", "reslice: step-class", "8.reslice: step", and "output data"]
Terminology
  - Step-classes (static)
  - An execution of a workflow generates a partial order of steps (dynamic)
      - Steps are instances of step-classes
      - Each step has input and output data

Provenance Trace
Base tables
  - Data(dataid, name, type), DataAttributes(dataid, attribute, value)
      - Example: Data(1, Anatomy Image1, Anatomy Image)
      - Example: DataAttributes(1, center, UChicago), i.e. Center=UChicago
  - InstanceOf(Step, Step-Class, ts), StepParams(step, attribute, value), StageInstance(step, stage)
  - Input(stepId, dataId, ts) / Output(stepId, dataId, ts): stepId takes as input / produces dataId at time ts
Views
  - Process(stepId, stepClass, input, output, time)
  - …

Provenance Queries
Q1: Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is

    SELECT DISTINCT step, step-class, input, output
    FROM Process
    START WITH output = ( SELECT ID FROM DataID WHERE name = 'Atlas X Graphic' )
    CONNECT BY PRIOR input = output
    ORDER BY step;

The CONNECT BY clause implements transitive closure, which is necessary to return all the data used to (recursively) compute Atlas X Graphic.

Provenance Queries (Cont.)
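
As a rough illustration of the Provenance Trace tables and of query Q1, the sketch below rebuilds a tiny trace in an in-memory SQLite database from Python. It is not the authors' Oracle implementation. The body of the Process view is an assumption (the slides give only its signature), the step names and most data names are invented toy values, and a recursive common table expression stands in for Oracle's START WITH ... CONNECT BY, which SQLite does not support.

```python
import sqlite3

# Base tables follow the slide; the body of the Process view is an assumption.
SCHEMA = """
CREATE TABLE data (dataid INTEGER PRIMARY KEY, name TEXT, type TEXT);
CREATE TABLE instance_of (step TEXT, step_class TEXT, ts INTEGER);
CREATE TABLE input (step TEXT, dataid INTEGER, ts INTEGER);
CREATE TABLE output (step TEXT, dataid INTEGER, ts INTEGER);
CREATE VIEW process AS
    SELECT i.step AS step, c.step_class AS step_class,
           i.dataid AS input, o.dataid AS output
    FROM input AS i
    JOIN output AS o ON o.step = i.step
    JOIN instance_of AS c ON c.step = i.step;
"""

# Recursive CTE standing in for START WITH ... CONNECT BY PRIOR input = output.
DEEP_PROVENANCE = """
WITH RECURSIVE lineage(step, step_class, input, output) AS (
    SELECT step, step_class, input, output
    FROM process
    WHERE output = (SELECT dataid FROM data WHERE name = :target)
    UNION
    SELECT p.step, p.step_class, p.input, p.output
    FROM process AS p
    JOIN lineage AS l ON p.output = l.input
)
SELECT step, step_class, input, output FROM lineage ORDER BY step
"""


def build_trace():
    """Create an in-memory database holding an invented toy execution."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    names = ["Anatomy Image1", "Resliced Image1", "Atlas Image",
             "Atlas X Slice", "Atlas X Graphic",
             "Atlas Y Slice", "Atlas Y Graphic"]
    conn.executemany(
        "INSERT INTO data VALUES (?, ?, ?)",
        [(i, name, "image") for i, name in enumerate(names, start=1)],
    )
    # (step, step class, input dataid, output dataid); hypothetical values
    steps = [
        ("1.reslice", "reslice", 1, 2),
        ("2.softmean", "softmean", 2, 3),
        ("3.slicer", "slicer", 3, 4),
        ("4.convert", "convert", 4, 5),
        ("5.slicer", "slicer", 3, 6),
        ("6.convert", "convert", 6, 7),
    ]
    for ts, (step, step_class, src, dst) in enumerate(steps):
        conn.execute("INSERT INTO instance_of VALUES (?, ?, ?)", (step, step_class, ts))
        conn.execute("INSERT INTO input VALUES (?, ?, ?)", (step, src, ts))
        conn.execute("INSERT INTO output VALUES (?, ?, ?)", (step, dst, ts))
    return conn


def deep_provenance(conn, target):
    """Return every Process row that (recursively) contributed to `target`."""
    return conn.execute(DEEP_PROVENANCE, {"target": target}).fetchall()


if __name__ == "__main__":
    for row in deep_provenance(build_trace(), "Atlas X Graphic"):
        print(row)
```

The UNION in the recursive part removes duplicate rows, playing the role of SELECT DISTINCT in Q1. The two steps that only feed the Atlas Y branch are left out of the answer, because nothing they produce is reachable backwards from Atlas X Graphic.
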
  - All the queries can be answered by our system
  - Using SQL (code available on the TWiki)
      - CONNECT BY operators
      - Joins with several tables (e.g. Parameters, DataAttribute)
      - MINUS and UNION operators
  - The generalization of Q7 (difference between workflows) is currently not answerable

Workflow Variant: User Views
What are user views?
  - Level of detail the user wishes to track
  - Permissions given to the user
  - Ability of the user to see / know the sub-steps (distributed computation)
Why use user views?
  - Throw away unimportant intermediate results
  - Better understanding of the workflow
  - Reduce the amount of work to be redone
[Figure: workflow with boxes Box1 and Box2 and the user views UBio, UBlackBox, and UAdmin; UAdmin can see everything]

Querying within User Views
Need information from the workflow: step-class containment and user views
  - Cinput(sid, idid, tsi), Coutput(sid, idid, tso)
  - View UProcess(usr, step, step-class, input, output)
Query: What are all the data items used to produce “Resliced Image1”?

    SELECT * FROM uProcess upc
    WHERE usr = :userName
    START WITH outputName = 'Resliced Image1'
    CONNECT BY PRIOR upc.output = upc.input;

Answers per user view
  - UAdmin: Anatomy Header 1, Anatomy Image1, Reference Image, Reference Header, Warp param1
  - UBio: Anatomy Header 1, Anatomy Image1, Reference Image, Reference Header
  - UBlackBox: empty answer!

Conclusion, Perspectives
Able to answer the queries, including
  - Variation of the workflow and queries considering user views
  - Data and step provenance
  - Immediate and deep (recursive) provenance
Multi-granularity levels of provenance
  - Only visible and necessary data are kept
Open questions
  - What is the meaning of “stage” in a workflow (with respect to user views)?
  - What are we expecting as an answer to the difference between two workflows (cf. query 7)?
  - Are all the procedures of the workflow “biologically significant” (cf. user views)?
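
The effect of user views on the "Resliced Image1" query can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the authors' UProcess view: the step names, the grouping of steps into boxes for each user, and the rule that a data item is hidden when it is produced and consumed inside the same box are all hypothetical, chosen so that the three answers listed on the "Querying within User Views" slide come out.

```python
from collections import defaultdict

# Hypothetical trace: step -> (input data, output data).
STEPS = {
    "1.align_warp": ({"Anatomy Image1", "Anatomy Header 1",
                      "Reference Image", "Reference Header"},
                     {"Warp param1"}),
    "2.reslice": ({"Warp param1"}, {"Resliced Image1"}),
    "3.softmean": ({"Resliced Image1"}, {"Atlas Image"}),
}

# Hypothetical user views: each maps a step to the box that contains it.
USER_VIEWS = {
    "UAdmin": {"1.align_warp": "align_warp", "2.reslice": "reslice",
               "3.softmean": "softmean"},
    "UBio": {"1.align_warp": "Box1", "2.reslice": "Box1",
             "3.softmean": "softmean"},
    "UBlackBox": {"1.align_warp": "Workflow", "2.reslice": "Workflow",
                  "3.softmean": "Workflow"},
}


def hidden_data(view):
    """Data produced in a box and consumed only inside that same box."""
    producer, consumers = {}, defaultdict(set)
    for step, (inputs, outputs) in STEPS.items():
        for datum in outputs:
            producer[datum] = view[step]
        for datum in inputs:
            consumers[datum].add(view[step])
    return {d for d, box in producer.items() if consumers.get(d) == {box}}


def deep_provenance(user, target):
    """All data visible to `user` that were (recursively) used to produce `target`."""
    view = USER_VIEWS[user]
    hidden = hidden_data(view)
    box_in, box_out = defaultdict(set), defaultdict(set)
    for step, (inputs, outputs) in STEPS.items():
        box_in[view[step]] |= inputs - hidden
        box_out[view[step]] |= outputs - hidden
    found, frontier = set(), {target}
    while frontier:
        datum = frontier.pop()
        for box, outputs in box_out.items():
            if datum in outputs:
                new = box_in[box] - found
                found |= new
                frontier |= new
    return found


if __name__ == "__main__":
    for user in USER_VIEWS:
        print(user, sorted(deep_provenance(user, "Resliced Image1")))
```

Under these assumptions, UBlackBox gets an empty answer because "Resliced Image1" never crosses the boundary of its single box, so it is not a visible data item for that user at all.
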

Acknowledgements
Kepler Group
  - Shawn Bowers
  - Bertram Ludäscher
  - Timothy McPhillips
Biologists from the CIPRES project
Members from the Database group, University of Pennsylvania
This work is supported by NSF grants 0513778, 0415810, and 0612177

User interface