myGrid/Taverna Provenance
Daniele Turi, University of Manchester
OMII f2f Meeting, London, 19-20/4/06

Components
• Identifiers – LSIDs
• Data – JDBC data store
• Metadata – RDF Provenance Plugin
• Browsing – Provenance Browser Plugin
• Security – under development

LSID

LSID: Life Science Identifier
• URN specification in progress
• Five-part identifier (with an optional version ID)
  – urn:lsid:www.mygrid.org.uk:lsdocument:X1234
  – urn:lsid:ncbi.nlm.nih.gov.lsid.biopathways.org:genbank_gi:7717376
• Protocol for retrieving data and metadata about an object
• Commitment by the provider to always return the same data for an ID

LSID (ctd)
• Issuing – LSID Authorities
• Resolution – LSID Resolvers
• Examples
  – myGrid
  – Long Term Ecological Research Network
  – BioPathways Consortium

LSID (ctd 2)
• Abstraction
• Lightweight
• Independent of the actual storage implementation
  – database
  – file system
  – application
• For both private and public data sources

Data

Data Storage (current)
• Taverna can persist inputs, outputs and intermediate results in an SQL database via JDBC
• Optional: enabled by configuring a Baclava Data Store
• Allows the LSIDs of data items to be resolved against the actual data

Data Storage (future)
• Domain-specific databases
  – in use outside myGrid
• Develop:
  – a Taverna processor for JDBC/OGSA-DAI
  – an associated interface (cf. BioMart)
• Users will be able to study the contents of an existing database and:
  – write queries that extract data from the database, where the query may be parameterised with values passed in from the workflow;
  – write requests that insert data from the workflow into a named table in the database.
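To make the LSID structure and the "resolve against the actual data" idea concrete, here is a minimal sketch, assuming an in-memory SQLite database as a stand-in for the JDBC-accessed SQL store. The table name `data_items` and the functions `parse_lsid`, `open_store`, `store` and `resolve` are hypothetical illustrations, not Taverna's or the Baclava Data Store's actual schema or API.

```python
import sqlite3
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Components of 'urn:lsid:<authority>:<namespace>:<object>[:<version>]'."""
    authority: str
    namespace: str
    object_id: str
    version: Optional[str] = None

    def __str__(self) -> str:
        parts = ["urn", "lsid", self.authority, self.namespace, self.object_id]
        if self.version is not None:
            parts.append(self.version)
        return ":".join(parts)


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its parts; the 'urn:lsid' prefix is case-insensitive."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:]):
        raise ValueError(f"empty component in LSID: {text!r}")
    return LSID(*parts[2:])


def open_store() -> sqlite3.Connection:
    """In-memory SQLite as a stand-in for the JDBC-backed SQL database (assumption)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: one row per data item, keyed by its LSID string.
    conn.execute("CREATE TABLE data_items (lsid TEXT PRIMARY KEY, value BLOB NOT NULL)")
    return conn


def store(conn: sqlite3.Connection, lsid: LSID, value: bytes) -> None:
    """Persist a data item; plain INSERT so an existing LSID can never be rebound."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO data_items (lsid, value) VALUES (?, ?)", (str(lsid), value))


def resolve(conn: sqlite3.Connection, lsid: LSID) -> bytes:
    """Resolve an LSID against the stored data, raising KeyError if it is unknown."""
    row = conn.execute("SELECT value FROM data_items WHERE lsid = ?", (str(lsid),)).fetchone()
    if row is None:
        raise KeyError(str(lsid))
    return row[0]


if __name__ == "__main__":
    conn = open_store()
    doc = parse_lsid("urn:lsid:www.mygrid.org.uk:lsdocument:X1234")
    store(conn, doc, b"ACGT")
    print(doc.authority, doc.namespace, doc.object_id)  # www.mygrid.org.uk lsdocument X1234
    print(resolve(conn, doc))  # b'ACGT'
```

Using a plain INSERT on a primary key, rather than an upsert, means a second write to the same LSID raises `sqlite3.IntegrityError`. That is one simple way to honour the provider's commitment that an ID always returns the same data.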
Metadata

Metadata Generation
• Taverna Provenance Plugin
• Listens to Taverna events
  – WorkflowEventListener
• Faithfully records them as ontological instance data
  – RDF graphs (one for each Taverna run)

Metadata
• Representation
• Ontology (schema)
• Storage
• Query
• Browsing

Representation
• RDF
  – triples (subject–predicate–object)
  – URIs (hence easy data integration)
  – semantic web language
  – XML serialization
  – flexible, powerful
  – sets of triples give rise to graphs

Workflow Run
[Figure: RDF graph of one workflow run. urn:lsid:…:wfInstance:8 runs urn:lsid:…:workflow:6, is launchedBy urn:lsid:…:person:4 (who belongsTo urn:lsid:…:org:HY7), and executed urn:lsid:…:processRun:84 and urn:lsid:…:processRun:51.]

Schema
• Ontology
  – RDF Schema
    • taxonomic inferences
  – also available as OWL
    • opens it up to complex reasoning

Typed Workflow Run
[Figure: the same graph typed with the Provenance Ontology. Classes: WorkflowRun, Workflow, ProcessRun, Experimenter, Organization. WorkflowRun runs Workflow; WorkflowRun launchedBy Experimenter; WorkflowRun executed ProcessRun; Experimenter belongsTo Organization.]

Storage
• Named RDF graphs
  – retrieve whole graphs (e.g. workflows)
  – implementation in
    • NG4J (Jena + MySQL): scalability issues
    • Sesame2 native store: scalable, Java 5

Query
• RDF query languages
  – TriQL, SeRQL, SPARQL
    • query languages for named RDF graphs
• Ontology inspection/reasoning
• Canned queries
  – workflows with failed processes
  – inputs/outputs of past process runs
  – workflows with data changed by the user

Browsing

Provenance Browsing
• Provenance Browser Plugin
  – reusing Taverna GUI components
• Matthew Gamble

Analysis

Provenance Analysis
• Comparison
• Aggregation
• etc.
  – see work by Jun Zhao

Security
• User sends an LSID reference and credentials to the Access Point
• Access Point returns data and metadata, or denies access, as follows:
  – credentials are passed to a User Directory
  – User Directory passes the corresponding user to the Authorization Authority
  – Authorization Authority returns the user attributes in the form of a (possibly signed) SAML assertion
  – this assertion, together with the LSID and its corresponding metadata, is passed to the Policy Enforcement Point (PEP)
  – PEP uses these three inputs to form an XACML request that is passed to a Policy Decision Point (PDP) preloaded with an XACML Policy Set
  – PDP evaluates the request against its policy set and returns an XACML response to the PEP
  – PEP decodes the response and either allows data/metadata to be returned to the user or denies access

myGrid XACML Policy
• Scenario
  – supervisors can access all workflows in the organization
  – students can access only their own workflows
  – blacklisted users cannot access anything
• See policySet.xml on the myGrid wiki
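The PEP/PDP exchange and the policy scenario can be sketched in plain Python rather than XACML. This is an illustration under stated assumptions, not the contents of policySet.xml: the attribute names (`role`, `owner`, `blacklisted`), the role strings and all function names are hypothetical, and a simplified deny-overrides combination is assumed so that blacklisting always wins.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class Decision(Enum):
    PERMIT = "Permit"
    DENY = "Deny"
    NOT_APPLICABLE = "NotApplicable"


@dataclass(frozen=True)
class Subject:
    """User attributes, standing in for the SAML assertion (hypothetical fields)."""
    user_id: str
    role: str  # assumed values: "supervisor" or "student"
    organization: str
    blacklisted: bool = False


@dataclass(frozen=True)
class Resource:
    """Metadata of the requested workflow, looked up via its LSID (hypothetical fields)."""
    lsid: str
    owner: str
    organization: str


Rule = Callable[[Subject, Resource], Decision]


def blacklist_rule(s: Subject, r: Resource) -> Decision:
    # Blacklisted users cannot access anything.
    return Decision.DENY if s.blacklisted else Decision.NOT_APPLICABLE


def supervisor_rule(s: Subject, r: Resource) -> Decision:
    # Supervisors can access all workflows in their organization.
    if s.role == "supervisor" and s.organization == r.organization:
        return Decision.PERMIT
    return Decision.NOT_APPLICABLE


def student_rule(s: Subject, r: Resource) -> Decision:
    # Students can access only their own workflows.
    if s.role == "student" and s.user_id == r.owner:
        return Decision.PERMIT
    return Decision.NOT_APPLICABLE


POLICY_SET: Sequence[Rule] = (blacklist_rule, supervisor_rule, student_rule)


def pdp_decide(s: Subject, r: Resource, rules: Sequence[Rule] = POLICY_SET) -> Decision:
    """PDP: simplified deny-overrides (any Deny wins, then any Permit)."""
    decisions = [rule(s, r) for rule in rules]
    if Decision.DENY in decisions:
        return Decision.DENY
    if Decision.PERMIT in decisions:
        return Decision.PERMIT
    return Decision.NOT_APPLICABLE


def pep_enforce(s: Subject, r: Resource, fetch: Callable[[], T]) -> T:
    """PEP: return data/metadata only on an explicit Permit; anything else denies."""
    if pdp_decide(s, r) is Decision.PERMIT:
        return fetch()
    raise PermissionError(f"{s.user_id} may not access {r.lsid}")


if __name__ == "__main__":
    wf = Resource("urn:lsid:example.org:workflow:1", owner="alice", organization="uni")
    print(pdp_decide(Subject("carol", "supervisor", "uni"), wf))  # Decision.PERMIT
    print(pdp_decide(Subject("bob", "student", "uni"), wf))  # Decision.NOT_APPLICABLE
```

Here the PEP treats NotApplicable as a denial, so a student asking for someone else's workflow is refused even though no rule explicitly denies it. Full XACML also has an Indeterminate outcome and several other combining algorithms, which this sketch leaves out.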