Foundations VI: Provenance
Deborah McGuinness
TA: Weijing Chen
Semantic eScience, Week 9, October 31, 2011

References
• PML: McGuinness, Ding, Pinheiro da Silva, Chang. PML 2: A Modular Explanation Interlingua. AAAI 2007 Workshop on Explanation-aware Computing, Vancouver, Canada, July 2007. Stanford Tech Report KSL-07-07. http://www.ksl.stanford.edu/KSL_Abstracts/KSL-07-07.html
• Inference Web: McGuinness and Pinheiro da Silva. Explaining Answers from the Semantic Web: The Inference Web Approach. Web Semantics: Science, Services and Agents on the World Wide Web, Special Issue: International Semantic Web Conference 2003, edited by K. Sycara and J. Mylopoulos. Volume 1, Issue 4. Journal published Fall 2004. http://www.ksl.stanford.edu/KSL_Abstracts/KSL-04-03.html
• McGuinness, D.L.; Zeng, H.; Pinheiro da Silva, P.; Ding, L.; Narayanan, D.; Bhaowal, M. Investigations into Trust for Collaborative Information Repositories: A Wikipedia Case Study. Workshop on the Models of Trust for the Web (MTW'06), Edinburgh, Scotland, May 22, 2006. http://www.ksl.stanford.edu/KSL_Abstracts/KSL-06-05.html
• More from http://inference-web.org/wiki/Publications

Note: the LOGD converter generates PML.

Semantic Web Methodology and Technology Development Process
• Establish and improve a well-defined methodology vision for Semantic Technology based application development
• Leverage controlled vocabularies, etc.
[Figure: methodology cycle. Use Case; Small Team, mixed skills; Analysis; Develop model/ontology; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Science/Expert Review & Iteration; Use Tools; Evaluation; Open World: Evolve, Iterate, Redesign, Redeploy]

Ingest/pipelines: problem definition
• Data is coming in faster and in greater volumes, outstripping our ability to perform adequate quality control.
• Data is being used in new ways, and we frequently do not have sufficient information on what happened to the data along the processing stages to determine whether it is suitable for a use we did not envision.
• We often fail to capture, represent and propagate manually generated information that needs to go with the data flows.
• Each time we develop a new instrument, we develop a new data ingest procedure and collect different metadata, organized differently. The data is then hard to use with data from previous projects.
• The task of event determination and feature classification is onerous, and we don't do it until after we get the data.

20080602 Fox VSTO et al.

Use cases
• Who (person or program) added the comments to the science data file for the best vignetted, rectangular polarization brightness image from January 26, 2005, 18:49:09 UT taken by the ACOS Mark IV polarimeter?
• What were the cloud cover and atmospheric seeing conditions during the local morning of January 26, 2005 at MLSO?
• Find all good images on March 21, 2008.
• Why are the quick look images from March 21, 2008, 1900 UT missing?
• Why does this image look bad?
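Each of these use cases is a query over captured provenance. As a minimal sketch, assuming a hypothetical flat event log (the `ProvenanceEvent` fields, the agent names and the file identifier are invented for illustration and are not part of VSTO or PML), the first question reduces to filtering on the target file and the action:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ProvenanceEvent:
    """One captured processing step (hypothetical schema, not a VSTO or PML class)."""
    agent: str       # person or program that performed the action
    agent_kind: str  # "person" or "program"
    action: str      # e.g. "ingest", "add_comment"
    target: str      # identifier of the data file acted on
    time: datetime   # when the action happened (UT)


def who_did(log, target, action):
    """Return (agent, kind, time) for each event matching target and action, oldest first."""
    hits = [e for e in log if e.target == target and e.action == action]
    return [(e.agent, e.agent_kind, e.time) for e in sorted(hits, key=lambda e: e.time)]


# Hypothetical log for the Mark IV polarization brightness image of 2005-01-26 18:49:09 UT.
IMAGE = "mk4_pB_20050126_184909"
log = [
    ProvenanceEvent("ingest-pipeline", "program", "ingest", IMAGE, datetime(2005, 1, 26, 19, 5)),
    ProvenanceEvent("observer-on-duty", "person", "add_comment", IMAGE, datetime(2005, 1, 27, 8, 15)),
    ProvenanceEvent("qc-annotator", "program", "add_comment", IMAGE, datetime(2005, 1, 26, 19, 30)),
]

for agent, kind, when in who_did(log, IMAGE, "add_comment"):
    print(f"{when.isoformat()}  {kind:<7}  {agent}")
```

The query itself is trivial; what matters is the precondition. The question can only be answered if the pipeline recorded who acted on each file, which is exactly the kind of information the problem definition says is often not captured.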
Provenance
• Origin or source from which something comes; intention for use; who or what it was generated for; manner of manufacture; history of subsequent owners; sense of place and time of manufacture, production or discovery; all documented in sufficient detail to allow reproducibility
• Knowledge provenance: enrich with ontologies and ontology-aware tools

Semantic Technology Foundations
• PML (Proof Markup Language): used as the knowledge provenance interlingua
• Inference Web Toolkit: used to manipulate and access knowledge provenance
• OWL-DL ontologies (including the SWEET and VSTO ontologies)
References: McGuinness, Ding, Pinheiro da Silva, Chang, PML 2: A Modular Explanation Interlingua (AAAI 2007 Workshop on Explanation-aware Computing, Vancouver, Canada, July 2007; Stanford Tech Report KSL-07-07). McGuinness and Pinheiro da Silva, Explaining Answers from the Semantic Web: The Inference Web Approach (Web Semantics: Science, Services and Agents on the World Wide Web, Volume 1, Issue 4, Fall 2004).

Inference Web Explanation Architecture
[Figure: Inference Web explanation architecture. Systems on the WWW (SDS with OWL-S/BPEL: trace of web service discovery; Learners: learning conclusions; JTP/CWM with KIF/N3: theorem prover/rules; SPARK with SPARK-L: trace of task execution; UIMA: text analytics, trace of information extraction) produce Proof Markup Language (PML) covering justification, provenance and trust. The PML Toolkit comprises IWTrust (trust computation), IW Explainer/Abstractor (end-user friendly visualization), IWBrowser (expert friendly visualization), IWSearch (search engine based publishing) and IWBase (provenance registration).]
• Semantic Web based infrastructure
• PML is an explanation interlingua
– Represent knowledge provenance (who, where, when…)
– Represent justifications and workflow traces across system boundaries
• Inference Web provides a toolkit for data management and visualization

Global View and More
[Figure: Views of Explanation (in PML): global, focused, filtered, abstraction, discourse, trust, provenance]
• Explanation as a graph
• Customizable browser options
– Proof style
– Sentence format
– Lens magnitude
– Lens width
• More information
– Provenance metadata
– Source PML
– Proof statistics
– Variable bindings

Provenance View
• Source metadata: name, description, …
• Source-Usage metadata: which fragment of a source has been used, and when

Trust View
[Figure: Trust Tab with a detailed trust explanation; each fragment is colored by its trust value]
• (Preliminary) simple trust representation
• Provides a colored (mouseable) view based on trust values
• Enables sharing, collaborative computation and propagation of trust values

Discourse View
• (Limited) natural language interface
• Mixed initiative dialogue
• Exemplified in the CALO domain
• Explains the task execution component, powered by learned and human-generated procedures
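The global and provenance views treat an explanation as a graph whose source-backed conclusions carry source and source-usage metadata. The Python sketch below illustrates how a provenance view could be derived from such a graph. It is a simplification under stated assumptions, not the Inference Web toolkit API: the `Node` and `SourceUsage` classes are hypothetical stand-ins for PML node sets, inference steps and pmlp:SourceUsage, and the wine-pairing rule, the `#WineKB` source, its timestamp and the `ModusPonens` rule label are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SourceUsage:
    """Which source (and fragment) was accessed, and when (cf. pmlp:SourceUsage)."""
    source: str
    fragment: Optional[str]
    used_at: str  # xsd:dateTime lexical form


@dataclass
class Node:
    """Hypothetical explanation node: a conclusion plus the step that justifies it."""
    conclusion: str
    rule: str
    antecedents: List["Node"] = field(default_factory=list)
    usage: Optional[SourceUsage] = None  # set on conclusions asserted from a source


def provenance_view(node, seen=None):
    """Depth-first walk of the explanation graph collecting every SourceUsage behind `node`.

    `seen` holds ids of visited nodes, so an antecedent shared by several steps is reported once.
    """
    if seen is None:
        seen = set()
    if id(node) in seen:
        return []
    seen.add(id(node))
    usages = [node.usage] if node.usage is not None else []
    for antecedent in node.antecedents:
        usages.extend(provenance_view(antecedent, seen))
    return usages


# Illustrative explanation; "#WineKB" and the wine rule are invented for this sketch.
fact = Node("(type TonysSpecialty SHELLFISH)", "DirectAssertion",
            usage=SourceUsage("#ST", "tonys_fact.kif", "2005-10-17T10:30:00Z"))
rule = Node("(=> (type ?x SHELLFISH) (drinkWith ?x WhiteWine))", "DirectAssertion",
            usage=SourceUsage("#WineKB", None, "2005-10-17T10:31:00Z"))
answer = Node("(drinkWith TonysSpecialty WhiteWine)", "ModusPonens", [fact, rule])

for u in provenance_view(answer):
    print(u.source, u.fragment, u.used_at)
```

The `seen` set matters because an explanation is a graph, not necessarily a tree: a source reached along two justification paths is listed only once.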
Selected IW and PML Applications
• Portable proofs across reasoners: JTP (with temporal and context reasoners) (Stanford), CWM (W3C), SNARK (SRI), …
• Explaining web service composition and discovery (SNRC)
• Explaining information extraction (more emphasis on provenance: KANI, UIMA)
• Explaining intelligence analysts' tools (NIMD/KANI)
• Explaining task processing (SPARK/CALO)
• Explaining learned procedures (TAILOR, LAPDOG/CALO)
• Explaining privacy policy law validation (TAMI)
• Explaining decision making and machine learning (GILA)
• Explaining trust in social collaborative networks (TrustTab)
• Registered knowledge provenance: IW Registrar (Explainable Knowledge Aggregation)
• Explaining natural science provenance: VSTO, SPCDIS, …

PML1 vs. PML2
• PML1 was introduced in 2002
– It has been used in multiple contexts, ranging from explaining theorem provers to text analytics to machine learning.
– It was specified as a single ontology.
• PML2 improves PML1 by
– Adopting a modular design: splitting the original ontology into three pieces: provenance, justification, and trust
• This improves reusability, particularly for applications that only need certain explanation aspects, such as provenance or trust.
– Enhancing explanation vocabulary and structure
• Adding new concepts, e.g. Information
• Refining explanation structure

PML Provenance Ontology
• Scope: annotating provenance metadata
• Highlights
– Information
– Source Hierarchy
– Source Usage

Referencing, Encoding and Annotating a Piece of Information
• Referencing a piece of information: using a URI
• Encoding the content of information
– Complete quote: <hasRawString>(type TonysSpecialty SHELLFISH)</hasRawString>
– Obtained from URL: <hasURL>http://inferenceweb.org/ksl/registry/storage/documents/tonys_fact.kif</hasURL>
• Annotations
– For human consumption: <hasPrettyString>Tony's Specialty is ShellFish</hasPrettyString>
– For machine consumption
• Language: <hasLanguage rdf:resource="http://inference-web.org/registry/LG/KIF.owl#KIF" />
• Format: <hasFormat rdf:resource="http://inference-web.org//registry/FM/PDF.owl#PDF" />

Source Hierarchy
• Source is the container of information
• Our source hierarchy offers
– Many well-known sources, such as
• Sensor (e.g. geo-science)
• InferenceEngine (e.g. reasoner)
• WebService (e.g. workflow)
– Finer granularity of source than just document
• DocumentFragment (for text analytics)

Source Usage
• SourceUsage
– logs the action that accesses a source at a certain dateTime to retrieve information
– is part of PML1
• Example: source #ST was accessed at a certain date and time
<pmlp:SourceUsage rdf:about="#usage1">
  <pmlp:hasUsageDateTime>2005-10-17T10:30:00Z</pmlp:hasUsageDateTime>
  <pmlp:hasSource rdf:resource="#ST"/>
</pmlp:SourceUsage>

PML Justification Ontology
• Scope: annotating the justification process
• Highlights
– Template for question-answer/justification
– Four types of justification

Four Types of Justification
• Goal: conclusion without justification
• Assumption: conclusion assumed (using the Assumption rule) by an InferenceEngine; no antecedent
• Direct Assertion: conclusion directly asserted (using the DirectAssertion rule) by an InferenceEngine; no antecedent
• Regular: conclusion derived from antecedent conclusions

PML Trust Ontology
• Scope: annotating trust and belief assertions
• Highlights
– Extensible trust representation (users may plug in their own quantitative metrics using OWL class inheritance)
– Has been used to provide a trust tab filter for Wikipedia; see McGuinness, Zeng, Pinheiro da Silva, Ding, Narayanan, and Bhaowal. Investigations into Trust for Collaborative Information Repositories: A Wikipedia Case Study. WWW2006 Workshop on the Models of Trust for the Web (MTW'06), Edinburgh, Scotland, May 22, 2006.

[Screenshots: Quick look browse; Visual browse; Search and structured query (Search, Structured Query)]

Provenance within SemantAqua
[Figure: SemantAqua System Architecture (Virtuoso access)]

Provenance
• Preserves provenance in the Proof Markup Language (PML).
• Data source level provenance:
– The captured provenance data are used to support provenance-based queries.
• Reasoning level provenance:
– When a water source has been marked as polluted, users can access the supporting provenance data for the explanation, including the URLs of the source data, intermediate data, converted data, and regulatory data.

Visualization
[Screenshots: SemantAqua visualization, http://tinyurl.com/iswc-swqp]

Visualization
• Time series visualization:
– Presents data as a time series for users to explore and analyze.
[Figure: time series plot with annotations "Violation, measured value: 2032.8", "Violation, measured value: 971" and "Limit value: 400"]
http://was.tw.rpi.edu/swqp/trend/epaTrend.html?state=RI&county=1&site=http%3A%2F%2Ftw2.tw.rpi.edu%2Fzhengj3%2Fowl%2Fepa.owl%23facility-110009444869

Selected Results
• Provenance information encoded using semantic web technology supports transparency and trust.
• SemantAqua provides detailed provenance information:
– Original data, intermediate data, data source
• "What if" scenario:
– Users can apply a stricter regulation from another state to a local water source.
• Users may be interested only in certain sources and can use the interface to control queries.

Aim at providing at least as much provenance as SemantAqua.

Questions?