A Process Ontology for Cell Biology
Stuart Aitken
Artificial Intelligence Applications Institute
Centre for Intelligent Systems and their Applications

Outline
• Rapid Knowledge Formation (RKF) Project
– RKF Project goals and domain
– The Cyc knowledge-based system
– RKF Tools
• Process Ontology
– General approach
– Formalisation
– Example

Rapid Knowledge Formation
• The RKF project aims to develop tools which will allow domain experts to enter knowledge directly into the KBS.
• DARPA-funded, two teams:
– CYCORP
– SRI
• Organised around ‘Challenge Problems’
– Cell Biology

RKF
Aim: To enable biologists to construct an ontology/KB from a textbook source
[Figure: textbook source (Alberts et al, Essential Cell Biology, 1998), formalise, Ontology]

Rapid Knowledge Formation
Key techniques:
• The KBS has knowledge of the KA process
– Knowledge of salience
– Knowledge of the requirements of an adequate formalisation
• There is a dialogue between expert and system, which clarifies the concept being defined.

Rapid Knowledge Formation
Evaluation: After a period of tool development,
• trials are organised,
• both expert performance and KE performance are measured,
• and assessed independently.
The evaluation is extensive, running over a period of 2 weeks.

The Cyc KBS
• Cyc (Doug Lenat) is a knowledge-based system, under development since ~1984, aiming to represent common sense knowledge.
• Cyc uses a large upper-level ontology
• Uses a logical language based on first-order logic

The Cyc KBS
Concepts in the Upper Ontology:
– Thing, Agent, Event
– TangibleThing, InformationBearingObject
– …. Dog, Book
– subclass (genls), instance-of (isa)
– parts, subevent, role predicates
– 1600 concepts in total in the public release (1998), a small % of Cyc
Classification:
– Stuff-like vs Object-like
– Individual vs Set

The Cyc KBS
• The upper-ontology supports application development:
[Figure: Thing at the top, above the layers Upper-level, Intermediate-level, Application-level]

The Cyc KBS
Cyc includes:
• An inference engine,
• GUI,
• tools for ontology development.
• Until the RKF project, ontology development was done by trained knowledge engineers, working with domain experts.
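The subclass (genls) and instance-of (isa) relations listed under the Upper Ontology can be illustrated with a toy lookup. This is only a sketch in Python, not Cyc's API or its logical language. The placement of Dog and Book under particular concepts, the instance name Fido, and the single-parent hierarchy are all assumptions made for the illustration.

```python
# Toy illustration only: not Cyc's API, and not its actual ontology content.
# Each concept has one parent here for simplicity (an assumption).
GENLS = {
    "Dog": "TangibleThing",              # assumed placement
    "Book": "InformationBearingObject",  # assumed placement
    "TangibleThing": "Thing",
    "InformationBearingObject": "Thing",
    "Agent": "Thing",
    "Event": "Thing",
}
ISA = {"Fido": "Dog"}  # hypothetical instance


def superclasses(concept):
    """Return the concept followed by its ancestors via genls, nearest first."""
    chain = [concept]
    while chain[-1] in GENLS:
        chain.append(GENLS[chain[-1]])
    return chain


def isa(instance, concept):
    """True if the instance belongs to the concept directly or via genls."""
    return concept in superclasses(ISA[instance])


print(superclasses("Dog"))
print(isa("Fido", "Thing"))
```

The point of the sketch is that an application-level concept inherits from the upper level: asserting one isa fact is enough for membership in every superclass to follow.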
RKF
New tools in Cyc:
• Define a new concept, and place it correctly in the ontology
• Refine a concept definition
• Define a new predicate
• Assert a new fact
• Define a new rule
• State an analogy
• Construct a new process

RKF
User interaction:
• Selection of items in the interface
– Choice determined ‘intelligently’: the KBS has knowledge of salience and of the KA process; this knowledge must be authored
• Browsing of the ontology
• Search
• Natural language dialogue

Process Models
[Figure: RNA Transcription, BindsTogether, Move]

Process Descriptor
Q: Name the process
A: [ RNA Transcription ]
Q: Select the type of Process that describes the category best
• event localised
• creation or destruction event…
• ‘say this:’ [ _ _ _ _ _ _ ]
Q: Define:
• affected object: [ _ _ _ _ _ ]
• location: [ _ _ _ _ _ ]
• actor: [ _ _ _ _ _ ]

Process Models
Describing Processes:
• Complex expressions at the instance level
• Simpler to describe in terms of types
Upper-level: subevent(Event,Event), doneBy(Event,Agent)
Intermediate-level
Application-level
?
ForAll ?E ?F ?G
implies
(subevent(?E,?G) and isa(?E,BindsTogether) and
subevent(?F,?G) and isa(?F,Move))
before(startOf(?E),startOf(?F))

Script Vocabulary
The Script theory defines the semantics of Type-Level assertions
(typePlaysRoleInScene RNATranscription DNAMolecule BindsTogether objectActedOn)
• Requires rules for identity
– Can require complex reasoning
• Good for user input
• Can be extended to cover pre and postconditions of actions

Scripts
[Figure: RNA Transcription (t) with subevents BindsTogether (e) and Move (f), linked by startsAfterStartingOfInScript]
For all subevents f of t, of type Move, and all subevents e of t, of type BindsTogether, (startsAfterStartingOf f e), where t is of type RNATranscription

Scripts
Type playing role
[Figure: Types: Nucleotide, BindsTogether. Instance: N, e. Role: objectActedOn]
For some n in N, (objectActedOn e n)

New Script Vocabulary
• Pre and Post conditions
(preconditionOfScene-negated BindsTogether touchingDirectly <Ribonucleotide Nucleotide>)
(postconditionOfScene BindsTogether connectedTo <Ribonucleotide Nucleotide>)
[Figure: BindsTogether with N and R: not touchingDirectly N R, then connectedTo]

New Script Vocabulary
[Figure: Types: BindsTogether, Nucleotide, Ribonucleotide. Set of Instances: N and R, each linked by a role to e. Label: identity]
Precondition: Some ?n in N, some ?r in R (not (touchingDirectly ?n ?r))
Postcondition: Some ?n in N, some ?r in R (connectedTo ?n ?r)

Script Vocabulary
• The Script vocabulary forms an ‘intermediate level’, which lies behind the Process descriptor GUI (i.e. the textboxes)
• Not, in itself, a taxonomy of processes, but allows processes to be described in detail.
• Defining the subclass relation is just one task.

Vaccinia Virus Life Cycle
• The vaccinia virus life cycle was selected as an example of a complex model to formalise as a set of Scripts.
• The model includes actions, decomposition, ordering, objects-playing-roles and pre/postconditions
• It is a good test for the Script vocabulary

Vaccinia Virus Life Cycle
Temporal: mRNATranscription-Early, ViralGeneTranslation-Early, MovementOfProtein
Participants:
– mRNATranscription-Early: Outputs: messengerRNA
– ViralGeneTranslation-Early: Inputs: messengerRNA
– MovementOfProtein
Conditions:
– mRNATranscription-Early: Pre: spatiallySubsumes Cell VirusCore
– ViralGeneTranslation-Early
– MovementOfProtein: Post: spatiallySubsumes CellCytoplasm Vitf2

Evaluation
• 8 biologists were selected, and trained in the tools, 4 per team
• The knowledge to be formalised was selected (chapter 7 in Alberts)
• The knowledge base was allowed to contain ‘pump-priming’ knowledge
• The biologists entered knowledge, using the tools, then tested it against a set of questions
• Ontology/KB was revised

Evaluation Results (outline)
• A huge amount of data was collected, but analysis is complex (IET Inc)
• Domain experts were able to develop ontologies after ‘light’ training
• Knowledge engineers out-perform domain experts in ontology construction

Summary
‘Power Tools’ for ontology development are being implemented and tested in the RKF project.
• A Script/Process vocabulary has been developed and applied to processes in cell biology, covering:
– Temporal order
– Participants
– Pre/postconditions
– Repetition
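
The type-level ordering on the Scripts slide (every Move subevent of an RNATranscription starts after every BindsTogether subevent starts) can be read as shorthand for a set of instance-level facts. The Python sketch below expands one such type-level statement over a handful of event instances. It is an illustration, not the RKF implementation: the instance names (t1, e1, f1, f2), the data layout, and the helper functions are hypothetical.

```python
from itertools import product

# Hypothetical instance-level facts, invented for this illustration.
ISA = {"t1": "RNATranscription", "e1": "BindsTogether", "f1": "Move", "f2": "Move"}
SUBEVENT = {("e1", "t1"), ("f1", "t1"), ("f2", "t1")}  # (subevent, parent)

# Type-level ordering: (script type, later subevent type, earlier subevent type).
ORDER = ("RNATranscription", "Move", "BindsTogether")


def subevents_of_type(parent, event_type):
    """Subevents of `parent` whose type is `event_type`, in sorted order."""
    return sorted(s for (s, p) in SUBEVENT if p == parent and ISA[s] == event_type)


def expand_ordering(order):
    """Expand a type-level ordering into instance-level ordering facts."""
    script_type, later_type, earlier_type = order
    facts = []
    for t in sorted(ev for ev, ty in ISA.items() if ty == script_type):
        later = subevents_of_type(t, later_type)
        earlier = subevents_of_type(t, earlier_type)
        for f, e in product(later, earlier):
            facts.append(("startsAfterStartingOf", f, e))
    return facts


for fact in expand_ordering(ORDER):
    print(fact)
```

One type-level statement stands in for a fact about every pair of matching subevents, which is why describing processes in terms of types is simpler for user input than writing instance-level expressions directly.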