The DHS Ontology Case
Presentation by OntologyStream Inc
Paul Stephen Prueitt, PhD
19 March 2005
Ontology Tutorial 1, copyright, Paul S Prueitt 2005
Ontology technology exists in the market
However, practical problems block progress in designing and implementing DHS-wide ontology technology. Business process mythology, like Earned Value program management, does not focus on the right questions, and maturity models for software development have precluded most innovations that come from outside the relational database paradigm.
These practical problems are also partially a consequence of:
1. Specific institutional behaviors related to traditional program management.
2. Confusion caused by long-term, heavily funded Artificial Intelligence marketing activities.
As a general proposition, throughout the federal government, quality metrics are not guiding management decisions supporting:
1) Quick transitions from database-centered information technology to XML-based Semantic Web technology.
2) Transitions from XML repositories to ontology-mediated Total Information Awareness, with Informational Transparency, in Secure Channels.
[Diagram from Prueitt, 2003: a DHS Ontology connecting 1) world-wide trade data, 2) investigation targeting, 3) risks, threats and vulnerabilities, and 4) policy enforcement, set against emerging Semantic Web standards.]
[Diagram from Prueitt, 2003: the seven-step AIPM built over an RDBMS is not complete; the first two steps are missing.]
The measurement/instrumentation task
First two steps in the AIPM
Measurement is part of the “semantic extraction” task, and is accomplished with a known set of techniques:
• Latent semantic technologies
• Some sort of n-gram measurement, with encoding into hash tables or an internal ontology representation (CCM and NdCore, perhaps AeroText and Convera’s process ontology (?), Orbs, Hilbert encoding, CoreTalk/Cubicon); a generic sketch follows this list
• Stochastic and neural/genetic architectures
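The n-gram bullet above can be made concrete with a few lines of Python. This is a minimal sketch of generic character n-gram measurement encoded into an ordinary hash table (a dict); it stands in for, and is not, the CCM, NdCore, Orb, or Hilbert encodings named on this slide.

from collections import defaultdict

def ngram_counts(text, n=3):
    """Measure character n-grams in a text and encode the counts in a hash table (dict)."""
    counts = defaultdict(int)
    for token in text.lower().split():
        padded = f"_{token}_"                 # mark word boundaries
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return dict(counts)

# Example: measure a short free-text finding
print(ngram_counts("unusual shipment routing", n=3))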
One model for semantic extraction explicitly focuses on the first two aspects of the AIPM, i.e. instrumentation/measurement and data-encoding/interpretation.
The Actionable Intelligence Process Model has an action-perception event cycle. Stratified ontology supports the use of this cycle to produce knowledge of attack and anticipatory mechanisms based on the measurement of substructural categorical invariance.
Workflow and process ontology is available as a basis for encoding knowledge of anticipatory response mechanisms.
Categorical invariance is measured, using for example Orbs (Ontology referential bases) or CCM (Contiguous Connection Model) encoding, and organized as a resource for RDF triples using a lightweight OWL/OIL encoding.
Distributed ontology management is already available in some military activities: the distributed Ontology Management Architecture (d-OMA).
Ontology Architecture, Version 3.0, March 10, 2005
Part of a series on the nature of machine encoding of sets of concepts
[Diagram: benchmark and transition views of the d-OMA. Higher-level ontologies sit above localized ontologies held within a DAML-type agency structure of communities, with links to Web services and internal R&D.]
Ontology users have the following roles:
• Knowledge engineers
• Information mediators (e.g. ontology librarians)
• Community workers (e.g. analysts, such as an intelligence targeting analyst)
A reconciliation process is required between ontology services.
Real-time acquisition of new concepts, and modifications to existing concepts, are made via a piece of software called an Ontology Use Agent. Ontology Use Agents have various types of functions, expressed in accordance with roles, drawing on mature scholarship from evolutionary psychology research communities.
[Diagram: higher-level ontologies and localized ontologies within communities.]
Human and system interaction with a common Ontology Structure
Ontology Presentation
Part of a series on the nature of machine encoding
of sets of concepts
First principles
First, and before all else, a computer-based ontology is a set of concepts:
{ concepts }
In the natural, physical sciences, an ontology consists of the causes of those things that one can interact with in Nature. Physical science informs us that a formative process is involved in the expression of natural ontology in natural settings.
The set of “our” personal and private concepts is often thought to be the cause of how we reason as humans. This metaphor is operational in many people’s understanding of the nature and usefulness of machine-encoded ontology. But this metaphor can also mislead us.
Extensive literatures indicate that the Artificial Intelligence (AI) mythology has led many to believe that the “reasoning” of an ontology might be the same as the reasoning of a human in all cases.
• This inference is not proper, because its truthfulness has not been demonstrated by natural science, and perhaps cannot be demonstrated no matter what the level of government funding for AI.
• AI is discounted in Tim Berners-Lee’s notion of the Semantic Web.
Tim Berners-Lee’s Notion of the Semantic Web
The point being made here is that the notion of “inference” is very different depending on whether one is talking about the human side or the machine side of the Semantic Web.
One consequence of acknowledging this difference is to elevate the work of the authors of the OASIS standards, in particular Topic Maps. In Topic Maps we have an open world assumption and very little emphasis on computational inference. Human knowledge is represented in a “shallow” form, and visualization is used to manage this representation.
Computation with Topic Maps and OWL ontologies works together with XML repositories.
First principles
{ concepts }
Let us use only set theory to consider Tim Berners-Lee’s notion of the Semantic Web.
Let C = { concepts } and let B be a subset of C.
Some software implements the subsetting function. The subsetting function might be an “ask-tell” interaction between two ontologies, in which human and/or machine computation creates a well-formed query.
B is a subset of C.
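A minimal set-theoretic sketch of the subsetting function, in Python. The concept set C, the predicate, and the names used here are illustrative only; in practice the “ask-tell” interaction would be carried by ontology software rather than a simple predicate.

def subset_ontology(concepts, predicate):
    """Subsetting function: return the sub-ontology B of all concepts in C
    that satisfy a well-formed query, modeled here as a predicate."""
    return {c for c in concepts if predicate(c)}

C = {"shipment", "manifest", "port", "risk", "vulnerability", "policy"}
# A query produced by a human analyst or by an ask-tell exchange between ontologies
B = subset_ontology(C, lambda concept: concept in {"risk", "vulnerability"})
assert B.issubset(C)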
First principles
{ concepts }
At this point we have various possible consequences:
1) The small(er) B ontology might simply be viewed by a human, and actions taken outside of the information system.
2) The smaller ontology might be used in several different ways:
a) Imported into a reasoner, to be considered in conjunction with various different data sources.
b) Used to send messages to other ontologies via a distributed Ontology Management System.
The subsetting function has produced B, a subset of C.
First principles
Situational Ontology Software
The knowledge repository acts as a “perceptual ground” in a “figure-ground” relationship. The ontology sub-setting function has pulled part, but not all, of the background into a situational focus. This first principle is consistent with perceptual physics and thus is “informed” by natural science.
(The following slides are from OntologyStream’s Ontology Presentation VII – General Background)
Extending Ontology Structure over
legacy information systems
Part of a series on the nature of machine encoding of sets of concepts
Presentation Contents
Functional specs
Ontology Use Start-up Use Model
Model: Steady State Ontology System
Components: Framework for Query Entity
Data Access: Steady State Ontology System
Framework for Query Entity
Building ontology from data flow
Using Ontology
Ontology Generating Framework
The inverse problem: generating synthetic data
Finding data regularity in its natural contextualization
Functional specs
Functional specs:
1. Human-centric: must be human (individual) centric in design and function
2. Support data retrieval: must act as a data retrieval mechanism
3. Event structure measurement: must assist in the definition of data acquisition requirements on an on-going basis
4. Interactive: must support multiple interacting ontologies
5. Real time: must aid in real-time problem solving and in the long-term management of specific sets of concepts
Note: Ontology-mediated knowledge systems have operational properties that are quite different from traditional relational database information systems. These five functional specs have been reviewed by a small community of professional ontologists and deemed correct for knowledge systems.
Ontology Use Start-up Use Model I
[Diagram: transaction process, entity updates, inferences about the Knowledge Base, query entities, and one of the Ontology Reasoners.]
Start-up Use Case – Step 1
At start-up, resources are loaded into the reasoner:
1) Some part of the knowledge base is expressed as reasoner-compliant RDF
2) Some inference logic is loaded into the reasoner
3) Query entities are loaded to reflect the interests of analyst(s)
4) Transaction processes are occurring in real time
(A minimal sketch of steps 1 and 3 follows.)
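A minimal sketch of start-up steps 1 and 3, assuming the rdflib Python library; the file name, namespace, and SPARQL query are placeholders, not part of the architecture described on this slide.

from rdflib import Graph

# Step 1: load the part of the knowledge base expressed as reasoner-compliant RDF
kb = Graph()
kb.parse("knowledge_base.rdf")            # placeholder file name

# Step 3: a query entity loaded to reflect analyst interests, here as a SPARQL query
analyst_interest = """
    SELECT ?shipment WHERE {
        ?shipment a <http://example.org/dhs#HighRiskShipment> .
    }
"""
for row in kb.query(analyst_interest):
    print(row.shipment)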
Start-up Use Model II
[Diagram: transaction process, entity updates, inferences, Knowledge Base, query entities, Reasoner, and instance data.]
Start-up Use Case – Step 2
Since instance data is much larger (two or more orders of magnitude larger) than the knowledge base, the instance data is managed in a separate start-up process.
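A sketch of that separate start-up process, again assuming rdflib; file names and formats are placeholders. The two graphs are loaded independently and combined only when the reasoner needs both.

from rdflib import Graph

kb = Graph()
kb.parse("knowledge_base.rdf")                    # the (small) set of concepts

instance_data = Graph()
instance_data.parse("instances.nt", format="nt")  # two or more orders of magnitude larger

# Combine only when reasoning over both is required; rdflib's "+" yields a new graph
combined = kb + instance_data
print(len(kb), len(instance_data), len(combined))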
Model: Steady State Ontology System
[Diagram: transaction process and entity updates feed inferences; the Inference Manager, Query Manager, Data Access Manager, Ontology Manager, and Reasoner operate over the Knowledge Base, query entities, and instance data.]
Instance data may be remote or local.
Local data is on the same network as the
knowledge base.
Components: Framework, User Visualization point of view
[Diagram: transaction processes, updates, and inferences flow through the Inference Manager, Query Manager, Data Access Manager, Ontology Manager, and Reasoner, over data, query entities, and the Knowledge Base.]
Use Case: Steady State Ontology System
Data Access: Steady State Ontology System
using the OWL standard **
The RDF knowledge base ** is a set of concepts expressed as a set of triples:
{ < subject, predicate, object > }
and the data is either XML or a data structure such as one would have as a C construct.
The Data Access Manager must manage the mapping between local data stores (sometimes having millions of elements) and the set of concepts. The remote data may have many persistence forms, and will be accessed via a data object.
[Diagram: instance data flows through pipes and threads to a data object, the Data Access Manager, and the Knowledge Base.]
** We use RDF and OWL as standards to create minimal and well-formed knowledge inference capabilities.
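A sketch of the Data Access Manager’s mapping task: one local record (the kind of structure one would have as a C construct) is mapped onto the set of concepts as < subject, predicate, object > triples. It assumes rdflib; the namespace and column names are hypothetical.

from rdflib import Graph, Literal, Namespace

DHS = Namespace("http://example.org/dhs#")        # hypothetical namespace

def record_to_triples(graph, record_id, record):
    """Map one local data record onto the knowledge base as
    <subject, predicate, object> triples."""
    subject = DHS[f"record-{record_id}"]
    for column, value in record.items():
        if value is not None:                     # empty slots are simply skipped
            graph.add((subject, DHS[column], Literal(value)))

kb = Graph()
record_to_triples(kb, "000123", {"origin": "Sweden", "port": "Seattle", "weight_kg": 1200})
print(kb.serialize(format="turtle"))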
Framework from Query Entity point of view
[Diagram: transaction processes, updates, and inferences flow through the Inference Manager, Query Manager, Data Access Manager, Ontology Manager, and Reasoner, over data, query entities, and the Knowledge Base.]
A Query Entity is itself a type of light ontology. It develops “knowledge” about the user(s) and about the query process.
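A sketch of a Query Entity as a light ontology that accumulates “knowledge” about its user and the query process. The class and field names are illustrative, not a specification of the Framework.

from dataclasses import dataclass, field

@dataclass
class QueryEntity:
    """A light ontology that develops knowledge about its user and the query process."""
    analyst: str
    interests: set = field(default_factory=set)    # concepts the analyst has touched
    history: list = field(default_factory=list)    # queries issued so far

    def record_query(self, query, concepts_touched):
        self.history.append(query)
        self.interests.update(concepts_touched)

qe = QueryEntity(analyst="port-of-seattle-01")
qe.record_query("shipments from Sweden", {"shipment", "origin-country"})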
Framework from Knowledge Management point of view
[Diagram: transaction processes and the Query Manager feed query entities.]
Real-time analysis is supported through the development and use of query entities. These entities have “regular” structure and are managed within a Framework.
Building ontology from data flow
A model of the “causes” of transaction data: the model is based on, and grounded in, the concept of “occurrence in the real world”, or “event”. Associated with each event, we may have a “measurement”.
So we have a set of events
{ e(i) }
where i is a counter.
Objective: We convert a stream of event measurements into a “transaction” ontology, and create auxiliary processes that will use a general risk ontology, an ontology about process optimization, and other “utility ontologies”.
Some of the fields MAY not be used. Later, the number of fields in any “findings data flow” may increase or decrease without us caring at all.
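A sketch of that last point: ingestion of the findings data flow that tolerates unused fields and a changing number of fields. Field names are hypothetical.

def ingest_event(raw_record):
    """Accept one event measurement from the findings data flow.
    Fields may be missing, and the number of fields can grow or shrink
    over time without this code caring."""
    return {column: value for column, value in raw_record.items() if value not in (None, "")}

events = [ingest_event(r) for r in (
    {"port": "Seattle", "origin": "Sweden", "comment": "unusual routing"},
    {"port": "LA", "origin": "Sweden"},               # fewer fields, still fine
)]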
Using Ontology
[Diagram: transaction processes, Knowledge Base, Query Manager, data, query entities, and instance data.]
Consider a set of events { e(i) } where i is a counter. Each event will have a weakly structured component w (free-form text) and a structured component s. So we use the notation
e = w/s, or
e(i) = w(i)/s(i).
Ontology Generating Framework
Notation: e(i) = w(i)/s(i). Semantic extraction is applied to the weakly structured set { w(i) }; discrete analysis is applied to the structured set { s(i) }.
An event is measured by filling in slots in a data entry form, and by typing natural language into comments fields in these entry forms.
Semantic extraction is performed using one of several tools, or tools in combination with each other.
Observation: Given real data, one can categorize the set of events according to the nature of the information filled in.
Discrete analysis is mostly the manual development of ontology through the study of natural categories in how the data is understood by humans.
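A sketch of the e(i) = w(i)/s(i) split and of categorizing events by what has been filled in, in Python. The choice of which fields count as free text is an assumption made for illustration.

from collections import defaultdict

def split_event(record, free_text_fields=("comment", "narrative")):
    """Split one event e(i) into its weakly structured part w(i)
    (concatenated free text) and its structured part s(i) (the remaining slots)."""
    w = " ".join(str(record[f]) for f in free_text_fields if record.get(f))
    s = {k: v for k, v in record.items() if k not in free_text_fields}
    return w, s

def categorize_by_filled_slots(events):
    """Discrete-analysis sketch: group events by which structured slots are filled."""
    categories = defaultdict(list)
    for e in events:
        _, s = split_event(e)
        signature = tuple(sorted(k for k, v in s.items() if v))
        categories[signature].append(e)
    return categories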
Finding data regularity in its natural contextualization
For each event we may have zero or more free-text fields. Suppose we concatenate these into one text unit, and perhaps also develop some metadata (in some way) that will help contextualize the semantic extraction process. We label this unit “ w(i) ”. Free-form text is weakly structured.
The set { w(i) } is a text corpus that we would like to “associate”, via semantic extraction, with several ontologies. Each association is made as an exploratory activity with specific goals.
Regularity in data flow is “caused” by the events occurring in the external world. Thus the instances of specific data in data records provide to the knowledge system a “measurement” of the events in the external world.
Each s(i) is a record from a single table. Suppose there are 120 columns. Each column has values, sometimes empty. Fix the counter at *. Let s(*, j), j = 1, . . . , 120 be the columns. We can also call these columns “slots”.
Discrete analysis: for each s(*, j), list the values that are observed to be in that column. These values are the possible “fillers” for the associated slot.
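A sketch of the slot-filler enumeration just described: for each column s(*, j), collect the values observed in that column. Column names and values are illustrative.

from collections import defaultdict

def slot_fillers(structured_records):
    """For each column s(*, j), collect the values observed in that column.
    These observed values are the possible fillers for the associated slot."""
    fillers = defaultdict(set)
    for record in structured_records:
        for column, value in record.items():
            if value not in (None, ""):
                fillers[column].add(value)
    return fillers

fillers = slot_fillers([{"origin": "Sweden", "port": "Seattle"},
                        {"origin": "Sweden", "port": "LA"}])
# fillers["origin"] == {"Sweden"}; fillers["port"] == {"Seattle", "LA"}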
Data regularity in context … patterns and invariance
• XML and related standards
• Community open and protected standards (CoreTalk, Rosetta Net, ebXML)
• .NET component management
• J2EE frameworks – Spring
• General framework constructions
• Autonomous extraction (Hilbert encoding of data into keyless hash tables; SLIP: Shallow Link analysis with Iterated Scatter-Gather and Parcelation); NdCore, AeroText (etc.) semantic extraction using process knowledge and text input
The role of community
1) A community of practice provides a reification process that is human-centric (geographical-community / functional-community).
2) Each community may have a locally instantiated OWL ontology with topic map visualization.
a) Consistency and completeness are managed locally, as a property of the human individuals acting within a community and of a locally persisted history of informational transactions with his/her agent.
b) Individual agents can query for and acquire XML-based Web Services, negotiate with other agents, and create reconciliation processes involving designated agencies.
3) Knowledge engineers act on behalf of policy makers to:
• reify new concepts,
• delete concepts, and
• instantiate reconciliation containers.
Establishing coherence in “natural” knowledge representation
1) Coherence is a physical phenomenon seen in lasers.
a) Brain function depends critically on electromagnetic coherence.
b) Incoherence (e.g. non-rationality) and incompleteness are two separate issues, contrasting with the issue of coherence.
c) Mathematics, and therefore computer science and logic, have completeness and consistency issues that are well established and well studied.
2) Logical coherence is sometimes treated as consistency in logic.
a) One may think that one has logical consistency, and yet this property of the full system was lost at the last transaction within the ontology.
b) The act of attempting to find a complete representation of information organization is sometimes called “reification”, and reification efforts work against efforts to retain consistency.
3) Human usability is often a function of a proper balance between logic and agility.
Understanding multiple viewpoints
1) Logical consistency and single-mindedness are operationally linked together in most current generation decision support systems. Database schema legacy issues: schema servers, FEA (Federal Enterprise Architecture) standards, schema-independent data systems.
2) Observation: Human cognitive capabilities have far more agility than current generation decision support systems.
3) The topic map standard (2001, Newcomb and Biezunski) was specifically developed to address the non-agility of Semantic Web standards based on OWL and RDF. (Ontopia, Steve Pepper)
4) Combining XML repositories, OWL, distributed agent architectures, and Topic Maps is expressed as Stratified Ontology Management.
Detection of novelty
Scenario: a targeting and search analyst at the Port of Seattle is only partially aware of why she feels uncomfortable about some characteristic of a shipment from Sweden. The feeling is expressed in a hand-written finding and fed into a document management repository for findings. A targeting and search analyst at the Port of LA expresses a fact about a similar shipment without knowing of her colleague’s sense of discomfort.
1) Conceptual roll-up techniques are used on a minute-by-minute basis to create a viewable topic map over occurrences of concepts expressed in findings.
2) Link analysis connects an alert about uncertainty in the Seattle finding to the fact from LA, producing new information related to a known vulnerability and attack pattern.
3) New knowledge forms are propagated into OWL instantiated ontology and rules, and viewed using Topic Maps.
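A sketch of the conceptual roll-up in step 1: counting concept co-occurrence across findings as raw material for a viewable topic map. The findings and concept labels are invented for illustration; real link analysis would do considerably more.

from collections import defaultdict
from itertools import combinations

def concept_rollup(findings):
    """Minute-by-minute roll-up sketch: count how often pairs of concepts
    co-occur across findings, as raw material for a viewable topic map."""
    cooccurrence = defaultdict(int)
    for finding in findings:
        for a, b in combinations(sorted(finding["concepts"]), 2):
            cooccurrence[(a, b)] += 1
    return cooccurrence

findings = [
    {"port": "Seattle", "concepts": {"Sweden", "shipment", "uncertainty"}},
    {"port": "LA",      "concepts": {"Sweden", "shipment"}},
]
# The pair ("Sweden", "shipment") appears in both findings, linking the two ports
print(concept_rollup(findings))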
Agent architecture
Scenario: Human analysts provide situational awareness via tacit knowledge, personal agent-mediated interactions with agent structures, and human-to-human communications. A model of threats and vulnerabilities has evolved but does not account for a specific new strategy being developed by a smart smuggler. The smuggler games the current practices in order to bring illegal elements into the United States.
1) The model of threats and vulnerabilities is expressed as a reification process from various techniques, and encoded as an OWL/Protégé ontology with rules.
2) Global Ontology: The model is maintained via near real-time agency processes under the observation, and active review, of knowledge engineers and program managers working with knowledge of policy and event structure.
3) Local Ontology: Information is propagated to individual analysts via alerts and ontology management services controlled by the localized agent (of the person).
New (1/30/2005) tutorial on automated
extraction of ontology from free form text:
http://www.bcngroup.org/beadgames/anticipatoryWeb/23.htm