Modeling Symbolic Concepts and Statistical Data in a Unified Knowledge
Representation Using Manifold Learning
Project Proposal
Ian Perera
Summary
Project Objectives
The high-level goal of this project is to create a general-purpose
knowledge representation system for combining numerical data with
conceptual representations of that data in such a way as to allow for
inference methods combining symbolic and statistical data.
Intellectual Merit
Both symbolic and statistical approaches to artificial intelligence have
unique limitations. Symbolic AI can encode general knowledge that people
use in interpreting data, but requires a strict representation that limits the
kinds of data that can be added to the knowledge base. Statistical AI can
more easily handle a wide variety of data, but is unable to assign meaningful
structure or domain-specific information to such data. Combining the
strengths of these two approaches could lead to advances in both fields.
Impacts on the Field
Any field dealing with machine learning or artificial intelligence stands
to gain from a unified knowledge representation. In computer vision, object
identification can use symbolic knowledge to combine the outputs of
individual feature detectors and identify an object. For example, the
system could know that
pencils are usually yellow and have a pink end on one side and a point on
the other. Even if the system has not seen a pencil before, the combination
of various detectors would provide characteristics that match those expected
of a pencil.
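The pencil example can be made concrete with a minimal sketch. It assumes
that each detector reports a confidence between 0 and 1 and that the
knowledge base lists the properties expected of each concept. The property
names, the KNOWLEDGE_BASE dictionary, and the averaging rule are
illustrative assumptions, not part of the proposed system.

```python
# Hypothetical symbolic knowledge: the properties each concept is expected to have.
# Concept and property names are illustrative only.
KNOWLEDGE_BASE = {
    "pencil": {"yellow_body", "pink_end", "pointed_end"},
    "banana": {"yellow_body", "curved_shape"},
}


def match_score(expected, detections):
    """Mean detector confidence over the properties a concept is expected to have."""
    return sum(detections.get(prop, 0.0) for prop in expected) / len(expected)


def identify(detections, knowledge_base=KNOWLEDGE_BASE):
    """Return (best_concept, score) for a dict of detector confidences in [0, 1]."""
    scores = {
        concept: match_score(props, detections)
        for concept, props in knowledge_base.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Assumed detector outputs for an object the system has not seen before.
observed = {
    "yellow_body": 0.9,
    "pink_end": 0.8,
    "pointed_end": 0.7,
    "curved_shape": 0.1,
}
concept, score = identify(observed)
```

Here no single detector recognizes a pencil. The label comes only from
comparing the combined detector outputs with the symbolic description.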
The benefits extend to the symbolic side as well – knowledge base
systems can use evidence from the world obtained through various sensors
to increase their knowledge base. If a knowledge system sees an object that
has been identified as a pencil, then it can store the other properties it
detects in a symbolic representation (for example, that pencils have a
pointed end).
Description
Background
Research in the fields of symbolic and statistical AI has progressed
mostly independently in the past decades. However, one common thread
that has appeared in the literature on both sides is the notion of distance
metrics.
Symbolic natural language processing has delved into numerical methods for
hierarchical ontologies such as WordNet to judge the semantic similarity of
words, while statistical AI uses a variety of distance metrics to classify
samples using a nearest neighbor approach in a vector space.
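The two notions of distance can be placed side by side in a short sketch.
The is-a hierarchy below is a toy stand-in for an ontology such as WordNet,
and the feature vectors are invented for illustration. Counting edges is
only one of several possible semantic similarity measures and is used here
as an assumption.

```python
import math

# Toy is-a hierarchy (child -> parent); a hypothetical stand-in for an ontology.
PARENT = {
    "pencil": "writing_tool",
    "pen": "writing_tool",
    "writing_tool": "artifact",
    "hammer": "hand_tool",
    "hand_tool": "artifact",
    "artifact": None,
}


def ancestors(node):
    """Return the chain from node up to the root, with node first."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def path_distance(a, b):
    """Number of is-a edges between a and b through their lowest common ancestor."""
    depth_from_b = {name: i for i, name in enumerate(ancestors(b))}
    for i, name in enumerate(ancestors(a)):
        if name in depth_from_b:
            return i + depth_from_b[name]
    raise ValueError("no common ancestor")


def nearest_neighbor(query, labeled_points):
    """Label of the stored vector closest to query under Euclidean distance."""
    return min(labeled_points, key=lambda item: math.dist(query, item[1]))[0]


# Symbolic side: distance between words in the hierarchy.
symbolic_near = path_distance("pencil", "pen")
symbolic_far = path_distance("pencil", "hammer")

# Statistical side: invented 2-D feature vectors with labels.
samples = [("pencil", (0.9, 0.1)), ("hammer", (0.1, 0.8))]
predicted = nearest_neighbor((0.8, 0.2), samples)
```

In both cases a smaller number means "more alike", which is the shared
notion the proposal builds on.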
This unified concept of distance suggests a manifold learning approach
to conceptual modeling. Given a high-dimensional vector representing
various features of a dataset, manifold learning generates a
lower-dimensional arrangement of the data points that attempts to preserve
the original distances between the points as much as possible (known as
reducing the stress). Statistical inference performed in this
lower-dimensional space can be more reliable and more general, and its
results can be assigned a symbolic representation.
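As a rough illustration of what stress measures, the sketch below compares
two candidate low-dimensional arrangements of the same points. It assumes
the raw (unnormalized) form of stress, a sum of squared differences between
pairwise distances. The example points and both arrangements are invented,
and the code only evaluates stress; it does not perform the optimization a
manifold learner would.

```python
import math
from itertools import combinations


def stress(high, low):
    """Raw stress: sum of squared differences between pairwise distances
    in the original space and in the low-dimensional arrangement."""
    total = 0.0
    for i, j in combinations(range(len(high)), 2):
        d_high = math.dist(high[i], high[j])
        d_low = math.dist(low[i], low[j])
        total += (d_high - d_low) ** 2
    return total


# Three-dimensional points that happen to lie along a straight line.
points_3d = [(0.0, 0.0, 0.0), (1.0, 2.0, 2.0), (2.0, 4.0, 4.0), (3.0, 6.0, 6.0)]

# Two candidate 1-D arrangements (hypothetical outputs of a manifold learner).
good_1d = [(0.0,), (3.0,), (6.0,), (9.0,)]  # keeps every pairwise distance
poor_1d = [(0.0,), (1.0,), (2.0,), (3.0,)]  # compresses every distance

stress_good = stress(points_3d, good_1d)
stress_poor = stress(points_3d, poor_1d)
```

The first arrangement preserves every distance and so has zero stress,
while the second distorts them and is penalized accordingly.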
Related Work
The motivation to represent semantic concepts in a vector space is
outlined in Conceptual Spaces: The Geometry of Thought by Peter
Gärdenfors. Gärdenfors suggests modeling semantic knowledge through the
intersection of various vector spaces for different features, such as color,
shape, texture, etc. However, he does not outline a working system for
implementing such a space and leaves open the questions of how to deal
with high-dimensional data, what kind of manifold learning system to use,
and how to perform inference on the data.
Others have used symbolic data for statistical classification in
computer vision, but they have not implemented a unified system that
applies to a wide variety of fields.
Manifold learning and other dimensionality reduction techniques have
been used in various fields, yet few, if any, systems have implemented
constraints from a symbolic concept representation on dimensionality
reduction or vice versa.
Overview
This system will be composed of three core components that interact
with each other:
1. A manifold learning algorithm that allows for incremental updates
without recomputing the entire vector space, and allows for
constraints between manifolds in the same space (though perhaps of
different dimensionality)
2. A system for mapping of semantic relationships (inheritance,
properties, etc.) to vector space patterns and distances
3. An inference system that incorporates both spatial arrangement and
symbolic links to generate a model of the data.
One possible manifold learning algorithm that would fit this system is the
elastic map algorithm, where the manifold is found by reducing the stress of
a network of springs representing the manifold in a low-dimensional space.
The springs connect actual or interpolated data points and together form the
structure of the manifold. Since new springs can be added to the manifold
(representing new data points), the manifold of a particular concept can be
updated incrementally. Furthermore, semantic relationships themselves can be
represented as springs whose stiffness is proportional to the confidence in
the relationship. Various mappings of semantic relationships would be explored
to find those that are suited to a vector space representation.
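The spring idea can be sketched as follows. This is a simplified spring
relaxation by gradient descent, not the full elastic map algorithm, and
every name in it (SpringNetwork, the node labels, the rest lengths, the
step size) is a hypothetical choice made for illustration. It shows two
properties described above: a new point can be added without rebuilding
the network, and a semantic link can enter as a spring whose stiffness
equals an assumed confidence value.

```python
import math


class SpringNetwork:
    """Points in 2-D joined by springs; relax() lowers the total spring energy."""

    def __init__(self):
        self.pos = {}      # node name -> [x, y]
        self.springs = []  # (node_a, node_b, rest_length, stiffness)

    def add_node(self, name, xy):
        self.pos[name] = list(xy)

    def add_spring(self, a, b, rest_length, stiffness=1.0):
        self.springs.append((a, b, rest_length, stiffness))

    def energy(self):
        """Sum of 0.5 * stiffness * (length - rest_length)^2 over all springs."""
        total = 0.0
        for a, b, rest, k in self.springs:
            total += 0.5 * k * (math.dist(self.pos[a], self.pos[b]) - rest) ** 2
        return total

    def relax(self, steps=500, rate=0.05):
        """Plain gradient descent on the spring energy."""
        for _ in range(steps):
            grad = {name: [0.0, 0.0] for name in self.pos}
            for a, b, rest, k in self.springs:
                d = math.dist(self.pos[a], self.pos[b])
                if d == 0.0:
                    continue  # direction undefined for coincident points
                scale = k * (d - rest) / d
                for axis in range(2):
                    diff = self.pos[a][axis] - self.pos[b][axis]
                    grad[a][axis] += scale * diff
                    grad[b][axis] -= scale * diff
            for name in self.pos:
                for axis in range(2):
                    self.pos[name][axis] -= rate * grad[name][axis]


net = SpringNetwork()
net.add_node("pencil_1", (0.0, 0.0))
net.add_node("pencil_2", (3.0, 0.0))
net.add_spring("pencil_1", "pencil_2", rest_length=1.0)
net.relax()

# Incremental update: a new data point joins the existing arrangement.
net.add_node("pencil_3", (1.5, 2.0))
net.add_spring("pencil_2", "pencil_3", rest_length=1.0)
# A semantic link held with confidence 0.9, modeled as a slightly softer spring.
net.add_spring("pencil_1", "pencil_3", rest_length=1.0, stiffness=0.9)
energy_before = net.energy()
net.relax()
energy_after = net.energy()
```

The second call to relax() starts from the positions already found, so
earlier work is reused rather than recomputed.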
Schedule
1. Implement a manifold learning system with the desired algorithm
2. Provide an interface for supplying arbitrary data
3. Formulate an analogue of semantic representations for the manifold
learning system
4. Provide an interface for combining data with semantic information
5. Implement categorization that is conditional on both symbolic and
statistical data
6. Implement inference that is conditional on both symbolic and statistical
data
7. Develop a demonstration of the system in a computer vision
application
8. Research data mining techniques that could be used to increase the
size and quality of the knowledge base
Investigator Roles
The required expertise for this project is as follows:
1. A machine learning expert with experience in manifold learning
2. An expert in symbolic knowledge representations, particularly with
semantic networks or similar representations
3. An expert in knowledge extraction from various sources, including the
web
Sources
Gärdenfors, Peter. Conceptual Spaces: The Geometry of Thought. Cambridge,
MA: MIT Press, 2000.