Modeling Symbolic Concepts and Statistical Data in a Unified Knowledge Representation Using Manifold Learning

Project Proposal
Ian Perera

Summary

Project Objectives
The high-level goal of this project is to create a general-purpose knowledge representation system that combines numerical data with conceptual representations of that data, so that inference methods can draw on both symbolic and statistical information.

Intellectual Merit
Both symbolic and statistical approaches to artificial intelligence have unique limitations. Symbolic AI can encode the general knowledge that people use in interpreting data, but requires a strict representation that limits the kinds of data that can be added to the knowledge base. Statistical AI can more easily handle a wide variety of data, but is unable to assign meaningful structure or domain-specific information to such data. Combining the strengths of these two approaches could lead to advances in both fields.

Impacts on the Field
Any field dealing with machine learning or artificial intelligence stands to gain from a unified knowledge representation. In computer vision, for example, object identification can use symbolic knowledge to combine the outputs of various detectors and identify an object. The system could know that pencils are usually yellow and have a pink end on one side and a point on the other. Even if the system has not seen a pencil before, the combination of detector outputs would provide characteristics that match those expected of a pencil.

The benefits extend to the symbolic side as well: knowledge base systems can use evidence from the world, obtained through various sensors, to expand their knowledge base. If a knowledge system sees an object that has been identified as a pencil, it can store the other properties it detects in a symbolic representation (for example, that pencils have a pointed end).
Description

Background
Research in symbolic and statistical AI has progressed mostly independently over the past decades. However, one common thread that has appeared in the literature on both sides is the notion of distance metrics. Symbolic natural language processing has developed numerical methods over hierarchical ontologies such as WordNet to judge the semantic similarity of words, while statistical AI uses a variety of distance metrics to classify samples with a nearest-neighbor approach in a vector space.

This shared concept of distance suggests a manifold learning approach to conceptual modeling. Given high-dimensional vectors representing various features of a dataset, manifold learning generates a lower-dimensional arrangement of the data points that attempts to preserve the original distances between the points as closely as possible (known as reducing the stress). Statistical inference performed in this lower-dimensional space is expected to be more reliable and more general, and the space can be assigned a symbolic representation.

Related Work
The motivation to represent semantic concepts in a vector space is outlined in Conceptual Spaces: The Geometry of Thought by Peter Gärdenfors. Gärdenfors suggests modeling semantic knowledge through the intersection of various vector spaces for different features, such as color, shape, and texture. However, he does not outline a working system for implementing such a space, and he leaves open the questions of how to deal with high-dimensional data, what kind of manifold learning system to use, and how to perform inference on the data. Others have used symbolic data for statistical classification in computer vision, but they have not implemented a unified system that applies to a wide variety of fields. Manifold learning and other dimensionality reduction techniques have been used in various fields, yet few, if any, systems have imposed constraints from a symbolic concept representation on dimensionality reduction, or vice versa.
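To make the notion of stress from the Background concrete, the following is a minimal sketch in Python. It is not the system proposed here. It assumes the raw (unnormalized) form of stress, the sum of squared differences between original and embedded pairwise distances, and reduces it by plain gradient descent from a random layout. The function names (pairwise_distances, raw_stress, reduce_stress), the parameters, and the sample data are illustrative assumptions, not part of the proposal or of any existing library.

```python
import math
import random


def pairwise_distances(points):
    """Return the matrix of Euclidean distances between all pairs of points."""
    return [[math.dist(p, q) for q in points] for p in points]


def raw_stress(target, embedding):
    """Sum of squared differences between target and embedded distances."""
    n = len(embedding)
    return sum(
        (math.dist(embedding[i], embedding[j]) - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def reduce_stress(points, dim=2, steps=500, rate=0.01, seed=0):
    """Embed `points` in `dim` dimensions by gradient descent on raw stress.

    Returns (embedding, stress_before, stress_after).
    """
    rng = random.Random(seed)
    n = len(points)
    target = pairwise_distances(points)
    # Assumption: start from a random layout; other initializations are possible.
    emb = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    before = raw_stress(target, emb)
    for _ in range(steps):
        grads = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = math.dist(emb[i], emb[j]) or 1e-12  # avoid division by zero
                # d/dx_i of (d_ij - t_ij)^2 = 2 (d_ij - t_ij) (x_i - x_j) / d_ij
                coeff = 2.0 * (d - target[i][j]) / d
                for k in range(dim):
                    grads[i][k] += coeff * (emb[i][k] - emb[j][k])
        for i in range(n):
            for k in range(dim):
                emb[i][k] -= rate * grads[i][k]
    return emb, before, raw_stress(target, emb)


# Hypothetical data: five points lying on a plane inside a 3-D feature space.
features = [(0, 0, 0), (1, 0, 1), (0, 1, 1), (1, 1, 2), (2, 1, 3)]
embedding, stress_before, stress_after = reduce_stress(features)
```

Because the sample points lie on a plane, a two-dimensional layout that preserves all distances exists, so the stress can in principle approach zero. For data that does not lie on a low-dimensional surface, some residual stress would remain.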
Overview
This system will be composed of three core components that interact with each other:
1. A manifold learning algorithm that allows for incremental updates without recomputing the entire vector space, and allows for constraints between manifolds in the same space (though perhaps of different dimensionality)
2. A system for mapping semantic relationships (inheritance, properties, etc.) to vector space patterns and distances
3. An inference system that incorporates both spatial arrangement and symbolic links to generate a model of the data

One possible manifold learning algorithm that would fit this system is the elastic map algorithm, where the manifold is found by reducing the stress of a network of springs representing the manifold in a low-dimensional space. The springs connect actual or interpolated data points and together form the structure of the manifold. Since new springs can be added to the manifold (representing new data points), the manifold of a particular concept can be updated incrementally. Furthermore, semantic relationships between concepts can themselves be modeled as springs whose stiffness is proportional to the confidence in the relationship. Various mappings of semantic relationships would be explored to find those that are suited to a vector space representation.

Schedule
1. Implement a manifold learning system with the desired algorithm
2. Provide an interface for supplying arbitrary data
3. Formulate an analogue of semantic representations for the manifold learning system
4. Provide an interface for combining data with semantic information
5. Implement categorization that is conditional on both symbolic and statistical data
6. Implement inference that is conditional on both symbolic and statistical data
7. Develop a demonstration of the system in a computer vision application
8. Research data mining techniques that could be used to increase the size and quality of the knowledge base

Investigator Roles
The required expertise for this project is as follows:
1. A machine learning expert with experience in manifold learning
2. An expert in symbolic knowledge representations, particularly with semantic networks or similar representations
3. An expert in knowledge extraction from various sources, including the web

Sources
Gärdenfors, Peter. Conceptual Spaces: The Geometry of Thought. Cambridge, MA: MIT Press, 2000.
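
Illustrative sketch
The spring formulation described in the Overview can be illustrated with a toy example in Python. This is a simplified spring relaxation under stated assumptions, not the full elastic map algorithm and not the proposed implementation. The class SpringNetwork, its methods, the node names, and the rule stiffness = scale * confidence are all hypothetical choices made for illustration. The sketch shows two ideas from the text: a new data point can be attached and relaxed while existing nodes stay fixed (an incremental update), and a semantic link can be added as a spring whose stiffness grows with confidence.

```python
import math


class SpringNetwork:
    """Toy spring network: nodes are points, springs pull them to rest lengths."""

    def __init__(self):
        self.positions = {}  # node name -> list of coordinates
        self.springs = []    # (node_a, node_b, rest_length, stiffness)

    def add_node(self, name, position):
        self.positions[name] = [float(x) for x in position]

    def add_spring(self, a, b, rest_length, stiffness):
        self.springs.append((a, b, float(rest_length), float(stiffness)))

    def add_semantic_link(self, a, b, rest_length, confidence, scale=1.0):
        # Assumption: stiffness = scale * confidence, with confidence in [0, 1].
        self.add_spring(a, b, rest_length, scale * confidence)

    def energy(self):
        """Total elastic energy: sum of 0.5 * k * (distance - rest_length)^2."""
        return sum(
            0.5 * k * (math.dist(self.positions[a], self.positions[b]) - r) ** 2
            for a, b, r, k in self.springs
        )

    def relax(self, free=None, steps=200, rate=0.05):
        """Move only the `free` nodes downhill in energy; others stay fixed."""
        free = set(self.positions) if free is None else set(free)
        for _ in range(steps):
            forces = {n: [0.0] * len(self.positions[n]) for n in free}
            for a, b, r, k in self.springs:
                pa, pb = self.positions[a], self.positions[b]
                d = math.dist(pa, pb) or 1e-12  # avoid division by zero
                pull = k * (d - r) / d  # positive when the spring is stretched
                for i in range(len(pa)):
                    delta = pull * (pb[i] - pa[i])
                    if a in free:
                        forces[a][i] += delta
                    if b in free:
                        forces[b][i] -= delta
            for n in free:
                for i, f in enumerate(forces[n]):
                    self.positions[n][i] += rate * f


# Hypothetical usage: attach one new observation to two existing nodes.
net = SpringNetwork()
net.add_node("pencil", (0.0, 0.0))
net.add_node("eraser", (1.0, 0.0))
net.add_node("new_observation", (3.0, 3.0))
net.add_spring("new_observation", "pencil", rest_length=1.0, stiffness=1.0)
net.add_semantic_link("new_observation", "eraser", rest_length=1.0, confidence=0.5)
net.relax(free={"new_observation"})
```

Restricting relaxation to the new node is one simple way to avoid recomputing the whole layout. A fuller system would probably also let nearby nodes adjust, at a higher cost per update.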