FYP Project Report - 1
Knowledge Retrieval for Financial Data
Problem statement
Given a set of financial documents, the goal of our project is to extract as much
knowledge from them as an expert human would. In this project, we present the design of a
decision support system for the automated retrieval of fiscal information from a
collection of text documents. This system extends the idea of data mining to the notion
of knowledge retrieval. The collected information is processed in a knowledge-intensive
way using knowledge models of the application domain and can subsequently be used to
support various decision-making processes.
Motivation
From the point of view of companies, the most important problem is the integration of
database management and IR systems. After reading a few papers in the information
retrieval and knowledge discovery fields, we wanted to find out where active research is
under way. We came across a survey by IBM identifying the 10 most important goals for
IR, in which knowledge retrieval ranked as the most important. We were looking for a way
to make the decision-making process a little simpler, and knowledge retrieval offers one.
Retrieving knowledge from vast amounts of unstructured text will help in various ways,
the foremost being decision making and trend prediction.
In the modern business world, decision making depends on the inferences we draw from
financial data. Knowledge in this domain forms the backbone of all decision making,
trend prediction for the company, and so on. Hence we decided to implement a
Knowledge Retrieval System for financial data.
Background
Information Retrieval is mainly concerned with getting useful information from huge
amounts of data. We looked into many papers in the field of text mining, among which the
most relevant and useful was
“Text Mining: Finding Nuggets in Mountains of Textual Data”
Jochen Dörre et al.
Another important area of IR is getting useful information from the daily news.
“Monitoring a Newsfeed for Hot Topics”
Mark Shewhart et al.
According to a survey by IBM identifying the 10 most important goals for IR, our project
falls under the first-ranked goal, Integrated Solutions.
This kind of work is going on in many different forms, from Knowledge Discovery at IBM
to Web Information Systems:
“Recognizing Ontology-Applicable Multiple-Record Web Documents”
David Embley et al.
“Querying Web Information Systems”
Klaus Dieter
“Distributed Hypertext Resource Discovery through Examples”
Soumen Chakrabarti et al.
This project has primarily two aspects:
1. Information Extraction
2. Data Mining
Information Extraction (IE) for free text started with AutoSlog, a single-slot IE
algorithm. It required syntax definition and semantic tagging, and the rules it generated
had to be trimmed to get the final results. Crystal was then developed for multi-slot
extraction, but it still required syntax definition and semantic tagging, and its rules
still had to be trimmed. The first algorithm to remove the trimming step was Liep, but it
handled multi-slot extraction only and still required syntax definition and semantic
tagging. Whisk was developed next; it can do both single-slot and multi-slot extraction
with the least amount of tagging. We will use Whisk for our project.
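To make the single-slot versus multi-slot distinction concrete, the following is a minimal sketch of a multi-slot extraction pattern written by hand as a Python regular expression. It is only an illustration of the idea: it is not Whisk's rule syntax or its rule-learning algorithm, and the pattern `RULE`, the slot names, and the sample sentences are all hypothetical.

```python
import re

# Hypothetical hand-written pattern (not a learned Whisk rule): a single match
# fills three slots at once (company, metric, amount), which is what makes it
# "multi-slot". A single-slot rule would capture only one of these fields.
RULE = re.compile(
    r"(?P<company>[A-Z][\w&]*(?: [A-Z][\w&]*)*) reported "
    r"(?:a )?(?P<metric>net profit|net loss|revenue) of "
    r"\$(?P<amount>\d+(?:\.\d+)?) million"
)


def extract(text):
    """Return one dict of filled slots for every match of RULE in text."""
    return [
        {
            "company": m["company"],
            "metric": m["metric"],
            "amount_musd": float(m["amount"]),  # amount in millions of dollars
        }
        for m in RULE.finditer(text)
    ]


# Made-up sentences, used only to exercise the pattern.
sample = (
    "Acme Textiles reported a net profit of $12.5 million for the quarter. "
    "Later, Zenith Steel reported revenue of $340 million."
)
records = extract(sample)
for record in records:
    print(record)
```

Each extracted dictionary corresponds to one row that could later be stored under a database schema. A learned rule set would replace the single hand-written pattern.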
The papers we have gone through include:
Learning Information Extraction Rules for Semi-structured and Free Text
Soderland et al.
Crystal: Inducing a Conceptual Dictionary
Soderland et al.
Automatically Constructing a Dictionary for IE
Riloff E.
Wrapper Induction for IE
Kushmerick et al.
Learning Information Extraction Patterns from Examples
Huffman S.
Various data mining algorithms exist. The ones we plan to implement to obtain knowledge
are:
1. Time Series Analysis
2. Temporal Data Mining
3. Clustering
4. Decision Tree
5. Association Rules
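As an illustration of one of the listed techniques, the sketch below computes support and confidence for simple one-item-to-one-item association rules. It is a brute-force toy, not an optimized algorithm such as Apriori and not the project's implementation; the function name `association_rules`, the thresholds, and the event labels in `quarters` are assumptions made up for the example.

```python
from itertools import combinations


def association_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) tuples for rules lhs -> rhs.

    Only rules with a single item on each side are considered.
    """
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    items = sorted(set().union(*sets))

    def support(itemset):
        # Fraction of transactions that contain every item in itemset.
        return sum(1 for t in sets if itemset <= t) / n

    rules = []
    for a, b in combinations(items, 2):
        pair_support = support(frozenset((a, b)))
        if pair_support < min_support:
            continue
        # Test the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support(frozenset((lhs,)))
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Hypothetical data: the events observed for a company in each of four quarters.
quarters = [
    {"revenue_up", "profit_up", "dividend"},
    {"revenue_up", "profit_up"},
    {"revenue_up"},
    {"profit_up", "dividend"},
]
rules = association_rules(quarters)
for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs} (support={sup:.2f}, confidence={conf:.2f})")
```

On this toy data the only rule that clears both thresholds is `dividend -> profit_up`: a dividend appears in half of the quarters, and every one of those quarters also shows a profit increase.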
Plan for the overall project implementation
[Figure: project pipeline. Unstructured Free Text -> (IE) -> Structured Schema -> (Mining) -> Knowledge]
Unstructured Free Text: free text in the financial domain, like company reports,
quarterly results, etc.
Structured Schema: contains coherent information extracted from the unstructured text,
which is given a structure by the schema of the database.
Knowledge: helpful in decision making, trend prediction, etc. Aids as an analysis tool
for the company.
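The middle stage of this plan, where extracted facts are given structure by a database schema and then queried, could look roughly like the sketch below. It uses Python's built-in `sqlite3` module with an in-memory database; the table name `quarterly_result`, its columns, and the figures in `extracted` are hypothetical and stand in for whatever the extraction step would actually produce.

```python
import sqlite3

# Hypothetical records standing in for the output of the extraction step:
# (company, quarter, net profit in millions).
extracted = [
    ("Acme Textiles", "2002-Q1", 10.0),
    ("Acme Textiles", "2002-Q2", 12.5),
    ("Acme Textiles", "2002-Q3", 11.0),
]

# The schema is what turns loose extracted facts into structured data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quarterly_result ("
    " company TEXT NOT NULL,"
    " quarter TEXT NOT NULL,"
    " net_profit REAL NOT NULL,"
    " PRIMARY KEY (company, quarter))"
)
conn.executemany("INSERT INTO quarterly_result VALUES (?, ?, ?)", extracted)


def profit_changes(conn, company):
    """Return (quarter, change versus the previous quarter) for one company."""
    rows = conn.execute(
        "SELECT quarter, net_profit FROM quarterly_result"
        " WHERE company = ? ORDER BY quarter",
        (company,),
    ).fetchall()
    # Pair each quarter with the one before it and take the difference.
    return [(later[0], later[1] - earlier[1]) for earlier, later in zip(rows, rows[1:])]


changes = profit_changes(conn, "Acme Textiles")
for quarter, delta in changes:
    print(quarter, delta)
```

The quarter-over-quarter difference is the simplest possible trend signal; the mining algorithms named earlier would operate on the same structured table.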
The choice of data set will determine the kind of knowledge we retrieve. We plan to
draw on newspaper reports, companies’ financial reports, etc.
Time line for project execution
By 1/12/2002: Collect all the relevant datasets and start implementing the
Information Extraction algorithm “Whisk”.
By 1/01/2003: Complete implementing Whisk and start experimenting with the
dataset for a schema and other things.
By 1/02/2003: Create sample and actual database instances for the data mining part.
By 1/03/2003: Complete implementation of the data mining algorithms and make
the project ready for demonstrations.
[Figure: project timeline]
Collection of data sets and start implementing Whisk: 25/11/2002
Finish implementing Whisk: 1/01/2003
Create sample database instances: 1/02/2003
Try the data mining algorithms on the data set: 3/02/2003
Final submission: 1/03/2003
Project Team Members:
Arun Malhotra - 99007
Shalav Gupta - 99063
OVL Kiran Kumar - 99026