FYP Project Report - 1

Knowledge Retrieval for Financial Data

Problem statement

Given a set of financial documents, our project aims to extract as much knowledge from them as an expert human would. We present the design of a decision support system for the automated retrieval of fiscal information from a collection of text documents. This system extends the idea of data mining to the notion of knowledge retrieval. The collected information is processed in a knowledge-intensive way using knowledge models of the application domain and can subsequently be used to support various decision-making processes.

Motivation

From a company's point of view, the most important problem is the integration of database management and IR systems. After reading a few papers in the information retrieval and knowledge discovery fields, we wanted to know where active research is going on. We came across an IBM survey paper identifying the 10 most important goals for IR, in which Knowledge Retrieval was ranked the most important. We were looking for a way to make the decision-making process simpler, and knowledge retrieval offers one. Retrieving knowledge from vast amounts of unstructured text will help in various ways, the foremost being decision making and trend prediction. In the modern business world, decision making depends on the inferences drawn from financial data. Knowledge in this domain forms the backbone of a company's decision making, trend prediction, etc. Hence we decided to implement a Knowledge Retrieval System for financial data.

Background

Information Retrieval is mainly concerned with getting useful information from huge amounts of data. We looked into many papers in the field of text mining, of which the most relevant and useful was “Text Mining: Finding Nuggets in Mountains of Textual Data” by Jochen Dörre et al. Another important area of IR is extracting useful information from the daily news.
“Monitoring a Newsfeed for Hot Topics” (Mark Shewhart et al.) is a representative paper in this area. According to an IBM survey paper identifying the 10 most important goals for IR, our project falls under the first-ranked goal, Integrated Solutions. This kind of work is going on in many different forms, from the Knowledge Discovery work at IBM to Web Information Systems:

“Recognizing Ontology-Applicable Multiple-Record Web Documents”, David Embley et al.
“Querying Web Information Systems”, Klaus Dieter
“Distributed Hypertext Resource Discovery through Examples”, Soumen Chakrabarti et al.

This project has primarily two aspects to it:

1. Information Extraction
2. Data Mining

Information Extraction (IE) for free text started with AutoSlog, a single-slot IE algorithm. It required syntax definition and semantic tagging, and the generated rules had to be trimmed to get the final results. Crystal was then developed for multi-slot extraction, but it still required syntax definition and semantic tagging, and its generated rules still had to be trimmed. The first algorithm to remove the trimming step was Liep, but it handled multi-slot extraction only and still required syntax definition and semantic tagging. Whisk was developed next; it can do both single-slot and multi-slot extraction with the least amount of tagging. We will be using Whisk for our project.

The papers we have gone through include:

Learning Information Extraction Rules for Semi-structured and Free Text, Soderland
Crystal: Inducing a Conceptual Dictionary, Soderland et al.
Automatically Constructing a Dictionary for IE, Riloff E.
Wrapper Induction for IE, Kushmerick et al.
Learning Information Extraction Patterns from Examples, Huffman S.

Various data mining algorithms are available.
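To make the idea of multi-slot extraction concrete, the following is a minimal Python sketch. It is not Whisk: Whisk learns its rules from tagged training examples, whereas the single pattern below is written by hand purely for illustration. The sentence template, the slot names (company, metric, amount, period) and the company names in the sample text are all assumptions and do not come from any real data set.

```python
import re
from typing import Dict, List

# Hypothetical hand-written rule standing in for a learned multi-slot rule.
# It fills four slots from one sentence: company, metric, amount, period.
RULE = re.compile(
    r"(?P<company>[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*) reported "
    r"(?P<metric>revenue|net profit) of "
    r"Rs\. (?P<amount>\d+(?:\.\d+)?) crore "
    r"for the (?P<period>\w+ quarter)"
)


def extract_records(text: str) -> List[Dict[str, str]]:
    """Return one dict per sentence fragment that matches the rule."""
    return [match.groupdict() for match in RULE.finditer(text)]


# Made-up sample text in the style of a financial news report.
sample = (
    "Acme Industries reported net profit of Rs. 12.5 crore for the second quarter. "
    "Analysts were surprised. "
    "Beta Corp reported revenue of Rs. 340 crore for the third quarter."
)

records = extract_records(sample)
for record in records:
    print(record)
```

Each returned dictionary corresponds to one row that could be stored in a structured database table, which is the form the data mining stage would then work on.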
The data mining algorithms we plan to implement to obtain knowledge are:

Time Series Analysis
Temporal Data Mining
Clustering
Decision Tree
Association Rules

Plan for the overall project implementation

Unstructured Free Text --(IE)--> Structured Schema --(Mining)--> Knowledge

Unstructured Free Text: text in the financial domain, such as company reports, quarterly results, etc.
Structured Schema: coherent information extracted from the unstructured text, given a structure by the schema of the database.
Knowledge: helpful in decision making, trend prediction, etc.; serves as an analysis tool for the company.

The choice of data set will determine the kind of knowledge we retrieve. We plan to draw on newspaper reports, companies' financial reports, etc.

Timeline for project execution

By 1/12/2002: Collect all the relevant data sets and start implementing the Information Extraction algorithm Whisk.
By 1/01/2003: Complete the implementation of Whisk and start experimenting with the data set to arrive at a schema and related details.
By 1/02/2003: Create sample and actual database instances for the data mining part.
By 1/03/2003: Complete the implementation of the data mining algorithms and make the project ready for demonstrations.

Milestones

Collection of data sets; start implementing Whisk: 25/11/2002
Finish implementing Whisk: 1/01/2003
Create sample database instances: 1/02/2003
Try the data mining algorithms on the data set: 3/02/2003
Final submission: 1/03/2003

Project Team Members

Arun Malhotra - 99007
Shalav Gupta - 99063
OVL Kiran Kumar - 99026