Information and Knowledge Extracting from a large amount of data
ABSTRACT
Most existing data extraction techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update the discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopt term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern (or phrase)-based approaches should perform better than term-based ones, but many experiments do not support this hypothesis. This work introduces a technique for extracting information and knowledge from a large amount of data which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating the extracted knowledge for finding relevant and interesting information.
Existing System
Most existing text mining methods are purely term based, so most of their errors arise from polysemy and synonymy. Users have often held the hypothesis that pattern-based approaches should perform better than term-based ones, but many experiments do not support this hypothesis.
Problems on existing system:
1. It is difficult to select appropriate properties (features) for documents.
Proposed System
We provide an information and knowledge extraction approach which first calculates the specificities of the discovered patterns and then evaluates term weights according to the distribution of terms in the extracted knowledge rather than their distribution in documents, in order to solve the misinterpretation problem. It also considers the influence of patterns from the negative training examples to find ambiguous (noisy) patterns and tries to reduce their influence to address the low-frequency problem. The process of updating ambiguous patterns is referred to as pattern evolution. The proposed approach can improve the accuracy of evaluating term weights because discovered patterns are more specific than whole documents.
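To illustrate the idea, the following is a minimal Java sketch, not the exact formulation of the proposed system, of evaluating term weights from the distribution of terms inside discovered patterns: each pattern's support is shared equally among its terms, so terms appearing in frequent, specific patterns receive higher weights. The Pattern class and the example supports are hypothetical placeholders.

import java.util.*;

public class PatternTermWeights {

    // A discovered pattern: a set of terms plus its relative support (hypothetical structure).
    static class Pattern {
        final Set<String> terms;
        final double support;
        Pattern(Set<String> terms, double support) {
            this.terms = terms;
            this.support = support;
        }
    }

    // Deploys patterns onto terms: weight(t) = sum over patterns p containing t of support(p) / |p|.
    static Map<String, Double> deploy(List<Pattern> patterns) {
        Map<String, Double> weights = new HashMap<>();
        for (Pattern p : patterns) {
            double share = p.support / p.terms.size();
            for (String t : p.terms) {
                weights.merge(t, share, Double::sum);
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        List<Pattern> patterns = Arrays.asList(
            new Pattern(new HashSet<>(Arrays.asList("text", "mining")), 0.6),
            new Pattern(new HashSet<>(Arrays.asList("pattern", "mining", "discovery")), 0.4));
        deploy(patterns).forEach((term, w) -> System.out.println(term + " -> " + w));
    }
}

Weighting terms this way ties their importance to the specificity of the patterns they occur in, rather than to raw document frequency.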
Implementation
Implementation is the stage of the project when the theoretical design is turned into a working system. It can therefore be considered the most critical stage in achieving a successful new system and in giving the user confidence that the new system will work and be effective.
The implementation stage involves careful planning, investigation of the existing system and its constraints on implementation, design of methods to achieve the changeover, and evaluation of those changeover methods.
Main Modules:
1. Rupture (Document Splitting) Method:
Each document is split into a number of paragraphs. The training set consists of a group of positive documents and a group of negative documents; a sketch of this step follows.
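The following minimal Java sketch shows the splitting step. It assumes paragraphs are separated by blank lines; this is an assumption about the document format, not a detail taken from the proposed system.

import java.util.*;

public class ParagraphSplitter {

    // Splits one document into paragraphs; a blank line is assumed to mark a paragraph boundary.
    static List<String> splitParagraphs(String document) {
        List<String> paragraphs = new ArrayList<>();
        for (String block : document.split("\\n\\s*\\n")) {
            String p = block.trim();
            if (!p.isEmpty()) {
                paragraphs.add(p);
            }
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        String doc = "First paragraph about text mining.\n\nSecond paragraph about discovered patterns.";
        // Prints the two paragraphs; in the full system each paragraph would keep
        // the positive/negative label of its source document.
        System.out.println(splitParagraphs(doc));
    }
}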
2. Specimen (Pattern) Assembly Method:
Semantic information is used when assembling specimens (patterns) to improve the quality of the closed patterns found during data extraction, and the patterns are arranged so that they are easy to retrieve. This keeps the subsequent search and evaluation simple and improves performance; one possible arrangement is sketched below.
3. Finding Inner Paragraphs:
This module finds the inner paragraphs and applies the d-pattern method to the documents in the training set. This is very helpful in reducing the side effect of noisy patterns caused by the low-frequency problem. A threshold is usually used to classify documents into relevant and irrelevant categories, as sketched below.
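A minimal Java sketch of the threshold-based classification follows. It assumes the term weights have already been deployed from the discovered patterns; the threshold value itself would normally be tuned on the training set.

import java.util.*;

public class ThresholdClassifier {

    // Scores a document by summing the deployed weights of the terms it contains.
    static double score(Set<String> documentTerms, Map<String, Double> termWeights) {
        double s = 0.0;
        for (String t : documentTerms) {
            s += termWeights.getOrDefault(t, 0.0);
        }
        return s;
    }

    // A document is relevant when its score reaches the threshold, irrelevant otherwise.
    static boolean isRelevant(Set<String> documentTerms, Map<String, Double> termWeights, double threshold) {
        return score(documentTerms, termWeights) >= threshold;
    }
}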
4. Find and Exposed (Evaluation):
The Reuters data collection is used to evaluate the proposed approach. Term stemming and stopword removal techniques are applied in the text preprocessing stage. Several common measures are then applied for performance evaluation, and our results are compared with state-of-the-art approaches in data mining, concept-based, and term-based methods.
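The Java sketch below illustrates this stage with a simple stopword filter and the standard precision/recall/F1 measures. The stopword list is a small placeholder and the stemming step is omitted; both would be replaced by the actual resources used with the Reuters collection.

import java.util.*;

public class Evaluation {

    // Small placeholder stopword list; a real run would use a full list (and a stemmer).
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList("the", "a", "is", "of", "and"));

    // Removes stopwords from a token list during preprocessing.
    static List<String> removeStopwords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!STOPWORDS.contains(t.toLowerCase())) {
                kept.add(t);
            }
        }
        return kept;
    }

    // Precision, recall and F1 from counts of true positives, false positives and false negatives.
    static double[] precisionRecallF1(int tp, int fp, int fn) {
        double precision = (tp + fp == 0) ? 0.0 : (double) tp / (tp + fp);
        double recall = (tp + fn == 0) ? 0.0 : (double) tp / (tp + fn);
        double f1 = (precision + recall == 0) ? 0.0 : 2 * precision * recall / (precision + recall);
        return new double[] { precision, recall, f1 };
    }
}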
5. Baseline Models:
Three classes of models are introduced as baselines: the data-mining (pattern-based), concept-based, and term-based models.
System Configuration:

H/W System Configuration:
Processor        : Pentium III
Speed            : 1.1 GHz
RAM              : 256 MB (min)
Hard Disk        : 20 GB
Floppy Drive     : 1.44 MB
Keyboard         : Standard Windows Keyboard
Mouse            : Two or Three Button Mouse
Monitor          : SVGA

S/W System Configuration:
Operating System      : Windows 95/98/2000/XP
Application Server    : Tomcat 5.0/6.x
Front End             : HTML, Java, JSP
Scripts               : JavaScript
Server-side Script    : Java Server Pages
Database              : MySQL 5.0
Database Connectivity : JDBC