Text Mining Application Programming
Chapter 1 Introduction
Manu Konchady, 2006
Definition: Text Mining
All types of text processing that deal with finding,
organizing, and analyzing information.
(formal) The creation of new information that is not
obvious in a collection of documents.
New information is defined as a pattern, trend, or
relationship that can’t be easily gleaned by reading
individual documents.
The term “document” refers to any unit of text, such as a
Web page, an e-mail, a formatted article, a set of slides, or
a plain text file.
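The following minimal Python sketch makes “a pattern that can’t be easily gleaned by reading individual documents” concrete. It counts how many documents mention each pair of terms. No single document states these aggregate counts; they only emerge across the collection. The toy documents, the regex tokenizer, and the function names are illustrative assumptions and do not come from the original material.

```python
import re
from collections import Counter
from itertools import combinations

def tokenize(text):
    # Lowercase and keep alphabetic runs only (a deliberately simple tokenizer)
    return re.findall(r"[a-z]+", text.lower())

def cooccurring_pairs(docs, vocabulary):
    """Count, for each pair of vocabulary terms, how many documents contain both."""
    counts = Counter()
    for doc in docs:
        # Each document contributes at most once per pair
        present = sorted(set(tokenize(doc)) & vocabulary)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts

# Hypothetical toy collection
docs = [
    "Patients taking drug A reported headaches.",
    "A new study links drug A to fatigue and headaches.",
    "Headaches were also common among fatigue sufferers.",
]
vocab = {"drug", "headaches", "fatigue"}
pairs = cooccurring_pairs(docs, vocab)
print(pairs.most_common())
```

Here the pairs (drug, headaches) and (fatigue, headaches) each appear in two documents. That kind of relationship only becomes visible when the whole collection is read at once.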
Data Mining vs. Text Mining
Data mining deals with structured numeric
data; text mining deals with unstructured text.
Data used for data mining is extracted,
transformed, and loaded into a data warehouse.
Text mining attempts to build a model from
data that is assumed to be imprecise.
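A common way to bridge the two is to turn unstructured text into a structured table of term counts, called a document-term matrix, which numeric mining methods can then consume. The sketch below shows one such conversion. It assumes a simple regex tokenizer, and the function name is hypothetical.

```python
import re
from collections import Counter

def document_term_matrix(docs):
    """Return (vocabulary, rows) where rows[i][j] counts term vocabulary[j] in docs[i]."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    # Sorted vocabulary gives every row the same, predictable column order
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows

vocab, matrix = document_term_matrix(["the cat sat", "the cat saw the dog"])
print(vocab)   # ['cat', 'dog', 'sat', 'saw', 'the']
print(matrix)  # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

Word order is discarded, so the matrix is only an approximate view of the original text.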
Origins of Text Mining
Information Retrieval
Natural Language Processing
Understanding Text
“Alice saw the rabbit with glasses.”
(Did Alice use glasses to see the rabbit, or was the rabbit wearing them?)
Polysemy
One word can have more than one meaning.
“In what state would you find Lincoln?”
“free software”
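Resolving polysemy is the task of word sense disambiguation. One classic, deliberately simple approach is the simplified Lesk algorithm, which picks the sense whose dictionary gloss shares the most words with the surrounding context. The sketch below assumes a hypothetical two-sense inventory for “Lincoln” with made-up glosses. Real systems draw glosses from a lexicon such as WordNet.

```python
import re

# Hypothetical two-sense inventory for "Lincoln"; the glosses are illustrative
SENSES = {
    "Lincoln (city)": "capital city of the state of Nebraska",
    "Lincoln (person)": "Abraham Lincoln, sixteenth president of the United States",
}

# Function words that carry no sense information
STOPWORDS = {"a", "the", "of", "in", "what", "would", "you", "find"}

def content_words(text):
    """Lowercased alphabetic tokens with stopwords removed."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def simplified_lesk(context, target, senses):
    """Return the sense whose gloss shares the most words with the context.

    The ambiguous target word itself is excluded from the context, so it
    cannot vote for a gloss that happens to repeat it.
    """
    ctx = content_words(context) - {target.lower()}
    return max(senses, key=lambda sense: len(ctx & content_words(senses[sense])))

print(simplified_lesk("In what state would you find Lincoln?", "Lincoln", SENSES))
```

The word “state” in the question overlaps only with the city gloss, so the city sense wins. When scores tie, `max` returns the first sense listed, which is a crude but common fallback.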
Synonymy
More than one word can express the same meaning.
Exuberant: lush, luxuriant, profuse, and riotous.
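Because of synonymy, a search for one word can miss documents that use another word with the same meaning. A minimal sketch of query expansion follows. It assumes a small hand-built synonym table (hypothetical, taken from the “exuberant” example above), expands the query, and then matches documents against the expanded set.

```python
import re

# Hypothetical hand-built thesaurus entry taken from the example above
SYNONYMS = {
    "exuberant": {"lush", "luxuriant", "profuse", "riotous"},
}

def expand(query_terms):
    """Add every listed synonym of each query term to the query."""
    expanded = set(query_terms)
    for term in query_terms:
        expanded |= SYNONYMS.get(term, set())
    return expanded

def matches(doc, query_terms):
    """True if the document shares at least one term with the expanded query."""
    tokens = set(re.findall(r"[a-z]+", doc.lower()))
    return bool(tokens & expand(query_terms))

doc = "The garden was lush after the rain."
print(matches(doc, {"exuberant"}))  # True, via the synonym "lush"
```

Without the expansion step, the query “exuberant” would not match this document at all.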
An Architecture for Text Mining
Applications
Text Mining Functions
Searching
Information Extraction
Clustering
Categorization
Summarization
Information Monitor
Question and Answer
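Searching is the most basic of these functions. The minimal sketch below answers Boolean AND queries over an in-memory inverted index. This is a generic textbook technique and is not necessarily how Text Mine implements search. The documents and function names are illustrative.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in re.findall(r"[a-z]+", text.lower()):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return sorted ids of documents containing every query term (Boolean AND)."""
    terms = re.findall(r"[a-z]+", query.lower())
    if not terms:
        return []
    # Copy the first posting set so the index itself is never modified
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)

docs = [
    "Text mining finds patterns in documents.",
    "Data mining works on structured data.",
    "Search engines index text documents.",
]
idx = build_index(docs)
print(search(idx, "text documents"))  # [0, 2]
print(search(idx, "mining"))          # [0, 1]
```

The index is built once, so each query only intersects short posting sets and never rescans the full text.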
A Layered Model
Text Mining Installation
Text Mine (http://textmine.sf.net) is a
collection of Perl modules and code on
SourceForge to index, cluster, classify, and
summarize text.
Usage
Command line
Web-based interface
Web Interface