Text Mining
“Text mining seeks to apply some of the same types of analysis, such as knowledge
discovery, or trend analysis, to unstructured textual data, that data mining applies to
structured data. Text mining combines the disciplines of data mining, information
extraction, information retrieval, text categorization, probabilistic modeling, linear
algebra, machine learning, and computational linguistics to discover structure,
patterns, and knowledge in large textual corpora.”2
The linear algebra side of text mining relies on vector spaces. In a vector space
model, documents and queries are represented as numeric vectors and matrices, and
matrix analysis is then used to find the results most relevant to the search term.
Most search engines, Google in particular, rely heavily on this approach. One
reason is that it allows for a ranking system: when the list of websites is
displayed, each result can be shown with a relevance percentage. “The vector space model allows
documents to partially match a query by assigning each document a number between 0
and 1, which can be interpreted as the likelihood of relevance to the query.” That number
is converted to a percentage, and that percentage is displayed next to each site in the
ranking. Here is a pictorial representation of the vector spaces returned when applied to the
word chair.
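The ranking idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular search engine works. It assumes simple word-count vectors and cosine similarity as the matching measure (the text does not say which measure is used), and the sample documents and the helper names term_vector, cosine_similarity and rank are made up for the example.

```python
import math
from collections import Counter

def term_vector(text):
    """Represent a text as a bag-of-words vector of term counts."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors.

    Counts are never negative, so the result lies between 0 and 1.
    """
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text matches nothing
    return dot / (norm_a * norm_b)

def rank(query, documents):
    """Score every document against the query, highest score first."""
    q = term_vector(query)
    scored = [(cosine_similarity(q, term_vector(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Hypothetical documents, invented for this example.
documents = ["wooden chair", "chair and table", "table lamp"]

for score, doc in rank("chair", documents):
    # Turn the 0-to-1 score into the percentage shown next to a result.
    print(f"{score * 100:.0f}%  {doc}")
```

A document that shares only some of its words with the query still gets a score above zero, which is the partial matching the quote refers to, while a document with no words in common scores 0.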
I’ll leave you with a quote about Google that I found during my research. I know
you will appreciate it, and I hope to find room for it on my poster.
“It’s not my homepage, but it might as well be. I use it to ego-surf. I use it to read the news. Anytime I want
to find out anything, I use it.”—Matt Groening, creator and executive producer, The Simpsons1
1. Google's PageRank and Beyond: The Science of Search Engine Rankings, Amy N.
Langville and Carl D. Meyer
2. Visualize Word Meanings, from the infoMap project done by Stanford University.
http://infomap.stanford.edu./