Introduction to Text and Web Mining
I. Text Mining is part of our lives
Google trends
Google correlate
Social Metrics Insight
Related words on “bigdata”
Sentiment analysis on “bigdata”
summly
In March 2011, D’Aloisio created Trimit, an
app that summarizes e-mails, blog posts,
and more into 1,000-, 500-, or 140-character
summaries that can be shared via SMS,
email, Facebook, or Twitter, or in .txt form, in just
a few clicks or shakes of your iPhone. In
July of the same year, Apple named Trimit
a noteworthy app on the App Store.
II. What is text mining?
Text Mining
• Text mining
Application of data mining to non-structured or less structured text files. It
entails the generation of meaningful
numerical indices from the unstructured
text and then processing these indices
using various data mining algorithms
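As a rough illustration of what "numerical indices" can mean, the sketch below turns a few toy documents into a term-document count matrix. The documents, the whitespace tokenizer, and the function names are assumptions made for this example; the slides do not prescribe a particular representation.

```python
from collections import Counter

# Toy documents, invented purely for illustration.
DOCS = [
    "the printer is broken",
    "the printer needs new ink",
    "refund my broken phone",
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()

# Vocabulary: every distinct term, sorted so the column order is stable.
vocabulary = sorted({term for doc in DOCS for term in tokenize(doc)})

def to_counts(text):
    """Turn one document into a row of term counts aligned with `vocabulary`."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]

# One row per document, one column per vocabulary term.
term_document_matrix = [to_counts(doc) for doc in DOCS]

print(vocabulary)
for row in term_document_matrix:
    print(row)
```

Once text is in this numeric form, ordinary data mining algorithms (clustering, classification, and so on) can be applied to the rows.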
Text Mining
• Text mining helps organizations:
– Find the “hidden” content of documents,
including additional useful relationships
– Relate documents across previously unnoticed
divisions
– Group documents by common themes
Text Mining
• Applications of text mining
– Automatic detection of e-mail spam or
phishing through analysis of the document
content
– Automatic processing of messages or e-mails
to route a message to the most appropriate
party to process that message
– Analysis of warranty claims, help desk
calls/reports, and so on to identify the most
common problems and relevant responses
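The spam-detection application above can be sketched with a very small naive Bayes classifier over word counts. This is only one possible technique, chosen here for brevity; the training messages, labels, and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical labelled messages, invented for illustration.
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
]

def train(examples):
    """Count words per label and messages per label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary |= set(counts)
    return word_counts, label_counts, vocabulary

def classify(text, model):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total_messages)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            score += math.log(
                (word_counts[label][word] + 1) / (n_words + len(vocabulary))
            )
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAINING)
print(classify("free prize now", model))
print(classify("project meeting monday", model))
```

The same pattern could in principle be reused for message routing by replacing the two labels with one label per department, assuming labelled examples are available.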
Text Mining
• Applications of text mining
– Analysis of related scientific publications in
journals to create an automated summary
view of a particular discipline
– Creation of a “relationship view” of a
document collection
– Qualitative analysis of documents to detect
deception
Text Mining
• How to mine text
1. Eliminate commonly used words (stop-words)
2. Replace words with their stems or roots
(stemming algorithms)
3. Consider synonyms and phrases
4. Calculate the weights of the remaining terms
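The four steps above can be sketched in a few lines. Everything specific here is an assumption for illustration: the stop-word list, the suffix-stripping stand-in for a real stemming algorithm, the synonym table, and the choice of TF-IDF as the weighting scheme (the slides do not name one).

```python
import math
from collections import Counter

STOP_WORDS = {"the", "is", "a", "of", "and", "to", "my"}  # tiny illustrative list
SUFFIXES = ("ing", "ed", "s")           # crude stand-in for a real stemmer
SYNONYMS = {"cellphone": "phone"}       # hypothetical synonym mapping

def stem(word):
    """Strip one common suffix; a real system would use a proper stemming algorithm."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    tokens = text.lower().split()
    tokens = [t for t in tokens if t not in STOP_WORDS]  # step 1: drop stop-words
    tokens = [stem(t) for t in tokens]                   # step 2: stem
    tokens = [SYNONYMS.get(t, t) for t in tokens]        # step 3: merge synonyms
    return tokens

def tf_idf(docs):
    """Step 4: weight each remaining term by term frequency * inverse document frequency."""
    processed = [preprocess(d) for d in docs]
    n_docs = len(processed)
    doc_freq = Counter(term for tokens in processed for term in set(tokens))
    weights = []
    for tokens in processed:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

DOCS = [
    "The printer stopped printing",
    "My cellphones needed repairs",
    "The phone is broken",
]
weights = tf_idf(DOCS)
for doc, w in zip(DOCS, weights):
    print(doc, "->", w)
```

Terms that appear in fewer documents receive higher weights, so "broken" outweighs "phone" in the third document because "phone" also occurs (via the synonym table) in the second.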
Web Mining
• Web mining
The discovery and analysis of interesting
and useful information from the Web,
about the Web, and usually through Web-based tools
Data Mining Project Processes
Web Mining
• Web content mining
The extraction of useful information from Web
pages
• Web structure mining
The development of useful information from the
links included in the Web documents
• Web usage mining
The extraction of useful information from the
data being generated through webpage visits,
transactions, etc.
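To make Web structure mining concrete, the sketch below pulls the links out of a few pages with Python's standard `html.parser` module and counts how many pages link to each target. The page names and HTML snippets are invented; counting in-links is just one simple way to derive information from link structure.

```python
from collections import Counter
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links

# Hypothetical mini-site, invented for illustration.
PAGES = {
    "home.html": '<a href="products.html">Products</a> <a href="about.html">About</a>',
    "products.html": '<a href="home.html">Home</a>',
    "about.html": '<a href="home.html">Home</a> <a href="products.html">Products</a>',
}

# Link graph: page -> list of pages it links to.
link_graph = {page: extract_links(html) for page, html in PAGES.items()}

# In-link counts: how many links point at each page.
in_links = Counter(target for targets in link_graph.values() for target in targets)

print(link_graph)
print(in_links)
```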
Web Mining
• Uses for Web mining:
– Determine the lifetime value of clients
– Design cross-marketing strategies across
products
– Evaluate promotional campaigns
– Target electronic ads and coupons at user
groups
– Predict user behavior
– Present dynamic information to users
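As one hedged example of predicting user behavior from usage data, the sketch below counts page-to-page transitions in recorded sessions and predicts the most frequent next page. The session data and function names are hypothetical, and a first-order transition count is only one of many possible models.

```python
from collections import Counter, defaultdict

# Hypothetical click sessions, invented for illustration.
SESSIONS = [
    ["home", "products", "cart", "checkout"],
    ["home", "products", "cart"],
    ["home", "search", "products", "reviews"],
    ["search", "products", "cart", "checkout"],
]

def build_transitions(sessions):
    """Count how often each page is followed directly by each other page."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current, following in zip(pages, pages[1:]):
            transitions[current][following] += 1
    return transitions

def predict_next(page, transitions):
    """Return the most frequently observed next page, or None if the page was never left."""
    if page not in transitions:
        return None
    return transitions[page].most_common(1)[0][0]

transitions = build_transitions(SESSIONS)
print(predict_next("products", transitions))
print(predict_next("checkout", transitions))
```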
Sentiment analysis (Opinion Mining)
• Sentiment analysis aims to determine the
attitude of a speaker or a writer with respect to
some topic or the overall contextual polarity of a
document. The attitude may be his or her
judgment or evaluation, affective state (that
is to say, the emotional state of the author
when writing), or the intended emotional
communication.
A basic task in sentiment analysis is
classifying the polarity (+, -) of a given text
at the document, sentence, or
feature/aspect level: whether the
expressed opinion in a document, a
sentence, or an entity feature/aspect is
positive, negative, or neutral. Advanced,
"beyond polarity" sentiment classification
looks, for instance, at emotional states
such as "angry," "sad," and "happy."
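A minimal sketch of polarity classification, assuming a hand-made word list: count positive and negative words and compare. The lexicons and function names are invented for this example; practical systems typically use much larger lexicons or trained classifiers.

```python
# Tiny illustrative lexicons (assumptions, not a standard resource).
POSITIVE = {"good", "great", "excellent", "love", "tasty"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}

def polarity(text):
    """Classify text as positive, negative, or neutral by counting lexicon hits."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def sentence_polarities(document):
    """Sentence-level analysis: split on full stops and classify each sentence."""
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    return [(s, polarity(s)) for s in sentences]

review = "The food was great and tasty. The service was slow. We arrived at noon."
print(polarity(review))             # document level
print(sentence_polarities(review))  # sentence level
```

The same document can be positive overall while containing negative sentences, which is why the level of analysis (document, sentence, or aspect) matters.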
Applications
– Detecting the polarity of product reviews and movie
reviews.
– Extending the classification of a movie review as either positive or negative
to predicting star ratings on either a 3- or a 4-star scale.
– Analysis of restaurant reviews, predicting ratings for
various aspects of the given restaurant, such as the food
and atmosphere (on a five-star scale).
Social network analysis
Social network analysis (SNA) is the use of
network theory to analyse social networks. Social
network analysis views social relationships in
terms of network theory, consisting of nodes,
representing individual actors within the network,
and ties which represent relationships between
the individuals, such as friendship, kinship,
organizations and sexual relationships. These
networks are often depicted in a social network
diagram, where nodes are represented as points
and ties are represented as lines. (NodeXL)
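The node-and-tie view can be sketched as an adjacency structure, with degree centrality as one simple measure of how connected each actor is. The names and ties below are invented, and degree centrality is only one of several measures used in SNA.

```python
from collections import defaultdict

# Hypothetical friendship ties, invented for illustration.
TIES = [("Ann", "Bob"), ("Ann", "Cara"), ("Bob", "Cara"), ("Cara", "Dan")]

def build_network(ties):
    """Build an undirected network: node -> set of neighbouring nodes."""
    network = defaultdict(set)
    for a, b in ties:
        network[a].add(b)
        network[b].add(a)
    return network

def degree_centrality(network):
    """Share of the other nodes each node is directly tied to."""
    n = len(network)
    if n < 2:
        return {node: 0.0 for node in network}
    return {node: len(neighbours) / (n - 1) for node, neighbours in network.items()}

network = build_network(TIES)
centrality = degree_centrality(network)
most_central = max(centrality, key=centrality.get)
print(centrality)
print(most_central)
```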
Human SNS Graph
III. Text Mining Cases
• Cases on text mining
IV. Text Mining Techniques
• R
• Python
• Open API