CAPTURING THE VALUE OF UNSTRUCTURED DATA:
INTRODUCTION TO TEXT MINING
Mary-Elizabeth (“M-E”) Eddlestone
Principal Systems Engineer, Analytics
SAS Customer Loyalty, SAS Institute, Inc.
Copyright © 2013, SAS Institute Inc. All rights reserved.
Is there valuable information “locked away” in your unstructured data?
CURRENT SITUATION:
COMMON QUESTIONS ABOUT TEXTUAL DATA SOURCES
Are there hidden insights within text data sources that can help my organization?
Such as call center notes, emails, news, government filings, social media…
Can I also use text data to analyze and predict the future?
To reduce fraud, reduce churn, improve sales, reduce costs…
How can I leverage our textual data sources?
What value can they bring?
How can I leverage both unstructured and structured data sources?
Customer data + customer feedback?
We need to get the most from text data!
WHAT IF YOU COULD…
Extract key information from text data, e.g. people, places, companies?
See how things are related to each other across a large number of documents and messages?
Discover main ideas/topics across all documents and messages?
Find patterns across text and non-text data that can predict the future?
WHAT IF YOU COULD…
Discover new insights from large text data sources
Extract key patterns from text data to predict the future
Customers
Discover current topics about your products from customer opinions
Find patterns within customer feedback that predict strong interest in upsell opportunities
Fraud
Detect anomalies from the usual topics described in text reports, text applications or feedback
Find patterns in reports that may predict or relate to suspicious behavior
Public Opinion
Understand previously unknown issues/concerns from citizen discussions on Twitter/forums
Extract key opinions from citizen feedback to forecast citizen sentiment in the near future
WHERE IS TEXT MINING USED?
Text Mining has numerous applications in any industry
Government: Detect fraudulent activity. Spot emerging trends and public concerns.
Finance: Retention of current customer base using call center transcriptions or transcribed audio. Identification of potentially fraudulent activities.
Insurance: Identify fraudulent claims. Track competitive intelligence. Brand management.
Retail: Identify the most profitable customers and the underlying reasons for their loyalty. Brand management.
Manufacturing: Reduce time to detect root cause of product issues. Identify trends in market segments.
Telecommunications: Help prevent churn and suggest up-sell/cross-sell opportunities for individual customers.
Life Sciences: Identify adverse events. Recommend appropriate research materials.
TEXT MINING
SAS® Text Analytics
Domain-Driven: Information Organization and Access
• SAS Enterprise Content Categorization
• SAS Ontology Management
Analysis-Driven: Predictive Modeling, Discover Trends and Patterns
• SAS Text Miner
• SAS Sentiment Analysis
SAS® TEXT MINER
• A complete solution for discovering insights or predicting behaviour and outcomes, leveraging the data mining capabilities of SAS® Enterprise Miner™ and SAS natural language processing (NLP) and advanced linguistic technologies.
• What is Concept Extraction?
  • Automatically locating and extracting the key information from documents, based on rules and advanced linguistic logic.
• What is Concept Linking?
  • Looking within a large corpus of text documents to discover how concepts/key information are associated or linked with each other.
• What is Topic Discovery?
  • Analysing a large corpus of text documents to discover topics by grouping messages that have very similar content.
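To make the idea of Concept Linking concrete, here is a minimal sketch in Python. It is an assumption-laden illustration, not how SAS Text Miner computes link strength: it simply counts how many documents mention two concepts together. The sample concept sets and the function names `link_counts` and `linked_to` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical documents, already reduced to the set of concepts each one mentions.
doc_concepts = [
    {"brake", "noise", "dealer"},
    {"brake", "noise"},
    {"battery", "dealer"},
    {"brake", "pedal", "noise"},
]

def link_counts(docs):
    """Count, for every pair of concepts, how many documents mention both."""
    pairs = Counter()
    for concepts in docs:
        for a, b in combinations(sorted(concepts), 2):
            pairs[(a, b)] += 1
    return pairs

def linked_to(concept, pairs):
    """Return the concepts that co-occur with `concept`, strongest link first."""
    related = Counter()
    for (a, b), n in pairs.items():
        if concept == a:
            related[b] += n
        elif concept == b:
            related[a] += n
    return related.most_common()

print(linked_to("brake", link_counts(doc_concepts)))
```

A raw co-occurrence count favors frequent concepts; a production tool would normally correct for how common each concept is on its own.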
HOW DOES TEXT MINING WORK?
EXPLORING & DISCOVERING INSIGHTS
1. Input text messages: e.g. Twitter data, reports, email, news, forum messages
2. Parse & explore text data: break down text and explore relationships of key concepts such as persons, places, organizations…
3. Discover topics: cluster documents of similar content and describe them with important key words
HOW DOES TEXT MINING WORK?
DISCOVER PATTERNS FOR PREDICTIVE MODELING
1. Input text messages with relevant structured data: e.g. email, call center notes, applications, together with customer data
2. Parse text data and discover topics: break down text into structured data, group messages of similar content
3. Predictive modeling with text data: text data fed into models may provide reliable information to predict outcomes and behavior, e.g. to predict activity that is likely fraudulent
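As a rough illustration of feeding text alongside structured data into a model, the sketch below builds one feature row from a structured record plus 0/1 flags for terms found in a free-text note. The record fields, the note, the flagged terms, and the function name `build_features` are all hypothetical; any classifier could consume rows of this shape.

```python
def build_features(record, note_text, flag_terms):
    """Combine structured fields with 0/1 indicators for terms found in the free-text note."""
    words = set(note_text.lower().split())
    features = dict(record)  # copy the structured fields unchanged
    for term in flag_terms:
        features[f"mentions_{term}"] = int(term in words)
    return features

# Hypothetical claim record, adjuster note, and terms of interest.
record = {"claim_amount": 4200.0, "prior_claims": 3}
note = "Customer reports the vehicle was stolen overnight with no witnesses"
row = build_features(record, note, flag_terms=["stolen", "witnesses", "fire"])
print(row)
```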
WHAT CAN WE DISCOVER?
Discover relationships between concepts described in a large corpus of text data –
how are persons, places, organizations related?
Discover topics mentioned in text data–
what are main topics mentioned? What are the rare topics?
Discover patterns related to structured data –
e.g. how is feedback related to customer purchase behavior?
EXAMPLE – DISCOVERING INSIGHTS
FROM CUSTOMER COMPLAINT DATA
From customer complaints to engineer logs to legal documents, it is a considerable challenge to draw insights from large amounts of information, and doing so manually is usually infeasible.
This is even more difficult when we wish to detect concepts and patterns within the documents in order to find trends and detect high-risk events.
THE DRIVER SIDE SEAT BELT SOMETIMES
FAILS TO RETRACT. WHEN I PULLED THE
BELT OUT, IT STAYED OUT AND WOULD
NOT RETRACT. I INSPECTED THE AREA
AND FOUND NO INTERFERENCE. THIS
HAPPENED ON A SAT. I DROVE THE
VEHICLE SAT. AND SUN WITH A FAULTY
BELT. I CALLED THE DEALERS SERVICE
DEPT. TOLD THEM THE PROBLEM BUT
COULDN'T GET IN FOR A WEEK.
How can we analyse millions of documents quickly and identify key patterns and cases of high risk? (e.g. risk of fraudulent activity)
EXAMPLE – DISCOVERING INSIGHTS
FROM CUSTOMER COMPLAINT DATA
SAS Text Miner automates the manual comprehension of text documents, uncovering relationships and trends among concepts mentioned across documents, allowing drill-down analysis, and integrating with predictive modeling within SAS Enterprise Miner.
In this example, we look at a large database of car fault records.
THE DRIVER SIDE SEAT BELT
SOMETIMES FAILS TO RETRACT.
WHEN I PULLED THE BELT OUT, IT
STAYED OUT AND WOULD NOT
RETRACT. I INSPECTED THE AREA
AND FOUND NO INTERFERENCE…
Here, SAS Text Miner runs Text Parsing on thousands of car fault reports:
• Recognizing and extracting entities and parts of speech
• Supporting a wide range of languages
• Producing a detailed term/document matrix
• Allowing deeper analysis and visualization of insights
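For readers unfamiliar with a term/document matrix, the following Python sketch shows one simple way to build it: one row per term, one column per document, each cell holding a count. The three sample complaints and the helper names are hypothetical, and the tokenizer is deliberately naive compared with the linguistic parsing described above.

```python
import re
from collections import Counter

# Hypothetical mini-corpus; the first document echoes the complaint quoted earlier.
documents = [
    "The driver side seat belt sometimes fails to retract.",
    "Brake pedal goes soft and the brake light stays on.",
    "Seat belt warning light stays on after the belt is buckled.",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def term_document_matrix(docs):
    """Return (sorted vocabulary, matrix) where matrix[i][j] counts term i in document j."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix

vocabulary, matrix = term_document_matrix(documents)
for term in ("belt", "brake", "light"):
    print(term, matrix[vocabulary.index(term)])
```

Most cells in such a matrix are zero, which is why real tools store it in a sparse form.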
EXAMPLE – DISCOVERING INSIGHTS
FROM CUSTOMER COMPLAINT DATA
This allows us to discover relationships between concepts across all messages –
e.g. what is usually mentioned with issues such as “brake problems”?
Discover topics mentioned in text data– e.g.
Understand the main topics: “dealerships”…
Uncover the emerging topics: “Battery issues”…
Discover patterns related to structured data –
e.g. Complaints mentioning “engine trouble” are associated with a higher chance of car accidents
EXAMPLE – DISCOVERING INSIGHTS
FROM CUSTOMER COMPLAINT DATA
How does this help?
• Discovery of new insights/topics:
  • Text data (forum messages, emails, logs, records) typically contains rich, yet sparse or uncommon, insights. Text mining allows you to:
    • Parse and extract information from text data
    • Reliably filter and retain important information
    • Automatically group documents into similar topics, allowing discovery of important/large topics or rare/small topics
• Text mining input in predictive modeling:
  • Documents and records often contain important facts that can reliably predict outcomes, e.g. any mention of bad maintenance habits will likely result in earlier car failure
  • Empowered by SAS Natural Language Processing and wide multi-language support, text mining discovers key trends within large amounts of text, to be used as clean, reliable input in data mining analysis.
BENEFITS
SAS Text Miner helps your organization to:
• Uncover previously undetected associations and relationships
• Get a complete view of your data, and drill down to specific documents for more insight
• Automate time-consuming tasks of reading and understanding text
• Analyse both text and non-text data to produce predictive models that spot more opportunities and recognize trends more accurately
Discover hidden patterns from text data for insights and predictive modeling!
SAS® TEXT MINER
SAS® TEXT MINER – ANALYTICAL WORKFLOW
[Workflow diagram; stage labels: Raw Data, Text Mining, Model with Structured and Unstructured Data]
EXAMPLE TEXT MINING PROCESS FLOWS
Start with a table that contains either:
- Documents saved as a variable (column)
- A column that points to physical text files
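The two input layouts can be sketched in Python as follows. This is only an illustration under stated assumptions: the column names `complaint` and `path`, the sample text, and the helper `load_documents` are hypothetical, and a temporary file stands in for the physical text files.

```python
import tempfile
from pathlib import Path

def load_documents(rows, text_column=None, path_column=None):
    """Return the document text for each row, reading from disk when a path column is given."""
    if (text_column is None) == (path_column is None):
        raise ValueError("specify exactly one of text_column or path_column")
    if text_column is not None:
        return [row[text_column] for row in rows]
    return [Path(row[path_column]).read_text(encoding="utf-8") for row in rows]

# Form 1: the full text is stored in a column of the table.
inline_rows = [{"id": 1, "complaint": "Seat belt fails to retract."}]
inline_docs = load_documents(inline_rows, text_column="complaint")

# Form 2: a column points to a physical text file.
with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp) / "complaint_1.txt"
    file_path.write_text("Seat belt fails to retract.", encoding="utf-8")
    pointer_rows = [{"id": 1, "path": str(file_path)}]
    pointer_docs = load_documents(pointer_rows, path_column="path")

print(inline_docs == pointer_docs)
```

Either way, later steps see the same list of document strings.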
EXAMPLE INPUT DATA: VARIABLE CONTAINS FULL TEXT
EXAMPLE INPUT DATA: VARIABLE CONTAINS POINTER TO TEXT FILE
EXAMPLE TEXT MINING PROCESS FLOWS
Apply natural language processing algorithms to parse
the documents and quantify information about the
terms in the corpus.
TEXT PARSING NODE
• Tokenization: break sentences or documents into terms
• Stemming: identify the root form of a word (run, runs, running, ran, etc.)
• Synonyms
• Remove low-information words such as a, an, and the (stop list)
• Part of speech identification (noun, verb, etc.)
• Identify Standard and Custom Entities (names, places, etc.)
  • Multiword terms or phrases (“blue screen of death”)
  • Import custom entities, facts, and events as defined in SAS Enterprise Content Categorization (ECC)
  • Include negation entities from SAS ECC for Sentiment Analysis
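A few of the parsing steps listed above (tokenization, a stop list, mapping words to a root form, and keeping a multiword term whole) can be sketched in Python. This is a toy under stated assumptions, not the Text Parsing node's logic: the stop list, the stem table, the phrase list, and the function `parse` are all hypothetical, and a lookup table stands in for real stemming.

```python
import re

STOP_LIST = {"a", "an", "the", "and", "to", "of"}         # hypothetical stop list
STEMS = {"runs": "run", "running": "run", "ran": "run"}    # hypothetical root-form table
MULTIWORD_TERMS = ["blue screen of death"]                  # hypothetical phrase list

def parse(text):
    """Tokenize text, keep listed multiword terms whole, drop stop words, map words to root forms."""
    text = text.lower()
    # Protect multiword terms by joining their words with underscores before tokenizing.
    for phrase in MULTIWORD_TERMS:
        text = text.replace(phrase, phrase.replace(" ", "_"))
    tokens = re.findall(r"[a-z_]+", text)
    return [STEMS.get(tok, tok) for tok in tokens if tok not in STOP_LIST]

print(parse("The laptop ran fine, then a blue screen of death; it keeps running hot."))
```

Note that "ran" cannot be reduced to "run" by stripping a suffix, which is why the sketch uses a lookup rather than a rule.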
SUPPORTED LANGUAGES
Arabic, Chinese, Dutch, English, French, German, Italian, Japanese, Korean, Polish, Portuguese, Spanish, Swedish, Czech, Danish, Finnish, Greek, Hebrew, Hungarian, Indonesian, Norwegian, Romanian, Russian, Slovak, Thai, Turkish, and Vietnamese
New in SAS 9.3: Russian, Greek, Vietnamese, Turkish, Czech, Indonesian, Thai, Danish, Norwegian, Slovak, Finnish, Romanian, Hebrew, Hungarian, Korean
EXAMPLE TEXT MINING PROCESS FLOWS
Perform spell-checking and refine synonym lists.
Discover related concepts using Concept Linking.
Perform full text search. Subset documents and/or
terms for further analysis.
TEXT FILTER NODE
• Spell checking
• Concept Linking
• Full text search
• Define additional synonyms
• Subsetting: manage which terms and documents are passed to subsequent nodes
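Two of the filtering ideas above, synonym lists and subsetting of terms, can be illustrated with a short Python sketch. The synonym table, the sample documents, the `min_docs` threshold, and the function names are hypothetical; the Text Filter node offers its own weighting and filtering options, which this does not reproduce.

```python
from collections import Counter

SYNONYMS = {"automobile": "car", "vehicle": "car"}  # hypothetical synonym list

def apply_synonyms(tokens):
    """Replace each token with its preferred synonym, if one is defined."""
    return [SYNONYMS.get(t, t) for t in tokens]

def keep_terms(docs, min_docs=2):
    """Return the terms that occur in at least `min_docs` documents."""
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    return {t for t, n in doc_freq.items() if n >= min_docs}

# Hypothetical documents, already tokenized.
parsed_docs = [
    ["vehicle", "stalls", "cold"],
    ["car", "stalls", "highway"],
    ["automobile", "rattle"],
]
normalized = [apply_synonyms(d) for d in parsed_docs]
kept = keep_terms(normalized, min_docs=2)
filtered = [[t for t in d if t in kept] for d in normalized]
print(filtered)
```

Applying synonyms before counting matters here: without it, "vehicle", "car", and "automobile" would each appear in only one document and all three would be dropped.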
FILTER VIEWER
SAS Text Mining
CONCEPT LINKING
EXAMPLE TEXT MINING PROCESS FLOWS
Analyze the documents to create topics and assign each
document to one or more topics. In addition to derived
topics, users can add their own topic definitions.
TEXT TOPIC NODE
• Multiple topics per document
• Soft clustering using rotated SVD (PROC SVD followed by PROC FACTOR)
• Allows automatic creation of single and multi-word topics
• User-defined topics and editing of automatic topics
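To give a feel for the SVD step behind topic discovery, the Python sketch below approximates only the leading singular value and vectors of a small term/document matrix by power iteration. It is a simplified stand-in under stated assumptions: it does not reproduce PROC SVD or the PROC FACTOR rotation, it extracts a single component, and the matrix, term list, and function name `top_singular_triplet` are hypothetical.

```python
import math

def top_singular_triplet(matrix, iterations=200):
    """Approximate the leading singular value and vectors of `matrix` by power iteration."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # start vector over documents
    for _ in range(iterations):
        # u = A v (normalized), then v = A^T u; the norm of A^T u estimates sigma.
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        v = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        v = [x / sigma for x in v]
    return sigma, u, v

# Hypothetical term/document counts: rows are terms, columns are three documents.
terms = ["belt", "seat", "brake", "pedal"]
matrix = [
    [2, 1, 0],
    [1, 1, 0],
    [0, 0, 2],
    [0, 0, 1],
]
sigma, term_weights, doc_weights = top_singular_triplet(matrix)
top_terms = sorted(zip(term_weights, terms), reverse=True)[:2]
print([t for _, t in top_terms], [round(w, 3) for w in doc_weights])
```

The term weights describe the topic with its strongest key words, and the document weights say how strongly each document loads on it. Because a document can carry weight on several such components, this style of analysis allows multiple topics per document.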
INTERACTIVE TOPIC VIEWER
EXAMPLE TEXT MINING PROCESS FLOWS
Analyze the documents to create clusters and assign
each document to a single cluster.
CLUSTER VIEWER
EXAMPLE TEXT MINING PROCESS FLOWS
Clusters can be further explored using the Segment
Profile node to identify factors that differentiate data
segments from the population.
SEGMENT PROFILE
• The Segment Profile node is available on the Assess tab of Enterprise Miner.
• It allows the examination of segmented or clustered data to identify factors that differentiate data segments from the population.
EXAMPLE TEXT MINING PROCESS FLOWS: PREDICTION
Several methods are available to use the unstructured
data to create predictions.
LEARNING MORE
SAS® TEXT MINER RESOURCES
SAS Text Miner Product Web Site
http://www.sas.com/text-analytics/text-miner/index.html
SAS Text Miner Technical Support Web Site
http://support.sas.com/software/products/txtminer/index.html
SAS Text Miner Technical Forum (Join Today!)
https://communities.sas.com/community/supportcommunities/sas_data_mining_and_text_mining
SAS Training
Data Miner Training Path: http://support.sas.com/training/us/paths/dm.html
Courses for SAS® Text Miner:
https://support.sas.com/edu/prodcourses.html?code=TM&ctry=US
Step-by-step how-to guide
http://support.sas.com/documentation/onlinedoc/txtminer/index.html
Data for the step-by-step how-to guide
DISCUSSION FORUMS
http://communities.sas.com
https://communities.sas.com/community/support-communities/text-analytics
COMPLIMENTARY ON-DEMAND WORKSHOPS
http://www.sas.com/reg/offer/corp/handson
THANK YOU FOR USING SAS!
www.SAS.com