Strategies for Choosing Your
Next Text Analytics Project
Andrew Fast
Chief Scientist
Elder Research, Inc.
[email protected]
Text Analytics World – Boston 2013
Trough of Disillusionment
Source: Hype Cycle for Emerging Technologies 2012, Gartner
Focus on The Business Side
•  Analysis of ~150 projects performed by Elder Research, Inc.
–  Technical Success (95%)
–  Technical Failure (5%)
–  Business Success (66%)
–  Business Failure (33%)
Strategies for Success
•  Success happens at the intersection of:
–  Business Need: the problem being tackled
–  Data Availability: the data being analyzed
–  Technical Resources: the people and software currently available
(Graphic inspired by Paul Cohen, Empirical Methods for Artificial Intelligence, MIT Press)
Focus on Business Need
•  Manage Expectations
•  Choose A Ripe Problem
•  Favor High Returns
Manage Expectations
•  Software purchase(s) needed, unless you want to do analytics by hand.
•  Time is required to tune the system and/or label data.
•  The first solution will likely need to be refined.
–  Initial performance is often low
Text Mining… is there anything it can’t do?
Choose a Ripe Problem
•  Gain Expected
–  Leverageable: an incremental improvement will matter
–  “Low-hanging fruit”: nobody’s yet dared attack the problem
•  Addresses major pain point
–  Move towards the ideal process
Favor High Returns
•  Conduct a pilot project. Simultaneously:
–  “Hit a single”: automate a key task, or create a dashboard or graphic
–  “Swing for the fences”: attack a core weakness
•  Prove ROI early
–  Make allies and decision-makers look good
Focus on Data Availability
•  Be Intentional About Variety
•  Don’t wait for Perfect Data
Be Intentional About Variety
•  Enhance the value of text with structured data (or other unstructured data)
•  Structured data is unambiguous (unlike text)
•  Helps to steer text analytics
•  Big Data systems “integrate information from varied sources for deeper/broader understanding”
–  Sue Feldman, CEO of Synthexis (TAW San Francisco, 2013)
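One way to picture text being steered by structured data is to split simple word counts by a structured field. The sketch below is only an illustration: the ticket records, the field names `region` and `notes`, and the helper `term_counts_by_group` are all hypothetical and do not come from the talk.

```python
from collections import Counter

# Hypothetical records: one structured field ("region") plus free text ("notes").
tickets = [
    {"region": "east", "notes": "billing error on invoice"},
    {"region": "east", "notes": "invoice billing dispute"},
    {"region": "west", "notes": "login failure after update"},
]

def term_counts_by_group(records, group_field, text_field):
    """Count word occurrences in the text field, split by a structured field."""
    counts = {}
    for record in records:
        group = record[group_field]
        # Naive whitespace tokenization; a real project would use a proper tokenizer.
        words = record[text_field].lower().split()
        counts.setdefault(group, Counter()).update(words)
    return counts

by_region = term_counts_by_group(tickets, "region", "notes")
for region, counter in sorted(by_region.items()):
    print(region, counter.most_common(2))
```

Grouping by the structured field shows which terms dominate in each segment, which the text alone would not reveal.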
Don’t Wait For Perfect Data
http://www.sas.com/knowledge-exchange/business-analytics/operationalizing-analytics/cant-start-yet-im-waiting-on-perfect-data/index.html
•  Perfect data are never available!
–  Don’t wait for your data to be perfect
–  Don’t assume what you have is perfect
•  Can you acquire more data?
Focus On Technical Resources
•  Focus on People, not Technology
•  Shake Things Up
Focus on People, not Technology
•  Interdisciplinary
–  Experts needed in business area, statistics, algorithms, linguistics, and databases
•  Business Champion is essential!
•  Close ties between analytic experts and key personnel
–  Transfer technology
–  Internalize essential steps
Shake Things Up
•  Increase your available skill set through training, hiring, etc.
•  Try a new software tool
•  Analyze a new dataset
•  Explore a new practice area of text analytics
Text Mining Taxonomy
Text Mining Foundations
•  Are you interested in results about individual words or at a higher level (i.e., sentences, paragraphs or documents)?
    –  Documents: Do you want to sort all documents into categories or search for specific documents?
        •  Search: Information Retrieval
        •  Sort: Do you have categories already?
            –  No Categories: Document Clustering
            –  Have Categories: Are your documents independent or connected via hyperlinks?
                •  Independent: Document Classification
                •  Connected: Web Mining
    –  Words: Do you want to automatically identify specific facts or gain overall understanding?
        •  Specific Facts: Information Extraction
        •  Understanding: Is your focus on the meaning of the text or the structure?
            –  Structure: Natural Language Processing
            –  Meaning: Concept Extraction
From Chapter 2
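The taxonomy's questions can be read as a small decision procedure. The following is a minimal sketch of that reading; the function name `choose_practice_area`, its argument names, and the string encodings of the answers are assumptions made for illustration, and only the questions and practice-area names come from the slide.

```python
def choose_practice_area(level, task, *, have_categories=False,
                         connected=False, focus="meaning"):
    """Walk the taxonomy's questions and return the matching practice area.

    level: "documents" or "words"
    task:  "sort" or "search" for documents;
           "specific facts" or "understanding" for words
    The keyword arguments answer the follow-up questions where they apply.
    """
    if level == "documents":
        if task == "search":
            return "Information Retrieval"
        if task == "sort":
            if not have_categories:
                return "Document Clustering"
            # Categories exist: are documents independent or hyperlinked?
            return "Web Mining" if connected else "Document Classification"
    elif level == "words":
        if task == "specific facts":
            return "Information Extraction"
        if task == "understanding":
            # Focus on structure vs. meaning of the text.
            if focus == "structure":
                return "Natural Language Processing"
            return "Concept Extraction"
    raise ValueError(f"unrecognized answers: level={level!r}, task={task!r}")


print(choose_practice_area("documents", "sort", have_categories=True))
print(choose_practice_area("words", "understanding", focus="structure"))
```

Unrecognized answers raise an error rather than falling through to a default, so a misspelled answer is not silently mapped to a practice area.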
Strategies for Success
1.  Manage Expectations
2.  Choose A Ripe Problem
3.  Favor High Returns
4.  Be Intentional About Variety
5.  Don’t wait for Perfect Data
6.  Focus on People, not Technology
7.  Shake Things Up
Contact Information
Andrew Fast, Ph.D.
Chief Scientist
[email protected]
(434) 973-7673
www.datamininglab.com
Practical Text Mining
•  Winner of the 2012 PROSE award for Computing and Information Science
•  Written for a technical audience seeking more text experience
•  Includes trial versions of major software tools
Andrew Fast
Chief Scientist, Elder Research, Inc.
Dr. Andrew Fast leads research in Text Mining and Social
Network Analysis at Elder Research, the nation’s leading data
mining consultancy. ERI was founded in 1995 and has offices in
Charlottesville, VA and Washington, DC
(www.datamininglab.com). ERI focuses on Federal, commercial,
investment, and security applications of advanced analytics,
including stock selection, image recognition, biometrics,
process optimization, cross-selling, drug efficacy, credit
scoring, risk management, and fraud detection.
Dr. Fast graduated Magna Cum Laude from Bethel University and earned Master’s
and Ph.D. degrees in Computer Science from the University of Massachusetts
Amherst. There, his research focused on causal data mining and mining complex
relational data such as social networks. At ERI, Andrew leads the development of
new tools and algorithms for data and text mining for applications of capabilities
assessment, fraud detection, and national security.
Dr. Fast has published on an array of applications including detecting securities
fraud using the social network among brokers, and understanding the structure
of criminal and violent groups. Other publications cover modeling peer-to-peer
music file sharing networks, understanding how collective classification works,
and predicting playoff success of NFL head coaches (work featured on ESPN.com).
With John Elder and other co-authors, Andrew wrote the book Practical
Text Mining, which was awarded the PROSE Award for Computing and Information
Science in 2012.