Strategies for Choosing Your Next Text Analytics Project
Andrew Fast
Chief Scientist, Elder Research, Inc.
[email protected]
Text Analytics World – Boston 2013

Trough of Disillusionment
Source: Hype Cycle for Emerging Technologies 2012, Gartner

Focus on The Business Side
• Analysis of ~150 projects performed by Elder Research, Inc.
  – Technical Success (95%), Technical Failure (5%)
  – Business Success (66%), Business Failure (33%)

Strategies for Success
• Success happens at the intersection of:
  – Business Need: the problem being tackled
  – Data Availability: the data being analyzed
  – Technical Resources: the people and software currently available
(Graphic inspired by Paul Cohen, Empirical Methods for Artificial Intelligence, MIT Press)

Focus on Business Need
• Manage Expectations
• Choose A Ripe Problem
• Favor High Returns

Manage Expectations
• Software purchase(s) needed, unless you want to do analytics by hand.
• Time is required to tune system and/or label data.
• The first solution will likely need to be refined.
  – Initial performance often low
Text Mining…is there anything it can’t do?

Choose a Ripe Problem
• Gain Expected
  – Leverageable: an incremental improvement will matter
  – “Low-hanging fruit”: nobody’s yet dared attack the problem
• Addresses major pain point
  – Move towards the ideal process

Favor High Returns
• Conduct a pilot project.
  Simultaneously:
  – “Hit a single”: automate key task, create dashboard, or graphic
  – “Swing for fences”: attack core weakness
• Prove ROI early
  – Make allies and decisionmakers look good

Focus on Data Availability
• Be Intentional About Variety
• Don’t wait for Perfect Data

Be Intentional About Variety
• Enhance the value of text with structured data (or other unstructured data)
• Structured data is unambiguous (unlike text)
• Helps to steer text analytics
• Big Data systems “integrate information from varied sources for deeper/broader understanding”
  – Sue Feldman, CEO of Synthexis (TAW San Francisco, 2013)

Don’t Wait For Perfect Data
http://www.sas.com/knowledge-exchange/business-analytics/operationalizing-analytics/cant-start-yet-im-waiting-on-perfect-data/index.html
• The perfect data are never available!
  – Don’t wait for it to be perfect
  – Don’t assume what you have is perfect
• Can you acquire more data?

Focus On Technical Resources
• Focus on People, not Technology
• Shake Things Up

Focus on People, not Technology
• Interdisciplinary
  – Experts needed in business area, statistics, algorithms, linguistics, and databases
• Business Champion is essential!
• Close ties between analytic experts and key personnel
  – Transfer technology
  – Internalize essential steps

Shake Things Up
• Increase your available skill set through training, hiring, etc.
• Try a new software tool
• Analyze a new dataset
• Explore a new practice area of text analytics

Text Mining Taxonomy (From Chapter 2)
Text Mining Foundations
• Are you interested in results about individual words or at a higher level (i.e., sentences, paragraphs, or documents)?
  – Documents: Do you want to sort all documents into categories or search for specific documents?
    – Search: Information Retrieval
    – Sort: Do you have categories already?
      – No Categories: Document Clustering
      – Have Categories: Are your documents independent or connected via hyperlinks?
        – Independent: Document Classification
        – Connected: Web Mining
  – Words: Do you want to automatically identify specific facts or gain overall understanding?
    – Specific Facts: Information Extraction
    – Understanding: Is your focus on the meaning of the text or the structure?
      – Structure: Natural Language Processing
      – Meaning: Concept Extraction

Strategies for Success
Business Need
1. Manage Expectations
2. Choose A Ripe Problem
3. Favor High Returns
Data Availability
4. Be Intentional About Variety
5. Don’t wait for Perfect Data
Technical Resources
6. Focus on People, not Technology
7. Shake Things Up

Contact Information
Andrew Fast, Ph.D.
Chief Scientist
[email protected]
(434) 973-7673
www.datamininglab.com

Practical Text Mining
• Winner of the 2012 PROSE award for Computing and Information Science
• Written for a technical audience seeking more text experience
• Includes trial versions of major software tools

Andrew Fast
Chief Scientist, Elder Research, Inc.
Dr. Andrew Fast leads research in Text Mining and Social Network Analysis at Elder Research, the nation’s leading data mining consultancy. ERI was founded in 1995 and has offices in Charlottesville VA and Washington DC (www.datamininglab.com). ERI focuses on Federal, commercial, investment, and security applications of advanced analytics, including stock selection, image recognition, biometrics, process optimization, cross-selling, drug efficacy, credit scoring, risk management, and fraud detection. Dr. Fast graduated Magna Cum Laude from Bethel University and earned Master’s and Ph.D. degrees in Computer Science from the University of Massachusetts Amherst. There, his research focused on causal data mining and mining complex relational data such as social networks.
At ERI, Andrew leads the development of new tools and algorithms for data and text mining, with applications in capabilities assessment, fraud detection, and national security. Dr. Fast has published on an array of applications, including detecting securities fraud using the social network among brokers and understanding the structure of criminal and violent groups. Other publications cover modeling peer-to-peer music file sharing networks, understanding how collective classification works, and predicting playoff success of NFL head coaches (work featured on ESPN.com). With John Elder and other co-authors, Andrew has written a book, Practical Text Mining, which was awarded the PROSE Award for Computing and Information Science in 2012.
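
Appendix: the Text Mining Taxonomy as code
The questions on the Text Mining Taxonomy slide form a small decision tree, which can be sketched in a few lines of Python. This is only an illustration, assuming the branch order read off the slide. The function name `practice_area` and its parameter names and string values are hypothetical and do not come from the talk or the book.

```python
def practice_area(unit, goal, has_categories=False, linked=False, focus=None):
    """Walk the taxonomy questions and return the matching practice area.

    unit:           "documents" or "words" (level of the results you want)
    goal:           "sort" or "search" for documents;
                    "facts" or "understanding" for words
    has_categories: for sorting, whether categories already exist
    linked:         for sorting with categories, whether documents are
                    connected via hyperlinks
    focus:          for understanding, "meaning" or "structure"
    """
    if unit == "documents":
        if goal == "search":
            return "Information Retrieval"
        if goal == "sort":
            if not has_categories:
                return "Document Clustering"
            return "Web Mining" if linked else "Document Classification"
    elif unit == "words":
        if goal == "facts":
            return "Information Extraction"
        if goal == "understanding":
            if focus == "structure":
                return "Natural Language Processing"
            if focus == "meaning":
                return "Concept Extraction"
    # Any combination not covered by the slide's questions ends up here.
    raise ValueError("answers do not match a branch of the taxonomy")


if __name__ == "__main__":
    # Example: sorting documents when no categories exist yet.
    print(practice_area("documents", "sort", has_categories=False))
```

Answering the questions for a candidate project this way is one quick way to name the practice area to explore under "Shake Things Up".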