Top 10 Data Mining Mistakes
by John Elder

You’ve made a mistake if you…
0. Lack Data
1. Focus on Training
2. Rely on One Technique
3. Ask the Wrong Question
4. Listen (only) to the Data
5. Accept Leaks from the Future
6. Discount Pesky Cases
7. Extrapolate
8. Answer Every Inquiry
9. Sample Casually
10. Believe the Best Model

Copyright © 2012 Elder Research, Inc.

Why business mistakes?
• Not all data mining projects are a success
• Approximately 90% of projects meet their technical goals
• Approximately 65% of solutions are actually deployed at the client organization

Who benefits?
• Business leaders: need to establish the best environment and culture to ensure technical and business success
• Technical leaders: need to understand the obstacles the business client is facing and to avoid those mistakes, directly and indirectly

You might be making a data mining business mistake if you…

#1: Fail to define an objective
• Without a clear objective, data mining can be an exercise in futility
• Find a pain point and find the best approach to solve it

#2: Start too big
• Developing a transformational data mining service can be a major undertaking requiring extensive energy and resources
• Unless there is complete organizational investment, the task can be overwhelming and quickly result in frustration and failure
• Starting small allows the organization to get a feel for what it takes to succeed
• Shoot for an early “small success”

#3: Lack support from the keepers of the data
• Modelers need not only timely access to data, but also information about the data
• Access is needed to the people who are familiar with the data and know:
  – how it is collected and maintained
  – why it is messy and/or incomplete
  – what each data field means
  – how the data is used
• These people can also help ensure that the transformed data properly represents the business use and understanding

#4: Wait for perfect data
• No matter how long one works at it, data will never be perfect
• Good modelers expect to work with messy data and have tools to deal with it
• Give them the data you have and let them go to work

#5: Believe you have perfect data
• Understanding, cleansing, and preparing data accounts for 65-80% of time on data mining engagements
• Even with relatively clean data, it is a necessary process that takes time for effective modeling

Recent Client with Perfect Data
• Early (unexpected) insights
  – Call volumes are greater when HCP info is inconsistent, suggesting that some outbound calls may have a primary purpose of data verification, rather than order generation
• Data anomalies
  – Sometimes Ship_Date precedes Call_Date
  – Some orders have multiple call dates, sometimes many months apart
• Modeling decision
  – Use the call date rather than the ship date, since the call is what causes the order
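
The two data anomalies from the client example above can be checked mechanically. The following is a minimal sketch, not the method used on the engagement: the records, the `order_id` and `call_dates` field names, and the 90-day threshold for "many months apart" are all hypothetical, and only the Ship_Date and Call_Date ideas come from the slide.

```python
from datetime import date

# Hypothetical order records; only the call-date and ship-date concepts
# come from the slide, everything else is made up for illustration.
orders = [
    {"order_id": 1, "call_dates": [date(2011, 3, 1)], "ship_date": date(2011, 3, 4)},
    {"order_id": 2, "call_dates": [date(2011, 5, 10)], "ship_date": date(2011, 5, 2)},
    {"order_id": 3, "call_dates": [date(2011, 1, 15), date(2011, 9, 20)],
     "ship_date": date(2011, 9, 25)},
]

MAX_CALL_GAP_DAYS = 90  # assumed cutoff for "many months apart"


def find_anomalies(records, max_gap_days=MAX_CALL_GAP_DAYS):
    """Return {order_id: [anomaly labels]} for records that look suspicious."""
    flagged = {}
    for rec in records:
        issues = []
        first_call = min(rec["call_dates"])
        last_call = max(rec["call_dates"])
        # Anomaly 1: the order shipped before the earliest recorded call.
        if rec["ship_date"] < first_call:
            issues.append("ship_before_call")
        # Anomaly 2: calls for one order are spread over a long period.
        if (last_call - first_call).days > max_gap_days:
            issues.append("calls_far_apart")
        if issues:
            flagged[rec["order_id"]] = issues
    return flagged


anomalies = find_anomalies(orders)
for order_id, issues in anomalies.items():
    print(order_id, issues)
```

Flagged records are candidates for a conversation with the keepers of the data rather than for automatic deletion, since an apparent anomaly may reflect a legitimate business process.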

#6: Rely too heavily on software
• Lots of good software on the market
• Even the best software requires expert users (data miners) to make it work
• Software is a tool to be used to build models to obtain valuable outputs
• Expecting the tool to do it all results in wasted money and shelf space

#7: Don’t understand the different levels of analytics

The 9 Levels of Analytics
Descriptive Techniques:
1 – Standard Reporting
2 – Custom Reporting or “Slicing and Dicing” the Data (Excel)
3 – Queries/drilldowns (SQL, OLAP)
4 – Dashboards/alerts (Business Intelligence)
5 – Statistical Analysis
6 – Clustering (Unsupervised Learning)
Predictive Techniques:
7 – Predictive Modeling
8 – Optimization & Simulation
9 – Next Generation Analytics – Text Mining & Link Analysis

• A lot of people are caught up in the buzz of data mining and analytics in general
• A lack of understanding of the different levels of analytics can result in wasted money and time
• Understanding the different levels of analytics can help an organization to better focus resources on the right solution

#8: Exclude the domain SMEs
• Having expert data miners is not enough
• SMEs needed to:
  – Provide business understanding
  – Provide common sense checks to the modeling process
  – Maximize use of the models
• Buy-in from the internal SME helps to make believers out of the non-SMEs

#9: Don’t plan for deployment
• Building the models is only the start
• Deployment within the organization infrastructure can be a larger effort in terms of resources and time
• Need to decide on the deliverable format and work with IT to figure out how it will be accomplished

#10: Rush the process
• Good modeling is a methodical, iterative process
• Working with the data can take 65-80% of the project time
• Cutting corners will result in weak or incorrect models

CRISP-DM

Summary
1. Fail to define an objective
2. Start too big
3. Lack support from the keepers of the data
4. Wait for perfect data
5. Believe you have perfect data
6. Rely too heavily on software
7. Don’t understand the different levels of analytics
8. Exclude the domain SMEs
9. Don’t plan for deployment
10. Rush the process