Introduction to Data Mining, Part 1

Introductory Material Overview
This module will introduce Data Mining and provide examples of successful applications.
* What is Data Mining?
* Why is it important?
* How do we mine data?

Introductory Material Overview
* Text Book: Building Better Models with JMP Pro by Jim Grayson, Sam Gardner, and Mia L. Stephens
* Software: Excel, JMP Pro Version 12

What is Data Mining?
* Data Mining is extracting useful information from massive quantities of data that businesses generate.
* It is exploration and analysis by automatic means, of large quantities of data, to discover actionable insights and rules.

More applications?
* e-mail spam filters examine millions of emails to classify as spam or not
* banks trying to identify applicants more likely to default on loans
* fraud prediction
  * credit card companies: questioning a charge?
  * insurance companies: which claims are most likely fraudulent
  * government: which tax returns are most likely fraudulent

More applications?
* Text mining at Google and Yahoo helps order websites by relevance to your search.
* Retaining good customers for cell phone carriers (& banks):
  * which customers are more likely to abandon service (that is, churn)
  * discounts or other enticements might be offered

Why is Data Mining so effective now?
* Data are warehoused and computerized.
* Collected through bar coding, scanning, RFID, internet, ERP systems, etc.
* Computing power is cheap
* Competitive pressure
* Commercial products available

How do we do it? Two Basic Styles
1. Top-Down: HYPOTHESIS TESTING (Science)
   * SUPERVISED
   * have a theory, experiment to prove/disprove
2. Bottom-Up: KNOWLEDGE DISCOVERY (Creativity)
   * UNSUPERVISED
   * start with data, see new patterns

Data Mining Process: CRISP-DM
How does it work?
* Cross Industry Standard Process for Data Mining
* A standard process model for data mining
* Independent of industry sector & technology

CRISP-DM Phases
Phase                     Result
Business understanding    Problem definition
Data understanding        Data collected
Data preparation          Clean Data Table
Modeling                  Choose model(s)
Evaluation                Do model results achieve objectives?
Deployment                Use the model to make decisions

CRISP-DM Phases: Business understanding
* Solve a specific problem
* Clear definition helps
* Measurable success criteria
* Convert business objectives to a set of data mining goals
* What to achieve in technical terms

CRISP-DM Phases: Data understanding
Data can come from many sources:
* Internal: ERP system, data warehouse
* External: government data, commercial
* Created: research

CRISP-DM Phases: Data preparation
* Clean data: format, identify gaps, filter outliers & redundancies
* Sampling rare events
* Transform & create dataset for modeling
* Types of data: Nominal, Ordinal, Continuous

CRISP-DM Phases: Modeling
* Data Treatment: how many data sets? Training, validation, test.
* Techniques:
  * Regression
  * Decision Trees
  * Neural Nets
  * Boosted Trees
  * Bootstrap Forest

CRISP-DM Phases: Evaluation
Check that the model is good and evaluate to assure that nothing is missing.
* Does model meet business objectives?
* Any important business objectives not addressed?
* Does model make sense?
* Is it actionable?

CRISP-DM Phases: Deployment
* Ongoing monitoring & maintenance
* Evaluate performance against success criteria
* Market reaction?
* Competitor changes?
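
Example: a cleaning pass for the Data preparation phase
The Data preparation phase above lists identifying gaps and filtering outliers and redundancies. The slides do not say how to do this, and the course itself uses Excel and JMP Pro, so the following is only a minimal Python sketch under stated assumptions: the function name `clean_rows`, the row layout (a list of dictionaries), and the 1.5 x IQR outlier rule are illustrative choices and are not taken from the course material.

```python
from statistics import quantiles


def clean_rows(rows, key):
    """Return rows with duplicates, gaps in `key`, and outliers in `key` removed.

    Hypothetical helper for illustration; `rows` is a list of dicts.
    """
    # Redundancies: keep only the first occurrence of each identical row.
    seen = set()
    unique = []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)

    # Gaps: drop rows where the field of interest is missing.
    complete = [row for row in unique if row.get(key) is not None]

    # Outliers: assumed rule, keep values within 1.5 * IQR of the quartiles.
    q1, _, q3 = quantiles([row[key] for row in complete], n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [row for row in complete if low <= row[key] <= high]
```

Whether an extreme value should be dropped at all depends on the business problem: in fraud prediction, for instance, the unusual records may be exactly the rare events of interest.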
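
Example: training, validation, and test sets
The Modeling phase above asks how many data sets to use: training, validation, test. As a rough sketch of one common way to produce them, the rows can be shuffled and then cut into three parts. The function name `split_dataset`, the 60/20/20 proportions, and the fixed seed are assumptions made for illustration; the slides do not prescribe any of them.

```python
import random


def split_dataset(rows, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle `rows` and split them into (training, validation, test) lists.

    Hypothetical helper; the remainder after training and validation
    becomes the test set.
    """
    if train_frac <= 0 or valid_frac <= 0 or train_frac + valid_frac >= 1:
        raise ValueError("fractions must be positive and sum to less than 1")

    # Copy first so the caller's list is not reordered.
    shuffled = list(rows)
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(shuffled)

    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)

    training = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return training, validation, test
```

In this arrangement the training set is used to fit candidate models, the validation set to compare them, and the test set is held back to estimate how the chosen model performs on data it has not seen.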