Data Mining for Business Intelligence

Learning Objectives
* Define data mining as an enabling technology for business intelligence
* Understand the objectives and benefits of business analytics and data mining
* Recognize the wide range of applications of data mining
* Learn the standardized data mining processes
  * CRISP-DM, SEMMA, KDD, …

Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall

Learning Objectives
* Understand the steps involved in data preprocessing for data mining
* Learn different methods and algorithms of data mining
* Build awareness of the existing data mining software tools
  * Commercial versus free/open source
* Understand the pitfalls and myths of data mining

Why Data Mining?
* More intense competition at the global scale
* Recognition of the value in data sources
* Availability of quality data on customers, vendors, transactions, Web, etc.
* Consolidation and integration of data repositories into data warehouses
* The exponential increase in data processing and storage capabilities, and the decrease in cost

Definition of Data Mining
* The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases. - Fayyad et al. (1996)
* Keywords in this definition: process, nontrivial, valid, novel, potentially useful, understandable
* Data mining: a misnomer?
* Other names: knowledge extraction, pattern analysis, knowledge discovery, pattern searching, …

Data Mining at the Intersection of Many Disciplines
[Figure: DATA MINING at the center of overlapping disciplines: Statistics, Artificial Intelligence, Pattern Recognition, Mathematical Modeling, Machine Learning, Databases, Management Science & Information Systems]

Data Mining Characteristics/Objectives
* Source of data for DM is often a consolidated data warehouse
* DM environment is usually a client-server or a Web-based information systems architecture
* Data is the most critical ingredient for DM, and may include soft/unstructured data
* Data mining tools' capabilities and ease of use are essential (Web, parallel processing, etc.)

Data in Data Mining
* Data: a collection of facts usually obtained as the result of experiences, observations, or experiments
* Data may consist of numbers, words, images, …
* Data: lowest level of abstraction (from which information and knowledge are derived)
* Data types:
  * Categorical
    * Nominal
    * Ordinal
  * Numerical
    * Interval
    * Ratio
* DM with different data types?
* Other data types?

What Does DM Do?
* DM extracts patterns from data
  * Pattern? A mathematical (numeric and/or symbolic) relationship among data items
* Types of patterns
  * Association
  * Prediction
  * Cluster (segmentation)
  * Sequential (or time series) relationships

A Taxonomy for Data Mining Tasks

Data Mining Task       Learning Method   Popular Algorithms
Prediction             Supervised        Classification and Regression Trees, ANN, SVM, Genetic Algorithms
  Classification       Supervised        Decision trees, ANN/MLP, SVM, Rough sets, Genetic Algorithms
  Regression           Supervised        Linear/Nonlinear Regression, Regression trees, ANN/MLP, SVM
Association            Unsupervised      Apriori, OneR, ZeroR, Eclat
  Link analysis        Unsupervised      Expectation Maximization, Apriori Algorithm, Graph-based Matching
  Sequence analysis    Unsupervised      Apriori Algorithm, FP-Growth technique
Clustering             Unsupervised      K-means, ANN/SOM
  Outlier analysis     Unsupervised      K-means, Expectation Maximization (EM)
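
To make the Clustering entry of the taxonomy concrete, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It is an illustration only, not something taken from the slides: the function name `k_means`, the sample points, and the choice to pass the starting centroids explicitly are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def k_means(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving.

    `points` and `centroids` are sequences of equal-length numeric tuples.
    Starting centroids are supplied by the caller (an assumption of this
    sketch) so that the result is deterministic.
    """
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: clusters match the returned centroids
        centroids = new_centroids
    # If max_iter runs out first, clusters reflect the previous centroids.
    return centroids, clusters


# Hypothetical sample data: two visibly separate groups of 2-D points.
sample = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
final_centroids, final_clusters = k_means(sample, [sample[0], sample[3]])
```

No target labels are supplied anywhere, which is what the table means by an unsupervised learning method: the groups come only from the distances between data items.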

Data Mining Applications
* Customer Relationship Management
  * Maximize return on marketing campaigns
  * Improve customer retention
  * Maximize customer value
  * Identify and treat most valued customers
* Banking and Other Financial
  * Automate the loan application process
  * Detect fraudulent transactions
  * Optimize cash reserves with forecasting

Data Mining Applications (cont.)
* Retailing and Logistics
  * Optimize inventory levels at different locations
  * Improve the store layout and sales promotions
  * Optimize logistics by predicting seasonal effects
* Manufacturing and Maintenance
  * Predict/prevent machinery failures
  * Discover novel patterns to improve product quality

Data Mining Applications
* Brokerage and Securities Trading
  * Predict changes on certain bond prices
  * Forecast the direction of stock fluctuations
  * Assess the effect of events on market movements
  * Identify and prevent fraudulent activities in trading
* Insurance
  * Forecast claim costs for better business planning
  * Optimize marketing to specific customers
  * Identify and prevent fraudulent claim activities

Data Mining Applications (cont.)
Highly popular application areas for data mining:
* Computer hardware and software
* Science and engineering
* Government and defense
* Homeland security and law enforcement
* Travel industry
* Healthcare
* Medicine
* Entertainment industry
* Sports
* Etc.

Data Mining Process: CRISP-DM
[Figure: cycle of six numbered steps around a central Data Sources element: 1 Business Understanding, 2 Data Understanding, 3 Data Preparation, 4 Model Building, 5 Testing and Evaluation, 6 Deployment]

Data Mining Process: CRISP-DM
Step 1: Business Understanding
Step 2: Data Understanding
Step 3: Data Preparation (!)
Step 4: Model Building
Step 5: Testing and Evaluation
Step 6: Deployment
* Accounts for ~85% of total project time
* The process is highly repetitive and experimental
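
The six steps, and the remark that the process is repetitive, can be sketched as a loop that revisits steps 1 to 5 until evaluation succeeds and only then deploys. This is a rough illustration under stated assumptions, not part of CRISP-DM itself: `run_project`, the `evaluate` callback, and the `max_rounds` limit are hypothetical names introduced here, and real projects also move back and forth between individual steps.

```python
from enum import Enum


class Phase(Enum):
    """The six CRISP-DM steps as listed on the slide."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODEL_BUILDING = 4
    TESTING_AND_EVALUATION = 5
    DEPLOYMENT = 6


def run_project(evaluate, max_rounds=10):
    """Return the ordered list of phases visited.

    `evaluate(round_no)` is a hypothetical callback that returns True once
    the model from that round is judged good enough to deploy.
    """
    visited = []
    for round_no in range(1, max_rounds + 1):
        # Steps 1-5 are repeated as a block in this simplified sketch.
        visited.extend(list(Phase)[:5])
        if evaluate(round_no):
            visited.append(Phase.DEPLOYMENT)
            break
    return visited


# Example: the first round fails evaluation, the second one passes.
history = run_project(lambda round_no: round_no == 2)
```

Here `history` holds two passes through steps 1 to 5 followed by a single deployment step; if no round passes within `max_rounds`, deployment is never reached.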