Decision Support and Business Intelligence Systems (9th Ed., Prentice Hall)
Chapter 5: Data Mining for Business Intelligence
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall

Learning Objectives
· Define data mining as an enabling technology for business intelligence
· Understand the objectives and benefits of business analytics and data mining
· Recognize the wide range of applications of data mining
· Understand the steps involved in data preprocessing for data mining

Introduction
· Data is produced at a phenomenal rate
· Our ability to store data has grown
· Users expect more sophisticated information
· How? Uncover hidden information: data mining

Examples: What is (not) Data Mining?
What is not data mining?
· Looking up a phone number in a phone directory
· Querying a Web search engine for information about “Amazon”
What is data mining?
· Certain names are more prevalent in certain US locations (e.g. in the Boston area, …)
· Grouping together similar documents returned by a search engine according to their context (e.g. Amazon.com, …)
· A customer with income between 10,000 and 20,000 and age between 20 and 25 who purchased milk and bread is likely to purchase diapers within 5 years.
· The amount of fish sold to people who live in a certain area and have incomes between 20,000 and 35,000 is increasing.

Data Mining
· Data mining: the process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions.
· Involves the analysis of data and the use of software techniques for finding hidden and unexpected patterns and relationships in sets of data.
· Potential result: higher-level meta-information that may not be obvious when looking at raw data
· Similar terms: exploratory data analysis, data-driven discovery, inductive learning

Decisions in Data Mining
· Databases to be mined: relational, transactional, object-oriented, object-relational, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW, etc.
· Knowledge to be mined: association, classification, clustering, etc.
· Techniques utilized: database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, neural network, etc.
· Applications adapted: retail, telecommunication, banking, fraud analysis, DNA mining, stock market analysis, Web mining, Weblog analysis, etc.

DBMS and Data Mining
Aspect | DBMS | Data Mining
-------|------|------------
Task | Extraction of detailed and summary data | Knowledge discovery of hidden patterns and insights
Type of result | Information | Insight and prediction
Method | Deduction (ask the question, verify with data) | Induction (build the model, apply it to new data, get the result)
Example question | Who purchased mutual funds in the last 3 years? | Who will buy a mutual fund in the next 6 months and why?

Data Mining Tasks
· Prediction tasks: use some variables to predict unknown or future values of other variables.
· Description tasks: find human-interpretable patterns that describe the data.
Common data mining tasks
· Classification [Predictive]: find all credit applicants who are poor credit risks.
· Clustering [Descriptive]: identify customers with similar buying habits.
· Association Rule Discovery [Descriptive]: find all items which are frequently purchased with milk.
· Sequential Pattern Discovery [Descriptive]

A Taxonomy for Data Mining Tasks
Data Mining Task | Learning Method | Popular Algorithms
-----------------|-----------------|-------------------
Prediction | Supervised | Classification and Regression Trees, ANN, SVM, Genetic Algorithms
  Classification | Supervised | Decision trees, ANN/MLP, SVM, Rough sets, Genetic Algorithms
  Regression | Supervised | Linear/Nonlinear Regression, Regression trees, ANN/MLP, SVM
Association | Unsupervised | Apriori, OneR, ZeroR, Eclat
  Link analysis | Unsupervised | Expectation Maximization, Apriori Algorithm, Graph-based Matching
  Sequence analysis | Unsupervised | Apriori Algorithm, FP-Growth technique
Clustering | Unsupervised | K-means, ANN/SOM
  Outlier analysis | Unsupervised | K-means, Expectation Maximization (EM)

Classification: Definition
· Given a collection of records (training set)
  · Each record contains a set of attributes; one of the attributes is the class.
· Find a model for the class attribute as a function of the values of the other attributes.
· Goal: previously unseen records should be assigned a class as accurately as possible.
  · A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.

Classification Example
Training Set
Tid | Refund | Marital Status | Taxable Income | Cheat
----|--------|----------------|----------------|------
1 | Yes | Single | 125K | No
2 | No | Married | 100K | No
3 | No | Single | 70K | No
4 | Yes | Married | 120K | No
5 | No | Divorced | 95K | Yes
6 | No | Married | 60K | No
7 | Yes | Divorced | 220K | No
8 | No | Single | 85K | Yes
9 | No | Married | 75K | No
10 | No | Single | 90K | Yes

Records to be classified (Cheat unknown)
Refund | Marital Status | Taxable Income | Cheat
-------|----------------|----------------|------
No | Single | 75K | ?
Yes | Married | 50K | ?
No | Married | 150K | ?
Yes | Divorced | 90K | ?
No | Single | 40K | ?
No | Married | 80K | ?

[Figure labels: Training Set, Learn Classifier]
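
As a rough illustration of the classification example above, the sketch below learns a small decision tree from the ten labeled records and applies it to the six records whose Cheat value is unknown. Decision trees are one of the algorithms the taxonomy lists for classification, but the slides do not say which learner the example uses, so the entropy-based, ID3-style splitting here is an assumption. The 80K income cutoff and all function names (`to_features`, `build_tree`, `predict`) are hypothetical choices made for the sketch.

```python
from collections import Counter
from math import log2

# Labeled records from the slide: (refund, marital status, taxable income in K, cheat)
TRAIN = [
    ("Yes", "Single", 125, "No"), ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"), ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"), ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"), ("No", "Single", 90, "Yes"),
]
# Unlabeled records from the slide: (refund, marital status, taxable income in K)
TEST = [
    ("No", "Single", 75), ("Yes", "Married", 50), ("No", "Married", 150),
    ("Yes", "Divorced", 90), ("No", "Single", 40), ("No", "Married", 80),
]
INCOME_CUTOFF_K = 80  # assumption: the slides give no discretization threshold


def to_features(refund, marital, income_k):
    """Turn a raw record into categorical features (income becomes low/high)."""
    level = "high" if income_k >= INCOME_CUTOFF_K else "low"
    return {"refund": refund, "marital": marital, "income": level}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def build_tree(rows, attrs):
    """rows: list of (feature dict, label). Returns a class label or a split node."""
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority

    def split_entropy(attr):
        # Weighted entropy of the groups produced by splitting on attr.
        groups = {}
        for x, y in rows:
            groups.setdefault(x[attr], []).append(y)
        return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

    best = min(attrs, key=split_entropy)  # lowest entropy = highest information gain
    rest = [a for a in attrs if a != best]
    children = {
        value: build_tree([(x, y) for x, y in rows if x[best] == value], rest)
        for value in {x[best] for x, _ in rows}
    }
    return {"attr": best, "children": children, "default": majority}


def predict(tree, x):
    """Walk the tree; fall back to the node's majority class for unseen values."""
    while isinstance(tree, dict):
        tree = tree["children"].get(x[tree["attr"]], tree["default"])
    return tree


train_rows = [(to_features(r, m, i), y) for r, m, i, y in TRAIN]
tree = build_tree(train_rows, ["refund", "marital", "income"])
predictions = [predict(tree, to_features(r, m, i)) for r, m, i in TEST]
print(predictions)
```

Because the six records carry no known class, this sketch can only produce predictions for them. Measuring accuracy as the definition describes would require holding out some labeled records as the test set.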

[Figure labels, continued: Test Set, Model]

Classification: Application Example
Direct Marketing
· Goal: reduce the cost of mailing by targeting a set of consumers likely to buy a new cell-phone product.
· Approach:
  · Use the data for a similar product introduced before.
  · We know which customers decided to buy and which decided otherwise. This {buy, don’t buy} decision forms the class attribute.
  · Collect various demographic, lifestyle, and company-interaction related information about all such customers: type of business, where they stay, how much they earn, etc.
  · Use this information as input attributes to learn a classifier model.

Clustering Definition
Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that
· Data points in one cluster are more similar to one another.
· Data points in separate clusters are less similar to one another.

Clustering: Application Example
Market Segmentation
· Goal: subdivide a market into distinct subsets of customers, where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix.
· Approach:
  · Collect different attributes of customers based on their geographical and lifestyle related information.
  · Find clusters of similar customers.

Association Rule: Application Example
Supermarket shelf management
· Goal: to identify items that are bought together by sufficiently many customers.
· Approach: process the point-of-sale data collected with barcode scanners to find dependencies among items.
· A classic rule: if a customer buys diapers and milk, then he is very likely to buy beer.
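
To make "bought together by sufficiently many customers" and "very likely to buy" concrete, the sketch below computes the two usual measures for an association rule: support (the share of baskets containing all items in the rule) and confidence (the share of baskets with the antecedent that also contain the consequent). The five baskets are invented for illustration and do not come from the slides. The function names are hypothetical, and the brute-force pair enumeration is only a stand-in for algorithms such as Apriori.

```python
from itertools import combinations

# Hypothetical point-of-sale baskets (illustrative data, not from the slides).
BASKETS = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]


def count(itemset):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for basket in BASKETS if itemset <= basket)


def support(itemset):
    """Fraction of all baskets that contain itemset."""
    return count(itemset) / len(BASKETS)


def confidence(antecedent, consequent):
    """Among baskets containing antecedent, the fraction also containing consequent.

    Assumes the antecedent occurs in at least one basket.
    """
    return count(antecedent | consequent) / count(antecedent)


def frequent_pairs(min_count):
    """All item pairs that appear together in at least min_count baskets."""
    items = sorted(set().union(*BASKETS))
    return {
        frozenset(pair)
        for pair in combinations(items, 2)
        if count(set(pair)) >= min_count
    }


rule_support = support({"diaper", "milk", "beer"})
rule_confidence = confidence({"diaper", "milk"}, {"beer"})
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
print(sorted(sorted(pair) for pair in frequent_pairs(3)))
```

In this toy data the rule {diaper, milk} → {beer} holds in 2 of the 3 baskets that contain both diaper and milk. Whether that counts as "very likely" depends on the minimum confidence the analyst chooses.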

Data Preparation: A Critical DM Task
Real-world Data
· Data Consolidation: collect data; select data; integrate data
· Data Cleaning: impute missing values; reduce noise in data; eliminate inconsistencies
· Data Transformation: normalize data; discretize/aggregate data; construct new attributes
· Data Reduction: reduce number of variables; reduce number of cases; balance skewed data
Well-formed Data

Examples of Data Mining Applications
Retail / Marketing
· Identifying buying patterns of customers
· Predicting response to mailing campaigns
Banking
· Detecting patterns of credit card fraud
· Identifying loyal customers
Medicine
· Identifying successful medical therapies
Banking and Other Financial
· Automate the loan application process
· Detect fraudulent transactions
· Maximize customer value (cross-, up-selling)

Examples of Data Mining Applications
Customer Relationship Management
· Maximize return on marketing campaigns
· Improve customer retention
· Maximize customer value (cross-, up-selling)
· Identify and treat most valued customers
Manufacturing and Maintenance
· Predict/prevent machinery failures
· Identify anomalies in production systems to optimize the use of manufacturing capacity
· Discover novel patterns to improve product quality

Data Mining Applications
Brokerage and Securities Trading
· Predict changes in certain bond prices
· Forecast the direction of stock fluctuations
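
The cleaning and transformation steps named in the data preparation flow earlier can be sketched in a few lines. The example below covers only three of them: imputing missing values, normalizing, and discretizing. The slides do not prescribe particular methods, so mean imputation, min-max scaling, and fixed cut points are assumptions chosen for simplicity. The income values, cut points, and function names are hypothetical.

```python
from bisect import bisect_right
from statistics import mean

# Hypothetical incomes in thousands; None marks a missing value.
raw_incomes = [125.0, 100.0, None, 120.0, 95.0, 60.0]


def impute_mean(values):
    """Data cleaning: replace missing entries with the mean of the known ones."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Data transformation: rescale values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all values identical, nothing to scale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, cut_points, labels):
    """Data transformation: map each value to the label of the bin it falls in.

    cut_points must be sorted; len(labels) must equal len(cut_points) + 1.
    """
    return [labels[bisect_right(cut_points, v)] for v in values]


imputed = impute_mean(raw_incomes)
normalized = min_max_normalize(imputed)
binned = discretize(imputed, [80.0, 110.0], ["low", "medium", "high"])
print(imputed)
print([round(v, 3) for v in normalized])
print(binned)
```

The order matters here: imputation runs first so that normalization and discretization see no missing values.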

Data Mining Software
Commercial
· SPSS - PASW (formerly Clementine)
· SAS - Enterprise Miner
· IBM - Intelligent Miner
· StatSoft - Statistical Data Miner
· … many more
Free and/or Open Source
· Weka
· RapidMiner…

[Figure: bar chart of data mining tools, with two series, "Total (w/ others)" and "Alone", on an axis running from 0 to 120. Tools shown: SPSS PASW Modeler (formerly Clementine); RapidMiner; SAS / SAS Enterprise Miner; Microsoft Excel; R; Your own code; Weka (now Pentaho); KXEN; MATLAB; Other commercial tools; KNIME; Microsoft SQL Server; Other free tools; Zementis; Oracle DM; Statsoft Statistica; Salford CART, Mars, other; Orange; Angoss; C4.5, C5.0, See5; Bayesia; Insightful Miner/S-Plus (now TIBCO); Megaputer; Viscovery; Clario Analytics; Thinkanalytics; Miner3D. Source: KDNuggets.com, May 2009]