Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, PhD Manager, Data Abstraction Research Group IBM TJ Watson Research Center [email protected] http://www.research.ibm.com/dar Overview • Knowledge discovery and data mining (KDD) techniques are used for analyzing and discovering actionable insights from data. • The talk will – Provide technical descriptions of the core algorithms that comprise data mining analytics – Describe some business application scenarios for KDD – Discuss issues in business intelligence systems – Map trends in this area Background – Widespread and explosive growth in use and size of databases • Traditional use: query based report generation – Size and volumes raise new issues: • will data help business to achieve an advantage • can data be used to model underlying processes and predict their behavior • can we understand the data – Providing capabilities to support exploration, summarization, and modeling of large databases is the goal of Business Intelligence systems 1 From Transactions to Warehouses • Transactional databases: – Reliable and accurate data capture; logging, book -keeping • Data warehousing: – Turning transactional data into a history repository • Can be queried for summaries and aggregate reports – First step in transforming transactional data (primary purpose: reliable storage) to one whose primary use is business intelligence – May require integration of multiple sources of data • Dealing with multiple formats; multiple database systems; integrating distributed databases; data cleaning; creating unified logical view of underlying non-homogeneous data On-Line Analytical Processing (OLAP) • Supports query driven exploration of the data warehouse – Utilizing pre-computed aggregates along data dimensions – Deciding which aggregates to pre-compute and how to derive or reliably estimate from pre-computed projections – Extends the Structured Query Language (SQL) framework to accommodate queries that would otherwise have been computationally impossible on a relational database management system Beyond OLAP • Supporting queries at much more abstract level than SQL and OLAP – Computer-driven exploration of data as opposed to human analyst -driven – Facilitating data exploration of high dimensional data • Providing solutions when user cannot describe goal in terms of a specific query – e.g. discovering fraudulent cases in credit card or telephone uses • Visualizing and understanding massive volumes of highdimensional data • Rates of growth of data sets exceed by far any rates with which traditional human analyst techniques can cope 2 A Definition for Data Mining • Automated search procedures for discovering credible and actionable insights from large volumes of high dimensional data – emphasis upon • symbolic learning and modeling methods (i.e. techniques that produce interpretable results) • data management methods – use of techniques from statistics, pattern recognition, and machine learning • machine learning and statistical modeling also heavily used in vision, speech recognition, image processing, handwriting recognition, natural language understanding, etc. – issues of scalability and automated business intelligence solutions drive much of and differentiate data mining from the other applications of machine learning and statistical modeling Machine Learning and Statistical Modeling serve as an important core Temporal Modeling Complex Pattern Detection Systems Performance Management Feature Creation and Analysis Markov Modeling Data Management Business Decision Support Systems Knowledge Discovery and Data Mining Speech Understanding Machine Learning Statistical Modeling Computational Linguistics Statistical Text Processing Handwriting Recognition Vision / Image Knowledge Management NLP/NLU Others (Agents, Education, etc..) Typical Business Intelligence Applications • Risk Analysis – Given a set of current customers and their finance/insurance history data, build a predictive model that can be used to class ify a new customer into a risk category • Targeted Marketing – Given a set of current customers and history on their purchases and their responses to promotions, target new promotions to those most likely to respond • Customer Retention – Given a set of past customers and their behavior prior to leaving, predict who is most likely to leave and take proactive action • Fraud Detection – Detect fraudulent activities either proactively or on-line real-time • Many other new applications keep surfacing 3 There’s More to it Than Just Data Mining • The process of identifying valid, novel, potentially useful, and understandable patterns in data requires one or more of: – – – – – Selecting or sampling data from a data warehouse Cleaning or pre-processing it Transforming or reducing it Applying a data mining component to extract models or patterns Evaluating the derived structure • The process is also known as KDD (Knowledge Discovery from Data Mining) – Data mining is a key component concerned with the algorithmic means by which structures are extracted from data while meeting computational efficiency constraints The KDD Process Identify Business Opportunity Select Data Warehouse Transform Mine Assimilate Selected Data data mining n. The process of extracting valid, previously unknown, and ultimately comprehensible and actionable information from large databases and using it to make crucial business decisions. Visualization Data Mining Techniques • Predictive Modeling – Predict a specific attribute (database field) based upon the oth er attributes (fields) in the data • Clustering (Segmentation) – Group data records into subsets where items in subsets are more similar to each other than to items in other subsets • Frequent Patterns – Find interesting similarities between a few attributes in subsets of the data • Change & Deviation – Detect and account for interesting sequence of information in data records • Dependencies – Generate the joint probability density function that might have generated the data 4 Predictive Modeling • Estimate a function ? that maps points from an input space ? to an output space ? ??given only a finite sampling of the mapping – Predict value of field (? ??in a database based on the other fields (? ? • Accurately construct an estimator ƒ’ of ƒ from a finite sample known as the training set – May be corrupted (i.e. noisy) • If predicted quantity is numeric (i.e. ? ?R, the real line) then the prediction problem is that of regression modeling • If the predicted quantity is discrete (i.e. ??????????? ) then the prediction problem is that of classification modeling Issues in Predictive Modeling • Transformations on input space X to improve estimation capability – Feature extraction / construction / selection • Evaluating the estimate ƒ’ in terms of how well it performs on data not present in the training set – Maximizing prediction accuracy by avoiding underfitting or over-fitting • Trading off model complexity versus model accuracy – Bias-variance tradeoff, penalized likelihood, minimum message length (MML) or minimum description length (MDL) Classification • Predicting the most likely state of a categorical variable (the class) given the values of other variables – Density estimation problem: deriving the value of Y given x?? from the joint density on Y and ? • Kernel density estimators • Metric-space based methods (k-nearest neighbor) • Projection into decision regions – divide attribute space into decision regions and associate prediction with each region – Linear classifiers, neural networks, decision trees, disjunctive normal form (DNF) rule-based classifiers – Projection methods by far the most practical for data mining 5 Regression • Predicting the most likely value of a numerical variable (the target column) given the values of other variables – Numerical function approximation problem: deriving the value of Y given x?? from the joint probability distribution on Y and ? • Statistical probability models (e.g. linear regression) • Projection into decision regions – divide attribute space into decision regions and associate constant value with each region – Neural networks, decision trees, disjunctive normal form (DNF) rule-based classifiers • Hybrid – Coupling projection methods with statistical models – Projection and hybrid methods by far the most practical for data mining The Predictive Modeling Process Mine historical data to train patterns/models that can predict future behaviors Data Behaviors Response to Direct Mail Product Quality (Defects) Declining Activity Credit Risk Delinquency Likelihood to buy specific products Profitability etc. Score with models to reflect likelihood to exhibit the modeled behavior Act to optimize business objectives based on these scores 120 120 100 100 80 80 60 60 40 40 20 20 0 0 0 10 20 30 40 50 60 70 80 90 100 5 15 25 35 45 55 65 75 85 95 Decision Trees for Predictive Modeling • Tree generation algorithm – Beginning with the training set at the root node, recursively split until a stopping criteria is met • Split using “best” test among all possible tests on all attributes • Prune tree (MDL, cross-validation, etc.) 6 Issues in Decision Tree Building • Splitting at nodes – greedy search: GINI (entropy minimizing), class probability profile difference, log-loss likelihood, etc. – exhaustive search: ReliefF, Contextual Merit, etc. • Testing of attributes – Numerical attributes: inequality tests on cut -points – Categorical attributes: subset tests • Leaf models – Piecewise constant – Linear – Probability functions • Scalability A Typical Decision Tree Expected Sales Revenue Historical Sales per Mlg Less than 7 Hist Avg Sales per Order Less than 113 Segment 1 Historical Sales per Mlg 7-15 Hist Avg Sales per Order Greater or equal to 113 Climate Indicator 0 Segment 2 Credit Limit Less than 2200 Segment 6 Historical Sales per Mlg Greater than 15 Segment 8 Credit Limit Greater or equal to 2200 Segment 7 Climate Indicator 1 Risk Score Less than 687 Segment 3 Risk Score Greater or equal to 687 Outdoor Purchases Less than 3 Segment 4 Outdoor Purchases Greater or equal to 3 Segment 5 Training a Predictive Model Observations Predictive Model Predicted Outcomes Prediction Errors Actual Outcomes 7 Training Generalization Training Validation Too many segments Over fit Too few segments Under fit About right About right Optimal Training Prediction Error Validation Error Training Error Optimum Rule Set Number of Rules (degrees of freedom) Clustering • Given a finite sampling of points, group them into sets of similar points – Representing clusters of points with common characteristics • In predictive modeling, class (or value) membership is known in the training data – In clustering, this knowledge is not known apriori, and is perhaps being discovered by clustering or segmentation 8 Techniques for Clustering/Segmentation • Two-stage approach – outer loop to determine cluster number k – inner loop to fit points to clusters • Metric distance-based methods; find best k-way partition so that points in a partition are closer to each other than to points in other partitions • Model based methods: a best fit (very typically probabilistic) model is hypothesized for each cluster • Partition based methods: iteratively enumerating and scoring various partition scenarios using heuristic scoring functions k-Mean Clustering Algorithm • Widely used in data mining – Given k cluster centers c 1,j,c 2,j ,…,c k,j at iteration j, compute c 1,j+1,c 2,j+1,…,c k,j+1 • Cluster assignment: For each i=1,…,m, assign x i to cluster l(i) such that c l(i),j is nearest to x i • Cluster Center Update: For l=1,…,k set cl,j+1 to be the mean of all x i assigned to c l,j – Stop when c l,j= c l,j+1, l=1,…,k • Extensions include support for scalability, efficient placement of initial k means, and (harder problem) determining the number of clusters k Frequent Patterns • Extracting compact patterns that describe subsets of data – Row-wise patterns – Column-wise patterns • Association rules: detecting combinations of attribute values that occur with a minimum level of frequency (support) and certainty (confidence) – Scalable algorithms can find all such rules in linear time under certain conditions of data sparseness – Rules are not statements about causal effects amongst attributes, but can still provide useful insights 9 Change and Deviation • Detecting sequence information, temporal or otherwise – Ordering information of transactions (rows) is utilized – Under certain conditions of data sparseness, sequences with desired levels of frequency and certainty can be computed in linear time Dependency Modeling • Detecting causal structure within data – Causal models • Discovering probabilistic distributions governing the data • Discovering functional dependencies between attributes in the data – Techniques • Density estimation methods – Expectation maximization • Explicit causal modeling – Bayesian networks Applying Data Mining in Business Profit Customer Satisfaction Who are the best customers to sell my products to? What are the most effective market segments for my business? How do I increase market share of my products? How do I reduce my costs and not impact production? How do I optimize my inventory? Efficiency 10 Mapping Operations into Applications • Predictive Modeling – Assigning risk levels to new insurance and financial contracts • Clustering / Segmentation – Identifying distinct market groups in customer population • Frequent Patterns – Market basket analysis (what gets shopped together in a supermarket) • Change and Deviation – Fraud discovery in health claim data – Discovering shopping patterns over time Business Application Opportunities Retail/Distribution Category management Merchandise planning Product management Production planning/tracking Insurance/Healthcare Claims analysis Provider analysis Managed care Outcomes analysis Telecommunications Utilities Industrial customer profiles Financial analysis "Bulk Power" analysis Customer profiles/ segmentation Product profitability Demand forecasting Usage analysis Budgeting Financial reporting Demographics Product costing Manufacturing quality and efficiency Parts analysis Transportation Yield management Pricing/rate analysis Logistics Financial Cross Industry Government Manufacturing Market basket analysis Target marketing Customer segmentation Customer service Fraud and abuse Financial performance Customer profitability and segmentation Products/portfolio profitability Risk management Cross-sale analysis Branch performance The Data Challenge Whe It? re Is W Me hat D an oe ? s it t I Ge Can How It? Wha In? t Forma t Is It No Single View of Data Multiple Data Sources Many Interfaces Difficult to Access Different Formats, Platforms Inconsistencies, Redundancies 11 An Efficient Environment for Data Mining Enterprise Data Model Enterprise Warehouse Operational Data Global Specialized Analysis Datamarts Datamart Data Mining Analysis External Data Transaction Data Transformation (legacy system context removal) Meta-data Definition (e.g. consistent business terms) Analytical dimension definition (e.g. time, policyholder) Summarization Aggregation Identification / Collection Review Conversion / Reduction / Normalization Representation The Business Intelligence Process Access Transform Operational & External Data Enhancing Summarizing Aggregating Distribute Staging Joining From Multiple Sources Populating On-Demand Store Relational Data Multiple Platforms & Hardware Find & Understand Discover, Analyze, Visualize Information Catalog Query Business Views Data Interpretation Models MultiDimensional Analysis Automate & Manage Data Mining Data Flow & Process Flow Open Interfaces Multi-Vendor Support Data Mining Marketplace Status • Enabler for business intelligence systems – Data mining algorithm suites • Loosely coupled with database technology – Emphasis on data warehousing followed by exploratory data mining – Typically conducted by consultants or in-house analytic teams • Issues – Data warehouse requirements – Sophisticated analytics requirements 12 Key Challenges and Trends • Infrastructure – Enabling transparent and pervasive usage • Algorithms – Optimized and robust mining • Solutions – Vertically integrated for critical problems – Enhanced emphasis on the Internet Infrastructure • Making data mining transparent – Database extenders • e.g. DB2/UDB User Defined Functions for model training/scoring • Sufficient statistics (e.g. histograms, counts, samples, etc.) – Parallel and distributed data mining • Scalability (sampling and parallelization) – XML based APIs for database coupling and application embedding • Interoperability • Training and scoring in different environments – Intelligent or semi-automated data warehousing for mining • Industry specific templates • Meta-data mining Algorithms • Robust and Automated – Evaluation metrics – Automated feature extraction / transformation / selection – Discovering relational and hierarchical structures amongst attributes – Incorporating prior knowledge to account for costs / benefits / uncertainty / missing values – Incremental and on-line mining – Privacy preserving data mining – Heterogeneous data mining 13 Solutions • Business – Risk management – Targeted marketing – Portfolio management • Systems – Performance management • Internet – Site profiling and performance tuning – User personalization Summary • Data mining is being embedded in vertical solutions for business intelligence and decision support – Data Management – eCommerce – Critical Large-Scale Solutions (CRM, etc.) • Using data mining should eventually become as easy and pervasive as working with databases and spreadsheets today References • Mathematical Programming for Data Mining: Formulations and Challenges, by Bradley et al., INFORMS Journal on Computing, Volume 11, No. 3, Summer 1999 • KDD Nuggets – http://www.kdnuggets.com • IBM – http://www.research.ibm.com/compsci/kdd • http://www.research.ibm.com/dar – http://www.ibm.com/bi 14