Data Mining + Business Intelligence
Integration, Design and Implementation

ABOUT ME
Vijay Kotu
Data, Business, Technology, Statistics
www.LearnPredictiveAnalytics.com

BUSINESS INTELLIGENCE - Result
- Making data accessible
- Wider distribution
- Dimensional slicing
- Mostly as-is reporting

DATA MINING - Finding useful patterns in data
- Limited distribution
- Algorithms
- Insights and predictions

DATA MINING
Data mining, in simpler terms, is finding useful patterns in data. "It is [a] non-trivial process of finding useful, valid, novel, understandable patterns or relationships in the data to make important decisions" (Fayyad et al., 1996).
[Diagram: data mining at the intersection of statistics, operations research, computing and machine learning; labels include quantitative methods, data stores, computation, optimization and algorithms]

DATA MINING: MODELS

DATA MINING: TYPES
[Diagram: data mining tasks (classification, regression, clustering, association, anomaly detection, time series, feature selection) and applications such as text mining]

DATA MINING: TYPES
Tasks and examples:
- Classification: Assigning voters to known buckets, such as political party or "soccer moms". Assigning new customers to one of several known customer groups.
- Regression: Predicting next year's unemployment rate. Estimating insurance premiums.
- Anomaly detection: Detecting fraudulent credit card transactions. Detecting network intrusions.
- Time series: Sales forecasting, production forecasting, and virtually any growth phenomenon that needs to be extrapolated.
- Clustering: Finding customer segments in a company based on transaction, web and customer call data.
- Association analysis: Finding cross-selling opportunities for a retailer based on transaction purchase history.
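The classification example of assigning new customers to known customer groups can be illustrated with k-nearest neighbors, one of the classification algorithms listed in this deck. This is a minimal sketch, not the author's method: the two features (annual spend, visits per month), the group names and the data points are all hypothetical.

```python
from collections import Counter
import math


def knn_classify(train, query, k=3):
    """Assign `query` to the majority label among its k nearest labeled points.

    train: list of (feature_tuple, label) pairs
    query: feature tuple with the same length as the training tuples
    """
    # Sort the labeled points by Euclidean distance to the query, keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote among the k closest labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical customers: (annual spend in $1000s, visits per month) -> known group
customers = [
    ((1.0, 1), "occasional"),
    ((1.5, 2), "occasional"),
    ((2.0, 1), "occasional"),
    ((8.0, 10), "loyal"),
    ((9.0, 12), "loyal"),
    ((7.5, 9), "loyal"),
]

print(knn_classify(customers, (8.5, 11)))  # loyal
print(knn_classify(customers, (1.2, 2)))   # occasional
```

Features are used unscaled here for brevity; with real data you would usually normalize them so that one feature does not dominate the distance.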
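For the time series task (for example, sales forecasting), simple exponential smoothing, one of the time series algorithms listed in this deck, keeps a running level updated as level = alpha * y + (1 - alpha) * previous level, and uses the final level as the next period's forecast. A minimal sketch, assuming hypothetical monthly sales figures and an arbitrary smoothing factor alpha = 0.5; it has no trend or seasonal terms.

```python
def exponential_smoothing_forecast(series, alpha):
    """Simple exponential smoothing: return the one-step-ahead forecast.

    The level starts at the first observation and is updated as
    level = alpha * y + (1 - alpha) * level for each later observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for y in series[1:]:
        # Higher alpha weights recent observations more heavily.
        level = alpha * y + (1 - alpha) * level
    return level


# Hypothetical monthly sales
sales = [100, 110, 105, 120, 130]
print(exponential_smoothing_forecast(sales, alpha=0.5))  # 121.25
```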
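For the anomaly detection task (for example, spotting a fraudulent card transaction), a simple distance-based score is the distance from each record to its k-th nearest neighbor: records far from everything else score high. A minimal sketch; the transaction features, the data and k = 2 are hypothetical choices. Density-based methods such as LOF refine this idea by comparing each point's local density with that of its neighbors.

```python
import math


def knn_distance_scores(points, k=2):
    """Distance-based outlier score: distance from each point to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, closest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical transactions: (amount in $100s, hour of day)
transactions = [(1.0, 10), (1.2, 11), (0.9, 12), (1.1, 10), (25.0, 3)]
scores = knn_distance_scores(transactions, k=2)
most_anomalous = max(range(len(transactions)), key=lambda i: scores[i])
print(most_anomalous)  # 4
```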
DATA MINING: TYPES
Tasks and algorithms:
- Classification: Decision trees, neural networks, Bayesian models, induction rules, k-nearest neighbors
- Regression: Linear regression, logistic regression (despite its name, typically used for classification)
- Anomaly detection: Distance-based, density-based, LOF (local outlier factor)
- Time series: Exponential smoothing, ARIMA, regression
- Clustering: k-means, density-based clustering (DBSCAN)
- Association analysis: FP-Growth, Apriori

DATA MINING: PROCESS
[Diagrams: data mining and scoring]

CLASSIC BI ARCHITECTURE
[Diagram: staging, extraction, transformation & loading (ETL), star schema and OLAP, delivering dashboards, reports, alerts and ad hoc analysis, with a security layer]

ANALYTICAL ARCHITECTURE #1: Data Mining Tool Scoring
[Diagram: classic BI stack with a data mining tool attached]
- The data mining tool does the scoring.
- Robust modeling and scoring capabilities.
- The BI tool reports the scores like any other data points.
Limitations:
- New records cannot be scored unless the DM tool provides the scoring.
- Requires multiple analytical tools.

ANALYTICAL ARCHITECTURE #2: Database Scoring
[Diagram: classic BI stack with scoring inside the database]
- The database does the scoring.
- Can handle large data volumes.
- Model, scoring and data live in one place.
Limitations:
- DB vendors have to provide a full DM suite.
- Analysis skills.

ANALYTICAL ARCHITECTURE #3: BI Scoring: Native Modeling
[Diagram: classic BI stack with modeling and scoring in the BI platform]
- The BI platform does the scoring.
- Good integration of predictive metrics with BI metrics.
- Security.
- Distribution.
- Real-time scoring.
Limitations:
- Performance.
- Limited functionality.

ANALYTICAL ARCHITECTURE #4: BI Scoring: Data Mining Tool Modeling
[Diagram: classic BI stack, with the model built in a data mining tool]
- The BI platform does the scoring.
- The model is built in the DM tool and imported into the BI platform as a PMML model.
- Real-time scoring.
- Supports a wide selection of algorithms.
Limitations:
- Performance.

ANALYTICAL ARCHITECTURE
- Data mining tool scoring
- Database scoring
- BI scoring
  - Native modeling
  - Data mining tool modeling

USE CASE
Association analysis, or market basket analysis

CLICKSTREAM DATA
- Can be generalized to transactions.
- Applies to any product purchases in an enterprise.

CLICKSTREAM DATA: Creation of Association Rules

DATA MINING USING BI SYSTEM
Model building in BI: MicroStrategy Desktop > Data Mining Services

DATA MINING SERVICE
MODEL DETAILS
RESULTS
PMML
[Screenshots: MicroStrategy Desktop > Data Mining Services]

BI VS. DATA MINING THINKING
BI:
- Number of customers lost last month
- Production downtime report
- ROI for marketing campaigns
- Yesterday's revenue
Data mining:
- Who will most likely churn in the next 10 days
- What part of the process will fail, and how to mitigate it
- What next action the prospect will take
- Tomorrow's revenue

ISSUES
- People: The skill sets for data mining and business intelligence rarely overlap.
- Organization: They live in different organizations within an enterprise.
- Technology: Minimal overlap in tools, platforms and technology.
- Use cases: Historical reporting vs. prediction and insights.

BENEFITS
- Distribution: Data mining insights get wider, real-time distribution.
- Smarter analytics: History + predictions.
- Visual discovery: A common link.
- Security: Secure delivery of insights.

RECOMMENDED READING
Advanced Reporting Guide: Enhancing Your Business Intelligence

OPEN SOURCE DATA MINING TOOLS

THANK YOU
Vijay Kotu
linkedin.com/in/vkotu
www.LearnPredictiveAnalytics.com

Appendix

CLUSTERING
[Figures: data set; k-means clustering]

DECISION TREES
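The k-means clustering illustrated in the appendix can be sketched as Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until assignments stop changing. A minimal sketch; seeding with the first k points is a simplification for reproducibility (typical implementations use random or k-means++ seeding), and the data is hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    centroids = [tuple(p) for p in points[:k]]  # simplified seeding: first k points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lbl in zip(points, labels) if lbl == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical 2-D data with two obvious groups
points = [(1, 1), (8, 8), (1.5, 2), (9, 8.5), (1, 0.5), (8.5, 9)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```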
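A decision tree, covered in the appendix, is grown by repeatedly choosing the split that best separates the classes. A minimal sketch of one such step, assuming Gini impurity as the split criterion (one common choice; information gain is another) and hypothetical churn data with features (support calls per month, tenure in years).

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Find the (feature index, threshold) whose split `x[f] <= t` minimizes
    the weighted Gini impurity of the two child nodes.

    Returns (feature, threshold, impurity); feature is None if no split helps.
    """
    best = (None, None, gini(labels))
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [lbl for row, lbl in zip(rows, labels) if row[f] <= t]
            right = [lbl for row, lbl in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue  # everything on one side is not a real split
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if impurity < best[2]:
                best = (f, t, impurity)
    return best


# Hypothetical customers: (support calls per month, tenure in years)
rows = [(5, 1), (6, 2), (7, 1), (1, 5), (0, 6), (2, 4)]
labels = ["churn", "churn", "churn", "stay", "stay", "stay"]
print(best_split(rows, labels))  # (0, 2, 0.0)
```

A full tree applies the same search recursively to each child node until the nodes are pure or a stopping rule (depth, minimum node size) is reached.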
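The market basket use case comes down to counting how often items occur together. The sketch below computes support, confidence and lift for one-item => one-item rules, pruning candidates Apriori-style (a pair can only be frequent if both of its items are). It is a simplified illustration, not the MicroStrategy implementation or full Apriori/FP-Growth; the baskets and thresholds are hypothetical, and clickstream page views could stand in for items.

```python
from itertools import combinations


def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Mine lhs => rhs rules between single items.

    Returns (lhs, rhs, support, confidence, lift) tuples for rules that
    clear both thresholds.
    """
    n = len(transactions)
    baskets = [set(t) for t in transactions]

    def support(itemset):
        # Fraction of baskets containing every item in the itemset.
        return sum(1 for b in baskets if itemset <= b) / n

    items = sorted({i for b in baskets for i in b})
    # Apriori pruning: only individually frequent items can form frequent pairs.
    frequent_items = [i for i in items if support({i}) >= min_support]
    rules = []
    for a, b in combinations(frequent_items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs})
            if confidence >= min_confidence:
                lift = confidence / support({rhs})
                rules.append((lhs, rhs, pair_support, confidence, lift))
    return rules


# Hypothetical purchase baskets
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]

for lhs, rhs, sup, conf, lift in pair_rules(transactions, min_support=0.6, min_confidence=0.8):
    print(f"{lhs} => {rhs}: support={sup:.2f}, confidence={conf:.2f}, lift={lift:.2f}")
# beer => diapers: support=0.60, confidence=1.00, lift=1.25
```

A lift above 1 means the items appear together more often than they would if purchases were independent, which is what makes a rule a cross-selling candidate.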
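In architecture #4 the model is built in a data mining tool and handed to the BI platform as PMML. To show what scoring such a model involves, here is a sketch that evaluates a hand-written PMML 4.4 RegressionModel with Python's standard XML parser. The model, its field names and coefficients are hypothetical, and only the NumericPredictor terms of a single RegressionTable are handled; production scoring engines support far more of the PMML specification (other model types, transformations, missing-value handling).

```python
import xml.etree.ElementTree as ET

# A tiny, hand-written PMML document (hypothetical model and field names).
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header description="Hypothetical churn-risk score"/>
  <DataDictionary numberOfFields="3">
    <DataField name="tenure" optype="continuous" dataType="double"/>
    <DataField name="support_calls" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="tenure"/>
      <MiningField name="support_calls"/>
      <MiningField name="risk" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="tenure" coefficient="-0.25"/>
      <NumericPredictor name="support_calls" coefficient="0.75"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-4_4"}


def score(pmml_text, record):
    """Score one record: intercept + sum(coefficient * field value).

    Handles only NumericPredictor terms with the default exponent of 1.
    """
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    result = float(table.get("intercept"))
    for pred in table.findall("pmml:NumericPredictor", NS):
        result += float(pred.get("coefficient")) * record[pred.get("name")]
    return result


print(score(PMML_DOC, {"tenure": 2, "support_calls": 3}))  # 2.25
```

Because the model travels as a portable XML document, the BI platform can score new records without calling back into the tool that trained the model, which is what enables the real-time scoring listed for this architecture.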