The Data Mining Lifecycle
Carlos Bossy
Principal Consultant
MCTS, MCITP BI
Aabcom Solutions
www.aabcomsolutions.com
www.carlosbossy.com

Background
Experience
• 6+ years Business Intelligence Consultant
• 15 years Software Development as Programmer thru CTO
Current Projects
• Data Warehouse development for State of Oregon Child Welfare
• Integration/Warehouse/Mining Architecture for Solar Energy Manufacturer
• Data Mining model for City Government
SQL Saturday
o Essential Foundations of a Business Intelligence Architecture
o Real-time Data Integration
o SSIS Patterns and Best Practices

Why do you give a Da#@!
• Predictive Modeling
• In-Database Analytics
• Complete BI Solution

ROI of Business Analytics Projects when Incorporating Predictive Analytics (Source: IDC)
145% vs 89%

Is Data Mining an integral component of your Data Architecture?

What is Data Mining?
Prognostication - Forecasts - Predictions
No Math!

The Lifecycle

Develop the Problem Statement
• What do you want to forecast?
• What data is available to make the prediction?
• State the Solution with a Specific Target

Foster Care Forecast
Faster return home is better for Child, saves Money
Time Child will stay in Foster Care?
Child Profile and History
• Average stay in Foster Care: 413 days
• Error: 50% (206 days)
• Target Error: 20% (82 days)

Gather Data
Data Mining DB
• Data Warehouse
• OLTP
• Cube
• External
• User Data

Data Transformation
A robust Business Intelligence Architecture requires a Data Model suitable for the function it is delivering.
• Transform the data to make sense in your Model
• Get derivations from SMEs, Data Analysis, Groupings
• Discrete vs. Continuous
• Analyze Quantities (Are 6 moves 3 times worse than 2?)
• Review Code Meanings for Groupings (1 threat, 2-5 harmful, 6 fatality)

Data Manipulation
Age –> Continuous values 0 – 17
Age Bands –> 0-2, 3-6, 7-10, 11-14, 15-17
Exact Age –> Continuous values with 2 decimals 0 - 17.99
Normalized Age –> Continuous values with 2 decimals 0 - .999 (Exact Age divided by 18)

1 Threat of Harm
2 Mental Injury
3 Neglect
4 Physical Abuse
5 Sexual Abuse/Exploitation
6 Fatality
–––> Neglect, Abuse, Other (Threat, Mental Injury)

Derived Data
Stock Market
• Simple Moving Averages
• Exponential Moving Averages
• Moving Average Convergence Divergence
• Percentage Price Oscillator
NFL
• Passer Rating
• Defense-adjusted Yards Above Replacement
• Defense-adjusted Value Over Replacement
• QB Score
Baseball
• Batting Average
• On-Base Percentage
• OBP + Slugging
• Total Average
Marketing
• Age Range
• Zip Codes
• Buying Patterns
• Credit Card Usage

Model Development
• Gather
• Test
• Transform
• Train
• Select Training Set
• Choose DM Algorithms

Summary

Thank you and Feedback
• Questions?
• Thank you for attending SQL Rally!
• Please make sure to evaluate the session
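
Supplement: Data Manipulation sketch
The age derivations and allegation-code groupings from the Data Manipulation slide could be expressed roughly as follows. This is a minimal sketch, not the presenter's implementation. The function and constant names are hypothetical. The band edges and the divide-by-18 rule come from the slide. The slide only spells out "Other (Threat, Mental Injury)", so mapping codes 4 and 5 to Abuse and keeping code 6 as its own Fatality group are assumptions.

```python
from bisect import bisect_right

# Lower edges of the age bands listed on the slide: 0-2, 3-6, 7-10, 11-14, 15-17.
BAND_STARTS = [0, 3, 7, 11, 15]
BAND_LABELS = ["0-2", "3-6", "7-10", "11-14", "15-17"]

# Allegation codes 1-6 as listed on the slide, mapped to coarser groups.
# Codes 1 and 2 -> "Other" is stated on the slide; the rest is an assumption.
ALLEGATION_GROUPS = {
    1: "Other",     # Threat of Harm
    2: "Other",     # Mental Injury
    3: "Neglect",   # Neglect
    4: "Abuse",     # Physical Abuse (assumed grouping)
    5: "Abuse",     # Sexual Abuse/Exploitation (assumed grouping)
    6: "Fatality",  # Fatality (assumed to remain its own group)
}


def _check_age(exact_age: float) -> None:
    """Reject ages outside the child range used on the slide (0 up to, not including, 18)."""
    if not 0 <= exact_age < 18:
        raise ValueError(f"age must be in [0, 18), got {exact_age}")


def age_band(exact_age: float) -> str:
    """Discretize an exact age into one of the slide's five age bands."""
    _check_age(exact_age)
    # bisect_right counts how many band starts are <= the age;
    # subtracting 1 gives the index of the band the age falls into.
    return BAND_LABELS[bisect_right(BAND_STARTS, exact_age) - 1]


def normalized_age(exact_age: float) -> float:
    """Scale an exact age into [0, 1) by dividing by 18, as on the slide."""
    _check_age(exact_age)
    return exact_age / 18
```

With these definitions, age_band(6.25) returns "3-6" and normalized_age(9.0) returns 0.5. Keeping both the banded and the normalized form lets the same attribute be offered to an algorithm as either a discrete or a continuous input, which is the Discrete vs. Continuous choice raised on the Data Transformation slide.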