The Data Mining Lifecycle
Carlos Bossy
Principal Consultant
MCTS, MCITP BI
Aabcom Solutions
www.aabcomsolutions.com
www.carlosbossy.com
Background
Experience
• 6+ years as a Business Intelligence Consultant
• 15 years in Software Development, from Programmer through CTO
Current Projects
• Data Warehouse development for State of Oregon Child Welfare
• Integration/Warehouse/Mining Architecture for Solar Energy Manufacturer
• Data Mining model for City Government
SQL Saturday
o Essential Foundations of a Business Intelligence Architecture
o Real-time Data Integration
o SSIS Patterns and Best Practices
Why do you give a Da#@!
Predictive Modeling
In-Database Analytics
Complete BI Solution
ROI of Business Analytics Projects (Source: IDC)
145% when incorporating Predictive Analytics vs 89% without
Is Data Mining an integral component of your Data Architecture?
What is Data Mining?
Prognostication - Forecasts - Predictions
No Math!
The Lifecycle
Develop the Problem Statement
What do you want to forecast?
What data is available to make the prediction?
State the Solution with a Specific Target
Foster Care Forecast
Faster return home is better for the Child and saves Money
Time Child will stay in Foster Care?
Child Profile and History
• Average stay in Foster Care: 413 days
• Error: 50% (206 days)
• Target Error: 20% (82 days)
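The day counts on this slide follow from simple arithmetic on the 413-day average: the error in days is the percentage error applied to the average stay. A minimal sketch of that conversion is below. The function name `error_in_days` is hypothetical, and dropping the fractional day is an assumption made so the results match the slide's figures.

```python
AVERAGE_STAY_DAYS = 413  # average stay in foster care, taken from the slide


def error_in_days(error_pct, average_days=AVERAGE_STAY_DAYS):
    """Convert a percentage error into whole days, dropping any fraction."""
    return average_days * error_pct // 100


current_error = error_in_days(50)  # error of the current estimate
target_error = error_in_days(20)   # error the model should achieve
print(current_error, target_error)
```

Stating the target in days as well as in percent gives the model a concrete pass/fail threshold during testing.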
Gather Data
Sources feeding the Data Mining DB:
• Data Warehouse
• OLTP
• Cube
• External
• User Data
Data Transformation
A robust Business Intelligence Architecture requires a Data Model suitable for the function it is delivering.
• Transform the data to make sense in your Model
• Get derivations from SMEs, Data Analysis, Groupings
• Discrete vs. Continuous
• Analyze Quantities (Are 6 moves 3 times worse than 2?)
• Review Code Meanings for Groupings (1 threat, 2-5 harmful, 6 fatality)
Data Manipulation
Age -> Continuous values 0 - 17
Age Bands -> 0-2, 3-6, 7-10, 11-14, 15-17
Exact Age -> Continuous values with 2 decimals 0 - 17.99
Normalized Age -> Continuous values with 2 decimals 0 - .999 (Exact Age divided by 18)
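The four age representations above can all be derived from one exact age. The sketch below is one possible way to do it, not the presenter's implementation. The function names are hypothetical, and rounding the normalized age to 3 decimals is an assumption chosen to match the stated range of 0 - .999.

```python
AGE_BANDS = [(0, 2), (3, 6), (7, 10), (11, 14), (15, 17)]  # bands from the slide


def age_band(age_years):
    """Return the band label (e.g. '3-6') for a whole-year age from 0 to 17."""
    for low, high in AGE_BANDS:
        if low <= age_years <= high:
            return f"{low}-{high}"
    raise ValueError(f"age out of range: {age_years}")


def derive_age_features(exact_age):
    """Build the discrete and continuous age variants from an exact age."""
    whole_years = int(exact_age)  # Age: whole years, 0 - 17
    return {
        "age": whole_years,
        "age_band": age_band(whole_years),            # discrete grouping
        "exact_age": round(exact_age, 2),             # 0 - 17.99
        "normalized_age": round(exact_age / 18, 3),   # Exact Age divided by 18
    }


print(derive_age_features(7.5))
```

Keeping all four variants lets you test which one a given mining algorithm handles best, since some work better with discrete inputs and others with continuous ones.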
Code  Meaning
1     Threat of Harm
2     Mental Injury
3     Neglect
4     Physical Abuse
5     Sexual Abuse/Exploitation
6     Fatality
Grouped into:
• Neglect
• Abuse
• Other (Threat, Mental Injury)
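Collapsing detailed codes into a few groups is a lookup. The sketch below assumes that codes 4 and 5 map to "Abuse", which is inferred from their names. The slide does not assign code 6 (Fatality) to any group, so this sketch leaves it as its own category; that choice, and the name `group_for_code`, are assumptions.

```python
CODE_MEANINGS = {
    1: "Threat of Harm",
    2: "Mental Injury",
    3: "Neglect",
    4: "Physical Abuse",
    5: "Sexual Abuse/Exploitation",
    6: "Fatality",
}

# Groups from the slide; 4 and 5 -> "Abuse" is inferred from the code names.
CODE_GROUPS = {
    1: "Other",
    2: "Other",
    3: "Neglect",
    4: "Abuse",
    5: "Abuse",
}


def group_for_code(code):
    """Map a detailed code to its group; ungrouped codes keep their own meaning."""
    if code not in CODE_MEANINGS:
        raise ValueError(f"unknown code: {code}")
    # Assumption: code 6 has no group on the slide, so it falls through unchanged.
    return CODE_GROUPS.get(code, CODE_MEANINGS[code])


print([group_for_code(c) for c in sorted(CODE_MEANINGS)])
```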
Derived Data
Stock Market                          | NFL                                      | Baseball           | Marketing
Simple Moving Averages                | Passer Rating                            | Batting Average    | Age Range
Exponential Moving Averages           | Defense-adjusted Yards Above Replacement | On-Base Percentage | Zip Codes
Moving Average Convergence Divergence | Defense-adjusted Value Over Average      | OBP + Slugging     | Buying Patterns
Percentage Price Oscillator           | QB Score                                 | Total Average      | Credit Card Usage
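Derived values like these are computed from raw data before the model ever sees them. As an illustration, here is a minimal sketch of the two moving averages listed in the Stock Market column. The function names are hypothetical, and seeding the exponential average with the first value and using a smoothing factor of 2 / (span + 1) is one common convention, assumed here.

```python
def simple_moving_average(values, window):
    """Average of the last `window` values, for each position with a full window."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def exponential_moving_average(values, span):
    """Weighted average that favors recent values, seeded with the first value."""
    if not values:
        raise ValueError("values must not be empty")
    alpha = 2 / (span + 1)  # assumed smoothing convention
    ema = [values[0]]
    for value in values[1:]:
        ema.append(alpha * value + (1 - alpha) * ema[-1])
    return ema


prices = [1, 2, 3, 4, 5]  # made-up values for illustration
print(simple_moving_average(prices, 3))
print(exponential_moving_average(prices, 3))
```

The same pattern applies to the other columns: a derived column such as Batting Average or Age Range condenses raw records into a value the mining algorithm can use directly.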
Model Development
Gather -> Transform -> Select Training Set -> Choose DM Algorithms -> Train -> Test (then repeat)
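The select, train, and test steps of this cycle can be sketched with a holdout split. The example below is illustrative only. The rows are made up, the 70/30 split is an assumption, and a "predict the training mean" baseline stands in for a real data mining algorithm, which the slide does not name.

```python
import random


def split_train_test(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and split it into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def train_mean_model(train_rows, target):
    """Baseline 'model': always predict the mean target of the training set."""
    mean = sum(row[target] for row in train_rows) / len(train_rows)
    return lambda row: mean


def mean_absolute_error(model, test_rows, target):
    """Average absolute difference between predictions and actual values."""
    return sum(abs(model(row) - row[target]) for row in test_rows) / len(test_rows)


# Made-up rows for illustration only; not real case data.
stays = (120, 300, 410, 450, 500, 380, 290, 610, 415, 350)
rows = [{"age_band": "0-2", "days_in_care": days} for days in stays]

train, test = split_train_test(rows)
model = train_mean_model(train, "days_in_care")
mae = mean_absolute_error(model, test, "days_in_care")
print(f"train={len(train)} test={len(test)} MAE={mae:.1f} days")
```

Comparing the test error against the target stated in the problem statement tells you whether to stop or go around the cycle again with more data, new derivations, or a different algorithm.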
Summary
Thank you and Feedback
• Questions?
• Thank you for attending SQL Rally!
• Please make sure to evaluate the session