Neural Network Algorithm - QLD SQL Server User Group

• “The half-life of BI is typically shorter than
the life of the project needed for its
implementation.”
--Industry whitepaper (see references)
• “Predicting is hard…
• …especially about the future”
--Yogi Berra
• A recent Gartner Group Advanced
Technology Research Note listed data
mining at the top of the five key
technology areas that "will clearly have a
major impact across a wide range of
industries within the next 3 to 5 years."
• Data Mining finds patterns in data
– Using Machine Learning Algorithms
• Don’t worry: the hard yards are done
– Much of the work was done at Microsoft Research
• Uses these patterns to make predictions
SSAS ≠ Cube
Dimensional Modelling:
– Build a Cube
– Learn MDX
– Construct Analyses
…of the PAST
Data Mining:
– Build Structure
– Use Model
– Make Predictions
…about the Future
• Cubes summarize facts:
– For example:
» Sums of Sales in all regions for all months
» Aggregated by Gender and Age
» For each Product
» …
– Data mining finds patterns in data
– Cubes abstract away much of the interesting information
» Facts that form the patterns are lost in the Cube’s summations
• Connect to Data Source
• Highlight Exceptions
• Forecasting
• Key Influencers
• Is it all just smoke and mirrors???
• “Excel data mining add-in was invented to
make astrology look respectable!”
– Donald Data, industry pundit
Jargon:
ADO = ActiveX Data Objects
ADO MD = ADO Multidimensional
AMO = Analysis Management Objects
DSO = Decision Support Objects
XMLA = XML for Analysis
Books Online:
– Contents, or…
– Search for “Data Mining Tutorials”
• Business Intelligence Development Studio
• Demo: Key Influencers
– Models and Model Viewers
» Decision Tree
» Cluster
» Naïve Bayes
» Neural Network
• Hybrid
• Linear regression & association & classification
• Algorithm highlights
– Remove rare attributes (“Feature Selection”)
– Group values into bins for performance
– Correlate input attributes with outcomes
– Find the attribute that separates outcomes with maximum information gain
• Split the tree on that attribute and re-apply to each branch
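To make “maximum information gain” concrete, here is a small Python sketch of the entropy-based score used by classic ID3-style decision trees. It is an assumption that this matches the score SSAS applies, because the service may use a different scoring method. The toy rows and the `cars`/`age`/`buyer` columns are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on `attribute`."""
    before = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return before - after


def best_split(rows, attributes, target):
    """Pick the attribute whose split separates the outcomes with the largest gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy data: does a customer buy a bike?
rows = [
    {"cars": "0", "age": "young", "buyer": "yes"},
    {"cars": "0", "age": "old", "buyer": "yes"},
    {"cars": "1", "age": "young", "buyer": "yes"},
    {"cars": "2", "age": "young", "buyer": "no"},
    {"cars": "2", "age": "old", "buyer": "no"},
    {"cars": "1", "age": "old", "buyer": "no"},
]
print(best_split(rows, ["cars", "age"], "buyer"))  # prints "cars"
```

Splitting on `cars` leaves two of the three branches pure, giving a gain of about 0.67 bits versus about 0.08 for `age`. A full tree would then repeat the same search inside each branch.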
• Algorithm options:
– Non-scalable (all records)
– Scalable (50,000 records + 50,000 more if needed)
» 3x faster than non-scalable
– K-means (hard)
– Expectation Maximization (soft) (default)
» Form initial clusters
» Assign a probability to each attribute-value in each cluster
» Iterate until the likelihood of the data under the model stops improving
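The Expectation Maximization steps above can be sketched for the simplest possible case: one continuous attribute and two Gaussian clusters. This is an illustration under assumptions (Gaussian clusters, caller-supplied initial means, hypothetical data), not the SSAS clustering implementation, which also handles discrete attribute-values. Each point gets a probability of belonging to each cluster (the “soft” assignment), and iteration stops once the log-likelihood of the data stops improving.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_clusters(data, means, iterations=100, tol=1e-9):
    """Soft-cluster 1-D data into two Gaussian clusters with EM."""
    means = list(means)            # initial clusters (assumed supplied by caller)
    var = [1.0, 1.0]
    weights = [0.5, 0.5]
    prev_ll = -math.inf
    resp = []
    for _ in range(iterations):
        # E-step: probability that each point belongs to each cluster
        resp, ll = [], 0.0
        for x in data:
            p = [weights[k] * gaussian_pdf(x, means[k], var[k]) for k in range(2)]
            total = sum(p)
            ll += math.log(total)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate each cluster from the soft memberships
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = max(sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk, 1e-6)
        # Stop once the log-likelihood of the data no longer improves
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return means, weights, resp


# Hypothetical data with two obvious groups
data = [0.1, -0.2, 0.3, 0.0, 9.8, 10.1, 10.3, 9.9]
means, weights, resp = em_two_clusters(data, [1.0, 8.0])
print([round(m, 3) for m in means], [round(w, 2) for w in weights])
```

K-means would instead give each point a membership of exactly 0 or 1. EM keeps fractional memberships, which is what makes it “soft”.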
• Simple, fast, surprisingly accurate
• “Naïve”: attributes are assumed to be independent of each other
• Used pervasively throughout Data Mining
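Here is a minimal Python sketch of the “naïve” independence assumption. Class priors and per-class attribute-value counts are learned by counting. A prediction then multiplies the per-attribute probabilities as if they were independent. The add-one smoothing, the `gender`/`commute`/`buyer` attributes and the toy rows are assumptions for illustration, not how SSAS implements the algorithm.

```python
from collections import Counter, defaultdict


def train(rows, target):
    """Count classes and, per class, how often each attribute-value occurs."""
    class_counts = Counter(r[target] for r in rows)
    value_counts = defaultdict(Counter)   # (attribute, class) -> value counts
    vocab = defaultdict(set)              # attribute -> distinct values seen
    for r in rows:
        for attr, value in r.items():
            if attr != target:
                value_counts[(attr, r[target])][value] += 1
                vocab[attr].add(value)
    return class_counts, value_counts, vocab


def predict(model, case):
    """Return the class with the highest P(class) * product of P(value | class)."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n in class_counts.items():
        score = n / total                 # prior P(class)
        for attr, value in case.items():
            # "Naive" step: multiply per-attribute likelihoods as if independent,
            # with add-one smoothing so an unseen value cannot zero the product
            score *= (value_counts[(attr, cls)][value] + 1) / (n + len(vocab[attr]))
        scores[cls] = score
    return max(scores, key=scores.get)


# Hypothetical training cases
rows = [
    {"gender": "F", "commute": "short", "buyer": "yes"},
    {"gender": "M", "commute": "short", "buyer": "yes"},
    {"gender": "F", "commute": "short", "buyer": "yes"},
    {"gender": "M", "commute": "long", "buyer": "no"},
    {"gender": "F", "commute": "long", "buyer": "no"},
]
model = train(rows, "buyer")
print(predict(model, {"gender": "M", "commute": "short"}))  # prints "yes"
print(predict(model, {"commute": "long"}))                  # prints "no"
```

Because the attributes are treated independently, training is a single counting pass over the data, which is a large part of why the algorithm is fast.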
P(Result | Data) = P(Data | Result) * P(Result) / P(Data)
P(Girl | Trousers) = ?
P(Trousers | Girl) = 20/40
P(Girl) = 40/100
P(Trousers) = 80/100
P(Girl | Trousers) =
P(Trousers | Girl) P(Girl) / P(Trousers)
= (20/40)(40/100)/(80/100) = 20/80 = 0.25
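The same arithmetic can be checked with exact fractions. The `bayes` helper is a hypothetical name. The inputs are the slide’s probabilities, which are consistent with a population of 100 that has 40 girls and 80 trouser-wearers, 20 of them girls.

```python
from fractions import Fraction


def bayes(p_data_given_result, p_result, p_data):
    """Bayes' rule: P(Result | Data) = P(Data | Result) * P(Result) / P(Data)."""
    return p_data_given_result * p_result / p_data


p_girl_given_trousers = bayes(Fraction(20, 40), Fraction(40, 100), Fraction(80, 100))
print(p_girl_given_trousers)         # prints 1/4
print(float(p_girl_given_trousers))  # prints 0.25
```

As a sanity check, counting directly gives the same answer: 20 of the 80 trouser-wearers are girls.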
[Figure: neural network with Input Neurons (labels include Age and Cars), Hidden Neurons and Output Neurons (Buy / No), each connection carrying a weight W]
• Multilayer Perceptron Network = Back-Propagated Delta Rule Network
• Assign weights: assess the importance of each input on the output using the training dataset
• Batch Learning
– Start at the outputs and propagate back through the network:
– Evaluate weight accuracy: predicted value vs. holdout value
– Adjust weights to improve prediction
» Weights can be negative to show an inhibiting influence
• Iterate using the conjugate gradient algorithm until the weights converge
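The batch back-propagation loop above can be sketched in plain Python for a network with one hidden layer and one sigmoid output. This sketch rests on several assumptions. It uses ordinary gradient descent in place of the conjugate gradient optimiser. It uses a cross-entropy error and a hypothetical toy target (logical OR). It also omits the holdout set, so it is not the SSAS neural network implementation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_mlp(X, y, hidden=3, epochs=3000, lr=0.5, seed=0):
    """Train a one-hidden-layer perceptron with batch back-propagation.

    Assumption: plain gradient descent stands in for conjugate gradient.
    """
    rng = random.Random(seed)
    n_in, n = len(X[0]), len(X)
    W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]  # input -> hidden
    b = [0.0] * hidden
    v = [rng.uniform(-1, 1) for _ in range(hidden)]                         # hidden -> output
    c = 0.0

    def forward(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(W[j], x)) + b[j]) for j in range(hidden)]
        return h, sigmoid(sum(vj * hj for vj, hj in zip(v, h)) + c)

    def loss():
        # Mean cross-entropy of the current predictions against the targets
        total = 0.0
        for x, t in zip(X, y):
            p = forward(x)[1]
            total -= t * math.log(p) + (1 - t) * math.log(1 - p)
        return total / n

    start = loss()
    for _ in range(epochs):
        gW = [[0.0] * n_in for _ in range(hidden)]
        gb, gv, gc = [0.0] * hidden, [0.0] * hidden, 0.0
        for x, t in zip(X, y):          # batch learning: accumulate over every case
            h, p = forward(x)
            d_out = p - t               # error signal at the output neuron
            gc += d_out
            for j in range(hidden):
                gv[j] += d_out * h[j]
                # Propagate the output error back to hidden neuron j
                d_h = d_out * v[j] * h[j] * (1 - h[j])
                gb[j] += d_h
                for i in range(n_in):
                    gW[j][i] += d_h * x[i]
        # Adjust every weight against its averaged gradient
        c -= lr * gc / n
        for j in range(hidden):
            v[j] -= lr * gv[j] / n
            b[j] -= lr * gb[j] / n
            for i in range(n_in):
                W[j][i] -= lr * gW[j][i] / n

    def predict(x):
        return forward(x)[1]

    return predict, start, loss()


# Hypothetical toy target: logical OR of two binary inputs
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 1]
predict, loss_before, loss_after = train_mlp(X, y)
print(f"loss {loss_before:.3f} -> {loss_after:.3f}")
print([round(predict(x), 2) for x in X])
```

Nothing constrains the signs of `W` and `v` during training, so a connection can end up negative and act as an inhibiting influence on the neuron it feeds.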
• SSMS (aka SQL Mangler)
– Analysis Services Database
• Data Mining
• Business Intelligence Development Studio
• Lift Chart: Key Influencers
– Decision Tree
– Cluster
– Naïve Bayes
– Neural Network
[Figure: Lift Chart plotting the percentage of the Population targeted against the percentage of Bike Buyers reached: Random 50%, Ideal 100%, Data Mining 85%]
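A lift chart ranks cases by the model’s predicted probability and plots how many of the actual buyers are captured as more of the population is targeted. This Python sketch computes those points. The `lift_curve` name, the scores and the outcomes are hypothetical, and this is not the SSAS lift-chart implementation. In the toy data half of the cases are buyers, so targeting the top 50% captures 50% of buyers at random and 100% with a perfect model.

```python
def lift_curve(scores, actual):
    """Return (fraction_of_population, fraction_of_positives) points,
    targeting cases in order of decreasing model score."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(actual)
    points, captured = [], 0
    for i, (_, is_pos) in enumerate(ranked, start=1):
        captured += is_pos
        points.append((i / len(ranked), captured / total_pos))
    return points


# Hypothetical predicted probability of buying, and whether each person bought
scores = [0.9, 0.8, 0.75, 0.6, 0.4, 0.35, 0.2, 0.1]
actual = [1, 1, 0, 1, 0, 1, 0, 0]
points = lift_curve(scores, actual)
for frac_pop, frac_pos in points:
    print(f"{frac_pop:.0%} of population -> {frac_pos:.0%} of buyers")
# At 50% of the population this hypothetical model reaches 75% of buyers
```

Plotting these points alongside the random diagonal and the ideal curve gives a chart of the same shape as the one described above.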
• Demo: Targeted Mailing
– Find prospective customers
– Save results to database
– Import in a new Data Source View
– Process again with Data Mining!
• Fill By Example
• Goal Seek
• What If
• Highlight Exceptions
• Data Mining Tab:
– Explore Data
– Clean Data, etc.
• Off-the-shelf toolkit
• No Cube required
• No code required
• Good default parameters
• Easily explored models
– Change parameters, filter input, compare lift
• Excel Add-In
• Data Mining Add-ins
http://office.microsoft.com/en-us/excel-help/data-mining-add-insHA010342915.aspx#_Toc257717762
• Analysis Services - Data Mining Videos
http://msdn.microsoft.com/en-us/library/dd776389(v=SQL.100).aspx
• SQL Server Data Mining Home
http://www.sqlserverdatamining.com/ssdm/
• Microsoft Contoso BI Demo Dataset for Retail Industry
http://www.microsoft.com/downloads/en/details.aspx?displaylang=en&FamilyID=868662dc-187a-4a85-b611-b7df7dc909fc
• What Every IT Manager Should Know About Business Users’ Real Needs for BI
http://docs.media.bitpipe.com/io_25x/io_25515/item_392177/Tableau_S_MktgLtr_BI_IT.pdf
• An Introduction to Data Mining: Discovering hidden value in your data warehouse
http://www.thearling.com/text/dmwhite/dmwhite.htm
• Problems:
– Data too old to be useful
– Need for instantaneous feedback
• Solution:
– StreamInsight
• Complex Event Processing
• Processing and querying of event data
streams
• Data queried while “in flight”
• May involve multiple concurrent event
sources
• Works with high data rates
• Aims for near-zero latency
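To illustrate querying data “in flight”, this Python sketch counts events per source in fixed, non-overlapping (tumbling) time windows. It emits each window’s result as soon as a later event closes that window, rather than waiting for the data to land in a database. StreamInsight itself is a .NET engine with its own query model. The function, the event tuples and the source names here are hypothetical and do not reflect its API.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Yield (window_start, {source: count}) for each fixed-size time window.

    Assumes `events` is an iterable of (timestamp_seconds, source) pairs
    arriving in time order.
    """
    current_window, counts = None, defaultdict(int)
    for ts, source in events:
        window = int(ts // window_seconds)
        if current_window is not None and window != current_window:
            # A later event closed the previous window: emit its result now
            yield current_window * window_seconds, dict(counts)
            counts = defaultdict(int)
        current_window = window
        counts[source] += 1
    if current_window is not None:
        yield current_window * window_seconds, dict(counts)


# Hypothetical sensor events: (seconds since start, source)
events = [(0.5, "sensor-a"), (1.2, "sensor-b"), (3.9, "sensor-a"), (5.1, "sensor-a"), (6.0, "sensor-b")]
for start, counts in tumbling_window_counts(events, 5):
    print(start, counts)
# prints: 0 {'sensor-a': 2, 'sensor-b': 1}
#         5 {'sensor-a': 1, 'sensor-b': 1}
```

Because the function is a generator, results come out incrementally as events arrive, which is the property that keeps latency low.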
[Figure: CEP Target Scenarios, plotting latency (from months down to < 1 ms) against Aggregate Data Rate (Events/sec, from 0 to 100,000 and higher). Scenarios shown: Relational Database Applications, Data Warehousing Applications, Operational Analytics Applications (e.g., Logistics), Web Analytics Applications, Monitoring Applications, Manufacturing Applications and Financial Trading Applications]
StreamInsight Architecture
[Figure: input data streams from data sources, operations, assets, feeds, sensors and devices flow into the CEP Engine, whose queries (f(x), f'(x), g(y), h(x,y)) produce output data streams and results; surrounding stages: Monitor & Record, Operational Data Store & Archive, Mine & Design, Manage & Benefit]
• Algorithmic trading
• Smart order routing
• Real-time profit and loss
• Rapid analysis of transactional cost
• Fraud detection
• Risk management
• Often 100,000 events per second
• Automate
– Page layout
– Navigation
– Presentation
– Targeted advertising
• Real-time network monitoring
• Quality of service monitoring
• Location-based services
• Fraud detection
• Intrusion detection
• Battlefield control
• Monitoring of resource locations
• Intrusion detection
• Network traffic analysis
– Emails
– Network traffic
– Watch lists
– Financial movements
• Asset monitoring
• Aggregation of machine-based sensor
data
• Generation of alerts in error conditions
• Identifying the “golden batch”
• Real-time monitoring
• Managing player interest
• Website traffic analysis
• Detecting and eliminating undesired behaviors
• Understanding behavioral patterns
• Patient management
• Outbreak management
• Trend detection
• Insurance risk analysis
• Vehicle management
• Supply chain forecasting and tracking
• Maritime logistics
• GPS tracking
• Monitoring
– Consumption
– Variations
• Detecting outages
• Smart grid management
• Aggregating data across the grid
• Gaming machine event analysis
• Card table analysis
– Fraud detection
– Profit and loss in real-time
• Targeted advertising
– Player behavior
– Loyalty system implementation