Business Intelligence Workshop, Helia, May 2008
Introduction to DW, OLAP, and DM using SQL Server
Fritz Laux, Reutlingen University, DBTechNet
Data Warehousing (DW), Online Analytical Processing (OLAP), Data Mining (DM)
© F. Laux

Topics
1. Introduction to BI and CPM
2. ETL Process
3. DW Modeling
4. OLAP
5. Data Mining
(c) 2008, Fritz Laux, Reutlingen University

Critical Questions About an Enterprise
• Are we on the right track?
  - Yes, we are!
• How about our competitors?
  - Ahead of us!
• Economic trends?
  - Turbulence!
(Illustrations: © Prof. Dipl.-Kfm. A. Roth)

Where Do We Get the Knowledge From?
• About the enterprise
  - From the company's operational information systems
• About the market and competitors
  - From the census bureau
  - From public statistical data
• About economic trends
  - From financial and economic publications
• "How you gather, manage, and use information will determine whether you win or lose." (Bill Gates, Business @ The Speed of Thought, 1999)
• So, where is the problem?

Definition and Problems to Solve in Business Intelligence
• Definition: Business Intelligence (BI) refers to processes and technologies using fact-based systems to analyze business
• BI needs to deal with:
  1. Information overload
  2. Missing knowledge
  3. We do not know which questions are the right ones
  4. We do not know the influencing factors and their impact
  5. Key measures or indicators to steer an enterprise are missing

Information Pyramid
• "We're drowning in information and starving for knowledge." (Rutherford D. Rogers, Yale, 1985)
• Aspects
  - IT view
  - Business view
  - Market view
[Figure: information pyramid. Layer labels: Operational Systems (OLTP), Data Warehouse, DSS, EIS, OLAP, Data Mining. Side labels: IT view, business view, market view. Axis labels: amount of information, growing knowledge.]

Motivation
• What is the goal of my organization?
• How do we affect the market?
• How do we perform?

Motivation
• Business Intelligence as a critical success factor
• Purpose: support business decision making

Corporate Performance Management (CPM)
• How can we steer an enterprise?
[Figure: CPM control cycle with the steps start, set goals, plan, execute, monitor, analyze, re-plan. Idea from MIK AG: http://www.mik.info]
• BI tools provide the means to steer an enterprise by
  - Measuring the effect of decisions
  - Analyzing the performance
  - Comparing it with the goals
• Definition: CPM is the framework for steering an enterprise by means of Business Intelligence

How Can We Measure Corporate Performance?
• Through Key Performance Indicators (KPIs)
  - Definition: A KPI is a metric to define and measure state and progress towards an organization's goal
  - Usually high-level relative values
• Examples
  - Customer KPIs
    => Customer satisfaction
    => Customer attrition (loss)
  - Manufacturing KPIs
    => Overall Equipment Effectiveness: OEE = Availability * Performance * Quality
  - Financial KPIs
    => Profit Margin: PM = Net Income / Sales
    => Return on Investment: ROI = Turnover * Earnings / Sales = …
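
The OEE and profit margin formulas on this slide can be written down directly. The following is a minimal sketch; the function names and all input figures are hypothetical and only illustrate the arithmetic.

```python
def oee(availability: float, performance: float, quality: float) -> float:
    """Overall Equipment Effectiveness: product of three ratios in [0, 1]."""
    for name, ratio in (("availability", availability),
                        ("performance", performance),
                        ("quality", quality)):
        if not 0.0 <= ratio <= 1.0:
            raise ValueError(f"{name} must be a ratio between 0 and 1")
    return availability * performance * quality


def profit_margin(net_income: float, sales: float) -> float:
    """Profit margin PM = net income / sales."""
    if sales == 0:
        raise ValueError("sales must be non-zero")
    return net_income / sales


# Hypothetical figures for one production line and one fiscal year.
line_oee = oee(availability=0.9, performance=0.8, quality=0.95)
margin = profit_margin(net_income=120_000.0, sales=1_500_000.0)
print(f"OEE = {line_oee:.1%}, PM = {margin:.1%}")
```

Both results are relative values, which fits the slide's remark that KPIs are usually high-level relative values.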

Return on Investment (ROI)
• Financial KPIs have "natural" metrics
[Figure: ROI diagram. Source: Fred Nickols, 2000, originally by Johnson and Kaplan]
• But how about soft factor metrics?

Soft Factor Metric
• Example: Customer satisfaction
  - General satisfaction
  - Specific satisfaction: quality/price of product, speed of delivery, …
    => How do we compare these?
• Search for a mapping of categorical values to ordinal values
  - Totally satisfied (ts) → 9
  - Partially satisfied (ps) → 3
• Meaning of the metric
  - ts = 3 * ps? … No! But ts is-better-than ps
  - Are two metrics comparable? … No! But we do weighted comparisons.
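
A weighted comparison of ordinal satisfaction scores could look like the sketch below. The scores 9 and 3 are taken from the slide; the criteria, weights, and customers are hypothetical, and the weighted mean is only one possible way to combine the scores.

```python
# Ordinal scores from the slide: the ordering matters, the ratio does not.
SCORE = {"totally satisfied": 9, "partially satisfied": 3}


def weighted_score(answers: dict, weights: dict) -> float:
    """Weighted mean of ordinal scores over several satisfaction criteria."""
    total_weight = sum(weights[criterion] for criterion in answers)
    weighted_sum = sum(SCORE[answer] * weights[criterion]
                       for criterion, answer in answers.items())
    return weighted_sum / total_weight


# Hypothetical criteria and weights.
weights = {"quality": 0.5, "price": 0.3, "delivery": 0.2}
customer_a = {"quality": "totally satisfied",
              "price": "partially satisfied",
              "delivery": "partially satisfied"}
customer_b = {"quality": "partially satisfied",
              "price": "totally satisfied",
              "delivery": "partially satisfied"}

score_a = weighted_score(customer_a, weights)
score_b = weighted_score(customer_b, weights)
# Only the ranking is meaningful: customer_a is-better-than customer_b.
print(score_a > score_b)
```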

Motivation
• Why can't we use our OLTP system?
  - Missing information
    => Need for integration of economic and census data
    => Need for soft factors to assess an enterprise
  - Missing KPIs and steering parameters
    => Need for highly significant KPIs and parameters
  - Influencing factors and different perspectives not available
    => Need for multidimensional analysis and presentation
[Figure source: One Hundred & Eighty Degrees Systems Limited, 2004]

Motivation
• Why can't we use our OLTP system?
  - Queries return only explicit information
      Select customer, sum(sales) from Orders
      where Region ….
      Group by …
    => We don't know what to ask!
    => Need for interactive, explorative analysis
  - Inappropriate presentation of information
    => Tabular presentation, one-dimensional analysis (Sales ok? Trend ok? Reason?)
    => We can't see the problem!
    => Need for multidimensional analysis and presentation

Management Cockpit
• The CPM paradise
[Figure: management cockpit. Sources: Juergen Daum, New Economy Analyst Report, 2004; SAP Whitepaper, SAP SEM / CPM, http://help.sap.com/]

The Business Intelligence Process
[Figure: BI process diagram. Data sources (xls, DBS, stats, WWW), ETL, data warehouse, cubes and data marts (dimensions: product, time, region), analysis with OLAP and data mining. Phase labels: design, build up.]

Extraction Transformation Loading
[Figure: BI process diagram (data sources, ETL, data warehouse, cubes and data marts, OLAP, data mining).]

Data Sources
• Technical data sources supported by SQL Server Integration Services (SSIS)
• General sources
  - Time
  - Geography
• OLTP
  - Master data
  - Transaction data
• Planning
  - Planned turnover
  - Profit, etc.
• Economic data
  - Business sector data
  - Economic forecast

Extract and Transform
• Select: which data are needed?
• Cleanse: where are the user data?
• Convert: do all facts have the same unit, coding, and granularity?
• Harmonize: are there synonyms and homonyms?
• Adjust: grouping, classification?
• Correct: are the data correct?
• Amend: are the data complete?

Extract and Transform Example
• Select: e.g. http://.../consumptionPerCapita/coffee.html
• Cleanse: e.g. strip off html tags
• Convert: e.g. convert consumption into kg
• Harmonize: e.g. import with consumption?
• Adjust: e.g. region grouping
• Correct: e.g. incorrect value for D 1989
• Amend: e.g. for NL 1988

<table border="1" … width="21%">
<tr>
<td width="58%">Country</td>
<td width="45%">1987</td>
</tr>
<tr>
<td width="58%">Finland</td>
<td width="45%">12,04</td>
</tr>
</table>

Country    1987       1988       1989
Finland    12,04      ?          11,68
Sweden     11,64      11,71      11,08
Norway     20,13 lb   20,81 lb   18,19 lb
Denmark    11         10,65      10,2
Benelux    19,65      20,48      19,89
Austria    7,75       8,17       8,01
Germany    7,38       8,17       0,827
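
The cleanse and convert steps of this example can be sketched in a few lines. This is a simplified illustration, not the SSIS implementation used in the workshop: the helper names are hypothetical, the tag stripping uses a plain regular expression that is only adequate for simple markup like the snippet above, and kg is assumed as the target unit wherever no unit is given.

```python
import re
from typing import Optional

LB_TO_KG = 0.45359237  # one avoirdupois pound in kilograms


def strip_tags(html: str) -> list:
    """Cleanse: drop HTML tags and return the non-empty text cells."""
    text = re.sub(r"<[^>]+>", "\n", html)
    return [cell.strip() for cell in text.splitlines() if cell.strip()]


def to_kg(raw: str) -> Optional[float]:
    """Convert: parse a decimal-comma value; values marked 'lb' become kg.

    Returns None for the missing-value marker '?'.
    """
    raw = raw.strip()
    if raw == "?":
        return None
    is_pounds = raw.endswith("lb")
    number = float(raw.removesuffix("lb").strip().replace(",", "."))
    return round(number * LB_TO_KG, 2) if is_pounds else number


row = '<tr><td width="58%">Finland</td><td width="45%">12,04</td></tr>'
cells = strip_tags(row)
finland_1987 = to_kg(cells[1])
norway_1987 = to_kg("20,13 lb")
missing = to_kg("?")
print(cells, finland_1987, norway_1987, missing)
```

The correct and amend steps (the implausible value for Germany 1989, the missing value) need domain knowledge and are left out of the sketch.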

Hands on Lab: Integration Services (SSIS)
1. Open SQL Server Business Intelligence Studio
2. Create new project
3. Select

Hands on Lab: Integration Services (SSIS)
1. Define connection managers for data sources and destinations
2. Design a data flow from source to destination
3. Build a control flow

Hands on Lab: Integration Services (SSIS)
• Graphically design control and data flow
• Example 1: Loop control, data and error flow
[Figure: SSIS designer. Callouts: control loop, text file data source, data flow, error flow.]

Hands on Lab: Integration Services (SSIS)
• Example 2: ETL control flow design and a data flow taking date entries from sales and purchase orders to build the date dimension
[Figure: SSIS designer. Callouts: start of control flow, Excel data source, data transformation, destination DW, end of control flow.]

Data Warehouse Modeling
[Figure: BI process diagram (data sources, ETL, data warehouse, cubes and data marts, OLAP, data mining).]

Data Warehouse
• Definition: "A data warehouse is a
  - subject-oriented,
  - integrated,
  - time-variant,
  - nonvolatile
  collection of data in support of management's decision-making process."
  William H. 'Bill' Inmon (1996)

Data Warehouse
• Properties
  - Subject-oriented
    => Data is selected and organized to support business analysis
    => Optimized for query and analysis
    => Objects (facts) and their determining factors (dimensions) are linked together
    => Not intended to support OLTP
  - Time-variant
    => Accumulates historical data over time
  - Non-volatile (archival)
    => Data is read-only; it is never updated, only added
    => May have redundancies
    => Contains pre-calculated aggregations
  - Integrated
    => Contains data from different sources (OLTP systems, economic databases, etc.)

5.3 Dimensional Fact Model
• Properties
  - Multidimensional model
  - Distinction between facts (measures) and dimensions
  - Structural dimensions
  - Attributes of dimensions
  - Computed values
[Figure: dimensional fact model for the fact Sales. Labels: measures (amount, onStock, value), computed value (average), semi-additive, dimensions (Year, Month, Week; Product, Prod.group), dimension attributes (weight, Type).]

2.1a Taxonomy of Facts
[Figure: taxonomy tree of facts. Labels: Fact; numerical (additive, semi-additive); categorical (ordinal, nominal); temporal.]

DW Schemes
• Star: one fact table, multiple dimension tables
• Galaxy: multiple fact tables, multiple dimension tables
• Snowflake: dimension tables normalized, fact tables aggregated
• All 3 schemata are relational models in disguise
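
Because a star schema is a relational model in disguise, it can be sketched with ordinary tables. The example below uses Python's built-in sqlite3 module instead of SQL Server so that it runs anywhere; all table names, column names, and rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Dimension tables (hypothetical names and columns)
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, prod_group TEXT);
CREATE TABLE dim_time    (time_id    INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, country TEXT);

-- One fact table referencing every dimension: the center of the star
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    time_id    INTEGER REFERENCES dim_time(time_id),
    region_id  INTEGER REFERENCES dim_region(region_id),
    amount     REAL
);

INSERT INTO dim_product VALUES (1, 'E240', 'car'), (2, 'C220', 'car'), (3, 'T1', 'truck');
INSERT INTO dim_time    VALUES (1, 2007, 12), (2, 2008, 1);
INSERT INTO dim_region  VALUES (1, 'Finland'), (2, 'Germany');
INSERT INTO fact_sales  VALUES (1, 1, 1, 100), (2, 1, 2, 150), (1, 2, 2, 200), (3, 2, 1, 400);
""")

# Aggregate the measure along two dimensions by joining fact and dimension tables.
rows = con.execute("""
    SELECT p.prod_group, t.year, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_time    t ON t.time_id    = f.time_id
    GROUP BY p.prod_group, t.year
    ORDER BY p.prod_group, t.year
""").fetchall()
print(rows)
```

A snowflake variant would move prod_group into its own table referenced from dim_product; a galaxy variant would add a second fact table that shares the dimension tables.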

Example Star Scheme
• SSAS Source View
[Figure: SSAS data source view. Callouts: dimension table, fact table.]

Example Galaxy Scheme
• SSAS Source View
[Figure: SSAS data source view. Callouts: joint dimension table, fact tables.]

Example Snowflake Scheme
• SSAS Source View
[Figure: SSAS data source view. Callouts: normalized product dimension, aggregated fact table, fact table.]

Design Rules for DW Scheme
• Use Star if
  - Dimensions have few or dynamic attributes
  - Measures are orthogonal
• Use Snowflake if
  - Dimensions are structured (aggregation)
  - Measures are orthogonal
• Use Galaxy if
  - Dimensions are reused
  - Measures are not orthogonal

Hands on Lab: SQL Server Management Studio
1. Start the SQL Server Management Studio
2. Create a new database
3. Add a new database diagram
4. Create tables
5. Define foreign keys
[Figure: database diagram designer. Callouts: enter table definition; manage keys, relationships; drag and drop columns to define foreign keys.]

Modeling Cubes, OLAP
[Figure: BI process diagram (data sources, ETL, data warehouse, cubes and data marts, OLAP, data mining).]

5.2 Cube Model
• Multidimensional view of the Data Warehouse
  - Dimensions correspond with coordinates
  - Structured dimensions
  - Facts are a function of multiple dimensions
[Figure: cube with the axes product, country, and time. Product labels: vehicle, car, truck, E240, C220. Fact: sales = f(product, country, time)]
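
The idea that a fact is a function of its dimension coordinates can be sketched with a dictionary keyed by (product, country, time). All cell values are hypothetical, and the product hierarchy (E240 and C220 roll up to car) is an assumption read from the figure labels.

```python
from collections import defaultdict

# Hypothetical cube cells: (product, country, year) -> sales
cube = {
    ("E240", "Finland", 2007): 10.0,
    ("C220", "Finland", 2007): 5.0,
    ("E240", "Germany", 2008): 20.0,
    ("T1", "Germany", 2008): 8.0,
}

# Structured product dimension: each product belongs to one group (assumed).
PRODUCT_GROUP = {"E240": "car", "C220": "car", "T1": "truck"}


def sales(product: str, country: str, year: int) -> float:
    """Fact as a function of the coordinates; empty cells count as 0."""
    return cube.get((product, country, year), 0.0)


def roll_up_products(cells: dict) -> dict:
    """Aggregate sales from product level to product-group level."""
    result = defaultdict(float)
    for (product, country, year), value in cells.items():
        result[(PRODUCT_GROUP[product], country, year)] += value
    return dict(result)


by_group = roll_up_products(cube)
print(sales("E240", "Finland", 2007), by_group[("car", "Finland", 2007)])
```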

5.4 Object-Oriented Model
• Object-oriented view of the Data Warehouse
  - "Intelligent" dimensions and facts:
    => Meta-information for dimensions and facts
  - Example:
    => Product dimension has hierarchical aggregation
    => costs can be compared with earnings, but not with noOfOrders
  - Object-oriented structure allows semantically correct navigation and aggregation
[Figure: class diagram. Labels: Hierarchy (level, child), Product, #Orders, price, Timespan (start, end), Month (days).]

MS Visualization of a Hypercube
• Relational view on the OLAP cube structure

MS Visualization of a Hypercube
• Pivot table view on the OLAP data
  - Drag and drop measures and dimensions on the pivot table

OLAP Storage Models
• MOLAP: Multidimensional (md) storage
  - Single cube → one large md array with sparse data
  - Multi-cube → galaxy-structured md arrays
  - Storing the md array in a linear address space
  - Optimized OLAP for small cubes
• ROLAP: Relational storage
  - Storing facts and dimensions in tables
  - Storing aggregations in tables
  - Best choice for very large cubes
• HOLAP: Hybrid storage
  - Storing facts and dimensions in tables
  - Storing aggregations as md arrays
  - Best performance for large cubes
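
Storing a multidimensional array in a linear address space means mapping each coordinate tuple to a single offset. The sketch below uses row-major order, which is an assumption since the slide does not name an ordering; the cube shape is hypothetical.

```python
def linear_address(index: tuple, shape: tuple) -> int:
    """Row-major offset of a multidimensional index in a flat array."""
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same number of dimensions")
    offset = 0
    for position, size in zip(index, shape):
        if not 0 <= position < size:
            raise IndexError("index out of range")
        offset = offset * size + position
    return offset


# Hypothetical cube: 3 products x 4 countries x 12 months in one flat list.
shape = (3, 4, 12)
cells = [0.0] * (3 * 4 * 12)
cells[linear_address((2, 1, 5), shape)] = 42.0
print(linear_address((2, 1, 5), shape))
```

The flat list reserves a slot for every cell, including empty ones, which illustrates why sparse data is a concern for a single large MOLAP array.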

Hands on Lab: SSAS Cube Design
• Start SQL Server Business Intelligence Studio
• Create a new SSAS project
• Add Data Source, View, and create a new cube
• Identify fact and dimension tables
• Select measures
• Define dimensions and aggregation hierarchies
• Save cube definition
• Select storage model and its parameters
• Process and deploy cube

Hands on Lab: Performing OLAP
• Drill down – Roll up
• Slice and Dice
• Drill through
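
Slice and dice can be illustrated outside the SSAS browser on a small in-memory cube. This is a sketch under simple assumptions: cells are stored in a dictionary keyed by (product group, country, year), and all values are hypothetical.

```python
# Hypothetical cube cells: (product group, country, year) -> sales
cells = {
    ("car", "Finland", 2007): 15.0,
    ("car", "Germany", 2007): 30.0,
    ("car", "Germany", 2008): 20.0,
    ("truck", "Germany", 2008): 8.0,
}


def slice_cube(cells: dict, axis: int, value) -> dict:
    """Slice: fix one dimension to a single value and drop that axis."""
    return {key[:axis] + key[axis + 1:]: v
            for key, v in cells.items() if key[axis] == value}


def dice_cube(cells: dict, allowed: dict) -> dict:
    """Dice: keep the sub-cube whose coordinates lie in the allowed sets."""
    return {key: v for key, v in cells.items()
            if all(key[axis] in values for axis, values in allowed.items())}


year_2008 = slice_cube(cells, axis=2, value=2008)
german_cars = dice_cube(cells, {0: {"car"}, 1: {"Germany"}})
print(year_2008)
print(german_cars)
```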

Data Mining
[Figure: BI process diagram (data sources, ETL, data warehouse, cubes and data marts, OLAP, data mining).]

Decision Tree Classification
• Goal: Mapping/prediction of objects to predefined classes based on their attribute values
• Process:
  1. Build a decision tree DT (classification model) with the help of sample objects (training data)
  2. Validation of the DT (e.g. precision) with test data
  3. Classification of unknown objects
[Figure: example decision tree. Root test: car type. Branch "= truck": Risk = low. Branch "≠ truck": test on age; "> 60": Risk = low; "≤ 60": Risk = high.]
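
Steps 2 and 3 of the process can be sketched with the example tree hard-coded as nested conditions. The branch assignment follows one reading of the slide's figure (trucks are low risk, other vehicles are split by age at 60), the test objects are hypothetical, and the share of correctly classified test objects is used as a simple stand-in for the validation measure.

```python
def classify_risk(car_type: str, age: int) -> str:
    """Example decision tree: root test on car type, then on age."""
    if car_type == "truck":
        return "low"
    return "low" if age > 60 else "high"


def hit_rate(test_data: list) -> float:
    """Share of test objects whose predicted class equals the known class."""
    hits = sum(classify_risk(car_type, age) == label
               for car_type, age, label in test_data)
    return hits / len(test_data)


# Hypothetical labeled test objects: (car type, age, known risk class)
test_data = [
    ("truck", 30, "low"),
    ("car", 70, "low"),
    ("car", 25, "high"),
    ("car", 45, "low"),   # the tree misclassifies this one
]
validation = hit_rate(test_data)
prediction = classify_risk("car", 61)  # classification of an unknown object
print(validation, prediction)
```

Building the tree from training data (step 1) is what the SSAS decision tree algorithm does and is not reproduced here.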

Regression Tree
• Goal: Prediction of a numeric value for objects based on a DT with linear regression functions on the leaf level
• Process:
  1. Build a DT with the help of training data
  2. Replace some branches by a linear regression formula
  3. Generate prediction values, tune regression parameters
  4. Testing (like DT)
  5. Prediction (like DT)
[Figure: example regression tree. Root test: car type. Branch "= truck": Price = 20k€ + 2k€*weight. Branch "≠ truck": test on insurance class with the branches "< III", "[IV..VI]", "> VI". Leaf formulas shown: Price = 10k€ + 3k€*class; Price = 20k€ + 4k€*class + 10€*HP; Price = 30k€ + 6k€*class.]

SSAS Decision Tree Viewer

SSAS Dependency Network

SSAS Decision Tree Prediction

Clustering Basics
• Clustering (grouping) := arrangement of objects into groups such that
  - objects in the same cluster are most "similar"
  - objects from different clusters are most "dissimilar"
• Types of clustering
  - Partitioning clusters (an object o1 belongs to only one cluster)
  - Hierarchical clusters (nested clusters)
• Distance function d:
  - d(o1, o2) ≥ 0; d(o1, o2) = 0 ⇔ o1 = o2; d(o1, o2) = d(o2, o1)
• Similarity of o1 and o2 is defined via the distance function
  - The smaller the distance, the more alike the objects are
• Goal function
  - Maximize the compactness of the clusters
  - Compactness of a cluster C := $|C| \,/ \sum_{o_i \in C} d(o_i, c)$, where c = center of C
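
The compactness measure can be computed directly from its definition. The sketch assumes Euclidean distance and takes the arithmetic mean of the members as the center c; neither choice is fixed by the slide, and the sample points are hypothetical.

```python
import math


def centroid(cluster: list) -> tuple:
    """Center of a cluster as the coordinate-wise mean of its objects."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def compactness(cluster: list) -> float:
    """|C| divided by the sum of distances of all objects to the center."""
    center = centroid(cluster)
    total = sum(math.dist(obj, center) for obj in cluster)
    return math.inf if total == 0 else len(cluster) / total


tight = [(0.0, 0.0), (2.0, 0.0)]    # both objects are 1 away from the center
loose = [(0.0, 0.0), (10.0, 0.0)]   # both objects are 5 away from the center
print(compactness(tight), compactness(loose))
```

A larger value means the objects lie closer to their center, which is what the goal function maximizes.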

K-Means based Clustering (1/2)
• Algorithm:
  1. Choose k cluster centers (centroids)
  2. Assign each object to its nearest centroid
  3. Recalculate the cluster centers (centroids)
[Figure: example with k = 2. Objects 1 to 7, centroid labels a and b. Caption: "Initial centroids c and d".]

K-Means based Clustering (2/2)
• Repeat steps 2-3 until the centroids stabilize
[Figure: the same example with k = 2. Objects 1 to 7, centroid labels a and b, recalculated centroids a* and b*. Caption: "Initial centroids c and d".]
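
The three steps and the repeat rule translate into a short loop. This is a minimal sketch with Euclidean distance and caller-supplied initial centroids; the seven sample points and the starting centroids are hypothetical and are not the ones from the slide's figure.

```python
import math


def k_means(points: list, centroids: list, max_iter: int = 100):
    """Plain k-means: returns the final centroids and the clusters."""
    centroids = [tuple(c) for c in centroids]       # step 1: given by caller
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Step 2: assign each object to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Step 3: recalculate the centroids (keep the old one if a cluster is empty).
        new_centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members))
            if members else old
            for members, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:              # centroids have stabilized
            break
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9), (9, 9)]
final_centroids, final_clusters = k_means(points, centroids=[(0, 0), (5, 5)])
print(final_centroids)
```

The result depends on the initial centroids, so practical implementations usually try several starting configurations.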

SSAS Clustering
• Implements K-Means and EM clustering
• Both are partitioning algorithms
  - K-Means is distance based
  - EM is probability based
  - Scalable means: one single data scan only

SSAS Cluster Viewer

MS Cluster Profile Viewer

SSAS Cluster Characteristics

SSAS Lift Chart
• Lift = %ofCorrectPredictions / %ofPopulation

Association Rules
• Example (basket analysis)
  - Available items I = {Bread, Coffee, Milk, Cake, Butter, Tea}
  - Support of X = {Coffee, Milk}
    => Support(X) = 3/6 = 50%
  - Support of R = X ∪ {Cake}, i.e. support of the rule "Milk, Coffee ⇒ Cake"
    => Support(R) = 2/6 = 33%
  - Confidence of the rule:
    => Confidence("Milk, Coffee ⇒ Cake") = Support(R)/Support(X) = 2/3 = 67%

Transaction set T
t   bought items
1   Bread, Coffee, Milk, Cake
2   Coffee, Milk, Cake
3   Bread, Butter, Coffee, Milk
4   Milk, Cake
5   Bread, Cake
6   Bread
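
The support and confidence values of the basket example can be recomputed from the transaction set. The sketch below uses the six transactions from the slide; the function names are illustrative.

```python
# Transaction set T from the slide.
transactions = [
    {"Bread", "Coffee", "Milk", "Cake"},
    {"Coffee", "Milk", "Cake"},
    {"Bread", "Butter", "Coffee", "Milk"},
    {"Milk", "Cake"},
    {"Bread", "Cake"},
    {"Bread"},
]


def support(itemset: set, transactions: list) -> float:
    """Share of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent: set, consequent: set, transactions: list) -> float:
    """Support of the whole rule divided by the support of its left side."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))


x = {"Coffee", "Milk"}
support_x = support(x, transactions)
support_r = support(x | {"Cake"}, transactions)
confidence_r = confidence(x, {"Cake"}, transactions)
print(f"{support_x:.0%} {support_r:.0%} {confidence_r:.0%}")
```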

SSAS Item Sets Viewer

• Probability = Confidence
• Importance

Key Performance Indicators (KPI)
• Idea: measure the performance of an enterprise with simple numbers such as return on investment (ROI), profit, and capital turnover
• ROI := Earnings / Investments
• Profit := Revenue - Costs
• Capital turnover := Sales / Investments
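
The three definitions fit together: multiplying capital turnover by the earnings-to-sales ratio gives the ROI again, which matches the decomposition ROI = Turnover * Earnings / Sales used earlier in the deck. The sketch checks this with hypothetical figures and assumes, for simplicity, that earnings equal profit and that sales equal revenue.

```python
def roi(earnings: float, investments: float) -> float:
    """Return on investment."""
    return earnings / investments


def profit(revenue: float, costs: float) -> float:
    """Profit as revenue minus costs."""
    return revenue - costs


def capital_turnover(sales: float, investments: float) -> float:
    """How often the invested capital is turned over by sales."""
    return sales / investments


# Hypothetical figures for one fiscal year.
revenue, costs, investments = 1_000_000.0, 900_000.0, 500_000.0
earnings = profit(revenue, costs)                  # assumption: earnings = profit
direct = roi(earnings, investments)
decomposed = capital_turnover(revenue, investments) * (earnings / revenue)
print(direct, decomposed)
```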

SSAS Key Performance Indicators (KPI)
• KPI = f(measures, goal)
  - Measures are compared with a goal function
  - KPI is normally analyzed over time
[Figure: SSAS KPI designer. Callouts: define new KPI; drag measure to value or goal expression.]

Time Series
• Definition: A time series (TS) is a sequence of numbers ordered in time at equidistant intervals
  - The ordering is relevant (i.e. successive numbers are not independent)
• Additive TS model
  - y(t) := Trend(t) + Season(t) + R(t), t ∈ {1, 2, 3, …}
  - Trend is monotonic (linear or non-linear)
  - Season is periodic (sine or other)
  - R(t) is a random value
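
A synthetic series makes the additive model concrete. The sketch below assumes a linear trend, a sine-shaped season, and Gaussian noise for R(t); these are choices the slide allows but does not prescribe, and all parameter values are hypothetical.

```python
import math
import random


def additive_series(n: int, slope: float = 0.5, amplitude: float = 3.0,
                    period: int = 12, noise: float = 1.0, seed: int = 0) -> list:
    """Generate y(t) = Trend(t) + Season(t) + R(t) for t = 1..n."""
    rng = random.Random(seed)
    series = []
    for t in range(1, n + 1):
        trend = slope * t                                   # monotonic, linear
        season = amplitude * math.sin(2 * math.pi * t / period)  # periodic
        residual = rng.gauss(0.0, noise)                    # random component
        series.append(trend + season + residual)
    return series


noisy = additive_series(24)
clean = additive_series(24, noise=0.0)   # trend and season only
print(clean[2], clean[11])
```

With the noise switched off, the value at t = 3 is the trend 1.5 plus the seasonal peak 3.0, and at t = 12 the season is back at zero.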

SSAS Autoregressive Tree Models for Time-Series Analysis
• Definition: Let $y = (y_1, y_2, \dots, y_t)$ be a time series TS. The model for TS is called autoregressive if, for all $p < \tau \le t$, the probability distribution of $y_\tau$ depends as a linear regression on the previous p values $y_{\tau-p}, \dots, y_{\tau-1}$.
• Definition: An autoregressive tree model is a piecewise linear autoregressive model, where the boundaries are defined by a decision tree.
[Figure: example autoregressive tree with the split tests $y_{\tau-1} < a$ and $y_{\tau-1} > b$ (true/false branches) and the leaf distributions $P(y_t) = N(m_1, \sigma_1^2)$, $N(m_2, \sigma_2^2)$, $N(m_3, \sigma_3^2)$; plot of a series over t with the thresholds a and b.]
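
The two definitions can be sketched as a linear autoregression plus a one-level tree that selects the regression to use. This is a simplification, not the SSAS algorithm: it predicts only the mean of the leaf distribution, uses a single split on the most recent value instead of the two splits in the figure, and all thresholds and coefficients are hypothetical.

```python
def ar_predict(history: list, coefficients: list, intercept: float = 0.0) -> float:
    """Linear regression on the previous p values y[τ-p], ..., y[τ-1]."""
    p = len(coefficients)
    if len(history) < p:
        raise ValueError("history is shorter than the model order p")
    recent = history[-p:]
    return intercept + sum(c * y for c, y in zip(coefficients, recent))


def art_predict(history: list, threshold: float,
                leaf_below: tuple, leaf_above: tuple) -> float:
    """Piecewise AR model: the test y[τ-1] < threshold selects the leaf."""
    intercept, coefficients = leaf_below if history[-1] < threshold else leaf_above
    return ar_predict(history, coefficients, intercept)


# Hypothetical leaf models: (intercept, coefficients for y[τ-2], y[τ-1])
leaf_below = (0.5, [0.2, 0.8])
leaf_above = (1.0, [0.0, 0.5])

low_regime = art_predict([1.0, 2.0, 3.0], 5.0, leaf_below, leaf_above)
high_regime = art_predict([4.0, 6.0], 5.0, leaf_below, leaf_above)
print(low_regime, high_regime)
```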

MS Time Series
• Uses regression tree