Download DATA MART - Amazon Web Services

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Business Intelligence : a primer
Rev April 2012
Introduction & overview
The paradigm of BI systems
Platforms
Appendix
Review questions
Our approach to BI
Rep
DSS
Plan
Exec
Mon
Ctl
BI Architecture
Info
BI systems modelling
ES
taxonomy
Worker
Customer
Dash
Manager
Cost
Flexibility & speed
1. Production unit
cost
2. Productivity
3. Usage / workload
1. Process duration
2. Activity timeliness
3. Resource flexiblity
Quality & satisfaction
1.
2.
Spec conformity of
service and products
Technology
dependability
KPI
Identification /
mapping
HIGO
1. Customer access /
acquisition unit
cost
2. Customer use cost
1. Expectation conformity
1. Response time
of the service
2. Response timeliness
2. Service dependability
3. Vendor flexiblity
3. Customer satisfaction
1. Execution unit
cost
2. Preparation effort
1. Technology
response time
2. Technology
timeliness
3. Activity &
technology
flexibility
1.
2.
3.
Expectation conformity
of work / work
environment
Technology
dependability
Employee satisfaction
Enterprise
Information
Modeling
SIRE
Aggregate Strategic Level (ASL)
Analytic Information
Modeling
DFM
GUI Modeling
GOA
Implementation Level
Rich
Semantic
Level (RSL)
Software
Engineering
Interface (SEI)
Business Intelligence:
the role within Enterprise Systems
Management support
Management Information Systems
[Planning & Management Control + Business Intelligence ]
Front-end systems
(Support the life cycle of
customers and end products)
Back-end systems
(Support the cycle of
production and delivery)
Administrative systems (Finance, HR etc.)
Operations support
Acronyms
•
•
•
•
•
•
•
•
•
•
•
•
ABC: Activity Base Costing
ABM: Activity Based Management
BI: Business Intelligence
BW: Business Warehouse (synonym of
DW)
BSC: Balanced Score Card
CPM: Corporate Performance
Management (synonym of SEM)
CRM: Customer Relationship
Management
CSF: Critical Success Factor
DBMS: Data Base Management
System
DSS: Decision Support System
DW: Data Warehouse
EIS: Executive Information System
•
•
•
•
•
•
•
•
•
•
•
•
EPM: Enterprise Performance
Management (synonym of SEM)
ERP: Enterprise Resource Planning
ERM: Enterprise Resource
Management
ES: Enterprise System
KPI: Key Performance Indicator
MBO: Management By Objectives
MRP: Manufacturing Resource
Management
ODS: Operational Data Store
OLAP: On Line Analytical Processing
OLTP: On Line Transaction
Processing
SCM: Supply Chain Management
SEM: Strategic Enterprise
Management
Characteristics of Analytic & Management Information
•
Information is
– Periodical
– Output of computation or aggregations
– Reflects objectives or actual data
•
E.g. data of P& L of an imaginative
Car Company come from different
transaction processing systems
–
–
–
–
•
Sales
Purchasing
Accounting
Etc.
Therefore, the design of BI / MIS :
– Is top-own
– Defines first target data i.e. the
variables that BI should process
– Identifies corresponding source data
– Defines the process to extract and
transform source in target data
The 4-layer paradigm of BI /MIS systems
Decision support
engines (DSS)
Presentation / reporting
engine
(EIS, reporting)
Mining & other
application engines
DATA MART
DATA WAREHOUSE
Loading
Tranformation
Extraction
DATA ENTRY
BASI DATI OPERATIVE
BASI DATI OPERATIVE
BASI DATI OPERATIVE
Transactions Data Bases
The 4-layer paradigm of BI /MIS systems
•
•
•
•
•
BI/MIS applications are based on 4
layers
Layer 1 contains source data, typically
stored in Transaction Data Base
Layer 2 extracts information, and
transforms source data into Multi-key
& Time-dependent data
Layer 3 stores such transformed
information
Layer 4 processes transformed
information according various
purposes
–
Support decisions (DSS)
•
–
Prepare reports and dashboard (Report)
•
–
E.g. define the sale budget
E.g., sales performance
Mine stored data (Mining)
•
E.g. identify customer who may churn
Business Intelligence : a primer
Rev April 2012
Introduction & overview
The paradigm of BI systems
Platforms
Appendix
Review questions
“Jones” case study
• CONTEXT
– The Supermarket Chain
«Jones» includes 300 shops
in 3 regions with 60k items
on sale
– A POS (Point Of Sale)
system supports all
activities of each shop :
•
•
•
•
item receiving,
storing,
scrapping,
selling
– Specifically, POS terminals
record sales transactions and
issue receipts
• REQUIREMENTS
– Management want to
analyze sales
– Facts : Sales
– Measures: amount, quantity,
number of tickets
– Analysis dimensions
• Date
• Item
• Shop
– Time span : 24 months
rolling
Level 1 (source data)
«Jones» case study
Ticket # 2002a23b11
Store #0021MI
Item
#190
#69
#90
TOTALE
Payment
Date
Des
Pen
Mat
Lib
Receipt Heading
• # Store
• # Ticket
• Amount
• Payment
• Date
Price
3560
550
32000
Qty Amount
2 7.12
10 5.50
1 32.00
44.62
Fidelity P.
120109
Item Master Data
• # Item
• # Store
• Description
• Price
• Qunatity mesuere
• Stock on hand
• Stock at the beginning of the day
• Average forecasted dayly sale
Receipt detail
• # Ticket
• # Item
• Amount
• Qty
Level 2
•
DSS
Report/
dashboard
– Select source data
– Check and clean source data (data
cleaning o data cleansing)
– Staging of extracted data (as needed)
– Log of extractions
Mining &
other
•
DATA MART
DATA ENTRY
TRANSACTIONS DATABASES
Extraction can be
– Automatic: a batch procedure that runs
periodically (e.g. daily, weekly,
monthly)
– Interactive: integrates and fixes
automatic data
DATA WAREHOUSE
LOADING
TRANSFORMATION
EXTRACTION
Extraction includes
•
ETL can use intermediate databases
– Staging Area : where extracted data
are temporarily parked (e.g. Data of
each individual shop)
– Operational Data Store (ODS): where
granular data are stored and reconciled
for future use (e.g. receipt data)
Level 3
DSS
Report/
dashboard
Mining &
other
DATA MART
DATA WAREHOUSE
LOADING
TRANSFORMATION
EXTRACTION
DATA ENTRY
TRANSACTIONS DATABASES
• Data are stored in Data
Warehouse and Data Marts
• A Data Warehouse is a
“subject-oriented, integrated,
time-variant (temporal), non
volatile collection of summary
and detailed data, used to
support strategic decisionmaking process for the
enterprise” (Inmon 1996)
• Data Mart is a smaller
warehouse, often a subset or
extraction of a warehouse.
• Warehouse e Mart typically
adopt different data schemas
Level 3 : Data Warehouse
Key table 1
•Key 1
•Attribute 1
•Attribute 2
•Attribute ….
Key table 2
•Key 2
•Attribute 1
•Attribute 2
•Attribute ….
Fact table
•Key 1
•Key 2
•Key …
•Measure 1
•Measure 2
•Measure ….
Key table …
•Key …
•Attribute 1
•Attribute 2
•Attribute ….
• The warehouse is typically implemented by
relational database, whose schema reflects
the corresponding DFM (Dimensional Fact
Model).
• In relational schemas:
• Fact tables:
• Store the value of facts (measures)
• Are identified by multiple keys
(K>= 2)
• Key tables
• Describe the attributes of
dimensions
Level 3: Data Warehouse: star schema
Jones case study
Shop
Shop#
• Description
• Shop-class
• ZIP-code
Time
Date#
• Week-day
• Flag work/holyday for local calendar
• Date in muslim calendar
• Flag work/holyday for muslim calendar
Sales
•
•
•
•
•
•
Date#
Item#
Shop#
Sales amount
Sales qty
Number of receipts
Item
Item#
• Billing-metric
•Item description
• Bar-code#
• Package qty
• Package-class
• Supplier-brand
• Item-class
• A simple implementation of the DFM is a STAR schema where key tables are
implemented only for immediate keys
• Further analysis / segmentation is obtained by queries on attributes of key tables
Level 3 : Data Warehouse : Snow flake schema
Jones Case study
ZIP
Shop
Provinceregion
Area
Holiday
Sales
Date
Muslim
date
Week-day
Chinese
date
Class
Superclass
Time
Item
Supplier
• A full implementation of the DFM requirements implies a snow
flake schema with a key table for every hierarchy node
Level 3: design steps
The process from extraction up to data warehouse creation is supported by warehouse
building tools that are incorporated in most BI platforms
1
2
Target Data design
3
Mapping of Source Data
into Target Data
4
ETL code generation
5
Source Data Base
Identification
6
Data extraction
Creation of Data Warehouse
Level 3: design steps : detail
Level 3: Data Mart
DSS
Report/
dashboard
Mining &
other
DATA MART
DATA WAREHOUSE
LOADING
TRANSFORMATION
EXTRACTION
DATA ENTRY
TRANSACTIONS DATABASES
• Data mart store frequently accessed
information
• From a same warehouse multiple data
marts can be created
• Data marts are typically implemented by
hypercube (OLAP technology)
Level 3: Data Mart
Customer History
Sales Analysis
Marketing
Accounting
Shop
Data Warehouse
From a same
warehouse multiple
data marts can be
created
Level 3: Data Mart :
Hyper-cube : display
Pages
Facts
Columns
Level 3: Data Mart :
Hyper-cube : logic
Fact
Sales
Shop
Item
Time
Event
Dimension
Quantity = 20
Amount= 100
• An hypercube is a matrix of
tables
• A Fact (e.g. Sales) is identified
in a multidimensional space
whose axes are Analysis
Dimensions (e.g. Shop, Time,
Item)
• An hypercube enables to
instantly retrieve complex
information e.g. :
– Sales in last Year (aggregation
of Time)
– by Region (=aggregation of
Shops)
– by Category (= aggregation of
Product)
Level 3: Data Mart :
Hyper-cube : logic
Shops
Shop
Item
Month
BUDGET
MB21000
MB21000
MB21000
MB21000
MB31000
MB31000
MB31000
MB31000
MB41000
MB41000
MB41000
MB41000
0601
0601
0602
0602
0601
0601
0602
0602
0601
0601
0602
0602
Jan
Feb
Jan
Feb
Jan
Feb
Jan
Feb
Jan
Feb
Jan
Feb
50
55
50
60
65
45
55
50
60
70
65
75
MB21000
MB31000 MB41000
Item
0601
Date
Jan
Feb
Mar
ITEM’
SHOP
OLAP dimensions = warehouse key
0602
Apr
Level 3: Data Mart :
Hyper-cube : logic
Dimension
Product
Hierarchy
Type
Category
Svelto
….
Ajax
Washing
powder
House
Cleaning
Dash
…
Soap
Palmolive
Dairy
Bread &
Biscuit
Food
Drinks
Tools
Hardware
Nuts &
bolts
All Products
• Dimensions are arranged in
«aggregation hierarchies»
(roll-up)
• Levels of hierarchies are
called «dimensional
attributes»
• A multidimensional analysis
is performed by navigating
trough aggregation levels of
dimensions
Level 3: Data Mart :
Hyper-cube : implementation
•
FACT
TIME
ITEM
Date
Tempo (ch)
Tempo attributi (da def.)
Prodotto (ch)
Prodotto attributi (da def.)
Item
Shop
Sales-amount
Sales-qty
Shop
PuntoVendita (ch)
PuntoVendita attributi (da def.)
•
Shop
Shop
Shop
Receipt-number
Item
Item
Item
A wise approach to
implement
multidimensional
information is to have
an hyper-cube for each
measure
This easies arithmetic
operations and keeps
hyper-cubes light
Level 4
DSS
Report/
dashboard
Mining &
other
DATA MART
DATA WAREHOUSE
LOADING
TRANSFORMATION
EXTRACTION
DATA ENTRY
TRANSACTIONS DATABASES
• It processes information for
management from various
perspectives
– Define / assess decisions and
program (DSS)
– Present information with a
friendly navigation that enables
roll up and drill down (EIS &
dashboard)
– Produce structured reports
(reporting)
– Identify trends an pattern in
stored information (mining and
profiling)
Leve 4 : reporting
Information distribution and privileges handling
Format editing
Semantic Layer
Data Marts
Data warehouse
Data Bases
Level 4: reporting : semantic layer
• Purpose: to map data from
heterogeneous sources
• Generally semantic layer
includes a set of types e.g.:
– Dimensions (= warehouse
keys)
– Dimensions attributes ( = key
attributes)
– Measures and Facts
Level 4: reporting : format editing
• Includes editing functions by
which report pages are
defined.
• He content of the report is
obtained by dragging an
dropping information item
from the catalogue of the
semantic layer
• Further activities manage the
layout of pages
Level 4: reporting : information distribution
Level 4 : DSS
• A DSS is a computer based
application designed to support
semi-structured management
decisions by
– Searching and analyzing information
on a collection of sources
– Compute and assess results (e.g.
sensitivity analysis)
• Typical application fields are:
–
–
–
–
Planning
Budgeting
Optimization
Funding and Investment Decisions
• ERP / CRM vendors offer DSS
suites for corporate planning as
Oracle’s EPM and SAP’s BO
Level 4 : DSS : an example (budgeting)
The control system produces
monthly a financial report and a
report with physical
performance indicators (KPI)
Financial report and KPI report
are on 5 dimensions:
1.
2.
3.
4.
5.
Time
Cost centers
Item
Sales channel
Activity
Sales data come from the Sales
systems and are stored in a data
mart; the same approach is also
for sales budget, actual costs and
budget costs
Data marts are merged in two
hyper-cubes, respectively KPI
and Financial.
Over hyper-cubes a software
processes reports on P&L, A&L,
Cashflow, KPI
Ricavi a budget
Ricavi
Conto economico
Processi
di calcolo
Sistema di vendita
Ricavi
Stato patrimoniale
Dati
Processi
finanziari di calcolo
Cashflow
Sistema amministrativo
Voci economiche e patrimoniali
Spese e costi a budget
Costi
KPI
Processi
di calcolo
KPI
Processi
di calcolo
Memorizzazione e calcolo
Elaborazione report
Level 4 : Analysis Engines
• Data mining applications for research and marketing are
designed for
– Discover in a data base relations and associations previously unknown
(“data mining helps end user extract useful business information from
large databases” (Berson 1997)).
– Mining software is a key in marketing to calculate predictive indicators
as
•
•
•
•
Churning,
Fraud risk,
Saving attitude,
Economic potential etc.
• Customer Profiling systems (Analytic CRM)
Business Intelligence : a primer
Rev April 2012
Introduction & overview
The paradigm of BI systems
Platforms
Appendix
Review questions
BI solutions are offered by all main vendors
•
•
BI is 5-10% of the ES market
Main vendors offer BI products &
applications
–
ES vendors
•
Oracle: the largest DB vendor
–
–
•
SAP: the largest ERP vendor
–
–
•
–
products on Warehousing and
applications from vendors acquired
(Essbase, Hyperion )
Applications: EPM analogous of SAP’s
SEM
Applications: Strategic Enterprise
Management (SEM) to support the entire
management and analysis life cycle
Products : Crystal report, Business
Object (founder of reporting paradigm)
Microsoft : Office products , SQL server
family
BI vendors
•
•
•
SAS: founder of BI and the largest BI
independent vendor, offers a wide range
of applications by industry and business
area, and specific solutions
Microstrategy
Open source platforms: e.g. Pentaho
Business Intelligence Platforms : SAS
•
By industry
–
–
–
–
–
•
…
Education
Financial Services
Government
…..
By solution
–
–
–
–
–
–
–
–
–
–
–
–
–
–
Analytics
Business Analytics
Business Intelligence
Customer Intelligence
Data Management
Fraud & Financial Crimes
High-Performance Analytics
IT & CIO Enablement
On Demand Solutions
Performance Management
Risk Management
SAS® 9.3
Supply Chain Intelligence
Sustainability Management
•
Featured solutions
– SAS® 9.3
– SAS® Clinical Data Integration
– SAS® Curriculum Pathways®
– SAS® Enterprise Guide®
– SAS® Enterprise Miner™
– SAS Fraud Framework for
Government
– SAS® High-Performance Analytics
– SAS® Inventory Optimization
– SAS® OnDemand for Academics
– SAS® Social Media Analytics
– SAS® Text Analytics
– SAS® Visual Data Discovery
Business Intelligence : a primer
Rev April 2012
Introduction & overview
The paradigm of BI systems
Platforms
Appendix
Review questions
Data Warehouse and Data Mart vs Database
Data base
Data Warehouse
Data Mart
Conceptual modeling
(Rich Semantic Layer)
ERA
DFM
DFM
Information type
(Master, Event,
Analysis)
Master + Event
Analysis
Analysis
Information
organization
Normalized (e.g. 3NF)
Star or snowflake
Hypercube
Data schema
Relational
Relational
OLAP or Relational
Processing orientation
Create + Update
Read
Read
Typical data operations
Insert one individual
record or modify one or
multiple records
Access a vector of
records Roll-up, Drill
down, Dice
Access one ore multiple
a vector of records
Roll-up, Drill down,
Dice
Transaction example
Enter a customer order
Segment customer in
Italy with a degree of
loyalty >70% by age and
region
Segment customer in
Italy with a degree of
loyalty >70% by age and
region
Business Intelligence : a primer
Rev April 2012
Introduction & overview
The paradigm of BI systems
Platforms
Appendix
Review questions
Review questions
• Illustrate the input, process and output of the four layers of BI
systems
• What is an Hyper-cube ?
• What is a data mart?
• What is a data warehouse? Compare data warehouse versus
classic database in terms of
–
–
–
–
Conceptual modeling (Rich Semantic Layer)
Implementation (DB schema)
Information type (Master, Event, Analysis)
Processing orientation