TDWI Finland 2014
© Dan Linstedt, 2014 all rights reserved
6/12/2014
Dan Linstedt
25 Years in the industry
http://LearnDataVault.com
Inside the pressure cooker that is BI and EDW
LearnDataVault.com (C) Dan Linstedt, 2014, all rights reserved
Business Issues…
Big Data (volume, velocity)
Unstructured/Multi‐Structured Data (variety)
Managed Self‐Service BI (analytics)
Managed Self‐Service Data Discovery (bypassing IT)
Auditability / Accountability
Ownership and Governance
Security and Privacy
Project Issues
IT…
Takes too long
Over‐budget
Too complex
Can’t sustain growth
THE GAP!!
Business…
Changes Frequently
Needs Accountability
Demands Auditability
Wants Visibility
Desires Autonomy
Diametrically opposed goals for the EDW Layer.
Information Mart Goals
• Interpretation
• Interpolation
• Correlation
• Quality
• Rapid Delivery
“Data” Warehouse Goals
• Sourcing
• Latency
• Scalability
• Auditability
• Historical Storage
Ever Growing Dimensional Warehouse Costs
[Figure: cost per month (y-axis) against time in months (x-axis, 3 to 21); each successive data mart costs more because of forced conformity.]
Projects:
1) Data Mart 1: 3 months, $100k
2) Data Mart 2: 5 months, $250k
3) Data Mart 3: 7 months, $500k
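The project figures on this slide can be checked with simple arithmetic. A minimal Python sketch, assuming each dollar amount is the total cost of that data mart project and, for the totals only, that the projects run back to back:

```python
# Project figures as listed on the slide: (duration in months, total cost in USD).
# Assumption: each amount is the total cost of that data mart project.
PROJECTS = {
    "Data Mart 1": (3, 100_000),
    "Data Mart 2": (5, 250_000),
    "Data Mart 3": (7, 500_000),
}


def cost_per_month(months: int, cost: int) -> float:
    """Average spend per month for a single project."""
    return cost / months


def summarize(projects: dict) -> tuple:
    """Total duration and total cost, assuming the projects run back to back."""
    total_months = sum(months for months, _ in projects.values())
    total_cost = sum(cost for _, cost in projects.values())
    return total_months, total_cost


if __name__ == "__main__":
    for name, (months, cost) in PROJECTS.items():
        print(f"{name}: ${cost_per_month(months, cost):,.0f} per month")
    print("Total (months, USD):", summarize(PROJECTS))
```

Under those assumptions the average monthly spend climbs from about $33k for Data Mart 1 to $50k and then about $71k for Data Mart 3, for $850k over 15 months.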
Data Silos
SALES
We built our own because IT costs too much…
FINANCE
We built our own because IT took too long…
MARKETING
We built our own because we needed customized dimension data…
“One cannot solve a problem with the same consciousness that created it.”
Albert Einstein
Time For A CHANGE
Forging Ahead
Data Vault 1.0
DV1 also uses Sequences!
Data Vault 2.0 System
Data Vault 1.0 is All About The Data Vault Model
Methodology
• Consistent
• Repeatable
• Pattern Based
Architecture
• Multi‐Tier
• Scalable
• Supports NoSQL
Model
• Flexible, Scalable
• Joins to NoSQL
• Hub & Spoke
Agile Methodology
BENEFITS:
• Drives Agile Deliveries (2 to 3 weeks)
• Includes CMM, Six Sigma, TQM
• Manages Risk, Governance, Versioning
• Defines Automation, Generation
• Designs Repeatable Optimized Processes
• Combines Best Practices for BI
DV 2.0 Methodology & CMM
Follows: SEI/CMMI Level 5, PMP, Six Sigma, TQM, and Agile elements
Level 5: Optimized business processes, repeatable, scalable, fault‐tolerant. Automatable (generate‐able)
Level 4: Metrics, Estimates vs Actuals, Function Point Analysis, Identification of broken processes
Level 3: Defined Business Processes, Defined Goals, Defined Objectives
Level 2: Risk assessments / analysis, managed processes, basic alignment efforts
Level 1: Process unpredictable and poorly controlled
Model
BENEFITS:
• Follows Scale Free Architecture
• Based on Hub & Spoke Design
• Backed by Set Logic & MPP Math
[Diagram: Hub, Link, and Satellite tables]
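The Hub, Link, and Satellite labels above follow the standard Data Vault split: hubs hold unique business keys, links hold relationships between hubs, and satellites hold descriptive attributes over time. A minimal Python sketch of the three record shapes; the class names, fields, and sample values are illustrative assumptions, not a prescribed DV2 schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Tuple


@dataclass
class Hub:
    """A unique business key: the 'hub' of the hub-and-spoke design."""
    hash_key: str
    business_key: str
    load_dts: datetime
    record_source: str


@dataclass
class Link:
    """A relationship between two or more hubs: a 'spoke'."""
    hash_key: str
    hub_hash_keys: Tuple[str, ...]
    load_dts: datetime
    record_source: str


@dataclass
class Satellite:
    """Descriptive attributes over time, attached to exactly one hub or link."""
    parent_hash_key: str
    load_dts: datetime
    record_source: str
    attributes: Dict[str, str] = field(default_factory=dict)


if __name__ == "__main__":
    loaded = datetime(2014, 6, 12)
    # Hypothetical placeholder keys; real keys would be computed from business keys.
    customer = Hub("hk_cust_1", "CUST-001", loaded, "CRM")
    product = Hub("hk_prod_7", "PROD-007", loaded, "ERP")
    purchase = Link("hk_lnk_1", (customer.hash_key, product.hash_key), loaded, "ERP")
    details = Satellite(customer.hash_key, loaded, "CRM", {"name": "Example Ltd"})
    print(purchase.hub_hash_keys, details.parent_hash_key)
```

New sources or relationships are added as new hubs, links, or satellites rather than by altering existing tables, which is what keeps the model flexible.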
Data Vault 2.0 Model
DV2 uses Hash Keys. Why?
[Diagram: NoSQL and RDBMS]
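Why hashes: a sequence number has to come from a single issuer, while a hash of the business key can be computed anywhere, by any loader on any platform, with the same result. A minimal Python sketch contrasting the two; MD5 follows the `LNK_OU_COMP_MD5` column name on the "Hashing / Data Vault 2.0 Model" slide, while the trim/upper-case normalization, the `;` delimiter, and the function names are assumptions.

```python
import hashlib
import itertools

# DV1-style surrogate key: one central counter hands out the numbers,
# so every loader must ask the same place for the next value.
_customer_sequence = itertools.count(start=1)


def next_sequence_key() -> int:
    return next(_customer_sequence)


def hash_key(*business_key_parts: str, delimiter: str = ";") -> str:
    """DV2-style key: MD5 over the normalized business key.

    Normalization (trim + upper-case) and the ';' delimiter are assumptions
    for this sketch; a real project fixes its own standard.
    """
    normalized = delimiter.join(part.strip().upper() for part in business_key_parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest().upper()


if __name__ == "__main__":
    # Two platforms compute the key independently and still agree.
    key_in_rdbms = hash_key("cust-001")
    key_in_hadoop = hash_key("  CUST-001 ")
    print(key_in_rdbms == key_in_hadoop)
```

Because the key depends only on the business key, it can be attached in staging on the RDBMS side and in a Hadoop file on the NoSQL side without any coordination between the two.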
How Hashes Work With ELT / ETL
[Diagram. RDBMS: source file → EL process → RDBMS staging table with hashes → distinct parallel load operations into Hub, Link, and Satellites 1 to 3. NoSQL (document store): source file → copy or load → Hadoop file → attach hashes → hashed Hadoop file, followed by an EL process for staging from Hadoop to relational.]
Joins across hash values can be done post load.
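In the RDBMS path of the diagram, hashes are attached once in the stage table, after which the hub, link, and satellite loads are distinct operations that can run in parallel: no loader waits for another to hand out keys. A minimal in-memory sketch of that idea, with Python dictionaries standing in for target tables and a thread pool standing in for the parallel load operations; the row data, table names, and key normalization are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def md5_key(*parts: str) -> str:
    # Assumed normalization shared by every loader: trim, upper-case, ';'-join.
    normalized = ";".join(p.strip().upper() for p in parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest().upper()


# Hypothetical source rows: an org unit that belongs to a company.
SOURCE_ROWS = [
    {"org_unit": "OU-100", "company": "ACME", "description": "Plant A"},
    {"org_unit": "OU-200", "company": "ACME", "description": "Plant B"},
]


def stage(rows):
    """Attach all hash keys once, in the stage table."""
    return [
        {
            **r,
            "hub_ou_hk": md5_key(r["org_unit"]),
            "hub_comp_hk": md5_key(r["company"]),
            "lnk_ou_comp_hk": md5_key(r["org_unit"], r["company"]),
        }
        for r in rows
    ]


# Each loader reads only the stage table: no lookups into other targets.
def load_hub_ou(staged):
    return {r["hub_ou_hk"]: r["org_unit"] for r in staged}


def load_hub_comp(staged):
    return {r["hub_comp_hk"]: r["company"] for r in staged}


def load_link(staged):
    return {r["lnk_ou_comp_hk"]: (r["hub_ou_hk"], r["hub_comp_hk"]) for r in staged}


def load_sat_ou(staged):
    return {r["hub_ou_hk"]: {"description": r["description"]} for r in staged}


def parallel_load(rows):
    staged = stage(rows)
    loaders = {"hub_ou": load_hub_ou, "hub_comp": load_hub_comp,
               "link": load_link, "sat_ou": load_sat_ou}
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, staged) for name, fn in loaders.items()}
        return {name: f.result() for name, f in futures.items()}


if __name__ == "__main__":
    targets = parallel_load(SOURCE_ROWS)
    # Post-load join: follow the link's hash keys into the hubs and satellite.
    for ou_hk, comp_hk in targets["link"].values():
        print(targets["hub_ou"][ou_hk], targets["hub_comp"][comp_hk],
              targets["sat_ou"][ou_hk]["description"])
```

The final loop is the "joins across hash values post load" step: the link row already carries both hub hash keys, so it can be joined to the hubs and the satellite after every load has finished, in any order.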
Hashing / Data Vault 2.0 Model
[Diagram: NoSQL / Hadoop alongside RDBMS]
JSON DOC {
  LNK_OU_COMP_MD5,
  SAT_LDTS,
  SAT_LEDTS,
  SAT_RSRC,
  ORG_UNIT_DETAILS {
    UNIT_DESCRIPTION,
    UNIT_LOCATION {
      UNIT_LAT,
      UNIT_LON },
    UNIT_DATES {
      UNIT_START_PRODUCTION,
      UNIT_END_PRODUCTION }
  }
}
[Diagram: multi-structured data (JSON documents, XML, audio files, video files)]
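The JSON DOC outline above is a satellite stored in a document store: it carries the link's MD5 hash key (`LNK_OU_COMP_MD5`), the load date (`SAT_LDTS`), the load end date (`SAT_LEDTS`), and the record source (`SAT_RSRC`) alongside the nested descriptive data. A minimal Python sketch that builds one such document; the function name, the sample values, the key normalization, and the choice of `None` for an open `SAT_LEDTS` are assumptions.

```python
import hashlib
import json
from datetime import datetime


def md5_key(*parts: str) -> str:
    # Assumed normalization: trim, upper-case, ';'-join, then MD5 (hex, upper-case).
    normalized = ";".join(p.strip().upper() for p in parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest().upper()


def org_unit_sat_document(org_unit, company, record_source, load_dts,
                          description, lat, lon, start_production, end_production=None):
    """Build one satellite record shaped like the JSON DOC on the slide."""
    return {
        "LNK_OU_COMP_MD5": md5_key(org_unit, company),  # same key as the RDBMS link
        "SAT_LDTS": load_dts.isoformat(),
        "SAT_LEDTS": None,  # assumption: open (current) record until superseded
        "SAT_RSRC": record_source,
        "ORG_UNIT_DETAILS": {
            "UNIT_DESCRIPTION": description,
            "UNIT_LOCATION": {"UNIT_LAT": lat, "UNIT_LON": lon},
            "UNIT_DATES": {
                "UNIT_START_PRODUCTION": start_production,
                "UNIT_END_PRODUCTION": end_production,
            },
        },
    }


if __name__ == "__main__":
    doc = org_unit_sat_document("OU-100", "ACME", "PLANT_SYSTEM",
                                datetime(2014, 6, 12, 8, 0), "Plant A",
                                60.17, 24.94, "2001-03-01")
    print(json.dumps(doc, indent=2))
```

Because `LNK_OU_COMP_MD5` is computed the same way on both platforms, the document can stay in Hadoop and still be joined to the relational link after loading.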
Architecture
BENEFITS:
• Enhances De‐Coupling
• Ensures Low Impact Changes
• Provides Managed Self‐Service BI
• Includes Seamless NoSQL
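DV2.0 Systems Architecture
[Diagram: Sources → Staging → EDW – DV2 → Data Marts, with hard rules applied on the way into the EDW and soft rules on the way out to the marts. Other elements shown: ontology, modeling & metadata; soft-rule write back; finance, planning, and cubes; production, real-time, and batch feeds; RDBMS, in-memory, and appliance platforms; Excel, Word, and analytic tooling.]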
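In DV2 terms, hard rules change the data type or format but not the meaning, and are applied on the way into the EDW; soft rules carry business interpretation and are applied on the way out to the data marts, so the EDW keeps the raw, auditable values. A minimal Python sketch of that split; the field names, the threshold, and the segment labels are hypothetical.

```python
from datetime import date
from decimal import Decimal


def apply_hard_rules(raw: dict) -> dict:
    """Staging-side rules: align types and trim whitespace; the meaning is unchanged."""
    return {
        "customer_id": raw["customer_id"].strip(),
        "order_date": date.fromisoformat(raw["order_date"].strip()),
        "amount": Decimal(raw["amount"].strip()),
    }


def apply_soft_rules(row: dict, large_threshold: Decimal = Decimal("10000")) -> dict:
    """Mart-side rules: business interpretation, which may change over time."""
    segment = "LARGE" if row["amount"] >= large_threshold else "STANDARD"
    return {**row, "segment": segment}


if __name__ == "__main__":
    raw = {"customer_id": " C-001 ", "order_date": "2014-06-12", "amount": "12500.00"}
    edw_row = apply_hard_rules(raw)       # what the EDW stores
    mart_row = apply_soft_rules(edw_row)  # what a data mart shows
    print(edw_row)
    print(mart_row["segment"])
```

Because the soft rule runs downstream, changing the threshold only means rebuilding the mart; the EDW does not need to be reloaded.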
Implementation
BENEFITS:
• Enhances Automation
• Ensures Scalability
• Provides Consistency
• Includes Fault‐Tolerance
Data Vault 2.0 is an Enterprise BI System
Model
Architecture
Methodology
Implementation
• Scalability
• Flexibility
• Consistency
• Repeatability
• Agility
• Adaptability
• Auditability
Changing Gears: One Part of Success
Managing effectively, but empowering users
If you give a kid a bunch of finger paint, does that automatically make them a master artist?
The correct approach is: Managed Self‐Service BI
Why is it managed?
Business users have controlled access to information in the EDW system.
Managed Self‐Service BI
So, how does this work?
Managed Self‐Service BI – Part 1
End Users manage their own master data and hierarchies directly in the EDW / Data Vault!
Managed Self‐Service BI – Part 2
Data Driven Virtual Marts!
Tabular Data
Excel, Tableau, SAS, QlikView, Cubes
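One way to read "data driven virtual marts": a mart is computed from the vault at query time, with its shape controlled by metadata, rather than loaded as a physical copy. A minimal Python sketch under that assumption, picking the most recent satellite row per hash key to form a current-state dimension; all names and data are hypothetical, and in practice this would usually be a database view feeding tools such as Excel or Tableau.

```python
from datetime import datetime

# Hypothetical hub (hash key -> business key) and its satellite history.
HUB_CUSTOMER = {"hk1": "C-001", "hk2": "C-002"}
SAT_CUSTOMER = [
    {"hk": "hk1", "ldts": datetime(2014, 1, 1), "name": "Acme", "city": "Helsinki"},
    {"hk": "hk1", "ldts": datetime(2014, 5, 1), "name": "Acme", "city": "Espoo"},
    {"hk": "hk2", "ldts": datetime(2014, 3, 1), "name": "Beta", "city": "Turku"},
]


def virtual_dimension(hub, sat, columns):
    """Build a current-state dimension on demand; nothing is copied or stored.

    `columns` is the metadata that drives the mart: changing it changes the
    output shape without touching the EDW.
    """
    latest = {}
    for row in sat:
        current = latest.get(row["hk"])
        if current is None or row["ldts"] > current["ldts"]:
            latest[row["hk"]] = row
    return [
        {"business_key": hub[hk], **{c: row[c] for c in columns}}
        for hk, row in latest.items()
    ]


if __name__ == "__main__":
    for record in virtual_dimension(HUB_CUSTOMER, SAT_CUSTOMER, ["name", "city"]):
        print(record)
```

Since nothing is materialized, a new column list or a corrected satellite row shows up in the "mart" the next time it is queried.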
Managed Self‐Service BI – Part 3
Visual Process Design - Business Rule Injection
Business User
Business Rules
GUI Tooling
Bringing Data Vault 2.0 to your Project
Key: Flexibility
Case In Point:
Result of flexibility:
Merged 3 companies in 90 days: ALL systems, ALL DATA!
Key: Scalability in Architecture
Scaling is easy; it's based on the following principles:
• Hub and spoke design
• MPP Shared‐Nothing Architecture
• Scale Free Networks
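The slide does not spell out how hash keys and MPP shared-nothing platforms interact; one common pattern, assumed here, is to use the hash key as the distribution key, so each row is owned by exactly one node and rows that share a key land together. A minimal Python sketch of that placement; the node count, the modulo placement, and the sample keys are assumptions.

```python
import hashlib


def md5_key(business_key: str) -> str:
    # Assumed normalization for this sketch: trim and upper-case before hashing.
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest().upper()


def node_for(hash_key_hex: str, node_count: int) -> int:
    """Owner node of a row in a shared-nothing cluster: hash value modulo node count."""
    return int(hash_key_hex, 16) % node_count


def distribute(rows, node_count, key_field="hk"):
    """Place each row on exactly one node, based only on its hash key."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[node_for(row[key_field], node_count)].append(row)
    return nodes


if __name__ == "__main__":
    NODES = 4
    hub_rows = [{"hk": md5_key(f"CUST-{i:04d}"), "bk": f"CUST-{i:04d}"} for i in range(1000)]
    sat_rows = [{"hk": r["hk"], "status": "ACTIVE"} for r in hub_rows]
    hub_nodes = distribute(hub_rows, NODES)
    sat_nodes = distribute(sat_rows, NODES)
    print([len(n) for n in hub_nodes])  # roughly even split
    # Every satellite row sits on the same node as its hub row, so the join is local.
    print(all({s["hk"] for s in sat_nodes[i]} <= {h["hk"] for h in hub_nodes[i]}
              for i in range(NODES)))
```

Well-mixed hash values spread rows evenly across nodes, and co-located hub and satellite rows mean joins need no data movement between nodes.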
Case In Point:
Result:
Produced Data Vault, scaled to 3 Petabytes (circa 2003), still growing today!
Key: Scalability in Team Size
You should be able to SCALE your TEAM as well!
With the Data Vault 2.0 Methodology, you can:
Scale your team when desired, at different points in the project!
Case In Point:
(Dutch Tax Authority)
Result:
Changed Team Size on Demand!
Included Entry Level When Needed
Key: Productivity
Increasing productivity requires a reduction in complexity. The Data Vault System simplifies all of the following:
• ETL Loading Routines
• Real‐Time Ingestion of Data
• Data Modeling for the EDW
• Enhancing and Adapting for Change to the Model
• Monitoring, Managing, and Optimizing Processes
Case in Point:
Result of Productivity was: 2 people in 2 weeks merged 3 systems, built a full Data Vault EDW, 5 star schemas, and 3 reports.
Generated:
• 90% of the ETL code for moving the data set
• 100% of the Staging Data Model
• 75% of the finished EDW Data Model
• 75% of the Star Schema Data Model
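Generation figures like these rely on pattern-based loading: every hub load has the same shape, so the code can be produced from one template plus metadata (a later quote likewise mentions 3 ETL templates). A minimal Python sketch using `string.Template`; the SQL template, table and column names, and metadata format are hypothetical, and the `MIN()` aggregation simply keeps one row per key in this simplified version.

```python
from string import Template

# Hypothetical hub-load template; one template serves every hub in the model.
HUB_LOAD_TEMPLATE = Template(
    "INSERT INTO $hub_table ($hash_key, $business_key, LOAD_DTS, RECORD_SOURCE)\n"
    "SELECT s.$hash_key, s.$business_key, MIN(s.LOAD_DTS), MIN(s.RECORD_SOURCE)\n"
    "FROM $stage_table s\n"
    "WHERE NOT EXISTS (\n"
    "    SELECT 1 FROM $hub_table h WHERE h.$hash_key = s.$hash_key\n"
    ")\n"
    "GROUP BY s.$hash_key, s.$business_key;"
)

# Hypothetical metadata: one entry per hub to generate.
HUB_METADATA = [
    {"hub_table": "HUB_CUSTOMER", "hash_key": "HUB_CUSTOMER_HK",
     "business_key": "CUSTOMER_ID", "stage_table": "STG_CRM_CUSTOMER"},
    {"hub_table": "HUB_PRODUCT", "hash_key": "HUB_PRODUCT_HK",
     "business_key": "PRODUCT_CODE", "stage_table": "STG_ERP_PRODUCT"},
]


def generate_hub_loads(metadata):
    """Render one INSERT ... SELECT statement per metadata entry."""
    return [HUB_LOAD_TEMPLATE.substitute(entry) for entry in metadata]


if __name__ == "__main__":
    for statement in generate_hub_loads(HUB_METADATA):
        print(statement, end="\n\n")
```

Adding a hub then means adding one metadata entry; link and satellite loads would follow the same approach with their own templates.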
The Competing Bid?
The competition bid this with 15 people and 3 months to completion, at a cost of $250k! (They bid a very complex system.)
Our total cost? $30k and 2 weeks!
Results?
Changing the direction of the river takes less effort than stopping the flow of water
(Chinese Proverb)
Who’s using it? Who Endorses it?
C.I.T.O Queensland Super Fund
“DV2.0 brings the assurance that we can cope with an increased velocity in change, without falling behind in our ability to support time sensitive decision‐making.
The quality improvement and estimate accuracy resulting from the disciplined process are bonus factors in project delivery.”
Nols Ebersohn (QSuper, Mgr of Information Architecture)
“DV2.0 training provides all the patterns and sample code, so the learning curve for developers is contracted. We ingested 7 systems, 6500 data items into our DV2.0 with the use of 3 ETL templates in 8 months, all using 2 week sprints for delivery cycles.”
Endorsements?
• Bill Inmon
• Claudia Imhoff
• Clive Finkelstein
• Peter Aiken
• Scott Ambler
• Stephen Brobst
• John O’Brien
• Howard Dresner
Who’s Using Data Vault?
THANK – YOU!
Book & Training:
http://LearnDataVault.com/
(Intro to Data Vault is a FREE course)
CORPORATE PACKAGES AVAILABLE
Consulting:
Contact Us:
• Kick Start Package
• Accelerator Package
• Advanced Assessment Package