© Dan Linstedt, 2014, all rights reserved. 6/12/2014

Dan Linstedt
25 years in the industry
http://LearnDataVault.com

Inside the pressure cooker that is BI and EDW

Business Issues…
• Big Data (volume, velocity)
• Unstructured / Multi-Structured Data (variety)
• Managed Self-Service BI (analytics)
• Managed Self-Service Data Discovery (bypassing IT)
• Auditability / Accountability
• Ownership and Governance
• Security and Privacy

Project Issues
IT…
• Takes too long
• Over budget
• Too complex
• Can't sustain growth
THE GAP!!
Business…
• Changes frequently
• Needs accountability
• Demands auditability
• Wants visibility
• Desires autonomy

Diametrically opposed goals for the EDW layer.
Information Mart Goals
• Interpretation
• Interpolation
• Correlation
• Quality
• Rapid Delivery

"Data" Warehouse Goals
• Sourcing
• Latency
• Scalability
• Auditability
• Historical Storage

Ever-Growing Dimensional Warehouse Costs
Forced Conformity
[Chart: cost per month against time in months (3 to 21)]
3 projects:
1) Data Mart 1: 3 months, $100k
2) Data Mart 2: 5 months, $250k
3) Data Mart 3: 7 months, $500k

Data Silos
• SALES: We built our own because IT costs too much…
• FINANCE: We built our own because IT took too long…
• MARKETING: We built our own because we needed customized dimension data…

"One cannot solve a problem with the same consciousness that created it." (Albert Einstein)

Time for a CHANGE

Forging Ahead

Data Vault 1.0
DV1 also uses sequences!
Data Vault 2.0 System
Data Vault 1.0 is all about the Data Vault Model.

Methodology
• Consistent
• Repeatable
• Pattern Based

Architecture
• Multi-Tier
• Scalable
• Supports NoSQL

Model
• Flexible, Scalable
• Joins to NoSQL
• Hub & Spoke

Agile Methodology
BENEFITS:
• Drives Agile Deliveries (2-3 weeks)
• Includes CMM, Six Sigma, TQM
• Manages Risk, Governance, Versioning
• Defines Automation, Generation
• Designs Repeatable Optimized Processes
• Combines Best Practices for BI

DV 2.0 Methodology & CMM
Follows: SEI/CMMI Level 5, PMP, Six Sigma, TQM, and Agile elements
Level 5: Optimized business processes; repeatable, scalable, fault-tolerant. Automatable (generate-able)
Level 4: Metrics, estimates vs. actuals, function point analysis, identification of broken processes
Level 3: Defined business processes, defined goals, defined objectives
Level 2: Risk assessments / analysis, managed processes, basic alignment efforts
Level 1: Process unpredictable and poorly controlled

Model
BENEFITS:
• Follows Scale Free Architecture
• Based on Hub & Spoke Design
• Backed by Set Logic & MPP Math
[Diagram: Hub, Link, and Satellites]

Data Vault 2.0 Model
DV2 uses hash keys. Why?
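One way to read the question: a hash key can be derived from the business key alone, so each load process, and each platform, can compute it independently instead of waiting on a sequence value generated elsewhere. The following is a minimal Python sketch under stated assumptions, not the presenter's implementation. MD5, the trim and upper-case normalization, the `||` delimiter, and every name used (`hash_key`, `stage_row`, `HUB_CUSTOMER_HK`, `customer_no`, the sample data) are hypothetical choices made for illustration.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(*business_keys: str, delimiter: str = "||") -> str:
    """Deterministic key built from one or more business keys.

    Assumed rules (not from the slides): trim, upper-case, join, MD5.
    """
    normalized = delimiter.join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest().upper()


def stage_row(source_row: dict, record_source: str) -> dict:
    """Attach the hash key and load metadata once, in staging."""
    return {
        **source_row,
        "HUB_CUSTOMER_HK": hash_key(source_row["customer_no"]),
        "LDTS": datetime.now(timezone.utc),  # load timestamp
        "RSRC": record_source,               # record source
    }


staged = stage_row({"customer_no": " c-1001 ", "name": "Acme"}, "CRM")

# Hub and satellite rows are both cut from the staged row. Neither load
# needs a sequence produced by the other, so they can run in parallel.
hub_row = {k: staged[k] for k in ("HUB_CUSTOMER_HK", "customer_no", "LDTS", "RSRC")}
sat_row = {k: staged[k] for k in ("HUB_CUSTOMER_HK", "name", "LDTS", "RSRC")}

# A record held on another platform (hypothetical document) can compute
# the same key on its own and be joined on it after loading.
doc = {"HUB_CUSTOMER_HK": hash_key("C-1001"), "notes": "free text"}
assert doc["HUB_CUSTOMER_HK"] == hub_row["HUB_CUSTOMER_HK"]
```

With sequence keys, the satellite load would first have to look up the number assigned by the hub load; with a computed key that dependency disappears, at the cost of agreeing on one normalization rule everywhere.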
[Diagram: NoSQL and RDBMS]

How Hashes Work With ELT / ETL
[Diagram, RDBMS staging to RDBMS: Source File, Stage Table With Hashes, EL processes, Hub, Link, Satellites 1 to 3; distinct parallel load operations]
[Diagram, staging from Hadoop to relational: Source File, copy or load, Hadoop File, attach hashes, Hashed Hadoop File, Hub, NoSQL (Document Store)]
Joins across hash values can be done post load.

Hashing / Data Vault 2.0 Model
NoSQL / Hadoop: JSON document, audio file, video file, multi-structured XML
RDBMS

JSON DOC
{
  LNK_OU_COMP_MD5,
  SAT_LDTS,
  SAT_LEDTS,
  SAT_RSRC,
  ORG_UNIT_DETAILS {
    UNIT_DESCRIPTION,
    UNIT_LOCATION { UNIT_LAT, UNIT_LON }
    UNIT_DATES { UNIT_START_PRODUCTION, UNIT_END_PRODUCTION }
  }
}

Architecture
BENEFITS:
• Enhances De-Coupling
• Ensures Low Impact Changes
• Provides Managed Self-Service BI
• Includes Seamless NoSQL

DV2.0 Systems Architecture
[Diagram. Tiers: Sources, Staging, EDW (DV2), Data Marts. Other labels: Ontology, Modeling & Metadata, Hard Rules, Soft Rules, Write Back, Real Time, Batch, RDBMS, In Memory, Appliances, Cubes, Finance, Planning, Production, Excel, Word, Analytic Tooling]

Implementation
BENEFITS:
• Enhances Automation
• Ensures Scalability
• Provides Consistency
• Includes Fault-Tolerance

Data Vault 2.0 is an Enterprise BI System
Model, Architecture, Methodology, Implementation
• Scalability
• Flexibility
• Consistency
• Repeatability
• Agility
• Adaptability
• Auditability

Changing Gears: One Part of Success
Managing effectively, but empowering users

If you give a kid a bunch of finger paint, does that automatically make them a master artist?

The correct approach is: Managed Self-Service BI

Why is it managed?
Business users have controlled access to information in the EDW system.

Managed Self-Service BI
So, how does this work?

Managed Self-Service BI, Part 1
End users manage their own master data and hierarchies directly in the EDW / Data Vault!

Managed Self-Service BI, Part 2
Data Driven Virtual Marts!
Tabular data: Excel, Tableau, SAS, QlikView, Cubes

Managed Self-Service BI, Part 3
Visual Process Design: Business Rule Injection
[Diagram: Business User, Business Rules, GUI Tooling]

Bringing Data Vault 2.0 to your Project

Key: Flexibility

Case In Point:
Result of flexibility: Merged 3 companies in 90 days: ALL systems, ALL DATA!

Key: Scalability in Architecture
Scaling is easy; it's based on the following principles:
• Hub and spoke design
• MPP Shared-Nothing Architecture
• Scale Free Networks

Case In Point:
Result: Produced Data Vault, scaled to 3 petabytes (circa 2003), still growing today!

Key: Scalability in Team Size
You should be able to SCALE your TEAM as well!
With the Data Vault 2.0 Methodology, you can scale your team when desired, at different points in the project!

Case In Point: (Dutch Tax Authority)
Result: Changed team size on demand! Included entry-level staff when needed.

Key: Productivity
Increasing productivity requires a reduction in complexity. The Data Vault System simplifies all of the following:
• ETL Loading Routines
• Real-Time Ingestion of Data
• Data Modeling for the EDW
• Enhancing and Adapting for Change to the Model
• Ease of Monitoring, Managing and Optimizing Processes

Case In Point:
Result of productivity: 2 people in 2 weeks merged 3 systems, built a full Data Vault EDW, 5 star schemas and 3 reports.
Generated:
• 90% of the ETL code for moving the data set
• 100% of the staging data model
• 75% of the finished EDW data model
• 75% of the star schema data model

The Competing Bid?
The competition bid this with 15 people and 3 months to completion, at a cost of $250k! (They bid a very complex system.)
Our total cost? $30k and 2 weeks!

Results?
"Changing the direction of the river takes less effort than stopping the flow of water" (Chinese proverb)

Who's using it? Who endorses it?

C.I.T.O, Queensland Super Fund
"DV2.0 brings the assurance that we can cope with an increased velocity in change, without falling behind in our ability to support time sensitive decision-making. The quality improvement and estimate accuracy resulting from the disciplined process are bonus factors in project delivery."

Nols Ebersohn (Qsuper, Mgr of Information Architecture)
"DV2.0 training provides all the patterns and sample code, so the learning curve for developers is contracted. We ingested 7 systems, 6500 data items into our DV2.0 with the use of 3 ETL templates in 8 months, all using 2 week sprints for delivery cycles."

Endorsements?
• Bill Inmon
• Claudia Imhoff
• Clive Finkelstein
• Peter Aiken
• Scott Ambler
• Stephen Brobst
• John O'Brien
• Howard Dresner

Who's Using Data Vault?

THANK YOU!
Book & Training: http://LearnDataVault.com/ (Intro to Data Vault is a FREE course)
CORPORATE PACKAGES AVAILABLE
• Kick Start Package
• Accelerator Package
• Advanced Assessment Package
Consulting / Contact Us: [email protected] [email protected]