Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Do It Strategically with Microsoft Business Intelligence! Bojan Ciric Strategic Consultant ARCHITECTURE Technical aspects of the project Data Integration Data Analysis Data Visualization Solution Architecture SQL Server as a backbone DATA INTEGRATION ETL: Extract, Transform, and Load 1. Extract data from the source systems 2. Transform the data to convert it to a desired format 3. Load the data into the data warehouse Data Integration More than Just ETL Transform corporate data into meaningful and actionable information Challenges – Retrieve and merge data from multiple sources – Cleanse and transform the data – Load the data into appropriate data stores for analysis and reporting Enterprises spend 60%–80% of their BI resources in the data integration stage Top tips for successful ETL: Choose the right ETL tool •large volume of data and complex transformations requires powerful tool available to dealing with these challenges Use a lot metadata •since that essence of ETL is composed by mapping of operational systems to destination as well as data transformation, store all of that in metadata instead of applications. Over the time you will get benefits through maintenance lower costs. Engage experienced ETL developers •they are expensive but several times more efficient then inexperienced Choose the right destination data model •operational sources you cannot choose, but good destination models allows you less complex transformations and better information potential Establish data governance •during ETL process, it is extremely important and quite efficient to establish data governance within the organization if no any. Good Data Governance policy and procedures is essential environment for the ETL process during the operational work. Let’s do ETL with SSIS SQL Server Integration Services (SSIS) service SSIS object model Two distinct runtime engines: – Control flow – Data flow 32-bit and 64-bit editions The Package The basic unit of work, deployment, and execution An organized collection of: – – – – – – Connection managers Control flow components Data flow components Variables Event handlers Configurations Can be designed graphically or built programmatically Saved in XML format to the file system or SQL Server Control Flow Control flow is a process-oriented workflow engine A package contains a single control flow Control flow elements – – – – Containers Tasks Precedence constraints Variables Data Flow The Data Flow Task – Performs traditional ETL and more – Fast and scalable Data Flow Components – Extract data from Sources – Load data into Destinations – Modify data with Transformations Service Paths – Connect data flow components – Create the pipeline Data Flow Sources Sources extract data from – Relational tables and views – Files – Analysis Services databases Data Flow Destinations Destinations load data to – – – – Relational tables and views Files Analysis Services databases and objects DataReaders and Recordsets Row Transformations Update column values or create new columns Transform each row in the pipeline input Rowset Transformations Create new rowsets that can include – – – – Aggregated values Sorted values Sample rowsets Pivoted or unpivoted rowsets This is a heavy-weight performer of SSIS Are also called asynchronous components Split and Join Transformations Distribute rows to different outputs Create copies of the transformation inputs Join multiple inputs into one output Perform lookup operations Custom Developed Components Fully aligned with IBM BDW model Pexim Message Mapper Classification Resolver Pexim Surrogate Key Pexim Accounting Unit Pexim Rank Resolver Source To BDW Mapping Classification Management Surrogate Key Management Accounting Unit Management Rank Management ETL Concept Evolution – Custom Components ETL Concept Evolution – Logic out of SQL Queries Microsoft SQL Server Integration Services DEMO OLAM, Multidimensional Data, Data mining and so on DATA ANALYSIS OLAP or Multidimensional Data Online Analytical Processing = Multidimensional Data Measures and Dimensions Uses a calculation engine for fast, flexible transformation of base data (such as aggregates) Supports discovery of business trends and statistics not directly visible in data warehouse queries Cube (UDM) Unified Dimensional Model Combination of measures (from facts) and dimensions as one conceptual model Rich data model enhanced by – – – – – – Calculations Key Performance Indicators (KPIs) Actions Perspectives Translations Partitions Formally, cube is called a UDM Querying a Simple Cube What sales did we expect to achieve in North America for CY 2004 Q1? 5,005,000 Q1 Q2 Pacific Europe Q4 North America Measures Dimension Sa le s Q3 Te rri to ry N/A Star Schema Star Schema Benefits Transforms normalized data into a simpler model Delivers high-performance queries Delivers higher performing queries using Star Join Query Optimization Uses mature modeling techniques that are widely supported by many BI tools Requires low maintenance as the data warehouse design evolves Snowflake Dimension Tables Define hierarchies using multiple dimension tables Support fact tables with varying granularity Simplify consolidation of data from multiple sources Potential for slower query performance in relational reporting No difference in performance in Analysis Services database Hierarchies Benefits – View of data at different levels of summarization – Path to drill down or drill up Implementation – Denormalized star schema dimension – Normalized snowflake dimension – Self-referencing relationship Fact Table Fundamentals Collection of measurements associated with a specific business process Specific column types – Foreign keys to dimensions – Measures – numeric and additive – Metadata and lineage Consistent granularity – the most atomic level by which the facts can be defined Fact Table Examples Parent-Child Hierarchy A dimension that contains a parent attribute A parent attribute describes a self-referencing relationship, or a self-join, within a dimension table Common examples – Organizational charts – General Ledger structures – Bill of Materials Parent-Child Hierarchy Example Slowly Changing Dimensions Support primary role of data warehouse to describe the past accurately Maintain historical context as new or changed data is loaded into dimension tables Implement changes by Slowly Changing Dimension (SCD) type – Type 1: Overwrite the existing dimension record – Type 2: Insert a new ‘versioned’ dimension record – Type 3: Track limited history with attributes SCD Type 1 Existing record is updated History is not preserved SCD Type 2 Existing record is ‘expired’ and new record inserted History is preserved Most common form of SCD SCD Type 3 Existing record is updated Limited history is preserved Implementation is rare SQL Server 2008 Analysis Services OLAP component – – – Aggregates and organizes data from business data sources Performs calculations difficult to perform using relational queries Supports advanced business intelligence, such as Key Performance Indicators Data mining component – – Discovers patterns in both relational and OLAP data Enhances the OLAP component with discovered results Cube = Unified Dimensional Model Multidimensional data Combination of measures and dimensions as one conceptual model – Measures are sourced from fact tables – Dimensions are sourced from dimension tables Dimensions and Facts Basis of All BI Fact – something that happened – – – – Sale, purchase, shipping... Transaction or an event Verb Essentially a Measure Dimension – describes a fact – Customer, product, account... – Object – Noun A fact (measure) is expressed in terms of dimensions – 16 balls sold to John on 20090115. Dimensions Describe business entities Contain attributes that provide context to numeric data Present data organized into hierarchies Dimensions Members from tables/views in a data source view (based on a Data Warehouse) Contain attributes matching dimension columns Organize attributes as hierarchies – One All level and one leaf level – User hierarchies are multi-level combinations of attributes – Can be placed in display folders Used for slicing and dicing by attribute Hierarchy Defined in Analysis Services Ordered collection of attributes into levels Navigation path through dimensional space Very important to get right! Measure Group Group of measures with same dimensionality Analogous to a fact table Cube can contain more than one measure group – E.g. Sales, Inventory, Finance Defined by dimension relationships Dimension Model Calculations Expressions evaluated at query time for values that cannot be stored in fact table Types of calculations – Calculated members – Named sets – Scoped assignments Calculations are defined using MDX MDX = MultiDimensional EXpressions Microsoft SQL Server Analysis Services - OLAP DEMO DATA MINING Data Mining Discovery of (very) hidden patterns in mountains of data Correlation search engine Recent combination of statistics, probability analysis, database technologies, machine learning, and AI What does Data Mining do? Typical Uses Seek Profitable Customers Correct Data During ETL Detect and Prevent Fraud Understand Customer Needs Data Mining Build Effective Marketing Campaigns Anticipate Customer Churn Predict Sales Typical Scenarios Customer Classification & Segmentation • Who are our customers? • Are there any relationships between their demographics and their buying power? Profitability and Risk • Who are our most profitable customers? • Can I predict profit of a future customer based on demographics? • Are they creditworthy? • How much should I charge them to give a good loan and protect against losses? Customer Needs Analysis • How do they behave? • What are they likely to do once they bought that really expensive car? • Should I intervene? Forecasting • What are my sales going to be like in the next few months? • Will I have credit problems? • Will my server need an upgrade in the next 3 months? Mining Process Data Mining DEMO Summary Data Mining is a powerful, predictive technology Turns data into valuable, decision-making knowledge SQL Server 2008 Analysis Services support Predictive Analytics Mine your mountains of data for gems of intelligence today! DATA VISUALIZATION Key Performance Indicator (KPI) Quantifiable measurement comparing business performance to goals Measure of overall organizational health when combined into a collection for a business scorecard Three main ways to build KPIs: – Using OLAP (cubes) – Directly in Performance Point Server – Using data mining (predictive KPI) KPI Characteristics Value Goal Status Trend Dashboards and Scorecards Scorecard – Table (pivot-like) of KPIs Dashboard – Contains scorecards, reports, and other analytical visualisations Building Dashboard with Microsoft Excel 2010 DEMO PowerPivot for Excel 2010 PowerPivoting Massive Data Volumes With a few mouse clicks, a user can create and publish intuitive and interactive self-service analysis solutions. Empower Your Users—With Familiar Tools The ease of Excel now with: Unmatched computational power Advanced analytic expressions Empower Your Users—With Familiar Data Users know reports: What’s available What’s useful Where to find them How to use them IT provions reports: With security and scalability Interactive Slicing and Dicing Excel 2010 and Excel Services Interactive slicers enable users to look at the data from various directions in Excel 2010 and in the browser through PowerPivot for SharePoint and Excel Services. © 2010 Asseco SEE. All rights reserved. The information herein is for informational purposes only and represents the opinions and views of Asseco SEE and/or Bojan Ciric. The material presented is not certain and may vary based on several factors. Portions © 2009 Asseco SEE & entire material © 2009 Microsoft Corp. Some slides contain quotations from copyrighted materials by other authors, as individually attributed or as already covered by Microsoft Copyright ownerships. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view Asseco SEE as of the date of this presentation. Because Asseco SEE & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Asseco SEE cannot guarantee the accuracy of any information provided after the date of this presentation. Asseco SEE makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.