* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download DATC01 - Business Intelligence For The Relational Expert
Survey
Document related concepts
Operational transformation wikipedia , lookup
Data Protection Act, 2012 wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Clusterpoint wikipedia , lookup
Expense and cost recovery system (ECRS) wikipedia , lookup
Data center wikipedia , lookup
Forecasting wikipedia , lookup
Relational model wikipedia , lookup
Information privacy law wikipedia , lookup
3D optical data storage wikipedia , lookup
Data analysis wikipedia , lookup
Data vault modeling wikipedia , lookup
Transcript
DATC01 Business Intelligence For The Relational Expert Len Wyatt And Dave Wickert [email protected] [email protected] Program Managers Microsoft Corporation Overview BI Users and Applications BI Technology Overview …And a little terminology Dimensional Modeling Intro to ETL and Data Transformation Svcs. Intro to OLAP and Analysis Services Intro to Reporting Services Intro to Client Tool Options Key BI Challenges Getting more information BI Users Business users Don’t want to formulate queries Clear, graphical output Need model that matches business concepts Usage models Reports (in many forms!) Interactive exploration Portals Multidimensional Analysis Cube concept Edges are dimensions Groceries Electronics Product Clothing North South Garden East West Automotive Q1 Q2 Q3 Time Q4 Geog Dimension Hierarchies Time dimension “All” level g All Time Year level g 2003 Quarter level g Month level g Day level g Jan 2004 Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4 Feb Mar Oct Nov Dec Jul Aug Sep 1 2 3 4 5 6 7 Product dimension levels: Group, Subgroup, Family, Brand, Item, UPC … Business Analysis with Drilldown, Slice And Pivot OLTP Versus Business Intelligence Who asks what? OLTP Questions Analysis Questions When did that order What factors affect order ship? processing time? How many units are in How did each product inventory? line (or product) contribute to profit last Does this customer quarter? have unpaid bills? Which products have the lowest Gross Margin? Are any of customer X’s line items on What is the value of items backorder? on backorder, and is it trending up or down over time? What Is A DW Made From? Data Marts and cubes Source Systems Relational Data Store Reports Clients DW = ETL + Relational Store + OLAP + Reporting + Clients + Metadata ? + Data Marts ? + Data Mining ? Compare and contrast… “Relational BI” and “OLAP” “Data warehouse” and “Data mart” Subject orientation… Warehouse == Multi-subject Mart == Single-subject Staging area Warehouse: Staging area (back room) Most complete, stable data Mart: Supports user queries Analysis Services is more than a data mart Query engine for dimensional data Platform for analytical applications Microsoft BI Product Stack Solution Accelerators Excel OWC Data Analyzer SharePoint Portal Server SQL Server Relational Engine and DTS (ETL) Reporting Services Analysis Services OLAP and Data Mining Management Tools Dev Tools Visual Studio .Net SSABI, Business Scorecards and Excel Reporting Open, integrated, and based on industry standards Dimensional Modeling As implemented in a RDBMS Data is presented in a standard, intuitive framework that allows for high-performance access The measurement data is organized in a single table with a multipart key, called the fact table, and a set of smaller tables called dimension tables The other non-FK fields in the fact table are numeric, additive measures, e.g., sales $, counts, etc. Dimensions are then mostly all descriptive, text fields Star And Snowflake Schemas Star Snowflake • Above the line, FKs • Below are measures Star And Snowflake Schemas Star = denormalized hierarchies into a single dimension table Snowflake = normalized (FK constraint) tables Structures are always one-to-many up the hierarchy – they can be regular, ragged or parent-child, but always one-to-many Facilitates rollups; drill down, etc. Why Use A Dimensional Model? Clear, predictable model which allows reporting and query tools to make assumptions around How data is presented, grouped, and navigated (i.e., slice ‘n pivot) Browse dimensions (and hierarchies) independent of numeric data Users are not locked into one data access route – drilldown and pivot Well-understood semantics for high-performance, i.e., aggregation Multidimensional Analysis with Drilldown, Slice And Pivot Slice ‘n Dice Analysis The underlying MDX . . . Slice ‘n Dice Relationally Equivalent TSQL if done relationally… Slice ‘n Dice Relationally To implement declarative MDX in a relational system becomes: SUMs over GROUP BYs Exponential number of possible combinations Should be materialized for performance Thus we need a specialized runtime engine . . . Data Transformation Services Intro to ETL ETL = Extract, Transform, Load Moving data from production systems to DW Checking data integrity Assigning surrogate key values Collecting data from disparate systems Reorganizing data Data Transformation Services Data Transformation Services Key Uses Populating Data Warehouses from production systems Grocery store chain uploads all store transactions daily, and summarizes in a data warehouse used by HQ and store managers Collecting data from disparate systems Airport security system gets a regular feed of employee lists from airline HR systems Reorganizing data Web server logs are filtered, surrogate keys are assigned, and data is aggregated for later analysis Synthesizing data Financial positions are valued each night, with suspect records spooled to a special location for troubleshooting the next day Analysis Services Data organized as a dimensional model, e.g. cubes, dimensions, etc. Direct support for slice ‘n dice, drilldown and pivot Aggregates are pre-calculated for fast query response DBAs can control what aggregates are calculated (trade-off of ‘processing time’ verses ‘query time’) Fact table is indexed for fast retrieval, if needed Built-in support for calculated measures, e.g. Profit = ([Sales] – [Cost]) Built-in statistical functions, e.g. std dev, trends, period comparisons, top count, etc. Standard query interface allows open client tool market Analysis Services Key Idea: Aggregations Subtotals at a certain level from every dimension Highest Level Aggregation Customers Product All Customers Country State City Name All Products Category Intermediate Aggregation Brand countryCode productID Units Sold Item Can sd452 9456 SKU US yu678 4623 Customer Product Units Sold Sales All All 347814123 $345,212,301.30 Sales $23,914.30 $57,931.45 … Facts custID SKU Units Sold Sales 345-23 135123 2 $45.67 563-01 451236 34 $67.32 … Analysis Services Contrasting OLAP to relational Analysis Services Aggregation Designing aggregations Optimizing queries Query language Intrinsic to AS Relational DW Materialized view with GROUP BY clauses Wizards for simple Analyze which aggs design process best support query pattern Run-time query Query Analyzer + engine selects aggregate navigator nearest aggregations (or very sophisticated query tools) MDX SQL About MDX… You need to learn MDX to use AS The query language For expressing calculations and security It looks a little like SQL, but don’t be fooled! MDX select {[Measures].[Store Sales]} on columns, {[Time].[1997].Children} on rows from Sales SQL select time.quarter, sum(fact.store_sales) from sales_fact_1997 fact, time_by_day time where fact.time_id = time.time_id and time.the_year = '1997' group by time.quarter Reporting Services Multiple report delivery targets Common formats: HTML, PDF, .XLS, etc. Report definition is separated from rendering: can re-render without re-running report Scheduling / Subscriptions / Snapshots Parameterized reports, linked reports, sub-reports, graphics, charts, bursting Extensible and embeddable Report Rendering HTML PDF Excel Report Authoring Report developers can create reports to be published to the Report Server using Microsoft or 3rd party design tools that support the Reporting Services XML report definition format. Authoring Management Execution & Delivery End Users Report Definition Managed Report Generated Report Delivery Targets Report Management Report definitions, folders, and resources are published and managed in a SQL Server database. Reporting Services provides Web Service APIs and graphical tools for report management. Authoring Management Execution & Delivery End Users Report Definition Managed Report Generated Report Delivery Targets Report Execution And Delivery Execute on-demand or via schedule Queries are executed and the result sets are merged with the report layout On-demand “pull” or server-based “push” delivery View reports from the included web-based front-end or subscribe to reports for delivery Authoring Management Execution & Delivery End Users Report Definition Managed Report Generated Report Delivery Targets Client Tool Options Reporting Renderers HTML, PDF, Excel, etc. Exploration MS Excel, OWC, MS Excel Accelerator, MS MapPoint, MS Data Analyzer Proclarity, Panorama, and other 3rd party OLAP query tools Portal / KPI MS SharePoint Portal Server, MS Balanced Scorecard Accelerator, and other web-based tools Exploration Tool Proclarity (as an example) Portal / KPI SharePoint Portal Server Office Solution Accelerator for Business Scorecards (beta) Geospatial Exploration MapPoint Tie BI data to map display Connect directly to OLAP data from MapPoint (via add-in) http://msdn.microsoft.com/downloads/sample.asp?url=/MSDNFILES/027/001/871/msdncompositedoc.xml Client Selection Criteria Web deployment? Analysis Services functionality used? Existing BI infrastructure? Users? Executives, Analysts, Management, Front line Visualization Needs Data Mining Needs Reporting Needs Cost What We’ve Covered… Data Marts and cubes Source Systems Relational Data Store Reports Clients Key BI Challenges: Data Changing data “Changing dimensions” issue Surrogate keys Rewriting history versus maintaining history Restatements Try not to alter facts; Use journal approach What if you have to re-do a month of data? Data lineage Where does this number come from? Data cleansing Data normalization Across various source systems Surrogate keys help here too Key BI Challenges: Scale Often there are large data volumes in a DW Must manage multiple technologies at scale! Be aware of calculation complexity at scale Watch out for these in any large system; Might work in the POC but not at scale Complex calculations (“wide queries”) Median calculations are simple example Distinct counts Very large dimensions CrossJoins on large dims Airline origins and destinations, for example Summary What BI is Why it is important What infrastructure you will need and its technology components Key challenges More Information Technical: “The Data Warehouse Toolkit” by Ralph Kimball, and “The Data Warehouse Lifecycle Toolkit” http://www.ralphkimball.com/html/books.html Business: “Business Intelligence” by Elizabeth Vitt, et. al. http://www.microsoft.com/MSPress/books/5735.asp SQL Server 2000 Books Online is a great resource White papers, tools, etc. on Microsoft BI technologies http://www.microsoft.com/sql/evaluation/bi/ Please fill out a session evaluation on CommNet Q1: Overall satisfaction with the session Q2: Usefulness of the information Q3: Presenter’s knowledge of the subject Q4: Presenter’s presentation skills Q5: Effectiveness of the presentation © 2004 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.