* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Analysis Services
Expense and cost recovery system (ECRS) wikipedia , lookup
Data analysis wikipedia , lookup
Versant Object Database wikipedia , lookup
Clusterpoint wikipedia , lookup
Information privacy law wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Operational transformation wikipedia , lookup
3D optical data storage wikipedia , lookup
Open data in the United Kingdom wikipedia , lookup
Business intelligence wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Data vault modeling wikipedia , lookup
Analysis Services Choosing between Multidimensional and Tabular Helena Cabral - BI4ALL [email protected] Our Main Sponsors: Say Thank you to Volunteers: They spend their FREE time to give you this event. Because they are crazy. Because they want YOU to learn from the BEST IN THE WORLD. Paulo Matos: Pedro Simões: André Batista: Paulo Borges: André Melancia: Murilo Miranda: Quilson Antunes: 5 Sponsor Sessions at 15:10 Don’t miss them, they might be getting distributing some awesome prizes! Rumos BI4ALL Bold Int CozyRoc Pythian Important Activities: WIT – Women in Technology 15:10 at BizSpark Room (Ground Floor) SQLClinic Challenges 10:00 DEV (Neil Hambly) 11:50 DBA (Uwe Ricken) 17:00 BI (Steph Locke) Agenda SSAS Evolution Tabular: What’s it and how does it work? Multidimensional vs Tabular Development experience Data modeling Scalability Query performance Process performance Security Client applications support Conclusions v12 SQL Server 2014 v11 SQL Server 2012 v10 SQL Server 2008 (and R2) v9 SQL Server 2005 v8 SQL Server 2000 v7 SQL Server 7.0 •First release with OLAP Services •MOLAP, ROLAP and HOLAP architectures •MDX query language •SSDT-BI •Power View for Multidimensional Models (using DAX) •xVelocity engine •Power Pivot and Tabular Model •DAX language •Business Intelligence Semantic Model (BISM) •Improved Scalability and Performance •Attribute and User Hierarchies •Business Intelligence Development Studio (BIDS) •Unified Dimension Model (UDM) •Data Mining services •Improved dimension design (changing dimensions, parent-child, virtual dimensions) •Improved calculation engine (custom rollups, cell calculations) •Dimension security •Connectivity over HTTP SSAS Evolution SSAS Evolution Why two approaches? Only one data modeling tool for different needs Similar development experience Easy migration between SelfService and Corporate BI New column-oriented and inmemory storage technology Multidimensional concepts are still difficult to learn What is it? How does it work? Does it still have dimensions and measure groups? SSAS TABULAR SSAS Tabular What is it? Relational modeling Uses xVelocity which is an in-memory engine with column-oriented storage Provides higher compression and faster in-memory operations such as aggregations Supports for both MDX and DAX query languages Available in SQL Server 2012/14 Enterprise and Business Intelligence editions Also supports a DirectQuery mode You can add calculated columns and measures to any table in the model You can create hierarchies in any table in the model You can partition any table in the model Unlike Multidimensional models, there’s no difference between a dimension table and a measure group table. Sometimes we don’t need a dimensional model! Dimensions vs Fact Tables Which development tools can I use? How long is the learning curve? DEVELOPMENT EXPERIENCE Development tools TABULAR SQL Server Data Tools for Visual Studio Open an existing Power Pivot model MULTIDIMENSIONAL SQL Server Data Tools for Visual Studio Development process TABULAR Easy to wrap a model over raw sources or DW for prototyping purposes Model can be created from a Self-Service experience Smaller learning curve (Excel like) Need to learn DAX MULTIDIMENSIONAL Model should be created over a DW source Not so easy prototyping Higher learning curve IT required Wide knowledge base Need to learn MDX Development experience TABULAR Workspace database directly related with the project WYSIWYG experience, but also can make the development experience too slow Single developer at a time (one model per solution) MULTIDIMENSIONAL Need to be deployed and processed to server to see the results Support for multiple developers Does Tabular have the same advanced model options? Do I have to learn DAX? DATA MODELING Data Modeling TABULAR MULTIDIMENSIONAL Relational Modeling: Tables and more tables Dimensional Modeling: Facts and Dimensions Basic Relationships (…and advanced relationships through DAX calculations) Advanced Relationships natively supported (parentchild, many-to-many, attribute relationships, …) No role playing dimension (two tables can only have one active relationship) Support multiple granularity DAX calculations Do not support multiple granularity in the same measure group MDX calculations Other (un)supported features TABULAR MULTIDIMENSIONAL Some features (as Actions, Translations and Display Folders) are possible by editing the generated XMLA or by using BIDS Helper tool available at CodePlex Named-Sets, Scopes, Actions, Translations Scopes can be defined by using calculated columns Display Folders, custom format of measures, default values for dimensions attributes Write-back, Data Mining, Custom Assemblies, Custom Rollups Process behavior configuration DimSalesReason SalesReasonKey SalesReasonName SalesReasonReasonType FactInternetSalesReason SalesOrderNumber SalesOrderLineNumber SalesReasonKey Relational model FactInternetSales SalesOrderNumber SalesOrderLineNumber SalesAmount SalesAmountWithReason := CALCULATE ( SUM ( 'Internet Sales'[Sales Amount] ); FILTER ( FactInternetSalesReason; CALCULATE ( COUNTROWS ( DimSalesReason ) > 0 ) ) ) Many to many relationships are not supported by Tabular (natively) But can be achieved by using some advanced DAX calculations! DAX Calculation DEMO #1 DimProduct ProductKey (...) Relational model FactProductInventory DimDate ProductKey DateKey UnitsIn UnitsOut DateKey (...) DAX Query EVALUATE SUMMARIZE ( CALCULATETABLE ( 'Product Inventory'; 'Product Category'[Product Category Name] = "Clothing" ); 'Product'[Product Name]; 'Date'[Date]; "Total Units Movement", [Total Units Movement] ) ORDER BY 'Product'[Product Name]; 'Date'[Date] Calculating balances by using transactions Need to sum all transactions up to a given day A common solution is to create a snaphost table with the quantity (calculated in ETL) for each product/day Calculate balances on the fly using Tabular/DAX! DEMO #2 Does Tabular support a large volume of data? SCALABILITY AND DEPLOY Operation modes TABULAR MULTIDIMENSIONAL In-memory/cached: All data is loaded in memory and all queries are answered from there MOLAP: Data is pulled from relational store, is aggregated and stored in a multidimensional format Direct Query: Bypass the in-memory model, queries are answered directly from database source (SQL Server only) ROLAP: All data stay in the relational store and additiona objetcs are created for calculated aggregations Model scalability TABULAR MULTIDIMENSIONAL Data volume mostly limited to physical memory Data volumes can scale to multiple terabytes Small/medium size models Medium/large size models Processor centric: Memory is the most critical resource Disk centric: Disk performance is important because is used as the primary data storage for preaggregated data Disk performance is not an important factor How fast is Tabular compared to Multidimensional? Is always faster? QUERY PERFORMANCE Querying scenarios TABULAR Always read columnar data from RAM Aggregate data in memory Second (warm) run is just as fast the first (cold) one MULTIDIMENSIONAL With no aggregations defined: read atomic data from disk and aggregate data in memory With predefined aggregations: Cold Cache: read aggregated data from disk Warm Cache: read aggregated data from RAM Querying preparation TABULAR MULTIDIMENSIONAL Use DAX for detailed reports Use MDX or DAX for querying Use MDX for aggregated data Need to define aggregations No special tuning is required Benefits from warmup cache mechanisms Outperforms Multidimensional: Report on low granularity data Outperforms Tabular: Report on aggregated data with predefined aggregations on Warm Cache Report on aggregated data with no predefined aggregations MDX Query SELECT { [Measures].[Discount Amount] ,[Measures].[Reseller Freight Cost] ,[Measures].[Reseller Gross Profit] ,[Measures].[Reseller Total Product Cost] ,[Measures].[Reseller Sales Amount] ,[Measures].[Reseller Tax Amount] } ON COLUMNS, NON EMPTY ( [Geography].[Geography].[City] *[Product].[Product].[Product] *[Sales Territory].[Sales Territory Region].[Sales Territory Region] *[Promotion].[Promotion].[Promotion] ) ON ROWS FROM [Adventure Works] Detailed queries are normally an headache Most of the times, we opt to query directly the DW relational source With Tabular, this detailed queries perform better! DEMO #3 DimCustomer CustomerKey Relational model (...) FactInternetSales CustomerKey OrderDateKey SalesAmount DimDate Distinct Count (DC) operations are expensive in Multidimensional Moreover, need to create and maintain multiple measure groups (one for each DC measure) With Tabular, DC measures performs better without need special modeling! DateKey (...) DAX Query NewCustomers := CALCULATE ( DISTINCTCOUNT ( 'Internet Sales'[CustomerKey] ); FILTER ( ALL ( Date ); 'Date'[DateKey] <= MAX ( 'Date'[DateKey] ) ) ) - CALCULATE ( DISTINCTCOUNT ( 'Internet Sales'[CustomerKey] ); FILTER ( ALL ( Date ); 'Date'[DateKey] < MAX ( 'Date'[DateKey] ) ) ) DEMO #4 What are the process options? Is there any dependencies? PROCESS PERFORMANCE Processing preparation TABULAR Every table can be partitioned No processing dependencies (one table can be processed without processing related tables) MULTIDIMENSIONAL Only measure groups can be partitioned Processing dependencies (need to process dimensions first) Don’t support parallel partition processing Support for parallel partition processing Data in other tables/partitions can be queried during processing Aggregations must be recalculated when dimensions processing occurs Processing options TABULAR Full process of a whole DB might consume too much memory Process Defrag: rebuilds the table dictionary Process Recalc: recalculates calculated columns, rebuilds relationships, user hierarchies and other internal engine structures MULTIDIMENSIONAL • Process Index: creates or rebuilds indexes and aggregations for all processed partitions Does Tabular have the same security options as Multidimensional? SECURITY Security TABULAR MULTIDIMENSIONAL Don’t support cell security, only table and row level security • Advanced security features that can go up to cell-level Drill through permission is set to all roles • It’s possible to set drill through permission at role level Dynamic Security • Dynamic Security Only supports visual totals • Supports both visual totals and non-visual totals Use DAX to restrict access to Row Sets Do I have to change my client applications? CLIENT APPLICATIONS SUPPORT Client Applications Support TABULAR Query languages: MULTIDIMENSIONAL Query languages: In Cached Mode: MDX and DAX In DirectQuery mode: only DAX • MDX and DAX (as of SQL Server 2012 SP1 CU4) Client tool Language Microsoft Excel PivotTable MDX Power View DAX SQL Server Reporting Services MDX / DAX PerformancePoint MDX Other applications (ADO.NET of OLE DB provider) MDX / DAX So.. What do I choose? CONCLUSIONS You should choose Multidimensional! You have an extremely large cube that exceeds server memory You need some of the features not supported by Tabular (even with DAX) You have a complex model (budgeting/forecasting, multiple many-to-many, parent-child…) You are an MDX expert You hate workarounds Your reporting needs are typically aggregated You have all your models in Multidimensional mode and want to migrate to Tabular just because! You should choose Tabular! You have small/medium models that fit in server memory You need a non-multidimensional model You have simple models (and not need neither of the unsupported features) You really never understood MDX (just give a try to DAX, although is not so simple than they say) You are an enthusiastic about workarounds! You have detailed and ad-hoc reports that takes to long to run You have a new model with no special needs and it will be great if you can show it just after lunch (prototyping!) Our Main Sponsors: