Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
The New Possibilities in Microsoft Business Intelligence Johan Åhlén & Tim Peterson, SolidQ Guest speakers: Tim Mallalieu & Miguel Llopis, Microsoft • "Information is the Oil of the 21st Century - BI and Analytics are the Refinery” (Gartner) Presenters • Johan Åhlén – SolidQ Mentor & Sweden CEO – President, Swedish SQL Server User Group – Microsoft SQL Server MVP – Blog: www.joinsights.com • Tim Peterson – SolidQ Mentor & Nordic Board Member – Co-author of the SSAS 2008 R2 Maestros course – Blog: http://timpetersonbi.wordpress.com Data Explorer presenters • Timothy Mallalieu – Group Program Manager, Cloud Data Services Team – Microsoft – Blog: http://blogs.msdn.com/timmall • Miguel Llopis – Program Manager, Cloud Data Services Team – Microsoft – Blog: http://blogs.msdn.com/mllopis Challenge: New data sources TOMORROW TODAY YESTERDAY What was the result ? How does the customer see us ? How can we continue to succeed ? Efficient processes? Business Processes Customers Finance VISION/ STRATEGY Social Media Competitor Data etc Challenge: Data explosion • World wide information stored volume is at least doubling each year. (EMC) • 87% of performance issues in application databases are related in some way to data growth. (OAUG) Challenge: The BI dilemma Operational Analytics Data Warehouse / ETL Management’s Perceived value Developer’s Effort Scorecards and Dashboards The New Possibilities New Data Sources Big Data Windows Azure Marketplace PDW Self-service BI PowerPivot Power View Codename Data Explorer Hadoop (not covered in this session) Codename Data Explorer End-to-end self service BI DEMO The Business Intelligence Semantic Model • The Past - The Unified Data Model (UDM) in Analysis Services 2005/2008 • The Future – The Business Intelligence Semantic model in Analysis Services 2012 – Multidimensional model – Tabular model Upgrading to BISM • Upgrading to 2012 BISM Multidimensional – Almost no change from Analysis Services 2008 – No preparation needed – Some improvements • Upgrading to 2012 BISM Tabular – Very different structure – Standard recommendation – start over! Tabular/Multidimensional Differences Calculations Querying Use with Crescent In-Memory Aggregations Multidimensional MDX MDX No No Yes (optional) Tabular DAX DAX or MDX Yes Yes - as option No Querying Relational Database Yes - as option (ROLAP/HOLAP) Yes -as option (Direct Query) Client Choice Direct Query No Yes - as option Multidimensional/Tabular Advantages Multidimensional Speed Scalability Ease of Use Migration from AS2008 Integration with PowerPivot in Excel Tabular X – When In-Memory X – MOLAP scales more than Vertipaq X – More like relational, DAX like Excel formulas, less tuning needed X – Almost no change X – Uses the same Vertipaq engine Advantages (Continued) Multidimensional Use with Crescent Multidimensional Logic Querying Relational Database – Ease of Use Querying Relational Database - Logic Tabular X – Only option for now X – More with MDX X – Direct Query appears to be easier than ROLAP X – Direct Query supports limited DAX logic Migrating from AS2008 Cubes to 2012 BISM Tabular Model DEMO The Parallel Data Warehouse • Large capacity data warehouse – 100’s of terabytes – Massive Parallel Processing • Sold as an appliance – Software/hardware package – Multiple servers running the SQL Server database engine – Pre-configured, centrally managed, so it is manageable PDW Configuration • Control Rack – – – – Control nodes Management Nodes Landing Zone Backup Nodes • 1-4 Data Racks – Compute Nodes – Storage Nodes PDW Data Racks • Each rack has 10 active nodes and 1 passive node (in case one of the other nodes fails) • Each node has 16 processors • Each node receives 8 distributions (instances of a distribute table) • A full 4 data rack system has 320 distributions – 4 racks X 10 nodes X 8 distributions How the processing is distributed • Replicated Tables – Full copy with all data created on every node – Used for dimension tables • Distributed Tables – Table created on every node, each with a portion of the data – Data divided as evenly as possible • Use a hash function on a key with a large number of values – Used for fact tables (and very large dimension tables) Three Types of PDW Joins • Ultra Shared-Nothing Join – Join made between a distributed table and a repliated table – Fully local on every node • Shared-Nothing Join – Join made between two distributed tables with compatible distribution keys • Redistribution (or Shuffle) Join – Join made between two distributed tables that do not have compatible distribution keys Speed of Joins • At TechNet in May a demo was done comparing a Shared-Nothing Join and a Redistribution Join – 6 billion rows joined with 1.5 billion rows – Only difference between the two demos was that one had compatible distribution keys and the other did not • Shared-Nothing Join took 3 seconds • Redistribution Join took 3 minutes PDW Database Design • If you have a multidimensional data structure (star schema), your design is almost done – Replicate the dimension tables – Distribute the fact tables • If you have one large dimension table, you can distribute the fact tables along the same key as the dimension table – You will still have excellent performance How Do You Get Speed in Retrieving Data? • Create good indexes • Put data into a multidimensional database • Add aggregation tables in the relational database or aggregations in the multidimensional database • Create a better type of index for data retrieval (columnar) • Put all the data into memory and compress it (Vertipaq) Speed – The PDW Solution • Use Massively Parallel Processing – Divide the data into small parts – Retrieve the data from each of the parts – Combine all the results together • MPP gives the most effective result when you have a very large amount of data • And you can still use indexes to improve performance further – Columnar indexes in Denali Using PDW with Analysis Services • Using the multidimensional model with ROLAP • Using the multidimensional model with HOLAP • Using the tabular model with Direct Query Microsoft’s Vision for Cloud Data Services Microsoft Codename “Data Explorer” Add & Manage Data Sources Classify Transform Snapshot Understand Mash up Publish Recommend Cleanse Sell http://www.microsoft.com/en-us/sqlazurelabs/labs/dataexplorer.aspx Demo - codename “Data Explorer” DEMO Learn more • Power View – http://joinsights.com/tag/power-view/ • Migrating to BISM Tabular – Link to Tim’s whitepaper • Windows Azure Data Market – https://datamarket.azure.com/ • Parallel Data Warehouse (PDW) – http://www.microsoft.com/sqlserver/en/us/editions/data-warehouse.aspx • Codename “Data Explorer” – http://blogs.msdn.com/b/dataexplorer THANK YOU! For attending this session and PASS SQLRally Nordic 2011, Stockholm